Imagine we have the following sorted table of term values for a field, computed over a whole Lucene index, and we need the top-10 facets for a large result set:

Digital:   1,700,000
Books:     1,000,000
Computers:   900,000
...

(term:count, sorted in descending order by "count" for the whole index; FilterCache etc.)

Suppose that we have 10,000,000 documents in the index. Simple math: if the query result size is higher than (10,000,000 - 1,700,000) = 8,300,000, it must intersect with Digital. Then we execute only the top-10 DocSet intersection calculations instead of the typical thousands (or even do term counting for the top-10 terms only instead of thousands)...

What is the probability that the intersection with Digital is too small, and that somewhere at the bottom (after the top-10) there is a larger intersection which we have missed? Again, if the size of the first intersection is smaller than some value (which mathematical statistics can predict with probability 0.999), let's say smaller than 170,000, we can predict the necessity of counting the top-20 and filtering down to the top-10.

P.S.
Similar to "pessimistic concurrency" vs. "optimistic"...

Fuad Efendi
==================================
http://www.linkedin.com/in/liferay
http://www.tokenizer.org
http://www.casaGURU.com
==================================

-----Original Message-----
From: Jason Rutherglen [mailto:jason.rutherg...@gmail.com]
Sent: August-21-09 12:59 PM
To: solr-user@lucene.apache.org
Subject: Re: [ANNOUNCEMENT] Newly released book: Solr 1.4 Enterprise Search Server

It seems possible to cache the results of facet queries on a per segment basis, providing the caching you're describing.

On Fri, Aug 21, 2009 at 8:42 AM, Fuad Efendi<f...@efendi.ca> wrote:
>> actually a hybrid that goes back to DocSet intersections when it's more
>> efficient
>
> I noticed that too when I played with it, for large query results DocSet
> intersections are de-facto standard; but when "faceting" started CNET had
> only 400,000 documents :)
> Nowadays even 2-3 seconds response time is bad... maybe storing all users'
> queries and executing some tasks in the background (storing "facets" in a
> database similar to a heavy warehouse, predicting facet counts depending on
> query terms and domain analysis, etc.)?
>
>
> On Fri, Aug 21, 2009 at 11:25 AM, Fuad Efendi<f...@efendi.ca> wrote:
>> I was joking [off-topic]; "faceting" as DocSet intersections replaced by
>> trivial term count calcs which is extremely faster in some (if not all) use
>> cases, including possibly even NON-tokenized (with standard faceting we can
>> use FilterCache)...
>
> One size does not fit all. The enum method is not outdated or
> deprecated, and still works better in some scenarios. The new
> faceting code is actually a hybrid that goes back to DocSet
> intersections when it's more efficient.
>
> -Yonik
> http://www.lucidimagination.com
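
For illustration only, here is a minimal Python sketch of the "optimistic" top-10 idea from the top of this message. It is not Solr code. The names must_intersect and optimistic_top_k are hypothetical, and doc sets are plain Python sets rather than Solr DocSets. Note one substitution: instead of the statistical threshold (the 170,000 example), the sketch uses a deterministic stopping rule, relying on the fact that a term's intersection with the result set can never exceed its index-wide count. So once the k-th best intersection found so far is at least the index-wide count of the next term in the table, no later term can do better and counting can stop.

```python
import heapq


def must_intersect(result_size, term_count, total_docs):
    """Pigeonhole check: a result set larger than (total_docs - term_count)
    cannot avoid the documents that contain the term."""
    return result_size > total_docs - term_count


def optimistic_top_k(result_docs, terms_by_count, k=10):
    """Return (top_k, examined).

    result_docs    -- set of doc ids matching the query (assumed representation)
    terms_by_count -- list of (term, doc_id_set), sorted by len(doc_id_set)
                      in descending order, i.e. the table from the message
    top_k          -- list of (term, intersection_size), largest first
    examined       -- number of intersections actually computed
    """
    heap = []  # min-heap of (intersection_size, term); heap[0] is the k-th best
    examined = 0
    for term, docs in terms_by_count:
        # Upper bound for this and every later term is len(docs).
        # If the current k-th best already reaches it, stop early.
        if len(heap) == k and heap[0][0] >= len(docs):
            break
        examined += 1
        size = len(result_docs & docs)
        if len(heap) < k:
            heapq.heappush(heap, (size, term))
        elif size > heap[0][0]:
            heapq.heapreplace(heap, (size, term))
    ranked = sorted(heap, key=lambda pair: (-pair[0], pair[1]))
    return [(term, size) for size, term in ranked], examined
```

When the first intersections turn out small, the loop simply keeps going down the table (top-20 and beyond), which plays the role of the "count top-20 and filter to top-10" fallback. Ties at the k-th position are resolved arbitrarily in this sketch.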