Imagine we have the following sorted table of term values for a field,
computed over a whole Lucene index, and we need the top-10 facets for a
large result set:

Digital: 1,700,000
Books: 1,000,000
Computers: 900,000
...
(term:count), sorted in descending order by "count" for the whole index;
FilterCache etc.

Suppose that we have 10,000,000 documents in an index.

Simple math: if the query result size is higher than (10,000,000 - 1,700,000)
= 8,300,000, it must intersect with Digital. Then we can execute DocSet
intersection calcs for the top-10 terms only instead of the typical
thousands (or even term counting for the top-10 terms only instead of
thousands)... What is the probability that the intersection with Digital is
too small, and that somewhere at the bottom (after the top-10) there is a
larger intersection which we have missed?  Again, if the size of the first
intersection is smaller than some value (which Math Stats can predict
exactly with “probability to be true = 0.999”), let's say smaller than
170,000, we can predict the necessity of counting the top-20 and filtering
down to the top-10.
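
To make the idea concrete, here is a rough sketch in Python (not Solr code).
The names min_overlap, optimistic_top_facets and widen_below are made up for
illustration, DocSets are modelled as plain Python sets, and the 170,000
cut-off is just the example value from above, not something derived from
statistics.

```python
from typing import Dict, List, Set, Tuple


def min_overlap(query_size: int, term_count: int, index_size: int) -> int:
    """Pigeonhole lower bound on the size of (query docs AND term docs)."""
    return max(0, query_size + term_count - index_size)


def optimistic_top_facets(
    query_docs: Set[int],
    term_docs: Dict[str, Set[int]],
    k: int = 10,
    widen_below: int = 170_000,  # hypothetical threshold, an assumption
) -> List[Tuple[str, int]]:
    # Terms ordered by index-wide count, descending (the precomputed table).
    ranked = sorted(term_docs, key=lambda t: len(term_docs[t]), reverse=True)

    def count(terms: List[str]) -> List[Tuple[str, int]]:
        return [(t, len(query_docs & term_docs[t])) for t in terms]

    # Optimistic step: intersect with the k globally largest terms only.
    counts = count(ranked[:k])

    # Fallback: if the first intersection is suspiciously small,
    # also count the next k terms and keep the best k overall.
    if counts and counts[0][1] < widen_below:
        counts += count(ranked[k:2 * k])

    counts.sort(key=lambda tc: tc[1], reverse=True)
    return counts[:k]
```

With the numbers above, min_overlap(9_000_000, 1_700_000, 10_000_000) gives
700,000: a query matching 9,000,000 documents must share at least that many
with Digital. The result is still only a guess at the true top-10; the
fallback just makes a miss less likely.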

P.S.
Similar to "pessimistic concurrency" vs. "optimistic"...


Fuad Efendi
==================================
http://www.linkedin.com/in/liferay
http://www.tokenizer.org
http://www.casaGURU.com
==================================

-----Original Message-----
From: Jason Rutherglen [mailto:jason.rutherg...@gmail.com] 
Sent: August-21-09 12:59 PM
To: solr-user@lucene.apache.org
Subject: Re: [ANNOUNCEMENT] Newly released book: Solr 1.4 Enterprise Search
Server

It seems possible to cache the results of facet queries on a per
segment basis, providing the caching you're describing.

On Fri, Aug 21, 2009 at 8:42 AM, Fuad Efendi<f...@efendi.ca> wrote:
>> actually a hybrid that goes back to DocSet intersections when it's more
>> efficient
>
> I noticed that too when I played with it, for large query results DocSet
> intersections are de-facto standard; but when "faceting" started CNET had
> only 400,000 documents :)
> Nowadays even 2-3 seconds response time is bad... may be storing all users'
> queries and executing some tasks on background (storing "facets" in a
> database similar to heavy warehouse, predicting facet counts depending on
> query terms and domain analysis, and etc)?
>
>
> On Fri, Aug 21, 2009 at 11:25 AM, Fuad Efendi<f...@efendi.ca> wrote:
>> I was joking [off-topic]; "faceting" as a DocSet intersections' replaced by
>> trivial term count calcs which is extremely faster in some (if not all) use
>> cases, including possibly even NON-tokenized (with standard faceting we can
>> use FilterCache)...
>
> One size does not fit all.  The enum method is not outdated or
> deprecated, and still works better in some scenarios.  The new
> faceting code is actually a hybrid that goes back to DocSet
> intersections when it's more efficient.
>
> -Yonik
> http://www.lucidimagination.com
>
>
>

