: > when we facet on the authors, we start with
: > that list and go in order, generating their facet constraint count using
: > the DocSet intersection just like we currently do ... if we reach our
: > facet.limit before we reach the end of the list and the lowest constraint
: > count is higher than the total doc count of the last author in the list,
: > then we know we don't need to bother testing any other Author, because no
: > other author can possibly have a higher facet constraint count than the
: > ones on our list
:
: This works OK if the intersection counts are high (as a percentage of
: the facet sets).  I'm not sure how often this will be the case though.
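
To make the quoted early-termination idea concrete, here's a rough,
self-contained Java sketch. TermCount, EarlyTerminationFacets, and the
BitSet-based DocSets are all made-up stand-ins for Solr's internals,
not the real implementation:

    import java.util.*;

    class TermCount {
        final String term;
        final int docFreq;   // total docs containing this term
        TermCount(String t, int df) { term = t; docFreq = df; }
    }

    public class EarlyTerminationFacets {
        // termsByDocFreqDesc: every term in the field, sorted by docFreq
        // descending.  docSets: term -> docs containing it.
        // queryDocs: docs matching the base query.
        static List<Map.Entry<String, Integer>> facet(
                List<TermCount> termsByDocFreqDesc,
                Map<String, BitSet> docSets,
                BitSet queryDocs, int facetLimit) {
            // min-heap on constraint count, holding the current top terms
            PriorityQueue<Map.Entry<String, Integer>> top =
                new PriorityQueue<>(Map.Entry.comparingByValue());
            for (TermCount tc : termsByDocFreqDesc) {
                // the early out: even a 100% overlap for this (and every
                // later) term can't beat the lowest count in the queue
                if (top.size() >= facetLimit
                        && top.peek().getValue() >= tc.docFreq) break;
                BitSet intersection = (BitSet) docSets.get(tc.term).clone();
                intersection.and(queryDocs);   // the DocSet intersection
                top.offer(Map.entry(tc.term, intersection.cardinality()));
                if (top.size() > facetLimit) top.poll();  // drop the lowest
            }
            List<Map.Entry<String, Integer>> result = new ArrayList<>(top);
            result.sort((a, b) -> b.getValue() - a.getValue());
            return result;
        }
    }

The break only fires once the lowest count already in the queue beats
the best possible count (the docFreq) of every remaining term.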

well, keep in mind "N" could be very big, big enough to store the full
list of Terms sorted in docFreq order (it shouldn't take up much space
since it's just the Term and an int) ... for any query that returns a
"large" number of results, you probably won't need to reach the end of
the list before you can tell that all the remaining Terms have a lower
docFreq than the current lowest constraint count in your facet.limit
list.  For queries that return a "small" number of results, it wouldn't
be as useful, but that's where a switch could be flipped to start with
the values mapped to the docs (using FieldCache -- assuming
single-valued fields)
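
That "flipped switch" for small result sets would look roughly like the
sketch below; facetOverDocs and docToValue are illustrative names,
standing in for what FieldCache hands you for a single-valued field:

    import java.util.*;

    public class FieldCacheFacets {
        // matchingDocIds: the (few) docs matching the query.
        // docToValue: doc id -> field value, FieldCache-style.
        static Map<String, Integer> facetOverDocs(int[] matchingDocIds,
                                                  String[] docToValue) {
            Map<String, Integer> counts = new HashMap<>();
            for (int docId : matchingDocIds) {
                String value = docToValue[docId];
                if (value != null) counts.merge(value, 1, Integer::sum);
            }
            return counts;  // caller sorts by count, trims to facet.limit
        }
    }

This sweeps the matching docs once instead of doing one DocSet
intersection per term, so its cost tracks the result size rather than
the number of distinct terms.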

: Another tradeoff is to allow getting inexact counts with multi-token
: fields by:
:  - simply faceting on the most popular values
:    OR
:  - do some sort of statistical sampling by reading term vectors for a
:    fraction of the matching docs.

I loathe inexact counts ... I think of them as "Astrology" to the
Astronomy of true Faceted Searching ... but I'm sure they would be
"good enough" for some people's use cases.
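
For reference, the quoted sampling idea might look something like this
rough sketch (every name here is made up, and the scaled numbers are
estimates, not true counts):

    import java.util.*;

    public class SampledFacets {
        // Sample roughly sampleFraction of the matching docs, count their
        // values (a stand-in for reading each doc's term vector), then
        // scale the counts back up by the inverse of the actual rate.
        static Map<String, Long> estimate(int[] matchingDocIds,
                                          String[] docToValue,
                                          double sampleFraction,
                                          Random rng) {
            Map<String, Integer> sampled = new HashMap<>();
            int taken = 0;
            for (int docId : matchingDocIds) {
                if (rng.nextDouble() >= sampleFraction) continue; // skip doc
                taken++;
                String v = docToValue[docId];
                if (v != null) sampled.merge(v, 1, Integer::sum);
            }
            double scale =
                (taken == 0) ? 0 : (double) matchingDocIds.length / taken;
            Map<String, Long> estimates = new HashMap<>();
            sampled.forEach((t, c) -> estimates.put(t, Math.round(c * scale)));
            return estimates;
        }
    }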



-Hoss
