: > when we facet on the authors, we start with that list and go in
: > order, generating their facet constraint count using the DocSet
: > intersection just like we currently do ... if we reach our
: > facet.limit before we reach the end of the list and the lowest
: > constraint count is higher than the total doc count of the last
: > author in the list, then we know we don't need to bother testing
: > any other Author, because no other author can possibly have a
: > higher facet constraint count than the ones on our list
:
: This works OK if the intersection counts are high (as a percentage of
: the facet sets). I'm not sure how often this will be the case though.

Well, keep in mind "N" could be very big: big enough to store the full
list of Terms sorted in docFreq order (it shouldn't take up much space,
since it's just the Term and an int). For any query that returns a
"large" number of results, you probably won't need to reach the end of
the list before you can tell that all the remaining Terms have a lower
docFreq than the current lowest constraint count in your facet.limit
list. For queries that return a "small" number of results, it wouldn't
be as useful, but that's where a switch could be flipped to start with
the values mapped to the docs (using FieldCache -- assuming
single-valued fields).
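To make that concrete, here's a rough (untested) Java sketch of both
halves -- the docFreq-ordered early-termination loop, and the per-doc
counting you'd flip to for small result sets. The TermDf/TermCount
types and the counter/fieldValues hooks are just stand-ins for this
sketch, not actual Solr APIs:

import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.function.ToIntFunction;

class FacetSketch {
  // stand-in types, just for this sketch
  record TermDf(String term, int docFreq) {}
  record TermCount(String term, int count) {}

  // "terms" must be pre-sorted by docFreq descending; "counter" stands
  // in for the existing DocSet-intersection count
  static List<TermCount> topFacets(List<TermDf> terms, int facetLimit,
                                   ToIntFunction<String> counter) {
    PriorityQueue<TermCount> topK =
        new PriorityQueue<>(Comparator.comparingInt(TermCount::count));
    for (TermDf t : terms) {
      // a term's constraint count can never exceed its docFreq, and the
      // docFreqs only go down from here -- so once the smallest count in
      // a full top-K list is >= the next docFreq, nothing left can win
      if (topK.size() == facetLimit && topK.peek().count() >= t.docFreq()) {
        break;
      }
      int count = counter.applyAsInt(t.term());
      if (topK.size() < facetLimit) {
        topK.add(new TermCount(t.term(), count));
      } else if (count > topK.peek().count()) {
        topK.poll();
        topK.add(new TermCount(t.term(), count));
      }
    }
    return topK.stream()
        .sorted(Comparator.comparingInt(TermCount::count).reversed())
        .toList();
  }

  // the flip side for small result sets: walk the matching docs and
  // count field values directly ("fieldValues" is a docId -> value
  // array, i.e. what a single-valued FieldCache entry gives you)
  static Map<String, Integer> countByDoc(int[] matchingDocs,
                                         String[] fieldValues) {
    Map<String, Integer> counts = new HashMap<>();
    for (int docId : matchingDocs) {
      counts.merge(fieldValues[docId], 1, Integer::sum);
    }
    return counts;
  }
}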
: Another tradeoff is to allow getting inexact counts with multi-token
: fields by:
:  - simply faceting on the most popular values
: OR
:  - doing some sort of statistical sampling by reading term vectors
:    for a fraction of the matching docs.

I loathe inexact counts ... I think of them as "Astrology" to the
Astronomy of true Faceted Searching ... but I'm sure they would be
"good enough" for some people's use cases.

-Hoss