: It may not even be necessary to cache this type of lookup since it is
: simply a TermEnum through specific fields in the index. Maybe simply
: doing the TermEnum in the request handler instead of iterating
: through a cache would be just as fast or faster. Any thoughts on that?
While commuting I've been letting my brain bounce around various ideas for a completely generic, totally reusable faceting request handler, and I've been mulling over the same question ... my current theory is that it might make sense to cache a bounded priority queue of the terms for each faceting field, where the priority is determined by the docFreq and the size is configurable. That way you can start with the values in the queue, and if/when you reach a point where the docFreq of the next item in the queue is less than the lowest intersection count you've found so far, and you already have as many items as you want to display, you don't have to bother checking all of the other values (and you don't have to bother with the TermEnum unless you completely exhaust the queue).

: My next challenge is to re-implement the catch-all facets that I used
: to do by unioning all documents in an (Open)BitSet and inverting it.
: How can I invert a DocSet (I realize I can get the bits and do it
: that way, but is there a better way)?

Well, the most obvious solution I can think of would be a patch adding an invert() method to DocSet, HashDocSet and BitDocSet. :)  There was some discussion about this on the list previously, if I recall correctly.

-Hoss
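For concreteness, the bounded-queue idea above might look roughly like the following Java sketch. Note that TermCount, topFacets, and the pre-sorted input list are all made-up stand-ins for whatever the real cache would hold, not actual Solr APIs:

```java
import java.util.*;

public class FacetQueueSketch {
    // stand-in for a cached (term, docFreq) pair from the TermEnum
    record TermCount(String term, int docFreq) {}

    // byDocFreqDesc is the cached queue, highest docFreq first;
    // intersection maps term -> intersection count with the base DocSet
    static List<String> topFacets(List<TermCount> byDocFreqDesc,
                                  Map<String, Integer> intersection,
                                  int limit) {
        // min-heap of the (term, count) pairs kept so far
        PriorityQueue<Map.Entry<String, Integer>> kept =
            new PriorityQueue<>(Map.Entry.comparingByValue());
        for (TermCount tc : byDocFreqDesc) {
            // docFreq is an upper bound on the intersection count, so once
            // it drops below the lowest count we are keeping (and we already
            // have enough items) no later term can displace anything: stop
            if (kept.size() >= limit && tc.docFreq() < kept.peek().getValue())
                break;
            kept.add(Map.entry(tc.term(), intersection.getOrDefault(tc.term(), 0)));
            if (kept.size() > limit)
                kept.poll();  // evict the current lowest count
        }
        // drain the heap into highest-count-first order
        List<String> out = new ArrayList<>();
        while (!kept.isEmpty())
            out.add(0, kept.poll().getKey());
        return out;
    }
}
```

The key point is the break: because docFreq can never be smaller than the intersection count, a queue sorted by docFreq lets you skip the tail entirely once the display list is full.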
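And a minimal sketch of what such an invert() might do under the hood, using plain java.util.BitSet rather than Solr's DocSet/OpenBitSet classes (DocSetInvert and maxDoc are illustrative names, not existing API):

```java
import java.util.BitSet;

public class DocSetInvert {
    // complement a set of matching doc ids relative to maxDoc,
    // the total number of documents in the index
    static BitSet invert(BitSet docs, int maxDoc) {
        BitSet inverted = (BitSet) docs.clone();
        inverted.flip(0, maxDoc);  // toggle every doc id in [0, maxDoc)
        return inverted;
    }
}
```

A HashDocSet would presumably need a different strategy (it has no bits to flip), which is why a polymorphic invert() on DocSet itself would be the cleaner patch.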