On Jun 24, 2006, at 4:29 PM, Yonik Seeley wrote:
On 6/24/06, Erik Hatcher <[EMAIL PROTECTED]> wrote:
This weekend :)   I have imported more data than my hacked
implementation can handle without bumping up Jetty's JVM heap size,
so I'm now at the point where it is necessary for me to start using
the LRUCache.  Though I have already refactored to use OpenBitSet
instead of BitSet.

You can also fit more in mem if you can use DocSet (HashDocSet) for
smaller sets.  This will also speed up intersection counts.  This is
done automatically when you get the DocSet from Solr, or if numDocs()
is used.

Thanks for this advice, Yonik. I've refactored the caching (but not committed yet, for those who may be looking to see what I've done). The cache (currently a single HashMap) is keyed by field name, with nested HashMaps keyed by field value. The inner map used to contain BitSets, then OpenBitSets, but now it contains only TermQuerys. Now I simply use SolrIndexSearcher.getDocSet(query) and rely on the existing query caching. The only thing my custom cache puts into RAM now is this HashMap of all faceted fields, values, and associated TermQuerys. At some point that might even become an issue, but maybe not.
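To make the shape of that concrete, here's a minimal sketch of the two-level map described above, in plain Java. FacetCacheSketch is a hypothetical name of mine, and a plain String ("field:value") stands in for Lucene's TermQuery, since the point is only the field -> (value -> query) structure:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch (not the actual Solr code): an outer map keyed by
// field name, with an inner map keyed by field value. The String value
// stands in for a cached TermQuery (new TermQuery(new Term(field, value))).
public class FacetCacheSketch {
    static final Map<String, Map<String, String>> cache = new HashMap<>();

    static void put(String field, String value) {
        cache.computeIfAbsent(field, f -> new HashMap<>())
             .put(value, field + ":" + value); // stand-in for a TermQuery
    }

    static String get(String field, String value) {
        Map<String, String> byValue = cache.get(field);
        return byValue == null ? null : byValue.get(value);
    }

    public static void main(String[] args) {
        put("genre", "jazz");
        put("genre", "rock");
        // Each cached query would be handed to SolrIndexSearcher.getDocSet(...)
        System.out.println(get("genre", "jazz"));
    }
}
```

At request time each cached query would be passed to getDocSet, so the DocSets themselves live in Solr's existing query cache rather than in this structure.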

It may not even be necessary to cache this lookup at all, since it is simply a TermEnum over specific fields in the index. Perhaps doing the TermEnum directly in the request handler, instead of iterating through a cache, would be just as fast or faster. Any thoughts on that?

Either way, at the moment things are screaming fast and memory is pleasantly under control.

My next challenge is to re-implement the catch-all facets that I used to do by unioning all documents into an (Open)BitSet and inverting it. How can I invert a DocSet (I realize I can get the bits and do it that way, but is there a better way)?
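For reference, here's a minimal sketch of the "get the bits and flip them" fallback, using java.util.BitSet in place of OpenBitSet/DocSet (the real code would pull the bits out of the DocSet and wrap the flipped result back up; InvertBitsSketch is my own name for illustration):

```java
import java.util.BitSet;

// Hypothetical sketch: invert a set of matching doc ids by flipping every
// bit up to maxDoc, yielding the documents NOT in the original set.
public class InvertBitsSketch {
    static BitSet invert(BitSet docs, int maxDoc) {
        BitSet inverted = (BitSet) docs.clone(); // leave the cached set intact
        inverted.flip(0, maxDoc);                // toggle bits [0, maxDoc)
        return inverted;
    }

    public static void main(String[] args) {
        BitSet docs = new BitSet();
        docs.set(1);
        docs.set(3);
        System.out.println(invert(docs, 5)); // {0, 2, 4}
    }
}
```

Note the clone: flipping in place would corrupt a set that is shared with a cache, which matters if the DocSet came out of Solr's query cache.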

        Erik
