On Jun 24, 2006, at 4:29 PM, Yonik Seeley wrote:
On 6/24/06, Erik Hatcher <[EMAIL PROTECTED]> wrote:
This weekend :)   I have imported more data than my hacked
implementation can handle without bumping up Jetty's JVM heap size,
so I'm now at the point where it is necessary for me to start using
the LRUCache.  Though I have already refactored to use OpenBitSet
instead of BitSet.

You can also fit more in mem if you can use DocSet (HashDocSet) for
smaller sets.  This will also speed up intersection counts.  This is
done automatically when you get the DocSet from Solr, or if numDocs()
is used.

Thanks for this advice, Yonik. I've refactored the caching (but not committed yet, for those who may be looking to see what I've done). The cache (currently a single HashMap) is keyed by field name, with nested HashMaps keyed by field value. The inner map used to contain BitSets, then OpenBitSets, but now it contains only TermQuerys. Now I simply use SolrIndexSearcher.getDocSet(query) and rely on the existing query caching. The only thing my custom cache puts into RAM now is this HashMap of all faceted fields, values, and associated TermQuerys. At some point that might even become an issue, but maybe not.
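To make the shape of that concrete, here's a minimal sketch of the two-level map described above, in plain Java. FacetCacheSketch is a hypothetical name of mine, and a plain String ("field:value") stands in for Lucene's TermQuery, since the point is only the field -> (value -> query) structure:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch (not the actual Solr code): an outer map keyed by
// field name, with an inner map keyed by field value. The String value
// stands in for a cached TermQuery (new TermQuery(new Term(field, value))).
public class FacetCacheSketch {
    static final Map<String, Map<String, String>> cache = new HashMap<>();

    static void put(String field, String value) {
        cache.computeIfAbsent(field, f -> new HashMap<>())
             .put(value, field + ":" + value); // stand-in for a TermQuery
    }

    static String get(String field, String value) {
        Map<String, String> byValue = cache.get(field);
        return byValue == null ? null : byValue.get(value);
    }

    public static void main(String[] args) {
        put("genre", "jazz");
        put("genre", "rock");
        // Each cached query would be handed to SolrIndexSearcher.getDocSet(...)
        System.out.println(get("genre", "jazz"));
    }
}
```

At request time each cached query would be passed to getDocSet, so the DocSets themselves live in Solr's existing query cache rather than in this structure.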

It may not even be necessary to cache this lookup at all, since it is simply a TermEnum over specific fields in the index. Perhaps doing the TermEnum directly in the request handler, instead of iterating through a cache, would be just as fast or faster. Any thoughts on that?

Either way, at the moment things are screaming fast and memory is pleasantly under control.

My next challenge is to re-implement the catch-all facets that I used to do by unioning all documents into an (Open)BitSet and inverting it. How can I invert a DocSet (I realize I can get the bits and do it that way, but is there a better way)?
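For reference, here's a minimal sketch of the "get the bits and flip them" fallback, using java.util.BitSet in place of OpenBitSet/DocSet (the real code would pull the bits out of the DocSet and wrap the flipped result back up; InvertBitsSketch is my own name for illustration):

```java
import java.util.BitSet;

// Hypothetical sketch: invert a set of matching doc ids by flipping every
// bit up to maxDoc, yielding the documents NOT in the original set.
public class InvertBitsSketch {
    static BitSet invert(BitSet docs, int maxDoc) {
        BitSet inverted = (BitSet) docs.clone(); // leave the cached set intact
        inverted.flip(0, maxDoc);                // toggle bits [0, maxDoc)
        return inverted;
    }

    public static void main(String[] args) {
        BitSet docs = new BitSet();
        docs.set(1);
        docs.set(3);
        System.out.println(invert(docs, 5)); // {0, 2, 4}
    }
}
```

Note the clone: flipping in place would corrupt a set that is shared with a cache, which matters if the DocSet came out of Solr's query cache.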

        Erik
