It is using the cache, but the number of items is larger than the size of the cache.
If you want to continue to use the filter method, then you need to increase
the size of the filter cache to something larger than the number of unique
values in the field you are filtering on. I don't know if you will have
enough memory to take this approach or not.

The second option is to make brand/manu a non-multi-valued string type.
When you do that, Solr will use a different method to calculate the facet
counts (it will use the FieldCache rather than filters). You would need to
reindex to try this approach.
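Concretely, both options are plain config changes; a rough sketch follows.
The cache sizes and the field name "manu" below are placeholders for
illustration, not values from your setup, and the second snippet assumes a
"string" (solr.StrField) type is defined in your schema.

For the first option, the filterCache entry in solrconfig.xml would need to
grow from its current 512 entries to more than the number of unique
brand/manu terms, e.g.:

  <filterCache
    class="solr.LRUCache"
    size="60000"
    initialSize="16384"
    autowarmCount="256"/>

For the second option, the field definition in schema.xml would look
something like:

  <field name="manu" type="string" indexed="true" stored="true"
         multiValued="false"/>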
-Yonik

On 12/6/06, Gmail Account <[EMAIL PROTECTED]> wrote:

I reindexed and optimized and it helped. However, now each query averages
about 1 second (down from 3-4 seconds). The bottleneck now is the
getFacetTermEnumCounts function. If I take that call out, the query time is
not measurable and the filterCache is being used. With the
getFacetTermEnumCounts call in, the filter cache after three queries is
below, with the hit ratio at 0 and everything being evicted. This call is
for the brand/manufacturer, so I'm sure it is going through many thousands
of queries.

I'm thinking about pre-processing the brand/manu to get a small set of top
brands per category and just querying them no matter what the other facets
are set to. (With certain filters, no brands will be shown.) If I still want
to call getFacetTermEnumCounts for ALL brands, why is it not using the cache?

lookups : 32849
hits : 0
hitratio : 0.00
inserts : 32850
evictions : 32338
size : 512
cumulative_lookups : 32849
cumulative_hits : 0
cumulative_hitratio : 0.00
cumulative_inserts : 32850
cumulative_evictions : 32338

Thanks,
Mike

----- Original Message -----
From: "Yonik Seeley" <[EMAIL PROTECTED]>
To: <solr-user@lucene.apache.org>
Sent: Tuesday, December 05, 2006 8:46 PM
Subject: Re: Performance issue.

> On 12/5/06, Gmail Account <[EMAIL PROTECTED]> wrote:
>> > There's nothing wrong with CPU jumping to 100% each query, that just
>> > means you aren't IO bound :-)
>> What do you mean not IO bound?
>
> There is always going to be a bottleneck somewhere. In very large
> indices, the bottleneck may be waiting for IO (waiting for data to be
> read from the disk). If you are on a single-processor system and you
> aren't waiting for data to be read from the disk or the network, then
> the request will be using close to 100% CPU, which is actually a good
> thing.
>
> The bad thing is how long the query takes, not the fact that it's
> CPU bound.
>
>> >> > - I did an optimize index through Luke with compound format and
>> >> > noticed in the solrconfig file that useCompoundFile is set to false.
>> >
>> > Don't do this unless you really know what you are doing... Luke is
>> > probably using a different version of Lucene than Solr, and it could
>> > be dangerous.
>> Do you think I should reindex everything?
>
> That would be the safest thing to do.
>
>> > - if you are using filters, any larger than 3000 will be double the
>> > size (maxDoc bits)
>> What do you mean larger than 3000? 3000 what, and how do I tell?
>
> From solrconfig.xml:
>
>   <!-- This entry enables an int hash representation for filters (DocSets)
>        when the number of items in the set is less than maxSize. For
>        smaller sets, this representation is more memory efficient, more
>        efficient to iterate over, and faster to take intersections. -->
>   <HashDocSet maxSize="3000" loadFactor="0.75"/>
>
> The key is that the memory consumed by a HashDocSet is independent of
> maxDoc (the maximum internal Lucene docid), but a BitSet-based set has
> maxDoc bits in it. Thus, an unoptimized index with more deleted
> documents causes a higher maxDoc and higher memory usage for any
> BitSet-based filters.
>
> -Yonik
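To put rough numbers on that maxDoc point (the 2,000,000 maxDoc below is an
assumed figure for illustration, not one taken from this thread):

  BitSet filter:  2,000,000 bits = ~250,000 bytes (~244 KB) per cached
                  filter, no matter how few documents actually match
  HashDocSet:     3,000 doc ids * 4 bytes = ~12 KB, plus some hash-table
                  overhead from the 0.75 load factor, independent of maxDoc

That difference is why keeping maxDoc low (by optimizing away deleted
documents) matters once many BitSet-based filters are being cached.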