Your snippet shows it as "text", not "string". Try faceting on manu_exact and you may get better results.
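For example, a request along these lines (assuming the stock example server on
localhost:8983; the q and facet.limit values are just placeholders):

  http://localhost:8983/solr/select?q=*:*&facet=true&facet.field=manu_exact&facet.limit=10

should return one count per manufacturer in the facet_counts section of the
response.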
-Yonik

On 12/6/06, Gmail Account <[EMAIL PROTECTED]> wrote:
It is currently a string type. Here is everything that has to do with manu in
my schema... Should it have been multi-valued? Do you see anything wrong with
this?

<field name="manu" type="text" indexed="true" stored="true"/>
<!-- copied from "manu" via copyField -->
<field name="manu_exact" type="string" indexed="true" stored="true"/>
<field name="text" type="text" indexed="true" stored="false" multiValued="true"/>
.....
<copyField source="manu" dest="text"/>
<copyField source="manu" dest="manu_exact"/>

Thanks...

----- Original Message -----
From: "Yonik Seeley" <[EMAIL PROTECTED]>
To: <solr-user@lucene.apache.org>
Sent: Wednesday, December 06, 2006 9:55 PM
Subject: Re: Performance issue.

> It is using the cache, but the number of items is larger than the size
> of the cache.
>
> If you want to continue to use the filter method, then you need to
> increase the size of the filter cache to something larger than the
> number of unique values you are filtering on (see the example
> solrconfig.xml entry sketched below). I don't know if you will have
> enough memory to take this approach or not.
>
> The second option is to make brand/manu a non-multi-valued string
> type. When you do that, Solr will use a different method to calculate
> the facet counts (it will use the FieldCache rather than filters).
> You would need to reindex to try this approach.
>
> -Yonik
>
> On 12/6/06, Gmail Account <[EMAIL PROTECTED]> wrote:
>> I reindexed and optimized, and it helped. However, each query now
>> averages about 1 second (down from 3-4 seconds). The bottleneck now is
>> the getFacetTermEnumCounts function. If I take that call out, the query
>> time is negligible and the filterCache is being used. With
>> getFacetTermEnumCounts in, the filter cache after three queries is shown
>> below, with the hit ratio at 0 and everything being evicted. This call
>> is for the brand/manufacturer, so I'm sure it is going through many
>> thousands of queries. I'm thinking about pre-processing the brand/manu
>> to get a small set of top brands per category and just querying them no
>> matter what the other facets are set to. (With certain filters, no
>> brands will be shown.) If I still want to call getFacetTermEnumCounts
>> for ALL brands, why is it not using the cache?
>>
>> lookups : 32849
>> hits : 0
>> hitratio : 0.00
>> inserts : 32850
>> evictions : 32338
>> size : 512
>> cumulative_lookups : 32849
>> cumulative_hits : 0
>> cumulative_hitratio : 0.00
>> cumulative_inserts : 32850
>> cumulative_evictions : 32338
>>
>> Thanks,
>> Mike
>>
>> ----- Original Message -----
>> From: "Yonik Seeley" <[EMAIL PROTECTED]>
>> To: <solr-user@lucene.apache.org>
>> Sent: Tuesday, December 05, 2006 8:46 PM
>> Subject: Re: Performance issue.
>>
>> > On 12/5/06, Gmail Account <[EMAIL PROTECTED]> wrote:
>> >> > There's nothing wrong with CPU jumping to 100% each query; that just
>> >> > means you aren't IO bound :-)
>> >> What do you mean not IO bound?
>> >
>> > There is always going to be a bottleneck somewhere. In very large
>> > indices, the bottleneck may be waiting for IO (waiting for data to be
>> > read from the disk). If you are on a single-processor system and you
>> > aren't waiting for data to be read from the disk or the network, then
>> > the request will be using close to 100% CPU, which is actually a good
>> > thing.
>> >
>> > The bad thing is how long the query takes, not the fact that it's CPU
>> > bound.
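For illustration, the filter cache Yonik refers to is the filterCache entry
in solrconfig.xml. A sketch modeled on the stock example config, with the
size raised well above the 512 shown in the stats above -- the exact numbers
here are placeholders to tune against the number of unique brand values and
the available heap:

  <filterCache
    class="solr.search.LRUCache"
    size="50000"
    initialSize="10000"
    autowarmCount="5000"/>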
>> >
>> >> >> > - I did an optimize index through Luke with compound format and
>> >> >> > noticed in the solrconfig file that useCompoundFile is set to
>> >> >> > false.
>> >> >
>> >> > Don't do this unless you really know what you are doing... Luke is
>> >> > probably using a different version of Lucene than Solr, and it could
>> >> > be dangerous.
>> >>
>> >> Do you think I should reindex everything?
>> >
>> > That would be the safest thing to do.
>> >
>> >> > - if you are using filters, any larger than 3000 will be double the
>> >> > size (maxDoc bits)
>> >>
>> >> What do you mean larger than 3000? 3000 what, and how do I tell?
>> >
>> > From solrconfig.xml:
>> >
>> >   <!-- This entry enables an int hash representation for filters
>> >        (DocSets) when the number of items in the set is less than
>> >        maxSize. For smaller sets, this representation is more memory
>> >        efficient, more efficient to iterate over, and faster to take
>> >        intersections. -->
>> >   <HashDocSet maxSize="3000" loadFactor="0.75"/>
>> >
>> > The key is that the memory consumed by a HashDocSet is independent of
>> > maxDoc (the maximum internal Lucene docid), but a BitSet-based set has
>> > maxDoc bits in it. Thus, an unoptimized index with more deleted
>> > documents causes a higher maxDoc and higher memory usage for any
>> > BitSet-based filters.
>> >
>> > -Yonik
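To put rough numbers on that tradeoff (illustrative figures, not measurements
from this thread's index):

  8,000,000 maxDoc / 8 bits per byte = 1,000,000 bytes, about 1 MB per
  BitSet-backed filter, no matter how few documents it matches

  3,000 ids * 4 bytes per int = ~12 KB per HashDocSet (plus hash overhead)

which is why thousands of cached brand filters over a large, unoptimized
index can add up to real memory pressure.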