Sorry for interloping, but I have been wondering the same thing as Ryan. On
my current index with ~6.1M docs, I restarted Solr and ran a query that
included faceting on 4 fields:

QTime: 5712
numFound: 25908
filterCache stats:
        lookups : 0
        hits : 0
        hitratio : 0.00
        inserts : 1
        evictions : 0
        size : 1
        cumulative_lookups : 0
        cumulative_hits : 0
        cumulative_hitratio : 0.00
        cumulative_inserts : 1
        cumulative_evictions : 0 

Then I added faceting on a 5th, multivalued field:

QTime: 65551
numFound: 25908
filterCache stats:
        lookups : 1898314
        hits : 1
        hitratio : 0.00
        inserts : 1898314
        evictions : 1897802
        size : 512
        cumulative_lookups : 1898314
        cumulative_hits : 1
        cumulative_hitratio : 0.00
        cumulative_inserts : 1898314
        cumulative_evictions : 1897802
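
As a quick sanity check on the second set of stats (a sketch only, not something Solr provides): inserts minus evictions should equal the current cache size, and hits divided by lookups should give the hit ratio. The function name `summarize` is made up for illustration; the numbers are the ones reported above.

```python
def summarize(lookups, hits, inserts, evictions):
    """Derive resident entry count and hit ratio from raw cache counters."""
    resident = inserts - evictions  # entries still in the cache
    hit_ratio = hits / lookups if lookups else 0.0
    return resident, hit_ratio


# Counters copied from the 5-field faceting run above.
resident, hit_ratio = summarize(
    lookups=1898314, hits=1, inserts=1898314, evictions=1897802
)
print(resident)             # 512
print(f"{hit_ratio:.2f}")   # 0.00
```

The 512 matches the reported size, so the cache appears to have been full for nearly the whole request, with almost every lookup turning into an insert plus an eviction.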


I realize there are a lot of distinct values in the 5th, multivalued field.
But this is where I'm fuzzy: are we saying there would be no difference
between a tokenized, single-valued field and a multivalued field? Or are we
saying that multivalued is OK as long as the number of distinct values is
less than the filterCache size? (Unfortunately I don't have a single-valued
version of this field to test with.)

Thanks,
-Graham

> I'll be interested in seeing some numbers.  The number of 
> documents matching the base query and filters will also 
> factor in (small will be HashDocSet, large will be BitDocSet).
> 
> Just make sure to run all of your facets, then check the 
> statistics page to see how big you need to make the 
> filterCache to hold them all (and add a little extra for 
> random filters).  The access pattern for the faceting code is 
> worst case for the LRU cache, so it needs to avoid any evictions.
> 
> -Yonik
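
To illustrate the "worst case for the LRU cache" point, here is a toy simulation. It is not Solr's faceting code: it assumes each faceting pass looks up one cache entry per distinct term, in the same order every time. The class and function names, and the sizes used (2000 terms, 3 passes), are invented for the example.

```python
from collections import OrderedDict


class LRUCache:
    """Minimal LRU cache that tracks the same counters as the stats page."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()
        self.lookups = self.hits = self.inserts = self.evictions = 0

    def get_or_insert(self, key):
        self.lookups += 1
        if key in self.data:
            self.hits += 1
            self.data.move_to_end(key)  # mark as most recently used
            return
        self.data[key] = True
        self.inserts += 1
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # drop the least recently used entry
            self.evictions += 1


def run_facet_passes(capacity, num_terms, passes):
    """Hypothetical access pattern: scan every term once per pass, in order."""
    cache = LRUCache(capacity)
    for _ in range(passes):
        for term in range(num_terms):
            cache.get_or_insert(term)
    return cache


small = run_facet_passes(capacity=512, num_terms=2000, passes=3)
large = run_facet_passes(capacity=2048, num_terms=2000, passes=3)
print(small.hits, small.evictions)  # 0 5488
print(large.hits, large.evictions)  # 4000 0
```

With capacity below the number of terms, each entry is evicted before the next pass gets back to it, so the hit count stays at zero no matter how many passes run. Once capacity exceeds the term count, every pass after the first is served entirely from the cache.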

