On 3/1/07, Graham Stead <[EMAIL PROTECTED]> wrote:
Sorry for interloping, but I have been wondering the same thing as Ryan. On
my current index with ~6.1M docs, I restarted Solr and ran a query that
included faceting on 4 fields:

<snip>

Non-tokenized, single valued.

Then I added faceting on a 5th, multivalued field:

QTime: 65551
numFound: 25908
Filtercache stats:
        lookups : 1898314
        hits : 1
        hitratio : 0.00
        inserts : 1898314
        evictions : 1897802
        size : 512
        cumulative_lookups : 1898314
        cumulative_hits : 1
        cumulative_hitratio : 0.00
        cumulative_inserts : 1898314
        cumulative_evictions : 1897802


I realize there are a lot of different values in the 5th multivalued field.
But this is where I'm fuzzy: are we saying there would be no difference
using a tokenized, single valued field versus a multivalued field? Or are we
saying that multivalued is ok, as long as the number of values is less than
the filterCache size? [Unfortunately I don't have a single valued version of
this field to test with]

For anything other than single-valued, untokenized fields, all[1] that
matters is the number of "things" faceted on.  Whether these things are
arbitrary queries, tokens from tokenized fields, or multiple values in
untokenized fields makes no difference.  You've got roughly 2 million
values, which implies constructing 2 million filters and intersecting
each one with the main query docset.  Even if you enlarge the filter
cache to hold all 2 million filters, you still need time to do 2 million
set intersections.  This may take too long even if the filters are all
small.
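
To make the cost concrete, here is a minimal Python sketch of faceting by
filter intersection with a small LRU cache. It is a toy model, not Solr's
code: LRUFilterCache, facet_counts, and the toy index are hypothetical
names and data invented for illustration. It shows one set intersection
per distinct value, and a cache smaller than the number of values
thrashing on every pass.

```python
from collections import OrderedDict


class LRUFilterCache:
    """Tiny LRU cache standing in for a filter cache (hypothetical, not Solr's class)."""

    def __init__(self, max_size):
        self.max_size = max_size
        self.entries = OrderedDict()
        self.lookups = self.hits = self.inserts = self.evictions = 0

    def get(self, term, build):
        self.lookups += 1
        if term in self.entries:
            self.hits += 1
            self.entries.move_to_end(term)  # mark as most recently used
            return self.entries[term]
        docset = build(term)  # cache miss: construct the filter
        self.entries[term] = docset
        self.inserts += 1
        if len(self.entries) > self.max_size:
            self.entries.popitem(last=False)  # evict least recently used
            self.evictions += 1
        return docset


def facet_counts(query_docs, term_to_docs, cache):
    """One set intersection per distinct term, however the terms arose."""
    counts = {}
    for term in term_to_docs:
        filt = cache.get(term, lambda t: frozenset(term_to_docs[t]))
        n = len(query_docs & filt)
        if n:
            counts[term] = n
    return counts


# Toy index (assumed data): 2,000 distinct values, 3 docs each; cache holds 512.
term_to_docs = {f"v{i}": {i, i + 1, i + 2} for i in range(2000)}
cache = LRUFilterCache(max_size=512)
query_docs = frozenset(range(100))

first = facet_counts(query_docs, term_to_docs, cache)
second = facet_counts(query_docs, term_to_docs, cache)
print(cache.lookups, cache.hits, cache.inserts, cache.evictions, len(cache.entries))
```

Because the values are visited in the same order on every pass, each
entry is evicted before it is needed again, so the second pass gets no
hits at all. A larger cache would remove the evictions but not the 2,000
intersections per request.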

As a point of comparison, here is a query that returned ~200k docs and
faceted against 70 facets with roughly 140k docs in each filter
(cached):

329.0   total time
  0.0   set up/parsing
125.0   main query
 46.0   faceting
100.0   optimized pre-fetch
 58.0   debug
Times are in milliseconds.  I've found breaking down the timing rather
useful since I have huge stored docs and non-query-related tasks often
take up big chunks of time.  I could contribute it if anyone else
would find it useful.
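
A per-stage breakdown like the one above can be approximated with a
small timing helper. The following is only a Python sketch under my own
assumptions, not the code that produced those numbers: StageTimer and
the stage names are hypothetical, and the work inside each stage is a
placeholder.

```python
import time
from contextlib import contextmanager


class StageTimer:
    """Collects elapsed wall-clock time per named stage (hypothetical helper)."""

    def __init__(self):
        self.stages = []  # (name, elapsed_ms) in the order stages finished

    @contextmanager
    def stage(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            elapsed_ms = (time.perf_counter() - start) * 1000.0
            self.stages.append((name, elapsed_ms))

    def report(self):
        total = sum(ms for _, ms in self.stages)
        lines = [f"{total:8.1f}  total time"]
        lines += [f"{ms:8.1f}  {name}" for name, ms in self.stages]
        return "\n".join(lines)


timer = StageTimer()
with timer.stage("main query"):
    sum(range(100_000))  # placeholder for running the query
with timer.stage("faceting"):
    sorted(range(50_000), reverse=True)  # placeholder for facet counting
print(timer.report())
```

Here the total is simply the sum of the recorded stages, so any time
spent outside a stage block is not counted.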

-Mike



[1] well, much, if not all.
