On 3/1/07, Graham Stead <[EMAIL PROTECTED]> wrote:
Sorry for interloping, but I have been wondering the same thing as Ryan. On my current index with ~6.1M docs, I restarted Solr and ran a query that included faceting on 4 fields:
<snip> Non-tokenized, single valued.
Then I added faceting on a 5th, multivalued field:

QTime: 65551
numFound: 25908
Filtercache stats:
  lookups : 1898314
  hits : 1
  hitratio : 0.00
  inserts : 1898314
  evictions : 1897802
  size : 512
  cumulative_lookups : 1898314
  cumulative_hits : 1
  cumulative_hitratio : 0.00
  cumulative_inserts : 1898314
  cumulative_evictions : 1897802

I realize there are a lot of different values in the 5th multivalued field. But this is where I'm fuzzy: are we saying there would be no difference using a tokenized, single-valued field versus a multivalued field? Or are we saying that multivalued is OK, as long as the number of values is less than the filterCache size? [Unfortunately I don't have a single-valued version of this field to test with.]
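The filterCache counters above are the signature of an LRU cache being swept by far more distinct keys than it can hold: inserts minus evictions equals the cache size (1898314 - 1897802 = 512), and virtually every lookup misses. The Python sketch below reproduces that pattern. It is a simplified model, not Solr's own cache class; `LRUCache`, `facet_pass` and the 2000-value field are hypothetical, chosen only to make the numbers small.

```python
from collections import OrderedDict


class LRUCache:
    """Minimal LRU cache tracking counters like those in the filterCache stats."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()
        self.lookups = self.hits = self.inserts = self.evictions = 0

    def get(self, key):
        self.lookups += 1
        if key in self.data:
            self.hits += 1
            self.data.move_to_end(key)  # mark as most recently used
            return self.data[key]
        return None

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        self.inserts += 1
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # drop the least recently used entry
            self.evictions += 1

    @property
    def size(self):
        return len(self.data)


def facet_pass(cache, values):
    """Hypothetical faceting request: fetch, or build and cache, one filter per value."""
    for v in values:
        if cache.get(v) is None:
            cache.put(v, f"docset-for-{v}")  # stand-in for a real DocSet


cache = LRUCache(capacity=512)
values = [f"term{i}" for i in range(2000)]  # hypothetical: 2000 distinct values
facet_pass(cache, values)
facet_pass(cache, values)  # repeating the same request still misses every time
print(cache.lookups, cache.hits, cache.inserts, cache.evictions, cache.size)
# prints: 4000 0 4000 3488 512
```

Because each request walks the values in the same order and there are more of them than slots, every entry is evicted before it is needed again, so a repeated request gets no benefit from the cache at all. With fewer distinct values than the capacity, the second pass would hit on every lookup.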
For anything other than single-valued, untokenized fields, all[1] that matters is the number of "things" faceted on. Whether these things are arbitrary queries, tokens from tokenized fields, or multiple values in untokenized fields is moot. You've got ~2 million values, which implies the construction of 2 million filters and an intersection of each with the main query docset. Even if you enlarge the filter cache to contain all 2 million filters, you still need time to do 2 million set intersections. This may take too long if the filters are all small.

As a point of comparison, here is a query that returned ~200k docs and faceted against 70 facets with roughly 140k docs in each filter (cached):

  329.0  total time
    0.0  set up/parsing
  125.0  main query
   46.0  faceting
  100.0  optimized pre-fetch
   58.0  debug

Times are in milliseconds. I've found breaking down the timing rather useful, since I have huge stored docs and non-query-related tasks often take up big chunks of time. I could contribute it if anyone else would find it useful.

-Mike

[1] well, much, if not all.
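Mike's point is that each distinct facet value costs one filter and one intersection with the main query's docset, regardless of how the values arise. A minimal Python sketch of that counting scheme follows, using plain sets of doc ids in place of Solr's DocSets; `build_filters` and `facet_counts` are hypothetical helpers for illustration, not Solr API.

```python
def build_filters(field_values):
    """Invert doc id -> list of values into value -> set of doc ids (one filter per value)."""
    filters = {}
    for doc_id, vals in field_values.items():
        for v in vals:
            filters.setdefault(v, set()).add(doc_id)
    return filters


def facet_counts(query_docs, filters, min_count=1):
    """Count query hits per facet value with one set intersection per value.

    The loop runs once per distinct value, so total cost grows with the number
    of values even when every filter is already built and cached.
    """
    counts = {}
    for value, docs in filters.items():
        n = len(query_docs & docs)  # intersect this filter with the main query docset
        if n >= min_count:
            counts[value] = n
    return counts


# Hypothetical multivalued field: doc id -> values
field = {0: ["red", "blue"], 1: ["red"], 2: ["green"], 3: ["blue", "green"]}
filters = build_filters(field)
query_docs = {0, 1, 3}  # hypothetical main-query result
result = facet_counts(query_docs, filters)
print(result)  # prints: {'red': 2, 'blue': 2, 'green': 1}
```

With the ~1.9 million distinct values in the 5th field, the loop in `facet_counts` would run about 1.9 million times per request, which is why a larger cache alone does not remove the cost.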