Re: Facet

Erick Erickson Mon, 06 Apr 2015 20:04:42 -0700

facet.method=enum will create an entry in the filterCache for each and
every value. But since the filterCache is bounded, each entry will
pretty much be thrown away immediately. At least that's what I
remember.


That neatly accounts for your issue, I think: you're spending a huge
amount of time/cycles calculating filterCache entries only to throw
them away. If you increased your filterCache size to (shudder) 300K+, I
think your performance would be fine after the first request, but I
really, really, really doubt you can do that.
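To make the thrashing argument concrete, here is a toy simulation. It is a sketch, not Solr code: it assumes the filterCache behaves like a plain fixed-size LRU cache and that each enum facet request visits the same distinct values in the same order. The class and function names are made up for illustration.

```python
from collections import OrderedDict


class BoundedLRUCache:
    """Toy size-bounded LRU cache standing in for Solr's filterCache."""

    def __init__(self, max_size):
        self.max_size = max_size
        self.entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get_or_compute(self, key, compute):
        if key in self.entries:
            self.entries.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self.entries[key]
        self.misses += 1
        value = compute(key)  # the expensive part: building the term's doc set
        self.entries[key] = value
        if len(self.entries) > self.max_size:
            self.entries.popitem(last=False)  # evict the least recently used entry
        return value


def enum_facet_pass(cache, terms):
    """One enum-style facet request: one cache lookup per distinct term."""
    for term in terms:
        cache.get_or_compute(term, lambda t: frozenset())  # placeholder doc set


def second_request_hit_rate(cache_size, num_terms):
    """Cache hit rate of an identical, repeated facet request over num_terms values."""
    terms = [f"term{i}" for i in range(num_terms)]
    cache = BoundedLRUCache(cache_size)
    enum_facet_pass(cache, terms)  # first request: every lookup misses
    cache.hits = cache.misses = 0
    enum_facet_pass(cache, terms)  # second, identical request
    return cache.hits / num_terms
```

With the numbers from this thread and an assumed cache of 512 entries, `second_request_hit_rate(512, 300_000)` is 0.0, while `second_request_hit_rate(300_000, 300_000)` is 1.0. Under LRU, a repeated scan over more distinct values than the cache holds evicts every entry before it is reused. Solr's real cache implementations differ in detail, so treat this as an illustration of the effect, not a measurement.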

You say "Now we are getting an error". What's the error? I'm guessing OOM...

Faceting really wasn't built for very high cardinality fields. If this
is a reporting kind of thing, and you have the option of using 5.1
(coming Real Soon Now), you might get some use out of "streaming
aggregation", which is way cool. It's not going to give you
sub-second responses, though.

Best,
Erick

On Sun, Apr 5, 2015 at 2:59 PM, Toke Eskildsen <t...@statsbiblioteket.dk> wrote:
> Bill Bell <billnb...@gmail.com> wrote:
>> The limit is set to -1. But the average result is 300.
>
> Okay, better. Well, somewhat better. But unless your values are very well 
> distributed, I would guess that your worst case is very high. Have you 
> checked if your performance problems are for specific queries?
>
> One way is to look through your solr.log for high QTimes and see if that 
> correlates with large result sets. My guess (still assuming distributed 
> search) is that lines containing __terms (indicating the fine count phase of 
> distributed faceting) will have higher QTimes than the other queries.
>
>> Would creating 900 fields be better ?
>> Then I could just put the prefix in the field name.
>
> With fc, there is a constant overhead for each field that you facet on. 900 
> fields would take up much more memory than a single field with all the 
> values. I don't think that enum leaves structures in memory, but I doubt that 
> it would be better than using a single field and facet.prefix.
>
>> So far I have heard SolrCloud and DocValues suggested as viable solutions. Stay away from enum.
>
> SolrCloud is not a solution to faceting as such. There is a performance 
> penalty when switching from single-shard to SolrCloud, especially for the 
> fairly large facet result sets that you have. I just guessed that you were 
> using SolrCloud already.
>
> A quick test: Try setting facet.limit=10 and run some tests. If performance 
> is fine for that and you're using multiple shards, then your performance (at 
> least for faceting) would probably be a lot higher with just a single shard.
>
> - Toke Eskildsen
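Toke's suggestion to scan solr.log for high QTimes can be scripted. This is a minimal sketch that assumes each request line carries a `QTime=<ms>` token (check your own log format) and, following Toke, treats any line containing `__terms` as a fine-count (refinement) request. The function names are hypothetical, not part of any Solr tool.

```python
import re
from statistics import mean, median

# Assumption: request lines carry a "QTime=<milliseconds>" token.
QTIME_RE = re.compile(r"QTime=(\d+)")


def split_qtimes(lines):
    """Split QTimes into (refinement, other) using the "__terms" marker."""
    refinement, other = [], []
    for line in lines:
        match = QTIME_RE.search(line)
        if match is None:
            continue  # not a request line
        qtime = int(match.group(1))
        # Per the thread, "__terms" marks the fine-count phase of distributed faceting.
        (refinement if "__terms" in line else other).append(qtime)
    return refinement, other


def summarize(label, qtimes):
    if not qtimes:
        return f"{label}: no requests"
    return (f"{label}: n={len(qtimes)} mean={mean(qtimes):.1f}ms "
            f"median={median(qtimes)}ms max={max(qtimes)}ms")


def report(path="solr.log"):
    """Print a QTime summary for both groups of requests in one log file."""
    with open(path, encoding="utf-8", errors="replace") as log:
        refinement, other = split_qtimes(log)
    print(summarize("__terms (refinement)", refinement))
    print(summarize("other requests", other))
```

Calling `report("/path/to/solr.log")` prints one line per group. If Toke's guess holds, the `__terms` line should show clearly higher QTimes.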
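Toke's alternative to 900 fields, a single field plus facet.prefix, might look like this on the client side. This is a sketch under assumptions: the would-be field name is folded into each term as `prefix/value` at index time, and the field name `attr`, the `/` separator, and the helper names are all hypothetical. `facet.prefix`, `facet.limit`, `facet.mincount` and `rows` are standard Solr request parameters. The parser assumes the default flat JSON `facet_fields` list of alternating term and count.

```python
from urllib.parse import urlencode

SEP = "/"  # hypothetical separator between the folded prefix and the real value


def encode_value(prefix, value):
    """Index time: fold the would-be field name into the term itself."""
    return f"{prefix}{SEP}{value}"


def facet_prefix_params(field, prefix, limit=-1):
    """Query time: facet on one field, restricted to terms starting with prefix."""
    return {
        "q": "*:*",
        "rows": 0,                         # only the facet counts are wanted
        "facet": "true",
        "facet.field": field,
        "facet.prefix": f"{prefix}{SEP}",  # trailing SEP keeps p1 from matching p10
        "facet.limit": limit,              # -1 = unlimited, as in the thread
        "facet.mincount": 1,
        "wt": "json",
    }


def query_string(field, prefix, limit=-1):
    """URL-encoded query string for a /select request."""
    return urlencode(facet_prefix_params(field, prefix, limit))


def parse_counts(response, field, prefix):
    """Turn Solr's flat [term, count, term, count, ...] list into {value: count}."""
    flat = response["facet_counts"]["facet_fields"][field]
    head = f"{prefix}{SEP}"
    return {term[len(head):]: count
            for term, count in zip(flat[::2], flat[1::2])
            if term.startswith(head)}
```

For example, a document tagged `red` under the hypothetical prefix `p042` would be indexed as `p042/red`, and `query_string("attr", "p042")` requests only that prefix's counts. Only one field's fc structures have to be held in memory, which is the point Toke makes against 900 separate fields.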
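Toke's quick test (facet.limit=10 versus the current -1) could be run with a small script like the one below. It is a sketch assuming a Solr base URL such as `http://localhost:8983/solr/collection1` and a facet field named `attr`, both hypothetical. It reads the `QTime` that Solr reports in `responseHeader`, so network transfer time is excluded. `fetch` is injectable so the logic can be exercised without a live server.

```python
import json
from statistics import median
from urllib.parse import urlencode
from urllib.request import urlopen


def facet_url(base_url, field, limit, query="*:*"):
    """Build a /select URL that facets on one field with the given facet.limit."""
    params = {"q": query, "rows": 0, "facet": "true",
              "facet.field": field, "facet.limit": limit, "wt": "json"}
    return f"{base_url}/select?{urlencode(params)}"


def http_fetch(url):
    """Fetch and decode a Solr JSON response."""
    with urlopen(url) as resp:
        return json.load(resp)


def compare_limits(base_url, field, limits=(10, -1), runs=5, query="*:*",
                   fetch=http_fetch):
    """Median server-side QTime (ms) for each facet.limit value."""
    results = {}
    for limit in limits:
        url = facet_url(base_url, field, limit, query)
        qtimes = [fetch(url)["responseHeader"]["QTime"] for _ in range(runs)]
        results[limit] = median(qtimes)
    return results
```

Use queries that resemble your real traffic (`query=...`), since repeating one trivial query may mostly measure cache behavior. If the `10` result is far lower than the `-1` result on a multi-shard setup, that supports Toke's point that a single shard would likely facet faster.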
