Re: Facet

Toke Eskildsen Sun, 05 Apr 2015 01:57:41 -0700

William Bell <billnb...@gmail.com> wrote:
Sent: 05 April 2015 06:20
To: solr-user@lucene.apache.org
Subject: Facet


> We increased our number of terms (String) in a facet by 50,000.

Do you mean facet.limit=50000?

> Now we are getting an error when we facet by this field - so we switched it to
> facet.method=enum, and now the results come back. However, when we put
> it into production we literally hit a wall (CPU went to 100% for 16 cores)
> after about 30 minutes live.

It is strange that enum worked. Internally, the difference between
facet.limit=100 and facet.limit=50000 is quite small. The real performance hits
come from fine-counting within SolrCloud and from serializing the result to
deliver it to the client. I thought enum behaved the same as fc with regard to
those two.

> We tried adding more machines to reduce the CPU, but it did not help.

Sounds like SolrCloud. More machines do not help here; they might even make it
worse. Distributed faceting is two-phase, and the second phase is
fine-counting. The fine-counting essentially makes all shards perform
micro-searches for a large part of the terms returned: your shards are bogged
down by tens of thousands of small searches.
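The two phases can be illustrated with a toy Python model. It is a simplification, not Solr's code: each shard first returns its own top terms, the coordinator merges them, and then every shard is asked for the exact count of each candidate term it did not report. The over-request of `limit * 1.5 + 10` is an assumption chosen to resemble Solr's default behaviour, and the function name `distributed_facet` is hypothetical.

```python
from collections import Counter


def distributed_facet(shards, limit, ratio=1.5, extra=10):
    """Toy model of two-phase distributed faceting (not Solr's code).

    shards: one Counter(term -> count) per shard, already restricted
    to the documents matching the query on that shard.
    Returns (top `limit` terms with merged counts, refinement lookups).
    """
    # Assumed over-request: each shard returns more than `limit` terms.
    shard_limit = int(limit * ratio) + extra

    # Phase 1: each shard reports only its own top shard_limit terms.
    partial = [dict(s.most_common(shard_limit)) for s in shards]
    merged = Counter()
    for p in partial:
        merged.update(p)
    candidates = [term for term, _ in merged.most_common(limit)]

    # Phase 2 (fine-counting): every shard that did not report a
    # candidate term is asked for its exact count of that term.
    lookups = 0
    exact = Counter()
    for shard, p in zip(shards, partial):
        for term in candidates:
            if term in p:
                exact[term] += p[term]
            else:
                lookups += 1  # one micro-search on this shard
                exact[term] += shard[term]
    return exact.most_common(limit), lookups


# Four shards with no terms in common (hypothetical data).
shards = [Counter({f"s{i}_{j}": 1000 - j for j in range(1000)}) for i in range(4)]
top, lookups = distributed_facet(shards, limit=100)
print(lookups)  # 300
```

In this disjoint setup every candidate is missing from three of the four shards, so the refinement count is `limit * (shards - 1)`. By the same arithmetic, facet.limit=50000 on four such shards would mean 150,000 micro-searches in the toy model, and adding shards only adds more.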

If you are feeling adventurous, you can try putting
http://tokee.github.io/lucene-solr/
on a test installation (I am the author). It changes the way the fine-counting
is done.


Depending on your container, you might need to raise the internal limits for
GET requests. Tomcat has a default of 2MB somewhere (sorry, I don't remember
the details), which is not a lot for 50,000 values.
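To get a feel for the sizes involved, here is a rough Python estimate of how many bytes 50,000 terms take when sent as a single comma-separated, URL-encoded parameter value. That encoding is an assumption (Solr's actual request format may differ), the term values are hypothetical, and "2MB" is taken as 2 MiB.

```python
from urllib.parse import quote

TWO_MB = 2 * 1024 * 1024  # assumption: "2MB" read as 2 MiB


def term_list_bytes(terms):
    """Approximate size in bytes of `terms` sent as one comma-separated,
    URL-encoded parameter value (the rest of the request is ignored)."""
    return len(quote(",".join(terms), safe=""))


# Hypothetical term values; real facet terms may be shorter or longer.
short_terms = [f"product-{i:05d}" for i in range(50_000)]
long_terms = [
    f"manufacturer-name-and-product-identifier-{i:05d}" for i in range(50_000)
]

print(term_list_bytes(short_terms), term_list_bytes(short_terms) < TWO_MB)
# 799997 True
print(term_list_bytes(long_terms), term_list_bytes(long_terms) < TWO_MB)
# 2449997 False
```

Each encoded comma costs three bytes (`%2C`), so with 50,000 values the separators alone add about 150 KB, and moderately long terms are enough to pass 2 MB.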

> What are some ideas? We are going to try docValues on the field. Does
> anyone know if method=fc or method=enum works for docValue? I cannot find
> any documentation on that.

If DocValues are enabled, fc will use them. They do not change anything for
enum. But I would argue against enum for any field with thousands of unique
terms anyway.
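A toy Python model of the two counting strategies may show why enum scales badly with many unique terms. enum walks every term of the field and intersects that term's document set with the result set, so its cost grows with the number of unique terms. fc walks only the matching documents and increments a counter per term value, for example read from DocValues. This is a simplified sketch, not Solr's implementation; the function names are hypothetical.

```python
from collections import Counter


def count_enum(postings, result_docs):
    """enum-style: visit every term of the field and intersect its
    document set with the result set (one set operation per term)."""
    counts = {}
    for term, docs in postings.items():
        n = len(docs & result_docs)
        if n:
            counts[term] = n
    return counts


def count_fc(doc_terms, result_docs):
    """fc-style: visit only the matching documents and increment a
    counter for each term value the document holds."""
    counts = Counter()
    for doc in result_docs:
        counts.update(doc_terms.get(doc, ()))
    return dict(counts)


# Hypothetical field values per document id.
doc_terms = {0: ["red"], 1: ["blue", "red"], 2: ["green"], 3: ["blue"]}

# Build the inverted view (term -> set of document ids) that enum walks.
postings = {}
for doc, terms in doc_terms.items():
    for term in terms:
        postings.setdefault(term, set()).add(doc)

hits = {0, 1, 3}
print(count_enum(postings, hits))
print(count_fc(doc_terms, hits))
```

Both return the same counts. With tens of thousands of unique terms, though, enum performs tens of thousands of intersections per request regardless of how many documents matched.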

> We are thinking of splitting the field into 2 fields (fielda, fieldb). At
> least the number will be less, but not sure if it will help memory?

The killer is the number of terms requested/returned. Splitting the field into
two does not help if you still ask for the same total number of terms.

> The weird thing is for the first 30 minutes things are performing great.
> Literally at like 10% CPU across 16 cores, not much memory and normal GC.

It might be because you have simply been lucky so far. Take a look at
https://twitter.com/anjacks0n/status/509284768035262464
to see how much performance can vary with result set size.

> Originally the facet was a method=fc. Is there an issue with enum? We have
> facet.threads=20 set, and are not sure this is wise for an enum?

Facet threading does not parallelize the work within a single field; it only
means that multiple fields are processed in parallel. With a single facet
field, facet.threads=20 will not make that field any faster.
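A small Python sketch of that behaviour, as a toy model rather than Solr's code: each field becomes one task in a thread pool, and all of a field's counting happens inside that one task. The function name `facet_fields` and the sample documents are hypothetical.

```python
import threading
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def facet_fields(docs, fields, threads):
    """Count term values per field, one task per field.

    Returns ({field: {value: count}}, number of distinct worker threads used).
    """
    used = set()
    lock = threading.Lock()

    def count_one(field):
        with lock:
            used.add(threading.get_ident())
        # The whole field is counted inside this single task.
        return field, dict(Counter(d[field] for d in docs if field in d))

    with ThreadPoolExecutor(max_workers=threads) as pool:
        results = dict(pool.map(count_one, fields))
    return results, len(used)


docs = [{"color": "red", "size": "L"}, {"color": "blue", "size": "L"}, {"color": "red"}]
counts, workers = facet_fields(docs, ["color"], threads=20)
print(counts, workers)  # {'color': {'red': 2, 'blue': 1}} 1
```

With one field there is only one task, so 19 of the 20 workers stay idle; extra threads only pay off when several fields are faceted in the same request.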

- Toke Eskildsen
