Ok

Clarification

facet.limit is set to -1, but on average a request returns only 300 values.

The number of strings stored in the field increased a lot, from roughly 250k to 350k.
But the number coming back is limited by facet.prefix.
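
To make the current setup concrete, here is a minimal sketch of the request parameters I am describing. The field name "proc" and the prefix "ps122" are only illustrative assumptions; facet.limit, facet.prefix and facet.method are the standard Solr parameters mentioned in this thread.

```python
from urllib.parse import urlencode


def facet_prefix_params(field, prefix, method="fc"):
    """Build Solr query parameters for a prefix-restricted facet on one field."""
    return {
        "q": "*:*",               # match all documents
        "rows": 0,                # only facet counts are wanted, no documents
        "facet": "true",
        "facet.field": field,     # hypothetical field name
        "facet.prefix": prefix,   # only terms starting with this prefix are returned
        "facet.limit": -1,        # no cap on the number of returned terms
        "facet.method": method,   # fc or enum
    }


params = facet_prefix_params("proc", "ps122")
query_string = urlencode(params)
print(query_string)
```

With this shape every term matching the prefix comes back, so the response size depends on how many terms share that prefix, not on the 350k total.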

Would creating 900 fields be better? Then I could just put the prefix in the 
field name, like this: proc_ps122
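
A rough sketch of what I mean by the 900-field idea, assuming every value starts with a fixed-length prefix (5 characters here, which is an assumption for illustration) and that "proc" is the base field name. The helper names are hypothetical, not anything from Solr.

```python
def per_prefix_field(base, prefix):
    """Return the per-prefix field name, e.g. proc_ps122."""
    return f"{base}_{prefix}"


def split_by_prefix(values, prefix_len, base="proc"):
    """Group values into per-prefix fields for one document at index time."""
    doc = {}
    for value in values:
        field = per_prefix_field(base, value[:prefix_len])
        doc.setdefault(field, []).append(value)
    return doc


# Made-up values, only to show the resulting field layout.
doc_fields = split_by_prefix(["ps122abc", "ps122def", "ps900xyz"], prefix_len=5)
print(doc_fields)
```

At query time the request would then facet on proc_ps122 directly, with no facet.prefix.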

Thoughts?

So far I have heard SolrCloud and DocValues suggested as viable solutions, and to stay away from enum.

Bill Bell
Sent from mobile


> On Apr 5, 2015, at 2:56 AM, Toke Eskildsen <t...@statsbiblioteket.dk> wrote:
> 
> William Bell <billnb...@gmail.com> wrote:
> Sent: 05 April 2015 06:20
> To: solr-user@lucene.apache.org
> Subject: Facet
> 
>> We increased our number of terms (String) in a facet by 50,000.
> 
> Do you mean facet.limit=50000?
> 
>> Now we are getting an error when we facet by this field - so we switched it
>> to facet.method=enum, and now the results come back. However, when we put
>> it into production we literally hit a wall (CPU went to 100% for 16 cores)
>> after about 30 minutes live.
> 
> It was strange that enum worked. Internally, the difference between 
> facet.limit=100 and facet.limit=50000 is quite small. The real hits are for 
> fine-counting within SolrCloud and serializing the result in order to deliver 
> it to the client. I thought enum behaved the same as fc with regard to those 
> two.
> 
>> We tried adding more machines to reduce the CPU, but it did not help.
> 
> Sounds like SolrCloud. More machines do not help here; they might even make it 
> worse. What happens is that distributed faceting is two-phase, where the 
> second phase is fine-counting. The fine-counting essentially makes all shards 
> perform micro-searches for a large part of the terms returned: Your shards 
> are bogged down by tens of thousands of small searches.
> 
> If you are feeling adventurous, you can try putting
> http://tokee.github.io/lucene-solr/
> on a test-installation (I am the author). It changes the way the 
> fine-counting is done.
> 
> 
> Depending on your container, you might need to raise the internal limits for 
> GET-communication. Tomcat has a default of 2MB somewhere (sorry, don't 
> remember the details), which is not a lot for 50,000 values.
> 
>> What are some ideas? We are going to try docValues on the field. Does
>> anyone know if method=fc or method=enum works for docValue? I cannot find
>> any documentation on that.
> 
> If DocValues are enabled, fc will use them. It does not change anything for 
> enum. But I would argue against enum for anything in the thousands anyway.
> 
>> We are thinking of splitting the field into 2 fields (fielda, fieldb). At
>> least the number will be less, but not sure if it will help memory?
> 
> The killer is the number of terms requested/returned.
> 
>> The weird thing is for the first 30 minutes things are performing great.
>> Literally at like 10% CPU across 16 cores, not much memory and normal GC.
> 
> It might be because you have just been lucky. Take a look at
> https://twitter.com/anjacks0n/status/509284768035262464
> for how different performance can be for different result set sizes.
> 
>> Originally the facet used method=fc. Is there an issue with enum? We have
>> facet.threads=20 set, and are not sure this is wise for enum?
> 
> Facet threading does not thread within each field, it just means that 
> multiple fields are processed in parallel.
> 
> - Toke Eskildsen
