William Bell <billnb...@gmail.com> wrote:
Sent: 05 April 2015 06:20
To: solr-user@lucene.apache.org
Subject: Facet

> We increased our number of terms (String) in a facet by 50,000.

Do you mean facet.limit=50000?

> Now we are getting an error when we facet by this field - so we switched it
> to facet.method=enum, and now the results come back. However, when we put
> it into production we literally hit a wall (CPU went to 100% for 16 cores)
> after about 30 minutes live.

It is strange that enum worked. Internally, the difference between facet.limit=100 and facet.limit=50000 is quite small. The real performance hits come from fine-counting within SolrCloud and from serializing the result in order to deliver it to the client. I thought enum behaved the same as fc with regard to those two.

> We tried adding more machines to reduce the CPU, but it did not help.

Sounds like SolrCloud. More machines do not help here; they might even make it worse. What happens is that distributed faceting is two-phase, where the second phase is fine-counting. The fine-counting essentially makes all shards perform micro-searches for a large part of the terms returned: your shards are bogged down by tens of thousands of small searches.

If you are feeling adventurous, you can try putting http://tokee.github.io/lucene-solr/ on a test installation (I am the author). It changes the way the fine-counting is done.

Depending on your container, you might need to raise the internal limits for GET-communication. Tomcat has a default of 2MB somewhere (sorry, don't remember the details), which is not a lot for 50,000 values.

> What are some ideas? We are going to try docValues on the field. Does
> anyone know if method=fc or method=enum works for docValue? I cannot find
> any documentation on that.

If DocValues are enabled, fc will use them. It does not change anything for enum. But I would argue against enum for anything in the thousands anyway.

> We are thinking of splitting the field into 2 fields (fielda, fieldb). At
> least the number will be less, but not sure if it will help memory?

The killer is the number of terms requested/returned.

> The weird thing is for the first 30 minutes things are performing great.
> Literally at like 10% CPU across 16 cores, not much memory and normal GC.

It might be because you have just been lucky. Take a look at https://twitter.com/anjacks0n/status/509284768035262464 for how different performance can be for different result set sizes.

> Originally the facet was a method=fc. Is there an issue with enum? We have
> facet.threads=20 set, and not sure this is wise for a enum ?

Facet threading does not thread within each field, it just means that multiple fields are processed in parallel.

- Toke Eskildsen
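
For illustration only: the two-phase behaviour described above (each shard reports its top terms, then the fine-counting phase asks shards about terms they did not report) can be sketched as a toy model in Python. This is not Solr's implementation. The function name `distributed_facet`, the in-memory "shards" and the way lookups are counted are all assumptions made for the sketch; it only aims to show why the number of micro-searches grows with the number of terms requested.

```python
from collections import Counter


def distributed_facet(shards, limit):
    """Toy two-phase facet over per-shard term counts (hypothetical model).

    shards: list of Counter objects mapping term -> count on that shard.
    Returns (top_terms, refinement_lookups).
    """
    # Phase 1: every shard reports only its own top `limit` terms.
    phase1 = [dict(shard.most_common(limit)) for shard in shards]

    # The coordinator merges the reported terms into one candidate set.
    candidates = set()
    for partial in phase1:
        candidates.update(partial)

    # Phase 2 (fine-counting): for every candidate a shard did not report,
    # the coordinator asks that shard for the exact count.
    totals = Counter()
    lookups = 0
    for shard, partial in zip(shards, phase1):
        for term in candidates:
            if term in partial:
                totals[term] += partial[term]
            else:
                lookups += 1  # one "micro-search" against this shard
                totals[term] += shard[term]  # Counter returns 0 if absent
    return totals.most_common(limit), lookups


if __name__ == "__main__":
    # Two made-up shards with disjoint terms: the worst case for refinement.
    shard_a = Counter({f"a{i}": 100 - i for i in range(100)})
    shard_b = Counter({f"b{i}": 100 - i for i in range(100)})
    for limit in (10, 50, 100):
        _, lookups = distributed_facet([shard_a, shard_b], limit)
        print(limit, lookups)
```

In this toy setup with two disjoint shards, the lookup count is exactly 2 × limit, so it grows linearly with the number of terms requested. Adding more shards to the list multiplies the number of shards that have to answer such lookups, which is consistent with more machines not helping.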