OK, clarification:

The limit is set to -1, but the average result is about 300 terms. The number of strings stored in the field increased a lot, from roughly 250k to 350k, but the number coming back is limited by facet.prefix.

Would creating 900 fields be better? Then I could just put the prefix in the field name, like this: proc_ps122

Thoughts? So far the viable solutions I have heard are SolrCloud and docValues, and to stay away from enum.

Bill Bell
Sent from mobile

> On Apr 5, 2015, at 2:56 AM, Toke Eskildsen <t...@statsbiblioteket.dk> wrote:
>
> William Bell <billnb...@gmail.com> wrote:
> Sent: 05 April 2015 06:20
> To: solr-user@lucene.apache.org
> Subject: Facet
>
>> We increased our number of terms (String) in a facet by 50,000.
>
> Do you mean facet.limit=50000?
>
>> Now we are getting an error when we facet by this field - so we switched it to
>> facet.method=enum, and now the results come back. However, when we put
>> it into production we literally hit a wall (CPU went to 100% for 16 cores)
>> after about 30 minutes live.
>
> It was strange that enum worked. Internally, the difference between
> facet.limit=100 and facet.limit=50000 is quite small. The real costs are the
> fine-counting within SolrCloud and serializing the result in order to deliver
> it to the client. I thought enum behaved the same as fc with regard to those
> two.
>
>> We tried adding more machines to reduce the CPU, but it did not help.
>
> Sounds like SolrCloud. More machines do not help here; it might even make
> things worse. What happens is that distributed faceting is two-phase, where
> the second phase is fine-counting. The fine-counting essentially makes all
> shards perform micro-searches for a large part of the terms returned: your
> shards are bogged down by tens of thousands of small searches.
>
> If you are feeling adventurous, you can try putting
> http://tokee.github.io/lucene-solr/
> on a test installation (I am the author). It changes the way the
> fine-counting is done.
>
> Depending on your container, you might need to raise the internal limits for
> GET communication. Tomcat has a default of 2MB somewhere (sorry, I don't
> remember the details), which is not a lot for 50,000 values.
>
>> What are some ideas? We are going to try docValues on the field. Does
>> anyone know if method=fc or method=enum works for docValues? I cannot find
>> any documentation on that.
>
> If docValues are enabled, fc will use them. It does not change anything for
> enum. But I would argue against enum for anything in the thousands anyway.
>
>> We are thinking of splitting the field into 2 fields (fielda, fieldb). At
>> least the number will be less, but I am not sure it will help memory.
>
> The killer is the number of terms requested/returned.
>
>> The weird thing is that for the first 30 minutes things are performing great:
>> literally at about 10% CPU across 16 cores, not much memory and normal GC.
>
> It might be because you have just been lucky so far. Take a look at
> https://twitter.com/anjacks0n/status/509284768035262464
> for how different performance can be for different result set sizes.
>
>> Originally the facet used method=fc. Is there an issue with enum? We have
>> facet.threads=20 set, and I am not sure this is wise for an enum.
>
> Facet threading does not thread within each field; it just means that
> multiple fields are processed in parallel.
>
> - Toke Eskildsen
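
To make the 900-field idea concrete, here is roughly how the two query shapes would compare. The base field name "proc" is made up for illustration; only the proc_ps122 naming is from my earlier mail:

  # today: one large string field, trimmed per request with facet.prefix
  q=*:*&rows=0&facet=true&facet.field=proc&facet.prefix=ps122&facet.limit=-1

  # the idea: one field per prefix, so the prefix lives in the field name
  q=*:*&rows=0&facet=true&facet.field=proc_ps122&facet.limit=-1

Each per-prefix field would only hold a few hundred values, but the schema grows by 900 fields and indexing has to route every value into the right field.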
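
For the docValues test, what we are planning is just the standard flag on the field definition in schema.xml plus a full reindex. The field and type names below are placeholders, not our real schema; this is only a sketch:

  <field name="proc" type="string" indexed="true" stored="false" docValues="true"/>

On the query side we would stay with facet.method=fc, since (per Toke's note) fc is the method that actually picks docValues up and enum is unchanged.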
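
On the container limits Toke mentions: we run Tomcat, so the knob we would look at is the connector header size in server.xml, something like the sketch below. I am not certain 2 MB is the right number, or that maxHttpHeaderSize is the only relevant attribute on every Tomcat version, so treat the values as placeholders:

  <Connector port="8080" protocol="HTTP/1.1"
             connectionTimeout="20000"
             maxHttpHeaderSize="2097152"
             redirectPort="8443"/>

The point being that the inter-shard refinement requests can carry tens of thousands of terms in the URL, which can overflow the default request-line/header limit.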
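
Finally, on facet.threads: since it only parallelizes across facet fields, it only pays off when a single request facets on several fields, e.g. (field names illustrative):

  facet=true&facet.field=proc&facet.field=specialty&facet.field=city&facet.threads=3

With a single facet.field, our facet.threads=20 buys nothing.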
The limit is set to -1. But the average result is 300. The amount of strings stored in the field increased a lot. Like 250k to 350k. But the amount coming out is limited by facet.prefix. Would creating 900 fields be better ? Then I could just put the prefix in the field name. Like this: proc_ps122 Thoughts ? So far I heard solcloud, docvalues as viable solutions. Stay away from enum. Bill Bell Sent from mobile > On Apr 5, 2015, at 2:56 AM, Toke Eskildsen <t...@statsbiblioteket.dk> wrote: > > William Bell <billnb...@gmail.com> wrote: > Sent: 05 April 2015 06:20 > To: solr-user@lucene.apache.org > Subject: Facet > >> We increased our number of terms (String) in a facet by 50,000. > > Do you mean facet.limit=50000? > >> Now we are getting an error when we facet by this field - so we switched it >> to >> facet.method=enum, and now the results come back. However, when we put >> it into production we literally hit a wall (CPU went to 100% for 16 cores) >> after about 30 minutes live. > > It was strange that enum worked. Internally, the difference between > facet.limit=100 and facet.limit=50000 is quite small. The real hits are for > fine-counting within SolrCloud and serializing the result in order to deliver > it to the client. I thought enum behaved the same as fc with regard to those > two. > >> We tried adding more machines to reduce the CPU, but it did not help. > > Sounds like SolrCloud. More machines does not help here, it might even be > worse. What happens is that distributed faceting is two-phase, where the > second phase is fine-counting. The fine-counting essentially makes all shards > perform micro-searches for a large part of the terms returned: Your shards > are bogged down by tens of thousands of small searches. > > If you are feeling adventurous, you can try putting > http://tokee.github.io/lucene-solr/ > on a test-installation (I am the author). It changes the way the > fine-counting is done. > > > Depending on your container, you might need to raise the internal limits for > GET-communication. Tomcat has a default of 2MB somewhere (sorry, don't > remember the details), which is not a lot for 50,000 values. > >> What are some ideas? We are going to try docValues on the field. Does >> anyone know if method=fc or method=enum works for docValue? I cannot find >> any documentation on that. > > If DocValues are enabled, fc will use them. It does not change anything for > enum. But I would argue against enum for anything in the thousands anyway. > >> We are thinking of splitting the field into 2 fields (fielda, fieldb). At >> least the number will be less, but not sure if it will help memory? > > The killer is the number of terms requested/returned. > >> The weird thing is for the first 30 minutes things are performing great. >> Literally at like 10% CPU across 16 cores, not much memory and normal GC. > > It might be because you have just been lucky. Take a look at > https://twitter.com/anjacks0n/status/509284768035262464 > for how different performance can be for different result set sizes. > >> Originally the facet was a method=fc. Is there an issue with enum? We have >> facet.threads=20 set, and not sure this is wise for a enum ? > > Facet threading does not thread within each field, it just means that > multiple fields are processed in parallel. > > - Toke Eskildsen