On Thu, 2014-04-10 at 04:23 +0200, Damien Kamerman wrote:
> What I have found with Solr 4.6.0 to 4.7.1 is that memory usage
> continues to grow with facet queries.
It allocates (potentially significant) temporary structures, yes.

> Then I tried to determine a safe limit at which the search would work
> without breaking Solr. But what I found is that I can break Solr in
> the same way with one facet (with many distinct values) and one
> collection. By holding F5 (reload) in the browser for 10 seconds,
> memory usage continues to grow.
>
> e.g.
> http://localhost:8000/solr/collection/select?facet=true&facet.mincount=1&q=*:*&facet.threads=5&facet.field=id
>
> I realize that faceting on 'id' is extreme, but it seems to highlight
> the issue that memory usage continues to grow (leak?) with each new
> query until Solr eventually breaks.

Qualified guess: keyboard repeat kicks in and your browser breaks the
existing connections and establishes new ones very quickly. Each
faceting call allocates temporary memory. For standard searches the
amount is small, but faceting on a high-cardinality field like id is
more expensive: 4 bytes per unique value for the String field cache.
The overhead lives until the faceting call has been fully processed -
breaking the connection to Solr does not stop that.

You state that you have 17M+ documents in your index. That is 68MB+ of
temporary overhead for each call. Let's say your keyboard repeat is
about 50/second. That means 50 * 68MB+ ~= 3.4GB+ of temporary
structures in Solr while you hold F5.

I have recently learned that the Jetty provided with Solr is tweaked to
accept 1000 concurrent incoming requests (which in your case would
require 68GB+ of heap), so it will happily dispatch those 50 requests
to Solr. To avoid this, lower your maxThreads setting for Jetty to an
amount that can be handled with your heap size. The F5 test seems like
a very quick and easy way to determine whether it works: you should
start getting errors at the browser end instead of the Solr end.

> This does not happen with the 'old' method 'facet.method=enum' -
> memory usage is stable and Solr is unbreakable with my hold-reload
> test.
The memory allocation for enum is both low and independent of the
number of unique values in the facet. The trade-off is that it is very
slow for medium- to high-cardinality fields.

> This post
> http://shal.in/post/285908948/inside-solr-improvements-in-faceted-search-performance
> describes the new/current facet method and states
> "The structure is thrown away and re-created lazily on a commit.
> There might be a few concerns around the garbage accumulated by the
> (re)-creation of the many arrays needed for this structure. However,
> the performance gain is significant enough to warrant the trade-off."

I investigated the garbage issue as part of SOLR-5894 and found it to
be significant. See
https://sbdevel.wordpress.com/2014/04/04/sparse-facet-counting-without-the-downsides/
for some numbers. Solving that does not help with the temporary
allocation, though.

> The wiki http://wiki.apache.org/solr/SimpleFacetParameters#facet.method
> says the new/default method 'tends to use less memory'.

I do not agree with that part, but it is of course possible that I
have misunderstood something. fc allocates the array I described in
UnInvertedField.getCounts (look for counts = new int[...]).

> I use autoCommit (1min) on my collections - does that mean there's a
> one minute (or longer with no new docs) window where facet queries
> will effectively 'leak'?

It does worsen the problem, due to the resources used for warming the
facet structure.

- Toke Eskildsen, State and University Library, Denmark
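
PS: The arithmetic above as a runnable sketch. The 17M documents and
the 50 requests/second are the numbers from this thread; 4 bytes per
unique value is the counter size I referred to. This is my own
back-of-envelope helper, not Solr code:

```java
// Back-of-envelope estimate of temporary heap used by fc faceting
// on a high-cardinality field. Not Solr code - just the arithmetic
// from this thread made explicit.
public class FacetMemoryEstimate {

    // One int (4 bytes) per unique value in the faceted field.
    static long bytesPerCall(long uniqueValues) {
        return uniqueValues * 4L;
    }

    // Temporary heap while this many calls are in flight at once.
    static long bytesInFlight(long uniqueValues, int concurrentCalls) {
        return bytesPerCall(uniqueValues) * concurrentCalls;
    }

    public static void main(String[] args) {
        // Faceting on id: every document has its own unique value.
        long perCall = bytesPerCall(17_000_000L);
        long holdingF5 = bytesInFlight(17_000_000L, 50);
        System.out.println(perCall + " bytes/call, "
                + holdingF5 + " bytes while holding F5");
    }
}
```

With these inputs the sketch gives 68,000,000 bytes (~68MB) per call
and 3,400,000,000 bytes (~3.4GB) for 50 concurrent calls.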
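
For reference, the thread cap lives in etc/jetty.xml of the Solr
distribution. A sketch of the relevant fragment - the value 50 below
is purely illustrative (pick a number your heap can sustain), and the
exact element layout may differ between Jetty versions:

```xml
<!-- etc/jetty.xml: cap the number of concurrently handled requests.
     50 is an illustrative value, not a recommendation. -->
<Set name="ThreadPool">
  <New class="org.eclipse.jetty.util.thread.QueuedThreadPool">
    <Set name="minThreads">10</Set>
    <Set name="maxThreads">50</Set>
  </New>
</Set>
```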
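
To illustrate the allocation pattern I described for fc, here is a
stripped-down toy version of the counting step. This is my own
simplification modeled on the counts = new int[...] allocation in
UnInvertedField.getCounts; the real code is far more involved:

```java
// Toy sketch of fc-style facet counting. Each call allocates a fresh
// int[] sized by the number of unique terms in the field - this is
// the temporary structure that becomes garbage when the call ends.
public class FcFacetSketch {

    // ordinals[docId] = term ordinal of the faceted field for that doc.
    static int[] countFacets(int[] ordinals, int numUniqueTerms) {
        // Allocated per call: 4 bytes per unique term.
        int[] counts = new int[numUniqueTerms];
        for (int ord : ordinals) {
            counts[ord]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // 5 docs, 3 unique terms in the faceted field.
        int[] counts = countFacets(new int[]{0, 1, 1, 2, 1}, 3);
        System.out.println(counts[0] + " " + counts[1] + " " + counts[2]);
    }
}
```

When faceting on id, every document has its own term, so numUniqueTerms
equals the document count and the array alone is ~68MB for 17M docs.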