On Thu, 2014-04-10 at 04:23 +0200, Damien Kamerman wrote:
> What I have found with Solr 4.6.0 to 4.7.1 is that memory usage
> continues to grow with facet queries.
It allocates (potentially significant) temporary structures, yes.

> Then I tried to determine a safe limit at which the search would work
> without breaking Solr. But what I found is that I can break Solr in
> the same way with one facet (with many distinct values) and one
> collection. By holding F5 (reload) in the browser for 10 seconds,
> memory usage continues to grow.
>
> e.g.
> http://localhost:8000/solr/collection/select?facet=true&facet.mincount=1&q=*:*&facet.threads=5&facet.field=id
>
> I realize that faceting on 'id' is extreme, but it seems to highlight
> the issue that memory usage continues to grow (leak?) with each new
> query until Solr eventually breaks.

Qualified guess: keyboard repeat kicks in and your browser breaks the
existing connections and establishes new ones very quickly. Each
faceting call allocates temporary memory. For standard searches the
amount is small, but faceting on a high-cardinality field like id is
more expensive: 4 bytes per unique value for the String field cache.
The overhead lives until the faceting call has been fully processed -
breaking the connection to Solr does not stop that.

You state that you have 17M+ documents in your index. That is 68MB+ of
temporary overhead for each call. Let's say your keyboard repeat is
about 50/second. That means 50 * 68MB+ ~= 3.4GB+ of temporary
structures in Solr while you hold F5.

I have recently learned that the Jetty provided with Solr is tweaked to
accept 1000 concurrent incoming requests (which in your case would
require 68GB+ of heap), so it will happily dispatch those 50 requests
to Solr. To avoid this, lower your maxThreads setting for Jetty to an
amount that can be handled with your heap size. The F5 test seems like
a very quick and easy way to determine whether it works: you should
start getting errors at the browser end instead of the Solr end.

> This does not happen with the 'old' method 'facet.method=enum' -
> memory usage is stable and Solr is unbreakable with my hold-reload
> test.
The memory allocation for enum is both low and independent of the
number of unique values in the facet. The trade-off is that it is very
slow for medium- to high-cardinality fields.

> This post
> http://shal.in/post/285908948/inside-solr-improvements-in-faceted-search-performance
> describes the new/current facet method and states
> "The structure is thrown away and re-created lazily on a commit.
> There might be a few concerns around the garbage accumulated by the
> (re)-creation of the many arrays needed for this structure. However,
> the performance gain is significant enough to warrant the trade-off."

I investigated the garbage issue as part of SOLR-5894 and found it to
be significant. See
https://sbdevel.wordpress.com/2014/04/04/sparse-facet-counting-without-the-downsides/
for some numbers. Solving that does not help with the temporary
allocation, though.

> The wiki http://wiki.apache.org/solr/SimpleFacetParameters#facet.method
> says the new/default method 'tends to use less memory'.

I do not agree with that part, but it is of course possible that I
have misunderstood something. fc allocates the array I described in
UnInvertedField.getCounts (look for counts = new int[...]).

> I use autoCommit (1min) on my collections - does that mean there's a
> one minute (or longer with no new docs) window where facet queries
> will effectively 'leak'?

It does worsen the problem, due to the resources used for warming the
facet structure.

- Toke Eskildsen, State and University Library, Denmark
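
PS: The arithmetic above as a runnable sketch. The 17M documents and
the 50 requests/second are the numbers from this thread; 4 bytes per
unique value is the counter size I referred to. This is my own
back-of-envelope helper, not Solr code:

```java
// Back-of-envelope estimate of temporary heap used by fc faceting
// on a high-cardinality field. Not Solr code - just the arithmetic
// from this thread made explicit.
public class FacetMemoryEstimate {

    // One int (4 bytes) per unique value in the faceted field.
    static long bytesPerCall(long uniqueValues) {
        return uniqueValues * 4L;
    }

    // Temporary heap while this many calls are in flight at once.
    static long bytesInFlight(long uniqueValues, int concurrentCalls) {
        return bytesPerCall(uniqueValues) * concurrentCalls;
    }

    public static void main(String[] args) {
        // Faceting on id: every document has its own unique value.
        long perCall = bytesPerCall(17_000_000L);
        long holdingF5 = bytesInFlight(17_000_000L, 50);
        System.out.println(perCall + " bytes/call, "
                + holdingF5 + " bytes while holding F5");
    }
}
```

With these inputs the sketch gives 68,000,000 bytes (~68MB) per call
and 3,400,000,000 bytes (~3.4GB) for 50 concurrent calls.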
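
For reference, the thread cap lives in etc/jetty.xml of the Solr
distribution. A sketch of the relevant fragment - the value 50 below
is purely illustrative (pick a number your heap can sustain), and the
exact element layout may differ between Jetty versions:

```xml
<!-- etc/jetty.xml: cap the number of concurrently handled requests.
     50 is an illustrative value, not a recommendation. -->
<Set name="ThreadPool">
  <New class="org.eclipse.jetty.util.thread.QueuedThreadPool">
    <Set name="minThreads">10</Set>
    <Set name="maxThreads">50</Set>
  </New>
</Set>
```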
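
To illustrate the allocation pattern I described for fc, here is a
stripped-down toy version of the counting step. This is my own
simplification modeled on the counts = new int[...] allocation in
UnInvertedField.getCounts; the real code is far more involved:

```java
// Toy sketch of fc-style facet counting. Each call allocates a fresh
// int[] sized by the number of unique terms in the field - this is
// the temporary structure that becomes garbage when the call ends.
public class FcFacetSketch {

    // ordinals[docId] = term ordinal of the faceted field for that doc.
    static int[] countFacets(int[] ordinals, int numUniqueTerms) {
        // Allocated per call: 4 bytes per unique term.
        int[] counts = new int[numUniqueTerms];
        for (int ord : ordinals) {
            counts[ord]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // 5 docs, 3 unique terms in the faceted field.
        int[] counts = countFacets(new int[]{0, 1, 1, 2, 1}, 3);
        System.out.println(counts[0] + " " + counts[1] + " " + counts[2]);
    }
}
```

When faceting on id, every document has its own term, so numUniqueTerms
equals the document count and the array alone is ~68MB for 17M docs.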