Mohsin Beg Beg [mohsin....@oracle.com] wrote:
> I am getting OOM when faceting on numFound=28. The receiving
> solr node throws the OutOfMemoryError even though there is 7gb
> available heap before the faceting request was submitted.

fc and fcs faceting memory overhead is (nearly) independent of the number of 
hits in the search result. 

> If a different solr node is selected that one fails too. Any suggestions ?

> &facet.field=field1....field15
> &f.field1...field15.facet.method=fc/fcs
> &collection=Collection1...Collection100

You seem to be issuing a facet request for 15 fields across 100 collections 
concurrently. The memory overhead will be linear in the number of documents, 
the number of references from documents to field values, and the number of 
unique values in each facet, independently for each facet field.

That was confusing. Let me try an example instead:

For each field, the static memory requirement is a structure that maps 
documents to term ordinals. Depending on circumstances, this can be small 
(DocValues and a numeric field) or big (multi-value, non-DocValues String). Each 
concurrent call will temporarily allocate a structure for counting. If the 
field is numeric, this will be a hashmap. If it is String, it will be an 
integer array with as many entries as there are unique values: if there are 1M 
unique String values in the field, the overhead will be 4 bytes * 1M = 4MB.
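The counter sizing above can be written as a tiny sketch (the 4-bytes-per-unique-value figure is from the example; the function name is just illustrative):

```python
# Sketch: temporary counting overhead for one String facet field (fc/fcs).
# Assumes the integer-array counter described above: 4 bytes per unique value.
BYTES_PER_COUNTER = 4

def counter_overhead_bytes(unique_values: int) -> int:
    """Temporary allocation for one concurrent facet call on one field."""
    return BYTES_PER_COUNTER * unique_values

# 1M unique String values -> 4 bytes * 1M = 4MB
print(counter_overhead_bytes(1_000_000))  # 4000000
```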

So, if each field has 250K unique String values, the temporary overhead for all 
15 fields will be 15MB. I don't know if the request for multiple collections is 
threaded, but if so, the 15MB should be multiplied by 100, totalling 1.5GB 
memory overhead for each call. Add the static structures and it does not seem 
unreasonable that you run out of memory.
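As a back-of-envelope check, the multiplication works out like this (all numbers are the illustrative ones from above, not measurements):

```python
# Estimate from the example: 15 String fields with 250K unique values each,
# queried across 100 collections in parallel, 4 bytes per counter as above.
fields = 15
unique_values_per_field = 250_000
collections = 100
bytes_per_counter = 4

per_field = bytes_per_counter * unique_values_per_field   # 1MB per field
all_fields = per_field * fields                           # 15MB per collection
if_threaded = all_fields * collections                    # ~1.5GB if concurrent
print(per_field, all_fields, if_threaded)
```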

All this is very loose, but the overall message is that documents, unique facet 
values, facets and collections all multiply memory requirements.

* Do you need to query all collections at once?
* Can you collapse some of the facet fields, to reduce the total number?
* Are some of the fields very small? If so, use enum for them instead of fc/fcs.
* Maybe you can determine your limits by issuing requests first for 1 field, 
then 2 etc. This is to see if it is feasible to do minor tweaks to get it to 
work, or if your setup is so large that something else entirely needs to be done.
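The incremental probing in the last point could look something like this; the host, collection and field names are placeholders (not from your setup), and only the request URLs are built here, nothing is sent:

```python
from urllib.parse import urlencode

# Sketch: build facet requests for 1 field, then 2, etc., to find the limit.
# Host, collection and field names below are placeholders.
BASE = "http://localhost:8983/solr/Collection1/select"
FIELDS = ["field1", "field2", "field3"]  # grow this list one field at a time

def facet_url(fields):
    params = [("q", "*:*"), ("rows", "0"), ("facet", "true")]
    for f in fields:
        params.append(("facet.field", f))
        params.append((f"f.{f}.facet.method", "fc"))
    return BASE + "?" + urlencode(params)

for i in range(1, len(FIELDS) + 1):
    print(facet_url(FIELDS[:i]))
```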

- Toke Eskildsen
