Hi all,

I'm converting my legacy facets to JSON facets and am seeing much better 
performance, especially with high-cardinality facet fields. However, the one
issue I can't seem to resolve is excessive memory usage (and OOM errors) when 
trying to simulate the effect of "group.facet" to sort facets according to a 
grouping field.

My situation, slightly simplified, is:

Solr 4.6.1

  *   Doc set: ~200,000 docs
  *   Grouping by item_id, an indexed, stored, single-valued string field with 
~50,000 unique values (~4 docs per item)
  *   Faceting by person_id, an indexed, stored, multi-valued string field with 
~50,000 values (with a very skewed distribution)
  *   No docValues fields

Each document here is a description of an item, and there are several 
descriptions per item in multiple languages.

With legacy facets I use group.field=item_id and group.facet=true, which gives 
me facet counts that reflect the number of items rather than the number of 
descriptions, correctly sorted by descending item count.
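
For reference, the legacy request uses roughly these parameters (the query 
itself and any other parameters elided):

&group=true
&group.field=item_id
&group.facet=true
&facet=true
&facet.field=person_id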

With JSON facets I'm doing the equivalent like so:

&json.facet={
    "people": {
        "type": "terms",
        "field": "person_id",
        "facet": {
            "grouped_count": "unique(item_id)"
        },
        "sort": "grouped_count desc"
    }
}

This works, and is somewhat faster than legacy faceting, but it also produces a 
massive spike in memory usage when (and only when) the sort refers to the 
aggregate ("grouped_count desc"). A server that runs happily with a 512MB heap 
OOMs unless I give it a 4GB heap. With sort set to the default ("count desc") 
there is no memory usage spike.
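
For completeness, the actual request looks essentially like this (the 
collection name and query are placeholders, and I've trimmed other parameters):

curl 'http://localhost:8983/solr/my_collection/select' \
  --data-urlencode 'q=*:*' \
  --data-urlencode 'rows=0' \
  --data-urlencode 'json.facet={
    "people": {
      "type": "terms",
      "field": "person_id",
      "facet": { "grouped_count": "unique(item_id)" },
      "sort": "grouped_count desc"
    }
  }'

Changing "grouped_count desc" to "count desc" in that request is the only 
difference between the spiking and non-spiking cases.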

I'd be curious whether anyone else has seen this kind of memory usage when 
sorting JSON facets by a stat, and whether there's anything I can do to 
mitigate it. I've tried reindexing with docValues enabled on the relevant 
fields, but it seems to make no difference in this respect.
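
For what it's worth, the docValues attempt amounted to field definitions along 
these lines (a simplified sketch; types and other attributes as in my real 
schema):

<!-- sketch only: docValues enabled on the grouping and faceting fields -->
<field name="item_id" type="string" indexed="true" stored="true" docValues="true"/>
<field name="person_id" type="string" indexed="true" stored="true" multiValued="true" docValues="true"/>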

Many thanks,
~Mike
