Hi all, I'm converting my legacy facets to JSON facets and am seeing much better performance, especially with high cardinality facet fields. However, the one issue I can't seem to resolve is excessive memory usage (and OOM errors) when trying to simulate the effect of "group.facet" to sort facets according to a grouping field.
My situation, slightly simplified, is:

* Solr 4.6.1
* Doc set: ~200,000 docs
* Grouping by item_id, an indexed, stored, single-valued string field with ~50,000 unique values (~4 docs per item)
* Faceting by person_id, an indexed, stored, multi-valued string field with ~50,000 values (with a very skewed distribution)
* No docValues fields

Each document here is a description of an item, and there are several descriptions per item in multiple languages. With legacy facets I use group.field=item_id and group.facet=true, which gives me facet counts in terms of items rather than descriptions, correctly sorted by descending item count.

With JSON facets I'm doing the equivalent like so:

&json.facet={
  "people": {
    "type": "terms",
    "field": "person_id",
    "facet": {
      "grouped_count": "unique(item_id)"
    },
    "sort": "grouped_count desc"
  }
}

This works, and is somewhat faster than legacy faceting, but it also produces a massive spike in memory usage when (and only when) the sort parameter is set to the aggregate field. A server that runs happily with a 512MB heap OOMs unless I give it a 4GB heap. With sort left at the default "count desc" there is no memory spike.

I'd be curious whether anyone else has seen this kind of memory usage when sorting JSON facets by a stat, and whether there's anything I can do to mitigate it. I've tried reindexing with docValues enabled on the relevant fields, and it made no difference in this respect.

Many thanks,
~Mike
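P.S. In case it helps anyone reproduce this, here are the two requests roughly as I'm issuing them, sketched with curl. The host, port, and collection name ("mycoll") are placeholders for my setup, and q=*:*/rows=0 just isolates the faceting cost:

```shell
# Legacy grouped faceting: counts per person_id, grouped by item_id
curl 'http://localhost:8983/solr/mycoll/select' \
  -d 'q=*:*' -d 'rows=0' \
  -d 'facet=true' -d 'facet.field=person_id' \
  -d 'group=true' -d 'group.field=item_id' -d 'group.facet=true'

# JSON facet equivalent: unique(item_id) per person_id bucket,
# sorted by that aggregate (this is the variant that spikes memory)
curl 'http://localhost:8983/solr/mycoll/select' \
  -d 'q=*:*' -d 'rows=0' \
  --data-urlencode 'json.facet={
    "people": {
      "type": "terms",
      "field": "person_id",
      "facet": { "grouped_count": "unique(item_id)" },
      "sort": "grouped_count desc"
    }
  }'
```

Switching "sort" to "count desc" in the second request is the only change I make to get back to flat memory usage.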