Mohsin Beg Beg [mohsin....@oracle.com] wrote:

> Looking at SimpleFacets.java, doesn't fc/fcs iterate only over the DocSet for
> the fields.
To get the seed for the concrete faceting resolving, yes. That still leaves the mapping and the counting structures.

> So assuming each field has a unique term across the 28 rows, a max of 28 * 15
> unique small strings (<100bytes), should be in the order of 1MB.
> For 100 collections, lets say a total of 1GB. Now lets say I multiply it by 3
> to 3GB.

My explanation must have been unclear. You are still operating under the assumption that fc/fcs memory consumption is tied to the size of the search result. That is not the case. What you are describing sounds more like enum.

> That still leaves more that 4GB heap used by something else to run out
> memory? Who and why?

*sigh* The why is quite simple: fc/fcs is designed to deliver fast faceting for fields with non-trivial cardinality. The cost is memory overhead and delayed startup. Both are mitigated, but not removed, by using DocValues.

> 2. "hierarchical" field doesn't work when selecting 15 fields (out of 300+)
> since
> there is no way to give a one hierarchical path in fq or via facet.prefix
> value.

Sadly, Solr does not yet support under-the-hood collapsing of facets.

> 3. One facet-at-time exceeds the total latency requirements of the app on top

But you can run one facet at a time? If so, what about 2? 3? What is your current limit?

> Am I stuck ?

Not yet. Are you currently using DocValues? Could you describe your facet fields a bit more? Type? Cardinality? Maximum count for any tag?

If you have high cardinality (1M+) for some fields and a low (< 65000) maximum count, http://tokee.github.io/lucene-solr/ could help you by lowering memory usage.

If you can accept imprecise counts, you could speed up the faceting process substantially by doing single-phase distributed faceting, and maybe get satisfactory performance by requesting the facet results one at a time.

> ps: Doesn't enum build the uninverted index for each unique term in the field
> and then intersect
> with the DocSet to return the facet counts?
Not an uninverted index as such, but yes, it gets the docIDs for each term and intersects them with the docIDs from the query result.

> This causes filterCache entries to be bloated in each core. That causes OOM
> on just 4 or 5 string fields (depending on their cardinality).

Set the filterCache size lower? But if your cardinality is in the thousands or higher, enum is unlikely to give you proper response times.

- Toke Eskildsen
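
To make the enum description above concrete, here is a minimal Python sketch of term-by-term intersection. It is not Solr's code: the function names, the toy documents and the use of plain Python sets in place of Solr's doc sets are all assumptions made for illustration.

```python
from collections import defaultdict


def build_postings(docs, field):
    """Map each term of `field` to the set of docIDs that contain it.

    Hypothetical helper: stands in for the per-term docID lists the
    index already holds.
    """
    postings = defaultdict(set)
    for doc_id, doc in enumerate(docs):
        for term in doc.get(field, []):
            postings[term].add(doc_id)
    return postings


def enum_facet(postings, result_doc_ids):
    """Count, for every term, how many docs in the query result contain it.

    One set intersection is performed per unique term, so the work grows
    with the cardinality of the field, not with the size of the result.
    """
    counts = {}
    for term, term_doc_ids in postings.items():
        n = len(term_doc_ids & result_doc_ids)
        if n > 0:
            counts[term] = n
    return counts


# Toy data: four documents with a multi-valued "color" field.
docs = [
    {"color": ["red"]},
    {"color": ["red", "blue"]},
    {"color": ["green"]},
    {"color": ["blue"]},
]
postings = build_postings(docs, "color")

# docIDs matched by some query (assumed for the example).
result = {0, 1, 3}
print(enum_facet(postings, result))  # {'red': 2, 'blue': 2}
```

The loop touches every unique term once per request, which is why this approach gets slow as cardinality grows, and why keeping a doc set per term around in a cache gets expensive.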