Mohsin Beg Beg [mohsin....@oracle.com] wrote:

> Looking at SimpleFacets.java, doesn't fc/fcs iterate only over the DocSet for 
> the fields?

Yes, but only to get the seed for the actual facet resolving. That still leaves 
the mapping and counting structures.

> So assuming each field has a unique term across the 28 rows, a max of 28 * 15
> unique small strings (<100 bytes) should be on the order of 1MB.
> For 100 collections, let's say a total of 1GB. Now let's say I multiply it by 3 
> to 3GB.

My explanation must have been unclear. You are still operating under the 
assumption that fc/fcs memory consumption is tied to the size of the search 
result. That is not the case. What you are describing sounds more like enum.

> That still leaves more than 4GB of heap used by something else to run out of 
> memory? Who and why? *sigh*

The why is quite simple: fc/fcs is designed to deliver fast faceting for fields 
with non-trivial cardinality. The cost is memory overhead and delayed startup. 
Both are mitigated, but not removed, by using DocValues.
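
To illustrate why the memory is not tied to the size of the search result, here 
is a minimal sketch of fc-style counting. It is not Solr's actual code: the 
class, the method names and the plain int arrays are assumptions made for 
illustration, and it assumes a single-valued field. The point is that the 
doc-to-ordinal map is sized by the number of documents in the index and the 
counters by the number of unique terms, regardless of how many documents match.

```java
class FcFacetSketch {
    // docToOrd: one term ordinal per document in the whole index (-1 = no value).
    // cardinality: number of unique terms in the field.
    // hitDocIds: the docIDs matching the query (the "seed").
    static int[] countByOrdinal(int[] docToOrd, int cardinality, int[] hitDocIds) {
        int[] counts = new int[cardinality];   // sized by cardinality, not by hits
        for (int docId : hitDocIds) {
            int ord = docToOrd[docId];
            if (ord >= 0) {
                counts[ord]++;
            }
        }
        return counts;
    }

    // Rough size of the two structures with plain 4-byte ints (an assumption;
    // packed representations would be smaller). Note that the number of hits
    // is not a parameter.
    static long approxBytes(int maxDoc, int cardinality) {
        return 4L * maxDoc + 4L * cardinality;
    }

    public static void main(String[] args) {
        int[] docToOrd = {0, 1, 1, 2, -1};   // 5 documents, 3 unique terms
        int[] hits = {1, 2, 4};              // a small search result
        int[] counts = countByOrdinal(docToOrd, 3, hits);
        System.out.println(java.util.Arrays.toString(counts));
        System.out.println(approxBytes(docToOrd.length, 3) + " bytes");
    }
}
```

Even a result with only 28 hits needs the full map and counter array in this 
model, which is why the estimate based on the result size comes out too low.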

> 2. "hierarchical" field doesn't work when selecting 15 fields (out of 300+) 
> since there is no way to give a one hierarchical path in fq or via 
> facet.prefix value.

Sadly Solr does not yet support under-the-hood collapsing of facets.

> 3. One facet-at-time exceeds the total latency requirements of the app on top

But you can run one facet at a time? If so, what about 2? 3? What is your 
current limit?

> Am I stuck ?

Not yet.

Are you currently using DocValues?

Could you describe your facet fields a bit more? Type? Cardinality? Maximum 
count for any tag? If you have high cardinality (1M+) for some fields and a low 
(< 65000) maximum count, http://tokee.github.io/lucene-solr/ could help you by 
lowering memory usage.

If you can accept imprecise counts, you could speed up the faceting process 
substantially by doing single-phase distributed faceting, and maybe get 
satisfactory performance by requesting the facet results one at a time.

> ps: Doesn't enum build the uninverted index for each unique term in the field 
> and then intersect with the DocSet to return the facet counts?

Not an uninverted index as such, but yes, it gets the docIDs for each term and 
intersects with the query result docIDs.
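
As a rough sketch of that enum-style intersection (again not Solr's 
implementation; the class name and the use of java.util.BitSet for the docID 
sets are assumptions for illustration), the work is one set intersection per 
unique term:

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;

class EnumFacetSketch {
    // termDocs: for each term, the set of docIDs containing that term.
    // queryDocs: the docIDs matching the query.
    static Map<String, Integer> count(Map<String, BitSet> termDocs, BitSet queryDocs) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Map.Entry<String, BitSet> e : termDocs.entrySet()) {
            BitSet hits = (BitSet) e.getValue().clone(); // keep the term's set intact
            hits.and(queryDocs);                         // intersect with the result
            counts.put(e.getKey(), hits.cardinality());
        }
        return counts;
    }

    static BitSet docs(int... ids) {
        BitSet set = new BitSet();
        for (int id : ids) {
            set.set(id);
        }
        return set;
    }

    public static void main(String[] args) {
        Map<String, BitSet> termDocs = new LinkedHashMap<>();
        termDocs.put("red", docs(0, 2, 5));
        termDocs.put("blue", docs(1, 2));
        System.out.println(count(termDocs, docs(2, 5, 7)));
    }
}
```

The loop runs once per unique term, so the cost grows with the cardinality of 
the field rather than with the number of hits.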

> This causes filterCache entries to be bloated in each core. That causes OOM
> on just 4 or 5 string fields (depending on their cardinality).

Set the filterCache size lower? But if your cardinality is in the thousands or 
higher, enum is unlikely to give you proper response times.
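
For a back-of-the-envelope feel for the cache pressure, here is a worst-case 
estimate. It assumes every cached entry is stored as a bitset with one bit per 
document in the core; sparse sets can be stored more compactly, so real usage 
may be lower. The class name and the numbers in main are hypothetical and not 
taken from your setup.

```java
class FilterCacheEstimate {
    // Assumption: each cached entry costs one bit per document in the core.
    static long worstCaseBytes(long maxDoc, long cachedEntries) {
        long bytesPerEntry = (maxDoc + 7) / 8;   // round up to whole bytes
        return cachedEntries * bytesPerEntry;
    }

    public static void main(String[] args) {
        // Hypothetical core: 10M documents, 512 cached entries.
        System.out.println(worstCaseBytes(10_000_000L, 512L) + " bytes");
    }
}
```

Multiply the result by the number of cores sharing the heap to see how quickly 
a generous cache size adds up.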

- Toke Eskildsen
