RE: (Issue) How improve solr facet performance

Toke Eskildsen Sat, 24 May 2014 00:18:06 -0700

Alice.H.Yang (mis.cnsh04.Newegg) 41493 [alice.h.y...@newegg.com] wrote:
> 1.  I'm sorry, I have made a mistake, the total number of documents is 32 
> Million, not 320 Million.
> 2.  The system memory is large for solr index, OS total has 256G, I set the 
> solr tomcat HEAPSIZE="-Xms25G -Xmx100G"


100G is a very high number. What special requirements dictate such a large 
heap size?

> Reply:  9 fields I facet on.

Solr processes each facet field separately. With facet.method=fc and 10M hits, 
this means that it iterates 9*10M = 90M document IDs and updates the counters 
for those.

> Reply:  3 facet fields have one hundred unique values, other 6 facet fields' 
> unique values are between 3 to 15.

So very low cardinality. This is confirmed by your low response time of 6ms for 
2925 hits.

> And we test this scenario:  If the number of facet fields' unique values is 
> less we add facet.method=enum, there is a little to improve performance.

That is a shame: enum is normally the simple answer to a setup like yours. Have 
you tried fine-tuning your fc/enum selection, so that the 3 fields with around 
a hundred values use fc and the rest use enum? That might halve your response 
time.
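
To illustrate the per-field selection: as far as I know, Solr accepts 
per-field overrides of the form f.<fieldname>.facet.method. Below is a minimal 
sketch that only builds the query-string part for such a request. The class 
name, the helper and all field names are made up for illustration and are not 
taken from your schema.

```java
import java.util.StringJoiner;

public class FacetMethodParams {

    /**
     * Builds the faceting part of a Solr query string, requesting
     * facet.method=fc for some fields and facet.method=enum for the rest.
     * Field names are assumed to need no URL-encoding.
     */
    static String buildFacetQuery(String[] fcFields, String[] enumFields) {
        StringJoiner params = new StringJoiner("&");
        params.add("facet=true");
        for (String field : fcFields) {
            params.add("facet.field=" + field);
            params.add("f." + field + ".facet.method=fc");
        }
        for (String field : enumFields) {
            params.add("facet.field=" + field);
            params.add("f." + field + ".facet.method=enum");
        }
        return params.toString();
    }

    public static void main(String[] args) {
        // Hypothetical field names: 3 larger fields on fc, 6 tiny ones on enum.
        String[] fcFields = {"brand", "category", "seller"};
        String[] enumFields = {"color", "size", "rating", "condition", "shipping", "stock"};
        System.out.println(buildFacetQuery(fcFields, enumFields));
    }
}
```

Whether this split actually helps depends on your data, so measure both 
variants against the same queries.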


Since the number of unique facet values is so low, I do not think that 
DocValues can help you here. Besides the fine-grained fc/enum selection above, 
you could try collapsing all 9 facet fields into a single field. The idea 
behind this is that for facet.method=fc, faceting on a field with (for 
example) 300 unique values takes practically the same amount of time as 
faceting on a field with 1000 unique values: faceting on a single slightly 
larger field is much faster than faceting on 9 smaller fields. After faceting 
with facet.limit=-1 on the single super-facet field, you must match the 
returned values back to their original fields:


If you have the facet-fields

field0: 34
field1: 187
field2: 78432
field3: 3
...

then collapse them by OR-ing each value with a field-specific mask whose bits 
lie above the largest value in any field, and put it all into a single field:

fieldAll: 0xA0000000 | 34
fieldAll: 0xA1000000 | 187
fieldAll: 0xA2000000 | 78432
fieldAll: 0xA3000000 | 3
...

perform the facet request on fieldAll with facet.limit=-1 and split the 
resulting counts with

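for (entry: facetResultAll) {
  // Strip the field mask to recover the original value
  original = 0x00FFFFFF & entry.value;
  // The top byte identifies the source field
  switch (0xFF000000 & entry.value) {
    case 0xA0000000:
      field0.add(original, entry.count);
      break;
    case 0xA1000000:
      field1.add(original, entry.count);
      break;
...
  }
}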
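
For completeness, here is a self-contained sketch of the whole round trip: 
encoding at index time and splitting at query time. It assumes the facet 
values are non-negative integers that fit in 24 bits and that the top byte 
carries the field tag, starting at 0xA0 for field0 as in the masks above. The 
class and method names are made up, and the counts in main() are invented for 
the demonstration. In a real setup the facet values come back from Solr as 
strings and would need parsing first.

```java
import java.util.Map;
import java.util.TreeMap;

public class SuperFacet {
    static final int TAG_BASE = 0xA0;          // tag of field0, as in the masks above
    static final int TAG_SHIFT = 24;           // the tag lives in the top byte
    static final int VALUE_MASK = 0x00FFFFFF;  // the original value lives in the low 24 bits

    /** Index time: combine a field index and a value into one fieldAll value. */
    static int encode(int fieldIndex, int value) {
        if (fieldIndex < 0 || TAG_BASE + fieldIndex > 0xFF) {
            throw new IllegalArgumentException("field index out of range: " + fieldIndex);
        }
        if (value < 0 || value > VALUE_MASK) {
            throw new IllegalArgumentException("value does not fit in 24 bits: " + value);
        }
        return ((TAG_BASE + fieldIndex) << TAG_SHIFT) | value;
    }

    /** Recovers the field index from the top byte. */
    static int fieldIndexOf(int encoded) {
        return (encoded >>> TAG_SHIFT) - TAG_BASE;
    }

    /** Recovers the original value by stripping the tag. */
    static int valueOf(int encoded) {
        return encoded & VALUE_MASK;
    }

    /** Query time: split fieldAll counts into per-field (value -> count) maps. */
    static Map<Integer, Map<Integer, Long>> split(Map<Integer, Long> fieldAllCounts) {
        Map<Integer, Map<Integer, Long>> perField = new TreeMap<>();
        for (Map.Entry<Integer, Long> e : fieldAllCounts.entrySet()) {
            perField.computeIfAbsent(fieldIndexOf(e.getKey()), k -> new TreeMap<>())
                    .put(valueOf(e.getKey()), e.getValue());
        }
        return perField;
    }

    public static void main(String[] args) {
        // Invented counts standing in for a facet response on fieldAll.
        Map<Integer, Long> fieldAllCounts = new TreeMap<>();
        fieldAllCounts.put(encode(0, 34), 1200L);
        fieldAllCounts.put(encode(1, 187), 45L);
        fieldAllCounts.put(encode(2, 78432), 7L);
        fieldAllCounts.put(encode(3, 3), 980L);
        System.out.println(split(fieldAllCounts));
    }
}
```

The range checks in encode() guard against a value spilling into the tag 
byte, which would silently assign its count to the wrong field.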


Regards,
Toke Eskildsen, State and University Library, Denmark
