Hi Toke,

1. I set the 3 fields with hundreds of unique values to use fc and the rest to use enum; the performance improved about 2 times compared with no parameter. Then I added facet.threads=20, and the performance improved about 4 times compared with no parameter. I also tried copying all 9 facet fields into one copyField and tested that: the performance improved about 2.5 times compared with no parameter. So it has improved a lot thanks to your advice.
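For reference, the request corresponding to the per-field fc/enum split looks roughly like the following (the field names are illustrative placeholders, not our real schema; f.<field>.facet.method is Solr's per-field override syntax):

  select?q=default_search:keyword&facet=true&facet.threads=20
    &facet.field=big_field_1&f.big_field_1.facet.method=fc
    &facet.field=big_field_2&f.big_field_2.facet.method=fc
    &facet.field=big_field_3&f.big_field_3.facet.method=fc
    &facet.field=small_field_1&f.small_field_1.facet.method=enum
    ...
    &facet.field=small_field_6&f.small_field_6.facet.method=enum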
2. Now I have another performance issue: grouping performance. The data volume is the same as in the facet performance scenario. When the keyword search hits about one million documents, the QTime is about 600ms. (This is not the first execution of the query; it is served from cache.)

Query url:

  select?fl=item_catalog&q=default_search:paramter&defType=edismax&rows=50&group=true&group.field=item_group_id&group.ngroups=true&group.sort=stock4sort%20desc,final_price%20asc,is_selleritem%20asc&sort=score%20desc,default_sort%20desc

It needs a QTime of about 600ms. This query has two notable parameters:

1. fl with a single field
2. group=true and group.ngroups=true

If I set group=false, the QTime is only 1ms. But I need both group and group.ngroups. How can I improve the grouping performance under this requirement? Do you have any advice for me?

I'm looking forward to your reply.

Best Regards,
Alice Yang
+86-021-51530666*41493
Floor 19, KaiKai Plaza, 888, Wanhandu Rd, Shanghai (200042)

-----Original Message-----
From: Toke Eskildsen [mailto:t...@statsbiblioteket.dk]
Sent: May 24, 2014 15:17
To: solr-user@lucene.apache.org
Subject: RE: (Issue) How improve solr facet performance

Alice.H.Yang (mis.cnsh04.Newegg) 41493 [alice.h.y...@newegg.com] wrote:

> 1. I'm sorry, I have made a mistake, the total number of documents is 32 Million, not 320 Million.
> 2. The system memory is large for the solr index, the OS has 256G in total, and I set the solr tomcat HEAPSIZE="-Xms25G -Xmx100G"

100G is a very high number. What special requirements dictate such a large heap size?

> Reply: 9 fields I facet on.

Solr treats each facet separately, and with facet.method=fc and 10M hits this means that it will iterate 9*10M = 90M document IDs and update the counters for those.

> Reply: 3 facet fields have one hundred unique values, the other 6 facet fields' unique values are between 3 and 15.

So very low cardinality. This is confirmed by your low response time of 6ms for 2925 hits.

> And we tested this scenario: if the number of a facet field's unique values is low, we add facet.method=enum, but there is little performance improvement.

That is a shame: enum is normally the simple answer to a setup like yours. Have you tried fine-tuning your fc/enum selection, so that the 3 fields with hundreds of values use fc and the rest use enum? That might halve your response time.

Since the number of unique facet values is so low, I do not think that DocValues can help you here.

Besides the fine-grained fc/enum selection above, you could try collapsing all 9 facet fields into a single field. The idea behind this is that for facet.method=fc, faceting on a field with (for example) 300 unique values takes practically the same amount of time as faceting on a field with 1000 unique values: faceting on a single, slightly larger field is much faster than faceting on 9 smaller fields. After faceting with facet.limit=-1 on the single super-facet-field, you must match the returned values back to their original fields.

If you have the facet fields

  field0: 34
  field1: 187
  field2: 78432
  field3: 3
  ...

then collapse them by or-ing in a field-specific mask that is bigger than the max value in any field, and put it all into a single field:

  fieldAll: 0xA0000000 | 34
  fieldAll: 0xA1000000 | 187
  fieldAll: 0xA2000000 | 78432
  fieldAll: 0xA3000000 | 3
  ...

Perform the facet request on fieldAll with facet.limit=-1 and split the resulting counts with

  for (entry : facetResultAll) {
    switch (0xFF000000 & entry.value) {
      case 0xA0000000: field0.add(entry.value & 0x00FFFFFF, entry.count); break;
      case 0xA1000000: field1.add(entry.value & 0x00FFFFFF, entry.count); break;
      ...
    }
  }

Regards,
Toke Eskildsen, State and University Library, Denmark
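To make the split step above concrete, here is a minimal, self-contained Java sketch of the same masking scheme. The FacetEntry record, the 0xA0000000 field base and the demo counts are hypothetical stand-ins; a real client would fill facetResultAll from the response of the facet request on fieldAll.

  import java.util.ArrayList;
  import java.util.LinkedHashMap;
  import java.util.List;
  import java.util.Map;

  public class FacetSplitter {

      // High byte identifies the original field; the low 24 bits hold the value.
      private static final long FIELD_MASK = 0xFF000000L;
      private static final long VALUE_MASK = 0x00FFFFFFL;
      private static final long FIELD_BASE = 0xA0000000L; // field0 = 0xA0..., field1 = 0xA1..., etc.

      /** One (value, count) pair as returned by the facet request. Hypothetical type. */
      record FacetEntry(long value, long count) {}

      /**
       * Splits the combined fieldAll counts into one map per original field.
       * Index i of the returned list holds the counts for field i.
       */
      static List<Map<Long, Long>> split(List<FacetEntry> facetResultAll, int numFields) {
          List<Map<Long, Long>> perField = new ArrayList<>();
          for (int i = 0; i < numFields; i++) {
              perField.add(new LinkedHashMap<>());
          }
          for (FacetEntry entry : facetResultAll) {
              // Recover the field index from the high byte and strip the mask from the value.
              int field = (int) (((entry.value() & FIELD_MASK) - FIELD_BASE) >>> 24);
              long originalValue = entry.value() & VALUE_MASK;
              perField.get(field).put(originalValue, entry.count());
          }
          return perField;
      }

      public static void main(String[] args) {
          // Hypothetical counts from faceting on fieldAll with facet.limit=-1.
          List<FacetEntry> facetResultAll = List.of(
                  new FacetEntry(0xA0000000L | 34,    120),  // field0:34
                  new FacetEntry(0xA1000000L | 187,    80),  // field1:187
                  new FacetEntry(0xA2000000L | 78432,  15)); // field2:78432

          List<Map<Long, Long>> perField = split(facetResultAll, 3);
          for (int i = 0; i < perField.size(); i++) {
              System.out.println("field" + i + ": " + perField.get(i));
          }
      }
  }

With 24 bits left for the value part, this scheme supports up to about 16M distinct values per field, far above the few hundred in the setup discussed here.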