Slow queries and facets

Rudenko, Artur Wed, 12 Feb 2020 12:33:13 -0800

Hello everyone,
I am currently investigating a performance issue in our environment:
20M large PARENT documents and 800M nested small CHILD documents.
The system inserts about 400K PARENT documents and 16M CHILD documents per day. 
(We have currently stopped inserting calls in order to investigate the performance issue.)
This is a SolrCloud 8.3 environment with 7 servers (64 vCPUs and 128 GB RAM each, 
24 GB allocated to Solr) and a single collection (32 shards, replication 
factor 2).
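
For orientation, here is a rough back-of-the-envelope sketch of what these figures imply per shard and per server. It assumes documents and replicas are spread evenly across the cluster, which I have not verified; the variable names are only illustrative.

```python
# Rough sizing from the figures quoted above.
# Assumption: documents and replicas are distributed evenly (not verified).
parent_docs = 20_000_000
child_docs = 800_000_000
shards = 32
replication_factor = 2
servers = 7
heap_gb = 24  # heap allocated to Solr on each server

total_docs = parent_docs + child_docs
docs_per_shard = total_docs / shards
replicas = shards * replication_factor
replicas_per_server = replicas / servers
heap_per_replica_gb = heap_gb / replicas_per_server

print(f"total documents:       {total_docs:,}")
print(f"documents per shard:   {docs_per_shard:,.0f}")
print(f"replicas in cluster:   {replicas}")
print(f"replicas per server:   {replicas_per_server:.1f}")
print(f"heap per replica (GB): {heap_per_replica_gb:.3f}")
```

Under that even-distribution assumption, each server hosts roughly nine replicas that share the one 24 GB heap.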


We are seeing generally slow queries (about 4-7 seconds) and slow facet times. The 
query below takes about 14-16 seconds (we have to use limit:-1 for business 
reasons; the cardinality of groupIds is about 1K values).

fq=channel:345133
&fq=content_type:PARENT
&fq=Meta_is_organizationIds:(344996998 344594999 345000001.... 562 int values in total)
&q=*:*
&json.facet={
    "Chart_01_Bins":{
        type:terms,
        field:groupIds,
        mincount:1,
        limit:-1,
        numBuckets:true,
        missing:false,
        refine:true,
        facet:{
            min_score_avg:"avg(min_score)",
            max_score_avg:"avg(max_score)",
            avg_score_avg:"avg(avg_score)"
        }
    },
    "Chart_01_FIELD_NOT_EXISTS":{
        type:query,
        q:"-groupIds:[* TO *]",
        facet:{
            min_score_avg:"avg(min_score)",
            max_score_avg:"avg(max_score)",
            avg_score_avg:"avg(avg_score)"
        }
    }
}
&rows=0
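
In case it helps anyone reproduce this, below is a minimal sketch of the same request expressed as a body for Solr's JSON Request API (POSTed to the collection's /query handler with Content-Type application/json). The `org_ids` list is a placeholder that holds only the three IDs shown above, not the full 562, and the snippet only builds and prints the body; it does not send anything.

```python
import json

# Placeholder: only the three IDs visible in this message; the real filter has 562.
org_ids = [344996998, 344594999, 345000001]

# The same three averages are requested under both facets.
score_stats = {
    "min_score_avg": "avg(min_score)",
    "max_score_avg": "avg(max_score)",
    "avg_score_avg": "avg(avg_score)",
}

body = {
    "query": "*:*",
    "filter": [
        "channel:345133",
        "content_type:PARENT",
        "Meta_is_organizationIds:(" + " ".join(str(i) for i in org_ids) + ")",
    ],
    "limit": 0,  # JSON Request API equivalent of rows=0
    "facet": {
        "Chart_01_Bins": {
            "type": "terms",
            "field": "groupIds",
            "mincount": 1,
            "limit": -1,
            "numBuckets": True,
            "missing": False,
            "refine": True,
            "facet": dict(score_stats),
        },
        "Chart_01_FIELD_NOT_EXISTS": {
            "type": "query",
            "q": "-groupIds:[* TO *]",
            "facet": dict(score_stats),
        },
    },
}

payload = json.dumps(body, indent=2)
print(payload)
```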

Even when the facet is simplified, the query still takes about 4-6 seconds:

fq=channel:345133
&fq=content_type:PARENT
&fq=Meta_is_organizationIds:(344996998 344594999 345000001.... 562 int values in total)
&q=*:*
&json.facet={
    "Chart_01_Bins":{
        type:terms,
        field:groupIds,
        mincount:1,
        limit:-1,
        numBuckets:true,
        missing:false,
        refine:true
    }
}
&rows=0

Schema relevant fields:

<fieldType name="pfloat" class="solr.FloatPointField" docValues="true"/>
<fieldType name="pint" class="solr.IntPointField" docValues="true"/>

<!-- Currently only 1 value, in the future we expect to have about 25 different 
values -->
<field name="channel" type="string" indexed="true" stored="true" 
required="true" multiValued="false" />

<!-- 2 Possible values (PARENT\CHILD) -->
<field name="content_type" type="string" indexed="true" stored="true" 
required="true" multiValued="false" />

<!-- Cardinality of 1K values, document may have 0 to all possible values -->
<field name="groupIds" type="pint" indexed="true" stored="true" 
required="false" multiValued="true" />

<!-- Float value between -2 and 2; all documents have this field (applies to 
the 3 fields below) -->
<field name="min_score" type="pfloat" indexed="true" stored="true" 
required="false" multiValued="false" />
<field name="avg_score" type="pfloat" indexed="true" stored="true" 
required="false" multiValued="false" />
<field name="max_score" type="pfloat" indexed="true" stored="true" 
required="false" multiValued="false" />

<!-- Cardinality of a few thousand values; currently only 1 dynamic field 
exists with this prefix; a document may have 1 to all possible values -->
<dynamicField name="Meta_is_*" type="pint" indexed="true" stored="true" 
multiValued="true" />


Any suggestions on how to proceed with the investigation?

Right now we are trying to figure out whether using a single shard on each 
machine will help.
Artur Rudenko
Analytics Developer
Customer Engagement Solutions, VERINT
T +972.74.747.2536 | M +972.52.425.4686


