I don't have any particular idea. It's probably worth starting by studying
the debugQuery=true output. There are caveats:
- it takes a while
- it's worth limiting the request to a few shards
- it used to produce incorrect JSON and worked only with wt=xml
At least it lets you spot which part of the computation takes the longest.
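A minimal Python sketch of such a debug request. It assumes a hypothetical base URL and collection name (`SOLR_SELECT`) and hypothetical shard names, and the helper `debug_query_url` exists only for illustration. `q`, `fq`, `rows`, `json.facet`, `debugQuery`, `wt` and `shards` are standard Solr request parameters. The sketch only builds the URL; send it with any HTTP client.

```python
from urllib.parse import urlencode

# Hypothetical base URL and collection name; adjust to your cluster.
SOLR_SELECT = "http://localhost:8983/solr/mycollection/select"

def debug_query_url(filters, json_facet, shards):
    """Build a /select URL that asks Solr for debug output.

    filters    -- list of fq strings
    json_facet -- JSON facet request as a string
    shards     -- shard names (or replica URLs) to restrict the request to;
                  kept short because debug output is slow to produce
    """
    params = [("q", "*:*"), ("rows", "0")]
    params += [("fq", f) for f in filters]
    params += [
        ("json.facet", json_facet),
        ("debugQuery", "true"),
        ("wt", "xml"),                 # older versions emitted broken JSON debug output
        ("shards", ",".join(shards)),  # limit the request to a few shards
    ]
    return SOLR_SELECT + "?" + urlencode(params)
```

For example, `debug_query_url(["channel:345133", "content_type:PARENT"], facet_json, ["shard1", "shard2"])` targets two shards only, which keeps the debug run short.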
A few other thoughts: an output with 1K entries doesn't look like a regular
search engine response; usually results are scrolled with limit/offset.
Anyway, this looks like an analytical job for Spark.
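If the 1K buckets could be fetched in pages instead of all at once with limit:-1, the terms facet's `offset` and `limit` options can do the scrolling. The following is a sketch under assumptions: the page size is chosen by the client, `paged_terms_facet` is a hypothetical helper, and the bucket and stat names are copied from the original query. Sorting by `index asc` keeps page boundaries deterministic, and `numBuckets` tells the client when to stop. Whether this is actually cheaper than limit:-1 on this data would need measuring, since each page may still require computing buckets across shards.

```python
import json

def paged_terms_facet(field, page, page_size):
    """Build a json.facet string that returns one page of terms buckets.

    Hypothetical helper: page is 0-based, page_size is chosen by the client.
    """
    stats = {
        "min_score_avg": "avg(min_score)",
        "max_score_avg": "avg(max_score)",
        "avg_score_avg": "avg(avg_score)",
    }
    return json.dumps({
        "Chart_01_Bins": {
            "type": "terms",
            "field": field,
            "mincount": 1,
            "sort": "index asc",         # deterministic order across pages
            "offset": page * page_size,  # skip buckets returned by earlier pages
            "limit": page_size,
            "numBuckets": True,          # total bucket count, to know when to stop
            "refine": True,
            "facet": stats,
        }
    })
```

The first page would be `paged_terms_facet("groupIds", 0, 100)`; the client increments `page` until `page * page_size` reaches the returned `numBuckets`.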


On Wed, Feb 12, 2020 at 11:32 PM Rudenko, Artur <artur.rude...@verint.com>
wrote:

> Hello everyone,
> I am currently investigating a performance issue in our environment:
> 20M large PARENT documents and 800M small nested CHILD documents.
> The system inserts about 400K PARENT documents and 16M CHILD documents per
> day. (We have currently stopped inserting calls while we investigate the
> performance issue.)
> This is a SolrCloud 8.3 environment with 7 servers (64 vCPUs and 128 GB RAM
> each, 24 GB allocated to Solr) with a single collection (32 shards and a
> replication factor of 2).
>
> We generally experience slow queries (about 4-7 seconds) and slow facets.
> The query below runs in about 14-16 seconds (we have to use limit:-1 due to
> a business requirement; the field cardinality is 1K values).
>
> fq=channel:345133
> &fq=content_type:PARENT
> &fq=Meta_is_organizationIds:(344996998 344594999 345000001.... 562 int
> values in total)
> &q=*:*
> &json.facet={
>     "Chart_01_Bins":{
>         type:terms,
>         field:groupIds,
>         mincount:1,
>         limit:-1,
>         numBuckets:true,
>         missing:false,
>         refine:true,
>         facet:{
>             min_score_avg:"avg(min_score)",
>             max_score_avg:"avg(max_score)",
>             avg_score_avg:"avg(avg_score)"
>         }
>     },
>     "Chart_01_FIELD_NOT_EXISTS":{
>         type:query,
>         q:"-groupIds:[* TO *]",
>         facet:{
>             min_score_avg:"avg(min_score)",
>             max_score_avg:"avg(max_score)",
>             avg_score_avg:"avg(avg_score)"
>         }
>     }
> }
> &rows=0
>
> Also, when the facet is simplified, it takes about 4-6 seconds:
>
> fq=channel:345133
> &fq=content_type:PARENT
> &fq=Meta_is_organizationIds:(344996998 344594999 345000001.... 562 int
> values in total)
> &q=*:*
> &json.facet={
>     "Chart_01_Bins":{
>         type:terms,
>         field:groupIds,
>         mincount:1,
>         limit:-1,
>         numBuckets:true,
>         missing:false,
>         refine:true
>     }
> }
> &rows=0
>
> Schema relevant fields:
>
> <fieldType name="pfloat" class="solr.FloatPointField" docValues="true"/>
> <fieldType name="pint" class="solr.IntPointField" docValues="true"/>
>
> <!-- Currently only 1 value, in the future we expect to have about 25
> different values -->
> <field name="channel" type="string" indexed="true" stored="true"
> required="true" multiValued="false" />
>
> <!-- 2 possible values (PARENT/CHILD) -->
> <field name="content_type" type="string" indexed="true" stored="true"
> required="true" multiValued="false" />
>
> <!-- Cardinality of 1K values; a document may have anywhere from 0 to all
> possible values -->
> <field name="groupIds" type="pint" indexed="true" stored="true"
> required="false" multiValued="true" />
>
> <!-- Float value between -2 and 2; all documents have this field (applies
> to the 3 fields below) -->
> <field name="min_score" type="pfloat" indexed="true" stored="true"
> required="false" multiValued="false" />
> <field name="avg_score" type="pfloat" indexed="true" stored="true"
> required="false" multiValued="false" />
> <field name="max_score" type="pfloat" indexed="true" stored="true"
> required="false" multiValued="false" />
>
> <!-- Cardinality of a few thousand values; currently only 1 dynamic field
> exists with this prefix; a document may have anywhere from 1 to all
> possible values -->
> <dynamicField name="Meta_is_*" type="pint" indexed="true" stored="true"
> multiValued="true" />
>
>
> Any suggestions on how to proceed with the investigation?
>
> Right now we are trying to figure out whether using a single shard on each
> machine will help.
> Artur Rudenko
> Analytics Developer
> Customer Engagement Solutions, VERINT
> T +972.74.747.2536 | M +972.52.425.4686
>


-- 
Sincerely yours
Mikhail Khludnev
