I don't have any particular idea. It's probably worth starting by studying the debugQuery=true output. There are caveats:
- it takes a while
- it's worth limiting the request to a few shards
- it used to produce incorrect JSON and worked only with wt=xml

At least it lets you see which part of the computation takes the longest.

A few other thoughts: output with 1K entries doesn't look like a regular search engine response; usually results are paged with limit/offset. In any case, this looks like an analytical job for Spark.

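A minimal sketch of such a debug request, in Python, in case it helps. The host, collection name and shard names are placeholders (assumptions, not taken from your setup), the facet is a shortened version of the one in the original query, and the elided Meta_is_organizationIds filter is left out. It only builds the URL; fetch it with whatever HTTP client you prefer.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: replace host and collection with your own.
SOLR_SELECT_URL = "http://localhost:8983/solr/mycollection/select"


def build_debug_params(filters, json_facet, shards):
    """Return the request parameters as a list of (name, value) pairs."""
    params = [("q", "*:*"), ("rows", "0")]
    # One fq parameter per filter, as in the original query.
    params += [("fq", fq) for fq in filters]
    params += [
        ("json.facet", json_facet),
        ("debugQuery", "true"),        # ask Solr for debug output
        ("wt", "xml"),                 # XML, since JSON debug output used to be broken
        ("shards", ",".join(shards)),  # restrict the request to a few shards
    ]
    return params


params = build_debug_params(
    filters=["channel:345133", "content_type:PARENT"],
    json_facet='{"Chart_01_Bins":{type:terms,field:groupIds,mincount:1,limit:-1}}',
    shards=["shard1", "shard2"],  # assumed logical shard names
)
debug_url = SOLR_SELECT_URL + "?" + urlencode(params)
print(debug_url)
```

Comparing the debug section of the response for the full facet against the simplified one should show where the extra time goes.
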
On Wed, Feb 12, 2020 at 11:32 PM Rudenko, Artur <artur.rude...@verint.com> wrote:

> Hello everyone,
> I'm am currently investigating a performance issue in our environment:
> 20M large PARENT documents and 800M nested small CHILD documents.
> The system inserts about 400K PARENT documents and 16M CHILD documents per
> day. (Currently we stopped the calls insertion to investigate the
> performance issue)
> This is a solr cloud 8.3 environment with 7 servers (64 VCPU 128 GB RAM
> each, 24GB allocated to Solr) with single collection (32 shards and
> replication factor 2).
>
> We experience generally slow queries (about 4-7 seconds) and facet times.
> The below query runs in about 14-16 seconds (we have to use limit:-1 due to
> a business case - cardinality is 1K values).
>
> fq=channel:345133
> &fq=content_type:PARENT
> &fq=Meta_is_organizationIds:(344996998 344594999 345000001.... total of int 562 values)
> &q=*:*
> &json.facet={
>   "Chart_01_Bins":{
>     type:terms,
>     field:groupIds,
>     mincount:1,
>     limit:-1,
>     numBuckets:true,
>     missing:false,
>     refine:true,
>     facet:{
>       min_score_avg:"avg(min_score)",
>       max_score_avg:"avg(max_score)",
>       avg_score_avg:"avg(avg_score)"
>     }
>   },
>   "Chart_01_FIELD_NOT_EXISTS":{
>     type:query,
>     q:"-groupIds:[* TO *]",
>     facet:{
>       min_score_avg:"avg(min_score)",
>       max_score_avg:"avg(max_score)",
>       avg_score_avg:"avg(avg_score)"
>     }
>   }
> }
> &rows=0
>
> Also, when the facet is simplified, it takes about 4-6 seconds
>
> fq=channel:345133
> &fq=content_type:PARENT
> &fq=Meta_is_organizationIds:(344996998 344594999 345000001.... total of int 562 values)
> &q=*:*
> &json.facet={
>   "Chart_01_Bins":{
>     type:terms,
>     field:groupIds,
>     mincount:1,
>     limit:-1,
>     numBuckets:true,
>     missing:false,
>     refine:true
>   }
> }
> &rows=0
>
> Schema relevant fields:
>
> <fieldType name="pfloat" class="solr.FloatPointField" docValues="true"/>
> <fieldType name="pint" class="solr.IntPointField" docValues="true"/>
>
> <!-- Currently only 1 value, in the future we expect to have about 25 different values -->
> <field name="channel" type="string" indexed="true" stored="true" required="true" multiValued="false" />
>
> <!-- 2 Possible values (PARENT\CHILD) -->
> <field name="content_type" type="string" indexed="true" stored="true" required="true" multiValued="false" />
>
> <!-- Cardinality of 1K values, document may have 0 to all possible values -->
> <field name="groupIds" type="pint" indexed="true" stored="true" required="false" multiValued="true" />
>
> <!-- Float value between -2 to 2, all documents have this field (applied for the below 3 fields) -->
> <field name="min_score" type="pfloat" indexed="true" stored="true" required="false" multiValued="false" />
> <field name="avg_score" type="pfloat" indexed="true" stored="true" required="false" multiValued="false" />
> <field name="max_score" type="pfloat" indexed="true" stored="true" required="false" multiValued="false" />
>
> <!-- Cardinality with about few thousands values, currently only 1 dynamic field exists with this prefix, document may have 1 to all possible values -->
> <dynamicField name="Meta_is_*" type="pint" indexed="true" stored="true" multiValued="true" />
>
> Any suggestions how to proceed with the investigation?
>
> Right now we are trying to figure out if using single shard on each
> machine will help.
>
> Artur Rudenko
> Analytics Developer
> Customer Engagement Solutions, VERINT
> T +972.74.747.2536 | M +972.52.425.4686

--
Sincerely yours
Mikhail Khludnev