[jira] [Commented] (SOLR-13132) Improve JSON "terms" facet performance when sorted by relatedness

Chris M. Hostetter (Jira) Wed, 08 Jul 2020 22:25:16 -0700


    [ 
https://issues.apache.org/jira/browse/SOLR-13132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17154192#comment-17154192
 ]


Chris M. Hostetter commented on SOLR-13132:
-------------------------------------------

Hey Michael, just to sanity check one thing...
{quote}The results below are mostly for without any filterCache,...
{quote}
Are the {{MASTER}} results you reported here from testing with 
"filterCacheSize=0" ... which would make them an apples-to-apples comparison 
with your "{{SOLR-13132 sweep_collection=false, filterCacheSize=0}}" results? 
(i'm guessing so based on your comment above, but i don't see any sort of log 
in your tgz showing exactly what commands were used to run each test, and i see 
a default size of 4096 in your solrconfig.xml, so i wanted to be certain what 
you meant)

Assuming i'm understanding correctly, then what i see here is what we expected: 
for the "common" case of sorting on {{relatedness()}} (high cardinality fields, 
larger than the filterCache size, w/ non-trivial FG sets) sweeping is a big 
improvement, and in the _uncommon_ case of low cardinality fields and/or small 
FG sets there is only a small (negative) impact from using the new code, and 
that seems to go away (numbers close enough to be noise) by specifying 
"{{sweep_collection:false}}".

sound right?

So i think we're good to go – we just need the corrections/updates to the ref 
guide. If you can please remove the incorrect stuff i mentioned in my last 
comment, and replace it with a note explaining when/why people _may_ want to 
consider using {{sweep_collection: false}} (ie: atypical SKG situations where 
cardinality or FG set size is very low) I'll try to merge & backport ASAP.
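
For readers following along, here is a rough sketch of what a request that opts out of sweeping might look like, written as a Python dict that serializes to a JSON Facet request body. The only names taken from this thread are {{relatedness()}} and {{sweep_collection}}. The field name, the facet labels, the {{fore}}/{{back}} queries, and the placement of {{sweep_collection}} inside the function's options object are all assumptions for illustration and should be checked against the updated ref guide.

```python
import json


def relatedness_terms_facet(field, sweep_collection=True, limit=10):
    """Build a "terms" facet sorted by a relatedness() stat.

    Assumption: sweep_collection is passed as an option next to "func"
    in the stat definition; verify against the ref guide.
    """
    stat = {"type": "func", "func": "relatedness($fore,$back)"}
    if not sweep_collection:
        # Opt out of sweep collection (the atypical low-cardinality /
        # small-FG-set case discussed in the comment above).
        stat["sweep_collection"] = False
    return {
        "type": "terms",
        "field": field,
        "limit": limit,
        "sort": {"r1": "desc"},
        "facet": {"r1": stat},
    }


# Hypothetical field name and foreground/background queries.
request_body = {
    "query": "*:*",
    "params": {"fore": "age:[35 TO *]", "back": "*:*"},
    "facet": {
        "related_terms": relatedness_terms_facet(
            "hobbies_s", sweep_collection=False
        )
    },
}

print(json.dumps(request_body, indent=2))
```

Leaving the option out entirely (the default in this sketch) would keep the new sweeping behavior, which is the case expected to be faster for high cardinality fields.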

> Improve JSON "terms" facet performance when sorted by relatedness 
> ------------------------------------------------------------------
>
>                 Key: SOLR-13132
>                 URL: https://issues.apache.org/jira/browse/SOLR-13132
>             Project: Solr
>          Issue Type: Improvement
>          Components: Facet Module
>    Affects Versions: 7.4, master (9.0)
>            Reporter: Michael Gibney
>            Priority: Major
>         Attachments: SOLR-13132-benchmarks.tgz, 
> SOLR-13132-with-cache-01.patch, SOLR-13132-with-cache.patch, 
> SOLR-13132.patch, SOLR-13132_testSweep.patch
>
>          Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> When sorting buckets by {{relatedness}}, JSON "terms" facet must calculate 
> {{relatedness}} for every term. 
> The current implementation uses a standard uninverted approach (either 
> {{docValues}} or {{UnInvertedField}}) to get facet counts over the domain 
> base docSet, and then uses that initial pass as a pre-filter for a 
> second-pass, inverted approach of fetching docSets for each relevant term 
> (i.e., {{count > minCount}}?) and calculating intersection size of those sets 
> with the domain base docSet.
> Over high-cardinality fields, the overhead of per-term docSet creation and 
> set intersection operations increases request latency to the point where 
> relatedness sort may not be usable in practice (for my use case, even after 
> applying the patch for SOLR-13108, for a field with ~220k unique terms per 
> core, QTimes for high-cardinality domain docSets were, e.g.: cardinality 
> 1816684=9000ms, cardinality 5032902=18000ms).
> The attached patch brings the above example QTimes down to a manageable 
> ~300ms and ~250ms respectively. The approach calculates uninverted facet 
> counts over domain base, foreground, and background docSets in parallel in a 
> single pass. This allows us to take advantage of the efficiencies built into 
> the standard uninverted {{FacetFieldProcessorByArray[DV|UIF]}}, and avoids 
> the per-term docSet creation and set intersection overhead.
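
The difference between the two strategies in the issue summary above can be sketched with a toy model. This is not Solr's code: the function names, the dict-based "index", and the sample data are all hypothetical. It only shows that one pass over per-document terms can produce the same base/foreground/background counts as a per-term set intersection, without building a docSet for every term.

```python
from collections import Counter


def sweep_counts(doc_terms, base, fg, bg):
    """Uninverted style: visit each document once and update all three
    per-term counters as we go."""
    base_counts, fg_counts, bg_counts = Counter(), Counter(), Counter()
    for doc_id in base | fg | bg:
        for term in doc_terms.get(doc_id, ()):
            if doc_id in base:
                base_counts[term] += 1
            if doc_id in fg:
                fg_counts[term] += 1
            if doc_id in bg:
                bg_counts[term] += 1
    return base_counts, fg_counts, bg_counts


def per_term_counts(postings, base, fg, bg):
    """Inverted style: one docSet per term, intersected with each of the
    three sets (the overhead the issue describes)."""
    return {
        term: (len(docs & base), len(docs & fg), len(docs & bg))
        for term, docs in postings.items()
    }


# Toy data: doc id -> terms, plus the equivalent term -> doc ids postings.
doc_terms = {1: ["a", "b"], 2: ["a"], 3: ["b", "c"], 4: ["c"]}
postings = {"a": {1, 2}, "b": {1, 3}, "c": {3, 4}}
base, fg, bg = {1, 2, 3}, {1, 2}, {1, 2, 3, 4}

base_counts, fg_counts, bg_counts = sweep_counts(doc_terms, base, fg, bg)
inverted = per_term_counts(postings, base, fg, bg)

# Both strategies agree on every term.
for term, (b, f, g) in inverted.items():
    assert (base_counts[term], fg_counts[term], bg_counts[term]) == (b, f, g)
```

In the toy model the inverted version does three set intersections per term, so its cost grows with the number of unique terms, while the sweep version does work proportional to the documents visited.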



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
