[jira] [Commented] (SOLR-13132) Improve JSON "terms" facet performance when sorted by relatedness

Chris M. Hostetter (Jira) Sun, 05 Jul 2020 15:57:19 -0700


    [ 
https://issues.apache.org/jira/browse/SOLR-13132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17151677#comment-17151677
 ]


Chris M. Hostetter commented on SOLR-13132:
-------------------------------------------

bq. ... perhaps the only remaining question is "what is the allBucketSlotNum 
when sweeping?". I'm definitely comfortable with the initial "hackish" approach 
as you've refined it in your recent commit ...

Yep, I haven't been able to think of a better solution that balances 
"HACKishness" vs "overly complex change to APIs" ... so let's move forward.

[~mgibney] - can you please merge in the latests apache/master branch to pick 
up fea6c1b9daa243e48297b4ba5a1ece68b8790a5b & confirm precommit still passes? 
(there have been some additional stricntess checks put into place that may 
require dealing with some compiler warnings in order for the build to continue 
to work -- not sure if that affects us) ... and then i'll (hopefully) have time 
to do one final review of the overall diff & merge to master to commit later 
this week. 

> Improve JSON "terms" facet performance when sorted by relatedness 
> ------------------------------------------------------------------
>
>                 Key: SOLR-13132
>                 URL: https://issues.apache.org/jira/browse/SOLR-13132
>             Project: Solr
>          Issue Type: Improvement
>          Components: Facet Module
>    Affects Versions: 7.4, master (9.0)
>            Reporter: Michael Gibney
>            Priority: Major
>         Attachments: SOLR-13132-with-cache-01.patch, 
> SOLR-13132-with-cache.patch, SOLR-13132.patch, SOLR-13132_testSweep.patch
>
>          Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> When sorting buckets by {{relatedness}}, JSON "terms" facet must calculate 
> {{relatedness}} for every term. 
> The current implementation uses a standard uninverted approach (either 
> {{docValues}} or {{UnInvertedField}}) to get facet counts over the domain 
> base docSet, and then uses that initial pass as a pre-filter for a 
> second-pass, inverted approach of fetching docSets for each relevant term 
> (i.e., {{count > minCount}}?) and calculating intersection size of those sets 
> with the domain base docSet.
> Over high-cardinality fields, the overhead of per-term docSet creation and 
> set intersection operations increases request latency to the point where 
> relatedness sort may not be usable in practice (for my use case, even after 
> applying the patch for SOLR-13108, for a field with ~220k unique terms per 
> core, QTime for high-cardinality domain docSets were, e.g.: cardinality 
> 1816684=9000ms, cardinality 5032902=18000ms).
> The attached patch brings the above example QTimes down to a manageable 
> ~300ms and ~250ms respectively. The approach calculates uninverted facet 
> counts over domain base, foreground, and background docSets in parallel in a 
> single pass. This allows us to take advantage of the efficiencies built into 
> the standard uninverted {{FacetFieldProcessorByArray[DV|UIF]}}), and avoids 
> the per-term docSet creation and set intersection overhead.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] [Commented] (SOLR-13132) Improve JSON "terms" facet performance when sorted by relatedness

Reply via email to