[jira] [Commented] (SOLR-13132) Improve JSON "terms" facet performance when sorted by relatedness

Michael Gibney (Jira) Thu, 02 Apr 2020 12:42:06 -0700


    [ https://issues.apache.org/jira/browse/SOLR-13132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17074044#comment-17074044 ]


Michael Gibney commented on SOLR-13132:
---------------------------------------

Yes, I think I understand the lines along which you're thinking, and that makes 
sense. Forgive me, I was intending the [earlier 
comment|https://issues.apache.org/jira/browse/SOLR-13132?focusedCommentId=17073878#comment-17073878]
 to be narrowly about the "XXXXXX temporary!" comment ... really coming to the 
conclusion that the functionality it was marking (approximately respecting 
{{cacheDf}}) should in fact be permanent, not temporary (i.e., the comment is 
misleading/out-of-date and should just be removed).

True, I misspoke in implying that non-sweep {{SKGSlotAcc}} was just about 
refinement; it's necessary for any use case that takes a more "a la carte" 
approach, not requiring facet counts to be calculated over the full domain 
(refinement, resort, otherAccs, as you say ... maybe others?). In fact, 
otherAccs and resort, being likely to generate more DocSet lookups than 
refinement, make it all the more important that SKGSlotAcc respect {{cacheDf}} 
to control filterCache usage, no?
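
To make the {{cacheDf}} point concrete, here is a minimal sketch, not Solr code. It assumes {{cacheDf}} acts as a minimum document-frequency threshold: DocSets for terms at or above it go through a cache, and smaller ones are computed ad hoc and left out. The names {{make_term_docset_lookup}} and {{compute_docset}}, and the plain dict standing in for filterCache, are all hypothetical.

```python
from typing import Callable, Dict, FrozenSet


def make_term_docset_lookup(
    compute_docset: Callable[[str], FrozenSet[int]],
    cache: Dict[str, FrozenSet[int]],
    cache_df: int,
) -> Callable[[str], FrozenSet[int]]:
    """Build a per-term DocSet lookup that caches only "large" terms.

    Assumption: cache_df is a minimum document frequency; DocSets smaller
    than it are recomputed on demand and never stored in the cache.
    """

    def lookup(term: str) -> FrozenSet[int]:
        cached = cache.get(term)
        if cached is not None:
            return cached
        docs = compute_docset(term)
        # Only terms matching at least cache_df docs are worth a cache slot.
        if len(docs) >= cache_df:
            cache[term] = docs
        return docs

    return lookup
```

Under that assumption, the more per-term lookups a use case generates (resort, otherAccs), the more a threshold like this matters for keeping low-frequency terms from flooding the cache.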

Your other suggestions, concerns about brittleness, API changes etc. definitely 
resonate with me (your stream-of-consciousness is very intelligible!) – I plan 
to work through them in the next day or two and address any questions as they 
come up.

> Improve JSON "terms" facet performance when sorted by relatedness 
> ------------------------------------------------------------------
>
>                 Key: SOLR-13132
>                 URL: https://issues.apache.org/jira/browse/SOLR-13132
>             Project: Solr
>          Issue Type: Improvement
>          Components: Facet Module
>    Affects Versions: 7.4, master (9.0)
>            Reporter: Michael Gibney
>            Priority: Major
>         Attachments: SOLR-13132-with-cache-01.patch, 
> SOLR-13132-with-cache.patch, SOLR-13132.patch
>
>          Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> When sorting buckets by {{relatedness}}, JSON "terms" facet must calculate 
> {{relatedness}} for every term. 
> The current implementation uses a standard uninverted approach (either 
> {{docValues}} or {{UnInvertedField}}) to get facet counts over the domain 
> base docSet, and then uses that initial pass as a pre-filter for a 
> second-pass, inverted approach of fetching docSets for each relevant term 
> (i.e., {{count > minCount}}?) and calculating intersection size of those sets 
> with the domain base docSet.
> Over high-cardinality fields, the overhead of per-term docSet creation and 
> set intersection operations increases request latency to the point where 
> relatedness sort may not be usable in practice (for my use case, even after 
> applying the patch for SOLR-13108, for a field with ~220k unique terms per 
> core, QTimes for high-cardinality domain docSets were, e.g.: cardinality 
> 1816684=9000ms, cardinality 5032902=18000ms).
> The attached patch brings the above example QTimes down to a manageable 
> ~300ms and ~250ms respectively. The approach calculates uninverted facet 
> counts over domain base, foreground, and background docSets in parallel in a 
> single pass. This allows us to take advantage of the efficiencies built into 
> the standard uninverted {{FacetFieldProcessorByArray[DV|UIF]}}, and avoids 
> the per-term docSet creation and set intersection overhead.
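
The two strategies in the description above can be contrasted with a toy sketch. This is not the patch itself: docSets are modelled as plain Python sets, the uninverted field as a doc-to-terms dict, and the function names {{counts_by_intersection}} and {{counts_by_sweep}} are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple

Counts = Dict[str, Tuple[int, int, int]]  # term -> (base, foreground, background)


def counts_by_intersection(
    postings: Dict[str, Set[int]], base: Set[int], fg: Set[int], bg: Set[int]
) -> Counts:
    """Inverted approach: one docSet per term, intersected with each domain set."""
    return {
        term: (len(docs & base), len(docs & fg), len(docs & bg))
        for term, docs in postings.items()
    }


def counts_by_sweep(
    doc_terms: Dict[int, Iterable[str]], base: Set[int], fg: Set[int], bg: Set[int]
) -> Counts:
    """Uninverted approach: a single pass over docs, counting all three sets at once."""
    counts: Dict[str, list] = {}
    for doc, terms in doc_terms.items():
        in_base, in_fg, in_bg = doc in base, doc in fg, doc in bg
        if not (in_base or in_fg or in_bg):
            continue  # doc contributes to none of the three counts
        for term in terms:
            c = counts.setdefault(term, [0, 0, 0])
            c[0] += in_base
            c[1] += in_fg
            c[2] += in_bg
    return {term: (c[0], c[1], c[2]) for term, c in counts.items()}
```

Both functions yield the same counts for any term that occurs in at least one of the three sets; the sweep simply never builds a per-term docSet or performs a per-term intersection, which is where the description locates the overhead.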



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
