[jira] [Commented] (SOLR-13132) Improve JSON "terms" facet performance when sorted by relatedness

Chris M. Hostetter (Jira) Tue, 09 Jun 2020 10:49:14 -0700


    [ 
https://issues.apache.org/jira/browse/SOLR-13132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17129663#comment-17129663
 ]


Chris M. Hostetter commented on SOLR-13132:
-------------------------------------------

FYI: I haven't reviewed your recent commits, but I've been letting my computer 
beast the tests on your branch and haven't found any failures yet.
{quote}...that follows something that I think is like the approach you're 
suggesting: selectively wrapping {{countAcc}} to shim the mismatch between 
sweep and non-sweep code, ...
{quote}
As I said, I haven't reviewed this change, but I wasn't assuming we'd need to 
"shim" or wrap anything – just that we would go back to the code the way it was 
before the last merge with master, with {{ByArrayUIF}} and {{ByArrayDV}} being 
responsible for initializing the {{SweepingCountSlotAcc}} – EXCEPT – in this 
special "refining only the allBuckets bucket" code path, where they would just 
use the {{DEV_NULL_SLOT_ACC}} set by their parent class and ignore sweeping.

At a glance, I see the "Shim" class you're referring to, but it has no javadocs 
so I'm not really clear why it needs to exist ... can you please flesh that out 
with some docs explaining its purpose?
{quote}.. So _if_ we'd be ok with {{allBuckets}} skg being supported for sweep 
collection but not for non-sweep, ..
{quote}
Uh, no. Sorry, I'm not OK with that, because it would mean you'd get drastically 
different results depending on whether sweeping was used or not, which is not 
something the user can directly control – changing things like the "sort" 
param or the field properties changes the processor used, regardless of what 
the user may "suggest" with the method param. Any situation where that results 
in different results being returned should be treated as a bug (e.g., SOLR-14514).

> Improve JSON "terms" facet performance when sorted by relatedness 
> ------------------------------------------------------------------
>
>                 Key: SOLR-13132
>                 URL: https://issues.apache.org/jira/browse/SOLR-13132
>             Project: Solr
>          Issue Type: Improvement
>          Components: Facet Module
>    Affects Versions: 7.4, master (9.0)
>            Reporter: Michael Gibney
>            Priority: Major
>         Attachments: SOLR-13132-with-cache-01.patch, 
> SOLR-13132-with-cache.patch, SOLR-13132.patch, SOLR-13132_testSweep.patch
>
>          Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> When sorting buckets by {{relatedness}}, JSON "terms" facet must calculate 
> {{relatedness}} for every term. 
> The current implementation uses a standard uninverted approach (either 
> {{docValues}} or {{UnInvertedField}}) to get facet counts over the domain 
> base docSet, and then uses that initial pass as a pre-filter for a 
> second-pass, inverted approach of fetching docSets for each relevant term 
> (i.e., {{count > minCount}}?) and calculating intersection size of those sets 
> with the domain base docSet.
> Over high-cardinality fields, the overhead of per-term docSet creation and 
> set intersection operations increases request latency to the point where 
> relatedness sort may not be usable in practice (for my use case, even after 
> applying the patch for SOLR-13108, for a field with ~220k unique terms per 
> core, QTimes for high-cardinality domain docSets were, e.g.: cardinality 
> 1816684=9000ms, cardinality 5032902=18000ms).
> The attached patch brings the above example QTimes down to a manageable 
> ~300ms and ~250ms respectively. The approach calculates uninverted facet 
> counts over domain base, foreground, and background docSets in parallel in a 
> single pass. This allows us to take advantage of the efficiencies built into 
> the standard uninverted {{FacetFieldProcessorByArray[DV|UIF]}}, and avoids 
> the per-term docSet creation and set intersection overhead.
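
The single-pass idea in the description above can be illustrated with a small, 
self-contained sketch. This is not the code from the patch: the class name 
SweepCountSketch, the method sweepCounts, and the int[][] doc-to-term-ordinal 
layout are all hypothetical stand-ins for the docValues / UnInvertedField 
structures, and java.util.BitSet stands in for a docSet. It only shows how one 
walk over the documents can accumulate per-term counts for the base, 
foreground, and background sets together, with no per-term docSet creation or 
set intersection.

```java
import java.util.BitSet;

/** Illustrative sketch only; names and data layout are assumptions, not Solr's API. */
final class SweepCountSketch {
    static final int BASE = 0;
    static final int FOREGROUND = 1;
    static final int BACKGROUND = 2;

    /**
     * docTermOrds[doc] holds the term ordinals for that document (a stand-in
     * for an uninverted view of the field). Returns counts[set][termOrd].
     */
    static int[][] sweepCounts(int[][] docTermOrds, int numTerms,
                               BitSet base, BitSet foreground, BitSet background) {
        int[][] counts = new int[3][numTerms];

        // Visit each document that belongs to at least one of the three sets.
        BitSet union = (BitSet) base.clone();
        union.or(foreground);
        union.or(background);

        for (int doc = union.nextSetBit(0);
             doc >= 0 && doc < docTermOrds.length;
             doc = union.nextSetBit(doc + 1)) {
            boolean inBase = base.get(doc);
            boolean inFg = foreground.get(doc);
            boolean inBg = background.get(doc);
            // One pass over the document's terms updates all three counters.
            for (int ord : docTermOrds[doc]) {
                if (inBase) counts[BASE][ord]++;
                if (inFg) counts[FOREGROUND][ord]++;
                if (inBg) counts[BACKGROUND][ord]++;
            }
        }
        return counts;
    }
}
```

With counts for all three sets available per term, a relatedness-style score 
could then be computed for every bucket from the accumulated arrays alone.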



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

