[
https://issues.apache.org/jira/browse/SOLR-13132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17130827#comment-17130827
]
Michael Gibney commented on SOLR-13132:
---------------------------------------
Just pushed a commit (5e021167fe0cf8ee212edc493c7cd5c9c00f3aa7) that adds
javadocs for the "Shim" class. I also realized that I could simplify the
class itself and clarify its intended use.
I've thought a bit more about the question of how to detect the {{allBuckets}}
slot for disabling {{allBuckets}} relatedness: I don't really have any good
answers, but a handful of thoughts:
# the "baby-bear" solution (disabling sweep when {{allBuckets==true}}) would
work, but could have significant negative performance implications.
# the only downside to the "papa-bear" solution (adding a {{SlotContext}} arg
to {{incrementCount(...)}}) would be adding baggage to the
otherwise-straightforward {{CountSlotAcc}} API; but if you're ok with that, I
do think this would work (and I'd be happy to take a crack at implementing it).
I think this solution would have very little performance overhead, since we
already have {{SlotContexts}} in the {{collect}} methods that would be calling
(directly or indirectly) {{incrementCount(...)}}.
# returning the new {{SlotAcc}} from {{SlotAcc.registerSweepingAccs()}} (or
even a separate, special-purpose {{SlotAcc}}) would be the simplest solution
(in terms of requiring few structural changes) ... but introducing a non-null
{{collectAcc}} (as opposed to setting {{collectAcc==null}}) would have the
unfortunate side-effect of undermining the "count-only" performance
optimizations in {{ByArrayUIF}} and {{ByArrayDV}}.
# if we were to add an extra API param somewhere to carry additional
information about the slot, doing that in {{setValues(...)}} would be nice
because it would be called fewer times. But unlike for {{incrementCount(...)}},
there's no {{SlotContext}} already available there, so I guess this approach
(which I'd suggested earlier) might be a non-starter, and I'm not even sure it
would work.
# ... then there's the hackish solution I put in place. What I like least
about this isn't the reflection, but the fact that it relies on grabbing the
vestigial {{collectAccSlot}} off the {{allBucketsAcc}} that has had its own
{{collectAcc}} set to null :-| ... but, as I said, it illustrates the problem,
and does seem to work.
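To make the "papa-bear" option a bit more concrete, here is a minimal sketch of what passing a slot context into {{incrementCount(...)}} could look like. Every type here ({{SketchSlotContext}}, {{SketchCountAcc}}, {{SketchSweepCountAcc}}) is a hypothetical stand-in made up for illustration; none of this is the actual Solr API or the code in the patch.

```java
/** Hypothetical stand-in for a slot context; it only says whether the slot is the allBuckets slot. */
final class SketchSlotContext {
    final boolean allBuckets;

    SketchSlotContext(boolean allBuckets) {
        this.allBuckets = allBuckets;
    }
}

/** Hypothetical count accumulator API with the extra context argument on incrementCount. */
interface SketchCountAcc {
    void incrementCount(int slot, int delta, SketchSlotContext ctx);

    int getCount(int slot);
}

/** Assumed sweep-style accumulator that uses the context to leave the allBuckets slot untouched. */
final class SketchSweepCountAcc implements SketchCountAcc {
    private final int[] counts;

    SketchSweepCountAcc(int numSlots) {
        this.counts = new int[numSlots];
    }

    @Override
    public void incrementCount(int slot, int delta, SketchSlotContext ctx) {
        if (ctx.allBuckets) {
            return; // relatedness is disabled for allBuckets, so accumulate nothing
        }
        counts[slot] += delta;
    }

    @Override
    public int getCount(int slot) {
        return counts[slot];
    }
}
```

The point of the sketch is only that the accumulator can decide per call whether to count, with no need to detect the {{allBuckets}} slot after the fact; the cost is the extra argument on every call.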
> Improve JSON "terms" facet performance when sorted by relatedness
> ------------------------------------------------------------------
>
> Key: SOLR-13132
> URL: https://issues.apache.org/jira/browse/SOLR-13132
> Project: Solr
> Issue Type: Improvement
> Components: Facet Module
> Affects Versions: 7.4, master (9.0)
> Reporter: Michael Gibney
> Priority: Major
> Attachments: SOLR-13132-with-cache-01.patch,
> SOLR-13132-with-cache.patch, SOLR-13132.patch, SOLR-13132_testSweep.patch
>
> Time Spent: 1.5h
> Remaining Estimate: 0h
>
> When sorting buckets by {{relatedness}}, JSON "terms" facet must calculate
> {{relatedness}} for every term.
> The current implementation uses a standard uninverted approach (either
> {{docValues}} or {{UnInvertedField}}) to get facet counts over the domain
> base docSet, and then uses that initial pass as a pre-filter for a
> second-pass, inverted approach of fetching docSets for each relevant term
> (i.e., {{count > minCount}}?) and calculating intersection size of those sets
> with the domain base docSet.
> Over high-cardinality fields, the overhead of per-term docSet creation and
> set intersection operations increases request latency to the point where
> relatedness sort may not be usable in practice (for my use case, even after
> applying the patch for SOLR-13108, for a field with ~220k unique terms per
> core, QTimes for high-cardinality domain docSets were, e.g.: cardinality
> 1816684=9000ms, cardinality 5032902=18000ms).
> The attached patch brings the above example QTimes down to a manageable
> ~300ms and ~250ms respectively. The approach calculates uninverted facet
> counts over domain base, foreground, and background docSets in parallel in a
> single pass. This allows us to take advantage of the efficiencies built into
> the standard uninverted {{FacetFieldProcessorByArray[DV|UIF]}}, and avoids
> the per-term docSet creation and set intersection overhead.
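
As a rough illustration of the single-pass idea in the description above, the sketch below accumulates per-term counts for three doc sets in one sweep over an uninverted doc-to-term-ordinal structure, without building a docSet per term. The class name, the {{int[][]}} uninverted representation, and the use of {{java.util.BitSet}} for doc sets are all assumptions made for this example; the attached patch works against Solr's own structures and differs in detail.

```java
import java.util.BitSet;

/** Hypothetical illustration of counting over base, foreground and background sets in one pass. */
final class SweepCountSketch {

    /**
     * docTermOrds[doc] lists the term ordinals present in that doc (assumed uninverted form).
     * Returns counts[0] = base, counts[1] = foreground, counts[2] = background,
     * each indexed by term ordinal.
     */
    static int[][] sweepCounts(int[][] docTermOrds, int numTerms,
                               BitSet base, BitSet foreground, BitSet background) {
        int[][] counts = new int[3][numTerms];
        for (int doc = 0; doc < docTermOrds.length; doc++) {
            boolean inBase = base.get(doc);
            boolean inFg = foreground.get(doc);
            boolean inBg = background.get(doc);
            if (!inBase && !inFg && !inBg) {
                continue; // doc contributes to none of the three sets
            }
            // Each doc's terms are visited once, updating every set the doc belongs to.
            for (int ord : docTermOrds[doc]) {
                if (inBase) counts[0][ord]++;
                if (inFg) counts[1][ord]++;
                if (inBg) counts[2][ord]++;
            }
        }
        return counts;
    }
}
```

The work here scales with the number of (doc, term) pairs visited rather than with one set intersection per term, which is the property the description relies on for high-cardinality fields.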