[ 
https://issues.apache.org/jira/browse/SOLR-13132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17097660#comment-17097660
 ] 

Michael Gibney commented on SOLR-13132:
---------------------------------------

I was worrying about the order of sort keys out of an excess of caution, 
figuring that if it was possible to preserve order, it'd be worth doing so. If 
you don't think it's a concern then it's no concern to me (other than that 
preserving order seems achievable here, and might not be a _bad_ thing :)).
{quote}cleaning the code up to consistently use countAcc.setValues(...) so we 
can override it is a better approach. if the caller also needs to use 
countAcc.getCount(...) to decide when to recurse, so be it.
{quote}
This makes sense to me. The above is also related to the question raised here:
{code:java}
        // nocommit: if we're not using sweeping, why do we need to register 
any mappings?
        // baseSweepingAcc.registerMapping(this, this);
{code}
... indeed we probably could skip this if splitting full-domain accs between 
{{SweepingAcc}} and {{collectAcc}}. The reason I left this in initially was 
partly to preserve key order, but also to help clarify behavior of 
{{registerSweepingAccs(...)}}; specifically, if a processor calls 
{{registerSweepingAccs()}} on a {{SlotAcc}}, then the caller assumes 
responsibility of mediating all the output (calls to {{setValues(...)}}) for 
that acc. Now brainstorming other possible reasons to do this (aside from 
preserving key order), I can't really come up with anything. _If_ there's still 
any interest in preserving order, what would you think about modifying the 
method signature of {{SlotAcc.setValues(SimpleOrderedMap<Object> bucket, int 
slotNum)}} to return boolean instead of void, in the spirit of the 
{{setValues(...)}} method recently added to the {{SweepingAcc}} class? Or maybe 
less invasive, adding an analogous method to {{CountSlotAcc}} that returns 
boolean? The return value would normally be false, but if "true", would 
indicate that the method had handled setting values on all full-domain accs 
(what would formerly have been covered by calling 
{{collectAcc.setValues(...)}})?
{quote}unless there's another use for CollectSlotAccMappingAware i'm 
overlooking?
{quote}
The main/initial purpose of this was to provide an interface for notifying 
FacetProcessors when SlotAccs had been swapped out. Take the case where 
{{sortAcc}} is aliased to an instance of {{SKGSlotAcc}} which is wrapped in a 
{{MultiAcc}} {{collectAcc}}. {{collectAcc.registerSweepingAccs(...)}} could 
return itself (no top-level change -- still containing the other SlotAccs that 
need full collection, but with {{SKGSlotAcc}} removed). The new 
{{SweepSKGSlotAcc}} would be registered with the {{SweepingAcc}} (or 
potentially {{SweepingCountSlotAcc}}), but would still need to register the 
SlotAcc replacement with the {{FacetProcessor}} so that it could make 
corresponding changes to {{sortAcc}} or other aliases. In any case I think it's 
still necessary for a SlotAcc that replaces itself to separately register a new 
read-access instance with the {{SweepingAcc}} (or {{SweepingCountSlotAcc}}), 
since the return value of {{registerSweepingAccs(...)}} is used to modify 
"write" access (on the {{collectAcc}}).
{quote}CountSlotAcc.getBaseSweepingAcc(...) really messy and code-smell-ish 
right now
{quote}
Ah, yeah. I'm sorry I didn't explicitly address the nocommit comment about 
that, but I agree and I think your suggestion of "Perhaps a single 
initSweeping(CollectSlotAccMappingAware notify) for use by processors, that 
fails if called more then once? All other code paths use getBaseSweepingAcc()" 
is a good way forward. I was waiting to address that until after discussion of 
these other questions.
{quote}SweepingCountSlotAcc extends CountSlotArrAcc ... a new concrete 
CountSlotAcc subclass
{quote}
This seems reasonable. If you don't mind, I can take a crack at seeing what 
this might look like. I'm particularly interested in doing this in a way that 
keeps the "sweeping" stuff modular (under the hood, at least), since it's 
possible that other concrete {{CountSlotAcc}} subclasses (whose core "count" 
functionality is substantially different) could still use the exact same code 
for coordinating sweeping. (I'm thinking particularly about cached counts ... 
definitely conscious of not having that bleed into this issue, just trying to 
be transparent!).

> Improve JSON "terms" facet performance when sorted by relatedness 
> ------------------------------------------------------------------
>
>                 Key: SOLR-13132
>                 URL: https://issues.apache.org/jira/browse/SOLR-13132
>             Project: Solr
>          Issue Type: Improvement
>          Components: Facet Module
>    Affects Versions: 7.4, master (9.0)
>            Reporter: Michael Gibney
>            Priority: Major
>         Attachments: SOLR-13132-with-cache-01.patch, 
> SOLR-13132-with-cache.patch, SOLR-13132.patch, SOLR-13132_testSweep.patch
>
>          Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> When sorting buckets by {{relatedness}}, JSON "terms" facet must calculate 
> {{relatedness}} for every term. 
> The current implementation uses a standard uninverted approach (either 
> {{docValues}} or {{UnInvertedField}}) to get facet counts over the domain 
> base docSet, and then uses that initial pass as a pre-filter for a 
> second-pass, inverted approach of fetching docSets for each relevant term 
> (i.e., {{count > minCount}}?) and calculating intersection size of those sets 
> with the domain base docSet.
> Over high-cardinality fields, the overhead of per-term docSet creation and 
> set intersection operations increases request latency to the point where 
> relatedness sort may not be usable in practice (for my use case, even after 
> applying the patch for SOLR-13108, for a field with ~220k unique terms per 
> core, QTime for high-cardinality domain docSets were, e.g.: cardinality 
> 1816684=9000ms, cardinality 5032902=18000ms).
> The attached patch brings the above example QTimes down to a manageable 
> ~300ms and ~250ms respectively. The approach calculates uninverted facet 
> counts over domain base, foreground, and background docSets in parallel in a 
> single pass. This allows us to take advantage of the efficiencies built into 
> the standard uninverted {{FacetFieldProcessorByArray[DV|UIF]}}), and avoids 
> the per-term docSet creation and set intersection overhead.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Reply via email to