[jira] [Commented] (LUCENE-10001) Make CollectionTerminatedException handling in MultiCollector configurable

Greg Miller (Jira) Tue, 15 Jun 2021 06:25:06 -0700


    [ https://issues.apache.org/jira/browse/LUCENE-10001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17363632#comment-17363632 ]


Greg Miller commented on LUCENE-10001:
--------------------------------------

Thanks [~jpountz]! I'll clarify a bit but you've mostly nailed it.
{quote}Can you help me understand the use-case a bit more? E.g. if I think of 
the use-case of serving queries on an e-commerce catalog with a single-segment 
index (fully-merged index) sorted by popularity, Lucene would only need to 
collect 100 hits to return the first page of hits (assuming 100 hits per page). 
But then only getting facets for 100 hits sounds way too low to me to be 
actually useful?
{quote}
This is exactly the idea, but collecting a fair amount deeper than 100. The 
facet counts with this approach are indeed incomplete but, for the use-case I 
have in mind, they generally "see enough" hits to still be useful. What I'd 
tweak about your example is that, when generating a first page of 100 results, 
we might collect something more along the lines of 10,000 hits before 
terminating. So we would still gather faceting information for hits well beyond 
that first page of results, but wouldn't cover everything.
{quote}Even in the case when I'm missing something and this would be useful, I 
would rather like the collector that makes the termination decision to wrap the 
other collector in such a case, in order to make the connection more explicit 
that which hits are visited by the faceting collector depends on decisions made 
by the top-hits collector?
{quote}
Hmm, yeah. I think I like this approach better as well. Thanks for the 
suggestion! I'll try something in this direction instead and circle back.
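
A minimal sketch of the wrapping idea, to make the dependency concrete. It uses simplified stand-in types rather than Lucene's real collector API: {{HitCollector}}, {{TerminatedException}}, {{CountingFacets}}, {{TerminatingWrapper}} and {{search}} are all hypothetical names invented for this illustration. The point is only that the collector owning the termination decision forwards hits to the faceting collector, so the faceting collector can never see a hit the terminating collector did not accept.

```java
import java.util.HashMap;
import java.util.Map;

/** Simplified stand-ins for illustration only; these are NOT Lucene's classes. */
final class WrappingTerminationSketch {

    /** Hypothetical minimal collector interface (Lucene's LeafCollector has more methods). */
    interface HitCollector {
        void collect(int doc);
    }

    /** Stand-in for Lucene's CollectionTerminatedException. */
    static final class TerminatedException extends RuntimeException {}

    /** Hypothetical facet counter: tallies one category label per collected doc. */
    static final class CountingFacets implements HitCollector {
        final String[] categoryByDoc;
        final Map<String, Integer> counts = new HashMap<>();

        CountingFacets(String[] categoryByDoc) {
            this.categoryByDoc = categoryByDoc;
        }

        @Override
        public void collect(int doc) {
            counts.merge(categoryByDoc[doc], 1, Integer::sum);
        }
    }

    /** Owns the termination decision and forwards each accepted hit to the wrapped collector. */
    static final class TerminatingWrapper implements HitCollector {
        final HitCollector in;
        final int maxHits;
        int collected;

        TerminatingWrapper(HitCollector in, int maxHits) {
            this.in = in;
            this.maxHits = maxHits;
        }

        @Override
        public void collect(int doc) {
            if (collected >= maxHits) {
                // The wrapped collector sees nothing past this point.
                throw new TerminatedException();
            }
            collected++;
            in.collect(doc);
        }
    }

    /** Feeds docs 0..maxDoc-1 in index order; returns how many docs were collected. */
    static int search(HitCollector collector, int maxDoc) {
        int visited = 0;
        try {
            for (int doc = 0; doc < maxDoc; doc++) {
                collector.collect(doc);
                visited++;
            }
        } catch (TerminatedException e) {
            // Early termination is the expected way to stop, not an error.
        }
        return visited;
    }
}
```

In this sketch {{maxHits}} plays the role of the deeper collection limit (something like the 10,000 discussed above) rather than the page size.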
{quote}Or if you only run facets on top hits, maybe a simpler approach would be 
to run the faceting collector in a second pass, using doc IDs from the 
ScoreDoc[] array returned by the top-hits collector instead of doing a single 
pass with a MultiCollector?
{quote}
Totally! This works well until you mix in the need for {{DrillSideways}}. In 
cases where we need "alternative" counts for a facet (e.g., counting additional 
values for a facet that already has a selection applied), we need to populate 
our {{FacetsCollector}} in that initial pass since, by the second pass, all 
constraints have already been applied, so the "alternative" values would 
incorrectly get zero counts.
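
A toy illustration of why the second pass loses the "alternative" counts. This does not use Lucene's {{DrillSideways}} API; {{Product}}, {{sidewaysColorCounts}} and {{secondPassColorCounts}} are hypothetical names, and it assumes two single-valued dimensions (color and brand) with a selection applied to both.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy model for illustration only; it does not use Lucene's DrillSideways API. */
final class SidewaysCountsSketch {

    /** Hypothetical document with two single-valued facet dimensions. */
    static final class Product {
        final String color;
        final String brand;

        Product(String color, String brand) {
            this.color = color;
            this.brand = brand;
        }
    }

    /** Counts color values over the given docs. */
    static Map<String, Integer> countColors(List<Product> docs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Product p : docs) {
            counts.merge(p.color, 1, Integer::sum);
        }
        return counts;
    }

    /** "Sideways" view for color: docs matching every selection except the one on color. */
    static Map<String, Integer> sidewaysColorCounts(List<Product> all, String selectedBrand) {
        List<Product> matching = new ArrayList<>();
        for (Product p : all) {
            if (p.brand.equals(selectedBrand)) {
                matching.add(p);
            }
        }
        return countColors(matching);
    }

    /** Second-pass view: only hits that already satisfy all selections, color included. */
    static Map<String, Integer> secondPassColorCounts(
            List<Product> all, String selectedColor, String selectedBrand) {
        List<Product> hits = new ArrayList<>();
        for (Product p : all) {
            if (p.brand.equals(selectedBrand) && p.color.equals(selectedColor)) {
                hits.add(p);
            }
        }
        return countColors(hits);
    }
}
```

With a selection of color "red", the second-pass map can only ever contain "red", whereas the sideways map still reports the other colors available under the remaining constraints.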

> Make CollectionTerminatedException handling in MultiCollector configurable
> --------------------------------------------------------------------------
>
>                 Key: LUCENE-10001
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10001
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/search
>    Affects Versions: main (9.0)
>            Reporter: Greg Miller
>            Assignee: Greg Miller
>            Priority: Minor
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> In LUCENE-6772, {{MultiCollector}} was modified so that, when one leaf 
> collector throws a {{CollectionTerminatedException}}, it continues collecting 
> against the other leaf collectors that have not thrown one. It would be nice 
> if this behavior were configurable. Some use-cases might actually want to 
> terminate all leaf collectors early as soon as one signals early 
> termination.
> We could add a configurable option to the {{MultiCollector#wrap}} factory 
> methods that allows users to specify the behavior they want.
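
The two termination policies described in the issue can be sketched as follows. This is a simplified model, not Lucene's {{MultiCollector}}: the {{terminateAllOnFirst}} flag, the {{Multi}} and {{Limited}} classes and the {{search}} driver are hypothetical and exist only to contrast the behaviors.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Simplified model for illustration only; not Lucene's MultiCollector. */
final class MultiCollectSketch {

    /** Hypothetical minimal collector interface. */
    interface HitCollector {
        void collect(int doc);
    }

    /** Stand-in for Lucene's CollectionTerminatedException. */
    static final class TerminatedException extends RuntimeException {}

    /** Hypothetical collector that records docs and terminates after a fixed number of hits. */
    static final class Limited implements HitCollector {
        final int limit;
        final List<Integer> seen = new ArrayList<>();

        Limited(int limit) {
            this.limit = limit;
        }

        @Override
        public void collect(int doc) {
            if (seen.size() >= limit) {
                throw new TerminatedException();
            }
            seen.add(doc);
        }
    }

    /** Fans each hit out to several collectors; the (hypothetical) flag picks the policy. */
    static final class Multi implements HitCollector {
        final List<HitCollector> active;
        final boolean terminateAllOnFirst;

        Multi(boolean terminateAllOnFirst, HitCollector... collectors) {
            this.active = new ArrayList<>(Arrays.asList(collectors));
            this.terminateAllOnFirst = terminateAllOnFirst;
        }

        @Override
        public void collect(int doc) {
            for (Iterator<HitCollector> it = active.iterator(); it.hasNext();) {
                try {
                    it.next().collect(doc);
                } catch (TerminatedException e) {
                    if (terminateAllOnFirst) {
                        throw e; // one collector is done, so stop everything
                    }
                    it.remove(); // drop only the finished collector and keep going
                }
            }
            if (active.isEmpty()) {
                throw new TerminatedException(); // nobody is left to collect for
            }
        }
    }

    /** Feeds docs 0..maxDoc-1 in order and stops quietly on termination. */
    static void search(HitCollector collector, int maxDoc) {
        try {
            for (int doc = 0; doc < maxDoc; doc++) {
                collector.collect(doc);
            }
        } catch (TerminatedException e) {
            // Expected way to end collection early.
        }
    }
}
```

Note that in this sketch the "terminate all" policy depends on collector order: collectors listed before the one that terminates have already received the current doc.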



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
