[ https://issues.apache.org/jira/browse/LUCENE-10001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17363632#comment-17363632 ]
Greg Miller commented on LUCENE-10001:
--------------------------------------

Thanks [~jpountz]! I'll clarify a bit but you've mostly nailed it.

{quote}Can you help me understand the use-case a bit more? E.g. if I think of the use-case of serving queries on an e-commerce catalog with a single-segment index (fully-merged index) sorted by popularity, Lucene would only need to collect 100 hits to return the first page of hits (assuming 100 hits per page). But then only getting facets for 100 hits sounds way too low to me to be actually useful?
{quote}

This is exactly the idea, but collecting a fair amount deeper than 100. The facet counts with this approach are indeed incomplete but, for the use-case I have in mind, they generally "see enough" results to still be useful. What I'd tweak about your example is that, when generating hits for a first page of 100 results, we might collect something more along the lines of 10,000 before terminating. So we would still gather faceting information for hits well beyond that first page of results, but still wouldn't cover everything.

{quote}Even in the case when I'm missing something and this would be useful, I would rather like the collector that makes the termination decision to wrap the other collector in such a case, in order to make the connection more explicit that which hits are visited by the faceting collector depends on decisions made by the top-hits collector?
{quote}

Hmm, yeah. I think I like this approach better as well. Thanks for the suggestion! I'll try something in this direction instead and circle back.

{quote}Or if you only run facets on top hits, maybe a simpler approach would be to run the faceting collector in a second pass, using doc IDs from the ScoreDoc[] array returned by the top-hits collector instead of doing a single pass with a MultiCollector?
{quote}

Totally! This works well until you mix in the need for {{DrillSideways}}.
In cases where we need "alternative" counts for a facet (e.g., counting additional values for a facet that already has a selection applied), we need to populate our {{FacetsCollector}} in that initial pass since, by the second pass, all constraints have been applied and "alternative" values will incorrectly get zero-counts.

> Make CollectionTerminatedException handling in MultiCollector configurable
> --------------------------------------------------------------------------
>
> Key: LUCENE-10001
> URL: https://issues.apache.org/jira/browse/LUCENE-10001
> Project: Lucene - Core
> Issue Type: Improvement
> Components: core/search
> Affects Versions: main (9.0)
> Reporter: Greg Miller
> Assignee: Greg Miller
> Priority: Minor
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> In LUCENE-6772, {{MultiCollector}} was modified to continue collecting
> against other leaf collectors that had not thrown a
> {{CollectionTerminatedException}} in cases where another one does. It would
> be nice if this behavior could be configurable. Some use-cases might actually
> want to early terminate all leaf collectors as soon as one signals early
> termination.
> We could add a configurable option to the {{MultiCollector#wrap}} factory
> methods that allows users to specify the behavior they want.
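
For illustration only, here is a minimal sketch of the wrapping arrangement suggested in the comment above, where the collector that makes the termination decision wraps the faceting collector. It deliberately uses hypothetical stand-in types ({{HitCollector}}, {{TerminatedException}}, {{RecordingCollector}}, {{TerminatingWrapper}}) rather than Lucene's real collector API, so it only models the control flow and is not the change proposed on this issue.

```java
import java.util.ArrayList;
import java.util.List;

public class EarlyTerminationSketch {

    /** Hypothetical stand-in for a leaf collector; NOT the Lucene interface. */
    interface HitCollector {
        void collect(int doc);
    }

    /** Hypothetical stand-in for CollectionTerminatedException. */
    static class TerminatedException extends RuntimeException {}

    /** Records every doc it is given, roughly as a faceting collector would. */
    static class RecordingCollector implements HitCollector {
        final List<Integer> seen = new ArrayList<>();

        @Override
        public void collect(int doc) {
            seen.add(doc);
        }
    }

    /** Forwards hits to the wrapped collector, then terminates once maxHits were forwarded. */
    static class TerminatingWrapper implements HitCollector {
        private final HitCollector in;
        private final int maxHits;
        private int collected;

        TerminatingWrapper(HitCollector in, int maxHits) {
            this.in = in;
            this.maxHits = maxHits;
        }

        @Override
        public void collect(int doc) {
            if (collected >= maxHits) {
                // The wrapper owns the decision, so the wrapped collector stops too.
                throw new TerminatedException();
            }
            collected++;
            in.collect(doc);
        }
    }

    /** Offers docs 0..numDocs-1 in index order; returns the doc at which collection stopped. */
    static int search(HitCollector collector, int numDocs) {
        int doc = 0;
        try {
            for (; doc < numDocs; doc++) {
                collector.collect(doc);
            }
        } catch (TerminatedException e) {
            // Early termination is expected here; the remaining docs are skipped.
        }
        return doc;
    }
}
```

With a limit of 3 and 10 documents, the wrapped collector records docs 0, 1 and 2, and the loop stops when the fourth document is offered. Because the wrapper sits in front of the recording collector, it is explicit that the hits the inner collector sees depend on the wrapper's termination decision.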