jpountz opened a new issue, #12375: URL: https://github.com/apache/lucene/issues/12375
### Description

It is a common need to run some logic after a segment has been collected. Even though I can't find previous instances of this discussion, I'm pretty sure that this has been raised several times in the past, and the answer was essentially that this logic can easily be implemented on top of Lucene. One good example of this is our own `FacetsCollector`, which collects the set of matching docs per segment: `getLeafCollector` appends the set of doc IDs that were collected on the previous segment to the list of matching docs, and `getMatchingDocs` takes care of the last segment, since `getLeafCollector` doesn't get called anymore after the last segment has been collected.

However, this approach is not perfect. If you are leveraging Lucene's concurrent search capabilities, this forces the post-collection logic to run in the current thread for at least one segment per slice, instead of using the executor. This is a missed opportunity for search concurrency, since post-collection logic is not always cheap. For instance, in the case of `FacetsCollector` it needs to run `DocIdSetBuilder.build()`, which may need to sort a large array of doc IDs.

Having a `LeafCollector.postCollect()` API or something along these lines would help address this issue, as `postCollect()` would get called on the `IndexSearcher`'s `executor`.

I looked at our collectors to get a sense of how many of them could take advantage of a `postCollect()` hook and found the following ones:

- `org.apache.lucene.facet.FacetsCollector`
- `org.apache.lucene.search.grouping.BlockGroupingCollector`
- `org.apache.lucene.search.grouping.TermGroupFacetCollector`
- `org.apache.lucene.search.suggest.document.TopSuggestDocsCollector`
- `org.apache.lucene.search.CachingCollector`
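To make the proposal concrete, here is a minimal, self-contained sketch of what such a hook could look like. It deliberately does not use Lucene classes: `SketchLeafCollector`, `MatchingDocsLeafCollector`, and `search` are hypothetical stand-ins invented for illustration, and the `Arrays.sort` call only stands in for expensive finalization work such as `DocIdSetBuilder.build()`. The point it tries to show is that per-segment finalization runs inside the task submitted to the executor rather than on the calling thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PostCollectSketch {

  /** Hypothetical stand-in for a leaf collector, extended with the proposed hook. */
  interface SketchLeafCollector {
    void collect(int doc);

    /** Assumed hook: called once per segment, after the last collect() call. */
    default void postCollect() {}
  }

  /** Buffers doc IDs for one segment and finalizes them in postCollect(). */
  static final class MatchingDocsLeafCollector implements SketchLeafCollector {
    private int[] buffer = new int[8];
    private int size = 0;
    private int[] sortedDocs; // populated by postCollect()

    @Override
    public void collect(int doc) {
      if (size == buffer.length) {
        buffer = Arrays.copyOf(buffer, size * 2);
      }
      buffer[size++] = doc;
    }

    @Override
    public void postCollect() {
      // Stand-in for costly per-segment finalization.
      sortedDocs = Arrays.copyOf(buffer, size);
      Arrays.sort(sortedDocs);
      buffer = null;
    }

    int[] sortedDocs() {
      return sortedDocs;
    }
  }

  /** Simulates a concurrent search: one executor task per segment. */
  public static List<int[]> search(int[][] segments, ExecutorService executor)
      throws Exception {
    List<Future<int[]>> futures = new ArrayList<>();
    for (int[] segmentDocs : segments) {
      futures.add(executor.submit(() -> {
        MatchingDocsLeafCollector leaf = new MatchingDocsLeafCollector();
        for (int doc : segmentDocs) {
          leaf.collect(doc);
        }
        leaf.postCollect(); // runs on the executor thread, not the caller's
        return leaf.sortedDocs();
      }));
    }
    // The calling thread only gathers already-finalized per-segment results.
    List<int[]> results = new ArrayList<>();
    for (Future<int[]> future : futures) {
      results.add(future.get());
    }
    return results;
  }

  public static void main(String[] args) throws Exception {
    ExecutorService executor = Executors.newFixedThreadPool(2);
    try {
      int[][] segments = {{5, 1, 3}, {9, 2}};
      for (int[] docs : search(segments, executor)) {
        System.out.println(Arrays.toString(docs));
      }
    } finally {
      executor.shutdown();
    }
  }
}
```

In this sketch the caller never performs the sort itself, which is the property the proposed `postCollect()` hook is after. How such a hook would actually be wired into `IndexSearcher` is left open here.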