jpountz opened a new issue, #12375:
URL: https://github.com/apache/lucene/issues/12375

   ### Description
   
   It is a common need to run some logic after a segment has been collected. 
Even though I can't find previous instances of this discussion, I'm pretty sure 
that this has been raised several times in the past, and the answer was 
essentially that this logic can easily be implemented on top of Lucene. One 
good example of this is our own `FacetsCollector`, which collects the set of 
matching docs per segment: `getLeafCollector` appends the set of doc IDs that 
were collected on the previous segment to the set, and `getMatchingDocs` takes 
care of the last segment, since `getLeafCollector` doesn't get called anymore 
after the last segment has been collected.
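
The pattern described above can be sketched as follows. This is a minimal, self-contained illustration and not the actual `FacetsCollector` code: `LazySegmentCollector` and its nested `Leaf` interface are hypothetical stand-ins, and sorting the buffered doc IDs stands in for whatever post-collection work a real collector does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for a collector that finalizes each segment lazily. */
public class LazySegmentCollector {
    /** Hypothetical per-segment callback, loosely modeled on a leaf collector. */
    public interface Leaf {
        void collect(int doc);
    }

    private final List<int[]> matchingDocs = new ArrayList<>();
    private List<Integer> current; // doc IDs of the segment being collected, or null

    /** Called once per segment; first finalizes the previous segment, if any. */
    public Leaf getLeafCollector() {
        finishCurrentSegment();
        current = new ArrayList<>();
        return doc -> current.add(doc);
    }

    /** Finalizes the last segment, since no later getLeafCollector() call will. */
    public List<int[]> getMatchingDocs() {
        finishCurrentSegment();
        return matchingDocs;
    }

    // Post-collection work (here: a sort) runs in whichever thread calls this method.
    private void finishCurrentSegment() {
        if (current == null) {
            return;
        }
        int[] docs = current.stream().mapToInt(Integer::intValue).toArray();
        Arrays.sort(docs);
        matchingDocs.add(docs);
        current = null;
    }
}
```

Note that the work for the last segment only happens when the caller asks for the results, which is the limitation discussed next.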
   
   However, this approach is not perfect. If you are leveraging Lucene's 
concurrent search capabilities, this forces the post-collection logic to run in 
the current thread for at least one segment per slice, instead of using the 
executor. This is a missed opportunity for search concurrency, since 
post-collection logic is not always cheap. For instance, in the case of 
`FacetsCollector` it needs to run `DocIdSetBuilder.build()` which may need to 
sort a large array of doc IDs. Having a `LeafCollector.postCollect()` API or 
something along these lines would help address this issue, as `postCollect()` 
would get called on the `IndexSearcher`'s `executor`.
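
To make the idea concrete, here is a rough sketch of what such a hook could look like. It assumes a hypothetical `Leaf` interface with a `postCollect()` method and a simplified `search` driver that treats each segment as its own task; none of these names are Lucene's actual API, and the real `IndexSearcher` slicing logic is more involved.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

public class PostCollectSketch {
    /** Hypothetical leaf collector with the proposed hook; not Lucene's interface. */
    public interface Leaf {
        void collect(int doc);

        void postCollect(); // called once, after the last collect() on this segment
    }

    /** Buffers doc IDs and sorts them in postCollect(), standing in for costly work. */
    public static final class SortingLeaf implements Leaf {
        private final List<Integer> buffer = new ArrayList<>();
        public int[] sorted;
        public String postCollectThread;

        @Override
        public void collect(int doc) {
            buffer.add(doc);
        }

        @Override
        public void postCollect() {
            sorted = buffer.stream().mapToInt(Integer::intValue).toArray();
            Arrays.sort(sorted);
            postCollectThread = Thread.currentThread().getName();
        }
    }

    /** Runs each segment as one task, so postCollect() executes on the executor. */
    public static List<SortingLeaf> search(int[][] segments, ExecutorService executor)
            throws Exception {
        List<Future<SortingLeaf>> futures = new ArrayList<>();
        for (int[] segment : segments) {
            Callable<SortingLeaf> task = () -> {
                SortingLeaf leaf = new SortingLeaf();
                for (int doc : segment) {
                    leaf.collect(doc);
                }
                leaf.postCollect(); // same worker thread that collected the segment
                return leaf;
            };
            futures.add(executor.submit(task));
        }
        List<SortingLeaf> results = new ArrayList<>();
        for (Future<SortingLeaf> future : futures) {
            results.add(future.get()); // the caller only waits; it does no sorting
        }
        return results;
    }
}
```

With this shape, the calling thread only gathers finished results, while the sort for every segment, including the last one, happens on a worker thread.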
   
   I looked at our collectors to get a sense of how many of our collectors 
could take advantage of a `postCollect()` hook and found the following ones:
    - `org.apache.lucene.facet.FacetsCollector`
    - `org.apache.lucene.search.grouping.BlockGroupingCollector`
    - `org.apache.lucene.search.grouping.TermGroupFacetCollector`
    - `org.apache.lucene.search.suggest.document.TopSuggestDocsCollector`
    - `org.apache.lucene.search.CachingCollector`


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

