expani commented on issue #13745:
URL: https://github.com/apache/lucene/issues/13745#issuecomment-3058268142

I was looking into integrating intra-segment concurrent search and found that the same problem also applies to downstream consumers of Lucene, such as OpenSearch, Elasticsearch and Solr, which use Collectors to build their aggregation frameworks.
   
Since a Query/Collector has to be made aware, via its constructor, that it is participating in intra-segment concurrent search (as the [initial PR did for TotalHitCountCollectorManager](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/search/TotalHitCountCollectorManager.java#L48-L62)), the required changes would keep growing unless we handle them on a case-by-case basis.
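The following minimal, Lucene-free Java sketch shows why this constructor-level awareness matters. It assumes a collector that can take a shortcut by using a segment's precomputed match count. `Segment`, `Partition` and `HitCounter` are hypothetical names rather than Lucene classes, and the sketch does not reproduce how `TotalHitCountCollectorManager` is implemented. Without the flag, the shortcut adds the whole segment's count once per partition, so matches are double-counted.

```java
import java.util.List;

// Hypothetical stand-ins, not Lucene types: a segment with its sorted matching
// doc IDs, and a partition covering doc IDs [minDoc, maxDoc) of that segment.
record Segment(int[] matchingDocs) {}
record Partition(Segment segment, int minDoc, int maxDoc) {}

// Hypothetical hit counter. The per-segment shortcut is only correct when the
// constructor says segments are never split into partitions.
class HitCounter {
  private final boolean hasSegmentPartitions;

  HitCounter(boolean hasSegmentPartitions) {
    this.hasSegmentPartitions = hasSegmentPartitions;
  }

  int count(List<Partition> partitions) {
    int total = 0;
    for (Partition p : partitions) {
      if (!hasSegmentPartitions) {
        // Shortcut: assumes this partition is the whole segment.
        total += p.segment().matchingDocs().length;
      } else {
        // Partition-aware: only count matches inside this partition's range.
        for (int doc : p.segment().matchingDocs()) {
          if (doc >= p.minDoc() && doc < p.maxDoc()) {
            total++;
          }
        }
      }
    }
    return total;
  }
}
```

Consider a segment whose matches are docs 1, 5 and 9, split into the partitions `[0, 5)` and `[5, 10)`. For that input, `new HitCounter(true)` returns 3, while `new HitCounter(false)` returns 6.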
   
   
Recording the call flow for my own understanding:
```
IndexSearcher#search(Query query, CollectorManager<C, T> collectorManager)
    --- calls ---
IndexSearcher#search(Weight weight, CollectorManager<C, T> collectorManager, C firstCollector)
    --- calls, from one Runnable per slice ---
IndexSearcher#search(LeafReaderContextPartition[] partitions, Weight weight, Collector collector)
    --- calls ---
IndexSearcher#searchLeaf(LeafReaderContext ctx, int minDocId, int maxDocId, Weight weight, Collector collector)
```
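The fan-out above can be modeled in a few lines of Java. This sketch assumes that a slice is simply a list of `(segmentId, minDocId, maxDocId)` partitions. `LeafPartition` and `CallFlowModel` are hypothetical names that only mirror the shape of the `IndexSearcher` methods referenced in their comments.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for a LeafReaderContextPartition: a doc ID range of one segment.
record LeafPartition(int segmentId, int minDocId, int maxDocId) {}

// Hypothetical, Lucene-free model of the call flow; not the real IndexSearcher.
class CallFlowModel {
  // Every searchLeaf invocation, recorded thread-safely since slices run concurrently.
  final Queue<LeafPartition> searchLeafCalls = new ConcurrentLinkedQueue<>();

  // Mirrors search(Weight, CollectorManager, C): one task per slice.
  void search(List<List<LeafPartition>> slices) {
    ExecutorService executor = Executors.newFixedThreadPool(Math.max(1, slices.size()));
    try {
      List<Future<?>> futures = new ArrayList<>();
      for (List<LeafPartition> slice : slices) {
        futures.add(executor.submit(() -> searchSlice(slice)));
      }
      for (Future<?> future : futures) {
        future.get(); // surface failures from any slice
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    } catch (ExecutionException e) {
      throw new IllegalStateException(e.getCause());
    } finally {
      executor.shutdown();
    }
  }

  // Mirrors search(LeafReaderContextPartition[], Weight, Collector): one slice, sequentially.
  private void searchSlice(List<LeafPartition> slice) {
    for (LeafPartition partition : slice) {
      searchLeaf(partition);
    }
  }

  // Mirrors searchLeaf(ctx, minDocId, maxDocId, weight, collector); here it only records the call.
  private void searchLeaf(LeafPartition partition) {
    searchLeafCalls.add(partition);
  }
}
```

Suppose two slices each contain a partition of segment 0. In that case, `searchLeafCalls` ends up with two entries for segment 0, one from each task.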
   
Two threads can invoke `searchLeaf` with the same `LeafReaderContext` but for different partitions (doc ID ranges) of that segment.
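One way to picture these partitions is to split a segment's doc ID space `[0, maxDoc)` into contiguous, non-overlapping ranges. The sketch below is a hypothetical helper that assumes an even split. It is not Lucene's actual slicing logic in `IndexSearcher`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical doc ID range [minDocId, maxDocId) within one segment.
record DocRange(int minDocId, int maxDocId) {}

// Hypothetical helper: splits [0, maxDoc) into at most numPartitions contiguous ranges.
class SegmentPartitioner {
  static List<DocRange> partition(int maxDoc, int numPartitions) {
    if (maxDoc < 0 || numPartitions <= 0) {
      throw new IllegalArgumentException("maxDoc must be >= 0 and numPartitions > 0");
    }
    List<DocRange> ranges = new ArrayList<>();
    int base = maxDoc / numPartitions;
    int remainder = maxDoc % numPartitions;
    int start = 0;
    for (int i = 0; i < numPartitions; i++) {
      // The first `remainder` ranges get one extra doc, so sizes differ by at most 1.
      int size = base + (i < remainder ? 1 : 0);
      if (size > 0) { // skip empty ranges when maxDoc < numPartitions
        ranges.add(new DocRange(start, start + size));
      }
      start += size;
    }
    return ranges;
  }
}
```

For example, `SegmentPartitioner.partition(10, 3)` yields `[0, 4)`, `[4, 7)` and `[7, 10)`. Two threads that each receive one of these ranges for the same segment correspond to the situation described above.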
   
Operations inside `searchLeaf` that should run only once per segment, even during intra-segment concurrent search:
```
Collector#getLeafCollector()
Weight#scorerSupplier()
ScorerSupplier#bulkScorer()
LeafReaderContext#reader()#getLiveDocs()
LeafCollector#finish()
```
   
Some downstream users also perform additional operations when profiling queries.
   
My proposal is to handle the de-duplication in `IndexSearcher` and ensure that the steps listed above are performed only once per segment rather than once per partition.
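The sketch below shows the once-only mechanics such de-duplication might rely on. It uses hypothetical types rather than Lucene's. `PerSegmentOnce` keys per-segment state by segment: it runs `setup` for the first partition that arrives and `finish` for the last partition to leave. The sketch assumes that each partition calls `acquire` and `release` exactly once, and that the number of partitions per segment is known in advance. It does not decide which of the listed operations are safe to share across threads.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical helper, not a Lucene API. K stands in for the segment key
// (e.g. a LeafReaderContext); V for per-segment state that is expensive to set up.
class PerSegmentOnce<K, V> {
  private static final class Entry<S> {
    final S value;
    final AtomicInteger remainingPartitions;

    Entry(S value, int partitionCount) {
      this.value = value;
      this.remainingPartitions = new AtomicInteger(partitionCount);
    }
  }

  private final ConcurrentHashMap<K, Entry<V>> entries = new ConcurrentHashMap<>();
  private final Function<K, V> setup; // runs once per segment
  private final Consumer<V> finish;   // runs once per segment, after its last partition

  PerSegmentOnce(Function<K, V> setup, Consumer<V> finish) {
    this.setup = setup;
    this.finish = finish;
  }

  // Called by every partition before it searches; the first caller runs setup.
  V acquire(K segment, int partitionCount) {
    return entries.computeIfAbsent(segment, k -> new Entry<>(setup.apply(k), partitionCount)).value;
  }

  // Called by every partition after it searches; only the last caller runs finish.
  void release(K segment) {
    Entry<V> entry = entries.get(segment);
    if (entry == null) {
      throw new IllegalStateException("release() without acquire() for " + segment);
    }
    if (entry.remainingPartitions.decrementAndGet() == 0) {
      entries.remove(segment);
      finish.accept(entry.value);
    }
  }
}
```

`ConcurrentHashMap#computeIfAbsent` applies the mapping function at most once per key. As a result, concurrent partitions of the same segment block briefly and then share a single result. The atomic countdown provides the matching once-only `finish`. Because several threads use the shared value, it would need to be immutable or otherwise safe for concurrent use.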
   
@javanna I would like to pick this up, unless you are almost done with PointRangeQuery.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org