Re: [PR] WIP: draft of intra segment concurrency [lucene]

via GitHub Mon, 08 Jul 2024 03:12:07 -0700


javanna commented on PR #13542:
URL: https://github.com/apache/lucene/pull/13542#issuecomment-2213602344


   Bulk reply to some of the feedback I got:
   
   hi @shubhamvishu , 
   
   > I know it might be too early to ask(as changes are not yet consolidated), 
but curious if we have any early benchmarking numbers when intra concurrency is 
enabled?
   
   I have run no benchmarks so far. I focused more on making tests happy, which 
felt like a good initial step. We do know that there will be a substantial gain 
when we search force merged indices.
   
   
   @mikemccand 
   
   > I think queries that do substantial work in Weight/ScorerSupplier.get are 
also tricky? E.g. points queries (PointRangeQuery) walks the BKD tree to build 
a bitset. Ideally, if a segment has multiple LeafReaderContextPartitions, we'd 
only do the BKD intersect once, and then the N threads working on those 
partitions can share that bitset?
   
   Yes, definitely, I mentioned this above in the description. Removing 
duplicated effort would be great, but I focused on making things work to start 
with. What you mention is a natural next step. It may though end up influencing 
the design depending on where we decide to de-duplicate work: does the notion 
of segment partition needs to be visible to consumers, or can it be an 
implementation detail?
   
   @jpountz 
   
   > I wonder if it should be on IndexSearcher to not call 
Collector#getLeafCollector multiple times if the first call throws a 
CollectionTerminatedException (what you implemented) or if it should be on the 
CollectorManager to no longer assume that Collector#getLeafCollector gets 
called exactly once per leaf. The latter feels cleaner to me conceptually, and 
I believe that it would also avoid the issue that you had with LRUQueryCache? 
The downside is that we'll need to fix all collectors that make this assumption.
   
   Agreed, that adjustment is a hack. The change in expectation should be 
reflected in the `Collector` API semantics though (rather that 
`CollectorManager`?), is that what you meant? For instance, taking the total 
hit count example, `TotalHitCountCollector` would need to be adapted to handle 
the case where `getLeafCollector` gets called providing the same leaf multiple 
times? It would need to be made synchronized and it would need to track which 
leaves were early terminated? 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Re: [PR] WIP: draft of intra segment concurrency [lucene]

Reply via email to