javanna commented on PR #13542: URL: https://github.com/apache/lucene/pull/13542#issuecomment-2213602344
Bulk reply to some of the feedback I got: hi @shubhamvishu , > I know it might be too early to ask(as changes are not yet consolidated), but curious if we have any early benchmarking numbers when intra concurrency is enabled? I have run no benchmarks so far. I focused more on making tests happy, which felt like a good initial step. We do know that there will be a substantial gain when we search force merged indices. @mikemccand > I think queries that do substantial work in Weight/ScorerSupplier.get are also tricky? E.g. points queries (PointRangeQuery) walks the BKD tree to build a bitset. Ideally, if a segment has multiple LeafReaderContextPartitions, we'd only do the BKD intersect once, and then the N threads working on those partitions can share that bitset? Yes, definitely, I mentioned this above in the description. Removing duplicated effort would be great, but I focused on making things work to start with. What you mention is a natural next step. It may though end up influencing the design depending on where we decide to de-duplicate work: does the notion of segment partition needs to be visible to consumers, or can it be an implementation detail? @jpountz > I wonder if it should be on IndexSearcher to not call Collector#getLeafCollector multiple times if the first call throws a CollectionTerminatedException (what you implemented) or if it should be on the CollectorManager to no longer assume that Collector#getLeafCollector gets called exactly once per leaf. The latter feels cleaner to me conceptually, and I believe that it would also avoid the issue that you had with LRUQueryCache? The downside is that we'll need to fix all collectors that make this assumption. Agreed, that adjustment is a hack. The change in expectation should be reflected in the `Collector` API semantics though (rather that `CollectorManager`?), is that what you meant? For instance, taking the total hit count example, `TotalHitCountCollector` would need to be adapted to handle the case where `getLeafCollector` gets called providing the same leaf multiple times? It would need to be made synchronized and it would need to track which leaves were early terminated? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org