harshavamsi commented on issue #9721: URL: https://github.com/apache/lucene/issues/9721#issuecomment-2099280561
> > jpountz said:
> > It depends on queries. For term queries, duplicating the overhead of looking up terms in the terms dict may be ok, but for multi-term queries and point queries that often compute the bit set of matches of the whole segment, this could significantly hurt throughput. Maybe it doesn't have to be this way for the first iteration (progress over perfection), but this feels important to me so that we don't have weird recommendations like "only enable intra-segment concurrency if you don't use multi-term or point queries".
>
> I was thinking a bit about intra-segment concurrency this morning and got thinking specifically about multi-term, point, and vector queries that do most of their heavy-lifting up front (to the point where I've seen a bunch of profiles where relatively little time is spent actually iterating through DISIs).
>
> Those queries (or at least their ScorerSuppliers) "know" when they're going to be expensive, so it feels like they're in the best position to say "I should be parallelized". What if ScorerSupplier could take a reference to the IndexSearcher's executor and return a CompletableFuture for the Scorer? Something like TermQuery could return a "completed" future, while "expensive" scorers could be computed on another thread. It could be a quick and easy way to parallelize some of the per-segment computation.

To add on to this, I was wondering if we could extend the concurrency further, to within a single query. For example, a range query today traverses the BKD tree over the whole range. What if we split the range into sub-ranges and handed each one to an executor to intersect? Then the DISI could be built by multiple threads. Similarly, in a terms query, each term could build its BitSet/iterator in parallel, and the conjunctions/disjunctions over them could then happen all at once.
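As a rough illustration of the range-splitting idea, here is a minimal sketch in plain Java. It is not Lucene code and uses no Lucene APIs: a pair of value-sorted arrays stands in for the BKD tree, `java.util.BitSet` stands in for the per-segment set of matches, and the names `ParallelRangeSketch`, `intersect`, `lowerBound`, and `parallelRange` are made up for this example. It also assumes the range is small enough that `hi - lo + 1` and `start + step` do not overflow a `long`.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical sketch: none of these names are Lucene APIs. */
class ParallelRangeSketch {

    /** Index of the first element >= key in a sorted array (a.length if there is none). */
    static int lowerBound(long[] a, long key) {
        int low = 0, high = a.length;
        while (low < high) {
            int mid = (low + high) >>> 1;
            if (a[mid] < key) low = mid + 1; else high = mid;
        }
        return low;
    }

    /**
     * Collects the doc IDs whose value lies in [lo, hi].
     * sortedValues[i] is the value of document docIds[i]; this pair of arrays
     * is a stand-in for intersecting one sub-range against a BKD tree.
     */
    static BitSet intersect(long[] sortedValues, int[] docIds, int maxDoc, long lo, long hi) {
        BitSet hits = new BitSet(maxDoc);
        for (int i = lowerBound(sortedValues, lo);
             i < sortedValues.length && sortedValues[i] <= hi; i++) {
            hits.set(docIds[i]);
        }
        return hits;
    }

    /** Splits [lo, hi] into at most `slices` sub-ranges, intersects each on the executor, and ORs the results. */
    static BitSet parallelRange(long[] sortedValues, int[] docIds, int maxDoc,
                                long lo, long hi, int slices, Executor executor) {
        long span = hi - lo + 1;                                // assumption: no overflow
        long step = Math.max(1, (span + slices - 1) / slices);  // ceiling division, at least 1
        List<CompletableFuture<BitSet>> parts = new ArrayList<>();
        for (long start = lo; start <= hi; start += step) {
            long from = start;
            long to = Math.min(hi, start + step - 1);
            parts.add(CompletableFuture.supplyAsync(
                () -> intersect(sortedValues, docIds, maxDoc, from, to), executor));
        }
        // Merge on the calling thread; each task filled its own private BitSet.
        BitSet result = new BitSet(maxDoc);
        for (CompletableFuture<BitSet> part : parts) {
            result.or(part.join());
        }
        return result;
    }

    public static void main(String[] args) {
        long[] values = {1, 3, 5, 7, 9, 11};  // sorted by value
        int[] docs = {4, 0, 2, 5, 1, 3};      // docs[i] holds values[i]
        ExecutorService pool = Executors.newFixedThreadPool(3);
        try {
            System.out.println(parallelRange(values, docs, 6, 3, 9, 3, pool)); // prints {0, 1, 2, 5}
        } finally {
            pool.shutdown();
        }
    }
}
```

Each task writes to its own `BitSet` and the merge happens on the calling thread, so the sketch needs no locking. Whether the extra allocation and the final OR pay off compared with a single traversal would depend on the range size and the number of slices, which this sketch does not measure.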
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org