msfroh commented on issue #9721: URL: https://github.com/apache/lucene/issues/9721#issuecomment-2098929299
> jpountz said:
> It depends on queries. For term queries, duplicating the overhead of looking up terms in the terms dict may be ok, but for multi-term queries and point queries that often compute the bit set of matches of the whole segment, this could significantly hurt throughput. Maybe it doesn't have to be this way for the first iteration (progress over perfection), but this feels important to me so that we don't have weird recommendations like "only enable intra-segment concurrency if you don't use multi-term or point queries".

I was thinking a bit about intra-segment concurrency this morning, specifically about multi-term, point, and vector queries that do most of their heavy lifting up front (to the point where I've seen a bunch of profiles where relatively little time is spent actually iterating through DISIs). Those queries (or at least their ScorerSuppliers) "know" when they're going to be expensive, so they're in the best position to say "I should be parallelized".

What if ScorerSupplier could take a reference to the IndexSearcher's executor and return a CompletableFuture for the Scorer? Something like TermQuery could return an already-completed future, while "expensive" scorers could be computed on another thread. It could be a quick and easy way to parallelize some of the per-segment computation.
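To make that concrete, here's a rough sketch of the shape I have in mind. None of this is existing Lucene API: the `scorerFuture` method, the cost cutoff, and the way the executor gets passed in are all made up for illustration; a real implementation would probably let each query decide for itself whether it's worth going async.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;

import org.apache.lucene.search.Scorer;
import org.apache.lucene.search.ScorerSupplier;

/**
 * Hypothetical extension of ScorerSupplier: cheap suppliers hand back an
 * already-completed future, while expensive ones do their up-front work
 * (e.g. building a whole-segment bit set) on the searcher's executor.
 */
public abstract class AsyncScorerSupplier extends ScorerSupplier {

  // Made-up cutoff; in practice each query knows best whether its
  // Scorer construction is heavy enough to be worth offloading.
  private static final long EXPENSIVE_COST_THRESHOLD = 1_000_000;

  public CompletableFuture<Scorer> scorerFuture(long leadCost, Executor executor) {
    if (cost() < EXPENSIVE_COST_THRESHOLD) {
      // TermQuery-style suppliers: the Scorer is cheap to build, so
      // stay on the caller's thread and return a completed future.
      try {
        return CompletableFuture.completedFuture(get(leadCost));
      } catch (IOException e) {
        return CompletableFuture.failedFuture(e);
      }
    }
    // Multi-term/point/vector-style suppliers: push the heavy lifting
    // onto the IndexSearcher's executor.
    return CompletableFuture.supplyAsync(
        () -> {
          try {
            return get(leadCost);
          } catch (IOException e) {
            throw new UncheckedIOException(e);
          }
        },
        executor);
  }
}
```

On the searcher side, the per-segment scoring path could kick off these futures for all clauses and join them before iterating, so the up-front work of expensive clauses overlaps instead of running sequentially, while cheap clauses pay essentially nothing extra.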