javanna opened a new issue, #13952:
URL: https://github.com/apache/lucene/issues/13952

   Search concurrency creates one task per slice. A slice is a collection of one or more segment partitions (although segments are not split into partitions by default just yet). The slicing mechanism allows users to plug in their own slicing logic via the `IndexSearcher#slices` method. Commonly, the slicing depends on the type of operation and the number of targeted documents (e.g. grouping a minimum number of documents into the same slice, as too many small tasks do not increase overall performance), as well as on the number of available threads (it does not make much sense to create more tasks than there are threads to execute them).
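
   For illustration, the grouping heuristic described above could be sketched like this in plain Java (the class name, threshold, and doc counts are hypothetical and are not Lucene's actual defaults or API):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a min-docs-per-slice heuristic: greedily group
// per-segment doc counts into slices until each slice holds at least
// MIN_DOCS_PER_SLICE documents (the last slice may fall short).
public class SliceSketch {
  static final int MIN_DOCS_PER_SLICE = 1000;

  static List<List<Integer>> slices(List<Integer> segmentDocCounts) {
    List<List<Integer>> slices = new ArrayList<>();
    List<Integer> current = new ArrayList<>();
    int docsInCurrent = 0;
    for (int docCount : segmentDocCounts) {
      current.add(docCount);
      docsInCurrent += docCount;
      if (docsInCurrent >= MIN_DOCS_PER_SLICE) {
        slices.add(current);
        current = new ArrayList<>();
        docsInCurrent = 0;
      }
    }
    if (!current.isEmpty()) {
      slices.add(current); // leftover segments form a final, smaller slice
    }
    return slices;
  }

  public static void main(String[] args) {
    // Six segments: the small ones get grouped, the large one stands alone.
    List<List<Integer>> s = slices(List.of(300, 400, 500, 2000, 100, 100));
    System.out.println(s); // [[300, 400, 500], [2000], [100, 100]]
  }
}
```

   With this kind of grouping, the number of tasks naturally shrinks as segments get smaller, which is exactly the flexibility the knn rewrite path currently lacks.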
   
   The knn query rewrite parallelizes execution as well via `TaskExecutor`, but it does not rely on the slicing mechanism. The reasoning is that, due to the nature of the workload, its cost is determined by the number of segments rather than the number of docs. This has been discussed in the past, see #12385. A direct consequence of this decision is that knn query rewrite does not provide any way of limiting the number of tasks created: either users don't provide an executor to the searcher, in which case there is no concurrency anywhere, or knn query rewrite creates one task per segment. As a result, it does not take many parallel knn query rewrite operations for tasks to queue up.
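
   The pattern can be sketched with plain `java.util.concurrent` (this is not Lucene's `TaskExecutor`, and the names and counts are illustrative): one task per segment means the number of queued tasks scales with segment count, regardless of thread count.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch of the one-task-per-segment pattern described above.
public class PerSegmentTasks {

  // Submits one trivial task per segment and returns how many tasks were created.
  static int submitPerSegment(int numSegments, int numThreads) {
    ExecutorService executor = Executors.newFixedThreadPool(numThreads);
    List<Callable<Integer>> tasks = new ArrayList<>();
    for (int i = 0; i < numSegments; i++) {
      final int segment = i;
      tasks.add(() -> segment); // stand-in for a per-segment knn rewrite
    }
    try {
      List<Future<Integer>> futures = executor.invokeAll(tasks);
      return futures.size();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return -1;
    } finally {
      executor.shutdown();
    }
  }

  public static void main(String[] args) {
    // 50 segments but only 4 threads: 50 tasks are still created and queued.
    System.out.println(submitPerSegment(50, 4)); // 50
  }
}
```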
   
   Especially with the recent changes made in #13472 and its work stealing approach, when segment level tasks queue up, their work is in practice executed by the caller thread, and queued items become no-ops once their work has been stolen, yet they still contribute to the queue size until they are "executed" and removed from the queue. This is ok in situations where a separate executor is used for parallel execution, as that may have an unbounded queue, yet we still end up creating potentially hundreds of tasks for no gain. With a single executor model, you may incur search rejections more often than you'd want, due to the no-op tasks queueing up: although segment level tasks are never rejected, but rather executed on the caller thread upon rejection, calls to `IndexSearcher#search` from the same executor may be rejected.
   
   I think that there should be a way to somehow cap the number of created tasks for all parallelizable operations, at the very least based on the number of available threads: why create more tasks than there are threads to execute them? That may be a good reason to move the knn query rewrite back to relying on slicing, as that allows for more flexibility. Yet there are other scenarios where `TaskExecutor` is used to parallelize operations, and it seems like a generally good idea to try to limit the number of tasks created for all these potential use cases that don't necessarily fit into the slicing logic. The next question though is: if we limit the number of created tasks without relying on slicing, how do we batch together different segments? Do we just create one task per segment until we reach the threshold, then execute sequentially whatever is left?
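
   To make the batching question concrete, one possible answer sketched in plain Java (the round-robin assignment, names, and counts are hypothetical, not an actual Lucene API): chunk segments into at most as many batches as there are available threads, and let each task process its batch sequentially.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: cap the task count at maxTasks by assigning segments
// to batches round-robin; each batch would then be one task that processes
// its segments sequentially.
public class CappedBatches {
  static List<List<Integer>> batches(List<Integer> segments, int maxTasks) {
    int numBatches = Math.min(maxTasks, segments.size());
    List<List<Integer>> batches = new ArrayList<>();
    for (int i = 0; i < numBatches; i++) {
      batches.add(new ArrayList<>());
    }
    for (int i = 0; i < segments.size(); i++) {
      batches.get(i % numBatches).add(segments.get(i)); // round-robin assignment
    }
    return batches;
  }

  public static void main(String[] args) {
    List<Integer> segments = List.of(0, 1, 2, 3, 4, 5, 6, 7, 8, 9);
    // e.g. 4 available threads: never more than 4 tasks are created.
    System.out.println(batches(segments, 4)); // [[0, 4, 8], [1, 5, 9], [2, 6], [3, 7]]
  }
}
```

   A drawback of this naive approach compared to slicing is that it ignores segment sizes, so batches can end up unbalanced; that trade-off is part of what this issue is asking about.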


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

