javanna opened a new issue, #12498:
URL: https://github.com/apache/lucene/issues/12498

   `IndexSearcher` supports parallelizing search across slices when an executor 
is available. Knn queries can also parallelize their rewrite across segments 
using the same executor. Potentially, other operations will be parallelized in 
the future using the same pattern.
   
   Lucene currently has the notion of a `TaskExecutor` (previously named 
`SliceExecutor`, but because knn rewrite is parallelized across segments rather 
than slices, it was recently renamed to `TaskExecutor`) which is responsible 
for offloading tasks to the executor, wait for them all to complete and return 
the corresponding results.
   
   IndexSearcher currently has an [instanceof 
check](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/search/IndexSearcher.java#L1011)
 of the provided executor, and if it's a `ThreadPoolExecutor`, which is likely, 
a `QueueSizeBasedExecutor` is used which tries to be smart about when to 
offload tasks to the executor vs when to execute them on the caller thread. 
This behaviour is not configurable in any way.
   
   As part of exposing concurrent search in Elasticsearch, we have found that 
the current task executor is too opinionated, and either it needs to be 
customizable, or simplified to be less opinionated.
   
   In a server-side search engine, the search itself is likely executed by a 
thread pool that has its own sizing, queueing mechanisms as well as rejection 
policy. When offloading requests to an executor for additional parallelism, it 
is important to be able to determine where the load is going to be and what 
type of workload it maps to. Ideally, the caller thread is merely coordinating 
and not doing any I/O nor CPU intensive work, that is instead all delegated to 
the separate worker threads. Having built-in rules for when to execute certain 
operations on the caller thread may cause more problems than it solves, as it 
is unpredictable and makes sizing of thread pools more difficult, because all 
of a sudden you end up with two thread pools that may execute I/O as well as 
CPU intensive operations.
   
   My conclusion is that if flexibility is needed in terms of possibly 
executing on the caller thread, such behaviour can be included in the executor 
that is provided to the searcher (for example with an adaptive mechanism that 
conditionally executes directly instead of offloading based on queue size like 
`QueueSizeBasedExecutor` does), together with its saturation policy (as opposed 
to catching `RejectedExecutionException` and executing on the caller thread, 
which is potentially dangerous). Also, executing the last slice / task on the 
caller thread, as it's the one waiting for all the tasks to complete, is not 
necessarily addressing a real problem around under-utilization of a thread that 
is doing nothing. That wait is cheap, and it's possibly more important to 
divide the workload among the thread pools. That said, such behavior can also 
be included in the `Executor` itself and does not require additional extension 
points.
   
   My proposal is that we remove the `QueueSizeBasedExecutor` entirely and we 
simply offload every task to the executor, unconditionally. It's up to the 
provided executor to determine what to do when execute is called. That is the 
pluggable behaviour. Lucene should not have strong opinions nor provide 
building blocks for how to execute concurrent tasks.
   
   Additionally, I think that we should unconditionally offload execution to 
the executor when available, even when we have a single slice. It may seem 
counter intuitive but it's again to be able to determine what type of workload 
each thread pool performs.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Reply via email to