yashmayya opened a new pull request, #14843:
URL: https://github.com/apache/pinot/pull/14843

   - For most leaf stage combine operators (in SSQE), we calculate the number 
of tasks spawned as `Math.min(numSegments, maxExecutionThreads)` where 
`maxExecutionThreads` is calculated as `Math.max(1, Math.min(10, numCores / 
2))` (unless it is explicitly specified in the query options using 
`maxExecutionThreads`). 
   - Each of these tasks processes a subset of the segments. However, for the 
`GroupByCombineOperator` specifically, this `maxExecutionThreads` is currently 
overridden to be equal to the number of segments being processed (see 
[here](https://github.com/apache/pinot/blob/d1ac83e98020810560acb0c88a6fb7475c596252/pinot-core/src/main/java/org/apache/pinot/core/operator/combine/GroupByCombineOperator.java#L82-L91)
 by default for some reason. 
   - This is not so problematic for the single-stage query engine where these 
tasks are submitted to a fixed thread pool executor service with the number of 
threads being `2 * numCores` by default ([this 
one](https://github.com/apache/pinot/blob/d1ac83e98020810560acb0c88a6fb7475c596252/pinot-core/src/main/java/org/apache/pinot/core/query/scheduler/resources/ResourceManager.java#L94-L95)).
 
   - In fact, some of the combine operators even have comments stating that 
while the `numTasks` are the number of async tasks submitted to the executor 
service, the actual number of threads used by the server would be limited to 
the fixed number of threads available in the executor service. 
   - However, in the multi-stage engine's leaf stages, the executor service 
used here is the same one from `QueryRunner` - i.e., [this cached thread pool 
executor](https://github.com/apache/pinot/blob/fbbabd43f2d8ef6f8e4966d8f7f60dc88dfcede8/pinot-query-runtime/src/main/java/org/apache/pinot/query/runtime/QueryRunner.java#L167-L169)
 that can spawn infinite threads. This means that for leaf stage operators that 
involve a group by, the number of threads spawned for a single multi-stage 
query on each server can be equal to the number of segments being processed on 
that server (could be thousands of threads which is super problematic).
   - This patch updates the override to cap the number of tasks spawned by the 
`GroupByCombineOperator` by default to be the default number of query worker 
threads (which is `2 * numCores`). This shouldn't impact SSQE performance while 
also avoiding the creation of a huge number of threads in MSQE.
   - Note that this is only a bandaid solution and we need to more carefully 
think through all the implications of the threading model for the MSQE (cc 
@gortiz who has some interesting ideas here).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org

Reply via email to