yashmayya opened a new pull request, #14843: URL: https://github.com/apache/pinot/pull/14843
- For most leaf stage combine operators (in SSQE), we calculate the number of tasks spawned as `Math.min(numSegments, maxExecutionThreads)` where `maxExecutionThreads` is calculated as `Math.max(1, Math.min(10, numCores / 2))` (unless it is explicitly specified in the query options using `maxExecutionThreads`). - Each of these tasks processes a subset of the segments. However, for the `GroupByCombineOperator` specifically, this `maxExecutionThreads` is currently overridden to be equal to the number of segments being processed (see [here](https://github.com/apache/pinot/blob/d1ac83e98020810560acb0c88a6fb7475c596252/pinot-core/src/main/java/org/apache/pinot/core/operator/combine/GroupByCombineOperator.java#L82-L91) by default for some reason. - This is not so problematic for the single-stage query engine where these tasks are submitted to a fixed thread pool executor service with the number of threads being `2 * numCores` by default ([this one](https://github.com/apache/pinot/blob/d1ac83e98020810560acb0c88a6fb7475c596252/pinot-core/src/main/java/org/apache/pinot/core/query/scheduler/resources/ResourceManager.java#L94-L95)). - In fact, some of the combine operators even have comments stating that while the `numTasks` are the number of async tasks submitted to the executor service, the actual number of threads used by the server would be limited to the fixed number of threads available in the executor service. - However, in the multi-stage engine's leaf stages, the executor service used here is the same one from `QueryRunner` - i.e., [this cached thread pool executor](https://github.com/apache/pinot/blob/fbbabd43f2d8ef6f8e4966d8f7f60dc88dfcede8/pinot-query-runtime/src/main/java/org/apache/pinot/query/runtime/QueryRunner.java#L167-L169) that can spawn infinite threads. This means that for leaf stage operators that involve a group by, the number of threads spawned for a single multi-stage query on each server can be equal to the number of segments being processed on that server (could be thousands of threads which is super problematic). - This patch updates the override to cap the number of tasks spawned by the `GroupByCombineOperator` by default to be the default number of query worker threads (which is `2 * numCores`). This shouldn't impact SSQE performance while also avoiding the creation of a huge number of threads in MSQE. - Note that this is only a bandaid solution and we need to more carefully think through all the implications of the threading model for the MSQE (cc @gortiz who has some interesting ideas here). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org