ankitsultana commented on issue #14685: URL: https://github.com/apache/pinot/issues/14685#issuecomment-2628031935
@Jackie-Jiang: adding some color here. I completely agree that we should avoid having too many configs, but in this case the workload profile is very different for the V1 Engine max initial capacity and the V2 Engine (MSE) `AggregateOperator` max initial capacity. The V1 Engine runs the `GroupByOperator` once for each matching segment of a query, so it may end up running the operator thousands of times per query. The `AggregateOperator` in the MSE, by contrast, runs (number of agg operators * stageParallelism) times, which is orders of magnitude fewer. Since the V1 Engine deals with a much smaller amount of data in each run, setting a low initial result holder capacity for it makes sense. The V2 Engine `AggregateOperator`, however, aggregates over all the groups returned by the leaf operator, which may be in the hundreds of millions per server, in which case the hash-map sizing becomes crucial.

Hence our proposal is to have a separate `maxInitialResultHolderCapacity` config for the V2 Engine group-by operators. The new config would then be used to size both the `GroupByResultHolder` and the `GroupIdGenerator` in the V2 Engine `AggregateOperator`.

<img width="1134" alt="Image" src="https://github.com/user-attachments/assets/1368ccfe-7a4c-48ac-8a69-bcb5afc4491e" />
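
To make the trade-off above concrete, here is a rough back-of-the-envelope sketch in Java. It is not Pinot code: the class name, the method names, and the two cap values are hypothetical, and the resize model (a map that simply doubles until it fits, ignoring load factor) is a deliberate simplification. It only illustrates why a low cap is cheap when the operator runs once per segment, and why the same low cap forces many resizes when a single operator has to hold on the order of 100 million groups.

```java
// Illustrative sketch only. Names and numbers are assumptions, not Pinot defaults or APIs.
final class ResultHolderCapacitySketch {
  // Hypothetical caps for the two engines.
  static final int V1_MAX_INITIAL_CAPACITY = 10_000;
  static final int MSE_MAX_INITIAL_CAPACITY = 1_000_000;

  private ResultHolderCapacitySketch() {
  }

  // V1: the group-by operator runs once per matching segment.
  static long v1OperatorRuns(int matchingSegments) {
    return matchingSegments;
  }

  // MSE: the aggregate operator runs (number of agg operators * stageParallelism) times.
  static long mseOperatorRuns(int numAggOperators, int stageParallelism) {
    return (long) numAggOperators * stageParallelism;
  }

  // Initial capacity is the expected group count, clamped to [1, maxInitialCapacity].
  static int initialCapacity(long expectedGroups, int maxInitialCapacity) {
    return (int) Math.max(1L, Math.min(expectedGroups, (long) maxInitialCapacity));
  }

  // Slots allocated up front across all operator runs of one query.
  static long totalPreallocatedSlots(long operatorRuns, int initialCapacity) {
    return operatorRuns * initialCapacity;
  }

  // Simplified model: how many times a doubling map must grow to reach targetGroups.
  static int doublingsNeeded(long initialCapacity, long targetGroups) {
    if (initialCapacity < 1) {
      throw new IllegalArgumentException("initialCapacity must be >= 1");
    }
    int doublings = 0;
    long capacity = initialCapacity;
    while (capacity < targetGroups) {
      capacity *= 2;
      doublings++;
    }
    return doublings;
  }
}
```

Under these assumed numbers, a V1 query touching 2,000 segments with a 10,000 cap pre-allocates 20 million slots in total, which is why a low cap matters there. An MSE operator that must hold 100 million groups would need 14 doublings starting from 10,000 but only 7 starting from 1,000,000, and there are only a handful of such operators per query.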