ankitsultana commented on issue #14685:
URL: https://github.com/apache/pinot/issues/14685#issuecomment-2628031935

   @Jackie-Jiang : adding some color here. I agree completely with not having 
too many configs, but in this case, the workload profile is very different for 
the V1 Engine max initial capacity and the V2 Engine AggregateOperator max 
initial capacity.
   
   The V1 Engine runs the `GroupByOperator` once for each matching segment of a 
query, so it may end up running the operator thousands of times per query. The 
`AggregateOperator` in the MSE (the V2 Engine), by contrast, runs (number of agg 
operators * stageParallelism) times, which is orders of magnitude fewer.
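
   To make the comparison concrete, here is a rough sketch of the arithmetic. 
The class name and the sample numbers (5,000 matching segments, 2 agg operators, 
stage parallelism of 8) are made up for illustration and are not taken from any 
real cluster.

```java
// Illustrative arithmetic only; names and sample values are hypothetical.
final class OperatorRunCount {
  // V1 Engine: one GroupByOperator run per matching segment.
  static long v1Runs(long matchingSegments) {
    return matchingSegments;
  }

  // MSE: number of agg operators * stageParallelism.
  static long mseRuns(long numAggOperators, long stageParallelism) {
    return numAggOperators * stageParallelism;
  }

  public static void main(String[] args) {
    System.out.println("V1 runs:  " + v1Runs(5_000));
    System.out.println("MSE runs: " + mseRuns(2, 8));
  }
}
```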
   
   Since each V1 Engine operator run deals with a much smaller amount of data 
(a single segment), setting a low initial result holder capacity for it makes 
sense.
   
   The V2 Engine `AggregateOperator`, however, aggregates over all the groups 
returned by the leaf operator, which may number in the hundreds of millions per 
server, in which case the hash-map sizing becomes quite crucial.
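
   As a rough illustration of why sizing matters at this scale, the sketch 
below counts how many times a table would have to double to hold a given number 
of groups. It assumes plain capacity doubling and ignores load factor, so it is 
a simplification and not how Pinot's group-by structures are actually 
implemented. The class name and the sample capacities are hypothetical.

```java
// Simplified model: capacity doubles on each resize, load factor ignored.
final class ResizeEstimate {
  // Number of doublings needed before initialCapacity can hold numGroups.
  static int doublingsNeeded(long initialCapacity, long numGroups) {
    if (initialCapacity <= 0) {
      throw new IllegalArgumentException("initialCapacity must be positive");
    }
    int doublings = 0;
    long capacity = initialCapacity;
    while (capacity < numGroups) {
      capacity *= 2;
      doublings++;
    }
    return doublings;
  }

  public static void main(String[] args) {
    // Small initial capacity vs. 100 million groups.
    System.out.println(doublingsNeeded(10_000, 100_000_000));      // prints 14
    // Initial capacity already large enough: no resizing at all.
    System.out.println(doublingsNeeded(100_000_000, 100_000_000)); // prints 0
  }
}
```

   Each doubling implies rehashing or copying the entries held so far, which 
is the cost a larger initial capacity avoids.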
   
   Hence our proposal is to have a separate config for 
`maxInitialResultHolderCapacity` for the V2 Engine Group By operators. This 
means that we should use the new config for sizing both the 
`GroupByResultHolder` and the `GroupIdGenerator` in the V2 Engine 
`AggregateOperator`.
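
   A minimal sketch of what resolving two separate settings could look like is 
below. The config key names, the class, and the default values are all 
hypothetical placeholders for illustration; they are not existing Pinot config 
keys or APIs.

```java
import java.util.Map;

// Hypothetical sketch: key names and defaults are placeholders, not Pinot's.
final class GroupByCapacityConfig {
  static final String V1_KEY = "v1.maxInitialResultHolderCapacity";
  static final String MSE_KEY = "mse.maxInitialResultHolderCapacity";

  // Reads an integer setting, falling back to defaultValue when it is absent.
  static int resolve(Map<String, String> config, String key, int defaultValue) {
    String raw = config.get(key);
    return raw == null ? defaultValue : Integer.parseInt(raw.trim());
  }

  public static void main(String[] args) {
    Map<String, String> config = Map.of(MSE_KEY, "1000000");
    // V1 keeps its own (low) default; the MSE value is tuned independently.
    int v1Capacity = resolve(config, V1_KEY, 10_000);
    int mseCapacity = resolve(config, MSE_KEY, 10_000);
    System.out.println(v1Capacity + " " + mseCapacity);
  }
}
```

   In this sketch the MSE value would size both the `GroupByResultHolder` and 
the `GroupIdGenerator`, while the V1 setting stays untouched.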
   
   <img width="1134" alt="Image" src="https://github.com/user-attachments/assets/1368ccfe-7a4c-48ac-8a69-bcb5afc4491e" />


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

