mcvsubbu commented on issue #10137: URL: https://github.com/apache/pinot/issues/10137#issuecomment-1396217784
@ankitsultana and @Jackie-Jiang Pinot serves queries while consuming rows. To leave enough CPU for query processing, it is best to cap the consumption rate at some maximum. That is the reason for the math as described: the number of rows consumed by any instance within a period of time is limited to a maximum. Further, if we rebalance the table with more instances per replica, the number adjusts itself automatically.

Granted, none of this helps much when there are multiple tables in a tenant. But neither does the proposed new config. Instead of adding yet another config that the administrator cannot do much with in the long run, there are a few things we could do:

- Allow the user to specify a maximum average ingestion rate (bytes may make more sense than rows) for each instance in the cluster.
- Continuously measure the ingestion rate (again, bytes may be better than rows).
- Run an algorithm on the controller to compute an optimal number of rows per segment, given the tables and partitions hosted (in the tenant or in the cluster, as the case may be).
- If any host in the cluster is in danger of exceeding the maximum, raise a metric.

I agree this is more complex than adding another config, but it is also more useful, and hopefully a more interesting problem to solve.
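A minimal Python sketch of two of the ideas above, under stated assumptions. The first function models the existing math, assuming an instance-wide row limit is split evenly across the partitions that instance consumes, so spreading partitions over more instances raises each partition's share without reconfiguration. The other two model the proposed check: sum the measured ingestion rate (bytes/sec) of every partition a host consumes, across all tables, and flag hosts above a configured maximum. All function names, the `table/partition` key format, and the inputs are hypothetical; this is not Pinot's controller code or its config API.

```python
from typing import Dict, List

# Hypothetical model only: not Pinot's controller code or config names.


def per_partition_row_threshold(max_rows_per_instance: int,
                                partitions_on_instance: int) -> int:
    """Split an instance-wide row budget evenly across the partitions the
    instance consumes. With more instances per replica, each instance hosts
    fewer partitions, so each partition's share grows automatically."""
    if partitions_on_instance <= 0:
        raise ValueError("instance must consume at least one partition")
    return max_rows_per_instance // partitions_on_instance


def host_ingestion_rates(partition_rates: Dict[str, float],
                         assignment: Dict[str, List[str]]) -> Dict[str, float]:
    """Sum the measured rate (bytes/sec) of every partition each host consumes.
    Keys are 'table/partition', so partitions of all tables are counted."""
    return {host: sum(partition_rates[p] for p in parts)
            for host, parts in assignment.items()}


def hosts_over_limit(partition_rates: Dict[str, float],
                     assignment: Dict[str, List[str]],
                     max_bytes_per_sec: float) -> List[str]:
    """Return hosts whose aggregate rate exceeds the configured maximum.
    A real system would emit a metric for these rather than return a list."""
    rates = host_ingestion_rates(partition_rates, assignment)
    return sorted(h for h, r in rates.items() if r > max_bytes_per_sec)


if __name__ == "__main__":
    print(per_partition_row_threshold(6_000_000, 3))  # 2000000
    rates = {"t1/0": 4e6, "t1/1": 3e6, "t2/0": 2e6}
    assignment = {"server-1": ["t1/0", "t2/0"], "server-2": ["t1/1"]}
    print(hosts_over_limit(rates, assignment, 5e6))  # ['server-1']
```

Summing across tables is what lets the check see the multi-table tenant case that a per-table row threshold cannot: in the example, neither table alone pushes `server-1` over 5 MB/s, but together they do.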