navina commented on PR #9800: URL: https://github.com/apache/pinot/pull/9800#issuecomment-1319109856
Summarizing the discussion (or re-discussion) with @mayankshriv / @snleee / @npawar:

1. We all agree that periodic tasks on the controller are not built for continuously or frequently run jobs. Moreover, this job will increase intra-cluster traffic and can have a negative impact on the performance of Pinot components and possibly even the upstream source (e.g. Kafka).
2. We all agree that this is not the best approach for emitting consumer lag metrics. It would be better to emit metrics on the server side and aggregate them in the monitoring layer. That said, aggregating in the monitoring layer and defining the alerting rules has its challenges, especially during ongoing cluster operations. This has been a challenge in the past with other server-side metrics like `LLC_PARTITION_CONSUMING`.

Here is the plan of action:

1. Let's leave this periodic task as an "opt-in" task in the controller. I will add a controller config that defines whether this task is enabled. By default, it will be turned off.
2. I will take a stab at adding the lag metric from the server side and create a follow-up PR.

I would like to keep both options open for use in production so that we can observe how these metrics/handlers work out under various scenarios.

@mcvsubbu if LinkedIn is also working on the lag metrics, can you please share the design and the timeline for this? I want to make sure the design aligns and works with existing OSS APIs.
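
To make the "opt-in, off by default" behavior from the plan concrete, here is a minimal sketch of how such a gate could look. It is illustrative only and not the actual Pinot implementation: the property name `controller.hypothetical.consumerLagTask.enabled` and the helpers `is_lag_task_enabled` and `register_periodic_tasks` are hypothetical names made up for this example.

```python
from typing import Callable, List, Mapping

# Hypothetical property name, for illustration only; NOT an actual Pinot config key.
ENABLE_LAG_TASK_KEY = "controller.hypothetical.consumerLagTask.enabled"


def is_lag_task_enabled(config: Mapping[str, str]) -> bool:
    """Return True only when the flag is explicitly set to 'true'."""
    # A missing key is treated as "false", so the task stays off by default.
    return config.get(ENABLE_LAG_TASK_KEY, "false").strip().lower() == "true"


def register_periodic_tasks(
    config: Mapping[str, str], lag_task: Callable[[], None]
) -> List[Callable[[], None]]:
    """Build the periodic task list, adding the lag task only when opted in."""
    tasks: List[Callable[[], None]] = []
    if is_lag_task_enabled(config):
        tasks.append(lag_task)
    return tasks
```

With an empty config the lag task is never registered, so existing clusters see no extra intra-cluster traffic unless an operator explicitly sets the flag.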