[GitHub] [pinot] navina commented on pull request #9800: Adding a consumer lag as metric via a periodic task in controller

GitBox Wed, 16 Nov 2022 15:30:17 -0800


navina commented on PR #9800:
URL: https://github.com/apache/pinot/pull/9800#issuecomment-1317817172


   > I'm not sure running the periodic task every minute or so is a good idea!
   
   Agreed here :) We will likely not set it to query every minute.
   
   > If we choose to emit the metric on the server side, then we can change the 
gauge as soon as the events are consumed. It's just up to the metric & 
monitoring system (outside pinot) to aggregate the metric values (e.g. finding 
max value) for different replicas of each partition.
   
   Agreed that we can detect it sooner, but there doesn't seem to be a good way 
to aggregate it in the monitoring layer in the presence of a rebalance 
(clean/unclean) or consuming segment redistribution for any other reason. 
   We have also noted that sometimes all consuming segments get into the ERROR 
state (maybe the consumer crashed or hung) and yet the monitoring metric 
`LLC_PARTITION_CONSUMING` doesn't detect it [ @npawar may have more context ].
   Moreover, adding a metric in the segment data manager feels like tip-toeing 
through a minefield.
   
   A much cleaner way would be to emit the metric at the partition level, 
either directly from the connector plugin or from the server (tagged with a 
stable replica id instead of the server tag). I believe there are some 
dependency issues to be sorted out before getting there. 
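   
   To illustrate the stable replica id idea, here is a rough sketch. It is 
not Pinot code: `PartitionLagGauges`, `report`, and `maxLag` are hypothetical 
names, and it assumes a replica keeps the same id when it moves to another 
server. Because the gauge is keyed by (table, partition, replica id), a 
replica that moves overwrites its old value instead of leaving a stale 
per-server series behind.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch; none of these names exist in Pinot. */
class PartitionLagGauges {
  // "table:partition" -> (stable replica id -> last reported lag).
  // The server instance is deliberately NOT part of the key (assumption).
  private final Map<String, Map<Integer, Long>> lagByPartition = new ConcurrentHashMap<>();

  /** Records the latest lag for one replica, overwriting its previous value. */
  public void report(String table, int partitionId, int replicaId, long lag) {
    lagByPartition
        .computeIfAbsent(table + ":" + partitionId, k -> new ConcurrentHashMap<>())
        .put(replicaId, lag);
  }

  /** Returns the max lag across replicas of a partition, or -1 if none reported. */
  public long maxLag(String table, int partitionId) {
    Map<Integer, Long> replicas = lagByPartition.get(table + ":" + partitionId);
    if (replicas == null) {
      return -1L;
    }
    return replicas.values().stream().mapToLong(Long::longValue).max().orElse(-1L);
  }
}
```

   With this keying, if replica 0 of a partition last reported a large lag 
from one server and then resumes on another server after a rebalance, its 
next report replaces the old value, so the per-partition max reflects only 
live replicas. It does not address replicas that stop reporting altogether.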
   
   > I believe we do invoke the code to remove a metric each time a partition 
completes consumption.
   
   This works well in a stable state and during clean operations, but it 
doesn't cover unclean shutdowns or crashes in production, and it has generally 
been observed to be not very reliable. 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

