lnbest0707-uber opened a new pull request, #12336: URL: https://github.com/apache/pinot/pull/12336
Summary: Add metrics to monitor segments running with only one replica. This helps track reliability risk, especially during real-world node replacement.

Reasons for not using existing metrics:
- PERCENT_SEGMENTS_AVAILABLE tracks the difference between nOffline and nSegments, where nOffline counts segments with nReplicas == 0. In the node replacement workflow we need to track the number of segments with nReplicas == 1, which no existing metric represents.
- PERCENT_OF_REPLICAS only reflects the segment with the lowest percentage of replicas running, so it tells us whether any segment in the table is missing replicas but not how many segments are affected. For example, if a table has 2000 segments, 1999 with 2 replicas and 1 with 1 replica, the metric shows 50%. If instead 1000 segments have 2 replicas and the other 1000 have 1 replica, the metric still shows 50%. These two cases clearly carry very different levels of risk, and we need to distinguish them.
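
To make the two 2000-segment examples concrete, here is a minimal Python sketch of the arithmetic. It is a simplified model and not Pinot's actual controller code: the function names `percent_of_replicas` and `segments_with_one_replica`, the dictionary layout, and the target replication of 2 are all assumptions made for illustration.

```python
from typing import Dict


def percent_of_replicas(online: Dict[str, int], target_replicas: int) -> float:
    """Min-based percentage: driven by the segment with the fewest online replicas.

    Hypothetical helper modelling the behaviour described for PERCENT_OF_REPLICAS.
    Assumes `online` is non-empty and `target_replicas` is positive.
    """
    return 100.0 * min(online.values()) / target_replicas


def segments_with_one_replica(online: Dict[str, int]) -> int:
    """Count segments that have exactly one online replica (hypothetical helper)."""
    return sum(1 for replicas in online.values() if replicas == 1)


# Case A: 1999 segments with 2 replicas, 1 segment with 1 replica.
case_a = {f"seg_{i}": 2 for i in range(1999)}
case_a["seg_1999"] = 1

# Case B: 1000 segments with 2 replicas, 1000 segments with 1 replica.
case_b = {f"seg_{i}": (2 if i < 1000 else 1) for i in range(2000)}

for name, case in (("A", case_a), ("B", case_b)):
    print(
        f"case {name}: percent_of_replicas={percent_of_replicas(case, 2)}, "
        f"segments_with_one_replica={segments_with_one_replica(case)}"
    )
```

In this model the min-based percentage is identical for both cases, while the count of single-replica segments differs by three orders of magnitude, which is the signal the proposed metric is meant to expose.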