lnbest0707-uber commented on PR #12336: URL: https://github.com/apache/pinot/pull/12336#issuecomment-1924507664
> > > The approach looks good. I was wondering if we should have some definitions, e.g. when nReplica equals the following, what should we call each status? HighAvailability is not a very well-defined term.
> > > ```
> > > - 0 : OFFLINE
> > > - 1 : ??
> > > - expected - 1 : ??
> > > - expected : HEALTHY
> > > ```
> >
> > Thanks for the review. Yes, having a well-defined term is important. The "high availability" term I am using refers to the [Hadoop HA definition](https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithNFS.html), which means running 2 or more nodes for the same instance as active/standby(s). That is not exactly the same situation as here, and I am open to other names.
> > Theoretically, running with expected - 1 is what we always observe during real-world node replacement. Probably track "actual <= expected - 1" as SEGMENTS_WITH_LESS_REPLICAS? And running with only 1 replica carries far more risk than other below-expected values (e.g. 2, 3, ...). Not sure if it is overkill to track it separately.
>
> I think SEGMENTS_WITH_REDUCED_CAPACITY or LESS_REPLICAS is what is useful for admins. This metric should alert users when there is a prolonged period of reduced capacity.

Yes, switched to "SEGMENTS_WITH_LESS_REPLICAS".
The failure in the integration test looks unrelated.
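
For illustration only, the replica-count thresholds discussed in this thread could be sketched as below. This is a Python sketch under stated assumptions, not the code in the PR: `ReplicaStatus`, `classify_segment`, and `count_segments_with_less_replicas` are hypothetical names, and treating fully offline segments (0 replicas) as a bucket separate from SEGMENTS_WITH_LESS_REPLICAS is an assumption taken from the status list quoted in the thread.

```python
from enum import Enum
from typing import Mapping


class ReplicaStatus(Enum):
    """Hypothetical status labels; only the label strings come from the thread."""
    OFFLINE = "OFFLINE"
    LESS_REPLICAS = "SEGMENTS_WITH_LESS_REPLICAS"
    HEALTHY = "HEALTHY"


def classify_segment(actual: int, expected: int) -> ReplicaStatus:
    """Map a segment's online replica count to a status.

    Assumption: 0 replicas is reported as OFFLINE, and any other count
    satisfying actual <= expected - 1 is reported as LESS_REPLICAS.
    """
    if actual < 0 or expected < 1:
        raise ValueError("actual must be >= 0 and expected must be >= 1")
    if actual == 0:
        return ReplicaStatus.OFFLINE
    if actual <= expected - 1:
        return ReplicaStatus.LESS_REPLICAS
    return ReplicaStatus.HEALTHY


def count_segments_with_less_replicas(
    replicas_by_segment: Mapping[str, int], expected: int
) -> int:
    """Count segments that are online but below the expected replica count."""
    return sum(
        1
        for actual in replicas_by_segment.values()
        if classify_segment(actual, expected) is ReplicaStatus.LESS_REPLICAS
    )
```

A gauge built on such a count would stay at zero in steady state and rise during node replacement, which is the "prolonged period of reduced capacity" an admin would want to alert on. Whether the single-replica case deserves its own bucket is left open here, as it is in the thread.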