virajjasani commented on PR #5396: URL: https://github.com/apache/hadoop/pull/5396#issuecomment-1433662089
In the second case, where the dn is not connected to the active nn, the BP offer service would still list the active nn as nn-1. The only way for us to let a client (administrative applications in this case) know that the given dn cannot reach the active nn is to expose a new metric that internally loops through the BP service actor metrics and verifies that every BP has exactly one nn listed as "Active" with a lastHeartbeatResponseTime within a few seconds. This is the logic we need to expose to clients so that admins can take action; for k8s, that would be some scripting that periodically checks the health of dn pods.
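
The check described above could look roughly like the following. This is only a minimal sketch, not the code in the PR: the `ActiveNnCheck` class, the `ActorInfo` type, its fields, and the `maxAgeSeconds` threshold are all hypothetical names used for illustration, and the heartbeat value is assumed to be "seconds since the last heartbeat response".

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ActiveNnCheck {

    /** Hypothetical, simplified view of one BP service actor's metrics. */
    static final class ActorInfo {
        final String blockPoolId;
        final String haState;                            // e.g. "Active" or "Standby"
        final long secondsSinceLastHeartbeatResponse;    // assumed unit: seconds

        ActorInfo(String blockPoolId, String haState, long secondsSinceLastHeartbeatResponse) {
            this.blockPoolId = blockPoolId;
            this.haState = haState;
            this.secondsSinceLastHeartbeatResponse = secondsSinceLastHeartbeatResponse;
        }
    }

    /**
     * Returns true only if every block pool has exactly one actor whose nn is
     * reported as "Active" and that actor received a heartbeat response within
     * maxAgeSeconds. An empty actor list is treated as unhealthy.
     */
    static boolean isConnectedToActiveNn(List<ActorInfo> actors, long maxAgeSeconds) {
        if (actors.isEmpty()) {
            return false;
        }
        // Group the "Active" actors by block pool; pools without any active
        // actor still get an (empty) entry so they fail the check below.
        Map<String, List<ActorInfo>> activeByBp = new LinkedHashMap<>();
        for (ActorInfo actor : actors) {
            List<ActorInfo> active =
                activeByBp.computeIfAbsent(actor.blockPoolId, k -> new ArrayList<>());
            if ("Active".equalsIgnoreCase(actor.haState)) {
                active.add(actor);
            }
        }
        for (List<ActorInfo> active : activeByBp.values()) {
            if (active.size() != 1) {
                return false; // zero or multiple active nns for this BP
            }
            if (active.get(0).secondsSinceLastHeartbeatResponse > maxAgeSeconds) {
                return false; // the active nn's heartbeat response is stale
            }
        }
        return true;
    }
}
```

A k8s health script could then poll a single boolean derived this way rather than re-implementing the loop over the per-actor metrics itself.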
