virajjasani commented on PR #5396:
URL: https://github.com/apache/hadoop/pull/5396#issuecomment-1433662089

   In the second case, where the dn is not connected to the active nn, the BP 
offer service would still list the active nn as nn-1. The only way for us to 
let a client (administrative applications in this case) know that the given dn 
cannot reach the active nn is to expose a new metric that loops through the BP 
service actor metrics and verifies that every BP has exactly one nn listed as 
"Active" and that its lastHeartbeatResponseTime is within a few seconds.
   
   This is the logic we somehow need to expose to clients so that admins can 
take action (for k8s, that would be some scripting that periodically checks the 
health of dn pods).
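
   A rough sketch of that check, only to make the condition concrete. The 
`ActorStatus` class, its fields, the "Active" string and the threshold are 
hypothetical stand-ins for whatever the BP service actor metrics expose; this is 
not existing Hadoop code and not the PR's implementation.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ActiveNamenodeCheck {

  /** Hypothetical snapshot of one BP service actor; not a Hadoop class. */
  static final class ActorStatus {
    final String blockPoolId;
    final String haState; // assumed values: "Active", "Standby", ...
    final long secondsSinceLastHeartbeatResponse;

    ActorStatus(String blockPoolId, String haState,
        long secondsSinceLastHeartbeatResponse) {
      this.blockPoolId = blockPoolId;
      this.haState = haState;
      this.secondsSinceLastHeartbeatResponse = secondsSinceLastHeartbeatResponse;
    }
  }

  /**
   * Returns true only if every block pool has exactly one actor reporting
   * "Active" and that actor's last heartbeat response is recent enough.
   */
  static boolean isConnectedToActiveNamenode(List<ActorStatus> actors,
      long maxAgeSeconds) {
    // Per block pool: [0] = number of Active actors, [1] = fresh Active actors.
    Map<String, int[]> perPool = new HashMap<>();
    for (ActorStatus actor : actors) {
      int[] counts = perPool.computeIfAbsent(actor.blockPoolId, k -> new int[2]);
      if ("Active".equals(actor.haState)) {
        counts[0]++;
        if (actor.secondsSinceLastHeartbeatResponse <= maxAgeSeconds) {
          counts[1]++;
        }
      }
    }
    if (perPool.isEmpty()) {
      return false; // no actors at all: nothing is connected
    }
    for (int[] counts : perPool.values()) {
      if (counts[0] != 1 || counts[1] != 1) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    List<ActorStatus> healthy = Arrays.asList(
        new ActorStatus("BP-1", "Active", 2),
        new ActorStatus("BP-1", "Standby", 1));
    List<ActorStatus> stale = Arrays.asList(
        new ActorStatus("BP-1", "Active", 120),
        new ActorStatus("BP-1", "Standby", 1));
    System.out.println(isConnectedToActiveNamenode(healthy, 10));
    System.out.println(isConnectedToActiveNamenode(stale, 10));
  }
}
```

   The second list in `main` is the case described above: an nn is still 
listed as "Active", but its heartbeat response is too old, so the check fails.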

