liubin101 opened a new pull request, #7997:
URL: https://github.com/apache/hadoop/pull/7997

   ### Description of PR
   Under YARN Federation, when there is only one sub-cluster, or when multiple 
sub-clusters exist but a queue is configured to submit exclusively to one 
sub-cluster, the following problem occurs: if the target sub-cluster fails 
over, its state remains "non-active" until the new active RM completes 
registration and sends its first heartbeat (by default, 30 seconds later). 
During this window, any client that tries to submit an application fails with 
the error "No active SubCluster available to submit the request" or "No 
positive weight found on active subclusters".
   
   In non-federated YARN setups, clients automatically retry during RM 
failover. This retry behavior should be preserved when moving to the YARN 
Federation architecture.
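
   As an illustration of the retry behavior described above, here is a minimal sketch and not the code in this patch. The class `FederationSubmitRetrySketch`, the exception `NoActiveSubClusterException`, and the parameters `maxAttempts` and `retryIntervalMs` are hypothetical names chosen for the example; they are not Hadoop APIs. The sketch assumes the "no active sub-cluster" failure is transient and retries the submission at a fixed interval until it succeeds or the attempts run out.

```java
import java.util.concurrent.Callable;

public class FederationSubmitRetrySketch {

  /** Hypothetical stand-in for the "no active sub-cluster" failure. */
  public static class NoActiveSubClusterException extends Exception {
    public NoActiveSubClusterException(String message) {
      super(message);
    }
  }

  /**
   * Calls submit until it succeeds or maxAttempts is reached.
   * Only NoActiveSubClusterException is treated as retriable;
   * any other exception propagates immediately.
   */
  public static <T> T submitWithRetry(Callable<T> submit, int maxAttempts,
      long retryIntervalMs) throws Exception {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be >= 1");
    }
    NoActiveSubClusterException last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return submit.call();
      } catch (NoActiveSubClusterException e) {
        last = e;
        // Wait before the next attempt, but not after the final one.
        if (attempt < maxAttempts) {
          Thread.sleep(retryIntervalMs);
        }
      }
    }
    // All attempts failed with the retriable error; surface the last one.
    throw last;
  }
}
```

   With a retry window longer than the registration-plus-heartbeat delay, a submission issued during failover would wait for the new active RM instead of failing right away.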
   
   ### How was this patch tested?
   Unit tests.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

