virajjasani opened a new pull request, #6462:
URL: https://github.com/apache/hbase/pull/6462

   Jira: HBASE-28638
   
   Master-initiated remote procedures are scheduled by RSProcedureDispatcher. 
If it encounters specific errors on the first attempt (e.g. 
CallQueueTooBigException or SaslException), the remote call is guaranteed not 
to have reached the regionserver. The remote call is therefore marked as 
failed, prompting the parent procedure to select a different target 
regionserver and resume the operation.
   If the first attempt is successful, RSProcedureDispatcher continues with 
infinite retries. A valid error (e.g. ConnectionClosedException) can then halt 
the remote operation. Without manual intervention, this can delay the 
region-in-transition by several minutes or even hours.
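   
   The existing decision described above can be sketched roughly as follows. This is a simplified model, not the actual RSProcedureDispatcher code: the class `CurrentRetryPolicy`, the `Decision` enum, the `decide` method, and the nested exception stand-ins are all hypothetical names introduced for illustration.

```java
import java.io.IOException;

/** Hypothetical model of the current retry decision; not HBase code. */
final class CurrentRetryPolicy {
  enum Decision { FAIL_AND_PICK_NEW_SERVER, RETRY }

  // Stand-ins for the real exception types named in the description.
  static class CallQueueTooBigException extends IOException {}
  static class SaslException extends IOException {}
  static class ConnectionClosedException extends IOException {}

  /**
   * attemptsSoFar is the number of attempts already made (0 on the first attempt).
   * Assumption: only the two error types below prove the call never reached the server.
   */
  static Decision decide(IOException error, int attemptsSoFar) {
    boolean neverReachedServer =
        error instanceof CallQueueTooBigException || error instanceof SaslException;
    if (attemptsSoFar == 0 && neverReachedServer) {
      // Safe to fail fast: the parent procedure can choose another regionserver.
      return Decision.FAIL_AND_PICK_NEW_SERVER;
    }
    // Everything else is retried with no upper bound.
    return Decision.RETRY;
  }
}
```

   In this model a ConnectionClosedException always yields `RETRY`, however many attempts have been made, which is the unbounded case this change targets.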
   
   The purpose of this Jira is to impose a retry limit for specific error 
types. Once the retry limit is reached, the master can recover from the 
failing remote call by initiating an SCP (ServerCrashProcedure) on the target 
server. The SCP will override the TRSP (TransitRegionStateProcedure) if 
required. This ensures that the target server has no regions hosted online 
before we suspend the ongoing TRSP.
   
   Scheduling an SCP for the target server will always leave the regionserver 
in a stopped state. Either the regionserver is stopped automatically, or, if 
it is still able to send its region report to the master, the master rejects 
the report, which in turn causes the regionserver to abort.
   
   
   **Changes proposed:**
   
   - Allow extending RSProcedureDispatcher
   - RSProcedureDispatcher can impose a retry limit for specific errors:
     - CallQueueTooBigException
     - SaslException
     - ConnectionClosedException
   - Default retry limit: 5
   - If the retry limit is exhausted, schedule recovery through server crash. 
Let SCP override the current procedure state.
   - Tests for ConnectionClosedException
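   
   A minimal sketch of the proposed retry limit, under stated assumptions. `LimitedRetrySketch`, `CrashRecovery`, `scheduleServerCrash`, and `onFailure` are hypothetical names and do not reflect the actual classes touched by this PR. Only ConnectionClosedException is modelled, and the sketch assumes that the failure which reaches the limit is the one that triggers recovery; the real counting may differ by one.

```java
import java.io.IOException;

/** Hypothetical model of a bounded retry with crash recovery; not HBase code. */
final class LimitedRetrySketch {
  static final int DEFAULT_RETRY_LIMIT = 5; // default from the change list above

  // Stand-in for the real exception type.
  static class ConnectionClosedException extends IOException {}

  /** Hypothetical hook standing in for "schedule SCP on the target server". */
  interface CrashRecovery {
    void scheduleServerCrash(String serverName);
  }

  private final int retryLimit;
  private final CrashRecovery recovery;
  private int limitedFailures = 0;

  LimitedRetrySketch(int retryLimit, CrashRecovery recovery) {
    this.retryLimit = retryLimit;
    this.recovery = recovery;
  }

  /** Returns true if the caller should retry, false once recovery has been scheduled. */
  boolean onFailure(String serverName, IOException error) {
    if (!(error instanceof ConnectionClosedException)) {
      return true; // other error types are not modelled here
    }
    limitedFailures++;
    if (limitedFailures < retryLimit) {
      return true; // still within the limit
    }
    // Limit exhausted: hand over to crash recovery instead of retrying forever.
    recovery.scheduleServerCrash(serverName);
    return false;
  }
}
```

   With the default limit, the first four ConnectionClosedException failures in this sketch request another retry and the fifth schedules recovery for the target server.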

