virajjasani opened a new pull request, #6462: URL: https://github.com/apache/hbase/pull/6462
Jira: HBASE-28638

Master initiated remote procedures are scheduled by RSProcedureDispatcher. If it encounters specific errors on the first attempt (e.g. CallQueueTooBigException or SaslException), it is guaranteed that the remote call has not reached the regionserver. The remote call is therefore marked as failed, prompting the parent procedure to select a different target regionserver and resume the operation. Otherwise, RSProcedureDispatcher continues with infinite retries.

We can encounter valid cases (e.g. ConnectionClosedException) in which these retries stall the remote operation. Without manual intervention, this can delay the region-in-transition by several minutes or even hours.

The purpose of this Jira is to impose a retry limit for specific error types. If the retry limit is reached, the master can recover from the ongoing remote call failure by initiating an SCP (ServerCrashProcedure) on the target server. The SCP overrides the TRSP (TransitRegionStateProcedure) if required. This ensures that the target server has no region hosted online before we suspend the ongoing TRSP.

Scheduling an SCP for the target server always leaves the regionserver in the stopped state. Either the regionserver is stopped automatically, or, if it is still able to send its region report to the master, the master rejects the report, which in turn causes the regionserver to abort.

**Changes proposed:**

- Allow extending RSProcedureDispatcher
- RSProcedureDispatcher can impose a retry limit for specific errors:
  - CallQueueTooBigException
  - SaslException
  - ConnectionClosedException
- Default retry limit: 5
- If the retry limit is exhausted, schedule recovery through server crash. Let the SCP override the current procedure state.
- Tests for ConnectionClosedException
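The retry policy described above can be sketched roughly as follows. This is an illustration only, not the code in this PR: the class `RetryLimitSketch`, the `Decision` enum, the `decide` method and the nested exception classes are hypothetical stand-ins so that the example compiles without HBase on the classpath. It also assumes the attempt counter starts at 0 for the first attempt.

```java
import java.io.IOException;

// Hypothetical sketch of the retry policy; none of these names come from HBase.
class RetryLimitSketch {

  /** What the dispatcher does after a failed remote call. */
  enum Decision { RETRY, FAIL_REMOTE_CALL, SCHEDULE_SCP }

  // Local stand-ins for the real exception types named in the description.
  static class CallQueueTooBigException extends IOException {}
  static class SaslException extends IOException {}
  static class ConnectionClosedException extends IOException {}

  /** Default retry limit stated in the PR description. */
  static final int DEFAULT_RETRY_LIMIT = 5;

  /** Errors that guarantee the call did not reach the regionserver. */
  static boolean neverReachedServer(Throwable t) {
    return t instanceof CallQueueTooBigException || t instanceof SaslException;
  }

  /** Errors for which the number of retries is capped. */
  static boolean isRetryLimited(Throwable t) {
    return neverReachedServer(t) || t instanceof ConnectionClosedException;
  }

  /**
   * @param error               the failure of the latest attempt
   * @param failedAttemptsSoFar number of attempts that failed before this one
   * @param retryLimit          cap applied to the retry-limited error types
   */
  static Decision decide(Throwable error, int failedAttemptsSoFar, int retryLimit) {
    if (failedAttemptsSoFar == 0 && neverReachedServer(error)) {
      // First attempt never reached the server: fail the remote call so the
      // parent procedure can pick a different target regionserver.
      return Decision.FAIL_REMOTE_CALL;
    }
    if (isRetryLimited(error) && failedAttemptsSoFar >= retryLimit) {
      // Retry limit exhausted: recover by scheduling a ServerCrashProcedure.
      return Decision.SCHEDULE_SCP;
    }
    // Any other case keeps the existing behaviour of retrying.
    return Decision.RETRY;
  }

  public static void main(String[] args) {
    System.out.println(decide(new SaslException(), 0, DEFAULT_RETRY_LIMIT));
    System.out.println(decide(new ConnectionClosedException(), 2, DEFAULT_RETRY_LIMIT));
    System.out.println(decide(new ConnectionClosedException(), 5, DEFAULT_RETRY_LIMIT));
  }
}
```

Keeping the decision in a single pure function like this is a design choice of the sketch: it makes the limit easy to unit test without a running cluster, which is in the spirit of the "Allow extending RSProcedureDispatcher" change.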