[ https://issues.apache.org/jira/browse/HBASE-29206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17937435#comment-17937435 ]
Duo Zhang commented on HBASE-29206:
-----------------------------------
After reviewing the related code, I think one possible approach is to record
all the resumed servers in RollingBatchSuspendResumeRsAction. Before running
any new action, we need to make sure that all of these region servers are
still alive, and we should check again before finishing this action.
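
A minimal sketch of that bookkeeping, to make the idea concrete. This is not
existing HBase code: the class name ResumedServerTracker, its methods, and the
liveness predicate are all hypothetical, and how liveness is actually
determined (for example, by asking the master for its live servers) is left to
the caller.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

/**
 * Hypothetical helper (not part of HBase): remembers which region servers
 * were resumed and verifies that they are still alive.
 */
final class ResumedServerTracker {
  // Servers that have been resumed so far, in the order they were resumed.
  private final Set<String> resumed = new LinkedHashSet<>();
  // Assumption: the caller supplies the liveness check for a server name.
  private final Predicate<String> isAlive;

  ResumedServerTracker(Predicate<String> isAlive) {
    this.isAlive = isAlive;
  }

  /** Record a server right after it has been resumed. */
  void recordResumed(String serverName) {
    resumed.add(serverName);
  }

  /** Return the resumed servers that the liveness check reports as dead. */
  List<String> findDead() {
    List<String> dead = new ArrayList<>();
    for (String server : resumed) {
      if (!isAlive.test(server)) {
        dead.add(server);
      }
    }
    return dead;
  }

  /** Fail fast if any resumed server is no longer alive. */
  void ensureAllAlive() {
    List<String> dead = findDead();
    if (!dead.isEmpty()) {
      throw new IllegalStateException(
          "Resumed region servers are no longer alive: " + dead);
    }
  }
}
```

Under these assumptions, the action would call recordResumed after each
resume, and ensureAllAlive both before starting the next step and once more
before finishing. Whether a dead server should fail the action or instead
trigger a restart is a separate decision that this sketch does not make.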
> RollingBatchSuspendResumeRsAction can not actually 'resume' a region server
> ---------------------------------------------------------------------------
>
> Key: HBASE-29206
> URL: https://issues.apache.org/jira/browse/HBASE-29206
> Project: HBase
> Issue Type: Improvement
> Components: integration tests
> Reporter: Duo Zhang
> Priority: Major
>
> After HBASE-28023, we can successfully suspend and resume the region servers.
> The problem is that, usually, after resuming, the region server receives a
> YouAreDeadException when calling reportForDuty, and its zk session also
> expires, so the region server soon aborts.
> A possible fix is to use autostart, so that the region server restarts
> automatically. However, some restart actions in our integration tests use
> 'start' instead of 'autostart', so even if we use autostart, after a
> BatchRestartRsAction we fall back to start, which causes problems when a
> RollingBatchSuspendResumeRsAction runs afterwards.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)