junegunn commented on code in PR #6797:
URL: https://github.com/apache/hbase/pull/6797#discussion_r2103706270
##########
hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/TransitRegionStateProcedure.java:
##########
@@ -392,10 +399,26 @@ private void closeRegion(MasterProcedureEnv env,
RegionStateNode regionNode)
) {
return;
}
- if (regionNode.isInState(State.OPEN, State.CLOSING, State.MERGING,
State.SPLITTING)) {
- // this is the normal case
- ProcedureFutureUtil.suspendIfNecessary(this, this::setFuture,
- env.getAssignmentManager().regionClosing(regionNode), env,
+
+ CompletableFuture<Void> future = null;
+ if (regionNode.isInState(STATES_EXPECTED_ON_CLOSING)) {
+ // This is the normal case
+ future = env.getAssignmentManager().regionClosing(regionNode);
+ } else if (regionNode.setState(State.CLOSED, State.FAILED_OPEN)) {
+ // FAILED_OPEN doesn't need further transition, immediately mark the
region as closed
+ AssignmentManager am = env.getAssignmentManager();
+ am.getRegionStates().removeFromFailedOpen(regionNode.getRegionInfo());
+ future = am.getRegionStateStore().updateRegionLocation(regionNode);
Review Comment:
Thanks for the feedback. Before I answer the questions, please take a look
at dd8716f2a2. I added a few more steps so that the intention of this change is
clearer.
> We do not need to set the state to CLOSED when the state is FAILED_OPEN?
So the question is: "Is `regionNode.setState(State.CLOSED,
State.FAILED_OPEN)` really necessary? Can we just check
`regionNode.isInState(State.FAILED_OPEN)` here?" Am I right?
If we don't change the state,
```diff
diff --git
a/hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/TransitRegionStateProcedure.java
b/hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/TransitRegionStateProcedure.java
index 6b5aaf6975..215e1245ed 100644
---
a/hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/TransitRegionStateProcedure.java
+++
b/hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/TransitRegionStateProcedure.java
@@ -404,11 +404,13 @@ public class TransitRegionStateProcedure
if (regionNode.isInState(STATES_EXPECTED_ON_CLOSING)) {
// This is the normal case
future = env.getAssignmentManager().regionClosing(regionNode);
- } else if (regionNode.setState(State.CLOSED, State.FAILED_OPEN)) {
- // FAILED_OPEN doesn't need further transition, immediately mark the
region as closed
+ } else if (regionNode.isInState(State.FAILED_OPEN)) {
+ // Remove the region from RIT list to suppress periodic "RITs over
threshold" messages
AssignmentManager am = env.getAssignmentManager();
am.getRegionStates().removeFromFailedOpen(regionNode.getRegionInfo());
- future = am.getRegionStateStore().updateRegionLocation(regionNode);
+
+ // FAILED_OPEN doesn't need further transition
+ future = CompletableFuture.allOf();
}
if (future != null) {
ProcedureFutureUtil.suspendIfNecessary(this, this::setFuture, future,
env,
```
we will run into an assertion error in `confirmClosed`:
https://github.com/apache/hbase/blob/cab6eb41543d6ca4ef57cab2d87a3a6e6796c88c/hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/TransitRegionStateProcedure.java#L423-L449
Because `FAILED_OPEN` is not covered in the conditions. If we add
`FAILED_OPEN` here like so:
```diff
diff --git
a/hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/TransitRegionStateProcedure.java
b/hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/TransitRegionStateProcedure.java
index 6b5aaf6975..7cf0941f20 100644
---
a/hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/TransitRegionStateProcedure.java
+++
b/hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/TransitRegionStateProcedure.java
@@ -422,7 +424,7 @@ public class TransitRegionStateProcedure
private Flow confirmClosed(MasterProcedureEnv env, RegionStateNode
regionNode)
throws IOException {
- if (regionNode.isInState(State.CLOSED)) {
+ if (regionNode.isInState(State.CLOSED, State.FAILED_OPEN)) {
retryCounter = null;
if (lastState ==
RegionStateTransitionState.REGION_STATE_TRANSITION_CONFIRM_CLOSED) {
// we are the last state, finish
```
We can avoid the assertion error, but `CloseTableRegionsProcedure` will not
finish and endlessly retry:
```
There are still 1 unclosed region(s) for closing regions of table
testDisableFailedOpenRegions when executing CloseTableRegionsProcedure,
continue...
```
So we also need to change the implementation of `numberOfUnclosedRegions` to
account for `FAILED_OPEN` regions.
https://github.com/apache/hbase/blob/42715b681d05428c63eca926b5967d5fa64eb1a0/hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/AssignmentManager.java#L1147-L1163
But I felt this was getting too complicated, and it's simpler to just change
the state to `CLOSED`.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]