junegunn commented on code in PR #6797:
URL: https://github.com/apache/hbase/pull/6797#discussion_r2104293790


##########
hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/TransitRegionStateProcedure.java:
##########
@@ -392,10 +399,26 @@ private void closeRegion(MasterProcedureEnv env, 
RegionStateNode regionNode)
     ) {
       return;
     }
-    if (regionNode.isInState(State.OPEN, State.CLOSING, State.MERGING, 
State.SPLITTING)) {
-      // this is the normal case
-      ProcedureFutureUtil.suspendIfNecessary(this, this::setFuture,
-        env.getAssignmentManager().regionClosing(regionNode), env,
+
+    CompletableFuture<Void> future = null;
+    if (regionNode.isInState(STATES_EXPECTED_ON_CLOSING)) {
+      // This is the normal case
+      future = env.getAssignmentManager().regionClosing(regionNode);
+    } else if (regionNode.setState(State.CLOSED, State.FAILED_OPEN)) {
+      // FAILED_OPEN doesn't need further transition, immediately mark the 
region as closed
+      AssignmentManager am = env.getAssignmentManager();
+      am.getRegionStates().removeFromFailedOpen(regionNode.getRegionInfo());
+      future = am.getRegionStateStore().updateRegionLocation(regionNode);

Review Comment:
   > And I suggest that we use two ProcedureFutureUtil.suspendIfNecessary calls 
for these two conditions, so we do not need to add extra check in 
closeRegionAfterUpdatingMeta
   
   Ahh.. I remembered why I organized the code this way. This was not super 
obvious.
   
   When you disable the table with "FAILED_OPEN" regions,
   1. `DisableTableProcedure` creates a `CloseTableRegionsProcedure`,
   1. It results in `TRSP` with `REGION_STATE_TRANSITION_CLOSE` as the initial 
state
       
https://github.com/apache/hbase/blob/cab6eb41543d6ca4ef57cab2d87a3a6e6796c88c/hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/TransitRegionStateProcedure.java#L157-L160
   1. Because of the initial condition, it calls  `TRSP#closeRegion`,
       
https://github.com/apache/hbase/blob/cab6eb41543d6ca4ef57cab2d87a3a6e6796c88c/hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/TransitRegionStateProcedure.java#L516-L518
 
   1. which in turn calls `closeRegionAfterUpdatingMeta`
       
https://github.com/apache/hbase/blob/cab6eb41543d6ca4ef57cab2d87a3a6e6796c88c/hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/TransitRegionStateProcedure.java#L394-L401
   1. So we're entering `closeRegionAfterUpdatingMeta` for a `FAILED_OPEN` 
region and HBase creates a `CloseRegionProcedure` which is exactly what we 
tried to avoid.
   1. `CloseRegionProcedure` is a subclass of `RegionRemoteProcedureBase` which 
requires `targetServer`, but a `FAILED_OPEN` region lacks it, and the procedure 
fails and hangs.
       ```
       2025-05-23T19:17:19,176 WARN  [PEWorker-1 {}] 
procedure2.ProcedureExecutor$WorkerThread(2184): Worker terminating UNNATURALLY 
null
       java.lang.NullPointerException: null
            at 
org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProcedureProtos$RegionRemoteProcedureBaseStateData$Builder.setTargetServer(MasterProcedureProtos.java:54339)
 ~[classes/:?]
            at 
org.apache.hadoop.hbase.master.assignment.RegionRemoteProcedureBase.serializeStateData(RegionRemoteProcedureBase.java:382)
 ~[classes/:?]
            at 
org.apache.hadoop.hbase.master.assignment.CloseRegionProcedure.serializeStateData(CloseRegionProcedure.java:74)
 ~[classes/:?]
       ```
   
   So that explains why I had to put the check at the start of 
`closeRegionAfterUpdatingMeta`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to