hgromer commented on PR #7084:
URL: https://github.com/apache/hbase/pull/7084#issuecomment-3024611037

I think we need to revert this PR. This change can introduce another deadlock, one that can block server crash procedures (SCPs).
   
The SCP acquires a server exclusive lock, so it can run in parallel with a snapshot procedure. However, the SCP schedules SplitWALRemoteProcedures, which do acquire table locks. A SplitWALRemoteProcedure won't run until the snapshot procedure finishes, but the snapshot procedure gets stuck in state SNAPSHOT_SNAPSHOT_SPLIT_REGIONS, waiting for the server to come online:
   
```
2025-07-01T15:23:20,413 [PEWorker-2] WARN org.apache.hadoop.hbase.master.procedure.SnapshotRegionProcedure: pid=2365228, ppid=2365224, state=RUNNABLE, locked=true; SnapshotRegionProcedure 91f810e77abe57ea0791ea6e86ada219 can not run currently because target server of region migrate-test-1,\x7F\xFF\xFF\xFE,1751380484623.91f810e77abe57ea0791ea6e86ada219. na1-elegant-jaded-egg.iad03.hubinternal.net,60020,1751300917328 is in state SPLITTING, wait 600000 ms to retry
```
   
This is a circular wait: the SCP's children (the SplitWALRemoteProcedures) will never finish, so the SCP will never finish, so the server never leaves the SPLITTING state, which in turn keeps the snapshot procedure from finishing.
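To make the cycle concrete, here is a hypothetical sketch (not HBase code) that models the dependencies above as a wait-for graph and finds the cycle with a depth-first search. The `ProcedureWaitGraph` class, its methods, and the edge labels are invented for illustration; the edges simply restate the three waits described in this comment.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model: an edge waiter -> holder means "waiter cannot proceed until holder does".
public class ProcedureWaitGraph {
  private final Map<String, Set<String>> waitsOn = new LinkedHashMap<>();

  public void addWait(String waiter, String holder) {
    waitsOn.computeIfAbsent(waiter, k -> new LinkedHashSet<>()).add(holder);
    waitsOn.computeIfAbsent(holder, k -> new LinkedHashSet<>());
  }

  /** Returns one cycle (first node repeated at the end), or an empty list if there is none. */
  public List<String> findCycle() {
    Map<String, Integer> state = new HashMap<>(); // absent = unvisited, 1 = on current path, 2 = done
    List<String> path = new ArrayList<>();
    for (String start : waitsOn.keySet()) {
      List<String> cycle = dfs(start, state, path);
      if (!cycle.isEmpty()) {
        return cycle;
      }
    }
    return List.of();
  }

  private List<String> dfs(String node, Map<String, Integer> state, List<String> path) {
    Integer s = state.get(node);
    if (s != null && s == 2) {
      return List.of(); // already fully explored, no cycle through here
    }
    if (s != null && s == 1) {
      // Back edge to a node on the current path: that slice of the path is a cycle.
      List<String> cycle = new ArrayList<>(path.subList(path.indexOf(node), path.size()));
      cycle.add(node);
      return cycle;
    }
    state.put(node, 1);
    path.add(node);
    for (String next : waitsOn.get(node)) {
      List<String> cycle = dfs(next, state, path);
      if (!cycle.isEmpty()) {
        return cycle;
      }
    }
    path.remove(path.size() - 1);
    state.put(node, 2);
    return List.of();
  }

  public static void main(String[] args) {
    ProcedureWaitGraph g = new ProcedureWaitGraph();
    g.addWait("ServerCrashProcedure", "SplitWALRemoteProcedure"); // parent waits for its children
    g.addWait("SplitWALRemoteProcedure", "SnapshotProcedure");    // table lock held by the snapshot
    g.addWait("SnapshotProcedure", "ServerCrashProcedure");       // server stuck in SPLITTING until SCP ends
    // prints: ServerCrashProcedure -> SplitWALRemoteProcedure -> SnapshotProcedure -> ServerCrashProcedure
    System.out.println(String.join(" -> ", g.findCycle()));
  }
}
```

Removing any one of the three edges (for example, not having the split WAL work wait on the snapshot's table lock) leaves `findCycle()` empty, which is the property the fix needs to restore.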

