hgromer commented on PR #7084: URL: https://github.com/apache/hbase/pull/7084#issuecomment-2970866609
> > > When implementing this SnapshotProcedure, we decided to use shared lock and holdLock = true to prevent other table procedure jumps in in the middle of our execution while not hurting the availability because exclusive lock will also prevent region assigning. > > > In [HBASE-28683](https://issues.apache.org/jira/browse/HBASE-28683), we introduced a new way to only allow one procedure to run at the same time for the same table, so maybe a possible way is to make SnapshotProcedure also acquire exclusive lock and set holdLock = false, so it will not be executed at the same time with Enable and Disable procedure. > > > Thanks. > > > > > > That's good to know, thank you for the context. The issue here is that the EnableTableProcedure will release the lock after it's subprocedures finish, which allows the SnapshotProcedure to execute on a table that is enabling. The SnapshotProcedure then gets stuck continuously re-running the same state over again, and will refuse to release the lock, creating a deadlock > > With all this said, I think it makes the most sense, for simplicity's sake, to fail the procedure if the table is in an invalid state. > > Using the mechanism in [HBASE-28683](https://issues.apache.org/jira/browse/HBASE-28683), SnapshotProcedure can not be executed together with EnableTableProcedure/DisabledTableProcedure together, which could also solve the problem, and I think it is more stable. I'm not sure whether ModifyTableProcedure could also affect SnapshotProcedure if it jumps in just in the middle of the execution... Yes, I'm able to avoid the deadlock if the SnapshotProcedure both sets `holdLock = true` and also yields after procedure execution. Essentially allowing both SnapshotProcedure and EnableTableProcedure to execute without the need to throw any sort of yield / suspended exception (which was my initial implementation). This is certainly a cleaner implementation, though it means we can interleave other table procedures after cycles of the SnapshotProcedure. You hint at this here > I'm not sure whether ModifyTableProcedure could also affect SnapshotProcedure if it jumps in just in the middle of the execution... and I believe that it would be problematic because you may snapshot regions multiple times, regions may move or go offline, so you may miss out on snapshotting regions. My initial implementation prevented this by only allowing SNAPSHOT_WRITE_SNAPSHOT_INFO to release the lock (when it's still safe to do so). However, I don't think it's safe to yield the lock after _any_ cycle. After some thought, I think we still have two solutions here: 1. My initial implementation, which throws an exception that indicates we should release the lock from within SNAPSHOT_WRITE_SNAPSHOT_INFO 2. Fail the procedure It seems the consensus is to avoid complicating the procedure and fail if we encounter the table in an invalid state, so I'm leaning towards 2 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
