balodesecurity commented on PR #8295:
URL: https://github.com/apache/hadoop/pull/8295#issuecomment-3990405906
## Docker Integration Test Results
Tested on a Docker cluster with 1 NameNode and 3 DataNodes (default RF=3), running the `balodesecurity/hadoop` HDFS-17722 branch:
```
--- Scenario 1: Clean decommission (RF=2, decom DN2) ---
[PASS] DN2 decommissioned cleanly (RF=2)
--- Scenario 2: HDFS-17722 — RF=3→2 creates EXCESS, then decom DN2 ---
[PASS] DN2 decommissioned with EXCESS replicas present (HDFS-17722 FIX VERIFIED!)
[PASS] All 3 files accessible after decommission
--- Scenario 3: HDFS-17722 on DN3 (variant) ---
[PASS] DN3 decommissioned with EXCESS replicas (HDFS-17722 fix verified on DN3)
--- Scenario 4: Repeated decom/recommission cycles (3 rounds) ---
[PASS] Round 1: DN2 decommissioned + recommissioned (Normal)
[PASS] Round 2: DN2 decommissioned + recommissioned (Normal)
[PASS] Round 3: DN2 decommissioned + recommissioned (Normal)
--- Scenario 5: Data integrity after decommission ---
[PASS] DN2 decommissioned
[PASS] Data integrity OK: content matches
Results: 0 failure(s) — ALL TESTS PASSED
```
**Note on replicating the bug naturally**: In a single-NameNode setup the race does not occur naturally, because the block manager processes the `setrep` deletions before the decommission check runs in the same thread. The bug is specific to the standby NameNode path. The unit tests in `TestDatanodeAdminManagerIsSufficient` exercise `isSufficient()` directly with the exact replica counts that trigger the deadlock; the Docker tests above verify that normal decommission behavior has not regressed.
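The stuck state in Scenario 2 can be sketched with a toy replica-count model. This is a simplified illustration, not the Hadoop code: the function names, the `Block` class, and the exact counting rules are assumptions about the shape of the check, not the real `isSufficient()` implementation. The idea is that a replica queued as EXCESS still holds valid data, so a check that excludes it from the live count can never be satisfied on a standby NameNode where the deletion is never executed:

```python
# Toy model of the HDFS-17722 sufficiency check -- NOT the Hadoop
# implementation; names and counting rules here are illustrative only.
from dataclasses import dataclass, field


@dataclass
class Block:
    expected_replication: int                     # target RF after `setrep`
    replicas: set = field(default_factory=set)    # DataNodes holding a copy
    excess: set = field(default_factory=set)      # replicas queued for deletion


def is_sufficient_buggy(block, decommissioning):
    # Buggy shape of the check: EXCESS replicas are excluded from the live
    # count. On a standby NameNode the EXCESS deletion is never processed,
    # so the count stays below the target forever and decommission stalls.
    live = block.replicas - decommissioning - block.excess
    return len(live) >= block.expected_replication


def is_sufficient_fixed(block, decommissioning):
    # Fixed shape: replicas pending EXCESS deletion still hold valid data,
    # so they count toward sufficiency while the node is decommissioning.
    available = block.replicas - decommissioning
    return len(available) >= block.expected_replication


# Scenario 2 from the Docker run: RF lowered 3 -> 2, the replica on DN3 is
# marked EXCESS but not yet deleted, then DN2 enters decommission.
blk = Block(expected_replication=2,
            replicas={"DN1", "DN2", "DN3"},
            excess={"DN3"})
decom = {"DN2"}

print(is_sufficient_buggy(blk, decom))   # False -> decommission never finishes
print(is_sufficient_fixed(blk, decom))   # True  -> DN2 can decommission
```

With the buggy counting rule only DN1 is considered live (DN2 is decommissioning, DN3 is excess), which is below the target of 2 no matter how long the check retries; counting DN3's still-present copy lets the check pass.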
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]