[
https://issues.apache.org/jira/browse/HBASE-29797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18048536#comment-18048536
]
Duo Zhang commented on HBASE-29797:
-----------------------------------
Ah, OK, I found the root cause here...
When initializing the WAL, we try to create the WAL directory; see this line:
https://github.com/apache/hbase/blob/1a3e371ca0a9c5cc3a884cb6121a4c8769f25a70/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/AbstractFSWAL.java#L539
In the above scenario, data03 had received an open region request for the meta
region and was then suspended, and on the master side we renamed its WAL
directory. After resuming, data03 started to open the meta region, and when
creating the HRegion it initialized the meta WAL (since it had not held meta
before). That created a new WAL instance, which brought the WAL directory back
and then caused the data inconsistency...
Let me check why we added the above logic in AbstractFSWAL. In general, the WAL
directory should only be created when initializing the region server.
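
To make the sequence above concrete, here is a minimal sketch of the race using
plain java.nio on a local temp directory. It is not HBase code: the class and
method names, the directory names, and the "refuse if missing" variant are all
assumptions made for illustration, and the actual fix in AbstractFSWAL may look
quite different.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Illustrative only: simulates "WAL init recreates a directory the master already renamed".
public class WalDirResurrectionSketch {

  // Hypothetical stand-in for a WAL init that creates the directory when it is missing.
  static void initWalCreatingDir(Path walDir) throws IOException {
    Files.createDirectories(walDir);
  }

  // Hypothetical stricter variant: a missing directory is treated as an error.
  static void initWalRequiringDir(Path walDir) throws IOException {
    if (!Files.isDirectory(walDir)) {
      throw new IOException("WAL directory " + walDir + " is missing, refusing to recreate it");
    }
  }

  // Returns true if the original WAL directory exists again after the late WAL init.
  static boolean walDirResurrected(boolean createIfMissing) {
    try {
      Path root = Files.createTempDirectory("wal-sketch");
      Path walDir = root.resolve("data03-wal");            // made-up directory name
      Path renamed = root.resolve("data03-wal-renamed");   // made-up directory name

      Files.createDirectory(walDir);   // region server startup creates its WAL directory
      Files.move(walDir, renamed);     // master renames it while the server is suspended

      try {
        // The server resumes and opens meta, which initializes a new WAL instance.
        if (createIfMissing) {
          initWalCreatingDir(walDir);
        } else {
          initWalRequiringDir(walDir);
        }
      } catch (IOException e) {
        // Strict variant: initialization fails instead of recreating the directory.
      }
      return Files.exists(walDir);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println("create-if-missing brings dir back: " + walDirResurrected(true));
    System.out.println("require-existing brings dir back: " + walDirResurrected(false));
  }
}
```

With the create-if-missing behavior the old directory path exists again after
the rename, which is the situation described above. With the strict variant the
late initialization fails and the renamed directory stays the only copy.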
> RegionServer aborted because of invalid max sequence id
> -------------------------------------------------------
>
> Key: HBASE-29797
> URL: https://issues.apache.org/jira/browse/HBASE-29797
> Project: HBase
> Issue Type: Bug
> Components: Region Assignment
> Reporter: Duo Zhang
> Priority: Blocker
>
> {noformat}
> 2025-12-29T11:03:32,429 WARN [RS_CLOSE_REGION-regionserver/data02:16020-0]
> handler.UnassignRegionHandler: Fatal error occurred while closing region
> 8d60369be1061570a2f6e47a1af7a797, aborting...
> java.io.IOException: The new max sequence id 1212630 is less than the old max
> sequence id 1212631
> at
> org.apache.hadoop.hbase.wal.WALSplitUtil.writeRegionSequenceIdFile(WALSplitUtil.java:402)
> at
> org.apache.hadoop.hbase.regionserver.HRegion.writeRegionCloseMarker(HRegion.java:1290)
> at
> org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:1950)
> at
> org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1675)
> at
> org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1630)
> at
> org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1613)
> at
> org.apache.hadoop.hbase.regionserver.handler.UnassignRegionHandler.process(UnassignRegionHandler.java:139)
> at
> org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:104)
> at
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
> at
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
> at java.base/java.lang.Thread.run(Thread.java:840)
> 2025-12-29T11:03:32,433 ERROR [RS_CLOSE_REGION-regionserver/data02:16020-0]
> regionserver.HRegionServer: ***** ABORTING region server
> data02,16020,1766977119966: Failed to close region
> 8d60369be1061570a2f6e47a1af7a797 and can not recover *****
> java.io.IOException: The new max sequence id 1212630 is less than the old max
> sequence id 1212631
> at
> org.apache.hadoop.hbase.wal.WALSplitUtil.writeRegionSequenceIdFile(WALSplitUtil.java:402)
> at
> org.apache.hadoop.hbase.regionserver.HRegion.writeRegionCloseMarker(HRegion.java:1290)
> at
> org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:1950)
> at
> org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1675)
> at
> org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1630)
> at
> org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1613)
> at
> org.apache.hadoop.hbase.regionserver.handler.UnassignRegionHandler.process(UnassignRegionHandler.java:139)
> at
> org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:104)
> at
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
> at
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
> at java.base/java.lang.Thread.run(Thread.java:840)
> {noformat}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)