[ https://issues.apache.org/jira/browse/HBASE-28569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated HBASE-28569:
-----------------------------------
    Labels: pull-request-available  (was: )

> Race condition during WAL splitting leading to corrupt recovered.edits
> ----------------------------------------------------------------------
>
>                 Key: HBASE-28569
>                 URL: https://issues.apache.org/jira/browse/HBASE-28569
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 2.4.17
>            Reporter: Benoit Sigoure
>            Priority: Major
>              Labels: pull-request-available
>
> There is a race condition that can happen when a regionserver aborts initialisation while splitting a WAL from another regionserver. This race leads to writing the WAL trailer for the recovered edits while the writer threads are still running, so the trailer gets interleaved with the edits, corrupting the recovered edits file (and preventing the region from being assigned).
> We've seen this happen on HBase 2.4.17, but looking at the latest code the race still appears possible there.
> The sequence of operations that leads to this issue (a simplified sketch of this {{close()}} path follows the list):
> * {{org.apache.hadoop.hbase.wal.WALSplitter.splitWAL}} calls {{outputSink.close()}} after adding all the entries to the buffers.
> * The output sink is {{org.apache.hadoop.hbase.wal.RecoveredEditsOutputSink}} and its {{close}} method first calls {{finishWriterThreads}} in a try block, which in turn calls {{finish}} on every writer thread and then joins it to make sure it is done.
> * However, if the splitter thread gets interrupted because the RS is aborting, the join is interrupted and {{finishWriterThreads}} rethrows without waiting for the writer threads to stop.
> * This is problematic because, coming back to {{org.apache.hadoop.hbase.wal.RecoveredEditsOutputSink.close}}, it calls {{closeWriters}} in a finally block (so it executes even when the join was interrupted).
> * {{closeWriters}} calls {{org.apache.hadoop.hbase.wal.AbstractRecoveredEditsOutputSink.closeRecoveredEditsWriter}}, which calls {{close}} on {{editWriter.writer}}.
> * When {{editWriter.writer}} is {{org.apache.hadoop.hbase.regionserver.wal.ProtobufLogWriter}}, its {{close}} method writes the trailer before closing the file.
> * This trailer write now runs in parallel with writer threads that are still writing entries, causing corruption.
> * If there are no other errors, {{closeWriters}} succeeds in renaming all temporary files to final recovered edits files, causing problems the next time the region is assigned.
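> For illustration, here is a minimal, hypothetical sketch of the {{close()}} ordering above. The class and method names only loosely mirror {{RecoveredEditsOutputSink}}; this is not the actual HBase source, just the shape of the race:
> {code:java}
> import java.io.IOException;
> import java.io.InterruptedIOException;
> import java.util.ArrayList;
> import java.util.List;
>
> // Hypothetical, heavily simplified sketch of the close() ordering described in the
> // list above. Names only loosely mirror RecoveredEditsOutputSink; NOT the real code.
> class OutputSinkSketch {
>
>   interface EditsWriter {
>     // In the real ProtobufLogWriter, close() writes the WAL trailer before closing the file.
>     void close() throws IOException;
>   }
>
>   private final List<Thread> writerThreads = new ArrayList<>();
>   private final List<EditsWriter> writers = new ArrayList<>();
>
>   void close() throws IOException {
>     try {
>       // Ask the writer threads to finish and join them. If the splitting thread is
>       // interrupted here (e.g. the regionserver is aborting), the join throws and we
>       // rethrow without waiting for the writer threads to actually stop.
>       finishWriterThreads();
>     } finally {
>       // Runs even after the interrupted join above: each writer's close() writes the
>       // trailer while writer threads may still be appending edits, so trailer bytes
>       // interleave with edit bytes in the recovered.edits file.
>       closeWriters();
>     }
>   }
>
>   private void finishWriterThreads() throws InterruptedIOException {
>     for (Thread t : writerThreads) {
>       try {
>         t.join();
>       } catch (InterruptedException e) {
>         // Rethrown immediately: the remaining writer threads are never waited for.
>         InterruptedIOException iioe = new InterruptedIOException();
>         iioe.initCause(e);
>         throw iioe;
>       }
>     }
>   }
>
>   private void closeWriters() throws IOException {
>     for (EditsWriter writer : writers) {
>       // The trailer write can now race with a writer thread that is still running.
>       writer.close();
>     }
>   }
> }
> {code}
> The key point is that the finally block never checks whether the join actually completed, so the trailer write is not ordered after the writer threads.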
> Log evidence supporting the above flow:
> Abort is triggered (because the RS failed to open the WAL due to some ongoing infra issue):
> {noformat}
> regionserver-2 regionserver 06:22:00.384 [RS_OPEN_META-regionserver/host01:16201-0] ERROR org.apache.hadoop.hbase.regionserver.HRegionServer - ***** ABORTING region server host01,16201,1709187641249: WAL can not clean up after init failed *****{noformat}
> We can see that the writer threads were still active after closing (even allowing that the ordering in the log might not be accurate, they die because the channel is closed while still writing, not because they are stopping):
> {noformat}
> regionserver-2 regionserver 06:22:09.662 [DataStreamer for file /hbase/data/default/aeris_v2/53308260a6b22eaf6ebb8353f7df3077/recovered.edits/0000000003169600719-host02%2C16201%2C1709180140645.1709186722780.temp block BP-1645452845-192.168.2.230-1615455682886:blk_1076340939_2645368] WARN org.apache.hadoop.hdfs.DataStreamer - Error Recovery for BP-1645452845-192.168.2.230-1615455682886:blk_1076340939_2645368 in pipeline [DatanodeInfoWithStorage[192.168.2.230:15010,DS-2aa201ab-1027-47ec-b05f-b39d795fda85,DISK], DatanodeInfoWithStorage[192.168.2.232:15010,DS-39651d5a-67d2-4126-88f0-45cdee967dab,DISK], DatanodeInfoWithStorage[192.168.2.231:15010,DS-e08a1d17-f7b1-4e39-9713-9706bd762f48,DISK]]: datanode 2(DatanodeInfoWithStorage[192.168.2.231:15010,DS-e08a1d17-f7b1-4e39-9713-9706bd762f48,DISK]) is bad.
> regionserver-2 regionserver 06:22:09.742 [split-log-closeStream-pool-1] INFO org.apache.hadoop.hbase.wal.RecoveredEditsOutputSink - Closed recovered edits writer path=hdfs://mycluster/hbase/data/default/aeris_v2/53308260a6b22eaf6ebb8353f7df3077/recovered.edits/0000000003169600719-host02%2C16201%2C1709180140645.1709186722780.temp (wrote 5949 edits, skipped 0 edits in 93 ms)
> regionserver-2 regionserver 06:22:09.743 [RS_LOG_REPLAY_OPS-regionserver/host01:16201-1-Writer-0] ERROR org.apache.hadoop.hbase.wal.RecoveredEditsOutputSink - Failed to write log entry aeris_v2/53308260a6b22eaf6ebb8353f7df3077/3169611655=[#edits: 8 = <REDACTED>] to log
> regionserver-2 regionserver java.nio.channels.ClosedChannelException: null
> regionserver-2 regionserver at org.apache.hadoop.hdfs.ExceptionLastSeen.throwException4Close(ExceptionLastSeen.java:73) ~[hadoop-hdfs-client-3.2.4.jar:?]
> regionserver-2 regionserver at org.apache.hadoop.hdfs.DFSOutputStream.checkClosed(DFSOutputStream.java:153) ~[hadoop-hdfs-client-3.2.4.jar:?]
> regionserver-2 regionserver at org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:105) ~[hadoop-common-3.2.4.jar:?]
> regionserver-2 regionserver at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:57) ~[hadoop-common-3.2.4.jar:?]
> regionserver-2 regionserver at java.io.DataOutputStream.write(Unknown Source) ~[?:?]
> regionserver-2 regionserver at org.apache.hadoop.hbase.io.ByteBufferWriterOutputStream.write(ByteBufferWriterOutputStream.java:94) ~[hbase-common-2.4.17.jar:2.4.17]
> regionserver-2 regionserver at org.apache.hadoop.hbase.KeyValue.write(KeyValue.java:2093) ~[hbase-common-2.4.17.jar:2.4.17]
> regionserver-2 regionserver at org.apache.hadoop.hbase.KeyValueUtil.oswrite(KeyValueUtil.java:737) ~[hbase-common-2.4.17.jar:2.4.17]
> regionserver-2 regionserver at org.apache.hadoop.hbase.regionserver.wal.WALCellCodec$EnsureKvEncoder.write(WALCellCodec.java:365) ~[hbase-server-2.4.17.jar:2.4.17]
> regionserver-2 regionserver at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogWriter.append(ProtobufLogWriter.java:58) ~[hbase-server-2.4.17.jar:2.4.17]
> regionserver-2 regionserver at org.apache.hadoop.hbase.wal.AbstractRecoveredEditsOutputSink$RecoveredEditsWriter.writeRegionEntries(AbstractRecoveredEditsOutputSink.java:237) [hbase-server-2.4.17.jar:2.4.17]
> regionserver-2 regionserver at org.apache.hadoop.hbase.wal.RecoveredEditsOutputSink.append(RecoveredEditsOutputSink.java:66) [hbase-server-2.4.17.jar:2.4.17]
> regionserver-2 regionserver at org.apache.hadoop.hbase.wal.OutputSink$WriterThread.writeBuffer(OutputSink.java:249) [hbase-server-2.4.17.jar:2.4.17]
> regionserver-2 regionserver at org.apache.hadoop.hbase.wal.OutputSink$WriterThread.doRun(OutputSink.java:241) [hbase-server-2.4.17.jar:2.4.17]
> regionserver-2 regionserver at org.apache.hadoop.hbase.wal.OutputSink$WriterThread.run(OutputSink.java:211) [hbase-server-2.4.17.jar:2.4.17]
> regionserver-2 regionserver 06:22:09.745 [RS_LOG_REPLAY_OPS-regionserver/host01:16201-1-Writer-0] ERROR org.apache.hadoop.hbase.wal.OutputSink - Exiting thread
> regionserver-2 regionserver java.nio.channels.ClosedChannelException: null
> regionserver-2 regionserver at org.apache.hadoop.hdfs.ExceptionLastSeen.throwException4Close(ExceptionLastSeen.java:73) ~[hadoop-hdfs-client-3.2.4.jar:?]
> regionserver-2 regionserver at org.apache.hadoop.hdfs.DFSOutputStream.checkClosed(DFSOutputStream.java:153) ~[hadoop-hdfs-client-3.2.4.jar:?]
> regionserver-2 regionserver at org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:105) ~[hadoop-common-3.2.4.jar:?]
> regionserver-2 regionserver at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:57) ~[hadoop-common-3.2.4.jar:?]
> regionserver-2 regionserver at java.io.DataOutputStream.write(Unknown Source) ~[?:?]
> regionserver-2 regionserver at org.apache.hadoop.hbase.io.ByteBufferWriterOutputStream.write(ByteBufferWriterOutputStream.java:94) ~[hbase-common-2.4.17.jar:2.4.17]
> regionserver-2 regionserver at org.apache.hadoop.hbase.KeyValue.write(KeyValue.java:2093) ~[hbase-common-2.4.17.jar:2.4.17]
> regionserver-2 regionserver at org.apache.hadoop.hbase.KeyValueUtil.oswrite(KeyValueUtil.java:737) ~[hbase-common-2.4.17.jar:2.4.17]
> regionserver-2 regionserver at org.apache.hadoop.hbase.regionserver.wal.WALCellCodec$EnsureKvEncoder.write(WALCellCodec.java:365) ~[hbase-server-2.4.17.jar:2.4.17]
> regionserver-2 regionserver at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogWriter.append(ProtobufLogWriter.java:58) ~[hbase-server-2.4.17.jar:2.4.17]
> regionserver-2 regionserver at org.apache.hadoop.hbase.wal.AbstractRecoveredEditsOutputSink$RecoveredEditsWriter.writeRegionEntries(AbstractRecoveredEditsOutputSink.java:237) ~[hbase-server-2.4.17.jar:2.4.17]
> regionserver-2 regionserver at org.apache.hadoop.hbase.wal.RecoveredEditsOutputSink.append(RecoveredEditsOutputSink.java:66) ~[hbase-server-2.4.17.jar:2.4.17]
> regionserver-2 regionserver at org.apache.hadoop.hbase.wal.OutputSink$WriterThread.writeBuffer(OutputSink.java:249) ~[hbase-server-2.4.17.jar:2.4.17]
> regionserver-2 regionserver at org.apache.hadoop.hbase.wal.OutputSink$WriterThread.doRun(OutputSink.java:241) ~[hbase-server-2.4.17.jar:2.4.17]
> regionserver-2 regionserver at org.apache.hadoop.hbase.wal.OutputSink$WriterThread.run(OutputSink.java:211) [hbase-server-2.4.17.jar:2.4.17]
> regionserver-2 regionserver 06:22:09.749 [split-log-closeStream-pool-1] INFO org.apache.hadoop.hbase.wal.RecoveredEditsOutputSink - Rename recovered edits hdfs://mycluster/hbase/data/default/aeris_v2/53308260a6b22eaf6ebb8353f7df3077/recovered.edits/0000000003169600719-host02%2C16201%2C1709180140645.1709186722780.temp to hdfs://mycluster/hbase/data/default/aeris_v2/53308260a6b22eaf6ebb8353f7df3077/recovered.edits/0000000003169611654{noformat}
> And we can see that {{finishWriterThreads}} was interrupted:
> {noformat}
> regionserver-2 regionserver 06:22:20.643 [RS_LOG_REPLAY_OPS-regionserver/host01:16201-1] WARN org.apache.hadoop.hbase.regionserver.SplitLogWorker - Resigning, interrupted splitting WAL hdfs://mycluster/hbase/WALs/host02,16201,1709180140645-splitting/host02%2C16201%2C1709180140645.1709186722780
> regionserver-2 regionserver java.io.InterruptedIOException: null
> regionserver-2 regionserver at org.apache.hadoop.hbase.wal.OutputSink.finishWriterThreads(OutputSink.java:139) ~[hbase-server-2.4.17.jar:2.4.17]
> regionserver-2 regionserver at org.apache.hadoop.hbase.wal.RecoveredEditsOutputSink.close(RecoveredEditsOutputSink.java:94) ~[hbase-server-2.4.17.jar:2.4.17]
> regionserver-2 regionserver at org.apache.hadoop.hbase.wal.WALSplitter.splitWAL(WALSplitter.java:414) ~[hbase-server-2.4.17.jar:2.4.17]
> regionserver-2 regionserver at org.apache.hadoop.hbase.wal.WALSplitter.splitLogFile(WALSplitter.java:201) ~[hbase-server-2.4.17.jar:2.4.17]
> regionserver-2 regionserver at org.apache.hadoop.hbase.regionserver.SplitLogWorker.splitLog(SplitLogWorker.java:108) [hbase-server-2.4.17.jar:2.4.17]
> regionserver-2 regionserver at org.apache.hadoop.hbase.regionserver.SplitWALCallable.call(SplitWALCallable.java:100) [hbase-server-2.4.17.jar:2.4.17]
> regionserver-2 regionserver at org.apache.hadoop.hbase.regionserver.SplitWALCallable.call(SplitWALCallable.java:46) [hbase-server-2.4.17.jar:2.4.17]
> regionserver-2 regionserver at org.apache.hadoop.hbase.regionserver.handler.RSProcedureHandler.process(RSProcedureHandler.java:49) [hbase-server-2.4.17.jar:2.4.17]
> regionserver-2 regionserver at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:98) [hbase-server-2.4.17.jar:2.4.17]
> regionserver-2 regionserver at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [?:?]
> regionserver-2 regionserver at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [?:?]
> regionserver-2 regionserver at java.lang.Thread.run(Unknown Source) [?:?]
> regionserver-2 regionserver Caused by: java.lang.InterruptedException
> regionserver-2 regionserver at java.lang.Object.wait(Native Method) ~[?:?]
> regionserver-2 regionserver at java.lang.Thread.join(Unknown Source) ~[?:?]
> regionserver-2 regionserver at java.lang.Thread.join(Unknown Source) ~[?:?]
> regionserver-2 regionserver at org.apache.hadoop.hbase.wal.OutputSink.finishWriterThreads(OutputSink.java:137) ~[hbase-server-2.4.17.jar:2.4.17]
> regionserver-2 regionserver ... 11 more{noformat}
>
> The corrupted recovered edits file contained one additional piece of evidence supporting a race condition. The 0 bytes that form the 32-bit trailer length were scattered through the file, while the marker {{LAWP}} was written in one piece. This is because, in OpenJDK 11, {{writeInt}} writes each byte individually, while byte arrays are written in a single call.
>
> Original bug report from Vlad Hanciuta.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
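For reference, a minimal sketch of the {{writeInt}} vs. byte-array behaviour described in the report above. The trailer layout and method names below are hypothetical, not the actual {{ProtobufLogWriter}} code; the point is only why the four bytes of the 32-bit length can be interleaved by a concurrent writer while the "LAWP" magic stays contiguous:
{code:java}
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Illustration of the writeInt vs. write(byte[]) difference mentioned in the report.
// Hypothetical trailer layout, not the real ProtobufLogWriter code.
public class TrailerWriteSketch {

  static void writeTrailer(DataOutputStream out, byte[] trailerBytes) throws IOException {
    // DataOutputStream.write(byte[], int, int) forwards the whole array to the
    // underlying stream in a single call.
    out.write(trailerBytes);
    // writeInt issues four separate single-byte write(int) calls (OpenJDK 11), so a
    // concurrently appending writer thread can slip edit bytes between them,
    // scattering the (zero) length bytes through the file.
    out.writeInt(trailerBytes.length);
    // The magic marker is again a single write(byte[]) call, so "LAWP" stays contiguous.
    out.write("LAWP".getBytes(StandardCharsets.UTF_8));
  }

  public static void main(String[] args) throws IOException {
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    writeTrailer(new DataOutputStream(buffer), new byte[0]);
    // With an empty trailer the length is 0, i.e. four 0x00 bytes followed by "LAWP".
    System.out.println(buffer.size() + " bytes written"); // prints "8 bytes written"
  }
}
{code}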