[ 
https://issues.apache.org/jira/browse/HBASE-28569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HBASE-28569:
-----------------------------------
    Labels: pull-request-available  (was: )

> Race condition during WAL splitting leading to corrupt recovered.edits
> ----------------------------------------------------------------------
>
>                 Key: HBASE-28569
>                 URL: https://issues.apache.org/jira/browse/HBASE-28569
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 2.4.17
>            Reporter: Benoit Sigoure
>            Priority: Major
>              Labels: pull-request-available
>
> There is a race condition that can occur when a regionserver aborts 
> initialisation while it is splitting a WAL from another regionserver. The 
> race causes the WAL trailer for a recovered.edits file to be written while 
> the writer threads are still running, so the trailer gets interleaved with 
> the edits, corrupting the recovered.edits file (and preventing the region 
> from being assigned).
> We have seen this happen on HBase 2.4.17, but looking at the latest code the 
> race still appears to be possible there.
> The sequence of operations that leads to this issue:
>  * {{org.apache.hadoop.hbase.wal.WALSplitter.splitWAL}} calls 
> {{outputSink.close()}} after adding all the entries to the buffers.
>  * The output sink is 
> {{org.apache.hadoop.hbase.wal.RecoveredEditsOutputSink}}, and its {{close}} 
> method first calls {{finishWriterThreads}} in a try block, which in turn 
> calls {{finish}} on every writer thread and then joins it to make sure it 
> is done.
>  * However, if the splitter thread gets interrupted because the RS is 
> aborting, the join is interrupted and {{finishWriterThreads}} rethrows 
> without waiting for the writer threads to stop.
>  * This is problematic because, back in 
> {{org.apache.hadoop.hbase.wal.RecoveredEditsOutputSink.close}}, 
> {{closeWriters}} is called in a finally block, so it executes even when the 
> join was interrupted (see the sketch after this list).
>  * {{closeWriters}} calls 
> {{org.apache.hadoop.hbase.wal.AbstractRecoveredEditsOutputSink.closeRecoveredEditsWriter}},
>  which calls {{close}} on {{editWriter.writer}}.
>  * When {{editWriter.writer}} is 
> {{org.apache.hadoop.hbase.regionserver.wal.ProtobufLogWriter}}, its 
> {{close}} method writes the trailer before closing the file.
>  * This trailer write now happens in parallel with the writer threads still 
> writing entries, causing corruption.
>  * If there are no other errors, {{closeWriters}} succeeds in renaming all 
> temporary files to their final recovered.edits names, causing problems the 
> next time the region is assigned.
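> A condensed, paraphrased sketch of the {{close}} flow described above (not 
> the verbatim HBase source; the abstract methods stand in for the real 
> implementations and are only there to make the control flow compile):
> {code:java}
> import java.io.IOException;
> import java.nio.file.Path;
> import java.util.List;
> 
> // Condensed sketch of the RecoveredEditsOutputSink.close() control flow
> // described in this report (paraphrased, not the actual HBase code).
> abstract class RecoveredEditsOutputSinkSketch {
> 
>   List<Path> splits;
> 
>   // Asks each writer thread to finish and joins it. If the join is
>   // interrupted (e.g. because the regionserver is aborting), this rethrows
>   // as InterruptedIOException before the threads have actually stopped.
>   abstract boolean finishWriterThreads() throws IOException;
> 
>   // Closes every RecoveredEditsWriter; ProtobufLogWriter.close() writes the
>   // WAL trailer first, and the .temp file is then renamed to its final
>   // recovered.edits name.
>   abstract boolean closeWriters() throws IOException;
> 
>   List<Path> close() throws IOException {
>     boolean isSuccessful = false;
>     try {
>       isSuccessful = finishWriterThreads();
>     } finally {
>       // Executes even when finishWriterThreads() threw because the join was
>       // interrupted, i.e. while writer threads may still be appending edits
>       // to the same files: the trailer write races with those appends.
>       isSuccessful &= closeWriters();
>     }
>     return isSuccessful ? splits : null;
>   }
> }
> {code}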
> Log evidence supporting the above flow:
> The abort is triggered (because the regionserver failed to open the WAL due 
> to an ongoing infrastructure issue):
> {noformat}
> regionserver-2 regionserver 06:22:00.384 
> [RS_OPEN_META-regionserver/host01:16201-0] ERROR 
> org.apache.hadoop.hbase.regionserver.HRegionServer - ***** ABORTING region 
> server host01,16201,1709187641249: WAL can not clean up after init failed 
> *****{noformat}
> We can see that the writer threads were still active after the close (even 
> allowing that the ordering in the log might not be exact, they die because 
> the channel is closed while they are still writing, not because they are 
> shutting down):
> {noformat}
> regionserver-2 regionserver 06:22:09.662 [DataStreamer for file 
> /hbase/data/default/aeris_v2/53308260a6b22eaf6ebb8353f7df3077/recovered.edits/0000000003169600719-host02%2C16201%2C1709180140645.1709186722780.temp
>  block BP-1645452845-192.168.2.230-1615455682886:blk_1076340939_2645368] WARN 
>  org.apache.hadoop.hdfs.DataStreamer - Error Recovery for 
> BP-1645452845-192.168.2.230-1615455682886:blk_1076340939_2645368 in pipeline 
> [DatanodeInfoWithStorage[192.168.2.230:15010,DS-2aa201ab-1027-47ec-b05f-b39d795fda85,DISK],
>  
> DatanodeInfoWithStorage[192.168.2.232:15010,DS-39651d5a-67d2-4126-88f0-45cdee967dab,DISK],
>  Datanode
> InfoWithStorage[192.168.2.231:15010,DS-e08a1d17-f7b1-4e39-9713-9706bd762f48,DISK]]:
>  datanode 
> 2(DatanodeInfoWithStorage[192.168.2.231:15010,DS-e08a1d17-f7b1-4e39-9713-9706bd762f48,DISK])
>  is bad.
> regionserver-2 regionserver 06:22:09.742 [split-log-closeStream-pool-1] INFO  
> org.apache.hadoop.hbase.wal.RecoveredEditsOutputSink - Closed recovered edits 
> writer 
> path=hdfs://mycluster/hbase/data/default/aeris_v2/53308260a6b22eaf6ebb8353f7df3077/recovered.edits/0000000003169600719-host02%2C16201%
> 2C1709180140645.1709186722780.temp (wrote 5949 edits, skipped 0 edits in 93 
> ms)
> regionserver-2 regionserver 06:22:09.743 
> [RS_LOG_REPLAY_OPS-regionserver/host01:16201-1-Writer-0] ERROR 
> org.apache.hadoop.hbase.wal.RecoveredEditsOutputSink - Failed to write log 
> entry aeris_v2/53308260a6b22eaf6ebb8353f7df3077/3169611655=[#edits: 8 = 
> <REDACTED>] to log
> regionserver-2 regionserver java.nio.channels.ClosedChannelException: null
> regionserver-2 regionserver    at 
> org.apache.hadoop.hdfs.ExceptionLastSeen.throwException4Close(ExceptionLastSeen.java:73)
>  ~[hadoop-hdfs-client-3.2.4.jar:?]
> regionserver-2 regionserver    at 
> org.apache.hadoop.hdfs.DFSOutputStream.checkClosed(DFSOutputStream.java:153) 
> ~[hadoop-hdfs-client-3.2.4.jar:?]
> regionserver-2 regionserver    at 
> org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:105) 
> ~[hadoop-common-3.2.4.jar:?]
> regionserver-2 regionserver    at 
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:57)
>  ~[hadoop-common-3.2.4.jar:?]
> regionserver-2 regionserver    at java.io.DataOutputStream.write(Unknown 
> Source) ~[?:?]
> regionserver-2 regionserver    at 
> org.apache.hadoop.hbase.io.ByteBufferWriterOutputStream.write(ByteBufferWriterOutputStream.java:94)
>  ~[hbase-common-2.4.17.jar:2.4.17]
> regionserver-2 regionserver    at 
> org.apache.hadoop.hbase.KeyValue.write(KeyValue.java:2093) 
> ~[hbase-common-2.4.17.jar:2.4.17]
> regionserver-2 regionserver    at 
> org.apache.hadoop.hbase.KeyValueUtil.oswrite(KeyValueUtil.java:737) 
> ~[hbase-common-2.4.17.jar:2.4.17]
> regionserver-2 regionserver    at 
> org.apache.hadoop.hbase.regionserver.wal.WALCellCodec$EnsureKvEncoder.write(WALCellCodec.java:365)
>  ~[hbase-server-2.4.17.jar:2.4.17]
> regionserver-2 regionserver    at 
> org.apache.hadoop.hbase.regionserver.wal.ProtobufLogWriter.append(ProtobufLogWriter.java:58)
>  ~[hbase-server-2.4.17.jar:2.4.17]
> regionserver-2 regionserver    at 
> org.apache.hadoop.hbase.wal.AbstractRecoveredEditsOutputSink$RecoveredEditsWriter.writeRegionEntries(AbstractRecoveredEditsOutputSink.java:237)
>  [hbase-server-2.4.17.jar:2.4.17]
> regionserver-2 regionserver    at 
> org.apache.hadoop.hbase.wal.RecoveredEditsOutputSink.append(RecoveredEditsOutputSink.java:66)
>  [hbase-server-2.4.17.jar:2.4.17]
> regionserver-2 regionserver    at 
> org.apache.hadoop.hbase.wal.OutputSink$WriterThread.writeBuffer(OutputSink.java:249)
>  [hbase-server-2.4.17.jar:2.4.17]
> regionserver-2 regionserver    at 
> org.apache.hadoop.hbase.wal.OutputSink$WriterThread.doRun(OutputSink.java:241)
>  [hbase-server-2.4.17.jar:2.4.17]
> regionserver-2 regionserver    at 
> org.apache.hadoop.hbase.wal.OutputSink$WriterThread.run(OutputSink.java:211) 
> [hbase-server-2.4.17.jar:2.4.17]
> regionserver-2 regionserver 06:22:09.745 
> [RS_LOG_REPLAY_OPS-regionserver/host01:16201-1-Writer-0] ERROR 
> org.apache.hadoop.hbase.wal.OutputSink - Exiting thread
> regionserver-2 regionserver java.nio.channels.ClosedChannelException: null
> regionserver-2 regionserver    at 
> org.apache.hadoop.hdfs.ExceptionLastSeen.throwException4Close(ExceptionLastSeen.java:73)
>  ~[hadoop-hdfs-client-3.2.4.jar:?]
> regionserver-2 regionserver    at 
> org.apache.hadoop.hdfs.DFSOutputStream.checkClosed(DFSOutputStream.java:153) 
> ~[hadoop-hdfs-client-3.2.4.jar:?]
> regionserver-2 regionserver    at 
> org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:105) 
> ~[hadoop-common-3.2.4.jar:?]
> regionserver-2 regionserver    at 
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:57)
>  ~[hadoop-common-3.2.4.jar:?]
> regionserver-2 regionserver    at java.io.DataOutputStream.write(Unknown 
> Source) ~[?:?]
> regionserver-2 regionserver    at 
> org.apache.hadoop.hbase.io.ByteBufferWriterOutputStream.write(ByteBufferWriterOutputStream.java:94)
>  ~[hbase-common-2.4.17.jar:2.4.17]
> regionserver-2 regionserver    at 
> org.apache.hadoop.hbase.KeyValue.write(KeyValue.java:2093) 
> ~[hbase-common-2.4.17.jar:2.4.17]
> regionserver-2 regionserver    at 
> org.apache.hadoop.hbase.KeyValueUtil.oswrite(KeyValueUtil.java:737) 
> ~[hbase-common-2.4.17.jar:2.4.17]
> regionserver-2 regionserver    at 
> org.apache.hadoop.hbase.regionserver.wal.WALCellCodec$EnsureKvEncoder.write(WALCellCodec.java:365)
>  ~[hbase-server-2.4.17.jar:2.4.17]
> regionserver-2 regionserver    at 
> org.apache.hadoop.hbase.regionserver.wal.ProtobufLogWriter.append(ProtobufLogWriter.java:58)
>  ~[hbase-server-2.4.17.jar:2.4.17]
> regionserver-2 regionserver    at 
> org.apache.hadoop.hbase.wal.AbstractRecoveredEditsOutputSink$RecoveredEditsWriter.writeRegionEntries(AbstractRecoveredEditsOutputSink.java:237)
>  ~[hbase-server-2.4.17.jar:2.4.17]
> regionserver-2 regionserver    at 
> org.apache.hadoop.hbase.wal.RecoveredEditsOutputSink.append(RecoveredEditsOutputSink.java:66)
>  ~[hbase-server-2.4.17.jar:2.4.17]
> regionserver-2 regionserver    at 
> org.apache.hadoop.hbase.wal.OutputSink$WriterThread.writeBuffer(OutputSink.java:249)
>  ~[hbase-server-2.4.17.jar:2.4.17]
> regionserver-2 regionserver    at 
> org.apache.hadoop.hbase.wal.OutputSink$WriterThread.doRun(OutputSink.java:241)
>  ~[hbase-server-2.4.17.jar:2.4.17]
> regionserver-2 regionserver    at 
> org.apache.hadoop.hbase.wal.OutputSink$WriterThread.run(OutputSink.java:211) 
> [hbase-server-2.4.17.jar:2.4.17]
> regionserver-2 regionserver 06:22:09.749 [split-log-closeStream-pool-1] INFO  
> org.apache.hadoop.hbase.wal.RecoveredEditsOutputSink - Rename recovered edits 
> hdfs://mycluster/hbase/data/default/aeris_v2/53308260a6b22eaf6ebb8353f7df3077/recovered.edits/0000000003169600719-host02%2C16201%2C1709180140645.1709186722780.temp
>  to 
> hdfs://mycluster/hbase/data/default/aeris_v2/53308260a6b22eaf6ebb8353f7df3077/recovered.edits/0000000003169611654{noformat}
> And we can see that {{finishWriterThreads}} was interrupted:
> {noformat}
> regionserver-2 regionserver 06:22:20.643 
> [RS_LOG_REPLAY_OPS-regionserver/host01:16201-1] WARN  
> org.apache.hadoop.hbase.regionserver.SplitLogWorker - Resigning, interrupted 
> splitting WAL 
> hdfs://mycluster/hbase/WALs/host02,16201,1709180140645-splitting/host02%2C16201%2C1709180140645.1709186722780
> regionserver-2 regionserver java.io.InterruptedIOException: null
> regionserver-2 regionserver    at 
> org.apache.hadoop.hbase.wal.OutputSink.finishWriterThreads(OutputSink.java:139)
>  ~[hbase-server-2.4.17.jar:2.4.17]
> regionserver-2 regionserver    at 
> org.apache.hadoop.hbase.wal.RecoveredEditsOutputSink.close(RecoveredEditsOutputSink.java:94)
>  ~[hbase-server-2.4.17.jar:2.4.17]
> regionserver-2 regionserver    at 
> org.apache.hadoop.hbase.wal.WALSplitter.splitWAL(WALSplitter.java:414) 
> ~[hbase-server-2.4.17.jar:2.4.17]
> regionserver-2 regionserver    at 
> org.apache.hadoop.hbase.wal.WALSplitter.splitLogFile(WALSplitter.java:201) 
> ~[hbase-server-2.4.17.jar:2.4.17]
> regionserver-2 regionserver    at 
> org.apache.hadoop.hbase.regionserver.SplitLogWorker.splitLog(SplitLogWorker.java:108)
>  [hbase-server-2.4.17.jar:2.4.17]
> regionserver-2 regionserver    at 
> org.apache.hadoop.hbase.regionserver.SplitWALCallable.call(SplitWALCallable.java:100)
>  [hbase-server-2.4.17.jar:2.4.17]
> regionserver-2 regionserver    at 
> org.apache.hadoop.hbase.regionserver.SplitWALCallable.call(SplitWALCallable.java:46)
>  [hbase-server-2.4.17.jar:2.4.17]
> regionserver-2 regionserver    at 
> org.apache.hadoop.hbase.regionserver.handler.RSProcedureHandler.process(RSProcedureHandler.java:49)
>  [hbase-server-2.4.17.jar:2.4.17]
> regionserver-2 regionserver    at 
> org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:98) 
> [hbase-server-2.4.17.jar:2.4.17]
> regionserver-2 regionserver    at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [?:?]
> regionserver-2 regionserver    at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [?:?]
> regionserver-2 regionserver    at java.lang.Thread.run(Unknown Source) [?:?]
> regionserver-2 regionserver Caused by: java.lang.InterruptedException
> regionserver-2 regionserver    at java.lang.Object.wait(Native Method) ~[?:?]
> regionserver-2 regionserver    at java.lang.Thread.join(Unknown Source) ~[?:?]
> regionserver-2 regionserver    at java.lang.Thread.join(Unknown Source) ~[?:?]
> regionserver-2 regionserver    at 
> org.apache.hadoop.hbase.wal.OutputSink.finishWriterThreads(OutputSink.java:137)
>  ~[hbase-server-2.4.17.jar:2.4.17]
> regionserver-2 regionserver    ... 11 more{noformat}
>  
> The corrupted recovered.edits file contained one additional piece of 
> evidence pointing to a race condition: the zero bytes that form the 32-bit 
> trailer length were scattered through the file, while the {{LAWP}} magic 
> marker was written in one piece. This is because in OpenJDK 11 
> {{DataOutputStream.writeInt}} writes each byte individually, whereas byte 
> arrays are written in a single call.
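> To illustrate that JDK behaviour, here is a minimal sketch (a simplified 
> mirror of what OpenJDK 11's {{java.io.DataOutputStream}} does, not a 
> verbatim copy of the JDK source):
> {code:java}
> import java.io.IOException;
> import java.io.OutputStream;
> 
> // Simplified mirror of the OpenJDK 11 DataOutputStream behaviour described
> // above (not a verbatim copy of the JDK source).
> final class WriteGranularitySketch {
> 
>   // writeInt hands each of the four bytes to the underlying stream as a
>   // separate single-byte write, so a concurrent writer on the same stream
>   // can slip its own bytes in between them; hence the zero bytes of the
>   // trailer length ended up scattered through the corrupted file.
>   static void writeIntLikeJdk(OutputStream out, int v) throws IOException {
>     out.write((v >>> 24) & 0xFF);
>     out.write((v >>> 16) & 0xFF);
>     out.write((v >>> 8) & 0xFF);
>     out.write(v & 0xFF);
>   }
> 
>   // A byte array, such as the LAWP trailer magic, is passed down in a
>   // single call and therefore stays contiguous in the output.
>   static void writeMagicLikeJdk(OutputStream out, byte[] magic)
>     throws IOException {
>     out.write(magic, 0, magic.length);
>   }
> }
> {code}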
>  
> Original bug report from Vlad Hanciuta.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
