[ 
https://issues.apache.org/jira/browse/HADOOP-19902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HADOOP-19902:
------------------------------------
    Labels: pull-request-available  (was: )

> [ABFS] Small write optimization fails hflush followed by close by retaining 
> consumed block
> ------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-19902
>                 URL: https://issues.apache.org/jira/browse/HADOOP-19902
>             Project: Hadoop Common
>          Issue Type: Bug
>            Reporter: Chao Sun
>            Priority: Major
>              Labels: pull-request-available
>
> When `fs.azure.write.enableappendwithflush` is enabled, `AbfsOutputStream` 
> throws `IllegalStateException` from `close()` when a write smaller than the 
> write buffer is followed by `hflush()` and then `close()`.
> h3. Reproducer
> {code:java}
> try (FSDataOutputStream out = fs.create(path)) {
>   out.write(new byte[1000]);
>   out.hflush();
> }
> {code}
> Run with `fs.azure.write.enableappendwithflush=true` and a write buffer 
> larger than the payload. The issue is present on current trunk and branch-3.4.
> h3. Actual behavior
> The `hflush()` call sends an append-with-flush request and consumes the 
> underlying data block. The subsequent `close()` still sees the same block as 
> active and attempts to upload it again, failing before a second append can be 
> sent:
> {code}
> java.lang.IllegalStateException: Expected stream state Writing -but actual state is Closed in ByteBufferBlock{...}
>   at org.apache.hadoop.fs.store.DataBlocks$DataBlock.verifyState(...)
>   at org.apache.hadoop.fs.store.DataBlocks$ByteBufferBlock.startUpload(...)
>   at org.apache.hadoop.fs.azurebfs.services.AbfsBlock.startUpload(...)
>   at org.apache.hadoop.fs.azurebfs.services.AbfsOutputStream.uploadBlockAsync(...)
>   at org.apache.hadoop.fs.azurebfs.services.AbfsOutputStream.smallWriteOptimizedflushInternal(...)
>   at org.apache.hadoop.fs.azurebfs.services.AbfsOutputStream.close(...)
> {code}
> h3. Expected behavior
> After an optimized `hflush()`, `close()` should complete successfully without 
> attempting to re-upload the data already submitted by the flush-mode append.
> h3. Root cause
> `smallWriteOptimizedflushInternal()` calls `uploadBlockAsync()`, which 
> invokes `startUpload()` and consumes the active block, but the optimized path 
> does not clear that block from the block manager. The regular 
> `uploadCurrentBlock()` path already clears the active block in a `finally` 
> block after submission.
> h3. Proposed fix
> Clear the active block after submitting the optimized append-with-flush, 
> matching the lifecycle used by regular uploads, and add a regression test for 
> `write() -> hflush() -> close()` that verifies the payload is appended 
> exactly once.
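
The lifecycle problem described under "Root cause" can be illustrated with a small toy model. This is only a sketch under stated assumptions: `BlockLifecycleSketch`, `ToyBlock`, `ToyStream`, and the `clearAfterFlush` switch are hypothetical names invented for illustration and are not ABFS classes or the actual patch. The model only captures the idea that a block can be uploaded once, and that the stream must drop its reference to the block after submitting it.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the block lifecycle. All names are hypothetical, not ABFS classes. */
class BlockLifecycleSketch {

  /** A block that may be handed to an upload exactly once. */
  static final class ToyBlock {
    private final byte[] data;
    private boolean consumed;

    ToyBlock(byte[] data) {
      this.data = data;
    }

    byte[] startUpload() {
      if (consumed) {
        throw new IllegalStateException("block already consumed");
      }
      consumed = true;
      return data;
    }
  }

  /** Stream holding at most one active block. */
  static final class ToyStream {
    private final boolean clearAfterFlush; // false models the reported bug
    private final List<byte[]> appended = new ArrayList<>();
    private ToyBlock active;

    ToyStream(boolean clearAfterFlush) {
      this.clearAfterFlush = clearAfterFlush;
    }

    void write(byte[] payload) {
      active = new ToyBlock(payload.clone());
    }

    void hflush() {
      flushActiveBlock();
    }

    void close() {
      flushActiveBlock();
    }

    private void flushActiveBlock() {
      if (active == null) {
        return; // nothing pending, so close() after a flush is a no-op
      }
      try {
        appended.add(active.startUpload());
      } finally {
        if (clearAfterFlush) {
          active = null; // drop the consumed block, as the regular upload path does
        }
      }
    }

    int appendCount() {
      return appended.size();
    }
  }

  /** Runs write() -> hflush() -> close() and returns the number of appends. */
  static int run(boolean clearAfterFlush) {
    ToyStream stream = new ToyStream(clearAfterFlush);
    stream.write(new byte[1000]);
    stream.hflush();
    stream.close();
    return stream.appendCount();
  }

  public static void main(String[] args) {
    System.out.println("fixed: appends = " + run(true));
    try {
      run(false);
    } catch (IllegalStateException e) {
      System.out.println("buggy: " + e.getMessage());
    }
  }
}
```

With the block cleared after the flush, `close()` finds nothing pending and the payload is appended once. Without the clearing step, `close()` tries to start a second upload of the consumed block and fails, which mirrors the shape of the stack trace above.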



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
