[jira] [Commented] (HADOOP-18521) ABFS ReadBufferManager buffer sharing across concurrent HTTP requests

ASF GitHub Bot (Jira) Tue, 15 Nov 2022 07:17:24 -0800


    [ 
https://issues.apache.org/jira/browse/HADOOP-18521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17634407#comment-17634407
 ]


ASF GitHub Bot commented on HADOOP-18521:
-----------------------------------------

snvijaya commented on PR #5133:
URL: https://github.com/apache/hadoop/pull/5133#issuecomment-1315456701

   > Having spent the last couple of weeks staring at this code, I think what I 
would like from the module is
   > 
   > * per file system instance
   > * stats reporting to that fs (gauges on queues; actual fetch/hit/miss 
numbers could come via abfs input stream)
   > * demand creation of the buffer pool. Maybe when the last active stream is 
closed it could even free them all; no, that adds the reference counting problem 
to the mix.
   > * when an abfs input stream is created for vector/random IO, prefetch 
being automatically disabled.
   > * anything we can do to improve testing, especially simulation of heavy 
load from multiple tasks.
   
   Thanks for your input, @steveloughran. I agree that all of these apply to the 
long-term transformation of the prefetch handling. I am going to take a day 
to see which of these I can get into this change as well, but I want to 
prioritize a hotfix and remove the buggy bit quickly.




> ABFS ReadBufferManager buffer sharing across concurrent HTTP requests
> ---------------------------------------------------------------------
>
>                 Key: HADOOP-18521
>                 URL: https://issues.apache.org/jira/browse/HADOOP-18521
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs/azure
>    Affects Versions: 3.3.2, 3.3.3, 3.3.4
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>            Priority: Critical
>              Labels: pull-request-available
>
> AbfsInputStream.close() can trigger the return of buffers used for active 
> prefetch GET requests into the ReadBufferManager free buffer pool.
> A subsequent prefetch by a different stream in the same process may acquire 
> this same buffer. The GET request still in flight can then overwrite that 
> stream's prefetched data, and the corrupted data may be returned to the 
> thread reading from that stream.
> On releases without the fix for this (3.3.2+), the bug can be avoided by 
> disabling all prefetching:
> {code}
> fs.azure.readaheadqueue.depth = 0
> {code}
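
The ordering described in the issue can be replayed in a toy, single-threaded Java sketch. This is not the ReadBufferManager code: BufferReuseDemo, ToyPool, fill, read and replay are hypothetical names invented for illustration, and the real problem involves prefetch worker threads rather than a fixed sequence of steps. It only shows why returning a buffer to the free pool while a GET still holds it can corrupt another stream's data.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy model only: NOT the real ReadBufferManager. All names are hypothetical. */
class BufferReuseDemo {

    /** A minimal free list of reusable byte buffers. */
    static final class ToyPool {
        private final Deque<byte[]> free = new ArrayDeque<>();

        ToyPool(int count, int size) {
            for (int i = 0; i < count; i++) {
                free.push(new byte[size]);
            }
        }

        byte[] acquire() {
            return free.pop();
        }

        void release(byte[] buffer) {
            free.push(buffer);
        }
    }

    /** Stands in for an HTTP GET writing its response body into the buffer. */
    static void fill(byte[] buffer, String data) {
        byte[] src = data.getBytes(StandardCharsets.US_ASCII);
        System.arraycopy(src, 0, buffer, 0, src.length);
    }

    /** Stands in for a stream reading its prefetched bytes. */
    static String read(byte[] buffer, int length) {
        return new String(buffer, 0, length, StandardCharsets.US_ASCII);
    }

    /** Replays the hazardous ordering sequentially; returns what stream B reads. */
    static String replay() {
        ToyPool pool = new ToyPool(1, 8);

        // Stream A starts a prefetch and holds the buffer while its GET is in flight.
        byte[] bufferA = pool.acquire();

        // Stream A is closed: the buffer goes back to the free pool even though
        // the GET for A has not finished writing into it.
        pool.release(bufferA);

        // Stream B starts its own prefetch and is handed the same array.
        byte[] bufferB = pool.acquire();
        fill(bufferB, "BBBBBBBB");

        // The late response for stream A now lands in the shared array.
        fill(bufferA, "AAAA");

        // Stream B reads what it believes is its own prefetched data.
        return read(bufferB, 8);
    }

    public static void main(String[] args) {
        System.out.println(replay()); // prints AAAABBBB
    }
}
```

In this sketch stream B reads "AAAABBBB" instead of "BBBBBBBB", because both streams hold a reference to the same array. Setting fs.azure.readaheadqueue.depth to 0, as described above, avoids the situation by never issuing prefetch requests at all.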



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
