[PR] Hadoop 19641 [hadoop]

via GitHub Tue, 29 Jul 2025 06:05:07 -0700


anujmodi2021 opened a new pull request, #7835:
URL: https://github.com/apache/hadoop/pull/7835


   ### Description of PR
   JIRA: https://issues.apache.org/jira/browse/HADOOP-19641
   Across multiple workload runs we have observed that when we start reading 
data from an input stream, the first read that reaches the stream has to be 
served synchronously even if a prefetch request was triggered for that 
particular offset. Most of the time we end up doing extra work: checking 
whether the prefetch was triggered, removing it from the pending queue, and 
then doing a direct remote read in the workload thread itself.
   
   To avoid this overhead, we will always bypass read-ahead for the very 
first read of each input stream and trigger read-aheads from the second read 
onwards.
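
   The intended control flow can be illustrated with a minimal sketch. This is 
not the code in this patch: the class `FirstReadBypassSketch`, its fields, and 
the in-memory `remote` array standing in for the remote store are all 
hypothetical, and prefetching is only simulated by recording which offsets 
would be queued.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration only; names do not come from the Hadoop code base. */
class FirstReadBypassSketch {
    private final byte[] remote;        // assumption: stands in for the remote file
    private final int readAheadDepth;   // assumption: number of blocks to prefetch
    private boolean firstReadDone = false;

    /** Records which path each read took, so the behaviour can be inspected. */
    final List<String> log = new ArrayList<>();

    FirstReadBypassSketch(byte[] remote, int readAheadDepth) {
        this.remote = remote;
        this.readAheadDepth = readAheadDepth;
    }

    int read(long position, byte[] buf, int off, int len) {
        if (position >= remote.length) {
            return -1; // end of stream
        }
        if (!firstReadDone) {
            // Very first read on this stream: skip read-ahead bookkeeping
            // entirely and read directly in the calling thread.
            firstReadDone = true;
            log.add("direct@" + position);
            return readRemote(position, buf, off, len);
        }
        // Second read onwards: queue (simulated) prefetches for the current
        // block and the blocks after it, then serve the read.
        for (int i = 0; i < readAheadDepth; i++) {
            long next = position + (long) i * len;
            if (next < remote.length) {
                log.add("prefetch@" + next);
            }
        }
        log.add("buffered@" + position);
        return readRemote(position, buf, off, len);
    }

    /** Copies up to len bytes from the simulated remote store into buf. */
    private int readRemote(long position, byte[] buf, int off, int len) {
        int n = (int) Math.min(len, remote.length - position);
        System.arraycopy(remote, (int) position, buf, off, n);
        return n;
    }
}
```

   In this sketch the first `read` call never touches the prefetch queue, so 
there is nothing to check or dequeue before the direct read. Only later calls 
pay for read-ahead scheduling.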
   
   ### How was this patch tested?
   TBA
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


