Copilot commented on code in PR #7835:
URL: https://github.com/apache/hadoop/pull/7835#discussion_r2244433207
##########
hadoop-tools/hadoop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/services/TestAbfsInputStream.java:
##########
@@ -477,11 +494,14 @@ public void testSuccessfulReadAhead() throws Exception {
AbfsInputStream inputStream = getAbfsInputStream(client,
"testSuccessfulReadAhead.txt");
int beforeReadCompletedListSize =
getBufferManager().getCompletedReadListSize();
- // First read request that triggers readAheads.
+ // First read request always bypasses readahead. This will succeed
inputStream.read(new byte[ONE_KB]);
- // Only the 3 readAhead threads should have triggered client.read
- verifyReadCallCount(client, 3);
+ // This will start triggering readahead requests.
+ inputStream.read((new byte[1024]));
Review Comment:
Inconsistent array size usage. The code uses `new byte[1024]` here while
using `new byte[ONE_KB]` elsewhere in the same file. Using the constant
`ONE_KB` would be more consistent and maintainable.
```suggestion
inputStream.read((new byte[ONE_KB]));
```
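For reference, a minimal sketch of the constant being recommended, assuming it is declared in the test class the way size constants usually are in these tests (the exact declaration may differ):
```java
// Hypothetical declaration; the test file presumably defines an equivalent constant.
private static final int ONE_KB = 1024;
```
Using the named constant keeps the buffer size defined in one place if the test data sizes ever change.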
##########
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsInputStream.java:
##########
@@ -348,14 +348,20 @@ private int readOneBlock(final byte[] b, final int off, final int len) throws IO
if (alwaysReadBufferSize) {
bytesRead = readInternal(fCursor, buffer, 0, bufferSize, false);
} else {
- // Enable readAhead when reading sequentially
- if (-1 == fCursorAfterLastRead || fCursorAfterLastRead == fCursor || b.length >= bufferSize) {
- LOG.debug("Sequential read with read ahead size of {}", bufferSize);
+ // Switch between enabling and disabling read ahead based on the workload read pattern.
+ // First Read on input stream should always bypass read ahead.
+ if (firstRead) {
Review Comment:
The `firstRead` variable is referenced but neither defined nor updated in this
diff: there is no logic shown to set it to `false` after the first read, so
every read could bypass readahead indefinitely.
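To make the concern concrete, here is a minimal sketch of the state management that would address it, assuming a boolean field on `AbfsInputStream` and the `readInternal` call shown above (names and placement are hypothetical, not taken from this diff):
```java
// Hypothetical field: defaults to true so the very first read bypasses readahead.
private boolean firstRead = true;

// Inside readOneBlock(...):
if (firstRead) {
  // Bypass readahead on the initial read, then clear the flag so later
  // sequential reads can re-enable readahead.
  bytesRead = readInternal(fCursor, buffer, 0, bufferSize, true);
  firstRead = false;
} else if (-1 == fCursorAfterLastRead || fCursorAfterLastRead == fCursor
    || b.length >= bufferSize) {
  LOG.debug("Sequential read with read ahead size of {}", bufferSize);
  bytesRead = readInternal(fCursor, buffer, 0, bufferSize, false);
}
```
Without `firstRead = false` somewhere on this path, every read would take the bypass branch, as noted above.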
##########
hadoop-tools/hadoop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/services/TestAbfsInputStream.java:
##########
@@ -119,14 +119,14 @@ private AbfsInputStream getAbfsInputStream(AbfsClient mockAbfsClient,
String fileName) throws IOException {
AbfsInputStreamContext inputStreamContext = new AbfsInputStreamContext(-1);
// Create AbfsInputStream with the client instance
- AbfsInputStream inputStream = new AbfsInputStream(
+ AbfsInputStream inputStream = Mockito.spy(new AbfsInputStream(
mockAbfsClient,
null,
FORWARD_SLASH + fileName,
- THREE_KB,
+ THREE_KB + ONE_KB, // First read will always bypass readahead
Review Comment:
The comment suggests this change is tied to the first read bypassing
readahead, but increasing the file size by `ONE_KB` doesn't clearly explain
that relationship. The change appears to adjust the test data size rather
than directly exercise the first-read bypass behavior.
```suggestion
ONE_KB, // Set file size to match the read buffer size to ensure the first read bypasses readahead
```
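That said, one possible reading of the original `THREE_KB + ONE_KB` sizing (assuming the stream's read buffer is `ONE_KB`) is that the extra kilobyte feeds the first read, which now bypasses readahead, leaving `THREE_KB` for the readahead assertions:
```java
// Hypothetical size walkthrough; buffer size assumed to be ONE_KB.
int fileSize = THREE_KB + ONE_KB;    // 4 KB of test data
inputStream.read(new byte[ONE_KB]);  // first read bypasses readahead, consumes 1 KB
// THREE_KB remain for the readahead requests the test then verifies
// (e.g. via verifyReadCallCount).
```
If that is the intent, spelling it out in the inline comment would make the relationship clearer.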
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.