[ https://issues.apache.org/jira/browse/HADOOP-18439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17601479#comment-17601479 ]
ASF GitHub Bot commented on HADOOP-18439:
-----------------------------------------
steveloughran commented on code in PR #4862:
URL: https://github.com/apache/hadoop/pull/4862#discussion_r965174231
##########
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/ChecksumFileSystem.java:
##########
@@ -371,12 +371,23 @@ static ByteBuffer checkBytes(ByteBuffer sumsBytes,
IntBuffer sums = sumsBytes.asIntBuffer();
sums.position(offset / FSInputChecker.CHECKSUM_SIZE);
ByteBuffer current = data.duplicate();
- int numChunks = data.remaining() / bytesPerSum;
+ int numFullChunks = data.remaining() / bytesPerSum;
+ boolean partialChunk = (data.remaining() % bytesPerSum != 0);
Review Comment:
wrap in () because % isn't used enough for the precedence to be immediately obvious
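For context, `%` already binds tighter than `!=` in Java, so the parentheses change nothing semantically and are purely for readability; a minimal illustration (names taken from the diff above):

    // Both forms parse identically: '%' has higher precedence than '!='.
    // The explicit parentheses just make the grouping obvious at a glance.
    boolean partialChunk = (data.remaining() % bytesPerSum != 0);
    boolean equivalent = (data.remaining() % bytesPerSum) != 0;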
##########
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/ChecksumFileSystem.java:
##########
@@ -396,11 +407,33 @@ static ByteBuffer checkBytes(ByteBuffer sumsBytes,
return data;
}
+ /**
+ * Validates range parameters.
+ * In case of CheckSum FS, we already have calculated
+ * fileLength so failing fast here.
+ * @param ranges requested ranges.
+ * @param fileLen length of file.
+ * @throws EOFException end of file exception.
+ */
+ private void validateRangeRequest(List<? extends FileRange> ranges, long fileLen) throws EOFException {
+ for (FileRange range : ranges) {
+ VectoredReadUtils.validateRangeRequest(range);
+ if (range.getOffset() + range.getLength() > fileLen) {
+ LOG.warn("Requested range [{}, {}) is beyond EOF for path {}",
+ range.getOffset(), range.getLength(), file);
+ throw new EOFException("Requested range [" + range.getOffset() + ", "
+ + range.getLength() + ") is beyond EOF for path " + file);
+ }
+ }
+ }
+
@Override
public void readVectored(List<? extends FileRange> ranges,
IntFunction<ByteBuffer> allocate) throws IOException {
+ long length = fs.getFileStatus(file).getLen();
Review Comment:
1. use `getFileLength()` which does the on-demand read
2. move that code to use `getFileStatus()`, as the way it does it today makes no sense; excessively inefficient.
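A minimal sketch of the lazy lookup being suggested, assuming a cached `fileLen` field on the stream (the field and its initial value are assumptions for illustration, not the PR's actual code):

    // Hypothetical sketch: resolve the length once, on demand, with a
    // single getFileStatus() call, instead of on every readVectored().
    private long fileLen = -1L;

    private long getFileLength() throws IOException {
      if (fileLen == -1L) {
        fileLen = fs.getFileStatus(file).getLen();
      }
      return fileLen;
    }

readVectored() could then call validateRangeRequest(ranges, getFileLength()) and skip the per-call status lookup entirely.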
##########
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/ChecksumFileSystem.java:
##########
@@ -371,12 +371,23 @@ static ByteBuffer checkBytes(ByteBuffer sumsBytes,
IntBuffer sums = sumsBytes.asIntBuffer();
sums.position(offset / FSInputChecker.CHECKSUM_SIZE);
ByteBuffer current = data.duplicate();
- int numChunks = data.remaining() / bytesPerSum;
+ int numFullChunks = data.remaining() / bytesPerSum;
+ boolean partialChunk = (data.remaining() % bytesPerSum != 0);
+ int totalChunks = numFullChunks;
+ if (partialChunk) {
+ totalChunks++;
+ }
CRC32 crc = new CRC32();
// check each chunk to ensure they match
- for(int c = 0; c < numChunks; ++c) {
+ for(int c = 0; c < totalChunks; ++c) {
// set the buffer position and the limit
- current.limit((c + 1) * bytesPerSum);
+ if (c == numFullChunks) {
+ int lastIncompleteChunk = data.remaining() % bytesPerSum;
Review Comment:
add comment to explain what is happening
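One way the commented branch might read (the limit arithmetic after the truncated diff line is inferred from the surrounding code, not quoted from the PR):

    if (c == numFullChunks) {
      // Trailing partial chunk: only (data.remaining() % bytesPerSum)
      // bytes remain after the full chunks, so clamp the limit to the
      // true end of the data rather than rounding up to a chunk boundary.
      int lastIncompleteChunk = data.remaining() % bytesPerSum;
      current.limit(c * bytesPerSum + lastIncompleteChunk);
    } else {
      // Full chunk: the limit is simply the next chunk boundary.
      current.limit((c + 1) * bytesPerSum);
    }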
> Fix VectoredIO for LocalFileSystem when checksum is enabled.
> ------------------------------------------------------------
>
> Key: HADOOP-18439
> URL: https://issues.apache.org/jira/browse/HADOOP-18439
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: common
> Affects Versions: 3.3.9
> Reporter: Mukund Thakur
> Assignee: Mukund Thakur
> Priority: Major
> Labels: pull-request-available
>
> While merging the ranges in ChecksumFs, they are rounded up based on the
> checksum chunk size, which leads to some ranges crossing the EOF. Those
> ranges need to be fixed, else the actual reads will throw EOFException.
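To make the failure mode concrete, here is illustrative arithmetic only (all numbers invented) showing how rounding a range out to checksum-chunk boundaries can cross EOF:

    // Chunk size 512, file length 1000, requested range [700, 900).
    long bytesPerSum = 512, offset = 700, end = 900, fileLen = 1000;
    long roundedStart = (offset / bytesPerSum) * bytesPerSum;                // 512
    long roundedEnd = ((end + bytesPerSum - 1) / bytesPerSum) * bytesPerSum; // 1024
    // roundedEnd (1024) > fileLen (1000): the merged range now extends past
    // EOF, and the underlying read will hit EOFException unless the range
    // is trimmed back to the file length.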