[GitHub] [hadoop] omalley commented on a change in pull request #1830: HADOOP-11867: Add gather API to file system.

GitBox Tue, 04 Feb 2020 13:46:28 -0800

omalley commented on a change in pull request #1830: HADOOP-11867: Add gather API to file system.
URL: https://github.com/apache/hadoop/pull/1830#discussion_r374940578


 ##########
 File path: hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/ChecksumFileSystem.java
 ##########
 @@ -261,15 +270,129 @@ protected int readChunk(long pos, byte[] buf, int offset, int len,
           len = Math.min(len, bytesPerSum * (sumLenRead / CHECKSUM_SIZE));
         }
       }
-      if(pos != datas.getPos()) {
+      if (pos != datas.getPos()) {
         datas.seek(pos);
       }
       int nread = readFully(datas, buf, offset, len);
       if (eof && nread > 0) {
-        throw new ChecksumException("Checksum error: "+file+" at "+pos, pos);
+        throw new ChecksumException("Checksum error: " + file + " at " + pos, pos);
       }
       return nread;
     }
+
+    public static long findChecksumOffset(long dataOffset,
+                                          int bytesPerSum) {
+      return HEADER_LENGTH + (dataOffset/bytesPerSum) * FSInputChecker.CHECKSUM_SIZE;
+    }
+
+    /**
+     * Find the checksum ranges that correspond to the given data ranges.
 
 Review comment:
   Why is what needed? Do you mean the code that compares the checksums? The
current code depends on a lot of context that doesn't hold in the new API. It
is also super inefficient, because it does a bad job of working around those
limitations. In particular, if you look at the current pread code, it reopens
the crc file for each seek.
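
   For reference, here is a minimal, self-contained sketch of the offset
arithmetic in the hunk above. It is not the PR's implementation: the constant
values (an 8-byte header and 4-byte checksums) are assumptions standing in for
`HEADER_LENGTH` and `FSInputChecker.CHECKSUM_SIZE`, and `checksumRange` is a
hypothetical helper that only illustrates how one data range maps to one
contiguous span of the crc file.

```java
public class ChecksumOffsetSketch {
  // Assumed values; stand-ins for the constants referenced in the diff.
  static final int HEADER_LENGTH = 8;
  static final int CHECKSUM_SIZE = 4;

  // Same arithmetic as findChecksumOffset in the diff: skip the header,
  // then one checksum per bytesPerSum-sized chunk of data.
  static long findChecksumOffset(long dataOffset, int bytesPerSum) {
    return HEADER_LENGTH + (dataOffset / bytesPerSum) * CHECKSUM_SIZE;
  }

  // Hypothetical helper: the [start, end) byte span of the crc file that
  // covers the data range [dataOffset, dataOffset + length), length > 0.
  static long[] checksumRange(long dataOffset, int length, int bytesPerSum) {
    long start = findChecksumOffset(dataOffset, bytesPerSum);
    // Round the end of the data range up to the next chunk boundary.
    long end = findChecksumOffset(dataOffset + length + bytesPerSum - 1,
        bytesPerSum);
    return new long[] {start, end};
  }

  public static void main(String[] args) {
    // Data bytes 1000..1099 with 512-byte chunks touch chunks 1 and 2.
    long[] range = checksumRange(1000, 100, 512);
    System.out.println(range[0] + ".." + range[1]); // prints 12..20
  }
}
```

   Under these assumptions, a single read of the crc file per data range is
enough to fetch every checksum needed to verify that range.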


