omalley commented on a change in pull request #1830: HADOOP-11867: Add gather
API to file system.
URL: https://github.com/apache/hadoop/pull/1830#discussion_r374940578
##########
File path:
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/ChecksumFileSystem.java
##########
@@ -261,15 +270,129 @@ protected int readChunk(long pos, byte[] buf, int offset, int len,
len = Math.min(len, bytesPerSum * (sumLenRead / CHECKSUM_SIZE));
}
}
- if(pos != datas.getPos()) {
+ if (pos != datas.getPos()) {
datas.seek(pos);
}
int nread = readFully(datas, buf, offset, len);
if (eof && nread > 0) {
- throw new ChecksumException("Checksum error: "+file+" at "+pos, pos);
+ throw new ChecksumException("Checksum error: " + file + " at " + pos,
pos);
}
return nread;
}
+
+ public static long findChecksumOffset(long dataOffset,
+ int bytesPerSum) {
+ return HEADER_LENGTH + (dataOffset/bytesPerSum) *
FSInputChecker.CHECKSUM_SIZE;
+ }
+
+ /**
+ * Find the checksum ranges that correspond to the given data ranges.
Review comment:
Why is what needed? Do you mean the code that compares the checksums? The
current code relies on a lot of context that doesn't hold in the new API. It is
also very inefficient, because it did a bad job of working around those
limitations. In particular, if you look at the current pread code, it reopens
the crc file for each seek.
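
To make the offset arithmetic in `findChecksumOffset` concrete, here is a minimal standalone sketch. It is an illustration, not the code in this PR: the constant values (`HEADER_LENGTH = 8`, `CHECKSUM_SIZE = 4`) are assumptions, and `checksumRange` is a hypothetical helper showing one way a data range could be mapped to the byte range of the crc file that covers it.

```java
// Illustrative sketch only; constants and checksumRange are assumptions.
class ChecksumOffsetSketch {
  // Assumed size of the crc file header (version bytes + bytesPerSum).
  static final int HEADER_LENGTH = 8;
  // Assumed size of one stored checksum (a 32-bit CRC).
  static final int CHECKSUM_SIZE = 4;

  // Offset in the crc file of the checksum covering dataOffset.
  static long findChecksumOffset(long dataOffset, int bytesPerSum) {
    return HEADER_LENGTH + (dataOffset / bytesPerSum) * CHECKSUM_SIZE;
  }

  // Hypothetical helper: returns {offset, length} of the crc file bytes
  // covering the data range [dataOffset, dataOffset + dataLength).
  static long[] checksumRange(long dataOffset, long dataLength,
                              int bytesPerSum) {
    long start = findChecksumOffset(dataOffset, bytesPerSum);
    // Round the end of the data range up to the next chunk boundary so a
    // partially covered final chunk is included.
    long end = findChecksumOffset(dataOffset + dataLength + bytesPerSum - 1,
        bytesPerSum);
    return new long[] {start, end - start};
  }

  public static void main(String[] args) {
    long[] range = checksumRange(600, 1000, 512);
    System.out.println("crc offset=" + range[0] + " length=" + range[1]);
  }
}
```

With both the data range and its checksum range known up front, the two reads can be issued together and compared afterwards, without seeking in the crc file once per chunk.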