[ https://issues.apache.org/jira/browse/HADOOP-18439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17601479#comment-17601479 ]
ASF GitHub Bot commented on HADOOP-18439:
-----------------------------------------
steveloughran commented on code in PR #4862:
URL: https://github.com/apache/hadoop/pull/4862#discussion_r965174231
##########
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/ChecksumFileSystem.java:
##########
@@ -371,12 +371,23 @@ static ByteBuffer checkBytes(ByteBuffer sumsBytes,
IntBuffer sums = sumsBytes.asIntBuffer();
sums.position(offset / FSInputChecker.CHECKSUM_SIZE);
ByteBuffer current = data.duplicate();
- int numChunks = data.remaining() / bytesPerSum;
+ int numFullChunks = data.remaining() / bytesPerSum;
+ boolean partialChunk = (data.remaining() % bytesPerSum != 0);
Review Comment:
wrap in () because % isn't used enough for the precedence to be immediately obvious
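For context, `%` already binds tighter than `!=` in Java, so the parentheses change nothing semantically and are purely for readability; a minimal illustration (names taken from the diff above):

    // Both forms parse identically: '%' has higher precedence than '!='.
    // The explicit parentheses just make the grouping obvious at a glance.
    boolean partialChunk = (data.remaining() % bytesPerSum != 0);
    boolean equivalent = (data.remaining() % bytesPerSum) != 0;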
##########
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/ChecksumFileSystem.java:
##########
@@ -396,11 +407,33 @@ static ByteBuffer checkBytes(ByteBuffer sumsBytes,
return data;
}
+ /**
+ * Validates range parameters.
+ * In case of CheckSum FS, we already have calculated
+ * fileLength so failing fast here.
+ * @param ranges requested ranges.
+ * @param fileLen length of file.
+ * @throws EOFException end of file exception.
+ */
+ private void validateRangeRequest(List<? extends FileRange> ranges, long fileLen) throws EOFException {
+ for (FileRange range : ranges) {
+ VectoredReadUtils.validateRangeRequest(range);
+ if (range.getOffset() + range.getLength() > fileLen) {
+ LOG.warn("Requested range [{}, {}) is beyond EOF for path {}",
+ range.getOffset(), range.getLength(), file);
+ throw new EOFException("Requested range [" + range.getOffset() + ", "
+ + range.getLength() + ") is beyond EOF for path " + file);
+ }
+ }
+ }
+
@Override
public void readVectored(List<? extends FileRange> ranges,
IntFunction<ByteBuffer> allocate) throws IOException {
+ long length = fs.getFileStatus(file).getLen();
Review Comment:
1. use `getFileLength()` which does the on-demand read
2. move that code to use `getFileStatus()`, as the way it does it today makes no sense; excessively inefficient.
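A minimal sketch of the lazy lookup being suggested, assuming a cached `fileLen` field on the stream (the field and its initial value are assumptions for illustration, not the PR's actual code):

    // Hypothetical sketch: resolve the length once, on demand, with a
    // single getFileStatus() call, instead of on every readVectored().
    private long fileLen = -1L;

    private long getFileLength() throws IOException {
      if (fileLen == -1L) {
        fileLen = fs.getFileStatus(file).getLen();
      }
      return fileLen;
    }

readVectored() could then call validateRangeRequest(ranges, getFileLength()) and skip the per-call status lookup entirely.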
##########
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/ChecksumFileSystem.java:
##########
@@ -371,12 +371,23 @@ static ByteBuffer checkBytes(ByteBuffer sumsBytes,
IntBuffer sums = sumsBytes.asIntBuffer();
sums.position(offset / FSInputChecker.CHECKSUM_SIZE);
ByteBuffer current = data.duplicate();
- int numChunks = data.remaining() / bytesPerSum;
+ int numFullChunks = data.remaining() / bytesPerSum;
+ boolean partialChunk = (data.remaining() % bytesPerSum != 0);
+ int totalChunks = numFullChunks;
+ if (partialChunk) {
+ totalChunks++;
+ }
CRC32 crc = new CRC32();
// check each chunk to ensure they match
- for(int c = 0; c < numChunks; ++c) {
+ for(int c = 0; c < totalChunks; ++c) {
// set the buffer position and the limit
- current.limit((c + 1) * bytesPerSum);
+ if (c == numFullChunks) {
+ int lastIncompleteChunk = data.remaining() % bytesPerSum;
Review Comment:
add comment to explain what is happening
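One way the commented branch might read (the limit arithmetic after the truncated diff line is inferred from the surrounding code, not quoted from the PR):

    if (c == numFullChunks) {
      // Trailing partial chunk: only (data.remaining() % bytesPerSum)
      // bytes remain after the full chunks, so clamp the limit to the
      // true end of the data rather than rounding up to a chunk boundary.
      int lastIncompleteChunk = data.remaining() % bytesPerSum;
      current.limit(c * bytesPerSum + lastIncompleteChunk);
    } else {
      // Full chunk: the limit is simply the next chunk boundary.
      current.limit((c + 1) * bytesPerSum);
    }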
> Fix VectoredIO for LocalFileSystem when checksum is enabled.
> ------------------------------------------------------------
>
> Key: HADOOP-18439
> URL: https://issues.apache.org/jira/browse/HADOOP-18439
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: common
> Affects Versions: 3.3.9
> Reporter: Mukund Thakur
> Assignee: Mukund Thakur
> Priority: Major
> Labels: pull-request-available
>
> While merging the ranges in ChecksumFs, they are rounded up based on the
> checksum chunk size, which leads to some ranges crossing the EOF. Those
> ranges need to be fixed, else the actual reads will throw EOFException.
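To make the failure mode concrete, here is illustrative arithmetic only (all numbers invented) showing how rounding a range out to checksum-chunk boundaries can cross EOF:

    // Chunk size 512, file length 1000, requested range [700, 900).
    long bytesPerSum = 512, offset = 700, end = 900, fileLen = 1000;
    long roundedStart = (offset / bytesPerSum) * bytesPerSum;                // 512
    long roundedEnd = ((end + bytesPerSum - 1) / bytesPerSum) * bytesPerSum; // 1024
    // roundedEnd (1024) > fileLen (1000): the merged range now extends past
    // EOF, and the underlying read will hit EOFException unless the range
    // is trimmed back to the file length.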