Kai Xie created HADOOP-16049:
--------------------------------
Summary: DistCp result has data and checksum mismatch when blocks per chunk > 0
Key: HADOOP-16049
URL: https://issues.apache.org/jira/browse/HADOOP-16049
Project: Hadoop Common
Issue Type: Bug
Components: tools/distcp
Affects Versions: 2.9.2
Reporter: Kai Xie
In 2.9.2 RetriableFileCopyCommand.copyBytes,
{code:java}
int bytesRead = readBytes(inStream, buf, sourceOffset);
while (bytesRead >= 0) {
  ...
  if (action == FileAction.APPEND) {
    // the only place sourceOffset is advanced
    sourceOffset += bytesRead;
  }
  ... // write to dst
  bytesRead = readBytes(inStream, buf, sourceOffset);
}
{code}
it does a positioned read, but the position (`sourceOffset` here) is only advanced for the APPEND action. When blocks per chunk is set to > 0, the append action is always disabled, so `sourceOffset` is never updated. As a result, for a chunk with a non-zero offset, the loop keeps copying the same first bytes of the chunk again and again, and the copied file ends up with a data and checksum mismatch.
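A minimal in-memory sketch of the failure mode, assuming a positioned read that always reads from the position it is given. readAt is a hypothetical stand-in for a positioned read (it is not Hadoop code), and the chunkLength bound is an assumption, since the excerpt above elides the loop's termination logic. The sketch does not model how the real readBytes treats a zero position; the report says only chunks with a non-zero offset are affected, so the example copies the second half of a 16-byte file.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

public class PositionedReadBugDemo {

    /** Hypothetical positioned read over an in-memory "file"; returns -1 at EOF. */
    static int readAt(byte[] file, long position, byte[] buf) {
        if (position >= file.length) {
            return -1;
        }
        int n = (int) Math.min(buf.length, file.length - position);
        System.arraycopy(file, (int) position, buf, 0, n);
        return n;
    }

    /**
     * Copies chunkLength bytes starting at chunkOffset (bufferSize must be > 0).
     * advanceOffset == false mimics the reported behaviour (offset never advanced);
     * advanceOffset == true advances the position after every read.
     */
    static byte[] copyChunk(byte[] file, long chunkOffset, int chunkLength,
                            int bufferSize, boolean advanceOffset) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[bufferSize];
        long sourceOffset = chunkOffset;
        int remaining = chunkLength; // assumed bound; elided in the original excerpt
        int bytesRead = readAt(file, sourceOffset, buf);
        while (bytesRead >= 0 && remaining > 0) {
            int toWrite = Math.min(bytesRead, remaining);
            if (advanceOffset) {
                sourceOffset += bytesRead;
            }
            out.write(buf, 0, toWrite);
            remaining -= toWrite;
            // Without advancing, this re-reads the same bytes at the chunk start.
            bytesRead = readAt(file, sourceOffset, buf);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] file = new byte[16];
        for (int i = 0; i < file.length; i++) {
            file[i] = (byte) i; // bytes 0..15
        }
        // Second 8-byte chunk, copied with a 4-byte buffer.
        System.out.println("expected: " + Arrays.toString(Arrays.copyOfRange(file, 8, 16)));
        System.out.println("buggy:    " + Arrays.toString(copyChunk(file, 8, 8, 4, false)));
        System.out.println("fixed:    " + Arrays.toString(copyChunk(file, 8, 8, 4, true)));
    }
}
```

Running main prints [8, 9, 10, 11, 12, 13, 14, 15] for the expected and fixed copies, but [8, 9, 10, 11, 8, 9, 10, 11] for the buggy one: the first buffer of the chunk is written twice, which is exactly the kind of content (and therefore checksum) mismatch described above.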
HADOOP-15292 has resolved this issue by not using the positioned read, but the fix has not yet been backported to branch-2.
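A rough sketch of the general idea behind avoiding the positioned read: move to the chunk offset once, then use plain sequential reads, which advance the stream position on their own so no offset bookkeeping is needed. This is not the HADOOP-15292 patch; skipFully is a hypothetical helper standing in for a seek (on an FSDataInputStream this would be seek(long)), and a ByteArrayInputStream stands in for the source file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

public class SequentialChunkCopy {

    /** Hypothetical helper: skips exactly n bytes, standing in for a seek. */
    static void skipFully(InputStream in, long n) throws IOException {
        while (n > 0) {
            long skipped = in.skip(n);
            if (skipped <= 0) {
                // skip() may return 0 before EOF; read one byte to make progress or detect EOF.
                if (in.read() < 0) {
                    throw new IOException("EOF before chunk offset");
                }
                skipped = 1;
            }
            n -= skipped;
        }
    }

    /** Copies chunkLength bytes starting at chunkOffset using sequential reads (bufferSize > 0). */
    static byte[] copyChunk(InputStream in, long chunkOffset, int chunkLength, int bufferSize)
            throws IOException {
        skipFully(in, chunkOffset);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[bufferSize];
        int remaining = chunkLength;
        while (remaining > 0) {
            // Each sequential read advances the stream position by itself.
            int bytesRead = in.read(buf, 0, Math.min(buf.length, remaining));
            if (bytesRead < 0) {
                break; // EOF before the chunk was complete
            }
            out.write(buf, 0, bytesRead);
            remaining -= bytesRead;
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] file = new byte[16];
        for (int i = 0; i < file.length; i++) {
            file[i] = (byte) i;
        }
        byte[] chunk = copyChunk(new ByteArrayInputStream(file), 8, 8, 4);
        System.out.println(Arrays.toString(chunk)); // [8, 9, 10, 11, 12, 13, 14, 15]
    }
}
```

Because the read position lives in the stream rather than in a separate variable, there is no code path (APPEND or otherwise) in which it can be forgotten.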
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)