[
https://issues.apache.org/jira/browse/HADOOP-11847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14555514#comment-14555514
]
Yi Liu commented on HADOOP-11847:
---------------------------------
{quote}
Sorry, I missed explaining why the code is like that. The thinking was that
it's rarely the first unit that's erased, so in most cases just checking
inputs\[0\] will return the wanted result, avoiding entering the loop.
{quote}
If the first element is not null, it will return immediately. But wouldn't the loop handle that case anyway?
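For context, a minimal sketch of the pattern under discussion (the helper name is hypothetical, not the actual patch code). Note that a plain scan already returns the first element on its first iteration when it is valid:
{code}
// Hypothetical helper, not the actual patch code: a plain scan already
// returns inputs[0] on its first iteration when it is valid, so a
// separate up-front check of inputs[0] only skips the loop bookkeeping.
static <T> T findFirstValidInput(T[] inputs) {
  for (T input : inputs) {
    if (input != null) {
      return input;
    }
  }
  throw new IllegalArgumentException("All inputs are null/erased");
}
{code}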
{quote}
How about simply having maxInvalidUnits = numParityUnits? The good thing is we
don't have to re-allocate the shared buffers for different erasures.
{quote}
We don't need to allocate {{numParityUnits}} buffers; the output should have
at least one, right? Maybe more than one. I don't think we have to re-allocate
the shared buffers for different erasures. If the existing buffers are not
enough, we allocate new ones and add them to the shared pool; that's typical
behavior.
{quote}
We don't have or use chunkSize now. Please note the check is:
{quote}
Right, we don't need to use chunkSize now. I think
{{bytesArrayBuffers\[0\].length < dataLen}} is OK as the check.
{{ensureBytesArrayBuffer}} and {{ensureDirectBuffers}} need to be renamed and
rewritten per the above comments.
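Putting these points together, a rough sketch of the grow-on-demand pooling described above (the field and method names here are illustrative assumptions, not the actual patch code):
{code}
// Illustrative sketch only; names and layout are assumptions.
private byte[][] bytesArrayBuffers = new byte[0][];

private void ensureBytesArrayBuffers(int numNeeded, int dataLen) {
  // Grow the shared pool only when it is too small or its buffers are
  // too short; otherwise keep reusing it across decode calls.
  if (bytesArrayBuffers.length < numNeeded
      || (bytesArrayBuffers.length > 0
          && bytesArrayBuffers[0].length < dataLen)) {
    byte[][] newPool = new byte[numNeeded][];
    for (int i = 0; i < numNeeded; i++) {
      if (i < bytesArrayBuffers.length
          && bytesArrayBuffers[i].length >= dataLen) {
        newPool[i] = bytesArrayBuffers[i];  // reuse big-enough buffers
      } else {
        newPool[i] = new byte[dataLen];     // allocate and add to the pool
      }
    }
    bytesArrayBuffers = newPool;
  }
}
{code}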
{quote}
Would you check again? Thanks.
{quote}
{code}
// First, point every adjusted output at a shared temp buffer from the
// pool, reset via resetBuffer before use.
for (int i = 0; i < adjustedByteArrayOutputsParameter.length; i++) {
  adjustedByteArrayOutputsParameter[i] =
      resetBuffer(bytesArrayBuffers[bufferIdx++], 0, dataLen);
  adjustedOutputOffsets[i] = 0; // Always 0 for such temp output
}

// Then, for positions the caller actually asked for, swap in the passed
// output buffers so no copy is needed afterwards.
int outputIdx = 0;
for (int i = 0; i < erasedIndexes.length; i++, outputIdx++) {
  for (int j = 0; j < erasedOrNotToReadIndexes.length; j++) {
    // If this index is one requested by the caller via erasedIndexes, then
    // we use the passed output buffer to avoid copying data thereafter.
    if (erasedIndexes[i] == erasedOrNotToReadIndexes[j]) {
      adjustedByteArrayOutputsParameter[j] =
          resetBuffer(outputs[outputIdx], 0, dataLen);
      adjustedOutputOffsets[j] = outputOffsets[outputIdx];
    }
  }
}
{code}
You call {{resetBuffer}} parityNum + erasedIndexes.length times in total (for example, 3 + 2 = 5 calls per decode for two erasures in a 6+3 schema), is that intended?
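For reference, a sketch of what {{resetBuffer}} presumably does, which is why the number of calls matters (this is an assumption about its behavior, not the actual source):
{code}
// Presumed behavior of resetBuffer (a sketch, not the actual source):
// zero the target region so stale bytes never leak into decode results.
private static byte[] resetBuffer(byte[] buffer, int offset, int len) {
  for (int i = offset; i < offset + len; i++) {
    buffer[i] = (byte) 0;
  }
  return buffer;
}
{code}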
> Enhance raw coder allowing to read least required inputs in decoding
> --------------------------------------------------------------------
>
> Key: HADOOP-11847
> URL: https://issues.apache.org/jira/browse/HADOOP-11847
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: io
> Reporter: Kai Zheng
> Assignee: Kai Zheng
> Labels: BB2015-05-TBR
> Attachments: HADOOP-11847-HDFS-7285-v3.patch,
> HADOOP-11847-HDFS-7285-v4.patch, HADOOP-11847-HDFS-7285-v5.patch,
> HADOOP-11847-HDFS-7285-v6.patch, HADOOP-11847-v1.patch, HADOOP-11847-v2.patch
>
>
> This is to enhance the raw erasure coder to allow reading only the least
> required inputs while decoding. It will also refine and document the relevant
> APIs for better understanding and usage. When using the least required inputs,
> it may add computing overhead but will possibly outperform overall since less
> network traffic and disk IO are involved.
> This is something we planned to do but were just reminded of by [~zhz]'s
> question raised in HDFS-7678, also copied here:
> bq. Kai Zheng, I have a question about decoding: in a (6+3) schema, if block #2
> is missing, and I want to repair it with blocks 0, 1, 3, 4, 5, 8, how should
> I construct the inputs to RawErasureDecoder#decode?
> With this work, hopefully the answer to the above question will be obvious.
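> A hedged sketch of how the inputs might then be constructed for that case
> (variable names and sizes are illustrative; the null convention for erased
> or skipped units is the one this issue proposes):
> {code}
> // Illustrative sketch for the (6+3) question above: inputs spans all
> // 9 units (6 data + 3 parity), and a null entry marks a unit that is
> // either erased or deliberately not read.
> int blockLen = 1024;               // illustrative unit size
> byte[][] inputs = new byte[9][];
> for (int i : new int[] {0, 1, 3, 4, 5, 8}) {
>   inputs[i] = new byte[blockLen];  // the six units actually read
> }
> // inputs[2] stays null (erased); inputs[6], inputs[7] stay null (skipped).
>
> int[] erasedIndexes = new int[] {2};       // recover unit #2 only
> byte[][] outputs = new byte[1][blockLen];  // one output per erased unit
> // 'decoder' is assumed to be a RawErasureDecoder initialized for RS(6,3).
> decoder.decode(inputs, erasedIndexes, outputs);
> {code}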