[
https://issues.apache.org/jira/browse/HADOOP-11847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14555483#comment-14555483
]
Kai Zheng commented on HADOOP-11847:
------------------------------------
Thanks for the further review and comments!
bq. for findFirstValidInput, still one comment not addressed:
Sorry, I missed explaining why the code is like that. The thinking was that
it's rarely the first unit that's erased, so in most cases just checking
{{inputs\[0\]}} will return the wanted result, avoiding the loop.
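For reference, a minimal sketch of that fast path (the method and parameter names here are just assumptions for illustration, not the exact patch code):
{code}
  /**
   * Find the first non-null input. inputs[0] is checked first since the
   * first unit is rarely the erased one, so most calls skip the loop.
   */
  protected static <T> T findFirstValidInput(T[] inputs) {
    if (inputs[0] != null) {
      return inputs[0]; // fast path: the common case
    }
    for (int i = 1; i < inputs.length; i++) {
      if (inputs[i] != null) {
        return inputs[i];
      }
    }
    throw new IllegalArgumentException(
        "Invalid inputs are found, all being null");
  }
{code}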
bq. Do we need maxInvalidUnits * 2 for bytesArrayBuffers and directBuffers?
Since we don't need additional buffer for inputs. The correct size should be ...
Good catch! How about simply having {{maxInvalidUnits = numParityUnits}}? The
benefit is that we don't have to re-allocate the shared buffers for different erasures.
bq. The share buffer size should be always the chunk size, otherwise they can't
be shared, since the dataLen may be different.
We no longer have or use chunkSize. Please note the check is:
{code}
+    if (bytesArrayBuffers == null || bytesArrayBuffers[0].length < dataLen) {
+      /**
+       * Create this set of buffers on demand, which is only needed at the first
+       * time running into this, using bytes array.
+       */
{code}
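To make the proposal concrete, a rough sketch of the on-demand allocation with {{maxInvalidUnits = numParityUnits}} (the field and method names are assumptions, not the actual patch):
{code}
  private void ensureBytesArrayBuffers(int dataLen) {
    if (bytesArrayBuffers == null || bytesArrayBuffers[0].length < dataLen) {
      // Create (or grow) the shared buffers on demand. Sizing the set by
      // numParityUnits means it never needs re-allocation for a different
      // erasure pattern; only a larger dataLen triggers a new allocation.
      bytesArrayBuffers = new byte[numParityUnits][];
      for (int i = 0; i < bytesArrayBuffers.length; i++) {
        bytesArrayBuffers[i] = new byte[dataLen];
      }
    }
  }
{code}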
bq. We should check erasedOrNotToReadIndexes contains erasedIndexes.
Good point. The check would avoid bad usage with mismatched inputs and
erasedIndexes.
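A possible form of that check (sketch only; the method name is made up):
{code}
  // Every index in erasedIndexes must also appear in erasedOrNotToReadIndexes,
  // which is derived from the null entries of inputs; otherwise fail fast.
  private static void checkErasedIndexes(int[] erasedOrNotToReadIndexes,
                                         int[] erasedIndexes) {
    for (int erased : erasedIndexes) {
      boolean found = false;
      for (int candidate : erasedOrNotToReadIndexes) {
        if (candidate == erased) {
          found = true;
          break;
        }
      }
      if (!found) {
        throw new IllegalArgumentException("Erased index " + erased
            + " is not covered by the null (not-to-read) inputs");
      }
    }
  }
{code}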
bq. We just need one loop...
Hmm, I'm not sure. We need to place the output buffers from the caller in the
correct positions. For example:
Assuming 6+3, recovering d0, not-to-read=\[p1, d3\], outputs = \[d0\]. Then
adjustedByteArrayOutputsParameter should be:
\[p1,d0,s1(d3)\], where s* means a shared buffer.
Would you check again? Thanks.
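To illustrate the placement only (the helper and field names are made up, and the actual patch may choose buffers differently for the non-erased slots):
{code}
  // Example above: 6+3, erasedIndexes = [d0], not-to-read = [p1, d3].
  // An index the caller asked to recover gets the caller's output buffer;
  // any other erased/not-to-read index gets one of the shared buffers.
  byte[][] adjusted = new byte[erasedOrNotToReadIndexes.length][];
  int callerOutputIdx = 0, sharedIdx = 0;
  for (int i = 0; i < erasedOrNotToReadIndexes.length; i++) {
    if (contains(erasedIndexes, erasedOrNotToReadIndexes[i])) {
      adjusted[i] = outputs[callerOutputIdx++];     // d0 from the caller
    } else {
      adjusted[i] = bytesArrayBuffers[sharedIdx++]; // e.g. s1 for d3
    }
  }
{code}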
> Enhance raw coder allowing to read least required inputs in decoding
> --------------------------------------------------------------------
>
> Key: HADOOP-11847
> URL: https://issues.apache.org/jira/browse/HADOOP-11847
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: io
> Reporter: Kai Zheng
> Assignee: Kai Zheng
> Labels: BB2015-05-TBR
> Attachments: HADOOP-11847-HDFS-7285-v3.patch,
> HADOOP-11847-HDFS-7285-v4.patch, HADOOP-11847-HDFS-7285-v5.patch,
> HADOOP-11847-HDFS-7285-v6.patch, HADOOP-11847-v1.patch, HADOOP-11847-v2.patch
>
>
> This is to enhance the raw erasure coder to allow reading only the least required
> inputs while decoding. It will also refine and document the relevant APIs for
> better understanding and usage. When using the least required inputs, it may add
> computing overhead but will possibly outperform overall since less network
> traffic and disk IO are involved.
> This is something we planned to do, but we were just reminded of it by [~zhz]'s
> question raised in HDFS-7678, also copied here:
> bq. Kai Zheng I have a question about decoding: in a (6+3) schema, if block #2
> is missing, and I want to repair it with blocks 0, 1, 3, 4, 5, 8, how should
> I construct the inputs to RawErasureDecoder#decode?
> With this work, hopefully the answer to the above question will be obvious.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)