[ 
https://issues.apache.org/jira/browse/HADOOP-13010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15242157#comment-15242157
 ] 

Colin Patrick McCabe commented on HADOOP-13010:
-----------------------------------------------

bq. The underlying buffer for the empty chunk is assumed read-only and will 
only be used to zero coding buffers. Making the entire function thread-safe and 
also private is a good idea because in practice that level should be good enough.

Right.  The arrays themselves are read-only.  But we still have to control 
access to the pointer to the array, which is not read-only and which needs to 
be accessed in a thread-safe fashion.
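To illustrate the point, here is a minimal sketch (the class and method names are made up, not the actual Hadoop code): the zero-filled array is never mutated after allocation, but the reference to it can be replaced when a larger buffer is requested, so access to the reference itself is synchronized.

```java
// Hypothetical holder for the shared zero buffer; illustrative only.
public final class ZeroBufHolder {
  // The array contents are read-only by convention, but this reference
  // is mutable and must be published/read under the class lock.
  private static byte[] zeroBuf;

  static synchronized byte[] getZeroBuf(int leastLength) {
    if (zeroBuf == null || zeroBuf.length < leastLength) {
      // A freshly allocated Java byte[] is guaranteed all-zero.
      zeroBuf = new byte[leastLength];
    }
    return zeroBuf;
  }
}
```

Without the `synchronized`, one thread could observe a stale or partially published reference while another thread grows the buffer.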

bq. For pure Java coders that use byte arrays and on-heap bytebuffers, this way 
of zeroing buffers is efficient (perhaps the most efficient one, but I'm not 
totally sure); to zero a direct bytebuffer, the more efficient way would be to 
use an empty direct bytebuffer. I don't optimize this because pure Java coders 
are better off not using direct bytebuffers overall. Note that native coders 
will prefer direct bytebuffers but won't run into this, as we discussed in 
HADOOP-11540.

Yeah, the JNI encoders can be more efficient, so we don't have to worry about 
optimizing this.  I was just commenting that it's unfortunate that we have to 
keep around the empty array.
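For reference, zeroing an arbitrary {{ByteBuffer}} (heap or direct) by bulk-copying from a shared all-zero array might look like the sketch below; the class, field, and method names are illustrative, not the patch's actual code.

```java
import java.nio.ByteBuffer;

// Illustrative sketch: zero a buffer's [position, limit) region by
// repeatedly bulk-putting from a shared all-zero array.
public final class BufferZeroer {
  private static final byte[] ZERO = new byte[4096]; // arbitrary chunk size

  static void zero(ByteBuffer buf) {
    int pos = buf.position();
    int remaining = buf.remaining();
    while (remaining > 0) {
      int n = Math.min(remaining, ZERO.length);
      buf.put(ZERO, 0, n); // works for both heap and direct buffers
      remaining -= n;
    }
    buf.position(pos); // restore so the caller sees an unchanged position
  }
}
```

For a direct buffer, a bulk copy from another direct buffer would likely avoid a JNI-boundary copy per chunk, which is the optimization being deferred here.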

bq. Ok. A comment can be made here to note that the null indexes include erased 
units and units that are not to be read.

The function just finds null array entries.  What these entries mean is up to 
the caller.

bq. Because I want \[the first element\] to return fast, considering it's the 
most common case.

I don't see any evidence that adding a special case makes this faster than just 
running the loop.  The loop starts at the first element anyway.  If the loop 
usually stops after the first iteration, I would expect the just-in-time 
compiler to optimize this code.  Let's get rid of the special case, unless we 
have some benchmarks showing that it helps.
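A plain loop of the kind described might look like this sketch (hypothetical signature, not the patch's actual code). The first element is simply iteration {{i == 0}}, so a separate fast path adds nothing the JIT can't already do.

```java
import java.util.Arrays;

// Sketch: collect the indexes of null entries, in order. What a null
// entry means (erased unit, unit not to read) is up to the caller.
public final class NullIndexes {
  static int[] findNullIndexes(Object[] inputs) {
    int count = 0;
    int[] found = new int[inputs.length];
    for (int i = 0; i < inputs.length; i++) {
      if (inputs[i] == null) {
        found[count++] = i;
      }
    }
    return Arrays.copyOf(found, count); // trim to the actual count
  }
}
```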

bq. \[Decoders are\] intended not to be stateful, so many threads can use the 
same decoder instance. I'm not sure all the existing coders are already good in 
this aspect, but effort will be made to achieve that if necessary; not sure all 
of it will be done here.

Part of the appeal of object-oriented programming is to combine the data with 
the methods used to operate on that data.  I'm not sure why we would want to 
keep the decoder state separate from the decoder functions.  If we want to do 
multiple decode operations in parallel, we can just create multiple Decoder 
objects, right?
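The parallelism point can be shown with a toy stand-in (this {{Decoder}} is hypothetical, not the real raw coder API): each instance owns its mutable scratch state, so concurrent decodes just use separate instances instead of synchronizing on a shared one.

```java
import java.util.Arrays;

// Hypothetical stateful decoder: the scratch buffer is per-instance
// mutable state, so one instance must not be shared across threads,
// but creating one instance per thread or per task is cheap and safe.
class Decoder {
  private final byte[] scratch = new byte[64];

  byte[] decode(byte[] input) {
    // Placeholder "decode": stage the input through per-instance scratch.
    System.arraycopy(input, 0, scratch, 0, input.length);
    return Arrays.copyOf(scratch, input.length);
  }
}
```

Two tasks decoding in parallel would each call {{new Decoder().decode(shard)}}, keeping the data and the methods that operate on it together in one object.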

> Refactor raw erasure coders
> ---------------------------
>
>                 Key: HADOOP-13010
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13010
>             Project: Hadoop Common
>          Issue Type: Sub-task
>            Reporter: Kai Zheng
>            Assignee: Kai Zheng
>             Fix For: 3.0.0
>
>         Attachments: HADOOP-13010-v1.patch, HADOOP-13010-v2.patch
>
>
> This will refactor raw erasure coders according to some comments received so 
> far.
> * As discussed in HADOOP-11540 and suggested by [~cmccabe], it is better not 
> to rely on class inheritance to reuse code; instead, the shared logic can be 
> moved to a utility class.
> * As suggested by [~jingzhao] quite some time ago, it is better to have a 
> state holder to keep some checking results for later reuse during an 
> encode/decode call.
> This would not get rid of some inheritance levels, as doing so isn't clear 
> yet and would also have a big impact. I do hope the end result of this 
> refactoring will make all the levels clearer and easier to follow.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
