[jira] [Commented] (HADOOP-13010) Refactor raw erasure coders

Kai Zheng (JIRA) Thu, 14 Apr 2016 16:16:23 -0700

    [ 
https://issues.apache.org/jira/browse/HADOOP-13010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15242107#comment-15242107
 ]


Kai Zheng commented on HADOOP-13010:
------------------------------------

bq. This isn't safe for multiple threads, since we could be reading 
CoderUtil#emptyChunk while it's in the middle of being written. You must either 
make this volatile or hold the lock for this entire function.
The underlying buffer for the empty trunk is assumed read-only and will only be 
used to zero coding buffers. Making the entire function safe and also private 
is a good idea because in practice that level should be good enough.
bq. It's unfortunate that we need a function like this-- I was hoping that 
there would be some more efficient way of zeroing a ByteBuffer.
For pure Java coders that use byte array and on-heap bytebuffer, this way to 
zero buffers is efficient (perhaps the most one but I'm not totally sure); to 
zero direct bytebuffer the more efficient way would be to use an empty direct 
bytebuffer. I don't optimize this because pure Java coder is better not to use 
direct bytebuffer overall. Note native coders will prefer direct bytebuffer but 
won't need to bump into this, as we discussed in HADOOP-11540.
bq. Maybe something like cloneAsDirectByteBuffer would be a better name?
Ah yes.
bq. Should be named getNullIndexes?
Ok. Comment can be made here to tell the null indexes include erased units and 
the units that's not to read.
bq. Why do we need the special case for the first element here?
Because I want it to return fast considering it's the most often case.
bq. Should be named getNonNullIndexes? Also, why does this one take an array 
passed in, whereas getNullIndexes returns an array? I also don't see how the 
caller is supposed to know how many of the array slots were used by the 
function. ...  Perhaps we could mandate that the caller set all the array slots 
to a negative value before calling the function, ...
Ah good catch and thoughts! It does be able to unify with getNullIndexes, and 
needs to be made general enough as it stays in the utility class, even though 
for now it's only internal and used by {{RSRawDecoder}}. How many slots to be 
used? By controlled by the fact exactly {{numDataUnits}} is expected and the 
valid indexes are put into the array in order (out of 6+3 inputs). The array is 
zeroed before, yes -1 may be better since 0 is also a valid index, but 0 can 
only occur in the first place so no problem here. Yes better to have some 
JavaDoc explaining about this.
bq. I'm not sure why we wouldn't just store DecoderState in the Decoder? These 
are stateful objects, I assume.
It's intended not to be stateful, thus many threads can use the same decoder 
instance. I'm not sure all the existing coders are already good in this aspect, 
but effort will be made to achieve so if necessary, not sure all be done here.


> Refactor raw erasure coders
> ---------------------------
>
>                 Key: HADOOP-13010
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13010
>             Project: Hadoop Common
>          Issue Type: Sub-task
>            Reporter: Kai Zheng
>            Assignee: Kai Zheng
>             Fix For: 3.0.0
>
>         Attachments: HADOOP-13010-v1.patch, HADOOP-13010-v2.patch
>
>
> This will refactor raw erasure coders according to some comments received so 
> far.
> * As discussed in HADOOP-11540 and suggested by [~cmccabe], better not to 
> rely class inheritance to reuse the codes, instead they can be moved to some 
> utility.
> * Suggested by [~jingzhao] somewhere quite some time ago, better to have a 
> state holder to keep some checking results for later reuse during an 
> encode/decode call.
> This would not get rid of some inheritance levels as doing so isn't clear yet 
> for the moment and also incurs big impact. I do wish the end result by this 
> refactoring will make all the levels more clear and easier to follow.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HADOOP-13010) Refactor raw erasure coders

Reply via email to