[
https://issues.apache.org/jira/browse/HADOOP-13010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15242107#comment-15242107
]
Kai Zheng commented on HADOOP-13010:
------------------------------------
bq. This isn't safe for multiple threads, since we could be reading
CoderUtil#emptyChunk while it's in the middle of being written. You must either
make this volatile or hold the lock for this entire function.
The underlying buffer for the empty trunk is assumed read-only and will only be
used to zero coding buffers. Making the entire function safe and also private
is a good idea because in practice that level should be good enough.
bq. It's unfortunate that we need a function like this-- I was hoping that
there would be some more efficient way of zeroing a ByteBuffer.
For pure Java coders that use byte array and on-heap bytebuffer, this way to
zero buffers is efficient (perhaps the most one but I'm not totally sure); to
zero direct bytebuffer the more efficient way would be to use an empty direct
bytebuffer. I don't optimize this because pure Java coder is better not to use
direct bytebuffer overall. Note native coders will prefer direct bytebuffer but
won't need to bump into this, as we discussed in HADOOP-11540.
bq. Maybe something like cloneAsDirectByteBuffer would be a better name?
Ah yes.
bq. Should be named getNullIndexes?
Ok. Comment can be made here to tell the null indexes include erased units and
the units that's not to read.
bq. Why do we need the special case for the first element here?
Because I want it to return fast considering it's the most often case.
bq. Should be named getNonNullIndexes? Also, why does this one take an array
passed in, whereas getNullIndexes returns an array? I also don't see how the
caller is supposed to know how many of the array slots were used by the
function. ... Perhaps we could mandate that the caller set all the array slots
to a negative value before calling the function, ...
Ah good catch and thoughts! It does be able to unify with getNullIndexes, and
needs to be made general enough as it stays in the utility class, even though
for now it's only internal and used by {{RSRawDecoder}}. How many slots to be
used? By controlled by the fact exactly {{numDataUnits}} is expected and the
valid indexes are put into the array in order (out of 6+3 inputs). The array is
zeroed before, yes -1 may be better since 0 is also a valid index, but 0 can
only occur in the first place so no problem here. Yes better to have some
JavaDoc explaining about this.
bq. I'm not sure why we wouldn't just store DecoderState in the Decoder? These
are stateful objects, I assume.
It's intended not to be stateful, thus many threads can use the same decoder
instance. I'm not sure all the existing coders are already good in this aspect,
but effort will be made to achieve so if necessary, not sure all be done here.
> Refactor raw erasure coders
> ---------------------------
>
> Key: HADOOP-13010
> URL: https://issues.apache.org/jira/browse/HADOOP-13010
> Project: Hadoop Common
> Issue Type: Sub-task
> Reporter: Kai Zheng
> Assignee: Kai Zheng
> Fix For: 3.0.0
>
> Attachments: HADOOP-13010-v1.patch, HADOOP-13010-v2.patch
>
>
> This will refactor raw erasure coders according to some comments received so
> far.
> * As discussed in HADOOP-11540 and suggested by [~cmccabe], better not to
> rely class inheritance to reuse the codes, instead they can be moved to some
> utility.
> * Suggested by [~jingzhao] somewhere quite some time ago, better to have a
> state holder to keep some checking results for later reuse during an
> encode/decode call.
> This would not get rid of some inheritance levels as doing so isn't clear yet
> for the moment and also incurs big impact. I do wish the end result by this
> refactoring will make all the levels more clear and easier to follow.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)