[jira] [Commented] (HADOOP-13010) Refactor raw erasure coders

Kai Zheng (JIRA) Thu, 14 Apr 2016 17:17:07 -0700

    [ 
https://issues.apache.org/jira/browse/HADOOP-13010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15242201#comment-15242201
 ]


Kai Zheng commented on HADOOP-13010:
------------------------------------

Thanks Colin for the quick response!
bq. Let's get rid of the special case, unless we have some benchmarks showing 
that it helps.
Ok. It's not a hot path at all.
bq. If we want to do multiple decode operations in parallel, we can just create 
multiple Decoder objects, right?
The problem is that a decoder holds expensive coding buffers and computed 
coding matrices, and for performance these should stay in caches close to the 
CPU core. The cached data is per decoder: it depends not only on the schema but 
also on the erasure indexes passed to the decode call, so it's not good to keep 
the cache outside the decoder. Caching still makes sense because on the HDFS 
side decode is called repeatedly in a loop over a large block (64k cell size -> 
256mb block size). You might take a look at the native code of the native 
coders to see the expensive buffers and the data cached in each decode call. We 
benchmarked the coders, and the results showed that this optimization gives a 
great speedup. Java InputStreams are similar, but not exactly the same, because 
they are purely view-only and rely on OS/IO level caches for file reading. 
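
To illustrate the caching argument above, here is a minimal Java sketch. All 
names in it (CachingDecoderSketch, prepareDecoding, buildDecodeMatrix) are 
hypothetical and are not the actual Hadoop coder API, and the matrix 
computation is only a placeholder for the real Galois-field work. It shows the 
idea only: the expensive state is rebuilt when the erased indexes change and 
is reused otherwise.

```java
import java.util.Arrays;

/** Hypothetical sketch of a decoder that caches per-erasure-pattern state. */
final class CachingDecoderSketch {
    private final int numDataUnits;

    // Cached state, valid only for the erasure pattern it was built for.
    private int[] cachedErasedIndexes;
    private byte[] cachedDecodeMatrix;

    // Counts how often the expensive step actually ran (for illustration).
    private int matrixBuilds;

    CachingDecoderSketch(int numDataUnits) {
        this.numDataUnits = numDataUnits;
    }

    /** Returns the decode matrix, recomputing it only if the pattern changed. */
    byte[] prepareDecoding(int[] erasedIndexes) {
        if (cachedDecodeMatrix == null
                || !Arrays.equals(cachedErasedIndexes, erasedIndexes)) {
            cachedErasedIndexes = erasedIndexes.clone();
            cachedDecodeMatrix = buildDecodeMatrix(erasedIndexes);
            matrixBuilds++;
        }
        return cachedDecodeMatrix;
    }

    /** Placeholder for the real, expensive matrix computation (assumption). */
    private byte[] buildDecodeMatrix(int[] erasedIndexes) {
        byte[] matrix = new byte[numDataUnits * erasedIndexes.length];
        for (int i = 0; i < matrix.length; i++) {
            matrix[i] = (byte) (erasedIndexes[i % erasedIndexes.length] + i);
        }
        return matrix;
    }

    int getMatrixBuilds() {
        return matrixBuilds;
    }
}
```

With a 64k cell size and a 256mb block, a caller would invoke prepareDecoding 
about 4096 times (256mb / 64k) with the same erased indexes, and in this 
sketch the placeholder computation would run only once for that block.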


> Refactor raw erasure coders
> ---------------------------
>
>                 Key: HADOOP-13010
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13010
>             Project: Hadoop Common
>          Issue Type: Sub-task
>            Reporter: Kai Zheng
>            Assignee: Kai Zheng
>             Fix For: 3.0.0
>
>         Attachments: HADOOP-13010-v1.patch, HADOOP-13010-v2.patch
>
>
> This will refactor raw erasure coders according to some comments received so 
> far.
> * As discussed in HADOOP-11540 and suggested by [~cmccabe], better not to 
> rely on class inheritance to reuse code; instead, the shared code can be 
> moved to some utility.
> * Suggested by [~jingzhao] somewhere quite some time ago, better to have a 
> state holder to keep some checking results for later reuse during an 
> encode/decode call.
> This will not get rid of some inheritance levels, as how to do so isn't clear 
> yet and it would also have a big impact. I do hope the end result of this 
> refactoring will make all the levels clearer and easier to follow.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
