[
https://issues.apache.org/jira/browse/HADOOP-13200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15933884#comment-15933884
]
Kai Zheng commented on HADOOP-13200:
------------------------------------
Roughly, the problem is how to configure a raw coder impl for a codec. A raw
coder impl includes both an encoder and a decoder. Currently, a raw coder
factory combines an encoder and a decoder to represent a raw coder impl for a
codec. The available codecs include rs-default, rs-legacy, xor and hh-xor. The
raw coder impls for the rs-default codec are RSRawErasureCoderFactory and
NativeRSRawErasureCoderFactory. More impls for a codec could be added in the
future. The issue originated from a discussion with Colin, who disliked the
current factory-based approach. The goal is to figure out a way to get rid of
the raw coder factories.
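To make the current arrangement concrete, here is a minimal sketch of a factory
that pairs one encoder with one decoder. All type names below
(RawCoderFactorySketch, RawEncoder, RawDecoder, RawCoderFactory, XorRaw*) are
simplified, hypothetical stand-ins chosen for illustration; they are not the
actual Hadoop interfaces or their signatures.

```java
import java.util.Arrays;

// Hypothetical, simplified sketch; these are not the real Hadoop interfaces.
final class RawCoderFactorySketch {

    interface RawEncoder {
        byte[] encode(byte[][] dataUnits);
    }

    interface RawDecoder {
        byte[] decode(byte[][] survivingUnits);
    }

    // The factory is what ties one encoder and one decoder together into a
    // single "raw coder impl" that a codec can be configured with.
    interface RawCoderFactory {
        RawEncoder createEncoder();
        RawDecoder createDecoder();
    }

    // XOR of all units, byte by byte; every unit must have the same length.
    static byte[] xorAll(byte[][] units) {
        byte[] out = new byte[units[0].length];
        for (byte[] unit : units) {
            for (int i = 0; i < out.length; i++) {
                out[i] ^= unit[i];
            }
        }
        return out;
    }

    static final class XorRawEncoder implements RawEncoder {
        @Override
        public byte[] encode(byte[][] dataUnits) {
            return xorAll(dataUnits); // produces the single parity unit
        }
    }

    static final class XorRawDecoder implements RawDecoder {
        @Override
        public byte[] decode(byte[][] survivingUnits) {
            return xorAll(survivingUnits); // rebuilds the one missing unit
        }
    }

    static final class XorRawCoderFactory implements RawCoderFactory {
        @Override
        public RawEncoder createEncoder() {
            return new XorRawEncoder();
        }

        @Override
        public RawDecoder createDecoder() {
            return new XorRawDecoder();
        }
    }

    public static void main(String[] args) {
        RawCoderFactory factory = new XorRawCoderFactory();
        byte[] a = {1, 2, 3};
        byte[] b = {4, 5, 6};
        byte[] parity = factory.createEncoder().encode(new byte[][] {a, b});
        // Pretend unit "a" was lost and rebuild it from "b" and the parity.
        byte[] recovered = factory.createDecoder().decode(new byte[][] {b, parity});
        System.out.println(Arrays.equals(recovered, a));
    }
}
```

In this sketch, selecting a raw coder impl amounts to selecting one factory
class, which is the level of indirection the discussion is trying to remove.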
I don’t have a perfect solution for this in mind yet. Some related ideas so far:
1. Dynamically compose the encoder/decoder name from the codec name and other
info, roughly as suggested by Colin, though I may not have understood him
exactly. It doesn’t look very attractive to me because there is no easy or
intuitive way to deduce a raw coder name directly from configuration
properties. Once the raw coder name is deduced, the corresponding
encoder/decoder class names can be derived, so we can create the needed
encoder/decoder instances directly.
2. Combine the encoder and decoder into one class, as suggested by ATM
somewhere. If we combine them, we can avoid the factory entirely. That sounds
good for some raw coder impls but not for others. Some raw encoder/decoder
impls are pretty complex; if we combine them, the resulting class will be
pretty large and hard to maintain. Generally, decoding logic is much more
complex than encoding logic. As an extreme example, for the LRC codec both the
encoding and decoding logic will be quite complicated, so they are better kept
separate.
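Idea 1 can be sketched as follows, assuming a purely hypothetical naming
convention of <Codec><Impl>Raw<Role> and reflective instantiation. The
convention, the class CoderNameSketch and the placeholder coders
XorJavaRawEncoder/XorJavaRawDecoder are all assumptions made for illustration;
nothing here is taken from the Hadoop code base.

```java
// Hypothetical sketch of idea 1; the naming convention is an assumption.
final class CoderNameSketch {

    interface RawEncoder {}

    interface RawDecoder {}

    // Placeholder coders whose names follow the assumed convention.
    public static final class XorJavaRawEncoder implements RawEncoder {}

    public static final class XorJavaRawDecoder implements RawDecoder {}

    // Converts a dash-separated name to camel case, e.g. "rs-default" -> "RsDefault".
    static String toCamel(String name) {
        StringBuilder sb = new StringBuilder();
        for (String part : name.split("-")) {
            if (!part.isEmpty()) {
                sb.append(Character.toUpperCase(part.charAt(0)));
                sb.append(part.substring(1));
            }
        }
        return sb.toString();
    }

    // Builds the binary name of a nested coder class from the codec name,
    // the impl name and the role ("Encoder" or "Decoder").
    static String deriveClassName(String codec, String impl, String role) {
        return CoderNameSketch.class.getName() + "$"
            + toCamel(codec) + toCamel(impl) + "Raw" + role;
    }

    // Creates an instance of the derived class without going through a factory.
    static <T> T instantiate(String className, Class<T> type) {
        try {
            Object coder = Class.forName(className).getDeclaredConstructor().newInstance();
            return type.cast(coder);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("No usable raw coder class: " + className, e);
        }
    }

    public static void main(String[] args) {
        String name = deriveClassName("xor", "java", "Encoder");
        RawEncoder encoder = instantiate(name, RawEncoder.class);
        System.out.println(encoder.getClass().getSimpleName());
    }
}
```

The sketch also shows the weakness noted in item 1: the mapping from
configuration values to class names works only if every impl follows the
convention, and a name with no matching class fails only at run time.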
Note that in the current approach, to configure something for a codec or a raw
coder impl for the codec, the configuration property name starts with
something like
{{io.erasurecode.codec.rs.rawcoder.*}}
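As a rough illustration of that prefix scheme, the sketch below collects every
property under the raw coder prefix of a given codec. It uses plain
java.util.Properties instead of Hadoop's Configuration class, and the property
suffix example-option and the class RawCoderConfigSketch are hypothetical
names invented for the example.

```java
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

// Hypothetical sketch; uses java.util.Properties rather than Hadoop's Configuration.
final class RawCoderConfigSketch {

    // Prefix shared by all raw coder settings of one codec.
    static String prefixFor(String codec) {
        return "io.erasurecode.codec." + codec + ".rawcoder.";
    }

    // Returns the settings under the codec's raw coder prefix, keyed by suffix.
    static Map<String, String> settingsFor(Properties conf, String codec) {
        String prefix = prefixFor(codec);
        Map<String, String> settings = new TreeMap<>();
        for (String key : conf.stringPropertyNames()) {
            if (key.startsWith(prefix)) {
                settings.put(key.substring(prefix.length()), conf.getProperty(key));
            }
        }
        return settings;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        // "example-option" is a made-up suffix, not a real Hadoop property.
        conf.setProperty("io.erasurecode.codec.rs.rawcoder.example-option", "42");
        conf.setProperty("io.erasurecode.codec.xor.rawcoder.example-option", "7");
        System.out.println(settingsFor(conf, "rs"));
    }
}
```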
[~andrew.wang], what are your thoughts? Thanks!
> Seeking a better approach allowing to customize and configure erasure coders
> ----------------------------------------------------------------------------
>
> Key: HADOOP-13200
> URL: https://issues.apache.org/jira/browse/HADOOP-13200
> Project: Hadoop Common
> Issue Type: Sub-task
> Reporter: Kai Zheng
> Assignee: Kai Zheng
> Priority: Blocker
> Labels: hdfs-ec-3.0-must-do
>
> This is a follow-on task for HADOOP-13010, as discussed there. There may be
> a better approach to customizing and configuring erasure coders than the
> current raw coder factory, as [~cmccabe] suggested. Will copy the relevant
> comments here to continue the discussion.