[ 
https://issues.apache.org/jira/browse/HADOOP-13200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15944415#comment-15944415
 ] 

Andrew Wang commented on HADOOP-13200:
--------------------------------------

Thanks for the writeup Kai, this is a tricky API issue as you note, and there 
are multiple opinions on the matter. This is my first time looking at it, so 
some perhaps basic questions:

IIUC, our raw interfaces are like the following:

RawErasureEncoder:
* encode
* getters for ECOptions
* some configs (direct buffers, change inputs, verbose dump)
* release

RawErasureDecoder
* decode
* getters for ECOptions
* some configs (direct buffers, change inputs, verbose dump)
* release

There's some overlap here, so if we provided both {{encode}} and {{decode}} in 
a single {{RawErasureCoder}} interface, it might be less code and simple to 
configure. This would be like the {{ErasureCoder}} interface that shares 
functionality between the higher-level encoder and decoder. I saw in a comment 
that the raw coders are stateless, so it seems okay from an API perspective.

Overall though I think the current system is okay. The factory is the single 
entry point to configuring a RawCoder. Right now the RawEncoders and 
RawDecoders live in separate Java files, but implementers are allowed to make 
them nested classes of the RawErasureCoderFactory.

Maybe this is what Colin wanted, since the factory classes look trivial by 
themselves. I experimented by putting the NativeRS raw encoder and raw decoder 
into their Factory class, and it looks okay since they're pretty small. I also 
made the nested classes private, to ensure they're created via the factory and 
paired correctly.

What do you think?

Other questions/comments from looking at this code:

* We also should rename RSRawEncoderLegacy to RSLegacyRawEncoder and 
RSRawErasureCoderFactoryLegacy to RSLegacyErasureCoderFactory, to match the 
naming of other subclasses.
* TestRawEncoderBase, should this use a configured factory to get the raw 
coders, rather than referencing the raw coders directly? I didn't check other 
usages, but it seems like we should be creating via the appropriate factory 
whenever possible.

> Seeking a better approach allowing to customize and configure erasure coders
> ----------------------------------------------------------------------------
>
>                 Key: HADOOP-13200
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13200
>             Project: Hadoop Common
>          Issue Type: Sub-task
>            Reporter: Kai Zheng
>            Assignee: Kai Zheng
>            Priority: Blocker
>              Labels: hdfs-ec-3.0-must-do
>
> This is a follow-on task for HADOOP-13010 as discussed over there. There may 
> be some better approach allowing to customize and configure erasure coders 
> than the current having raw coder factory, as [~cmccabe] suggested. Will copy 
> the relevant comments here to continue the discussion.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to