[
https://issues.apache.org/jira/browse/HADOOP-13200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15949576#comment-15949576
]
Andrew Wang commented on HADOOP-13200:
--------------------------------------
Hi Kai,
It would be great to file separate JIRAs for the small issues noticed. I
also spent some more time thinking about this.
High-level problem: determine what raw encoder and decoder to use for a coder.
Our current system:
{noformat}
coder --------> rawcoder factory method ---------> factory ---------> raw encoder / decoder
      rawcoder                          reflection         hardcode
      factory
      class name
{noformat}
If we replace the reflection step with a registry, we can save the per-rawcoder
factory classes:
{noformat}
coder ----------> rawcoder factory registry --------> factory ----------> raw encoder + decoder
      rawcoder                              lookup            hardcode
      factory
      name
{noformat}
* Raw coder factories would be identified by an additional getName() method on
the factory interface.
* The registry is a singleton that maps coders to a map of rawcoder factories,
keyed by getName()
* Registry is prepopulated with the built-in factories; these can be private
nested classes of the registry, or held in a new class.
* The list of pluggable raw coder factory classes is specified in a config
key. We classload these at startup and trigger their static initializers, which
register them with the registry. We could enforce namespacing of pluggable raw
coder names to future-proof this.
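To make the bullets above concrete, here is a rough Java sketch of what such a registry could look like. It is only an illustration under stated assumptions, not the actual Hadoop code: the interface RawCoderFactory as written here, the getCodecName() method, the class names, and the names "rs", "rs_java" and "example_plugin" are all hypothetical, and the encoder/decoder return types are placeholders.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for the raw coder factory interface. */
interface RawCoderFactory {
  String getName();       // proposed addition: identifies the factory in the registry
  String getCodecName();  // assumption: each factory declares which coder it serves
  Object createEncoder(); // placeholder type; the real one would be a raw encoder
  Object createDecoder(); // placeholder type; the real one would be a raw decoder
}

/** Singleton registry: coder name -> (factory name -> factory). */
final class RawCoderFactoryRegistry {
  private static final RawCoderFactoryRegistry INSTANCE = new RawCoderFactoryRegistry();

  private final Map<String, Map<String, RawCoderFactory>> factories =
      new ConcurrentHashMap<>();

  private RawCoderFactoryRegistry() {
    // Prepopulate with the built-in factories.
    register(new BuiltInRsFactory());
  }

  static RawCoderFactoryRegistry getInstance() {
    return INSTANCE;
  }

  /** Registers a factory under its coder name and getName(); rejects duplicates. */
  void register(RawCoderFactory factory) {
    Map<String, RawCoderFactory> byName = factories.computeIfAbsent(
        factory.getCodecName(), k -> new ConcurrentHashMap<>());
    if (byName.putIfAbsent(factory.getName(), factory) != null) {
      throw new IllegalArgumentException(
          "Duplicate raw coder factory name: " + factory.getName());
    }
  }

  /** Plain map lookup, no reflection. Returns null if nothing is registered. */
  RawCoderFactory lookup(String codecName, String factoryName) {
    Map<String, RawCoderFactory> byName = factories.get(codecName);
    return byName == null ? null : byName.get(factoryName);
  }

  /**
   * Config-specific logic kept in a static method: classload each pluggable
   * factory named in a comma-separated config value. Class.forName(String)
   * initializes the class, which runs its static initializer.
   */
  static void loadPluggable(String commaSeparatedClassNames) {
    for (String className : commaSeparatedClassNames.split(",")) {
      String trimmed = className.trim();
      if (trimmed.isEmpty()) {
        continue;
      }
      try {
        Class.forName(trimmed);
      } catch (ClassNotFoundException e) {
        throw new IllegalArgumentException("Cannot load factory " + trimmed, e);
      }
    }
  }

  /** Built-in factory held as a private nested class of the registry. */
  private static final class BuiltInRsFactory implements RawCoderFactory {
    @Override public String getName() { return "rs_java"; }
    @Override public String getCodecName() { return "rs"; }
    @Override public Object createEncoder() { return new Object(); }
    @Override public Object createDecoder() { return new Object(); }
  }
}

/** Hypothetical pluggable factory that self-registers when its class is initialized. */
final class ExamplePluggableFactory implements RawCoderFactory {
  static {
    RawCoderFactoryRegistry.getInstance().register(new ExamplePluggableFactory());
  }
  @Override public String getName() { return "example_plugin"; }
  @Override public String getCodecName() { return "rs"; }
  @Override public Object createEncoder() { return new Object(); }
  @Override public Object createDecoder() { return new Object(); }
}
```

In this sketch the only reflection is the one-time Class.forName() call at startup; after that, resolving a factory is a lookup such as RawCoderFactoryRegistry.getInstance().lookup("rs", "rs_java").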
Since nothing in the registry is config-dependent, I think it's a safe
singleton. Config-specific logic is handled outside the Registry or in static
methods.
I think this might also help with implementing caching later, since it's
centralized and avoids reflection after initialization.
Thoughts?
> Seeking a better approach allowing to customize and configure erasure coders
> ----------------------------------------------------------------------------
>
> Key: HADOOP-13200
> URL: https://issues.apache.org/jira/browse/HADOOP-13200
> Project: Hadoop Common
> Issue Type: Sub-task
> Reporter: Kai Zheng
> Assignee: Kai Zheng
> Priority: Blocker
> Labels: hdfs-ec-3.0-must-do
>
> This is a follow-on task for HADOOP-13010 as discussed over there. There may
> be some better approach allowing to customize and configure erasure coders
> than the current having raw coder factory, as [~cmccabe] suggested. Will copy
> the relevant comments here to continue the discussion.