[
https://issues.apache.org/jira/browse/HADOOP-8240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13276083#comment-13276083
]
Kihwal Lee commented on HADOOP-8240:
------------------------------------
We need this feature to make data copying and verification work across clusters
with different configurations. I would appreciate any feedback.
h4. Design Choices
# *Add a new create method to FileSystem for allowing checksum type to be
specified.* FileSystem#create() already allows specifying bytesPerChecksum.
The new create method may accept a DataChecksum object. Users can use the
existing DataChecksum.newDataChecksum(int type, int bytesPerChecksum) to
create one. Users who want to specify a non-default type will likely want to
control bytesPerChecksum as well.
# *Add checksum types to CreateFlag.* This approach minimizes interface
changes, but may not be the most intuitive or consistent way.
# *Add a method to FSDataOutputStream and DFSOutputStream to allow users to
override default checksum parameters.* This method should fail if data is
already written. This is somewhat like ioctl. If there are other tunables we
want to support, we could generalize the API. But changing the internal
parameters (not the encapsulated data) of an object at run time doesn't fit
typical Java semantics and may cause confusion, so we need to be careful about
this.
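To make the third choice concrete, here is a minimal, self-contained sketch of the "fail if data is already written" rule. It is not Hadoop code: ChecksumTunableStream, its nested ChecksumType enum, setChecksumParams() and the default values are all hypothetical names and assumptions chosen for illustration.

```java
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical stand-in for FSDataOutputStream/DFSOutputStream; not Hadoop code.
class ChecksumTunableStream extends OutputStream {
    // Hypothetical enum; in Hadoop the checksum types are defined by DataChecksum.
    enum ChecksumType { CRC32, CRC32C }

    private final OutputStream out;
    private ChecksumType type = ChecksumType.CRC32; // assumed default
    private int bytesPerChecksum = 512;             // assumed default
    private long bytesWritten = 0;

    ChecksumTunableStream(OutputStream out) {
        this.out = out;
    }

    // Override the checksum parameters. Allowed only before the first write.
    void setChecksumParams(ChecksumType type, int bytesPerChecksum) {
        if (bytesWritten > 0) {
            throw new IllegalStateException("data already written");
        }
        if (type == null || bytesPerChecksum <= 0) {
            throw new IllegalArgumentException("invalid checksum parameters");
        }
        this.type = type;
        this.bytesPerChecksum = bytesPerChecksum;
    }

    ChecksumType getChecksumType() {
        return type;
    }

    int getBytesPerChecksum() {
        return bytesPerChecksum;
    }

    @Override
    public void write(int b) throws IOException {
        out.write(b);
        bytesWritten++; // from here on, the parameters are frozen
    }
}
```

A caller would set the parameters right after opening the stream and before the first write(). Any later call throws IllegalStateException, which is the run-time mutation hazard described above.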
h4. Other previously discussed approaches
# *Setting dfs.checksum.type.* The FileSystem cache causes the value to stay
the same after the DFSClient is created. Also, conf is shared, so changing it
can have unforeseen side effects.
# *Disable the FileSystem cache.* Create a new Configuration and set
dfs.checksum.type. Without the cache, the memory bloat is too much.
# *Use conf as a part of the key in the FileSystem cache, in addition to UGI and
scheme + authority.* Something along these lines may work. A shallow comparison
may not be enough. Do we create a special hashCode/equals to make it safer?
There will be memory bloat, but how much? It is still up to users to manage
different configurations, which may make this more prone to mistakes.
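As an illustration of the "special hashCode/equals" idea in the last item, the sketch below builds a cache key that compares one selected configuration value instead of the whole conf object. It is an assumption-laden sketch, not the actual FileSystem cache key: ConfAwareCacheKey is a hypothetical class, a plain String stands in for UGI, and a Map stands in for Configuration.

```java
import java.util.Map;
import java.util.Objects;

// Hypothetical cache key; the real FileSystem cache key is internal to Hadoop.
final class ConfAwareCacheKey {
    private final String scheme;
    private final String authority;
    private final String user;         // stands in for UGI
    private final String checksumType; // value of dfs.checksum.type, may be null

    ConfAwareCacheKey(String scheme, String authority, String user,
                      Map<String, String> conf) {
        this.scheme = scheme;
        this.authority = authority;
        this.user = user;
        // Copy only the setting that should distinguish cached instances,
        // so unrelated conf differences do not create new cache entries.
        this.checksumType = conf.get("dfs.checksum.type");
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof ConfAwareCacheKey)) {
            return false;
        }
        ConfAwareCacheKey k = (ConfAwareCacheKey) o;
        return Objects.equals(scheme, k.scheme)
            && Objects.equals(authority, k.authority)
            && Objects.equals(user, k.user)
            && Objects.equals(checksumType, k.checksumType);
    }

    @Override
    public int hashCode() {
        return Objects.hash(scheme, authority, user, checksumType);
    }
}
```

Comparing by value means two distinct conf objects with the same dfs.checksum.type map to the same entry, which limits the memory bloat to one cached instance per distinct checksum type. Users still have to manage the different configurations themselves.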
> Allow users to specify a checksum type on create()
> --------------------------------------------------
>
> Key: HADOOP-8240
> URL: https://issues.apache.org/jira/browse/HADOOP-8240
> Project: Hadoop Common
> Issue Type: Improvement
> Components: fs
> Affects Versions: 0.23.0
> Reporter: Kihwal Lee
> Assignee: Kihwal Lee
> Fix For: 0.23.3, 2.0.0, 3.0.0
>
> Attachments: hadoop-8240.patch
>
>
> Per discussion in HADOOP-8060, a way for users to specify a checksum type on
> create() is needed. The way the FileSystem cache works makes it impossible to
> use dfs.checksum.type to achieve this. Also, the checksum-related API is at
> the FileSystem level, so we prefer something at that level, not an
> HDFS-specific one. The current proposal is to use CreateFlag.