[ 
https://issues.apache.org/jira/browse/HADOOP-8240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13276083#comment-13276083
 ] 

Kihwal Lee commented on HADOOP-8240:
------------------------------------

We need this feature to make data copying and verification work across clusters 
with different configurations. I would appreciate any feedback.

h4. Design Choices

# *Add a new create method to FileSystem that allows the checksum type to be 
specified.* FileSystem#create() already allows specifying bytesPerChecksum. 
The new create method may accept a DataChecksum object. Users can use the 
existing DataChecksum.newDataChecksum(int type, int bytesPerChecksum) to 
create one. Users who want to specify a non-default type will likely want to 
control bytesPerChecksum as well.
# *Add checksum types to CreateFlag.* This approach minimizes interface 
changes, but may not be the most intuitive or consistent way.
# *Add a method to FSDataOutputStream and DFSOutputStream to allow users to 
override the default checksum parameters.* This method should fail if data has 
already been written. This is sort of like ioctl. If there are other tunables 
we want to support, we could generalize the API. But changing the internal 
parameters (not encapsulated data) of an object at run time doesn't fit well 
with typical Java semantics and may cause confusion, so we need to be careful 
about this.
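To make the third choice concrete, here is a minimal sketch of the "fail if data is already written" rule. Every name in it (ChecksumKind, TunableChecksumStream, setChecksumParams) is hypothetical and is not part of Hadoop. The defaults are assumed for illustration only, and no checksum is actually computed; the point is just the guard on the setter.

```java
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical types for illustration; none of these exist in Hadoop.
enum ChecksumKind { CRC32, CRC32C }

class TunableChecksumStream extends OutputStream {
    private final OutputStream out;
    private ChecksumKind kind = ChecksumKind.CRC32; // assumed default
    private int bytesPerChecksum = 512;             // assumed default
    private long bytesWritten = 0;

    TunableChecksumStream(OutputStream out) {
        this.out = out;
    }

    // Overrides the checksum parameters. Rejected once any data has been
    // written, because earlier bytes would already be covered by the old
    // parameters.
    void setChecksumParams(ChecksumKind kind, int bytesPerChecksum) {
        if (bytesWritten > 0) {
            throw new IllegalStateException(
                "checksum parameters cannot change after data is written");
        }
        if (bytesPerChecksum <= 0) {
            throw new IllegalArgumentException("bytesPerChecksum must be positive");
        }
        this.kind = kind;
        this.bytesPerChecksum = bytesPerChecksum;
    }

    @Override
    public void write(int b) throws IOException {
        out.write(b);   // a real stream would also feed the checksum here
        bytesWritten++;
    }

    ChecksumKind kind() { return kind; }
    int bytesPerChecksum() { return bytesPerChecksum; }
}
```

A caller would set the parameters immediately after opening the stream. Any later call throws IllegalStateException, which is the run-time surprise the comment above warns about.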

h4. Other previously discussed approaches

# *Setting dfs.checksum.type.* The FileSystem cache causes the value to stay 
the same after the DFSClient is created. Also, the conf is shared, so changing 
it can have unforeseen side effects.
# *Disable the FileSystem cache.* Create a new Configuration and set 
dfs.checksum.type. Without the cache, the memory bloat is too much.
# *Use conf as part of the key in the FileSystem cache, in addition to UGI and 
scheme + authority.* Something along these lines may work. A shallow 
comparison may not be enough. Do we create a special hashCode/equals to make it 
safer? There will be memory bloat, but how much? It is still up to users to 
manage different configurations, which may make this approach more prone to 
mistakes.
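The caching behavior behind these three approaches can be shown with a small self-contained model. This is a sketch, not Hadoop's FileSystem cache: SketchClient, CacheKey, and ClientCache are hypothetical names, a plain Map<String, String> stands in for Configuration, and the "CRC32" default is assumed. With keyOnConf set to false, the second lookup returns the first client and the new dfs.checksum.type is silently ignored. With keyOnConf set to true, the key compares conf contents by value, so each distinct conf gets its own client, which is where the memory cost comes from.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.TreeMap;

// Hypothetical stand-in for a client that captures settings at creation time.
class SketchClient {
    final String checksumType;

    SketchClient(Map<String, String> conf) {
        // "CRC32" is an assumed default for this sketch.
        this.checksumType = conf.getOrDefault("dfs.checksum.type", "CRC32");
    }
}

// Cache key: scheme + authority + user, optionally plus a conf snapshot.
final class CacheKey {
    private final String scheme;
    private final String authority;
    private final String user;
    private final Map<String, String> conf; // null when conf is not part of the key

    CacheKey(String scheme, String authority, String user, Map<String, String> conf) {
        this.scheme = scheme;
        this.authority = authority;
        this.user = user;
        this.conf = conf;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof CacheKey)) {
            return false;
        }
        CacheKey k = (CacheKey) o;
        return scheme.equals(k.scheme) && authority.equals(k.authority)
            && user.equals(k.user) && Objects.equals(conf, k.conf);
    }

    @Override
    public int hashCode() {
        return Objects.hash(scheme, authority, user, conf);
    }
}

class ClientCache {
    private final Map<CacheKey, SketchClient> cache = new HashMap<>();
    private final boolean keyOnConf;

    ClientCache(boolean keyOnConf) {
        this.keyOnConf = keyOnConf;
    }

    SketchClient get(String scheme, String authority, String user,
                     Map<String, String> conf) {
        // Copy the conf so later mutation of the caller's map cannot change the key.
        Map<String, String> snapshot = keyOnConf ? new TreeMap<>(conf) : null;
        CacheKey key = new CacheKey(scheme, authority, user, snapshot);
        return cache.computeIfAbsent(key, k -> new SketchClient(conf));
    }

    int size() {
        return cache.size();
    }
}
```

The snapshot copy is one way to get a deep, by-value comparison. Comparing Configuration references instead would be the shallow comparison questioned above, since two separately built but identical confs would never share a client.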

                
> Allow users to specify a checksum type on create()
> --------------------------------------------------
>
>                 Key: HADOOP-8240
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8240
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs
>    Affects Versions: 0.23.0
>            Reporter: Kihwal Lee
>            Assignee: Kihwal Lee
>             Fix For: 0.23.3, 2.0.0, 3.0.0
>
>         Attachments: hadoop-8240.patch
>
>
> Per discussion in HADOOP-8060, a way for users to specify a checksum type on 
> create() is needed. The way the FileSystem cache works makes it impossible to 
> use dfs.checksum.type to achieve this. Also, the checksum-related API is at 
> the FileSystem level, so we prefer something at that level, not an 
> HDFS-specific one. The current proposal is to use CreateFlag.


        
