[
https://issues.apache.org/jira/browse/HADOOP-18672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17703164#comment-17703164
]
Steve Loughran commented on HADOOP-18672:
-----------------------------------------
Azure storage doesn't have checksums.
It does have etags. While distcp can't cope with that, you are welcome to
implement your own equivalent which logs the source checksum and dest etag, and
only updates a file when either has changed. This would also work for s3, or any
other store implementing EtagSource on its FileStatus.
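
As a rough illustration of that "log the source checksum and dest etag" idea, here is a minimal sketch in plain Python. It is not distcp code and does not call any Hadoop API: the names CopyRecord, needs_copy and record_copy are hypothetical, the manifest is just an in-memory dict, and how the source checksum and destination etag are actually obtained (for example from the filesystem's file status) is assumed to happen elsewhere.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class CopyRecord:
    """What was observed the last time a file was copied (hypothetical type)."""
    source_checksum: str  # checksum reported by the source filesystem
    dest_etag: str        # etag the destination store reported after the upload


def needs_copy(manifest: Dict[str, CopyRecord],
               path: str,
               source_checksum: str,
               dest_etag: Optional[str]) -> bool:
    """Return True if the file must be (re)copied.

    A copy is needed when the file was never logged, when it is missing at
    the destination (dest_etag is None), or when either the source checksum
    or the destination etag differs from the logged value.
    """
    record = manifest.get(path)
    if record is None or dest_etag is None:
        return True
    return (record.source_checksum != source_checksum
            or record.dest_etag != dest_etag)


def record_copy(manifest: Dict[str, CopyRecord],
                path: str,
                source_checksum: str,
                dest_etag: str) -> None:
    """Log the (source checksum, dest etag) pair after a successful copy."""
    manifest[path] = CopyRecord(source_checksum, dest_etag)
```

The two values are never compared with each other, only with what was logged at the previous copy, so it does not matter that a checksum and an etag are computed differently. A real tool would also need to persist the manifest between runs.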
s3a can export etags as its checksum, but as that isn't compatible with distcp,
it just broke all jobs run without -skipcrccheck. That's why it's disabled. abfs
would be the same unless, like gcs, azure storage added a compatible checksum.
> ask: abfs connector to support checksum
> ---------------------------------------
>
> Key: HADOOP-18672
> URL: https://issues.apache.org/jira/browse/HADOOP-18672
> Project: Hadoop Common
> Issue Type: Wish
> Components: fs/azure
> Reporter: Wei-Hsiang Lin
> Priority: Major
>
> Hi Hadoop-Azure community,
> I cannot find much information on why the abfs connector does not support
> file-level checksums. Could you share some insight into why it isn't
> supported, and whether there is a plan to support it in the future?
> Having this would be helpful for migrating data from on-prem to Azure storage
> using the abfs connector.
> Ref: https://hadoop.apache.org/docs/stable/hadoop-azure/abfs.html
--
This message was sent by Atlassian Jira
(v8.20.10#820010)