[ https://issues.apache.org/jira/browse/HADOOP-13282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Steve Loughran updated HADOOP-13282:
------------------------------------
Attachment: HADOOP-13282-002.patch
HADOOP-13282: etag support for s3a.
* Move the EtagChecksum class into a new fs.store package in hadoop-common so
that other stores can use it
* Add tests there for its core equality and round-trip operations
* Add a set of ITests for S3A. One of these tests is skipped if the filesystem
is known to be encrypted, in case the bucket returns different etags there. To
support this, add a getter for the S3AFileSystem encryption algorithm.
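A minimal, self-contained sketch of what an etag-backed checksum and its equality/round-trip checks can look like. This is not the class from the patch: to stay runnable without Hadoop on the classpath it does not extend Hadoop's FileChecksum, and the algorithm name "etag" and the roundTrip() helper are assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/**
 * Hypothetical sketch of an etag-backed checksum. It mirrors the shape of a
 * Hadoop FileChecksum (algorithm name, length, bytes, Writable-style
 * write/readFields) but deliberately does not extend it.
 */
final class EtagChecksum {
  static final String ALGORITHM = "etag"; // assumed algorithm name

  private String etag;

  /** No-arg constructor, as needed before readFields() in Writable-style code. */
  EtagChecksum() {
    this("");
  }

  EtagChecksum(String etag) {
    this.etag = etag == null ? "" : etag;
  }

  String getAlgorithmName() {
    return ALGORITHM;
  }

  /** The checksum bytes are simply the UTF-8 encoding of the etag string. */
  byte[] getBytes() {
    return etag.getBytes(StandardCharsets.UTF_8);
  }

  int getLength() {
    return getBytes().length;
  }

  void write(DataOutput out) throws IOException {
    out.writeUTF(etag);
  }

  void readFields(DataInput in) throws IOException {
    etag = in.readUTF();
  }

  /** Hypothetical test helper: serialize, then deserialize into a new instance. */
  static EtagChecksum roundTrip(EtagChecksum source) {
    try {
      ByteArrayOutputStream buffer = new ByteArrayOutputStream();
      DataOutputStream out = new DataOutputStream(buffer);
      source.write(out);
      out.flush();
      EtagChecksum copy = new EtagChecksum();
      copy.readFields(new DataInputStream(new ByteArrayInputStream(buffer.toByteArray())));
      return copy;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Two checksums are equal if they use the same algorithm and have the same bytes. */
  @Override
  public boolean equals(Object o) {
    if (this == o) {
      return true;
    }
    if (!(o instanceof EtagChecksum)) {
      return false;
    }
    EtagChecksum that = (EtagChecksum) o;
    return getAlgorithmName().equals(that.getAlgorithmName())
        && Arrays.equals(getBytes(), that.getBytes());
  }

  @Override
  public int hashCode() {
    return Arrays.hashCode(getBytes());
  }

  @Override
  public String toString() {
    return "etag: \"" + etag + "\"";
  }
}
```

Comparing the raw bytes rather than the string keeps equality consistent with getBytes(), which is what a caller comparing generic checksums would see.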
With these etags, you can assume that if an object's etag changes, the object
has changed. You cannot safely use them to conclude that other objects,
especially across stores, are equivalent.
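The asymmetry above can be made explicit in code. A minimal sketch, assuming a caller that recorded an object's etag earlier and re-reads it later from the same store; EtagComparison and its Verdict values are hypothetical names, not part of the patch.

```java
/** Hypothetical helper showing what an etag comparison does and does not prove. */
final class EtagComparison {

  enum Verdict {
    /** The etags differ: the object has changed. This is the only safe conclusion. */
    CHANGED,
    /** The etags match: no change detected, but this is not proof of equivalence. */
    SAME_ETAG,
    /** At least one side has no etag: nothing can be concluded. */
    UNKNOWN
  }

  private EtagComparison() {
  }

  /** Compare a previously recorded etag with the object's current etag. */
  static Verdict compare(String recordedEtag, String currentEtag) {
    if (recordedEtag == null || currentEtag == null) {
      return Verdict.UNKNOWN;
    }
    return recordedEtag.equals(currentEtag) ? Verdict.SAME_ETAG : Verdict.CHANGED;
  }
}
```

Returning SAME_ETAG rather than a plain "unchanged" keeps callers from reading more into a match than the etag guarantees.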
(Note: this patch reorders all the headers in ITestS3AMiscOperations. They had
drifted out of order, and as it's a rarely patched, low-conflict file, I've
taken the chance to fix them.)
Tested: S3 London with encryption turned on; S3 Ireland without.
> S3 blob etags to be made visible in status/getFileChecksum() calls
> ------------------------------------------------------------------
>
> Key: HADOOP-13282
> URL: https://issues.apache.org/jira/browse/HADOOP-13282
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Affects Versions: 2.9.0
> Reporter: Steve Loughran
> Assignee: Steve Loughran
> Priority: Minor
> Attachments: HADOOP-13282-001.patch, HADOOP-13282-002.patch
>
>
> If the etags of blobs were exported via {{getFileChecksum()}}, it'd be
> possible to probe for a blob being in sync with a local file. Distcp could
> use this to decide whether to skip a file or not.
> Now, there's a problem there: distcp needs source and dest filesystems to
> implement the same algorithm. It'd only work out of the box if you were
> copying between S3 instances. There are also quirks with encryption and
> multipart: [s3 docs|http://docs.aws.amazon.com/AmazonS3/latest/API/RESTCommonResponseHeaders.html].
> At the very least, it's something which could be used when indexing the FS,
> to check for changes later.
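The distcp idea in the description amounts to a skip check that only trusts checksums produced by the same algorithm. A minimal sketch under that assumption; SkipDecision, Checksum and canSkipCopy() are hypothetical names, and this is not distcp's actual implementation.

```java
import java.util.Arrays;

/** Hypothetical sketch of a distcp-style "can this copy be skipped?" check. */
final class SkipDecision {

  /** Minimal stand-in for a file checksum: algorithm name plus raw bytes (non-null). */
  static final class Checksum {
    final String algorithm;
    final byte[] bytes;

    Checksum(String algorithm, byte[] bytes) {
      this.algorithm = algorithm;
      this.bytes = bytes.clone();
    }
  }

  private SkipDecision() {
  }

  /**
   * Skip only when both sides return a checksum, both use the same algorithm,
   * and the bytes match. A missing checksum (the store doesn't provide one) or
   * a different algorithm (e.g. an HDFS checksum vs an S3 etag) means "copy".
   */
  static boolean canSkipCopy(Checksum source, Checksum dest) {
    if (source == null || dest == null) {
      return false;
    }
    if (!source.algorithm.equals(dest.algorithm)) {
      return false;
    }
    return Arrays.equals(source.bytes, dest.bytes);
  }
}
```

The check fails safe: if encryption or multipart uploads give identical data different etags, the file is copied again rather than wrongly skipped. As the patch comment notes, a match across stores is still only a heuristic.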
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)