[
https://issues.apache.org/jira/browse/HADOOP-19654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18036235#comment-18036235
]
ASF GitHub Bot commented on HADOOP-19654:
-----------------------------------------
steveloughran commented on code in PR #7882:
URL: https://github.com/apache/hadoop/pull/7882#discussion_r2503587272
##########
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/DefaultS3ClientFactory.java:
##########
@@ -202,11 +205,34 @@ private <BuilderT extends S3BaseClientBuilder<BuilderT, ClientT>, ClientT> Build
configureEndpointAndRegion(builder, parameters, conf);
+ // add a plugin to add a Content-MD5 header.
+ // this is required when performing some operations with third party stores
+ // (for example: bulk delete), and is somewhat harmless when working with AWS S3.
+ if (parameters.isMd5HeaderEnabled()) {
+ LOG.debug("MD5 header enabled");
+ builder.addPlugin(LegacyMd5Plugin.create());
+ }
+
+ //when to calculate request checksums.
+ final RequestChecksumCalculation checksumCalculation =
+ parameters.isChecksumCalculationEnabled()
+ ? RequestChecksumCalculation.WHEN_SUPPORTED
Review Comment:
Some operations require checksums (bulk delete?), and everything which
implements them has had to expect checksums. The new generation option, "when
supported", is what broke things, as it really means "generate checksums on all
requests". There are only two values in the enum, so the SDK always has to
choose one.
when_supported:
* doesn't work for most third-party stores
* seems to break multipart uploads (MPUs) if you don't set a content checksum
for put/posted data.
I think a "true/false" switch for checksum generation is simpler for people to
understand than the nuances of when_supported vs when_required.
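A minimal sketch of the "true/false" idea, not the PR's actual code. It assumes
that a disabled flag falls back to the other enum value, which the truncated
diff does not show. `ChecksumPolicy` is a local stand-in for the SDK's
`RequestChecksumCalculation` enum so the example runs without the SDK on the
classpath, and `policyFor` is a hypothetical helper name.

```java
/**
 * Sketch only: collapses the SDK's two-valued checksum policy into a boolean.
 * ChecksumPolicy is a local stand-in for the SDK enum, not the real class.
 */
final class ChecksumGenerationSketch {

  /** Mirrors the two enum values named in the review comment. */
  enum ChecksumPolicy { WHEN_SUPPORTED, WHEN_REQUIRED }

  /**
   * Map a boolean "generate checksums" switch onto the two-valued policy.
   * Assumption: false selects WHEN_REQUIRED, as it is the only other value.
   */
  static ChecksumPolicy policyFor(boolean generationEnabled) {
    return generationEnabled
        ? ChecksumPolicy.WHEN_SUPPORTED
        : ChecksumPolicy.WHEN_REQUIRED;
  }

  public static void main(String[] args) {
    // Show the policy chosen for each setting of the switch.
    System.out.println("enabled  -> " + policyFor(true));
    System.out.println("disabled -> " + policyFor(false));
  }
}
```

With a mapping like this, users only decide whether checksums are generated on
every request; the choice between the two SDK values stays an implementation
detail.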
> Upgrade AWS SDK to 2.35.4
> -------------------------
>
> Key: HADOOP-19654
> URL: https://issues.apache.org/jira/browse/HADOOP-19654
> Project: Hadoop Common
> Issue Type: Improvement
> Components: build, fs/s3
> Affects Versions: 3.5.0
> Reporter: Steve Loughran
> Assignee: Steve Loughran
> Priority: Major
> Labels: pull-request-available
>
> Upgrade to a recent version of 2.33.x or later while off the critical path of
> things.
> HADOOP-19485 froze the sdk at a version which worked with third party stores.
> Apparently the new version works; early tests show that Bulk Delete calls
> with third party stores complain about lack of md5 headers, so some tuning is
> clearly going to be needed.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)