steveloughran commented on code in PR #7882:
URL: https://github.com/apache/hadoop/pull/7882#discussion_r2503587272
##########
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/DefaultS3ClientFactory.java:
##########
@@ -202,11 +205,34 @@ private <BuilderT extends S3BaseClientBuilder<BuilderT, ClientT>, ClientT> Build
configureEndpointAndRegion(builder, parameters, conf);
+ // add a plugin to add a Content-MD5 header.
+ // this is required when performing some operations with third party stores
+ // (for example: bulk delete), and is somewhat harmless when working with AWS S3.
+ if (parameters.isMd5HeaderEnabled()) {
+ LOG.debug("MD5 header enabled");
+ builder.addPlugin(LegacyMd5Plugin.create());
+ }
+
+ //when to calculate request checksums.
+ final RequestChecksumCalculation checksumCalculation =
+ parameters.isChecksumCalculationEnabled()
+ ? RequestChecksumCalculation.WHEN_SUPPORTED
Review Comment:
Some operations require checksums (bulk delete?), and every store which
implements those operations has had to expect them. What broke things is the
new generation option, "when supported", as it really means "generate checksums
on all requests". There are only two values in the enum, so the SDK always has
to choose one.

when_supported
* doesn't work for most third party stores
* seems to break MPUs if you don't set a content checksum for put/posted
data.

I think a "true/false" checksum generation option is simpler for people to
understand than the nuances of when_supported vs when_required.
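
As a rough sketch of that true/false mapping (not the PR's actual code): the enum below is a local stand-in that mirrors the two SDK constant names so the example compiles without the AWS SDK on the classpath. The class and method names are hypothetical, and mapping `false` to `WHEN_REQUIRED` is an assumption, since the quoted diff is cut off before that branch.

```java
// Sketch only: maps a boolean "generate checksums" option onto a two-valued enum.
final class ChecksumOptionSketch {

  /**
   * Local stand-in for the SDK's RequestChecksumCalculation enum.
   * Only the two constant names are taken from the discussion above.
   */
  enum RequestChecksumCalculation { WHEN_SUPPORTED, WHEN_REQUIRED }

  /**
   * Hypothetical helper: true means "checksum every request which supports it",
   * false means "only checksum requests which require it" (assumed).
   */
  static RequestChecksumCalculation toChecksumCalculation(boolean generateChecksums) {
    return generateChecksums
        ? RequestChecksumCalculation.WHEN_SUPPORTED
        : RequestChecksumCalculation.WHEN_REQUIRED;
  }

  public static void main(String[] args) {
    // Print the mode chosen for each value of the boolean option.
    System.out.println("generation=true  -> " + toChecksumCalculation(true));
    System.out.println("generation=false -> " + toChecksumCalculation(false));
  }
}
```

With this shape, someone targeting a third party store only has to decide whether to turn generation off; they never see the enum names.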
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]