gavinchou commented on code in PR #63409:
URL: https://github.com/apache/doris/pull/63409#discussion_r3333277413
##########
be/src/common/config.cpp:
##########
@@ -1497,6 +1497,8 @@ DEFINE_mInt32(s3_read_max_wait_time_ms, "800");
DEFINE_mBool(enable_s3_object_check_after_upload, "true");
DEFINE_mInt32(aws_client_request_timeout_ms, "30000");
+DEFINE_mBool(s3_disable_content_md5, "false");
Review Comment:
suggest naming
s3_disable_content_md5 -> enable_s3_content_md5
default to true.
##########
be/src/io/fs/s3_obj_storage_client.cpp:
##########
@@ -116,6 +120,31 @@ using namespace Aws::S3::Model;
static constexpr int S3_REQUEST_THRESHOLD_MS = 5000;
+namespace {
+// S3 Express One Zone endpoints follow the pattern
+// "*.s3express-<zone>.<region>.amazonaws.com" — substring match is sufficient
+// for both the gateway and the s3express subdomain forms.
+bool is_s3_express_endpoint(const std::string& endpoint) {
Review Comment:
what's the difference to
```
const bool is_s3_express = s3_conf.endpoi
```
in s3_util.cpp
it seems better to abstract a static/global function/utility to check if the
given thing/context is s3 express, something like.
```
is_s3_express(endpoint, bucket_name, ...) {
if (one of the input is in s3 express context) return true;
else return false;
}
```
##########
be/src/util/s3_util.cpp:
##########
@@ -528,12 +530,23 @@ std::shared_ptr<io::ObjStorageClient>
S3ClientFactory::_create_s3_client(
aws_config.retryStrategy = std::make_shared<S3CustomRetryStrategy>(
config::max_s3_client_retry /*scaleFactor = 25*/);
- std::shared_ptr<Aws::S3::S3Client> new_client =
std::make_shared<Aws::S3::S3Client>(
- get_aws_credentials_provider(s3_conf), std::move(aws_config),
- Aws::Client::AWSAuthV4Signer::PayloadSigningPolicy::Never,
- s3_conf.use_virtual_addressing);
+ // S3 Express buckets are identified by the --x-s3 suffix or an s3express
endpoint.
+ // Skip endpointOverride for these so the SDK resolves the bucket-specific
endpoint
+ // and manages CreateSession automatically. For all other buckets, keep
the override.
+ const bool is_s3_express = s3_conf.endpoint.find("s3express") !=
std::string::npos ||
Review Comment:
this is not robust, e.g.
"--x-s3" is the bucket name suffix, however, `s3_conf.bucket.find("--x-s3")
!= std::string::npos` does not reflect the truth.
BTW, what is the formal way to support s3 express, better to find some
official references.
##########
be/src/common/config.cpp:
##########
@@ -1497,6 +1497,8 @@ DEFINE_mInt32(s3_read_max_wait_time_ms, "800");
DEFINE_mBool(enable_s3_object_check_after_upload, "true");
DEFINE_mInt32(aws_client_request_timeout_ms, "30000");
+DEFINE_mBool(s3_disable_content_md5, "false");
Review Comment:
or this configuration should be something like
`s3_content_checksum_method = md5`
should change to crc32
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]