[
https://issues.apache.org/jira/browse/HADOOP-19681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18022471#comment-18022471
]
ASF GitHub Bot commented on HADOOP-19681:
-----------------------------------------
steveloughran commented on code in PR #7942:
URL: https://github.com/apache/hadoop/pull/7942#discussion_r2376222445
##########
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java:
##########
@@ -573,8 +573,11 @@ private static void addDeprecatedKeys() {
*/
public void initialize(URI name, Configuration originalConf)
throws IOException {
- // get the host; this is guaranteed to be non-null, non-empty
+ // get the host; fallback to authority if getHost() returns null
bucket = name.getHost();
+ if (bucket == null) {
Review Comment:
Pull this out into `S3AUtils`, add unit tests that try to break it, and use it
everywhere.
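One possible shape for such a helper, as a minimal sketch only. The class name `BucketNames` and the method `bucketFromUri` are hypothetical; they are not part of `S3AUtils` or of this PR. The sketch assumes the fallback is simply `URI.getAuthority()`, and it does not try to strip user-info or a port from that authority, which a hardened version would have to decide on.

```java
import java.net.URI;

/** Hypothetical helper; the names are illustrative, not from the PR or S3AUtils. */
final class BucketNames {

  private BucketNames() {
  }

  /**
   * Extract the bucket name from a filesystem URI.
   * Prefers getHost() and falls back to getAuthority() when the host is null.
   *
   * @throws IllegalArgumentException if neither yields a non-empty value
   */
  static String bucketFromUri(URI uri) {
    String bucket = uri.getHost();
    if (bucket == null) {
      // Assumption: the raw authority is the bucket name. Any user-info or
      // port embedded in it is NOT stripped in this sketch.
      bucket = uri.getAuthority();
    }
    if (bucket == null || bucket.isEmpty()) {
      throw new IllegalArgumentException("bucket is null/empty in " + uri);
    }
    return bucket;
  }
}
```

Tests that try to break it could start with a URI that has no authority at all (`s3a:///path`), which this sketch rejects, and then move on to authorities carrying user-info or a port.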
##########
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/AbstractFileSystem.java:
##########
@@ -335,7 +335,7 @@ private URI getUri(URI uri, String supportedScheme,
int port = uri.getPort();
port = (port == -1 ? defaultPort : port);
if (port == -1) { // no port supplied and default port is not specified
- return new URI(supportedScheme, authority, "/", null);
+ return URI.create(supportedScheme + "://" + authority + "/");
Review Comment:
why this change?
##########
hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/index.md:
##########
@@ -31,6 +31,12 @@ before 2021.
Consult [S3A and Directory Markers](directory_markers.html) for
full details.
+### <a name="bucket-name-compatibility"></a> S3 Bucket Name Compatibility
+
+This release adds support for S3 bucket names containing dots followed by numbers
+(e.g., `my-bucket-v1.1`, `data-store.v2.3`). Previous versions of the Hadoop S3A
+client failed to initialize such buckets due to URI parsing limitations.
+
Review Comment:
* highlight that per-bucket settings do not work for dotted buckets (they
don't, do they?), so the ability to use them is still very much downgraded.
* Explain that AWS do not recommend dotted buckets for anything other than
web site serving
* highlight that path style access is needed to access (correct? never tried)
> Fix S3A failing to initialize S3 buckets having namespace with dot followed
> by number
> -------------------------------------------------------------------------------------
>
> Key: HADOOP-19681
> URL: https://issues.apache.org/jira/browse/HADOOP-19681
> Project: Hadoop Common
> Issue Type: Bug
> Components: fs/s3
> Reporter: Syed Shameerur Rahman
> Assignee: Syed Shameerur Rahman
> Priority: Major
> Labels: pull-request-available
>
> S3A fails to initialize when an S3 bucket name contains a dot followed by a
> number.
> *Specific Problem*: URI parsing fails when S3 bucket names contain a dot
> followed by a number (like {{bucket-v1.1-us-east-1}}). Java's
> URI.getHost() returns null for such a name: the label after the last dot
> starts with a digit, so the authority is not parsed as a valid hostname.
>
> {code:java}
> hadoop dfs -ls s3a://bucket-v1.1-us-east-1/
> WARNING: Use of this script to execute dfs is deprecated.
> WARNING: Attempting to execute replacement "hdfs dfs" instead.
> 2025-09-08 06:13:06,670 WARN fs.FileSystem: Failed to initialize filesystem
> s3://bucket-v1.1-us-east-1/: java.lang.IllegalArgumentException: bucket is
> null/empty
> -ls: bucket is null/empty{code}
>
> *Please Note*: Although there has been discussion of disallowing S3
> bucket names of this form
> ([https://aws.amazon.com/blogs/aws/amazon-s3-path-deprecation-plan-the-rest-of-the-story/]),
> Amazon S3 still allows such buckets to be created.
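
The parsing behaviour reported above can be reproduced with the JDK alone, without Hadoop. The sketch below is illustrative: the class name `DottedBucketUriDemo` and the method `constructorRejects` are made up for this example. It prints what `getHost()` and `getAuthority()` return for the bucket from the report, and checks whether the four-argument `URI(scheme, host, path, fragment)` constructor accepts the same name as a host.

```java
import java.net.URI;
import java.net.URISyntaxException;

/** Illustrative demo class; not part of Hadoop or the PR. */
final class DottedBucketUriDemo {

  private DottedBucketUriDemo() {
  }

  /** Returns true if the (scheme, host, path, fragment) constructor rejects the host. */
  static boolean constructorRejects(String host) {
    try {
      // This constructor requires the authority to parse as a server-based host.
      new URI("s3a", host, "/", null);
      return false;
    } catch (URISyntaxException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    URI uri = URI.create("s3a://bucket-v1.1-us-east-1/");
    System.out.println("host      = " + uri.getHost());
    System.out.println("authority = " + uri.getAuthority());
    System.out.println("rejected  = "
        + constructorRejects("bucket-v1.1-us-east-1"));
  }
}
```

On OpenJDK the host is expected to be null while the authority still holds the full bucket name, and the constructor is expected to throw. That matches the "bucket is null/empty" failure in the log.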
--
This message was sent by Atlassian Jira
(v8.20.10#820010)