tomtongue commented on code in PR #6352: URL: https://github.com/apache/iceberg/pull/6352#discussion_r1046599507
########## aws/src/main/java/org/apache/iceberg/aws/s3/S3URI.java: ########## @@ -74,17 +74,14 @@ class S3URI { this.scheme = schemeSplit[0]; String[] authoritySplit = schemeSplit[1].split(PATH_DELIM, 2); - ValidationException.check( - authoritySplit.length == 2, "Invalid S3 URI, cannot determine bucket: %s", location); - ValidationException.check( - !authoritySplit[1].trim().isEmpty(), "Invalid S3 URI, path is empty: %s", location); + Review Comment: Thanks for reviewing, Daniel. > If you use the bucket as the location of a table, you basically cannot use that bucket for anything else. Things like orphan files procedure will wipe anything else in the bucket out. > > Is there a requirement that you cannot have a bucket prefix for some use case? Yes, I agree this, but I understand there are users who build their Iceberg tables on each S3 bucket with some configurations such as encryptions, bucket policy, lifecycle policy etc. Currently, Iceberg doesn’t allow users to directly create an Iceberg table at S3 bucket, but I think it’s possible to provide creating an Iceberg table with the users. Let me share more background for this PR. This PR is based on the following situation I confirmed: 1. You initially build an Iceberg table at direct S3 bucket like `s3://bucket` using Athena or Spark (we can also build an Iceberg table at direct S3 bucket with `LOCATION` clause in `CREATE TABLE` DDL) 2. Then, you write data into the Iceberg table by write operation such as `INSERT INTO`. This is successful. 3. If we run a read query (e.g. `SELECT`) for the Iceberg table after writing data, it fails with `org.apache.iceberg.exceptions.ValidationException : Invalid S3 URI, path is empty: s3://bucket`. Users who have this situation need to re-create the Iceberg table at s3 bucket with path(s) such as `s3://bucket/path`. I confirmed there are users who had this situation. To avoid this situation, we can have three choices (in my understanding): * Just showing warning message to users if they directly create an Iceberg table at s3 bucket * Not allowing users to directly create an Iceberg table at s3 bucket * Allowing users to directly create an Iceberg table at s3 bucket Initially I thought adding the restriction of table creation in regards to the first two choices. However based on the following considerations, I choose the last (third) one and decided to submit this PR. * For the first one, it’s not difficult because just putting warning message in the Iceberg code. But this sometimes wouldn’t work well for users to avoid creating an Iceberg table at direct s3 bucket. * The second one needs to add the restriction of table creation at direct s3 bucket when users specify a s3 bucket name only for an Iceberg table location. However, this restriction forces users to use s3 bucket with paths and it isn’t consistent with other software behavior that allows to use direct s3 bucket. Sorry for my long comments, if there’s my misunderstandings, please let me know. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org