tomtongue commented on code in PR #6352:
URL: https://github.com/apache/iceberg/pull/6352#discussion_r1046599507


##########
aws/src/main/java/org/apache/iceberg/aws/s3/S3URI.java:
##########
@@ -74,17 +74,14 @@ class S3URI {
     this.scheme = schemeSplit[0];
 
     String[] authoritySplit = schemeSplit[1].split(PATH_DELIM, 2);
-    ValidationException.check(
-        authoritySplit.length == 2, "Invalid S3 URI, cannot determine bucket: 
%s", location);
-    ValidationException.check(
-        !authoritySplit[1].trim().isEmpty(), "Invalid S3 URI, path is empty: 
%s", location);
+

Review Comment:
   Thanks for reviewing, Daniel. 
   > If you use the bucket as the location of a table, you basically cannot use 
that bucket for anything else. Things like orphan files procedure will wipe 
anything else in the bucket out.
   > 
   > Is there a requirement that you cannot have a bucket prefix for some use 
case?
   
   Yes, I agree this, but I understand there are users who build their Iceberg 
tables on each S3 bucket with some configurations such as encryptions, bucket 
policy, lifecycle policy etc. Currently, Iceberg doesn’t allow users to 
directly create an Iceberg table at S3 bucket, but I think it’s possible to 
provide creating an Iceberg table with the users.
   
   Let me share more background for this PR. This PR is based on the following 
situation I confirmed:
   1. You initially build an Iceberg table at direct S3 bucket like 
`s3://bucket` using Athena or Spark (we can also build an Iceberg table at 
direct S3 bucket with `LOCATION` clause in `CREATE TABLE` DDL)
   2. Then, you write data into the Iceberg table by write operation such as 
`INSERT INTO`. This is successful.
   3. If we run a read query (e.g. `SELECT`) for the Iceberg table after 
writing data, it fails with `org.apache.iceberg.exceptions.ValidationException 
: Invalid S3 URI, path is empty: s3://bucket`.
   
   Users who have this situation need to re-create the Iceberg table at s3 
bucket with path(s) such as `s3://bucket/path`. I confirmed there are users who 
had this situation.
   
   To avoid this situation, we can have three choices (in my understanding):
   * Just showing warning message to users if they directly create an Iceberg 
table at s3 bucket
   * Not allowing users to directly create an Iceberg table at s3 bucket
   * Allowing users to directly create an Iceberg table at s3 bucket
   
   Initially I thought adding the restriction of table creation in regards to 
the first two choices. However based on the following considerations, I choose 
the last (third) one and decided to submit this PR.
   * For the first one, it’s not difficult because just putting warning message 
in the Iceberg code. But this sometimes wouldn’t work well for users to avoid 
creating an Iceberg table at direct s3 bucket.
   * The second one needs to add the restriction of table creation at direct s3 
bucket when users specify a s3 bucket name only for an Iceberg table location. 
However, this restriction forces users to use s3 bucket with paths and it isn’t 
consistent with other software behavior that allows to use direct s3 bucket.
   
   Sorry for my long comments, if there’s my misunderstandings, please let me 
know.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Reply via email to