dimas-b commented on PR #10283:
URL: https://github.com/apache/iceberg/pull/10283#issuecomment-2102663607

   Encoding special chars in partition path elements sounds like a good idea, 
but I'm not sure it is that simple.
   
   However, the problem is not specific to what Iceberg code puts into 
partition paths. The base location of a table may also include `#` chars. This 
works well in S3 (as the test case in this PR shows), but Iceberg's `S3FileIO` 
will not handle those locations ATM.
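   To illustrate the base-location problem, here is a minimal sketch (the bucket and path are made-up examples, and this does not claim to reproduce the parsing code inside `S3FileIO`): a generic RFC 3986 parser such as `java.net.URI` treats the first `#` as the start of a fragment, so everything after it silently drops out of the path.

```java
import java.net.URI;

public class HashInLocation {
    // Returns {path, fragment} as reported by java.net.URI (an RFC 3986 parser).
    static String[] parse(String location) {
        URI uri = URI.create(location);
        return new String[] {uri.getPath(), uri.getFragment()};
    }

    public static void main(String[] args) {
        // Hypothetical table location whose directory name contains a literal '#'.
        String location = "s3://my-bucket/warehouse/te#st/data/file.parquet";
        String[] parts = parse(location);
        // The path ends at '#'; the remainder is treated as a fragment.
        System.out.println("path     = " + parts[0]); // /warehouse/te
        System.out.println("fragment = " + parts[1]); // st/data/file.parquet
    }
}
```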
   
   How do you propose to deal with special chars in base paths?
   
   Also, with Iceberg URL-encoding special chars in the file locations, the 
interpretation of those locations requires special logic. They cannot be 
interpreted according to [RFC 3986](https://www.rfc-editor.org/rfc/rfc3986) 
because the URI parsing rules require decoding those chars, which will 
subsequently lead to mismatches at the S3 API level (S3 interprets key 
parameters verbatim).
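   A small sketch of that mismatch, assuming a writer percent-encodes a directory name and the resulting location string is later sent to S3 verbatim as the object key (the class and helper names are hypothetical). An RFC 3986 parser hands back the decoded path, which is not the key S3 stored the object under:

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class EncodedKeyMismatch {
    // Percent-encodes one path segment. URLEncoder does form encoding (it turns
    // ' ' into '+'), which differs from RFC 3986, but '#' becomes "%23" either way.
    static String encodeSegment(String segment) {
        return URLEncoder.encode(segment, StandardCharsets.UTF_8);
    }

    // Key as an RFC 3986 parser reports it: percent-escapes are decoded.
    static String decodedKey(String location) {
        return URI.create(location).getPath().substring(1); // drop leading '/'
    }

    // Key exactly as it appears in the location string, i.e. what S3 would
    // receive if the string were passed through without decoding.
    static String rawKey(String location) {
        return URI.create(location).getRawPath().substring(1);
    }

    public static void main(String[] args) {
        String location = "s3://my-bucket/warehouse/" + encodeSegment("te#st") + "/file.parquet";
        System.out.println(location);             // s3://my-bucket/warehouse/te%23st/file.parquet
        System.out.println(decodedKey(location)); // warehouse/te#st/file.parquet
        System.out.println(rawKey(location));     // warehouse/te%23st/file.parquet
    }
}
```

A reader relying on `decodedKey` would look up `warehouse/te#st/...`, while under the stated assumption the object actually lives under the raw `te%23st` key.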
   
   As I noted in comments, this PR attempts to interpret S3 URIs in a manner 
consistent with what the AWS S3 UI produces. For example, if one creates a 
directory named `te#st` in S3 and obtains its URI from the AWS UI, the `#` char 
is _not_ encoded. This applies both to Iceberg-produced directory names and to 
user-controlled base locations.
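   For comparison, a rough sketch of the verbatim interpretation described above (a hypothetical helper, not the code in this PR): split off the bucket and treat the rest of the string as the key, with no percent-decoding and no special meaning for `#`, `?` or `%`.

```java
public final class VerbatimS3Location {
    private static final String SCHEME_DELIM = "://";

    final String bucket;
    final String key;

    private VerbatimS3Location(String bucket, String key) {
        this.bucket = bucket;
        this.key = key;
    }

    // Parses "scheme://bucket/key" without RFC 3986 rules: the key is taken
    // character for character, the way S3 itself interprets key parameters.
    static VerbatimS3Location parse(String location) {
        int schemeEnd = location.indexOf(SCHEME_DELIM);
        if (schemeEnd < 0) {
            throw new IllegalArgumentException("Missing scheme: " + location);
        }
        String rest = location.substring(schemeEnd + SCHEME_DELIM.length());
        int slash = rest.indexOf('/');
        if (slash <= 0) {
            throw new IllegalArgumentException("Missing bucket or key: " + location);
        }
        // No decoding and no fragment/query stripping.
        return new VerbatimS3Location(rest.substring(0, slash), rest.substring(slash + 1));
    }

    public static void main(String[] args) {
        VerbatimS3Location loc = parse("s3://my-bucket/warehouse/te#st/file.parquet");
        System.out.println(loc.bucket); // my-bucket
        System.out.println(loc.key);    // warehouse/te#st/file.parquet
    }
}
```

The trade-off of this sketch is that such locations can no longer carry a query or fragment component; they have to be treated as opaque S3 location strings rather than parsed as RFC 3986 URIs.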


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
For additional commands, e-mail: issues-h...@iceberg.apache.org
