dimas-b commented on PR #10283: URL: https://github.com/apache/iceberg/pull/10283#issuecomment-2102663607
Encoding special chars in partition path elements sounds like a good idea, but I'm not sure it is that simple.

However, the problem is not specific to what Iceberg code produces. The base location of a table may also include `#` chars. This works well in S3 (as the test case in this PR shows), but Iceberg's `S3FileIO` does not handle those locations ATM. How do you propose to deal with special chars in base paths?

Also, with Iceberg URL-encoding special chars in the file locations, the interpretation of those locations requires special logic. They cannot be interpreted according to [RFC 3986](https://www.rfc-editor.org/rfc/rfc3986) because the URI parsing rules require decoding those chars, which will subsequently lead to mismatches at the S3 API level (the latter interpreting key parameters verbatim).

As I noted in comments, this PR attempts to interpret S3 URIs in a manner consistent with what the AWS S3 UI produces. For example, if one creates a directory named `te#st` in S3 and obtains its URI from the AWS UI, the `#` char is _not_ encoded. This applies both to Iceberg-produced directory names and to base locations, which are controlled by the user.
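To illustrate the RFC 3986 concern, here is a minimal sketch using only `java.net.URI` from the JDK. It is not the code in this PR or in `S3FileIO`; the class name `S3UriSketch`, its helper methods, and the `s3://bucket/...` locations are hypothetical and exist only for this example. It compares what a generic URI parser reports with what a verbatim split of the same location string would yield as the S3 key.

```java
import java.net.URI;

class S3UriSketch {
    // RFC 3986 view: the path component, with percent-escapes decoded.
    public static String rfcPath(String location) {
        return URI.create(location).getPath();
    }

    // RFC 3986 view: everything after the first '#' is the fragment.
    public static String rfcFragment(String location) {
        return URI.create(location).getFragment();
    }

    // Hypothetical verbatim view: everything after "s3://<bucket>/" is the key,
    // with no decoding and no special meaning given to '#'.
    public static String verbatimKey(String location) {
        String prefix = "s3://";
        if (!location.startsWith(prefix)) {
            throw new IllegalArgumentException("Not an s3:// location: " + location);
        }
        int slash = location.indexOf('/', prefix.length());
        return slash < 0 ? "" : location.substring(slash + 1);
    }

    public static void main(String[] args) {
        String raw = "s3://bucket/te#st/data.parquet";
        System.out.println(rfcPath(raw));      // path stops at '#'
        System.out.println(rfcFragment(raw));  // rest of the key lands in the fragment
        System.out.println(verbatimKey(raw));  // key kept as written

        String encoded = "s3://bucket/te%23st/data.parquet";
        System.out.println(rfcPath(encoded));     // %23 is decoded back to '#'
        System.out.println(verbatimKey(encoded)); // %23 stays in the key
    }
}
```

With an unencoded `#`, the generic parser truncates the path and moves the remainder into the fragment. With `%23`, the decoded path no longer matches the literal string, which is what S3 would receive as the key if the location were passed through verbatim. Either way, code that reads these locations needs to agree on a single interpretation.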