danielcweeks commented on PR #11344: URL: https://github.com/apache/iceberg/pull/11344#issuecomment-2420833542
> Can we have some examples though to guard against these changes in the future? Strings that won't parse correctly?

There are a lot of subtle issues: `hashCode` and equality being inconsistent for escaped characters, handling of casing in escaped characters, and the fact that it can represent things that are technically not URIs, which then gives inconsistent behavior for the raw values. One of the main ones is hostname handling, which is a problem for GCS (not sure if Azure is affected as well) because many systems allow `_` in the hostname (bucket/container), but then it cannot be parsed at all.

I don't have a comprehensive list of the issues, but I have run into them often enough to be very wary of relying on the URI implementation, which is why we specifically avoid its usage: it leads to unsafe/incompatible edge cases.

You might look at Trino's [TestAzureLocation](https://github.com/trinodb/trino/blob/master/lib/trino-filesystem-azure/src/test/java/io/trino/filesystem/azure/TestAzureLocation.java), as they have a similar approach to handling URIs. I know their S3 tests have one or two examples that don't parse with Java's URI class.
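To make the hostname and escaping points concrete, here is a minimal sketch of the kind of strings that could serve as guard cases against `java.net.URI`. The class name `UriPitfalls`, its helper methods, and the bucket names and paths are all hypothetical and chosen only for illustration; none of them come from Iceberg or Trino. As far as I can tell, a bucket name containing `_` does not make the constructor throw, but the authority is no longer recognized as a server-based host, so `getHost()` returns `null`.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper class, for illustration only.
final class UriPitfalls {
  // Returns the host that java.net.URI extracts, or null if it finds none.
  static String hostOf(String location) {
    return URI.create(location).getHost();
  }

  // Returns true if java.net.URI accepts the string at all.
  static boolean parses(String location) {
    try {
      new URI(location);
      return true;
    } catch (URISyntaxException e) {
      return false;
    }
  }

  // Compares two locations using URI equality rather than String equality.
  static boolean uriEquals(String a, String b) {
    return URI.create(a).equals(URI.create(b));
  }

  public static void main(String[] args) {
    // An underscore in the bucket name: the host is lost, only the raw authority remains.
    System.out.println(hostOf("gs://my_bucket/data/file.parquet"));
    System.out.println(URI.create("gs://my_bucket/data/file.parquet").getAuthority());

    // A hyphenated bucket name is recognized as a host.
    System.out.println(hostOf("gs://my-bucket/data/file.parquet"));

    // An object key with a raw space is rejected outright.
    System.out.println(parses("s3://bucket/path with space/file.parquet"));

    // Escaped octets are compared without regard to hex-digit case,
    // so two different raw strings are equal as URIs.
    System.out.println(uriEquals("s3://bucket/a%2fb", "s3://bucket/a%2Fb"));
  }
}
```

Cases like these could be turned into unit tests that pin down how a location parser treats the raw string, independent of what `java.net.URI` does with it.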