danielcweeks commented on PR #11344:
URL: https://github.com/apache/iceberg/pull/11344#issuecomment-2420833542

   > Can we have some examples though to guard against these changes in the 
future? Strings that won't parse correctly?
   
   There are a lot of subtle issues: hashCode and equality can be inconsistent 
for escaped characters, casing in escaped characters is handled inconsistently, 
and the class can represent things that are technically not URIs, which then 
leads to inconsistent behavior for the raw values. One of the main ones is 
hostname handling, which is a problem for GCS (not sure whether Azure is 
affected as well) because many systems allow `_` in the hostname 
(bucket/container), but such a hostname cannot be parsed as a host at all.
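   As a starting point for such examples, here is a minimal sketch showing how `java.net.URI` treats a bucket name containing `_` and escaped characters in the path. The class name `UriEdgeCases` and the locations are hypothetical; the behavior shown is that of the standard `java.net.URI` class.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical demo class: java.net.URI edge cases for object-store locations.
public class UriEdgeCases {

  // '_' is not a legal hostname character, so "my_bucket" is not parsed as a
  // server-based authority: getHost() returns null.
  static String hostOf(String location) {
    return URI.create(location).getHost();
  }

  // The bucket name only survives as a registry-based authority.
  static String authorityOf(String location) {
    return URI.create(location).getAuthority();
  }

  // Forcing server-based parsing makes the silent failure explicit.
  static boolean parsesAsServer(String location) {
    try {
      URI.create(location).parseServerAuthority();
      return true;
    } catch (URISyntaxException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    String gcs = "gs://my_bucket/path/file.parquet"; // hypothetical location
    System.out.println(hostOf(gcs));          // null
    System.out.println(authorityOf(gcs));     // my_bucket
    System.out.println(parsesAsServer(gcs));  // false

    // An escaped '/' decodes to the same path as a literal '/', yet the URIs
    // are not equal because equals() compares the raw (escaped) forms.
    URI escaped = URI.create("s3://bucket/a%2Fb");
    URI literal = URI.create("s3://bucket/a/b");
    System.out.println(escaped.getPath().equals(literal.getPath())); // true
    System.out.println(escaped.equals(literal));                     // false

    // Hex digits in escapes compare case-insensitively in equals(), but the
    // raw strings (e.g. when used as map keys) still differ.
    URI lower = URI.create("s3://bucket/a%2fb");
    System.out.println(lower.equals(escaped));                           // true
    System.out.println(lower.getRawPath().equals(escaped.getRawPath())); // false
  }
}
```

Note that no exception is thrown for the underscore case: the authority silently becomes registry-based, so code that reads `getHost()` gets `null` instead of a bucket name.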
   
   I don't have a comprehensive list of the issues, but I have run into them 
enough to be very wary of relying on the URI implementation, which is why we 
specifically avoid its usage: it leads to unsafe/incompatible edge cases. 
You might look at Trino's 
[TestAzureLocation](https://github.com/trinodb/trino/blob/master/lib/trino-filesystem-azure/src/test/java/io/trino/filesystem/azure/TestAzureLocation.java), 
as they have a similar approach to handling URIs. I know their S3 tests have 
one or two examples that don't parse with Java's URI class. 
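   For contrast, a hypothetical sketch of string-based location handling in the spirit of that approach. This is not Trino's or Iceberg's actual code; `SimpleLocation` and its parsing rules are assumptions for illustration. The bucket is kept verbatim (underscores included) and the path is never decoded, so equality is plain string equality and always agrees with `hashCode()`.

```java
import java.util.Objects;

// Hypothetical sketch, not a real library API: split "scheme://bucket/path"
// with plain string operations instead of java.net.URI.
public final class SimpleLocation {
  final String scheme;
  final String bucket;
  final String path;

  private SimpleLocation(String scheme, String bucket, String path) {
    this.scheme = scheme;
    this.bucket = bucket;
    this.path = path;
  }

  static SimpleLocation parse(String location) {
    int schemeEnd = location.indexOf("://");
    if (schemeEnd <= 0) {
      throw new IllegalArgumentException("Missing scheme: " + location);
    }
    String scheme = location.substring(0, schemeEnd);
    String rest = location.substring(schemeEnd + 3);
    int slash = rest.indexOf('/');
    // The bucket is taken as-is; no hostname validation is applied.
    String bucket = slash < 0 ? rest : rest.substring(0, slash);
    if (bucket.isEmpty()) {
      throw new IllegalArgumentException("Missing bucket: " + location);
    }
    // The path is kept raw: "%2F" stays "%2F" and is never decoded.
    String path = slash < 0 ? "" : rest.substring(slash + 1);
    return new SimpleLocation(scheme, bucket, path);
  }

  // Equality on raw components only, consistent with hashCode().
  @Override
  public boolean equals(Object o) {
    if (this == o) return true;
    if (!(o instanceof SimpleLocation)) return false;
    SimpleLocation that = (SimpleLocation) o;
    return scheme.equals(that.scheme) && bucket.equals(that.bucket) && path.equals(that.path);
  }

  @Override
  public int hashCode() {
    return Objects.hash(scheme, bucket, path);
  }

  public static void main(String[] args) {
    SimpleLocation loc = parse("gs://my_bucket/path/file.parquet");
    System.out.println(loc.bucket); // my_bucket
    System.out.println(loc.path);   // path/file.parquet
  }
}
```

The trade-off is that such a parser accepts anything of the form `scheme://bucket/path` without validating it, so any provider-specific rules (for example, bucket naming) have to be checked explicitly rather than implicitly through `java.net.URI`.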


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

