mrcnc opened a new pull request, #11395:
URL: https://github.com/apache/iceberg/pull/11395

   After reviewing the concerns raised in 
https://github.com/apache/iceberg/pull/11344 about using `java.net.URI` for 
parsing in ADLSLocation, I contrived an example of a location that does not 
parse correctly. The example also fails in the current implementation, so this 
PR adds a test and a fix for the parsing code. In addition, it removes test 
cases that are invalid because they don't test [valid ABFS 
syntax](https://learn.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-introduction-abfs-uri#uri-syntax).
   
   ## Motivation
   The main reason to avoid using `java.net.URI` is that it parses according to 
[RFC 2396](https://www.ietf.org/rfc/rfc2396.txt), but object storage providers 
do not strictly follow this specification. Specifically, in standard URI 
syntax, the question mark `?` separates the path component from the query 
component. However, Azure Blob Storage allows question marks in blob/file 
names, so such names cannot be parsed as a path under RFC 2396.
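
   As a minimal sketch of the problem described above (the location string is a hypothetical example, not the one used in the PR's test), `java.net.URI` splits a blob name at the first `?` and reports the remainder as a query:

```java
import java.net.URI;

class AbfsUriDemo {
    // Hypothetical location: the blob is named "file?name.txt".
    static final String LOCATION =
        "abfs://container@account.dfs.core.windows.net/dir/file?name.txt";

    // Path component as java.net.URI sees it.
    static String pathSeenByUri(String location) {
        return URI.create(location).getPath();
    }

    // Query component as java.net.URI sees it.
    static String querySeenByUri(String location) {
        return URI.create(location).getQuery();
    }

    public static void main(String[] args) {
        // The blob name is cut at '?': the path loses "?name.txt",
        // which is reported as the query instead.
        System.out.println("path  = " + pathSeenByUri(LOCATION));
        System.out.println("query = " + querySeenByUri(LOCATION));
    }
}
```

   The full blob name can only be recovered by stitching the path and query back together, which is why a parser that does not assign any meaning to `?` is a better fit for these locations.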
   
   Another important point is that Azure Storage is accessed through HTTP 
APIs, so the `abfs` and `wasb` location syntaxes serve only as identifiers for 
blobs that are actually accessed through HTTP URLs. This is the motivation 
behind removing the tests that included query and fragment components, since 
those components would only appear in the HTTP URLs and not in the ABFS "URI" 
syntax.
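
   To illustrate the idea of treating the location as an identifier rather than an RFC 2396 URI, here is a rough sketch of a string-based parser. The class name `AbfsLocationSketch`, its fields, and the regular expression are assumptions made for this example; this is not the code in the PR or in `ADLSLocation`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: parses scheme://[container@]host/path without java.net.URI.
class AbfsLocationSketch {
    // Group 2 is the authority (up to the first '/'); group 3 is everything
    // after that slash, kept verbatim, so '?' and '#' stay in the path.
    private static final Pattern LOCATION =
        Pattern.compile("^(abfss?|wasbs?)://([^/]+)/?(.*)$");

    final String container; // null when the authority has no "container@" prefix
    final String host;
    final String path;

    AbfsLocationSketch(String location) {
        Matcher m = LOCATION.matcher(location);
        if (!m.matches()) {
            throw new IllegalArgumentException("Invalid location: " + location);
        }
        String authority = m.group(2);
        int at = authority.indexOf('@');
        this.container = at < 0 ? null : authority.substring(0, at);
        this.host = authority.substring(at + 1);
        this.path = m.group(3);
    }

    public static void main(String[] args) {
        AbfsLocationSketch loc = new AbfsLocationSketch(
            "abfs://container@account.dfs.core.windows.net/dir/file?name.txt");
        System.out.println(loc.container + " | " + loc.host + " | " + loc.path);
    }
}
```

   Because nothing after the authority is interpreted, a name such as `file?name.txt` survives intact in `path`.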
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

