mrcnc opened a new pull request, #11395: URL: https://github.com/apache/iceberg/pull/11395
After reviewing the concerns raised in https://github.com/apache/iceberg/pull/11344 about using `java.net.URI` for parsing in `ADLSLocation`, I contrived an example of a location that does not parse correctly. It also fails in the current implementation, so this PR adds a test and a fix for the parsing code. Additionally, it removes test cases that are invalid because they don't test [valid ABFS syntax](https://learn.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-introduction-abfs-uri#uri-syntax).

## Motivation

The main reason to avoid `java.net.URI` is that it parses according to [RFC 2396](https://www.ietf.org/rfc/rfc2396.txt), but object storage providers do not strictly follow this specification. Specifically, in standard URI syntax, the question mark `?` separates the path component from the query component. However, Azure Blob Storage allows question marks in blob/file names, making these names incompatible with the RFC 2396 URI specification.

Another important point is that Azure Storage APIs are accessed over HTTP, so the `abfs` and `wasb` location syntaxes serve as identifiers for blobs that are accessed through HTTP URLs. This is the motivation for removing the tests that included query and fragment components, since those components would only be used in the HTTP URLs and not in the ABFS "URI" syntax.
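
To illustrate the question-mark problem described in the motivation, here is a minimal sketch. It assumes a made-up location string, and the `AbfsLocationSketch` class with its `parse` helper is hypothetical: it is not the `ADLSLocation` code changed in this PR, only a demonstration of how `java.net.URI` splits the blob name at `?` while a plain string split keeps it intact.

```java
import java.net.URI;

// Hypothetical sketch; not the ADLSLocation implementation from this PR.
class AbfsLocationSketch {

  // Simple holder for the pieces of an ABFS-style location.
  static final class Parsed {
    final String storageAccount;
    final String container; // null when there is no "container@" prefix
    final String path;

    Parsed(String storageAccount, String container, String path) {
      this.storageAccount = storageAccount;
      this.container = container;
      this.path = path;
    }
  }

  // Splits on "://", the first "/", and "@" only, so "?" and "#" stay in the path.
  static Parsed parse(String location) {
    int schemeEnd = location.indexOf("://");
    if (schemeEnd < 0) {
      throw new IllegalArgumentException("Invalid location: " + location);
    }
    String rest = location.substring(schemeEnd + 3);
    int slash = rest.indexOf('/');
    String authority = slash < 0 ? rest : rest.substring(0, slash);
    String path = slash < 0 ? "" : rest.substring(slash + 1);
    int at = authority.indexOf('@');
    String container = at < 0 ? null : authority.substring(0, at);
    String account = at < 0 ? authority : authority.substring(at + 1);
    return new Parsed(account, container, path);
  }

  public static void main(String[] args) {
    // Assumed example location with a question mark in the file name.
    String location = "abfs://container@account.dfs.core.windows.net/dir/file?name.txt";

    // java.net.URI treats everything after "?" as the query component.
    URI uri = URI.create(location);
    System.out.println(uri.getPath());  // /dir/file
    System.out.println(uri.getQuery()); // name.txt

    // The string-based split keeps the full blob name.
    Parsed parsed = parse(location);
    System.out.println(parsed.container);      // container
    System.out.println(parsed.storageAccount); // account.dfs.core.windows.net
    System.out.println(parsed.path);           // dir/file?name.txt
  }
}
```

The sketch does no validation beyond requiring `://`, so it should be read as a demonstration of the parsing difference rather than a suggested implementation.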