anoopj commented on code in PR #15630:
URL: https://github.com/apache/iceberg/pull/15630#discussion_r3220842567


##########
format/spec.md:
##########
@@ -168,6 +185,35 @@ All columns must be written to data files even if they introduce redundancy with
 
 Writers are not allowed to commit files with a partition spec that contains a field with an unknown transform.
 
+### Paths in Metadata
+
+Path strings stored in Iceberg metadata files are classified as one of two types:
+
+* **Absolute path** -- A path string that includes a [URI scheme](https://datatracker.ietf.org/doc/html/rfc3986#section-3.1) (e.g., `s3:`, `gs:`, `hdfs:`, `file:`). Absolute paths are used as-is without modification.
+* **Relative path** -- A path string that does not include a URI scheme. Relative paths must be resolved against the table's base location before use.
+
+Prior to v4, all path fields must contain absolute paths. Starting with v4, path fields may contain either absolute or relative paths. Directory navigation symbols (`.` and `..`) and other file system conventions are not supported in relative paths.
+
+#### Path Resolution
+
+Path resolution is the process of producing an absolute path from a relative path by combining it with the table's base location:
+
+* If the path contains a URI scheme, it is absolute and is used without modification.
+* If the path does not contain a URI scheme, the resolved path is the table location followed by the relative path.

Review Comment:
   This line defines resolution as plain string concatenation, not RFC 3986 
reference resolution. The RFC specifies its reference resolution algorithm in 
[section 5.2.2](https://datatracker.ietf.org/doc/html/rfc3986#section-5.2.2).
   
   The distinction is easy to miss: an implementer could reasonably reach for 
their language's URL resolver (such as Rust's `Url::join()`) and get wrong 
results for paths that start with `/`.
   
   Specifically, in Rust, joining `/metadata/foo.parquet` onto the base 
`s3://bucket/table/` with `Url::join()` hits the 
`if (R.path starts-with "/") then[...]` branch in the RFC and incorrectly 
produces `s3://bucket/metadata/foo.parquet`, dropping the `table/` segment.
   
   Should we add a warning somewhere in the Path Resolution section?
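
To make the difference concrete, here is a minimal Java sketch that contrasts the concatenation rule quoted in the diff with RFC 3986 reference resolution, using `java.net.URI.resolve` as the generic resolver. The class and method names (`PathResolutionSketch`, `hasScheme`, `resolve`, `rfc3986Resolve`) are hypothetical and not part of Iceberg. The sketch also assumes the table location is written without a trailing slash, since the quoted text does not say how separators are handled.

```java
import java.net.URI;
import java.util.regex.Pattern;

// Illustrative only: names here are hypothetical, not Iceberg APIs.
class PathResolutionSketch {
    // RFC 3986 section 3.1: scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ), followed by ":"
    private static final Pattern SCHEME = Pattern.compile("[A-Za-z][A-Za-z0-9+.-]*:");

    // True when the path string begins with a URI scheme, i.e. it is absolute.
    public static boolean hasScheme(String path) {
        return SCHEME.matcher(path).lookingAt();
    }

    // Resolution as the quoted spec text describes it: absolute paths are
    // returned unchanged, relative paths are appended to the table location.
    // Assumption: no separator handling or normalization is applied.
    public static String resolve(String tableLocation, String path) {
        if (hasScheme(path)) {
            return path;
        }
        return tableLocation + path;
    }

    // RFC 3986 reference resolution via the JDK, shown only for contrast.
    public static String rfc3986Resolve(String tableLocation, String path) {
        return URI.create(tableLocation).resolve(path).toString();
    }

    public static void main(String[] args) {
        String base = "s3://bucket/table";
        String relative = "/metadata/foo.parquet";

        // prints s3://bucket/table/metadata/foo.parquet
        System.out.println(resolve(base, relative));

        // prints s3://bucket/metadata/foo.parquet (the "table" segment is lost)
        System.out.println(rfc3986Resolve(base, relative));

        // prints s3://other/data/bar.parquet (absolute paths pass through)
        System.out.println(resolve(base, "s3://other/data/bar.parquet"));
    }
}
```

The generic resolver treats a leading `/` as "replace the whole base path", which is exactly the branch called out above. A warning in the spec could steer implementers toward the concatenation form.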



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

