RussellSpitzer commented on code in PR #15630: URL: https://github.com/apache/iceberg/pull/15630#discussion_r3276514487
##########
format/spec.md:
##########
@@ -168,6 +185,46 @@ All columns must be written to data files even if they introduce redundancy with
 
 Writers are not allowed to commit files with a partition spec that contains a field with an unknown transform.
 
+### Paths in Metadata
+
+Path strings stored in Iceberg metadata location fields are classified as one of two types:
+
+* **Absolute path** -- A path string that includes a [URI scheme](https://datatracker.ietf.org/doc/html/rfc3986#section-3.1) (e.g., `s3:`, `gs:`, `hdfs:`, `file:`). Absolute paths are used as-is without modification.
+* **Relative path** -- A path string that does not include a URI scheme. Relative paths must be resolved against the table's base location before use.
+
+Prior to v4, all path fields must contain fully-qualified paths. Starting with v4, path fields may contain either absolute or relative paths. [Relative resolution within a URI](https://datatracker.ietf.org/doc/html/rfc3986#section-5.2) (e.g. `.` and `..`) and other file system navigation conventions are not supported in relative paths.

Review Comment:

We never prohibited this previously, but whether these characters had any actual meaning was basically FileIO dependent.

We do follow one set of rules here: we never produce a `//` in our own path building within the Java API. We have a bunch of "strip trailing /" code to prevent someone using S3FileIO and someone using HadoopFileIO from getting different results (`bar//baz` resolving to `bar/baz` in S3A but staying `bar//baz` in S3FileIO).

I agree with @rdblue here that it's probably a good note to have in the spec that paths are treated as pure strings. No POSIX resolution should be assumed. I'm also fine with adding a note about absolute paths as well, since I don't think we ever had this well defined, but anyone using POSIX-style things in their paths is probably making a mistake...
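To make the "pure strings" point concrete, here is a rough sketch of what I have in mind, in Python for brevity. This is not Iceberg's API: `has_uri_scheme` and `resolve_location` are hypothetical names, and joining with a single `/` after stripping trailing slashes from the base location is my assumption, since the proposed text does not spell out the join rule.

```python
import re

# RFC 3986 section 3.1: scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ),
# terminated by ":".
_SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*:")


def has_uri_scheme(path: str) -> bool:
    """Return True if the path string starts with a URI scheme (absolute path)."""
    return _SCHEME.match(path) is not None


def resolve_location(table_location: str, path: str) -> str:
    """Resolve a metadata path string against the table's base location.

    Hypothetical helper: absolute paths are returned unchanged; relative
    paths are appended to the base as plain strings. No ".", ".." or "//"
    normalization is performed on the path itself.
    """
    if has_uri_scheme(path):
        return path
    # Assumption: strip trailing "/" from the base so the join itself
    # never introduces a "//".
    return table_location.rstrip("/") + "/" + path


if __name__ == "__main__":
    base = "s3://bucket/warehouse/tbl/"
    print(resolve_location(base, "data/f.parquet"))
    print(resolve_location(base, "gs://other/f.parquet"))
    # Pure string handling: the odd segments are kept, not interpreted.
    print(resolve_location(base, "a/../b//c"))
```

Note that the last case keeps `..` and `//` verbatim. Whether a given FileIO then treats that string as the same object as `b/c` is exactly the ambiguity the spec note would rule out.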
