emkornfield commented on code in PR #12598: URL: https://github.com/apache/iceberg/pull/12598#discussion_r2070529135
##########
format/spec.md:
##########
@@ -1473,7 +1473,10 @@ The following table describes the possible values for the some of the field with
 ### Table Metadata and Snapshots
-Table metadata is serialized as a JSON object according to the following table. Snapshots are not serialized separately. Instead, they are stored in the table metadata JSON.
+Table metadata is serialized as a JSON object according to the following table. Snapshots are not serialized separately. Instead, they are stored in the table metadata JSON.
+
+A metadata JSON file name must end in `.metadata.json`. A metadata JSON file may be compressed with [GZIP](https://datatracker.ietf.org/doc/html/rfc1952). A GZIP compressed file name must end with `.gz.metadata.json`.

Review Comment:
   As Russell [points out](https://github.com/apache/iceberg/pull/12598#discussion_r2070480860), the file naming requirement is already codified in the specification under both protocols. So the first sentence here is simply a summary that gives context for the naming of GZIP files. I don't want to generalize the naming scheme here because, as I've pointed out, my goal is simply to codify existing behavior (gzip) that should have been in the specification from the beginning.

   I think exact naming belongs in the specification for two reasons:

   1. We are already doing so for `.metadata.json` files, and it is required for filesystem-based naming at the very least. This can potentially be revisited for V4 once filesystem commits are fully deprecated, but guidance is needed.
   2. Even though there are multiple ways to detect GZIP files, the de facto approach, IIUC, in the two most popular implementations of Iceberg (Java and Python) is to use the file name, so they would break if the naming convention is not followed. Python only supports this scheme (it doesn't implement the Java backwards compatibility).

   I think V4 is a good place to consider relaxing all of these constraints.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
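
For readers following the thread, here is a minimal sketch of the filename-based GZIP detection that the review comment describes. It is not taken from the Java or Python Iceberg code bases: the function names `is_gzip_metadata_name`, `looks_gzipped`, and `decode_metadata` are hypothetical, and only the two suffixes come from the proposed spec text. The magic-byte check is shown only to illustrate one of the "multiple ways to detect GZIP files" mentioned above.

```python
import gzip
import json

# Suffixes taken from the proposed spec wording in the diff above.
METADATA_SUFFIX = ".metadata.json"
GZIP_METADATA_SUFFIX = ".gz.metadata.json"

# First two bytes (ID1, ID2) of every GZIP member, per RFC 1952.
GZIP_MAGIC = b"\x1f\x8b"


def is_gzip_metadata_name(file_name: str) -> bool:
    """Hypothetical helper: decide compression from the file name alone."""
    return file_name.endswith(GZIP_METADATA_SUFFIX)


def looks_gzipped(raw: bytes) -> bool:
    """Alternative detection that inspects the content instead of the name."""
    return raw[:2] == GZIP_MAGIC


def decode_metadata(file_name: str, raw: bytes) -> dict:
    """Parse table metadata bytes, decompressing only if the name says so."""
    if not file_name.endswith(METADATA_SUFFIX):
        raise ValueError(f"not a metadata JSON file name: {file_name}")
    if is_gzip_metadata_name(file_name):
        raw = gzip.decompress(raw)
    return json.loads(raw)
```

A reader built this way fails on a GZIP file that does not carry the `.gz.metadata.json` suffix, because it never inspects the bytes. That is the breakage the second reason in the comment is concerned with.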