emkornfield commented on code in PR #12598:
URL: https://github.com/apache/iceberg/pull/12598#discussion_r2070529135


##########
format/spec.md:
##########
@@ -1473,7 +1473,10 @@ The following table describes the possible values for 
the some of the field with
 
 ### Table Metadata and Snapshots
 
-Table metadata is serialized as a JSON object according to the following 
table. Snapshots are not serialized separately. Instead, they are stored in the 
table metadata JSON.
+Table metadata is serialized as a JSON object according to the following 
table. Snapshots are not serialized separately. Instead, they are stored in the 
table metadata JSON. 
+
+A metadata JSON file name must end in `.metadata.json`. A metadata JSON file 
may be compressed with [GZIP](https://datatracker.ietf.org/doc/html/rfc1952). A 
GZIP compressed file name must end with `.gz.metadata.json`.

Review Comment:
   As Russell [points 
out](https://github.com/apache/iceberg/pull/12598#discussion_r2070480860), the 
requirement for file naming is already codified in the specification under both 
protocols. So the first sentence here is simply a summary that gives context 
for the naming of GZIP files.
   
   I don't want to generalize the naming scheme here because, as I've pointed 
out, my goal is simply to codify existing behavior (gzip) that should have been 
in the specification from the beginning. I think exact naming belongs in 
the specification for two reasons:
   1. We are already doing so for `.metadata.json` files, and it is required for 
filesystem-based naming at the very least. This can potentially be revisited 
for V4 once filesystem commits are fully deprecated, but guidance is needed.
   2. Even though there are multiple ways to detect GZIP files, the de facto way, 
IIUC, in the two most popular implementations of Iceberg (Java and Python) appears 
to be the file name, so they would break if the naming convention is broken. Python 
only supports this scheme (it doesn't do the Java backwards compatibility).
   
   I think V4 is a good place to consider relaxing all of these constraints.
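
To illustrate point 2, here is a minimal sketch of file-name-based detection, assuming only the two suffixes from the proposed spec text. The names `metadata_codec` and `parse_metadata` are hypothetical and are not taken from the Java or Python implementations. The sketch does not model any Java backwards-compatibility fallback.

```python
import gzip
import json

# Suffixes taken from the proposed spec text in this PR.
GZIP_SUFFIX = ".gz.metadata.json"
PLAIN_SUFFIX = ".metadata.json"


def metadata_codec(file_name: str) -> str:
    """Return "gzip" or "none" based only on the file name (hypothetical helper)."""
    # Check the GZIP suffix first: it also ends with ".metadata.json".
    if file_name.endswith(GZIP_SUFFIX):
        return "gzip"
    if file_name.endswith(PLAIN_SUFFIX):
        return "none"
    raise ValueError(f"not a metadata JSON file name: {file_name}")


def parse_metadata(file_name: str, raw: bytes) -> dict:
    """Decode metadata bytes, decompressing only when the name says GZIP."""
    if metadata_codec(file_name) == "gzip":
        raw = gzip.decompress(raw)
    return json.loads(raw.decode("utf-8"))
```

A reader written this way never inspects the file contents, so a GZIP-compressed file stored under a plain `.metadata.json` name would fail at JSON decoding. That is the kind of breakage a fixed naming convention avoids.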



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

