rdblue commented on code in PR #12598:
URL: https://github.com/apache/iceberg/pull/12598#discussion_r2076448013


##########
format/spec.md:
##########
@@ -1761,6 +1763,10 @@ The reference Java implementation uses a type 4 uuid and 
XORs the 4 most signifi
 
 Java writes `-1` for "no current snapshot" with V1 and V2 tables and considers 
this equivalent to omitted or `null`. This has never been formalized in the 
spec, but for compatibility, other implementations can accept `-1` as `null`. 
Java will no longer write `-1` and will use `null` for "no current snapshot" 
for all tables with a version greater than or equal to V3.
 
+### Naming for GZIP compressed Metadata JSON files
+
+Some implementations require that GZIP compressed files have the suffix 
`.gz.metadata.json` to be read correctly. The Java reference implementation can 
additionally read GZIP compressed files with the suffix `metadata.json.gz`.  

Review Comment:
   @RussellSpitzer what do you think we should do moving forward? Should we 
check magic bytes? We could use 
`Pattern.compile("\\.gz\\b").matcher(filename).find()`?
   
   It looks like implementations should write `.gz.metadata.json` for 
compatibility, but when reading we should identify the format using the regex. 
And when we read the metadata.json file, we should probably use a stream that 
auto-decompresses based on magic bytes.
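
A minimal sketch of the two checks suggested above, assuming plain `java.util.zip` and `java.util.regex`. The class and method names (`GzipMetadataSketch`, `nameLooksGzipped`, `maybeDecompress`) are hypothetical and are not part of the Iceberg codebase; this only illustrates the idea, not how the reference implementation does or should do it.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.regex.Pattern;
import java.util.zip.GZIPInputStream;

// Hypothetical helper, for illustration only.
final class GzipMetadataSketch {
  // Regex from the comment: ".gz" followed by a word boundary, so it matches
  // both "x.gz.metadata.json" and "x.metadata.json.gz" but not "x.gzip.json".
  private static final Pattern GZ_MARKER = Pattern.compile("\\.gz\\b");

  private GzipMetadataSketch() {}

  // Name-based detection: true if the file name carries a ".gz" component.
  static boolean nameLooksGzipped(String filename) {
    return GZ_MARKER.matcher(filename).find();
  }

  // Content-based detection: peek at the first two bytes and wrap the stream
  // in a GZIPInputStream only if they are the gzip magic bytes 0x1f 0x8b.
  static InputStream maybeDecompress(InputStream raw) throws IOException {
    BufferedInputStream in = new BufferedInputStream(raw);
    in.mark(2);
    int first = in.read();
    int second = in.read();
    in.reset(); // rewind so the caller (or GZIPInputStream) sees the full stream
    boolean gzip = first == 0x1f && second == 0x8b;
    return gzip ? new GZIPInputStream(in) : in;
  }
}
```

With something like this, a reader could call `maybeDecompress` on every metadata file regardless of its name, and use `nameLooksGzipped` only where the codec has to be inferred without opening the file.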



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

