istreeter opened a new issue, #12281:
URL: https://github.com/apache/iceberg/issues/12281

   ### Feature Request / Improvement
   
   Currently, metadata files are pretty-printed, with many newlines and a lot 
of indentation whitespace.  [This is the relevant line of 
code](https://github.com/apache/iceberg/blob/abb47830e7df7dc2ae93c74b0ad97f06cdd37aad/core/src/main/java/org/apache/iceberg/TableMetadataParser.java#L131),
 which uses the Jackson default pretty printer.
   
   If we wrote metadata files without redundant whitespace, it would save 
storage space and reduce network overhead.
   
   This will have the most impact for tables with large metadata files.  For 
example, I have seen a metadata file that was 53.6MB.  After removing 
whitespace, it was reduced to 41.4MB, a saving of about 23%.  I have read 
other issues on GitHub that mention gigabyte-scale metadata files, e.g. #9734.
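
To give a rough feel for the effect, here is a minimal sketch using Python's standard `json` module rather than Iceberg's Java/Jackson code path. The `metadata` document is a made-up stand-in, not a real Iceberg metadata file, so the exact ratio it prints is only illustrative and will differ from real tables.

```python
import json

# Hypothetical stand-in for a table metadata document (NOT a real Iceberg file).
metadata = {
    "format-version": 2,
    "snapshots": [
        {
            "snapshot-id": i,
            "timestamp-ms": 1700000000000 + i,
            "summary": {"operation": "append"},
        }
        for i in range(1000)
    ],
}

# Pretty-printed: one field per line, indented.
pretty = json.dumps(metadata, indent=2)
# Compact: no whitespace after separators, no newlines.
compact = json.dumps(metadata, separators=(",", ":"))

pretty_bytes = len(pretty.encode("utf-8"))
compact_bytes = len(compact.encode("utf-8"))
saving = 1 - compact_bytes / pretty_bytes

print(f"pretty:  {pretty_bytes} bytes")
print(f"compact: {compact_bytes} bytes")
print(f"saving:  {saving:.1%}")
```

Both strings parse back to the same document, so only the byte count changes.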
   
   I cannot think of any downside of this suggested change.  Metadata files are 
mainly read by machines not humans.  And if a human does want to inspect a 
metadata file, then it is fairly easy to prettify a JSON file when needed.
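
For instance, a compact file can be re-indented on demand. The sketch below uses Python's standard `json` module; the `prettify` helper and the sample string are hypothetical and exist only for illustration.

```python
import json


def prettify(compact_text: str) -> str:
    """Re-indent a compact JSON string for human inspection."""
    return json.dumps(json.loads(compact_text), indent=2)


# Hypothetical compact metadata snippet (not taken from a real table).
compact = '{"format-version":2,"table-uuid":"hypothetical-uuid","properties":{}}'
print(prettify(compact))
```

From a shell, `python -m json.tool <file>` does the same thing without writing any code.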
   
   I'd be happy to open a PR for this if you think it's a good idea.  It seems 
like an easy way to get a small but noticeable performance improvement for 
reads and writes.
   
   ### Query engine
   
   None
   
   ### Willingness to contribute
   
   - [ ] I can contribute this improvement/feature independently
   - [x] I would be willing to contribute this improvement/feature with 
guidance from the Iceberg community
   - [ ] I cannot contribute this improvement/feature at this time


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


