tdcmeehan commented on code in PR #8202: URL: https://github.com/apache/iceberg/pull/8202#discussion_r1521568921
########## format/puffin-spec.md: ########## @@ -126,6 +126,23 @@ The blob metadata for this blob may include following properties: - `ndv`: estimate of number of distinct values, derived from the sketch. +#### `hive-column-statistics-obj` blob type + +A serialized form of Hive ColumnStatsObject. Review Comment: Is what's referenced here the Thrift `ColumnStatisticsObj` in the Hive IDL? https://github.com/apache/hive/blob/ffb1165f59defa66b31b4fd9cb6367b71050071b/standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift#L583 If so, I'd recommend correcting the name, and linking to the Thrift IDL, and explicitly calling out that this is Thrift-serialized. I'm also wondering if we need to think about versioning. If this is based on the Thrift IDL, I am not sure if those are intended to be persisted. At the very least, I am concerned if Hive decides to introduce a backwards-incompatible field to this struct, some engines begin to serialize with this newly introduced backwards incompatible field, and other engines begin to attempt to deserialize it with an older IDL, then it will break in the Iceberg library. Please let me know if I'm misunderstanding anything. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org