tdcmeehan commented on code in PR #8202:
URL: https://github.com/apache/iceberg/pull/8202#discussion_r1521568921


##########
format/puffin-spec.md:
##########
@@ -126,6 +126,23 @@ The blob metadata for this blob may include following 
properties:
 
 - `ndv`: estimate of number of distinct values, derived from the sketch.
 
+#### `hive-column-statistics-obj` blob type
+
+A serialized form of Hive ColumnStatsObject.

Review Comment:
   Is what's referenced here the Thrift `ColumnStatisticsObj` in the Hive IDL? 
https://github.com/apache/hive/blob/ffb1165f59defa66b31b4fd9cb6367b71050071b/standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift#L583
   
   If so, I'd recommend correcting the name, and linking to the Thrift IDL, and 
explicitly calling out that this is Thrift-serialized.
   
   I'm also wondering if we need to think about versioning.  If this is based 
on the Thrift IDL, I am not sure if those are intended to be persisted.  At the 
very least, I am concerned if Hive decides to introduce a 
backwards-incompatible field to this struct, some engines begin to serialize 
with this newly introduced backwards incompatible field, and other engines 
begin to attempt to deserialize it with an older IDL, then it will break in the 
Iceberg library.
   
   Please let me know if I'm misunderstanding anything.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Reply via email to