ajantha-bhat commented on code in PR #6582:
URL: https://github.com/apache/iceberg/pull/6582#discussion_r1073335902

##########
core/src/main/java/org/apache/iceberg/puffin/StandardBlobTypes.java:
##########
@@ -26,4 +26,6 @@ private StandardBlobTypes() {}
    * href="https://datasketches.apache.org/">Apache DataSketches</a> library
    */
   public static final String APACHE_DATASKETCHES_THETA_V1 = "apache-datasketches-theta-v1";
+
+  public static final String NDV_BLOB = "ndv-blob";

Review Comment:
   I think the following are all very old:
   - https://github.com/apache/iceberg/pull/1985
   - https://github.com/apache/iceberg/issues/1832
   - https://github.com/apache/iceberg/issues/1833

   We are now discussing partition stats in the proposal below:
   https://docs.google.com/document/d/1vaufuD47kMijz97LxM67X8OX-W2Wq7nmlz3jRo8J5Qk/edit?usp=sharing

   **In this proposal, we plan to store partition-level NDV stats in a Puffin file as well.**
   Each partition will have one Puffin file. The location of each Puffin file will be a cell in a row, alongside the file count and row count, in a general table-level statistics file (something like an index file) stored as sorted Avro or Parquet for every snapshot.

   So I am also interested in using the same CALL procedure to collect partition-level stats (with a different argument).
   Or maybe we need to step back and design some generic interfaces that can work for table-level, partition-level, and even file-level stats.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

For additional commands, e-mail: issues-h...@iceberg.apache.org