ajantha-bhat commented on code in PR #6582:
URL: https://github.com/apache/iceberg/pull/6582#discussion_r1073335902
##########
core/src/main/java/org/apache/iceberg/puffin/StandardBlobTypes.java:
##########
@@ -26,4 +26,6 @@ private StandardBlobTypes() {}
* href="https://datasketches.apache.org/">Apache DataSketches</a> library
*/
public static final String APACHE_DATASKETCHES_THETA_V1 =
"apache-datasketches-theta-v1";
+
+ public static final String NDV_BLOB = "ndv-blob";
Review Comment:
I think https://github.com/apache/iceberg/pull/1985,
https://github.com/apache/iceberg/issues/1832, and
https://github.com/apache/iceberg/issues/1833 are quite old. We are now
discussing partition stats in the proposal below:
https://docs.google.com/document/d/1vaufuD47kMijz97LxM67X8OX-W2Wq7nmlz3jRo8J5Qk/edit?usp=sharing
**In this proposal, we are planning to store partition-level NDV stats in a
Puffin file as well.** Each partition will have one Puffin file. The location
of each Puffin file will become a cell in a row, along with the file count and
row count, in a general table-level statistics file (something like an index
file), stored as sorted Avro or Parquet for every snapshot.
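To make the layout concrete, one row of that table-level statistics file could look roughly like the sketch below. The class and field names are hypothetical and are not taken from the proposal or from any existing Iceberg API; the partition key is shown as a plain string only for illustration.

```java
// Hypothetical sketch: one row of the table-level statistics (index) file.
// None of these names exist in Iceberg today.
final class PartitionStatsRow {
  private final String partition;      // partition key, e.g. "dt=2023-01-18" (assumed format)
  private final long fileCount;        // number of data files in the partition
  private final long rowCount;         // number of rows in the partition
  private final String puffinLocation; // location of the per-partition Puffin file with NDV stats

  PartitionStatsRow(String partition, long fileCount, long rowCount, String puffinLocation) {
    this.partition = partition;
    this.fileCount = fileCount;
    this.rowCount = rowCount;
    this.puffinLocation = puffinLocation;
  }

  String partition() {
    return partition;
  }

  long fileCount() {
    return fileCount;
  }

  long rowCount() {
    return rowCount;
  }

  String puffinLocation() {
    return puffinLocation;
  }
}
```

The NDV sketches themselves stay in the Puffin file; the row only carries a pointer to it next to the cheap counters.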
So I am also interested in using the same CALL procedure to collect
partition-level stats (with a different argument).
Or maybe we need to step back and design some generic interfaces that can
work for table-level, partition-level, and even file-level stats.
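A very rough sketch of what such a generic interface might look like is below. All names (`StatsScope`, `StatsTarget`, `NdvCollector`) are hypothetical and only meant to start the discussion; nothing like this exists in Iceberg, and the return type is an assumption.

```java
// Hypothetical sketch of a scope-agnostic stats interface; not an existing Iceberg API.
enum StatsScope {
  TABLE,
  PARTITION,
  FILE
}

// Identifies what the stats are collected for, independent of the level.
final class StatsTarget {
  private final StatsScope scope;
  private final String id;

  private StatsTarget(StatsScope scope, String id) {
    this.scope = scope;
    this.id = id;
  }

  static StatsTarget table(String tableName) {
    return new StatsTarget(StatsScope.TABLE, tableName);
  }

  static StatsTarget partition(String partition) {
    return new StatsTarget(StatsScope.PARTITION, partition);
  }

  static StatsTarget file(String fileLocation) {
    return new StatsTarget(StatsScope.FILE, fileLocation);
  }

  StatsScope scope() {
    return scope;
  }

  String id() {
    return id;
  }
}

// One collector contract for every level; the procedure argument would pick the target.
interface NdvCollector {
  // Assumed to return the location of the Puffin file written for the target.
  String collect(StatsTarget target);
}
```

With something like this, the CALL procedure would only need an argument that selects the scope, and the same code path could serve all three levels.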
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]