ajantha-bhat commented on code in PR #6582:
URL: https://github.com/apache/iceberg/pull/6582#discussion_r1073335902


##########
core/src/main/java/org/apache/iceberg/puffin/StandardBlobTypes.java:
##########
@@ -26,4 +26,6 @@ private StandardBlobTypes() {}
    * href="https://datasketches.apache.org/">Apache DataSketches</a> library
    */
   public static final String APACHE_DATASKETCHES_THETA_V1 = "apache-datasketches-theta-v1";
+
+  public static final String NDV_BLOB = "ndv-blob";

Review Comment:
   I think https://github.com/apache/iceberg/pull/1985, 
https://github.com/apache/iceberg/issues/1832, and 
https://github.com/apache/iceberg/issues/1833 are quite old. These days we are 
discussing partition stats in the proposal below.
   
   
https://docs.google.com/document/d/1vaufuD47kMijz97LxM67X8OX-W2Wq7nmlz3jRo8J5Qk/edit?usp=sharing
   
   **In this proposal, we are planning to store partition-level NDV stats in a 
Puffin file as well.** So, each partition will have one Puffin file. The 
location of each Puffin file will become a cell in a row, along with the file 
count and row count, in a general table-level statistics file (something like 
an index file) stored as sorted Avro or Parquet for every snapshot.
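   
   To make the layout concrete, here is a rough sketch of what one row of that 
index file could hold. All names (`PartitionStatsIndexSketch`, `Row`, 
`sortedByPartition`) are hypothetical and not part of Iceberg or the proposal; 
the partition key is shown as a plain string only to keep the sketch short.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch only: none of these names exist in Iceberg.
public class PartitionStatsIndexSketch {

  // One row of the table-level statistics (index) file for a snapshot.
  static final class Row {
    final String partition;      // assumed string form of the partition key
    final long fileCount;        // number of data files in the partition
    final long rowCount;         // number of rows in the partition
    final String puffinLocation; // location of the partition's Puffin file (NDV stats)

    Row(String partition, long fileCount, long rowCount, String puffinLocation) {
      this.partition = partition;
      this.fileCount = fileCount;
      this.rowCount = rowCount;
      this.puffinLocation = puffinLocation;
    }
  }

  // Returns a sorted copy of the rows, mirroring the "sorted Avro or Parquet" layout.
  static List<Row> sortedByPartition(List<Row> rows) {
    List<Row> sorted = new ArrayList<>(rows);
    sorted.sort(Comparator.comparing((Row r) -> r.partition));
    return sorted;
  }
}
```

   The sort key and the column types are assumptions; the proposal document is 
the place to settle them.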
   
   So, I am also interested in using the same CALL procedure to collect 
partition-level stats (with a different argument). 
   Or maybe we need to step back and design some generic interfaces that can 
work for table-level, partition-level, and even file-level stats. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

