[I] I can't find any detailed explanation about column metric options on the official docs for Iceberg configuration [iceberg]

via GitHub Tue, 07 Nov 2023 02:07:58 -0800


okayhooni opened a new issue, #8995:
URL: https://github.com/apache/iceberg/issues/8995


   ### Query engine
   
   Spark/Trino
   
   ### Question
   
   As I asked in 
https://github.com/tabular-io/iceberg-kafka-connect/issues/149#issuecomment-1797636602
 , the docs give no detailed explanation of the difference between `counts`, 
`truncate`, and `full` for each column (for example, what `full` means).
   
   @bryanck gave me the answer below.
   
   `counts` stores only count stats for a column, such as null counts. `full` 
also stores the lower/upper bounds for the column, each bound being the 
full column value. `truncate` likewise stores the lower/upper bounds, but only 
the first n bytes of the value for each bound.
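
To make the three modes concrete, here is a minimal Python sketch of what a writer might keep for one string column under each mode. It is not Iceberg's implementation: `column_metrics` and `truncate_upper` are hypothetical names, the sketch truncates by characters rather than bytes for simplicity, and bumping the last kept character of the upper bound is an assumption made here so that the truncated value still sits at or above every real value.

```python
from typing import Optional, Sequence

MAX_CODE_POINT = 0x10FFFF


def truncate_upper(value: str, length: int) -> Optional[str]:
    """Shorten an upper bound while keeping it >= the original value."""
    if len(value) <= length:
        return value
    prefix = value[:length]
    # Bump the last character that can still be incremented and drop the rest.
    for i in range(len(prefix) - 1, -1, -1):
        if ord(prefix[i]) < MAX_CODE_POINT:
            return prefix[:i] + chr(ord(prefix[i]) + 1)
    return None  # no valid shortened upper bound exists


def column_metrics(values: Sequence[Optional[str]], mode: str, length: int = 16) -> dict:
    """Return illustrative stats for one column; mode is counts, truncate, or full."""
    stats = {
        "value_count": len(values),
        "null_count": sum(v is None for v in values),
    }
    if mode == "counts":
        return stats  # counts only, no bounds
    if mode not in ("truncate", "full"):
        raise ValueError(f"unknown mode: {mode}")
    present = [v for v in values if v is not None]
    if not present:
        return stats
    lower, upper = min(present), max(present)
    if mode == "truncate":
        # A plain prefix is always <= the original, so it stays a valid lower bound.
        lower, upper = lower[:length], truncate_upper(upper, length)
    stats["lower_bound"] = lower
    stats["upper_bound"] = upper
    return stats


if __name__ == "__main__":
    sample = ["apple", None, "banana", "cherry"]
    for mode in ("counts", "truncate", "full"):
        print(mode, column_metrics(sample, mode, length=3))
```

In Iceberg itself the mode is chosen through table properties: `write.metadata.metrics.default` for all columns and `write.metadata.metrics.column.<column-name>` for a single column.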
   
   In terms of advice, if a column in a data file will contain a wide 
range of values, or if the column is not used in filters, then having the 
bounds probably isn't going to be useful, as it won't help prune the file 
scan list. For `truncate`, try to pick a number of bytes that is as small as 
possible but still distinct enough to be useful for filtering.
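
The pruning argument above can be sketched in a few lines of Python. This is only an illustration under assumptions: the file names, the date-string column, and the helpers `files_to_scan` and `truncate_bounds` are all hypothetical, and truncation is again done by characters rather than bytes.

```python
from typing import Dict, List, Tuple

Bounds = Tuple[str, str]  # (lower_bound, upper_bound) for one column in one file


def files_to_scan(file_bounds: Dict[str, Bounds], target: str) -> List[str]:
    """Keep only files whose [lower, upper] range could contain target."""
    return [
        name
        for name, (lower, upper) in file_bounds.items()
        if lower <= target <= upper
    ]


def truncate_bounds(bounds: Bounds, length: int) -> Bounds:
    """Shorten both bounds to length characters (length >= 1) and keep them valid."""
    lower, upper = bounds
    lower = lower[:length]
    if len(upper) > length:
        # Bump the last kept character so the shortened upper bound is still >= upper.
        upper = upper[: length - 1] + chr(ord(upper[length - 1]) + 1)
    return lower, upper


if __name__ == "__main__":
    # Hypothetical per-file bounds for a date-like string column.
    files = {
        "a.parquet": ("2023-01-01", "2023-01-31"),
        "b.parquet": ("2023-02-01", "2023-02-28"),
        "wide.parquet": ("2023-01-01", "2023-12-31"),
    }
    target = "2023-02-14"
    print("full bounds:", files_to_scan(files, target))
    for n in (4, 7):
        shortened = {name: truncate_bounds(b, n) for name, b in files.items()}
        print(f"truncate({n}):", files_to_scan(shortened, target))
```

With these made-up bounds, the file covering the whole year can never be skipped, which is the "wide range of values" case. Truncating to 4 characters collapses every bound to the year, so no file is pruned, while 7 characters keeps the month and prunes as well as the full bounds do here.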


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org
