okayhooni opened a new issue, #8995:
URL: https://github.com/apache/iceberg/issues/8995
### Query engine

Spark/Trino

### Question

As I asked in https://github.com/tabular-io/iceberg-kafka-connect/issues/149#issuecomment-1797636602, there is no detailed explanation of the difference between the `counts`, `truncate`, and `full` metrics modes for each column (in particular, what `full` means).

@bryanck gave me the answer below.

`counts` will only store count stats for a column, such as null counts. `full` will also store the lower/upper bounds for the column, each bound being the full column value. `truncate` likewise will store the lower/upper bounds, but only the first n bytes of the value for each bound.

In terms of advice, if a column in a data file contains a wide range of values, or if the column is not used in filters, then having the bounds probably isn't going to be useful, as they won't help prune the file scan list. For `truncate`, try to pick a number of bytes that is as small as possible but still distinct enough to be useful for filtering.
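To make the answer concrete, here is a minimal, self-contained sketch of what each mode keeps for one string column in one data file, and how a planner can use those stats to skip files for an equality filter. In Iceberg the modes are set through table properties such as `write.metadata.metrics.default` and `write.metadata.metrics.column.<column>`. Everything below is a toy model, not Iceberg's API. `ColumnMetrics`, `compute_metrics`, `truncate_upper` and `may_contain` are hypothetical names, and the sample files and the `event_type` column are made up.

The sketch makes two further simplifications. It truncates Python strings by character rather than by byte. For the upper bound it bumps the last kept character, so the truncated bound still sorts after every value in the file.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class ColumnMetrics:
    """Toy per-file stats for one column (hypothetical, loosely modelled on Iceberg)."""
    value_count: int
    null_count: int
    lower_bound: Optional[str] = None
    upper_bound: Optional[str] = None


def truncate_upper(value: str, n: int) -> Optional[str]:
    """Truncate to n characters, then increment the last kept character so the
    result is still >= the original value. Returns None if no bound is possible."""
    if len(value) <= n:
        return value
    prefix = list(value[:n])
    while prefix:
        nxt = ord(prefix[-1]) + 1
        if 0xD800 <= nxt <= 0xDFFF:  # skip the surrogate range
            nxt = 0xE000
        if nxt <= 0x10FFFF:
            prefix[-1] = chr(nxt)
            return "".join(prefix)
        prefix.pop()  # last char is already the maximum; try the previous one
    return None


def compute_metrics(values: Sequence[Optional[str]], mode: str) -> ColumnMetrics:
    """Collect stats for one file under 'counts', 'truncate(n)' or 'full'."""
    non_null = [v for v in values if v is not None]
    metrics = ColumnMetrics(value_count=len(values), null_count=len(values) - len(non_null))
    if mode == "counts" or not non_null:
        return metrics  # counts only: no lower/upper bounds are kept
    lo, hi = min(non_null), max(non_null)
    if mode == "full":
        metrics.lower_bound, metrics.upper_bound = lo, hi
    elif mode.startswith("truncate(") and mode.endswith(")"):
        n = int(mode[len("truncate("):-1])
        metrics.lower_bound = lo[:n]  # a prefix never sorts after the value
        metrics.upper_bound = truncate_upper(hi, n)
    else:
        raise ValueError(f"unsupported mode: {mode}")
    return metrics


def may_contain(metrics: ColumnMetrics, literal: str) -> bool:
    """False only when the stats prove no row can equal `literal`."""
    if metrics.value_count == metrics.null_count:
        return False  # only nulls: counts alone are enough to skip the file
    if metrics.lower_bound is not None and literal < metrics.lower_bound:
        return False
    if metrics.upper_bound is not None and literal > metrics.upper_bound:
        return False
    return True  # missing or inconclusive bounds: the file must be scanned


if __name__ == "__main__":
    # Two hypothetical data files holding a hypothetical `event_type` column.
    files = {
        "file_a": ["click_button", "click_link", None],
        "file_b": ["view_page", "view_product"],
    }
    for mode in ("counts", "truncate(4)", "full"):
        stats = {name: compute_metrics(vals, mode) for name, vals in files.items()}
        to_scan = [name for name, m in stats.items() if may_contain(m, "view_cart")]
        print(f"{mode:12} -> files to scan for event_type = 'view_cart': {to_scan}")
```

The results for the filter `event_type = 'view_cart'` are:

- With `counts`, both files are scanned.
- With `truncate(4)`, only `file_a` is pruned. Every value in `file_b` shares the `view` prefix, so 4 characters are not distinct enough to rule that file out.
- With `full`, both files are pruned.

This is the trade-off described in the advice: a longer prefix prunes more but makes each manifest entry bigger. If a column's values span a wide range in every file, even `full` bounds would rarely exclude anything.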