nastra opened a new issue, #13153: URL: https://github.com/apache/iceberg/issues/13153
### Proposed Change

## Motivation

Column statistics are currently stored as a mapping of field id to values across multiple columns (lower/upper bounds, value/nan/null counts, sizes). This storage model has critical limitations as the number of columns increases and as new types are being added to Iceberg:

* Inefficient Storage due to map-based structure:
  * Large memory overhead during planning/processing
  * Inability to project specific stats (e.g., only null_value_counts for column X)
* Type Erasure: Original logical/physical types are lost when stored as binary blobs, causing:
  * Lossy type inference during reads
  * Schema evolution challenges (e.g., widening types)
* Rigid Schema: Stats are tied to the data_file entry record, limiting extensibility for new stats.

## Goals

Improve the column stats representation to allow for the following:

* Projectability: Enable independent access to specific stats (e.g., lower_bounds without loading upper_bounds).
* Type Preservation: Store original data types to support accurate reads and schema evolution.
* Flexible/Extensible Representation: Allow per-field stats structures (e.g., complex types like Geo/Variant).

## Non-Goals

The following issues are out of scope or impractical to address:

* Supporting unlimited stats for tables with extreme column counts
* Addressing Parquet column amplification in manifest files

### Proposal document

https://s.apache.org/iceberg-column-stats

### Specifications

- [x] Table
- [ ] View
- [x] REST
- [ ] Puffin
- [ ] Encryption
- [ ] Other

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org