nastra opened a new issue, #13153:
URL: https://github.com/apache/iceberg/issues/13153

   ### Proposed Change
   
   ## Motivation
   Column statistics are currently stored in several separate map columns (lower/upper bounds, value/NaN/null counts, sizes), each mapping a field id to a value. This storage model has critical limitations as the number of columns grows and as new types are added to Iceberg:
   * Inefficient Storage due to the map-based structure:
       * Large memory overhead during planning/processing
       * Inability to project specific stats (e.g., only null_value_counts for column X)
   * Type Erasure: Original logical/physical types are lost when values are stored as binary blobs, causing:
       * Lossy type inference during reads
       * Schema evolution challenges (e.g., widening types)
   * Rigid Schema: Stats are tied to the data_file entry record, limiting extensibility for new stats.
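
To make the first two limitations concrete, here is a minimal Python sketch. It assumes Iceberg's single-value binary serialization, where an `int` bound is stored as 4 little-endian bytes and a `long` as 8 little-endian bytes; the `data_file` dictionary and the `decode_bound` helper are hypothetical illustrations, not Iceberg's API or its actual reader logic.

```python
import struct

# Hypothetical stats for one data file in the map-based layout: each stat is
# its own map keyed by field id, and bounds are opaque byte strings.
data_file = {
    "null_value_counts": {1: 0, 2: 5, 3: 12},
    "lower_bounds": {1: struct.pack("<i", 7)},  # field 1 was written as an int
    "upper_bounds": {1: struct.pack("<i", 42)},
}


def decode_bound(raw: bytes, type_name: str) -> int:
    """Decode a bound as a 4-byte LE int or an 8-byte LE long (hypothetical helper)."""
    fmt = {"int": "<i", "long": "<q"}[type_name]
    return struct.unpack(fmt, raw)[0]


# Every stat is a map over all fields, so asking for one field's null count
# still hands the reader the counts for every field.
null_count_field_2 = data_file["null_value_counts"][2]

# The bytes carry no record of their original type. After field 1 is widened
# from int to long, a naive reader that decodes with the current type fails.
raw = data_file["lower_bounds"][1]
try:
    decode_bound(raw, "long")
    widened_ok = True
except struct.error:
    widened_ok = False

print(null_count_field_2, decode_bound(raw, "int"), widened_ok)  # 5 7 False
```

A reader can only decode the bound correctly if it knows the type the value was written with, which is the information the Type Preservation goal below aims to keep.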
   
   ## Goals
   Improve the column stats representation to allow for the following:
   * Projectability: Enable independent access to specific stats (e.g., lower_bounds without loading upper_bounds).
   * Type Preservation: Store original data types to support accurate reads and schema evolution.
   * Flexible/Extensible Representation: Allow per-field stats structures (e.g., complex types like Geo/Variant).
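
The goals can be pictured with a small Python sketch. It is a hypothetical model, not the layout defined in the proposal document: `IntFieldStats`, `file_stats`, and `project` are illustrative names only. Bounds keep their logical type rather than being flattened to bytes, and a reader asks for individual stats of individual fields. In a columnar manifest this shape would let a reader skip what it does not request; the sketch only models the access pattern, not storage savings.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


# Hypothetical per-field stats record. Bounds keep their logical type instead
# of being flattened to bytes, and each stat is a separately named attribute.
@dataclass
class IntFieldStats:
    lower_bound: Optional[int] = None
    upper_bound: Optional[int] = None
    null_value_count: Optional[int] = None
    value_count: Optional[int] = None


# Hypothetical stats for one data file: one typed record per field id.
file_stats: Dict[int, IntFieldStats] = {
    1: IntFieldStats(lower_bound=7, upper_bound=42, null_value_count=0, value_count=100),
    2: IntFieldStats(lower_bound=-3, upper_bound=9, null_value_count=5, value_count=100),
}


def project(
    stats: Dict[int, IntFieldStats], field_ids: List[int], stat_names: List[str]
) -> Dict[int, Dict[str, Optional[int]]]:
    """Return only the requested stats for the requested fields."""
    return {
        fid: {name: getattr(stats[fid], name) for name in stat_names}
        for fid in field_ids
    }


print(project(file_stats, [2], ["null_value_count"]))  # {2: {'null_value_count': 5}}
```

A field of another type, such as a geometry column, could use its own record type in place of `IntFieldStats`, which is the kind of per-field flexibility the third goal describes.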
   
   ## Non-Goals
   The following issues are out of scope or impractical to address:
   * Supporting unlimited stats for tables with extreme column counts
   * Addressing Parquet column amplification in manifest files
   
   
   ### Proposal document
   
   https://s.apache.org/iceberg-column-stats
   
   ### Specifications
   
   - [x] Table
   - [ ] View
   - [x] REST
   - [ ] Puffin
   - [ ] Encryption
   - [ ] Other

