aihuaxu commented on PR #12658: URL: https://github.com/apache/iceberg/pull/12658#issuecomment-2808132998
> > @aihuaxu and @rdblue is there a reason we need to explicitly restrict the lower/upper bounds to shredded fields? I would think that stats pruning would be useful for any field that a writer would want to include in the bounds (regardless of whether it was shredded or not).
>
> What we were thinking is that the bounds are collected from shredded column stats during the shredding process. But it does seem reasonable to me that bounds and shredding can be separated: if a writer knows the bounds and chooses not to shred, the bounds can still be used in pruning.

@danielcweeks We have reconsidered this approach. It would cause a stats mismatch between the Iceberg manifest files and the Parquet footer, i.e., the Iceberg manifest files may have stats that the Parquet footer does not. Do you see common use cases where fields are not shredded but the writer still knows the stats? I would prefer to keep the stats in sync between the manifest files and the Parquet footer, the same as for the other columns, and keep it simpler.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org