amogh-jahagirdar commented on issue #10773: URL: https://github.com/apache/iceberg/issues/10773#issuecomment-2251136271
Thanks @antonkw. For the default value case I'd need to look separately, but for your original question, you bring up an interesting optimization that I think we don't do yet. Even if the stats are missing in previous manifests written with the old schema, it seems theoretically possible to treat new columns that are not in that schema as all null, and as a result skip those files based on null counts.

I think this optimization wasn't really considered in the past because, in practice, when the table is compacted the data is rewritten with null values for the new columns, at which point the new manifest(s) for the compaction will have those stats and null/not-null skipping can happen. So there's a window of time between the schema evolution and the compaction where queries may not skip files effectively. But at least from my perspective it'd be great to improve the performance of queries in that window if it's possible!
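
To make the idea concrete, here is a rough sketch in Python of how the skipping decision could work. This is not Iceberg code: `FileStats`, `effective_null_count`, and `can_skip_for_not_null` are hypothetical names made up for illustration, and the sketch assumes the new column has no default value (the default value case would need separate handling).

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class FileStats:
    """Hypothetical per-file metadata, loosely modeled on a manifest entry."""
    path: str
    record_count: int
    # Field IDs present in the schema the file was written with (assumption:
    # the reader can recover this from the schema the manifest was written with).
    written_field_ids: Set[int]
    # Null counts per field ID; may be missing for some or all columns.
    null_counts: Dict[int, int] = field(default_factory=dict)


def effective_null_count(f: FileStats, field_id: int) -> Optional[int]:
    """Return the null count for a column, inferring it when the column is newer than the file."""
    if field_id in f.null_counts:
        return f.null_counts[field_id]
    if field_id not in f.written_field_ids:
        # The column was added after this file was written, so every row
        # reads as null (assuming no default value is defined).
        return f.record_count
    # The column existed at write time but stats were not collected: unknown.
    return None


def can_skip_for_not_null(f: FileStats, field_id: int) -> bool:
    """True if a `col IS NOT NULL` predicate cannot match any row in the file."""
    nulls = effective_null_count(f, field_id)
    return nulls is not None and nulls == f.record_count


# Column with field ID 3 was added after old.parquet was written.
old_file = FileStats("old.parquet", record_count=100, written_field_ids={1, 2})
new_file = FileStats("new.parquet", record_count=10,
                     written_field_ids={1, 2, 3}, null_counts={3: 0})
no_stats = FileStats("nostats.parquet", record_count=5, written_field_ids={1, 2, 3})

skip_old = can_skip_for_not_null(old_file, 3)
skip_new = can_skip_for_not_null(new_file, 3)
skip_no_stats = can_skip_for_not_null(no_stats, 3)
```

The important distinction is between a column that is absent from the file's write schema (null count can be inferred) and a column that was present but has no stats (null count is unknown, so the file must be kept).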