Re: [I] Should be null_value_counts updated after adding a new column to the schema? [iceberg]

via GitHub Thu, 25 Jul 2024 11:21:04 -0700


amogh-jahagirdar commented on issue #10773:
URL: https://github.com/apache/iceberg/issues/10773#issuecomment-2251136271


   Thanks @antonkw. I'd need to look at the default value case separately, but 
for your original question, you bring up an interesting optimization that I 
think we don't do yet. Even if the stats are missing in older manifests written 
with the old schema, it seems possible in principle to treat new columns that 
are not in that schema as entirely null, and as a result skip those files 
based on null counts.
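
   A rough sketch of that idea, to make it concrete. This is not Iceberg's 
actual API: `DataFileStats`, `written_field_ids`, `effective_null_count`, and 
`can_skip_for_not_null` are hypothetical names used only for illustration, and 
the sketch assumes the new column has no default value.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class DataFileStats:
    """Hypothetical stand-in for the per-file metrics kept in a manifest entry."""
    record_count: int
    # Field IDs of the schema the data file was written with (assumed to be known).
    written_field_ids: Set[int]
    # Field ID -> null count; entries may be missing when no stats were collected.
    null_value_counts: Dict[int, int] = field(default_factory=dict)


def effective_null_count(stats: DataFileStats, field_id: int) -> Optional[int]:
    """Return the null count for a column, inferring it for columns added later."""
    if field_id in stats.null_value_counts:
        return stats.null_value_counts[field_id]
    if field_id not in stats.written_field_ids:
        # The column did not exist when the file was written, so every row
        # reads back as null (assumption: no default value on the column).
        return stats.record_count
    # The column existed but has no stats: the null count is unknown.
    return None


def can_skip_for_not_null(stats: DataFileStats, field_id: int) -> bool:
    """True when `col IS NOT NULL` cannot match any row in the file."""
    nulls = effective_null_count(stats, field_id)
    return nulls is not None and nulls == stats.record_count


# A file written before column 3 was added to the schema.
old_file = DataFileStats(
    record_count=100,
    written_field_ids={1, 2},
    null_value_counts={1: 0, 2: 5},
)
skip_new_column = can_skip_for_not_null(old_file, 3)
skip_old_column = can_skip_for_not_null(old_file, 2)
```

   Here the file predating column 3 would be skipped for `col3 IS NOT NULL`, 
while column 2, which has real stats showing non-null rows, would still be 
read. Returning "unknown" when a column existed but has no stats keeps the 
check conservative.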
   
   I think this optimization wasn't really considered in the past because, in 
practice, when the table is compacted the rewritten data is written with null 
values for the new column, at which point the new manifest(s) for the 
compaction will have those stats and null/not-null skipping can happen. So 
there's a window of time between the schema evolution and the compaction where 
queries may not skip files effectively.
   
   But at least from my perspective it'd be great to improve performance of 
queries in that window if it's possible! 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


