[GitHub] [iceberg] Fokko commented on issue #6858: Schema of the Underlying data files

via GitHub Thu, 16 Feb 2023 05:33:09 -0800


Fokko commented on issue #6858:
URL: https://github.com/apache/iceberg/issues/6858#issuecomment-1433095111


   Hey @fivetran-tusharkumar, thanks for reaching out.
   
   Iceberg applies schema changes lazily. If you add a column, it is added to 
the table schema, but the existing data files are not rewritten. When you read 
those files, the new column (which is missing from the Parquet file) is filled 
in with nulls. Once a file is rewritten, the file that replaces it contains the 
new column. The Iceberg schema is optionally stored in the [Parquet 
metadata](https://parquet.apache.org/docs/file-format/metadata/) 
(unfortunately, some writers do not write it). Otherwise, you need to 
reconstruct the Iceberg schema from the Parquet schema, which contains the 
field IDs that match the Iceberg schema.
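   
   A minimal sketch of that fallback, assuming the footer key/value metadata 
and the Parquet columns have already been read by some Parquet library. The 
helper name `schema_from_footer`, the input shapes, and the `iceberg.schema` 
key name are assumptions made for illustration, so check them against what 
your writer actually produces:

```python
import json

# Assumed footer key under which some writers store the Iceberg schema as JSON.
ICEBERG_SCHEMA_KEY = "iceberg.schema"


def schema_from_footer(key_value_metadata, parquet_fields):
    """Return a list of {"id", "name"} dicts describing the file's schema.

    key_value_metadata: footer key/value pairs as a str -> str dict.
    parquet_fields: (field_id, column_name) pairs from the Parquet schema.
    """
    raw = key_value_metadata.get(ICEBERG_SCHEMA_KEY)
    if raw is not None:
        # The writer embedded the Iceberg schema, so use it directly.
        return [{"id": f["id"], "name": f["name"]} for f in json.loads(raw)["fields"]]
    # Otherwise rebuild a minimal schema from the field IDs in the Parquet schema.
    return [{"id": field_id, "name": name} for field_id, name in parquet_fields]


# Hypothetical footer written by a writer that embeds the Iceberg schema.
embedded = {
    ICEBERG_SCHEMA_KEY: json.dumps(
        {"type": "struct", "fields": [{"id": 1, "name": "id", "required": True, "type": "long"}]}
    )
}
with_schema = schema_from_footer(embedded, [(1, "id")])

# Hypothetical footer without the key: fall back to the Parquet field IDs.
without_schema = schema_from_footer({}, [(1, "id"), (2, "name")])
```

   Either way the result is keyed by field ID, which is what ties a data file 
back to the table schema.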
   
   For example, if you rename a column in Iceberg, the old files that are 
already part of the table are not rewritten right away, but they can be 
rewritten at some later point. Columns are looked up by field ID, so the old 
files are read with their original column name and the column is then renamed 
to the new name.
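   
   The lookup by field ID can be sketched in plain Python. This is only an 
illustration of the idea, not Iceberg's reader code: `project_row` and the 
dict-based schemas are hypothetical, and rows are modelled as simple dicts.

```python
def project_row(file_row, file_fields, table_fields):
    """Project one row from an old data file onto the current table schema.

    file_row: column name -> value, as stored in the file.
    file_fields: field ID -> column name in the file.
    table_fields: field ID -> column name in the current table schema.
    """
    projected = {}
    for field_id, current_name in table_fields.items():
        old_name = file_fields.get(field_id)
        # A field ID missing from the file means the column was added later,
        # so the value is null for every row in this file.
        projected[current_name] = file_row.get(old_name) if old_name is not None else None
    return projected


# The file was written when field 2 was called "name" and field 3 did not exist.
file_fields = {1: "id", 2: "name"}
table_fields = {1: "id", 2: "full_name", 3: "email"}
row = project_row({"id": 7, "name": "Ada"}, file_fields, table_fields)
print(row)  # {'id': 7, 'full_name': 'Ada', 'email': None}
```

   The renamed column is matched through its ID, and the column added after 
the file was written comes back as null, without touching the file itself.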
   
   TLDR: You need to read the footer of each Parquet file to determine its 
schema.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


