[I] Schema evolution error when querying Parquet files with different schema versions [iceberg-rust]

via GitHub Sat, 19 Jul 2025 09:34:19 -0700


amitgilad3 opened a new issue, #1529:
URL: https://github.com/apache/iceberg-rust/issues/1529


   ### Apache Iceberg Rust version
   
   0.5.1 (latest version)
   
   ### Describe the bug
   
   When querying an Iceberg table that has evolved its schema, iceberg-rust 
fails if some underlying Parquet files do not yet contain the newly added 
field(s). This violates the schema evolution contract in Iceberg, where missing 
fields should be treated as null. Instead, the reader returns a schema mismatch 
error.
   
   Error: External(DataInvalid => Parquet schema table {
     1: t_id: optional string
     2: primal_country: optional string
     3: other_countries: optional string
     4: primary_country_iso2: optional string
     5: other_countries_iso2: optional string
   }
    and Iceberg schema table {
     1: t_id: optional string
     2: primal_country: optional string
     3: other_countries: optional string
     4: primary_country_iso2: optional string
     5: other_countries_iso2: optional string
     6: event_time: optional timestamp
   }
    do not match.
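The two schemas in this error differ only by field 6 (`event_time`), which is optional. Iceberg matches columns by field ID, so the error suggests the reader is treating the schemas as requiring an exact match. Below is a minimal sketch of the looser, ID-based check the schema evolution rules imply. `Field`, `check_readable`, and `optional` are hypothetical names for illustration only; this is not iceberg-rust's actual reader code.

```rust
use std::collections::HashSet;

/// Hypothetical, simplified field: Iceberg matches columns by field ID.
#[derive(Debug, Clone)]
struct Field {
    id: i32,
    name: &'static str,
    required: bool,
}

/// Sketch of an ID-based compatibility check: a data file can be read under
/// the table schema if every table field the file lacks is optional, since
/// the reader can then fill that column with nulls instead of failing.
fn check_readable(file_fields: &[Field], table_fields: &[Field]) -> Result<(), String> {
    let file_ids: HashSet<i32> = file_fields.iter().map(|f| f.id).collect();
    for field in table_fields {
        if field.required && !file_ids.contains(&field.id) {
            return Err(format!(
                "required field {}: {} is missing from the data file",
                field.id, field.name
            ));
        }
    }
    Ok(())
}

/// Hypothetical helper: builds an optional field.
fn optional(id: i32, name: &'static str) -> Field {
    Field { id, name, required: false }
}

fn main() {
    // Field IDs 1..=5 as written in the old Parquet file (from the error above).
    let file_fields = vec![
        optional(1, "t_id"),
        optional(2, "primal_country"),
        optional(3, "other_countries"),
        optional(4, "primary_country_iso2"),
        optional(5, "other_countries_iso2"),
    ];
    // The current table schema adds optional field 6.
    let mut table_fields = file_fields.clone();
    table_fields.push(optional(6, "event_time"));

    // Under ID-based matching this pair is compatible, not a mismatch.
    assert!(check_readable(&file_fields, &table_fields).is_ok());
    println!("old file is readable under the evolved schema");
}
```

In this simplified model only a missing *required* field is an error; a missing optional field is left for the reader to fill with nulls.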
   
   
   I was looking through Slack and found 
https://github.com/apache/iceberg-rust/pull/602, but I'm not sure that PR 
addresses this issue.
   
   ### To Reproduce
   
   1. Create the table without event_time (original schema)
   spark.sql("""
   CREATE OR REPLACE TABLE my_namespace.demo (
       t_id BIGINT,
       primal_country STRING,
       other_countries STRING,
       primary_country_iso2 STRING,
       other_countries_iso2 STRING
   )
   USING iceberg
   """)
   2. Insert rows without event_time (original schema)
   spark.sql("""
   INSERT INTO demo (t_id, primal_country, other_countries, primary_country_iso2, other_countries_iso2)
   VALUES
     (1, 'USA', 'Canada,Mexico', 'US', 'CA,MX'),
     (2, 'Germany', 'France,Italy', 'DE', 'FR,IT')
   """)
   
   3. Now modify the table schema to add event_time
   spark.sql("""ALTER TABLE demo
   ADD COLUMNS (
       event_time timestamp 
   )""")
   
   4. Insert rows WITH event_time (evolved schema)
   spark.sql("""
   INSERT INTO demo (t_id, primal_country, other_countries, primary_country_iso2, other_countries_iso2, event_time)
   VALUES
     (3, 'India', 'Nepal,Bangladesh', 'IN', 'NP,BD', current_timestamp()),
     (4, 'UK', 'Ireland,Scotland', 'GB', 'IE,SCT', current_timestamp())
   """)
   
   5. Query the table using iceberg-rust. In my case I am working with 
AWS Glue, so I don't have a Rust example to showcase.
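Why do the old files lack the column at all? In Iceberg, adding a column assigns it a fresh field ID while existing IDs never change; the error above shows `event_time` as field 6. The sketch below illustrates that bookkeeping. `Schema`, `new`, and `add_column` are hypothetical names, loosely modeled on the `last-column-id` table metadata field, not iceberg-rust's API.

```rust
/// Hypothetical, simplified schema that tracks the highest field ID ever
/// assigned, loosely modeled on Iceberg's `last-column-id` table metadata.
#[derive(Debug)]
struct Schema {
    fields: Vec<(i32, String)>, // (field id, name); all optional in this sketch
    last_column_id: i32,
}

impl Schema {
    /// Builds a schema whose fields get IDs 1..=N in declaration order.
    fn new(names: &[&str]) -> Self {
        let fields: Vec<(i32, String)> = names
            .iter()
            .enumerate()
            .map(|(i, n)| (i as i32 + 1, n.to_string()))
            .collect();
        let last_column_id = fields.len() as i32;
        Schema { fields, last_column_id }
    }

    /// Adds a column under a fresh ID. Existing IDs never change, so data
    /// files written earlier keep IDs 1..=N and simply lack the new one.
    fn add_column(&mut self, name: &str) -> i32 {
        self.last_column_id += 1;
        self.fields.push((self.last_column_id, name.to_string()));
        self.last_column_id
    }
}

fn main() {
    let mut schema = Schema::new(&[
        "t_id",
        "primal_country",
        "other_countries",
        "primary_country_iso2",
        "other_countries_iso2",
    ]);
    // Step 3 of the reproduction: ALTER TABLE ... ADD COLUMNS (event_time ...)
    let id = schema.add_column("event_time");
    println!("event_time was assigned field id {id}"); // 6, matching the error output
    println!("{:?}", schema.fields.last());
}
```

Files written in step 2 therefore carry only IDs 1 to 5, and a reader must resolve ID 6 without finding it in those files.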
   
   ### Expected behavior
   
   The query should succeed, returning rows from both the old and the new 
Parquet files. Rows from files that don't contain the newly added event_time 
field should have null (or default) values for it. This is consistent with how 
schema evolution is handled in Iceberg and in other implementations such as 
Java and Python.
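A minimal sketch of the expected read behavior, assuming rows are modeled as vectors of nullable strings: each output column is looked up by field ID in the file, and IDs the file does not contain become null. `Value` and `project_row` are hypothetical names; a real reader would do this on Arrow columns rather than row by row.

```rust
use std::collections::HashMap;

/// A cell in this toy model; `None` stands for SQL NULL.
type Value = Option<String>;

/// Projects one row read from a data file onto the current table schema by
/// field ID. `file_ids` are the IDs in the file's column order, `table_ids`
/// the IDs in the table's column order. IDs the file lacks become null.
fn project_row(file_ids: &[i32], row: &[Value], table_ids: &[i32]) -> Vec<Value> {
    // Map each field ID to its column position within the file.
    let position: HashMap<i32, usize> = file_ids
        .iter()
        .enumerate()
        .map(|(i, id)| (*id, i))
        .collect();
    table_ids
        .iter()
        .map(|id| position.get(id).and_then(|&i| row[i].clone()))
        .collect()
}

fn main() {
    // A row from an old file written before event_time (field 6) existed.
    let old_row: Vec<Value> = ["1", "USA", "Canada,Mexico", "US", "CA,MX"]
        .iter()
        .map(|s| Some(s.to_string()))
        .collect();
    let projected = project_row(&[1, 2, 3, 4, 5], &old_row, &[1, 2, 3, 4, 5, 6]);
    assert_eq!(projected.len(), 6);
    assert_eq!(projected[5], None); // event_time is null for old rows
    println!("{projected:?}");
}
```

Matching by ID rather than by position or name also keeps projection correct after columns are reordered or renamed.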
   
   ### Willingness to contribute
   
   I would be willing to contribute a fix for this bug with guidance from the 
Iceberg community.

