sfc-gh-asudhakar opened a new issue, #10008: URL: https://github.com/apache/iceberg/issues/10008
**Apache Iceberg version**

v2

**Query engine**

Spark

**Please describe the bug 🐞**

**The issue**

Files ingested into an Iceberg table with the `system.add_files` utility are not registered under the table's latest partition spec. They are registered under the original partition spec instead, even when the source Parquet table's partitioning matches the latest spec.

**Repro**

1) Create an Iceberg table with Partition Spec A: `create table testIcebergTable (data string, p1 int, p2 int) USING ICEBERG partitioned by (p1);`
2) Modify the partition spec by adding another partition field to make Partition Spec B: `alter table testIcebergTable add partition field p2;`
3) Create a Parquet table whose partitioning matches the new Partition Spec B: `create table testParquetTable (data string, p1 int, p2 int) USING PARQUET partitioned by (p1, p2)`
4) Insert data into the Parquet table: `insert into testParquetTable values ("hello", 10, 20)`
5) Call the `system.add_files` utility with the Parquet table as the source and the Iceberg table as the destination: `CALL system.add_files(table => 'testIcebergTable', source_table => 'testParquetTable')`
6) Run a select query on the Iceberg table: `select * from testIcebergTable`

The select query returns `NULL` for column p2.

The manifest file created by the `add_files` call contains the value for partition column `p1` only and does not contain the value for `p2`. The `partition-spec` entry in that Avro file's key-value metadata likewise shows a partition spec with only one partition column (Partition Spec A from step 1). As a result, the value for partition column p2 is lost and cannot be retrieved.

Note: if the Iceberg table is originally created with two partition columns, the select query returns both values. The same problem would appear, however, if a third partition field were added afterwards.
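One way to confirm the manifest observation from Spark SQL, without opening the Avro file by hand, is to query Iceberg's `files` and `manifests` metadata tables after step 5 of the repro. This is only a sketch: the `default` namespace is an assumption that does not come from the report, so adjust the catalog and namespace to your setup, and the exact shape of the `partition` struct may vary between Iceberg versions.

```sql
-- Assumption: testIcebergTable lives in a namespace named `default`.
-- Shows, per data file, the partition spec ID it was registered under
-- and the partition tuple that was recorded for it.
SELECT file_path, spec_id, partition
FROM default.testIcebergTable.files;

-- Shows, per manifest, the partition spec ID the manifest was written with.
SELECT path, partition_spec_id
FROM default.testIcebergTable.manifests;
```

If the behavior described in the report is present, the file added by `add_files` would be expected to carry the spec ID of Partition Spec A rather than that of Partition Spec B, with no value recorded for `p2`.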