sfc-gh-yijli opened a new issue, #12273: URL: https://github.com/apache/iceberg/issues/12273
### Apache Iceberg version

None

### Query engine

Spark

### Please describe the bug 🐞

The behavior of the `add_files` procedure in Spark is affected by the table property `compatibility.snapshot-id-inheritance.enabled`. Setting this property leads to a mismatch in the partition spec's `source-id`: the manifest header uses column IDs that start from 0, while the metadata JSON uses IDs that start at 1.

Steps to reproduce:

```
create table snapshot_id_parquet(id int, desc string)
  using parquet
  partitioned by (name string)
  location 's3a://my_bucket_location';

insert into snapshot_id_parquet values (1, 'abc', 'a');

create table null_snapshot_id_in_manifest(id int, desc string, name string)
  using iceberg
  partitioned by (name)
  TBLPROPERTIES ('compatibility.snapshot-id-inheritance.enabled'='true');

CALL system.add_files(
  table => 'default.null_snapshot_id_in_manifest',
  source_table => 'default.snapshot_id_parquet');
```

Examine the manifest header and the metadata JSON of the table `null_snapshot_id_in_manifest`; v3 is the latest write. `v3.metadata.json` has the partition spec written correctly:

```
"partition-specs" : [ {
  "spec-id" : 0,
  "fields" : [ {
    "name" : "name",
    "transform" : "identity",
    "source-id" : 3,
    "field-id" : 1000
  } ]
} ],
```

However, when we checked the manifest Avro file with `avro-tools getmeta`, we found `partition-spec [{"name":"name","transform":"identity","source-id":2,"field-id":1000}]`, which does not match the metadata JSON.

We also verified that when the flag is not set, the manifest and the metadata JSON are written with matching `source-id` values.

### Willingness to contribute

- [ ] I can contribute a fix for this bug independently
- [ ] I would be willing to contribute a fix for this bug with guidance from the Iceberg community
- [ ] I cannot contribute a fix for this bug at this time
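
For anyone trying to confirm the mismatch described above, the comparison can be scripted. This is only a minimal sketch in Python, assuming the `partition-specs` entry from `v3.metadata.json` and the `partition-spec` header value printed by `avro-tools getmeta` have already been copied out as strings. The helper name `source_id_mismatches` is hypothetical and not part of Iceberg; it matches fields on `field-id` and reports any `source-id` that differs.

```python
import json

# Partition specs as they appear in v3.metadata.json (values copied from the report).
metadata_specs = json.loads("""
[ {
  "spec-id" : 0,
  "fields" : [ {
    "name" : "name",
    "transform" : "identity",
    "source-id" : 3,
    "field-id" : 1000
  } ]
} ]
""")

# The `partition-spec` header value reported by `avro-tools getmeta`.
manifest_fields = json.loads(
    '[{"name":"name","transform":"identity","source-id":2,"field-id":1000}]'
)


def source_id_mismatches(spec_fields, manifest_fields):
    """Return (field-id, metadata source-id, manifest source-id) per disagreement.

    Hypothetical helper for illustration: fields are paired on `field-id`.
    """
    by_field_id = {f["field-id"]: f for f in spec_fields}
    mismatches = []
    for mf in manifest_fields:
        sf = by_field_id.get(mf["field-id"])
        if sf is not None and sf["source-id"] != mf["source-id"]:
            mismatches.append((mf["field-id"], sf["source-id"], mf["source-id"]))
    return mismatches


mismatches = source_id_mismatches(metadata_specs[0]["fields"], manifest_fields)
print(mismatches)
```

With the values from this report, the list contains one entry for field 1000 (metadata `source-id` 3 versus manifest `source-id` 2); an empty list would mean the two specs agree.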