sfc-gh-yijli opened a new issue, #12273:
URL: https://github.com/apache/iceberg/issues/12273

   ### Apache Iceberg version
   
   None
   
   ### Query engine
   
   Spark
   
   ### Please describe the bug 🐞
   
   The behavior of the `add_files` procedure in Spark is affected by the table 
property `compatibility.snapshot-id-inheritance.enabled`.
   Setting this property to `true` causes a `source-id` mismatch in the 
partition spec: the manifest header uses column ids that start from 0, while 
the metadata json uses ids that start from 1.
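
The two `source-id` values reported in this issue (2 in the manifest header, 3 in the metadata json) are what you would get if the same column list were numbered from 0 in one place and from 1 in the other. The sketch below only illustrates that arithmetic with the column order from the reproduce steps. The helper `assign_ids` is hypothetical and is not Iceberg code, and the actual cause inside `add_files` has not been confirmed.

```python
def assign_ids(columns, start):
    """Number columns consecutively beginning at `start` (hypothetical helper)."""
    return {name: start + i for i, name in enumerate(columns)}


# Column order of the Iceberg table in the reproduce steps.
columns = ["id", "desc", "name"]

one_based = assign_ids(columns, 1)   # numbering seen in the metadata json
zero_based = assign_ids(columns, 0)  # numbering seen in the manifest header

print(one_based["name"], zero_based["name"])
```

Under this assumption the partition column `name` gets id 3 with 1-based numbering and id 2 with 0-based numbering, which matches the two values observed.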
   
   Steps to reproduce:
   ```
   create table snapshot_id_parquet(id int, desc string) using parquet 
partitioned by (name string)
       location 's3a://my_bucket_location';
   
   insert into  snapshot_id_parquet values (1, 'abc', 'a');
   
   create table null_snapshot_id_in_manifest(id int, desc string, name string) 
using iceberg 
   partitioned by (name) 
   TBLPROPERTIES ('compatibility.snapshot-id-inheritance.enabled'='true');
   
   CALL system.add_files(table => 'default.null_snapshot_id_in_manifest', 
source_table => 'default.snapshot_id_parquet');
   ```
   
   Examine the manifest header and the metadata json of the table 
`null_snapshot_id_in_manifest`; `v3.metadata.json` is the latest metadata file 
written.
   `v3.metadata.json` has the partition spec written correctly:
   ```
   "partition-specs" : [ {
       "spec-id" : 0,
       "fields" : [ {
         "name" : "name",
         "transform" : "identity",
         "source-id" : 3,
         "field-id" : 1000
       } ]
     } ],
   ```
   However, checking the manifest avro file with `avro-tools getmeta` shows 
`partition-spec 
[{"name":"name","transform":"identity","source-id":2,"field-id":1000}]`, whose 
`source-id` of 2 does not match the `source-id` of 3 in the metadata json.
   
   We also verified that when the property is not set, the `source-id` written 
to the manifest matches the one in the metadata json.
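
The manual comparison above can be scripted. This is a minimal sketch, assuming the `partition-spec` value has already been copied out of the `avro-tools getmeta` output as a JSON string and the metadata json has been loaded into a dict. The function names `source_ids` and `find_mismatches` are hypothetical, and the sample data is taken from this report.

```python
import json


def source_ids(fields):
    """Map each partition field name to its source-id."""
    return {f["name"]: f["source-id"] for f in fields}


def find_mismatches(metadata, manifest_spec_json, spec_id=0):
    """Return {field name: (metadata source-id, manifest source-id)} for fields that differ."""
    spec = next(s for s in metadata["partition-specs"] if s["spec-id"] == spec_id)
    expected = source_ids(spec["fields"])
    actual = source_ids(json.loads(manifest_spec_json))
    return {
        name: (expected[name], actual.get(name))
        for name in expected
        if expected[name] != actual.get(name)
    }


# Partition spec as it appears in v3.metadata.json.
metadata = {
    "partition-specs": [
        {
            "spec-id": 0,
            "fields": [
                {"name": "name", "transform": "identity", "source-id": 3, "field-id": 1000}
            ],
        }
    ]
}

# Value of the `partition-spec` key in the manifest header.
manifest_spec_json = (
    '[{"name":"name","transform":"identity","source-id":2,"field-id":1000}]'
)

mismatches = find_mismatches(metadata, manifest_spec_json)
print(mismatches)
```

An empty result would mean the manifest header and the metadata json agree. With the values from this report, the `name` field is flagged with the pair (3, 2).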
   
   ### Willingness to contribute
   
   - [ ] I can contribute a fix for this bug independently
   - [ ] I would be willing to contribute a fix for this bug with guidance from 
the Iceberg community
   - [ ] I cannot contribute a fix for this bug at this time


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

