amogh-jahagirdar commented on issue #10008: URL: https://github.com/apache/iceberg/issues/10008#issuecomment-2014209251
I looked into this a bit and I think I know the problem. Here's a sample test that can be added to `TestAddFilesProcedure` to reproduce it:

```java
@TestTemplate
public void addFilesPartitionEvolved() {
  // Target Iceberg table: created PARTITIONED BY (p1) (spec ID 0), then evolved to (p1, p2) (spec ID 1)
  createIcebergTable("p1 int, p2 int, data int not null", "PARTITIONED BY (p1)");
  sql("ALTER TABLE %s ADD PARTITION FIELD p2", tableName);

  // Source table: Hive-style Parquet table partitioned by (p1, p2)
  String createParquet =
      "CREATE TABLE %s (p1 int, p2 int, data int) USING %s "
          + "PARTITIONED BY (p1, p2) LOCATION '%s'";
  sql(createParquet, sourceTableName, "parquet", fileTableDir.getAbsolutePath());
  sql("INSERT INTO %s PARTITION (p1=1, p2=10) VALUES (100)", sourceTableName);

  // Import the source table's files into the evolved Iceberg table
  List<Object[]> result =
      sql("CALL %s.system.add_files('%s', '%s')", catalogName, tableName, sourceTableName);

  // Query the Iceberg table after the import
  sql("SELECT * FROM %s", tableName);
}
```

When we import the partitions, we derive an Iceberg partition spec from the Hive-style partitioning here: https://github.com/apache/iceberg/blob/main/spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/SparkTableUtil.java#L430. This new partition spec has a spec ID of 0 (the same spec ID as the one assigned when the Iceberg table was created). That derived spec is the one used when writing the manifests here: https://github.com/apache/iceberg/blob/main/spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/SparkTableUtil.java#L350. But in the target Iceberg table, the spec with (p1, p2) actually has spec ID 1, so the imported manifests are tagged with a spec ID that, in the target table, refers to the original (p1)-only spec.

I'll need to think more about what the right solution is, but on the surface the right thing to do here seems to be:

1. Derive the partition spec from the source table's partitioning.
2. Check whether that same partition spec (ignoring its spec ID) exists in the target table.
3. If so, build a copy of the derived partition spec that carries the target table's spec ID.

But that seems like too specific a fix for this. I'm also not sure what the behavior of the procedure is if the partition spec on the target is completely different.
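A minimal, dependency-free sketch of the three proposed steps, assuming two partition specs can be treated as "the same" when their ordered fields (partition name, source column name, transform) are equal, with spec IDs ignored. `SpecIdResolution`, `Field`, `Spec`, `findMatching`, and `resolve` are hypothetical stand-ins for illustration, not Iceberg's `PartitionSpec`/`PartitionField` API, and matching by source column *name* is a simplifying assumption. The `main` method also shows the mismatch itself: looking up the derived spec's ID (0) in the evolved target table yields the single-field (p1) spec.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

// Hypothetical model of the spec-ID mismatch; NOT the Iceberg API.
final class SpecIdResolution {

  // A partition field reduced to its name, source column name, and transform.
  record Field(String name, String sourceColumn, String transform) {}

  // A partition spec reduced to its ID and ordered list of fields.
  record Spec(int specId, List<Field> fields) {
    // Step 3: a copy of this spec carrying a different spec ID.
    Spec withSpecId(int newId) {
      return new Spec(newId, fields);
    }
  }

  // Step 2: find a target spec with the same fields in the same order, ignoring spec IDs.
  static Optional<Spec> findMatching(Spec derived, Map<Integer, Spec> targetSpecs) {
    return targetSpecs.values().stream()
        .filter(candidate -> candidate.fields().equals(derived.fields()))
        .findFirst();
  }

  // Steps 1-3: re-stamp the spec derived from the source partitioning with the
  // target table's spec ID, or return empty if the target has no matching spec.
  static Optional<Spec> resolve(Spec derived, Map<Integer, Spec> targetSpecs) {
    return findMatching(derived, targetSpecs).map(match -> derived.withSpecId(match.specId()));
  }

  public static void main(String[] args) {
    Field p1 = new Field("p1", "p1", "identity");
    Field p2 = new Field("p2", "p2", "identity");

    // Target table: created PARTITIONED BY (p1), then ADD PARTITION FIELD p2.
    Map<Integer, Spec> target = new TreeMap<>();
    target.put(0, new Spec(0, List.of(p1)));
    target.put(1, new Spec(1, List.of(p1, p2)));

    // Step 1: spec derived from the source's (p1, p2) partitioning; a fresh spec gets ID 0.
    Spec derived = new Spec(0, List.of(p1, p2));

    // Using the derived ID as-is points at the target's (p1)-only spec.
    System.out.println("fields in target spec " + derived.specId() + ": "
        + target.get(derived.specId()).fields().size());

    // After resolution, the derived spec carries the target's ID for (p1, p2).
    System.out.println("resolved spec ID: " + resolve(derived, target).map(Spec::specId).orElse(-1));
  }
}
```

When `resolve` returns empty, the target table has no spec matching the source partitioning, which is the "completely different" case raised above; the sketch deliberately leaves that decision to the caller rather than guessing at the procedure's intended behavior.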