pvary commented on code in PR #12651:
URL: https://github.com/apache/iceberg/pull/12651#discussion_r2026871589
##########
spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/RewriteDataFilesSparkAction.java:
##########
@@ -227,10 +227,11 @@ private StructLikeMap<List<FileScanTask>> groupByPartition(
     for (FileScanTask task : tasks) {
       // If a task uses an incompatible partition spec the data inside could contain values
-      // which belong to multiple partitions in the current spec. Treating all such files as
-      // un-partitioned and grouping them together helps to minimize new files made.
+      // which belong to multiple partitions in the current spec.
       StructLike taskPartition =
-          task.file().specId() == table.spec().specId() ? task.file().partition() : emptyStruct;
+          table.spec().equalOrFinerThan(table.specs().get(task.file().specId()))
+              ? task.file().partition()
+              : emptyStruct;

Review Comment:
   We are in the process of refactoring the compaction planning part out to the core module.
   Please make sure that any changes here land in the `BinPackRewriteFilePlanner` too:
   https://github.com/apache/iceberg/blob/d5971429ea903be873b5884c64a3dd41076179ea/core/src/main/java/org/apache/iceberg/actions/BinPackRewriteFilePlanner.java#L279-L287

   FWIW, I have an open PR to move the Spark compaction to the new API (#12692), which will remove the planning from here.

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org