pvary commented on code in PR #12651:
URL: https://github.com/apache/iceberg/pull/12651#discussion_r2026871589


##########
spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/RewriteDataFilesSparkAction.java:
##########
@@ -227,10 +227,11 @@ private StructLikeMap<List<FileScanTask>> groupByPartition(
 
     for (FileScanTask task : tasks) {
       // If a task uses an incompatible partition spec the data inside could contain values
-      // which belong to multiple partitions in the current spec. Treating all such files as
-      // un-partitioned and grouping them together helps to minimize new files made.
+      // which belong to multiple partitions in the current spec.
       StructLike taskPartition =
-          task.file().specId() == table.spec().specId() ? task.file().partition() : emptyStruct;
+          table.spec().equalOrFinerThan(table.specs().get(task.file().specId()))
+              ? task.file().partition()
+              : emptyStruct;

Review Comment:
   We are in the process of refactoring the compaction planning out into the core module.
   Please make sure that any changes here land in `BinPackRewriteFilePlanner` too:
   
   https://github.com/apache/iceberg/blob/d5971429ea903be873b5884c64a3dd41076179ea/core/src/main/java/org/apache/iceberg/actions/BinPackRewriteFilePlanner.java#L279-L287
   
   FWIW, I have an open PR to move the Spark compaction to the new API (#12692), which will remove the planning from here.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

