pvary commented on PR #11131: URL: https://github.com/apache/iceberg/pull/11131#issuecomment-2517865727
I have a compaction test (`TestRewriteDataFiles.testV2Table`) in an ongoing PR.

The PR: #11497
The test code: https://github.com/apache/iceberg/pull/11497/files#diff-39871b9e62b1e4e68c69f126035226176df902f364ea765b417898ad5952e496R328-R341

The test creates an Iceberg table with two snapshots, each with delete files:
- Snapshot 1:
  - Data file - DF1
  - Equality delete file - EQD1
  - Position delete file - PD1
- Snapshot 2:
  - Data file - DF2
  - Equality delete file - EQD2

Then the test creates a compaction commit with the `RewriteDataFilesCommitManager.CommitService`, which rewrites the 2 data files (DF1 + DF2) into a single compacted data file (DF3) and removes the deleted rows.

Before this change (#11131) the resulting snapshot contained a single data file and a single delete file. The table content was: DF3, EQD2

After this change (#11131) the resulting snapshot contains a single data file, and no delete files are removed. The table content is: DF3, EQD1, PD1, EQD2

Is this change intentional?
Data-wise the result is correct in both cases, because no data files remain to which the delete files need to be applied, but the new result is definitely suboptimal.

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
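
To illustrate why both results are data-wise equivalent, here is a minimal sketch in plain Java. It does not use the Iceberg API. It models only the sequence-number part of the v2 delete-application rule: an equality delete applies to data files with a strictly lower data sequence number, and a position delete applies to data files with a lower or equal one. Partition and file-path checks are left out. The class `DeleteApplicability`, its nested types, and the sequence numbers (1 for snapshot 1, 2 for snapshot 2, 2 for DF3) are assumptions made for illustration; they are not taken from the test.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical model, not Iceberg code: checks which delete files can still
// match a live data file by sequence number alone.
public class DeleteApplicability {
    enum Kind { EQUALITY, POSITION }

    static final class DeleteFile {
        final String name; final Kind kind; final long seq;
        DeleteFile(String name, Kind kind, long seq) {
            this.name = name; this.kind = kind; this.seq = seq;
        }
    }

    static final class DataFile {
        final String name; final long seq;
        DataFile(String name, long seq) { this.name = name; this.seq = seq; }
    }

    // Equality deletes need a strictly older data file;
    // position deletes also apply to data files with the same sequence number.
    static boolean applies(DeleteFile d, DataFile f) {
        return d.kind == Kind.EQUALITY ? f.seq < d.seq : f.seq <= d.seq;
    }

    // Names of delete files that match none of the given data files.
    static List<String> dangling(List<DeleteFile> deletes, List<DataFile> data) {
        return deletes.stream()
            .filter(d -> data.stream().noneMatch(f -> applies(d, f)))
            .map(d -> d.name)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Assumed sequence numbers: snapshot 1 -> 1, snapshot 2 -> 2.
        List<DeleteFile> deletes = List.of(
            new DeleteFile("EQD1", Kind.EQUALITY, 1),
            new DeleteFile("PD1", Kind.POSITION, 1),
            new DeleteFile("EQD2", Kind.EQUALITY, 2));
        // After compaction only DF3 remains; sequence number 2 is an assumption.
        List<DataFile> afterCompaction = List.of(new DataFile("DF3", 2));
        System.out.println(dangling(deletes, afterCompaction)); // [EQD1, PD1, EQD2]
    }
}
```

Under these assumptions none of the three delete files can match DF3, so keeping or dropping them does not change query results; it only changes how many files a scan has to plan around. Any DF3 sequence number of 2 or higher gives the same outcome in this model.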