pvary commented on PR #11131:
URL: https://github.com/apache/iceberg/pull/11131#issuecomment-2517865727

   I have a compaction test (`TestRewriteDataFiles.testV2Table`) in an ongoing 
PR.
   The PR: #11497
   The test code: 
https://github.com/apache/iceberg/pull/11497/files#diff-39871b9e62b1e4e68c69f126035226176df902f364ea765b417898ad5952e496R328-R341
   
   The test creates an Iceberg table with 2 snapshots, each with delete files:
   - Snapshot 1:
       - Data file - DF1
       - Equality delete file - EQD1
       - Position delete file - PD1
   - Snapshot 2:
       - Data file - DF2
       - Equality delete file - EQD2
   
   Then the test creates a compaction commit with the 
`RewriteDataFilesCommitManager.CommitService`, which rewrites the 2 data files 
(DF1 + DF2) into a single compacted data file (DF3) and removes the deleted rows.
   
   Before this change (#11131), the resulting snapshot contained a single data 
file and a single delete file. The table content was: DF3, EQD2
   After this change (#11131), the resulting snapshot contains a single data 
file, and no delete files are removed. The table content is: DF3, EQD1, PD1, EQD2
   
   Is this change intentional? Data-wise, the result is correct in both cases, 
as no data files remain to which the delete files need to be applied, 
but the new result is definitely suboptimal.
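
   A minimal sketch of why both results read the same data, assuming the two 
snapshots get sequence numbers 1 and 2 and that DF3 is committed with sequence 
number 2 (these values are assumptions for illustration, not taken from the test). 
It is a standalone model, not Iceberg code: `DeleteApplicability` and its members 
are hypothetical names, and partitions and position-delete file paths are ignored. 
It applies only the sequence-number rules from the table spec: an equality delete 
applies to data files with a strictly lower data sequence number, a position 
delete to data files with a lower or equal one.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical standalone model; none of these types are Iceberg classes.
public class DeleteApplicability {
  enum Kind { EQUALITY, POSITION }

  static final class DataFile {
    final String name;
    final long seq; // data sequence number (assumed value)

    DataFile(String name, long seq) {
      this.name = name;
      this.seq = seq;
    }
  }

  static final class DeleteFile {
    final String name;
    final Kind kind;
    final long seq; // sequence number of the commit that added the delete file (assumed)

    DeleteFile(String name, Kind kind, long seq) {
      this.name = name;
      this.kind = kind;
      this.seq = seq;
    }
  }

  // Sequence-number rule only: equality deletes need a strictly older data file,
  // position deletes also cover data files written in the same commit.
  static boolean applies(DeleteFile delete, DataFile data) {
    return delete.kind == Kind.EQUALITY ? data.seq < delete.seq : data.seq <= delete.seq;
  }

  // Names of the delete files that still apply to at least one live data file.
  static List<String> stillNeeded(List<DeleteFile> deletes, List<DataFile> liveData) {
    List<String> needed = new ArrayList<>();
    for (DeleteFile delete : deletes) {
      for (DataFile data : liveData) {
        if (applies(delete, data)) {
          needed.add(delete.name);
          break;
        }
      }
    }
    return needed;
  }

  public static void main(String[] args) {
    List<DeleteFile> deletes = List.of(
        new DeleteFile("EQD1", Kind.EQUALITY, 1),
        new DeleteFile("PD1", Kind.POSITION, 1),
        new DeleteFile("EQD2", Kind.EQUALITY, 2));

    // After compaction only DF3 is live; sequence number 2 is an assumption.
    List<DataFile> afterCompaction = List.of(new DataFile("DF3", 2));

    System.out.println("Delete files still needed: " + stillNeeded(deletes, afterCompaction));
  }
}
```

   Under these assumptions no delete file applies to DF3, so keeping EQD1, PD1 
and EQD2 does not change query results; it only leaves extra files in the table 
metadata. The same holds in this model if DF3 gets a higher sequence number.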
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

