jasonf20 commented on code in PR #10962:
URL: https://github.com/apache/iceberg/pull/10962#discussion_r1864787713


##########
core/src/main/java/org/apache/iceberg/MergingSnapshotProducer.java:
##########
@@ -833,7 +833,17 @@ public List<ManifestFile> apply(TableMetadata base, Snapshot snapshot) {
         filterManager.filterManifests(
             SnapshotUtil.schemaFor(base, targetBranch()),
             snapshot != null ? snapshot.dataManifests(ops.io()) : null);
-    long minDataSequenceNumber =
+
+    long minNewFileSequenceNumber =

Review Comment:
   @amogh-jahagirdar I responded 
[here](https://github.com/apache/iceberg/pull/10962#discussion_r1862492365). 



##########
core/src/main/java/org/apache/iceberg/ManifestFilterManager.java:
##########
@@ -363,6 +363,10 @@ private ManifestFile filterManifest(
   }
 
   private boolean canContainDeletedFiles(ManifestFile manifest, boolean trustManifestReferences) {
+    if (manifest.minSequenceNumber() > 0 && manifest.minSequenceNumber() < minSequenceNumber) {
+      return true;
+    }

Review Comment:
   If that's the case, then doesn't the last condition here do nothing, since 
the code can only reach this check if one of the earlier checks was already true?
   
   ```java
    deletePaths.contains(file.location())
        || deleteFiles.contains(file)
        || dropPartitions.contains(file.specId(), file.partition())
        || (isDelete
            && entry.isLive()
            && entry.dataSequenceNumber() > 0
            && entry.dataSequenceNumber() < minSequenceNumber);
    ```
    
    It seems like `dropDeleteFilesOlderThan` may have no effect anymore 
(unless `allDeletesReferenceManifests` gets set to false, or something similar). 
    
   I think not removing by `minSequenceNumber` leaves behind delete files 
that never get applied to any data files at query time. That's not the end of 
the world, but it does waste some storage and makes scan planning slightly 
slower. 
   
   Assuming we want to keep this behaviour, perhaps we should just stop using 
`dropDeleteFilesOlderThan` in `MergingSnapshotProducer`? 
   
   We can try to get a minimal test working by doing an actual delete from a 
shared manifest, but as it stands `dropDeleteFilesOlderThan` does not seem to 
be doing anything. 
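
To make the concern concrete, here is a minimal, self-contained sketch of the interaction. It is not the Iceberg code: `DropOldDeletesSketch`, `Entry`, `Manifest`, `shouldDrop`, `countDropped`, and the `checkSequence` flag are all hypothetical names, and the manifest-level gate is simplified to a scan over entry paths. It only models the idea that the per-entry sequence-number clause is evaluated solely for manifests that pass the manifest-level gate.

```java
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model; none of these types are Iceberg classes.
final class DropOldDeletesSketch {

  static final class Entry {
    final String path;
    final long dataSequenceNumber;

    Entry(String path, long dataSequenceNumber) {
      this.path = path;
      this.dataSequenceNumber = dataSequenceNumber;
    }
  }

  static final class Manifest {
    final List<Entry> entries;

    Manifest(List<Entry> entries) {
      this.entries = entries;
    }

    // Smallest data sequence number among the entries, or 0 if there are none.
    long minSequenceNumber() {
      return entries.stream().mapToLong(e -> e.dataSequenceNumber).min().orElse(0L);
    }
  }

  // Manifest-level gate: entries are only inspected when this returns true.
  // checkSequence toggles the sequence-number check added in the diff above.
  static boolean canContainDeletedFiles(
      Manifest manifest, Set<String> deletePaths, long minSequenceNumber, boolean checkSequence) {
    if (checkSequence
        && manifest.minSequenceNumber() > 0
        && manifest.minSequenceNumber() < minSequenceNumber) {
      return true;
    }
    // Stand-in for the path, file, and partition checks.
    return manifest.entries.stream().anyMatch(e -> deletePaths.contains(e.path));
  }

  // Entry-level predicate, including the sequence-number clause.
  static boolean shouldDrop(Entry entry, Set<String> deletePaths, long minSequenceNumber) {
    return deletePaths.contains(entry.path)
        || (entry.dataSequenceNumber > 0 && entry.dataSequenceNumber < minSequenceNumber);
  }

  // Number of entries that would be dropped from the manifest.
  static long countDropped(
      Manifest manifest, Set<String> deletePaths, long minSequenceNumber, boolean checkSequence) {
    if (!canContainDeletedFiles(manifest, deletePaths, minSequenceNumber, checkSequence)) {
      return 0L; // manifest is kept as-is; shouldDrop is never evaluated
    }
    return manifest.entries.stream()
        .filter(e -> shouldDrop(e, deletePaths, minSequenceNumber))
        .count();
  }
}
```

In this model, a manifest holding one delete file at sequence number 3, with no path deletes and a minimum sequence number of 5, drops nothing when `checkSequence` is false and drops the one entry when it is true. Under these assumptions, the sequence-number clause in the entry predicate only matters if the gate also considers sequence numbers.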



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

