rdblue commented on code in PR #10962:
URL: https://github.com/apache/iceberg/pull/10962#discussion_r1861332491


##########
core/src/main/java/org/apache/iceberg/MergingSnapshotProducer.java:
##########
@@ -924,6 +934,7 @@ public List<ManifestFile> apply(TableMetadata base, Snapshot snapshot) {
                         != ManifestWriter
                             .UNASSIGNED_SEQ) // filter out unassigned in rewritten manifests
             .reduce(base.lastSequenceNumber(), Math::min);
+    long minDataSequenceNumber = Math.min(minNewFileSequenceNumber, minExistingDataSequenceNumber);

Review Comment:
   The logic for calculating the min sequence number is getting too long to 
embed here. I think it should be moved to a separate private method:
   
   ```java
     private long minDataSequenceNumber() {
       // smallest data sequence number among newly added data files
       long minAddedDataSequenceNumber = addedDataFiles().stream()
           .map(ContentFile::dataSequenceNumber)
           .filter(Objects::nonNull)
           .filter(seq -> seq >= 0)
           .reduce(base.nextSequenceNumber(), Math::min);
   
       // filter out unassigned in rewritten manifests
       long minExistingDataSequenceNumber =
           filtered.stream()
               .map(ManifestFile::minSequenceNumber)
               .filter(seq -> seq != ManifestWriter.UNASSIGNED_SEQ)
               .reduce(base.lastSequenceNumber(), Math::min);
   
       long minDataSequenceNumber =
           Math.min(minAddedDataSequenceNumber, minExistingDataSequenceNumber);
   
       return Math.min(minDataSequenceNumber, newDataFilesDataSequenceNumber);
     }
   ```
   
   This also fixes the awkwardness of deciding whether to use 
`base.nextSequenceNumber()` or `newDataFilesDataSequenceNumber`.
   
   I also agree with @amogh-jahagirdar that checking `addedDataFiles()` is not 
currently necessary, but it seems like a good idea to future-proof against this 
issue in case we change the code again. This bug was likely introduced when we 
added `newDataFilesDataSequenceNumber`.
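   
   To illustrate the filter-then-reduce pattern outside of Iceberg, here is a 
minimal standalone sketch. It is not the PR's code: the class name, the method 
parameters, and the local `UNASSIGNED_SEQ` constant are hypothetical stand-ins 
for the Iceberg types, and it leaves out `newDataFilesDataSequenceNumber`. It 
only shows how each `reduce` identity acts as the fallback when a stream is 
empty or fully filtered out.
   
   ```java
   import java.util.Arrays;
   import java.util.Collections;
   import java.util.List;
   import java.util.Objects;
   
   public class MinSequenceSketch {
     // Hypothetical stand-in for ManifestWriter.UNASSIGNED_SEQ (assumed sentinel value).
     static final long UNASSIGNED_SEQ = -1L;
   
     // addedDataSeqs may contain nulls (files without an assigned sequence number);
     // manifestMinSeqs is assumed to contain no nulls.
     static long minDataSequenceNumber(
         List<Long> addedDataSeqs,
         List<Long> manifestMinSeqs,
         long nextSequenceNumber,
         long lastSequenceNumber) {
       // Skip null and negative values; fall back to nextSequenceNumber if nothing remains.
       long minAdded = addedDataSeqs.stream()
           .filter(Objects::nonNull)
           .filter(seq -> seq >= 0)
           .reduce(nextSequenceNumber, Math::min);
   
       // Skip the unassigned sentinel; fall back to lastSequenceNumber if nothing remains.
       long minExisting = manifestMinSeqs.stream()
           .filter(seq -> seq != UNASSIGNED_SEQ)
           .reduce(lastSequenceNumber, Math::min);
   
       return Math.min(minAdded, minExisting);
     }
   
     public static void main(String[] args) {
       long min = minDataSequenceNumber(
           Arrays.asList(7L, null, 5L), Arrays.asList(3L, UNASSIGNED_SEQ, 4L), 9L, 8L);
       System.out.println(min); // prints 3
   
       long fallback = minDataSequenceNumber(
           Collections.emptyList(), Collections.emptyList(), 9L, 8L);
       System.out.println(fallback); // prints 8
     }
   }
   ```
   
   With both lists empty, the result is just the smaller of the two identities, 
which is why the choice of identity value matters in the real method.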



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

