rdblue commented on code in PR #10962:
URL: https://github.com/apache/iceberg/pull/10962#discussion_r1861332491
##########
core/src/main/java/org/apache/iceberg/MergingSnapshotProducer.java:
##########
@@ -924,6 +934,7 @@ public List<ManifestFile> apply(TableMetadata base, Snapshot snapshot) {
                         != ManifestWriter
                             .UNASSIGNED_SEQ) // filter out unassigned in rewritten manifests
             .reduce(base.lastSequenceNumber(), Math::min);
+    long minDataSequenceNumber = Math.min(minNewFileSequenceNumber, minExistingDataSequenceNumber);

Review Comment:
The logic for calculating the minimum data sequence number is getting too long to embed here. I think it should be moved to a separate private method:

```java
// `base` and `filtered` are locals of apply(), so they are passed in as parameters
private long minDataSequenceNumber(TableMetadata base, List<ManifestFile> filtered) {
  // smallest data sequence number already assigned to an added data file;
  // unassigned (null) and negative values are ignored
  long minAddedDataSequenceNumber =
      addedDataFiles().stream()
          .map(ContentFile::dataSequenceNumber)
          .filter(Objects::nonNull)
          .filter(seq -> seq >= 0)
          .reduce(base.nextSequenceNumber(), Math::min);

  // smallest min sequence number across the existing (filtered) data manifests
  long minExistingDataSequenceNumber =
      filtered.stream()
          .map(ManifestFile::minSequenceNumber)
          .filter(seq -> seq != ManifestWriter.UNASSIGNED_SEQ) // filter out unassigned in rewritten manifests
          .reduce(base.lastSequenceNumber(), Math::min);

  long minDataSequenceNumber = Math.min(minAddedDataSequenceNumber, minExistingDataSequenceNumber);

  // newDataFilesDataSequenceNumber is a nullable Long: only consider it when it has been set,
  // otherwise unboxing null would throw a NullPointerException
  if (newDataFilesDataSequenceNumber != null) {
    minDataSequenceNumber = Math.min(minDataSequenceNumber, newDataFilesDataSequenceNumber);
  }

  return minDataSequenceNumber;
}
```

This also removes the awkwardness of deciding whether to use `base.nextSequenceNumber()` or `newDataFilesDataSequenceNumber`.

I also agree with @amogh-jahagirdar that checking `addedDataFiles()` is not currently necessary, but keeping the check seems like a good way to future-proof against this issue in case the code changes again. This bug was likely introduced when we added `newDataFilesDataSequenceNumber`.
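To check the combined rule in isolation, here is a minimal, self-contained sketch that models it with plain `Long` values instead of Iceberg's `DataFile`, `ManifestFile`, and `TableMetadata`. The class `MinDataSequenceNumberSketch`, its parameters, and the local `UNASSIGNED_SEQ = -1` constant are hypothetical stand-ins, not Iceberg APIs; the sketch only assumes the semantics described in the comment: unassigned sequence numbers are skipped, and an explicitly set sequence number for new data files can lower the minimum.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

/**
 * Hypothetical model of the suggested minDataSequenceNumber() logic.
 * Plain Long values stand in for Iceberg types; nothing here is an Iceberg API.
 */
public class MinDataSequenceNumberSketch {
  // Assumed stand-in for ManifestWriter.UNASSIGNED_SEQ
  static final long UNASSIGNED_SEQ = -1L;

  static long minDataSequenceNumber(
      List<Long> addedFileDataSeqs, // data sequence numbers of added files (null = not assigned)
      List<Long> existingManifestMinSeqs, // min sequence numbers of existing data manifests
      long nextSequenceNumber, // identity for the added-files reduction
      long lastSequenceNumber, // identity for the existing-manifests reduction
      Long newDataFilesDataSequenceNumber) { // optional explicit sequence number, may be null
    long minAdded =
        addedFileDataSeqs.stream()
            .filter(Objects::nonNull)
            .filter(seq -> seq >= 0)
            .reduce(nextSequenceNumber, Math::min);

    long minExisting =
        existingManifestMinSeqs.stream()
            .filter(seq -> seq != UNASSIGNED_SEQ) // skip manifests without an assigned sequence number
            .reduce(lastSequenceNumber, Math::min);

    long min = Math.min(minAdded, minExisting);
    if (newDataFilesDataSequenceNumber != null) {
      min = Math.min(min, newDataFilesDataSequenceNumber);
    }
    return min;
  }

  public static void main(String[] args) {
    // The unassigned manifest and the null added-file sequence number are both ignored
    System.out.println(
        minDataSequenceNumber(Arrays.asList(null, 7L), List.of(3L, UNASSIGNED_SEQ), 8L, 7L, null)); // 3
    // An explicit sequence number older than everything else becomes the minimum
    System.out.println(minDataSequenceNumber(List.of(), List.of(3L), 8L, 7L, 2L)); // 2
  }
}
```

Keeping the explicit sequence number as a final, null-guarded `Math.min` means callers never have to choose between it and `nextSequenceNumber` up front; the smallest value simply wins.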