flyrain commented on code in PR #11612: URL: https://github.com/apache/iceberg/pull/11612#discussion_r1901376613
########## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/source/SparkScanBuilder.java: ########## @@ -560,20 +560,15 @@ public Scan buildChangelogScan() { } boolean emptyScan = false; - if (startTimestamp != null) { - if (table.currentSnapshot() != null - && table.currentSnapshot().timestampMillis() < startTimestamp) { - emptyScan = true; + if (startTimestamp != null || endTimestamp != null) { + if (startTimestamp != null) { + startSnapshotId = getStartSnapshotId(startTimestamp); } - startSnapshotId = getStartSnapshotId(startTimestamp); - if (startSnapshotId == null && endTimestamp == null) { - emptyScan = true; + if (endTimestamp != null) { + endSnapshotId = getEndSnapshotId(startTimestamp, endTimestamp); } - } - - if (endTimestamp != null) { - endSnapshotId = getEndSnapshotId(endTimestamp); if ((startSnapshotId == null && endSnapshotId == null) + || (endSnapshotId == null && endTimestamp != null) || (startSnapshotId != null && startSnapshotId.equals(endSnapshotId))) { emptyScan = true; } Review Comment: I believe this part of logic should be out side of line 563 ` if (startTimestamp != null || endTimestamp != null) `, as we are not only check the when timestamp is one of the inputs, we also need to check when inputs are start snapshot id and end snapshot id. ########## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/source/SparkScanBuilder.java: ########## @@ -615,10 +609,14 @@ private Long getStartSnapshotId(Long startTimestamp) { } } - private Long getEndSnapshotId(Long endTimestamp) { + private Long getEndSnapshotId(Long startTimestamp, Long endTimestamp) { Long endSnapshotId = null; for (Snapshot snapshot : SnapshotUtil.currentAncestors(table)) { - if (snapshot.timestampMillis() <= endTimestamp) { + long ts = snapshot.timestampMillis(); + if (startTimestamp != null && ts <= startTimestamp) { + break; + } Review Comment: I found a bit more confusing by moving the logic here. We are checking if start timestamp is larger than the current snapshot's one. Checking within an ancestor loop yields the same result, but seems unnecessary. Plus, it has to introduces the line 571, which is another burden for reader to understand. ``` || (endSnapshotId == null && endTimestamp != null) ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org