flyrain commented on code in PR #11564: URL: https://github.com/apache/iceberg/pull/11564#discussion_r1851145985
########## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/source/SparkScanBuilder.java: ########## @@ -560,11 +560,13 @@ public Scan buildChangelogScan() { } boolean emptyScan = false; + if (startTimestamp != null) { + boolean isBeforeCurrentSnapshot = table.currentSnapshot().timestampMillis() < startTimestamp; Review Comment: Should we check if table.currentSnapshot() is null? It is an empty scan if a table doesn't have any snapshot. I'm concerned about a NPE here. ########## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/source/SparkScanBuilder.java: ########## @@ -560,11 +560,13 @@ public Scan buildChangelogScan() { } boolean emptyScan = false; + if (startTimestamp != null) { + boolean isBeforeCurrentSnapshot = table.currentSnapshot().timestampMillis() < startTimestamp; startSnapshotId = getStartSnapshotId(startTimestamp); - if (startSnapshotId == null && endTimestamp == null) { - emptyScan = true; - } + boolean hasNoValidStartSnapshot = startSnapshotId == null && endTimestamp == null; + + emptyScan = isBeforeCurrentSnapshot || hasNoValidStartSnapshot; Review Comment: Minor: I'd not try to combine these two logic together. We could add the extra logic like following in the beginning and keep everything else as is, which I think is the simplest solution for a quick fix. But i'm OK with either, given we will refactor this method later. ``` if (table.currentSanpshot != null && table.currentSnapshot().timestampMillis() < startTimestamp ) { emptyScan = true; } ``` ########## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/source/SparkScanBuilder.java: ########## @@ -560,11 +560,13 @@ public Scan buildChangelogScan() { } boolean emptyScan = false; + if (startTimestamp != null) { + boolean isBeforeCurrentSnapshot = table.currentSnapshot().timestampMillis() < startTimestamp; Review Comment: I feel the name is a bit hard to understand. We are trying to say that if the startTimestamp is a time after the current snapshot's time, right? How about `isAfterCurrentSnapshot`? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org