hililiwei commented on code in PR #5984: URL: https://github.com/apache/iceberg/pull/5984#discussion_r1056798443
##########
api/src/main/java/org/apache/iceberg/IncrementalScan.java:
##########
@@ -21,6 +21,23 @@
 /** API for configuring an incremental scan. */
 public interface IncrementalScan<ThisT, T extends ScanTask, G extends ScanTaskGroup<T>>
     extends Scan<ThisT, T, G> {
+
+  /**
+   * Instructs this scan to look for changes starting from a particular snapshot (inclusive).
+   *
+   * <p>If the start snapshot is not configured, it is defaulted to the oldest ancestor of the end
+   * snapshot (inclusive).
+   *
+   * @param fromSnapshotId the start snapshot ID (inclusive)
+   * @param referenceName the ref used
+   * @return this for method chaining
+   * @throws IllegalArgumentException if the start snapshot is not an ancestor of the end snapshot
+   */
+  default ThisT fromSnapshotInclusive(long fromSnapshotId, String referenceName) {

Review Comment:
Agree with @stevenzwu. Yes, a tag is a fixed point in time, but when it is used for an incremental read, we can treat it as semantically the same as `fromSnapshot(Long snapshotId)`.

As in @stevenzwu's example, suppose I have daily tags (`20220101`, `20220102`). If I want to read the incremental data from `20220102` up to the current snapshot, I can use `fromSnapshotExclusive("20220102")`:

```
table.newIncrementalScan()
    .fromSnapshotExclusive("20220102")
    .planTasks()
```

Another way is to use the snapshot time to look up the snapshot ID first, but that doesn't always work. For example, we may generate tags based on the event time of the data, or tag the snapshot only after the application has completed. The application may finish at 3:00 on 2022/01/02 and tag the newly generated snapshot as `20220102`. If we then use the snapshot time `2022-01-02 00:00:00` to find the snapshot ID, incorrect incremental data will be returned.

cc @rdblue
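To make the timing problem concrete, here is a minimal self-contained sketch. It is plain Java and does not use the Iceberg API; the class `TagVsTimestampSketch`, its methods, the snapshot IDs 1 and 2, and the 22:00 commit time of the earlier snapshot are all assumptions made up for illustration. It only models "latest snapshot committed at or before a timestamp" versus "snapshot a tag points to".

```java
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical model of snapshot lookup; not part of Iceberg. */
public class TagVsTimestampSketch {
  // Commit time (epoch millis) -> snapshot ID, sorted by commit time.
  private final TreeMap<Long, Long> snapshotIdByCommitMillis = new TreeMap<>();
  // Tag name -> snapshot ID.
  private final Map<String, Long> snapshotIdByTag = new HashMap<>();

  public void commit(long snapshotId, String commitTimeUtc) {
    snapshotIdByCommitMillis.put(Instant.parse(commitTimeUtc).toEpochMilli(), snapshotId);
  }

  public void tag(String name, long snapshotId) {
    snapshotIdByTag.put(name, snapshotId);
  }

  /** Returns the latest snapshot committed at or before the given time, or null if none. */
  public Long snapshotIdAsOfTime(String timeUtc) {
    Map.Entry<Long, Long> entry =
        snapshotIdByCommitMillis.floorEntry(Instant.parse(timeUtc).toEpochMilli());
    return entry == null ? null : entry.getValue();
  }

  /** Returns the snapshot the tag points to, or null if the tag does not exist. */
  public Long snapshotIdForTag(String name) {
    return snapshotIdByTag.get(name);
  }

  /** Builds the scenario from the comment: the job finishes at 03:00 and is tagged afterwards. */
  public static TagVsTimestampSketch example() {
    TagVsTimestampSketch table = new TagVsTimestampSketch();
    table.commit(1L, "2022-01-01T22:00:00Z"); // assumed earlier snapshot
    table.commit(2L, "2022-01-02T03:00:00Z"); // job finishes at 03:00 on 2022/01/02
    table.tag("20220102", 2L);                // tag applied after the job completes
    return table;
  }

  public static void main(String[] args) {
    TagVsTimestampSketch table = example();
    System.out.println("by tag:       " + table.snapshotIdForTag("20220102"));
    System.out.println("by timestamp: " + table.snapshotIdAsOfTime("2022-01-02T00:00:00Z"));
  }
}
```

In this setup the timestamp lookup lands on snapshot 1 while the tag points at snapshot 2, so an exclusive incremental scan started from the timestamp result would re-read the data committed in snapshot 2.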
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org