hililiwei commented on code in PR #5984:
URL: https://github.com/apache/iceberg/pull/5984#discussion_r1056798443


##########
api/src/main/java/org/apache/iceberg/IncrementalScan.java:
##########
@@ -21,6 +21,23 @@
 /** API for configuring an incremental scan. */
 public interface IncrementalScan<ThisT, T extends ScanTask, G extends ScanTaskGroup<T>>
     extends Scan<ThisT, T, G> {
+
+  /**
+   * Instructs this scan to look for changes starting from a particular snapshot (inclusive).
+   *
+   * <p>If the start snapshot is not configured, it is defaulted to the oldest ancestor of the end
+   * snapshot (inclusive).
+   *
+   * @param fromSnapshotId the start snapshot ID (inclusive)
+   * @param referenceName the ref used
+   * @return this for method chaining
+   * @throws IllegalArgumentException if the start snapshot is not an ancestor of the end snapshot
+   */
+  default ThisT fromSnapshotInclusive(long fromSnapshotId, String referenceName) {

Review Comment:
   Agree with @stevenzwu.
   Yes, a tag is a fixed point in time, but when it is used for an incremental 
read, we can treat it as semantically the same as 
`fromSnapshot(Long snapshotId)`.
   As in @stevenzwu's example, suppose I have daily tags (`20220101`, 
`20220102`). If I want to read the incremental data from `20220102` up to the 
current snapshot, I can use `fromSnapshotExclusive("20220102")`:
   ```
   table.newIncrementalScan()
         .fromSnapshotExclusive("20220102")
         .planTasks()
   ```
   Another way is to use the snapshot time to find the snapshot ID first, but 
sometimes that doesn't work. For example, we may generate tags based on the 
event time of the data, or tag the snapshot only after the application has 
completed. The application may finish at 03:00 on 2022-01-02 and tag the newly 
generated snapshot as `20220102`. If we then use the snapshot time 
`2022-01-02 00:00:00` to find the snapshot ID, the lookup resolves a snapshot 
committed before the tagged one, and incorrect incremental data will be 
returned.
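
To make the mismatch concrete, here is a minimal sketch in plain Java. It does not use the Iceberg API: the class `TagVsTimestampLookup`, its methods, the snapshot IDs `101`/`102`, and the UTC commit times are all hypothetical stand-ins for a table's snapshot log and its tags. It only illustrates that resolving the start point by timestamp and resolving it by tag can pick different snapshots.

```java
import java.time.Instant;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical model, not Iceberg code: a snapshot log keyed by commit time
// plus a tag-name -> snapshot-ID map.
public class TagVsTimestampLookup {

  // Assumed history: each daily job commits (and is tagged) at 03:00 UTC.
  static NavigableMap<Instant, Long> snapshotLog() {
    NavigableMap<Instant, Long> log = new TreeMap<>();
    log.put(Instant.parse("2022-01-01T03:00:00Z"), 101L);
    log.put(Instant.parse("2022-01-02T03:00:00Z"), 102L);
    return log;
  }

  // Assumed tags: the tag name is the business date, not the commit time.
  static Map<String, Long> tags() {
    return Map.of("20220101", 101L, "20220102", 102L);
  }

  // Latest snapshot committed at or before the given time.
  static long snapshotIdAsOf(NavigableMap<Instant, Long> log, Instant asOf) {
    Map.Entry<Instant, Long> entry = log.floorEntry(asOf);
    if (entry == null) {
      throw new IllegalArgumentException("No snapshot at or before " + asOf);
    }
    return entry.getValue();
  }

  // Snapshot that a tag points to.
  static long snapshotIdForTag(Map<String, Long> tags, String name) {
    Long id = tags.get(name);
    if (id == null) {
      throw new IllegalArgumentException("Unknown tag: " + name);
    }
    return id;
  }

  public static void main(String[] args) {
    long byTag = snapshotIdForTag(tags(), "20220102");
    long byTime = snapshotIdAsOf(snapshotLog(), Instant.parse("2022-01-02T00:00:00Z"));
    System.out.println("start snapshot by tag:  " + byTag);  // 102
    System.out.println("start snapshot by time: " + byTime); // 101
  }
}
```

In this model, an exclusive incremental read starting from the time-based result (`101`) would also include the changes committed in snapshot `102`, while a read starting from the tag would not, so the two approaches return different data.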
   
   cc @rdblue 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

