hililiwei commented on code in PR #6350:
URL: https://github.com/apache/iceberg/pull/6350#discussion_r1044323786


##########
spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/source/SparkScanBuilder.java:
##########
@@ -308,6 +339,17 @@ public Scan buildChangelogScan() {
     return new SparkChangelogScan(spark, table, scan, readConf, expectedSchema, filterExpressions);
   }
 
+  private Long getStartSnapshotId(Long startTimestamp) {
+    Snapshot oldestSnapshotAfter = SnapshotUtil.oldestAncestorAfter(table, startTimestamp);

Review Comment:
   Spark handles `startTimestamp` differently from Flink. I wonder if we should unify them?
   
   In Flink, it first checks whether there is a snapshot whose commit time equals `startTimestamp`; if not, `SnapshotUtil.oldestAncestorAfter` is used.
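   
   For illustration, a minimal sketch of what a unified lookup might look like, assuming `table` is the `Table` field in `SparkScanBuilder` and that the exact-match step mirrors the Flink behavior described above (the loop itself is hypothetical, not code from this PR):
   
   ```java
   // Hypothetical sketch; assumes org.apache.iceberg.Snapshot and
   // org.apache.iceberg.util.SnapshotUtil are imported, as in the diff above.
   private Long getStartSnapshotId(Long startTimestamp) {
     // Flink-style step: prefer a snapshot committed exactly at startTimestamp.
     for (Snapshot snapshot : table.snapshots()) {
       if (snapshot.timestampMillis() == startTimestamp) {
         return snapshot.snapshotId();
       }
     }
   
     // Otherwise fall back to the oldest ancestor committed after
     // startTimestamp, which is what the Spark code in this PR does.
     Snapshot oldestSnapshotAfter = SnapshotUtil.oldestAncestorAfter(table, startTimestamp);
     return oldestSnapshotAfter == null ? null : oldestSnapshotAfter.snapshotId();
   }
   ```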



##########
spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/SparkReadOptions.java:
##########
@@ -32,6 +32,12 @@ private SparkReadOptions() {}
   // End snapshot ID used in incremental scans (inclusive)
   public static final String END_SNAPSHOT_ID = "end-snapshot-id";
 
+  // Start timestamp used in multi-snapshot scans (exclusive)
+  public static final String START_TIMESTAMP = "start-timestamp";

Review Comment:
   Yes, it's called "start-snapshot-timestamp" in Flink.
   
   
https://github.com/apache/iceberg/blob/6d47097151b4df1d3269563a7ebfdb4b6c270a64/flink/v1.15/flink/src/main/java/org/apache/iceberg/flink/FlinkReadOptions.java#L48-L49
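   
   For comparison, a hedged sketch of what aligning the Spark constant with Flink's naming could look like (the rename is only this comment's suggestion, not a change in the PR):
   
   ```java
   // Hypothetical rename to mirror Flink's FlinkReadOptions constant
   // (linked above); the diff in this PR uses "start-timestamp".
   public static final String START_SNAPSHOT_TIMESTAMP = "start-snapshot-timestamp";
   ```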
   


