[GitHub] [iceberg] flyrain commented on a diff in pull request #6350: Query changelog table with a timestamp range

GitBox Fri, 09 Dec 2022 15:16:05 -0800


flyrain commented on code in PR #6350:
URL: https://github.com/apache/iceberg/pull/6350#discussion_r1044906141



##########
spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/source/SparkScanBuilder.java:
##########
@@ -308,6 +339,17 @@ public Scan buildChangelogScan() {
     return new SparkChangelogScan(spark, table, scan, readConf, expectedSchema, filterExpressions);
   }
 
+  private Long getStartSnapshotId(Long startTimestamp) {
+    Snapshot oldestSnapshotAfter = SnapshotUtil.oldestAncestorAfter(table, startTimestamp);
+    Preconditions.checkArgument(

Review Comment:
   > do we need to differentiate between a snapshot existed yet had no changes and no such snapshot existed.
   
   I would consider two use cases:
   1. Query with a snapshot ID. We should differentiate the two situations; throwing an exception makes sense here.
   2. Query with only a time range. We can assume the user doesn't need to know the concept of a `snapshot`, so we need not differentiate; returning an empty set makes more sense.
   
   The difference is subtle, though. I am fine with either option.
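
To make the two cases concrete, here is a minimal, self-contained sketch. It does not use Iceberg's classes: `Snap`, `requireSnapshot`, and `snapshotsInRange` are hypothetical stand-ins, and the range bounds (exclusive start, inclusive end) are an assumption for illustration, not what the PR necessarily implements.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ChangelogRangeSketch {

  // Hypothetical stand-in for a table snapshot: an id plus a commit timestamp.
  public static final class Snap {
    public final long id;
    public final long timestampMillis;

    public Snap(long id, long timestampMillis) {
      this.id = id;
      this.timestampMillis = timestampMillis;
    }
  }

  // Case 1: the caller names a snapshot id explicitly.
  // An unknown id is a caller error, so fail loudly.
  public static Snap requireSnapshot(List<Snap> history, long snapshotId) {
    for (Snap s : history) {
      if (s.id == snapshotId) {
        return s;
      }
    }
    throw new IllegalArgumentException("Cannot find snapshot: " + snapshotId);
  }

  // Case 2: the caller only gives a time range.
  // No snapshot in the range is not an error; the result is simply empty.
  // Assumed bounds: start exclusive, end inclusive.
  public static List<Snap> snapshotsInRange(List<Snap> history, long startMillis, long endMillis) {
    List<Snap> result = new ArrayList<>();
    for (Snap s : history) {
      if (s.timestampMillis > startMillis && s.timestampMillis <= endMillis) {
        result.add(s);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    List<Snap> history = Arrays.asList(new Snap(1L, 100L), new Snap(2L, 200L));

    // A time range that contains no snapshot yields an empty list, not an exception.
    System.out.println(snapshotsInRange(history, 300L, 400L).size());

    // An unknown snapshot id is rejected.
    try {
      requireSnapshot(history, 42L);
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

With this split, a caller who thinks only in timestamps never sees a snapshot-related error, while a caller who passes an id gets immediate feedback on a bad id.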
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

