Re: [PR] Spark: Fix SparkTable to use name and effective snapshotID for comparing [iceberg]

via GitHub Fri, 09 Feb 2024 09:59:08 -0800


aokolnychyi commented on code in PR #9455:
URL: https://github.com/apache/iceberg/pull/9455#discussion_r1484642976



##########
spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/source/SparkTable.java:
##########
@@ -405,15 +407,18 @@ public boolean equals(Object other) {
       return false;
     }
 
-    // use only name in order to correctly invalidate Spark cache
+    // use name only unless branch/snapshotId is given in order to correctly invalidate Spark cache
+    // when branch or snapshotId is given, it's time travel
     SparkTable that = (SparkTable) other;
-    return icebergTable.name().equals(that.icebergTable.name());
+    return icebergTable.name().equals(that.icebergTable.name())
+        && Objects.equals(snapshotId, that.snapshotId);

Review Comment:
   I think we should compare the table name, branch, and snapshot, and not 
initialize the snapshot when the branch is provided. The open question is 
whether we can load the MAIN branch explicitly and, if so, whether that should 
be equal to simply loading the table. If we can load MAIN explicitly, I think 
we should use 
[this](https://github.com/apache/iceberg/pull/9455/files#r1475244447) snippet 
above with the normalization. If not, we can go back to what is mentioned 
[here](https://github.com/apache/iceberg/pull/9455/files#r1475705807).
   
   What do you think about it, @nastra?
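
For illustration only, here is a minimal sketch of the comparison described in this comment: equality on table name, branch, and snapshot ID, with an explicitly requested MAIN branch normalized to "no branch" so that it compares equal to a plain table load. `TableIdentity`, its fields, and the `"main"` constant are hypothetical stand-ins, not the actual `SparkTable` code or the linked snippet, and whether MAIN should be normalized this way is exactly the open question above.

```java
import java.util.Objects;

// Hypothetical stand-in for the identity-related state of SparkTable.
final class TableIdentity {
  // Assumption: the default branch is named "main".
  static final String MAIN_BRANCH = "main";

  private final String name;
  private final String branch; // null when no branch was requested
  private final Long snapshotId; // null when no snapshot was requested

  TableIdentity(String name, String branch, Long snapshotId) {
    this.name = Objects.requireNonNull(name, "name");
    // Normalization (assumed behavior): an explicit MAIN branch is treated
    // the same as loading the table without a branch.
    this.branch = MAIN_BRANCH.equals(branch) ? null : branch;
    // The snapshot is kept as given; it is not derived from the branch.
    this.snapshotId = snapshotId;
  }

  @Override
  public boolean equals(Object other) {
    if (this == other) {
      return true;
    }
    if (other == null || getClass() != other.getClass()) {
      return false;
    }
    TableIdentity that = (TableIdentity) other;
    return name.equals(that.name)
        && Objects.equals(branch, that.branch)
        && Objects.equals(snapshotId, that.snapshotId);
  }

  @Override
  public int hashCode() {
    // Must cover the same fields as equals() to keep the contract intact.
    return Objects.hash(name, branch, snapshotId);
  }
}
```

With this shape, a plain load and an explicit MAIN load of the same table are equal, while a load of another branch or of a specific snapshot is a distinct object for comparison purposes.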



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

