aokolnychyi commented on code in PR #9455:
URL: https://github.com/apache/iceberg/pull/9455#discussion_r1475245547

##########
spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/source/SparkTable.java:
##########
@@ -405,15 +407,18 @@ public boolean equals(Object other) {
       return false;
     }

-    // use only name in order to correctly invalidate Spark cache
+    // use name only unless branch/snapshotId is given in order to correctly invalidate Spark cache
+    // when branch or snapshotId is given, it's time travel
     SparkTable that = (SparkTable) other;
-    return icebergTable.name().equals(that.icebergTable.name());
+    return icebergTable.name().equals(that.icebergTable.name())
+        && Objects.equals(snapshotId, that.snapshotId);

Review Comment:
   We would need to double check our caching catalogs and whether `refreshEagerly` has to be part of this comparison.

##########
spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/source/SparkTable.java:
##########
@@ -117,7 +119,7 @@ public class SparkTable
           .build();

   private final Table icebergTable;
-  private final Long snapshotId;
+  private Long snapshotId;

Review Comment:
   Do we have to remove the final keyword here?

##########
spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/source/SparkTable.java:
##########
@@ -131,12 +133,12 @@ public SparkTable(Table icebergTable, boolean refreshEagerly) {
   public SparkTable(Table icebergTable, String branch, boolean refreshEagerly) {
     this(icebergTable, refreshEagerly);
     this.branch = branch;
+    final Snapshot snapshot = icebergTable.snapshot(branch);
     ValidationException.check(
-        branch == null
-            || SnapshotRef.MAIN_BRANCH.equals(branch)
-            || icebergTable.snapshot(branch) != null,
+        branch == null || SnapshotRef.MAIN_BRANCH.equals(branch) || snapshot != null,
         "Cannot use branch (does not exist): %s",
         branch);
+    this.snapshotId = snapshot.snapshotId();

Review Comment:
   Won't this throw an NPE in some cases as `snapshot` could be null? Also, I am not sure this logic is correct.
   Tables loaded for a particular snapshot ID and for a particular branch may not be logically equal: more operations could happen to the branch after its initial load.

##########
spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/source/SparkTable.java:
##########
@@ -405,15 +407,18 @@ public boolean equals(Object other) {
       return false;
     }

-    // use only name in order to correctly invalidate Spark cache
+    // use name only unless branch/snapshotId is given in order to correctly invalidate Spark cache
+    // when branch or snapshotId is given, it's time travel
     SparkTable that = (SparkTable) other;
-    return icebergTable.name().equals(that.icebergTable.name());
+    return icebergTable.name().equals(that.icebergTable.name())
+        && Objects.equals(snapshotId, that.snapshotId);

Review Comment:
   An alternative to loading the snapshot ID for a branch could be something like this:

   ```
   @Override
   public boolean equals(Object other) {
     if (this == other) {
       return true;
     } else if (other == null || getClass() != other.getClass()) {
       return false;
     }

     SparkTable that = (SparkTable) other;
     return icebergTable.name().equals(that.icebergTable.name())
         && normalizedBranch().equals(that.normalizedBranch())
         && Objects.equals(snapshotId, that.snapshotId());
   }

   @Override
   public int hashCode() {
     return Objects.hash(icebergTable.name(), normalizedBranch(), snapshotId);
   }

   private String normalizedBranch() {
     return branch != null ? branch : SnapshotRef.MAIN_BRANCH;
   }
   ```
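
To make the intended semantics of the suggested comparison concrete, here is a minimal, self-contained sketch that can be run without Iceberg or Spark. It is not the actual `SparkTable`: `BranchAwareEquality` and `TableKey` are hypothetical stand-ins that hold only the three fields used in the comparison, and the local `MAIN_BRANCH` constant assumes the value `"main"` of `SnapshotRef.MAIN_BRANCH`.

```java
import java.util.Objects;

public class BranchAwareEquality {

  // Hypothetical stand-in for SparkTable, reduced to the fields used by equals/hashCode.
  static final class TableKey {
    // Assumption: mirrors SnapshotRef.MAIN_BRANCH in Iceberg.
    private static final String MAIN_BRANCH = "main";

    private final String name;
    private final String branch; // null means "no branch given"
    private final Long snapshotId; // null means "no snapshot pinned"

    TableKey(String name, String branch, Long snapshotId) {
      this.name = name;
      this.branch = branch;
      this.snapshotId = snapshotId;
    }

    // Treat a missing branch as the main branch so both forms compare equal.
    private String normalizedBranch() {
      return branch != null ? branch : MAIN_BRANCH;
    }

    @Override
    public boolean equals(Object other) {
      if (this == other) {
        return true;
      } else if (other == null || getClass() != other.getClass()) {
        return false;
      }

      TableKey that = (TableKey) other;
      return name.equals(that.name)
          && normalizedBranch().equals(that.normalizedBranch())
          && Objects.equals(snapshotId, that.snapshotId);
    }

    @Override
    public int hashCode() {
      // Uses the same normalized fields as equals to keep the two consistent.
      return Objects.hash(name, normalizedBranch(), snapshotId);
    }
  }

  public static void main(String[] args) {
    TableKey plain = new TableKey("db.tbl", null, null);
    TableKey mainRef = new TableKey("db.tbl", "main", null);
    TableKey audit = new TableKey("db.tbl", "audit", null);
    TableKey pinned = new TableKey("db.tbl", null, 42L);

    System.out.println(plain.equals(mainRef)); // true: null branch normalizes to main
    System.out.println(plain.equals(audit)); // false: different branch
    System.out.println(plain.equals(pinned)); // false: one side pins a snapshot
    System.out.println(plain.hashCode() == mainRef.hashCode()); // true
  }
}
```

With this shape, a table loaded for a branch and a table loaded for a snapshot ID never compare equal, which avoids resolving the branch to a snapshot ID at construction time. Whether `refreshEagerly` also needs to take part in the comparison is left open here, as in the comment above.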