wooyeong commented on PR #9455: URL: https://github.com/apache/iceberg/pull/9455#issuecomment-1886170959
I need your opinion regarding [failed tests](https://github.com/apache/iceberg/actions/runs/7475896517/job/20353547835?pr=9455). Previously SparkTable used only `name` intentionally according to [the original code's comment](https://github.com/apache/iceberg/blob/53a1c8671dd1b9b93f4a857230008c812d79ddbf/spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/source/SparkTable.java#L408), to invalidate cache when needed such as [`rollback_to_snapshot`](https://iceberg.apache.org/docs/1.3.1/spark-procedures/#rollback_to_snapshot) (_This procedure invalidates all cached Spark plans that reference the affected table._). However, this patch introduced effectiveSnapshotId, it can't invalidate cache in this case because the altered table will have another snapshot id. - Cache table at snapshot 1 - Alter table, now it has snapshot 2 - Try to invalidate the cache for the table, but they have different snapshot ids(cached plan's id: 1 != new plan's id: 2) so it cannot be invalidated. Therefore, I think it's better to use my previous approach, checking `name`, `branch`, and `snapshotId` altogether. Whenever `snapshotId` or `branch` is supplied, it points to a fixed snapshot point to the table. Thus we don't need to invalidate cache even if the table is modified. In general cases, it acts like before, we can invalidate the cache when necessary. The downside of this approach is that we may lose optimization chances. The optimizer doesn't know both sub-queries are the same thing, so it will fetch both of them, but that **does not affect** the query result. ```sql SELECT * FROM iceberg_except_test UNION SELECT * FROM iceberg_except_test VERSION AS OF '2024-01-01'; -- optimized plan will be == Optimized Logical Plan == Aggregate [id#30, a#31, b#32], [id#30, a#31, b#32] +- Union false, false :- RelationV2[id#30, a#31, b#32] local.iceberg_except_test +- RelationV2[id#33, a#34, b#35] local.iceberg_except_test -- but the result will be > 1 b 2024-01-01 00:00:00 ``` What do you think? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org