wooyeong commented on PR #9455:
URL: https://github.com/apache/iceberg/pull/9455#issuecomment-1886170959

   I need your opinion regarding [failed 
tests](https://github.com/apache/iceberg/actions/runs/7475896517/job/20353547835?pr=9455).
   
   Previously SparkTable used only `name` intentionally according to [the 
original code's 
comment](https://github.com/apache/iceberg/blob/53a1c8671dd1b9b93f4a857230008c812d79ddbf/spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/source/SparkTable.java#L408),
 to invalidate cache when needed such as 
[`rollback_to_snapshot`](https://iceberg.apache.org/docs/1.3.1/spark-procedures/#rollback_to_snapshot)
 (_This procedure invalidates all cached Spark plans that reference the 
affected table._).
   
   However, this patch introduced effectiveSnapshotId, it can't invalidate 
cache in this case because the altered table will have another snapshot id.
   
   - Cache table at snapshot 1
   - Alter table, now it has snapshot 2
   - Try to invalidate the cache for the table, but they have different 
snapshot ids(cached plan's id: 1 != new plan's id: 2) so it cannot be 
invalidated.
   
   Therefore, I think it's better to use my previous approach, checking `name`, 
`branch`, and `snapshotId` altogether.
   
   Whenever `snapshotId` or `branch` is supplied, it points to a fixed snapshot 
point to the table. Thus we don't need to invalidate cache even if the table is 
modified. In general cases, it acts like before, we can invalidate the cache 
when necessary.
   
   The downside of this approach is that we may lose optimization chances. The 
optimizer doesn't know both sub-queries are the same thing, so it will fetch 
both of them, but that **does not affect** the query result.
   
   ```sql
   SELECT * FROM iceberg_except_test
   UNION
   SELECT * FROM iceberg_except_test
   VERSION AS OF '2024-01-01';
   
   -- optimized plan will be
   == Optimized Logical Plan ==
   Aggregate [id#30, a#31, b#32], [id#30, a#31, b#32]
   +- Union false, false
      :- RelationV2[id#30, a#31, b#32] local.iceberg_except_test
      +- RelationV2[id#33, a#34, b#35] local.iceberg_except_test
   
   -- but the result will be
   > 1       b       2024-01-01 00:00:00
   ```
   
   What do you think?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Reply via email to