gabeiglio opened a new issue, #16226:
URL: https://github.com/apache/iceberg/issues/16226
### Feature Request / Improvement
We would like to support a **session-level default timestamp** for
time-travel reads of Iceberg tables in Spark, so that all table reads in a
Spark session automatically use the same historical snapshot without modifying
SQL queries or Java/Scala and Python Spark jobs.
Some concrete use cases:
- Replay historical runs during a migration (e.g., from one Spark version
or cluster to another).
- Run point-in-time audits where multiple tables and/or views must be read at
a consistent timestamp.
- Investigate issues by reproducing historical conditions across a
multi-table pipeline.
In all of these scenarios, being able to set a **single session-level
timestamp** is far more robust and maintainable than attempting to thread
`TIMESTAMP AS OF` through every individual query, especially when views and
higher-level abstractions are involved.
The proposal is to add a session-level time-travel Spark conf:
```scala
spark.conf.set("spark.sql.iceberg.read.as-of-timestamp", "1704067200000")
```
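For reference, the value above reads as epoch milliseconds. A minimal sketch of deriving it from an ISO-8601 instant, assuming the proposed conf takes epoch milliseconds in UTC like the existing `as-of-timestamp` read option (the `AsOfTimestamp` object and `toConfValue` helper are hypothetical names used only for illustration):

```scala
import java.time.Instant

object AsOfTimestamp {
  // Assumption: the proposed conf expects epoch milliseconds (UTC) as a string.
  // Converts an ISO-8601 instant such as "2024-01-01T00:00:00Z" into that form.
  def toConfValue(isoInstant: String): String =
    Instant.parse(isoInstant).toEpochMilli.toString
}

// "2024-01-01T00:00:00Z" corresponds to the value used in this issue.
// AsOfTimestamp.toConfValue("2024-01-01T00:00:00Z") == "1704067200000"
```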
Once set, any read of an Iceberg table in that session should automatically
use the specified historical snapshot, without requiring changes to the
query/code:
```scala
spark.sql("SELECT * FROM orders")
spark.sql("SELECT * FROM customers")
```
Both queries should read their tables as of the same timestamp, and the
behavior should also propagate through views that ultimately read from Iceberg
tables.
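For contrast, here is a rough sketch of how a job could approximate this today with a wrapper around the existing per-read `as-of-timestamp` option. It is not the approach taken in the PR. The `SessionTimeTravel` object and its helpers are hypothetical, and the conf key is the one proposed in this issue, which this sketch reads manually rather than assuming Iceberg interprets it:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object SessionTimeTravel {
  // Conf key proposed in this issue; here it is only read by this wrapper.
  val ConfKey = "spark.sql.iceberg.read.as-of-timestamp"

  // Maps the optional session value to Iceberg's existing per-read option.
  def timeTravelOptions(asOfMillis: Option[String]): Map[String, String] =
    asOfMillis.map(ms => Map("as-of-timestamp" -> ms)).getOrElse(Map.empty)

  // Every table read has to go through this helper for the timestamp to apply.
  def readTable(spark: SparkSession, table: String): DataFrame =
    spark.read
      .format("iceberg")
      .options(timeTravelOptions(spark.conf.getOption(ConfKey)))
      .load(table)
}
```

A wrapper like this only covers DataFrame reads that are routed through it. Plain `spark.sql(...)` queries and views are unaffected, which is the gap the session-level conf is meant to close.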
Here is a [PR](https://github.com/apache/iceberg/pull/16205) implementing
this feature for Spark 4.1. It basically mirrors `ResolveBranch.scala`, but
instead of resolving a branch name, it resolves a default timestamp.
cc: @rdblue @danielcweeks, as this was mentioned in a previous sync a few
months ago, and @RussellSpitzer, as you expressed interest in the
[issue](https://github.com/apache/iceberg/issues/15163#issuecomment-3811863152).
### Query engine
Spark
### Willingness to contribute
- [x] I can contribute this improvement/feature independently
- [ ] I would be willing to contribute this improvement/feature with
guidance from the Iceberg community
- [ ] I cannot contribute this improvement/feature at this time