gabeiglio opened a new issue, #16226:
URL: https://github.com/apache/iceberg/issues/16226

   ### Feature Request / Improvement
   
   We would like to support a **session-level default timestamp** for 
time-travel reads of Iceberg tables in Spark, so that all table reads in a 
Spark session automatically use the same historical snapshot without modifying 
SQL queries or Java/Scala and Python Spark jobs.
   
   Some concrete use cases:
   
      - Replay historical runs during migration (e.g., from one Spark version 
or cluster to another)
      - Point-in-time audits where multiple tables and/or views must be read at 
a consistent timestamp
      - Investigate issues by reproducing historical conditions across a 
multi-table pipeline.
   
   In all these scenarios, being able to set a **single session-level 
timestamp** is far more robust and maintainable than attempting to thread 
`TIMESTAMP AS OF` through every individual query, especially when views and 
higher-level abstractions are involved.
   
   The proposal is to add a session-level Spark conf for time travel:
   
   ```scala
   spark.conf.set("spark.sql.iceberg.read.as-of-timestamp", "1704067200000")
   ```
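   
   The example value looks like epoch milliseconds (1704067200000 is 
2024-01-01T00:00:00Z), which is the unit Iceberg's existing `as-of-timestamp` 
read option uses. Assuming the proposed conf follows the same convention, a 
minimal sketch for deriving the value from a UTC date-time could look like 
this (the names `asOf`, `asOfMillis`, and `confValue` are illustrative only):
   
   ```scala
   import java.time.{LocalDateTime, ZoneOffset}

   // Point in time to read "as of"; interpreting it in UTC is an assumption of this sketch.
   val asOf: LocalDateTime = LocalDateTime.of(2024, 1, 1, 0, 0, 0)

   // Convert to epoch milliseconds (assumed unit, matching the `as-of-timestamp` read option).
   val asOfMillis: Long = asOf.toInstant(ZoneOffset.UTC).toEpochMilli

   // Spark conf values are strings, so the number is passed as text.
   val confValue: String = asOfMillis.toString
   ```
   
   Here `confValue` is the string that would be passed to `spark.conf.set`.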
   
   Once set, any read of an Iceberg table in that session should automatically 
use the specified historical snapshot, without requiring changes to the 
query/code:
   
   ```scala
   spark.sql("SELECT * FROM orders")
   spark.sql("SELECT * FROM customers")
   ```
   
   Both of these should read as of the same timestamp, with each table 
resolving to the snapshot that was current at that time, and the behavior 
should also propagate through views that ultimately read from Iceberg tables.
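   
   For contrast, here is a rough sketch of what each read has to do today 
without a session-level default, using Iceberg's existing `as-of-timestamp` 
read option. The helper name `readAsOf` is hypothetical and not part of 
Iceberg or the linked PR:
   
   ```scala
   import org.apache.spark.sql.{DataFrame, SparkSession}

   // Hypothetical helper: read a single Iceberg table as of an epoch-millisecond timestamp.
   // Every read in a job has to go through a call like this to stay consistent.
   def readAsOf(spark: SparkSession, table: String, asOfMillis: Long): DataFrame =
     spark.read
       .format("iceberg")
       .option("as-of-timestamp", asOfMillis.toString)
       .load(table)
   ```
   
   A helper like this has to be applied at every call site, for example 
`readAsOf(spark, "orders", 1704067200000L)`, which is the per-read plumbing 
the session-level conf is meant to avoid.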
   
   Here is a [PR](https://github.com/apache/iceberg/pull/16205) implementing 
this feature for Spark 4.1. It basically mirrors `ResolveBranch.scala`, but 
instead of resolving a branch name, it resolves a default timestamp.
   
   cc: @rdblue @danielcweeks, as this was mentioned in a sync a few months 
ago, and @RussellSpitzer, as you expressed interest in the 
[issue](https://github.com/apache/iceberg/issues/15163#issuecomment-3811863152)
   
   ### Query engine
   
   Spark
   
   ### Willingness to contribute
   
   - [x] I can contribute this improvement/feature independently
   - [ ] I would be willing to contribute this improvement/feature with 
guidance from the Iceberg community
   - [ ] I cannot contribute this improvement/feature at this time


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

