[I] [feature request] Allow engines to time travel [iceberg-python]

via GitHub Fri, 12 Apr 2024 09:07:10 -0700


kevinjqliu opened a new issue, #600:
URL: https://github.com/apache/iceberg-python/issues/600


   ### Feature Request / Improvement
   
   When engines such as Daft read from the `Table` object (see, for example, Polars' 
[scan_iceberg](https://github.com/pola-rs/polars/blob/py-0.20.19/py-polars/polars/io/iceberg.py#L42-L46)), 
it would be great if PyIceberg handled time travel transparently.
   
   For example, to query an Iceberg table as of a specific commit or timestamp, we 
could use PyIceberg to time travel to the matching snapshot ID or timestamp and 
then pass the result to the engine.
   
   There are several options to achieve this:
   
   1. Construct a `Table` object from the metadata of a specific `Snapshot`, 
perhaps with a function like `Table.as_of(snapshot_id/timestamp) -> Table`. This 
would make time travel transparent to the engine.
   2. Pass the `Snapshot` object to the engine. The function 
`Table.snapshot_by_id -> Snapshot` already exists, and the `Snapshot` it returns 
represents a specific Iceberg commit. The engine would need to be able to read 
from both `Snapshot` and `Table`.
   
   Happy to explore other options as well.
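
To make option 1 concrete, here is a rough sketch of how an `as_of` helper might behave. The `Snapshot` and `Table` classes below are simplified, hypothetical stand-ins rather than PyIceberg's real classes, `engine_scan` is a made-up placeholder for an engine entry point, and the keyword names `snapshot_id` and `timestamp_ms` are assumptions. The idea is only that `as_of` returns a new table pinned to one snapshot, so the engine needs no time-travel logic of its own.

```python
from __future__ import annotations

from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical stand-in for an Iceberg snapshot (one commit)."""
    snapshot_id: int
    timestamp_ms: int


@dataclass(frozen=True)
class Table:
    """Hypothetical stand-in for a table; not PyIceberg's real class."""
    snapshots: Tuple[Snapshot, ...]  # ordered oldest to newest
    current_snapshot_id: Optional[int]

    def snapshot_by_id(self, snapshot_id: int) -> Optional[Snapshot]:
        # Linear lookup; returns None when the ID is unknown.
        return next((s for s in self.snapshots if s.snapshot_id == snapshot_id), None)

    def as_of(self, *, snapshot_id: Optional[int] = None,
              timestamp_ms: Optional[int] = None) -> "Table":
        """Return a copy of this table pinned to a single snapshot."""
        if (snapshot_id is None) == (timestamp_ms is None):
            raise ValueError("pass exactly one of snapshot_id or timestamp_ms")
        if snapshot_id is not None:
            snap = self.snapshot_by_id(snapshot_id)
        else:
            # Latest snapshot committed at or before the requested time.
            older = [s for s in self.snapshots if s.timestamp_ms <= timestamp_ms]
            snap = max(older, key=lambda s: s.timestamp_ms, default=None)
        if snap is None:
            raise ValueError("no matching snapshot")
        # The original table is left untouched; only the copy is pinned.
        return replace(self, current_snapshot_id=snap.snapshot_id)


def engine_scan(table: Table) -> Optional[int]:
    """Hypothetical engine: it only ever reads the table's current snapshot."""
    return table.current_snapshot_id


table = Table(
    snapshots=(Snapshot(1, 1_000), Snapshot(2, 2_000), Snapshot(3, 3_000)),
    current_snapshot_id=3,
)
pinned_by_id = table.as_of(snapshot_id=2)
pinned_by_time = table.as_of(timestamp_ms=2_500)
```

With this shape, the engine keeps accepting a plain `Table`, which is the main appeal of option 1 over option 2, where the engine would have to handle both `Snapshot` and `Table` inputs.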
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
