rdblue commented on code in PR #8982: URL: https://github.com/apache/iceberg/pull/8982#discussion_r1687239911
##########
format/spec.md:
##########
@@ -1370,3 +1370,16 @@ Writing v2 metadata:
 * `sort_columns` was removed

 Note that these requirements apply when writing data to a v2 table. Tables that are upgraded from v1 may contain metadata that does not follow these requirements. Implementations should remain backward-compatible with v1 metadata requirements.
+
+## Appendix F: Implementation Notes
+
+This section covers topics not required by the specification but recommendations for systems implementing the Iceberg specification
+to help maintain a uniform experience.
+
+### Point in Time Reads (Time Travel)
+
+Iceberg supports two types of histories for tables. A history of previous "current snapshots" stored in ["snapshot-log" table metadata](#table-metadata-fields) and [parent-child lineage stored in "snapshots"](#table-metadata-fields). These two histories
+might indicate different snapshot IDs for a specific timestamp. The discrepancies can be caused by a variety of table operations (e.g. updating the `current-snapshot-id` of the table).
+
+When processing point in time queries the Iceberg community has chosen to use "snapshot-log" metadata to lookup the table state

Review Comment:
   This spec is independent of the REST catalog protocol. The protocol covers how to exchange table information covered by this spec, but this spec covers how to track that information and, in this case, recommendations for how that information is used.

   In the context of your suggestion, this assumes that "when the catalog makes the snapshot history available in the metadata JSON" is all the time. It is always true because this spec defines the metadata JSON.

   I think it is valuable to say that engines are encouraged to use the information from `snapshot-log` for time travel by timestamp so that the results match what a query would have seen at that time. We made that choice for Spark because we think that is what users expect.
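
To make the difference between the two histories concrete, here is a minimal Python sketch. It is not the Iceberg or Spark implementation: the helper names `snapshot_id_as_of_log` and `snapshot_id_as_of_lineage` are hypothetical, the metadata is a made-up, trimmed-down dictionary that borrows only the `snapshot-log`, `snapshots`, and `current-snapshot-id` field names from the spec, and the log is assumed to be ordered oldest first. The example models a rollback, one of the operations that can make the two histories disagree.

```python
from typing import Optional

# Hypothetical, trimmed-down table metadata. Field names follow the spec's
# table metadata fields; the IDs and timestamps are invented for illustration.
metadata = {
    "current-snapshot-id": 1,
    "snapshots": [
        {"snapshot-id": 1, "parent-snapshot-id": None, "timestamp-ms": 1000},
        {"snapshot-id": 2, "parent-snapshot-id": 1, "timestamp-ms": 2000},
    ],
    # Snapshot 2 was current from t=2000 until a rollback to snapshot 1 at t=3000.
    "snapshot-log": [
        {"snapshot-id": 1, "timestamp-ms": 1000},
        {"snapshot-id": 2, "timestamp-ms": 2000},
        {"snapshot-id": 1, "timestamp-ms": 3000},
    ],
}


def snapshot_id_as_of_log(metadata: dict, timestamp_ms: int) -> Optional[int]:
    """Return the snapshot that was current at timestamp_ms according to "snapshot-log"."""
    result = None
    for entry in metadata["snapshot-log"]:  # assumption: ordered oldest first
        if entry["timestamp-ms"] <= timestamp_ms:
            result = entry["snapshot-id"]
    return result


def snapshot_id_as_of_lineage(metadata: dict, timestamp_ms: int) -> Optional[int]:
    """Walk parent links from the current snapshot and return the first
    ancestor committed at or before timestamp_ms."""
    by_id = {s["snapshot-id"]: s for s in metadata["snapshots"]}
    current = by_id.get(metadata["current-snapshot-id"])
    while current is not None:
        if current["timestamp-ms"] <= timestamp_ms:
            return current["snapshot-id"]
        current = by_id.get(current["parent-snapshot-id"])
    return None


# At t=2500 a query would have read snapshot 2, which only the log remembers.
print(snapshot_id_as_of_log(metadata, 2500))      # 2
print(snapshot_id_as_of_lineage(metadata, 2500))  # 1
```

In this sketch the log-based lookup returns the snapshot a reader would actually have seen at that time, which is the behavior the comment recommends for time travel by timestamp; the lineage-based lookup only sees ancestors of the snapshot that is current now.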