rdblue commented on code in PR #8982:
URL: https://github.com/apache/iceberg/pull/8982#discussion_r1687239911
##########
format/spec.md:
##########
@@ -1370,3 +1370,16 @@ Writing v2 metadata:
* `sort_columns` was removed
Note that these requirements apply when writing data to a v2 table. Tables
that are upgraded from v1 may contain metadata that does not follow these
requirements. Implementations should remain backward-compatible with v1
metadata requirements.
+
+## Appendix F: Implementation Notes
+
+This section covers topics not required by the specification but recommendations for systems implementing the Iceberg specification
+to help maintain a uniform experience.
+
+### Point in Time Reads (Time Travel)
+
+Iceberg supports two types of histories for tables. A history of previous "current snapshots" stored in ["snapshot-log" table metadata](#table-metadata-fields) and [parent-child lineage stored in "snapshots"](#table-metadata-fields). These two histories
+might indicate different snapshot IDs for a specific timestamp. The discrepancies can be caused by a variety of table operations (e.g. updating the `current-snapshot-id` of the table).
+
+When processing point in time queries the Iceberg community has chosen to use "snapshot-log" metadata to lookup the table state
Review Comment:
This spec is independent of the REST catalog protocol. The protocol covers
how to exchange table information covered by this spec, but this spec covers
how to track that information and, in this case, recommendations for how that
information is used. In the context of your suggestion, this assumes that the
condition "when the catalog makes the snapshot history available in the
metadata JSON" holds all the time. It always holds because this spec defines
the metadata JSON.
I think it is valuable to say that engines are encouraged to use the
information from `snapshot-log` for time travel by timestamp so that the
results match what a query would have seen at that time. We made that choice
for Spark because we think that is what users expect.
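
To make the recommendation concrete, here is a minimal sketch of timestamp-based
time travel resolved from `snapshot-log`. It is an illustration only, not the
Spark or reference implementation. It assumes the log has been parsed from the
metadata JSON into a list of dicts with `timestamp-ms` and `snapshot-id` keys;
the function name `snapshot_id_as_of` and the sample values are hypothetical.

```python
from bisect import bisect_right
from typing import Optional


def snapshot_id_as_of(snapshot_log: list[dict], timestamp_ms: int) -> Optional[int]:
    """Return the snapshot ID that was current at timestamp_ms.

    Returns None when the timestamp is earlier than the first log entry.
    Assumption: each entry has "timestamp-ms" and "snapshot-id" keys.
    """
    # Sort defensively by timestamp; sorted() is stable, so entries with equal
    # timestamps keep their original log order.
    entries = sorted(snapshot_log, key=lambda e: e["timestamp-ms"])
    times = [e["timestamp-ms"] for e in entries]

    # Index of the first entry strictly after the requested timestamp.
    idx = bisect_right(times, timestamp_ms)
    if idx == 0:
        return None
    # The entry just before it is the one that was current at that time.
    return entries[idx - 1]["snapshot-id"]


# Hypothetical log: snapshot 2 was committed at t=200, then the table's
# current snapshot was set back to snapshot 1 at t=300.
log = [
    {"timestamp-ms": 100, "snapshot-id": 1},
    {"timestamp-ms": 200, "snapshot-id": 2},
    {"timestamp-ms": 300, "snapshot-id": 1},
]
as_of_250 = snapshot_id_as_of(log, 250)
as_of_350 = snapshot_id_as_of(log, 350)
as_of_50 = snapshot_id_as_of(log, 50)
```

In this hypothetical log, a query as of t=250 resolves to snapshot 2, which is
what a reader would have seen at that time, even though snapshot 2 is no longer
an ancestor of the current snapshot after the change at t=300.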
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]