yyanyy commented on code in PR #11041:
URL: https://github.com/apache/iceberg/pull/11041#discussion_r2082737494


##########
format/view-spec.md:
##########
@@ -160,6 +179,57 @@ Each entry in `version-log` is a struct with the following 
fields:
 | _required_  | `timestamp-ms` | Timestamp when the view's 
`current-version-id` was updated (ms from epoch) |
 | _required_  | `version-id`   | ID that `current-version-id` was set to |
 
+#### Storage Table Identifier
+
+The table identifier for the storage table that stores the precomputed results.
+
+| Requirement | Field name     | Description |
+|-------------|----------------|-------------|
+| _optional_  | `catalog`      | A string specifying the name of the catalog. 
If set to `null`, the catalog is the same as the view's catalog |
+| _required_  | `namespace`    | A list of strings for namespace levels |
+| _required_  | `name`         | A string specifying the name of the 
table/view |

Review Comment:
   sorry I'm late to this discussion and this might be discussed/could be a 
stupid question - do we think if it's worth also include the snapshot id of the 
storage table, so that under circumstances where the storage table was updated 
but the materialized view was not or vice versa (race condition or commit 
partial succeeded), there's a reference link in both way (I know storage table 
already points to view version id under "refresh state"), which might be useful 
especially when view/table undergo version upgrades that could result in e.g. 
schema mismatch between the view and the underlying storage table? 



##########
format/view-spec.md:
##########
@@ -160,6 +179,57 @@ Each entry in `version-log` is a struct with the following 
fields:
 | _required_  | `timestamp-ms` | Timestamp when the view's 
`current-version-id` was updated (ms from epoch) |
 | _required_  | `version-id`   | ID that `current-version-id` was set to |
 
+#### Storage Table Identifier
+
+The table identifier for the storage table that stores the precomputed results.
+
+| Requirement | Field name     | Description |
+|-------------|----------------|-------------|
+| _optional_  | `catalog`      | A string specifying the name of the catalog. 
If set to `null`, the catalog is the same as the view's catalog |

Review Comment:
   +1 on avoid a different `catalog` field that may introduce engine 
interoperability challenges that I personally feel quite unnecessary, unless 
there's a strong use case



##########
format/view-spec.md:
##########
@@ -160,6 +179,57 @@ Each entry in `version-log` is a struct with the following 
fields:
 | _required_  | `timestamp-ms` | Timestamp when the view's 
`current-version-id` was updated (ms from epoch) |
 | _required_  | `version-id`   | ID that `current-version-id` was set to |
 
+#### Storage Table Identifier
+
+The table identifier for the storage table that stores the precomputed results.
+
+| Requirement | Field name     | Description |
+|-------------|----------------|-------------|
+| _optional_  | `catalog`      | A string specifying the name of the catalog. 
If set to `null`, the catalog is the same as the view's catalog |
+| _required_  | `namespace`    | A list of strings for namespace levels |
+| _required_  | `name`         | A string specifying the name of the 
table/view |
+
+### Storage table metadata
+
+This section describes additional metadata for the storage table that 
supplements the regular table metadata and is required for materialzied views.

Review Comment:
   nit - typo in `materialzied`



##########
format/view-spec.md:
##########
@@ -160,6 +179,57 @@ Each entry in `version-log` is a struct with the following 
fields:
 | _required_  | `timestamp-ms` | Timestamp when the view's 
`current-version-id` was updated (ms from epoch) |
 | _required_  | `version-id`   | ID that `current-version-id` was set to |
 
+#### Storage Table Identifier
+
+The table identifier for the storage table that stores the precomputed results.
+
+| Requirement | Field name     | Description |
+|-------------|----------------|-------------|
+| _optional_  | `catalog`      | A string specifying the name of the catalog. 
If set to `null`, the catalog is the same as the view's catalog |
+| _required_  | `namespace`    | A list of strings for namespace levels |
+| _required_  | `name`         | A string specifying the name of the 
table/view |
+
+### Storage table metadata

Review Comment:
   I wonder if it's still worth mentioning the fields that are only applicable 
to materialized view in appendix of table spec, e.g.  
https://iceberg.apache.org/spec/#optional-snapshot-summary-fields and link to 
this page, so it is still searchable in the table spec page



##########
format/view-spec.md:
##########
@@ -42,12 +42,28 @@ An atomic swap of one view metadata file for another 
provides the basis for maki
 
 Writers create view metadata files optimistically, assuming that the current 
metadata location will not be changed before the writer's commit. Once a writer 
has created an update, it commits by swapping the view's metadata file pointer 
from the base location to the new location.
 
+### Materialized Views
+
+Materialized views are a type of view with precomputed results from the view 
query stored as a table.
+When queried, engines may return the precomputed data for the materialized 
views, shifting the cost of query execution to the precomputation step.
+
+Iceberg materialized views are implemented as a combination of an Iceberg view 
and an underlying Iceberg table, known as the storage table, which stores the 
precomputed data.
+The metadata for a materialized view extends the Iceberg view metadata, adding 
a pointer to the precomputed data and refresh information to determine if the 
data is still fresh. 
+The refresh information is composed of data about the so-called "source 
tables", which are the tables referenced in the query definition of the 
materialized view. 
+The storage table can be in the states of "fresh", "stale" or "invalid", which 
are determined from the following situations:
+* **fresh** -- The `snapshot_id`s of the last refresh operation match the 
current `snapshot_id`s of the source tables.
+* **stale** -- The `snapshot_id`s do not match, indicating that a refresh 
operation needs to be performed to capture the latest source table changes.

Review Comment:
   it seems that from the rest of the spec, nested view is indeed a valid use 
case, for materialized view referring other materialized views or even common 
views. However it seems to me that the rest of the spec didn't mention one of 
these states being recorded anywhere in the spec, which I would also agree to 
not include since it feels very challenging to ensure this info could be 
trustworthy especially when there are multiple engines doing reads/writes on 
the same data. With this, do we mention these `fresh`/`stale`/`invalid` here 
mostly for establishing a nomenclature for view users? If so I wonder if it's 
worth explicitly mentioning that to avoid the user expecting one of these three 
states to be recorded in their view metadata. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Reply via email to