szehon-ho commented on code in PR #11041: URL: https://github.com/apache/iceberg/pull/11041#discussion_r1851160036
########## format/view-spec.md: ########## @@ -158,6 +175,57 @@ Each entry in `version-log` is a struct with the following fields: | _required_ | `timestamp-ms` | Timestamp when the view's `current-version-id` was updated (ms from epoch) | | _required_ | `version-id` | ID that `current-version-id` was set to | +#### Partial identifier + +The partial identifier holds a reference, containing a namespace and a name, of a table or view in the catalog. + +| Requirement | Field name | Description | +|-------------|----------------|-------------| +| _required_ | `namespace` | A list of namespace levels | +| _required_ | `name` | A string specifying the name of the table/view | + +### Materialized View Metadata stored as part of the Table Metadata + +To be able to determine the freshness of the precomputed data, additional metadata is stored as part of the storage table. Review Comment: I would simplify this. ( no need to state 'additional metadata' or 'introduced' as its obvious from the context)? ```A snapshot property called 'refresh state' in the storage table determines the freshness of the precomputed data in that table. ``` Can we also link to snapshot property section of the table spec? ########## format/view-spec.md: ########## @@ -81,9 +94,13 @@ Each version in `versions` is a struct with the following fields: | _required_ | `representations` | A list of [representations](#representations) for the view definition | | _optional_ | `default-catalog` | Catalog name to use when a reference in the SELECT does not contain a catalog | | _required_ | `default-namespace` | Namespace to use when a reference in the SELECT is a single identifier | +| _optional_ | `storage-table` | A [partial identifier](#partial-identifier) of the storage table | Review Comment: Nit: align the column with previous line? ########## format/view-spec.md: ########## @@ -158,6 +175,57 @@ Each entry in `version-log` is a struct with the following fields: | _required_ | `timestamp-ms` | Timestamp when the view's `current-version-id` was updated (ms from epoch) | | _required_ | `version-id` | ID that `current-version-id` was set to | +#### Partial identifier + +The partial identifier holds a reference, containing a namespace and a name, of a table or view in the catalog. + +| Requirement | Field name | Description | Review Comment: Sorry if I forgot the discussion about this, but why not have UUID here as well? As its UUID for the source table reference. ########## format/view-spec.md: ########## @@ -158,6 +175,57 @@ Each entry in `version-log` is a struct with the following fields: | _required_ | `timestamp-ms` | Timestamp when the view's `current-version-id` was updated (ms from epoch) | | _required_ | `version-id` | ID that `current-version-id` was set to | +#### Partial identifier + +The partial identifier holds a reference, containing a namespace and a name, of a table or view in the catalog. + +| Requirement | Field name | Description | +|-------------|----------------|-------------| +| _required_ | `namespace` | A list of namespace levels | +| _required_ | `name` | A string specifying the name of the table/view | + +### Materialized View Metadata stored as part of the Table Metadata + +To be able to determine the freshness of the precomputed data, additional metadata is stored as part of the storage table. + +For that the additional field "refresh-state" is introduced as an opaque record in the table snapshot summary. + +| Requirement | Field name | Description | +|-------------|----------------|-------------| +| _required_ | `refresh-state` | A [refresh state](#refresh-state) record stored as a JSON-encoded string | + +#### Refresh state + +The refresh state record captures the state of all source tables and source views in the fully expanded query tree of the materialized view, including indirect references. It has the following fields: Review Comment: Can we define 'indirect references'? ########## format/view-spec.md: ########## @@ -42,12 +42,24 @@ An atomic swap of one view metadata file for another provides the basis for maki Writers create view metadata files optimistically, assuming that the current metadata location will not be changed before the writer's commit. Once a writer has created an update, it commits by swapping the view's metadata file pointer from the base location to the new location. +### Materialized Views + +Views have the option to be turned into materialized views by precomputing the data from the view query. +When queried, materialized views return the precomputed data, shifting the cost of query execution to the precomputation step. + +Iceberg materialized views are implemented as a combination of an Iceberg view and an underlying Iceberg table, known as the storage table, which stores the precomputed data. +The metadata for a materialized view extends the common view metadata, adding a pointer to the precomputed data and freshness information to determine if the data is still fresh. +The storage table can be in the states of "fresh", "stale" or "invalid". Review Comment: I also think we should explicitly explain these states, if we mention them. ########## format/view-spec.md: ########## @@ -42,12 +42,25 @@ An atomic swap of one view metadata file for another provides the basis for maki Writers create view metadata files optimistically, assuming that the current metadata location will not be changed before the writer's commit. Once a writer has created an update, it commits by swapping the view's metadata file pointer from the base location to the new location. +### Materialized Views + +Views have the option to be turned into materialized views by precomputing the data from the view query. +When queried, materialized views return the precomputed data, shifting the cost of query execution to the precomputation step. + +Iceberg materialized views are implemented as a combination of an Iceberg view and an underlying Iceberg table, known as the storage table, which stores the precomputed data. +The metadata for a materialized view extends the common view metadata, adding a pointer to the precomputed data and freshness information to determine if the data is still fresh. Review Comment: Its a bit of repeated information from down below. If this is more of an intro, high level paragraph, how about just ```Materialized views maintain additional metadata to track its storage table and its freshness.``` ########## format/view-spec.md: ########## @@ -42,12 +42,25 @@ An atomic swap of one view metadata file for another provides the basis for maki Writers create view metadata files optimistically, assuming that the current metadata location will not be changed before the writer's commit. Once a writer has created an update, it commits by swapping the view's metadata file pointer from the base location to the new location. +### Materialized Views + +Views have the option to be turned into materialized views by precomputing the data from the view query. Review Comment: Sounds a bit over complicated to mention `option to turn into`, why not just ``` Materialized views are a type of view that precompute the data from the view query``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org