szehon-ho commented on code in PR #11041:
URL: https://github.com/apache/iceberg/pull/11041#discussion_r1851160036


##########
format/view-spec.md:
##########
@@ -158,6 +175,57 @@ Each entry in `version-log` is a struct with the following 
fields:
 | _required_  | `timestamp-ms` | Timestamp when the view's 
`current-version-id` was updated (ms from epoch) |
 | _required_  | `version-id`   | ID that `current-version-id` was set to |
 
+#### Partial identifier
+
+The partial identifier holds a reference, containing a namespace and a name, 
of a table or view in the catalog.
+
+| Requirement | Field name     | Description |
+|-------------|----------------|-------------|
+| _required_  | `namespace`   | A list of namespace levels |
+| _required_  | `name`   | A string specifying the name of the table/view |
+
+### Materialized View Metadata stored as part of the Table Metadata
+
+To be able to determine the freshness of the precomputed data, additional 
metadata is stored as part of the storage table.

Review Comment:
   I would simplify this.  ( no need to state 'additional metadata' or 
'introduced' as its obvious from the context)?
   
   ```A snapshot property called 'refresh state' in the storage table 
determines the freshness of the precomputed data in that table. ```
   
   Can we also link to snapshot property section of the table spec?



##########
format/view-spec.md:
##########
@@ -81,9 +94,13 @@ Each version in `versions` is a struct with the following 
fields:
 | _required_  | `representations`   | A list of 
[representations](#representations) for the view definition         |
 | _optional_  | `default-catalog`   | Catalog name to use when a reference in 
the SELECT does not contain a catalog |
 | _required_  | `default-namespace` | Namespace to use when a reference in the 
SELECT is a single identifier        |
+| _optional_  | `storage-table`   | A [partial 
identifier](#partial-identifier) of the storage table |

Review Comment:
   Nit: align the column with previous line?



##########
format/view-spec.md:
##########
@@ -158,6 +175,57 @@ Each entry in `version-log` is a struct with the following 
fields:
 | _required_  | `timestamp-ms` | Timestamp when the view's 
`current-version-id` was updated (ms from epoch) |
 | _required_  | `version-id`   | ID that `current-version-id` was set to |
 
+#### Partial identifier
+
+The partial identifier holds a reference, containing a namespace and a name, 
of a table or view in the catalog.
+
+| Requirement | Field name     | Description |

Review Comment:
   Sorry if I forgot the discussion about this, but why not have UUID here as 
well?  As its UUID for the source table reference.



##########
format/view-spec.md:
##########
@@ -158,6 +175,57 @@ Each entry in `version-log` is a struct with the following 
fields:
 | _required_  | `timestamp-ms` | Timestamp when the view's 
`current-version-id` was updated (ms from epoch) |
 | _required_  | `version-id`   | ID that `current-version-id` was set to |
 
+#### Partial identifier
+
+The partial identifier holds a reference, containing a namespace and a name, 
of a table or view in the catalog.
+
+| Requirement | Field name     | Description |
+|-------------|----------------|-------------|
+| _required_  | `namespace`   | A list of namespace levels |
+| _required_  | `name`   | A string specifying the name of the table/view |
+
+### Materialized View Metadata stored as part of the Table Metadata
+
+To be able to determine the freshness of the precomputed data, additional 
metadata is stored as part of the storage table.
+
+For that the additional field "refresh-state" is introduced as an opaque 
record in the table snapshot summary.
+
+| Requirement | Field name     | Description |
+|-------------|----------------|-------------|
+| _required_  | `refresh-state` | A [refresh state](#refresh-state) record 
stored as a JSON-encoded string | 
+
+#### Refresh state
+
+The refresh state record captures the state of all source tables and source 
views in the fully expanded query tree of the materialized view, including 
indirect references. It has the following fields:

Review Comment:
   Can we define 'indirect references'?



##########
format/view-spec.md:
##########
@@ -42,12 +42,24 @@ An atomic swap of one view metadata file for another 
provides the basis for maki
 
 Writers create view metadata files optimistically, assuming that the current 
metadata location will not be changed before the writer's commit. Once a writer 
has created an update, it commits by swapping the view's metadata file pointer 
from the base location to the new location.
 
+### Materialized Views
+
+Views have the option to be turned into materialized views by precomputing the 
data from the view query.
+When queried, materialized views return the precomputed data, shifting the 
cost of query execution to the precomputation step.
+
+Iceberg materialized views are implemented as a combination of an Iceberg view 
and an underlying Iceberg table, known as the storage table, which stores the 
precomputed data.
+The metadata for a materialized view extends the common view metadata, adding 
a pointer to the precomputed data and freshness information to determine if the 
data is still fresh. 
+The storage table can be in the states of "fresh", "stale" or "invalid".

Review Comment:
   I also think we should explicitly explain these states, if we mention them.



##########
format/view-spec.md:
##########
@@ -42,12 +42,25 @@ An atomic swap of one view metadata file for another 
provides the basis for maki
 
 Writers create view metadata files optimistically, assuming that the current 
metadata location will not be changed before the writer's commit. Once a writer 
has created an update, it commits by swapping the view's metadata file pointer 
from the base location to the new location.
 
+### Materialized Views
+
+Views have the option to be turned into materialized views by precomputing the 
data from the view query.
+When queried, materialized views return the precomputed data, shifting the 
cost of query execution to the precomputation step.
+
+Iceberg materialized views are implemented as a combination of an Iceberg view 
and an underlying Iceberg table, known as the storage table, which stores the 
precomputed data.
+The metadata for a materialized view extends the common view metadata, adding 
a pointer to the precomputed data and freshness information to determine if the 
data is still fresh. 

Review Comment:
   Its a bit of repeated information from down below.  If this is more of an 
intro, high level paragraph, how about just
   
   ```Materialized views maintain additional metadata to track its storage 
table and its freshness.```



##########
format/view-spec.md:
##########
@@ -42,12 +42,25 @@ An atomic swap of one view metadata file for another 
provides the basis for maki
 
 Writers create view metadata files optimistically, assuming that the current 
metadata location will not be changed before the writer's commit. Once a writer 
has created an update, it commits by swapping the view's metadata file pointer 
from the base location to the new location.
 
+### Materialized Views
+
+Views have the option to be turned into materialized views by precomputing the 
data from the view query.

Review Comment:
   Sounds a bit over complicated to mention `option to turn into`, why not just 
   ``` Materialized views are a type of view that precompute the data from the 
view query```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Reply via email to