szehon-ho commented on code in PR #11041:
URL: https://github.com/apache/iceberg/pull/11041#discussion_r1876688205


##########
format/view-spec.md:
##########
@@ -160,6 +179,56 @@ Each entry in `version-log` is a struct with the following 
fields:
 | _required_  | `timestamp-ms` | Timestamp when the view's 
`current-version-id` was updated (ms from epoch) |
 | _required_  | `version-id`   | ID that `current-version-id` was set to |
 
+#### Full identifier
+
+The full identifier holds a reference, containing a namespace and a name, of a 
table or view in the catalog.
+
+| Requirement | Field name     | Description |
+|-------------|----------------|-------------|
+| _optional_  | `catalog`      | A string specifying the name of the catalog. 
If set to `null`, the catalog is the same as the views' catalog |
+| _required_  | `namespace`    | A list of namespace levels |
+| _required_  | `name`         | A string specifying the name of the 
table/view |
+
+### Materialized View Metadata stored as part of the Table Metadata
+
+A property "refresh-state" is set on the table [snapshot 
summary](https://iceberg.apache.org/spec/#snapshots) to determine the freshness 
of the precomputed data of the storage table.
+
+| Requirement | Field name      | Description |
+|-------------|-----------------|-------------|
+| _required_  | `refresh-state` | A [refresh state](#refresh-state) record 
stored as a JSON-encoded string | 
+
+#### Refresh state
+
+The refresh state record captures the state of all source tables and source 
views in the fully expanded query tree of the materialized view, including 
indirect references. Indirect references are the tables/views that are not 
directly referenced in the query but are nested within other views. The refresh 
state has the following fields:
+
+| Requirement | Field name     | Description |
+|-------------|----------------|-------------|
+| _required_  | `refresh-version-id`         | The `version-id` of the 
materialized view when the refresh operation was performed  | 
+| _required_  | `source-table-states`        | A list of [source 
table](#source-table) records for all tables that are directly or indirectly 
referenced in the materialized view query |
+| _required_  | `source-view-states`         | A list of [source 
view](#source-view) records for all views that are directly or indirectly 
referenced in the materialized view query |
+| _required_  | `refresh-start-timestamp-ms` | A timestamp of when the refresh 
operation was started |
+
+#### Source table
+
+A source table record captures the state of a source table at the time of the 
last refresh operation.
+
+| Requirement | Field name     | Description |

Review Comment:
   If I understand @wmoustafa's comment on the mailing list correctly, there 
is some ambiguity about what to record here when the same table is expressed in 
different forms: (catalog.database.name), (database.name), or (name), either 
within the same SQL statement or across the different SQL representations of 
the same view. @stevenzwu, I wonder whether we should dictate a standard to 
follow in this case?
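
To make the ambiguity concrete, here is a minimal sketch of one possible normalization rule, not something the spec currently dictates. `FullIdentifier` mirrors the `catalog` / `namespace` / `name` fields from the hunk above; `resolve`, `view_catalog`, and `default_namespace` are hypothetical names, and the sketch assumes only the three dotted forms mentioned, with no quoting and no multi-level namespaces.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class FullIdentifier:
    """Mirrors the proposed `catalog`, `namespace`, and `name` fields."""
    catalog: Optional[str]          # None would mean "same catalog as the view"
    namespace: Tuple[str, ...]      # list of namespace levels
    name: str


def resolve(reference: str, view_catalog: str,
            default_namespace: Tuple[str, ...]) -> FullIdentifier:
    """Hypothetical rule: expand a dotted reference to one canonical form.

    Assumption: the reference is `name`, `database.name`, or
    `catalog.database.name`, with no quoted dots.
    """
    parts = reference.split(".")
    if len(parts) == 1:
        return FullIdentifier(view_catalog, default_namespace, parts[0])
    if len(parts) == 2:
        return FullIdentifier(view_catalog, (parts[0],), parts[1])
    if len(parts) == 3:
        return FullIdentifier(parts[0], (parts[1],), parts[2])
    raise ValueError(f"unsupported reference form: {reference!r}")


# All three spellings of the same table collapse to a single identifier.
forms = ["events", "db.events", "prod.db.events"]
resolved = {resolve(f, "prod", ("db",)) for f in forms}
```

With a rule like this, the three spellings yield a single entry in `resolved`, so a refresh state would record one source table rather than three. An equally plausible alternative is to store `catalog` as `null` whenever it matches the view's catalog; the point is only that the spec would need to pick one.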



##########
format/view-spec.md:
##########
@@ -42,12 +42,28 @@ An atomic swap of one view metadata file for another 
provides the basis for maki
 
 Writers create view metadata files optimistically, assuming that the current 
metadata location will not be changed before the writer's commit. Once a writer 
has created an update, it commits by swapping the view's metadata file pointer 
from the base location to the new location.
 
+### Materialized Views
+
+Materialized views are a type of view that precompute the data from the view 
query.
+When queried, engines may return the precomputed data for the materialized 
views, shifting the cost of query execution to the precomputation step.
+
+Iceberg materialized views are implemented as a combination of an Iceberg view 
and an underlying Iceberg table, known as the storage table, which stores the 
precomputed data.
+The metadata for a materialized view extends the common view metadata, adding 
a pointer to the precomputed data and refresh information to determine if the 
data is still fresh. 
+The refresh information is composed of data about the so-called "source 
tables", which are the tables referenced in the query definition of the 
materialized view. 
+The storage table can be in the states of "fresh", "stale" or "invalid", which 
are determined from the following situations:
+* **fresh** -- The `snapshot_id`'s of the last refresh operation match the 
current `snapshot_id`'s of the source tables.

Review Comment:
   Nit: I think we shouldn't have the apostrophe here; it reads as a 
possessive of `snapshot_id`, which is wrong. Same for the next one.
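
For reference, the "fresh" condition in the quoted line can be sketched as a simple comparison. This is an illustration under assumptions, not the spec's algorithm: `is_fresh` and the dictionary shapes (table identifier string to snapshot ID) are hypothetical, and the "stale" and "invalid" conditions are cut off in the hunk above, so they are not modeled.

```python
from typing import Dict


def is_fresh(refreshed: Dict[str, int], current: Dict[str, int]) -> bool:
    """Return True when every source table's current snapshot ID equals
    the snapshot ID recorded at the last refresh.

    Assumption: both mappings are keyed by the same normalized table
    identifier, so a plain equality check is sufficient.
    """
    return refreshed == current


# Snapshot IDs recorded by the last refresh (hypothetical values).
last_refresh = {"db.orders": 101, "db.customers": 202}

unchanged = is_fresh(last_refresh, {"db.orders": 101, "db.customers": 202})
advanced = is_fresh(last_refresh, {"db.orders": 105, "db.customers": 202})
```

Here `unchanged` is true and `advanced` is false, because `db.orders` has moved to a newer snapshot since the refresh.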



##########
format/view-spec.md:
##########
@@ -160,6 +179,56 @@ Each entry in `version-log` is a struct with the following 
fields:
 | _required_  | `timestamp-ms` | Timestamp when the view's 
`current-version-id` was updated (ms from epoch) |
 | _required_  | `version-id`   | ID that `current-version-id` was set to |
 
+#### Full identifier
+
+The full identifier holds a reference, containing a namespace and a name, of a 
table or view in the catalog.
+
+| Requirement | Field name     | Description |
+|-------------|----------------|-------------|
+| _optional_  | `catalog`      | A string specifying the name of the catalog. 
If set to `null`, the catalog is the same as the views' catalog |

Review Comment:
   Should this be `view's` catalog, since "view" is singular here?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

