stevenzwu commented on code in PR #11041: URL: https://github.com/apache/iceberg/pull/11041#discussion_r1736801043
##########
format/view-spec.md:
##########
@@ -158,6 +173,59 @@ Each entry in `version-log` is a struct with the following fields:
 | _required_ | `timestamp-ms` | Timestamp when the view's `current-version-id` was updated (ms from epoch) |
 | _required_ | `version-id` | ID that `current-version-id` was set to |
 
+#### Full identifier
+
+The full identifier holds a fully resolved reference for a table or view in the catalog.
+
+| Requirement | Field name | Description |
+|-------------|----------------|-------------|
+| _required_ | `catalog` | A string specifying the catalog of the source table |
+| _required_ | `namespace` | A list of namespace levels |
+| _required_ | `table` | A string specifying the name of the source table |
+| _optional_ | `ref` | Branch name of the source table that is being referenced in the view query |
+
+When 'ref' is `null` or not set, it defaults to “main”. This field is to be ignored if the referenced entity is a view.
+
+### Materialized View Metadata stored as part of the Table Metadata
+
+To be able to determine the freshness of the precomputed data, additional metadata is stored as part of the storage table.
+
+For that the additional field "refresh-state" is introduced as an opaque record in the table snapshot summary.
+
+| Requirement | Field name | Description |
+|-------------|----------------|-------------|
+| _required_ | `refresh-state` | A [refresh state](#refresh-state) record stored as a JSON-encoded string. |
+
+#### Refresh state
+
+The refresh state record captures the state of all source tables and source views in the fully expanded query tree of the materialized view. It has the following fields:
+
+| Requirement | Field name | Description |
+|-------------|----------------|-------------|
+| _required_ | `refresh-version-id` | The `version-id` of the materialized view when the refresh operation was performed |
+| _required_ | `source-table-states` | A list of [source table](#soure-table) records |

Review Comment:
   Also, should we clarify whether the list contains only the directly referenced tables, or whether recursively referenced tables should also be resolved into this list?

##########
format/view-spec.md:
##########
@@ -158,6 +173,59 @@ Each entry in `version-log` is a struct with the following fields:
 | _required_ | `timestamp-ms` | Timestamp when the view's `current-version-id` was updated (ms from epoch) |
 | _required_ | `version-id` | ID that `current-version-id` was set to |
 
+#### Full identifier
+
+The full identifier holds a fully resolved reference for a table or view in the catalog.
+
+| Requirement | Field name | Description |
+|-------------|----------------|-------------|
+| _required_ | `catalog` | A string specifying the catalog of the source table |
+| _required_ | `namespace` | A list of namespace levels |
+| _required_ | `table` | A string specifying the name of the source table |
+| _optional_ | `ref` | Branch name of the source table that is being referenced in the view query |
+
+When 'ref' is `null` or not set, it defaults to “main”. This field is to be ignored if the referenced entity is a view.
+
+### Materialized View Metadata stored as part of the Table Metadata
+
+To be able to determine the freshness of the precomputed data, additional metadata is stored as part of the storage table.
+
+For that the additional field "refresh-state" is introduced as an opaque record in the table snapshot summary.
+
+| Requirement | Field name | Description |
+|-------------|----------------|-------------|
+| _required_ | `refresh-state` | A [refresh state](#refresh-state) record stored as a JSON-encoded string. |
+
+#### Refresh state
+
+The refresh state record captures the state of all source tables and source views in the fully expanded query tree of the materialized view. It has the following fields:
+
+| Requirement | Field name | Description |
+|-------------|----------------|-------------|
+| _required_ | `refresh-version-id` | The `version-id` of the materialized view when the refresh operation was performed |
+| _required_ | `source-table-states` | A list of [source table](#soure-table) records |

Review Comment:
   I have been thinking about whether this should be a `list` or a `map`. I guess a `list` is probably fine, since we don't really need lookup capability here; we only need to iterate through the sources and compare the state.

##########
format/view-spec.md:
##########
@@ -158,6 +173,59 @@ Each entry in `version-log` is a struct with the following fields:
 | _required_ | `timestamp-ms` | Timestamp when the view's `current-version-id` was updated (ms from epoch) |
 | _required_ | `version-id` | ID that `current-version-id` was set to |
 
+#### Full identifier
+
+The full identifier holds a fully resolved reference for a table or view in the catalog.
+
+| Requirement | Field name | Description |
+|-------------|----------------|-------------|
+| _required_ | `catalog` | A string specifying the catalog of the source table |
+| _required_ | `namespace` | A list of namespace levels |
+| _required_ | `table` | A string specifying the name of the source table |
+| _optional_ | `ref` | Branch name of the source table that is being referenced in the view query |
+
+When 'ref' is `null` or not set, it defaults to “main”. This field is to be ignored if the referenced entity is a view.
+
+### Materialized View Metadata stored as part of the Table Metadata
+
+To be able to determine the freshness of the precomputed data, additional metadata is stored as part of the storage table.
+
+For that the additional field "refresh-state" is introduced as an opaque record in the table snapshot summary.
+
+| Requirement | Field name | Description |
+|-------------|----------------|-------------|
+| _required_ | `refresh-state` | A [refresh state](#refresh-state) record stored as a JSON-encoded string. |
+
+#### Refresh state
+
+The refresh state record captures the state of all source tables and source views in the fully expanded query tree of the materialized view. It has the following fields:
+
+| Requirement | Field name | Description |
+|-------------|----------------|-------------|
+| _required_ | `refresh-version-id` | The `version-id` of the materialized view when the refresh operation was performed |
+| _required_ | `source-table-states` | A list of [source table](#soure-table) records |
+| _required_ | `source-view-states` | A list of [source view](#soure-view) records |

Review Comment:
   I am wondering whether this should be merged with `source-table-states` into a single `source-states` list of referenced source tables and views. The only difference between the two structs is the `id` part: `snapshot-id` vs. `version-id`. I guess the benefit of separate lists is that they make the state diff easier, since the logic for diffing table state is a bit different from the logic for diffing view version state.
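
To make the list-versus-map and separate-lists discussion concrete, here is a minimal Python sketch of the iterate-and-compare freshness check that the comments describe. It is an illustration under stated assumptions, not the PR's implementation: the `identifier` field name inside each source record, the `is_fresh` function, and the plain dictionaries standing in for catalog lookups are all hypothetical. Only `refresh-version-id`, `source-table-states`, `source-view-states`, `snapshot-id`, `version-id`, and the full identifier fields come from the quoted text.

```python
import json


def table_key(ident):
    # Per the quoted spec text, `ref` defaults to "main" when null or not set.
    return (ident["catalog"], tuple(ident["namespace"]), ident["table"],
            ident.get("ref") or "main")


def view_key(ident):
    # Per the quoted spec text, `ref` is ignored when the entity is a view.
    return (ident["catalog"], tuple(ident["namespace"]), ident["table"])


def is_fresh(refresh_state_json, view_version_id, table_snapshots, view_versions):
    """Return True if nothing changed since the refresh (hypothetical helper).

    `table_snapshots` and `view_versions` are plain dicts standing in for
    catalog lookups; a real engine would load these from the catalog.
    """
    state = json.loads(refresh_state_json)  # stored as a JSON-encoded string
    if state["refresh-version-id"] != view_version_id:
        return False
    # Tables are compared by snapshot ID.
    for src in state["source-table-states"]:
        if table_snapshots.get(table_key(src["identifier"])) != src["snapshot-id"]:
            return False
    # Views are compared by version ID, which is why the diff logic differs.
    for src in state["source-view-states"]:
        if view_versions.get(view_key(src["identifier"])) != src["version-id"]:
            return False
    return True


# Illustrative data only; the IDs and names are made up.
example_state = json.dumps({
    "refresh-version-id": 3,
    "source-table-states": [
        {"identifier": {"catalog": "prod", "namespace": ["db"], "table": "orders"},
         "snapshot-id": 101},
    ],
    "source-view-states": [
        {"identifier": {"catalog": "prod", "namespace": ["db"], "table": "orders_v"},
         "version-id": 7},
    ],
})
example_tables = {("prod", ("db",), "orders", "main"): 101}
example_views = {("prod", ("db",), "orders_v"): 7}
```

In this sketch the check only walks each list once and never looks a source up by key inside the refresh state, which matches the observation that a `list` is sufficient. Keeping the two lists separate lets each loop use its own comparison field without branching on the record type.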