stevenzwu commented on code in PR #11041:
URL: https://github.com/apache/iceberg/pull/11041#discussion_r2783981016


##########
format/view-spec.md:
##########
@@ -160,6 +176,109 @@ Each entry in `version-log` is a struct with the following fields:
 | _required_  | `timestamp-ms` | Timestamp when the view's `current-version-id` was updated (ms from epoch) |
 | _required_  | `version-id`   | ID that `current-version-id` was set to |
 
+#### Storage Table Identifier
+
+The table identifier for the storage table that stores the precomputed results.
+
+| Requirement | Field name     | Description |
+|-------------|----------------|-------------|
+| _required_  | `namespace`    | A list of strings for namespace levels |
+| _required_  | `name`         | A string specifying the name of the table |
+
+### Storage table metadata
+
+This section describes additional metadata for the storage table that supplements the regular table metadata and is required for materialized views.
+The property "refresh-state" is set on the [snapshot summary](https://iceberg.apache.org/spec/#snapshots) property of every storage table snapshot to determine the freshness of the precomputed data of the storage table.
+
+| Requirement | Field name      | Description |
+|-------------|-----------------|-------------|
+| _required_  | `refresh-state` | A [refresh state](#refresh-state) record stored as a JSON-encoded string |
+
+#### Freshness
+
+Consumers should only read from the storage table if the materialized view is "fresh" and therefore adequately represents the logical query definition of the view.
+Different systems define freshness differently based on time-based and logical factors.
+
+**Time-based freshness (consumer-defined):**
+
+Consumers may apply time-based freshness policies, such as allowing a certain staleness window based on `refresh-start-timestamp-ms`.
+When evaluating freshness, consumers:
+- Must first evaluate their own time-based freshness policy.
+- May additionally compare the `source-states` list against the states loaded from the catalog to verify the producer's logical freshness policy.
+- May parse the view definition to implement a more sophisticated policy.
+- When a materialized view is considered stale, can fail, refresh inline, or treat the materialized view as a logical view.
+- Must not read from the storage table when the materialized view doesn't meet freshness criteria.
+
+**Logical freshness (producer-defined):**

Review Comment:
   I am not sure that we should call out consumer-defined and producer-defined. While the producer populates the refresh-state, it is still up to consumers to interpret it.
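To illustrate the point that the producer only populates the refresh state while each consumer interprets it, here is a minimal Python sketch. The diff names only `refresh-state`, `refresh-start-timestamp-ms`, and `source-states`; the entry keys `name` and `snapshot-id` and the helpers `parse_refresh_state` and `within_staleness_window` are hypothetical, chosen for illustration.

```python
import json
import time

# Hypothetical snapshot summary of a storage table snapshot. Snapshot summaries are
# string-to-string maps, so the refresh state is stored as a JSON-encoded string.
# The layout inside "refresh-state" is an assumption for illustration only.
summary = {
    "refresh-state": json.dumps({
        "refresh-start-timestamp-ms": 1_700_000_000_000,
        "source-states": [
            {"name": "db.events", "snapshot-id": 101},  # hypothetical entry shape
        ],
    }),
}


def parse_refresh_state(snapshot_summary):
    """Decode the producer-written refresh state from a snapshot summary."""
    return json.loads(snapshot_summary["refresh-state"])


def within_staleness_window(refresh_state, max_staleness_ms, now_ms=None):
    """A consumer-chosen time-based policy: accept data refreshed recently enough."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    age_ms = now_ms - refresh_state["refresh-start-timestamp-ms"]
    return age_ms <= max_staleness_ms


state = parse_refresh_state(summary)
# Two consumers read the same producer-written state but apply different windows.
print(within_staleness_window(state, 60_000, now_ms=1_700_000_030_000))
print(within_staleness_window(state, 10_000, now_ms=1_700_000_030_000))
```

The same decoded state is fresh under a 60-second window and stale under a 10-second one, which is the sense in which interpretation stays with the consumer.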



##########
format/view-spec.md:
##########
@@ -160,6 +176,109 @@ Each entry in `version-log` is a struct with the following fields:
+**Logical freshness (producer-defined):**
+
+Producers define the logical freshness policy and provide the necessary information in the [refresh state](#refresh-state) to verify the logical equivalence of the precomputed data with the query definition.
+Different producers may define different logical freshness policies, based on how much of the dependency graph must be current.
+Some require the entire query tree to be fully up to date, while others only require direct children or leaf nodes.
+When writing the refresh state, producers:
+- Must provide a sufficient list of source states so that consumers can determine freshness according to the producer's policy.
+- May leave the source states list empty if the source state cannot be determined for all objects (for example, for non-Iceberg tables).
+- Must store the entry with the oldest snapshot-id or version-id when the same source object appears multiple times in the dependency graph (for example, in diamond patterns).

Review Comment:
   Diamond pattern may not be a well-known term. We may need to explain the scenario.
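One way to explain the scenario: a diamond arises when the materialized view reaches the same source object through more than one path, for example `mv` reads views `v1` and `v2`, and both read table `db.t`. The Python sketch below enumerates those paths and then collapses duplicate entries, keeping the oldest. The graph, the `SourceState` shape, and the use of a commit timestamp as the ordering key for "oldest" are assumptions for illustration; snapshot IDs themselves are not assumed to be ordered.

```python
from dataclasses import dataclass

# Hypothetical dependency graph with a diamond: mv -> v1 -> db.t and mv -> v2 -> db.t.
DEPENDENCIES = {
    "mv": ["v1", "v2"],
    "v1": ["db.t"],
    "v2": ["db.t"],
    "db.t": [],
}


def paths_to_leaves(graph, node, path=()):
    """Yield every path from `node` to a leaf; a diamond yields the same leaf twice."""
    path = path + (node,)
    if not graph[node]:
        yield path
    for child in graph[node]:
        yield from paths_to_leaves(graph, child, path)


@dataclass(frozen=True)
class SourceState:
    # Hypothetical entry shape. `timestamp_ms` is the commit time of the snapshot
    # that was read and serves as the ordering key for "oldest".
    name: str
    snapshot_id: int
    timestamp_ms: int


def keep_oldest(states):
    """Collapse repeated source objects to the entry with the oldest snapshot."""
    oldest = {}
    for s in states:
        current = oldest.get(s.name)
        if current is None or s.timestamp_ms < current.timestamp_ms:
            oldest[s.name] = s
    return list(oldest.values())


print(list(paths_to_leaves(DEPENDENCIES, "mv")))
# db.t was resolved once through v1 and again through v2, after a concurrent commit.
observed = [SourceState("db.t", 101, 1_000), SourceState("db.t", 202, 2_000)]
print(keep_oldest(observed))
```

One plausible rationale for keeping the oldest entry: if `db.t` has since moved to snapshot 202 and the producer recorded 202, a consumer comparing against the catalog would see a match and call the view fresh even though the `v1` branch was computed from snapshot 101. Recording 101 makes that comparison fail, which is the conservative outcome.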



##########
format/view-spec.md:
##########
@@ -160,6 +176,109 @@ Each entry in `version-log` is a struct with the following fields:
+**Time-based freshness (consumer-defined):**
+
+Consumers may apply time-based freshness policies, such as allowing a certain staleness window based on `refresh-start-timestamp-ms`.
+When evaluating freshness, consumers:
+- Must first evaluate their own time-based freshness policy.
+- May additionally compare the `source-states` list against the states loaded from the catalog to verify the producer's logical freshness policy.

Review Comment:
   The last 4 bullet points aren't related to time-based freshness. They are independent of time-based vs. logical freshness.
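A minimal Python sketch of the separation the comment points at: freshness is the conjunction of whichever checks the consumer chooses to apply (time-based, logical, or both), and the actions on staleness (fail, refresh inline, or treat the view as a logical view) apply the same way regardless of which check failed. All names here (`StaleAction`, `plan_read`, `time_ok`, `sources_ok`) are hypothetical.

```python
from enum import Enum


class StaleAction(Enum):
    FAIL = "fail"
    REFRESH_INLINE = "refresh-inline"
    TREAT_AS_LOGICAL_VIEW = "treat-as-logical-view"


def is_fresh(checks):
    """The view is fresh only if every check the consumer chose to apply passes."""
    return all(check() for check in checks)


def plan_read(checks, on_stale):
    """Decide how to answer a query; the stale branch ignores which check failed."""
    if is_fresh(checks):
        return "read-storage-table"
    if on_stale is StaleAction.FAIL:
        raise RuntimeError("materialized view is stale")
    if on_stale is StaleAction.REFRESH_INLINE:
        return "refresh-then-read-storage-table"
    return "evaluate-view-definition"


def time_ok():
    # Hypothetical time-based check, e.g. refreshed within the allowed window.
    return True


def sources_ok():
    # Hypothetical logical check, e.g. a source table has a newer snapshot than recorded.
    return False


print(plan_read([time_ok], StaleAction.TREAT_AS_LOGICAL_VIEW))
print(plan_read([time_ok, sources_ok], StaleAction.TREAT_AS_LOGICAL_VIEW))
```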



##########
format/view-spec.md:
##########
@@ -160,6 +176,109 @@ Each entry in `version-log` is a struct with the following fields:
+When evaluating freshness, consumers:
+- Must first evaluate their own time-based freshness policy.
+- May additionally compare the `source-states` list against the states loaded from the catalog to verify the producer's logical freshness policy.
+- May parse the view definition to implement a more sophisticated policy.
+- When a materialized view is considered stale, can fail, refresh inline, or treat the materialized view as a logical view.
+- Must not read from the storage table when the materialized view doesn't meet freshness criteria.

Review Comment:
   Some engines (like BigQuery) may combine the data from the storage table with the delta from the source tables, so it is not entirely correct to say `Must not read`. I am not sure of the best wording here. `Must not consume the storage table as it is`?
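A toy Python sketch of the pattern the comment describes, where the engine answers from the storage table plus the source delta rather than from the storage table as it is. It assumes an append-only source and a count aggregate, the case where merging the stored result with the delta is valid; the data and names (`stored_counts`, `delta_keys`, `counts_with_delta`) are hypothetical, and this is not a description of how BigQuery implements it.

```python
from collections import Counter

# Hypothetical storage table contents: per-key row counts computed at the last
# refresh, i.e. as of the source snapshot recorded in the refresh state.
stored_counts = Counter({"a": 2, "b": 1})

# Hypothetical keys of rows appended to the (append-only) source since that snapshot.
delta_keys = ["a", "c"]


def counts_with_delta(stored, delta):
    """Combine precomputed counts with counts over the source delta.

    Only valid because counting is decomposable and the source is append-only;
    deletes or updates in the source would need a different merge.
    """
    result = Counter(stored)  # copy so the stored result is left untouched
    result.update(delta)      # Counter.update counts each element of an iterable
    return result


print(counts_with_delta(stored_counts, delta_keys))
```

In this case the storage table is read even though it is stale, just not consumed as it is, which matches the distinction the comment draws.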


