igorbelianski-cyber commented on code in PR #11041:
URL: https://github.com/apache/iceberg/pull/11041#discussion_r3242318057


##########
format/view-spec.md:
##########
@@ -190,92 +190,93 @@ The table identifier for the storage table that stores 
the precomputed results.
 ### Storage table metadata
 
 This section describes additional metadata for the storage table that 
supplements the regular table metadata and is required for materialized views.
-The property "refresh-state" is set on the [snapshot 
summary](https://iceberg.apache.org/spec/#snapshots) property of a storage 
table snapshot to provide information about the state of the precomputed data.
+The `refresh-state` property is set on the [snapshot 
summary](https://iceberg.apache.org/spec/#snapshots) property of a storage 
table snapshot to provide information about the state of the precomputed data.
 
 | Requirement | Field name      | Description |
 |-------------|-----------------|-------------|
 | _optional_  | `refresh-state` | A [refresh state](#refresh-state) record 
stored as a JSON-encoded string |
 
 #### Freshness
 
-A materialized view is "fresh" when the storage table adequately represents 
the result of the view query at the current state of its dependencies.
-Since different systems define freshness differently, it is left to the 
consumer to evaluate freshness based on its own policy.
+A materialized view is **fresh** when the storage table represents the result 
of the current view query (at the materialized view's current 
`view-version-id`) over the current state of its dependencies. Dependencies are 
determined by parsing the SQL: base Iceberg tables, Iceberg views (whose own 
dependencies are transitively dependencies of the materialized view), and 
intermediate materialized views (treated as their storage tables, with their 
own freshness established recursively from their `refresh-state`).
 
-**Consumer behavior:**
+A change to the materialized view's definition produces a new 
`view-version-id`; any storage-table snapshot recorded at a prior 
`view-version-id` is not fresh under the current definition.
 
-When evaluating freshness, consumers:
+The `refresh-state` summary on each storage-table snapshot records dependency 
state observed at refresh time. Producers populate it; consumers use it to 
assess freshness without re-executing the query. The spec does not mandate what 
producers record or how consumers assess. See [Appendix 
B](#appendix-b-what-counts-as-a-dependency) for what counts as a dependency.
 
-- May apply time-based freshness policies, such as allowing a staleness window 
based on `refresh-start-timestamp-ms`.
-- May compare the `source-states` list against the states loaded from the 
catalog to verify the producer's freshness interpretation.
-- May parse the view definition to implement more sophisticated policies.
-- When a materialized view is considered stale, can fail, refresh inline, or 
treat the materialized view as a logical view.
-- Should not consume the storage table as it is when the materialized view 
doesn't meet the freshness criteria.
+##### Producer flexibility
 
-**Producer behavior:**
-
-Producers should provide the necessary information in the [refresh 
state](#refresh-state) such that consumers can verify the logical equivalence 
of the precomputed data with the query definition.
-Different producers may have different freshness interpretations, based on how 
much of the refresh state's dependency graph should be evaluated.
-Some producers expect the entire dependency graph to be evaluated and 
therefore include source MV dependencies. Other producers may only expect 
dependencies in the MV's SQL to be evaluated and therefore do not include 
dependencies of source MVs.
+Producers may selectively choose a subset of their dependencies to record — 
for example, skipping non-Iceberg sources or recording an empty list.
 
 When writing the refresh state, producers:
 
-- Should provide a sufficient list of source states such that consumers can 
determine freshness according to the producer's intent. If the producers intent 
is such that it doesn't rely on the source-states to determine freshness, it 
may provide an empty list.
-- If the source state cannot be determined for all objects (for example, for 
non-Iceberg tables or non-deterministic functions) may leave the source states 
list empty.
-- If a stored object is reachable through multiple paths in the dependency 
graph (diamond dependency pattern), all distinct source states have to be 
included in the list.
+- **Must** record `view-version-id` and `refresh-start-timestamp-ms`.
+- **Must** include all distinct source states for the inputs they chose to 
track.

Review Comment:
   "**Must** ... they chose to track" sounds awkward. Also, didn't we discuss 
the case where producers can track more things than consumers? The list is 
meant for the consumer.
   I think we can just use "Should" here.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

