stevenzwu commented on code in PR #11041:
URL: https://github.com/apache/iceberg/pull/11041#discussion_r3242616779
##########
format/view-spec.md:
##########
@@ -322,3 +453,142 @@
s3://bucket/warehouse/default.db/event_agg/metadata/00002-(uuid).metadata.json
} ]
}
```
+
+### Materialized View Example
+
+Imagine the following operation, which creates a materialized view that precomputes daily event counts:
+
+```sql
+USE prod.default
+```
+```sql
+CREATE MATERIALIZED VIEW event_agg_mv (
+ event_count COMMENT 'Count of events',
+ event_date)
+COMMENT 'Precomputed daily event counts'
+AS
+SELECT
+ COUNT(1), CAST(event_ts AS DATE)
+FROM events
+GROUP BY 2
+```
+
+The materialized view metadata JSON file looks as follows:
+
+```
+s3://bucket/warehouse/default.db/event_agg_mv/metadata/00001-(uuid).metadata.json
+```
+```json
+{
+ "view-uuid": "b2a12651-3038-4a72-8a31-5027ab84da35",
+ "format-version" : 1,
+ "location" : "s3://bucket/warehouse/default.db/event_agg_mv",
+ "current-version-id" : 1,
+ "properties" : {
+ "comment" : "Precomputed daily event counts"
+ },
+ "versions" : [ {
+ "version-id" : 1,
+ "timestamp-ms" : 1573518431292,
+ "schema-id" : 1,
+ "default-catalog" : "prod",
+ "default-namespace" : [ "default" ],
+ "summary" : {
+ "engine-name" : "Spark",
+ "engine-version" : "3.4.1"
+ },
+ "representations" : [ {
+ "type" : "sql",
+ "sql" : "SELECT\n COUNT(1), CAST(event_ts AS DATE)\nFROM events\nGROUP BY 2",
+ "dialect" : "spark"
+ } ],
+ "storage-table" : {
+ "namespace" : [ "default" ],
+ "name" : "event_agg_mv__storage"
+ }
+ } ],
+ "schemas": [ {
+ "schema-id": 1,
+ "type" : "struct",
+ "fields" : [ {
+ "id" : 1,
+ "name" : "event_count",
+ "required" : false,
+ "type" : "int",
+ "doc" : "Count of events"
+ }, {
+ "id" : 2,
+ "name" : "event_date",
+ "required" : false,
+ "type" : "date"
+ } ]
+ } ],
+ "version-log" : [ {
+ "timestamp-ms" : 1573518431292,
+ "version-id" : 1
+ } ]
+}
+```
+
+After a refresh operation, the storage table's snapshot summary contains the `refresh-state` property.
+The following is an example of the `refresh-state` JSON value stored in the snapshot summary of the storage table:
+
+```json
+{
+ "view-version-id" : 1,
+ "refresh-start-timestamp-ms" : 1573518435000,
+ "source-states" : [ {
+ "type" : "table",
+ "namespace" : [ "default" ],
+ "name" : "events",
+ "uuid" : "d4a10b5c-1e8a-4b72-9d67-3f4a8c9e1b2d",
+ "snapshot-id" : 6148331192489823102
+ } ]
+}
+```
+
+## Appendix B: What counts as a dependency
+
+The dependencies of a materialized view are determined by parsing the view query:
+
+- **Base Iceberg tables** in the dependency graph are recorded by `snapshot-id`.
+- **Iceberg views** in the dependency graph are recorded by `version-id`. A view's own dependencies are transitively dependencies of the materialized view and appear as additional entries in `source-states`.
+- **Intermediate materialized views** in the dependency graph are treated as their storage tables and recorded by the storage table's `snapshot-id`. Their own freshness is established recursively from their `refresh-state`.
+
+### Example
+
+The query under examination:
+
+- `A` (the materialized view being refreshed): `SELECT ... FROM B JOIN C ON ...`
+- `B` (regular view): `SELECT ... FROM E JOIN D ON ...`
+- `C` (materialized view): `SELECT ... FROM F JOIN G ON ...`
+- `D` (materialized view): `SELECT ... FROM H WHERE ...`
+- `E`, `F`, `G`, `H`: base Iceberg tables
+
+`A`'s dependencies are `B`, `C`, and `D`. `B` is a regular view; its own dependencies (`E` and `D`) are transitively dependencies of `A`. `C` and `D` are materialized views; they appear in `A`'s `source-states` as their storage tables.
Review Comment:
Why does `A` depend on `D` here? `D` is not a direct child of `A`.
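
To make the question concrete, here is a minimal Python sketch of one possible reading of the three bullets in the appendix: regular views are recorded by `version-id` and expanded transitively, while base tables and materialized views (as storage tables) are recorded by `snapshot-id` and not expanded. The names `KIND`, `REFS`, and `source_states` are hypothetical and exist only for this illustration; none of this is Iceberg API, and the spec text may intend something different.

```python
# Hypothetical model of the objects in the appendix example (not Iceberg API).
KIND = {
    "A": "materialized_view",
    "B": "view",
    "C": "materialized_view",
    "D": "materialized_view",
    "E": "table",
    "F": "table",
    "G": "table",
    "H": "table",
}

# Objects referenced directly by each query in the example.
REFS = {
    "A": ["B", "C"],
    "B": ["E", "D"],
    "C": ["F", "G"],
    "D": ["H"],
}


def source_states(mv):
    """Return {name: recorded id kind} for the entries of `mv`'s source-states.

    Assumption: only regular views are expanded; tables and intermediate
    materialized views are treated as leaves.
    """
    states = {}
    stack = list(REFS[mv])
    while stack:
        name = stack.pop()
        if name in states:
            continue  # already recorded via another path
        if KIND[name] == "view":
            states[name] = "version-id"
            stack.extend(REFS[name])  # a view's dependencies are followed
        else:
            # Base table, or materialized view recorded as its storage table.
            states[name] = "snapshot-id"
    return states


result = source_states("A")
for name, id_kind in sorted(result.items()):
    print(name, id_kind)
```

Under this reading, `D` enters `source-states` only because the regular view `B` is expanded, not because `A` references it directly, and `F`, `G`, and `H` are absent because materialized views are not expanded.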
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]