wmoustafa commented on code in PR #11041:
URL: https://github.com/apache/iceberg/pull/11041#discussion_r2371278472
##########
format/view-spec.md:
##########
@@ -42,12 +42,28 @@ An atomic swap of one view metadata file for another provides the basis for maki
 Writers create view metadata files optimistically, assuming that the current metadata location will not be changed before the writer's commit. Once a writer has created an update, it commits by swapping the view's metadata file pointer from the base location to the new location.
+### Materialized Views
+
+Materialized views are a type of view with precomputed results from the view query stored as a table.
+When queried, engines may return the precomputed data for the materialized views, shifting the cost of query execution to the precomputation step.
+
+Iceberg materialized views are implemented as a combination of an Iceberg view and an underlying Iceberg table, known as the storage table, which stores the precomputed data.
+The metadata for a materialized view extends the Iceberg view metadata, adding a pointer to the precomputed data and refresh information to determine if the data is still fresh.
+The refresh information is composed of data about the so-called "source tables", which are the tables referenced in the query definition of the materialized view.
+The storage table can be in the states of "fresh", "stale" or "invalid", which are determined from the following situations:
+* **fresh** -- The `snapshot_id`s of the last refresh operation match the current `snapshot_id`s of the source tables.
+* **stale** -- The `snapshot_id`s do not match, indicating that a refresh operation needs to be performed to capture the latest source table changes.

Review Comment:
We discussed the framework for state and lineage information in [this doc](https://docs.google.com/document/d/1-OaPqm8ahVT3_OCbVdAPQ_wZ8I3ToeqU3RLUjcyKQM0/edit?tab=t.0). I understand the conclusion to be:

* Lineage information is on the view side. It maps _immediate children_ of a view to their UUIDs.
* Refresh state information is on the table side. It maps _deeply nested children_ of the materialized view (primarily by UUID) to snapshot IDs/version IDs.

Now to the point of this discussion: if a child happens to be an MV, then it is conceptually still a view. The framework above captures this naturally: the view version of the MV's view aspect is captured, and the underlying table snapshot IDs are captured as well, since we store deeply nested state information.

So to summarize, I prefer to handle MVs as views because:

* They are actually views (the table is just an implementation detail of the MV).
* This framing blends well with the previously established lineage and state information; it does not introduce new language or treatment, so it keeps things simple.

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
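
To make the framing in the review comment concrete, here is a minimal sketch of how lineage (immediate children) and refresh state (deeply nested children) could be modelled when a child is itself an MV. Everything in it is an assumption for illustration: the `Table` and `View` classes and the `lineage`, `current_state`, and `is_fresh` helpers are hypothetical names, not fields of the Iceberg spec or calls in any Iceberg API. Only the "fresh" and "stale" states from the quoted diff are covered.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Union


@dataclass
class Table:
    name: str
    uuid: str
    snapshot_id: int  # current snapshot ID of the table


@dataclass
class View:
    name: str
    uuid: str
    version_id: int  # current view version ID
    children: List[Union["View", Table]] = field(default_factory=list)
    storage_table: Optional[Table] = None  # set only for a materialized view


def lineage(view: View) -> Dict[str, str]:
    """View side: map each immediate child to its UUID."""
    return {child.name: child.uuid for child in view.children}


def current_state(view: View) -> Dict[str, int]:
    """Map the UUID of every deeply nested child to its snapshot/version ID."""
    state: Dict[str, int] = {}
    stack = list(view.children)
    while stack:
        node = stack.pop()
        if isinstance(node, Table):
            state[node.uuid] = node.snapshot_id
            continue
        # A child MV is handled as a view: record its view version ...
        state[node.uuid] = node.version_id
        # ... and, if it has one, the snapshot of its storage table.
        if node.storage_table is not None:
            state[node.storage_table.uuid] = node.storage_table.snapshot_id
        stack.extend(node.children)
    return state


def is_fresh(recorded: Dict[str, int], view: View) -> bool:
    """Fresh if the IDs recorded at the last refresh match the current ones."""
    return recorded == current_state(view)


# Hypothetical example: an outer MV whose only immediate child is an inner MV.
orders = Table("orders", "uuid-orders", snapshot_id=100)
inner_storage = Table("inner_mv_storage", "uuid-inner-storage", snapshot_id=10)
inner_mv = View("inner_mv", "uuid-inner", version_id=1,
                children=[orders], storage_table=inner_storage)
outer_mv = View("outer_mv", "uuid-outer", version_id=1, children=[inner_mv])

recorded_at_refresh = current_state(outer_mv)
```

In this sketch the lineage of `outer_mv` lists only `inner_mv`, while the recorded state also covers the inner MV's storage table and the `orders` table beneath it, so a new snapshot on `orders` makes `outer_mv` stale without any MV-specific treatment.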
