[PR] Views, Spark: Add support for Materialized Views; Integrate with Spark SQL [iceberg]

via GitHub Thu, 11 Jun 2026 15:58:08 -0700


wmoustafa opened a new pull request, #9830:
URL: https://github.com/apache/iceberg/pull/9830


   ## Summary

   This PR adds support for materialized views in Iceberg and integrates the
implementation with Spark SQL.
          
   ## Spec

   The full materialized view spec can be found in #11041. A materialized view is
an Iceberg view whose current version has a `storage-table` field: a struct
with `namespace` and `name` identifying the Iceberg table that holds the
precomputed results. Queries against the view are served from the storage table
as long as those results are "fresh".
   
   Freshness is tracked through a `refresh-state` JSON string stored in the
storage table's snapshot summary. The refresh state captures:

     - The view version ID at the time of refresh
     - The state of each source table or view (snapshot ID, version ID, UUID)
     - The refresh start timestamp

   A materialized view is considered fresh when the view version ID and all
source snapshot/version IDs in the refresh state match their current values.
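
The freshness rule above can be sketched as follows. This is a minimal illustration, not the PR's implementation: `RecordedRefresh` and `FreshnessCheck` are hypothetical names, and keying each source by its UUID with a single numeric ID (snapshot ID for tables, version ID for views) is a simplifying assumption.

```java
import java.util.Map;

/** Hypothetical, simplified stand-in for the recorded refresh state (not the PR's RefreshState). */
final class RecordedRefresh {
  final int viewVersionId;
  // Assumption: source UUID -> snapshot ID (tables) or version ID (views) at refresh time.
  final Map<String, Long> sourceStates;

  RecordedRefresh(int viewVersionId, Map<String, Long> sourceStates) {
    this.viewVersionId = viewVersionId;
    this.sourceStates = Map.copyOf(sourceStates); // immutable copy, rejects nulls
  }
}

final class FreshnessCheck {
  private FreshnessCheck() {}

  /** Returns true only if the view version and every recorded source ID match the current values. */
  static boolean isFresh(
      RecordedRefresh recorded, int currentViewVersionId, Map<String, Long> currentSourceStates) {
    if (recorded.viewVersionId != currentViewVersionId) {
      return false; // the view definition changed since the last refresh
    }
    for (Map.Entry<String, Long> entry : recorded.sourceStates.entrySet()) {
      Long current = currentSourceStates.get(entry.getKey());
      if (!entry.getValue().equals(current)) {
        return false; // a source advanced, or is no longer present
      }
    }
    return true;
  }
}
```

Under these assumptions, a single new snapshot on any source table is enough to mark the materialized view stale.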
           
   ## Core

   New model classes and accessors:
     - `ViewVersion.storageTable()`: nullable `TableIdentifier` on the view
version; a non-null value indicates a materialized view
     - `RefreshState` / `RefreshStateParser`: model and JSON serialization for
the refresh state stored in snapshot summaries
     - `SourceState` / `SourceTableState` / `SourceViewState`: polymorphic
source state model discriminated by a `type` field (`table` or `view`)
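
To illustrate what a model discriminated by a `type` field can look like, here is a rough sketch. It does not reproduce the PR's classes: all type names below are hypothetical, and the field keys `uuid`, `snapshot-id`, and `version-id` are assumptions made for the example. Only the `type` values `table` and `view` come from the description above.

```java
import java.util.Map;

/** Hypothetical sketch of a source-state hierarchy; not the PR's SourceState classes. */
interface SourceStateSketch {
  String type(); // discriminator: "table" or "view"
}

final class TableStateSketch implements SourceStateSketch {
  final String uuid;
  final long snapshotId;

  TableStateSketch(String uuid, long snapshotId) {
    this.uuid = uuid;
    this.snapshotId = snapshotId;
  }

  @Override
  public String type() {
    return "table";
  }
}

final class ViewStateSketch implements SourceStateSketch {
  final String uuid;
  final int versionId;

  ViewStateSketch(String uuid, int versionId) {
    this.uuid = uuid;
    this.versionId = versionId;
  }

  @Override
  public String type() {
    return "view";
  }
}

final class SourceStates {
  private SourceStates() {}

  /** Builds the concrete state from already-parsed fields, dispatching on "type". */
  static SourceStateSketch fromMap(Map<String, Object> fields) {
    String type = (String) fields.get("type");
    if ("table".equals(type)) {
      // Assumed keys: "uuid" and "snapshot-id".
      return new TableStateSketch(
          (String) fields.get("uuid"), ((Number) fields.get("snapshot-id")).longValue());
    } else if ("view".equals(type)) {
      // Assumed keys: "uuid" and "version-id".
      return new ViewStateSketch(
          (String) fields.get("uuid"), ((Number) fields.get("version-id")).intValue());
    }
    throw new IllegalArgumentException("Unknown source state type: " + type);
  }
}
```

Rejecting unknown `type` values explicitly, rather than defaulting to one variant, keeps a parser from silently misreading state written by a newer format.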
             
   ## Spark SQL

   This PR adds support for `CREATE MATERIALIZED VIEW` and extends `DROP VIEW`
to handle materialized views:

     - `CREATE MATERIALIZED VIEW` creates the storage table first, then
registers the view metadata with a `storage-table` reference on the view
version. The storage table identifier can be specified via a `STORED AS
'<identifier>'` clause; otherwise a default `<name>__storage` identifier is
used.
     - `DROP VIEW` on a materialized view removes both the view metadata and
its associated storage table.
     - `REFRESH MATERIALIZED VIEW` is left as a future enhancement.
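
The storage table naming rule from the first bullet can be sketched like this. `StorageTableNames` is a hypothetical helper, not part of the PR, and the sketch assumes that `<name>` in the default refers to the view's own name.

```java
/** Hypothetical helper illustrating the default storage table naming; not the PR's code. */
final class StorageTableNames {
  static final String DEFAULT_SUFFIX = "__storage";

  private StorageTableNames() {}

  /**
   * Returns the identifier given in STORED AS when present, otherwise the
   * default of the view name followed by "__storage".
   *
   * @param viewName name of the materialized view (assumed to be the <name> in the default)
   * @param storedAs identifier from the STORED AS clause, or null if the clause was omitted
   */
  static String resolve(String viewName, String storedAs) {
    if (storedAs == null || storedAs.isEmpty()) {
      return viewName + DEFAULT_SUFFIX;
    }
    return storedAs;
  }
}
```

For example, a view named `daily_sales` created without a `STORED AS` clause would, under this assumption, get `daily_sales__storage` as its storage table.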
                                                                                
         
   ## Spark Catalog

   The `SparkCatalog` determines whether to serve precomputed data from the
storage table or fall back to the view's SQL query:
     - `loadTable()` checks whether the requested identifier corresponds to a
fresh materialized view. If so, it returns a `SparkMaterializedView` backed by
the storage table, allowing queries to read the precomputed data directly.
     - `loadView()` checks whether the materialized view is fresh. If fresh, it
defers to `loadTable()`. If stale, it returns a `SparkView`, triggering the
usual Spark view logic that re-executes the query against the current state of
the source tables.
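
The routing between the two entry points can be summarized with a small decision sketch. This models only the outcomes described above. `CatalogRoutingSketch` and its `Result` enum are hypothetical, the boolean inputs stand in for the real metadata checks, and `NOT_HANDLED` is a placeholder for whatever regular table lookup the real `loadTable()` continues with.

```java
/** Hypothetical decision sketch for the catalog routing; not the SparkCatalog implementation. */
final class CatalogRoutingSketch {
  enum Result {
    MATERIALIZED_VIEW_FROM_STORAGE, // precomputed data read from the storage table
    VIEW_FROM_SQL,                  // view query re-executed against the source tables
    NOT_HANDLED                     // placeholder: not a fresh materialized view
  }

  private CatalogRoutingSketch() {}

  /** A fresh materialized view is served from its storage table. */
  static Result loadTable(boolean materialized, boolean fresh) {
    return materialized && fresh ? Result.MATERIALIZED_VIEW_FROM_STORAGE : Result.NOT_HANDLED;
  }

  /** A fresh materialized view defers to loadTable; anything else resolves as a regular view. */
  static Result loadView(boolean materialized, boolean fresh) {
    if (materialized && fresh) {
      return loadTable(materialized, fresh);
    }
    return Result.VIEW_FROM_SQL;
  }
}
```

In this sketch a stale materialized view behaves exactly like a plain view, so query results stay correct even when the storage table lags behind its sources.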
                      
   ## Notes

     - The `InMemoryCatalog` has been extended with a test `LocalFileIO` to
support the data file operations required by the storage table.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


