szehon-ho commented on issue #6420:
URL: https://github.com/apache/iceberg/issues/6420#issuecomment-1912878826

   Hi @JanKaul .  Thanks for putting this together.  I went through the 
detailed discussion, and see the general consensus to the "Open Questions" in 
the design docs are:
   
   1.  The pointer to the storage table should be stored as an optional field 
in the view metadata (option 1)
   2. Lineage information should be stored as additional fields in the summary 
of the storage table (option 2)
   3. Only the view (and not storage table) should be registered in catalog 
(option 2)
   
   Is that correct?  Then given that, I have summarized the additions we are 
making to the current metadata spec below.
   
   # View Metadata
    | v1 | v2 | Field Name | Description |
    |---|---|---|---|
     | optional | materialization | An optional `materialization` struct. If 
the value is null the entity is a common view, otherwise it is a materialized 
view |
   
   Materialization Struct
   | v1 | Field Name | Description |
   | -- | -- | -- |
   required | format-version | An integer version number for the materialized 
view format. Currently, this must be 1. Implementations must throw an exception 
if the materialized view's version is higher than the supported version.
   required | storage-table | Table metadata location | 
   
   # Snapshot
   | v1 | Field Name | Description |
   | -- | -- | -- |
   | optional | refresh-version-id | Version id of the materialized view when 
the refresh operation was performed.
   optional | source-tables | A List of `source-table` records. |
   
   Source Table Struct
   | v1 | Field Name | Description |
   | -- | -- | -- |
   | required | identifier | Identifier of the table as defined in the SQL 
expression. |
   | required | snapshot-id | Snapshot id of the source table when the last 
refresh operation was performed. |
   
   
   Let me know if that looks right.
   
   My 2c on this are:
   1.  The materialization struct having its own format version seems overkill 
to me, maybe we can just flatten it and make the materialization directly just 
the storage-table pointer itself?
   2. Similar to @jackye1995 on the comment above: 
https://github.com/apache/iceberg/issues/6420#issuecomment-1398572156, I feel 
having the list of source-tables is a bit difficult.  Can we proceed without 
this in the first cut?  I feel the engines, if they wanted, could parse the 
source tables, look them up, and get the snapshot-ids directly.  They must to 
be able to parse the view-sql so should be able to parse that.
   3. How about the 'refresh-strategy'?  (Didnt see it in the google-doc).  I 
feel it can in the current 'properties' field of view metadata.  iiuc, @rdblue 
also had suggested putting this in the table properties of the storage table 
along with other fields like materialized_view_format_version and 
view_identifier, which sounds fine too.
   
   If there is general consensus on the direction, it'd be great to move to on 
the actual spec pr change and discuss specifics there, as seems like this 
proposal has been sitting awhile?  I can also help with that, if needed.  
Thanks.
    


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Reply via email to