amogh-jahagirdar commented on code in PR #6598: URL: https://github.com/apache/iceberg/pull/6598#discussion_r1070698774
########## api/src/main/java/org/apache/iceberg/view/SQLViewRepresentation.java: ########## @@ -36,17 +38,21 @@ default Type type() { String dialect(); /** The default catalog when the view is created. */ + @Nullable String defaultCatalog(); /** The default namespace when the view is created. */ + @Nullable Namespace defaultNamespace(); - /** The query output schema at version create time, without aliases. */ - Schema schema(); + /** The query output schema id at version create time */ + int schemaId(); Review Comment: Since we have the flexibility to change this API until core implementation is complete, I changed the API to schemaID instead of schema. This is mostly because it simplifies the parsing logic and then it's still easy for a caller to obtain the schema from the top level view schema mapping. Currently in the view spec, the schema ID is stored per SQL view representation. So in the current implementation building the representation when parsing is straightforward. There are a few routes to go here: 1.) Maintain schema ID in metadata and the API just returns schema ID as well (keeping parsing logic simple). That's what's done in this PR and I think is preferable since looking up schema via view.schemas().get(schemaID) should be straightforward. 2.) Preserve the schema at the API level, maintain schema ID in metadata. This complicates parsing logic (although not too much) because during parsing we need to pass the top level schemas list to SQLViewRepresentationParser https://iceberg.apache.org/view-spec/#view-metadata and then obtain the schema based on the parsed schema ID. 3.) Update the spec so that the entire schema object is stored in metadata and then serialize/deserialize. Another topic (regardless of option 1 or 3) is the spec currently marks schemaID as optional for sql view representation. I think it must be required (I can't think of a case where for a SQL representation we don't want to maintain a well defined Iceberg schema, engines can still choose to ignore it if they want to although I can't really think of such a case). I can raise a PR to update that if we think it's the right approach. cc: @jzhuge @rdblue @jackye1995 @nastra Let me know your thoughts on which approach above you find preferable and if we agree schema should be updated to be required for SQL view representation in the spec! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org