amogh-jahagirdar commented on code in PR #6598:
URL: https://github.com/apache/iceberg/pull/6598#discussion_r1070698774
##########
api/src/main/java/org/apache/iceberg/view/SQLViewRepresentation.java:
##########
@@ -36,17 +38,21 @@ default Type type() {
String dialect();
/** The default catalog when the view is created. */
+ @Nullable
String defaultCatalog();
/** The default namespace when the view is created. */
+ @Nullable
Namespace defaultNamespace();
- /** The query output schema at version create time, without aliases. */
- Schema schema();
+ /** The query output schema id at version create time */
+ int schemaId();
Review Comment:
Since we have the flexibility to change this API until core implementation
is complete, I changed the API to schemaID instead of schema. This is mostly
because it simplifies the parsing logic and then it's still easy for a caller
to obtain the schema from the top level view schema mapping.
Currently in the view spec, the schema ID is stored per SQL view
representation. So in the current implementation building the representation
when parsing is straightforward.
There are a few options here:
1.) Maintain schema ID in metadata and the API just returns schema ID as
well (keeping parsing logic simple). That's what's done in this PR and I think
is preferable since looking up schema via view.schemas().get(schemaID) should
be straightforward.
2.) Preserve the schema at the API level, maintain schema ID in metadata.
This complicates parsing logic (although not too much) because during parsing
we need to pass the top level schemas list to SQLViewRepresentationParser
https://iceberg.apache.org/view-spec/#view-metadata and then obtain the schema
based on the parsed schema ID.
3.) Update the spec so that the entire schema object is stored in metadata
and then serialize/deserialize.
Another topic (independent of which option we choose) is the spec currently
marks schemaID as optional for sql view representation. I think it must be
required (I can't think of a case where for a SQL representation we don't want
to maintain a well defined Iceberg schema, engines can still choose to ignore
it if they want to although I can't really think of such a case). I can raise a
PR to update that if we think it's the right approach.
cc: @jzhuge @rdblue @jackye1995 @nastra
Let me know your thoughts on which approach above you find preferable and if
we agree schema should be updated to be required for SQL view representation in
the spec!
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]