[GitHub] [iceberg] amogh-jahagirdar commented on a diff in pull request #6598: Core: View representation core implementation

GitBox Sun, 15 Jan 2023 14:04:00 -0800


amogh-jahagirdar commented on code in PR #6598:
URL: https://github.com/apache/iceberg/pull/6598#discussion_r1070698774



##########
api/src/main/java/org/apache/iceberg/view/SQLViewRepresentation.java:
##########
@@ -36,17 +38,21 @@ default Type type() {
   String dialect();
 
   /** The default catalog when the view is created. */
+  @Nullable
   String defaultCatalog();
 
   /** The default namespace when the view is created. */
+  @Nullable
   Namespace defaultNamespace();
 
-  /** The query output schema at version create time, without aliases. */
-  Schema schema();
+  /** The query output schema id at version create time */
+  int schemaId();

Review Comment:
   Since we have the flexibility to change this API until core implementation 
is complete, I changed the API to schemaID instead of schema. This is mostly 
because it simplifies the parsing logic and then it's still easy for a caller 
to obtain the schema from the top level view schema mapping.
   
   Currently in the view spec, the schema ID is stored per SQL view 
representation. So in the current implementation building the representation 
when parsing is straightforward. 
   
   There are a few routes to go here:
   
   1.) Maintain schema ID in metadata and the API just returns schema ID as 
well (keeping parsing logic simple). That's what's done in this PR and I think 
is preferable since looking up schema via view.schemas().get(schemaID) should 
be straightforward.
   
   2.) Preserve the schema at the API level, maintain schema ID in metadata. 
This complicates parsing logic (although not too much) because during parsing 
we need to pass the top level schemas list to SQLViewRepresentationParser 
https://iceberg.apache.org/view-spec/#view-metadata and then obtain the schema 
based on the parsed schema ID.
   
   3.) Update the spec so that the entire schema object is stored in metadata 
and then serialize/deserialize.
   
   Another topic (regardless of option 1 or 3) is the spec currently marks 
schemaID as optional for sql view representation. I think it must be required 
(I can't think of a case where for a SQL representation we don't want to 
maintain a well defined Iceberg schema, engines can still choose to ignore it 
if they want to although I can't really think of such a case). I can raise a PR 
to update that if we think it's the right approach. 
   
   cc: @jzhuge @rdblue @jackye1995 @nastra 
   
   Let me know your thoughts on which approach above you find preferable and if 
we agree schema should be updated to be required for SQL view representation in 
the spec! 
   
   
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

[GitHub] [iceberg] amogh-jahagirdar commented on a diff in pull request #6598: Core: View representation core implementation

Reply via email to