Re: [PR] API, Core: Add Schema#withUpdatedDoc and View#updateColumnDoc APIs [iceberg]

via GitHub Fri, 05 Jan 2024 12:51:49 -0800


amogh-jahagirdar commented on PR #9414:
URL: https://github.com/apache/iceberg/pull/9414#issuecomment-1879243149


   > That line of reasoning makes me think that any change (even comments) 
should just produce a new version of the view.
   
   Yes, that's the approach in this PR currently. A new version is produced via 
ReplaceViewVersion (view.updateColumnDoc())  returns ReplaceViewVersion. In my 
next iteration, I'm going to just put the logic in the existing Replace API. 
That fits the operation pattern that exists throughout Iceberg better.
   
    I think creating a new version is required, since we are producing a new 
schema and there *must* be a new version to point to the new schema. The issue 
is that currently the only API that exists for producing a new version is 
`replace` which requires users to pass in query/dialect pair(s), default 
namespace, schema, etc.
   
   The case for a new API is so that engine integrations don't need to 
determine all the queries/dialects/default namespace/catalog when the replace 
is performed for just column doc updates.
   
   More concretely, take the Trino case, which allows the following 
syntax:https://trino.io/docs/current/sql/comment.html
   
   ```
   COMMENT ON view.col IS 'some col'
   ```
   
   Currently, just using the replace API we'd have to implement creating the 
new Schema with the updated doc (the code in Schema#withUpdatedColumnDoc) and 
then we'd also want to preserve all the queries/dialects, default namespace, 
and default catalog so the integration has to pass the current state of all of 
those to the replace API. 
   
   ```
   public void updateColumn(String col, String doc) {
         View view = loadView(...)
   
         Schema updatedSchema = newSchemaWithColumnDoc(view.schema(), col, doc);
         view.replaceVersion()
               .withSchema(updatedSchema)
               .withQuery(allCurrentDialects, allTheirQueries)
              .withDefaultCatalog(view.currentVersion().defaultCatalog())
              .withNamespace(view.currentVersion().namespace())
               .commit()
   }
   
   private Schema newSchemaWithColumnDoc(Schema schema, String col, String doc) 
{
      NestedField fieldToUpdate = findField(name);
       Preconditions.checkArgument(fieldToUpdate != null, "Field %s does not 
exist", name);
       NestedField updatedField =
           NestedField.of(
               fieldToUpdate.fieldId(),
               fieldToUpdate.isOptional(),
               fieldToUpdate.name(),
               fieldToUpdate.type(),
               doc);
   
       List<NestedField> newFields = Lists.newArrayList();
       newFields.add(updatedField);
   
       for (NestedField field : columns()) {
         if (field.fieldId() != updatedField.fieldId()) {
           newFields.add(field);
         }
       }
   
       return new Schema(newFields, getAliases(), identifierFieldIds());
   
   }
   ```
   
   So the integration has to pass in a bunch of mostly irrelevant details into 
the replace just for updating the docs and if an engine integration misses a 
dialect to include in the replace, then updating the column doc has removed the 
other dialect unintentionally for the new version. Having an API in the library 
which takes care of these details would avoid any of these weird issues. 
   
   With the new API the integration should look something like:
   
   ```
   
   public void updateColumnDoc(String col, String doc) {
        View view = loadView(...)
         view.replaceVersion()
              .withColumnDoc(col, doc)
               .commit()
   }
    
   ```
   
   >If we allow making schema changes, then would we also allow modifying the 
view by widening view columns? 
   
   I would advocate against this. These kinds of schema evolution capabilities 
on views don't seem that useful to me at the cost of a lot of complexity. I 
think there's a semantic distinction in those kinds of capabilities and adding 
docs. That's why I don't think it makes sense to have UpdateViewSchema 
implementation for views and discarded it for the current approach.
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] API, Core: Add Schema#withUpdatedDoc and View#updateColumnDoc APIs [iceberg]

Reply via email to