amogh-jahagirdar commented on PR #9414: URL: https://github.com/apache/iceberg/pull/9414#issuecomment-1879243149
> That line of reasoning makes me think that any change (even comments) should just produce a new version of the view. Yes, that's the approach in this PR currently. A new version is produced via ReplaceViewVersion (view.updateColumnDoc()) returns ReplaceViewVersion. In my next iteration, I'm going to just put the logic in the existing Replace API. That fits the operation pattern that exists throughout Iceberg better. I think creating a new version is required, since we are producing a new schema and there *must* be a new version to point to the new schema. The issue is that currently the only API that exists for producing a new version is `replace` which requires users to pass in query/dialect pair(s), default namespace, schema, etc. The case for a new API is so that engine integrations don't need to determine all the queries/dialects/default namespace/catalog when the replace is performed for just column doc updates. More concretely, take the Trino case, which allows the following syntax:https://trino.io/docs/current/sql/comment.html ``` COMMENT ON view.col IS 'some col' ``` Currently, just using the replace API we'd have to implement creating the new Schema with the updated doc (the code in Schema#withUpdatedColumnDoc) and then we'd also want to preserve all the queries/dialects, default namespace, and default catalog so the integration has to pass the current state of all of those to the replace API. ``` public void updateColumn(String col, String doc) { View view = loadView(...) Schema updatedSchema = newSchemaWithColumnDoc(view.schema(), col, doc); view.replaceVersion() .withSchema(updatedSchema) .withQuery(allCurrentDialects, allTheirQueries) .withDefaultCatalog(view.currentVersion().defaultCatalog()) .withNamespace(view.currentVersion().namespace()) .commit() } private Schema newSchemaWithColumnDoc(Schema schema, String col, String doc) { NestedField fieldToUpdate = findField(name); Preconditions.checkArgument(fieldToUpdate != null, "Field %s does not exist", name); NestedField updatedField = NestedField.of( fieldToUpdate.fieldId(), fieldToUpdate.isOptional(), fieldToUpdate.name(), fieldToUpdate.type(), doc); List<NestedField> newFields = Lists.newArrayList(); newFields.add(updatedField); for (NestedField field : columns()) { if (field.fieldId() != updatedField.fieldId()) { newFields.add(field); } } return new Schema(newFields, getAliases(), identifierFieldIds()); } ``` So the integration has to pass in a bunch of mostly irrelevant details into the replace just for updating the docs and if an engine integration misses a dialect to include in the replace, then updating the column doc has removed the other dialect unintentionally for the new version. Having an API in the library which takes care of these details would avoid any of these weird issues. With the new API the integration should look something like: ``` public void updateColumnDoc(String col, String doc) { View view = loadView(...) view.replaceVersion() .withColumnDoc(col, doc) .commit() } ``` >If we allow making schema changes, then would we also allow modifying the view by widening view columns? I would advocate against this. These kinds of schema evolution capabilities on views don't seem that useful to me at the cost of a lot of complexity. I think there's a semantic distinction in those kinds of capabilities and adding docs. That's why I don't think it makes sense to have UpdateViewSchema implementation for views and discarded it for the current approach. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org