aiborodin commented on code in PR #14509:
URL: https://github.com/apache/iceberg/pull/14509#discussion_r2496744800
##########
core/src/main/java/org/apache/iceberg/SnapshotProducer.java:
##########
@@ -459,6 +467,23 @@ public void commit() {
return;
}
+ // Pass the snapshot ancestry of the base and the updated
snapshots to the
+ // validator.
+ String validationBranch =
Review Comment:
> I disagree with this since the original validate call is called before we
build the new update. As a generalized API for validating the snapshots
pre-commit, we really need to expose the ability to inspect what is planned to
be committed.
What's the value for clients to inspect the new internal metadata, which is
about to be committed? The new parent `Snapshot` is sufficient minimal
information for both Kafka Connect and Flink to decide if the validation
passes, without relying on the new `TableMetadata`. Everything else can be
enclosed in validation closures, as @rdblue suggested.
There's a reason why the current `validate(base, parentSnapshot);` call runs
before the new metadata is generated - first, we don't need these internal
details to decide as I said above, but also, - because we don't want to waste
CPU cycles, and IO writing the new manifests and the manifest list just to run
the validation and immediately throw away these files if it fails.
> The KC case only looks at the prior state, but I don't think we should
focus so narrowly on that scenario.
I think we can always enhance and extend this API in the future by adding
more information if needed. For now, it seems reasonable to focus on adding the
minimal required API, which aligns with the existing validation mechanism in
`SnapshotProducer::validate(TableMetadata currentMetadata, Snapshot snapshot)`,
and works for both Kafka Connect and Flink use cases.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]