aiborodin commented on issue #14425: URL: https://github.com/apache/iceberg/issues/14425#issuecomment-3484180614
@pvary We can't use `conflictDetectionFilter()` because it operates on a per-record basis and doesn't expose the `_file` column, which we would need to identify duplicate add/delete files coming from concurrent commits. Adding this column would also require changing the public API and would result in a much more complex filter condition passed from the Flink job. All existing validation methods rely on predefined conditions, so they won't work here either. What we need is a simple check of the `Snapshot`'s summary that identifies duplicate commits using the `flink.max-committed-checkpoint-id` property, and there's no way to do this with the existing API.

I understand the concern about changing the core API, and I am happy to get others' opinions on this. I raised an alternative solution that makes no modifications to the core APIs (apart from one line) and extends `BaseRowDelta` and `BaseReplacePartitions`: https://github.com/apache/iceberg/pull/14484. However, in this solution we have to manually instantiate the subclasses `FlinkRowDelta` and `FlinkReplacePartitions`, and enforce the presence of `HasTableOperations` to access `TableOperations`. In contrast, the first solution (https://github.com/apache/iceberg/pull/14445) cleanly retrieves these operations from the table API.

I personally favour the first solution of adding new public validation methods (https://github.com/apache/iceberg/pull/14445), because it seems generic enough and can be useful for other applications where clients may want custom validation, for example, using `Snapshot` properties. But I am also okay with the second option of inheriting from the public APIs. What do you think?
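To make the summary check concrete, here is a minimal, self-contained sketch of the idea. It is not the code from either PR and does not use Iceberg classes: `SnapshotInfo`, `alreadyCommitted`, and `sampleHistory` are hypothetical stand-ins, and the sketch assumes that each snapshot exposes a parent id and a string-to-string summary map. It also ignores the Flink job and operator ids that a real check would likely need to match.

```java
import java.util.HashMap;
import java.util.Map;

final class DuplicateCommitCheck {
  // Property name taken from the comment above.
  static final String MAX_COMMITTED_CHECKPOINT_ID = "flink.max-committed-checkpoint-id";

  // Hypothetical stand-in for a table snapshot (not an Iceberg type).
  static final class SnapshotInfo {
    final long id;
    final Long parentId; // null for the first snapshot
    final Map<String, String> summary;

    SnapshotInfo(long id, Long parentId, Map<String, String> summary) {
      this.id = id;
      this.parentId = parentId;
      this.summary = summary;
    }
  }

  // Walks from the current snapshot back through its ancestors and reports
  // whether any summary already records a checkpoint id >= checkpointId.
  static boolean alreadyCommitted(
      Map<Long, SnapshotInfo> snapshots, Long currentId, long checkpointId) {
    Long id = currentId;
    while (id != null) {
      SnapshotInfo snapshot = snapshots.get(id);
      if (snapshot == null) {
        break; // ancestor is unknown (e.g. expired), stop the walk
      }
      String value = snapshot.summary.get(MAX_COMMITTED_CHECKPOINT_ID);
      if (value != null && Long.parseLong(value) >= checkpointId) {
        return true;
      }
      id = snapshot.parentId;
    }
    return false;
  }

  // Illustrative history: 1 (checkpoint 5) <- 2 (no property) <- 3 (checkpoint 7).
  static Map<Long, SnapshotInfo> sampleHistory() {
    Map<Long, SnapshotInfo> snapshots = new HashMap<>();
    snapshots.put(1L, new SnapshotInfo(1L, null, Map.of(MAX_COMMITTED_CHECKPOINT_ID, "5")));
    snapshots.put(2L, new SnapshotInfo(2L, 1L, Map.of()));
    snapshots.put(3L, new SnapshotInfo(3L, 2L, Map.of(MAX_COMMITTED_CHECKPOINT_ID, "7")));
    return snapshots;
  }
}
```

In this sketch a committer about to commit checkpoint 7 on top of snapshot 3 would see `alreadyCommitted(...)` return true and could skip or fail the duplicate commit, while checkpoint 8 would pass. The point of the discussion above is where such a check can run during commit retries, which this standalone example does not model.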
