aiborodin opened a new pull request, #14092: URL: https://github.com/apache/iceberg/pull/14092
Addresses the following issue: https://github.com/apache/iceberg/issues/14090.

`DynamicWriteResultAggregator` currently produces multiple committables per table/branch/checkpoint triplet because it aggregates write results by `WriteTarget`, which is unique per schemaId, specId, and equality fields. This violates the idempotence contract of the `DynamicCommitter`, which relies on one commit request per triplet to identify and skip already committed requests during recovery.

Fix the issue by aggregating `WriteResult` objects by table and branch (aka `TableKey`), so that a single committable is emitted per table, branch, and checkpoint. This requires serialising the aggregated `WriteResult` and saving it in a Flink checkpoint instead of a temporary manifest file because, according to the Iceberg spec, a single manifest must contain only files with one partition spec, while the aggregated write results may span multiple partition specs.
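The difference between the two grouping keys can be illustrated with a minimal, self-contained sketch. The records below (`TableKey`, `WriteTarget`, `WriteResult`) are simplified, hypothetical stand-ins that only borrow the names used above; they are not the real Iceberg classes, and the `aggregate` helper is an assumption made for illustration, not the code in this PR.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class AggregationSketch {
  // Hypothetical stand-ins for illustration only, not the real Iceberg types.
  record TableKey(String table, String branch) {}

  record WriteTarget(String table, String branch, int schemaId, int specId) {
    TableKey tableKey() {
      return new TableKey(table, branch);
    }
  }

  record WriteResult(WriteTarget target, String dataFile) {}

  // Groups data files by an arbitrary key; each resulting group stands for one committable.
  static <K> Map<K, List<String>> aggregate(
      List<WriteResult> results, Function<WriteResult, K> keyFn) {
    Map<K, List<String>> grouped = new LinkedHashMap<>();
    for (WriteResult result : results) {
      grouped.computeIfAbsent(keyFn.apply(result), k -> new ArrayList<>()).add(result.dataFile());
    }
    return grouped;
  }

  // Two write results for the same table and branch, but with different partition specs.
  static List<WriteResult> sampleResults() {
    return List.of(
        new WriteResult(new WriteTarget("db.events", "main", 0, 0), "file-a.parquet"),
        new WriteResult(new WriteTarget("db.events", "main", 0, 1), "file-b.parquet"));
  }

  public static void main(String[] args) {
    List<WriteResult> results = sampleResults();
    // Keyed by WriteTarget: one group per distinct schemaId/specId combination.
    System.out.println("groups by WriteTarget: " + aggregate(results, WriteResult::target).size());
    // Keyed by TableKey: one group per table and branch.
    System.out.println(
        "groups by TableKey: " + aggregate(results, r -> r.target().tableKey()).size());
  }
}
```

In this sketch the two sample results fall into two groups when keyed by `WriteTarget` and into a single group when keyed by `TableKey`, which mirrors the one-committable-per-triplet behaviour described above. The single group holds files from two partition specs, which is why it could not be written to one manifest.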
