aiborodin opened a new pull request, #14092:
URL: https://github.com/apache/iceberg/pull/14092

   Addresses the following issue: 
https://github.com/apache/iceberg/issues/14090.
   
   `DynamicWriteResultAggregator` currently produces multiple committables per 
table/branch/checkpoint triplet because it aggregates write results by 
`WriteTarget`, which is unique per schemaId, specId, and equality fields. This 
violates the idempotence contract of `DynamicCommitter`, which relies on one 
commit request per triplet to identify and skip already committed requests 
during recovery.
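
   To illustrate why more than one committable per triplet is a problem, here is 
a minimal sketch. It assumes the committer remembers the highest committed 
checkpoint ID per table/branch and skips any request at or below it. The class 
`RecoverySkipSketch` and its methods are hypothetical and are not the actual 
`DynamicCommitter` code.

```java
import java.util.HashMap;
import java.util.Map;

public class RecoverySkipSketch {
  // Hypothetical model: highest committed checkpoint ID per "table/branch".
  private final Map<String, Long> maxCommitted = new HashMap<>();

  // Applies the request unless the table/branch already reached this checkpoint.
  boolean commit(String tableBranch, long checkpointId) {
    long last = maxCommitted.getOrDefault(tableBranch, -1L);
    if (checkpointId <= last) {
      return false; // treated as already committed and skipped
    }
    maxCommitted.put(tableBranch, checkpointId);
    return true;
  }

  // Two committables for the same triplet, e.g. one per partition spec.
  static boolean[] replay() {
    RecoverySkipSketch committer = new RecoverySkipSketch();
    boolean first = committer.commit("db.t/main", 7L);
    // Assume the job fails here and the second committable is replayed.
    boolean second = committer.commit("db.t/main", 7L);
    return new boolean[] {first, second};
  }

  public static void main(String[] args) {
    boolean[] applied = replay();
    System.out.println("first applied: " + applied[0]);
    System.out.println("second applied: " + applied[1]);
  }
}
```

   Under that assumption the second committable is skipped even though its 
files were never committed, which is why the check is only safe with one 
commit request per triplet.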
   
   Fix the issue by aggregating `WriteResult` objects by table and branch (aka 
`TableKey`), so that a single committable is emitted per `TableKey` per 
checkpoint. This requires serialising the aggregated `WriteResult` and saving 
it in a Flink checkpoint instead of a temporary manifest file because, 
according to the Iceberg spec, a single manifest must contain files with only 
one partition spec, whereas the aggregated write results may span multiple 
partition specs.
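
   The difference between the two grouping keys can be sketched as follows. The 
records `WriteTarget`, `TableKey`, and `Result` below are simplified, 
hypothetical stand-ins that only borrow the names from this PR. They are not 
the real Iceberg classes, and data files are represented as plain strings.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class AggregationSketch {
  // Hypothetical stand-ins, not the real Iceberg classes.
  record WriteTarget(String table, String branch, int schemaId, int specId) {}
  record TableKey(String table, String branch) {}
  record Result(WriteTarget target, String dataFile) {}

  // Previous behaviour: one group (committable) per WriteTarget.
  static Map<WriteTarget, List<String>> byWriteTarget(List<Result> results) {
    Map<WriteTarget, List<String>> groups = new LinkedHashMap<>();
    for (Result r : results) {
      groups.computeIfAbsent(r.target(), k -> new ArrayList<>()).add(r.dataFile());
    }
    return groups;
  }

  // Proposed behaviour: one group (committable) per table and branch.
  static Map<TableKey, List<String>> byTableKey(List<Result> results) {
    Map<TableKey, List<String>> groups = new LinkedHashMap<>();
    for (Result r : results) {
      TableKey key = new TableKey(r.target().table(), r.target().branch());
      groups.computeIfAbsent(key, k -> new ArrayList<>()).add(r.dataFile());
    }
    return groups;
  }

  // Write results of one checkpoint for one table/branch, two partition specs.
  static List<Result> sample() {
    return List.of(
        new Result(new WriteTarget("db.t", "main", 1, 0), "a.parquet"),
        new Result(new WriteTarget("db.t", "main", 1, 1), "b.parquet"),
        new Result(new WriteTarget("db.t", "main", 1, 0), "c.parquet"));
  }

  public static void main(String[] args) {
    List<Result> results = sample();
    System.out.println("groups by WriteTarget: " + byWriteTarget(results).size());
    System.out.println("groups by TableKey: " + byTableKey(results).size());
  }
}
```

   With this sample, grouping by `WriteTarget` yields 2 groups and grouping by 
`TableKey` yields 1. The single `TableKey` group mixes files from two partition 
specs, which is the reason it cannot be stored in one manifest file.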


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
