aiborodin opened a new pull request, #14312:
URL: https://github.com/apache/iceberg/pull/14312

   Addresses the code quality and architectural issues raised in the following comment: 
https://github.com/apache/iceberg/pull/14182#issuecomment-3336891582.
   
   `DynamicWriteResultAggregator` currently produces multiple 
`DynamicCommittables` per (table, branch, checkpoint) triplet. This initially 
broke the commit recovery of the dynamic Iceberg sink (see 
https://github.com/apache/iceberg/issues/14090), and was later addressed by a 
hotfix that aggregates `WriteResults` in the `DynamicCommitter`. 
   
   Refactor the commit aggregator to output only one committable per triplet. 
Clean up `DynamicCommitter` to remove the assumption of multiple commit requests 
per table, branch, and checkpoint. Emitting a single committable requires 
serializing the aggregated `WriteResult` using separate temporary manifests for 
each unique partition spec, because the Iceberg manifest writer requires a 
single partition spec per file. We can improve this later by changing how we 
serialize `DataFiles` and `DeleteFiles` for Flink checkpoints in the 
`DynamicSink`.
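
   As a rough illustration of the two grouping steps described above, here is 
a minimal, self-contained sketch. It does not use the Iceberg or Flink APIs: 
`Triplet`, `FileRef`, `Result`, `aggregate`, and `groupBySpec` are hypothetical 
stand-ins invented for this example, and the actual change may be structured 
differently. It only shows merging all results that share a (table, branch, 
checkpoint) triplet, then splitting the merged files by partition spec id so 
that each group could be written to its own temporary manifest.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class TripletAggregationSketch {
  // Hypothetical stand-ins for illustration only; these are NOT Iceberg or Flink classes.
  record Triplet(String table, String branch, long checkpointId) {}

  record FileRef(String path, int specId) {}

  record Result(Triplet key, List<FileRef> files) {}

  /** Merges all results that share a triplet, so each triplet maps to exactly one entry. */
  static Map<Triplet, List<FileRef>> aggregate(List<Result> results) {
    Map<Triplet, List<FileRef>> merged = new LinkedHashMap<>();
    for (Result result : results) {
      merged.computeIfAbsent(result.key(), k -> new ArrayList<>()).addAll(result.files());
    }
    return merged;
  }

  /** Splits one aggregated result by partition spec id; each group would get its own manifest. */
  static Map<Integer, List<FileRef>> groupBySpec(List<FileRef> files) {
    Map<Integer, List<FileRef>> bySpec = new TreeMap<>();
    for (FileRef file : files) {
      bySpec.computeIfAbsent(file.specId(), k -> new ArrayList<>()).add(file);
    }
    return bySpec;
  }

  public static void main(String[] args) {
    Triplet key = new Triplet("db.events", "main", 42L);
    // Two write results for the same triplet, touching two partition specs.
    List<Result> results = List.of(
        new Result(key, List.of(new FileRef("f1.parquet", 0))),
        new Result(key, List.of(new FileRef("f2.parquet", 1), new FileRef("f3.parquet", 0))));

    // One entry per triplet; inside it, one file group per partition spec id.
    aggregate(results).forEach((triplet, files) ->
        System.out.println(triplet + " -> " + groupBySpec(files)));
  }
}
```

   In this sketch the committer side would see a single entry per triplet, so 
it would not need to re-aggregate; the per-spec grouping only matters at 
serialization time.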


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

