mxm commented on PR #14092:
URL: https://github.com/apache/iceberg/pull/14092#issuecomment-3333196900

   >Given the above, @mxm, do you still think we need to commit multiple 
WriteResults separately for (table, branch, checkpoint) triplet and implement 
the index-based solution to guarantee idempotency as you mentioned here: 
https://github.com/apache/iceberg/issues/14090#issuecomment-3324732610? If so, 
could you please explain why this solution is necessary?
   
   Each table / branch pair requires a separate table snapshot. While we could 
combine multiple Flink checkpoints during recovery, I don't think there is much 
benefit from doing that. Apart from recovery, every checkpoint would normally 
be processed independently. We wouldn't gain much from optimizing the snapshots 
by combining commit requests from multiple checkpoints.
   
   >We can only commit files for multiple checkpoints when there are only 
appends/data files in the checkpoint.
   
   I wasn't suggesting that we combine WriteResults from multiple Flink 
checkpoints. I'm suggesting that we combine the append-only WriteResults within 
each Flink checkpoint. Currently, every WriteResult is processed separately, 
which creates a lot of table snapshots.
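
   To illustrate the idea, here is a minimal sketch, not the actual committer 
code. `SimpleWriteResult` and `WriteResultCombiner` are hypothetical stand-ins 
(files are plain strings), and the input is assumed to be the WriteResults of a 
single (table, branch, checkpoint) triplet in arrival order. As a conservative 
assumption, only consecutive append-only results are merged, so a result that 
carries delete files keeps its position relative to the data around it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a WriteResult: data files plus delete files.
final class SimpleWriteResult {
  final List<String> dataFiles;
  final List<String> deleteFiles;

  SimpleWriteResult(List<String> dataFiles, List<String> deleteFiles) {
    this.dataFiles = List.copyOf(dataFiles);
    this.deleteFiles = List.copyOf(deleteFiles);
  }

  // A result with no delete files only appends data.
  boolean isAppendOnly() {
    return deleteFiles.isEmpty();
  }
}

final class WriteResultCombiner {
  private WriteResultCombiner() {}

  // Input: all results of ONE (table, branch, checkpoint) triplet, in order.
  // Output: one entry per commit (snapshot) that would be created.
  static List<SimpleWriteResult> combineAppendOnly(List<SimpleWriteResult> results) {
    List<SimpleWriteResult> commits = new ArrayList<>();
    List<String> pendingAppends = new ArrayList<>();

    for (SimpleWriteResult result : results) {
      if (result.isAppendOnly()) {
        // Buffer the data files instead of committing them right away.
        pendingAppends.addAll(result.dataFiles);
      } else {
        // Flush buffered appends first so ordering relative to deletes is kept.
        flush(pendingAppends, commits);
        commits.add(result);
      }
    }
    flush(pendingAppends, commits);
    return commits;
  }

  // Emits the buffered data files as a single append-only commit, if any.
  private static void flush(List<String> pendingAppends, List<SimpleWriteResult> commits) {
    if (!pendingAppends.isEmpty()) {
      commits.add(new SimpleWriteResult(pendingAppends, List.of()));
      pendingAppends.clear();
    }
  }
}
```

   With four results in one checkpoint, where only the third has delete files, 
this yields three commits instead of four. In the purely append-only case it 
yields a single commit for the whole checkpoint.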


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

