mxm commented on PR #14092: URL: https://github.com/apache/iceberg/pull/14092#issuecomment-3333196900
> Given the above, @mxm, do you still think we need to commit multiple WriteResults separately for each (table, branch, checkpoint) triplet and implement the index-based solution to guarantee idempotency, as you mentioned here: https://github.com/apache/iceberg/issues/14090#issuecomment-3324732610? If so, could you please explain why this solution is necessary?

Each table/branch pair requires a separate table snapshot. While we could combine multiple Flink checkpoints during recovery, I don't think there is much benefit from doing that. Apart from recovery, every checkpoint would normally be processed independently. We wouldn't gain much from optimizing the snapshots by combining commit requests from multiple checkpoints.

> We can only commit files for multiple checkpoints when there are only appends/data files in the checkpoint.

I wasn't suggesting that we combine WriteResults from multiple Flink checkpoints. I'm suggesting that we combine the append-only WriteResults within each Flink checkpoint. Currently, every WriteResult is processed separately, which creates a lot of table snapshots.
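A minimal sketch of the suggestion to combine append-only WriteResults within a single Flink checkpoint, one commit per table/branch pair. It is written in Python for brevity. `PendingWriteResult` and `plan_commits` are hypothetical names and are not Iceberg or Flink APIs. The sketch only models how results would be grouped into snapshots. It assumes that results carrying delete files are still committed one by one, as every WriteResult is today. It does not model the ordering between the combined append commit and those delete-carrying commits.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PendingWriteResult:
    """Hypothetical stand-in for one writer's WriteResult plus its routing info."""
    table: str
    branch: str
    checkpoint_id: int
    data_files: tuple = ()
    delete_files: tuple = ()

    @property
    def append_only(self) -> bool:
        # A result is treated as append-only when it carries no delete files.
        return not self.delete_files


def plan_commits(results):
    """Plan commits (one table snapshot each) for a single checkpoint.

    Append-only results for the same (table, branch) are merged into one
    commit. Results with delete files stay separate commits (assumption:
    unchanged from processing each WriteResult on its own). Ordering between
    the two kinds of commits is not modeled here.
    """
    if len({r.checkpoint_id for r in results}) > 1:
        # The suggestion is per checkpoint, not across checkpoints.
        raise ValueError("expected results from a single checkpoint")

    merged_appends = defaultdict(list)  # (table, branch) -> data files
    commits = []
    for r in results:
        key = (r.table, r.branch)
        if r.append_only:
            merged_appends[key].extend(r.data_files)
        else:
            commits.append((key, list(r.data_files), list(r.delete_files)))

    # One combined append commit per (table, branch) instead of one per result.
    for key, files in merged_appends.items():
        commits.append((key, files, []))
    return commits
```

Consider a checkpoint with three append-only results (two for `main`, one for `dev`) and one result carrying deletes. This produces three planned snapshots instead of four. The saving grows with the number of parallel writers that only append.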
