Re: [I] Restoring the Flink streaming job from and older checkpoint/savepoint might trigger a silent data loss [iceberg]

via GitHub Mon, 20 Jan 2025 05:14:44 -0800


lkokhreidze commented on issue #10892:
URL: https://github.com/apache/iceberg/issues/10892#issuecomment-2602397744


   I know I am late, but I am still planning to look into this issue, sorry for 
the delay :(
   
   FYI - there was a bug in Flink's adaptive scheduler that caused Flink to 
recover from an empty checkpoint under certain conditions. That Flink bug leads 
to the same data loss scenario as the one described in this issue.
   https://issues.apache.org/jira/browse/FLINK-34518
   
   We were able to mitigate the issue by implementing a custom Flink operator 
that, upon recovery, rolls the Iceberg table back to the snapshot that matches 
the recovered checkpoint ID. We only have one writer per Iceberg table, so that 
worked fine for us.
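
   For illustration only, here is a minimal sketch of how the rollback target
could be selected. It is not the operator described above. It assumes the
Iceberg Flink sink records the committed checkpoint ID in each snapshot summary
under the key `flink.max-committed-checkpoint-id` (verify this against your
Iceberg version), and it models snapshots with a hypothetical `Snapshot` class
rather than the real Iceberg API. It also picks the newest snapshot whose
checkpoint ID is less than or equal to the restored one, on the assumption that
not every checkpoint produces a commit.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional

# Assumed summary key written by the Iceberg Flink sink on each commit.
MAX_COMMITTED_CHECKPOINT_ID = "flink.max-committed-checkpoint-id"


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical stand-in for an Iceberg snapshot (id, parent, summary)."""
    snapshot_id: int
    parent_id: Optional[int]
    summary: Dict[str, str]


def find_rollback_target(
    snapshots: Iterable[Snapshot],
    current_id: int,
    restored_checkpoint_id: int,
) -> Optional[int]:
    """Walk from the current snapshot back through its ancestors and return
    the id of the first snapshot committed at or before the restored
    checkpoint. Returns None if no ancestor qualifies."""
    by_id = {s.snapshot_id: s for s in snapshots}
    cursor = by_id.get(current_id)
    while cursor is not None:
        raw = cursor.summary.get(MAX_COMMITTED_CHECKPOINT_ID)
        # Snapshots without the key (e.g. maintenance commits) are skipped.
        if raw is not None and int(raw) <= restored_checkpoint_id:
            return cursor.snapshot_id
        cursor = by_id.get(cursor.parent_id)
    return None


history = [
    Snapshot(1, None, {MAX_COMMITTED_CHECKPOINT_ID: "10"}),
    Snapshot(2, 1, {MAX_COMMITTED_CHECKPOINT_ID: "11"}),
    Snapshot(3, 2, {MAX_COMMITTED_CHECKPOINT_ID: "12"}),
]
# Job restored from checkpoint 11 while the table is already at checkpoint 12.
target = find_rollback_target(history, current_id=3, restored_checkpoint_id=11)
```

   In a real Java operator, the selected ID would presumably be passed to
something like `table.manageSnapshots().rollbackTo(snapshotId).commit()`.
Rolling back discards later commits from the table's current state, so this is
only safe with a single writer per table, as noted above.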
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


