stevenzwu commented on issue #10892: URL: https://github.com/apache/iceberg/issues/10892#issuecomment-2277051765
This is indeed a problem. Let's first clarify what the expected behavior should be: when users rewind a job to an older checkpoint/savepoint to reprocess older data, it indicates that they are OK with duplicates.

> If this makes sense, I was wondering if it would be possible to actually roll back the table snapshot to the version that we get from the Flink checkpoint/savepoint upon recovery.

That is logical thinking, but it may not always be safe, and it has other implications for downstream consumers:

* If multiple jobs write to the same table, rewinding the Iceberg table state to an earlier snapshot based on just one job is incorrect.
* If downstream jobs (streaming or batch) do incremental processing on this Iceberg table, automatically rewinding table state in the upstream writer job is not safe.

I would prefer this type of Iceberg table state manipulation to be a manual admin action (instead of automatic from the Flink writer job).
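For reference, the manual admin action suggested above can be done with Iceberg's documented `rollback_to_snapshot` Spark procedure — a sketch, where the catalog name, table name, and snapshot id are placeholders:

```sql
-- Manual rollback performed by an admin, outside the Flink writer job.
-- Assumes a Spark session configured with the Iceberg SQL extensions;
-- 'my_catalog', 'db.sample', and the snapshot id below are placeholders.
CALL my_catalog.system.rollback_to_snapshot('db.sample', 1234567890123456789);
```

The equivalent Java API is `table.manageSnapshots().rollbackTo(snapshotId).commit()`. Either way, the operator can first verify that no other writers or incremental downstream consumers depend on the snapshots being discarded.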