zlzhang0122 opened a new issue, #10165: URL: https://github.com/apache/iceberg/issues/10165
### Apache Iceberg version

1.3.0

### Query engine

Spark

### Please describe the bug 🐞

Iceberg can produce duplicate data when Flink writes to an Iceberg table and a commit fails. The committer cannot tell which checkpoint each emitted snapshot belongs to, so if the committer stalls for a moment (or something similar happens), it commits all the snapshots produced by the current checkpoint together with those from all previous checkpoints.

Changing some Flink configuration can avoid the problem, but I don't think that is a complete fix, since it can cause other related problems. Perhaps we could emit each snapshot together with its checkpoint ID, which would solve the problem completely.

What do you think? Any reply is appreciated, thanks!
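To make the proposal concrete, here is a minimal sketch of what "emit the snapshots and checkpointId together" could look like on the committer side. It is only an illustration under stated assumptions, not Iceberg's or Flink's actual code: the class `CheckpointTaggedCommitter`, its methods, and the use of plain file-name strings in place of real write results are all hypothetical. The idea is that pending results are keyed by checkpoint ID and the highest committed ID is remembered, so a late or repeated notification does not commit the same results twice.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical sketch: not an Iceberg or Flink class.
final class CheckpointTaggedCommitter {
    // Pending write results, keyed by the checkpoint that produced them.
    private final NavigableMap<Long, List<String>> pendingByCheckpoint = new TreeMap<>();
    // Stand-in for the table: everything that has been committed so far.
    private final List<String> committedFiles = new ArrayList<>();
    // Highest checkpoint ID whose results are already committed.
    private long maxCommittedCheckpointId = -1L;

    // Called when a checkpoint is taken: remember its results under its ID.
    void snapshotState(long checkpointId, List<String> dataFiles) {
        pendingByCheckpoint.put(checkpointId, new ArrayList<>(dataFiles));
    }

    // Called when a checkpoint completes: commit pending results up to that ID, once.
    void notifyCheckpointComplete(long checkpointId) {
        if (checkpointId <= maxCommittedCheckpointId) {
            return; // late or duplicate notification: already committed
        }
        NavigableMap<Long, List<String>> ready = pendingByCheckpoint.headMap(checkpointId, true);
        for (List<String> files : ready.values()) {
            committedFiles.addAll(files);
        }
        ready.clear(); // headMap is a view, so this removes the entries from the pending map
        maxCommittedCheckpointId = checkpointId;
    }

    // Called after recovery: drop pending results the table already contains.
    void restore(long restoredMaxCommittedCheckpointId) {
        pendingByCheckpoint.headMap(restoredMaxCommittedCheckpointId, true).clear();
        maxCommittedCheckpointId = Math.max(maxCommittedCheckpointId, restoredMaxCommittedCheckpointId);
    }

    List<String> committedFiles() {
        return committedFiles;
    }

    long maxCommittedCheckpointId() {
        return maxCommittedCheckpointId;
    }
}
```

In this sketch, a stalled committer that later receives the notification for checkpoint 2 commits the results of checkpoints 1 and 2 together, and a delayed notification for checkpoint 1 then becomes a no-op. For the guard to survive a job restart, the highest committed checkpoint ID would also have to be stored durably alongside the commit; the sketch assumes that value is handed to `restore`.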