zlzhang0122 opened a new issue, #10165:
URL: https://github.com/apache/iceberg/issues/10165

   ### Apache Iceberg version
   
   1.3.0
   
   ### Query engine
   
   Spark
   
   ### Please describe the bug 🐞
   
   Iceberg can write duplicate data when Flink is used to write to an Iceberg 
table and a commit fails. It cannot tell which checkpoint emitted which 
snapshot, so once the committer gets stuck for a moment (or in a similar 
situation), it commits all the snapshots produced by the current checkpoint 
and by all previous checkpoints.
   We can change some Flink configuration to avoid this problem, but I don't 
think that is an ideal solution, since it can cause other related problems. 
Maybe we could emit each snapshot together with its checkpointId, which would 
solve the problem completely. What do you think? Any reply will be 
appreciated, thanks!
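
To make the idea concrete, here is a rough, language-agnostic sketch (written in Python for brevity). It is not Iceberg or Flink code: `Table`, `Committer`, `snapshot_state`, `notify_checkpoint_complete` and `max_committed_checkpoint_id` are all hypothetical names used only for illustration. The assumption is that each pending result carries its checkpoint ID and that the table remembers the highest checkpoint ID it has committed, so a replayed checkpoint can be skipped instead of committed a second time.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    """Hypothetical stand-in for a table; not the Iceberg API."""
    rows: list = field(default_factory=list)
    # Highest checkpoint ID whose data is already in the table (assumed to be
    # stored atomically with the data itself).
    max_committed_checkpoint_id: int = -1


class Committer:
    """Hypothetical committer that keys pending results by checkpoint ID."""

    def __init__(self, table):
        self.table = table
        self.pending = {}  # checkpoint_id -> rows written during that checkpoint

    def snapshot_state(self, checkpoint_id, rows):
        # Remember what was written, together with the checkpoint that produced it.
        self.pending[checkpoint_id] = list(rows)

    def notify_checkpoint_complete(self, checkpoint_id):
        # Commit pending checkpoints in order, up to the completed one.
        for cp_id in sorted(self.pending):
            if cp_id > checkpoint_id:
                break
            # Skip anything the table already contains, so replays are harmless.
            if cp_id > self.table.max_committed_checkpoint_id:
                self.table.rows.extend(self.pending[cp_id])
                self.table.max_committed_checkpoint_id = cp_id
            del self.pending[cp_id]
```

With this shape, if the state for checkpoint 1 is replayed after it has already been committed and checkpoint 2 then completes, only the rows from checkpoint 2 are added.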


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

