zhongqishang opened a new issue, #10431: URL: https://github.com/apache/iceberg/issues/10431
### Apache Iceberg version 1.2.1 ### Query engine Flink ### Please describe the bug 🐞 I have a flink upsert job with a checkpoint interval of 5 minutes and an external service periodically(30min) triggers the savepoint. 5 files were generated in one checkpoint cycle, including two data files, two eq delete files, and one pos delete file. The 2 data files and 2 eq-delete files contained the same data. When I queried, duplicate data appeared. I think it is because the subsequent eq delete is not associated with the first data file. Flink TM log ``` 2024-05-31 16:10:57.457 org.apache.hadoop.io.compress.CodecPool [] - Got brand-new compressor [.zstd] 2024-05-31 16:10:57.459 org.apache.hadoop.io.compress.CodecPool [] - Got brand-new compressor [.zstd] 2024-05-31 16:10:57.462 org.apache.flink.streaming.runtime.tasks.SubtaskCheckpointCoordinatorImpl [] - Checkpoint 5765 has been notified as aborted, would not trigger any checkpoint. 2024-05-31 16:13:58.455 org.apache.hadoop.io.compress.CodecPool [] - Got brand-new compressor [.zstd] 2024-05-31 16:13:58.505 org.apache.hadoop.io.compress.CodecPool [] - Got brand-new compressor [.zstd] 2024-05-31 16:13:58.507 org.apache.hadoop.io.compress.CodecPool [] - Got brand-new compressor [.zstd] ``` JM log ``` 2024-05-31 16:08:12.840 INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator [] - Triggering checkpoint 5764 (type=CHECKPOINT) @ 1717142891998 for job fc721024df3d70e3a1f3a46a63e9635a. 2024-05-31 16:08:16.239 INFO org.apache.flink.runtime.jobmaster.JobMaster [] - Triggering savepoint for job fc721024df3d70e3a1f3a46a63e9635a. 2024-05-31 16:08:16.242 INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator [] - Triggering checkpoint 5765 (type=SAVEPOINT) @ 1717142896239 for job fc721024df3d70e3a1f3a46a63e9635a. 2024-05-31 16:09:41.531 INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator [] - Completed checkpoint 5764 for job fc721024df3d70e3a1f3a46a63e9635a (7170 bytes, checkpointDuration=89495 ms, finalizationTime=38 ms). 2024-05-31 16:09:41.532 INFO org.apache.flink.runtime.source.coordinator.SourceCoordinator [] - Marking checkpoint 5764 as completed for source Source: TableSourceScan(table=[[default_catalog, default_database, cdc_xxx]], fields=[id, data_status, ...]). 2024-05-31 16:10:46.242 INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator [] - Checkpoint 5765 of job fc721024df3d70e3a1f3a46a63e9635a expired before completing. ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org