maekchi opened a new issue, #10765: URL: https://github.com/apache/iceberg/issues/10765
### Apache Iceberg version

1.4.3

### Query engine

Flink

### Please describe the bug 🐞

It seems that, in rare cases, duplicate data can occur with the Flink Iceberg sink. Let me explain the situation I experienced.

* Log with timeline:

```
04:57:33 Triggering checkpoint 19516 (type=CheckpointType{name='Checkpoint', sharingFilesStrategy=FORWARD_BACKWARD}) @ 1721764651219 for job ba65ea243c487f4f0fd52c158e4ed985.
04:57:33 Completed checkpoint 19516 for job ba65ea243c487f4f0fd52c158e4ed985 (6182 bytes, checkpointDuration=389 ms, finalizationTime=60 ms).
04:58:36 Triggering checkpoint 19517 (type=CheckpointType{name='Checkpoint', sharingFilesStrategy=FORWARD_BACKWARD}) @ 1721764711219 for job ba65ea243c487f4f0fd52c158e4ed985.
04:58:42 Triggering Checkpoint 19517 for job ba65ea243c487f4f0fd52c158e4ed985 failed due to java.util.concurrent.TimeoutException: Invocation of [RemoteRpcInvocation(TaskExecutorGateway.triggerCheckpoint(ExecutionAttemptID, long, long, CheckpointOptions))] at recipient [akka.tcp://flink@172.24.64.148:6122/user/rpc/taskmanager_0] timed out. This is usually caused by: 1) Akka failed sending the message silently, due to problems like oversized payload or serialization failures. In that case, you should find detailed error information in the logs. 2) The recipient needs more time for responding, due to problems like slow machines or network jitters. In that case, you can try to increase akka.ask.timeout.
04:58:42 Failed to trigger or complete checkpoint 19517 for job ba65ea243c487f4f0fd52c158e4ed985. (0 consecutive failed attempts so far)
04:59:00 Job clender-korea-member-raw_oauthlog_from_loginjava (ba65ea243c487f4f0fd52c158e4ed985) switched from state RESTARTING to RUNNING.
04:59:00 Clearing resource requirements of job ba65ea243c487f4f0fd52c158e4ed985
04:59:00 Restoring job ba65ea243c487f4f0fd52c158e4ed985 from Checkpoint 19516 @ 1721764651219 for ba65ea243c487f4f0fd52c158e4ed985 located at [hdfs-path]/ff5df181-7682-4153-bafc-8e489c506d92/checkpoints/ba65ea243c487f4f0fd52c158e4ed985/chk-19516.
04:59:45 Failed to trigger checkpoint for job ba65ea243c487f4f0fd52c158e4ed985 since Checkpoint triggering task freezer-IcebergFilesCommitter -> Sink: IcebergSink hive.[tablename] (1/1) of job ba65ea243c487f4f0fd52c158e4ed985 is not being executed at the moment. Aborting checkpoint. Failure reason: Not all required tasks are currently running..
05:00:46 Triggering checkpoint 19518 (type=CheckpointType{name='Checkpoint', sharingFilesStrategy=FORWARD_BACKWARD}) @ 1721764840957 for job ba65ea243c487f4f0fd52c158e4ed985.
```
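For anyone checking a table for the same symptom, the evidence shown in the metadata.json excerpt below can also be pulled from Iceberg's `snapshots` metadata table. This is only a sketch in Spark SQL, and `db.tbl` is a placeholder for the actual table:

```sql
-- Two consecutive append snapshots carrying the same
-- flink.max-committed-checkpoint-id in their summary indicate
-- that the same checkpoint was committed twice.
SELECT
  committed_at,
  snapshot_id,
  parent_id,
  operation,
  summary['flink.max-committed-checkpoint-id'] AS max_committed_checkpoint_id
FROM db.tbl.snapshots
ORDER BY committed_at DESC
LIMIT 10;
```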
* The taskmanager pod timed out at 04:59:19.

* Result

* metadata.json (the two relevant snapshot entries):

```json
{
  "sequence-number" : 201719,
  "snapshot-id" : 8203882888081487848,
  "parent-snapshot-id" : 7556868946872881546,
  "timestamp-ms" : 1721764676985,
  "summary" : {
    "operation" : "append",
    "flink.operator-id" : "9135501d46e54bf84710f477c1eb5f38",
    "flink.job-id" : "ba65ea243c487f4f0fd52c158e4ed985",
    "flink.max-committed-checkpoint-id" : "19516",
    "added-data-files" : "1",
    "added-records" : "17554",
    "added-files-size" : "664840",
    "changed-partition-count" : "1",
    "total-records" : "3966880804",
    "total-files-size" : "241007398466",
    "total-data-files" : "774",
    "total-delete-files" : "2",
    "total-position-deletes" : "18608",
    "total-equality-deletes" : "0"
  },
  "manifest-list" : "hdfs://~~~~~/metadata/snap-8203882888081487848-1-354fd0bb-38d9-4706-8483-8a4276888dc3.avro",
  "schema-id" : 2
}, {
  "sequence-number" : 201720,
  "snapshot-id" : 3289453546560274810,
  "parent-snapshot-id" : 8203882888081487848,
  "timestamp-ms" : 1721764798149,
  "summary" : {
    "operation" : "append",
    "flink.operator-id" : "9135501d46e54bf84710f477c1eb5f38",
    "flink.job-id" : "ba65ea243c487f4f0fd52c158e4ed985",
    "flink.max-committed-checkpoint-id" : "19516",
    "added-data-files" : "1",
    "added-records" : "17554",
    "added-files-size" : "664840",
    "changed-partition-count" : "1",
    "total-records" : "3966898358",
    "total-files-size" : "241008063306",
    "total-data-files" : "775",
    "total-delete-files" : "2",
    "total-position-deletes" : "18608",
    "total-equality-deletes" : "0"
  },
  "manifest-list" : "hdfs://~~~~~/metadata/snap-3289453546560274810-2-e0983626-a2a5-49f2-988b-dc432f100451.avro",
  "schema-id" : 2
},
```

* snap-8203882888081487848-1-354fd0bb-38d9-4706-8483-8a4276888dc3.avro
  <img width="2044" alt="image" src="https://github.com/user-attachments/assets/09cffc8e-3dbe-46ae-ae72-748b62a2479b">
* snap-3289453546560274810-2-e0983626-a2a5-49f2-988b-dc432f100451.avro
  <img width="1879" alt="image" src="https://github.com/user-attachments/assets/c3795342-f1c7-4b42-8eca-ba5287e5f89d">
* 354fd0bb-38d9-4706-8483-8a4276888dc3-m0.avro
  * snapshot-id: 8203882888081487848
  * data_file path: hdfs://~~~~/data/xxx/00000-0-01327dd4-6162-483d-b2e0-bb0694402807-00513.parquet
* e0983626-a2a5-49f2-988b-dc432f100451-m0.avro
  * snapshot-id: 3289453546560274810
  * data_file path: hdfs://~~~~/data/xxx/00000-0-01327dd4-6162-483d-b2e0-bb0694402807-00513.parquet

Looking at the situation: the job was restored from the last completed checkpoint (19516), and after the restore the committer replayed the commit for that checkpoint even though it had already been committed. As a result, the commit for checkpoint id 19516 was performed twice, producing two snapshots that reference the same data file. When the table is read in Trino, that file is scanned twice and its rows appear duplicated.
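The duplicate reference itself can be confirmed from the `files` metadata table, which, as I understand it, exposes one row per manifest entry in the current snapshot. Again a sketch only, with `db.tbl` as a placeholder:

```sql
-- A file_path appearing more than once is referenced by multiple
-- manifests in the current snapshot and will be read multiple times.
SELECT file_path, COUNT(*) AS reference_count
FROM db.tbl.files
GROUP BY file_path
HAVING COUNT(*) > 1;
```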
I tried to delete the duplicated rows, but the following error occurred in Trino:

`Found more deleted rows than exist in the file`

`rewrite_data_files` was also performed:

```
CALL spark_catalog.system.rewrite_data_files(table => 'table', where => 'partition="xxxxxx"');
24/07/24 06:38:55 WARN package: Truncated the string representation of a plan since it was too large. This behavior can be adjusted by setting 'spark.sql.debug.maxToStringFields'.
24/07/24 06:44:02 WARN ManifestFilterManager: Deleting a duplicate path from manifest hdfs://~~~~~/metadata/f0b56e5f-9b32-48d2-ba77-ddb93081c881-m1.avro: hdfs://~~~~/data/parttiion/00000-0-01327dd4-6162-483d-b2e0-bb0694402807-00513.parquet 398 1 373764540 0
Time taken: 313.164 seconds, Fetched 1 row(s)
```

But contrary to that message, the data duplication was not actually resolved, and expiring snapshots did not work either. As a last resort, I will try to modify the manifests directly. Is there a better solution in this situation? Please also look into this behavior. Thanks.

### Willingness to contribute

- [ ] I can contribute a fix for this bug independently
- [ ] I would be willing to contribute a fix for this bug with guidance from the Iceberg community
- [ ] I cannot contribute a fix for this bug at this time
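For completeness, this is the kind of partition rewrite I would try next before editing manifests by hand. It is an untested sketch with hypothetical names (`db.tbl` for the table, `part` for the partition column), and it assumes the affected partition holds no legitimate full-row duplicates, so row counts should be verified before and after:

```sql
-- Stage a deduplicated copy of the affected partition.
CREATE OR REPLACE TABLE db.tbl_dedup USING iceberg AS
SELECT DISTINCT * FROM db.tbl WHERE part = 'xxxxxx';

-- Dynamic overwrite replaces only the partitions produced by the SELECT,
-- leaving the rest of the table untouched.
SET spark.sql.sources.partitionOverwriteMode = dynamic;
INSERT OVERWRITE db.tbl
SELECT * FROM db.tbl_dedup;

DROP TABLE db.tbl_dedup;
```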