maekchi opened a new issue, #10765: URL: https://github.com/apache/iceberg/issues/10765
### Apache Iceberg version

1.4.3

### Query engine

Flink

### Please describe the bug 🐞

It seems that, in rare cases, duplicate data can occur with the Flink Iceberg sink. Let me explain the situation I experienced.

* Log with timeline:

```
04:57:33 Triggering checkpoint 19516 (type=CheckpointType{name='Checkpoint', sharingFilesStrategy=FORWARD_BACKWARD}) @ 1721764651219 for job ba65ea243c487f4f0fd52c158e4ed985.
04:57:33 Completed checkpoint 19516 for job ba65ea243c487f4f0fd52c158e4ed985 (6182 bytes, checkpointDuration=389 ms, finalizationTime=60 ms).
04:58:36 Triggering checkpoint 19517 (type=CheckpointType{name='Checkpoint', sharingFilesStrategy=FORWARD_BACKWARD}) @ 1721764711219 for job ba65ea243c487f4f0fd52c158e4ed985.
04:58:42 Triggering Checkpoint 19517 for job ba65ea243c487f4f0fd52c158e4ed985 failed due to java.util.concurrent.TimeoutException: Invocation of [RemoteRpcInvocation(TaskExecutorGateway.triggerCheckpoint(ExecutionAttemptID, long, long, CheckpointOptions))] at recipient [akka.tcp://flink@172.24.64.148:6122/user/rpc/taskmanager_0] timed out. This is usually caused by: 1) Akka failed sending the message silently, due to problems like oversized payload or serialization failures. In that case, you should find detailed error information in the logs. 2) The recipient needs more time for responding, due to problems like slow machines or network jitters. In that case, you can try to increase akka.ask.timeout.
04:58:42 Failed to trigger or complete checkpoint 19517 for job ba65ea243c487f4f0fd52c158e4ed985. (0 consecutive failed attempts so far)
04:59:00 Job clender-korea-member-raw_oauthlog_from_loginjava (ba65ea243c487f4f0fd52c158e4ed985) switched from state RESTARTING to RUNNING.
04:59:00 Clearing resource requirements of job ba65ea243c487f4f0fd52c158e4ed985
04:59:00 Restoring job ba65ea243c487f4f0fd52c158e4ed985 from Checkpoint 19516 @ 1721764651219 for ba65ea243c487f4f0fd52c158e4ed985 located at [hdfs-path]/ff5df181-7682-4153-bafc-8e489c506d92/checkpoints/ba65ea243c487f4f0fd52c158e4ed985/chk-19516.
04:59:45 Failed to trigger checkpoint for job ba65ea243c487f4f0fd52c158e4ed985 since Checkpoint triggering task freezer-IcebergFilesCommitter -> Sink: IcebergSink hive.[tablename] (1/1) of job ba65ea243c487f4f0fd52c158e4ed985 is not being executed at the moment. Aborting checkpoint. Failure reason: Not all required tasks are currently running..
05:00:46 Triggering checkpoint 19518 (type=CheckpointType{name='Checkpoint', sharingFilesStrategy=FORWARD_BACKWARD}) @ 1721764840957 for job ba65ea243c487f4f0fd52c158e4ed985.
```
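For anyone checking a table for the same symptom, the evidence shown in the metadata.json excerpt below can also be pulled from Iceberg's `snapshots` metadata table. This is only a sketch in Spark SQL, and `db.tbl` is a placeholder for the actual table:

```sql
-- Two consecutive append snapshots carrying the same
-- flink.max-committed-checkpoint-id in their summary indicate
-- that the same checkpoint was committed twice.
SELECT
  committed_at,
  snapshot_id,
  parent_id,
  operation,
  summary['flink.max-committed-checkpoint-id'] AS max_committed_checkpoint_id
FROM db.tbl.snapshots
ORDER BY committed_at DESC
LIMIT 10;
```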
* The taskmanager pod timed out at 04:59:19.

* Result

* metadata.json (the two relevant snapshot entries):

```json
{
  "sequence-number" : 201719,
  "snapshot-id" : 8203882888081487848,
  "parent-snapshot-id" : 7556868946872881546,
  "timestamp-ms" : 1721764676985,
  "summary" : {
    "operation" : "append",
    "flink.operator-id" : "9135501d46e54bf84710f477c1eb5f38",
    "flink.job-id" : "ba65ea243c487f4f0fd52c158e4ed985",
    "flink.max-committed-checkpoint-id" : "19516",
    "added-data-files" : "1",
    "added-records" : "17554",
    "added-files-size" : "664840",
    "changed-partition-count" : "1",
    "total-records" : "3966880804",
    "total-files-size" : "241007398466",
    "total-data-files" : "774",
    "total-delete-files" : "2",
    "total-position-deletes" : "18608",
    "total-equality-deletes" : "0"
  },
  "manifest-list" : "hdfs://~~~~~/metadata/snap-8203882888081487848-1-354fd0bb-38d9-4706-8483-8a4276888dc3.avro",
  "schema-id" : 2
}, {
  "sequence-number" : 201720,
  "snapshot-id" : 3289453546560274810,
  "parent-snapshot-id" : 8203882888081487848,
  "timestamp-ms" : 1721764798149,
  "summary" : {
    "operation" : "append",
    "flink.operator-id" : "9135501d46e54bf84710f477c1eb5f38",
    "flink.job-id" : "ba65ea243c487f4f0fd52c158e4ed985",
    "flink.max-committed-checkpoint-id" : "19516",
    "added-data-files" : "1",
    "added-records" : "17554",
    "added-files-size" : "664840",
    "changed-partition-count" : "1",
    "total-records" : "3966898358",
    "total-files-size" : "241008063306",
    "total-data-files" : "775",
    "total-delete-files" : "2",
    "total-position-deletes" : "18608",
    "total-equality-deletes" : "0"
  },
  "manifest-list" : "hdfs://~~~~~/metadata/snap-3289453546560274810-2-e0983626-a2a5-49f2-988b-dc432f100451.avro",
  "schema-id" : 2
},
```

* snap-8203882888081487848-1-354fd0bb-38d9-4706-8483-8a4276888dc3.avro
  <img width="2044" alt="image" src="https://github.com/user-attachments/assets/09cffc8e-3dbe-46ae-ae72-748b62a2479b">
* snap-3289453546560274810-2-e0983626-a2a5-49f2-988b-dc432f100451.avro
  <img width="1879" alt="image" src="https://github.com/user-attachments/assets/c3795342-f1c7-4b42-8eca-ba5287e5f89d">
* 354fd0bb-38d9-4706-8483-8a4276888dc3-m0.avro
  * snapshot-id: 8203882888081487848
  * data_file path: hdfs://~~~~/data/xxx/00000-0-01327dd4-6162-483d-b2e0-bb0694402807-00513.parquet
* e0983626-a2a5-49f2-988b-dc432f100451-m0.avro
  * snapshot-id: 3289453546560274810
  * data_file path: hdfs://~~~~/data/xxx/00000-0-01327dd4-6162-483d-b2e0-bb0694402807-00513.parquet

Looking at the situation: the job was restored from the last completed checkpoint (19516), and after the restore the committer replayed the commit for that checkpoint even though it had already been committed. As a result, the commit for checkpoint id 19516 was performed twice, producing two snapshots that reference the same data file. When the table is read in Trino, that file is scanned twice and its rows appear duplicated.
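The duplicate reference itself can be confirmed from the `files` metadata table, which, as I understand it, exposes one row per manifest entry in the current snapshot. Again a sketch only, with `db.tbl` as a placeholder:

```sql
-- A file_path appearing more than once is referenced by multiple
-- manifests in the current snapshot and will be read multiple times.
SELECT file_path, COUNT(*) AS reference_count
FROM db.tbl.files
GROUP BY file_path
HAVING COUNT(*) > 1;
```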
I tried to delete the duplicated rows, but the following error occurred in Trino:

`Found more deleted rows than exist in the file`

`rewrite_data_files` was also performed:

```
CALL spark_catalog.system.rewrite_data_files(table => 'table', where => 'partition="xxxxxx"');
24/07/24 06:38:55 WARN package: Truncated the string representation of a plan since it was too large. This behavior can be adjusted by setting 'spark.sql.debug.maxToStringFields'.
24/07/24 06:44:02 WARN ManifestFilterManager: Deleting a duplicate path from manifest hdfs://~~~~~/metadata/f0b56e5f-9b32-48d2-ba77-ddb93081c881-m1.avro: hdfs://~~~~/data/parttiion/00000-0-01327dd4-6162-483d-b2e0-bb0694402807-00513.parquet 398 1 373764540 0
Time taken: 313.164 seconds, Fetched 1 row(s)
```

But contrary to that message, the data duplication was not actually resolved, and expiring snapshots did not work either. As a last resort, I will try to modify the manifests directly. Is there a better solution in this situation? Please also look into this behavior. Thanks.

### Willingness to contribute

- [ ] I can contribute a fix for this bug independently
- [ ] I would be willing to contribute a fix for this bug with guidance from the Iceberg community
- [ ] I cannot contribute a fix for this bug at this time
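For completeness, this is the kind of partition rewrite I would try next before editing manifests by hand. It is an untested sketch with hypothetical names (`db.tbl` for the table, `part` for the partition column), and it assumes the affected partition holds no legitimate full-row duplicates, so row counts should be verified before and after:

```sql
-- Stage a deduplicated copy of the affected partition.
CREATE OR REPLACE TABLE db.tbl_dedup USING iceberg AS
SELECT DISTINCT * FROM db.tbl WHERE part = 'xxxxxx';

-- Dynamic overwrite replaces only the partitions produced by the SELECT,
-- leaving the rest of the table untouched.
SET spark.sql.sources.partitionOverwriteMode = dynamic;
INSERT OVERWRITE db.tbl
SELECT * FROM db.tbl_dedup;

DROP TABLE db.tbl_dedup;
```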