lirui-apache commented on issue #5846: URL: https://github.com/apache/iceberg/issues/5846#issuecomment-1651312676
We had a similar issue here. `IcebergFilesCommitter` commits data files when a checkpoint completes, stores the checkpoint ID in the Iceberg snapshot summary, and then removes the committed manifest. When restoring from state, it tries to recover the [max committed checkpoint id](https://github.com/apache/iceberg/blob/apache-iceberg-0.13.2/flink/v1.14/flink/src/main/java/org/apache/iceberg/flink/sink/IcebergFilesCommitter.java#L147) from the table history. But this seems to be only best effort, because the snapshot that carries that ID might have been expired. When that happens, `IcebergFilesCommitter` considers all files in the state as "uncommitted" and tries to [commit them again](https://github.com/apache/iceberg/blob/apache-iceberg-0.13.2/flink/v1.14/flink/src/main/java/org/apache/iceberg/flink/sink/IcebergFilesCommitter.java#L154), which can fail because the committed manifest has already been removed. @openinx is this a known limitation, meaning we should be more careful with snapshot expiration?
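To make the failure mode concrete, here is a minimal Python model of the restore logic as I understand it. It is a sketch, not the actual Iceberg code: `Snapshot`, `max_committed_checkpoint_id`, `checkpoints_to_recommit`, and the `-1` sentinel are hypothetical names for illustration, and the summary keys are my assumption about what the committer writes.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Assumed sentinel meaning "no committed checkpoint found in table history".
INITIAL_CHECKPOINT_ID = -1

# Assumed summary keys; treat these as illustrative.
JOB_ID_KEY = "flink.job-id"
MAX_CKPT_KEY = "flink.max-committed-checkpoint-id"


@dataclass
class Snapshot:
    snapshot_id: int
    parent_id: Optional[int]
    summary: Dict[str, str] = field(default_factory=dict)


def max_committed_checkpoint_id(
    snapshots: List[Snapshot], current_id: int, job_id: str
) -> int:
    """Walk back from the current snapshot through the parents that are
    still retained, and return the checkpoint ID recorded by this job."""
    by_id = {s.snapshot_id: s for s in snapshots}
    snap = by_id.get(current_id)
    while snap is not None:
        if snap.summary.get(JOB_ID_KEY) == job_id:
            return int(snap.summary[MAX_CKPT_KEY])
        # An expired parent is simply missing, which ends the walk.
        snap = by_id.get(snap.parent_id)
    return INITIAL_CHECKPOINT_ID


def checkpoints_to_recommit(state: Dict[int, str], max_committed: int) -> List[int]:
    """Checkpoints in restored state that look uncommitted."""
    return sorted(cp for cp in state if cp > max_committed)


# Restored Flink state: checkpoint 2 points at a manifest that was
# already committed and deleted before the restart.
state = {2: "manifest-for-checkpoint-2"}

# Full history: the job's commits (10, 11) are still reachable.
full_history = [
    Snapshot(10, None, {JOB_ID_KEY: "job-a", MAX_CKPT_KEY: "1"}),
    Snapshot(11, 10, {JOB_ID_KEY: "job-a", MAX_CKPT_KEY: "2"}),
    Snapshot(12, 11, {}),  # commit from another writer
]

# After expiration only the other writer's snapshot remains.
expired_history = [Snapshot(12, 11, {})]

ok = max_committed_checkpoint_id(full_history, 12, "job-a")
lost = max_committed_checkpoint_id(expired_history, 12, "job-a")
print(ok, checkpoints_to_recommit(state, ok))
print(lost, checkpoints_to_recommit(state, lost))
```

With the full history nothing is re-committed. Once the job's snapshots are expired, the lookup falls back to the sentinel and checkpoint 2 is selected for re-commit even though its manifest no longer exists.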
