pvary commented on issue #9089: URL: https://github.com/apache/iceberg/issues/9089#issuecomment-1843199768
Flink Writer operators write the data files to their final place - to the data directory of the table. Then sends the file metadata (file name, statistics, etc) to the Flink Committer operator. On snapshot/checkpoint the Flink Committer does a 2 phase commit: - Phase 1 - snapshotState - creates a single temporary manifest file in the metadata directory, which stores the metadata for every files received in the given snapshot. Also stores the manifest file path (this is a bit simplified for clarity) in the job state. - Phase 2 - notifyCheckpointComplete - reads the temporary manifest file and creates an iceberg commit from it. The commit summary for this commit will contain: - JobId - OperatorId - CheckpointId When the Flink job restarts, it reads the table commits, and finds the highest checkpointId for the jobId and operatorId in the history. Also it reads the state, to checks if we have data which is already snapshotted, but not yet committed to the table. If we have uncommitted data we commit it at this stage. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org