pvary commented on issue #9089:
URL: https://github.com/apache/iceberg/issues/9089#issuecomment-1843199768

   Flink Writer operators write the data files directly to their final 
location, the data directory of the table. They then send the file metadata 
(file name, statistics, etc.) to the Flink Committer operator.
   On snapshot/checkpoint, the Flink Committer does a two-phase commit:
   - Phase 1 - snapshotState - creates a single temporary manifest file in the 
metadata directory, which stores the metadata for every file received in the 
given snapshot. It also stores the manifest file path in the job state (this 
is a bit simplified for clarity).
   - Phase 2 - notifyCheckpointComplete - reads the temporary manifest file and 
creates an Iceberg commit from it. The commit summary for this commit will 
contain:
       - JobId
       - OperatorId
       - CheckpointId
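
   To make the two phases concrete, here is a minimal in-memory sketch. It is 
not the actual Flink or Iceberg code: the class `TwoPhaseCommitSketch`, the 
manifest path format, and the summary keys `jobId`, `operatorId` and 
`checkpointId` are all hypothetical stand-ins, and the "metadata directory", 
"job state" and "table history" are plain Java collections.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of the committer; none of these names are Flink or Iceberg APIs.
final class TwoPhaseCommitSketch {

    /** One table commit: the committed data files plus the commit summary. */
    static final class TableCommit {
        final List<String> dataFiles;
        final Map<String, String> summary;

        TableCommit(List<String> dataFiles, Map<String, String> summary) {
            this.dataFiles = dataFiles;
            this.summary = summary;
        }
    }

    static final class Committer {
        private final String jobId;
        private final String operatorId;
        // File metadata received from the writers since the last snapshot.
        private final List<String> pendingFiles = new ArrayList<>();
        // Stand-in for the metadata directory: manifest path -> file names.
        private final Map<String, List<String>> manifestStore = new LinkedHashMap<>();
        // Stand-in for the job state: checkpointId -> temporary manifest path.
        final TreeMap<Long, String> state = new TreeMap<>();
        // Stand-in for the table history.
        final List<TableCommit> tableHistory = new ArrayList<>();

        Committer(String jobId, String operatorId) {
            this.jobId = jobId;
            this.operatorId = operatorId;
        }

        /** Called when a writer reports a finished data file. */
        void receive(String dataFile) {
            pendingFiles.add(dataFile);
        }

        /** Phase 1: write one temporary manifest and remember its path in state. */
        void snapshotState(long checkpointId) {
            // The path format is an assumption made up for this sketch.
            String manifestPath =
                "metadata/" + jobId + "-" + operatorId + "-" + checkpointId + ".manifest";
            manifestStore.put(manifestPath, new ArrayList<>(pendingFiles));
            pendingFiles.clear();
            state.put(checkpointId, manifestPath);
        }

        /** Phase 2: read the temporary manifest and turn it into a table commit. */
        void notifyCheckpointComplete(long checkpointId) {
            String manifestPath = state.remove(checkpointId);
            if (manifestPath == null) {
                return; // nothing was snapshotted for this checkpoint
            }
            List<String> files = manifestStore.remove(manifestPath);
            Map<String, String> summary = new LinkedHashMap<>();
            summary.put("jobId", jobId);
            summary.put("operatorId", operatorId);
            summary.put("checkpointId", Long.toString(checkpointId));
            tableHistory.add(new TableCommit(files, summary));
        }
    }

    public static void main(String[] args) {
        Committer committer = new Committer("job-1", "op-1");
        committer.receive("data/a.parquet");
        committer.receive("data/b.parquet");
        committer.snapshotState(1L);            // files are staged, not yet visible
        committer.notifyCheckpointComplete(1L); // files become part of the table
        System.out.println(committer.tableHistory.get(0).summary);
    }
}
```

   The point of the split is that after Phase 1 the data is durable (data 
files, manifest, and the manifest path in state) but not yet visible in the 
table; only Phase 2 changes the table.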
   
   When the Flink job restarts, it reads the table commits and finds the 
highest checkpointId for the jobId and operatorId in the history. It also reads 
the state to check whether there is data which is already snapshotted, but not 
yet committed to the table. If there is uncommitted data, it is committed at 
this stage.
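
   The restart logic can be sketched in the same simplified style. This is an 
illustration only: `RestartRecoverySketch`, `CommitSummary` and `recover` are 
hypothetical names, and the sketch assumes that "uncommitted" means a 
checkpointId in the restored state that is higher than the highest one found 
in the table history for the same jobId and operatorId.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical model of restart recovery; not the actual Flink or Iceberg code.
final class RestartRecoverySketch {

    /** The three values stored in a commit summary. */
    static final class CommitSummary {
        final String jobId;
        final String operatorId;
        final long checkpointId;

        CommitSummary(String jobId, String operatorId, long checkpointId) {
            this.jobId = jobId;
            this.operatorId = operatorId;
            this.checkpointId = checkpointId;
        }
    }

    /** Highest checkpointId committed by this job and operator, or -1 if none. */
    static long maxCommittedCheckpointId(
            List<CommitSummary> history, String jobId, String operatorId) {
        long max = -1L;
        for (CommitSummary s : history) {
            if (s.jobId.equals(jobId) && s.operatorId.equals(operatorId)) {
                max = Math.max(max, s.checkpointId);
            }
        }
        return max;
    }

    /**
     * Commits every snapshotted checkpoint that is newer than the last committed
     * one and returns the checkpoint ids that were committed during recovery.
     * restoredState maps checkpointId -> temporary manifest path.
     */
    static List<Long> recover(
            List<CommitSummary> history,
            SortedMap<Long, String> restoredState,
            String jobId,
            String operatorId) {
        long maxCommitted = maxCommittedCheckpointId(history, jobId, operatorId);
        List<Long> recovered = new ArrayList<>();
        // tailMap is inclusive, so start one past the last committed checkpoint.
        for (Map.Entry<Long, String> entry : restoredState.tailMap(maxCommitted + 1).entrySet()) {
            // A real committer would read the manifest at entry.getValue() here.
            history.add(new CommitSummary(jobId, operatorId, entry.getKey()));
            recovered.add(entry.getKey());
        }
        return recovered;
    }

    public static void main(String[] args) {
        List<CommitSummary> history = new ArrayList<>();
        history.add(new CommitSummary("job-1", "op-1", 3L));

        SortedMap<Long, String> restoredState = new TreeMap<>();
        restoredState.put(3L, "metadata/m3"); // already committed before the restart
        restoredState.put(4L, "metadata/m4"); // snapshotted, but never committed

        System.out.println(recover(history, restoredState, "job-1", "op-1"));
    }
}
```

   Because the checkpointId is stored in the commit summary, the comparison 
against the table history keeps a checkpoint that was committed just before 
the failure from being committed a second time.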


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

