rodmeneses commented on issue #10892: URL: https://github.com/apache/iceberg/issues/10892#issuecomment-2302492563
> For the 3rd point:
>
> * Flink uses two phase commits. In the 1st phase the data is written to a temp manifest file, and the file path is stored into the state.
>
> So if there is a failure between the 2 commit phases, it could happen that the data is available in the temp manifest file, but not yet committed to the Iceberg table. This should be considered before throwing an error.
>
> But if the current snapshot of the Iceberg table is newer than the checkpoint we restore from, then it would be fine to revert to the given snapshot - we need to throw an exception if there were any concurrent writes to the table in the meantime (some other writers might have written data independently).
>
> Also we should examine what we can do with the new IcebergSink. @rodmeneses could you please chime in?

Both the current FlinkSink and the incoming IcebergSink share the same logic for `getMaxCommittedCheckpointId`. Both of them also skip checkpoints before this maxCheckpointId. In the new IcebergSink, we also call `signalAlreadyCommitted` for each checkpoint that is skipped. So I'd say that the same issue will also impact the new IcebergSink.

@lkokhreidze Thanks for reporting this and volunteering to work on a fix. I'd say the first step would be to create a unit test that reproduces this issue. Once you have it, let me know and I can port it over to the new IcebergSink implementation.

cc: @pvary @stevenzwu

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org