rodmeneses commented on issue #10892:
URL: https://github.com/apache/iceberg/issues/10892#issuecomment-2302492563

   > For the 3rd point:
   > 
   > * Flink uses two phase commits. In the 1st phase the data is written to a 
temp manifest file, and the file path is stored into the state.
   > 
   > So if there is a failure between the 2 commit phases, it could happen that 
the data is available in the temp manifest file, but not yet committed to the 
Iceberg table. This should be considered before throwing an error.
   > 
   > But if the current snapshot of the Iceberg table is newer than the 
checkpoint we restore from, then it would be fine to revert to the given 
snapshot - we need to throw an exception if there were any concurrent writes to 
the table in the meantime (some other writers might have written data independently).
   > 
   > Also we should examine what we can do with the new IcebergSink. 
@rodmeneses could you please chime in?
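
One possible reading of the quoted proposal, as a rough sketch only: on restore, look at who wrote the snapshots that are newer than the restored checkpoint. Everything here is hypothetical (`RestoreCheckSketch`, `Action`, `decide`, and the idea of passing in a list of writer job IDs); none of it is an existing Iceberg or Flink API, and it is not an agreed design.

```java
import java.util.List;

// Hypothetical sketch of the restore-time decision described in the quote above.
class RestoreCheckSketch {
  enum Action { COMMIT_PENDING, REVERT_TO_CHECKPOINT, FAIL }

  // writersAfterCheckpoint: assumed input, the job id recorded on each table
  // snapshot that is newer than the checkpoint we restore from.
  static Action decide(String jobId, List<String> writersAfterCheckpoint) {
    if (writersAfterCheckpoint.isEmpty()) {
      // Table is not ahead of the checkpoint. Temp manifests stored in state
      // may still be uncommitted (failure between the two commit phases),
      // so they should be committed rather than treated as an error.
      return Action.COMMIT_PENDING;
    }
    for (String writer : writersAfterCheckpoint) {
      if (!jobId.equals(writer)) {
        // Some other writer changed the table in the meantime.
        return Action.FAIL;
      }
    }
    // Only this job wrote the newer snapshots, so reverting would be safe.
    return Action.REVERT_TO_CHECKPOINT;
  }
}
```

For example, `decide("job-a", List.of("job-a", "other"))` yields `FAIL` in this sketch, because a snapshot from a different writer sits after the checkpoint.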
   
   Both the current FlinkSink and the incoming IcebergSink share the same 
logic for `getMaxCommittedCheckpointId`. Both of them also skip 
checkpoints before this maxCheckpointId. In the new IcebergSink, we also 
call `signalAlreadyCommitted` for each checkpoint that is skipped. 
   So I'd say that the same issue will also impact the new IcebergSink. 
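
To make the shared behavior concrete, here is a simplified model of it, not the actual sink code. Snapshots are modeled as plain summary maps, newest first; the class and method `checkpointsToCommit` are made up for illustration, the summary keys `flink.job-id` and `flink.max-committed-checkpoint-id` are written from memory and should be treated as assumptions, and the real lookup may check more than the job ID. Treating a checkpoint equal to the max as already committed is also an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;

// Simplified, hypothetical model of "find max committed checkpoint, skip the rest".
class SkipCommittedCheckpointsSketch {
  static final long INITIAL_CHECKPOINT_ID = -1L;

  // Walk snapshot summaries from newest to oldest and return the checkpoint id
  // recorded by the first snapshot that belongs to the given job.
  static long getMaxCommittedCheckpointId(List<Map<String, String>> newestFirst, String jobId) {
    for (Map<String, String> summary : newestFirst) {
      String value = summary.get("flink.max-committed-checkpoint-id");
      if (jobId.equals(summary.get("flink.job-id")) && value != null) {
        return Long.parseLong(value);
      }
    }
    return INITIAL_CHECKPOINT_ID;
  }

  // pending maps checkpoint id -> temp manifest path restored from state.
  // Only checkpoints strictly greater than maxCommitted are returned; the
  // others are skipped (the new sink would signal them as already committed).
  static List<Long> checkpointsToCommit(NavigableMap<Long, String> pending, long maxCommitted) {
    return new ArrayList<>(pending.tailMap(maxCommitted, false).keySet());
  }
}
```

In this model, if the table reports a max committed checkpoint of 7 and the restored state only holds checkpoints 6 and 7, `checkpointsToCommit` returns an empty list and nothing is committed, whether or not that data actually made it into the table.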
   @lkokhreidze Thanks for reporting this and volunteering to work on a fix. 
I'd say the first thing would be to create unit tests to reproduce this issue. 
Once you have them, let me know and I can port them over to the new 
IcebergSink implementation. 
   cc: @pvary @stevenzwu  
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

