pvary commented on PR #10526:
URL: https://github.com/apache/iceberg/pull/10526#issuecomment-2238165277

> @zhongqishang @pvary I have a uber question.
>
> let's say checkpoint N was cancelled or timed out and checkpoint N+1 completed successfully. In this case, do we know all the writer subtasks have flushed data files for checkpoint N and all write results have all been received by the committer?

I think we only need to be sure that every writer which has successfully closed on a given checkpoint is included in the Iceberg commit for that checkpoint. If some of the writers have not closed, they will keep collecting consistent data, and the results of the next checkpoint will be consistent.

What we should be aware of is that the Iceberg commit for the unsuccessful checkpoint might not contain everything received up to the checkpoint time. I think this is OK, since users can check whether the given checkpoint was successful or not.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org