sanchay0 commented on issue #11894: URL: https://github.com/apache/iceberg/issues/11894#issuecomment-2569685261
Thanks @pvary. Explaining my job setup should provide helpful context. The setup is straightforward: the job reads from Kafka, deserializes the records, and writes to Iceberg. The sink looks like this:

```
Multiple StreamWriters (parallelism=360)      Single FileCommitter (parallelism=1)

Writer 1 (temp files) --------┐
Writer 2 (temp files) --------├--> Committer ---> Final Table State
Writer n (temp files) --------┘
```

There is a reshuffle of data between the temporary file writers and the committer. However, the writers do not seem to contribute to the checkpoint/savepoint data size, whereas the committer does.

> Even if the committer doesn't write the data to the Iceberg table, the savepoint is only successful if the temp files are already written out.

Based on this, though, shouldn't checkpoints track the temp files as well? Could it be that when I restored from a savepoint, any temp files that were written but not yet committed were essentially "lost"?
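To make the question concrete, here is a toy Python model of the writer/committer layout described above. It is only a sketch, not Iceberg's or Flink's actual code: `Writer`, `Committer`, and `run_checkpoint` are hypothetical names, and the model assumes that writers flush their temp files and hand the file paths to the committer before the checkpoint barrier, which would explain why only the committer contributes to the state size.

```python
from dataclasses import dataclass, field


@dataclass
class Writer:
    """Hypothetical stand-in for one StreamWriter subtask; keeps no checkpointed state."""
    writer_id: int
    open_files: list = field(default_factory=list)

    def write(self, record):
        # Data goes to a temp file on storage, not into operator state.
        self.open_files.append(f"writer-{self.writer_id}/{record}.parquet")

    def flush(self):
        # Assumption: runs before the checkpoint barrier and hands off the file paths.
        done, self.open_files = self.open_files, []
        return done


@dataclass
class Committer:
    """Hypothetical stand-in for the single FileCommitter (parallelism=1)."""
    pending: dict = field(default_factory=dict)  # checkpoint id -> file paths (the state)
    table: list = field(default_factory=list)    # files visible in the table

    def snapshot(self, checkpoint_id, files):
        self.pending[checkpoint_id] = list(files)
        # Return a copy: this is what a checkpoint/savepoint would persist.
        return {cid: list(paths) for cid, paths in self.pending.items()}

    def commit_up_to(self, checkpoint_id):
        # Publish every pending checkpoint up to and including checkpoint_id.
        for cid in sorted(c for c in self.pending if c <= checkpoint_id):
            self.table.extend(self.pending.pop(cid))

    @classmethod
    def restore(cls, state):
        return cls(pending={cid: list(paths) for cid, paths in state.items()})


def run_checkpoint(writers, committer, checkpoint_id):
    # All writers flush first; the committer then records the flushed files.
    files = [path for w in writers for path in w.flush()]
    return committer.snapshot(checkpoint_id, files)


writers = [Writer(i) for i in range(3)]
committer = Committer()
for i, w in enumerate(writers):
    w.write(f"rec{i}")

state = run_checkpoint(writers, committer, checkpoint_id=1)

# Simulate a restore that happens before checkpoint 1 reached the table.
restored = Committer.restore(state)
restored.commit_up_to(1)
```

In this model, files flushed before the savepoint survive a restore because the committer's state references them, while any file a writer produces after the savepoint is referenced by nothing in the restored state. Whether the real sink behaves the same way is exactly what I am trying to confirm.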