sanchay0 opened a new issue, #11894:
URL: https://github.com/apache/iceberg/issues/11894

   ### Query engine
   
   Flink
   
   ### Question
   
   I am running a Flink job that reads data from Kafka, processes it into a Flink [Row object](https://nightlies.apache.org/flink/flink-docs-master/api/java/org/apache/flink/types/Row.html), and writes it to an [Iceberg Sink](https://iceberg.apache.org/javadoc/1.3.0/org/apache/iceberg/flink/sink/FlinkSink.Builder.html#append--). To deploy new code changes, I restart the job from the latest savepoint, which records the offsets committed at the Kafka source. Because the savepoint stores the most recent Kafka offset read by the job, the next run continues from that offset, giving transparent failure recovery. (A minimal sketch of this setup follows the scenario below.) Recently, I encountered a scenario where data was lost from the final output after a job restart. Here's what happened:
   
   - The job reads from Kafka starting at offset 1 and writes the data to an Iceberg table backed by S3.
   - It then reads from offset 2, but the writes to the Iceberg table are delayed by backpressure or network issues.
   - The job proceeds to read from offset 3.
   - Before the data from offset 2 is written to S3, I restart the job. After the restart, the job resumes from offset 3 (the latest offset recorded in the savepoint), so the data from offset 2, which was never written to S3, is lost.
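   
   For reference, here is a minimal sketch of the setup described above. The broker address, topic, consumer group, table location, checkpoint interval, and single-column schema are all placeholders for the real job's configuration, and the `map` step stands in for the actual processing:
   
   ```java
   import org.apache.flink.api.common.eventtime.WatermarkStrategy;
   import org.apache.flink.api.common.serialization.SimpleStringSchema;
   import org.apache.flink.api.common.typeinfo.Types;
   import org.apache.flink.connector.kafka.source.KafkaSource;
   import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
   import org.apache.flink.streaming.api.datastream.DataStream;
   import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
   import org.apache.flink.table.api.DataTypes;
   import org.apache.flink.table.api.TableSchema;
   import org.apache.flink.types.Row;
   import org.apache.iceberg.flink.TableLoader;
   import org.apache.iceberg.flink.sink.FlinkSink;
   import org.apache.kafka.clients.consumer.OffsetResetStrategy;
   
   public class KafkaToIcebergJob {
     public static void main(String[] args) throws Exception {
       StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
       // The Iceberg sink stages data files between checkpoints and commits them to the
       // table only when a checkpoint (or savepoint) completes; the Kafka offsets are
       // stored in that same checkpoint.
       env.enableCheckpointing(60_000L); // placeholder interval
   
       KafkaSource<String> source = KafkaSource.<String>builder()
           .setBootstrapServers("kafka:9092") // placeholder broker
           .setTopics("events")               // placeholder topic
           .setGroupId("iceberg-writer")      // placeholder group id
           .setStartingOffsets(OffsetsInitializer.committedOffsets(OffsetResetStrategy.EARLIEST))
           .setValueOnlyDeserializer(new SimpleStringSchema())
           .build();
   
       DataStream<Row> rows = env
           .fromSource(source, WatermarkStrategy.noWatermarks(), "kafka-source")
           // Stand-in for the real processing: wrap each record in a one-field Row.
           .map(Row::of)
           .returns(Types.ROW_NAMED(new String[] {"payload"}, Types.STRING));
   
       // Flink schema matching the (placeholder) single-column Iceberg table.
       TableSchema schema = TableSchema.builder()
           .field("payload", DataTypes.STRING())
           .build();
   
       FlinkSink.forRow(rows, schema)
           .tableLoader(TableLoader.fromHadoopTable("s3://bucket/warehouse/db/events")) // placeholder
           .append();
   
       env.execute("kafka-to-iceberg");
     }
   }
   ```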
   
   Is there a workaround for this problem, or is it an inherent limitation of the Iceberg Sink?

