[GitHub] [iceberg] amogh-jahagirdar opened a new pull request, #6480: Spark: Fail streaming planning when snapshot not found

GitBox Wed, 21 Dec 2022 20:01:15 -0800


amogh-jahagirdar opened a new pull request, #6480:
URL: https://github.com/apache/iceberg/pull/6480

Fixing error handling for https://github.com/apache/iceberg/issues/6388.

Based on the stack trace, the following sequence of events seems plausible:

1.) [The snapshot ID for the current offset no longer exists (my hunch is that the snapshot was
expired)](https://github.com/apache/iceberg/blob/master/spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/source/SparkMicroBatchStream.java#L210).
As a result, table.snapshot(currentOffset.snapshotId()) returns null.

2.) Planning then throws an uninformative
[NPE](https://github.com/apache/iceberg/blob/master/spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/source/SparkMicroBatchStream.java#L229)
when it tries to read the operation associated with that snapshot.
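The two steps above can be reproduced with a small, self-contained model. This is a hypothetical sketch, not Iceberg code: FakeTable, Snapshot and operationFor are illustrative stand-ins that only mimic the behavior described here, namely that looking up an unknown (e.g. expired) snapshot ID returns null and the caller then dereferences it.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins for Iceberg's Table/Snapshot; NOT the Iceberg API.
public class MissingSnapshotNpe {

  // Minimal snapshot: an ID plus the operation that produced it.
  static final class Snapshot {
    private final long snapshotId;
    private final String operation;

    Snapshot(long snapshotId, String operation) {
      this.snapshotId = snapshotId;
      this.operation = operation;
    }

    long snapshotId() { return snapshotId; }

    String operation() { return operation; }
  }

  static final class FakeTable {
    private final Map<Long, Snapshot> snapshots = new HashMap<>();

    void add(Snapshot snapshot) { snapshots.put(snapshot.snapshotId(), snapshot); }

    // Returns null when the ID is unknown, e.g. after the snapshot was expired.
    Snapshot snapshot(long snapshotId) { return snapshots.get(snapshotId); }

    // Simulates snapshot expiration by dropping the entry.
    void expire(long snapshotId) { snapshots.remove(snapshotId); }
  }

  // Unguarded pattern: look up the snapshot, then dereference it directly.
  static String operationFor(FakeTable table, long snapshotId) {
    Snapshot snapshot = table.snapshot(snapshotId);
    return snapshot.operation(); // throws NullPointerException if the snapshot is gone
  }

  public static void main(String[] args) {
    FakeTable table = new FakeTable();
    table.add(new Snapshot(1L, "append"));
    System.out.println(operationFor(table, 1L)); // prints "append"

    table.expire(1L);
    try {
      operationFor(table, 1L);
    } catch (NullPointerException e) {
      // The resulting error says nothing about a missing snapshot.
      System.out.println("NullPointerException while planning");
    }
  }
}
```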

In this approach, planning fails outright, since streaming requires a known chain of
snapshots.
That said, I would appreciate feedback from folks more familiar with Spark
@RussellSpitzer @aokolnychyi @singhpk234 @rajarshisarkar. I'm not sure we can
safely just skip the missing snapshot, since a consumer of the stream would then technically
not be consuming the original state of the table.
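The fail-fast idea can be sketched as a guarded lookup that stops planning with a descriptive error instead of letting the null reach a later dereference. This is a minimal, hypothetical illustration under the same simplified model (a plain Map of snapshot IDs); requireSnapshot and the error message are invented for the example and are not necessarily what this PR's diff does.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of "fail planning when the snapshot is not found".
// NOT the Iceberg API and not necessarily the actual change in this PR.
public class FailFastSnapshotLookup {

  // Minimal stand-in for a snapshot: an ID plus its operation.
  static final class Snapshot {
    private final long snapshotId;
    private final String operation;

    Snapshot(long snapshotId, String operation) {
      this.snapshotId = snapshotId;
      this.operation = operation;
    }

    long snapshotId() { return snapshotId; }

    String operation() { return operation; }
  }

  // Resolves the snapshot for the current offset, or fails with a clear message.
  static Snapshot requireSnapshot(Map<Long, Snapshot> snapshotsById, long snapshotId) {
    Snapshot snapshot = snapshotsById.get(snapshotId);
    if (snapshot == null) {
      throw new IllegalStateException(
          String.format(
              "Cannot plan micro-batch: snapshot %d for the current offset no longer exists"
                  + " (it may have been expired)",
              snapshotId));
    }
    return snapshot;
  }

  public static void main(String[] args) {
    Map<Long, Snapshot> snapshotsById = new HashMap<>();
    snapshotsById.put(1L, new Snapshot(1L, "append"));
    System.out.println(requireSnapshot(snapshotsById, 1L).operation()); // prints "append"

    snapshotsById.remove(1L); // simulate snapshot expiration
    try {
      requireSnapshot(snapshotsById, 1L);
    } catch (IllegalStateException e) {
      // Planning stops here, and the message names the missing snapshot ID.
      System.out.println(e.getMessage());
    }
  }
}
```

Throwing rather than skipping mirrors the concern above: if planning silently moved past the missing snapshot, the stream would advance without the consumer ever having seen the data that snapshot represented.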

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

