davidrobbo opened a new issue, #12444: URL: https://github.com/apache/iceberg/issues/12444
### Apache Iceberg version

1.6.1

### Query engine

Spark

### Please describe the bug 🐞

I'm trying to use Spark's `readStream` to incrementally read changes to an Iceberg table on EMR Serverless, integrated with the AWS Glue Catalog, with S3 also used for checkpointing.

```
spark.readStream
  .format("iceberg")
  .load(f"${sourceDatabase}.${sourceTable}")
```

The initial query sets the offset to the 79th of the table's 130 snapshots, and all subsequent runs do not progress past it. I've confirmed that the base table I read from has only append snapshots, no deletes or updates. The query appears to be stuck.

I have tried a variety of options suggested for similar bugs, more in the hope that they might help than because I expect them to be needed. This includes setting `"stream-from-timestamp"` to a value prior to the first snapshot, and also using:

```
.option("streaming-skip-overwrite-snapshots", "true")
.option("streaming-skip-delete-snapshots", "true")
```

None of these change the behaviour (nor would I expect them to, given that the base table is append-only and has no snapshot expiration).

### Willingness to contribute

- [ ] I can contribute a fix for this bug independently
- [ ] I would be willing to contribute a fix for this bug with guidance from the Iceberg community
- [ ] I cannot contribute a fix for this bug at this time