[I] Spark readStream not progressing [iceberg]

via GitHub Mon, 03 Mar 2025 05:05:33 -0800


davidrobbo opened a new issue, #12444:
URL: https://github.com/apache/iceberg/issues/12444


   ### Apache Iceberg version
   
   1.6.1
   
   ### Query engine
   
   Spark
   
   ### Please describe the bug 🐞
   
   I'm trying to use Spark's readStream functionality to incrementally read 
changes to an Iceberg table on EMR serverless integrated with the AWS Glue 
Catalog, and S3 also used for checkpointing.
   
   ```
   spark.readStream
         .format("iceberg")
         .load(f"${sourceDatabase}.${sourceTable}")
   ```
   
   The initial query sets the offset to the 79th of the table's 130 snapshots, 
and subsequent runs do not progress.
   
   I've confirmed the base table I read from only has append snapshots, not 
deletes or updates.
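   
   For reference, one way to check that claim is to group Iceberg's `snapshots` 
metadata table by operation. This is a minimal sketch, not the exact check that 
was run: the object and function names are hypothetical, and it assumes the 
table identifier resolves in the session's default (Glue) catalog.
   
   ```scala
   import org.apache.spark.sql.{DataFrame, SparkSession}

   object SnapshotCheck {
     // Hypothetical helper: builds a query against the `snapshots` metadata
     // table, counting snapshots per operation (append, overwrite, delete, ...).
     def snapshotOpsQuery(table: String): String =
       s"SELECT operation, count(*) AS snapshots FROM $table.snapshots GROUP BY operation"

     // Runs the query; `table` is assumed to be a resolvable "database.table" name.
     def snapshotOps(spark: SparkSession, table: String): DataFrame =
       spark.sql(snapshotOpsQuery(table))
   }
   ```
   
   Calling `SnapshotCheck.snapshotOps(spark, s"$sourceDatabase.$sourceTable").show()` 
should list a single `append` row if the table really is append-only.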
   
   The query appears to be stuck. I have tried a variety of options suggested 
for similar bugs, more in the hope that they might help than because I expect 
them to be needed. This includes setting `"stream-from-timestamp"` to a value 
prior to the first snapshot, and also using:
   
   ```
   .option("streaming-skip-overwrite-snapshots", "true")
   .option("streaming-skip-delete-snapshots", "true")
   ```
   
   None of these change the behaviour, nor do I expect them to, given that the 
base table is append-only and has no snapshot expiration.
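   
   Put together, the setup described above looks roughly like the sketch below. 
It is an illustration under assumptions rather than the exact job: the object 
and function names, the console sink, and the `checkpoint` path parameter are 
hypothetical, and only the three read options come from this report.
   
   ```scala
   import org.apache.spark.sql.SparkSession
   import org.apache.spark.sql.streaming.StreamingQuery

   object IcebergStreamSketch {
     // The read options tried in this report. `stream-from-timestamp` is given
     // in milliseconds since the epoch.
     def readOptions(fromTimestampMillis: Long): Map[String, String] = Map(
       "stream-from-timestamp" -> fromTimestampMillis.toString,
       "streaming-skip-overwrite-snapshots" -> "true",
       "streaming-skip-delete-snapshots" -> "true"
     )

     // Starts a streaming read of `table` and writes each micro-batch to the
     // console. `checkpoint` is a placeholder for the S3 checkpoint location.
     def start(spark: SparkSession, table: String, checkpoint: String,
               fromTimestampMillis: Long): StreamingQuery =
       spark.readStream
         .format("iceberg")
         .options(readOptions(fromTimestampMillis))
         .load(table)
         .writeStream
         .format("console")
         .option("checkpointLocation", checkpoint)
         .start()
   }
   ```
   
   The `lastProgress` field of the returned `StreamingQuery` reports the start 
and end offsets of the most recent micro-batch, which may help show where the 
stream stops advancing.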
   
   ### Willingness to contribute
   
   - [ ] I can contribute a fix for this bug independently
   - [ ] I would be willing to contribute a fix for this bug with guidance from 
the Iceberg community
   - [ ] I cannot contribute a fix for this bug at this time


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

