chberti opened a new issue, #14622: URL: https://github.com/apache/iceberg/issues/14622
### Feature Request / Improvement

Hello!

I've been using an Iceberg table as a Spark Structured Streaming input and encountered some errors because of my table settings.

Here is the setup:

- An Iceberg table, into which a streaming app we will call **Writer** loads data frequently (30-second micro-batches).
- A maintenance app we will call **Maintainer** handles deletion of older data, metadata file rewrites, and snapshot expiry. Important: for legal reasons, I must keep only recent snapshots. I currently have a **2-hour snapshot retention**.
- Another streaming app we will call **Reader** reads append snapshots and does some work. It ignores delete and update snapshots.

In my case, if the Writer does nothing for more than 2 hours, the Reader fails with a NoSuchSnapshotException. I think this happens because the table no longer contains the last snapshot the Reader read, and no new append snapshot exists yet. By comparison, when the Reader reads from a Kafka topic, it never crashes for this reason.

Even though I might find a workaround, I think it would be a great feature to let Spark apps keep working properly in this kind of situation. I will do some digging into how Spark stores checkpoints for this kind of input, but I think the Reader should be able to wait a long time for an append snapshot. Even if the snapshot ID recorded in the checkpoint directory has expired, the checkpoint metadata should keep track of the files already read, so that the Reader can continue with new files once the Writer starts producing again.

Are there settings I could use to achieve this behavior, or is a change in Spark checkpointing needed?

Thank you in advance. I'm not an Iceberg expert yet, but I hope to become one someday!
Regards,
Charles

### Query engine

Spark

### Willingness to contribute

- [ ] I can contribute this improvement/feature independently
- [x] I would be willing to contribute this improvement/feature with guidance from the Iceberg community
- [ ] I cannot contribute this improvement/feature at this time

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
