VidakM commented on issue #9948:
URL: https://github.com/apache/iceberg/issues/9948#issuecomment-1996740393
Thanks for getting back to me. Looking at the docs, I see that Spark Structured Streaming carries a similar warning about only supporting appends. That's a pity, as it would unlock a lot of interesting solutions.

Is append+upsert (delete) streaming in Flink and Spark something you don't want to support because there is a different preferred way, or is it just later on the roadmap? If the latter, any idea on the timeline?

For us this would unlock plenty of interesting use cases for Iceberg. At the moment we dump raw events into Iceberg for replay storage, and with Flink we also unpack them and tidy them up a little. Call it a silver layer if you like. But if we could subscribe to the more complex upsert writes on the silver layer (late-arrival joins, aggregations, pivots, partial GDPR deletes), we could automate pipelines that maintain more “golden” tables: clean, generated and maintained tables for data science teams, better views for Superset users, easier streaming of transformed data from Iceberg to other products, and so on.

As things stand, we would probably need to use Kafka or something similar as a buffer and do dual writes to get CDC for upserts.

It would also let us work around some Flink limitations: we would not need to buffer as much in memory for late arrivals, and Flink lacks some of the good transformation SQL that Spark has.

We would happily help contribute if possible, given a few pointers.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

For queries about this service, please contact Infrastructure at: us...@infra.apache.org