[I] Spark Streaming connector read initial snapshot of iceberg table [iceberg]

via GitHub Fri, 30 May 2025 00:43:57 -0700


chaoqin-li1123 opened a new issue, #13188:
URL: https://github.com/apache/iceberg/issues/13188


   ### Feature Request / Improvement
   
   The Spark Structured Streaming connector does not scan all files in the initial table snapshot.
   
   In our use case, we want the streaming query to process all existing data in the table (regardless of whether there have been any historical deletes or overwrites) and then process new append commits incrementally. This is how the Delta Lake Spark streaming connector behaves. However, scanning all data in the initial snapshot does not seem to be supported by the Iceberg connector. Could this be supported by the Iceberg Spark connector? @singhpk234 @wypoon
   
   I wonder if this could be supported by setting the `scanAllFiles` field of the initial offset to true.
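   To make the request concrete, here is a toy Python model of the two ways a stream could start. Everything in it (`Snapshot`, `Offset`, `plan_batches`, the `scan_all_files` flag) is hypothetical and is neither Iceberg's nor Spark's API; the flag only mirrors the `scanAllFiles` idea above. It assumes a linear snapshot history listed oldest first, and that only append snapshots can be read incrementally.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Snapshot:
    """Toy stand-in for an Iceberg snapshot; NOT the real Iceberg API."""
    snapshot_id: int
    operation: str  # e.g. "append", "delete", "overwrite"
    added_files: List[str] = field(default_factory=list)
    removed_files: List[str] = field(default_factory=list)


@dataclass(frozen=True)
class Offset:
    """Hypothetical starting offset carrying the proposed scan-all-files flag."""
    snapshot_id: int
    scan_all_files: bool


def live_files(history: List[Snapshot], upto_id: int) -> List[str]:
    """Replay adds/removes (oldest first) to get the files live at `upto_id`."""
    live: Dict[str, None] = {}  # dict preserves insertion order: an ordered set
    for snap in history:
        for f in snap.removed_files:
            live.pop(f, None)
        for f in snap.added_files:
            live[f] = None
        if snap.snapshot_id == upto_id:
            break
    return list(live)


def appended_files(snap: Snapshot) -> List[str]:
    """Incremental read of one snapshot: only appends yield new rows to stream."""
    if snap.operation != "append":
        raise ValueError(f"snapshot {snap.snapshot_id} is a {snap.operation}, not an append")
    return list(snap.added_files)


def plan_batches(history: List[Snapshot], start: Offset) -> List[List[str]]:
    """Return the files each micro-batch would read, starting at `start`."""
    ids = [s.snapshot_id for s in history]
    i = ids.index(start.snapshot_id)
    if start.scan_all_files:
        # First batch: the full table state at the start snapshot, however it
        # was produced (earlier deletes/overwrites are already folded in).
        batches = [live_files(history, start.snapshot_id)]
    else:
        # Purely incremental: the start snapshot contributes only its own appends.
        batches = [appended_files(history[i])]
    # Every later snapshot is processed incrementally.
    for snap in history[i + 1:]:
        batches.append(appended_files(snap))
    return batches


history = [
    Snapshot(1, "append", added_files=["a.parquet", "b.parquet"]),
    Snapshot(2, "delete", removed_files=["a.parquet"]),
    Snapshot(3, "append", added_files=["c.parquet"]),
    Snapshot(4, "append", added_files=["d.parquet"]),  # committed after the query starts
]

# Requested behavior: full snapshot of table state at 3, then new appends.
print(plan_batches(history, Offset(snapshot_id=3, scan_all_files=True)))
# [['b.parquet', 'c.parquet'], ['d.parquet']]
```

   Starting instead from snapshot 1 with `scan_all_files=False` makes the toy planner raise `ValueError` at the delete in snapshot 2, which is the kind of history the request wants the initial scan to absorb.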
   
   ### Query engine
   
   Spark
   
   ### Willingness to contribute
   
   - [ ] I can contribute this improvement/feature independently
   - [ ] I would be willing to contribute this improvement/feature with guidance from the Iceberg community
   - [ ] I cannot contribute this improvement/feature at this time


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

