[GitHub] [iceberg] namrathamyske opened a new issue, #7885: Rate limiting feature for structured streaming

via GitHub Thu, 22 Jun 2023 17:22:43 -0700


namrathamyske opened a new issue, #7885:
URL: https://github.com/apache/iceberg/issues/7885


   ### Apache Iceberg version
   
   main (development)
   
   ### Query engine
   
   Spark
   
   ### Please describe the bug 🐞
   
   This is about the rate limiting PR for structured streaming: 
https://github.com/apache/iceberg/pull/4479.  
   According to the unit test at 
https://github.com/apache/iceberg/pull/4479/files#diff-26782bf5c27f69e5cc9cd4a9363f601a97d1c9f97fe0c1a7fb927da7c60c014fR169
 the stream gets stuck if 
SparkReadOptions.STREAMING_MAX_ROWS_PER_MICRO_BATCH is not respected. This is a 
major blocker to adopting the feature. Once the stream is stuck, it makes no 
further progress even when new snapshots come in.
   
   E.g.:
   STREAMING_MAX_ROWS_PER_MICRO_BATCH = 2
   Snapshot1 (2 records, 1 file): read fully in Microbatch-1
   Snapshot2 (3 records, 1 file): can never be read, because 3 records > 
STREAMING_MAX_ROWS_PER_MICRO_BATCH (stuck forever)
   Snapshot3 (3 records)
   Please let me know whether this is intended behavior or whether it is expected to change. 
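
To make the scenario above concrete, here is a minimal toy model of the stall, not Iceberg's actual planner code. It assumes that a whole data file is the smallest unit a micro-batch can take and that planning stops at the first file that would push the batch over the limit. The function `plan_micro_batch` and its parameters are hypothetical names used only for illustration.

```python
from typing import List, Tuple


def plan_micro_batch(
    file_row_counts: List[int], start: int, max_rows: int
) -> Tuple[List[int], int]:
    """Toy planner (hypothetical): take whole files, in order, while the
    running row count stays within max_rows.

    Returns the indices of the files placed in the batch and the offset
    from which the next micro-batch would start.
    """
    batch: List[int] = []
    rows = 0
    pos = start
    while pos < len(file_row_counts):
        # Assumption: a file is never split, so a file that does not fit
        # ends planning for this micro-batch.
        if rows + file_row_counts[pos] > max_rows:
            break
        rows += file_row_counts[pos]
        batch.append(pos)
        pos += 1
    return batch, pos


# One file per snapshot, as in the example: Snapshot1 = 2 rows,
# Snapshot2 = 3 rows, Snapshot3 = 3 rows.
files = [2, 3, 3]
offset = 0
history = []
for _ in range(4):
    batch, offset = plan_micro_batch(files, offset, max_rows=2)
    history.append((batch, offset))

for i, (batch, next_offset) in enumerate(history, start=1):
    print(f"micro-batch {i}: files={batch}, next offset={next_offset}")
```

Under these assumptions, the first micro-batch reads the Snapshot1 file and every later micro-batch is empty with the offset pinned at the Snapshot2 file, so Snapshot3 is never reached. A planner that always admitted at least one file per micro-batch would avoid the stall in this model, at the cost of exceeding the limit for oversized files.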
   
   @singhpk234 @jackye1995 @RussellSpitzer @rdblue 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


