stevenzwu commented on code in PR #12988:
URL: https://github.com/apache/iceberg/pull/12988#discussion_r2162658679


##########
spark/v3.5/spark/src/test/java/org/apache/iceberg/spark/source/TestStructuredStreamingRead3.java:
##########
@@ -200,7 +198,7 @@ public void testReadStreamWithMaxRows2() throws Exception {
     assertThat(
             microBatchCount(
                 ImmutableMap.of(SparkReadOptions.STREAMING_MAX_ROWS_PER_MICRO_BATCH, "2")))
-        .isEqualTo(4);

Review Comment:
   Why was the micro batch count 4 previously?



##########
docs/docs/spark-configuration.md:
##########
@@ -225,8 +225,7 @@ spark.read
 | streaming-max-rows-per-micro-batch  | INT_MAX | Maximum number of rows per microbatch |
 
 !!! warning
-    streaming-max-rows-per-micro-batch should always be greater than the number of records in any data file in the table.
-    The smallest unit that will be streamed is a single file, so if a data file contains more records than this limit, the stream will get stuck at this file.
+    streaming-max-rows-per-micro-batch option sets a “soft max”, a batch will always include all the rows in the next unprocessed data file but additional files will not be included if doing so would exceed the soft-max.

Review Comment:
   nit: `soft-max` -> `soft limit` at the end of the sentence
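   For context, a minimal sketch of how this option is set on a streaming read, per the documented soft-limit semantics (the table identifier `demo.db.table` and the limit value `10000` are placeholders, not from this PR):
   
   ```java
   import org.apache.iceberg.spark.SparkReadOptions;
   import org.apache.spark.sql.Dataset;
   import org.apache.spark.sql.Row;
   import org.apache.spark.sql.SparkSession;
   
   public class SoftLimitExample {
     public static void main(String[] args) {
       SparkSession spark = SparkSession.builder().getOrCreate();
   
       // "demo.db.table" and the limit of 10000 are placeholder values.
       Dataset<Row> stream =
           spark
               .readStream()
               .format("iceberg")
               // Soft limit: a micro-batch always includes all rows of the
               // next unprocessed data file, but stops adding further files
               // once doing so would exceed this row count.
               .option(SparkReadOptions.STREAMING_MAX_ROWS_PER_MICRO_BATCH, "10000")
               .load("demo.db.table");
     }
   }
   ```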



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

