stevenzwu commented on code in PR #12988:
URL: https://github.com/apache/iceberg/pull/12988#discussion_r2162658679
##########
spark/v3.5/spark/src/test/java/org/apache/iceberg/spark/source/TestStructuredStreamingRead3.java:
##########
@@ -200,7 +198,7 @@ public void testReadStreamWithMaxRows2() throws Exception {
     assertThat(
             microBatchCount(
                 ImmutableMap.of(SparkReadOptions.STREAMING_MAX_ROWS_PER_MICRO_BATCH, "2")))
-        .isEqualTo(4);

Review Comment:
   why is the micro batch count 4 previously?

##########
docs/docs/spark-configuration.md:
##########
@@ -225,8 +225,7 @@ spark.read
 | streaming-max-rows-per-micro-batch | INT_MAX | Maximum number of rows per microbatch |

 !!! warning
-    streaming-max-rows-per-micro-batch should always be greater than the number of records in any data file in the table.
-    The smallest unit that will be streamed is a single file, so if a data file contains more records than this limit, the stream will get stuck at this file.
+    streaming-max-rows-per-micro-batch option sets a “soft max”, a batch will always include all the rows in the next unprocessed data file but additional files will not be included if doing so would exceed the soft-max.

Review Comment:
   nit: `soft-max` -> `soft limit` at the end of the sentence
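To make the "soft limit" wording above concrete, here is a minimal sketch of the packing rule exactly as the doc text describes it. This is hypothetical illustration code, not Iceberg's actual micro-batch planning logic; `planBatches` and the sample row counts are invented for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the "soft limit" semantics from the doc text:
// the next unprocessed data file is always included in full, but further
// files are not added if doing so would exceed the limit.
public class SoftMaxBatchingSketch {

  // Pack ordered per-file row counts into micro-batches under a soft row limit.
  static List<List<Integer>> planBatches(List<Integer> fileRowCounts, long softMaxRows) {
    List<List<Integer>> batches = new ArrayList<>();
    int i = 0;
    while (i < fileRowCounts.size()) {
      List<Integer> batch = new ArrayList<>();
      // always take the next unprocessed file, even if it alone exceeds the limit
      long rows = fileRowCounts.get(i);
      batch.add(fileRowCounts.get(i));
      i++;
      // add further files only while the running total stays within the soft limit
      while (i < fileRowCounts.size() && rows + fileRowCounts.get(i) <= softMaxRows) {
        rows += fileRowCounts.get(i);
        batch.add(fileRowCounts.get(i));
        i++;
      }
      batches.add(batch);
    }
    return batches;
  }

  public static void main(String[] args) {
    // e.g. four files of 1, 3, 2, 1 rows with a soft limit of 2 rows per batch:
    // prints [[1], [3], [2], [1]] -- the 3-row file still streams as one batch
    System.out.println(planBatches(List.of(1, 3, 2, 1), 2));
  }
}
```

Under this rule a data file larger than the limit still streams as a single micro-batch, which is why the option is a soft limit rather than a hard cap; in Spark the limit is supplied through the `streaming-max-rows-per-micro-batch` read option (`SparkReadOptions.STREAMING_MAX_ROWS_PER_MICRO_BATCH` in the test above).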