RussellSpitzer commented on code in PR #12217:
URL: https://github.com/apache/iceberg/pull/12217#discussion_r1962134783


##########
docs/docs/spark-configuration.md:
##########
@@ -165,6 +165,8 @@ spark.read
 | vectorization-enabled  | As per table property | Overrides this table's read.parquet.vectorization.enabled |
 | batch-size  | As per table property | Overrides this table's read.parquet.vectorization.batch-size |
 | stream-from-timestamp | (none) | A timestamp in milliseconds to stream from; if before the oldest known ancestor snapshot, the oldest will be used |
+| streaming-max-files-per-micro-batch | INT_MAX | Maximum number of files per micro-batch |
+| streaming-max-rows-per-micro-batch  | INT_MAX | Maximum number of rows per micro-batch. This number should be greater than the number of records in any data file in the table. The smallest unit that will be streamed is a single file, so if a data file contains more records than this limit, the stream will get stuck at this file. |

Review Comment:
   nit: we try to keep these descriptions short. I think you can add a `*` or a note below the table if there is more info you want to include here, especially since it seems like we are just describing a bug.
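
   For context, the two options under discussion can be set like the other read options on this page. A minimal sketch of a streaming read using them (assumes a running SparkSession with an Iceberg catalog configured; the table name and values below are placeholders, not from the PR):

   ```python
   # Sketch: Iceberg micro-batch streaming read with the new options.
   # "db.events" and all option values are illustrative placeholders.
   df = (
       spark.readStream
       .format("iceberg")
       .option("stream-from-timestamp", "1650000000000")        # ms since epoch
       .option("streaming-max-files-per-micro-batch", "100")    # cap files per batch
       .option("streaming-max-rows-per-micro-batch", "1000000") # cap rows per batch
       .load("db.events")
   )
   ```

   Note that, per the description being reviewed, the row cap is best-effort at file granularity: a single file larger than the limit will stall the stream.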



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

