This is an automated email from the ASF dual-hosted git repository.
ruifengz pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new 036f591448ec [SPARK-54050][PYTHON][DOCS] Update the documentation of arrow-batching related configs
036f591448ec is described below
commit 036f591448ecd466a018794c2e6cc7a049486200
Author: Ruifeng Zheng <[email protected]>
AuthorDate: Tue Oct 28 12:58:04 2025 +0800
[SPARK-54050][PYTHON][DOCS] Update the documentation of arrow-batching related configs
### What changes were proposed in this pull request?
Update the documentation of the arrow-batching related configs.
### Why are the changes needed?
Remove the following note
```
This configuration is not effective for the grouping API such as
DataFrame(.cogroup).groupby.applyInPandas because each group becomes each
ArrowRecordBatch.
```
to reflect recent changes in arrow batching: a group is no longer forced into a single ArrowRecordBatch, so the batch-size limits now apply to the grouping APIs as well.
### Does this PR introduce _any_ user-facing change?
Yes, doc-only changes.
### How was this patch tested?
CI
### Was this patch authored or co-authored using generative AI tooling?
No
Closes #52753 from zhengruifeng/update_doc_max_records.
Authored-by: Ruifeng Zheng <[email protected]>
Signed-off-by: Ruifeng Zheng <[email protected]>
---
.../main/scala/org/apache/spark/sql/internal/SQLConf.scala | 12 ++++--------
1 file changed, 4 insertions(+), 8 deletions(-)
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
index 71a01d4c0700..46629aaca776 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
@@ -3905,9 +3905,7 @@ object SQLConf {
   val ARROW_EXECUTION_MAX_RECORDS_PER_BATCH =
     buildConf("spark.sql.execution.arrow.maxRecordsPerBatch")
       .doc("When using Apache Arrow, limit the maximum number of records that can be written " +
-        "to a single ArrowRecordBatch in memory. This configuration is not effective for the " +
-        "grouping API such as DataFrame(.cogroup).groupby.applyInPandas because each group " +
-        "becomes each ArrowRecordBatch. If set to zero or negative there is no limit. " +
+        "to a single ArrowRecordBatch in memory. If set to zero or negative there is no limit. " +
         "See also spark.sql.execution.arrow.maxBytesPerBatch. If both are set, each batch " +
         "is created when any condition of both is met.")
       .version("2.3.0")
@@ -3950,11 +3948,9 @@
     buildConf("spark.sql.execution.arrow.maxBytesPerBatch")
       .internal()
       .doc("When using Apache Arrow, limit the maximum bytes in each batch that can be written " +
-        "to a single ArrowRecordBatch in memory. This configuration is not effective for the " +
-        "grouping API such as DataFrame(.cogroup).groupby.applyInPandas because each group " +
-        "becomes each ArrowRecordBatch. Unlike 'spark.sql.execution.arrow.maxRecordsPerBatch', " +
-        "this configuration does not work for createDataFrame/toPandas with Arrow/pandas " +
-        "instances. " +
+        "to a single ArrowRecordBatch in memory. " +
+        "Unlike 'spark.sql.execution.arrow.maxRecordsPerBatch', this configuration does not " +
+        "work for createDataFrame/toPandas with Arrow/pandas instances. " +
         "See also spark.sql.execution.arrow.maxRecordsPerBatch. If both are set, each batch " +
         "is created when any condition of both is met.")
       .version("4.0.0")
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]