This is an automated email from the ASF dual-hosted git repository.
ruifengz pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new bde09039fb2f [PYTHON][MINOR] Decrease default `arrowMaxBytesPerBatch` for arrow-optimized UDF
bde09039fb2f is described below
commit bde09039fb2f26eea996733bab11182440d7bbc7
Author: Amanda Liu <[email protected]>
AuthorDate: Wed Jun 25 15:28:43 2025 +0800
[PYTHON][MINOR] Decrease default `arrowMaxBytesPerBatch` for arrow-optimized UDF
### What changes were proposed in this pull request?
Decrease the default of the `arrowMaxBytesPerBatch` config for arrow-optimized UDFs from 256MB to 64MB. The previous default of 256MB is large and carries a higher risk of OOMs when input rows are large.
### Why are the changes needed?
A 256MB per-batch byte cap allows very large Arrow batches to build up, which raises the risk of out-of-memory errors when input rows are large. Lowering the default to 64MB reduces that risk.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Existing tests
### Was this patch authored or co-authored using generative AI tooling?
No
Closes #51277 from asl3/arrowMaxBytesPerBatch-default.
Authored-by: Amanda Liu <[email protected]>
Signed-off-by: Ruifeng Zheng <[email protected]>
---
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
index 0fe08fc719a8..48feb26d653b 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
@@ -3648,7 +3648,7 @@ object SQLConf {
errorMsg = "The value of " +
"spark.sql.execution.arrow.maxBytesPerBatch should be greater " +
"than zero and less than INT_MAX.")
- .createWithDefaultString("256MB")
+ .createWithDefaultString("64MB")
val ARROW_TRANSFORM_WITH_STATE_IN_PYSPARK_MAX_STATE_RECORDS_PER_BATCH =
buildConf("spark.sql.execution.arrow.transformWithStateInPySpark.maxStateRecordsPerBatch")