This is an automated email from the ASF dual-hosted git repository.

kabhwan pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new b969ffe5d816 [SPARK-50752][SQL][PYSPARK][FOLLOWUP] Respect PYTHON_UDF_MAX_RECORDS_PER_BATCH in Python worker
b969ffe5d816 is described below

commit b969ffe5d81616151efad0bfef2a345aa51f988b
Author: Jungtaek Lim <kabhwan.opensou...@gmail.com>
AuthorDate: Mon Jun 9 14:37:14 2025 +0900

    [SPARK-50752][SQL][PYSPARK][FOLLOWUP] Respect PYTHON_UDF_MAX_RECORDS_PER_BATCH in Python worker

    ### What changes were proposed in this pull request?

    This PR is a follow-up of SPARK-50752 (PR #49397), which introduced PYTHON_UDF_MAX_RECORDS_PER_BATCH into SQLConf to control the batch size of regular Python UDFs. This PR makes the config take effect on the Python worker side as well.

    ### Why are the changes needed?

    The original PR enabled control of the batch size on the JVM side, but the value was not propagated to the Python worker because the override for Python UDFs was missing. (The default value before overriding is a static one, 100.)

    ### Does this PR introduce _any_ user-facing change?

    Yes, but it only affects tuning and does not change the output.

    ### How was this patch tested?

    No automated test was added: there is no good way to test this, because omitting the override does not change the output itself. This was manually tested with artificial debug logging.

    ### Was this patch authored or co-authored using generative AI tooling?

    No.

    Closes #51124 from HeartSaVioR/SPARK-50752-FOLLOWUP-respect-PYTHON_UDF_MAX_RECORDS_PER_BATCH-in-python-worker.

    Authored-by: Jungtaek Lim <kabhwan.opensou...@gmail.com>
    Signed-off-by: Jungtaek Lim <kabhwan.opensou...@gmail.com>
---
 .../scala/org/apache/spark/sql/execution/python/PythonUDFRunner.scala | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/python/PythonUDFRunner.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/python/PythonUDFRunner.scala
index 11773f92c482..4baddcd4d9e7 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/python/PythonUDFRunner.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/python/PythonUDFRunner.scala
@@ -50,6 +50,8 @@ abstract class BasePythonUDFRunner(
   override val killOnIdleTimeout: Boolean = SQLConf.get.pythonUDFWorkerKillOnIdleTimeout
   override val bufferSize: Int = SQLConf.get.getConf(SQLConf.PYTHON_UDF_BUFFER_SIZE)
+  override val batchSizeForPythonUDF: Int =
+    SQLConf.get.getConf(SQLConf.PYTHON_UDF_MAX_RECORDS_PER_BATCH)
   protected def writeUDF(dataOut: DataOutputStream): Unit

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
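
For illustration only, the effect of the missing override described in the commit message can be sketched outside Spark. Everything below is a hypothetical stand-in (`FakeConf`, `BaseRunner`, `RunnerBeforeFix`, `RunnerAfterFix`, and the example value 500 are not Spark classes or defaults). Only the member name `batchSizeForPythonUDF` and the static default of 100 come from the commit, and the sketch assumes the parent runner declares that member with the static default.

```scala
// Hypothetical stand-ins that mimic the override pattern in the diff.
// None of these types are Spark's actual classes.
object OverrideSketch {
  // Stand-in for the session configuration (assumption, not SQLConf).
  final case class FakeConf(maxRecordsPerBatch: Int)

  // Stand-in for the generic runner: the batch size defaults to a static 100,
  // as the commit message states for the value before overriding.
  abstract class BaseRunner {
    val batchSizeForPythonUDF: Int = 100
    // The value that would be handed to the Python worker.
    def valueSentToWorker: Int = batchSizeForPythonUDF
  }

  // Before the fix: the config exists, but nothing overrides the default.
  class RunnerBeforeFix(conf: FakeConf) extends BaseRunner

  // After the fix: the member is overridden with the configured value.
  class RunnerAfterFix(conf: FakeConf) extends BaseRunner {
    override val batchSizeForPythonUDF: Int = conf.maxRecordsPerBatch
  }

  def main(args: Array[String]): Unit = {
    val conf = FakeConf(maxRecordsPerBatch = 500) // arbitrary example value
    println(new RunnerBeforeFix(conf).valueSentToWorker) // prints 100
    println(new RunnerAfterFix(conf).valueSentToWorker)  // prints 500
  }
}
```

In this sketch the worker-facing value only follows the configuration once the subclass overrides the member, which mirrors why the two added lines in the diff are needed.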