This is an automated email from the ASF dual-hosted git repository.
asf-gitbox-commits pushed a commit to branch branch-4.x
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/branch-4.x by this push:
new 4eacbfe2bdc4 [SPARK-56661] Fixing MapPartitionsExternalUDF to generate
output attributes only once
4eacbfe2bdc4 is described below
commit 4eacbfe2bdc4dd10212372dc1a3192aa91516059
Author: Sven Weber <[email protected]>
AuthorDate: Sat May 30 10:27:34 2026 -0400
[SPARK-56661] Fixing MapPartitionsExternalUDF to generate output attributes
only once
### What changes were proposed in this pull request?
This is a follow-up PR on the recently merged
[55768](https://github.com/apache/spark/pull/55768). I noticed that the
`MapPartitionsExternalUDF` regenerates its output attributes every time
`output` is called. Instead, we should compute the output attributes once and
store them in a `val`. This PR makes that change.
### Why are the changes needed?
The current behavior recomputes the attributes on every call even though
they are not expected to change.
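
As a rough illustration of the difference between a `def` and a `val` here, the following is a minimal standalone sketch, not the actual Spark code. `Attr`, `RecomputingNode`, and `CachingNode` are hypothetical stand-ins, and the assumption that every attribute creation hands out a fresh id is made for the sake of the example.

```scala
// Hypothetical stand-in for an attribute; not a Spark class.
final case class Attr(name: String, id: Long)

object Attr {
  private var nextId = 0L
  // Assumption for this sketch: every creation hands out a fresh id.
  def fresh(name: String): Attr = { nextId += 1; Attr(name, nextId) }
}

// `def`: the body is evaluated again on every call to `output`.
class RecomputingNode(fields: Seq[String]) {
  def output: Seq[Attr] = fields.map(Attr.fresh)
}

// `val`: the attributes are built once at construction and then reused.
class CachingNode(fields: Seq[String]) {
  val nodeOutputAttributes: Seq[Attr] = fields.map(Attr.fresh)
  def output: Seq[Attr] = nodeOutputAttributes
}

object OutputDemo {
  def main(args: Array[String]): Unit = {
    val recomputing = new RecomputingNode(Seq("a", "b"))
    val caching = new CachingNode(Seq("a", "b"))
    // Two calls on the recomputing node yield different attribute instances.
    println(recomputing.output == recomputing.output)
    // Two calls on the caching node yield the same stored sequence.
    println(caching.output == caching.output)
  }
}
```

In this sketch the `val` variant does the work once per node instance, at the cost of keeping the sequence in memory for the lifetime of the node.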
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Existing unit tests for this class.
### Was this patch authored or co-authored using generative AI tooling?
No
Closes #56206 from sven-weber-db/sven-weber_data/fix-map-op.
Authored-by: Sven Weber <[email protected]>
Signed-off-by: Herman van Hövell <[email protected]>
(cherry picked from commit d004e0d8567c3c4ccba9311bbc69dcf442555fa1)
Signed-off-by: Herman van Hövell <[email protected]>
---
.../sql/catalyst/plans/logical/logicalExternalUDFOperators.scala | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/logicalExternalUDFOperators.scala
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/logicalExternalUDFOperators.scala
index 49c3e938d5b6..84163e7350f3 100644
---
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/logicalExternalUDFOperators.scala
+++
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/logicalExternalUDFOperators.scala
@@ -56,11 +56,13 @@ case class MapPartitionsExternalUDF(
child: LogicalPlan)
extends ExternalUDF {
- // Map partitions always operate on StructTypes
- override def output: Seq[Attribute] = toAttributes(
+ val nodeOutputAttributes = toAttributes(
function.dataType.asInstanceOf[StructType]
)
+ // Map partitions always operate on StructTypes
+ override def output: Seq[Attribute] = nodeOutputAttributes
+
override protected def withNewChildInternal(
newChild: LogicalPlan): MapPartitionsExternalUDF =
copy(child = newChild)