(spark) branch branch-4.0 updated: [SPARK-55802][SQL][4.0] Fix integer overflow when computing Arrow batch bytes

viirya Wed, 04 Mar 2026 10:17:42 -0800

This is an automated email from the ASF dual-hosted git repository.

viirya pushed a commit to branch branch-4.0
in repository https://gitbox.apache.org/repos/asf/spark.git



The following commit(s) were added to refs/heads/branch-4.0 by this push:
     new 9d45e6fc36b0 [SPARK-55802][SQL][4.0] Fix integer overflow when computing Arrow batch bytes
9d45e6fc36b0 is described below

commit 9d45e6fc36b02c84051743bcad7a372688c2940a
Author: Liang-Chi Hsieh <[email protected]>
AuthorDate: Wed Mar 4 10:17:21 2026 -0800

    [SPARK-55802][SQL][4.0] Fix integer overflow when computing Arrow batch bytes
    
    ### What changes were proposed in this pull request?
    
    ### Why are the changes needed?
    
    `ArrowWriter.sizeInBytes()` accumulated per-column buffer sizes (each an `Int`) into an `Int` accumulator. When the total exceeds 2 GB the sum silently wraps negative, causing the byte-limit check controlled by `spark.sql.execution.arrow.maxBytesPerBatch` to behave incorrectly and potentially allow oversized batches through.
    
    Fix by changing the accumulator and return type to `Long`.
    
    ### Does this PR introduce _any_ user-facing change?
    
    No
    
    ### How was this patch tested?
    
    Existing tests.
    
    ### Was this patch authored or co-authored using generative AI tooling?
    
    Generated-by: Claude Sonnet 4.6 <noreplyanthropic.com>
    
    Closes #54624 from viirya/backport-arrow-batch-bytes-overflow-branch-4.0.
    
    Authored-by: Liang-Chi Hsieh <[email protected]>
    Signed-off-by: Liang-Chi Hsieh <[email protected]>
---
 .../main/scala/org/apache/spark/sql/execution/arrow/ArrowWriter.scala | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowWriter.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowWriter.scala
index d91b6de9b1df..4a68cf6c8f9f 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowWriter.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowWriter.scala
@@ -112,9 +112,9 @@ class ArrowWriter(val root: VectorSchemaRoot, fields: Array[ArrowFieldWriter]) {
     count += 1
   }
 
-  def sizeInBytes(): Int = {
+  def sizeInBytes(): Long = {
     var i = 0
-    var bytes = 0
+    var bytes = 0L
     while (i < fields.size) {
       bytes += fields(i).getSizeInBytes()
       i += 1
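
The effect of the two-line change above can be reproduced outside Spark. The following is a minimal sketch, not code from the patch: `ArrowBatchBytesOverflow`, `sizeInBytesInt`, `sizeInBytesLong`, the plain `Array[Int]` standing in for the field writers, and the 256 MiB limit are all hypothetical names and values chosen for illustration, and the `>=` comparison is only an assumed form of the byte-limit check.

```scala
// Illustrative only: plain Ints stand in for ArrowFieldWriter.getSizeInBytes().
object ArrowBatchBytesOverflow {
  // Pre-fix shape: Int sizes summed into an Int accumulator.
  def sizeInBytesInt(columnSizes: Array[Int]): Int = {
    var i = 0
    var bytes = 0
    while (i < columnSizes.length) {
      bytes += columnSizes(i) // wraps silently past Int.MaxValue
      i += 1
    }
    bytes
  }

  // Post-fix shape: the same loop with a Long accumulator and return type.
  def sizeInBytesLong(columnSizes: Array[Int]): Long = {
    var i = 0
    var bytes = 0L
    while (i < columnSizes.length) {
      bytes += columnSizes(i) // each Int is widened to Long before adding
      i += 1
    }
    bytes
  }

  def main(args: Array[String]): Unit = {
    // Three hypothetical columns of 1 GiB each: 3 GiB in total.
    val columnSizes = Array.fill(3)(1 << 30)
    // Assumed byte limit, standing in for the configured maximum.
    val maxBytesPerBatch = 256L * 1024 * 1024

    val wrapped = sizeInBytesInt(columnSizes)
    val exact = sizeInBytesLong(columnSizes)

    println(s"Int accumulator:  $wrapped, over limit: ${wrapped >= maxBytesPerBatch}")
    println(s"Long accumulator: $exact, over limit: ${exact >= maxBytesPerBatch}")
  }
}
```

With the `Int` accumulator the 3 GiB total wraps to a negative number, so a comparison against the limit reports the batch as under the limit; with the `Long` accumulator the same comparison reports it as over the limit.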


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
