(spark) branch branch-4.x updated: [SPARK-57184][SQL] Recurse into CalendarInterval children when appending a null struct

maxgekk Mon, 01 Jun 2026 02:18:30 -0700

This is an automated email from the ASF dual-hosted git repository.

MaxGekk pushed a commit to branch branch-4.x
in repository https://gitbox.apache.org/repos/asf/spark.git



The following commit(s) were added to refs/heads/branch-4.x by this push:
     new 4c39501445f4 [SPARK-57184][SQL] Recurse into CalendarInterval children when appending a null struct
4c39501445f4 is described below

commit 4c39501445f451de03e4918e98d85e1352fadadd
Author: Maxim Gekk <[email protected]>
AuthorDate: Mon Jun 1 11:17:48 2026 +0200

    [SPARK-57184][SQL] Recurse into CalendarInterval children when appending a null struct
    
    ### What changes were proposed in this pull request?
    
    Include `CalendarIntervalType` in the recursion guard of `WritableColumnVector.appendStruct(boolean isNull)`, so that appending a NULL parent struct cascades `appendStruct(true)` into a `CalendarInterval` child column and advances all three of its grandchild columns (months / days / microseconds).
    
    ```java
    for (WritableColumnVector c: childColumns) {
      if (c.type instanceof StructType || c.type instanceof VariantType
          || c.type instanceof CalendarIntervalType) {
        c.appendStruct(true);
      } else {
        c.appendNull();
      }
    }
    ```
    
    This was split out of the nanosecond-timestamp `ColumnVector` PR (SPARK-57100, #56198) per review, since it is an independent fix.
    
    ### Why are the changes needed?
    
    A `CalendarInterval` column is struct-shaped: it is backed by three grandchild primitive columns (`months` as int, `days` as int, `microseconds` as long). The recursion guard in `appendStruct` only handled `StructType` and `VariantType`, so an interval child column took the `else` branch (`c.appendNull()`), which advances only the interval column's own cursor and leaves its three grandchild cursors un-advanced.
    
    As a result, for a struct column with a `CalendarInterval` field, appending a NULL parent row left the interval's grandchild cursors behind by one. A subsequent non-null row then wrote its `months`/`days`/`microseconds` into the wrong (earlier) grandchild slots, and reading that row back returned a skewed value - silent data corruption for the nested struct-of-interval case.
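    
    The cursor skew described above can be illustrated with a small standalone model. This is only a sketch under stated assumptions, not Spark code: `Vec`, `appendNullStruct`, `guardCoversInterval`, and `monthsCursorAfterNullParent` are hypothetical names invented for the illustration, and the `structShaped` flag stands in for the `instanceof` checks in the real recursion guard.
    
    ```java
    import java.util.ArrayList;
    import java.util.List;
    
    public class AppendStructSketch {
      /** Hypothetical stand-in for a writable column vector: one append cursor plus children. */
      static final class Vec {
        final boolean structShaped;
        final List<Vec> children = new ArrayList<>();
        int elementsAppended = 0;
    
        Vec(boolean structShaped, Vec... kids) {
          this.structShaped = structShaped;
          for (Vec k : kids) children.add(k);
        }
    
        // Advances only this vector's own cursor; children are left untouched.
        void appendNull() { elementsAppended++; }
    
        // Advances this cursor, then recurses only into children the guard recognizes.
        // guardCoversInterval = false models the guard before the fix (assumption of this sketch).
        void appendNullStruct(boolean guardCoversInterval) {
          elementsAppended++;
          for (Vec c : children) {
            if (c.structShaped && guardCoversInterval) {
              c.appendNullStruct(guardCoversInterval);
            } else {
              c.appendNull();
            }
          }
        }
      }
    
      /** Returns the months cursor after one NULL parent row has been appended. */
      static int monthsCursorAfterNullParent(boolean guardCoversInterval) {
        Vec months = new Vec(false), days = new Vec(false), micros = new Vec(false);
        Vec interval = new Vec(true, months, days, micros);
        Vec parent = new Vec(true, interval);
        parent.appendNullStruct(guardCoversInterval);
        return months.elementsAppended;
      }
    
      public static void main(String[] args) {
        System.out.println("old guard: " + monthsCursorAfterNullParent(false)); // prints "old guard: 0"
        System.out.println("new guard: " + monthsCursorAfterNullParent(true));  // prints "new guard: 1"
      }
    }
    ```
    
    In this model the interval vector's own cursor is 1 in both cases; only the grandchild cursor differs, which is why the next non-null row would land one slot too early without the recursion.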
    
    ### Does this PR introduce _any_ user-facing change?
    
    Yes. Reading back a struct-of-interval column that contains a NULL parent row followed by a non-null row now returns the correct interval value instead of a skewed one. Previously it returned corrupted data.
    
    ### How was this patch tested?
    
    Added a unit test to `ColumnarBatchSuite` that uses `RowToColumnConverter` to convert a null parent struct followed by non-null struct-of-interval rows, and verifies the interval values are read back correctly. The test fails without the fix and passes with it.
    
    ### Was this patch authored or co-authored using generative AI tooling?
    
    Generated-by: Claude Opus 4.8
    
    Closes #56235 from MaxGekk/fix-appendstruct-interval.
    
    Authored-by: Maxim Gekk <[email protected]>
    Signed-off-by: Max Gekk <[email protected]>
    (cherry picked from commit 18f2bac5cb17dba85c6c74dfb77df4ba34b27f3c)
    Signed-off-by: Max Gekk <[email protected]>
---
 .../execution/vectorized/WritableColumnVector.java |  1 +
 .../execution/vectorized/ColumnarBatchSuite.scala  | 31 ++++++++++++++++++++++
 2 files changed, 32 insertions(+)

diff --git a/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/WritableColumnVector.java b/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/WritableColumnVector.java
index 302840dbe2f3..708eb8b4146f 100644
--- a/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/WritableColumnVector.java
+++ b/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/WritableColumnVector.java
@@ -763,6 +763,7 @@ public abstract class WritableColumnVector extends ColumnVector {
       elementsAppended++;
       for (WritableColumnVector c: childColumns) {
         if (c.type instanceof StructType || c.type instanceof VariantType
+            || c.type instanceof CalendarIntervalType
             || c.type instanceof TimestampNTZNanosType
             || c.type instanceof TimestampLTZNanosType) {
           c.appendStruct(true);
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/vectorized/ColumnarBatchSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/vectorized/ColumnarBatchSuite.scala
index 3b5f05210dee..4825d662a295 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/execution/vectorized/ColumnarBatchSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/vectorized/ColumnarBatchSuite.scala
@@ -1977,6 +1977,37 @@ class ColumnarBatchSuite extends SparkFunSuite {
     }
   }
 
+  test("SPARK-57184: appendStruct(true) recurses into CalendarInterval child columns") {
+    // A struct column whose only field is a CalendarInterval. When the parent struct is
+    // appended as null, the recursion must also advance the interval child's grandchild
+    // columns (months / days / microseconds). Otherwise a subsequent non-null row's interval
+    // would be read from skewed grandchild slots.
+    val schema = new StructType()
+      .add("s", new StructType().add("cal", CalendarIntervalType))
+    val converter = new RowToColumnConverter(schema)
+    val columns = OnHeapColumnVector.allocateColumns(3, schema)
+    try {
+      // row 0: null parent struct.
+      converter.convert(new GenericInternalRow(Array[Any](null)), columns.toArray)
+      // row 1: non-null struct holding interval (1, 2, 3).
+      converter.convert(
+        new GenericInternalRow(Array[Any](
+          new GenericInternalRow(Array[Any](new CalendarInterval(1, 2, 3))))),
+        columns.toArray)
+      // row 2: non-null struct holding interval (4, 5, 6).
+      converter.convert(
+        new GenericInternalRow(Array[Any](
+          new GenericInternalRow(Array[Any](new CalendarInterval(4, 5, 6))))),
+        columns.toArray)
+
+      assert(columns(0).isNullAt(0))
+      assert(columns(0).getStruct(1).getInterval(0) === new CalendarInterval(1, 2, 3))
+      assert(columns(0).getStruct(2).getInterval(0) === new CalendarInterval(4, 5, 6))
+    } finally {
+      columns.foreach(_.close())
+    }
+  }
+
   testVector("Decimal API", 4, DecimalType.IntDecimal) {
     column =>
 

