This is an automated email from the ASF dual-hosted git repository.
MaxGekk pushed a commit to branch branch-4.x
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/branch-4.x by this push:
new 4c39501445f4 [SPARK-57184][SQL] Recurse into CalendarInterval children
when appending a null struct
4c39501445f4 is described below
commit 4c39501445f451de03e4918e98d85e1352fadadd
Author: Maxim Gekk <[email protected]>
AuthorDate: Mon Jun 1 11:17:48 2026 +0200
[SPARK-57184][SQL] Recurse into CalendarInterval children when appending a
null struct
### What changes were proposed in this pull request?
Include `CalendarIntervalType` in the recursion guard of
`WritableColumnVector.appendStruct(boolean isNull)`, so that appending a NULL
parent struct cascades `appendStruct(true)` into a `CalendarInterval` child
column and advances all three of its grandchild columns (months / days /
microseconds).
```java
for (WritableColumnVector c: childColumns) {
if (c.type instanceof StructType || c.type instanceof VariantType
|| c.type instanceof CalendarIntervalType) {
c.appendStruct(true);
} else {
c.appendNull();
}
}
```
This was split out of the nanosecond-timestamp `ColumnVector` PR
(SPARK-57100, #56198) per review, since it is an independent fix.
### Why are the changes needed?
A `CalendarInterval` column is struct-shaped: it is backed by three
grandchild primitive columns (`months` as int, `days` as int, `microseconds` as
long). The recursion guard in `appendStruct` only handled `StructType` and
`VariantType`, so an interval child column took the `else` branch
(`c.appendNull()`), which advances only the interval column's own cursor and
leaves its three grandchild cursors un-advanced.
As a result, for a struct column with a `CalendarInterval` field, appending
a NULL parent row left the interval's grandchild cursors behind by one. A
subsequent non-null row then wrote its `months`/`days`/`microseconds` into the
wrong (earlier) grandchild slots, and reading that row back returned a skewed
value - silent data corruption for the nested struct-of-interval case.
### Does this PR introduce _any_ user-facing change?
Yes. Reading back a struct-of-interval column that contains a NULL parent
row followed by a non-null row now returns the correct interval value instead
of a skewed one. Previously it returned corrupted data.
### How was this patch tested?
Added a unit test to `ColumnarBatchSuite` that uses `RowToColumnConverter`
to convert a null parent struct followed by non-null struct-of-interval rows,
and verifies the interval values are read back correctly. The test fails
without the fix and passes with it.
### Was this patch authored or co-authored using generative AI tooling?
Generated-by: Claude Opus 4.8
Closes #56235 from MaxGekk/fix-appendstruct-interval.
Authored-by: Maxim Gekk <[email protected]>
Signed-off-by: Max Gekk <[email protected]>
(cherry picked from commit 18f2bac5cb17dba85c6c74dfb77df4ba34b27f3c)
Signed-off-by: Max Gekk <[email protected]>
---
.../execution/vectorized/WritableColumnVector.java | 1 +
.../execution/vectorized/ColumnarBatchSuite.scala | 31 ++++++++++++++++++++++
2 files changed, 32 insertions(+)
diff --git
a/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/WritableColumnVector.java
b/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/WritableColumnVector.java
index 302840dbe2f3..708eb8b4146f 100644
---
a/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/WritableColumnVector.java
+++
b/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/WritableColumnVector.java
@@ -763,6 +763,7 @@ public abstract class WritableColumnVector extends
ColumnVector {
elementsAppended++;
for (WritableColumnVector c: childColumns) {
if (c.type instanceof StructType || c.type instanceof VariantType
+ || c.type instanceof CalendarIntervalType
|| c.type instanceof TimestampNTZNanosType
|| c.type instanceof TimestampLTZNanosType) {
c.appendStruct(true);
diff --git
a/sql/core/src/test/scala/org/apache/spark/sql/execution/vectorized/ColumnarBatchSuite.scala
b/sql/core/src/test/scala/org/apache/spark/sql/execution/vectorized/ColumnarBatchSuite.scala
index 3b5f05210dee..4825d662a295 100644
---
a/sql/core/src/test/scala/org/apache/spark/sql/execution/vectorized/ColumnarBatchSuite.scala
+++
b/sql/core/src/test/scala/org/apache/spark/sql/execution/vectorized/ColumnarBatchSuite.scala
@@ -1977,6 +1977,37 @@ class ColumnarBatchSuite extends SparkFunSuite {
}
}
+ test("SPARK-57184: appendStruct(true) recurses into CalendarInterval child
columns") {
+ // A struct column whose only field is a CalendarInterval. When the parent
struct is
+ // appended as null, the recursion must also advance the interval child's
grandchild
+ // columns (months / days / microseconds). Otherwise a subsequent non-null
row's interval
+ // would be read from skewed grandchild slots.
+ val schema = new StructType()
+ .add("s", new StructType().add("cal", CalendarIntervalType))
+ val converter = new RowToColumnConverter(schema)
+ val columns = OnHeapColumnVector.allocateColumns(3, schema)
+ try {
+ // row 0: null parent struct.
+ converter.convert(new GenericInternalRow(Array[Any](null)),
columns.toArray)
+ // row 1: non-null struct holding interval (1, 2, 3).
+ converter.convert(
+ new GenericInternalRow(Array[Any](
+ new GenericInternalRow(Array[Any](new CalendarInterval(1, 2, 3))))),
+ columns.toArray)
+ // row 2: non-null struct holding interval (4, 5, 6).
+ converter.convert(
+ new GenericInternalRow(Array[Any](
+ new GenericInternalRow(Array[Any](new CalendarInterval(4, 5, 6))))),
+ columns.toArray)
+
+ assert(columns(0).isNullAt(0))
+ assert(columns(0).getStruct(1).getInterval(0) === new
CalendarInterval(1, 2, 3))
+ assert(columns(0).getStruct(2).getInterval(0) === new
CalendarInterval(4, 5, 6))
+ } finally {
+ columns.foreach(_.close())
+ }
+ }
+
testVector("Decimal API", 4, DecimalType.IntDecimal) {
column =>
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]