This is an automated email from the ASF dual-hosted git repository.
wenchen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new dd22010c7a78 [SPARK-55681][SQL] Fix singleton DataType equality after deserialization
dd22010c7a78 is described below
commit dd22010c7a781bf4bb073bb44c6dacfce92c614a
Author: Tim Lee <[email protected]>
AuthorDate: Wed Feb 25 17:52:23 2026 +0800
[SPARK-55681][SQL] Fix singleton DataType equality after deserialization
### What changes were proposed in this pull request?
Override `equals()` and `hashCode()` on 14 singleton `DataType` classes so
that non-singleton instances compare equal to the case object singletons:
```
BinaryType, BooleanType, ByteType, ShortType, IntegerType, LongType,
FloatType, DoubleType, DateType, TimestampType, TimestampNTZType, NullType,
CalendarIntervalType, VariantType
```
For each type:
```
override def equals(obj: Any): Boolean = obj.isInstanceOf[XType]
override def hashCode(): Int = classOf[XType].getSimpleName.hashCode
```
`getSimpleName` is used because Scala's auto-generated `hashCode` for 0-arity
case objects returns `productPrefix.hashCode` (the simple class name). This
preserves exactly the same hash values as before, avoiding any behavior change
in hash-dependent code paths.
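As an illustration (a REPL-style sketch, not part of the patch), the case
object hash, the name hash, and the `getSimpleName` hash all agree:
```
import org.apache.spark.sql.types.IntegerType

// Scala's synthetic hashCode for a 0-arity case object is the String hash
// of its name (productPrefix), so the new override preserves it exactly.
assert(IntegerType.productPrefix == "IntegerType")
assert(IntegerType.hashCode == "IntegerType".hashCode)
assert(classOf[IntegerType].getSimpleName.hashCode == "IntegerType".hashCode)
```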
Other DataTypes did not need this change:
- VarcharType, CharType, TimeType, GeometryType, GeographyType: these are
already matched by type (`case _: XType =>`) across the codebase, so object
equality never comes into play (see the sketch below)
- StringType: has a custom `equals()` comparing `collationId`
- DecimalType, ArrayType, MapType, StructType: case classes with
auto-generated `equals()`
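To make the distinction concrete, a minimal sketch (the `describe` helper is
hypothetical, not Spark API):
```
import org.apache.spark.sql.types.{BinaryType, DataType, VarcharType}

def describe(dt: DataType): String = dt match {
  // Type pattern: compiles to an isInstanceOf check, so any VarcharType
  // instance matches, singleton or not; no equals() override is needed.
  case v: VarcharType => s"varchar(${v.length})"
  // Object pattern: compiles to an equals() call against the singleton,
  // which is exactly the path this patch fixes for the 14 types above.
  case BinaryType => "binary"
  case other => other.simpleString
}
```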
### Why are the changes needed?
Scala case object pattern matching (e.g., `case BinaryType =>`) relies on
`equals()`, which for case objects defaults to reference equality. If a
non-singleton instance of a DataType class is created at runtime, through any
serialization framework that bypasses `readResolve()`, every
`case BinaryType =>` match in the codebase silently falls through, leading to
errors like ([code
pointer](https://github.com/apache/spark/blob/3e4df1c1c1e1d9f594a6afcf29b3bae05cc8a348/sql/catalyst/src/m
[...]
```
IllegalStateException: The data type 'binary' is not supported in
generating a writer function...
```
Although the constructors are private, that is only a compile-time guard:
serialization frameworks bypass constructors at runtime, so non-singleton
instances can still be created.
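A minimal sketch of the failure mode, using reflection to stand in for such a
framework (this mirrors what the new test in `DataTypeSuite` does):
```
import org.apache.spark.sql.types.{BinaryType, DataType}

// Create a non-singleton instance the way a bypassing framework would:
// through the private no-arg constructor, skipping readResolve().
val ctor = classOf[BinaryType].getDeclaredConstructor()
ctor.setAccessible(true)
val deserialized: DataType = ctor.newInstance()

deserialized match {
  case BinaryType => println("matched the singleton") // works with the fix
  case _ => println("silently fell through")          // pre-fix behavior
}
```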
### Does this PR introduce _any_ user-facing change?
Yes. Before this change, if a non-singleton DataType instance was created
through deserialization, pattern matches like `case BinaryType =>` would
silently fail, leading to non-deterministic runtime errors. After this change,
non-singleton instances are correctly recognized as equal to the singleton, and
pattern matching works as expected.
### How was this patch tested?
1. New unit test in `DataTypeSuite` ("singleton DataType equality after
deserialization"): creates non-singleton instances via reflection for all 14
types, then verifies bidirectional equality with the singletons, matching
`hashCode`, and correct `PhysicalDataType` resolution (no fallthrough to
`UninitializedPhysicalType`).
2. Verified that `ExpressionSetSuite` passes unchanged (its magic `hashCode`
values are calibrated against `IntegerType.hashCode`, confirming hash
preservation).
### Was this patch authored or co-authored using generative AI tooling?
Generated-by: Claude Code (Claude Opus 4.6)
Closes #54475 from timlee0119/spark-55681-datatype-equality.
Authored-by: Tim Lee <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
---
.../org/apache/spark/sql/types/BinaryType.scala | 4 +++
.../org/apache/spark/sql/types/BooleanType.scala | 4 +++
.../org/apache/spark/sql/types/ByteType.scala | 4 +++
.../spark/sql/types/CalendarIntervalType.scala | 4 +++
.../org/apache/spark/sql/types/DateType.scala | 4 +++
.../org/apache/spark/sql/types/DoubleType.scala | 4 +++
.../org/apache/spark/sql/types/FloatType.scala | 4 +++
.../org/apache/spark/sql/types/IntegerType.scala | 4 +++
.../org/apache/spark/sql/types/LongType.scala | 4 +++
.../org/apache/spark/sql/types/NullType.scala | 4 +++
.../org/apache/spark/sql/types/ShortType.scala | 4 +++
.../apache/spark/sql/types/TimestampNTZType.scala | 4 +++
.../org/apache/spark/sql/types/TimestampType.scala | 4 +++
.../org/apache/spark/sql/types/VariantType.scala | 4 +++
.../org/apache/spark/sql/types/DataTypeSuite.scala | 42 +++++++++++++++++++++-
15 files changed, 97 insertions(+), 1 deletion(-)
diff --git a/sql/api/src/main/scala/org/apache/spark/sql/types/BinaryType.scala b/sql/api/src/main/scala/org/apache/spark/sql/types/BinaryType.scala
index 20bfd9bf5684..7ece7902b963 100644
--- a/sql/api/src/main/scala/org/apache/spark/sql/types/BinaryType.scala
+++ b/sql/api/src/main/scala/org/apache/spark/sql/types/BinaryType.scala
@@ -31,6 +31,10 @@ class BinaryType private () extends AtomicType {
*/
override def defaultSize: Int = 100
+ override def equals(obj: Any): Boolean = obj.isInstanceOf[BinaryType]
+
+ override def hashCode(): Int = classOf[BinaryType].getSimpleName.hashCode
+
private[spark] override def asNullable: BinaryType = this
}
diff --git a/sql/api/src/main/scala/org/apache/spark/sql/types/BooleanType.scala b/sql/api/src/main/scala/org/apache/spark/sql/types/BooleanType.scala
index 090c56eaf9af..3427fd2dd392 100644
--- a/sql/api/src/main/scala/org/apache/spark/sql/types/BooleanType.scala
+++ b/sql/api/src/main/scala/org/apache/spark/sql/types/BooleanType.scala
@@ -32,6 +32,10 @@ class BooleanType private () extends AtomicType {
*/
override def defaultSize: Int = 1
+ override def equals(obj: Any): Boolean = obj.isInstanceOf[BooleanType]
+
+ override def hashCode(): Int = classOf[BooleanType].getSimpleName.hashCode
+
private[spark] override def asNullable: BooleanType = this
}
diff --git a/sql/api/src/main/scala/org/apache/spark/sql/types/ByteType.scala b/sql/api/src/main/scala/org/apache/spark/sql/types/ByteType.scala
index 4a27a00dacb8..924f6a320638 100644
--- a/sql/api/src/main/scala/org/apache/spark/sql/types/ByteType.scala
+++ b/sql/api/src/main/scala/org/apache/spark/sql/types/ByteType.scala
@@ -34,6 +34,10 @@ class ByteType private () extends IntegralType {
override def simpleString: String = "tinyint"
+ override def equals(obj: Any): Boolean = obj.isInstanceOf[ByteType]
+
+ override def hashCode(): Int = classOf[ByteType].getSimpleName.hashCode
+
private[spark] override def asNullable: ByteType = this
}
diff --git a/sql/api/src/main/scala/org/apache/spark/sql/types/CalendarIntervalType.scala b/sql/api/src/main/scala/org/apache/spark/sql/types/CalendarIntervalType.scala
index f6b6256db041..16deb461204a 100644
--- a/sql/api/src/main/scala/org/apache/spark/sql/types/CalendarIntervalType.scala
+++ b/sql/api/src/main/scala/org/apache/spark/sql/types/CalendarIntervalType.scala
@@ -39,6 +39,10 @@ class CalendarIntervalType private () extends DataType {
override def typeName: String = "interval"
+ override def equals(obj: Any): Boolean = obj.isInstanceOf[CalendarIntervalType]
+
+ override def hashCode(): Int = classOf[CalendarIntervalType].getSimpleName.hashCode
+
private[spark] override def asNullable: CalendarIntervalType = this
}
diff --git a/sql/api/src/main/scala/org/apache/spark/sql/types/DateType.scala b/sql/api/src/main/scala/org/apache/spark/sql/types/DateType.scala
index 402c4c0d9529..f363d3423c17 100644
--- a/sql/api/src/main/scala/org/apache/spark/sql/types/DateType.scala
+++ b/sql/api/src/main/scala/org/apache/spark/sql/types/DateType.scala
@@ -34,6 +34,10 @@ class DateType private () extends DatetimeType {
*/
override def defaultSize: Int = 4
+ override def equals(obj: Any): Boolean = obj.isInstanceOf[DateType]
+
+ override def hashCode(): Int = classOf[DateType].getSimpleName.hashCode
+
private[spark] override def asNullable: DateType = this
}
diff --git a/sql/api/src/main/scala/org/apache/spark/sql/types/DoubleType.scala b/sql/api/src/main/scala/org/apache/spark/sql/types/DoubleType.scala
index 873f0c237c6c..e0944f60d749 100644
--- a/sql/api/src/main/scala/org/apache/spark/sql/types/DoubleType.scala
+++ b/sql/api/src/main/scala/org/apache/spark/sql/types/DoubleType.scala
@@ -34,6 +34,10 @@ class DoubleType private () extends FractionalType {
*/
override def defaultSize: Int = 8
+ override def equals(obj: Any): Boolean = obj.isInstanceOf[DoubleType]
+
+ override def hashCode(): Int = classOf[DoubleType].getSimpleName.hashCode
+
private[spark] override def asNullable: DoubleType = this
}
diff --git a/sql/api/src/main/scala/org/apache/spark/sql/types/FloatType.scala b/sql/api/src/main/scala/org/apache/spark/sql/types/FloatType.scala
index df4b03cd42bd..9888ead91045 100644
--- a/sql/api/src/main/scala/org/apache/spark/sql/types/FloatType.scala
+++ b/sql/api/src/main/scala/org/apache/spark/sql/types/FloatType.scala
@@ -34,6 +34,10 @@ class FloatType private () extends FractionalType {
*/
override def defaultSize: Int = 4
+ override def equals(obj: Any): Boolean = obj.isInstanceOf[FloatType]
+
+ override def hashCode(): Int = classOf[FloatType].getSimpleName.hashCode
+
private[spark] override def asNullable: FloatType = this
}
diff --git a/sql/api/src/main/scala/org/apache/spark/sql/types/IntegerType.scala b/sql/api/src/main/scala/org/apache/spark/sql/types/IntegerType.scala
index dc4727cb1215..33485b226715 100644
--- a/sql/api/src/main/scala/org/apache/spark/sql/types/IntegerType.scala
+++ b/sql/api/src/main/scala/org/apache/spark/sql/types/IntegerType.scala
@@ -34,6 +34,10 @@ class IntegerType private () extends IntegralType {
override def simpleString: String = "int"
+ override def equals(obj: Any): Boolean = obj.isInstanceOf[IntegerType]
+
+ override def hashCode(): Int = classOf[IntegerType].getSimpleName.hashCode
+
private[spark] override def asNullable: IntegerType = this
}
diff --git a/sql/api/src/main/scala/org/apache/spark/sql/types/LongType.scala b/sql/api/src/main/scala/org/apache/spark/sql/types/LongType.scala
index f65c4c70acd2..56f45c25f65c 100644
--- a/sql/api/src/main/scala/org/apache/spark/sql/types/LongType.scala
+++ b/sql/api/src/main/scala/org/apache/spark/sql/types/LongType.scala
@@ -34,6 +34,10 @@ class LongType private () extends IntegralType {
override def simpleString: String = "bigint"
+ override def equals(obj: Any): Boolean = obj.isInstanceOf[LongType]
+
+ override def hashCode(): Int = classOf[LongType].getSimpleName.hashCode
+
private[spark] override def asNullable: LongType = this
}
diff --git a/sql/api/src/main/scala/org/apache/spark/sql/types/NullType.scala b/sql/api/src/main/scala/org/apache/spark/sql/types/NullType.scala
index 4e7fd3a00a8a..d4cb0c2a95f9 100644
--- a/sql/api/src/main/scala/org/apache/spark/sql/types/NullType.scala
+++ b/sql/api/src/main/scala/org/apache/spark/sql/types/NullType.scala
@@ -31,6 +31,10 @@ class NullType private () extends DataType {
// Defined with a private constructor so the companion object is the only possible instantiation.
override def defaultSize: Int = 1
+ override def equals(obj: Any): Boolean = obj.isInstanceOf[NullType]
+
+ override def hashCode(): Int = classOf[NullType].getSimpleName.hashCode
+
private[spark] override def asNullable: NullType = this
override def typeName: String = "void"
diff --git a/sql/api/src/main/scala/org/apache/spark/sql/types/ShortType.scala b/sql/api/src/main/scala/org/apache/spark/sql/types/ShortType.scala
index c3b6bc75facd..45cdf913707a 100644
--- a/sql/api/src/main/scala/org/apache/spark/sql/types/ShortType.scala
+++ b/sql/api/src/main/scala/org/apache/spark/sql/types/ShortType.scala
@@ -34,6 +34,10 @@ class ShortType private () extends IntegralType {
override def simpleString: String = "smallint"
+ override def equals(obj: Any): Boolean = obj.isInstanceOf[ShortType]
+
+ override def hashCode(): Int = classOf[ShortType].getSimpleName.hashCode
+
private[spark] override def asNullable: ShortType = this
}
diff --git a/sql/api/src/main/scala/org/apache/spark/sql/types/TimestampNTZType.scala b/sql/api/src/main/scala/org/apache/spark/sql/types/TimestampNTZType.scala
index b08d16f0e2c9..a49e195fc24c 100644
--- a/sql/api/src/main/scala/org/apache/spark/sql/types/TimestampNTZType.scala
+++ b/sql/api/src/main/scala/org/apache/spark/sql/types/TimestampNTZType.scala
@@ -38,6 +38,10 @@ class TimestampNTZType private () extends DatetimeType {
override def typeName: String = "timestamp_ntz"
+ override def equals(obj: Any): Boolean = obj.isInstanceOf[TimestampNTZType]
+
+ override def hashCode(): Int = classOf[TimestampNTZType].getSimpleName.hashCode
+
private[spark] override def asNullable: TimestampNTZType = this
}
diff --git a/sql/api/src/main/scala/org/apache/spark/sql/types/TimestampType.scala b/sql/api/src/main/scala/org/apache/spark/sql/types/TimestampType.scala
index bf869d1f38c5..7d6c683df334 100644
--- a/sql/api/src/main/scala/org/apache/spark/sql/types/TimestampType.scala
+++ b/sql/api/src/main/scala/org/apache/spark/sql/types/TimestampType.scala
@@ -35,6 +35,10 @@ class TimestampType private () extends DatetimeType {
*/
override def defaultSize: Int = 8
+ override def equals(obj: Any): Boolean = obj.isInstanceOf[TimestampType]
+
+ override def hashCode(): Int = classOf[TimestampType].getSimpleName.hashCode
+
private[spark] override def asNullable: TimestampType = this
}
diff --git a/sql/api/src/main/scala/org/apache/spark/sql/types/VariantType.scala b/sql/api/src/main/scala/org/apache/spark/sql/types/VariantType.scala
index 4d775c3e1e39..808503c37c2e 100644
--- a/sql/api/src/main/scala/org/apache/spark/sql/types/VariantType.scala
+++ b/sql/api/src/main/scala/org/apache/spark/sql/types/VariantType.scala
@@ -32,6 +32,10 @@ class VariantType private () extends AtomicType {
// picked and we currently don't have any data to support it. This may need revisiting later.
override def defaultSize: Int = 2048
+ override def equals(obj: Any): Boolean = obj.isInstanceOf[VariantType]
+
+ override def hashCode(): Int = classOf[VariantType].getSimpleName.hashCode
+
/** This is a no-op because values with VARIANT type are always nullable. */
private[spark] override def asNullable: VariantType = this
}
diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/types/DataTypeSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/types/DataTypeSuite.scala
index c698a03d7f34..ce4f5e89be2b 100644
--- a/sql/catalyst/src/test/scala/org/apache/spark/sql/types/DataTypeSuite.scala
+++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/types/DataTypeSuite.scala
@@ -23,7 +23,7 @@ import org.json4s.jackson.JsonMethods
import org.apache.spark.{SparkException, SparkFunSuite, SparkIllegalArgumentException}
import org.apache.spark.sql.catalyst.analysis.{caseInsensitiveResolution, caseSensitiveResolution}
import org.apache.spark.sql.catalyst.parser.{CatalystSqlParser, ParseException}
-import org.apache.spark.sql.catalyst.types.DataTypeUtils
+import org.apache.spark.sql.catalyst.types.{DataTypeUtils, PhysicalDataType, UninitializedPhysicalType}
import org.apache.spark.sql.catalyst.util.{CollationFactory, StringConcat}
import org.apache.spark.sql.types.DataTypeTestUtils.{dayTimeIntervalTypes, yearMonthIntervalTypes}
@@ -1447,4 +1447,44 @@ class DataTypeSuite extends SparkFunSuite {
condition = "PARSE_SYNTAX_ERROR",
parameters = Map("error" -> "'time'", "hint" -> ""))
}
+
+ test("singleton DataType equality after deserialization") {
+ // Singleton DataTypes that use `case object` pattern matching (e.g., `case BinaryType =>`).
+ // If a non-singleton instance is created (e.g., via Kryo deserialization which doesn't call
+ // readResolve), the pattern match would fail without proper equals/hashCode overrides.
+ val singletonTypes: Seq[(DataType, Class[_ <: DataType])] = Seq(
+ (BinaryType, classOf[BinaryType]),
+ (BooleanType, classOf[BooleanType]),
+ (ByteType, classOf[ByteType]),
+ (ShortType, classOf[ShortType]),
+ (IntegerType, classOf[IntegerType]),
+ (LongType, classOf[LongType]),
+ (FloatType, classOf[FloatType]),
+ (DoubleType, classOf[DoubleType]),
+ (DateType, classOf[DateType]),
+ (TimestampType, classOf[TimestampType]),
+ (TimestampNTZType, classOf[TimestampNTZType]),
+ (NullType, classOf[NullType]),
+ (CalendarIntervalType, classOf[CalendarIntervalType]),
+ (VariantType, classOf[VariantType])
+ )
+
+ singletonTypes.foreach { case (singleton, clazz) =>
+ val ctor = clazz.getDeclaredConstructor()
+ ctor.setAccessible(true)
+ val nonSingleton = ctor.newInstance()
+ assert(nonSingleton ne singleton,
+ s"${clazz.getSimpleName}: reflection should create a distinct
instance")
+
+ assert(nonSingleton == singleton,
+ s"${clazz.getSimpleName}: non-singleton == singleton should be true")
+ assert(singleton == nonSingleton,
+ s"${clazz.getSimpleName}: singleton == non-singleton should be true")
+ assert(nonSingleton.hashCode == singleton.hashCode,
+ s"${clazz.getSimpleName}: hashCode should be equal")
+
+ assert(PhysicalDataType(nonSingleton) != UninitializedPhysicalType,
+ s"${clazz.getSimpleName}: PhysicalDataType should recognize
non-singleton instance")
+ }
+ }
}
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]