This is an automated email from the ASF dual-hosted git repository.
MaxGekk pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new fa57fa2f5e9c [SPARK-57166][SQL] Reject nanosecond-capable timestamp
types in built-in datasources and JDBC
fa57fa2f5e9c is described below
commit fa57fa2f5e9cfe97ab44e8af59b3ce15c906a0f9
Author: Maxim Gekk <[email protected]>
AuthorDate: Wed Jun 3 10:09:55 2026 +0200
[SPARK-57166][SQL] Reject nanosecond-capable timestamp types in built-in
datasources and JDBC
### What changes were proposed in this pull request?
This PR makes all built-in file datasources and JDBC explicitly **reject**
the nanosecond-capable timestamp types `TimestampNTZNanosType` and
`TimestampLTZNanosType` on both the read and write paths, raising the existing
`UNSUPPORTED_DATA_TYPE_FOR_DATASOURCE` error.
An explicit arm is added before the `case _: AtomicType => true` catch-all
in each datasource's `supportDataType` / `supportsDataType`:
```scala
// Nanosecond-capable timestamps are not yet supported by this datasource.
case _: TimestampNTZNanosType | _: TimestampLTZNanosType => false
```
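The placement before the catch-all matters because Scala tries `match` cases from top to bottom. The following is a minimal, self-contained sketch of that effect. It uses toy stand-ins for the type hierarchy (the `ArmOrderSketch` object and its simplified `StructType` are assumptions for illustration, not Spark's real classes).

```scala
// Toy stand-ins for Spark's type hierarchy; these are NOT the real classes.
object ArmOrderSketch {
  trait DataType
  trait AtomicType extends DataType
  case object IntegerType extends AtomicType
  final case class TimestampNTZNanosType(precision: Int) extends AtomicType
  final case class TimestampLTZNanosType(precision: Int) extends AtomicType
  final case class StructType(fields: Seq[(String, DataType)]) extends DataType

  // Catch-all only: the nanos types are atomic, so they are accepted.
  def supportsWithoutArm(dt: DataType): Boolean = dt match {
    case _: AtomicType => true
    case StructType(fields) => fields.forall { case (_, t) => supportsWithoutArm(t) }
    case _ => false
  }

  // Explicit arm first: cases are tried top to bottom, so the nanos types are
  // rejected before the catch-all can accept them.
  def supportsWithArm(dt: DataType): Boolean = dt match {
    case _: TimestampNTZNanosType | _: TimestampLTZNanosType => false
    case _: AtomicType => true
    case StructType(fields) => fields.forall { case (_, t) => supportsWithArm(t) }
    case _ => false
  }
}
```

Because the struct case recurses, a nanos column nested inside a struct is rejected as well in this sketch.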
Sources changed:
- V1 `FileFormat.supportDataType`: `ParquetFileFormat`, `OrcFileFormat`
(native), `JsonFileFormat`, `XmlFileFormat`, and the private
`CSVFileFormat.supportDataType(dataType, allowVariant)` (covers both
`supportDataType` and `supportReadDataType`).
- `AvroUtils.supportsDataType` (single edit covering V1 `AvroFileFormat`
and V2 `AvroTable`).
- `sql/hive` `OrcFileFormat.supportDataType` (Hive ORC serde).
- V2 `FileTable.supportsDataType`: `ParquetTable`, `OrcTable`, `JsonTable`,
`CSVTable`.
No code change was needed for:
- Text (string-only, already rejects nanos).
- JDBC read (`getCatalystType` maps JDBC `TIMESTAMP`/`TIME` to microsecond
types only).
- JDBC write (already rejected via the `CreatableRelationProvider` type
whitelist and `JdbcUtils.getCommonJDBCType` returning `None`); a test is added
to lock the behavior.
- XML V2: there is no `XmlTable`; XML is V1-only, so the `XmlFileFormat`
edit covers all XML paths.
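The JDBC write case relies on a whitelist-style mapping that returns `None` for unknown types. A rough sketch of that pattern is shown here with toy types. The object `JdbcTypeSketch`, its column types, and the SQL type names are hypothetical and do not reproduce `JdbcUtils`.

```scala
// Toy model of a whitelist mapping; NOT Spark's JdbcUtils.
object JdbcTypeSketch {
  sealed trait ColType
  case object IntCol extends ColType
  case object StringCol extends ColType
  case object TimestampMicrosCol extends ColType
  final case class TimestampNanosCol(precision: Int) extends ColType

  // Only types with a known mapping return Some; everything else is None.
  def commonJdbcType(t: ColType): Option[String] = t match {
    case IntCol => Some("INTEGER")
    case StringCol => Some("TEXT")
    case TimestampMicrosCol => Some("TIMESTAMP")
    case _ => None
  }

  // The writer turns a missing mapping into an error instead of guessing.
  def columnDdl(name: String, t: ColType): Either[String, String] =
    commonJdbcType(t) match {
      case Some(sqlType) => Right(s"$name $sqlType")
      case None => Left(s"unsupported type for column $name: $t")
    }
}
```

With a whitelist, a newly introduced type is rejected by default, which is why the JDBC write path needed only a test and no code change.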
The rejection is **unconditional** (independent of the
`spark.sql.timestampNanosTypes.enabled` preview flag). The flag governs only
whether the type may exist; these sources have no real nanos read/write support
yet. As each source adds support later (e.g. Parquet read via SPARK-57102), it
can carve out its own exception.
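One way to picture the two independent checks is the toy model below. All names (`FlagVsSourceSketch`, `Conf`, `nanosCapable`) are hypothetical; the sketch only illustrates that the datasource check never consults the flag, and how a source might later opt in.

```scala
// Toy model of the two independent checks; all names are hypothetical.
object FlagVsSourceSketch {
  sealed trait ColType
  case object TimestampMicros extends ColType
  final case class TimestampNanos(precision: Int) extends ColType

  final case class Conf(nanosTypesEnabled: Boolean)

  // Check 1: the preview flag decides whether the type may exist at all.
  def typeMayExist(conf: Conf, t: ColType): Boolean = t match {
    case _: TimestampNanos => conf.nanosTypesEnabled
    case _ => true
  }

  // Check 2: the datasource check takes no Conf, so it cannot depend on the flag.
  // `nanosCapable` models a source that has carved out its own exception.
  def sourceSupports(t: ColType, nanosCapable: Boolean = false): Boolean = t match {
    case _: TimestampNanos => nanosCapable
    case _ => true
  }
}
```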
This is a sub-task of
[SPARK-56822](https://issues.apache.org/jira/browse/SPARK-56822) (SPIP:
Timestamps with nanosecond precision).
### Why are the changes needed?
The nanos types extend `DatetimeType`, which in turn extends `AtomicType`.
When the preview flag `spark.sql.timestampNanosTypes.enabled` is on, every
file source's `case _: AtomicType => true` catch-all therefore silently
**accepts** them, and the source then misbehaves at read/write time because no
real support exists yet. Users get confusing downstream failures or silent
precision issues instead of a clear, actionable error. This PR provides a
guardrail until real support is implemented per source.
### Does this PR introduce _any_ user-facing change?
Yes, but only for the preview feature gated by
`spark.sql.timestampNanosTypes.enabled` (disabled by default in production).
With the flag enabled, writing or reading a `TIMESTAMP_NTZ(p)` /
`TIMESTAMP_LTZ(p)` (p in [7, 9]) column through Parquet, ORC, Avro, JSON, CSV,
XML, or Hive ORC now fails with `UNSUPPORTED_DATA_TYPE_FOR_DATASOURCE`, on
both the V1 and V2 paths where a format has both. JDBC write also fails with a
clear error. Previously the file sources silently accepted these columns and
then misbehaved. Behavior
for all other types is un [...]
### How was this patch tested?
Added unit tests:
- `FileBasedDataSourceSuite`: a new test that, with
`TIMESTAMP_NANOS_TYPES_ENABLED=true`, iterates over v1 and v2
(`USE_V1_SOURCE_LIST`) and the built-in formats (parquet, orc, json, csv, xml),
asserting that both write and read of a `TIMESTAMP_NTZ(9)` / `TIMESTAMP_LTZ(9)`
column fail with `UNSUPPORTED_DATA_TYPE_FOR_DATASOURCE`. The nanos column is
built from a typed literal rather than via CAST.
- `AvroSuite`: an equivalent write+read assertion for Avro.
- `JDBCWriteSuite`: a new test asserting that writing a nanos column via
`.write.jdbc(...)` is rejected.
All three suites pass locally, and `sql/core`, `sql/hive`, and
`connector/avro` test sources compile.
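The assertions described above check three error parameters: `columnName`, `columnType`, and `format`. The sketch below shows one way such parameters could be derived from a schema. It is a toy model: `UnsupportedTypeSketch`, `Field`, and the `sql` strings are assumptions for illustration, and Spark's actual error construction is not shown in this commit.

```scala
// Toy model of building the error parameters; NOT Spark's implementation.
object UnsupportedTypeSketch {
  trait ColType { def sql: String }
  case object StringCol extends ColType { val sql = "STRING" }
  // The SQL rendering below is an assumed format, chosen for illustration.
  final case class NanosNtzCol(precision: Int) extends ColType {
    def sql: String = s"TIMESTAMP_NTZ($precision)"
  }
  final case class Field(name: String, dataType: ColType)

  def supports(t: ColType): Boolean = t match {
    case _: NanosNtzCol => false
    case _ => true
  }

  // Returns the parameters for the first rejected column, if any.
  def firstViolation(schema: Seq[Field], format: String): Option[Map[String, String]] =
    schema.find(f => !supports(f.dataType)).map { f =>
      Map(
        "columnName" -> s"`${f.name}`",
        "columnType" -> s""""${f.dataType.sql}"""",
        "format" -> format)
    }
}
```

The column name is wrapped in backticks and the type in double quotes, mirroring the shape of the expected values in the tests.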
### Was this patch authored or co-authored using generative AI tooling?
Generated-by: Cursor (Claude Opus 4.8)
Closes #56269 from MaxGekk/nanos-ds-off.
Authored-by: Maxim Gekk <[email protected]>
Signed-off-by: Max Gekk <[email protected]>
---
.../org/apache/spark/sql/avro/AvroSuite.scala | 43 +++++++++++++++-
.../org/apache/spark/sql/avro/AvroUtils.scala | 3 ++
.../execution/datasources/csv/CSVFileFormat.scala | 3 ++
.../datasources/json/JsonFileFormat.scala | 3 ++
.../execution/datasources/orc/OrcFileFormat.scala | 3 ++
.../datasources/parquet/ParquetFileFormat.scala | 3 ++
.../execution/datasources/v2/csv/CSVTable.scala | 5 +-
.../execution/datasources/v2/json/JsonTable.scala | 3 ++
.../execution/datasources/v2/orc/OrcTable.scala | 3 ++
.../datasources/v2/parquet/ParquetTable.scala | 3 ++
.../execution/datasources/xml/XmlFileFormat.scala | 3 ++
.../spark/sql/FileBasedDataSourceSuite.scala | 60 ++++++++++++++++++++++
.../org/apache/spark/sql/jdbc/JDBCWriteSuite.scala | 25 ++++++++-
.../apache/spark/sql/hive/orc/OrcFileFormat.scala | 2 +
.../spark/sql/hive/orc/HiveOrcSourceSuite.scala | 44 +++++++++++++++-
15 files changed, 202 insertions(+), 4 deletions(-)
diff --git a/connector/avro/src/test/scala/org/apache/spark/sql/avro/AvroSuite.scala b/connector/avro/src/test/scala/org/apache/spark/sql/avro/AvroSuite.scala
index 56b107c14f57..c680850e602c 100644
--- a/connector/avro/src/test/scala/org/apache/spark/sql/avro/AvroSuite.scala
+++ b/connector/avro/src/test/scala/org/apache/spark/sql/avro/AvroSuite.scala
@@ -38,7 +38,7 @@ import org.apache.spark.TestUtils.assertExceptionMsg
import org.apache.spark.sql._
import org.apache.spark.sql.TestingUDT.IntervalData
import org.apache.spark.sql.avro.AvroCompressionCodec._
-import org.apache.spark.sql.catalyst.expressions.AttributeReference
+import org.apache.spark.sql.catalyst.expressions.{AttributeReference, Literal}
import org.apache.spark.sql.catalyst.plans.logical.Filter
import org.apache.spark.sql.catalyst.util.DateTimeTestUtils
import org.apache.spark.sql.catalyst.util.DateTimeTestUtils.{withDefaultTimeZone, LA, UTC}
@@ -52,6 +52,7 @@ import org.apache.spark.sql.internal.SQLConf
import org.apache.spark.sql.test.SharedSparkSession
import org.apache.spark.sql.types._
import org.apache.spark.sql.v2.avro.AvroScan
+import org.apache.spark.unsafe.types.TimestampNanosVal
import org.apache.spark.util.Utils
abstract class AvroSuite
@@ -3191,6 +3192,46 @@ abstract class AvroSuite
}
}
+ test("SPARK-57166: nanosecond timestamp types are not supported in Avro") {
+ val nanosTypes = Seq(TimestampNTZNanosType(9), TimestampLTZNanosType(9))
+ withSQLConf(SQLConf.TIMESTAMP_NANOS_TYPES_ENABLED.key -> "true") {
+ nanosTypes.foreach { nanosType =>
+ val expectedType = s""""${nanosType.sql}""""
+ withTempDir { dir =>
+ // Write path: a nanos-typed column cannot be written. The nanos literal is built
+ // directly from its internal value to avoid relying on cast/parser support.
+ val nanosLiteral = Literal.create(new TimestampNanosVal(0L, 0.toShort), nanosType)
+ val df = spark.range(1).select(Column(nanosLiteral).as("ts"))
+ val writeDir = new File(dir, "write").getCanonicalPath
+ checkError(
+ exception = intercept[AnalysisException] {
+ df.write.format("avro").mode("overwrite").save(writeDir)
+ },
+ condition = "UNSUPPORTED_DATA_TYPE_FOR_DATASOURCE",
+ parameters = Map(
+ "columnName" -> "`ts`",
+ "columnType" -> expectedType,
+ "format" -> "Avro"))
+
+ // Read path: a user-specified nanos schema is rejected. Write a benign file first
+ // so schema validation (not file listing) is what fails.
+ val readDir = new File(dir, "read").getCanonicalPath
+ Seq("a").toDF("ts").write.format("avro").mode("overwrite").save(readDir)
+ checkError(
+ exception = intercept[AnalysisException] {
+ spark.read.schema(new StructType().add("ts", nanosType))
+ .format("avro").load(readDir).collect()
+ },
+ condition = "UNSUPPORTED_DATA_TYPE_FOR_DATASOURCE",
+ parameters = Map(
+ "columnName" -> "`ts`",
+ "columnType" -> expectedType,
+ "format" -> "Avro"))
+ }
+ }
+ }
+ }
+
}
class AvroV1Suite extends AvroSuite {
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/avro/AvroUtils.scala b/sql/core/src/main/scala/org/apache/spark/sql/avro/AvroUtils.scala
index 02220e6c8568..7019e925080c 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/avro/AvroUtils.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/avro/AvroUtils.scala
@@ -109,6 +109,9 @@ private[sql] object AvroUtils extends Logging {
case _: GeometryType | _: GeographyType => false
+ // Nanosecond-capable timestamps are not yet supported by this datasource.
+ case _: TimestampNTZNanosType | _: TimestampLTZNanosType => false
+
case _: AtomicType => true
case st: StructType => st.forall { f => supportsDataType(f.dataType) }
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVFileFormat.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVFileFormat.scala
index 77a0c53ae469..988223abe8bc 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVFileFormat.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVFileFormat.scala
@@ -157,6 +157,9 @@ case class CSVFileFormat() extends TextBasedFileFormat with DataSourceRegister {
case _: GeometryType | _: GeographyType => false
+ // Nanosecond-capable timestamps are not yet supported by this datasource.
+ case _: TimestampNTZNanosType | _: TimestampLTZNanosType => false
+
case _: AtomicType => true
case udt: UserDefinedType[_] => supportDataType(udt.sqlType)
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JsonFileFormat.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JsonFileFormat.scala
index 2a8708f72ab1..331230c4446b 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JsonFileFormat.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JsonFileFormat.scala
@@ -117,6 +117,9 @@ case class JsonFileFormat() extends TextBasedFileFormat with DataSourceRegister
case _: GeometryType | _: GeographyType => false
+ // Nanosecond-capable timestamps are not yet supported by this datasource.
+ case _: TimestampNTZNanosType | _: TimestampLTZNanosType => false
+
case _: AtomicType => true
case st: StructType => st.forall { f => supportDataType(f.dataType) }
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFileFormat.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFileFormat.scala
index 633ac8107f32..8b926f51efc3 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFileFormat.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFileFormat.scala
@@ -251,6 +251,9 @@ class OrcFileFormat
case _: GeometryType | _: GeographyType => false
+ // Nanosecond-capable timestamps are not yet supported by this datasource.
+ case _: TimestampNTZNanosType | _: TimestampLTZNanosType => false
+
case _: AtomicType => true
case st: StructType => st.forall { f => supportDataType(f.dataType) }
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala
index a00618a1e2e2..669d826ddac4 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala
@@ -416,6 +416,9 @@ class ParquetFileFormat
case g: GeometryType => GeometryType.isSridSupported(g.srid)
case g: GeographyType => GeographyType.isSridSupported(g.srid)
+ // Nanosecond-capable timestamps are not yet supported by this datasource.
+ case _: TimestampNTZNanosType | _: TimestampLTZNanosType => false
+
case _: AtomicType | _: NullType => true
case st: StructType => st.forall { f => supportDataType(f.dataType) }
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/csv/CSVTable.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/csv/CSVTable.scala
index 4938df795cb1..221196c86445 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/csv/CSVTable.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/csv/CSVTable.scala
@@ -26,7 +26,7 @@ import org.apache.spark.sql.connector.write.{LogicalWriteInfo, Write, WriteBuild
import org.apache.spark.sql.execution.datasources.FileFormat
import org.apache.spark.sql.execution.datasources.csv.CSVDataSource
import org.apache.spark.sql.execution.datasources.v2.FileTable
-import org.apache.spark.sql.types.{AtomicType, DataType, GeographyType, GeometryType, StructType, UserDefinedType}
+import org.apache.spark.sql.types.{AtomicType, DataType, GeographyType, GeometryType, StructType, TimestampLTZNanosType, TimestampNTZNanosType, UserDefinedType}
import org.apache.spark.sql.util.CaseInsensitiveStringMap
case class CSVTable(
@@ -59,6 +59,9 @@ case class CSVTable(
override def supportsDataType(dataType: DataType): Boolean = dataType match {
case _: GeometryType | _: GeographyType => false
+ // Nanosecond-capable timestamps are not yet supported by this datasource.
+ case _: TimestampNTZNanosType | _: TimestampLTZNanosType => false
+
case _: AtomicType => true
case udt: UserDefinedType[_] => supportsDataType(udt.sqlType)
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/json/JsonTable.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/json/JsonTable.scala
index cf3c1e11803c..dc3729a119b1 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/json/JsonTable.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/json/JsonTable.scala
@@ -59,6 +59,9 @@ case class JsonTable(
override def supportsDataType(dataType: DataType): Boolean = dataType match {
case _: GeometryType | _: GeographyType => false
+ // Nanosecond-capable timestamps are not yet supported by this datasource.
+ case _: TimestampNTZNanosType | _: TimestampLTZNanosType => false
+
case _: AtomicType => true
case st: StructType => st.forall { f => supportsDataType(f.dataType) }
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/orc/OrcTable.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/orc/OrcTable.scala
index 08cd89fdacc6..e595a680d2e6 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/orc/OrcTable.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/orc/OrcTable.scala
@@ -53,6 +53,9 @@ case class OrcTable(
override def supportsDataType(dataType: DataType): Boolean = dataType match {
case _: GeometryType | _: GeographyType => false
+ // Nanosecond-capable timestamps are not yet supported by this datasource.
+ case _: TimestampNTZNanosType | _: TimestampLTZNanosType => false
+
case _: AtomicType => true
case st: StructType => st.forall { f => supportsDataType(f.dataType) }
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/parquet/ParquetTable.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/parquet/ParquetTable.scala
index 67052c201a9d..9c3b05504363 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/parquet/ParquetTable.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/parquet/ParquetTable.scala
@@ -55,6 +55,9 @@ case class ParquetTable(
case g: GeometryType => GeometryType.isSridSupported(g.srid)
case g: GeographyType => GeographyType.isSridSupported(g.srid)
+ // Nanosecond-capable timestamps are not yet supported by this datasource.
+ case _: TimestampNTZNanosType | _: TimestampLTZNanosType => false
+
case _: AtomicType => true
case st: StructType => st.forall { f => supportsDataType(f.dataType) }
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/xml/XmlFileFormat.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/xml/XmlFileFormat.scala
index 686d81a0b56d..f32b915fc054 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/xml/XmlFileFormat.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/xml/XmlFileFormat.scala
@@ -134,6 +134,9 @@ case class XmlFileFormat() extends TextBasedFileFormat with DataSourceRegister {
case _: GeometryType | _: GeographyType => false
+ // Nanosecond-capable timestamps are not yet supported by this datasource.
+ case _: TimestampNTZNanosType | _: TimestampLTZNanosType => false
+
case _: AtomicType => true
case st: StructType => st.forall { f => supportDataType(f.dataType) }
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/FileBasedDataSourceSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/FileBasedDataSourceSuite.scala
index 1fc45e9703f9..e5f64545986f 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/FileBasedDataSourceSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/FileBasedDataSourceSuite.scala
@@ -44,6 +44,7 @@ import org.apache.spark.sql.functions._
import org.apache.spark.sql.internal.SQLConf
import org.apache.spark.sql.test.SharedSparkSession
import org.apache.spark.sql.types._
+import org.apache.spark.unsafe.types.TimestampNanosVal
class FileBasedDataSourceSuite extends SharedSparkSession
with AdaptiveSparkPlanHelper {
@@ -1334,6 +1335,65 @@ class FileBasedDataSourceSuite extends SharedSparkSession
}
}
}
+
+ test("SPARK-57166: nanosecond timestamp types are not supported in file data sources") {
+ // None of these built-in file formats support nanosecond-capable timestamps yet.
+ val unsupportedDataSources = Seq("parquet", "orc", "json", "csv", "xml")
+ val nanosTypes = Seq(TimestampNTZNanosType(9), TimestampLTZNanosType(9))
+ withSQLConf(SQLConf.TIMESTAMP_NANOS_TYPES_ENABLED.key -> "true") {
+ // Test both v1 and v2 data sources.
+ Seq(true, false).foreach { useV1 =>
+ val useV1List = if (useV1) {
+ unsupportedDataSources.mkString(",")
+ } else {
+ ""
+ }
+ withSQLConf(SQLConf.USE_V1_SOURCE_LIST.key -> useV1List) {
+ unsupportedDataSources.foreach { format =>
+ nanosTypes.foreach { nanosType =>
+ val expectedType = s""""${nanosType.sql}""""
+ withTempDir { dir =>
+ // Write path: a nanos-typed column cannot be written. The nanos literal is built
+ // directly from its internal value to avoid relying on cast/parser support.
+ val nanosLiteral =
+ Literal.create(new TimestampNanosVal(0L, 0.toShort), nanosType)
+ val df = spark.range(1).select(Column(nanosLiteral).as("ts"))
+ val writeDir = new File(dir, "write").getCanonicalPath
+ checkError(
+ exception = intercept[AnalysisException] {
+ df.write.format(format).mode("overwrite").save(writeDir)
+ },
+ condition = "UNSUPPORTED_DATA_TYPE_FOR_DATASOURCE",
+ parameters = Map(
+ "columnName" -> "`ts`",
+ "columnType" -> expectedType,
+ "format" -> formatMapping(format)))
+
+ // Read path: a user-specified nanos schema is rejected. Write a benign file first
+ // so schema validation (not file listing) is what fails.
+ val readDir = new File(dir, "read").getCanonicalPath
+ // XML requires a `rowTag` option on both the read and write paths.
+ val extraOptions =
+ if (format == "xml") Map("rowTag" -> "row") else Map.empty[String, String]
+ Seq("a").toDF("ts").write.format(format).options(extraOptions)
+ .mode("overwrite").save(readDir)
+ checkError(
+ exception = intercept[AnalysisException] {
+ spark.read.schema(new StructType().add("ts", nanosType))
+ .format(format).options(extraOptions).load(readDir).collect()
+ },
+ condition = "UNSUPPORTED_DATA_TYPE_FOR_DATASOURCE",
+ parameters = Map(
+ "columnName" -> "`ts`",
+ "columnType" -> expectedType,
+ "format" -> formatMapping(format)))
+ }
+ }
+ }
+ }
+ }
+ }
+ }
}
object TestingUDT {
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCWriteSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCWriteSuite.scala
index 535b8257ad94..bce7a883fa2b 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCWriteSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCWriteSuite.scala
@@ -28,12 +28,15 @@ import org.scalatest.BeforeAndAfter
import org.apache.spark.SparkException
import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}
-import org.apache.spark.sql.{AnalysisException, DataFrame, Row, SaveMode}
+import org.apache.spark.sql.{AnalysisException, Column, DataFrame, Row, SaveMode}
+import org.apache.spark.sql.catalyst.expressions.Literal
import org.apache.spark.sql.catalyst.parser.ParseException
+import org.apache.spark.sql.classic.ClassicConversions._
import org.apache.spark.sql.execution.datasources.jdbc.{JDBCOptions, JdbcUtils}
import org.apache.spark.sql.internal.SQLConf
import org.apache.spark.sql.test.SharedSparkSession
import org.apache.spark.sql.types._
+import org.apache.spark.unsafe.types.TimestampNanosVal
import org.apache.spark.util.ArrayImplicits._
import org.apache.spark.util.Utils
@@ -697,4 +700,24 @@ class JDBCWriteSuite extends SharedSparkSession with BeforeAndAfter {
assert(rows(0).getAs[java.sql.Timestamp](1)
=== java.sql.Timestamp.valueOf("2020-02-02 04:13:14.56789"))
}
+
+ test("SPARK-57166: nanosecond timestamp types are not supported in JDBC write") {
+ withSQLConf(SQLConf.TIMESTAMP_NANOS_TYPES_ENABLED.key -> "true") {
+ Seq(TimestampNTZNanosType(9), TimestampLTZNanosType(9)).foreach { nanosType =>
+ // The nanos literal is built directly from its internal value to avoid relying on
+ // cast/parser support.
+ val nanosLiteral = Literal.create(new TimestampNanosVal(0L, 0.toShort), nanosType)
+ val df = spark.range(1).select(Column(nanosLiteral).as("ts"))
+ checkErrorMatchPVals(
+ exception = intercept[AnalysisException] {
+ df.write.jdbc(url, "TEST.NANOSTYPES", new Properties())
+ },
+ condition = "UNSUPPORTED_DATA_TYPE_FOR_DATASOURCE",
+ parameters = Map(
+ "columnName" -> "`ts`",
+ "columnType" -> java.util.regex.Pattern.quote(s""""${nanosType.sql}""""),
+ "format" -> ".*"))
+ }
+ }
+ }
}
diff --git a/sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcFileFormat.scala b/sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcFileFormat.scala
index ba37e5c176de..b8961e0ee84e 100644
--- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcFileFormat.scala
+++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcFileFormat.scala
@@ -197,6 +197,8 @@ case class OrcFileFormat() extends FileFormat
case _: AnsiIntervalType => false
case _: TimeType => false
+ // Nanosecond-capable timestamps are not yet supported by this datasource.
+ case _: TimestampNTZNanosType | _: TimestampLTZNanosType => false
case _: AtomicType => true
case st: StructType => st.forall { f => supportDataType(f.dataType) }
diff --git a/sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/HiveOrcSourceSuite.scala b/sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/HiveOrcSourceSuite.scala
index c1084dd4ee7f..504d1a5881a0 100644
--- a/sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/HiveOrcSourceSuite.scala
+++ b/sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/HiveOrcSourceSuite.scala
@@ -19,12 +19,16 @@ package org.apache.spark.sql.hive.orc
import java.io.File
-import org.apache.spark.sql.{AnalysisException, Row}
+import org.apache.spark.sql.{AnalysisException, Column, Row}
import org.apache.spark.sql.TestingUDT.{IntervalData, IntervalUDT}
+import org.apache.spark.sql.catalyst.expressions.Literal
+import org.apache.spark.sql.classic.ClassicConversions._
import org.apache.spark.sql.execution.datasources.orc.OrcSuite
import org.apache.spark.sql.hive.HiveUtils
import org.apache.spark.sql.hive.test.TestHiveSingleton
+import org.apache.spark.sql.internal.SQLConf
import org.apache.spark.sql.types._
+import org.apache.spark.unsafe.types.TimestampNanosVal
import org.apache.spark.util.Utils
class HiveOrcSourceSuite extends OrcSuite with TestHiveSingleton {
@@ -346,6 +350,44 @@ class HiveOrcSourceSuite extends OrcSuite with TestHiveSingleton {
}
}
+ test("SPARK-57166: nanosecond timestamp types are not supported in Hive ORC") {
+ val nanosTypes = Seq(TimestampNTZNanosType(9), TimestampLTZNanosType(9))
+ withSQLConf(SQLConf.TIMESTAMP_NANOS_TYPES_ENABLED.key -> "true") {
+ nanosTypes.foreach { nanosType =>
+ val expectedType = s""""${nanosType.sql}""""
+ withTempDir { dir =>
+ // Write path
+ val nanosLiteral = Literal.create(new TimestampNanosVal(0L, 0.toShort), nanosType)
+ val df = spark.range(1).select(Column(nanosLiteral).as("ts"))
+ val writeDir = new File(dir, "write").getCanonicalPath
+ checkError(
+ exception = intercept[AnalysisException] {
+ df.write.format("orc").mode("overwrite").save(writeDir)
+ },
+ condition = "UNSUPPORTED_DATA_TYPE_FOR_DATASOURCE",
+ parameters = Map(
+ "columnName" -> "`ts`",
+ "columnType" -> expectedType,
+ "format" -> "ORC"))
+
+ // Read path
+ val readDir = new File(dir, "read").getCanonicalPath
+ spark.range(1).write.format("orc").mode("overwrite").save(readDir)
+ checkError(
+ exception = intercept[AnalysisException] {
+ spark.read.schema(new StructType().add("ts", nanosType))
+ .format("orc").load(readDir).collect()
+ },
+ condition = "UNSUPPORTED_DATA_TYPE_FOR_DATASOURCE",
+ parameters = Map(
+ "columnName" -> "`ts`",
+ "columnType" -> expectedType,
+ "format" -> "ORC"))
+ }
+ }
+ }
+ }
+
test("SPARK-31580: Read a file written before ORC-569") {
// Test ORC file came from ORC-621
val df =
readResourceOrcFile("test-data/TestStringDictionary.testRowIndex.orc")