(spark) branch branch-4.x updated: [SPARK-57166][SQL] Reject nanosecond-capable timestamp types in built-in datasources and JDBC

maxgekk Wed, 03 Jun 2026 01:10:35 -0700

This is an automated email from the ASF dual-hosted git repository.

MaxGekk pushed a commit to branch branch-4.x
in repository https://gitbox.apache.org/repos/asf/spark.git



The following commit(s) were added to refs/heads/branch-4.x by this push:
     new 50aaa9291bba [SPARK-57166][SQL] Reject nanosecond-capable timestamp types in built-in datasources and JDBC
50aaa9291bba is described below

commit 50aaa9291bba511164e8a8562570139e7e226189
Author: Maxim Gekk <[email protected]>
AuthorDate: Wed Jun 3 10:09:55 2026 +0200

    [SPARK-57166][SQL] Reject nanosecond-capable timestamp types in built-in datasources and JDBC
    
    ### What changes were proposed in this pull request?
    
    This PR makes all built-in file datasources and JDBC explicitly **reject**
    the nanosecond-capable timestamp types `TimestampNTZNanosType` and
    `TimestampLTZNanosType` on both the read and write paths, raising the existing
    `UNSUPPORTED_DATA_TYPE_FOR_DATASOURCE` error.
    
    An explicit arm is added before the `case _: AtomicType => true` catch-all
    in each datasource's `supportDataType` / `supportsDataType`:
    
    ```scala
    // Nanosecond-capable timestamps are not yet supported by this datasource.
    case _: TimestampNTZNanosType | _: TimestampLTZNanosType => false
    ```
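    
    The effect of the arm ordering can be illustrated with a minimal, self-contained sketch. The types below are simplified stand-ins that only mirror the hierarchy described in this message (they are not Spark's real classes), and `SupportCheck`, `supportBefore`, and `supportAfter` are hypothetical names used for illustration:
    
    ```scala
    // Simplified stand-ins for the type hierarchy; NOT Spark's real classes.
    trait DataType
    trait AtomicType extends DataType
    trait DatetimeType extends AtomicType
    case object StringType extends AtomicType
    final case class TimestampNTZNanosType(precision: Int) extends DatetimeType
    final case class TimestampLTZNanosType(precision: Int) extends DatetimeType
    final case class StructType(fields: Seq[DataType]) extends DataType

    object SupportCheck {
      // Without the explicit arm: the nanos types are AtomicTypes, so the
      // catch-all accepts them.
      def supportBefore(dt: DataType): Boolean = dt match {
        case _: AtomicType => true
        case st: StructType => st.fields.forall(supportBefore)
        case _ => false
      }

      // With the explicit arm placed first: the nanos types are rejected
      // before the catch-all is reached, including when nested in a struct.
      def supportAfter(dt: DataType): Boolean = dt match {
        case _: TimestampNTZNanosType | _: TimestampLTZNanosType => false
        case _: AtomicType => true
        case st: StructType => st.fields.forall(supportAfter)
        case _ => false
      }
    }
    ```
    
    Because Scala tries match arms top to bottom, the position of the new arm matters: placed after the `AtomicType` catch-all it would never be reached.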
    
    Sources changed:
    - V1 `FileFormat.supportDataType`: `ParquetFileFormat`, `OrcFileFormat`
      (native), `JsonFileFormat`, `XmlFileFormat`, and the private
      `CSVFileFormat.supportDataType(dataType, allowVariant)` (covers both
      `supportDataType` and `supportReadDataType`).
    - `AvroUtils.supportsDataType` (single edit covering V1 `AvroFileFormat`
      and V2 `AvroTable`).
    - `sql/hive` `OrcFileFormat.supportDataType` (Hive ORC serde).
    - V2 `FileTable.supportsDataType`: `ParquetTable`, `OrcTable`, `JsonTable`,
      `CSVTable`.
    
    No code change was needed for:
    - Text (string-only, already rejects nanos).
    - JDBC read (`getCatalystType` maps JDBC `TIMESTAMP`/`TIME` to microsecond
      types only).
    - JDBC write (already rejected via the `CreatableRelationProvider` type
      whitelist and `JdbcUtils.getCommonJDBCType` returning `None`); a test is added
      to lock the behavior.
    - XML V2: there is no `XmlTable`; XML is V1-only, so the `XmlFileFormat`
      edit covers all XML paths.
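    
    The JDBC write case relies on a type mapping that returns `None` for anything it does not know. A rough sketch of that pattern follows, assuming simplified stand-in types; `JdbcTypeMapping`, `commonJdbcType`, `columnDefinition`, and the mapped type strings are hypothetical and do not reproduce the real `JdbcUtils` signatures or the real error message:
    
    ```scala
    // Simplified stand-ins; NOT Spark's real classes or signatures.
    trait DataType { def sql: String }
    case object IntegerType extends DataType { val sql = "INT" }
    case object TimestampType extends DataType { val sql = "TIMESTAMP" }
    final case class TimestampNTZNanosType(precision: Int) extends DataType {
      val sql = s"TIMESTAMP_NTZ($precision)"
    }

    object JdbcTypeMapping {
      // Only explicitly listed types get a JDBC column type; the rest map to None.
      def commonJdbcType(dt: DataType): Option[String] = dt match {
        case IntegerType => Some("INTEGER")
        case TimestampType => Some("TIMESTAMP")
        case _ => None
      }

      // Hypothetical validation step: a missing mapping becomes an explicit error
      // value instead of a failure later in the write.
      def columnDefinition(name: String, dt: DataType): Either[String, String] =
        commonJdbcType(dt) match {
          case Some(jdbcType) => Right(s"$name $jdbcType")
          case None => Left(s"Unsupported type ${dt.sql} for column $name")
        }
    }
    ```
    
    An allow-list of this shape rejects new types by default, which is consistent with JDBC write needing only a test rather than a code change.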
    
    The rejection is **unconditional** (independent of the
    `spark.sql.timestampNanosTypes.enabled` preview flag). The flag governs only
    whether the type may exist; these sources have no real nanos read/write support
    yet. As each source adds support later (e.g. Parquet read via SPARK-57102), it
    can carve out its own exception.
    
    This is a sub-task of
    [SPARK-56822](https://issues.apache.org/jira/browse/SPARK-56822) (SPIP:
    Timestamps with nanosecond precision).
    
    ### Why are the changes needed?
    
    When the preview flag `spark.sql.timestampNanosTypes.enabled` is on, the
    nanos types extend `DatetimeType extends AtomicType`, so every file source's
    `case _: AtomicType => true` catch-all silently **accepts** them and then
    misbehaves at read/write time because no real support exists yet. Users get
    confusing downstream failures or silent precision issues instead of a clear,
    actionable error. This PR provides a clear guardrail until real support is
    implemented per source.
    
    ### Does this PR introduce _any_ user-facing change?
    
    Yes, but only for the preview feature gated by
    `spark.sql.timestampNanosTypes.enabled` (disabled by default in production).
    With the flag enabled, writing or reading a `TIMESTAMP_NTZ(p)` /
    `TIMESTAMP_LTZ(p)` (p in [7, 9]) column through Parquet, ORC, Avro, JSON, CSV,
    XML (v1 and v2), and Hive ORC now fails with
    `UNSUPPORTED_DATA_TYPE_FOR_DATASOURCE`, and JDBC write fails with a clear
    error. Previously these were silently accepted and then misbehaved. Behavior
    for all other types is un [...]
    
    ### How was this patch tested?
    
    Added unit tests:
    - `FileBasedDataSourceSuite`: a new test that, with
      `TIMESTAMP_NANOS_TYPES_ENABLED=true`, iterates over v1 and v2
      (`USE_V1_SOURCE_LIST`) and the built-in formats (parquet, orc, json, csv, xml),
      asserting that both write and read of a `TIMESTAMP_NTZ(9)` / `TIMESTAMP_LTZ(9)`
      column fail with `UNSUPPORTED_DATA_TYPE_FOR_DATASOURCE`. The nanos column is
      built from a typed literal rather than via CAST.
    - `AvroSuite`: an equivalent write+read assertion for Avro.
    - `JDBCWriteSuite`: a new test asserting that writing a nanos column via
      `.write.jdbc(...)` is rejected.
    
    All three suites pass locally, and `sql/core`, `sql/hive`, and
    `connector/avro` test sources compile.
    
    ### Was this patch authored or co-authored using generative AI tooling?
    
    Generated-by: Cursor (Claude Opus 4.8)
    
    Closes #56269 from MaxGekk/nanos-ds-off.
    
    Authored-by: Maxim Gekk <[email protected]>
    Signed-off-by: Max Gekk <[email protected]>
    (cherry picked from commit fa57fa2f5e9cfe97ab44e8af59b3ce15c906a0f9)
    Signed-off-by: Max Gekk <[email protected]>
---
 .../org/apache/spark/sql/avro/AvroSuite.scala      | 43 +++++++++++++++-
 .../org/apache/spark/sql/avro/AvroUtils.scala      |  3 ++
 .../execution/datasources/csv/CSVFileFormat.scala  |  3 ++
 .../datasources/json/JsonFileFormat.scala          |  3 ++
 .../execution/datasources/orc/OrcFileFormat.scala  |  3 ++
 .../datasources/parquet/ParquetFileFormat.scala    |  3 ++
 .../execution/datasources/v2/csv/CSVTable.scala    |  5 +-
 .../execution/datasources/v2/json/JsonTable.scala  |  3 ++
 .../execution/datasources/v2/orc/OrcTable.scala    |  3 ++
 .../datasources/v2/parquet/ParquetTable.scala      |  3 ++
 .../execution/datasources/xml/XmlFileFormat.scala  |  3 ++
 .../spark/sql/FileBasedDataSourceSuite.scala       | 60 ++++++++++++++++++++++
 .../org/apache/spark/sql/jdbc/JDBCWriteSuite.scala | 25 ++++++++-
 .../apache/spark/sql/hive/orc/OrcFileFormat.scala  |  2 +
 .../spark/sql/hive/orc/HiveOrcSourceSuite.scala    | 44 +++++++++++++++-
 15 files changed, 202 insertions(+), 4 deletions(-)

diff --git a/connector/avro/src/test/scala/org/apache/spark/sql/avro/AvroSuite.scala b/connector/avro/src/test/scala/org/apache/spark/sql/avro/AvroSuite.scala
index 56b107c14f57..c680850e602c 100644
--- a/connector/avro/src/test/scala/org/apache/spark/sql/avro/AvroSuite.scala
+++ b/connector/avro/src/test/scala/org/apache/spark/sql/avro/AvroSuite.scala
@@ -38,7 +38,7 @@ import org.apache.spark.TestUtils.assertExceptionMsg
 import org.apache.spark.sql._
 import org.apache.spark.sql.TestingUDT.IntervalData
 import org.apache.spark.sql.avro.AvroCompressionCodec._
-import org.apache.spark.sql.catalyst.expressions.AttributeReference
+import org.apache.spark.sql.catalyst.expressions.{AttributeReference, Literal}
 import org.apache.spark.sql.catalyst.plans.logical.Filter
 import org.apache.spark.sql.catalyst.util.DateTimeTestUtils
 import org.apache.spark.sql.catalyst.util.DateTimeTestUtils.{withDefaultTimeZone, LA, UTC}
@@ -52,6 +52,7 @@ import org.apache.spark.sql.internal.SQLConf
 import org.apache.spark.sql.test.SharedSparkSession
 import org.apache.spark.sql.types._
 import org.apache.spark.sql.v2.avro.AvroScan
+import org.apache.spark.unsafe.types.TimestampNanosVal
 import org.apache.spark.util.Utils
 
 abstract class AvroSuite
@@ -3191,6 +3192,46 @@ abstract class AvroSuite
     }
   }
 
+  test("SPARK-57166: nanosecond timestamp types are not supported in Avro") {
+    val nanosTypes = Seq(TimestampNTZNanosType(9), TimestampLTZNanosType(9))
+    withSQLConf(SQLConf.TIMESTAMP_NANOS_TYPES_ENABLED.key -> "true") {
+      nanosTypes.foreach { nanosType =>
+        val expectedType = s""""${nanosType.sql}""""
+        withTempDir { dir =>
+          // Write path: a nanos-typed column cannot be written. The nanos literal is built
+          // directly from its internal value to avoid relying on cast/parser support.
+          val nanosLiteral = Literal.create(new TimestampNanosVal(0L, 0.toShort), nanosType)
+          val df = spark.range(1).select(Column(nanosLiteral).as("ts"))
+          val writeDir = new File(dir, "write").getCanonicalPath
+          checkError(
+            exception = intercept[AnalysisException] {
+              df.write.format("avro").mode("overwrite").save(writeDir)
+            },
+            condition = "UNSUPPORTED_DATA_TYPE_FOR_DATASOURCE",
+            parameters = Map(
+              "columnName" -> "`ts`",
+              "columnType" -> expectedType,
+              "format" -> "Avro"))
+
+          // Read path: a user-specified nanos schema is rejected. Write a benign file first
+          // so schema validation (not file listing) is what fails.
+          val readDir = new File(dir, "read").getCanonicalPath
+          Seq("a").toDF("ts").write.format("avro").mode("overwrite").save(readDir)
+          checkError(
+            exception = intercept[AnalysisException] {
+              spark.read.schema(new StructType().add("ts", nanosType))
+                .format("avro").load(readDir).collect()
+            },
+            condition = "UNSUPPORTED_DATA_TYPE_FOR_DATASOURCE",
+            parameters = Map(
+              "columnName" -> "`ts`",
+              "columnType" -> expectedType,
+              "format" -> "Avro"))
+        }
+      }
+    }
+  }
+
 }
 
 class AvroV1Suite extends AvroSuite {
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/avro/AvroUtils.scala b/sql/core/src/main/scala/org/apache/spark/sql/avro/AvroUtils.scala
index 02220e6c8568..7019e925080c 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/avro/AvroUtils.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/avro/AvroUtils.scala
@@ -109,6 +109,9 @@ private[sql] object AvroUtils extends Logging {
 
     case _: GeometryType | _: GeographyType => false
 
+    // Nanosecond-capable timestamps are not yet supported by this datasource.
+    case _: TimestampNTZNanosType | _: TimestampLTZNanosType => false
+
     case _: AtomicType => true
 
     case st: StructType => st.forall { f => supportsDataType(f.dataType) }
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVFileFormat.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVFileFormat.scala
index 77a0c53ae469..988223abe8bc 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVFileFormat.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVFileFormat.scala
@@ -157,6 +157,9 @@ case class CSVFileFormat() extends TextBasedFileFormat with DataSourceRegister {
 
     case _: GeometryType | _: GeographyType => false
 
+    // Nanosecond-capable timestamps are not yet supported by this datasource.
+    case _: TimestampNTZNanosType | _: TimestampLTZNanosType => false
+
     case _: AtomicType => true
 
     case udt: UserDefinedType[_] => supportDataType(udt.sqlType)
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JsonFileFormat.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JsonFileFormat.scala
index 2a8708f72ab1..331230c4446b 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JsonFileFormat.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JsonFileFormat.scala
@@ -117,6 +117,9 @@ case class JsonFileFormat() extends TextBasedFileFormat with DataSourceRegister
 
     case _: GeometryType | _: GeographyType => false
 
+    // Nanosecond-capable timestamps are not yet supported by this datasource.
+    case _: TimestampNTZNanosType | _: TimestampLTZNanosType => false
+
     case _: AtomicType => true
 
     case st: StructType => st.forall { f => supportDataType(f.dataType) }
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFileFormat.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFileFormat.scala
index 633ac8107f32..8b926f51efc3 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFileFormat.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFileFormat.scala
@@ -251,6 +251,9 @@ class OrcFileFormat
 
     case _: GeometryType | _: GeographyType => false
 
+    // Nanosecond-capable timestamps are not yet supported by this datasource.
+    case _: TimestampNTZNanosType | _: TimestampLTZNanosType => false
+
     case _: AtomicType => true
 
     case st: StructType => st.forall { f => supportDataType(f.dataType) }
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala
index a00618a1e2e2..669d826ddac4 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala
@@ -416,6 +416,9 @@ class ParquetFileFormat
     case g: GeometryType => GeometryType.isSridSupported(g.srid)
     case g: GeographyType => GeographyType.isSridSupported(g.srid)
 
+    // Nanosecond-capable timestamps are not yet supported by this datasource.
+    case _: TimestampNTZNanosType | _: TimestampLTZNanosType => false
+
     case _: AtomicType | _: NullType => true
 
     case st: StructType => st.forall { f => supportDataType(f.dataType) }
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/csv/CSVTable.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/csv/CSVTable.scala
index 4938df795cb1..221196c86445 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/csv/CSVTable.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/csv/CSVTable.scala
@@ -26,7 +26,7 @@ import org.apache.spark.sql.connector.write.{LogicalWriteInfo, Write, WriteBuild
 import org.apache.spark.sql.execution.datasources.FileFormat
 import org.apache.spark.sql.execution.datasources.csv.CSVDataSource
 import org.apache.spark.sql.execution.datasources.v2.FileTable
-import org.apache.spark.sql.types.{AtomicType, DataType, GeographyType, GeometryType, StructType, UserDefinedType}
+import org.apache.spark.sql.types.{AtomicType, DataType, GeographyType, GeometryType, StructType, TimestampLTZNanosType, TimestampNTZNanosType, UserDefinedType}
 import org.apache.spark.sql.util.CaseInsensitiveStringMap
 
 case class CSVTable(
@@ -59,6 +59,9 @@ case class CSVTable(
   override def supportsDataType(dataType: DataType): Boolean = dataType match {
     case _: GeometryType | _: GeographyType => false
 
+    // Nanosecond-capable timestamps are not yet supported by this datasource.
+    case _: TimestampNTZNanosType | _: TimestampLTZNanosType => false
+
     case _: AtomicType => true
 
     case udt: UserDefinedType[_] => supportsDataType(udt.sqlType)
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/json/JsonTable.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/json/JsonTable.scala
index cf3c1e11803c..dc3729a119b1 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/json/JsonTable.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/json/JsonTable.scala
@@ -59,6 +59,9 @@ case class JsonTable(
   override def supportsDataType(dataType: DataType): Boolean = dataType match {
     case _: GeometryType | _: GeographyType => false
 
+    // Nanosecond-capable timestamps are not yet supported by this datasource.
+    case _: TimestampNTZNanosType | _: TimestampLTZNanosType => false
+
     case _: AtomicType => true
 
     case st: StructType => st.forall { f => supportsDataType(f.dataType) }
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/orc/OrcTable.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/orc/OrcTable.scala
index 08cd89fdacc6..e595a680d2e6 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/orc/OrcTable.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/orc/OrcTable.scala
@@ -53,6 +53,9 @@ case class OrcTable(
   override def supportsDataType(dataType: DataType): Boolean = dataType match {
     case _: GeometryType | _: GeographyType => false
 
+    // Nanosecond-capable timestamps are not yet supported by this datasource.
+    case _: TimestampNTZNanosType | _: TimestampLTZNanosType => false
+
     case _: AtomicType => true
 
     case st: StructType => st.forall { f => supportsDataType(f.dataType) }
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/parquet/ParquetTable.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/parquet/ParquetTable.scala
index 67052c201a9d..9c3b05504363 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/parquet/ParquetTable.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/parquet/ParquetTable.scala
@@ -55,6 +55,9 @@ case class ParquetTable(
     case g: GeometryType => GeometryType.isSridSupported(g.srid)
     case g: GeographyType => GeographyType.isSridSupported(g.srid)
 
+    // Nanosecond-capable timestamps are not yet supported by this datasource.
+    case _: TimestampNTZNanosType | _: TimestampLTZNanosType => false
+
     case _: AtomicType => true
 
     case st: StructType => st.forall { f => supportsDataType(f.dataType) }
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/xml/XmlFileFormat.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/xml/XmlFileFormat.scala
index 686d81a0b56d..f32b915fc054 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/xml/XmlFileFormat.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/xml/XmlFileFormat.scala
@@ -134,6 +134,9 @@ case class XmlFileFormat() extends TextBasedFileFormat with DataSourceRegister {
 
     case _: GeometryType | _: GeographyType => false
 
+    // Nanosecond-capable timestamps are not yet supported by this datasource.
+    case _: TimestampNTZNanosType | _: TimestampLTZNanosType => false
+
     case _: AtomicType => true
 
     case st: StructType => st.forall { f => supportDataType(f.dataType) }
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/FileBasedDataSourceSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/FileBasedDataSourceSuite.scala
index 1fc45e9703f9..e5f64545986f 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/FileBasedDataSourceSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/FileBasedDataSourceSuite.scala
@@ -44,6 +44,7 @@ import org.apache.spark.sql.functions._
 import org.apache.spark.sql.internal.SQLConf
 import org.apache.spark.sql.test.SharedSparkSession
 import org.apache.spark.sql.types._
+import org.apache.spark.unsafe.types.TimestampNanosVal
 
 class FileBasedDataSourceSuite extends SharedSparkSession
   with AdaptiveSparkPlanHelper {
@@ -1334,6 +1335,65 @@ class FileBasedDataSourceSuite extends SharedSparkSession
       }
     }
   }
+
+  test("SPARK-57166: nanosecond timestamp types are not supported in file data sources") {
+    // None of these built-in file formats support nanosecond-capable timestamps yet.
+    val unsupportedDataSources = Seq("parquet", "orc", "json", "csv", "xml")
+    val nanosTypes = Seq(TimestampNTZNanosType(9), TimestampLTZNanosType(9))
+    withSQLConf(SQLConf.TIMESTAMP_NANOS_TYPES_ENABLED.key -> "true") {
+      // Test both v1 and v2 data sources.
+      Seq(true, false).foreach { useV1 =>
+        val useV1List = if (useV1) {
+          unsupportedDataSources.mkString(",")
+        } else {
+          ""
+        }
+        withSQLConf(SQLConf.USE_V1_SOURCE_LIST.key -> useV1List) {
+          unsupportedDataSources.foreach { format =>
+            nanosTypes.foreach { nanosType =>
+              val expectedType = s""""${nanosType.sql}""""
+              withTempDir { dir =>
+                // Write path: a nanos-typed column cannot be written. The nanos literal is built
+                // directly from its internal value to avoid relying on cast/parser support.
+                val nanosLiteral =
+                  Literal.create(new TimestampNanosVal(0L, 0.toShort), nanosType)
+                val df = spark.range(1).select(Column(nanosLiteral).as("ts"))
+                val writeDir = new File(dir, "write").getCanonicalPath
+                checkError(
+                  exception = intercept[AnalysisException] {
+                    df.write.format(format).mode("overwrite").save(writeDir)
+                  },
+                  condition = "UNSUPPORTED_DATA_TYPE_FOR_DATASOURCE",
+                  parameters = Map(
+                    "columnName" -> "`ts`",
+                    "columnType" -> expectedType,
+                    "format" -> formatMapping(format)))
+
+                // Read path: a user-specified nanos schema is rejected. Write a benign file first
+                // so schema validation (not file listing) is what fails.
+                val readDir = new File(dir, "read").getCanonicalPath
+                // XML requires a `rowTag` option on both the read and write paths.
+                val extraOptions =
+                  if (format == "xml") Map("rowTag" -> "row") else Map.empty[String, String]
+                Seq("a").toDF("ts").write.format(format).options(extraOptions)
+                  .mode("overwrite").save(readDir)
+                checkError(
+                  exception = intercept[AnalysisException] {
+                    spark.read.schema(new StructType().add("ts", nanosType))
+                      .format(format).options(extraOptions).load(readDir).collect()
+                  },
+                  condition = "UNSUPPORTED_DATA_TYPE_FOR_DATASOURCE",
+                  parameters = Map(
+                    "columnName" -> "`ts`",
+                    "columnType" -> expectedType,
+                    "format" -> formatMapping(format)))
+              }
+            }
+          }
+        }
+      }
+    }
+  }
 }
 
 object TestingUDT {
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCWriteSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCWriteSuite.scala
index 535b8257ad94..bce7a883fa2b 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCWriteSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCWriteSuite.scala
@@ -28,12 +28,15 @@ import org.scalatest.BeforeAndAfter
 
 import org.apache.spark.SparkException
 import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}
-import org.apache.spark.sql.{AnalysisException, DataFrame, Row, SaveMode}
+import org.apache.spark.sql.{AnalysisException, Column, DataFrame, Row, SaveMode}
+import org.apache.spark.sql.catalyst.expressions.Literal
 import org.apache.spark.sql.catalyst.parser.ParseException
+import org.apache.spark.sql.classic.ClassicConversions._
 import org.apache.spark.sql.execution.datasources.jdbc.{JDBCOptions, JdbcUtils}
 import org.apache.spark.sql.internal.SQLConf
 import org.apache.spark.sql.test.SharedSparkSession
 import org.apache.spark.sql.types._
+import org.apache.spark.unsafe.types.TimestampNanosVal
 import org.apache.spark.util.ArrayImplicits._
 import org.apache.spark.util.Utils
 
@@ -697,4 +700,24 @@ class JDBCWriteSuite extends SharedSparkSession with BeforeAndAfter {
     assert(rows(0).getAs[java.sql.Timestamp](1)
       === java.sql.Timestamp.valueOf("2020-02-02 04:13:14.56789"))
   }
+
+  test("SPARK-57166: nanosecond timestamp types are not supported in JDBC write") {
+    withSQLConf(SQLConf.TIMESTAMP_NANOS_TYPES_ENABLED.key -> "true") {
+      Seq(TimestampNTZNanosType(9), TimestampLTZNanosType(9)).foreach { nanosType =>
+        // The nanos literal is built directly from its internal value to avoid relying on
+        // cast/parser support.
+        val nanosLiteral = Literal.create(new TimestampNanosVal(0L, 0.toShort), nanosType)
+        val df = spark.range(1).select(Column(nanosLiteral).as("ts"))
+        checkErrorMatchPVals(
+          exception = intercept[AnalysisException] {
+            df.write.jdbc(url, "TEST.NANOSTYPES", new Properties())
+          },
+          condition = "UNSUPPORTED_DATA_TYPE_FOR_DATASOURCE",
+          parameters = Map(
+            "columnName" -> "`ts`",
+            "columnType" -> java.util.regex.Pattern.quote(s""""${nanosType.sql}""""),
+            "format" -> ".*"))
+      }
+    }
+  }
 }
diff --git a/sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcFileFormat.scala b/sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcFileFormat.scala
index ba37e5c176de..b8961e0ee84e 100644
--- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcFileFormat.scala
+++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcFileFormat.scala
@@ -197,6 +197,8 @@ case class OrcFileFormat() extends FileFormat
 
     case _: AnsiIntervalType => false
     case _: TimeType => false
+    // Nanosecond-capable timestamps are not yet supported by this datasource.
+    case _: TimestampNTZNanosType | _: TimestampLTZNanosType => false
     case _: AtomicType => true
 
     case st: StructType => st.forall { f => supportDataType(f.dataType) }
diff --git a/sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/HiveOrcSourceSuite.scala b/sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/HiveOrcSourceSuite.scala
index c1084dd4ee7f..504d1a5881a0 100644
--- a/sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/HiveOrcSourceSuite.scala
+++ b/sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/HiveOrcSourceSuite.scala
@@ -19,12 +19,16 @@ package org.apache.spark.sql.hive.orc
 
 import java.io.File
 
-import org.apache.spark.sql.{AnalysisException, Row}
+import org.apache.spark.sql.{AnalysisException, Column, Row}
 import org.apache.spark.sql.TestingUDT.{IntervalData, IntervalUDT}
+import org.apache.spark.sql.catalyst.expressions.Literal
+import org.apache.spark.sql.classic.ClassicConversions._
 import org.apache.spark.sql.execution.datasources.orc.OrcSuite
 import org.apache.spark.sql.hive.HiveUtils
 import org.apache.spark.sql.hive.test.TestHiveSingleton
+import org.apache.spark.sql.internal.SQLConf
 import org.apache.spark.sql.types._
+import org.apache.spark.unsafe.types.TimestampNanosVal
 import org.apache.spark.util.Utils
 
 class HiveOrcSourceSuite extends OrcSuite with TestHiveSingleton {
@@ -346,6 +350,44 @@ class HiveOrcSourceSuite extends OrcSuite with TestHiveSingleton {
     }
   }
 
+  test("SPARK-57166: nanosecond timestamp types are not supported in Hive ORC") {
+    val nanosTypes = Seq(TimestampNTZNanosType(9), TimestampLTZNanosType(9))
+    withSQLConf(SQLConf.TIMESTAMP_NANOS_TYPES_ENABLED.key -> "true") {
+      nanosTypes.foreach { nanosType =>
+        val expectedType = s""""${nanosType.sql}""""
+        withTempDir { dir =>
+          // Write path
+          val nanosLiteral = Literal.create(new TimestampNanosVal(0L, 0.toShort), nanosType)
+          val df = spark.range(1).select(Column(nanosLiteral).as("ts"))
+          val writeDir = new File(dir, "write").getCanonicalPath
+          checkError(
+            exception = intercept[AnalysisException] {
+              df.write.format("orc").mode("overwrite").save(writeDir)
+            },
+            condition = "UNSUPPORTED_DATA_TYPE_FOR_DATASOURCE",
+            parameters = Map(
+              "columnName" -> "`ts`",
+              "columnType" -> expectedType,
+              "format" -> "ORC"))
+
+          // Read path
+          val readDir = new File(dir, "read").getCanonicalPath
+          spark.range(1).write.format("orc").mode("overwrite").save(readDir)
+          checkError(
+            exception = intercept[AnalysisException] {
+              spark.read.schema(new StructType().add("ts", nanosType))
+                .format("orc").load(readDir).collect()
+            },
+            condition = "UNSUPPORTED_DATA_TYPE_FOR_DATASOURCE",
+            parameters = Map(
+              "columnName" -> "`ts`",
+              "columnType" -> expectedType,
+              "format" -> "ORC"))
+        }
+      }
+    }
+  }
+
   test("SPARK-31580: Read a file written before ORC-569") {
     // Test ORC file came from ORC-621
     val df = readResourceOrcFile("test-data/TestStringDictionary.testRowIndex.orc")


