This is an automated email from the ASF dual-hosted git repository.

cloud-fan pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new 3bf85b7974d4 [SPARK-57187][SQL] Fix INTERNAL_ERROR when current_user() is used as DEFAULT for CHAR/VARCHAR columns
3bf85b7974d4 is described below

commit 3bf85b7974d427a6929bb6c936e831e2069295cd
Author: Dejan Krakovic <[email protected]>
AuthorDate: Tue Jun 2 12:01:18 2026 +0800

    [SPARK-57187][SQL] Fix INTERNAL_ERROR when current_user() is used as DEFAULT for CHAR/VARCHAR columns
    
    ### What changes were proposed in this pull request?
    
    Two related fixes for non-foldable DEFAULT expressions (e.g. 
`current_user()`) on CHAR/VARCHAR columns:
    
    1. **Fix INTERNAL_ERROR at DDL time.** 
`ResolveDefaultColumns.coerceDefaultValue` called 
`CharVarcharUtils.stringLengthCheck(ret, dataType).eval(EmptyRow)` to validate 
CHAR/VARCHAR default lengths at compile time, without checking foldability. For 
a non-foldable default such as `current_user()` (processed before it is 
replaced by a literal), `.eval(EmptyRow)` throws "Cannot evaluate expression", 
surfacing as INTERNAL_ERROR. Added an `&& ret.foldable` guard so the eager 
check only runs [...]
    
    2. **Add runtime CHAR/VARCHAR length enforcement for implicit defaults.** 
When a column with a DEFAULT is omitted from an INSERT column list, 
`TableOutputResolver.reorderColumnsByName` filled the default but did not apply 
`stringLengthCheck`, so oversized non-foldable defaults silently succeeded. It 
now wraps the default expression with the write-side length check, using 
`CharVarcharUtils.getRawType(metadata)` to detect CHAR/VARCHAR for both V1 and 
V2 tables. The explicit `DEFAULT` ke [...]
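
The first fix can be illustrated with a minimal sketch. It is a toy model, not Spark code: `Expr`, `Lit`, `CurrentUser`, `coerceUnguarded` and `coerceGuarded` are hypothetical stand-ins for Catalyst expressions and for the eager check in `coerceDefaultValue`, and the length test ignores details such as padding and trailing-space handling.

```scala
// Toy model only: hypothetical stand-ins, not Spark's Catalyst classes.
object FoldableGuardSketch {
  sealed trait Expr {
    def foldable: Boolean
    def eval(): String
  }

  // A constant default: its value is known when the DDL statement is analyzed.
  final case class Lit(value: String) extends Expr {
    val foldable = true
    def eval(): String = value
  }

  // Stands in for current_user(): it has no value until a query actually runs.
  case object CurrentUser extends Expr {
    val foldable = false
    def eval(): String =
      throw new UnsupportedOperationException("Cannot evaluate expression: current_user()")
  }

  // Eager check without a guard: evaluates every default, foldable or not.
  def coerceUnguarded(default: Expr, limit: Int): Expr = {
    require(default.eval().length <= limit, s"EXCEED_LIMIT_LENGTH: limit=$limit")
    default
  }

  // Eager check with the foldable guard: non-foldable defaults are passed through
  // unevaluated, leaving their length to be checked when a row is written.
  def coerceGuarded(default: Expr, limit: Int): Expr = {
    if (default.foldable) {
      require(default.eval().length <= limit, s"EXCEED_LIMIT_LENGTH: limit=$limit")
    }
    default
  }

  def main(args: Array[String]): Unit = {
    println(scala.util.Try(coerceUnguarded(CurrentUser, 100))) // fails: evaluated too early
    println(coerceGuarded(CurrentUser, 100))                   // check skipped, expression kept
    println(scala.util.Try(coerceGuarded(Lit("toolong"), 3)))  // fails: foldable and oversized
  }
}
```

In this model the guard changes nothing for foldable defaults, which are still rejected eagerly when too long. It only stops the check from evaluating an expression that cannot be evaluated yet.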
    
    ### Why are the changes needed?
    
    `CREATE TABLE t(s CHAR(100) DEFAULT current_user()) USING parquet` (or the 
equivalent `ALTER TABLE ... SET DEFAULT`) previously failed with 
INTERNAL_ERROR. Oversized non-foldable defaults for omitted columns also 
bypassed length checks.
    
    ### Does this PR introduce _any_ user-facing change?
    
    Yes. Non-foldable expressions are now allowed as CHAR/VARCHAR defaults 
(instead of throwing INTERNAL_ERROR), and oversized non-foldable defaults on 
omitted INSERT columns now fail with EXCEED_LIMIT_LENGTH at runtime instead of 
succeeding.
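
The runtime side of this change can be sketched the same way. Again this is a toy model under stated assumptions: `Column`, `Default`, `withLengthCheck` and `fillRow` are hypothetical names, not Spark's `TableOutputResolver` API, and the real write-side check also deals with padding and trailing spaces, which this sketch ignores.

```scala
// Toy model only: hypothetical names, not Spark's TableOutputResolver or CharVarcharUtils.
object WriteSideCheckSketch {
  // A default whose value is only produced when a row is written.
  final case class Default(compute: () => String)

  // A column with an optional CHAR/VARCHAR length limit and an optional default.
  final case class Column(name: String, limit: Option[Int], default: Option[Default])

  // Wraps the default so its length is checked at the moment the value is produced.
  def withLengthCheck(col: Column, d: Default): Default = col.limit match {
    case Some(n) =>
      Default { () =>
        val v = d.compute()
        if (v.length > n) throw new IllegalStateException(s"EXCEED_LIMIT_LENGTH: limit=$n")
        v
      }
    case None => d
  }

  // Builds a row: supplied values are used as-is, omitted columns are filled from
  // their default, and that default goes through the wrapped length check.
  def fillRow(schema: Seq[Column], supplied: Map[String, String]): Seq[String] =
    schema.map { col =>
      supplied.getOrElse(col.name, {
        val d = col.default.getOrElse(
          throw new IllegalArgumentException(s"no value or default for ${col.name}"))
        withLengthCheck(col, d).compute()
      })
    }

  def main(args: Array[String]): Unit = {
    val user = Default(() => "alice") // stands in for current_user()
    val narrow = Seq(Column("i", None, None), Column("s", Some(1), Some(user)))
    val wide = Seq(Column("i", None, None), Column("s", Some(100), Some(user)))
    println(fillRow(wide, Map("i" -> "1")))                    // default fits the limit
    println(scala.util.Try(fillRow(narrow, Map("i" -> "1"))))  // default exceeds the limit
  }
}
```

The check lives in the wrapper rather than in the table definition, so in this model it fires per write, which mirrors the shift from a DDL-time failure to an INSERT-time failure described above.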
    
    ### How was this patch tested?
    
    Added 6 tests to `ResolveDefaultColumnsSuite` (CREATE/ALTER with 
`current_user()` on CHAR/VARCHAR; foldable oversize fails at DDL; non-foldable 
oversize fails at INSERT for both implicit and explicit DEFAULT paths). Full 
suite: 19 tests pass.
    
    ### Was this patch authored or co-authored using generative AI tooling?
    
    Yes, co-authored using Claude code.
    
    Closes #56238 from dejankrak-db/current-user-default-char-oss.
    
    Authored-by: Dejan Krakovic <[email protected]>
    Signed-off-by: Wenchen Fan <[email protected]>
---
 .../catalyst/analysis/TableOutputResolver.scala    |  29 +++++-
 .../catalyst/util/ResolveDefaultColumnsUtil.scala  |   2 +-
 .../spark/sql/ResolveDefaultColumnsSuite.scala     | 116 +++++++++++++++++++++
 3 files changed, 144 insertions(+), 3 deletions(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TableOutputResolver.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TableOutputResolver.scala
index d691c449733f..93d53d9f33a6 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TableOutputResolver.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TableOutputResolver.scala
@@ -284,6 +284,31 @@ object TableOutputResolver extends SQLConfHelper with Logging {
     }
   }
 
+  /**
+   * Builds the [[NamedExpression]] for a missing column filled with its default value, applying a
+   * write-side CHAR/VARCHAR length check so that non-foldable defaults (e.g. `current_user()`)
+   * that exceed the column length are caught at runtime. Uses `getRawType` so it works for both
+   * V1 and V2 tables. Shared by the by-name and by-position default-fill paths.
+   *
+   * `applyColumnMetadata` strips the default's outer alias and re-wraps it with the required
+   * metadata, so the length check is applied to the default value itself (the alias child).
+   */
+  private def applyDefaultWithLengthCheck(
+      defaultExpr: Expression,
+      expectedCol: Attribute,
+      conf: SQLConf): NamedExpression = {
+    val rawType = CharVarcharUtils.getRawType(expectedCol.metadata).getOrElse(expectedCol.dataType)
+    val checked = if (!conf.charVarcharAsString && CharVarcharUtils.hasCharVarchar(rawType)) {
+      val value = defaultExpr match {
+        case a: Alias => a.child
+        case other => other
+      }
+      CharVarcharUtils.stringLengthCheck(value, rawType)
+    } else {
+      defaultExpr
+    }
+    applyColumnMetadata(checked, expectedCol)
+  }
 
   private def canWrite(
       tableName: String,
@@ -327,7 +352,7 @@ object TableOutputResolver extends SQLConfHelper with Logging {
             tableName, newColPath.quoted
           )
         }
-        Some(applyColumnMetadata(defaultExpr.get, expectedCol))
+        Some(applyDefaultWithLengthCheck(defaultExpr.get, expectedCol, conf))
       } else if (matched.length > 1) {
         throw QueryCompilationErrors.incompatibleDataToTableAmbiguousColumnNameError(
           tableName, newColPath.quoted
@@ -448,7 +473,7 @@ object TableOutputResolver extends SQLConfHelper with Logging {
           throw QueryCompilationErrors.incompatibleDataToTableCannotFindDataError(
             tableName, (colPath :+ expectedCol.name).quoted)
         }
-        applyColumnMetadata(defaultExpr.get, expectedCol)
+        applyDefaultWithLengthCheck(defaultExpr.get, expectedCol, conf)
       }
     } else {
       Nil
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ResolveDefaultColumnsUtil.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ResolveDefaultColumnsUtil.scala
index 491fb7f24a22..68529e41937e 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ResolveDefaultColumnsUtil.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ResolveDefaultColumnsUtil.scala
@@ -439,7 +439,7 @@ object ResolveDefaultColumns extends QueryErrorsBase
           throw QueryCompilationErrors.defaultValuesDataTypeError(
             statementType, colName, defaultSQL, dataType, other.dataType))
     }
-    if (!conf.charVarcharAsString && CharVarcharUtils.hasCharVarchar(dataType)) {
+    if (!conf.charVarcharAsString && CharVarcharUtils.hasCharVarchar(dataType) && ret.foldable) {
       CharVarcharUtils.stringLengthCheck(ret, dataType).eval(EmptyRow)
     }
     ret
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/ResolveDefaultColumnsSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/ResolveDefaultColumnsSuite.scala
index 40ed0b301e1a..c3b6bd676d58 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/ResolveDefaultColumnsSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/ResolveDefaultColumnsSuite.scala
@@ -18,7 +18,13 @@
 package org.apache.spark.sql
 
 import org.apache.spark.SparkRuntimeException
+import org.apache.spark.sql.catalyst.analysis.TableOutputResolver
+import org.apache.spark.sql.catalyst.expressions.AttributeReference
+import org.apache.spark.sql.catalyst.expressions.objects.StaticInvoke
+import org.apache.spark.sql.catalyst.plans.logical.LocalRelation
+import org.apache.spark.sql.catalyst.util.{CharVarcharUtils, ResolveDefaultColumns}
 import org.apache.spark.sql.test.SharedSparkSession
+import org.apache.spark.sql.types.{IntegerType, MetadataBuilder, StringType}
 
 class ResolveDefaultColumnsSuite extends SharedSparkSession {
   test("column without default value defined (null as default)") {
@@ -308,4 +314,114 @@ class ResolveDefaultColumnsSuite extends SharedSparkSession {
       checkAnswer(sql(s"SELECT * FROM $tableName"), Seq(Row(0, user)))
     }
   }
+
+  test("SPARK-57187: current_user() as default for CHAR column should not throw INTERNAL_ERROR") {
+    val tableName = "test_current_user_char"
+    val user = spark.sparkContext.sparkUser
+    withTable(tableName) {
+      sql(s"CREATE TABLE $tableName(i int, s CHAR(100) DEFAULT current_user()) USING parquet")
+      sql(s"INSERT INTO $tableName (i) VALUES (1)")
+      val result = sql(s"SELECT i, TRIM(s) FROM $tableName").collect()
+      assert(result.length == 1)
+      assert(result.head.getInt(0) == 1)
+      assert(result.head.getString(1) == user)
+    }
+  }
+
+  test("SPARK-57187: current_user() as default for VARCHAR column") {
+    val tableName = "test_current_user_varchar"
+    val user = spark.sparkContext.sparkUser
+    withTable(tableName) {
+      sql(s"CREATE TABLE $tableName(i int, s VARCHAR(100) DEFAULT current_user()) USING parquet")
+      sql(s"INSERT INTO $tableName (i) VALUES (1)")
+      checkAnswer(sql(s"SELECT * FROM $tableName"), Seq(Row(1, user)))
+    }
+  }
+
+  test("SPARK-57187: ALTER TABLE with current_user() default for CHAR column") {
+    val tableName = "test_current_user_char_alter"
+    val user = spark.sparkContext.sparkUser
+    withTable(tableName) {
+      sql(s"CREATE TABLE $tableName(id INT, created_by CHAR(100)) USING parquet")
+      sql(s"ALTER TABLE $tableName ALTER COLUMN created_by SET DEFAULT current_user()")
+      sql(s"INSERT INTO $tableName (id) VALUES (1)")
+      val result = sql(s"SELECT id, TRIM(created_by) FROM $tableName").collect()
+      assert(result.length == 1)
+      assert(result.head.getInt(0) == 1)
+      assert(result.head.getString(1) == user)
+    }
+  }
+
+  test("SPARK-57187: foldable default exceeding CHAR/VARCHAR length fails at DDL time") {
+    // Foldable expressions are still validated eagerly at DDL time (existing behavior)
+    Seq("CHAR", "VARCHAR").foreach { typeName =>
+      checkError(
+        exception = intercept[SparkRuntimeException](
+          sql(s"CREATE TABLE t(c $typeName(3) DEFAULT 'toolong') USING parquet")),
+        condition = "EXCEED_LIMIT_LENGTH",
+        parameters = Map("limit" -> "3"))
+    }
+  }
+
+  test("SPARK-57187: non-foldable default exceeding CHAR/VARCHAR length fails at INSERT time " +
+      "(implicit default)") {
+    // current_user() exceeds CHAR(1)/VARCHAR(1) -- DDL succeeds because the expression is
+    // non-foldable, but INSERT should fail at runtime with EXCEED_LIMIT_LENGTH.
+    Seq("CHAR", "VARCHAR").foreach { typeName =>
+      withTable("t") {
+        sql(s"CREATE TABLE t(i INT, s $typeName(1) DEFAULT current_user()) USING parquet")
+        checkError(
+          exception = intercept[SparkRuntimeException](
+            sql("INSERT INTO t (i) VALUES (1)")),
+          condition = "EXCEED_LIMIT_LENGTH",
+          parameters = Map("limit" -> "1"))
+      }
+    }
+  }
+
+  test("SPARK-57187: non-foldable default exceeding CHAR/VARCHAR length fails at INSERT time " +
+      "(explicit DEFAULT keyword)") {
+    // Using the explicit DEFAULT keyword in VALUES goes through the checkField path.
+    Seq("CHAR", "VARCHAR").foreach { typeName =>
+      withTable("t") {
+        sql(s"CREATE TABLE t(i INT, s $typeName(1) DEFAULT current_user()) USING parquet")
+        checkError(
+          exception = intercept[SparkRuntimeException](
+            sql("INSERT INTO t VALUES (1, DEFAULT)")),
+          condition = "EXCEED_LIMIT_LENGTH",
+          parameters = Map("limit" -> "1"))
+      }
+    }
+  }
+
+  test("SPARK-57187: by-position default fill applies the CHAR/VARCHAR length check") {
+    // The by-position fill path (resolveColumnsByPosition under RECURSE / V2 schema evolution)
+    // shares the same applyDefaultWithLengthCheck helper as the by-name path. This drives that
+    // path directly and asserts the trailing default column is wrapped with the write-side
+    // length check, so an oversized non-foldable default is caught at runtime there too.
+    // Expected schema: (i INT, s CHAR(100) DEFAULT current_user()). CHAR is stored as StringType
+    // plus the raw-type metadata, exactly as the catalog represents it. CHAR(100) is wide enough
+    // that the resolved default does not trip the eager DDL-time length check, so we observe the
+    // write-side runtime check that the by-position fill path now adds.
+    val charMeta = new MetadataBuilder()
+      .putString(CharVarcharUtils.CHAR_VARCHAR_TYPE_STRING_METADATA_KEY, "char(100)")
+      .putString(ResolveDefaultColumns.CURRENT_DEFAULT_COLUMN_METADATA_KEY, "current_user()")
+      .build()
+    val expected = Seq(
+      AttributeReference("i", IntegerType)(),
+      AttributeReference("s", StringType, nullable = true, metadata = charMeta)())
+    // A by-position INSERT that supplies only the leading column, omitting the trailing CHAR one.
+    val query = LocalRelation(AttributeReference("i", IntegerType)())
+
+    val resolved = TableOutputResolver.resolveOutputColumns(
+      "t", expected, query, byName = false, spark.sessionState.conf,
+      TableOutputResolver.DefaultValueFillMode.RECURSE)
+
+    val hasLengthCheck = resolved.expressions.exists(_.exists {
+      case s: StaticInvoke => s.functionName == "charTypeWriteSideCheck"
+      case _ => false
+    })
+    assert(hasLengthCheck,
+      "by-position default fill must apply the CHAR/VARCHAR write-side length check")
+  }
 }

