This is an automated email from the ASF dual-hosted git repository.
cloud-fan pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new 3bf85b7974d4 [SPARK-57187][SQL] Fix INTERNAL_ERROR when current_user()
is used as DEFAULT for CHAR/VARCHAR columns
3bf85b7974d4 is described below
commit 3bf85b7974d427a6929bb6c936e831e2069295cd
Author: Dejan Krakovic <[email protected]>
AuthorDate: Tue Jun 2 12:01:18 2026 +0800
[SPARK-57187][SQL] Fix INTERNAL_ERROR when current_user() is used as
DEFAULT for CHAR/VARCHAR columns
### What changes were proposed in this pull request?
Two related fixes for non-foldable DEFAULT expressions (e.g.
`current_user()`) on CHAR/VARCHAR columns:
1. **Fix INTERNAL_ERROR at DDL time.**
`ResolveDefaultColumns.coerceDefaultValue` called
`CharVarcharUtils.stringLengthCheck(ret, dataType).eval(EmptyRow)` to validate
CHAR/VARCHAR default lengths at compile time, without checking foldability. For
a non-foldable default such as `current_user()` (processed before it is
replaced by a literal), `.eval(EmptyRow)` throws "Cannot evaluate expression",
surfacing as INTERNAL_ERROR. Added an `&& ret.foldable` guard so the eager
check only runs [...]
2. **Add runtime CHAR/VARCHAR length enforcement for implicit defaults.**
When a column with a DEFAULT is omitted from an INSERT column list,
`TableOutputResolver.reorderColumnsByName` filled the default but did not apply
`stringLengthCheck`, so oversized non-foldable defaults silently succeeded. It
now wraps the default expression with the write-side length check, using
`CharVarcharUtils.getRawType(metadata)` to detect CHAR/VARCHAR for both V1 and
V2 tables. The explicit `DEFAULT` ke [...]
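As a rough illustration of the first fix, the sketch below models the foldability guard with toy types. Everything in it (`Expr`, `Lit`, `CurrentUserExpr`, `LengthCheck`, `DefaultCoercion`) is a hypothetical stand-in rather than Spark's Catalyst API, and it ignores details such as trailing-space handling for CHAR; it only shows why evaluating a non-foldable default eagerly fails and how a `foldable` check defers the length validation.

```scala
// Toy model only: hypothetical stand-ins, not Spark's Catalyst classes.
sealed trait Expr {
  def foldable: Boolean
  def eval(): String
}

// A constant: safe to evaluate while the table is being defined.
final case class Lit(value: String) extends Expr {
  val foldable = true
  def eval(): String = value
}

// Stand-in for a session-dependent function such as current_user():
// it has no value until a later phase replaces it with a literal.
case object CurrentUserExpr extends Expr {
  val foldable = false
  def eval(): String =
    throw new UnsupportedOperationException("Cannot evaluate expression")
}

// Wraps a child and rejects values longer than `limit` when evaluated.
final case class LengthCheck(child: Expr, limit: Int) extends Expr {
  def foldable: Boolean = child.foldable
  def eval(): String = {
    val v = child.eval()
    if (v.length > limit) {
      throw new IllegalArgumentException(s"EXCEED_LIMIT_LENGTH: limit=$limit")
    }
    v
  }
}

object DefaultCoercion {
  // Models the old behavior: the length check is always evaluated eagerly.
  def coerceUnguarded(default: Expr, limit: Int): Expr = {
    LengthCheck(default, limit).eval()
    default
  }

  // Models the guarded behavior: evaluate eagerly only for foldable defaults.
  def coerceGuarded(default: Expr, limit: Int): Expr = {
    if (default.foldable) LengthCheck(default, limit).eval()
    default
  }
}
```

In this model, `DefaultCoercion.coerceUnguarded(CurrentUserExpr, 100)` throws, while `coerceGuarded` returns the expression unchanged and still rejects an oversized literal such as `Lit("toolong")` with a limit of 3.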
### Why are the changes needed?
`CREATE TABLE t(s CHAR(100) DEFAULT current_user()) USING parquet` (or the
equivalent `ALTER TABLE ... SET DEFAULT`) previously failed with
INTERNAL_ERROR. Oversized non-foldable defaults for omitted columns also
bypassed length checks.
### Does this PR introduce _any_ user-facing change?
Yes. Non-foldable expressions are now allowed as CHAR/VARCHAR defaults
(instead of throwing INTERNAL_ERROR), and oversized non-foldable defaults on
omitted INSERT columns now fail with EXCEED_LIMIT_LENGTH at runtime instead of
succeeding.
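The runtime side of this change can be pictured with another small toy model. The names here (`Column`, `ExceedLimitLength`, `DefaultFill.fillRow`) are assumptions made up for illustration and do not correspond to Spark classes; the sketch only shows the idea that a default filled in for an omitted column is evaluated at write time and passed through the same length check as an explicitly supplied value.

```scala
// Toy model only: hypothetical names, not Spark's TableOutputResolver API.
final case class Column(
    name: String,
    maxLength: Option[Int],         // Some(n) models CHAR(n) / VARCHAR(n)
    default: Option[() => String])  // evaluated lazily, like a non-foldable default

final class ExceedLimitLength(val limit: Int)
  extends RuntimeException(s"EXCEED_LIMIT_LENGTH: limit=$limit")

object DefaultFill {
  // Write-side check: reject values longer than the declared length.
  private def checkLength(value: String, col: Column): String =
    col.maxLength match {
      case Some(n) if value.length > n => throw new ExceedLimitLength(n)
      case _ => value
    }

  // Builds one output row in schema order. Columns missing from `provided`
  // are filled from their default, and the filled value is length-checked too.
  def fillRow(schema: Seq[Column], provided: Map[String, String]): Seq[String] =
    schema.map { col =>
      provided.get(col.name) match {
        case Some(v) => checkLength(v, col)
        case None =>
          val thunk = col.default.getOrElse(
            throw new IllegalArgumentException(s"no value or default for ${col.name}"))
          checkLength(thunk(), col)
      }
    }
}
```

With a schema of `Column("i", None, None)` and `Column("s", Some(1), Some(() => "alice"))`, calling `DefaultFill.fillRow(schema, Map("i" -> "1"))` throws `ExceedLimitLength` in this model; widening `s` to `Some(100)` lets the same call succeed.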
### How was this patch tested?
Added 6 tests to `ResolveDefaultColumnsSuite` (CREATE/ALTER with
`current_user()` on CHAR/VARCHAR; foldable oversize fails at DDL; non-foldable
oversize fails at INSERT for both implicit and explicit DEFAULT paths). Full
suite: 19 tests pass.
### Was this patch authored or co-authored using generative AI tooling?
Yes, co-authored using Claude Code.
Closes #56238 from dejankrak-db/current-user-default-char-oss.
Authored-by: Dejan Krakovic <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
---
.../catalyst/analysis/TableOutputResolver.scala | 29 +++++-
.../catalyst/util/ResolveDefaultColumnsUtil.scala | 2 +-
.../spark/sql/ResolveDefaultColumnsSuite.scala | 116 +++++++++++++++++++++
3 files changed, 144 insertions(+), 3 deletions(-)
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TableOutputResolver.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TableOutputResolver.scala
index d691c449733f..93d53d9f33a6 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TableOutputResolver.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TableOutputResolver.scala
@@ -284,6 +284,31 @@ object TableOutputResolver extends SQLConfHelper with Logging {
}
}
+ /**
+ * Builds the [[NamedExpression]] for a missing column filled with its default value, applying a
+ * write-side CHAR/VARCHAR length check so that non-foldable defaults (e.g. `current_user()`)
+ * that exceed the column length are caught at runtime. Uses `getRawType` so it works for both
+ * V1 and V2 tables. Shared by the by-name and by-position default-fill paths.
+ *
+ * `applyColumnMetadata` strips the default's outer alias and re-wraps it with the required
+ * metadata, so the length check is applied to the default value itself (the alias child).
+ */
+ private def applyDefaultWithLengthCheck(
+ defaultExpr: Expression,
+ expectedCol: Attribute,
+ conf: SQLConf): NamedExpression = {
+ val rawType = CharVarcharUtils.getRawType(expectedCol.metadata).getOrElse(expectedCol.dataType)
+ val checked = if (!conf.charVarcharAsString && CharVarcharUtils.hasCharVarchar(rawType)) {
+ val value = defaultExpr match {
+ case a: Alias => a.child
+ case other => other
+ }
+ CharVarcharUtils.stringLengthCheck(value, rawType)
+ } else {
+ defaultExpr
+ }
+ applyColumnMetadata(checked, expectedCol)
+ }
private def canWrite(
tableName: String,
@@ -327,7 +352,7 @@ object TableOutputResolver extends SQLConfHelper with Logging {
tableName, newColPath.quoted
)
}
- Some(applyColumnMetadata(defaultExpr.get, expectedCol))
+ Some(applyDefaultWithLengthCheck(defaultExpr.get, expectedCol, conf))
} else if (matched.length > 1) {
throw QueryCompilationErrors.incompatibleDataToTableAmbiguousColumnNameError(
tableName, newColPath.quoted
@@ -448,7 +473,7 @@ object TableOutputResolver extends SQLConfHelper with Logging {
throw QueryCompilationErrors.incompatibleDataToTableCannotFindDataError(
tableName, (colPath :+ expectedCol.name).quoted)
}
- applyColumnMetadata(defaultExpr.get, expectedCol)
+ applyDefaultWithLengthCheck(defaultExpr.get, expectedCol, conf)
}
} else {
Nil
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ResolveDefaultColumnsUtil.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ResolveDefaultColumnsUtil.scala
index 491fb7f24a22..68529e41937e 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ResolveDefaultColumnsUtil.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ResolveDefaultColumnsUtil.scala
@@ -439,7 +439,7 @@ object ResolveDefaultColumns extends QueryErrorsBase
throw QueryCompilationErrors.defaultValuesDataTypeError(
statementType, colName, defaultSQL, dataType, other.dataType))
}
- if (!conf.charVarcharAsString && CharVarcharUtils.hasCharVarchar(dataType)) {
+ if (!conf.charVarcharAsString && CharVarcharUtils.hasCharVarchar(dataType) && ret.foldable) {
CharVarcharUtils.stringLengthCheck(ret, dataType).eval(EmptyRow)
}
ret
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/ResolveDefaultColumnsSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/ResolveDefaultColumnsSuite.scala
index 40ed0b301e1a..c3b6bd676d58 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/ResolveDefaultColumnsSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/ResolveDefaultColumnsSuite.scala
@@ -18,7 +18,13 @@
package org.apache.spark.sql
import org.apache.spark.SparkRuntimeException
+import org.apache.spark.sql.catalyst.analysis.TableOutputResolver
+import org.apache.spark.sql.catalyst.expressions.AttributeReference
+import org.apache.spark.sql.catalyst.expressions.objects.StaticInvoke
+import org.apache.spark.sql.catalyst.plans.logical.LocalRelation
import org.apache.spark.sql.catalyst.util.{CharVarcharUtils, ResolveDefaultColumns}
import org.apache.spark.sql.test.SharedSparkSession
+import org.apache.spark.sql.types.{IntegerType, MetadataBuilder, StringType}
class ResolveDefaultColumnsSuite extends SharedSparkSession {
test("column without default value defined (null as default)") {
@@ -308,4 +314,114 @@ class ResolveDefaultColumnsSuite extends SharedSparkSession {
checkAnswer(sql(s"SELECT * FROM $tableName"), Seq(Row(0, user)))
}
}
+
+ test("SPARK-57187: current_user() as default for CHAR column should not throw INTERNAL_ERROR") {
+ val tableName = "test_current_user_char"
+ val user = spark.sparkContext.sparkUser
+ withTable(tableName) {
+ sql(s"CREATE TABLE $tableName(i int, s CHAR(100) DEFAULT current_user()) USING parquet")
+ sql(s"INSERT INTO $tableName (i) VALUES (1)")
+ val result = sql(s"SELECT i, TRIM(s) FROM $tableName").collect()
+ assert(result.length == 1)
+ assert(result.head.getInt(0) == 1)
+ assert(result.head.getString(1) == user)
+ }
+ }
+
+ test("SPARK-57187: current_user() as default for VARCHAR column") {
+ val tableName = "test_current_user_varchar"
+ val user = spark.sparkContext.sparkUser
+ withTable(tableName) {
+ sql(s"CREATE TABLE $tableName(i int, s VARCHAR(100) DEFAULT current_user()) USING parquet")
+ sql(s"INSERT INTO $tableName (i) VALUES (1)")
+ checkAnswer(sql(s"SELECT * FROM $tableName"), Seq(Row(1, user)))
+ }
+ }
+
+ test("SPARK-57187: ALTER TABLE with current_user() default for CHAR column") {
+ val tableName = "test_current_user_char_alter"
+ val user = spark.sparkContext.sparkUser
+ withTable(tableName) {
+ sql(s"CREATE TABLE $tableName(id INT, created_by CHAR(100)) USING parquet")
+ sql(s"ALTER TABLE $tableName ALTER COLUMN created_by SET DEFAULT current_user()")
+ sql(s"INSERT INTO $tableName (id) VALUES (1)")
+ val result = sql(s"SELECT id, TRIM(created_by) FROM $tableName").collect()
+ assert(result.length == 1)
+ assert(result.head.getInt(0) == 1)
+ assert(result.head.getString(1) == user)
+ }
+ }
+
+ test("SPARK-57187: foldable default exceeding CHAR/VARCHAR length fails at DDL time") {
+ // Foldable expressions are still validated eagerly at DDL time (existing behavior)
+ Seq("CHAR", "VARCHAR").foreach { typeName =>
+ checkError(
+ exception = intercept[SparkRuntimeException](
+ sql(s"CREATE TABLE t(c $typeName(3) DEFAULT 'toolong') USING parquet")),
+ condition = "EXCEED_LIMIT_LENGTH",
+ parameters = Map("limit" -> "3"))
+ }
+ }
+
+ test("SPARK-57187: non-foldable default exceeding CHAR/VARCHAR length fails at INSERT time " +
+ "(implicit default)") {
+ // current_user() exceeds CHAR(1)/VARCHAR(1) -- DDL succeeds because the expression is
+ // non-foldable, but INSERT should fail at runtime with EXCEED_LIMIT_LENGTH.
+ Seq("CHAR", "VARCHAR").foreach { typeName =>
+ withTable("t") {
+ sql(s"CREATE TABLE t(i INT, s $typeName(1) DEFAULT current_user()) USING parquet")
+ checkError(
+ exception = intercept[SparkRuntimeException](
+ sql("INSERT INTO t (i) VALUES (1)")),
+ condition = "EXCEED_LIMIT_LENGTH",
+ parameters = Map("limit" -> "1"))
+ }
+ }
+ }
+
+ test("SPARK-57187: non-foldable default exceeding CHAR/VARCHAR length fails at INSERT time " +
+ "(explicit DEFAULT keyword)") {
+ // Using the explicit DEFAULT keyword in VALUES goes through the checkField path.
+ Seq("CHAR", "VARCHAR").foreach { typeName =>
+ withTable("t") {
+ sql(s"CREATE TABLE t(i INT, s $typeName(1) DEFAULT current_user()) USING parquet")
+ checkError(
+ exception = intercept[SparkRuntimeException](
+ sql("INSERT INTO t VALUES (1, DEFAULT)")),
+ condition = "EXCEED_LIMIT_LENGTH",
+ parameters = Map("limit" -> "1"))
+ }
+ }
+ }
+
+ test("SPARK-57187: by-position default fill applies the CHAR/VARCHAR length check") {
+ // The by-position fill path (resolveColumnsByPosition under RECURSE / V2 schema evolution)
+ // shares the same applyDefaultWithLengthCheck helper as the by-name path. This drives that
+ // path directly and asserts the trailing default column is wrapped with the write-side
+ // length check, so an oversized non-foldable default is caught at runtime there too.
+ // Expected schema: (i INT, s CHAR(100) DEFAULT current_user()). CHAR is stored as StringType
+ // plus the raw-type metadata, exactly as the catalog represents it. CHAR(100) is wide enough
+ // that the resolved default does not trip the eager DDL-time length check, so we observe the
+ // write-side runtime check that the by-position fill path now adds.
+ val charMeta = new MetadataBuilder()
+ .putString(CharVarcharUtils.CHAR_VARCHAR_TYPE_STRING_METADATA_KEY, "char(100)")
+ .putString(ResolveDefaultColumns.CURRENT_DEFAULT_COLUMN_METADATA_KEY, "current_user()")
+ .build()
+ val expected = Seq(
+ AttributeReference("i", IntegerType)(),
+ AttributeReference("s", StringType, nullable = true, metadata = charMeta)())
+ // A by-position INSERT that supplies only the leading column, omitting the trailing CHAR one.
+ val query = LocalRelation(AttributeReference("i", IntegerType)())
+
+ val resolved = TableOutputResolver.resolveOutputColumns(
+ "t", expected, query, byName = false, spark.sessionState.conf,
+ TableOutputResolver.DefaultValueFillMode.RECURSE)
+
+ val hasLengthCheck = resolved.expressions.exists(_.exists {
+ case s: StaticInvoke => s.functionName == "charTypeWriteSideCheck"
+ case _ => false
+ })
+ assert(hasLengthCheck,
+ "by-position default fill must apply the CHAR/VARCHAR write-side length check")
+ }
}