This is an automated email from the ASF dual-hosted git repository.
gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
     new f0285c489afa [SPARK-55636][CONNECT] Add detailed errors in case of deduplication of invalid columns
f0285c489afa is described below
commit f0285c489afa7a0fe03a9b19d582b19406da94c1
Author: pranavdev022 <[email protected]>
AuthorDate: Wed Feb 25 07:16:40 2026 +0900
[SPARK-55636][CONNECT] Add detailed errors in case of deduplication of invalid columns
### What changes were proposed in this pull request?
This PR updates the error handling for invalid deduplicate column names in
Spark Connect to raise the standard `UNRESOLVED_COLUMN_AMONG_FIELD_NAMES` error
class instead of a generic `INTERNAL_ERROR`.
Example | Classic | Connect (Before) | Connect (After)
-- | -- | -- | --
<img width="497" height="264" alt="image" src="https://github.com/user-attachments/assets/15f9327c-b119-4e00-bcc8-9a85cb477413" /> | Cannot resolve column name "artist_id" among (id, song_name, artist_name). | [[INTERNAL_ERROR](https://docs.databricks.com/error-messages/error-classes.html#internal_error)] Invalid deduplicate column artist_id SQLSTATE: XX000 | [[UNRESOLVED_COLUMN_AMONG_FIELD_NAMES](https://docs.databricks.com/error-messages/error-classes.html#unresolved_column_among_field_names)] [...]
<img width="618" height="344" alt="image" src="https://github.com/user-attachments/assets/dd5e9c63-59e6-4fd2-997a-b4d1387dbdca" /> | Cannot resolve column name "cont.f1" among (id, cont). | [[INTERNAL_ERROR](https://docs.databricks.com/error-messages/error-classes.html#internal_error)] Invalid deduplicate column cont.f1 SQLSTATE: XX000 | [[UNRESOLVED_COLUMN_AMONG_FIELD_NAMES](https://docs.databricks.com/error-messages/error-classes.html#unresolved_column_among_field_names)] Cannot res [...]
<img width="462" height="204" alt="image" src="https://github.com/user-attachments/assets/1ce2c96c-e1f6-4dac-8dfa-be5f7807f563" /> | works | works | works
<img width="526" height="248" alt="image" src="https://github.com/user-attachments/assets/be55ca70-527f-4d0b-87e8-bdbb0a66d88c" /> | Cannot resolve column name "song.names" among (id, song.name, artist_name). | [[INTERNAL_ERROR](https://docs.databricks.com/error-messages/error-classes.html#internal_error)] Invalid deduplicate column song.names SQLSTATE: XX000 | [[UNRESOLVED_COLUMN_AMONG_FIELD_NAMES](https://docs.databricks.com/error-messages/error-classes.html#unresolved_column_among_field_names)] [...]
<img width="630" height="323" alt="image" src="https://github.com/user-attachments/assets/e48d1bee-e7a1-41ac-b1cd-cb55b5439065" /> | works | works | works
<img width="625" height="333" alt="image" src="https://github.com/user-attachments/assets/17823344-f63d-4de3-9d7e-380906092e33" /> | Cannot resolve column name "cont.value" among (id, cont.val). | [[INTERNAL_ERROR](https://docs.databricks.com/error-messages/error-classes.html#internal_error)] Invalid deduplicate column cont.value SQLSTATE: XX000 | [[UNRESOLVED_COLUMN_AMONG_FIELD_NAMES](https://docs.databricks.com/error-messages/error-classes.html#unresolved_column_among_field_names)] [...]
<img width="665" height="331" alt="image" src="https://github.com/user-attachments/assets/10a82231-603e-4487-bf9a-408875d6e15d" /> | <img width="167" height="117" alt="image" src="https://github.com/user-attachments/assets/acb15971-dae6-4a9a-ac9d-79192504f90f" /> | same | same
### Why are the changes needed?
The previous error message in Spark Connect was not consistent with classic
Spark and lacked helpful context.
This change aligns Spark Connect error messages with classic Spark,
providing users with:
1. The correct error class (`UNRESOLVED_COLUMN_AMONG_FIELD_NAMES` instead
of `INTERNAL_ERROR`).
2. The correct SQLSTATE (42703 instead of XX000).
3. A list of available column names to help users fix the issue.
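To make the difference concrete, here is a small illustrative sketch (plain Python, not Spark's actual implementation; the message shape is inferred from the examples in the table above, and `unresolved_column_error` is a hypothetical helper) of the error users now see:

```python
def unresolved_column_error(col_name: str, field_names: str) -> str:
    """Illustrative only: mimics the shape of Spark's
    UNRESOLVED_COLUMN_AMONG_FIELD_NAMES message as shown in this PR's
    examples. This is NOT Spark's actual error-formatting code."""
    return (
        "[UNRESOLVED_COLUMN_AMONG_FIELD_NAMES] "
        f'Cannot resolve column name "{col_name}" among ({field_names}). '
        "SQLSTATE: 42703"
    )

# Before this change, Spark Connect surfaced only a bare internal error:
# "[INTERNAL_ERROR] Invalid deduplicate column artist_id SQLSTATE: XX000"
print(unresolved_column_error("artist_id", "id, song_name, artist_name"))
```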
### Does this PR introduce _any_ user-facing change?
Yes. Error messages for invalid deduplicate column names in Spark Connect
are now more detailed and consistent with classic Spark.
### How was this patch tested?
Manually tested with a custom image that includes the proposed changes.
### Was this patch authored or co-authored using generative AI tooling?
No
Closes #54422 from pranavdev022/dedup-errors-connect.
Authored-by: pranavdev022 <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
---
.../org/apache/spark/sql/connect/planner/InvalidInputErrors.scala | 6 ++++--
.../org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala | 3 ++-
2 files changed, 6 insertions(+), 3 deletions(-)
diff --git a/sql/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/InvalidInputErrors.scala b/sql/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/InvalidInputErrors.scala
index 81c001ed839f..f4a6913d1eab 100644
--- a/sql/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/InvalidInputErrors.scala
+++ b/sql/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/InvalidInputErrors.scala
@@ -54,8 +54,10 @@ object InvalidInputErrors {
     InvalidPlanInput(
       "Deduplicate requires to either deduplicate on all columns or a subset of columns")

-  def invalidDeduplicateColumn(colName: String): InvalidPlanInput =
-    InvalidPlanInput(s"Invalid deduplicate column $colName")
+  def invalidDeduplicateColumn(colName: String, fieldNames: String): InvalidPlanInput =
+    InvalidPlanInput(
+      "UNRESOLVED_COLUMN_AMONG_FIELD_NAMES",
+      Map("colName" -> colName, "fieldNames" -> fieldNames))

   def functionEvalTypeNotSupported(evalType: Int): InvalidPlanInput =
     InvalidPlanInput(s"Function with EvalType: $evalType is not supported")
diff --git a/sql/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala b/sql/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala
index 611e19b01b20..ee8180d5e6f8 100644
--- a/sql/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala
+++ b/sql/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala
@@ -1445,7 +1445,8 @@ class SparkConnectPlanner(
         // so we call filter instead of find.
         val cols = allColumns.filter(col => resolver(col.name, colName))
         if (cols.isEmpty) {
-          throw InvalidInputErrors.invalidDeduplicateColumn(colName)
+          val fieldNames = allColumns.map(_.name).mkString(", ")
+          throw InvalidInputErrors.invalidDeduplicateColumn(colName, fieldNames)
         }
         cols
       }
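The control flow of the planner hunk above can be sketched as follows (plain Python, purely illustrative, not Spark's actual code): because the relation may contain duplicate column names, every match is kept with a filter rather than stopping at the first one, and the error with the available field names is raised only when nothing matches. `resolve_dedup_columns` is a hypothetical helper written for this sketch.

```python
def resolve_dedup_columns(all_columns, col_name, case_sensitive=False):
    """Illustrative sketch of SparkConnectPlanner's deduplicate-column
    resolution; NOT Spark's actual code. Duplicate column names are
    possible, so we keep every match (filter) rather than the first (find).
    """
    if case_sensitive:
        resolver = lambda a, b: a == b
    else:
        resolver = lambda a, b: a.lower() == b.lower()

    cols = [c for c in all_columns if resolver(c, col_name)]
    if not cols:
        # With this PR, the error now carries the candidate field names.
        field_names = ", ".join(all_columns)
        raise ValueError(
            "[UNRESOLVED_COLUMN_AMONG_FIELD_NAMES] "
            f'Cannot resolve column name "{col_name}" among ({field_names}). '
            "SQLSTATE: 42703"
        )
    return cols
```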
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]