This is an automated email from the ASF dual-hosted git repository.
gengliang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new 53606f21eb1a [SPARK-51695][SQL][DOCS][FOLLOW-UP] Clarify NULL handling
semantics in UNIQUE constraint Javadoc
53606f21eb1a is described below
commit 53606f21eb1a4dd47be15a2fc353f1dffa23c58d
Author: Yan Yan <[email protected]>
AuthorDate: Wed Feb 25 17:04:24 2026 -0800
[SPARK-51695][SQL][DOCS][FOLLOW-UP] Clarify NULL handling semantics in
UNIQUE constraint Javadoc
### What changes were proposed in this pull request?
Document NULLS DISTINCT behavior: NULL values are treated as distinct from
each other, so rows with NULLs in unique columns never conflict. Also note that
UNIQUE allows nullable columns (unlike PRIMARY KEY) and that NULLS NOT DISTINCT
is not currently supported.
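The NULLS DISTINCT rule described above can be illustrated with a small standalone check. This is only a sketch and not part of the patch or of Spark, which does not enforce UNIQUE constraints; the class `UniqueNullsDistinctSketch` and its `row` and `satisfiesUnique` helpers are hypothetical names introduced for illustration, and each row is assumed to hold only the values of the unique columns.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative only: hypothetical helper, not a Spark API.
class UniqueNullsDistinctSketch {

  // Builds one row from the values of its unique-key columns (nulls allowed).
  static List<Object> row(Object... values) {
    return Arrays.asList(values);
  }

  // Returns true if no two rows share the same fully non-null key,
  // i.e. the rows satisfy UNIQUE under NULLS DISTINCT semantics.
  static boolean satisfiesUnique(List<List<Object>> rows) {
    Set<List<Object>> seenKeys = new HashSet<>();
    for (List<Object> key : rows) {
      // A key with a NULL in any column never conflicts with another row.
      if (key.contains(null)) {
        continue;
      }
      // Set.add returns false when an equal key was already recorded.
      if (!seenKeys.add(key)) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    // Two rows repeat (1, NULL): allowed, because NULLs are distinct.
    System.out.println(satisfiesUnique(Arrays.asList(
        row(1, "a"), row(1, null), row(1, null))));
    // Two rows repeat the fully non-null key (1, "a"): a violation.
    System.out.println(satisfiesUnique(Arrays.asList(
        row(1, "a"), row(1, "a"))));
  }
}
```

Under these assumptions the first call reports that the constraint holds and the second reports a violation, matching the rule that only fully non-null, fully matching keys count as duplicates.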
### Why are the changes needed?
To make the Javadoc clearer about the expected behavior of UNIQUE constraints.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
N/A
### Was this patch authored or co-authored using generative AI tooling?
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Closes #54357 from yyanyy/unique_definition_clarify.
Authored-by: Yan Yan <[email protected]>
Signed-off-by: Gengliang Wang <[email protected]>
---
.../apache/spark/sql/connector/catalog/constraints/Unique.java | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/constraints/Unique.java b/sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/constraints/Unique.java
index d983ef656297..d4837932863f 100644
--- a/sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/constraints/Unique.java
+++ b/sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/constraints/Unique.java
@@ -28,6 +28,15 @@ import org.apache.spark.sql.connector.expressions.NamedReference;
* <p>
* A UNIQUE constraint specifies one or more columns as unique columns. Such constraint is satisfied
* if and only if no two rows in a table have the same non-null values in the unique columns.
+ * Unlike PRIMARY KEY, UNIQUE allows nullable columns.
+ * <p>
+ * NULL values are treated as distinct from each other (NULLS DISTINCT semantics). Two rows
+ * are considered duplicates only when every column in the unique key has a non-null value and
+ * every value matches. If any column in the unique key is NULL, the row is always considered
+ * unique regardless of other values. In other words, multiple rows with NULL in one or more
+ * unique columns are allowed and do not violate the constraint definition.
+ * <p>
+ * The {@code NULLS NOT DISTINCT} modifier is not currently supported.
* <p>
* Spark doesn't enforce UNIQUE constraints but leverages them for query optimization. Each
* constraint is either valid (the existing data is guaranteed to satisfy the constraint), invalid