This is an automated email from the ASF dual-hosted git repository.
wenchen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new d1aa7d6c4427 [SPARK-54713][SQL] Add vector similarity/distance function expressions support
d1aa7d6c4427 is described below
commit d1aa7d6c4427be8444450af1c1894704575dccb2
Author: zhidongqu-db <[email protected]>
AuthorDate: Wed Jan 7 16:03:47 2026 +0800
[SPARK-54713][SQL] Add vector similarity/distance function expressions support
### What changes were proposed in this pull request?
This PR adds support for vector distance/similarity functions to Spark SQL,
enabling efficient computation of common distance/similarity metrics on
embedding vectors.
Similarity Functions
- vector_cosine_similarity(vector1, vector2) - Returns the cosine
similarity between two vectors (range: -1.0 to 1.0)
- vector_inner_product(vector1, vector2) - Returns the inner product (dot
product) between two vectors
Distance Functions
- vector_l2_distance(vector1, vector2) - Returns the Euclidean (L2)
distance between two vectors
Key implementation details:
- Type Safety: All functions accept only ARRAY<FLOAT> inputs and return
FLOAT. No implicit type casting is performed - passing ARRAY<DOUBLE> or
ARRAY<INT> results in a DATATYPE_MISMATCH.UNEXPECTED_INPUT_TYPE error.
- Dimension Validation: Vectors must have matching dimensions; a mismatch
throws a VECTOR_DIMENSION_MISMATCH error with clear messaging.
- NULL Handling: NULL array inputs return NULL. Arrays containing NULL
elements also return NULL.
- Edge Cases: Empty vectors return NULL for cosine similarity (undefined) and
0.0 for inner product and L2 distance. Zero-magnitude vectors return NULL for
cosine similarity. See the examples after this list.
- Code Generation: Optimized codegen with manual loop unrolling (8 elements
at a time) to enable potential JIT SIMD vectorization. This can be evolved to
use Java Vector API in the future.
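For illustration, a few queries exercising the behaviors described above (a
sketch; the exact results are captured in the golden files added by this PR):
```
-- Dimension mismatch: throws VECTOR_DIMENSION_MISMATCH at runtime
SELECT vector_l2_distance(array(1.0F, 2.0F, 3.0F), array(1.0F, 2.0F));
-- NULL element inside an input array: returns NULL
SELECT vector_inner_product(array(1.0F, CAST(NULL AS FLOAT), 3.0F), array(1.0F, 2.0F, 3.0F));
-- Zero-magnitude vector: cosine similarity is undefined, returns NULL
SELECT vector_cosine_similarity(array(0.0F, 0.0F, 0.0F), array(1.0F, 2.0F, 3.0F));
```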
This PR only includes SQL language support; the DataFrame API needs to be
added in a separate PR.
### Why are the changes needed?
Vector distance/similarity functions are fundamental operations for:
- Similarity search: Finding similar items in recommendation systems
- Clustering: K-means and other distance-based clustering algorithms
- Categorization: Categorizing text/image to labels based on embeddings
These functions are commonly available in other OLAP/OLTP systems
(BigQuery, PostgreSQL pgvector, DuckDB...) and are essential for modern ML/AI
workloads in Spark SQL.
### Does this PR introduce _any_ user-facing change?
Yes, this PR introduces 3 new SQL functions:
```
-- Cosine similarity (1.0 = identical, 0 = orthogonal, -1.0 = opposite)
SELECT vector_cosine_similarity(array(1.0F, 2.0F, 3.0F),
array(4.0F, 5.0F, 6.0F));
-- Returns: 0.97463185
-- Inner product (dot product)
SELECT vector_inner_product(array(1.0F, 2.0F, 3.0F), array(4.0F, 5.0F,
6.0F));
-- Returns: 32.0
-- L2 (Euclidean) distance
SELECT vector_l2_distance(array(1.0F, 2.0F, 3.0F), array(4.0F, 5.0F,
6.0F));
-- Returns: 5.196152
```
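As a sanity check on the sample results above: the dot product of the two
vectors is 1*4 + 2*5 + 3*6 = 32, the norms are sqrt(14) and sqrt(77), so the
cosine similarity is 32 / sqrt(14 * 77) ≈ 0.974632, and the L2 distance is
sqrt(3^2 + 3^2 + 3^2) = sqrt(27) ≈ 5.196152.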
### How was this patch tested?
SQL Golden File Tests: Added vector-distance.sql with test coverage:
- Basic functionality tests for all three functions
- Mathematical correctness validation (identical vectors, orthogonal
vectors, known values)
- Empty vector handling
- Zero-magnitude vector handling
- NULL array input handling
- NULL element within array handling
- Dimension mismatch error cases
- Type mismatch error cases (ARRAY<DOUBLE>, ARRAY<INT>)
- Large vector tests (16 elements to exercise unrolled loop path)
Expression Schema Tests: Updated sql-expression-schema.md via
ExpressionsSchemaSuite.
### Was this patch authored or co-authored using generative AI tooling?
Yes, code assistance with Claude Opus 4.5 in combination with manual editing
by the author.
Closes #53481 from zhidongqu-db/distance-functions.
Authored-by: zhidongqu-db <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
---
.../src/main/resources/error/error-conditions.json | 6 +
.../scala/org/apache/spark/sql/functions.scala | 1 +
.../sql/catalyst/expressions/ExpressionInfo.java | 3 +-
.../expressions/VectorFunctionImplUtils.java | 247 +++++++
.../sql/catalyst/analysis/FunctionRegistry.scala | 5 +
.../catalyst/expressions/vectorExpressions.scala | 198 ++++++
.../spark/sql/errors/QueryExecutionErrors.scala | 12 +
.../sql-functions/sql-expression-schema.md | 3 +
.../analyzer-results/vector-distance.sql.out | 656 ++++++++++++++++++
.../resources/sql-tests/inputs/vector-distance.sql | 132 ++++
.../sql-tests/results/vector-distance.sql.out | 746 +++++++++++++++++++++
.../sql/expressions/ExpressionInfoSuite.scala | 2 +-
12 files changed, 2009 insertions(+), 2 deletions(-)
diff --git a/common/utils/src/main/resources/error/error-conditions.json
b/common/utils/src/main/resources/error/error-conditions.json
index b101782e28c2..18ef49f01bef 100644
--- a/common/utils/src/main/resources/error/error-conditions.json
+++ b/common/utils/src/main/resources/error/error-conditions.json
@@ -7483,6 +7483,12 @@
],
"sqlState" : "22023"
},
+ "VECTOR_DIMENSION_MISMATCH" : {
+ "message" : [
+ "Vectors passed to <functionName> must have the same dimension, but got
<leftDim> and <rightDim>."
+ ],
+ "sqlState" : "22000"
+ },
"VIEW_ALREADY_EXISTS" : {
"message" : [
"Cannot create view <relationName> because it already exists.",
diff --git a/sql/api/src/main/scala/org/apache/spark/sql/functions.scala
b/sql/api/src/main/scala/org/apache/spark/sql/functions.scala
index 78675364d841..c4eaca3ac7a6 100644
--- a/sql/api/src/main/scala/org/apache/spark/sql/functions.scala
+++ b/sql/api/src/main/scala/org/apache/spark/sql/functions.scala
@@ -75,6 +75,7 @@ import org.apache.spark.util.SparkClassUtils
* @groupname csv_funcs CSV functions
* @groupname json_funcs JSON functions
* @groupname variant_funcs VARIANT functions
+ * @groupname vector_funcs Vector functions
* @groupname xml_funcs XML functions
* @groupname url_funcs URL functions
* @groupname partition_transforms Partition transform functions
diff --git
a/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/ExpressionInfo.java
b/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/ExpressionInfo.java
index dd56c650c073..592230cc5b10 100644
---
a/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/ExpressionInfo.java
+++
b/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/ExpressionInfo.java
@@ -48,7 +48,8 @@ public class ExpressionInfo {
"collection_funcs", "predicate_funcs", "conditional_funcs",
"conversion_funcs",
"csv_funcs", "datetime_funcs", "generator_funcs", "hash_funcs",
"json_funcs",
"lambda_funcs", "map_funcs", "math_funcs", "misc_funcs",
"string_funcs", "struct_funcs",
- "window_funcs", "xml_funcs", "table_funcs", "url_funcs",
"variant_funcs", "st_funcs"));
+ "window_funcs", "xml_funcs", "table_funcs", "url_funcs",
"variant_funcs",
+ "vector_funcs", "st_funcs"));
private static final Set<String> validSources =
new HashSet<>(Arrays.asList("built-in", "hive", "python_udf",
"scala_udf", "sql_udf",
diff --git
a/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/VectorFunctionImplUtils.java
b/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/VectorFunctionImplUtils.java
new file mode 100644
index 000000000000..59811298360f
--- /dev/null
+++
b/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/VectorFunctionImplUtils.java
@@ -0,0 +1,247 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions;
+
+import org.apache.spark.sql.catalyst.util.ArrayData;
+import org.apache.spark.sql.errors.QueryExecutionErrors;
+import org.apache.spark.unsafe.types.UTF8String;
+
+/**
+ * A utility class for vector similarity/distance function implementations.
+ */
+public class VectorFunctionImplUtils {
+
+ /**
+ * Computes the cosine similarity between two float vectors.
+ * Returns NULL if either vector contains NULL elements, has zero magnitude,
or is empty.
+ * Throws an exception if vectors have different dimensions.
+ * Uses manual loop unrolling (8 elements at a time) for speculative SIMD
optimization.
+ */
+ public static Float vectorCosineSimilarity(ArrayData left, ArrayData right,
UTF8String funcName) {
+ int leftLen = left.numElements();
+ int rightLen = right.numElements();
+
+ if (leftLen != rightLen) {
+ throw QueryExecutionErrors.vectorDimensionMismatchError(
+ funcName.toString(), leftLen, rightLen);
+ }
+
+ if (leftLen == 0) {
+ return null;
+ }
+
+ double dotProduct = 0.0;
+ double norm1Sq = 0.0;
+ double norm2Sq = 0.0;
+
+ int i = 0;
+ int simdLimit = (leftLen / 8) * 8;
+
+ // Manual unroll loop - process 8 floats at a time for speculative SIMD
optimization
+ while (i < simdLimit) {
+ // Check for nulls in batch
+ if (left.isNullAt(i) || left.isNullAt(i + 1) ||
+ left.isNullAt(i + 2) || left.isNullAt(i + 3) ||
+ left.isNullAt(i + 4) || left.isNullAt(i + 5) ||
+ left.isNullAt(i + 6) || left.isNullAt(i + 7) ||
+ right.isNullAt(i) || right.isNullAt(i + 1) ||
+ right.isNullAt(i + 2) || right.isNullAt(i + 3) ||
+ right.isNullAt(i + 4) || right.isNullAt(i + 5) ||
+ right.isNullAt(i + 6) || right.isNullAt(i + 7)) {
+ return null;
+ }
+
+ float a0 = left.getFloat(i), a1 = left.getFloat(i + 1);
+ float a2 = left.getFloat(i + 2), a3 = left.getFloat(i + 3);
+ float a4 = left.getFloat(i + 4), a5 = left.getFloat(i + 5);
+ float a6 = left.getFloat(i + 6), a7 = left.getFloat(i + 7);
+
+ float b0 = right.getFloat(i), b1 = right.getFloat(i + 1);
+ float b2 = right.getFloat(i + 2), b3 = right.getFloat(i + 3);
+ float b4 = right.getFloat(i + 4), b5 = right.getFloat(i + 5);
+ float b6 = right.getFloat(i + 6), b7 = right.getFloat(i + 7);
+
+ dotProduct += (double) (a0 * b0 + a1 * b1 + a2 * b2 + a3 * b3 +
+ a4 * b4 + a5 * b5 + a6 * b6 + a7 * b7);
+ norm1Sq += (double) (a0 * a0 + a1 * a1 + a2 * a2 + a3 * a3 +
+ a4 * a4 + a5 * a5 + a6 * a6 + a7 * a7);
+ norm2Sq += (double) (b0 * b0 + b1 * b1 + b2 * b2 + b3 * b3 +
+ b4 * b4 + b5 * b5 + b6 * b6 + b7 * b7);
+ i += 8;
+ }
+
+ // Handle remaining elements
+ while (i < leftLen) {
+ if (left.isNullAt(i) || right.isNullAt(i)) {
+ return null;
+ }
+ float a = left.getFloat(i);
+ float b = right.getFloat(i);
+ dotProduct += (double) (a * b);
+ norm1Sq += (double) (a * a);
+ norm2Sq += (double) (b * b);
+ i++;
+ }
+
+ double normProduct = Math.sqrt(norm1Sq * norm2Sq);
+ if (normProduct == 0.0) {
+ return null;
+ }
+ return (float) (dotProduct / normProduct);
+ }
+
+ /**
+ * Computes the inner product (dot product) between two float vectors.
+ * Returns NULL if either vector contains NULL elements.
+ * Returns 0.0 for empty vectors.
+ * Throws an exception if vectors have different dimensions.
+ * Uses manual loop unrolling (8 elements at a time) for speculative SIMD
optimization.
+ */
+ public static Float vectorInnerProduct(ArrayData left, ArrayData right,
UTF8String funcName) {
+ int leftLen = left.numElements();
+ int rightLen = right.numElements();
+
+ if (leftLen != rightLen) {
+ throw QueryExecutionErrors.vectorDimensionMismatchError(
+ funcName.toString(), leftLen, rightLen);
+ }
+
+ if (leftLen == 0) {
+ return 0.0f;
+ }
+
+ double dotProduct = 0.0;
+
+ int i = 0;
+ int simdLimit = (leftLen / 8) * 8;
+
+ // Manual unroll loop - process 8 floats at a time for speculative SIMD
optimization
+ while (i < simdLimit) {
+ // Check for nulls in batch
+ if (left.isNullAt(i) || left.isNullAt(i + 1) ||
+ left.isNullAt(i + 2) || left.isNullAt(i + 3) ||
+ left.isNullAt(i + 4) || left.isNullAt(i + 5) ||
+ left.isNullAt(i + 6) || left.isNullAt(i + 7) ||
+ right.isNullAt(i) || right.isNullAt(i + 1) ||
+ right.isNullAt(i + 2) || right.isNullAt(i + 3) ||
+ right.isNullAt(i + 4) || right.isNullAt(i + 5) ||
+ right.isNullAt(i + 6) || right.isNullAt(i + 7)) {
+ return null;
+ }
+
+ float a0 = left.getFloat(i), a1 = left.getFloat(i + 1);
+ float a2 = left.getFloat(i + 2), a3 = left.getFloat(i + 3);
+ float a4 = left.getFloat(i + 4), a5 = left.getFloat(i + 5);
+ float a6 = left.getFloat(i + 6), a7 = left.getFloat(i + 7);
+
+ float b0 = right.getFloat(i), b1 = right.getFloat(i + 1);
+ float b2 = right.getFloat(i + 2), b3 = right.getFloat(i + 3);
+ float b4 = right.getFloat(i + 4), b5 = right.getFloat(i + 5);
+ float b6 = right.getFloat(i + 6), b7 = right.getFloat(i + 7);
+
+ dotProduct += (double) (a0 * b0 + a1 * b1 + a2 * b2 + a3 * b3 +
+ a4 * b4 + a5 * b5 + a6 * b6 + a7 * b7);
+ i += 8;
+ }
+
+ // Handle remaining elements
+ while (i < leftLen) {
+ if (left.isNullAt(i) || right.isNullAt(i)) {
+ return null;
+ }
+ float a = left.getFloat(i);
+ float b = right.getFloat(i);
+ dotProduct += (double) (a * b);
+ i++;
+ }
+
+ return (float) dotProduct;
+ }
+
+ /**
+ * Computes the Euclidean (L2) distance between two float vectors.
+ * Returns NULL if either vector contains NULL elements.
+ * Returns 0.0 for empty vectors.
+ * Throws an exception if vectors have different dimensions.
+ * Uses manual loop unrolling (8 elements at a time) for speculative SIMD
optimization.
+ */
+ public static Float vectorL2Distance(ArrayData left, ArrayData right,
UTF8String funcName) {
+ int leftLen = left.numElements();
+ int rightLen = right.numElements();
+
+ if (leftLen != rightLen) {
+ throw QueryExecutionErrors.vectorDimensionMismatchError(
+ funcName.toString(), leftLen, rightLen);
+ }
+
+ if (leftLen == 0) {
+ return 0.0f;
+ }
+
+ double sumSq = 0.0;
+
+ int i = 0;
+ int simdLimit = (leftLen / 8) * 8;
+
+ // Manual unroll loop - process 8 floats at a time for speculative SIMD
optimization
+ while (i < simdLimit) {
+ // Check for nulls in batch
+ if (left.isNullAt(i) || left.isNullAt(i + 1) ||
+ left.isNullAt(i + 2) || left.isNullAt(i + 3) ||
+ left.isNullAt(i + 4) || left.isNullAt(i + 5) ||
+ left.isNullAt(i + 6) || left.isNullAt(i + 7) ||
+ right.isNullAt(i) || right.isNullAt(i + 1) ||
+ right.isNullAt(i + 2) || right.isNullAt(i + 3) ||
+ right.isNullAt(i + 4) || right.isNullAt(i + 5) ||
+ right.isNullAt(i + 6) || right.isNullAt(i + 7)) {
+ return null;
+ }
+
+ float a0 = left.getFloat(i), a1 = left.getFloat(i + 1);
+ float a2 = left.getFloat(i + 2), a3 = left.getFloat(i + 3);
+ float a4 = left.getFloat(i + 4), a5 = left.getFloat(i + 5);
+ float a6 = left.getFloat(i + 6), a7 = left.getFloat(i + 7);
+
+ float b0 = right.getFloat(i), b1 = right.getFloat(i + 1);
+ float b2 = right.getFloat(i + 2), b3 = right.getFloat(i + 3);
+ float b4 = right.getFloat(i + 4), b5 = right.getFloat(i + 5);
+ float b6 = right.getFloat(i + 6), b7 = right.getFloat(i + 7);
+
+ float d0 = a0 - b0, d1 = a1 - b1, d2 = a2 - b2, d3 = a3 - b3;
+ float d4 = a4 - b4, d5 = a5 - b5, d6 = a6 - b6, d7 = a7 - b7;
+
+ sumSq += (double) (d0 * d0 + d1 * d1 + d2 * d2 + d3 * d3 +
+ d4 * d4 + d5 * d5 + d6 * d6 + d7 * d7);
+ i += 8;
+ }
+
+ // Handle remaining elements
+ while (i < leftLen) {
+ if (left.isNullAt(i) || right.isNullAt(i)) {
+ return null;
+ }
+ float a = left.getFloat(i);
+ float b = right.getFloat(i);
+ float diff = a - b;
+ sumSq += (double) (diff * diff);
+ i++;
+ }
+
+ return (float) Math.sqrt(sumSq);
+ }
+}
diff --git
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala
index cc8ac07126cd..4faac1a8b6df 100644
---
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala
+++
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala
@@ -553,6 +553,11 @@ object FunctionRegistry {
expression[KllSketchGetRankFloat]("kll_sketch_get_rank_float"),
expression[KllSketchGetRankDouble]("kll_sketch_get_rank_double"),
+ // vector functions
+ expression[VectorCosineSimilarity]("vector_cosine_similarity"),
+ expression[VectorInnerProduct]("vector_inner_product"),
+ expression[VectorL2Distance]("vector_l2_distance"),
+
// string functions
expression[Ascii]("ascii"),
expression[Chr]("char", true),
diff --git
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/vectorExpressions.scala
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/vectorExpressions.scala
new file mode 100644
index 000000000000..0b658eb7d5ea
--- /dev/null
+++
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/vectorExpressions.scala
@@ -0,0 +1,198 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions
+
+import org.apache.spark.sql.catalyst.analysis.TypeCheckResult
+import org.apache.spark.sql.catalyst.analysis.TypeCheckResult.DataTypeMismatch
+import org.apache.spark.sql.catalyst.expressions.objects.StaticInvoke
+import org.apache.spark.sql.errors.QueryErrorsBase
+import org.apache.spark.sql.types.{ArrayType, FloatType, StringType}
+
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+ usage = """
+ _FUNC_(array1, array2) - Returns the cosine similarity between two float
vectors.
+ The vectors must have the same dimension.
+ """,
+ examples = """
+ Examples:
+ > SELECT _FUNC_(array(1.0F, 2.0F, 3.0F), array(4.0F, 5.0F, 6.0F));
+ 0.97463185
+ """,
+ since = "4.2.0",
+ group = "vector_funcs"
+)
+// scalastyle:on line.size.limit
+case class VectorCosineSimilarity(left: Expression, right: Expression)
+ extends RuntimeReplaceable with QueryErrorsBase {
+
+ override def checkInputDataTypes(): TypeCheckResult = {
+ (left.dataType, right.dataType) match {
+ case (ArrayType(FloatType, _), ArrayType(FloatType, _)) =>
+ TypeCheckResult.TypeCheckSuccess
+ case (ArrayType(FloatType, _), _) =>
+ DataTypeMismatch(
+ errorSubClass = "UNEXPECTED_INPUT_TYPE",
+ messageParameters = Map(
+ "paramIndex" -> ordinalNumber(1),
+ "requiredType" -> toSQLType(ArrayType(FloatType)),
+ "inputSql" -> toSQLExpr(right),
+ "inputType" -> toSQLType(right.dataType)))
+ case _ =>
+ DataTypeMismatch(
+ errorSubClass = "UNEXPECTED_INPUT_TYPE",
+ messageParameters = Map(
+ "paramIndex" -> ordinalNumber(0),
+ "requiredType" -> toSQLType(ArrayType(FloatType)),
+ "inputSql" -> toSQLExpr(left),
+ "inputType" -> toSQLType(left.dataType)))
+ }
+ }
+
+ override lazy val replacement: Expression = StaticInvoke(
+ classOf[VectorFunctionImplUtils],
+ FloatType,
+ "vectorCosineSimilarity",
+ Seq(left, right, Literal(prettyName)),
+ Seq(ArrayType(FloatType), ArrayType(FloatType), StringType))
+
+ override def prettyName: String = "vector_cosine_similarity"
+
+ override def children: Seq[Expression] = Seq(left, right)
+
+ override protected def withNewChildrenInternal(
+ newChildren: IndexedSeq[Expression]): VectorCosineSimilarity = {
+ copy(left = newChildren(0), right = newChildren(1))
+ }
+}
+
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+ usage = """
+ _FUNC_(array1, array2) - Returns the inner product (dot product) between
two float vectors.
+ The vectors must have the same dimension.
+ """,
+ examples = """
+ Examples:
+ > SELECT _FUNC_(array(1.0F, 2.0F, 3.0F), array(4.0F, 5.0F, 6.0F));
+ 32.0
+ """,
+ since = "4.2.0",
+ group = "vector_funcs"
+)
+// scalastyle:on line.size.limit
+case class VectorInnerProduct(left: Expression, right: Expression)
+ extends RuntimeReplaceable with QueryErrorsBase {
+
+ override def checkInputDataTypes(): TypeCheckResult = {
+ (left.dataType, right.dataType) match {
+ case (ArrayType(FloatType, _), ArrayType(FloatType, _)) =>
+ TypeCheckResult.TypeCheckSuccess
+ case (ArrayType(FloatType, _), _) =>
+ DataTypeMismatch(
+ errorSubClass = "UNEXPECTED_INPUT_TYPE",
+ messageParameters = Map(
+ "paramIndex" -> ordinalNumber(1),
+ "requiredType" -> toSQLType(ArrayType(FloatType)),
+ "inputSql" -> toSQLExpr(right),
+ "inputType" -> toSQLType(right.dataType)))
+ case _ =>
+ DataTypeMismatch(
+ errorSubClass = "UNEXPECTED_INPUT_TYPE",
+ messageParameters = Map(
+ "paramIndex" -> ordinalNumber(0),
+ "requiredType" -> toSQLType(ArrayType(FloatType)),
+ "inputSql" -> toSQLExpr(left),
+ "inputType" -> toSQLType(left.dataType)))
+ }
+ }
+
+ override lazy val replacement: Expression = StaticInvoke(
+ classOf[VectorFunctionImplUtils],
+ FloatType,
+ "vectorInnerProduct",
+ Seq(left, right, Literal(prettyName)),
+ Seq(ArrayType(FloatType), ArrayType(FloatType), StringType))
+
+ override def prettyName: String = "vector_inner_product"
+
+ override def children: Seq[Expression] = Seq(left, right)
+
+ override protected def withNewChildrenInternal(
+ newChildren: IndexedSeq[Expression]): VectorInnerProduct = {
+ copy(left = newChildren(0), right = newChildren(1))
+ }
+}
+
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+ usage = """
+ _FUNC_(array1, array2) - Returns the Euclidean (L2) distance between two
float vectors.
+ The vectors must have the same dimension.
+ """,
+ examples = """
+ Examples:
+ > SELECT _FUNC_(array(1.0F, 2.0F, 3.0F), array(4.0F, 5.0F, 6.0F));
+ 5.196152
+ """,
+ since = "4.2.0",
+ group = "vector_funcs"
+)
+// scalastyle:on line.size.limit
+case class VectorL2Distance(left: Expression, right: Expression)
+ extends RuntimeReplaceable with QueryErrorsBase {
+
+ override def checkInputDataTypes(): TypeCheckResult = {
+ (left.dataType, right.dataType) match {
+ case (ArrayType(FloatType, _), ArrayType(FloatType, _)) =>
+ TypeCheckResult.TypeCheckSuccess
+ case (ArrayType(FloatType, _), _) =>
+ DataTypeMismatch(
+ errorSubClass = "UNEXPECTED_INPUT_TYPE",
+ messageParameters = Map(
+ "paramIndex" -> ordinalNumber(1),
+ "requiredType" -> toSQLType(ArrayType(FloatType)),
+ "inputSql" -> toSQLExpr(right),
+ "inputType" -> toSQLType(right.dataType)))
+ case _ =>
+ DataTypeMismatch(
+ errorSubClass = "UNEXPECTED_INPUT_TYPE",
+ messageParameters = Map(
+ "paramIndex" -> ordinalNumber(0),
+ "requiredType" -> toSQLType(ArrayType(FloatType)),
+ "inputSql" -> toSQLExpr(left),
+ "inputType" -> toSQLType(left.dataType)))
+ }
+ }
+
+ override lazy val replacement: Expression = StaticInvoke(
+ classOf[VectorFunctionImplUtils],
+ FloatType,
+ "vectorL2Distance",
+ Seq(left, right, Literal(prettyName)),
+ Seq(ArrayType(FloatType), ArrayType(FloatType), StringType))
+
+ override def prettyName: String = "vector_l2_distance"
+
+ override def children: Seq[Expression] = Seq(left, right)
+
+ override protected def withNewChildrenInternal(
+ newChildren: IndexedSeq[Expression]): VectorL2Distance = {
+ copy(left = newChildren(0), right = newChildren(1))
+ }
+}
diff --git
a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala
b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala
index 13b948391622..0bde732ea5d3 100644
---
a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala
+++
b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala
@@ -3222,4 +3222,16 @@ private[sql] object QueryExecutionErrors extends
QueryErrorsBase with ExecutionE
"functionName" -> toSQLId(function),
"k" -> toSQLValue(k, IntegerType)))
}
+
+ def vectorDimensionMismatchError(
+ function: String,
+ leftDim: Int,
+ rightDim: Int): RuntimeException = {
+ new SparkRuntimeException(
+ errorClass = "VECTOR_DIMENSION_MISMATCH",
+ messageParameters = Map(
+ "functionName" -> toSQLId(function),
+ "leftDim" -> leftDim.toString,
+ "rightDim" -> rightDim.toString))
+ }
}
diff --git a/sql/core/src/test/resources/sql-functions/sql-expression-schema.md
b/sql/core/src/test/resources/sql-functions/sql-expression-schema.md
index cd4538162b8d..6ec3db92f4f2 100644
--- a/sql/core/src/test/resources/sql-functions/sql-expression-schema.md
+++ b/sql/core/src/test/resources/sql-functions/sql-expression-schema.md
@@ -421,6 +421,9 @@
| org.apache.spark.sql.catalyst.expressions.UrlEncode | url_encode | SELECT
url_encode('https://spark.apache.org') |
struct<url_encode(https://spark.apache.org):string> |
| org.apache.spark.sql.catalyst.expressions.Uuid | uuid | SELECT uuid() |
struct<uuid():string> |
| org.apache.spark.sql.catalyst.expressions.ValidateUTF8 | validate_utf8 |
SELECT validate_utf8('Spark') | struct<validate_utf8(Spark):string> |
+| org.apache.spark.sql.catalyst.expressions.VectorCosineSimilarity |
vector_cosine_similarity | SELECT vector_cosine_similarity(array(1.0F, 2.0F,
3.0F), array(4.0F, 5.0F, 6.0F)) | struct<vector_cosine_similarity(array(1.0,
2.0, 3.0), array(4.0, 5.0, 6.0)):float> |
+| org.apache.spark.sql.catalyst.expressions.VectorInnerProduct |
vector_inner_product | SELECT vector_inner_product(array(1.0F, 2.0F, 3.0F),
array(4.0F, 5.0F, 6.0F)) | struct<vector_inner_product(array(1.0, 2.0, 3.0),
array(4.0, 5.0, 6.0)):float> |
+| org.apache.spark.sql.catalyst.expressions.VectorL2Distance |
vector_l2_distance | SELECT vector_l2_distance(array(1.0F, 2.0F, 3.0F),
array(4.0F, 5.0F, 6.0F)) | struct<vector_l2_distance(array(1.0, 2.0, 3.0),
array(4.0, 5.0, 6.0)):float> |
| org.apache.spark.sql.catalyst.expressions.WeekDay | weekday | SELECT
weekday('2009-07-30') | struct<weekday(2009-07-30):int> |
| org.apache.spark.sql.catalyst.expressions.WeekOfYear | weekofyear | SELECT
weekofyear('2008-02-20') | struct<weekofyear(2008-02-20):int> |
| org.apache.spark.sql.catalyst.expressions.WidthBucket | width_bucket |
SELECT width_bucket(5.3, 0.2, 10.6, 5) | struct<width_bucket(5.3, 0.2, 10.6,
5):bigint> |
diff --git
a/sql/core/src/test/resources/sql-tests/analyzer-results/vector-distance.sql.out
b/sql/core/src/test/resources/sql-tests/analyzer-results/vector-distance.sql.out
new file mode 100644
index 000000000000..42239b329730
--- /dev/null
+++
b/sql/core/src/test/resources/sql-tests/analyzer-results/vector-distance.sql.out
@@ -0,0 +1,656 @@
+-- Automatically generated by SQLQueryTestSuite
+-- !query
+SELECT vector_cosine_similarity(array(1.0F, 2.0F, 3.0F), array(4.0F, 5.0F,
6.0F))
+-- !query analysis
+Project [vector_cosine_similarity(array(1.0, 2.0, 3.0), array(4.0, 5.0, 6.0))
AS vector_cosine_similarity(array(1.0, 2.0, 3.0), array(4.0, 5.0, 6.0))#x]
++- OneRowRelation
+
+
+-- !query
+SELECT vector_cosine_similarity(array(1.0F, 0.0F, 0.0F), array(1.0F, 0.0F,
0.0F))
+-- !query analysis
+Project [vector_cosine_similarity(array(1.0, 0.0, 0.0), array(1.0, 0.0, 0.0))
AS vector_cosine_similarity(array(1.0, 0.0, 0.0), array(1.0, 0.0, 0.0))#x]
++- OneRowRelation
+
+
+-- !query
+SELECT vector_cosine_similarity(array(1.0F, 0.0F), array(0.0F, 1.0F))
+-- !query analysis
+Project [vector_cosine_similarity(array(1.0, 0.0), array(0.0, 1.0)) AS
vector_cosine_similarity(array(1.0, 0.0), array(0.0, 1.0))#x]
++- OneRowRelation
+
+
+-- !query
+SELECT vector_cosine_similarity(array(1.0F, 0.0F), array(-1.0F, 0.0F))
+-- !query analysis
+Project [vector_cosine_similarity(array(1.0, 0.0), array(-1.0, 0.0)) AS
vector_cosine_similarity(array(1.0, 0.0), array(-1.0, 0.0))#x]
++- OneRowRelation
+
+
+-- !query
+SELECT vector_inner_product(array(1.0F, 2.0F, 3.0F), array(4.0F, 5.0F, 6.0F))
+-- !query analysis
+Project [vector_inner_product(array(1.0, 2.0, 3.0), array(4.0, 5.0, 6.0)) AS
vector_inner_product(array(1.0, 2.0, 3.0), array(4.0, 5.0, 6.0))#x]
++- OneRowRelation
+
+
+-- !query
+SELECT vector_inner_product(array(1.0F, 0.0F), array(0.0F, 1.0F))
+-- !query analysis
+Project [vector_inner_product(array(1.0, 0.0), array(0.0, 1.0)) AS
vector_inner_product(array(1.0, 0.0), array(0.0, 1.0))#x]
++- OneRowRelation
+
+
+-- !query
+SELECT vector_inner_product(array(3.0F, 4.0F), array(3.0F, 4.0F))
+-- !query analysis
+Project [vector_inner_product(array(3.0, 4.0), array(3.0, 4.0)) AS
vector_inner_product(array(3.0, 4.0), array(3.0, 4.0))#x]
++- OneRowRelation
+
+
+-- !query
+SELECT vector_l2_distance(array(1.0F, 2.0F, 3.0F), array(4.0F, 5.0F, 6.0F))
+-- !query analysis
+Project [vector_l2_distance(array(1.0, 2.0, 3.0), array(4.0, 5.0, 6.0)) AS
vector_l2_distance(array(1.0, 2.0, 3.0), array(4.0, 5.0, 6.0))#x]
++- OneRowRelation
+
+
+-- !query
+SELECT vector_l2_distance(array(1.0F, 2.0F), array(1.0F, 2.0F))
+-- !query analysis
+Project [vector_l2_distance(array(1.0, 2.0), array(1.0, 2.0)) AS
vector_l2_distance(array(1.0, 2.0), array(1.0, 2.0))#x]
++- OneRowRelation
+
+
+-- !query
+SELECT vector_l2_distance(array(0.0F, 0.0F), array(3.0F, 4.0F))
+-- !query analysis
+Project [vector_l2_distance(array(0.0, 0.0), array(3.0, 4.0)) AS
vector_l2_distance(array(0.0, 0.0), array(3.0, 4.0))#x]
++- OneRowRelation
+
+
+-- !query
+SELECT vector_cosine_similarity(array(), array())
+-- !query analysis
+org.apache.spark.sql.catalyst.ExtendedAnalysisException
+{
+ "errorClass" : "DATATYPE_MISMATCH.UNEXPECTED_INPUT_TYPE",
+ "sqlState" : "42K09",
+ "messageParameters" : {
+ "inputSql" : "\"array()\"",
+ "inputType" : "\"ARRAY<VOID>\"",
+ "paramIndex" : "first",
+ "requiredType" : "\"ARRAY<FLOAT>\"",
+ "sqlExpr" : "\"vector_cosine_similarity(array(), array())\""
+ },
+ "queryContext" : [ {
+ "objectType" : "",
+ "objectName" : "",
+ "startIndex" : 8,
+ "stopIndex" : 49,
+ "fragment" : "vector_cosine_similarity(array(), array())"
+ } ]
+}
+
+
+-- !query
+SELECT vector_inner_product(CAST(array() AS ARRAY<FLOAT>), CAST(array() AS
ARRAY<FLOAT>))
+-- !query analysis
+Project [vector_inner_product(cast(array() as array<float>), cast(array() as
array<float>)) AS vector_inner_product(array(), array())#x]
++- OneRowRelation
+
+
+-- !query
+SELECT vector_l2_distance(CAST(array() AS ARRAY<FLOAT>), CAST(array() AS
ARRAY<FLOAT>))
+-- !query analysis
+Project [vector_l2_distance(cast(array() as array<float>), cast(array() as
array<float>)) AS vector_l2_distance(array(), array())#x]
++- OneRowRelation
+
+
+-- !query
+SELECT vector_cosine_similarity(array(0.0F, 0.0F, 0.0F), array(1.0F, 2.0F,
3.0F))
+-- !query analysis
+Project [vector_cosine_similarity(array(0.0, 0.0, 0.0), array(1.0, 2.0, 3.0))
AS vector_cosine_similarity(array(0.0, 0.0, 0.0), array(1.0, 2.0, 3.0))#x]
++- OneRowRelation
+
+
+-- !query
+SELECT vector_cosine_similarity(NULL, array(1.0F, 2.0F, 3.0F))
+-- !query analysis
+org.apache.spark.sql.catalyst.ExtendedAnalysisException
+{
+ "errorClass" : "DATATYPE_MISMATCH.UNEXPECTED_INPUT_TYPE",
+ "sqlState" : "42K09",
+ "messageParameters" : {
+ "inputSql" : "\"NULL\"",
+ "inputType" : "\"VOID\"",
+ "paramIndex" : "first",
+ "requiredType" : "\"ARRAY<FLOAT>\"",
+ "sqlExpr" : "\"vector_cosine_similarity(NULL, array(1.0, 2.0, 3.0))\""
+ },
+ "queryContext" : [ {
+ "objectType" : "",
+ "objectName" : "",
+ "startIndex" : 8,
+ "stopIndex" : 62,
+ "fragment" : "vector_cosine_similarity(NULL, array(1.0F, 2.0F, 3.0F))"
+ } ]
+}
+
+
+-- !query
+SELECT vector_cosine_similarity(array(1.0F, 2.0F, 3.0F), NULL)
+-- !query analysis
+org.apache.spark.sql.catalyst.ExtendedAnalysisException
+{
+ "errorClass" : "DATATYPE_MISMATCH.UNEXPECTED_INPUT_TYPE",
+ "sqlState" : "42K09",
+ "messageParameters" : {
+ "inputSql" : "\"NULL\"",
+ "inputType" : "\"VOID\"",
+ "paramIndex" : "second",
+ "requiredType" : "\"ARRAY<FLOAT>\"",
+ "sqlExpr" : "\"vector_cosine_similarity(array(1.0, 2.0, 3.0), NULL)\""
+ },
+ "queryContext" : [ {
+ "objectType" : "",
+ "objectName" : "",
+ "startIndex" : 8,
+ "stopIndex" : 62,
+ "fragment" : "vector_cosine_similarity(array(1.0F, 2.0F, 3.0F), NULL)"
+ } ]
+}
+
+
+-- !query
+SELECT vector_inner_product(NULL, array(1.0F, 2.0F, 3.0F))
+-- !query analysis
+org.apache.spark.sql.catalyst.ExtendedAnalysisException
+{
+ "errorClass" : "DATATYPE_MISMATCH.UNEXPECTED_INPUT_TYPE",
+ "sqlState" : "42K09",
+ "messageParameters" : {
+ "inputSql" : "\"NULL\"",
+ "inputType" : "\"VOID\"",
+ "paramIndex" : "first",
+ "requiredType" : "\"ARRAY<FLOAT>\"",
+ "sqlExpr" : "\"vector_inner_product(NULL, array(1.0, 2.0, 3.0))\""
+ },
+ "queryContext" : [ {
+ "objectType" : "",
+ "objectName" : "",
+ "startIndex" : 8,
+ "stopIndex" : 58,
+ "fragment" : "vector_inner_product(NULL, array(1.0F, 2.0F, 3.0F))"
+ } ]
+}
+
+
+-- !query
+SELECT vector_inner_product(array(1.0F, 2.0F, 3.0F), NULL)
+-- !query analysis
+org.apache.spark.sql.catalyst.ExtendedAnalysisException
+{
+ "errorClass" : "DATATYPE_MISMATCH.UNEXPECTED_INPUT_TYPE",
+ "sqlState" : "42K09",
+ "messageParameters" : {
+ "inputSql" : "\"NULL\"",
+ "inputType" : "\"VOID\"",
+ "paramIndex" : "second",
+ "requiredType" : "\"ARRAY<FLOAT>\"",
+ "sqlExpr" : "\"vector_inner_product(array(1.0, 2.0, 3.0), NULL)\""
+ },
+ "queryContext" : [ {
+ "objectType" : "",
+ "objectName" : "",
+ "startIndex" : 8,
+ "stopIndex" : 58,
+ "fragment" : "vector_inner_product(array(1.0F, 2.0F, 3.0F), NULL)"
+ } ]
+}
+
+
+-- !query
+SELECT vector_l2_distance(NULL, array(1.0F, 2.0F, 3.0F))
+-- !query analysis
+org.apache.spark.sql.catalyst.ExtendedAnalysisException
+{
+ "errorClass" : "DATATYPE_MISMATCH.UNEXPECTED_INPUT_TYPE",
+ "sqlState" : "42K09",
+ "messageParameters" : {
+ "inputSql" : "\"NULL\"",
+ "inputType" : "\"VOID\"",
+ "paramIndex" : "first",
+ "requiredType" : "\"ARRAY<FLOAT>\"",
+ "sqlExpr" : "\"vector_l2_distance(NULL, array(1.0, 2.0, 3.0))\""
+ },
+ "queryContext" : [ {
+ "objectType" : "",
+ "objectName" : "",
+ "startIndex" : 8,
+ "stopIndex" : 56,
+ "fragment" : "vector_l2_distance(NULL, array(1.0F, 2.0F, 3.0F))"
+ } ]
+}
+
+
+-- !query
+SELECT vector_l2_distance(array(1.0F, 2.0F, 3.0F), NULL)
+-- !query analysis
+org.apache.spark.sql.catalyst.ExtendedAnalysisException
+{
+ "errorClass" : "DATATYPE_MISMATCH.UNEXPECTED_INPUT_TYPE",
+ "sqlState" : "42K09",
+ "messageParameters" : {
+ "inputSql" : "\"NULL\"",
+ "inputType" : "\"VOID\"",
+ "paramIndex" : "second",
+ "requiredType" : "\"ARRAY<FLOAT>\"",
+ "sqlExpr" : "\"vector_l2_distance(array(1.0, 2.0, 3.0), NULL)\""
+ },
+ "queryContext" : [ {
+ "objectType" : "",
+ "objectName" : "",
+ "startIndex" : 8,
+ "stopIndex" : 56,
+ "fragment" : "vector_l2_distance(array(1.0F, 2.0F, 3.0F), NULL)"
+ } ]
+}
+
+
+-- !query
+SELECT vector_cosine_similarity(array(1.0F, CAST(NULL AS FLOAT), 3.0F),
array(1.0F, 2.0F, 3.0F))
+-- !query analysis
+Project [vector_cosine_similarity(array(1.0, cast(null as float), 3.0),
array(1.0, 2.0, 3.0)) AS vector_cosine_similarity(array(1.0, CAST(NULL AS
FLOAT), 3.0), array(1.0, 2.0, 3.0))#x]
++- OneRowRelation
+
+
+-- !query
+SELECT vector_inner_product(array(1.0F, CAST(NULL AS FLOAT), 3.0F),
array(1.0F, 2.0F, 3.0F))
+-- !query analysis
+Project [vector_inner_product(array(1.0, cast(null as float), 3.0), array(1.0,
2.0, 3.0)) AS vector_inner_product(array(1.0, CAST(NULL AS FLOAT), 3.0),
array(1.0, 2.0, 3.0))#x]
++- OneRowRelation
+
+
+-- !query
+SELECT vector_l2_distance(array(1.0F, CAST(NULL AS FLOAT), 3.0F), array(1.0F,
2.0F, 3.0F))
+-- !query analysis
+Project [vector_l2_distance(array(1.0, cast(null as float), 3.0), array(1.0,
2.0, 3.0)) AS vector_l2_distance(array(1.0, CAST(NULL AS FLOAT), 3.0),
array(1.0, 2.0, 3.0))#x]
++- OneRowRelation
+
+
+-- !query
+SELECT vector_cosine_similarity(array(1.0F, 2.0F, 3.0F), array(1.0F, 2.0F))
+-- !query analysis
+Project [vector_cosine_similarity(array(1.0, 2.0, 3.0), array(1.0, 2.0)) AS
vector_cosine_similarity(array(1.0, 2.0, 3.0), array(1.0, 2.0))#x]
++- OneRowRelation
+
+
+-- !query
+SELECT vector_inner_product(array(1.0F, 2.0F, 3.0F), array(1.0F, 2.0F))
+-- !query analysis
+Project [vector_inner_product(array(1.0, 2.0, 3.0), array(1.0, 2.0)) AS
vector_inner_product(array(1.0, 2.0, 3.0), array(1.0, 2.0))#x]
++- OneRowRelation
+
+
+-- !query
+SELECT vector_l2_distance(array(1.0F, 2.0F, 3.0F), array(1.0F, 2.0F))
+-- !query analysis
+Project [vector_l2_distance(array(1.0, 2.0, 3.0), array(1.0, 2.0)) AS
vector_l2_distance(array(1.0, 2.0, 3.0), array(1.0, 2.0))#x]
++- OneRowRelation
+
+
+-- !query
+SELECT vector_cosine_similarity(array(1.0D, 2.0D, 3.0D), array(4.0D, 5.0D,
6.0D))
+-- !query analysis
+org.apache.spark.sql.catalyst.ExtendedAnalysisException
+{
+ "errorClass" : "DATATYPE_MISMATCH.UNEXPECTED_INPUT_TYPE",
+ "sqlState" : "42K09",
+ "messageParameters" : {
+ "inputSql" : "\"array(1.0, 2.0, 3.0)\"",
+ "inputType" : "\"ARRAY<DOUBLE>\"",
+ "paramIndex" : "first",
+ "requiredType" : "\"ARRAY<FLOAT>\"",
+ "sqlExpr" : "\"vector_cosine_similarity(array(1.0, 2.0, 3.0), array(4.0,
5.0, 6.0))\""
+ },
+ "queryContext" : [ {
+ "objectType" : "",
+ "objectName" : "",
+ "startIndex" : 8,
+ "stopIndex" : 81,
+ "fragment" : "vector_cosine_similarity(array(1.0D, 2.0D, 3.0D),
array(4.0D, 5.0D, 6.0D))"
+ } ]
+}
+
+
+-- !query
+SELECT vector_cosine_similarity(array(1, 2, 3), array(4, 5, 6))
+-- !query analysis
+org.apache.spark.sql.catalyst.ExtendedAnalysisException
+{
+ "errorClass" : "DATATYPE_MISMATCH.UNEXPECTED_INPUT_TYPE",
+ "sqlState" : "42K09",
+ "messageParameters" : {
+ "inputSql" : "\"array(1, 2, 3)\"",
+ "inputType" : "\"ARRAY<INT>\"",
+ "paramIndex" : "first",
+ "requiredType" : "\"ARRAY<FLOAT>\"",
+ "sqlExpr" : "\"vector_cosine_similarity(array(1, 2, 3), array(4, 5, 6))\""
+ },
+ "queryContext" : [ {
+ "objectType" : "",
+ "objectName" : "",
+ "startIndex" : 8,
+ "stopIndex" : 63,
+ "fragment" : "vector_cosine_similarity(array(1, 2, 3), array(4, 5, 6))"
+ } ]
+}
+
+
+-- !query
+SELECT vector_inner_product(array(1.0D, 2.0D, 3.0D), array(4.0D, 5.0D, 6.0D))
+-- !query analysis
+org.apache.spark.sql.catalyst.ExtendedAnalysisException
+{
+ "errorClass" : "DATATYPE_MISMATCH.UNEXPECTED_INPUT_TYPE",
+ "sqlState" : "42K09",
+ "messageParameters" : {
+ "inputSql" : "\"array(1.0, 2.0, 3.0)\"",
+ "inputType" : "\"ARRAY<DOUBLE>\"",
+ "paramIndex" : "first",
+ "requiredType" : "\"ARRAY<FLOAT>\"",
+ "sqlExpr" : "\"vector_inner_product(array(1.0, 2.0, 3.0), array(4.0, 5.0,
6.0))\""
+ },
+ "queryContext" : [ {
+ "objectType" : "",
+ "objectName" : "",
+ "startIndex" : 8,
+ "stopIndex" : 77,
+ "fragment" : "vector_inner_product(array(1.0D, 2.0D, 3.0D), array(4.0D,
5.0D, 6.0D))"
+ } ]
+}
+
+
+-- !query
+SELECT vector_inner_product(array(1, 2, 3), array(4, 5, 6))
+-- !query analysis
+org.apache.spark.sql.catalyst.ExtendedAnalysisException
+{
+ "errorClass" : "DATATYPE_MISMATCH.UNEXPECTED_INPUT_TYPE",
+ "sqlState" : "42K09",
+ "messageParameters" : {
+ "inputSql" : "\"array(1, 2, 3)\"",
+ "inputType" : "\"ARRAY<INT>\"",
+ "paramIndex" : "first",
+ "requiredType" : "\"ARRAY<FLOAT>\"",
+ "sqlExpr" : "\"vector_inner_product(array(1, 2, 3), array(4, 5, 6))\""
+ },
+ "queryContext" : [ {
+ "objectType" : "",
+ "objectName" : "",
+ "startIndex" : 8,
+ "stopIndex" : 59,
+ "fragment" : "vector_inner_product(array(1, 2, 3), array(4, 5, 6))"
+ } ]
+}
+
+
+-- !query
+SELECT vector_l2_distance(array(1.0D, 2.0D, 3.0D), array(4.0D, 5.0D, 6.0D))
+-- !query analysis
+org.apache.spark.sql.catalyst.ExtendedAnalysisException
+{
+ "errorClass" : "DATATYPE_MISMATCH.UNEXPECTED_INPUT_TYPE",
+ "sqlState" : "42K09",
+ "messageParameters" : {
+ "inputSql" : "\"array(1.0, 2.0, 3.0)\"",
+ "inputType" : "\"ARRAY<DOUBLE>\"",
+ "paramIndex" : "first",
+ "requiredType" : "\"ARRAY<FLOAT>\"",
+ "sqlExpr" : "\"vector_l2_distance(array(1.0, 2.0, 3.0), array(4.0, 5.0,
6.0))\""
+ },
+ "queryContext" : [ {
+ "objectType" : "",
+ "objectName" : "",
+ "startIndex" : 8,
+ "stopIndex" : 75,
+ "fragment" : "vector_l2_distance(array(1.0D, 2.0D, 3.0D), array(4.0D,
5.0D, 6.0D))"
+ } ]
+}
+
+
+-- !query
+SELECT vector_l2_distance(array(1, 2, 3), array(4, 5, 6))
+-- !query analysis
+org.apache.spark.sql.catalyst.ExtendedAnalysisException
+{
+ "errorClass" : "DATATYPE_MISMATCH.UNEXPECTED_INPUT_TYPE",
+ "sqlState" : "42K09",
+ "messageParameters" : {
+ "inputSql" : "\"array(1, 2, 3)\"",
+ "inputType" : "\"ARRAY<INT>\"",
+ "paramIndex" : "first",
+ "requiredType" : "\"ARRAY<FLOAT>\"",
+ "sqlExpr" : "\"vector_l2_distance(array(1, 2, 3), array(4, 5, 6))\""
+ },
+ "queryContext" : [ {
+ "objectType" : "",
+ "objectName" : "",
+ "startIndex" : 8,
+ "stopIndex" : 57,
+ "fragment" : "vector_l2_distance(array(1, 2, 3), array(4, 5, 6))"
+ } ]
+}
+
+
+-- !query
+SELECT vector_cosine_similarity(array(1.0F, 2.0F, 3.0F), array(1.0D, 2.0D,
3.0D))
+-- !query analysis
+org.apache.spark.sql.catalyst.ExtendedAnalysisException
+{
+ "errorClass" : "DATATYPE_MISMATCH.UNEXPECTED_INPUT_TYPE",
+ "sqlState" : "42K09",
+ "messageParameters" : {
+ "inputSql" : "\"array(1.0, 2.0, 3.0)\"",
+ "inputType" : "\"ARRAY<DOUBLE>\"",
+ "paramIndex" : "second",
+ "requiredType" : "\"ARRAY<FLOAT>\"",
+ "sqlExpr" : "\"vector_cosine_similarity(array(1.0, 2.0, 3.0), array(1.0,
2.0, 3.0))\""
+ },
+ "queryContext" : [ {
+ "objectType" : "",
+ "objectName" : "",
+ "startIndex" : 8,
+ "stopIndex" : 81,
+ "fragment" : "vector_cosine_similarity(array(1.0F, 2.0F, 3.0F),
array(1.0D, 2.0D, 3.0D))"
+ } ]
+}
+
+
+-- !query
+SELECT vector_cosine_similarity('not an array', array(1.0F, 2.0F, 3.0F))
+-- !query analysis
+org.apache.spark.sql.catalyst.ExtendedAnalysisException
+{
+ "errorClass" : "DATATYPE_MISMATCH.UNEXPECTED_INPUT_TYPE",
+ "sqlState" : "42K09",
+ "messageParameters" : {
+ "inputSql" : "\"not an array\"",
+ "inputType" : "\"STRING\"",
+ "paramIndex" : "first",
+ "requiredType" : "\"ARRAY<FLOAT>\"",
+ "sqlExpr" : "\"vector_cosine_similarity(not an array, array(1.0, 2.0,
3.0))\""
+ },
+ "queryContext" : [ {
+ "objectType" : "",
+ "objectName" : "",
+ "startIndex" : 8,
+ "stopIndex" : 72,
+ "fragment" : "vector_cosine_similarity('not an array', array(1.0F, 2.0F,
3.0F))"
+ } ]
+}
+
+
+-- !query
+SELECT vector_cosine_similarity(array(1.0F, 2.0F, 3.0F), 'not an array')
+-- !query analysis
+org.apache.spark.sql.catalyst.ExtendedAnalysisException
+{
+ "errorClass" : "DATATYPE_MISMATCH.UNEXPECTED_INPUT_TYPE",
+ "sqlState" : "42K09",
+ "messageParameters" : {
+ "inputSql" : "\"not an array\"",
+ "inputType" : "\"STRING\"",
+ "paramIndex" : "second",
+ "requiredType" : "\"ARRAY<FLOAT>\"",
+ "sqlExpr" : "\"vector_cosine_similarity(array(1.0, 2.0, 3.0), not an
array)\""
+ },
+ "queryContext" : [ {
+ "objectType" : "",
+ "objectName" : "",
+ "startIndex" : 8,
+ "stopIndex" : 72,
+ "fragment" : "vector_cosine_similarity(array(1.0F, 2.0F, 3.0F), 'not an
array')"
+ } ]
+}
+
+
+-- !query
+SELECT vector_cosine_similarity(123, 456)
+-- !query analysis
+org.apache.spark.sql.catalyst.ExtendedAnalysisException
+{
+ "errorClass" : "DATATYPE_MISMATCH.UNEXPECTED_INPUT_TYPE",
+ "sqlState" : "42K09",
+ "messageParameters" : {
+ "inputSql" : "\"123\"",
+ "inputType" : "\"INT\"",
+ "paramIndex" : "first",
+ "requiredType" : "\"ARRAY<FLOAT>\"",
+ "sqlExpr" : "\"vector_cosine_similarity(123, 456)\""
+ },
+ "queryContext" : [ {
+ "objectType" : "",
+ "objectName" : "",
+ "startIndex" : 8,
+ "stopIndex" : 41,
+ "fragment" : "vector_cosine_similarity(123, 456)"
+ } ]
+}
+
+
+-- !query
+SELECT vector_inner_product('not an array', array(1.0F, 2.0F, 3.0F))
+-- !query analysis
+org.apache.spark.sql.catalyst.ExtendedAnalysisException
+{
+ "errorClass" : "DATATYPE_MISMATCH.UNEXPECTED_INPUT_TYPE",
+ "sqlState" : "42K09",
+ "messageParameters" : {
+ "inputSql" : "\"not an array\"",
+ "inputType" : "\"STRING\"",
+ "paramIndex" : "first",
+ "requiredType" : "\"ARRAY<FLOAT>\"",
+ "sqlExpr" : "\"vector_inner_product(not an array, array(1.0, 2.0, 3.0))\""
+ },
+ "queryContext" : [ {
+ "objectType" : "",
+ "objectName" : "",
+ "startIndex" : 8,
+ "stopIndex" : 68,
+ "fragment" : "vector_inner_product('not an array', array(1.0F, 2.0F,
3.0F))"
+ } ]
+}
+
+
+-- !query
+SELECT vector_inner_product(array(1.0F, 2.0F, 3.0F), 'not an array')
+-- !query analysis
+org.apache.spark.sql.catalyst.ExtendedAnalysisException
+{
+ "errorClass" : "DATATYPE_MISMATCH.UNEXPECTED_INPUT_TYPE",
+ "sqlState" : "42K09",
+ "messageParameters" : {
+ "inputSql" : "\"not an array\"",
+ "inputType" : "\"STRING\"",
+ "paramIndex" : "second",
+ "requiredType" : "\"ARRAY<FLOAT>\"",
+ "sqlExpr" : "\"vector_inner_product(array(1.0, 2.0, 3.0), not an array)\""
+ },
+ "queryContext" : [ {
+ "objectType" : "",
+ "objectName" : "",
+ "startIndex" : 8,
+ "stopIndex" : 68,
+ "fragment" : "vector_inner_product(array(1.0F, 2.0F, 3.0F), 'not an
array')"
+ } ]
+}
+
+
+-- !query
+SELECT vector_l2_distance('not an array', array(1.0F, 2.0F, 3.0F))
+-- !query analysis
+org.apache.spark.sql.catalyst.ExtendedAnalysisException
+{
+ "errorClass" : "DATATYPE_MISMATCH.UNEXPECTED_INPUT_TYPE",
+ "sqlState" : "42K09",
+ "messageParameters" : {
+ "inputSql" : "\"not an array\"",
+ "inputType" : "\"STRING\"",
+ "paramIndex" : "first",
+ "requiredType" : "\"ARRAY<FLOAT>\"",
+ "sqlExpr" : "\"vector_l2_distance(not an array, array(1.0, 2.0, 3.0))\""
+ },
+ "queryContext" : [ {
+ "objectType" : "",
+ "objectName" : "",
+ "startIndex" : 8,
+ "stopIndex" : 66,
+ "fragment" : "vector_l2_distance('not an array', array(1.0F, 2.0F, 3.0F))"
+ } ]
+}
+
+
+-- !query
+SELECT vector_l2_distance(array(1.0F, 2.0F, 3.0F), 'not an array')
+-- !query analysis
+org.apache.spark.sql.catalyst.ExtendedAnalysisException
+{
+ "errorClass" : "DATATYPE_MISMATCH.UNEXPECTED_INPUT_TYPE",
+ "sqlState" : "42K09",
+ "messageParameters" : {
+ "inputSql" : "\"not an array\"",
+ "inputType" : "\"STRING\"",
+ "paramIndex" : "second",
+ "requiredType" : "\"ARRAY<FLOAT>\"",
+ "sqlExpr" : "\"vector_l2_distance(array(1.0, 2.0, 3.0), not an array)\""
+ },
+ "queryContext" : [ {
+ "objectType" : "",
+ "objectName" : "",
+ "startIndex" : 8,
+ "stopIndex" : 66,
+ "fragment" : "vector_l2_distance(array(1.0F, 2.0F, 3.0F), 'not an array')"
+ } ]
+}
+
+
+-- !query
+SELECT vector_inner_product(
+ array(1.0F, 2.0F, 3.0F, 4.0F, 5.0F, 6.0F, 7.0F, 8.0F, 9.0F, 10.0F, 11.0F,
12.0F, 13.0F, 14.0F, 15.0F, 16.0F),
+ array(16.0F, 15.0F, 14.0F, 13.0F, 12.0F, 11.0F, 10.0F, 9.0F, 8.0F, 7.0F,
6.0F, 5.0F, 4.0F, 3.0F, 2.0F, 1.0F)
+)
+-- !query analysis
+Project [vector_inner_product(array(1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0,
9.0, 10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0), array(16.0, 15.0, 14.0, 13.0,
12.0, 11.0, 10.0, 9.0, 8.0, 7.0, 6.0, 5.0, 4.0, 3.0, 2.0, 1.0)) AS
vector_inner_product(array(1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0,
11.0, 12.0, 13.0, 14.0, 15.0, 16.0), array(16.0, 15.0, 14.0, 13.0, 12.0, 11.0,
10.0, 9.0, 8.0, 7.0, 6.0, 5.0, 4.0, 3.0, 2.0, 1.0))#x]
++- OneRowRelation
+
+
+-- !query
+SELECT vector_l2_distance(
+ array(1.0F, 2.0F, 3.0F, 4.0F, 5.0F, 6.0F, 7.0F, 8.0F, 9.0F, 10.0F, 11.0F,
12.0F, 13.0F, 14.0F, 15.0F, 16.0F),
+ array(16.0F, 15.0F, 14.0F, 13.0F, 12.0F, 11.0F, 10.0F, 9.0F, 8.0F, 7.0F,
6.0F, 5.0F, 4.0F, 3.0F, 2.0F, 1.0F)
+)
+-- !query analysis
+Project [vector_l2_distance(array(1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0,
10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0), array(16.0, 15.0, 14.0, 13.0, 12.0,
11.0, 10.0, 9.0, 8.0, 7.0, 6.0, 5.0, 4.0, 3.0, 2.0, 1.0)) AS
vector_l2_distance(array(1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0,
11.0, 12.0, 13.0, 14.0, 15.0, 16.0), array(16.0, 15.0, 14.0, 13.0, 12.0, 11.0,
10.0, 9.0, 8.0, 7.0, 6.0, 5.0, 4.0, 3.0, 2.0, 1.0))#x]
++- OneRowRelation
diff --git a/sql/core/src/test/resources/sql-tests/inputs/vector-distance.sql
b/sql/core/src/test/resources/sql-tests/inputs/vector-distance.sql
new file mode 100644
index 000000000000..24035963260b
--- /dev/null
+++ b/sql/core/src/test/resources/sql-tests/inputs/vector-distance.sql
@@ -0,0 +1,132 @@
+-- Tests for vector distance functions: vector_cosine_similarity,
vector_inner_product, vector_l2_distance
+
+-- Basic functionality tests
+
+-- vector_cosine_similarity: basic test
+SELECT vector_cosine_similarity(array(1.0F, 2.0F, 3.0F), array(4.0F, 5.0F,
6.0F));
+
+-- vector_cosine_similarity: identical vectors (similarity = 1.0)
+SELECT vector_cosine_similarity(array(1.0F, 0.0F, 0.0F), array(1.0F, 0.0F,
0.0F));
+
+-- vector_cosine_similarity: orthogonal vectors (similarity = 0.0)
+SELECT vector_cosine_similarity(array(1.0F, 0.0F), array(0.0F, 1.0F));
+
+-- vector_cosine_similarity: opposite vectors (similarity = -1.0)
+SELECT vector_cosine_similarity(array(1.0F, 0.0F), array(-1.0F, 0.0F));
+
+-- vector_inner_product: basic test (1*4 + 2*5 + 3*6 = 32)
+SELECT vector_inner_product(array(1.0F, 2.0F, 3.0F), array(4.0F, 5.0F, 6.0F));
+
+-- vector_inner_product: orthogonal vectors (product = 0)
+SELECT vector_inner_product(array(1.0F, 0.0F), array(0.0F, 1.0F));
+
+-- vector_inner_product: self product (squared L2 norm: 3^2 + 4^2 = 25)
+SELECT vector_inner_product(array(3.0F, 4.0F), array(3.0F, 4.0F));
+
+-- vector_l2_distance: basic test (sqrt((4-1)^2 + (5-2)^2 + (6-3)^2) =
sqrt(27))
+SELECT vector_l2_distance(array(1.0F, 2.0F, 3.0F), array(4.0F, 5.0F, 6.0F));
+
+-- vector_l2_distance: identical vectors (distance = 0)
+SELECT vector_l2_distance(array(1.0F, 2.0F), array(1.0F, 2.0F));
+
+-- vector_l2_distance: 3-4-5 triangle (distance = 5)
+SELECT vector_l2_distance(array(0.0F, 0.0F), array(3.0F, 4.0F));
+
+-- Edge cases
+
+-- Empty vectors: cosine similarity returns NULL
+SELECT vector_cosine_similarity(array(), array());
+
+-- Empty vectors: inner product returns 0.0
+SELECT vector_inner_product(CAST(array() AS ARRAY<FLOAT>), CAST(array() AS
ARRAY<FLOAT>));
+
+-- Empty vectors: L2 distance returns 0.0
+SELECT vector_l2_distance(CAST(array() AS ARRAY<FLOAT>), CAST(array() AS
ARRAY<FLOAT>));
+
+-- Zero magnitude vector: cosine similarity returns NULL
+SELECT vector_cosine_similarity(array(0.0F, 0.0F, 0.0F), array(1.0F, 2.0F,
3.0F));
+
+-- NULL array input: cosine similarity returns NULL
+SELECT vector_cosine_similarity(NULL, array(1.0F, 2.0F, 3.0F));
+SELECT vector_cosine_similarity(array(1.0F, 2.0F, 3.0F), NULL);
+
+-- NULL array input: inner product returns NULL
+SELECT vector_inner_product(NULL, array(1.0F, 2.0F, 3.0F));
+SELECT vector_inner_product(array(1.0F, 2.0F, 3.0F), NULL);
+
+-- NULL array input: L2 distance returns NULL
+SELECT vector_l2_distance(NULL, array(1.0F, 2.0F, 3.0F));
+SELECT vector_l2_distance(array(1.0F, 2.0F, 3.0F), NULL);
+
+-- Array containing NULL element: returns NULL
+SELECT vector_cosine_similarity(array(1.0F, CAST(NULL AS FLOAT), 3.0F),
array(1.0F, 2.0F, 3.0F));
+SELECT vector_inner_product(array(1.0F, CAST(NULL AS FLOAT), 3.0F),
array(1.0F, 2.0F, 3.0F));
+SELECT vector_l2_distance(array(1.0F, CAST(NULL AS FLOAT), 3.0F), array(1.0F,
2.0F, 3.0F));
+
+-- Dimension mismatch errors
+
+-- vector_cosine_similarity: dimension mismatch
+SELECT vector_cosine_similarity(array(1.0F, 2.0F, 3.0F), array(1.0F, 2.0F));
+
+-- vector_inner_product: dimension mismatch
+SELECT vector_inner_product(array(1.0F, 2.0F, 3.0F), array(1.0F, 2.0F));
+
+-- vector_l2_distance: dimension mismatch
+SELECT vector_l2_distance(array(1.0F, 2.0F, 3.0F), array(1.0F, 2.0F));
+
+-- Type mismatch errors (only ARRAY<FLOAT> is accepted)
+
+-- vector_cosine_similarity: ARRAY<DOUBLE> not accepted
+SELECT vector_cosine_similarity(array(1.0D, 2.0D, 3.0D), array(4.0D, 5.0D,
6.0D));
+
+-- vector_cosine_similarity: ARRAY<INT> not accepted
+SELECT vector_cosine_similarity(array(1, 2, 3), array(4, 5, 6));
+
+-- vector_inner_product: ARRAY<DOUBLE> not accepted
+SELECT vector_inner_product(array(1.0D, 2.0D, 3.0D), array(4.0D, 5.0D, 6.0D));
+
+-- vector_inner_product: ARRAY<INT> not accepted
+SELECT vector_inner_product(array(1, 2, 3), array(4, 5, 6));
+
+-- vector_l2_distance: ARRAY<DOUBLE> not accepted
+SELECT vector_l2_distance(array(1.0D, 2.0D, 3.0D), array(4.0D, 5.0D, 6.0D));
+
+-- vector_l2_distance: ARRAY<INT> not accepted
+SELECT vector_l2_distance(array(1, 2, 3), array(4, 5, 6));
+
+-- Mixed type errors (first arg correct, second wrong)
+SELECT vector_cosine_similarity(array(1.0F, 2.0F, 3.0F), array(1.0D, 2.0D,
3.0D));
+
+-- Non-array argument errors (argument is not an array at all)
+
+-- vector_cosine_similarity: first argument is not an array
+SELECT vector_cosine_similarity('not an array', array(1.0F, 2.0F, 3.0F));
+
+-- vector_cosine_similarity: second argument is not an array
+SELECT vector_cosine_similarity(array(1.0F, 2.0F, 3.0F), 'not an array');
+
+-- vector_cosine_similarity: both arguments are not arrays
+SELECT vector_cosine_similarity(123, 456);
+
+-- vector_inner_product: first argument is not an array
+SELECT vector_inner_product('not an array', array(1.0F, 2.0F, 3.0F));
+
+-- vector_inner_product: second argument is not an array
+SELECT vector_inner_product(array(1.0F, 2.0F, 3.0F), 'not an array');
+
+-- vector_l2_distance: first argument is not an array
+SELECT vector_l2_distance('not an array', array(1.0F, 2.0F, 3.0F));
+
+-- vector_l2_distance: second argument is not an array
+SELECT vector_l2_distance(array(1.0F, 2.0F, 3.0F), 'not an array');
+
+-- Large vectors (test SIMD unroll path with 16 elements)
+SELECT vector_inner_product(
+ array(1.0F, 2.0F, 3.0F, 4.0F, 5.0F, 6.0F, 7.0F, 8.0F, 9.0F, 10.0F, 11.0F,
12.0F, 13.0F, 14.0F, 15.0F, 16.0F),
+ array(16.0F, 15.0F, 14.0F, 13.0F, 12.0F, 11.0F, 10.0F, 9.0F, 8.0F, 7.0F,
6.0F, 5.0F, 4.0F, 3.0F, 2.0F, 1.0F)
+);
+
+SELECT vector_l2_distance(
+ array(1.0F, 2.0F, 3.0F, 4.0F, 5.0F, 6.0F, 7.0F, 8.0F, 9.0F, 10.0F, 11.0F,
12.0F, 13.0F, 14.0F, 15.0F, 16.0F),
+ array(16.0F, 15.0F, 14.0F, 13.0F, 12.0F, 11.0F, 10.0F, 9.0F, 8.0F, 7.0F,
6.0F, 5.0F, 4.0F, 3.0F, 2.0F, 1.0F)
+);
diff --git
a/sql/core/src/test/resources/sql-tests/results/vector-distance.sql.out
b/sql/core/src/test/resources/sql-tests/results/vector-distance.sql.out
new file mode 100644
index 000000000000..14f503dde7f0
--- /dev/null
+++ b/sql/core/src/test/resources/sql-tests/results/vector-distance.sql.out
@@ -0,0 +1,746 @@
+-- Automatically generated by SQLQueryTestSuite
+-- !query
+SELECT vector_cosine_similarity(array(1.0F, 2.0F, 3.0F), array(4.0F, 5.0F,
6.0F))
+-- !query schema
+struct<vector_cosine_similarity(array(1.0, 2.0, 3.0), array(4.0, 5.0,
6.0)):float>
+-- !query output
+0.97463185
+
+
+-- !query
+SELECT vector_cosine_similarity(array(1.0F, 0.0F, 0.0F), array(1.0F, 0.0F,
0.0F))
+-- !query schema
+struct<vector_cosine_similarity(array(1.0, 0.0, 0.0), array(1.0, 0.0,
0.0)):float>
+-- !query output
+1.0
+
+
+-- !query
+SELECT vector_cosine_similarity(array(1.0F, 0.0F), array(0.0F, 1.0F))
+-- !query schema
+struct<vector_cosine_similarity(array(1.0, 0.0), array(0.0, 1.0)):float>
+-- !query output
+0.0
+
+
+-- !query
+SELECT vector_cosine_similarity(array(1.0F, 0.0F), array(-1.0F, 0.0F))
+-- !query schema
+struct<vector_cosine_similarity(array(1.0, 0.0), array(-1.0, 0.0)):float>
+-- !query output
+-1.0
+
+
+-- !query
+SELECT vector_inner_product(array(1.0F, 2.0F, 3.0F), array(4.0F, 5.0F, 6.0F))
+-- !query schema
+struct<vector_inner_product(array(1.0, 2.0, 3.0), array(4.0, 5.0, 6.0)):float>
+-- !query output
+32.0
+
+
+-- !query
+SELECT vector_inner_product(array(1.0F, 0.0F), array(0.0F, 1.0F))
+-- !query schema
+struct<vector_inner_product(array(1.0, 0.0), array(0.0, 1.0)):float>
+-- !query output
+0.0
+
+
+-- !query
+SELECT vector_inner_product(array(3.0F, 4.0F), array(3.0F, 4.0F))
+-- !query schema
+struct<vector_inner_product(array(3.0, 4.0), array(3.0, 4.0)):float>
+-- !query output
+25.0
+
+
+-- !query
+SELECT vector_l2_distance(array(1.0F, 2.0F, 3.0F), array(4.0F, 5.0F, 6.0F))
+-- !query schema
+struct<vector_l2_distance(array(1.0, 2.0, 3.0), array(4.0, 5.0, 6.0)):float>
+-- !query output
+5.196152
+
+
+-- !query
+SELECT vector_l2_distance(array(1.0F, 2.0F), array(1.0F, 2.0F))
+-- !query schema
+struct<vector_l2_distance(array(1.0, 2.0), array(1.0, 2.0)):float>
+-- !query output
+0.0
+
+
+-- !query
+SELECT vector_l2_distance(array(0.0F, 0.0F), array(3.0F, 4.0F))
+-- !query schema
+struct<vector_l2_distance(array(0.0, 0.0), array(3.0, 4.0)):float>
+-- !query output
+5.0
+
+
+-- !query
+SELECT vector_cosine_similarity(array(), array())
+-- !query schema
+struct<>
+-- !query output
+org.apache.spark.sql.catalyst.ExtendedAnalysisException
+{
+ "errorClass" : "DATATYPE_MISMATCH.UNEXPECTED_INPUT_TYPE",
+ "sqlState" : "42K09",
+ "messageParameters" : {
+ "inputSql" : "\"array()\"",
+ "inputType" : "\"ARRAY<VOID>\"",
+ "paramIndex" : "first",
+ "requiredType" : "\"ARRAY<FLOAT>\"",
+ "sqlExpr" : "\"vector_cosine_similarity(array(), array())\""
+ },
+ "queryContext" : [ {
+ "objectType" : "",
+ "objectName" : "",
+ "startIndex" : 8,
+ "stopIndex" : 49,
+ "fragment" : "vector_cosine_similarity(array(), array())"
+ } ]
+}
+
+
+-- !query
+SELECT vector_inner_product(CAST(array() AS ARRAY<FLOAT>), CAST(array() AS ARRAY<FLOAT>))
+-- !query schema
+struct<vector_inner_product(array(), array()):float>
+-- !query output
+0.0
+
+
+-- !query
+SELECT vector_l2_distance(CAST(array() AS ARRAY<FLOAT>), CAST(array() AS ARRAY<FLOAT>))
+-- !query schema
+struct<vector_l2_distance(array(), array()):float>
+-- !query output
+0.0
+
+
+-- !query
+SELECT vector_cosine_similarity(array(0.0F, 0.0F, 0.0F), array(1.0F, 2.0F, 3.0F))
+-- !query schema
+struct<vector_cosine_similarity(array(0.0, 0.0, 0.0), array(1.0, 2.0, 3.0)):float>
+-- !query output
+NULL
+
+
+-- !query
+SELECT vector_cosine_similarity(NULL, array(1.0F, 2.0F, 3.0F))
+-- !query schema
+struct<>
+-- !query output
+org.apache.spark.sql.catalyst.ExtendedAnalysisException
+{
+ "errorClass" : "DATATYPE_MISMATCH.UNEXPECTED_INPUT_TYPE",
+ "sqlState" : "42K09",
+ "messageParameters" : {
+ "inputSql" : "\"NULL\"",
+ "inputType" : "\"VOID\"",
+ "paramIndex" : "first",
+ "requiredType" : "\"ARRAY<FLOAT>\"",
+ "sqlExpr" : "\"vector_cosine_similarity(NULL, array(1.0, 2.0, 3.0))\""
+ },
+ "queryContext" : [ {
+ "objectType" : "",
+ "objectName" : "",
+ "startIndex" : 8,
+ "stopIndex" : 62,
+ "fragment" : "vector_cosine_similarity(NULL, array(1.0F, 2.0F, 3.0F))"
+ } ]
+}
+
+
+-- !query
+SELECT vector_cosine_similarity(array(1.0F, 2.0F, 3.0F), NULL)
+-- !query schema
+struct<>
+-- !query output
+org.apache.spark.sql.catalyst.ExtendedAnalysisException
+{
+ "errorClass" : "DATATYPE_MISMATCH.UNEXPECTED_INPUT_TYPE",
+ "sqlState" : "42K09",
+ "messageParameters" : {
+ "inputSql" : "\"NULL\"",
+ "inputType" : "\"VOID\"",
+ "paramIndex" : "second",
+ "requiredType" : "\"ARRAY<FLOAT>\"",
+ "sqlExpr" : "\"vector_cosine_similarity(array(1.0, 2.0, 3.0), NULL)\""
+ },
+ "queryContext" : [ {
+ "objectType" : "",
+ "objectName" : "",
+ "startIndex" : 8,
+ "stopIndex" : 62,
+ "fragment" : "vector_cosine_similarity(array(1.0F, 2.0F, 3.0F), NULL)"
+ } ]
+}
+
+
+-- !query
+SELECT vector_inner_product(NULL, array(1.0F, 2.0F, 3.0F))
+-- !query schema
+struct<>
+-- !query output
+org.apache.spark.sql.catalyst.ExtendedAnalysisException
+{
+ "errorClass" : "DATATYPE_MISMATCH.UNEXPECTED_INPUT_TYPE",
+ "sqlState" : "42K09",
+ "messageParameters" : {
+ "inputSql" : "\"NULL\"",
+ "inputType" : "\"VOID\"",
+ "paramIndex" : "first",
+ "requiredType" : "\"ARRAY<FLOAT>\"",
+ "sqlExpr" : "\"vector_inner_product(NULL, array(1.0, 2.0, 3.0))\""
+ },
+ "queryContext" : [ {
+ "objectType" : "",
+ "objectName" : "",
+ "startIndex" : 8,
+ "stopIndex" : 58,
+ "fragment" : "vector_inner_product(NULL, array(1.0F, 2.0F, 3.0F))"
+ } ]
+}
+
+
+-- !query
+SELECT vector_inner_product(array(1.0F, 2.0F, 3.0F), NULL)
+-- !query schema
+struct<>
+-- !query output
+org.apache.spark.sql.catalyst.ExtendedAnalysisException
+{
+ "errorClass" : "DATATYPE_MISMATCH.UNEXPECTED_INPUT_TYPE",
+ "sqlState" : "42K09",
+ "messageParameters" : {
+ "inputSql" : "\"NULL\"",
+ "inputType" : "\"VOID\"",
+ "paramIndex" : "second",
+ "requiredType" : "\"ARRAY<FLOAT>\"",
+ "sqlExpr" : "\"vector_inner_product(array(1.0, 2.0, 3.0), NULL)\""
+ },
+ "queryContext" : [ {
+ "objectType" : "",
+ "objectName" : "",
+ "startIndex" : 8,
+ "stopIndex" : 58,
+ "fragment" : "vector_inner_product(array(1.0F, 2.0F, 3.0F), NULL)"
+ } ]
+}
+
+
+-- !query
+SELECT vector_l2_distance(NULL, array(1.0F, 2.0F, 3.0F))
+-- !query schema
+struct<>
+-- !query output
+org.apache.spark.sql.catalyst.ExtendedAnalysisException
+{
+ "errorClass" : "DATATYPE_MISMATCH.UNEXPECTED_INPUT_TYPE",
+ "sqlState" : "42K09",
+ "messageParameters" : {
+ "inputSql" : "\"NULL\"",
+ "inputType" : "\"VOID\"",
+ "paramIndex" : "first",
+ "requiredType" : "\"ARRAY<FLOAT>\"",
+ "sqlExpr" : "\"vector_l2_distance(NULL, array(1.0, 2.0, 3.0))\""
+ },
+ "queryContext" : [ {
+ "objectType" : "",
+ "objectName" : "",
+ "startIndex" : 8,
+ "stopIndex" : 56,
+ "fragment" : "vector_l2_distance(NULL, array(1.0F, 2.0F, 3.0F))"
+ } ]
+}
+
+
+-- !query
+SELECT vector_l2_distance(array(1.0F, 2.0F, 3.0F), NULL)
+-- !query schema
+struct<>
+-- !query output
+org.apache.spark.sql.catalyst.ExtendedAnalysisException
+{
+ "errorClass" : "DATATYPE_MISMATCH.UNEXPECTED_INPUT_TYPE",
+ "sqlState" : "42K09",
+ "messageParameters" : {
+ "inputSql" : "\"NULL\"",
+ "inputType" : "\"VOID\"",
+ "paramIndex" : "second",
+ "requiredType" : "\"ARRAY<FLOAT>\"",
+ "sqlExpr" : "\"vector_l2_distance(array(1.0, 2.0, 3.0), NULL)\""
+ },
+ "queryContext" : [ {
+ "objectType" : "",
+ "objectName" : "",
+ "startIndex" : 8,
+ "stopIndex" : 56,
+ "fragment" : "vector_l2_distance(array(1.0F, 2.0F, 3.0F), NULL)"
+ } ]
+}
+
+
+-- !query
+SELECT vector_cosine_similarity(array(1.0F, CAST(NULL AS FLOAT), 3.0F), array(1.0F, 2.0F, 3.0F))
+-- !query schema
+struct<vector_cosine_similarity(array(1.0, CAST(NULL AS FLOAT), 3.0), array(1.0, 2.0, 3.0)):float>
+-- !query output
+NULL
+
+
+-- !query
+SELECT vector_inner_product(array(1.0F, CAST(NULL AS FLOAT), 3.0F), array(1.0F, 2.0F, 3.0F))
+-- !query schema
+struct<vector_inner_product(array(1.0, CAST(NULL AS FLOAT), 3.0), array(1.0, 2.0, 3.0)):float>
+-- !query output
+NULL
+
+
+-- !query
+SELECT vector_l2_distance(array(1.0F, CAST(NULL AS FLOAT), 3.0F), array(1.0F, 2.0F, 3.0F))
+-- !query schema
+struct<vector_l2_distance(array(1.0, CAST(NULL AS FLOAT), 3.0), array(1.0, 2.0, 3.0)):float>
+-- !query output
+NULL
+
+
+-- !query
+SELECT vector_cosine_similarity(array(1.0F, 2.0F, 3.0F), array(1.0F, 2.0F))
+-- !query schema
+struct<>
+-- !query output
+org.apache.spark.SparkRuntimeException
+{
+ "errorClass" : "VECTOR_DIMENSION_MISMATCH",
+ "sqlState" : "22000",
+ "messageParameters" : {
+ "functionName" : "`vector_cosine_similarity`",
+ "leftDim" : "3",
+ "rightDim" : "2"
+ }
+}
+
+
+-- !query
+SELECT vector_inner_product(array(1.0F, 2.0F, 3.0F), array(1.0F, 2.0F))
+-- !query schema
+struct<>
+-- !query output
+org.apache.spark.SparkRuntimeException
+{
+ "errorClass" : "VECTOR_DIMENSION_MISMATCH",
+ "sqlState" : "22000",
+ "messageParameters" : {
+ "functionName" : "`vector_inner_product`",
+ "leftDim" : "3",
+ "rightDim" : "2"
+ }
+}
+
+
+-- !query
+SELECT vector_l2_distance(array(1.0F, 2.0F, 3.0F), array(1.0F, 2.0F))
+-- !query schema
+struct<>
+-- !query output
+org.apache.spark.SparkRuntimeException
+{
+ "errorClass" : "VECTOR_DIMENSION_MISMATCH",
+ "sqlState" : "22000",
+ "messageParameters" : {
+ "functionName" : "`vector_l2_distance`",
+ "leftDim" : "3",
+ "rightDim" : "2"
+ }
+}
+
+
+-- !query
+SELECT vector_cosine_similarity(array(1.0D, 2.0D, 3.0D), array(4.0D, 5.0D, 6.0D))
+-- !query schema
+struct<>
+-- !query output
+org.apache.spark.sql.catalyst.ExtendedAnalysisException
+{
+ "errorClass" : "DATATYPE_MISMATCH.UNEXPECTED_INPUT_TYPE",
+ "sqlState" : "42K09",
+ "messageParameters" : {
+ "inputSql" : "\"array(1.0, 2.0, 3.0)\"",
+ "inputType" : "\"ARRAY<DOUBLE>\"",
+ "paramIndex" : "first",
+ "requiredType" : "\"ARRAY<FLOAT>\"",
+    "sqlExpr" : "\"vector_cosine_similarity(array(1.0, 2.0, 3.0), array(4.0, 5.0, 6.0))\""
+ },
+ "queryContext" : [ {
+ "objectType" : "",
+ "objectName" : "",
+ "startIndex" : 8,
+ "stopIndex" : 81,
+    "fragment" : "vector_cosine_similarity(array(1.0D, 2.0D, 3.0D), array(4.0D, 5.0D, 6.0D))"
+ } ]
+}
+
+
+-- !query
+SELECT vector_cosine_similarity(array(1, 2, 3), array(4, 5, 6))
+-- !query schema
+struct<>
+-- !query output
+org.apache.spark.sql.catalyst.ExtendedAnalysisException
+{
+ "errorClass" : "DATATYPE_MISMATCH.UNEXPECTED_INPUT_TYPE",
+ "sqlState" : "42K09",
+ "messageParameters" : {
+ "inputSql" : "\"array(1, 2, 3)\"",
+ "inputType" : "\"ARRAY<INT>\"",
+ "paramIndex" : "first",
+ "requiredType" : "\"ARRAY<FLOAT>\"",
+ "sqlExpr" : "\"vector_cosine_similarity(array(1, 2, 3), array(4, 5, 6))\""
+ },
+ "queryContext" : [ {
+ "objectType" : "",
+ "objectName" : "",
+ "startIndex" : 8,
+ "stopIndex" : 63,
+ "fragment" : "vector_cosine_similarity(array(1, 2, 3), array(4, 5, 6))"
+ } ]
+}
+
+
+-- !query
+SELECT vector_inner_product(array(1.0D, 2.0D, 3.0D), array(4.0D, 5.0D, 6.0D))
+-- !query schema
+struct<>
+-- !query output
+org.apache.spark.sql.catalyst.ExtendedAnalysisException
+{
+ "errorClass" : "DATATYPE_MISMATCH.UNEXPECTED_INPUT_TYPE",
+ "sqlState" : "42K09",
+ "messageParameters" : {
+ "inputSql" : "\"array(1.0, 2.0, 3.0)\"",
+ "inputType" : "\"ARRAY<DOUBLE>\"",
+ "paramIndex" : "first",
+ "requiredType" : "\"ARRAY<FLOAT>\"",
+    "sqlExpr" : "\"vector_inner_product(array(1.0, 2.0, 3.0), array(4.0, 5.0, 6.0))\""
+ },
+ "queryContext" : [ {
+ "objectType" : "",
+ "objectName" : "",
+ "startIndex" : 8,
+ "stopIndex" : 77,
+    "fragment" : "vector_inner_product(array(1.0D, 2.0D, 3.0D), array(4.0D, 5.0D, 6.0D))"
+ } ]
+}
+
+
+-- !query
+SELECT vector_inner_product(array(1, 2, 3), array(4, 5, 6))
+-- !query schema
+struct<>
+-- !query output
+org.apache.spark.sql.catalyst.ExtendedAnalysisException
+{
+ "errorClass" : "DATATYPE_MISMATCH.UNEXPECTED_INPUT_TYPE",
+ "sqlState" : "42K09",
+ "messageParameters" : {
+ "inputSql" : "\"array(1, 2, 3)\"",
+ "inputType" : "\"ARRAY<INT>\"",
+ "paramIndex" : "first",
+ "requiredType" : "\"ARRAY<FLOAT>\"",
+ "sqlExpr" : "\"vector_inner_product(array(1, 2, 3), array(4, 5, 6))\""
+ },
+ "queryContext" : [ {
+ "objectType" : "",
+ "objectName" : "",
+ "startIndex" : 8,
+ "stopIndex" : 59,
+ "fragment" : "vector_inner_product(array(1, 2, 3), array(4, 5, 6))"
+ } ]
+}
+
+
+-- !query
+SELECT vector_l2_distance(array(1.0D, 2.0D, 3.0D), array(4.0D, 5.0D, 6.0D))
+-- !query schema
+struct<>
+-- !query output
+org.apache.spark.sql.catalyst.ExtendedAnalysisException
+{
+ "errorClass" : "DATATYPE_MISMATCH.UNEXPECTED_INPUT_TYPE",
+ "sqlState" : "42K09",
+ "messageParameters" : {
+ "inputSql" : "\"array(1.0, 2.0, 3.0)\"",
+ "inputType" : "\"ARRAY<DOUBLE>\"",
+ "paramIndex" : "first",
+ "requiredType" : "\"ARRAY<FLOAT>\"",
+    "sqlExpr" : "\"vector_l2_distance(array(1.0, 2.0, 3.0), array(4.0, 5.0, 6.0))\""
+ },
+ "queryContext" : [ {
+ "objectType" : "",
+ "objectName" : "",
+ "startIndex" : 8,
+ "stopIndex" : 75,
+    "fragment" : "vector_l2_distance(array(1.0D, 2.0D, 3.0D), array(4.0D, 5.0D, 6.0D))"
+ } ]
+}
+
+
+-- !query
+SELECT vector_l2_distance(array(1, 2, 3), array(4, 5, 6))
+-- !query schema
+struct<>
+-- !query output
+org.apache.spark.sql.catalyst.ExtendedAnalysisException
+{
+ "errorClass" : "DATATYPE_MISMATCH.UNEXPECTED_INPUT_TYPE",
+ "sqlState" : "42K09",
+ "messageParameters" : {
+ "inputSql" : "\"array(1, 2, 3)\"",
+ "inputType" : "\"ARRAY<INT>\"",
+ "paramIndex" : "first",
+ "requiredType" : "\"ARRAY<FLOAT>\"",
+ "sqlExpr" : "\"vector_l2_distance(array(1, 2, 3), array(4, 5, 6))\""
+ },
+ "queryContext" : [ {
+ "objectType" : "",
+ "objectName" : "",
+ "startIndex" : 8,
+ "stopIndex" : 57,
+ "fragment" : "vector_l2_distance(array(1, 2, 3), array(4, 5, 6))"
+ } ]
+}
+
+
+-- !query
+SELECT vector_cosine_similarity(array(1.0F, 2.0F, 3.0F), array(1.0D, 2.0D, 3.0D))
+-- !query schema
+struct<>
+-- !query output
+org.apache.spark.sql.catalyst.ExtendedAnalysisException
+{
+ "errorClass" : "DATATYPE_MISMATCH.UNEXPECTED_INPUT_TYPE",
+ "sqlState" : "42K09",
+ "messageParameters" : {
+ "inputSql" : "\"array(1.0, 2.0, 3.0)\"",
+ "inputType" : "\"ARRAY<DOUBLE>\"",
+ "paramIndex" : "second",
+ "requiredType" : "\"ARRAY<FLOAT>\"",
+    "sqlExpr" : "\"vector_cosine_similarity(array(1.0, 2.0, 3.0), array(1.0, 2.0, 3.0))\""
+ },
+ "queryContext" : [ {
+ "objectType" : "",
+ "objectName" : "",
+ "startIndex" : 8,
+ "stopIndex" : 81,
+    "fragment" : "vector_cosine_similarity(array(1.0F, 2.0F, 3.0F), array(1.0D, 2.0D, 3.0D))"
+ } ]
+}
+
+
+-- !query
+SELECT vector_cosine_similarity('not an array', array(1.0F, 2.0F, 3.0F))
+-- !query schema
+struct<>
+-- !query output
+org.apache.spark.sql.catalyst.ExtendedAnalysisException
+{
+ "errorClass" : "DATATYPE_MISMATCH.UNEXPECTED_INPUT_TYPE",
+ "sqlState" : "42K09",
+ "messageParameters" : {
+ "inputSql" : "\"not an array\"",
+ "inputType" : "\"STRING\"",
+ "paramIndex" : "first",
+ "requiredType" : "\"ARRAY<FLOAT>\"",
+    "sqlExpr" : "\"vector_cosine_similarity(not an array, array(1.0, 2.0, 3.0))\""
+ },
+ "queryContext" : [ {
+ "objectType" : "",
+ "objectName" : "",
+ "startIndex" : 8,
+ "stopIndex" : 72,
+    "fragment" : "vector_cosine_similarity('not an array', array(1.0F, 2.0F, 3.0F))"
+ } ]
+}
+
+
+-- !query
+SELECT vector_cosine_similarity(array(1.0F, 2.0F, 3.0F), 'not an array')
+-- !query schema
+struct<>
+-- !query output
+org.apache.spark.sql.catalyst.ExtendedAnalysisException
+{
+ "errorClass" : "DATATYPE_MISMATCH.UNEXPECTED_INPUT_TYPE",
+ "sqlState" : "42K09",
+ "messageParameters" : {
+ "inputSql" : "\"not an array\"",
+ "inputType" : "\"STRING\"",
+ "paramIndex" : "second",
+ "requiredType" : "\"ARRAY<FLOAT>\"",
+    "sqlExpr" : "\"vector_cosine_similarity(array(1.0, 2.0, 3.0), not an array)\""
+ },
+ "queryContext" : [ {
+ "objectType" : "",
+ "objectName" : "",
+ "startIndex" : 8,
+ "stopIndex" : 72,
+    "fragment" : "vector_cosine_similarity(array(1.0F, 2.0F, 3.0F), 'not an array')"
+ } ]
+}
+
+
+-- !query
+SELECT vector_cosine_similarity(123, 456)
+-- !query schema
+struct<>
+-- !query output
+org.apache.spark.sql.catalyst.ExtendedAnalysisException
+{
+ "errorClass" : "DATATYPE_MISMATCH.UNEXPECTED_INPUT_TYPE",
+ "sqlState" : "42K09",
+ "messageParameters" : {
+ "inputSql" : "\"123\"",
+ "inputType" : "\"INT\"",
+ "paramIndex" : "first",
+ "requiredType" : "\"ARRAY<FLOAT>\"",
+ "sqlExpr" : "\"vector_cosine_similarity(123, 456)\""
+ },
+ "queryContext" : [ {
+ "objectType" : "",
+ "objectName" : "",
+ "startIndex" : 8,
+ "stopIndex" : 41,
+ "fragment" : "vector_cosine_similarity(123, 456)"
+ } ]
+}
+
+
+-- !query
+SELECT vector_inner_product('not an array', array(1.0F, 2.0F, 3.0F))
+-- !query schema
+struct<>
+-- !query output
+org.apache.spark.sql.catalyst.ExtendedAnalysisException
+{
+ "errorClass" : "DATATYPE_MISMATCH.UNEXPECTED_INPUT_TYPE",
+ "sqlState" : "42K09",
+ "messageParameters" : {
+ "inputSql" : "\"not an array\"",
+ "inputType" : "\"STRING\"",
+ "paramIndex" : "first",
+ "requiredType" : "\"ARRAY<FLOAT>\"",
+ "sqlExpr" : "\"vector_inner_product(not an array, array(1.0, 2.0, 3.0))\""
+ },
+ "queryContext" : [ {
+ "objectType" : "",
+ "objectName" : "",
+ "startIndex" : 8,
+ "stopIndex" : 68,
+    "fragment" : "vector_inner_product('not an array', array(1.0F, 2.0F, 3.0F))"
+ } ]
+}
+
+
+-- !query
+SELECT vector_inner_product(array(1.0F, 2.0F, 3.0F), 'not an array')
+-- !query schema
+struct<>
+-- !query output
+org.apache.spark.sql.catalyst.ExtendedAnalysisException
+{
+ "errorClass" : "DATATYPE_MISMATCH.UNEXPECTED_INPUT_TYPE",
+ "sqlState" : "42K09",
+ "messageParameters" : {
+ "inputSql" : "\"not an array\"",
+ "inputType" : "\"STRING\"",
+ "paramIndex" : "second",
+ "requiredType" : "\"ARRAY<FLOAT>\"",
+ "sqlExpr" : "\"vector_inner_product(array(1.0, 2.0, 3.0), not an array)\""
+ },
+ "queryContext" : [ {
+ "objectType" : "",
+ "objectName" : "",
+ "startIndex" : 8,
+ "stopIndex" : 68,
+    "fragment" : "vector_inner_product(array(1.0F, 2.0F, 3.0F), 'not an array')"
+ } ]
+}
+
+
+-- !query
+SELECT vector_l2_distance('not an array', array(1.0F, 2.0F, 3.0F))
+-- !query schema
+struct<>
+-- !query output
+org.apache.spark.sql.catalyst.ExtendedAnalysisException
+{
+ "errorClass" : "DATATYPE_MISMATCH.UNEXPECTED_INPUT_TYPE",
+ "sqlState" : "42K09",
+ "messageParameters" : {
+ "inputSql" : "\"not an array\"",
+ "inputType" : "\"STRING\"",
+ "paramIndex" : "first",
+ "requiredType" : "\"ARRAY<FLOAT>\"",
+ "sqlExpr" : "\"vector_l2_distance(not an array, array(1.0, 2.0, 3.0))\""
+ },
+ "queryContext" : [ {
+ "objectType" : "",
+ "objectName" : "",
+ "startIndex" : 8,
+ "stopIndex" : 66,
+ "fragment" : "vector_l2_distance('not an array', array(1.0F, 2.0F, 3.0F))"
+ } ]
+}
+
+
+-- !query
+SELECT vector_l2_distance(array(1.0F, 2.0F, 3.0F), 'not an array')
+-- !query schema
+struct<>
+-- !query output
+org.apache.spark.sql.catalyst.ExtendedAnalysisException
+{
+ "errorClass" : "DATATYPE_MISMATCH.UNEXPECTED_INPUT_TYPE",
+ "sqlState" : "42K09",
+ "messageParameters" : {
+ "inputSql" : "\"not an array\"",
+ "inputType" : "\"STRING\"",
+ "paramIndex" : "second",
+ "requiredType" : "\"ARRAY<FLOAT>\"",
+ "sqlExpr" : "\"vector_l2_distance(array(1.0, 2.0, 3.0), not an array)\""
+ },
+ "queryContext" : [ {
+ "objectType" : "",
+ "objectName" : "",
+ "startIndex" : 8,
+ "stopIndex" : 66,
+ "fragment" : "vector_l2_distance(array(1.0F, 2.0F, 3.0F), 'not an array')"
+ } ]
+}
+
+
+-- !query
+SELECT vector_inner_product(
+ array(1.0F, 2.0F, 3.0F, 4.0F, 5.0F, 6.0F, 7.0F, 8.0F, 9.0F, 10.0F, 11.0F, 12.0F, 13.0F, 14.0F, 15.0F, 16.0F),
+ array(16.0F, 15.0F, 14.0F, 13.0F, 12.0F, 11.0F, 10.0F, 9.0F, 8.0F, 7.0F, 6.0F, 5.0F, 4.0F, 3.0F, 2.0F, 1.0F)
+)
+-- !query schema
+struct<vector_inner_product(array(1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0), array(16.0, 15.0, 14.0, 13.0, 12.0, 11.0, 10.0, 9.0, 8.0, 7.0, 6.0, 5.0, 4.0, 3.0, 2.0, 1.0)):float>
+-- !query output
+816.0
+
+
+-- !query
+SELECT vector_l2_distance(
+ array(1.0F, 2.0F, 3.0F, 4.0F, 5.0F, 6.0F, 7.0F, 8.0F, 9.0F, 10.0F, 11.0F, 12.0F, 13.0F, 14.0F, 15.0F, 16.0F),
+ array(16.0F, 15.0F, 14.0F, 13.0F, 12.0F, 11.0F, 10.0F, 9.0F, 8.0F, 7.0F, 6.0F, 5.0F, 4.0F, 3.0F, 2.0F, 1.0F)
+)
+-- !query schema
+struct<vector_l2_distance(array(1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0), array(16.0, 15.0, 14.0, 13.0, 12.0, 11.0, 10.0, 9.0, 8.0, 7.0, 6.0, 5.0, 4.0, 3.0, 2.0, 1.0)):float>
+-- !query output
+36.878178
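As a cross-check on the cosine-similarity golden values above, cosine([1, 2, 3], [4, 5, 6]) = 32 / (sqrt(14) * sqrt(77)) ≈ 0.9746318, which matches the 0.97463185 float shown. Below is a minimal sketch of that computation under the documented semantics (None here standing in for SQL NULL on zero-magnitude or empty inputs); double-precision accumulation is an assumption, not taken from the expression's codegen:

```scala
// Illustrative only; not the expression's actual implementation or codegen.
def cosineSimilarity(a: Array[Float], b: Array[Float]): Option[Float] = {
  require(a.length == b.length, "dimension mismatch")
  var dot = 0.0; var normA = 0.0; var normB = 0.0
  var i = 0
  while (i < a.length) {
    dot += a(i).toDouble * b(i)
    normA += a(i).toDouble * a(i)
    normB += b(i).toDouble * b(i)
    i += 1
  }
  // Zero-magnitude (or empty) vectors have no defined cosine similarity.
  if (normA == 0.0 || normB == 0.0) None
  else Some((dot / (math.sqrt(normA) * math.sqrt(normB))).toFloat)
}

cosineSimilarity(Array(1.0f, 2.0f, 3.0f), Array(4.0f, 5.0f, 6.0f)) // Some(0.97463185)
cosineSimilarity(Array(0.0f, 0.0f, 0.0f), Array(1.0f, 2.0f, 3.0f)) // None, i.e. NULL
```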
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/expressions/ExpressionInfoSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/expressions/ExpressionInfoSuite.scala
index 33f128409f7e..2b1d4e7e7df4 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/expressions/ExpressionInfoSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/expressions/ExpressionInfoSuite.scala
@@ -60,7 +60,7 @@ class ExpressionInfoSuite extends SparkFunSuite with SharedSparkSession {
 "predicate_funcs", "conditional_funcs", "conversion_funcs", "csv_funcs", "datetime_funcs",
 "generator_funcs", "hash_funcs", "json_funcs", "lambda_funcs", "map_funcs", "math_funcs",
 "misc_funcs", "string_funcs", "struct_funcs", "window_funcs", "xml_funcs", "table_funcs",
- "url_funcs", "variant_funcs", "st_funcs").sorted
+ "url_funcs", "variant_funcs", "vector_funcs", "st_funcs").sorted
val invalidGroupName = "invalid_group_funcs"
checkError(
exception = intercept[SparkIllegalArgumentException] {
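The ExpressionInfoSuite change above only adds "vector_funcs" to the set of group names the suite accepts in expression documentation. A standalone sketch of that invariant, listing just the group names visible in this hunk (the real test assembles the full list):

```scala
// Not part of the patch: illustrates the group-name whitelist the test checks.
object ExpressionGroupSketch {
  // Subset of the valid groups shown in the hunk above, plus the new entry.
  val validGroups: Set[String] = Set(
    "predicate_funcs", "conditional_funcs", "conversion_funcs", "csv_funcs", "datetime_funcs",
    "generator_funcs", "hash_funcs", "json_funcs", "lambda_funcs", "map_funcs", "math_funcs",
    "misc_funcs", "string_funcs", "struct_funcs", "window_funcs", "xml_funcs", "table_funcs",
    "url_funcs", "variant_funcs", "vector_funcs", "st_funcs")

  def main(args: Array[String]): Unit = {
    require(validGroups.contains("vector_funcs"))          // accepted after this change
    require(!validGroups.contains("invalid_group_funcs"))  // still rejected, as the test expects
  }
}
```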
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]