crm26 opened a new issue, #21536: URL: https://github.com/apache/datafusion/issues/21536
## Summary

This issue tracks adding vector math and array aggregate scalar functions to DataFusion. These close gaps versus DuckDB and LanceDB for vector search and array analytics workloads.

Replaces #21371 and #21376, which were requested to be split into function-per-PR submissions (per @alamb's review).

## Functions

### Vector math (with shared `vector_math.rs` primitives)

| Function | Signature | Reference |
|----------|-----------|-----------|
| `cosine_distance` | `(array, array) → float64` | [DuckDB `array_cosine_similarity`](https://duckdb.org/docs/sql/functions/array.html) |
| `inner_product` | `(array, array) → float64` | [DuckDB `array_inner_product`](https://duckdb.org/docs/sql/functions/array.html) |
| `array_normalize` | `(array) → array` | NumPy / scipy convention |

### Array element-wise math

| Function | Signature | Reference |
|----------|-----------|-----------|
| `array_add` | `(array, array) → array` | Element-wise addition |
| `array_subtract` | `(array, array) → array` | Element-wise subtraction |
| `array_scale` | `(array, scalar) → array` | Scalar multiply |

### Array aggregate scalars

| Function | Signature | Reference |
|----------|-----------|-----------|
| `array_sum` / `list_sum` | `(array) → numeric` | [DuckDB `list_sum`](https://duckdb.org/docs/sql/functions/list.html#list_sumlist) |
| `array_product` / `list_product` | `(array) → numeric` | [DuckDB `list_product`](https://duckdb.org/docs/sql/functions/list.html) |
| `array_avg` / `list_avg` | `(array) → float64` | [DuckDB `list_avg`](https://duckdb.org/docs/sql/functions/list.html) |

### Alias fix

| Fix | Description |
|-----|-------------|
| `list_min` | Missing alias on `ArrayMin` (parity with existing `list_max` on `ArrayMax`) |

## Submission plan

One PR per function, submitted serially. Each PR will reference this issue.

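To make the intended semantics of the vector math and element-wise functions concrete, here is a minimal sketch over plain `f64` slices. It is not the proposed DataFusion implementation and does not use Arrow arrays. The helper `l2_norm`, the choice of the L2 norm for `array_normalize`, and the `Option`-based handling of length mismatches and zero-norm inputs are all assumptions made for illustration. The issue does not specify any of them.

```rust
/// Dot product of two equal-length vectors.
/// Returning `None` on a length mismatch is an assumption of this sketch.
pub fn inner_product(a: &[f64], b: &[f64]) -> Option<f64> {
    if a.len() != b.len() {
        return None;
    }
    Some(a.iter().zip(b).map(|(x, y)| x * y).sum())
}

/// Euclidean (L2) norm. Hypothetical shared primitive, not named in the issue.
pub fn l2_norm(a: &[f64]) -> f64 {
    a.iter().map(|x| x * x).sum::<f64>().sqrt()
}

/// Cosine distance, computed as 1 - cosine similarity.
/// Returning `None` for a zero-norm input is an assumption of this sketch.
pub fn cosine_distance(a: &[f64], b: &[f64]) -> Option<f64> {
    let dot = inner_product(a, b)?;
    let denom = l2_norm(a) * l2_norm(b);
    if denom == 0.0 {
        return None;
    }
    Some(1.0 - dot / denom)
}

/// Scale a vector to unit L2 length (assumed normalization convention).
pub fn array_normalize(a: &[f64]) -> Option<Vec<f64>> {
    let norm = l2_norm(a);
    if norm == 0.0 {
        return None;
    }
    Some(a.iter().map(|x| x / norm).collect())
}

/// Element-wise addition of two equal-length vectors.
pub fn array_add(a: &[f64], b: &[f64]) -> Option<Vec<f64>> {
    if a.len() != b.len() {
        return None;
    }
    Some(a.iter().zip(b).map(|(x, y)| x + y).collect())
}

/// Multiply every element by a scalar.
pub fn array_scale(a: &[f64], k: f64) -> Vec<f64> {
    a.iter().map(|x| x * k).collect()
}
```

A real implementation would also need to decide how null elements and null lists propagate. That question is left open here because the issue does not address it.
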
## References

- [DuckDB list functions](https://duckdb.org/docs/sql/functions/list.html)
- [DuckDB array functions](https://duckdb.org/docs/sql/functions/array.html)
- [LanceDB distance metrics](https://lancedb.github.io/lancedb/)
- [Trino array functions](https://trino.io/docs/current/functions/array.html)
