andygrove opened a new issue, #21512:
URL: https://github.com/apache/datafusion/issues/21512
### Describe the bug
The `datafusion-spark` implementation of `array_repeat` incorrectly returns
NULL when the first argument (element) is NULL. In Apache Spark, only a NULL
count (second argument) produces a NULL result; a NULL element is repeated
into the array like any other value.
### To Reproduce
**PySpark (correct behavior):**
```sql
SELECT array_repeat(NULL, 2); -- [NULL, NULL]
SELECT array_repeat(NULL, 1); -- [NULL]
SELECT array_repeat(NULL, 0); -- []
SELECT array_repeat('x', NULL); -- NULL
```
**DataFusion-spark (incorrect behavior):**
```sql
SELECT array_repeat(NULL, 2); -- NULL (should be [NULL, NULL])
SELECT array_repeat(NULL, 1); -- NULL (should be [NULL])
SELECT array_repeat(NULL, 0); -- NULL (should be [])
SELECT array_repeat('x', NULL); -- NULL (correct)
```
The `.slt` test at
`datafusion/sqllogictest/test_files/spark/array/array_repeat.slt` line 59 has
the wrong expected value (`NULL` instead of `[NULL, NULL]`). Line 79 also has a
wrong expected value for the `(NULL, 1)` row (`NULL` instead of `[NULL]`).
### Expected behavior
| Expression | Spark result | datafusion-spark result |
|---|---|---|
| `array_repeat('x', 3)` | `[x, x, x]` | `[x, x, x]` ✓ |
| `array_repeat(NULL, 2)` | `[NULL, NULL]` | `NULL` ✗ |
| `array_repeat(NULL, 1)` | `[NULL]` | `NULL` ✗ |
| `array_repeat(NULL, 0)` | `[]` | `NULL` ✗ |
| `array_repeat('x', NULL)` | `NULL` | `NULL` ✓ |
### Additional context
**Root cause:** `SparkArrayRepeat::spark_array_repeat` in
`datafusion/spark/src/function/array/repeat.rs` uses `compute_null_mask` on all
arguments, which returns NULL if *any* argument is NULL. But `array_repeat`
should return NULL only when the count (second argument) is NULL. A NULL
element should be passed through to DataFusion's underlying `array_repeat`,
which correctly repeats it.
**Fix:** Only check the second argument (count) for NULL, not the first
argument (element).
The `.slt` expected values at lines 59 and 79 will also need to be corrected.
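
The intended null handling can be sketched as a small scalar model. This is only an illustration of the semantics described above, not the actual `datafusion-spark` code: `array_repeat_model` is a hypothetical name, it uses plain `Option` values instead of Arrow arrays, and clamping a negative count to zero is an assumption that this issue does not cover.

```rust
/// Hypothetical scalar model of the expected `array_repeat` null semantics.
/// `None` stands in for SQL NULL; this does not use DataFusion or Arrow types.
fn array_repeat_model<T: Clone>(
    element: Option<T>,
    count: Option<i64>,
) -> Option<Vec<Option<T>>> {
    // Only a NULL count produces a NULL result.
    let count = count?;
    // Assumption: a negative count is treated like zero (empty array).
    let n = count.max(0) as usize;
    // A NULL element is repeated like any other value.
    Some(vec![element; n])
}
```

With this model, `array_repeat_model::<&str>(None, Some(2))` yields `Some(vec![None, None])`, while `array_repeat_model(Some("x"), None)` yields `None`, matching the Spark column in the table above.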