Fly-a-Kite opened a new issue, #22554:
URL: https://github.com/apache/datafusion/issues/22554
### Describe the bug
`SELECT DISTINCT v FROM t0 ORDER BY v ASC NULLS FIRST LIMIT 1` should return
the first row of the same ordered `DISTINCT` query without `LIMIT`. On
DataFusion `53.0.0`, the full ordered query puts `NULL` first, but adding
`LIMIT 1` returns the first non-null value for string, integer, and float
columns.
### To Reproduce
## Environment
```text
OS: Ubuntu 24.04.1 x86_64
Python: 3.12.3
datafusion: 53.0.0
pyarrow: 24.0.0
```
## Reproduce
```python
import pyarrow as pa
from datafusion import SessionContext
ctx = SessionContext()
batch = pa.RecordBatch.from_pylist(
[{"v": None}, {"v": ""}, {"v": "a"}],
schema=pa.schema([pa.field("v", pa.string(), nullable=True)]),
)
ctx.register_record_batches("t0", [[batch]])
full_sql = "SELECT DISTINCT v FROM t0 ORDER BY v ASC NULLS FIRST"
top1_sql = full_sql + " LIMIT 1"
full = ctx.sql(full_sql).collect()[0].to_pydict()["v"]
top1 = ctx.sql(top1_sql).collect()[0].to_pydict()["v"]
print("full:", full)
print("top1:", top1)
assert top1 == full[:1]
```
## Output
```text
full: [None, '', 'a']
top1: [None]
```
### Expected behavior
```text
full: [None, '', 'a']
top1: ['']
```
### Additional context
_No response_
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]