gratus00 opened a new pull request, #21837:
URL: https://github.com/apache/datafusion/pull/21837
## Which issue does this PR close?
Partially addresses #21543.
## Rationale for this change
#21543 points out that the existing sort benchmark does not cover an
important ExternalSorter shape: sorting by a cheap key while carrying non-key
payload columns through the sort.
The existing tuple/string/dictionary cases sort by every column via
`make_sort_exprs(schema)`. That is useful coverage, but it does not model
queries like `ORDER BY key` over rows that also contain UTF-8 or dictionary
payload columns. This PR adds benchmark coverage for that case, following the
discussion in #21543 and the benchmark/use-case direction from #21688.
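
To make the "cheap key, carried payload" shape concrete, here is a minimal std-only sketch. It is not DataFusion code: `Batch` and `sort_by_key_only` are hypothetical names, and the columns are plain `Vec`s rather than Arrow arrays. It only illustrates that the comparisons touch the `i64` key while the payload is reordered afterwards.

```rust
/// Hypothetical column-oriented stand-in for a record batch with an i64 key
/// column and a UTF-8 payload column (not DataFusion's or Arrow's types).
struct Batch {
    key: Vec<i64>,
    payload: Vec<String>,
}

/// Sort by `key` only: compute a permutation from the key column, then
/// reorder every column, including the payload, with that permutation.
fn sort_by_key_only(batch: &Batch) -> Batch {
    let mut indices: Vec<usize> = (0..batch.key.len()).collect();
    // `sort_by_key` is stable, so rows with equal keys keep their input order.
    // Only the cheap i64 key is compared; the payload is never inspected here.
    indices.sort_by_key(|&i| batch.key[i]);
    Batch {
        key: indices.iter().map(|&i| batch.key[i]).collect(),
        // The payload is carried through the sort by copying it in key order.
        payload: indices.iter().map(|&i| batch.payload[i].clone()).collect(),
    }
}
```

In this sketch the payload affects only the cost of moving rows, not the cost of comparing them, which is the aspect the new benchmark cases are meant to expose.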
I used Codex to help draft parts of this benchmark change and the PR
description. I reviewed and adjusted the resulting code locally before opening
the PR.
## What changes are included in this PR?
- Change the sort benchmark batch size from `1024` to `8192`, matching
  DataFusion's default target batch size more closely
- Add key-only sort benchmark cases:
  - `i64 key utf8 payload`
  - `i64 key dictionary payload`
- Add selected-column sort helpers for those new benchmark cases, so they
  sort only by `key` while carrying payload columns
- Keep the existing benchmark cases sorting all columns as before
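
The difference between sorting by every column and sorting by selected columns can be sketched as below. This is an illustration under stated assumptions, not the helper added in this PR: `all_sort_columns` and `selected_sort_columns` are hypothetical names, and the schema is modeled as a slice of column names instead of an Arrow schema.

```rust
/// Indices of all columns: the existing "sort by every column" shape.
fn all_sort_columns(schema: &[&str]) -> Vec<usize> {
    (0..schema.len()).collect()
}

/// Indices of only the named sort columns, in the order requested.
/// Returns `None` if any requested name is missing from the schema.
fn selected_sort_columns(schema: &[&str], sort_by: &[&str]) -> Option<Vec<usize>> {
    sort_by
        .iter()
        // Look up each requested name; a missing name yields `None`,
        // which makes the whole collected result `None`.
        .map(|name| schema.iter().position(|col| col == name))
        .collect()
}
```

For a schema of `["key", "payload"]`, the first function selects both columns, while `selected_sort_columns(&schema, &["key"])` selects only column 0 and leaves `payload` as a carried column.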
The new cases are run across the existing four benchmark shapes:
- `merge sorted`
- `sort merge`
- `sort`
- `sort partitioned`
## Are these changes tested?
Focused checks run locally:
```bash
cargo fmt --all --check
cargo check -p datafusion --bench sort
cargo clippy -p datafusion --bench sort --all-features -- -D warnings
```
## Are there any user-facing changes?
No. This is a benchmark-only change.