gratus00 opened a new pull request, #21837:
URL: https://github.com/apache/datafusion/pull/21837

   ## Which issue does this PR close?
   
   Partially addresses #21543.
   
   ## Rationale for this change
   
   #21543 points out that the existing sort benchmark does not cover an 
important ExternalSorter shape: sorting by a cheap key while carrying non-key 
payload columns through the sort.
   
   The existing tuple/string/dictionary cases sort by every column via 
`make_sort_exprs(schema)`. That is useful coverage, but it does not model 
queries like `ORDER BY key` over rows that also contain UTF-8 or dictionary 
payload columns. This PR adds benchmark coverage for that case, following the 
discussion in #21543 and the benchmark/use-case direction from #21688.
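
   As a rough illustration of the difference between the two shapes, here is a 
standalone sketch that uses only the Rust standard library. It is not the 
benchmark code and does not use Arrow or DataFusion types; `Row`, 
`sort_all_columns`, and `sort_key_only` are hypothetical names chosen for this 
example.

   ```rust
   /// Hypothetical row: a cheap i64 sort key plus a UTF-8 payload column.
   #[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
   struct Row {
       key: i64,
       payload: String,
   }

   /// Sort by every column, as the existing cases do: ties on `key` are
   /// broken by comparing `payload`, so the strings take part in comparisons.
   fn sort_all_columns(mut rows: Vec<Row>) -> Vec<Row> {
       // The derived `Ord` compares fields in declaration order: key, then payload.
       rows.sort();
       rows
   }

   /// Sort by `key` only: `payload` travels with its row but is never compared.
   fn sort_key_only(mut rows: Vec<Row>) -> Vec<Row> {
       // `sort_by_key` is stable, so rows with equal keys keep their input order.
       rows.sort_by_key(|r| r.key);
       rows
   }
   ```

   In the key-only case the payload only has to be moved, not compared, which 
is the cost profile the new benchmark cases are meant to exercise.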
   
   I used Codex to help draft parts of this benchmark change and the PR 
description. I reviewed and adjusted the resulting code locally before opening 
the PR.
   
   ## What changes are included in this PR?
   
   - Change the sort benchmark batch size from `1024` to `8192`, matching 
DataFusion's default batch size of `8192`
   - Add key-only sort benchmark cases:
     - `i64 key utf8 payload`
     - `i64 key dictionary payload`
   - Add selected-column sort helpers for those new benchmark cases, so they 
sort only by `key` while carrying payload columns
   - Keep the existing benchmark cases sorting all columns as before
   
   The new cases are run across the existing four benchmark shapes:
   
   - `merge sorted`
   - `sort merge`
   - `sort`
   - `sort partitioned`
   
   ## Are these changes tested?
   
   Focused checks run locally:
   
   ```bash
   cargo fmt --all --check
   cargo check -p datafusion --bench sort
   cargo clippy -p datafusion --bench sort --all-features -- -D warnings
   ```
   
   ## Are there any user-facing changes?
   
   No. This is a benchmark-only change.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

