andygrove opened a new pull request, #1634: URL: https://github.com/apache/datafusion-ballista/pull/1634
# Which issue does this PR close?

Closes #.

# Rationale for this change

`shuffle_bench` currently times only the shuffle write phase, which makes it hard to compare the hash-based and sort-based shuffle writers fairly. The two writers produce very different on-disk layouts (`N x M` per-pair files vs. `2 x N` consolidated files plus an index), so read-side cost is exactly where they should diverge, and today that cost is invisible.

# What changes are included in this PR?

- New `execute_shuffle_read` helper that drains every output partition the writer just produced, using the same local-read primitives the executor uses (`StreamReader` for hash output, `stream_sort_shuffle_partition` for sort output).
- Paths come from `create_shuffle_path`, so the read side cannot drift from what the writer wrote.
- `run_iteration` now times write and read separately and returns an `IterationResult` struct. The per-iteration log shows `write / read / total / rows read`, and the Results block reports avg / min / max for each.
- A `--skip-reads` flag preserves today's write-only profiling output for users who only want to profile the writer.
- `arrow-ipc-optimizations` is forwarded from `ballista-benchmarks` to `ballista-core`, so the bench's IPC reader matches what the executor does in release builds.

# Are there any user-facing changes?

No public API changes. The bench's CLI gains an optional `--skip-reads` flag, and the default output now includes read and total timings alongside the existing write timings.
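As a rough illustration of timing write and read separately and reporting avg / min / max, here is a minimal std-only Rust sketch. It is not the PR's code: `time_iteration`, `summarize`, and the fields of the `IterationResult`-like struct (`write`, `read`, `rows_read`) are hypothetical, the write and read phases are stand-in closures rather than the real shuffle writers or `execute_shuffle_read`, and modelling `--skip-reads` as a `None` read time is an assumption.

```rust
use std::time::{Duration, Instant};

/// Hypothetical per-iteration result; field names are assumptions,
/// not the PR's actual `IterationResult` definition.
#[derive(Debug, Clone, Copy)]
struct IterationResult {
    write: Duration,
    /// `None` when reads are skipped (modelling a `--skip-reads` run).
    read: Option<Duration>,
    rows_read: usize,
}

impl IterationResult {
    /// Write time plus read time; just the write time when reads were skipped.
    fn total(&self) -> Duration {
        self.write + self.read.unwrap_or(Duration::ZERO)
    }
}

/// Returns (avg, min, max) over the samples, or `None` if there are none.
fn summarize(samples: &[Duration]) -> Option<(Duration, Duration, Duration)> {
    let min = *samples.iter().min()?;
    let max = *samples.iter().max()?;
    let sum: Duration = samples.iter().sum();
    let avg = sum / samples.len() as u32;
    Some((avg, min, max))
}

/// Hypothetical helper: times `write`, then (unless `skip_reads`) times `read`,
/// where `read` returns the number of rows it drained.
fn time_iteration<W, R>(skip_reads: bool, write: W, read: R) -> IterationResult
where
    W: FnOnce(),
    R: FnOnce() -> usize,
{
    let start = Instant::now();
    write();
    let write_time = start.elapsed();

    if skip_reads {
        return IterationResult { write: write_time, read: None, rows_read: 0 };
    }

    let start = Instant::now();
    let rows_read = read();
    IterationResult { write: write_time, read: Some(start.elapsed()), rows_read }
}

fn main() {
    // Stand-in phases: a real bench would run the shuffle writer and then
    // drain every output partition it produced.
    let results: Vec<IterationResult> = (0..3)
        .map(|_| time_iteration(false, || { /* write shuffle output */ }, || 1_000))
        .collect();

    let writes: Vec<Duration> = results.iter().map(|r| r.write).collect();
    let reads: Vec<Duration> = results.iter().filter_map(|r| r.read).collect();
    let totals: Vec<Duration> = results.iter().map(|r| r.total()).collect();
    let rows: usize = results.iter().map(|r| r.rows_read).sum();

    for (label, samples) in [("write", &writes), ("read", &reads), ("total", &totals)] {
        if let Some((avg, min, max)) = summarize(samples) {
            println!("{label}: avg={avg:?} min={min:?} max={max:?}");
        }
    }
    println!("rows read: {rows}");
}
```

Keeping the read time as an `Option` lets a summary simply omit the read line when reads are skipped, while `total` falls back to the write time alone.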
