andygrove opened a new pull request, #1634:
URL: https://github.com/apache/datafusion-ballista/pull/1634

   # Which issue does this PR close?
   
   Closes #.
   
   # Rationale for this change
   
   `shuffle_bench` currently times only the shuffle write phase, which makes it 
hard to compare the hash-based and sort-based shuffle writers fairly. The two 
writers produce very different on-disk layouts (`N x M` per-pair files vs.
`2 x N` consolidated files plus an index), so read-side cost is exactly where they 
should diverge, and today that cost is invisible.
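
To make the layout difference concrete, here is a rough sketch of the file-count arithmetic. It is not code from this PR. It assumes `N` is the number of input partitions and `M` the number of output partitions, and it reads `2 x N` as one data file plus one index file per input partition. The function names are hypothetical.

```rust
/// Files produced by a hash-style shuffle under the assumption above:
/// one file per (input partition, output partition) pair.
fn hash_shuffle_files(input_partitions: u64, output_partitions: u64) -> u64 {
    input_partitions * output_partitions
}

/// Files produced by a sort-style shuffle under the assumption above:
/// one consolidated data file plus one index file per input partition.
fn sort_shuffle_files(input_partitions: u64) -> u64 {
    2 * input_partitions
}
```

With 8 input partitions and 200 output partitions, for example, this gives 1600 files for the hash layout and 16 for the sort layout, so a reader opens many small files in one case and seeks within a few large ones in the other.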
   
   # What changes are included in this PR?
   
   - New `execute_shuffle_read` helper that drains every output partition the 
writer just produced, using the same local-read primitives the executor uses 
(`StreamReader` for hash output, `stream_sort_shuffle_partition` for sort 
output).
   - Paths come from `create_shuffle_path` so the read side cannot drift from 
what the writer wrote.
   - `run_iteration` now times write and read separately and returns an 
`IterationResult` struct; per-iteration log shows
`write / read / total / rows read`, and the Results block reports avg / min / max for each.
   - New `--skip-reads` flag skips the read phase and preserves today's 
write-only output for users who only want to profile the writer.
   - `arrow-ipc-optimizations` is forwarded from `ballista-benchmarks` to 
`ballista-core` so the bench's IPC reader matches what the executor does in 
release builds.
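
For readers who want the shape of the split timing without opening the diff, the following is a minimal, self-contained sketch rather than the PR's actual code. The struct fields, the closure-based signature, and the `summarize` helper are all assumptions made for illustration. Only the names `run_iteration` and `IterationResult` come from the description above.

```rust
use std::time::{Duration, Instant};

/// Hypothetical stand-in for the PR's `IterationResult`; the real fields
/// are not shown in the description.
#[derive(Debug, Clone, Copy)]
struct IterationResult {
    write: Duration,
    /// `None` when the read phase is skipped (as with `--skip-reads`).
    read: Option<Duration>,
    rows_read: usize,
}

impl IterationResult {
    /// Total wall time: write plus read, or write alone if reads were skipped.
    fn total(&self) -> Duration {
        self.write + self.read.unwrap_or_default()
    }
}

/// Times the write and read phases separately. The phases are passed in as
/// closures here purely so the sketch stays independent of Ballista types.
fn run_iteration<W, R>(write_phase: W, read_phase: Option<R>) -> IterationResult
where
    W: FnOnce(),
    R: FnOnce() -> usize, // returns the number of rows read
{
    let start = Instant::now();
    write_phase();
    let write = start.elapsed();

    let (read, rows_read) = match read_phase {
        Some(read_phase) => {
            let start = Instant::now();
            let rows = read_phase();
            (Some(start.elapsed()), rows)
        }
        None => (None, 0),
    };

    IterationResult { write, read, rows_read }
}

/// Returns (avg, min, max) over the samples, or `None` if there are none.
fn summarize(samples: &[Duration]) -> Option<(Duration, Duration, Duration)> {
    let min = *samples.iter().min()?;
    let max = *samples.iter().max()?;
    let sum: Duration = samples.iter().sum();
    Some((sum / samples.len() as u32, min, max))
}
```

Keeping `read` as an `Option` lets the skipped-read case stay distinct from a read that took zero time, which matters when averaging across iterations.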
   
   # Are there any user-facing changes?
   
   No public API changes. The bench's CLI gains an optional `--skip-reads` 
flag, and the default output now includes read and total timings alongside the 
existing write timings.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

