andygrove opened a new pull request, #1623:
URL: https://github.com/apache/datafusion-ballista/pull/1623
# Which issue does this PR close?
Closes #.
# Rationale for this change
The sort-based shuffle writer was added as an opt-in alternative to the
hash-based writer. It produces `2 × N` files instead of `N × M` (where `N` is
input partitions and `M` is output partitions), coalesces small batches before
writing, and bounds shuffle memory via spill-to-disk. With recent improvements
landed in #1615 (byte-copy spill files and block-IO transport for remote
reads), the sort-based writer is the better default for most workloads,
especially anything with non-trivial partition fan-out where the hash writer's
`N × M` file count becomes a problem.
This PR makes sort-based shuffle the default. The hash-based writer is still
available for narrow shuffles where the buffering and merging overhead of the
sort-based writer is unnecessary.
# What changes are included in this PR?
- Flip the default of `ballista.shuffle.sort_based.enabled` from `false` to
`true` in `ballista/core/src/config.rs`.
- Update test expectations:
- `context_checks.rs` EXPLAIN and EXPLAIN ANALYZE plans now show
`SortShuffleWriterExec` for hash-partitioned stages (and the additional
`spill_*` metrics).
- `execution_graph_dot.rs` dot-graph fixtures show `SortShuffleWriter` for
hash-partitioned stages; terminal stages with `partitioning: None` continue to
use `ShuffleWriter`, since the planner only switches to sort-based when
partitioning is `Hash(...)`.
- `planner.rs` `distributed_window_plan` downcasts to
`SortShuffleWriterExec`.
- `sort_shuffle.rs` `create_hash_shuffle_context` now sets the flag
explicitly to `false` since it can no longer rely on the default.
- Update `docs/source/user-guide/tuning-guide.md` and
`docs/source/user-guide/configs.md` to mark sort-based shuffle as the default
and document how to opt in to the hash-based writer.
# Are there any user-facing changes?
Yes. Existing clusters that do not set `ballista.shuffle.sort_based.enabled`
will switch from the hash-based writer to the sort-based writer on upgrade. The
sort-based writer has different memory and disk-IO characteristics: it buffers
per-output-partition batches in memory up to
`ballista.shuffle.sort_based.memory_limit` (default 256 MiB) and spills to disk
past `spill_threshold` (default 80%). The hash-based writer remains available
by setting `ballista.shuffle.sort_based.enabled=false`.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]