andygrove opened a new pull request, #1638: URL: https://github.com/apache/datafusion-ballista/pull/1638
# Which issue does this PR close?

Closes #.

# Rationale for this change

`SortShuffleWriterExec` previously supported only `Partitioning::Hash`. Stages with `partitioning=None` (final output, `CoalescePartitionsExec`, `SortPreservingMergeExec`) had to fall back to `ShuffleWriterExec`, which left two writer code paths and two on-disk shuffle formats coexisting in executor work directories. Extending sort-shuffle to handle `None` lets every stage use one writer when sort-based shuffle is enabled, and it is a prerequisite for removing the legacy hash-based writer entirely.

# What changes are included in this PR?

- `SortShuffleWriterExec::try_new` now takes `Option<Partitioning>` (mirroring `ShuffleWriterExec`). When `None`, the writer skips hashing and writes a single-partition data+index file with every input row in bucket 0.
- `shuffle_output_partitioning()` returns `Option<&Partitioning>`.
- The encoder/decoder in `ballista_core::serde` accepts an absent `output_partitioning` field on `SortShuffleWriterExecNode` (the proto field is implicitly optional, so no proto change is needed).
- `DefaultDistributedPlanner::create_shuffle_writer_with_config` selects sort-shuffle for any supported partitioning (`Hash` or `None`) when `BALLISTA_SHUFFLE_SORT_BASED_ENABLED` is on, instead of falling back to `ShuffleWriterExec`.
- New unit tests exercise the writer with `None` and the planner routing decision.

# Are there any user-facing changes?

The signatures of `SortShuffleWriterExec::try_new` and `shuffle_output_partitioning()` change (`Partitioning` -> `Option<Partitioning>`). The `BALLISTA_SHUFFLE_SORT_BASED_ENABLED` config still controls the choice of writer; the difference is that, when it is enabled, all stages with a supported partitioning (`Hash` or `None`) use sort-shuffle, not just the hash-partitioned ones.
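The routing and bucketing rules described in this PR can be sketched roughly as follows. This is a minimal, standalone illustration and not the code in the PR: the `Partitioning` and `WriterKind` enums and the `choose_writer` and `bucket_for_row` functions are hypothetical stand-ins for the real DataFusion/Ballista types, and the modulo-based bucket assignment for `Hash` is an assumption made only for the example.

```rust
// Hypothetical stand-ins for illustration; these are NOT the real
// DataFusion / Ballista types or APIs.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Partitioning {
    Hash { num_partitions: usize },
    RoundRobin { num_partitions: usize },
}

#[allow(dead_code)]
#[derive(Debug, PartialEq)]
enum WriterKind {
    SortShuffle,
    LegacyShuffle,
}

/// Planner routing as described in the PR text: with sort-based shuffle
/// enabled, `Hash` and `None` both select the sort-shuffle writer; any
/// other partitioning, or a disabled flag, falls back to the legacy writer.
#[allow(dead_code)]
fn choose_writer(partitioning: Option<&Partitioning>, sort_based_enabled: bool) -> WriterKind {
    match (sort_based_enabled, partitioning) {
        (true, None) | (true, Some(Partitioning::Hash { .. })) => WriterKind::SortShuffle,
        _ => WriterKind::LegacyShuffle,
    }
}

/// Bucket assignment for one row. With `None`, hashing is skipped and every
/// row lands in bucket 0. For `Hash`, this sketch assumes a simple modulo of
/// a precomputed key hash. Unsupported partitionings yield `None`.
#[allow(dead_code)]
fn bucket_for_row(partitioning: Option<&Partitioning>, key_hash: u64) -> Option<usize> {
    match partitioning {
        None => Some(0),
        Some(Partitioning::Hash { num_partitions }) if *num_partitions > 0 => {
            Some((key_hash % *num_partitions as u64) as usize)
        }
        Some(_) => None,
    }
}
```

In this sketch, a stage with no output partitioning and the flag enabled routes to `WriterKind::SortShuffle`, and `bucket_for_row(None, h)` returns `Some(0)` for any hash `h`, which corresponds to the single-partition data+index file described above.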
