andygrove opened a new pull request, #1638:
URL: https://github.com/apache/datafusion-ballista/pull/1638

   # Which issue does this PR close?
   
   Closes #.
   
   # Rationale for this change
   
   `SortShuffleWriterExec` previously only supported `Partitioning::Hash`. 
Stages with `partitioning=None` (final output, `CoalescePartitionsExec`, 
`SortPreservingMergeExec`) had to fall back to `ShuffleWriterExec`, leaving two 
writer code paths and two on-disk shuffle formats coexisting in executor work 
directories. Extending sort-shuffle to handle `None` lets every stage use a
single writer when sort-based shuffle is enabled, and it is a prerequisite for
removing the legacy hash-based writer entirely.
   
   # What changes are included in this PR?
   
   - `SortShuffleWriterExec::try_new` now takes `Option<Partitioning>` 
(mirroring `ShuffleWriterExec`). When `None`, the writer skips hashing and 
writes a single-partition data+index file with every input row in bucket 0.
   - `shuffle_output_partitioning()` returns `Option<&Partitioning>`.
   - The encoder and decoder in `ballista_core::serde` handle an absent
`output_partitioning` field on `SortShuffleWriterExecNode` (the proto field is
already implicitly optional, so no proto change is needed).
   - `DefaultDistributedPlanner::create_shuffle_writer_with_config` selects 
sort-shuffle for any supported partitioning (`Hash` or `None`) when 
`BALLISTA_SHUFFLE_SORT_BASED_ENABLED` is on, instead of falling back to 
`ShuffleWriterExec`.
   - New unit tests exercise the writer with `None` and the planner routing 
decision.
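To illustrate the `None` path, here is a minimal sketch of how a sort-based writer might assign rows to buckets and derive an index of bucket offsets. Everything in it is hypothetical: `Partitioning` is a simplified stand-in for DataFusion's enum (hash keys are column indices rather than physical expressions), rows are modeled as `Vec<i64>`, hashing uses std's `DefaultHasher` instead of DataFusion's hash functions, and offsets count rows where the real index file would presumably record byte positions in the data file. None of these function names come from the Ballista codebase.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical, simplified stand-in for DataFusion's `Partitioning`.
/// Hash keys are column indices here, not physical expressions.
#[derive(Debug, Clone)]
pub enum Partitioning {
    Hash(Vec<usize>, usize),
    RoundRobinBatch(usize),
}

/// Number of output buckets (index entries) the writer would produce.
pub fn num_output_buckets(partitioning: Option<&Partitioning>) -> usize {
    match partitioning {
        // No partitioning: one data+index file with a single bucket.
        None => 1,
        Some(Partitioning::Hash(_, n)) | Some(Partitioning::RoundRobinBatch(n)) => *n,
    }
}

/// Assign each row to a bucket id.
pub fn assign_buckets(
    rows: &[Vec<i64>],
    partitioning: Option<&Partitioning>,
) -> Result<Vec<usize>, String> {
    match partitioning {
        // `None`: skip hashing entirely; every row lands in bucket 0.
        None => Ok(vec![0; rows.len()]),
        // `Hash`: hash the key columns and reduce modulo the bucket count.
        Some(Partitioning::Hash(keys, n)) => Ok(rows
            .iter()
            .map(|row| {
                let mut hasher = DefaultHasher::new();
                for &k in keys {
                    row[k].hash(&mut hasher);
                }
                (hasher.finish() % (*n as u64)) as usize
            })
            .collect()),
        // Anything else is treated as unsupported in this sketch.
        Some(other) => Err(format!("unsupported partitioning: {other:?}")),
    }
}

/// After sorting rows by bucket id, each bucket is a contiguous run.
/// Returns `num_buckets + 1` prefix sums: bucket `i` spans
/// `offsets[i]..offsets[i + 1]` (in rows, for this sketch).
pub fn bucket_offsets(buckets: &[usize], num_buckets: usize) -> Vec<usize> {
    let mut counts = vec![0usize; num_buckets];
    for &b in buckets {
        counts[b] += 1;
    }
    let mut offsets = Vec::with_capacity(num_buckets + 1);
    let mut running = 0;
    offsets.push(running);
    for c in counts {
        running += c;
        offsets.push(running);
    }
    offsets
}
```

With `None`, `num_output_buckets` is 1 and `bucket_offsets` yields `[0, rows.len()]`: a single index entry spanning the whole data file, which matches the single-partition data+index layout the first bullet describes.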
   
   # Are there any user-facing changes?
   
   The signatures of `SortShuffleWriterExec::try_new` and
`shuffle_output_partitioning()` change shape: the constructor argument goes from
`Partitioning` to `Option<Partitioning>`, and the accessor now returns
`Option<&Partitioning>`. The `BALLISTA_SHUFFLE_SORT_BASED_ENABLED` config still
controls the choice of writer; the difference is that when it is enabled, all
stages use sort-shuffle, not just the hash-partitioned ones.
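The planner's routing rule can be sketched as a small decision function. This is illustrative only: `choose_writer` and `WriterKind` are hypothetical names, `Partitioning` is again a simplified stand-in for DataFusion's type, and the assumption that partitioning schemes other than `Hash` and `None` still fall back to `ShuffleWriterExec` is inferred from the "any supported partitioning" wording rather than confirmed by the PR.

```rust
/// Hypothetical, simplified stand-in for DataFusion's `Partitioning`.
#[derive(Debug, Clone)]
pub enum Partitioning {
    Hash(Vec<usize>, usize),
    RoundRobinBatch(usize),
}

/// Which shuffle writer the planner would emit for a stage (hypothetical).
#[derive(Debug, PartialEq, Eq)]
pub enum WriterKind {
    SortShuffle,   // SortShuffleWriterExec
    LegacyShuffle, // ShuffleWriterExec
}

/// `sort_based_enabled` models BALLISTA_SHUFFLE_SORT_BASED_ENABLED.
pub fn choose_writer(
    sort_based_enabled: bool,
    partitioning: Option<&Partitioning>,
) -> WriterKind {
    match (sort_based_enabled, partitioning) {
        // Config off: always the legacy writer.
        (false, _) => WriterKind::LegacyShuffle,
        // Config on: `None` (new in this PR) and `Hash` both use sort-shuffle.
        (true, None) | (true, Some(Partitioning::Hash(..))) => WriterKind::SortShuffle,
        // Assumed fallback for partitioning schemes sort-shuffle does not support.
        (true, Some(_)) => WriterKind::LegacyShuffle,
    }
}
```

Before this change, the `(true, None)` case would have mapped to `LegacyShuffle`; moving it into the sort-shuffle arm is what lets final-output and merge stages share the same writer.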


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

