jja725 opened a new pull request, #1541

URL: https://github.com/apache/datafusion-ballista/pull/1541
## Summary

Add support for using [Riffle](https://github.com/zuston/riffle) (a Rust-native Apache Uniffle implementation) as an alternative remote shuffle backend for Ballista. This enables disaggregated shuffle storage, decoupling compute from storage for cloud-native deployments.

### What's included

- **New `ballista-riffle` crate**: a Uniffle gRPC client covering the full shuffle lifecycle (register → require buffer → push data → report → read), Arrow IPC serialization helpers, and configuration types.
- **Extended `SortShuffleWriterExec`**: an optional Riffle write path (`#[cfg(feature = "riffle")]`) that pushes each partition's Arrow IPC-serialized data to Riffle instead of writing it to local disk. It reuses the existing `PartitionBuffer` and `SpillManager`; only the finalization step changes.
- **Extended `ShuffleReaderExec`**: a Riffle fetch path in `send_fetch_partitions()` that pulls data from Riffle servers and deserializes the Arrow IPC bytes, reusing the existing `CoalescedShuffleReaderStream` infrastructure.
- **Protobuf extensions**: `riffle_app_id` and `riffle_shuffle_id` fields added to the `ShuffleWritePartition` and `PartitionLocation` messages (backward-compatible).
- **Configuration**: `ballista.shuffle.backend` (`"local"` | `"riffle"`), `ballista.riffle.coordinator.host`, and `ballista.riffle.coordinator.port`.

### Design decisions

- All Riffle code is behind `feature = "riffle"`, so there is zero compilation impact when the feature is disabled.
- No new `ExecutionPlan` types: the existing `ShuffleWriterExec`, `SortShuffleWriterExec`, and `ShuffleReaderExec` are extended in place.
- Arrow IPC is the wire format (Riffle treats data as opaque bytes).
- Sort-based shuffle is the natural integration point. It already buffers per partition, spills under memory pressure, and writes all data in a single finalization pass, which maps directly onto Riffle's "push then commit" model.

### Local testing

Tested locally against a Riffle cluster (Uniffle coordinator v0.10.0 + Riffle shuffle server v0.21.0):

```
Connected to Riffle coordinator
Got assignment: server=127.0.0.1:21100
Registered shuffle 1
Serialized 1352 bytes of Arrow IPC data for partition 0
Pushed data to partition 0
Reported shuffle result
Read back 1352 bytes from partition 0
SUCCESS: Full Riffle client lifecycle test passed!
```

The integration test is marked `#[ignore]` because it requires a running Riffle cluster.

## Test plan

- [x] `cargo build --workspace --all-targets`: clean build without the `riffle` feature
- [x] `cargo build -p ballista-core --features riffle`: clean build with the `riffle` feature
- [x] `cargo test --workspace`: 301 tests passed, 0 failed
- [x] Integration test against a local Riffle cluster (coordinator + shuffle server): full push/read lifecycle verified
- [ ] CI will run without the `riffle` feature (no cluster dependency)

🤖 Generated with [Claude Code](https://claude.com/claude-code)
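The PR lists the shuffle lifecycle as register → require buffer → push data → report → read. That ordering can be sketched as a small state machine.

This is an illustrative sketch, not code from the PR. `Stage` and `LifecycleTracker` are hypothetical names. The sketch also makes three assumptions:

- A buffer is reserved before each push.
- Several reserve/push rounds may happen before the report.
- Reads are only valid after the report.

```rust
/// Stages of the shuffle lifecycle as listed in the PR description.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Stage {
    Registered,
    BufferRequired,
    DataPushed,
    Reported,
    Read,
}

/// Hypothetical tracker (not part of `ballista-riffle`) enforcing the order
/// register -> require buffer -> push data -> report -> read.
/// Assumption: one buffer reservation per push; reads only after the report.
#[derive(Debug, Default)]
pub struct LifecycleTracker {
    current: Option<Stage>,
}

impl LifecycleTracker {
    /// Move to `next` if the transition is allowed; otherwise return an
    /// error and leave the current stage unchanged.
    pub fn advance(&mut self, next: Stage) -> Result<(), String> {
        use Stage::*;
        let allowed = matches!(
            (self.current, next),
            (None, Registered)
                | (Some(Registered), BufferRequired)
                | (Some(BufferRequired), DataPushed)
                // Either reserve again for another push, or commit via report.
                | (Some(DataPushed), BufferRequired)
                | (Some(DataPushed), Reported)
                // Reads are only valid once the write side has reported.
                | (Some(Reported), Read)
                | (Some(Read), Read)
        );
        if !allowed {
            return Err(format!("invalid transition {:?} -> {:?}", self.current, next));
        }
        self.current = Some(next);
        Ok(())
    }
}
```

Keeping the ordering check separate from the network calls lets it be unit-tested without a running Riffle cluster.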

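The three configuration keys the PR adds (`ballista.shuffle.backend`, `ballista.riffle.coordinator.host`, `ballista.riffle.coordinator.port`) could be resolved into a backend choice roughly as follows.

This is a hedged, standard-library-only sketch. `ShuffleBackend` and `resolve_backend` are hypothetical names. It assumes settings arrive as string key/value pairs. Falling back to `"local"` when `ballista.shuffle.backend` is unset is also an assumption, not documented behavior.

```rust
use std::collections::HashMap;

/// Hypothetical backend choice derived from `ballista.shuffle.backend`.
/// Illustrative only; this is not the type used in the PR.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum ShuffleBackend {
    /// Shuffle files are written to the executor's local disk.
    Local,
    /// Shuffle data is pushed to a Riffle cluster reached via its coordinator.
    Riffle {
        coordinator_host: String,
        coordinator_port: u16,
    },
}

/// Resolve the backend from string settings.
/// Assumption: an unset `ballista.shuffle.backend` falls back to "local".
pub fn resolve_backend(settings: &HashMap<String, String>) -> Result<ShuffleBackend, String> {
    let backend = settings
        .get("ballista.shuffle.backend")
        .map(String::as_str)
        .unwrap_or("local");
    match backend {
        "local" => Ok(ShuffleBackend::Local),
        "riffle" => {
            // Both coordinator settings are required when Riffle is selected.
            let host = settings
                .get("ballista.riffle.coordinator.host")
                .ok_or("missing ballista.riffle.coordinator.host")?
                .clone();
            let port = settings
                .get("ballista.riffle.coordinator.port")
                .ok_or("missing ballista.riffle.coordinator.port")?
                .parse::<u16>()
                .map_err(|e| format!("invalid ballista.riffle.coordinator.port: {e}"))?;
            Ok(ShuffleBackend::Riffle {
                coordinator_host: host,
                coordinator_port: port,
            })
        }
        other => Err(format!("unknown shuffle backend: {other}")),
    }
}
```

In this sketch, selecting `"riffle"` without both coordinator settings returns an error rather than silently falling back to local disk, so misconfiguration surfaces early.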