wForget commented on code in PR #3845:
URL: https://github.com/apache/datafusion-comet/pull/3845#discussion_r3032853983


##########
docs/source/contributor-guide/native_shuffle.md:
##########
@@ -129,23 +139,33 @@ Native shuffle (`CometExchange`) is selected when all of 
the following condition
 
 2. **Native execution**: `CometExec.getCometIterator()` executes the plan in 
Rust.
 
-3. **Partitioning**: `ShuffleWriterExec` receives batches and routes to the 
appropriate partitioner:
-   - `MultiPartitionShuffleRepartitioner`: For hash/range/round-robin 
partitioning
-   - `SinglePartitionShufflePartitioner`: For single partition (simpler path)
+3. **Partitioning**: `ShuffleWriterExec` receives batches and routes to the 
appropriate partitioner
+   based on the `partitionerMode` configuration:
+   - **Immediate mode** (`ImmediateModePartitioner`): For 
hash/range/round-robin partitioning.
+     As each batch arrives, rows are scattered into per-partition Arrow array 
builders. When a

Review Comment:
   > When a partition's builder reaches the target batch size, it is flushed as 
a compressed Arrow IPC block to an in-memory buffer.
   
   The current IPC writer uses block compression (each batch is compressed 
independently), which may lead to poor compression ratios. In Gluten, the 
shuffle writer first serializes and buffers the batches, then performs 
streaming compression during eviction, which achieves better compression 
ratios. I'm not entirely sure which approach is better.
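
   To make the trade-off concrete, here is a minimal sketch of the two strategies. It uses Python's standard `zlib` on synthetic byte batches rather than the real Arrow IPC writer, so the function names, the data, and the codec are all assumptions for illustration; they are not Comet's or Gluten's code.

```python
import random
import zlib


def make_batches(num_batches=200, rows_per_batch=64, seed=0):
    """Build synthetic 'serialized batches' (hypothetical data, not Arrow IPC).

    Rows are drawn from a small vocabulary, so values repeat across batches.
    """
    rng = random.Random(seed)
    vocab = [f"customer_{i:04d}".encode() for i in range(50)]
    return [
        b"\n".join(rng.choice(vocab) for _ in range(rows_per_batch))
        for _ in range(num_batches)
    ]


def per_batch_compressed_size(batches):
    """Block style: every batch is compressed on its own, with no shared history."""
    return sum(len(zlib.compress(batch)) for batch in batches)


def streaming_compressed_size(batches):
    """Streaming style: one compressor sees all buffered batches in sequence."""
    compressor = zlib.compressobj()
    total = 0
    for batch in batches:
        total += len(compressor.compress(batch))
    total += len(compressor.flush())
    return total


if __name__ == "__main__":
    batches = make_batches()
    raw = sum(len(b) for b in batches)
    print("raw bytes:        ", raw)
    print("per-batch bytes:  ", per_batch_compressed_size(batches))
    print("streaming bytes:  ", streaming_compressed_size(batches))
```

   On data like this, the streaming variant should come out smaller because the compressor can reuse matches from earlier batches. The cost is that a single stream generally has to be decoded sequentially, whereas independently compressed blocks can be read one at a time. How large the gap is for real shuffle data and the codecs Comet uses would need to be measured.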



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
