wForget commented on issue #3855:
URL: 
https://github.com/apache/datafusion-comet/issues/3855#issuecomment-4182594426

   If I understand correctly, the current proposal is similar to gluten's hash 
shuffle writer (default behavior).
   + buffered-mode shuffle is similar to [gluten's sort shuffle 
writer](https://github.com/apache/gluten/blob/main/cpp/velox/shuffle/VeloxSortShuffleWriter.cc):
 insert batch -> add batches to a global buffer -> split by pid and evict to 
partition writer
   + immediate-mode shuffle is similar to [gluten's hash shuffle 
writer](https://github.com/apache/gluten/blob/main/cpp/velox/shuffle/VeloxHashShuffleWriter.cc):
 insert batch -> split and push to partition buffers -> evict partition buffer 
to partition writer
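
   To make the difference between the two flows concrete, here is a rough Python sketch. It is not Gluten's or Comet's actual code: the class names, the `pid_of` partitioner, the `buffer_rows` threshold, and the in-memory `partition_writer` dict are all hypothetical stand-ins, assuming rows are tuples whose first column is an integer key.

```python
from collections import defaultdict


def pid_of(row, num_partitions):
    """Hypothetical partitioner: the first column is an integer key."""
    return row[0] % num_partitions


class BufferedShuffleWriter:
    """Buffered (sort-style) flow: keep whole batches, split by pid only on evict."""

    def __init__(self, num_partitions):
        self.num_partitions = num_partitions
        self.global_buffer = []                     # unsplit input batches
        self.partition_writer = defaultdict(list)   # stand-in for per-partition output

    def insert_batch(self, batch):
        # No per-row work at insert time; the batch is just retained.
        self.global_buffer.append(batch)

    def evict(self):
        # Split everything buffered so far by partition ID and hand it to the writer.
        for batch in self.global_buffer:
            for row in batch:
                pid = pid_of(row, self.num_partitions)
                self.partition_writer[pid].append(row)
        self.global_buffer.clear()


class ImmediateShuffleWriter:
    """Immediate (hash-style) flow: split each batch on insert into partition buffers."""

    def __init__(self, num_partitions, buffer_rows=2):
        self.num_partitions = num_partitions
        self.buffer_rows = buffer_rows              # hypothetical evict threshold
        self.partition_buffers = defaultdict(list)  # one small buffer per partition
        self.partition_writer = defaultdict(list)

    def insert_batch(self, batch):
        for row in batch:
            pid = pid_of(row, self.num_partitions)
            self.partition_buffers[pid].append(row)
            # Evict a single partition buffer as soon as it fills up.
            if len(self.partition_buffers[pid]) >= self.buffer_rows:
                self._evict_partition(pid)

    def _evict_partition(self, pid):
        self.partition_writer[pid].extend(self.partition_buffers[pid])
        self.partition_buffers[pid].clear()

    def evict(self):
        # Flush whatever is left in every partition buffer.
        for pid in list(self.partition_buffers):
            self._evict_partition(pid)
```

   In this sketch both writers end up with the same per-partition rows. The difference is where the memory sits: one global buffer in the buffered flow versus one buffer per partition in the immediate flow, which is one plausible reason the partition count could matter.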
   
   According to the performance report at 
https://github.com/apache/gluten/pull/6475, the performance of the two shuffle 
modes may be related to the number of partitions.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

