rluvaton commented on code in PR #3845: URL: https://github.com/apache/datafusion-comet/pull/3845#discussion_r3073002722
########## docs/source/user-guide/latest/tuning.md: ########## @@ -144,6 +144,20 @@ Comet provides a fully native shuffle implementation, which generally provides t supports `HashPartitioning`, `RangePartitioning` and `SinglePartitioning` but currently only supports primitive type partitioning keys. Columns that are not partitioning keys may contain complex types like maps, structs, and arrays. +Native shuffle has two partitioner modes, configured via +`spark.comet.exec.shuffle.partitionerMode`: + +- **`buffered`** (default): Buffers all input batches in memory, then uses `interleave` to produce + partitioned output one partition at a time. Only one partition's output batch is in memory at + a time during the write phase, so this mode scales well to large numbers of partitions (1000+). + The trade-off is that it must hold all input data in memory (or spill it) before writing begins. + +- **`immediate`**: Partitions incoming batches immediately using per-partition Arrow array builders, + flushing compressed IPC blocks when they reach the target batch size. This avoids buffering the + entire input in memory. However, because it maintains builders for all partitions simultaneously + (proportional to `num_partitions × batch_size × num_columns`), memory overhead grows with + partition count. For workloads with many partitions (1000+), `buffered` mode is recommended. Review Comment: Also data skew will now use more memory -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
