Re: [PR] feat: Add immediate mode option for native shuffle [datafusion-comet]

via GitHub Mon, 13 Apr 2026 05:38:09 -0700


rluvaton commented on code in PR #3845:
URL: https://github.com/apache/datafusion-comet/pull/3845#discussion_r3073002722



##########
docs/source/user-guide/latest/tuning.md:
##########
@@ -144,6 +144,20 @@ Comet provides a fully native shuffle implementation, which generally provides t
 supports `HashPartitioning`, `RangePartitioning` and `SinglePartitioning` but currently only supports primitive type
 partitioning keys. Columns that are not partitioning keys may contain complex types like maps, structs, and arrays.
 
+Native shuffle has two partitioner modes, configured via
+`spark.comet.exec.shuffle.partitionerMode`:
+
+- **`buffered`** (default): Buffers all input batches in memory, then uses `interleave` to produce
+  partitioned output one partition at a time. Only one partition's output batch is in memory at
+  a time during the write phase, so this mode scales well to large numbers of partitions (1000+).
+  The trade-off is that it must hold all input data in memory (or spill it) before writing begins.
+
+- **`immediate`**: Partitions incoming batches immediately using per-partition Arrow array builders,
+  flushing compressed IPC blocks when they reach the target batch size. This avoids buffering the
+  entire input in memory. However, because it maintains builders for all partitions simultaneously
+  (proportional to `num_partitions × batch_size × num_columns`), memory overhead grows with
+  partition count. For workloads with many partitions (1000+), `buffered` mode is recommended.

Review Comment:
   Also, data skew will now use more memory.
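
   For a sense of scale, here is a rough sketch of the `num_partitions × batch_size × num_columns` term from the `immediate` description in the diff. The function names and the 8-bytes-per-value figure are assumptions for illustration only, not Comet's actual memory accounting. It treats every builder as completely full of fixed-width values, so it is closer to an upper bound than a measurement, and it does not model skew.

   ```python
   # Hypothetical back-of-the-envelope estimate; not part of Comet.

   def immediate_builder_slots(num_partitions: int, batch_size: int, num_columns: int) -> int:
       """Value slots held if every partition's builders are full at once."""
       return num_partitions * batch_size * num_columns


   def estimate_builder_bytes(num_partitions: int, batch_size: int, num_columns: int,
                              bytes_per_value: int = 8) -> int:
       """Rough byte estimate, assuming fixed-width values (8 bytes is an assumption)."""
       return immediate_builder_slots(num_partitions, batch_size, num_columns) * bytes_per_value


   if __name__ == "__main__":
       # Same batch size and column count, 10x more partitions.
       for partitions in (200, 2000):
           size = estimate_builder_bytes(partitions, batch_size=8192, num_columns=10)
           print(f"{partitions} partitions: ~{size / 1e6:.0f} MB of builder capacity")
   ```

   Under these assumptions the estimate scales linearly with partition count, which is consistent with the doc's recommendation of `buffered` mode for 1000+ partitions.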



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

