mbutrovich opened a new pull request, #21629: URL: https://github.com/apache/datafusion/pull/21629
## Which issue does this PR close? Partially addresses #21543. This is the ExternalSorter pipeline refactor from #21600, separated from the radix sort changes. ## Rationale for this change ExternalSorter's merge path sorts each incoming batch individually (typically 8192 rows), then k-way merges all of them. At scale (TPC-H SF10, ~60M rows in lineitem), this produces ~7300 individually-sorted batches feeding the k-way merge with high fan-in. ## What changes are included in this PR? ### Coalesce-then-sort pipeline Replaces ExternalSorter's buffer-then-sort architecture with a coalesce-then-sort pipeline: - Incoming batches accumulate in a `BatchCoalescer` until `sort_coalesce_target_rows` (default 32768) is reached - Each coalesced batch is sorted and chunked back to `batch_size`, producing fewer, larger sorted runs - On memory pressure, sorted runs spill to disk (merged into one file when headroom is available, one file per run otherwise) - At query completion, runs are k-way merged via the existing `StreamingMergeBuilder` This reduces merge fan-in from ~7300 to ~1800 runs at SF10, which is the primary source of speedup. ### Spill strategy - **With merge headroom** (`sort_spill_reservation_bytes > 0`): merge all runs into a single sorted stream before spilling to one file. Fewer files = lower fan-in for the final `MultiLevelMerge`. - **Without headroom** (`sort_spill_reservation_bytes == 0`): spill each run as its own file. The multi-level merge handles low merge memory by reducing fan-in, so this no longer fails under tight memory budgets. ### Dead code removal Sorted runs no longer require an in-memory merge before spilling. Removes `in_mem_sort_stream`, `sort_batch_stream`, `consume_and_spill_append`, `spill_finish`, `organize_stringview_arrays`, and `in_progress_spill_file`. ### Config changes - New: `sort_coalesce_target_rows` (default 32768) - Deprecated: `sort_in_place_threshold_bytes` (no longer read, `warn` attribute per API health policy) ## Are these changes tested? - 4 new unit tests (coalescing, partial flush, per-run spill, merged spill) - All 48 sort unit tests pass - All sort fuzz, sort query fuzz, and spilling fuzz tests pass - `information_schema.slt` updated for new config ## Are there any user-facing changes? - New config `sort_coalesce_target_rows` (default 32768) controls the coalesce target before sorting - `sort_in_place_threshold_bytes` is deprecated - The pipeline is more memory-efficient (shrinks reservations after sorting) so some workloads may spill less frequently - Sorts under tight memory budgets (`sort_spill_reservation_bytes` near zero) that previously failed now succeed via multi-level merge with reduced fan-in -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
