2010YOUY01 commented on issue #21381: URL: https://github.com/apache/datafusion/issues/21381#issuecomment-4190351686
> When `SortExec` is eliminated via sort pushdown (statistics-based file reordering), `SortPreservingMergeExec` reads directly from I/O-bound sources instead of from `SortExec`'s in-memory buffer. Currently SPM does a single K-way merge of all input streams, which can become a bottleneck when there are many partitions. The current implementation already forms a tree of SPMs similar to what you have described, so there shouldn’t be a major bottleneck I think, though there may still be room for improvement. Currently, there are two stages of SPM: - Stage 1: `SortExec` output, where the merge degree equals the number of in-memory batches - Stage 2: Final `SortPreservingMergeExec`, where the merge degree equals the number of partitions Stage 1 is non-blocking, so the two SPM stages can run in parallel. The main optimization opportunity seems to be balancing the merge degrees across the two stages to avoid one becoming a bottleneck. For example, if `SortExec` performs a 4-way merge while Stage 2 performs a 32-way merge, Stage 2 is likely to be the bottleneck. This could potentially be improved with a more recursive merging scheme. > Related: DuckDB implements a similar [parallel merge sort](https://duckdb.org/2021/08/27/external-sorting.html) strategy. They have a more recent version with a parallel k-way merge algorithm: https://duckdb.org/2025/09/24/sorting-again#k-way-merge-path -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
