Re: [PR] feat: sort file groups by statistics during sort pushdown (Sort pushdown phase 2) [datafusion]

via GitHub Fri, 03 Apr 2026 21:12:45 -0700


zhuqi-lucas commented on code in PR #21182:
URL: https://github.com/apache/datafusion/pull/21182#discussion_r3035087740



##########
datafusion/physical-plan/src/sorts/sort_preserving_merge.rs:
##########
@@ -366,7 +366,7 @@ impl ExecutionPlan for SortPreservingMergeExec {
                     .map(|partition| {
                         let stream =
                             self.input.execute(partition, 
Arc::clone(&context))?;
-                        Ok(spawn_buffered(stream, 1))
+                        Ok(spawn_buffered(stream, 16))

Review Comment:
   Thanks @alamb for this concern,  this change is directly related to this PR 
due to performance consideration:
   
   When PushdownSort eliminates SortExec on each partition (Exact path), SPM 
loses SortExec's in-memory buffer — previously SPM read from SortExec's fully 
buffered output (zero I/O wait), now it reads directly from DataSourceExec I/O 
streams. With buffer=1, SPM stalls waiting on I/O between every batch. 
Increasing to 16 allows pipelining I/O prefetch with merge computation, 
recovering the throughput lost by eliminating SortExec.
   
   Benchmarked with TPC-DS: buf=16 has no regressions (buf=32 caused tpcds 
query regressions due to memory overhead).
   
   But i also concern about it because it will affect all cases, i am also 
investigating the best solution for the performance,.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] feat: sort file groups by statistics during sort pushdown (Sort pushdown phase 2) [datafusion]

Reply via email to