kosiew commented on code in PR #23066:
URL: https://github.com/apache/datafusion/pull/23066#discussion_r3472570690


##########
datafusion/physical-plan/src/sorts/multi_level_merge.rs:
##########
@@ -373,13 +373,23 @@ impl MultiLevelMergeBuilder {
     ) -> Result<(Vec<SortedSpillFile>, usize)> {
         assert_ne!(buffer_len, 0, "Buffer length must be greater than 0");
         let mut number_of_spills_to_read_for_current_phase = 0;
+        let configured_fan_in = self
+            .spill_manager
+            .env()
+            .disk_manager
+            .max_spill_merge_fan_in();
+        let max_spill_files = effective_spill_merge_fan_in(configured_fan_in);
         // Track total memory needed for spill file buffers. When the
         // reservation has pre-reserved bytes (from sort_spill_reservation_bytes),
         // those bytes cover the first N spill files without additional pool
         // allocation, preventing starvation under memory pressure.
         let mut total_needed: usize = 0;
 
         for spill in &self.sorted_spill_files {
+            if number_of_spills_to_read_for_current_phase >= max_spill_files {

Review Comment:
   Nice coverage on the helper cases for `0/1/2/N`. One extra test could make 
this even stronger: build multiple `SortedSpillFile`s with 
`max_spill_merge_fan_in = 2` and assert that a merge phase selects only two 
spill inputs. That would guard the FD-limit regression more directly than the 
current config and helper coverage.
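
A rough sketch of the shape such a test could take, written against a simplified stand-in rather than the real code. `FakeSpill` and `select_spills_for_phase` are hypothetical names that only model the loop in the hunk above (stop once the fan-in cap is reached). They are not DataFusion APIs, and the memory budget check is an assumed simplification of the reservation logic:

```rust
/// Hypothetical stand-in for a sorted spill file; only the buffer size matters here.
#[derive(Debug, Clone, PartialEq)]
struct FakeSpill {
    max_record_batch_memory: usize,
}

/// Simplified model of the selection loop: take spills in order until either the
/// fan-in cap (`max_spill_files`) or the assumed memory budget is reached.
/// Returns the selected spills and the total buffer memory they need.
fn select_spills_for_phase(
    spills: &[FakeSpill],
    max_spill_files: usize,
    memory_budget: usize,
) -> (Vec<FakeSpill>, usize) {
    let mut selected = Vec::new();
    let mut total_needed: usize = 0;

    for spill in spills {
        // Fan-in cap: never open more spill inputs than allowed in one phase.
        if selected.len() >= max_spill_files {
            break;
        }
        // Assumed budget check; saturating_add avoids overflow on huge inputs.
        let needed = total_needed.saturating_add(spill.max_record_batch_memory);
        if needed > memory_budget {
            break;
        }
        total_needed = needed;
        selected.push(spill.clone());
    }

    (selected, total_needed)
}

fn main() {
    // Five spill files are available, but the fan-in is capped at two.
    let spills = vec![FakeSpill { max_record_batch_memory: 1024 }; 5];
    let (selected, total_needed) = select_spills_for_phase(&spills, 2, usize::MAX);

    assert_eq!(selected.len(), 2);
    assert_eq!(total_needed, 2048);
    println!("selected {} of {} spills", selected.len(), spills.len());
}
```

In the real test the same assertion would be made on the `Vec<SortedSpillFile>` returned by the builder, with `max_spill_merge_fan_in = 2` set through the disk manager configuration.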



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

