Kevin-Li-2025 opened a new pull request, #23066: URL: https://github.com/apache/datafusion/pull/23066
## Which issue does this PR close?

- Closes #22848.

## Rationale for this change

External sort merge phases currently select spill files based only on memory reservation. With many small spills, a single phase can open enough files to exceed the process file-descriptor limit.

## What changes are included in this PR?

- Add `datafusion.runtime.max_spill_merge_fan_in` (`0` preserves the current unlimited behavior).
- Clamp non-zero values to at least 2 during merge selection so each pass makes progress.
- Support builder configuration and dynamic SQL `SET` / `RESET` / `SHOW`.
- Add unit, runtime SQL, SQLLogicTest, information schema, and generated documentation coverage.

## Are there any user-facing changes?

Users can cap the number of spill files opened in one external merge pass. The default remains unchanged.

## How was this change tested?

- `cargo test -p datafusion-execution test_max_spill_merge_fan_in_builder_and_dynamic_update --lib`
- `cargo test -p datafusion-physical-plan spill_merge_fan_in --lib`
- `cargo test -p datafusion --test core_integration test_max_spill_merge_fan_in_runtime_config`
- `cargo test -p datafusion-sqllogictest --test sqllogictests -- set_variable.slt`
- `cargo check -p datafusion`
- `cargo clippy -p datafusion-execution -p datafusion-physical-plan -p datafusion --lib -- -D warnings`
- `cargo fmt --all -- --check`
- `dev/update_config_docs.sh`
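
For readers who want a concrete picture of the fan-in rule described under "What changes are included in this PR?", here is a minimal Rust sketch. It is not the code from this PR: `effective_fan_in` and `files_for_next_pass` are hypothetical names, spill files are modeled as plain byte sizes, and taking at least two files regardless of the memory budget is an assumption made so that the toy selection loop always makes progress.

```rust
/// Hypothetical helper: interpret the configured fan-in value.
/// `0` means "no cap" (the existing unlimited behavior). Any other value is
/// raised to at least 2, because a pass that merges a single file would
/// never reduce the number of spill files.
fn effective_fan_in(configured: usize) -> Option<usize> {
    if configured == 0 {
        None
    } else {
        Some(configured.max(2))
    }
}

/// Hypothetical selection step: how many spill files, taken from the front of
/// `spill_sizes`, to open in the next merge pass.
///
/// Files are added while they fit in `memory_budget` and while the count stays
/// within the fan-in cap. Assumption of this sketch: the first two files are
/// always taken, even when they exceed the budget.
fn files_for_next_pass(
    spill_sizes: &[usize],
    memory_budget: usize,
    configured_fan_in: usize,
) -> usize {
    let cap = effective_fan_in(configured_fan_in).unwrap_or(usize::MAX);
    let mut used = 0usize;
    let mut count = 0usize;

    for &size in spill_sizes {
        // Stop once the fan-in cap is reached.
        if count >= cap {
            break;
        }
        // Past the first two files, stop when the memory budget would be exceeded.
        if count >= 2 && used.saturating_add(size) > memory_budget {
            break;
        }
        used = used.saturating_add(size);
        count += 1;
    }
    count
}
```

With 100 spill files of 10 bytes each and a generous budget, a configured value of `0` selects all 100 files in one pass, `16` selects 16, and `1` is clamped so that 2 files are selected. The memory budget still applies independently of the cap, which matches the description above: the new setting adds a limit on open files and does not replace the reservation-based selection.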
