andygrove opened a new pull request, #1637: URL: https://github.com/apache/datafusion-ballista/pull/1637
# Which issue does this PR close? Closes #. # Rationale for this change `--memory-pool-size` is optional today: when not provided, the executor installs no memory pool and DataFusion falls back to its unbounded default. Operators that respect the pool (aggregates, sorts, joins, sort shuffle writer) therefore never spill and just keep growing — until the OS starts swapping. On TPC-H Q3 against SF100 with `--partitions 2`, this caused executor RSS to reach ~24 GB and the query to take ~49s (vs ~13s for hash shuffle). The fix in #1636 patches the sort shuffle writer specifically, but every other memory-pool-aware operator still benefits from a sensible bounded default. New users on a laptop or in a small VM should not have to read the docs to learn that a flag is effectively required to keep the executor from blowing past physical RAM. # What changes are included in this PR? - New constant `DEFAULT_MEMORY_POOL_BYTES_PER_TASK = 1 GiB`. - When `--memory-pool-size` is not provided, default the total pool to `concurrent_tasks * DEFAULT_MEMORY_POOL_BYTES_PER_TASK`. With the default 8 task slots that's an 8 GiB total budget, split into 8 × 1 GiB per-task `FairSpillPool`s. - The startup log line now distinguishes `Memory pool (explicit)` vs `Memory pool (default)` so it's obvious which path is in effect. - CLI help text and `ExecutorProcessConfig::memory_pool_size` doc comment updated to describe the new default. # Are there any user-facing changes? The executor now always installs a bounded memory pool. Users who relied on the old unbounded default and have queries that run within physical RAM will see the same behaviour; users whose queries previously over-committed memory will start seeing operators spill instead. The new total budget is `concurrent_tasks * 1 GiB`; pass `--memory-pool-size` to override. ## Verification TPC-H Q3 against `/opt/tpch/sf100`, executor started with `--concurrent-tasks 8` and **no** `--memory-pool-size`: | `--partitions` | Sort before | Sort after | Hash (unchanged) | | --- | --- | --- | --- | | 2 | 49.1s (~24 GB RSS) | 16.9s (~10 GB RSS) | 13.3s | | 4 | 14.1s | 10.8s | 8.2s | | 8 | 7.7s | 7.9s | 7.1s | | 16 | 6.3s | 8.8s | 11.2s | Startup log confirms the new default is taking effect: ``` INFO ballista_executor::executor_process: Memory pool (default): total 8589934592 bytes split into 8 tasks (1073741824 bytes each) ``` `cargo test --release -p ballista-executor --lib` passes (22/22). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
