andygrove opened a new pull request, #1637:
URL: https://github.com/apache/datafusion-ballista/pull/1637

   # Which issue does this PR close?
   
   Closes #.
   
   # Rationale for this change
   
   `--memory-pool-size` is optional today: when not provided, the executor 
installs no memory pool and DataFusion falls back to its unbounded default. 
Operators that respect the pool (aggregates, sorts, joins, sort shuffle writer) 
therefore never spill and just keep growing — until the OS starts swapping. On 
TPC-H Q3 against SF100 with `--partitions 2`, this caused executor RSS to reach 
~24 GB and the query to take ~49s (vs ~13s for hash shuffle).
   
   The fix in #1636 patches the sort shuffle writer specifically, but every 
other memory-pool-aware operator still benefits from a sensible bounded 
default. New users on a laptop or in a small VM should not have to read the 
docs to learn that a flag is effectively required to keep the executor from 
blowing past physical RAM.
   
   # What changes are included in this PR?
   
   - New constant `DEFAULT_MEMORY_POOL_BYTES_PER_TASK = 1 GiB`.
   - When `--memory-pool-size` is not provided, default the total pool to 
`concurrent_tasks * DEFAULT_MEMORY_POOL_BYTES_PER_TASK`. With the default 8 
task slots that's an 8 GiB total budget, split into 8 × 1 GiB per-task 
`FairSpillPool`s.
   - The startup log line now distinguishes `Memory pool (explicit)` vs `Memory 
pool (default)` so it's obvious which path is in effect.
   - CLI help text and `ExecutorProcessConfig::memory_pool_size` doc comment 
updated to describe the new default.
   
   # Are there any user-facing changes?
   
   The executor now always installs a bounded memory pool. Users who relied on 
the old unbounded default and have queries that run within physical RAM will 
see the same behaviour; users whose queries previously over-committed memory 
will start seeing operators spill instead. The new total budget is 
`concurrent_tasks * 1 GiB`; pass `--memory-pool-size` to override.
   
   ## Verification
   
   TPC-H Q3 against `/opt/tpch/sf100`, executor started with 
`--concurrent-tasks 8` and **no** `--memory-pool-size`:
   
   | `--partitions` | Sort before | Sort after | Hash (unchanged) |
   | --- | --- | --- | --- |
   | 2 | 49.1s (~24 GB RSS) | 16.9s (~10 GB RSS) | 13.3s |
   | 4 | 14.1s | 10.8s | 8.2s |
   | 8 | 7.7s | 7.9s | 7.1s |
   | 16 | 6.3s | 8.8s | 11.2s |
   
   Startup log confirms the new default is taking effect:
   
   ```
   INFO ballista_executor::executor_process: Memory pool (default): total 
8589934592 bytes split into 8 tasks (1073741824 bytes each)
   ```
   
   `cargo test --release -p ballista-executor --lib` passes (22/22).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to