andygrove opened a new pull request, #1624:
URL: https://github.com/apache/datafusion-ballista/pull/1624

   # Which issue does this PR close?
   
   No associated issue.
   
   # Rationale for this change
   
   The executor today builds a `RuntimeEnv` with no `MemoryPool`, so DataFusion 
falls back to its unbounded default and spillable operators (sort, hash join, 
hash aggregation) grow until the host OOMs. There is currently no way to bound 
executor memory or trigger spilling. The `tuning-guide.md` already flagged this 
as future work.
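   
   For context, a minimal sketch of the before/after using DataFusion's public 
API (the `RuntimeEnvBuilder` and `FairSpillPool` names reflect recent DataFusion 
releases; exact module paths may vary by version):

```rust
use std::sync::Arc;

use datafusion::error::Result;
use datafusion::execution::memory_pool::FairSpillPool;
use datafusion::execution::runtime_env::RuntimeEnvBuilder;

fn main() -> Result<()> {
    // Today: no pool is configured, so DataFusion uses its unbounded
    // default and operators never observe memory pressure.
    let _unbounded = RuntimeEnvBuilder::new().build()?;

    // With a bounded FairSpillPool, spillable operators (sort, hash join,
    // hash aggregation) spill to disk instead of OOMing the host.
    let _bounded = RuntimeEnvBuilder::new()
        .with_memory_pool(Arc::new(FairSpillPool::new(8 * 1024 * 1024 * 1024)))
        .build()?;

    Ok(())
}
```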
   
   # What changes are included in this PR?
   
   Adds an opt-in `--memory-pool-size <SIZE>` flag (e.g. `8GB`, `512MiB`, or a 
plain byte count) on the executor binary. When set, every task receives an 
isolated `FairSpillPool` of size `total / concurrent_tasks`. The wrapping is 
applied after the base `RuntimeProducer` is resolved, so it composes with 
embedder-supplied producers (including the existing S3 helper) and preserves 
their `DiskManager`, `CacheManager`, and `ObjectStoreRegistry`. Startup fails 
with a hard error if the per-task share would round to zero.
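   
   The sketch below illustrates the shape of that wrapping. The 
`RuntimeProducer` alias and the direct construction of `RuntimeEnv` from its 
public fields are assumptions for illustration, not the exact Ballista types:

```rust
use std::sync::Arc;

use datafusion::error::{DataFusionError, Result};
use datafusion::execution::memory_pool::FairSpillPool;
use datafusion::execution::runtime_env::RuntimeEnv;
use datafusion::prelude::SessionConfig;

// Assumed shape of Ballista's RuntimeProducer for this sketch.
type RuntimeProducer =
    Arc<dyn Fn(&SessionConfig) -> Result<Arc<RuntimeEnv>> + Send + Sync>;

fn wrap_with_memory_pool(
    base: RuntimeProducer,
    pool_size: usize,
    concurrent_tasks: usize,
) -> Result<RuntimeProducer> {
    // Each task gets an isolated, equal share of the configured pool.
    let per_task = pool_size / concurrent_tasks;
    if per_task == 0 {
        // Fail hard at startup rather than hand out zero-byte pools.
        return Err(DataFusionError::Configuration(format!(
            "memory pool size {pool_size} is too small for {concurrent_tasks} concurrent tasks"
        )));
    }
    Ok(Arc::new(move |config: &SessionConfig| {
        // Resolve the embedder-supplied environment first, then swap in a
        // fresh per-task pool while keeping its DiskManager, CacheManager,
        // and ObjectStoreRegistry intact.
        let base_env = base(config)?;
        Ok(Arc::new(RuntimeEnv {
            memory_pool: Arc::new(FairSpillPool::new(per_task)),
            disk_manager: base_env.disk_manager.clone(),
            cache_manager: base_env.cache_manager.clone(),
            object_store_registry: base_env.object_store_registry.clone(),
        }))
    }))
}
```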
   
   Adds `bytesize` as a dependency of `ballista-executor` for size parsing. 
There are no public API changes outside the executor crate.
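   
   A sketch of how the flag could be parsed with `clap` and `bytesize`; the 
`ExecutorOpts` struct and `parse_pool_size` helper are hypothetical stand-ins 
for the executor's real CLI definitions:

```rust
use std::str::FromStr;

use bytesize::ByteSize;
use clap::Parser;

// Hypothetical slice of the executor's CLI; the real options struct in
// ballista-executor carries many more fields.
#[derive(Parser)]
struct ExecutorOpts {
    /// Upper bound on executor memory, e.g. "8GB", "512MiB", or plain bytes.
    #[arg(long, value_parser = parse_pool_size)]
    memory_pool_size: Option<u64>,
}

fn parse_pool_size(s: &str) -> Result<u64, String> {
    // bytesize accepts SI ("8GB") and binary ("512MiB") suffixes as well
    // as bare byte counts.
    ByteSize::from_str(s)
        .map(|b| b.as_u64())
        .map_err(|e| format!("invalid memory pool size '{s}': {e}"))
}

fn main() {
    let opts = ExecutorOpts::parse();
    println!("memory pool size in bytes: {:?}", opts.memory_pool_size);
}
```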
   
   # Are there any user-facing changes?
   
   Yes: a new optional CLI flag and a new section in 
`docs/source/user-guide/tuning-guide.md`. Default behavior is unchanged when 
the flag is omitted.
   
   Note: the spec briefly mentioned a `BALLISTA_MEMORY_POOL_SIZE` env var 
alias, but enabling clap's `env` feature is a workspace-wide change that's been 
deferred to a follow-up PR.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

