geoffreyclaude opened a new issue, #22708:
URL: https://github.com/apache/datafusion/issues/22708

   ## Summary
   
   `ExecutionPlan` metadata currently describes `EvaluationType::Eager` as an 
operator stream that eagerly generates `RecordBatch` values in one or more 
spawned Tokio tasks. `BufferExec` and `AnalyzeExec` both appear to match that 
behavior, but neither reports eager evaluation in its `PlanProperties`.
   
   This makes `EvaluationType` less reliable for optimizers or integrations 
that need to reason about whether an operator drives child streams from 
background tasks.
   
   ## Current behavior
   
   On current `main`:
   
   - `BufferExec::new` clones the input properties and only changes 
`SchedulingType` to `Cooperative`.
   - `BufferExec::execute` wraps the input in `MemoryBufferedStream::new(...)`.
   - `MemoryBufferedStream::new(...)` immediately creates a `SpawnedTask` that 
polls the input stream into an internal queue.
   
   That behavior looks eager, but the plan keeps whatever evaluation type the input reported.
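
As a rough illustration of why this pattern reads as eager, here is a minimal, self-contained sketch. It does not use DataFusion or Tokio: `std::thread` and a channel stand in for the `SpawnedTask` and the internal queue, and `EagerBuffer` and `counting_input` are hypothetical names invented for this example.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::mpsc::{self, Receiver};
use std::sync::Arc;
use std::thread::{self, JoinHandle};

/// Hypothetical stand-in for a buffered stream: construction alone starts
/// a background worker that drains the input into a queue.
struct EagerBuffer {
    queue: Receiver<u32>,
    worker: Option<JoinHandle<()>>,
}

impl EagerBuffer {
    fn new<I>(input: I) -> Self
    where
        I: Iterator<Item = u32> + Send + 'static,
    {
        let (tx, rx) = mpsc::channel();
        // The input is consumed here, before any consumer asks for data.
        let worker = thread::spawn(move || {
            for item in input {
                if tx.send(item).is_err() {
                    break; // the consumer went away
                }
            }
        });
        Self { queue: rx, worker: Some(worker) }
    }

    /// Helper for the demo: block until the worker has drained the input.
    fn wait_for_worker(&mut self) {
        if let Some(handle) = self.worker.take() {
            handle.join().expect("worker panicked");
        }
    }
}

/// Hypothetical input that counts how many items have been pulled from it.
fn counting_input(
    pulled: Arc<AtomicUsize>,
    n: u32,
) -> impl Iterator<Item = u32> + Send + 'static {
    (0..n).inspect(move |_| {
        pulled.fetch_add(1, Ordering::SeqCst);
    })
}

fn main() {
    let pulled = Arc::new(AtomicUsize::new(0));
    let mut buffer = EagerBuffer::new(counting_input(Arc::clone(&pulled), 4));
    buffer.wait_for_worker();
    // The whole input was pulled although the consumer has read nothing yet.
    println!("pulled before first read: {}", pulled.load(Ordering::SeqCst));
    let drained: Vec<u32> = buffer.queue.try_iter().collect();
    println!("drained: {:?}", drained);
}
```

The point of the sketch is only that the input makes progress independently of the consumer, which is the property the `Eager` description seems to be about.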
   
   Similarly:
   
   - `AnalyzeExec::compute_properties(...)` constructs 
`PlanProperties::new(...)` and leaves `evaluation_type` at the default `Lazy`.
   - `AnalyzeExec::execute` creates a `RecordBatchReceiverStream::builder(...)` 
and calls `builder.run_input(...)` for each input partition.
   - The comments describe those futures as running input partitions in 
parallel on separate Tokio tasks.
   
   That also looks eager, but the plan reports lazy evaluation.
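
The per-partition pattern can be sketched in the same spirit. This is a toy model, not the `RecordBatchReceiverStream` implementation: one `std::thread` per partition stands in for one Tokio task per `run_input(...)` call, and `run_partitions` is a hypothetical function name.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical stand-in for driving each input partition on its own task:
/// one thread per partition, all sending into a single shared channel.
fn run_partitions(partitions: Vec<Vec<u32>>) -> Vec<u32> {
    let (tx, rx) = mpsc::channel();
    let mut workers = Vec::new();

    for partition in partitions {
        let tx = tx.clone();
        // Each partition starts producing as soon as its worker is spawned.
        workers.push(thread::spawn(move || {
            for batch in partition {
                if tx.send(batch).is_err() {
                    break; // the receiver was dropped
                }
            }
        }));
    }
    // Drop the original sender so the channel closes once all workers finish.
    drop(tx);

    let mut received: Vec<u32> = rx.iter().collect();
    for worker in workers {
        worker.join().expect("partition worker panicked");
    }
    // Arrival order across partitions is not deterministic; sort for display.
    received.sort_unstable();
    received
}

fn main() {
    let out = run_partitions(vec![vec![1, 2], vec![3], vec![4, 5, 6]]);
    println!("{:?}", out);
}
```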
   
   ## Expected behavior
   
   If the documented contract for `EvaluationType::Eager` is intended to mean 
that an operator drives child stream polling in spawned Tokio tasks, then 
`BufferExec` and `AnalyzeExec` should set 
`PlanProperties::with_evaluation_type(EvaluationType::Eager)`.
   
   `BufferExec` should probably always be eager because it creates the 
background buffering task for its input stream.
   
   `AnalyzeExec` should probably be eager when it runs input partitions through 
`RecordBatchReceiverStream::builder(...).run_input(...)`, similar to other 
operators that drive input partitions from spawned tasks.
   
   ## Why this matters
   
   DataFusion already exposes `need_data_exchange(plan)` as a helper that 
checks:
   
   ```rust
   plan.properties().evaluation_type == EvaluationType::Eager
   ```
   
   So stale or incomplete `EvaluationType` metadata can make physical-plan 
analysis miss operators that actually create independent child-polling 
pipelines.
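
To make the failure mode concrete, here is a small sketch over a toy plan tree. `PlanNode`, `eager_operators`, and `sample_plan` are hypothetical and exist only for this example; the toy `need_data_exchange` applies the same comparison as the snippet above but is not the DataFusion function, and `ScanExec` is a made-up leaf name.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum EvaluationType {
    Lazy,
    Eager,
}

/// Toy plan node carrying only the metadata relevant here.
struct PlanNode {
    name: &'static str,
    evaluation_type: EvaluationType,
    children: Vec<PlanNode>,
}

/// Same comparison as the helper quoted above, applied to the toy node.
fn need_data_exchange(plan: &PlanNode) -> bool {
    plan.evaluation_type == EvaluationType::Eager
}

/// Hypothetical analysis pass: names of all nodes that report eager evaluation.
fn eager_operators(plan: &PlanNode) -> Vec<&'static str> {
    let mut found = Vec::new();
    if need_data_exchange(plan) {
        found.push(plan.name);
    }
    for child in &plan.children {
        found.extend(eager_operators(child));
    }
    found
}

/// A buffer node over a lazy scan; the buffer's reported type is a parameter
/// so both the current and the proposed metadata can be compared.
fn sample_plan(buffer_type: EvaluationType) -> PlanNode {
    PlanNode {
        name: "BufferExec",
        evaluation_type: buffer_type,
        children: vec![PlanNode {
            name: "ScanExec",
            evaluation_type: EvaluationType::Lazy,
            children: vec![],
        }],
    }
}

fn main() {
    // Metadata inherited from a lazy input: the analysis finds nothing.
    println!("{:?}", eager_operators(&sample_plan(EvaluationType::Lazy)));
    // Metadata corrected to Eager: the buffer node is detected.
    println!("{:?}", eager_operators(&sample_plan(EvaluationType::Eager)));
}
```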
   
   ## Version
   
   Observed on Apache DataFusion `main` on June 2, 2026.
   
   ## Possible fix
   
   Set `EvaluationType::Eager` in the `PlanProperties` for these operators, 
with focused tests asserting their reported evaluation type.
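
The shape of the change and of a focused test might look roughly like the following. This is a sketch with toy types, not a patch: `ToyProperties` only mimics the builder style of `PlanProperties`, and `buffer_properties_current` and `buffer_properties_proposed` are hypothetical helpers modeling the behavior described under "Current behavior" and the proposed fix.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SchedulingType {
    NonCooperative,
    Cooperative,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum EvaluationType {
    Lazy,
    Eager,
}

/// Toy stand-in for plan properties; only the two fields discussed here.
#[derive(Debug, Clone, PartialEq, Eq)]
struct ToyProperties {
    scheduling_type: SchedulingType,
    evaluation_type: EvaluationType,
}

impl ToyProperties {
    fn with_scheduling_type(mut self, scheduling_type: SchedulingType) -> Self {
        self.scheduling_type = scheduling_type;
        self
    }

    fn with_evaluation_type(mut self, evaluation_type: EvaluationType) -> Self {
        self.evaluation_type = evaluation_type;
        self
    }
}

/// Models the described current behavior: clone the input properties and
/// change only the scheduling type.
fn buffer_properties_current(input: &ToyProperties) -> ToyProperties {
    input.clone().with_scheduling_type(SchedulingType::Cooperative)
}

/// Models the proposed behavior: also report eager evaluation.
fn buffer_properties_proposed(input: &ToyProperties) -> ToyProperties {
    input
        .clone()
        .with_scheduling_type(SchedulingType::Cooperative)
        .with_evaluation_type(EvaluationType::Eager)
}

fn main() {
    let input = ToyProperties {
        scheduling_type: SchedulingType::NonCooperative,
        evaluation_type: EvaluationType::Lazy,
    };
    // A focused test would assert on exactly these reported values.
    println!("current:  {:?}", buffer_properties_current(&input));
    println!("proposed: {:?}", buffer_properties_proposed(&input));
}
```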
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

