andygrove commented on PR #1578:
URL: 
https://github.com/apache/datafusion-ballista/pull/1578#issuecomment-4330823344

   ### Code review
   
   Found 8 issues:
   
   1. Breaking public API: `Executor::new` signature changed from 
`Option<Arc<dyn ExecutionEngine>>` to `Arc<dyn ExecutionEngine>`. The PR 
description says "backward compatible" but downstream callers passing `None` 
won't compile. Consider keeping the old form, adding a `#[deprecated]` shim, or 
applying the `api change` label.
   
   
https://github.com/apache/datafusion-ballista/blob/011b0d3dadd2f74b5da5ecc2d33df04237a36b07/ballista/executor/src/executor.rs#L120-L150
   
   2. Breaking public API: `pub fn change_work_dir` on `pub struct 
ShuffleReaderExec` is renamed to `with_work_dir` with no deprecation alias. 
Custom `ExecutionEngine` impls in downstream crates that call the old name will 
break.
   
   
https://github.com/apache/datafusion-ballista/blob/011b0d3dadd2f74b5da5ecc2d33df04237a36b07/ballista/core/src/execution_plans/shuffle_reader.rs#L510-L520
   
   3. Cached connections retain stale `GrpcClientConfig`. 
`ShuffleReaderExec::execute()` derives `GrpcClientConfig` per task from 
`TaskContext::session_config()` (so retry counts from #1577 can vary), but the 
pool keys only on `(host, port)` and returns idle clients with their original 
config. Either include the relevant config fields in the cache key, rebuild the 
client when config differs, or document the limitation.
   
   
https://github.com/apache/datafusion-ballista/blob/011b0d3dadd2f74b5da5ecc2d33df04237a36b07/ballista/core/src/client_pool.rs#L290-L310
   
   4. Eviction boundary off-by-one: `while deque.front().is_some_and(|e| 
e.idle_since <= deadline)` evicts entries idle for *exactly* `idle_timeout`, 
but the rustdoc says "longer than" the timeout. Should be `<` for the 
documented semantics.
   
   
https://github.com/apache/datafusion-ballista/blob/011b0d3dadd2f74b5da5ecc2d33df04237a36b07/ballista/core/src/client_pool.rs#L268-L275
   
   5. Module-level rustdoc inverts the eviction policy: "evicts idle 
connections that have not been returned within the configured `idle_timeout`". 
The eviction task only sees connections that *have* been returned; it removes 
ones that have sat idle for longer than `idle_timeout`.
   
   
https://github.com/apache/datafusion-ballista/blob/011b0d3dadd2f74b5da5ecc2d33df04237a36b07/ballista/core/src/client_pool.rs#L20-L35
   
   6. Typo "Number of second" should be "Number of seconds" in both the rustdoc 
and the clap `help` string for `--connection-cache`. User-visible at the CLI; 
appears in two locations.
   
   
https://github.com/apache/datafusion-ballista/blob/011b0d3dadd2f74b5da5ecc2d33df04237a36b07/ballista/executor/src/config.rs#L150-L160
   
   
https://github.com/apache/datafusion-ballista/blob/011b0d3dadd2f74b5da5ecc2d33df04237a36b07/ballista/executor/src/executor_process.rs#L138-L145
   
   7. Truncated rustdoc on `with_eviction_thread`: ``/// Create a pool that 
evicts connections idle longer than \`idle_timeout\`,`` — ends mid-sentence, 
looks copy-pasted from `new()`. `ballista-core` has `#![warn(missing_docs)]` so 
doc quality on public API matters.
   
   
https://github.com/apache/datafusion-ballista/blob/011b0d3dadd2f74b5da5ecc2d33df04237a36b07/ballista/core/src/client_pool.rs#L150-L160
   
   8. `dashmap` is added as an unconditional dependency in `ballista-core`, 
even though pooling is runtime-opt-in (`--connection-cache` defaults to 0) and 
only `client_pool.rs` uses it. Consider gating both the dep and the module 
behind a cargo feature so `--no-default-features` builds stay lean.
   
   
https://github.com/apache/datafusion-ballista/blob/011b0d3dadd2f74b5da5ecc2d33df04237a36b07/ballista/core/Cargo.toml#L18-L25


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to