andygrove commented on PR #1578: URL: https://github.com/apache/datafusion-ballista/pull/1578#issuecomment-4330823344
### Code review Found 8 issues: 1. Breaking public API: `Executor::new` signature changed from `Option<Arc<dyn ExecutionEngine>>` to `Arc<dyn ExecutionEngine>`. The PR description says "backward compatible" but downstream callers passing `None` won't compile. Consider keeping the old form, adding a `#[deprecated]` shim, or applying the `api change` label. https://github.com/apache/datafusion-ballista/blob/011b0d3dadd2f74b5da5ecc2d33df04237a36b07/ballista/executor/src/executor.rs#L120-L150 2. Breaking public API: `pub fn change_work_dir` on `pub struct ShuffleReaderExec` is renamed to `with_work_dir` with no deprecation alias. Custom `ExecutionEngine` impls in downstream crates that call the old name will break. https://github.com/apache/datafusion-ballista/blob/011b0d3dadd2f74b5da5ecc2d33df04237a36b07/ballista/core/src/execution_plans/shuffle_reader.rs#L510-L520 3. Cached connections retain stale `GrpcClientConfig`. `ShuffleReaderExec::execute()` derives `GrpcClientConfig` per task from `TaskContext::session_config()` (so retry counts from #1577 can vary), but the pool keys only on `(host, port)` and returns idle clients with their original config. Either include the relevant config fields in the cache key, rebuild the client when config differs, or document the limitation. https://github.com/apache/datafusion-ballista/blob/011b0d3dadd2f74b5da5ecc2d33df04237a36b07/ballista/core/src/client_pool.rs#L290-L310 4. Eviction boundary off-by-one: `while deque.front().is_some_and(|e| e.idle_since <= deadline)` evicts entries idle for *exactly* `idle_timeout`, but the rustdoc says "longer than" the timeout. Should be `<` for the documented semantics. https://github.com/apache/datafusion-ballista/blob/011b0d3dadd2f74b5da5ecc2d33df04237a36b07/ballista/core/src/client_pool.rs#L268-L275 5. Module-level rustdoc inverts the eviction policy: "evicts idle connections that have not been returned within the configured `idle_timeout`". The eviction task only sees connections that *have* been returned; it removes ones that have sat idle for longer than `idle_timeout`. https://github.com/apache/datafusion-ballista/blob/011b0d3dadd2f74b5da5ecc2d33df04237a36b07/ballista/core/src/client_pool.rs#L20-L35 6. Typo "Number of second" should be "Number of seconds" in both the rustdoc and the clap `help` string for `--connection-cache`. User-visible at the CLI; appears in two locations. https://github.com/apache/datafusion-ballista/blob/011b0d3dadd2f74b5da5ecc2d33df04237a36b07/ballista/executor/src/config.rs#L150-L160 https://github.com/apache/datafusion-ballista/blob/011b0d3dadd2f74b5da5ecc2d33df04237a36b07/ballista/executor/src/executor_process.rs#L138-L145 7. Truncated rustdoc on `with_eviction_thread`: ``/// Create a pool that evicts connections idle longer than \`idle_timeout\`,`` — ends mid-sentence, looks copy-pasted from `new()`. `ballista-core` has `#![warn(missing_docs)]` so doc quality on public API matters. https://github.com/apache/datafusion-ballista/blob/011b0d3dadd2f74b5da5ecc2d33df04237a36b07/ballista/core/src/client_pool.rs#L150-L160 8. `dashmap` is added as an unconditional dependency in `ballista-core`, even though pooling is runtime-opt-in (`--connection-cache` defaults to 0) and only `client_pool.rs` uses it. Consider gating both the dep and the module behind a cargo feature so `--no-default-features` builds stay lean. https://github.com/apache/datafusion-ballista/blob/011b0d3dadd2f74b5da5ecc2d33df04237a36b07/ballista/core/Cargo.toml#L18-L25 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
