andygrove opened a new pull request, #4075: URL: https://github.com/apache/datafusion-comet/pull/4075
## Which issue does this PR close? Part of #4072. ## Rationale for this change Issue #4072 documents allocator and JNI overhead in the per-batch metric reporting path. Profiling on the Rust side shows that `to_native_metric_node` builds a fresh `HashMap` and `Vec` per plan node and goes through `unwrap_or_default()` on the metric set even when a node has no metrics. None of the per-call sizes are large, but the path runs on every batch returned to the JVM, so a few small allocations per node multiply quickly. A small Cargo bench (release, M-series laptop) on synthetic plan shapes: | plan shape | tree walk | tree walk + encode | | ------------------- | ----------------- | ------------------- | | linear 3 x 5 mtx | 826ns -> 614ns | 1.11us -> 1.03us | | linear 8 x 8 mtx | 3.95us -> 2.24us | 5.67us -> 4.30us | | linear 20 x 10 mtx | 11.1us -> 6.91us | 19.3us -> 15.9us | | join 2x5 chains x 8 | 5.46us -> 3.10us | 7.65us -> 5.71us | Tree-walk drops 26-43%. Combined with the protobuf encode the net per-call saving is 7-25%. This does not solve all of the overhead in #4072 (JVM-side parse, JNI byte-array copy, and per-metric `to_string` are unchanged), but it is a low-risk wins-only change that does not touch the wire format or the JVM side. Larger restructuring can be evaluated separately. ## What changes are included in this PR? One file, `native/core/src/execution/metrics/utils.rs`: 1. `HashMap::with_capacity(16)` instead of `HashMap::new()`. Most operator metric maps are well under 20 entries (e.g. `hashJoinMetrics` reports 9, `nativeScanMetrics` ~20), so 16 avoids the default-capacity rehash on virtually every node without over-allocating. 2. `Vec::with_capacity(children.len())` for the children vec. 3. Replace `node_metrics.unwrap_or_default().iter()` with an `if let Some(metrics)` block, skipping the empty `MetricsSet` allocation when a node produces no metrics. The wire format is unchanged. ## How are these changes tested? Existing test coverage. `CometTaskMetricsSuite` continues to pass and exercises the metric pipeline end-to-end. A representative `cargo bench` (numbers above) confirms the perf intent; the bench itself is not included in this PR to keep it minimal. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
