[PR] perf: reduce per-node allocations in to_native_metric_node [datafusion-comet]

via GitHub Fri, 24 Apr 2026 15:03:12 -0700


andygrove opened a new pull request, #4075:
URL: https://github.com/apache/datafusion-comet/pull/4075


   ## Which issue does this PR close?
   
   Part of #4072.
   
   ## Rationale for this change
   
   Issue #4072 documents allocator and JNI overhead in the per-batch metric 
reporting path. Profiling on the Rust side shows that `to_native_metric_node` 
builds a fresh `HashMap` and `Vec` per plan node and goes through 
`unwrap_or_default()` on the metric set even when a node has no metrics. None 
of the per-call sizes are large, but the path runs on every batch returned to 
the JVM, so a few small allocations per node multiply quickly.
   
   A small Cargo bench (release, M-series laptop) on synthetic plan shapes:
   
   | plan shape          | tree walk         | tree walk + encode  |
   | ------------------- | ----------------- | ------------------- |
   | linear  3 x  5 mtx  |  826ns -> 614ns   | 1.11us -> 1.03us    |
   | linear  8 x  8 mtx  | 3.95us -> 2.24us  | 5.67us -> 4.30us    |
   | linear 20 x 10 mtx  | 11.1us -> 6.91us  | 19.3us -> 15.9us    |
   | join 2x5 chains x 8 | 5.46us -> 3.10us  | 7.65us -> 5.71us    |
   
   Tree-walk drops 26-43%. Combined with the protobuf encode the net per-call 
saving is 7-25%.
   
   This does not solve all of the overhead in #4072 (JVM-side parse, JNI 
byte-array copy, and per-metric `to_string` are unchanged), but it is a 
low-risk wins-only change that does not touch the wire format or the JVM side. 
Larger restructuring can be evaluated separately.
   
   ## What changes are included in this PR?
   
   One file, `native/core/src/execution/metrics/utils.rs`:
   
   1. `HashMap::with_capacity(16)` instead of `HashMap::new()`. Most operator 
metric maps are well under 20 entries (e.g. `hashJoinMetrics` reports 9, 
`nativeScanMetrics` ~20), so 16 avoids the default-capacity rehash on virtually 
every node without over-allocating.
   2. `Vec::with_capacity(children.len())` for the children vec.
   3. Replace `node_metrics.unwrap_or_default().iter()` with an `if let 
Some(metrics)` block, skipping the empty `MetricsSet` allocation when a node 
produces no metrics.
   
   The wire format is unchanged.
   
   ## How are these changes tested?
   
   Existing test coverage. `CometTaskMetricsSuite` continues to pass and 
exercises the metric pipeline end-to-end. A representative `cargo bench` 
(numbers above) confirms the perf intent; the bench itself is not included in 
this PR to keep it minimal.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[PR] perf: reduce per-node allocations in to_native_metric_node [datafusion-comet]

Reply via email to