andygrove opened a new issue, #4072
URL: https://github.com/apache/datafusion-comet/issues/4072
## Describe the problem

Profiling with async-profiler on native execution shows a significant amount of time spent in metric reporting. This happens on the per-batch hot path, so the cost is paid for every batch returned from DataFusion back to the JVM.

## Where the cost comes from

`update_comet_metric` is called from `jni_api.rs`:

- Once for every batch returned to the JVM (`executePlan` fast path)
- Periodically during `ScanExec` polling (every ~100 polls, gated by `metrics_update_interval`)
- At stream end

For each call, the pipeline does the following:

**Rust side** (`native/core/src/execution/metrics/utils.rs`):

1. `to_native_metric_node` recursively walks the `SparkPlan` tree.
2. Per node: allocates a fresh `HashMap<String, i64>`, inserts each metric with `name.to_string()` (a per-metric allocation), and pushes the children into a new `Vec`.
3. Serializes the resulting `NativeMetricNode` protobuf via `prost`'s `encode_to_vec()`, allocating a `Vec<u8>`.
4. `byte_array_from_slice` allocates a new Java `byte[]` and copies the protobuf bytes in.
5. One JNI upcall to `set_all_from_bytes([B)V`.

**JVM side** (`CometMetricNode.scala`):

6. `Metric.NativeMetricNode.parseFrom(bytes)`: full protobuf tree re-allocation.
7. Recursive walk pairing the deserialized tree with the local `CometMetricNode` tree via `zip(children)`.
8. Per metric: a `metrics.get(metricName)` lookup on the node's `Map[String, SQLMetric]`, then `SQLMetric.set(v)`.

So every update cycle includes: a tree of per-node `HashMap` allocations, per-metric `String` allocations, protobuf serialization, a Java byte-array allocation and copy, protobuf deserialization into a fresh tree, and a per-metric hash-map lookup on the JVM side. None of this state is carried across cycles, even though the tree shape and metric names are stable for the lifetime of the plan.
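The Rust-side allocation pattern can be sketched as follows. This is a simplified model, not the actual code: `SparkPlan` and `NativeMetricNode` here are hypothetical stand-ins with only the fields needed to show the per-cycle allocations (the real types, the prost encoding, and the JNI upcall are omitted).

```rust
use std::collections::HashMap;

// Hypothetical stand-in for the plan node; the real SparkPlan carries an
// ExecutionPlan and its DataFusion metrics set.
struct SparkPlan {
    metrics: Vec<(&'static str, i64)>, // metric name -> current value
    children: Vec<SparkPlan>,
}

// Hypothetical stand-in for the prost-generated NativeMetricNode message.
struct NativeMetricNode {
    metrics: HashMap<String, i64>,
    children: Vec<NativeMetricNode>,
}

// Mirrors the per-call cost described above: every invocation allocates a
// fresh HashMap per node, a fresh String per metric, and a fresh Vec per
// child list, even though the tree shape and metric names never change.
fn to_native_metric_node(plan: &SparkPlan) -> NativeMetricNode {
    let mut metrics = HashMap::new(); // new map every update cycle
    for (name, value) in &plan.metrics {
        metrics.insert(name.to_string(), *value); // per-metric String alloc
    }
    let children = plan
        .children
        .iter()
        .map(to_native_metric_node) // recursive walk, new Vec every cycle
        .collect();
    NativeMetricNode { metrics, children }
}

fn main() {
    let plan = SparkPlan {
        metrics: vec![("elapsed_compute", 42), ("output_rows", 1024)],
        children: vec![SparkPlan {
            metrics: vec![("output_rows", 1024)],
            children: vec![],
        }],
    };
    let node = to_native_metric_node(&plan);
    println!("{}", node.metrics["output_rows"]); // 1024
    println!("{}", node.children.len()); // 1
}
```

In the real pipeline this freshly built tree is then protobuf-encoded, copied into a new Java `byte[]`, and rebuilt again on the JVM side, so each of these allocations is paid twice per cycle in some form.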
## Impact

Visible in async-profiler flame graphs as a disproportionate share of native-thread CPU during query execution, especially for plans with many nodes and small-to-medium batch sizes (where update frequency is high relative to per-batch work).
