andygrove opened a new issue, #4174: URL: https://github.com/apache/datafusion-comet/issues/4174
## Describe the problem

`CometUdfBridge.evaluate` (`common/src/main/java/org/apache/comet/udf/CometUdfBridge.java`, on the JVM-scalar-UDF prototype branch) allocates output Arrow vectors via the project-wide `CometArrowAllocator`. That allocator is a `RootAllocator` that is not registered with Spark's `TaskMemoryManager`, so off-heap memory consumed by the UDF dispatch path is invisible to Spark's task memory accounting and back-pressure machinery. Under workloads with many concurrent JVM-UDF tasks per executor, this can drive native off-heap usage past the operator-level limits Spark would otherwise enforce.

## Describe the potential solution

Either:

1. Register `CometArrowAllocator` as a `MemoryConsumer` in Spark's `TaskMemoryManager`, so that allocations and frees update the task's accounting.
2. Allocate UDF output vectors from a child allocator that is itself registered as a per-task consumer, so that leakage and accounting stay scoped to the task.

Option (2) is closer to the existing Spark-Arrow integration pattern.

## Additional context

Identified during code review of the JVM-scalar-UDF prototype. Filed as a follow-up so the prototype PR can ship without a Spark-integration redesign.
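The accounting flow behind option (2) can be sketched with plain-Java stand-ins. `TaskAccounting` and `AccountingListener` below are hypothetical classes invented for this illustration; a real implementation would instead subclass Spark's `org.apache.spark.memory.MemoryConsumer` and attach an `org.apache.arrow.memory.AllocationListener` to a per-task child allocator (e.g. one created via `BufferAllocator.newChildAllocator`), so that every Arrow allocation and release is reflected in the task's memory budget.

```java
import java.util.concurrent.atomic.AtomicLong;

public class ChildAllocatorSketch {

    // Hypothetical stand-in for the task-scoped accounting that Spark's
    // TaskMemoryManager would see through a registered MemoryConsumer.
    static final class TaskAccounting {
        private final AtomicLong used = new AtomicLong();
        private final long limit;

        TaskAccounting(long limit) { this.limit = limit; }

        // Analogous to MemoryConsumer.acquireMemory: reserve before allocating.
        boolean acquire(long bytes) {
            long next = used.addAndGet(bytes);
            if (next > limit) {           // over budget: roll back and refuse
                used.addAndGet(-bytes);
                return false;
            }
            return true;
        }

        // Analogous to MemoryConsumer.freeMemory: release when vectors close.
        void release(long bytes) { used.addAndGet(-bytes); }

        long used() { return used.get(); }
    }

    // Hypothetical stand-in for an Arrow AllocationListener attached to the
    // per-task child allocator: allocation callbacks forward to the task
    // accounting, so off-heap usage is no longer invisible to Spark.
    static final class AccountingListener {
        private final TaskAccounting accounting;

        AccountingListener(TaskAccounting accounting) { this.accounting = accounting; }

        void onPreAllocation(long size) {
            if (!accounting.acquire(size)) {
                throw new IllegalStateException("task memory limit exceeded");
            }
        }

        void onRelease(long size) { accounting.release(size); }
    }

    public static void main(String[] args) {
        TaskAccounting task = new TaskAccounting(1024); // 1 KiB task budget
        AccountingListener listener = new AccountingListener(task);

        listener.onPreAllocation(512);   // UDF output vector allocated
        System.out.println(task.used()); // 512
        listener.onRelease(512);         // vector closed
        System.out.println(task.used()); // 0

        boolean overBudget = false;
        try {
            listener.onPreAllocation(2048); // exceeds the task budget
        } catch (IllegalStateException e) {
            overBudget = true;
        }
        System.out.println(overBudget);  // true
    }
}
```

Because the listener refuses allocations once the task budget is exhausted, an over-budget UDF task fails fast inside its own scope instead of silently pushing executor-wide off-heap usage past the limits Spark enforces.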
