andygrove opened a new issue, #4174:
URL: https://github.com/apache/datafusion-comet/issues/4174

   ## Describe the problem
   
   `CometUdfBridge.evaluate` 
(`common/src/main/java/org/apache/comet/udf/CometUdfBridge.java`, on the 
JVM-scalar-UDF prototype branch) allocates output Arrow vectors via the 
project-wide `CometArrowAllocator`. That allocator is a `RootAllocator` that is 
not registered with Spark's `TaskMemoryManager`, so off-heap memory consumed by 
the UDF dispatch path is invisible to Spark's task memory accounting and 
back-pressure machinery.
   
   Under workloads with many concurrent JVM-UDF tasks per executor, this can 
drive native off-heap usage past the operator-level limits Spark would 
otherwise enforce.
   
   ## Describe the potential solution
   
   Either:
   
   1. Register `CometArrowAllocator` as a `MemoryConsumer` in Spark's 
`TaskMemoryManager` so allocations and frees update the task's accounting.
   2. Allocate UDF output vectors from a child allocator that is itself 
registered as a per-task consumer, so leakage and accounting stay scoped to the 
task.
   
   Option (2) is closer to the existing Spark-Arrow integration pattern: Spark's Arrow-based Python UDF path already allocates from per-task child allocators rather than from a shared root allocator.
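   
   A sketch of option (2)'s shape, using a simplified plain-Java model rather than the real Arrow `BufferAllocator` and Spark `MemoryConsumer` APIs (the `TaskAccounting` and `ChildAllocator` names are hypothetical): every allocation and free on the per-task child is forwarded to the task's accounting, and closing the child at task end reconciles anything leaked, so leakage stays scoped to that task.

   ```java
   // Hypothetical, simplified model of a per-task child allocator that
   // forwards allocations/frees to the task's memory accounting.
   import java.util.concurrent.atomic.AtomicLong;

   /** Stand-in for Spark's per-task accounting (TaskMemoryManager). */
   final class TaskAccounting {
       private final AtomicLong acquired = new AtomicLong();
       void acquire(long bytes) { acquired.addAndGet(bytes); }
       void release(long bytes) { acquired.addAndGet(-bytes); }
       long acquiredBytes() { return acquired.get(); }
   }

   /** Stand-in for an Arrow child allocator scoped to one task. */
   final class ChildAllocator implements AutoCloseable {
       private final TaskAccounting accounting;
       private long outstanding;

       ChildAllocator(TaskAccounting accounting) { this.accounting = accounting; }

       byte[] buffer(int size) {
           accounting.acquire(size);  // every allocation updates the task
           outstanding += size;
           return new byte[size];
       }

       void free(int size) {
           accounting.release(size);
           outstanding -= size;
       }

       /** Closing the child reclaims leaked bytes, keeping leakage task-scoped. */
       @Override public void close() {
           accounting.release(outstanding);
           outstanding = 0;
       }
   }

   public class ChildAllocatorSketch {
       /** Allocates, frees partially, then closes; returns bytes still charged. */
       public static long demo() {
           TaskAccounting task = new TaskAccounting();
           try (ChildAllocator alloc = new ChildAllocator(task)) {
               alloc.buffer(1024);    // UDF output vector, visible to the task
               alloc.buffer(512);
               alloc.free(512);
           }                          // close() reclaims the leaked 1024 bytes
           return task.acquiredBytes();
       }

       public static void main(String[] args) {
           System.out.println(demo());
       }
   }
   ```

   With the real APIs, the equivalent would be a child created via `BufferAllocator.newChildAllocator` whose listener charges a registered `MemoryConsumer`, closed when the task completes.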
   
   ## Additional context
   
   Identified during code review of the JVM-scalar-UDF prototype. Filed as a 
follow-up so the prototype PR can ship without a Spark-integration redesign.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
