andygrove opened a new issue, #4172:
URL: https://github.com/apache/datafusion-comet/issues/4172

   ## Describe the problem
   
   `JVMClasses::with_env` (in `native/jni-bridge/src/lib.rs`) attaches the 
current thread but does not push/pop a JNI local frame. When the calling thread 
is already attached (the normal case for executor worker threads, which stay 
attached for the lifetime of the JVM), local references created inside the 
closure (`new_string`, `new_long_array`, `new_object`, etc.) are never released 
until the thread terminates.
   
   Per-call sites currently affected (each creates 1–4 local refs per 
invocation):
   
   - `native/core/src/execution/operators/scan.rs` — 4 long arrays per 
`get_next_batch` call
   - `native/core/src/execution/operators/shuffle_scan.rs`
   - `native/spark-expr/src/jvm_udf/mod.rs` — 1 string + 2 long arrays per 
`evaluate` call (added in the JVM-scalar-UDF prototype, currently on 
`prototype-jvm-scalar-udf`)
   - `native/core/src/execution/expressions/subquery.rs`
   - `native/core/src/execution/memory_pools/{unified_pool,fair_pool}.rs`
   - `native/core/src/parquet/encryption_support.rs`
   
   In long-running executors processing many batches, these local references 
accumulate against the JVM's per-thread local-ref capacity (HotSpot default is 
large but finite). The effect is a slow leak that may eventually surface as 
`OutOfMemoryError` in JNI calls under heavy sustained load.
   
   ## Describe the potential solution
   
   Recommended: wrap the closure body in `JVMClasses::with_env` with 
`env.with_local_frame(...)` so every JNI call site gets automatic local-ref 
cleanup on return. This is a single-point fix (~5 LOC in 
`jni-bridge/src/lib.rs`) that needs only a quick audit of existing callers to 
confirm that none rely on local references surviving past the closure (a quick 
scan suggests none do — they all return `ArrayRef`, scalar values, or 
`Result<bool>`).
   
   Alternative: introduce a sibling `with_local_frame_env` helper for opt-in 
use, leaving `with_env` unchanged.
   
   ## Additional context
   
   This issue was identified during a code review of the JVM-scalar-UDF 
prototype (#TBD). The reviewer flagged it specifically against 
`JvmScalarUdfExpr::evaluate`, but a quick survey of the codebase showed it is a 
systemic pattern, not unique to that file. Filing here so the fix can be 
addressed as a separate, focused PR rather than expanded scope on the prototype.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to