andygrove opened a new issue, #4172:
URL: https://github.com/apache/datafusion-comet/issues/4172
## Describe the problem
`JVMClasses::with_env` (in `native/jni-bridge/src/lib.rs`) attaches the
current thread but does not push/pop a JNI local frame. When the calling thread
is already attached (the normal case for executor worker threads, which stay
attached for the lifetime of the JVM), local references created inside the
closure (`new_string`, `new_long_array`, `new_object`, etc.) are never released
until the thread terminates.
Per-call sites currently affected (each creates 1–4 local refs per
invocation):
- `native/core/src/execution/operators/scan.rs` — 4 long arrays per
`get_next_batch` call
- `native/core/src/execution/operators/shuffle_scan.rs`
- `native/spark-expr/src/jvm_udf/mod.rs` — 1 string + 2 long arrays per
`evaluate` call (added in the JVM-scalar-UDF prototype, currently on
`prototype-jvm-scalar-udf`)
- `native/core/src/execution/expressions/subquery.rs`
- `native/core/src/execution/memory_pools/{unified_pool,fair_pool}.rs`
- `native/core/src/parquet/encryption_support.rs`
In long-running executors processing many batches, these local references
accumulate against the JVM's per-thread local-ref capacity (HotSpot default is
large but finite). The effect is a slow leak that may eventually surface as
`OutOfMemoryError` in JNI calls under heavy sustained load.
## Describe the potential solution
Recommended: wrap the closure body in `JVMClasses::with_env` with
`env.with_local_frame(...)` so every JNI call site gets automatic local-ref
cleanup on return. This is a single-point fix (~5 LOC in
`jni-bridge/src/lib.rs`) that needs only a quick audit of existing callers to
confirm that none rely on local references surviving past the closure (a quick
scan suggests none do — they all return `ArrayRef`, scalar values, or
`Result<bool>`).
Alternative: introduce a sibling `with_local_frame_env` helper for opt-in
use, leaving `with_env` unchanged.
## Additional context
This issue was identified during a code review of the JVM-scalar-UDF
prototype (#TBD). The reviewer flagged it specifically against
`JvmScalarUdfExpr::evaluate`, but a quick survey of the codebase showed it is a
systemic pattern, not unique to that file. Filing here so the fix can be
addressed as a separate, focused PR rather than expanded scope on the prototype.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]