andygrove opened a new issue, #3873:
URL: https://github.com/apache/datafusion-comet/issues/3873
## Describe the bug
When Spark's memory manager is under pressure and calls `spill()` on Comet's
`NativeMemoryConsumer`, the call always returns 0, so Spark cannot reclaim any
memory from Comet's native operators. This prevents cross-task memory eviction
and causes Comet to require significantly more off-heap memory than necessary
at scale.
From `CometTaskMemoryManager.java`:
```java
private class NativeMemoryConsumer extends MemoryConsumer {
  public long spill(long size, MemoryConsumer trigger) {
    return 0; // No spilling
  }
}
```
## Impact
When multiple concurrent tasks share a constrained off-heap pool (e.g., 16
tasks sharing 16GB), each task's shuffle writer greedily buffers data until
`try_grow()` fails. Since Spark cannot reclaim memory from any of them, tasks
compete for the shared pool without coordination, leading to OOM at lower
memory settings.
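To make the contention concrete, here is a minimal toy model in Java. It is a sketch under stated assumptions, not Spark's `TaskMemoryManager` or Comet's pool: `ToyPool`, `NonSpillingTask`, `SpillingTask`, and `ContentionDemo` are hypothetical names, memory is counted in abstract units, and the pool simply asks every other registered consumer to spill when a request cannot be met.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy shared pool (hypothetical; not Spark's or Comet's implementation). */
class ToyPool {
    interface Consumer {
        long spill(long size); // returns the units actually freed
    }

    private final long capacity;
    private long used = 0;
    private final List<Consumer> consumers = new ArrayList<>();

    ToyPool(long capacity) { this.capacity = capacity; }

    void register(Consumer c) { consumers.add(c); }

    long used() { return used; }

    /** Grants up to `size` units, first asking other consumers to spill if short. */
    long acquire(long size, Consumer requester) {
        long free = capacity - used;
        for (Consumer c : consumers) {
            if (c != requester && free < size) {
                long freed = c.spill(size - free);
                used -= freed;
                free += freed;
            }
        }
        long granted = Math.min(size, free);
        used += granted;
        return granted;
    }
}

/** Greedy consumer: buffers until the pool refuses and never gives memory back. */
class NonSpillingTask implements ToyPool.Consumer {
    long held = 0;

    void fillGreedily(ToyPool pool, long chunk) {
        long got;
        while ((got = pool.acquire(chunk, this)) == chunk) {
            held += got;
        }
        held += got; // keep any partial final grant
    }

    @Override
    public long spill(long size) { return 0; } // mirrors the "No spilling" behavior
}

/** Cooperative consumer: releases buffered units when asked. */
class SpillingTask extends NonSpillingTask {
    @Override
    public long spill(long size) {
        long freed = Math.min(size, held);
        held -= freed;
        return freed;
    }
}

class ContentionDemo {
    /** Fills the pool with `first`, then returns what a later 4-unit request gets. */
    static long lateGrant(NonSpillingTask first) {
        ToyPool pool = new ToyPool(16);
        pool.register(first);
        first.fillGreedily(pool, 1);
        NonSpillingTask late = new NonSpillingTask();
        pool.register(late);
        return pool.acquire(4, late);
    }

    public static void main(String[] args) {
        System.out.println("late grant, no spill:   " + lateGrant(new NonSpillingTask()));
        System.out.println("late grant, with spill: " + lateGrant(new SpillingTask()));
    }
}
```

In this model the first task ends up holding the entire pool. A later request can only be satisfied when the first task's `spill()` actually frees something, which is the coordination the current `return 0` rules out.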
Benchmarking TPC-H SF100 Q9 with `local[4]` showed Comet's memory growing
elastically with the offHeap budget (a 448 MB increase when going from 4g to
8g), while Spark's native operators stayed flat because they participate in
the spill protocol.
## Expected behavior
`NativeMemoryConsumer.spill()` should signal the native memory pool to apply
backpressure, causing DataFusion operators (Sort, Aggregate, Shuffle) to spill
their internal state to disk. The actual bytes freed should be returned to
Spark.
## Proposed approach
Add a `SpillState` struct with atomics and a condvar for cross-thread
coordination:
1. When Spark calls `spill(size)`, call into native code via JNI to set spill
pressure.
2. The pool's `try_grow()` checks pressure and returns `ResourcesExhausted`.
3. DataFusion operators catch this and spill internally.
4. As operators call `shrink()`, freed bytes are tracked and returned to
Spark.
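The four steps can be sketched as a small, self-contained Java model. This is only an illustration of the handshake, not the proposed native code: the real `SpillState` would live on the native side behind JNI, and every name here (`SpillStateSketch`, `PoolSketch`, `SpillDemo`, `requestSpill`, `recordFreed`) is hypothetical. The sketch also assumes a single pending spill request at a time, a wait timeout, and a `false` return from `tryGrow` standing in for `ResourcesExhausted`.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.LockSupport;
import java.util.concurrent.locks.ReentrantLock;

/** Illustrative stand-in for the proposed SpillState; all names are hypothetical. */
class SpillStateSketch {
    private final AtomicLong pressure = new AtomicLong(); // bytes Spark asked for
    private final AtomicLong freed = new AtomicLong();    // bytes released since the request
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition progress = lock.newCondition();

    /** Step 1: set pressure, then wait until enough is freed or the timeout expires. */
    long requestSpill(long size, long timeoutMillis) {
        lock.lock();
        try {
            freed.set(0);
            pressure.set(size);
            long nanos = TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
            while (freed.get() < size && nanos > 0) {
                nanos = progress.awaitNanos(nanos); // releases the lock while waiting
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // stop waiting, report what was freed
        } finally {
            pressure.set(0);
            lock.unlock();
        }
        return freed.get();
    }

    boolean underPressure() { return pressure.get() > 0; }

    /** Step 4: count bytes released while a request is pending and wake the waiter. */
    void recordFreed(long bytes) {
        lock.lock();
        try {
            if (pressure.get() > 0) {
                freed.addAndGet(bytes);
                progress.signalAll();
            }
        } finally {
            lock.unlock();
        }
    }
}

/** Toy pool that refuses growth while pressure is set (step 2). */
class PoolSketch {
    private final SpillStateSketch state;
    private final long capacity;
    private final AtomicLong used = new AtomicLong();

    PoolSketch(SpillStateSketch state, long capacity) {
        this.state = state;
        this.capacity = capacity;
    }

    /** Returns false where the real pool would return ResourcesExhausted. */
    boolean tryGrow(long bytes) {
        if (state.underPressure()) return false;
        while (true) {
            long cur = used.get();
            if (cur + bytes > capacity) return false;
            if (used.compareAndSet(cur, cur + bytes)) return true;
        }
    }

    void shrink(long bytes) {
        used.addAndGet(-bytes);
        state.recordFreed(bytes);
    }

    long used() { return used.get(); }
}

class SpillDemo {
    /** Runs one spill round trip; returns {bytes freed, bytes still reserved}. */
    static long[] runOnce() {
        SpillStateSketch state = new SpillStateSketch();
        PoolSketch pool = new PoolSketch(state, 1_000_000);
        pool.tryGrow(60); // the operator's existing buffer
        Thread operator = new Thread(() -> {
            long held = 60;
            while (pool.tryGrow(1)) { // keep buffering slowly until refused
                held++;
                LockSupport.parkNanos(1_000_000L);
            }
            pool.shrink(held); // step 3: growth refused, so "spill" everything held
        });
        operator.start();
        long freed = state.requestSpill(40, 5_000);
        try { operator.join(); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        return new long[] {freed, pool.used()};
    }

    public static void main(String[] args) {
        long[] r = runOnce();
        System.out.println("freed=" + r[0] + " stillReserved=" + r[1]);
    }
}
```

The pressure check in `tryGrow` is a plain atomic read, so the hot allocation path takes no lock; the lock and condition are only touched when a spill request is pending. Whether the real implementation should block the Spark thread until bytes are freed, and for how long, is a design question this sketch does not settle.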
See `docs/source/contributor-guide/memory-management.md` for full analysis
including comparison with Gluten's approach.