Cyl created CASSANDRA-21233:
-------------------------------
Summary: Authenticated DoS via UDF Heap Exhaustion
Key: CASSANDRA-21233
URL: https://issues.apache.org/jira/browse/CASSANDRA-21233
Project: Apache Cassandra
Issue Type: Bug
Reporter: Cyl
# Authenticated DoS via UDF Heap Exhaustion
## 1. Vulnerability Description
**Name**: Authenticated DoS via Java UDF Heap Exhaustion
**Overview**:
When Java UDFs are enabled, user-supplied code executes inside the
Cassandra JVM without any heap quotas or per-invocation memory guards. The
sandbox limits class access and CPU time, but *does not restrict heap usage*. A
malicious user can allocate massive arrays inside a UDF and hold references to
them until the invocation completes. Launching several concurrent UDF
invocations forces Cassandra to allocate multiple gigabytes per request, which
quickly exhausts the heap and causes native transport connections to reset and
the `CassandraDaemon` to terminate.
**Affected Configurations**:
- `user_defined_functions_enabled: true`
- Any authenticated user with `CREATE FUNCTION` capability on a keyspace
**Impact**:
- Sudden drops of native connections (`Connection reset by peer`)
- `NoHostAvailable` and `ConnectionShutdown` errors for clients
- Cassandra process exit (requires manual restart)
- Repeated warnings in `system.log` ("User defined function ... ran longer than
500ms") before crash
Unlike CPU time, which is bounded by the fail timeout (1500 ms), heap usage has
no guardrail, so the JVM can be forced to OOM even if each invocation finishes
within the timeout window.
## 2. Proof-of-Concept
Script: `cve-study/finding-vul/DOS/poc_udf_memory_pressure.py`
Key parameters:
```bash
# each invocation allocates 256 * 16 MiB = 4 GiB
UDF_CHUNK_COUNT=256 \
UDF_CHUNK_SIZE_MB=16 \
UDF_CALLS=24 \
UDF_CONCURRENCY=8 \
python3 cve-study/finding-vul/DOS/poc_udf_memory_pressure.py
```
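The arithmetic in the comment above can be checked with a short Python sketch. The helper name `udf_memory_demand` is hypothetical and not part of the PoC script; it only multiplies the three parameters shown in the command.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3


def udf_memory_demand(chunk_count, chunk_size_mb, concurrency):
    """Return (bytes per invocation, bytes across all concurrent invocations)."""
    per_invocation = chunk_count * chunk_size_mb * MIB
    return per_invocation, per_invocation * concurrency


# Values taken from the PoC parameters above.
per_call, aggregate = udf_memory_demand(chunk_count=256, chunk_size_mb=16, concurrency=8)
print(f"per invocation: {per_call // GIB} GiB")
print(f"aggregate at concurrency 8: {aggregate // GIB} GiB")
```

With these parameters the sketch reports 4 GiB per invocation and 32 GiB in aggregate.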
Script flow:
1. Enables/uses keyspace `test_dos`
2. Creates UDF `memory_pressure()` that allocates `chunkCount * chunkSize`
MiB of `byte[]` arrays and holds them until it returns
3. Spawns a `ThreadPoolExecutor` to issue multiple concurrent `SELECT
test_dos.memory_pressure()` calls
4. Prints per-call duration and captures driver errors
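Steps 3 and 4 amount to a generic timing harness around a thread pool. The sketch below is not the PoC script itself: `run_query` is a placeholder callable standing in for whatever the driver call is, and `failing_query` is a stub that simulates a dropped connection so the example runs without a cluster.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def timed_call(run_query, call_id):
    """Run one query; return (call_id, elapsed seconds, error or None)."""
    start = time.monotonic()
    error = None
    try:
        run_query()
    except Exception as exc:  # driver errors would be captured here
        error = exc
    return call_id, time.monotonic() - start, error


def run_calls(run_query, calls, concurrency):
    """Issue `calls` invocations with at most `concurrency` in flight."""
    results = []
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        futures = [pool.submit(timed_call, run_query, i) for i in range(calls)]
        for future in as_completed(futures):
            results.append(future.result())
    return sorted(results, key=lambda r: r[0])


def failing_query():
    # Stub (assumption): stands in for a query whose connection is reset.
    raise ConnectionError("connection reset by peer")


results = run_calls(failing_query, calls=4, concurrency=2)
for call_id, seconds, error in results:
    print(f"call {call_id}: {seconds:.3f}s error={error!r}")
```

Recording the error alongside the duration is what lets the harness distinguish a slow invocation from one that failed because the node went away.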
## 3. Observed Results
- With the parameters above, each invocation allocates 4 GiB. Eight parallel
invocations therefore require ~32 GiB simultaneously.
- After ~9 seconds, every worker reported a failure:
  - `ConnectionShutdown('[Errno 104] Connection reset by peer')`
  - `('Unable to complete the operation against any hosts', {})`
- `pgrep -f CassandraDaemon` returned no PID immediately after the attack,
confirming the daemon exited.
- `logs/system.log` recorded repeated warnings: `User defined function
test_dos.memory_pressure : () -> text ran longer than 500ms` just prior to the
crash.
The JVM exited before it could return a Cassandra-side error to the client. An
authenticated attacker can therefore take the entire node offline with only a
handful of requests.
## 4. Recommendations
1. **Per-UDF Heap Quota**: Track allocations (or at least array sizes) and
abort invocations that exceed a configurable threshold.
2. **Execution Guardrails**: Run UDFs inside a dedicated memory-limited process
or leverage `-XX:MaxRAMPercentage` on a child worker JVM.
3. **Rate Limits / Admission Control**: Limit how many UDF invocations can run
simultaneously per user/keyspace to avoid aggregate heap pressure.
4. **Safer Defaults**: Keep `user_defined_functions_enabled` disabled unless
operators explicitly opt into the risk.
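
Recommendations 1 and 3 can be illustrated together with a language-neutral sketch in Python. This is not Cassandra code: the class names, the explicit `allocate` call, and the limits are all assumptions made for illustration. A real JVM implementation would need bytecode instrumentation or a separate process to observe allocations, since UDF code does not request memory through a budget object.

```python
import threading


class UdfQuotaExceeded(Exception):
    """Raised when an invocation exceeds its byte budget or no slot is free."""


class AllocationBudget:
    """Tracks bytes requested by one invocation (hypothetical accounting hook)."""

    def __init__(self, limit):
        self.limit = limit
        self.used = 0

    def allocate(self, size):
        # Check before allocating so the excess memory is never committed.
        if self.used + size > self.limit:
            raise UdfQuotaExceeded(f"heap budget of {self.limit} bytes exceeded")
        self.used += size
        return bytearray(size)


class UdfAdmissionController:
    """Caps concurrent invocations and gives each one a fresh byte budget."""

    def __init__(self, max_concurrent, max_bytes_per_invocation):
        self._slots = threading.BoundedSemaphore(max_concurrent)
        self._max_bytes = max_bytes_per_invocation

    def run(self, udf):
        if not self._slots.acquire(blocking=False):
            raise UdfQuotaExceeded("too many concurrent UDF invocations")
        try:
            return udf(AllocationBudget(self._max_bytes))
        finally:
            self._slots.release()


def small_udf(budget):
    return len(budget.allocate(1024))


def greedy_udf(budget):
    chunks = []
    while True:  # keeps references, like the PoC UDF, until the budget aborts it
        chunks.append(budget.allocate(256 * 1024))


controller = UdfAdmissionController(max_concurrent=2, max_bytes_per_invocation=1024 * 1024)
print(controller.run(small_udf))
try:
    controller.run(greedy_udf)
except UdfQuotaExceeded as exc:
    print("aborted:", exc)
```

Bounding both dimensions matters because the worst-case heap demand is the product of the two limits; capping only one leaves the aggregate unbounded.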