Cyl created CASSANDRA-21233:
-------------------------------

             Summary: Authenticated DoS via UDF Heap Exhaustion
                 Key: CASSANDRA-21233
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-21233
             Project: Apache Cassandra
          Issue Type: Bug
            Reporter: Cyl


# Authenticated DoS via UDF Heap Exhaustion

## 1. Vulnerability Description

**Name**: Authenticated DoS via Java UDF Heap Exhaustion

**Overview**:
When Java UDFs are enabled, user-supplied code executes inside the Cassandra
JVM without any heap quota or per-invocation memory guard. The sandbox limits
class access and CPU time, but *does not restrict heap usage*. A malicious user
can allocate massive arrays inside a UDF and hold references to them until the
invocation completes. Launching several concurrent UDF invocations forces the
JVM to allocate multiple gigabytes per request, which quickly exhausts the
heap, resets native transport connections, and terminates the
`CassandraDaemon`.

**Affected Configurations**:
- `user_defined_functions_enabled: true`
- Any authenticated user permitted to run `CREATE FUNCTION` on a keyspace

**Impact**:
- Sudden drops of native connections (`Connection reset by peer`)
- `NoHostAvailable` and `ConnectionShutdown` errors for clients
- Cassandra process exit (requires manual restart)
- Repeated warnings in `system.log` ("User defined function ... ran longer than
500ms") before the crash

CPU time is bounded by the fail timeout (1500ms), but heap usage has no
equivalent guardrail, so the JVM can be forced into an OOM even if each
invocation finishes within the timeout window.

## 2. Proof-of-Concept

Script: `cve-study/finding-vul/DOS/poc_udf_memory_pressure.py`

Key parameters:

```bash
# each invocation allocates 256 * 16 MiB = 4 GiB
UDF_CHUNK_COUNT=256 \
UDF_CHUNK_SIZE_MB=16 \
UDF_CALLS=24 \
UDF_CONCURRENCY=8 \
python3 cve-study/finding-vul/DOS/poc_udf_memory_pressure.py
```
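As a quick sanity check on the comment in the command above, the following Python sketch recomputes the heap retained by one invocation and by several overlapping invocations. The helper name `heap_demand_bytes` is hypothetical and not part of the PoC script. It assumes each invocation keeps every chunk referenced until it returns, as described in the overview.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3


def heap_demand_bytes(chunk_count: int, chunk_size_mb: int, concurrency: int = 1) -> int:
    """Bytes held live if `concurrency` invocations each retain all of their chunks."""
    return chunk_count * chunk_size_mb * MIB * concurrency


# Values taken from UDF_CHUNK_COUNT, UDF_CHUNK_SIZE_MB and UDF_CONCURRENCY above.
per_call = heap_demand_bytes(256, 16)
peak = heap_demand_bytes(256, 16, concurrency=8)

print(f"per invocation: {per_call / GIB:.0f} GiB")  # per invocation: 4 GiB
print(f"8 concurrent:   {peak / GIB:.0f} GiB")      # 8 concurrent:   32 GiB
```

The peak figure is an upper bound on simultaneous demand. The node fails as soon as the demand exceeds the configured heap, which can happen well before all eight invocations have finished allocating.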

Script flow:
1. Enables/uses keyspace `test_dos`
2. Creates UDF `memory_pressure()`, which allocates `chunkCount` `byte[]`
arrays of `chunkSize` MiB each
3. Spawns a `ThreadPoolExecutor` to issue multiple concurrent `SELECT 
test_dos.memory_pressure()` calls
4. Prints per-call duration and captures driver errors
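
Steps 3 and 4 could look roughly like the following. This is only a sketch of the concurrency and timing pattern, not the PoC itself: `stub_query` is a hypothetical stand-in that sleeps instead of contacting a cluster, and `timed_call` and `run_concurrently` are names chosen here for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def timed_call(call_id, fn):
    """Run fn once and return (call_id, elapsed seconds, error or None)."""
    start = time.perf_counter()
    try:
        fn()
        err = None
    except Exception as exc:  # a driver error would be captured here
        err = exc
    return call_id, time.perf_counter() - start, err


def run_concurrently(fn, calls, concurrency):
    """Issue `calls` invocations of fn with at most `concurrency` in flight."""
    results = []
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        futures = [pool.submit(timed_call, i, fn) for i in range(calls)]
        for fut in as_completed(futures):
            results.append(fut.result())
    return results


def stub_query():
    # Hypothetical placeholder for a driver call; it does no real work.
    time.sleep(0.01)


# Call counts mirror UDF_CALLS and UDF_CONCURRENCY from the command above.
results = run_concurrently(stub_query, calls=24, concurrency=8)
for call_id, seconds, err in sorted(results, key=lambda r: r[0]):
    status = "ok" if err is None else f"error: {err!r}"
    print(f"call {call_id:2d}: {seconds * 1000:.1f} ms {status}")
```

Capturing the exception inside the worker, rather than letting it propagate through `Future.result()`, keeps one failed call from hiding the timings of the others.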

## 3. Observed Results

- With the parameters above, each invocation allocates 4 GiB. Eight parallel
invocations therefore require ~32 GiB simultaneously.
- After ~9 seconds every worker reported failure:
  - `ConnectionShutdown('[Errno 104] Connection reset by peer')`
  - `('Unable to complete the operation against any hosts', {})`
- `pgrep -f CassandraDaemon` returned no PID immediately after the attack, 
confirming the daemon exited.
- `logs/system.log` recorded repeated warnings: `User defined function 
test_dos.memory_pressure : () -> text ran longer than 500ms` just prior to the 
crash.

The JVM exited before any Cassandra-side error reached the client. An
authenticated attacker can therefore take the entire node offline with only a
handful of requests.

## 4. Recommendations

1. **Per-UDF Heap Quota**: Track allocations (or at least array sizes) and 
abort invocations that exceed a configurable threshold.
2. **Execution Guardrails**: Run UDFs inside a dedicated memory-limited process 
or leverage `-XX:MaxRAMPercentage` on a child worker JVM.
3. **Rate Limits / Admission Control**: Limit how many UDF invocations can run 
simultaneously per user/keyspace to avoid aggregate heap pressure.
4. **Safer Defaults**: Keep `user_defined_functions_enabled` disabled unless 
operators explicitly opt into the risk.
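
Recommendations 1 and 3 can be illustrated with a small accounting model. The sketch below is Python for brevity and is not Cassandra code: `UdfMemoryBudget`, `HeapBudgetExceeded`, and both limits are hypothetical. A real implementation would need to hook allocation inside the UDF sandbox rather than rely on declared sizes.

```python
import threading
from contextlib import contextmanager

MIB = 1024 ** 2


class HeapBudgetExceeded(RuntimeError):
    """Raised when a reservation would exceed a configured limit."""


class UdfMemoryBudget:
    """Hypothetical guard: a per-invocation quota plus an aggregate cap."""

    def __init__(self, per_call_limit, total_limit):
        self.per_call_limit = per_call_limit
        self.total_limit = total_limit
        self._reserved = 0
        self._lock = threading.Lock()

    @contextmanager
    def reserve(self, nbytes):
        # Recommendation 1: reject a single invocation that asks for too much.
        if nbytes > self.per_call_limit:
            raise HeapBudgetExceeded(f"{nbytes} bytes exceeds the per-call quota")
        # Recommendation 3: reject when concurrent invocations add up to too much.
        with self._lock:
            if self._reserved + nbytes > self.total_limit:
                raise HeapBudgetExceeded("aggregate UDF budget exhausted")
            self._reserved += nbytes
        try:
            yield
        finally:
            # Release the reservation whether the invocation succeeded or failed.
            with self._lock:
                self._reserved -= nbytes


budget = UdfMemoryBudget(per_call_limit=64 * MIB, total_limit=128 * MIB)

with budget.reserve(48 * MIB):
    pass  # the invocation would run here

try:
    with budget.reserve(4096 * MIB):  # the 4 GiB request from the PoC parameters
        pass
except HeapBudgetExceeded as exc:
    print("rejected:", exc)
```

Rejecting before the allocation happens is the point of the design: once the arrays exist, the heap is already under pressure and aborting the invocation may come too late.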



