[ https://issues.apache.org/jira/browse/CASSANDRA-21233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cyl updated CASSANDRA-21233:
----------------------------
    Description: 
h2. 1. Vulnerability Description

*Name*: Authenticated DoS via Java UDF Heap Exhaustion

*Overview*:
When Java UDFs are enabled, user-supplied code executes inside the Cassandra
JVM without any heap quota or per-invocation memory guard. The sandbox limits
class access and CPU time, but *does not restrict heap usage*. A malicious user
can allocate massive arrays inside a UDF and keep them reachable until the
invocation completes. Because each invocation can allocate multiple gigabytes,
a few concurrent invocations quickly exhaust the heap, causing native transport
connections to reset and the {{CassandraDaemon}} to terminate.

*Affected Configurations*:
* {{user_defined_functions_enabled: true}}
* Any authenticated user allowed to run {{CREATE FUNCTION}} in a keyspace

*Impact*:
* Sudden drops of native connections ({{Connection reset by peer}})
* {{NoHostAvailable}} and {{ConnectionShutdown}} errors for clients
* Cassandra process exit (requires manual restart)
* Repeated warnings in {{system.log}} ("User defined function ... ran longer than 500ms") before the crash

Unlike CPU time, which is capped by the UDF fail timeout (1500ms), heap usage
has no guardrail, so the JVM can be forced into an OutOfMemoryError even if
every invocation finishes within the timeout window.
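To make that point concrete, here is a toy Python model (not Cassandra code) comparing a duration-only check with the peak heap held by overlapping invocations. Only the 1500ms threshold comes from this report; the {{Invocation}} record, the start offsets and the 1000ms duration are hypothetical values chosen for illustration, and the model ignores GC, object headers and everything else the JVM actually does.

{code:python}
# Toy model, not Cassandra code: a per-invocation time limit says nothing about
# how much heap overlapping invocations keep reachable at the same moment.
from dataclasses import dataclass

GIB = 1024 ** 3
FAIL_TIMEOUT_MS = 1500  # UDF fail timeout quoted in the report


@dataclass(frozen=True)
class Invocation:  # hypothetical record of one UDF call
    start_ms: int     # when the call starts
    duration_ms: int  # how long it keeps its arrays reachable
    live_bytes: int   # bytes held until the call returns


def passes_time_guard(inv: Invocation, timeout_ms: int = FAIL_TIMEOUT_MS) -> bool:
    """Duration-only check: memory is never consulted."""
    return inv.duration_ms <= timeout_ms


def peak_live_bytes(invocations: list[Invocation]) -> int:
    """Sweep over start/end events and return the most bytes held at once."""
    events = []
    for inv in invocations:
        events.append((inv.start_ms, inv.live_bytes))                     # allocate
        events.append((inv.start_ms + inv.duration_ms, -inv.live_bytes))  # release
    # Sorting on (time, delta) puts releases before allocations at equal times.
    events.sort()
    current = peak = 0
    for _, delta in events:
        current += delta
        peak = max(peak, current)
    return peak


if __name__ == "__main__":
    overlapping = [Invocation(10 * i, 1000, 4 * GIB) for i in range(8)]
    back_to_back = [Invocation(1000 * i, 1000, 4 * GIB) for i in range(8)]
    print(all(passes_time_guard(inv) for inv in overlapping))  # True
    print(peak_live_bytes(overlapping) // GIB)                 # 32
    print(peak_live_bytes(back_to_back) // GIB)                # 4
{code}

Every invocation passes the time check, yet eight overlapping 4 GiB invocations hold 32 GiB at once; only when the same calls run back to back does the peak drop to a single invocation's 4 GiB.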

h2. 2. Proof-of-Concept

Script: {{cve-study/finding-vul/DOS/poc_udf_memory_pressure.py}}

Key parameters:

{code:bash}
# each invocation allocates 256 * 16 MiB = 4 GiB
UDF_CHUNK_COUNT=256 \
UDF_CHUNK_SIZE_MB=16 \
UDF_CALLS=24 \
UDF_CONCURRENCY=8 \
python3 cve-study/finding-vul/DOS/poc_udf_memory_pressure.py
{code}
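The sizing in the comment above can be checked with a few lines of arithmetic. This is plain Python, not part of the PoC script; the {{heap_demand}} helper is hypothetical, and its peak figure assumes every in-flight call holds its arrays at the same time, which is the worst case.

{code:python}
# Back-of-the-envelope heap demand for the PoC parameters (hypothetical helper).
import math

MIB = 1024 ** 2
GIB = 1024 ** 3


def heap_demand(chunk_count: int, chunk_size_mb: int, calls: int, concurrency: int) -> dict:
    per_call = chunk_count * chunk_size_mb * MIB  # bytes one invocation allocates
    in_flight = min(concurrency, calls)           # calls that can run at once
    return {
        "per_call_gib": per_call / GIB,
        "peak_gib": in_flight * per_call / GIB,    # worst case: all in-flight calls overlap
        "rounds": math.ceil(calls / concurrency),  # minimum rounds of concurrent calls
    }


if __name__ == "__main__":
    print(heap_demand(chunk_count=256, chunk_size_mb=16, calls=24, concurrency=8))
    # {'per_call_gib': 4.0, 'peak_gib': 32.0, 'rounds': 3}
{code}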

Script flow:
# Enables/uses keyspace {{test_dos}}
# Creates UDF {{memory_pressure()}} that allocates {{chunkCount * chunkSize}} worth of {{byte[]}}
# Spawns a {{ThreadPoolExecutor}} to issue multiple concurrent {{SELECT test_dos.memory_pressure()}} calls
# Prints per-call duration and captures driver errors
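The client side of the last two steps is ordinary concurrency plumbing. A minimal sketch, assuming a zero-argument callable {{call}} stands in for executing the {{SELECT}} (with the DataStax Python driver this would wrap {{Session.execute}}); {{timed_call}} and {{run_calls}} are hypothetical names, and this is not the PoC script itself.

{code:python}
# Hypothetical sketch of the harness shape described above: fan calls out on a
# ThreadPoolExecutor, time each one, and record failures instead of raising.
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, List, Optional, Tuple

Result = Tuple[int, float, Optional[str]]  # (call index, seconds, error or None)


def timed_call(index: int, call: Callable[[], object]) -> Result:
    start = time.monotonic()
    try:
        call()
        error = None
    except Exception as exc:  # e.g. driver errors such as connection resets
        error = repr(exc)
    return index, time.monotonic() - start, error


def run_calls(call: Callable[[], object], calls: int, concurrency: int) -> List[Result]:
    results = []
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        futures = [pool.submit(timed_call, i, call) for i in range(calls)]
        for fut in as_completed(futures):
            idx, elapsed, error = fut.result()
            print(f"call {idx}: {elapsed:.3f}s {'ok' if error is None else error}")
            results.append((idx, elapsed, error))
    return sorted(results)  # order by call index


if __name__ == "__main__":
    def reset():  # stand-in for a query whose connection gets dropped
        raise ConnectionError("[Errno 104] Connection reset by peer")

    failures = [r for r in run_calls(reset, calls=4, concurrency=2) if r[2]]
    print(len(failures))  # 4
{code}

Catching the exception inside each task keeps one failed call from hiding the others, which is what lets every worker's error be reported individually.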

h2. 3. Observed Results

* With the parameters above, each invocation allocates 4 GiB, so eight parallel invocations require ~32 GiB of live heap at the same time.
* After ~9 seconds, every worker reported a failure:
** {{ConnectionShutdown('[Errno 104] Connection reset by peer')}}
** {{('Unable to complete the operation against any hosts', {})}}
* {{pgrep -f CassandraDaemon}} returned no PID immediately after the attack, confirming that the daemon had exited.
* {{logs/system.log}} recorded repeated warnings ({{User defined function test_dos.memory_pressure : () -> text ran longer than 500ms}}) just before the crash.

The JVM exited before Cassandra could return an error to the client, which
means an authenticated attacker can take the entire node offline with only a
handful of requests.

h2. 4. Recommendations

# *Per-UDF Heap Quota*: Track allocations (or at least array sizes) and abort invocations that exceed a configurable threshold.
# *Execution Guardrails*: Run UDFs inside a dedicated memory-limited process, or use {{-XX:MaxRAMPercentage}} on a child worker JVM.
# *Rate Limits / Admission Control*: Limit how many UDF invocations can run simultaneously per user/keyspace to bound aggregate heap pressure.
# *Safer Defaults*: Keep {{user_defined_functions_enabled}} disabled unless operators explicitly opt in to the risk.
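Recommendations 1 and 3 combine naturally: admit at most N concurrent invocations per user/keyspace and give each a byte budget, so the worst-case heap one user can pin in one keyspace is N times the budget. The Python below only models that policy: Cassandra exposes no such API, every name in it is hypothetical, and a real implementation would need some way to observe allocations inside the JVM (for example instrumenting array allocation in UDF code, or sampling HotSpot's per-thread allocated-bytes counter, which measures total allocation rather than live heap).

{code:python}
# Hypothetical policy model for recommendations 1 and 3; not a Cassandra API.
import threading
from collections import defaultdict
from contextlib import contextmanager


class QuotaExceeded(RuntimeError):
    """Raised when one invocation asks for more bytes than its budget allows."""


class AdmissionRejected(RuntimeError):
    """Raised when a (user, keyspace) pair already has too many UDFs running."""


class InvocationBudget:
    def __init__(self, max_bytes: int):
        self.max_bytes = max_bytes
        self.used = 0

    def charge(self, nbytes: int) -> None:
        # Check before recording, so a rejected allocation is never counted.
        if self.used + nbytes > self.max_bytes:
            raise QuotaExceeded(f"{nbytes} more bytes would exceed {self.max_bytes}")
        self.used += nbytes


class UdfAdmissionController:
    def __init__(self, max_concurrent: int, max_bytes_per_call: int):
        self.max_concurrent = max_concurrent
        self.max_bytes_per_call = max_bytes_per_call
        self._lock = threading.Lock()
        self._active = defaultdict(int)  # (user, keyspace) -> running invocations

    @contextmanager
    def invocation(self, user: str, keyspace: str):
        key = (user, keyspace)
        with self._lock:
            if self._active[key] >= self.max_concurrent:
                raise AdmissionRejected(f"{key} already runs {self.max_concurrent} UDFs")
            self._active[key] += 1
        try:
            yield InvocationBudget(self.max_bytes_per_call)
        finally:
            with self._lock:
                self._active[key] -= 1


if __name__ == "__main__":
    MIB = 1024 ** 2
    ctl = UdfAdmissionController(max_concurrent=2, max_bytes_per_call=64 * MIB)
    with ctl.invocation("alice", "test_dos") as budget:
        for chunk in range(256):  # the PoC's 256 chunks of 16 MiB
            try:
                budget.charge(16 * MIB)
            except QuotaExceeded as exc:
                print(f"aborted at chunk {chunk}: {exc}")  # chunk 4
                break
{code}

Over-limit callers are rejected rather than queued so that a flood of requests cannot pile up waiting threads; a bounded queue with a timeout would be a reasonable alternative design.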

  was:
# Authenticated DoS via UDF Heap Exhaustion

## 1. Vulnerability Description

**Name**: Authenticated DoS via Java UDF Heap Exhaustion

**Overview**:
When Java UDFs are enabled, the user-supplied code executes inside the 
Cassandra JVM without any heap quotas or per-invocation memory guards. The 
sandbox limits class access and CPU time, but *does not restrict heap usage*. A 
malicious user can allocate massive arrays inside a UDF and hold references to 
them until the invocation completes. Launching several concurrent UDF 
invocations forces Cassandra to allocate multiple gigabytes per request, which 
quickly exhausts the heap and causes native transport connections to reset and 
the `CassandraDaemon` to terminate.

**Affected Configurations**:
- `user_defined_functions_enabled: true`
- Any authenticated user allowed to run `CREATE FUNCTION` in a keyspace

**Impact**:
- Sudden drops of native connections (`Connection reset by peer`)
- `NoHostAvailable` and `ConnectionShutdown` errors for clients
- Cassandra process exit (requires manual restart)
- Repeated warnings in `system.log` ("User defined function ... ran longer than 500ms") before the crash

Unlike CPU time, which is capped by the UDF fail timeout (1500ms), heap usage 
has no guardrail, so the JVM can be forced into an OutOfMemoryError even if 
every invocation finishes within the timeout window.

## 2. Proof-of-Concept

Script: `cve-study/finding-vul/DOS/poc_udf_memory_pressure.py`

Key parameters:

```bash
# each invocation allocates 256 * 16 MiB = 4 GiB
UDF_CHUNK_COUNT=256 \
UDF_CHUNK_SIZE_MB=16 \
UDF_CALLS=24 \
UDF_CONCURRENCY=8 \
python3 cve-study/finding-vul/DOS/poc_udf_memory_pressure.py
```

Script flow:
1. Enables/uses keyspace `test_dos`
2. Creates UDF `memory_pressure()` that allocates `chunkCount * chunkSize` 
worth of `byte[]`
3. Spawns a `ThreadPoolExecutor` to issue multiple concurrent `SELECT 
test_dos.memory_pressure()` calls
4. Prints per-call duration and captures driver errors

## 3. Observed Results

- With the parameters above, each invocation allocates 4 GiB, so eight parallel invocations require ~32 GiB of live heap at the same time.
- After ~9 seconds, every worker reported a failure:
  - `ConnectionShutdown('[Errno 104] Connection reset by peer')`
  - `('Unable to complete the operation against any hosts', {})`
- `pgrep -f CassandraDaemon` returned no PID immediately after the attack, confirming that the daemon had exited.
- `logs/system.log` recorded repeated warnings (`User defined function test_dos.memory_pressure : () -> text ran longer than 500ms`) just before the crash.

The JVM exited before Cassandra could return an error to the client, which 
means an authenticated attacker can take the entire node offline with only a 
handful of requests.

## 4. Recommendations

1. **Per-UDF Heap Quota**: Track allocations (or at least array sizes) and abort invocations that exceed a configurable threshold.
2. **Execution Guardrails**: Run UDFs inside a dedicated memory-limited process, or use `-XX:MaxRAMPercentage` on a child worker JVM.
3. **Rate Limits / Admission Control**: Limit how many UDF invocations can run simultaneously per user/keyspace to bound aggregate heap pressure.
4. **Safer Defaults**: Keep `user_defined_functions_enabled` disabled unless operators explicitly opt in to the risk.



> Authenticated DoS via UDF Heap Exhaustion
> -----------------------------------------
>
>                 Key: CASSANDRA-21233
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-21233
>             Project: Apache Cassandra
>          Issue Type: Bug
>          Components: CQL/Semantics, Feature/Rate Limiting, Feature/UDF
>            Reporter: Cyl
>            Priority: Normal
>              Labels: dos, oom, performance, security
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
