[ 
https://issues.apache.org/jira/browse/CASSANDRA-21233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cyl updated CASSANDRA-21233:
----------------------------
    Component/s: Feature/Rate Limiting
                 CQL/Semantics
                 Feature/UDF
         Labels: dos oom performance security  (was: )

> Authenticated DoS via UDF Heap Exhaustion
> -----------------------------------------
>
>                 Key: CASSANDRA-21233
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-21233
>             Project: Apache Cassandra
>          Issue Type: Bug
>          Components: CQL/Semantics, Feature/Rate Limiting, Feature/UDF
>            Reporter: Cyl
>            Priority: Normal
>              Labels: dos, oom, performance, security
>
> # Authenticated DoS via UDF Heap Exhaustion
> ## 1. Vulnerability Description
> **Name**: Authenticated DoS via Java UDF Heap Exhaustion
> **Overview**:
> When Java UDFs are enabled, user-supplied code executes inside the 
> Cassandra JVM without any heap quotas or per-invocation memory guards. The 
> sandbox limits class access and CPU time, but *does not restrict heap usage*. 
> A malicious user can allocate massive arrays inside a UDF and hold references 
> to them until the invocation completes. Launching several concurrent UDF 
> invocations forces Cassandra to allocate multiple gigabytes per request, 
> which quickly exhausts the heap and causes native transport connections to 
> reset and the `CassandraDaemon` to terminate.
> **Affected Configurations**:
> - `user_defined_functions_enabled: true`
> - Any authenticated user with `CREATE FUNCTION` capability on a keyspace
> **Impact**:
> - Sudden drops of native connections (`Connection reset by peer`)
> - `NoHostAvailable` and `ConnectionShutdown` errors for clients
> - Cassandra process exit (requires manual restart)
> - Repeated warnings in `system.log` ("User defined function ... ran longer 
> than 500ms") before the crash
> CPU time is bounded by the fail timeout (1500ms), but heap usage has no 
> equivalent guardrail, so the JVM can be forced to OOM even if each 
> invocation finishes within the timeout window.
> ## 2. Proof-of-Concept
> Script: `cve-study/finding-vul/DOS/poc_udf_memory_pressure.py`
> Key parameters:
> ```bash
> # each invocation allocates 256 * 16 MiB = 4 GiB
> UDF_CHUNK_COUNT=256 \
> UDF_CHUNK_SIZE_MB=16 \
> UDF_CALLS=24 \
> UDF_CONCURRENCY=8 \
> python3 cve-study/finding-vul/DOS/poc_udf_memory_pressure.py
> ```
> Script flow:
> 1. Enables/uses keyspace `test_dos`
> 2. Creates UDF `memory_pressure()` that allocates `chunkCount * chunkSize` 
> worth of `byte[]`
> 3. Spawns a `ThreadPoolExecutor` to issue multiple concurrent `SELECT 
> test_dos.memory_pressure()` calls
> 4. Prints per-call duration and captures driver errors
> ## 3. Observed Results
> - With the parameters above each invocation allocates 4 GiB. Eight parallel 
> invocations therefore require ~32 GiB simultaneously.
> - After ~9 seconds every worker reported failure:
>   - `ConnectionShutdown('[Errno 104] Connection reset by peer')`
>   - `('Unable to complete the operation against any hosts', {})`
> - `pgrep -f CassandraDaemon` returned no PID immediately after the attack, 
> confirming the daemon exited.
> - `logs/system.log` recorded repeated warnings: `User defined function 
> test_dos.memory_pressure : () -> text ran longer than 500ms` just prior to 
> the crash.
> The JVM exited before it could propagate a Cassandra-side error back to the 
> client. An authenticated attacker can therefore take the entire node offline 
> with only a handful of requests.
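
The allocation figures quoted in Sections 2 and 3 can be checked with a few lines of arithmetic. This is a minimal sketch, not part of the PoC script: the function names are hypothetical, and it assumes `UDF_CHUNK_SIZE_MB` is interpreted as MiB, as the comment in the parameter block suggests.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3


def per_invocation_bytes(chunk_count: int, chunk_size_mb: int) -> int:
    """Bytes held by one UDF invocation (chunk size assumed to be in MiB)."""
    return chunk_count * chunk_size_mb * MIB


def peak_demand_bytes(chunk_count: int, chunk_size_mb: int, concurrency: int) -> int:
    """Worst-case aggregate demand if all concurrent invocations overlap."""
    return per_invocation_bytes(chunk_count, chunk_size_mb) * concurrency


if __name__ == "__main__":
    # Values taken from the parameter block in Section 2.
    per_call = per_invocation_bytes(256, 16)
    peak = peak_demand_bytes(256, 16, 8)
    print(f"per invocation: {per_call // GIB} GiB")
    print(f"peak with 8 workers: {peak // GIB} GiB")
```

The peak figure is an upper bound on simultaneous demand. A node fails as soon as the live set exceeds its configured heap, which is typically far below that bound.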
> ## 4. Recommendations
> 1. **Per-UDF Heap Quota**: Track allocations (or at least array sizes) and 
> abort invocations that exceed a configurable threshold.
> 2. **Execution Guardrails**: Run UDFs inside a dedicated memory-limited 
> process or leverage `-XX:MaxRAMPercentage` on a child worker JVM.
> 3. **Rate Limits / Admission Control**: Limit how many UDF invocations can 
> run simultaneously per user/keyspace to avoid aggregate heap pressure.
> 4. **Safer Defaults**: Keep `user_defined_functions_enabled` disabled unless 
> operators explicitly opt into the risk.
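
Recommendation 3 could look roughly like the sketch below. It is written in Python for brevity and is not Cassandra code: `UdfAdmissionController`, `AdmissionRejected`, and the choice of key (a user name, keyspace, or both) are assumptions made for illustration. The idea is to count in-flight invocations per key and reject new ones once a configured limit is reached, rather than queueing them.

```python
import threading
from contextlib import contextmanager


class AdmissionRejected(Exception):
    """Raised when a key already has too many invocations in flight."""


class UdfAdmissionController:
    """Caps concurrent UDF invocations per key (hypothetical helper)."""

    def __init__(self, max_concurrent_per_key: int):
        if max_concurrent_per_key < 1:
            raise ValueError("max_concurrent_per_key must be >= 1")
        self._limit = max_concurrent_per_key
        self._in_flight = {}
        self._lock = threading.Lock()

    def in_flight(self, key) -> int:
        """Number of invocations currently admitted for this key."""
        with self._lock:
            return self._in_flight.get(key, 0)

    @contextmanager
    def admit(self, key):
        # Check and increment under one lock so two callers cannot both
        # take the last free slot.
        with self._lock:
            current = self._in_flight.get(key, 0)
            if current >= self._limit:
                raise AdmissionRejected(
                    f"{key!r} already has {current} invocation(s) in flight"
                )
            self._in_flight[key] = current + 1
        try:
            yield
        finally:
            # Release the slot even if the guarded call raised.
            with self._lock:
                remaining = self._in_flight[key] - 1
                if remaining:
                    self._in_flight[key] = remaining
                else:
                    del self._in_flight[key]
```

A caller would wrap each invocation as `with controller.admit(user): run_udf(...)`, where `run_udf` stands in for whatever executes the function. Rejecting instead of blocking keeps request threads from piling up behind a slow UDF. A concurrency cap limits aggregate pressure only: a single invocation can still allocate without bound, so this complements a per-invocation quota and does not replace it.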


