[
https://issues.apache.org/jira/browse/CASSANDRA-21233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Cyl updated CASSANDRA-21233:
----------------------------
Description:
h2. 1. Vulnerability Description
*Name*: Authenticated DoS via Java UDF Heap Exhaustion
*Overview*:
When Java UDFs are enabled, user-supplied code executes inside the
Cassandra JVM without any heap quota or per-invocation memory guard. The
sandbox limits class access and CPU time, but *does not restrict heap usage*. A
malicious user can allocate massive arrays inside a UDF and hold references to
them until the invocation completes. Launching several concurrent UDF
invocations forces Cassandra to allocate multiple gigabytes per request, which
quickly exhausts the heap, resets native transport connections, and terminates
the {{CassandraDaemon}}.
*Affected Configurations*:
* {{user_defined_functions_enabled: true}}
* Any authenticated user with {{CREATE FUNCTION}} capability on a keyspace
*Impact*:
* Sudden drops of native connections ({{Connection reset by peer}})
* {{NoHostAvailable}} and {{ConnectionShutdown}} errors for clients
* Cassandra process exit (requires manual restart)
* Repeated warnings in {{system.log}} ("User defined function ... ran longer
than 500ms") before the crash
Unlike the CPU fail-timeout (1500ms), heap usage has no guardrail, so
the JVM can be forced to OOM even if each invocation finishes within the
timeout window.
h2. 2. Proof-of-Concept
Script: {{cve-study/finding-vul/DOS/poc_udf_memory_pressure.py}}
Key parameters:
{code:bash}
# each invocation allocates 256 * 16 MiB = 4 GiB
UDF_CHUNK_COUNT=256 \
UDF_CHUNK_SIZE_MB=16 \
UDF_CALLS=24 \
UDF_CONCURRENCY=8 \
python3 cve-study/finding-vul/DOS/poc_udf_memory_pressure.py
{code}
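The arithmetic behind these parameters can be checked with a short sketch. It assumes, as the description states, that each invocation holds every chunk until it returns; the helper names {{per_invocation_bytes}} and {{peak_demand_bytes}} are hypothetical and are not taken from the PoC script.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3


def per_invocation_bytes(chunk_count: int, chunk_size_mb: int) -> int:
    """Bytes held by one UDF call that keeps every chunk alive until it returns."""
    return chunk_count * chunk_size_mb * MIB


def peak_demand_bytes(chunk_count: int, chunk_size_mb: int, concurrency: int) -> int:
    """Worst-case heap demand when `concurrency` calls overlap completely."""
    return per_invocation_bytes(chunk_count, chunk_size_mb) * concurrency


# Values taken from the PoC parameters above.
per_call = per_invocation_bytes(chunk_count=256, chunk_size_mb=16)
peak = peak_demand_bytes(chunk_count=256, chunk_size_mb=16, concurrency=8)

print(f"per invocation: {per_call // GIB} GiB")
print(f"peak with 8 overlapping calls: {peak // GIB} GiB")
```

{{UDF_CALLS}} does not enter the peak figure: it sets how many calls are issued in total, while {{UDF_CONCURRENCY}} bounds how many can be in flight at once.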
Script flow:
# Enables/uses keyspace {{test_dos}}
# Creates UDF {{memory_pressure()}} that allocates {{chunkCount * chunkSize}}
worth of {{byte[]}}
# Spawns a {{ThreadPoolExecutor}} to issue multiple concurrent {{SELECT
test_dos.memory_pressure()}} calls
# Prints per-call duration and captures driver errors
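The last two steps of this flow might look roughly like the sketch below. It is not the PoC script itself: {{run_query}} is a stand-in callable (the real script presumably executes the {{SELECT}} through a driver session), and {{timed_call}} and {{run_calls}} are hypothetical names chosen for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def timed_call(run_query, call_id):
    """Run one query, returning (call_id, duration_seconds, error_or_None)."""
    start = time.monotonic()
    try:
        run_query()
        error = None
    except Exception as exc:  # capture driver-style errors instead of aborting
        error = repr(exc)
    return call_id, time.monotonic() - start, error


def run_calls(run_query, calls, concurrency):
    """Issue `calls` queries with at most `concurrency` in flight at once."""
    results = []
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        futures = [pool.submit(timed_call, run_query, i) for i in range(calls)]
        for future in as_completed(futures):
            results.append(future.result())
    return sorted(results, key=lambda item: item[0])


def fake_query():
    """Stand-in for the real query so the sketch runs without a cluster."""
    time.sleep(0.01)


for call_id, seconds, error in run_calls(fake_query, calls=24, concurrency=8):
    status = "ok" if error is None else error
    print(f"call {call_id}: {seconds:.3f}s {status}")
```

Recording the error per call, rather than letting the first exception stop the run, is what makes it possible to see every worker fail at roughly the same moment.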
h2. 3. Observed Results
* With the parameters above, each invocation allocates 4 GiB. Eight parallel
invocations therefore require ~32 GiB simultaneously.
* After ~9 seconds every worker reported failure:
** {{ConnectionShutdown('[Errno 104] Connection reset by peer')}}
** {{('Unable to complete the operation against any hosts', {})}}
* {{pgrep -f CassandraDaemon}} returned no PID immediately after the attack,
confirming the daemon exited.
* {{logs/system.log}} recorded repeated warnings: {{User defined function
test_dos.memory_pressure : () -> text ran longer than 500ms}} just prior to the
crash.
The JVM exited before propagating a Cassandra-side error back to the client,
which means an authenticated attacker can take the entire node offline with
only a handful of requests.
h2. 4. Recommendations
# *Per-UDF Heap Quota*: Track allocations (or at least array sizes) and abort
invocations that exceed a configurable threshold.
# *Execution Guardrails*: Run UDFs inside a dedicated memory-limited process or
leverage {{-XX:MaxRAMPercentage}} on a child worker JVM.
# *Rate Limits / Admission Control*: Limit how many UDF invocations can run
simultaneously per user/keyspace to avoid aggregate heap pressure.
# *Safer Defaults*: Keep {{user_defined_functions_enabled}} disabled unless
operators explicitly opt into the risk.
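The first and third recommendations can be illustrated together with a minimal sketch. This is an assumption-laden model, not a proposed patch: a real guard would live in Cassandra's Java code and would have to measure allocations as they happen, whereas this sketch assumes the requested size is known up front. {{UdfAdmissionController}} and all of its parameters are hypothetical.

```python
import threading

MIB = 1024 ** 2


class UdfAdmissionController:
    """Hypothetical guard: caps per-call bytes, total reserved bytes, and concurrency."""

    def __init__(self, max_concurrent, max_bytes_per_call, max_total_bytes):
        self.max_concurrent = max_concurrent
        self.max_bytes_per_call = max_bytes_per_call
        self.max_total_bytes = max_total_bytes
        self._lock = threading.Lock()
        self._running = 0
        self._reserved = 0

    def try_admit(self, requested_bytes):
        """Reserve budget for one invocation; return False if any limit would be exceeded."""
        with self._lock:
            if requested_bytes > self.max_bytes_per_call:
                return False  # per-UDF heap quota
            if self._running >= self.max_concurrent:
                return False  # admission control on concurrency
            if self._reserved + requested_bytes > self.max_total_bytes:
                return False  # aggregate heap pressure
            self._running += 1
            self._reserved += requested_bytes
            return True

    def release(self, requested_bytes):
        """Return the budget of a finished invocation that was previously admitted."""
        with self._lock:
            self._running -= 1
            self._reserved -= requested_bytes


controller = UdfAdmissionController(
    max_concurrent=2,
    max_bytes_per_call=64 * MIB,
    max_total_bytes=100 * MIB,
)
# A 4 GiB request, like the one in the PoC, is refused by the per-call quota.
print(controller.try_admit(4096 * MIB))
```

Checking all three limits under one lock keeps the reservation atomic, so two concurrent requests cannot both pass the aggregate check and then jointly exceed it.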
was:
# Authenticated DoS via UDF Heap Exhaustion
## 1. Vulnerability Description
**Name**: Authenticated DoS via Java UDF Heap Exhaustion
**Overview**:
When Java UDFs are enabled, user-supplied code executes inside the
Cassandra JVM without any heap quotas or per-invocation memory guards. The
sandbox limits class access and CPU time, but *does not restrict heap usage*. A
malicious user can allocate massive arrays inside a UDF and hold references to
them until the invocation completes. Launching several concurrent UDF
invocations forces Cassandra to allocate multiple gigabytes per request, which
quickly exhausts the heap and causes native transport connections to reset and
the `CassandraDaemon` to terminate.
**Affected Configurations**:
- `user_defined_functions_enabled: true`
- Any authenticated user with `CREATE FUNCTION` capability on a keyspace
**Impact**:
- Sudden drops of native connections (`Connection reset by peer`)
- `NoHostAvailable` and `ConnectionShutdown` errors for clients
- Cassandra process exit (requires manual restart)
- Repeated warnings in `system.log` ("User defined function ... ran longer than
500ms") before the crash
Unlike the CPU fail-timeout (1500ms), there is no guardrail for heap usage, so
the JVM can be forced to OOM even if each invocation finishes within the
timeout window.
## 2. Proof-of-Concept
Script: `cve-study/finding-vul/DOS/poc_udf_memory_pressure.py`
Key parameters:
```bash
# each invocation allocates 256 * 16 MiB = 4 GiB
UDF_CHUNK_COUNT=256 \
UDF_CHUNK_SIZE_MB=16 \
UDF_CALLS=24 \
UDF_CONCURRENCY=8 \
python3 cve-study/finding-vul/DOS/poc_udf_memory_pressure.py
```
Script flow:
1. Enables/uses keyspace `test_dos`
2. Creates UDF `memory_pressure()` that allocates `chunkCount * chunkSize`
worth of `byte[]`
3. Spawns a `ThreadPoolExecutor` to issue multiple concurrent `SELECT
test_dos.memory_pressure()` calls
4. Prints per-call duration and captures driver errors
## 3. Observed Results
- With the parameters above, each invocation allocates 4 GiB. Eight parallel
invocations therefore require ~32 GiB simultaneously.
- After ~9 seconds every worker reported failure:
  - `ConnectionShutdown('[Errno 104] Connection reset by peer')`
  - `('Unable to complete the operation against any hosts', {})`
- `pgrep -f CassandraDaemon` returned no PID immediately after the attack,
confirming the daemon exited.
- `logs/system.log` recorded repeated warnings: `User defined function
test_dos.memory_pressure : () -> text ran longer than 500ms` just prior to the
crash.
The JVM exited before propagating a Cassandra-side error back to the client,
which means an authenticated attacker can take the entire node offline with
only a handful of requests.
## 4. Recommendations
1. **Per-UDF Heap Quota**: Track allocations (or at least array sizes) and
abort invocations that exceed a configurable threshold.
2. **Execution Guardrails**: Run UDFs inside a dedicated memory-limited process
or leverage `-XX:MaxRAMPercentage` on a child worker JVM.
3. **Rate Limits / Admission Control**: Limit how many UDF invocations can run
simultaneously per user/keyspace to avoid aggregate heap pressure.
4. **Safer Defaults**: Keep `user_defined_functions_enabled` disabled unless
operators explicitly opt into the risk.
> Authenticated DoS via UDF Heap Exhaustion
> -----------------------------------------
>
> Key: CASSANDRA-21233
> URL: https://issues.apache.org/jira/browse/CASSANDRA-21233
> Project: Apache Cassandra
> Issue Type: Bug
> Components: CQL/Semantics, Feature/Rate Limiting, Feature/UDF
> Reporter: Cyl
> Priority: Normal
> Labels: dos, oom, performance, security