[ 
https://issues.apache.org/jira/browse/CASSANDRA-21233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

C. Scott Andreas reassigned CASSANDRA-21233:
--------------------------------------------

    Assignee: C. Scott Andreas

> Authenticated DoS via UDF Heap Exhaustion
> -----------------------------------------
>
>                 Key: CASSANDRA-21233
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-21233
>             Project: Apache Cassandra
>          Issue Type: Bug
>          Components: CQL/Semantics, Feature/Rate Limiting, Feature/UDF
>            Reporter: Cyl
>            Assignee: C. Scott Andreas
>            Priority: Normal
>              Labels: dos, oom, performance, security
>
> h2. Vulnerability Description
> *Name*: Authenticated DoS via Java UDF Heap Exhaustion
> *Overview*:
> When Java UDFs are enabled, user-supplied code executes inside the 
> Cassandra JVM without any heap quotas or per-invocation memory guards. The 
> sandbox limits class access and CPU time, but *does not restrict heap usage*. 
> A malicious user can allocate massive arrays inside a UDF and hold references 
> to them until the invocation completes. Launching several concurrent UDF 
> invocations forces Cassandra to allocate multiple gigabytes per request, 
> which quickly exhausts the heap and causes native transport connections to 
> reset and the {{CassandraDaemon}} to terminate.
> *Affected Configurations*:
> * {{user_defined_functions_enabled: true}}
> * Any authenticated user with {{CREATE FUNCTION}} capability on a keyspace
> *Impact*:
> * Sudden drops of native connections ({{Connection reset by peer}})
> * {{NoHostAvailable}} and {{ConnectionShutdown}} errors for clients
> * Cassandra process exit (requires manual restart)
> * Repeated warnings in {{system.log}} ("User defined function ... ran longer 
> than 500ms") before crash
> Unlike the CPU fail-timeout (1500ms), there is no guardrail for heap usage, so 
> the JVM can be forced to OOM even if each invocation finishes within the 
> timeout window.
> h2. Proof-of-Concept
> Script: {{cve-study/finding-vul/DOS/poc_udf_memory_pressure.py}}
> Key parameters:
> {code:bash}
> # each invocation allocates 256 * 16 MiB = 4 GiB
> UDF_CHUNK_COUNT=256 \
> UDF_CHUNK_SIZE_MB=16 \
> UDF_CALLS=24 \
> UDF_CONCURRENCY=8 \
> python3 cve-study/finding-vul/DOS/poc_udf_memory_pressure.py
> {code}
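
The sizing in the comment above can be checked with a few lines of arithmetic. This is a minimal sketch, assuming {{UDF_CHUNK_SIZE_MB}} is interpreted as MiB (as the comment in the issue does) and that every invocation keeps all of its chunks live at once; the helper name `heap_demand_bytes` is made up for illustration and is not part of the PoC script.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3


def heap_demand_bytes(chunk_count: int, chunk_size_mb: int, concurrency: int = 1) -> int:
    """Bytes held live if `concurrency` invocations each retain all of their chunks."""
    return chunk_count * chunk_size_mb * MIB * concurrency


# Values taken from the PoC parameters: UDF_CHUNK_COUNT=256, UDF_CHUNK_SIZE_MB=16,
# UDF_CONCURRENCY=8.
per_call = heap_demand_bytes(256, 16)
aggregate = heap_demand_bytes(256, 16, concurrency=8)

print(f"per invocation: {per_call / GIB:.0f} GiB")
print(f"8 concurrent:   {aggregate / GIB:.0f} GiB")
```

The aggregate figure is an upper bound on what the attack asks for, not a measurement of what the JVM actually granted before it failed.
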
> Script flow:
> # Creates or reuses keyspace {{test_dos}}
> # Creates UDF {{memory_pressure()}} that allocates {{chunkCount * chunkSize}} 
> worth of {{byte[]}}
> # Spawns a {{ThreadPoolExecutor}} to issue multiple concurrent {{SELECT 
> test_dos.memory_pressure()}} calls
> # Prints per-call duration and captures driver errors
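
The last two steps of the flow above (concurrent calls, per-call timing, captured errors) can be sketched roughly as follows. The PoC script itself is not included in the issue, so this is an assumption about its shape rather than its actual code. `timed_call`, `run_concurrently`, and `fake_execute` are hypothetical names, and `fake_execute` is a stand-in that performs no network I/O; a real run would pass a driver session's execute method instead. The statement string is copied as written in the issue and is not complete CQL.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def timed_call(execute, statement):
    """Run one statement and return (duration_seconds, error_or_None)."""
    start = time.perf_counter()
    try:
        execute(statement)
        return time.perf_counter() - start, None
    except Exception as exc:  # errors are captured per call, not raised
        return time.perf_counter() - start, exc


def run_concurrently(execute, statement, calls, concurrency):
    """Issue `calls` statements using at most `concurrency` worker threads."""
    results = []
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        futures = [pool.submit(timed_call, execute, statement) for _ in range(calls)]
        for future in as_completed(futures):
            results.append(future.result())
    return results


def fake_execute(statement):
    """Hypothetical stand-in for a driver call; only validates the statement text."""
    if "memory_pressure" not in statement:
        raise ValueError("unexpected statement")


# UDF_CALLS=24 and UDF_CONCURRENCY=8 come from the parameters in the issue.
results = run_concurrently(
    fake_execute, "SELECT test_dos.memory_pressure()", calls=24, concurrency=8
)
failures = [err for _, err in results if err is not None]
print(f"{len(results)} calls, {len(failures)} failed")
```

Capturing the exception inside the worker keeps one failed call from hiding the outcome of the others, which matters here because the reported failure mode is every worker erroring at roughly the same time.
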
> h2. Observed Results
> * With the parameters above, each invocation allocates 4 GiB. Eight parallel 
> invocations therefore require ~32 GiB simultaneously.
> * After ~9 seconds every worker reported failure:
>   * {{ConnectionShutdown('[Errno 104] Connection reset by peer')}}
>   * {{('Unable to complete the operation against any hosts', {})}}
> * {{pgrep -f CassandraDaemon}} returned no PID immediately after the attack, 
> confirming the daemon exited.
> * {{logs/system.log}} recorded repeated warnings: {{User defined function 
> test_dos.memory_pressure : () -> text ran longer than 500ms}} just prior to 
> the crash.
> The JVM exited before propagating a Cassandra-side error back to the client, 
> which means an authenticated attacker can take the entire node offline with 
> only a handful of requests.
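
Since the slow-UDF warning is the only server-side signal reported before the crash, an operator might count those warnings per function. A minimal sketch, assuming the message text appears on a single log line exactly as quoted in the issue; the rest of the {{system.log}} line format (timestamp, level, thread) is not shown there, so the pattern searches anywhere in the line. `count_slow_udf_warnings` is a hypothetical helper name.

```python
import re
from collections import Counter

# Matches the warning text quoted in the issue, wherever it occurs in a line.
SLOW_UDF = re.compile(
    r"User defined function (?P<name>\S+) : (?P<signature>.*?) "
    r"ran longer than (?P<ms>\d+)ms"
)


def count_slow_udf_warnings(lines):
    """Return a Counter of slow-UDF warnings keyed by function name."""
    counts = Counter()
    for line in lines:
        match = SLOW_UDF.search(line)
        if match:
            counts[match.group("name")] += 1
    return counts


# Sample input built only from the message quoted in the issue.
sample = [
    "User defined function test_dos.memory_pressure : () -> text ran longer than 500ms",
    "some unrelated log line",
    "User defined function test_dos.memory_pressure : () -> text ran longer than 500ms",
]
counts = count_slow_udf_warnings(sample)
print(counts.most_common())
```

A burst of these warnings for one function is only a hint: the issue notes that the heap can be exhausted even when each invocation stays inside the timeout window.
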
> h2. Recommendations
> # *Per-UDF Heap Quota*: Track allocations (or at least array sizes) and abort 
> invocations that exceed a configurable threshold.
> # *Execution Guardrails*: Run UDFs inside a dedicated memory-limited process 
> or leverage {{-XX:MaxRAMPercentage}} on a child worker JVM.
> # *Rate Limits / Admission Control*: Limit how many UDF invocations can run 
> simultaneously per user/keyspace to avoid aggregate heap pressure.
> # *Safer Defaults*: Keep {{user_defined_functions_enabled}} disabled unless 
> operators explicitly opt into the risk.
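
To make recommendations 1 and 3 concrete, here is a language-neutral model of both ideas, written in Python to match the PoC. It is a sketch only: Cassandra is implemented in Java, and nothing here is an existing Cassandra API or setting. `UdfAdmissionController` and `AllocationBudget` are hypothetical names, and the budget only works if the runtime can charge every allocation before it is made, which is the hard part in a real JVM.

```python
import threading
from collections import defaultdict
from contextlib import contextmanager


class UdfAdmissionController:
    """Caps concurrent UDF invocations per user (hypothetical, not a Cassandra API)."""

    def __init__(self, max_concurrent_per_user: int):
        self._limit = max_concurrent_per_user
        self._lock = threading.Lock()
        self._active = defaultdict(int)

    @contextmanager
    def admit(self, user: str):
        # Reject immediately instead of queueing, so pressure cannot pile up.
        with self._lock:
            if self._active[user] >= self._limit:
                raise RuntimeError(f"too many concurrent UDF invocations for {user}")
            self._active[user] += 1
        try:
            yield
        finally:
            with self._lock:
                self._active[user] -= 1


class AllocationBudget:
    """Per-invocation byte budget; charge() is called before each allocation."""

    def __init__(self, max_bytes: int):
        self.max_bytes = max_bytes
        self.used = 0

    def charge(self, nbytes: int) -> None:
        if self.used + nbytes > self.max_bytes:
            raise MemoryError(
                f"UDF allocation budget exceeded: {self.used + nbytes} > {self.max_bytes}"
            )
        self.used += nbytes


# Usage: one slot per user, and a 64 MiB budget charged in 16 MiB chunks.
controller = UdfAdmissionController(max_concurrent_per_user=1)
budget = AllocationBudget(max_bytes=64 * 1024 * 1024)
with controller.admit("alice"):
    for _ in range(4):
        budget.charge(16 * 1024 * 1024)
print(f"charged {budget.used} bytes")
```

The two limits are complementary: the budget bounds a single invocation, while the admission limit bounds how many budgets can be outstanding at once, so the worst-case heap demand is roughly their product.
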



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
