[ 
https://issues.apache.org/jira/browse/CASSANDRA-21228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cyl updated CASSANDRA-21228:
----------------------------
    Component/s: Feature/Authorization
                 Feature/Rate Limiting
         Labels: dos performance security  (was: )

> ALTER ROLE Password Hash DoS Vulnerability
> ------------------------------------------
>
>                 Key: CASSANDRA-21228
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-21228
>             Project: Apache Cassandra
>          Issue Type: Bug
>          Components: Feature/Authorization, Feature/Rate Limiting
>            Reporter: Cyl
>            Priority: Normal
>              Labels: dos, performance, security
>
> # ALTER ROLE Password Hash DoS Vulnerability
> ## 1. Vulnerability Description
> **Name**: Authenticated DoS via `ALTER ROLE` Password Hashing
> **Overview**:
> In current Cassandra builds, `ALTER ROLE ... WITH PASSWORD` executes 
> `BCrypt.hashpw` synchronously on the standard request executor 
> (`Dispatcher.requestExecutor`). When an authenticated user issues many 
> password changes, the expensive bcrypt work monopolizes that pool, starving 
> all other CQL requests and producing an authenticated denial of service. This 
> is the same root problem that CASSANDRA-17812 addressed for `AUTH_RESPONSE`, 
> reached here through `ALTER ROLE` instead.
> **Affected Configurations**:
> - Clusters running `PasswordAuthenticator`.
> - Any authenticated account that may alter its own password (default behavior 
> for non-superusers).
> - Attackers that can reach the native CQL port.
> **Impact**:
> - Legitimate query latency rises sharply under attack (observed increase 
> from ~2 ms to >1 s).
> - Attack threads hit numerous `OperationTimedOut` errors, consistent with 
> request thread-pool exhaustion.
> - Service recovers immediately once the attack stops, indicating a classic 
> CPU-starvation DoS.
> ## 2. Proof-of-Concept Steps
> The file `poc_dos.py` automates the scenario:
> 1. Start a single-node Cassandra instance with `PasswordAuthenticator` and 
> `CassandraAuthorizer`.
> 2. With the superuser, create a victim role named `target_role`.
> 3. Launch 200 concurrent threads that run `ALTER ROLE target_role WITH 
> PASSWORD '<random>'` in a tight loop.
> 4. Start a monitor thread executing `SELECT now()` once per second to record 
> latency.
> Run the following command:
> ```bash
> python3 poc_dos.py
> ```
> **Observed Output**:
> ```
> Starting attack with 200 threads...
> [Victim] Query latency: 0.3743s
> [Victim] Query latency: 0.9145s
> Worker failed: ('Unable to connect ... OperationTimedOut ...')
> [Victim] Query latency: 1.0181s
> ...
> ```
> Immediately after the attack begins, the monitor reports 300 ms–1 s latency 
> along with repeated `OperationTimedOut` errors. Once the attack stops, 
> latency returns to ~2 ms, confirming that the attack load causes the slowdown.
> ## 3. Problematic Code Reference
> The vulnerable path sits in `CassandraRoleManager.optionsToAssignments(...)` 
> and ultimately in `hashpw(...)`, both under 
> `src/java/org/apache/cassandra/auth/`:
> ```java
> private String optionsToAssignments(Map<Option, Object> options)
> {
>   return options.entrySet()
>           .stream()
>           .map(entry ->
>           {
>             switch (entry.getKey())
>             {
>               case PASSWORD:
>                 // bcrypt runs on Dispatcher.requestExecutor
>                 return String.format("salted_hash = '%s'",
>                                      escape(hashpw((String) entry.getValue())));
>               // other options elided
>             }
>           })
>           .filter(Objects::nonNull)
>           .collect(Collectors.joining(","));
> }
> private static String hashpw(String password)
> {
>   return BCrypt.hashpw(password, PasswordSaltSupplier.get());
> }
> ```
> Because every `ALTER ROLE ... WITH PASSWORD` is processed on the shared 
> `Dispatcher.requestExecutor`, each invocation above performs bcrypt hashing 
> on threads that also handle standard queries, leading to starvation.
> ## 4. Related Issue and Root Cause
> - **Related Fix**: [CASSANDRA-17812] “Rate-limit new client connection auth 
> setup to avoid overwhelming bcrypt”.
>   - Mitigation: route `AUTH_RESPONSE` (and similar) to `authExecutor`.
>   - Gap: `ALTER ROLE` / `CREATE ROLE` continue to run on `requestExecutor`.
> - **Shared Root Cause**: heavyweight bcrypt hashing without rate limiting or 
> pool isolation leads to CPU starvation.
> ## 5. Recommended Fixes
> 1. **Execution Isolation**: Dispatch password hashing work (`ALTER ROLE ... 
> PASSWORD`, `CREATE ROLE ... PASSWORD`, etc.) to a constrained executor 
> similar to `authExecutor`.
> 2. **Rate Limiting**: Enforce per-role, per-connection, or global throttles 
> (e.g., token bucket) on password modifications.
> 3. **Asynchronous Hashing**: Optionally compute bcrypt off-thread and update 
> the system tables once ready, returning an “operation queued” response 
> (requires protocol changes, higher complexity).
> 4. **Operational Mitigations** (until a code fix ships):
>    - Monitor CPU saturation closely; adjusting 
> `auth_bcrypt_gensalt_log2_rounds` does not solve the issue but may highlight 
> abuse sooner.
>    - Tighten credential/role cache TTLs (`roles_validity_in_ms`, 
> `credentials_validity_in_ms`), though this cannot block an active attacker.
> ## 6. Conclusion
> This vulnerability belongs to the same family as CASSANDRA-17812—bcrypt 
> computations starving the main request pool. Because any authenticated 
> account can trigger it with repeated `ALTER ROLE` statements, the risk is 
> high. We recommend extending the rate limiting / dedicated executor strategy 
> to all password-hashing pathways as soon as possible.
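
The execution-isolation recommendation in section 5 can be sketched roughly as follows. This is a minimal illustration, not Cassandra code: the class name `IsolatedPasswordHasher`, its constructor parameters, and the pool sizes are hypothetical, and PBKDF2 from the JDK stands in for `BCrypt.hashpw` so the example has no external dependencies. The idea is that a fixed thread count caps the CPU spent on hashing, while a bounded queue with `AbortPolicy` rejects excess work quickly instead of letting it pile up.

```java
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Base64;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;

// Hypothetical sketch, not Cassandra code: a small bounded pool reserved for password hashing.
final class IsolatedPasswordHasher implements AutoCloseable
{
    private static final SecureRandom RANDOM = new SecureRandom();
    private final ThreadPoolExecutor executor;

    IsolatedPasswordHasher(int threads, int queueCapacity)
    {
        // Fixed thread count caps hashing CPU; the bounded queue plus AbortPolicy
        // makes the pool reject work once it is saturated.
        executor = new ThreadPoolExecutor(threads, threads, 0L, TimeUnit.MILLISECONDS,
                                          new ArrayBlockingQueue<>(queueCapacity),
                                          new ThreadPoolExecutor.AbortPolicy());
    }

    // Returns a future for the hash, or an already-failed future if the pool is saturated.
    CompletableFuture<String> hashAsync(String password)
    {
        try
        {
            return CompletableFuture.supplyAsync(() -> slowHash(password), executor);
        }
        catch (RejectedExecutionException e)
        {
            CompletableFuture<String> failed = new CompletableFuture<>();
            failed.completeExceptionally(e);
            return failed;
        }
    }

    // Stand-in for BCrypt.hashpw (assumption): PBKDF2 is used only because it ships with the JDK.
    static String slowHash(String password)
    {
        try
        {
            byte[] salt = new byte[16];
            RANDOM.nextBytes(salt);
            PBEKeySpec spec = new PBEKeySpec(password.toCharArray(), salt, 50_000, 256);
            byte[] hash = SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256")
                                          .generateSecret(spec)
                                          .getEncoded();
            Base64.Encoder enc = Base64.getEncoder();
            return enc.encodeToString(salt) + "$" + enc.encodeToString(hash);
        }
        catch (GeneralSecurityException e)
        {
            throw new IllegalStateException(e);
        }
    }

    @Override
    public void close()
    {
        // Queued hashes still finish; no new work is accepted.
        executor.shutdown();
    }

    public static void main(String[] args)
    {
        try (IsolatedPasswordHasher hasher = new IsolatedPasswordHasher(2, 8))
        {
            int rejected = 0;
            for (int i = 0; i < 200; i++)
            {
                if (hasher.hashAsync("pw" + i).isCompletedExceptionally())
                    rejected++;
            }
            System.out.println("rejected " + rejected + " of 200 hash requests");
        }
    }
}
```

In a real server the rejection would presumably be translated into an overload error for the client. Note that isolation only helps if the request thread does not block waiting on the future; otherwise request threads are still tied up even though hashing CPU is capped.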

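The rate-limiting recommendation (token bucket on password modifications) could look something like the sketch below. Everything here is an assumption made for illustration: the class `PasswordChangeLimiter`, the per-role keying, and the capacity and refill values are hypothetical and do not correspond to any existing Cassandra API. The clock is injected so the behavior can be checked deterministically.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

// Hypothetical sketch, not Cassandra code: one token bucket per role for password changes.
final class PasswordChangeLimiter
{
    private static final class Bucket
    {
        double tokens;
        long lastRefillNanos;

        Bucket(double tokens, long now)
        {
            this.tokens = tokens;
            this.lastRefillNanos = now;
        }
    }

    private final ConcurrentHashMap<String, Bucket> buckets = new ConcurrentHashMap<>();
    private final double capacity;
    private final double refillPerSecond;
    private final LongSupplier nanoClock;

    PasswordChangeLimiter(double capacity, double refillPerSecond, LongSupplier nanoClock)
    {
        this.capacity = capacity;
        this.refillPerSecond = refillPerSecond;
        this.nanoClock = nanoClock;
    }

    // Returns true if a password change for this role may proceed now.
    boolean tryAcquire(String role)
    {
        long now = nanoClock.getAsLong();
        Bucket bucket = buckets.computeIfAbsent(role, r -> new Bucket(capacity, now));
        synchronized (bucket)
        {
            double elapsedSeconds = (now - bucket.lastRefillNanos) / 1e9;
            if (elapsedSeconds > 0)
            {
                // Refill in proportion to elapsed time, never above capacity.
                bucket.tokens = Math.min(capacity, bucket.tokens + elapsedSeconds * refillPerSecond);
                bucket.lastRefillNanos = now;
            }
            if (bucket.tokens < 1.0)
                return false;
            bucket.tokens -= 1.0;
            return true;
        }
    }

    public static void main(String[] args)
    {
        long[] clock = { 0L }; // simulated nanosecond clock; production code might use System::nanoTime
        PasswordChangeLimiter limiter = new PasswordChangeLimiter(3, 1.0, () -> clock[0]);

        for (int i = 1; i <= 5; i++)
            System.out.println("attempt " + i + ": " + limiter.tryAcquire("target_role"));

        clock[0] += TimeUnit.SECONDS.toNanos(2); // simulate a two-second pause
        for (int i = 6; i <= 8; i++)
            System.out.println("attempt " + i + ": " + limiter.tryAcquire("target_role"));
    }
}
```

With a capacity of 3 and a refill of one token per second, the first three attempts pass, the next two are refused, and after the simulated two-second pause two more pass before the bucket is empty again. Keying by role is only one of the options the report lists; a per-connection or global limiter would use a different key, and a real implementation would also need to bound the size of the bucket map.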


--
This message was sent by Atlassian Jira
(v8.20.10#820010)

