Cyl created CASSANDRA-21228:
-------------------------------

             Summary: ALTER ROLE Password Hash DoS Vulnerability
                 Key: CASSANDRA-21228
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-21228
             Project: Apache Cassandra
          Issue Type: Bug
            Reporter: Cyl


# ALTER ROLE Password Hash DoS Vulnerability

## 1. Vulnerability Description

**Name**: Authenticated DoS via `ALTER ROLE` Password Hashing

**Overview**:
In current Cassandra builds, `ALTER ROLE ... WITH PASSWORD` executes 
`BCrypt.hashpw` synchronously on the standard request executor 
(`Dispatcher.requestExecutor`). When an authenticated user issues many password 
changes, the expensive bcrypt work monopolizes that pool, starving all other 
CQL requests and producing an authenticated denial of service. This is the same 
root problem addressed by CASSANDRA-17812 for `AUTH_RESPONSE`, but the trigger 
has moved to `ALTER ROLE`.

**Affected Configurations**:
- Clusters running `PasswordAuthenticator`.
- Any authenticated account that may alter its own password (default behavior 
for non-superusers).
- Attackers that can reach the native CQL port.

**Impact**:
- Legitimate query latency inflates dramatically (observed increase from ~2 ms 
to >1 s).
- Attack threads hit numerous `OperationTimedOut` errors, consistent with 
exhaustion of the request thread pool.
- Service recovers immediately once the attack stops, indicating a classic 
CPU-starvation DoS.

## 2. Proof-of-Concept Steps

The file `poc_dos.py` automates the scenario:

1. Start a single-node Cassandra instance with `PasswordAuthenticator` and 
`CassandraAuthorizer`.
2. With the superuser, create a victim role named `target_role`.
3. Launch 200 concurrent threads, each running 
`ALTER ROLE target_role WITH PASSWORD '<random>'` in a tight loop.
4. Start a monitor thread executing `SELECT now()` once per second to record 
latency.

Run the following command:

```bash
python3 poc_dos.py
```

**Observed Output**:

```
Starting attack with 200 threads...
[Victim] Query latency: 0.3743s
[Victim] Query latency: 0.9145s
Worker failed: ('Unable to connect ... OperationTimedOut ...')
[Victim] Query latency: 1.0181s
...
```

Immediately after the attack begins, the monitor reports latencies of 300 ms to 
1 s along with repeated `OperationTimedOut` errors. Once the attack stops, 
latency returns to ~2 ms, confirming that the slowdown is caused by the 
`ALTER ROLE` load rather than by an unrelated fault.

## 3. Problematic Code Reference

The vulnerable path sits in `CassandraRoleManager.optionsToAssignments(...)` 
and ultimately in `hashpw(...)`, both under 
`src/java/org/apache/cassandra/auth/`:

```java
private String optionsToAssignments(Map<Option, Object> options)
{
  return options.entrySet()
          .stream()
          .map(entry ->
          {
            switch (entry.getKey())
            {
              case PASSWORD:
                // bcrypt runs on Dispatcher.requestExecutor
                return String.format("salted_hash = '%s'",
                                     escape(hashpw((String) entry.getValue())));
              // other options elided
            }
          })
          .filter(Objects::nonNull)
          .collect(Collectors.joining(","));
}

private static String hashpw(String password)
{
  return BCrypt.hashpw(password, PasswordSaltSupplier.get());
}
```

Because every `ALTER ROLE ... WITH PASSWORD` is processed on the shared 
`Dispatcher.requestExecutor`, each invocation above performs bcrypt hashing on 
threads that also handle standard queries, leading to starvation.
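
The starvation mechanism can be illustrated outside Cassandra with a minimal, 
self-contained sketch. It rests on several assumptions: PBKDF2 from the JDK 
stands in for `BCrypt.hashpw`, plain fixed thread pools stand in for 
`Dispatcher.requestExecutor`, and the names `SharedPoolStarvationDemo`, 
`slowHash`, and `hashesFinishedBeforeProbe` are hypothetical. It is not 
Cassandra code; it only shows how a cheap request queued behind hashing work on 
a shared FIFO pool has to wait for that work.

```java
import java.security.GeneralSecurityException;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;

// Hypothetical demo class; not part of Cassandra.
class SharedPoolStarvationDemo
{
    // Stand-in for BCrypt.hashpw: a deliberately slow key derivation from the JDK.
    static byte[] slowHash(String password)
    {
        try
        {
            PBEKeySpec spec = new PBEKeySpec(password.toCharArray(), new byte[16], 20_000, 256);
            return SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256")
                                   .generateSecret(spec)
                                   .getEncoded();
        }
        catch (GeneralSecurityException e)
        {
            throw new IllegalStateException(e);
        }
    }

    // Queues floodSize hashing tasks on hashPool, then one cheap probe on requestPool.
    // Returns how many hashes had already finished at the moment the probe ran.
    static int hashesFinishedBeforeProbe(ExecutorService requestPool,
                                         ExecutorService hashPool,
                                         int floodSize)
    {
        AtomicInteger finished = new AtomicInteger();
        for (int i = 0; i < floodSize; i++)
        {
            String password = "pw-" + i;
            Runnable hashTask = () -> {
                slowHash(password);
                finished.incrementAndGet();
            };
            hashPool.submit(hashTask);
        }

        // The probe plays the role of an ordinary query such as SELECT now().
        Callable<Integer> probe = finished::get;
        Future<Integer> result = requestPool.submit(probe);
        try
        {
            return result.get();
        }
        catch (InterruptedException e)
        {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        catch (ExecutionException e)
        {
            throw new IllegalStateException(e);
        }
    }
}
```

Passing the same 2-thread pool for both arguments with a flood of 8 means the 
probe cannot run until at least 7 hashes have finished, because it is dequeued 
only after every hashing task ahead of it. Passing two separate pools lets the 
probe run without queueing behind the hashing work.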

## 4. Related Issue and Root Cause

- **Related Fix**: [CASSANDRA-17812] “Rate-limit new client connection auth 
setup to avoid overwhelming bcrypt”.
  - Mitigation: route `AUTH_RESPONSE` (and similar) to `authExecutor`.
  - Gap: `ALTER ROLE` / `CREATE ROLE` continue to run on `requestExecutor`.
- **Shared Root Cause**: heavyweight bcrypt hashing without rate limiting or 
pool isolation leads to CPU starvation.
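
Pool isolation of the kind CASSANDRA-17812 introduced for `AUTH_RESPONSE` can 
be sketched as follows. This is an illustrative assumption, not the actual 
`authExecutor` implementation: `IsolatedHashExecutor` is a hypothetical class, 
the thread count and queue capacity are arbitrary, and the hashing function is 
injected so the sketch does not depend on a bcrypt library. Hashing runs on a 
small dedicated pool with a bounded queue, and work beyond that bound is 
rejected instead of piling up on request threads.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.function.UnaryOperator;

// Hypothetical helper; not a Cassandra class.
class IsolatedHashExecutor
{
    private final ThreadPoolExecutor pool;
    private final UnaryOperator<String> hasher;

    // threads and queueCapacity bound how much hashing can be in flight at once.
    IsolatedHashExecutor(int threads, int queueCapacity, UnaryOperator<String> hasher)
    {
        this.pool = new ThreadPoolExecutor(threads, threads,
                                           0L, TimeUnit.MILLISECONDS,
                                           new ArrayBlockingQueue<>(queueCapacity),
                                           new ThreadPoolExecutor.AbortPolicy());
        this.hasher = hasher;
    }

    // Submits one hash computation to the dedicated pool.
    // Throws RejectedExecutionException when all threads are busy and the queue is full.
    Future<String> hashAsync(String password)
    {
        Callable<String> task = () -> hasher.apply(password);
        return pool.submit(task);
    }

    // Lets already-accepted work finish, then stops the pool.
    void shutdown()
    {
        pool.shutdown();
    }
}
```

With one thread and a queue capacity of one, a third concurrent submission is 
rejected while the first two are still pending. A caller could translate that 
rejection into an overload error for the offending client rather than letting 
the work compete with ordinary queries.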

## 5. Recommended Fixes

1. **Execution Isolation**: Dispatch password hashing work 
(`ALTER ROLE ... PASSWORD`, `CREATE ROLE ... PASSWORD`, etc.) to a constrained 
executor similar to `authExecutor`.
2. **Rate Limiting**: Enforce per-role, per-connection, or global throttles 
(e.g., token bucket) on password modifications.
3. **Asynchronous Hashing**: Optionally compute bcrypt off-thread and update 
the system tables once ready, returning an “operation queued” response 
(requires protocol changes, higher complexity).
4. **Operational Mitigations** (until a code fix ships):
   - Monitor CPU saturation closely; adjusting 
`auth_bcrypt_gensalt_log2_rounds` does not solve the issue but may highlight 
abuse sooner.
   - Tighten credential/role cache TTLs (`roles_validity_in_ms`, 
`credentials_validity_in_ms`) though this cannot block an active attacker.
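
The token-bucket throttle mentioned in the rate-limiting recommendation could 
look roughly like the sketch below. Everything in it is an assumption made for 
illustration: `PasswordChangeLimiter` is a hypothetical class, the capacity and 
refill interval are arbitrary, and the injected clock exists only to make the 
behavior easy to test. One instance could be kept per role, per connection, or 
globally, depending on which scope is chosen.

```java
import java.util.function.LongSupplier;

// Hypothetical token-bucket limiter for password modifications; not a Cassandra class.
class PasswordChangeLimiter
{
    private final long capacity;        // maximum burst of password changes
    private final long nanosPerToken;   // time needed to earn one new token
    private final LongSupplier nanoClock;
    private long tokens;
    private long lastRefill;

    PasswordChangeLimiter(long capacity, long nanosPerToken, LongSupplier nanoClock)
    {
        this.capacity = capacity;
        this.nanosPerToken = nanosPerToken;
        this.nanoClock = nanoClock;
        this.tokens = capacity;                  // start with a full bucket
        this.lastRefill = nanoClock.getAsLong();
    }

    // Returns true if a password change may proceed now, false if it should be throttled.
    synchronized boolean tryAcquire()
    {
        long now = nanoClock.getAsLong();
        long earned = (now - lastRefill) / nanosPerToken;
        if (earned > 0)
        {
            // Add whole tokens only, capped at capacity; keep the remainder for later.
            tokens = Math.min(capacity, tokens + earned);
            lastRefill += earned * nanosPerToken;
        }
        if (tokens > 0)
        {
            tokens--;
            return true;
        }
        return false;
    }
}
```

In production use the clock would typically be `System::nanoTime`. A statement 
handler would call `tryAcquire()` before hashing and reject the statement when 
it returns `false`, so a throttled request never reaches the bcrypt call.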

## 6. Conclusion

This vulnerability belongs to the same family as CASSANDRA-17812: bcrypt 
computations starving the main request pool. Because any authenticated account 
can trigger it with repeated `ALTER ROLE` statements, the risk is high. We 
recommend extending the rate-limiting and dedicated-executor strategy to all 
password-hashing pathways as soon as possible.



