Cyl created CASSANDRA-21228:
-------------------------------
Summary: ALTER ROLE Password Hash DoS Vulnerability
Key: CASSANDRA-21228
URL: https://issues.apache.org/jira/browse/CASSANDRA-21228
Project: Apache Cassandra
Issue Type: Bug
Reporter: Cyl
# ALTER ROLE Password Hash DoS Vulnerability
## 1. Vulnerability Description
**Name**: Authenticated DoS via `ALTER ROLE` Password Hashing
**Overview**:
In current Cassandra builds, `ALTER ROLE ... WITH PASSWORD` executes
`BCrypt.hashpw` synchronously on the standard request executor
(`Dispatcher.requestExecutor`). When an authenticated user issues many password
changes, the expensive bcrypt work monopolizes that pool, starving all other
CQL requests and producing an authenticated denial of service. This is the same
root problem addressed by CASSANDRA-17812 for `AUTH_RESPONSE`, but here it is
triggered through `ALTER ROLE` instead.
**Affected Configurations**:
- Clusters running `PasswordAuthenticator`.
- Any authenticated account that may alter its own password (default behavior
for non-superusers).
- Attackers that can reach the native CQL port.
**Impact**:
- Legitimate query latency inflates dramatically (observed increase from ~2 ms
to >1 s).
- Attack threads hit numerous `OperationTimedOut` errors, consistent with
exhaustion of the request thread pool.
- Service recovers immediately once the attack stops, as expected for a
CPU-starvation DoS.
## 2. Proof-of-Concept Steps
The file `poc_dos.py` automates the scenario:
1. Start a single-node Cassandra instance with `PasswordAuthenticator` and
`CassandraAuthorizer`.
2. With the superuser, create a victim role named `target_role`.
3. Launch 200 concurrent threads that run `ALTER ROLE target_role WITH PASSWORD
'<random>'` in a tight loop.
4. Start a monitor thread executing `SELECT now()` once per second to record
latency.
Run the following command:
```bash
python3 poc_dos.py
```
**Observed Output**:
```
Starting attack with 200 threads...
[Victim] Query latency: 0.3743s
[Victim] Query latency: 0.9145s
Worker failed: ('Unable to connect ... OperationTimedOut ...')
[Victim] Query latency: 1.0181s
...
```
Immediately after the attack begins, the monitor reports 300 ms–1 s latency
along with repeated `OperationTimedOut` errors. Once the attack stops, latency
returns to ~2 ms, showing that the slowdown is caused by the attack load.
## 3. Problematic Code Reference
The vulnerable path sits in `CassandraRoleManager.optionsToAssignments(...)`
and ultimately in `hashpw(...)`, both under
`src/java/org/apache/cassandra/auth/`:
```java
private String optionsToAssignments(Map<Option, Object> options)
{
    return options.entrySet()
                  .stream()
                  .map(entry ->
                       {
                           switch (entry.getKey())
                           {
                               case PASSWORD:
                                   // bcrypt runs on Dispatcher.requestExecutor
                                   return String.format("salted_hash = '%s'",
                                                        escape(hashpw((String) entry.getValue())));
                               // other options elided
                           }
                       })
                  .filter(Objects::nonNull)
                  .collect(Collectors.joining(","));
}

private static String hashpw(String password)
{
    return BCrypt.hashpw(password, PasswordSaltSupplier.get());
}
```
Because every `ALTER ROLE ... WITH PASSWORD` is processed on the shared
`Dispatcher.requestExecutor`, each invocation above performs bcrypt hashing on
threads that also handle standard queries, leading to starvation.
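
The starvation mechanism can be illustrated with a toy model that does not touch Cassandra at all. In the sketch below, `sharedPool` is a stand-in for the request executor and the latch-blocked tasks are stand-ins for in-flight bcrypt computations; the class and method names are hypothetical and chosen only for this illustration.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Toy model only (not Cassandra code): one fixed pool serves both slow
// "hashing" tasks and a cheap "query".
final class SharedPoolStarvationDemo
{
    /** Returns true if a cheap task is still queued while slow tasks hold every worker. */
    static boolean cheapTaskStarves(int workers)
    {
        ExecutorService sharedPool = Executors.newFixedThreadPool(workers);
        CountDownLatch hashingDone = new CountDownLatch(1);
        try
        {
            // Occupy every worker with a task that does not finish until released.
            for (int i = 0; i < workers; i++)
                sharedPool.submit(() -> { hashingDone.await(); return null; });

            // A trivial "query" submitted behind the slow work.
            Future<String> query = sharedPool.submit(() -> "row");
            query.get(200, TimeUnit.MILLISECONDS);
            return false; // the query ran promptly: no starvation
        }
        catch (TimeoutException stillQueued)
        {
            return true; // no worker was free to run the query
        }
        catch (InterruptedException | ExecutionException e)
        {
            throw new IllegalStateException(e);
        }
        finally
        {
            hashingDone.countDown(); // release the slow tasks so the pool can drain
            sharedPool.shutdown();
        }
    }

    public static void main(String[] args)
    {
        System.out.println("cheap task starved: " + cheapTaskStarves(4));
    }
}
```

The real attack differs in that the workers are busy burning CPU rather than parked on a latch, but the queueing effect on unrelated requests is the same.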
## 4. Related Issue and Root Cause
- **Related Fix**: [CASSANDRA-17812] “Rate-limit new client connection auth
  setup to avoid overwhelming bcrypt”.
  - Mitigation: route `AUTH_RESPONSE` (and similar) to `authExecutor`.
  - Gap: `ALTER ROLE` / `CREATE ROLE` continue to run on `requestExecutor`.
- **Shared Root Cause**: heavyweight bcrypt hashing without rate limiting or
pool isolation leads to CPU starvation.
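
The routing idea behind the mitigation, and how it could extend to the gap, can be sketched in isolation. The following is a minimal illustration and not Cassandra's dispatcher: `HashAwareDispatcher`, `Kind`, and both pools are hypothetical names, with `requestPool` standing in for `requestExecutor` and `hashingPool` for a small `authExecutor`-style pool.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical router (assumption, not Cassandra code): work that hashes a
// password is sent to a small dedicated pool instead of the request pool.
final class HashAwareDispatcher
{
    enum Kind { QUERY, PASSWORD_CHANGE }

    private final ExecutorService requestPool;
    private final ExecutorService hashingPool;

    HashAwareDispatcher(int requestThreads, int hashingThreads)
    {
        requestPool = Executors.newFixedThreadPool(requestThreads);
        hashingPool = Executors.newFixedThreadPool(hashingThreads);
    }

    /** Password-hashing work goes to the small pool; everything else stays on the request pool. */
    <T> Future<T> dispatch(Kind kind, Callable<T> work)
    {
        ExecutorService target = kind == Kind.PASSWORD_CHANGE ? hashingPool : requestPool;
        return target.submit(work);
    }

    void shutdown()
    {
        requestPool.shutdown();
        hashingPool.shutdown();
    }

    /** Floods the hashing pool with blocked work, then checks that a query still completes. */
    static boolean queryServedDuringFlood()
    {
        HashAwareDispatcher dispatcher = new HashAwareDispatcher(2, 1);
        CountDownLatch release = new CountDownLatch(1);
        try
        {
            for (int i = 0; i < 50; i++)
                dispatcher.dispatch(Kind.PASSWORD_CHANGE, () -> { release.await(); return "hash"; });

            Future<String> query = dispatcher.dispatch(Kind.QUERY, () -> "row");
            return "row".equals(query.get(5, TimeUnit.SECONDS));
        }
        catch (Exception e)
        {
            return false; // timeout, interruption, or task failure
        }
        finally
        {
            release.countDown(); // let the queued "hashing" tasks finish
            dispatcher.shutdown();
        }
    }

    public static void main(String[] args)
    {
        System.out.println("query served during flood: " + queryServedDuringFlood());
    }
}
```

For brevity the sketch uses an unbounded queue on the hashing pool; a production design would presumably bound that queue and reject or delay excess work so that queued password changes cannot accumulate without limit.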
## 5. Recommended Fixes
1. **Execution Isolation**: Dispatch password hashing work (`ALTER ROLE ...
PASSWORD`, `CREATE ROLE ... PASSWORD`, etc.) to a constrained executor similar
to `authExecutor`.
2. **Rate Limiting**: Enforce per-role, per-connection, or global throttles
(e.g., token bucket) on password modifications.
3. **Asynchronous Hashing**: Optionally compute bcrypt off-thread and update
the system tables once ready, returning an “operation queued” response
(requires protocol changes, higher complexity).
4. **Operational Mitigations** (until a code fix ships):
   - Monitor CPU saturation closely; adjusting
     `auth_bcrypt_gensalt_log2_rounds` does not solve the issue but may highlight
     abuse sooner.
   - Tighten credential/role cache TTLs (`roles_validity_in_ms`,
     `credentials_validity_in_ms`), though this cannot block an active attacker.
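
As an illustration of the rate-limiting option in item 2, here is a minimal token-bucket sketch. `PasswordChangeLimiter` is a hypothetical class, not an existing Cassandra component, and the capacity and refill rate are arbitrary example values; the clock is injected so the behavior can be checked deterministically.

```java
import java.util.function.LongSupplier;

// Hypothetical token bucket (assumption, not Cassandra code): each password
// change consumes one token; tokens refill at a fixed rate up to a burst cap.
final class PasswordChangeLimiter
{
    private final double capacity;
    private final double permitsPerSecond;
    private final LongSupplier nanoClock;
    private double tokens;
    private long lastRefillNanos;

    PasswordChangeLimiter(int capacity, double permitsPerSecond, LongSupplier nanoClock)
    {
        this.capacity = capacity;
        this.permitsPerSecond = permitsPerSecond;
        this.nanoClock = nanoClock;
        this.tokens = capacity; // start full so normal use is not penalized
        this.lastRefillNanos = nanoClock.getAsLong();
    }

    /** Returns true if the caller may hash a password now, false if it should be throttled. */
    synchronized boolean tryAcquire()
    {
        long now = nanoClock.getAsLong();
        double refill = (now - lastRefillNanos) * permitsPerSecond / 1_000_000_000.0;
        tokens = Math.min(capacity, tokens + refill);
        lastRefillNanos = now;
        if (tokens < 1.0)
            return false;
        tokens -= 1.0;
        return true;
    }

    public static void main(String[] args)
    {
        // Burst of 2, then one permit every two seconds.
        PasswordChangeLimiter limiter = new PasswordChangeLimiter(2, 0.5, System::nanoTime);
        for (int i = 0; i < 3; i++)
            System.out.println("attempt " + i + " allowed: " + limiter.tryAcquire());
    }
}
```

A per-role or per-connection throttle could keep one such limiter per key, for example in a `ConcurrentHashMap` populated with `computeIfAbsent`, while a global throttle would share a single instance.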
## 6. Conclusion
This vulnerability belongs to the same family as CASSANDRA-17812: bcrypt
computations starving the main request pool. Because any authenticated account
can trigger it with repeated `ALTER ROLE` statements, the risk is high. We
recommend extending the rate limiting / dedicated executor strategy to all
password-hashing pathways as soon as possible.