andrewmusselman opened a new issue, #944:
URL: https://github.com/apache/tooling-trusted-releases/issues/944
Discussion point that came up during key revokation in #922
From @dave2wave:
```
I think we need to be clear on ldap state in development/debug/test modes
and users. ASVS L2 finds our inconsistencies. We should probably discuss the
rules we really need. This way you can properly handle edge cases.
```
# LDAP State in Dev/Debug/Test Modes — Discussion Points
## Context
ASVS L2 compliance review has surfaced inconsistencies in how ATR handles
LDAP-dependent authentication and authorization across its operating modes:
**Production**, **Debug**, and **Debug+ALLOW_TESTS**. The core tension is that
LDAP is the source of truth for user identity, account status, committee
membership, and admin roles — but in development scenarios it may be
unavailable, partially available, or entirely bypassed.
This document maps the specific inconsistencies found in the codebase and
proposes rules to discuss.
---
## 1. The Three Operating Modes (and what actually changes)
| Behavior | Production | Debug (no ALLOW_TESTS) | Debug + ALLOW_TESTS |
|---|---|---|---|
| LDAP credentials required? | Yes (startup fails without them in
`principal.py`) | No (graceful degradation) | No |
| `ldap.is_active()` when no credentials | N/A | **Returns `True` for
everyone** | Returns `True` for everyone |
| `"test"` user gets admin? | No | No | **Yes** |
| `"test"` user bypasses LDAP entirely? | No | No | **Yes** (hardcoded in
`principal.py`, `ldap.py`, `user.py`) |
| Rate limiting enabled? | Yes | Yes | **No** |
| Session cache file used for auth? | No | **Yes** (fallback in
`AuthoriserLDAP`) | Yes |
| `_augment_test_membership()` adds "test" to everyone? | No | No | **Yes** |
| Test login endpoint (`/test/login`) available? | No (404) | No (404) |
**Yes** (creates session with no LDAP) |
### Discussion Point 1a: `is_active()` silently returns `True` when LDAP is
unconfigured
In `atr/ldap.py` line 248:
```python
async def is_active(asf_uid: str) -> bool:
if get_bind_credentials() is None:
return True # ← Any UID is treated as active
```
This means if a developer accidentally runs without LDAP credentials,
**every user is considered active including banned accounts**. This is the
primary path for checking banned/deleted accounts in `blueprints/common.py`,
`jwtoken.py`, and `ssh.py`.
**Question**: Should this be `return False` (fail-closed) when LDAP is
unavailable in non-Debug modes? Or should it raise an error rather than
silently succeeding?
### Discussion Point 1b: `fetch_admin_users()` and `fetch_tooling_users()`
degrade differently
When LDAP is unavailable:
- `fetch_admin_users()` returns `frozenset()` — no admins (fail-closed)
- `fetch_tooling_users(extra)` returns `extra` — the additional users still
get tooling access
This is inconsistent. One fails closed, the other partially open.
---
## 2. The "test" User — A Synthetic Identity with Real Privileges
When `ALLOW_TESTS=True`, a synthetic `"test"` user is created across at
least **seven different code paths**, each with its own hardcoded bypass:
| Location | What it does |
|---|---|
| `ldap.is_active()` | Returns `True` without checking LDAP |
| `user.is_admin()` / `is_admin_async()` | Returns `True` — test is a full
admin |
| `principal.AuthoriserLDAP.cache_refresh()` | Hardcodes
`committees={"test"}, projects={"test"}` |
| `principal._augment_test_membership()` | Adds "test" committee/project to
**every authenticated user** |
| `server._initialise_test_environment()` | Creates DB records: test
committee, project, with `"test"` as sole member |
| `get/test.py::test_login` | Creates a session cookie with `uid="test"` and
no LDAP verification |
| `user.projects()` | Makes the test project visible to all users |
### Discussion Point 2a: The test user is an admin everywhere, not just for
the test project
`user.is_admin("test")` returns `True` globally. This means in ALLOW_TESTS
mode, the test user can access admin functions like "browse as", LDAP lookup,
release deletion, token revocation, and project management. If ALLOW_TESTS
accidentally leaks into a production-like environment, this is a privilege
escalation path.
**Question**: Should the test user be admin of the test project only, or is
full admin necessary for testing? Could we scope it?
### Discussion Point 2b: `_augment_test_membership()` modifies real users'
memberships
In `principal.py` line 412:
```python
def _augment_test_membership(committees, projects):
if config.get().ALLOW_TESTS:
committees = committees.union({"test"})
projects = projects.union({"test"})
return committees, projects
```
This is called for **every user**, not just the test user. Every real
developer who authenticates via LDAP in Debug+ALLOW_TESTS mode is silently
granted membership in the "test" committee and project. This is intentional (so
devs can interact with the test project) but is never documented or surfaced to
the user.
**Question**: Is this the right approach, or should developers be explicitly
added to the test committee in the database instead?
### Discussion Point 2c: The `/test/login` endpoint bypasses all
authentication
```python
session_data = atr.models.session.CookieData(
uid="test", fullname="Test User",
pmcs=["test"], projects=["test"],
isMember=False, isChair=False, ...
)
util.write_quart_session_cookie(session_data)
```
This creates a valid session with no password, no LDAP check, and no OAuth
flow. The guard is only `config.get().ALLOW_TESTS`. There is also a config
check that `ALLOW_TESTS` can only be enabled in Debug mode, but Debug mode
itself is only restricted via `util.is_dev_environment()`.
**Question**: What are the exact conditions for `is_dev_environment()` and
are they sufficient? If it only checks for the absence of `PRODUCTION=true`, a
misconfigured deployment could end up in Debug mode.
---
## 3. Session Cache File — A Parallel Auth Path
In `principal.py::AuthoriserLDAP.cache_refresh()`, when running in Debug
mode (even without ALLOW_TESTS), there's a **session cache file** fallback:
```python
if config.get_mode() == config.Mode.Debug:
session_cache = await util.session_cache_read()
if asf_uid in session_cache:
# Use cached data instead of querying LDAP
```
This reads from `state/cache/user_session_cache.json` and bypasses LDAP
entirely for committee/project membership data. If this file contains stale
data (e.g., a user was removed from a committee), the application won't know.
### Discussion Point 3: The session cache has no TTL or integrity check
The cache file is a plain JSON file with no expiration timestamp, no
signature, and no validation. If it exists and contains an entry for a UID,
that entry is trusted completely. Compared to the in-memory `principal.Cache`
which has a 600-second TTL, this file-based cache has infinite TTL.
**Question**: Should this file-based cache be removed, given a TTL, or at
minimum documented as a development-only convenience that will never be enabled
in production?
---
## 4. SSH Authentication — Different LDAP Failure Semantics
The SSH server in `ssh.py` checks LDAP directly during `begin_auth()`:
```python
account = await ldap.account_lookup(username)
if account is None or ldap.is_banned(account):
log.failed_authentication("account_banned_or_deleted")
return True # ← Still requires auth, but won't match keys
```
But `ldap.account_lookup()` calls `get_bind_credentials()`, which returns
`None` when LDAP isn't configured. When credentials are `None`,
`account_lookup` returns `None`. This means the SSH path treats "LDAP
unavailable" the same as "account doesn't exist" — which means **SSH auth fails
closed** when LDAP is down.
Meanwhile, the web path (`blueprints/common.py`) calls `ldap.is_active()`,
which returns `True` when LDAP is unavailable — **web auth fails open**.
### Discussion Point 4: SSH fails closed, Web fails open — same system,
opposite behavior
| Auth Path | LDAP unavailable → | LDAP says user is banned → |
|---|---|---|
| Web (`is_active`) | **Allows access** | Denies access |
| SSH (`account_lookup`) | **Denies access** | Denies access |
| JWT (`is_active`) | **Allows access** | Denies access |
| PAT validation (`is_active`) | **Allows access** | Denies access |
**Question**: Should all paths have the same failure mode? The safe default
for a security-sensitive application is fail-closed.
---
## 5. JWT/PAT Authentication and LDAP Coupling
In `jwtoken.py`, the `verify()` function checks `ldap.is_active()` after
validating the JWT signature:
```python
if not await ldap.is_active(asf_uid):
raise base.ASFQuartException("Account is disabled", errorcode=401)
```
But since `is_active()` returns `True` when LDAP is unconfigured, a valid
JWT for a banned user would be accepted if LDAP credentials are missing.
Additionally, PAT hash validation is skipped when the `atr_th` claim is
absent — this is documented as intentional ("not all JWTs come from PATs"), but
it means there's a class of JWTs that have no revocation check at all.
### Discussion Point 5: Should JWT verification require LDAP to be available?
If LDAP is down in production, the current behavior is to accept all JWTs
regardless of account status. This could be acceptable (short-lived tokens,
30-minute TTL), but it contradicts the security model where LDAP is
authoritative for account status.
---
## 6. Proposed Rules for Discussion
Based on the above findings, here are candidate rules for the team to decide
on:
1. **Fail-closed by default**: `ldap.is_active()` should return `False` (or
raise) when LDAP is unavailable, unless Debug mode is active AND ALLOW_TESTS is
enabled. The current `return True` is dangerous.
2. **Consistent LDAP failure behavior**: All auth paths (web, SSH, JWT, PAT)
should behave the same way when LDAP is down. Pick one: fail-open or
fail-closed.
3. **Scope the test user**: The `"test"` user should have minimal privileges
needed for end-to-end testing — member of test committee/project only, not a
global admin. Create a separate `TEST_ADMIN` flag if admin testing is needed.
4. **Remove or TTL the session cache file**: The `user_session_cache.json`
fallback is an undocumented auth bypass in Debug mode. Either remove it, add a
TTL, or clearly mark it in code comments and developer docs.
5. **Audit `_augment_test_membership()`**: Decide whether silently adding
"test" committee/project membership to every real user is the right approach,
or if this should be database-driven.
6. **Document the mode matrix**: Create a clear, authoritative table (like
the one in §1) that lives in the codebase and is tested. When behavior changes
based on `ALLOW_TESTS` or `Mode.Debug`, there should be a single place to
understand the full picture.
7. **Guard Debug mode more strictly**: Verify that `is_dev_environment()` is
sufficient to prevent Debug mode in production-like deployments. Consider
requiring an explicit opt-in (e.g., a file on disk) rather than just the
absence of a `PRODUCTION` env var.
8. **Rate limiting in test mode**: Rate limiting is entirely disabled when
`ALLOW_TESTS` is on. If integration tests need higher limits, consider raising
the limits rather than removing them entirely, so tests still exercise the
rate-limiting code paths.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]