Hello team,

In May, public certificate authorities will stop including the Client Authentication Extended Key Usage (EKU) in public TLS certificates. This major change could have significant implications for users of Apache Geode.

Jinwoo Hwang, Principal Software Developer at SAS and release manager of Apache Geode 2.0, has composed an article outlining three viable strategies for mitigating the effects of this change. We would like to propose publishing it on the ASF blog.

I will attach to this email a plain-text version of the piece for your consideration. I will also connect with members of the Marketing and Publicity PMC to provide formatted versions of the article, as well as the necessary graphics files.

Sincerely,
Bryan Behrenshausen

# Navigating the Public Certificate Authority Client Authentication EKU Sunset: Apache Geode's Path Forward

**By: Jinwoo Hwang**  
Lead Developer, Project Lead, and Release Manager, Apache Geode 2.0 
*[https://JinwooHwang.com](https://JinwooHwang.com)*

## The Death of Public mTLS: Prepare Your Apache Geode Deployment for the May Deadline

Effective May 2026, major public Certificate Authorities, such as DigiCert and Let’s Encrypt, will stop including the Client Authentication Extended Key Usage (EKU) in public TLS certificates. If your Apache Geode deployment uses mutual TLS (mTLS) with public certificates, any certificate renewed after this date will lack the `clientAuth` EKU it needs to authenticate as a client. The result is TLS handshake failures and a total loss of cluster connectivity unless you migrate to a private PKI or an alternative purpose-built trust model.

## Introduction: When Industry Standards Shift Beneath Your Feet

In the ever-evolving landscape of internet security, certificate authorities serve as critical trust anchors. Recently, major public Certificate Authorities (CAs) including Let's Encrypt, DigiCert, and others have begun removing or restricting the `clientAuth` Extended Key Usage (EKU) from publicly-issued leaf certificates. For organizations running distributed systems that relied on public-CA-signed certificates for mutual TLS (mTLS) client authentication, this industry-wide policy shift represents a significant operational challenge.

Apache Geode, like many enterprise distributed data platforms, has historically supported mTLS as a primary mechanism for authenticating clients and cluster members. The removal of the `clientAuth` EKU from public certificates means that deployments using public-CA-issued client certificates can no longer authenticate clients during the TLS handshake: the Java TLS stack rejects a client certificate whose EKU extension does not permit client authentication.

This article explores the technical underpinnings of the EKU change, its impact on distributed systems, and three mitigation strategies I recommend. Whether you're operating a single-region cluster or a globally distributed WAN topology, understanding these approaches will be crucial for maintaining secure, uninterrupted Apache Geode operations.

## Understanding the Technical Landscape

### What Are Extended Key Usage (EKU) Extensions?

The X.509 v3 certificate standard defines Extended Key Usage as an extension that constrains the purposes for which a certificate's public key can be used. When a certificate includes an EKU extension, the key may be used only for the purposes that extension lists. The relevant OIDs (Object Identifiers) for TLS operations are:

- **serverAuth** (1.3.6.1.5.5.7.3.1): Indicates the certificate may be used to authenticate a TLS server
- **clientAuth** (1.3.6.1.5.5.7.3.2): Indicates the certificate may be used to authenticate a TLS client

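Java's JSSE (Java Secure Socket Extension) stack, which Apache Geode relies on for all TLS operations, enforces EKU checking by default. During a TLS handshake where client certificate authentication is required (`ssl-require-authentication=true`), JSSE inspects the EKU extension of the client's presented certificate: if the extension is present, it must list the `clientAuth` OID, otherwise the handshake fails with a certificate validation error. A certificate with no EKU extension at all is not restricted by this check, but that does not help here, because the public-CA certificates in question carry an EKU extension that lists only `serverAuth`.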
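
The rule can be sketched in a few lines of plain Java. This is an illustrative reimplementation, not JSSE's own code: the class name `EkuCheck` and method name `permitsClientAuth` are hypothetical, and the input is assumed to be the value returned by `X509Certificate.getExtendedKeyUsage()`, which is `null` when the certificate has no EKU extension.

```java
import java.util.List;

/** Illustrative sketch of the EKU rule applied to a TLS client certificate. */
final class EkuCheck {
    static final String CLIENT_AUTH = "1.3.6.1.5.5.7.3.2";
    static final String ANY_EXTENDED_KEY_USAGE = "2.5.29.37.0";

    /**
     * @param ekuOids result of X509Certificate.getExtendedKeyUsage();
     *                null means the certificate carries no EKU extension
     */
    static boolean permitsClientAuth(List<String> ekuOids) {
        if (ekuOids == null) {
            return true; // no EKU extension: key usage is not restricted by EKU
        }
        // EKU present: it must list clientAuth (or the anyExtendedKeyUsage wildcard)
        return ekuOids.contains(CLIENT_AUTH) || ekuOids.contains(ANY_EXTENDED_KEY_USAGE);
    }
}
```

Running this against the EKU list of each client certificate in your inventory is a quick way to find the ones that would stop working after renewal: a certificate whose list contains only `1.3.6.1.5.5.7.3.1` (`serverAuth`) fails the check.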

### Why Are Public CAs Removing clientAuth?

The decision by public CAs to cease including `clientAuth` in publicly-issued certificates stems from several factors:

1. **Scope Limitation**: Public CAs are designed to facilitate public internet trust relationships, primarily for HTTPS websites. Client authentication is inherently about private, internal trust relationships.
2. **Abuse Prevention**: Publicly-issued certificates with `clientAuth` could theoretically be used to impersonate legitimate clients across any system accepting that CA's trust anchor—a security risk that tightening EKU restrictions helps mitigate.
3. **Standards Alignment**: The CA/Browser Forum and other standards bodies have increasingly advocated for limiting the scope of publicly-issued certificates to their intended purpose: authenticating servers in public-facing scenarios.
4. **Operational Best Practices**: Certificate lifecycle management for client identities should be under the direct control of the organization managing those identities, not delegated to a public CA whose primary purpose is different.

### Impact on Apache Geode Deployments

Apache Geode's security architecture centers on several key components:

- **Per-component SSL Configuration**: Apache Geode allows independent TLS settings for `cluster` (member-to-member), `server` (client-to-server), `locator`, `gateway` (WAN), `jmx`, and `web` components via the `ssl-enabled-components` property.
- **Certificate-Based Authentication**: When `ssl-require-authentication=true`, Apache Geode members and clients must present certificates that pass the peer's truststore validation and EKU checks.
- **File-Watching Credential Reload**: Apache Geode's `FileWatchingX509ExtendedKeyManager` and `FileWatchingX509ExtendedTrustManager` monitor keystores and truststores on disk, enabling zero-downtime certificate rotation—a critical capability for maintaining security without service interruption.

Deployments that previously obtained certificates from Let's Encrypt, DigiCert, or other public CAs for both their servers and clients now face a dilemma: client certificates obtained from public CAs will no longer contain `clientAuth`, making mTLS client authentication impossible without changes.

## Three Paths Forward: Comprehensive Mitigation Strategies

After extensive testing and validation, I have developed three distinct mitigation strategies. Each addresses the EKU limitation while offering different trade-offs in operational complexity, security posture, and architectural flexibility.

### Approach 1: Internal/Enterprise CA for Full mTLS

**Philosophical Foundation**: The first approach rests on a simple principle: for communication inside your firewall, you can use certificates issued by your own Certificate Authority instead of public certificates, provided you are comfortable operating an internal PKI. Client identity is then managed by infrastructure under your direct control.

#### Architecture Overview

Operating an internal Certificate Authority means establishing your own public key infrastructure (PKI):

- **Offline Root CA**: A long-lived root certificate kept air-gapped or in Hardware Security Module (HSM)-backed cold storage, used only to sign intermediate CA certificates
- **Online Issuing Intermediate CA**: An intermediate certificate that signs leaf certificates for servers, clients, and cluster members
- **Separate Environments**: Different issuing intermediates per environment (production, staging, development) prevent trust leakage across boundaries

#### Technical Implementation

The internal CA approach maintains full mTLS across all Apache Geode components. Both servers and clients present certificates:

```properties
# Server/Locator Configuration Example
ssl-enabled-components=all
ssl-keystore=/etc/geode/server-keystore.jks
ssl-keystore-type=JKS
ssl-keystore-password=${SERVER_KEYSTORE_PASSWORD}
ssl-truststore=/etc/geode/server-truststore.jks
ssl-truststore-type=JKS
ssl-truststore-password=${SERVER_TRUSTSTORE_PASSWORD}
ssl-require-authentication=true
ssl-endpoint-identification-enabled=true
```

```properties
# Client Configuration Example
ssl-enabled-components=all
ssl-keystore=/etc/geode/client-keystore.jks
ssl-keystore-type=JKS
ssl-keystore-password=${CLIENT_KEYSTORE_PASSWORD}
ssl-truststore=/etc/geode/client-truststore.jks
ssl-truststore-type=JKS
ssl-truststore-password=${CLIENT_TRUSTSTORE_PASSWORD}
ssl-require-authentication=true
ssl-endpoint-identification-enabled=true
```

#### Certificate Requirements

Certificates issued by your internal CA must include specific extensions:

- **Client certificates**: Must contain `extendedKeyUsage=clientAuth`
- **Server certificates**: Must contain `extendedKeyUsage=serverAuth` and `subjectAltName` fields with DNS names or IP addresses for hostname verification
- **Chain validity**: All certificates must chain back to a CA trusted by the peer's truststore

#### Automation and Lifecycle Management

The shift to an internal PKI presents an opportunity to embrace modern certificate lifecycle automation. Manual certificate management (creating CSRs, submitting them to CAs, retrieving signed certificates, distributing keystores, tracking expiration dates, and coordinating renewal across dozens or hundreds of nodes) is operationally burdensome, error-prone, and scales poorly. Worse, it encourages dangerous practices like issuing long-lived certificates (1-3 years) to reduce renewal frequency, which dramatically increases the blast radius of a certificate compromise.

Automation fundamentally transforms this operational model. By programmatically issuing, distributing, and renewing certificates on short lifecycles (hours to days), you achieve:

- **Reduced Compromise Windows**: A compromised 24-hour certificate becomes useless within a day without manual intervention; contrast with a 365-day certificate providing persistent access for a year
- **Elimination of Manual Coordination**: No more scheduled maintenance windows for certificate renewals across production clusters
- **Zero-Downtime Operations**: Apache Geode's file-watching credential managers detect updated keystores and reload credentials without process restarts
- **Audit Trail Precision**: Every certificate issuance logged with requesting principal, timestamp, and validity period
- **Consistent Security Posture**: Automation ensures certificates always contain the required EKUs, SANs, and other extensions, with no human error in CSR generation

Modern internal PKI solutions support automated certificate lifecycle management:

**HashiCorp Vault PKI**: Vault Agent can auto-authenticate using AppRole or the Kubernetes auth method (ServiceAccount-based). Vault’s PKI secrets engine can issue short-lived X.509 certificates on demand, and Vault Agent templates can render certificates to files with renewal behavior tied to certificate expiration. Apache Geode includes file-watching credential reload (truststore, and commonly keystore) so updates to watched credential files can be picked up without manual restarts, depending on the configured components.

**Smallstep `step-ca`**: The `step ca certificate` command issues a certificate and private key (PEM files). Renewal automation is supported via `step ca renew`, including daemonized renewal and hooks for service reloads. If a Java keystore is required, convert or import the issued material into PKCS12/JKS using standard tooling (for example, `keytool -importkeystore` workflows).

**ACME Protocol**: If your internal CA supports ACME (Automated Certificate Management Environment), standard ACME clients like Certbot can manage certificate issuance and renewal.

The critical pattern across all automation approaches:

1. Automation authenticates to the CA
2. CA issues a short-lived certificate with the required EKU
3. Automation writes the keystore file atomically (create temp file, write fully, then rename)
4. `FileWatchingX509ExtendedKeyManager` detects the change and reloads without restart
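
Step 3 of the pattern above can be sketched with the JDK's `java.nio.file` API. This is a minimal sketch, not part of Apache Geode: the class name `AtomicKeystoreWriter` and method name `replaceAtomically` are hypothetical, and it assumes the temporary file and the target live in the same directory (and therefore on the same filesystem), which an atomic rename requires.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Writes a keystore so that readers never observe a partially written file. */
final class AtomicKeystoreWriter {
    static void replaceAtomically(Path target, byte[] keystoreBytes) throws IOException {
        Path dir = target.toAbsolutePath().getParent();
        // Create the temp file next to the target so the rename stays on one filesystem
        Path temp = Files.createTempFile(dir, target.getFileName().toString(), ".tmp");
        try {
            Files.write(temp, keystoreBytes); // write the complete content first
            // Rename over the target. On POSIX filesystems this replaces an existing
            // file atomically; on other platforms replacement is implementation specific.
            Files.move(temp, target, StandardCopyOption.ATOMIC_MOVE);
        } finally {
            Files.deleteIfExists(temp); // no-op after a successful move
        }
    }
}
```

Writing to the final path directly would risk a file watcher picking up a half-written keystore; the rename makes the new content appear in a single step.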

#### When to Choose This Approach

This approach is optimal when:

- You can operate or integrate with enterprise PKI infrastructure
- You want to maintain strong, certificate-based mutual authentication
- You need fine-grained lifecycle control over all certificates
- Compliance or security requirements mandate certificate-based client identity
- You have the operational expertise to run a CA (or can leverage existing internal CA infrastructure)

#### Operational Considerations

**Advantages**:

- Full preservation of mTLS security model
- Complete control over certificate lifecycle and policies
- Short certificate lifetimes (1-30 days recommended) minimize compromise windows
- No dependency on public CA policies

**Challenges**:

- Requires PKI operational expertise or infrastructure
- Additional automation needed for certificate distribution
- Must manage CA infrastructure availability and security
- Initial setup complexity higher than other approaches

### Approach 2: Hybrid Model—Public CA Servers, Private CA Clients

**Philosophical Foundation**: The second approach recognizes that different system components have different trust requirements and splits the certificate authority hierarchy accordingly.

#### Architecture Overview

The hybrid model combines two certificate authority sources strategically:

- **Public CA for Servers/Locators**: Server and locator certificates come from public CAs (Let's Encrypt, DigiCert, etc.), preserving external trust relationships and enabling peer-to-peer authentication within the cluster
- **Public CA for Peer-to-Peer Communication**: In both client/server and P2P cache topologies, cluster members (servers, locators, peers) use public-CA-issued certificates to authenticate to each other during cluster formation and ongoing peer communication
- **Private CA for Clients**: Client certificates come from your internal/private CA, ensuring `clientAuth` EKU presence and lifecycle control

This split-trust model solves a specific problem: servers must be trusted by external systems (monitoring tools, management consoles, developers' machines relying on the system trust store), while client authentication must stay under your control. Peer-to-peer communication leverages the same public CA certificates used for external trust, simplifying infrastructure while keeping client certificate issuance under your control.

#### Trust Relationship Architecture

The hybrid model requires careful trust store configuration:

- **Client Authentication**: Server validates via server truststore containing private CA root
- **Server Authentication**: Client validates via public CA in system trust store or custom client truststore
- **Peer-to-Peer Authentication**: Peer validates via both public CA and private CA

#### Technical Implementation

**For Client/Server Topology:**

![Client Authentication Flow](Client_Authentication_Flow.jpg)
Server validates via server truststore containing private CA root

![Server Authentication Flow](Server_Authentication_Flow.jpg)
Client validates via public CA in system trust store or custom client truststore

Server/locator configuration must trust **both** CAs:

```properties
# Server/Locator Configuration Example
ssl-enabled-components=all

# Server keystore: public-CA-issued server cert
ssl-keystore=/etc/geode/server-keystore.jks
ssl-keystore-type=JKS
ssl-keystore-password=${SERVER_KEYSTORE_PASSWORD}

# Server truststore: BOTH public CA (peer-to-peer) and private CA (clients)
ssl-truststore=/etc/geode/server-truststore.jks
ssl-truststore-type=JKS
ssl-truststore-password=${SERVER_TRUSTSTORE_PASSWORD}

ssl-require-authentication=true
ssl-endpoint-identification-enabled=true
```

The server truststore must be constructed to contain:
1. The public CA root certificate(s) for validating other servers and locators during cluster formation and peer-to-peer communication
2. The private CA root certificate for validating client certificates
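
One way to assemble such a combined truststore programmatically is to copy the trusted-certificate entries from two existing stores into a new one. The sketch below uses only the JDK `KeyStore` API; the class name `TrustStoreMerger` and the alias prefix scheme are hypothetical choices, and in practice `keytool` can achieve the same result from the command line.

```java
import java.io.IOException;
import java.security.GeneralSecurityException;
import java.security.KeyStore;
import java.util.Collections;

/** Copies trusted-certificate entries from several stores into one truststore. */
final class TrustStoreMerger {
    static KeyStore merge(KeyStore... sources) throws GeneralSecurityException, IOException {
        KeyStore merged = KeyStore.getInstance(KeyStore.getDefaultType());
        merged.load(null, null); // initialize an empty in-memory store
        for (int i = 0; i < sources.length; i++) {
            for (String alias : Collections.list(sources[i].aliases())) {
                // Copy only CA certificates; private-key entries do not belong in a truststore
                if (sources[i].isCertificateEntry(alias)) {
                    // Prefix with the source index so equal aliases do not overwrite each other
                    merged.setCertificateEntry("ca" + i + "-" + alias,
                            sources[i].getCertificate(alias));
                }
            }
        }
        return merged;
    }
}
```

The merged store can then be saved with `KeyStore.store(...)` and written to the path named by `ssl-truststore`.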

Client configuration remains simpler:

```properties
# Client Configuration Example
ssl-enabled-components=all

# Client keystore: private-CA-issued client cert
ssl-keystore=/etc/geode/client-keystore.jks
ssl-keystore-type=JKS
ssl-keystore-password=${CLIENT_KEYSTORE_PASSWORD}

# Client truststore: public CA root (validates server cert)
ssl-truststore=/etc/geode/client-truststore.jks
ssl-truststore-type=JKS
ssl-truststore-password=${CLIENT_TRUSTSTORE_PASSWORD}

ssl-require-authentication=true
ssl-endpoint-identification-enabled=true
```

**For P2P Cache Topology:**

In P2P topology, all members are peers with equal standing. Each peer must trust **both** CAs because:
1. Peers use public-CA-issued certificates to authenticate to each other
2. Peers may also serve client connections (requiring private CA trust)

![Peer-to-Peer Authentication Flow](P2P_Authentication_Flow.jpg)
Peer validates via both public CA and private CA


```properties
# Peer Member Configuration Example (Each Member in P2P Cluster)
ssl-enabled-components=all

# Peer keystore: public-CA-issued certificate (for peer-to-peer auth)
ssl-keystore=/etc/geode/peer-keystore.jks
ssl-keystore-type=JKS
ssl-keystore-password=${PEER_KEYSTORE_PASSWORD}

# Peer truststore: BOTH public CA (other peers) and private CA (clients)
ssl-truststore=/etc/geode/peer-truststore.jks
ssl-truststore-type=JKS
ssl-truststore-password=${PEER_TRUSTSTORE_PASSWORD}

ssl-require-authentication=true
ssl-endpoint-identification-enabled=true
```

If clients connect to the P2P cluster, their configuration remains the same as in client/server topology (private CA keystore, public CA truststore).

#### Certificate Requirements

**Server certificates** (from public CA):

- Must include `serverAuth` Extended Key Usage
- Must include `subjectAltName` with all DNS names and IP addresses where the server is reachable
- Obtained through standard public CA enrollment (ACME, web interface, etc.)

**Client certificates** (from private CA):

- Must include `clientAuth` Extended Key Usage
- Common Name (CN) or Subject Alternative Name typically identifies the client application or service
- Issued via your internal PKI automation

#### Certificate Rotation Workflow

The hybrid model requires careful orchestration during certificate rotation:

| Rotation Event | Procedure |
|----------------|-----------|
| Server certificate renewal (public CA) | Write new keystore atomically; `FileWatchingX509ExtendedKeyManager` reloads |
| Client certificate rotation (private CA) | Automation writes new client keystore; reload happens automatically |
| Private CA rollover | Deploy new private CA to all server truststores *before* issuing client certificates from new CA |
| Public CA root renewal | Update client truststores and server truststores (for peer-to-peer validation) |

#### When to Choose This Approach

This approach excels when:

- Server endpoints must be trusted by external systems using public CA trust stores
- You can operate an internal CA for client certificate issuance
- You want to retain full certificate-based mTLS
- External visibility/trust for servers is required while maintaining controlled client authentication
- You want operations insulated from public CA changes to client certificate policies

#### Operational Considerations

**Advantages**:

- Servers remain trusted by systems relying on public CA trust stores
- Client certificate policies completely under your control
- Public CA EKU changes for client certificates are irrelevant
- Maintains full mTLS security model
- Clear separation of trust domains (public for servers, private for clients)

**Challenges**:

- More complex trust store management (servers need both CAs)
- Certificate rotation requires coordination across two CA hierarchies
- Debugging handshake failures requires understanding which certificate comes from which CA
- Documentation and runbooks must clearly specify the dual-CA architecture

### Approach 3: Server-Only TLS with Application-Layer Authentication

**Philosophical Foundation**: The third approach fundamentally rethinks the authentication model—keeping TLS for transport encryption while moving authentication to the application layer.

#### Architecture Overview

In this model:

- **Transport Encryption**: TLS encrypts all network traffic bidirectionally, exactly as with mTLS
- **Server Authentication**: Servers present certificates (typically from public CAs), clients verify them
- **Client Authentication**: Clients authenticate via application-layer mechanisms (`SecurityManager` framework) rather than presenting certificates

This approach directly addresses the core issue: if client certificate authentication is problematic, remove the dependency on client certificates entirely.

#### The Transport Encryption Clarification

A common misconception requires addressing: does disabling client certificate authentication reduce transport security?

**No.** Transport encryption is independent of client certificate authentication.

When `ssl-enabled-components=all` is configured, Apache Geode enables TLS at the socket/engine layer via JSSE. Once the TLS handshake completes (even without client certificate exchange), all subsequent data is:

- **Encrypted** using the negotiated cipher suite (e.g., AES-256-GCM)
- **Authenticated** with message authentication codes preventing tampering
- **Protected** in both directions (client-to-server and server-to-client)

Setting `ssl-require-authentication=false` disables mandatory client certificate presentation during the handshake. The TLS session still establishes, and data flows over an encrypted channel. The difference is in **identity establishment** (who the client is), not **transport security** (whether data is encrypted).
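
At the JSSE level, the same separation is visible on a server-side `SSLEngine`: whether a client certificate is required is a flag independent of the enabled cipher suites. The sketch below uses only JDK classes. The class name `ServerOnlyTlsSketch` is hypothetical, and how Apache Geode maps `ssl-require-authentication` onto these JSSE settings internally is an assumption, not something shown in this article.

```java
import java.security.NoSuchAlgorithmException;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLEngine;

/** Shows that client-certificate settings are independent of encryption settings. */
final class ServerOnlyTlsSketch {
    static SSLEngine serverEngineWithoutClientCerts() throws NoSuchAlgorithmException {
        SSLEngine engine = SSLContext.getDefault().createSSLEngine();
        engine.setUseClientMode(false);  // act as the TLS server
        engine.setNeedClientAuth(false); // do not require a client certificate
        engine.setWantClientAuth(false); // do not request one either
        return engine;                   // cipher suites and protocols are untouched
    }
}
```

The engine still negotiates an encrypted session with whatever cipher suites are enabled; only the client's identity is left unestablished at the TLS layer.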

#### Technical Implementation

Server/locator configuration explicitly disables client certificate requirement:

```properties
# Server/Locator Configuration Example
ssl-enabled-components=all

# Server presents its certificate to clients
ssl-keystore=/etc/geode/server-keystore.jks
ssl-keystore-type=JKS
ssl-keystore-password=${SERVER_KEYSTORE_PASSWORD}

# Truststore needed for peer-to-peer if cluster component uses SSL
ssl-truststore=/etc/geode/server-truststore.jks
ssl-truststore-type=JKS
ssl-truststore-password=${SERVER_TRUSTSTORE_PASSWORD}

# KEY CHANGE: Do not require client certificates
ssl-require-authentication=false

ssl-endpoint-identification-enabled=true

# Application-layer authentication enforcement
security-manager=com.example.geode.security.MySecurityManager
```

Client configuration eliminates the client keystore entirely:

```properties
# Client Configuration Example
ssl-enabled-components=all

# No ssl-keystore needed (client doesn't present a certificate)

# Client truststore: validates server certificates
ssl-truststore=/etc/geode/client-truststore.jks
ssl-truststore-type=JKS
ssl-truststore-password=${CLIENT_TRUSTSTORE_PASSWORD}

ssl-endpoint-identification-enabled=true

# Credential injection for application-layer auth
security-client-auth-init=com.example.geode.security.UserPasswordAuthInit.create
security-username=${GEODE_USERNAME}
security-password=${GEODE_PASSWORD}
```

#### Application-Layer Authentication Implementation

With client certificates eliminated from the TLS handshake, authentication must shift to the application layer. This is where you implement the actual client identity verification that was previously handled by certificate validation. Apache Geode's `SecurityManager` framework provides the necessary extension points to authenticate clients using credentials (username/password, tokens, etc.) rather than certificates.

**When to Implement This**: You must implement application-layer authentication when you need to verify client identity but aren't using client certificates. Without either certificate-based authentication (mTLS) or application-layer authentication, your cluster would accept any client that can complete the TLS handshake—a security vulnerability. The `SecurityManager` framework ensures that even though clients don't present certificates, they still must prove their identity before accessing cluster resources.

Apache Geode's security framework provides two extension points:

**Client Side: `AuthInitialize`**

Implement the `AuthInitialize` interface to inject credentials. The following example demonstrates how this can be implemented in practice:

```java
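import java.util.Objects;
import java.util.Properties;

import org.apache.geode.distributed.DistributedMember;
import org.apache.geode.security.AuthInitialize;

public class UserPasswordAuthInit implements AuthInitialize {
    // Static factory method named by the security-client-auth-init property
    public static AuthInitialize create() {
        return new UserPasswordAuthInit();
    }

    @Override
    public Properties getCredentials(Properties securityProps,
                                     DistributedMember server,
                                     boolean isPeer) {
        // Properties rejects null values, so fail fast with a clear message
        // if either environment variable is missing
        Properties credentials = new Properties();
        credentials.setProperty("security-username",
            Objects.requireNonNull(System.getenv("GEODE_USERNAME"), "GEODE_USERNAME is not set"));
        credentials.setProperty("security-password",
            Objects.requireNonNull(System.getenv("GEODE_PASSWORD"), "GEODE_PASSWORD is not set"));
        return credentials;
    }
}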
```

For token-based authentication (JWT, OAuth bearer tokens), the code below provides an example implementation of the proposed approach:

```java
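import java.util.Objects;
import java.util.Properties;

import org.apache.geode.distributed.DistributedMember;
import org.apache.geode.security.AuthInitialize;

public class TokenAuthInit implements AuthInitialize {
    // Static factory method named by the security-client-auth-init property
    public static AuthInitialize create() {
        return new TokenAuthInit();
    }

    @Override
    public Properties getCredentials(Properties securityProps,
                                     DistributedMember server,
                                     boolean isPeer) {
        // Properties rejects null values, so fail fast if the token is missing
        Properties credentials = new Properties();
        credentials.setProperty("bearer-token",
            Objects.requireNonNull(System.getenv("GEODE_BEARER_TOKEN"), "GEODE_BEARER_TOKEN is not set"));
        return credentials;
    }
}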
```

**Server Side: `SecurityManager`**

Implement the `SecurityManager` interface to validate credentials and enforce authorization. To help illustrate the approach, here is an example implementation:

```java
import java.util.Properties;

import org.apache.geode.security.AuthenticationFailedException;
import org.apache.geode.security.ResourcePermission;
import org.apache.geode.security.SecurityManager;

public class MySecurityManager implements SecurityManager {
    @Override
    public Object authenticate(Properties credentials)
            throws AuthenticationFailedException {
        String username = credentials.getProperty("security-username");
        String password = credentials.getProperty("security-password");

        // Validate against your authentication backend
        // (LDAP, database, token validation service, etc.)
        if (!isValid(username, password)) {
            throw new AuthenticationFailedException(
                "Invalid credentials for user: " + username);
        }

        // Returned value becomes the authenticated principal
        return username;
    }

    @Override
    public boolean authorize(Object principal, ResourcePermission permission) {
        // Implement fine-grained authorization logic
        // based on the authenticated principal and requested operation
        return true; // Simplified for illustration
    }

    // Placeholder helper: replace with a call to your authentication backend.
    // Denies everything until a real check is wired in.
    private boolean isValid(String username, String password) {
        return false;
    }
}
```

#### Authentication Backend Integration Options

The `SecurityManager` implementation can integrate with various authentication backends:

- **LDAP/Active Directory**: Use `javax.naming.ldap` JNDI APIs to bind with supplied credentials and verify against directory server
- **Database-backed**: Query user credentials table, using bcrypt or similar for password hashing
- **OAuth/OIDC**: Validate bearer tokens against OAuth authorization server's token introspection endpoint
- **Multi-factor**: Combine password validation with time-based OTP (TOTP) verification
- **Custom**: Integrate with internal identity management systems via REST APIs or proprietary protocols
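
For the database-backed option, the password check might look like the sketch below. It substitutes PBKDF2 for bcrypt because PBKDF2 ships with the JDK, so the example has no external dependencies. The class name `PasswordHasher`, the iteration count, and the salt and key sizes are illustrative assumptions; choose parameters according to your own security policy.

```java
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.security.SecureRandom;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;

/** Salted PBKDF2 password hashing with a constant-time comparison. */
final class PasswordHasher {
    private static final int ITERATIONS = 210_000; // illustrative work factor
    private static final int KEY_BITS = 256;

    static byte[] newSalt() {
        byte[] salt = new byte[16];
        new SecureRandom().nextBytes(salt); // unique random salt per user
        return salt;
    }

    static byte[] hash(char[] password, byte[] salt) throws GeneralSecurityException {
        PBEKeySpec spec = new PBEKeySpec(password, salt, ITERATIONS, KEY_BITS);
        try {
            return SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256")
                    .generateSecret(spec).getEncoded();
        } finally {
            spec.clearPassword(); // wipe the spec's internal copy of the password
        }
    }

    static boolean matches(char[] candidate, byte[] salt, byte[] storedHash)
            throws GeneralSecurityException {
        // MessageDigest.isEqual compares in constant time, avoiding timing leaks
        return MessageDigest.isEqual(hash(candidate, salt), storedHash);
    }
}
```

A `SecurityManager.authenticate` implementation would load the stored salt and hash for the supplied username and call `matches` with the supplied password, throwing `AuthenticationFailedException` when it returns `false`.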

#### Peer-to-Peer Topology Support

A critical validation question: does Approach 3 work for peer-to-peer (P2P) topology, or only for client/server topology?

**Answer**: This approach works comprehensively for **both** client/server and P2P cache topologies.

Apache Geode supports two fundamental deployment patterns:

1. **Client/Server Topology**: Clients connect to dedicated cache servers; data is managed by the server tier
2. **P2P Cache Topology**: All members are peers with equal standing, each maintaining a portion of the distributed cache

In P2P topologies, members must authenticate with each other when joining the cluster. The same application-layer authentication mechanism that works for clients applies to peers.

**Technical Implementation for P2P**:

Each peer configures peer authentication properties:

```properties
# Peer Configuration Example (Each Member in P2P Cluster)
ssl-enabled-components=all
ssl-keystore=/etc/geode/peer-keystore.jks
ssl-keystore-type=JKS
ssl-keystore-password=${PEER_KEYSTORE_PASSWORD}
ssl-truststore=/etc/geode/peer-truststore.jks
ssl-truststore-type=JKS
ssl-truststore-password=${PEER_TRUSTSTORE_PASSWORD}

# Disable certificate-based peer authentication
ssl-require-authentication=false
ssl-endpoint-identification-enabled=true

# Application-layer peer authentication
security-peer-auth-init=com.example.geode.security.PeerAuthInit.create
security-manager=com.example.geode.security.MySecurityManager
security-username=${PEER_USERNAME}
security-password=${PEER_PASSWORD}
```

**Authorization Requirement**:

For P2P topology, joining peers must have `CLUSTER:MANAGE` permission. The `SecurityManager.authorize()` method is invoked during peer join, validating that the authenticated principal has the necessary cluster-level permissions:

```java
@Override
public boolean authorize(Object principal, ResourcePermission permission) {
    // Peer joining cluster must have CLUSTER:MANAGE
    if (permission.getResource() == Resource.CLUSTER && 
        permission.getOperation() == Operation.MANAGE) {
        return hasClusterManagePermission(principal);
    }
    // Additional authorization logic for data operations, etc.
    return authorizeDataOperation(principal, permission);
}
```
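
The `hasClusterManagePermission` helper is left undefined above. One hypothetical way to back it is a static grant table keyed by principal name, sketched below with plain JDK collections. The class name `PermissionTable` and the string form of the permissions are assumptions for illustration, and the sketch assumes the principal is the username string returned by `authenticate`.

```java
import java.util.Map;
import java.util.Set;

/** Maps principal names to the permission strings they have been granted. */
final class PermissionTable {
    private final Map<String, Set<String>> grants;

    PermissionTable(Map<String, Set<String>> grants) {
        this.grants = grants;
    }

    boolean isGranted(String principal, String permission) {
        // Unknown principals have no grants: deny by default
        return grants.getOrDefault(principal, Set.of()).contains(permission);
    }

    boolean hasClusterManagePermission(Object principal) {
        // Only String principals are recognized in this sketch
        return principal instanceof String
                && isGranted((String) principal, "CLUSTER:MANAGE");
    }
}
```

In a real deployment the grants would more likely come from the same backend that authenticates the principal (for example LDAP group membership) than from a hard-coded map.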

**Validation**:

This approach is validated for P2P topology:

- **Cluster Formation**: P2P members successfully join clusters using application-layer authentication without presenting certificates
- **TLS Encryption**: All peer-to-peer communication remains fully encrypted (AES-256-GCM or equivalent cipher suites)
- **Data Replication**: Region data replicates successfully across peers over encrypted channels
- **Authentication Enforcement**: Peers with invalid credentials are rejected at the application layer before cluster join
- **Authorization Enforcement**: Peers lacking `CLUSTER:MANAGE` permission fail authorization even with valid credentials
- **Multi-Peer Scenarios**: Clusters with multiple peers using different credentials operate correctly

**Approach 3 comprehensively solves the public CA `clientAuth` EKU sunset problem for all Apache Geode topologies**, not just client/server configurations.

#### When to Choose This Approach

This approach is optimal when:

- Certificate distribution to clients is operationally impractical
- You already have application-layer authentication infrastructure
- Certificate lifecycle management overhead outweighs benefits
- You want the simplest migration path from mTLS
- Credential-based authentication aligns with your organizational security model
- Fast rollout is critical and PKI standup would create delays

#### Operational Considerations

**Advantages**:

- No client certificate distribution or lifecycle management
- Leverages existing authentication infrastructure (LDAP, databases, OAuth, etc.)
- Simplest client configuration (no keystore needed)
- Credentials can be rotated without touching TLS infrastructure
- Transport remains encrypted (confidentiality and integrity preserved)

**Challenges**:

- Application-layer authentication logic must be robustly implemented
- Credentials (passwords, tokens) must be protected in client environments
- No transport-level binding between TLS session and authenticated identity
- Requires implementing or integrating `SecurityManager` logic
- Authorization must be enforced at application layer rather than at connection establishment

## Migration Planning and Best Practices

Regardless of which approach you choose, successful migration requires careful planning and execution.

### Pre-Migration Assessment

**Inventory Your Environment**:

- Catalog all Apache Geode server, locator, gateway, and client deployments
- Identify which certificates are public-CA-issued vs. internal-CA-issued
- Document current `ssl-enabled-components` configuration across all members
- Map trust relationships (which truststores contain which CA roots)
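As one way to support the inventory, the sketch below uses the JDK's `X509Certificate.getExtendedKeyUsage()` to report whether a certificate file carries the `clientAuth` EKU. The `EkuInventory` class and its method names are illustrative and not part of Geode. Treating `anyExtendedKeyUsage` as permitting client authentication is an assumption made for this sketch.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.cert.CertificateException;
import java.security.cert.CertificateFactory;
import java.security.cert.X509Certificate;
import java.util.List;

// Illustrative helper for cataloging certificates; not a Geode class.
final class EkuInventory {
    static final String CLIENT_AUTH_OID = "1.3.6.1.5.5.7.3.2"; // id-kp-clientAuth
    static final String ANY_EKU_OID = "2.5.29.37.0";           // anyExtendedKeyUsage

    enum ClientAuthStatus { NO_EKU_EXTENSION, CLIENT_AUTH_PRESENT, CLIENT_AUTH_MISSING }

    // Classifies the OID list returned by X509Certificate.getExtendedKeyUsage(),
    // which is null when the certificate has no EKU extension at all.
    static ClientAuthStatus classify(List<String> ekuOids) {
        if (ekuOids == null) {
            return ClientAuthStatus.NO_EKU_EXTENSION;
        }
        if (ekuOids.contains(CLIENT_AUTH_OID) || ekuOids.contains(ANY_EKU_OID)) {
            return ClientAuthStatus.CLIENT_AUTH_PRESENT;
        }
        return ClientAuthStatus.CLIENT_AUTH_MISSING;
    }

    // Reads one PEM- or DER-encoded certificate and summarizes subject, issuer, and EKU status.
    static String describe(Path certFile) throws IOException, CertificateException {
        try (InputStream in = Files.newInputStream(certFile)) {
            X509Certificate cert = (X509Certificate)
                CertificateFactory.getInstance("X.509").generateCertificate(in);
            return cert.getSubjectX500Principal().getName()
                + " | issuer=" + cert.getIssuerX500Principal().getName()
                + " | clientAuth=" + classify(cert.getExtendedKeyUsage());
        }
    }
}
```

Running `describe` over every client certificate gives a quick list of issuers and EKU status, which helps separate public-CA-issued certificates from internal ones.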

**Test in Non-Production**:

- Set up a representative test cluster using your chosen mitigation approach
- Validate that all client connection patterns work as expected
- Use `openssl s_client` and JSSE debug logging (`-Djavax.net.debug=ssl,handshake`) to verify handshake behavior
- Load test to ensure performance meets requirements

### Phased Rollout Strategy

**Dual-Mode Transition** (Particularly for Approach 1 or 2):

If you're transitioning to an internal CA or hybrid model, consider a dual-trust-store intermediate phase:

1. Deploy updated server truststores containing **both** the old public CA and the new internal/private CA roots, so that old (public CA) and new (internal CA) clients can coexist
2. Gradually migrate clients to new certificates
3. Once all clients use new certificates, remove the old public CA roots from the truststores
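A combined truststore for the dual-trust phase could be assembled with the JDK `KeyStore` API, as in the sketch below. The `TrustStoreMerge` class and the alias prefixes are hypothetical, and the example assumes both sources are already loaded as `KeyStore` objects.

```java
import java.io.IOException;
import java.security.GeneralSecurityException;
import java.security.KeyStore;
import java.security.KeyStoreException;
import java.security.cert.Certificate;
import java.util.Collections;

// Illustrative helper for building a dual-trust truststore; not a Geode class.
final class TrustStoreMerge {

    // Creates an empty in-memory PKCS12 keystore to receive the merged roots.
    static KeyStore emptyStore() {
        try {
            KeyStore store = KeyStore.getInstance("PKCS12");
            store.load(null, null);
            return store;
        } catch (GeneralSecurityException | IOException e) {
            throw new IllegalStateException("Cannot create keystore", e);
        }
    }

    // Copies every trusted-certificate entry from source into target.
    // The alias prefix (for example "public-" or "internal-") keeps the two
    // sets of roots distinguishable so the old ones are easy to remove later.
    static int copyTrustedCerts(KeyStore source, KeyStore target, String aliasPrefix) {
        try {
            int copied = 0;
            for (String alias : Collections.list(source.aliases())) {
                if (source.isCertificateEntry(alias)) {
                    Certificate cert = source.getCertificate(alias);
                    target.setCertificateEntry(aliasPrefix + alias, cert);
                    copied++;
                }
            }
            return copied;
        } catch (KeyStoreException e) {
            throw new IllegalStateException("Keystore not initialized", e);
        }
    }
}
```

Calling `copyTrustedCerts` once per source and then saving the result with `KeyStore.store(...)` yields a single file that trusts both hierarchies. The prefixed aliases make the final cleanup a matter of deleting the `public-` entries.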

**Component-by-Component** (Particularly for Approach 3):

If transitioning to server-only TLS with application-layer auth:

1. Deploy the `SecurityManager` implementation to servers (this can be done before any client changes)
2. Start with less critical client applications
3. Update clients to add `security-client-auth-init` and remove keystores
4. Once all clients are migrated, set `ssl-require-authentication=false` server-wide

### Validation and Troubleshooting

**Verification Tools**:

```bash
# Verify server certificate and TLS connectivity
openssl s_client -connect geode-server.example.com:10334 -showcerts

# Test mTLS handshake (approaches 1 and 2)
openssl s_client -connect geode-server.example.com:10334 \
  -cert client.crt -key client.key \
  -CAfile server-ca.pem -showcerts

# Inspect the client certificate's EKU (look for "TLS Web Client Authentication")
openssl x509 -in client.crt -text -noout | grep -A2 "Extended Key Usage"
```

**JSSE Debug Logging**:

Enable detailed TLS handshake logging on both clients and servers:

```bash
java -Djavax.net.debug=ssl,handshake,trustmanager -jar geode-server.jar
```

**Common Issues and Solutions**:

| Symptom | Likely Cause | Resolution |
|---|---|---|
| `PKIX path building failed` | Truststore missing the CA that signed peer certificate | Import correct CA root into truststore |
| `certificate_unknown` alert | Client certificate lacks `clientAuth` EKU, or server certificate lacks `serverAuth` EKU | Re-issue the certificate from a CA that includes the required EKU |
| `No subject alternative names` | Server certificate missing SAN when `ssl-endpoint-identification-enabled=true` | Re-issue server certificate with DNS SANs |
| `java.security.UnrecoverableKeyException` | Incorrect keystore password or corrupted keystore | Verify password, check keystore integrity with `keytool` |
| Authentication succeeds but operations fail | Authorization logic in `SecurityManager` too restrictive | Review `authorize()` implementation, check logs for denied permissions |

### Certificate Lifecycle Automation

For long-term operational success, automate certificate management:

**Monitoring and Alerting**:

- Monitor certificate expiry dates (alert at 30, 14, 7, and 1 day before expiry)
- Track `FileWatchingX509ExtendedKeyManager` reload events in logs
- Alert on handshake failure rate increases
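The expiry thresholds above can be expressed as a small pure function. This is a sketch: the `ExpiryAlert` class is hypothetical, and treating an already-expired certificate as having crossed the 1-day threshold is a choice made for this example.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.OptionalInt;

// Illustrative expiry check using the 30/14/7/1-day thresholds from the text.
final class ExpiryAlert {
    // Ascending order, so the first match is the most urgent threshold crossed.
    private static final int[] THRESHOLD_DAYS = {1, 7, 14, 30};

    // Returns the most urgent threshold (in days) that has been crossed,
    // or empty when more than 30 days remain. Expired certificates map to 1.
    static OptionalInt crossedThreshold(Instant notAfter, Instant now) {
        long secondsLeft = Duration.between(now, notAfter).getSeconds();
        for (int days : THRESHOLD_DAYS) {
            if (secondsLeft <= Duration.ofDays(days).getSeconds()) {
                return OptionalInt.of(days);
            }
        }
        return OptionalInt.empty();
    }
}
```

A monitoring job could feed it `X509Certificate.getNotAfter().toInstant()` for each certificate and raise an alert whenever the returned threshold changes.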

**Automated Renewal**:

- Schedule certificate renewal well before expiry (50-75% through lifetime)
- Write renewed certificates atomically (create temp file, write, rename) to avoid partial reads by file watcher
- Validate renewed certificate before deployment (check EKU, SAN, expiry, chain validity)
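The renewal timing and the atomic write pattern can be sketched with standard JDK calls. The `RenewalSupport` class is hypothetical. Note that `Files.move` with `ATOMIC_MOVE` leaves it implementation-specific whether an existing target is replaced or an exception is thrown, so the behavior should be confirmed on the target platform.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.time.Duration;
import java.time.Instant;

// Illustrative renewal helpers; not part of Geode.
final class RenewalSupport {

    // Computes when to renew, as a fraction of the certificate lifetime
    // (for example 0.5 to 0.75, per the guidance above).
    static Instant renewAt(Instant notBefore, Instant notAfter, double fraction) {
        if (fraction <= 0.0 || fraction >= 1.0) {
            throw new IllegalArgumentException("fraction must be between 0 and 1");
        }
        long lifetimeMillis = Duration.between(notBefore, notAfter).toMillis();
        return notBefore.plusMillis((long) (lifetimeMillis * fraction));
    }

    // Writes content to a temp file in the same directory, then renames it over
    // the target so a file watcher never observes a partially written keystore.
    static void writeAtomically(Path target, byte[] content) {
        Path dir = target.toAbsolutePath().getParent();
        try {
            Path tmp = Files.createTempFile(dir, target.getFileName().toString(), ".tmp");
            try {
                Files.write(tmp, content);
                Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
            } finally {
                Files.deleteIfExists(tmp); // no-op after a successful move
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Creating the temp file in the same directory as the target matters because an atomic rename generally requires both paths to be on the same filesystem.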

**Rollback Plans**:

- Maintain previous keystore/truststore versions during transitions
- Document rollback procedures (swap files, restart if necessary)
- Test rollback procedures in non-production environments

## Security Considerations Across All Approaches

While each approach has specific security characteristics, some considerations apply universally.

### Defense in Depth

No single security mechanism is sufficient. Layer defenses:

- **Transport Security**: TLS provides confidentiality and integrity
- **Authentication**: Whether certificate-based or application-layer, verify identity
- **Authorization**: `SecurityManager` enforcement of fine-grained permissions per operation
- **Audit Logging**: Track authentication events, authorization denials, certificate rotations
- **Network Segmentation**: Isolate Apache Geode clusters on private networks where feasible

### Credential Protection

**For Certificate-Based Approaches (1 and 2)**:

- Store private keys in keystores with strong passwords
- Consider HSM (Hardware Security Module) backing for critical certificates
- Restrict filesystem permissions on keystore files (e.g., `chmod 600`)
- Avoid embedding keystore passwords in configuration files; use environment variables or secret management systems
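Reading a keystore password from the environment rather than from a configuration file might look like the sketch below. The `KeystorePasswordSource` class and the `GEODE_KEYSTORE_PASSWORD` variable name are hypothetical and chosen for illustration only.

```java
import java.util.Map;

// Illustrative password lookup; the variable name is an assumption, not a Geode setting.
final class KeystorePasswordSource {
    static final String ENV_NAME = "GEODE_KEYSTORE_PASSWORD";

    // Takes the environment as a map so the lookup is easy to test;
    // production code would pass System.getenv().
    static char[] resolve(Map<String, String> env) {
        String value = env.get(ENV_NAME);
        if (value == null || value.isEmpty()) {
            // Fail fast instead of falling back to a default or hard-coded password.
            throw new IllegalStateException(ENV_NAME + " is not set");
        }
        return value.toCharArray();
    }
}
```

The returned `char[]` can be passed to `KeyStore.load(InputStream, char[])` and cleared afterward with `Arrays.fill`.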

**For Application-Layer Auth (Approach 3)**:

- Never store plaintext passwords in configuration
- Use environment variables, encrypted configuration, or secret management systems (Vault, AWS Secrets Manager, etc.)
- Implement rate limiting on authentication attempts to mitigate brute-force attacks
- Consider short-lived session tokens after initial authentication
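Rate limiting of authentication attempts can be approximated with a sliding window of recent failures per user. The sketch below is a minimal in-memory version; the `FailedLoginLimiter` class is hypothetical and would need shared state across servers in a real cluster.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Illustrative sliding-window limiter for failed logins; not a Geode class.
final class FailedLoginLimiter {
    private final int maxFailures;
    private final Duration window;
    private final Map<String, Deque<Instant>> failures = new HashMap<>();

    FailedLoginLimiter(int maxFailures, Duration window) {
        this.maxFailures = maxFailures;
        this.window = window;
    }

    // True when the user has reached maxFailures within the window ending at 'now'.
    synchronized boolean isBlocked(String user, Instant now) {
        Deque<Instant> recent = failures.get(user);
        if (recent == null) {
            return false;
        }
        prune(recent, now);
        return recent.size() >= maxFailures;
    }

    synchronized void recordFailure(String user, Instant now) {
        Deque<Instant> recent = failures.computeIfAbsent(user, k -> new ArrayDeque<>());
        prune(recent, now);
        recent.addLast(now);
    }

    // A successful login clears the user's failure history.
    synchronized void recordSuccess(String user) {
        failures.remove(user);
    }

    // Drops failures that are no longer inside the window.
    private void prune(Deque<Instant> recent, Instant now) {
        Instant cutoff = now.minus(window);
        while (!recent.isEmpty() && !recent.peekFirst().isAfter(cutoff)) {
            recent.removeFirst();
        }
    }
}
```

An authentication routine would call `isBlocked` before checking credentials and `recordFailure` or `recordSuccess` afterward.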

### Operational Security

**Certificate Lifecycle**:

- Use short certificate lifetimes (1-30 days recommended for client certificates)
- Automate renewal to avoid manual processes and human error
- Maintain CRL (Certificate Revocation List) or OCSP infrastructure for longer-lived certificates

**Monitoring and Audit**:

- Log all authentication and authorization events
- Monitor for unusual patterns (failed authentication spikes, certificate validation failures)
- Track certificate rotation events and ensure automation is functioning

**Incident Response**:

- Document procedures for certificate compromise scenarios
- Plan CA key rollover procedures (especially for internal CAs)
- Test emergency certificate revocation and reissuance processes

## Performance and Scalability Implications

Each approach has different performance characteristics that should inform your choice.

### TLS Handshake Overhead

**Certificate-Based (Approaches 1 and 2)**:

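- Full mTLS handshake adds messages for the client certificate exchange (CertificateRequest, client Certificate, CertificateVerify) and an extra signature verification, but no additional network round trips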
- Negligible impact for long-lived connections typical of Apache Geode clients
- Connection pooling mitigates handshake cost in high-throughput scenarios

**Server-Only TLS (Approach 3)**:

- Slightly faster handshakes (no client certificate exchange)
- Application-layer authentication adds minimal overhead (one `authenticate()` call per connection)
- For token-based auth, token validation overhead depends on backend (local validation vs. remote API call)

### Certificate Validation Cost

- JSSE caches TLS sessions, so resumed connections skip full certificate path validation
- Short certificate lifetimes increase rotation frequency but don't impact runtime validation cost
- Trust store size has minimal impact (linear search is negligible for typical trust store sizes)

### File-Watching Infrastructure

Geode's `PollingFileWatcher` monitors keystores and truststores:

- Polling interval: configurable, typically 1-60 seconds
- Minimal CPU overhead even with frequent polling
- Atomically written files ensure consistent reads during rotation
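To show why polling is cheap, here is a minimal sketch of a modification-time poller. It is not Geode's `PollingFileWatcher` implementation; the `SimplePollingWatcher` class is a hypothetical stand-in that only compares the file's last-modified timestamp between polls.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;

// Illustrative mtime-based poller; a stand-in, not Geode's implementation.
final class SimplePollingWatcher {
    private final Path file;
    private FileTime lastSeen;

    SimplePollingWatcher(Path file) {
        this.file = file;
        this.lastSeen = readModifiedTime();
    }

    // Returns true once per observed change: each poll is a single metadata
    // read, which is why frequent polling costs very little CPU.
    synchronized boolean poll() {
        FileTime current = readModifiedTime();
        if (current.equals(lastSeen)) {
            return false;
        }
        lastSeen = current;
        return true;
    }

    private FileTime readModifiedTime() {
        try {
            return Files.getLastModifiedTime(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A caller could schedule `poll()` with `ScheduledExecutorService.scheduleWithFixedDelay` at the chosen interval and reload the keystore whenever it returns `true`.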

## Looking Ahead: The Future of Distributed System Authentication

The public CA `clientAuth` EKU sunset is more than a technical challenge: it is an opportunity to rethink authentication architecture in distributed systems.

### Industry Trends

**Service Mesh Integration**: Projects like Istio and Linkerd provide transparent mTLS with automated certificate lifecycle management at the infrastructure layer, abstracting this complexity from applications.

**SPIFFE/SPIRE**: The Secure Production Identity Framework For Everyone (SPIFFE) and its reference implementation SPIRE provide workload identity attestation and short-lived certificates independent of traditional PKI hierarchies.

**Zero Trust Architecture**: Modern security models assume no implicit trust and require continuous authentication and authorization regardless of network location, which aligns well with credential-based approaches.

### Apache Geode's Roadmap

The Apache Geode community continues to evolve security capabilities:

- Enhanced `SecurityManager` extensibility for integration with modern identity providers
- Improved observability around authentication and TLS operations
- Continued refinement of file-watching and zero-downtime certificate rotation

## Conclusion

The removal of `clientAuth` EKU from public CA certificates represents a significant shift in the internet PKI landscape. For Apache Geode deployments and distributed systems broadly, this change requires thoughtful architectural decisions and careful migration planning.

The three approaches outlined in this article (internal CA mTLS, hybrid public/private CA, and server-only TLS with application-layer authentication) provide comprehensive options for any operational environment. Your choice depends on:

- Existing PKI infrastructure and expertise
- External trust requirements for servers
- Operational complexity tolerance
- Security policy and compliance constraints
- Timeline pressures for migration

Regardless of your choice, the Apache Geode community has thoroughly tested these approaches and validated their operation under production-like conditions. The documentation, test implementations, and this guide provide a clear path forward.

As distributed systems continue to evolve, authentication and authorization mechanisms must evolve with them. The public CA EKU sunset serves as a reminder that internet-scale security standards are living frameworks. The systems we build must be flexible enough to adapt as those standards mature.

The Apache Geode project remains committed to providing secure, reliable, and performant distributed data management. These mitigation strategies represent not just a response to an industry change, but a deepening of security capabilities that will serve the community for years to come.

## References and Further Reading

- Apache Geode Security Documentation: [https://geode.apache.org/docs/](https://geode.apache.org/docs/)
- CA/Browser Forum Baseline Requirements: [https://cabforum.org/baseline-requirements/](https://cabforum.org/baseline-requirements/)
- Let's Encrypt Certificate Authority: [https://letsencrypt.org/docs/](https://letsencrypt.org/docs/)
- Java JSSE Reference Guide: [https://docs.oracle.com/en/java/javase/](https://docs.oracle.com/en/java/javase/)
- SPIFFE Project: [https://spiffe.io/](https://spiffe.io/)
