This is an automated email from the ASF dual-hosted git repository.

weichiu pushed a commit to branch HDDS-9225-website-v2
in repository https://gitbox.apache.org/repos/asf/ozone-site.git


The following commit(s) were added to refs/heads/HDDS-9225-website-v2 by this 
push:
     new abe96bd0e HDDS-14479. [Docs] System Internals -> Security -> Symmetric 
Encryption (#308)
abe96bd0e is described below

commit abe96bd0e6f6d4868396105ea6b974b6ccb1d91c
Author: Bolin Lin <[email protected]>
AuthorDate: Thu Feb 5 12:15:15 2026 -0500

    HDDS-14479. [Docs] System Internals -> Security -> Symmetric Encryption 
(#308)
---
 .../05-security/02-symmetric-encryption.md         | 189 ++++++++++++++++++++-
 1 file changed, 187 insertions(+), 2 deletions(-)

diff --git a/docs/07-system-internals/05-security/02-symmetric-encryption.md 
b/docs/07-system-internals/05-security/02-symmetric-encryption.md
index 846c87727..79bd7d8f0 100644
--- a/docs/07-system-internals/05-security/02-symmetric-encryption.md
+++ b/docs/07-system-internals/05-security/02-symmetric-encryption.md
@@ -4,6 +4,191 @@ sidebar_label: Symmetric Encryption
 
 # Symmetric Encryption Within Ozone
 
-**TODO:** File a subtask under 
[HDDS-9862](https://issues.apache.org/jira/browse/HDDS-9862) and complete this 
page or section.
+In secure mode, Ozone issues tokens to authorize and verify each block and 
container access. Traditionally, Ozone Manager (OM) or Storage Container 
Manager (SCM) signs each token with an RSA private key, and Datanodes verify 
it using the corresponding public key and certificate. With 2048-bit RSA 
keys, however, signing is computationally expensive and can account for more 
than 80% of the latency of read/write operations in Ozone Manager.
 
-Document Ozone's shared secret model and how it is used within Ozone 
components. Also document performance advantages over asymmetric encryption.
+Because Ozone Manager is not horizontally scalable by design, minimizing the 
cost of each operation is critical for achieving sub-millisecond latencies, 
and asymmetric-key signing cannot meet this requirement. The solution is to 
sign tokens with a symmetric-key algorithm, such as HMAC with SHA256, similar 
to how HDFS operates. This approach reduces signature generation costs from 
milliseconds to microseconds.
+
+## Performance Advantages Over Asymmetric Encryption
+
+| Aspect | Asymmetric (RSA-2048) | Symmetric (HMAC-SHA256) |
+|--------|----------------------|------------------------|
+| Signing Speed | Milliseconds | Microseconds |
+| CPU Overhead | High | Low |
+| Latency Impact | >80% of OM read/write latency | Negligible |
+| Scalability | Limited by signing cost | Highly scalable |
+
+## Shared Secret Model
+
+Symmetric key algorithms require both the signer (OM) and the verifier 
(Datanodes) to share the same SecretKey. This necessitates managing SecretKey 
distribution and lifecycle across Ozone components.
+
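As a minimal sketch of the shared-secret idea, the following Java example signs a token with HMAC-SHA256 and verifies it with the same key, using only the standard `javax.crypto` API. The class `HmacTokenSketch`, its method names, and the token string are hypothetical and are not taken from the Ozone code base.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import javax.crypto.KeyGenerator;
import javax.crypto.Mac;
import javax.crypto.SecretKey;

// Hypothetical helper for illustration; not an Ozone class.
final class HmacTokenSketch {
    static final String ALGORITHM = "HmacSHA256";

    // Stands in for SCM generating the shared SecretKey.
    static SecretKey newSharedKey() {
        try {
            return KeyGenerator.getInstance(ALGORITHM).generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Signer side (OM in this document): compute the HMAC over the token bytes.
    static byte[] sign(SecretKey key, byte[] tokenBytes) {
        try {
            Mac mac = Mac.getInstance(ALGORITHM);
            mac.init(key);
            return mac.doFinal(tokenBytes);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Verifier side (Datanode): recompute the HMAC and compare the two arrays.
    static boolean verify(SecretKey key, byte[] tokenBytes, byte[] signature) {
        return MessageDigest.isEqual(sign(key, tokenBytes), signature);
    }

    public static void main(String[] args) {
        SecretKey shared = newSharedKey();
        byte[] token = "block-42:READ".getBytes(StandardCharsets.UTF_8);
        byte[] signature = sign(shared, token);
        System.out.println("valid with shared key: " + verify(shared, token, signature));
        System.out.println("valid with other key:  " + verify(newSharedKey(), token, signature));
    }
}
```

Verification succeeds only when the verifier holds the same key as the signer, which is why key distribution and lifecycle need to be managed across components.
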
+### Architecture Overview
+
+**Component Responsibilities:**
+
+| Component | Role |
+|-----------|------|
+| **SCM** | Source of truth. Generates, rotates, stores, and distributes 
SecretKeys. |
+| **OM** | Fetches current SecretKey from SCM, caches it, and signs block 
tokens using HMAC. |
+| **Datanodes** | Receive SecretKeys via heartbeat/register, verify tokens 
using cached keys. |
+
+**SecretKey Flow:**
+
+```mermaid
+flowchart TB
+    SCM["SCM Leader<br/>SecretKey File"]
+    OM["OM<br/>(HMAC)"]
+    Client["Client"]
+    DN["Datanodes<br/>Verify Token"]
+
+    OM -->|Fetch Current Key| SCM
+    SCM -->|Rotate Daily| OM
+    OM -->|Sign Block Token| Client
+    SCM -->|Distribute Keys<br/>via Heartbeat| DN
+    Client -->|Read/Write with Token| DN
+```
+
+## SecretKey Lifecycle
+
+### Key Structure
+
+Each SecretKey encapsulates:
+
+- **ID**: Unique identifier for the SecretKey
+- **creationTime**: Timestamp of key creation
+- **expiryTime**: creationTime + X days (configurable expiry duration)
+- **secretKey**: The actual symmetric key material
+
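The four fields above could be modeled roughly as follows. This is only a sketch: the class name `SecretKeySketch`, the use of `UUID` for the ID, and the `isExpired` helper are assumptions made for illustration, not the actual Ozone types.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.UUID;

// Hypothetical value class mirroring the four fields listed above.
final class SecretKeySketch {
    final UUID id;               // assumed ID type
    final Instant creationTime;
    final Instant expiryTime;    // creationTime + configured expiry duration
    final byte[] keyMaterial;

    SecretKeySketch(UUID id, Instant creationTime, Duration expiryDuration, byte[] keyMaterial) {
        this.id = id;
        this.creationTime = creationTime;
        this.expiryTime = creationTime.plus(expiryDuration);
        this.keyMaterial = keyMaterial.clone();  // defensive copy
    }

    // A key counts as expired once 'now' reaches expiryTime.
    boolean isExpired(Instant now) {
        return !now.isBefore(expiryTime);
    }
}
```
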
+### Key Generation and Storage
+
+- SCM generates SecretKeys and stores them persistently in the SCM file system
+- Each SCM generates its own SecretKeys independently
+- SCM maintains both the current active SecretKey and all non-expired keys
+- Keys are stored in a KeyStore file in 
`<hdds.metadata.dir>/scm/<hdds.key.dir.name>`
+- File permissions are restricted to read-only access for the SCM process owner
+
+### Key Rotation
+
+SCM proactively generates and distributes the next SecretKey so that each key 
is already available on Datanodes before it becomes the active key:
+
+```java
+// Initial startup: create the active key and pre-generate its successor
+currentKey = generateSecretKey();
+nextKey = generateSecretKey();
+allKeys.add(currentKey);
+allKeys.add(nextKey);
+
+// Periodic rotation: promote nextKey, generate a new successor, drop expired keys
+currentKey = nextKey;
+nextKey = generateSecretKey();
+allKeys.add(nextKey);
+filterExpiredSecretKeys(allKeys);
+```
+
+During each rotation cycle:
+
+1. The previously generated `nextKey` becomes the `currentKey`
+2. A new `nextKey` is generated for the upcoming cycle
+3. Expired SecretKeys are removed from the active set
+
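The rotation cycle above can be sketched as a small runnable model. It makes several simplifying assumptions that are not from the source: each key is represented only by the day on which it becomes active, rotation happens once per day, and keys expire after 7 days. The class `RotationSketch` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model; a key is represented by the day it becomes active.
final class RotationSketch {
    static final int EXPIRY_DAYS = 7;  // assumed expiry duration

    int currentKey;
    int nextKey;
    final List<Integer> allKeys = new ArrayList<>();

    // First start: create the active key and its pre-generated successor.
    RotationSketch(int startDay) {
        currentKey = startDay;
        nextKey = startDay + 1;
        allKeys.add(currentKey);
        allKeys.add(nextKey);
    }

    // One rotation cycle, run on day 'today'.
    void rotate(int today) {
        currentKey = nextKey;   // 1. the pre-generated key becomes active
        nextKey = today + 1;    // 2. a successor is generated for the next cycle
        allKeys.add(nextKey);
        allKeys.removeIf(day -> day + EXPIRY_DAYS <= today);  // 3. drop expired keys
    }

    public static void main(String[] args) {
        RotationSketch scm = new RotationSketch(1);
        for (int day = 2; day <= 8; day++) {
            scm.rotate(day);
            System.out.println("day " + day + ": current=" + scm.currentKey
                    + " next=" + scm.nextKey + " all=" + scm.allKeys);
        }
    }
}
```

Because `nextKey` is always created one cycle before it is promoted, there is a full cycle in which it can reach Datanodes before any token is signed with it.
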
+## Key Distribution
+
+### To Ozone Manager
+
+- OM retrieves the current SecretKey from SCM (leader) via RPC
+- For performance, OM caches the SecretKey in memory with a configurable TTL
+- Signed tokens include the SecretKey ID, allowing Datanodes to identify which 
key to use for verification
+
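A TTL-based in-memory cache of the kind described above might look like the following sketch. The `Supplier` stands in for the RPC to the SCM leader, and the injected clock exists only to make the example testable; `CachedKeySketch` and all of its member names are hypothetical.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical cache; the real OM client and key types are not shown here.
final class CachedKeySketch<K> {
    private final Supplier<K> fetchFromScm;   // stands in for the RPC to the SCM leader
    private final LongSupplier clockMillis;   // injected clock
    private final long ttlMillis;
    private K cached;
    private long fetchedAtMillis;

    CachedKeySketch(Supplier<K> fetchFromScm, LongSupplier clockMillis, long ttlMillis) {
        this.fetchFromScm = fetchFromScm;
        this.clockMillis = clockMillis;
        this.ttlMillis = ttlMillis;
    }

    // Returns the cached key, refetching when nothing is cached or the TTL has elapsed.
    synchronized K get() {
        long now = clockMillis.getAsLong();
        if (cached == null || now - fetchedAtMillis >= ttlMillis) {
            cached = fetchFromScm.get();
            fetchedAtMillis = now;
        }
        return cached;
    }

    // Forces the next get() to refetch, for example after a refresh request.
    synchronized void invalidate() {
        cached = null;
    }
}
```
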
+### To Datanodes
+
+Datanodes receive SecretKeys through two mechanisms:
+
+1. **Registration**: When a Datanode joins or rejoins a cluster, it registers 
with all SCM instances and fetches all current non-expired SecretKeys
+
+2. **Heartbeat**: During heartbeat processing, SCM checks if new SecretKeys 
need to be distributed and includes them in the heartbeat response
+
+Datanodes store SecretKeys in memory using a HashMap for fast lookup by ID. 
They also periodically remove expired keys.
+
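The Datanode-side store described above can be sketched as a map keyed by SecretKey ID with a periodic cleanup pass. The class `DatanodeKeyStoreSketch`, the `UUID` ID type, and the method names are assumptions for illustration only.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical Datanode-side store; names are illustrative only.
final class DatanodeKeyStoreSketch {
    private static final class Entry {
        final byte[] material;
        final long expiryMillis;

        Entry(byte[] material, long expiryMillis) {
            this.material = material;
            this.expiryMillis = expiryMillis;
        }
    }

    private final Map<UUID, Entry> keysById = new HashMap<>();

    // Called for keys received at registration or in a heartbeat response.
    synchronized void put(UUID id, byte[] material, long expiryMillis) {
        keysById.put(id, new Entry(material.clone(), expiryMillis));
    }

    // Looks up the key named by a token's SecretKey ID; null if unknown or expired.
    synchronized byte[] lookup(UUID id, long nowMillis) {
        Entry e = keysById.get(id);
        return (e == null || e.expiryMillis <= nowMillis) ? null : e.material;
    }

    // Periodic cleanup; returns how many keys were dropped.
    synchronized int removeExpired(long nowMillis) {
        int before = keysById.size();
        keysById.values().removeIf(e -> e.expiryMillis <= nowMillis);
        return before - keysById.size();
    }
}
```

A `null` result from `lookup` corresponds to the missing-key situation, in which the Datanode cannot verify the token.
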
+## Handling Special Events
+
+### OM Restart
+
+After restarting, OM calls SCM to fetch and cache the current SecretKey.
+
+### SCM Restart
+
+After restarting, SCM:
+
+1. Reads the stored file to load non-expired SecretKeys
+2. Removes any expired keys
+3. Assigns the `currentKey` based on timestamps of loaded keys
+4. Generates a new `nextKey` if needed
+
+If all stored keys have expired, SCM behaves as if starting fresh.
+
+The following table illustrates SCM key restoration behavior with a 7-day key 
expiry period. In this example, `kN` represents a key generated on day N. 
Assume SCM was running until Day 6 and stored keys k1-k7 (where k6 was 
`currentKey` and k7 was `nextKey`), then went down. The table shows what 
happens when SCM restarts on different days:
+
+| Stored Keys | Restart Day | Key Restoration Result |
+|-------------|-------------|------------------------|
+| k1-k7 | Day 6 | `currentKey` = k6, `nextKey` = k7, `allKeys` = [k1, k2, k3, 
k4, k5, k6, k7] |
+| k1-k7 | Day 7 | `currentKey` = k7, `nextKey` = generateNewKey(), `allKeys` = 
[k1, k2, k3, k4, k5, k6, k7, nextKey] |
+| k1-k7 | Day 8 | `currentKey` = k7, `nextKey` = generateNewKey(), `allKeys` = 
[k2, k3, k4, k5, k6, k7, nextKey] |
+| k1-k7 | Day 13 | `currentKey` = k7, `nextKey` = generateNewKey(), `allKeys` 
= [k7, nextKey] |
+| k1-k7 | Day 14 | `currentKey` = generateNewKey(), `nextKey` = 
generateNewKey(), `allKeys` = [currentKey, nextKey] |
+
+**Notes:**
+
+- Day 6: Same day as shutdown, keys restored as-is
+- Day 7: k7 promoted to current, new nextKey generated
+- Day 8: k1 expired (generated Day 1 + 7 days = Day 8), removed from allKeys
+- Day 13: Only k7 remains valid, k1-k6 all expired
+- Day 14: All stored keys expired (k7: Day 7 + 7 = Day 14), fresh keys 
generated
+
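The table can be reproduced with a short sketch. The selection rule used here is an interpretation inferred from the table rather than a description of the actual SCM code: a key `kN` is dropped once `N + 7 <= restartDay`, the newest remaining key with `N <= restartDay` becomes `currentKey`, the earliest remaining key with `N > restartDay` becomes `nextKey`, and any missing key is generated. `RestoreSketch` and the labels `newCurrent` and `newNext` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the restart table; keys are labeled "kN" by day.
final class RestoreSketch {
    static final int EXPIRY_DAYS = 7;

    final String currentKey;
    final String nextKey;
    final List<String> allKeys = new ArrayList<>();

    // Restores from stored keys k<firstDay>..k<lastDay> when SCM restarts on restartDay.
    RestoreSketch(int firstDay, int lastDay, int restartDay) {
        int current = -1;  // day of the restored currentKey, -1 if none
        int next = -1;     // day of the restored nextKey, -1 if none
        for (int day = firstDay; day <= lastDay; day++) {
            if (day + EXPIRY_DAYS <= restartDay) {
                continue;  // expired, not loaded
            }
            allKeys.add("k" + day);
            if (day <= restartDay) {
                current = day;  // newest key that is already active
            } else if (next == -1) {
                next = day;     // earliest key that is not yet active
            }
        }
        if (current == -1) {
            currentKey = "newCurrent";  // stands for generateNewKey()
            allKeys.add(currentKey);
        } else {
            currentKey = "k" + current;
        }
        if (next == -1) {
            nextKey = "newNext";        // stands for generateNewKey()
            allKeys.add(nextKey);
        } else {
            nextKey = "k" + next;
        }
    }

    public static void main(String[] args) {
        for (int day : new int[] {6, 7, 8, 13, 14}) {
            RestoreSketch r = new RestoreSketch(1, 7, day);
            System.out.println("Day " + day + ": currentKey=" + r.currentKey
                    + ", nextKey=" + r.nextKey + ", allKeys=" + r.allKeys);
        }
    }
}
```
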
+### SCM Failover
+
+When SCM leadership transfers to a new instance:
+
+- The new SCM's SecretKeys should already be present on Datanodes (since 
Datanodes register with all SCM instances)
+- OM can continue using its cached SecretKey until the cache expires
+- Edge cases where a Datanode lacks a required SecretKey are handled through 
eventual consistency mechanisms
+
+### Missing SecretKey on Datanode
+
+If a Datanode cannot find a required SecretKey:
+
+1. The Datanode triggers an immediate heartbeat to update SecretKeys from all 
SCMs
+2. The Datanode returns a `SecretKeyNotFound` error to the client
+3. The client retries with other nodes in the pipeline
+4. If all nodes fail, the client requests fresh block information from OM with 
a flag to refresh the SecretKey cache
+5. A metric is emitted to expose the situation for monitoring
+
+## Compliance and Security Standards
+
+### Algorithm Selection
+
+Following NIST SP 800-133 recommendations for Message Authentication Codes, 
Ozone uses **HMAC** as it is:
+
+- Highly performant
+- Supported by Java Security Core
+- Compliant with security standards
+
+The default configuration uses **HMAC with SHA256**, which provides at least 
128-bit security strength per NIST SP 800-57.
+
+### Key Generation
+
+SecretKeys are generated using Java's `SecureRandom`, which complies with FIPS 
140-2 requirements for approved Random Number Generators.
+
+### Key Storage
+
+- SecretKeys are persisted in a KeyStore file
+- File permissions are restricted to owner-only read access
+- Location: `<hdds.metadata.dir>/scm/<hdds.key.dir.name>`
+
+### Key Transfer
+
+SecretKeys are transferred between SCM, OM, and Datanodes via TLS-protected 
RPC connections, ensuring confidentiality during transit.
+
+## Related Resources
+
+- [HDDS-7733](https://issues.apache.org/jira/browse/HDDS-7733) - Performance 
analysis of token signing
+- [NIST SP 
800-133](https://csrc.nist.gov/publications/detail/sp/800-133/rev-2/final) - 
Recommendation for Cryptographic Key Generation
+- [NIST SP 
800-57](https://csrc.nist.gov/publications/detail/sp/800-57-part-1/rev-5/final) 
- Recommendation for Key Management

