This is an automated email from the ASF dual-hosted git repository.
ritesh pushed a commit to branch HDDS-9225-website-v2
in repository https://gitbox.apache.org/repos/asf/ozone-site.git
The following commit(s) were added to refs/heads/HDDS-9225-website-v2 by this push:
new e899522b HDDS-9864. Add overview documentation (#147)
e899522b is described below
commit e899522ba5380b009a5299fdc1204998b745f558
Author: Ritesh H Shukla <[email protected]>
AuthorDate: Thu Jul 10 13:59:12 2025 -0700
HDDS-9864. Add overview documentation (#147)
---
.markdownlintignore | 1 +
cspell.yaml | 6 ++++
docs/01-overview.md | 97 +++++++++++++++++++++++++++++++++++++++++++++++++++--
3 files changed, 101 insertions(+), 3 deletions(-)
diff --git a/.markdownlintignore b/.markdownlintignore
index 3f9d4c8d..15867b8d 100644
--- a/.markdownlintignore
+++ b/.markdownlintignore
@@ -17,3 +17,4 @@
build
node_modules
+CLAUDE.md
diff --git a/cspell.yaml b/cspell.yaml
index cdc8643e..f864ca45 100644
--- a/cspell.yaml
+++ b/cspell.yaml
@@ -58,6 +58,9 @@ flagWords:
- quasi
# RocksDB docs do not hyphenate this term.
- column-family
+# Exclude CLAUDE.md from spell checking as it contains development-specific terms
+ignorePaths:
+- CLAUDE.md
# List of words to be always considered correct.
# Case insensitive.
@@ -121,3 +124,6 @@ words:
- UX
- devs
- CLI
+- lakehouse
+- Flink
+- rebalancing
diff --git a/docs/01-overview.md b/docs/01-overview.md
index 2808232c..6ec52a98 100644
--- a/docs/01-overview.md
+++ b/docs/01-overview.md
@@ -5,8 +5,99 @@ slug: /
# Overview
-**TODO:** [HDDS-9864](https://issues.apache.org/jira/browse/HDDS-9864) complete this page
+## What is Apache Ozone?
-## What is Ozone?
+Apache Ozone is a scalable, distributed object store designed for lakehouse workloads,
+AI/ML, and cloud-native applications.
+Originating from the BigData analytics ecosystem, it handles both small and large files,
+supporting deployments up to billions of objects and exabytes of capacity.
+Ozone provides strong consistency guarantees,
+multiple protocol interfaces (including S3 compatibility), and configurable durability options.
-## Features
+## What it does?
+
+Ozone includes features relevant to large-scale storage requirements:
+
+### Scale
+
+Ozone's architecture separates metadata management from data storage. The Ozone Manager (OM) and
+Storage Container Manager (SCM) handle metadata operations, while Datanodes manage the physical storage of data blocks.
+This design allows for independent scaling of these components and supports incremental cluster growth.
+
+### Flexible Durability
+
+Ozone offers configurable data durability options per bucket or per object:
+
+- **Replication (RATIS):** Uses 3-way replication via the [Ratis (Raft)](https://ratis.apache.org) consensus protocol for high availability.
+- **Erasure Coding (EC):** Supports various EC codecs (e.g., Reed-Solomon) to reduce storage overhead compared to replication while maintaining specified durability levels.
+
+### Secure
+
+Security features are integrated at multiple layers:
+
+- **Authentication:** Supports Kerberos integration for user and service authentication.
+- **Authorization:** Provides Access Control Lists (ACLs) for managing permissions at the volume, bucket, and key levels. Supports Apache Ranger integration for centralized policy management.
+- **Encryption:** Supports TLS/SSL for data in transit and Transparent Data Encryption (TDE) for data at rest.
+- **Tokens:** Uses delegation tokens and block tokens for access control in distributed operations.
+
+### Performance
+
+Ozone's design considers performance for different access patterns:
+
+- **Throughput:** Intended for streaming reads and writes of large files. Data can be served directly from Datanodes after initial metadata lookup.
+- **Latency:** Metadata operations are managed by OM and SCM, designed for low-latency access.
+- **Small File Handling:** Includes mechanisms for managing metadata and storage for large quantities of small files.
+
+### Multiple Protocols
+
+Applications can access data stored in Ozone through several interfaces:
+
+- **S3 Protocol:** Provides an S3-compatible REST API, allowing use with S3-native applications and tools.
+- **Hadoop Compatible File System (ofs):** Offers the `ofs://` scheme for integration with Hadoop ecosystem tools (e.g., Iceberg, Spark, Hive, Flink, MapReduce).
+- **Native Java Client API:** A client library for Java applications.
+- **Command Line Interface (CLI):** Provides tools for administrative tasks and data interaction.
+
+### Efficient Storage Use
+
+Ozone includes features aimed at optimizing storage utilization:
+
+- **Erasure Coding:** Can reduce the physical storage footprint compared to 3x replication.
+- **Small File Handling:** Manages metadata and block allocation for small files.
+- **Containerization:** Groups data blocks into larger Storage Containers, which can simplify management and disk I/O.
+
+### Storage Management
+
+Ozone uses a hierarchical namespace and provides management tools:
+
+- **Namespace:** Organizes data into Volumes (often mapped to tenants) and Buckets (containers for objects), which hold Keys (objects/files).
+- **Quotas:** Administrators can set storage quotas at the Volume and Bucket levels.
+- **Snapshots:** Supports point-in-time, read-only snapshots of buckets for data protection and versioning.
+
+### Strong Consistency
+
+Ozone provides strong consistency for metadata and data operations. Reads reflect the results of the latest successfully completed write operations.
+
+## Key Characteristics
+
+The design of Ozone leads to certain characteristics relevant for large-scale data management:
+
+### Storage Costs
+
+Factors influencing storage costs include:
+
+- **Storage Efficiency:** Erasure Coding can reduce physical storage requirements.
+- **Hardware:** Designed to run on commodity hardware.
+- **Licensing:** Apache Ozone is open-source software under the Apache License 2.0.
+- **Scalability:** Clusters can be expanded by adding nodes or racks. Data rebalancing mechanisms help manage utilization.
+
+### Operations
+
+Aspects related to storage administration include:
+
+- **Unified Storage:** Can potentially serve as a common storage layer for different types of workloads.
+- **Management Tools:** Includes the Recon web UI for monitoring and CLI tools for administration.
+- **Maintenance:** Supports features like rolling upgrades, node decommissioning, and data balancing.
+
+### Hybrid Cloud Scenarios
+
+Ozone's S3 compatibility allows applications developed for S3 to run on-premises using Ozone. This can be relevant for hybrid cloud strategies or migrating workloads between on-premises and cloud environments.