This is an automated email from the ASF dual-hosted git repository.
weichiu pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/ozone.git
The following commit(s) were added to refs/heads/master by this push:
new 3cebcc9664 HDDS-13396. Documentation: Improve the top-level overview
page for new users. (#8753)
3cebcc9664 is described below
commit 3cebcc9664c05a4a94fce30fa40b5648f03f8a97
Author: Wei-Chiu Chuang <[email protected]>
AuthorDate: Wed Jul 23 08:20:27 2025 -0700
HDDS-13396. Documentation: Improve the top-level overview page for new
users. (#8753)
Co-authored-by: Tejaskriya <[email protected]>
---
hadoop-hdds/docs/content/_index.md | 103 ++++++++++++++++++++++++++++++++-----
1 file changed, 90 insertions(+), 13 deletions(-)
diff --git a/hadoop-hdds/docs/content/_index.md
b/hadoop-hdds/docs/content/_index.md
index f65c1527f4..fa50b0518a 100644
--- a/hadoop-hdds/docs/content/_index.md
+++ b/hadoop-hdds/docs/content/_index.md
@@ -1,6 +1,6 @@
---
name: Ozone
-title: Overview
+title: An Introduction to Apache Ozone
menu: main
weight: -10
---
@@ -21,21 +21,98 @@ weight: -10
limitations under the License.
-->
-# Apache Ozone
+## What is Apache Ozone?
-{{<figure class="ozone-usage" src="/ozone-usage.png" width="60%">}}
+Apache Ozone is a scalable, distributed object store designed for lakehouse
workloads,
+AI/ML, and cloud-native applications.
+Originating from the BigData analytics ecosystem, it handles both small and
large files,
+supporting deployments up to billions of objects and exabytes of capacity.
+Ozone provides strong consistency guarantees,
+multiple protocol interfaces (including S3 compatibility), and configurable
durability options.
-*_Ozone is a scalable, redundant, and distributed object store for Big data
workloads. <p>
-Apart from scaling to billions of objects of varying sizes,
-Ozone can function effectively in containerized environments
-like Kubernetes._*
+## What it does?
-Applications like Apache Spark, Hive and YARN, work without any modifications
when using Ozone. Ozone comes with a [Java client library]({{<ref
"JavaApi.md">}}), [S3 protocol support]({{< ref "S3.md" >}}), and a [command
line interface]({{< ref "Cli.md" >}}) which makes it easy to use Ozone.
+Ozone includes features relevant to large-scale storage requirements:
-Ozone consists of volumes, buckets, and keys:
+### Scale
-* Volumes are similar to user accounts. Only administrators can create or
delete volumes.
-* Buckets are similar to directories. A bucket can contain any number of keys,
but buckets cannot contain other buckets.
-* Keys are similar to files.
+Ozone's architecture separates metadata management from data storage. The
Ozone Manager (OM) and Storage Container Manager (SCM) handle metadata
operations, while Datanodes manage the physical storage of data blocks. This
design allows for independent scaling of these components and supports
incremental cluster growth.
-Check out the [Getting Started](start/) guide to dive right in and learn how
to run Ozone on your machine or in the cloud.
+### Flexible Durability
+
+Ozone offers configurable data durability options per bucket or per object:
+* **Replication (RATIS):** Uses 3-way replication via the [Ratis
(Raft)](https://ratis.apache.org) consensus protocol for high availability.
+* **Erasure Coding (EC):** Supports various EC codecs (e.g., Reed-Solomon)
to reduce storage overhead compared to replication while maintaining specified
durability levels.
+
+### Secure
+
+Security features are integrated at multiple layers:
+* **Authentication:** Supports Kerberos integration for user and service
authentication.
+* **Authorization:** Provides Access Control Lists (ACLs) for managing
permissions at the volume, bucket, and key levels. Supports Apache Ranger
integration for centralized policy management.
+* **Encryption:** Supports TLS/SSL for data in transit and Transparent Data
Encryption (TDE) for data at rest.
+* **Tokens:** Uses delegation tokens and block tokens for access control in
distributed operations.
+
+### Performance
+
+Ozone's design considers performance for different access patterns:
+* **Throughput:** Intended for streaming reads and writes of large files.
Data can be served directly from Datanodes after initial metadata lookup.
+* **Latency:** Metadata operations are managed by OM and SCM, designed for
low-latency access.
+* **Small File Handling:** Includes mechanisms for managing metadata and
storage for large quantities of small files.
+
+### Multiple Protocols
+
+Applications can access data stored in Ozone through several interfaces:
+* **S3 Protocol:** Provides an S3-compatible REST API, allowing use with
S3-native applications and tools.
+* **Hadoop Compatible File System (OFS):** Offers the `ofs://` scheme for
integration with Hadoop ecosystem tools (e.g., Iceberg, Spark, Hive, Flink,
MapReduce).
+* **Native Java Client API:** A client library for Java applications.
+* **Command Line Interface (CLI):** Provides tools for administrative tasks
and data interaction.
+
+### Efficient Storage Use
+
+Ozone includes features aimed at optimizing storage utilization:
+* **Erasure Coding:** Can reduce the physical storage footprint compared to
3x replication.
+* **Small File Handling:** Manages metadata and block allocation for small
files.
+* **Containerization:** Groups data blocks into larger Storage Containers,
which can simplify management and disk I/O.
+
+### Storage Management
+
+Ozone uses a hierarchical namespace and provides management tools:
+* **Namespace:** Organizes data into Volumes (often mapped to tenants) and
Buckets (containers for objects), which hold Keys (objects/files).
+* **Quotas:** Administrators can set storage quotas at the Volume and Bucket
levels.
+* **Snapshots:** Supports point-in-time, read-only snapshots of buckets for
data protection and versioning.
+
+### Strong Consistency
+
+Ozone provides strong consistency for metadata and data operations. Reads
reflect the results of the latest successfully completed write operations.
+
+## Key Characteristics
+
+The design of Ozone leads to certain characteristics relevant for large-scale
data management:
+
+### Storage Costs
+
+Factors influencing storage costs include:
+* **Storage Efficiency:** Erasure Coding can reduce physical storage
requirements.
+* **Hardware:** Designed to run on commodity hardware.
+* **Licensing:** Apache Ozone is open-source software under the Apache
License 2.0.
+* **Scalability:** Clusters can be expanded by adding nodes or racks. Data
rebalancing mechanisms help manage utilization.
+
+### Operations
+
+Aspects related to storage administration include:
+* **Unified Storage:** Can potentially serve as a common storage layer for
different types of workloads.
+* **Management Tools:** Includes the Recon web UI for monitoring and CLI
tools for administration.
+* **Maintenance:** Supports features like rolling upgrades, node
decommissioning, and data balancing.
+
+### Hybrid Cloud Scenarios
+
+Ozone's S3 compatibility allows applications developed for S3 to run
on-premises using Ozone. This can be relevant for hybrid cloud strategies or
migrating workloads between on-premises and cloud environments.
+
+## Dive Deeper
+
+To learn more about Ozone, refer to the following sections:
+
+* **New to Ozone?** Try the **[Quick Start Guide]({{< ref "start" >}})** to
set up a cluster.
+* **Want to understand the internals?** Read about the **[Core Concepts]({{<
ref "concept" >}})** (architecture, replication, security).
+* **Need to use Ozone?** Check the **[User Guide]({{< ref "interface" >}})**
for client interfaces and integrations.
+* **Managing a cluster?** Consult the **[Administrator Guide]({{< ref
"tools" >}})** for installation, configuration, and operations.
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]