This is an automated email from the ASF dual-hosted git repository.
weichiu pushed a commit to branch HDDS-9225-website-v2
in repository https://gitbox.apache.org/repos/asf/ozone-site.git
The following commit(s) were added to refs/heads/HDDS-9225-website-v2 by this
push:
new 041e659be HDDS-14324. [Website v2] [Blog] Introducing the Automatic
Disk Balancer in Apache Ozone (#276)
041e659be is described below
commit 041e659be5c7fd18162515331de6b434adfc3e5a
Author: ChenChen Lai <[email protected]>
AuthorDate: Thu Jan 29 17:03:04 2026 +0800
HDDS-14324. [Website v2] [Blog] Introducing the Automatic Disk Balancer in
Apache Ozone (#276)
Co-authored-by: Gargi Jaiswal
<[email protected]>
Co-authored-by: Wei-Chiu Chuang <[email protected]>
---
blog/2026-01-29-disk-balancer-preview.md | 140 +++++++++++++++++++++++++++++++
blog/authors.yml | 18 ++++
cspell.yaml | 6 ++
3 files changed, 164 insertions(+)
diff --git a/blog/2026-01-29-disk-balancer-preview.md
b/blog/2026-01-29-disk-balancer-preview.md
new file mode 100644
index 000000000..759c9e4bd
--- /dev/null
+++ b/blog/2026-01-29-disk-balancer-preview.md
@@ -0,0 +1,140 @@
+---
+title: "No More Hotspots: Introducing the Automatic Disk Balancer in Apache
Ozone"
+authors: ["apache-ozone-community","jojochuang", "0lai0","Gargi-jais11"]
+date: 2026-01-29
+tags: [Ozone, Disk Balancer, Ozone 2.2, Datanode]
+---
+
+Ever replaced a drive on a Datanode only to watch it become an I/O hotspot?
+Or seen one disk hit 95% usage while others on the same machine sit idle?
+These imbalances create performance bottlenecks and increase failure risk.
+Apache Ozone's new intra-node Disk Balancer is designed to fix
this—automatically.
+
+<!-- truncate -->
+
+Cluster-wide balancing in Ozone already ensures replicas are evenly spread
across Datanodes. But inside a single Datanode, disks can still drift out of
balance over time — for example after adding new disks, replacing hardware, or
performing large deletions. This leads to I/O hotspots and uneven wear.
+
+Disk Balancer closes that gap.
+
+## Why Disk Balancer?
+
+- **Disks fill unevenly** when nodes gain or lose volumes.
+
+- **Large deletes** can empty some disks disproportionately.
+
+- **Hot disks degrade performance** and become failure risks.
+
+Even if the cluster is balanced, the node itself may not be. Disk Balancer
fixes this automatically.
+
+## How it works
+
+The design ([HDDS-5713](https://issues.apache.org/jira/browse/HDDS-5713))
introduces a simple metric: **Volume Data Density** — how much a disk's
utilization deviates from the node's average. If the deviation exceeds a
threshold, the node begins balancing.
+
+Balancing is local and safe:
+
+- Only **closed containers** are moved.
+- Moves happen entirely **within the same Datanode.**
+- A scheduler periodically checks for imbalance and dispatches copy-and-import
tasks.
+- Bandwidth and concurrency are **operator-tunable** to avoid interfering with
production I/O.
+
+This runs independently on each Datanode. To use it, first enable the feature
by setting `hdds.datanode.disk.balancer.enabled = true` in `ozone-site.xml` on
your Datanodes. Once enabled, clients use `ozone admin datanode diskbalancer`
commands to talk directly to Datanodes, with SCM only used to discover
IN_SERVICE Datanodes when running batch operations with
`--in-service-datanodes`.
+
+## How DiskBalancer Decides What to Move
+
+DiskBalancer uses simple but robust policies to decide **which disks to
balance** and **which containers to move** (see the design doc for details:
`diskbalancer.md` in
[HDDS-5713](https://issues.apache.org/jira/browse/HDDS-5713)).
+
+- **Default Volume Choosing Policy**: Picks the most over‑utilized volume as
the source and the most under‑utilized volume as the destination, based on each
disk’s **Volume Data Density** and the Datanode’s average utilization.
+- **Default Container Choosing Policy**: Scans containers on the source volume
and moves only **CLOSED** containers that are not already being moved. To avoid
repeatedly scanning the same list, it caches container metadata with automatic
expiry.
+
+These defaults aim to make safe, incremental moves that converge the disks
toward an even utilization state.
+
+### Container Move Process
+
+When DiskBalancer moves a container from one disk to another on the **same
Datanode**, it follows a careful **"Copy-Validate-Replace"** flow (summarized
from the design doc for
[HDDS-5713](https://issues.apache.org/jira/browse/HDDS-5713)):
+
+1. Create a temporary copy of the CLOSED container on the destination disk.
+2. Transition that copy into a **RECOVERING** state and import it as a new
container on the destination.
+3. Once import and metadata updates succeed, delete the original CLOSED
container from the source disk.
+
+This ensures that data is always consistent: the destination copy is fully
validated before the original is removed, minimizing risk during balancing.
+
+## Using Disk Balancer
+
+First, enable the Disk Balancer feature on each Datanode by setting the
following in `ozone-site.xml`:
+
+- `hdds.datanode.disk.balancer.enabled = true`
+
+The Disk Balancer CLI supports two command patterns:
+
+- `ozone admin datanode diskbalancer <command> --in-service-datanodes` -
Operate on all **IN_SERVICE and HEALTHY** Datanodes
+- `ozone admin datanode diskbalancer <command>
<dn-hostname/dn-ipaddress:port>` - Operate on a specific Datanode
+
+Available commands:
+
+- **start** - Start the Disk Balancer on the target Datanode(s)
+- **stop** - Stop the Disk Balancer on the target Datanode(s)
+- **status** - Check the current Disk Balancer status
+- **report** - Get a volume density report showing imbalance across disks
+- **update** - Update Disk Balancer configuration settings
+
+Examples:
+
+```bash
+# Start Disk Balancer
+ozone admin datanode diskbalancer start --in-service-datanodes
+or
+ozone admin datanode diskbalancer start ozone-datanode-1 ozone-datanode-5
+
+# user can also specifiy configuration parameters during start
+ozone admin datanode diskbalancer start -t <value> -b <value> -p <value> -s
<value> --in-service-datanodes
+or
+ozone admin datanode diskbalancer start -t <value> -b <value> -p <value> -s
<value> ozone-datanode-1
+
+# Stop Disk Balancer
+ozone admin datanode diskbalancer stop --in-service-datanodes
+or
+ozone admin datanode diskbalancer stop 192.168.1.100:9860
+
+# Check status
+ozone admin datanode diskbalancer status --in-service-datanodes
+or
+ozone admin datanode diskbalancer status ozone-datanode-1
+
+# Get volume density report
+ozone admin datanode diskbalancer report --in-service-datanodes
+or
+ozone admin datanode diskbalancer report 192.168.1.100:9860
+
+# Update configuration
+ozone admin datanode diskbalancer update -t <value> -b <value> -p <value> -s
<value> --in-service-datanodes
+or
+ozone admin datanode diskbalancer update -t <value> -b <value> -p <value> -s
<value> ozone-datanode-1
+```
+
+### Configuration Parameters
+
+The following parameters can be specified during **start** or **update
configuration** Disk Balancer:
+
+| Parameter | Short Flag | Default Value | Description |
+| --------- | ---------- |---------------| ----------- |
+| `--threshold` | `-t` | `10.0` | Percentage deviation from average
utilization of the disks after which a Datanode will be rebalanced. |
+| `--bandwidth-in-mb` | `-b` | `10` | Maximum bandwidth for
DiskBalancer per second. |
+| `--parallel-thread` | `-p` | `5` | Max parallel thread count for
DiskBalancer. |
+| `--stop-after-disk-even` | `-s` | `true` | Stop DiskBalancer
automatically after disk utilization is even. |
+
+## Benefits for operators
+
+- **Even I/O load** across disks → more stable performance.
+- **Smooth ops after hardware changes** (new or replaced disks).
+- **Hands-off balancing** once enabled.
+- **Clear metrics** for observability and troubleshooting.
+
+It complements the existing Container Balancer: one works across nodes, the
other within nodes.
+
+## Closing Thoughts
+
+Disk Balancer is small but impactful. It brings Ozone closer to being a fully
self-healing, self-balancing object store — reducing hotspots, simplifying
maintenance, and improving cluster longevity.
+
+Ozone 2.2 will ship with this feature available via simple CLI controls and
safe defaults. If you run long-lived clusters, this is a feature to watch.
+
+For more information, check out
[HDDS-5713](https://issues.apache.org/jira/browse/HDDS-5713).
diff --git a/blog/authors.yml b/blog/authors.yml
index 215dc4bba..309a74272 100644
--- a/blog/authors.yml
+++ b/blog/authors.yml
@@ -20,3 +20,21 @@ apache-ozone-community:
title: Apache Ozone Project
url: https://ozone.apache.org
image_url: /img/ozone-logo.svg
+
+jojochuang:
+ name: Wei-Chiu Chuang
+ title: Apache Ozone PMC
+ url: https://github.com/jojochuang
+ image_url: https://github.com/jojochuang.png
+
+0lai0:
+ name: Yu-Chen Lai
+ title: Apache Ozone Contributor
+ url: https://github.com/0lai0
+ image_url: https://github.com/0lai0.png
+
+Gargi-jais11:
+ name: Gargi Jaiswal
+ title: Apache Ozone Contributor
+ url: https://github.com/Gargi-jais11
+ image_url: https://github.com/Gargi-jais11.png
diff --git a/cspell.yaml b/cspell.yaml
index c0ec4278c..4d898f64a 100644
--- a/cspell.yaml
+++ b/cspell.yaml
@@ -212,6 +212,12 @@ words:
- utilisation
- utilised
- hotspots
+- hotspot
+# Author names
+- jojochuang
+- Gargi
+- jais
+- lai
- activetimebyseconds
- charindex
- thecount
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]