This is an automated email from the ASF dual-hosted git repository.
sarvekshayr pushed a commit to branch HDDS-9225-website-v2
in repository https://gitbox.apache.org/repos/asf/ozone-site.git
The following commit(s) were added to refs/heads/HDDS-9225-website-v2 by this
push:
new b3e05f01e HDDS-14260. [Website v2] [Docs] [Administrator Guide] Balancing Data Among Datanodes (#212)
b3e05f01e is described below
commit b3e05f01e920e82e4c13885ca59fd037949ea314
Author: Eric C. Ho <[email protected]>
AuthorDate: Mon Jan 5 15:44:37 2026 +0800
HDDS-14260. [Website v2] [Docs] [Administrator Guide] Balancing Data Among Datanodes (#212)
---
.../03-operations/05-data-balancing.md | 88 +++++++++++++++++++++-
1 file changed, 87 insertions(+), 1 deletion(-)
diff --git a/docs/05-administrator-guide/03-operations/05-data-balancing.md b/docs/05-administrator-guide/03-operations/05-data-balancing.md
index 18e3aeba0..43231959b 100644
--- a/docs/05-administrator-guide/03-operations/05-data-balancing.md
+++ b/docs/05-administrator-guide/03-operations/05-data-balancing.md
@@ -4,4 +4,90 @@ sidebar_label: Data Balancing
# Balancing Data Among Datanodes
-**TODO:** File a subtask under [HDDS-9859](https://issues.apache.org/jira/browse/HDDS-9859) and complete this page or section.
+## Overview
+
+The Container Balancer is a tool in Apache Ozone that balances data containers across the cluster.
+Its primary goal is to ensure an even distribution of data based on disk space usage on Datanodes.
+This helps to prevent some Datanodes from becoming full while others remain underutilized.
+
+The balancer operates by moving `CLOSED` container replicas, which means it doesn't interfere with active I/O operations.
+It is designed to work with both regular and Erasure Coded (EC) containers.
+To maintain cluster stability, the Container Balancer's startup is delayed after a Storage Container Manager (SCM) failover.
+
+## Command Line Usage
+
+The Container Balancer is managed through the `ozone admin containerbalancer` command.
+
+### Start
+
+To start the Container Balancer with default settings:
+
+```bash
+ozone admin containerbalancer start
+```
+
+You can also start the balancer with specific options:
+
+```bash
+ozone admin containerbalancer start [options]
+```
+
+**Options:**
+
+| Option | Description |
+| ------ | ----------- |
+| `-t`, `--threshold` | The percentage deviation from the average utilization of the cluster after which a Datanode will be rebalanced. Default is 10%. |
+| `-i`, `--iterations` | The maximum number of consecutive iterations the balancer will run for. Default is 10. Use -1 for infinite iterations. |
+| `-d`, `--maxDatanodesPercentageToInvolvePerIteration` | The maximum percentage of healthy, in-service Datanodes that can be involved in balancing in one iteration. Default is 20%. |
+| `-s`, `--maxSizeToMovePerIterationInGB` | The maximum size of data in GB to be moved in one iteration. Default is 500GB. |
+| `-e`, `--maxSizeEnteringTargetInGB` | The maximum size in GB that can enter a target Datanode in one iteration. Default is 26GB. |
+| `-l`, `--maxSizeLeavingSourceInGB` | The maximum size in GB that can leave a source Datanode in one iteration. Default is 26GB. |
+| `--balancing-iteration-interval-minutes` | The interval in minutes between each iteration of the Container Balancer. Default is 70 minutes. |
+| `--move-timeout-minutes` | The time in minutes to allow a single container to move from source to target. Default is 65 minutes. |
+| `--move-replication-timeout-minutes` | The time in minutes to allow a single container's replication from source to target as part of a container move. Default is 50 minutes. |
+| `--move-network-topology-enable` | Whether to consider network topology when selecting a target for a source. Default is false. |
+| `--include-datanodes` | A comma-separated list of Datanode hostnames or IP addresses to be included in balancing. |
+| `--exclude-datanodes` | A comma-separated list of Datanode hostnames or IP addresses to be excluded from balancing. |
+
+### Status
+
+To check the status of the Container Balancer:
+
+```bash
+ozone admin containerbalancer status
+```
+
+To get a more detailed status, including the history of iterations:
+
+```bash
+ozone admin containerbalancer status -v --history
+```
+
+### Stop
+
+To stop the Container Balancer:
+
+```bash
+ozone admin containerbalancer stop
+```
+
+## Configuration
+
+The Container Balancer can also be configured through the `ozone-site.xml` file.
+
+| Property | Description | Default Value |
+| -------- | ----------- | ------------- |
+| `hdds.container.balancer.utilization.threshold` | A cluster is considered balanced if for each Datanode, the utilization of the Datanode differs from the utilization of the cluster no more than this threshold. | 10% |
+| `hdds.container.balancer.datanodes.involved.max.percentage.per.iteration` | Maximum percentage of healthy, in-service Datanodes that can be involved in balancing in one iteration. | 20% |
+| `hdds.container.balancer.size.moved.max.per.iteration` | The maximum size of data that will be moved by Container Balancer in one iteration. | 500GB |
+| `hdds.container.balancer.size.entering.target.max` | The maximum size that can enter a target Datanode in each iteration. | 26GB |
+| `hdds.container.balancer.size.leaving.source.max` | The maximum size that can leave a source Datanode in each iteration. | 26GB |
+| `hdds.container.balancer.iterations` | The number of iterations that Container Balancer will run for. | 10 |
+| `hdds.container.balancer.exclude.containers` | A comma-separated list of container IDs to exclude from balancing. | "" |
+| `hdds.container.balancer.move.timeout` | The amount of time to allow a single container to move from source to target. | 65m |
+| `hdds.container.balancer.move.replication.timeout` | The amount of time to allow a single container's replication from source to target as part of a container move. | 50m |
+| `hdds.container.balancer.balancing.iteration.interval` | The interval period between each iteration of Container Balancer. | 70m |
+| `hdds.container.balancer.include.datanodes` | A comma-separated list of Datanode hostnames or IP addresses. Only the Datanodes specified in this list are balanced. | "" |
+| `hdds.container.balancer.exclude.datanodes` | A comma-separated list of Datanode hostnames or IP addresses. The Datanodes specified in this list are excluded from balancing. | "" |
+| `hdds.container.balancer.move.networkTopology.enable` | Whether to take network topology into account when selecting a target for a source. | false |
+| `hdds.container.balancer.trigger.du.before.move.enable` | Whether to send a command to all healthy and in-service data nodes to run `du` immediately before starting a balance iteration. | false |