This is an automated email from the ASF dual-hosted git repository.

weichiu pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/ozone.git


The following commit(s) were added to refs/heads/master by this push:
     new a5cb560cb5a HDDS-12196. Document ozone repair cli (#8849)
a5cb560cb5a is described below

commit a5cb560cb5a41f7bd9f9fe7c7f4ca3cd65aaf8e2
Author: Sarveksha Yeshavantha Raju <[email protected]>
AuthorDate: Tue Aug 5 06:58:27 2025 +0530

    HDDS-12196. Document ozone repair cli (#8849)
---
 hadoop-hdds/docs/content/tools/Repair.md | 252 +++++++++++++++++++++++++++++++
 1 file changed, 252 insertions(+)

diff --git a/hadoop-hdds/docs/content/tools/Repair.md b/hadoop-hdds/docs/content/tools/Repair.md
new file mode 100644
index 00000000000..002b163773c
--- /dev/null
+++ b/hadoop-hdds/docs/content/tools/Repair.md
@@ -0,0 +1,252 @@
+---
+title: "Ozone Repair"
+date: 2025-07-22
+summary: Advanced tool to repair Ozone.
+---
+<!---
+  Licensed to the Apache Software Foundation (ASF) under one or more
+  contributor license agreements.  See the NOTICE file distributed with
+  this work for additional information regarding copyright ownership.
+  The ASF licenses this file to You under the Apache License, Version 2.0
+  (the "License"); you may not use this file except in compliance with
+  the License.  You may obtain a copy of the License at
+
+      http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+-->
+
+Ozone Repair (`ozone repair`) is an advanced tool to repair Ozone. The nodes being repaired must be stopped before the tool is run.
+Note: All repair commands support a `--dry-run` option, which shows what the command would repair without making any changes to the cluster.
+Use the `--force` flag to override the running service check in false-positive cases.
+
+```bash
+Usage: ozone repair [-hV] [--verbose] [-conf=<configurationPath>]
+                    [-D=<String=String>]... [COMMAND]
+Advanced tool to repair Ozone. The nodes being repaired must be stopped before
+the tool is run.
+      -conf=<configurationPath>
+
+  -D, --set=<String=String>
+
+  -h, --help      Show this help message and exit.
+  -V, --version   Print version information and exit.
+      --verbose   More verbose output. Show the stack trace of the errors.
+Commands:
+  datanode  Tools to repair Datanode
+  ldb       Operational tool to repair ldb.
+  om        Operational tool to repair OM.
+  scm       Operational tool to repair SCM.
+```
+For more detailed usage see the output of `--help` for each of the subcommands.
+
+## ozone repair datanode
+Operational tool to repair datanode.
+
+### upgrade-container-schema
+Upgrade all schema V2 containers to schema V3 for a datanode in offline mode.
+Optionally takes a `--volume` option to specify which volume needs the upgrade.
+
+## ozone repair ldb
+Operational tool to repair ldb.
+
+### compact
+Compact a column family in the DB to clean up tombstones while the service is offline.
+```bash
+Usage: ozone repair ldb compact [-hV] [--dry-run] [--force] [--verbose]
+                                --cf=<columnFamilyName> --db=<dbPath>
+CLI to compact a column-family in the DB while the service is offline.
+Note: If om.db is compacted with this tool then it will negatively impact the
+Ozone Manager's efficient snapshot diff.
+      --cf, --column-family, --column_family=<columnFamilyName>
+                      Column family name
+      --db=<dbPath>   Database File Path
+```
+
+## ozone repair om
+Operational tool to repair OM.
+
+#### Subcommands under OM
+- fso-tree
+- snapshot
+- update-transaction
+- quota
+- compact
+- skip-ratis-transaction
+
+### fso-tree
+Identify and repair a disconnected FSO tree by marking unreferenced entries for deletion.
+Reports the reachable, unreachable (pending delete) and unreferenced (orphaned) directories and files.
+OM should be stopped while this tool is run.
+```bash
+Usage: ozone repair om fso-tree [-hV] [--dry-run] [--force] [--verbose]
+                                [-b=<bucketFilter>] --db=<omDBPath>
+                                [-v=<volumeFilter>]
+Identify and repair a disconnected FSO tree by marking unreferenced entries for
+deletion. OM should be stopped while this tool is run.
+  -b, --bucket=<bucketFilter>
+                        Filter by bucket name
+      --db=<omDBPath>   Path to OM RocksDB
+  -v, --volume=<volumeFilter>
+                        Filter by volume name. Add '/' before the volume name.
+```
+
+### snapshot
+Subcommand for all snapshot related repairs.
+
+#### chain
+Update the global and path previous snapshot of a snapshot when the snapshot chain is corrupted.
+```bash
+Usage: ozone repair om snapshot chain [-hV] [--dry-run] [--force] [--verbose]
+                                      --db=<dbPath>
+                                      --gp=<globalPreviousSnapshotId>
+                                      --pp=<pathPreviousSnapshotId> <value>
+                                      <snapshotName>
+CLI to update global and path previous snapshot for a snapshot in case snapshot
+chain is corrupted.
+      <value>          URI of the bucket (format: volume/bucket).
+      <snapshotName>   Snapshot name to update
+      --db=<dbPath>    Database File Path
+      --gp, --global-previous=<globalPreviousSnapshotId>
+                       Global previous snapshotId to set for the given snapshot
+      --pp, --path-previous=<pathPreviousSnapshotId>
+                       Path previous snapshotId to set for the given snapshot
+```
+
+### update-transaction
+To avoid modifying Ratis logs and only update the latest applied transaction, use the `update-transaction` command.
+This updates the highest transaction index in the OM transaction info table.
+```bash
+Usage: ozone repair om update-transaction [-hV] [--dry-run] [--force]
+       [--verbose] --db=<dbPath> --index=<highestTransactionIndex>
+       --term=<highestTransactionTerm>
+CLI to update the highest index in transaction info table.
+      --db=<dbPath>   Database File Path
+      --index=<highestTransactionIndex>
+                      Highest index to set. The input should be non-zero long
+                        integer.
+      --term=<highestTransactionTerm>
+                      Highest term to set. The input should be non-zero long
+                        integer.
+```
+
+### quota
+Operational tool to repair quota in OM DB.
+
+#### start
+To trigger quota repair use the `start` command.
+```bash
+Usage: ozone repair om quota start [-hV] [--dry-run] [--force] [--verbose]
+                                   [--buckets=<buckets>]
+                                   [--service-host=<omHost>]
+                                   [--service-id=<omServiceId>]
+CLI to trigger quota repair.
+      --buckets=<buckets>   start quota repair for specific buckets. Input will
+                              be list of uri separated by comma as
+                              /<volume>/<bucket>[,...]
+      --service-host=<omHost>
+                            Ozone Manager Host. If OM HA is enabled, use
+                              --service-id instead. If you must use
+                              --service-host with OM HA, this must point
+                              directly to the leader OM. This option is
+                              required when --service-id is not provided or
+                              when HA is not enabled.
+      --service-id, --om-service-id=<omServiceId>
+                            Ozone Manager Service ID
+```
+
+#### status
+Get the status of the last triggered quota repair.
+```bash
+Usage: ozone repair om quota status [-hV] [--verbose] [--service-host=<omHost>]
+                                    [--service-id=<omServiceId>]
+CLI to get the status of last trigger quota repair if available.
+      --service-host=<omHost>
+                  Ozone Manager Host. If OM HA is enabled, use --service-id
+                    instead. If you must use --service-host with OM HA, this
+                    must point directly to the leader OM. This option is
+                    required when --service-id is not provided or when HA is
+                    not enabled.
+      --service-id, --om-service-id=<omServiceId>
+                  Ozone Manager Service ID
+```
+
+### compact
+Compact a column family in the OM DB to clean up tombstones. The compaction happens asynchronously. Requires admin privileges.
+```bash
+Usage: ozone repair om compact [-hV] [--dry-run] [--force] [--verbose]
+                               --cf=<columnFamilyName> [--node-id=<nodeId>]
+                               [--service-id=<omServiceId>]
+CLI to compact a column family in the om.db. The compaction happens
+asynchronously. Requires admin privileges.
+      --cf, --column-family, --column_family=<columnFamilyName>
+                           Column family name
+      --node-id=<nodeId>   NodeID of the OM for which db needs to be compacted.
+      --service-id, --om-service-id=<omServiceId>
+                           Ozone Manager Service ID
+```
+
+### skip-ratis-transaction, srt
+Omit a raft log in a ratis segment file by replacing the specified index with a dummy EchoOM command.
+This is an offline tool meant to be used only when all 3 OMs crash on the same transaction.
+If the issue is isolated to one OM, manually copy the DB from a healthy OM instead.
+```bash
+Usage: ozone repair om skip-ratis-transaction [-hV] [--dry-run] [--force]
+       [--verbose] -b=<backupDir> --index=<index> (-s=<segmentFile> |
+       -d=<logDir>)
+CLI to omit a raft log in a ratis segment file. The raft log at the index
+specified is replaced with an EchoOM command (which is a dummy command). It is
+an offline command i.e., doesn't require OM to be running. The command should
+be run for the same transaction on all 3 OMs only when all the OMs are crashing
+while applying the same transaction. If only one OM is crashing and the other
+OMs have executed the log successfully, then the DB should be manually copied
+from one of the good OMs to the crashing OM instead.
+  -b, --backup=<backupDir>   Directory to put the backup of the original
+                               repaired segment file before the repair.
+  -d, --ratis-log-dir=<logDir>
+                             Path of the ratis log directory
+      --index=<index>        Index of the failing transaction that should be
+                               removed
+  -s, --segment-path=<segmentFile>
+                             Path of the input segment file
+```
+
+## ozone repair scm
+Operational tool to repair SCM.
+
+#### Subcommands under SCM
+- cert
+- update-transaction
+
+### cert
+Subcommand for all certificate-related repairs on SCM.
+
+#### recover
+Recover a deleted SCM certificate from RocksDB.
+```bash
+Usage: ozone repair scm cert recover [-hV] [--dry-run] [--force] [--verbose]
+                                     --db=<dbPath>
+Recover Deleted SCM Certificate from RocksDB
+      --db=<dbPath>   SCM DB Path
+```
+
+### update-transaction
+To avoid modifying Ratis logs and only update the latest applied transaction, use the `update-transaction` command.
+This updates the highest transaction index in the SCM transaction info table.
+```bash
+Usage: ozone repair scm update-transaction [-hV] [--dry-run] [--force]
+       [--verbose] --db=<dbPath> --index=<highestTransactionIndex>
+       --term=<highestTransactionTerm>
+CLI to update the highest index in transaction info table.
+      --db=<dbPath>   Database File Path
+      --index=<highestTransactionIndex>
+                      Highest index to set. The input should be non-zero long
+                        integer.
+      --term=<highestTransactionTerm>
+                      Highest term to set. The input should be non-zero long
+                        integer.
+```
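
The dry-run-first workflow that Repair.md describes can be sketched as a small shell helper. This is an illustration only, not part of the commit: `OZONE_BIN`, `OM_DB`, `run_repair`, `/vol1` and `bucket1` are hypothetical names, and the sketch assumes `ozone` is on the PATH and that `--dry-run` may be given after the other options. Only flags that appear in the usage output above are used.

```bash
# Sketch only: wrap `ozone repair` so a repair can be previewed with
# --dry-run before the same command is repeated for real.
set -eu

OZONE_BIN="${OZONE_BIN:-ozone}"      # assumption: ozone is on the PATH
OM_DB="${OM_DB:-/path/to/om.db}"     # hypothetical OM RocksDB location

# run_repair preview|apply <subcommand> [options...]
#   preview: appends --dry-run, so no changes are made to the cluster
#   apply:   runs the subcommand exactly as given
run_repair() {
  mode="$1"
  shift
  case "$mode" in
    preview) "$OZONE_BIN" repair "$@" --dry-run ;;
    apply)   "$OZONE_BIN" repair "$@" ;;
    *)
      echo "usage: run_repair preview|apply <subcommand> [options]" >&2
      return 2
      ;;
  esac
}

# Example calls, commented out so loading this file has no side effects.
# Per the fso-tree help text, the volume filter takes a leading '/'.
# run_repair preview om fso-tree --db="$OM_DB" --volume=/vol1 --bucket=bucket1
# run_repair apply   om fso-tree --db="$OM_DB" --volume=/vol1 --bucket=bucket1
```

Keeping preview and apply as separate, explicit calls leaves the decision to proceed with the operator, who can inspect the dry-run report before repeating the command.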

