This is an automated email from the ASF dual-hosted git repository.
weichiu pushed a commit to branch HDDS-9225-website-v2
in repository https://gitbox.apache.org/repos/asf/ozone-site.git
The following commit(s) were added to refs/heads/HDDS-9225-website-v2 by this
push:
new c3e401712 HDDS-14323. Migrating HA snapshot troubleshooting guide
(#223)
c3e401712 is described below
commit c3e4017120318fde348196337e5e671b43ada7c0
Author: Jason O'Sullivan <[email protected]>
AuthorDate: Wed Jan 7 21:19:01 2026 +0000
HDDS-14323. Migrating HA snapshot troubleshooting guide (#223)
Co-authored-by: Wei-Chiu Chuang <[email protected]>
---
.../16-om-ha-snapshot-installation-issues.md | 25 ++++++++++++++++++++++
1 file changed, 25 insertions(+)
diff --git a/docs/06-troubleshooting/16-om-ha-snapshot-installation-issues.md
b/docs/06-troubleshooting/16-om-ha-snapshot-installation-issues.md
new file mode 100644
index 000000000..0f2decd55
--- /dev/null
+++ b/docs/06-troubleshooting/16-om-ha-snapshot-installation-issues.md
@@ -0,0 +1,25 @@
+---
+sidebar_label: OM HA snapshot installation
+---
+
+# Troubleshooting OM HA snapshot installation issues
+
+When a new Ozone Manager (OM) is added to an existing OM HA cluster, it needs
to obtain the latest OM DB snapshot from the leader OM.
+In cases where the OM DB is very large, the new OM may get stuck in a loop
trying to download the snapshot.
+This can happen if the leader OM purges the Raft logs associated with the
snapshot before the new OM can finish downloading it.
+When this happens, the new OM will have to restart the snapshot download, and
the process can repeat indefinitely.
+
+To avoid this issue, you can configure the following properties on the leader
OM:
+
+1. Set `ozone.om.ratis.log.purge.preservation.log.num` to a high value (e.g.
1000000).
+This property controls how many Raft logs are preserved on the leader OM.
+Setting it to a high value keeps the leader from purging the logs that the new OM needs to catch up, so a slow follower can be brought up to date through log replication instead of a snapshot installation. This is the more balanced approach: once the number of logs exceeds the configured value, the leader OM still purges older logs to keep the disk from filling up.
+2. Set `ozone.om.ratis.log.purge.upto.snapshot.index` to `false`.
+When set to `false`, this property prevents the leader OM from purging any logs until all followers have installed the latest snapshot.
+This gives the new OM enough time to download and install the snapshot before the logs are purged. This approach is riskier: if an OM follower is down for a long time, the Raft logs can grow without bound and fill the OM metadata directory.
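+
+As a minimal sketch of the first option, assuming the property is set in `ozone-site.xml` on the OM hosts (the value is the example figure from the list above, not a tuned recommendation):
+
+```xml
+<!-- Option 1 only: ozone.om.ratis.log.purge.upto.snapshot.index is left at its default (true). -->
+<property>
+  <name>ozone.om.ratis.log.purge.preservation.log.num</name>
+  <value>1000000</value>
+</property>
+```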
+
+:::note
+If `ozone.om.ratis.log.purge.preservation.log.num` is set to a non-zero number, it is recommended to keep `ozone.om.ratis.log.purge.upto.snapshot.index` set to `true` (the default value), since `ozone.om.ratis.log.purge.upto.snapshot.index` will override the preservation configuration. Therefore, these two properties should not be set together.
+
+By tuning one of these two parameters, you can avoid the OM snapshot installation loop and successfully add new OMs to your HA cluster.
+:::