This is an automated email from the ASF dual-hosted git repository.

ssulav pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/ozone-installer.git

commit f2e225c95daa6308e0d1d937f4b374a6380fc37c
Author: Soumitra Sulav <[email protected]>
AuthorDate: Mon Jan 26 19:29:47 2026 +0530

    HDDS-13870: Initial commit with OM, SCM, Recon, S3G support
---
 .gitignore                                       |   2 +
 README.md                                        | 150 ++++++
 ansible.cfg                                      |  26 +
 callback_plugins/last_failed.py                  |  57 ++
 inventories/dev/group_vars/all.yml               |  41 ++
 inventories/dev/hosts.ini                        |  17 +
 ozone_installer.py                               | 656 +++++++++++++++++++++++
 playbooks/cluster.yml                            |  59 ++
 requirements.txt                                 |   2 +
 roles/cleanup/tasks/main.yml                     |  52 ++
 roles/java/defaults/main.yml                     |  12 +
 roles/java/tasks/main.yml                        |  77 +++
 roles/ozone_config/defaults/main.yml             |   6 +
 roles/ozone_config/tasks/main.yml                |  57 ++
 roles/ozone_config/templates/core-site.xml.j2    |  15 +
 roles/ozone_config/templates/ozone-env.sh.j2     |  53 ++
 roles/ozone_config/templates/ozone-hosts.yaml.j2 |  30 ++
 roles/ozone_config/templates/ozone-site.xml.j2   | 128 +++++
 roles/ozone_config/templates/workers.j2          |   3 +
 roles/ozone_fetch/defaults/main.yml              |   9 +
 roles/ozone_fetch/tasks/main.yml                 | 113 ++++
 roles/ozone_layout/defaults/main.yml             |   3 +
 roles/ozone_layout/tasks/main.yml                |  29 +
 roles/ozone_service/tasks/main.yml               | 100 ++++
 roles/ozone_smoke/tasks/main.yml                 |  91 ++++
 roles/ozone_ui/tasks/main.yml                    |  32 ++
 roles/ozone_user/defaults/main.yml               |   6 +
 roles/ozone_user/tasks/main.yml                  |  33 ++
 roles/ssh_bootstrap/defaults/main.yml            |  14 +
 roles/ssh_bootstrap/tasks/main.yml               |  72 +++
 30 files changed, 1945 insertions(+)

diff --git a/.gitignore b/.gitignore
new file mode 100644
index 0000000..7859f36
--- /dev/null
+++ b/.gitignore
@@ -0,0 +1,2 @@
+logs/**
+*.pyc
\ No newline at end of file
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..7f87e73
--- /dev/null
+++ b/README.md
@@ -0,0 +1,150 @@
+# Ozone Installer (Ansible)
+
+## On‑prem quickstart (with ozone-installer)
+
+This installer automates the on‑prem steps described in the official Ozone docs, including SCM/OM initialization and starting services (SCM, OM, Datanodes, Recon). See the Ozone on‑prem guide for the conceptual background and properties such as `ozone.metadata.dirs`, `ozone.scm.names`, and `ozone.om.address` [Ozone On Premise Installation](https://ozone.apache.org/docs/edge/start/onprem.html).
+
+
+What the installer does (mapped to the on‑prem doc):
+- Initializes SCM and OM once, in the correct order, then starts them
+- Starts Datanodes on all DN hosts
+- Starts Recon on the first Recon host
+- Renders `ozone-site.xml` with addresses derived from inventory (SCM names, OM address/service IDs, replication factor based on DN count)
+
+Ports and service behavior follow Ozone defaults; consult the official documentation for details [Ozone On Premise Installation](https://ozone.apache.org/docs/edge/start/onprem.html).
+
+## Software Requirements
+
+- Controller: Python 3.10–3.12 (prefer 3.11) and pip
+- Ansible Community 10.x (ansible-core 2.17.x)
+- Python packages (installed via `requirements.txt`):
+  - `ansible-core==2.17.*`
+  - `click==8.*` (for nicer interactive prompts; optional but recommended)
+- SSH prerequisites on controller:
+  - `sshpass` (only if using password auth with `-m password`)
+    - Debian/Ubuntu: `sudo apt-get install -y sshpass`
+    - RHEL/CentOS/Rocky: `sudo yum install -y sshpass` or `sudo dnf install -y sshpass`
+    - SUSE: `sudo zypper in -y sshpass`
+
+### Controller node requirements
+- Can be local or remote.
+- Must be on the same network as the target hosts.
+- Requires SSH access (key or password).
+
+### Run on the controller node
+```bash
+pip install -r requirements.txt
+```
+
+## File structure
+
+- `ansible.cfg` (defaults and logging)
+- `playbooks/` (`cluster.yml`)
+- `roles/` (ssh_bootstrap, ozone_user, java, ozone_layout, ozone_fetch, ozone_config, ozone_service, ozone_smoke, cleanup, ozone_ui)
+
+## Usage (two options)
+
+1) Python wrapper (orchestrates Ansible for you)
+
+```bash
+# Non-HA upstream
+python3 ozone_installer.py -H host1.domain -v 2.0.0
+
+# HA upstream (3+ hosts) - mode auto-detected
+python3 ozone_installer.py -H "host{1..3}.domain" -v 2.0.0
+
+# Local snapshot build
+python3 ozone_installer.py -H host1 -v local --local-path /path/to/share/ozone-2.1.0-SNAPSHOT
+
+# Cleanup and reinstall
+python3 ozone_installer.py --clean -H "host{1..3}.domain" -v 2.0.0
+
+# Notes on cleanup
+# - During a normal install, you'll be asked whether to clean up an existing install (if present). Default is No.
+# - Use --clean to clean up without prompting before reinstall.
+```
+
+### Interactive prompts and version selection
+- The installer uses `click` for interactive prompts when available (TTY).
+- Version selection shows a numbered list; you can select by number, type a specific version, or `local`.
+- A summary table of inputs is displayed and logged before execution; confirm to proceed.
+- Use `--yes` to auto-accept defaults (used implicitly during `--resume`).
+
+### Resume last failed task
+
+```bash
+# Python wrapper (picks task name from logs/last_failed_task.txt)
+python3 ozone_installer.py -H host1.domain -v 2.0.0 --resume
+```
+
+```bash
+# Direct Ansible
+ANSIBLE_CONFIG=ansible.cfg ansible-playbook -i inventories/dev/hosts.ini playbooks/cluster.yml \
+  --start-at-task "$(head -n1 logs/last_failed_task.txt)"
+```
+
+2) Direct Ansible (run playbooks yourself)
+
+```bash
+# Non-HA upstream
+ANSIBLE_CONFIG=ansible.cfg ansible-playbook -i inventories/dev/hosts.ini playbooks/cluster.yml -e "ozone_version=2.0.0 cluster_mode=non-ha"
+
+# HA upstream
+ANSIBLE_CONFIG=ansible.cfg ansible-playbook -i inventories/dev/hosts.ini playbooks/cluster.yml -e "ozone_version=2.0.0 cluster_mode=ha"
+
+# Cleanup only (run just the cleanup role)
+ANSIBLE_CONFIG=ansible.cfg ansible-playbook -i inventories/dev/hosts.ini playbooks/cluster.yml \
+  --tags cleanup -e "do_cleanup=true"
+```
+
+## Inventory
+
+When using the Python wrapper, inventory is built dynamically from `-H/--host` and persisted for reuse at:
+- `logs/last_inventory.ini` (groups: `[om]`, `[scm]`, `[datanodes]`, `[recon]` and optional `[s3g]`)
+- `logs/last_vars.json` (effective variables passed to the play)
+
+For direct Ansible runs, you may edit `inventories/dev/hosts.ini` and `inventories/dev/group_vars/all.yml`, or point to `logs/last_inventory.ini` and `logs/last_vars.json` that the wrapper generated.
+
+## Non-HA
+
+```bash
+ANSIBLE_CONFIG=ansible.cfg ansible-playbook -i inventories/dev/hosts.ini playbooks/cluster.yml -e "cluster_mode=non-ha"
+```
+
+## HA cluster
+
+```bash
+ANSIBLE_CONFIG=ansible.cfg ansible-playbook -i inventories/dev/hosts.ini playbooks/cluster.yml -e "cluster_mode=ha"
+```
+
+## Notes
+
+- Idempotent where possible; runtime `ozone` init/start guarded with `creates:`.
+- JAVA_HOME and PATH are persisted for resume; runtime settings are exported via `ozone-env.sh`.
+- Local snapshot mode archives from the controller and uploads/extracts on targets using `unarchive`.
+- Logs are written to a per-run file under `logs/` named:
+  - `ansible-<timestamp>-<hosts_raw_sanitized>.log`
+  - Ansible and the Python wrapper share the same logfile.
+- After a successful run, the wrapper prints where to find process logs on target hosts, e.g. `<install base>/current/logs/ozone-<service-user>-<process>-<host>.log`.
+
+### Directories
+
+- Install base (`install_base`, default `/opt/ozone`): where Ozone binaries and configs live. A `current` symlink points to the active version directory.
+- Data base (`data_base`, default `/data/ozone`): where Ozone writes on‑disk metadata and Datanode data (e.g., `ozone.metadata.dirs`, `hdds.datanode.dir`).
+
+## Components and config mapping
+
+- Components (per the Ozone docs): Ozone Manager (OM), Storage Container Manager (SCM), Datanodes (DN), and Recon. The installer maps:
+  - Non‑HA: first host runs OM+SCM+Recon; all hosts are DNs.
+  - HA: first three hosts serve as OM and SCM sets; all hosts are DNs; first host is Recon.
+- `ozone-site.xml` is rendered from templates based on inventory groups:
+  - `ozone.scm.names`, `ozone.scm.client.address`, `ozone.om.address` or HA service IDs
+  - `ozone.metadata.dirs`, `hdds.datanode.dir`, and related paths map to `data_base`
+  - Replication is set to ONE if DN count < 3, otherwise THREE
+
+## Optional: S3 Gateway (S3G) and smoke
+
+- Define a `[s3g]` group in inventory (commonly the first OM host) to enable S3G properties in `ozone-site.xml` (default HTTP port 9878).
+- The smoke role can optionally install `awscli` on the first S3G host, configure dummy credentials, and create/list a test bucket against `http://localhost:9878` (for simple functional verification).
+
+
diff --git a/ansible.cfg b/ansible.cfg
new file mode 100644
index 0000000..5d7f571
--- /dev/null
+++ b/ansible.cfg
@@ -0,0 +1,26 @@
+[defaults]
+inventory = inventories/dev/hosts.ini
+stdout_callback = default
+retry_files_enabled = False
+gathering = smart
+forks = 20
+strategy = free
+timeout = 30
+roles_path = roles
+log_path = logs/ansible.log
+bin_ansible_callbacks = True
+callback_plugins = callback_plugins
+callbacks_enabled = timer, profile_tasks, last_failed ; for execution time profiling and resume hints
+deprecation_warnings = False
+host_key_checking = False
+remote_tmp = /tmp/.ansible-${USER}
+
+[privilege_escalation]
+become = True
+become_method = sudo
+
+[ssh_connection]
+pipelining = True
+ssh_args = -o ControlMaster=auto -o ControlPersist=60s -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null
+
+
diff --git a/callback_plugins/last_failed.py b/callback_plugins/last_failed.py
new file mode 100644
index 0000000..93b07da
--- /dev/null
+++ b/callback_plugins/last_failed.py
@@ -0,0 +1,57 @@
+from __future__ import annotations
+
+import os
+from pathlib import Path
+from ansible.plugins.callback import CallbackBase
+
+
+class CallbackModule(CallbackBase):
+    CALLBACK_VERSION = 2.0
+    CALLBACK_TYPE = 'notification'
+    CALLBACK_NAME = 'last_failed'
+    CALLBACK_NEEDS_WHITELIST = False
+
+    def __init__(self):
+        super().__init__()
+        # Write to installer logs dir
+        self._out_dir = Path(__file__).resolve().parents[1] / "logs"
+        self._out_file = self._out_dir / "last_failed_task.txt"
+        try:
+            os.makedirs(self._out_dir, exist_ok=True)
+        except Exception:
+            pass
+
+    def _write_last_failed(self, result):
+        try:
+            task_name = result._task.get_name()  # noqa
+            task_path = getattr(result._task, "get_path", lambda: None)()  # noqa
+            lineno = getattr(result._task, "get_lineno", lambda: None)()  # noqa
+            role_name = None
+            if task_path and "/roles/" in task_path:
+                try:
+                    role_segment = task_path.split("/roles/")[1]
+                    role_name = role_segment.split("/")[0]
+                except Exception:
+                    role_name = None
+            host = getattr(result, "_host", None)
+            host_name = getattr(host, "name", "unknown") if host else "unknown"
+            line = f"{task_name}\n# host: {host_name}\n"
+            if task_path:
+                line += f"# file: {task_path}\n"
+            if lineno:
+                line += f"# line: {lineno}\n"
+            if role_name:
+                line += f"# role: {role_name}\n"
+            with open(self._out_file, "w", encoding="utf-8") as f:
+                f.write(line)
+        except Exception:
+            # Best effort only; never break the run
+            pass
+
+    def v2_runner_on_failed(self, result, ignore_errors=False):
+        self._write_last_failed(result)
+
+    def v2_runner_on_unreachable(self, result):
+        self._write_last_failed(result)
+
+
diff --git a/inventories/dev/group_vars/all.yml b/inventories/dev/group_vars/all.yml
new file mode 100644
index 0000000..608a457
--- /dev/null
+++ b/inventories/dev/group_vars/all.yml
@@ -0,0 +1,41 @@
+---
+# Global defaults
+cluster_mode: "non-ha"        # non-ha | ha
+
+# Source selection
+ozone_version: "2.0.0"        # "2.0.0" | "local"
+dl_url: "https://dlcdn.apache.org/ozone"
+
+# Local snapshot settings
+local_shared_path: ""
+local_ozone_dirname: ""
+
+# Install and data directories
+install_base: "/opt/ozone"
+data_base: "/data/ozone"
+
+# Java settings
+jdk_major: 17
+ozone_java_home: ""           # autodetected if empty
+
+# Service user/group
+service_user: "ozone"
+service_group: "ozone"
+
+# Runtime and behavior
+use_sudo: true
+start_after_install: true
+ozone_opts: "-Xmx1024m -XX:ParallelGCThreads=8"
+service_command_timeout: 300  # seconds for service init/start commands
+ansible_remote_tmp: "/tmp/.ansible-{{ ansible_user_id }}"
+
+# SSH bootstrap
+allow_cluster_ssh_key_deploy: false
+ssh_public_key_path: ""       # optional path on controller to a public key to install
+ssh_private_key_path: ""      # optional path to private key to copy for cluster identity
+
+# Markers for profile management
+JAVA_MARKER: "Apache Ozone Installer Java Home"
+ENV_MARKER: "Apache Ozone Installer Env"
+
+
diff --git a/inventories/dev/hosts.ini b/inventories/dev/hosts.ini
new file mode 100644
index 0000000..98e7a4b
--- /dev/null
+++ b/inventories/dev/hosts.ini
@@ -0,0 +1,17 @@
+[om]
+# om1.example.com
+
+[scm]
+# scm1.example.com
+
+[datanodes]
+# dn1.example.com
+# dn2.example.com
+
+[recon]
+# recon1.example.com
+
+[all:vars]
+cluster_mode=non-ha
+
+
diff --git a/ozone_installer.py b/ozone_installer.py
new file mode 100755
index 0000000..9246035
--- /dev/null
+++ b/ozone_installer.py
@@ -0,0 +1,656 @@
+#!/usr/bin/env python3
+
+import argparse
+import json
+import os
+import re
+import shlex
+import subprocess
+import sys
+import tempfile
+import logging
+from datetime import datetime
+from pathlib import Path
+from typing import List, Optional, Tuple
+
+# Optional nicer interactive prompts (fallback to built-in prompts if unavailable)
+try:
+    import click  # type: ignore
+except Exception:
+    click = None  # type: ignore
+
+ANSIBLE_ROOT = Path(__file__).resolve().parent
+ANSIBLE_CFG = ANSIBLE_ROOT / "ansible.cfg"
+PLAYBOOKS_DIR = ANSIBLE_ROOT / "playbooks"
+LOGS_DIR = ANSIBLE_ROOT / "logs"
+LAST_FAILED_FILE = LOGS_DIR / "last_failed_task.txt"
+LAST_RUN_FILE = LOGS_DIR / "last_run.json"
+
+DEFAULTS = {
+    "install_base": "/opt/ozone",
+    "data_base": "/data/ozone",
+    "ozone_version": "2.0.0",
+    "jdk_major": 17,
+    "service_user": "ozone",
+    "service_group": "ozone",
+    "dl_url": "https://dlcdn.apache.org/ozone",
+    "JAVA_MARKER": "Apache Ozone Installer Java Home",
+    "ENV_MARKER": "Apache Ozone Installer Env",
+    "start_after_install": True,
+    "use_sudo": True,
+}
+
+def get_logger(log_path: Optional[Path] = None) -> logging.Logger:
+    try:
+        LOGS_DIR.mkdir(parents=True, exist_ok=True)
+    except Exception:
+        pass
+    logger = logging.getLogger("ozone_installer")
+    logger.setLevel(logging.INFO)
+    # Avoid duplicate handlers if re-invoked
+    if not logger.handlers:
+        dest = log_path or (LOGS_DIR / "ansible.log")
+        fh = logging.FileHandler(dest)
+        fh.setLevel(logging.INFO)
+        formatter = logging.Formatter("%(asctime)s | %(levelname)s | %(message)s")
+        fh.setFormatter(formatter)
+        logger.addHandler(fh)
+        sh = logging.StreamHandler(sys.stdout)
+        logger.addHandler(sh)
+    return logger
+
+def parse_args(argv: List[str]) -> argparse.Namespace:
+    p = argparse.ArgumentParser(
+        description="Ozone Ansible Installer (Python trigger) - mirrors bash installer flags"
+    )
+    p.add_argument("-H", "--host", help="Target host(s). Non-HA: host. HA: comma-separated or brace expansion host{1..n}")
+    p.add_argument("-m", "--auth-method", choices=["password", "key"], default=None)
+    p.add_argument("-p", "--password", help="SSH password (for --auth-method=password)")
+    p.add_argument("-k", "--keyfile", help="SSH private key file (for --auth-method=key)")
+    p.add_argument("-v", "--version", help="Ozone version (e.g., 2.0.0) or 'local'")
+    p.add_argument("-i", "--install-dir", help=f"Install root (default: {DEFAULTS['install_base']})")
+    p.add_argument("-d", "--data-dir", help=f"Data root (default: {DEFAULTS['data_base']})")
+    p.add_argument("-s", "--start", action="store_true", help="Initialize and start after install")
+    p.add_argument("-M", "--cluster-mode", choices=["non-ha", "ha"], help="Force cluster mode (default: auto by host count)")
+    p.add_argument("-r", "--role-file", help="Role file (YAML) for HA mapping (optional)")
+    p.add_argument("-j", "--jdk-version", type=int, choices=[17, 21], help="JDK major version (default: 17)")
+    p.add_argument("-c", "--config-dir", help="Config dir (optional, templates are used by default)")
+    p.add_argument("-x", "--clean", action="store_true", help="(Reserved) Cleanup before install [not yet implemented]")
+    p.add_argument("-l", "--ssh-user", help="SSH username (default: root)")
+    p.add_argument("-S", "--use-sudo", action="store_true", help="Run remote commands via sudo (default)")
+    p.add_argument("-u", "--service-user", help="Service user (default: ozone)")
+    p.add_argument("-g", "--service-group", help="Service group (default: ozone)")
+    # Local extras
+    p.add_argument("--local-path", help="Path to local Ozone build (contains bin/ozone)")
+    p.add_argument("--dl-url", help="Upstream download base URL")
+    p.add_argument("--yes", action="store_true", help="Non-interactive; accept defaults for missing values")
+    p.add_argument("-R", "--resume", action="store_true", help="Resume play at last failed task (if available)")
+    return p.parse_args(argv)
+
+def _validate_local_ozone_dir(path: Path) -> bool:
+    """
+    Returns True if 'path/bin/ozone' exists and is executable.
+    """
+    ozone_bin = path / "bin" / "ozone"
+    try:
+        return ozone_bin.exists() and os.access(str(ozone_bin), os.X_OK)
+    except OSError:
+        return False
+
+def prompt(prompt_text: str, default: Optional[str] = None, secret: bool = False, yes_mode: bool = False) -> Optional[str]:
+    if yes_mode:
+        return default
+    if click is not None and sys.stdout.isatty():
+        try:
+            display = prompt_text
+            # logger.info(f"prompt_text: {prompt_text} , default: {default}")
+            if default:
+                display = f"{prompt_text} [default={default}]"
+            if secret:
+                return click.prompt(display, default=default, hide_input=True, show_default=False)
+            return click.prompt(display, default=default, show_default=False)
+        except (EOFError, KeyboardInterrupt, click.Abort):
+            return default
+    # Fallback to built-in input/getpass
+    try:
+        text = f"{prompt_text}: "
+        if default:
+            text = f"{prompt_text} [default={default}]: "
+        if secret:
+            import getpass
+            val = getpass.getpass(text)
+        else:
+            val = input(text)
+        if not val and default is not None:
+            return default
+        return val
+    except EOFError:
+        return default
+
+def _semver_key(v: str) -> Tuple[int, int, int, int, str]:
+    """
+    Convert version like '2.0.0' or '2.1.0-RC0' to a sortable key.
+    Pre-release suffix sorts before final.
+    """
+    try:
+        core, *rest = v.split("-", 1)
+        major, minor, patch = core.split(".")
+        suffix = rest[0] if rest else ""
+        return (int(major), int(minor), int(patch), 0 if suffix else 1, suffix)
+    except Exception:
+        return (0, 0, 0, 0, v)
+
+def _render_table(rows: List[Tuple[str, str]]) -> str:
+    """
+    Returns a simple two-column table string without extra dependencies.
+    """
+    if not rows:
+        return ""
+    col1_width = max(len(k) for k, _ in rows)
+    col2_width = max(len(str(v)) for _, v in rows)
+    sep = f"+-{'-' * col1_width}-+-{'-' * col2_width}-+"
+    out = [sep, f"| {'Field'.ljust(col1_width)} | {'Value'.ljust(col2_width)} |", sep]
+    for k, v in rows:
+        out.append(f"| {k.ljust(col1_width)} | {str(v).ljust(col2_width)} |")
+    out.append(sep)
+    return "\n".join(out)
+
+def _confirm_summary(rows: List[Tuple[str, str]], yes_mode: bool) -> bool:
+    """
+    Print the input summary table and ask user to continue. Returns True if confirmed.
+    """
+    logger = get_logger()
+    table = _render_table(rows)
+    if click is not None:
+        logger.info(table)
+        if yes_mode:
+            return True
+        return click.confirm("Proceed with these settings?", default=True)
+    else:
+        logger.info(table)
+        if yes_mode:
+            return True
+        answer = prompt("Proceed with these settings? (Y/n)", default="Y", yes_mode=False)
+        return str(answer or "Y").strip().lower() in ("y", "yes")
+
+def fetch_available_versions(dl_url: str, limit: int = 30) -> List[str]:
+    """
+    Fetch available Ozone versions from the download base. Returns newest-first.
+    """
+    try:
+        import urllib.request
+        with urllib.request.urlopen(dl_url, timeout=10) as resp:
+            html = resp.read().decode("utf-8", errors="ignore")
+        # Apache directory listing usually has anchors like href="2.0.0/"
+        candidates = set(m.group(1) for m in re.finditer(r'href="([0-9]+\.[0-9]+\.[0-9]+(?:-[A-Za-z0-9]+)?)\/"', html))
+        versions = sorted(candidates, key=_semver_key, reverse=True)
+        if limit and len(versions) > limit:
+            versions = versions[:limit]
+        return versions
+    except Exception:
+        return []
+
+def choose_version_interactive(versions: List[str], default_version: str, yes_mode: bool) -> Optional[str]:
+    """
+    Present a numbered list and prompt user to choose a version.
+    Returns selected version string or None if not chosen.
+    """
+    if not versions:
+        return None
+    if yes_mode:
+        return versions[0]
+    # Use click when available and interactive; otherwise fallback to basic prompt
+    if click is not None and sys.stdout.isatty():
+        click.echo("Available Ozone versions (newest first):")
+        for idx, ver in enumerate(versions, start=1):
+            click.echo(f"  {idx}) {ver}")
+        while True:
+            choice = prompt(
+                "Select number, type a version (e.g., 2.1.0) or 'local'",
+                default="1",
+                yes_mode=False,
+            )
+            if choice is None:
+                return versions[0]
+            choice = str(choice).strip()
+            if choice == "":
+                return versions[0]
+            if choice.lower() == "local":
+                return "local"
+            if choice.isdigit():
+                i = int(choice)
+                if 1 <= i <= len(versions):
+                    return versions[i - 1]
+            if re.match(r"^[0-9]+\.[0-9]+\.[0-9]+(?:-[A-Za-z0-9]+)?$", choice):
+                return choice
+            click.echo("Invalid selection. Enter a number, a valid version, or 'local'.")
+    else:
+        logger = get_logger()
+        logger.info("Available Ozone versions:")
+        for idx, ver in enumerate(versions, start=1):
+            logger.info(f"  {idx}) {ver}")
+        while True:
+            choice = prompt("Select number, type a version (e.g., 2.1.0) or 'local'", default="1", yes_mode=False)
+            if choice is None or str(choice).strip() == "":
+                return versions[0]
+            choice = str(choice).strip()
+            if choice.lower() == "local":
+                return "local"
+            if choice.isdigit():
+                i = int(choice)
+                if 1 <= i <= len(versions):
+                    return versions[i - 1]
+            # allow typing a specific version not listed
+            if re.match(r"^[0-9]+\.[0-9]+\.[0-9]+(?:-[A-Za-z0-9]+)?$", choice):
+                return choice
+            logger.info("Invalid selection. Please enter a number from the list, a valid version (e.g., 2.1.0) or 'local'.")
+
+def expand_braces(expr: str) -> List[str]:
+    # Supports simple pattern like prefix{1..N}suffix
+    if not expr or "{" not in expr or ".." not in expr or "}" not in expr:
+        return [expr]
+    m = re.search(r"(.*)\{(\d+)\.\.(\d+)\}(.*)", expr)
+    if not m:
+        return [expr]
+    pre, a, b, post = m.group(1), int(m.group(2)), int(m.group(3)), m.group(4)
+    return [f"{pre}{i}{post}" for i in range(a, b + 1)]
+
+def parse_hosts(hosts_raw: Optional[str]) -> List[dict]:
+    """
+    Accepts comma-separated hosts; each may contain brace expansion.
+    Returns list of dicts: {host, user, port}
+    """
+    if not hosts_raw:
+        return []
+    out = []
+    for token in hosts_raw.split(","):
+        token = token.strip()
+        expanded = expand_braces(token)
+        for item in expanded:
+            user = None
+            hostport = item
+            if "@" in item:
+                user, hostport = item.split("@", 1)
+            host = hostport
+            port = None
+            if ":" in hostport:
+                host, port = hostport.split(":", 1)
+            out.append({"host": host, "user": user, "port": port})
+    return out
+
+def auto_cluster_mode(hosts: List[dict], forced: Optional[str] = None) -> str:
+    if forced in ("non-ha", "ha"):
+        return forced
+    return "ha" if len(hosts) >= 3 else "non-ha"
+
+def build_inventory(hosts: List[dict], ssh_user: Optional[str] = None, keyfile: Optional[str] = None, password: Optional[str] = None, cluster_mode: str = "non-ha") -> str:
+    """
+    Returns INI inventory text for our groups: [om], [scm], [datanodes], [recon], [s3g]
+    """
+    if not hosts:
+        return ""
+    # Non-HA mapping: OM/SCM on first host; all hosts as datanodes; recon on first
+    if cluster_mode == "non-ha":
+        h = hosts[0]
+        return _render_inv_groups(
+            om=[h], scm=[h], dn=hosts, recon=[h], s3g=[h],
+            ssh_user=ssh_user, keyfile=keyfile, password=password
+        )
+    # HA: first 3 go to OM and SCM; all to datanodes; recon is first if present
+    om = hosts[:3] if len(hosts) >= 3 else hosts
+    scm = hosts[:3] if len(hosts) >= 3 else hosts
+    dn = hosts
+    recon = [hosts[0]]
+    s3g = [hosts[0]]
+    return _render_inv_groups(om=om, scm=scm, dn=dn, recon=recon, s3g=s3g,
+                              ssh_user=ssh_user, keyfile=keyfile, password=password)
+
+def _render_inv_groups(om: List[dict], scm: List[dict], dn: List[dict], recon: List[dict], s3g: List[dict], ssh_user: Optional[str] = None, keyfile: Optional[str] = None, password: Optional[str] = None) -> str:
+    def hostline(hd):
+        parts = [hd["host"]]
+        if ssh_user or hd.get("user"):
+            parts.append(f"ansible_user={(ssh_user or hd.get('user'))}")
+        if hd.get("port"):
+            parts.append(f"ansible_port={hd['port']}")
+        if keyfile:
+            parts.append(f"ansible_ssh_private_key_file={shlex.quote(str(keyfile))}")
+        if password:
+            parts.append(f"ansible_password={shlex.quote(password)}")
+        return " ".join(parts)
+
+    sections = []
+    sections.append("[om]")
+    sections += [hostline(h) for h in om]
+    sections.append("\n[scm]")
+    sections += [hostline(h) for h in scm]
+    sections.append("\n[datanodes]")
+    sections += [hostline(h) for h in dn]
+    sections.append("\n[recon]")
+    sections += [hostline(h) for h in recon]
+    sections.append("\n[s3g]")
+    sections += [hostline(h) for h in s3g]
+    sections.append("\n")
+    return "\n".join(sections)
+
+def run_playbook(playbook: Path, inventory_path: Path, extra_vars_path: Path, ask_pass: bool = False, become: bool = True, start_at_task: Optional[str] = None, tags: Optional[List[str]] = None) -> int:
+    cmd = [
+        "ansible-playbook",
+        "-i", str(inventory_path),
+        str(playbook),
+        "-e", f"@{extra_vars_path}",
+    ]
+    if ask_pass:
+        cmd.append("-k")
+    if become:
+        cmd.append("--become")
+    if start_at_task:
+        cmd += ["--start-at-task", str(start_at_task)]
+    if tags:
+        cmd += ["--tags", ",".join(tags)]
+    env = os.environ.copy()
+    env["ANSIBLE_CONFIG"] = str(ANSIBLE_CFG)
+    # Route Ansible logs to the same file as the Python logger
+    log_path = LOGS_DIR / "ansible.log"
+    try:
+        logger = get_logger()
+        for h in logger.handlers:
+            if isinstance(h, logging.FileHandler):
+                # type: ignore[attr-defined]
+                log_path = Path(getattr(h, "baseFilename"))  # type: ignore
+                break
+    except Exception:
+        pass
+    env["ANSIBLE_LOG_PATH"] = str(log_path)
+    logger = get_logger()
+    if start_at_task:
+        logger.info(f"Resuming from task: {start_at_task}")
+    if tags:
+        logger.info(f"Using tags: {','.join(tags)}")
+    logger.info(f"Running: {' '.join(shlex.quote(c) for c in cmd)}")
+    return subprocess.call(cmd, env=env)
+
+def main(argv: List[str]) -> int:
+    args = parse_args(argv)
+    # Resume mode: reuse last provided configs and suppress prompts when possible
+    resuming = bool(getattr(args, "resume", False))
+    yes = True if resuming else bool(args.yes)
+    last_cfg = None
+    if resuming and LAST_RUN_FILE.exists():
+        try:
+            last_cfg = json.loads(LAST_RUN_FILE.read_text(encoding="utf-8"))
+        except Exception:
+            last_cfg = None
+
+    # Gather inputs interactively where missing
+    hosts_raw_default = (last_cfg.get("hosts_raw") if last_cfg else None)
+    hosts_raw = args.host or hosts_raw_default or prompt("Target host(s) [non-ha: host | HA: h1,h2,h3 or brace expansion]", default="", yes_mode=yes)
+    hosts = parse_hosts(hosts_raw) if hosts_raw else []
+    # Initialize per-run logger as soon as we have hosts_raw
+    try:
+        ts = datetime.now().strftime("%Y%m%d-%H%M%S")
+        raw_hosts_for_name = (hosts_raw or "").strip()
+        safe_hosts = re.sub(r"[^A-Za-z0-9_.-]+", "-", raw_hosts_for_name)[:80] or "hosts"
+        run_log_path = LOGS_DIR / f"ansible-{ts}-{safe_hosts}.log"
+        logger = get_logger(run_log_path)
+        logger.info(f"Logging to: {run_log_path}")
+    except Exception:
+        run_log_path = LOGS_DIR / "ansible.log"
+        logger = get_logger(run_log_path)
+        logger.info(f"Logging to: {run_log_path} (fallback)")
+
+    if not hosts:
+        logger.error("Error: No hosts provided (-H/--host).")
+        return 2
+    # Decide HA vs Non-HA with user input; default depends on host count
+    resume_cluster_mode = (last_cfg.get("cluster_mode") if last_cfg else None)
+    if args.cluster_mode:
+        cluster_mode = args.cluster_mode
+    elif resume_cluster_mode:
+        cluster_mode = resume_cluster_mode
+    else:
+        default_mode = "ha" if len(hosts) >= 3 else "non-ha"
+        selected = prompt("Deployment type (ha|non-ha)", default=default_mode, yes_mode=yes)
+        cluster_mode = (selected or default_mode).strip().lower()
+        if cluster_mode not in ("ha", "non-ha"):
+            cluster_mode = default_mode
+    if cluster_mode == "ha" and len(hosts) < 3:
+        logger.error("Error: HA requires at least 3 hosts (to map 3 OMs and 3 SCMs).")
+        return 2
+
+    # Resolve download base early for version selection
+    dl_url = args.dl_url or (last_cfg.get("dl_url") if last_cfg else None) or DEFAULTS["dl_url"]
+    ozone_version = args.version or (last_cfg.get("ozone_version") if last_cfg else None)
+    if not ozone_version:
+        # Try to fetch available versions from dl_url and offer selection
+        versions = fetch_available_versions(dl_url or DEFAULTS["dl_url"])
+        selected = choose_version_interactive(versions, DEFAULTS["ozone_version"], yes_mode=yes)
+        if selected:
+            ozone_version = selected
+        else:
+            # Fallback prompt if fetch failed
+            ozone_version = prompt("Ozone version (e.g., 2.1.0 | local)", default=DEFAULTS["ozone_version"], yes_mode=yes)
+    jdk_major = args.jdk_version if args.jdk_version is not None else ((last_cfg.get("jdk_major") if last_cfg else None))
+    if jdk_major is None:
+        _jdk_val = prompt("JDK major (17|21)", default=str(DEFAULTS["jdk_major"]), yes_mode=yes)
+        try:
+            jdk_major = int(str(_jdk_val)) if _jdk_val is not None else DEFAULTS["jdk_major"]
+        except Exception:
+            jdk_major = DEFAULTS["jdk_major"]
+    install_base = args.install_dir or (last_cfg.get("install_base") if last_cfg else None) \
+        or prompt("Install base directory (binaries and configs; e.g., /opt/ozone)", default=DEFAULTS["install_base"], yes_mode=yes)
+    data_base = args.data_dir or (last_cfg.get("data_base") if last_cfg else None) \
+        or prompt("Data base directory (metadata and DN data; e.g., /data/ozone)", default=DEFAULTS["data_base"], yes_mode=yes)
+
+    # Auth (before service user/group)
+    auth_method = args.auth_method or (last_cfg.get("auth_method") if last_cfg else None) \
+        or prompt("Auth method (key|password)", default="password", yes_mode=yes)
+    if auth_method not in ("key", "password"):
+        auth_method = "password"
+    ssh_user = args.ssh_user or (last_cfg.get("ssh_user") if last_cfg else None) \
+        or prompt("SSH username", default="root", yes_mode=yes)
+    password = args.password or ((last_cfg.get("password") if last_cfg else None))  # persisted for resume on request
+    keyfile = args.keyfile or (last_cfg.get("keyfile") if last_cfg else None)
+    if auth_method == "password" and not password:
+        password = prompt("SSH password", default="", secret=True, yes_mode=yes)
+    if auth_method == "key" and not keyfile:
+        keyfile = prompt("Path to SSH private key", default=str(Path.home() / ".ssh" / "id_ed25519"), yes_mode=yes)
+    # Ensure we don't mix methods
+    if auth_method == "password":
+        keyfile = None
+    elif auth_method == "key":
+        password = None
+    service_user = args.service_user or (last_cfg.get("service_user") if last_cfg else None) \
+        or prompt("Service user", default=DEFAULTS["service_user"], yes_mode=yes)
+    service_group = args.service_group or (last_cfg.get("service_group") if last_cfg else None) \
+        or prompt("Service group", default=DEFAULTS["service_group"], yes_mode=yes)
+    dl_url = args.dl_url or (last_cfg.get("dl_url") if last_cfg else None) or DEFAULTS["dl_url"]
+    start_after_install = (args.start or (last_cfg.get("start_after_install") if last_cfg else None)
+                           or DEFAULTS["start_after_install"])
+    use_sudo = (args.use_sudo or (last_cfg.get("use_sudo") if last_cfg else None)
+                or DEFAULTS["use_sudo"])
+
+    # Local specifics (single path to local build)
+    local_path = (getattr(args, "local_path", None) or (last_cfg.get("local_path") if last_cfg else None))
+    local_shared_path = None
+    local_oz_dir = None
+    if ozone_version and ozone_version.lower() == "local":
+        # Accept a direct path to the ozone build dir (relative or absolute) and validate it.
+        # Backward-compat: if only legacy split values were saved previously, resolve them.
+        candidate = None
+        if local_path:
+            candidate = Path(local_path).expanduser().resolve()
+        else:
+            legacy_shared = (last_cfg.get("local_shared_path") if last_cfg else None)
+            legacy_dir = (last_cfg.get("local_ozone_dirname") if last_cfg else None)
+            if legacy_shared and legacy_dir:
+                candidate = Path(legacy_shared).expanduser().resolve() / legacy_dir
+
+        def ask_for_path():
+            val = prompt("Path to local Ozone build", default="", yes_mode=yes)
+            return Path(val).expanduser().resolve() if val else None
+
+        if candidate is None or not _validate_local_ozone_dir(candidate):
+            if yes:
+                logger.error("Error: For -v local, a valid Ozone build path containing bin/ozone is required.")
+                return 2
+            while True:
+                maybe = ask_for_path()
+                if maybe and _validate_local_ozone_dir(maybe):
+                    candidate = maybe
+                    break
+                logger.warning("Invalid path. Expected an Ozone build directory with bin/ozone. Please try again.")
+
+        # Normalize back to shared path + dirname for Ansible vars and persistable single path
+        local_shared_path = str(candidate.parent)
+        local_oz_dir = candidate.name
+        local_path = str(candidate)
+
+    # Build a human-friendly summary table of inputs before continuing
+    host_list_display = str(hosts_raw or "")
+    summary_rows: List[Tuple[str, str]] = [
+        ("Hosts", host_list_display),
+        ("Cluster mode", cluster_mode),
+        ("Ozone version", str(ozone_version)),
+        ("JDK major", str(jdk_major)),
+        ("Install base", str(install_base)),
+        ("Data base", str(data_base)),
+        ("SSH user", str(ssh_user)),
+        ("Auth method", str(auth_method))
+    ]
+    if keyfile:
+        summary_rows.append(("Key file", str(keyfile)))
+    summary_rows.extend([("Use sudo", str(bool(use_sudo))),
+                        ("Service user", str(service_user)),
+                        ("Service group", str(service_group)),
+                        ("Start after install", str(bool(start_after_install)))])
+    if ozone_version and str(ozone_version).lower() == "local":
+        summary_rows.append(("Local Ozone path", str(local_path or "")))
+    if not _confirm_summary(summary_rows, yes_mode=yes):
+        logger.info("Aborted by user.")
+        return 1
+
+    # Prepare dynamic inventory and extra-vars
+    inventory_text = build_inventory(hosts, ssh_user=ssh_user, keyfile=keyfile, password=password,
+                                     cluster_mode=cluster_mode)
+    # Decide cleanup behavior up-front (so we can pass it into the unified play)
+    do_cleanup = False
+    if args.clean:
+        do_cleanup = True
+    else:
+        answer = prompt(f"Cleanup existing install at {install_base} (if present)? (y/N)", default="n", yes_mode=yes)
+        if str(answer).strip().lower().startswith("y"):
+            do_cleanup = True
+
+    extra_vars = {
+        "cluster_mode": cluster_mode,
+        "install_base": install_base,
+        "data_base": data_base,
+        "jdk_major": jdk_major,
+        "service_user": service_user,
+        "service_group": service_group,
+        "dl_url": dl_url,
+        "ozone_version": ozone_version,
+        "start_after_install": bool(start_after_install),
+        "use_sudo": bool(use_sudo),
+        "do_cleanup": bool(do_cleanup),
+        "JAVA_MARKER": DEFAULTS["JAVA_MARKER"],
+        "ENV_MARKER": DEFAULTS["ENV_MARKER"],
+        "controller_logs_dir": str(LOGS_DIR),
+    }
+    if ozone_version and ozone_version.lower() == "local":
+        extra_vars.update({
+            "local_shared_path": local_shared_path or "",
+            "local_ozone_dirname": local_oz_dir or "",
+        })
+
+    ask_pass = (auth_method == "password" and not password)  # whether to forward -k; we embed password if provided
+
+    with tempfile.TemporaryDirectory() as tdir:
+        inv_path = Path(tdir) / "hosts.ini"
+        ev_path = Path(tdir) / "vars.json"
+        inv_path.write_text(inventory_text or "", encoding="utf-8")
+        ev_path.write_text(json.dumps(extra_vars, indent=2), encoding="utf-8")
+        # Persist last run configs (and use them for execution)
+        try:
+            os.makedirs(LOGS_DIR, exist_ok=True)
+            # Save inventory/vars for direct reuse
+            persisted_inv = LOGS_DIR / "last_inventory.ini"
+            persisted_ev = LOGS_DIR / "last_vars.json"
+            persisted_inv.write_text(inventory_text or "", encoding="utf-8")
+            persisted_ev.write_text(json.dumps(extra_vars, indent=2), encoding="utf-8")
+            # Point playbook execution to persisted files (consistent first run and reruns)
+            inv_path = persisted_inv
+            ev_path = persisted_ev
+            # Save effective simple config for future resume
+            LAST_RUN_FILE.write_text(json.dumps({
+                "hosts_raw": hosts_raw,
+                "cluster_mode": cluster_mode,
+                "ozone_version": ozone_version,
+                "jdk_major": jdk_major,
+                "install_base": install_base,
+                "data_base": data_base,
+                "auth_method": auth_method,
+                "ssh_user": ssh_user,
+                "password": password if auth_method == "password" else None,
+                "keyfile": str(keyfile) if keyfile else None,
+                "service_user": service_user,
+                "service_group": service_group,
+                "dl_url": dl_url,
+                "start_after_install": bool(start_after_install),
+                "use_sudo": bool(use_sudo),
+                "local_shared_path": local_shared_path or "",
+                "local_ozone_dirname": local_oz_dir or "",
+            }, indent=2), encoding="utf-8")
+        except Exception:
+            # Fall back to temp files if persisting fails
+            pass
+        # Roles order removed (no resume via tags)
+
+        # Install + (optional) start (single merged playbook)
+        playbook = PLAYBOOKS_DIR / "cluster.yml"
+        start_at = None
+        use_tags = None
+        if args.resume:
+            if LAST_FAILED_FILE.exists():
+                try:
+                    # use first line (task name)
+                    contents = LAST_FAILED_FILE.read_text(encoding="utf-8").splitlines()
+                    start_at = contents[0].strip() if contents else None
+                    # derive role tag if present
+                    role_line = next((l for l in contents if l.startswith("# role:")), None)
+                    if role_line:
+                        role_name = role_line.split(":", 1)[1].strip()
+                        if role_name:
+                            use_tags = [role_name]
+                except Exception:
+                    start_at = None
+        rc = run_playbook(playbook, inv_path, ev_path, ask_pass=ask_pass, become=True, start_at_task=start_at, tags=use_tags)
+        if rc != 0:
+            return rc
+
+        # Successful completion: remove last_* persisted files so a fresh run starts clean
+        try:
+            for f in LOGS_DIR.glob("last_*"):
+                try:
+                    f.unlink()
+                except FileNotFoundError:
+                    pass
+                except Exception:
+                    # Best-effort cleanup; ignore failures
+                    pass
+        except Exception:
+            pass
+
+        try:
+            example_host = hosts[0]["host"] if hosts else "HOSTNAME"
+            logger.info(f"To view process logs: ssh to the node and read {install_base}/current/logs/ozone-{service_user}-<process>-<host>.log "
+                        f"(e.g., {install_base}/current/logs/ozone-{service_user}-recon-{example_host}.log)")
+        except Exception:
+            pass
+    logger.info("All done.")
+    return 0
+
+if __name__ == "__main__":
+    sys.exit(main(sys.argv[1:]))
+
+
diff --git a/playbooks/cluster.yml b/playbooks/cluster.yml
new file mode 100644
index 0000000..fc46321
--- /dev/null
+++ b/playbooks/cluster.yml
@@ -0,0 +1,59 @@
+---
+- name: "Ozone Cluster Deployment"
+  hosts: all
+  gather_facts: false
+  vars:
+    # Expect cluster_mode to be passed in (non-ha | ha). Fallback to non-ha.
+    cluster_mode: "{{ cluster_mode | default('non-ha') }}"
+    ha_enabled: "{{ cluster_mode == 'ha' }}"
+  pre_tasks:
+    - name: "Pre-install: Ensure python3 present"
+      raw: |
+        if command -v apt-get >/dev/null 2>&1; then sudo -n apt-get update -y && sudo -n apt-get install -y python3 || true;
+        elif command -v dnf >/dev/null 2>&1; then sudo -n dnf install -y python3 || true;
+        elif command -v yum >/dev/null 2>&1; then sudo -n yum install -y python3 || true;
+        elif command -v zypper >/dev/null 2>&1; then sudo -n zypper --non-interactive in -y python3 || true;
+        fi
+      args:
+        executable: /bin/bash
+      changed_when: false
+      failed_when: false
+
+    - name: "Pre-install: Gather facts"
+      setup:
+
+    - name: "Pre-install: Ensure Ansible remote tmp exists"
+      file:
+        path: "{{ (ansible_env.TMPDIR | default('/tmp')) ~ '/.ansible-' ~ ansible_user_id }}"
+        state: directory
+        mode: "0700"
+        owner: "{{ ansible_user_id }}"
+
+  roles:
+    - role: cleanup
+      tags: ["cleanup"]
+      when: (do_cleanup | default(false))
+    - role: ozone_user
+      tags: ["ozone_user"]
+    - role: ssh_bootstrap
+      tags: ["ssh_bootstrap"]
+    - role: java
+      tags: ["java"]
+    - role: ozone_layout
+      tags: ["ozone_layout"]
+    - role: ozone_fetch
+      tags: ["ozone_fetch"]
+    - role: ozone_config
+      tags: ["ozone_config"]
+    - role: ozone_service
+      tags: ["ozone_service"]
+      when: start_after_install | bool
+
+- name: "Ozone Smoke Test"
+  hosts: "{{ groups['om'] | list | first }}"
+  gather_facts: false
+  roles:
+    - role: ozone_ui
+      tags: ["ozone_ui"]
+    - role: ozone_smoke
+      tags: ["ozone_smoke"]
diff --git a/requirements.txt b/requirements.txt
new file mode 100644
index 0000000..541effe
--- /dev/null
+++ b/requirements.txt
@@ -0,0 +1,2 @@
+ansible-core==2.17.*
+click==8.*
diff --git a/roles/cleanup/tasks/main.yml b/roles/cleanup/tasks/main.yml
new file mode 100644
index 0000000..7187018
--- /dev/null
+++ b/roles/cleanup/tasks/main.yml
@@ -0,0 +1,52 @@
+---
+- name: "Check install_base presence"
+  stat:
+    path: "{{ install_base }}"
+  register: _st_install_base
+  become: true
+
+- name: "Set presence flag"
+  set_fact:
+    install_present: "{{ _st_install_base.stat.exists | default(false) }}"
+
+- name: "Skip cleanup when install_base is absent on this host"
+  debug:
+    msg: "install_base '{{ install_base }}' not present; skipping cleanup on this host"
+  when: not install_present
+  changed_when: false
+
+- name: "Perform cleanup when install_base exists"
+  when: install_present
+  block:
+    - name: "Set ozone bin path"
+      set_fact:
+        ozone_bin: "{{ install_base }}/current/bin/ozone"
+
+    - name: "Kill OMs/SCMs/Datanodes/Recon (if running)"
+      shell: |
+        pkill -KILL -f "{{ item }}"
+      become: true
+      failed_when: false
+      changed_when: false
+      loop:
+        - "org.apache.hadoop.ozone.om.OzoneManagerStarter"
+        - "org.apache.hadoop.hdds.scm.server.StorageContainerManagerStarter"
+        - "org.apache.hadoop.ozone.HddsDatanodeService"
+        - "org.apache.hadoop.ozone.recon.ReconServer"
+        - "org.apache.hadoop.ozone.s3.Gateway"
+      loop_control:
+        label: "{{ item }}"
+
+    - name: "Remove install base"
+      file:
+        path: "{{ install_base }}"
+        state: absent
+      become: true
+
+    - name: "Remove data base"
+      file:
+        path: "{{ data_base }}"
+        state: absent
+      become: true
+
+
diff --git a/roles/java/defaults/main.yml b/roles/java/defaults/main.yml
new file mode 100644
index 0000000..ad5afac
--- /dev/null
+++ b/roles/java/defaults/main.yml
@@ -0,0 +1,12 @@
+---
+jdk_major: 17
+
+# Candidate JAVA_HOME directories to probe; first existing will be used
+java_home_candidates:
+  - "/usr/lib/jvm/java-{{ jdk_major }}-openjdk"
+  - "/usr/lib/jvm/jre-{{ jdk_major }}-openjdk"
+  - "/usr/lib/jvm/jdk-{{ jdk_major }}"
+  - "/usr/lib/jvm/java-{{ jdk_major }}-openjdk-amd64"
+  - "/usr/lib64/jvm/java-{{ jdk_major }}-openjdk"
+
+
diff --git a/roles/java/tasks/main.yml b/roles/java/tasks/main.yml
new file mode 100644
index 0000000..bbe1d62
--- /dev/null
+++ b/roles/java/tasks/main.yml
@@ -0,0 +1,77 @@
+---
+- name: "Print OS family"
+  debug:
+    var: ansible_os_family
+
+- name: "Install OpenJDK on RedHat/Rocky/Suse family"
+  package:
+    name:
+      - "java-{{ jdk_major }}-openjdk"
+      - "java-{{ jdk_major }}-openjdk-devel"
+    state: present
+  when: ansible_os_family == "RedHat" or ansible_os_family == "Rocky" or ansible_os_family == "Suse"
+  become: true
+
+- name: "Install OpenJDK on Debian/Ubuntu family"
+  package:
+    name:
+      - "openjdk-{{ jdk_major }}-jdk"
+  when: ansible_os_family == "Debian" or ansible_os_family == "Ubuntu"
+  become: true
+
+- name: "Detect JAVA_HOME candidate"
+  stat:
+    path: "{{ item }}"
+  loop: "{{ java_home_candidates }}"
+  register: java_candidates
+  become: false
+
+- name: "Set ozone_java_home from first existing candidate"
+  set_fact:
+    ozone_java_home: "{{ (java_candidates.results | selectattr('stat.exists', 'defined') | selectattr('stat.exists') | map(attribute='item') | list | first) | default('') }}"
+
+- name: "Compute runtime environment for Ozone commands"
+  set_fact:
+    ozone_runtime_env:
+      JAVA_HOME: "{{ ozone_java_home }}"
+      PATH: "{{ (ansible_env.PATH | default('/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin')) }}:{{ install_base }}/current/bin{{ (':' + ozone_java_home + '/bin') if (ozone_java_home | length > 0) else '' }}"
+      OZONE_CONF_DIR: "{{ install_base }}/current/etc/hadoop"
+      HADOOP_CONF_DIR: "{{ install_base }}/current/etc/hadoop"
+
+- name: "Persist ozone_runtime_env for resume (controller)"
+  delegate_to: localhost
+  run_once: true
+  become: false
+  vars:
+    last_vars_path: "{{ playbook_dir }}/../logs/last_vars.json"
+  block:
+    - name: "last_vars.json | Read"
+      slurp:
+        src: "{{ last_vars_path }}"
+      register: last_vars_slurp
+
+    - name: "last_vars.json | Merge ozone_runtime_env"
+      vars:
+        last_vars_json: "{{ (last_vars_slurp.content | b64decode | from_json) if (last_vars_slurp is defined and last_vars_slurp.content is defined) else {} }}"
+        merged_all: "{{ last_vars_json | combine({'ozone_runtime_env': ozone_runtime_env}, recursive=True) }}"
+      copy:
+        dest: "{{ last_vars_path }}"
+        content: "{{ merged_all | to_nice_json }}"
+        mode: "0644"
+
+- name: "Export JAVA_HOME and update PATH in profile.d/ozone.sh"
+  blockinfile:
+    path: "/etc/profile.d/ozone.sh"
+    create: true
+    owner: root
+    group: root
+    mode: "0644"
+    marker: "# {mark} {{ JAVA_MARKER }}"
+    block: |
+      {% if ozone_java_home | length > 0 %}
+      export JAVA_HOME="{{ ozone_java_home }}"
+      export PATH="$PATH:{{ ozone_java_home }}/bin"
+      {% endif %}
+  become: true
+
+
diff --git a/roles/ozone_config/defaults/main.yml b/roles/ozone_config/defaults/main.yml
new file mode 100644
index 0000000..a4757ca
--- /dev/null
+++ b/roles/ozone_config/defaults/main.yml
@@ -0,0 +1,6 @@
+---
+install_base: "/opt/ozone"
+data_base: "/data/ozone"
+CONFIG_DIR: ""   # if provided, can be used to feed additional properties via vars
+
+
diff --git a/roles/ozone_config/tasks/main.yml b/roles/ozone_config/tasks/main.yml
new file mode 100644
index 0000000..c4cd024
--- /dev/null
+++ b/roles/ozone_config/tasks/main.yml
@@ -0,0 +1,57 @@
+---
+- name: "Create etc dir"
+  file:
+    path: "{{ install_base }}/current/etc/hadoop"
+    state: directory
+    owner: "{{ service_user }}"
+    group: "{{ service_group }}"
+    mode: "0755"
+  become: true
+
+- name: "Render ozone-hosts.yaml"
+  template:
+    src: "ozone-hosts.yaml.j2"
+    dest: "{{ install_base }}/current/etc/hadoop/ozone-hosts.yaml"
+    owner: "{{ service_user }}"
+    group: "{{ service_group }}"
+    mode: "0644"
+  become: true
+
+- name: "Render ozone-site.xml"
+  template:
+    src: "ozone-site.xml.j2"
+    dest: "{{ install_base }}/current/etc/hadoop/ozone-site.xml"
+    owner: "{{ service_user }}"
+    group: "{{ service_group }}"
+    mode: "0644"
+  become: true
+
+- name: "Render core-site.xml"
+  template:
+    src: "core-site.xml.j2"
+    dest: "{{ install_base }}/current/etc/hadoop/core-site.xml"
+    owner: "{{ service_user }}"
+    group: "{{ service_group }}"
+    mode: "0644"
+  become: true
+
+- name: "Render ozone-env.sh"
+  template:
+    src: "ozone-env.sh.j2"
+    dest: "{{ install_base }}/current/etc/hadoop/ozone-env.sh"
+    owner: "{{ service_user }}"
+    group: "{{ service_group }}"
+    mode: "0755"
+  become: true
+
+- name: "Render workers file for datanodes"
+  template:
+    src: "workers.j2"
+    dest: "{{ install_base }}/current/etc/hadoop/workers"
+    owner: "{{ service_user }}"
+    group: "{{ service_group }}"
+    mode: "0644"
+  when: groups.get('datanodes', []) | length > 0
+  become: true
+
+
diff --git a/roles/ozone_config/templates/core-site.xml.j2 b/roles/ozone_config/templates/core-site.xml.j2
new file mode 100644
index 0000000..0f116a4
--- /dev/null
+++ b/roles/ozone_config/templates/core-site.xml.j2
@@ -0,0 +1,15 @@
+<configuration>
+{% set om_hosts = (groups.get('om', []) | list) %}
+{% if (ha_enabled | default(false)) and (om_hosts|length > 1) %}
+  <property>
+    <name>fs.defaultFS</name>
+    <value>ofs://omservice</value>
+  </property>
+{% else %}
+  <property>
+    <name>fs.defaultFS</name>
+    <value>ofs://{{ om_hosts[0] }}:9862</value>
+  </property>
+{% endif %}
+</configuration>
+
diff --git a/roles/ozone_config/templates/ozone-env.sh.j2 b/roles/ozone_config/templates/ozone-env.sh.j2
new file mode 100644
index 0000000..94b2f69
--- /dev/null
+++ b/roles/ozone_config/templates/ozone-env.sh.j2
@@ -0,0 +1,53 @@
+#!/usr/bin/env bash
+# Managed by Ansible
+
+export OZONE_OS_TYPE=${OZONE_OS_TYPE:-$(uname -s)}
+
+{% if ozone_java_home | default('') | length > 0 %}
+export JAVA_HOME="{{ ozone_java_home }}"
+export PATH="$PATH:{{ ozone_java_home }}/bin"
+{% endif %}
+export OZONE_HOME="{{ install_base }}/current"
+export PATH="$PATH:{{ install_base }}/current/bin"
+export OZONE_CONF_DIR="{{ install_base }}/current/etc/hadoop"
+export HADOOP_CONF_DIR="{{ install_base }}/current/etc/hadoop"
+
+# Relaxed module access for Java 17/21 (needed by Ozone and dependencies)
+export JAVA_TOOL_OPTIONS="${JAVA_TOOL_OPTIONS:+$JAVA_TOOL_OPTIONS} --add-opens=java.base/jdk.internal.misc=ALL-UNNAMED --add-opens=java.base/java.lang.reflect=ALL-UNNAMED --add-opens=java.base/java.nio=ALL-UNNAMED --add-opens=java.base/java.lang=ALL-UNNAMED --add-opens=java.base/java.util=ALL-UNNAMED"
+
+{% if ozone_opts | default('-XX:ParallelGCThreads=8') | length > 0 %}
+# Extra JVM options for all Ozone components
+export OZONE_OPTS="{{ ozone_opts | default('-XX:ParallelGCThreads=8') }}"
+{% endif %}
+
+export OZONE_OM_USER="{{ service_user }}"
+
+# export OZONE_HEAPSIZE_MAX=
+# export OZONE_HEAPSIZE_MIN=
+# export OZONE_OPTS="-Djava.net.preferIPv4Stack=true -Dsun.security.krb5.debug=true -Dsun.security.spnego.debug"
+# export OZONE_CLIENT_OPTS=""
+# export OZONE_CLASSPATH="/some/cool/path/on/your/machine"
+# export OZONE_USER_CLASSPATH_FIRST="yes"
+# export OZONE_USE_CLIENT_CLASSLOADER=true
+# export OZONE_SSH_OPTS="-o BatchMode=yes -o StrictHostKeyChecking=no -o ConnectTimeout=10s"
+# export OZONE_SSH_PARALLEL=10
+# export OZONE_WORKERS="${OZONE_CONF_DIR}/workers"
+# export OZONE_LOG_DIR=${OZONE_HOME}/logs
+# export OZONE_IDENT_STRING=$USER
+# export OZONE_STOP_TIMEOUT=5
+# export OZONE_PID_DIR=/tmp
+# export OZONE_ROOT_LOGGER=INFO,console
+# export OZONE_DAEMON_ROOT_LOGGER=INFO,RFA
+# export OZONE_SECURITY_LOGGER=INFO,NullAppender
+# export OZONE_NICENESS=0
+# export OZONE_POLICYFILE="hadoop-policy.xml"
+# export OZONE_GC_SETTINGS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps"
+# export JSVC_HOME=/usr/bin
+# export OZONE_SECURE_PID_DIR=${OZONE_PID_DIR}
+# export OZONE_SECURE_LOG=${OZONE_LOG_DIR}
+# export OZONE_SECURE_IDENT_PRESERVE="true"
+# export OZONE_OM_OPTS=""
+# export OZONE_DATANODE_OPTS=""
+# export OZONE_SCM_OPTS=""
+# export OZONE_ENABLE_BUILD_PATHS="true"
+
diff --git a/roles/ozone_config/templates/ozone-hosts.yaml.j2 b/roles/ozone_config/templates/ozone-hosts.yaml.j2
new file mode 100644
index 0000000..84b221e
--- /dev/null
+++ b/roles/ozone_config/templates/ozone-hosts.yaml.j2
@@ -0,0 +1,30 @@
+om:
+{% if (ha_enabled | default(false)) %}
+{%   for h in (groups.get('om', []) | default([])) %}
+  - {{ h | regex_replace('^.*@','') | regex_replace(':.*$','') }}
+{%   endfor %}
+{% else %}
+{%   if (groups.get('om', []) | default([])) | length > 0 %}
+  - {{ (groups.get('om', [])[0]) | regex_replace('^.*@','') | regex_replace(':.*$','') }}
+{%   endif %}
+{% endif %}
+scm:
+{% if (ha_enabled | default(false)) %}
+{%   for h in (groups.get('scm', []) | default([])) %}
+  - {{ h | regex_replace('^.*@','') | regex_replace(':.*$','') }}
+{%   endfor %}
+{% else %}
+{%   if (groups.get('scm', []) | default([])) | length > 0 %}
+  - {{ (groups.get('scm', [])[0]) | regex_replace('^.*@','') | regex_replace(':.*$','') }}
+{%   endif %}
+{% endif %}
+datanodes:
+{% for h in (groups.get('datanodes', []) | default([])) %}
+  - {{ h | regex_replace('^.*@','') | regex_replace(':.*$','') }}
+{% endfor %}
+recon:
+{% if (groups.get('recon', []) | default([])) | length > 0 %}
+  - {{ (groups.get('recon', [])[0]) | regex_replace('^.*@','') | regex_replace(':.*$','') }}
+{% endif %}
+
+
diff --git a/roles/ozone_config/templates/ozone-site.xml.j2 b/roles/ozone_config/templates/ozone-site.xml.j2
new file mode 100644
index 0000000..9001a5b
--- /dev/null
+++ b/roles/ozone_config/templates/ozone-site.xml.j2
@@ -0,0 +1,128 @@
+<configuration>
+  <!-- Minimal Ozone site config; extend via group_vars if needed -->
+{% set _om_all = groups.get('om', [])| list %}
+{% set _scm_all = groups.get('scm', []) | list %}
+{% set _all_dn_count = groups.get('datanodes', []) | list | length %}
+{% set recon_hosts = groups.get('recon', []) | list %}
+{% set s3g_hosts = groups.get('s3g', []) | list %}
+{% set om_hosts = (_om_all[:1] if not (ha_enabled | default(false)) else _om_all) %}
+{% set scm_hosts = (_scm_all[:1] if not (ha_enabled | default(false)) else _scm_all) %}
+
+{% if scm_hosts|length > 0 %}
+  <property>
+    <name>ozone.scm.names</name>
+    <value>{{ scm_hosts | join(',') }}</value>
+  </property>
+  <property>
+    <name>ozone.scm.client.address</name>
+    <value>{{ scm_hosts | join(':9860,') }}:9860</value>
+  </property>
+  <property>
+    <name>ozone.scm.datanode.address</name>
+    <value>{{ scm_hosts | join(':9861,') }}:9861</value>
+  </property>
+{% endif %}
+{% if scm_hosts|length > 1 %}
+  <property>
+    <name>ozone.scm.primordial.node.id</name>
+    <value>{{ scm_hosts[0] }}</value>
+  </property>
+  <property>
+    <name>ozone.scm.service.ids</name>
+    <value>scmservice</value>
+  </property>
+  <property>
+    <name>ozone.scm.nodes.scmservice</name>
+    <value>{% for i in range(scm_hosts|length) %}{{ 'scm' ~ (i+1) }}{% if not loop.last %},{% endif %}{% endfor %}</value>
+  </property>
+{% for h in scm_hosts %}
+  <property>
+    <name>ozone.scm.address.scmservice.scm{{ loop.index }}</name>
+    <value>{{ h }}</value>
+  </property>
+{% endfor %}
+{% endif %}
+{% if om_hosts|length == 1 %}
+  <property>
+    <name>ozone.om.address</name>
+    <value>{{ om_hosts[0] }}:9862</value>
+  </property>
+{% elif om_hosts|length > 1 %}
+  <property>
+    <name>ozone.om.service.ids</name>
+    <value>omservice</value>
+  </property>
+  <property>
+    <name>ozone.om.nodes.omservice</name>
+    <value>{% for i in range(om_hosts|length) %}{{ 'om' ~ (i+1) }}{% if not loop.last %},{% endif %}{% endfor %}</value>
+  </property>
+{% for h in om_hosts %}
+  <property>
+    <name>ozone.om.address.omservice.om{{ loop.index }}</name>
+    <value>{{ h }}:9862</value>
+  </property>
+{% endfor %}
+{% endif %}
+{% if recon_hosts|length > 0 %}
+  <property>
+    <name>ozone.recon.http-address</name>
+    <value>{{ recon_hosts[0] }}:9888</value>
+  </property>
+  <property>
+    <name>ozone.recon.address</name>
+    <value>{{ recon_hosts[0] }}:9891</value>
+  </property>
+{% endif %}
+{% if s3g_hosts|length > 0 %}
+  <property>
+    <name>ozone.s3g.http-address</name>
+    <value>{{ s3g_hosts[0] }}:9878</value>
+  </property>
+  <property>
+    <name>ozone.s3g.webadmin.http-address</name>
+    <value>{{ s3g_hosts[0] }}:19878</value>
+  </property>
+{% endif %}
+  <property>
+    <name>ozone.metadata.dirs</name>
+    <value>{{ data_base }}/meta</value>
+  </property>
+  <property>
+    <name>hdds.datanode.dir</name>
+    <value>{{ data_base }}/dn</value>
+  </property>
+  <property>
+    <name>dfs.container.ratis.datanode.storage.dir</name>
+    <value>{{ data_base }}/meta/dn</value>
+  </property>
+  <property>
+    <name>ozone.om.db.dirs</name>
+    <value>{{ data_base }}/data/om</value>
+  </property>
+  <property>
+    <name>ozone.om.ratis.snapshot.dir</name>
+    <value>{{ data_base }}/meta/om</value>
+  </property>
+  <property>
+    <name>ozone.scm.db.dirs</name>
+    <value>{{ data_base }}/data/scm</value>
+  </property>
+  <property>
+    <name>ozone.scm.datanode.id.dir</name>
+    <value>{{ data_base }}/meta/scm</value>
+  </property>
+  <property>
+    <name>ozone.scm.skip.bootstrap.validation</name>
+    <value>true</value>
+  </property>
+  <property>
+    <name>ozone.replication</name>
+{% if _all_dn_count < 3 %}
+    <value>ONE</value>
+{% else %}
+    <value>THREE</value>
+{% endif %}
+  </property>
+</configuration>
+
+
diff --git a/roles/ozone_config/templates/workers.j2 b/roles/ozone_config/templates/workers.j2
new file mode 100644
index 0000000..482ffd0
--- /dev/null
+++ b/roles/ozone_config/templates/workers.j2
@@ -0,0 +1,3 @@
+{% for h in (groups.get('datanodes', []) | list) %}
+{{ h }}
+{% endfor %}
diff --git a/roles/ozone_fetch/defaults/main.yml b/roles/ozone_fetch/defaults/main.yml
new file mode 100644
index 0000000..8c96ade
--- /dev/null
+++ b/roles/ozone_fetch/defaults/main.yml
@@ -0,0 +1,9 @@
+---
+ozone_version: "2.0.0"     # "local" also supported
+dl_url: "https://dlcdn.apache.org/ozone"
+
+# Local snapshot settings (controller side)
+local_shared_path: ""
+local_ozone_dirname: ""
+
+
diff --git a/roles/ozone_fetch/tasks/main.yml b/roles/ozone_fetch/tasks/main.yml
new file mode 100644
index 0000000..67d4a77
--- /dev/null
+++ b/roles/ozone_fetch/tasks/main.yml
@@ -0,0 +1,113 @@
+---
+- name: "Ensure install_base exists"
+  file:
+    path: "{{ install_base }}"
+    state: directory
+    owner: "{{ service_user }}"
+    group: "{{ service_group }}"
+    mode: "0755"
+  become: true
+
+- name: "Normalize source mode"
+  set_fact:
+    _src_mode: "{{ ozone_version | lower }}"
+
+- name: "Upstream | Download tarball"
+  get_url:
+    url: "{{ dl_url | trim('/') }}/{{ ozone_version }}/ozone-{{ ozone_version }}.tar.gz"
+    dest: "{{ install_base }}/ozone-{{ ozone_version }}.tar.gz"
+    mode: "0644"
+    owner: "{{ service_user }}"
+    group: "{{ service_group }}"
+    timeout: 60
+  register: download_result
+  retries: 5
+  delay: 10
+  until: download_result is succeeded
+  when: _src_mode != 'local'
+  become: true
+
+- name: "Upstream | Ensure tarball ownership"
+  file:
+    path: "{{ install_base }}/ozone-{{ ozone_version }}.tar.gz"
+    owner: "{{ service_user }}"
+    group: "{{ service_group }}"
+    state: file
+  when: _src_mode != 'local'
+  become: true
+
+- name: "Upstream | Unarchive to install_base"
+  unarchive:
+    src: "{{ install_base }}/ozone-{{ ozone_version }}.tar.gz"
+    dest: "{{ install_base }}"
+    remote_src: true
+    owner: "{{ service_user }}"
+    group: "{{ service_group }}"
+  when: _src_mode != 'local'
+  become: true
+
+- name: "Upstream | Link current"
+  file:
+    src: "{{ install_base }}/ozone-{{ ozone_version }}"
+    dest: "{{ install_base }}/current"
+    state: link
+    owner: "{{ service_user }}"
+    group: "{{ service_group }}"
+  when: _src_mode != 'local'
+  become: true
+
+- name: "Upstream | Remove downloaded tarball after extraction"
+  file:
+    path: "{{ install_base }}/ozone-{{ ozone_version }}.tar.gz"
+    state: absent
+  when: _src_mode != 'local'
+  become: true
+
+- name: "Local | Create tarball on controller"
+  delegate_to: localhost
+  run_once: true
+  become: false
+  vars:
+    ansible_become: false
+  command:
+    argv:
+      - tar
+      - -czf
+      - "/tmp/{{ local_ozone_dirname }}.tar.gz"
+      - "{{ local_ozone_dirname }}"
+  args:
+    chdir: "{{ local_shared_path }}"
+    creates: "/tmp/{{ local_ozone_dirname }}.tar.gz"
+  when: _src_mode == 'local'
+
+- name: "Local | Unarchive local tarball to install_base"
+  unarchive:
+    src: "/tmp/{{ local_ozone_dirname }}.tar.gz"    # on controller
+    dest: "{{ install_base }}"                      # on remote
+    remote_src: false                               # transfer then extract
+    owner: "{{ service_user }}"
+    group: "{{ service_group }}"
+    keep_newer: true
+  when: _src_mode == 'local'
+  become: true
+
+- name: "Local | Link current"
+  file:
+    src: "{{ install_base }}/{{ local_ozone_dirname }}"
+    dest: "{{ install_base }}/current"
+    state: link
+    owner: "{{ service_user }}"
+    group: "{{ service_group }}"
+  when: _src_mode == 'local'
+  become: true
+
+- name: "Local | Remove controller tarball after extraction"
+  delegate_to: localhost
+  run_once: true
+  become: false
+  vars:
+    ansible_become: false
+  file:
+    path: "/tmp/{{ local_ozone_dirname }}.tar.gz"
+    state: absent
+  when: _src_mode == 'local'
diff --git a/roles/ozone_layout/defaults/main.yml b/roles/ozone_layout/defaults/main.yml
new file mode 100644
index 0000000..c8c5df7
--- /dev/null
+++ b/roles/ozone_layout/defaults/main.yml
@@ -0,0 +1,3 @@
+---
+install_base: "/opt/ozone"
+data_base: "/data/ozone"
diff --git a/roles/ozone_layout/tasks/main.yml b/roles/ozone_layout/tasks/main.yml
new file mode 100644
index 0000000..0e13135
--- /dev/null
+++ b/roles/ozone_layout/tasks/main.yml
@@ -0,0 +1,29 @@
+---
+- name: "Create install and data directories"
+  file:
+    path: "{{ item }}"
+    state: directory
+    owner: "{{ service_user }}"
+    group: "{{ service_group }}"
+    mode: "0755"
+  loop:
+    - "{{ install_base }}"
+    - "{{ data_base }}"
+    - "{{ data_base }}/dn"
+    - "{{ data_base }}/meta"
+  become: true
+
+- name: "Ensure OZONE_HOME and PATH are in profile.d/ozone.sh"
+  blockinfile:
+    path: "/etc/profile.d/ozone.sh"
+    create: true
+    owner: root
+    group: root
+    mode: "0644"
+    marker: "# {mark} {{ ENV_MARKER }}"
+    block: |
+      export OZONE_HOME="{{ install_base }}/current"
+      export PATH="$PATH:{{ install_base }}/current/bin"
+  become: true
+
+
diff --git a/roles/ozone_service/tasks/main.yml b/roles/ozone_service/tasks/main.yml
new file mode 100644
index 0000000..3edf1e9
--- /dev/null
+++ b/roles/ozone_service/tasks/main.yml
@@ -0,0 +1,100 @@
+---
+
+# Common service command context for HA and Non-HA
+- name: "Ozone Service: Start SCM/OM"
+  become: true
+  become_user: "{{ service_user }}"
+  become_flags: "-i"
+  environment: "{{ ozone_runtime_env }}"
+  block:
+    - name: "Initialize/Start first SCM/OM"
+      block:
+        - name: "Initialize first SCM"
+          command: "ozone scm --init"
+          args:
+            creates: "{{ data_base }}/meta/scm"
+          when: (groups['scm'] | length > 0) and (inventory_hostname == groups['scm'][0])
+          register: scm_init_first
+          failed_when: scm_init_first.rc != 0
+
+        - name: "Start first SCM"
+          command: "ozone --daemon start scm"
+          when: (groups['scm'] | length > 0) and (inventory_hostname == groups['scm'][0])
+          register: scm_start_first
+          failed_when: scm_start_first.rc != 0
+
+        - name: "Initialize first OM"
+          command: "ozone om --init"
+          args:
+            creates: "{{ data_base }}/meta/om"
+          when: (groups['om'] | length > 0) and (inventory_hostname == groups['om'][0])
+          register: om_init_first
+          failed_when: om_init_first.rc != 0
+
+        - name: "Start first OM"
+          command: "ozone --daemon start om"
+          when: (groups['om'] | length > 0) and (inventory_hostname == groups['om'][0])
+          register: om_start_first
+          failed_when: om_start_first.rc != 0
+
+    - name: "Start/Init remaining SCM/OM (HA only)"
+      when: (ha_enabled | default(false))
+      block:
+        - name: "SCM bootstrap on remaining SCMs"
+          command: "ozone scm --bootstrap"
+          when: "'scm' in groups and (groups['scm'] | length > 1) and (inventory_hostname in groups['scm'][1:])"
+          register: scm_bootstrap_rest
+          failed_when: scm_bootstrap_rest.rc != 0
+
+        - name: "Start SCM on remaining SCMs"
+          command: "ozone --daemon start scm"
+          when: "'scm' in groups and (groups['scm'] | length > 1) and (inventory_hostname in groups['scm'][1:])"
+          register: scm_start_rest
+          failed_when: scm_start_rest.rc != 0
+
+        - name: "OM init on remaining OMs"
+          command: "ozone om --init"
+          when: "'om' in groups and (groups['om'] | length > 1) and (inventory_hostname in groups['om'][1:])"
+          register: om_init_rest
+          failed_when: om_init_rest.rc != 0
+
+        - name: "Start OM on remaining OMs"
+          command: "ozone --daemon start om"
+          when: "'om' in groups and (groups['om'] | length > 1) and (inventory_hostname in groups['om'][1:])"
+          register: om_start_rest
+          failed_when: om_start_rest.rc != 0
+
+- name: "Ozone Service: Start Datanodes and Recon"
+  become: true
+  become_user: "{{ service_user }}"
+  become_flags: "-i"
+  environment: "{{ ozone_runtime_env }}"
+  block:
+    - name: "Start Datanodes"
+      command: "ozone --daemon start datanode"
+      when: inventory_hostname in (groups.get('datanodes', []))
+      async: 300
+      poll: 0
+      register: dn_job
+
+    - name: "Wait for Datanode start to complete"
+      when: inventory_hostname in (groups.get('datanodes', []))
+      async_status:
+        jid: "{{ dn_job.ansible_job_id }}"
+      register: dn_wait
+      until: dn_wait.finished
+      failed_when: (dn_wait.rc | default(0)) != 0
+
+    - name: "Start Recon on first recon host"
+      command: "ozone --daemon start recon"
+      when: (groups.get('recon', []) | length > 0) and (inventory_hostname == groups['recon'][0])
+      register: recon_start
+      failed_when: recon_start.rc != 0
+
+    - name: "Start S3G on first s3g host"
+      command: "ozone --daemon start s3g"
+      when: (groups.get('s3g', []) | length > 0) and (inventory_hostname == groups['s3g'][0])
+      register: s3g_start
+      failed_when: s3g_start.rc != 0
+
+
diff --git a/roles/ozone_smoke/tasks/main.yml b/roles/ozone_smoke/tasks/main.yml
new file mode 100644
index 0000000..d98a9e7
--- /dev/null
+++ b/roles/ozone_smoke/tasks/main.yml
@@ -0,0 +1,91 @@
+---
+- name: "Set replication factor"
+  set_fact:
+    create_key_cmd: "{{ 'sh key put --type RATIS --replication ONE' if groups.get('datanodes', []) | length < 3 else 'sh key put' }}"
+    vol: "demovol"
+    bucket: "demobuck"
+    s3g_bucket: "demos3g"
+    key: "demokey"
+    ozone_bin: "{{ install_base }}/current/bin/ozone"
+
+- name: "Print ozone command to create key based on Datanode count"
+  debug:
+    msg: "{{ create_key_cmd }}"
+
+- name: "Run basic smoke commands"
+  shell: |
+    set -euo pipefail
+    dd if=/dev/zero of=/tmp/oz_smoke.bin bs=1M count=1 status=none
+    {{ ozone_bin }} sh vol create {{ vol }} || true
+    {{ ozone_bin }} sh bucket create {{ vol }}/{{ bucket }} || true
+    {{ ozone_bin }} {{ create_key_cmd }} {{ vol }}/{{ bucket }}/{{ key }} /tmp/oz_smoke.bin
+    rm -f /tmp/oz_smoke.bin
+  args:
+    executable: /bin/bash
+  register: smoke_commands_result
+  failed_when: smoke_commands_result.rc != 0
+  run_once: true
+  become: true
+  become_user: "{{ service_user }}"
+
+- name: "Verify key info"
+  shell: |
+    set -euo pipefail
+    {{ ozone_bin }} sh key info {{ vol }}/{{ bucket }}/{{ key }}
+  args:
+    executable: /bin/bash
+  register: key_info
+  failed_when: key_info.rc != 0
+  run_once: true
+  become: true
+  become_user: "{{ service_user }}"
+
+- name: "Show key info"
+  debug:
+    msg:
+      - "Stdout: {{ (key_info.stdout_lines | default([])) | join('\n') }}"
+      - "Stderr: {{ (key_info.stderr_lines | default([])) | join('\n') }}"
+  run_once: true
+
+- name: "Create test bucket on S3G host (if present)"
+  block:
+    - name: "Install awscli on S3G host"
+      package:
+        name: awscli
+        state: present
+      become: true
+
+    - name: "AWS CLI configure dummy credentials for S3G tests"
+      shell: |
+        set -euo pipefail
+        aws configure set aws_access_key_id dummy
+        aws configure set aws_secret_access_key dummy
+      args:
+        executable: /bin/bash
+
+    - name: "AWS CLI S3G: create test bucket '{{ s3g_bucket }}'"
+      shell: |
+        set -o pipefail
+        aws s3api create-bucket --bucket {{ s3g_bucket }} --endpoint-url "http://{{ groups['s3g'][0] }}:9878" || true
+      args:
+        executable: /bin/bash
+      register: aws_create_result
+      changed_when: false
+
+    - name: "AWS CLI S3G: list buckets"
+      shell: |
+        set -o pipefail
+        aws s3api list-buckets --endpoint-url "http://{{ groups['s3g'][0] }}:9878"
+      args:
+        executable: /bin/bash
+      register: aws_list_result
+      changed_when: false
+
+    - name: "Show AWS CLI S3G check output"
+      debug:
+        msg:
+          - "Create bucket output: {{ (aws_create_result.stdout | default('')) }}"
+          - "List buckets output: {{ (aws_list_result.stdout | default('')) }}"
+  when:
+    - groups.get('s3g', []) | length > 0
+    - inventory_hostname == groups['s3g'][0]
\ No newline at end of file
diff --git a/roles/ozone_ui/tasks/main.yml b/roles/ozone_ui/tasks/main.yml
new file mode 100644
index 0000000..d4b3f73
--- /dev/null
+++ b/roles/ozone_ui/tasks/main.yml
@@ -0,0 +1,32 @@
+## Print and export service UI endpoints
+- name: "Compute service UI URLs"
+  set_fact:
+    _om_hosts_ui: "{{ groups.get('om', []) | list }}"
+    _scm_hosts_ui: "{{ groups.get('scm', []) | list }}"
+    _recon_hosts_ui: "{{ groups.get('recon', []) | list }}"
+    _s3g_hosts_ui: "{{ groups.get('s3g', []) | list }}"
+    ui_urls:
+      om: "{{ _om_hosts_ui | map('regex_replace','^(.*)$','http://\\1:9874') | list }}"
+      scm: "{{ _scm_hosts_ui | map('regex_replace','^(.*)$','http://\\1:9876') | list }}"
+      recon: "{{ (_recon_hosts_ui | length > 0) | ternary(['http://' + _recon_hosts_ui[0] + ':9888'], []) }}"
+      s3g_http: "{{ _s3g_hosts_ui | map('regex_replace','^(.*)$','http://\\1:9878') | list }}"
+      s3g_admin: "{{ _s3g_hosts_ui | map('regex_replace','^(.*)$','http://\\1:19878') | list }}"
+
+- name: "Service UI Endpoints"
+  debug:
+    msg:
+      - "OM UI: {{ ui_urls.om }}"
+      - "SCM UI: {{ ui_urls.scm }}"
+      - "Recon UI: {{ ui_urls.recon }}"
+      - "S3G HTTP: {{ ui_urls.s3g_http }}"
+      - "S3G Admin: {{ ui_urls.s3g_admin }}"
+  run_once: true
+
+- name: "Export UI endpoints to controller logs directory"
+  copy:
+    content: "{{ ui_urls | to_nice_json }}"
+    dest: "{{ controller_logs_dir }}/ui_urls.json"
+    mode: "0644"
+  delegate_to: localhost
+  run_once: true
+  when: controller_logs_dir is defined
\ No newline at end of file
diff --git a/roles/ozone_user/defaults/main.yml b/roles/ozone_user/defaults/main.yml
new file mode 100644
index 0000000..e798044
--- /dev/null
+++ b/roles/ozone_user/defaults/main.yml
@@ -0,0 +1,6 @@
+---
+service_user: "ozone"
+service_group: "ozone"
+service_shell: "/bin/bash"
+
+
diff --git a/roles/ozone_user/tasks/main.yml b/roles/ozone_user/tasks/main.yml
new file mode 100644
index 0000000..7943d48
--- /dev/null
+++ b/roles/ozone_user/tasks/main.yml
@@ -0,0 +1,33 @@
+---
+- name: "Ensure service group exists"
+  group:
+    name: "{{ service_group }}"
+    state: present
+  become: true
+
+- name: "Ensure service user exists"
+  user:
+    name: "{{ service_user }}"
+    group: "{{ service_group }}"
+    shell: "{{ service_shell }}"
+    create_home: true
+    state: present
+  become: true
+
+- name: "Unlock service user account"
+  command: "passwd -u {{ service_user }}"
+  register: unlock_out
+  changed_when: unlock_out.rc == 0
+  failed_when: false
+  become: true
+
+- name: "Ensure home directory permissions"
+  file:
+    path: "{{ (service_user == 'root') | ternary('/root', '/home/' + service_user) }}"
+    state: directory
+    owner: "{{ (service_user == 'root') | ternary('root', service_user) }}"
+    group: "{{ (service_user == 'root') | ternary('root', service_user) }}"
+    mode: "0755"
+  become: true
+
+
diff --git a/roles/ssh_bootstrap/defaults/main.yml b/roles/ssh_bootstrap/defaults/main.yml
new file mode 100644
index 0000000..671be53
--- /dev/null
+++ b/roles/ssh_bootstrap/defaults/main.yml
@@ -0,0 +1,14 @@
+---
+# Whether to deploy a cluster-wide SSH identity (private key) to the service user
+allow_cluster_ssh_key_deploy: false
+
+# Optional paths on the controller for installer SSH keys
+ssh_public_key_path: ""
+ssh_private_key_path: ""
+
+# Target users for authorized_keys installation
+authorized_key_users:
+  - "{{ service_user }}"
+  - "root"
+
+
diff --git a/roles/ssh_bootstrap/tasks/main.yml b/roles/ssh_bootstrap/tasks/main.yml
new file mode 100644
index 0000000..35877d8
--- /dev/null
+++ b/roles/ssh_bootstrap/tasks/main.yml
@@ -0,0 +1,72 @@
+---
+- name: "Ensure .ssh directory exists for target users"
+  file:
+    path: "{{ (item == 'root') | ternary('/root/.ssh', '/home/' + item + '/.ssh') }}"
+    state: directory
+    owner: "{{ (item == 'root') | ternary('root', item) }}"
+    group: "{{ (item == 'root') | ternary('root', item) }}"
+    mode: "0700"
+  loop: "{{ authorized_key_users | unique }}"
+  when: item | length > 0
+  become: true
+
+- name: "Install authorized public key for users (if provided)"
+  file:
+    path: "{{ (item == 'root') | ternary('/root/.ssh/authorized_keys', '/home/' + item + '/.ssh/authorized_keys') }}"
+    state: touch
+    owner: "{{ (item == 'root') | ternary('root', item) }}"
+    group: "{{ (item == 'root') | ternary('root', item) }}"
+    mode: "0600"
+  loop: "{{ authorized_key_users | unique }}"
+  when:
+    - ssh_public_key_path | length > 0
+    - item | length > 0
+  become: true
+
+- name: "Append authorized public key for users"
+  lineinfile:
+    path: "{{ (item == 'root') | ternary('/root/.ssh/authorized_keys', '/home/' + item + '/.ssh/authorized_keys') }}"
+    create: yes
+    line: "{{ lookup('file', ssh_public_key_path) }}"
+    state: present
+    insertafter: EOF
+  loop: "{{ authorized_key_users | unique }}"
+  when:
+    - ssh_public_key_path | length > 0
+    - item | length > 0
+  become: true
+
+- name: "Deploy cluster SSH private key to service user (opt-in)"
+  copy:
+    src: "{{ ssh_private_key_path }}"
+    dest: "{{ (service_user == 'root') | ternary('/root/.ssh/id_ed25519', '/home/' + service_user + '/.ssh/id_ed25519') }}"
+    owner: "{{ (service_user == 'root') | ternary('root', service_user) }}"
+    group: "{{ (service_user == 'root') | ternary('root', service_user) }}"
+    mode: "0600"
+  when:
+    - allow_cluster_ssh_key_deploy | bool
+    - ssh_private_key_path | length > 0
+  become: true
+
+- name: "Ensure public half exists for deployed private key"
+  shell: "ssh-keygen -y -f {{ (service_user == 'root') | ternary('/root/.ssh/id_ed25519', '/home/' + service_user + '/.ssh/id_ed25519') }} > {{ (service_user == 'root') | ternary('/root/.ssh/id_ed25519.pub', '/home/' + service_user + '/.ssh/id_ed25519.pub') }}"
+  args:
+    creates: "{{ (service_user == 'root') | ternary('/root/.ssh/id_ed25519.pub', '/home/' + service_user + '/.ssh/id_ed25519.pub') }}"
+  when: allow_cluster_ssh_key_deploy | bool
+  become: true
+
+- name: "Add passwordless SSH config for users"
+  copy:
+    dest: "{{ (item == 'root') | ternary('/root/.ssh/config', '/home/' + item + '/.ssh/config') }}"
+    owner: "{{ (item == 'root') | ternary('root', item) }}"
+    group: "{{ (item == 'root') | ternary('root', item) }}"
+    mode: "0600"
+    content: |
+      Host *
+        StrictHostKeyChecking no
+        UserKnownHostsFile /dev/null
+  loop: "{{ authorized_key_users | unique }}"
+  when: item | length > 0
+  become: true
+
+

