Public bug reported:

[Impact]
An active ceph-mgr crashes, and another ceph-mgr takes over and becomes the active mgr. The new active mgr can then hit the same issue and crash, and the cycle can continue indefinitely (each previously crashed ceph-mgr gets restarted by systemd).

This could affect cluster stability and usability, as ceph-mgr handles a number of essential operations (modules that control or change Ceph cluster behaviour, metrics, etc.).

[Test Plan]
Deploy and operate a Ceph cluster normally.
Increase the log level of mgr to 20.
Observe that MMgrReport messages sent from non-active mgrs are ignored (no crash).

[Where problems could occur]
The fix may not actually resolve the issue, in which case the mgr would continue to crash as before.
It might also incorrectly ignore reports from active mgrs.

[Other Info]
Upstream main bug: https://tracker.ceph.com/issues/48022
Octopus backport PR: https://github.com/ceph/ceph/pull/43861
Octopus backport bug: https://tracker.ceph.com/issues/53198

This has already been fixed and is available in Pacific, so the backport is needed only for Octopus.

** Affects: ceph (Ubuntu)
     Importance: High
     Assignee: Ponnuvel Palaniyappan (pponnuvel)
         Status: In Progress

** Affects: ceph (Ubuntu Focal)
     Importance: High
     Assignee: Ponnuvel Palaniyappan (pponnuvel)
         Status: In Progress

** Tags: sts

** Changed in: ceph (Ubuntu)
     Assignee: (unassigned) => Ponnuvel Palaniyappan (pponnuvel)

** Changed in: ceph (Ubuntu)
       Status: New => In Progress

** Also affects: ceph (Ubuntu Focal)
   Importance: Undecided
       Status: New

** Changed in: ceph (Ubuntu Focal)
     Assignee: (unassigned) => Ponnuvel Palaniyappan (pponnuvel)

** Changed in: ceph (Ubuntu Focal)
       Status: New => In Progress

** Changed in: ceph (Ubuntu)
   Importance: Undecided => High

** Changed in: ceph (Ubuntu Focal)
   Importance: Undecided => High

** Tags added: sts

-- 
You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1955345

Title:
  Active ceph-mgr crashes on receiving report from a non-active mgr

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/ceph/+bug/1955345/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs