Public bug reported:

[Impact]
An active ceph-mgr crashes, and another ceph-mgr takes over as the active
mgr. The new active mgr can then hit the same issue and crash, and the cycle
can continue indefinitely (the previously crashed ceph-mgr is restarted by
systemd).

This can affect cluster stability and usability, because ceph-mgr handles a
number of essential operations (modules that control or change Ceph cluster
behaviour, metrics, etc.).

[Test Plan]
Deploy and operate a Ceph cluster normally.
Increase the mgr log level to 20.
Observe that MMgrReport messages sent from non-active mgrs are ignored (no crash).
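The test plan above could be scripted roughly as follows. This is a sketch, not part of the bug report: it assumes the Octopus `ceph` CLI, using `ceph config set mgr debug_mgr 20` to raise the mgr log level, `ceph mgr stat` to see which mgr is active, and `ceph crash ls-new` (crash module) to check for new crash reports. By default it only prints the commands; it runs them only when given `--run`.

```python
"""Sketch of the [Test Plan] steps as ceph CLI invocations (dry run by default)."""
import shlex
import subprocess
import sys

# Assumption: these specific commands are our choice; the bug report names none.
TEST_PLAN_COMMANDS = [
    # Raise the debug level of all mgr daemons to 20.
    ["ceph", "config", "set", "mgr", "debug_mgr", "20"],
    # Show which mgr is active and how many standbys exist.
    ["ceph", "mgr", "stat"],
    # List crash reports not yet archived; none should come from ceph-mgr.
    ["ceph", "crash", "ls-new"],
]


def run_test_plan(execute=False):
    """Return the commands as shell strings; execute them only if asked."""
    rendered = []
    for cmd in TEST_PLAN_COMMANDS:
        rendered.append(shlex.join(cmd))
        if execute:
            subprocess.run(cmd, check=True)
    return rendered


if __name__ == "__main__":
    for line in run_test_plan(execute="--run" in sys.argv):
        print(line)
```

With debug_mgr at 20, the mgr logs can then be inspected to confirm that reports from standby mgrs are dropped rather than processed.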

[Where problems could occur]
The fix may not actually resolve the issue, and the mgr may continue to crash as before.
It might also incorrectly ignore reports from active mgrs.
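To make the behaviour and its risk concrete, here is a toy Python model of the filtering the fix introduces. It is hypothetical: the real change is C++ in ceph-mgr (see the PR linked below), and the `Report` class and `should_process` function are illustrative names, not Ceph APIs. The model drops reports from any mgr that is not the active one. If `active_mgr_id` were stale or wrong, a report from the real active mgr would be dropped, which is the second failure mode listed above.

```python
"""Toy model of the fix: the active mgr drops MMgrReports from non-active mgrs."""
from dataclasses import dataclass


@dataclass(frozen=True)
class Report:
    # Hypothetical stand-in for an MMgrReport: sender type and id, e.g. ("mgr", "x").
    daemon_type: str
    daemon_id: str


def should_process(report: Report, active_mgr_id: str) -> bool:
    """Return False for reports sent by a mgr that is not the active one."""
    if report.daemon_type == "mgr" and report.daemon_id != active_mgr_id:
        return False  # non-active (standby) mgr: ignore the report
    return True  # non-mgr daemons and the mgr named as active are processed


reports = [Report("osd", "0"), Report("mgr", "b"), Report("mgr", "a")]
print([should_process(r, "a") for r in reports])  # [True, False, True]
```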

[Other Info]
Upstream main bug: https://tracker.ceph.com/issues/48022
Octopus backport PR: https://github.com/ceph/ceph/pull/43861
Octopus backport bug: https://tracker.ceph.com/issues/53198

This has already been fixed and is available in Pacific,
so the backport is needed only for Octopus.

** Affects: ceph (Ubuntu)
     Importance: High
     Assignee: Ponnuvel Palaniyappan (pponnuvel)
         Status: In Progress

** Affects: ceph (Ubuntu Focal)
     Importance: High
     Assignee: Ponnuvel Palaniyappan (pponnuvel)
         Status: In Progress


** Tags: sts

** Changed in: ceph (Ubuntu)
     Assignee: (unassigned) => Ponnuvel Palaniyappan (pponnuvel)

** Changed in: ceph (Ubuntu)
       Status: New => In Progress

** Also affects: ceph (Ubuntu Focal)
   Importance: Undecided
       Status: New

** Changed in: ceph (Ubuntu Focal)
     Assignee: (unassigned) => Ponnuvel Palaniyappan (pponnuvel)

** Changed in: ceph (Ubuntu Focal)
       Status: New => In Progress

** Changed in: ceph (Ubuntu)
   Importance: Undecided => High

** Changed in: ceph (Ubuntu Focal)
   Importance: Undecided => High

** Tags added: sts

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1955345

Title:
  Active ceph-mgr crashes on receiving report from a non-active mgr

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/ceph/+bug/1955345/+subscriptions


-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
