Re: [Bug 1874939] Re: ceph-osd can't connect after upgrade to focal

James Page Fri, 22 May 2020 00:36:32 -0700

Hi Christian

On Fri, May 22, 2020 at 8:10 AM Christian Huebner <
1874...@bugs.launchpad.net> wrote:


> I filed this bug specifically for hyperconverged environments. Upgrading
> monitor nodes first and then upgrading separate OSD nodes is probably
> doable, but in a hyperconverged environment you cannot separate them.
>

I appreciate that, which is why I have endeavoured to reproduce your issue
on a hyperconverged deployment as well.


> I tried do-release-upgrade (a couple of times) without rebooting at the
> end, but found that the monitors and OSDs were upgraded and deadlocked.
> I tried shutting down all Ceph services first and then running
> do-release-upgrade, which started my Ceph services and destroyed my cluster.
> I tried upgrading Ceph manually, which is thwarted by the dependencies:
> it's all or nothing.
>
> I finally accomplished the upgrade by marking all Ceph packages held,
> then digging my way through the dependency jungle to upgrade the
> packages in the right sequence. This was an absolute nightmare and took
> me more than an hour per node. Obviously this is not a production-ready
> way to do it, but at least Ceph Octopus is running on 20.04 now.
>
> There are two asks here:
>
> Separate the dependencies so that ceph-mon, ceph-mgr and ceph-osd can be
> installed separately (with the appropriate dependencies), in a way that
> upgrading ceph-mon does not also try to upgrade ceph-osd. There is no
> good reason why an upgrade of ceph-mon should go down and back up the
> dependency tree and try to upgrade ceph-osd too. In fact, in a
> traditional cluster I would not want monitor packages on my OSD nodes,
> or vice versa.
>

The binary packages that the ceph source package produces are strongly
versioned against each other, so that you can't end up with an
inappropriate or broken mix of binaries on disk at the same time.

Upgrading the ceph-mon package results in an upgrade of the ceph-osd
package because both depend on ceph-base with a strict version
dependency on the matching binary version.
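To illustrate the effect, here is a rough Python model of how exact-version
("=") dependencies propagate. It is not how apt resolves dependencies
internally, and the package graph below is a hypothetical simplification of
the real control file. The point it shows: moving one package to a new
version forces every package it pins, and every installed package that pins
it, to move as well.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical, simplified graph: an edge A -> B means package A declares
# "Depends: B (= <same version as A>)". The real Ceph packages have more edges.
EXACT_DEPS: Dict[str, List[str]] = {
    "ceph-mon": ["ceph-base"],
    "ceph-mgr": ["ceph-base"],
    "ceph-osd": ["ceph-base"],
    "ceph-base": ["ceph-common"],
    "ceph-common": [],
}


def upgrade_closure(start: str, installed: Set[str],
                    deps: Dict[str, List[str]] = EXACT_DEPS) -> Set[str]:
    """Return the installed packages that must be upgraded together with `start`.

    A version change breaks an exact pin in both directions: the package's
    own exact dependencies must move with it, and every installed package
    that pins *it* must move too (otherwise apt would have to remove it).
    """
    # Reverse edges: B -> [packages that pin B to their own version].
    rdeps: Dict[str, List[str]] = {p: [] for p in deps}
    for pkg, targets in deps.items():
        for t in targets:
            rdeps.setdefault(t, []).append(pkg)

    # Breadth-first walk over forward and reverse pins, limited to
    # packages that are actually installed on this node.
    seen = {start}
    queue = deque([start])
    while queue:
        pkg = queue.popleft()
        for nxt in deps.get(pkg, []) + rdeps.get(pkg, []):
            if nxt in installed and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen
```

On a hyperconverged node, where every daemon package is installed,
`upgrade_closure("ceph-mon", installed)` includes ceph-osd; on a dedicated
monitor node without ceph-osd installed it does not, which matches the
difference between the two deployment styles.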

This is how we enforce a known-good set of bits on disk, and it is why the
package maintainer scripts don't restart the daemons on upgrade: the
restart process can then be managed with the appropriate upgrade ordering.
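Because the maintainer scripts leave restarts to the operator, the ordering
can be scripted. Below is a minimal Python sketch, assuming daemons are named
"<type>.<id>" (a naming convention chosen for this example) and using the
order from Ceph's upgrade notes: monitors, then managers, then OSDs, then MDS
and RGW daemons. In practice you would also check cluster health between
steps; that is left out here.

```python
from typing import Iterable, List, Tuple, Union

# Restart order by daemon type: monitors, managers, OSDs, then MDS and RGW.
RESTART_ORDER = ["mon", "mgr", "osd", "mds", "rgw"]


def restart_plan(daemons: Iterable[str]) -> List[str]:
    """Sort daemon names of the form '<type>.<id>' into restart order.

    Unknown daemon types go last. Numeric ids (e.g. OSD ids) sort
    numerically, so osd.2 comes before osd.10.
    """
    def key(name: str) -> Tuple[int, int, Union[int, str]]:
        dtype, _, ident = name.partition(".")
        rank = RESTART_ORDER.index(dtype) if dtype in RESTART_ORDER else len(RESTART_ORDER)
        # The middle flag keeps int ids and str ids from being compared.
        return (rank, 0, int(ident)) if ident.isdigit() else (rank, 1, ident)

    return sorted(daemons, key=key)
```

For example, `restart_plan(["osd.10", "osd.2", "mgr.a", "mon.a"])` yields the
monitor first, then the manager, then osd.2 and osd.10.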


> And fix do-release-upgrade, so that a Ceph cluster does not get restarted
> when the upgrade procedure ends. I can vouch for the services being
> restarted; I tried it several times, once even with the services shut
> down before do-release-upgrade was started.
>

If you shut down the services, the postinst script starts
'ceph-{mon,osd,mgr}.target', so they would get started back up, but
already-running targets and services won't get restarted. I tested and
validated this, and checked the installed maintainer scripts.

I think you'd have to disable and mask the targets *and* services to ensure
that starting the targets did not force the daemons to start as well, but I
did not observe any restart behaviour during my upgrade testing (other than
due to the reboot of the system).
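A dry-run Python sketch of that disable-and-mask step for one node. The
hostname-based mon/mgr instance names and the OSD ids are hypothetical
placeholders; check `systemctl list-units 'ceph*'` for the actual unit names
on your nodes before running anything with dry_run=False.

```python
import subprocess
from typing import List, Sequence

TARGETS = ["ceph-mon.target", "ceph-mgr.target", "ceph-osd.target"]


def mask_commands(hostname: str, osd_ids: Sequence[int]) -> List[List[str]]:
    """Build systemctl commands that disable, then mask, the Ceph targets
    and the per-daemon service instances on one node.

    Assumption: mon/mgr instances are named after the host and OSD
    instances after their numeric ids (placeholders for this example).
    """
    services = [f"ceph-mon@{hostname}.service", f"ceph-mgr@{hostname}.service"]
    services += [f"ceph-osd@{i}.service" for i in osd_ids]
    units = TARGETS + services
    return [["systemctl", "disable", *units], ["systemctl", "mask", *units]]


def run(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print the commands (default) or execute them, stopping on the first failure."""
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
```

After the upgrade, `systemctl unmask` followed by `systemctl enable` on the
same units reverses this.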


>
> An upgrade procedure that breaks customer data should be fixed.
>

Agreed, but the first step is reproducing the issue so that we can
actually identify what the problem is.

I've followed what I think is the same process that you undertook, but I've
not seen the same issue when running mixed-version MON, MGR and OSD daemons.

So there is something specific in your deployment that we've not captured
in this bug report yet.

Full details of a) /etc/ceph/ceph.conf and b) the pool types and
configurations in use would be helpful.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1874939

Title:
  ceph-osd can't connect after upgrade to focal

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/ceph/+bug/1874939/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
