Hi,

On 2019-11-20 15:55, thoralf schulze wrote:
> hi,
>
> we were able to track this down to the auto balancer: disabling the auto
> balancer and cleaning out old (and probably not very meaningful)
> upmap-entries via ceph osd rm-pg-upmap-items brought back stable mgr
> daemons and a usable dashboard.
I can confirm that. In our case, I see this on a 14.2.4 cluster (which
started its life with an earlier Nautilus version and developed this issue
over the past weeks), and running

  ceph balancer off

has been sufficient to make the mgrs stable again (i.e., I left the
upmap-items in place).
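For anyone who wants to follow thoralf's route and also clear out the upmap entries, rather than only switching the balancer off, here is a minimal sketch in Python. It assumes the `pg_upmap_items` layout of `ceph osd dump --format json` shown in the code comment; the helper names `upmap_removal_commands` and `main` are hypothetical. Note that it targets every upmap entry, not only the old ones, so review the dry-run output before executing anything.

```python
import json
import subprocess
from typing import List


def upmap_removal_commands(osd_dump_json: str) -> List[List[str]]:
    """Return one `ceph osd rm-pg-upmap-items <pgid>` argv per PG that
    carries pg_upmap_items in `ceph osd dump --format json` output."""
    dump = json.loads(osd_dump_json)
    # Assumed layout (as seen on Nautilus):
    #   "pg_upmap_items": [{"pgid": "1.2f", "mappings": [{"from": 3, "to": 7}]}]
    # A missing key simply means there is nothing to clean up.
    return [
        ["ceph", "osd", "rm-pg-upmap-items", entry["pgid"]]
        for entry in dump.get("pg_upmap_items", [])
    ]


def main(dry_run: bool = True) -> None:
    """Print (and, unless dry_run, execute) the removal commands.

    Run `ceph balancer off` first; otherwise the balancer may simply
    recreate upmap entries."""
    raw = subprocess.run(
        ["ceph", "osd", "dump", "--format", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    for argv in upmap_removal_commands(raw):
        print(" ".join(argv))
        if not dry_run:
            # Each removal lets CRUSH place the PG again: expect data movement.
            subprocess.run(argv, check=True)
```

On an admin node, `main(dry_run=True)` only prints the commands; pass `dry_run=False` once the list looks right. Removing upmap items will likely trigger backfill, so working through them in batches may be gentler on the cluster.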
Interestingly, we did not see this with Luminous or Mimic on different clusters
(which, however, have a more stable number of OSDs).
@devs: If there's any more info needed to track this down, please let us know.
Cheers,
Oliver
> the not-so-sensible upmap-entries might or might not have been caused by
> us updating from mimic to nautilus; it's too late to debug this now.
> this seems to be consistent with bryan stillwell's findings ("mgr hangs
> with upmap balancer").
>
> thank you very much & with kind regards,
> thoralf.
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
