My main question now is "which is the 'latest' MON?"

Check the timestamps of the files within the mon DB store. ;-) No need to dig through the DB itself.
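For example, something like this reports the newest file in each mon's store.db (a sketch; /var/lib/ceph/mon is the usual default path and an assumption here):

```shell
# Report the newest file mtime inside each mon's store.db directory.
newest_mon_store() {
  local root=${1:-/var/lib/ceph/mon} d newest
  for d in "$root"/ceph-*/store.db; do
    [ -d "$d" ] || continue
    # newest file by modification time, printed as a sortable ISO timestamp
    newest=$(find "$d" -type f -printf '%TY-%Tm-%Td %TH:%TM %p\n' | sort | tail -n 1)
    echo "$d  last write: $newest"
  done
}
newest_mon_store
```

Whichever store was written to last is the best candidate for the surviving mon.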

If you don't feel confident managing the cluster with DeepSea (I know of people who were literally afraid of DeepSea stages :-D ), then don't. :-) Even without cephadm you can deploy daemons relatively easily. Two years ago I wrote an article [1] on how to migrate from SES 6 (Nautilus) to upstream Ceph (Pacific) after testing this procedure for a potential customer. Feel free to reach out to me if you get to that point.

I'm still in favor of the single-mon approach; it has worked many times for us (actually, it has always worked), and it's relatively quick and easy to test. If that doesn't work, there's still the procedure of collecting the maps from the OSDs to rebuild the mon store. But let's see how far you get before exploring that option.
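For reference, a rough sketch of both options, with assumed mon ids (keep mon2, drop mon1/mon3) and default paths. Stop the mon daemons and back up /var/lib/ceph/mon before trying either:

```shell
# Option 1: shrink the monmap so the surviving mon can form quorum alone.
shrink_monmap() {
  command -v monmaptool >/dev/null 2>&1 || { echo "ceph tools not found"; return 1; }
  ceph-mon -i mon2 --extract-monmap /tmp/monmap   # dump mon2's current monmap
  monmaptool --print /tmp/monmap                  # inspect before editing
  monmaptool /tmp/monmap --rm mon1 --rm mon3      # drop the other mons (assumed ids)
  ceph-mon -i mon2 --inject-monmap /tmp/monmap    # mon2 is now the only mon
}

# Option 2 (fallback): rebuild a mon store from the OSDs; run on each OSD host.
rebuild_mon_store() {
  command -v ceph-objectstore-tool >/dev/null 2>&1 || { echo "ceph tools not found"; return 1; }
  mkdir -p /tmp/mon-store
  local osd
  for osd in /var/lib/ceph/osd/ceph-*; do
    [ -d "$osd" ] || continue
    ceph-objectstore-tool --data-path "$osd" \
      --op update-mon-db --mon-store-path /tmp/mon-store
  done
}
```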

[1] https://heiterbiswolkig.blogs.nde.ag/2023/08/14/how-to-migrate-from-suse-enterprise-storage-to-upstream-ceph/

Zitat von Miles Goodhew <[email protected]>:

On Wed, 18 Jun 2025, at 18:09, Eugen Block wrote:
That does look strange indeed, either an upgrade went wrong or someone
already fiddled with the monmap, I'd say. But anyway, I wouldn't try
to deploy a 4th mon since it would want to sync the store, but we
don't know what state the store is actually in. And besides
that, 2 out of 4 MONs still isn't a quorum, so there's no real
benefit. So my best bet would be on the mon with the most recent
store. And if the cluster comes back up with one mon, you'll need to
wipe the traces of the previous mons so DeepSea can redeploy
additional mons cleanly. Or is the cluster not managed by DeepSea
anymore?

Replies to fragments from above are below:

either an upgrade went wrong or someone already fiddled with the monmap

That's entirely possible. I'm playing the role of a "guy who knows a bit about Ceph", trying to un-explode an old cluster on an unsupported OS and hardware. The original deployers are long since gone, and the day-to-day admins were never given much of a handover. There are legends of several phases of upgrades and deployment-system replacements, but concrete documentation is thin on the ground. Certainly, I recently found evidence of failed OS upgrades that broke part of the RGW services years ago.

I had previously documented a plan to migrate the cluster to new, supported hardware, OS and Ceph versions, but the client was still thinking about it when this happened.


wouldn't try to deploy a 4th mon
The idea of the 4th MON was just to see if I could deploy a new MON without breaking the cluster much more. However, given I can't get more than one MON to start, it's pretty broken right now. If that deployment worked, I intended to remove and redeploy each of the other two MONs before retiring the 4th MON again. A side benefit of this is that it lets me test some of my cluster-upgrade plan. One of the cluster's clients is OpenStack, which in my experience is pretty "sentimental" about its set of MON IPs.
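For the record, my sketch of that manual deployment (it assumes a working quorum, an id of "mon4" and default paths, so it's moot until at least one mon is healthy again):

```shell
# Manually add a mon to an otherwise healthy cluster.
add_mon() {
  command -v ceph-mon >/dev/null 2>&1 || { echo "ceph tools not found"; return 1; }
  ceph auth get mon. -o /tmp/mon-keyring   # fetch the mon keyring (needs quorum)
  ceph mon getmap -o /tmp/monmap           # fetch the current monmap (needs quorum)
  # initialise the new mon's store, then start it
  ceph-mon -i mon4 --mkfs --monmap /tmp/monmap --keyring /tmp/mon-keyring
  systemctl start ceph-mon@mon4
}
```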


mon with the most recent store
How would I find out which MON that is? I'm told mon3 was the last one operating (but it emits a wall of "e6 handle_auth_request failed to assign global_id" logs when running). mon2 is the one that survives if you try to start all of them. I've tried inspecting the (SQLite?) DBs, but can't get much comprehensible info out of them yet (I don't have any experience tinkering with SQLite, though I'm OK with an "actual" SQL REPL). I can't get quorum, so I can't run "ceph ..." command lines, but I can talk to each of the MONs on their Unix sockets when they're running.
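(For reference, this is roughly how I've been querying them; the socket path glob is an assumption based on the usual defaults:)

```shell
# Ask each local mon for its own status over the admin socket (no quorum needed).
mon_status_all() {
  command -v ceph >/dev/null 2>&1 || { echo "ceph CLI not found"; return 1; }
  local sock
  for sock in /var/run/ceph/ceph-mon.*.asok; do
    [ -S "$sock" ] || continue
    echo "== $sock"
    # mon_status reports this mon's rank, state and its embedded monmap
    ceph --admin-daemon "$sock" mon_status
  done
}
```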


Or is the cluster not managed by DeepSea anymore?
I don't think it is. None of the admins (nor I) have much experience with Salt (I'm more of an Ansible person). The aforementioned "legends" of the system's lifetime also say there were multiple different management systems over the years. I've mainly used the existing Salt config to break into managed nodes I didn't yet have an account on and to do fleet-wide "shell command" operations. Given the probability of historic broken OS upgrades and possibly abandoned Salt management, I'd be wary of trying to use it for deployment automation.


(Now in a later email)
Although I'm not a dev, I looked into the code [0] anyway.

The comments before the maybe_resize_cluster function say:

  * If a cluster is undersized (with respect to max_mds), then
  * attempt to find daemons to grow it. If the cluster is oversized
  * (with respect to max_mds) then shrink it by stopping its highest rank.

Is it possible that an operator/admin tried to resize the MDS cluster
(shrink or grow the number of MDS daemons)? Or was a DeepSea stage
executed in order to deploy additional daemons? Maybe some history
could help understand what might have happened.

Yes, I saw all that too. I was told this all started because one of the admins noticed that the CephFS service was slow and was reporting laggy MDSes. This may well be a latent issue from a possible historical failed upgrade (pure guesswork here). The admin tried restarting some daemons, and eventually only mon2 would run (I'm a bit vague on the details). I don't *think* they tried removing the MDS daemons, but it's possible (I'll check tomorrow).

One of my possible plans of attack was to see whether that "maybe resize..." method could be skipped with some "no"-flag or other config setting, then try to get quorum established before re-enabling it and hopefully coming back to health. That's probably too wishful a prospect, though.
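If I do get quorum back, my understanding (an assumption, not verified in the code) is that there's no direct off-switch for that method, but pinning the MDS count should at least remove the resize pressure, something like:

```shell
# Assumed fs name "cephfs"; both commands need a working mon quorum.
fs_pin_mds() {
  command -v ceph >/dev/null 2>&1 || { echo "ceph CLI not found"; return 1; }
  ceph fs set cephfs max_mds 1        # one active MDS: nothing to grow/shrink
  # optional, with side effects: also stops standbys replacing a failed rank
  ceph fs set cephfs joinable false
}
```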

Thanks again for all your feedback. Even if this just turns out to be a massive "rubber ducking" session, you've given me some new ideas and threads to pull. My main question now is "which is the 'latest' MON?"

M0les.
_______________________________________________
ceph-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]

