Hi,

I added Adam in CC (Cephadm lead). Maybe he can tell us more about how this information is gathered / parsed.

On 9/17/25 11:24, Boris wrote:
Hi Stefan,

We run the 18.2.7 version.

I turned on the debug logs and this looks really weird:

[DBG] public networks ['PREFIX:22::/64']
[DBG] cluster networks []
[DBG] processing data from 12 hosts
[DBG] checking public network membership for: ['host-20', 'host-13', 'host-19', 'host-18', 'host-12', 'host-16', 'host-14', 'host-15', 'host-22', 'host-17', 'host-23', 'host-21']
[DBG] checking network PREFIX:22::/64
[DBG] subnet data - {"subnet": "PREFIX:22::/64", "mtu_map": {"9100": ["host-13", "host-12", "host-16", "host-14", "host-15", "host-17", "host-21"]}, "speed_map": {"20000": ["host-13", "host-12", "host-16", "host-14", "host-17", "host-21"], "10000": ["host-15"]}}
[DBG] processing mtu map : {"9100": ["host-13", "host-12", "host-16", "host-14", "host-15", "host-17", "host-21"]}
[DBG] MTU problems detected
[DBG] most hosts using 9100
[DBG] processing subnet : {"subnet": "PREFIX:22::/64", "mtu_map": {"9100": ["host-13", "host-12", "host-16", "host-14", "host-15", "host-17", "host-21"]}, "speed_map": {"20000": ["host-13", "host-12", "host-16", "host-14", "host-17", "host-21"], "10000": ["host-15"]}}
[DBG] linkspeed issue(s) detected
[DBG] most hosts using 10000
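As an aside, the speed_map above shows six hosts at 20000 and only host-15 at 10000, yet the log claims "most hosts using 10000", which looks inverted. For what it's worth, the kind of majority check these messages suggest could be sketched like this (a hypothetical illustration, not the actual cephadm code):

```python
from collections import Counter

def flag_outliers(value_map):
    """value_map has the shape of the mtu_map / speed_map debug output,
    e.g. {"9100": ["host-13", ...]}. Returns (majority_value, outlier_hosts).
    Hypothetical sketch, not the real cephadm implementation."""
    counts = Counter({v: len(hosts) for v, hosts in value_map.items()})
    majority, _ = counts.most_common(1)[0]
    outliers = [h for v, hosts in value_map.items()
                if v != majority for h in hosts]
    return majority, outliers

speed_map = {"20000": ["host-13", "host-12", "host-16",
                       "host-14", "host-17", "host-21"],
             "10000": ["host-15"]}
print(flag_outliers(speed_map))  # ('20000', ['host-15'])
```

With the data from your log, the majority should clearly be 20000, which is why the "most hosts using 10000" line stands out.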

But when I check one of the hosts (host-22), which does not show up in the subnet or MTU map, and run "cephadm --image quay.io/ceph/ceph@sha256:1b9158ce28975f95def6a0ad459fa19f1336506074267a4b47c1bd914a00fec0 gather-facts", I get the following for the interface that should have the prefix:

     "bond0.22": {
       "driver": "",
       "iftype": "logical",
       "ipv4_address": "",
       "ipv6_address": "fe80::.../64",
       "lower_devs_list": [
         "bond0"
       ],
       "mtu": 9100,
       "nic_type": "ethernet",
       "operstate": "up",
       "speed": 20000,
       "upper_devs_list": []
     }

But running "cephadm --image quay.io/ceph/ceph@sha256:1b9158ce28975f95def6a0ad459fa19f1336506074267a4b47c1bd914a00fec0 list-networks" I get:

...,
     "PREFIX:22::/64": {
         "bond0.22": [
             "PREFIX:22::186"
         ]
     },
...,
     "fe80::/64": {
...,
         "bond0.22": [
             "fe80::..."
         ],
...
     }
}

So why doesn't it show up in the 9100 mtu_map and in the 20000 speed_map?
And why does it use the link local address in the gather-facts section?

I don't know. Does not make sense to me.
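One thing that is clear, though: a link-local address can never satisfy a membership test against a global public network. A quick sketch with Python's ipaddress module (2001:db8:22::/64 is a documentation-prefix stand-in for your redacted PREFIX:22::/64; whether cephadm performs exactly this check is an assumption on my part):

```python
import ipaddress

# Stand-in for the redacted public network PREFIX:22::/64.
public_net = ipaddress.ip_network("2001:db8:22::/64")

global_addr = ipaddress.ip_address("2001:db8:22::186")  # as in list-networks
link_local = ipaddress.ip_address("fe80::1")            # as in gather-facts

print(global_addr in public_net)   # True
print(link_local in public_net)    # False
print(link_local.is_link_local)    # True
```

So if gather-facts only records the fe80:: address for bond0.22, as your output above shows, the public-network check for that host is bound to fail, even though list-networks sees the global address.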


I cross-checked with another host that actually does show up: in gather-facts it got the correct prefix, but its list-networks output looks the same, including the fe80::/64 network.

Is there a way to prioritize the configured IP addresses in the output and only use the link-local addresses when there is no other IP address? I really wouldn't like to disable link-local addresses, because I want the ceph nodes in different clusters configured as similarly as possible. We have some clusters that use RA on the mgmt interface and some that have a static gateway configured.
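The prioritization described above could look like this in principle: prefer globally routable addresses per interface and fall back to link-local only when nothing else is configured. A hypothetical helper, not an existing cephadm option:

```python
import ipaddress

def pick_address(addrs):
    """Given a list of 'addr/prefixlen' strings for one interface, prefer
    non-link-local addresses and fall back to link-local only if there is
    nothing else. Hypothetical sketch, not part of cephadm."""
    parsed = [ipaddress.ip_address(a.split("/")[0]) for a in addrs]
    preferred = [a for a in parsed if not a.is_link_local]
    return str((preferred or parsed)[0])

print(pick_address(["fe80::1/64", "2001:db8:22::186/64"]))  # 2001:db8:22::186
print(pick_address(["fe80::1/64"]))                         # fe80::1
```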

Disabling link-local addresses would break IPv6 networking entirely.

I repeated your tests on two different clusters and can confirm what you are seeing. "gather-facts" gives correct results for some hosts, but not for others. In some cases the management IP shows up but not the Ceph public interface; in other cases only link-local addresses show up.

Note, we do not use linux bonding or bridging but we use Open vSwitch bridges, bonds and interfaces. Each host has a dedicated "ceph" interface with the public IP on it (bound to an uplink bridge, which is bound to a bond).

One thing I noticed is how the hostname is derived. If a line like this is present in /etc/hosts:

::1     ip6-localhost ip6-loopback mon2

The hostname in "gather-facts" will be "ip6-localhost" instead of mon2.
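That is consistent with the first name on a matching /etc/hosts line winning; later aliases are ignored. A small sketch of that first-alias behavior (parsing the file contents directly rather than going through the resolver):

```python
def first_name_for(addr, hosts_text):
    """Return the canonical (first) name on the /etc/hosts line whose
    address field matches addr, or None. Comments are stripped."""
    for line in hosts_text.splitlines():
        fields = line.split("#")[0].split()
        if len(fields) > 1 and fields[0] == addr:
            return fields[1]  # first name wins; later aliases are ignored
    return None

hosts = "::1     ip6-localhost ip6-loopback mon2\n"
print(first_name_for("::1", hosts))  # ip6-localhost
```

So moving "mon2" to the front of that line (or off the ::1 line entirely) should presumably make "gather-facts" report the expected hostname.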

Another thing: the "--image" part of "cephadm --image quay.io/ceph/ceph@sha256:1b9158ce28975f95def6a0ad459fa19f1336506074267a4b47c1bd914a00f gather-facts" does not seem to work. It makes no difference if I point it at a non-existent image, so I doubt this option has any effect at all.

Gr. Stefan



On Wed., Sep 17, 2025 at 09:06, Stefan Kooman <[email protected]> wrote:

    On 9/16/25 18:34, Boris wrote:
     > Hi,
     >
     > I am currently debugging an issue with the ceph config checks.
     > We have some random hosts that alert
     >
     > "HOSTNAME does not have an interface on any public network"
     >
     > but they have. It is IPv6, static configured and, because we
    don't have a
     > cluster_network, OSDs are bound to that IP in the specific network.
     >
     > I went through the netplan config and they are basically the same
    on all
     > hosts.
     > And after rebooting the hosts some of them resolved and some didn't.
     >
     > How can I dig deeper to figure out what is going on.


    Can you find some log output related to these events (docu here [1])?

     >
     > All services are ceph-orch podman containers
     > All hosts are ubuntu 22.04 with latest HWE kernel (6.8.0-79-generic)

    What version of Ceph are you running? We have ubuntu 22.04 clusters
    configured exactly like this (IPv6 only), so I'm really curious. We
    haven't seen this behavior yet (18.2.4).

    Gr. Stefan

    [1]:
    https://docs.ceph.com/en/latest/cephadm/operations/#watching-cephadm-log-messages



--
The self-help group "UTF-8 problems" will, as an exception, meet in the large hall this time.
_______________________________________________
ceph-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]