Hi,
we are having an issue at a customer site where a 3 PB CephFS is in a failed state.
The cluster itself is unhealthy and is awaiting replacement disks:
# ceph -s
  cluster:
    id:     28ca2bfa-d87e-11ed-83a3-1070fddda30f
    health: HEALTH_ERR
            4 failed cephadm daemon(s)
            There are daemons running an older version of ceph
            1 filesystem is degraded
            1 filesystem is offline
            1 mds daemon damaged
            8 nearfull osd(s)
            Low space hindering backfill (add storage if this doesn't resolve itself): 46 pgs backfill_toofull
            Possible data damage: 4 pgs inconsistent
            Degraded data redundancy: 6427646/15858772167 objects degraded (0.041%), 8 pgs degraded, 8 pgs undersized
            6 pool(s) nearfull
            (muted: OSDMAP_FLAGS OSD_SCRUB_ERRORS(2d) PG_NOT_DEEP_SCRUBBED PG_NOT_SCRUBBED)

  services:
    mon: 3 daemons, quorum sn01,sn03,sn02 (age 3w)
    mgr: sn03.crlpzh(active, since 33h), standbys: sn01.tegfya, sn02.mzvgcr
    mds: 18/19 daemons up, 1 standby
    osd: 181 osds: 174 up (since 4d), 172 in (since 4d); 206 remapped pgs
         flags nodeep-scrub

  data:
    volumes: 2/3 healthy, 1 recovering; 1 damaged
    pools:   12 pools, 3585 pgs
    objects: 1.93G objects, 1.3 PiB
    usage:   2.5 PiB used, 501 TiB / 3.0 PiB avail
    pgs:     6427646/15858772167 objects degraded (0.041%)
             293845758/15858772167 objects misplaced (1.853%)
             2844 active+clean
             532  active+clean+scrubbing
             147  active+remapped+backfill_wait
             28   active+remapped+backfill_toofull
             11   active+remapped+backfill_wait+backfill_toofull
             10   active+remapped+backfilling
             6    active+undersized+degraded+remapped+backfill_toofull
             2    active+clean+inconsistent
             1    active+clean+scrubbing+deep+inconsistent+repair
             1    active+undersized+remapped+backfilling
             1    active+undersized+degraded+remapped+backfilling
             1    active+recovering+degraded+remapped
             1    active+remapped+inconsistent+backfill_toofull

  io:
    recovery: 183 MiB/s, 312 objects/s
The CephFS metadata pool is not affected by the inconsistent PGs.
The MDSs all show this line in their log files:
"Monitors have assigned me to become a standby."
The filesystem is joinable:
# ceph fs lsflags storage_cluster
joinable allow_snaps allow_multimds_snaps refuse_client_session
But no MDS joins:
# ceph fs status
storage_cluster - 0 clients
===============
RANK  STATE   MDS  ACTIVITY  DNS  INOS  DIRS  CAPS
 0    failed
      POOL         TYPE     USED  AVAIL
cephfs_metadata  metadata   490G  12.9T
  cephfs_data      data     970T  54.1T
  shared_data      data    1351T  22.5T
       STANDBY MDS
storage_cluster.sn04.cbvzzu
MDS version: ceph version 18.2.4 (e7ad5345525c7aa95470c26863873b581076945d) reef (stable)
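My suspicion is that the "1 mds daemon damaged" in HEALTH_ERR means rank 0 is listed in the MDSMap's damaged set, which as far as I understand keeps the monitors from assigning any standby to it even though the filesystem is flagged joinable. A minimal sketch of how I checked this from the JSON dump (the sample document below is made up; field names assumed to match the reef `ceph fs dump -f json` output):

```python
import json

# Illustrative excerpt of `ceph fs dump -f json` output (made-up sample;
# the real dump contains many more fields per filesystem).
dump = json.loads("""
{
  "filesystems": [
    {
      "mdsmap": {
        "fs_name": "storage_cluster",
        "damaged": [0],
        "failed": []
      }
    }
  ]
}
""")

for fs in dump["filesystems"]:
    mdsmap = fs["mdsmap"]
    if mdsmap["damaged"]:
        # A rank listed in "damaged" will not be handed to a standby
        # until it is explicitly marked repaired, e.g.:
        #   ceph mds repaired <fs_name>:<rank>
        print(f"{mdsmap['fs_name']}: damaged ranks {mdsmap['damaged']}")
```

If rank 0 does show up as damaged there, my understanding is that marking it repaired (once the underlying metadata damage is dealt with) should let the standby claim the rank, but I would like confirmation before running that on 3 PB.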
Why does the standby MDS not pick up rank 0?
Regards
--
Robert Sander
Linux Consultant
Heinlein Consulting GmbH
Schwedter Str. 8/9b, 10119 Berlin
https://www.heinlein-support.de
Tel: +49 30 405051 - 0
Fax: +49 30 405051 - 19
Amtsgericht Berlin-Charlottenburg - HRB 220009 B
Geschäftsführer: Peer Heinlein - Sitz: Berlin
_______________________________________________
ceph-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]