Hi,
we are having an issue at a customer site where a 3 PB CephFS is in a failed state.
The cluster itself is unhealthy and is awaiting replacement disks:
# ceph -s
  cluster:
    id:     28ca2bfa-d87e-11ed-83a3-1070fddda30f
    health: HEALTH_ERR
            4 failed cephadm daemon(s)
            There are daemons running an older version of ceph
            1 filesystem is degraded
            1 filesystem is offline
            1 mds daemon damaged
            8 nearfull osd(s)
            Low space hindering backfill (add storage if this doesn't resolve itself): 46 pgs backfill_toofull
            Possible data damage: 4 pgs inconsistent
            Degraded data redundancy: 6427646/15858772167 objects degraded (0.041%), 8 pgs degraded, 8 pgs undersized
            6 pool(s) nearfull
            (muted: OSDMAP_FLAGS OSD_SCRUB_ERRORS(2d) PG_NOT_DEEP_SCRUBBED PG_NOT_SCRUBBED)

  services:
    mon: 3 daemons, quorum sn01,sn03,sn02 (age 3w)
    mgr: sn03.crlpzh(active, since 33h), standbys: sn01.tegfya, sn02.mzvgcr
    mds: 18/19 daemons up, 1 standby
    osd: 181 osds: 174 up (since 4d), 172 in (since 4d); 206 remapped pgs
         flags nodeep-scrub

  data:
    volumes: 2/3 healthy, 1 recovering; 1 damaged
    pools:   12 pools, 3585 pgs
    objects: 1.93G objects, 1.3 PiB
    usage:   2.5 PiB used, 501 TiB / 3.0 PiB avail
    pgs:     6427646/15858772167 objects degraded (0.041%)
             293845758/15858772167 objects misplaced (1.853%)
             2844 active+clean
             532  active+clean+scrubbing
             147  active+remapped+backfill_wait
             28   active+remapped+backfill_toofull
             11   active+remapped+backfill_wait+backfill_toofull
             10   active+remapped+backfilling
             6    active+undersized+degraded+remapped+backfill_toofull
             2    active+clean+inconsistent
             1    active+clean+scrubbing+deep+inconsistent+repair
             1    active+undersized+remapped+backfilling
             1    active+undersized+degraded+remapped+backfilling
             1    active+recovering+degraded+remapped
             1    active+remapped+inconsistent+backfill_toofull

  io:
    recovery: 183 MiB/s, 312 objects/s
The CephFS metadata pool is not affected by the inconsistent PGs.
The MDSs all show this line in their log files:
"Monitors have assigned me to become a standby."
The filesystem is joinable:
# ceph fs lsflags storage_cluster
joinable allow_snaps allow_multimds_snaps refuse_client_session
But no MDS joins:
# ceph fs status
storage_cluster - 0 clients
===============
RANK  STATE   MDS  ACTIVITY  DNS  INOS  DIRS  CAPS
 0    failed
      POOL         TYPE     USED  AVAIL
cephfs_metadata  metadata   490G  12.9T
  cephfs_data      data     970T  54.1T
  shared_data      data    1351T  22.5T
       STANDBY MDS
storage_cluster.sn04.cbvzzu
MDS version: ceph version 18.2.4 (e7ad5345525c7aa95470c26863873b581076945d) reef (stable)
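My suspicion is that the "1 mds daemon damaged" in HEALTH_ERR means rank 0 is listed in the MDSMap's damaged set, which as far as I understand keeps the monitors from assigning any standby to it even though the filesystem is flagged joinable. A minimal sketch of how I checked this from the JSON dump (the sample document below is made up; field names assumed to match the reef `ceph fs dump -f json` output):

```python
import json

# Illustrative excerpt of `ceph fs dump -f json` output (made-up sample;
# the real dump contains many more fields per filesystem).
dump = json.loads("""
{
  "filesystems": [
    {
      "mdsmap": {
        "fs_name": "storage_cluster",
        "damaged": [0],
        "failed": []
      }
    }
  ]
}
""")

for fs in dump["filesystems"]:
    mdsmap = fs["mdsmap"]
    if mdsmap["damaged"]:
        # A rank listed in "damaged" will not be handed to a standby
        # until it is explicitly marked repaired, e.g.:
        #   ceph mds repaired <fs_name>:<rank>
        print(f"{mdsmap['fs_name']}: damaged ranks {mdsmap['damaged']}")
```

If rank 0 does show up as damaged there, my understanding is that marking it repaired (once the underlying metadata damage is dealt with) should let the standby claim the rank, but I would like confirmation before running that on 3 PB.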
Why does the standby MDS not pick up rank 0?
Regards
--
Robert Sander
Linux Consultant
Heinlein Consulting GmbH
Schwedter Str. 8/9b, 10119 Berlin
https://www.heinlein-support.de
Tel: +49 30 405051 - 0
Fax: +49 30 405051 - 19
Amtsgericht Berlin-Charlottenburg - HRB 220009 B
Geschäftsführer: Peer Heinlein - Sitz: Berlin
_______________________________________________
ceph-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]