Hi Dev,

Please also share the output of 'ceph config dump' (with any passwords or other confidential information removed from the output) so we can look for configuration issues. Please also share the manufacturer and model of the drives.
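A quick way to mask secrets before pasting the dump — a sketch assuming a simple awk filter named `redact` (the field layout of `ceph config dump` varies by release, so adjust the pattern to taste):

```shell
# Hypothetical filter: replace the last field on any line that
# mentions key/password/secret before sharing the dump.
redact() { awk 'tolower($0) ~ /key|password|secret/ {$NF="<redacted>"} {print}'; }

# On a live cluster (not run here):
#   ceph config dump | redact

# Demonstration on a sample line:
echo "mon keyring /etc/ceph/keyring AQD123==" | redact
```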
Did you check /var/log/messages for any hardware or network issues?

Regards,
Frédéric.

----- On Jun 18, 25, at 23:33, Anthony D'Atri [email protected] wrote:

>> On Jun 18, 2025, at 5:03 PM, Devender Singh <[email protected]> wrote:
>>
>> Hello all
>>
>> Need urgent help on the below…
>> I tried reducing min_size but it still shows the same…
>>
>>   cluster:
>>     id:     15688cb4-044a-11ec-942e-516035adea04
>>     health: HEALTH_ERR
>>             3 failed cephadm daemon(s)
>>             1 filesystem is degraded
>>             1 MDSs report slow metadata IOs
>>             20/16670718 objects unfound (0.000%)
>
> Did you suffer a power outage or something?
> What are your OSDs? HDD? SSD? If SSD, are they *enterprise*, not client?
>
>>             Reduced data availability: 283 pgs inactive, 464 pgs incomplete
>>             Possible data damage: 2 pgs recovery_unfound
>>             Degraded data redundancy: 42998/61175329 objects degraded (0.070%), 2 pgs degraded, 1 pg undersized
>>             304 pgs not deep-scrubbed in time
>>             1055 slow ops, oldest one blocked for 94042 sec, daemons
>>             [osd.1,osd.12,osd.13,osd.15,osd.16,osd.19,osd.20,osd.21,osd.26,osd.27]... have slow ops.
>
> Look for a pattern. Are these all on the same host? Send `ceph osd tree`.
>
> In the meantime run `ceph osd down 1`, wait for a couple of minutes for recovery, and see if that improves the numbers.
> If it does, repeat with the other OSDs above, waiting for the PGs to peer and all 31 OSDs to show up before proceeding to the next.
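A dry-run sketch of that per-OSD cycle (the OSD list and the fixed 120-second pause are assumptions; in practice, watch `ceph -s` until the PGs re-peer and all 31 OSDs are back up before moving to the next one):

```shell
# Print (dry run) or execute `ceph osd down <id>` for each slow OSD,
# pausing between OSDs so PGs can re-peer. Set DRY_RUN=0 to apply.
OSDS="1 12 13 15 16 19 20 21 26 27"
DRY_RUN=1
for id in $OSDS; do
  if [ "$DRY_RUN" = 1 ]; then
    echo "ceph osd down $id"
  else
    ceph osd down "$id"
    sleep 120   # crude stand-in for "wait until peering settles"
  fi
done
```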
>
>
>>
>>   services:
>>     mon:        5 daemons, quorum van2-converged04n,van2-converged05n,van2-converged01n,van2-converged03n,van2-converged02n (age 17h)
>>     mgr:        van2-converged05n.lybhho(active, since 22h), standbys: van2-converged04n.azmqik
>>     mds:        3/3 daemons up, 11 standby
>>     osd:        31 osds: 31 up (since 5h), 31 in (since 5h); 2 remapped pgs
>>     rbd-mirror: 2 daemons active (2 hosts)
>>
>>   data:
>>     volumes: 1/2 healthy, 1 recovering
>>     pools:   32 pools, 1505 pgs
>>     objects: 16.67M objects, 56 TiB
>>     usage:   152 TiB used, 234 TiB / 387 TiB avail
>>     pgs:     0.332% pgs unknown
>>              30.897% pgs not active
>>              42998/61175329 objects degraded (0.070%)
>>              21469/61175329 objects misplaced (0.035%)
>>              20/16670718 objects unfound (0.000%)
>>              1034 active+clean
>>              463  incomplete
>>              5    unknown
>>              1    remapped+incomplete
>>              1    active+recovery_unfound+degraded
>>              1    recovery_unfound+undersized+degraded+remapped+peered
>>
>>   progress:
>>     Global Recovery Event (22h)
>>       [===================.........] (remaining: 10h)
>>
>> [WRN] SLOW_OPS: 1055 slow ops, oldest one blocked for 93877 sec, daemons
>>     [osd.1,osd.12,osd.13,osd.15,osd.16,osd.19,osd.20,osd.21,osd.26,osd.27]... have slow ops.
>> [WRN] MDS_SLOW_METADATA_IO: 1 MDSs report slow metadata IOs
>>     mds.van2.van2-converged05n.mbbzfj(mds.1): 1 slow metadata IOs are blocked > 30 secs, oldest blocked for 62279 secs
>> [WRN] OBJECT_UNFOUND: 20/16670718 objects unfound (0.000%)
>>     pg 28.1f has 9 unfound objects
>>     pg 29.2 has 11 unfound objects
>> [WRN] PG_AVAILABILITY: Reduced data availability: 470 pgs inactive, 464 pgs incomplete
>>     pg 16.58 is incomplete, acting [7,14,19] (reducing pool lv-r3-for-ec-disks min_size from 2 may help; search ceph.com/docs for 'incomplete')
>>     pg 16.5c is incomplete, acting [26,3,19] (reducing pool lv-r3-for-ec-disks min_size from 2 may help; search ceph.com/docs for 'incomplete')
>>     pg 16.5f is incomplete, acting [13,6,21] (reducing pool lv-r3-for-ec-disks min_size from 2 may help; search ceph.com/docs for 'incomplete')
>>     pg 16.63 is incomplete, acting [9,7,15] (reducing pool lv-r3-for-ec-disks min_size from 2 may help; search ceph.com/docs for 'incomplete')
>>     pg 16.64 is incomplete, acting [18,19,5] (reducing pool lv-r3-for-ec-disks min_size from 2 may help; search ceph.com/docs for 'incomplete')
>>     pg 16.68 is incomplete, acting [2,5,27] (reducing pool lv-r3-for-ec-disks min_size from 2 may help; search ceph.com/docs for 'incomplete')
>>     pg 16.70 is incomplete, acting [19,1,24] (reducing pool lv-r3-for-ec-disks min_size from 2 may help; search ceph.com/docs for 'incomplete')
>>     pg 16.74 is incomplete, acting [8,25,14] (reducing pool lv-r3-for-ec-disks min_size from 2 may help; search ceph.com/docs for 'incomplete')
>>     pg 16.78 is incomplete, acting [23,14,29] (reducing pool lv-r3-for-ec-disks min_size from 2 may help; search ceph.com/docs for 'incomplete')
>>     pg 16.7c is incomplete, acting [24,10,29] (reducing pool lv-r3-for-ec-disks min_size from 2 may help; search ceph.com/docs for 'incomplete')
>>     pg 16.7d is incomplete, acting [12,22,13] (reducing pool lv-r3-for-ec-disks min_size from 2 may help; search ceph.com/docs for 'incomplete')
>>     pg 17.64 is incomplete, acting [20,11,13] (reducing pool lv-r3-for-ec-large-disks min_size from 2 may help; search ceph.com/docs for 'incomplete')
>>     pg 17.66 is incomplete, acting [21,13,27] (reducing pool lv-r3-for-ec-large-disks min_size from 2 may help; search ceph.com/docs for 'incomplete')
>>     pg 17.6a is incomplete, acting [4,26,13] (reducing pool lv-r3-for-ec-large-disks min_size from 2 may help; search ceph.com/docs for 'incomplete')
>>     pg 17.6c is incomplete, acting [13,12,31] (reducing pool lv-r3-for-ec-large-disks min_size from 2 may help; search ceph.com/docs for 'incomplete')
>>     pg 17.6e is incomplete, acting [8,31,24] (reducing pool lv-r3-for-ec-large-disks min_size from 2 may help; search ceph.com/docs for 'incomplete')
>>     pg 17.7a is incomplete, acting [8,28,27] (reducing pool lv-r3-for-ec-large-disks min_size from 2 may help; search ceph.com/docs for 'incomplete')
>>     pg 19.54 is stuck inactive since forever, current state incomplete, last acting [10,4,20,15,11] (reducing pool lvp-ec-large-disks min_size from 3 may help; search ceph.com/docs for 'incomplete')
>>     pg 19.58 is incomplete, acting [15,18,12,28,20] (reducing pool lvp-ec-large-disks min_size from 3 may help; search ceph.com/docs for 'incomplete')
>>     pg 19.59 is incomplete, acting [6,9,18,21,26] (reducing pool lvp-ec-large-disks min_size from 3 may help; search ceph.com/docs for 'incomplete')
>>     pg 19.5a is incomplete, acting [11,31,20,17,24] (reducing pool lvp-ec-large-disks min_size from 3 may help; search ceph.com/docs for 'incomplete')
>>
>> Any way to make these scrubs complete faster?
>>
>>     pg 19.63 not deep-scrubbed since 2025-05-24T08:18:29.738427+0000
>>     pg 34.6d not deep-scrubbed since 2025-06-04T08:33:54.534882+0000
>>     pg 16.5f not deep-scrubbed since 2025-05-21T10:44:09.996254+0000
>>     pg 19.5d not deep-scrubbed since 2025-05-26T09:36:27.064154+0000
>>     pg 19.5e not deep-scrubbed since 2025-06-05T00:52:03.859984+0000
>>     pg 16.5c not deep-scrubbed since 2025-06-06T00:36:22.021390+0000
>>     pg 19.5f not deep-scrubbed since 2025-06-03T16:27:42.356213+0000
>>     pg 22.5a not deep-scrubbed since 2025-06-05T23:00:28.066065+0000
>>     pg 34.69 not deep-scrubbed since 2025-06-03T05:07:58.209808+0000
>>     pg 19.58 not deep-scrubbed since 2025-05-27T23:32:29.963976+0000
>>     pg 19.59 not deep-scrubbed since 2025-05-25T11:50:44.735318+0000
>>     pg 19.5a not deep-scrubbed since 2025-06-06T02:34:05.486126+0000
>>     pg 16.58 not deep-scrubbed since 2025-05-13T14:10:44.570493+0000
>>
>> Regards
>> Dev
>> _______________________________________________
>> ceph-users mailing list -- [email protected]
>> To unsubscribe send an email to [email protected]
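On the "make these scrubs complete faster" question — a few levers that commonly help, shown here as echoed commands rather than executed (a sketch: verify option names against your Ceph release, and note that deep scrubs only run on active PGs, so the incomplete PGs have to recover first):

```shell
# Possible knobs for catching up on overdue deep scrubs
# (echoed rather than applied; drop the echo to run for real):
echo "ceph config set osd osd_max_scrubs 2"     # allow more concurrent scrubs per OSD
echo "ceph config set osd osd_scrub_sleep 0"    # remove inter-chunk throttling
echo "ceph pg deep-scrub 19.63"                 # kick one lagging PG directly
```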
