Hi,

In a previous thread you wrote that you had multiple simultaneous disk failures and replaced all of the drives. I assume the failures happened across different hosts? And the remaining hosts and OSDs were not able to recover? I'm just trying to get a better picture of what exactly happened.

It would help to see a PG query output, in particular the "blocked_by" info. If it points to a newly deployed OSD, that OSD most likely doesn't have the data needed to let the PG recover. It would also help to see the osd tree; if you know the IDs of the redeployed OSDs, mark them so the pg query output can be mapped to them. But it does look bad indeed, and apparently you have already accepted that there will be data loss (--op mark-complete). Judging by the "test" names of the monitors, this appears to be a test cluster? If you are facing data loss anyway, could it be easier to just recreate those pools? Although it would be interesting to know what led to the data loss. Was it a wrong failure domain or just bad luck?
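
For example, something along these lines (the pgid 19.5a and the pool name are just taken from your output below, adjust as needed):

# query one of the incomplete PGs and look at the "blocked_by" entries in the peer info
ceph pg 19.5a query | grep -B2 -A5 blocked_by

# or, if you have jq, inspect the full peering history
ceph pg 19.5a query | jq '.recovery_state'

# the osd tree, to map the redeployed OSD IDs against the query output
ceph osd tree

# and regarding the failure domain question, the crush rule of one of the affected pools
ceph osd pool get lvp-ec-large-disks crush_rule
ceph osd crush rule dump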

Regards,
Eugen

Quoting Devender Singh <[email protected]>:

Hello

Seeking some help to recover these incomplete PGs…

I marked the OSDs down and then tried the following, but nothing seems to be working…
Ceph version 18.2.7; we have multiple failed disks in the cluster, across different nodes…

for i in 19.5a 19.5d 19.5e 19.6b 19.6f 19.74 19.7b 34.48 34.69; do
  # only run mark-complete if the pg map output shows a "1" somewhere in the acting set
  ceph pg map $i | grep -q "acting.*1" && \
    ceph-objectstore-tool --data-path /var/lib/ceph/15688cb4-044a-11ec-942e-516035adea04/osd.17 \
      --op mark-complete --pgid $i --force
done

#

for i in 19.5a 19.5d 19.5e 19.6b 19.6f 19.74 19.7b 34.48 34.69; do
  # same check, then run a repair on the PG via osd.17's object store
  ceph pg map $i | grep -q "acting.*1" && \
    ceph-objectstore-tool --data-path /var/lib/ceph/15688cb4-044a-11ec-942e-516035adea04/osd.17 \
      --op repair --pgid $i --force
done &


    pg 16.48 is incomplete, acting [20,9,17]
    pg 16.50 is incomplete, acting [20,27,5]
    pg 16.68 is incomplete, acting [2,5,27]
    pg 16.78 is incomplete, acting [23,14,29]
    pg 17.4f is incomplete, acting [10,30,18] (reducing pool lv-r3-for-ec-large-disks min_size from 2 may help; search ceph.com/docs for 'incomplete')
    pg 17.7a is incomplete, acting [8,28,27] (reducing pool lv-r3-for-ec-large-disks min_size from 2 may help; search ceph.com/docs for 'incomplete')
    pg 19.44 is incomplete, acting [10,4,21,29,0] (reducing pool lvp-ec-large-disks min_size from 3 may help; search ceph.com/docs for 'incomplete')
    pg 19.45 is stuck inactive since forever, current state incomplete, last acting [10,28,24,4,0] (reducing pool lvp-ec-large-disks min_size from 3 may help; search ceph.com/docs for 'incomplete')
    pg 19.49 is incomplete, acting [28,10,26,0,15] (reducing pool lvp-ec-large-disks min_size from 3 may help; search ceph.com/docs for 'incomplete')
    pg 19.4a is incomplete, acting [21,1,23,0,26] (reducing pool lvp-ec-large-disks min_size from 3 may help; search ceph.com/docs for 'incomplete')
    pg 19.4b is incomplete, acting [6,12,17,30,7] (reducing pool lvp-ec-large-disks min_size from 3 may help; search ceph.com/docs for 'incomplete')
    pg 19.4c is incomplete, acting [5,0,8,21,26] (reducing pool lvp-ec-large-disks min_size from 3 may help; search ceph.com/docs for 'incomplete')
    pg 19.4f is incomplete, acting [26,14,5,7,9] (reducing pool lvp-ec-large-disks min_size from 3 may help; search ceph.com/docs for 'incomplete')
    pg 19.51 is incomplete, acting [20,14,28,26,23] (reducing pool lvp-ec-large-disks min_size from 3 may help; search ceph.com/docs for 'incomplete')
    pg 19.52 is incomplete, acting [10,26,22,30,7] (reducing pool lvp-ec-large-disks min_size from 3 may help; search ceph.com/docs for 'incomplete')
    pg 19.53 is incomplete, acting [12,24,25,23,22] (reducing pool lvp-ec-large-disks min_size from 3 may help; search ceph.com/docs for 'incomplete')
    pg 19.54 is incomplete, acting [10,4,20,15,11] (reducing pool lvp-ec-large-disks min_size from 3 may help; search ceph.com/docs for 'incomplete')
    pg 19.57 is incomplete, acting [15,22,9,0,14] (reducing pool lvp-ec-large-disks min_size from 3 may help; search ceph.com/docs for 'incomplete')
    pg 19.58 is incomplete, acting [15,0,12,28,20] (reducing pool lvp-ec-large-disks min_size from 3 may help; search ceph.com/docs for 'incomplete')
    pg 19.59 is incomplete, acting [6,9,0,21,26] (reducing pool lvp-ec-large-disks min_size from 3 may help; search ceph.com/docs for 'incomplete')
    pg 19.5a is incomplete, acting [11,31,20,17,24] (reducing pool lvp-ec-large-disks min_size from 3 may help; search ceph.com/docs for 'incomplete')
    pg 19.5d is incomplete, acting [30,19,4,17,15] (reducing pool lvp-ec-large-disks min_size from 3 may help; search ceph.com/docs for 'incomplete')
    pg 19.5e is incomplete, acting [3,9,10,17,16] (reducing pool lvp-ec-large-disks min_size from 3 may help; search ceph.com/docs for 'incomplete')
    pg 19.5f is remapped+incomplete, acting [22] (reducing pool lvp-ec-large-disks min_size from 3 may help; search ceph.com/docs for 'incomplete')
    pg 19.60 is incomplete, acting [25,27,23,4,22] (reducing pool lvp-ec-large-disks min_size from 3 may help; search ceph.com/docs for 'incomplete')
    pg 19.61 is remapped+incomplete, acting [23] (reducing pool lvp-ec-large-disks min_size from 3 may help; search ceph.com/docs for 'incomplete')
    pg 19.62 is remapped+incomplete, acting [23] (reducing pool lvp-ec-large-disks min_size from 3 may help; search ceph.com/docs for 'incomplete')
    pg 19.63 is incomplete, acting [15,4,1,22,7] (reducing pool lvp-ec-large-disks min_size from 3 may help; search ceph.com/docs for 'incomplete')
    pg 19.64 is incomplete, acting [4,7,5,29,31] (reducing pool lvp-ec-large-disks min_size from 3 may help; search ceph.com/docs for 'incomplete')
    pg 19.65 is incomplete, acting [3,9,15,10,19] (reducing pool lvp-ec-large-disks min_size from 3 may help; search ceph.com/docs for 'incomplete')
    pg 19.6b is remapped+incomplete, acting [1] (reducing pool lvp-ec-large-disks min_size from 3 may help; search ceph.com/docs for 'incomplete')
    pg 19.6f is incomplete, acting [30,16,17,18,10] (reducing pool lvp-ec-large-disks min_size from 3 may help; search ceph.com/docs for 'incomplete')
    pg 19.72 is remapped+incomplete, acting [19] (reducing pool lvp-ec-large-disks min_size from 3 may help; search ceph.com/docs for 'incomplete')
    pg 19.73 is incomplete, acting [12,27,13,16,11] (reducing pool lvp-ec-large-disks min_size from 3 may help; search ceph.com/docs for 'incomplete')
    pg 19.74 is incomplete, acting [17,18,19,4,29] (reducing pool lvp-ec-large-disks min_size from 3 may help; search ceph.com/docs for 'incomplete')
    pg 19.75 is incomplete, acting [8,31,20,15,7] (reducing pool lvp-ec-large-disks min_size from 3 may help; search ceph.com/docs for 'incomplete')
    pg 19.76 is incomplete, acting [5,26,18,30,21] (reducing pool lvp-ec-large-disks min_size from 3 may help; search ceph.com/docs for 'incomplete')
    pg 19.78 is incomplete, acting [13,26,14,10,16] (reducing pool lvp-ec-large-disks min_size from 3 may help; search ceph.com/docs for 'incomplete')
    pg 19.7a is incomplete, acting [15,1,16,12,6] (reducing pool lvp-ec-large-disks min_size from 3 may help; search ceph.com/docs for 'incomplete')
    pg 19.7b is incomplete, acting [2,17,15,29,9] (reducing pool lvp-ec-large-disks min_size from 3 may help; search ceph.com/docs for 'incomplete')
    pg 19.7c is incomplete, acting [28,19,6,4,26] (reducing pool lvp-ec-large-disks min_size from 3 may help; search ceph.com/docs for 'incomplete')
    pg 19.7f is remapped+incomplete, acting [23] (reducing pool lvp-ec-large-disks min_size from 3 may help; search ceph.com/docs for 'incomplete')
    pg 22.4b is incomplete, acting [14,7,23]
    pg 22.51 is incomplete, acting [11,8,2]
    pg 22.5a is incomplete, acting [31,15,4]
    pg 34.48 is incomplete, acting [31,19,17]
    pg 34.51 is incomplete, acting [15,12,20]
    pg 34.65 is incomplete, acting [30,18,20]
    pg 34.6d is incomplete, acting [30,12,21]
    pg 34.7c is incomplete, acting [31,9,3]


# ceph -s
  cluster:
    id:     15688cb4-044a-11ec-942e-516035adea04
    health: HEALTH_ERR
            4 OSD(s) experiencing slow operations in BlueStore
            1 failed cephadm daemon(s)
            1 filesystem is degraded
            1 MDSs report slow metadata IOs
            1/16510107 objects unfound (0.000%)
            Reduced data availability: 449 pgs inactive, 444 pgs incomplete
            Possible data damage: 1 pg recovery_unfound
            Degraded data redundancy: 3/60411723 objects degraded (0.000%), 1 pg degraded
            343 pgs not deep-scrubbed in time
            63 slow ops, oldest one blocked for 262107 sec, daemons [osd.1,osd.15,osd.16,osd.18,osd.19,osd.2,osd.21,osd.25,osd.26,osd.27]... have slow ops.

  services:
    mon:        7 daemons, quorum test-host04n,test-host05n,test-host02n,test-host03n,test-host06n,test-host07n,test-host08n (age 5h)
    mgr:        test-host04n.azmqik(active, since 5h), standbys: test-host05n.lybhho
    mds:        3/3 daemons up, 11 standby
    osd:        32 osds: 32 up (since 18m), 32 in (since 18m); 36 remapped pgs
    rbd-mirror: 2 daemons active (2 hosts)

  data:
    volumes: 1/2 healthy, 1 recovering
    pools:   32 pools, 1505 pgs
    objects: 16.51M objects, 55 TiB
    usage:   155 TiB used, 246 TiB / 401 TiB avail
    pgs:     0.332% pgs unknown
             29.502% pgs not active
             3/60411723 objects degraded (0.000%)
             1250071/60411723 objects misplaced (2.069%)
             1/16510107 objects unfound (0.000%)
             1022 active+clean
             438  incomplete
             16   active+remapped+backfilling
             14   active+remapped+backfill_wait
             6    remapped+incomplete
             5    unknown
             3    active+clean+scrubbing+deep
             1    active+recovery_unfound+degraded

  io:
    recovery: 1.3 GiB/s, 345 objects/s

  progress:
    Global Recovery Event (4h)
      [===================.........] (remaining: 2h)

Regards
Dev



_______________________________________________
ceph-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]

