Hi,

In a previous thread you wrote that you had multiple simultaneous disk failures and replaced all of the drives. I assume the failures happened across different hosts? And the remaining hosts and OSDs were not able to recover? I'm just trying to get a better picture of what exactly happened.

It would help to see a PG query output, in particular the "blocked_by" info. If it points to a newly deployed OSD, that OSD most likely doesn't have the data needed to let the PG recover. It would also help to see the osd tree; if you know the IDs of the redeployed OSDs, mark them so the pg query output can be mapped to them. But it does look bad indeed, and apparently you have already accepted that there will be data loss (--op mark-complete). Judging by the "test" names of the monitors, this appears to be a test cluster? If you are facing data loss anyway, could it be easier to just recreate those pools? Although it would be interesting to know what led to the data loss. Was it a wrong failure domain or just bad luck?
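
For example, something along these lines (the pgid 19.5a and the pool name are just taken from your output below, adjust as needed):

# query one of the incomplete PGs and look at the "blocked_by" entries in the peer info
ceph pg 19.5a query | grep -B2 -A5 blocked_by

# or, if you have jq, inspect the full peering history
ceph pg 19.5a query | jq '.recovery_state'

# the osd tree, to map the redeployed OSD IDs against the query output
ceph osd tree

# and regarding the failure domain question, the crush rule of one of the affected pools
ceph osd pool get lvp-ec-large-disks crush_rule
ceph osd crush rule dump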

Regards,
Eugen

Quoting Devender Singh <[email protected]>:

Hello

Seeking some help to recover these incomplete PGs…

I marked the OSDs down and then tried the following, but nothing seems to be working…
Ceph version 18.2.7; we have multiple failed disks in the cluster, across different nodes…

for i in 19.5a 19.5d 19.5e 19.6b 19.6f 19.74 19.7b 34.48 34.69; do
  # only run mark-complete if the pg map output shows a "1" somewhere in the acting set
  ceph pg map $i | grep -q "acting.*1" && \
    ceph-objectstore-tool --data-path /var/lib/ceph/15688cb4-044a-11ec-942e-516035adea04/osd.17 \
      --op mark-complete --pgid $i --force
done

#

for i in 19.5a 19.5d 19.5e 19.6b 19.6f 19.74 19.7b 34.48 34.69; do
  # same check, then run a repair on the PG via osd.17's object store
  ceph pg map $i | grep -q "acting.*1" && \
    ceph-objectstore-tool --data-path /var/lib/ceph/15688cb4-044a-11ec-942e-516035adea04/osd.17 \
      --op repair --pgid $i --force
done &


    pg 16.48 is incomplete, acting [20,9,17]
    pg 16.50 is incomplete, acting [20,27,5]
    pg 16.68 is incomplete, acting [2,5,27]
    pg 16.78 is incomplete, acting [23,14,29]
    pg 17.4f is incomplete, acting [10,30,18] (reducing pool lv-r3-for-ec-large-disks min_size from 2 may help; search ceph.com/docs for 'incomplete')
    pg 17.7a is incomplete, acting [8,28,27] (reducing pool lv-r3-for-ec-large-disks min_size from 2 may help; search ceph.com/docs for 'incomplete')
    pg 19.44 is incomplete, acting [10,4,21,29,0] (reducing pool lvp-ec-large-disks min_size from 3 may help; search ceph.com/docs for 'incomplete')
    pg 19.45 is stuck inactive since forever, current state incomplete, last acting [10,28,24,4,0] (reducing pool lvp-ec-large-disks min_size from 3 may help; search ceph.com/docs for 'incomplete')
    pg 19.49 is incomplete, acting [28,10,26,0,15] (reducing pool lvp-ec-large-disks min_size from 3 may help; search ceph.com/docs for 'incomplete')
    pg 19.4a is incomplete, acting [21,1,23,0,26] (reducing pool lvp-ec-large-disks min_size from 3 may help; search ceph.com/docs for 'incomplete')
    pg 19.4b is incomplete, acting [6,12,17,30,7] (reducing pool lvp-ec-large-disks min_size from 3 may help; search ceph.com/docs for 'incomplete')
    pg 19.4c is incomplete, acting [5,0,8,21,26] (reducing pool lvp-ec-large-disks min_size from 3 may help; search ceph.com/docs for 'incomplete')
    pg 19.4f is incomplete, acting [26,14,5,7,9] (reducing pool lvp-ec-large-disks min_size from 3 may help; search ceph.com/docs for 'incomplete')
    pg 19.51 is incomplete, acting [20,14,28,26,23] (reducing pool lvp-ec-large-disks min_size from 3 may help; search ceph.com/docs for 'incomplete')
    pg 19.52 is incomplete, acting [10,26,22,30,7] (reducing pool lvp-ec-large-disks min_size from 3 may help; search ceph.com/docs for 'incomplete')
    pg 19.53 is incomplete, acting [12,24,25,23,22] (reducing pool lvp-ec-large-disks min_size from 3 may help; search ceph.com/docs for 'incomplete')
    pg 19.54 is incomplete, acting [10,4,20,15,11] (reducing pool lvp-ec-large-disks min_size from 3 may help; search ceph.com/docs for 'incomplete')
    pg 19.57 is incomplete, acting [15,22,9,0,14] (reducing pool lvp-ec-large-disks min_size from 3 may help; search ceph.com/docs for 'incomplete')
    pg 19.58 is incomplete, acting [15,0,12,28,20] (reducing pool lvp-ec-large-disks min_size from 3 may help; search ceph.com/docs for 'incomplete')
    pg 19.59 is incomplete, acting [6,9,0,21,26] (reducing pool lvp-ec-large-disks min_size from 3 may help; search ceph.com/docs for 'incomplete')
    pg 19.5a is incomplete, acting [11,31,20,17,24] (reducing pool lvp-ec-large-disks min_size from 3 may help; search ceph.com/docs for 'incomplete')
    pg 19.5d is incomplete, acting [30,19,4,17,15] (reducing pool lvp-ec-large-disks min_size from 3 may help; search ceph.com/docs for 'incomplete')
    pg 19.5e is incomplete, acting [3,9,10,17,16] (reducing pool lvp-ec-large-disks min_size from 3 may help; search ceph.com/docs for 'incomplete')
    pg 19.5f is remapped+incomplete, acting [22] (reducing pool lvp-ec-large-disks min_size from 3 may help; search ceph.com/docs for 'incomplete')
    pg 19.60 is incomplete, acting [25,27,23,4,22] (reducing pool lvp-ec-large-disks min_size from 3 may help; search ceph.com/docs for 'incomplete')
    pg 19.61 is remapped+incomplete, acting [23] (reducing pool lvp-ec-large-disks min_size from 3 may help; search ceph.com/docs for 'incomplete')
    pg 19.62 is remapped+incomplete, acting [23] (reducing pool lvp-ec-large-disks min_size from 3 may help; search ceph.com/docs for 'incomplete')
    pg 19.63 is incomplete, acting [15,4,1,22,7] (reducing pool lvp-ec-large-disks min_size from 3 may help; search ceph.com/docs for 'incomplete')
    pg 19.64 is incomplete, acting [4,7,5,29,31] (reducing pool lvp-ec-large-disks min_size from 3 may help; search ceph.com/docs for 'incomplete')
    pg 19.65 is incomplete, acting [3,9,15,10,19] (reducing pool lvp-ec-large-disks min_size from 3 may help; search ceph.com/docs for 'incomplete')
    pg 19.6b is remapped+incomplete, acting [1] (reducing pool lvp-ec-large-disks min_size from 3 may help; search ceph.com/docs for 'incomplete')
    pg 19.6f is incomplete, acting [30,16,17,18,10] (reducing pool lvp-ec-large-disks min_size from 3 may help; search ceph.com/docs for 'incomplete')
    pg 19.72 is remapped+incomplete, acting [19] (reducing pool lvp-ec-large-disks min_size from 3 may help; search ceph.com/docs for 'incomplete')
    pg 19.73 is incomplete, acting [12,27,13,16,11] (reducing pool lvp-ec-large-disks min_size from 3 may help; search ceph.com/docs for 'incomplete')
    pg 19.74 is incomplete, acting [17,18,19,4,29] (reducing pool lvp-ec-large-disks min_size from 3 may help; search ceph.com/docs for 'incomplete')
    pg 19.75 is incomplete, acting [8,31,20,15,7] (reducing pool lvp-ec-large-disks min_size from 3 may help; search ceph.com/docs for 'incomplete')
    pg 19.76 is incomplete, acting [5,26,18,30,21] (reducing pool lvp-ec-large-disks min_size from 3 may help; search ceph.com/docs for 'incomplete')
    pg 19.78 is incomplete, acting [13,26,14,10,16] (reducing pool lvp-ec-large-disks min_size from 3 may help; search ceph.com/docs for 'incomplete')
    pg 19.7a is incomplete, acting [15,1,16,12,6] (reducing pool lvp-ec-large-disks min_size from 3 may help; search ceph.com/docs for 'incomplete')
    pg 19.7b is incomplete, acting [2,17,15,29,9] (reducing pool lvp-ec-large-disks min_size from 3 may help; search ceph.com/docs for 'incomplete')
    pg 19.7c is incomplete, acting [28,19,6,4,26] (reducing pool lvp-ec-large-disks min_size from 3 may help; search ceph.com/docs for 'incomplete')
    pg 19.7f is remapped+incomplete, acting [23] (reducing pool lvp-ec-large-disks min_size from 3 may help; search ceph.com/docs for 'incomplete')
    pg 22.4b is incomplete, acting [14,7,23]
    pg 22.51 is incomplete, acting [11,8,2]
    pg 22.5a is incomplete, acting [31,15,4]
    pg 34.48 is incomplete, acting [31,19,17]
    pg 34.51 is incomplete, acting [15,12,20]
    pg 34.65 is incomplete, acting [30,18,20]
    pg 34.6d is incomplete, acting [30,12,21]
    pg 34.7c is incomplete, acting [31,9,3]


# ceph -s
  cluster:
    id:     15688cb4-044a-11ec-942e-516035adea04
    health: HEALTH_ERR
            4 OSD(s) experiencing slow operations in BlueStore
            1 failed cephadm daemon(s)
            1 filesystem is degraded
            1 MDSs report slow metadata IOs
            1/16510107 objects unfound (0.000%)
            Reduced data availability: 449 pgs inactive, 444 pgs incomplete
            Possible data damage: 1 pg recovery_unfound
            Degraded data redundancy: 3/60411723 objects degraded (0.000%), 1 pg degraded
            343 pgs not deep-scrubbed in time
            63 slow ops, oldest one blocked for 262107 sec, daemons [osd.1,osd.15,osd.16,osd.18,osd.19,osd.2,osd.21,osd.25,osd.26,osd.27]... have slow ops.

  services:
    mon:        7 daemons, quorum test-host04n,test-host05n,test-host02n,test-host03n,test-host06n,test-host07n,test-host08n (age 5h)
    mgr:        test-host04n.azmqik(active, since 5h), standbys: test-host05n.lybhho
    mds:        3/3 daemons up, 11 standby
    osd:        32 osds: 32 up (since 18m), 32 in (since 18m); 36 remapped pgs
    rbd-mirror: 2 daemons active (2 hosts)

  data:
    volumes: 1/2 healthy, 1 recovering
    pools:   32 pools, 1505 pgs
    objects: 16.51M objects, 55 TiB
    usage:   155 TiB used, 246 TiB / 401 TiB avail
    pgs:     0.332% pgs unknown
             29.502% pgs not active
             3/60411723 objects degraded (0.000%)
             1250071/60411723 objects misplaced (2.069%)
             1/16510107 objects unfound (0.000%)
             1022 active+clean
             438  incomplete
             16   active+remapped+backfilling
             14   active+remapped+backfill_wait
             6    remapped+incomplete
             5    unknown
             3    active+clean+scrubbing+deep
             1    active+recovery_unfound+degraded

  io:
    recovery: 1.3 GiB/s, 345 objects/s

  progress:
    Global Recovery Event (4h)
      [===================.........] (remaining: 2h)

Regards
Dev



_______________________________________________
ceph-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]

