Martin Conway wrote:
> I find that backfilling and possibly scrubbing often comes to a halt for
> no apparent reason. If I put a server into maintenance mode or kill and
> restart OSDs it bursts back into life again.
> 
> Not sure how to diagnose why the recovery processes have stalled.

My cluster is in this stalled state now, and I have saved some details below.

The stuck PGs seem to point quite heavily to OSD.32 and OSD.33, but there is
nothing of note in their logs. They were experiencing slow ops last night,
and this morning they have logged nothing. I am certain recovery and
scrubbing would resume if I restarted those OSDs, but it would be nice to
know what keeps causing this.

ceph -s
  cluster:
    id:     16bb4f7a-cf04-4667-aeee-94ce7f6ab672
    health: HEALTH_WARN
            441 pgs not deep-scrubbed in time
            43 pgs not scrubbed in time

  services:
    mon: 5 daemons, quorum scustor3,scustor2,scustor1,scustor4,scustor5 (age 23h)
    mgr: scustor3.wplaov(active, since 2d), standbys: scustor4.giyegr, scustor1.luywbi, scustor2.ncfaec
    mds: 2/2 daemons up, 1 standby
    osd: 31 osds: 31 up (since 23h), 31 in (since 4d); 54 remapped pgs

  data:
    volumes: 1/1 healthy
    pools:   8 pools, 897 pgs
    objects: 48.52M objects, 47 TiB
    usage:   130 TiB used, 130 TiB / 260 TiB avail
    pgs:     3497437/145569867 objects misplaced (2.403%)
             757 active+clean
             70  active+clean+scrubbing
             48  active+remapped+backfilling
             14  active+clean+scrubbing+deep
             6   active+recovering+remapped
             2   active+recovering

  io:
    client:   24 KiB/s rd, 416 KiB/s wr, 1 op/s rd, 44 op/s wr


ceph pg dump
https://pastebin.com/raw/KPBie7SD

ceph pg dump_stuck
PG_STAT  STATE                        UP          UP_PRIMARY  ACTING      ACTING_PRIMARY
5.1f4    active+remapped+backfilling   [32,5,18]          32   [32,5,21]              32
5.1f0    active+remapped+backfilling   [32,2,13]          32    [32,2,5]              32
5.1e4    active+remapped+backfilling  [33,13,10]          33    [33,6,5]              33
5.1c5    active+remapped+backfilling   [33,2,10]          33  [33,10,16]              33
5.1bc    active+remapped+backfilling  [18,33,11]          18   [33,11,6]              33
5.193    active+remapped+backfilling   [33,14,3]          33   [33,13,5]              33
5.180    active+remapped+backfilling   [32,13,1]          32   [32,1,10]              32
5.171    active+remapped+backfilling   [33,1,18]          33   [33,1,20]              33
5.16b     active+recovering+remapped   [32,9,13]          32    [32,9,2]              32
5.16a    active+remapped+backfilling    [13,4,6]          13   [33,4,16]              33
5.169    active+remapped+backfilling  [14,33,10]          14  [33,10,13]              33
5.162    active+remapped+backfilling   [32,6,13]          32   [32,6,22]              32
5.130    active+remapped+backfilling   [13,32,3]          13    [32,3,1]              32
5.1cd    active+remapped+backfilling   [33,5,18]          33   [33,5,13]              33
6.3b               active+recovering    [32,1,3]          32    [32,1,3]              32
6.42     active+remapped+backfilling   [18,33,9]          18   [33,9,13]              33
5.4e     active+remapped+backfilling  [32,11,10]          32  [32,10,22]              32
5.167    active+remapped+backfilling   [33,18,6]          33    [33,6,2]              33
6.20     active+remapped+backfilling  [14,32,10]          14   [32,13,1]              32
5.52     active+remapped+backfilling   [14,1,32]          14   [32,1,13]              32
5.57      active+recovering+remapped   [32,9,13]          32   [32,9,22]              32
5.49     active+remapped+backfilling   [18,32,9]          18   [32,9,13]              32
5.1f7     active+recovering+remapped   [9,32,13]           9   [32,20,1]              32
5.100    active+remapped+backfilling   [33,9,13]          33   [33,9,20]              33
5.58     active+remapped+backfilling    [32,6,4]          32   [32,6,20]              32
5.16      active+recovering+remapped   [32,18,3]          32    [32,3,9]              32
5.60      active+recovering+remapped   [31,13,4]          31    [32,4,9]              32
5.fc               active+recovering   [32,9,10]          32   [32,9,10]              32
5.c8     active+remapped+backfilling   [32,14,3]          32   [32,3,13]              32
5.ad     active+remapped+backfilling   [32,3,16]          32   [32,3,22]              32
5.6e     active+remapped+backfilling   [32,5,18]          32   [32,5,11]              32
6.6d     active+remapped+backfilling    [32,6,5]          32   [32,5,21]              32
5.cf     active+remapped+backfilling   [33,9,18]          33  [33,11,16]              33
5.7e     active+remapped+backfilling   [32,9,18]          32   [32,4,16]              32
6.37     active+remapped+backfilling    [32,4,3]          32   [32,4,21]              32
5.1aa    active+remapped+backfilling  [13,33,10]          13   [33,10,1]              33
5.165    active+remapped+backfilling   [33,5,16]          33   [33,5,21]              33
5.76     active+remapped+backfilling   [13,1,32]          13   [32,1,22]              32
5.102    active+remapped+backfilling    [33,5,6]          33   [33,5,21]              33
5.2d     active+remapped+backfilling   [32,18,4]          32    [32,4,5]              32
6.24     active+remapped+backfilling   [33,18,2]          33    [33,9,3]              33
5.f6     active+remapped+backfilling   [32,1,14]          32   [32,1,22]              32
5.1c     active+remapped+backfilling   [33,18,3]          33   [33,3,22]              33
5.d9     active+remapped+backfilling  [33,18,11]          33   [33,11,9]              33
5.184    active+remapped+backfilling   [32,14,2]          32   [32,20,5]              32
5.e6     active+remapped+backfilling  [18,33,16]          18  [33,16,13]              33
5.18f     active+recovering+remapped   [18,32,9]          18   [32,9,13]              32
5.e9     active+remapped+backfilling   [32,13,9]          32   [32,13,2]              32
5.55     active+remapped+backfilling   [32,6,14]          32    [32,6,3]              32
5.eb     active+remapped+backfilling  [18,33,11]          18   [32,20,6]              32
6.13     active+remapped+backfilling   [14,10,1]          14  [32,20,13]              32
5.107    active+remapped+backfilling   [14,3,31]          14    [32,3,1]              32
5.109    active+remapped+backfilling   [32,4,14]          32   [32,4,13]              32
5.117    active+remapped+backfilling   [33,16,3]          33  [33,16,20]              33
6.30     active+remapped+backfilling    [32,4,1]          32   [32,1,21]              32
5.126    active+remapped+backfilling    [33,9,4]          33   [33,9,21]              33
ok
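
To check how strongly the stuck PGs cluster on particular OSDs, the
dump_stuck output can be tallied by acting primary. This is only a rough
sketch: the function name acting_primary_counts and the three-row sample are
illustrative, and it assumes each row has the six fields shown above
(PG_STAT, STATE, UP, UP_PRIMARY, ACTING, ACTING_PRIMARY). It works on
whitespace-separated tokens, so rows that have been wrapped by a mail client
are handled as well.

```python
import re
from collections import Counter

# A PG id looks like "<pool>.<hex>", e.g. 5.1f4 or 6.3b.
PGID = re.compile(r"^\d+\.[0-9a-f]+$")


def acting_primary_counts(dump_text):
    """Count stuck PGs per acting primary OSD in `ceph pg dump_stuck` text."""
    tokens = dump_text.split()
    counts = Counter()
    i = 0
    while i < len(tokens):
        # Assumed row layout: pgid, state, up, up_primary, acting, acting_primary
        if PGID.match(tokens[i]) and i + 5 < len(tokens) and tokens[i + 5].isdigit():
            counts[int(tokens[i + 5])] += 1
            i += 6
        else:
            i += 1  # header, trailing "ok", or anything else that is not a row
    return counts


# Three rows copied from the output above, with the wrapping left in place.
sample = """
5.1f4    active+remapped+backfilling   [32,5,18]          32   [32,5,21]
      32
5.1bc    active+remapped+backfilling  [18,33,11]          18   [33,11,6]
      33
6.3b               active+recovering    [32,1,3]          32    [32,1,3]
      32
"""

for osd, n in sorted(acting_primary_counts(sample).items()):
    print(f"osd.{osd}: {n} stuck PGs as acting primary")
```

Feeding it the full dump_stuck text instead of the sample gives the per-OSD
totals for the whole cluster.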
_______________________________________________
ceph-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]