Hi Caspar,
Yes, the cluster was working fine (despite the PGs-per-OSD warning) up until
now. I am not sure how to recover from the stale/down/inactive PGs. If you
happen to know how, can you let me know?
Current State:
[root@fre101 ~]# ceph -s
2019-01-04 05:22:05.942349 7f314f613700 -1 asok(0x7f31480017a0)
AdminSocketConfigObs::init: failed: AdminSocket::bind_and_listen: failed to
bind the UNIX domain socket to
'/var/run/ceph-guests/ceph-client.admin.1053724.139849638091088.asok': (2)
No such file or directory
cluster:
id: adb9ad8e-f458-4124-bf58-7963a8d1391f
health: HEALTH_ERR
3 pools have many more objects per pg than average
505714/12392650 objects misplaced (4.081%)
3883 PGs pending on creation
Reduced data availability: 6519 pgs inactive, 1870 pgs down, 1
pg peering, 886 pgs stale
Degraded data redundancy: 42987/12392650 objects degraded
(0.347%), 634 pgs degraded, 16 pgs undersized
125827 slow requests are blocked > 32 sec
2 stuck requests are blocked > 4096 sec
too many PGs per OSD (2758 > max 200)
services:
mon: 3 daemons, quorum ceph-mon01,ceph-mon02,ceph-mon03
mgr: ceph-mon03(active), standbys: ceph-mon01, ceph-mon02
osd: 39 osds: 39 up, 39 in; 76 remapped pgs
rgw: 1 daemon active
data:
pools: 18 pools, 54656 pgs
objects: 6051k objects, 10944 GB
usage: 21933 GB used, 50688 GB / 72622 GB avail
pgs: 11.927% pgs not active
42987/12392650 objects degraded (0.347%)
505714/12392650 objects misplaced (4.081%)
48080 active+clean
3885 activating
1111 down
759 stale+down
614 activating+degraded
74 activating+remapped
46 stale+active+clean
35 stale+activating
21 stale+activating+remapped
9 stale+active+undersized
9 stale+activating+degraded
5 stale+activating+undersized+degraded+remapped
3 activating+degraded+remapped
1 stale+activating+degraded+remapped
1 stale+active+undersized+degraded
1 remapped+peering
1 active+clean+remapped
1 activating+undersized+degraded+remapped
io:
client: 0 B/s rd, 25397 B/s wr, 4 op/s rd, 4 op/s wr
I will adjust the number of PGs per OSD once these inactive or stale PGs come
back online. I am not able to access the VMs (VMs, images) that use Ceph.
Thanks
Arun
On Fri, Jan 4, 2019 at 4:53 AM Caspar Smit <[email protected]> wrote:
> Hi Arun,
>
> How did you end up with a 'working' cluster with so many PGs per OSD?
>
> "too many PGs per OSD (2968 > max 200)"
>
> To (temporarily) allow this many PGs per OSD you could try this:
>
> Change these values in the global section in your ceph.conf:
>
> mon max pg per osd = 200
> osd max pg per osd hard ratio = 2
>
> That allows 200 * 2 = 400 PGs per OSD before the creation of new PGs is
> disabled.
>
> The above are the defaults (for Luminous, and possibly other versions too).
> You can check your current settings with:
>
> ceph daemon mon.ceph-mon01 config show |grep pg_per_osd
>
> Since your current PGs-per-OSD ratio is way higher than the defaults, you
> could set them to, for instance:
>
> mon max pg per osd = 1000
> osd max pg per osd hard ratio = 5
>
> That allows for 1000 * 5 = 5000 PGs per OSD before the creation of new
> PGs is disabled.
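> 
> For reference, a rough sketch of what that would look like in ceph.conf
> (the [global] section header is shown only for context):
> 
> [global]
> mon max pg per osd = 1000
> osd max pg per osd hard ratio = 5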
>
> You'll need to inject the settings into the mons/OSDs and restart the mgrs
> to make them active.
>
> ceph tell mon.* injectargs '--mon_max_pg_per_osd 1000'
> ceph tell mon.* injectargs '--osd_max_pg_per_osd_hard_ratio 5'
> ceph tell osd.* injectargs '--mon_max_pg_per_osd 1000'
> ceph tell osd.* injectargs '--osd_max_pg_per_osd_hard_ratio 5'
> restart mgrs
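> 
> How you restart the mgrs depends on your deployment; on a systemd-based
> install it would be something like the following, run on each mgr host
> (the unit instance names here are an assumption based on your mgr names,
> so adjust as needed):
> 
> systemctl restart ceph-mgr@ceph-mon01   # likewise ceph-mon02 / ceph-mon03 on their hosts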
>
> Kind regards,
> Caspar
>
>
> Op vr 4 jan. 2019 om 04:28 schreef Arun POONIA <
> [email protected]>:
>
>> Hi Chris,
>>
>> Indeed, that's what happened. I didn't set the noout flag either, and I
>> zapped the disks on the new server every time. In my cluster, fre201 is the
>> only new server.
>>
>> Current status after enabling the 3 OSDs on host fre201:
>>
>> [root@fre201 ~]# ceph osd tree
>> ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
>> -1 70.92137 root default
>> -2 5.45549 host fre101
>> 0 hdd 1.81850 osd.0 up 1.00000 1.00000
>> 1 hdd 1.81850 osd.1 up 1.00000 1.00000
>> 2 hdd 1.81850 osd.2 up 1.00000 1.00000
>> -9 5.45549 host fre103
>> 3 hdd 1.81850 osd.3 up 1.00000 1.00000
>> 4 hdd 1.81850 osd.4 up 1.00000 1.00000
>> 5 hdd 1.81850 osd.5 up 1.00000 1.00000
>> -3 5.45549 host fre105
>> 6 hdd 1.81850 osd.6 up 1.00000 1.00000
>> 7 hdd 1.81850 osd.7 up 1.00000 1.00000
>> 8 hdd 1.81850 osd.8 up 1.00000 1.00000
>> -4 5.45549 host fre107
>> 9 hdd 1.81850 osd.9 up 1.00000 1.00000
>> 10 hdd 1.81850 osd.10 up 1.00000 1.00000
>> 11 hdd 1.81850 osd.11 up 1.00000 1.00000
>> -5 5.45549 host fre109
>> 12 hdd 1.81850 osd.12 up 1.00000 1.00000
>> 13 hdd 1.81850 osd.13 up 1.00000 1.00000
>> 14 hdd 1.81850 osd.14 up 1.00000 1.00000
>> -6 5.45549 host fre111
>> 15 hdd 1.81850 osd.15 up 1.00000 1.00000
>> 16 hdd 1.81850 osd.16 up 1.00000 1.00000
>> 17 hdd 1.81850 osd.17 up 0.79999 1.00000
>> -7 5.45549 host fre113
>> 18 hdd 1.81850 osd.18 up 1.00000 1.00000
>> 19 hdd 1.81850 osd.19 up 1.00000 1.00000
>> 20 hdd 1.81850 osd.20 up 1.00000 1.00000
>> -8 5.45549 host fre115
>> 21 hdd 1.81850 osd.21 up 1.00000 1.00000
>> 22 hdd 1.81850 osd.22 up 1.00000 1.00000
>> 23 hdd 1.81850 osd.23 up 1.00000 1.00000
>> -10 5.45549 host fre117
>> 24 hdd 1.81850 osd.24 up 1.00000 1.00000
>> 25 hdd 1.81850 osd.25 up 1.00000 1.00000
>> 26 hdd 1.81850 osd.26 up 1.00000 1.00000
>> -11 5.45549 host fre119
>> 27 hdd 1.81850 osd.27 up 1.00000 1.00000
>> 28 hdd 1.81850 osd.28 up 1.00000 1.00000
>> 29 hdd 1.81850 osd.29 up 1.00000 1.00000
>> -12 5.45549 host fre121
>> 30 hdd 1.81850 osd.30 up 1.00000 1.00000
>> 31 hdd 1.81850 osd.31 up 1.00000 1.00000
>> 32 hdd 1.81850 osd.32 up 1.00000 1.00000
>> -13 5.45549 host fre123
>> 33 hdd 1.81850 osd.33 up 1.00000 1.00000
>> 34 hdd 1.81850 osd.34 up 1.00000 1.00000
>> 35 hdd 1.81850 osd.35 up 1.00000 1.00000
>> -27 5.45549 host fre201
>> 36 hdd 1.81850 osd.36 up 1.00000 1.00000
>> 37 hdd 1.81850 osd.37 up 1.00000 1.00000
>> 38 hdd 1.81850 osd.38 up 1.00000 1.00000
>> [root@fre201 ~]#
>> [root@fre201 ~]# ceph -s
>> cluster:
>> id: adb9ad8e-f458-4124-bf58-7963a8d1391f
>> health: HEALTH_ERR
>> 3 pools have many more objects per pg than average
>> 585791/12391450 objects misplaced (4.727%)
>> 2 scrub errors
>> 2374 PGs pending on creation
>> Reduced data availability: 6578 pgs inactive, 2025 pgs down,
>> 74 pgs peering, 1234 pgs stale
>> Possible data damage: 2 pgs inconsistent
>> Degraded data redundancy: 64969/12391450 objects degraded
>> (0.524%), 616 pgs degraded, 20 pgs undersized
>> 96242 slow requests are blocked > 32 sec
>> 228 stuck requests are blocked > 4096 sec
>> too many PGs per OSD (2768 > max 200)
>>
>> services:
>> mon: 3 daemons, quorum ceph-mon01,ceph-mon02,ceph-mon03
>> mgr: ceph-mon03(active), standbys: ceph-mon01, ceph-mon02
>> osd: 39 osds: 39 up, 39 in; 96 remapped pgs
>> rgw: 1 daemon active
>>
>> data:
>> pools: 18 pools, 54656 pgs
>> objects: 6050k objects, 10942 GB
>> usage: 21900 GB used, 50721 GB / 72622 GB avail
>> pgs: 0.002% pgs unknown
>> 12.050% pgs not active
>> 64969/12391450 objects degraded (0.524%)
>> 585791/12391450 objects misplaced (4.727%)
>> 47489 active+clean
>> 3670 activating
>> 1098 stale+down
>> 923 down
>> 575 activating+degraded
>> 563 stale+active+clean
>> 105 stale+activating
>> 78 activating+remapped
>> 72 peering
>> 25 stale+activating+degraded
>> 23 stale+activating+remapped
>> 9 stale+active+undersized
>> 6 stale+activating+undersized+degraded+remapped
>> 5 stale+active+undersized+degraded
>> 4 down+remapped
>> 4 activating+degraded+remapped
>> 2 active+clean+inconsistent
>> 1 stale+activating+degraded+remapped
>> 1 stale+active+clean+remapped
>> 1 stale+remapped+peering
>> 1 remapped+peering
>> 1 unknown
>>
>> io:
>> client: 0 B/s rd, 208 kB/s wr, 22 op/s rd, 22 op/s wr
>>
>>
>>
>> Thanks
>> Arun
>>
>>
>> On Thu, Jan 3, 2019 at 7:19 PM Chris <[email protected]> wrote:
>>
>>> If you added OSDs and then deleted them repeatedly without waiting for
>>> replication to finish as the cluster attempted to rebalance across them,
>>> it's highly likely that you are permanently missing PGs (especially if the
>>> disks were zapped each time).
>>>
>>> If those 3 down OSDs can be revived there is a (small) chance that you
>>> can right the ship, but 1400 PGs/OSD is pretty extreme. I'm surprised
>>> the cluster even let you do that - this sounds like a data loss event.
>>>
>>> Bring back the 3 OSDs and see what those 2 inconsistent PGs look like
>>> with ceph pg query, roughly as sketched below.
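>>>
>>> Something along these lines (the <pgid> is a placeholder; use the ids
>>> reported for your cluster):
>>>
>>> ceph health detail | grep inconsistent
>>> ceph pg <pgid> query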
>>>
>>> On January 3, 2019 21:59:38 Arun POONIA <[email protected]>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> Recently I tried adding a new node (OSD) to the Ceph cluster using the
>>>> ceph-deploy tool. I was experimenting with the tool and ended up deleting
>>>> the OSDs on the new server a couple of times.
>>>>
>>>> Now that the Ceph OSDs are running on the new server, some of the
>>>> cluster's PGs (10-15%) are inactive and they are not recovering or
>>>> rebalancing. Not sure what to do. I tried shutting down the OSDs on the
>>>> new server.
>>>>
>>>> Status:
>>>> [root@fre105 ~]# ceph -s
>>>> 2019-01-03 18:56:42.867081 7fa0bf573700 -1 asok(0x7fa0b80017a0)
>>>> AdminSocketConfigObs::init: failed: AdminSocket::bind_and_listen: failed to
>>>> bind the UNIX domain socket to
>>>> '/var/run/ceph-guests/ceph-client.admin.4018644.140328258509136.asok': (2)
>>>> No such file or directory
>>>> cluster:
>>>> id: adb9ad8e-f458-4124-bf58-7963a8d1391f
>>>> health: HEALTH_ERR
>>>> 3 pools have many more objects per pg than average
>>>> 373907/12391198 objects misplaced (3.018%)
>>>> 2 scrub errors
>>>> 9677 PGs pending on creation
>>>> Reduced data availability: 7145 pgs inactive, 6228 pgs
>>>> down, 1 pg peering, 2717 pgs stale
>>>> Possible data damage: 2 pgs inconsistent
>>>> Degraded data redundancy: 178350/12391198 objects degraded
>>>> (1.439%), 346 pgs degraded, 1297 pgs undersized
>>>> 52486 slow requests are blocked > 32 sec
>>>> 9287 stuck requests are blocked > 4096 sec
>>>> too many PGs per OSD (2968 > max 200)
>>>>
>>>> services:
>>>> mon: 3 daemons, quorum ceph-mon01,ceph-mon02,ceph-mon03
>>>> mgr: ceph-mon03(active), standbys: ceph-mon01, ceph-mon02
>>>> osd: 39 osds: 36 up, 36 in; 51 remapped pgs
>>>> rgw: 1 daemon active
>>>>
>>>> data:
>>>> pools: 18 pools, 54656 pgs
>>>> objects: 6050k objects, 10941 GB
>>>> usage: 21727 GB used, 45308 GB / 67035 GB avail
>>>> pgs: 13.073% pgs not active
>>>> 178350/12391198 objects degraded (1.439%)
>>>> 373907/12391198 objects misplaced (3.018%)
>>>> 46177 active+clean
>>>> 5054 down
>>>> 1173 stale+down
>>>> 1084 stale+active+undersized
>>>> 547 activating
>>>> 201 stale+active+undersized+degraded
>>>> 158 stale+activating
>>>> 96 activating+degraded
>>>> 46 stale+active+clean
>>>> 42 activating+remapped
>>>> 34 stale+activating+degraded
>>>> 23 stale+activating+remapped
>>>> 6 stale+activating+undersized+degraded+remapped
>>>> 6 activating+undersized+degraded+remapped
>>>> 2 activating+degraded+remapped
>>>> 2 active+clean+inconsistent
>>>> 1 stale+activating+degraded+remapped
>>>> 1 stale+active+clean+remapped
>>>> 1 stale+remapped
>>>> 1 down+remapped
>>>> 1 remapped+peering
>>>>
>>>> io:
>>>> client: 0 B/s rd, 208 kB/s wr, 28 op/s rd, 28 op/s wr
>>>>
>>>> Thanks
>>>> --
>>>> Arun Poonia
>>>>
>>>>
>>>>
>>>
>>
>> --
>> Arun Poonia
>>
>>
>
--
Arun Poonia
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com