Hi,
We have six storage nodes and have added three new SSD-only storage nodes.
I started increasing the CRUSH weight to fill the freshly added OSDs on the new
storage nodes; the command was:
ceph osd crush reweight osd.126 0.2
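After each weight change I was watching the recovery with the usual status
commands, i.e. something like:
ceph -s              # overall health plus misplaced/backfill progress
ceph osd df tree     # CRUSH weight, reweight and %USE per OSD and host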
The cluster started rebalancing:
2019-05-22 11:00:00.000253 mon.ceph-mon-01 mon.0 10.10.8.221:6789/0 4607699 :
cluster [INF] overall HEALTH_OK
2019-05-22 12:00:00.000175 mon.ceph-mon-01 mon.0 10.10.8.221:6789/0 4608927 :
cluster [INF] overall HEALTH_OK
2019-05-22 13:00:00.000216 mon.ceph-mon-01 mon.0 10.10.8.221:6789/0 4610174 :
cluster [INF] overall HEALTH_OK
2019-05-22 13:44:57.353665 mon.ceph-mon-01 mon.0 10.10.8.221:6789/0 4611095 :
cluster [WRN] Health check failed: Reduced data availability: 2 pgs peering
(PG_AVAILABILITY)
2019-05-22 13:44:58.642328 mon.ceph-mon-01 mon.0 10.10.8.221:6789/0 4611097 :
cluster [WRN] Health check failed: 68628/33693246 objects misplaced (0.204%)
(OBJECT_MISPLACED)
2019-05-22 13:45:02.696121 mon.ceph-mon-01 mon.0 10.10.8.221:6789/0 4611098 :
cluster [INF] Health check cleared: PG_AVAILABILITY (was: Reduced data
availability: 5 pgs peering)
2019-05-22 13:45:04.733172 mon.ceph-mon-01 mon.0 10.10.8.221:6789/0 4611099 :
cluster [WRN] Health check update: 694611/33693423 objects misplaced (2.062%)
(OBJECT_MISPLACED)
To my knowledge, it should have put about 200 GB on osd.126 and then stopped
rebalancing, but in this case the disk on ssdstor-a01 filled to over 85% and
the filling of this disk/OSD didn't stop:
[root@ssdstor-a01 ~]# df -h | grep ceph
/dev/sdc1 1.8T 1.6T 237G 88% /var/lib/ceph/osd/ceph-126
/dev/sdd1 1.8T 136G 1.7T 8% /var/lib/ceph/osd/ceph-127
/dev/sde1 1.8T 99G 1.8T 6% /var/lib/ceph/osd/ceph-128
/dev/sdf1 1.8T 121G 1.7T 7% /var/lib/ceph/osd/ceph-129
/dev/sdg1 1.8T 98G 1.8T 6% /var/lib/ceph/osd/ceph-130
/dev/sdh1 1.8T 38G 1.8T 3% /var/lib/ceph/osd/ceph-131
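To spell out my expectation: as far as I understand, CRUSH weights are
conventionally the capacity in TiB, so I assumed a weight of 0.2 would attract
on the order of 0.2 TiB (~205 GiB) of data to osd.126, in proportion to the
rest of the cluster; instead it kept filling towards 1.6 TB. From Ceph's side
the same picture should be visible with something like:
ceph osd df tree | grep -A 7 ssdstor-a01   # WEIGHT, REWEIGHT, USE, %USE for the OSDs on that host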
Then I changed the weight back to 0.1, but the cluster seemed to behave
unstably, so I also changed the weight on the other new OSDs to 0.1 (to spread
the load and the available disk space among the new disks).
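In effect this meant running the crush reweight command once per new OSD,
roughly equivalent to the loop below (the range is illustrative; the exact
per-OSD values I ended up with are in the trees further down):
for i in $(seq 126 143); do ceph osd crush reweight osd.$i 0.1; done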
Then the situation repeated on the other OSDs. Weights for osds 132-137:
-55 0.59995 host ssdstor-b01
132 ssd 0.09999 osd.132 up 1.00000 1.00000
133 ssd 0.09999 osd.133 up 1.00000 1.00000
134 ssd 0.09999 osd.134 up 1.00000 1.00000
135 ssd 0.09999 osd.135 up 1.00000 1.00000
136 ssd 0.09999 osd.136 up 1.00000 1.00000
137 ssd 0.09999 osd.137 up 1.00000 1.00000
And on the physical server:
root@ssdstor-b01:~# df -h | grep ceph
/dev/sdc1 1.8T 642G 1.2T 35% /var/lib/ceph/osd/ceph-132
/dev/sdd1 1.8T 342G 1.5T 19% /var/lib/ceph/osd/ceph-133
/dev/sde1 1.8T 285G 1.6T 16% /var/lib/ceph/osd/ceph-134
/dev/sdf1 1.8T 114G 1.7T 7% /var/lib/ceph/osd/ceph-135
/dev/sdg1 1.8T 215G 1.6T 12% /var/lib/ceph/osd/ceph-136
/dev/sdh1 1.8T 101G 1.8T 6% /var/lib/ceph/osd/ceph-137
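While this was going on, something like the following should show how many PGs
were still backfilling (the grep pattern is just an example):
ceph pg dump pgs_brief 2>/dev/null | grep -c backfill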
I was changing the weights of the OSDs all evening and into the night, and in
the end I found weights that stabilized the rebalancing:
-54 0.11993 host ssdstor-a01
126 ssd 0.01999 osd.126 up 1.00000 1.00000
127 ssd 0.01999 osd.127 up 0.96999 1.00000
128 ssd 0.01999 osd.128 up 1.00000 1.00000
129 ssd 0.01999 osd.129 up 1.00000 1.00000
130 ssd 0.01999 osd.130 up 1.00000 1.00000
131 ssd 0.01999 osd.131 up 1.00000 1.00000
--
-55 0.26993 host ssdstor-b01
132 ssd 0.01999 osd.132 up 1.00000 1.00000
133 ssd 0.04999 osd.133 up 1.00000 1.00000
134 ssd 0.04999 osd.134 up 1.00000 1.00000
135 ssd 0.04999 osd.135 up 1.00000 1.00000
136 ssd 0.04999 osd.136 up 1.00000 1.00000
137 ssd 0.04999 osd.137 up 1.00000 1.00000
--
-56 0.29993 host ssdstor-c01
138 ssd 0.04999 osd.138 up 1.00000 1.00000
139 ssd 0.04999 osd.139 up 1.00000 1.00000
140 ssd 0.04999 osd.140 up 1.00000 1.00000
141 ssd 0.04999 osd.141 up 1.00000 1.00000
142 ssd 0.04999 osd.142 up 1.00000 1.00000
143 ssd 0.04999 osd.143 up 1.00000 1.00000
I also changed the reweight value on osd.127, to spread data among the OSDs on
the same storage node:
ceph osd reweight osd.127 0.97
as you can see in the output above.
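Just to be explicit about which knob is which, the two different commands I
used were (the distinction between them is my understanding of the docs):
ceph osd crush reweight osd.126 0.2   # CRUSH bucket item weight, conventionally ~capacity in TiB
ceph osd reweight osd.127 0.97        # the 0..1 override weight applied on top of the CRUSH weight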
Version of ceph:
# ceph versions
{
    "mon": {
        "ceph version 12.2.9 (9e300932ef8a8916fb3fda78c58691a6ab0f4217) luminous (stable)": 3
    },
    "mgr": {
        "ceph version 12.2.9 (9e300932ef8a8916fb3fda78c58691a6ab0f4217) luminous (stable)": 3
    },
    "osd": {
        "ceph version 12.2.9 (9e300932ef8a8916fb3fda78c58691a6ab0f4217) luminous (stable)": 144
    },
    "mds": {},
    "rbd-mirror": {
        "ceph version 12.2.9 (9e300932ef8a8916fb3fda78c58691a6ab0f4217) luminous (stable)": 3
    },
    "rgw": {
        "ceph version 12.2.9 (9e300932ef8a8916fb3fda78c58691a6ab0f4217) luminous (stable)": 6
    },
    "overall": {
        "ceph version 12.2.9 (9e300932ef8a8916fb3fda78c58691a6ab0f4217) luminous (stable)": 159
    }
}
The question is: is there some problem/bug with CRUSH balancing, or am I
missing some setting?
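If it helps with diagnosing this, I can also post the output of:
ceph osd crush show-tunables
ceph osd df tree
ceph osd getcrushmap -o crushmap && crushtool -d crushmap -o crushmap.txt
i.e. the CRUSH tunables, the full per-OSD utilization tree, and the decompiled
CRUSH map.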
--
Regards,
Lukasz
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com