I change the crush weights. My 4-second sleep doesn't let peering finish for
each one before continuing. I'd test with some small steps to get an idea of
how much remaps when increasing the weight by $x. I've found my cluster is
comfortable with +1 increases; it would also take a while to get to a weight
of 11 if I did anything smaller.
for i in {264..311}; do ceph osd crush reweight osd.${i} 11.0; sleep 4; done
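To put a number on "it would take a while", here is a minimal sketch that lists the weights visited when ramping from 0 to a target in fixed increments. The function name weight_steps is hypothetical and not part of the original message. It is pure arithmetic and never contacts the cluster.

```shell
#!/usr/bin/env bash
# Print the sequence of CRUSH weights visited when ramping from 0 up to a
# target in fixed increments. Illustrative helper only; it runs no ceph
# commands.
weight_steps() {
  local target=$1 step=$2
  awk -v t="$target" -v s="$step" 'BEGIN {
    n = int(t / s + 1e-9)                      # whole steps, tolerant of float noise
    for (i = 1; i <= n; i++) printf "%.2f\n", i * s
    if (n * s < t - 1e-9) printf "%.2f\n", t   # final partial step lands on the target
  }'
}

weight_steps 11 1   | wc -l   # number of rounds with +1.0 steps
weight_steps 11 0.5 | wc -l   # number of rounds with +0.5 steps
```

With a target weight of 11, +1.0 steps take 11 rounds and +0.5 steps take 22, and each round is followed by a rebalance.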
Kevin
On 7/24/19 12:33 PM, Xavier Trilla wrote:
Hi Kevin,
Yeah, that makes a lot of sense, and looks even safer than adding OSDs one by
one. What do you change, the crush weight? Or the reweight? (I guess you change
the crush weight, am I right?)
Thanks!
On 24 Jul 2019, at 19:17, Kevin Hrpcek <[email protected]> wrote:
I often add 50+ OSDs at a time and my cluster is all NLSAS. Here is what I do;
you can obviously change the weight increase steps to what you are comfortable
with. This has worked well for me and my workloads. I've sometimes seen peering
take longer if I do steps too quickly, but I don't run anything mission-critical
that has to be up 100% of the time, and I usually don't notice if a PG takes a
while to peer.
1. Add all OSDs with an initial weight of 0. (nothing gets remapped)
2. Ensure cluster is healthy.
3. Use a for loop to increase weight on all new OSDs to 0.5 with a generous
   sleep between each for peering.
4. Let the cluster balance and get healthy or close to healthy.
5. Then repeat the previous 2 steps, increasing weight by +0.5 or +1.0 until I
   am at the desired weight.
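The steps above can be sketched as a small bash script. This is only an illustration under stated assumptions, not the script actually used: the function names (reweight_range, wait_for_health, ramp_up), the OSD ID range, the sleep times, and the list of weights are all placeholders. It also waits for a strict HEALTH_OK from `ceph health`, which is more conservative than the "healthy or close to healthy" described above.

```shell
#!/usr/bin/env bash
# Sketch of a staged CRUSH weight ramp-up. All names and values below are
# illustrative assumptions; adjust them to your own cluster.

# Set one CRUSH weight on every OSD in an ID range, pausing between OSDs
# to give peering some time before the next change.
reweight_range() {
  local first=$1 last=$2 weight=$3 pause=${4:-4}
  local i
  for ((i = first; i <= last; i++)); do
    ceph osd crush reweight "osd.${i}" "$weight"
    sleep "$pause"
  done
}

# Block until `ceph health` reports HEALTH_OK, polling every $1 seconds.
# Assumption: strict HEALTH_OK; loosen this if "close to healthy" is enough.
wait_for_health() {
  local poll=${1:-60}
  until ceph health | grep -q '^HEALTH_OK'; do
    sleep "$poll"
  done
}

# Raise the new OSDs through a list of weights, one round per weight,
# letting the cluster rebalance between rounds.
ramp_up() {
  local first=$1 last=$2 pause=$3 poll=$4
  shift 4
  local w
  for w in "$@"; do
    reweight_range "$first" "$last" "$w" "$pause"
    wait_for_health "$poll"
  done
}

# Example call, left commented out so that sourcing this file changes nothing:
# ramp_up 264 311 4 60 0.5 1.0 2.0 3.0
```

The new OSDs are assumed to have been created with an initial CRUSH weight of 0 before ramp_up is called, so nothing is remapped until the first round.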
Kevin
On 7/24/19 11:44 AM, Xavier Trilla wrote:
Hi,
What would be the proper way to add 100 new OSDs to a cluster?
I have to add 100 new OSDs to our current cluster of more than 300 OSDs, and I
would like to know how you do it.
Usually, we add them quite slowly. Our cluster is a pure SSD/NVMe one, and it
can handle plenty of load, but for the sake of safety (it hosts thousands of
VMs via RBD) we usually add them one by one, waiting a long time between
adding each OSD.
Obviously this leads to PLENTY of data movement, as each time the cluster
geometry changes, data is migrated among all the OSDs. But with the kind of
load we have, if we add several OSDs at the same time, some PGs can get stuck
for a while, while they peer to the new OSDs.
Now that I have to add > 100 new OSDs I was wondering if somebody has some
suggestions.
Thanks!
Xavier.
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com