Hello,

Do a manual OSD benchmark for hdd/ssd:
https://docs.ceph.com/en/squid/rados/configuration/mclock-config-ref/#global-override-of-max-iops-capacity-for-multiple-osds
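A minimal sketch of the manual-benchmark flow, assuming osd.0 as a placeholder OSD id (the bench arguments are the ones the mclock docs use; adjust the values and repeat per device class):

```shell
# Drop the OSD's caches first so the bench reflects the raw device.
ceph tell osd.0 cache drop

# Write 12288000 bytes in 4 KiB IOs to 4 MiB objects (100 objects);
# note the "iops" value in the JSON output.
ceph tell osd.0 bench 12288000 4096 4194304 100

# Persist the measured IOPS for this OSD (900.0 is an example value;
# use osd_mclock_max_capacity_iops_ssd for flash devices).
ceph config set osd.0 osd_mclock_max_capacity_iops_hdd 900.0
```

These commands need a live cluster with admin credentials, so run them on a mon/admin host, one OSD at a time.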
Change the value globally:
https://docs.ceph.com/en/squid/rados/configuration/mclock-config-ref/#global-override-of-max-iops-capacity-for-multiple-osds

Or per OSD:

ceph config set osd.X osd_mclock_max_capacity_iops_hdd 900.000000
ceph config set osd.X osd_mclock_max_capacity_iops_ssd 74000.000000

Also change:

ceph config set osd osd_scrub_load_threshold 10.000000

It should help mClock work better.

On Wed, 11 Jun 2025 at 11:13, Michel Jouvin <[email protected]> wrote:
> Janne,
>
> Thanks for your answer, I'll do as you suggest and see if we observe
> negative side effects. We are struggling with slow deep scrubs (like
> those described in https://tracker.ceph.com/issues/69078) and I'm
> wondering if the OSDs with low values may contribute to the problem...
>
> Michel
>
> On 11/06/2025 at 09:37, Janne Johansson wrote:
> > On Tue, 10 June 2025 at 18:59, Michel Jouvin
> > <[email protected]> wrote:
> >> I'm a little bit surprised that the osd_mclock_max_capacity_iops_hdd
> >> computed for each OSD is so different (basically a factor of 2 between
> >> the lowest and highest values).
> >> Also, the documentation explains that you can define a value that you
> >> measured, and it seems to suggest that once defined, it will not be
> >> updated. Am I right? If so, does it mean that once the automatic bench
> >> has determined a value, the only way to update it is to delete it from
> >> the config and restart the OSD (if you want the automatic bench to
> >> update/redefine it)?
> > I think your assessment is correct on all details. I guess you would
> > take a decent value from the high end of your range and set it on all
> > drives, to "compensate" for the tests being done at various times. Not
> > necessarily the exact highest, but if it was showing between 100 and
> > 200 IOPS, then perhaps 150 or 175 could be reasonable for all drives,
> > and unless it causes problems just leave it there for the hdd drives.
> > It's hard to tell from the outside whether it is worse that one or a
> > few drives end up at only 100 because they were benchmarked when the
> > system was a bit busier than usual, and hence get less IO scheduled to
> > them (scrubs and repairs and so on), compared to how bad it would be
> > if a drive really can only deliver 100 for some reason and you
> > hard-code it to 150, so it is given 50% too many non-client-IO
> > requests.
>
> _______________________________________________
> ceph-users mailing list -- [email protected]
> To unsubscribe send an email to [email protected]
