Hello Jan,

I had the same on two clusters going from Nautilus to Pacific.

On both it helped to fire `ceph tell osd.* compact`. If that had not helped, I would have gone for recreating the OSDs...

Hth
Mehmet

On 31. März 2023 10:56:42 MESZ, [email protected] wrote:
>Hi,
>
>we have a very similar situation. We updated from Nautilus -> Pacific
>(16.2.11) and saw a rapid increase in commit_latency and op_w_latency
>(>10 s on some OSDs) after a few hours. We also have a nearly exclusive
>RBD workload.
>
>After deleting old snapshots we saw an improvement, and after recreating
>snapshots the numbers went up again. Without snapshots the numbers slowly
>get higher, but not as fast as before with existing snapshots. We also use
>SAS-connected NVMe SSDs.
>bluefs_buffered_io made no difference. We compacted the RocksDB on a single
>OSD yesterday, and funnily enough this is now the OSD with the highest
>op_w_latency. I generated a perf graph for this single OSD and can generate
>more, but I'm not sure how to share this data with you...?
>
>I saw in the thread that Boris redeployed all OSDs. Could that be a more
>permanent solution, or is it also just temporary (like deleting the
>snapshots)?
>
>Greetings,
>Jan
>_______________________________________________
>ceph-users mailing list -- [email protected]
>To unsubscribe send an email to [email protected]
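For reference, a minimal sketch of the compaction workflow suggested above. The commands are standard Ceph CLI; `osd.12` is a placeholder OSD id, and the loop over OSD ids is an illustrative way to compact one OSD at a time rather than all at once:

```shell
# Check per-OSD commit/apply latency before compacting:
ceph osd perf

# Compact a single OSD first to gauge the effect
# (osd.12 is a placeholder id):
ceph tell osd.12 compact

# Or trigger an online RocksDB compaction on every OSD at once.
# Note: compaction itself can cause a brief latency spike while it runs.
ceph tell osd.\* compact

# Re-check latency afterwards:
ceph osd perf
```

Compacting one OSD at a time (and watching `ceph osd perf` in between) makes it easier to tell whether compaction actually moves the latency numbers, as Jan observed mixed results on a single compacted OSD.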
