Hello Jan,

I had the same issue on two clusters after upgrading from Nautilus to Pacific.

On both it helped to run

ceph tell osd.* compact

If this had not helped, I would go for recreating the OSDs...
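A minimal sketch of running that compaction OSD by OSD instead of all at once (the loop and latency check are my additions, not from the thread; `ceph tell osd.* compact` hits every OSD in parallel, which can spike I/O). Assumes a working `ceph` CLI with an admin keyring:

```shell
#!/bin/sh
# Compact each OSD's RocksDB sequentially rather than cluster-wide at once.
for id in $(ceph osd ls); do
    echo "Compacting osd.${id} ..."
    ceph tell "osd.${id}" compact
    # Check commit/apply latency for this OSD before moving on
    ceph osd perf | awk -v id="$id" '$1 == id'
done
```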

HTH,
Mehmet 


On March 31, 2023 10:56:42 MESZ, [email protected] wrote:
>Hi,
>
>we have a very similar situation. We updated from nautilus -> pacific 
>(16.2.11) and saw a rapid increase in the commit_latency and op_w_latency 
>(>10s on some OSDs) after a few hours. We also have nearly exclusive rbd 
>workload.
>
>After deleting old snapshots we saw an improvement, and after recreating 
>snapshots the numbers went up again. Without snapshots the numbers are slowly 
>getting higher but not as fast as before with existing snapshots. We also use 
>SAS connected NVMe-SSDs. 
>bluefs_buffered_io made no difference. We compacted the rocksdb on a single 
>OSD yesterday, and funnily enough this is now the OSD with the highest 
>op_w_latency. I generated a perf graph for this single OSD and can generate 
>more, but I'm not sure how to share this data with you...?
>
>I saw in the thread that Boris redeployed all OSDs. Could that be a more 
>permanent solution, or is this also just temporary (like deleting the 
>snapshots)?
>
>Greetings,
>Jan
>_______________________________________________
>ceph-users mailing list -- [email protected]
>To unsubscribe send an email to [email protected]
