Hi,

we have a very similar situation. We upgraded from Nautilus to Pacific (16.2.11) 
and saw a rapid increase in commit_latency and op_w_latency (>10 s on some 
OSDs) after a few hours. Our workload is also almost exclusively RBD.
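For anyone wanting to reproduce the numbers above: a minimal sketch of how the per-OSD op_w_latency can be pulled out of a `ceph daemon osd.N perf dump` snapshot. The JSON here is illustrative sample data (not real cluster output), but the `avgcount`/`sum` structure matches what the perf counters report.

```python
# Sketch: compute the average write latency from a perf-dump snapshot.
# The sample JSON is made up for illustration; a real dump comes from
# `ceph daemon osd.N perf dump`.
import json

sample = '{"osd": {"op_w_latency": {"avgcount": 125000, "sum": 312.5, "avgtime": 0.0025}}}'

perf = json.loads(sample)
lat = perf["osd"]["op_w_latency"]
avg_s = lat["sum"] / lat["avgcount"]   # seconds per write op
print(f"avg op_w_latency: {avg_s * 1000:.2f} ms")
```

Sampling this over time (rather than reading `avgtime` once) makes it easier to see whether latency climbs after snapshots are recreated.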

After deleting old snapshots we saw an improvement, and after recreating 
snapshots the numbers went up again. Without snapshots the numbers still climb 
slowly, but not as fast as before with existing snapshots. We also use 
SAS-attached NVMe SSDs. 
Setting bluefs_buffered_io made no difference. We compacted the RocksDB on a 
single OSD yesterday, and funnily enough that is now the OSD with the highest 
op_w_latency. I generated a perf graph for this single OSD and can generate 
more, but I'm not sure how best to share this data with you...?
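For reference, these are the admin commands we used for the two experiments above (the OSD id 12 is just a placeholder, not one of our actual OSDs):

```shell
# Toggle BlueFS buffered reads cluster-wide (made no difference for us):
ceph config set osd bluefs_buffered_io false

# Trigger an online RocksDB compaction on a single OSD:
ceph tell osd.12 compact
```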

I saw in the thread that Boris redeployed all OSDs. Could that be a more 
permanent solution, or is it also only temporary (like deleting the 
snapshots)?

Greetings,
Jan
_______________________________________________
ceph-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]