Hello,

Regarding our Ceph issue, a new theory has emerged from our findings.
This morning, an OSD crashed because the underlying disk had bad blocks. The OSD sits on a machine that still needs to be upgraded and converted, and its XFS filesystem crashed because of the bad blocks. In itself, this is nothing unusual (with filestore).

What we did observe, however, is that during the recovery and rebalancing phase, while the OSD was down and then during its recovery, the MON DB ballooned (~3 GB -> ~30 GB) and the 'SSD' OSDs saw their occupancy creep up 0.01% at a time (roughly 2% over ~4 hours). Once recovery was complete and the cluster status returned to healthy, the MON DB went back to its normal size and the 'SSD' OSDs returned to their pre-incident occupancy.

So our new, more plausible assumption is that during recovery the cluster keeps many more (if not all?) PGmaps, which would explain why the MON DB swells; the OSDs would presumably keep just as many copies as well, instead of trimming them down to what they keep when the cluster is in an OK state (see the PS below for a quick way to check this).

The questions that now arise are: why does an event as simple as a single OSD going down cause such a significant swelling of the MON DB (we did not see this before the upgrade and conversion)? And why can the OSDs' space usage grow to the point where they become saturated?

In any case, tying this back to our previous assumption, the likely connection point is that scrubs are suspended while the cluster is not healthy, hence our earlier observation/assumption that restarting scrubs had an impact, which is probably not actually the case.

As for our upgrade/conversion plan, doing it server by server seems safer.

Stay tuned...

Olivier
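PS: if the maps in question are the osdmaps held by the mons (an assumption on my part), a quick way to check next time would be something along these lines, using the standard ceph CLI (the mon store path is just an example and needs adapting to the actual deployment):

  # Range of osdmap epochs the mons are currently keeping; if I read the
  # trimming logic correctly, the spread stays around mon_min_osdmap_epochs
  # (500 by default) on a healthy cluster and keeps growing while PGs are
  # not clean.
  ceph report 2>/dev/null | jq '.osdmap_first_committed, .osdmap_last_committed'

  # Size of the mon store itself, to correlate with the ballooning we saw.
  du -sh /var/lib/ceph/mon/ceph-$(hostname -s)/store.db

If the epoch gap and the store size grow together during recovery and shrink back once the cluster is healthy again, that would match what we observed.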
