Hello,

Regarding our Ceph issue, a new theory has emerged from our findings.

This morning, an OSD crashed because the underlying disk had bad blocks. 
This OSD, still on a machine awaiting upgrade and conversion, had its XFS 
filesystem fail because of the bad blocks.
In itself, nothing unusual (with filestore).

What we did observe, however, is that during the recovery and rebalancing 
phase (while the OSD was down, and then during its recovery), the MON DB 
ballooned (~3 GB -> ~30 GB) and the SSD OSDs saw their occupancy creep up 
0.01% at a time (we gained about 2% over ~4 hours).
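
For anyone wanting to watch for the same symptom, something like the 
following should show the growth. The store.db path assumes the default mon 
data directory and a mon id matching the short hostname, so adjust to your 
layout:

    # Size of the mon store (run on a mon node; default data dir assumed)
    du -sh /var/lib/ceph/mon/ceph-$(hostname -s)/store.db

    # Per-OSD occupancy, to catch the slow creep on the SSD OSDs
    ceph osd df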

Once recovery was complete and the cluster status returned to "healthy", 
the MON DB shrank back to its normal size and the SSD OSDs returned to their 
normal occupancy, i.e. the level from before the incident.

So our assumption now, which seems more rational, is that during recovery the 
cluster keeps many more (if not all?) PGmaps, which would explain why the MON 
DB swells, and that the OSDs probably also keep (would keep) far more copies 
than they do when the cluster is in an OK state.
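
If that is right, one concrete thing to look at is the range of osdmap 
epochs the MONs retain, which should widen while the cluster is unhealthy. 
A quick check (field names as they appear in "ceph report" on recent 
releases; treat this as a sketch, not a verified recipe):

    # First/last osdmap epochs still committed in the mon store; the gap
    # between the two should grow while PGs are not clean
    ceph report 2>/dev/null | jq '.osdmap_first_committed, .osdmap_last_committed'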

The questions that now arise are: why does an event as simple as an OSD going 
down cause such significant swelling of the MON DB (we did not see this 
before the upgrade and conversion)? And why does OSD space usage grow to the 
point where the OSDs can become saturated?
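
On the OSD side, one way to see whether the OSDs are indeed hoarding maps is 
the admin socket ("osd.12" below is just a placeholder for one of your OSD 
ids):

    # Run on the node hosting the OSD; oldest_map/newest_map show the
    # range of osdmaps this daemon is currently keeping
    ceph daemon osd.12 status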

In any case, coming back to our previous assumption, the connecting point is 
likely that scrubs are suspended while the cluster is not "healthy"; hence 
our earlier observation/assumption that restarting the scrubs had an impact, 
which is probably not actually the case.
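
That suspension matches the default behaviour, if I read the option 
correctly (option name assumed unchanged on our release):

    # Scrubbing is skipped while recovery is active unless this is true
    ceph config get osd osd_scrub_during_recovery    # default: false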

As for our upgrade/conversion plan, doing it server by server seems safer.
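
Roughly, per server, something like this (a sketch, not our final runbook):

    ceph osd set noout     # avoid triggering rebalancing while the host is down
    # ...stop the OSDs, upgrade/convert the host, restart the OSDs...
    ceph osd unset noout
    ceph -s                # wait for HEALTH_OK before moving to the next server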

Stay tuned...

Olivier

