Hi Everyone,
I'm running Ceph Nautilus on CentOS 7, using NFS-Ganesha to serve CephFS to a
couple of CentOS 6 clients. We have 180 OSDs, each a 12TB disk, evenly
spread across 6 servers.
Fairly often, the cluster reports a health warning like:
OBJECT_UNFOUND 1/231940937 objects unfound (0.000%)
pg 1.542 has 1 unfound objects
It's usually a very small number of unfound objects.
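
To keep a record of which PGs are affected each time, the warning text could be parsed with something like the sketch below. It assumes the detail lines always look like the "pg 1.542 has 1 unfound objects" line above; `unfound_pgs` is a hypothetical helper name, not part of any Ceph tooling.

```python
import re

# Matches detail lines such as "pg 1.542 has 1 unfound objects".
# Assumption: the PG id is "<pool number>.<hex seed>", as in the sample above.
UNFOUND_RE = re.compile(
    r"^\s*pg\s+(\d+\.[0-9a-f]+)\s+has\s+(\d+)\s+unfound objects?",
    re.MULTILINE,
)

def unfound_pgs(health_text):
    """Return {pgid: unfound_count} parsed from health warning text."""
    return {pgid: int(count) for pgid, count in UNFOUND_RE.findall(health_text)}

sample = """OBJECT_UNFOUND 1/231940937 objects unfound (0.000%)
pg 1.542 has 1 unfound objects
"""
print(unfound_pgs(sample))  # {'1.542': 1}
```

Logging this with a timestamp on each occurrence would show whether the same PGs (and therefore the same OSDs) keep coming up.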
I can't determine what is causing this, but when it happens the NFS mounts
hang, while the CephFS mounts on other servers do not. This leads me to
believe NFS is somehow the culprit. When this happens, I recover the
NFS service by reverting the unfound object:
ceph pg 1.542 mark_unfound_lost revert
The Red Hat documentation
<https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/3/html/troubleshooting_guide/troubleshooting-placement-groups#unfound-objects>
states that this can happen when OSDs are going online and offline, which is
not happening here. All the OSDs are stable.
This service had been VERY stable for weeks, until I ran the latest upgrades
last evening from 14.2.20 to 14.2.22. Now it's been happening frequently (4
times so far since last evening).
I have a few questions:
1. Is there a way to tell which files are associated with a PG in CephFS? My
thinking is that knowing the file locations and what's going on with these
files at that time might provide a clue, and might tell me whether I'm losing
data when performing a revert.
2. What does "revert" really do?
3. What troubleshooting might I do to pinpoint the cause?
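
On question 1, one approach that might work: CephFS data objects are, as far as I know, named <inode in hex>.<object index in hex>, so an object name reported for the PG (for example by `ceph pg 1.542 list_unfound`) could be turned back into an inode number and looked up on a mounted filesystem. The sketch below assumes that naming scheme; the object name and the /mnt/cephfs mount point are made-up examples, and `parse_cephfs_oid` and `find_command` are hypothetical helper names.

```python
def parse_cephfs_oid(oid):
    """Split a CephFS data object name into (inode_number, object_index).

    Assumption: the name has the form "<inode hex>.<index hex>".
    """
    ino_hex, sep, index_hex = oid.partition(".")
    if not sep or not ino_hex or not index_hex:
        raise ValueError("not a CephFS data object name: %r" % oid)
    return int(ino_hex, 16), int(index_hex, 16)

def find_command(oid, mountpoint="/mnt/cephfs"):
    """Build a find(1) command that searches the mount for the owning file."""
    ino, _ = parse_cephfs_oid(oid)
    return "find %s -inum %d" % (mountpoint, ino)

# Hypothetical object name, not taken from a real cluster.
example_oid = "10000000abc.00000002"
ino, index = parse_cephfs_oid(example_oid)
print(ino, index)
print(find_command(example_oid))
```

The object index says which chunk of the file is affected; with the default file layout that would be the third 4 MiB chunk in this example, but a custom layout changes that mapping.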
Thanks for any pointers; I'm fairly new to Ceph.
Jeff Turmelle
—
Jeff Turmelle, Lead Systems Analyst
International Research Institute for Climate and Society
<http://iri.columbia.edu/>
The Earth Institute <http://www.earthinstitute.columbia.edu/> at Columbia
University <http://www.columbia.edu/>
cell: (845) 652-3461