Some cases that can lead to unfound objects:
1) If min_size = 1.
In this case you may have only 1 OSD acting for the PG, with the other 2
OSDs down. The 2 down OSDs could be down due to problems with the drives
themselves, their hosts going down, or heartbeats not being received
under load, in which case the 2 OSDs will be marked down and could be
flapping (up/down). When recovery kicks in and other OSDs are assigned
as replacements, or (most likely in your case) the downed OSDs come back
up, a new epoch is formed, the PGs peer and agree on the state of the
PG, the PG becomes active, and backfill and/or recovery occurs. If the
primary then has a hardware failure before all changes were synced, the
remaining OSDs will be aware of the new/updated objects (since they
peered) but will not have all of that data synced. So the new state of
the PG will show unfound objects.
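To reduce exposure to this scenario, you can check and raise min_size on the pool. A minimal sketch using the standard Ceph CLI (the pool name "mypool" is a placeholder):

```shell
# Check current replication settings for a pool ("mypool" is a placeholder).
ceph osd pool get mypool size
ceph osd pool get mypool min_size

# With size=3, min_size=2 means the PG stops serving I/O when only one
# replica survives, instead of accepting writes on a single copy.
ceph osd pool set mypool min_size 2
```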
2) Power outage with incorrect/bad hardware. If you have a power outage
affecting all/many OSDs, and your HDDs sit behind a controller with
writeback cache enabled but no battery backup, or you have
consumer-grade SSDs without power-loss protection (PLP), it is possible
that some recently acknowledged write transactions do not fully persist.
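One way to check whether a drive's volatile write cache is enabled, using smartctl and (for SATA drives) hdparm. The device name /dev/sda is an example; run as root:

```shell
# Query the drive's volatile write cache state (example device /dev/sda).
smartctl -g wcache /dev/sda

# For SATA drives, hdparm can also report and toggle the write cache:
hdparm -W /dev/sda      # show current setting
hdparm -W 0 /dev/sda    # disable the volatile write cache
```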
3) I think in EC, if you have m+1 or more shard failures, the remaining
k-1 (or fewer) shards will peer but will declare all their objects as
unfound, as they only hold shards and cannot reconstruct the objects. In
contrast, in replicated pools, if you lose all replicas the PG will be
down or unknown.
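For completeness, a rough sketch of how one might inspect unfound objects and, as a last resort, resolve them. The PG id 2.5 is just an example placeholder:

```shell
# List PGs reporting unfound objects.
ceph health detail

# Inspect a specific PG (the PG id 2.5 is an example placeholder).
ceph pg 2.5 query
ceph pg 2.5 list_unfound

# Last resort, only after every OSD that might hold the objects has been
# brought back or ruled out: "revert" rolls objects back to a previous
# version if one exists, "delete" gives them up entirely.
# Both discard the newest writes.
ceph pg 2.5 mark_unfound_lost revert
```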
/Maged
On 12/05/2025 21:44, Alex wrote:
Hi everyone.
Help me settle a debate.
My coworker is seeing
OBJECT_UNFOUND and PG_DAMAGED (recovery_unfound) errors.
We both agree they are caused by bad drives.
The fix is to mark the drive as out, replace it and add it back in.
Whenever we see this error on Ceph we see corresponding read errors on
the physical drive.
I'm saying that even though the drive is bad, there are two more copies;
only 1 of 3 drives has bad sectors preventing the data from being
accessed, which is what dmesg is showing,
i.e. critical medium error, dev sd..., sector 12345...
There should not be OBJECT_UNFOUND, since Ceph compares the remaining
two copies and, assuming the data matches,
it should be able to recover on its own and move the data to another
PG; or maybe OBJECT_UNFOUND and PG_DAMAGED are warnings, not errors.
My coworker is saying that because the primary OSD responsible for
coordinating the PG was the one which failed,
and it is the "source of truth", the cluster goes into an error state.
His argument doesn't make sense to me since there should be no single
point of failure,
but I'm also not sure about my argument since I don't know enough
about how Ceph works under the hood.
Thanks.
_______________________________________________
ceph-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]