Some cases that can lead to unfound objects:
1) If min_size = 1.
In this case you may have only 1 OSD acting for the PG, with the other 2
OSDs down. The 2 down OSDs could be down due to problems with the drives
themselves, their hosts going down, or heartbeats not being received
under load, in which case the 2 OSDs will be marked down and could be
flapping (up/down). When recovery kicks in and other OSDs are assigned
as replacements, or (most likely in your case) the downed OSDs come back
up, a new epoch is formed, the PGs peer and agree on the state of the
PG, the PG becomes active, and backfill and/or recovery occurs. If the
primary then has a hardware failure before all changes were synced, the
remaining OSDs will be aware of the new/updated objects (since they
peered) but will not have all of that data synced. So the new state of
the PG will show unfound objects.
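To reduce exposure to this scenario, you can check and raise min_size on the pool. A minimal sketch using the standard Ceph CLI (the pool name "mypool" is a placeholder):

```shell
# Check current replication settings for a pool ("mypool" is a placeholder).
ceph osd pool get mypool size
ceph osd pool get mypool min_size

# With size=3, min_size=2 means the PG stops serving I/O when only one
# replica survives, instead of accepting writes on a single copy.
ceph osd pool set mypool min_size 2
```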
2) Power outage with incorrect/bad hardware. If you have a power outage
affecting all/many OSDs, and your HDDs sit behind a controller with
writeback cache enabled but no battery backup, or you have
consumer-grade SSDs without power-loss protection (PLP), it is possible
that some recently acknowledged write transactions do not fully persist.
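One way to check whether a drive's volatile write cache is enabled, using smartctl and (for SATA drives) hdparm. The device name /dev/sda is an example; run as root:

```shell
# Query the drive's volatile write cache state (example device /dev/sda).
smartctl -g wcache /dev/sda

# For SATA drives, hdparm can also report and toggle the write cache:
hdparm -W /dev/sda      # show current setting
hdparm -W 0 /dev/sda    # disable the volatile write cache
```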
3) I think in EC, if you have m+1 or more shard failures, the remaining
k-1 (or fewer) shards will peer but will declare all their objects as
unfound, as they only hold shards and cannot reconstruct the objects. In
contrast, in replicated pools, if you lose all replicas the PG will be
down or unknown.
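For completeness, a rough sketch of how one might inspect unfound objects and, as a last resort, resolve them. The PG id 2.5 is just an example placeholder:

```shell
# List PGs reporting unfound objects.
ceph health detail

# Inspect a specific PG (the PG id 2.5 is an example placeholder).
ceph pg 2.5 query
ceph pg 2.5 list_unfound

# Last resort, only after every OSD that might hold the objects has been
# brought back or ruled out: "revert" rolls objects back to a previous
# version if one exists, "delete" gives them up entirely.
# Both discard the newest writes.
ceph pg 2.5 mark_unfound_lost revert
```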
/Maged
On 12/05/2025 21:44, Alex wrote:
Hi everyone.
Help me settle a debate.
My coworker is seeing
OBJECT_UNFOUND and PG_DAMAGED (recovery_unfound) errors.
We both agree they are caused by bad drives.
The fix is to mark the drive as out, replace it and add it back in.
Whenever we see this error on Ceph we see corresponding read errors on
the physical drive.
I'm saying that even though the drive is bad, there are two more copies;
only 1 of 3 drives has bad sectors preventing the data from being
accessed, which is what dmesg is showing,
i.e. critical medium error, dev sd..., sector 12345...
There should not be OBJECT_UNFOUND, since Ceph compares the remaining
two copies and, assuming the data matches,
it should be able to recover on its own and move the data to another
PG; or maybe OBJECT_UNFOUND and PG_DAMAGED are warnings, not errors.
My coworker is saying that because the primary OSD responsible for
coordinating the PG was the one which failed,
and it is the "source of truth", the cluster goes into an error state.
His argument doesn't make sense to me since there should be no single
point of failure,
but I'm also not sure about my argument since I don't know enough
about how Ceph works under the hood.
Thanks.
_______________________________________________
ceph-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]