Hi,
>did the two replaced OSDs fail at the same time (before they were
>completely drained)? This would most likely mean that both those
>failed OSDs contained the other two replicas of this PG
Unfortunately yes
>This would most likely mean that both those
>failed OSDs contained the other two replicas of this PG. A pg query
>should show which OSDs are missing.
If I understand correctly, do I need to move my PG onto OSD 1?
ceph -w
osd.1 [ERR] 11.4 has 2 objects unfound and apparently lost
ceph pg query 11.4
"up": [
1,
4,
5
],
"acting": [
1,
4,
5
],
"avail_no_missing": [],
"object_location_counts": [
{
"shards": "3,4,5",
"objects": 2
}
],
"blocked_by": [],
"up_primary": 1,
"acting_primary": 1,
"purged_snaps": []
},
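For reference, a rough sketch of commands that could help pinpoint which objects are unfound and where Ceph has already looked (the PG ID 11.4 comes from the output above; run these against the live cluster):

```shell
# List the unfound objects in PG 11.4 and the OSDs Ceph has already queried
ceph pg 11.4 list_unfound

# Show which OSDs currently map the PG (up/acting sets)
ceph pg map 11.4

# Check whether any down OSD might still hold the missing replicas
ceph osd tree
```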
Thanks
Vivien
________________________________
From: Eugen Block <[email protected]>
Sent: Tuesday, 29 July 2025 16:48:41
To: [email protected]
Subject: [ceph-users] Re: Pgs troubleshooting
Hi,
did the two replaced OSDs fail at the same time (before they were
completely drained)? This would most likely mean that both those
failed OSDs contained the other two replicas of this PG. A pg query
should show which OSDs are missing.
You could try with objectstore-tool to export the PG from the
remaining OSD and import it on different OSDs. Or you can mark the
objects as lost if you don't care about the data and want a healthy
state quickly.
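The export/import approach could look roughly like this. This is only a sketch: the OSD IDs (osd.3 as the source, osd.5 as the target), the data paths, and the PG ID 11.4 are assumptions to be adapted, and both OSD daemons must be stopped while ceph-objectstore-tool runs against them:

```shell
# Stop the OSD that still holds a copy of the PG (osd.3 here is an assumption)
systemctl stop ceph-osd@3

# Export the PG from the stopped OSD's data store
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-3 \
    --pgid 11.4 --op export --file /tmp/pg11.4.export

# Import it on another stopped OSD, then restart both daemons
systemctl stop ceph-osd@5
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-5 \
    --op import --file /tmp/pg11.4.export
systemctl start ceph-osd@3 ceph-osd@5
```

Alternatively, to give up on the unfound objects and regain health, `ceph pg 11.4 mark_unfound_lost revert` rolls objects back to a previous version if one exists, while `mark_unfound_lost delete` forgets them entirely.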
Regards,
Eugen
Quoting "GLE, Vivien" <[email protected]>:
> Thanks for your help! This is my new pg stat with no more peering
> PGs (after rebooting some OSDs)
>
> ceph pg stat ->
>
> 498 pgs: 1 active+recovery_unfound+degraded, 3
> recovery_unfound+undersized+degraded+remapped+peered, 14
> active+clean+scrubbing+deep, 480 active+clean;
>
> 36 GiB data, 169 GiB used, 6.2 TiB / 6.4 TiB avail; 8.8 KiB/s rd, 0
> B/s wr, 12 op/s; 715/41838 objects degraded (1.709%); 5/13946
> objects unfound (0.036%)
>
> ceph pg ls recovery_unfound -> shows that the PGs are replica 3; I
> tried to repair them but nothing happened
>
>
> ceph -w ->
>
> osd.1 [ERR] 11.4 has 2 objects unfound and apparently lost
>
>
>
> ________________________________
> From: Frédéric Nass <[email protected]>
> Sent: Tuesday, 29 July 2025 14:03:37
> To: GLE, Vivien
> Cc: [email protected]
> Subject: Re: [ceph-users] Pgs troubleshooting
>
> Hi Vivien,
>
> Unless you ran the 'ceph pg stat' command while peering was occurring,
> the 37 peering PGs might indicate a temporary peering issue with one or
> more OSDs. If that's the case, then restarting the associated OSDs
> could help with the peering. You could list those PGs and their
> associated OSDs with 'ceph pg ls peering' and trigger peering by
> either restarting one common OSD or by using 'ceph pg repeer <pg_id>'.
>
> Regarding the unfound object and its associated backfill_unfound PG,
> you could identify this PG with 'ceph pg ls backfill_unfound' and
> investigate this PG with 'ceph pg <pg_id> query'. Depending on the
> output, you could try running a 'ceph pg repair <pg_id>'. Could you
> confirm that this PG is not part of a size=2 pool?
>
> Best regards,
> Frédéric.
>
> --
> Frédéric Nass
> Ceph Ambassador France | Senior Ceph Engineer @ CLYSO
> Try our Ceph Analyzer -- https://analyzer.clyso.com/
> https://clyso.com | [email protected]
>
>
> On Tue, 29 Jul 2025 at 14:19, GLE, Vivien
> <[email protected]> wrote:
> Hi,
>
> After replacing 2 OSD (data corruption), this is the stats of my
> testing ceph cluster
>
> ceph pg stat
>
> 498 pgs: 37 peering, 1 active+remapped+backfilling, 1
> active+clean+remapped, 1 active+recovery_wait+undersized+remapped, 1
> backfill_unfound+undersized+degraded+remapped+peered, 1
> remapped+peering, 12 active+clean+scrubbing+deep, 1
> active+undersized, 442 active+clean, 1
> active+recovering+undersized+remapped
>
> 34 GiB data, 175 GiB used, 6.2 TiB / 6.4 TiB avail; 1.7 KiB/s rd, 1
> op/s; 31/39768 objects degraded (0.078%); 6/39768 objects misplaced
> (0.015%); 1/13256 objects unfound (0.008%)
>
> ceph osd stat
> 7 osds: 7 up (since 20h), 7 in (since 20h); epoch: e427538; 4 remapped pgs
>
> Does anyone have an idea of where to start to get back to a healthy cluster?
>
> Thanks !
>
> Vivien
>
>
> _______________________________________________
> ceph-users mailing list -- [email protected]
> To unsubscribe send an email to [email protected]