Thank you, Anthony. I did have an empty pool that I had provisioned for developers and that was never used. I've removed that pool, and the 0-object PGs are gone. I don't know why I didn't realize that. Removing that pool halved the number of PGs not scrubbed in time.
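For reference, an empty pool can be matched to the numeric prefix of its PGs with `ceph osd pool ls detail` and then removed. A rough sketch; "dev-pool" is a hypothetical pool name here, and pool deletion has to be explicitly enabled first:

    ceph osd pool ls detail    # a line like "pool 3 'dev-pool' ..." maps the 3.* PGs to a name
    ceph config set mon mon_allow_pool_delete true
    ceph osd pool rm dev-pool dev-pool --yes-i-really-really-mean-it
    ceph config set mon mon_allow_pool_delete false    # re-disable as a safety net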
This is entirely an HDD cluster. I don't constrain my scrubs to a time window, I had already set osd_deep_scrub_interval to 2 weeks, and I had increased osd_scrub_load_threshold to 5, but that didn't help much. I've moved our operations to our failover cluster, so hopefully this one can catch up now. I don't understand how this started out of the blue, but at least the number is decreasing now.
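Both settings can be applied at runtime with `ceph config set`; roughly the following, with the interval given in seconds:

    ceph config set osd osd_deep_scrub_interval 1209600    # 2 weeks
    ceph config set osd osd_scrub_load_threshold 5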
Jeff

> On Jan 3, 2023, at 12:57 AM, Anthony D'Atri <[email protected]> wrote:
>
> Look closely at your output. The PGs with 0 objects are only "every other"
> due to how the command happened to order the output.
>
> Note that the empty PGs all have IDs matching "3.*". The numeric prefix of a
> PG ID reflects the cardinal ID of the pool to which it belongs. I strongly
> suspect that you have a pool with no data.
>
>>> Strangely, ceph pg dump shows every other PG with 0 objects. An attempt
>>> to perform a deep scrub (or scrub) on one of these PGs does nothing. The
>>> cluster appears to be running fine, but obviously there's an issue. What
>>> should my next steps be to troubleshoot?
>>>
>>>> PG_STAT OBJECTS MISSING_ON_PRIMARY DEGRADED MISPLACED UNFOUND BYTES OMAP_BYTES* OMAP_KEYS* LOG DISK_LOG STATE STATE_STAMP VERSION REPORTED UP UP_PRIMARY ACTING ACTING_PRIMARY LAST_SCRUB SCRUB_STAMP LAST_DEEP_SCRUB DEEP_SCRUB_STAMP SNAPTRIMQ_LEN
>>>> 3.e9b 0 0 0 0 0 0 0 0 0 0 active+clean 2022-12-31 22:49:07.629579 0'0 23686:19820 [28,79] 28 [28,79] 28 0'0 2022-12-31 22:49:07.629508 0'0 2022-12-31 22:49:07.629508 0
>>>> 1.e99 60594 0 0 0 0 177433523272 0 0 3046 3046 active+clean 2022-12-21 14:35:08.175858 23686'268137 23686:1732399 [178,115] 178 [178,115] 178 23675'267613 2022-12-21 11:01:10.403525 23675'267613 2022-12-21 11:01:10.403525 0
>>>> 3.e9a 0 0 0 0 0 0 0 0 0 0 active+clean 2022-12-31 09:16:48.644619 0'0 23686:22855 [51,140] 51 [51,140] 51 0'0 2022-12-31 09:16:48.644568 0'0 2022-12-30 02:35:23.367344 0
>>>> 1.e98 59962 0 0 0 0 177218669411 0 0 3035 3035 active+clean 2022-12-28 14:14:49.908560 23686'265576 23686:1357499 [92,86] 92 [92,86] 92 23686'265445 2022-12-28 14:14:49.908522 23686'265445 2022-12-28 14:14:49.908522 0
>>>> 3.e95 0 0 0 0 0 0 0 0 0 0 active+clean 2022-12-31 06:09:39.442932 0'0 23686:22757 [48,83] 48 [48,83] 48 0'0 2022-12-31 06:09:39.442879 0'0 2022-12-18 09:33:47.892142 0
>
> As to your PGs not scrubbed in time, what sort of hardware are your OSDs?
> Here are some thoughts, especially if they're HDDs.
>
> * If you don't need that empty pool, delete it, then evaluate how many PGs
> your OSDs hold on average (e.g. `ceph osd df`). If you have an unusually
> high number of PGs per OSD, you may be running afoul of
> osd_scrub_extended_sleep / osd_scrub_sleep: individual scrubs of empty PGs
> are naturally very fast, but the sleeps Ceph inserts between scrubs to
> spread out their impact still add up.
>
> * Do you limit scrubs to certain times via osd_scrub_begin_hour,
> osd_scrub_end_hour, osd_scrub_begin_week_day, and osd_scrub_end_week_day?
> I've seen operators constrain scrubs to only a few overnight / weekend
> hours, but doing so can hobble Ceph's ability to get through them all in
> time.
>
> * Similarly, a value of osd_scrub_load_threshold that's too low can also
> result in starvation. The load average statistic can be misleading on
> modern SMP systems with lots of cores: I've witnessed 32c/64t OSD nodes
> report a load average of around 40 while tools like htop showed they were
> barely breaking a sweat.
>
> * If you have osd_scrub_during_recovery disabled and experience a lot of
> backfill / recovery / rebalance traffic, that can starve scrubs too. IMHO
> with recent releases this should almost always be enabled, YMMV.
>
> * Back when I ran busy (read: underspent) HDD clusters I had to bump
> osd_deep_scrub_interval by a factor of 4x due to how slow and seek-bound
> the LFF spinners were. Of course, the longer one spaces out scrubs, the
> less effective they are at detecting problems before they become impactful.
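All of the options Anthony mentions can be inspected and adjusted at runtime with `ceph config`. A short sketch; the values shown are illustrative examples, not recommendations:

    # Inspect the current scrub window and related settings
    ceph config get osd osd_scrub_begin_hour
    ceph config get osd osd_scrub_load_threshold

    # Examples of loosening the constraints
    ceph config set osd osd_scrub_begin_hour 0            # begin=0 and end=0 allows scrubbing all day
    ceph config set osd osd_scrub_end_hour 0
    ceph config set osd osd_scrub_load_threshold 5        # default is 0.5
    ceph config set osd osd_scrub_during_recovery true
    ceph config set osd osd_deep_scrub_interval 2419200   # e.g. 4 weeks, in seconds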
