Thank you, Anthony. I did have an empty pool that I had provisioned for developers and that was never used. I've removed that pool, and the 0-object PGs are gone. I don't know why I didn't realize that. Removing that pool halved the number of PGs not scrubbed in time.
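For reference, an empty pool can be matched to the numeric prefix of its PGs with `ceph osd pool ls detail` and then removed. A rough sketch; "dev-pool" is a hypothetical pool name here, and pool deletion has to be explicitly enabled first:

    ceph osd pool ls detail    # a line like "pool 3 'dev-pool' ..." maps the 3.* PGs to a name
    ceph config set mon mon_allow_pool_delete true
    ceph osd pool rm dev-pool dev-pool --yes-i-really-really-mean-it
    ceph config set mon mon_allow_pool_delete false    # re-disable as a safety net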
This is entirely an HDD cluster. I don't constrain my scrubs to a time window, I had already set osd_deep_scrub_interval to 2 weeks, and I had increased osd_scrub_load_threshold to 5, but that didn't help much. I've moved our operations to our failover cluster, so hopefully this one can catch up now. I don't understand how this started out of the blue, but at least the number is decreasing now.
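Both settings can be applied at runtime with `ceph config set`; roughly the following, with the interval given in seconds:

    ceph config set osd osd_deep_scrub_interval 1209600    # 2 weeks
    ceph config set osd osd_scrub_load_threshold 5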
Jeff

> On Jan 3, 2023, at 12:57 AM, Anthony D'Atri <[email protected]> wrote:
>
> Look closely at your output. The PGs with 0 objects are only "every other"
> due to how the command happened to order the output.
>
> Note that the empty PGs all have IDs matching "3.*". The numeric prefix of a
> PG ID reflects the cardinal ID of the pool to which it belongs. I strongly
> suspect that you have a pool with no data.
>
>>> Strangely, ceph pg dump shows every other PG with 0 objects. An attempt
>>> to perform a deep scrub (or scrub) on one of these PGs does nothing. The
>>> cluster appears to be running fine, but obviously there's an issue. What
>>> should my next steps be to troubleshoot?
>>>
>>>> PG_STAT OBJECTS MISSING_ON_PRIMARY DEGRADED MISPLACED UNFOUND BYTES OMAP_BYTES* OMAP_KEYS* LOG DISK_LOG STATE STATE_STAMP VERSION REPORTED UP UP_PRIMARY ACTING ACTING_PRIMARY LAST_SCRUB SCRUB_STAMP LAST_DEEP_SCRUB DEEP_SCRUB_STAMP SNAPTRIMQ_LEN
>>>> 3.e9b 0 0 0 0 0 0 0 0 0 0 active+clean 2022-12-31 22:49:07.629579 0'0 23686:19820 [28,79] 28 [28,79] 28 0'0 2022-12-31 22:49:07.629508 0'0 2022-12-31 22:49:07.629508 0
>>>> 1.e99 60594 0 0 0 0 177433523272 0 0 3046 3046 active+clean 2022-12-21 14:35:08.175858 23686'268137 23686:1732399 [178,115] 178 [178,115] 178 23675'267613 2022-12-21 11:01:10.403525 23675'267613 2022-12-21 11:01:10.403525 0
>>>> 3.e9a 0 0 0 0 0 0 0 0 0 0 active+clean 2022-12-31 09:16:48.644619 0'0 23686:22855 [51,140] 51 [51,140] 51 0'0 2022-12-31 09:16:48.644568 0'0 2022-12-30 02:35:23.367344 0
>>>> 1.e98 59962 0 0 0 0 177218669411 0 0 3035 3035 active+clean 2022-12-28 14:14:49.908560 23686'265576 23686:1357499 [92,86] 92 [92,86] 92 23686'265445 2022-12-28 14:14:49.908522 23686'265445 2022-12-28 14:14:49.908522 0
>>>> 3.e95 0 0 0 0 0 0 0 0 0 0 active+clean 2022-12-31 06:09:39.442932 0'0 23686:22757 [48,83] 48 [48,83] 48 0'0 2022-12-31 06:09:39.442879 0'0 2022-12-18 09:33:47.892142 0
>
> As to your PGs not scrubbed in time, what sort of hardware are your OSDs?
> Here are some thoughts, especially if they're HDDs.
>
> * If you don't need that empty pool, delete it, then evaluate how many PGs
> your OSDs hold on average (e.g. `ceph osd df`). If you have an unusually
> high number of PGs per OSD, you may be running afoul of
> osd_scrub_extended_sleep / osd_scrub_sleep: individual scrubs of empty PGs
> are naturally very fast, but the sleeps Ceph inserts between scrubs to
> spread out their impact still add up.
>
> * Do you limit scrubs to certain times via osd_scrub_begin_hour,
> osd_scrub_end_hour, osd_scrub_begin_week_day, and osd_scrub_end_week_day?
> I've seen operators constrain scrubs to only a few overnight / weekend
> hours, but doing so can hobble Ceph's ability to get through them all in
> time.
>
> * Similarly, a value of osd_scrub_load_threshold that's too low can also
> result in starvation. The load average statistic can be misleading on
> modern SMP systems with lots of cores: I've witnessed 32c/64t OSD nodes
> report a load average of around 40 while tools like htop showed they were
> barely breaking a sweat.
>
> * If you have osd_scrub_during_recovery disabled and experience a lot of
> backfill / recovery / rebalance traffic, that can starve scrubs too. IMHO
> with recent releases this should almost always be enabled, YMMV.
>
> * Back when I ran busy (read: underspent) HDD clusters I had to bump
> osd_deep_scrub_interval by a factor of 4x due to how slow and seek-bound
> the LFF spinners were. Of course, the longer one spaces out scrubs, the
> less effective they are at detecting problems before they become impactful.
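All of the options Anthony mentions can be inspected and adjusted at runtime with `ceph config`. A short sketch; the values shown are illustrative examples, not recommendations:

    # Inspect the current scrub window and related settings
    ceph config get osd osd_scrub_begin_hour
    ceph config get osd osd_scrub_load_threshold

    # Examples of loosening the constraints
    ceph config set osd osd_scrub_begin_hour 0            # begin=0 and end=0 allows scrubbing all day
    ceph config set osd osd_scrub_end_hour 0
    ceph config set osd osd_scrub_load_threshold 5        # default is 0.5
    ceph config set osd osd_scrub_during_recovery true
    ceph config set osd osd_deep_scrub_interval 2419200   # e.g. 4 weeks, in seconds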
