Hi,
I sent another version of this message with pictures; it is awaiting moderation because it is so large - apologies for that.
In the meantime I got approval to share the output of some of the commands - see attached.
I have a 19.2.2 cluster deployed with cephadm:
7 nodes and 2 networks (cluster: 2 x 100Gb, public: 2 x 25Gb).
It provides an S3 bucket as a target for my backups.
The underlying pool (default.rgw.buckets.data) uses an EC 4+2 profile with a storage class for spinning disks.
All spinning disks keep their WAL/DB on NVMe.
The amount of data grew pretty fast and, since I had created the pool with pg_autoscale_mode = warn, I decided to increase the number of PGs manually (from 128 to 256).
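For reference, the change was made with something along these lines (the standard command, pool name as above):

  ceph osd pool set default.rgw.buckets.data pg_num 256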
As expected, the backfilling started... and it has never finished. Even now, after more than a week, I still have about 29 PGs backfilling and 13 in backfill_wait.
What worries me is that the number of backfilling PGs varies very little over time (e.g. 28 and 12), although there is constant "recovery" traffic of between 250 and 350 MiB/s.
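In case it matters, these are the standard views I am using to watch the progress (nothing exotic):

  ceph -s
  ceph pg dump pgs_brief | grep backfill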
There is no OSD or capacity issue (if I enable pg_autoscale_mode, the cluster health is OK).
The "recovery" seems to be doing something, but the number of objects stays the same.
Since recovery should run over the cluster network and the amount of data in the pool is not huge, I am not sure why it is taking so many days - it actually seems stuck.
The only strange thing I noticed is a discrepancy between the pg_num and pgp_num the pool currently has and what autoscale-status reports.
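This is how I am comparing the two (stock commands, pool name as above):

  ceph osd pool get default.rgw.buckets.data pg_num
  ceph osd pool get default.rgw.buckets.data pgp_num
  ceph osd pool autoscale-status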
Any help / suggestions would be very much appreciated.
What I have tried so far (rough commands for these steps are sketched below):
- increased recovery speed (by changing the mClock profile to "high_recovery_ops" and overriding various parameters: osd_recovery_max_active, osd_recovery_max_active_hdd, etc.)
- redeployed some of the OSDs that were UP_PRIMARY for backfill_wait PGs
- queried the PGs and looked for a "stuck" reason
- stopped scrub and deep-scrub
- repaired some of the PGs
- set pg_autoscale_mode back to on
- checked the balancer status
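For completeness, here is roughly what those steps looked like as commands. The numeric values are just examples of what I experimented with, not necessarily what is set right now:

  # switch the mClock profile and allow manual recovery overrides
  ceph config set osd osd_mclock_profile high_recovery_ops
  ceph config set osd osd_mclock_override_recovery_settings true
  ceph config set osd osd_recovery_max_active_hdd 8   # example value
  ceph config set osd osd_max_backfills 4             # example value

  # inspect a waiting PG for a stuck reason
  ceph pg <pgid> query

  # stop scrub / deep-scrub during the backfill
  ceph osd set noscrub
  ceph osd set nodeep-scrub

  # repair selected PGs
  ceph pg repair <pgid>

  # re-enable the autoscaler and check the balancer
  ceph osd pool set default.rgw.buckets.data pg_autoscale_mode on
  ceph balancer status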
Many thanks
Steven