I'm in the middle of increasing the PG count for one of our pools in small
increments: waiting for the data movement to complete, rinse and repeat. I'm
doing it this way so I can control when all this activity happens and keep it
away from the busier production traffic times.
I'm expecting some imbalance as PGs get created on already-unbalanced OSDs,
however our monitoring picked up something today that I'm not really
understanding. Our total utilization is just over 50% and about 96% of our
total data is in this one pool. Due to there not being enough PGs, the amount
of data in each is quite large and since they aren't evenly spread across the
OSDs, there's a bit of imbalance. That's all cool and to be expected, which is
the reason for increasing the PG count in the first place.
However, as some PGs are splitting, the new PGs are sometimes being created on
OSDs that already have a disproportionate amount of data. Again, not totally
unexpected. Our monitoring detected the usage of this pool to be >85% today as
I neared the end of another increase in PG count. What I'm not understanding
is how this value is determined. I've read other posts, and the calculations
suggested there don't give a result that matches what shows in my %USED
column. I suspect it's somehow related to the MAX AVAIL value (which I believe
is somewhat indirectly related to the amount available, based on the
individual OSD utilization), but none of the posts I read mention this in
their calculations, and I've been unable to combine any of the values I have
into a formula that ends up with the %USED value I'm seeing.
For the record, my current total utilization based on a 'ceph osd df' looks
like this:
         SIZE     USE      AVAIL    %USE
TOTAL    39507G   19931G   17568G   50.45
My most utilised OSD (currently in the process of moving some data off this
OSD) is 81.58% used with 188G available and a variance of 1.62.
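(Incidentally, that variance figure looks like it's simply the OSD's %USE
divided by the cluster-wide average %USE. That's just my assumption from
eyeballing the numbers, not something I've confirmed in the code:)

```python
# My guess at how 'ceph osd df' derives the VAR column (unconfirmed):
# per-OSD %USE divided by the cluster-wide average %USE.
avg_pct_use = 50.45   # TOTAL %USE from 'ceph osd df'
osd_pct_use = 81.58   # my most utilised OSD

var = osd_pct_use / avg_pct_use
print(round(var, 2))  # 1.62 -- matches the VAR reported for that OSD
```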
A cut-down output of 'ceph df' looks like this:
GLOBAL:
    SIZE      AVAIL     RAW USED   %RAW USED
    39507G    17569G    19930G     50.45
POOLS:
    NAME                       ID   USED    %USED   MAX AVAIL   OBJECTS
    default.rgw.buckets.data   30   9552G   86.05   1548G       36285066
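The closest I've come to reproducing that 86.05 figure is treating the pool's
effective capacity as USED + MAX AVAIL, i.e. %USED = USED / (USED + MAX
AVAIL). That's purely my own guess from fitting the numbers above, but it
lands exactly on the reported value, and it would also explain why MAX AVAIL
(and hence the fullest OSDs) feeds into it:

```python
# My guess (not confirmed against the Ceph source): pool %USED in 'ceph df'
# computed as USED / (USED + MAX AVAIL), using the values from my pool.
used_g = 9552        # USED for default.rgw.buckets.data
max_avail_g = 1548   # MAX AVAIL for the same pool

pct_used = used_g / (used_g + max_avail_g) * 100
print(round(pct_used, 2))  # 86.05 -- exactly the %USED shown above
```

If that is the actual relationship, it would fit my suspicion that draining
the over-utilized OSDs (raising MAX AVAIL) will bring %USED back down.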
I suspect that as I get the utilization of my over-utilized OSDs down, this
%USED value will drop. But, I'd just love to fully understand how this value
is calculated.
Thanks,
Mark J
_______________________________________________
ceph-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]