** Description changed:
[ Impact ]
* The watcher releases targeted by this SRU are using a deprecated
- ceilometer metric, cpu_util, which reported cpu utilization as a
- percentage. This metric was deprecated in Openstack Rocky in favor of
- the Gnocchi rate calculation equivalent [1].
+ ceilometer metric, cpu_util, which previously reported cpu utilization
+ as a percentage. This metric was deprecated in OpenStack Rocky in favor
+ of "the gnocchi rate calculation equivalent" [1], meaning that the cpu
+ utilization value should instead be calculated from gnocchi's rate
+ values. The ceilometer metric cpu_util was then fully removed in
+ Victoria.
* Upstream Watcher continued to use cpu_util until the commit at [2]
- landed on master for 2024.1. This commit correctly performs the cpu
- calculation and removes the deprecated metric. The calculation is
- summarized in the next bullet point and there is an example calculation
- in the original commit
+ landed on master for 2024.1. Since ceilometer no longer has a cpu_util
+ metric, polling it returns "None". As a result, all Watcher strategies
+ that rely on cpu_util, particularly those for workload balancing and
+ for migrating VMs to under-utilized hosts, are non-functional from
+ Victoria, when the metric was removed, until Caracal (2024.1).
- * The gnocchi calculation uses the cumulative cpu time in ns (reported
- by the cpu metric), taken as a rate (the difference in cumulative time
- over the last two sampling intervals) to find the total cpu time during
- the previous sampling period. Dividing the cpu time in one interval by
- the duration of the interval multiplied by the number of vcpus provides
- the cpu utilization as a percentage: cpu_usage = [cpu_time / (period *
- 10^9 * nvcpus)] * 100%. A sample calculation is provided in the original
- commit message.
+ * The commit at [2] performs the cpu calculation as intended using
+ gnocchi. The calculation is summarized in the next bullet point, and
+ there is an example calculation in the original commit message.
+
+ * Gnocchi takes the cumulative cpu time in ns (reported by the
+ ceilometer metric "cpu") and consumes it as a rate, i.e. the difference
+ in cumulative cpu time between the last two samples, which gives the
+ total cpu time used during the previous sampling period. Dividing that
+ cpu time by the duration of the interval multiplied by the number of
+ vcpus gives the cpu utilization as a percentage: cpu_usage = [cpu_time /
+ (period * 10^9 * nvcpus)] * 100%, with period in seconds. A sample
+ calculation is provided in the original commit message.
* I cherry-picked the fix to stable/2023.2 [3], but the other stable
branches are now unmaintained.
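
The calculation described above can be sketched in a few lines of Python. This is only an illustration of the formula in this description, not the code from the commit at [2]; the function names and the sample numbers (2 vcpus, a 300 s period, 150 s of cpu time consumed) are hypothetical.

```python
NS_PER_SECOND = 10**9


def rate_from_cumulative(previous_ns, current_ns):
    """cpu time (ns) consumed between two samples of the cumulative "cpu" metric."""
    return current_ns - previous_ns


def cpu_utilization_percent(cpu_time_ns, period_s, nvcpus):
    """cpu_usage = [cpu_time / (period * 10^9 * nvcpus)] * 100%."""
    if period_s <= 0 or nvcpus <= 0:
        raise ValueError("period_s and nvcpus must be positive")
    return 100.0 * cpu_time_ns / (period_s * NS_PER_SECOND * nvcpus)


# Hypothetical samples: the cumulative counter advanced by 150 s of cpu time
# during a 300 s sampling period on an instance with 2 vcpus.
previous_sample_ns = 1_000 * NS_PER_SECOND
current_sample_ns = 1_150 * NS_PER_SECOND

cpu_time_ns = rate_from_cumulative(previous_sample_ns, current_sample_ns)
usage = cpu_utilization_percent(cpu_time_ns, period_s=300, nvcpus=2)
print(f"{usage:.1f}%")  # 25.0%
```

Dividing by nvcpus keeps the result within 0-100% for multi-vcpu instances, since an instance with N vcpus can accumulate up to N seconds of cpu time per wall-clock second.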
[ Test Plan ]
* Deploy OpenStack Yoga on Jammy with the watcher and gnocchi services
* Launch a server and take note of its resource id. Then find the
gnocchi cpu metric associated with the instance
- * Create a watcher audit based on a goal that previously depended on
instance cpu utilization. For example the workload_balance goal [4]
+ * Create a watcher audit based on a goal that previously depended on
instance cpu utilization (from Watcher's perspective this is called
instance_cpu_usage). For example, the workload_balance strategy [4]
depends on instance_cpu_usage.
Ex. openstack optimize audit create -t CONTINUOUS -i 60 -g
workload_balancing -s workload_balance --auto-trigger
- Without the patch instance_cpu_usage appears as None in the audits. With
the patch you can observe the correct cpu utilization percentage in the
watcher-decision-engine.log
+
+ * Without the patch, the workload_balance strategy does not work. The
+ audit is created, but it cannot produce a meaningful action plan
+ because instance_cpu_usage is None in the audits. With the patch,
+ Watcher obtains the correct cpu utilization percentage and the
+ strategies work as expected.
* Wait for at least one sampling period to elapse and check
/var/log/watcher/watcher-decision-engine.log for entries showing
"instance_cpu_usage" - this is the cpu utilization as a percentage.
* To verify the percentage with a manual calculation, run gnocchi
measures show <metric uuid> --aggregation "rate:mean" and perform the
calculation instance_cpu_usage = 100 * [<value> / (period * 10^9 * nvcpus)],
using the cpu time <value> from the corresponding sampling period.
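
As a convenience for the manual check in the last step, the comparison against the logged value could be scripted roughly as follows. This is a hedged sketch: the helper name, the tolerance, and the sample numbers are assumptions for illustration, and the rate value and logged percentage are expected to be copied by hand from the gnocchi output and from watcher-decision-engine.log.

```python
def matches_logged_usage(rate_value_ns, period_s, nvcpus, logged_percent,
                         tolerance=0.5):
    """Check a logged instance_cpu_usage against the manual calculation.

    rate_value_ns  -- the "rate:mean" value (cpu time in ns) for one period
    period_s       -- length of that sampling period in seconds
    nvcpus         -- number of vcpus of the instance
    logged_percent -- instance_cpu_usage as seen in the decision engine log
    tolerance      -- allowed absolute difference in percentage points
                      (hypothetical default, adjust as needed)
    """
    expected = 100.0 * rate_value_ns / (period_s * 10**9 * nvcpus)
    return abs(expected - logged_percent) <= tolerance


# Hypothetical numbers: 30 s of cpu time in a 60 s period on 1 vcpu
# should correspond to roughly 50% utilization.
print(matches_logged_usage(30 * 10**9, period_s=60, nvcpus=1,
                           logged_percent=50.2))
```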
[ What can go wrong ]
- * While this is replacing a deprecated methodology and metric and
- should lead to improvements, any custom strategies relying on cpu_util
- may be affected.
+ * The patch restores functionality by calculating cpu utilization
+ using gnocchi's rate calculation. However, if gnocchi is misconfigured
+ or the relevant "cpu" metric is missing, the new calculation may not
+ work as anticipated.
[1] https://docs.openstack.org/releasenotes/ceilometer/rocky.html
[2] https://review.opendev.org/c/openstack/watcher/+/898791
[3] https://review.opendev.org/c/openstack/watcher/+/934181
- [4] https://docs.openstack.org/watcher/rocky/strategies/workload_balance.html
+ [4] https://docs.openstack.org/watcher/2024.1/strategies/workload_balance.html
--
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2088620
Title:
[SRU] Deprecated usage of cpu_util