Hi,

In the current codebase, JMXTimer exposes its attributes in inconsistent
time units. The percentiles, Mean and DurationUnit attributes use
micros, but Values and RecentValues are in nanos, because the
underlying Timer collects time values in nanos.

The inconsistency leads to confusion and misinterpretation of the values
when the end user is not familiar with the implementation details. One may
assume that Values and RecentValues are also in micros, as stated by the
DurationUnit attribute.

Besides the confusion, given that the intention is to record time values at
micros resolution, we do not need to allocate 165 buckets in the
DecayingEstimatedHistogramReservoir. 165 buckets are necessary for nanos,
but not for micros. We could allocate only 90 buckets, which should reduce
the memory footprint of the Timers by ~50%.
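
To make the bucket-count argument concrete, here is a minimal sketch of how
the largest trackable value grows with the number of buckets. It assumes the
offsets start at 1 and grow by a factor of about 1.2 per bucket, with
duplicates after rounding bumped by one. The class and method names are
hypothetical and this is not the actual reservoir code.

```java
public class BucketOffsets {
    // Assumption: offsets start at 1 and each next offset is round(previous * 1.2);
    // when rounding yields the same value, it is bumped by 1 to stay distinct.
    static long[] newOffsets(int size) {
        long[] result = new long[size];
        long last = 1;
        result[0] = last;
        for (int i = 1; i < size; i++) {
            long next = Math.round(last * 1.2);
            if (next == last) {
                next++;
            }
            result[i] = next;
            last = next;
        }
        return result;
    }

    public static void main(String[] args) {
        // Print the largest offset for each bucket count discussed above.
        for (int size : new int[] {90, 165}) {
            long[] offsets = newOffsets(size);
            System.out.println(size + " buckets -> largest offset " + offsets[size - 1]);
        }
    }
}
```

Read the largest offset as micros for the 90-bucket case and as nanos for the
165-bucket case to compare the time ranges each configuration can cover.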

Unifying the exposed values in the same unit is a relatively small change.
However, it changes the exposed metrics API, so I'd like to start this
discussion thread to gather your opinions and, hopefully, avoid breaking your
tooling.

There are several options (all for timers specifically):

   1. Enforce the consistency of the time unit, in one of two ways:
      - Change all JMXTimers to store values in micros and reduce the
      bucket count to 90. The change has no impact on reading the
      statistics, but the long[] returned by Values and RecentValues
      shrinks to 91 elements, and the values are based on micros.
      - Change all JMXTimers to store values in nanos. The change makes
      the percentiles and mean values come back in nanos, but has no
      impact on the raw histogram values, i.e., Values and RecentValues.
   2. Add a toggle to either keep the current inconsistency or record
   everything in micros. This is less invasive than option 1, and it does
   not affect your monitoring tooling if it reads the Values (raw histogram
   values) at nanos resolution.

I'd prefer option 1, so that the DurationUnit attribute correctly describes
the other attributes of the JMXTimer. For most timers, we do not need
nanos resolution, and recording them in micros roughly halves the memory
footprint for timers. If some timers do need nanos resolution, their duration
unit can be changed to nanos. An external process that reads the attributes
can then interpret the values correctly based on the duration unit.
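
As an illustration of that last point, a consumer could normalize whatever it
reads using the unit string. This is only a sketch: it assumes DurationUnit
is reported as a lowercase java.util.concurrent.TimeUnit name such as
"microseconds", and the class and method names are hypothetical.

```java
import java.util.Locale;
import java.util.concurrent.TimeUnit;

public class DurationUnitConverter {
    // Assumption: durationUnit is a TimeUnit name in any case, e.g. "microseconds".
    // Converts a value expressed in that unit to milliseconds.
    static double toMillis(double value, String durationUnit) {
        TimeUnit unit = TimeUnit.valueOf(durationUnit.trim().toUpperCase(Locale.ROOT));
        // Nanoseconds per one unit, then scale down to milliseconds.
        return value * unit.toNanos(1) / 1_000_000.0;
    }

    public static void main(String[] args) {
        // The same raw number means very different things depending on the unit.
        System.out.println(toMillis(1500.0, "microseconds"));
        System.out.println(toMillis(1500.0, "nanoseconds"));
    }
}
```

With consistent units, a tool written this way would keep working whichever
unit a given timer ends up using.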

Thoughts?

- Yifan
