Hi!

I don't know if we ever discussed this in the bigger forum, so let's start:
I would like to check the team and community feelings around the idea of
*replacing and deprecating classic histograms in some (very) distant future*
with the *native histograms* (with different bucketing options). This is not
a formal proposal, but rather initial thoughts for this vision and intention.

*Why?*

   1. Classic histograms are one of the main sources of cardinality, on top
   of other inefficiencies. This is because each bucket is a separate, unique
   counter series that has to be indexed and represented everywhere (e.g.
   exposition format, query response, remote write, etc.). They are not
   "sparse" (if you had zero observations in a bucket, you still pay for it)
   and they are always floats. This not only increases cost, it also reduces
   histogram accuracy, because users have to cap their bucket resolution to a
   minimum.
   2. They are generally not usable without series scrape transactionality.
   That transactionality is non-trivial to support across the Prometheus
   server, especially when integrating with remote storage.
   3. They only support a manual, "custom" (explicit) bucketing.
   4. Aggregating over time/labels for histograms with different buckets is
   impossible.
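To put a rough number on point 1, here is a back-of-the-envelope sketch (the bucket and label counts are made up for illustration): a classic histogram exposes one "_bucket" series per explicit bucket, plus the implicit "+Inf" bucket, "_sum", and "_count", and that whole set is repeated for every unique label combination.

```python
# Rough cardinality sketch for a classic histogram (illustrative numbers only).
# Each explicit bucket is its own counter series, plus +Inf, _sum and _count,
# and the whole set is repeated for every unique label combination.

def classic_histogram_series(buckets: int, label_combinations: int) -> int:
    # bucket upper bounds + the implicit +Inf bucket + _sum + _count
    series_per_combination = buckets + 1 + 2
    return series_per_combination * label_combinations

# e.g. 10 buckets scraped from 50 instances x 5 handlers:
print(classic_histogram_series(10, 50 * 5))  # 3250 series for ONE metric
```

A native histogram stores all of that in a single series per label combination, which is where the cardinality win comes from.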

*How?*

Native histograms
<https://github.com/prometheus/docs/blob/b9657b5f5b264b81add39f6db2f1df36faf03efe/content/docs/concepts/native_histograms.md>
(the design
is here
<https://docs.google.com/document/d/1cLNv3aufPZb3fNfaJgdaRBZsInZKKIHo9E6HinJVbpM/edit#heading=h.2ywoyp13c0bx>
, also on Grafana blog
<https://grafana.com/blog/2021/11/03/how-sparse-histograms-can-improve-efficiency-precision-and-mergeability-in-prometheus-tsdb/>)
have been designed and are now implemented almost everywhere, solving all 4
problems above. In terms of bucketing, native histograms were implemented
with sparse, exponential bucketing. Generally, this is what I see as the
most efficient and recommended bucketing going forward for any new
histograms in the Prometheus ecosystem.
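For a feel of the exponential bucketing: with the standard exponential schemas, a histogram at schema s uses a growth factor of 2^(2^-s) between consecutive bucket boundaries, so positive bucket i spans (base^(i-1), base^i]. A small sketch of just the boundary math (indices chosen arbitrarily for illustration):

```python
# Sketch of the standard exponential bucket boundaries used by native
# histograms: at schema s, positive bucket i spans (base^(i-1), base^i]
# where base = 2 ** (2 ** -s). Higher schema = finer resolution.

def bucket_upper_bound(schema: int, index: int) -> float:
    base = 2.0 ** (2.0 ** -schema)
    return base ** index

# schema 0 doubles each bucket: ..., 1, 2, 4, 8, ...
print([bucket_upper_bound(0, i) for i in range(4)])  # [1.0, 2.0, 4.0, 8.0]
# schema 3 grows ~9% per bucket (2**(1/8) ~ 1.0905), much higher resolution:
print(round(bucket_upper_bound(3, 1), 4))            # 1.0905
```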

Two main friction points in migrating to native histograms:

A) Native histograms come with some minor changes to how you query them via
PromQL (you don't need the "_bucket" suffix, and there are also new
functions, e.g. histogram_avg
<https://prometheus.io/docs/prometheus/latest/querying/functions/#histogram_avg>,
histogram_count and histogram_sum
<https://prometheus.io/docs/prometheus/latest/querying/functions/#histogram_count-and-histogram_sum>,
histogram_fraction
<https://prometheus.io/docs/prometheus/latest/querying/functions/#histogram_fraction>).
B) The new bucketing can, in theory, break some users who rely on classic
histograms now. It also might not fit some use cases (?)
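To make A concrete, here is roughly how the same quantile query changes (the metric name is just an illustration, not from any specific setup):

```promql
# Classic histogram: quantile computed over the per-bucket series.
histogram_quantile(0.9, sum by (le) (rate(http_request_duration_seconds_bucket[5m])))

# Native histogram: same intent, no _bucket suffix or le label needed.
histogram_quantile(0.9, sum(rate(http_request_duration_seconds[5m])))
```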

For B, luckily, native histograms were designed to allow flexible bucketing
"schema" logic. As a result, we recently
<https://github.com/prometheus/proposals/blob/main/proposals/2024-01-26_classic-histograms-stored-as-native-histograms.md>
proposed
and started implementing the "custom native histogram", which is a
shorthand for a native histogram with a custom bucketing schema (also
sometimes referred to as *nhcb* in the code and PRs). This gives a "classic
histogram" bucketing model, while keeping many of the efficiency and
transactionality benefits (it solves problems 1 to 3 above).

With custom native histograms it's trivial to *add Prometheus modes that
will translate ALL classic histograms to custom native ones on storage, to
get those benefits now*. Prometheus can hide this fact and translate them
back (to some degree) at the PromQL layer (addressing A). Similarly,
converted histograms can be reliably (in terms of transactionality) and
efficiently forwarded via remote write (Prometheus Remote-Write 2.0 now
supports custom native histograms!
<https://github.com/prometheus/docs/pull/2462>). Receivers can choose
to interpret those as native histograms or translate them back to classic
histograms for PromQL compatibility.

Custom native histograms are also a great stepping stone for migrating to
native histogram PromQL semantics, if one chooses to do so.

Note that custom native histograms are NOT the end goal. Ideally, most users
migrate to native histograms with sparse, exponential bucketing for further
efficiency gains. On top of that, the main disadvantage of custom bucketing
over exponential bucketing is that, similar to classic histograms, querying
or aggregating histograms with different buckets is simply wrong (the 4th
problem above).
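To illustrate why exponential bucketing merges cleanly where explicit buckets cannot: lowering an exponential histogram from schema s to schema s-1 just sums adjacent bucket pairs, without inventing any information. A sketch (positive buckets only; the counts are made up):

```python
# Sketch: merging exponential buckets across resolutions. Since the growth
# factor at schema s-1 is the square of the one at schema s, bucket k at
# schema s-1 exactly covers buckets 2k-1 and 2k at schema s. Dropping one
# schema level therefore just sums adjacent bucket pairs, so histograms at
# different resolutions stay exactly mergeable. Explicit (classic) buckets
# with mismatched boundaries have no such exact mapping.

def downscale(counts: dict) -> dict:
    """Convert positive-bucket counts from schema s to schema s-1."""
    out = {}
    for i, c in counts.items():
        k = (i + 1) // 2  # buckets 2k-1 and 2k collapse into bucket k
        out[k] = out.get(k, 0) + c
    return out

h = {1: 4, 2: 6, 3: 1, 4: 9}  # made-up counts at schema s
print(downscale(h))           # {1: 10, 2: 10} at schema s-1
```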

*Potential Future*

The obvious question is: *do we still need classic histograms*? I don't
think so, but I wonder what I am missing?

*Would you agree that the long-term plan is to replace and deprecate classic
histograms?* Can we at least share that intention?

Another question is the migration logic, which might warrant a separate
discussion. So far we have discussed things like emitting both classic
histograms and native histograms, but that feels like overhead. Can we do
better? Can we solve problem 1 (efficiency) straight away? Why not use
custom native histograms for every classic histogram and then have
compatibility modes on the query side if needed?

I am sure @Bjoern Rabenstein <[email protected]>, @Jeanette
<https://github.com/zenador> and @krajorama
<https://github.com/krajorama> already had some thoughts around this.

Kind Regards,
Bartek Płotka (@bwplotka)

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Developers" group.