Hi! I don't know if we ever discussed this in the bigger forum, so let's start: I would like to check the team and community feelings around the idea of *replacing and deprecating classic histograms in some (very) distant future* with *native histograms* (with different bucketing options). This is not a formal proposal, but rather initial thoughts on this vision and intention.
*Why?*

1. Classic histograms are one of the main sources of cardinality, on top of other inefficiencies. Each bucket is a separate, unique counter series that has to be indexed and represented everywhere (e.g. in the exposition format, query responses, remote write, etc.). The buckets are not "sparse" (if you have zero observations in a bucket you still pay for it) and they are always floats. This causes more cost, and in return lower accuracy, because users have to cap their resolution to a minimum.
2. They are generally not usable without transactionality across the scraped series. That transactionality is non-trivial to support across the Prometheus server, especially when integrating with remote storage.
3. They only support manual, "custom" (explicit) bucketing.
4. Aggregating over time or labels across histograms with different buckets is impossible.

*How?*

Native histograms <https://github.com/prometheus/docs/blob/b9657b5f5b264b81add39f6db2f1df36faf03efe/content/docs/concepts/native_histograms.md> (the design is here <https://docs.google.com/document/d/1cLNv3aufPZb3fNfaJgdaRBZsInZKKIHo9E6HinJVbpM/edit#heading=h.2ywoyp13c0bx>, also on the Grafana blog <https://grafana.com/blog/2021/11/03/how-sparse-histograms-can-improve-efficiency-precision-and-mergeability-in-prometheus-tsdb/>) were designed, and are now implemented almost everywhere, solving all 4 problems above. In terms of bucketing, native histograms were implemented with sparse, exponential bucketing. Generally this is what I see as the most efficient and recommended bucketing going forward for any new histograms in the Prometheus ecosystem.

There are two main friction points in migrating to native histograms:

A) Native histograms come with some minor changes to how you query them via PromQL. You don't need the "_bucket" suffix (e.g. histogram_quantile(0.9, sum by (le) (rate(foo_bucket[5m]))) becomes histogram_quantile(0.9, sum(rate(foo[5m])))), and there are new functions, e.g. histogram_avg <https://prometheus.io/docs/prometheus/latest/querying/functions/#histogram_avg>, histogram_count and histogram_sum <https://prometheus.io/docs/prometheus/latest/querying/functions/#histogram_count-and-histogram_sum>, histogram_fraction <https://prometheus.io/docs/prometheus/latest/querying/functions/#histogram_fraction>.

B) The new bucketing can, in theory, break some users that rely on classic histograms now. It also might not fit some use cases (?).

For B, luckily, native histograms were designed to allow flexible bucketing "schema" logic. As a result, we recently <https://github.com/prometheus/proposals/blob/main/proposals/2024-01-26_classic-histograms-stored-as-native-histograms.md> proposed and started implementing the "custom native histogram", which is shorthand for a native histogram with a custom bucketing schema (also sometimes referred to as *nhcb* in the code and PRs). This gives the "classic histogram" bucketing model, while keeping many of the efficiency and transactionality benefits (it solves problems 1 to 3 above).

With custom native histograms it's trivial to *add Prometheus modes that translate ALL classic histograms to custom native ones on storage, to get those benefits now*. Prometheus can hide this fact and translate them back (to some degree) on the PromQL layer (for A). Similarly, converted histograms can be reliably (in terms of transactionality) and efficiently forwarded via remote write (Prometheus Remote-Write 2.0 now supports custom native histograms! <https://github.com/prometheus/docs/pull/2462>). Receivers can choose to interpret them as native histograms or translate them back to classic histograms for PromQL compatibility. Custom native histograms are also great for migrating to native histogram PromQL semantics, if one chooses to do so.

Note that custom native histograms are NOT the end goal.
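To illustrate why the sparse exponential bucketing aggregates cleanly across different resolutions (unlike classic or custom buckets, see problem 4), here is a minimal Python sketch of the bucket-index math, assuming the documented base of 2^(2^-schema). The function names are mine, and it deliberately ignores the zero bucket, negative buckets and sparse span encoding that real native histograms have:

```python
import math

def bucket_index(value: float, schema: int) -> int:
    """Index i such that value falls in (base**(i-1), base**i],
    with base = 2**(2**-schema). Higher schema = finer resolution.
    Simplified sketch: assumes value > 0, no zero bucket."""
    base = 2.0 ** (2.0 ** -schema)
    return math.ceil(math.log(value) / math.log(base))

def downscale(index: int, from_schema: int, to_schema: int) -> int:
    """Map a bucket index to a coarser schema. Because every coarse
    bucket boundary is also a fine bucket boundary, this merge is
    exact -- something impossible with arbitrary custom buckets."""
    return math.ceil(index / 2 ** (from_schema - to_schema))

# bucket_index(1.5, 3) -> 5: 1.5 falls in (base**4, base**5],
# i.e. (~1.414, ~1.542] with base = 2**(1/8)
# downscale(5, 3, 1) -> 2: the same observation lands in
# (sqrt(2), 2] at schema 1, consistent with bucket_index(1.5, 1)
```

Two histograms exposed at different schemas can therefore be aggregated by downscaling the finer one first, which is what makes exponential bucketing mergeable by construction.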
Ideally most users migrate to native histograms with sparse, exponential bucketing for further efficiency gains. On top of that, the main disadvantage of custom bucketing compared to exponential bucketing is that, as with classic histograms, querying or aggregating histograms with different bucket layouts is simply wrong (the 4th problem above).

*Potential Future*

The obvious question is: *do we still need classic histograms?* I don't think so, but I wonder what I am missing. *Would you agree that the long-term plan is to replace and deprecate classic histograms?* Can we at least share that intention?

Another question is the migration logic, which might be a separate discussion. So far we discussed things like emitting both classic histograms and native histograms, but that feels like an overhead. Can we do better? Can we solve problem 1 (efficiency) straight away? Why not use custom native histograms for every classic histogram and then have compatibility modes on the query side if needed?

I am sure @Bjoern Rabenstein <[email protected]>, @Jeanette <https://github.com/zenador> and @krajorama <https://github.com/krajorama> already had some thoughts around this.

Kind Regards,
Bartek Płotka (@bwplotka)

--
You received this message because you are subscribed to the Google Groups "Prometheus Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To view this discussion on the web visit https://groups.google.com/d/msgid/prometheus-developers/CAMssQwY0AHT4sUh3zOdYd8t4G-o7QXnW9F8hm-MJu8_iJPxN6A%40mail.gmail.com.

