Hi,

*@Bryan:* Yes, it seems so.
*@Fabian:* This is great news! Thank you all! Do you think it would be worth starting a discussion with the OpenMetrics audience about this in the meantime?

Thanks,
Jonatan

On Friday, November 11, 2022 at 3:57:54 AM UTC-8 [email protected] wrote:

> Hi,
>
> Good news everyone: We discussed it on the Prometheus Dev Summit
> yesterday, and here's the result:
>
> CONSENSUS: Prometheus will ingest Exemplars on all time series.
>
> This includes the _count time series for Summary metrics. The next steps
> are:
>
> * Create a PR in Prometheus, as currently Exemplars are discarded for
>   these series.
> * Allow client libraries to add Exemplars everywhere.
>
> Strictly speaking the client libraries will then no longer produce
> compliant OpenMetrics format as long as the OpenMetrics spec isn't changed,
> but we can start implementing now and get back to the spec later.
>
> Fabian
>
>
> On Wed, Nov 9, 2022 at 4:17 PM Bryan Boreham <[email protected]> wrote:
>
>> Would I be right in thinking this is the code which tripped you up?
>>
>> https://github.com/prometheus/prometheus/blob/c2b4de3611a6/model/textparse/openmetricsparse.go#L472-L479
>> This is insisting that exemplars only work on metrics ending in "_bucket"
>> or "_total".
>>
>> Personally I would be fine with relaxing that, although it does seem
>> strictly aligned with the OpenMetrics spec.
>>
>> On Saturday, 5 November 2022 at 05:21:03 UTC+1 [email protected]
>> wrote:
>>
>>> Hi,
>>>
>>> *@Fabian:* Thank you very much! Is the summit open, can I join?
>>>
>>> *@Bryan:*
>>> I think I would make the decision if it would make sense for the
>>> Prometheus Server to be able to process the Exemplars on _count first. If
>>> so, then I would look into the different clients and their APIs.
>>>
>>> For Histograms, I don't think this should result in a client API change,
>>> I think this should only affect the behavior of the implementation. Also, I
>>> think source, binary, and behavioral compatibility can be kept.
>>> I think it
>>> does not matter where the Exemplar is coming from (directly from the user
>>> or from the sampler), the interesting part happens after the implementation
>>> got the Exemplar. In the case of Histograms, this would mean not just
>>> updating the reference of the Exemplar of the current bucket but also
>>> updating an extra reference to the latest Exemplar (that actually belongs
>>> to _count). So the histogram would hold references to N+1 exemplars where N
>>> is the number of buckets and the +1 Exemplar is the latest recorded one.
>>>
>>> For Summaries, this needs a client API change (an addition) since right
>>> now Summaries do not have Exemplars support. I think this should be very
>>> similar to the Exemplars support of Counters.
>>>
>>> I would like to call out two things:
>>> - I'm not an expert in any of the Prometheus clients (I only used the
>>>   Java and the JavaScript clients).
>>> - I think I would not even be affected by the changes of the client APIs
>>>   since Micrometer is not using these, it directly creates a
>>>   Collector.MetricFamilySamples.Sample instead (that can accept Exemplars
>>>   as of today). So if the Prometheus server could process Exemplars on
>>>   _count, I think my use-case should be covered. But I would definitely
>>>   add the support to the client APIs too so that users who use the clients
>>>   can enjoy these features.
>>>
>>> Thanks,
>>> Jonatan
>>>
>>> On Friday, November 4, 2022 at 1:09:22 AM UTC-7 [email protected] wrote:
>>>
>>>> >>> Also, what do you think about supporting _count in Histograms
>>>> >>> (since Histogram extends Summary with buckets)?
>>>> >> How exactly would you change this while remaining
>>>> >> backwards-compatible for existing users?
>>>> > As far as I can see, adding Exemplars to _count should be a
>>>> > backward-compatible change (it is an addition).
>>>>
>>>> I would like to see what you can see. Please spell it out for me.
>>>>
>>>> The current API on Histogram is ObserveWithExemplar(), and it adds the
>>>> exemplar to a _bucket metric.
>>>> Would you change the behaviour of that API? Would you add a new API?
>>>>
>>>> Bryan
>>>>
>>>> On Friday, 4 November 2022 at 05:46:39 UTC [email protected] wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> Sorry for the late reply, let me go one by one, please bear with me. :)
>>>>>
>>>>> *>I recommend taking that question to an OpenMetrics list.*
>>>>> Thanks, I will open a thread there too based on the discussion here.
>>>>>
>>>>> *>Prometheus could independently decide to go beyond what OpenMetrics
>>>>> says*
>>>>> Since Micrometer uses the Prometheus Java Client this would solve the
>>>>> issue for us but if it makes sense it would be great to have it
>>>>> standardized later(?) (in OM).
>>>>>
>>>>> *>Maybe your point is that exemplars are tied to the _bucket metrics
>>>>> and not the _count metric?*
>>>>> Yes, that's exactly what I'm saying with the caveat that I think this
>>>>> would be useful for Summaries too, not only Histograms.
>>>>>
>>>>> *>How exactly would you change this while remaining
>>>>> backwards-compatible for existing users?*
>>>>> As far as I can see, adding Exemplars to _count should be a
>>>>> backward-compatible change (it is an addition).
>>>>> Would users be broken because of this?
>>>>>
>>>>> *>Perhaps the upcoming "native histograms" or "sparse histograms"
>>>>> feature will suit what you need?*
>>>>> I'm not sure but I'm not only talking about histograms, I'm also
>>>>> talking about Summary.
>>>>>
>>>>> *>In an OM Histogram, the +Inf bucket fulfills exactly the same
>>>>> function as the _count (spec says: "The +Inf bucket counts all
>>>>> requests.") So if you would like an exemplar on the _count of a
>>>>> Histogram, you can as well use an exemplar on the +Inf bucket.*
>>>>>
>>>>> I think I disagree with the second sentence.
>>>>> Let's say you have an application where processing the first request
>>>>> is significantly slower than the rest (lazy init, populating caches,
>>>>> GC, establishing connections, etc.). In this environment (I think
>>>>> this is true for lots of apps nowadays) it can easily happen that the
>>>>> +Inf bucket will be populated with an Exemplar for the first request
>>>>> and it will never get updated because the app will never be as slow
>>>>> as it was for the first request. Also, nothing guarantees that the
>>>>> +Inf bucket will have an exemplar, maybe the processing was faster
>>>>> than that. As far as I understand, exemplars are not like cumulative
>>>>> "le" counters so incrementing a bucket does not mean updating an
>>>>> exemplar (quite the opposite, maybe all of the buckets will be
>>>>> incremented but only one will get a new Exemplar). This is also true
>>>>> for apps that can get significantly faster over time (e.g. JIT). I
>>>>> think a solution here would be
>>>>> give_me_the_last_updated_bucket(my_histogram) or adding Exemplar to
>>>>> _count.
>>>>>
>>>>> *>Side note: In Java this would be particularly useful because the
>>>>> popular Spring Boot framework exposes a Summary
>>>>> http_server_requests_seconds by default that looks like this (no
>>>>> quantiles, just _count and _sum)*
>>>>> This was actually one of the drivers of this request, users are asking
>>>>> for this from us. I also find it extremely useful: without this I need to
>>>>> create an additional counter which does not seem right since I already
>>>>> have one.
>>>>>
>>>>> Thanks,
>>>>> Jonatan
>>>>>
>>>>> On Tuesday, October 18, 2022 at 6:29:27 AM UTC-7 [email protected]
>>>>> wrote:
>>>>>
>>>>>> Side note: In Java this would be particularly useful because the
>>>>>> popular Spring Boot framework exposes a Summary
>>>>>> http_server_requests_seconds by default that looks like this (no
>>>>>> quantiles, just _count and _sum):
>>>>>>
>>>>>> # HELP http_server_requests_seconds
>>>>>> # TYPE http_server_requests_seconds summary
>>>>>> http_server_requests_seconds_count{exception="None",method="GET",outcome="SUCCESS",status="200",uri="/",} 1.0
>>>>>> http_server_requests_seconds_sum{exception="None",method="GET",outcome="SUCCESS",status="200",uri="/",} 1.014687278
>>>>>>
>>>>>> I think this is pretty useful, you can get request rates and error
>>>>>> rates out of it. If Prometheus / OpenMetrics had support for Exemplars
>>>>>> on the _count, users could find example traces per HTTP status and URI.
>>>>>>
>>>>>> Fabian
>>>>>>
>>>>>> On Tue, Oct 18, 2022 at 3:05 PM Bjoern Rabenstein <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> On 06.10.22 14:45, 'Fabian Stäber' via Prometheus Developers wrote:
>>>>>>> >
>>>>>>> > Great question from the CNCF Slack: What's the reason why we don't
>>>>>>> > allow Exemplars for _count in Summary metrics?
>>>>>>> >
>>>>>>> > What do you think? Any reason why Exemplars don't work in _count in
>>>>>>> > Summaries? Would that be something we could consider supporting?
>>>>>>>
>>>>>>> The _count of a Summary _and_ the _count of a Histogram (both
>>>>>>> conventional as well as the new native ones) is essentially a counter
>>>>>>> within the larger "structured" metric of a Summary/Histogram.
>>>>>>>
>>>>>>> From that perspective, it should have the option of attaching an
>>>>>>> exemplar, as a regular Counter has, too.
>>>>>>>
>>>>>>> My speculation why it doesn't in OpenMetrics:
>>>>>>>
>>>>>>> In an OM Histogram, the +Inf bucket fulfills exactly the same function
>>>>>>> as the _count (spec says: "The +Inf bucket counts all requests.") So
>>>>>>> if you would like an exemplar on the _count of a Histogram, you can as
>>>>>>> well use an exemplar on the +Inf bucket.
>>>>>>>
>>>>>>> That obviously doesn't help in the case of a Summary, but I guess the
>>>>>>> rationale is that Histograms are generally to be preferred over
>>>>>>> Summaries, and therefore Summaries didn't get the thorough treatment
>>>>>>> when it came to exemplars.
>>>>>>>
>>>>>>>
>>>>>>> However, even if you really dislike the precalculated quantiles in
>>>>>>> Summaries, there is still the case of a Summary without quantiles. I
>>>>>>> think adding exemplars to such a Summary is as much needed as adding
>>>>>>> exemplars to any regular Counter.
>>>>>>>
>>>>>>> --
>>>>>>> Björn Rabenstein
>>>>>>> [PGP-ID] 0x851C3DA17D748D03
>>>>>>> [email] [email protected]
>>>>>>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "Prometheus Developers" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/prometheus-developers/456a8d9a-4db7-4185-96cb-ee6b835373d4n%40googlegroups.com.

--
You received this message because you are subscribed to the Google Groups "Prometheus Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To view this discussion on the web visit https://groups.google.com/d/msgid/prometheus-developers/0af660e8-8c77-4dd2-b503-82e7eff29bd6n%40googlegroups.com.
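
For readers following the thread, the "N+1 exemplars" idea discussed above (one exemplar per bucket plus one extra reference to the latest exemplar, which would belong to _count) might be sketched roughly as follows. This is only an illustration under stated assumptions: HistogramSketch, Exemplar, and the observe() signature are hypothetical names made up for this note and are not the API of any Prometheus client library. Bucket counts are kept non-cumulative here for simplicity.

```python
import math
from bisect import bisect_left
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Exemplar:
    """Hypothetical exemplar: labels (e.g. a trace ID) plus the observed value."""
    labels: Dict[str, str]
    value: float


class HistogramSketch:
    """Hypothetical histogram holding N+1 exemplar references.

    N per-bucket exemplars (including the +Inf bucket) plus one
    "latest" exemplar that would be exposed on the _count series.
    """

    def __init__(self, upper_bounds: List[float]) -> None:
        # The +Inf bucket is always present as the last bucket.
        self.upper_bounds = sorted(upper_bounds) + [math.inf]
        self.bucket_counts = [0] * len(self.upper_bounds)  # non-cumulative
        self.bucket_exemplars: List[Optional[Exemplar]] = [None] * len(self.upper_bounds)
        self.count = 0
        self.sum = 0.0
        self.count_exemplar: Optional[Exemplar] = None  # the "+1" exemplar

    def observe(self, value: float, exemplar_labels: Optional[Dict[str, str]] = None) -> None:
        # First bucket whose upper bound is >= value ("le" semantics).
        idx = bisect_left(self.upper_bounds, value)
        self.bucket_counts[idx] += 1
        self.count += 1
        self.sum += value
        if exemplar_labels is not None:
            exemplar = Exemplar(dict(exemplar_labels), value)
            # Only the bucket the observation fell into gets the exemplar...
            self.bucket_exemplars[idx] = exemplar
            # ...but the _count exemplar is refreshed on every such observation.
            self.count_exemplar = exemplar


# A slow first request followed by fast ones, as described in the thread.
h = HistogramSketch([0.1, 1.0])
h.observe(12.0, {"trace_id": "t1"})  # lands in the +Inf bucket
h.observe(0.05, {"trace_id": "t2"})  # lands in le="0.1"
h.observe(0.2, {"trace_id": "t3"})   # lands in le="1.0"

inf_exemplar = h.bucket_exemplars[-1]
latest_exemplar = h.count_exemplar
```

In this sketch the +Inf bucket keeps the exemplar of the slow first request (t1) indefinitely, while the _count exemplar tracks the most recent observation (t3), which is the difference between the two approaches argued about above.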

