Re: [prometheus-developers] Exemplars for _count in Summaries

Jonatan Ivanov Tue, 15 Nov 2022 11:50:17 -0800

Hi,

*@Bryan:* Yes, it seems so.


*@Fabian:* This is great news! Thank you all!
Do you think it would be worth starting a discussion with the OpenMetrics
audience about this in the meantime?

Thanks,
Jonatan

On Friday, November 11, 2022 at 3:57:54 AM UTC-8 [email protected] wrote:

> Hi,
>
> Good news, everyone: we discussed it at the Prometheus Dev Summit
> yesterday, and here's the result:
>
> CONSENSUS: Prometheus will ingest Exemplars on all time series.
>
> This includes the _count time series for Summary metrics. The next steps 
> are:
>
> * Create a PR in Prometheus, as currently Exemplars are discarded for 
> these series.
> * Allow client libraries to add Exemplars everywhere.
>
> Strictly speaking the client libraries will then no longer produce 
> compliant OpenMetrics format as long as the OpenMetrics spec isn't changed, 
> but we can start implementing now and get back to the spec later.
>
> Fabian
>
>
> On Wed, Nov 9, 2022 at 4:17 PM Bryan Boreham <[email protected]> wrote:
>
>> Would I be right in thinking this is the code which tripped you up?
>>
>> https://github.com/prometheus/prometheus/blob/c2b4de3611a6/model/textparse/openmetricsparse.go#L472-L479
>> This is insisting that exemplars only work on metrics ending in "_bucket" 
>> or "_total".
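A minimal Python sketch of the rule Bryan describes, and of the relaxation agreed on later in the thread. This is a hypothetical illustration, not the actual Prometheus parser code (which is written in Go); the function name `exemplar_allowed` and its `allow_all` flag are invented here.

```python
# Hypothetical illustration only; not the Prometheus parser implementation.
STRICT_EXEMPLAR_SUFFIXES = ("_bucket", "_total")


def exemplar_allowed(series_name: str, allow_all: bool = False) -> bool:
    """Return True if an exemplar on this series would be kept.

    allow_all=False models the check as described above: only _bucket and
    _total series may carry exemplars.
    allow_all=True models the relaxed behavior (assumption): exemplars are
    kept on every series, including the _count of a Summary.
    """
    if allow_all:
        return True
    return series_name.endswith(STRICT_EXEMPLAR_SUFFIXES)


for name in ("rpc_duration_seconds_bucket", "http_server_requests_seconds_count"):
    print(name, exemplar_allowed(name), exemplar_allowed(name, allow_all=True))
```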
>>
>> Personally I would be fine with relaxing that, although it does seem 
>> strictly aligned with the OpenMetrics spec.
>>
>> On Saturday, 5 November 2022 at 05:21:03 UTC+1 [email protected] 
>> wrote:
>>
>>> Hi,
>>>
>>> *@Fabian:* Thank you very much! Is the summit open, can I join?
>>>
>>> *@Bryan:*
>>> I think I would first decide whether it makes sense for the Prometheus
>>> server to be able to process Exemplars on _count. If so, I would then look
>>> into the different clients and their APIs.
>>>
>>> For Histograms, I don't think this should result in a client API change;
>>> it should only affect the behavior of the implementation. I also think
>>> source, binary, and behavioral compatibility can be kept. It does not
>>> matter where the Exemplar comes from (directly from the user or from the
>>> sampler); the interesting part happens after the implementation receives
>>> the Exemplar. For Histograms, this would mean not just updating the
>>> reference to the Exemplar of the current bucket but also updating an extra
>>> reference to the latest Exemplar (which actually belongs to _count). So the
>>> histogram would hold references to N+1 Exemplars, where N is the number of
>>> buckets and the +1 Exemplar is the most recently recorded one.
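Jonatan's N+1 idea can be sketched in Python as follows. The class and method names (`ExemplarHistogram`, `observe_with_exemplar`) are hypothetical and do not correspond to any existing client library API; the sketch assumes, as described above, that each observation replaces the exemplar of the single bucket it falls into.

```python
import bisect
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Exemplar:
    labels: Dict[str, str]  # e.g. {"trace_id": "..."}
    value: float


class ExemplarHistogram:
    """Hypothetical histogram holding N bucket exemplars plus one for _count."""

    def __init__(self, upper_bounds: List[float]) -> None:
        self.upper_bounds = sorted(upper_bounds)  # finite bounds; +Inf implied
        n = len(self.upper_bounds) + 1            # N buckets including +Inf
        self.bucket_counts = [0] * n              # non-cumulative per bucket
        self.bucket_exemplars: List[Optional[Exemplar]] = [None] * n
        self.count = 0
        self.sum = 0.0
        self.count_exemplar: Optional[Exemplar] = None  # the "+1" exemplar

    def observe_with_exemplar(self, value: float, exemplar: Exemplar) -> None:
        # First bucket whose upper bound is >= value ("le" semantics);
        # index len(upper_bounds) is the +Inf bucket.
        i = bisect.bisect_left(self.upper_bounds, value)
        self.bucket_counts[i] += 1
        self.count += 1
        self.sum += value
        self.bucket_exemplars[i] = exemplar  # existing behavior: one bucket
        self.count_exemplar = exemplar       # addition: latest exemplar overall
```

In this sketch the method signature is the same whether or not the extra _count exemplar is tracked, which is the compatibility argument made above: only the internal bookkeeping changes.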
>>>
>>> For Summaries, this needs a client API change (an addition) since right 
>>> now Summaries do not have Exemplars support. I think this should be very 
>>> similar to the Exemplars support of Counters.
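For a Summary, the addition Jonatan suggests could mirror the Counter case: a quantile-free Summary simply remembers the latest exemplar next to its _count. This is again a hypothetical Python sketch; `ExemplarSummary` and its methods are illustrative names, not an existing client API.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Exemplar:
    labels: Dict[str, str]
    value: float


class ExemplarSummary:
    """Hypothetical quantile-free Summary (only _count and _sum) whose _count
    carries an optional exemplar, the same way a Counter's _total can."""

    def __init__(self) -> None:
        self.count = 0
        self.sum = 0.0
        self.count_exemplar: Optional[Exemplar] = None

    def observe(self, value: float) -> None:
        # Plain observation: the previous exemplar (if any) is kept.
        self.count += 1
        self.sum += value

    def observe_with_exemplar(self, value: float, exemplar: Exemplar) -> None:
        # Additive API: same bookkeeping, plus remember the latest exemplar.
        self.observe(value)
        self.count_exemplar = exemplar
```

Leaving `observe` untouched is what keeps this an addition rather than a breaking change for existing callers.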
>>>
>>> I would like to call out two things:
>>> - I'm not an expert on any of the Prometheus clients (I have only used the
>>> Java and the JavaScript clients).
>>> - I would probably not even be affected by changes to the client APIs,
>>> since Micrometer does not use them; it directly creates a
>>> Collector.MetricFamilySamples.Sample instead (which can already accept
>>> Exemplars today). So if the Prometheus server could process Exemplars on
>>> _count, I think my use case would be covered. But I would definitely
>>> add support to the client APIs too, so that users of the clients
>>> can benefit from these features.
>>>  
>>> Thanks,
>>> Jonatan
>>>
>>> On Friday, November 4, 2022 at 1:09:22 AM UTC-7 [email protected] wrote:
>>>
>>>> >>> Also, what do you think about supporting _count in Histograms 
>>>> (since Histogram extends Summary with buckets)?
>>>> >> How exactly would you change this while remaining 
>>>> backwards-compatible for existing users?
>>>> > As far as I can see, adding Exemplars to _count should be a 
>>>> backward-compatible change (it is an addition).
>>>>
>>>> I would like to see what you can see.  Please spell it out for me.
>>>>
>>>> The current API on Histogram is ObserveWithExemplar(), and it adds the 
>>>> exemplar to a _bucket metric.
>>>> Would you change the behaviour of that API?  Would you add a new API?  
>>>>
>>>> Bryan
>>>>
>>>> On Friday, 4 November 2022 at 05:46:39 UTC [email protected] wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> Sorry for the late reply; let me go one by one, please bear with me. :)
>>>>>
>>>>> *>I recommend taking that question to an OpenMetrics list.* 
>>>>> Thanks, I will open a thread there too based on the discussion here.
>>>>>
>>>>> *>Prometheus could independently decide to go beyond what OpenMetrics
>>>>> says*
>>>>> Since Micrometer uses the Prometheus Java Client, this would solve the
>>>>> issue for us, but if it makes sense, it would be great to have it
>>>>> standardized in OpenMetrics later, too.
>>>>>
>>>>> *>Maybe your point is that exemplars are tied to the _bucket metrics 
>>>>> and not the _count metric?*
>>>>> Yes, that's exactly what I'm saying, with the caveat that I think this
>>>>> would be useful for Summaries too, not only Histograms.
>>>>>
>>>>> *>How exactly would you change this while remaining 
>>>>> backwards-compatible for existing users?*
>>>>> As far as I can see, adding Exemplars to _count should be a 
>>>>> backward-compatible change (it is an addition).
>>>>> Would users be broken because of this?
>>>>>
>>>>> *>Perhaps the upcoming "native histograms" or "sparse histograms" 
>>>>> feature will suit what you need?*
>>>>> I'm not sure, but I'm not only talking about Histograms; I'm also
>>>>> talking about Summaries.
>>>>>
>>>>> *>In an OM Histogram, the +Inf bucket fulfills exactly the same
>>>>> function as the _count (spec says: "The +Inf bucket counts all
>>>>> requests.") So if you would like an exemplar on the _count of a
>>>>> Histogram, you can just as well use an exemplar on the +Inf bucket.*
>>>>>
>>>>> I think I disagree with the second sentence. Let's say you have an
>>>>> application where processing the first request is significantly slower
>>>>> than the rest (lazy init, populating caches, GC, establishing
>>>>> connections, etc.). In this environment (I think this is true for lots
>>>>> of apps nowadays) it can easily happen that the +Inf bucket gets an
>>>>> Exemplar for the first request and it never gets updated, because the
>>>>> app will never be as slow again as it was for the first request. Also,
>>>>> nothing guarantees that the +Inf bucket will have an Exemplar at all;
>>>>> maybe processing was always faster than that. As far as I understand,
>>>>> exemplars are not like cumulative "le" counters, so incrementing a
>>>>> bucket does not mean updating an exemplar (quite the opposite: maybe all
>>>>> of the buckets will be incremented, but only one will get a new
>>>>> Exemplar). This is also true for apps that can get significantly faster
>>>>> over time (e.g., JIT). I think a solution here would be
>>>>> give_me_the_last_updated_bucket(my_histogram) or adding an Exemplar to
>>>>> _count.
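The stale +Inf exemplar scenario can be checked with a small Python simulation. It assumes, as described above, that each observation updates the exemplar of only the one bucket it lands in; the bucket bounds, latencies, and `req-N` trace IDs are made up for illustration.

```python
import bisect
from typing import List, Optional, Tuple


def simulate(bounds: List[float],
             latencies: List[float]) -> Tuple[Optional[str], Optional[str], int]:
    """Return (+Inf bucket exemplar, _count exemplar, cumulative +Inf count)."""
    counts = [0] * (len(bounds) + 1)  # last slot is the +Inf bucket
    bucket_exemplars: List[Optional[str]] = [None] * (len(bounds) + 1)
    count_exemplar: Optional[str] = None
    for n, latency in enumerate(latencies, start=1):
        trace_id = f"req-{n}"
        i = bisect.bisect_left(bounds, latency)  # first bucket with le >= latency
        counts[i] += 1
        bucket_exemplars[i] = trace_id  # only this bucket's exemplar changes
        count_exemplar = trace_id       # a _count exemplar always moves
    return bucket_exemplars[-1], count_exemplar, sum(counts)


bounds = [0.1, 0.5, 1.0]
# Slow first request (lazy init, cache warm-up), then 99 fast ones.
print(simulate(bounds, [3.0] + [0.02] * 99))  # ('req-1', 'req-100', 100)
# Every request is fast: the +Inf bucket never receives an exemplar.
print(simulate(bounds, [0.02] * 5))           # (None, 'req-5', 5)
```

The cumulative +Inf value tracks every request, but its exemplar stays pinned to the first one (or is absent), while an exemplar on _count keeps pointing at recent traffic.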
>>>>>
>>>>> *>Side note: In Java this would be particularly useful because the
>>>>> popular Spring Boot framework exposes a Summary
>>>>> http_server_requests_seconds by default that looks like this (no
>>>>> quantiles, just _count and _sum)*
>>>>> This was actually one of the drivers of this request; users are asking
>>>>> us for this. I also find it extremely useful: without it, I need to
>>>>> create an additional counter, which does not seem right since I already
>>>>> have one.
>>>>>
>>>>> Thanks,
>>>>> Jonatan
>>>>>
>>>>> On Tuesday, October 18, 2022 at 6:29:27 AM UTC-7 [email protected] 
>>>>> wrote:
>>>>>
>>>>>> Side note: In Java this would be particularly useful because the
>>>>>> popular Spring Boot framework exposes a Summary
>>>>>> http_server_requests_seconds by default that looks like this (no
>>>>>> quantiles, just _count and _sum):
>>>>>>
>>>>>> # HELP http_server_requests_seconds  
>>>>>> # TYPE http_server_requests_seconds summary
>>>>>> http_server_requests_seconds_count{exception="None",method="GET",outcome="SUCCESS",status="200",uri="/",} 1.0
>>>>>> http_server_requests_seconds_sum{exception="None",method="GET",outcome="SUCCESS",status="200",uri="/",} 1.014687278
>>>>>>
>>>>>> I think this is pretty useful: you can get request rates and error
>>>>>> rates out of it. If Prometheus / OpenMetrics supported Exemplars on
>>>>>> the _count, users could find example traces per HTTP status and URI.
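For illustration, this is roughly what Fabian's `_count` sample would look like with an exemplar attached, using the OpenMetrics exemplar syntax (` # {labels} value` after the sample). Per this thread, the current OpenMetrics spec does not allow this on a Summary's `_count`. The renderer is a hypothetical Python sketch: it escapes label values but otherwise skips validation, and the trace ID is made up.

```python
from typing import Dict, Optional


def _escape(value: str) -> str:
    # OpenMetrics label-value escaping: backslash, double quote, newline.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def _labels(labels: Dict[str, str]) -> str:
    return "{" + ",".join(f'{k}="{_escape(v)}"' for k, v in labels.items()) + "}"


def render_sample(name: str, labels: Dict[str, str], value: float,
                  exemplar_labels: Optional[Dict[str, str]] = None,
                  exemplar_value: Optional[float] = None) -> str:
    """Render one exposition line, optionally followed by an exemplar."""
    line = f"{name}{_labels(labels)} {value}"
    if exemplar_labels is not None and exemplar_value is not None:
        line += f" # {_labels(exemplar_labels)} {exemplar_value}"
    return line


print(render_sample(
    "http_server_requests_seconds_count",
    {"exception": "None", "method": "GET", "outcome": "SUCCESS",
     "status": "200", "uri": "/"},
    1.0,
    exemplar_labels={"trace_id": "4bf92f3577b34da6a3ce929d0e0e4736"},  # made-up ID
    exemplar_value=1.014687278,
))
```

Here the exemplar value is the latency of the single recorded request, so a trace per status and URI would be one click away from the request-rate panel.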
>>>>>>
>>>>>> Fabian
>>>>>>
>>>>>> On Tue, Oct 18, 2022 at 3:05 PM Bjoern Rabenstein <[email protected]> 
>>>>>> wrote:
>>>>>>
>>>>>>> On 06.10.22 14:45, 'Fabian Stäber' via Prometheus Developers wrote:
>>>>>>> > 
>>>>>>> > Great question from the CNCF Slack: What's the reason why we don't
>>>>>>> > allow Exemplars for _count in Summary metrics?
>>>>>>> >
>>>>>>> > What do you think? Any reason why Exemplars don't work in _count in
>>>>>>> > Summaries? Would that be something we could consider supporting?
>>>>>>>
>>>>>>> The _count of a Summary _and_ the _count of a Histogram (both
>>>>>>> conventional as well as the new native ones) is essentially a counter
>>>>>>> within the larger "structured" metric of a Summary/Histogram.
>>>>>>>
>>>>>>> From that perspective, it should have the option of attaching an
>>>>>>> exemplar, as a regular Counter has, too.
>>>>>>>
>>>>>>> My speculation why it doesn't in OpenMetrics:
>>>>>>>
>>>>>>> In an OM Histogram, the +Inf bucket fulfills exactly the same function
>>>>>>> as the _count (spec says: "The +Inf bucket counts all requests.") So
>>>>>>> if you would like an exemplar on the _count of a Histogram, you can
>>>>>>> just as well use an exemplar on the +Inf bucket.
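Björn's point that the +Inf bucket and _count carry the same number follows from buckets being cumulative. A minimal Python sketch; the bounds and observations are made up.

```python
import bisect
from itertools import accumulate
from typing import List


def cumulative_buckets(bounds: List[float], observations: List[float]) -> List[int]:
    """Cumulative "le" bucket values; the last entry is the +Inf bucket."""
    counts = [0] * (len(bounds) + 1)
    for v in observations:
        counts[bisect.bisect_left(bounds, v)] += 1  # first bucket with le >= v
    return list(accumulate(counts))


obs = [0.05, 0.2, 0.7, 3.0, 0.08]
buckets = cumulative_buckets([0.1, 0.5, 1.0], obs)
print(buckets)                   # [2, 3, 4, 5]
print(buckets[-1] == len(obs))   # True: le="+Inf" equals _count
```

The counts always agree; whether the exemplar attached to the +Inf bucket is a useful stand-in for a _count exemplar is the separate question Jonatan raises in his reply.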
>>>>>>>
>>>>>>> That obviously doesn't help in the case of a Summary, but I guess the
>>>>>>> rationale is that Histograms are generally to be preferred over
>>>>>>> Summaries, and therefore didn't get the thorough treatment when it
>>>>>>> came to exemplars.
>>>>>>>
>>>>>>>
>>>>>>> However, even if you really dislike the precalculated quantiles in
>>>>>>> Summaries, there is still the case of a Summary without quantiles. I
>>>>>>> think adding exemplars to such a Summary is as much needed as adding
>>>>>>> exemplars to any regular Counter.
>>>>>>>
>>>>>>> -- 
>>>>>>> Björn Rabenstein
>>>>>>> [PGP-ID] 0x851C3DA17D748D03
>>>>>>> [email] [email protected]
>>>>>>>