I wanted to reopen this discussion because I'm having a very difficult 
time understanding how Pushgateway can be the suggested solution for 
batch-job metric collection, yet simultaneously batch jobs are supposedly 
not a good example of why metric TTLs are needed in Pushgateway.

The most basic example: two batch jobs that produce the same metrics (gRPC 
or HTTP metrics). This is not just a `last_completed_at` gauge, as I have 
seen suggested before, where the same metric is updated over and over 
again. You have to include a label that identifies these jobs as distinct 
so that metrics like gRPC request rates can be calculated correctly. In the 
Kubernetes world this usually means the pod ID. Simple enough, until you have 
thousands of these pod IDs, compounded by other labels.
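To make the cardinality problem concrete, here is a toy sketch (plain Python, not the real Pushgateway implementation or client API; the `FakePushgateway` class, label names, and metric values are all invented for illustration) of how the gateway keys metric groups: every distinct grouping key creates a group that is retained until someone explicitly deletes it.

```python
# Sketch: Pushgateway keeps one metric group per distinct grouping key,
# and groups are never expired automatically. This class is a toy model,
# not the real implementation.

class FakePushgateway:
    def __init__(self):
        self.groups = {}  # grouping key -> latest pushed metrics

    def push(self, job, pod, metrics):
        # Each (job, pod) pair is a distinct grouping key, so every new
        # pod ID creates a brand-new group that sticks around forever.
        self.groups[(job, pod)] = metrics

gw = FakePushgateway()

# 1000 short-lived batch-job pods, each pushing the *same* metric names
# but with a unique pod label so per-run request rates can be computed.
for i in range(1000):
    gw.push(job="nightly-report", pod=f"pod-{i}",
            metrics={"grpc_client_requests_total": 42})

# All 1000 groups remain exposed to Prometheus on every scrape,
# even though every one of those pods is long gone.
print(len(gw.groups))  # -> 1000
```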

By now we all know those metrics are going to stay around forever, but I 
don't understand why the answer to this problem is "this is not a good 
use case". Not a good use case for Pushgateway? For TTLs? What am I doing wrong? 
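What I'm asking for is, roughly, the obvious TTL sweep. This is a hypothetical sketch only (the class, field names, and interval are my own invention, not a proposed Pushgateway API): each group remembers its last push time, and a periodic sweep drops groups that haven't pushed within the TTL.

```python
import time

# Hypothetical sketch of the TTL behaviour being asked for: each group
# records its last push time and is dropped once it goes stale. All names
# here are invented for illustration, not an actual Pushgateway feature.

class TTLGateway:
    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.groups = {}  # grouping key -> (last_push_time, metrics)

    def push(self, key, metrics, now=None):
        self.groups[key] = (now if now is not None else time.time(), metrics)

    def sweep(self, now=None):
        # Drop every group whose last push is older than the TTL.
        now = now if now is not None else time.time()
        self.groups = {k: (t, m) for k, (t, m) in self.groups.items()
                       if now - t <= self.ttl}

gw = TTLGateway(ttl_seconds=3600)
gw.push(("nightly-report", "pod-0"), {"grpc_client_requests_total": 42}, now=0)
gw.push(("nightly-report", "pod-1"), {"grpc_client_requests_total": 7}, now=3000)

gw.sweep(now=4000)  # pod-0 last pushed 4000s ago -> expired; pod-1 survives
print(sorted(k[1] for k in gw.groups))  # -> ['pod-1']
```

With something like this, the thousands of dead pod-ID groups would age out on their own instead of accumulating until a manual DELETE.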

I've got a pipeline and library code streamlined for Prometheus metric 
collection, and the only solution I have seen offered at all is "use 
statsd". No. That's silly. I would need new clients and two ways of defining 
metrics in code to account for each potential storage backend. Two APIs. 
Etc.

Can someone please help me understand why Pushgateway's existence is not 
reason enough to implement TTLs?

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Developers" group.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-developers/2ccd5a73-88c9-4eb3-bbcf-1c53dca11b0en%40googlegroups.com.
