I wanted to reopen this discussion because I'm having a very hard time understanding how the Pushgateway can be the suggested solution for batch-job metric collection, while at the same time batch jobs are dismissed as a poor example of why metric TTLs are needed in the Pushgateway.
Take the most basic example: two batch jobs that produce the same metrics (gRPC or HTTP metrics). This is not just a `last_completed_at` gauge, as I have seen suggested before, where the same metric is updated over and over again. You have to include a label that identifies these jobs as distinct so that metrics like gRPC request rates can be calculated correctly. In the Kubernetes world that usually means the pod ID. Simple enough, until you have thousands of pod IDs compounded by other labels. By now we all know those series are going to stay around forever, but I don't understand why the answer to this problem is "this is not a good use case". Not a good use case for the Pushgateway? For TTL? What am I doing wrong?

I've got a pipeline and library code streamlined for Prometheus metric collection, and the only solution I have seen offered at all is "use statsd". No. That's silly. I'd need new clients and two ways of defining metrics in code to account for each potential storage backend. Two APIs, etc. Can someone please help me understand why the Pushgateway's existence is not reason enough to implement TTL?

--
You received this message because you are subscribed to the Google Groups "Prometheus Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To view this discussion on the web visit https://groups.google.com/d/msgid/prometheus-developers/2ccd5a73-88c9-4eb3-bbcf-1c53dca11b0en%40googlegroups.com.
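To make the ask concrete, here is a minimal stdlib-only sketch of what a TTL in the Pushgateway could look like (this is not Pushgateway's actual implementation; `PushStore`, `push`, and `expire` are hypothetical names): each push group records its last push time, and a periodic sweep drops groups that have not pushed within the TTL, so per-pod series from finished batch jobs eventually disappear instead of accumulating forever.

```python
import time


class PushStore:
    """Hypothetical push store: maps a grouping key to (metrics, last push time)."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.groups = {}

    def push(self, grouping_key, metrics, now=None):
        # grouping_key is a frozenset of label pairs, e.g. the job name plus
        # the per-run pod ID that makes each batch job's series distinct.
        now = time.time() if now is None else now
        self.groups[grouping_key] = (metrics, now)

    def expire(self, now=None):
        # Drop every group whose last push is older than the TTL and
        # return how many were swept.
        now = time.time() if now is None else now
        stale = [k for k, (_, ts) in self.groups.items() if now - ts > self.ttl]
        for k in stale:
            del self.groups[k]
        return len(stale)


store = PushStore(ttl_seconds=300)
store.push(frozenset({("job", "etl"), ("pod", "etl-abc12")}),
           {"grpc_requests_total": 42}, now=0)
store.push(frozenset({("job", "etl"), ("pod", "etl-def34")}),
           {"grpc_requests_total": 17}, now=600)
# At t=700 the first pod's group is more than 300s stale and gets swept;
# the second pod pushed 100s ago and is kept.
swept = store.expire(now=700)
```

Without the sweep, every completed pod would leave its grouping key in `store.groups` indefinitely, which is exactly the unbounded per-pod series growth described above.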

