@Martin Just a ping about this issue: how did you identify which services were causing you trouble by exposing too many metrics? I'm asking because I'm facing a similar problem at the moment.
Thank you.

On Monday, 13 April 2020 at 06:53:13 UTC-3, Martin Man wrote:
> Hi Nishant,
>
> I'm also new to Prometheus and faced a similar scenario recently.
>
> What helped me was to add a job to monitor the Prometheus instance itself, then import a Prometheus 2.0 Grafana dashboard and watch Prometheus memory consumption and samples appended per second while defining new ServiceMonitors. In the end this helped me stabilise the memory usage as well as identify the services that generated far too many metrics and were responsible for the huge memory consumption.
>
> HTH,
> Martin
>
> > On 13 Apr 2020, at 11:03, Nishant Ketu <[email protected]> wrote:
> >
> > We have deployed Prometheus through Helm, and after around 2 months of use we get an OOM error and the pods fail to restart. We have to manually clean up /data to get the pod running again. I have used the retention flag, but it doesn't seem to work on the wal folder of /data. Any help with this would be appreciated. Thanks
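
For anyone landing on this thread with the same question, here is a rough sketch of one way to spot the heavy jobs without a dashboard. It is not what Martin used (he describes watching Grafana); it asks the Prometheus HTTP API for the jobs with the most samples per scrape, using the `scrape_samples_scraped` metric that Prometheus records for every target. The server address, the `topk` limit of 10, and the helper names `build_query_url`, `parse_vector` and `top_jobs` are assumptions for illustration only.

```python
import json
import urllib.parse
import urllib.request

# Assumption: Prometheus is reachable here; adjust for your deployment.
PROMETHEUS_URL = "http://localhost:9090"

# scrape_samples_scraped is stored by Prometheus for each target on every scrape,
# so summing it per job shows which jobs expose the most samples.
TOP_JOBS_QUERY = "topk(10, sum by (job) (scrape_samples_scraped))"


def build_query_url(base_url, promql):
    """Return the instant-query URL (/api/v1/query) for a PromQL expression."""
    params = urllib.parse.urlencode({"query": promql})
    return base_url.rstrip("/") + "/api/v1/query?" + params


def parse_vector(payload):
    """Turn an instant-vector API response into (job, value) pairs, largest first."""
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload}")
    rows = [
        (item["metric"].get("job", "<none>"), float(item["value"][1]))
        for item in payload["data"]["result"]
    ]
    return sorted(rows, key=lambda row: row[1], reverse=True)


def top_jobs(base_url=PROMETHEUS_URL):
    """Query a live server and return the jobs exposing the most samples per scrape."""
    url = build_query_url(base_url, TOP_JOBS_QUERY)
    with urllib.request.urlopen(url, timeout=10) as resp:
        return parse_vector(json.load(resp))
```

Calling `top_jobs()` against a running server returns a list of `(job, samples)` pairs to print or inspect. For the overall ingestion rate Martin mentions (samples appended per second), the same helpers can be pointed at `rate(prometheus_tsdb_head_samples_appended_total[5m])` instead.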

