The problem is not so much the priorities themselves; it is all the
questions and confusion around them:

- When do we decide we are overloaded?
- What do we do for the low priority targets?

and more importantly:

- When do we decide that we can scrape the low targets again?

How to avoid:

High load -> stop low scrapes -> Normal load (because we do not scrape
low priorities) -> restart low scrapes -> High load -> stop low scrapes
-> Normal load (because we do not scrape low priorities) -> restart low
scrapes -> High load -> ...

Overall, those do not seem like easy questions.

On 30 Jul 10:10, Bartłomiej Płotka wrote:
> Yes, looks like having many scrapers would solve this, and having
> Thanos on top for query aggregation would do. However, given the
> overhead of even operating TSDB instances like Prometheus (e.g.
> maintaining persistent volumes), I would still like to see a
> longer-term solution with better multi-tenant support (isolation of
> tenants' scrapes) within the scrape engine. One alternative is dynamic
> relabelling configured from outside, as seen here:
> https://blog.freshtracks.io/bomb-squad-automatic-detection-and-suppression-of-prometheus-cardinality-explosions-62ca8e02fa32
>
> I think with good monitoring of Prometheus health we could implement a
> "sidecar" applying such priorities dynamically as well. That would be
> good for a start maybe (:
>
> In the meantime, the separate scraper looks like the way to go.
>
> Kind Regards,
> Bartek
>
> On Thu, 30 Jul 2020 at 10:01, Lili Cosic wrote:
>
> > Thanks, everyone, for the replies! The official message seems to be
> > to use a Prometheus instance per tenant/priority if you want to have
> > multiple tenants in your environment.
> >
> > Kind regards,
> > Lili
> >
> > On Thursday, 30 July 2020 10:44:59 UTC+2, Ben Kochie wrote:
> >>
> >> I'm with Brian and Julien on this.
> >>
> >> Multi-tenancy is not really something we want to solve in Prometheus.
> >> This is a concern for higher level systems like Kubernetes.
> >> Prometheus is designed to be distributed. If you have targets with
> >> different needs, they need to have separate Prometheus instances.
> >>
> >> This is also why we have things like Thanos and Cortex as
> >> aggregation layers.
> >>
> >> Similar to why we have said we don't plan to implement IO limits,
> >> this is a scheduling concern, out of scope for Prometheus.
> >>
> >> On Thu, Jul 30, 2020, 10:31 Frederic Branczyk wrote:
> >>
> >>> That's only effective in limiting the number of targets. The point
> >>> here is selectively scraping those with a higher priority, based on
> >>> backpressure of the system as a whole.
> >>>
> >>> On Wed, 22 Jul 2020 at 17:00, Julien Pivotto wrote:
> >>>
> >>>> On 22 Jul 16:47, Frederic Branczyk wrote:
> >>>> > In practice even that can still be problematic. You only know
> >>>> > that Prometheus has a problem when everything fails; the point
> >>>> > is to keep things alive well enough for more critical
> >>>> > components.
> >>>> >
> >>>> > On Wed, 22 Jul 2020 at 16:38, Julien Pivotto wrote:
> >>>> >
> >>>> > > On 22 Jul 16:36, Frederic Branczyk wrote:
> >>>> > > > It's unclear how that helps, can you help me understand?
> >>>> > >
> >>>> > > # Keep only high-priority pods; this job has no target_limit.
> >>>> > > - job_name: highprio
> >>>> > >   relabel_configs:
> >>>> > >   - target_label: job
> >>>> > >     replacement: pods
> >>>> > >   - source_labels: [__meta_pod_priority]
> >>>> > >     regex: high
> >>>> > >     action: keep
> >>>>
> >>>> highprio job will always be scraped.
> >>>>
> >>>> > > # Drop high-priority pods; the rest is capped by target_limit.
> >>>> > > - job_name: lowprio
> >>>> > >   relabel_configs:
> >>>> > >   - target_label: job
> >>>> > >     replacement: pods
> >>>> > >   - source_labels: [__meta_pod_priority]
> >>>> > >     regex: high
> >>>> > >     action: drop
> >>>> > >   target_limit: 1000
> >>>> > >
> >>>> > > > On Wed, 22 Jul 2020 at 16:34, Julien Pivotto wrote:
> >>>> > > >
> >>>> > > > > On 22 Jul 16:32, Frederic Branczyk wrote:
> >>>> > > > > > Can you explain what you mean by two jobs? Do you mean
> >>>> > > > > > two scrape configs?
> >>>> > > > >
> >>>> > > > > Yes.
> >>>> > > > >
> >>>> > > > > > On Wed, 22 Jul 2020 at 11:40, Julien Pivotto wrote:
> >>>> > > > > >
> >>>> > > > > > > On 22 Jul 02:35, Lili Cosic wrote:
> >>>> > > > > > > >
> >>>> > > > > > > > On Wednesday, 22 July 2020 11:23:00 UTC+2, Brian
> >>>> > > > > > > > Brazil wrote:
> >>>> > > > > > > > >
> >>>> > > > > > > > > On Wed, 22 Jul 2020 at 10:18, Julien Pivotto
> >>>> > > > > > > > > wrote:
> >>>> > > > > > > > >
> >>>> > > > > > > > >> On 22 Jul 02:14, Lili Cosic wrote:
> >>>> > > > > > > > >> > Only now seen in the docs that I am supposed
> >>>> > > > > > > > >> > to start any discussions here first before
> >>>> > > > > > > > >> > opening an issue, sorry about that!
> >>>> > > > > > > > >> > :)
> >>>> > > > > > > > >> >
> >>>> > > > > > > > >> > Currently there is no way for a target to have
> >>>> > > > > > > > >> > higher scrape priority than another. Even if
> >>>> > > > > > > > >> > you set target limits and sample limits, you
> >>>> > > > > > > > >> > can still overestimate your setup, and you
> >>>> > > > > > > > >> > would still want the higher priority targets
> >>>> > > > > > > > >> > to keep being scraped rather than have the
> >>>> > > > > > > > >> > entire Prometheus fail. It would need to be
> >>>> > > > > > > > >> > based on the inability to ingest into the TSDB
> >>>> > > > > > > > >> > at the current scrape rate: once that is hit,
> >>>> > > > > > > > >> > the priority class would take effect and only
> >>>> > > > > > > > >> > the highest priority targets would be scraped,
> >>>> > > > > > > > >> > at the expense of lower priority ones. Another
> >>>> > > > > > > > >> > option, which might be simpler, would be to
> >>>> > > > > > > > >> > have a global limit on how much Prometheus can
> >>>> > > > > > > > >> > handle, based on perf testing.
> >>>> > > > > > > > >> >
> >>>> > > > > > > > >> > This would be treated as a last resort, and
> >>>> > > > > > > > >> > there would definitely be a need for a high
> >>>> > > > > > > > >> > severity alert to inform the admin that
> >>>> > > > > > > > >> > something went terribly wrong. But because we
> >>>> > > > > > > > >> > would still be able to ingest, for example,
> >>>> > > > > > > > >> > Prometheus metrics if they are in a higher
> >>>> > > > > > > > >> > priority class, alerting would be possible.
> >>>> > > > > > > > >>
> >>>> > > > > > > > >> Hi,
> >>>> > > > > > > > >>
> >>>> > > > > > > > >> I think that limiting the number of targets you
> >>>> > > > > > > > >> scrape is already a last resort. I don't think we
> >>>> > > > > > > > >> would need a second line of defense.
> >>>> > > > > > > > >
> >>>> > > > > > > > > I agree with Julien here. If you've gotten to this
> >>>> > > > > > > > > point you're already seriously overloaded, and
> >>>> > > > > > > > > prioritising individual targets is just rearranging
> >>>> > > > > > > > > the deckchairs at that point.
> >>>> > > > > > > > >
> >>>> > > > > > > > >> You can achieve this priority by setting 2 jobs,
> >>>> > > > > > > > >> one which is limited and one which is not, and
> >>>> > > > > > > > >> use relabeling to decide which target is going in
> >>>> > > > > > > > >> which job.
> >>>> > > > > > > > >
> >>>> > > > > > > > > Or more generally, one Prometheus for the important
> >>>> > > > > > > > > targets and another for the less important and
> >>>> > > > > > > > > riskier targets.
> >>>> > > > > > > >
> >>>> > > > > > > > I get your point completely Brian, and agree to some
> >>>> > > > > > > > degree, but people are still going to be setting up a
> >>>> > > > > > > > multi-tenant Prometheus, which then causes the
> >>>> > > > > > > > problems I mentioned above. Even within the riskier
> >>>> > > > > > > > targets there will be some more important than others
> >>>> > > > > > > > for users. I think we should still strive to make a
> >>>> > > > > > > > single shared Prometheus as safe as possible. If this
> >>>> > > > > > > > is not the priority class I suggested, I am open to
> >>>> > > > > > > > other ideas!
> >>>> > > > > > >
> >>>> > > > > > > Then 2 jobs are the answer, one unlimited and one
> >>>> > > > > > > limited.
> >>>> > > > > > >
> >>>> > > > > > > The target_limit is already a pretty advanced use case.
> >>>> > > > > > >
> >>>> > > > > > > > > Brian
> >>>> > > > > > > > >
> >>>> > > > > > > > >> > We could model this on something like
> >>>> > > > > > > > >> > PriorityClass
> >>>> > > > > > > > >> > <https://kubernetes.io/docs/concepts/configuration/pod-priority-preemption/#priorityclass>
> >>>> > > > > > > > >> > from Kubernetes, but I am open to other
> >>>> > > > > > > > >> > suggestions.
> >>>> > > > > > > > >>
> >>>> > > > > > > > >> That could be used in relabeling as I said.
> >>>> > > > > > > > >>
> >>>> > > > > > > > >> > I am open to other suggestions, or maybe there
> >>>> > > > > > > > >> > is something like this but I missed it. The
> >>>> > > > > > > > >> > main purpose is to ensure there are protection
> >>>> > > > > > > > >> > mechanisms in place, so any ideas and
> >>>> > > > > > > > >> > suggestions welcome!
> >>>> > > > > > > > >>
> >>>> > > > > > > > >> regards,
> >>>> > > > > > > > >>
> >>>> > > > > > > > >> > Thanks and kind regards,
> >>>> > > > > > > > >> > Lili
> >>>> > > > > > > > >> >
> >>>> > > > > > > > >> > --
> >>>> > > > > > > > >> > You received this message because you are
> >>>> > > > > > > > >> > subscribed to the Google Groups "Prometheus
> >>>> > > > > > > > >> > Developers" group.
> >>>> > > > > > > > >> > To unsubscribe from this group and stop
> >>>> > > > > > > > >> > receiving emails from it, send an email to
> >>>> > > > > > > > >> > [email protected].
> >>>> > > > > > > > >> > To view this discussion on the web visit
> >>>> > > > > > > > >> > https://groups.google.com/d/msgid/prometheus-developers/30df615e-5420-4bdf-9cb7-2790ef19d520o%40googlegroups.com
> >>>> > > > > > > > >> > .
> >>>> > > > > > > > >>
> >>>> > > > > > > > >> --
> >>>> > > > > > > > >> Julien Pivotto
> >>>> > > > > > > > >> @roidelapluie
> >>>> > > > > > > > >
> >>>> > > > > > > > > --
> >>>> > > > > > > > > Brian Brazil
> >>>> > > > > > > > > www.robustperception.io

--
Julien Pivotto
@roidelapluie

--
You received this message because you are subscribed to the Google Groups "Prometheus Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To view this discussion on the web visit https://groups.google.com/d/msgid/prometheus-developers/20200730091922.GA156213%40oxygen.
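
PS: to make the "High load -> stop low scrapes -> Normal load -> restart
low scrapes" loop from the top of this mail concrete, here is a toy
simulation in Python. It is only a sketch: the function name and all
numbers (high_load, low_load, capacity) are invented for illustration,
and none of this is Prometheus code or a proposed design. It models a
naive controller that looks only at the load it just measured.

```python
def simulate(steps, high_load=60, low_load=50, capacity=100):
    """Toy model of a naive priority controller (hypothetical numbers).

    Low-priority scrapes are stopped whenever the measured load exceeds
    capacity, and restarted as soon as the load is back under capacity.
    Returns a list of (load, low_priority_enabled) pairs, one per step.
    """
    low_enabled = True
    history = []
    for _ in range(steps):
        # The load we observe depends on the decision taken last step.
        load = high_load + (low_load if low_enabled else 0)
        history.append((load, low_enabled))
        # Single threshold, no memory: decide from the current load only.
        low_enabled = load <= capacity
    return history


if __name__ == "__main__":
    for load, low_enabled in simulate(6):
        state = "scraping low prio" if low_enabled else "low prio stopped"
        print(f"load={load:3d}  {state}")
```

With these made-up numbers the controller flips state on every step,
because stopping the low priority scrapes is exactly what makes the load
look normal again. Deciding when it is really safe to restart them is
the open question.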

