Re: [prometheus-developers] Introduce the concept of scrape Priority for Targets

Julien Pivotto Thu, 30 Jul 2020 02:20:25 -0700

The problem is not so much the priorities themselves; it is all the
questions and confusion around them:


- When do we decide we are overloaded?
- What do we do for the low priority targets?

and more importantly:

- When do we decide that we can scrape the low priority targets again?

How to avoid:

High load -> stop low scrapes
-> Normal load (because we do not scrape low priorities) -> restart low
scrapes
-> High load -> stop low scrapes
-> Normal load (because we do not scrape low priorities) -> restart low
scrapes
-> High load -> stop low scrapes
-> Normal load (because we do not scrape low priorities) -> restart low
scrapes


Overall, those do not seem like easy questions.
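
The flapping cycle above can be reproduced with a toy model. This is only a
sketch, not Prometheus behaviour: Prometheus has no scrape priorities, and the
function name `simulate`, the load numbers and the single `capacity` threshold
are all made up for illustration. It assumes the naive rule of stopping low
priority scrapes when load exceeds capacity and restarting them as soon as
load looks normal again.

```python
def simulate(high_load, low_load, capacity, steps):
    """Toy model of naive load shedding (hypothetical, not a Prometheus API).

    Returns one (load, low_enabled) pair per step, recorded before the
    controller reacts to that step's load.
    """
    low_enabled = True
    history = []
    for _ in range(steps):
        # Low priority scrapes only add load while they are enabled.
        load = high_load + (low_load if low_enabled else 0)
        history.append((load, low_enabled))
        # Naive rule: stop low scrapes when overloaded, restart them
        # as soon as the load is back under capacity.
        low_enabled = load <= capacity
    return history


if __name__ == "__main__":
    # Illustrative numbers only: each class alone fits, both together do not.
    for load, low_enabled in simulate(high_load=60, low_load=60,
                                      capacity=100, steps=6):
        state = "on" if low_enabled else "off"
        print(f"load={load} low_scrapes={state}")
```

With these numbers the low priority scrapes toggle on every step, because the
load only looks normal while they are stopped.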

On 30 Jul 10:10, Bartłomiej Płotka wrote:
> Yes, it looks like having many scrapers would solve this, with Thanos on
> top for query aggregation. However, given the overhead of operating
> even TSDB instances like Prometheus (e.g. maintaining persistent
> volumes), I would still like to see a longer-term solution with better
> multi-tenant support (isolation of tenants' scrapes) within the scrape
> engine. One alternative is dynamic relabelling configured from outside,
> as seen here:
> https://blog.freshtracks.io/bomb-squad-automatic-detection-and-suppression-of-prometheus-cardinality-explosions-62ca8e02fa32
> I think with good monitoring of Prometheus health we could also implement
> a "sidecar" applying such priorities dynamically. That would be good
> for a start maybe (:
> 
> In the meantime, the separate scraper looks like the way to go.
> 
> Kind Regards,
> Bartek
> 
> On Thu, 30 Jul 2020 at 10:01, Lili Cosic <[email protected]> wrote:
> 
> > Thanks, everyone for the replies! The official msg seems to be to use a
> > Prometheus instance per tenant/priority if you want to have multiple
> > tenants in your environment.
> >
> > Kind regards,
> > Lili
> >
> > On Thursday, 30 July 2020 10:44:59 UTC+2, Ben Kochie wrote:
> >>
> >> I'm with Brian and Julien on this.
> >>
> >> Multi-tenancy is not really something we want to solve in Prometheus.
> >> This is a concern for higher level systems like Kubernetes. Prometheus is
> >> designed to be distributed. If you have targets with different needs, they
> >> need to have separate Prometheus instances.
> >>
> >> This is also why we have things like Thanos and Cortex as aggregation
> >> layers.
> >>
> >> Similar to why we have said we don't plan to implement IO limits, this is
> >> a scheduling concern, out of scope for Prometheus.
> >>
> >> On Thu, Jul 30, 2020, 10:31 Frederic Branczyk <[email protected]> wrote:
> >>
> >>> That's only effective in limiting the number of targets; the point here
> >>> is to selectively scrape those with a higher priority, based on
> >>> backpressure of the system as a whole.
> >>>
> >>> On Wed, 22 Jul 2020 at 17:00, Julien Pivotto <[email protected]>
> >>> wrote:
> >>>
> >>>> On 22 Jul 16:47, Frederic Branczyk wrote:
> >>>> > In practice even that can still be problematic. You only know that
> >>>> > Prometheus has a problem when everything fails, the point is to keep
> >>>> > things alive well enough for more critical components.
> >>>> >
> >>>> > On Wed, 22 Jul 2020 at 16:38, Julien Pivotto <[email protected]> wrote:
> >>>> >
> >>>> > > On 22 Jul 16:36, Frederic Branczyk wrote:
> >>>> > > > It's unclear how that helps, can you help me understand?
> >>>> > >
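> >>>> > > - job_name: highprio
> >>>> > >   relabel_configs:
> >>>> > >   # Report both scrape configs under the same job label.
> >>>> > >   - target_label: job
> >>>> > >     replacement: pods
> >>>> > >   # Keep only targets whose priority label is "high".
> >>>> > >   - source_labels: [__meta_pod_priority]
> >>>> > >     regex: high
> >>>> > >     action: keep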
> >>>>
> >>>> highprio job will always be scraped.
> >>>>
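> >>>> > > - job_name: lowprio
> >>>> > >   relabel_configs:
> >>>> > >   - target_label: job
> >>>> > >     replacement: pods
> >>>> > >   # Drop the high priority targets; everything else lands here.
> >>>> > >   - source_labels: [__meta_pod_priority]
> >>>> > >     regex: high
> >>>> > >     action: drop
> >>>> > >   # If more targets than this remain after relabeling, the job's
> >>>> > >   # targets are marked as failed without being scraped.
> >>>> > >   target_limit: 1000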
> >>>> > >
> >>>> > > >
> >>>> > > > On Wed, 22 Jul 2020 at 16:34, Julien Pivotto <[email protected]> wrote:
> >>>> > > >
> >>>> > > > > On 22 Jul 16:32, Frederic Branczyk wrote:
> >>>> > > > > > Can you explain what you mean by two jobs? Do you mean two
> >>>> > > > > > scrape configs?
> >>>> > > > >
> >>>> > > > > Yes.
> >>>> > > > >
> >>>> > > > > >
> >>>> > > > > > On Wed, 22 Jul 2020 at 11:40, Julien Pivotto <[email protected]> wrote:
> >>>> > > > > >
> >>>> > > > > > > On 22 Jul 02:35, Lili Cosic wrote:
> >>>> > > > > > > >
> >>>> > > > > > > >
> >>>> > > > > > > > On Wednesday, 22 July 2020 11:23:00 UTC+2, Brian Brazil wrote:
> >>>> > > > > > > > >
> >>>> > > > > > > > > On Wed, 22 Jul 2020 at 10:18, Julien Pivotto <[email protected]> wrote:
> >>>> > > > > > > > >
> >>>> > > > > > > > >> On 22 Jul 02:14, Lili Cosic wrote:
> >>>> > > > > > > > >> > I have only now seen in the docs that I am supposed to start any
> >>>> > > > > > > > >> > discussions here first before opening an issue, sorry about that! :)
> >>>> > > > > > > > >> >
> >>>> > > > > > > > >> > Currently there is no way for one target to have a higher scrape
> >>>> > > > > > > > >> > priority than another. Even if you set target limits and sample
> >>>> > > > > > > > >> > limits, you can still overestimate your setup, so you still want
> >>>> > > > > > > > >> > higher priority targets to be preferred over having the entire
> >>>> > > > > > > > >> > Prometheus fail. It would need to be based on the inability to
> >>>> > > > > > > > >> > ingest into the TSDB at the current scrape rate: if that is hit,
> >>>> > > > > > > > >> > the priority class would take effect and only the highest priority
> >>>> > > > > > > > >> > targets would be scraped, in preference to lower priority ones.
> >>>> > > > > > > > >> > Another option, which might be simpler, would be to have a global
> >>>> > > > > > > > >> > limit on how much Prometheus can handle, based on perf testing.
> >>>> > > > > > > > >> >
> >>>> > > > > > > > >> > This would be treated as a last resort, and there would definitely
> >>>> > > > > > > > >> > be a need for a high severity alert to inform the admin that
> >>>> > > > > > > > >> > something went terribly wrong. But because we would still be able to
> >>>> > > > > > > > >> > ingest, for example, Prometheus metrics if they are in a higher
> >>>> > > > > > > > >> > priority class, alerting would still be possible.
> >>>> > > > > > > > >>
> >>>> > > > > > > > >> Hi,
> >>>> > > > > > > > >>
> >>>> > > > > > > > >> I think that limiting the number of targets you scrape is already a
> >>>> > > > > > > > >> last resort. I don't think we would need a second line of defense.
> >>>> > > > > > > > >>
> >>>> > > > > > > > >
> >>>> > > > > > > > > I agree with Julien here. If you've gotten to this point you're
> >>>> > > > > > > > > already seriously overloaded, and prioritising individual targets is
> >>>> > > > > > > > > just rearranging the deckchairs at that point.
> >>>> > > > > > > > >
> >>>> > > > > > > > >
> >>>> > > > > > > > >>
> >>>> > > > > > > > >> You can achieve this priority by setting up 2 jobs, one which is
> >>>> > > > > > > > >> limited and one which is not, and using relabeling to decide which
> >>>> > > > > > > > >> target goes in which job.
> >>>> > > > > > > > >>
> >>>> > > > > > > > >
> >>>> > > > > > > > > Or more generally, one Prometheus for the important targets and
> >>>> > > > > > > > > another for the less important and riskier targets.
> >>>> > > > > > > > >
> >>>> > > > > > > >
> >>>> > > > > > > > I get your point completely, Brian, and agree to some degree, but
> >>>> > > > > > > > people are still going to be setting up a multi-tenant Prometheus,
> >>>> > > > > > > > which then causes the problems I mentioned above. Even within the
> >>>> > > > > > > > riskier targets there will be some that are more important than
> >>>> > > > > > > > others for users. I think we should still strive to make a single
> >>>> > > > > > > > shared Prometheus as safe as possible. If this is not the priority
> >>>> > > > > > > > class I suggested, I am open to other ideas!
> >>>> > > > > > >
> >>>> > > > > > > Then 2 jobs are the answer, one unlimited and one limited.
> >>>> > > > > > >
> >>>> > > > > > > The target_limit is already a pretty advanced use case.
> >>>> > > > > > >
> >>>> > > > > > > >
> >>>> > > > > > > >
> >>>> > > > > > > > >
> >>>> > > > > > > > > Brian
> >>>> > > > > > > > >
> >>>> > > > > > > > >
> >>>> > > > > > > > >>
> >>>> > > > > > > > >> >
> >>>> > > > > > > > >> > We could model this on something like PriorityClass
> >>>> > > > > > > > >> > <https://kubernetes.io/docs/concepts/configuration/pod-priority-preemption/#priorityclass>
> >>>> > > > > > > > >> > from Kubernetes, but I am open to other suggestions.
> >>>> > > > > > > > >>
> >>>> > > > > > > > >> That could be used in relabeling as I said.
> >>>> > > > > > > > >>
> >>>> > > > > > > > >> >
> >>>> > > > > > > > >> > I am open to other suggestions, or maybe there is something like
> >>>> > > > > > > > >> > this but I missed it. The main purpose is to ensure there are
> >>>> > > > > > > > >> > protection mechanisms in place, so any ideas and suggestions welcome!
> >>>> > > > > > > > >> >
> >>>> > > > > > > > >>
> >>>> > > > > > > > >> regards,
> >>>> > > > > > > > >>
> >>>> > > > > > > > >> > Thanks and kind regards,
> >>>> > > > > > > > >> > Lili
> >>>> > > > > > > > >> >
> >>>> > > > > > > > >> > --
> >>>> > > > > > > > >> > You received this message because you are subscribed to the
> >>>> > > > > > > > >> > Google Groups "Prometheus Developers" group.
> >>>> > > > > > > > >> > To unsubscribe from this group and stop receiving emails from it,
> >>>> > > > > > > > >> > send an email to [email protected].
> >>>> > > > > > > > >> > To view this discussion on the web visit
> >>>> > > > > > > > >> > https://groups.google.com/d/msgid/prometheus-developers/30df615e-5420-4bdf-9cb7-2790ef19d520o%40googlegroups.com.
> >>>> > > > > > > > >>
> >>>> > > > > > > > >>
> >>>> > > > > > > > >> --
> >>>> > > > > > > > >> Julien Pivotto
> >>>> > > > > > > > >> @roidelapluie
> >>>> > > > > > > > >>
> >>>> > > > > > > > >>
> >>>> > > > > > > > >
> >>>> > > > > > > > >
> >>>> > > > > > > > > --
> >>>> > > > > > > > > Brian Brazil
> >>>> > > > > > > > > www.robustperception.io
> >>>> > > > > > > > >
> >>>> > > > > > > >
> >>>> > > > > > >
> >>>> > > > > > >
> >>>> > > > > > > --
> >>>> > > > > > > Julien Pivotto
> >>>> > > > > > > @roidelapluie
> >>>> > > > > > >
> >>>> > > > > > >
> >>>> > > > > >
> >>>> > > > >
> >>>> > > > > --
> >>>> > > > > Julien Pivotto
> >>>> > > > > @roidelapluie
> >>>> > > > >
> >>>> > > >
> >>>> > >
> >>>> > > --
> >>>> > > Julien Pivotto
> >>>> > > @roidelapluie
> >>>> > >
> >>>> >
> >>>>
> >>>> --
> >>>> Julien Pivotto
> >>>> @roidelapluie
> >>>>
> >>>
> >
> 

-- 
Julien Pivotto
@roidelapluie

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-developers/20200730091922.GA156213%40oxygen.
