Thanks Matthew and Walter. OK, so you both use the clusterstatus output in
your regular monitoring. This seems to be missing from what we have now (we
collect everything else you mentioned, like response time percentiles, disk
IO, etc.). So I guess clusterstatus deserves a priority bump :)

Best regards,
Radu
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/


On Tue, Apr 28, 2020 at 6:47 PM Walter Underwood <wun...@wunderwood.org>
wrote:

> I also have some Python that pulls stuff from clusterstatus and sends it to
> InfluxDB.
>
> We wrote a servlet filter that intercepts requests to Solr and sends
> performance data to monitoring. That gives us per-request handler traffic
> and response time percentiles.
>
> Telegraf for CPU, run queue, disk IO, etc.
>
> CloudWatch for load balancer traffic, errors, and healthy host count.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
> > On Apr 28, 2020, at 8:00 AM, matthew sporleder <msporle...@gmail.com> wrote:
> >
> > I think clusterstatus is how you find some of that stuff.
> >
> > I wrote this when I was using datadog to supplement what they offered:
> > https://github.com/msporleder/dd-solrcloud/blob/master/solrcloud.py
> > (sorry for crappy python) and it got me most of the monitoring I
> > needed for my particular situation.
> >
> >
> >
> >
> > On Tue, Apr 28, 2020 at 10:52 AM Radu Gheorghe
> > <radu.gheor...@sematext.com> wrote:
> >>
> >> Thanks a lot, Matthew! OK, so you do care about the size of tlogs, as well
> >> as Collections API stuff (clusterstatus, overseerstatus).
> >>
> >> As for DIH, I didn't think that these stats would be interesting, but surely
> >> they are for people who use DIH :)
> >>
> >> Best regards,
> >> Radu
> >> --
> >> Monitoring - Log Management - Alerting - Anomaly Detection
> >> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
> >>
> >>
> >> On Tue, Apr 28, 2020 at 4:17 PM matthew sporleder <msporle...@gmail.com>
> >> wrote:
> >>
> >>> size-on-disk of cores, size of tlogs, DIH stats over time, last
> >>> modified date of cores
> >>>
> >>> The most important alert-type things are: collections in recovery or
> >>> down state, SolrCloud election events, various error rates
> >>>
> >>> It's also important to be able to tie these back to aliases so you are
> >>> only monitoring cores you care about, even if their backing collection
> >>> name changes every so often
> >>>
> >>>
> >>>
> >>> On Tue, Apr 28, 2020 at 7:57 AM Radu Gheorghe
> >>> <radu.gheor...@sematext.com> wrote:
> >>>>
> >>>> Hi fellow Solr users,
> >>>>
> >>>> I'm looking into improving our Solr monitoring
> >>>> <https://sematext.com/docs/integration/solr/> and I was curious about
> >>>> which metrics you consider relevant.
> >>>>
> >>>> From what we currently have, I'm only really missing fieldCache, which we
> >>>> collect but do not show in the UI yet (unless you add a custom chart -
> >>>> we'll add it to the defaults soon).
> >>>>
> >>>> You can click on a demo account <https://apps.sematext.com/demo> (there's a
> >>>> Solr app there called PH.Prod.Solr7) to see what we already collect, but
> >>>> I'll write it here in short:
> >>>> - query rate and latency (you can group per handler, per core, per
> >>>> collection if it's SolrCloud)
> >>>> - index size (number of segments, files...)
> >>>> - indexing: added/deleted docs, commits
> >>>> - caches (size, hit ratio, warmup...)
> >>>> - OS- and JVM-level metrics (from CPU iowait to GC latency and everything
> >>>> in between)
> >>>>
> >>>> Anything that we should add?
> >>>>
> >>>> I went through the Metrics API output, and the only significant thing I can
> >>>> think of is the transaction log. But to be honest I never checked those
> >>>> metrics in practice.
> >>>>
> >>>> Or maybe there's something outside the Metrics API that would be useful? I
> >>>> thought about the breakdown of shards that are up/down/recovering... as
> >>>> well as replica types. We plan on adding those, but there's a challenge in
> >>>> de-duplicating metrics, because one would install one agent per node, and
> >>>> I'm not aware of a way to show only local shards in the Collections API ->
> >>>> CLUSTERSTATUS.
> >>>>
> >>>> Thanks in advance for any feedback that you may have!
> >>>> Radu
> >>>> --
> >>>> Monitoring - Log Management - Alerting - Anomaly Detection
> >>>> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
> >>>
>
>
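
A note on the de-duplication question from the first message in this thread (one agent per node, while CLUSTERSTATUS describes the whole cluster): one possible workaround is to filter client-side on each replica's node_name. Below is a minimal sketch, assuming the CLUSTERSTATUS response has already been fetched and parsed into a dict. The function name replica_states, the host names, and the sample data are hypothetical and only illustrate the idea; this is not how any of the tools mentioned in the thread actually do it.

```python
from collections import Counter


def replica_states(clusterstatus, node_name=None):
    """Count replica states per collection from a parsed CLUSTERSTATUS response.

    If node_name is given, only replicas hosted on that node are counted,
    so agents running on different nodes do not report the same replica twice.
    """
    counts = {}
    collections = clusterstatus.get("cluster", {}).get("collections", {})
    for coll_name, coll in collections.items():
        per_coll = Counter()
        for shard in coll.get("shards", {}).values():
            for replica in shard.get("replicas", {}).values():
                if node_name is not None and replica.get("node_name") != node_name:
                    continue  # replica lives on another node; its agent reports it
                per_coll[replica.get("state", "unknown")] += 1
        counts[coll_name] = dict(per_coll)
    return counts


# Hypothetical sample, trimmed to the fields the function reads.
sample = {
    "cluster": {
        "collections": {
            "books": {
                "shards": {
                    "shard1": {
                        "replicas": {
                            "core_node1": {"node_name": "host-a:8983_solr", "state": "active"},
                            "core_node2": {"node_name": "host-b:8983_solr", "state": "recovering"},
                        }
                    },
                    "shard2": {
                        "replicas": {
                            "core_node3": {"node_name": "host-a:8983_solr", "state": "down"},
                            "core_node4": {"node_name": "host-b:8983_solr", "state": "active"},
                        }
                    },
                }
            }
        }
    }
}

print(replica_states(sample))                                # whole cluster
print(replica_states(sample, node_name="host-a:8983_solr"))  # local replicas only
```

One caveat with this approach: if a node is down, its agent is likely down too, so nothing reports that node's replicas. Having at least one agent also send the unfiltered, cluster-wide counts may still be needed for alerting on down or recovering replicas.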
