I also have some Python that pulls data from CLUSTERSTATUS and sends it to InfluxDB.
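A CLUSTERSTATUS-to-InfluxDB pipeline like the one just mentioned could look roughly like the minimal Python sketch below. It is not the poster's actual script. It assumes a Solr node at `localhost:8983`, an InfluxDB 1.x server at `localhost:8086` with a database named `solr`, and a measurement called `solr_replicas`; all of these are placeholders. It also assumes the `aliases` map in the response has comma-separated collection names as values. The sketch counts replicas per collection and state, counts a replica as down when its node is missing from `live_nodes`, and tags each collection with any alias pointing at it, so a dashboard keeps working when the backing collection changes.

```python
import json
import urllib.request
from collections import Counter

# Placeholder endpoints: adjust for your own Solr node and InfluxDB 1.x server.
SOLR_URL = "http://localhost:8983/solr"
INFLUX_URL = "http://localhost:8086"
INFLUX_DB = "solr"  # hypothetical database name


def fetch_cluster_status(solr_url=SOLR_URL):
    """Call the Collections API CLUSTERSTATUS action and return the parsed JSON."""
    url = f"{solr_url}/admin/collections?action=CLUSTERSTATUS&wt=json"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def escape_tag(value):
    """Escape commas, spaces and equals signs, as line protocol requires in tag values."""
    return str(value).replace(",", r"\,").replace(" ", r"\ ").replace("=", r"\=")


def replica_state_lines(status, timestamp_ns=None):
    """Build one line-protocol point per (collection, state) holding a replica count."""
    cluster = status["cluster"]
    live_nodes = set(cluster.get("live_nodes", []))

    # Assumed shape {alias: "coll1,coll2"}; invert it to collection -> [aliases].
    aliases_by_coll = {}
    for alias, targets in cluster.get("aliases", {}).items():
        for coll in targets.split(","):
            aliases_by_coll.setdefault(coll, []).append(alias)

    lines = []
    for coll, coll_info in sorted(cluster.get("collections", {}).items()):
        states = Counter()
        for shard in coll_info.get("shards", {}).values():
            for replica in shard.get("replicas", {}).values():
                # A replica whose node is not live is effectively down,
                # even if its recorded state still says "active".
                if replica.get("node_name") not in live_nodes:
                    states["down"] += 1
                else:
                    states[replica.get("state", "unknown")] += 1

        tags = f"collection={escape_tag(coll)}"
        if coll in aliases_by_coll:
            tags += f",alias={escape_tag('|'.join(sorted(aliases_by_coll[coll])))}"
        for state, count in sorted(states.items()):
            line = f"solr_replicas,{tags},state={escape_tag(state)} count={count}i"
            if timestamp_ns is not None:
                line += f" {timestamp_ns}"
            lines.append(line)
    return lines


def write_to_influx(lines, influx_url=INFLUX_URL, db=INFLUX_DB):
    """POST the points to the InfluxDB 1.x /write endpoint (a success returns 204)."""
    body = "\n".join(lines).encode("utf-8")
    req = urllib.request.Request(f"{influx_url}/write?db={db}", data=body, method="POST")
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status
```

Run from cron or a loop, one collection cycle would be `write_to_influx(replica_state_lines(fetch_cluster_status()))`. Alerting on `state=down` or `state=recovering` points filtered by the `alias` tag then covers only the collections you actually care about.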
We wrote a servlet filter that intercepts requests to Solr and sends performance data to our monitoring system. That gives us traffic and response-time percentiles per request handler. We use Telegraf for CPU, run queue, disk I/O, etc., and CloudWatch for load balancer traffic, errors, and healthy host count.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)

> On Apr 28, 2020, at 8:00 AM, matthew sporleder <msporle...@gmail.com> wrote:
>
> I think CLUSTERSTATUS is how you find some of that stuff.
>
> I wrote this when I was using Datadog, to supplement what they offered:
> https://github.com/msporleder/dd-solrcloud/blob/master/solrcloud.py
> (sorry for the crappy Python) and it got me most of the monitoring I
> needed for my particular situation.
>
> On Tue, Apr 28, 2020 at 10:52 AM Radu Gheorghe
> <radu.gheor...@sematext.com> wrote:
>>
>> Thanks a lot, Matthew! OK, so you do care about the size of tlogs, as well
>> as Collections API stuff (CLUSTERSTATUS, OVERSEERSTATUS).
>>
>> And DIH: I didn't think these stats would be interesting, but surely
>> they are for people who use DIH :)
>>
>> Best regards,
>> Radu
>> --
>> Monitoring - Log Management - Alerting - Anomaly Detection
>> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>>
>> On Tue, Apr 28, 2020 at 4:17 PM matthew sporleder <msporle...@gmail.com>
>> wrote:
>>
>>> size-on-disk of cores, size of tlogs, DIH stats over time, last
>>> modified date of cores
>>>
>>> The most important alert-type things are: collections in recovery or
>>> down state, SolrCloud election events, and various error rates.
>>>
>>> It's also important to be able to tie these back to aliases so you are
>>> only monitoring cores you care about, even if their backing collection
>>> name changes every so often.
>>>
>>> On Tue, Apr 28, 2020 at 7:57 AM Radu Gheorghe
>>> <radu.gheor...@sematext.com> wrote:
>>>>
>>>> Hi fellow Solr users,
>>>>
>>>> I'm looking into improving our Solr monitoring
>>>> <https://sematext.com/docs/integration/solr/> and I was curious about
>>>> which metrics you consider relevant.
>>>>
>>>> From what we currently have, I'm only really missing fieldCache, which we
>>>> collect but don't show in the UI yet (unless you add a custom chart; we'll
>>>> add it to the defaults soon).
>>>>
>>>> You can click on a demo account <https://apps.sematext.com/demo> (there's a
>>>> Solr app there called PH.Prod.Solr7) to see what we already collect, but
>>>> here's a short summary:
>>>> - query rate and latency (you can group per handler, per core, or per
>>>> collection if it's SolrCloud)
>>>> - index size (number of segments, files...)
>>>> - indexing: added/deleted docs, commits
>>>> - caches (size, hit ratio, warmup...)
>>>> - OS- and JVM-level metrics (from CPU iowait to GC latency and everything
>>>> in between)
>>>>
>>>> Anything that we should add?
>>>>
>>>> I went through the Metrics API output, and the only significant thing I
>>>> can think of is the transaction log. But to be honest, I've never checked
>>>> those metrics in practice.
>>>>
>>>> Or maybe there's something outside the Metrics API that would be useful?
>>>> I thought about the breakdown of shards that are up/down/recovering, as
>>>> well as replica types. We plan on adding those, but there's a challenge
>>>> in de-duplicating metrics: one would install one agent per node, and
>>>> I'm not aware of a way to show only local shards in the Collections API
>>>> CLUSTERSTATUS action.
>>>>
>>>> Thanks in advance for any feedback that you may have!
>>>> Radu
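One possible workaround for the de-duplication problem, sketched in Python under stated assumptions: every per-node agent fetches the full CLUSTERSTATUS response but keeps only the replicas whose `node_name` matches its own node. How the agent learns its own node name is a hypothetical configuration detail; the `solr_node_name` helper below just assumes the usual `host:port_solr` form. Replicas without a `type` field are assumed to be NRT.

```python
from collections import Counter


def solr_node_name(host, port=8983, context="solr"):
    """Hypothetical helper: build a node name in the assumed "host:port_context" form."""
    return f"{host}:{port}_{context}"


def local_replica_states(status, local_node):
    """Count replicas hosted on local_node, keyed by (collection, shard, type, state)."""
    counts = Counter()
    collections = status["cluster"].get("collections", {})
    for coll, coll_info in collections.items():
        for shard_name, shard in coll_info.get("shards", {}).items():
            for replica in shard.get("replicas", {}).values():
                # Skip replicas on other nodes; their own agents report them.
                if replica.get("node_name") != local_node:
                    continue
                # Assumption: a missing "type" field means an NRT replica.
                key = (coll, shard_name, replica.get("type", "NRT"), replica.get("state", "unknown"))
                counts[key] += 1
    return counts
```

Summing these per-node counts across all agents gives cluster-wide up/down/recovering totals, broken down by replica type, without double counting. The trade-off is that replicas on a node whose agent has died simply vanish from the totals, so alerting still needs a separate check against `live_nodes`.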