I think CLUSTERSTATUS (in the Collections API) is how you find some of that stuff. I wrote this when I was using Datadog to supplement what they offered: https://github.com/msporleder/dd-solrcloud/blob/master/solrcloud.py (sorry for the crappy Python), and it got me most of the monitoring I needed for my particular situation.
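
For illustration only (this is not the linked script), here is a minimal sketch of how a per-node agent might count replica states from CLUSTERSTATUS while keeping only the replicas hosted on its own node, and how alias names could be mapped back to collections. The function names, the default base URL, and the example node name are all assumptions; the JSON layout (cluster -> collections -> shards -> replicas, plus cluster -> aliases) is what I'd expect from a SolrCloud CLUSTERSTATUS response, so check it against your version.

```python
import json
from collections import Counter
from urllib.request import urlopen


def fetch_clusterstatus(base_url="http://localhost:8983/solr"):
    """Fetch CLUSTERSTATUS as a dict (base_url is an assumed default)."""
    url = base_url + "/admin/collections?action=CLUSTERSTATUS&wt=json"
    with urlopen(url, timeout=10) as resp:
        return json.load(resp)


def resolve_aliases(status, names):
    """Map alias names to their backing collections.

    Names that are not aliases are assumed to be collection names already.
    """
    aliases = status.get("cluster", {}).get("aliases", {})
    resolved = set()
    for name in names:
        # An alias value is a comma-separated list of collection names.
        for coll in aliases.get(name, name).split(","):
            resolved.add(coll.strip())
    return resolved


def local_replica_states(status, node_name, collections=None):
    """Count (collection, state) pairs for replicas hosted on node_name.

    Filtering on node_name is one way for each agent to report only its
    own replicas, so the same replica is not counted once per node.
    """
    counts = Counter()
    all_collections = status.get("cluster", {}).get("collections", {})
    for coll_name, coll in all_collections.items():
        if collections is not None and coll_name not in collections:
            continue
        for shard in coll.get("shards", {}).values():
            for replica in shard.get("replicas", {}).values():
                if replica.get("node_name") == node_name:
                    counts[(coll_name, replica.get("state", "unknown"))] += 1
    return counts
```

An agent on a node registered as, say, host1:8983_solr would call local_replica_states(fetch_clusterstatus(), "host1:8983_solr", resolve_aliases(status, ["my_alias"])) and alert on any count whose state is "down" or "recovering".
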

On Tue, Apr 28, 2020 at 10:52 AM Radu Gheorghe <radu.gheor...@sematext.com> wrote:
>
> Thanks a lot, Matthew! OK, so you do care about the size of tlogs. As well
> as Collections API stuff (clusterstatus, overseerstatus).
>
> And DIH, I didn't think that these stats would be interesting, but surely
> they are for people who use DIH :)
>
> Best regards,
> Radu
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>
>
> On Tue, Apr 28, 2020 at 4:17 PM matthew sporleder <msporle...@gmail.com>
> wrote:
>
> > size-on-disk of cores, size of tlogs, DIH stats over time, last
> > modified date of cores
> >
> > The most important alert-type things are -- collections in recovery or
> > down state, solrcloud election events, various error rates
> >
> > It's also important to be able to tie these back to aliases so you are
> > only monitoring cores you care about, even if their backing collection
> > name changes every so often
> >
> >
> >
> > On Tue, Apr 28, 2020 at 7:57 AM Radu Gheorghe
> > <radu.gheor...@sematext.com> wrote:
> > >
> > > Hi fellow Solr users,
> > >
> > > I'm looking into improving our Solr monitoring
> > > <https://sematext.com/docs/integration/solr/> and I was curious on which
> > > metrics you consider relevant.
> > >
> > > From what we currently have, I'm only really missing fieldCache. Which we
> > > collect, but not show in the UI yet (unless you add a custom chart -
> > > we'll add it to default soon).
> > >
> > > You can click on a demo account <https://apps.sematext.com/demo>
> > > (there's a Solr app there called PH.Prod.Solr7) to see what we already
> > > collect, but I'll write it here in short:
> > > - query rate and latency (you can group per handler, per core, per
> > > collection if it's SolrCloud)
> > > - index size (number of segments, files...)
> > > - indexing: added/deleted docs, commits
> > > - caches (size, hit ratio, warmup...)
> > > - OS- and JVM-level metrics (from CPU iowait to GC latency and everything
> > > in between)
> > >
> > > Anything that we should add?
> > >
> > > I went through the Metrics API output, and the only significant thing I
> > > can think of is the transaction log. But to be honest I never checked
> > > those metrics in practice.
> > >
> > > Or maybe there's something outside the Metrics API that would be useful?
> > > I thought about the breakdown of shards that are up/down/recovering... as
> > > well as replica types. We plan on adding those, but there's a challenge
> > > in de-duplicating metrics. Because one would install one agent per node,
> > > and I'm not aware of a way to show only local shards in the Collections
> > > API -> CLUSTERSTATUS.
> > >
> > > Thanks in advance for any feedback that you may have!
> > > Radu
> > > --
> > > Monitoring - Log Management - Alerting - Anomaly Detection
> > > Solr & Elasticsearch Consulting Support Training - http://sematext.com/
> >