I also have some Python that pulls cluster state from CLUSTERSTATUS and sends it to InfluxDB.

We wrote a servlet filter that intercepts requests to Solr and sends performance data
to our monitoring system. That gives us traffic and response-time percentiles for
each request handler.
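The filter itself would be Java code running inside the Solr webapp, so it isn't reproduced here. As a rough illustration of the aggregation side only, here is a Python sketch. It buckets request latencies by handler path and reports nearest-rank percentiles once per reporting window. The class name, the window-reset behaviour, and the choice of nearest-rank percentiles are assumptions. It is not necessarily how the filter described above works.

```python
import math
from collections import defaultdict


def nearest_rank(sorted_values, pct):
    """Nearest-rank percentile of a sorted, non-empty list."""
    rank = math.ceil(pct * len(sorted_values) / 100)
    return sorted_values[max(rank, 1) - 1]


class HandlerLatencies:
    """Hypothetical per-handler latency window fed by a request interceptor."""

    def __init__(self):
        self._samples = defaultdict(list)

    def record(self, handler, elapsed_ms):
        """Store one request's elapsed time under its handler path, e.g. "/select"."""
        self._samples[handler].append(elapsed_ms)

    def snapshot(self, percentiles=(50, 95, 99)):
        """Return {handler: {"count": n, "p50": ..., ...}} and start a new window."""
        report = {}
        for handler, values in self._samples.items():
            values.sort()
            stats = {"count": len(values)}
            for p in percentiles:
                stats[f"p{p}"] = nearest_rank(values, p)
            report[handler] = stats
        self._samples.clear()
        return report


lat = HandlerLatencies()
for ms in range(1, 101):  # 100 requests on /select taking 1..100 ms
    lat.record("/select", ms)
lat.record("/update", 250)
print(lat.snapshot())
```

Keeping every sample is fine for a sketch. At high request rates, a histogram or a streaming percentile estimator would use far less memory.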

Telegraf for CPU, run queue, disk IO, etc.

CloudWatch for load balancer traffic, errors, and healthy host count.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Apr 28, 2020, at 8:00 AM, matthew sporleder <msporle...@gmail.com> wrote:
> 
> I think clusterstatus is how you find some of that stuff.
> 
> I wrote this when I was using Datadog, to supplement what they offered:
> https://github.com/msporleder/dd-solrcloud/blob/master/solrcloud.py
> (sorry for the crappy Python), and it got me most of the monitoring I
> needed for my particular situation.
>
> On Tue, Apr 28, 2020 at 10:52 AM Radu Gheorghe
> <radu.gheor...@sematext.com> wrote:
>> 
>> Thanks a lot, Matthew! OK, so you do care about the size of tlogs, as well
>> as Collections API stuff (clusterstatus, overseerstatus).
>> 
>> As for DIH, I didn't think those stats would be interesting, but they surely
>> are for people who use DIH :)
>> 
>> Best regards,
>> Radu
>> --
>> Monitoring - Log Management - Alerting - Anomaly Detection
>> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>> 
>> 
>> On Tue, Apr 28, 2020 at 4:17 PM matthew sporleder <msporle...@gmail.com>
>> wrote:
>> 
>>> Size on disk of cores, size of tlogs, DIH stats over time, and last-modified
>>> date of cores.
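For the size-on-disk and last-modified items, one place to read them is the CoreAdmin API (/solr/admin/cores?action=STATUS&wt=json). Its per-core index section includes sizeInBytes and lastModified. The sketch below assumes that response shape and a 24-hour staleness threshold. Check both against your Solr version and indexing schedule. The function names are hypothetical, and tlog size is not covered.

```python
from datetime import datetime, timedelta, timezone


def parse_solr_date(value):
    """Parse a UTC timestamp like '2020-04-28T13:16:23.918Z', dropping fractional seconds."""
    base = value.rstrip("Z").split(".")[0]
    return datetime.strptime(base, "%Y-%m-%dT%H:%M:%S").replace(tzinfo=timezone.utc)


def core_index_report(core_status, now, stale_after=timedelta(hours=24)):
    """Summarise size on disk and time since last modification for each core.

    core_status is assumed to be the parsed JSON of CoreAdmin action=STATUS,
    i.e. {"status": {core_name: {"index": {"sizeInBytes": ..., "lastModified": ...}}}}.
    """
    report = {}
    for core, info in core_status.get("status", {}).items():
        index = info.get("index", {})
        last_modified = index.get("lastModified")
        age = now - parse_solr_date(last_modified) if last_modified else None
        report[core] = {
            "size_bytes": index.get("sizeInBytes", 0),
            "age": age,
            # A missing lastModified (as may happen for an empty index) counts as stale.
            "stale": age is None or age > stale_after,
        }
    return report


sample = {"status": {
    "products_shard1_replica_n1": {"index": {
        "sizeInBytes": 52428800, "lastModified": "2020-04-28T13:16:23.918Z"}},
    "logs_shard1_replica_n1": {"index": {
        "sizeInBytes": 1048576, "lastModified": "2020-04-20T08:00:00.000Z"}},
}}
now = datetime(2020, 4, 28, 15, 0, tzinfo=timezone.utc)
for core, row in core_index_report(sample, now).items():
    print(core, row["size_bytes"], row["age"], "STALE" if row["stale"] else "ok")
```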
>>> 
>>> The most important alert-type things are: collections in recovery or
>>> down state, SolrCloud election events, and various error rates.
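Here is a minimal sketch of the recovery/down check. It assumes the CLUSTERSTATUS JSON shape: cluster, then collections, shards and replicas, plus a live_nodes list. It flags any replica whose state is not "active". It also flags any replica whose node_name is missing from live_nodes, because the recorded state of a replica on a node that just died may not have been updated yet. Function and variable names are hypothetical, and election events and error rates are not covered.

```python
def unhealthy_replicas(status):
    """Return (collection, shard, replica, state) for replicas needing attention.

    status is assumed to be the parsed JSON of Collections API action=CLUSTERSTATUS.
    A replica only counts as healthy if its state is "active" AND its node is live.
    """
    cluster = status.get("cluster", {})
    live = set(cluster.get("live_nodes", []))
    problems = []
    for coll_name, coll in sorted(cluster.get("collections", {}).items()):
        for shard_name, shard in sorted(coll.get("shards", {}).items()):
            for replica_name, replica in sorted(shard.get("replicas", {}).items()):
                state = replica.get("state", "unknown")
                if replica.get("node_name") not in live:
                    state = "down (node not live)"
                if state != "active":
                    problems.append((coll_name, shard_name, replica_name, state))
    return problems


sample = {"cluster": {
    "live_nodes": ["10.0.0.1:8983_solr"],
    "collections": {"products": {"shards": {"shard1": {"replicas": {
        "core_node1": {"state": "active", "node_name": "10.0.0.1:8983_solr"},
        "core_node2": {"state": "recovering", "node_name": "10.0.0.1:8983_solr"},
        "core_node3": {"state": "active", "node_name": "10.0.0.2:8983_solr"},
    }}}}},
}}
for problem in unhealthy_replicas(sample):
    print("ALERT", *problem)
```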
>>> 
>>> It's also important to be able to tie these back to aliases, so you are
>>> only monitoring the cores you care about, even if the name of their backing
>>> collection changes every so often.
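One way to tie monitoring back to aliases is to resolve each alias you care about on every poll, then collect the core names of the replicas behind it. A rotated backing collection is then picked up automatically. The sketch assumes CLUSTERSTATUS lists aliases as a map from alias name to a comma-separated string of collection names. The function name and sample data are hypothetical.

```python
def cores_behind_aliases(status, wanted_aliases):
    """Map each alias of interest to the sorted core names currently backing it."""
    cluster = status.get("cluster", {})
    aliases = cluster.get("aliases", {})
    collections = cluster.get("collections", {})
    result = {}
    for alias in wanted_aliases:
        targets = aliases.get(alias, "")
        # Assumed format: "coll1,coll2"; a list is accepted too, just in case.
        if isinstance(targets, str):
            targets = [t for t in targets.split(",") if t]
        cores = []
        for coll_name in targets:
            for shard in collections.get(coll_name, {}).get("shards", {}).values():
                for replica in shard.get("replicas", {}).values():
                    if replica.get("core"):
                        cores.append(replica["core"])
        result[alias] = sorted(cores)
    return result


sample = {"cluster": {
    "aliases": {"products": "products_20200428"},
    "collections": {
        "products_20200428": {"shards": {"shard1": {"replicas": {
            "core_node1": {"core": "products_20200428_shard1_replica_n1"}}}}},
        "products_20200401": {"shards": {"shard1": {"replicas": {
            "core_node1": {"core": "products_20200401_shard1_replica_n1"}}}}},
    },
}}
print(cores_behind_aliases(sample, ["products", "missing"]))
```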
>>> 
>>> 
>>> 
>>> On Tue, Apr 28, 2020 at 7:57 AM Radu Gheorghe
>>> <radu.gheor...@sematext.com> wrote:
>>>> 
>>>> Hi fellow Solr users,
>>>> 
>>>> I'm looking into improving our Solr monitoring
>>>> <https://sematext.com/docs/integration/solr/> and I was curious which
>>>> metrics you consider relevant.
>>>> 
>>>> Looking at what we currently have, the only thing I'm really missing is
>>>> fieldCache, which we collect but don't show in the UI yet (unless you add
>>>> a custom chart; we'll add it to the defaults soon).
>>>> 
>>>> You can click on a demo account <https://apps.sematext.com/demo> (there's a
>>>> Solr app there called PH.Prod.Solr7) to see what we already collect, but
>>>> here's the short version:
>>>> - query rate and latency (you can group per handler, per core, per
>>>> collection if it's SolrCloud)
>>>> - index size (number of segments, files...)
>>>> - indexing: added/deleted docs, commits
>>>> - caches (size, hit ratio, warmup...)
>>>> - OS- and JVM-level metrics (from CPU iowait to GC latency and everything
>>>> in between)
>>>> 
>>>> Anything that we should add?
>>>> 
>>>> I went through the Metrics API output, and the only significant thing I can
>>>> think of is the transaction log. But to be honest, I've never checked those
>>>> metrics in practice.
>>>> 
>>>> Or maybe there's something outside the Metrics API that would be useful? I
>>>> thought about the breakdown of shards that are up/down/recovering, as well
>>>> as replica types. We plan on adding those, but there's a challenge in
>>>> de-duplicating metrics: one would install one agent per node, and I'm not
>>>> aware of a way to show only local shards in the Collections API ->
>>>> CLUSTERSTATUS.
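One possible workaround for the de-duplication problem is to filter the full CLUSTERSTATUS response on the agent side. The sketch below gives each agent its own node name, hypothetically in the host:port_solr format that CLUSTERSTATUS uses for node_name. The agent reports per-replica data only for replicas whose node_name matches. It reports shard-level state only from the node hosting the shard leader. The names are assumptions, and the leader rule relies on leader replicas carrying "leader": "true" in the response.

```python
def local_replicas(status, local_node):
    """Yield (collection, shard, replica_name, replica_dict) hosted on local_node.

    local_node is a hypothetical per-agent setting, e.g. "10.0.0.1:8983_solr".
    """
    collections = status.get("cluster", {}).get("collections", {})
    for coll_name, coll in sorted(collections.items()):
        for shard_name, shard in sorted(coll.get("shards", {}).items()):
            for replica_name, replica in sorted(shard.get("replicas", {}).items()):
                if replica.get("node_name") == local_node:
                    yield coll_name, shard_name, replica_name, replica


def shards_reported_here(status, local_node):
    """Shards whose leader lives on local_node, so each shard is reported once."""
    return [
        (coll, shard)
        for coll, shard, _, replica in local_replicas(status, local_node)
        if replica.get("leader") == "true"
    ]


sample = {"cluster": {"collections": {"products": {"shards": {
    "shard1": {"replicas": {
        "core_node1": {"node_name": "10.0.0.1:8983_solr", "leader": "true"},
        "core_node2": {"node_name": "10.0.0.2:8983_solr"}}},
    "shard2": {"replicas": {
        "core_node3": {"node_name": "10.0.0.1:8983_solr"},
        "core_node4": {"node_name": "10.0.0.2:8983_solr", "leader": "true"}}},
}}}}}
print(shards_reported_here(sample, "10.0.0.1:8983_solr"))
```

One caveat of the leader rule is that a shard with no leader is reported by no agent. That can happen during an election or when all replicas are down, so that case still needs a cluster-wide check.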
>>>> 
>>>> Thanks in advance for any feedback that you may have!
>>>> Radu
>>>> --
>>>> Monitoring - Log Management - Alerting - Anomaly Detection
>>>> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>>> 