Re: Metrics API - Documentation

Andrzej Białecki Tue, 15 Oct 2019 10:07:50 -0700

We keep all essential user documentation (and some dev docs) in the Ref Guide.


The source for the Ref Guide is checked in under solr/solr-ref-guide. It uses
AsciiDoc, a simple plain-text markup, so adding content should be easy. Follow
the same workflow as with the code: create a JIRA issue, then either attach a
patch or open a PR.

> On 15 Oct 2019, at 17:33, Richard Goodman <richa...@brandwatch.com> wrote:
> 
> Many thanks both for your responses, they've been helpful.
> 
> @Andrzej - Sorry I wasn't clear about "A latency of 1mil"; I wasn't
> aware the image wouldn't come through. Following your bullet points
> helped me choose a better unit of measurement for the axis.
> 
> Regarding contributing, I would absolutely love to help there; I'm just not
> sure where to start. Is the source for the web pages in the
> apache-lucene repository?
> 
> Thanks,
> 
> 
> On Tue, 8 Oct 2019 at 11:04, Andrzej Białecki <a...@getopt.org> wrote:
> 
>> Hi,
>> 
>> Starting with Solr 7.0, all JMX metrics are actually internally driven by
>> the metrics API - JMX (or Prometheus) is just a way of exposing them.
>> 
>> I agree that we need more documentation on metrics - contributions are
>> welcome :)
>> 
>> Regarding your specific examples (btw. our mailing lists aggressively
>> strip all attachments - your graphs didn’t make it):
>> 
>> * time units in time-based counters are in nanoseconds. This is just a
>> unit of value, not necessarily precision. In this specific example
>> `ADMIN./admin/collections.totalTime` (and similarly named metrics for all
>> other request handlers) represents the total elapsed time spent processing
>> requests.
>> * time-based histograms are expressed in milliseconds, as indicated by
>> the “_ms” suffix.
>> * 1-, 5- and 15-min rates represent an exponentially weighted moving
>> average over that time window, expressed in events/second.
>> * handlerStart is initialised with System.currentTimeMillis() when this
>> instance of the request handler is first created.
>> * details on GC, memory buffer pools, and similar JVM metrics are
>> documented in JDK documentation on Management Beans. For example:
>> 
>> https://docs.oracle.com/javase/7/docs/api/java/lang/management/GarbageCollectorMXBean.html?is-external=true
>> * "A latency of 1mil" - no idea what that is; I don't think the Solr API
>> uses this abbreviation anywhere.
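The unit conventions listed above can be captured in a few conversion helpers. This is a minimal sketch; the function names are hypothetical and not part of any Solr client, and the only real value used is the totalTime figure quoted later in the thread.

```python
from datetime import datetime, timezone

# Hypothetical helpers (not part of any Solr client) that apply the unit
# conventions described above.

NANOS_PER_SECOND = 1_000_000_000


def counter_ns_to_seconds(value_ns: int) -> float:
    """Time-based counters such as '<handler>.totalTime' are in nanoseconds."""
    return value_ns / NANOS_PER_SECOND


def histogram_ms_to_seconds(value_ms: float) -> float:
    """Histogram fields with an '_ms' suffix (e.g. 'mean_ms') are in milliseconds."""
    return value_ms / 1000.0


def handler_start_to_datetime(epoch_ms: int) -> datetime:
    """handlerStart comes from System.currentTimeMillis(): ms since the Unix epoch."""
    return datetime.fromtimestamp(epoch_ms / 1000.0, tz=timezone.utc)


if __name__ == "__main__":
    # The totalTime value from the original question.
    seconds = counter_ns_to_seconds(6715327903)
    print(f"ADMIN./admin/collections.totalTime = {seconds:.3f} s")
```

Run as a script, this prints "ADMIN./admin/collections.totalTime = 6.715 s", i.e. roughly 6.7 seconds spent in that handler in total since it started.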
>> 
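To make the 1-, 5- and 15-minute rates more concrete, here is a sketch of an exponentially weighted moving average of an event rate. It is modelled on the EWMA in the Dropwizard Metrics library that Solr's metrics are built on, but the 5-second tick and the alpha formula below are assumptions of this sketch, not a copy of Solr's code.

```python
import math


class Ewma:
    """Exponentially weighted moving average of an event rate, in events/second.

    Sketch only: assumes a fixed 5-second tick and alpha = 1 - exp(-tick / window),
    in the style of Dropwizard Metrics.
    """

    TICK_SECONDS = 5.0

    def __init__(self, window_minutes: float) -> None:
        self.alpha = 1.0 - math.exp(-self.TICK_SECONDS / (60.0 * window_minutes))
        self.uncounted = 0
        self.rate = 0.0  # events per second
        self.initialized = False

    def update(self, n: int = 1) -> None:
        """Record n events that happened since the last tick."""
        self.uncounted += n

    def tick(self) -> None:
        """Fold the events seen during the last tick interval into the average."""
        instant_rate = self.uncounted / self.TICK_SECONDS
        self.uncounted = 0
        if self.initialized:
            # Move the average a fraction alpha of the way towards the new rate.
            self.rate += self.alpha * (instant_rate - self.rate)
        else:
            # The first tick seeds the average with the observed rate.
            self.rate = instant_rate
            self.initialized = True
```

With a steady 50 events per 5-second tick, a 1-minute Ewma settles at 10 events/second; if the traffic then stops, it decays to about 10 * e^-1 (roughly 3.7) after one minute, while the 5- and 15-minute averages decay more slowly.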
>> Hope this helps.
>> 
>> —
>> 
>> Andrzej Białecki
>> 
>>> On 7 Oct 2019, at 13:41, Emir Arnautović <emir.arnauto...@sematext.com>
>> wrote:
>>> 
>>> Hi Richard,
>>> We collect metrics via JMX rather than the API, but I believe they
>> are the same (I did not verify it in the code). You can see how we turned
>> those metrics into reports/charts, or even use our agent to send data to
>> Prometheus:
>> https://github.com/sematext/sematext-agent-integrations/tree/master/solr
>>> 
>>> You can also find links to some Solr metrics blog posts in this
>> repo. If you find that managing your own monitoring stack is
>> overwhelming, you can try our Solr integration.
>>> 
>>> HTH,
>>> Emir
>>> --
>>> Monitoring - Log Management - Alerting - Anomaly Detection
>>> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>>> 
>>> 
>>> 
>>>> On 7 Oct 2019, at 12:40, Richard Goodman <richa...@brandwatch.com>
>> wrote:
>>>> 
>>>> Hi there,
>>>> 
>>>> I'm currently working on using the Prometheus exporter to provide some
>> detailed insights into our SolrCloud clusters.
>>>> 
>>>> Using the provided template killed both our Prometheus server and
>> the exporter because of the size of our clusters (each cluster has around 96
>> nodes and ~300 collections with 16 shards and 3-way replication), so you can
>> imagine the amount of data that comes through /admin/metrics without
>> filtering it down first.
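One way to keep the volume down is to ask /admin/metrics only for the groups and name prefixes you actually graph. The sketch below builds such a request URL; the helper name, the base URL (default port 8983) and the example groups and prefixes are assumptions, so check the Metrics API parameters for your Solr version.

```python
from typing import Sequence
from urllib.parse import urlencode

# Hypothetical helper: build a filtered Metrics API request so that only the
# metrics you graph are returned. Base URL and example filters are assumptions.


def metrics_url(base: str, groups: Sequence[str], prefixes: Sequence[str]) -> str:
    params = {
        "wt": "json",
        # Multiple groups or prefixes are passed as comma-separated lists.
        "group": ",".join(groups),
        "prefix": ",".join(prefixes),
    }
    return f"{base.rstrip('/')}/admin/metrics?{urlencode(params)}"


if __name__ == "__main__":
    print(metrics_url("http://localhost:8983/solr",
                      groups=["node", "jvm"],
                      prefixes=["ADMIN./admin/collections", "gc."]))
```

Filtering on the server side like this means the exporter never has to receive, parse and then discard the per-core metrics of hundreds of replicas.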
>>>> 
>>>> I've begun writing my own template to reduce the amount of data being
>> requested. It's working fine, and I'm starting to build some
>> nice graphs in Grafana.
>>>> 
>>>> The only difficulty I'm having is finding decent documentation on the
>> metrics themselves. I was using the Ref Guide pages "Metrics Reporting -
>> Metrics API" <
>> https://lucene.apache.org/solr/guide/7_7/metrics-reporting.html#metrics-api>
>> and "Monitoring Solr with Prometheus and Grafana" <
>> https://lucene.apache.org/solr/guide/7_7/monitoring-solr-with-prometheus-and-grafana.html>
>> but most metrics are not described there.
>>>> 
>>>> For example:
>>>> "ADMIN./admin/collections.totalTime":6715327903,
>>>> I understand this is a counter; however, I'm not sure what unit it
>> is in when displayed, for example:
>>>> 
>>>> [graph omitted: attachments were stripped by the mailing list]
>>>> 
>>>> A latency of 1mil: I'm not sure whether this means milliseconds, a million, etc.
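Because totalTime is a cumulative counter in nanoseconds, graphing it directly gives an ever-growing line rather than a latency. A sketch of one way to derive a per-interval mean latency from two scrapes follows; it assumes the handler also exposes a cumulative request counter alongside totalTime, and the function name is hypothetical.

```python
from typing import Optional

# Hypothetical sketch: mean latency of the requests served between two scrapes,
# derived from cumulative counters. Assumes a cumulative request count exists
# alongside '<handler>.totalTime' (which is in nanoseconds).


def mean_latency_ms(total_ns_before: int, total_ns_after: int,
                    requests_before: int, requests_after: int) -> Optional[float]:
    requests = requests_after - requests_before
    if requests <= 0:
        # No requests in the interval, or the counters were reset (e.g. a restart).
        return None
    elapsed_ns = total_ns_after - total_ns_before
    return elapsed_ns / requests / 1_000_000  # nanoseconds -> milliseconds
```

For example, if 100 requests were served between two scrapes and totalTime grew by 2,000,000,000 ns (2 s), the mean latency over that interval is 20 ms.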
>>>> Another example would be the GC metrics:
>>>>     "gc.ConcurrentMarkSweep.count":7,
>>>>     "gc.ConcurrentMarkSweep.time":1247,
>>>>     "gc.ParNew.count":16759,
>>>>     "gc.ParNew.time":884173,
>>>> When displayed, these don't give a clear indication of what the
>> unit is:
>>>> 
>>>> [graph omitted: attachments were stripped by the mailing list]
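The GC metrics quoted above become easier to read once the units are known. Per the JDK's GarbageCollectorMXBean documentation, a count is the number of collections and a time is the approximate accumulated collection elapsed time in milliseconds. The sketch below assumes Solr reports those MXBean values unchanged; the helper name is hypothetical.

```python
# Sketch: interpret the GC metrics quoted above, assuming '.count' is the number
# of collections and '.time' the accumulated elapsed collection time in ms.

gc = {
    "gc.ConcurrentMarkSweep.count": 7,
    "gc.ConcurrentMarkSweep.time": 1247,
    "gc.ParNew.count": 16759,
    "gc.ParNew.time": 884173,
}


def average_gc_ms(metrics: dict[str, int], collector: str) -> float:
    """Mean elapsed milliseconds per collection for one collector."""
    count = metrics[f"gc.{collector}.count"]
    return metrics[f"gc.{collector}.time"] / count if count else 0.0


for name in ("ConcurrentMarkSweep", "ParNew"):
    total_s = gc[f"gc.{name}.time"] / 1000
    print(f"{name}: {total_s:.1f} s total, {average_gc_ms(gc, name):.1f} ms per collection")
```

For these numbers that works out to about 884.2 s spent in 16759 ParNew collections (roughly 52.8 ms each) and 1.2 s in 7 ConcurrentMarkSweep collections (roughly 178.1 ms each). As with totalTime, both values are cumulative, so graphing their rate of change is usually more informative than the raw counters.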
>>>> If anyone has any advice or guidance, it would be greatly appreciated.
>> If there isn't documentation for the API, I'd also be happy to help
>> contribute it.
>>>> 
>>>> Thanks,
>>>> --
>>>> Richard Goodman
>>> 
>> 
>> 
> 
> -- 
> 
> Richard Goodman    |    Data Infrastructure engineer
> 
> richa...@brandwatch.com
> 
> 
> NEW YORK   | BOSTON   | BRIGHTON   | LONDON   | BERLIN |   STUTTGART |
> PARIS   | SINGAPORE | SYDNEY
> 
