Timothy Potter created SOLR-15059:
-------------------------------------

             Summary: Default Grafana dashboard needs to expose graphs for 
monitoring query performance
                 Key: SOLR-15059
                 URL: https://issues.apache.org/jira/browse/SOLR-15059
             Project: Solr
          Issue Type: Improvement
      Security Level: Public (Default Security Level. Issues are Public)
          Components: Grafana Dashboard, metrics
            Reporter: Timothy Potter
            Assignee: Timothy Potter


The default Grafana dashboard doesn't expose graphs for monitoring query 
performance. For instance, if I want to see QPS for a collection, that's not 
shown in the default dashboard. Same for quantiles like p95 query latency.

After some digging, I found that these metrics are available in the output 
from {{/admin/metrics}} but are not exported by the exporter.

This PR proposes to enhance the default dashboard with a new Query Metrics 
section with the following metrics:
* Distributed QPS per Collection (aggregated across all cores)
* Distributed QPS per Solr Node (aggregated across all base_url)
* QPS 1-min rate per core
* QPS 5-min rate per core
* Top-level Query latency p99, p95, p75
* Local (non-distrib) query count per core (this is important for determining 
if there is unbalanced load)
* Local (non-distrib) query rate per core (1-min)
* Local (non-distrib) p95 per core

Also, {{solr-exporter-config.xml}} uses {{jq}} queries to pull metrics from 
the output of {{/admin/metrics}}. This file is huge and contains a lot of 
{{jq}} boilerplate. Moreover, I'm introducing another 15-20 metrics in this PR, 
which only makes the file more verbose.

Thus, I'm also introducing support for jq templates so as to reduce 
boilerplate, reduce syntax errors, and improve readability. For instance the 
query metrics I'm adding to the config look like this:
{code}
          <str>
            $jq:core-query(1minRate, endswith(".distrib.requestTimes"))
          </str>
          <str>
            $jq:core-query(5minRate, endswith(".distrib.requestTimes"))
          </str>
{code}
This avoids duplicating the complicated {{jq}} query for each metric. The 
templates are optional and should only be used if a given jq structure is 
repeated 3 or more times. Otherwise, inlining the jq query is still supported.
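
To illustrate the idea, here is a minimal Python sketch of how a template reference such as {{$jq:core-query(...)}} might be expanded into a full {{jq}} query. It is not the exporter's actual implementation: the template body, the {{__METRIC__}} and {{__KEY_SELECTOR__}} placeholders, and the {{expand}} helper are all hypothetical and chosen only to show the substitution step.

```python
import re

# Hypothetical template body. The real jq used by the exporter is not shown
# in this issue, so this query is illustrative only.
TEMPLATES = {
    "core-query": (
        '.metrics | to_entries | .[] '
        '| select(.key | startswith("solr.core.")) as $core '
        '| $core.value | to_entries | .[] '
        '| select(.key | __KEY_SELECTOR__) '
        '| {name: "__METRIC__", core: $core.key, value: .value["__METRIC__"]}'
    ),
}

# Matches "$jq:<template-name>(<args>)" with the args captured as one string.
CALL = re.compile(r"^\$jq:([\w-]+)\((.*)\)$", re.DOTALL)


def expand(expr):
    """Expand a template reference; return inline jq unchanged."""
    expr = expr.strip()
    match = CALL.match(expr)
    if match is None:
        return expr  # plain inline jq is still supported
    name, args = match.group(1), match.group(2)
    # Assumption: the first argument is the metric field, the rest is the
    # key selector, so split on the first comma only.
    metric, selector = (part.strip() for part in args.split(",", 1))
    template = TEMPLATES[name]
    return template.replace("__METRIC__", metric).replace("__KEY_SELECTOR__", selector)


if __name__ == "__main__":
    print(expand('$jq:core-query(1minRate, endswith(".distrib.requestTimes"))'))
```

Under these assumptions, each additional metric costs one short line in the config, while the shared query structure lives in a single place.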



