[
https://issues.apache.org/jira/browse/SOLR-15059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17260658#comment-17260658
]
ASF subversion and git services commented on SOLR-15059:
--------------------------------------------------------
Commit 1b1b8d333ec90cb71b75e919fdfae5cf70b191fb in lucene-solr's branch
refs/heads/branch_8x from Timothy Potter
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=1b1b8d3 ]
SOLR-15059: Improve query performance monitoring (#2184)
> Default Grafana dashboard needs to expose graphs for monitoring query
> performance
> ---------------------------------------------------------------------------------
>
> Key: SOLR-15059
> URL: https://issues.apache.org/jira/browse/SOLR-15059
> Project: Solr
> Issue Type: Improvement
> Security Level: Public(Default Security Level. Issues are Public)
> Components: Grafana Dashboard, metrics
> Reporter: Timothy Potter
> Assignee: Timothy Potter
> Priority: Major
> Fix For: 8.8, master (9.0)
>
> Attachments: Screen Shot 2020-12-23 at 10.22.43 AM.png
>
> Time Spent: 2.5h
> Remaining Estimate: 0h
>
> The default Grafana dashboard doesn't expose graphs for monitoring query
> performance. For instance, if I want to see QPS for a collection, that's not
> shown in the default dashboard. Same for quantiles like p95 query latency.
> After some digging, I found that these metrics are available in the output
> from {{/admin/metrics}} but are not exported by the exporter.
> This PR proposes to enhance the default dashboard with a new Query Metrics
> section with the following metrics:
> * Distributed QPS per Collection (aggregated across all cores)
> * Distributed QPS per Solr Node (aggregated across all base_url)
> * QPS 1-min rate per core
> * QPS 5-min rate per core
> * Top-level Query latency p99, p95, p75
> * Local (non-distrib) query count per core (this is important for determining
> if there is unbalanced load)
> * Local (non-distrib) query rate per core (1-min)
> * Local (non-distrib) p95 per core
> Also, the {{solr-exporter-config.xml}} uses {{jq}} queries to pull metrics
> from the output from {{/admin/metrics}}. This file is huge and contains a
> bunch of {{jq}} boilerplate. Moreover, I'm introducing another 15-20 metrics
> in this PR, which only makes the file more verbose.
> Thus, I'm also introducing support for jq templates so as to reduce
> boilerplate, reduce syntax errors, and improve readability. For instance the
> query metrics I'm adding to the config look like this:
> {code}
> <str>
> $jq:core-query(1minRate, endswith(".distrib.requestTimes"))
> </str>
> <str>
> $jq:core-query(5minRate, endswith(".distrib.requestTimes"))
> </str>
> {code}
> This avoids duplicating the complicated {{jq}} query for each metric. The
> templates are optional and should only be used if a given jq structure is
> repeated 3 or more times. Otherwise, inlining the jq query is still
> supported. Here's how the templates work:
> {code}
> A regex with named groups is used to match template references to template
> vars using the basic pattern:
> $jq:<TEMPLATE>( <UNIQUE>, <KEYSELECTOR>, <METRIC>, <TYPE> )
> For instance,
> $jq:core(requests_total, endswith(".requestTimes"), count, COUNTER)
> TEMPLATE = core
> UNIQUE = requests_total (unique suffix for this metric, results in a metric
> named "solr_metrics_core_requests_total")
> KEYSELECTOR = endswith(".requestTimes") (filter to select the specific key
> for this metric)
> METRIC = count
> TYPE = COUNTER
> Some templates may have a default type, so you can omit that from your
> template reference, such as:
> $jq:core(requests_total, endswith(".requestTimes"), count)
> This uses the defaultType=COUNTER, as many uses of the core template are counts.
> If a template reference omits the metric, then the unique suffix is used,
> for instance:
> $jq:core-query(1minRate, endswith(".distrib.requestTimes"))
> Creates a GAUGE metric (default type) named
> "solr_metrics_core_query_1minRate" using the 1minRate value from the selected
> JSON object.
> {code}
> Just so people don't have to go digging in the large diff on the config XML,
> here are the query metrics I'm adding to the exporter config using the
> templates idea:
> {code}
> <str>
> $jq:core-query(errors_1minRate, select(.key |
> endswith(".errors")), 1minRate)
> </str>
> <str>
> $jq:core-query(client_errors_1minRate, select(.key |
> endswith(".clientErrors")), 1minRate)
> </str>
> <str>
> $jq:core-query(1minRate, select(.key |
> endswith(".distrib.requestTimes")), 1minRate)
> </str>
> <str>
> $jq:core-query(5minRate, select(.key |
> endswith(".distrib.requestTimes")), 5minRate)
> </str>
> <str>
> $jq:core-query(median_ms, select(.key |
> endswith(".distrib.requestTimes")), median_ms)
> </str>
> <str>
> $jq:core-query(p75_ms, select(.key |
> endswith(".distrib.requestTimes")), p75_ms)
> </str>
> <str>
> $jq:core-query(p95_ms, select(.key |
> endswith(".distrib.requestTimes")), p95_ms)
> </str>
> <str>
> $jq:core-query(p99_ms, select(.key |
> endswith(".distrib.requestTimes")), p99_ms)
> </str>
> <str>
> $jq:core-query(mean_rate, select(.key |
> endswith(".distrib.requestTimes")), meanRate)
> </str>
>
> <!-- Local (non-distrib) query metrics -->
> <str>
> $jq:core-query(local_1minRate, select(.key |
> endswith(".local.requestTimes")), 1minRate)
> </str>
> <str>
> $jq:core-query(local_5minRate, select(.key |
> endswith(".local.requestTimes")), 5minRate)
> </str>
> <str>
> $jq:core-query(local_median_ms, select(.key |
> endswith(".local.requestTimes")), median_ms)
> </str>
> <str>
> $jq:core-query(local_p75_ms, select(.key |
> endswith(".local.requestTimes")), p75_ms)
> </str>
> <str>
> $jq:core-query(local_p95_ms, select(.key |
> endswith(".local.requestTimes")), p95_ms)
> </str>
> <str>
> $jq:core-query(local_p99_ms, select(.key |
> endswith(".local.requestTimes")), p99_ms)
> </str>
> <str>
> $jq:core-query(local_mean_rate, select(.key |
> endswith(".local.requestTimes")), meanRate)
> </str>
> <str>
> $jq:core-query(local_count, select(.key |
> endswith(".local.requestTimes")), count, COUNTER)
> </str>
> {code}
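
The template-reference rules quoted above (named-group regex, optional METRIC falling back to the unique suffix, optional TYPE falling back to the template's default) can be sketched roughly as follows. This is a minimal illustration in Python, not the exporter's actual Java code: the regex, the `TemplateRef` class, `parse_template_ref`, and the `DEFAULT_TYPES` table are all hypothetical. The metric naming rule is inferred only from the two example names given in the issue.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical pattern: the regex actually used by the exporter is not shown
# in the issue. This one is built from the documented shape
#   $jq:<TEMPLATE>( <UNIQUE>, <KEYSELECTOR>, <METRIC>, <TYPE> )
TEMPLATE_REF = re.compile(
    r"""^\s*\$jq:(?P<template>[\w-]+)\(\s*
        (?P<unique>\w+)\s*,\s*
        (?P<keyselector>.+?\))          # selector ends with a closing paren
        (?:\s*,\s*(?P<metric>\w+))?     # optional METRIC
        (?:\s*,\s*(?P<type>[A-Z]+))?    # optional TYPE
        \s*\)\s*$""",
    re.VERBOSE | re.DOTALL,
)

# Default types as described in the issue text: "core" defaults to COUNTER,
# "core-query" to GAUGE. Any other entry would be an assumption.
DEFAULT_TYPES = {"core": "COUNTER", "core-query": "GAUGE"}


@dataclass
class TemplateRef:
    template: str
    unique: str
    key_selector: str
    metric: str
    metric_type: str

    @property
    def metric_name(self) -> str:
        # Assumed naming rule, inferred from the two examples in the issue:
        # "solr_metrics_core_requests_total", "solr_metrics_core_query_1minRate".
        return f"solr_metrics_{self.template.replace('-', '_')}_{self.unique}"


def parse_template_ref(text: str) -> Optional[TemplateRef]:
    """Parse a $jq:<TEMPLATE>(...) reference; return None for inline jq."""
    m = TEMPLATE_REF.match(text)
    if m is None:
        return None
    template, unique = m["template"], m["unique"]
    return TemplateRef(
        template=template,
        unique=unique,
        # Collapse line breaks left over from wrapping in the XML config.
        key_selector=" ".join(m["keyselector"].split()),
        # An omitted METRIC falls back to the unique suffix.
        metric=m["metric"] or unique,
        # An omitted TYPE falls back to the template's default type.
        metric_type=m["type"] or DEFAULT_TYPES.get(template, "GAUGE"),
    )


ref = parse_template_ref('$jq:core-query(1minRate, endswith(".distrib.requestTimes"))')
print(ref.metric_name, ref.metric, ref.metric_type)
```

The lazy `.+?\)` for the key selector lets the pattern backtrack past nested parentheses such as `select(.key | endswith("..."))`. A text that does not start with `$jq:` returns `None`, which corresponds to the inline jq query case that the issue says is still supported.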