This is an automated email from the ASF dual-hosted git repository.
wusheng pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/skywalking.git
The following commit(s) were added to refs/heads/master by this push:
new 33ad987a72 Support MCP observability for Envoy AI Gateway (#13791)
33ad987a72 is described below
commit 33ad987a72dced25521fe4b2c62fba3d4e187fc9
Author: 吴晟 Wu Sheng <[email protected]>
AuthorDate: Mon Apr 6 23:05:26 2026 +0800
Support MCP observability for Envoy AI Gateway (#13791)
**MAL rules** (new files):
- `gateway-mcp-service.yaml` — 13 MCP service-level metrics (request
CPM/latency/percentile, method CPM, error CPM, initialization latency,
capabilities, per-backend breakdown)
- `gateway-mcp-instance.yaml` — 13 MCP instance-level metrics
**LAL rules** (modified `envoy-ai-gateway.yaml`):
- Split into two rules: `envoy-ai-gateway-llm-access-log` and
`envoy-ai-gateway-mcp-access-log`
- LLM logs: persist error responses (>= 400), upstream failures, and
  high-token requests (>= 10,000 total tokens) only
- MCP logs: persist error responses (>= 400) only
- Both rules tag `ai_route_type` (`llm` or `mcp`) for searchable filtering
**Dashboard** (modified service + instance JSON):
- Added **MCP** tab with 9 widgets (service) / 6 widgets (instance):
request CPM, latency avg/percentile, error CPM, method CPM, initialization
latency, backend breakdown
**E2E test** (modified):
- Added `mcp-server` service (`tzolov/mcp-everything-server:v3` — MCP
reference server with StreamableHttp)
- Added MCP request steps (initialize + tools/list + tools/call)
- Added MCP metric verification cases
- Log query uses `ai_route_type=llm` tag filter
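
The MCP request steps above are plain JSON-RPC calls. A sketch of the three
request bodies (method names come from the test steps; the params shapes follow
the MCP specification, while the protocol version, client info, tool name, and
gateway endpoint below are illustrative assumptions, not taken from the test):

```shell
# JSON-RPC bodies for the three MCP steps exercised by the e2e test.
# protocolVersion, clientInfo, and the tool name/arguments are
# illustrative assumptions, not values from the test itself.
INIT_BODY='{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2025-03-26","capabilities":{},"clientInfo":{"name":"e2e-client","version":"0.0.1"}}}'
LIST_BODY='{"jsonrpc":"2.0","id":2,"method":"tools/list"}'
CALL_BODY='{"jsonrpc":"2.0","id":3,"method":"tools/call","params":{"name":"echo","arguments":{"message":"hi"}}}'

# Sent through the gateway's MCP route, e.g. (endpoint path is an assumption):
#   curl -s http://<gateway>/mcp -H 'Content-Type: application/json' -d "$INIT_BODY"
```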
**Config**:
- Added `ai_route_type` to `searchableLogsTags` in `application.yml`
- Fixed aigw healthcheck binary path (`/app` instead of `aigw`)
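
For deployment, the OTLP push settings described in the updated monitoring doc
boil down to a handful of environment variables; a minimal sketch (the service
name is an example, and real deployments also set the instance/layer resource
attributes described in that doc beyond `job_name`):

```shell
# Minimal OTLP push configuration for one AI Gateway deployment.
# Values mirror the table in backend-envoy-ai-gateway-monitoring.md;
# "my-ai-gateway" is an example name.
export OTEL_SERVICE_NAME=my-ai-gateway                      # SkyWalking service name
export OTEL_EXPORTER_OTLP_ENDPOINT=http://skywalking-oap:11800
export OTEL_EXPORTER_OTLP_PROTOCOL=grpc
export OTEL_METRICS_EXPORTER=otlp                           # push metrics via OTLP
export OTEL_LOGS_EXPORTER=otlp                              # push access logs via OTLP
export OTEL_RESOURCE_ATTRIBUTES=job_name=envoy-ai-gateway   # fixed routing tag for MAL/LAL
```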
---
docs/en/changes/changes.md | 1 +
.../backend/backend-envoy-ai-gateway-monitoring.md | 112 ++++--
.../src/main/resources/application.yml | 2 +-
.../src/main/resources/lal/envoy-ai-gateway.yaml | 61 +++-
.../envoy-ai-gateway/gateway-mcp-instance.yaml | 80 +++++
.../envoy-ai-gateway/gateway-mcp-service.yaml | 85 +++++
.../envoy-ai-gateway-instance.json | 368 +++++++++++++++++++-
.../envoy_ai_gateway/envoy-ai-gateway-service.json | 379 +++++++++++++++++++--
.../virtual_genai/virtual-genai-model.json | 10 +-
.../virtual_genai/virtual-genai-provider.json | 7 +-
.../cases/envoy-ai-gateway/docker-compose.yml | 32 +-
test/e2e-v2/cases/envoy-ai-gateway/e2e.yaml | 39 ++-
.../envoy-ai-gateway/envoy-ai-gateway-cases.yaml | 24 +-
.../cases/envoy-ai-gateway/expected/logs.yml | 2 +
test/e2e-v2/cases/storage/expected/config-dump.yml | 2 +-
15 files changed, 1100 insertions(+), 104 deletions(-)
diff --git a/docs/en/changes/changes.md b/docs/en/changes/changes.md
index 8e5a10c1df..08992c1600 100644
--- a/docs/en/changes/changes.md
+++ b/docs/en/changes/changes.md
@@ -11,6 +11,7 @@
* Fix missing parentheses around OR conditions in
`JDBCZipkinQueryDAO.getTraces()`, which caused the table filter to be bypassed
for all but the first trace ID. Replaced with a proper `IN` clause.
* Fix missing `and` keyword in `JDBCEBPFProfilingTaskDAO.getTaskRecord()` SQL
query, which caused a syntax error on every invocation.
* Fix duplicate `TABLE_COLUMN` condition in
`JDBCMetadataQueryDAO.findEndpoint()`, which was binding the same parameter
twice due to a copy-paste error.
+* Support MCP (Model Context Protocol) observability for Envoy AI Gateway: MCP metrics (request CPM/latency, method breakdown, backend breakdown, initialization latency, capabilities), MCP access log sampling (errors only), `ai_route_type` searchable log tag, and MCP dashboard tabs.
#### UI
diff --git a/docs/en/setup/backend/backend-envoy-ai-gateway-monitoring.md b/docs/en/setup/backend/backend-envoy-ai-gateway-monitoring.md
index e8620edb9d..bd614f6f61 100644
--- a/docs/en/setup/backend/backend-envoy-ai-gateway-monitoring.md
+++ b/docs/en/setup/backend/backend-envoy-ai-gateway-monitoring.md
@@ -4,8 +4,9 @@
[Envoy AI Gateway](https://aigateway.envoyproxy.io/) is a gateway/proxy for AI/LLM API traffic
(OpenAI, Anthropic, AWS Bedrock, Azure OpenAI, Google Gemini, etc.) built on top of Envoy Proxy.
-It natively emits GenAI metrics and access logs via OTLP, following
-[OpenTelemetry GenAI Semantic Conventions](https://opentelemetry.io/docs/specs/semconv/gen-ai/).
+It natively emits GenAI metrics following
+[OpenTelemetry GenAI Semantic Conventions](https://opentelemetry.io/docs/specs/semconv/gen-ai/),
+and also emits MCP (Model Context Protocol) metrics and access logs via OTLP.
SkyWalking receives OTLP metrics and logs directly on its gRPC port (11800) — no OpenTelemetry
Collector is needed between the AI Gateway and SkyWalking OAP.
@@ -15,7 +16,7 @@ Collector is needed between the AI Gateway and SkyWalking OAP.
[Envoy AI Gateway getting started](https://aigateway.envoyproxy.io/docs/getting-started/) for installation.
### Data flow
-1. Envoy AI Gateway processes LLM API requests and records GenAI metrics (token usage, latency, TTFT, TPOT).
+1. Envoy AI Gateway processes LLM API requests and MCP requests, recording GenAI metrics and MCP metrics.
2. The AI Gateway pushes metrics and access logs via OTLP gRPC to SkyWalking OAP.
3. SkyWalking OAP parses metrics with [MAL](../../concepts-and-designs/mal.md) rules and access logs
   with [LAL](../../concepts-and-designs/lal.md) rules.
@@ -27,14 +28,14 @@ in SkyWalking OAP. No OAP-side configuration is needed.
Configure the AI Gateway to push OTLP to SkyWalking by setting these environment variables:
-| Env Var | Value | Purpose |
-|---------|-------|---------|
-| `OTEL_SERVICE_NAME` | Per-deployment gateway name (e.g., `my-ai-gateway`) | SkyWalking service name |
-| `OTEL_EXPORTER_OTLP_ENDPOINT` | `http://skywalking-oap:11800` | SkyWalking OAP gRPC receiver |
-| `OTEL_EXPORTER_OTLP_PROTOCOL` | `grpc` | OTLP transport |
-| `OTEL_METRICS_EXPORTER` | `otlp` | Enable OTLP metrics push |
-| `OTEL_LOGS_EXPORTER` | `otlp` | Enable OTLP access log push |
-| `OTEL_RESOURCE_ATTRIBUTES` | See below | Routing + instance + layer |
+| Env Var                       | Value                                               | Purpose                      |
+|-------------------------------|-----------------------------------------------------|------------------------------|
+| `OTEL_SERVICE_NAME`           | Per-deployment gateway name (e.g., `my-ai-gateway`) | SkyWalking service name      |
+| `OTEL_EXPORTER_OTLP_ENDPOINT` | `http://skywalking-oap:11800`                       | SkyWalking OAP gRPC receiver |
+| `OTEL_EXPORTER_OTLP_PROTOCOL` | `grpc`                                              | OTLP transport               |
+| `OTEL_METRICS_EXPORTER`       | `otlp`                                              | Enable OTLP metrics push     |
+| `OTEL_LOGS_EXPORTER`          | `otlp`                                              | Enable OTLP access log push  |
+| `OTEL_RESOURCE_ATTRIBUTES`    | See below                                           | Routing + instance + layer   |
**Required resource attributes** (in `OTEL_RESOURCE_ATTRIBUTES`):
- `job_name=envoy-ai-gateway` — Fixed routing tag for MAL/LAL rules. Same for all AI Gateway deployments.
@@ -58,47 +59,86 @@ is a service, each pod is an instance. Metrics include per-provider and per-mode
#### Service Metrics
-| Monitoring Panel | Unit | Metric Name | Description |
-|---|---|---|---|
-| Request CPM | calls/min | meter_envoy_ai_gw_request_cpm | Requests per minute |
-| Request Latency Avg | ms | meter_envoy_ai_gw_request_latency_avg | Average request duration |
-| Request Latency Percentile | ms | meter_envoy_ai_gw_request_latency_percentile | P50/P75/P90/P95/P99 |
-| Input Token Rate | tokens/min | meter_envoy_ai_gw_input_token_rate | Input (prompt) tokens per minute |
-| Output Token Rate | tokens/min | meter_envoy_ai_gw_output_token_rate | Output (completion) tokens per minute |
-| TTFT Avg | ms | meter_envoy_ai_gw_ttft_avg | Time to First Token (streaming only) |
-| TTFT Percentile | ms | meter_envoy_ai_gw_ttft_percentile | P50/P75/P90/P95/P99 TTFT |
-| TPOT Avg | ms | meter_envoy_ai_gw_tpot_avg | Time Per Output Token (streaming only) |
-| TPOT Percentile | ms | meter_envoy_ai_gw_tpot_percentile | P50/P75/P90/P95/P99 TPOT |
+| Monitoring Panel           | Unit       | Metric Name                                  | Description                            |
+|----------------------------|------------|----------------------------------------------|----------------------------------------|
+| Request CPM                | calls/min  | meter_envoy_ai_gw_request_cpm                | Requests per minute                    |
+| Request Latency Avg        | ms         | meter_envoy_ai_gw_request_latency_avg        | Average request duration               |
+| Request Latency Percentile | ms         | meter_envoy_ai_gw_request_latency_percentile | P50/P75/P90/P95/P99                    |
+| Input Token Rate           | tokens/min | meter_envoy_ai_gw_input_token_rate           | Input (prompt) tokens per minute       |
+| Output Token Rate          | tokens/min | meter_envoy_ai_gw_output_token_rate          | Output (completion) tokens per minute  |
+| TTFT Avg                   | ms         | meter_envoy_ai_gw_ttft_avg                   | Time to First Token (streaming only)   |
+| TTFT Percentile            | ms         | meter_envoy_ai_gw_ttft_percentile            | P50/P75/P90/P95/P99 TTFT               |
+| TPOT Avg                   | ms         | meter_envoy_ai_gw_tpot_avg                   | Time Per Output Token (streaming only) |
+| TPOT Percentile            | ms         | meter_envoy_ai_gw_tpot_percentile            | P50/P75/P90/P95/P99 TPOT               |
#### Provider Breakdown Metrics
-| Monitoring Panel | Unit | Metric Name | Description |
-|---|---|---|---|
-| Provider Request CPM | calls/min | meter_envoy_ai_gw_provider_request_cpm | Requests by provider |
-| Provider Token Rate | tokens/min | meter_envoy_ai_gw_provider_token_rate | Token rate by provider |
-| Provider Latency Avg | ms | meter_envoy_ai_gw_provider_latency_avg | Latency by provider |
+| Monitoring Panel     | Unit       | Metric Name                            | Description            |
+|----------------------|------------|----------------------------------------|------------------------|
+| Provider Request CPM | calls/min  | meter_envoy_ai_gw_provider_request_cpm | Requests by provider   |
+| Provider Token Rate  | tokens/min | meter_envoy_ai_gw_provider_token_rate  | Token rate by provider |
+| Provider Latency Avg | ms         | meter_envoy_ai_gw_provider_latency_avg | Latency by provider    |
#### Model Breakdown Metrics
-| Monitoring Panel | Unit | Metric Name | Description |
-|---|---|---|---|
-| Model Request CPM | calls/min | meter_envoy_ai_gw_model_request_cpm | Requests by model |
-| Model Token Rate | tokens/min | meter_envoy_ai_gw_model_token_rate | Token rate by model |
-| Model Latency Avg | ms | meter_envoy_ai_gw_model_latency_avg | Latency by model |
-| Model TTFT Avg | ms | meter_envoy_ai_gw_model_ttft_avg | TTFT by model |
-| Model TPOT Avg | ms | meter_envoy_ai_gw_model_tpot_avg | TPOT by model |
+| Monitoring Panel  | Unit       | Metric Name                         | Description         |
+|-------------------|------------|-------------------------------------|---------------------|
+| Model Request CPM | calls/min  | meter_envoy_ai_gw_model_request_cpm | Requests by model   |
+| Model Token Rate  | tokens/min | meter_envoy_ai_gw_model_token_rate  | Token rate by model |
+| Model Latency Avg | ms         | meter_envoy_ai_gw_model_latency_avg | Latency by model    |
+| Model TTFT Avg    | ms         | meter_envoy_ai_gw_model_ttft_avg    | TTFT by model       |
+| Model TPOT Avg    | ms         | meter_envoy_ai_gw_model_tpot_avg    | TPOT by model       |
#### Instance Metrics
All service-level metrics are also available per instance (pod) with `meter_envoy_ai_gw_instance_` prefix,
including per-provider and per-model breakdowns.
+### MCP Metrics
+
+When the AI Gateway is configured with MCP (Model Context Protocol) routes, SkyWalking collects
+MCP-specific metrics. These appear in the **MCP** tab on the service and instance dashboards.
+
+#### MCP Service Metrics
+
+| Monitoring Panel                      | Unit      | Metric Name                                             | Description                                                       |
+|---------------------------------------|-----------|---------------------------------------------------------|-------------------------------------------------------------------|
+| MCP Request CPM                       | calls/min | meter_envoy_ai_gw_mcp_request_cpm                       | MCP requests per minute                                           |
+| MCP Request Latency Avg               | ms        | meter_envoy_ai_gw_mcp_request_latency_avg               | Average MCP request duration                                      |
+| MCP Request Latency Percentile        | ms        | meter_envoy_ai_gw_mcp_request_latency_percentile        | P50/P75/P90/P95/P99                                               |
+| MCP Method CPM                        | calls/min | meter_envoy_ai_gw_mcp_method_cpm                        | Requests by MCP method (initialize, tools/list, tools/call, etc.) |
+| MCP Error CPM                         | calls/min | meter_envoy_ai_gw_mcp_error_cpm                         | MCP error requests per minute                                     |
+| MCP Initialization Latency Avg        | ms        | meter_envoy_ai_gw_mcp_initialization_latency_avg        | Average MCP session initialization time                           |
+| MCP Initialization Latency Percentile | ms        | meter_envoy_ai_gw_mcp_initialization_latency_percentile | P50/P75/P90/P95/P99                                               |
+| MCP Capabilities CPM                  | calls/min | meter_envoy_ai_gw_mcp_capabilities_cpm                  | Capabilities negotiated by type                                   |
+
+#### MCP Backend Breakdown Metrics
+
+| Monitoring Panel         | Unit      | Metric Name                                              | Description                    |
+|--------------------------|-----------|----------------------------------------------------------|--------------------------------|
+| Backend Request CPM      | calls/min | meter_envoy_ai_gw_mcp_backend_request_cpm                | Requests by MCP backend        |
+| Backend Latency Avg      | ms        | meter_envoy_ai_gw_mcp_backend_request_latency_avg        | Latency by MCP backend         |
+| Backend Method CPM       | calls/min | meter_envoy_ai_gw_mcp_backend_method_cpm                 | Requests by backend and method |
+| Backend Error CPM        | calls/min | meter_envoy_ai_gw_mcp_backend_error_cpm                  | Errors by MCP backend          |
+| Backend Init Latency Avg | ms        | meter_envoy_ai_gw_mcp_backend_initialization_latency_avg | Init latency by backend        |
+
+#### MCP Instance Metrics
+
+All MCP service-level metrics are also available per instance with `meter_envoy_ai_gw_mcp_instance_` prefix.
+
### Access Log Sampling
-The LAL rules apply a sampling policy to reduce storage:
+Access logs are tagged with `ai_route_type` (`llm` or `mcp`) for filtering in the log query UI.
+The `ai_route_type` tag is searchable by default.
+
+**LLM route logs:**
- **Error responses** (HTTP status >= 400) — always persisted.
- **Upstream failures** — always persisted.
- **High token cost** (>= 10,000 total tokens) — persisted for cost anomaly detection.
- Normal successful responses with low token counts are dropped.
-The token threshold can be adjusted in `lal/envoy-ai-gateway.yaml`.
+**MCP route logs:**
+- **Error responses** (HTTP status >= 400) — always persisted.
+- Normal MCP requests are dropped (MCP observability is covered by metrics).
+
+The sampling policy can be adjusted in `lal/envoy-ai-gateway.yaml`.
diff --git a/oap-server/server-starter/src/main/resources/application.yml b/oap-server/server-starter/src/main/resources/application.yml
index b79ff36a95..a0112f34d3 100644
--- a/oap-server/server-starter/src/main/resources/application.yml
+++ b/oap-server/server-starter/src/main/resources/application.yml
@@ -118,7 +118,7 @@ core:
    searchableTracesTags: ${SW_SEARCHABLE_TAG_KEYS:http.method,http.status_code,rpc.status_code,db.type,db.instance,mq.queue,mq.topic,mq.broker}
    # Define the set of log tag keys, which should be searchable through the GraphQL.
    # The max length of key=value should be less than 256 or will be dropped.
-   searchableLogsTags: ${SW_SEARCHABLE_LOGS_TAG_KEYS:level,http.status_code}
+   searchableLogsTags: ${SW_SEARCHABLE_LOGS_TAG_KEYS:level,http.status_code,ai_route_type}
    # Define the set of alarm tag keys, which should be searchable through the GraphQL.
    # The max length of key=value should be less than 256 or will be dropped.
    searchableAlarmTags: ${SW_SEARCHABLE_ALARM_TAG_KEYS:level}
diff --git a/oap-server/server-starter/src/main/resources/lal/envoy-ai-gateway.yaml b/oap-server/server-starter/src/main/resources/lal/envoy-ai-gateway.yaml
index 0af60377b5..ef6e8a5186 100644
--- a/oap-server/server-starter/src/main/resources/lal/envoy-ai-gateway.yaml
+++ b/oap-server/server-starter/src/main/resources/lal/envoy-ai-gateway.yaml
@@ -15,29 +15,52 @@
# Envoy AI Gateway access log processing via OTLP.
#
-# Sampling policy: only persist abnormal or expensive requests.
-# Normal 200 responses with low token count and no upstream failure are dropped.
+# Two rules: one for LLM route logs, one for MCP route logs.
+# LLM sampling: persist error responses (>= 400), upstream failures, or high-token requests (>= 10000).
+# MCP sampling: only persist error responses (>= 400).
+# Both tag ai_route_type for searchable filtering in the UI.
rules:
-  - name: envoy-ai-gateway-access-log
+  - name: envoy-ai-gateway-llm-access-log
    layer: ENVOY_AI_GATEWAY
    dsl: |
      filter {
-        // Drop normal logs: response < 400, no upstream failure, low token count
+        // Only process LLM route logs (gen_ai.request.model is always set for LLM routes, even on errors)
+        if (tag("gen_ai.request.model") == "" || tag("gen_ai.request.model") == "-") {
+          abort {}
+        }
+
+        // Keep: error responses (>= 400), upstream failures, or high-token requests (>= 10000 total tokens)
+        // Abort logs without response_code unless there is an upstream failure
+        if (tag("response_code") == "" || tag("response_code") == "-") {
+          if (tag("upstream_transport_failure_reason") == "" || tag("upstream_transport_failure_reason") == "-") {
+            abort {}
+          }
+        }
+        // For normal responses (< 400), check upstream failure and token cost
        if (tag("response_code") != "" && tag("response_code") != "-") {
          if (tag("response_code") as Integer < 400) {
+            if (tag("upstream_transport_failure_reason") != "" && tag("upstream_transport_failure_reason") != "-") {
+              // upstream failure — keep
+            }
            if (tag("upstream_transport_failure_reason") == "" || tag("upstream_transport_failure_reason") == "-") {
+              // no upstream failure — check token cost
              if (tag("gen_ai.usage.input_tokens") != "" && tag("gen_ai.usage.input_tokens") != "-"
                  && tag("gen_ai.usage.output_tokens") != "" && tag("gen_ai.usage.output_tokens") != "-") {
                if ((tag("gen_ai.usage.input_tokens") as Integer) + (tag("gen_ai.usage.output_tokens") as Integer) < 10000) {
                  abort {}
                }
              }
+              if (tag("gen_ai.usage.input_tokens") == "" || tag("gen_ai.usage.input_tokens") == "-"
+                  || tag("gen_ai.usage.output_tokens") == "" || tag("gen_ai.usage.output_tokens") == "-") {
+                abort {}
+              }
            }
          }
        }
        extractor {
+          tag 'ai_route_type': "llm"
          tag 'gen_ai.request.model': tag("gen_ai.request.model")
          tag 'gen_ai.response.model': tag("gen_ai.response.model")
          tag 'gen_ai.provider.name': tag("gen_ai.provider.name")
@@ -50,3 +73,33 @@ rules:
        sink {
        }
      }
+
+  - name: envoy-ai-gateway-mcp-access-log
+    layer: ENVOY_AI_GATEWAY
+    dsl: |
+      filter {
+        // Only process MCP route logs
+        if (tag("mcp.method.name") == "" || tag("mcp.method.name") == "-") {
+          abort {}
+        }
+
+        // Only persist error responses (>= 400)
+        if (tag("response_code") == "" || tag("response_code") == "-") {
+          abort {}
+        }
+        if (tag("response_code") as Integer < 400) {
+          abort {}
+        }
+
+        extractor {
+          tag 'ai_route_type': "mcp"
+          tag 'mcp.method.name': tag("mcp.method.name")
+          tag 'mcp.provider.name': tag("mcp.provider.name")
+          tag 'mcp.session.id': tag("mcp.session.id")
+          tag 'response_code': tag("response_code")
+          tag 'duration': tag("duration")
+        }
+
+        sink {
+        }
+      }
diff --git a/oap-server/server-starter/src/main/resources/otel-rules/envoy-ai-gateway/gateway-mcp-instance.yaml b/oap-server/server-starter/src/main/resources/otel-rules/envoy-ai-gateway/gateway-mcp-instance.yaml
new file mode 100644
index 0000000000..8ed1d25278
--- /dev/null
+++ b/oap-server/server-starter/src/main/resources/otel-rules/envoy-ai-gateway/gateway-mcp-instance.yaml
@@ -0,0 +1,80 @@
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+# Envoy AI Gateway — MCP Instance-level (per-pod) metrics
+#
+# Same metrics as gateway-mcp-service.yaml but scoped to individual pods.
+# All durations are in seconds from the AI Gateway; multiply by 1000 for ms display.
+
+filter: "{ tags -> tags.job_name == 'envoy-ai-gateway' }"
+expSuffix: instance(['service_name'], ['service_instance_id'], Layer.ENVOY_AI_GATEWAY)
+metricPrefix: meter_envoy_ai_gw_mcp_instance
+
+metricsRules:
+  # ===================== Aggregate MCP metrics =====================
+
+  # MCP request CPM
+  - name: request_cpm
+    exp: mcp_request_duration_count.sum(['service_name', 'service_instance_id']).increase('PT1M')
+
+  # MCP request latency average (ms)
+  - name: request_latency_avg
+    exp: mcp_request_duration_sum.sum(['service_name', 'service_instance_id']).increase('PT1M') / mcp_request_duration_count.sum(['service_name', 'service_instance_id']).increase('PT1M') * 1000
+
+  # MCP request latency percentile (ms)
+  - name: request_latency_percentile
+    exp: mcp_request_duration.sum(['le', 'service_name', 'service_instance_id']).increase('PT1M').histogram().histogram_percentile([50,75,90,95,99]) * 1000
+
+  # MCP method invocation CPM — labeled by mcp_method_name
+  - name: method_cpm
+    exp: mcp_method_count.sum(['mcp_method_name', 'service_name', 'service_instance_id']).increase('PT1M')
+
+  # MCP error CPM
+  - name: error_cpm
+    exp: mcp_method_count.tagEqual('status', 'error').sum(['service_name', 'service_instance_id']).increase('PT1M')
+
+  # MCP initialization latency average (ms)
+  - name: initialization_latency_avg
+    exp: mcp_initialization_duration_sum.sum(['service_name', 'service_instance_id']).increase('PT1M') / mcp_initialization_duration_count.sum(['service_name', 'service_instance_id']).increase('PT1M') * 1000
+
+  # MCP initialization latency percentile (ms)
+  - name: initialization_latency_percentile
+    exp: mcp_initialization_duration.sum(['le', 'service_name', 'service_instance_id']).increase('PT1M').histogram().histogram_percentile([50,75,90,95,99]) * 1000
+
+  # MCP capabilities negotiated CPM — labeled by capability_type
+  - name: capabilities_cpm
+    exp: mcp_capabilities_negotiated.sum(['capability_type', 'service_name', 'service_instance_id']).increase('PT1M')
+
+  # ===================== Per-backend breakdown =====================
+
+  # Backend request CPM
+  - name: backend_request_cpm
+    exp: mcp_request_duration_count.sum(['mcp_backend', 'service_name', 'service_instance_id']).increase('PT1M')
+
+  # Backend request latency average (ms)
+  - name: backend_request_latency_avg
+    exp: mcp_request_duration_sum.sum(['mcp_backend', 'service_name', 'service_instance_id']).increase('PT1M') / mcp_request_duration_count.sum(['mcp_backend', 'service_name', 'service_instance_id']).increase('PT1M') * 1000
+
+  # Backend method CPM
+  - name: backend_method_cpm
+    exp: mcp_method_count.sum(['mcp_backend', 'mcp_method_name', 'service_name', 'service_instance_id']).increase('PT1M')
+
+  # Backend error CPM
+  - name: backend_error_cpm
+    exp: mcp_method_count.tagEqual('status', 'error').sum(['mcp_backend', 'service_name', 'service_instance_id']).increase('PT1M')
+
+  # Backend initialization latency average (ms)
+  - name: backend_initialization_latency_avg
+    exp: mcp_initialization_duration_sum.sum(['mcp_backend', 'service_name', 'service_instance_id']).increase('PT1M') / mcp_initialization_duration_count.sum(['mcp_backend', 'service_name', 'service_instance_id']).increase('PT1M') * 1000
diff --git a/oap-server/server-starter/src/main/resources/otel-rules/envoy-ai-gateway/gateway-mcp-service.yaml b/oap-server/server-starter/src/main/resources/otel-rules/envoy-ai-gateway/gateway-mcp-service.yaml
new file mode 100644
index 0000000000..482b8edaa3
--- /dev/null
+++ b/oap-server/server-starter/src/main/resources/otel-rules/envoy-ai-gateway/gateway-mcp-service.yaml
@@ -0,0 +1,85 @@
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+# Envoy AI Gateway — MCP Service-level metrics
+#
+# Source OTLP metrics (dots → underscores by OTel receiver):
+#   mcp_request_duration        — Histogram (Cumulative), unit: seconds, labels: mcp_backend, error_type (on error)
+#   mcp_method_count            — Counter (Cumulative), labels: mcp_backend, mcp_method_name, status
+#   mcp_initialization_duration — Histogram (Cumulative), unit: seconds, labels: mcp_backend
+#   mcp_capabilities_negotiated — Counter (Cumulative), labels: mcp_backend, capability_type, capability_side
+#
+# All durations are in seconds from the AI Gateway; multiply by 1000 for ms display.
+
+filter: "{ tags -> tags.job_name == 'envoy-ai-gateway' }"
+expSuffix: service(['service_name'], Layer.ENVOY_AI_GATEWAY)
+metricPrefix: meter_envoy_ai_gw_mcp
+
+metricsRules:
+  # ===================== Aggregate MCP metrics =====================
+
+  # MCP request CPM — count of all MCP requests per minute
+  - name: request_cpm
+    exp: mcp_request_duration_count.sum(['service_name']).increase('PT1M')
+
+  # MCP request latency average (ms)
+  - name: request_latency_avg
+    exp: mcp_request_duration_sum.sum(['service_name']).increase('PT1M') / mcp_request_duration_count.sum(['service_name']).increase('PT1M') * 1000
+
+  # MCP request latency percentile (ms)
+  - name: request_latency_percentile
+    exp: mcp_request_duration.sum(['le', 'service_name']).increase('PT1M').histogram().histogram_percentile([50,75,90,95,99]) * 1000
+
+  # MCP method invocation CPM — labeled by mcp_method_name
+  - name: method_cpm
+    exp: mcp_method_count.sum(['mcp_method_name', 'service_name']).increase('PT1M')
+
+  # MCP error CPM — only error status
+  - name: error_cpm
+    exp: mcp_method_count.tagEqual('status', 'error').sum(['service_name']).increase('PT1M')
+
+  # MCP initialization latency average (ms)
+  - name: initialization_latency_avg
+    exp: mcp_initialization_duration_sum.sum(['service_name']).increase('PT1M') / mcp_initialization_duration_count.sum(['service_name']).increase('PT1M') * 1000
+
+  # MCP initialization latency percentile (ms)
+  - name: initialization_latency_percentile
+    exp: mcp_initialization_duration.sum(['le', 'service_name']).increase('PT1M').histogram().histogram_percentile([50,75,90,95,99]) * 1000
+
+  # MCP capabilities negotiated CPM — labeled by capability_type
+  - name: capabilities_cpm
+    exp: mcp_capabilities_negotiated.sum(['capability_type', 'service_name']).increase('PT1M')
+
+  # ===================== Per-backend breakdown =====================
+
+  # Backend request CPM — labeled by mcp_backend
+  - name: backend_request_cpm
+    exp: mcp_request_duration_count.sum(['mcp_backend', 'service_name']).increase('PT1M')
+
+  # Backend request latency average (ms) — labeled by mcp_backend
+  - name: backend_request_latency_avg
+    exp: mcp_request_duration_sum.sum(['mcp_backend', 'service_name']).increase('PT1M') / mcp_request_duration_count.sum(['mcp_backend', 'service_name']).increase('PT1M') * 1000
+
+  # Backend method CPM — labeled by mcp_backend and mcp_method_name
+  - name: backend_method_cpm
+    exp: mcp_method_count.sum(['mcp_backend', 'mcp_method_name', 'service_name']).increase('PT1M')
+
+  # Backend error CPM — labeled by mcp_backend
+  - name: backend_error_cpm
+    exp: mcp_method_count.tagEqual('status', 'error').sum(['mcp_backend', 'service_name']).increase('PT1M')
+
+  # Backend initialization latency average (ms) — labeled by mcp_backend
+  - name: backend_initialization_latency_avg
+    exp: mcp_initialization_duration_sum.sum(['mcp_backend', 'service_name']).increase('PT1M') / mcp_initialization_duration_count.sum(['mcp_backend', 'service_name']).increase('PT1M') * 1000
diff --git a/oap-server/server-starter/src/main/resources/ui-initialized-templates/envoy_ai_gateway/envoy-ai-gateway-instance.json b/oap-server/server-starter/src/main/resources/ui-initialized-templates/envoy_ai_gateway/envoy-ai-gateway-instance.json
index fe314a11c0..a70b4a2b9d 100644
--- a/oap-server/server-starter/src/main/resources/ui-initialized-templates/envoy_ai_gateway/envoy-ai-gateway-instance.json
+++ b/oap-server/server-starter/src/main/resources/ui-initialized-templates/envoy_ai_gateway/envoy-ai-gateway-instance.json
@@ -36,8 +36,8 @@
}
],
"widget": {
- "title": "Request CPM",
- "tips": "Calls Per Minute — total requests through this pod"
+ "title": "Request CPM (calls / min)",
+ "tips": "Total requests through this pod"
}
},
{
@@ -62,7 +62,7 @@
}
],
"widget": {
- "title": "Request Latency Avg"
+ "title": "Request Latency Avg (ms)"
}
},
{
@@ -87,7 +87,7 @@
}
],
"widget": {
- "title": "Request Latency Percentile",
+ "title": "Request Latency Percentile (ms)",
"tips": "P50 / P75 / P90 / P95 / P99"
}
},
@@ -113,7 +113,7 @@
}
],
"widget": {
- "title": "Input Token Rate",
+ "title": "Input Token Rate (tokens / min)",
"tips": "Input (prompt) tokens per minute sent to LLM providers"
}
},
@@ -139,7 +139,7 @@
}
],
"widget": {
- "title": "Output Token Rate",
+ "title": "Output Token Rate (tokens / min)",
"tips": "Output (completion) tokens per minute generated by LLM providers"
}
},
@@ -165,7 +165,7 @@
}
],
"widget": {
- "title": "Time to First Token Avg (TTFT)",
+ "title": "Time to First Token Avg - TTFT (ms)",
"tips": "Average time to first token for streaming requests"
}
},
@@ -191,7 +191,7 @@
}
],
"widget": {
- "title": "Time to First Token Percentile (TTFT)",
+ "title": "Time to First Token Percentile - TTFT (ms)",
"tips": "P50 / P75 / P90 / P95 / P99"
}
},
@@ -217,7 +217,7 @@
}
],
"widget": {
- "title": "Time Per Output Token Avg (TPOT)",
+ "title": "Time Per Output Token Avg - TPOT (ms)",
"tips": "Average inter-token latency for streaming requests"
}
},
@@ -243,7 +243,7 @@
}
],
"widget": {
- "title": "Time Per Output Token Percentile (TPOT)",
+ "title": "Time Per Output Token Percentile - TPOT (ms)",
"tips": "P50 / P75 / P90 / P95 / P99"
}
}
@@ -274,7 +274,7 @@
}
],
"widget": {
- "title": "Request CPM by Provider"
+ "title": "Request CPM by Provider (calls / min)"
}
},
{
@@ -299,7 +299,7 @@
}
],
"widget": {
- "title": "Token Rate by Provider"
+ "title": "Token Rate by Provider (tokens / min)"
}
},
{
@@ -324,7 +324,7 @@
}
],
"widget": {
- "title": "Latency Avg by Provider"
+ "title": "Latency Avg by Provider (ms)"
}
}
]
@@ -354,7 +354,7 @@
}
],
"widget": {
- "title": "Request CPM by Model"
+ "title": "Request CPM by Model (calls / min)"
}
},
{
@@ -379,7 +379,7 @@
}
],
"widget": {
- "title": "Token Rate by Model"
+ "title": "Token Rate by Model (tokens / min)"
}
},
{
@@ -404,7 +404,7 @@
}
],
"widget": {
- "title": "Latency Avg by Model"
+ "title": "Latency Avg by Model (ms)"
}
},
{
@@ -429,7 +429,7 @@
}
],
"widget": {
- "title": "Time to First Token Avg by Model (TTFT)"
+ "title": "Time to First Token Avg by Model - TTFT (ms)"
}
},
{
@@ -454,7 +454,339 @@
}
],
"widget": {
- "title": "Time Per Output Token Avg by Model (TPOT)"
+ "title": "Time Per Output Token Avg by Model - TPOT (ms)"
+ }
+ }
+ ]
+ },
+ {
+ "name": "MCP",
+ "children": [
+ {
+ "x": 0,
+ "y": 0,
+ "w": 8,
+ "h": 13,
+ "i": "0",
+ "type": "Widget",
+ "expressions": [
+ "meter_envoy_ai_gw_mcp_instance_request_cpm"
+ ],
+ "graph": {
+ "type": "Line",
+ "showXAxis": true,
+ "showYAxis": true
+ },
+ "metricConfig": [
+ {
+ "label": "MCP Request CPM",
+ "unit": "calls/min"
+ }
+ ],
+ "widget": {
+ "title": "MCP Request CPM (calls / min)"
+ }
+ },
+ {
+ "x": 8,
+ "y": 0,
+ "w": 8,
+ "h": 13,
+ "i": "1",
+ "type": "Widget",
+ "expressions": [
+ "meter_envoy_ai_gw_mcp_instance_request_latency_avg"
+ ],
+ "graph": {
+ "type": "Line",
+ "showXAxis": true,
+ "showYAxis": true
+ },
+ "metricConfig": [
+ {
+ "label": "Avg Latency",
+ "unit": "ms"
+ }
+ ],
+ "widget": {
+ "title": "MCP Request Latency Avg (ms)"
+ }
+ },
+ {
+ "x": 16,
+ "y": 0,
+ "w": 8,
+ "h": 13,
+ "i": "2",
+ "type": "Widget",
+ "expressions": [
+ "meter_envoy_ai_gw_mcp_instance_request_latency_percentile"
+ ],
+ "graph": {
+ "type": "Line",
+ "showXAxis": true,
+ "showYAxis": true
+ },
+ "metricConfig": [
+ {
+ "label": "Latency Percentile",
+ "unit": "ms"
+ }
+ ],
+ "widget": {
+ "title": "MCP Request Latency Percentile (ms)",
+ "tips": "P50 / P75 / P90 / P95 / P99"
+ }
+ },
+ {
+ "x": 0,
+ "y": 13,
+ "w": 8,
+ "h": 13,
+ "i": "3",
+ "type": "Widget",
+ "expressions": [
+ "meter_envoy_ai_gw_mcp_instance_error_cpm"
+ ],
+ "graph": {
+ "type": "Line",
+ "showXAxis": true,
+ "showYAxis": true
+ },
+ "metricConfig": [
+ {
+ "label": "Error CPM",
+ "unit": "calls/min"
+ }
+ ],
+ "widget": {
+ "title": "MCP Error CPM (calls / min)"
+ }
+ },
+ {
+ "x": 8,
+ "y": 13,
+ "w": 8,
+ "h": 13,
+ "i": "4",
+ "type": "Widget",
+ "expressions": [
+ "aggregate_labels(meter_envoy_ai_gw_mcp_instance_method_cpm,sum(mcp_method_name))"
+ ],
+ "graph": {
+ "type": "Line",
+ "showXAxis": true,
+ "showYAxis": true
+ },
+ "metricConfig": [
+ {
+ "label": "Method CPM",
+ "unit": "calls/min"
+ }
+ ],
+ "widget": {
+ "title": "MCP Method CPM (calls / min)"
+ }
+ },
+ {
+ "x": 16,
+ "y": 13,
+ "w": 8,
+ "h": 13,
+ "i": "5",
+ "type": "Widget",
+ "expressions": [
+ "meter_envoy_ai_gw_mcp_instance_initialization_latency_avg"
+ ],
+ "graph": {
+ "type": "Line",
+ "showXAxis": true,
+ "showYAxis": true
+ },
+ "metricConfig": [
+ {
+ "label": "Init Latency",
+ "unit": "ms"
+ }
+ ],
+ "widget": {
+ "title": "MCP Initialization Latency Avg (ms)"
+ }
+ },
+ {
+ "x": 0,
+ "y": 26,
+ "w": 8,
+ "h": 13,
+ "i": "6",
+ "type": "Widget",
+ "expressions": [
+ "aggregate_labels(meter_envoy_ai_gw_mcp_instance_backend_request_cpm,sum(mcp_backend))"
+ ],
+ "graph": {
+ "type": "Line",
+ "showXAxis": true,
+ "showYAxis": true
+ },
+ "metricConfig": [
+ {
+ "label": "Backend CPM",
+ "unit": "calls/min"
+ }
+ ],
+ "widget": {
+ "title": "MCP Request CPM by Backend (calls / min)"
+ }
+ },
+ {
+ "x": 8,
+ "y": 26,
+ "w": 8,
+ "h": 13,
+ "i": "7",
+ "type": "Widget",
+ "expressions": [
+ "aggregate_labels(meter_envoy_ai_gw_mcp_instance_backend_request_latency_avg,avg(mcp_backend))"
+ ],
+ "graph": {
+ "type": "Line",
+ "showXAxis": true,
+ "showYAxis": true
+ },
+ "metricConfig": [
+ {
+ "label": "Backend Latency",
+ "unit": "ms"
+ }
+ ],
+ "widget": {
+ "title": "MCP Latency Avg by Backend (ms)"
+ }
+ },
+ {
+ "x": 16,
+ "y": 26,
+ "w": 8,
+ "h": 13,
+ "i": "8",
+ "type": "Widget",
+ "expressions": [
+ "aggregate_labels(meter_envoy_ai_gw_mcp_instance_backend_error_cpm,sum(mcp_backend))"
+ ],
+ "graph": {
+ "type": "Line",
+ "showXAxis": true,
+ "showYAxis": true
+ },
+ "metricConfig": [
+ {
+ "label": "Backend Error CPM",
+ "unit": "calls/min"
+ }
+ ],
+ "widget": {
+ "title": "MCP Error CPM by Backend (calls / min)"
+ }
+ },
+ {
+ "x": 0,
+ "y": 39,
+ "w": 8,
+ "h": 13,
+ "i": "9",
+ "type": "Widget",
+ "expressions": [
+ "meter_envoy_ai_gw_mcp_instance_initialization_latency_percentile"
+ ],
+ "graph": {
+ "type": "Line",
+ "showXAxis": true,
+ "showYAxis": true
+ },
+ "metricConfig": [
+ {
+ "label": "Init Latency Percentile",
+ "unit": "ms"
+ }
+ ],
+ "widget": {
+ "title": "MCP Initialization Latency Percentile (ms)",
+ "tips": "P50 / P75 / P90 / P95 / P99"
+ }
+ },
+ {
+ "x": 8,
+ "y": 39,
+ "w": 8,
+ "h": 13,
+ "i": "10",
+ "type": "Widget",
+ "expressions": [
+ "aggregate_labels(meter_envoy_ai_gw_mcp_instance_capabilities_cpm,sum(capability_type))"
+ ],
+ "graph": {
+ "type": "Line",
+ "showXAxis": true,
+ "showYAxis": true
+ },
+ "metricConfig": [
+ {
+ "label": "Capabilities CPM",
+ "unit": "calls/min"
+ }
+ ],
+ "widget": {
+ "title": "MCP Capabilities Negotiated (calls / min)"
+ }
+ },
+ {
+ "x": 16,
+ "y": 39,
+ "w": 8,
+ "h": 13,
+ "i": "11",
+ "type": "Widget",
+ "expressions": [
+ "aggregate_labels(meter_envoy_ai_gw_mcp_instance_backend_method_cpm,sum(mcp_backend,mcp_method_name))"
+ ],
+ "graph": {
+ "type": "Line",
+ "showXAxis": true,
+ "showYAxis": true
+ },
+ "metricConfig": [
+ {
+ "label": "Backend Method CPM",
+ "unit": "calls/min"
+ }
+ ],
+ "widget": {
+ "title": "MCP Method CPM by Backend (calls / min)"
+ }
+ },
+ {
+ "x": 0,
+ "y": 52,
+ "w": 8,
+ "h": 13,
+ "i": "12",
+ "type": "Widget",
+ "expressions": [
+ "aggregate_labels(meter_envoy_ai_gw_mcp_instance_backend_initialization_latency_avg,avg(mcp_backend))"
+ ],
+ "graph": {
+ "type": "Line",
+ "showXAxis": true,
+ "showYAxis": true
+ },
+ "metricConfig": [
+ {
+ "label": "Backend Init Latency",
+ "unit": "ms"
+ }
+ ],
+ "widget": {
+ "title": "MCP Init Latency Avg by Backend (ms)"
}
}
]
diff --git a/oap-server/server-starter/src/main/resources/ui-initialized-templates/envoy_ai_gateway/envoy-ai-gateway-service.json b/oap-server/server-starter/src/main/resources/ui-initialized-templates/envoy_ai_gateway/envoy-ai-gateway-service.json
index e2599eee1b..4ac89f5f11 100644
--- a/oap-server/server-starter/src/main/resources/ui-initialized-templates/envoy_ai_gateway/envoy-ai-gateway-service.json
+++ b/oap-server/server-starter/src/main/resources/ui-initialized-templates/envoy_ai_gateway/envoy-ai-gateway-service.json
@@ -36,8 +36,8 @@
}
],
"widget": {
- "title": "Request CPM",
- "tips": "Calls Per Minute — total requests through the AI Gateway"
+ "title": "Request CPM (calls / min)",
+ "tips": "Total requests through the AI Gateway"
}
},
{
@@ -62,7 +62,7 @@
}
],
"widget": {
- "title": "Request Latency Avg"
+ "title": "Request Latency Avg (ms)"
}
},
{
@@ -87,7 +87,7 @@
}
],
"widget": {
- "title": "Request Latency Percentile",
+ "title": "Request Latency Percentile (ms)",
"tips": "P50 / P75 / P90 / P95 / P99"
}
},
@@ -113,7 +113,7 @@
}
],
"widget": {
- "title": "Input Token Rate",
+ "title": "Input Token Rate (tokens / min)",
"tips": "Input (prompt) tokens per minute sent to LLM providers"
}
},
@@ -139,7 +139,7 @@
}
],
"widget": {
- "title": "Output Token Rate",
+ "title": "Output Token Rate (tokens / min)",
"tips": "Output (completion) tokens per minute generated by LLM providers"
}
},
@@ -165,8 +165,8 @@
}
],
"widget": {
- "title": "Time to First Token Avg (TTFT)",
- "tips": "Average time to first token for streaming requests"
+ "title": "Time to First Token Avg - TTFT (ms)",
+ "tips": "Average TTFT for streaming requests"
}
},
{
@@ -191,8 +191,8 @@
}
],
"widget": {
- "title": "Time to First Token Percentile (TTFT)",
- "tips": "P50 / P75 / P90 / P95 / P99"
+ "title": "Time to First Token Percentile - TTFT (ms)",
+ "tips": "TTFT P50 / P75 / P90 / P95 / P99"
}
},
{
@@ -217,8 +217,8 @@
}
],
"widget": {
- "title": "Time Per Output Token Avg (TPOT)",
- "tips": "Average inter-token latency for streaming requests"
+ "title": "Time Per Output Token Avg - TPOT (ms)",
+ "tips": "Average TPOT inter-token latency for streaming requests"
}
},
{
@@ -243,8 +243,8 @@
}
],
"widget": {
- "title": "Time Per Output Token Percentile (TPOT)",
- "tips": "P50 / P75 / P90 / P95 / P99"
+ "title": "Time Per Output Token Percentile - TPOT (ms)",
+ "tips": "TPOT P50 / P75 / P90 / P95 / P99"
}
}
]
@@ -274,7 +274,7 @@
}
],
"widget": {
- "title": "Request CPM by Provider"
+ "title": "Request CPM by Provider (calls / min)"
}
},
{
@@ -299,7 +299,7 @@
}
],
"widget": {
- "title": "Token Rate by Provider"
+ "title": "Token Rate by Provider (tokens / min)"
}
},
{
@@ -324,7 +324,7 @@
}
],
"widget": {
- "title": "Latency Avg by Provider"
+ "title": "Latency Avg by Provider (ms)"
}
}
]
@@ -354,7 +354,7 @@
}
],
"widget": {
- "title": "Request CPM by Model"
+ "title": "Request CPM by Model (calls / min)"
}
},
{
@@ -379,7 +379,7 @@
}
],
"widget": {
- "title": "Token Rate by Model"
+ "title": "Token Rate by Model (tokens / min)"
}
},
{
@@ -404,7 +404,7 @@
}
],
"widget": {
- "title": "Latency Avg by Model"
+ "title": "Latency Avg by Model (ms)"
}
},
{
@@ -429,7 +429,7 @@
}
],
"widget": {
- "title": "Time to First Token Avg by Model (TTFT)"
+ "title": "Time to First Token Avg by Model - TTFT (ms)"
}
},
{
@@ -454,7 +454,342 @@
}
],
"widget": {
- "title": "Time Per Output Token Avg by Model (TPOT)"
+ "title": "Time Per Output Token Avg by Model - TPOT (ms)"
+ }
+ }
+ ]
+ },
+ {
+ "name": "MCP",
+ "children": [
+ {
+ "x": 0,
+ "y": 0,
+ "w": 8,
+ "h": 13,
+ "i": "0",
+ "type": "Widget",
+ "expressions": [
+ "meter_envoy_ai_gw_mcp_request_cpm"
+ ],
+ "graph": {
+ "type": "Line",
+ "showXAxis": true,
+ "showYAxis": true
+ },
+ "metricConfig": [
+ {
+ "label": "MCP Request CPM",
+ "unit": "calls/min"
+ }
+ ],
+ "widget": {
+ "title": "MCP Request CPM (calls / min)",
+ "tips": "Total MCP requests per minute"
+ }
+ },
+ {
+ "x": 8,
+ "y": 0,
+ "w": 8,
+ "h": 13,
+ "i": "1",
+ "type": "Widget",
+ "expressions": [
+ "meter_envoy_ai_gw_mcp_request_latency_avg"
+ ],
+ "graph": {
+ "type": "Line",
+ "showXAxis": true,
+ "showYAxis": true
+ },
+ "metricConfig": [
+ {
+ "label": "Avg Latency",
+ "unit": "ms"
+ }
+ ],
+ "widget": {
+ "title": "MCP Request Latency Avg (ms)"
+ }
+ },
+ {
+ "x": 16,
+ "y": 0,
+ "w": 8,
+ "h": 13,
+ "i": "2",
+ "type": "Widget",
+ "expressions": [
+ "meter_envoy_ai_gw_mcp_request_latency_percentile"
+ ],
+ "graph": {
+ "type": "Line",
+ "showXAxis": true,
+ "showYAxis": true
+ },
+ "metricConfig": [
+ {
+ "label": "Latency Percentile",
+ "unit": "ms"
+ }
+ ],
+ "widget": {
+ "title": "MCP Request Latency Percentile (ms)",
+ "tips": "P50 / P75 / P90 / P95 / P99"
+ }
+ },
+ {
+ "x": 0,
+ "y": 13,
+ "w": 8,
+ "h": 13,
+ "i": "3",
+ "type": "Widget",
+ "expressions": [
+ "meter_envoy_ai_gw_mcp_error_cpm"
+ ],
+ "graph": {
+ "type": "Line",
+ "showXAxis": true,
+ "showYAxis": true
+ },
+ "metricConfig": [
+ {
+ "label": "Error CPM",
+ "unit": "calls/min"
+ }
+ ],
+ "widget": {
+ "title": "MCP Error CPM (calls / min)"
+ }
+ },
+ {
+ "x": 8,
+ "y": 13,
+ "w": 8,
+ "h": 13,
+ "i": "4",
+ "type": "Widget",
+ "expressions": [
+ "aggregate_labels(meter_envoy_ai_gw_mcp_method_cpm,sum(mcp_method_name))"
+ ],
+ "graph": {
+ "type": "Line",
+ "showXAxis": true,
+ "showYAxis": true
+ },
+ "metricConfig": [
+ {
+ "label": "Method CPM",
+ "unit": "calls/min"
+ }
+ ],
+ "widget": {
+ "title": "MCP Method CPM (calls / min)",
+ "tips": "Requests per minute by MCP method (initialize, tools/list, tools/call, etc.)"
+ }
+ },
+ {
+ "x": 16,
+ "y": 13,
+ "w": 8,
+ "h": 13,
+ "i": "5",
+ "type": "Widget",
+ "expressions": [
+ "meter_envoy_ai_gw_mcp_initialization_latency_avg"
+ ],
+ "graph": {
+ "type": "Line",
+ "showXAxis": true,
+ "showYAxis": true
+ },
+ "metricConfig": [
+ {
+ "label": "Init Latency",
+ "unit": "ms"
+ }
+ ],
+ "widget": {
+ "title": "MCP Initialization Latency Avg (ms)",
+ "tips": "Average time to initialize MCP session with backend"
+ }
+ },
+ {
+ "x": 0,
+ "y": 26,
+ "w": 8,
+ "h": 13,
+ "i": "6",
+ "type": "Widget",
+ "expressions": [
+ "aggregate_labels(meter_envoy_ai_gw_mcp_backend_request_cpm,sum(mcp_backend))"
+ ],
+ "graph": {
+ "type": "Line",
+ "showXAxis": true,
+ "showYAxis": true
+ },
+ "metricConfig": [
+ {
+ "label": "Backend CPM",
+ "unit": "calls/min"
+ }
+ ],
+ "widget": {
+ "title": "MCP Request CPM by Backend (calls / min)"
+ }
+ },
+ {
+ "x": 8,
+ "y": 26,
+ "w": 8,
+ "h": 13,
+ "i": "7",
+ "type": "Widget",
+ "expressions": [
+ "aggregate_labels(meter_envoy_ai_gw_mcp_backend_request_latency_avg,avg(mcp_backend))"
+ ],
+ "graph": {
+ "type": "Line",
+ "showXAxis": true,
+ "showYAxis": true
+ },
+ "metricConfig": [
+ {
+ "label": "Backend Latency",
+ "unit": "ms"
+ }
+ ],
+ "widget": {
+ "title": "MCP Latency Avg by Backend (ms)"
+ }
+ },
+ {
+ "x": 16,
+ "y": 26,
+ "w": 8,
+ "h": 13,
+ "i": "8",
+ "type": "Widget",
+ "expressions": [
+ "aggregate_labels(meter_envoy_ai_gw_mcp_backend_error_cpm,sum(mcp_backend))"
+ ],
+ "graph": {
+ "type": "Line",
+ "showXAxis": true,
+ "showYAxis": true
+ },
+ "metricConfig": [
+ {
+ "label": "Backend Error CPM",
+ "unit": "calls/min"
+ }
+ ],
+ "widget": {
+ "title": "MCP Error CPM by Backend (calls / min)"
+ }
+ },
+ {
+ "x": 0,
+ "y": 39,
+ "w": 8,
+ "h": 13,
+ "i": "9",
+ "type": "Widget",
+ "expressions": [
+ "meter_envoy_ai_gw_mcp_initialization_latency_percentile"
+ ],
+ "graph": {
+ "type": "Line",
+ "showXAxis": true,
+ "showYAxis": true
+ },
+ "metricConfig": [
+ {
+ "label": "Init Latency Percentile",
+ "unit": "ms"
+ }
+ ],
+ "widget": {
+ "title": "MCP Initialization Latency Percentile (ms)",
+ "tips": "P50 / P75 / P90 / P95 / P99"
+ }
+ },
+ {
+ "x": 8,
+ "y": 39,
+ "w": 8,
+ "h": 13,
+ "i": "10",
+ "type": "Widget",
+ "expressions": [
+ "aggregate_labels(meter_envoy_ai_gw_mcp_capabilities_cpm,sum(capability_type))"
+ ],
+ "graph": {
+ "type": "Line",
+ "showXAxis": true,
+ "showYAxis": true
+ },
+ "metricConfig": [
+ {
+ "label": "Capabilities CPM",
+ "unit": "calls/min"
+ }
+ ],
+ "widget": {
+ "title": "MCP Capabilities Negotiated (calls / min)"
+ }
+ },
+ {
+ "x": 16,
+ "y": 39,
+ "w": 8,
+ "h": 13,
+ "i": "11",
+ "type": "Widget",
+ "expressions": [
+ "aggregate_labels(meter_envoy_ai_gw_mcp_backend_method_cpm,sum(mcp_backend,mcp_method_name))"
+ ],
+ "graph": {
+ "type": "Line",
+ "showXAxis": true,
+ "showYAxis": true
+ },
+ "metricConfig": [
+ {
+ "label": "Backend Method CPM",
+ "unit": "calls/min"
+ }
+ ],
+ "widget": {
+ "title": "MCP Method CPM by Backend (calls / min)"
+ }
+ },
+ {
+ "x": 0,
+ "y": 52,
+ "w": 8,
+ "h": 13,
+ "i": "12",
+ "type": "Widget",
+ "expressions": [
+ "aggregate_labels(meter_envoy_ai_gw_mcp_backend_initialization_latency_avg,avg(mcp_backend))"
+ ],
+ "graph": {
+ "type": "Line",
+ "showXAxis": true,
+ "showYAxis": true
+ },
+ "metricConfig": [
+ {
+ "label": "Backend Init Latency",
+ "unit": "ms"
+ }
+ ],
+ "widget": {
+ "title": "MCP Init Latency Avg by Backend (ms)"
}
}
]
diff --git a/oap-server/server-starter/src/main/resources/ui-initialized-templates/virtual_genai/virtual-genai-model.json b/oap-server/server-starter/src/main/resources/ui-initialized-templates/virtual_genai/virtual-genai-model.json
index a5672eb2d6..79ffba319a 100644
--- a/oap-server/server-starter/src/main/resources/ui-initialized-templates/virtual_genai/virtual-genai-model.json
+++ b/oap-server/server-starter/src/main/resources/ui-initialized-templates/virtual_genai/virtual-genai-model.json
@@ -202,8 +202,9 @@
"gen_ai_model_latency_percentile"
],
"graph": {
- "type": "Bar",
- "showBackground": true
+ "type": "Line",
+ "showXAxis": true,
+ "showYAxis": true
},
"widget": {
"name": "LatencyPercentile",
@@ -267,8 +268,9 @@
}
],
"graph": {
- "type": "Bar",
- "showBackground": true
+ "type": "Line",
+ "showXAxis": true,
+ "showYAxis": true
},
"id": "0-0-21",
"moved": false,
diff --git a/oap-server/server-starter/src/main/resources/ui-initialized-templates/virtual_genai/virtual-genai-provider.json b/oap-server/server-starter/src/main/resources/ui-initialized-templates/virtual_genai/virtual-genai-provider.json
index 0a8be506d4..1d25ad3317 100644
--- a/oap-server/server-starter/src/main/resources/ui-initialized-templates/virtual_genai/virtual-genai-provider.json
+++ b/oap-server/server-starter/src/main/resources/ui-initialized-templates/virtual_genai/virtual-genai-provider.json
@@ -228,8 +228,9 @@
"gen_ai_provider_latency_percentile"
],
"graph": {
- "type": "Bar",
- "showBackground": true
+ "type": "Line",
+ "showXAxis": true,
+ "showYAxis": true
},
"widget": {
"name": "LatencyPercentile",
@@ -333,5 +334,3 @@
}
}
]
-
-
diff --git a/test/e2e-v2/cases/envoy-ai-gateway/docker-compose.yml b/test/e2e-v2/cases/envoy-ai-gateway/docker-compose.yml
index 89669cf6fd..c2a64dc2dd 100644
--- a/test/e2e-v2/cases/envoy-ai-gateway/docker-compose.yml
+++ b/test/e2e-v2/cases/envoy-ai-gateway/docker-compose.yml
@@ -13,12 +13,13 @@
# See the License for the specific language governing permissions and
# limitations under the License.
-# Envoy AI Gateway e2e — ai-gateway-cli + Ollama + SkyWalking OAP
+# Envoy AI Gateway e2e — ai-gateway-cli + Ollama + MCP Everything Server + SkyWalking OAP
#
# Architecture:
-# trigger → ai-gateway-cli (port 1975) → ollama (port 11434)
-# ↓ OTLP gRPC
-# oap (port 11800) → banyandb
+# LLM trigger → ai-gateway-cli (port 1975) → ollama (port 11434)
+# MCP trigger → ai-gateway-cli (port 1975) → mcp-server (port 3001)
+# ↓ OTLP gRPC
+# oap (port 11800) → banyandb
services:
banyandb:
@@ -52,10 +53,27 @@ services:
timeout: 60s
retries: 120
+ mcp-server:
+ image: tzolov/mcp-everything-server:v3
+ command: ["node", "dist/index.js", "streamableHttp"]
+ networks:
+ - e2e
+ expose:
+ - 3001
+ healthcheck:
test: ["CMD-SHELL", "wget -qO- --header='Content-Type: application/json' --header='Accept: application/json, text/event-stream' --post-data='{\"jsonrpc\":\"2.0\",\"id\":1,\"method\":\"initialize\",\"params\":{\"protocolVersion\":\"2025-03-26\",\"capabilities\":{},\"clientInfo\":{\"name\":\"healthcheck\",\"version\":\"1.0\"}}}' http://localhost:3001/mcp || exit 1"]
+ interval: 5s
+ timeout: 10s
+ retries: 30
+
aigw:
# TODO: pin to a release version once ai-gateway-cli HTTP listener is available in a release
image: envoyproxy/ai-gateway-cli:latest
- command: run --run-id=0
+ command:
+ - "run"
+ - "--run-id=0"
+ - "--mcp-json"
+ -
'{"mcpServers":{"everything":{"type":"http","url":"http://mcp-server:3001/mcp"}}}'
environment:
OPENAI_API_KEY: "dummy-key-not-used"
OPENAI_BASE_URL: "http://ollama:11434/v1"
@@ -71,7 +89,7 @@ services:
networks:
- e2e
healthcheck:
- test: ["CMD", "aigw", "healthcheck"]
+ test: ["CMD", "/app", "healthcheck"]
interval: 5s
timeout: 60s
retries: 120
@@ -80,6 +98,8 @@ services:
condition: service_healthy
ollama:
condition: service_healthy
+ mcp-server:
+ condition: service_healthy
networks:
e2e:
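For reference, the JSON-RPC "initialize" body that the new mcp-server healthcheck posts can be sketched as a small shell helper. This is an illustration only — the `mcp_init_payload` function name and its parameters are assumptions, not part of the commit; the compose healthcheck inlines the same payload directly in the wget command.

```shell
# Hypothetical helper mirroring the MCP "initialize" payload used by the
# mcp-server healthcheck above. $1 is the JSON-RPC id, $2 the client name.
mcp_init_payload() {
  printf '{"jsonrpc":"2.0","id":%d,"method":"initialize","params":{"protocolVersion":"2025-03-26","capabilities":{},"clientInfo":{"name":"%s","version":"1.0"}}}' "$1" "$2"
}

# The healthcheck posts this body to http://localhost:3001/mcp and marks the
# container unhealthy if the server does not respond.
mcp_init_payload 1 healthcheck
```

The StreamableHttp transport also requires the `Accept: application/json, text/event-stream` header, which is why the healthcheck sets it explicitly.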
diff --git a/test/e2e-v2/cases/envoy-ai-gateway/e2e.yaml b/test/e2e-v2/cases/envoy-ai-gateway/e2e.yaml
index 18aaa417b8..1969c9e911 100644
--- a/test/e2e-v2/cases/envoy-ai-gateway/e2e.yaml
+++ b/test/e2e-v2/cases/envoy-ai-gateway/e2e.yaml
@@ -15,14 +15,15 @@
# Envoy AI Gateway e2e test (docker-compose)
#
-# Validates ENVOY_AI_GATEWAY layer metrics and logs via OTLP from ai-gateway-cli.
+# Validates ENVOY_AI_GATEWAY layer LLM + MCP metrics and logs via OTLP from ai-gateway-cli.
#
# Architecture:
-# trigger (curl) → ai-gateway-cli (port 1975) → Ollama (port 11434)
-# ↓ OTLP gRPC
-# SkyWalking OAP (port 11800)
-# ↓
-# BanyanDB
+# LLM trigger (curl) → ai-gateway-cli (port 1975) → Ollama (port 11434)
+# MCP trigger (curl) → ai-gateway-cli (port 1975) → MCP Everything Server (port 3001)
+# ↓ OTLP gRPC
+# SkyWalking OAP (port 11800)
+# ↓
+# BanyanDB
setup:
env: compose
@@ -47,6 +48,32 @@ setup:
-d '{"model":"nonexistent-model","messages":[{"role":"user","content":"Hi"}]}'
sleep 1
done
+ - name: Send MCP requests for metric verification
+ command: |
+ # Initialize MCP session and capture session ID
+ SESSION_ID=$(curl -sS -f -D- --max-time 15 \
+ http://${aigw_host}:${aigw_1975}/mcp \
+ -H 'Content-Type: application/json' \
+ -d '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2025-03-26","capabilities":{},"clientInfo":{"name":"e2e-test","version":"1.0"}}}' \
2>&1 | grep -i '^Mcp-Session-Id:' | tr -d '\r' | sed 's/^[Mm]cp-[Ss]ession-[Ii]d: *//')
+ if [ -z "$SESSION_ID" ]; then
+ echo "Failed to capture MCP session ID" >&2
+ exit 1
+ fi
+ sleep 2
+ # tools/list
+ curl -sS -f --max-time 15 -o /dev/null \
+ http://${aigw_host}:${aigw_1975}/mcp \
+ -H 'Content-Type: application/json' \
+ -H "Mcp-Session-Id: $SESSION_ID" \
+ -d '{"jsonrpc":"2.0","id":2,"method":"tools/list","params":{}}'
+ sleep 1
+ # tools/call — call a nonexistent tool to generate an error (no -f, expects 4xx)
+ curl -sS --max-time 15 -o /dev/null \
+ http://${aigw_host}:${aigw_1975}/mcp \
+ -H 'Content-Type: application/json' \
+ -H "Mcp-Session-Id: $SESSION_ID" \
+ -d '{"jsonrpc":"2.0","id":3,"method":"tools/call","params":{"name":"nonexistent_tool","arguments":{}}}'
trigger:
action: http
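The session-capture step above hinges on one shell pipeline: pulling the `Mcp-Session-Id` response header out of the `curl -D-` output. A standalone sketch of that extraction follows; the `extract_session_id` function name is illustrative, but the pipeline body matches the setup step verbatim.

```shell
# Illustrative helper replicating the header-extraction pipeline from the
# "Send MCP requests" setup step: reads raw HTTP response headers on stdin,
# strips carriage returns, and prints the Mcp-Session-Id value.
extract_session_id() {
  grep -i '^Mcp-Session-Id:' | tr -d '\r' | sed 's/^[Mm]cp-[Ss]ession-[Ii]d: *//'
}

# Example against captured headers; prints "abc-123".
printf 'HTTP/1.1 200 OK\r\nMcp-Session-Id: abc-123\r\nContent-Type: text/event-stream\r\n' \
  | extract_session_id
```

Note the slight asymmetry inherited from the e2e step: `grep -i` matches the header in any case, while the `sed` prefix pattern only covers the `Mcp-Session-Id`-style capitalizations spelled out in its character classes.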
diff --git a/test/e2e-v2/cases/envoy-ai-gateway/envoy-ai-gateway-cases.yaml b/test/e2e-v2/cases/envoy-ai-gateway/envoy-ai-gateway-cases.yaml
index 19f11d7dc7..c9fd498d43 100644
--- a/test/e2e-v2/cases/envoy-ai-gateway/envoy-ai-gateway-cases.yaml
+++ b/test/e2e-v2/cases/envoy-ai-gateway/envoy-ai-gateway-cases.yaml
@@ -45,6 +45,26 @@ cases:
- query: swctl --display yaml --base-url=http://${oap_host}:${oap_12800}/graphql metrics exec --expression=meter_envoy_ai_gw_model_latency_avg --service-name=e2e-ai-gateway
expected: expected/metrics-has-value.yml
- # Access logs — error requests (404) should be persisted by LAL sampling
- - query: swctl --display yaml --base-url=http://${oap_host}:${oap_12800}/graphql logs ls --service-name=e2e-ai-gateway
+ # ===================== MCP metrics =====================
+
+ # MCP aggregate metrics
+ - query: swctl --display yaml --base-url=http://${oap_host}:${oap_12800}/graphql metrics exec --expression=meter_envoy_ai_gw_mcp_request_cpm --service-name=e2e-ai-gateway
+ expected: expected/metrics-has-value.yml
+ - query: swctl --display yaml --base-url=http://${oap_host}:${oap_12800}/graphql metrics exec --expression=meter_envoy_ai_gw_mcp_request_latency_avg --service-name=e2e-ai-gateway
+ expected: expected/metrics-has-value.yml
+ - query: swctl --display yaml --base-url=http://${oap_host}:${oap_12800}/graphql metrics exec --expression=meter_envoy_ai_gw_mcp_request_latency_percentile --service-name=e2e-ai-gateway
+ expected: expected/metrics-has-value-label.yml
+
+ # MCP method breakdown
+ - query: swctl --display yaml --base-url=http://${oap_host}:${oap_12800}/graphql metrics exec --expression=meter_envoy_ai_gw_mcp_method_cpm --service-name=e2e-ai-gateway
+ expected: expected/metrics-has-value.yml
+
+ # MCP backend breakdown
+ - query: swctl --display yaml --base-url=http://${oap_host}:${oap_12800}/graphql metrics exec --expression=meter_envoy_ai_gw_mcp_backend_request_cpm --service-name=e2e-ai-gateway
+ expected: expected/metrics-has-value.yml
+
+ # ===================== Access logs =====================
+
+ # LLM error requests (404) should be persisted with ai_route_type=llm
+ - query: swctl --display yaml --base-url=http://${oap_host}:${oap_12800}/graphql logs ls --service-name=e2e-ai-gateway --tags=ai_route_type=llm
expected: expected/logs.yml
diff --git a/test/e2e-v2/cases/envoy-ai-gateway/expected/logs.yml b/test/e2e-v2/cases/envoy-ai-gateway/expected/logs.yml
index 37a7f56d41..6efc226799 100644
--- a/test/e2e-v2/cases/envoy-ai-gateway/expected/logs.yml
+++ b/test/e2e-v2/cases/envoy-ai-gateway/expected/logs.yml
@@ -28,6 +28,8 @@ logs:
content: {{ notEmpty .content }}
tags:
{{- contains .tags }}
+ - key: ai_route_type
+ value: llm
- key: response_code
value: "404"
- key: gen_ai.request.model
diff --git a/test/e2e-v2/cases/storage/expected/config-dump.yml b/test/e2e-v2/cases/storage/expected/config-dump.yml
index 5dded79d67..59c96cb3bc 100644
--- a/test/e2e-v2/cases/storage/expected/config-dump.yml
+++ b/test/e2e-v2/cases/storage/expected/config-dump.yml
@@ -161,7 +161,7 @@ receiver-sharing-server.default.gRPCSslTrustedCAsPath=
aws-firehose.default.firehoseAccessKey=******
core.default.autocompleteTagValuesQueryMaxSize=100
agent-analyzer.default.noUpstreamRealAddressAgents=6000,9000
-core.default.searchableLogsTags=level,http.status_code
+core.default.searchableLogsTags=level,http.status_code,ai_route_type
core.default.role=Mixed
receiver-sharing-server.default.gRPCSslKeyPath=
receiver-pprof.provider=default