Copilot commented on code in PR #13791:
URL: https://github.com/apache/skywalking/pull/13791#discussion_r3036195448
##########
oap-server/server-starter/src/main/resources/otel-rules/envoy-ai-gateway/gateway-mcp-instance.yaml:
##########

@@ -0,0 +1,80 @@
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+# Envoy AI Gateway — MCP Instance-level (per-pod) metrics
+#
+# Same metrics as gateway-mcp-service.yaml but scoped to individual pods.
+# All durations are in seconds from the AI Gateway; multiply by 1000 for ms display.
+
+filter: "{ tags -> tags.job_name == 'envoy-ai-gateway' }"
+expSuffix: instance(['service_name'], ['service_instance_id'], Layer.ENVOY_AI_GATEWAY)
+metricPrefix: meter_envoy_ai_gw_mcp_instance
+
+metricsRules:
+  # ===================== Aggregate MCP metrics =====================
+
+  # MCP request CPM
+  - name: request_cpm
+    exp: mcp_request_duration_count.sum(['service_name', 'service_instance_id']).increase('PT1M')
+
+  # MCP request latency average (ms)
+  - name: request_latency_avg
+    exp: mcp_request_duration_sum.sum(['service_name', 'service_instance_id']).increase('PT1M') / mcp_request_duration_count.sum(['service_name', 'service_instance_id']).increase('PT1M') * 1000
+
+  # MCP request latency percentile (ms)
+  - name: request_latency_percentile
+    exp: mcp_request_duration.sum(['le', 'service_name', 'service_instance_id']).increase('PT1M').histogram().histogram_percentile([50,75,90,95,99]) * 1000
+
+  # MCP method invocation CPM — labeled by mcp_method_name
+  - name: method_cpm
+    exp: mcp_method_count.sum(['mcp_method_name', 'service_name', 'service_instance_id']).increase('PT1M')
+
+  # MCP error CPM
+  - name: error_cpm
+    exp: mcp_method_count.tagEqual('status', 'error').sum(['service_name', 'service_instance_id']).increase('PT1M')
+
+  # MCP initialization latency average (ms)
+  - name: initialization_latency_avg
+    exp: mcp_initialization_duration_sum.sum(['service_name', 'service_instance_id']).increase('PT1M') / mcp_initialization_duration_count.sum(['service_name', 'service_instance_id']).increase('PT1M') * 1000
+
+  # MCP initialization latency percentile (ms)
+  - name: initialization_latency_percentile
+    exp: mcp_initialization_duration.sum(['le', 'service_name', 'service_instance_id']).increase('PT1M').histogram().histogram_percentile([50,75,90,95,99]) * 1000
+
+  # MCP capabilities negotiated CPM — labeled by capability_type
+  - name: capabilities_cpm
+    exp: mcp_capabilities_negotiated.sum(['capability_type', 'service_name', 'service_instance_id']).increase('PT1M')
+
+  # ===================== Per-backend breakdown =====================
+
+  # Backend request CPM
+  - name: backend_request_cpm
+    exp: mcp_request_duration_count.sum(['mcp_backend', 'service_name', 'service_instance_id']).increase('PT1M')
+
+  # Backend request latency average (ms)
+  - name: backend_request_latency_avg
+    exp: mcp_request_duration_sum.sum(['mcp_backend', 'service_name', 'service_instance_id']).increase('PT1M') / mcp_request_duration_count.sum(['mcp_backend', 'service_name', 'service_instance_id']).increase('PT1M') * 1000
+
+  # Backend method CPM
+  - name: backend_method_cpm
+    exp: mcp_method_count.sum(['mcp_backend', 'mcp_method_name', 'service_name', 'service_instance_id']).increase('PT1M')
+
+  # Backend error CPM
+  - name: backend_error_cpm
+    exp: mcp_method_count.tagEqual('status', 'error').sum(['service_name', 'service_instance_id']).increase('PT1M')

Review Comment:
   `backend_error_cpm` is intended to be a per-backend breakdown metric, but the expression currently sums only by `service_name` and `service_instance_id`, dropping the `mcp_backend` label. This makes it effectively identical to `error_cpm` and breaks the backend breakdown dashboard widget. Include `mcp_backend` in the `sum([...])` dimensions (consistent with the service-level rule).
   ```suggestion
    exp: mcp_method_count.tagEqual('status', 'error').sum(['mcp_backend', 'service_name', 'service_instance_id']).increase('PT1M')
   ```

##########
oap-server/server-starter/src/main/resources/lal/envoy-ai-gateway.yaml:
##########

@@ -15,29 +15,33 @@
 # Envoy AI Gateway access log processing via OTLP.
 #
-# Sampling policy: only persist abnormal or expensive requests.
-# Normal 200 responses with low token count and no upstream failure are dropped.
+# Two rules: one for LLM route logs, one for MCP route logs.
+# LLM sampling: only persist abnormal or expensive requests.
+# MCP sampling: only persist error responses.
+# Both tag ai_route_type for searchable filtering in the UI.
 rules:
-  - name: envoy-ai-gateway-access-log
+  - name: envoy-ai-gateway-llm-access-log
     layer: ENVOY_AI_GATEWAY
     dsl: |
       filter {
+        // Only process LLM route logs (gen_ai.request.model is always set for LLM routes, even on errors)
+        if (tag("gen_ai.request.model") == "" || tag("gen_ai.request.model") == "-") {
+          abort {}
+        }
+        // Drop normal logs: response < 400, no upstream failure, low token count
+        // Keep only: error responses (>= 400), upstream failures, or high-token requests (>= 10000)
         if (tag("response_code") != "" && tag("response_code") != "-") {
           if (tag("response_code") as Integer < 400) {
             if (tag("upstream_transport_failure_reason") == "" || tag("upstream_transport_failure_reason") == "-") {
-              if (tag("gen_ai.usage.input_tokens") != "" && tag("gen_ai.usage.input_tokens") != "-"
-                  && tag("gen_ai.usage.output_tokens") != "" && tag("gen_ai.usage.output_tokens") != "-") {
-                if ((tag("gen_ai.usage.input_tokens") as Integer) + (tag("gen_ai.usage.output_tokens") as Integer) < 10000) {
-                  abort {}
-                }
-              }
+              abort {}
             }

Review Comment:
   The log sampling comments still mention token-based sampling ("low token count" / "high-token requests (>= 10000)") but the token threshold logic has been removed and the filter now only keeps (a) HTTP >= 400 or (b) upstream failures. Please update these comments to reflect the actual behavior to avoid misleading future changes.
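The label-dimension point in the `backend_error_cpm` review comment above is easy to demonstrate outside of SkyWalking. The sketch below is plain Python, not MAL, and the sample values and label names are illustrative; it mimics what summing by a list of labels does when only the named labels are kept as the series key:

```python
# Minimal sketch of label-based aggregation, assuming each sample is a
# (labels, value) pair. Dropping 'mcp_backend' from the grouping key merges
# all backends into a single series, which is exactly the bug flagged above.
from collections import defaultdict

def sum_by(samples, keep_labels):
    """Sum sample values, keeping only the named labels as the series key."""
    out = defaultdict(float)
    for labels, value in samples:
        key = tuple((k, labels[k]) for k in keep_labels)
        out[key] += value
    return dict(out)

# Illustrative error-count samples from two MCP backends on the same pod.
samples = [
    ({"mcp_backend": "backend-a", "service_name": "gw", "service_instance_id": "pod-1"}, 3.0),
    ({"mcp_backend": "backend-b", "service_name": "gw", "service_instance_id": "pod-1"}, 5.0),
]

# Without 'mcp_backend': one merged series, indistinguishable from error_cpm.
merged = sum_by(samples, ["service_name", "service_instance_id"])
# With 'mcp_backend': one series per backend, as the breakdown widget expects.
per_backend = sum_by(samples, ["mcp_backend", "service_name", "service_instance_id"])

print(len(merged), len(per_backend))  # → 1 2
```

In MAL terms, this is why adding `mcp_backend` to the `sum([...])` dimensions restores the per-backend breakdown without changing the overall error total.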
##########
oap-server/server-starter/src/main/resources/lal/envoy-ai-gateway.yaml:
##########

@@ -50,3 +54,32 @@ rules:
       sink {
       }
     }
+
+  - name: envoy-ai-gateway-mcp-access-log
+    layer: ENVOY_AI_GATEWAY
+    dsl: |
+      filter {
+        // Only process MCP route logs
+        if (tag("mcp.method.name") == "" || tag("mcp.method.name") == "-") {
+          abort {}
+        }
+
+        // Only persist error responses
+        if (tag("response_code") != "" && tag("response_code") != "-") {
+          if (tag("response_code") as Integer < 400) {
+            abort {}
+          }
+        }

Review Comment:
   The MCP rule is documented as "Only persist error responses", but the current logic only aborts when `response_code` is present and < 400. If `response_code` is missing/"-", the log will be persisted, which contradicts the stated policy. Consider explicitly aborting when `response_code` is missing, or handling upstream-failure signals separately if those should be retained.
   ```suggestion
        if (tag("response_code") == "" || tag("response_code") == "-") {
          abort {}
        }
        if (tag("response_code") as Integer < 400) {
          abort {}
        }
   ```

##########
test/e2e-v2/cases/envoy-ai-gateway/e2e.yaml:
##########

@@ -47,6 +48,28 @@ setup:
         -d '{"model":"nonexistent-model","messages":[{"role":"user","content":"Hi"}]}'
       sleep 1
     done
+  - name: Send MCP requests for metric verification
+    command: |
+      # Initialize MCP session and capture session ID
+      SESSION_ID=$(curl -s -D- --max-time 15 \
+        http://${aigw_host}:${aigw_1975}/mcp \
+        -H 'Content-Type: application/json' \
+        -d '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2025-03-26","capabilities":{},"clientInfo":{"name":"e2e-test","version":"1.0"}}}' \
+        2>&1 | grep -i 'Mcp-Session-Id' | tr -d '\r' | sed 's/Mcp-Session-Id: //')
+      sleep 2
+      # tools/list
+      curl -s --max-time 15 -o /dev/null \
+        http://${aigw_host}:${aigw_1975}/mcp \
+        -H 'Content-Type: application/json' \
+        -H "Mcp-Session-Id: $SESSION_ID" \
+        -d '{"jsonrpc":"2.0","id":2,"method":"tools/list","params":{}}'
+      sleep 1
+      # tools/call — call a nonexistent tool to generate an error
+      curl -s --max-time 15 -o /dev/null \

Review Comment:
   The MCP session ID extraction does not validate that `SESSION_ID` was actually captured. If the header is not present (or the grep/sed pipeline changes), subsequent requests will still run with an empty `Mcp-Session-Id`, potentially making the test silently flaky. Add a check that fails fast when `SESSION_ID` is empty (and/or have curl fail on non-2xx).
   ```suggestion
         SESSION_ID=$(curl -sS -f -D- --max-time 15 \
           http://${aigw_host}:${aigw_1975}/mcp \
           -H 'Content-Type: application/json' \
           -d '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2025-03-26","capabilities":{},"clientInfo":{"name":"e2e-test","version":"1.0"}}}' \
           2>&1 | grep -i '^Mcp-Session-Id:' | tr -d '\r' | sed 's/^[Mm]cp-[Ss]ession-[Ii]d: *//')
         if [ -z "$SESSION_ID" ]; then
           echo "Failed to capture MCP session ID from initialize response" >&2
           exit 1
         fi
         sleep 2
         # tools/list
         curl -sS -f --max-time 15 -o /dev/null \
           http://${aigw_host}:${aigw_1975}/mcp \
           -H 'Content-Type: application/json' \
           -H "Mcp-Session-Id: $SESSION_ID" \
           -d '{"jsonrpc":"2.0","id":2,"method":"tools/list","params":{}}'
         sleep 1
         # tools/call — call a nonexistent tool to generate an error
         curl -sS -f --max-time 15 -o /dev/null \
   ```

##########
docs/en/setup/backend/backend-envoy-ai-gateway-monitoring.md:
##########

@@ -4,8 +4,8 @@
 [Envoy AI Gateway](https://aigateway.envoyproxy.io/) is a gateway/proxy for AI/LLM API traffic
 (OpenAI, Anthropic, AWS Bedrock, Azure OpenAI, Google Gemini, etc.) built on top of Envoy Proxy.
-It natively emits GenAI metrics and access logs via OTLP, following
-[OpenTelemetry GenAI Semantic Conventions](https://opentelemetry.io/docs/specs/semconv/gen-ai/).
+It natively emits GenAI metrics, MCP (Model Context Protocol) metrics, and access logs via OTLP,
+following [OpenTelemetry GenAI Semantic Conventions](https://opentelemetry.io/docs/specs/semconv/gen-ai/).
Review Comment:
   This sentence now implies MCP metrics follow the OpenTelemetry GenAI Semantic Conventions. GenAI semconv covers GenAI signals, but MCP metrics are a separate set (as defined by Envoy AI Gateway). Consider rephrasing to avoid suggesting MCP is part of the GenAI semantic conventions (e.g., say GenAI follows the semconv, and MCP metrics are emitted via OTLP as well).
   ```suggestion
   It natively emits GenAI metrics following
   [OpenTelemetry GenAI Semantic Conventions](https://opentelemetry.io/docs/specs/semconv/gen-ai/),
   and also emits MCP (Model Context Protocol) metrics and access logs via OTLP.
   ```

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use
the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
