Re: [PR] Support MCP observability for Envoy AI Gateway [skywalking]

via GitHub Sat, 04 Apr 2026 18:17:46 -0700


Copilot commented on code in PR #13791:
URL: https://github.com/apache/skywalking/pull/13791#discussion_r3036254848



##########
oap-server/server-starter/src/main/resources/lal/envoy-ai-gateway.yaml:
##########
@@ -15,29 +15,32 @@
 
 # Envoy AI Gateway access log processing via OTLP.
 #
-# Sampling policy: only persist abnormal or expensive requests.
-# Normal 200 responses with low token count and no upstream failure are 
dropped.
+# Two rules: one for LLM route logs, one for MCP route logs.
+# LLM sampling: only persist error responses (>= 400) and upstream failures.
+# MCP sampling: only persist error responses (>= 400).
+# Both tag ai_route_type for searchable filtering in the UI.
 
 rules:
-  - name: envoy-ai-gateway-access-log
+  - name: envoy-ai-gateway-llm-access-log
     layer: ENVOY_AI_GATEWAY
     dsl: |
       filter {
-        // Drop normal logs: response < 400, no upstream failure, low token 
count
+        // Only process LLM route logs (gen_ai.request.model is always set for 
LLM routes, even on errors)
+        if (tag("gen_ai.request.model") == "" || tag("gen_ai.request.model") 
== "-") {
+          abort {}
+        }
+
+        // Keep only: error responses (>= 400) and upstream failures
         if (tag("response_code") != "" && tag("response_code") != "-") {

Review Comment:
   In the LLM rule, logs with missing/"-" response_code are never aborted by 
the sampling logic. If such logs exist without an upstream failure reason, 
they'll be persisted unexpectedly and could increase log volume. Consider 
explicitly handling response_code empty/"-" (e.g., abort unless 
upstream_transport_failure_reason is set) to keep the stated sampling policy 
(“errors and upstream failures only”) accurate.
   ```suggestion
           if (tag("response_code") == "" || tag("response_code") == "-") {
             if (tag("upstream_transport_failure_reason") == "" || 
tag("upstream_transport_failure_reason") == "-") {
               abort {}
             }
           } else {
   ```



##########
test/e2e-v2/cases/envoy-ai-gateway/e2e.yaml:
##########
@@ -47,6 +48,32 @@ setup:
             -d 
'{"model":"nonexistent-model","messages":[{"role":"user","content":"Hi"}]}'
           sleep 1
         done
+    - name: Send MCP requests for metric verification
+      command: |
+        # Initialize MCP session and capture session ID
+        SESSION_ID=$(curl -sS -f -D- --max-time 15 \
+          http://${aigw_host}:${aigw_1975}/mcp \
+          -H 'Content-Type: application/json' \
+          -d 
'{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2025-03-26","capabilities":{},"clientInfo":{"name":"e2e-test","version":"1.0"}}}'
 \
+          2>&1 | grep -i '^Mcp-Session-Id:' | tr -d '\r' | sed 
's/^[Mm]cp-[Ss]ession-[Ii]d: *//')
+        if [ -z "$SESSION_ID" ]; then
+          echo "Failed to capture MCP session ID" >&2
+          exit 1
+        fi
+        sleep 2
+        # tools/list
+        curl -s --max-time 15 -o /dev/null \
+          http://${aigw_host}:${aigw_1975}/mcp \
+          -H 'Content-Type: application/json' \
+          -H "Mcp-Session-Id: $SESSION_ID" \
+          -d '{"jsonrpc":"2.0","id":2,"method":"tools/list","params":{}}'
+        sleep 1
+        # tools/call — call a nonexistent tool to generate an error
+        curl -s --max-time 15 -o /dev/null \
+          http://${aigw_host}:${aigw_1975}/mcp \
+          -H 'Content-Type: application/json' \
+          -H "Mcp-Session-Id: $SESSION_ID" \
+          -d 
'{"jsonrpc":"2.0","id":3,"method":"tools/call","params":{"name":"nonexistent_tool","arguments":{}}}'

Review Comment:
   The MCP request setup uses curl without `-f` for the `tools/list` and 
`tools/call` requests. If these requests fail (network, bad session id, backend 
not ready), the setup step still succeeds and later metric assertions fail with 
less actionable errors. Consider adding `-f` (and/or explicit exit-on-failure 
checks) to fail fast and make the e2e failure reason clearer.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] Support MCP observability for Envoy AI Gateway [skywalking]

Reply via email to