Copilot commented on code in PR #13791:
URL: https://github.com/apache/skywalking/pull/13791#discussion_r3036254848
##########
oap-server/server-starter/src/main/resources/lal/envoy-ai-gateway.yaml:
##########
@@ -15,29 +15,32 @@
# Envoy AI Gateway access log processing via OTLP.
#
-# Sampling policy: only persist abnormal or expensive requests.
-# Normal 200 responses with low token count and no upstream failure are
dropped.
+# Two rules: one for LLM route logs, one for MCP route logs.
+# LLM sampling: only persist error responses (>= 400) and upstream failures.
+# MCP sampling: only persist error responses (>= 400).
+# Both tag ai_route_type for searchable filtering in the UI.
rules:
- - name: envoy-ai-gateway-access-log
+ - name: envoy-ai-gateway-llm-access-log
layer: ENVOY_AI_GATEWAY
dsl: |
filter {
- // Drop normal logs: response < 400, no upstream failure, low token
count
+ // Only process LLM route logs (gen_ai.request.model is always set for
LLM routes, even on errors)
+ if (tag("gen_ai.request.model") == "" || tag("gen_ai.request.model")
== "-") {
+ abort {}
+ }
+
+ // Keep only: error responses (>= 400) and upstream failures
if (tag("response_code") != "" && tag("response_code") != "-") {
Review Comment:
In the LLM rule, logs with missing/"-" response_code are never aborted by
the sampling logic. If such logs exist without an upstream failure reason,
they'll be persisted unexpectedly and could increase log volume. Consider
explicitly handling response_code empty/"-" (e.g., abort unless
upstream_transport_failure_reason is set) to keep the stated sampling policy
(“errors and upstream failures only”) accurate.
```suggestion
if (tag("response_code") == "" || tag("response_code") == "-") {
if (tag("upstream_transport_failure_reason") == "" ||
tag("upstream_transport_failure_reason") == "-") {
abort {}
}
} else {
```
##########
test/e2e-v2/cases/envoy-ai-gateway/e2e.yaml:
##########
@@ -47,6 +48,32 @@ setup:
-d
'{"model":"nonexistent-model","messages":[{"role":"user","content":"Hi"}]}'
sleep 1
done
+ - name: Send MCP requests for metric verification
+ command: |
+ # Initialize MCP session and capture session ID
+ SESSION_ID=$(curl -sS -f -D- --max-time 15 \
+ http://${aigw_host}:${aigw_1975}/mcp \
+ -H 'Content-Type: application/json' \
+ -d
'{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2025-03-26","capabilities":{},"clientInfo":{"name":"e2e-test","version":"1.0"}}}'
\
+ 2>&1 | grep -i '^Mcp-Session-Id:' | tr -d '\r' | sed
's/^[Mm]cp-[Ss]ession-[Ii]d: *//')
+ if [ -z "$SESSION_ID" ]; then
+ echo "Failed to capture MCP session ID" >&2
+ exit 1
+ fi
+ sleep 2
+ # tools/list
+ curl -s --max-time 15 -o /dev/null \
+ http://${aigw_host}:${aigw_1975}/mcp \
+ -H 'Content-Type: application/json' \
+ -H "Mcp-Session-Id: $SESSION_ID" \
+ -d '{"jsonrpc":"2.0","id":2,"method":"tools/list","params":{}}'
+ sleep 1
+ # tools/call — call a nonexistent tool to generate an error
+ curl -s --max-time 15 -o /dev/null \
+ http://${aigw_host}:${aigw_1975}/mcp \
+ -H 'Content-Type: application/json' \
+ -H "Mcp-Session-Id: $SESSION_ID" \
+ -d
'{"jsonrpc":"2.0","id":3,"method":"tools/call","params":{"name":"nonexistent_tool","arguments":{}}}'
Review Comment:
The MCP request setup uses curl without `-f` for the `tools/list` and
`tools/call` requests. If these requests fail (network, bad session id, backend
not ready), the setup step still succeeds and later metric assertions fail with
less actionable errors. Consider adding `-f` (and/or explicit exit-on-failure
checks) to fail fast and make the e2e failure reason clearer.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]