This is an automated email from the ASF dual-hosted git repository.
wusheng pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/skywalking-website.git
The following commit(s) were added to refs/heads/master by this push:
new 7d0ab9f748e Add blog: Monitoring Envoy AI Gateway with SkyWalking
(#824)
7d0ab9f748e is described below
commit 7d0ab9f748e4a0405bf5fa30239fa5a09281f5d0
Author: 吴晟 Wu Sheng <[email protected]>
AuthorDate: Fri Apr 3 11:45:43 2026 +0800
Add blog: Monitoring Envoy AI Gateway with SkyWalking (#824)
---
.../index.md | 337 +++++++++++++++++++++
.../screen-1.png | Bin 0 -> 118105 bytes
.../screen-2.png | Bin 0 -> 254079 bytes
.../screen-3.png | Bin 0 -> 156318 bytes
.../screen-4.png | Bin 0 -> 248357 bytes
.../screen-5.png | Bin 0 -> 382415 bytes
.../workflow.jpg | Bin 0 -> 449322 bytes
.../index.md | 294 ++++++++++++++++++
.../screen-1.png | Bin 0 -> 118105 bytes
.../screen-2.png | Bin 0 -> 254079 bytes
.../screen-3.png | Bin 0 -> 156318 bytes
.../screen-4.png | Bin 0 -> 248357 bytes
.../screen-5.png | Bin 0 -> 382415 bytes
.../workflow.jpg | Bin 0 -> 449322 bytes
14 files changed, 631 insertions(+)
diff --git a/content/blog/2026-04-02-envoy-ai-gateway-monitoring/index.md
b/content/blog/2026-04-02-envoy-ai-gateway-monitoring/index.md
new file mode 100644
index 00000000000..eadaaa3f163
--- /dev/null
+++ b/content/blog/2026-04-02-envoy-ai-gateway-monitoring/index.md
@@ -0,0 +1,337 @@
+---
+title: "Monitoring Envoy AI Gateway with Apache SkyWalking"
+date: 2026-04-02
+author: "Sheng Wu"
+description: "Set up full-stack observability for your AI/LLM traffic using Envoy AI Gateway, SkyWalking OAP 10.4.0, and BanyanDB 0.10.0."
+tags:
+- GenAI
+- Envoy
+- Observability
+---
+
+## The Problem: Flying Blind with LLM Traffic
+
+LLM traffic is becoming a first-class citizen in production infrastructure.
Teams are calling OpenAI, Anthropic,
+AWS Bedrock, Azure OpenAI, Google Gemini — often multiple providers at once.
But most organizations have
+no unified visibility into this traffic:
+
+- **Token costs spiral** without knowing which teams, models, or providers
drive the spend.
+ A single misconfigured prompt template can burn through thousands of dollars
before anyone notices.
+- **Provider outages cause cascading failures.** When OpenAI has a bad hour, your application goes down
+  with it, and you have neither the visibility to understand what happened nor an automatic way to switch providers.
+- **No unified metrics** across heterogeneous LLM calls. Latency, Time to
First Token (TTFT),
+ Time Per Output Token (TPOT), token usage, error rates — each provider
reports these differently,
+ if at all. There is no single dashboard to compare them.
+
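Since TTFT and TPOT recur throughout this post, here is a minimal sketch of how they can be computed from per-token timestamps (the function names are illustrative, not from any SDK):

```python
def ttft(request_start: float, token_times: list[float]) -> float:
    """Time to First Token: delay until the first token arrives."""
    return token_times[0] - request_start

def tpot(token_times: list[float]) -> float:
    """Time Per Output Token: average gap between successive tokens."""
    if len(token_times) < 2:
        return 0.0
    return (token_times[-1] - token_times[0]) / (len(token_times) - 1)

# Request sent at t=0, tokens arriving at 0.4s, 0.5s, 0.6s, 0.7s:
times = [0.4, 0.5, 0.6, 0.7]
print(ttft(0.0, times))  # 0.4
print(tpot(times))       # ~0.1 (0.3s spread over 3 gaps)
```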
+This is the same observability gap that microservices faced a decade ago. The
solution then was
+service meshes and API gateways with built-in telemetry. For AI workloads, the
answer is an AI gateway.
+
+## Why an AI Gateway
+
+[Envoy AI Gateway](https://aigateway.envoyproxy.io/) is an open-source AI
gateway built on top of
+[Envoy Proxy](https://www.envoyproxy.io/) and [Envoy
Gateway](https://gateway.envoyproxy.io/).
+It is not a standalone SaaS product or a Python proxy — it is
infrastructure-grade software built on
+the same Envoy that already handles traffic for a large portion of
cloud-native deployments.
+
+Key capabilities:
+
+- **Multi-provider routing** — supports 16+ AI providers (OpenAI, Anthropic,
AWS Bedrock, Azure OpenAI,
+ Google Gemini, Mistral, Cohere, DeepSeek, and more) behind a unified API.
+- **Token-based rate limiting** — rate limit by token consumption, not just
request count.
+- **Provider fallback** — automatic failover when a provider is down or slow.
+- **Model virtualization** — abstract model names so applications are
decoupled from specific providers.
+- **Two-tier architecture** — a reference architecture with a centralized
entry gateway (Tier 1) for
+ auth and global routing, and per-cluster gateways (Tier 2) for inference
optimization.
+- **CNCF ecosystem native** — runs on Kubernetes, composes with existing Envoy
filters, WASM plugins,
+ and standard Kubernetes Gateway API resources.
+
+Because Envoy AI Gateway natively emits GenAI metrics and access logs via OTLP
following
+[OpenTelemetry GenAI Semantic
Conventions](https://opentelemetry.io/docs/specs/semconv/gen-ai/),
+it plugs directly into any OpenTelemetry-compatible backend.
+
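To get a feel for what those conventions standardize, here is a sketch of the kind of `gen_ai.*` attributes a conforming span or metric carries (the values are made up; see the semconv spec for the full attribute list):

```python
# Illustrative attribute set following the OTel GenAI semantic conventions.
example_attributes = {
    "gen_ai.system": "openai",              # provider identifier
    "gen_ai.request.model": "gpt-4o-mini",  # model requested by the client
    "gen_ai.usage.input_tokens": 121,       # prompt tokens consumed
    "gen_ai.usage.output_tokens": 84,       # completion tokens generated
}

def total_tokens(attrs: dict) -> int:
    return attrs["gen_ai.usage.input_tokens"] + attrs["gen_ai.usage.output_tokens"]

print(total_tokens(example_attributes))  # 205
```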
+Starting from SkyWalking 10.4.0, the OAP server natively receives and analyzes
Envoy AI Gateway's
+OTLP metrics and access logs — no OpenTelemetry Collector needed in between.
+
+## Data Flow
+
+The AI Gateway pushes telemetry directly to SkyWalking via OTLP gRPC:
+
+![Data flow from application through Envoy AI Gateway to SkyWalking](workflow.jpg)
+
+1. **Application** sends LLM API requests through the Envoy AI Gateway.
+2. **Envoy AI Gateway** routes requests to AI providers (or local models like
Ollama)
+ and records GenAI metrics (token usage, latency, TTFT, TPOT) and access
logs.
+3. The gateway pushes metrics and logs via **OTLP gRPC** directly to
**SkyWalking OAP** on port 11800.
+4. SkyWalking OAP parses metrics with MAL rules and access logs with LAL rules,
+ then stores everything in **BanyanDB**.
+
+No OpenTelemetry Collector is needed. SkyWalking OAP's built-in OTLP receiver
handles everything.
+
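Before pointing the gateway at the OAP, it can help to confirm the gRPC port is accepting connections. A small stdlib-only sketch (a plain TCP check; it does not speak OTLP):

```python
import socket

def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# port_open("localhost", 11800)  # True once the OAP container is up
```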
+## Try It Locally
+
+This demo uses [Ollama](https://ollama.com/) as a local LLM backend so you can
try
+everything without an API key. The [Envoy AI Gateway
CLI](https://github.com/envoyproxy/ai-gateway/tree/main/cmd/aigw)
+(`aigw`) provides a standalone mode that runs outside Kubernetes — perfect for
local testing.
+
+### Prerequisites
+
+- Docker and Docker Compose
+- [Ollama](https://ollama.com/) installed on your host
+
+### Step 1: Start Ollama
+
+Start Ollama on all interfaces so Docker containers can reach it:
+
+```bash
+OLLAMA_HOST=0.0.0.0 ollama serve
+```
+
+Pull a small model for testing:
+
+```bash
+ollama pull llama3.2:1b
+```
+
+### Step 2: Start the Stack
+
+Create a `docker-compose.yaml`:
+
+```yaml
+services:
+ banyandb:
+ image: apache/skywalking-banyandb:0.10.0
+ container_name: banyandb
+ ports:
+ - "17912:17912"
+    command: standalone --stream-root-path /tmp/stream-data --measure-root-path /tmp/measure-data
+ healthcheck:
+      test: ["CMD-SHELL", "wget -qO- http://localhost:17913/api/healthz || exit 1"]
+ interval: 5s
+ timeout: 3s
+ retries: 10
+
+ oap:
+ image: apache/skywalking-oap-server:10.4.0
+ container_name: oap
+ depends_on:
+ banyandb:
+ condition: service_healthy
+ ports:
+ - "11800:11800"
+ - "12800:12800"
+ environment:
+ SW_STORAGE: banyandb
+ SW_STORAGE_BANYANDB_TARGETS: banyandb:17912
+ healthcheck:
+      test: ["CMD-SHELL", "bash -c 'echo > /dev/tcp/localhost/12800' || exit 1"]
+ interval: 10s
+ timeout: 5s
+ retries: 30
+ start_period: 60s
+
+ ui:
+ image: apache/skywalking-ui:10.4.0
+ container_name: ui
+ depends_on:
+ oap:
+ condition: service_healthy
+ ports:
+ - "8080:8080"
+ environment:
+ SW_OAP_ADDRESS: http://oap:12800
+
+ aigw:
+ image: envoyproxy/ai-gateway-cli:latest
+ container_name: aigw
+ depends_on:
+ oap:
+ condition: service_healthy
+ environment:
+ - OPENAI_BASE_URL=http://host.docker.internal:11434/v1
+ - OPENAI_API_KEY=unused
+ - OTEL_SERVICE_NAME=my-ai-gateway
+ - OTEL_EXPORTER_OTLP_ENDPOINT=http://oap:11800
+ - OTEL_EXPORTER_OTLP_PROTOCOL=grpc
+ - OTEL_METRICS_EXPORTER=otlp
+ - OTEL_LOGS_EXPORTER=otlp
+ - OTEL_METRIC_EXPORT_INTERVAL=5000
+      - OTEL_RESOURCE_ATTRIBUTES=job_name=envoy-ai-gateway,service.instance.id=aigw-1,service.layer=ENVOY_AI_GATEWAY
+ ports:
+ - "1975:1975"
+ extra_hosts:
+ - "host.docker.internal:host-gateway"
+ command: ["run"]
+```
+
+Start everything:
+
+```bash
+docker compose up -d
+```
+
+Wait for all services to become healthy (BanyanDB starts first, then OAP, then
UI and AI Gateway):
+
+```bash
+docker compose ps
+```
+
+The key OTLP configuration on the `aigw` service:
+
+| Env Var | Value | Purpose |
+|---------|-------|---------|
+| `OTEL_SERVICE_NAME` | `my-ai-gateway` | Service name in SkyWalking |
+| `OTEL_EXPORTER_OTLP_ENDPOINT` | `http://oap:11800` | SkyWalking OAP gRPC
endpoint |
+| `OTEL_EXPORTER_OTLP_PROTOCOL` | `grpc` | OTLP transport |
+| `OTEL_METRICS_EXPORTER` | `otlp` | Enable metrics push |
+| `OTEL_LOGS_EXPORTER` | `otlp` | Enable access log push |
+
+The `OTEL_RESOURCE_ATTRIBUTES` value must include:
+- `job_name=envoy-ai-gateway` — routing tag for MAL/LAL rules
+- `service.instance.id=<id>` — instance identity
+- `service.layer=ENVOY_AI_GATEWAY` — routes logs to AI Gateway LAL rules
+
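As a sanity check, the comma-separated attribute string can be parsed and validated in a few lines of Python (this helper is illustrative, not part of any SkyWalking or OTel tooling):

```python
# Keys SkyWalking expects in OTEL_RESOURCE_ATTRIBUTES, per the list above.
REQUIRED = {"job_name", "service.instance.id", "service.layer"}

def parse_resource_attributes(raw: str) -> dict:
    """Parse a 'k=v,k=v' string into a dict, trimming whitespace."""
    pairs = (item.split("=", 1) for item in raw.split(",") if item)
    return {k.strip(): v.strip() for k, v in pairs}

attrs = parse_resource_attributes(
    "job_name=envoy-ai-gateway,service.instance.id=aigw-1,service.layer=ENVOY_AI_GATEWAY"
)
missing = REQUIRED - attrs.keys()
assert not missing, f"missing resource attributes: {missing}"
print(attrs["service.layer"])  # ENVOY_AI_GATEWAY
```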
+The MAL and LAL rules are enabled by default in SkyWalking OAP. No OAP-side
configuration is needed.
+
+### Step 3: Run the Demo App
+
+Create a simple Python application that sends requests through the AI Gateway
(`app.py`).
+It mixes normal requests, streaming requests (for TTFT/TPOT metrics), and
error requests
+(non-existent model → HTTP 404, always captured by the LAL sampling policy):
+
+```python
+import time, random, requests
+
+GATEWAY = "http://localhost:1975"
+HEADERS = {"Authorization": "Bearer unused", "Content-Type": "application/json"}
+
+questions = [
+    "What is Apache SkyWalking? Answer in one sentence.",
+    "What is Envoy Proxy used for? Answer in one sentence.",
+    "What are the benefits of an AI gateway? Answer in two sentences.",
+    "Explain observability in three sentences.",
+]
+
+def chat(model, question, stream=False):
+    resp = requests.post(
+        f"{GATEWAY}/v1/chat/completions",
+        json={"model": model, "messages": [{"role": "user", "content": question}], "stream": stream},
+        headers=HEADERS, timeout=60, stream=stream,
+    )
+    if stream:
+        chunks = []
+        for line in resp.iter_lines():
+            if line:
+                chunks.append(line.decode())
+        return resp.status_code, f"[streamed {len(chunks)} chunks]"
+    try:
+        return resp.status_code, resp.json()
+    except ValueError:
+        # Error responses may not carry a JSON body
+        return resp.status_code, {}
+
+while True:
+    r = random.random()
+    if r < 0.2:
+        # Error request: non-existent model triggers 404
+        status, body = chat("non-existent-model", "hello")
+        print(f"[error] model=non-existent-model status={status}")
+    elif r < 0.5:
+        # Streaming request — generates TTFT and TPOT metrics
+        q = random.choice(questions)
+        status, info = chat("llama3.2:1b", q, stream=True)
+        print(f"[stream] status={status} {info}")
+    else:
+        # Normal non-streaming request
+        q = random.choice(questions)
+        status, body = chat("llama3.2:1b", q)
+        answer = body.get("choices", [{}])[0].get("message", {}).get("content", "")[:80]
+        tokens = body.get("usage", {})
+        print(f"[ok] status={status} tokens={tokens} answer={answer}...")
+    time.sleep(random.randint(20, 30))
+```
+
+Run it:
+
+```bash
+pip install requests
+python app.py
+```
+
+The application talks to the AI Gateway on port 1975, which routes to Ollama.
+Each request generates GenAI metrics (token usage, latency, TTFT, TPOT) and
access logs
+that the gateway pushes to SkyWalking via OTLP.
+
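The thresholds in `app.py` give roughly a 20/30/50 split between error, streaming, and normal requests. A quick seeded simulation confirms the proportions:

```python
import random

def classify(r: float) -> str:
    """Same branching as the app.py loop above."""
    if r < 0.2:
        return "error"
    if r < 0.5:
        return "stream"
    return "normal"

rng = random.Random(42)  # fixed seed for a repeatable tally
counts = {"error": 0, "stream": 0, "normal": 0}
for _ in range(10_000):
    counts[classify(rng.random())] += 1
print(counts)  # roughly 2000 / 3000 / 5000
```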
+The error requests (non-existent model → HTTP 404) are always captured by the
access log
+sampling policy, so you will see them in the SkyWalking log view.
+
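The decision behind that sampling behavior can be sketched as follows (hypothetical thresholds; the real policy is expressed in the OAP's LAL rules, not in Python):

```python
def keep_access_log(status: int, total_tokens: int, token_threshold: int = 1000) -> bool:
    """Keep errors and expensive requests; drop ordinary successes."""
    if status >= 400:                    # errors are always captured
        return True
    if total_tokens >= token_threshold:  # high-token requests are kept
        return True
    return False                         # normal successes are dropped

print(keep_access_log(404, 10))    # True
print(keep_access_log(200, 5000))  # True
print(keep_access_log(200, 120))   # False
```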
+### Step 4: View in SkyWalking UI
+
+Open [http://localhost:8080](http://localhost:8080) and select the **GenAI >
Envoy AI Gateway** menu.
+
+The service list shows `my-ai-gateway` with CPM, latency, and token rates at a
glance:
+
+![Service list](screen-1.png)
+
+Click into the service to see the full dashboard — Request CPM, Latency
(average + percentiles),
+Input/Output Token Rates, TTFT, and TPOT:
+
+![Service dashboard](screen-2.png)
+
+The **Providers** tab breaks down metrics by AI provider:
+
+![Providers tab](screen-3.png)
+
+The **Models** tab shows per-model metrics including TTFT and TPOT (streaming
only).
+Note the `unknown` model entries — these are the error requests with
non-existent models:
+
+![Models tab](screen-4.png)
+
+The **Log** tab shows access logs. The sampling policy drops normal successful
responses
+but always captures errors (HTTP 404) and high-token requests:
+
+![Log tab](screen-5.png)
+
+### Cleanup
+
+```bash
+docker compose down
+```
+
+## Deploying on Kubernetes
+
+For production deployments, Envoy AI Gateway runs as a full Kubernetes
controller with
+Envoy Gateway as the control plane. See the
+[Envoy AI Gateway getting started
guide](https://aigateway.envoyproxy.io/docs/getting-started/)
+for Kubernetes installation.
+
+The OTLP configuration is the same — set the `OTEL_*` environment variables on
the
+AI Gateway's external processor to point at SkyWalking OAP's gRPC port (11800).
+See the [SkyWalking Envoy AI Gateway
Monitoring](https://skywalking.apache.org/docs/main/next/en/setup/backend/backend-envoy-ai-gateway-monitoring/)
+documentation for details.
+
+## GenAI Observability Without an AI Gateway
+
+Not every deployment uses an AI gateway. If your applications call LLM
providers directly,
+SkyWalking 10.4.0 also provides GenAI observability through the
+[Virtual
GenAI](https://skywalking.apache.org/docs/main/next/en/setup/service-agent/virtual-genai/)
layer.
+
+This works with any SkyWalking-instrumented, OpenTelemetry-instrumented, or
Zipkin-instrumented application.
+When traces carry `gen_ai.*` tags (following
+[OpenTelemetry GenAI Semantic
Conventions](https://opentelemetry.io/docs/specs/semconv/gen-ai/)),
+SkyWalking derives per-provider and per-model metrics from the client side:
+latency, token usage, success rate, and estimated cost.
+
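Conceptually, the derivation is a group-by over span tags. A toy reduction (the attribute names follow the GenAI conventions; the aggregation shape is an assumption for illustration):

```python
from collections import defaultdict

# Simplified spans carrying gen_ai.* tags, as a tracing backend might see them.
spans = [
    {"gen_ai.system": "openai", "gen_ai.request.model": "gpt-4o-mini",
     "gen_ai.usage.output_tokens": 64, "duration_ms": 850, "ok": True},
    {"gen_ai.system": "anthropic", "gen_ai.request.model": "claude-3-5-haiku",
     "gen_ai.usage.output_tokens": 80, "duration_ms": 920, "ok": True},
    {"gen_ai.system": "openai", "gen_ai.request.model": "gpt-4o-mini",
     "gen_ai.usage.output_tokens": 0, "duration_ms": 50, "ok": False},
]

# Group by (provider, model) and accumulate calls, errors, and token output.
metrics = defaultdict(lambda: {"calls": 0, "errors": 0, "output_tokens": 0})
for s in spans:
    m = metrics[(s["gen_ai.system"], s["gen_ai.request.model"])]
    m["calls"] += 1
    m["errors"] += 0 if s["ok"] else 1
    m["output_tokens"] += s["gen_ai.usage.output_tokens"]

for key, m in sorted(metrics.items()):
    print(key, m)
```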
+For Java applications, the SkyWalking Java Agent (9.7+) includes a Spring AI
plugin that automatically
+instruments calls to 13+ providers (OpenAI, Anthropic, AWS Bedrock, Google
GenAI, DeepSeek, Mistral, etc.)
+with the correct `gen_ai.*` span tags — no code changes needed.
+
+This is a different use case from the Envoy AI Gateway monitoring covered
above:
+
+- **Envoy AI Gateway layer**: infrastructure-level observability — what the
gateway sees across all traffic.
+ Best for platform teams managing centralized AI routing.
+- **Virtual GenAI layer**: application-level observability — what each
instrumented app sees for its own LLM calls.
+ Best for teams without a centralized gateway, or for per-application cost
tracking.
+
+## References
+
+- [Envoy AI Gateway](https://aigateway.envoyproxy.io/) — project site and
documentation
+- [Envoy AI Gateway
CLI](https://github.com/envoyproxy/ai-gateway/tree/main/cmd/aigw) — standalone
mode for local development
+- [SkyWalking Envoy AI Gateway
Monitoring](https://skywalking.apache.org/docs/main/next/en/setup/backend/backend-envoy-ai-gateway-monitoring/)
— OAP setup doc
+- [SkyWalking Virtual
GenAI](https://skywalking.apache.org/docs/main/next/en/setup/service-agent/virtual-genai/)
— client-side GenAI observability
+- [OpenTelemetry GenAI Semantic
Conventions](https://opentelemetry.io/docs/specs/semconv/gen-ai/) — the
metric/attribute standard both projects follow
diff --git a/content/blog/2026-04-02-envoy-ai-gateway-monitoring/screen-1.png
b/content/blog/2026-04-02-envoy-ai-gateway-monitoring/screen-1.png
new file mode 100644
index 00000000000..60f4532cf49
Binary files /dev/null and
b/content/blog/2026-04-02-envoy-ai-gateway-monitoring/screen-1.png differ
diff --git a/content/blog/2026-04-02-envoy-ai-gateway-monitoring/screen-2.png
b/content/blog/2026-04-02-envoy-ai-gateway-monitoring/screen-2.png
new file mode 100644
index 00000000000..58239cca650
Binary files /dev/null and
b/content/blog/2026-04-02-envoy-ai-gateway-monitoring/screen-2.png differ
diff --git a/content/blog/2026-04-02-envoy-ai-gateway-monitoring/screen-3.png
b/content/blog/2026-04-02-envoy-ai-gateway-monitoring/screen-3.png
new file mode 100644
index 00000000000..812778c3dbf
Binary files /dev/null and
b/content/blog/2026-04-02-envoy-ai-gateway-monitoring/screen-3.png differ
diff --git a/content/blog/2026-04-02-envoy-ai-gateway-monitoring/screen-4.png
b/content/blog/2026-04-02-envoy-ai-gateway-monitoring/screen-4.png
new file mode 100644
index 00000000000..9e3b524e682
Binary files /dev/null and
b/content/blog/2026-04-02-envoy-ai-gateway-monitoring/screen-4.png differ
diff --git a/content/blog/2026-04-02-envoy-ai-gateway-monitoring/screen-5.png
b/content/blog/2026-04-02-envoy-ai-gateway-monitoring/screen-5.png
new file mode 100644
index 00000000000..24eba78a46d
Binary files /dev/null and
b/content/blog/2026-04-02-envoy-ai-gateway-monitoring/screen-5.png differ
diff --git a/content/blog/2026-04-02-envoy-ai-gateway-monitoring/workflow.jpg
b/content/blog/2026-04-02-envoy-ai-gateway-monitoring/workflow.jpg
new file mode 100644
index 00000000000..21f0beeeeb3
Binary files /dev/null and
b/content/blog/2026-04-02-envoy-ai-gateway-monitoring/workflow.jpg differ
diff --git a/content/zh/2026-04-02-envoy-ai-gateway-monitoring/index.md
b/content/zh/2026-04-02-envoy-ai-gateway-monitoring/index.md
new file mode 100644
index 00000000000..6e20a14fc18
--- /dev/null
+++ b/content/zh/2026-04-02-envoy-ai-gateway-monitoring/index.md
@@ -0,0 +1,294 @@
+---
+title: "用 Apache SkyWalking 监控 Envoy AI Gateway"
+date: 2026-04-02
+author: "吴晟"
+description: "基于 Envoy AI Gateway、SkyWalking OAP 10.4.0 和 BanyanDB 0.10.0，搭建面向 AI/LLM 流量的全栈可观测方案。"
+tags:
+- GenAI
+- Envoy
+- Observability
+---
+
+## 问题:LLM 流量缺乏统一观测
+
+LLM 流量正在成为生产基础设施中不可忽视的一部分。团队同时在调用 OpenAI、Anthropic、AWS Bedrock、Azure
OpenAI、Google Gemini——往往还不止一个提供商。但大多数组织对这些流量缺乏统一的可见性:
+
+- **Token 费用失控**,却不知道哪个团队、哪个模型、哪个提供商在烧钱。一个配置不当的 prompt 模板就可能在无人察觉的情况下烧掉几千美元。
+- **提供商故障引发连锁反应。** OpenAI 出问题的那一小时,你的应用也跟着挂——而你既没有故障切换的可见性,也无法自动切换提供商。
+- **缺乏统一指标。** 延迟、首 Token 耗时(TTFT)、每 Token 输出耗时(TPOT)、Token
用量、错误率——每个提供商的报告方式都不一样,有些甚至不提供。没有一个统一的面板能做对比。
+
+这和十年前微服务面临的可观测性困境如出一辙。当时的解法是服务网格和内置遥测的 API 网关。对 AI 工作负载来说,答案就是 AI 网关。
+
+## 为什么选择 AI 网关
+
+[Envoy AI Gateway](https://aigateway.envoyproxy.io/) 是一个开源 AI 网关,构建在 [Envoy
Proxy](https://www.envoyproxy.io/) 和 [Envoy
Gateway](https://gateway.envoyproxy.io/) 之上。底层就是云原生世界里已经广泛部署的
Envoy,天然具备基础设施级的稳定性和性能。
+
+核心能力:
+
+- **多提供商路由** —— 支持 16+ AI 提供商(OpenAI、Anthropic、AWS Bedrock、Azure OpenAI、Google
Gemini、Mistral、Cohere、DeepSeek 等),统一 API 接入。
+- **基于 Token 的限流** —— 按 Token 消耗限流,而不只是按请求数。
+- **提供商故障切换** —— 某个提供商宕机或响应慢时自动切换。
+- **模型虚拟化** —— 抽象模型名称,让应用与具体提供商解耦。
+- **两层架构** —— 参考架构包含一个集中入口网关(Tier 1)负责认证和全局路由,以及每集群网关(Tier 2)负责推理优化。
+- **CNCF 生态原生** —— 运行在 Kubernetes 上,兼容现有的 Envoy Filter、WASM 插件和标准 Kubernetes
Gateway API 资源。
+
+Envoy AI Gateway 原生支持通过 OTLP 发送 GenAI 指标和访问日志,遵循 [OpenTelemetry GenAI
语义约定](https://opentelemetry.io/docs/specs/semconv/gen-ai/),可以直接接入任何兼容
OpenTelemetry 的后端。
+
+从 SkyWalking 10.4.0 开始,OAP 原生接收和分析 Envoy AI Gateway 的 OTLP 指标和访问日志——中间不需要部署
OpenTelemetry Collector。
+
+## 数据流
+
+AI Gateway 通过 OTLP gRPC 直接将遥测数据推送到 SkyWalking:
+
+![workflow](workflow.jpg)
+
+1. **应用** 通过 Envoy AI Gateway 发送 LLM API 请求。
+2. **Envoy AI Gateway** 将请求路由到 AI 提供商(或 Ollama 这样的本地模型),同时记录 GenAI 指标(Token
用量、延迟、TTFT、TPOT)和访问日志。
+3. 网关通过 **OTLP gRPC** 直接将指标和日志推送到 **SkyWalking OAP** 的 11800 端口。
+4. SkyWalking OAP 用 MAL 规则解析指标、用 LAL 规则解析访问日志,然后统一存储到 **BanyanDB**。
+
+不需要 OpenTelemetry Collector。SkyWalking OAP 内置的 OTLP 接收器可以直接处理所有数据。
+
+## 本地体验
+
+这个 Demo 使用 [Ollama](https://ollama.com/) 作为本地 LLM 后端,不需要任何 API Key
就能跑起来。[Envoy AI Gateway
CLI](https://github.com/envoyproxy/ai-gateway/tree/main/cmd/aigw)(`aigw`)提供独立运行模式,不依赖
Kubernetes,非常适合本地测试。
+
+### 前置条件
+
+- Docker 和 Docker Compose
+- 主机上已安装 [Ollama](https://ollama.com/)
+
+### 第一步:启动 Ollama
+
+让 Ollama 监听所有网络接口,以便 Docker 容器能访问到:
+
+```bash
+OLLAMA_HOST=0.0.0.0 ollama serve
+```
+
+拉取一个小模型用于测试:
+
+```bash
+ollama pull llama3.2:1b
+```
+
+### 第二步:启动服务栈
+
+创建 `docker-compose.yaml`:
+
+```yaml
+services:
+ banyandb:
+ image: apache/skywalking-banyandb:0.10.0
+ container_name: banyandb
+ ports:
+ - "17912:17912"
+    command: standalone --stream-root-path /tmp/stream-data --measure-root-path /tmp/measure-data
+ healthcheck:
+      test: ["CMD-SHELL", "wget -qO- http://localhost:17913/api/healthz || exit 1"]
+ interval: 5s
+ timeout: 3s
+ retries: 10
+
+ oap:
+ image: apache/skywalking-oap-server:10.4.0
+ container_name: oap
+ depends_on:
+ banyandb:
+ condition: service_healthy
+ ports:
+ - "11800:11800"
+ - "12800:12800"
+ environment:
+ SW_STORAGE: banyandb
+ SW_STORAGE_BANYANDB_TARGETS: banyandb:17912
+ healthcheck:
+      test: ["CMD-SHELL", "bash -c 'echo > /dev/tcp/localhost/12800' || exit 1"]
+ interval: 10s
+ timeout: 5s
+ retries: 30
+ start_period: 60s
+
+ ui:
+ image: apache/skywalking-ui:10.4.0
+ container_name: ui
+ depends_on:
+ oap:
+ condition: service_healthy
+ ports:
+ - "8080:8080"
+ environment:
+ SW_OAP_ADDRESS: http://oap:12800
+
+ aigw:
+ image: envoyproxy/ai-gateway-cli:latest
+ container_name: aigw
+ depends_on:
+ oap:
+ condition: service_healthy
+ environment:
+ - OPENAI_BASE_URL=http://host.docker.internal:11434/v1
+ - OPENAI_API_KEY=unused
+ - OTEL_SERVICE_NAME=my-ai-gateway
+ - OTEL_EXPORTER_OTLP_ENDPOINT=http://oap:11800
+ - OTEL_EXPORTER_OTLP_PROTOCOL=grpc
+ - OTEL_METRICS_EXPORTER=otlp
+ - OTEL_LOGS_EXPORTER=otlp
+ - OTEL_METRIC_EXPORT_INTERVAL=5000
+      - OTEL_RESOURCE_ATTRIBUTES=job_name=envoy-ai-gateway,service.instance.id=aigw-1,service.layer=ENVOY_AI_GATEWAY
+ ports:
+ - "1975:1975"
+ extra_hosts:
+ - "host.docker.internal:host-gateway"
+ command: ["run"]
+```
+
+启动所有服务:
+
+```bash
+docker compose up -d
+```
+
+等待所有服务变为健康状态(BanyanDB 先启动,然后是 OAP,最后是 UI 和 AI Gateway):
+
+```bash
+docker compose ps
+```
+
+`aigw` 服务的关键 OTLP 配置:
+
+| 环境变量 | 值 | 用途 |
+|---------|-------|---------|
+| `OTEL_SERVICE_NAME` | `my-ai-gateway` | SkyWalking 中的服务名 |
+| `OTEL_EXPORTER_OTLP_ENDPOINT` | `http://oap:11800` | SkyWalking OAP gRPC 端点 |
+| `OTEL_EXPORTER_OTLP_PROTOCOL` | `grpc` | OTLP 传输协议 |
+| `OTEL_METRICS_EXPORTER` | `otlp` | 启用指标推送 |
+| `OTEL_LOGS_EXPORTER` | `otlp` | 启用访问日志推送 |
+
+`OTEL_RESOURCE_ATTRIBUTES` 必须包含:
+- `job_name=envoy-ai-gateway` —— MAL/LAL 规则的路由标签
+- `service.instance.id=<id>` —— 实例标识
+- `service.layer=ENVOY_AI_GATEWAY` —— 将日志路由到 AI Gateway LAL 规则
+
+MAL 和 LAL 规则在 SkyWalking OAP 中默认启用,不需要额外配置。
+
+### 第三步:运行 Demo 应用
+
+创建一个简单的 Python 应用,通过 AI Gateway 发送请求(`app.py`)。
+它混合了普通请求、流式请求(用于产生 TTFT/TPOT 指标)和错误请求(不存在的模型 → HTTP 404,始终会被 LAL 采样策略捕获):
+
+```python
+import time, random, requests
+
+GATEWAY = "http://localhost:1975"
+HEADERS = {"Authorization": "Bearer unused", "Content-Type": "application/json"}
+
+questions = [
+    "What is Apache SkyWalking? Answer in one sentence.",
+    "What is Envoy Proxy used for? Answer in one sentence.",
+    "What are the benefits of an AI gateway? Answer in two sentences.",
+    "Explain observability in three sentences.",
+]
+
+def chat(model, question, stream=False):
+    resp = requests.post(
+        f"{GATEWAY}/v1/chat/completions",
+        json={"model": model, "messages": [{"role": "user", "content": question}], "stream": stream},
+        headers=HEADERS, timeout=60, stream=stream,
+    )
+    if stream:
+        chunks = []
+        for line in resp.iter_lines():
+            if line:
+                chunks.append(line.decode())
+        return resp.status_code, f"[streamed {len(chunks)} chunks]"
+    try:
+        return resp.status_code, resp.json()
+    except ValueError:
+        # Error responses may not carry a JSON body
+        return resp.status_code, {}
+
+while True:
+    r = random.random()
+    if r < 0.2:
+        # Error request: non-existent model triggers 404
+        status, body = chat("non-existent-model", "hello")
+        print(f"[error] model=non-existent-model status={status}")
+    elif r < 0.5:
+        # Streaming request — generates TTFT and TPOT metrics
+        q = random.choice(questions)
+        status, info = chat("llama3.2:1b", q, stream=True)
+        print(f"[stream] status={status} {info}")
+    else:
+        # Normal non-streaming request
+        q = random.choice(questions)
+        status, body = chat("llama3.2:1b", q)
+        answer = body.get("choices", [{}])[0].get("message", {}).get("content", "")[:80]
+        tokens = body.get("usage", {})
+        print(f"[ok] status={status} tokens={tokens} answer={answer}...")
+    time.sleep(random.randint(20, 30))
+```
+
+运行:
+
+```bash
+pip install requests
+python app.py
+```
+
+应用通过 1975 端口与 AI Gateway 通信,AI Gateway 再路由到 Ollama。每次请求都会产生 GenAI 指标(Token
用量、延迟、TTFT、TPOT)和访问日志,由网关通过 OTLP 推送到 SkyWalking。
+
+错误请求(不存在的模型 → HTTP 404)始终会被访问日志采样策略捕获,所以在 SkyWalking 的日志视图中一定能看到。
+
+### 第四步:在 SkyWalking UI 中查看
+
+打开 [http://localhost:8080](http://localhost:8080),选择 **GenAI > Envoy AI
Gateway** 菜单。
+
+服务列表显示 `my-ai-gateway`,可以一览 CPM、延迟和 Token 速率:
+
+![screen-1](screen-1.png)
+
+点击进入服务详情,查看完整仪表盘——请求 CPM、延迟(平均值 + 百分位数)、输入/输出 Token 速率、TTFT 和 TPOT:
+
+![screen-2](screen-2.png)
+
+**Providers** 标签页按 AI 提供商维度展示指标:
+
+![screen-3](screen-3.png)
+
+**Models** 标签页展示每个模型的指标,包括 TTFT 和 TPOT(仅流式请求)。注意 `unknown`
模型条目——这些就是使用不存在模型的错误请求:
+
+![screen-4](screen-4.png)
+
+**Log** 标签页展示访问日志。采样策略会丢弃正常的成功响应,但始终保留错误(HTTP 404)和高 Token 消耗的请求:
+
+![screen-5](screen-5.png)
+
+### 清理
+
+```bash
+docker compose down
+```
+
+## Kubernetes 生产部署
+
+生产环境中,Envoy AI Gateway 作为完整的 Kubernetes 控制器运行,以 Envoy Gateway 作为控制面。详见 [Envoy
AI Gateway 入门指南](https://aigateway.envoyproxy.io/docs/getting-started/)。
+
+OTLP 配置方式相同——在 AI Gateway 的 External Processor 上设置 `OTEL_*` 环境变量,指向 SkyWalking
OAP 的 gRPC 端口(11800)。详见 [SkyWalking Envoy AI Gateway
监控文档](https://skywalking.apache.org/docs/main/next/en/setup/backend/backend-envoy-ai-gateway-monitoring/)。
+
+## 不用 AI 网关也能做 GenAI 可观测
+
+并非所有场景都需要 AI 网关。如果你的应用直接调用 LLM 提供商,SkyWalking 10.4.0 也提供了基于 [Virtual
GenAI](https://skywalking.apache.org/docs/main/next/en/setup/service-agent/virtual-genai/)
层的 GenAI 可观测方案。
+
+任何接入了 SkyWalking、OpenTelemetry 或 Zipkin 探针的应用都能使用这个功能。只要 Trace 中携带 `gen_ai.*`
标签(遵循 [OpenTelemetry GenAI
语义约定](https://opentelemetry.io/docs/specs/semconv/gen-ai/)),SkyWalking
就能从客户端视角推导出每提供商、每模型的指标:延迟、Token 用量、成功率和预估费用。
+
+对于 Java 应用,SkyWalking Java Agent(9.7+)内置了 Spring AI 插件,自动为 13+
提供商(OpenAI、Anthropic、AWS Bedrock、Google GenAI、DeepSeek、Mistral 等)的调用注入正确的
`gen_ai.*` Span 标签——不需要改代码。
+
+这与上面介绍的 Envoy AI Gateway 监控是不同的使用场景:
+
+- **Envoy AI Gateway 层**:基础设施级可观测——网关视角,覆盖所有流量。适合负责集中 AI 路由的平台团队。
+- **Virtual GenAI 层**:应用级可观测——每个应用自己看到的 LLM 调用情况。适合没有集中网关的团队,或者需要按应用维度跟踪费用的场景。
+
+## 参考资料
+
+- [Envoy AI Gateway](https://aigateway.envoyproxy.io/) —— 项目官网和文档
+- [Envoy AI Gateway
CLI](https://github.com/envoyproxy/ai-gateway/tree/main/cmd/aigw) ——
本地开发用的独立运行模式
+- [SkyWalking Envoy AI Gateway
监控](https://skywalking.apache.org/docs/main/next/en/setup/backend/backend-envoy-ai-gateway-monitoring/)
—— OAP 配置文档
+- [SkyWalking Virtual
GenAI](https://skywalking.apache.org/docs/main/next/en/setup/service-agent/virtual-genai/)
—— 客户端侧 GenAI 可观测
+- [OpenTelemetry GenAI
语义约定](https://opentelemetry.io/docs/specs/semconv/gen-ai/) —— 两个项目共同遵循的指标/属性标准
diff --git a/content/zh/2026-04-02-envoy-ai-gateway-monitoring/screen-1.png
b/content/zh/2026-04-02-envoy-ai-gateway-monitoring/screen-1.png
new file mode 100644
index 00000000000..60f4532cf49
Binary files /dev/null and
b/content/zh/2026-04-02-envoy-ai-gateway-monitoring/screen-1.png differ
diff --git a/content/zh/2026-04-02-envoy-ai-gateway-monitoring/screen-2.png
b/content/zh/2026-04-02-envoy-ai-gateway-monitoring/screen-2.png
new file mode 100644
index 00000000000..58239cca650
Binary files /dev/null and
b/content/zh/2026-04-02-envoy-ai-gateway-monitoring/screen-2.png differ
diff --git a/content/zh/2026-04-02-envoy-ai-gateway-monitoring/screen-3.png
b/content/zh/2026-04-02-envoy-ai-gateway-monitoring/screen-3.png
new file mode 100644
index 00000000000..812778c3dbf
Binary files /dev/null and
b/content/zh/2026-04-02-envoy-ai-gateway-monitoring/screen-3.png differ
diff --git a/content/zh/2026-04-02-envoy-ai-gateway-monitoring/screen-4.png
b/content/zh/2026-04-02-envoy-ai-gateway-monitoring/screen-4.png
new file mode 100644
index 00000000000..9e3b524e682
Binary files /dev/null and
b/content/zh/2026-04-02-envoy-ai-gateway-monitoring/screen-4.png differ
diff --git a/content/zh/2026-04-02-envoy-ai-gateway-monitoring/screen-5.png
b/content/zh/2026-04-02-envoy-ai-gateway-monitoring/screen-5.png
new file mode 100644
index 00000000000..24eba78a46d
Binary files /dev/null and
b/content/zh/2026-04-02-envoy-ai-gateway-monitoring/screen-5.png differ
diff --git a/content/zh/2026-04-02-envoy-ai-gateway-monitoring/workflow.jpg
b/content/zh/2026-04-02-envoy-ai-gateway-monitoring/workflow.jpg
new file mode 100644
index 00000000000..21f0beeeeb3
Binary files /dev/null and
b/content/zh/2026-04-02-envoy-ai-gateway-monitoring/workflow.jpg differ