This is an automated email from the ASF dual-hosted git repository.

wusheng pushed a commit to branch swip/swip-10-envoy-ai-gateway
in repository https://gitbox.apache.org/repos/asf/skywalking.git
commit 36d0bcb087e415f6a607fa7d16cd89860de7ad26
Author: Wu Sheng <[email protected]>
AuthorDate: Wed Mar 25 14:34:32 2026 +0800

    Add SWIP-10: Support Envoy AI Gateway Observability
---
 docs/en/swip/SWIP-10/SWIP.md                  | 767 ++++++++++++++++++++++++++
 docs/en/swip/SWIP-10/kind-test-resources.yaml | 247 +++++++++
 docs/en/swip/SWIP-10/kind-test-setup.sh       | 108 ++++
 docs/en/swip/readme.md                        |   3 +-
 4 files changed, 1124 insertions(+), 1 deletion(-)

diff --git a/docs/en/swip/SWIP-10/SWIP.md b/docs/en/swip/SWIP-10/SWIP.md
new file mode 100644
index 0000000000..1910b140e6
--- /dev/null
+++ b/docs/en/swip/SWIP-10/SWIP.md
@@ -0,0 +1,767 @@
+# SWIP-10 Support Envoy AI Gateway Observability
+
+## Motivation
+[Envoy AI Gateway](https://aigateway.envoyproxy.io/) is a gateway/proxy for AI/LLM API traffic (OpenAI, Anthropic,
+AWS Bedrock, Azure OpenAI, Google Gemini, etc.) built on top of Envoy Proxy. It provides GenAI-specific observability
+following [OpenTelemetry GenAI Semantic Conventions](https://opentelemetry.io/docs/specs/semconv/gen-ai/), including
+token usage tracking, request latency, time-to-first-token (TTFT), and inter-token latency.
+
+SkyWalking should support monitoring Envoy AI Gateway as a first-class integration, providing:
+1. **Metrics monitoring** via OTLP push for GenAI metrics.
+2. **Access log collection** via OTLP log sink for per-request AI metadata analysis.
+
+This is complementary to [PR #13745](https://github.com/apache/skywalking/pull/13745) (agent-based Virtual GenAI
+monitoring). The agent-based approach monitors LLM calls from the client application side, while this SWIP monitors
+from the gateway (infrastructure) side. Both can coexist — the AI Gateway provides infrastructure-level visibility
+regardless of whether the calling application is instrumented.
+
+## Architecture Graph
+
+### Metrics Path (OTLP Push)
+```
+┌─────────────────┐      OTLP gRPC      ┌─────────────────┐
+│  Envoy AI       │ ──────────────────> │  SkyWalking OAP │
+│  Gateway        │ (push, port 11800)  │ (otel-receiver) │
+│                 │                     │                 │
+│  4 GenAI metrics│                     │  MAL rules      │
+│  + labels       │                     │  → aggregation  │
+└─────────────────┘                     └─────────────────┘
+```
+
+### Access Log Path (OTLP Push)
+```
+┌─────────────────┐      OTLP gRPC      ┌─────────────────┐
+│  Envoy AI       │ ──────────────────> │  SkyWalking OAP │
+│  Gateway        │ (push, port 11800)  │ (otel-receiver) │
+│                 │                     │                 │
+│  access logs    │                     │  LAL rules      │
+│  with AI meta   │                     │  → analysis     │
+└─────────────────┘                     └─────────────────┘
+```
+The AI Gateway natively supports an OTLP access log sink (via Envoy Gateway's OpenTelemetry sink),
+pushing structured access logs directly to the OAP's OTLP receiver. No FluentBit or external log
+collector is needed.
+
+## Proposed Changes
+
+### 1. New Layer: `ENVOY_AI_GATEWAY`
+
+Add a new layer in `Layer.java`:
+```java
+/**
+ * Envoy AI Gateway is an AI/LLM traffic gateway built on Envoy Proxy,
+ * providing observability for GenAI API traffic.
+ */
+ENVOY_AI_GATEWAY(46, true),
+```
+
+This is a **normal** layer (`isNormal=true`) because the AI Gateway is a real, instrumented infrastructure component
+(similar to `KONG`, `APISIX`, `NGINX`), not a virtual/conjectured service.
+
+### 2. Entity Model
+
+#### `job_name` — Routing Tag for MAL/LAL Rules
+
+SkyWalking's OTel receiver maps the OTLP resource attribute `service.name` to the internal tag `job_name`.
+This tag is used by MAL rule filters to route metrics to the correct rule set. All Envoy AI Gateway
+deployments must use a fixed `OTEL_SERVICE_NAME` value so that SkyWalking can identify the traffic:
+
+```bash
+OTEL_SERVICE_NAME=envoy-ai-gateway
+```
+
+This becomes `job_name=envoy-ai-gateway` in MAL, and the rules filter on it:
+```yaml
+filter: "{ tags -> tags.job_name == 'envoy-ai-gateway' }"
+```
+
+`job_name` is NOT the SkyWalking service name — it is only used for metric/log routing.
+
+#### Service and Instance Mapping
+
+| SkyWalking Entity | Source | Example |
+|---|---|---|
+| **Service** | `aigw.service` resource attribute (K8s Deployment/Service name, set via CRD) | `envoy-ai-gateway-basic` |
+| **Service Instance** | `service.instance.id` resource attribute (pod name, set via CRD + Downward API) | `aigw-pod-7b9f4d8c5` |
+
+Each Kubernetes Gateway deployment is a separate SkyWalking **service**. Each pod (ext_proc replica) is a
+**service instance**. Neither attribute is emitted by the AI Gateway by default — both must be explicitly
+set via `OTEL_RESOURCE_ATTRIBUTES` in the `GatewayConfig` CRD (see below).
+
+The **layer** (`ENVOY_AI_GATEWAY`) is set by MAL/LAL rules based on the `job_name` filter, not by the
+client. This follows the same pattern as other SkyWalking OTel integrations (e.g., ActiveMQ, K8s).
+
+Provider and model are **metric-level labels**, not separate entities in this layer. They are used for
+fine-grained metric breakdowns within the gateway service dashboards rather than being modeled as separate
+services (unlike the agent-based `VIRTUAL_GENAI` layer where provider=service, model=instance).
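The routing and naming rules in this section (a fixed `job_name`, the service taken from `aigw.service`, the instance taken from `service.instance.id`) can be modeled in a few lines. This is an illustrative Python sketch, not SkyWalking's receiver code: `to_tags` and `resolve_entity` are hypothetical helpers, and the dot-to-underscore tag naming follows the convention this proposal attributes to the OTel receiver.

```python
from typing import Optional

ROUTING_JOB_NAME = "envoy-ai-gateway"  # fixed OTEL_SERVICE_NAME from the proposal


def to_tags(resource_attributes: dict) -> dict:
    """Flatten OTLP resource attributes into rule tags (illustrative only)."""
    # Dots become underscores, e.g. aigw.service -> aigw_service.
    tags = {key.replace(".", "_"): value for key, value in resource_attributes.items()}
    # The proposal states that service.name is exposed to rules as job_name.
    if "service.name" in resource_attributes:
        tags["job_name"] = resource_attributes["service.name"]
    return tags


def resolve_entity(tags: dict) -> Optional[tuple]:
    """Return (service, instance) for AI Gateway traffic, or None if not routed."""
    if tags.get("job_name") != ROUTING_JOB_NAME:
        return None  # some other OTLP source; these rules do not apply
    return tags.get("aigw_service"), tags.get("service_instance_id")


# Hypothetical resource attributes, shaped like the examples in this proposal.
attrs = {
    "service.name": "envoy-ai-gateway",
    "aigw.service": "my-ai-gateway",
    "service.instance.id": "envoy-default-my-ai-gateway-76d02f2b-xxx",
}
entity = resolve_entity(to_tags(attrs))
```

The point of the sketch is that `service.name` only selects the rule set, while the SkyWalking service and instance names come from the two explicitly configured attributes.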
+
+The MAL `expSuffix` uses the `aigw_service` tag (dots converted to underscores by OTel receiver) as the
+SkyWalking service name and `service_instance_id` as the instance name:
+```yaml
+expSuffix: service(['aigw_service'], Layer.ENVOY_AI_GATEWAY).instance(['aigw_service', 'service_instance_id'])
+```
+
+#### Complete Kubernetes Setup Example
+
+The following example shows a complete Envoy AI Gateway deployment configured for SkyWalking
+observability via OTLP metrics and access logs.
+
+```yaml
+# 1. GatewayClass — standard Envoy Gateway controller
+apiVersion: gateway.networking.k8s.io/v1
+kind: GatewayClass
+metadata:
+  name: envoy-ai-gateway
+spec:
+  controllerName: gateway.envoyproxy.io/gatewayclass-controller
+---
+# 2. GatewayConfig — OTLP configuration for SkyWalking
+#    One GatewayConfig per gateway. Sets job_name, service name, instance ID,
+#    and enables OTLP push for both metrics and access logs.
+apiVersion: aigateway.envoyproxy.io/v1alpha1
+kind: GatewayConfig
+metadata:
+  name: my-gateway-config
+  namespace: default
+spec:
+  extProc:
+    kubernetes:
+      env:
+        # job_name — fixed value for MAL/LAL rule routing (same for ALL AI Gateway deployments)
+        - name: OTEL_SERVICE_NAME
+          value: "envoy-ai-gateway"
+        # OTLP endpoint — SkyWalking OAP gRPC receiver
+        - name: OTEL_EXPORTER_OTLP_ENDPOINT
+          value: "http://skywalking-oap.skywalking:11800"
+        - name: OTEL_EXPORTER_OTLP_PROTOCOL
+          value: "grpc"
+        # Enable OTLP for both metrics and access logs
+        - name: OTEL_METRICS_EXPORTER
+          value: "otlp"
+        - name: OTEL_LOGS_EXPORTER
+          value: "otlp"
+        # Gateway name = Gateway CRD metadata.name (e.g., "my-ai-gateway")
+        # Read from pod label gateway.envoyproxy.io/owning-gateway-name,
+        # which is auto-set by the Envoy Gateway controller on every envoy pod.
+        - name: GATEWAY_NAME
+          valueFrom:
+            fieldRef:
+              fieldPath: metadata.labels['gateway.envoyproxy.io/owning-gateway-name']
+        # Pod name (e.g., "envoy-default-my-ai-gateway-76d02f2b-xxx")
+        - name: POD_NAME
+          valueFrom:
+            fieldRef:
+              fieldPath: metadata.name
+        # aigw.service → SkyWalking service name (= Gateway CRD name, auto-resolved)
+        # service.instance.id → SkyWalking instance name (= pod name, auto-resolved)
+        # $(VAR) substitution references the valueFrom env vars defined above.
+        - name: OTEL_RESOURCE_ATTRIBUTES
+          value: "aigw.service=$(GATEWAY_NAME),service.instance.id=$(POD_NAME)"
+---
+# 3. Gateway — references the GatewayConfig via annotation
+apiVersion: gateway.networking.k8s.io/v1
+kind: Gateway
+metadata:
+  name: my-ai-gateway
+  namespace: default
+  annotations:
+    aigateway.envoyproxy.io/gateway-config: my-gateway-config
+spec:
+  gatewayClassName: envoy-ai-gateway
+  listeners:
+    - name: http
+      protocol: HTTP
+      port: 80
+---
+# 4. AIGatewayRoute — routing rules + token metadata for access logs
+apiVersion: aigateway.envoyproxy.io/v1alpha1
+kind: AIGatewayRoute
+metadata:
+  name: my-ai-gateway-route
+  namespace: default
+spec:
+  parentRefs:
+    - name: my-ai-gateway
+      kind: Gateway
+      group: gateway.networking.k8s.io
+  # Enable token counts in access logs
+  llmRequestCosts:
+    - metadataKey: llm_input_token
+      type: InputToken
+    - metadataKey: llm_output_token
+      type: OutputToken
+    - metadataKey: llm_total_token
+      type: TotalToken
+  # Route all models to the backend
+  rules:
+    - backendRefs:
+        - name: openai-backend
+---
+# 5. AIServiceBackend + Backend — LLM provider
+apiVersion: aigateway.envoyproxy.io/v1alpha1
+kind: AIServiceBackend
+metadata:
+  name: openai-backend
+  namespace: default
+spec:
+  schema:
+    name: OpenAI
+  backendRef:
+    name: openai-backend
+    kind: Backend
+    group: gateway.envoyproxy.io
+---
+apiVersion: gateway.envoyproxy.io/v1alpha1
+kind: Backend
+metadata:
+  name: openai-backend
+  namespace: default
+spec:
+  endpoints:
+    - fqdn:
+        hostname: api.openai.com
+        port: 443
+```
+
+**Key env var mapping:**
+
+| Env Var / Resource Attribute | SkyWalking Concept | Example Value |
+|---|---|---|
+| `OTEL_SERVICE_NAME` | `job_name` (MAL/LAL rule routing) | `envoy-ai-gateway` (fixed for all deployments) |
+| `aigw.service` | Service name | `my-ai-gateway` (auto-resolved from gateway name label) |
+| `service.instance.id` | Instance name | `envoy-default-my-ai-gateway-...` (auto-resolved from pod name) |
+
+**No manual per-gateway configuration needed** for service and instance names:
+- `GATEWAY_NAME` is auto-resolved from the pod label `gateway.envoyproxy.io/owning-gateway-name`,
+  which is set automatically by the Envoy Gateway controller on every envoy pod.
+- `POD_NAME` is auto-resolved from the pod name via the Downward API.
+- Both are injected into `OTEL_RESOURCE_ATTRIBUTES` via standard Kubernetes `$(VAR)` substitution.
+
+The `GatewayConfig.spec.extProc.kubernetes.env` field accepts full `corev1.EnvVar` objects (including
+`valueFrom`), merged into the ext_proc container by the gateway mutator webhook. Verified on Kind
+cluster — the gateway label resolves correctly (e.g., `my-ai-gateway`).
+
+**Important:** The `resource.WithFromEnv()` code path in the AI Gateway (`internal/metrics/metrics.go`)
+is conditional — it only executes when `OTEL_EXPORTER_OTLP_ENDPOINT` is set (or `OTEL_METRICS_EXPORTER=console`).
+The ext_proc runs in-process (not as a subprocess), so there is no env var propagation issue.
+
+### 3. MAL Rules for OTLP Metrics
+
+Create `oap-server/server-starter/src/main/resources/otel-rules/envoy-ai-gateway/` with MAL rules consuming
+the 4 GenAI metrics from Envoy AI Gateway.
+
+All MAL rule files use the `job_name` filter to match only AI Gateway traffic:
+```yaml
+filter: "{ tags -> tags.job_name == 'envoy-ai-gateway' }"
+```
+
+#### Source Metrics from AI Gateway
+
+| Metric | Type | Labels |
+|---|---|---|
+| `gen_ai_client_token_usage` | Histogram (Delta) | `gen_ai.token.type` (input/output), `gen_ai.provider.name`, `gen_ai.response.model`, `gen_ai.operation.name` |
+| `gen_ai_server_request_duration` | Histogram | `gen_ai.provider.name`, `gen_ai.response.model`, `gen_ai.operation.name` |
+| `gen_ai_server_time_to_first_token` | Histogram | `gen_ai.provider.name`, `gen_ai.response.model`, `gen_ai.operation.name` |
+| `gen_ai_server_time_per_output_token` | Histogram | `gen_ai.provider.name`, `gen_ai.response.model`, `gen_ai.operation.name` |
+
+#### Proposed SkyWalking Metrics
+
+**Gateway-level (Service) metrics:**
+
+| Monitoring Panel | Unit | Metric Name | Description |
+|---|---|---|---|
+| Request CPM | count/min | `meter_envoy_ai_gw_request_cpm` | Requests per minute |
+| Request Latency Avg | ms | `meter_envoy_ai_gw_request_latency_avg` | Average request duration |
+| Request Latency Percentile | ms | `meter_envoy_ai_gw_request_latency_percentile` | P50/P75/P90/P95/P99 request duration |
+| Input Tokens Rate | tokens/min | `meter_envoy_ai_gw_input_token_rate` | Input tokens per minute (total across all models) |
+| Output Tokens Rate | tokens/min | `meter_envoy_ai_gw_output_token_rate` | Output tokens per minute (total across all models) |
+| Total Tokens Rate | tokens/min | `meter_envoy_ai_gw_total_token_rate` | Total tokens per minute |
+| TTFT Avg | ms | `meter_envoy_ai_gw_ttft_avg` | Average time to first token |
+| TTFT Percentile | ms | `meter_envoy_ai_gw_ttft_percentile` | P50/P75/P90/P95/P99 time to first token |
+| Time Per Output Token Avg | ms | `meter_envoy_ai_gw_tpot_avg` | Average inter-token latency |
+| Time Per Output Token Percentile | ms | `meter_envoy_ai_gw_tpot_percentile` | P50/P75/P90/P95/P99 inter-token latency |
+| Estimated Cost | cost/min | `meter_envoy_ai_gw_estimated_cost` | Estimated cost per minute (from token counts × config pricing) |
+
+**Per-provider breakdown metrics (labeled, within gateway service):**
+
+| Monitoring Panel | Unit | Metric Name | Description |
+|---|---|---|---|
+| Provider Request CPM | count/min | `meter_envoy_ai_gw_provider_request_cpm` | Requests per minute by provider |
+| Provider Token Usage | tokens/min | `meter_envoy_ai_gw_provider_token_rate` | Token rate by provider and token type |
+| Provider Latency Avg | ms | `meter_envoy_ai_gw_provider_latency_avg` | Average latency by provider |
+
+**Per-model breakdown metrics (labeled, within gateway service):**
+
+| Monitoring Panel | Unit | Metric Name | Description |
+|---|---|---|---|
+| Model Request CPM | count/min | `meter_envoy_ai_gw_model_request_cpm` | Requests per minute by model |
+| Model Token Usage | tokens/min | `meter_envoy_ai_gw_model_token_rate` | Token rate by model and token type |
+| Model Latency Avg | ms | `meter_envoy_ai_gw_model_latency_avg` | Average latency by model |
+| Model TTFT Avg | ms | `meter_envoy_ai_gw_model_ttft_avg` | Average TTFT by model |
+| Model TPOT Avg | ms | `meter_envoy_ai_gw_model_tpot_avg` | Average inter-token latency by model |
+
+#### Cost Estimation
+
+Reuse the same `gen-ai-config.yml` pricing configuration from PR #13745. The MAL rules will:
+1. Keep total token counts (input + output) per model from `gen_ai_client_token_usage`.
+2. Look up per-million-token pricing from config.
+3. Compute `estimated_cost = input_tokens × input_cost_per_m / 1_000_000 + output_tokens × output_cost_per_m / 1_000_000`.
+4. Amplify by 10^6 (same as PR #13745) to avoid floating point precision issues.
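The four steps above reduce to simple arithmetic. The sketch below is plain Python rather than MAL, and the prices are made-up illustration values, not entries from `gen-ai-config.yml`; it also assumes the 10^6 amplification is applied to the computed cost, in which case it cancels the per-million division.

```python
TOKENS_PER_MILLION = 1_000_000
AMPLIFICATION = 1_000_000  # the 10^6 factor from step 4


def estimated_cost(input_tokens, output_tokens, input_cost_per_m, output_cost_per_m):
    """Step 3: cost in currency units for the given token counts."""
    return (input_tokens * input_cost_per_m / TOKENS_PER_MILLION
            + output_tokens * output_cost_per_m / TOKENS_PER_MILLION)


def amplified_cost(input_tokens, output_tokens, input_cost_per_m, output_cost_per_m):
    """Step 4: cost scaled by 10^6; the scaling cancels the division in step 3."""
    assert AMPLIFICATION == TOKENS_PER_MILLION  # the cancellation relies on this
    return input_tokens * input_cost_per_m + output_tokens * output_cost_per_m


# Hypothetical pricing: 2.5 per million input tokens, 10.0 per million output tokens.
cost = estimated_cost(31, 4, input_cost_per_m=2.5, output_cost_per_m=10.0)
stored = amplified_cost(31, 4, input_cost_per_m=2.5, output_cost_per_m=10.0)
```

With these inputs the raw cost is on the order of 10^-4, which is where precision loss would become a concern, while the amplified value is 117.5.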
+
+No new MAL function is needed — standard arithmetic operations on counters/gauges are sufficient.
+
+#### Metrics vs Access Logs for Token Cost
+
+Both data sources provide token counts, but serve different cost analysis purposes:
+
+| Aspect | OTLP Metrics (MAL) | Access Logs (LAL) |
+|---|---|---|
+| **Granularity** | Aggregated counters — token sums over time windows | Per-request — exact token count for each individual call |
+| **Cost output** | Cost **rate** (e.g., $X/minute) — good for trends and capacity planning | Cost **per request** (e.g., this call cost $0.03) — good for attribution and audit |
+| **Precision** | Approximate (counter deltas over export intervals) | Exact (individual request values) |
+| **Use case** | Dashboard trends, billing estimates, provider comparison | Detect expensive individual requests, cost anomaly alerting, per-user/per-session attribution |
+
+The metrics path provides aggregated cost trends. The access log path enables per-request cost
+analysis — for example, alerting on a single request that consumed an unusually large number of tokens
+(e.g., a runaway prompt). Both paths reuse the same `gen-ai-config.yml` pricing data.
+
+### 4. Access Log Collection via OTLP
+
+The AI Gateway natively supports an OTLP access log sink. When `OTEL_LOGS_EXPORTER=otlp` is set (or when
+the exporter defaults to OTLP because `OTEL_EXPORTER_OTLP_ENDPOINT` is set), Envoy pushes structured access
+logs directly via OTLP gRPC to the same endpoint as metrics. No FluentBit or external log collector is needed.
+
+#### AI Gateway Configuration
+
+The OTLP log sink shares the same `GatewayConfig` CRD env vars as metrics (see Section 2).
+`OTEL_LOGS_EXPORTER=otlp` and `OTEL_EXPORTER_OTLP_ENDPOINT` enable the log sink. The
+`OTEL_RESOURCE_ATTRIBUTES` (including `aigw.service` and `service.instance.id`) are injected as
+resource attributes on each OTLP log record, ensuring consistency between metrics and access logs.
+
+Additionally, enable token metadata population in `AIGatewayRoute` so token counts appear in access logs:
+```yaml
+apiVersion: aigateway.envoyproxy.io/v1alpha1
+kind: AIGatewayRoute
+spec:
+  llmRequestCosts:
+    - metadataKey: llm_input_token
+      type: InputToken
+    - metadataKey: llm_output_token
+      type: OutputToken
+    - metadataKey: llm_total_token
+      type: TotalToken
+```
+
+#### OTLP Log Record Structure (Verified)
+
+Each access log record is pushed as an OTLP LogRecord with the following structure:
+
+**Resource attributes** (from `OTEL_RESOURCE_ATTRIBUTES` + Envoy metadata):
+
+| Attribute | Example | Notes |
+|---|---|---|
+| `aigw.service` | `envoy-ai-gateway-basic` | From `OTEL_RESOURCE_ATTRIBUTES` — SkyWalking service name |
+| `service.instance.id` | `aigw-pod-7b9f4d8c5` | From `OTEL_RESOURCE_ATTRIBUTES` — SkyWalking instance name |
+| `service.name` | `envoy-ai-gateway` | From `OTEL_SERVICE_NAME` — mapped to `job_name` for rule routing |
+| `node_name` | `default-aigw-run-85f8cf28` | Envoy node identifier |
+| `cluster_name` | `default/aigw-run` | Envoy cluster name |
+
+**Log record attributes** (per-request, LLM traffic):
+
+| Attribute | Example | Description |
+|---|---|---|
+| `gen_ai.request.model` | `llama3.2:latest` | Original requested model |
+| `gen_ai.response.model` | `llama3.2:latest` | Actual model from response |
+| `gen_ai.provider.name` | `openai` | Backend provider name |
+| `gen_ai.usage.input_tokens` | `31` | Input token count |
+| `gen_ai.usage.output_tokens` | `4` | Output token count |
+| `session.id` | `sess-abc123` | Session identifier (if set via header mapping) |
+| `response_code` | `200` | HTTP status code |
+| `duration` | `1835` | Request duration (ms) |
+| `request.path` | `/v1/chat/completions` | API path |
+| `connection_termination_details` | `-` | Envoy connection termination reason |
+| `upstream_transport_failure_reason` | `-` | Upstream failure reason |
+
+Note: `total_tokens` is not a separate field in the OTLP log — it equals `input_tokens + output_tokens`
+and can be computed in LAL rules. `connection_termination_details` and `upstream_transport_failure_reason`
+serve as error/timeout indicators (replacing `response_flags` from the file-based log format).
+
+**Log record attributes** (per-request, MCP traffic):
+
+| Attribute | Example | Description |
+|---|---|---|
+| `mcp.method.name` | `tools/call` | MCP method name |
+| `mcp.provider.name` | `kiwi` | MCP provider identifier |
+| `jsonrpc.request.id` | `1` | JSON-RPC request ID |
+| `mcp.session.id` | `sess-xyz` | MCP session ID |
+
+#### LAL Rules — Sampling Policy
+
+Create `oap-server/server-starter/src/main/resources/lal/envoy-ai-gateway.yaml` to process the OTLP
+access logs.
+
+**Sampling strategy:** Not all access logs need to be stored — only those that indicate abnormal or
+expensive requests. The LAL rules apply the following sampling policy:
+
+1. **High token cost** — persist logs where `input_tokens + output_tokens >= threshold` (default 10,000).
+2. **Error responses** — always persist logs with `response_code >= 400`.
+3. **Slow/timeout requests** — always persist logs where `duration` exceeds a configurable timeout
+   threshold, or where `connection_termination_details` / `upstream_transport_failure_reason` indicate
+   upstream failures. LLM requests are inherently slow (especially streaming), so timeout sampling is
+   important for diagnosing provider availability issues.
+
+All other access logs are dropped to avoid storage bloat.
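The sampling policy above amounts to a single keep-or-drop predicate per log record. The following is a minimal Python sketch of that decision, not LAL syntax. The attribute names come from the log record table in this section; `should_persist`, the 60,000 ms duration default, and the treatment of `-` as "no value" are assumptions made for illustration.

```python
def _as_int(value, default=0):
    """Parse a numeric attribute; placeholders such as '-' fall back to default."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return default


def _is_set(value):
    """Assumption: Envoy prints '-' when a field has no value."""
    return value not in (None, "", "-")


def should_persist(attrs, token_threshold=10_000, duration_threshold_ms=60_000):
    """Return True if the access log record matches at least one sampling rule."""
    total_tokens = (_as_int(attrs.get("gen_ai.usage.input_tokens"))
                    + _as_int(attrs.get("gen_ai.usage.output_tokens")))
    if total_tokens >= token_threshold:               # rule 1: high token cost
        return True
    if _as_int(attrs.get("response_code")) >= 400:    # rule 2: error responses
        return True
    if _as_int(attrs.get("duration")) >= duration_threshold_ms:  # rule 3: slow
        return True
    # Rule 3, continued: upstream or connection failure indicators.
    return (_is_set(attrs.get("connection_termination_details"))
            or _is_set(attrs.get("upstream_transport_failure_reason")))


# A routine request, using the example values from the attribute table.
routine = {
    "gen_ai.usage.input_tokens": "31",
    "gen_ai.usage.output_tokens": "4",
    "response_code": "200",
    "duration": "1835",
    "connection_termination_details": "-",
    "upstream_transport_failure_reason": "-",
}
keep_routine = should_persist(routine)
keep_error = should_persist({**routine, "response_code": "429"})
keep_large = should_persist({**routine, "gen_ai.usage.input_tokens": "9000",
                             "gen_ai.usage.output_tokens": "1500"})
```

Because the rules are combined with a logical OR, a cheap request that fails is still kept, and an expensive request that succeeds is kept as well.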
+
+**Industry token usage reference** (from [OpenRouter State of AI 2025](https://openrouter.ai/state-of-ai),
+100 trillion token study):
+
+| Use Case | Avg Input Tokens | Avg Output Tokens | Avg Total |
+|---|---|---|---|
+| Simple chat/Q&A | 500–1,000 | 200–400 | ~1,000 |
+| Customer support | 500–3,000 | 300–400 | ~2,500 |
+| RAG applications | 3,000–4,000 | 300–500 | ~3,500 |
+| Programming/code | 6,000–20,000+ | 400–1,500 | ~10,000+ |
+| Overall average (2025) | ~6,000 | ~400 | ~6,400 |
+
+Note: The overall average is heavily skewed by programming workloads. Non-programming use cases
+(chat, RAG, support) typically fall in the 1,000–3,500 total token range.
+
+**Default sampling threshold: 10,000 total tokens** (configurable). This is approximately 3× the
+non-programming median (~3,000), which captures genuinely expensive or abnormal requests without
+logging every routine call. The threshold is configurable to accommodate different workload profiles:
+- Lower (e.g., 5,000) for chat-heavy deployments where most requests are short.
+- Higher (e.g., 30,000) for code-generation-heavy deployments where large prompts are normal.
+
+The LAL rules would:
+1. Extract AI metadata (`gen_ai.usage.input_tokens`, `gen_ai.usage.output_tokens`, `gen_ai.request.model`,
+   `gen_ai.provider.name`) from OTLP log record attributes.
+2. Compute `total_tokens = input_tokens + output_tokens`.
+3. Associate logs with the gateway service and instance using resource attributes (`aigw.service`,
+   `service.instance.id`) in the `ENVOY_AI_GATEWAY` layer.
+4. **Apply sampling**: persist only logs matching at least one of:
+   - `total_tokens >= 10,000` (configurable threshold)
+   - `response_code >= 400`
+   - `duration >= timeout_threshold` or non-empty `upstream_transport_failure_reason`
+
+### 5. UI Dashboard
+
+**OAP side** — Create dashboard JSON templates under
+`oap-server/server-starter/src/main/resources/ui-initialized-templates/envoy_ai_gateway/`:
+- `envoy-ai-gateway-root.json` — Root list view of all AI Gateway services.
+- `envoy-ai-gateway-service.json` — Service dashboard: Request CPM, latency, token rates, TTFT, TPOT,
+  estimated cost, with provider and model breakdown panels.
+- `envoy-ai-gateway-instance.json` — Instance (pod) level dashboard.
+
+**UI side** — A separate PR in [skywalking-booster-ui](https://github.com/apache/skywalking-booster-ui)
+is needed for i18n menu entries (similar to
+[skywalking-booster-ui#534](https://github.com/apache/skywalking-booster-ui/pull/534) for Virtual GenAI).
+The menu entry should be added under the infrastructure/gateway category.
+
+## Imported Dependencies libs and their licenses.
+No new dependency. The AI Gateway pushes both metrics and access logs via OTLP to SkyWalking's
+existing otel-receiver.
+
+## Compatibility
+- New layer `ENVOY_AI_GATEWAY` — no breaking change, additive only.
+- New MAL rules — opt-in via configuration.
+- New LAL rules for OTLP access logs — opt-in via configuration.
+- Reuses existing `gen-ai-config.yml` for cost estimation (shared with agent-based GenAI from PR #13745).
+- No changes to query protocol or storage structure — uses existing meter and log storage.
+- No external log collector (FluentBit, etc.) required — access logs are pushed via OTLP.
+
+## General usage docs
+
+### Prerequisites
+- Envoy AI Gateway deployed with the `GatewayConfig` CRD configured (see Section 2 for the full
+  env var setup including `OTEL_SERVICE_NAME`, `OTEL_EXPORTER_OTLP_ENDPOINT`, `OTEL_RESOURCE_ATTRIBUTES`).
+
+### Step 1: Configure Envoy AI Gateway
+
+Apply the `GatewayConfig` CRD from Section 2 to your AI Gateway deployment. Key env vars:
+
+| Env Var | Value | Purpose |
+|---|---|---|
+| `OTEL_SERVICE_NAME` | `envoy-ai-gateway` | Routes metrics/logs to correct MAL/LAL rules via `job_name` (fixed for all deployments) |
+| `OTEL_EXPORTER_OTLP_ENDPOINT` | `http://skywalking-oap:11800` | SkyWalking OAP OTLP receiver |
+| `OTEL_EXPORTER_OTLP_PROTOCOL` | `grpc` | OTLP transport |
+| `OTEL_METRICS_EXPORTER` | `otlp` | Enable OTLP metrics push |
+| `OTEL_LOGS_EXPORTER` | `otlp` | Enable OTLP access log push |
+| `GATEWAY_NAME` | (auto from label) | Auto-resolved from pod label `gateway.envoyproxy.io/owning-gateway-name` |
+| `POD_NAME` | (auto from Downward API) | Auto-resolved from pod name |
+| `OTEL_RESOURCE_ATTRIBUTES` | `aigw.service=$(GATEWAY_NAME),service.instance.id=$(POD_NAME)` | SkyWalking service name (auto) + instance ID (auto) |
+
+### Step 2: Configure SkyWalking OAP
+
+Enable the OTel receiver, MAL rules, and LAL rules in `application.yml`:
+```yaml
+receiver-otel:
+  selector: ${SW_OTEL_RECEIVER:default}
+  default:
+    enabledHandlers: ${SW_OTEL_RECEIVER_ENABLED_HANDLERS:"otlp-metrics,otlp-logs"}
+    enabledOtelMetricsRules: ${SW_OTEL_RECEIVER_ENABLED_OTEL_METRICS_RULES:"envoy-ai-gateway"}
+
+log-analyzer:
+  selector: ${SW_LOG_ANALYZER:default}
+  default:
+    lalFiles: ${SW_LOG_LAL_FILES:"envoy-ai-gateway"}
+```
+
+### Cost Estimation
+
+Update `gen-ai-config.yml` with pricing for the models served through the AI Gateway.
+The same config file is shared with agent-based GenAI monitoring.
+
+## Appendix A: OTLP Payload Verification
+
+The following data was verified by capturing raw OTLP payloads from the AI Gateway
+(`envoyproxy/ai-gateway-cli:latest` Docker image) via an OTel Collector debug exporter.
+
+#### Resource Attributes
+
+With `OTEL_RESOURCE_ATTRIBUTES=service.instance.id=test-instance-456` and
+`OTEL_SERVICE_NAME=aigw-test-service`:
+
+| Attribute | Value | Notes |
+|---|---|---|
+| `service.instance.id` | `test-instance-456` | Set via `OTEL_RESOURCE_ATTRIBUTES` — **confirmed working** |
+| `service.name` | `aigw-test-service` | Set via `OTEL_SERVICE_NAME` env var |
+| `telemetry.sdk.language` | `go` | SDK metadata |
+| `telemetry.sdk.name` | `opentelemetry` | SDK metadata |
+| `telemetry.sdk.version` | `1.40.0` | SDK metadata |
+
+**Not present by default (without explicit env config):** `service.instance.id`, `aigw.service`, `host.name`.
+These must be explicitly set via `OTEL_RESOURCE_ATTRIBUTES` in the `GatewayConfig` CRD (see Section 2).
+
+`resource.WithFromEnv()` (source: `internal/metrics/metrics.go:35-94`) is called inside a conditional
+block that requires `OTEL_EXPORTER_OTLP_ENDPOINT` to be set. When configured, `OTEL_RESOURCE_ATTRIBUTES`
+is fully honored.
+
+#### Metric-Level Attributes (Labels)
+
+All 4 metrics carry:
+
+| Label | Example Value | Notes |
+|---|---|---|
+| `gen_ai.operation.name` | `chat` | Operation type |
+| `gen_ai.original.model` | `llama3.2:latest` | Original model from request |
+| `gen_ai.provider.name` | `openai` | Backend provider name. In K8s mode with explicit backend routing, this is the configured backend name. |
+| `gen_ai.request.model` | `llama3.2:latest` | Requested model |
+| `gen_ai.response.model` | `llama3.2:latest` | Model from response |
+| `gen_ai.token.type` | `input` / `output` / `cached_input` / `cache_creation_input` | Only on `gen_ai.client.token.usage`. **No `total` value** — total must be computed. `cached_input` and `cache_creation_input` are for Anthropic-style prompt caching. |
+
+#### Metric Names and Types
+
+| OTLP Metric Name | Type | Unit | Temporality |
+|---|---|---|---|
+| `gen_ai.client.token.usage` | **Histogram** (not Counter!) | `token` | **Delta** |
+| `gen_ai.server.request.duration` | Histogram | `s` (seconds, not ms!) | Delta |
+| `gen_ai.server.time_to_first_token` | Histogram | `s` | Delta (streaming only) |
+| `gen_ai.server.time_per_output_token` | Histogram | `s` | Delta (streaming only) |
+
+**Key findings:**
+1. Token usage is a **Histogram**, not a Counter — Sum/Count/Min/Max available per bucket.
+2. Duration is in **seconds** — MAL rules must multiply by 1000 for ms display.
+3. Temporality is **Delta** — MAL needs `increase()` semantics, not `rate()`.
+4. TTFT and TPOT **only appear for streaming requests** — non-streaming produces only token.usage + request.duration.
+5. **Dots in metric names** — OTLP uses dots (`gen_ai.client.token.usage`), Prometheus converts to underscores.
+
+#### Histogram Bucket Boundaries (verified from source: `internal/metrics/genai.go`)
+
+Token usage (14 boundaries, power-of-4):
+`1, 4, 16, 64, 256, 1024, 4096, 16384, 65536, 262144, 1048576, 4194304, 16777216, 67108864`
+
+Request duration (14 boundaries, power-of-2 seconds):
+`0.01, 0.02, 0.04, 0.08, 0.16, 0.32, 0.64, 1.28, 2.56, 5.12, 10.24, 20.48, 40.96, 81.92`
+
+TTFT (21 boundaries, finer granularity for streaming):
+`0.001, 0.005, 0.01, 0.02, 0.04, 0.06, 0.08, 0.1, 0.25, 0.5, 0.75, 1.0, 2.5, 5.0, 7.5, 10.0, 15.0, 20.0, 30.0, 45.0, 60.0`
+
+TPOT (13 boundaries, finest granularity):
+`0.01, 0.025, 0.05, 0.075, 0.1, 0.15, 0.2, 0.3, 0.4, 0.5, 0.75, 1.0, 2.5`
+
+#### Impact on Implementation
+
+| Finding | Impact |
+|---|---|
+| No `service.instance.id` by default | `OTEL_RESOURCE_ATTRIBUTES=service.instance.id=<value>` **works** when OTLP exporter is configured (verified). MAL rules should treat instance as optional and document `OTEL_RESOURCE_ATTRIBUTES` configuration. |
+| `gen_ai.provider.name` = backend name | In K8s mode with explicit backend config, this is the configured backend name. |
+| Token usage is Histogram | MAL uses histogram sum/count, not counter value. |
+| Delta temporality | SkyWalking OTel receiver must handle delta-to-cumulative conversion. |
+| Duration in seconds | MAL rules multiply by 1000 for ms-based metrics. |
+| TTFT/TPOT streaming-only | Dashboard should note these metrics may be absent for non-streaming workloads. |
+
+#### Bonus: Traces Also Pushed
+
+The AI Gateway also pushes OpenInference traces via OTLP, including full request/response content
+in span attributes (`llm.input_messages`, `llm.output_messages`, `llm.token_count.*`). This is a
+potential future integration point but out of scope for this SWIP.
+
+## Appendix B: Raw OTLP Metric Data (Verified)
+
+Captured from OTel Collector debug exporter. This is the actual OTLP payload from `envoyproxy/ai-gateway-cli:latest`.
+
+### Resource Attributes
+```
+Resource SchemaURL: https://opentelemetry.io/schemas/1.39.0
+Resource attributes:
+     -> service.instance.id: Str(test-instance-456)
+     -> service.name: Str(aigw-test-service)
+     -> telemetry.sdk.language: Str(go)
+     -> telemetry.sdk.name: Str(opentelemetry)
+     -> telemetry.sdk.version: Str(1.40.0)
+```
+
+`OTEL_RESOURCE_ATTRIBUTES=service.instance.id=<value>` **is honored** when an OTLP exporter is configured
+(i.e., `OTEL_EXPORTER_OTLP_ENDPOINT` is set). Without an OTLP endpoint, the resource block is skipped and
+only the Prometheus reader is used (which does not carry resource attributes per-metric).
+
+### InstrumentationScope
+```
+ScopeMetrics SchemaURL:
+InstrumentationScope envoyproxy/ai-gateway
+```
+
+### Metric 1: gen_ai.client.token.usage (input tokens)
+```
+Name: gen_ai.client.token.usage
+Description: Number of tokens processed.
+Unit: token
+DataType: Histogram
+AggregationTemporality: Delta
+
+Data point attributes:
+     -> gen_ai.operation.name: Str(chat)
+     -> gen_ai.original.model: Str(llama3.2:latest)
+     -> gen_ai.provider.name: Str(openai)
+     -> gen_ai.request.model: Str(llama3.2:latest)
+     -> gen_ai.response.model: Str(llama3.2:latest)
+     -> gen_ai.token.type: Str(input)
+Count: 1
+Sum: 31.000000
+Min: 31.000000
+Max: 31.000000
+ExplicitBounds: [1, 4, 16, 64, 256, 1024, 4096, 16384, 65536, 262144, 1048576, 4194304, 16777216, 67108864]
+```
+
+### Metric 1b: gen_ai.client.token.usage (output tokens)
+```
+Data point attributes:
+     -> gen_ai.token.type: Str(output)
+     (other attributes same as above)
+Count: 1
+Sum: 3.000000
+```
+
+### Metric 2: gen_ai.server.request.duration
+```
+Name: gen_ai.server.request.duration
+Description: Generative AI server request duration such as time-to-last byte or last output token.
+Unit: s
+DataType: Histogram
+AggregationTemporality: Delta
+
+Data point attributes:
+     -> gen_ai.operation.name: Str(chat)
+     -> gen_ai.original.model: Str(llama3.2:latest)
+     -> gen_ai.provider.name: Str(openai)
+     -> gen_ai.request.model: Str(llama3.2:latest)
+     -> gen_ai.response.model: Str(llama3.2:latest)
+Count: 1
+Sum: 10.432428
+ExplicitBounds: [0.01, 0.02, 0.04, 0.08, 0.16, 0.32, 0.64, 1.28, 2.56, 5.12, 10.24, 20.48, 40.96, 81.92]
+```
+
+### Metric 3: gen_ai.server.time_to_first_token (streaming only)
+```
+Name: gen_ai.server.time_to_first_token
+Description: Time to receive first token in streaming responses.
+Unit: s +DataType: Histogram +AggregationTemporality: Delta +(Same attributes as request.duration; there is no gen_ai.token.type attribute) +ExplicitBounds (from source code): [0.001, 0.005, 0.01, 0.02, 0.04, 0.06, 0.08, 0.1, 0.25, 0.5, + 0.75, 1.0, 2.5, 5.0, 7.5, 10.0, 15.0, 20.0, 30.0, 45.0, 60.0] +``` + +### Metric 4: gen_ai.server.time_per_output_token (streaming only) +``` +Name: gen_ai.server.time_per_output_token +Description: Time per output token generated after the first token for successful responses. +Unit: s +DataType: Histogram +AggregationTemporality: Delta +(Same attributes as request.duration; there is no gen_ai.token.type attribute) +ExplicitBounds (from source code): [0.01, 0.025, 0.05, 0.075, 0.1, 0.15, 0.2, 0.3, 0.4, 0.5, + 0.75, 1.0, 2.5] +``` + +## Appendix C: Access Log Format (from Envoy Config Dump) + +The AI Gateway auto-configures two access log entries on the listener (one for LLM, one for MCP). +Both were verified from a `config_dump` of the AI Gateway. + +### LLM Access Log Format (JSON) + +Filter: `request.headers['x-ai-eg-model'] != ''` (logs only requests processed by the AI Gateway ext_proc) + +```json +{ + "start_time": "%START_TIME%", + "method": "%REQ(:METHOD)%", + "request.path": "%REQ(:PATH)%", + "x-envoy-origin-path": "%REQ(X-ENVOY-ORIGINAL-PATH?:PATH)%", + "response_code": "%RESPONSE_CODE%", + "duration": "%DURATION%", + "bytes_received": "%BYTES_RECEIVED%", + "bytes_sent": "%BYTES_SENT%", + "user-agent": "%REQ(USER-AGENT)%", + "x-request-id": "%REQ(X-REQUEST-ID)%", + "x-forwarded-for": "%REQ(X-FORWARDED-FOR)%", + "x-envoy-upstream-service-time": "%RESP(X-ENVOY-UPSTREAM-SERVICE-TIME)%", + "upstream_host": "%UPSTREAM_HOST%", + "upstream_cluster": "%UPSTREAM_CLUSTER%", + "upstream_local_address": "%UPSTREAM_LOCAL_ADDRESS%", + "upstream_transport_failure_reason": "%UPSTREAM_TRANSPORT_FAILURE_REASON%", + "downstream_remote_address": "%DOWNSTREAM_REMOTE_ADDRESS%", + "downstream_local_address": "%DOWNSTREAM_LOCAL_ADDRESS%", + "connection_termination_details": 
"%CONNECTION_TERMINATION_DETAILS%", + "gen_ai.request.model": "%REQ(X-AI-EG-MODEL)%", + "gen_ai.response.model": "%DYNAMIC_METADATA(io.envoy.ai_gateway:model_name_override)%", + "gen_ai.provider.name": "%DYNAMIC_METADATA(io.envoy.ai_gateway:backend_name)%", + "gen_ai.usage.input_tokens": "%DYNAMIC_METADATA(io.envoy.ai_gateway:llm_input_token)%", + "gen_ai.usage.output_tokens": "%DYNAMIC_METADATA(io.envoy.ai_gateway:llm_output_token)%", + "session.id": "%DYNAMIC_METADATA(io.envoy.ai_gateway:session.id)%" +} +``` + +**Code review corrections** (source: `internal/metrics/genai.go`, `examples/access-log/basic.yaml`, +`site/docs/capabilities/observability/accesslogs.md`): +- `response_flags` (`%RESPONSE_FLAGS%`) IS documented in AI Gateway access log docs and used in tests, + but not in the default config. Can be added via `EnvoyProxy` resource if needed. +- `gen_ai.usage.total_tokens` IS supported via `%DYNAMIC_METADATA(io.envoy.ai_gateway:llm_total_token)%` + when `AIGatewayRoute.spec.llmRequestCosts` includes `type: TotalToken`. +- Access log format is **user-configurable** via `EnvoyProxy` resource, not hardcoded by the AI Gateway. + The AI Gateway only populates dynamic metadata; users define which fields appear in logs. +- Additional token cost types beyond input/output/total: `CachedInputToken` and `CacheCreationInputToken` + (for Anthropic-style prompt caching, stored as `llm_cached_input_token` and + `llm_cache_creation_input_token` in dynamic metadata). 
+ +### MCP Access Log Format (JSON) + +Filter: `request.headers['x-ai-eg-mcp-backend'] != ''` + +```json +{ + "start_time": "%START_TIME%", + "method": "%REQ(:METHOD)%", + "request.path": "%REQ(:PATH)%", + "response_code": "%RESPONSE_CODE%", + "duration": "%DURATION%", + "mcp.method.name": "%DYNAMIC_METADATA(io.envoy.ai_gateway:mcp_method)%", + "mcp.provider.name": "%DYNAMIC_METADATA(io.envoy.ai_gateway:mcp_backend)%", + "mcp.session.id": "%REQ(MCP-SESSION-ID)%", + "jsonrpc.request.id": "%DYNAMIC_METADATA(io.envoy.ai_gateway:mcp_request_id)%", + "session.id": "%DYNAMIC_METADATA(io.envoy.ai_gateway:session.id)%" +} +``` + diff --git a/docs/en/swip/SWIP-10/kind-test-resources.yaml b/docs/en/swip/SWIP-10/kind-test-resources.yaml new file mode 100644 index 0000000000..ff5d5bd790 --- /dev/null +++ b/docs/en/swip/SWIP-10/kind-test-resources.yaml @@ -0,0 +1,247 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. 
+ +# SWIP-10 Kind Test Resources +# Deploy with: kubectl apply -f kind-test-resources.yaml +# +# This file contains all K8s resources for the SWIP-10 local verification: +# - Ollama (in-cluster LLM backend) +# - OTel Collector (debug exporter for capturing OTLP payloads) +# - AI Gateway CRDs (GatewayClass, GatewayConfig, Gateway, AIGatewayRoute, AIServiceBackend, Backend) + +# --- Ollama (in-cluster) --- +apiVersion: apps/v1 +kind: Deployment +metadata: + name: ollama + namespace: default +spec: + replicas: 1 + selector: + matchLabels: + app: ollama + template: + metadata: + labels: + app: ollama + spec: + containers: + - name: ollama + image: ollama/ollama:latest + imagePullPolicy: Never + ports: + - containerPort: 11434 + resources: + requests: + cpu: "500m" + memory: "2Gi" +--- +apiVersion: v1 +kind: Service +metadata: + name: ollama + namespace: default +spec: + selector: + app: ollama + ports: + - port: 11434 + targetPort: 11434 +--- +# --- OTel Collector (debug exporter) --- +apiVersion: v1 +kind: ConfigMap +metadata: + name: otel-collector-config + namespace: default +data: + config.yaml: | + receivers: + otlp: + protocols: + grpc: + endpoint: 0.0.0.0:4317 + exporters: + debug: + verbosity: detailed + service: + pipelines: + metrics: + receivers: [otlp] + exporters: [debug] + logs: + receivers: [otlp] + exporters: [debug] +--- +apiVersion: apps/v1 +kind: Deployment +metadata: + name: otel-collector + namespace: default +spec: + replicas: 1 + selector: + matchLabels: + app: otel-collector + template: + metadata: + labels: + app: otel-collector + spec: + containers: + - name: collector + image: otel/opentelemetry-collector:latest + imagePullPolicy: Never + ports: + - containerPort: 4317 + volumeMounts: + - name: config + mountPath: /etc/otelcol/config.yaml + subPath: config.yaml + volumes: + - name: config + configMap: + name: otel-collector-config +--- +apiVersion: v1 +kind: Service +metadata: + name: otel-collector + namespace: default +spec: + selector: + 
app: otel-collector + ports: + - port: 4317 + targetPort: 4317 +--- +# --- AI Gateway CRDs --- +# 1. GatewayClass +apiVersion: gateway.networking.k8s.io/v1 +kind: GatewayClass +metadata: + name: envoy-ai-gateway +spec: + controllerName: gateway.envoyproxy.io/gatewayclass-controller +--- +# 2. GatewayConfig — OTLP configuration for SkyWalking +# Verified: GATEWAY_NAME auto-resolves from pod label +# gateway.envoyproxy.io/owning-gateway-name via Downward API +apiVersion: aigateway.envoyproxy.io/v1alpha1 +kind: GatewayConfig +metadata: + name: sw-test-config + namespace: default +spec: + extProc: + kubernetes: + env: + # job_name for MAL/LAL rule routing (fixed for all deployments) + - name: OTEL_SERVICE_NAME + value: "envoy-ai-gateway" + # OTLP endpoint — OTel Collector (or SkyWalking OAP in production) + - name: OTEL_EXPORTER_OTLP_ENDPOINT + value: "http://otel-collector.default:4317" + - name: OTEL_EXPORTER_OTLP_PROTOCOL + value: "grpc" + # Enable OTLP for both metrics and access logs + - name: OTEL_METRICS_EXPORTER + value: "otlp" + - name: OTEL_LOGS_EXPORTER + value: "otlp" + - name: OTEL_METRIC_EXPORT_INTERVAL + value: "5000" + # Gateway name = Gateway CRD metadata.name (e.g., "my-ai-gateway") + # Read from pod label gateway.envoyproxy.io/owning-gateway-name, + # which is auto-set by the Envoy Gateway controller on every envoy pod. + - name: GATEWAY_NAME + valueFrom: + fieldRef: + fieldPath: metadata.labels['gateway.envoyproxy.io/owning-gateway-name'] + # Pod name (e.g., "envoy-default-my-ai-gateway-76d02f2b-xxx") + - name: POD_NAME + valueFrom: + fieldRef: + fieldPath: metadata.name + # aigw.service → SkyWalking service name (= Gateway CRD name, auto-resolved) + # service.instance.id → SkyWalking instance name (= pod name, auto-resolved) + # $(VAR) substitution references the valueFrom env vars defined above. + - name: OTEL_RESOURCE_ATTRIBUTES + value: "aigw.service=$(GATEWAY_NAME),service.instance.id=$(POD_NAME)" +--- +# 3. 
Gateway — references GatewayConfig via annotation +apiVersion: gateway.networking.k8s.io/v1 +kind: Gateway +metadata: + name: my-ai-gateway + namespace: default + annotations: + aigateway.envoyproxy.io/gateway-config: sw-test-config +spec: + gatewayClassName: envoy-ai-gateway + listeners: + - name: http + protocol: HTTP + port: 80 +--- +# 4. AIGatewayRoute — routing + token metadata for access logs +apiVersion: aigateway.envoyproxy.io/v1alpha1 +kind: AIGatewayRoute +metadata: + name: my-ai-gateway-route + namespace: default +spec: + parentRefs: + - name: my-ai-gateway + kind: Gateway + group: gateway.networking.k8s.io + llmRequestCosts: + - metadataKey: llm_input_token + type: InputToken + - metadataKey: llm_output_token + type: OutputToken + - metadataKey: llm_total_token + type: TotalToken + rules: + - backendRefs: + - name: ollama-backend +--- +# 5. AIServiceBackend + Backend — Ollama in-cluster +apiVersion: aigateway.envoyproxy.io/v1alpha1 +kind: AIServiceBackend +metadata: + name: ollama-backend + namespace: default +spec: + schema: + name: OpenAI + prefix: "/v1" + backendRef: + name: ollama-backend + kind: Backend + group: gateway.envoyproxy.io +--- +apiVersion: gateway.envoyproxy.io/v1alpha1 +kind: Backend +metadata: + name: ollama-backend + namespace: default +spec: + endpoints: + - fqdn: + hostname: ollama.default.svc.cluster.local + port: 11434 diff --git a/docs/en/swip/SWIP-10/kind-test-setup.sh b/docs/en/swip/SWIP-10/kind-test-setup.sh new file mode 100644 index 0000000000..4fd3afcc46 --- /dev/null +++ b/docs/en/swip/SWIP-10/kind-test-setup.sh @@ -0,0 +1,108 @@ +#!/bin/bash +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. 
You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. + +# SWIP-10 Local Verification: Envoy AI Gateway + SkyWalking OTLP on Kind +# +# Prerequisites: +# - kind, kubectl, helm, docker installed +# - Docker images pulled (or internet access for Kind to pull) +# +# This script sets up a Kind cluster with: +# - Envoy Gateway (v1.3.3) + AI Gateway controller (v0.5.0) +# - Ollama (in-cluster) with a small model +# - OTel Collector (debug exporter) to capture OTLP metrics and logs +# - AI Gateway configured with SkyWalking-compatible OTLP resource attributes +# +# Usage: +# ./kind-test-setup.sh # Full setup +# ./kind-test-setup.sh cleanup # Delete the cluster + +set -e + +CLUSTER_NAME="aigw-swip10-test" + +if [ "$1" = "cleanup" ]; then + echo "Cleaning up..." + kind delete cluster --name $CLUSTER_NAME + exit 0 +fi + +echo "=== Step 1: Create Kind cluster ===" +kind create cluster --name $CLUSTER_NAME + +echo "=== Step 2: Pre-load Docker images ===" +IMAGES=( + "envoyproxy/ai-gateway-controller:v0.5.0" + "envoyproxy/ai-gateway-extproc:v0.5.0" + "envoyproxy/gateway:v1.3.3" + "envoyproxy/envoy:distroless-v1.33.3" + "otel/opentelemetry-collector:latest" + "ollama/ollama:latest" +) +for img in "${IMAGES[@]}"; do + echo "Pulling $img..." + docker pull "$img" + echo "Loading $img into Kind..." 
+ kind load docker-image "$img" --name $CLUSTER_NAME +done + +echo "=== Step 3: Install Envoy Gateway ===" +# enableBackend is required for Backend resources used by AIServiceBackend +helm install eg oci://docker.io/envoyproxy/gateway-helm \ + --version v1.3.3 -n envoy-gateway-system --create-namespace \ + --set config.envoyGateway.extensionApis.enableBackend=true +kubectl wait --for=condition=available deployment/envoy-gateway \ + -n envoy-gateway-system --timeout=120s + +echo "=== Step 4: Install AI Gateway ===" +helm upgrade -i aieg-crd oci://docker.io/envoyproxy/ai-gateway-crds-helm \ + --namespace envoy-ai-gateway-system --create-namespace +helm upgrade -i aieg oci://docker.io/envoyproxy/ai-gateway-helm \ + --namespace envoy-ai-gateway-system --create-namespace +kubectl wait --for=condition=available deployment/ai-gateway-controller \ + -n envoy-ai-gateway-system --timeout=120s + +echo "=== Step 5: Deploy test resources ===" +kubectl apply -f kind-test-resources.yaml + +echo "=== Step 6: Wait for pods ===" +sleep 10 +kubectl wait --for=condition=available deployment/ollama -n default --timeout=120s +kubectl wait --for=condition=available deployment/otel-collector -n default --timeout=60s + +echo "=== Step 7: Pull Ollama model ===" +OLLAMA_POD=$(kubectl get pod -l app=ollama -o jsonpath='{.items[0].metadata.name}') +kubectl exec "$OLLAMA_POD" -- ollama pull qwen2.5:0.5b + +echo "=== Step 8: Wait for Envoy pod ===" +sleep 30 +kubectl get pods -A + +echo "" +echo "=== Setup complete ===" +echo "To test:" +echo " kubectl port-forward -n envoy-gateway-system svc/envoy-default-my-ai-gateway-76d02f2b 8080:80 &" +echo " curl -s --noproxy '*' http://localhost:8080/v1/chat/completions \\" +echo " -H 'Content-Type: application/json' \\" +echo " -d '{\"model\":\"qwen2.5:0.5b\",\"messages\":[{\"role\":\"user\",\"content\":\"Say hi\"}]}'" +echo "" +echo "To check OTLP output:" +echo " kubectl logs -l app=otel-collector | grep -A 20 'ResourceMetrics\\|ResourceLog'" +echo "" 
+echo "To cleanup:" +echo " ./kind-test-setup.sh cleanup" diff --git a/docs/en/swip/readme.md b/docs/en/swip/readme.md index 0cf9f8cc43..50121cbe29 100644 --- a/docs/en/swip/readme.md +++ b/docs/en/swip/readme.md @@ -68,10 +68,11 @@ All accepted and proposed SWIPs can be found in [here](https://github.com/apache ## Known SWIPs -Next SWIP Number: 10 +Next SWIP Number: 11 ### Accepted SWIPs +- [SWIP-10 Support Envoy AI Gateway Observability](SWIP-10/SWIP.md) - [SWIP-9 Support Flink Monitoring](SWIP-9.md) - [SWIP-8 Support Kong Monitoring](SWIP-8.md) - [SWIP-6 Support ActiveMQ Monitoring](SWIP-6.md)
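For reference, the two metric-handling notes in SWIP.md above (delta temporality and durations reported in seconds) can be sketched in Java roughly as follows. This is only a minimal illustration under stated assumptions: `DeltaToCumulative`, `Totals`, `accept`, and `secondsToMillis` are hypothetical names that do not come from SkyWalking's OTel receiver or MAL, and the sketch tracks only count and sum, ignoring bucket counts and stream resets.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper for illustration; not SkyWalking's actual receiver code. */
final class DeltaToCumulative {

    /** Running totals for one time series (metric name plus attribute set). */
    static final class Totals {
        long count;
        double sum;
    }

    // One accumulator per series key; the key format is an assumption.
    private final Map<String, Totals> series = new HashMap<>();

    /** Folds one delta histogram data point into the running totals and returns them. */
    Totals accept(String seriesKey, long deltaCount, double deltaSum) {
        Totals totals = series.computeIfAbsent(seriesKey, k -> new Totals());
        totals.count += deltaCount;
        totals.sum += deltaSum;
        return totals;
    }

    /** Converts a duration reported in seconds to milliseconds. */
    static double secondsToMillis(double seconds) {
        return seconds * 1000.0;
    }
}
```

Feeding the Appendix B duration sample (`Count: 1`, `Sum: 10.432428`) through `accept` and then `secondsToMillis` yields a cumulative sum of roughly 10432.4 ms; later delta points for the same series key are added on top of that total.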
