This is an automated email from the ASF dual-hosted git repository.

wusheng pushed a commit to branch fix/otlp-traces-e2e-stability
in repository https://gitbox.apache.org/repos/asf/skywalking.git
commit ec1e8107fbe1ce1082c8140394d2b86ec9538adb Author: Wu Sheng <[email protected]> AuthorDate: Fri Mar 20 09:49:32 2026 +0800 Fix OTLP traces e2e test stability and add e2e expectation docs 1. Fix docker-compose for OTLP traces e2e test: - Add health checks to frontend and productcatalogservice containers - Use depends_on with condition: service_healthy for proper startup ordering - Replace no.exist:80 endpoints with reachable addresses to avoid DNS/connection timeouts - Increase memory limits (200M->300M frontend, 20M->40M productcatalogservice) 2. Add e2e expectation specification documents covering all query protocols: - Core template syntax (contains, notEmpty, gt/ge/lt/le, b64enc, regexp) - GraphQL/MQE with .graphqls schema references - LogQL, PromQL, TraceQL, Zipkin v2, Status/Debug endpoints - CLAUDE.md guide for navigating e2e test structure --- test/e2e-v2/CLAUDE.md | 111 ++++++ test/e2e-v2/cases/otlp-traces/docker-compose.yml | 27 +- test/e2e-v2/e2e-expectation-graphql.md | 418 +++++++++++++++++++++++ test/e2e-v2/e2e-expectation-logql.md | 120 +++++++ test/e2e-v2/e2e-expectation-promql.md | 227 ++++++++++++ test/e2e-v2/e2e-expectation-spec.md | 215 ++++++++++++ test/e2e-v2/e2e-expectation-status-debug.md | 213 ++++++++++++ test/e2e-v2/e2e-expectation-traceql-zipkin.md | 357 +++++++++++++++++++ test/e2e-v2/e2e-expectation-zipkin.md | 186 ++++++++++ 9 files changed, 1867 insertions(+), 7 deletions(-) diff --git a/test/e2e-v2/CLAUDE.md b/test/e2e-v2/CLAUDE.md new file mode 100644 index 0000000000..377fbd9c72 --- /dev/null +++ b/test/e2e-v2/CLAUDE.md @@ -0,0 +1,111 @@ +# E2E-V2 Test Framework Guide + +## Overview + +SkyWalking uses [skywalking-infra-e2e](https://github.com/apache/skywalking-infra-e2e) for end-to-end testing. Tests follow a **Setup → Trigger → Verify → Cleanup** lifecycle, with expected files using Go templates for flexible result matching. 
+ +## Directory Structure + +``` +test/e2e-v2/ +├── cases/ # Test scenarios (one dir per feature) +│ ├── <feature>/ +│ │ ├── e2e.yaml # Main config: setup, trigger, verify, cleanup +│ │ ├── <feature>-cases.yaml # Verify cases: query + expected pairs +│ │ ├── expected/ # Expected result files (Go templates) +│ │ │ └── *.yml +│ │ ├── docker-compose.yml # Infrastructure for Docker-based tests +│ │ └── <storage>/ # Storage-specific variants (es/, banyandb/, etc.) +│ │ ├── docker-compose.yml +│ │ └── e2e.yaml +│ └── ... +├── script/ # Shared scripts and environment setup +│ ├── env # Environment variables (SkyWalking version, etc.) +│ └── prepare/ # Setup scripts (yq, swctl installation) +└── java-test-service/ # Java test service implementations +``` + +## Key Files to Find + +| What you need | Where to look | +|---------------|---------------| +| Test entry point (config) | `cases/<feature>/e2e.yaml` or `cases/<feature>/<storage>/e2e.yaml` | +| Query + expected pairs | `cases/<feature>/*-cases.yaml` (referenced from e2e.yaml `verify.cases.includes`) | +| Expected result templates | `cases/<feature>/expected/*.yml` | +| Docker infrastructure | `cases/<feature>/docker-compose.yml` or `cases/<feature>/<storage>/docker-compose.yml` | +| Shared env variables | `script/env` | +| Test services source | `java-test-service/` | + +## Test Cases by Feature Area + +| Directory | Protocol | Description | +|-----------|----------|-------------| +| `simple/` | GraphQL (swctl) | Core service/endpoint/trace/metrics verification | +| `mqe/` | GraphQL (swctl) | Metrics Query Engine expression tests | +| `alarm/` | GraphQL (swctl) | Alarm rule verification | +| `logql/` | LogQL (curl) | Loki-compatible log query API | +| `promql/` | PromQL (curl) | Prometheus-compatible metrics API | +| `traceql/` | TraceQL (curl) | TraceQL search API via Zipkin endpoint | +| `zipkin/` | Zipkin v2 (curl) | Native Zipkin trace API | +| `log/` | GraphQL (swctl) | Log collection and analysis | +| `meter/` | 
GraphQL (swctl) | Meter/MAL metrics | +| `profiling/` | GraphQL (swctl) | Profiling (CPU, memory, network, eBPF) | +| `browser/` | GraphQL (swctl) | Browser/RUM monitoring | +| `event/` | GraphQL (swctl) | Event collection | +| `baseline/` | GraphQL (swctl) | Baseline metrics | + +## Query Protocols & Ports + +| Protocol | Port Variable | Base URL | Query Tool | Spec Doc | +|----------|---------------|----------|------------|----------| +| GraphQL / MQE | `${oap_12800}` | `http://${oap_host}:${oap_12800}/graphql` | `swctl` CLI | [graphql](e2e-expectation-graphql.md) | +| Status / Debug | `${oap_12800}` | `http://${oap_host}:${oap_12800}/debugging/` | `curl` | [status-debug](e2e-expectation-status-debug.md) | +| PromQL | `${oap_9090}` | `http://${oap_host}:${oap_9090}/api/v1/` | `curl` | [promql](e2e-expectation-promql.md) | +| LogQL | `${oap_3100}` | `http://${oap_host}:${oap_3100}/loki/api/v1/` | `curl` | [logql](e2e-expectation-logql.md) | +| TraceQL | `${oap_3200}` | `http://${oap_host}:${oap_3200}/zipkin/api/` | `curl` | [traceql](e2e-expectation-traceql-zipkin.md) | +| Zipkin v2 | `${oap_9412}` | `http://${oap_host}:${oap_9412}/zipkin/api/v2/` | `curl` | [zipkin](e2e-expectation-zipkin.md) | + +## GraphQL Schema Definitions + +All GraphQL types and queries are defined in `.graphqls` files at: +``` +oap-server/server-query-plugin/query-graphql-plugin/src/main/resources/query-protocol/ +``` +Key schemas: `common.graphqls` (shared types), `metadata-v2.graphqls` (services/instances/endpoints), `topology.graphqls` (dependency graphs), `trace.graphqls` + `trace-v2.graphqls` (distributed traces), `metrics-v3.graphqls` (MQE), `alarm.graphqls`, `log.graphqls`, `event.graphqls`, `hierarchy.graphqls`. See [e2e-expectation-graphql.md](e2e-expectation-graphql.md) for the full schema-to-expected-file mapping. + +## How Expected Files Work + +Expected files are **Go templates** that render with actual query results as context. 
After rendering, the framework compares the rendered YAML against the actual YAML using structural equality (`go-cmp`). + +### Expectation specification documents + +For detailed template syntax and protocol-specific patterns, see: +- [e2e-expectation-spec.md](e2e-expectation-spec.md) — Core template syntax and functions +- [e2e-expectation-graphql.md](e2e-expectation-graphql.md) — GraphQL/MQE expected file patterns (with `.graphqls` schema references) +- [e2e-expectation-logql.md](e2e-expectation-logql.md) — LogQL expected file patterns +- [e2e-expectation-promql.md](e2e-expectation-promql.md) — PromQL expected file patterns +- [e2e-expectation-traceql-zipkin.md](e2e-expectation-traceql-zipkin.md) — TraceQL (Tempo-compatible) expected file patterns +- [e2e-expectation-zipkin.md](e2e-expectation-zipkin.md) — Zipkin v2 native API expected file patterns +- [e2e-expectation-status-debug.md](e2e-expectation-status-debug.md) — Status & debugging HTTP endpoint patterns + +## Common Patterns + +### Adding a new e2e test case +1. Add query + expected pair in the `*-cases.yaml` file +2. Create expected file in `expected/` directory using Go template syntax +3. Use `contains` for unordered list matching, `notEmpty` for dynamic values, `b64enc` for IDs + +### Storage variants +Many features test across multiple storage backends. Each storage has its own `docker-compose.yml` and optionally its own `e2e.yaml`. Expected files are usually shared (referenced via relative paths like `../../expected/`). + +### Environment variables +All tests share `script/env` for version pinning. Common variables: +- `${oap_host}`, `${oap_12800}`, `${oap_9090}`, etc. 
— OAP server connection +- `${provider_host}`, `${consumer_host}` — Test service hosts +- `${provider_9090}`, `${consumer_9092}` — Test service ports + +### Running e2e tests locally +```bash +# From skywalking root, with infra-e2e installed: +e2e run -c test/e2e-v2/cases/<feature>/<storage>/e2e.yaml +``` diff --git a/test/e2e-v2/cases/otlp-traces/docker-compose.yml b/test/e2e-v2/cases/otlp-traces/docker-compose.yml index 6a73fa6738..069b57e573 100644 --- a/test/e2e-v2/cases/otlp-traces/docker-compose.yml +++ b/test/e2e-v2/cases/otlp-traces/docker-compose.yml @@ -26,7 +26,7 @@ services: deploy: resources: limits: - memory: 200M + memory: 300M restart: unless-stopped ports: - 8080 @@ -39,10 +39,17 @@ services: ENV_PLATFORM: local OTEL_SERVICE_NAME: frontend WEB_OTEL_SERVICE_NAME: frontend-web - CURRENCY_SERVICE_ADDR: no.exist:80 + CURRENCY_SERVICE_ADDR: productcatalogservice:3550 depends_on: - - oap - - productcatalogservice + oap: + condition: service_healthy + productcatalogservice: + condition: service_healthy + healthcheck: + test: ["CMD", "bash", "-c", "cat < /dev/null > /dev/tcp/127.0.0.1/8080"] + interval: 5s + timeout: 60s + retries: 120 logging: *logging networks: - e2e @@ -52,7 +59,7 @@ services: deploy: resources: limits: - memory: 20M + memory: 40M restart: unless-stopped ports: - "3550" @@ -61,9 +68,15 @@ services: OTEL_EXPORTER_OTLP_ENDPOINT: http://oap:11800 OTEL_RESOURCE_ATTRIBUTES: service.namespace=opentelemetry-demo OTEL_SERVICE_NAME: productcatalogservice - FEATURE_FLAG_GRPC_SERVICE_ADDR: no.exist:80 + FEATURE_FLAG_GRPC_SERVICE_ADDR: productcatalogservice:3550 depends_on: - - oap + oap: + condition: service_healthy + healthcheck: + test: ["CMD", "bash", "-c", "cat < /dev/null > /dev/tcp/127.0.0.1/3550"] + interval: 5s + timeout: 60s + retries: 120 logging: *logging networks: - e2e diff --git a/test/e2e-v2/e2e-expectation-graphql.md b/test/e2e-v2/e2e-expectation-graphql.md new file mode 100644 index 0000000000..49b64ecae1 --- /dev/null +++ 
b/test/e2e-v2/e2e-expectation-graphql.md @@ -0,0 +1,418 @@ +# GraphQL & MQE Expected File Patterns + +GraphQL queries use `swctl` CLI tool with `--display yaml` output. MQE (Metrics Query Engine) queries also go through the GraphQL endpoint. + +**Port:** `${oap_12800}` — Base URL: `http://${oap_host}:${oap_12800}/graphql` + +## GraphQL Schema Definitions + +All GraphQL types, queries, and inputs are defined in `.graphqls` files at: +``` +oap-server/server-query-plugin/query-graphql-plugin/src/main/resources/query-protocol/ +``` + +| Schema File | Key Types & Queries | +|-------------|-------------------| +| `common.graphqls` | `Duration`, `Pagination`, `Scope`, `DetectPoint`, `KeyValue`, `DebuggingTrace`, `HealthStatus` | +| `metadata-v2.graphqls` | `Service`(id,name,group,shortName,layers,normal), `ServiceInstance`(id,name,attributes,language,instanceUUID), `Endpoint`(id,name), `Process` — queries: `listLayers`, `listServices`, `listInstances`, `findEndpoint` | +| `topology.graphqls` | `Topology`(nodes,calls), `Node`(id,name,type,isReal,layers), `Call`(source,target,id,detectPoints) — queries: `getGlobalTopology`, `getServiceTopology`, `getServicesTopology`, `getServiceInstanceTopology`, `getEndpointDependencies`, `getProcessTopology` | +| `trace.graphqls` | `Span`(traceId,segmentId,spanId,parentSpanId,refs,serviceCode,serviceInstanceName,startTime,endTime,endpointName,type,peer,component,isError,layer,tags,logs,attachedEvents), `TraceBrief`, `BasicTrace` — queries: `queryBasicTraces`, `queryTrace` | +| `trace-v2.graphqls` | `TraceList`(traces,retrievedTimeRange,debuggingTrace), `TraceV2`(spans), `RetrievedTimeRange`(startTime,endTime) — query: `queryTraces` | +| `metrics-v3.graphqls` | `ExpressionResult`(type,results,error,debuggingTrace), `MQEValues`(metric,values), `MQEValue`(id,owner,value,traceID), `ExpressionResultType`(UNKNOWN,SINGLE_VALUE,TIME_SERIES_VALUES,SORTED_LIST,RECORD_LIST), `Entity` — query: `execExpression` | +| `alarm.graphqls` | 
`AlarmMessage`(startTime,scope,id,name,message,events,tags,snapshot), `AlarmSnapshot`(expression,metrics) — queries: `getAlarm`, `queryAlarmTagAutocompleteKeys`, `queryAlarmTagAutocompleteValues` | +| `log.graphqls` | `Log`(serviceName,serviceInstanceName,endpointName,traceId,timestamp,contentType,content,tags), `Logs`(errorReason,logs) — queries: `queryLogs`, `test` (LAL test) | +| `event.graphqls` | `Event`(uuid,source,name,type,message,parameters,startTime,endTime,layer), `Source`(service,serviceInstance,endpoint) — query: `queryEvents` | +| `hierarchy.graphqls` | `ServiceHierarchy`, `InstanceHierarchy`, `HierarchyServiceRelation`, `LayerLevel` — queries: `getServiceHierarchy`, `getInstanceHierarchy`, `listLayerLevels` | +| `profile.graphqls` | Profile task creation/query | +| `ebpf-profiling.graphqls` | eBPF profiling task creation/query | +| `continuous-profiling.graphqls` | Continuous profiling policy/task management | +| `async-profiler.graphqls` | Async profiler task management | +| `pprof.graphqls` | pprof profiling task management | +| `browser-log.graphqls` | Browser error log query | +| `record.graphqls` | Sampled record reading | +| `top-n-records.graphqls` | Top-N record query | +| `ondemand-pod-log.graphqls` | On-demand pod log query | +| `ui-configuration.graphqls` | UI dashboard template management | +| `metric.graphqls` | Legacy metric query (deprecated, use metrics-v3) | +| `metrics-v2.graphqls` | Metrics v2 query (deprecated, use metrics-v3) | +| `aggregation.graphqls` | Legacy aggregation query (deprecated) | +| `metadata.graphqls` | Legacy metadata query (deprecated, use metadata-v2) | + +## Service List + +**GraphQL schema:** `metadata-v2.graphqls` — `listServices(layer: String): [Service!]!` + +**Service type fields:** `id`, `name`, `group`, `shortName`, `layers`, `normal` + +**Query:** +```bash +swctl --display yaml --base-url=http://${oap_host}:${oap_12800}/graphql service ls +``` + +**Expected:** +```yaml +{{- contains . 
}} +- id: {{ b64enc "e2e-service-provider" }}.1 + name: e2e-service-provider + group: "" + shortname: e2e-service-provider + normal: true + layers: + - GENERAL +- id: {{ b64enc "e2e-service-consumer" }}.1 + name: e2e-service-consumer + group: "" + shortname: e2e-service-consumer + normal: true + layers: + - GENERAL +{{- end }} +``` + +## Layer List + +**GraphQL schema:** `metadata-v2.graphqls` — `listLayers: [String!]!` + +**Query:** +```bash +swctl --display yaml --base-url=http://${oap_host}:${oap_12800}/graphql layer ls +``` + +**Expected:** +```yaml +{{- contains . }} +- GENERAL +{{- end }} +``` + +## Service Instance List + +**GraphQL schema:** `metadata-v2.graphqls` — `listInstances(duration, serviceId): [ServiceInstance!]!` + +**ServiceInstance type fields:** `id`, `name`, `attributes`(`name`,`value`), `language`, `instanceUUID` + +**Query:** +```bash +swctl --display yaml --base-url=http://${oap_host}:${oap_12800}/graphql instance list \ + --service-name=e2e-service-provider +``` + +**Expected:** +```yaml +{{- contains . }} +- id: {{ b64enc "e2e-service-provider" }}.1_{{ b64enc "provider1" }} + name: provider1 + language: JAVA + instanceuuid: {{ notEmpty .instanceuuid }} + attributes: + {{- contains .attributes }} + - name: {{ notEmpty .name }} + value: {{ notEmpty .value }} + {{- end }} +{{- end }} +``` + +## Endpoint List + +**GraphQL schema:** `metadata-v2.graphqls` — `findEndpoint(keyword, serviceId, limit, duration): [Endpoint!]!` + +**Endpoint type fields:** `id`, `name` + +**Query:** +```bash +swctl --display yaml --base-url=http://${oap_host}:${oap_12800}/graphql endpoint list \ + --service-name=e2e-service-provider +``` + +**Expected:** +```yaml +{{- contains . 
}} +- id: {{ b64enc "e2e-service-provider" }}.1_{{ b64enc "POST:/users" }} + name: POST:/users +{{- end }} +``` + +## Service Dependency (Topology) + +**GraphQL schema:** `topology.graphqls` — `getServiceTopology(serviceId, duration): Topology` + +**Topology type:** `nodes: [Node!]!` + `calls: [Call!]!` + `debuggingTrace` +- **Node fields:** `id`, `name`, `type`, `isReal`, `layers` +- **Call fields:** `source`, `sourceComponents`, `target`, `targetComponents`, `id`, `detectPoints` + +**Query:** +```bash +swctl --display yaml --base-url=http://${oap_host}:${oap_12800}/graphql dependency service \ + --service-name=e2e-service-provider +``` + +**Expected:** +```yaml +debuggingtrace: null +nodes: +{{- contains .nodes }} +- id: {{ b64enc "e2e-service-provider" }}.1 + name: e2e-service-provider + type: Tomcat + isreal: true + layers: + - GENERAL +- id: {{ b64enc "localhost:-1" }}.0 + name: localhost:-1 + type: H2 + isreal: false + layers: + - VIRTUAL_DATABASE +{{- end }} +calls: +{{- contains .calls }} +- source: {{ b64enc "e2e-service-provider" }}.1 + sourcecomponents: + - h2-jdbc-driver + target: {{ b64enc "localhost:-1" }}.0 + targetcomponents: [] + id: {{ b64enc "e2e-service-provider" }}.1-{{ b64enc "localhost:-1" }}.0 + detectpoints: + - CLIENT +{{- end }} +``` + +## Trace List (v2) + +**GraphQL schema:** `trace-v2.graphqls` — `queryTraces(condition, debug): TraceList` + +**TraceList type:** `traces: [TraceV2!]!` + `retrievedTimeRange`(`startTime`, `endTime`) + `debuggingTrace` +- **TraceV2:** `spans: [Span!]!` +- **Span fields** (from `trace.graphqls`): `traceId`, `segmentId`, `spanId`, `parentSpanId`, `refs`, `serviceCode`, `serviceInstanceName`, `startTime`(ms), `endTime`(ms), `endpointName`, `type`(Local/Entry/Exit), `peer`, `component`, `isError`, `layer`(Unknown/Database/RPCFramework/Http/MQ/Cache), `tags`(`[KeyValue]`), `logs`(`[LogEntity]`), `attachedEvents`(`[SpanAttachedEvent]`) + +**Query:** +```bash +swctl --display yaml 
--base-url=http://${oap_host}:${oap_12800}/graphql tv2 ls +``` + +**Expected:** +```yaml +debuggingtrace: null +retrievedtimerange: + {{- with .retrievedtimerange }} + starttime: {{ gt .starttime 0 }} + endtime: {{ gt .endtime 0 }} + {{- end }} +traces: + {{- contains .traces }} + - spans: + {{- contains .spans }} + - traceid: {{ .traceid }} + segmentid: {{ .segmentid }} + spanid: 0 + parentspanid: -1 + refs: [] + servicecode: e2e-service-consumer + serviceinstancename: consumer1 + starttime: {{ gt .starttime 0 }} + endtime: {{ gt .endtime 0 }} + endpointname: POST:/users + type: Entry + peer: "" + component: Tomcat + iserror: false + layer: Http + tags: + {{- contains .tags }} + - key: url + value: {{ notEmpty .value }} + - key: http.method + value: POST + - key: http.status_code + value: "200" + {{- end }} + logs: [] + attachedevents: [] + {{- end }} + {{- end }} +``` + +## MQE Metrics (No Operation) + +**GraphQL schema:** `metrics-v3.graphqls` — `execExpression(expression, entity, duration, debug, dumpDBRsp): ExpressionResult!` + +**ExpressionResult type:** `type`(`ExpressionResultType`), `results`(`[MQEValues]`), `error`, `debuggingTrace` +- **MQEValues:** `metric`(`Metadata` with `labels: [KeyValue]`) + `values`(`[MQEValue]`) +- **MQEValue fields:** `id`, `owner`(`Owner`), `value`(String, may be null), `traceID` +- **ExpressionResultType enum:** `UNKNOWN`, `SINGLE_VALUE`, `TIME_SERIES_VALUES`, `SORTED_LIST`, `RECORD_LIST` + +**Query:** +```bash +swctl --display yaml --base-url=http://${oap_host}:${oap_12800}/graphql metrics exec \ + --expression=service_sla --service-name=e2e-service-provider +``` + +**Expected:** +```yaml +debuggingtrace: null +type: TIME_SERIES_VALUES +results: + {{- contains .results }} + - metric: + labels: [] + values: + {{- contains .values }} + - id: {{ notEmpty .id }} + value: {{ notEmpty .value }} + owner: null + traceid: null + - id: {{ notEmpty .id }} + value: null + owner: null + traceid: null + {{- end }} + {{- end }} +error: null 
+``` + +**Notes:** +- `value: null` entries represent time slots with no data (gaps) +- `id` is typically a timestamp-based identifier +- Including both `notEmpty` and `null` value entries ensures the time series has at least some data + +## MQE Metrics with Labels (Percentile) + +**Query:** +```bash +swctl --display yaml --base-url=http://${oap_host}:${oap_12800}/graphql metrics exec \ + --expression="service_percentile{p='50,75,90,95,99'}" --service-name=e2e-service-provider +``` + +**Expected:** +```yaml +debuggingtrace: null +type: TIME_SERIES_VALUES +results: + {{- contains .results }} + - metric: + labels: + {{- contains .metric.labels }} + - key: p + value: "50" + {{- end }} + values: + {{- contains .values }} + - id: {{ notEmpty .id }} + value: {{ notEmpty .value }} + owner: null + traceid: null + {{- end }} + {{- end }} +error: null +``` + +## MQE Binary Operations + +**Query:** +```bash +swctl --display yaml --base-url=http://${oap_host}:${oap_12800}/graphql metrics exec \ + --expression="service_sla * 2 / 100 + 500 - 50" --service-name=e2e-service-provider +``` + +Expression supports: `+`, `-`, `*`, `/`, `>`, `<`, `>=`, `<=`, `==`, `!=`, `&&`, `||`, and functions like `avg()`, `abs()`, `top_n()`, `relabels()`, `aggregate_labels()`. 
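As a worked illustration of the binary-operation expression above, the sketch below applies the same arithmetic to a single time-bucket value. It assumes ordinary operator precedence (`*` and `/` before `+` and `-`) and an input of `10000`, the `service_sla` value that appears in the PromQL examples of this patch. `applyExpr` is a hypothetical helper for illustration only and is not part of MQE or `swctl`.

```go
package main

import "fmt"

// applyExpr mirrors the arithmetic of
//   service_sla * 2 / 100 + 500 - 50
// for one time-bucket value. It is an illustrative stand-in,
// not how the OAP server evaluates MQE expressions.
func applyExpr(sla float64) float64 {
	return sla*2/100 + 500 - 50
}

func main() {
	// Assumed input: 10000, i.e. a 100% SLA as SkyWalking reports it.
	fmt.Println(applyExpr(10000)) // prints 650
}
```

Under these assumptions, each non-null `value` in the result would be the per-bucket outcome of that arithmetic, so an expected file can pin a literal where the input metric is stable and fall back to `notEmpty` otherwise.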
+ +## Alarm Queries + +**GraphQL schema:** `alarm.graphqls` — `getAlarm(duration, scope, keyword, paging, tags): Alarms` + +**Alarms type:** `msgs: [AlarmMessage!]!` +- **AlarmMessage fields:** `startTime`(Long), `recoveryTime`(Long), `scope`(Scope), `id`, `name`, `message`, `events`(`[Event]`), `tags`(`[KeyValue]`), `snapshot`(`AlarmSnapshot`) +- **AlarmSnapshot:** `expression`(String) + `metrics`(`[MQEMetric]` with `name` + `results: [MQEValues]`) +- **Event fields** (from `event.graphqls`): `uuid`, `source`(`service`,`serviceInstance`,`endpoint`), `name`, `type`(Normal/Error), `message`, `parameters`(`[KeyValue]`), `startTime`(ms), `endTime`(ms), `layer` + +**Query:** +```bash +swctl --display yaml --base-url=http://${oap_host}:${oap_12800}/graphql alarm ls \ + --tags=level=WARNING +``` + +**Expected:** +```yaml +msgs: + {{- contains .msgs }} + - starttime: {{ gt .starttime 0 }} + scope: Service + id: {{ b64enc "e2e-service-provider" }}.1 + name: e2e-service-provider + message: {{ notEmpty .message }} + tags: + - key: level + value: WARNING + events: + {{- contains .events }} + - uuid: {{ notEmpty .uuid }} + source: + service: e2e-service-provider + serviceinstance: "" + endpoint: "" + name: Alarm + type: "" + message: {{ notEmpty .message }} + parameters: [] + starttime: {{ gt .starttime 0 }} + endtime: {{ gt .endtime 0 }} + layer: GENERAL + {{- end }} + snapshot: + expression: {{ notEmpty .snapshot.expression }} + metrics: + {{- contains .snapshot.metrics }} + - name: {{ notEmpty .name }} + results: + {{- contains .results }} + - metric: + labels: [] + values: + {{- contains .values }} + - id: {{ notEmpty .id }} + owner: null + value: {{ .value }} + traceid: null + {{- end }} + {{- end }} + {{- end }} + {{- end }} +``` + +## Log Queries + +**GraphQL schema:** `log.graphqls` — `queryLogs(condition, debug): Logs` + +**Logs type:** `errorReason`(String), `logs`(`[Log]`), `debuggingTrace` +- **Log fields:** `serviceName`, `serviceId`, `serviceInstanceName`, 
`serviceInstanceId`, `endpointName`, `endpointId`, `traceId`, `timestamp`(Long), `contentType`(TEXT/JSON/YAML), `content`, `tags`(`[KeyValue]`) + +## Event Queries + +**GraphQL schema:** `event.graphqls` — `queryEvents(condition): Events` + +**Events type:** `events: [Event!]!` +- **Event fields:** `uuid`, `source`(`service`,`serviceInstance`,`endpoint`), `name`, `type`(Normal/Error), `message`, `parameters`(`[KeyValue]`), `startTime`(ms), `endTime`(ms), `layer` + +## Hierarchy Queries + +**GraphQL schema:** `hierarchy.graphqls` — `getServiceHierarchy(serviceId, layer): ServiceHierarchy!` + +**ServiceHierarchy type:** `relations: [HierarchyServiceRelation!]!` +- **HierarchyServiceRelation:** `upperService` + `lowerService` (each: `id`, `name`, `layer`, `normal`) + +## swctl Query Types Reference + +| Command | Description | GraphQL Query | +|---------|-------------|---------------| +| `service ls` | List all services | `listServices` | +| `layer ls` | List all layers | `listLayers` | +| `instance list --service-name=X` | List instances of a service | `listInstances` | +| `endpoint list --service-name=X` | List endpoints of a service | `findEndpoint` | +| `dependency global` | Global service dependency topology | `getGlobalTopology` | +| `dependency service --service-name=X` | Service-level dependency | `getServiceTopology` | +| `dependency instance --service-name=X --dest-service-name=Y` | Instance-level dependency | `getServiceInstanceTopology` | +| `dependency endpoint --service-name=X --endpoint-name=Y` | Endpoint-level dependency | `getEndpointDependencies` | +| `tv2 ls` | List traces (v2) | `queryTraces` | +| `metrics exec --expression=EXPR --service-name=X` | Execute MQE expression | `execExpression` | +| `alarm ls` | List alarms | `getAlarm` | +| `alarm ls --tags=key=value` | List alarms filtered by tags | `getAlarm` with tags | diff --git a/test/e2e-v2/e2e-expectation-logql.md b/test/e2e-v2/e2e-expectation-logql.md new file mode 100644 index 
0000000000..8115ec7e33 --- /dev/null +++ b/test/e2e-v2/e2e-expectation-logql.md @@ -0,0 +1,120 @@ +# LogQL Expected File Patterns + +SkyWalking implements a Loki-compatible LogQL API for log querying. + +**Port:** `${oap_3100}` — Base URL: `http://${oap_host}:${oap_3100}/loki/api/v1/` + +## Labels Query + +**Query:** +```bash +curl -X GET http://${oap_host}:${oap_3100}/loki/api/v1/labels \ + -d 'start='$(($(($(date +%s)-1800))*1000000000))'&end='$(($(date +%s)*1000000000)) +``` + +**Expected:** +```yaml +status: success +data: +{{- contains .data }} + - {{ notEmpty . }} +{{- end}} +``` + +**Notes:** +- Returns list of available label names (e.g., `service`, `service_instance`, `endpoint`, `trace_id`) +- Timestamps are in **nanoseconds** (Unix epoch × 10⁹) + +## Label Values Query + +**Query:** +```bash +curl -X GET http://${oap_host}:${oap_3100}/loki/api/v1/label/service/values \ + -d 'start='$(($(($(date +%s)-1800))*1000000000))'&end='$(($(date +%s)*1000000000)) +``` + +**Expected:** +```yaml +status: success +data: +{{- contains .data }} + - e2e-service-provider +{{- end}} +``` + +## Log Stream Query (query_range) + +**Query:** +```bash +curl -X GET http://${oap_host}:${oap_3100}/loki/api/v1/query_range \ + -d 'query={service="e2e-service-provider"}&start='$(($(($(date +%s)-1800))*1000000000))'&end='$(($(date +%s)*1000000000))'&limit=100&direction=BACKWARD' +``` + +**Expected:** +```yaml +status: success +data: + resultType: streams + result: + {{- contains .data.result }} + - stream: + service: e2e-service-provider + service_instance: {{ notEmpty .stream.service_instance }} + endpoint: {{ .stream.endpoint }} + trace_id: {{ .stream.trace_id }} + values: + {{- contains .values }} + - - "{{ notEmpty (index . 0) }}" + - "{{ notEmpty (println (index . 
1)) }}" + {{- end}} + {{- end}} +``` + +**Notes:** +- `resultType` is always `streams` for log queries +- Each stream has a `stream` metadata object and `values` array +- Values are `[timestamp, log_content]` tuples +- Use `(index . 0)` for timestamp and `(index . 1)` for log body +- `println` is used for log body to handle multiline content +- Stream metadata labels: `service`, `service_instance`, `endpoint`, `trace_id` + +## Log Query with Instance Filter + +**Query:** +```bash +curl -X GET http://${oap_host}:${oap_3100}/loki/api/v1/query_range \ + -d 'query={service_instance="provider1"}&start='$(($(($(date +%s)-1800))*1000000000))'&end='$(($(date +%s)*1000000000))'&limit=100&direction=BACKWARD' +``` + +**Expected:** Same structure as above, with `service_instance` narrowed to the filter value. + +## LogQL Query Syntax + +LogQL filter expressions use label matchers inside `{}`: + +``` +{service="e2e-service-provider"} # exact match +{service_instance="provider1"} # exact match +{service="e2e-service-provider", endpoint="/users"} # multiple labels +``` + +## Query Parameters + +| Parameter | Description | Example | +|-----------|-------------|---------| +| `query` | LogQL filter expression | `{service="svc"}` | +| `start` | Start timestamp (nanoseconds) | `1700000000000000000` | +| `end` | End timestamp (nanoseconds) | `1700001800000000000` | +| `limit` | Max number of log entries | `100` | +| `direction` | Sort order: `FORWARD` or `BACKWARD` | `BACKWARD` | + +## Timestamp Generation Pattern + +The e2e tests use shell arithmetic for dynamic timestamps: +```bash +# Start: 30 minutes ago in nanoseconds +start='$(($(($(date +%s)-1800))*1000000000))' + +# End: now in nanoseconds +end='$(($(date +%s)*1000000000))' +``` diff --git a/test/e2e-v2/e2e-expectation-promql.md b/test/e2e-v2/e2e-expectation-promql.md new file mode 100644 index 0000000000..6a27b1de41 --- /dev/null +++ b/test/e2e-v2/e2e-expectation-promql.md @@ -0,0 +1,227 @@ +# PromQL Expected File Patterns + 
+SkyWalking implements a Prometheus-compatible PromQL API for metrics querying. + +**Port:** `${oap_9090}` — Base URL: `http://${oap_host}:${oap_9090}/api/v1/` + +## Service Traffic (Series Query) + +**Query:** +```bash +curl -X GET http://${oap_host}:${oap_9090}/api/v1/series \ + -d 'match[]=service_traffic{layer="GENERAL"}&start='$(($(date +%s)-1800))'&end='$(date +%s) +``` + +**Expected:** +```yaml +status: success +data: +{{- contains .data }} + - __name__: service_traffic + layer: GENERAL + scope: Service + service: {{ notEmpty .service }} +{{- end}} +``` + +## Labels Query + +**Query:** +```bash +curl -X GET http://${oap_host}:${oap_9090}/api/v1/labels \ + -d 'match[]=service_traffic{layer="GENERAL"}&start='$(($(date +%s)-1800))'&end='$(date +%s) +``` + +**Expected:** +```yaml +status: success +data: +{{- contains .data }} + - __name__ + - layer + - scope + - service +{{- end}} +``` + +## Label Values Query + +**Query:** +```bash +curl -X GET http://${oap_host}:${oap_9090}/api/v1/label/__name__/values \ + -d 'match[]=service_traffic{layer="GENERAL"}&start='$(($(date +%s)-1800))'&end='$(date +%s) +``` + +**Expected:** +```yaml +status: success +data: +{{- contains .data }} + - service_traffic +{{- end}} +``` + +## Instant Query — Vector Result + +**Query:** +```bash +curl -X GET http://${oap_host}:${oap_9090}/api/v1/query \ + -d 'query=service_sla{service="e2e-service-consumer", layer="GENERAL"}' +``` + +**Expected:** +```yaml +status: success +data: + resultType: vector + result: + {{- contains .data.result }} + - metric: + __name__: service_sla + layer: GENERAL + scope: Service + service: e2e-service-consumer + value: + - "{{ index .value 0 }}" + - '10000' + {{- end}} +``` + +**Notes:** +- `resultType: vector` for instant queries +- `value` is a `[timestamp, value]` tuple +- Use `{{ index .value 0 }}` for timestamp (dynamic) and literal for expected value + +## Range Query — Matrix Result + +**Query:** +```bash +curl -X GET 
http://${oap_host}:${oap_9090}/api/v1/query \ + -d 'query=service_sla{service="e2e-service-consumer", layer="GENERAL"}[30m]' +``` + +**Expected:** +```yaml +status: success +data: + resultType: matrix + result: + {{- contains .data.result }} + - metric: + __name__: service_sla + layer: GENERAL + scope: Service + service: e2e-service-consumer + values: + {{- contains .values }} + - - "{{ index . 0 }}" + - '10000' + {{- end}} + {{- end}} +``` + +**Notes:** +- `resultType: matrix` for range queries (bracket suffix like `[30m]`) +- `values` is an array of `[timestamp, value]` tuples + +## Labeled Metrics (Percentile) — Matrix + +**Query:** +```bash +curl -X GET http://${oap_host}:${oap_9090}/api/v1/query \ + -d 'query=endpoint_percentile{service="e2e-service-consumer", layer="GENERAL", p="50,75,90"}[30m]' +``` + +**Expected:** +```yaml +status: success +data: + resultType: matrix + result: + {{- contains .data.result }} + - metric: + __name__: endpoint_percentile + p: 50 + layer: GENERAL + scope: Endpoint + service: e2e-service-consumer + endpoint: POST:/users + values: + {{- contains .values }} + - - "{{ index . 0 }}" + - "{{ index . 1 }}" + {{- end}} + - metric: + __name__: endpoint_percentile + p: 75 + layer: GENERAL + scope: Endpoint + service: e2e-service-consumer + endpoint: POST:/users + values: + {{- contains .values }} + - - "{{ index . 0 }}" + - "{{ index . 1 }}" + {{- end}} + {{- end}} +``` + +**Notes:** +- Percentile label `p` expands into separate metric series +- Each percentile value produces its own result entry + +## Range Query + +**Query:** +```bash +curl -X GET http://${oap_host}:${oap_9090}/api/v1/query_range \ + -d 'query=service_sla{service="e2e-service-consumer", layer="GENERAL"}&start='$(($(date +%s)-1800))'&end='$(date +%s)'&step=300' +``` + +**Expected:** Same structure as matrix result above. 
+ +## Metadata Query + +**Query:** +```bash +curl -X GET http://${oap_host}:${oap_9090}/api/v1/metadata \ + -d 'metric=service_sla' +``` + +**Expected:** +```yaml +status: success +data: + service_sla: + - type: gauge + help: "" + unit: "" +``` + +## PromQL Query Syntax + +Label selectors use `{}` with key-value pairs: + +``` +service_sla{service="svc", layer="GENERAL"} # instant query +service_sla{service="svc", layer="GENERAL"}[30m] # range query (30 min) +``` + +## Common Metric Labels + +| Label | Description | Example | +|-------|-------------|---------| +| `__name__` | Metric name | `service_sla` | +| `layer` | SkyWalking layer | `GENERAL` | +| `scope` | Metric scope | `Service`, `Endpoint`, `ServiceInstance` | +| `service` | Service name | `e2e-service-provider` | +| `endpoint` | Endpoint name | `POST:/users` | +| `p` | Percentile value | `50`, `75`, `90`, `95`, `99` | + +## Timestamp Format + +PromQL timestamps are in **seconds** (Unix epoch), unlike LogQL which uses nanoseconds: +```bash +start='$(($(date +%s)-1800))' # 30 minutes ago in seconds +end='$(date +%s)' # now in seconds +``` diff --git a/test/e2e-v2/e2e-expectation-spec.md b/test/e2e-v2/e2e-expectation-spec.md new file mode 100644 index 0000000000..f5eba631a1 --- /dev/null +++ b/test/e2e-v2/e2e-expectation-spec.md @@ -0,0 +1,215 @@ +# E2E Expectation File Specification + +## How Verification Works + +The skywalking-infra-e2e framework uses a template-based verification approach: + +1. **Query execution** — Run a shell command (e.g., `swctl`, `curl`) that produces YAML/JSON output +2. **Template rendering** — Render the expected file as a Go template, with the actual query result as the template context (`.`) +3. **Structural comparison** — Parse both rendered template and actual data as YAML, then compare using `go-cmp` +4. 
**Retry loop** — If comparison fails, retry up to `verify.retry.count` times with `verify.retry.interval` between attempts + +The key insight: **template functions validate values during rendering**. When a function like `notEmpty` succeeds, it returns the actual value (so rendered output matches actual). When it fails, it returns an error string like `<"" is empty, wanted is not empty>` which causes a mismatch. + +## Template Syntax + +Expected files use [Go template syntax](https://pkg.go.dev/text/template). The actual data is available as `.` (dot). + +### Whitespace Control + +Use `{{-` and `-}}` to trim surrounding whitespace (critical for YAML formatting): + +```yaml +{{- contains .items }} # trim leading whitespace +- name: {{ .name }} +{{- end }} # trim leading whitespace +``` + +### Variable Assignment + +```yaml +{{ $id := .id }}{{ notEmpty $id }} +{{ $svcID := (index .nodes 0).id }}{{ notEmpty $svcID }} +``` + +### Conditional Blocks + +```yaml +{{if . }} +- id: {{ notEmpty .id }} +{{else}} +[] +{{end}} +``` + +### Context Switch (`with`) + +```yaml +{{- with .retrievedtimerange }} +starttime: {{ gt .starttime 0 }} +endtime: {{ gt .endtime 0 }} +{{- end }} +``` + +### Indexing + +```yaml +{{ index .value 0 }} # first element of array +{{ (index .nodes 2).id }} # id field of third node +``` + +## Template Functions + +### Assertion Functions + +These functions validate values. On success they return the actual value; on failure they return an error string that causes a mismatch. 
+ +| Function | Signature | Success | Failure | +|----------|-----------|---------|---------| +| `notEmpty` | `notEmpty <value>` | Returns the value | `<"" is empty, wanted is not empty>` | +| `gt` | `gt <value> <threshold>` | Returns value | `<wanted gt N, but was M>` | +| `ge` | `ge <value> <threshold>` | Returns value | `<wanted ge N, but was M>` | +| `lt` | `lt <value> <threshold>` | Returns value | `<wanted lt N, but was M>` | +| `le` | `le <value> <threshold>` | Returns value | `<wanted le N, but was M>` | +| `regexp` | `regexp <value> <pattern>` | Returns value | `<"X" does not match the pattern "Y">` | + +### Encoding Functions + +| Function | Description | Example | +|----------|-------------|---------| +| `b64enc` | Base64 encode a string | `{{ b64enc "service-name" }}` → `c2VydmljZS1uYW1l` | +| `sha256enc` | SHA256 hex hash | `{{ sha256enc "data" }}` | +| `sha512enc` | SHA512 hex hash | `{{ sha512enc "data" }}` | + +### Utility Functions + +| Function | Description | Example | +|----------|-------------|---------| +| `subtractor` | Subtract values from first arg | `{{ subtractor 100 10 20 }}` → `70` | +| `println` | Print with newline (useful for multiline values) | `{{ println .body }}` | + +### Standard Go Template Functions + +Built into Go's `text/template`: `and`, `or`, `not`, `eq`, `ne`, `len`, `index`, `slice`, `print`, `printf`. `hasPrefix` and `hasSuffix` are also usable in expected files, but they are not `text/template` built-ins; helpers with these names come from the Sprig function library. + +## The `contains` Action + +`contains` is a **custom template action** (not a function) for unordered list matching. It verifies that the actual array contains **all** items matching the specified patterns, regardless of order. 
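
The matching rule just described (every expected item must appear somewhere in the actual list, in any order) can be illustrated outside the template engine. The sketch below covers flat lists of strings only and is not how skywalking-infra-e2e implements `contains`; `contains_all` and `actual_labels` are hypothetical names chosen for the illustration.

```shell
# Hypothetical illustration of the matching rule only.
# Usage: contains_all "<actual items, one per line>" expected1 expected2 ...
contains_all() {
  local actual="$1"
  shift
  local want
  for want in "$@"; do
    # -F: fixed string, -x: whole-line match, -q: no output
    printf '%s\n' "$actual" | grep -Fxq -- "$want" || return 1
  done
  return 0
}

# Actual data in a different order than the expected items.
actual_labels='scope
service
__name__
layer'

contains_all "$actual_labels" __name__ layer && echo "match"
contains_all "$actual_labels" __name__ endpoint || echo "mismatch"
```

The first call succeeds even though the order differs and the actual list holds extra entries; the second fails because `endpoint` is absent. The real action additionally renders each pattern against each actual item, so assertions such as `notEmpty` can take part in the match.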
+ +### Syntax + +```yaml +{{- contains .arrayField }} +- field1: {{ notEmpty .field1 }} + field2: {{ gt .field2 0 }} +- field1: expected-value + field2: 42 +{{- end }} +``` + +### Behavior + +- Every item in the `contains` block must match at least one item in the actual array +- Items can appear in any order in the actual data +- The actual array may contain additional items not listed in the pattern +- Supports nesting: `contains` inside `contains` for nested arrays +- Each pattern item is rendered with each actual item as context until a match is found + +### Common Pattern — Nested Contains + +```yaml +results: + {{- contains .results }} + - metric: + labels: + {{- contains .metric.labels }} + - key: {{ .key }} + value: {{ notEmpty .value }} + {{- end }} + values: + {{- contains .values }} + - id: {{ notEmpty .id }} + value: {{ notEmpty .value }} + {{- end }} + {{- end }} +``` + +## SkyWalking ID Encoding Convention + +SkyWalking encodes entity IDs using base64. Common patterns: + +```yaml +# Service ID: base64(serviceName).isNormal +id: {{ b64enc "e2e-service-provider" }}.1 # normal=true → .1 +id: {{ b64enc "localhost:-1" }}.0 # normal=false → .0 + +# Instance ID: serviceID_base64(instanceName) +id: {{ b64enc "e2e-service-provider" }}.1_{{ b64enc "provider1" }} + +# Endpoint ID: serviceID_base64(endpointName) +id: {{ b64enc "e2e-service-provider" }}.1_{{ b64enc "POST:/users" }} + +# Dependency ID: sourceServiceID-targetServiceID +id: {{ b64enc "e2e-service-consumer" }}.1-{{ b64enc "e2e-service-provider" }}.1 +``` + +## e2e.yaml Configuration + +```yaml +setup: + env: compose # or "kind" for Kubernetes + file: docker-compose.yml + timeout: 20m + init-system-environment: ../../script/env + +trigger: + action: http + interval: 3s + times: -1 # infinite until verify completes + url: http://${provider_host}:${provider_9090}/users + method: POST + body: '{"name": "test"}' + +verify: + retry: + count: 20 + interval: 3s + fail-fast: true # stop on first failure + 
cases: + - includes: + - simple-cases.yaml # relative path to cases file + +cleanup: + on: always # always | success | failure | never +``` + +## Cases YAML + +```yaml +cases: + - query: swctl --display yaml --base-url=http://${oap_host}:${oap_12800}/graphql service ls + expected: expected/service.yml + + - query: | + curl -s http://${oap_host}:${oap_9090}/api/v1/query \ + -d 'query=service_sla{service="e2e-service-provider"}' + expected: expected/metrics.yml +``` + +## License Header + +All `.yml` expected files must include the Apache 2.0 license header (see project `HEADER` file). + +## Debugging Failed Verifications + +When a verification fails, the framework outputs: +``` +✘ failed to verify case[expected/service.yml], retried 20 time(s) + the actual data is: + [actual YAML] + + mismatch (-want +got): + [go-cmp diff showing expected vs actual] +``` + +The `-want +got` diff shows exactly what the template rendered vs what was received. Look for error strings like `<"" is empty, wanted is not empty>` in the diff to identify which assertions failed. diff --git a/test/e2e-v2/e2e-expectation-status-debug.md b/test/e2e-v2/e2e-expectation-status-debug.md new file mode 100644 index 0000000000..a5308aae62 --- /dev/null +++ b/test/e2e-v2/e2e-expectation-status-debug.md @@ -0,0 +1,213 @@ +# Status & Debugging Query Expected File Patterns + +SkyWalking provides internal status and debugging HTTP endpoints on the core OAP REST port (shared with GraphQL). + +**Port:** `${oap_12800}` — Base URL: `http://${oap_host}:${oap_12800}/` +**Module:** `status-query-plugin` (`status-query`) +**Handlers:** `DebuggingHTTPHandler`, `TTLConfigQueryHandler`, `ClusterStatusQueryHandler`, `AlarmStatusQueryHandler` + +All debugging endpoints return **YAML-formatted** responses with embedded execution traces. 
+ +## Debugging Endpoints + +### MQE Debug + +**Query:** +```bash +curl "http://${oap_host}:${oap_12800}/debugging/query/mqe?expression=service_sla&service=e2e-service-provider&serviceLayer=GENERAL&startTime=...&endTime=...&step=MINUTE" +``` + +**Parameters:** `dumpDBRsp`(bool), `expression`, `startTime`, `endTime`, `step`(DAY/HOUR/MINUTE/SECOND), `coldStage`(bool), `service`, `serviceLayer`, `serviceInstance`, `endpoint`, `process`, `destService`, `destServiceLayer`, `destServiceInstance`, `destEndpoint`, `destProcess` + +**Expected (YAML):** +```yaml +type: TIME_SERIES_VALUES +results: + {{- contains .results }} + - metric: + labels: [] + values: + {{- contains .values }} + - id: {{ notEmpty .id }} + value: {{ notEmpty .value }} + owner: null + traceid: null + {{- end }} + {{- end }} +error: null +debuggingtrace: + traceid: {{ notEmpty .debuggingtrace.traceid }} + condition: {{ notEmpty .debuggingtrace.condition }} + starttime: {{ gt .debuggingtrace.starttime 0 }} + endtime: {{ gt .debuggingtrace.endtime 0 }} + duration: {{ gt .debuggingtrace.duration 0 }} + spans: + {{- contains .debuggingtrace.spans }} + - spanid: {{ ge .spanid 0 }} + parentspanid: {{ ge .parentspanid -1 }} + operation: {{ notEmpty .operation }} + starttime: {{ gt .starttime 0 }} + endtime: {{ gt .endtime 0 }} + duration: {{ ge .duration 0 }} + {{- end }} +``` + +### Trace Debug + +**Query (basic traces):** +```bash +curl "http://${oap_host}:${oap_12800}/debugging/query/trace/queryBasicTraces?service=e2e-service-provider&serviceLayer=GENERAL&startTime=...&endTime=...&step=MINUTE&pageNum=1&pageSize=10" +``` + +**Parameters:** `service`, `serviceLayer`, `serviceInstance`, `endpoint`, `traceId`, `startTime`, `endTime`, `step`, `minTraceDuration`, `maxTraceDuration`, `traceState`(ALL/SUCCESS/ERROR), `queryOrder`(BY_START_TIME/BY_DURATION), `tags`, `pageNum`, `pageSize` + +**Query (single trace):** +```bash +curl 
"http://${oap_host}:${oap_12800}/debugging/query/trace/queryTrace?traceId=abc123&startTime=...&endTime=...&step=MINUTE" +``` + +### Topology Debug + +**Query (global):** +```bash +curl "http://${oap_host}:${oap_12800}/debugging/query/topology/getGlobalTopology?startTime=...&endTime=...&step=MINUTE&serviceLayer=GENERAL" +``` + +**Query (services):** +```bash +curl "http://${oap_host}:${oap_12800}/debugging/query/topology/getServicesTopology?startTime=...&endTime=...&step=MINUTE&services=svc1,svc2" +``` + +**Query (instance):** +```bash +curl "http://${oap_host}:${oap_12800}/debugging/query/topology/getServiceInstanceTopology?startTime=...&endTime=...&step=MINUTE&clientService=svc1&serverService=svc2&clientServiceLayer=GENERAL&serverServiceLayer=GENERAL" +``` + +**Query (endpoint):** +```bash +curl "http://${oap_host}:${oap_12800}/debugging/query/topology/getEndpointDependencies?startTime=...&endTime=...&step=MINUTE&service=svc1&serviceLayer=GENERAL&endpoint=POST:/users" +``` + +**Query (process):** +```bash +curl "http://${oap_host}:${oap_12800}/debugging/query/topology/getProcessTopology?startTime=...&endTime=...&step=MINUTE&service=svc1&serviceLayer=GENERAL&instance=inst1" +``` + +### Zipkin Trace Debug + +**Query (traces):** +```bash +curl "http://${oap_host}:${oap_12800}/debugging/query/zipkin/api/v2/traces?serviceName=frontend&limit=10" +``` + +**Parameters:** `serviceName`, `remoteServiceName`, `spanName`, `annotationQuery`, `minDuration`, `maxDuration`, `endTs`, `lookback`, `limit`(default 10) + +**Query (single trace):** +```bash +curl "http://${oap_host}:${oap_12800}/debugging/query/zipkin/api/v2/trace?traceId=abc123" +``` + +### Log Debug + +**Query:** +```bash +curl "http://${oap_host}:${oap_12800}/debugging/query/log/queryLogs?service=svc1&serviceLayer=GENERAL&startTime=...&endTime=...&step=MINUTE&pageNum=1&pageSize=10" +``` + +**Parameters:** `service`, `serviceLayer`, `serviceInstance`, `endpoint`, `startTime`, `endTime`, `step`, `coldStage`, 
`traceId`, `segmentId`, `spanId`, `tags`, `pageNum`, `pageSize`, `keywordsOfContent`, `excludingKeywordsOfContent`, `queryOrder` + +### Config Dump + +**Query:** +```bash +curl http://${oap_host}:${oap_12800}/debugging/config/dump +``` + +Returns all booting configurations with secret values masked. + +## Status Endpoints + +### TTL Config + +**Query:** +```bash +curl http://${oap_host}:${oap_12800}/status/config/ttl +``` + +**Expected:** +```yaml +metricsttl: + minute: {{ ge .metricsttl.minute 0 }} + hour: {{ ge .metricsttl.hour 0 }} + day: {{ ge .metricsttl.day 0 }} +recordsttl: + normal: {{ ge .recordsttl.normal 0 }} + trace: {{ ge .recordsttl.trace 0 }} + log: {{ ge .recordsttl.log 0 }} +``` + +### Cluster Node List + +**Query:** +```bash +curl http://${oap_host}:${oap_12800}/status/cluster/nodes +``` + +**Expected:** +```yaml +nodes: + {{- contains .nodes }} + - {{ notEmpty . }} + {{- end }} +``` + +### Alarm Rules + +**Query (all rules):** +```bash +curl http://${oap_host}:${oap_12800}/status/alarm/rules +``` + +**Query (specific rule):** +```bash +curl http://${oap_host}:${oap_12800}/status/alarm/{ruleId} +``` + +**Query (rule context for entity):** +```bash +curl http://${oap_host}:${oap_12800}/status/alarm/{ruleId}/{entityName} +``` + +**Expected (cluster response wrapper):** +```yaml +oapinstances: + {{- contains .oapinstances }} + - address: {{ notEmpty .address }} + status: {{ notEmpty .status }} + errormsg: null + {{- end }} +``` + +## Debugging Trace Structure + +All `/debugging/` endpoints include a `debuggingtrace` field in the response. 
This is the OAP internal execution trace (same as GraphQL `DebuggingTrace` type from `common.graphqls`): + +```yaml +debuggingtrace: + traceid: "unique-trace-id" + condition: "query condition string" + starttime: 1234567890000000 # nanoseconds + endtime: 1234567891000000 + duration: 1000000 + spans: + - spanid: 0 + parentspanid: -1 + operation: "TopologyQueryService.getGlobalTopology" + starttime: 1234567890000000 + endtime: 1234567891000000 + duration: 1000000 + msg: "optional message" + error: null +``` + +The span hierarchy mirrors the internal call chain, useful for diagnosing slow queries or storage issues. diff --git a/test/e2e-v2/e2e-expectation-traceql-zipkin.md b/test/e2e-v2/e2e-expectation-traceql-zipkin.md new file mode 100644 index 0000000000..75947590f8 --- /dev/null +++ b/test/e2e-v2/e2e-expectation-traceql-zipkin.md @@ -0,0 +1,357 @@ +# TraceQL & Zipkin Expected File Patterns + +SkyWalking implements both a TraceQL search API and a Zipkin v2-compatible trace API. + +## TraceQL API + +**Port:** `${oap_3200}` — Base URL: `http://${oap_host}:${oap_3200}/zipkin/api/` + +### Build Info + +**Query:** +```bash +curl -X GET http://${oap_host}:${oap_3200}/zipkin/api/status/buildinfo +``` + +**Expected:** +```yaml +status: experimental +version: {{ notEmpty .version }} +``` + +### Search Tags (v1) + +**Query:** +```bash +curl -X GET http://${oap_host}:${oap_3200}/zipkin/api/search/tags \ + -d 'start='$(($(date +%s)-1800))'&end='$(date +%s) +``` + +**Expected:** +```yaml +{{- contains . 
}} +- resource.service.name +- span.http.method +- span.http.path +- span.net.peer.name +{{- end}} +``` + +### Search Tags (v2 — Scoped) + +**Query:** +```bash +curl -X GET http://${oap_host}:${oap_3200}/zipkin/api/v2/search/tags \ + -d 'start='$(($(date +%s)-1800))'&end='$(date +%s) +``` + +**Expected:** +```yaml +scopes: + {{- contains .scopes }} + - name: resource + tags: + {{- contains .tags }} + - service + {{- end }} + - name: span + tags: + {{- contains .tags }} + - {{ notEmpty . }} + {{- end }} + {{- end }} +``` + +### Tag Values + +**Query:** +```bash +curl -X GET http://${oap_host}:${oap_3200}/zipkin/api/v2/search/tag/resource.service.name/values \ + -d 'start='$(($(date +%s)-1800))'&end='$(date +%s) +``` + +**Expected:** +```yaml +tagValues: + {{- contains .tagValues }} + - type: string + value: frontend + - type: string + value: backend + {{- end }} +``` + +### Search Traces by Service Name + +**Query:** +```bash +curl -X GET http://${oap_host}:${oap_3200}/zipkin/api/search \ + -d 'q={resource.service.name="frontend"}&start='$(($(date +%s)-1800))'&end='$(date +%s)'&limit=10' +``` + +**Expected:** +```yaml +traces: + {{- contains .traces }} + - traceID: {{ notEmpty .traceID }} + rootServiceName: {{ notEmpty .rootServiceName }} + rootTraceName: {{ notEmpty .rootTraceName }} + startTimeUnixNano: "{{ notEmpty .startTimeUnixNano }}" + durationMs: {{ gt .durationMs -1 }} + spanSets: + {{- contains .spanSets }} + - matched: {{ gt .matched 0 }} + spans: + {{- contains .spans }} + - spanID: {{ notEmpty .spanID }} + startTimeUnixNano: "{{ .startTimeUnixNano }}" + durationNanos: "{{ .durationNanos }}" + attributes: + {{- contains .attributes }} + - key: {{ notEmpty .key }} + value: + stringValue: {{ notEmpty .value.stringValue }} + {{- end }} + {{- end }} + {{- end }} + {{- end }} +``` + +### Search Traces by Duration + +**Query:** +```bash +curl -X GET http://${oap_host}:${oap_3200}/zipkin/api/search \ + -d 'q={duration>1ms}&start='$(($(date 
+%s)-1800))'&end='$(date +%s)'&limit=1' +``` + +**Expected:** Same structure as service name search above. + +### Complex TraceQL Query + +**Query:** +```bash +curl -X GET http://${oap_host}:${oap_3200}/zipkin/api/search \ + -d 'q={resource.service.name="frontend" && span.http.method="GET"}&start='$(($(date +%s)-1800))'&end='$(date +%s)'&limit=10' +``` + +**Expected:** Same structure as service name search above. + +### Trace by ID (JSON — OTLP Format) + +**Query:** +```bash +TRACE_ID=$(curl -X GET http://${oap_host}:${oap_3200}/zipkin/api/search \ + -d 'q={resource.service.name="frontend"}&start=...&end=...&limit=1' | jq -r '.traces[0].traceID // empty') +curl -X GET http://${oap_host}:${oap_3200}/zipkin/api/v2/traces/${TRACE_ID} \ + -H "Accept: application/json" +``` + +**Expected (OTLP format):** +```yaml +trace: + resourceSpans: + {{- contains .trace.resourceSpans }} + - resource: + attributes: + - key: service.name + value: + stringValue: backend + scopeSpans: + {{- contains .scopeSpans }} + - scope: + name: zipkin-tracer + version: "0.1.0" + spans: + {{- contains .spans }} + - traceId: {{ notEmpty .traceId }} + spanId: {{ notEmpty .spanId }} + parentSpanId: {{ notEmpty .parentSpanId }} + name: get /api + kind: SPAN_KIND_SERVER + startTimeUnixNano: "{{ notEmpty .startTimeUnixNano }}" + endTimeUnixNano: "{{ notEmpty .endTimeUnixNano }}" + attributes: + {{- contains .attributes }} + - key: http.method + value: + stringValue: GET + - key: http.path + value: + stringValue: /api + - key: net.host.ip + value: + stringValue: {{ notEmpty .value.stringValue }} + {{- end }} + events: + - timeUnixNano: "{{ notEmpty .timeUnixNano }}" + name: wr + attributes: [] + - timeUnixNano: "{{ notEmpty .timeUnixNano }}" + name: ws + attributes: [] + status: + code: STATUS_CODE_UNSET + {{- end }} + {{- end }} + {{- end }} +``` + +### TraceQL Query Syntax + +``` +{resource.service.name="frontend"} # service filter +{duration>1ms} # duration filter +{resource.service.name="frontend" && 
span.http.method="GET"} # compound filter +``` + +### TraceQL Search Parameters + +| Parameter | Description | Example | +|-----------|-------------|---------| +| `q` | TraceQL query expression | `{resource.service.name="frontend"}` | +| `start` | Start timestamp (seconds) | `1700000000` | +| `end` | End timestamp (seconds) | `1700001800` | +| `limit` | Max number of traces | `10` | + +--- + +## Zipkin v2 API + +**Port:** `${oap_9412}` — Base URL: `http://${oap_host}:${oap_9412}/zipkin/api/v2/` + +### Service List + +**Query:** +```bash +curl http://${oap_host}:${oap_9412}/zipkin/api/v2/services +``` + +**Expected:** +```yaml +[ + "backend", + "frontend" +] +``` + +### Span Names + +**Query:** +```bash +curl "http://${oap_host}:${oap_9412}/zipkin/api/v2/spans?serviceName=frontend" +``` + +**Expected:** +```yaml +{{- contains . }} +- get +- post / +{{- end }} +``` + +### Autocomplete Keys + +**Query:** +```bash +curl http://${oap_host}:${oap_9412}/zipkin/api/v2/autocompleteKeys +``` + +**Expected:** +```yaml +{{- contains . }} +- http.method +- http.path +{{- end }} +``` + +### Autocomplete Values + +**Query:** +```bash +curl "http://${oap_host}:${oap_9412}/zipkin/api/v2/autocompleteValues?key=http.method" +``` + +**Expected:** +```yaml +{{- contains . }} +- GET +- POST +{{- end }} +``` + +### Trace Search + +**Query:** +```bash +curl "http://${oap_host}:${oap_9412}/zipkin/api/v2/traces?serviceName=frontend&remoteServiceName=backend&spanName=get&annotationQuery=wr&limit=1" +``` + +**Expected (Zipkin span format):** +```yaml +{{- contains . 
}} +- traceId: {{ notEmpty .traceId }} + id: {{ notEmpty .id }} + kind: SERVER + name: get /api + timestamp: {{ ge .timestamp 0 }} + duration: {{ ge .duration 0 }} + localEndpoint: + serviceName: backend + ipv4: {{ notEmpty .localEndpoint.ipv4 }} + remoteEndpoint: + ipv4: {{ notEmpty .remoteEndpoint.ipv4 }} + port: {{ ge .remoteEndpoint.port 0 }} + annotations: + {{- contains .annotations }} + - timestamp: {{ ge .timestamp 0 }} + value: wr + - timestamp: {{ ge .timestamp 0 }} + value: ws + {{- end }} + tags: + http.method: GET + http.path: /api +{{- end }} +``` + +### Zipkin Span Structure + +| Field | Description | +|-------|-------------| +| `traceId` | 128-bit trace identifier | +| `id` | Span identifier | +| `parentId` | Parent span identifier (absent for root spans) | +| `kind` | `SERVER`, `CLIENT`, `PRODUCER`, `CONSUMER` | +| `name` | Operation name | +| `timestamp` | Start time in **microseconds** | +| `duration` | Duration in **microseconds** | +| `localEndpoint` | `{serviceName, ipv4, port}` | +| `remoteEndpoint` | `{serviceName, ipv4, port}` | +| `annotations` | Array of `{timestamp, value}` events | +| `tags` | Key-value map of span tags | + +### Zipkin Search Parameters + +| Parameter | Description | +|-----------|-------------| +| `serviceName` | Filter by service | +| `remoteServiceName` | Filter by downstream service | +| `spanName` | Filter by span operation | +| `annotationQuery` | Filter by annotation value | +| `minDuration` | Minimum span duration | +| `maxDuration` | Maximum span duration | +| `limit` | Max traces returned | +| `endTs` | End timestamp (milliseconds) | +| `lookback` | Lookback period (milliseconds) | + +## Key Differences Between TraceQL and Zipkin + +| Aspect | TraceQL (port 3200) | Zipkin v2 (port 9412) | +|--------|--------------------|-----------------------| +| Response format | OTLP-like (`resourceSpans`, `scopeSpans`) | Zipkin native (flat span array) | +| Timestamp unit | Nanoseconds | Microseconds | +| Search syntax | 
TraceQL expressions: `{attr="val"}` | URL query parameters | +| Trace detail | `resourceSpans` with `scopeSpans` nesting | Flat list of spans | +| Span kind | `SPAN_KIND_SERVER`, `SPAN_KIND_CLIENT` | `SERVER`, `CLIENT` | +| Tags/attributes | `attributes: [{key, value: {stringValue}}]` | `tags: {key: value}` | diff --git a/test/e2e-v2/e2e-expectation-zipkin.md b/test/e2e-v2/e2e-expectation-zipkin.md new file mode 100644 index 0000000000..04c578a05d --- /dev/null +++ b/test/e2e-v2/e2e-expectation-zipkin.md @@ -0,0 +1,186 @@ +# Zipkin v2 Query Expected File Patterns + +SkyWalking implements the native Zipkin v2 query API, compatible with the Zipkin ecosystem (Zipkin UI, Zipkin clients). + +**Port:** `${oap_9412}` — Base URL: `http://${oap_host}:${oap_9412}/zipkin` +**Module:** `zipkin-query-plugin` (`query-zipkin`) +**Handler:** `ZipkinQueryHandler` — reference: `zipkin2.server.internal.ZipkinQueryApiV2` + +## Endpoints Reference + +| Path | Method | Parameters | Response | +|------|--------|------------|----------| +| `/api/v2/services` | GET | — | `[String]` — service names | +| `/api/v2/remoteServices` | GET | `serviceName` (required) | `[String]` — remote service names | +| `/api/v2/spans` | GET | `serviceName` (required) | `[String]` — span names | +| `/api/v2/trace/{traceId}` | GET | `traceId` (path) | `[Span]` — Zipkin v2 spans | +| `/api/v2/traces` | GET | `serviceName`, `remoteServiceName`, `spanName`, `annotationQuery`, `minDuration`(ms), `maxDuration`(ms), `endTs`(ms), `lookback`(ms), `limit`(default 10) | `[[Span]]` — list of traces | +| `/api/v2/traceMany` | GET | `traceIds` (comma-separated) | `[[Span]]` — batch trace query | +| `/api/v2/autocompleteKeys` | GET | — | `[String]` — available tag keys | +| `/api/v2/autocompleteValues` | GET | `key` (required) | `[String]` — values for tag key | +| `/config.json` | GET | — | UI configuration object | + +## Zipkin v2 Span Format + +Each span in the response follows the Zipkin v2 JSON format: +``` +{ + 
traceId: String # 128-bit trace identifier + id: String # span identifier + parentId: String # parent span (absent for root) + kind: SERVER|CLIENT|PRODUCER|CONSUMER + name: String # operation name + timestamp: Long # start time in MICROSECONDS + duration: Long # duration in MICROSECONDS + localEndpoint: {serviceName, ipv4, port} + remoteEndpoint: {serviceName, ipv4, port} + annotations: [{timestamp, value}] # timestamped events + tags: {key: value, ...} # string key-value map +} +``` + +## Service List + +**Query:** +```bash +curl http://${oap_host}:${oap_9412}/zipkin/api/v2/services +``` + +**Expected:** +```yaml +[ + "backend", + "frontend" +] +``` + +## Remote Services + +**Query:** +```bash +curl "http://${oap_host}:${oap_9412}/zipkin/api/v2/remoteServices?serviceName=frontend" +``` + +**Expected:** +```yaml +{{- contains . }} +- backend +{{- end }} +``` + +## Span Names + +**Query:** +```bash +curl "http://${oap_host}:${oap_9412}/zipkin/api/v2/spans?serviceName=frontend" +``` + +**Expected:** +```yaml +{{- contains . }} +- get +- post / +{{- end }} +``` + +## Trace Search + +**Query:** +```bash +curl "http://${oap_host}:${oap_9412}/zipkin/api/v2/traces?serviceName=frontend&remoteServiceName=backend&spanName=get&annotationQuery=wr&limit=1" +``` + +**Expected:** +```yaml +{{- contains . 
}} +- traceId: {{ notEmpty .traceId }} + id: {{ notEmpty .id }} + kind: SERVER + name: get /api + timestamp: {{ ge .timestamp 0 }} + duration: {{ ge .duration 0 }} + localEndpoint: + serviceName: backend + ipv4: {{ notEmpty .localEndpoint.ipv4 }} + remoteEndpoint: + ipv4: {{ notEmpty .remoteEndpoint.ipv4 }} + port: {{ ge .remoteEndpoint.port 0 }} + annotations: + {{- contains .annotations }} + - timestamp: {{ ge .timestamp 0 }} + value: wr + - timestamp: {{ ge .timestamp 0 }} + value: ws + {{- end }} + tags: + http.method: GET + http.path: /api +{{- end }} +``` + +## Trace Search with Duration Filter + +**Query:** +```bash +curl "http://${oap_host}:${oap_9412}/zipkin/api/v2/traces?minDuration=1000&limit=1" +``` + +**Expected:** Same Zipkin span structure as above. + +## Autocomplete Keys + +**Query:** +```bash +curl http://${oap_host}:${oap_9412}/zipkin/api/v2/autocompleteKeys +``` + +**Expected:** +```yaml +{{- contains . }} +- http.method +- http.path +{{- end }} +``` + +## Autocomplete Values + +**Query:** +```bash +curl "http://${oap_host}:${oap_9412}/zipkin/api/v2/autocompleteValues?key=http.method" +``` + +**Expected:** +```yaml +{{- contains . 
}} +- GET +- POST +{{- end }} +``` + +## Key Differences from TraceQL (port 3200) + +| Aspect | Zipkin v2 (this doc, port 9412) | TraceQL (port 3200) | +|--------|--------------------------------|---------------------| +| API style | Zipkin v2 native REST | Grafana Tempo-compatible | +| Response format | Flat span array, Zipkin JSON | OTLP-like (`resourceSpans` / `scopeSpans`) | +| Timestamp unit | Microseconds | Nanoseconds | +| Search | URL query parameters | TraceQL expressions `{attr="val"}` | +| Span kind values | `SERVER`, `CLIENT` | `SPAN_KIND_SERVER`, `SPAN_KIND_CLIENT` | +| Tags format | `tags: {key: value}` (flat map) | `attributes: [{key, value: {stringValue}}]` (array) | +| Attached events | Annotations + extra `attachedEvents` field | Not applicable | + +## SpanAttachedEvent Support + +The Zipkin query handler can append `SpanAttachedEvent` data (from SkyWalking Rover/eBPF) as extra annotations and tags on Zipkin spans. When querying `/api/v2/trace/{traceId}`, if attached events exist: +- Events are appended as annotations with their event names +- Event summaries and tags are encoded in span tags + +## Configuration + +| Property | Env Var | Default | Description | +|----------|---------|---------|-------------| +| `restHost` | `SW_QUERY_ZIPKIN_REST_HOST` | `0.0.0.0` | HTTP server host | +| `restPort` | `SW_QUERY_ZIPKIN_REST_PORT` | `9412` | HTTP server port | +| `restContextPath` | `SW_QUERY_ZIPKIN_REST_CONTEXT_PATH` | `/zipkin` | URL context path | +| `lookback` | — | — | Default lookback duration (ms) | +| `uiQueryLimit` | — | — | Max query limit for UI |
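
As a minimal sketch of how the defaults in the table compose into the base URL used throughout this document, the helper below falls back to `9412` and `/zipkin` when the environment variables are unset. `zipkin_base_url` is a hypothetical name, and reading the OAP server's variables from a client-side script is an assumption made for illustration; in a compose-based e2e run the mapped port `${oap_9412}` is what the queries above use.

```shell
# Hypothetical helper: derive the Zipkin query base URL from the environment
# variables listed in the table, falling back to the documented defaults.
zipkin_base_url() {
  local host="$1"
  local port="${SW_QUERY_ZIPKIN_REST_PORT:-9412}"
  local context_path="${SW_QUERY_ZIPKIN_REST_CONTEXT_PATH:-/zipkin}"
  printf 'http://%s:%s%s' "$host" "$port" "$context_path"
}

base_url="$(zipkin_base_url localhost)"
echo "${base_url}/api/v2/services"
```

Changing `restContextPath` moves every endpoint in the reference table at once, which is why the paths there are listed relative to the context path.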
