This is an automated email from the ASF dual-hosted git repository.

hanahmily pushed a commit to branch test/replication
in repository https://gitbox.apache.org/repos/asf/skywalking-banyandb.git

commit 1a64a4f4e1fae278080c7649758c9c57f7773a63
Author: Hongtao Gao <[email protected]>
AuthorDate: Fri Mar 20 03:12:07 2026 +0000

    docs: add replication integration tests design spec
    
    Co-Authored-By: Claude Opus 4.6 <[email protected]>
---
 ...6-03-20-replication-integration-tests-design.md | 207 +++++++++++++++++++++
 1 file changed, 207 insertions(+)

diff --git a/docs/superpowers/specs/2026-03-20-replication-integration-tests-design.md b/docs/superpowers/specs/2026-03-20-replication-integration-tests-design.md
new file mode 100644
index 000000000..bdbaf50af
--- /dev/null
+++ b/docs/superpowers/specs/2026-03-20-replication-integration-tests-design.md
@@ -0,0 +1,207 @@
+# Replication Integration Tests Design
+
+## Overview
+
+Add comprehensive replication integration tests for BanyanDB that verify:
+1. **Deduplication**: Consistent unique results from multiple replicas
+2. **Accessibility**: Queries continue working after a node failure
+3. **Recovery**: Failed nodes recover and resync data via handoff queue
+
+## Background
+
+GitHub issue [#13229](https://github.com/apache/skywalking/issues/13229) requests integration tests for BanyanDB replication resilience. The existing replication test covers only the `service_traffic` measure, which uses `index_mode: true` in a replicated group.
+
+## Scope
+
+This design adds replication tests for:
+- **Stream** - replicated stream schemas
+- **Measure (normal mode)** - measures without `index_mode: true`
+- **Trace** - replicated trace schemas
+- **TopN** - replicated TopN schemas (runs on top of measures)
+
+## Design Principles
+
+1. **No test case copying** - Existing test cases in `test/cases/measure/data/`, `test/cases/stream/data/`, etc. are reused
+2. **Same schema names** - Measures/streams/traces keep their original names
+3. **Separate schema location** - Replicated schemas live in a new directory, `pkg/test/replicated/`
+4. **Isolation** - The replication test suite loads schemas only from `pkg/test/replicated/`
+
+## Schema Location
+
+```
+pkg/test/replicated/
+├── measure/
+│   └── testdata/
+│       ├── groups/
+│       │   ├── sw_metric.json                 # replicas: 2
+│       │   ├── index_mode.json                # replicas: 2
+│       │   └── exception.json                 # replicas: 2
+│       └── measures/
+│           ├── service_traffic.json           # index_mode: true
+│           ├── service_cpm_minute.json        # normal mode
+│           ├── service_instance_traffic.json
+│           └── ... (all existing measures, same names)
+├── stream/
+│   └── testdata/
+│       ├── group.json                         # replicas: 2
+│       └── streams/
+│           ├── sw.json
+│           ├── duplicated.json
+│           └── ... (all existing streams)
+├── trace/
+│   └── testdata/
+│       └── groups/
+│           ├── test-trace-group.json          # replicas: 2
+│           └── zipkin-trace-group.json        # replicas: 2
+└── topn/
+    └── testdata/
+        └── topnaggregations/
+            └── ... (existing TopN schemas)
+```
+
+## Schema Design
+
+### Measure Groups
+
+| Original Group | Replicated Group | Replicas |
+|----------------|------------------|----------|
+| sw_metric | sw_metric | 2 |
+| index_mode | index_mode | 2 |
+| exception | exception | 2 |
+
+### Stream Groups
+
+| Original Group | Replicated Group | Replicas |
+|----------------|------------------|----------|
+| default | default | 2 |
+| default-spec | default-spec | 2 |
+| default-spec2 | default-spec2 | 2 |
+
+### Trace Groups
+
+| Original Group | Replicated Group | Replicas |
+|----------------|------------------|----------|
+| test-trace-group | test-trace-group | 2 |
+| zipkinTrace | zipkinTrace | 2 |
+| test-trace-updated | test-trace-updated | 2 |
+| test-trace-spec | test-trace-spec | 2 |
+| test-trace-spec2 | test-trace-spec2 | 2 |
+
+## Test Files to Create
+
+### Schema Files (pkg/test/replicated/)
+
+#### Measure Schemas
+- `pkg/test/replicated/measure/testdata/groups/sw_metric.json` (copy from `pkg/test/measure/testdata/groups/sw_metric.json` with `replicas: 2`)
+- `pkg/test/replicated/measure/testdata/groups/index_mode.json` (copy with `replicas: 2`)
+- `pkg/test/replicated/measure/testdata/groups/exception.json` (copy with `replicas: 2`)
+- `pkg/test/replicated/measure/testdata/measures/*.json` (all measure schemas)
+
+#### Stream Schemas
+- `pkg/test/replicated/stream/testdata/group.json` (copy with `replicas: 2`)
+- `pkg/test/replicated/stream/testdata/group_with_stages.json` (copy with `replicas: 2`)
+- `pkg/test/replicated/stream/testdata/streams/*.json` (all stream schemas)
+
+#### Trace Schemas
+- `pkg/test/replicated/trace/testdata/groups/test-trace-group.json` (copy with `replicas: 2`)
+- `pkg/test/replicated/trace/testdata/groups/test-trace-updated.json` (copy with `replicas: 2`)
+- `pkg/test/replicated/trace/testdata/groups/zipkin-trace-group.json` (copy with `replicas: 2`)
+- `pkg/test/replicated/trace/testdata/groups/test-trace-spec.json` (copy with `replicas: 2`)
+- `pkg/test/replicated/trace/testdata/groups/test-trace-spec2.json` (copy with `replicas: 2`)
+
+#### TopN Schemas
+- `pkg/test/replicated/topn/testdata/topnaggregations/*.json` (all TopN schemas)
+
+### Go Files (pkg/test/replicated/)
+
+#### etcd.go Preload Functions
+- `pkg/test/replicated/measure/etcd.go` - `PreloadSchema(ctx, registry)` for replicated measures
+- `pkg/test/replicated/stream/etcd.go` - `PreloadSchema(ctx, registry)` for replicated streams
+- `pkg/test/replicated/trace/etcd.go` - `PreloadSchema(ctx, registry)` for replicated traces
+
+### Integration Test Files (test/integration/replication/)
+
+#### New Test Files
+- `test/integration/replication/measure_normal_replication_test.go` - Normal mode measure replication tests
+- `test/integration/replication/stream_replication_test.go` - Stream replication tests
+- `test/integration/replication/trace_replication_test.go` - Trace replication tests
+
+#### Modified Files
+- `test/integration/replication/replication_suite_test.go` - Add preloading of replicated schemas
+- `test/cases/init.go` - Add data initialization for replicated schemas
+
+## Extended Test Pattern
+
+For each data type (measure, stream, trace), the test verifies:
+
+### 1. Deduplication Verification
+```go
+// Query the same data multiple times
+for i := 0; i < 3; i++ {
+    resp := query()
+    // Verify results are consistent (deduplication works across replicas)
+}
+```
+
+### 2. Accessibility Verification
+```go
+// Stop one data node
+closersToStop[0]()
+
+// Wait for cluster stabilization
+gm.Eventually(func() int {
+    nodes, _ := helpers.ListKeys(etcdEndpoint, nodePath)
+    return len(nodes)
+}, flags.EventuallyTimeout).Should(gm.Equal(3))
+
+// Verify queries still work
+verifyQueryResults(conn, now)
+```
+
+### 3. Recovery Verification
+```go
+// Restart the stopped node
+restartedNode := startDataNode(config)
+
+// Wait for handoff queue to drain
+gm.Eventually(func() int {
+    return countPendingParts(liaisonAddr)
+}, flags.EventuallyTimeout).Should(gm.Equal(0))
+
+// Verify data is fully recovered
+verifyQueryResults(conn, now)
+```
+
+## Implementation Order
+
+1. **Measure (normal mode)** - Create `pkg/test/replicated/measure/` and add measure replication tests
+2. **Stream** - Create `pkg/test/replicated/stream/` and add stream replication tests
+3. **Trace** - Create `pkg/test/replicated/trace/` and add trace replication tests
+4. **TopN** - Create `pkg/test/replicated/topn/` and add TopN replication tests
+
+## Data Initialization
+
+The `test/cases/init.go` file will be extended to initialize data for replicated schemas:
+
+```go
+// replicated measure
+casesmeasuredata.Write(conn, "service_traffic", "index_mode", "service_traffic_data.json", now, interval)
+casesmeasuredata.Write(conn, "service_cpm_minute", "sw_metric", "service_cpm_minute_data.json", now, interval)
+// ... (all measures written to their replicated groups)
+
+// replicated stream
+casesstreamdata.Write(conn, "sw", now, interval)
+// ... (all streams written to their replicated groups)
+
+// replicated trace
+casestrace.WriteToGroup(conn, "sw", "test-trace-group", "sw", now, interval)
+// ... (all traces written to their replicated groups)
+```
+
+## Acceptance Criteria
+
+1. All existing test cases in `test/cases/measure/data/`, `test/cases/stream/data/`, `test/cases/trace/data/` can run against replicated schemas without modification
+2. Replication tests verify deduplication (consistent results from replicas)
+3. Replication tests verify accessibility (queries work after node failure)
+4. Replication tests verify recovery (handoff queue drains after node restart)
+5. Test infrastructure is isolated: replication tests use only `pkg/test/replicated/` schemas
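
A note on the "Deduplication Verification" step of the spec above: the loop there only states that results should be consistent. One way that check could be made explicit is sketched below. This is not taken from the BanyanDB code base. `assertConsistent` and the `query` callback are hypothetical names, and the `[]string` row type is a stand-in for whatever the real query response carries.

```go
package main

import (
	"fmt"
	"reflect"
	"sort"
)

// assertConsistent calls query `rounds` times and returns an error if a
// round contains duplicate rows or differs from the first round.
// Hypothetical helper: a real test would compare query responses rather
// than plain strings.
func assertConsistent(rounds int, query func() []string) error {
	var first []string
	for i := 0; i < rounds; i++ {
		// Copy before sorting so the caller's slice is left untouched.
		rows := append([]string(nil), query()...)
		sort.Strings(rows)
		// After sorting, duplicates from multiple replicas are adjacent.
		for j := 1; j < len(rows); j++ {
			if rows[j] == rows[j-1] {
				return fmt.Errorf("round %d: duplicate row %q", i, rows[j])
			}
		}
		if i == 0 {
			first = rows
			continue
		}
		if !reflect.DeepEqual(first, rows) {
			return fmt.Errorf("round %d: got %v, want %v", i, rows, first)
		}
	}
	return nil
}

func main() {
	// Stand-in for a query whose results have already been deduplicated.
	stable := func() []string { return []string{"b", "a"} }
	if err := assertConsistent(3, stable); err != nil {
		fmt.Println("inconsistent:", err)
		return
	}
	fmt.Println("consistent")
}
```

Sorting before comparing makes the check independent of which replica answered first. If result order is itself part of the contract being tested, the sort would need to be dropped.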
