This is an automated email from the ASF dual-hosted git repository.
liuhan pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/skywalking-banyandb.git
The following commit(s) were added to refs/heads/main by this push:
new 1958df1c3 Add a new common-issue documentation (#1014)
1958df1c3 is described below
commit 1958df1c3424f411be6a2aa6fbbfd43dfc199617
Author: mrproliu <[email protected]>
AuthorDate: Wed Mar 18 23:56:56 2026 +0800
Add a new common-issue documentation (#1014)
---
CHANGES.md | 1 +
docs/menu.yml | 2 +
docs/operation/troubleshooting/common-issues.md | 55 +++++++++++++++++++++++
docs/operation/troubleshooting/error-checklist.md | 1 +
4 files changed, 59 insertions(+)
diff --git a/CHANGES.md b/CHANGES.md
index 55bd0b4fe..b06b93753 100644
--- a/CHANGES.md
+++ b/CHANGES.md
@@ -53,6 +53,7 @@ Release Notes.
- Add design of KTM.
- Add FODC overview doc.
- Remove Java client doc, and recreate client APIs docs.
+- Add common issue documentation.
### Chores
diff --git a/docs/menu.yml b/docs/menu.yml
index aaee5c0a8..54cfefeeb 100644
--- a/docs/menu.yml
+++ b/docs/menu.yml
@@ -133,6 +133,8 @@ catalog:
path: "/operation/troubleshooting/overhead"
- name: "Troubleshooting Query"
path: "/operation/troubleshooting/query"
+ - name: "Common Issues"
+ path: "/operation/troubleshooting/common-issues"
- name: "Security"
path: "/operation/security"
- name: "Backup"
diff --git a/docs/operation/troubleshooting/common-issues.md
b/docs/operation/troubleshooting/common-issues.md
new file mode 100644
index 000000000..d950c98fc
--- /dev/null
+++ b/docs/operation/troubleshooting/common-issues.md
@@ -0,0 +1,55 @@
+# Common Issues
+
+This document covers known issues that users may encounter during normal
operation of BanyanDB. These issues are typically transient and resolve
automatically.
+
+## Group Not Found During Startup
+
+### Symptom
+
+After starting or restarting BanyanDB, you may see error logs like the
following.
+
+On **liaison** nodes:
+
+```
+{"level":"error","module":"SERVER-QUEUE-PUB-DATA","error":"failed to receive
response for chunk 0: rpc error: code = Unknown desc = failed to create part
handler: group zipkinTrace not
found","time":"2026-03-18T05:17:05Z","message":"chunk send failed, aborting
sync session"}
+{"level":"warn","module":"TRACE","error":"failed to sync streaming part:
failed to stream parts: failed to send chunk 0: failed to receive response for
chunk 0: rpc error: code = Unknown desc = failed to create part handler: group
zipkinTrace not
found","node":"banyandb-data-hot-1.banyandb-data-hot-headless.banyandb:17912","partID":30059,"partType":"core","time":"2026-03-18T05:17:05Z","message":"failed
to send part during replay"}
+```
+
+On **data** nodes:
+
+```
+{"level":"error","module":"TRACE","error":"group zipkinTrace not
found","group":"zipkinTrace","time":"2026-03-18T05:17:25Z","message":"failed to
load TSDB for group"}
+{"level":"error","module":"SERVER-QUEUE-SUB","error":"failed to create part
handler: group zipkinTrace not
found","session_id":"sync-1773811045574321157","time":"2026-03-18T05:17:25Z","message":"failed
to process chunk"}
+```
+
+The group name in the error (e.g., `zipkinTrace`) may vary depending on your
configuration. These errors appear in the logs during the initial startup phase
and affect both query and write operations temporarily.
+
+### Applicable Scenario
+
+This issue is most commonly seen when `schema-registry-mode` is set to
`property`, which is the **default** mode. However, it can also occur in `etcd`
mode when the node is under heavy CPU pressure.
+
+BanyanDB supports two schema registry modes:
+
+- **`property`** (default): Schema metadata is synchronized between nodes via
an internal property-based protocol. This mode does not require an external
etcd cluster. The startup delay is more pronounced in this mode due to the
push-pull synchronization mechanism.
+- **`etcd`**: Schema metadata is stored and distributed through etcd. This
mode generally provides faster schema availability upon startup, but under high
CPU pressure, nodes may still experience delays in loading the schema from etcd.
+
+### Cause
+
+In `property` mode, schema metadata is distributed through a hybrid push-pull
mechanism:
+
+- **Active push (watch)**: When a schema change occurs, the schema server
broadcasts the event to all connected clients in real time via a gRPC watch
stream.
+- **Periodic sync (pull)**: Each node also polls for schema updates at a
configurable interval (default: 30 seconds,
`--schema-property-client-sync-interval`) as a fallback to ensure eventual
consistency.
+
+During startup, there is an inherent delay before the node receives the full
schema. Two common factors contribute to this delay:
+
+1. **Schema propagation latency**: Both the active push and periodic sync may
have delays during startup, so the node may need to wait up to one full sync
cycle (default: 30 seconds, configurable via
`--schema-property-client-sync-interval`) before the group metadata becomes
available.
+2. **CPU resource constraints**: If the node is running in a
resource-constrained environment (e.g., containers with limited CPU), both the
watch stream establishment and the periodic sync process may take longer to
complete.
+
+### Resolution
+
+This is a **transient issue** that resolves automatically once the schema sync
completes. No manual intervention is required in most cases.
+
+- **Wait for sync**: The error should disappear within one or two sync
intervals (default: 30-60 seconds) after startup.
+- **Check resource allocation**: If the error persists for an extended period,
verify that the nodes have sufficient CPU and memory resources.
+- **Adjust sync interval**: If faster startup convergence is needed, you can
reduce the sync interval with `--schema-property-client-sync-interval` (e.g.,
`10s`), though this increases the background sync overhead.
+- **Switch to etcd mode**: For environments where immediate schema
availability at startup is critical, consider using
`--schema-registry-mode=etcd` with an external etcd cluster.
diff --git a/docs/operation/troubleshooting/error-checklist.md
b/docs/operation/troubleshooting/error-checklist.md
index 99fd173b5..b33007b87 100644
--- a/docs/operation/troubleshooting/error-checklist.md
+++ b/docs/operation/troubleshooting/error-checklist.md
@@ -42,3 +42,4 @@ Here's an expanded section on common issues for your BanyanDB
troubleshooting do
- [Troubleshooting No Data Issues](./no-data.md)
- [Troubleshooting Query Issues](./query.md)
- [Troubleshooting Installation Issues](./install.md)
+- [Common Issues](./common-issues.md)