(skywalking-banyandb) branch main updated: Add a new common-issue documentation (#1014)

liuhan Wed, 18 Mar 2026 08:57:21 -0700

This is an automated email from the ASF dual-hosted git repository.

liuhan pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/skywalking-banyandb.git



The following commit(s) were added to refs/heads/main by this push:
     new 1958df1c3 Add a new common-issue documentation (#1014)
1958df1c3 is described below

commit 1958df1c3424f411be6a2aa6fbbfd43dfc199617
Author: mrproliu <[email protected]>
AuthorDate: Wed Mar 18 23:56:56 2026 +0800

    Add a new common-issue documentation (#1014)
---
 CHANGES.md                                        |  1 +
 docs/menu.yml                                     |  2 +
 docs/operation/troubleshooting/common-issues.md   | 55 +++++++++++++++++++++++
 docs/operation/troubleshooting/error-checklist.md |  1 +
 4 files changed, 59 insertions(+)

diff --git a/CHANGES.md b/CHANGES.md
index 55bd0b4fe..b06b93753 100644
--- a/CHANGES.md
+++ b/CHANGES.md
@@ -53,6 +53,7 @@ Release Notes.
 - Add design of KTM.
 - Add FODC overview doc.
 - Remove Java client doc, and recreate client APIs docs.
+- Add common issue documentation.
 
 ### Chores
 
diff --git a/docs/menu.yml b/docs/menu.yml
index aaee5c0a8..54cfefeeb 100644
--- a/docs/menu.yml
+++ b/docs/menu.yml
@@ -133,6 +133,8 @@ catalog:
             path: "/operation/troubleshooting/overhead"
           - name: "Troubleshooting Query"
             path: "/operation/troubleshooting/query"
+          - name: "Common Issues"
+            path: "/operation/troubleshooting/common-issues"
       - name: "Security"
         path: "/operation/security"
       - name: "Backup"
diff --git a/docs/operation/troubleshooting/common-issues.md b/docs/operation/troubleshooting/common-issues.md
new file mode 100644
index 000000000..d950c98fc
--- /dev/null
+++ b/docs/operation/troubleshooting/common-issues.md
@@ -0,0 +1,55 @@
+# Common Issues
+
+This document covers known issues that users may encounter during normal operation of BanyanDB. These issues are typically transient and resolve automatically.
+
+## Group Not Found During Startup
+
+### Symptom
+
+After starting or restarting BanyanDB, you may see error logs like the following.
+
+On **liaison** nodes:
+
+```
+{"level":"error","module":"SERVER-QUEUE-PUB-DATA","error":"failed to receive response for chunk 0: rpc error: code = Unknown desc = failed to create part handler: group zipkinTrace not found","time":"2026-03-18T05:17:05Z","message":"chunk send failed, aborting sync session"}
+{"level":"warn","module":"TRACE","error":"failed to sync streaming part: failed to stream parts: failed to send chunk 0: failed to receive response for chunk 0: rpc error: code = Unknown desc = failed to create part handler: group zipkinTrace not found","node":"banyandb-data-hot-1.banyandb-data-hot-headless.banyandb:17912","partID":30059,"partType":"core","time":"2026-03-18T05:17:05Z","message":"failed to send part during replay"}
+```
+
+On **data** nodes:
+
+```
+{"level":"error","module":"TRACE","error":"group zipkinTrace not found","group":"zipkinTrace","time":"2026-03-18T05:17:25Z","message":"failed to load TSDB for group"}
+{"level":"error","module":"SERVER-QUEUE-SUB","error":"failed to create part handler: group zipkinTrace not found","session_id":"sync-1773811045574321157","time":"2026-03-18T05:17:25Z","message":"failed to process chunk"}
+```
+
+The group name in the error (e.g., `zipkinTrace`) may vary depending on your configuration. These errors appear in the logs during the initial startup phase and affect both query and write operations temporarily.
+
+### Applicable Scenario
+
+This issue is most commonly seen when `schema-registry-mode` is set to `property`, which is the **default** mode. However, it can also occur in `etcd` mode when the node is under heavy CPU pressure.
+
+BanyanDB supports two schema registry modes:
+
+- **`property`** (default): Schema metadata is synchronized between nodes via an internal property-based protocol. This mode does not require an external etcd cluster. The startup delay is more pronounced in this mode due to the push-pull synchronization mechanism.
+- **`etcd`**: Schema metadata is stored and distributed through etcd. This mode generally provides faster schema availability upon startup, but under high CPU pressure, nodes may still experience delays in loading the schema from etcd.
+
+### Cause
+
+In `property` mode, schema metadata is distributed through a hybrid push-pull mechanism:
+
+- **Active push (watch)**: When a schema change occurs, the schema server broadcasts the event to all connected clients in real time via a gRPC watch stream.
+- **Periodic sync (pull)**: Each node also polls for schema updates at a configurable interval (default: 30 seconds, `--schema-property-client-sync-interval`) as a fallback to ensure eventual consistency.
+
+During startup, there is an inherent delay before the node receives the full schema. Two common factors contribute to this delay:
+
+1. **Schema propagation latency**: Both the active push and periodic sync may have delays during startup, so the node may need to wait up to one full sync cycle (default: 30 seconds, configurable via `--schema-property-client-sync-interval`) before the group metadata becomes available.
+2. **CPU resource constraints**: If the node is running in a resource-constrained environment (e.g., containers with limited CPU), both the watch stream establishment and the periodic sync process may take longer to complete.
+
+### Resolution
+
+This is a **transient issue** that resolves automatically once the schema sync completes. No manual intervention is required in most cases.
+
+- **Wait for sync**: The error should disappear within one or two sync intervals (default: 30-60 seconds) after startup.
+- **Check resource allocation**: If the error persists for an extended period, verify that the nodes have sufficient CPU and memory resources.
+- **Adjust sync interval**: If faster startup convergence is needed, you can reduce the sync interval with `--schema-property-client-sync-interval` (e.g., `10s`), though this increases the background sync overhead.
+- **Switch to etcd mode**: For environments where immediate schema availability at startup is critical, consider using `--schema-registry-mode=etcd` with an external etcd cluster.
diff --git a/docs/operation/troubleshooting/error-checklist.md b/docs/operation/troubleshooting/error-checklist.md
index 99fd173b5..b33007b87 100644
--- a/docs/operation/troubleshooting/error-checklist.md
+++ b/docs/operation/troubleshooting/error-checklist.md
@@ -42,3 +42,4 @@ Here's an expanded section on common issues for your BanyanDB troubleshooting do
 - [Troubleshooting No Data Issues](./no-data.md)
 - [Troubleshooting Query Issues](./query.md)
 - [Troubleshooting Installation Issues](./install.md)
+- [Common Issues](./common-issues.md)
