This is an automated email from the ASF dual-hosted git repository.
liaoxin pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris-website.git
The following commit(s) were added to refs/heads/master by this push:
new 7a97e755304 [Doc] Add Stream Load in complex network environments documentation (#3288)
7a97e755304 is described below
commit 7a97e7553040065f5122fcc6cc74837eb4d9125b
Author: Xin Liao <[email protected]>
AuthorDate: Thu Jan 15 23:06:19 2026 +0800
[Doc] Add Stream Load in complex network environments documentation (#3288)
---
.../stream-load-in-complex-network.md | 134 +++++++++++++++++++++
.../instance-management/ADD-BACKEND.md | 16 +++
.../stream-load-in-complex-network.md | 134 +++++++++++++++++++++
.../instance-management/ADD-BACKEND.md | 16 +++
.../stream-load-in-complex-network.md | 134 +++++++++++++++++++++
.../instance-management/ADD-BACKEND.md | 16 +++
.../stream-load-in-complex-network.md | 134 +++++++++++++++++++++
.../instance-management/ADD-BACKEND.md | 16 +++
sidebars.ts | 1 +
.../stream-load-in-complex-network.md | 134 +++++++++++++++++++++
.../instance-management/ADD-BACKEND.md | 16 +++
.../stream-load-in-complex-network.md | 134 +++++++++++++++++++++
.../instance-management/ADD-BACKEND.md | 16 +++
versioned_sidebars/version-3.x-sidebars.json | 9 +-
versioned_sidebars/version-4.x-sidebars.json | 3 +-
15 files changed, 911 insertions(+), 2 deletions(-)
diff --git
a/docs/data-operate/import/load-internals/stream-load-in-complex-network.md
b/docs/data-operate/import/load-internals/stream-load-in-complex-network.md
new file mode 100644
index 00000000000..5f6710cbefa
--- /dev/null
+++ b/docs/data-operate/import/load-internals/stream-load-in-complex-network.md
@@ -0,0 +1,134 @@
+---
+{
+ "title": "Stream Load in Complex Network Environments",
+ "language": "en",
+ "description": "Best practices for Stream Load in complex network environments including public cloud, private cloud, and Kubernetes cross-cluster access scenarios."
+}
+---
+
+## Overview
+
+In complex network environments such as public cloud, private cloud, and
Kubernetes cross-cluster deployments, data import faces unique challenges. Load
balancers (LB) and network isolation (VPC internal/external access) can impact
both request routing flexibility and batch processing efficiency.
+
+Apache Doris addresses these challenges through two key features:
+- **Stream Load Multi-Endpoint Support**: Enables flexible configuration of
multiple network endpoints for BE nodes
+- **Group Commit LB Scheduling Optimization**: Ensures efficient batch
processing even when requests pass through load balancers
+
+## Background
+
+### Stream Load
+
+Stream Load is an HTTP-based data import method that supports JSON, CSV, and
other formats. As a push-based approach, clients send data directly to Backend
nodes (BE) via HTTP requests, bypassing the MySQL protocol. This design enables
high concurrency, low latency, and high throughput, making it ideal for
small-batch, frequent write scenarios.
+
+### Group Commit
+
+Group Commit optimizes throughput by combining multiple small requests into
larger batch operations on the server side, reducing disk I/O, lock contention,
and compaction overhead. For maximum efficiency, Group Commit requires requests
for the same table to be routed to the same BE node.
+
+### The Challenge
+
+In cloud environments, load balancers randomly distribute requests across BE
nodes. This breaks the "node affinity" required by Group Commit, causing
requests for the same table to scatter across different nodes. Tests show
throughput can drop 20-50% in high-concurrency scenarios due to this issue.
+
+## Stream Load Multi-Endpoint Support
+
+### Address Types
+
+Doris BE nodes support three address types to accommodate different network
access scenarios:
+
+| Address Type | Purpose | Example |
+|-------------|---------|---------|
+| `be_host` | Internal cluster communication | `192.168.1.1:9050` |
+| `public_endpoint` | External public access via LB or public IP | `11.10.20.12:8010` |
+| `private_endpoint` | Private access within a VPC or through a Kubernetes Service IP | `10.10.10.9:8020` |
+
+### Configuration
+
+Configure endpoints using SQL statements:
+
+```sql
+-- Add BE node with endpoints
+ALTER SYSTEM ADD BACKEND '192.168.1.1:9050' PROPERTIES(
+ 'tag.public_endpoint' = '11.10.20.12:8010',
+ 'tag.private_endpoint' = '10.10.10.9:8020'
+);
+
+-- Modify existing BE node endpoints
+ALTER SYSTEM MODIFY BACKEND '192.168.1.1:9050' SET (
+ 'tag.public_endpoint' = '11.10.20.12:8010',
+ 'tag.private_endpoint' = '10.10.10.9:8020'
+);
+```
+
+### Redirect Policy
+
+Control request routing using the `redirect-policy` HTTP header:
+
+| Policy | Behavior | Use Case |
+|--------|----------|----------|
+| `direct` | Routes to `be_host` | Internal low-latency communication, Pod-to-Pod |
+| `public` | Routes to `public_endpoint` | External access via public network |
+| `private` | Routes to `private_endpoint` | VPC internal or cross-cluster access |
+| Default (empty) | Auto-selects based on hostname matching | General use |
+
+**Default behavior:**
+1. If the request hostname matches the `public_endpoint` hostname, routes to `public_endpoint`
+2. Else if `private_endpoint` is configured, routes to `private_endpoint`
+3. Otherwise, falls back to `be_host`
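
The selection rules above can be sketched in Python. This is only an illustrative model, not the FE implementation: the `Backend` dataclass, the `hostname` helper, and the `select_endpoint` function are hypothetical names, and falling back to `be_host` when a requested endpoint is not configured is an assumption the document does not state.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Backend:
    """Hypothetical model of one BE node's addresses (values from the table above)."""
    be_host: str
    public_endpoint: Optional[str] = None
    private_endpoint: Optional[str] = None


def hostname(address: str) -> str:
    """Return the host part of a 'host:port' string."""
    return address.rsplit(":", 1)[0]


def select_endpoint(backend: Backend, request_host: str, policy: str = "") -> str:
    """Pick the redirect target for a Stream Load request."""
    if policy == "direct":
        return backend.be_host
    if policy == "public":
        # Assumption: fall back to be_host when no public endpoint is set.
        return backend.public_endpoint or backend.be_host
    if policy == "private":
        # Assumption: fall back to be_host when no private endpoint is set.
        return backend.private_endpoint or backend.be_host
    # Default (empty policy): hostname match first, then private, then be_host.
    # Assumption: unrecognized policy values are treated like the default.
    if backend.public_endpoint and hostname(backend.public_endpoint) == request_host:
        return backend.public_endpoint
    if backend.private_endpoint:
        return backend.private_endpoint
    return backend.be_host


be = Backend("192.168.1.1:9050", "11.10.20.12:8010", "10.10.10.9:8020")
print(select_endpoint(be, "11.10.20.12"))                  # request arrived on the public host
print(select_endpoint(be, "doris.example.com"))            # no match, private endpoint configured
print(select_endpoint(be, "doris.example.com", "direct"))  # explicit policy wins
```

Keeping the explicit policies ahead of the hostname check mirrors the table: the header, when present, overrides auto-selection.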
+
+**Example:**
+
+```bash
+curl --location-trusted -u user:pass \
+ -H "redirect-policy: private" \
+ -T data.csv \
+ http://doris.example.com:8030/api/db_name/table_name/_stream_load
+```
+
+### How It Works
+
+1. The client sends a Stream Load request to the FE, optionally with the `redirect-policy` header
+2. The FE selects a target address from the BE's address pool based on the policy
+3. The FE returns an HTTP redirect response pointing to the selected endpoint
+
+## Group Commit LB Scheduling Optimization
+
+### Two-Phase Forwarding
+
+To maintain Group Commit efficiency behind load balancers, Doris implements a
two-phase forwarding mechanism:
+
+**Phase 1: FE Redirect**
+- FE selects the appropriate endpoint based on `redirect-policy`
+- FE determines which BE node should handle the target table
+- The request is redirected through the LB, which distributes it to a random BE node
+
+**Phase 2: BE Forwarding**
+- If the receiving BE (BE1) is not the designated node for the table, BE1 forwards the request internally to the designated BE (BE2) via `be_host`
+- This ensures all requests for the same table reach the same node
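
The two phases above can be simulated with a short Python sketch. It is a toy model under stated assumptions: the node names in `BE_NODES` are hypothetical, and a CRC32 hash stands in for however the FE actually determines the designated BE for a table, which the document does not specify.

```python
import random
import zlib
from typing import Tuple

BE_NODES = ["be1", "be2", "be3"]  # hypothetical node names


def designated_be(table: str) -> str:
    """Assumption: a stable hash stands in for the FE's choice of BE for a table."""
    return BE_NODES[zlib.crc32(table.encode()) % len(BE_NODES)]


def load_balancer_pick(rng: random.Random) -> str:
    """Phase 1: the LB hands the redirected request to a random BE."""
    return rng.choice(BE_NODES)


def handle_on_be(receiving_be: str, table: str) -> Tuple[str, bool]:
    """Phase 2: return (BE that processes the load, whether it was forwarded)."""
    target = designated_be(table)
    return target, receiving_be != target


rng = random.Random(0)
results = [handle_on_be(load_balancer_pick(rng), "db_name.table_name") for _ in range(1000)]
processing_nodes = {node for node, _ in results}
forwarded = sum(1 for _, was_forwarded in results if was_forwarded)
print(processing_nodes)  # a single node, regardless of where the LB sent each request
print(forwarded)         # number of requests that needed the internal hop
```

In this model every request for the table is processed on one node, at the cost of an extra internal hop for the requests the LB sent elsewhere.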
+
+### Configuration Example
+
+```bash
+curl --location-trusted -u user:pass \
+ -H "redirect-policy: private" \
+ -H "group_commit: async_mode" \
+ -T data.csv \
+ http://doris.example.com:8030/api/db_name/table_name/_stream_load
+```
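
For clients that do not shell out to curl, the same headers can be assembled programmatically. The sketch below only builds the request object with Python's standard `urllib.request` and does not send it; `build_stream_load_request` is a hypothetical helper, and uploading the file body and following the FE redirect with credentials (what `--location-trusted` does in curl) are deliberately left out.

```python
import base64
import urllib.request


def build_stream_load_request(fe_url, user, password, redirect_policy="", group_commit=""):
    """Build (but do not send) a PUT request carrying the headers from the curl example."""
    request = urllib.request.Request(fe_url, method="PUT")
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request.add_header("Authorization", f"Basic {token}")
    if redirect_policy:
        request.add_header("redirect-policy", redirect_policy)
    if group_commit:
        request.add_header("group_commit", group_commit)
    return request


req = build_stream_load_request(
    "http://doris.example.com:8030/api/db_name/table_name/_stream_load",
    "user", "pass", redirect_policy="private", group_commit="async_mode",
)
print(req.get_method(), req.full_url)
# urllib stores header names capitalized; HTTP header names are case-insensitive.
print(req.header_items())
```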
+
+### Performance
+
+Two-phase forwarding adds only millisecond-level overhead, while Group Commit's batch processing improves throughput by 20-50% in high-concurrency scenarios.
+
+## Use Cases
+
+| Scenario | Configuration | Benefit |
+|----------|--------------|---------|
+| Real-time log ingestion | Group Commit + Multi-Endpoint | High throughput with flexible routing |
+| Cloud-native BI | `public_endpoint` for external access | Secure external user access |
+| Kubernetes cross-cluster | `private_endpoint` with Pod/Service IPs | Efficient cross-cluster communication |
+
+## Considerations
+
+- **Configuration planning**: Ensure endpoint addresses are correctly
configured, especially in Kubernetes environments
+- **Monitoring**: Use monitoring tools to track forwarding rates and
performance
+- **Version requirement**: These features require Doris 3.1.0 or later
diff --git
a/docs/sql-manual/sql-statements/cluster-management/instance-management/ADD-BACKEND.md
b/docs/sql-manual/sql-statements/cluster-management/instance-management/ADD-BACKEND.md
index 02c2b9f65ef..225f6397d85 100644
---
a/docs/sql-manual/sql-statements/cluster-management/instance-management/ADD-BACKEND.md
+++
b/docs/sql-manual/sql-statements/cluster-management/instance-management/ADD-BACKEND.md
@@ -33,6 +33,8 @@ ALTER SYSTEM ADD BACKEND
"<host>:<heartbeat_port>"[,"<host>:<heartbeat_port>" [,
> A set of key-value pairs used to define additional properties of the BE
> node. These properties can be used to customize the configuration of the BE
> being added. Available properties include:
> - `tag.location`: Used to specify the Resource Group to which the BE node
> belongs in the integrated storage and computing mode.
> - `tag.compute_group_name`: Used to specify the compute group to which the
> BE node belongs in the decoupling storage and computing mode.
+> - `tag.public_endpoint`: Used to specify the public endpoint of the BE node for external access (e.g., `11.10.20.12:8010`). This is typically a load balancer domain name or public IP.
+> - `tag.private_endpoint`: Used to specify the private endpoint of the BE node for private network access (e.g., `10.10.10.9:8020`). This is typically used for access within a VPC or through a Kubernetes Service IP within the cluster.
## Access Control Requirements
@@ -73,3 +75,17 @@ The user executing this SQL must have at least the following
permissions:
ALTER SYSTEM ADD BACKEND "192.168.0.3:9050" PROPERTIES
("tag.compute_group_name" = "cloud_groupc");
```
This command adds a single BE node (IP 192.168.0.3, port 9050) to the
compute group `cloud_groupc` in the cluster.
+
+4. Add a BE node with public and private endpoints configured for complex
network environments
+ ```sql
+ ALTER SYSTEM ADD BACKEND "192.168.1.1:9050" PROPERTIES (
+ "tag.public_endpoint" = "11.10.20.12:8010",
+ "tag.private_endpoint" = "10.10.10.9:8020"
+ );
+ ```
+ This command adds a BE node with multiple network endpoints:
+ * `192.168.1.1:9050`: The internal address (be_host) for cluster communication
+ * `11.10.20.12:8010`: The public endpoint for external user access through a load balancer
+ * `10.10.10.9:8020`: The private endpoint for VPC internal or Kubernetes cross-cluster access
+
+ This configuration is useful in cloud environments or Kubernetes clusters
where BE nodes need to be accessible from different network contexts. For more
details, see [Stream Load in Complex Network
Environments](../../../../data-operate/import/load-internals/stream-load-in-complex-network.md).
diff --git
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/load-internals/stream-load-in-complex-network.md
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/load-internals/stream-load-in-complex-network.md
new file mode 100644
index 00000000000..a836034ba40
--- /dev/null
+++
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/load-internals/stream-load-in-complex-network.md
@@ -0,0 +1,134 @@
+---
+{
+ "title": "复杂网络环境下的 Stream Load 实践",
+ "language": "zh-CN",
+ "description": "Apache Doris 在复杂网络环境(公有云、私有云、Kubernetes 跨集群访问)下的 Stream
Load 最佳实践。"
+}
+---
+
+## 概述
+
+在公有云、私有云、Kubernetes 跨集群等复杂网络环境中,数据导入面临独特挑战。负载均衡器(LB)和网络隔离(VPC
内外访问)会影响请求路由的灵活性和批处理效率。
+
+Apache Doris 通过以下两个特性解决这些挑战:
+- **Stream Load 多端点支持**:支持为 BE 节点灵活配置多个网络端点
+- **Group Commit LB 调度优化**:确保请求经过负载均衡器时仍能高效批处理
+
+## 背景
+
+### Stream Load
+
+Stream Load 是基于 HTTP 的数据导入方式,支持 JSON、CSV 等格式。作为推送式方法,客户端通过 HTTP 请求直接将数据发送到 BE
节点,绕过 MySQL 协议。这种设计支持高并发、低延迟和高吞吐,特别适合小批量、高频写入场景。
+
+### Group Commit
+
+Group Commit 通过在服务端将多个小请求合并为大批量操作来优化吞吐量,减少磁盘 I/O、锁竞争和 Compaction
开销。为实现最佳效率,Group Commit 要求同一表的请求路由到同一 BE 节点。
+
+### 问题
+
+在云环境中,负载均衡器会将请求随机分发到各 BE 节点,破坏了 Group Commit
所需的"节点亲和性",导致同一表的请求分散到不同节点。测试表明,高并发场景下吞吐量可能因此下降 20-50%。
+
+## Stream Load 多端点支持
+
+### 地址类型
+
+Doris BE 节点支持三种地址类型,以适配不同的网络访问场景:
+
+| 地址类型 | 用途 | 示例 |
+|---------|------|------|
+| `be_host` | 集群内部通信 | `192.168.1.1:9050` |
+| `public_endpoint` | 外部公网访问(通过 LB 或公网 IP) | `11.10.20.12:8010` |
+| `private_endpoint` | VPC 内部或 Kubernetes Service IP 访问 | `10.10.10.9:8020` |
+
+### 配置方式
+
+通过 SQL 语句配置端点:
+
+```sql
+-- 添加 BE 节点并配置端点
+ALTER SYSTEM ADD BACKEND '192.168.1.1:9050' PROPERTIES(
+ 'tag.public_endpoint' = '11.10.20.12:8010',
+ 'tag.private_endpoint' = '10.10.10.9:8020'
+);
+
+-- 修改现有 BE 节点的端点
+ALTER SYSTEM MODIFY BACKEND '192.168.1.1:9050' SET (
+ 'tag.public_endpoint' = '11.10.20.12:8010',
+ 'tag.private_endpoint' = '10.10.10.9:8020'
+);
+```
+
+### 重定向策略
+
+通过 `redirect-policy` HTTP 头控制请求路由:
+
+| 策略 | 行为 | 适用场景 |
+|-----|------|---------|
+| `direct` | 路由到 `be_host` | 内部低延迟通信、Pod 间通信 |
+| `public` | 路由到 `public_endpoint` | 外部公网访问 |
+| `private` | 路由到 `private_endpoint` | VPC 内部或跨集群访问 |
+| 默认(空) | 根据主机名自动选择 | 通用场景 |
+
+**默认行为:**
+1. 若请求主机名与 `public_endpoint` 主机名匹配,路由到 `public_endpoint`
+2. 否则若配置了 `private_endpoint`,路由到 `private_endpoint`
+3. 否则回退到 `be_host`
+
+**示例:**
+
+```bash
+curl --location-trusted -u user:pass \
+ -H "redirect-policy: private" \
+ -T data.csv \
+ http://doris.example.com:8030/api/db_name/table_name/_stream_load
+```
+
+### 工作原理
+
+1. 客户端向 FE 发送 Stream Load 请求,可携带 `redirect-policy` 头
+2. FE 根据策略从 BE 地址池中选择目标地址
+3. FE 返回 HTTP 重定向响应,指向选定的端点
+
+## Group Commit LB 调度优化
+
+### 两阶段转发机制
+
+为在负载均衡器后保持 Group Commit 效率,Doris 实现了两阶段转发机制:
+
+**第一阶段:FE 重定向**
+- FE 根据 `redirect-policy` 选择合适的端点
+- FE 确定应处理目标表的 BE 节点
+- 请求经 LB 重定向,LB 随机分发到某个 BE 节点
+
+**第二阶段:BE 转发**
+- 若接收请求的 BE(BE1)不是该表的指定节点
+- BE1 通过 `be_host` 将请求内部转发到正确的 BE(BE2)
+- 确保同一表的所有请求到达同一节点
+
+### 配置示例
+
+```bash
+curl --location-trusted -u user:pass \
+ -H "redirect-policy: private" \
+ -H "group_commit: async_mode" \
+ -T data.csv \
+ http://doris.example.com:8030/api/db_name/table_name/_stream_load
+```
+
+### 性能表现
+
+两阶段转发引入的开销极小(毫秒级),而 Group Commit 的批处理优化可在高并发场景下提升 20-50% 的吞吐量。
+
+## 应用场景
+
+| 场景 | 配置 | 收益 |
+|-----|------|-----|
+| 实时日志采集 | Group Commit + 多端点 | 高吞吐 + 灵活路由 |
+| 云原生 BI | `public_endpoint` 外部访问 | 安全的外部用户访问 |
+| Kubernetes 跨集群 | `private_endpoint` 配合 Pod/Service IP | 高效跨集群通信 |
+
+## 注意事项
+
+- **配置规划**:确保端点地址配置正确,尤其在 Kubernetes 环境中
+- **监控**:使用监控工具跟踪转发率和性能指标
+- **版本要求**:需要 Doris 3.1.0 或更高版本
diff --git
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-statements/cluster-management/instance-management/ADD-BACKEND.md
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-statements/cluster-management/instance-management/ADD-BACKEND.md
index 777910ff2b0..160e3c01bb7 100644
---
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-statements/cluster-management/instance-management/ADD-BACKEND.md
+++
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-statements/cluster-management/instance-management/ADD-BACKEND.md
@@ -33,6 +33,8 @@ ALTER SYSTEM ADD BACKEND
"<host>:<heartbeat_port>"[,"<host>:<heartbeat_port>"...
> 一组键值对,用于定义 BE 节点的附加属性。这些属性可用于自定义正在添加的 BE 的配置。可用属性包括:
> - `tag.location`:存算一体模式下用于指定 BE 节点所属的资源组。
> - `tag.compute_group_name`:存算分离模式下用于指定 BE 节点所属的计算组。
+> - `tag.public_endpoint`:用于指定 BE 节点的公网端点,供外部访问使用(如
`11.10.20.12:8010`)。通常是负载均衡器的域名或公网 IP,用于外部用户访问。
+> - `tag.private_endpoint`:用于指定 BE 节点的私网端点,供私有网络访问使用(如 `10.10.10.9:8020`)。通常用于
VPC 内部访问或 Kubernetes 集群内的 Service IP。
## 权限控制
@@ -73,3 +75,17 @@ ALTER SYSTEM ADD BACKEND
"<host>:<heartbeat_port>"[,"<host>:<heartbeat_port>"...
ALTER SYSTEM ADD BACKEND "192.168.0.3:9050" PROPERTIES
("tag.compute_group_name" = "cloud_groupc");
```
此命令将单个 BE 节点(IP 192.168.0.3,端口 9050)添加到集群中的计算组`cloud_groupc`。
+
+4. 在复杂网络环境下,添加配置了公网和私网端点的 BE 节点
+ ```sql
+ ALTER SYSTEM ADD BACKEND "192.168.1.1:9050" PROPERTIES (
+ "tag.public_endpoint" = "11.10.20.12:8010",
+ "tag.private_endpoint" = "10.10.10.9:8020"
+ );
+ ```
+ 此命令添加一个具有多个网络端点的 BE 节点:
+ * `192.168.1.1:9050`:用于集群内部通信的内部地址(be_host)
+ * `11.10.20.12:8010`:公网端点,用于外部用户通过负载均衡器访问
+ * `10.10.10.9:8020`:私网端点,用于 VPC 内部或 Kubernetes 跨集群访问
+
+ 此配置在云环境或 Kubernetes 集群中非常有用,BE 节点需要从不同的网络环境进行访问。详情请参阅[复杂网络环境下的 Stream Load
实践](../../../../data-operate/import/load-internals/stream-load-in-complex-network.md)。
diff --git
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.x/data-operate/import/load-internals/stream-load-in-complex-network.md
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.x/data-operate/import/load-internals/stream-load-in-complex-network.md
new file mode 100644
index 00000000000..a836034ba40
--- /dev/null
+++
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.x/data-operate/import/load-internals/stream-load-in-complex-network.md
@@ -0,0 +1,134 @@
+---
+{
+ "title": "复杂网络环境下的 Stream Load 实践",
+ "language": "zh-CN",
+ "description": "Apache Doris 在复杂网络环境(公有云、私有云、Kubernetes 跨集群访问)下的 Stream
Load 最佳实践。"
+}
+---
+
+## 概述
+
+在公有云、私有云、Kubernetes 跨集群等复杂网络环境中,数据导入面临独特挑战。负载均衡器(LB)和网络隔离(VPC
内外访问)会影响请求路由的灵活性和批处理效率。
+
+Apache Doris 通过以下两个特性解决这些挑战:
+- **Stream Load 多端点支持**:支持为 BE 节点灵活配置多个网络端点
+- **Group Commit LB 调度优化**:确保请求经过负载均衡器时仍能高效批处理
+
+## 背景
+
+### Stream Load
+
+Stream Load 是基于 HTTP 的数据导入方式,支持 JSON、CSV 等格式。作为推送式方法,客户端通过 HTTP 请求直接将数据发送到 BE
节点,绕过 MySQL 协议。这种设计支持高并发、低延迟和高吞吐,特别适合小批量、高频写入场景。
+
+### Group Commit
+
+Group Commit 通过在服务端将多个小请求合并为大批量操作来优化吞吐量,减少磁盘 I/O、锁竞争和 Compaction
开销。为实现最佳效率,Group Commit 要求同一表的请求路由到同一 BE 节点。
+
+### 问题
+
+在云环境中,负载均衡器会将请求随机分发到各 BE 节点,破坏了 Group Commit
所需的"节点亲和性",导致同一表的请求分散到不同节点。测试表明,高并发场景下吞吐量可能因此下降 20-50%。
+
+## Stream Load 多端点支持
+
+### 地址类型
+
+Doris BE 节点支持三种地址类型,以适配不同的网络访问场景:
+
+| 地址类型 | 用途 | 示例 |
+|---------|------|------|
+| `be_host` | 集群内部通信 | `192.168.1.1:9050` |
+| `public_endpoint` | 外部公网访问(通过 LB 或公网 IP) | `11.10.20.12:8010` |
+| `private_endpoint` | VPC 内部或 Kubernetes Service IP 访问 | `10.10.10.9:8020` |
+
+### 配置方式
+
+通过 SQL 语句配置端点:
+
+```sql
+-- 添加 BE 节点并配置端点
+ALTER SYSTEM ADD BACKEND '192.168.1.1:9050' PROPERTIES(
+ 'tag.public_endpoint' = '11.10.20.12:8010',
+ 'tag.private_endpoint' = '10.10.10.9:8020'
+);
+
+-- 修改现有 BE 节点的端点
+ALTER SYSTEM MODIFY BACKEND '192.168.1.1:9050' SET (
+ 'tag.public_endpoint' = '11.10.20.12:8010',
+ 'tag.private_endpoint' = '10.10.10.9:8020'
+);
+```
+
+### 重定向策略
+
+通过 `redirect-policy` HTTP 头控制请求路由:
+
+| 策略 | 行为 | 适用场景 |
+|-----|------|---------|
+| `direct` | 路由到 `be_host` | 内部低延迟通信、Pod 间通信 |
+| `public` | 路由到 `public_endpoint` | 外部公网访问 |
+| `private` | 路由到 `private_endpoint` | VPC 内部或跨集群访问 |
+| 默认(空) | 根据主机名自动选择 | 通用场景 |
+
+**默认行为:**
+1. 若请求主机名与 `public_endpoint` 主机名匹配,路由到 `public_endpoint`
+2. 否则若配置了 `private_endpoint`,路由到 `private_endpoint`
+3. 否则回退到 `be_host`
+
+**示例:**
+
+```bash
+curl --location-trusted -u user:pass \
+ -H "redirect-policy: private" \
+ -T data.csv \
+ http://doris.example.com:8030/api/db_name/table_name/_stream_load
+```
+
+### 工作原理
+
+1. 客户端向 FE 发送 Stream Load 请求,可携带 `redirect-policy` 头
+2. FE 根据策略从 BE 地址池中选择目标地址
+3. FE 返回 HTTP 重定向响应,指向选定的端点
+
+## Group Commit LB 调度优化
+
+### 两阶段转发机制
+
+为在负载均衡器后保持 Group Commit 效率,Doris 实现了两阶段转发机制:
+
+**第一阶段:FE 重定向**
+- FE 根据 `redirect-policy` 选择合适的端点
+- FE 确定应处理目标表的 BE 节点
+- 请求经 LB 重定向,LB 随机分发到某个 BE 节点
+
+**第二阶段:BE 转发**
+- 若接收请求的 BE(BE1)不是该表的指定节点
+- BE1 通过 `be_host` 将请求内部转发到正确的 BE(BE2)
+- 确保同一表的所有请求到达同一节点
+
+### 配置示例
+
+```bash
+curl --location-trusted -u user:pass \
+ -H "redirect-policy: private" \
+ -H "group_commit: async_mode" \
+ -T data.csv \
+ http://doris.example.com:8030/api/db_name/table_name/_stream_load
+```
+
+### 性能表现
+
+两阶段转发引入的开销极小(毫秒级),而 Group Commit 的批处理优化可在高并发场景下提升 20-50% 的吞吐量。
+
+## 应用场景
+
+| 场景 | 配置 | 收益 |
+|-----|------|-----|
+| 实时日志采集 | Group Commit + 多端点 | 高吞吐 + 灵活路由 |
+| 云原生 BI | `public_endpoint` 外部访问 | 安全的外部用户访问 |
+| Kubernetes 跨集群 | `private_endpoint` 配合 Pod/Service IP | 高效跨集群通信 |
+
+## 注意事项
+
+- **配置规划**:确保端点地址配置正确,尤其在 Kubernetes 环境中
+- **监控**:使用监控工具跟踪转发率和性能指标
+- **版本要求**:需要 Doris 3.1.0 或更高版本
diff --git
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.x/sql-manual/sql-statements/cluster-management/instance-management/ADD-BACKEND.md
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.x/sql-manual/sql-statements/cluster-management/instance-management/ADD-BACKEND.md
index 777910ff2b0..160e3c01bb7 100644
---
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.x/sql-manual/sql-statements/cluster-management/instance-management/ADD-BACKEND.md
+++
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.x/sql-manual/sql-statements/cluster-management/instance-management/ADD-BACKEND.md
@@ -33,6 +33,8 @@ ALTER SYSTEM ADD BACKEND
"<host>:<heartbeat_port>"[,"<host>:<heartbeat_port>"...
> 一组键值对,用于定义 BE 节点的附加属性。这些属性可用于自定义正在添加的 BE 的配置。可用属性包括:
> - `tag.location`:存算一体模式下用于指定 BE 节点所属的资源组。
> - `tag.compute_group_name`:存算分离模式下用于指定 BE 节点所属的计算组。
+> - `tag.public_endpoint`:用于指定 BE 节点的公网端点,供外部访问使用(如
`11.10.20.12:8010`)。通常是负载均衡器的域名或公网 IP,用于外部用户访问。
+> - `tag.private_endpoint`:用于指定 BE 节点的私网端点,供私有网络访问使用(如 `10.10.10.9:8020`)。通常用于
VPC 内部访问或 Kubernetes 集群内的 Service IP。
## 权限控制
@@ -73,3 +75,17 @@ ALTER SYSTEM ADD BACKEND
"<host>:<heartbeat_port>"[,"<host>:<heartbeat_port>"...
ALTER SYSTEM ADD BACKEND "192.168.0.3:9050" PROPERTIES
("tag.compute_group_name" = "cloud_groupc");
```
此命令将单个 BE 节点(IP 192.168.0.3,端口 9050)添加到集群中的计算组`cloud_groupc`。
+
+4. 在复杂网络环境下,添加配置了公网和私网端点的 BE 节点
+ ```sql
+ ALTER SYSTEM ADD BACKEND "192.168.1.1:9050" PROPERTIES (
+ "tag.public_endpoint" = "11.10.20.12:8010",
+ "tag.private_endpoint" = "10.10.10.9:8020"
+ );
+ ```
+ 此命令添加一个具有多个网络端点的 BE 节点:
+ * `192.168.1.1:9050`:用于集群内部通信的内部地址(be_host)
+ * `11.10.20.12:8010`:公网端点,用于外部用户通过负载均衡器访问
+ * `10.10.10.9:8020`:私网端点,用于 VPC 内部或 Kubernetes 跨集群访问
+
+ 此配置在云环境或 Kubernetes 集群中非常有用,BE 节点需要从不同的网络环境进行访问。详情请参阅[复杂网络环境下的 Stream Load
实践](../../../../data-operate/import/load-internals/stream-load-in-complex-network.md)。
diff --git
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/data-operate/import/load-internals/stream-load-in-complex-network.md
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/data-operate/import/load-internals/stream-load-in-complex-network.md
new file mode 100644
index 00000000000..a836034ba40
--- /dev/null
+++
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/data-operate/import/load-internals/stream-load-in-complex-network.md
@@ -0,0 +1,134 @@
+---
+{
+ "title": "复杂网络环境下的 Stream Load 实践",
+ "language": "zh-CN",
+ "description": "Apache Doris 在复杂网络环境(公有云、私有云、Kubernetes 跨集群访问)下的 Stream
Load 最佳实践。"
+}
+---
+
+## 概述
+
+在公有云、私有云、Kubernetes 跨集群等复杂网络环境中,数据导入面临独特挑战。负载均衡器(LB)和网络隔离(VPC
内外访问)会影响请求路由的灵活性和批处理效率。
+
+Apache Doris 通过以下两个特性解决这些挑战:
+- **Stream Load 多端点支持**:支持为 BE 节点灵活配置多个网络端点
+- **Group Commit LB 调度优化**:确保请求经过负载均衡器时仍能高效批处理
+
+## 背景
+
+### Stream Load
+
+Stream Load 是基于 HTTP 的数据导入方式,支持 JSON、CSV 等格式。作为推送式方法,客户端通过 HTTP 请求直接将数据发送到 BE
节点,绕过 MySQL 协议。这种设计支持高并发、低延迟和高吞吐,特别适合小批量、高频写入场景。
+
+### Group Commit
+
+Group Commit 通过在服务端将多个小请求合并为大批量操作来优化吞吐量,减少磁盘 I/O、锁竞争和 Compaction
开销。为实现最佳效率,Group Commit 要求同一表的请求路由到同一 BE 节点。
+
+### 问题
+
+在云环境中,负载均衡器会将请求随机分发到各 BE 节点,破坏了 Group Commit
所需的"节点亲和性",导致同一表的请求分散到不同节点。测试表明,高并发场景下吞吐量可能因此下降 20-50%。
+
+## Stream Load 多端点支持
+
+### 地址类型
+
+Doris BE 节点支持三种地址类型,以适配不同的网络访问场景:
+
+| 地址类型 | 用途 | 示例 |
+|---------|------|------|
+| `be_host` | 集群内部通信 | `192.168.1.1:9050` |
+| `public_endpoint` | 外部公网访问(通过 LB 或公网 IP) | `11.10.20.12:8010` |
+| `private_endpoint` | VPC 内部或 Kubernetes Service IP 访问 | `10.10.10.9:8020` |
+
+### 配置方式
+
+通过 SQL 语句配置端点:
+
+```sql
+-- 添加 BE 节点并配置端点
+ALTER SYSTEM ADD BACKEND '192.168.1.1:9050' PROPERTIES(
+ 'tag.public_endpoint' = '11.10.20.12:8010',
+ 'tag.private_endpoint' = '10.10.10.9:8020'
+);
+
+-- 修改现有 BE 节点的端点
+ALTER SYSTEM MODIFY BACKEND '192.168.1.1:9050' SET (
+ 'tag.public_endpoint' = '11.10.20.12:8010',
+ 'tag.private_endpoint' = '10.10.10.9:8020'
+);
+```
+
+### 重定向策略
+
+通过 `redirect-policy` HTTP 头控制请求路由:
+
+| 策略 | 行为 | 适用场景 |
+|-----|------|---------|
+| `direct` | 路由到 `be_host` | 内部低延迟通信、Pod 间通信 |
+| `public` | 路由到 `public_endpoint` | 外部公网访问 |
+| `private` | 路由到 `private_endpoint` | VPC 内部或跨集群访问 |
+| 默认(空) | 根据主机名自动选择 | 通用场景 |
+
+**默认行为:**
+1. 若请求主机名与 `public_endpoint` 主机名匹配,路由到 `public_endpoint`
+2. 否则若配置了 `private_endpoint`,路由到 `private_endpoint`
+3. 否则回退到 `be_host`
+
+**示例:**
+
+```bash
+curl --location-trusted -u user:pass \
+ -H "redirect-policy: private" \
+ -T data.csv \
+ http://doris.example.com:8030/api/db_name/table_name/_stream_load
+```
+
+### 工作原理
+
+1. 客户端向 FE 发送 Stream Load 请求,可携带 `redirect-policy` 头
+2. FE 根据策略从 BE 地址池中选择目标地址
+3. FE 返回 HTTP 重定向响应,指向选定的端点
+
+## Group Commit LB 调度优化
+
+### 两阶段转发机制
+
+为在负载均衡器后保持 Group Commit 效率,Doris 实现了两阶段转发机制:
+
+**第一阶段:FE 重定向**
+- FE 根据 `redirect-policy` 选择合适的端点
+- FE 确定应处理目标表的 BE 节点
+- 请求经 LB 重定向,LB 随机分发到某个 BE 节点
+
+**第二阶段:BE 转发**
+- 若接收请求的 BE(BE1)不是该表的指定节点
+- BE1 通过 `be_host` 将请求内部转发到正确的 BE(BE2)
+- 确保同一表的所有请求到达同一节点
+
+### 配置示例
+
+```bash
+curl --location-trusted -u user:pass \
+ -H "redirect-policy: private" \
+ -H "group_commit: async_mode" \
+ -T data.csv \
+ http://doris.example.com:8030/api/db_name/table_name/_stream_load
+```
+
+### 性能表现
+
+两阶段转发引入的开销极小(毫秒级),而 Group Commit 的批处理优化可在高并发场景下提升 20-50% 的吞吐量。
+
+## 应用场景
+
+| 场景 | 配置 | 收益 |
+|-----|------|-----|
+| 实时日志采集 | Group Commit + 多端点 | 高吞吐 + 灵活路由 |
+| 云原生 BI | `public_endpoint` 外部访问 | 安全的外部用户访问 |
+| Kubernetes 跨集群 | `private_endpoint` 配合 Pod/Service IP | 高效跨集群通信 |
+
+## 注意事项
+
+- **配置规划**:确保端点地址配置正确,尤其在 Kubernetes 环境中
+- **监控**:使用监控工具跟踪转发率和性能指标
+- **版本要求**:需要 Doris 3.1.0 或更高版本
diff --git
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/sql-manual/sql-statements/cluster-management/instance-management/ADD-BACKEND.md
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/sql-manual/sql-statements/cluster-management/instance-management/ADD-BACKEND.md
index 777910ff2b0..160e3c01bb7 100644
---
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/sql-manual/sql-statements/cluster-management/instance-management/ADD-BACKEND.md
+++
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/sql-manual/sql-statements/cluster-management/instance-management/ADD-BACKEND.md
@@ -33,6 +33,8 @@ ALTER SYSTEM ADD BACKEND
"<host>:<heartbeat_port>"[,"<host>:<heartbeat_port>"...
> 一组键值对,用于定义 BE 节点的附加属性。这些属性可用于自定义正在添加的 BE 的配置。可用属性包括:
> - `tag.location`:存算一体模式下用于指定 BE 节点所属的资源组。
> - `tag.compute_group_name`:存算分离模式下用于指定 BE 节点所属的计算组。
+> - `tag.public_endpoint`:用于指定 BE 节点的公网端点,供外部访问使用(如
`11.10.20.12:8010`)。通常是负载均衡器的域名或公网 IP,用于外部用户访问。
+> - `tag.private_endpoint`:用于指定 BE 节点的私网端点,供私有网络访问使用(如 `10.10.10.9:8020`)。通常用于
VPC 内部访问或 Kubernetes 集群内的 Service IP。
## 权限控制
@@ -73,3 +75,17 @@ ALTER SYSTEM ADD BACKEND
"<host>:<heartbeat_port>"[,"<host>:<heartbeat_port>"...
ALTER SYSTEM ADD BACKEND "192.168.0.3:9050" PROPERTIES
("tag.compute_group_name" = "cloud_groupc");
```
此命令将单个 BE 节点(IP 192.168.0.3,端口 9050)添加到集群中的计算组`cloud_groupc`。
+
+4. 在复杂网络环境下,添加配置了公网和私网端点的 BE 节点
+ ```sql
+ ALTER SYSTEM ADD BACKEND "192.168.1.1:9050" PROPERTIES (
+ "tag.public_endpoint" = "11.10.20.12:8010",
+ "tag.private_endpoint" = "10.10.10.9:8020"
+ );
+ ```
+ 此命令添加一个具有多个网络端点的 BE 节点:
+ * `192.168.1.1:9050`:用于集群内部通信的内部地址(be_host)
+ * `11.10.20.12:8010`:公网端点,用于外部用户通过负载均衡器访问
+ * `10.10.10.9:8020`:私网端点,用于 VPC 内部或 Kubernetes 跨集群访问
+
+ 此配置在云环境或 Kubernetes 集群中非常有用,BE 节点需要从不同的网络环境进行访问。详情请参阅[复杂网络环境下的 Stream Load
实践](../../../../data-operate/import/load-internals/stream-load-in-complex-network.md)。
diff --git a/sidebars.ts b/sidebars.ts
index 7b1387f77d6..8ca38d52ad6 100644
--- a/sidebars.ts
+++ b/sidebars.ts
@@ -222,6 +222,7 @@ const sidebars: SidebarsConfig = {
items: [
'data-operate/import/load-internals/load-internals',
'data-operate/import/load-internals/routine-load-internals',
+ 'data-operate/import/load-internals/stream-load-in-complex-network',
],
},
"data-operate/import/streaming-job"
diff --git
a/versioned_docs/version-3.x/data-operate/import/load-internals/stream-load-in-complex-network.md
b/versioned_docs/version-3.x/data-operate/import/load-internals/stream-load-in-complex-network.md
new file mode 100644
index 00000000000..5f6710cbefa
--- /dev/null
+++
b/versioned_docs/version-3.x/data-operate/import/load-internals/stream-load-in-complex-network.md
@@ -0,0 +1,134 @@
+---
+{
+ "title": "Stream Load in Complex Network Environments",
+ "language": "en",
+ "description": "Best practices for Stream Load in complex network environments including public cloud, private cloud, and Kubernetes cross-cluster access scenarios."
+}
+---
+
+## Overview
+
+In complex network environments such as public cloud, private cloud, and
Kubernetes cross-cluster deployments, data import faces unique challenges. Load
balancers (LB) and network isolation (VPC internal/external access) can impact
both request routing flexibility and batch processing efficiency.
+
+Apache Doris addresses these challenges through two key features:
+- **Stream Load Multi-Endpoint Support**: Enables flexible configuration of
multiple network endpoints for BE nodes
+- **Group Commit LB Scheduling Optimization**: Ensures efficient batch
processing even when requests pass through load balancers
+
+## Background
+
+### Stream Load
+
+Stream Load is an HTTP-based data import method that supports JSON, CSV, and
other formats. As a push-based approach, clients send data directly to Backend
nodes (BE) via HTTP requests, bypassing the MySQL protocol. This design enables
high concurrency, low latency, and high throughput, making it ideal for
small-batch, frequent write scenarios.
+
+### Group Commit
+
+Group Commit optimizes throughput by combining multiple small requests into
larger batch operations on the server side, reducing disk I/O, lock contention,
and compaction overhead. For maximum efficiency, Group Commit requires requests
for the same table to be routed to the same BE node.
+
+### The Challenge
+
+In cloud environments, load balancers randomly distribute requests across BE
nodes. This breaks the "node affinity" required by Group Commit, causing
requests for the same table to scatter across different nodes. Tests show
throughput can drop 20-50% in high-concurrency scenarios due to this issue.
+
+## Stream Load Multi-Endpoint Support
+
+### Address Types
+
+Doris BE nodes support three address types to accommodate different network
access scenarios:
+
+| Address Type | Purpose | Example |
+|-------------|---------|---------|
+| `be_host` | Internal cluster communication | `192.168.1.1:9050` |
+| `public_endpoint` | External public access via LB or public IP | `11.10.20.12:8010` |
+| `private_endpoint` | Private access within a VPC or through a Kubernetes Service IP | `10.10.10.9:8020` |
+
+### Configuration
+
+Configure endpoints using SQL statements:
+
+```sql
+-- Add BE node with endpoints
+ALTER SYSTEM ADD BACKEND '192.168.1.1:9050' PROPERTIES(
+ 'tag.public_endpoint' = '11.10.20.12:8010',
+ 'tag.private_endpoint' = '10.10.10.9:8020'
+);
+
+-- Modify existing BE node endpoints
+ALTER SYSTEM MODIFY BACKEND '192.168.1.1:9050' SET (
+ 'tag.public_endpoint' = '11.10.20.12:8010',
+ 'tag.private_endpoint' = '10.10.10.9:8020'
+);
+```
+
+### Redirect Policy
+
+Control request routing using the `redirect-policy` HTTP header:
+
+| Policy | Behavior | Use Case |
+|--------|----------|----------|
+| `direct` | Routes to `be_host` | Internal low-latency communication, Pod-to-Pod |
+| `public` | Routes to `public_endpoint` | External access via public network |
+| `private` | Routes to `private_endpoint` | VPC internal or cross-cluster access |
+| Default (empty) | Auto-selects based on hostname matching | General use |
+
+**Default behavior:**
+1. If the request hostname matches the `public_endpoint` hostname, routes to `public_endpoint`
+2. Else if `private_endpoint` is configured, routes to `private_endpoint`
+3. Otherwise, falls back to `be_host`
+
+**Example:**
+
+```bash
+curl --location-trusted -u user:pass \
+ -H "redirect-policy: private" \
+ -T data.csv \
+ http://doris.example.com:8030/api/db_name/table_name/_stream_load
+```
+
+### How It Works
+
+1. The client sends a Stream Load request to the FE, optionally with the `redirect-policy` header
+2. The FE selects a target address from the BE's address pool based on the policy
+3. The FE returns an HTTP redirect response pointing to the selected endpoint
+
+## Group Commit LB Scheduling Optimization
+
+### Two-Phase Forwarding
+
+To maintain Group Commit efficiency behind load balancers, Doris implements a
two-phase forwarding mechanism:
+
+**Phase 1: FE Redirect**
+- FE selects the appropriate endpoint based on `redirect-policy`
+- FE determines which BE node should handle the target table
+- The request is redirected through the LB, which distributes it to a random BE node
+
+**Phase 2: BE Forwarding**
+- If the receiving BE (BE1) is not the designated node for the table, BE1 forwards the request internally to the designated BE (BE2) via `be_host`
+- This ensures all requests for the same table reach the same node
+
+### Configuration Example
+
+```bash
+curl --location-trusted -u user:pass \
+ -H "redirect-policy: private" \
+ -H "group_commit: async_mode" \
+ -T data.csv \
+ http://doris.example.com:8030/api/db_name/table_name/_stream_load
+```
+
+### Performance
+
+The two-phase forwarding introduces minimal overhead (millisecond-level),
while Group Commit's batch processing provides 20-50% throughput improvement in
high-concurrency scenarios.
+
+## Use Cases
+
+| Scenario | Configuration | Benefit |
+|----------|--------------|---------|
+| Real-time log ingestion | Group Commit + Multi-Endpoint | High throughput
with flexible routing |
+| Cloud-native BI | `public_endpoint` for external access | Secure external
user access |
+| Kubernetes cross-cluster | `private_endpoint` with Pod/Service IPs |
Efficient cross-cluster communication |
+
+## Considerations
+
+- **Configuration planning**: Ensure endpoint addresses are correctly
configured, especially in Kubernetes environments
+- **Monitoring**: Use monitoring tools to track forwarding rates and
performance
+- **Version requirement**: These features require Doris 3.1.0 or later
diff --git
a/versioned_docs/version-3.x/sql-manual/sql-statements/cluster-management/instance-management/ADD-BACKEND.md
b/versioned_docs/version-3.x/sql-manual/sql-statements/cluster-management/instance-management/ADD-BACKEND.md
index 02c2b9f65ef..225f6397d85 100644
---
a/versioned_docs/version-3.x/sql-manual/sql-statements/cluster-management/instance-management/ADD-BACKEND.md
+++
b/versioned_docs/version-3.x/sql-manual/sql-statements/cluster-management/instance-management/ADD-BACKEND.md
@@ -33,6 +33,8 @@ ALTER SYSTEM ADD BACKEND
"<host>:<heartbeat_port>"[,"<host>:<heartbeat_port>" [,
> A set of key-value pairs used to define additional properties of the BE
> node. These properties can be used to customize the configuration of the BE
> being added. Available properties include:
> - `tag.location`: Used to specify the Resource Group to which the BE node
> belongs in the integrated storage and computing mode.
> - `tag.compute_group_name`: Used to specify the compute group to which the
> BE node belongs in the decoupling storage and computing mode.
+> - `tag.public_endpoint`: Used to specify the public endpoint of the BE node
for external access (e.g., `11.10.20.12:8010`). This is typically a load
balancer domain name or public IP for external user access.
+> - `tag.private_endpoint`: Used to specify the private endpoint of the BE
node for private network access (e.g., `10.10.10.9:8020`). This is typically
used for VPC internal access or as a Kubernetes Service IP within the cluster.
## Access Control Requirements
@@ -73,3 +75,17 @@ The user executing this SQL must have at least the following
permissions:
ALTER SYSTEM ADD BACKEND "192.168.0.3:9050" PROPERTIES
("tag.compute_group_name" = "cloud_groupc");
```
This command adds a single BE node (IP 192.168.0.3, port 9050) to the
compute group `cloud_groupc` in the cluster.
+
+4. Add a BE node with public and private endpoints configured for complex
network environments
+ ```sql
+ ALTER SYSTEM ADD BACKEND "192.168.1.1:9050" PROPERTIES (
+ "tag.public_endpoint" = "11.10.20.12:8010",
+ "tag.private_endpoint" = "10.10.10.9:8020"
+ );
+ ```
+ This command adds a BE node with multiple network endpoints:
+ * `192.168.1.1:9050`: The internal address (be_host) for cluster
communication
+ * `11.10.20.12:8010`: The public endpoint for external user access through
a load balancer
+ * `10.10.10.9:8020`: The private endpoint for VPC internal or Kubernetes
cross-cluster access
+
+ This configuration is useful in cloud environments or Kubernetes clusters
where BE nodes need to be accessible from different network contexts. For more
details, see [Stream Load in Complex Network
Environments](../../../../data-operate/import/load-internals/stream-load-in-complex-network.md).
diff --git
a/versioned_docs/version-4.x/data-operate/import/load-internals/stream-load-in-complex-network.md
b/versioned_docs/version-4.x/data-operate/import/load-internals/stream-load-in-complex-network.md
new file mode 100644
index 00000000000..5f6710cbefa
--- /dev/null
+++
b/versioned_docs/version-4.x/data-operate/import/load-internals/stream-load-in-complex-network.md
@@ -0,0 +1,134 @@
+---
+{
+ "title": "Stream Load in Complex Network Environments",
+ "language": "en",
+ "description": "Best practices for Stream Load in complex network
environments including public cloud, private cloud, and Kubernetes
cross-cluster access scenarios."
+}
+---
+
+## Overview
+
+In complex network environments such as public cloud, private cloud, and
Kubernetes cross-cluster deployments, data import faces unique challenges. Load
balancers (LB) and network isolation (VPC internal/external access) can impact
both request routing flexibility and batch processing efficiency.
+
+Apache Doris addresses these challenges through two key features:
+- **Stream Load Multi-Endpoint Support**: Enables flexible configuration of
multiple network endpoints for BE nodes
+- **Group Commit LB Scheduling Optimization**: Ensures efficient batch
processing even when requests pass through load balancers
+
+## Background
+
+### Stream Load
+
+Stream Load is an HTTP-based data import method that supports JSON, CSV, and
other formats. As a push-based approach, clients send data directly to Backend
nodes (BE) via HTTP requests, bypassing the MySQL protocol. This design enables
high concurrency, low latency, and high throughput, making it ideal for
small-batch, frequent write scenarios.
+
+### Group Commit
+
+Group Commit optimizes throughput by combining multiple small requests into
larger batch operations on the server side, reducing disk I/O, lock contention,
and compaction overhead. For maximum efficiency, Group Commit requires requests
for the same table to be routed to the same BE node.
+
+### The Challenge
+
+In cloud environments, load balancers randomly distribute requests across BE
nodes. This breaks the "node affinity" required by Group Commit, causing
requests for the same table to scatter across different nodes. Tests show
throughput can drop 20-50% in high-concurrency scenarios due to this issue.
+
+## Stream Load Multi-Endpoint Support
+
+### Address Types
+
+Doris BE nodes support three address types to accommodate different network
access scenarios:
+
+| Address Type | Purpose | Example |
+|-------------|---------|---------|
+| `be_host` | Internal cluster communication | `192.168.1.1:9050` |
+| `public_endpoint` | External public access via LB or public IP |
`11.10.20.12:8010` |
+| `private_endpoint` | Private access within VPC or Kubernetes Service IP |
`10.10.10.9:8020` |
+
+### Configuration
+
+Configure endpoints using SQL statements:
+
+```sql
+-- Add BE node with endpoints
+ALTER SYSTEM ADD BACKEND '192.168.1.1:9050' PROPERTIES(
+ 'tag.public_endpoint' = '11.10.20.12:8010',
+ 'tag.private_endpoint' = '10.10.10.9:8020'
+);
+
+-- Modify existing BE node endpoints
+ALTER SYSTEM MODIFY BACKEND '192.168.1.1:9050' SET (
+ 'tag.public_endpoint' = '11.10.20.12:8010',
+ 'tag.private_endpoint' = '10.10.10.9:8020'
+);
+```
+
+### Redirect Policy
+
+Control request routing using the `redirect-policy` HTTP header:
+
+| Policy | Behavior | Use Case |
+|--------|----------|----------|
+| `direct` | Routes to `be_host` | Internal low-latency communication,
Pod-to-Pod |
+| `public` | Routes to `public_endpoint` | External access via public network |
+| `private` | Routes to `private_endpoint` | VPC internal or cross-cluster
access |
+| Default (empty) | Auto-selects based on hostname matching | General use |
+
+**Default behavior:**
+1. If request hostname matches `public_endpoint` hostname, routes to
`public_endpoint`
+2. Else if `private_endpoint` is configured, routes to `private_endpoint`
+3. Otherwise, falls back to `be_host`
+
+**Example:**
+
+```bash
+curl --location-trusted -u user:pass \
+ -H "redirect-policy: private" \
+ -T data.csv \
+ http://doris.example.com:8030/api/db_name/table_name/_stream_load
+```
+
+### How It Works
+
+1. Client sends Stream Load request to FE with optional `redirect-policy`
header
+2. FE selects target address from BE's address pool based on the policy
+3. FE returns HTTP redirect response to the selected endpoint
+
+## Group Commit LB Scheduling Optimization
+
+### Two-Phase Forwarding
+
+To maintain Group Commit efficiency behind load balancers, Doris implements a
two-phase forwarding mechanism:
+
+**Phase 1: FE Redirect**
+- FE selects the appropriate endpoint based on `redirect-policy`
+- FE determines which BE node should handle the target table
+- Request is redirected through LB, which randomly distributes to a BE node
+
+**Phase 2: BE Forwarding**
+- The receiving BE (BE1) checks whether it is the designated node for the table
+- If it is not, BE1 forwards the request internally to the designated BE (BE2) via `be_host`
+- This ensures all requests for the same table reach the same node
+
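The effect of the two phases can be simulated in a few lines. This is a sketch under stated assumptions, not Doris code: the node names, `designated_be`, `load_balancer_pick`, and `handle_on_be` are hypothetical, and choosing the designated node with a stable hash of the table name is an assumption, since the text does not say how FE makes that choice.

```python
import random
import zlib
from typing import Tuple

BACKENDS = ["be1", "be2", "be3"]  # hypothetical node names


def designated_be(table: str) -> str:
    """Assumed rule: a stable hash of the table name picks the designated BE."""
    return BACKENDS[zlib.crc32(table.encode("utf-8")) % len(BACKENDS)]


def load_balancer_pick(rng: random.Random) -> str:
    """Phase 1 outcome: the LB hands the redirected request to an arbitrary BE."""
    return rng.choice(BACKENDS)


def handle_on_be(receiving_be: str, table: str) -> Tuple[str, bool]:
    """Phase 2: return (BE that processes the request, whether it was forwarded)."""
    target = designated_be(table)
    return target, receiving_be != target


rng = random.Random(0)
results = [handle_on_be(load_balancer_pick(rng), "db_name.table_name") for _ in range(20)]
final_nodes = {node for node, _ in results}
forwarded = sum(1 for _, was_forwarded in results if was_forwarded)
print(f"final nodes: {final_nodes}, forwarded {forwarded} of {len(results)} requests")
```

However the load balancer spreads the requests, every request for the same table ends on one node, which is the property Group Commit needs in order to batch them; the forwarded count reflects the extra internal hop taken by requests that first landed elsewhere.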
+### Configuration Example
+
+```bash
+curl --location-trusted -u user:pass \
+ -H "redirect-policy: private" \
+ -H "group_commit: async_mode" \
+ -T data.csv \
+ http://doris.example.com:8030/api/db_name/table_name/_stream_load
+```
+
+### Performance
+
+The two-phase forwarding introduces minimal overhead (millisecond-level),
while Group Commit's batch processing provides 20-50% throughput improvement in
high-concurrency scenarios.
+
+## Use Cases
+
+| Scenario | Configuration | Benefit |
+|----------|--------------|---------|
+| Real-time log ingestion | Group Commit + Multi-Endpoint | High throughput
with flexible routing |
+| Cloud-native BI | `public_endpoint` for external access | Secure external
user access |
+| Kubernetes cross-cluster | `private_endpoint` with Pod/Service IPs |
Efficient cross-cluster communication |
+
+## Considerations
+
+- **Configuration planning**: Ensure endpoint addresses are correctly
configured, especially in Kubernetes environments
+- **Monitoring**: Use monitoring tools to track forwarding rates and
performance
+- **Version requirement**: These features require Doris 3.1.0 or later
diff --git
a/versioned_docs/version-4.x/sql-manual/sql-statements/cluster-management/instance-management/ADD-BACKEND.md
b/versioned_docs/version-4.x/sql-manual/sql-statements/cluster-management/instance-management/ADD-BACKEND.md
index 02c2b9f65ef..225f6397d85 100644
---
a/versioned_docs/version-4.x/sql-manual/sql-statements/cluster-management/instance-management/ADD-BACKEND.md
+++
b/versioned_docs/version-4.x/sql-manual/sql-statements/cluster-management/instance-management/ADD-BACKEND.md
@@ -33,6 +33,8 @@ ALTER SYSTEM ADD BACKEND
"<host>:<heartbeat_port>"[,"<host>:<heartbeat_port>" [,
> A set of key-value pairs used to define additional properties of the BE
> node. These properties can be used to customize the configuration of the BE
> being added. Available properties include:
> - `tag.location`: Used to specify the Resource Group to which the BE node
> belongs in the integrated storage and computing mode.
> - `tag.compute_group_name`: Used to specify the compute group to which the
> BE node belongs in the decoupling storage and computing mode.
+> - `tag.public_endpoint`: Used to specify the public endpoint of the BE node
for external access (e.g., `11.10.20.12:8010`). This is typically a load
balancer domain name or public IP for external user access.
+> - `tag.private_endpoint`: Used to specify the private endpoint of the BE
node for private network access (e.g., `10.10.10.9:8020`). This is typically
used for VPC internal access or as a Kubernetes Service IP within the cluster.
## Access Control Requirements
@@ -73,3 +75,17 @@ The user executing this SQL must have at least the following
permissions:
ALTER SYSTEM ADD BACKEND "192.168.0.3:9050" PROPERTIES
("tag.compute_group_name" = "cloud_groupc");
```
This command adds a single BE node (IP 192.168.0.3, port 9050) to the
compute group `cloud_groupc` in the cluster.
+
+4. Add a BE node with public and private endpoints configured for complex
network environments
+ ```sql
+ ALTER SYSTEM ADD BACKEND "192.168.1.1:9050" PROPERTIES (
+ "tag.public_endpoint" = "11.10.20.12:8010",
+ "tag.private_endpoint" = "10.10.10.9:8020"
+ );
+ ```
+ This command adds a BE node with multiple network endpoints:
+ * `192.168.1.1:9050`: The internal address (be_host) for cluster
communication
+ * `11.10.20.12:8010`: The public endpoint for external user access through
a load balancer
+ * `10.10.10.9:8020`: The private endpoint for VPC internal or Kubernetes
cross-cluster access
+
+ This configuration is useful in cloud environments or Kubernetes clusters
where BE nodes need to be accessible from different network contexts. For more
details, see [Stream Load in Complex Network
Environments](../../../../data-operate/import/load-internals/stream-load-in-complex-network.md).
diff --git a/versioned_sidebars/version-3.x-sidebars.json
b/versioned_sidebars/version-3.x-sidebars.json
index a4b82da165f..281d39d207f 100644
--- a/versioned_sidebars/version-3.x-sidebars.json
+++ b/versioned_sidebars/version-3.x-sidebars.json
@@ -213,7 +213,14 @@
"data-operate/import/load-data-convert",
"data-operate/import/load-high-availability",
"data-operate/import/group-commit-manual",
- "data-operate/import/load-best-practices"
+ "data-operate/import/load-best-practices",
+ {
+ "type": "category",
+ "label": "Load Internals",
+ "items": [
+
"data-operate/import/load-internals/stream-load-in-complex-network"
+ ]
+ }
]
},
{
diff --git a/versioned_sidebars/version-4.x-sidebars.json
b/versioned_sidebars/version-4.x-sidebars.json
index ee402b0ae9a..75079f02983 100644
--- a/versioned_sidebars/version-4.x-sidebars.json
+++ b/versioned_sidebars/version-4.x-sidebars.json
@@ -226,7 +226,8 @@
"label": "Load Internals",
"items": [
"data-operate/import/load-internals/load-internals",
-
"data-operate/import/load-internals/routine-load-internals"
+
"data-operate/import/load-internals/routine-load-internals",
+
"data-operate/import/load-internals/stream-load-in-complex-network"
]
},
"data-operate/import/streaming-job"