(doris-website) branch master updated: add load best practices (#2301)

liaoxin Fri, 18 Apr 2025 02:05:03 -0700

This is an automated email from the ASF dual-hosted git repository.

liaoxin pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris-website.git



The following commit(s) were added to refs/heads/master by this push:
     new f9a90042401 add load best practices (#2301)
f9a90042401 is described below

commit f9a9004240101e3a9a703fc6c911d38669021cc8
Author: Xin Liao <liao...@selectdb.com>
AuthorDate: Fri Apr 18 17:04:53 2025 +0800

    add load best practices (#2301)
---
 docs/data-operate/import/load-best-practices.md    | 62 ++++++++++++++++++++++
 .../data-operate/import/load-best-practices.md     | 62 ++++++++++++++++++++++
 .../data-operate/import/load-best-practices.md     | 62 ++++++++++++++++++++++
 .../data-operate/import/load-best-practices.md     | 62 ++++++++++++++++++++++
 sidebars.json                                      |  3 +-
 .../data-operate/import/load-best-practices.md     | 62 ++++++++++++++++++++++
 .../data-operate/import/load-best-practices.md     | 62 ++++++++++++++++++++++
 versioned_sidebars/version-2.1-sidebars.json       |  3 +-
 versioned_sidebars/version-3.0-sidebars.json       |  3 +-
 9 files changed, 378 insertions(+), 3 deletions(-)

diff --git a/docs/data-operate/import/load-best-practices.md b/docs/data-operate/import/load-best-practices.md
new file mode 100644
index 00000000000..af4598fef1e
--- /dev/null
+++ b/docs/data-operate/import/load-best-practices.md
@@ -0,0 +1,62 @@
+---
+{
+    "title": "Load Best Practices",
+    "language": "en"
+}
+---
+
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+- Table Model Selection
+
+Prefer the Duplicate Key model where possible; it has advantages over the other models in both data loading and query performance. For more information, see [Data Model](../../table-design/data-model/overview).
+
+- Partition and Bucket Configuration
+
+Keep the size of each tablet between 1 GB and 10 GB. Tablets that are too small may lead to poor aggregation performance and increase metadata management overhead; tablets that are too large may hinder replica migration and repair. For details, see [Data Distribution](../../table-design/data-partitioning/data-distribution).
+
+- Random Bucketing
+
+With Random bucketing, you can enable single-tablet loading mode by setting `load_to_single_tablet` to `true`. During large-scale data loading, this mode can improve loading concurrency and throughput and reduce write amplification. For details, see [Random Bucketing](../../table-design/data-partitioning/data-bucketing#random-bucketing).
+
+- Batch Loading
+
+Client-side batching: Batch data on the client side (from several MB up to several GB per load) before loading. High-frequency small loads trigger frequent compaction, which causes severe write amplification.
+Server-side batching: For high-concurrency loads of small data volumes, enable [Group Commit](group-commit-manual.md) to batch on the server side.
+
+- Partition Loading
+
+Load data into only a few partitions at a time. Loading into too many partitions at once increases memory usage and may cause performance issues. Each tablet in Doris keeps an active Memtable in memory, which is flushed to disk once it reaches a certain size. To keep the process from running out of memory (OOM), Doris flushes Memtables early when the active Memtables use too much memory, which produces many small files and degrades loading performance.
+
+- Large-scale Data Batch Loading
+
+When loading a large number of files or a large data volume, load in batches to avoid costly retries after a failure and to reduce the impact on system resources. For Broker Load, keep each batch at or below 100 GB. For large local data files, you can use the streamloader tool provided by Doris, which performs batch loading automatically.
+
+- Broker Load Concurrency
+
+Compressed files and Parquet/ORC files: Split the data into multiple smaller files to achieve higher load concurrency.
+
+Uncompressed CSV and JSON files: Doris automatically splits the files and loads them concurrently.
+
+For concurrency strategies, see [Broker Load Configuration Parameters](./import-way/broker-load-manual#Related-Configurations).
+
+- Stream Load Concurrency
+
+Keep Stream Load concurrency per BE under 128 (controlled by the BE parameter `webserver_num_workers`). Higher concurrency may exhaust the webserver threads and degrade loading performance. In particular, when the concurrency on a single BE exceeds 512 (the `doris_max_remote_scanner_thread_pool_thread_num` parameter), the BE process may hang.
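
As an illustration of the Stream Load concurrency guidance in the file above, here is a minimal client-side sketch that caps the number of loads in flight. It is not part of the commit. `send_batch` is a hypothetical callable standing in for whatever actually submits one Stream Load request, and the default of 64 is an arbitrary value chosen to stay below the recommended 128. The cap applies per client process, so the effective concurrency on each BE also depends on how many clients are running and how requests are routed.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_loads_bounded(batches, send_batch, max_in_flight=64):
    """Call send_batch(batch) for every batch, with at most
    max_in_flight calls running at the same time.

    send_batch is a hypothetical, caller-supplied function; this sketch
    does not talk to Doris itself.
    """
    if max_in_flight < 1:
        raise ValueError("max_in_flight must be at least 1")
    # The pool size is the concurrency cap: extra batches wait in the queue.
    with ThreadPoolExecutor(max_workers=max_in_flight) as pool:
        # map() returns results in the same order as the input batches.
        return list(pool.map(send_batch, batches))


class ConcurrencyProbe:
    """Stand-in sender that records the peak number of concurrent calls."""

    def __init__(self):
        self._lock = threading.Lock()
        self._active = 0
        self.peak = 0

    def __call__(self, batch):
        with self._lock:
            self._active += 1
            self.peak = max(self.peak, self._active)
        try:
            time.sleep(0.01)  # simulate the time a load request takes
            return len(batch)
        finally:
            with self._lock:
                self._active -= 1
```

Passing a `ConcurrencyProbe` instance as `send_batch` is a quick way to check that the cap holds before wiring in a real sender.
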
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/load-best-practices.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/load-best-practices.md
new file mode 100644
index 00000000000..8285ddc95d1
--- /dev/null
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/load-best-practices.md
@@ -0,0 +1,62 @@
+---
+{
+    "title": "Load Best Practices",
+    "language": "zh-CN"
+}
+---
+
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
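+- Table Model Selection
+
+Prefer the Duplicate Key model where possible; it has advantages over the other models in both data loading and query performance. For more information, see [Data Model](../../table-design/data-model/overview).
+
+- Partition and Bucket Configuration
+
+Keep the size of each tablet between 1 GB and 10 GB. Tablets that are too small may lead to poor aggregation performance and increase metadata management overhead; tablets that are too large hinder replica migration and repair. For details, see [Data Distribution](../../table-design/data-partitioning/data-distribution).
+
+- Random Bucketing
+
+With Random bucketing, you can enable single-tablet loading mode by setting `load_to_single_tablet` to `true`. During large-scale data loading, this mode can improve loading concurrency and throughput and reduce write amplification. For details, see [Random Bucketing](../../table-design/data-partitioning/data-bucketing#random-分桶).
+
+- Batch Loading
+
+Client-side batching: Batch data on the client side (from several MB up to several GB per load) before loading. High-frequency small loads trigger frequent compaction, which causes severe write amplification.
+Server-side batching: For high-concurrency loads of small data volumes, enable [Group Commit](group-commit-manual.md) to batch on the server side.
+
+- Partition Loading
+
+Load data into only a few partitions at a time. Loading into too many partitions at once increases memory usage and may cause performance issues. Each tablet in Doris keeps an active Memtable in memory, which is flushed to disk only once it reaches a certain size. To keep the process from running out of memory (OOM), Doris flushes Memtables early when the active Memtables use too much memory, which produces many small files and degrades loading performance.
+
+- Large-scale Data Batch Loading
+
+When loading a large number of files or a large data volume, load in batches to avoid costly retries after a failure and to reduce the impact on system resources. For Broker Load, keep each batch at or below 100 GB. For large local data files, you can use the streamloader tool provided by Doris, which performs batch loading automatically.
+
+- Broker Load Concurrency
+
+Compressed files and Parquet/ORC files: Split the data into multiple smaller files so that they can be loaded concurrently.
+
+Uncompressed CSV and JSON files: Doris automatically splits the files and loads them concurrently.
+
+For concurrency strategies, see [Broker Load Configuration Parameters](./import-way/broker-load-manual#导入配置参数).
+
+- Stream Load Concurrency
+
+Keep Stream Load concurrency on a single BE at no more than 128 (controlled by the BE parameter `webserver_num_workers`). Higher concurrency may exhaust the webserver threads and degrade loading performance. In particular, when the concurrency on a single BE exceeds 512 (the `doris_max_remote_scanner_thread_pool_thread_num` parameter), the BE process may hang.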
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/data-operate/import/load-best-practices.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/data-operate/import/load-best-practices.md
new file mode 100644
index 00000000000..8285ddc95d1
--- /dev/null
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/data-operate/import/load-best-practices.md
@@ -0,0 +1,62 @@
+---
+{
+    "title": "Load Best Practices",
+    "language": "zh-CN"
+}
+---
+
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
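+- Table Model Selection
+
+Prefer the Duplicate Key model where possible; it has advantages over the other models in both data loading and query performance. For more information, see [Data Model](../../table-design/data-model/overview).
+
+- Partition and Bucket Configuration
+
+Keep the size of each tablet between 1 GB and 10 GB. Tablets that are too small may lead to poor aggregation performance and increase metadata management overhead; tablets that are too large hinder replica migration and repair. For details, see [Data Distribution](../../table-design/data-partitioning/data-distribution).
+
+- Random Bucketing
+
+With Random bucketing, you can enable single-tablet loading mode by setting `load_to_single_tablet` to `true`. During large-scale data loading, this mode can improve loading concurrency and throughput and reduce write amplification. For details, see [Random Bucketing](../../table-design/data-partitioning/data-bucketing#random-分桶).
+
+- Batch Loading
+
+Client-side batching: Batch data on the client side (from several MB up to several GB per load) before loading. High-frequency small loads trigger frequent compaction, which causes severe write amplification.
+Server-side batching: For high-concurrency loads of small data volumes, enable [Group Commit](group-commit-manual.md) to batch on the server side.
+
+- Partition Loading
+
+Load data into only a few partitions at a time. Loading into too many partitions at once increases memory usage and may cause performance issues. Each tablet in Doris keeps an active Memtable in memory, which is flushed to disk only once it reaches a certain size. To keep the process from running out of memory (OOM), Doris flushes Memtables early when the active Memtables use too much memory, which produces many small files and degrades loading performance.
+
+- Large-scale Data Batch Loading
+
+When loading a large number of files or a large data volume, load in batches to avoid costly retries after a failure and to reduce the impact on system resources. For Broker Load, keep each batch at or below 100 GB. For large local data files, you can use the streamloader tool provided by Doris, which performs batch loading automatically.
+
+- Broker Load Concurrency
+
+Compressed files and Parquet/ORC files: Split the data into multiple smaller files so that they can be loaded concurrently.
+
+Uncompressed CSV and JSON files: Doris automatically splits the files and loads them concurrently.
+
+For concurrency strategies, see [Broker Load Configuration Parameters](./import-way/broker-load-manual#导入配置参数).
+
+- Stream Load Concurrency
+
+Keep Stream Load concurrency on a single BE at no more than 128 (controlled by the BE parameter `webserver_num_workers`). Higher concurrency may exhaust the webserver threads and degrade loading performance. In particular, when the concurrency on a single BE exceeds 512 (the `doris_max_remote_scanner_thread_pool_thread_num` parameter), the BE process may hang.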
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/data-operate/import/load-best-practices.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/data-operate/import/load-best-practices.md
new file mode 100644
index 00000000000..8285ddc95d1
--- /dev/null
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/data-operate/import/load-best-practices.md
@@ -0,0 +1,62 @@
+---
+{
+    "title": "Load Best Practices",
+    "language": "zh-CN"
+}
+---
+
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
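+- Table Model Selection
+
+Prefer the Duplicate Key model where possible; it has advantages over the other models in both data loading and query performance. For more information, see [Data Model](../../table-design/data-model/overview).
+
+- Partition and Bucket Configuration
+
+Keep the size of each tablet between 1 GB and 10 GB. Tablets that are too small may lead to poor aggregation performance and increase metadata management overhead; tablets that are too large hinder replica migration and repair. For details, see [Data Distribution](../../table-design/data-partitioning/data-distribution).
+
+- Random Bucketing
+
+With Random bucketing, you can enable single-tablet loading mode by setting `load_to_single_tablet` to `true`. During large-scale data loading, this mode can improve loading concurrency and throughput and reduce write amplification. For details, see [Random Bucketing](../../table-design/data-partitioning/data-bucketing#random-分桶).
+
+- Batch Loading
+
+Client-side batching: Batch data on the client side (from several MB up to several GB per load) before loading. High-frequency small loads trigger frequent compaction, which causes severe write amplification.
+Server-side batching: For high-concurrency loads of small data volumes, enable [Group Commit](group-commit-manual.md) to batch on the server side.
+
+- Partition Loading
+
+Load data into only a few partitions at a time. Loading into too many partitions at once increases memory usage and may cause performance issues. Each tablet in Doris keeps an active Memtable in memory, which is flushed to disk only once it reaches a certain size. To keep the process from running out of memory (OOM), Doris flushes Memtables early when the active Memtables use too much memory, which produces many small files and degrades loading performance.
+
+- Large-scale Data Batch Loading
+
+When loading a large number of files or a large data volume, load in batches to avoid costly retries after a failure and to reduce the impact on system resources. For Broker Load, keep each batch at or below 100 GB. For large local data files, you can use the streamloader tool provided by Doris, which performs batch loading automatically.
+
+- Broker Load Concurrency
+
+Compressed files and Parquet/ORC files: Split the data into multiple smaller files so that they can be loaded concurrently.
+
+Uncompressed CSV and JSON files: Doris automatically splits the files and loads them concurrently.
+
+For concurrency strategies, see [Broker Load Configuration Parameters](./import-way/broker-load-manual#导入配置参数).
+
+- Stream Load Concurrency
+
+Keep Stream Load concurrency on a single BE at no more than 128 (controlled by the BE parameter `webserver_num_workers`). Higher concurrency may exhaust the webserver threads and degrade loading performance. In particular, when the concurrency on a single BE exceeds 512 (the `doris_max_remote_scanner_thread_pool_thread_num` parameter), the BE process may hang.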
diff --git a/sidebars.json b/sidebars.json
index b7a7e942a66..1c3d405d408 100644
--- a/sidebars.json
+++ b/sidebars.json
@@ -199,7 +199,8 @@
                         "data-operate/import/handling-messy-data",
                         "data-operate/import/load-data-convert",
                         "data-operate/import/load-high-availability",
-                        "data-operate/import/group-commit-manual"
+                        "data-operate/import/group-commit-manual",
+                        "data-operate/import/load-best-practices"
                     ]
                 },
                 {
diff --git a/versioned_docs/version-2.1/data-operate/import/load-best-practices.md b/versioned_docs/version-2.1/data-operate/import/load-best-practices.md
new file mode 100644
index 00000000000..af4598fef1e
--- /dev/null
+++ b/versioned_docs/version-2.1/data-operate/import/load-best-practices.md
@@ -0,0 +1,62 @@
+---
+{
+    "title": "Load Best Practices",
+    "language": "en"
+}
+---
+
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+- Table Model Selection
+
+Prefer the Duplicate Key model where possible; it has advantages over the other models in both data loading and query performance. For more information, see [Data Model](../../table-design/data-model/overview).
+
+- Partition and Bucket Configuration
+
+Keep the size of each tablet between 1 GB and 10 GB. Tablets that are too small may lead to poor aggregation performance and increase metadata management overhead; tablets that are too large may hinder replica migration and repair. For details, see [Data Distribution](../../table-design/data-partitioning/data-distribution).
+
+- Random Bucketing
+
+With Random bucketing, you can enable single-tablet loading mode by setting `load_to_single_tablet` to `true`. During large-scale data loading, this mode can improve loading concurrency and throughput and reduce write amplification. For details, see [Random Bucketing](../../table-design/data-partitioning/data-bucketing#random-bucketing).
+
+- Batch Loading
+
+Client-side batching: Batch data on the client side (from several MB up to several GB per load) before loading. High-frequency small loads trigger frequent compaction, which causes severe write amplification.
+Server-side batching: For high-concurrency loads of small data volumes, enable [Group Commit](group-commit-manual.md) to batch on the server side.
+
+- Partition Loading
+
+Load data into only a few partitions at a time. Loading into too many partitions at once increases memory usage and may cause performance issues. Each tablet in Doris keeps an active Memtable in memory, which is flushed to disk once it reaches a certain size. To keep the process from running out of memory (OOM), Doris flushes Memtables early when the active Memtables use too much memory, which produces many small files and degrades loading performance.
+
+- Large-scale Data Batch Loading
+
+When loading a large number of files or a large data volume, load in batches to avoid costly retries after a failure and to reduce the impact on system resources. For Broker Load, keep each batch at or below 100 GB. For large local data files, you can use the streamloader tool provided by Doris, which performs batch loading automatically.
+
+- Broker Load Concurrency
+
+Compressed files and Parquet/ORC files: Split the data into multiple smaller files to achieve higher load concurrency.
+
+Uncompressed CSV and JSON files: Doris automatically splits the files and loads them concurrently.
+
+For concurrency strategies, see [Broker Load Configuration Parameters](./import-way/broker-load-manual#Related-Configurations).
+
+- Stream Load Concurrency
+
+Keep Stream Load concurrency per BE under 128 (controlled by the BE parameter `webserver_num_workers`). Higher concurrency may exhaust the webserver threads and degrade loading performance. In particular, when the concurrency on a single BE exceeds 512 (the `doris_max_remote_scanner_thread_pool_thread_num` parameter), the BE process may hang.
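
The client-side batching advice under "Batch Loading" in the file above can be sketched as a small buffer that accumulates rows and hands them off once a size threshold is reached. This is only an illustration and not part of the commit: `flush` is a hypothetical callback that would submit one load, and the 64 MB default is an assumed value inside the "several MB to GB" range the page recommends.

```python
class BatchBuffer:
    """Accumulate encoded rows and emit them as one payload per load."""

    def __init__(self, flush, max_bytes=64 * 1024 * 1024):
        # flush is a hypothetical callable that receives one bytes payload,
        # for example a function that submits a single load request.
        self._flush = flush
        self._max_bytes = max_bytes
        self._rows = []
        self._size = 0

    def add(self, row: bytes) -> None:
        """Buffer one row; emit the batch once the threshold is reached."""
        self._rows.append(row)
        self._size += len(row)
        if self._size >= self._max_bytes:
            self.flush()

    def flush(self) -> None:
        """Emit whatever is buffered; do nothing if the buffer is empty."""
        if not self._rows:
            return
        payload = b"".join(self._rows)
        self._rows = []
        self._size = 0
        self._flush(payload)
```

Call `flush()` once more at shutdown so the final partial batch is not lost. A production client would probably also flush on a timer so that rows do not wait indefinitely during quiet periods.
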
diff --git a/versioned_docs/version-3.0/data-operate/import/load-best-practices.md b/versioned_docs/version-3.0/data-operate/import/load-best-practices.md
new file mode 100644
index 00000000000..af4598fef1e
--- /dev/null
+++ b/versioned_docs/version-3.0/data-operate/import/load-best-practices.md
@@ -0,0 +1,62 @@
+---
+{
+    "title": "Load Best Practices",
+    "language": "en"
+}
+---
+
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+- Table Model Selection
+
+Prefer the Duplicate Key model where possible; it has advantages over the other models in both data loading and query performance. For more information, see [Data Model](../../table-design/data-model/overview).
+
+- Partition and Bucket Configuration
+
+Keep the size of each tablet between 1 GB and 10 GB. Tablets that are too small may lead to poor aggregation performance and increase metadata management overhead; tablets that are too large may hinder replica migration and repair. For details, see [Data Distribution](../../table-design/data-partitioning/data-distribution).
+
+- Random Bucketing
+
+With Random bucketing, you can enable single-tablet loading mode by setting `load_to_single_tablet` to `true`. During large-scale data loading, this mode can improve loading concurrency and throughput and reduce write amplification. For details, see [Random Bucketing](../../table-design/data-partitioning/data-bucketing#random-bucketing).
+
+- Batch Loading
+
+Client-side batching: Batch data on the client side (from several MB up to several GB per load) before loading. High-frequency small loads trigger frequent compaction, which causes severe write amplification.
+Server-side batching: For high-concurrency loads of small data volumes, enable [Group Commit](group-commit-manual.md) to batch on the server side.
+
+- Partition Loading
+
+Load data into only a few partitions at a time. Loading into too many partitions at once increases memory usage and may cause performance issues. Each tablet in Doris keeps an active Memtable in memory, which is flushed to disk once it reaches a certain size. To keep the process from running out of memory (OOM), Doris flushes Memtables early when the active Memtables use too much memory, which produces many small files and degrades loading performance.
+
+- Large-scale Data Batch Loading
+
+When loading a large number of files or a large data volume, load in batches to avoid costly retries after a failure and to reduce the impact on system resources. For Broker Load, keep each batch at or below 100 GB. For large local data files, you can use the streamloader tool provided by Doris, which performs batch loading automatically.
+
+- Broker Load Concurrency
+
+Compressed files and Parquet/ORC files: Split the data into multiple smaller files to achieve higher load concurrency.
+
+Uncompressed CSV and JSON files: Doris automatically splits the files and loads them concurrently.
+
+For concurrency strategies, see [Broker Load Configuration Parameters](./import-way/broker-load-manual#Related-Configurations).
+
+- Stream Load Concurrency
+
+Keep Stream Load concurrency per BE under 128 (controlled by the BE parameter `webserver_num_workers`). Higher concurrency may exhaust the webserver threads and degrade loading performance. In particular, when the concurrency on a single BE exceeds 512 (the `doris_max_remote_scanner_thread_pool_thread_num` parameter), the BE process may hang.
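
For the "Large-scale Data Batch Loading" advice in the file above (at most about 100 GB per Broker Load batch), one possible way to plan batches is to group files greedily in their given order. This sketch is an illustration only and not part of the commit: the function name, the `(path, size_bytes)` input shape, and the reading of 100 GB as 100 * 10**9 bytes are all assumptions.

```python
def plan_batches(files, max_batch_bytes=100 * 10**9):
    """Group (path, size_bytes) pairs into ordered batches whose total
    size stays at or below max_batch_bytes.

    A single file larger than the limit is placed in a batch of its own,
    since this sketch does not split files.
    """
    batches = []
    current = []
    current_size = 0
    for path, size in files:
        # Close the current batch if adding this file would exceed the limit.
        if current and current_size + size > max_batch_bytes:
            batches.append(current)
            current = []
            current_size = 0
        current.append(path)
        current_size += size
    if current:
        batches.append(current)
    return batches
```

Each returned list could then be submitted as one load job, so that a failure only requires retrying that batch rather than the whole data set.
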
diff --git a/versioned_sidebars/version-2.1-sidebars.json b/versioned_sidebars/version-2.1-sidebars.json
index 2a7836d0d65..f4196656388 100644
--- a/versioned_sidebars/version-2.1-sidebars.json
+++ b/versioned_sidebars/version-2.1-sidebars.json
@@ -175,7 +175,8 @@
                         "data-operate/import/handling-messy-data",
                         "data-operate/import/load-data-convert",
                         "data-operate/import/load-high-availability",
-                        "data-operate/import/group-commit-manual"
+                        "data-operate/import/group-commit-manual",
+                        "data-operate/import/load-best-practices"
                     ]
                 },
                 {
diff --git a/versioned_sidebars/version-3.0-sidebars.json b/versioned_sidebars/version-3.0-sidebars.json
index ff54cd62f91..52253c3ddaa 100644
--- a/versioned_sidebars/version-3.0-sidebars.json
+++ b/versioned_sidebars/version-3.0-sidebars.json
@@ -199,7 +199,8 @@
                         "data-operate/import/handling-messy-data",
                         "data-operate/import/load-data-convert",
                         "data-operate/import/load-high-availability",
-                        "data-operate/import/group-commit-manual"
+                        "data-operate/import/group-commit-manual",
+                        "data-operate/import/load-best-practices"
                     ]
                 },
                 {

