(doris-website) branch master updated: [fix](load) fix the format of load best practices (#2335)

liaoxin Sun, 27 Apr 2025 20:22:25 -0700

This is an automated email from the ASF dual-hosted git repository.

liaoxin pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris-website.git



The following commit(s) were added to refs/heads/master by this push:
     new e8253fe62b4 [fix](load) fix the format of load best practices (#2335)
e8253fe62b4 is described below

commit e8253fe62b48e0a2ad5a4a9e4fbbda4cd981f947
Author: Xin Liao <liao...@selectdb.com>
AuthorDate: Mon Apr 28 11:22:11 2025 +0800

    [fix](load) fix the format of load best practices (#2335)
---
 docs/data-operate/import/load-best-practices.md          | 16 ++++++++--------
 .../current/data-operate/import/load-best-practices.md   | 16 ++++++++--------
 .../data-operate/import/load-best-practices.md           | 16 ++++++++--------
 .../data-operate/import/load-best-practices.md           | 16 ++++++++--------
 .../data-operate/import/load-best-practices.md           | 16 ++++++++--------
 .../data-operate/import/load-best-practices.md           | 16 ++++++++--------
 6 files changed, 48 insertions(+), 48 deletions(-)

diff --git a/docs/data-operate/import/load-best-practices.md b/docs/data-operate/import/load-best-practices.md
index 8e563aed778..efe4736f20e 100644
--- a/docs/data-operate/import/load-best-practices.md
+++ b/docs/data-operate/import/load-best-practices.md
@@ -24,32 +24,32 @@ specific language governing permissions and limitations
 under the License.
 -->
 
-# Table Model Selection
+## Table Model Selection
 
 It is recommended to prioritize using the Duplicate Key model, which offers advantages in both data loading and query performance compared to other models. For more information, please refer to: [Data Model](../../table-design/data-model/overview)
 
-# Partition and Bucket Configuration
+## Partition and Bucket Configuration
 
 It is recommended to keep the size of a tablet between 1-10GB. Tablets that are too small may lead to poor aggregation performance and increase metadata management overhead; tablets that are too large may hinder replica migration and repair. For details, please refer to: [Data Distribution](../../table-design/data-partitioning/data-distribution).
 
-# Random Bucketing
+## Random Bucketing
 
 When using Random bucketing, you can enable single-tablet loading mode by setting load_to_single_tablet to true. This mode can improve data loading concurrency and throughput while reducing write amplification during large-scale data loading. For details, refer to: [Random Bucketing](../../table-design/data-partitioning/data-bucketing#random-bucketing)
 
-# Batch Loading
+## Batch Loading
 
 Client-side batching: It is recommended to batch data (from several MB to GB in size) on the client side before loading. High-frequency small loads will cause frequent compaction, leading to severe write amplification issues.
 Server-side batching: For high-concurrency small data volume loading, it is recommended to enable [Group Commit](group-commit-manual.md) to implement batching on the server side.
 
-# Partition Loading
+## Partition Loading
 
 It is recommended to load data from only a few partitions at a time. Loading from too many partitions simultaneously will increase memory usage and may cause performance issues. Each tablet in Doris has an active Memtable in memory, which is flushed to disk when it reaches a certain size. To prevent process OOM, when the active Memtable's memory usage is too high, it will trigger early flushing, resulting in many small files and affecting loading performance.
 
-# Large-scale Data Batch Loading
+## Large-scale Data Batch Loading
 
 When dealing with a large number of files or large data volumes, it is recommended to load in batches to avoid high retry costs in case of loading failures and to reduce system resource impact. For Broker Load, it is recommended not to exceed 100GB per batch. For large local data files, you can use Doris's streamloader tool, which automatically performs batch loading.
 
-# Broker Load Concurrency
+## Broker Load Concurrency
 
 Compressed files/Parquet/ORC files: It is recommended to split files into multiple smaller files for loading to achieve higher concurrency.
 
@@ -57,6 +57,6 @@ Uncompressed CSV and JSON files: Doris will automatically split files and load t
 
 For concurrency strategies, please refer to: [Broker Load Configuration Parameters](./import-way/broker-load-manual#Related-Configurations)
 
-# Stream Load Concurrency
+## Stream Load Concurrency
 
 It is recommended to keep Stream load concurrency per BE under 128 (controlled by BE's webserver_num_workers parameter). High concurrency may cause webserver thread exhaustion and affect loading performance. Particularly when a single BE's concurrency exceeds 512 (doris_max_remote_scanner_thread_pool_thread_num parameter), it may cause the BE process to hang.
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/load-best-practices.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/load-best-practices.md
index de35ad83b29..dc310523c00 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/load-best-practices.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/load-best-practices.md
@@ -24,32 +24,32 @@ specific language governing permissions and limitations
 under the License.
 -->
 
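-# Table Model Selection
+## Table Model Selection
 
 It is recommended to prioritize the Duplicate Key model, which has advantages over the other models in both data loading and query performance. For more information, please refer to: [Data Model](../../table-design/data-model/overview)
 
-# Partition and Bucket Configuration
+## Partition and Bucket Configuration
 
 It is recommended to keep the size of a tablet within the range of 1-10G. Tablets that are too small may lead to poor aggregation and increase metadata management pressure; tablets that are too large hinder replica migration and repair. For details, please refer to: [Data Distribution](../../table-design/data-partitioning/data-distribution).
 
-# Random Bucketing
+## Random Bucketing
 
 When using Random bucketing, you can enable single-tablet loading mode by setting load_to_single_tablet to true. During large-scale data loading, this mode can improve loading concurrency and throughput and reduce write amplification. For details, refer to: [Random Bucketing](../../table-design/data-partitioning/data-bucketing#random-分桶)
 
-# Batch Loading
+## Batch Loading
 
 Client-side batching: It is recommended to batch data on the client side (several MB to several GB in size) before loading. High-frequency small loads trigger frequent compaction, leading to severe write amplification.
 Server-side batching: For high-concurrency loads of small data volumes, it is recommended to enable [Group Commit](group-commit-manual.md) to batch loads on the server side.
 
-# Partition Loading
+## Partition Loading
 
 It is recommended that each load write data to only a small number of partitions. Loading into too many partitions at once increases memory usage and may cause performance problems. Each tablet in Doris has one active Memtable in memory, and each Memtable is flushed to disk only when it reaches a certain size. To avoid process OOM, when active Memtables occupy too much memory, Memtable flushing is triggered early, which produces a large number of small files and also affects loading performance.
 
-# Large-scale Data Batch Loading
+## Large-scale Data Batch Loading
 
 When there are many files to load or the data volume is large, it is recommended to load in batches, to avoid a high retry cost after a load failure and to reduce the impact on system resources. For Broker Load, it is recommended that each batch not exceed 100G. For large local data files, you can use the streamloader tool provided by Doris, which automatically loads in batches.
 
-# Broker Load Concurrency
+## Broker Load Concurrency
 
 Compressed files/Parquet/ORC files: It is recommended to split the files into multiple smaller files for loading, to achieve concurrent loading.
 
@@ -57,6 +57,6 @@ under the License.
 
 For the concurrency strategy, please refer to: [Broker Load Configuration Parameters](./import-way/broker-load-manual#导入配置参数)
 
-# Stream Load Concurrent Loading
+## Stream Load Concurrent Loading
 
 It is recommended that Stream load concurrency on a single BE not exceed 128 (controlled by the BE's webserver_num_workers parameter). Excessive concurrency may exhaust the webserver threads and affect loading performance. In particular, when the concurrency on a single BE exceeds 512 (the doris_max_remote_scanner_thread_pool_thread_num parameter), the BE process may hang.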
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/data-operate/import/load-best-practices.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/data-operate/import/load-best-practices.md
index de35ad83b29..dc310523c00 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/data-operate/import/load-best-practices.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/data-operate/import/load-best-practices.md
@@ -24,32 +24,32 @@ specific language governing permissions and limitations
 under the License.
 -->
 
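-# Table Model Selection
+## Table Model Selection
 
 It is recommended to prioritize the Duplicate Key model, which has advantages over the other models in both data loading and query performance. For more information, please refer to: [Data Model](../../table-design/data-model/overview)
 
-# Partition and Bucket Configuration
+## Partition and Bucket Configuration
 
 It is recommended to keep the size of a tablet within the range of 1-10G. Tablets that are too small may lead to poor aggregation and increase metadata management pressure; tablets that are too large hinder replica migration and repair. For details, please refer to: [Data Distribution](../../table-design/data-partitioning/data-distribution).
 
-# Random Bucketing
+## Random Bucketing
 
 When using Random bucketing, you can enable single-tablet loading mode by setting load_to_single_tablet to true. During large-scale data loading, this mode can improve loading concurrency and throughput and reduce write amplification. For details, refer to: [Random Bucketing](../../table-design/data-partitioning/data-bucketing#random-分桶)
 
-# Batch Loading
+## Batch Loading
 
 Client-side batching: It is recommended to batch data on the client side (several MB to several GB in size) before loading. High-frequency small loads trigger frequent compaction, leading to severe write amplification.
 Server-side batching: For high-concurrency loads of small data volumes, it is recommended to enable [Group Commit](group-commit-manual.md) to batch loads on the server side.
 
-# Partition Loading
+## Partition Loading
 
 It is recommended that each load write data to only a small number of partitions. Loading into too many partitions at once increases memory usage and may cause performance problems. Each tablet in Doris has one active Memtable in memory, and each Memtable is flushed to disk only when it reaches a certain size. To avoid process OOM, when active Memtables occupy too much memory, Memtable flushing is triggered early, which produces a large number of small files and also affects loading performance.
 
-# Large-scale Data Batch Loading
+## Large-scale Data Batch Loading
 
 When there are many files to load or the data volume is large, it is recommended to load in batches, to avoid a high retry cost after a load failure and to reduce the impact on system resources. For Broker Load, it is recommended that each batch not exceed 100G. For large local data files, you can use the streamloader tool provided by Doris, which automatically loads in batches.
 
-# Broker Load Concurrency
+## Broker Load Concurrency
 
 Compressed files/Parquet/ORC files: It is recommended to split the files into multiple smaller files for loading, to achieve concurrent loading.
 
@@ -57,6 +57,6 @@ under the License.
 
 For the concurrency strategy, please refer to: [Broker Load Configuration Parameters](./import-way/broker-load-manual#导入配置参数)
 
-# Stream Load Concurrent Loading
+## Stream Load Concurrent Loading
 
 It is recommended that Stream load concurrency on a single BE not exceed 128 (controlled by the BE's webserver_num_workers parameter). Excessive concurrency may exhaust the webserver threads and affect loading performance. In particular, when the concurrency on a single BE exceeds 512 (the doris_max_remote_scanner_thread_pool_thread_num parameter), the BE process may hang.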
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/data-operate/import/load-best-practices.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/data-operate/import/load-best-practices.md
index de35ad83b29..dc310523c00 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/data-operate/import/load-best-practices.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/data-operate/import/load-best-practices.md
@@ -24,32 +24,32 @@ specific language governing permissions and limitations
 under the License.
 -->
 
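-# Table Model Selection
+## Table Model Selection
 
 It is recommended to prioritize the Duplicate Key model, which has advantages over the other models in both data loading and query performance. For more information, please refer to: [Data Model](../../table-design/data-model/overview)
 
-# Partition and Bucket Configuration
+## Partition and Bucket Configuration
 
 It is recommended to keep the size of a tablet within the range of 1-10G. Tablets that are too small may lead to poor aggregation and increase metadata management pressure; tablets that are too large hinder replica migration and repair. For details, please refer to: [Data Distribution](../../table-design/data-partitioning/data-distribution).
 
-# Random Bucketing
+## Random Bucketing
 
 When using Random bucketing, you can enable single-tablet loading mode by setting load_to_single_tablet to true. During large-scale data loading, this mode can improve loading concurrency and throughput and reduce write amplification. For details, refer to: [Random Bucketing](../../table-design/data-partitioning/data-bucketing#random-分桶)
 
-# Batch Loading
+## Batch Loading
 
 Client-side batching: It is recommended to batch data on the client side (several MB to several GB in size) before loading. High-frequency small loads trigger frequent compaction, leading to severe write amplification.
 Server-side batching: For high-concurrency loads of small data volumes, it is recommended to enable [Group Commit](group-commit-manual.md) to batch loads on the server side.
 
-# Partition Loading
+## Partition Loading
 
 It is recommended that each load write data to only a small number of partitions. Loading into too many partitions at once increases memory usage and may cause performance problems. Each tablet in Doris has one active Memtable in memory, and each Memtable is flushed to disk only when it reaches a certain size. To avoid process OOM, when active Memtables occupy too much memory, Memtable flushing is triggered early, which produces a large number of small files and also affects loading performance.
 
-# Large-scale Data Batch Loading
+## Large-scale Data Batch Loading
 
 When there are many files to load or the data volume is large, it is recommended to load in batches, to avoid a high retry cost after a load failure and to reduce the impact on system resources. For Broker Load, it is recommended that each batch not exceed 100G. For large local data files, you can use the streamloader tool provided by Doris, which automatically loads in batches.
 
-# Broker Load Concurrency
+## Broker Load Concurrency
 
 Compressed files/Parquet/ORC files: It is recommended to split the files into multiple smaller files for loading, to achieve concurrent loading.
 
@@ -57,6 +57,6 @@ under the License.
 
 For the concurrency strategy, please refer to: [Broker Load Configuration Parameters](./import-way/broker-load-manual#导入配置参数)
 
-# Stream Load Concurrent Loading
+## Stream Load Concurrent Loading
 
 It is recommended that Stream load concurrency on a single BE not exceed 128 (controlled by the BE's webserver_num_workers parameter). Excessive concurrency may exhaust the webserver threads and affect loading performance. In particular, when the concurrency on a single BE exceeds 512 (the doris_max_remote_scanner_thread_pool_thread_num parameter), the BE process may hang.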
diff --git a/versioned_docs/version-2.1/data-operate/import/load-best-practices.md b/versioned_docs/version-2.1/data-operate/import/load-best-practices.md
index 8e563aed778..efe4736f20e 100644
--- a/versioned_docs/version-2.1/data-operate/import/load-best-practices.md
+++ b/versioned_docs/version-2.1/data-operate/import/load-best-practices.md
@@ -24,32 +24,32 @@ specific language governing permissions and limitations
 under the License.
 -->
 
-# Table Model Selection
+## Table Model Selection
 
 It is recommended to prioritize using the Duplicate Key model, which offers advantages in both data loading and query performance compared to other models. For more information, please refer to: [Data Model](../../table-design/data-model/overview)
 
-# Partition and Bucket Configuration
+## Partition and Bucket Configuration
 
 It is recommended to keep the size of a tablet between 1-10GB. Tablets that are too small may lead to poor aggregation performance and increase metadata management overhead; tablets that are too large may hinder replica migration and repair. For details, please refer to: [Data Distribution](../../table-design/data-partitioning/data-distribution).
 
-# Random Bucketing
+## Random Bucketing
 
 When using Random bucketing, you can enable single-tablet loading mode by setting load_to_single_tablet to true. This mode can improve data loading concurrency and throughput while reducing write amplification during large-scale data loading. For details, refer to: [Random Bucketing](../../table-design/data-partitioning/data-bucketing#random-bucketing)
 
-# Batch Loading
+## Batch Loading
 
 Client-side batching: It is recommended to batch data (from several MB to GB in size) on the client side before loading. High-frequency small loads will cause frequent compaction, leading to severe write amplification issues.
 Server-side batching: For high-concurrency small data volume loading, it is recommended to enable [Group Commit](group-commit-manual.md) to implement batching on the server side.
 
-# Partition Loading
+## Partition Loading
 
 It is recommended to load data from only a few partitions at a time. Loading from too many partitions simultaneously will increase memory usage and may cause performance issues. Each tablet in Doris has an active Memtable in memory, which is flushed to disk when it reaches a certain size. To prevent process OOM, when the active Memtable's memory usage is too high, it will trigger early flushing, resulting in many small files and affecting loading performance.
 
-# Large-scale Data Batch Loading
+## Large-scale Data Batch Loading
 
 When dealing with a large number of files or large data volumes, it is recommended to load in batches to avoid high retry costs in case of loading failures and to reduce system resource impact. For Broker Load, it is recommended not to exceed 100GB per batch. For large local data files, you can use Doris's streamloader tool, which automatically performs batch loading.
 
-# Broker Load Concurrency
+## Broker Load Concurrency
 
 Compressed files/Parquet/ORC files: It is recommended to split files into multiple smaller files for loading to achieve higher concurrency.
 
@@ -57,6 +57,6 @@ Uncompressed CSV and JSON files: Doris will automatically split files and load t
 
 For concurrency strategies, please refer to: [Broker Load Configuration Parameters](./import-way/broker-load-manual#Related-Configurations)
 
-# Stream Load Concurrency
+## Stream Load Concurrency
 
 It is recommended to keep Stream load concurrency per BE under 128 (controlled by BE's webserver_num_workers parameter). High concurrency may cause webserver thread exhaustion and affect loading performance. Particularly when a single BE's concurrency exceeds 512 (doris_max_remote_scanner_thread_pool_thread_num parameter), it may cause the BE process to hang.
diff --git a/versioned_docs/version-3.0/data-operate/import/load-best-practices.md b/versioned_docs/version-3.0/data-operate/import/load-best-practices.md
index 8e563aed778..efe4736f20e 100644
--- a/versioned_docs/version-3.0/data-operate/import/load-best-practices.md
+++ b/versioned_docs/version-3.0/data-operate/import/load-best-practices.md
@@ -24,32 +24,32 @@ specific language governing permissions and limitations
 under the License.
 -->
 
-# Table Model Selection
+## Table Model Selection
 
 It is recommended to prioritize using the Duplicate Key model, which offers advantages in both data loading and query performance compared to other models. For more information, please refer to: [Data Model](../../table-design/data-model/overview)
 
-# Partition and Bucket Configuration
+## Partition and Bucket Configuration
 
 It is recommended to keep the size of a tablet between 1-10GB. Tablets that are too small may lead to poor aggregation performance and increase metadata management overhead; tablets that are too large may hinder replica migration and repair. For details, please refer to: [Data Distribution](../../table-design/data-partitioning/data-distribution).
 
-# Random Bucketing
+## Random Bucketing
 
 When using Random bucketing, you can enable single-tablet loading mode by setting load_to_single_tablet to true. This mode can improve data loading concurrency and throughput while reducing write amplification during large-scale data loading. For details, refer to: [Random Bucketing](../../table-design/data-partitioning/data-bucketing#random-bucketing)
 
-# Batch Loading
+## Batch Loading
 
 Client-side batching: It is recommended to batch data (from several MB to GB in size) on the client side before loading. High-frequency small loads will cause frequent compaction, leading to severe write amplification issues.
 Server-side batching: For high-concurrency small data volume loading, it is recommended to enable [Group Commit](group-commit-manual.md) to implement batching on the server side.
 
-# Partition Loading
+## Partition Loading
 
 It is recommended to load data from only a few partitions at a time. Loading from too many partitions simultaneously will increase memory usage and may cause performance issues. Each tablet in Doris has an active Memtable in memory, which is flushed to disk when it reaches a certain size. To prevent process OOM, when the active Memtable's memory usage is too high, it will trigger early flushing, resulting in many small files and affecting loading performance.
 
-# Large-scale Data Batch Loading
+## Large-scale Data Batch Loading
 
 When dealing with a large number of files or large data volumes, it is recommended to load in batches to avoid high retry costs in case of loading failures and to reduce system resource impact. For Broker Load, it is recommended not to exceed 100GB per batch. For large local data files, you can use Doris's streamloader tool, which automatically performs batch loading.
 
-# Broker Load Concurrency
+## Broker Load Concurrency
 
 Compressed files/Parquet/ORC files: It is recommended to split files into multiple smaller files for loading to achieve higher concurrency.
 
@@ -57,6 +57,6 @@ Uncompressed CSV and JSON files: Doris will automatically split files and load t
 
 For concurrency strategies, please refer to: [Broker Load Configuration Parameters](./import-way/broker-load-manual#Related-Configurations)
 
-# Stream Load Concurrency
+## Stream Load Concurrency
 
 It is recommended to keep Stream load concurrency per BE under 128 (controlled by BE's webserver_num_workers parameter). High concurrency may cause webserver thread exhaustion and affect loading performance. Particularly when a single BE's concurrency exceeds 512 (doris_max_remote_scanner_thread_pool_thread_num parameter), it may cause the BE process to hang.
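
Every hunk in this commit makes the same mechanical change: each top-level `# ` heading is demoted to `## `. The transformation could be sketched roughly as below. This is an illustration only, not the tooling the author used (the commit does not say how the edit was made), and the function name `demote_top_level_headings` is hypothetical.

```python
def demote_top_level_headings(markdown_text: str) -> str:
    """Turn each top-level '# ' heading into '## ', leaving fenced code alone."""
    result = []
    in_fence = False
    for line in markdown_text.splitlines(keepends=True):
        if line.lstrip().startswith("```"):
            # Toggle on every fence marker so '# ' comments inside
            # code blocks are not mistaken for headings.
            in_fence = not in_fence
        elif not in_fence and line.startswith("# "):
            line = "#" + line
        result.append(line)
    return "".join(result)


# Hypothetical sample input; headings already at level 2 are left unchanged.
sample = "# Batch Loading\n\nSome text.\n\n## Already H2\n"
print(demote_top_level_headings(sample), end="")
```

Only lines that begin with `# ` followed by a space are rewritten, so deeper headings and `#` characters inside prose are left alone.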


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org
For additional commands, e-mail: commits-h...@doris.apache.org
