This is an automated email from the ASF dual-hosted git repository.

eldenmoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris-website.git


The following commit(s) were added to refs/heads/master by this push:
     new 2f5095306ac add description to tuning variant column count limits 
(#2659)
2f5095306ac is described below

commit 2f5095306ac04a5e2da2a8627220fe584a0f2ed4
Author: lihangyu <[email protected]>
AuthorDate: Fri Jul 25 16:42:31 2025 +0800

    add description to tuning variant column count limits (#2659)
    
    ## Versions
    
    - [x] dev
    - [x] 3.0
    - [x] 2.1
    - [ ] 2.0
    
    ## Languages
    
    - [x] Chinese
    - [x] English
    
    ## Docs Checklist
    
    - [ ] Checked by AI
    - [ ] Test Cases Built
---
 .../sql-data-types/semi-structured/VARIANT.md      | 22 +++++++++++++++++++++-
 .../sql-data-types/semi-structured/VARIANT.md      | 14 +++++++++++++-
 .../sql-data-types/semi-structured/VARIANT.md      | 15 ++++++++++++++-
 .../sql-data-types/semi-structured/VARIANT.md      | 16 +++++++++++++++-
 .../sql-data-types/semi-structured/VARIANT.md      | 22 +++++++++++++++++++++-
 .../sql-data-types/semi-structured/VARIANT.md      | 22 +++++++++++++++++++++-
 6 files changed, 105 insertions(+), 6 deletions(-)

diff --git 
a/docs/sql-manual/basic-element/sql-data-types/semi-structured/VARIANT.md 
b/docs/sql-manual/basic-element/sql-data-types/semi-structured/VARIANT.md
index 6607f13b302..3c55900fc00 100644
--- a/docs/sql-manual/basic-element/sql-data-types/semi-structured/VARIANT.md
+++ b/docs/sql-manual/basic-element/sql-data-types/semi-structured/VARIANT.md
@@ -448,11 +448,31 @@ CREATE TABLE example_table (
 SELECT * FROM example_table WHERE data_string LIKE '%doris%'
 ```
 
+**Tuning Techniques for Column-Count Limits:**
+
+Note: If the number of sub-columns exceeds 5,000, higher requirements for 
memory and configuration apply. On a single machine, aim for at least 128 GB of 
RAM and 32 CPU cores.
+
+1. In BE configuration, adjust `variant_max_merged_tablet_schema_size=n`, 
where n should be greater than the actual number of columns (not recommended to 
exceed 10,000).
+
+2. Be aware that extracting too many columns will put heavy pressure on 
compaction (import throughput must be throttled accordingly). Increasing the 
client-side import `batch_size`—based on memory usage—can reduce write 
amplification during compaction. Alternatively, enable `group_commit` (a table 
property) and appropriately increase `group_commit_interval_ms` and 
`group_commit_data_bytes`.
+
+3. If your queries do not require bucket pruning, use random bucketing and 
enable the 
[load_to_single_tablet](../../../../table-design/data-partitioning/data-bucketing#bucketing)
 import setting (an import configuration) to reduce compaction write 
amplification.
+
+4. In BE configuration, adjust `max_cumu_compaction_threads` according to 
import pressure; ensure at least 8 threads.
+
+5. In BE configuration, set `vertical_compaction_num_columns_per_group=500` to 
improve grouped-compaction efficiency, although this increases memory overhead.
+
+6. In BE configuration, set `segment_cache_memory_percentage=20` to increase 
segment cache capacity and improve metadata caching efficiency.
+
+7. Monitor the Compaction Score closely. A continuously rising score indicates 
that compaction cannot keep up (import pressure should be reduced accordingly).
+
+8. Using `SELECT *` or `SELECT variant` can significantly increase 
cluster-wide pressure, potentially causing timeouts or out-of-memory errors. It 
is recommended to include path information in queries—for example, `SELECT 
variant['path_1']`.
+
 ### FAQ
 
 1. Streamload Error: [CANCELLED][INTERNAL_ERROR] tablet error: 
[DATA_QUALITY_ERROR] Reached max column size limit 2048.
 
-Due to compaction and metadata storage limitations, the VARIANT type imposes a 
limit on the number of columns, with the default being 2048 columns. You can 
adjust the BE configuration `variant_max_merged_tablet_schema_size` 
accordingly, but it is not recommended to exceed 4096 columns.
+Due to compaction and metadata storage limitations, the VARIANT type imposes a 
limit on the number of columns, with the default being 2048 columns. You can 
adjust the BE configuration `variant_max_merged_tablet_schema_size` 
accordingly, but it is not recommended to exceed 4096 columns(Requires 
higher-spec hardware).
 
 2. Is there a difference between null in the VARIANT type (e.g., `{"key": 
null}`) and SQL NULL (i.e., IS NULL)?
 
diff --git 
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/basic-element/sql-data-types/semi-structured/VARIANT.md
 
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/basic-element/sql-data-types/semi-structured/VARIANT.md
index c5b5b2561f2..1ad51ea6ce4 100644
--- 
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/basic-element/sql-data-types/semi-structured/VARIANT.md
+++ 
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/basic-element/sql-data-types/semi-structured/VARIANT.md
@@ -363,10 +363,22 @@ CREATE TABLE example_table (
 SELECT * FROM example_table WHERE data_string LIKE '%doris%';
 ```
 
+**针对列数限制的调优手段:**
+
+注意如果是 超过 5000 子列,对内存和配置有比较高的要求,单机尽可能 128G 以上内存,核数 32C 以上
+1. BE 配置`variant_max_merged_tablet_schema_size=n` n 大于实际的列数(不推荐超过 10000)
+2. 需要注意的是,提取的列数过多会导致 compaction 的压力过大(需要控制导入的吞吐)。根据内存使用情况增大客户端导入的 batch_size 
可以降低 compaction 的写放大(或者推荐使用 group_commit,表 properties 
配置,适当增加`group_commit_interval_ms` 和 `group_commit_data_bytes`)
+3. 如果查询没有分桶裁剪的需求,可以使用 random 分桶,开启 
[load_to_single_tablet](../../../../table-design/data-partitioning/data-bucketing#bucketing)
 导入(导入的配置),可以减少 compaction 写放大
+4. BE 配置 根据导入压力调整 `max_cumu_compaction_threads`,至少保证 8 个线程
+5. BE 配置`vertical_compaction_num_columns_per_group=500`提升分组 compaction 
效率,但是会增加内存开销销
+6. BE 配置`segment_cache_memory_percentage=20`增加 segment 缓存的容量,提升元数据缓存效率率
+7. 注意关注 Compaction Score,如果 Score 持续增加会导致,Score 过高反应 Compaction 
做不过来(需要适当降低导入压力)
+8. `SELECT *` 或者 `SELECT variant` 会导致集群整体压力明显上升,甚至出现超时或者内存超限。建议查询带上 Path 信息例如 
`SELECT variant['path_1']`。
+
 ### FAQ
 1. Stream Load 报错: [CANCELLED][INTERNAL_ERROR]tablet error: 
[DATA_QUALITY_ERROR]Reached max column size limit 2048。
 
-    由于 Compaction 和元信息存储限制,VARIANT 类型会限制列数,默认 2048 列,可以适当调整 BE 配置 
`variant_max_merged_tablet_schema_size` ,但是不建议超过 4096
+    由于 Compaction 和元信息存储限制,VARIANT 类型会限制列数,默认 2048 列,可以适当调整 BE 配置 
`variant_max_merged_tablet_schema_size`,但是不建议超过 4096(依赖较高配置机型)
 
 2. VARIANT 类型中的 null(例如 `{"key": null}` )和 SQL 中的 NULL(即 IS NULL)有区别吗? 
 
diff --git 
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/sql-manual/basic-element/sql-data-types/semi-structured/VARIANT.md
 
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/sql-manual/basic-element/sql-data-types/semi-structured/VARIANT.md
index c5b5b2561f2..15e2b7550b6 100644
--- 
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/sql-manual/basic-element/sql-data-types/semi-structured/VARIANT.md
+++ 
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/sql-manual/basic-element/sql-data-types/semi-structured/VARIANT.md
@@ -363,10 +363,23 @@ CREATE TABLE example_table (
 SELECT * FROM example_table WHERE data_string LIKE '%doris%';
 ```
 
+**针对列数限制的调优手段:**
+
+注意如果是 超过 5000 子列,对内存和配置有比较高的要求,单机尽可能 128G 以上内存,核数 32C 以上
+1. BE 配置`variant_max_merged_tablet_schema_size=n` n 大于实际的列数(不推荐超过 10000)
+2. 需要注意的是,提取的列数过多会导致 compaction 的压力过大(需要控制导入的吞吐)。根据内存使用情况增大客户端导入的 batch_size 
可以降低 compaction 的写放大(或者推荐使用 group_commit,表 properties 
配置,适当增加`group_commit_interval_ms` 和 `group_commit_data_bytes`)
+3. 如果查询没有分桶裁剪的需求,可以使用 random 分桶,开启 
[load_to_single_tablet](../../../../table-design/data-partitioning/data-bucketing#bucketing)
 导入(导入的配置),可以减少 compaction 写放大
+4. BE 配置 根据导入压力调整 `max_cumu_compaction_threads`,至少保证 8 个线程
+5. BE 配置`vertical_compaction_num_columns_per_group=500`提升分组 compaction 
效率,但是会增加内存开销销
+6. BE 配置`segment_cache_memory_percentage=20`增加 segment 缓存的容量,提升元数据缓存效率率
+7. 注意关注 Compaction Score,如果 Score 持续增加会导致,Score 过高反应 Compaction 
做不过来(需要适当降低导入压力)
+8. `SELECT *` 或者 `SELECT variant` 会导致集群整体压力明显上升,甚至出现超时或者内存超限。建议查询带上 Path 信息例如 
`SELECT variant['path_1']`。
+
+
 ### FAQ
 1. Stream Load 报错: [CANCELLED][INTERNAL_ERROR]tablet error: 
[DATA_QUALITY_ERROR]Reached max column size limit 2048。
 
-    由于 Compaction 和元信息存储限制,VARIANT 类型会限制列数,默认 2048 列,可以适当调整 BE 配置 
`variant_max_merged_tablet_schema_size` ,但是不建议超过 4096
+    由于 Compaction 和元信息存储限制,VARIANT 类型会限制列数,默认 2048 列,可以适当调整 BE 配置 
`variant_max_merged_tablet_schema_size` ,但是不建议超过 4096(依赖较高配置机型)
 
 2. VARIANT 类型中的 null(例如 `{"key": null}` )和 SQL 中的 NULL(即 IS NULL)有区别吗? 
 
diff --git 
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/sql-manual/basic-element/sql-data-types/semi-structured/VARIANT.md
 
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/sql-manual/basic-element/sql-data-types/semi-structured/VARIANT.md
index d0cd3345973..c36adf63e98 100644
--- 
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/sql-manual/basic-element/sql-data-types/semi-structured/VARIANT.md
+++ 
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/sql-manual/basic-element/sql-data-types/semi-structured/VARIANT.md
@@ -363,10 +363,24 @@ CREATE TABLE example_table (
 SELECT * FROM example_table WHERE data_string LIKE '%doris%';
 ```
 
+**针对列数限制的调优手段:**
+
+**针对列数限制的调优手段:**
+
+注意如果是 超过 5000 子列,对内存和配置有比较高的要求,单机尽可能 128G 以上内存,核数 32C 以上
+1. BE 配置`variant_max_merged_tablet_schema_size=n` n 大于实际的列数(不推荐超过 10000)
+2. 需要注意的是,提取的列数过多会导致 compaction 的压力过大(需要控制导入的吞吐)。根据内存使用情况增大客户端导入的 batch_size 
可以降低 compaction 的写放大(或者推荐使用 group_commit,表 properties 
配置,适当增加`group_commit_interval_ms` 和 `group_commit_data_bytes`)
+3. 如果查询没有分桶裁剪的需求,可以使用 random 分桶,开启 
[load_to_single_tablet](../../../../table-design/data-partitioning/data-bucketing#bucketing)
 导入(导入的配置),可以减少 compaction 写放大
+4. BE 配置 根据导入压力调整 `max_cumu_compaction_threads`,至少保证 8 个线程
+5. BE 配置`vertical_compaction_num_columns_per_group=500`提升分组 compaction 
效率,但是会增加内存开销销
+6. BE 配置`segment_cache_memory_percentage=20`增加 segment 缓存的容量,提升元数据缓存效率率
+7. 注意关注 Compaction Score,如果 Score 持续增加会导致,Score 过高反应 Compaction 
做不过来(需要适当降低导入压力)
+8. `SELECT *` 或者 `SELECT variant` 会导致集群整体压力明显上升,甚至出现超时或者内存超限。建议查询带上 Path 信息例如 
`SELECT variant['path_1']`。
+
 ### FAQ
 1. Stream Load 报错: [CANCELLED][INTERNAL_ERROR]tablet error: 
[DATA_QUALITY_ERROR]Reached max column size limit 2048。
 
-    由于 Compaction 和元信息存储限制,VARIANT 类型会限制列数,默认 2048 列,可以适当调整 BE 配置 
`variant_max_merged_tablet_schema_size` ,但是不建议超过 4096
+    由于 Compaction 和元信息存储限制,VARIANT 类型会限制列数,默认 2048 列,可以适当调整 BE 配置 
`variant_max_merged_tablet_schema_size` ,但是不建议超过 4096(依赖较高配置机型)
 
 2. VARIANT 类型中的 null(例如 `{"key": null}` )和 SQL 中的 NULL(即 IS NULL)有区别吗? 
 
diff --git 
a/versioned_docs/version-2.1/sql-manual/basic-element/sql-data-types/semi-structured/VARIANT.md
 
b/versioned_docs/version-2.1/sql-manual/basic-element/sql-data-types/semi-structured/VARIANT.md
index 1829c0723ba..360e7c3f8c3 100644
--- 
a/versioned_docs/version-2.1/sql-manual/basic-element/sql-data-types/semi-structured/VARIANT.md
+++ 
b/versioned_docs/version-2.1/sql-manual/basic-element/sql-data-types/semi-structured/VARIANT.md
@@ -359,11 +359,31 @@ CREATE TABLE example_table (
 SELECT * FROM example_table WHERE data_string LIKE '%doris%'
 ```
 
+**Tuning Techniques for Column-Count Limits:**
+
+Note: If the number of sub-columns exceeds 5,000, higher requirements for 
memory and configuration apply. On a single machine, aim for at least 128 GB of 
RAM and 32 CPU cores.
+
+1. In BE configuration, adjust `variant_max_merged_tablet_schema_size=n`, 
where n should be greater than the actual number of columns (not recommended to 
exceed 10,000).
+
+2. Be aware that extracting too many columns will put heavy pressure on 
compaction (import throughput must be throttled accordingly). Increasing the 
client-side import `batch_size`—based on memory usage—can reduce write 
amplification during compaction. Alternatively, enable `group_commit` (a table 
property) and appropriately increase `group_commit_interval_ms` and 
`group_commit_data_bytes`.
+
+3. If your queries do not require bucket pruning, use random bucketing and 
enable the 
[load_to_single_tablet](../../../../table-design/data-partitioning/data-bucketing#bucketing)
 import setting (an import configuration) to reduce compaction write 
amplification.
+
+4. In BE configuration, adjust `max_cumu_compaction_threads` according to 
import pressure; ensure at least 8 threads.
+
+5. In BE configuration, set `vertical_compaction_num_columns_per_group=500` to 
improve grouped-compaction efficiency, although this increases memory overhead.
+
+6. In BE configuration, set `segment_cache_memory_percentage=20` to increase 
segment cache capacity and improve metadata caching efficiency.
+
+7. Monitor the Compaction Score closely. A continuously rising score indicates 
that compaction cannot keep up (import pressure should be reduced accordingly).
+
+8. Using `SELECT *` or `SELECT variant` can significantly increase 
cluster-wide pressure, potentially causing timeouts or out-of-memory errors. It 
is recommended to include path information in queries—for example, `SELECT 
variant['path_1']`.
+
 ### FAQ
 
 1. Streamload Error: [CANCELLED][INTERNAL_ERROR] tablet error: 
[DATA_QUALITY_ERROR] Reached max column size limit 2048.
 
-Due to compaction and metadata storage limitations, the VARIANT type imposes a 
limit on the number of columns, with the default being 2048 columns. You can 
adjust the BE configuration `variant_max_merged_tablet_schema_size` 
accordingly, but it is not recommended to exceed 4096 columns.
+Due to compaction and metadata storage limitations, the VARIANT type imposes a 
limit on the number of columns, with the default being 2048 columns. You can 
adjust the BE configuration `variant_max_merged_tablet_schema_size` 
accordingly, but it is not recommended to exceed 4096 columns(Requires 
higher-spec hardware).
 
 2. Is there a difference between null in the VARIANT type (e.g., `{"key": 
null}`) and SQL NULL (i.e., IS NULL)?
 
diff --git 
a/versioned_docs/version-3.0/sql-manual/basic-element/sql-data-types/semi-structured/VARIANT.md
 
b/versioned_docs/version-3.0/sql-manual/basic-element/sql-data-types/semi-structured/VARIANT.md
index 1829c0723ba..360e7c3f8c3 100644
--- 
a/versioned_docs/version-3.0/sql-manual/basic-element/sql-data-types/semi-structured/VARIANT.md
+++ 
b/versioned_docs/version-3.0/sql-manual/basic-element/sql-data-types/semi-structured/VARIANT.md
@@ -359,11 +359,31 @@ CREATE TABLE example_table (
 SELECT * FROM example_table WHERE data_string LIKE '%doris%'
 ```
 
+**Tuning Techniques for Column-Count Limits:**
+
+Note: If the number of sub-columns exceeds 5,000, higher requirements for 
memory and configuration apply. On a single machine, aim for at least 128 GB of 
RAM and 32 CPU cores.
+
+1. In BE configuration, adjust `variant_max_merged_tablet_schema_size=n`, 
where n should be greater than the actual number of columns (not recommended to 
exceed 10,000).
+
+2. Be aware that extracting too many columns will put heavy pressure on 
compaction (import throughput must be throttled accordingly). Increasing the 
client-side import `batch_size`—based on memory usage—can reduce write 
amplification during compaction. Alternatively, enable `group_commit` (a table 
property) and appropriately increase `group_commit_interval_ms` and 
`group_commit_data_bytes`.
+
+3. If your queries do not require bucket pruning, use random bucketing and 
enable the 
[load_to_single_tablet](../../../../table-design/data-partitioning/data-bucketing#bucketing)
 import setting (an import configuration) to reduce compaction write 
amplification.
+
+4. In BE configuration, adjust `max_cumu_compaction_threads` according to 
import pressure; ensure at least 8 threads.
+
+5. In BE configuration, set `vertical_compaction_num_columns_per_group=500` to 
improve grouped-compaction efficiency, although this increases memory overhead.
+
+6. In BE configuration, set `segment_cache_memory_percentage=20` to increase 
segment cache capacity and improve metadata caching efficiency.
+
+7. Monitor the Compaction Score closely. A continuously rising score indicates 
that compaction cannot keep up (import pressure should be reduced accordingly).
+
+8. Using `SELECT *` or `SELECT variant` can significantly increase 
cluster-wide pressure, potentially causing timeouts or out-of-memory errors. It 
is recommended to include path information in queries—for example, `SELECT 
variant['path_1']`.
+
 ### FAQ
 
 1. Streamload Error: [CANCELLED][INTERNAL_ERROR] tablet error: 
[DATA_QUALITY_ERROR] Reached max column size limit 2048.
 
-Due to compaction and metadata storage limitations, the VARIANT type imposes a 
limit on the number of columns, with the default being 2048 columns. You can 
adjust the BE configuration `variant_max_merged_tablet_schema_size` 
accordingly, but it is not recommended to exceed 4096 columns.
+Due to compaction and metadata storage limitations, the VARIANT type imposes a 
limit on the number of columns, with the default being 2048 columns. You can 
adjust the BE configuration `variant_max_merged_tablet_schema_size` 
accordingly, but it is not recommended to exceed 4096 columns(Requires 
higher-spec hardware).
 
 2. Is there a difference between null in the VARIANT type (e.g., `{"key": 
null}`) and SQL NULL (i.e., IS NULL)?
 


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to