This is an automated email from the ASF dual-hosted git repository.
eldenmoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris-website.git
The following commit(s) were added to refs/heads/master by this push:
new 2f5095306ac add description to tuning variant column count limits
(#2659)
2f5095306ac is described below
commit 2f5095306ac04a5e2da2a8627220fe584a0f2ed4
Author: lihangyu <[email protected]>
AuthorDate: Fri Jul 25 16:42:31 2025 +0800
add description to tuning variant column count limits (#2659)
## Versions
- [x] dev
- [x] 3.0
- [x] 2.1
- [ ] 2.0
## Languages
- [x] Chinese
- [x] English
## Docs Checklist
- [ ] Checked by AI
- [ ] Test Cases Built
---
.../sql-data-types/semi-structured/VARIANT.md | 22 +++++++++++++++++++++-
.../sql-data-types/semi-structured/VARIANT.md | 14 +++++++++++++-
.../sql-data-types/semi-structured/VARIANT.md | 15 ++++++++++++++-
.../sql-data-types/semi-structured/VARIANT.md | 16 +++++++++++++++-
.../sql-data-types/semi-structured/VARIANT.md | 22 +++++++++++++++++++++-
.../sql-data-types/semi-structured/VARIANT.md | 22 +++++++++++++++++++++-
6 files changed, 105 insertions(+), 6 deletions(-)
diff --git
a/docs/sql-manual/basic-element/sql-data-types/semi-structured/VARIANT.md
b/docs/sql-manual/basic-element/sql-data-types/semi-structured/VARIANT.md
index 6607f13b302..3c55900fc00 100644
--- a/docs/sql-manual/basic-element/sql-data-types/semi-structured/VARIANT.md
+++ b/docs/sql-manual/basic-element/sql-data-types/semi-structured/VARIANT.md
@@ -448,11 +448,31 @@ CREATE TABLE example_table (
SELECT * FROM example_table WHERE data_string LIKE '%doris%'
```
+**Tuning Techniques for Column-Count Limits:**
+
+Note: If the number of sub-columns exceeds 5,000, higher requirements for
memory and configuration apply. On a single machine, aim for at least 128 GB of
RAM and 32 CPU cores.
+
+1. In BE configuration, adjust `variant_max_merged_tablet_schema_size=n`,
where n should be greater than the actual number of columns (not recommended to
exceed 10,000).
+
+2. Be aware that extracting too many columns will put heavy pressure on
compaction (import throughput must be throttled accordingly). Increasing the
client-side import `batch_size`—based on memory usage—can reduce write
amplification during compaction. Alternatively, enable `group_commit` (a table
property) and appropriately increase `group_commit_interval_ms` and
`group_commit_data_bytes`.
+
+3. If your queries do not require bucket pruning, use random bucketing and
enable the
[load_to_single_tablet](../../../../table-design/data-partitioning/data-bucketing#bucketing)
import setting (an import configuration) to reduce compaction write
amplification.
+
+4. In BE configuration, adjust `max_cumu_compaction_threads` according to
import pressure; ensure at least 8 threads.
+
+5. In BE configuration, set `vertical_compaction_num_columns_per_group=500` to
improve grouped-compaction efficiency, although this increases memory overhead.
+
+6. In BE configuration, set `segment_cache_memory_percentage=20` to increase
segment cache capacity and improve metadata caching efficiency.
+
+7. Monitor the Compaction Score closely. A continuously rising score indicates
that compaction cannot keep up (import pressure should be reduced accordingly).
+
+8. Using `SELECT *` or `SELECT variant` can significantly increase
cluster-wide pressure, potentially causing timeouts or out-of-memory errors. It
is recommended to include path information in queries—for example, `SELECT
variant['path_1']`.
+
### FAQ
1. Streamload Error: [CANCELLED][INTERNAL_ERROR] tablet error:
[DATA_QUALITY_ERROR] Reached max column size limit 2048.
-Due to compaction and metadata storage limitations, the VARIANT type imposes a
limit on the number of columns, with the default being 2048 columns. You can
adjust the BE configuration `variant_max_merged_tablet_schema_size`
accordingly, but it is not recommended to exceed 4096 columns.
+Due to compaction and metadata storage limitations, the VARIANT type imposes a
limit on the number of columns, with the default being 2048 columns. You can
adjust the BE configuration `variant_max_merged_tablet_schema_size`
accordingly, but it is not recommended to exceed 4096 columns(Requires
higher-spec hardware).
2. Is there a difference between null in the VARIANT type (e.g., `{"key":
null}`) and SQL NULL (i.e., IS NULL)?
diff --git
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/basic-element/sql-data-types/semi-structured/VARIANT.md
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/basic-element/sql-data-types/semi-structured/VARIANT.md
index c5b5b2561f2..1ad51ea6ce4 100644
---
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/basic-element/sql-data-types/semi-structured/VARIANT.md
+++
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/basic-element/sql-data-types/semi-structured/VARIANT.md
@@ -363,10 +363,22 @@ CREATE TABLE example_table (
SELECT * FROM example_table WHERE data_string LIKE '%doris%';
```
+**针对列数限制的调优手段:**
+
+注意如果是 超过 5000 子列,对内存和配置有比较高的要求,单机尽可能 128G 以上内存,核数 32C 以上
+1. BE 配置`variant_max_merged_tablet_schema_size=n` n 大于实际的列数(不推荐超过 10000)
+2. 需要注意的是,提取的列数过多会导致 compaction 的压力过大(需要控制导入的吞吐)。根据内存使用情况增大客户端导入的 batch_size
可以降低 compaction 的写放大(或者推荐使用 group_commit,表 properties
配置,适当增加`group_commit_interval_ms` 和 `group_commit_data_bytes`)
+3. 如果查询没有分桶裁剪的需求,可以使用 random 分桶,开启
[load_to_single_tablet](../../../../table-design/data-partitioning/data-bucketing#bucketing)
导入(导入的配置),可以减少 compaction 写放大
+4. BE 配置 根据导入压力调整 `max_cumu_compaction_threads`,至少保证 8 个线程
+5. BE 配置`vertical_compaction_num_columns_per_group=500`提升分组 compaction
效率,但是会增加内存开销销
+6. BE 配置`segment_cache_memory_percentage=20`增加 segment 缓存的容量,提升元数据缓存效率率
+7. 注意关注 Compaction Score,如果 Score 持续增加会导致,Score 过高反应 Compaction
做不过来(需要适当降低导入压力)
+8. `SELECT *` 或者 `SELECT variant` 会导致集群整体压力明显上升,甚至出现超时或者内存超限。建议查询带上 Path 信息例如
`SELECT variant['path_1']`。
+
### FAQ
1. Stream Load 报错: [CANCELLED][INTERNAL_ERROR]tablet error:
[DATA_QUALITY_ERROR]Reached max column size limit 2048。
- 由于 Compaction 和元信息存储限制,VARIANT 类型会限制列数,默认 2048 列,可以适当调整 BE 配置
`variant_max_merged_tablet_schema_size` ,但是不建议超过 4096
+ 由于 Compaction 和元信息存储限制,VARIANT 类型会限制列数,默认 2048 列,可以适当调整 BE 配置
`variant_max_merged_tablet_schema_size`,但是不建议超过 4096(依赖较高配置机型)
2. VARIANT 类型中的 null(例如 `{"key": null}` )和 SQL 中的 NULL(即 IS NULL)有区别吗?
diff --git
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/sql-manual/basic-element/sql-data-types/semi-structured/VARIANT.md
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/sql-manual/basic-element/sql-data-types/semi-structured/VARIANT.md
index c5b5b2561f2..15e2b7550b6 100644
---
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/sql-manual/basic-element/sql-data-types/semi-structured/VARIANT.md
+++
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/sql-manual/basic-element/sql-data-types/semi-structured/VARIANT.md
@@ -363,10 +363,23 @@ CREATE TABLE example_table (
SELECT * FROM example_table WHERE data_string LIKE '%doris%';
```
+**针对列数限制的调优手段:**
+
+注意如果是 超过 5000 子列,对内存和配置有比较高的要求,单机尽可能 128G 以上内存,核数 32C 以上
+1. BE 配置`variant_max_merged_tablet_schema_size=n` n 大于实际的列数(不推荐超过 10000)
+2. 需要注意的是,提取的列数过多会导致 compaction 的压力过大(需要控制导入的吞吐)。根据内存使用情况增大客户端导入的 batch_size
可以降低 compaction 的写放大(或者推荐使用 group_commit,表 properties
配置,适当增加`group_commit_interval_ms` 和 `group_commit_data_bytes`)
+3. 如果查询没有分桶裁剪的需求,可以使用 random 分桶,开启
[load_to_single_tablet](../../../../table-design/data-partitioning/data-bucketing#bucketing)
导入(导入的配置),可以减少 compaction 写放大
+4. BE 配置 根据导入压力调整 `max_cumu_compaction_threads`,至少保证 8 个线程
+5. BE 配置`vertical_compaction_num_columns_per_group=500`提升分组 compaction
效率,但是会增加内存开销销
+6. BE 配置`segment_cache_memory_percentage=20`增加 segment 缓存的容量,提升元数据缓存效率率
+7. 注意关注 Compaction Score,如果 Score 持续增加会导致,Score 过高反应 Compaction
做不过来(需要适当降低导入压力)
+8. `SELECT *` 或者 `SELECT variant` 会导致集群整体压力明显上升,甚至出现超时或者内存超限。建议查询带上 Path 信息例如
`SELECT variant['path_1']`。
+
+
### FAQ
1. Stream Load 报错: [CANCELLED][INTERNAL_ERROR]tablet error:
[DATA_QUALITY_ERROR]Reached max column size limit 2048。
- 由于 Compaction 和元信息存储限制,VARIANT 类型会限制列数,默认 2048 列,可以适当调整 BE 配置
`variant_max_merged_tablet_schema_size` ,但是不建议超过 4096
+ 由于 Compaction 和元信息存储限制,VARIANT 类型会限制列数,默认 2048 列,可以适当调整 BE 配置
`variant_max_merged_tablet_schema_size` ,但是不建议超过 4096(依赖较高配置机型)
2. VARIANT 类型中的 null(例如 `{"key": null}` )和 SQL 中的 NULL(即 IS NULL)有区别吗?
diff --git
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/sql-manual/basic-element/sql-data-types/semi-structured/VARIANT.md
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/sql-manual/basic-element/sql-data-types/semi-structured/VARIANT.md
index d0cd3345973..c36adf63e98 100644
---
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/sql-manual/basic-element/sql-data-types/semi-structured/VARIANT.md
+++
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/sql-manual/basic-element/sql-data-types/semi-structured/VARIANT.md
@@ -363,10 +363,24 @@ CREATE TABLE example_table (
SELECT * FROM example_table WHERE data_string LIKE '%doris%';
```
+**针对列数限制的调优手段:**
+
+**针对列数限制的调优手段:**
+
+注意如果是 超过 5000 子列,对内存和配置有比较高的要求,单机尽可能 128G 以上内存,核数 32C 以上
+1. BE 配置`variant_max_merged_tablet_schema_size=n` n 大于实际的列数(不推荐超过 10000)
+2. 需要注意的是,提取的列数过多会导致 compaction 的压力过大(需要控制导入的吞吐)。根据内存使用情况增大客户端导入的 batch_size
可以降低 compaction 的写放大(或者推荐使用 group_commit,表 properties
配置,适当增加`group_commit_interval_ms` 和 `group_commit_data_bytes`)
+3. 如果查询没有分桶裁剪的需求,可以使用 random 分桶,开启
[load_to_single_tablet](../../../../table-design/data-partitioning/data-bucketing#bucketing)
导入(导入的配置),可以减少 compaction 写放大
+4. BE 配置 根据导入压力调整 `max_cumu_compaction_threads`,至少保证 8 个线程
+5. BE 配置`vertical_compaction_num_columns_per_group=500`提升分组 compaction
效率,但是会增加内存开销销
+6. BE 配置`segment_cache_memory_percentage=20`增加 segment 缓存的容量,提升元数据缓存效率率
+7. 注意关注 Compaction Score,如果 Score 持续增加会导致,Score 过高反应 Compaction
做不过来(需要适当降低导入压力)
+8. `SELECT *` 或者 `SELECT variant` 会导致集群整体压力明显上升,甚至出现超时或者内存超限。建议查询带上 Path 信息例如
`SELECT variant['path_1']`。
+
### FAQ
1. Stream Load 报错: [CANCELLED][INTERNAL_ERROR]tablet error:
[DATA_QUALITY_ERROR]Reached max column size limit 2048。
- 由于 Compaction 和元信息存储限制,VARIANT 类型会限制列数,默认 2048 列,可以适当调整 BE 配置
`variant_max_merged_tablet_schema_size` ,但是不建议超过 4096
+ 由于 Compaction 和元信息存储限制,VARIANT 类型会限制列数,默认 2048 列,可以适当调整 BE 配置
`variant_max_merged_tablet_schema_size` ,但是不建议超过 4096(依赖较高配置机型)
2. VARIANT 类型中的 null(例如 `{"key": null}` )和 SQL 中的 NULL(即 IS NULL)有区别吗?
diff --git
a/versioned_docs/version-2.1/sql-manual/basic-element/sql-data-types/semi-structured/VARIANT.md
b/versioned_docs/version-2.1/sql-manual/basic-element/sql-data-types/semi-structured/VARIANT.md
index 1829c0723ba..360e7c3f8c3 100644
---
a/versioned_docs/version-2.1/sql-manual/basic-element/sql-data-types/semi-structured/VARIANT.md
+++
b/versioned_docs/version-2.1/sql-manual/basic-element/sql-data-types/semi-structured/VARIANT.md
@@ -359,11 +359,31 @@ CREATE TABLE example_table (
SELECT * FROM example_table WHERE data_string LIKE '%doris%'
```
+**Tuning Techniques for Column-Count Limits:**
+
+Note: If the number of sub-columns exceeds 5,000, higher requirements for
memory and configuration apply. On a single machine, aim for at least 128 GB of
RAM and 32 CPU cores.
+
+1. In BE configuration, adjust `variant_max_merged_tablet_schema_size=n`,
where n should be greater than the actual number of columns (not recommended to
exceed 10,000).
+
+2. Be aware that extracting too many columns will put heavy pressure on
compaction (import throughput must be throttled accordingly). Increasing the
client-side import `batch_size`—based on memory usage—can reduce write
amplification during compaction. Alternatively, enable `group_commit` (a table
property) and appropriately increase `group_commit_interval_ms` and
`group_commit_data_bytes`.
+
+3. If your queries do not require bucket pruning, use random bucketing and
enable the
[load_to_single_tablet](../../../../table-design/data-partitioning/data-bucketing#bucketing)
import setting (an import configuration) to reduce compaction write
amplification.
+
+4. In BE configuration, adjust `max_cumu_compaction_threads` according to
import pressure; ensure at least 8 threads.
+
+5. In BE configuration, set `vertical_compaction_num_columns_per_group=500` to
improve grouped-compaction efficiency, although this increases memory overhead.
+
+6. In BE configuration, set `segment_cache_memory_percentage=20` to increase
segment cache capacity and improve metadata caching efficiency.
+
+7. Monitor the Compaction Score closely. A continuously rising score indicates
that compaction cannot keep up (import pressure should be reduced accordingly).
+
+8. Using `SELECT *` or `SELECT variant` can significantly increase
cluster-wide pressure, potentially causing timeouts or out-of-memory errors. It
is recommended to include path information in queries—for example, `SELECT
variant['path_1']`.
+
### FAQ
1. Streamload Error: [CANCELLED][INTERNAL_ERROR] tablet error:
[DATA_QUALITY_ERROR] Reached max column size limit 2048.
-Due to compaction and metadata storage limitations, the VARIANT type imposes a
limit on the number of columns, with the default being 2048 columns. You can
adjust the BE configuration `variant_max_merged_tablet_schema_size`
accordingly, but it is not recommended to exceed 4096 columns.
+Due to compaction and metadata storage limitations, the VARIANT type imposes a
limit on the number of columns, with the default being 2048 columns. You can
adjust the BE configuration `variant_max_merged_tablet_schema_size`
accordingly, but it is not recommended to exceed 4096 columns(Requires
higher-spec hardware).
2. Is there a difference between null in the VARIANT type (e.g., `{"key":
null}`) and SQL NULL (i.e., IS NULL)?
diff --git
a/versioned_docs/version-3.0/sql-manual/basic-element/sql-data-types/semi-structured/VARIANT.md
b/versioned_docs/version-3.0/sql-manual/basic-element/sql-data-types/semi-structured/VARIANT.md
index 1829c0723ba..360e7c3f8c3 100644
---
a/versioned_docs/version-3.0/sql-manual/basic-element/sql-data-types/semi-structured/VARIANT.md
+++
b/versioned_docs/version-3.0/sql-manual/basic-element/sql-data-types/semi-structured/VARIANT.md
@@ -359,11 +359,31 @@ CREATE TABLE example_table (
SELECT * FROM example_table WHERE data_string LIKE '%doris%'
```
+**Tuning Techniques for Column-Count Limits:**
+
+Note: If the number of sub-columns exceeds 5,000, higher requirements for
memory and configuration apply. On a single machine, aim for at least 128 GB of
RAM and 32 CPU cores.
+
+1. In BE configuration, adjust `variant_max_merged_tablet_schema_size=n`,
where n should be greater than the actual number of columns (not recommended to
exceed 10,000).
+
+2. Be aware that extracting too many columns will put heavy pressure on
compaction (import throughput must be throttled accordingly). Increasing the
client-side import `batch_size`—based on memory usage—can reduce write
amplification during compaction. Alternatively, enable `group_commit` (a table
property) and appropriately increase `group_commit_interval_ms` and
`group_commit_data_bytes`.
+
+3. If your queries do not require bucket pruning, use random bucketing and
enable the
[load_to_single_tablet](../../../../table-design/data-partitioning/data-bucketing#bucketing)
import setting (an import configuration) to reduce compaction write
amplification.
+
+4. In BE configuration, adjust `max_cumu_compaction_threads` according to
import pressure; ensure at least 8 threads.
+
+5. In BE configuration, set `vertical_compaction_num_columns_per_group=500` to
improve grouped-compaction efficiency, although this increases memory overhead.
+
+6. In BE configuration, set `segment_cache_memory_percentage=20` to increase
segment cache capacity and improve metadata caching efficiency.
+
+7. Monitor the Compaction Score closely. A continuously rising score indicates
that compaction cannot keep up (import pressure should be reduced accordingly).
+
+8. Using `SELECT *` or `SELECT variant` can significantly increase
cluster-wide pressure, potentially causing timeouts or out-of-memory errors. It
is recommended to include path information in queries—for example, `SELECT
variant['path_1']`.
+
### FAQ
1. Streamload Error: [CANCELLED][INTERNAL_ERROR] tablet error:
[DATA_QUALITY_ERROR] Reached max column size limit 2048.
-Due to compaction and metadata storage limitations, the VARIANT type imposes a
limit on the number of columns, with the default being 2048 columns. You can
adjust the BE configuration `variant_max_merged_tablet_schema_size`
accordingly, but it is not recommended to exceed 4096 columns.
+Due to compaction and metadata storage limitations, the VARIANT type imposes a
limit on the number of columns, with the default being 2048 columns. You can
adjust the BE configuration `variant_max_merged_tablet_schema_size`
accordingly, but it is not recommended to exceed 4096 columns(Requires
higher-spec hardware).
2. Is there a difference between null in the VARIANT type (e.g., `{"key":
null}`) and SQL NULL (i.e., IS NULL)?
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]