This is an automated email from the ASF dual-hosted git repository.

yiguolei pushed a commit to branch branch-2.1
in repository https://gitbox.apache.org/repos/asf/doris.git
commit ae206596c9152956e1534951c8969706ced81c42
Author: lihangyu <15605149...@163.com>
AuthorDate: Tue Mar 12 10:44:32 2024 +0800

    [DOC](Variant) add bloom filter description and correct some benchmark results (#31998)
---
 .../sql-manual/sql-reference/Data-Types/VARIANT.md | 22 +++++++++++++++-------
 .../sql-manual/sql-reference/Data-Types/VARIANT.md | 22 +++++++++++++++-------
 2 files changed, 30 insertions(+), 14 deletions(-)

diff --git a/docs/en/docs/sql-manual/sql-reference/Data-Types/VARIANT.md b/docs/en/docs/sql-manual/sql-reference/Data-Types/VARIANT.md
index 9382b7dbda0..f61fa2f3395 100644
--- a/docs/en/docs/sql-manual/sql-reference/Data-Types/VARIANT.md
+++ b/docs/en/docs/sql-manual/sql-reference/Data-Types/VARIANT.md
@@ -45,11 +45,11 @@ Below are test results based on clickbench data:
 
 | | Storage Space |
 |--------------|------------|
-| Predefined Static Columns | 24.329 GB |
-| VARIANT Type | 24.296 GB |
-| JSON Type | 46.730 GB |
+| Predefined Static Columns | 12.618 GB |
+| VARIANT Type | 12.718 GB |
+| JSON Type | 35.711 GB |
 
-**Saves approximately 50% storage capacity**
+**Saves approximately 65% storage capacity**
 
 | Query Counts | Predefined Static Columns | VARIANT Type | JSON Type |
 |---------------------|---------------------------|--------------|-----------------|
@@ -84,12 +84,20 @@ CREATE TABLE IF NOT EXISTS ${table_name} (
 )
 table_properties;
 
+-- Create an bloom filter on v column, to enhance query seed on sub columns
+CREATE TABLE IF NOT EXISTS ${table_name} (
+    k BIGINT,
+    v VARIANT
+)
+...
+properties("replication_num" = "1", "bloom_filter_columns" = "v");
+
 ```
 
 **Query Syntax**
 
 ``` sql
--- use v['a']['b'] format for example
+-- use v['a']['b'] format for example, v['properties']['title'] type is Variant
 SELECT v['properties']['title'] from ${table_name}
 ```
 
@@ -351,8 +359,8 @@ When the above types cannot be compatible, they will be transformed into JSON ty
 **Other limitations include:**
 
 - Aggregate models are currently not supported.
-- VARIANT columns can only create inverted indexes.
-- Using the **RANDOM** mode is recommended for higher write performance.
+- VARIANT columns can only create inverted indexes or bloom filter to speed up query.
+- Using the **RANDOM** mode or [group commit](https://doris.apache.org/docs/dev/data-operate/import/import-way/group-commit-manual/) mode is recommended for higher write performance.
 - Non-standard JSON types such as date and decimal should ideally use static types for better performance, since these types are infered to text type.
 - Arrays with dimensions of 2 or higher will be stored as JSONB encoding, which might perform less efficiently than native arrays.
 - Not supported as primary or sort keys.
diff --git a/docs/zh-CN/docs/sql-manual/sql-reference/Data-Types/VARIANT.md b/docs/zh-CN/docs/sql-manual/sql-reference/Data-Types/VARIANT.md
index c784a1ada36..9b7f012c2ee 100644
--- a/docs/zh-CN/docs/sql-manual/sql-reference/Data-Types/VARIANT.md
+++ b/docs/zh-CN/docs/sql-manual/sql-reference/Data-Types/VARIANT.md
@@ -45,13 +45,13 @@ VARIANT类型
 
 | | 存储空间 |
 |--------------|------------|
-| 预定义静态列 | 24.329 GB |
-| variant 类型 | 24.296 GB |
-| json 类型 | 46.730 GB |
+| 预定义静态列 | 12.618 GB |
+| variant 类型 | 12.718 GB |
+| json 类型 | 35.711 GB |
 
 
 
-**节省约 50%存储容量**
+**节省约 65%存储容量**
 
 | 查询次数 | 预定义静态列 | variant 类型 | json 类型 |
 |----------------|--------------|--------------|-----------------|
@@ -88,12 +88,20 @@ CREATE TABLE IF NOT EXISTS ${table_name} (
     INDEX idx_var(v) USING INVERTED [PROPERTIES("parser" = "english|unicode|chinese")] [COMMENT 'your comment']
 )
 table_properties;
+
+-- 在v列创建bloom filter
+CREATE TABLE IF NOT EXISTS ${table_name} (
+    k BIGINT,
+    v VARIANT
+)
+...
+properties("replication_num" = "1", "bloom_filter_columns" = "v");
 ```
 
 **查询语法**
 
 ``` sql
--- 使用 v['a']['b'] 形式例如
+-- 使用 v['a']['b'] 形式如下,v['properties']['title']类型是Variant
 SELECT v['properties']['title'] from ${table_name}
 ```
 
@@ -359,8 +367,8 @@ VARIANT 动态列与预定义静态列几乎一样高效。处理诸如日志之
 其它限制如下:
 
 - 目前不支持 Aggregate 模型
-- VARIANT 列只能创建倒排索引
-- **推荐使用 RANDOM 模式, 写入性能更高效**
+- VARIANT 列只能创建倒排索引或者bloom filter来加速过滤
+- **推荐使用 RANDOM 模式和[Group Commit](https://doris.apache.org/zh-CN/docs/dev/data-operate/import/import-way/group-commit-manual/)模式, 写入性能更高效**
 - 日期、decimal 等非标准 JSON 类型会被默认推断成字符串类型,所以尽可能从 VARIANT 中提取出来,用静态类型,性能更好
 - 2 维及其以上的数组列存化会被存成 JSONB 编码,性能不如原生数组
 - 不支持作为主键或者排序键