This is an automated email from the ASF dual-hosted git repository.

yiguolei pushed a commit to branch branch-2.1
in repository https://gitbox.apache.org/repos/asf/doris.git

commit ae206596c9152956e1534951c8969706ced81c42
Author: lihangyu <15605149...@163.com>
AuthorDate: Tue Mar 12 10:44:32 2024 +0800

    [DOC](Variant) add bloom filter description and correct some benchmark results (#31998)
---
 .../sql-manual/sql-reference/Data-Types/VARIANT.md | 22 +++++++++++++++-------
 .../sql-manual/sql-reference/Data-Types/VARIANT.md | 22 +++++++++++++++-------
 2 files changed, 30 insertions(+), 14 deletions(-)

diff --git a/docs/en/docs/sql-manual/sql-reference/Data-Types/VARIANT.md b/docs/en/docs/sql-manual/sql-reference/Data-Types/VARIANT.md
index 9382b7dbda0..f61fa2f3395 100644
--- a/docs/en/docs/sql-manual/sql-reference/Data-Types/VARIANT.md
+++ b/docs/en/docs/sql-manual/sql-reference/Data-Types/VARIANT.md
@@ -45,11 +45,11 @@ Below are test results based on clickbench data:
 
 |    | Storage Space |
 |--------------|------------|
-| Predefined Static Columns | 24.329 GB  |
-| VARIANT Type    | 24.296 GB  |
-| JSON Type             | 46.730 GB  |
+| Predefined Static Columns | 12.618 GB  |
+| VARIANT Type    | 12.718 GB  |
+| JSON Type             | 35.711 GB   |
 
-**Saves approximately 50% storage capacity**
+**Saves approximately 65% storage capacity**
 
 | Query Counts        | Predefined Static Columns | VARIANT Type | JSON Type        |
 |---------------------|---------------------------|--------------|-----------------|
@@ -84,12 +84,20 @@ CREATE TABLE IF NOT EXISTS ${table_name} (
 )
 table_properties;
 
+-- Create a bloom filter on the v column to improve query speed on subcolumns
+CREATE TABLE IF NOT EXISTS ${table_name} (
+    k BIGINT,
+    v VARIANT
+)
+...
+properties("replication_num" = "1", "bloom_filter_columns" = "v");
+
 ```
 
 **Query Syntax**
 
 ``` sql
--- use v['a']['b'] format for example
+-- Use the v['a']['b'] format; for example, the type of v['properties']['title'] is VARIANT
 SELECT v['properties']['title'] from ${table_name}
 
 ```
@@ -351,8 +359,8 @@ When the above types cannot be compatible, they will be transformed into JSON ty
 **Other limitations include:**
 
 - Aggregate models are currently not supported.
-- VARIANT columns can only create inverted indexes.
-- Using the **RANDOM** mode is recommended for higher write performance.
+- VARIANT columns support only inverted indexes or bloom filters to speed up queries.
+- Using the **RANDOM** mode or [group commit](https://doris.apache.org/docs/dev/data-operate/import/import-way/group-commit-manual/) mode is recommended for higher write performance.
 - Non-standard JSON types such as date and decimal should ideally use static types for better performance, since these types are inferred as text.
 - Arrays with dimensions of 2 or higher will be stored as JSONB encoding, which might perform less efficiently than native arrays.
 - Not supported as primary or sort keys.
diff --git a/docs/zh-CN/docs/sql-manual/sql-reference/Data-Types/VARIANT.md b/docs/zh-CN/docs/sql-manual/sql-reference/Data-Types/VARIANT.md
index c784a1ada36..9b7f012c2ee 100644
--- a/docs/zh-CN/docs/sql-manual/sql-reference/Data-Types/VARIANT.md
+++ b/docs/zh-CN/docs/sql-manual/sql-reference/Data-Types/VARIANT.md
@@ -45,13 +45,13 @@ VARIANT类型
 
 |    | 存储空间   |
 |--------------|------------|
-| 预定义静态列 | 24.329 GB  |
-| variant 类型    | 24.296 GB  |
-| json 类型             | 46.730 GB  |
+| 预定义静态列 | 12.618 GB  |
+| variant 类型    | 12.718 GB |
+| json 类型             | 35.711 GB   |
    
    
 
-**节省约 50%存储容量**
+**节省约 65%存储容量**
 
 | 查询次数        | 预定义静态列 | variant 类型 | json 类型        |
 |----------------|--------------|--------------|-----------------|
@@ -88,12 +88,20 @@ CREATE TABLE IF NOT EXISTS ${table_name} (
     INDEX idx_var(v) USING INVERTED [PROPERTIES("parser" = "english|unicode|chinese")] [COMMENT 'your comment']
 )
 table_properties;
+
+-- 在v列创建bloom filter
+CREATE TABLE IF NOT EXISTS ${table_name} (
+    k BIGINT,
+    v VARIANT
+)
+...
+properties("replication_num" = "1", "bloom_filter_columns" = "v");
 ```
 
 **查询语法**
 
 ``` sql
--- 使用 v['a']['b'] 形式例如
+-- 使用 v['a']['b'] 形式如下,v['properties']['title']类型是Variant
 SELECT v['properties']['title'] from ${table_name}
 ```
 
@@ -359,8 +367,8 @@ VARIANT 动态列与预定义静态列几乎一样高效。处理诸如日志之
 其它限制如下:
 
 - 目前不支持 Aggregate 模型
-- VARIANT 列只能创建倒排索引
-- **推荐使用 RANDOM 模式, 写入性能更高效**
+- VARIANT 列只能创建倒排索引或者bloom filter来加速过滤
+- **推荐使用 RANDOM 模式和[Group Commit](https://doris.apache.org/zh-CN/docs/dev/data-operate/import/import-way/group-commit-manual/)模式, 写入性能更高效**
 - 日期、decimal 等非标准 JSON 类型会被默认推断成字符串类型,所以尽可能从 VARIANT 中提取出来,用静态类型,性能更好
 - 2 维及其以上的数组列存化会被存成 JSONB 编码,性能不如原生数组
 - 不支持作为主键或者排序键

