This is an automated email from the ASF dual-hosted git repository.

eldenmoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris-website.git


The following commit(s) were added to refs/heads/master by this push:
     new f0f73b855ce [variant] optimize description on some user cases (#2338)
f0f73b855ce is described below

commit f0f73b855cef01af549656743b58b001eee460c6
Author: lihangyu <lihan...@selectdb.com>
AuthorDate: Mon Apr 28 16:57:24 2025 +0800

    [variant] optimize description on some user cases (#2338)
    
    select variant fields recommendation
    
    ## Versions
    
    - [x] dev
    - [x] 3.0
    - [x] 2.1
    - [ ] 2.0
    
    ## Languages
    
    - [x] Chinese
    - [x] English
    
    ## Docs Checklist
    
    - [ ] Checked by AI
    - [ ] Test Cases Built
---
 .../sql-data-types/semi-structured/VARIANT.md          | 17 +++++++++++++++++
 .../sql-data-types/semi-structured/VARIANT.md          | 17 +++++++++++++++++
 .../sql-data-types/semi-structured/VARIANT.md          | 17 +++++++++++++++++
 .../sql-data-types/semi-structured/VARIANT.md          | 17 +++++++++++++++++
 .../sql-data-types/semi-structured/VARIANT.md          | 17 +++++++++++++++++
 .../sql-data-types/semi-structured/VARIANT.md          | 18 ++++++++++++++++++
 6 files changed, 103 insertions(+)

diff --git 
a/docs/sql-manual/basic-element/sql-data-types/semi-structured/VARIANT.md 
b/docs/sql-manual/basic-element/sql-data-types/semi-structured/VARIANT.md
index 6292966c3d8..8dc006e7d7b 100644
--- a/docs/sql-manual/basic-element/sql-data-types/semi-structured/VARIANT.md
+++ b/docs/sql-manual/basic-element/sql-data-types/semi-structured/VARIANT.md
@@ -449,6 +449,23 @@ When the above types cannot be compatible, they will be 
transformed into JSON ty
 - Arrays with dimensions of 2 or higher will be stored as JSONB encoding, 
which might perform less efficiently than native arrays.
 - Not supported as primary or sort keys.
 - Queries with filters or aggregations require casting. The storage layer 
eliminates cast operations based on storage type and the target type of the 
cast, speeding up queries. 
+- Reading a VARIANT column inherently involves scanning all its subfields. If 
the column contains numerous subfields, this can lead to substantial scan 
overhead and negatively impact query performance. To optimize performance when 
you need to retrieve the entire column, consider adding an additional column of 
type STRING or JSONB to store the raw JSON string. Example:
+``` sql
+-- Lead to scan all subfields of data_variant
+CREATE TABLE example_table (
+  id INT,
+  data_variant VARIANT
+);
+SELECT * FROM example_table WHERE data_variant LIKE '%doris%'
+
+-- Better performance for `LIKE`
+CREATE TABLE example_table (
+  id INT,
+  data_variant VARIANT,
+  data_string STRING
+);
+SELECT * FROM example_table WHERE data_string LIKE '%doris%'
+```
 
 ### FAQ
 1.Streamload Error: [CANCELLED][INTERNAL_ERROR] tablet error: 
[DATA_QUALITY_ERROR] Reached max column size limit 2048.
diff --git 
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/basic-element/sql-data-types/semi-structured/VARIANT.md
 
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/basic-element/sql-data-types/semi-structured/VARIANT.md
index 8c827801616..99c2e72c724 100644
--- 
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/basic-element/sql-data-types/semi-structured/VARIANT.md
+++ 
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/basic-element/sql-data-types/semi-structured/VARIANT.md
@@ -441,6 +441,23 @@ VARIANT 动态列与预定义静态列几乎一样高效。处理诸如日志之
 - 2 维及其以上的数组列存化会被存成 JSONB 编码,性能不如原生数组
 - 不支持作为主键或者排序键
 - 查询过滤、聚合需要带 cast,存储层会根据存储类型和 cast 目标类型来消除 cast 操作,加速查询。
+- 读取整个 VARIANT 
列时,会扫描其所有子字段。如果该列包含大量子字段,这可能导致显著的扫描开销,从而影响查询性能。为了优化在需要检索整个列时的性能,建议添加一个 STRING 或 
JSONB 类型的附加列,用于存储原始的 JSON 字符串。这种方法允许您直接查询整个 JSON 对象,从而减少扫描成本。
+``` sql
+-- 导致扫描 data_variant 的所有子字段
+CREATE TABLE example_table (
+  id INT,
+  data_variant VARIANT
+);
+SELECT * FROM example_table WHERE data_variant LIKE '%doris%';
+
+-- 对于 `LIKE` 查询,性能更佳
+CREATE TABLE example_table (
+  id INT,
+  data_string STRING,
+  data_variant VARIANT
+);
+SELECT * FROM example_table WHERE data_string LIKE '%doris%';
+```
 
 ### FAQ
 1. Stream Load 报错: [CANCELLED][INTERNAL_ERROR]tablet error: 
[DATA_QUALITY_ERROR]Reached max column size limit 2048。
diff --git 
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/sql-manual/basic-element/sql-data-types/semi-structured/VARIANT.md
 
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/sql-manual/basic-element/sql-data-types/semi-structured/VARIANT.md
index 3b6b2589199..9c9025aabb5 100644
--- 
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/sql-manual/basic-element/sql-data-types/semi-structured/VARIANT.md
+++ 
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/sql-manual/basic-element/sql-data-types/semi-structured/VARIANT.md
@@ -441,6 +441,23 @@ VARIANT 动态列与预定义静态列几乎一样高效。处理诸如日志之
 - 2 维及其以上的数组列存化会被存成 JSONB 编码,性能不如原生数组
 - 不支持作为主键或者排序键
 - 查询过滤、聚合需要带 cast,存储层会根据存储类型和 cast 目标类型来消除 cast 操作,加速查询。
+- 读取整个 VARIANT 
列时,会扫描其所有子字段。如果该列包含大量子字段,这可能导致显著的扫描开销,从而影响查询性能。为了优化在需要检索整个列时的性能,建议添加一个 STRING 或 
JSONB 类型的附加列,用于存储原始的 JSON 字符串。这种方法允许您直接查询整个 JSON 对象,从而减少扫描成本。
+``` sql
+-- 导致扫描 data_variant 的所有子字段
+CREATE TABLE example_table (
+  id INT,
+  data_variant VARIANT
+);
+SELECT * FROM example_table WHERE data_variant LIKE '%doris%';
+
+-- 对于 `LIKE` 查询,性能更佳
+CREATE TABLE example_table (
+  id INT,
+  data_string STRING,
+  data_variant VARIANT
+);
+SELECT * FROM example_table WHERE data_string LIKE '%doris%';
+```
 
 ### FAQ
 1. Stream Load 报错: [CANCELLED][INTERNAL_ERROR]tablet error: 
[DATA_QUALITY_ERROR]Reached max column size limit 2048。
diff --git 
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/sql-manual/basic-element/sql-data-types/semi-structured/VARIANT.md
 
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/sql-manual/basic-element/sql-data-types/semi-structured/VARIANT.md
index b373eabe0ae..39345ad0b89 100644
--- 
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/sql-manual/basic-element/sql-data-types/semi-structured/VARIANT.md
+++ 
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/sql-manual/basic-element/sql-data-types/semi-structured/VARIANT.md
@@ -453,6 +453,23 @@ VARIANT 动态列与预定义静态列几乎一样高效。处理诸如日志之
 - 2 维及其以上的数组列存化会被存成 JSONB 编码,性能不如原生数组
 - 不支持作为主键或者排序键
 - 查询过滤、聚合需要带 cast,存储层会根据存储类型和 cast 目标类型来消除 cast 操作,加速查询。
+- 读取整个 VARIANT 
列时,会扫描其所有子字段。如果该列包含大量子字段,这可能导致显著的扫描开销,从而影响查询性能。为了优化在需要检索整个列时的性能,建议添加一个 STRING 或 
JSONB 类型的附加列,用于存储原始的 JSON 字符串。这种方法允许您直接查询整个 JSON 对象,从而减少扫描成本。
+``` sql
+-- 导致扫描 data_variant 的所有子字段
+CREATE TABLE example_table (
+  id INT,
+  data_variant VARIANT
+);
+SELECT * FROM example_table WHERE data_variant LIKE '%doris%';
+
+-- 对于 `LIKE` 查询,性能更佳
+CREATE TABLE example_table (
+  id INT,
+  data_string STRING,
+  data_variant VARIANT
+);
+SELECT * FROM example_table WHERE data_string LIKE '%doris%';
+```
 
 ### FAQ
 1. Stream Load 报错: [CANCELLED][INTERNAL_ERROR]tablet error: 
[DATA_QUALITY_ERROR]Reached max column size limit 2048。
diff --git 
a/versioned_docs/version-2.1/sql-manual/basic-element/sql-data-types/semi-structured/VARIANT.md
 
b/versioned_docs/version-2.1/sql-manual/basic-element/sql-data-types/semi-structured/VARIANT.md
index 6e528caf02a..17f60a219db 100644
--- 
a/versioned_docs/version-2.1/sql-manual/basic-element/sql-data-types/semi-structured/VARIANT.md
+++ 
b/versioned_docs/version-2.1/sql-manual/basic-element/sql-data-types/semi-structured/VARIANT.md
@@ -437,6 +437,23 @@ When the above types cannot be compatible, they will be 
transformed into JSON ty
 - Arrays with dimensions of 2 or higher will be stored as JSONB encoding, 
which might perform less efficiently than native arrays.
 - Not supported as primary or sort keys.
 - Queries with filters or aggregations require casting. The storage layer 
eliminates cast operations based on storage type and the target type of the 
cast, speeding up queries. 
+- Reading a VARIANT column inherently involves scanning all its subfields. If 
the column contains numerous subfields, this can lead to substantial scan 
overhead and negatively impact query performance. To optimize performance when 
you need to retrieve the entire column, consider adding an additional column of 
type STRING or JSONB to store the raw JSON string. Example:
+``` sql
+-- Lead to scan all subfields of data_variant
+CREATE TABLE example_table (
+  id INT,
+  data_variant VARIANT,
+);
+SELECT * FROM example_table WHERE data_variant LIKE '%doris%'
+
+-- Better performance for `LIKE`
+CREATE TABLE example_table (
+  id INT,
+  data_variant VARIANT,
+  data_string STRING
+);
+SELECT * FROM example_table WHERE data_string LIKE '%doris%'
+```
 
 ### FAQ
 1.Streamload Error: [CANCELLED][INTERNAL_ERROR] tablet error: 
[DATA_QUALITY_ERROR] Reached max column size limit 2048.
diff --git 
a/versioned_docs/version-3.0/sql-manual/basic-element/sql-data-types/semi-structured/VARIANT.md
 
b/versioned_docs/version-3.0/sql-manual/basic-element/sql-data-types/semi-structured/VARIANT.md
index 6e528caf02a..9bda2180363 100644
--- 
a/versioned_docs/version-3.0/sql-manual/basic-element/sql-data-types/semi-structured/VARIANT.md
+++ 
b/versioned_docs/version-3.0/sql-manual/basic-element/sql-data-types/semi-structured/VARIANT.md
@@ -437,6 +437,24 @@ When the above types cannot be compatible, they will be 
transformed into JSON ty
 - Arrays with dimensions of 2 or higher will be stored as JSONB encoding, 
which might perform less efficiently than native arrays.
 - Not supported as primary or sort keys.
 - Queries with filters or aggregations require casting. The storage layer 
eliminates cast operations based on storage type and the target type of the 
cast, speeding up queries. 
+- Reading a VARIANT column inherently involves scanning all its subfields. If 
the column contains numerous subfields, this can lead to substantial scan 
overhead and negatively impact query performance. To optimize performance when 
you need to retrieve the entire column, consider adding an additional column of 
type STRING or JSONB to store the raw JSON string. Example:
+``` sql
+-- Lead to scan all subfields of data_variant
+CREATE TABLE example_table (
+  id INT,
+  data_variant VARIANT,
+);
+SELECT * FROM example_table WHERE data_variant LIKE '%doris%'
+
+-- Better performance for `LIKE`
+CREATE TABLE example_table (
+  id INT,
+  data_variant VARIANT,
+  data_string STRING
+);
+SELECT * FROM example_table WHERE data_string LIKE '%doris%'
+```
+
 
 ### FAQ
 1.Streamload Error: [CANCELLED][INTERNAL_ERROR] tablet error: 
[DATA_QUALITY_ERROR] Reached max column size limit 2048.


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org
For additional commands, e-mail: commits-h...@doris.apache.org

Reply via email to