This is an automated email from the ASF dual-hosted git repository. eldenmoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/doris-website.git
The following commit(s) were added to refs/heads/master by this push: new f0f73b855ce [variant] optimize description on some user cases (#2338) f0f73b855ce is described below commit f0f73b855cef01af549656743b58b001eee460c6 Author: lihangyu <lihan...@selectdb.com> AuthorDate: Mon Apr 28 16:57:24 2025 +0800 [variant] optimize description on some user cases (#2338) select variant fields recommendation ## Versions - [x] dev - [x] 3.0 - [x] 2.1 - [ ] 2.0 ## Languages - [x] Chinese - [x] English ## Docs Checklist - [ ] Checked by AI - [ ] Test Cases Built --- .../sql-data-types/semi-structured/VARIANT.md | 17 +++++++++++++++++ .../sql-data-types/semi-structured/VARIANT.md | 17 +++++++++++++++++ .../sql-data-types/semi-structured/VARIANT.md | 17 +++++++++++++++++ .../sql-data-types/semi-structured/VARIANT.md | 17 +++++++++++++++++ .../sql-data-types/semi-structured/VARIANT.md | 17 +++++++++++++++++ .../sql-data-types/semi-structured/VARIANT.md | 18 ++++++++++++++++++ 6 files changed, 103 insertions(+) diff --git a/docs/sql-manual/basic-element/sql-data-types/semi-structured/VARIANT.md b/docs/sql-manual/basic-element/sql-data-types/semi-structured/VARIANT.md index 6292966c3d8..8dc006e7d7b 100644 --- a/docs/sql-manual/basic-element/sql-data-types/semi-structured/VARIANT.md +++ b/docs/sql-manual/basic-element/sql-data-types/semi-structured/VARIANT.md @@ -449,6 +449,23 @@ When the above types cannot be compatible, they will be transformed into JSON ty - Arrays with dimensions of 2 or higher will be stored as JSONB encoding, which might perform less efficiently than native arrays. - Not supported as primary or sort keys. - Queries with filters or aggregations require casting. The storage layer eliminates cast operations based on storage type and the target type of the cast, speeding up queries. +- Reading a VARIANT column inherently involves scanning all its subfields. If the column contains numerous subfields, this can lead to substantial scan overhead and negatively impact query performance. To optimize performance when you need to retrieve the entire column, consider adding an additional column of type STRING or JSONB to store the raw JSON string. Example: +``` sql +-- Lead to scan all subfields of data_variant +CREATE TABLE example_table ( + id INT, + data_variant VARIANT +); +SELECT * FROM example_table WHERE data_variant LIKE '%doris%' + +-- Better performance for `LIKE` +CREATE TABLE example_table ( + id INT, + data_variant VARIANT, + data_string STRING +); +SELECT * FROM example_table WHERE data_string LIKE '%doris%' +``` ### FAQ 1.Streamload Error: [CANCELLED][INTERNAL_ERROR] tablet error: [DATA_QUALITY_ERROR] Reached max column size limit 2048. diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/basic-element/sql-data-types/semi-structured/VARIANT.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/basic-element/sql-data-types/semi-structured/VARIANT.md index 8c827801616..99c2e72c724 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/basic-element/sql-data-types/semi-structured/VARIANT.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/basic-element/sql-data-types/semi-structured/VARIANT.md @@ -441,6 +441,23 @@ VARIANT 动态列与预定义静态列几乎一样高效。处理诸如日志之 - 2 维及其以上的数组列存化会被存成 JSONB 编码,性能不如原生数组 - 不支持作为主键或者排序键 - 查询过滤、聚合需要带 cast,存储层会根据存储类型和 cast 目标类型来消除 cast 操作,加速查询。 +- 读取整个 VARIANT 列时,会扫描其所有子字段。如果该列包含大量子字段,这可能导致显著的扫描开销,从而影响查询性能。为了优化在需要检索整个列时的性能,建议添加一个 STRING 或 JSONB 类型的附加列,用于存储原始的 JSON 字符串。这种方法允许您直接查询整个 JSON 对象,从而减少扫描成本。 +``` sql +-- 导致扫描 data_variant 的所有子字段 +CREATE TABLE example_table ( + id INT, + data_variant VARIANT +); +SELECT * FROM example_table WHERE data_variant LIKE '%doris%'; + +-- 对于 `LIKE` 查询,性能更佳 +CREATE TABLE example_table ( + id INT, + data_string STRING, + data_variant VARIANT +); +SELECT * FROM example_table WHERE data_string LIKE '%doris%'; +``` ### FAQ 1. Stream Load 报错: [CANCELLED][INTERNAL_ERROR]tablet error: [DATA_QUALITY_ERROR]Reached max column size limit 2048。 diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/sql-manual/basic-element/sql-data-types/semi-structured/VARIANT.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/sql-manual/basic-element/sql-data-types/semi-structured/VARIANT.md index 3b6b2589199..9c9025aabb5 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/sql-manual/basic-element/sql-data-types/semi-structured/VARIANT.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/sql-manual/basic-element/sql-data-types/semi-structured/VARIANT.md @@ -441,6 +441,23 @@ VARIANT 动态列与预定义静态列几乎一样高效。处理诸如日志之 - 2 维及其以上的数组列存化会被存成 JSONB 编码,性能不如原生数组 - 不支持作为主键或者排序键 - 查询过滤、聚合需要带 cast,存储层会根据存储类型和 cast 目标类型来消除 cast 操作,加速查询。 +- 读取整个 VARIANT 列时,会扫描其所有子字段。如果该列包含大量子字段,这可能导致显著的扫描开销,从而影响查询性能。为了优化在需要检索整个列时的性能,建议添加一个 STRING 或 JSONB 类型的附加列,用于存储原始的 JSON 字符串。这种方法允许您直接查询整个 JSON 对象,从而减少扫描成本。 +``` sql +-- 导致扫描 data_variant 的所有子字段 +CREATE TABLE example_table ( + id INT, + data_variant VARIANT +); +SELECT * FROM example_table WHERE data_variant LIKE '%doris%'; + +-- 对于 `LIKE` 查询,性能更佳 +CREATE TABLE example_table ( + id INT, + data_string STRING, + data_variant VARIANT +); +SELECT * FROM example_table WHERE data_string LIKE '%doris%'; +``` ### FAQ 1. Stream Load 报错: [CANCELLED][INTERNAL_ERROR]tablet error: [DATA_QUALITY_ERROR]Reached max column size limit 2048。 diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/sql-manual/basic-element/sql-data-types/semi-structured/VARIANT.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/sql-manual/basic-element/sql-data-types/semi-structured/VARIANT.md index b373eabe0ae..39345ad0b89 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/sql-manual/basic-element/sql-data-types/semi-structured/VARIANT.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/sql-manual/basic-element/sql-data-types/semi-structured/VARIANT.md @@ -453,6 +453,23 @@ VARIANT 动态列与预定义静态列几乎一样高效。处理诸如日志之 - 2 维及其以上的数组列存化会被存成 JSONB 编码,性能不如原生数组 - 不支持作为主键或者排序键 - 查询过滤、聚合需要带 cast,存储层会根据存储类型和 cast 目标类型来消除 cast 操作,加速查询。 +- 读取整个 VARIANT 列时,会扫描其所有子字段。如果该列包含大量子字段,这可能导致显著的扫描开销,从而影响查询性能。为了优化在需要检索整个列时的性能,建议添加一个 STRING 或 JSONB 类型的附加列,用于存储原始的 JSON 字符串。这种方法允许您直接查询整个 JSON 对象,从而减少扫描成本。 +``` sql +-- 导致扫描 data_variant 的所有子字段 +CREATE TABLE example_table ( + id INT, + data_variant VARIANT +); +SELECT * FROM example_table WHERE data_variant LIKE '%doris%'; + +-- 对于 `LIKE` 查询,性能更佳 +CREATE TABLE example_table ( + id INT, + data_string STRING, + data_variant VARIANT +); +SELECT * FROM example_table WHERE data_string LIKE '%doris%'; +``` ### FAQ 1. Stream Load 报错: [CANCELLED][INTERNAL_ERROR]tablet error: [DATA_QUALITY_ERROR]Reached max column size limit 2048。 diff --git a/versioned_docs/version-2.1/sql-manual/basic-element/sql-data-types/semi-structured/VARIANT.md b/versioned_docs/version-2.1/sql-manual/basic-element/sql-data-types/semi-structured/VARIANT.md index 6e528caf02a..17f60a219db 100644 --- a/versioned_docs/version-2.1/sql-manual/basic-element/sql-data-types/semi-structured/VARIANT.md +++ b/versioned_docs/version-2.1/sql-manual/basic-element/sql-data-types/semi-structured/VARIANT.md @@ -437,6 +437,23 @@ When the above types cannot be compatible, they will be transformed into JSON ty - Arrays with dimensions of 2 or higher will be stored as JSONB encoding, which might perform less efficiently than native arrays. - Not supported as primary or sort keys. - Queries with filters or aggregations require casting. The storage layer eliminates cast operations based on storage type and the target type of the cast, speeding up queries. +- Reading a VARIANT column inherently involves scanning all its subfields. If the column contains numerous subfields, this can lead to substantial scan overhead and negatively impact query performance. To optimize performance when you need to retrieve the entire column, consider adding an additional column of type STRING or JSONB to store the raw JSON string. Example: +``` sql +-- Lead to scan all subfields of data_variant +CREATE TABLE example_table ( + id INT, + data_variant VARIANT, +); +SELECT * FROM example_table WHERE data_variant LIKE '%doris%' + +-- Better performance for `LIKE` +CREATE TABLE example_table ( + id INT, + data_variant VARIANT, + data_string STRING +); +SELECT * FROM example_table WHERE data_string LIKE '%doris%' +``` ### FAQ 1.Streamload Error: [CANCELLED][INTERNAL_ERROR] tablet error: [DATA_QUALITY_ERROR] Reached max column size limit 2048. diff --git a/versioned_docs/version-3.0/sql-manual/basic-element/sql-data-types/semi-structured/VARIANT.md b/versioned_docs/version-3.0/sql-manual/basic-element/sql-data-types/semi-structured/VARIANT.md index 6e528caf02a..9bda2180363 100644 --- a/versioned_docs/version-3.0/sql-manual/basic-element/sql-data-types/semi-structured/VARIANT.md +++ b/versioned_docs/version-3.0/sql-manual/basic-element/sql-data-types/semi-structured/VARIANT.md @@ -437,6 +437,24 @@ When the above types cannot be compatible, they will be transformed into JSON ty - Arrays with dimensions of 2 or higher will be stored as JSONB encoding, which might perform less efficiently than native arrays. - Not supported as primary or sort keys. - Queries with filters or aggregations require casting. The storage layer eliminates cast operations based on storage type and the target type of the cast, speeding up queries. +- Reading a VARIANT column inherently involves scanning all its subfields. If the column contains numerous subfields, this can lead to substantial scan overhead and negatively impact query performance. To optimize performance when you need to retrieve the entire column, consider adding an additional column of type STRING or JSONB to store the raw JSON string. Example: +``` sql +-- Lead to scan all subfields of data_variant +CREATE TABLE example_table ( + id INT, + data_variant VARIANT, +); +SELECT * FROM example_table WHERE data_variant LIKE '%doris%' + +-- Better performance for `LIKE` +CREATE TABLE example_table ( + id INT, + data_variant VARIANT, + data_string STRING +); +SELECT * FROM example_table WHERE data_string LIKE '%doris%' +``` + ### FAQ 1.Streamload Error: [CANCELLED][INTERNAL_ERROR] tablet error: [DATA_QUALITY_ERROR] Reached max column size limit 2048. --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org For additional commands, e-mail: commits-h...@doris.apache.org