This is an automated email from the ASF dual-hosted git repository. kassiez pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/doris-website.git
The following commit(s) were added to refs/heads/master by this push: new 606ab35cf3a [fix] Fix data type of version-2.0 (#2108) 606ab35cf3a is described below commit 606ab35cf3a01c58a235ca63f681297116a939f0 Author: KassieZ <139741991+kass...@users.noreply.github.com> AuthorDate: Fri Feb 21 21:11:12 2025 +0800 [fix] Fix data type of version-2.0 (#2108) ## Versions - [ ] dev - [ ] 3.0 - [ ] 2.1 - [ ] 2.0 ## Languages - [ ] Chinese - [ ] English ## Docs Checklist - [ ] Checked by AI - [ ] Test Cases Built --- .../sql-data-types/data-type-overview.md | 11 +- .../sql-manual/sql-data-types/ip/IPV4.md | 76 ----- .../sql-manual/sql-data-types/ip/IPV6.md | 73 ---- .../sql-data-types/semi-structured/VARIANT.md | 371 --------------------- .../version-2.0/table-design/data-type.md | 6 +- .../sql-data-types/data-type-overview.md | 6 - .../sql-manual/sql-data-types/ip/IPV4.md | 78 ----- .../sql-manual/sql-data-types/ip/IPV6.md | 76 ----- .../sql-data-types/semi-structured/VARIANT.md | 370 -------------------- .../version-2.0/table-design/data-type.md | 6 - versioned_sidebars/version-2.0-sidebars.json | 11 +- 11 files changed, 3 insertions(+), 1081 deletions(-) diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/sql-manual/sql-data-types/data-type-overview.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/sql-manual/sql-data-types/data-type-overview.md index bd359098ee2..47699526e66 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/sql-manual/sql-data-types/data-type-overview.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/sql-manual/sql-data-types/data-type-overview.md @@ -72,9 +72,7 @@ under the License. 1. [CHAR(M)](../../sql-manual/sql-data-types/string/CHAR.md):定长字符串,固定长度 M 字节,M 的范围是 [1, 255]。 -2. [VARCHAR(M)](../../sql-manual/sql-data-types/string/VARCHAR.md):不定长字符串,M 是最大长度,M 的范围是 [1, 65533]。 - -3. [STRING](../../sql-manual/sql-data-types/string/STRING.md):不定长字符串,默认最长 1048576 字节(1MB),可调大到 2147483643 字节(2GB),BE 配置 string_type_length_soft_limit_bytes。 +2. [STRING](../../sql-manual/sql-data-types/string/STRING.md):不定长字符串,默认最长 1048576 字节(1MB),可调大到 2147483643 字节(2GB),BE 配置 string_type_length_soft_limit_bytes。 ## 半结构化类型 @@ -99,10 +97,3 @@ under the License. 4. [AGG_STATE](../../sql-manual/sql-data-types/aggregate/AGG-STATE.md):用于聚合计算加速,配合 state/merge/union 聚合函数组合器使用。 -## IP 类型 - -IP 类型以二进制形式存储 IP 地址,比用字符串存储更省空间查询速度更快,支持 2 种类型: - -1. [IPv4](../../sql-manual/sql-data-types/ip/IPV4.md):以 4 字节二进制存储 IPv4 地址,配合 ipv4_* 系列函数使用。 - -2. [IPv6](../../sql-manual/sql-data-types/ip/IPV6.md):以 16 字节二进制存储 IPv6 地址,配合 ipv6_* 系列函数使用。 diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/sql-manual/sql-data-types/ip/IPV4.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/sql-manual/sql-data-types/ip/IPV4.md deleted file mode 100644 index 9405eaca15e..00000000000 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/sql-manual/sql-data-types/ip/IPV4.md +++ /dev/null @@ -1,76 +0,0 @@ ---- -{ - "title": "IPV4", - "language": "zh-CN" -} ---- - -<!-- -Licensed to the Apache Software Foundation (ASF) under one -or more contributor license agreements. See the NOTICE file -distributed with this work for additional information -regarding copyright ownership. The ASF licenses this file -to you under the Apache License, Version 2.0 (the -"License"); you may not use this file except in compliance -with the License. You may obtain a copy of the License at - - http://www.apache.org/licenses/LICENSE-2.0 - -Unless required by applicable law or agreed to in writing, -software distributed under the License is distributed on an -"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -KIND, either express or implied. See the License for the -specific language governing permissions and limitations -under the License. ---> - - - -## 描述 - -IPv4 类型,以 UInt32 的形式存储在 4 个字节中,用于表示 IPv4 地址。 -取值范围是 ['0.0.0.0', '255.255.255.255']。 - -`超出取值范围或者格式非法的输入将返回NULL` - -## 举例 - -建表示例如下: - -``` -CREATE TABLE ipv4_test ( - `id` int, - `ip_v4` ipv4 -) ENGINE=OLAP -DISTRIBUTED BY HASH(`id`) BUCKETS 4 -PROPERTIES ( -"replication_allocation" = "tag.location.default: 1" -); -``` - -插入数据示例: - -``` -insert into ipv4_test values(1, '0.0.0.0'); -insert into ipv4_test values(2, '127.0.0.1'); -insert into ipv4_test values(3, '59.50.185.152'); -insert into ipv4_test values(4, '255.255.255.255'); -insert into ipv4_test values(5, '255.255.255.256'); // invalid data -``` - -查询数据示例: - -``` -mysql> select * from ipv4_test order by id; -+------+-----------------+ -| id | ip_v4 | -+------+-----------------+ -| 1 | 0.0.0.0 | -| 2 | 127.0.0.1 | -| 3 | 59.50.185.152 | -| 4 | 255.255.255.255 | -| 5 | NULL | -+------+-----------------+ -``` - - diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/sql-manual/sql-data-types/ip/IPV6.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/sql-manual/sql-data-types/ip/IPV6.md deleted file mode 100644 index 2f6e5a14c59..00000000000 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/sql-manual/sql-data-types/ip/IPV6.md +++ /dev/null @@ -1,73 +0,0 @@ ---- -{ - "title": "IPV6", - "language": "zh-CN" -} ---- - -<!-- -Licensed to the Apache Software Foundation (ASF) under one -or more contributor license agreements. See the NOTICE file -distributed with this work for additional information -regarding copyright ownership. The ASF licenses this file -to you under the Apache License, Version 2.0 (the -"License"); you may not use this file except in compliance -with the License. You may obtain a copy of the License at - - http://www.apache.org/licenses/LICENSE-2.0 - -Unless required by applicable law or agreed to in writing, -software distributed under the License is distributed on an -"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -KIND, either express or implied. See the License for the -specific language governing permissions and limitations -under the License. ---> - - -## 描述 - -IPv6 类型,以 UInt128 的形式存储在 16 个字节中,用于表示 IPv6 地址。 -取值范围是 ['::', 'ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff']。 - -`超出取值范围或者格式非法的输入将返回NULL` - -## 举例 - -建表示例如下: - -``` -CREATE TABLE ipv6_test ( - `id` int, - `ip_v6` ipv6 -) ENGINE=OLAP -DISTRIBUTED BY HASH(`id`) BUCKETS 4 -PROPERTIES ( -"replication_allocation" = "tag.location.default: 1" -); -``` - -插入数据示例: - -``` -insert into ipv6_test values(1, '::'); -insert into ipv6_test values(2, '2001:16a0:2:200a::2'); -insert into ipv6_test values(3, 'ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff'); -insert into ipv6_test values(4, 'ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffffg'); // invalid data -``` - -查询数据示例: - -``` -mysql> select * from ipv6_test order by id; -+------+-----------------------------------------+ -| id | ip_v6 | -+------+-----------------------------------------+ -| 1 | :: | -| 2 | 2001:16a0:2:200a::2 | -| 3 | ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff | -| 4 | NULL | -+------+-----------------------------------------+ -``` - - diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/sql-manual/sql-data-types/semi-structured/VARIANT.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/sql-manual/sql-data-types/semi-structured/VARIANT.md deleted file mode 100644 index b1893e4a9c9..00000000000 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/sql-manual/sql-data-types/semi-structured/VARIANT.md +++ /dev/null @@ -1,371 +0,0 @@ ---- -{ - "title": "VARIANT", - "language": "zh-CN" -} ---- - -<!-- -Licensed to the Apache Software Foundation (ASF) under one -or more contributor license agreements. See the NOTICE file -distributed with this work for additional information -regarding copyright ownership. The ASF licenses this file -to you under the Apache License, Version 2.0 (the -"License"); you may not use this file except in compliance -with the License. You may obtain a copy of the License at - - http://www.apache.org/licenses/LICENSE-2.0 - -Unless required by applicable law or agreed to in writing, -software distributed under the License is distributed on an -"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -KIND, either express or implied. See the License for the -specific language governing permissions and limitations -under the License. ---> - - - -## 描述 - -在 Doris 2.1 中引入一种新的数据类型 VARIANT,它可以存储半结构化 JSON 数据。它允许存储包含不同数据类型(如整数、字符串、布尔值等)的复杂数据结构,而无需在表结构中提前定义具体的列。VARIANT 类型特别适用于处理复杂的嵌套结构,而这些结构可能随时会发生变化。在写入过程中,该类型可以自动根据列的结构、类型推断列信息,动态合并写入的 schema,并通过将 JSON 键及其对应的值存储为列和动态子列。 - -### Note - -相比 JSON 类型有以下优势: - -1. 存储方式不同,JSON 类型是以二进制 JSONB 格式进行存储,整行 JSON 以行存的形式存储到 segment 文件中。而 VARIANT 类型在写入的时候进行类型推断,将写入的 JSON 列存化。比 JSON 类型有更高的压缩比,存储空间更小。 -2. 查询方式不同,查询不需要进行解析。VARIANT 充分利用 Doris 中列式存储、向量化引擎、优化器等组件给用户带来极高的查询性能。 -下面是基于 clickbench 数据测试的结果: - -| | 存储空间 | -|--------------|------------| -| 预定义静态列 | 12.618 GB | -| VARIANT 类型 | 12.718 GB | -| JSON 类型 | 35.711 GB | - -**节省约 65% 存储容量** - -| 查询次数 | 预定义静态列 | VARIANT 类型 | JSON 类型 | -|----------------|--------------|--------------|-----------------| -| 第一次查询 (cold) | 233.79s | 248.66s | **大部分查询超时** | -| 第二次查询 (hot) | 86.02s | 94.82s | 789.24s | -| 第三次查询 (hot) | 83.03s | 92.29s | 743.69s | - -[测试集](https://github.com/ClickHouse/ClickBench/blob/main/doris/queries.sql) 一共 43 个查询语句 - -**查询提速 8+ 倍,查询性能与静态列相当** - -## 举例 - -用一个从建表、导数据、查询全周期的例子说明 VARIANT 的功能和用法。 - -**建表语法** - -建表语法关键字 VARIANT - -``` sql --- 无索引 -CREATE TABLE IF NOT EXISTS ${table_name} ( - k BIGINT, - v VARIANT -) -table_properties; - --- 在v列创建索引,可选指定分词方式,默认不分词 -CREATE TABLE IF NOT EXISTS ${table_name} ( - k BIGINT, - v VARIANT, - INDEX idx_var(v) USING INVERTED [PROPERTIES("parser" = "english|unicode|chinese")] [COMMENT 'your comment'] -) -table_properties; - --- 在v列创建bloom filter -CREATE TABLE IF NOT EXISTS ${table_name} ( - k BIGINT, - v VARIANT -) -... -properties("replication_num" = "1", "bloom_filter_columns" = "v"); -``` - -**查询语法** - -``` sql --- 使用 v['a']['b'] 形式如下,v['properties']['title']类型是VARIANT -SELECT v['properties']['title'] from ${table_name} -``` - -### 基于 github events 数据集示例 - -这里用 github events 数据展示 VARIANT 的建表、导入、查询。 -下面是格式化后的一行数据 - -``` json -{ - "id": "14186154924", - "type": "PushEvent", - "actor": { - "id": 282080, - "login": "brianchandotcom", - "display_login": "brianchandotcom", - "gravatar_id": "", - "url": "https://api.github.com/users/brianchandotcom", - "avatar_url": "https://avatars.githubusercontent.com/u/282080?" - }, - "repo": { - "id": 1920851, - "name": "brianchandotcom/liferay-portal", - "url": "https://api.github.com/repos/brianchandotcom/liferay-portal" - }, - "payload": { - "push_id": 6027092734, - "size": 4, - "distinct_size": 4, - "ref": "refs/heads/master", - "head": "91edd3c8c98c214155191feb852831ec535580ba", - "before": "abb58cc0db673a0bd5190000d2ff9c53bb51d04d", - "commits": [""] - }, - "public": true, - "created_at": "2020-11-13T18:00:00Z" -} -``` - -**建表** - -- 创建了三个 VARIANT 类型的列, `actor`,`repo` 和 `payload` -- 创建表的同时创建了 `payload` 列的倒排索引 `idx_payload` -- USING INVERTED 指定索引类型是倒排索引,用于加速子列的条件过滤 -- `PROPERTIES("parser" = "english")` 指定采用 english 分词 - -``` sql -CREATE DATABASE test_variant; -USE test_variant; -CREATE TABLE IF NOT EXISTS github_events ( - id BIGINT NOT NULL, - type VARCHAR(30) NULL, - actor VARIANT NULL, - repo VARIANT NULL, - payload VARIANT NULL, - public BOOLEAN NULL, - created_at DATETIME NULL, - INDEX idx_payload (`payload`) USING INVERTED PROPERTIES("parser" = "english") COMMENT 'inverted index for payload' -) -DUPLICATE KEY(`id`) -DISTRIBUTED BY HASH(id) BUCKETS 10 -properties("replication_num" = "1"); -``` - -**需要注意的是:** - -:::tip - -1. 在 VARIANT 列上创建索引,比如 payload 的子列很多时,可能会造成索引列过多,影响写入性能 -2. 同一个 VARIANT 列的分词属性是相同的,如果您有不同的分词需求,那么可以创建多个 VARIANT 然后分别指定索引属性 - -::: - -**使用 streamload 导入** - -导入 gh_2022-11-07-3.json,这是 github events 一个小时的数据 - -``` shell -wget https://qa-build.oss-cn-beijing.aliyuncs.com/regression/variant/gh_2022-11-07-3.json - -curl --location-trusted -u root: -T gh_2022-11-07-3.json -H "read_json_by_line:true" -H "format:json" http://127.0.0.1:18148/api/test_variant/github_events/_strea -m_load - -{ - "TxnId": 2, - "Label": "086fd46a-20e6-4487-becc-9b6ca80281bf", - "Comment": "", - "TwoPhaseCommit": "false", - "Status": "Success", - "Message": "OK", - "NumberTotalRows": 139325, - "NumberLoadedRows": 139325, - "NumberFilteredRows": 0, - "NumberUnselectedRows": 0, - "LoadBytes": 633782875, - "LoadTimeMs": 7870, - "BeginTxnTimeMs": 19, - "StreamLoadPutTimeMs": 162, - "ReadDataTimeMs": 2416, - "WriteDataTimeMs": 7634, - "CommitAndPublishTimeMs": 55 -} -``` - -确认导入成功 - -``` sql --- 查看行数 -mysql> select count() from github_events; -+----------+ -| count(*) | -+----------+ -| 139325 | -+----------+ -1 row in set (0.25 sec) - --- 随机看一条数据 -mysql> select * from github_events limit 1; -+-------------+-----------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------- [...] -| id | type | actor | repo | payload [...] -+-------------+-----------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------- [...] -| 25061821748 | PushEvent | {"gravatar_id":"","display_login":"jfrog-pipelie-intg","url":"https://api.github.com/users/jfrog-pipelie-intg","id":98024358,"login":"jfrog-pipelie-intg","avatar_url":"https://avatars.githubusercontent.com/u/98024358?"} | {"url":"https://api.github.com/repos/jfrog-pipelie-intg/jfinte2e_1667789956723_16","id":562683829,"name":"jfrog-pipelie-intg/jfinte2e_1667789956723_16"} | {"commits":[{"sha":"334433de436baa198024ef9f55f0647721bcd750","author":{"email":"980243 [...] -+-------------+-----------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------- [...] -1 row in set (0.23 sec) -``` - -desc 查看 schema 信息,子列会在存储层自动扩展、并进行类型推导 - -``` sql -mysql> desc github_events; -+------------------------------------------------------------+------------+------+-------+---------+-------+ -| Field | Type | Null | Key | Default | Extra | -+------------------------------------------------------------+------------+------+-------+---------+-------+ -| id | BIGINT | No | true | NULL | | -| type | VARCHAR(*) | Yes | false | NULL | NONE | -| actor | VARIANT | Yes | false | NULL | NONE | -| created_at | DATETIME | Yes | false | NULL | NONE | -| payload | VARIANT | Yes | false | NULL | NONE | -| public | BOOLEAN | Yes | false | NULL | NONE | -+------------------------------------------------------------+------------+------+-------+---------+-------+ -6 rows in set (0.07 sec) - -mysql> set describe_extend_variant_column = true; -Query OK, 0 rows affected (0.01 sec) - -mysql> desc github_events; -+------------------------------------------------------------+------------+------+-------+---------+-------+ -| Field | Type | Null | Key | Default | Extra | -+------------------------------------------------------------+------------+------+-------+---------+-------+ -| id | BIGINT | No | true | NULL | | -| type | VARCHAR(*) | Yes | false | NULL | NONE | -| actor | VARIANT | Yes | false | NULL | NONE | -| actor.avatar_url | TEXT | Yes | false | NULL | NONE | -| actor.display_login | TEXT | Yes | false | NULL | NONE | -| actor.id | INT | Yes | false | NULL | NONE | -| actor.login | TEXT | Yes | false | NULL | NONE | -| actor.url | TEXT | Yes | false | NULL | NONE | -| created_at | DATETIME | Yes | false | NULL | NONE | -| payload | VARIANT | Yes | false | NULL | NONE | -| payload.action | TEXT | Yes | false | NULL | NONE | -| payload.before | TEXT | Yes | false | NULL | NONE | -| payload.comment.author_association | TEXT | Yes | false | NULL | NONE | -| payload.comment.body | TEXT | Yes | false | NULL | NONE | -.... -+------------------------------------------------------------+------------+------+-------+---------+-------+ -406 rows in set (0.07 sec) -``` - -desc 可以指定 partition 查看某个 partition 的 schema,语法如下 - -``` -DESCRIBE ${table_name} PARTITION ($partition_name); -``` - -**查询** - -:::tip - -**注意** -如使用过滤和聚合等功能来查询子列,需要对子列进行额外的 cast 操作(因为存储类型不一定是固定的,需要有一个 SQL 统一的类型)。 -例如 SELECT * FROM tbl where CAST(var['titile'] as text) MATCH "hello world" -以下简化的示例说明了如何使用 VARIANT 进行查询 - -::: - -下面是典型的三个查询场景: - -1. 从 github_events 表中获取 top 5 star 数的代码库 - -``` sql -mysql> SELECT - -> cast(repo['name'] as text) as repo_name, count() AS stars - -> FROM github_events - -> WHERE type = 'WatchEvent' - -> GROUP BY repo_name - -> ORDER BY stars DESC LIMIT 5; -+--------------------------+-------+ -| repo_name | stars | -+--------------------------+-------+ -| aplus-framework/app | 78 | -| lensterxyz/lenster | 77 | -| aplus-framework/database | 46 | -| stashapp/stash | 42 | -| aplus-framework/image | 34 | -+--------------------------+-------+ -5 rows in set (0.03 sec) -``` - -2. 获取评论中包含 doris 的数量 - -``` sql --- implicit cast `payload['comment']['body']` to string type -mysql> SELECT - -> count() FROM github_events - -> WHERE payload['comment']['body'] MATCH 'doris'; -+---------+ -| count() | -+---------+ -| 3 | -+---------+ -1 row in set (0.04 sec) -``` - -3. 查询 comments 最多的 issue 号以及对应的库 - -``` sql -mysql> SELECT - -> cast(repo['name'] as string) as repo_name, - -> cast(payload['issue']['number'] as int) as issue_number, - -> count() AS comments, - -> count( - -> distinct cast(actor['login'] as string) - -> ) AS authors - -> FROM github_events - -> WHERE type = 'IssueCommentEvent' AND (cast(payload['action'] as string) = 'created') AND (cast(payload['issue']['number'] as int) > 10) - -> GROUP BY repo_name, issue_number - -> HAVING authors >= 4 - -> ORDER BY comments DESC, repo_name - -> LIMIT 50; -+--------------------------------------+--------------+----------+---------+ -| repo_name | issue_number | comments | authors | -+--------------------------------------+--------------+----------+---------+ -| facebook/react-native | 35228 | 5 | 4 | -| swsnu/swppfall2022-team4 | 27 | 5 | 4 | -| belgattitude/nextjs-monorepo-example | 2865 | 4 | 4 | -+--------------------------------------+--------------+----------+---------+ -3 rows in set (0.03 sec) -``` - -### 使用限制和最佳实践 - -**VARIANT 类型的使用有以下限制:** -VARIANT 动态列与预定义静态列几乎一样高效。处理诸如日志之类的数据,在这类数据中,经常通过动态属性添加字段(例如 Kubernetes 中的容器标签)。但是解析 JSON 和推断类型会在写入时产生额外开销。因此,我们建议保持单次导入列数在 1000 以下。 - -尽可能保证类型一致,Doris 会自动进行如下兼容类型转换,当字段无法进行兼容类型转换时会统一转换成 JSONB 类型。JSONB 列的性能与 int、text 等列性能会有所退化。 - -1. tinyint->smallint->int->bigint,整形可以按照箭头做类型提升 -2. float->double,浮点数按照箭头做类型提升 -3. text,字符串类型 -4. JSON,二进制 JSON 类型 - -上诉类型无法兼容时,会变成 JSON 类型防止类型信息丢失,如果您需要在 VARIANT 中设置严格的 schema,即将推出 VARIANT MAPPING 机制 - -其它限制如下: - -- VARIANT 列只能创建倒排索引或者 bloom filter 来加速过滤 -- **推荐使用 RANDOM 模式和[Group Commit](../../../data-operate/import/group-commit-manual.md) 模式,写入性能更高效** -- 日期、decimal 等非标准 JSON 类型会被默认推断成字符串类型,所以尽可能从 VARIANT 中提取出来,用静态类型,性能更好 -- 2 维及其以上的数组列存化会被存成 JSONB 编码,性能不如原生数组 -- 不支持作为主键或者排序键 -- 查询过滤、聚合需要带 cast,存储层会根据存储类型和 cast 目标类型来消除 cast 操作,加速查询。 - -### FAQ -1. Stream Load 报错: [CANCELLED][INTERNAL_ERROR]tablet error: [DATA_QUALITY_ERROR]Reached max column size limit 2048。 -由于 Compaction 和元信息存储限制,VARIANT 类型会限制列数,默认 2048 列,可以适当调整 BE 配置 `variant_max_merged_tablet_schema_size` ,但是不建议超过 4096 - diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/table-design/data-type.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/table-design/data-type.md index 2d21e021185..6cca8b37f0b 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/table-design/data-type.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/table-design/data-type.md @@ -74,10 +74,6 @@ Apache Doris 已支持的数据类型列表如下: | [QUANTILE_STATE](../sql-manual/sql-data-types/aggregate/QUANTILE-STATE) | 不定长 | QUANTILE_STATE 是一种计算分位数近似值的类型,在导入时会对相同的 Key,不同 Value 进行预聚合,当 value 数量不超过 2048 时采用明细记录所有数据,当 Value 数量大于 2048 时采用 TDigest 算法,对数据进行聚合(聚类)保存聚类后的质心点。QUANTILE_STATE 不能作为 Key 列使用,建表时配合聚合类型为 QUANTILE_UNION。用户不需要指定长度和默认值。长度根据数据的聚合程度系统内控制。QUANTILE_STATE 列只能通过配套的 QUANTILE_PERCENT、QUANTILE_UNION、TO_QUANTILE_STATE 等函数进行查询或使用。 | | [AGG_STATE](../sql-manual/sql-data-types/aggregate/AGG-STATE.md) | 不定长 | 聚合函数,只能配合 state/merge/union 函数组合器使用。AGG_STATE 不能作为 Key 列使用,建表时需要同时声明聚合函数的签名。用户不需要指定长度和默认值。实际存储的数据大小与函数实现有关。 | -### [IP 类型](../sql-manual/sql-data-types/data-type-overview#ip-类型) -| 类型名 | 存储空间(字节)| 描述 | -| -------------- | --------- | ------------------------------------------------------------ | -| [IPv4](../sql-manual/sql-data-types/ip/IPV4) | 4 字节 | 以 4 字节二进制存储 IPv4 地址,配合 ipv4_* 系列函数使用。 | -| [IPv6](../sql-manual/sql-data-types/ip/IPV6) | 16 字节 | 以 16 字节二进制存储 IPv6 地址,配合 ipv6_* 系列函数使用。 | + 也可通过`SHOW DATA TYPES;`语句查看 Apache Doris 支持的所有数据类型。 diff --git a/versioned_docs/version-2.0/sql-manual/sql-data-types/data-type-overview.md b/versioned_docs/version-2.0/sql-manual/sql-data-types/data-type-overview.md index fcc340e3671..704454d92e3 100644 --- a/versioned_docs/version-2.0/sql-manual/sql-data-types/data-type-overview.md +++ b/versioned_docs/version-2.0/sql-manual/sql-data-types/data-type-overview.md @@ -105,10 +105,4 @@ The aggregation data types store aggregation results or intermediate results dur - **[AGG_STATE](../sql-data-types/aggregate/AGG-STATE.md)**: It is used to accelerate aggregations, utilized in combination with aggregation function combinators like state/merge/union. -## IP Types - -IP data types store IP addresses in a binary format, which is faster and more space-efficient for querying compared to storing them as strings. There are two supported IP data types: - -- **[IPv4](../sql-data-types/ip/IPV4.md)**: It stores IPv4 addresses as a 4-byte binary value. It is used in conjunction with the `ipv4_*` family of functions. -- **[IPv6](../sql-data-types/ip/IPV6.md)**: It stores IPv6 addresses as a 16-byte binary value. It is used in conjunction with the `ipv6_*` family of functions. diff --git a/versioned_docs/version-2.0/sql-manual/sql-data-types/ip/IPV4.md b/versioned_docs/version-2.0/sql-manual/sql-data-types/ip/IPV4.md deleted file mode 100644 index 76c2aabb01c..00000000000 --- a/versioned_docs/version-2.0/sql-manual/sql-data-types/ip/IPV4.md +++ /dev/null @@ -1,78 +0,0 @@ ---- -{ - "title": "IPV4", - "language": "en" -} ---- - -<!-- -Licensed to the Apache Software Foundation (ASF) under one -or more contributor license agreements. See the NOTICE file -distributed with this work for additional information -regarding copyright ownership. The ASF licenses this file -to you under the Apache License, Version 2.0 (the -"License"); you may not use this file except in compliance -with the License. You may obtain a copy of the License at - - http://www.apache.org/licenses/LICENSE-2.0 - -Unless required by applicable law or agreed to in writing, -software distributed under the License is distributed on an -"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -KIND, either express or implied. See the License for the -specific language governing permissions and limitations -under the License. ---> - -## IPV4 - -### description - -IPv4 type, stored in the form of UInt32 in 4 bytes, used to represent IPv4 addresses. -The range of values is ['0.0.0.0', '255.255.255.255']. - -`Inputs that exceed the value range or have invalid format will return NULL` - -### example - -Create table example: - -``` -CREATE TABLE ipv4_test ( - `id` int, - `ip_v4` ipv4 -) ENGINE=OLAP -DISTRIBUTED BY HASH(`id`) BUCKETS 4 -PROPERTIES ( -"replication_allocation" = "tag.location.default: 1" -); -``` - -Insert data example: - -``` -insert into ipv4_test values(1, '0.0.0.0'); -insert into ipv4_test values(2, '127.0.0.1'); -insert into ipv4_test values(3, '59.50.185.152'); -insert into ipv4_test values(4, '255.255.255.255'); -insert into ipv4_test values(5, '255.255.255.256'); // invalid data -``` - -Select data example: - -``` -mysql> select * from ipv4_test order by id; -+------+-----------------+ -| id | ip_v4 | -+------+-----------------+ -| 1 | 0.0.0.0 | -| 2 | 127.0.0.1 | -| 3 | 59.50.185.152 | -| 4 | 255.255.255.255 | -| 5 | NULL | -+------+-----------------+ -``` - -### keywords - -IPV4 diff --git a/versioned_docs/version-2.0/sql-manual/sql-data-types/ip/IPV6.md b/versioned_docs/version-2.0/sql-manual/sql-data-types/ip/IPV6.md deleted file mode 100644 index 48415767c43..00000000000 --- a/versioned_docs/version-2.0/sql-manual/sql-data-types/ip/IPV6.md +++ /dev/null @@ -1,76 +0,0 @@ ---- -{ - "title": "IPV6", - "language": "en" -} ---- - -<!-- -Licensed to the Apache Software Foundation (ASF) under one -or more contributor license agreements. See the NOTICE file -distributed with this work for additional information -regarding copyright ownership. The ASF licenses this file -to you under the Apache License, Version 2.0 (the -"License"); you may not use this file except in compliance -with the License. You may obtain a copy of the License at - - http://www.apache.org/licenses/LICENSE-2.0 - -Unless required by applicable law or agreed to in writing, -software distributed under the License is distributed on an -"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -KIND, either express or implied. See the License for the -specific language governing permissions and limitations -under the License. ---> - -## IPV6 - -### description - -IPv6 type, stored in UInt128 format in 16 bytes, used to represent IPv6 addresses. -The range of values is ['::', 'ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff']. - -`Inputs that exceed the value range or have invalid format will return NULL` - -### example - -Create table example: - -``` -CREATE TABLE ipv6_test ( - `id` int, - `ip_v6` ipv6 -) ENGINE=OLAP -DISTRIBUTED BY HASH(`id`) BUCKETS 4 -PROPERTIES ( -"replication_allocation" = "tag.location.default: 1" -); -``` - -Insert data example: - -``` -insert into ipv6_test values(1, '::'); -insert into ipv6_test values(2, '2001:16a0:2:200a::2'); -insert into ipv6_test values(3, 'ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff'); -insert into ipv6_test values(4, 'ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffffg'); // invalid data -``` - -Select data example: - -``` -mysql> select * from ipv6_test order by id; -+------+-----------------------------------------+ -| id | ip_v6 | -+------+-----------------------------------------+ -| 1 | :: | -| 2 | 2001:16a0:2:200a::2 | -| 3 | ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff | -| 4 | NULL | -+------+-----------------------------------------+ -``` - -### keywords - -IPV6 diff --git a/versioned_docs/version-2.0/sql-manual/sql-data-types/semi-structured/VARIANT.md b/versioned_docs/version-2.0/sql-manual/sql-data-types/semi-structured/VARIANT.md deleted file mode 100644 index 83280efd71b..00000000000 --- a/versioned_docs/version-2.0/sql-manual/sql-data-types/semi-structured/VARIANT.md +++ /dev/null @@ -1,370 +0,0 @@ ---- -{ - "title": "VARIANT", - "language": "en" -} ---- - -<!-- -Licensed to the Apache Software Foundation (ASF) under one -or more contributor license agreements. See the NOTICE file -distributed with this work for additional information -regarding copyright ownership. The ASF licenses this file -to you under the Apache License, Version 2.0 (the -"License"); you may not use this file except in compliance -with the License. You may obtain a copy of the License at - - http://www.apache.org/licenses/LICENSE-2.0 - -Unless required by applicable law or agreed to in writing, -software distributed under the License is distributed on an -"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -KIND, either express or implied. See the License for the -specific language governing permissions and limitations -under the License. ---> - -## VARIANT - -### Description - -Introduced a new data type VARIANT in Doris 2.1, which can store semi-structured JSON data. It allows storing complex data structures containing different data types (such as integers, strings, boolean values, etc.) without the need to define specific columns in the table structure beforehand. The VARIANT type is particularly useful for handling complex nested structures that may change at any time. During the writing process, this type can automatically infer column information based on [...] - -### Note - -Advantages over JSON Type: - -1. Different storage methods: The JSON type is stored in binary JSONB format, and the entire JSON is stored row by row in segment files. In contrast, the VARIANT type infers types during writing and stores the written JSON columns. It has a higher compression ratio compared to the JSON type, providing better storage efficiency. -2. Query: Querying does not require parsing. VARIANT fully utilizes columnar storage, vectorized engines, optimizers, and other components in Doris, providing users with extremely high query performance. -Below are test results based on clickbench data: - -| | Storage Space | -|--------------|------------| -| Predefined Static Columns | 12.618 GB | -| VARIANT Type | 12.718 GB | -| JSON Type | 35.711 GB | - -**Saves approximately 65% storage capacity** - -| Query Counts | Predefined Static Columns | VARIANT Type | JSON Type | -|---------------------|---------------------------|--------------|-----------------| -| First Query (cold) | 233.79s | 248.66s | **Most queries timeout** | -| Second Query (hot) | 86.02s | 94.82s | 789.24s | -| Third Query (hot) | 83.03s | 92.29s | 743.69s | - -[test case](https://github.com/ClickHouse/ClickBench/blob/main/doris/queries.sql) contains 43 queries - -**8x faster query, query performance comparable to static columns** - -### Example - -Demonstrate the functionality and usage of VARIANT with an example covering table creation, data import, and query cycle. - -**Table Creation Syntax** -Create a table, using the `VARIANT` keyword in the syntax. - -``` sql --- Without index -CREATE TABLE IF NOT EXISTS ${table_name} ( - k BIGINT, - v VARIANT -) -table_properties; - --- Create an index on the v column, optionally specify the tokenize method, default is untokenized -CREATE TABLE IF NOT EXISTS ${table_name} ( - k BIGINT, - v VARIANT, - INDEX idx_var(v) USING INVERTED [PROPERTIES("parser" = "english|unicode|chinese")] [COMMENT 'your comment'] -) -table_properties; - --- Create an bloom filter on v column, to enhance query seed on sub columns -CREATE TABLE IF NOT EXISTS ${table_name} ( - k BIGINT, - v VARIANT -) -... -properties("replication_num" = "1", "bloom_filter_columns" = "v"); - -``` - -**Query Syntax** - -``` sql --- use v['a']['b'] format for example, v['properties']['title'] type is VARIANT -SELECT v['properties']['title'] from ${table_name} - -``` - -**Example based on the GitHub events dataset** - -Here, github events data is used to demonstrate the table creation, data import, and query using VARIANT. -The below is a formatted line of data: - -``` json -{ - "id": "14186154924", - "type": "PushEvent", - "actor": { - "id": 282080, - "login": "brianchandotcom", - "display_login": "brianchandotcom", - "gravatar_id": "", - "url": "https://api.github.com/users/brianchandotcom", - "avatar_url": "https://avatars.githubusercontent.com/u/282080?" - }, - "repo": { - "id": 1920851, - "name": "brianchandotcom/liferay-portal", - "url": "https://api.github.com/repos/brianchandotcom/liferay-portal" - }, - "payload": { - "push_id": 6027092734, - "size": 4, - "distinct_size": 4, - "ref": "refs/heads/master", - "head": "91edd3c8c98c214155191feb852831ec535580ba", - "before": "abb58cc0db673a0bd5190000d2ff9c53bb51d04d", - "commits": [""] - }, - "public": true, - "created_at": "2020-11-13T18:00:00Z" -} -``` - -**Table Creation** - -- Created three columns of VARIANT type: `actor`, `repo`, and `payload`. -- Simultaneously created an inverted index, `idx_payload`, for the `payload` column while creating the table. -- Specified the index type as inverted using `USING INVERTED`, aimed at accelerating conditional filtering of sub-columns. -- `PROPERTIES("parser" = "english")` specified the adoption of English tokenization. - -``` sql -CREATE DATABASE test_variant; -USE test_variant; -CREATE TABLE IF NOT EXISTS github_events ( - id BIGINT NOT NULL, - type VARCHAR(30) NULL, - actor VARIANT NULL, - repo VARIANT NULL, - payload VARIANT NULL, - public BOOLEAN NULL, - created_at DATETIME NULL, - INDEX idx_payload (`payload`) USING INVERTED PROPERTIES("parser" = "english") COMMENT 'inverted index for payload' -) -DUPLICATE KEY(`id`) -DISTRIBUTED BY HASH(id) BUCKETS 10 -properties("replication_num" = "1"); -``` - -:::tip - -1. Creating an index on VARIANT columns, such as when there are numerous sub-columns in payload, might lead to an excessive number of index columns, impacting write performance. -2. The tokenization properties for the same VARIANT column are uniform. If you have varied tokenization requirements, consider creating multiple VARIANT columns and specifying index properties separately for each. - -::: - - -**Using Streamload for Import** - -Importing gh_2022-11-07-3.json, which contains one hour's worth of GitHub events data. - -``` shell -wget https://qa-build.oss-cn-beijing.aliyuncs.com/regression/variant/gh_2022-11-07-3.json - -curl --location-trusted -u root: -T gh_2022-11-07-3.json -H "read_json_by_line:true" -H "format:json" http://127.0.0.1:18148/api/test_variant/github_events/_strea -m_load - -{ - "TxnId": 2, - "Label": "086fd46a-20e6-4487-becc-9b6ca80281bf", - "Comment": "", - "TwoPhaseCommit": "false", - "Status": "Success", - "Message": "OK", - "NumberTotalRows": 139325, - "NumberLoadedRows": 139325, - "NumberFilteredRows": 0, - "NumberUnselectedRows": 0, - "LoadBytes": 633782875, - "LoadTimeMs": 7870, - "BeginTxnTimeMs": 19, - "StreamLoadPutTimeMs": 162, - "ReadDataTimeMs": 2416, - "WriteDataTimeMs": 7634, - "CommitAndPublishTimeMs": 55 -} -``` - -Confirm the successful import. - -``` sql --- View the number of rows. -mysql> select count() from github_events; -+----------+ -| count(*) | -+----------+ -| 139325 | -+----------+ -1 row in set (0.25 sec) - --- Random select one row -mysql> select * from github_events limit 1; -+-------------+-----------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------- [...] -| id | type | actor | repo | payload [...] -+-------------+-----------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------- [...] -| 25061821748 | PushEvent | {"gravatar_id":"","display_login":"jfrog-pipelie-intg","url":"https://api.github.com/users/jfrog-pipelie-intg","id":98024358,"login":"jfrog-pipelie-intg","avatar_url":"https://avatars.githubusercontent.com/u/98024358?"} | {"url":"https://api.github.com/repos/jfrog-pipelie-intg/jfinte2e_1667789956723_16","id":562683829,"name":"jfrog-pipelie-intg/jfinte2e_1667789956723_16"} | {"commits":[{"sha":"334433de436baa198024ef9f55f0647721bcd750","author":{"email":"980243 [...] -+-------------+-----------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------- [...] -1 row in set (0.23 sec) -``` -Running desc command to view schema information, sub-columns will automatically expand at the storage layer and undergo type inference. - -``` sql -mysql> desc github_events; -+------------------------------------------------------------+------------+------+-------+---------+-------+ -| Field | Type | Null | Key | Default | Extra | -+------------------------------------------------------------+------------+------+-------+---------+-------+ -| id | BIGINT | No | true | NULL | | -| type | VARCHAR(*) | Yes | false | NULL | NONE | -| actor | VARIANT | Yes | false | NULL | NONE | -| created_at | DATETIME | Yes | false | NULL | NONE | -| payload | VARIANT | Yes | false | NULL | NONE | -| public | BOOLEAN | Yes | false | NULL | NONE | -+------------------------------------------------------------+------------+------+-------+---------+-------+ -6 rows in set (0.07 sec) - -mysql> set describe_extend_variant_column = true; -Query OK, 0 rows affected (0.01 sec) - -mysql> desc github_events; -+------------------------------------------------------------+------------+------+-------+---------+-------+ -| Field | Type | Null | Key | Default | Extra | -+------------------------------------------------------------+------------+------+-------+---------+-------+ -| id | BIGINT | No | true | NULL | | -| type | VARCHAR(*) | Yes | false | NULL | NONE | -| actor | VARIANT | Yes | false | NULL | NONE | -| actor.avatar_url | TEXT | Yes | false | NULL | NONE | -| actor.display_login | TEXT | Yes | false | NULL | NONE | -| actor.id | INT | Yes | false | NULL | NONE | -| actor.login | TEXT | Yes | false | NULL | NONE | -| actor.url | TEXT | Yes | false | NULL | NONE | -| created_at | DATETIME | Yes | false | NULL | NONE | -| payload | VARIANT | Yes | false | NULL | NONE | -| payload.action | TEXT | Yes | false | NULL | NONE | -| payload.before | TEXT | Yes | false | NULL | NONE | -| payload.comment.author_association | TEXT | Yes | false | NULL | NONE | -| payload.comment.body | TEXT | Yes | false | NULL | NONE | -.... -+------------------------------------------------------------+------------+------+-------+---------+-------+ -406 rows in set (0.07 sec) -``` -DESC can be used to specify partition and view the schema of a particular partition. The syntax is as follows: - -``` sql -DESCRIBE ${table_name} PARTITION ($partition_name); -``` - -**Querying** - -:::tip - -When utilizing filtering and aggregation functionalities to query sub-columns, additional casting operations need to be performed on sub-columns (because the storage types are not necessarily fixed and require a unified SQL type). -For instance, `SELECT * FROM tbl where CAST(var['titile'] as text) MATCH "hello world"` -The simplified examples below illustrate how to use VARIANT for querying: -The following are three typical query scenarios - -::: - -1. Retrieve the top 5 repositories based on star count from the `github_events` table. - -``` sql -mysql> SELECT - -> cast(repo['name'] as text) as repo_name, count() AS stars - -> FROM github_events - -> WHERE type = 'WatchEvent' - -> GROUP BY repo_name - -> ORDER BY stars DESC LIMIT 5; -+--------------------------+-------+ -| repo_name | stars | -+--------------------------+-------+ -| aplus-framework/app | 78 | -| lensterxyz/lenster | 77 | -| aplus-framework/database | 46 | -| stashapp/stash | 42 | -| aplus-framework/image | 34 | -+--------------------------+-------+ -5 rows in set (0.03 sec) -``` - -2. Retrieve the count of comments containing "doris". - -``` sql --- implicit cast `payload['comment']['body']` to string type -mysql> SELECT - -> count() FROM github_events - -> WHERE payload['comment']['body'] MATCH 'doris'; -+---------+ -| count() | -+---------+ -| 3 | -+---------+ -1 row in set (0.04 sec) -``` - -3. Query the issue number with the highest number of comments along with its corresponding repository. - -``` sql -mysql> SELECT - -> cast(repo['name'] as string) as repo_name, - -> cast(payload['issue']['number'] as int) as issue_number, - -> count() AS comments, - -> count( - -> distinct cast(actor['login'] as string) - -> ) AS authors - -> FROM github_events - -> WHERE type = 'IssueCommentEvent' AND (cast(payload["action"] as string) = 'created') AND (cast(payload["issue"]["number"] as int) > 10) - -> GROUP BY repo_name, issue_number - -> HAVING authors >= 4 - -> ORDER BY comments DESC, repo_name - -> LIMIT 50; -+--------------------------------------+--------------+----------+---------+ -| repo_name | issue_number | comments | authors | -+--------------------------------------+--------------+----------+---------+ -| facebook/react-native | 35228 | 5 | 4 | -| swsnu/swppfall2022-team4 | 27 | 5 | 4 | -| belgattitude/nextjs-monorepo-example | 2865 | 4 | 4 | -+--------------------------------------+--------------+----------+---------+ -3 rows in set (0.03 sec) -``` - -### Usage Restrictions and Best Practices - -**There are several limitations when using the VARIANT type:** -Dynamic columns of VARIANT are nearly as efficient as predefined static columns. When dealing with data like logs, where fields are often added dynamically (such as container labels in Kubernetes), parsing JSON and inferring types can generate additional costs during write operations. Therefore, it's recommended to keep the number of columns for a single import below 1000. - -Ensure consistency in types whenever possible. Doris automatically performs compatible type conversions. When a field cannot undergo compatible type conversion, it is uniformly converted to JSONB type. The performance of JSONB columns may degrade compared to columns like int or text. - -1. tinyint -> smallint -> int -> bigint, integer types can be promoted following the direction of the arrows. -2. float -> double, floating-point numbers can be promoted following the direction of the arrow. -3. text, string type. -4. JSON, binary JSON type. - -When the above types cannot be compatible, they will be transformed into JSON type to prevent loss of type information. If you need to set a strict schema in VARIANT, the VARIANT MAPPING mechanism will be introduced soon. - -**Other limitations include:** - -- VARIANT columns can only create inverted indexes or bloom filter to speed up query. -- Using the **RANDOM** mode or [group commit](/docs/data-operate/import/group-commit-manual.md) mode is recommended for higher write performance. -- Non-standard JSON types such as date and decimal should ideally use static types for better performance, since these types are infered to text type. -- Arrays with dimensions of 2 or higher will be stored as JSONB encoding, which might perform less efficiently than native arrays. -- Not supported as primary or sort keys. -- Queries with filters or aggregations require casting. The storage layer eliminates cast operations based on storage type and the target type of the cast, speeding up queries. - -### FAQ -1.Streamload Error: [CANCELLED][INTERNAL_ERROR] tablet error: [DATA_QUALITY_ERROR] Reached max column size limit 2048. -Due to compaction and metadata storage limitations, the VARIANT type imposes a limit on the number of columns, with the default being 2048 columns. You can adjust the BE configuration `variant_max_merged_tablet_schema_size` accordingly, but it is not recommended to exceed 4096 columns. - -### Keywords - - VARIANT diff --git a/versioned_docs/version-2.0/table-design/data-type.md b/versioned_docs/version-2.0/table-design/data-type.md index 308641c5531..6081da38ca5 100644 --- a/versioned_docs/version-2.0/table-design/data-type.md +++ b/versioned_docs/version-2.0/table-design/data-type.md @@ -75,11 +75,5 @@ The list of data types supported by Doris is as follows: | [QUANTILE_STATE](../sql-manual/sql-data-types/aggregate/QUANTILE_STATE) | Variable Length | A type used to calculate approximate quantile values. When loading, it performs pre-aggregation for the same keys with different values. When the number of values does not exceed 2048, it records all data in detail. When the number of values is greater than 2048, it employs the TDigest algorithm to aggregate (cluster) the data and store the centroid points after clustering. QUANTILE_STATE can [...] | [AGG_STATE](../sql-manual/sql-data-types/aggregate/AGG_STATE) | Variable Length | Aggregate function can only be used with state/merge/union function combiners. AGG_STATE cannot be used as a key column. When creating a table, the signature of the aggregate function needs to be declared alongside. Users do not need to specify the length or default value. The actual data storage size depends on the function's implementation. | -## [IP types](../sql-manual/sql-data-types/data-type-overview#ip-types) - -| Type Name | Storage (bytes) | Description | -| ---------------------------------------------------------- | --------------- | ------------------------------------------------------------ | -| [IPv4](../sql-manual/sql-data-types/ip/IPV4) | 4 | It is used in conjunction with the `ipv4_*` family of functions. | -| [IPv6](../sql-manual/sql-data-types/ip/IPV6) | 16 | It is used in conjunction with the `ipv6_*` family of functions. | You can also view all the data types supported by Doris with the `SHOW DATA TYPES; `statement. diff --git a/versioned_sidebars/version-2.0-sidebars.json b/versioned_sidebars/version-2.0-sidebars.json index 463adb42d45..a79b7a811d9 100644 --- a/versioned_sidebars/version-2.0-sidebars.json +++ b/versioned_sidebars/version-2.0-sidebars.json @@ -589,8 +589,7 @@ "sql-manual/sql-data-types/semi-structured/ARRAY", "sql-manual/sql-data-types/semi-structured/MAP", "sql-manual/sql-data-types/semi-structured/STRUCT", - "sql-manual/sql-data-types/semi-structured/JSON", - "sql-manual/sql-data-types/semi-structured/VARIANT" + "sql-manual/sql-data-types/semi-structured/JSON" ] }, { @@ -602,14 +601,6 @@ "sql-manual/sql-data-types/aggregate/QUANTILE-STATE", "sql-manual/sql-data-types/aggregate/AGG-STATE" ] - }, - { - "type": "category", - "label": "IP Data Type", - "items": [ - "sql-manual/sql-data-types/ip/IPV4", - "sql-manual/sql-data-types/ip/IPV6" - ] } ] }, --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org For additional commands, e-mail: commits-h...@doris.apache.org