This is an automated email from the ASF dual-hosted git repository. kassiez pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/doris-website.git
The following commit(s) were added to refs/heads/master by this push: new d002c5f4b8d add agg state link (#1318) d002c5f4b8d is described below commit d002c5f4b8d1f6fbc66bfcf15c003da994d13705 Author: Pxl <pxl...@qq.com> AuthorDate: Fri Nov 15 20:18:16 2024 +0800 add agg state link (#1318) # Versions - [X] dev - [X] 3.0 - [ ] 2.1 - [X] 2.0 # Languages - [X] Chinese - [X] English --- docs/table-design/data-model/aggregate.md | 2 +- .../current/table-design/data-model/aggregate.md | 1 + .../table-design/data-model/aggregate.md | 112 ------------------- .../table-design/data-model/aggregate.md | 1 + .../table-design/data-model/aggregate.md | 122 --------------------- .../table-design/data-model/aggregate.md | 1 + 6 files changed, 4 insertions(+), 235 deletions(-) diff --git a/docs/table-design/data-model/aggregate.md b/docs/table-design/data-model/aggregate.md index 508c7430f93..a9fc1459ab7 100644 --- a/docs/table-design/data-model/aggregate.md +++ b/docs/table-design/data-model/aggregate.md @@ -212,7 +212,7 @@ At different stages, data will be aggregated to varying degrees. For example, wh AGG_STATE cannot be used as a key column, and when creating a table, you need to declare the signature of the aggregation function. Users do not need to specify a length or default value. The actual storage size of the data depends on the function implementation. ::: - +Details can be found in [agg_state](../../sql-manual/sql-data-types/aggregate/AGG-STATE.md) 。 CREATE TABLE diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/table-design/data-model/aggregate.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/table-design/data-model/aggregate.md index 65dbd0a35f7..2a63b804577 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/table-design/data-model/aggregate.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/table-design/data-model/aggregate.md @@ -210,6 +210,7 @@ insert into example_tbl_agg1 values AGG_STATE不能作为key列使用,建表时需要同时声明聚合函数的签名。 用户不需要指定长度和默认值。实际存储的数据大小与函数实现有关。 ``` +详情可见于 [agg_state](../../sql-manual/sql-data-types/aggregate/AGG-STATE.md) 。 建表 diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/table-design/data-model/aggregate.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/table-design/data-model/aggregate.md index ebf2f34878f..1e3a7d30bd7 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/table-design/data-model/aggregate.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/table-design/data-model/aggregate.md @@ -203,115 +203,3 @@ insert into example_tbl_agg1 values 3. 数据查询阶段。在数据查询时,对于查询涉及到的数据,会进行对应的聚合。 数据在不同时间,可能聚合的程度不一致。比如一批数据刚导入时,可能还未与之前已存在的数据进行聚合。但是对于用户而言,用户只能查询到聚合后的数据。即不同的聚合程度对于用户查询而言是透明的。用户需始终认为数据以最终的完成的聚合程度存在,而不应假设某些聚合还未发生。(可参阅聚合模型的局限性一节获得更多详情。) - -## agg_state - -```Plain -AGG_STATE不能作为key列使用,建表时需要同时声明聚合函数的签名。 -用户不需要指定长度和默认值。实际存储的数据大小与函数实现有关。 -``` - -建表 - -```sql -set enable_agg_state=true; -create table aggstate( - k1 int null, - k2 agg_state sum(int), - k3 agg_state group_concat(string) -) -aggregate key (k1) -distributed BY hash(k1) buckets 3 -properties("replication_num" = "1"); -``` - -其中 agg_state 用于声明数据类型为 agg_state,sum/group_concat 为聚合函数的签名。注意 agg_state 是一种数据类型,同 int/array/string - -agg_state 只能配合[state](../../sql-manual/sql-functions/combinators/state) /[merge](../../sql-manual/sql-functions/combinators/merge)/[union](../../sql-manual/sql-functions/combinators/union)函数组合器使用。 - -agg_state 是聚合函数的中间结果,例如,聚合函数 sum,则 agg_state 可以表示 sum(1,2,3,4,5) 的这个中间状态,而不是最终的结果。 - -agg_state 类型需要使用 state 函数来生成,对于当前的这个表,则为`sum_state`,`group_concat_state`。 - -```sql -insert into aggstate values(1,sum_state(1),group_concat_state('a')); -insert into aggstate values(1,sum_state(2),group_concat_state('b')); -insert into aggstate values(1,sum_state(3),group_concat_state('c')); -``` - -此时表只有一行 ( 注意,下面的表只是示意图,不是真的可以 select 显示出来) - -| k1 | k2 | k3 | -| ---- | ---------- | ------------------------- | -| 1 | sum(1,2,3) | group_concat_state(a,b,c) | - -再插入一条数据 - -```sql -insert into aggstate values(2,sum_state(4),group_concat_state('d')); -``` - -此时表的结构为 - -| k1 | k2 | k3 | -| ---- | ---------- | ------------------------- | -| 1 | sum(1,2,3) | group_concat_state(a,b,c) | -| 2 | sum(4) | group_concat_state(d) | - -我们可以通过 merge 操作来合并多个 state,并且返回最终聚合函数计算的结果 - -```Plain -mysql> select sum_merge(k2) from aggstate; -+---------------+ -| sum_merge(k2) | -+---------------+ -| 10 | -+---------------+ -``` - -`sum_merge` 会先把 sum(1,2,3) 和 sum(4) 合并成 sum(1,2,3,4) ,并返回计算的结果。因为 group_concat 对于顺序有要求,所以结果是不稳定的。 - -```Plain -mysql> select group_concat_merge(k3) from aggstate; -+------------------------+ -| group_concat_merge(k3) | -+------------------------+ -| c,b,a,d | -+------------------------+ -``` - -如果不想要聚合的最终结果,可以使用 union 来合并多个聚合的中间结果,生成一个新的中间结果。 - -```sql -insert into aggstate select 3,sum_union(k2),group_concat_union(k3) from aggstate ; -``` - -此时的表结构为 - -| k1 | k2 | k3 | -| ---- | ------------ | --------------------------- | -| 1 | sum(1,2,3) | group_concat_state(a,b,c) | -| 2 | sum(4) | group_concat_state(d) | -| 3 | sum(1,2,3,4) | group_concat_state(a,b,c,d) | - -可以通过查询 - -```Plain -mysql> select sum_merge(k2) , group_concat_merge(k3)from aggstate; -+---------------+------------------------+ -| sum_merge(k2) | group_concat_merge(k3) | -+---------------+------------------------+ -| 20 | c,b,a,d,c,b,a,d | -+---------------+------------------------+ - -mysql> select sum_merge(k2) , group_concat_merge(k3)from aggstate where k1 != 2; -+---------------+------------------------+ -| sum_merge(k2) | group_concat_merge(k3) | -+---------------+------------------------+ -| 16 | c,b,a,d,c,b,a | -+---------------+------------------------+ -``` - -用户可以通过 agg_state 做出更细致的聚合函数操作。 - -注意 agg_state 存在一定的性能开销。 \ No newline at end of file diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/table-design/data-model/aggregate.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/table-design/data-model/aggregate.md index 65dbd0a35f7..2a63b804577 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/table-design/data-model/aggregate.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/table-design/data-model/aggregate.md @@ -210,6 +210,7 @@ insert into example_tbl_agg1 values AGG_STATE不能作为key列使用,建表时需要同时声明聚合函数的签名。 用户不需要指定长度和默认值。实际存储的数据大小与函数实现有关。 ``` +详情可见于 [agg_state](../../sql-manual/sql-data-types/aggregate/AGG-STATE.md) 。 建表 diff --git a/versioned_docs/version-2.0/table-design/data-model/aggregate.md b/versioned_docs/version-2.0/table-design/data-model/aggregate.md index fb3ae1a187f..b3255f98fe7 100644 --- a/versioned_docs/version-2.0/table-design/data-model/aggregate.md +++ b/versioned_docs/version-2.0/table-design/data-model/aggregate.md @@ -203,125 +203,3 @@ In Doris, data aggregation happens in the following 3 stages: 3. The data query stage. The data involved in the query will be aggregated accordingly. At different stages, data will be aggregated to varying degrees. For example, when a batch of data is just imported, it may not be aggregated with the existing data. But for users, they **can only query aggregated data**. That is, what users see are the aggregated data, and they **should not assume that what they have seen are not or partly aggregated**. - -### agg_state - -:::tip - -AGG_STATE cannot be used as a key column, and when creating a table, you need to declare the signature of the aggregation function. Users do not need to specify a length or default value. The actual storage size of the data depends on the function implementation. - -::: - - -CREATE TABLE - -```sql -set enable_agg_state=true; -create table aggstate( - k1 int null, - k2 agg_state sum(int), - k3 agg_state group_concat(string) -) -aggregate key (k1) -distributed BY hash(k1) buckets 3 -properties("replication_num" = "1"); -``` - - -`agg_state` is used to declare the data type as `agg_state,` and `sum/group_concat` are the signatures of aggregation functions. - -Please note that `agg_state` is a data type, similar to `int`, `array`, or `string`. - -`agg_state` can only be used in conjunction with the [state](../../sql-manual/sql-functions/combinators/state.md)/[merge](../../sql-manual/sql-functions/combinators/merge.md)/[union](../../sql-manual/sql-functions/combinators/union.md) function combinators. - -`agg_state` represents an intermediate result of an aggregation function. For example, with the aggregation function `sum`, `agg_state` can represent the intermediate state of summing values like `sum(1, 2, 3, 4, 5)`, rather than the final result. - -The `agg_state` type needs to be generated using the `state` function. For the current table, it would be `sum_state` and `group_concat_state` for the "sum" and `group_concat` aggregation functions, respectively. - -```sql -insert into aggstate values(1,sum_state(1),group_concat_state('a')); -insert into aggstate values(1,sum_state(2),group_concat_state('b')); -insert into aggstate values(1,sum_state(3),group_concat_state('c')); -``` - -At this point, the table contains only one row. Please note that the table below is for illustrative purposes and cannot be selected/displayed directly: - -| k1 | k2 | k3 | -| ---- | ---------- | ------------------------- | -| 1 | sum(1,2,3) | group_concat_state(a,b,c) | - -Insert another record. - -```sql -insert into aggstate values(2,sum_state(4),group_concat_state('d')); -``` - -The table's structure at this moment is... - -| k1 | k2 | k3 | -| ---- | ---------- | ------------------------- | -| 1 | sum(1,2,3) | group_concat_state(a,b,c) | -| 2 | sum(4) | group_concat_state(d) | - -We can use the `merge` operation to combine multiple states and return the final result calculated by the aggregation function. - -``` -mysql> select sum_merge(k2) from aggstate; -+---------------+ -| sum_merge(k2) | -+---------------+ -| 10 | -+---------------+ -``` - -`sum_merge` will first combine sum(1,2,3) and sum(4) into sum(1,2,3,4), and return the calculated result. -Because `group_concat` has a specific order requirement, the result is not stable. - -``` -mysql> select group_concat_merge(k3) from aggstate; -+------------------------+ -| group_concat_merge(k3) | -+------------------------+ -| c,b,a,d | -+------------------------+ -``` - -If you do not want the final aggregation result, you can use 'union' to combine multiple intermediate aggregation results and generate a new intermediate result. - -```sql -insert into aggstate select 3,sum_union(k2),group_concat_union(k3) from aggstate ; -``` - -The table's structure at this moment is... - -| k1 | k2 | k3 | -| ---- | ------------ | --------------------------- | -| 1 | sum(1,2,3) | group_concat_state(a,b,c) | -| 2 | sum(4) | group_concat_state(d) | -| 3 | sum(1,2,3,4) | group_concat_state(a,b,c,d) | - -You can achieve this through a query. - -``` -mysql> select sum_merge(k2) , group_concat_merge(k3)from aggstate; -+---------------+------------------------+ -| sum_merge(k2) | group_concat_merge(k3) | -+---------------+------------------------+ -| 20 | c,b,a,d,c,b,a,d | -+---------------+------------------------+ - -mysql> select sum_merge(k2) , group_concat_merge(k3)from aggstate where k1 != 2; -+---------------+------------------------+ -| sum_merge(k2) | group_concat_merge(k3) | -+---------------+------------------------+ -| 16 | c,b,a,d,c,b,a | -+---------------+------------------------+ -``` - -Users can perform more detailed aggregation function operations using `agg_state`. - -:::tip - - `agg_state` comes with a certain performance overhead. - -::: \ No newline at end of file diff --git a/versioned_docs/version-3.0/table-design/data-model/aggregate.md b/versioned_docs/version-3.0/table-design/data-model/aggregate.md index 508c7430f93..0ac394bab3b 100644 --- a/versioned_docs/version-3.0/table-design/data-model/aggregate.md +++ b/versioned_docs/version-3.0/table-design/data-model/aggregate.md @@ -212,6 +212,7 @@ At different stages, data will be aggregated to varying degrees. For example, wh AGG_STATE cannot be used as a key column, and when creating a table, you need to declare the signature of the aggregation function. Users do not need to specify a length or default value. The actual storage size of the data depends on the function implementation. ::: +Details can be found in [agg_state](../../sql-manual/sql-data-types/aggregate/AGG-STATE.md) 。 CREATE TABLE --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org For additional commands, e-mail: commits-h...@doris.apache.org