This is an automated email from the ASF dual-hosted git repository. morrysnow pushed a commit to branch polish_query_data in repository https://gitbox.apache.org/repos/asf/doris-website.git
commit a5ebadecc7398a679e27d7980bc9011469a39b65 Author: morrySnow <zhangwen...@selectdb.com> AuthorDate: Tue Dec 24 18:04:30 2024 +0800 [polish](query-data) polish query data docs --- docs/query-data/complex-type.md | 4 +--- docs/query-data/cte.md | 26 ++++++++++++++++++++-- docs/query-data/join.md | 22 ++++++++---------- docs/query-data/subquery.md | 6 ++--- docs/query-data/udf/alias-function.md | 4 ---- .../current/query-data/complex-type.md | 4 +--- .../current/query-data/cte.md | 26 ++++++++++++++++++++-- .../current/query-data/join.md | 14 +++++------- .../current/query-data/udf/alias-function.md | 4 ---- .../current/query-data/window-function.md | 12 +++++----- .../version-2.1/query-data/complex-type.md | 4 +--- .../version-2.1/query-data/cte.md | 26 ++++++++++++++++++++-- .../version-2.1/query-data/join.md | 14 +++++------- .../version-2.1/query-data/udf/alias-function.md | 4 ---- .../version-2.1/query-data/window-function.md | 13 ++++++----- .../version-3.0/query-data/complex-type.md | 4 +--- .../version-3.0/query-data/cte.md | 26 ++++++++++++++++++++-- .../version-3.0/query-data/join.md | 14 +++++------- .../version-3.0/query-data/udf/alias-function.md | 4 ---- .../version-3.0/query-data/window-function.md | 12 +++++----- .../version-2.1/query-data/complex-type.md | 5 +---- versioned_docs/version-2.1/query-data/cte.md | 26 ++++++++++++++++++++-- versioned_docs/version-2.1/query-data/join.md | 22 ++++++++---------- versioned_docs/version-2.1/query-data/subquery.md | 6 ++--- .../version-2.1/query-data/udf/alias-function.md | 4 ---- .../version-3.0/query-data/complex-type.md | 5 +---- versioned_docs/version-3.0/query-data/cte.md | 26 ++++++++++++++++++++-- versioned_docs/version-3.0/query-data/join.md | 22 ++++++++---------- versioned_docs/version-3.0/query-data/subquery.md | 6 ++--- .../version-3.0/query-data/udf/alias-function.md | 4 ---- 30 files changed, 220 insertions(+), 149 deletions(-) diff --git a/docs/query-data/complex-type.md 
b/docs/query-data/complex-type.md index 92dd4862cf7..9db1b523fd8 100644 --- a/docs/query-data/complex-type.md +++ b/docs/query-data/complex-type.md @@ -28,6 +28,4 @@ Doris supports complex types such as Array, Map, Struct, and JSON. Doris provides various functions specifically designed for these complex types. -For detailed information on the supported functions, please refer to the SQL manual- [Array functions](../sql-manual/sql-functions/array-functions/array), [SQL manual-Struct functions](../sql-manual/sql-functions/struct-functions/struct), and [SQL manual-JSON functions](../sql-manual/sql-functions/json-functions/json-parse). - -To check the support for Map functions, you can refer to the SQL manual docs under the section for the [MAP](../sql-manual/sql-data-types/semi-structured/MAP) data type. +For detailed information on the supported functions, please refer to the [SQL manual - Array functions](../sql-manual/sql-functions/array-functions/array), [SQL manual - Map functions](../sql-manual/sql-data-types/semi-structured/MAP), [SQL manual - Struct functions](../sql-manual/sql-functions/struct-functions/struct), and [SQL manual - JSON functions](../sql-manual/sql-functions/json-functions/json-parse). diff --git a/docs/query-data/cte.md b/docs/query-data/cte.md index 941d74639a1..04299e4afec 100644 --- a/docs/query-data/cte.md +++ b/docs/query-data/cte.md @@ -30,12 +30,14 @@ Common Table Expression (CTE) define a temporary result set that can be referenc To specify a CTE, use the `WITH` clause with one or more comma-separated clauses. Each clause provides a subquery that generates a result set and associates a name with the subquery. -Within the statement that contains the `WITH` clause, you can reference each CTE name to access the corresponding CTE result set. CTE names can be referenced in other CTE, allowing you to define CTE based on other CTE.
+Doris supports nested CTE. Within the statement that contains the `WITH` clause, you can reference each CTE name to access the corresponding CTE result set. CTE names can be referenced in other CTE, allowing you to define CTE based on other CTE. -Doris **DO NOT** support recursive CTE. +Doris **DOES NOT** support recursive CTE. For more information, please read the MySQL manual about [recursive CTE](https://dev.mysql.com/doc/refman/8.4/en/with.html#common-table-expressions-recursive) ## Example + +### Simple CTE + The following example defines CTEs named cte1 and cte2 within the WITH clause and refers to them in the top-level SELECT below the WITH clause: ```sql @@ -45,3 +47,23 @@ WITH SELECT b, d FROM cte1 JOIN cte2 WHERE cte1.a = cte2.c; ``` + +### Nested CTE + +```sql +WITH + cte1 AS (SELECT a, b FROM table1), + cte2 AS (SELECT c, d FROM cte1) +SELECT b, d FROM cte1 JOIN cte2 +WHERE cte1.a = cte2.c; +``` + +### Recursive CTE (Not Supported) + +```sql +WITH r_cte AS ( + SELECT 1 AS user_id, 2 as manager_id + UNION ALL + SELECT user_id, manager_id FROM r_cte INNER JOIN (SELECT 1 AS user_id, 2 as manager_id) t ON r_cte.manager_id = t.user_id +) +SELECT * FROM r_cte diff --git a/docs/query-data/join.md b/docs/query-data/join.md index dd56f235fb2..fb187a4bb38 100644 --- a/docs/query-data/join.md +++ b/docs/query-data/join.md @@ -59,9 +59,9 @@ Doris supports two implementation methods for JOIN: **Hash Join** and **Nested L As a distributed MPP database, Apache Doris requires data shuffling during the Hash Join process to ensure the correctness of the JOIN results. Below are several data shuffling methods: -**Broadcast Join** As illustrated, the Broadcast Join process involves sending all data from the right table to all nodes participating in the JOIN computation, including the nodes scanning the left table data, while the left table data remains stationary.
In this process, each node receives a complete copy of the right table's data (with a total volume of T(R)) to ensure that all nodes have the necessary data to perform the JOIN operation. +**Broadcast Join** As illustrated, the Broadcast Join process involves sending all data from the right table to all nodes participating in the JOIN computation, including the nodes scanning the left table's data, while the left table's data remains stationary. In this process, each node receives a complete copy of the right table's data (with a total volume of T(R)) to ensure that all nodes have the necessary data to perform the JOIN operation. -This method is suitable for various scenarios but is not applicable for RIGHT OUTER, RIGHT ANTI, and RIGHT SEMI types of Hash Join. Its network overhead is calculated as the number of JOIN nodes N multiplied by the volume of right table data T(R). +This method is suitable for various scenarios but is not applicable for RIGHT OUTER, RIGHT ANTI, and RIGHT SEMI types of Hash Join. Its network overhead is calculated as the number of JOIN nodes N multiplied by the volume of right table's data T(R).  @@ -69,13 +69,13 @@ This method is suitable for various scenarios but is not applicable for RIGHT OU This method computes hash values based on the JOIN conditions and performs bucketing. Specifically, the data from both the left and right tables is partitioned according to the hash values calculated from the JOIN conditions, and these partitioned data sets are then sent to the corresponding partition nodes (as illustrated). -The network overhead of this method mainly includes two parts: the cost of transferring the left table data T(S) and the cost of transferring the right table data T(R). This method only supports Hash Join operations because it relies on the JOIN conditions to perform data bucketing. 
+The network overhead of this method mainly includes two parts: the cost of transferring the left table's data T(S) and the cost of transferring the right table's data T(R). This method only supports Hash Join operations because it relies on the JOIN conditions to perform data bucketing.  ### Bucket Shuffle Join -When the JOIN condition includes the bucketed column from the left table, the left table's data remains unchanged while the right table's data is distributed to the left table's nodes for the JOIN, reducing network overhead. +When the JOIN condition includes the bucketed column from the left table, the left table's data location remains unchanged while the right table's data is distributed to the left table's nodes for the JOIN, reducing network overhead. When one side of the table involved in the JOIN operation has its data already hash-distributed according to the JOIN condition column, users can choose to keep this side's data location unchanged while distributing the other side's data based on the same JOIN condition column and hash distribution. (The term "table" here refers not only to physically stored tables but also to the output results of any operators in SQL queries. Users can flexibly choose to keep either the left or right t [...] @@ -418,20 +418,16 @@ When directly scanning two physical tables for a Bucket Shuffle Join, the follow 2. **Inclusion of bucketed columns in equality conditions**: The equality JOIN condition must include the bucketed columns from both tables. When the left table's bucketed column is used as the equality JOIN condition, it is more likely to be planned as a Bucket Shuffle Join. -3. **Data type consistency**: Since the hash value computation results differ for different data types, the data types of the left table's bucketed column and the right table's equality JOIN column must match; otherwise, the corresponding planning cannot occur. +3. 
**Table type restrictions**: Bucket Shuffle Join is only applicable to native OLAP tables in Doris. For external tables such as ODBC, MySQL, and ES, Bucket Shuffle Join cannot be effective when they are used as the left table. -4. **Table type restrictions**: Bucket Shuffle Join is only applicable to native OLAP tables in Doris. For external tables such as ODBC, MySQL, and ES, Bucket Shuffle Join cannot be effective when they are used as the left table. - -5. **Single Partition Requirement**: For partitioned tables, since the data distribution may differ across partitions, Bucket Shuffle Join is only guaranteed to be effective when the left table is a single partition. Therefore, when executing SQL, it is advisable to use `WHERE` conditions to enable partition pruning strategies whenever possible. +4. **Single Partition Requirement**: For partitioned tables, since the data distribution may differ across partitions, Bucket Shuffle Join is only guaranteed to be effective when the left table is a single partition. Therefore, when executing SQL, it is advisable to use `WHERE` conditions to enable partition pruning strategies whenever possible. ### Limitations of Colocate Join When directly scanning two physical tables, Colocate Join has stricter limitations compared to Bucket Shuffle Join. In addition to meeting all the conditions for Bucket Shuffle Join, the following requirements must also be satisfied: -1. **Consistency of bucket column types and counts**: Not only must the types of the bucketed columns match, but the number of buckets must also be the same to ensure data distribution consistency. - -2. **Consistency of table replicas**: The number of replicas for the tables must be consistent. +1. **Bucket column types and counts are the same**: Not only must the types of the bucketed columns match, but the number of buckets must also be the same to ensure data distribution consistency. -3.
**Explicit specification of Colocation Group**: A Colocation Group must be explicitly specified; only tables within the same Colocation Group can participate in a Colocate Join. +2. **Explicit specification of Colocation Group**: A Colocation Group must be explicitly specified; only tables within the same Colocation Group can participate in a Colocate Join. -4. **Unstable state during replica repair or balancing**: During operations such as replica repair or balancing, the Colocation Group may be in an unstable state. In this case, the Colocate Join will degrade to a regular Join operation. \ No newline at end of file +3. **Unstable state during replica repair or balancing**: During operations such as replica repair or balancing, the Colocation Group may be in an unstable state. In this case, the Colocate Join will degrade to a regular Join operation. \ No newline at end of file diff --git a/docs/query-data/subquery.md b/docs/query-data/subquery.md index 78d7618a81d..ddff0b8f53d 100644 --- a/docs/query-data/subquery.md +++ b/docs/query-data/subquery.md @@ -84,11 +84,11 @@ select * from t1 where t1.c1 in (select t2.c1 from t2); Subqueries can be classified into correlated subqueries and non-correlated subqueries based on whether they reference columns from the outer query: -**3. Non-correlated Subquery** +**1. Non-correlated Subquery** A subquery that does not reference any columns from the outer query. Non-correlated subqueries can often be computed independently and return the corresponding results once for the outer query to use. -**4. Correlated Subquery** +**2. Correlated Subquery** A subquery that references one or more columns from the main query (also known as the outer query) (the referenced outer columns are often in the WHERE condition of the subquery). 
Correlated subqueries can often be seen as a filtering operation on the externally associated table, as for each row of data in the outer table, the subquery is computed and returns the corresponding result. @@ -302,7 +302,7 @@ The difference between `Mark Join` and a regular `left semi join` is that a regu With this flag, the `where` filtering condition can be rewritten as `where mark_join_flag or t1.c1 > 0` to obtain the correct results. -## Common Issues +## Usage notes Since the output of a scalar subquery must be a single value, a runtime error will be reported when the subquery returns more than one row of data. diff --git a/docs/query-data/udf/alias-function.md b/docs/query-data/udf/alias-function.md index acebd4544ea..16285734cbf 100644 --- a/docs/query-data/udf/alias-function.md +++ b/docs/query-data/udf/alias-function.md @@ -69,7 +69,3 @@ CREATE ALIAS FUNCTION func(INT, INT) WITH PARAMETER(foo, bar) AS foo + bar; ### Parameter Requirements Currently, alias functions do not support variable-length parameters and must have at least one parameter. - -## More Content - -Apart from alias functions, Doris supports Java UDF and UDTF functionalities. For detailed information, please refer to the documentation [Java UDF](../../query-data/udf/java-user-defined-function). \ No newline at end of file diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/query-data/complex-type.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/query-data/complex-type.md index 8f4d4ea1300..7d0a648c107 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/query-data/complex-type.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/query-data/complex-type.md @@ -24,10 +24,8 @@ specific language governing permissions and limitations under the License. 
--> - Doris 支持 Array,Map,Struct,JSON 等复杂类型。 Doris 提供了针对以上复杂类型的各类函数。 -详细的函数支持,请查看 SQL 手册 / SQL 函数 下的 [数组函数](../sql-manual/sql-functions/array-functions/array)、[Struct 函数](../sql-manual/sql-functions/struct-functions/struct) 和 [JSON 函数](../sql-manual/sql-functions/json-functions/json-parse),Map 函数支持情况,直接查看 SQL 手册 / 数据类型 / [MAP](../sql-manual/sql-data-types/semi-structured/MAP)。 - +详细的函数支持,请查看 SQL 手册 / SQL 函数 下的 [Array 函数](../sql-manual/sql-functions/array-functions/array)、[Map 函数](../sql-manual/sql-data-types/semi-structured/MAP) 、[Struct 函数](../sql-manual/sql-functions/struct-functions/struct) 和 [JSON 函数](../sql-manual/sql-functions/json-functions/json-parse)。 diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/query-data/cte.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/query-data/cte.md index e78a1c09a5c..da6810711c2 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/query-data/cte.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/query-data/cte.md @@ -30,12 +30,14 @@ under the License. 
要指定公用表表达式,请使用 `WITH` 具有一个或多个逗号分隔子句的子句。每个子条款都提供一个子查询,用于生成结果集,并将名称与子查询相关联。 -在包含该 `WITH`子句 的语句中,可以引用每个 CTE 名称以访问相应的 CTE 结果集。CTE 名称可以在其他 CTE 中引用,从而可以基于其他 CTE 定义 CTE。 +Doris 支持嵌套 CTE。在包含该 `WITH`子句 的语句中,可以引用每个 CTE 名称以访问相应的 CTE 结果集。CTE 名称可以在其他 CTE 中引用,从而可以基于其他 CTE 定义 CTE。 -Doris **不支持** 递归 CTE。 +Doris **不支持** 递归 CTE。有关递归 CTE 的详细解释,可以参考 [MySQL 递归 CTE 手册](https://dev.mysql.com/doc/refman/8.4/en/with.html#common-table-expressions-recursive) ## 示例 +### 简单示例 + 下面的示例定义名为的 CTE `cte1` 和 `cte2` 中 `WITH` 子句,并且是指在它们的顶层 `SELECT` 下面的 `WITH` 子句: ```sql @@ -45,3 +47,23 @@ WITH SELECT b,d FROM cte1 JOIN cte2 WHERE cte1.a = cte2.c; ``` + +### 嵌套 CTE + +```sql +WITH + cte1 AS (SELECT a, b FROM table1), + cte2 AS (SELECT c, d FROM cte1) +SELECT b, d FROM cte1 JOIN cte2 +WHERE cte1.a = cte2.c; +``` + +### 递归 CTE (Doris 不支持) + +```sql +WITH r_cte AS ( + SELECT 1 AS user_id, 2 as manager_id + UNION ALL + SELECT user_id, manager_id FROM r_cte INNER JOIN (SELECT 1 AS user_id, 2 as manager_id) t ON r_cte.manager_id = t.user_id +) +SELECT * FROM r_cte diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/query-data/join.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/query-data/join.md index 4cc9dd3cf43..9139fe4a790 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/query-data/join.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/query-data/join.md @@ -427,20 +427,16 @@ Bucket Shuffle Join 和 Colocate Join 在应用时对数据分布和 JOIN 条件 2. 包含分桶列的等值条件:等值 Join 条件中须包含两张表的分桶列,当左表的分桶列作为等值 Join 条件时,更有可能被规划为 Bucket Shuffle Join。 -3. 数据类型一致性:由于不同的数据类型的 hash 值计算结果不同,左表的分桶列与右表的等值 Join 列的数据类型必须一致,否则将无法进行对应的规划。 +3. 表类型限制:Bucket Shuffle Join 仅适用于 Doris 原生的 OLAP 表。对于 ODBC、MySQL、ES 等外部表,当它们作为左表时,Bucket Shuffle Join 无法生效。 -4. 表类型限制:Bucket Shuffle Join 仅适用于 Doris 原生的 OLAP 表。对于 ODBC、MySQL、ES 等外部表,当它们作为左表时,Bucket Shuffle Join 无法生效。 - -5. 单分区要求:对于分区表,由于每个分区的数据分布可能不同,Bucket Shuffle Join 仅在左表为单分区时保证有效。因此在执行 SQL 时,应尽可能使用 `WHERE` 条件来启用分区裁剪策略。 +4. 
单分区要求:对于分区表,由于每个分区的数据分布可能不同,Bucket Shuffle Join 仅在左表为单分区时保证有效。因此在执行 SQL 时,应尽可能使用 `WHERE` 条件来启用分区裁剪策略。 ### Colocate Join 的限制 在直接扫描两张物理表时,Colocate Join 相较于 Bucket Shuffle Join 具有更严格的限制条件,除了满足 Bucket Shuffle Join 的所有条件外,还需满足以下要求: -1. 不仅分桶列的类型必须一致,分桶的数量也必须相同,以确保数据分布的一致性。 - -2. 表的副本数必须保持一致。 +1. 分桶列的类型和分桶数量必须一致,以确保数据分布的一致性。 -3. 需要显式指定 Colocation Group,只有处于相同 Colocation Group 的表才能进行 Colocate Join。 +2. 需要显式指定 Colocation Group,只有处于相同 Colocation Group 的表才能进行 Colocate Join。 -4. 在进行副本修复或副本均衡等操作时,Colocation Group 可能处于 Unstable 状态,此时 Colocate Join 将退化为普通的 Join 操作。 \ No newline at end of file +3. 在进行副本修复或副本均衡等操作时,Colocation Group 可能处于 Unstable 状态,此时 Colocate Join 将退化为普通的 Join 操作。 \ No newline at end of file diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/query-data/udf/alias-function.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/query-data/udf/alias-function.md index 73df67ec714..66df3d84ca8 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/query-data/udf/alias-function.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/query-data/udf/alias-function.md @@ -70,7 +70,3 @@ CREATE ALIAS FUNCTION func(INT, INT) WITH PARAMETER(foo, bar) AS foo + bar; ### 参数要求 当前别名函数不支持变长参数,且至少有一个参数。 - -## 更多内容 - -除别名函数外,Doris 支持 Java UDF、UDTF 功能,详细信息可参考文档 [Java UDF](../../query-data/udf/java-user-defined-function)。 \ No newline at end of file diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/query-data/window-function.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/query-data/window-function.md index 55f71093d8c..a74ab33daf4 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/query-data/window-function.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/query-data/window-function.md @@ -24,7 +24,7 @@ specific language governing permissions and limitations under the License. 
--> -分析函数,也称为窗口函数,是一种在 SQL 查询中对数据集中的行进行复杂计算的函数。窗口函数的特点在于,它们不会减少查询结果的行数,而是为每一行增加一个新的计算结果。窗口函数适用于多种分析场景,如计算运行总和、排名以及移动平均等。 +分析函数,也称为窗口函数,是一种在 SQL 查询中对数据集中的行进行复杂计算的函数。窗口函数的特点在于,它们不会减少查询结果的行数,而是为每一行增加一个新的计算结果。窗口函数适用于多种分析场景,如计算滚动合计、排名以及移动平均等。 下面是一个使用窗口函数计算每个商店的前后三天的销售移动平均值的例子: @@ -232,9 +232,9 @@ FROM 在这个例子中,学生按班级进行分区,然后在每个班级内按成绩分成 3 个组。每个组的学生数量尽可能均匀。 -## 分析函数 +## 聚合函数 -### 使用分析函数 SUM 计算累计值 +### 使用聚合函数 SUM 计算累计值 示例如下: @@ -294,9 +294,9 @@ GROUP BY 24 rows in set (0.13 sec) ``` -在此示例中,分析函数 SUM 为每一行定义一个窗口,该窗口从分区的开头(UNBOUNDED PRECEDING)开始,默认在当前行结束。在此示例中,需要嵌套使用 SUM,因为需要对本身就是 SUM 的结果执行 SUM。嵌套聚合在分析聚合函数中高频使用。 +在此示例中,聚合函数 SUM 为每一行定义一个窗口,该窗口从分区的开头(UNBOUNDED PRECEDING)开始,默认在当前行结束。在此示例中,需要嵌套使用 SUM,因为需要对本身就是 SUM 的结果执行 SUM。嵌套聚合在分析聚合函数中高频使用。 -### 使用分析函数 AVG 计算移动平均值 +### 使用聚合函数 AVG 计算移动平均值 示例如下: @@ -725,4 +725,4 @@ curl --location-trusted \ http://127.0.0.1:8030/api/doc_tpcds/customer_address/_stream_load ``` -数据文件``item_1_10.dat``,``date_dim_1_10.dat``,``store_sales.csv``,``customer_address_1_10.dat``可以[点击链接](https://cdn.selectdb.com/static/doc_ddl_dir_d27a752a7b.tar)下载。 \ No newline at end of file +数据文件``item_1_10.dat``,``date_dim_1_10.dat``,``store_sales.csv``,``customer_address_1_10.dat``可以[点击链接](https://cdn.selectdb.com/static/doc_ddl_dir_d27a752a7b.tar)下载。 diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/query-data/complex-type.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/query-data/complex-type.md index 8f4d4ea1300..7d0a648c107 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/query-data/complex-type.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/query-data/complex-type.md @@ -24,10 +24,8 @@ specific language governing permissions and limitations under the License. 
--> - Doris 支持 Array,Map,Struct,JSON 等复杂类型。 Doris 提供了针对以上复杂类型的各类函数。 -详细的函数支持,请查看 SQL 手册 / SQL 函数 下的 [数组函数](../sql-manual/sql-functions/array-functions/array)、[Struct 函数](../sql-manual/sql-functions/struct-functions/struct) 和 [JSON 函数](../sql-manual/sql-functions/json-functions/json-parse),Map 函数支持情况,直接查看 SQL 手册 / 数据类型 / [MAP](../sql-manual/sql-data-types/semi-structured/MAP)。 - +详细的函数支持,请查看 SQL 手册 / SQL 函数 下的 [Array 函数](../sql-manual/sql-functions/array-functions/array)、[Map 函数](../sql-manual/sql-data-types/semi-structured/MAP) 、[Struct 函数](../sql-manual/sql-functions/struct-functions/struct) 和 [JSON 函数](../sql-manual/sql-functions/json-functions/json-parse)。 diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/query-data/cte.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/query-data/cte.md index e78a1c09a5c..da6810711c2 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/query-data/cte.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/query-data/cte.md @@ -30,12 +30,14 @@ under the License. 
要指定公用表表达式,请使用 `WITH` 具有一个或多个逗号分隔子句的子句。每个子条款都提供一个子查询,用于生成结果集,并将名称与子查询相关联。 -在包含该 `WITH`子句 的语句中,可以引用每个 CTE 名称以访问相应的 CTE 结果集。CTE 名称可以在其他 CTE 中引用,从而可以基于其他 CTE 定义 CTE。 +Doris 支持嵌套 CTE。在包含该 `WITH`子句 的语句中,可以引用每个 CTE 名称以访问相应的 CTE 结果集。CTE 名称可以在其他 CTE 中引用,从而可以基于其他 CTE 定义 CTE。 -Doris **不支持** 递归 CTE。 +Doris **不支持** 递归 CTE。有关递归 CTE 的详细解释,可以参考 [MySQL 递归 CTE 手册](https://dev.mysql.com/doc/refman/8.4/en/with.html#common-table-expressions-recursive) ## 示例 +### 简单示例 + 下面的示例定义名为的 CTE `cte1` 和 `cte2` 中 `WITH` 子句,并且是指在它们的顶层 `SELECT` 下面的 `WITH` 子句: ```sql @@ -45,3 +47,23 @@ WITH SELECT b,d FROM cte1 JOIN cte2 WHERE cte1.a = cte2.c; ``` + +### 嵌套 CTE + +```sql +WITH + cte1 AS (SELECT a, b FROM table1), + cte2 AS (SELECT c, d FROM cte1) +SELECT b, d FROM cte1 JOIN cte2 +WHERE cte1.a = cte2.c; +``` + +### 递归 CTE (Doris 不支持) + +```sql +WITH r_cte AS ( + SELECT 1 AS user_id, 2 as manager_id + UNION ALL + SELECT user_id, manager_id FROM r_cte INNER JOIN (SELECT 1 AS user_id, 2 as manager_id) t ON r_cte.manager_id = t.user_id +) +SELECT * FROM r_cte diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/query-data/join.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/query-data/join.md index 5f6eff7039c..8a018314793 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/query-data/join.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/query-data/join.md @@ -427,20 +427,16 @@ Bucket Shuffle Join 和 Colocate Join 在应用时对数据分布和 JOIN 条件 2. 包含分桶列的等值条件:等值 Join 条件中须包含两张表的分桶列,当左表的分桶列作为等值 Join 条件时,更有可能被规划为 Bucket Shuffle Join。 -3. 数据类型一致性:由于不同的数据类型的 hash 值计算结果不同,左表的分桶列与右表的等值 Join 列的数据类型必须一致,否则将无法进行对应的规划。 +3. 表类型限制:Bucket Shuffle Join 仅适用于 Doris 原生的 OLAP 表。对于 ODBC、MySQL、ES 等外部表,当它们作为左表时,Bucket Shuffle Join 无法生效。 -4. 表类型限制:Bucket Shuffle Join 仅适用于 Doris 原生的 OLAP 表。对于 ODBC、MySQL、ES 等外部表,当它们作为左表时,Bucket Shuffle Join 无法生效。 - -5. 单分区要求:对于分区表,由于每个分区的数据分布可能不同,Bucket Shuffle Join 仅在左表为单分区时保证有效。因此在执行 SQL 时,应尽可能使用 `WHERE` 条件来启用分区裁剪策略。 +4. 
单分区要求:对于分区表,由于每个分区的数据分布可能不同,Bucket Shuffle Join 仅在左表为单分区时保证有效。因此在执行 SQL 时,应尽可能使用 `WHERE` 条件来启用分区裁剪策略。 ### Colocate Join 的限制 在直接扫描两张物理表时,Colocate Join 相较于 Bucket Shuffle Join 具有更严格的限制条件,除了满足 Bucket Shuffle Join 的所有条件外,还需满足以下要求: -1. 不仅分桶列的类型必须一致,分桶的数量也必须相同,以确保数据分布的一致性。 - -2. 表的副本数必须保持一致。 +1. 分桶列的类型和分桶数量必须一致,以确保数据分布的一致性。 -3. 需要显式指定 Colocation Group,只有处于相同 Colocation Group 的表才能进行 Colocate Join。 +2. 需要显式指定 Colocation Group,只有处于相同 Colocation Group 的表才能进行 Colocate Join。 -4. 在进行副本修复或副本均衡等操作时,Colocation Group 可能处于 Unstable 状态,此时 Colocate Join 将退化为普通的 Join 操作。 \ No newline at end of file +3. 在进行副本修复或副本均衡等操作时,Colocation Group 可能处于 Unstable 状态,此时 Colocate Join 将退化为普通的 Join 操作。 \ No newline at end of file diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/query-data/udf/alias-function.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/query-data/udf/alias-function.md index 73df67ec714..66df3d84ca8 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/query-data/udf/alias-function.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/query-data/udf/alias-function.md @@ -70,7 +70,3 @@ CREATE ALIAS FUNCTION func(INT, INT) WITH PARAMETER(foo, bar) AS foo + bar; ### 参数要求 当前别名函数不支持变长参数,且至少有一个参数。 - -## 更多内容 - -除别名函数外,Doris 支持 Java UDF、UDTF 功能,详细信息可参考文档 [Java UDF](../../query-data/udf/java-user-defined-function)。 \ No newline at end of file diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/query-data/window-function.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/query-data/window-function.md index b9efd2e8d5a..a74ab33daf4 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/query-data/window-function.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/query-data/window-function.md @@ -24,7 +24,7 @@ specific language governing permissions and limitations under the License. 
--> -分析函数,也称为窗口函数,是一种在 SQL 查询中对数据集中的行进行复杂计算的函数。窗口函数的特点在于,它们不会减少查询结果的行数,而是为每一行增加一个新的计算结果。窗口函数适用于多种分析场景,如计算运行总和、排名以及移动平均等。 +分析函数,也称为窗口函数,是一种在 SQL 查询中对数据集中的行进行复杂计算的函数。窗口函数的特点在于,它们不会减少查询结果的行数,而是为每一行增加一个新的计算结果。窗口函数适用于多种分析场景,如计算滚动合计、排名以及移动平均等。 下面是一个使用窗口函数计算每个商店的前后三天的销售移动平均值的例子: @@ -232,9 +232,9 @@ FROM 在这个例子中,学生按班级进行分区,然后在每个班级内按成绩分成 3 个组。每个组的学生数量尽可能均匀。 -## 分析函数 +## 聚合函数 -### 使用分析函数 SUM 计算累计值 +### 使用聚合函数 SUM 计算累计值 示例如下: @@ -294,9 +294,9 @@ GROUP BY 24 rows in set (0.13 sec) ``` -在此示例中,分析函数 SUM 为每一行定义一个窗口,该窗口从分区的开头(UNBOUNDED PRECEDING)开始,默认在当前行结束。在此示例中,需要嵌套使用 SUM,因为需要对本身就是 SUM 的结果执行 SUM。嵌套聚合在分析聚合函数中高频使用。 +在此示例中,聚合函数 SUM 为每一行定义一个窗口,该窗口从分区的开头(UNBOUNDED PRECEDING)开始,默认在当前行结束。在此示例中,需要嵌套使用 SUM,因为需要对本身就是 SUM 的结果执行 SUM。嵌套聚合在分析聚合函数中高频使用。 -### 使用分析函数 AVG 计算移动平均值 +### 使用聚合函数 AVG 计算移动平均值 示例如下: @@ -571,6 +571,7 @@ FROM 3 rows in set (0.03 sec) ``` +了解更多有关分析函数信息,可以参考 Oracle 官网文档 [SQL for Analysis and Reporting](https://docs.oracle.com/en/database/oracle/oracle-database/23/dwhsg/sql-analysis-reporting-data-warehouses.html#GUID-20EFBF1E-F79D-4E4A-906C-6E496EECA684) ## 附录 @@ -724,4 +725,4 @@ curl --location-trusted \ http://127.0.0.1:8030/api/doc_tpcds/customer_address/_stream_load ``` -数据文件``item_1_10.dat``,``date_dim_1_10.dat``,``store_sales.csv``,``customer_address_1_10.dat``可以[点击链接](https://cdn.selectdb.com/static/doc_ddl_dir_d27a752a7b.tar)下载。 \ No newline at end of file +数据文件``item_1_10.dat``,``date_dim_1_10.dat``,``store_sales.csv``,``customer_address_1_10.dat``可以[点击链接](https://cdn.selectdb.com/static/doc_ddl_dir_d27a752a7b.tar)下载。 diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/query-data/complex-type.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/query-data/complex-type.md index 8f4d4ea1300..7d0a648c107 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/query-data/complex-type.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/query-data/complex-type.md @@ -24,10 +24,8 @@ specific language governing permissions and 
limitations under the License. --> - Doris 支持 Array,Map,Struct,JSON 等复杂类型。 Doris 提供了针对以上复杂类型的各类函数。 -详细的函数支持,请查看 SQL 手册 / SQL 函数 下的 [数组函数](../sql-manual/sql-functions/array-functions/array)、[Struct 函数](../sql-manual/sql-functions/struct-functions/struct) 和 [JSON 函数](../sql-manual/sql-functions/json-functions/json-parse),Map 函数支持情况,直接查看 SQL 手册 / 数据类型 / [MAP](../sql-manual/sql-data-types/semi-structured/MAP)。 - +详细的函数支持,请查看 SQL 手册 / SQL 函数 下的 [Array 函数](../sql-manual/sql-functions/array-functions/array)、[Map 函数](../sql-manual/sql-data-types/semi-structured/MAP) 、[Struct 函数](../sql-manual/sql-functions/struct-functions/struct) 和 [JSON 函数](../sql-manual/sql-functions/json-functions/json-parse)。 diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/query-data/cte.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/query-data/cte.md index e78a1c09a5c..da6810711c2 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/query-data/cte.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/query-data/cte.md @@ -30,12 +30,14 @@ under the License. 
要指定公用表表达式,请使用 `WITH` 具有一个或多个逗号分隔子句的子句。每个子条款都提供一个子查询,用于生成结果集,并将名称与子查询相关联。 -在包含该 `WITH`子句 的语句中,可以引用每个 CTE 名称以访问相应的 CTE 结果集。CTE 名称可以在其他 CTE 中引用,从而可以基于其他 CTE 定义 CTE。 +Doris 支持嵌套 CTE。在包含该 `WITH`子句 的语句中,可以引用每个 CTE 名称以访问相应的 CTE 结果集。CTE 名称可以在其他 CTE 中引用,从而可以基于其他 CTE 定义 CTE。 -Doris **不支持** 递归 CTE。 +Doris **不支持** 递归 CTE。有关递归 CTE 的详细解释,可以参考 [MySQL 递归 CTE 手册](https://dev.mysql.com/doc/refman/8.4/en/with.html#common-table-expressions-recursive) ## 示例 +### 简单示例 + 下面的示例定义名为的 CTE `cte1` 和 `cte2` 中 `WITH` 子句,并且是指在它们的顶层 `SELECT` 下面的 `WITH` 子句: ```sql @@ -45,3 +47,23 @@ WITH SELECT b,d FROM cte1 JOIN cte2 WHERE cte1.a = cte2.c; ``` + +### 嵌套 CTE + +```sql +WITH + cte1 AS (SELECT a, b FROM table1), + cte2 AS (SELECT c, d FROM cte1) +SELECT b, d FROM cte1 JOIN cte2 +WHERE cte1.a = cte2.c; +``` + +### 递归 CTE (Doris 不支持) + +```sql +WITH r_cte AS ( + SELECT 1 AS user_id, 2 as manager_id + UNION ALL + SELECT user_id, manager_id FROM r_cte INNER JOIN (SELECT 1 AS user_id, 2 as manager_id) t ON r_cte.manager_id = t.user_id +) +SELECT * FROM r_cte diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/query-data/join.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/query-data/join.md index 5f6eff7039c..8a018314793 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/query-data/join.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/query-data/join.md @@ -427,20 +427,16 @@ Bucket Shuffle Join 和 Colocate Join 在应用时对数据分布和 JOIN 条件 2. 包含分桶列的等值条件:等值 Join 条件中须包含两张表的分桶列,当左表的分桶列作为等值 Join 条件时,更有可能被规划为 Bucket Shuffle Join。 -3. 数据类型一致性:由于不同的数据类型的 hash 值计算结果不同,左表的分桶列与右表的等值 Join 列的数据类型必须一致,否则将无法进行对应的规划。 +3. 表类型限制:Bucket Shuffle Join 仅适用于 Doris 原生的 OLAP 表。对于 ODBC、MySQL、ES 等外部表,当它们作为左表时,Bucket Shuffle Join 无法生效。 -4. 表类型限制:Bucket Shuffle Join 仅适用于 Doris 原生的 OLAP 表。对于 ODBC、MySQL、ES 等外部表,当它们作为左表时,Bucket Shuffle Join 无法生效。 - -5. 单分区要求:对于分区表,由于每个分区的数据分布可能不同,Bucket Shuffle Join 仅在左表为单分区时保证有效。因此在执行 SQL 时,应尽可能使用 `WHERE` 条件来启用分区裁剪策略。 +4. 
单分区要求:对于分区表,由于每个分区的数据分布可能不同,Bucket Shuffle Join 仅在左表为单分区时保证有效。因此在执行 SQL 时,应尽可能使用 `WHERE` 条件来启用分区裁剪策略。 ### Colocate Join 的限制 在直接扫描两张物理表时,Colocate Join 相较于 Bucket Shuffle Join 具有更严格的限制条件,除了满足 Bucket Shuffle Join 的所有条件外,还需满足以下要求: -1. 不仅分桶列的类型必须一致,分桶的数量也必须相同,以确保数据分布的一致性。 - -2. 表的副本数必须保持一致。 +1. 分桶列的类型和分桶数量必须一致,以确保数据分布的一致性。 -3. 需要显式指定 Colocation Group,只有处于相同 Colocation Group 的表才能进行 Colocate Join。 +2. 需要显式指定 Colocation Group,只有处于相同 Colocation Group 的表才能进行 Colocate Join。 -4. 在进行副本修复或副本均衡等操作时,Colocation Group 可能处于 Unstable 状态,此时 Colocate Join 将退化为普通的 Join 操作。 \ No newline at end of file +3. 在进行副本修复或副本均衡等操作时,Colocation Group 可能处于 Unstable 状态,此时 Colocate Join 将退化为普通的 Join 操作。 \ No newline at end of file diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/query-data/udf/alias-function.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/query-data/udf/alias-function.md index 73df67ec714..66df3d84ca8 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/query-data/udf/alias-function.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/query-data/udf/alias-function.md @@ -70,7 +70,3 @@ CREATE ALIAS FUNCTION func(INT, INT) WITH PARAMETER(foo, bar) AS foo + bar; ### 参数要求 当前别名函数不支持变长参数,且至少有一个参数。 - -## 更多内容 - -除别名函数外,Doris 支持 Java UDF、UDTF 功能,详细信息可参考文档 [Java UDF](../../query-data/udf/java-user-defined-function)。 \ No newline at end of file diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/query-data/window-function.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/query-data/window-function.md index 55f71093d8c..a74ab33daf4 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/query-data/window-function.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/query-data/window-function.md @@ -24,7 +24,7 @@ specific language governing permissions and limitations under the License. 
--> -分析函数,也称为窗口函数,是一种在 SQL 查询中对数据集中的行进行复杂计算的函数。窗口函数的特点在于,它们不会减少查询结果的行数,而是为每一行增加一个新的计算结果。窗口函数适用于多种分析场景,如计算运行总和、排名以及移动平均等。 +分析函数,也称为窗口函数,是一种在 SQL 查询中对数据集中的行进行复杂计算的函数。窗口函数的特点在于,它们不会减少查询结果的行数,而是为每一行增加一个新的计算结果。窗口函数适用于多种分析场景,如计算滚动合计、排名以及移动平均等。 下面是一个使用窗口函数计算每个商店的前后三天的销售移动平均值的例子: @@ -232,9 +232,9 @@ FROM 在这个例子中,学生按班级进行分区,然后在每个班级内按成绩分成 3 个组。每个组的学生数量尽可能均匀。 -## 分析函数 +## 聚合函数 -### 使用分析函数 SUM 计算累计值 +### 使用聚合函数 SUM 计算累计值 示例如下: @@ -294,9 +294,9 @@ GROUP BY 24 rows in set (0.13 sec) ``` -在此示例中,分析函数 SUM 为每一行定义一个窗口,该窗口从分区的开头(UNBOUNDED PRECEDING)开始,默认在当前行结束。在此示例中,需要嵌套使用 SUM,因为需要对本身就是 SUM 的结果执行 SUM。嵌套聚合在分析聚合函数中高频使用。 +在此示例中,聚合函数 SUM 为每一行定义一个窗口,该窗口从分区的开头(UNBOUNDED PRECEDING)开始,默认在当前行结束。在此示例中,需要嵌套使用 SUM,因为需要对本身就是 SUM 的结果执行 SUM。嵌套聚合在分析聚合函数中高频使用。 -### 使用分析函数 AVG 计算移动平均值 +### 使用聚合函数 AVG 计算移动平均值 示例如下: @@ -725,4 +725,4 @@ curl --location-trusted \ http://127.0.0.1:8030/api/doc_tpcds/customer_address/_stream_load ``` -数据文件``item_1_10.dat``,``date_dim_1_10.dat``,``store_sales.csv``,``customer_address_1_10.dat``可以[点击链接](https://cdn.selectdb.com/static/doc_ddl_dir_d27a752a7b.tar)下载。 \ No newline at end of file +数据文件``item_1_10.dat``,``date_dim_1_10.dat``,``store_sales.csv``,``customer_address_1_10.dat``可以[点击链接](https://cdn.selectdb.com/static/doc_ddl_dir_d27a752a7b.tar)下载。 diff --git a/versioned_docs/version-2.1/query-data/complex-type.md b/versioned_docs/version-2.1/query-data/complex-type.md index 0dbf7496c94..9db1b523fd8 100644 --- a/versioned_docs/version-2.1/query-data/complex-type.md +++ b/versioned_docs/version-2.1/query-data/complex-type.md @@ -28,7 +28,4 @@ Doris supports complex types such as Array, Map, Struct, and JSON. Doris provides various functions specifically designed for these complex types. 
- -For detailed information on the supported functions, please refer to the SQL manual- [Array functions](../sql-manual/sql-functions/array-functions/array), [SQL manual-Struct functions](../sql-manual/sql-functions/struct-functions/struct), and [SQL manual-JSON functions](../sql-manual/sql-functions/json-functions/json-parse). - -To check the support for Map functions, you can refer to the SQL manual docs under the section for the [MAP](../sql-manual/sql-data-types/semi-structured/MAP) data type. \ No newline at end of file +For detailed information on the supported functions, please refer to the [SQL manual - Array functions](../sql-manual/sql-functions/array-functions/array), [SQL manual - Map functions](../sql-manual/sql-data-types/semi-structured/MAP), [SQL manual - Struct functions](../sql-manual/sql-functions/struct-functions/struct), and [SQL manual - JSON functions](../sql-manual/sql-functions/json-functions/json-parse). diff --git a/versioned_docs/version-2.1/query-data/cte.md b/versioned_docs/version-2.1/query-data/cte.md index 941d74639a1..04299e4afec 100644 --- a/versioned_docs/version-2.1/query-data/cte.md +++ b/versioned_docs/version-2.1/query-data/cte.md @@ -30,12 +30,14 @@ Common Table Expression (CTE) define a temporary result set that can be referenc To specify a CTE, use the `WITH` clause with one or more comma-separated clauses. Each clause provides a subquery that generates a result set and associates a name with the subquery. -Within the statement that contains the `WITH` clause, you can reference each CTE name to access the corresponding CTE result set. CTE names can be referenced in other CTE, allowing you to define CTE based on other CTE. +Doris supports nested CTEs. Within the statement that contains the `WITH` clause, you can reference each CTE name to access the corresponding CTE result set. CTE names can be referenced in other CTEs, allowing you to define CTEs based on other CTEs. -Doris **DO NOT** support recursive CTE. 
+Doris **DOES NOT** support recursive CTEs. For more information, please refer to the MySQL manual on [recursive CTEs](https://dev.mysql.com/doc/refman/8.4/en/with.html#common-table-expressions-recursive). ## Example +### Simple CTE + The following example defines CTE named cte1 and cte2 within the WITH clause and refers to them in the top-level SELECT below the WITH clause: ```sql @@ -45,3 +47,23 @@ WITH SELECT b, d FROM cte1 JOIN cte2 WHERE cte1.a = cte2.c; ``` + +### Nested CTE + +```sql +WITH + cte1 AS (SELECT a, b FROM table1), + cte2 AS (SELECT c, d FROM cte1) +SELECT b, d FROM cte1 JOIN cte2 +WHERE cte1.a = cte2.c; +``` + +### Recursive CTE (Not Supported) + +```sql +WITH r_cte AS ( + SELECT 1 AS user_id, 2 AS manager_id + UNION ALL + SELECT user_id, manager_id FROM r_cte INNER JOIN (SELECT 1 AS user_id, 2 AS manager_id) t ON r_cte.manager_id = t.user_id +) +SELECT * FROM r_cte; diff --git a/versioned_docs/version-2.1/query-data/join.md b/versioned_docs/version-2.1/query-data/join.md index 03bb5928356..39f6b9a2f26 100644 --- a/versioned_docs/version-2.1/query-data/join.md +++ b/versioned_docs/version-2.1/query-data/join.md @@ -59,9 +59,9 @@ Doris supports two implementation methods for JOIN: **Hash Join** and **Nested L As a distributed MPP database, Apache Doris requires data shuffling during the Hash Join process to ensure the correctness of the JOIN results. Below are several data shuffling methods: -**Broadcast Join** As illustrated, the Broadcast Join process involves sending all data from the right table to all nodes participating in the JOIN computation, including the nodes scanning the left table data, while the left table data remains stationary. In this process, each node receives a complete copy of the right table's data (with a total volume of T(R)) to ensure that all nodes have the necessary data to perform the JOIN operation. 
+**Broadcast Join** As illustrated, the Broadcast Join process involves sending all data from the right table to all nodes participating in the JOIN computation, including the nodes scanning the left table's data, while the left table's data remains stationary. In this process, each node receives a complete copy of the right table's data (with a total volume of T(R)) to ensure that all nodes have the necessary data to perform the JOIN operation. -This method is suitable for various scenarios but is not applicable for RIGHT OUTER, RIGHT ANTI, and RIGHT SEMI types of Hash Join. Its network overhead is calculated as the number of JOIN nodes N multiplied by the volume of right table data T(R). +This method is suitable for various scenarios but is not applicable for RIGHT OUTER, RIGHT ANTI, and RIGHT SEMI types of Hash Join. Its network overhead is calculated as the number of JOIN nodes N multiplied by the volume of the right table's data T(R).  @@ -69,13 +69,13 @@ This method is suitable for various scenarios but is not applicable for RIGHT OU This method computes hash values based on the JOIN conditions and performs bucketing. Specifically, the data from both the left and right tables is partitioned according to the hash values calculated from the JOIN conditions, and these partitioned data sets are then sent to the corresponding partition nodes (as illustrated). -The network overhead of this method mainly includes two parts: the cost of transferring the left table data T(S) and the cost of transferring the right table data T(R). This method only supports Hash Join operations because it relies on the JOIN conditions to perform data bucketing. +The network overhead of this method mainly includes two parts: the cost of transferring the left table's data T(S) and the cost of transferring the right table's data T(R). This method only supports Hash Join operations because it relies on the JOIN conditions to perform data bucketing.  
### Bucket Shuffle Join -When the JOIN condition includes the bucketed column from the left table, the left table's data remains unchanged while the right table's data is distributed to the left table's nodes for the JOIN, reducing network overhead. +When the JOIN condition includes the bucketed column from the left table, the left table's data location remains unchanged while the right table's data is distributed to the left table's nodes for the JOIN, reducing network overhead. When one side of the table involved in the JOIN operation has its data already hash-distributed according to the JOIN condition column, users can choose to keep this side's data location unchanged while distributing the other side's data based on the same JOIN condition column and hash distribution. (The term "table" here refers not only to physically stored tables but also to the output results of any operators in SQL queries. Users can flexibly choose to keep either the left or right t [...] @@ -418,20 +418,16 @@ When directly scanning two physical tables for a Bucket Shuffle Join, the follow 2. **Inclusion of bucketed columns in equality conditions**: The equality JOIN condition must include the bucketed columns from both tables. When the left table's bucketed column is used as the equality JOIN condition, it is more likely to be planned as a Bucket Shuffle Join. -3. **Data type consistency**: Since the hash value computation results differ for different data types, the data types of the left table's bucketed column and the right table's equality JOIN column must match; otherwise, the corresponding planning cannot occur. +3. **Table type restrictions**: Bucket Shuffle Join is only applicable to native OLAP tables in Doris. For external tables such as ODBC, MySQL, and ES, Bucket Shuffle Join cannot be effective when they are used as the left table. -4. **Table type restrictions**: Bucket Shuffle Join is only applicable to native OLAP tables in Doris. 
For external tables such as ODBC, MySQL, and ES, Bucket Shuffle Join cannot be effective when they are used as the left table. - -5. **Single Partition Requirement**: For partitioned tables, since the data distribution may differ across partitions, Bucket Shuffle Join is only guaranteed to be effective when the left table is a single partition. Therefore, when executing SQL, it is advisable to use `WHERE` conditions to enable partition pruning strategies whenever possible. +4. **Single Partition Requirement**: For partitioned tables, since the data distribution may differ across partitions, Bucket Shuffle Join is only guaranteed to be effective when the left table is a single partition. Therefore, when executing SQL, it is advisable to use `WHERE` conditions to enable partition pruning strategies whenever possible. ### Limitations of Colocate Join When directly scanning two physical tables, Colocate Join has stricter limitations compared to Bucket Shuffle Join. In addition to meeting all the conditions for Bucket Shuffle Join, the following requirements must also be satisfied: -1. **Consistency of bucket column types and counts**: Not only must the types of the bucketed columns match, but the number of buckets must also be the same to ensure data distribution consistency. - -2. **Consistency of table replicas**: The number of replicas for the tables must be consistent. +1. **Identical bucket column types and counts**: Not only must the types of the bucketed columns match, but the number of buckets must also be the same to ensure data distribution consistency. -3. **Explicit specification of Colocation Group**: A Colocation Group must be explicitly specified; only tables within the same Colocation Group can participate in a Colocate Join. +2. **Explicit specification of Colocation Group**: A Colocation Group must be explicitly specified; only tables within the same Colocation Group can participate in a Colocate Join. -4. 
**Unstable state during replica repair or balancing**: During operations such as replica repair or balancing, the Colocation Group may be in an unstable state. In this case, the Colocate Join will degrade to a regular Join operation. \ No newline at end of file +3. **Unstable state during replica repair or balancing**: During operations such as replica repair or balancing, the Colocation Group may be in an unstable state. In this case, the Colocate Join will degrade to a regular Join operation. \ No newline at end of file diff --git a/versioned_docs/version-2.1/query-data/subquery.md b/versioned_docs/version-2.1/query-data/subquery.md index 9e46b6eae3a..c70e55d4cd0 100644 --- a/versioned_docs/version-2.1/query-data/subquery.md +++ b/versioned_docs/version-2.1/query-data/subquery.md @@ -84,11 +84,11 @@ select * from t1 where t1.c1 in (select t2.c1 from t2); Subqueries can be classified into correlated subqueries and non-correlated subqueries based on whether they reference columns from the outer query: -**3. Non-correlated Subquery** +**1. Non-correlated Subquery** A subquery that does not reference any columns from the outer query. Non-correlated subqueries can often be computed independently and return the corresponding results once for the outer query to use. -**4. Correlated Subquery** +**2. Correlated Subquery** A subquery that references one or more columns from the main query (also known as the outer query) (the referenced outer columns are often in the WHERE condition of the subquery). Correlated subqueries can often be seen as a filtering operation on the externally associated table, as for each row of data in the outer table, the subquery is computed and returns the corresponding result. @@ -302,7 +302,7 @@ The difference between `Mark Join` and a regular `left semi join` is that a regu With this flag, the `where` filtering condition can be rewritten as `where mark_join_flag or t1.c1 > 0` to obtain the correct results. 
-## Common Issues +## Usage notes Since the output of a scalar subquery must be a single value, Doris adopts different processing methods for correlated and non-correlated scalar subqueries. diff --git a/versioned_docs/version-2.1/query-data/udf/alias-function.md b/versioned_docs/version-2.1/query-data/udf/alias-function.md index acebd4544ea..16285734cbf 100644 --- a/versioned_docs/version-2.1/query-data/udf/alias-function.md +++ b/versioned_docs/version-2.1/query-data/udf/alias-function.md @@ -69,7 +69,3 @@ CREATE ALIAS FUNCTION func(INT, INT) WITH PARAMETER(foo, bar) AS foo + bar; ### Parameter Requirements Currently, alias functions do not support variable-length parameters and must have at least one parameter. - -## More Content - -Apart from alias functions, Doris supports Java UDF and UDTF functionalities. For detailed information, please refer to the documentation [Java UDF](../../query-data/udf/java-user-defined-function). \ No newline at end of file diff --git a/versioned_docs/version-3.0/query-data/complex-type.md b/versioned_docs/version-3.0/query-data/complex-type.md index 0dbf7496c94..9db1b523fd8 100644 --- a/versioned_docs/version-3.0/query-data/complex-type.md +++ b/versioned_docs/version-3.0/query-data/complex-type.md @@ -28,7 +28,4 @@ Doris supports complex types such as Array, Map, Struct, and JSON. Doris provides various functions specifically designed for these complex types. - -For detailed information on the supported functions, please refer to the SQL manual- [Array functions](../sql-manual/sql-functions/array-functions/array), [SQL manual-Struct functions](../sql-manual/sql-functions/struct-functions/struct), and [SQL manual-JSON functions](../sql-manual/sql-functions/json-functions/json-parse). - -To check the support for Map functions, you can refer to the SQL manual docs under the section for the [MAP](../sql-manual/sql-data-types/semi-structured/MAP) data type. 
\ No newline at end of file +For detailed information on the supported functions, please refer to the [SQL manual - Array functions](../sql-manual/sql-functions/array-functions/array), [SQL manual - Map functions](../sql-manual/sql-data-types/semi-structured/MAP), [SQL manual - Struct functions](../sql-manual/sql-functions/struct-functions/struct), and [SQL manual - JSON functions](../sql-manual/sql-functions/json-functions/json-parse). diff --git a/versioned_docs/version-3.0/query-data/cte.md b/versioned_docs/version-3.0/query-data/cte.md index 941d74639a1..04299e4afec 100644 --- a/versioned_docs/version-3.0/query-data/cte.md +++ b/versioned_docs/version-3.0/query-data/cte.md @@ -30,12 +30,14 @@ Common Table Expression (CTE) define a temporary result set that can be referenc To specify a CTE, use the `WITH` clause with one or more comma-separated clauses. Each clause provides a subquery that generates a result set and associates a name with the subquery. -Within the statement that contains the `WITH` clause, you can reference each CTE name to access the corresponding CTE result set. CTE names can be referenced in other CTE, allowing you to define CTE based on other CTE. +Doris supports nested CTEs. Within the statement that contains the `WITH` clause, you can reference each CTE name to access the corresponding CTE result set. CTE names can be referenced in other CTEs, allowing you to define CTEs based on other CTEs. -Doris **DO NOT** support recursive CTE. +Doris **DOES NOT** support recursive CTEs. 
For more information, please refer to the MySQL manual on [recursive CTEs](https://dev.mysql.com/doc/refman/8.4/en/with.html#common-table-expressions-recursive). ## Example +### Simple CTE + The following example defines CTE named cte1 and cte2 within the WITH clause and refers to them in the top-level SELECT below the WITH clause: ```sql @@ -45,3 +47,23 @@ WITH SELECT b, d FROM cte1 JOIN cte2 WHERE cte1.a = cte2.c; ``` + +### Nested CTE + +```sql +WITH + cte1 AS (SELECT a, b FROM table1), + cte2 AS (SELECT c, d FROM cte1) +SELECT b, d FROM cte1 JOIN cte2 +WHERE cte1.a = cte2.c; +``` + +### Recursive CTE (Not Supported) + +```sql +WITH r_cte AS ( + SELECT 1 AS user_id, 2 AS manager_id + UNION ALL + SELECT user_id, manager_id FROM r_cte INNER JOIN (SELECT 1 AS user_id, 2 AS manager_id) t ON r_cte.manager_id = t.user_id +) +SELECT * FROM r_cte; diff --git a/versioned_docs/version-3.0/query-data/join.md b/versioned_docs/version-3.0/query-data/join.md index 03bb5928356..39f6b9a2f26 100644 --- a/versioned_docs/version-3.0/query-data/join.md +++ b/versioned_docs/version-3.0/query-data/join.md @@ -59,9 +59,9 @@ Doris supports two implementation methods for JOIN: **Hash Join** and **Nested L As a distributed MPP database, Apache Doris requires data shuffling during the Hash Join process to ensure the correctness of the JOIN results. Below are several data shuffling methods: -**Broadcast Join** As illustrated, the Broadcast Join process involves sending all data from the right table to all nodes participating in the JOIN computation, including the nodes scanning the left table data, while the left table data remains stationary. In this process, each node receives a complete copy of the right table's data (with a total volume of T(R)) to ensure that all nodes have the necessary data to perform the JOIN operation. 
+**Broadcast Join** As illustrated, the Broadcast Join process involves sending all data from the right table to all nodes participating in the JOIN computation, including the nodes scanning the left table's data, while the left table's data remains stationary. In this process, each node receives a complete copy of the right table's data (with a total volume of T(R)) to ensure that all nodes have the necessary data to perform the JOIN operation. -This method is suitable for various scenarios but is not applicable for RIGHT OUTER, RIGHT ANTI, and RIGHT SEMI types of Hash Join. Its network overhead is calculated as the number of JOIN nodes N multiplied by the volume of right table data T(R). +This method is suitable for various scenarios but is not applicable for RIGHT OUTER, RIGHT ANTI, and RIGHT SEMI types of Hash Join. Its network overhead is calculated as the number of JOIN nodes N multiplied by the volume of the right table's data T(R).  @@ -69,13 +69,13 @@ This method is suitable for various scenarios but is not applicable for RIGHT OU This method computes hash values based on the JOIN conditions and performs bucketing. Specifically, the data from both the left and right tables is partitioned according to the hash values calculated from the JOIN conditions, and these partitioned data sets are then sent to the corresponding partition nodes (as illustrated). -The network overhead of this method mainly includes two parts: the cost of transferring the left table data T(S) and the cost of transferring the right table data T(R). This method only supports Hash Join operations because it relies on the JOIN conditions to perform data bucketing. +The network overhead of this method mainly includes two parts: the cost of transferring the left table's data T(S) and the cost of transferring the right table's data T(R). This method only supports Hash Join operations because it relies on the JOIN conditions to perform data bucketing.  
### Bucket Shuffle Join -When the JOIN condition includes the bucketed column from the left table, the left table's data remains unchanged while the right table's data is distributed to the left table's nodes for the JOIN, reducing network overhead. +When the JOIN condition includes the bucketed column from the left table, the left table's data location remains unchanged while the right table's data is distributed to the left table's nodes for the JOIN, reducing network overhead. When one side of the table involved in the JOIN operation has its data already hash-distributed according to the JOIN condition column, users can choose to keep this side's data location unchanged while distributing the other side's data based on the same JOIN condition column and hash distribution. (The term "table" here refers not only to physically stored tables but also to the output results of any operators in SQL queries. Users can flexibly choose to keep either the left or right t [...] @@ -418,20 +418,16 @@ When directly scanning two physical tables for a Bucket Shuffle Join, the follow 2. **Inclusion of bucketed columns in equality conditions**: The equality JOIN condition must include the bucketed columns from both tables. When the left table's bucketed column is used as the equality JOIN condition, it is more likely to be planned as a Bucket Shuffle Join. -3. **Data type consistency**: Since the hash value computation results differ for different data types, the data types of the left table's bucketed column and the right table's equality JOIN column must match; otherwise, the corresponding planning cannot occur. +3. **Table type restrictions**: Bucket Shuffle Join is only applicable to native OLAP tables in Doris. For external tables such as ODBC, MySQL, and ES, Bucket Shuffle Join cannot be effective when they are used as the left table. -4. **Table type restrictions**: Bucket Shuffle Join is only applicable to native OLAP tables in Doris. 
For external tables such as ODBC, MySQL, and ES, Bucket Shuffle Join cannot be effective when they are used as the left table. - -5. **Single Partition Requirement**: For partitioned tables, since the data distribution may differ across partitions, Bucket Shuffle Join is only guaranteed to be effective when the left table is a single partition. Therefore, when executing SQL, it is advisable to use `WHERE` conditions to enable partition pruning strategies whenever possible. +4. **Single Partition Requirement**: For partitioned tables, since the data distribution may differ across partitions, Bucket Shuffle Join is only guaranteed to be effective when the left table is a single partition. Therefore, when executing SQL, it is advisable to use `WHERE` conditions to enable partition pruning strategies whenever possible. ### Limitations of Colocate Join When directly scanning two physical tables, Colocate Join has stricter limitations compared to Bucket Shuffle Join. In addition to meeting all the conditions for Bucket Shuffle Join, the following requirements must also be satisfied: -1. **Consistency of bucket column types and counts**: Not only must the types of the bucketed columns match, but the number of buckets must also be the same to ensure data distribution consistency. - -2. **Consistency of table replicas**: The number of replicas for the tables must be consistent. +1. **Identical bucket column types and counts**: Not only must the types of the bucketed columns match, but the number of buckets must also be the same to ensure data distribution consistency. -3. **Explicit specification of Colocation Group**: A Colocation Group must be explicitly specified; only tables within the same Colocation Group can participate in a Colocate Join. +2. **Explicit specification of Colocation Group**: A Colocation Group must be explicitly specified; only tables within the same Colocation Group can participate in a Colocate Join. -4. 
**Unstable state during replica repair or balancing**: During operations such as replica repair or balancing, the Colocation Group may be in an unstable state. In this case, the Colocate Join will degrade to a regular Join operation. \ No newline at end of file +3. **Unstable state during replica repair or balancing**: During operations such as replica repair or balancing, the Colocation Group may be in an unstable state. In this case, the Colocate Join will degrade to a regular Join operation. \ No newline at end of file diff --git a/versioned_docs/version-3.0/query-data/subquery.md b/versioned_docs/version-3.0/query-data/subquery.md index 78d7618a81d..ddff0b8f53d 100644 --- a/versioned_docs/version-3.0/query-data/subquery.md +++ b/versioned_docs/version-3.0/query-data/subquery.md @@ -84,11 +84,11 @@ select * from t1 where t1.c1 in (select t2.c1 from t2); Subqueries can be classified into correlated subqueries and non-correlated subqueries based on whether they reference columns from the outer query: -**3. Non-correlated Subquery** +**1. Non-correlated Subquery** A subquery that does not reference any columns from the outer query. Non-correlated subqueries can often be computed independently and return the corresponding results once for the outer query to use. -**4. Correlated Subquery** +**2. Correlated Subquery** A subquery that references one or more columns from the main query (also known as the outer query) (the referenced outer columns are often in the WHERE condition of the subquery). Correlated subqueries can often be seen as a filtering operation on the externally associated table, as for each row of data in the outer table, the subquery is computed and returns the corresponding result. @@ -302,7 +302,7 @@ The difference between `Mark Join` and a regular `left semi join` is that a regu With this flag, the `where` filtering condition can be rewritten as `where mark_join_flag or t1.c1 > 0` to obtain the correct results. 
-## Common Issues +## Usage notes Since the output of a scalar subquery must be a single value, a runtime error will be reported when the subquery returns more than one row of data. diff --git a/versioned_docs/version-3.0/query-data/udf/alias-function.md b/versioned_docs/version-3.0/query-data/udf/alias-function.md index acebd4544ea..16285734cbf 100644 --- a/versioned_docs/version-3.0/query-data/udf/alias-function.md +++ b/versioned_docs/version-3.0/query-data/udf/alias-function.md @@ -69,7 +69,3 @@ CREATE ALIAS FUNCTION func(INT, INT) WITH PARAMETER(foo, bar) AS foo + bar; ### Parameter Requirements Currently, alias functions do not support variable-length parameters and must have at least one parameter. - -## More Content - -Apart from alias functions, Doris supports Java UDF and UDTF functionalities. For detailed information, please refer to the documentation [Java UDF](../../query-data/udf/java-user-defined-function). \ No newline at end of file --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org For additional commands, e-mail: commits-h...@doris.apache.org