morrySnow commented on code in PR #2039: URL: https://github.com/apache/doris-website/pull/2039#discussion_r1953750564
########## docs/sql-manual/sql-statements/table-and-view/sync-materialized-view/CREATE-MATERIALIZED-VIEW.md: ########## @@ -0,0 +1,126 @@ +--- +{ + "title": "CREATE SYNC MATERIALIZED VIEW", + "language": "en" +} +--- + +<!-- +Licensed to the Apache Software Foundation (ASF) under one +or more contributor license agreements. See the NOTICE file +distributed with this work for additional information +regarding copyright ownership. The ASF licenses this file +to you under the Apache License, Version 2.0 (the +"License"); you may not use this file except in compliance +with the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, +software distributed under the License is distributed on an +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, either express or implied. See the License for the +specific language governing permissions and limitations +under the License. +--> + +## Description + +Statement for creating a synchronized materialized view. + +## Syntax + +```sql +CREATE MATERIALIZED VIEW <materialized_view_name> AS <query> +``` + +Where + +```sql +query + : + SELECT <select_expr> select_expr[, select_expr ...] + FROM <base_table> + GROUP BY <column_name>[, <column_name> ...] + ORDER BY <column_name>[, <column_name> ...] +``` + +## Required Parameters + +**1. <materialized_view_name>** + +> Specifies the identifier (i.e., name) of the materialized view. Since a synchronized materialized view is created based on a table, the name must be unique within the same table. +> +> The identifier must start with a letter character (if Unicode name support is enabled, it can be any character from any language) and cannot contain spaces or special characters unless the entire identifier string is enclosed in backticks (e.g., `My Object`). +> +> The identifier cannot be a reserved keyword. +> +> For more details, refer to the requirements for identifiers and reserved keywords. + +**2. <query>** + +> The query statement used to construct the materialized view, the result of which constitutes the data of the materialized view. The currently supported query format is: +> +> The syntax is consistent with the query statement syntax. +> +> - `select_expr`: All columns in the schema of the materialized view. +> - Must include at least one single column. +> - `base_table`: The name of the base table for the materialized view, a required item. +> - Must be a single table, not a subquery. +> - `group by`: The grouping columns of the materialized view, an optional item. +> - If not specified, the data will not be grouped. +> - `order by`: The sorting columns of the materialized view, an optional item. +> - The declaration order of the sorting columns must be consistent with the order of columns declared in `select_expr`. +> - If `order by` is not declared, sorting columns will be automatically supplemented according to the rules. If the materialized view is of the aggregate type, all grouping columns will be automatically added as sorting columns. If the materialized view is of the non-aggregate type, the first 36 bytes will be automatically added as sorting columns. +> - If the number of automatically supplemented sorting columns is less than 3, the first three will be used as sorting columns. If the query contains grouping columns, the sorting columns must be consistent with the grouping columns. + +## Access Control Requirements + +| Privilege | Object | Notes | +| ---------- | ------ | ------------------------------------------------------------ | +| ALTER_PRIV | Table | Requires ALTER_PRIV permission on the base table of the current materialized view | + +## Notes + +- Synchronized materialized views only support SELECT statements for a single table, supporting WHERE, GROUP BY, ORDER BY clauses, but not JOIN, HAVING, LIMIT clauses, or LATERAL VIEW. +- The SELECT list cannot contain auto-increment columns, constants, duplicate expressions, or window functions. +- If the SELECT list contains aggregate functions, the aggregate functions must be root expressions (e.g., `sum(a + 1)` is supported, but `sum(a) + 1` is not), and no other non-aggregate function expressions can follow the aggregate functions (for example, `SELECT x, sum(a)` is acceptable, but `SELECT sum(a), x` is not). +- Too many materialized views on a single table can affect the efficiency of data import: when importing data, the data of the materialized views and the Base table are updated synchronously. If there are too many materialized views on a table, it may slow down the import speed, similar to importing data into multiple tables simultaneously in a single import operation. +- When a materialized view targets the Unique Key data model, it can only change the order of columns and cannot perform aggregation. Therefore, on the Unique Key model, data cannot be coarsely aggregated by creating materialized views. + +## Example + +```sql +desc lineitem; ++-----------------+---------------+------+-------+---------+-------+ +| Field | Type | Null | Key | Default | Extra | ++-----------------+---------------+------+-------+---------+-------+ +| l_orderkey | int | No | true | NULL | | +| l_partkey | int | No | true | NULL | | +| l_suppkey | int | No | true | NULL | | +| l_linenumber | int | No | true | NULL | | +| l_quantity | decimal(15,2) | No | false | NULL | NONE | +| l_extendedprice | decimal(15,2) | No | false | NULL | NONE | +| l_discount | decimal(15,2) | No | false | NULL | NONE | +| l_tax | decimal(15,2) | No | false | NULL | NONE | +| l_returnflag | char(1) | No | false | NULL | NONE | +| l_linestatus | char(1) | No | false | NULL | NONE | +| l_shipdate | date | No | false | NULL | NONE | +| l_commitdate | date | No | false | NULL | NONE | +| l_receiptdate | date | No | false | NULL | NONE | +| l_shipinstruct | char(25) | No | false | NULL | NONE | +| l_shipmode | char(10) | No | false | NULL | NONE | +| l_comment | varchar(44) | No | false | NULL | NONE | ++-----------------+---------------+------+-------+---------+-------+ +CREATE MATERIALIZED VIEW sync_agg_mv AS +SELECT + l_shipdate, + l_partkey, + count(*), + sum(l_discount) +FROM + lineitem +GROUP BY + l_shipdate, + l_partkey; +``` Review Comment: ````suggestion ```sql desc lineitem; ``` ```text +-----------------+---------------+------+-------+---------+-------+ | Field | Type | Null | Key | Default | Extra | +-----------------+---------------+------+-------+---------+-------+ | l_orderkey | int | No | true | NULL | | | l_partkey | int | No | true | NULL | | | l_suppkey | int | No | true | NULL | | | l_linenumber | int | No | true | NULL | | | l_quantity | decimal(15,2) | No | false | NULL | NONE | | l_extendedprice | decimal(15,2) | No | false | NULL | NONE | | l_discount | decimal(15,2) | No | false | NULL | NONE | | l_tax | decimal(15,2) | No | false | NULL | NONE | | l_returnflag | char(1) | No | false | NULL | NONE | | l_linestatus | char(1) | No | false | NULL | NONE | | l_shipdate | date | No | false | NULL | NONE | | l_commitdate | date | No | false | NULL | NONE | | l_receiptdate | date | No | false | NULL | NONE | | l_shipinstruct | char(25) | No | false | NULL | NONE | | l_shipmode | char(10) | No | false | NULL | NONE | | l_comment | varchar(44) | No | false | NULL | NONE | +-----------------+---------------+------+-------+---------+-------+ ``` ```sql CREATE MATERIALIZED VIEW sync_agg_mv AS SELECT l_shipdate, l_partkey, count(*), sum(l_discount) FROM lineitem GROUP BY l_shipdate, l_partkey; ``` ```` ########## docs/sql-manual/sql-statements/table-and-view/async-materialized-view/CREATE-ASYNC-MATERIALIZED-VIEW.md: ########## @@ -0,0 +1,184 @@ +--- +{ +"title": "CREATE ASYNC MATERIALIZED VIEW", +"language": "en" +} +--- + +<!-- +Licensed to the Apache Software Foundation (ASF) under one +or more contributor license agreements. See the NOTICE file +distributed with this work for additional information +regarding copyright ownership. The ASF licenses this file +to you under the Apache License, Version 2.0 (the +"License"); you may not use this file except in compliance +with the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, +software distributed under the License is distributed on an +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, either express or implied. See the License for the +specific language governing permissions and limitations +under the License. +--> + +## Description + +Statement for creating an asynchronous materialized view. The column names and types are derived from the materialized view SQL statement. Custom column names are allowed, but column types cannot be defined. + +## Syntax + + +```sql +CREATE MATERIALIZED VIEW +[ IF NOT EXISTS ] <materialized_view_name> + [ (<columns_definition>) ] + [ BUILD <build_mode> ] + [ REFRESH <refresh_method> [<refresh_trigger>]] + [ [DUPLICATE] KEY (<key_cols>) ] + [ COMMENT '<table_comment>' ] + [ PARTITION BY ( + { <partition_col> + | DATE_TRUNC(<partition_col>, <partition_unit>) } + )] + [ DISTRIBUTED BY { HASH (<distribute_cols>) | RANDOM } + [ BUCKETS { <bucket_count> | AUTO } ] + ] + [ PROPERTIES ( + -- Table property + <table_property> + -- Additional table properties + [ , ... ]) + ] + AS <query> +``` + +Where: + + +```sql +columns_definition + : -- Column definition + <col_name> + [ COMMENT '<col_comment>' ] +refresh_trigger + : ON MANUAL + | ON SCHEDULE EVERY <int_value> <refresh_unit> [ STARTS '<start_time>'] + | ON COMMIT +``` + +## Required Parameters + +**1. <materialized_view_name>** + +> Specifies the identifier (i.e., name) of the materialized view; it must be unique within the database where the table is created. +> +> The identifier must start with a letter character (if Unicode name support is enabled, it can be any character from any language) and cannot contain spaces or special characters unless the entire identifier string is enclosed in backticks (e.g., `My Object`). +> +> The identifier cannot be a reserved keyword. +> +> For more details, refer to the requirements for identifiers and reserved keywords. + +**2. <query>** + +> A required parameter when creating a materialized view. Specifies the SELECT statement that populates the data. + +## Optional Parameters + +**1. <key_cols>** + +> The key columns of the table. In Doris, key columns must be the first K columns of the table. For more information on key restrictions and how to choose key columns, refer to the respective sections in the "Data Model" chapter. + +**2. <build_mode>** + +> Refresh timing: whether the materialized view should be refreshed immediately after creation. +> +> IMMEDIATE: Refresh immediately. Default is IMMEDIATE. +> +> DEFERRED: Delayed refresh. + +**3. <refresh_method>** + +> Refresh method: +> +> COMPLETE: Refresh all partitions. +> +> AUTO: Try to refresh incrementally, only refreshing partitions that have changed since the last materialized view refresh. If incremental refresh is not possible, all partitions will be refreshed. + +**4. <refresh_trigger>** + +> Trigger method: +> +> MANUAL: Manual refresh. +> +> ON SCHEDULE: Scheduled refresh. +> +> ON COMMIT: Triggered refresh, where changes to the base table data trigger a refresh of the materialized view. + +**5. <refresh_unit>** + +> The time unit for periodic refreshes. Currently supported units are MINUTE, HOUR, DAY, and WEEK. + +**6. <partition_col>** + +> If PARTITION BY is not specified, there will be only one partition by default. +> +> If a partition field is specified, it will automatically infer which base table the field comes from and synchronize the base table (currently supports internal tables and Hive tables). For internal tables, only one partition field is allowed. +> +> Materialized views can also reduce the number of partitions through partition roll-up. Currently, the partition roll-up function supports `date_trunc`. + +**7. <partition_unit>** + +> The aggregation granularity for partition roll-up. Currently supported units are HOUR, DAY, WEEK, QUARTER, MONTH, and YEAR. + +**8. <start_time>** + +> The scheduled start time must be in the future, i.e., later than the current time. + +**9. <table_property>** + +Properties used by internal tables, most of which can be used by materialized views, along with some properties specific to materialized views, as listed below: + +| Property Name | Description | +| -------------------------------- | ------------------------------------------------------------ | +| grace_period | The maximum allowed delay in seconds for materialized view data during query rewriting. If partition A of the materialized view and the base table data are inconsistent, and the last refresh time of partition A was 10:15:00, with the current system time being 10:15:08, the partition will not be transparently rewritten. However, if `grace_period` is set to 10, the partition will be used for transparent rewriting. | +| excluded_trigger_tables | Comma-separated table names to be ignored during data refresh. For example, `table1,table2`. | +| refresh_partition_num | The number of partitions refreshed by a single INSERT statement, defaulting to 1. When refreshing a materialized view, it first calculates the list of partitions to be refreshed and then splits them into multiple INSERT statements for sequential execution. If an INSERT statement fails, the entire task will stop. The materialized view ensures the atomicity of a single INSERT statement, and a failed INSERT will not affect partitions that have already been refreshed successfully. | +| workload_group | The name of the `workload_group` used when the materialized view executes refresh tasks. This is used to limit the resources used by the materialized view during data refresh to avoid impacting other business operations. For more information on creating and using `workload_group`, refer to the [WORKLOAD-GROUP](https://doris.apache.org/zh-CN/docs/admin-manual/workload-group.md) documentation. | +| partition_sync_limit | When the base table's partition field is of type time, this property can be used to configure the range of partitions to synchronize with the base table, in conjunction with `partition_sync_time_unit`. For example, setting it to 2 with `partition_sync_time_unit` set to `MONTH` means that only the partitions and data from the last 2 months of the base table will be synchronized. The minimum value is `1`. As time progresses, the materialized view will automatically add and remove partitions during each refresh. For example, if the materialized view currently has data for months 2 and 3, next month it will automatically remove the data for month 2 and add data for month 4. | +| partition_sync_time_unit | The time unit for partition refresh, supporting DAY/MONTH/YEAR (default is DAY). | +| partition_date_format | When the base table's partition field is of type string, if you want to use the `partition_sync_limit` capability, you can set the date format to parse the partition time according to the `partition_date_format` setting. | +| enable_nondeterministic_function | Whether the materialized view definition SQL is allowed to contain nondeterministic functions, such as current_date(), now(), random(), etc. If set to true, it allows the inclusion; otherwise, it does not. The default is not to allow. | +| use_for_rewrite | Indicates whether this materialized view participates in transparent rewriting. If set to false, it does not participate in transparent rewriting. The default is true. In data modeling scenarios, if the materialized view is only used for direct queries, this property can be set so that the materialized view does not participate in transparent rewriting, thereby improving query response speed. | + +## Access Control Requirements + +The user executing this SQL command must have at least the following permissions: + +| Privilege | Object | Notes | +| ----------- | ----------- | ------------------------------------------------------------ | +| CREATE_PRIV | Database | | +| SELECT_PRIV | Table, View | Requires SELECT_PRIV permission on the tables or views queried in <query> | Review Comment: ```suggestion | SELECT_PRIV | Table, View | Requires SELECT_PRIV permission on the tables or views queried in `<query>` | ``` ########## docs/sql-manual/sql-statements/table-and-view/async-materialized-view/CREATE-ASYNC-MATERIALIZED-VIEW.md: ########## @@ -0,0 +1,184 @@ +--- +{ +"title": "CREATE ASYNC MATERIALIZED VIEW", +"language": "en" +} +--- + +<!-- +Licensed to the Apache Software Foundation (ASF) under one +or more contributor license agreements. See the NOTICE file +distributed with this work for additional information +regarding copyright ownership. The ASF licenses this file +to you under the Apache License, Version 2.0 (the +"License"); you may not use this file except in compliance +with the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, +software distributed under the License is distributed on an +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, either express or implied. See the License for the +specific language governing permissions and limitations +under the License. +--> + +## Description + +Statement for creating an asynchronous materialized view. The column names and types are derived from the materialized view SQL statement. Custom column names are allowed, but column types cannot be defined. + +## Syntax + + +```sql +CREATE MATERIALIZED VIEW +[ IF NOT EXISTS ] <materialized_view_name> + [ (<columns_definition>) ] + [ BUILD <build_mode> ] + [ REFRESH <refresh_method> [<refresh_trigger>]] + [ [DUPLICATE] KEY (<key_cols>) ] + [ COMMENT '<table_comment>' ] + [ PARTITION BY ( + { <partition_col> + | DATE_TRUNC(<partition_col>, <partition_unit>) } + )] + [ DISTRIBUTED BY { HASH (<distribute_cols>) | RANDOM } + [ BUCKETS { <bucket_count> | AUTO } ] + ] + [ PROPERTIES ( + -- Table property + <table_property> + -- Additional table properties + [ , ... ]) + ] + AS <query> +``` + +Where: + + +```sql +columns_definition + : -- Column definition + <col_name> + [ COMMENT '<col_comment>' ] +refresh_trigger + : ON MANUAL + | ON SCHEDULE EVERY <int_value> <refresh_unit> [ STARTS '<start_time>'] + | ON COMMIT +``` + +## Required Parameters + +**1. <materialized_view_name>** Review Comment: ```suggestion **1. `<materialized_view_name>`** ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org For additional commands, e-mail: commits-h...@doris.apache.org