This is an automated email from the ASF dual-hosted git repository. englefly pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/doris-website.git
The following commit(s) were added to refs/heads/master by this push: new 5757b60b0f7 [Improvement](docs) refine query acceleration doc (#1610) 5757b60b0f7 is described below commit 5757b60b0f7a176a26f2874a3c2ba0a663687f51 Author: xzj7019 <xiongzhongj...@selectdb.com> AuthorDate: Wed Dec 25 21:34:14 2024 +0800 [Improvement](docs) refine query acceleration doc (#1610) ## Versions - [x] dev - [ ] 3.0 - [ ] 2.1 - [ ] 2.0 ## Languages - [ ] Chinese - [x] English ## Docs Checklist - [ ] Checked by AI - [ ] Test Cases Built --- .../accelerating-queries-with-sql-cache.md | 72 +----- .../tuning/tuning-plan/adjusting-join-shuffle.md | 3 +- .../tuning-plan/controlling-hints-with-cbo-rule.md | 2 +- .../tuning/tuning-plan/dml-tuning-plan.md | 96 +------- .../transparent-rewriting-with-async-mv.md | 268 ++++++++------------- .../transparent-rewriting-with-sync-mv.md | 59 +++-- 6 files changed, 146 insertions(+), 354 deletions(-) diff --git a/docs/query-acceleration/tuning/tuning-plan/accelerating-queries-with-sql-cache.md b/docs/query-acceleration/tuning/tuning-plan/accelerating-queries-with-sql-cache.md index 86c34727adb..cb2fb2254bd 100644 --- a/docs/query-acceleration/tuning/tuning-plan/accelerating-queries-with-sql-cache.md +++ b/docs/query-acceleration/tuning/tuning-plan/accelerating-queries-with-sql-cache.md @@ -24,69 +24,17 @@ specific language governing permissions and limitations under the License. --> -## Principle +## Overview +For the detailed implementation principle of SQL Cache, please refer to the chapter [SQL Cache](../../../query-acceleration/sql-cache-manual). -For detailed implementation principles of SQL Cache, please refer to [SQL Cache](../../../query-acceleration/sql-cache-manual). - -## Use Case - -The following case illustrates how to enable and use SQL Cache in Doris: - -1. Ensure that `cache_enable_sql_mode` in `fe.conf` is set to `true` (default is `true`): - - ```sql - vim fe/conf/fe.conf - cache_enable_sql_mode=true - ``` - - -2. 
Set the variable in the MySQL command line: - - ```sql - MySQL [(none)]> set global enable_sql_cache=true; - ``` - - Note: `GLOBAL` indicates a global variable, not just for the current session. - -3. In Doris versions 2.1.3 and above, you can control the number of cache key entries and the cleanup time with the following commands: - - ```sql - MySQL [(none)]> ADMIN SET FRONTEND CONFIG ('sql_cache_manage_num' = '100'); - MySQL [(none)]> ADMIN SET FRONTEND CONFIG ('expire_sql_cache_in_fe_second' = '300'); - ``` - - -4. Execute the query - - Suppose we have a table named "sales" containing information on date, product, and sales amount. We need to query the total sales for each product over the past 30 days: - - ```sql - SELECT product, SUM(amount) as total_sales - FROM sales - WHERE date >= DATE_SUB(CURDATE(), INTERVAL 30 DAY) - GROUP BY product - ORDER BY total_sales DESC - LIMIT 10; - ``` - - When this query is executed for the first time, Doris retrieves the results from the BE and stores them in the cache. Subsequent executions of the same query will retrieve the results directly from the cache if the data has not been updated, significantly improving query speed. - -5. Cache Conditions - - After the initial query, the query results will be cached if the following three conditions are met: - - - (Current time - last update time of the query partition) is greater than `cache_last_version_interval_second` in `fe.conf`. - - - The number of query result rows is less than `cache_result_max_row_count` in `fe.conf`. - - - The number of bytes in the query results is less than `cache_result_max_data_size` in `fe.conf`. +## Case +For detailed cases, please refer to the chapter [SQL Cache](../../../query-acceleration/sql-cache-manual). ## Summary +SQL Cache is a query optimization mechanism provided by Doris, which can significantly improve query performance. 
When using it, the following points should be noted:

-SQL Cache is a query optimization mechanism provided by Doris that can significantly improve query performance. When using it, please note:
-
-1. SQL Cache is not suitable for queries containing functions that generate random values (such as `random()`), as this can cause the query results to lose randomness.
-
-2. Currently, it does not support using cached results of some metrics to satisfy queries for more metrics. For example, cached results for two metrics cannot be used for queries involving three metrics.
-
-3. By reasonably using SQL Cache, you can significantly improve Doris's query performance, especially in scenarios with low data update frequency. In practical applications, cache parameters need to be adjusted based on specific data characteristics and query patterns to achieve optimal performance improvements. \ No newline at end of file
+:::tip Note
+- SQL Cache is not suitable for queries containing functions that generate random values (such as `random()`), as caching would make the results lose their randomness.
+- Currently, a cached result cannot be reused to answer a query over a larger set of metrics. For example, the cached result of a query on two metrics cannot serve a later query on three metrics.
+- Used appropriately, SQL Cache can significantly improve Doris query performance, especially in scenarios where data is updated infrequently. In practice, tune the cache parameters to the specific data characteristics and query patterns to achieve the best improvement.
+ ::: \ No newline at end of file diff --git a/docs/query-acceleration/tuning/tuning-plan/adjusting-join-shuffle.md b/docs/query-acceleration/tuning/tuning-plan/adjusting-join-shuffle.md index 93979be0c4c..c8d1bb7694f 100644 --- a/docs/query-acceleration/tuning/tuning-plan/adjusting-join-shuffle.md +++ b/docs/query-acceleration/tuning/tuning-plan/adjusting-join-shuffle.md @@ -1,3 +1,4 @@ +--- { "title": "Adjusting Join Shuffle Mode with Hint", "language": "en" @@ -40,7 +41,7 @@ SELECT COUNT(*) FROM t2 JOIN [broadcast] t1 ON t1.c1 = t2.c2; SELECT COUNT(*) FROM t2 JOIN [shuffle] t1 ON t1.c1 = t2.c2; ``` -## Case Study +## Case Next, we will demonstrate the usage of Distribute Hints through an example: diff --git a/docs/query-acceleration/tuning/tuning-plan/controlling-hints-with-cbo-rule.md b/docs/query-acceleration/tuning/tuning-plan/controlling-hints-with-cbo-rule.md index f71bb303bee..3b1db76c4b1 100644 --- a/docs/query-acceleration/tuning/tuning-plan/controlling-hints-with-cbo-rule.md +++ b/docs/query-acceleration/tuning/tuning-plan/controlling-hints-with-cbo-rule.md @@ -51,7 +51,7 @@ Currently, the Doris optimizer supports several cost-based rewrites, which can b - PUSH_DOWN_AGG_THROUGH_JOIN_ONE_SIDE - PUSH_DOWN_DISTINCT_THROUGH_JOIN -## Example +## Case Here is a query example: diff --git a/docs/query-acceleration/tuning/tuning-plan/dml-tuning-plan.md b/docs/query-acceleration/tuning/tuning-plan/dml-tuning-plan.md index 418354590ed..f0497ece45b 100644 --- a/docs/query-acceleration/tuning/tuning-plan/dml-tuning-plan.md +++ b/docs/query-acceleration/tuning/tuning-plan/dml-tuning-plan.md @@ -24,98 +24,6 @@ specific language governing permissions and limitations under the License. --> -DML Plan Tuning Includes Both Import Section (INSERT INTO SELECT) and Query Section (CREATE TABLE AS SELECT - CTAS). Next, will introduce the principles and tuning practices of these two parts separately. 
+For DML plan tuning, it is first necessary to identify whether the performance bottleneck is caused by the import process or the query section. For the troubleshooting and tuning of performance bottlenecks in the query section, please refer to other subsections in [Plan Tuning](optimizing-table-schema.md) for details. -## Loading - -### Principles - -Apache Doris offers a variety of flexible data import solutions to meet data access needs in different scenarios. Doris supports importing data from the following data sources: - -1. Object storage (S3) and HDFS - -2. Local files - -3. Kafka - -4. Relational databases (such as MySQL, PostgreSQL, Oracle, SQLServer, etc.) - -5. Data sources connected via JDBC - -6. JSON format data - -**Doris provides the following main data import methods:** - -1. Broker Load: Imports data from external storage systems through the Broker process - -2. Stream Load: Streams data from local files or in-memory data - -3. Routine Load: Continuously imports data from Kafka - -4. INSERT INTO: Imports data through SQL insert statements - -5. S3 Load: Directly imports data from object storage that supports the S3 protocol - -6. MySQL Load: Uses MySQL clients to import local data - -**Different import methods support slightly different data formats:** - -1. Broker Load: Supports Parquet, ORC, CSV, and GZip formats - -2. Stream Load: Supports CSV, JSON, Parquet, and ORC formats - -3. Routine Load: Supports CSV and JSON formats - -4. MySQL Load: Supports CSV format - -**Data import has the following mechanisms:** - -1. Atomicity guarantee: Each import job acts as a complete transaction, ensuring atomic data writes. - -2. Import label: Each import job is assigned a unique label to ensure At-Most-Once semantics. - -3. Synchronous/Asynchronous mode: Synchronous mode returns results immediately, while asynchronous mode requires separate querying of job status. - -4. Array type support: Array type data can be imported using CAST and array functions. 
- -5. Execution engine: Users can choose whether to use the Pipeline engine to execute import tasks based on configuration. - -**In practicality, the following considerations are necessary:** - -1. Reasonable selection of import methods: Choose the most appropriate import method for different data sources. - -2. Utilization of label mechanism: Achieve Exactly-Once semantics guarantee. - -3. Proper configuration of parallelism: Adjust the number of parallel imports based on cluster resources. - -4. Monitoring of import status: For asynchronous imports, check the job's progress timely. - -By flexibly using the various import functions provided by Doris, data from various sources can be efficiently imported into Doris for analysis. For more details, please refer to the [Data Loading Overview](../../../data-operate/import/import-way/load-manual). - -### Loading Optimization - -The Pipeline engine is a new query execution engine in Doris, designed to improve the efficiency of queries and data processing. During data import, the Pipeline engine can also be enabled to enhance overall performance. By default, the Pipeline engine is disabled during data import, but users can enable it through relevant configurations. - -To enable the Pipeline engine during data import, configure the following variables: - -**1. FE Configuration Item: enable_pipeline_load** - -- Location: In the configuration file of the FE (Frontend) - -- Function: When enabled, import tasks such as Stream Load will attempt to execute using the Pipeline engine - -**2. Session Variable: enable_nereids_dml_with_pipeline** - -- Location: Set at session level - -- Function: When enabled, INSERT INTO statements will attempt to execute using the Pipeline engine - -**3. 
Session Variable: enable_pipeline_engine** - -- Location: Set at session level - -- Function: Controls whether the Pipeline engine is actually enabled - -## Querying - -For details, please refer to other sections on [plan tuning](../../../query-acceleration/tuning/tuning-plan/optimizing-table-schema). \ No newline at end of file +Doris supports importing data from multiple data sources. By flexibly utilizing the various import functions provided by Doris, data from various sources can be efficiently imported into Doris for analysis. For details of best practices, please refer to [Import Overview](../../../data-operate/import/load-manual.md). \ No newline at end of file diff --git a/docs/query-acceleration/tuning/tuning-plan/transparent-rewriting-with-async-mv.md b/docs/query-acceleration/tuning/tuning-plan/transparent-rewriting-with-async-mv.md index 6665ddc1949..5b2db7e426d 100644 --- a/docs/query-acceleration/tuning/tuning-plan/transparent-rewriting-with-async-mv.md +++ b/docs/query-acceleration/tuning/tuning-plan/transparent-rewriting-with-async-mv.md @@ -24,202 +24,138 @@ specific language governing permissions and limitations under the License. --> -## Principle +## Overview -The aync-materialized view adopts a transparent rewriting algorithm based on the SPJG (SELECT-PROJECT-JOIN-GROUP-BY) model. This algorithm can analyze the structural information of query SQL, automatically find suitable materialized views, and attempt transparent rewriting to utilize the optimal materialized view to express the query SQL. By using precomputed materialized view results, it can significantly improve query performance and reduce computational costs. +The [multi-table materialized view](../../materialized-view/async-materialized-view/overview.md) adopts a transparent rewriting algorithm based on the SPJG (SELECT-PROJECT-JOIN-GROUP-BY) pattern. 
This algorithm can analyze the structural information of the query SQL, automatically find the appropriate materialized views, and attempt to perform transparent rewriting to express the query SQL using the optimal materialized views. By using the pre-computed results of materialized views, query performance can be significantly improved and computational costs reduced.

-## Tuning Case
+## Case

-Next, through an example, we will demonstrate in detail how to use async-materialized views to optimize queries. This example covers a series of operations including the creation, metadata viewing, data refreshing, task management, modification, and deletion of materialized views.
+Next, an example will be used to demonstrate in detail how to utilize multi-table materialized views to accelerate queries.

-### 1 Creation of Base Tables and Data Insertion
+### Create Base Tables

-First, create two tables, `orders` and `lineitem`, in the tpch database, and insert the corresponding data.
+First, create the tpch database, then create two tables, `orders` and `lineitem`, in it and insert the corresponding data.
```sql CREATE DATABASE IF NOT EXISTS tpch; - -USE tpch; - --- Create the orders table -CREATE TABLE IF NOT EXISTS orders ( - o_orderkey integer not null, - o_custkey integer not null, - o_orderstatus char(1) not null, - o_totalprice decimalv3(15,2) not null, - o_orderdate date not null, - o_orderpriority char(15) not null, - o_clerk char(15) not null, - o_shippriority integer not null, - o_comment varchar(79) not null -) -DUPLICATE KEY(o_orderkey, o_custkey) -PARTITION BY RANGE(o_orderdate)( - FROM ('2023-10-17') TO ('2023-10-20') INTERVAL 1 DAY -) -DISTRIBUTED BY HASH(o_orderkey) BUCKETS 3 -PROPERTIES ("replication_num" = "1"); - --- Insert data into the orders table -INSERT INTO orders VALUES - (1, 1, 'o', 99.5, '2023-10-17', 'a', 'b', 1, 'yy'), - (2, 2, 'o', 109.2, '2023-10-18', 'c','d',2, 'mm'), - (3, 3, 'o', 99.5, '2023-10-19', 'a', 'b', 1, 'yy'); - --- Create the lineitem table -CREATE TABLE IF NOT EXISTS lineitem ( - l_orderkey integer not null, - l_partkey integer not null, - l_suppkey integer not null, - l_linenumber integer not null, - l_quantity decimalv3(15,2) not null, - l_extendedprice decimalv3(15,2) not null, - l_discount decimalv3(15,2) not null, - l_tax decimalv3(15,2) not null, - l_returnflag char(1) not null, - l_linestatus char(1) not null, - l_shipdate date not null, - l_commitdate date not null, - l_receiptdate date not null, - l_shipinstruct char(25) not null, - l_shipmode char(10) not null, - l_comment varchar(44) not null -) -DUPLICATE KEY(l_orderkey, l_partkey, l_suppkey, l_linenumber) -PARTITION BY RANGE(l_shipdate) -(FROM ('2023-10-17') TO ('2023-10-20') INTERVAL 1 DAY) -DISTRIBUTED BY HASH(l_orderkey) BUCKETS 3 -PROPERTIES ("replication_num" = "1"); - --- Insert data into the lineitem table -INSERT INTO lineitem VALUES - (1, 2, 3, 4, 5.5, 6.5, 7.5, 8.5, 'o', 'k', '2023-10-17', '2023-10-17', '2023-10-17', 'a', 'b', 'yyyyyyyyy'), - (2, 2, 3, 4, 5.5, 6.5, 7.5, 8.5, 'o', 'k', '2023-10-18', '2023-10-18', '2023-10-18', 'a', 'b', 
'yyyyyyyyy'), +USE tpch; + +CREATE TABLE IF NOT EXISTS orders ( + o_orderkey integer not null, + o_custkey integer not null, + o_orderstatus char(1) not null, + o_totalprice decimalv3(15,2) not null, + o_orderdate date not null, + o_orderpriority char(15) not null, + o_clerk char(15) not null, + o_shippriority integer not null, + o_comment varchar(79) not null +) +DUPLICATE KEY(o_orderkey, o_custkey) +PARTITION BY RANGE(o_orderdate)( + FROM ('2023-10-17') TO ('2023-10-20') INTERVAL 1 DAY +) +DISTRIBUTED BY HASH(o_orderkey) BUCKETS 3 +PROPERTIES ("replication_num" = "1"); + +INSERT INTO orders VALUES + (1, 1, 'o', 99.5, '2023-10-17', 'a', 'b', 1, 'yy'), + (2, 2, 'o', 109.2, '2023-10-18', 'c','d',2, 'mm'), + (3, 3, 'o', 99.5, '2023-10-19', 'a', 'b', 1, 'yy'); + +CREATE TABLE IF NOT EXISTS lineitem ( + l_orderkey integer not null, + l_partkey integer not null, + l_suppkey integer not null, + l_linenumber integer not null, + l_quantity decimalv3(15,2) not null, + l_extendedprice decimalv3(15,2) not null, + l_discount decimalv3(15,2) not null, + l_tax decimalv3(15,2) not null, + l_returnflag char(1) not null, + l_linestatus char(1) not null, + l_shipdate date not null, + l_commitdate date not null, + l_receiptdate date not null, + l_shipinstruct char(25) not null, + l_shipmode char(10) not null, + l_comment varchar(44) not null +) +DUPLICATE KEY(l_orderkey, l_partkey, l_suppkey, l_linenumber) +PARTITION BY RANGE(l_shipdate) +(FROM ('2023-10-17') TO ('2023-10-20') INTERVAL 1 DAY) +DISTRIBUTED BY HASH(l_orderkey) BUCKETS 3 +PROPERTIES ("replication_num" = "1"); + +INSERT INTO lineitem VALUES + (1, 2, 3, 4, 5.5, 6.5, 7.5, 8.5, 'o', 'k', '2023-10-17', '2023-10-17', '2023-10-17', 'a', 'b', 'yyyyyyyyy'), + (2, 2, 3, 4, 5.5, 6.5, 7.5, 8.5, 'o', 'k', '2023-10-18', '2023-10-18', '2023-10-18', 'a', 'b', 'yyyyyyyyy'), (3, 2, 3, 6, 7.5, 8.5, 9.5, 10.5, 'k', 'o', '2023-10-19', '2023-10-19', '2023-10-19', 'c', 'd', 'xxxxxxxxx'); ``` -### 2 Creation of Async-Materialized View +### 
Create an Asynchronous Materialized View

-Next, create an asynchronous materialized view `mv1`.
+Based on the `orders` and `lineitem` tables created above, create an asynchronous materialized view named `mv1`.

```sql
CREATE MATERIALIZED VIEW mv1
-BUILD DEFERRED REFRESH AUTO ON MANUAL
+BUILD IMMEDIATE REFRESH COMPLETE ON MANUAL
PARTITION BY(l_shipdate)
DISTRIBUTED BY RANDOM BUCKETS 2
PROPERTIES ('replication_num' = '1')
-AS
-SELECT l_shipdate, o_orderdate, l_partkey, l_suppkey, SUM(o_totalprice) AS sum_total
-FROM lineitem
-LEFT JOIN orders ON lineitem.l_orderkey = orders.o_orderkey AND l_shipdate = o_orderdate
-GROUP BY
-l_shipdate,
-o_orderdate,
-l_partkey,
+ @@ -119,107 +114,55 @@ l_partkey, l_suppkey; ```

-### 3 Viewing Materialized View Metadata
-
-```sql
-SELECT * FROM mv_infos("database"="tpch") WHERE Name="mv1";
-```
-
-### 4 Refreshing the Materialized View
-
-First, view the partition list:
-
-```sql
-SHOW PARTITIONS FROM mv1;
-```
-
-Then refresh a specific partition:
-
-```sql
-REFRESH MATERIALIZED VIEW mv1 PARTITIONS(p_20231017_20231018);
-```
-
-### 5 Task Management
-
-Manage jobs for materialized views, including viewing jobs, pausing scheduled tasks, resuming scheduled tasks, and viewing and canceling tasks.
- -- View materialized view jobs - - ```sql - SELECT * FROM jobs("type"="mv") ORDER BY CreateTime; - ``` - -- Pause scheduled tasks for materialized views - - ```sql - PAUSE MATERIALIZED VIEW JOB ON mv1; - ``` -- Resume scheduled tasks for materialized views - - ```sql - RESUME MATERIALIZED VIEW JOB ON mv1; - ``` - -- View materialized view tasks - - ```sql - SELECT * FROM tasks("type"="mv"); - ``` - -- Cancel a materialized view task: assuming `realTaskId` is 123 - - ```sql - CANCEL MATERIALIZED VIEW TASK 123 ON mv1; - ``` - -### 6 Modifying the Materialized View +### Use the Materialized View for Transparent Rewriting ```sql -ALTER MATERIALIZED VIEW mv1 SET("grace_period"="3333"); +mysql> explain shape plan SELECT l_shipdate, SUM(o_totalprice) AS total_price + -> FROM lineitem + -> LEFT JOIN orders ON lineitem.l_orderkey = orders.o_orderkey AND l_shipdate = o_orderdate + -> WHERE l_partkey = 2 AND l_suppkey = 3 + -> GROUP BY l_shipdate; ++-------------------------------------------------------------------+ +| Explain String(Nereids Planner) | ++-------------------------------------------------------------------+ +| PhysicalResultSink | +| --PhysicalDistribute[DistributionSpecGather] | +| ----PhysicalProject | +| ------hashAgg[GLOBAL] | +| --------PhysicalDistribute[DistributionSpecHash] | +| ----------hashAgg[LOCAL] | +| ------------PhysicalProject | +| --------------filter((mv1.l_partkey = 2) and (mv1.l_suppkey = 3)) | +| ----------------PhysicalOlapScan[mv1] | ++-------------------------------------------------------------------+ ``` -### 7 Deleting the Materialized View +It can be seen from the `explain shape plan` that the plan after being transparently rewritten by `mv1` has already hit `mv1`. 
You can also use `explain` to view the current state of the plan after being rewritten by the materialized view, including whether it has hit and which materialized view has been hit, etc., as shown below: ```sql -DROP MATERIALIZED VIEW mv1; +| ========== MATERIALIZATIONS ========== | +| | +| MaterializedView | +| MaterializedViewRewriteSuccessAndChose: | +| internal.tpch.mv1 chose, | +| | +| MaterializedViewRewriteSuccessButNotChose: | +| not chose: none, | +| | +| MaterializedViewRewriteFail: | ``` -### 8 Querying Using the Materialized View - -- Direct query - - ```sql - SELECT l_shipdate, sum_total - FROM mv1 - WHERE l_partkey = 2 AND l_suppkey = 3; - ``` - -- Query through transparent rewriting (the query optimizer automatically uses the materialized view) - - ```sql - SELECT l_shipdate, SUM(o_totalprice) AS total_price - FROM lineitem - LEFT JOIN orders ON lineitem.l_orderkey = orders.o_orderkey AND l_shipdate = o_orderdate - WHERE l_partkey = 2 AND l_suppkey = 3 - GROUP BY l_shipdate; - ``` - -The above example fully demonstrates the lifecycle of an asynchronous materialized view, including creation, management, usage, and deletion. - ## Summary -By utilizing materialized views, query performance can be significantly enhanced, particularly for complex aggregated queries. Several considerations should be kept in mind when using them: - -1. Precomputed Results: Materialized views precompute and store query results, avoiding the overhead of repeated computations for each query. This is especially effective for complex queries that need to be executed frequently. - -2. Reduction of Join Operations: Materialized views can consolidate data from multiple tables into a single view, reducing the need for join operations during queries and thereby improving query efficiency. - -3. Automatic Updates: When the data in the base tables changes, materialized views can be automatically updated to maintain data consistency. 
This ensures that query results always reflect the latest data state. - -4. Storage Overhead: Materialized views require additional storage space to save precomputed results. When creating materialized views, a trade-off between query performance improvement and storage space consumption needs to be considered. - -5. Maintenance Cost: The maintenance of materialized views requires certain system resources and time. Frequently updated base tables may result in higher update overhead for materialized views. Therefore, it is necessary to select an appropriate refresh strategy based on actual conditions. +By using multi-table materialized views, the query performance can be significantly improved, especially for complex join and aggregation queries. When using them, the following points need to be noted: -6. Use Cases: Materialized views are suitable for scenarios where data changes infrequently but query frequency is high. For data that changes frequently, real-time computation may be more appropriate. +:::tip Usage Suggestions +- Pre-computed Results: Materialized views pre-compute and store the query results, avoiding the overhead of repeated computations for each query. This is especially effective for complex queries that need to be executed frequently. +- Reducing Join Operations: Materialized views can combine the data of multiple tables into one view, reducing the join operations during queries and thus improving query efficiency. +- Automatic Updates: When the data in the base tables changes, materialized views can be updated automatically to maintain data consistency. This ensures that the query results always reflect the latest data status. +- Space Overhead: Materialized views require additional storage space to save the pre-computed results. When creating materialized views, it is necessary to balance the improvement in query performance and the consumption of storage space. 
+- Maintenance Cost: The maintenance of materialized views requires certain system resources and time. Base tables that are updated frequently may lead to relatively high update overheads for materialized views. Therefore, it is necessary to choose an appropriate refresh strategy according to the actual situation. +- Applicable Scenarios: Materialized views are suitable for scenarios where the data change frequency is low and the query frequency is high. For frequently changing data, real-time computation may be more appropriate. + ::: -The rational use of materialized views can significantly improve database query performance, especially in the case of complex queries and large data volumes. At the same time, it is also necessary to comprehensively consider factors such as storage and maintenance to achieve a balance between performance and cost. \ No newline at end of file +Reasonable utilization of multi-table materialized views can significantly improve the query performance of the database, especially in the case of complex queries and large data volumes. Meanwhile, factors such as storage and maintenance also need to be considered comprehensively to achieve a balance between performance and cost. \ No newline at end of file diff --git a/docs/query-acceleration/tuning/tuning-plan/transparent-rewriting-with-sync-mv.md b/docs/query-acceleration/tuning/tuning-plan/transparent-rewriting-with-sync-mv.md index 564471a2772..4cba281c363 100644 --- a/docs/query-acceleration/tuning/tuning-plan/transparent-rewriting-with-sync-mv.md +++ b/docs/query-acceleration/tuning/tuning-plan/transparent-rewriting-with-sync-mv.md @@ -1,7 +1,7 @@ --- { - "title": "Transparent Rewriting with Sync-Materialized View", - "language": "en" + "title": "Transparent Rewriting with Single-table Materialized View", + "language": "en" } --- @@ -24,73 +24,72 @@ specific language governing permissions and limitations under the License. 
--> -## Principle +## Overview -A sync-materialized view is a special type of table that precomputes and stores data based on a predefined SELECT statement. Its primary purpose is to satisfy users' analytical needs for arbitrary dimensions of raw detailed data while also enabling rapid fixed-dimension analytical queries. +The [Materialized View](../../materialized-view/sync-materialized-view.md) is a special kind of table that pre-computes and stores data according to a predefined SELECT statement. Its main purpose is to meet users' needs for analyzing raw detailed data from any dimension and also enable quick analysis and queries on fixed dimensions. -Materialized views are suitable for the following scenarios: +The applicable scenarios for materialized views are as follows: -1. Analytical requirements cover both detailed data queries and fixed-dimension queries. +1. The analysis requirements cover both detailed data queries and fixed-dimension queries. +2. The queries only involve a small number of columns or rows in the table. +3. The queries include time-consuming processing operations, such as long-duration aggregation operations, etc. +4. The queries need to match different prefix indexes. -2. Queries only involve a small subset of columns or rows in the table. +For queries that frequently and repeatedly use the results of the same subqueries, single-table synchronous materialized views can significantly improve performance. Doris will automatically maintain the data of materialized views to ensure data consistency between the base table and the materialized view table, without requiring additional manual maintenance costs. During a query, the system will automatically match the optimal materialized view and directly read data from it. -3. Queries contain time-consuming processing operations, such as long aggregation operations. +:::tip Precautions +- In Doris 2.0 and subsequent versions, materialized views have some enhanced functions. 
It is recommended that users confirm in the test environment whether the expected queries can hit the materialized views they want to create before using materialized views in the formal production environment. +- It is not recommended to create multiple materialized views with similar forms on the same table, as this may lead to conflicts among multiple materialized views and thus cause query hit failures. + ::: -4. Queries require matching different prefix indexes. +## Case -For queries that frequently reuse the same subquery results, a sync-materialized view can significantly enhance performance. Doris automatically maintains the data in the materialized view, ensuring data consistency between the base table and the materialized view without additional manual maintenance costs. During queries, the system automatically matches the optimal materialized view and reads data directly from it. +The following uses a specific example to demonstrate the process of using single-table materialized views to accelerate queries: -## Tuning Usage Case +Suppose we have a detailed sales record table named `sales_records`, which records various pieces of information for each transaction in detail, including the transaction ID, salesperson ID, selling store ID, sales date, and transaction amount. Now, we often need to conduct analysis and queries on the sales volume of different stores. -The following is a specific example to illustrate the use of single-table materialized views: - -Suppose we have a detailed sales record table `sales_records` that records various information for each transaction, including transaction ID, salesperson ID, store ID, sales date, and transaction amount. Now, we frequently need to perform analytical queries on sales volumes for different stores. - -To optimize the performance of these queries, we can create a materialized view `store_amt` that groups by store ID and sums the sales amounts for the same store. 
The specific steps are as follows: +To optimize the performance of these queries, we can create a materialized view named `store_amt`. This view groups by the selling store and sums up the sales amounts of the same store. The specific steps are as follows: ### Create a Materialized View -First, we use the following SQL statement to create the materialized view `store_amt`: +Firstly, we use the following SQL statement to create the materialized view `store_amt`: ```sql CREATE MATERIALIZED VIEW store_amt AS -SELECT store_id, SUM(sale_amt) -FROM sales_records + @@ -59,38 +61,36 @@ FROM sales_records GROUP BY store_id; ``` -After submitting the creation task, Doris will asynchronously build this materialized view in the background. We can view the creation progress of the materialized view through the following command: +After submitting the creation task, Doris will build this materialized view asynchronously in the background. We can use the following command to check the creation progress of the materialized view: ```sql -SHOW ALTER TABLE MATERIALIZED VIEW FROM db_name; +SHOW ALTER TABLE MATERIALIZED VIEW FROM db_name; ``` -When the `State` field changes to `FINISHED`, it indicates that the `store_amt` materialized view has been successfully created. - -### Query Data +When the `State` field becomes `FINISHED`, it means that the `store_amt` materialized view has been successfully created. -After the materialized view is created, when we query the sales volumes of different stores, Doris will automatically match the `store_amt` materialized view and read the pre-aggregated data directly from it, significantly improving query efficiency. +### Transparent Rewriting -The query statement is as follows: +After the materialized view is created, when we query the sales volume of different stores, Doris will automatically match the `store_amt` materialized view and directly read the pre-aggregated data from it, thus significantly improving the query efficiency. 
The query statement is as follows: ```sql SELECT store_id, SUM(sale_amt) FROM sales_records GROUP BY store_id; ``` -We can also use the `EXPLAIN` command to check whether the query successfully hits the materialized view: +We can also use the `EXPLAIN` command to check whether the query has successfully hit the materialized view: ```sql EXPLAIN SELECT store_id, SUM(sale_amt) FROM sales_records GROUP BY store_id; ``` -At the end of the execution plan, if similar content is displayed, it indicates that the query successfully hits the `store_amt` materialized view: +At the end of the execution plan, if something like the following is displayed, it means that the query has successfully hit the `store_amt` materialized view: ```sql TABLE: default_cluster:test.sales_records(store_amt), PREAGGREGATION: ON ``` -By following these steps, we can utilize single-table materialized views to optimize query performance and improve the efficiency of data analysis. +Through the above steps, we can use single-table materialized views to optimize query performance and improve the efficiency of data analysis. ## Summary -By creating materialized views, we can significantly enhance the query speed for related aggregation analyses. Materialized views not only enable us to perform statistical analyses quickly but also flexibly support the query requirements of detailed data, making them a very powerful feature in Doris. \ No newline at end of file +By creating single-table materialized views, we can significantly improve the query speed for relevant aggregation analysis. Single-table materialized views not only enable us to conduct statistical analysis quickly but also flexibly support the query needs for detailed data, which is a very powerful feature in Doris. \ No newline at end of file --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org For additional commands, e-mail: commits-h...@doris.apache.org