[doris-website] branch master updated: [doc] Add blog 'Doris analysis: Doris SQL principle analysis' (#69)

jiafengzheng Fri, 26 Aug 2022 08:15:35 -0700

This is an automated email from the ASF dual-hosted git repository.

jiafengzheng pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris-website.git



The following commit(s) were added to refs/heads/master by this push:
     new 34a18e04cdd [doc] Add blog 'Doris analysis: Doris SQL principle 
analysis'  (#69)
34a18e04cdd is described below

commit 34a18e04cdd925c31f32d73d69667ddbaf66f565
Author: ZHbamboo <89990773+zhbam...@users.noreply.github.com>
AuthorDate: Fri Aug 26 23:14:49 2022 +0800

    [doc] Add blog 'Doris analysis: Doris SQL principle analysis'  (#69)
    
    * doc: Add blog 'principle of Doris SQL parsing' in both Chinese and 
English language with pictures.
---
 blog/principle-of-Doris-SQL-parsing.md             | 314 +++++++++++++++++++++
 .../principle-of-Doris-SQL-parsing.md              | 311 ++++++++++++++++++++
 .../Figure_10_cn.png                               | Bin 0 -> 31184 bytes
 .../Figure_10_en.png                               | Bin 0 -> 31184 bytes
 .../Figure_11_cn.png                               | Bin 0 -> 101767 bytes
 .../Figure_11_en.png                               | Bin 0 -> 179724 bytes
 .../Figure_12_cn.png                               | Bin 0 -> 164271 bytes
 .../Figure_12_en.png                               | Bin 0 -> 164271 bytes
 .../Figure_13_cn.png                               | Bin 0 -> 37752 bytes
 .../Figure_13_en.png                               | Bin 0 -> 37752 bytes
 .../Figure_14_cn.png                               | Bin 0 -> 83197 bytes
 .../Figure_14_en.png                               | Bin 0 -> 83197 bytes
 .../principle-of-Doris-SQL-parsing/Figure_1_cn.png | Bin 0 -> 46479 bytes
 .../principle-of-Doris-SQL-parsing/Figure_1_en.png | Bin 0 -> 51277 bytes
 .../principle-of-Doris-SQL-parsing/Figure_2_cn.png | Bin 0 -> 34320 bytes
 .../principle-of-Doris-SQL-parsing/Figure_2_en.png | Bin 0 -> 35451 bytes
 .../principle-of-Doris-SQL-parsing/Figure_3_cn.png | Bin 0 -> 37183 bytes
 .../principle-of-Doris-SQL-parsing/Figure_3_en.png | Bin 0 -> 37183 bytes
 .../principle-of-Doris-SQL-parsing/Figure_4_cn.png | Bin 0 -> 130453 bytes
 .../principle-of-Doris-SQL-parsing/Figure_4_en.png | Bin 0 -> 214843 bytes
 .../principle-of-Doris-SQL-parsing/Figure_5_cn.png | Bin 0 -> 113584 bytes
 .../principle-of-Doris-SQL-parsing/Figure_5_en.png | Bin 0 -> 181047 bytes
 .../principle-of-Doris-SQL-parsing/Figure_6_cn.png | Bin 0 -> 79061 bytes
 .../principle-of-Doris-SQL-parsing/Figure_6_en.png | Bin 0 -> 79061 bytes
 .../principle-of-Doris-SQL-parsing/Figure_7_cn.png | Bin 0 -> 71241 bytes
 .../principle-of-Doris-SQL-parsing/Figure_7_en.png | Bin 0 -> 71241 bytes
 .../principle-of-Doris-SQL-parsing/Figure_8_cn.png | Bin 0 -> 51214 bytes
 .../principle-of-Doris-SQL-parsing/Figure_8_en.png | Bin 0 -> 51214 bytes
 .../principle-of-Doris-SQL-parsing/Figure_9_cn.png | Bin 0 -> 81455 bytes
 .../principle-of-Doris-SQL-parsing/Figure_9_en.png | Bin 0 -> 81455 bytes
 30 files changed, 625 insertions(+)

diff --git a/blog/principle-of-Doris-SQL-parsing.md 
b/blog/principle-of-Doris-SQL-parsing.md
new file mode 100644
index 00000000000..cacadfe20bc
--- /dev/null
+++ b/blog/principle-of-Doris-SQL-parsing.md
@@ -0,0 +1,314 @@
+---
+{
+'title': 'Doris analysis: Doris SQL principle analysis',
+'summary': "This article mainly introduces the principle of Doris SQL parsing. Since there are many types of SQL, this article focuses on the analysis of query SQL. Doris's SQL analysis is explained in terms of both algorithm principles and code implementation.",
+'date': '2022-08-25',
+'author': 'Apache Doris',
+'tags': ['Tech Sharing'],
+}
+---
+
+<!-- 
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+**Lead：**
+This article mainly introduces the principle of Doris SQL parsing.
+
+It focuses on three processes: generating a single-machine logical plan, generating a distributed logical plan, and generating a distributed physical plan. In the code, these correspond to four parts: Analyze, SinglePlan, DistributedPlan, and Schedule.
+
+Analyze performs preliminary processing on the AST. SinglePlan then optimizes the AST to generate a single-machine query plan. DistributedPlan splits the single-machine query plan into a distributed query plan. Finally, Schedule decides which machines the query plan is sent to for execution.
+
+Since there are many types of SQL, this article focuses on the analysis of query SQL, covering both the algorithm principles and the code implementation.
+
+# 1 Introduction to Doris
+Doris is an interactive SQL database based on the MPP architecture, mainly used for near-real-time reporting and multi-dimensional analysis. The Doris architecture is straightforward, with only two types of processes.
+
+- Frontend（FE）: It is mainly responsible for user request access, query 
parsing and planning, storage and management of metadata, and node 
management-related work.
+
+- Backend（BE）: It is mainly responsible for data storage and query plan 
execution.
+
+In Doris' storage engine, data is horizontally divided into several data shards (Tablets, also called data buckets). Each Tablet contains several rows of data. Tablets are grouped logically into Partitions: a Tablet belongs to exactly one Partition, and a Partition contains several Tablets. The Tablet is the smallest physical storage unit for operations such as data movement and replication.
+
+# 2 SQL parsing In Apache Doris
+SQL parsing in this article refers to **the process of generating a complete 
physical execution plan after a series of parsing of an SQL statement**.
+
+This process includes the following four steps: lexical analysis, syntax 
analysis, generating a logical plan, and generating a physical plan.
+
+<div align=center>
+<img alt="Figure1 The process of SQL parsing" 
src="../static/images/blogs/principle-of-Doris-SQL-parsing/Figure_1_en.png"/>
+</div>
+ <p align="center">Figure1 The process of SQL parsing</p>
+
+## 2.1 Lexical analysis
+Lexical analysis splits the SQL string into tokens, in preparation for syntax analysis.
+```undefined
+select ......  from ...... where ....... group by ..... order by ......
+
+SQL Tokens could be divided into the following categories:
+￮ Keywords (select, from, where)
+￮ operator (+, -, >=)
+￮ Open/close flag ((, CASE)
+￮ placeholder (?)
+￮ Comments
+￮ space
+......
+```
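
As a rough illustration of this step, the following Python sketch splits a query string into tokens with regular expressions. The token categories follow the list above; the patterns, the category names, and the `tokenize` function are assumptions made for this example and are not the JFlex rules that Doris actually uses.

```python
import re

# Token categories follow the list above. The regular expressions are
# illustrative assumptions, not Doris's real lexer rules.
TOKEN_SPEC = [
    ("COMMENT",     r"--[^\n]*"),
    ("SPACE",       r"\s+"),
    ("KEYWORD",     r"\b(?:select|from|where|group|order|by|case)\b"),
    ("NUMBER",      r"\d+"),
    ("IDENT",       r"[A-Za-z_][A-Za-z_0-9]*"),
    ("OPERATOR",    r">=|<=|<>|[+\-*/=<>]"),
    ("PLACEHOLDER", r"\?"),
    ("PUNCT",       r"[(),.]"),
]
TOKEN_RE = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC),
    re.IGNORECASE,
)


def tokenize(sql):
    """Return a list of (category, text) pairs, skipping spaces and comments."""
    tokens = []
    pos = 0
    while pos < len(sql):
        match = TOKEN_RE.match(sql, pos)
        if match is None:
            raise ValueError(f"unexpected character {sql[pos]!r} at offset {pos}")
        if match.lastgroup not in ("SPACE", "COMMENT"):
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens
```

Calling `tokenize("select pv from t where pv >= 10")` yields keyword, identifier, operator, and number tokens in source order, which is the input that the syntax analysis step consumes.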
+## 2.2 Syntax analysis
+Syntax analysis converts the tokens generated by lexical analysis into an abstract syntax tree based on the syntax rules, as shown in Figure 2.
+
+<div align=center>
+<img alt="Figure2 An example of an abstract syntax tree" width="60%" src="../static/images/blogs/principle-of-Doris-SQL-parsing/Figure_2_en.png"/>
+</div>
+<p align="center">Figure2 An example of an abstract syntax tree</p>
+
+## 2.3 Logical plan
+The logical plan converts the abstract syntax tree into relational algebra, represented as an operator tree in which each node is one computation on the data. The tree as a whole describes how the data is computed and in which direction it flows, as shown in Figure 3.
+<div align=center>
+<img alt="Figure3 Relational algebra example" width="20%" 
src="../static/images/blogs/principle-of-Doris-SQL-parsing/Figure_3_en.png"/>
+</div>
+ <p align="center">Figure3 Relational algebra example</p>
+
+## 2.4 Physical plan
+The physical plan is the plan that determines which computing operations are 
performed on which machines. It will be generated based on the logical plan, 
the distribution of machines, and the distribution of data.
+
+SQL parsing in Doris follows the same steps, but refines and optimizes them according to the characteristics of the Doris system structure and its data storage method, to make full use of the machines' computing power.
+
+# 3 Design goals
+The design goals of the Doris SQL parsing architecture are:
+
+1. Maximize Computational Parallelism
+
+2. Minimize network transfer of data
+
+3. Minimize the amount of data that needs to be scanned
+
+# 4 Architecture
+Doris SQL parsing includes five steps: lexical analysis, syntax analysis, 
generation of a stand-alone logical plan, generation of a distributed logical 
plan, and generation of a physical execution plan.
+
+In terms of code implementation, these correspond to the following five steps: Parse, Analyze, SinglePlan, DistributedPlan, and Schedule, as shown in Figure 4.
+
+<div align=center>
+<img alt="Figure 4 System Architecture Diagram" width="40%" 
src="../static/images/blogs/principle-of-Doris-SQL-parsing/Figure_4_en.png"/>
+</div>
+ <p align="center">Figure 4 System Architecture Diagram</p>
+
+The Parse phase will not be discussed in detail in this article. Analyze does some pre-processing of the AST. SinglePlan optimizes the AST and generates a stand-alone query plan. DistributedPlan splits the stand-alone query plan into a distributed query plan. The Schedule phase determines which machines the query plan is sent to for execution.
+
+**Since there are many types of SQL, this article focuses on the analysis of 
query SQL.**
+
+Figure 5 shows a simple query SQL parsing implementation in Doris.
+
+
+<div align=center>
+<img alt="Figure5 The parsing process of query sql in Doris" width="50%" 
src="../static/images/blogs/principle-of-Doris-SQL-parsing/Figure_5_en.png"/>
+</div>
+ <p align="center">Figure5 The parsing process of query sql in Doris</p>
+
+# 5 Parse Phase
+In the Parse phase, JFlex is used for lexical analysis and the Java CUP parser is used for syntax analysis, finally producing an AST (Abstract Syntax Tree). These are existing, mature technologies and will not be introduced in detail here.
+
+The AST is a tree-like structure that represents one SQL statement. Different types of queries (select, insert, show, set, alter table, create table, etc.) generate different data structures after Parse (SelectStmt, InsertStmt, ShowStmt, SetStmt, AlterStmt, AlterTableStmt, CreateTableStmt, etc.). However, they all inherit from Statement and perform some specific processing according to their own grammar rules. For example: for select type SQL, the SelectStmt structure [...]
+
+The SelectStmt structure contains SelectList, FromClause, WhereClause, GroupByClause, SortInfo and other structures. These structures in turn contain more basic data structures. For example, WhereClause contains BetweenPredicate, BinaryPredicate, CompoundPredicate, InPredicate, and so on.
+
+All structures in the AST are built by combining the basic expression structure, Expr, in various ways, as shown in Figure 6.
+
+<div align=center>
+<img alt="Figure6 Implementation of Abstract Syntax Tree AST in Doris" 
width="60%" 
src="../static/images/blogs/principle-of-Doris-SQL-parsing/Figure_6_en.png"/>
+</div>
+ <p align="center">Figure6 Implementation of Abstract Syntax Tree AST in 
Doris</p>
+
+# 6 Analyze Phase
+Analyze performs pre-processing and semantic analysis on the AST generated in the Parse phase, in preparation for generating the stand-alone logical plan.
+
+The abstract class StatementBase represents the abstract syntax tree. Its most important member function is analyze(), which carries out the work of the Analyze phase.
+
+Different types of queries (select, insert, show, set, alter table, create table, etc.) generate different data structures in the Parse phase (SelectStmt, InsertStmt, ShowStmt, SetStmt, AlterStmt, AlterTableStmt, CreateTableStmt, etc.). These data structures inherit from StatementBase and perform the analysis specific to their type of SQL by implementing the analyze() function.
+
+For example, analyzing a select query means calling analyze() on its sub-statements: SelectList, FromClause, GroupByClause, HavingClause, WhereClause, SortInfo, etc. These sub-statements then call analyze() on their own sub-structures, so that every scenario of every type of SQL is analyzed layer by layer. For example, WhereClause further analyzes the BetweenPredicate, BinaryPredicate, CompoundPredicate, InPredicate, etc. that it contains.
+
+**For query type SQL, Analyze performs several important steps:**
+
+- **Metadata identification and parsing**： Identify and parse the metadata involved in the SQL, such as Cluster, Database, Table, and Column, and determine which columns of which tables, in which databases and clusters, take part in the computation.
+
+- **SQL correctness check**：For example, a window function cannot use DISTINCT, a projection column must not be ambiguous, and the where clause cannot contain grouping operations.
+
+- **Simple SQL rewriting**：For example, expand select * into a list of all columns, or convert count distinct into a bitmap or hll function.
+
+- **Function correctness check**：Check whether the functions used in the SQL are consistent with the functions defined by the system, including parameter types and the number of parameters.
+
+- **Aliasing for Table and Column.**
+
+- **Type checking and conversion**： For example, when the types on the two sides of a binary expression are inconsistent, one of them must be converted (with BIGINT and DECIMAL, the BIGINT value is cast to DECIMAL).
+
+After the AST is analyzed, another rewrite pass is performed to simplify it or convert it into a unified form. The current rewrite algorithm is rule-based: it applies each rule to the AST from bottom to top, following the tree structure. If the AST changes after rewriting, analysis and rewriting start again, until the AST no longer changes.
+
+For example, constant expressions are simplified: 1 + 1 + 1 is rewritten as 3, and 1 > 2 is rewritten as False. Some statements are converted into a unified form: where in and where exists are rewritten as semi join, and where not in and where not exists are rewritten as anti join.
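
The bottom-up, repeat-until-stable rewriting described here can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: expressions are modeled as nested tuples `(operator, left, right)`, literals are Python ints or bools, column names are strings, and `fold_constants` and `rewrite_until_stable` are hypothetical names, not functions from the Doris code base.

```python
import operator

# Supported binary operators for this sketch only.
OPS = {"+": operator.add, "-": operator.sub, ">": operator.gt, "<": operator.lt}


def fold_constants(expr):
    """One rewrite rule: fold constant sub-expressions, children first."""
    if not isinstance(expr, tuple):
        return expr  # literal or column name, nothing to rewrite
    op, left, right = expr
    left, right = fold_constants(left), fold_constants(right)
    if isinstance(left, (int, bool)) and isinstance(right, (int, bool)):
        return OPS[op](left, right)  # both sides are constants: evaluate now
    return (op, left, right)


def rewrite_until_stable(expr, rules):
    """Apply every rule repeatedly until the expression stops changing."""
    while True:
        new_expr = expr
        for rule in rules:
            new_expr = rule(new_expr)
        if new_expr == expr:
            return expr
        expr = new_expr
```

With these definitions, `fold_constants(("+", ("+", 1, 1), 1))` evaluates to `3`, and the predicate `(">", "pv", ("+", 1, 1))` is rewritten to `(">", "pv", 2)` because only the constant side can be folded.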
+
+# 7 Generate stand-alone logical Plan phase
+At this stage, relational algebra, also known as the operator tree, is generated from the AST. Each node on the tree is an operator, representing one operation.
+
+As shown in Figure 7, ScanNode represents scan and read operations on a table. HashJoinNode represents the join operation: a hash table of the small table is built in memory, and the large table is traversed to find rows whose join key has the same value. Project is the projection operation, which specifies the columns that are output at the end. In Figure 7, only the citycode column is output.
+
+<div align=center>
+<img alt="Figure7 Example of a stand-alone logical plan" width="50%" 
src="../static/images/blogs/principle-of-Doris-SQL-parsing/Figure_7_en.png"/>
+</div>
+ <p align="center">Figure7 Example of a stand-alone logical plan</p>
+
+Without optimization, the generated relational algebra is very expensive to 
send to storage and execute.
+
+For query:
+```undefined
+select a.siteid, a.pv from table1 a join table2 b on a.siteid = b.siteid where 
a.citycode=122216 and b.username="test" order by a.pv limit 10
+```
+As shown in Figure 8, with unoptimized relational algebra, all columns must be read out and carried through a series of calculations, and only at the end are the siteid and pv columns selected for output. The large amount of unneeded column data wastes computing resources.
+
+When Doris generates relational algebra, it applies many optimizations: projection columns and query conditions are pushed into the scan operation as far as possible.
+
+<div align=center>
+<img alt="Figure8 Unoptimized relational algebra" width="20%" 
src="../static/images/blogs/principle-of-Doris-SQL-parsing/Figure_8_en.png"/>
+</div>
+ <p align="center">Figure8 Unoptimized relational algebra</p>
+
+**Specifically, this phase mainly does the following tasks:**
+
+- **Slot materialization**：Determine the columns that the expressions need to scan and compute. For example, the aggregate function expressions and Group By expressions of an aggregate node need to be materialized.
+
+- **Projection pushdown**：BE reads only the columns that are required when scanning.
+
+- **Predicate pushdown**：Push the filter conditions down to the Scan node as far as possible, provided the semantics remain correct.
+
+- **Partition and bucket pruning**：Based on the information in the filter conditions, determine which partitions and which buckets (tablets) need to be scanned.
+
+- **Join Reorder**：For Inner Join, Doris adjusts the order of the tables according to their number of rows, putting the large table first.
+
+- **Sort + Limit optimized to TopN**：An order by ... limit statement is converted into a TopN operator node, which is convenient for unified processing.
+
+- **MaterializedView selection**: The best materialized view is selected according to the columns required by the query, the columns used for filtering, sorting and Join, the number of rows, the number of columns, and other factors.
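
To make the Sort + Limit item more concrete, the sketch below shows one common way a TopN operator can be evaluated: with a bounded heap instead of a full sort. This is a generic illustration in Python using the standard `heapq` module; whether Doris's BE evaluates TopN this way is an assumption, and `top_n` is a hypothetical helper name.

```python
import heapq


def top_n(rows, key, limit):
    """Return the `limit` smallest rows by `key`.

    Gives the same result as sorted(rows, key=key)[:limit], but
    heapq.nsmallest tracks only a bounded set of candidate rows
    rather than sorting the whole input.
    """
    return heapq.nsmallest(limit, rows, key=key)
```

For the example query in this section (`order by a.pv limit 10`), the operator would be invoked as `top_n(rows, key=lambda r: r["pv"], limit=10)` on the joined rows.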
+
+Figure 9 shows an example of optimization. Doris optimizes while it generates relational algebra: each operator is optimized as soon as it is generated.
+
+<div align=center>
+<img alt="Figure9 The process of single-machine query plan optimization" 
width="100%" 
src="../static/images/blogs/principle-of-Doris-SQL-parsing/Figure_9_en.png"/>
+</div>
+ <p align="center">Figure9 The process of single-machine query plan 
optimization</p>
+
+# 8 Generate Distributed Plan Phase
+
+After the single-machine PlanNode tree is generated, it needs to be split into a distributed PlanFragment tree according to the distributed environment (a PlanFragment represents an independent execution unit). Because a table's data is distributed across multiple hosts, some computations can be parallelized.
+
+The primary purpose of this step is to maximize parallelism and data locality. The main strategy is to split off the nodes that can be executed in parallel and create a separate PlanFragment for them. An ExchangeNode replaces the split-off node to receive its data, and a DataSinkNode is added to the split-off node to transmit the computed data to the ExchangeNode for further processing.
+
+This step is recursive: it traverses the entire PlanNode tree from bottom to top and creates a PlanFragment for each leaf node. When a parent node is reached, it considers splitting off the child nodes that can be executed in parallel.
+
+For query operations, the join operation is the most common.
+
+**Doris currently supports four join algorithms:** broadcast join, hash 
partition join, colocate join, and bucket shuffle join.
+
+**broadcast join**：Send the small table to every machine where the large table is located and perform the hash join there. When the amount of data scanned from a table is small, the cost of a broadcast join is calculated and compared with the cost of a hash partition join, and the method with the lower cost is selected.
+
+**hash partition join**：When the data scanned from both tables is large, hash partition join is generally used. It traverses all the data in both tables, calculates the hash value of the join key, and takes that value modulo the number of machines in the cluster. Each row is sent to the machine selected in this way, where the hash join is performed.
+
+**colocate join**：If the two tables were created with the same specified data distribution, and the join key of the two tables is the same as the bucket key, the colocate join algorithm is used. Since the data distribution of the two tables is the same, the hash join is a purely local operation. It involves no data transmission, which significantly improves query performance.
+
+**bucket shuffle join**：When the join key is the bucketing key and only one partition is involved, the bucket shuffle join algorithm is preferred. Since bucketing is itself a way of dividing data, the rows of the right table only need to be hashed modulo the number of buckets and sent to the matching bucket of the left table. Only one copy of the right table's data is transmitted over the network, which greatly reduces network transmission, as shown in Figure 10.
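
The routing rule behind hash partition join (hash of the join key, modulo the number of machines) can be sketched as follows. This is a minimal Python illustration: `zlib.crc32` stands in for whatever hash function the engine really uses, and `target_node` and `shuffle` are hypothetical names chosen for this example.

```python
import zlib


def target_node(join_key, num_nodes):
    """Pick the node for a row: hash(join key) modulo the number of nodes.

    crc32 is only a stand-in for the engine's real hash function.
    """
    return zlib.crc32(str(join_key).encode()) % num_nodes


def shuffle(rows, key_index, num_nodes):
    """Distribute rows into one list per node according to their join key."""
    per_node = [[] for _ in range(num_nodes)]
    for row in rows:
        per_node[target_node(row[key_index], num_nodes)].append(row)
    return per_node
```

Because both tables are routed with the same function, rows that share a join key always arrive on the same node, so each node can run its hash join without looking at any other node's data.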
+
+<div align=center>
+<img alt="Figure10 Example of bucket shuffle join" width="40%" 
src="../static/images/blogs/principle-of-Doris-SQL-parsing/Figure_10_en.png"/>
+</div>
+ <p align="center">Figure10 Example of bucket shuffle join</p>
+
+Figure 11 shows the core process of creating a distributed logical plan from a single-machine logical plan that contains a HashJoinNode.
+
+- For PlanNodes, PlanFragments are created bottom-up.
+
+- If the node is a ScanNode, a PlanFragment is created directly, with this ScanNode as its RootPlanNode.
+
+- If the node is a HashJoinNode, the broadcastCost is calculated first, as a reference for choosing between broadcast join and hash partition join.
+
+- The join algorithm is chosen according to the conditions that hold.
+
+- If colocate join is used, the join is entirely local, so no splitting is required. Set the left child node of the HashJoinNode to the RootPlanNode of leftFragment and the right child node to the RootPlanNode of rightFragment; the HashJoinNode shares a PlanFragment with leftFragment, and rightFragment is deleted.
+
+- If bucket shuffle join is used, data from the right table needs to be sent 
to the left table. So first create an ExchangeNode, set the left child node of 
HashJoinNode as the RootPlanNode of leftFragment, the right child node as this 
ExchangeNode, share a PlanFragment with leftFragment, and specify the 
destination of rightFragment data to be sent to this ExchangeNode.
+
+- If broadcast join is used, the data from the right table needs to be sent to 
the left table. So first create an ExchangeNode, set the left child node of 
HashJoinNode as the RootPlanNode of leftFragment, the right child node as this 
ExchangeNode, share a PlanFragment with leftFragment, and specify the 
destination of rightFragment data to be sent to this ExchangeNode.
+
+- If hash partition join is used, the data in the left table and the right 
table must be split, and both left and right nodes need to be split out to 
create left ExchangeNode and right ExchangeNode respectively. HashJoinNode 
specifies the left and right nodes as left ExchangeNode and right ExchangeNode. 
Create a PlanFragment separately and specify RootPlanNode as this HashJoinNode. 
Finally, specify the data sending destination of leftFragment and rightFragment 
as left ExchangeNode and ri [...]
+
+<div align=center>
+<img alt="Figure11 The core process of HashJoinNode creating a distributed 
logic plan" width="50%" 
src="../static/images/blogs/principle-of-Doris-SQL-parsing/Figure_11_en.png"/>
+</div>
+ <p align="center">Figure11 The core process of HashJoinNode creating a 
distributed logic plan</p>
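
The fragment-splitting rules listed above can be condensed into a small Python sketch. It is a simplified model rather than the Doris implementation: only ScanNode and HashJoinNode are handled, the join strategy is passed in instead of being chosen by cost, and the `PlanNode`, `PlanFragment`, and `create_fragments` definitions are assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class PlanNode:
    kind: str  # "scan", "hash_join", or "exchange"
    children: list = field(default_factory=list)


@dataclass(eq=False)
class PlanFragment:
    root: PlanNode
    dest: PlanNode = None  # ExchangeNode that receives this fragment's output


def create_fragments(node, strategy, fragments):
    """Build fragments bottom-up and return the fragment that contains `node`."""
    if node.kind == "scan":
        fragment = PlanFragment(root=node)
        fragments.append(fragment)
        return fragment
    left = create_fragments(node.children[0], strategy, fragments)
    right = create_fragments(node.children[1], strategy, fragments)
    if strategy == "colocate":
        # Join is local: merge both sides into the left fragment.
        node.children = [left.root, right.root]
        left.root = node
        fragments.remove(right)
        return left
    if strategy in ("bucket_shuffle", "broadcast"):
        # Right side is sent over the network into an ExchangeNode.
        exchange = PlanNode("exchange")
        node.children = [left.root, exchange]
        left.root = node
        right.dest = exchange
        return left
    if strategy == "hash_partition":
        # Both sides are shuffled; the join runs in a new fragment.
        left_ex, right_ex = PlanNode("exchange"), PlanNode("exchange")
        node.children = [left_ex, right_ex]
        left.dest, right.dest = left_ex, right_ex
        join_fragment = PlanFragment(root=node)
        fragments.append(join_fragment)
        return join_fragment
    raise ValueError(f"unknown join strategy: {strategy}")
```

In this model a two-table join yields one fragment for colocate join, two for broadcast or bucket shuffle join, and three for hash partition join, which reflects how much data movement each strategy implies.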
+
+Figure 12 shows an example in which the join of two tables is converted into a PlanFragment tree. Three PlanFragments are generated, and the final output data passes through the ResultSinkNode.
+
+<div align=center>
+<img alt="Figure12 From stand-alone plan to distributed plan" width="50%" src="../static/images/blogs/principle-of-Doris-SQL-parsing/Figure_12_en.png"/>
+</div>
+ <p align="center">Figure12 From stand-alone plan to distributed plan</p>
+
+# 9 Schedule phase
+
+This step creates a distributed physical plan based on the distributed logical plan. It answers the following questions:
+
+- Which BE executes which PlanFragment
+
+- Which replica to choose for each Tablet to query
+
+- How to perform multi-instance concurrency
+
+**Figure 13 shows the core process for creating a distributed physical plan:**
+
+**a. Prepare phase**：Create a FragmentExecParams structure for each PlanFragment to hold all the parameters required to execute it. If a PlanFragment contains a DataSinkNode, find the destination PlanFragment of the data transmission and record this PlanFragment's FragmentExecParams as an input of the destination PlanFragment's FragmentExecParams.
+
+**b. computeScanRangeAssignment phase**：Different processing is performed for different types of joins.
+
+- computeScanRangeAssignmentByColocate: Handles colocate join. Since the data distribution in the buckets of the two joined tables is the same, the join is performed bucket by bucket, so this step determines which host is selected for each bucket. When allocating buckets to hosts, it tries to spread the buckets evenly across hosts.
+
+- computeScanRangeAssignmentByBucket: Handles bucket shuffle join, which is also performed bucket by bucket, so this step determines which host is selected for each bucket. It likewise tries to spread the buckets evenly across hosts.
+
+- computeScanRangeAssignmentByScheduler: Handles other types of joins. It determines which replica of each tablet a scanNode reads. A scanNode reads multiple tablets, and each tablet has multiple replicas. To spread the scan operations across as many machines as possible, improve concurrency, and reduce IO pressure, Doris uses the Round-Robin algorithm to distribute tablet scans across machines. For example, 100 tablets need to be scanned, e [...]
+
+**c. computeFragmentExecParams phase**：This phase determines which BE each PlanFragment is sent to for execution and how instance concurrency is handled. After the scan address of each tablet is determined, FragmentExecParams generates instances with the address as the dimension: if FragmentExecParams contains multiple addresses, multiple FInstanceExecParam instances are generated. If the concurrency is set, the execution instance of an address will be further sp [...]
+
+**d. Create result receiver phase**：The result receiver is where the final data is output after the query completes.
+
+**e. to thrift phase**：Create RPC requests based on the FInstanceExecParam of all PlanFragments, then send them to the BEs for execution. This completes the SQL parsing process.
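
The round-robin replica selection mentioned in step b can be illustrated with a short Python sketch. It is a deliberately simple model: each tablet lists the hosts that hold its replicas, and tablet number i reads from replica i modulo the replica count. The `assign_replicas` name and the input format are assumptions for this example; the real scheduler may weigh additional factors.

```python
def assign_replicas(tablets):
    """Choose one replica host per tablet in round-robin order.

    tablets: dict mapping tablet id -> list of hosts holding a replica.
    Returns a dict mapping tablet id -> chosen host.
    """
    assignment = {}
    for index, (tablet_id, hosts) in enumerate(sorted(tablets.items())):
        # Rotate through the replica list so consecutive tablets
        # land on different hosts.
        assignment[tablet_id] = hosts[index % len(hosts)]
    return assignment
```

When replicas are placed evenly, this rotation spreads the scans evenly too: six tablets with replicas on the same two hosts end up with three scans on each host.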
+
+<div align=center>
+<img alt="Figure13 The core process of creating a distributed physical plan" 
width="60%" 
src="../static/images/blogs/principle-of-Doris-SQL-parsing/Figure_13_en.png"/>
+</div>
+ <p align="center">Figure13 The core process of creating a distributed 
physical plan</p>
+
+Figure 14 is a simple example. The PlanFragment in the figure contains a ScanNode that scans three tablets. Each tablet has two replicas, and the cluster is assumed to have two hosts.
+
+The computeScanRangeAssignment stage determines that replicas 1, 3, 5, 8, 10, and 12 need to be scanned, where replicas 1, 3, and 5 are located on host1, and replicas 8, 10, and 12 are located on host2.
+
+If the global concurrency is set to 1, two FInstanceExecParam instances are created, one sent to host1 and one to host2. If the global concurrency is set to 3, three FInstanceExecParam instances are created on host1 and three on host2. Each instance scans one replica, which is equivalent to initiating 6 RPC requests.
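
The arithmetic of this example can be reproduced with a small Python sketch that groups the chosen replicas by host and then splits each host's work according to the concurrency setting. The `build_instances` function is hypothetical, and capping the number of instances at the number of replicas on a host is an assumption of this sketch.

```python
def build_instances(replica_hosts, concurrency):
    """Group replicas by host, then split each host's replicas into instances.

    replica_hosts: dict mapping replica id -> host.
    Returns a list of (host, [replica ids]) tuples, one per instance.
    """
    by_host = {}
    for replica_id, host in sorted(replica_hosts.items()):
        by_host.setdefault(host, []).append(replica_id)
    instances = []
    for host, replicas in by_host.items():
        # Assumption: never create more instances than replicas to scan.
        count = min(concurrency, len(replicas))
        for k in range(count):
            instances.append((host, replicas[k::count]))
    return instances
```

With the replicas from the example (1, 3, 5 on host1 and 8, 10, 12 on host2), a concurrency of 1 produces two instances and a concurrency of 3 produces six instances that each scan a single replica.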
+
+<div align=center>
+<img alt="Figure14 Process of generating a physical plan" width="60%" 
src="../static/images/blogs/principle-of-Doris-SQL-parsing/Figure_14_en.png"/>
+</div>
+ <p align="center">Figure14 Process of generating a physical plan</p>
+
+# 10 Summary
+This article first briefly introduces Doris and then the general process of SQL parsing: lexical analysis, syntax analysis, generating a logical plan, and generating a physical plan. It then presents the overall architecture of Doris SQL parsing. Finally, the five phases (Parse, Analyze, SinglePlan, DistributedPlan, and Schedule) are explained in detail, covering both the algorithm principles and the code implementation.
+
+Doris follows the standard approach to SQL parsing. On top of that, based on its underlying storage architecture and distributed characteristics, it applies many optimizations during SQL parsing to maximize parallelism and minimize network transmission, which greatly reduces the burden on the SQL execution layer.
diff --git 
a/i18n/zh-CN/docusaurus-plugin-content-blog/principle-of-Doris-SQL-parsing.md 
b/i18n/zh-CN/docusaurus-plugin-content-blog/principle-of-Doris-SQL-parsing.md
new file mode 100644
index 00000000000..51efa17509b
--- /dev/null
+++ 
b/i18n/zh-CN/docusaurus-plugin-content-blog/principle-of-Doris-SQL-parsing.md
@@ -0,0 +1,311 @@
+---
+{
+'title': 'Doris全面解析：Doris SQL 原理解析',
+'summary': "本文主要介绍了Doris 
SQL解析的原理。由于SQL类型有很多，本文侧重介绍查询SQL的解析，从算法原理和代码实现上深入讲解了Doris的SQL解析原理。",
+'date': '2022-08-25',
+'author': 'Apache Doris',
+'tags': ['技术解析'],
+}
+---
+
+<!-- 
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+**导读：**
+本文主要介绍了Doris SQL解析的原理。
+
+重点讲述了生成单机逻辑计划，生成分布式逻辑计划，生成分布式物理计划的过程。对应于代码实现是Analyze，SinglePlan，DistributedPlan，Schedule四个部分。
+
+Analyze负责对AST进行前期的一些处理，SinglePlan根据AST进行优化生成单机查询计划，DistributedPlan将单机的查询计划拆成分布式的查询计划，Schedule阶段负责决定查询计划下发到哪些机器上执行。
+
+由于SQL类型有很多，本文侧重介绍查询SQL的解析，从算法原理和代码实现上深入讲解了Doris的SQL解析原理。
+
+# 1 Doris简介
+Doris是基于MPP架构的交互式SQL数据仓库，主要用于解决近实时的报表和多维分析。
+
+Doris分成两部分FE和BE，FE 负责存储以及维护集群元数据、接收、解析、查询、设计规划整体查询流程，BE 负责数据存储和具体的实施过程。
+
+在 Doris 的存储引擎中，用户数据被水平划分为若干个数据分片（Tablet，也称作数据分桶）。每个 Tablet 包含若干数据行。多个 Tablet 
在逻辑上归属于不同的分区Partition。一个 Tablet 只属于一个 Partition。而一个 Partition 包含若干个 
Tablet。Tablet 是数据移动、复制等操作的最小物理存储单元。
+
+# 2 SQL解析简介
+SQL解析在这篇文章中指的是**将一条sql语句经过一系列的解析最后生成一个完整的物理执行计划的过程**。
+
+这个过程包括以下四个步骤：词法分析，语法分析，生成逻辑计划，生成物理计划。 如图1所示：
+
+<div align=center>
+<img alt="图 1 SQL解析的流程" 
src="../../../static/images/blogs/principle-of-Doris-SQL-parsing/Figure_1_cn.png"/>
+</div>
+ <p align="center">图 1 SQL解析的流程</p>
+
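+## 2.1 Lexical Analysis
+Lexical analysis splits the SQL string into individual tokens, in preparation for syntax analysis.
+```undefined
+select ......  from ...... where ....... group by ..... order by ......
+
+SQL tokens can be divided into the following categories:
+￮ Keywords (select, from, where)
+￮ Operators (+, -, >=)
+￮ Opening/closing markers ((, CASE)
+￮ Placeholders (?)
+￮ Comments
+￮ Whitespace
+......
+```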
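To make the token categories above concrete, here is a minimal tokenizer sketch in Python. It is only an illustration: Doris itself uses JFlex for this step, and the names `KEYWORDS`, `TOKEN_PATTERN`, and `tokenize` as well as the reduced keyword and operator sets are assumptions made for this example.

```python
import re

# Hypothetical, heavily reduced keyword set; a real SQL lexer knows many more.
KEYWORDS = {"select", "from", "where", "group", "by", "order", "limit"}

# One named group per token category. Order matters: comments are tried
# before operators so that "--" is not read as two minus signs.
TOKEN_PATTERN = re.compile(
    r"""
    (?P<space>\s+)
  | (?P<comment>--[^\n]*)
  | (?P<string>"[^"]*"|'[^']*')
  | (?P<number>\d+)
  | (?P<word>[A-Za-z_][A-Za-z_0-9]*)
  | (?P<operator>>=|<=|<>|!=|[-+*/=<>])
  | (?P<placeholder>\?)
  | (?P<punct>[(),.])
    """,
    re.VERBOSE,
)


def tokenize(sql):
    """Split an SQL string into (kind, text) pairs, dropping whitespace and comments."""
    tokens = []
    pos = 0
    while pos < len(sql):
        match = TOKEN_PATTERN.match(sql, pos)
        if match is None:
            raise ValueError(f"unexpected character {sql[pos]!r} at offset {pos}")
        kind, text = match.lastgroup, match.group()
        pos = match.end()
        if kind in ("space", "comment"):
            continue
        if kind == "word":
            # Words are keywords if they are in the keyword set, otherwise identifiers.
            kind = "keyword" if text.lower() in KEYWORDS else "identifier"
        tokens.append((kind, text))
    return tokens
```

Calling `tokenize("select pv from t where pv >= 10")` yields a flat list of tagged tokens that a parser can then consume.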
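+## 2.2 Syntax Analysis
+Syntax analysis converts the tokens produced by lexical analysis into an abstract syntax tree (AST) according to the grammar rules, as shown in Figure 2.
+
+<div align=center>
+<img alt="Figure 2 Example of an abstract syntax tree" width="60%" src="../../../static/images/blogs/principle-of-Doris-SQL-parsing/Figure_2_cn.png"/>
+</div>
+<p align="center">Figure 2 Example of an abstract syntax tree</p>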
+
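+## 2.3 Logical Plan
+The logical plan converts the abstract syntax tree into relational algebra. The relational algebra is an operator tree: each node represents one way of computing on the data, and the tree as a whole describes how the data is computed and in which direction it flows, as shown in Figure 3.
+
+<div align=center>
+<img alt="Figure 3 Example of relational algebra" width="20%" src="../../../static/images/blogs/principle-of-Doris-SQL-parsing/Figure_3_cn.png"/>
+</div>
+ <p align="center">Figure 3 Example of relational algebra</p>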
+
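+## 2.4 Physical Plan
+The physical plan builds on the logical plan and decides, based on how the machines and the data are distributed, which computations run on which machines.
+
+SQL parsing in Doris follows these same steps, but refines and optimizes them according to the structure of the Doris system and the way it stores data, in order to make the most of the machines' computing power.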
+
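+# 3 Design Goals
+The Doris SQL parsing architecture is designed with the following goals:
+
+1. Maximize the parallelism of computation
+
+2. Minimize the network transfer of data
+
+3. Minimize the amount of data that needs to be scanned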
+
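+# 4 Overall Architecture
+Doris SQL parsing consists of five steps: lexical analysis, syntax analysis, single-node logical plan generation, distributed logical plan generation, and physical execution plan generation.
+
+In the code, these correspond to the following five steps: Parse, Analyze, SinglePlan, DistributedPlan, Schedule.
+
+<div align=center>
+<img alt="Figure 4 Overall system architecture" width="40%" src="../../../static/images/blogs/principle-of-Doris-SQL-parsing/Figure_4_cn.png"/>
+</div>
+ <p align="center">Figure 4 Overall system architecture</p>
+
+As shown in Figure 4, the Parse stage is not covered in detail in this article. Analyze performs preliminary processing on the AST; SinglePlan optimizes the AST into a single-node query plan; DistributedPlan splits the single-node query plan into a distributed query plan; and Schedule decides which machines the query plan is sent to for execution.
+
+**Because there are many types of SQL, this article focuses on the parsing of query SQL.**
+
+Figure 5 shows how a simple query SQL is parsed in Doris.
+
+<div align=center>
+<img alt="Figure 5 Parsing a query SQL in Doris" width="50%" src="../../../static/images/blogs/principle-of-Doris-SQL-parsing/Figure_5_cn.png"/>
+</div>
+ <p align="center">Figure 5 Parsing a query SQL in Doris</p>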
+
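+# 5 Parse Stage
+Lexical analysis uses JFlex and syntax analysis uses the Java CUP parser; together they produce the abstract syntax tree (AST). Both are existing, mature technologies and are not described in detail here.
+
+The AST is a tree structure that represents one SQL statement. Different types of statements (select, insert, show, set, alter table, create table, and so on) produce different data structures after the Parse stage (SelectStmt, InsertStmt, ShowStmt, SetStmt, AlterStmt, AlterTableStmt, CreateTableStmt, and so on), but they all inherit from StatementBase and apply some specific processing according to their own grammar rules. For example, a select statement produces a SelectStmt structure after Parse.
+
+SelectStmt contains structures such as SelectList, FromClause, WhereClause, GroupByClause, and SortInfo. These in turn contain more basic data structures; for example, WhereClause contains BetweenPredicate (between expression), BinaryPredicate (binary expression), CompoundPredicate (and/or compound expression), InPredicate (in expression), and so on.
+
+All structures in the AST are built by combining the basic expression structure Expr, as shown in Figure 6.
+
+<div align=center>
+<img alt="Figure 6 Implementation of the abstract syntax tree (AST) in Doris" width="60%" src="../../../static/images/blogs/principle-of-Doris-SQL-parsing/Figure_6_cn.png"/>
+</div>
+ <p align="center">Figure 6 Implementation of the abstract syntax tree (AST) in Doris</p>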
+
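+# 6 Analyze Stage
+Analyze performs preliminary processing and semantic analysis on the abstract syntax tree (AST) generated by the Parse stage, in preparation for generating the single-node logical plan.
+
+The abstract syntax tree is represented by the abstract class StatementBase. Its most important member function is analyze(), which carries out the work of the Analyze stage.
+
+Different types of statements (select, insert, show, set, alter table, create table, and so on) produce different data structures after the Parse stage (SelectStmt, InsertStmt, ShowStmt, SetStmt, AlterStmt, AlterTableStmt, CreateTableStmt, and so on). These data structures inherit from StatementBase and implement analyze() to perform the analysis specific to each type of SQL.
+
+For example, analyzing a select query turns into calls to analyze() on the sub-clauses of the select statement: SelectList, FromClause, GroupByClause, HavingClause, WhereClause, SortInfo, and so on. Each sub-clause then calls analyze() on its own substructures, and this continues level by level until every case of every type of SQL has been analyzed. For example, WhereClause goes on to analyze the BetweenPredicate (between expression), BinaryPredicate (binary expression), CompoundPredicate (and/or compound expression), InPredicate (in expression), and so on that it contains.
+
+**For query SQL, this stage includes the following important tasks:**
+
+- **Identifying and resolving metadata**: identify and resolve the Cluster, Database, Table, Column, and other metadata involved in the SQL, to determine which columns of which tables in which database of which cluster to compute on.
+
+- **SQL validity checks**: window functions cannot use DISTINCT, projection columns must not be ambiguous, the where clause cannot contain grouping operations, and so on.
+
+- **Simple SQL rewrites**: for example, expanding select * into a select of all columns, or converting count distinct into bitmap or hll functions.
+
+- **Function handling**: check that the functions used in the SQL match the system-defined functions, including argument types and argument counts.
+
+- **Alias handling for Tables and Columns**
+
+- **Type checking and conversion**: for example, when the two sides of a binary expression have different types, one of them must be converted (when BIGINT is compared with DECIMAL, the BIGINT is cast to DECIMAL).
+
+After the AST has been analyzed, a rewrite step is applied to simplify it or to convert it to a uniform form. The current rewrite algorithm is rule-based: it walks the AST bottom-up and applies each rule in turn. If the AST changes after rewriting, analyze and rewrite are run again, until the AST no longer changes.
+
+For example, constant expressions are simplified: 1 + 1 + 1 is rewritten to 3, and 1 > 2 is rewritten to False. Some statements are converted to a uniform form: where in and where exists are rewritten to semi join, and where not in and where not exists are rewritten to anti join.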
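The rule-based rewrite loop can be sketched as follows. This is a simplified Python illustration, not the Doris implementation: the classes `Literal`, `Column`, and `BinaryOp`, the `OPS` table, and the functions `fold_constants` and `rewrite` are hypothetical names introduced here, and only a single rule (constant folding) is shown.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Literal:
    value: Union[int, bool]


@dataclass(frozen=True)
class Column:
    name: str


@dataclass(frozen=True)
class BinaryOp:
    op: str
    left: "Expr"
    right: "Expr"


Expr = Union[Literal, Column, BinaryOp]

# Only the two operators used in the examples above are supported.
OPS = {"+": lambda a, b: a + b, ">": lambda a, b: a > b}


def fold_constants(expr):
    """Bottom-up rule: replace an operator over two literals with its value."""
    if not isinstance(expr, BinaryOp):
        return expr
    left = fold_constants(expr.left)
    right = fold_constants(expr.right)
    if isinstance(left, Literal) and isinstance(right, Literal):
        return Literal(OPS[expr.op](left.value, right.value))
    return BinaryOp(expr.op, left, right)


def rewrite(expr, rules):
    """Apply every rule repeatedly until the tree no longer changes."""
    while True:
        new_expr = expr
        for rule in rules:
            new_expr = rule(new_expr)
        if new_expr == expr:
            return expr
        expr = new_expr
```

With this sketch, `rewrite(BinaryOp("+", BinaryOp("+", Literal(1), Literal(1)), Literal(1)), [fold_constants])` returns `Literal(3)`, while sub-expressions that contain a `Column` are left in place.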
+
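+# 7 Generating the Single-Node Logical Plan
+This stage generates relational algebra from the AST, commonly known as the operator tree. Each node of the tree is an operator that represents one operation.
+
+As shown in Figure 7, ScanNode represents a scan of a table, which reads the table's data. HashJoinNode represents a join: the smaller table is built into a hash table in memory, and the larger table is traversed to find rows with matching join keys. Project represents a projection, that is, the columns that finally need to be output; in Figure 7 only the citycode column is output.
+
+<div align=center>
+<img alt="Figure 7 Example of a single-node logical plan" width="50%" src="../../../static/images/blogs/principle-of-Doris-SQL-parsing/Figure_7_cn.png"/>
+</div>
+ <p align="center">Figure 7 Example of a single-node logical plan</p>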
+
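+Without optimization, the generated relational algebra would be very expensive to push down to storage for execution.
+
+For the query:
+```undefined
+select a.siteid, a.pv from table1 a join table2 b on a.siteid = b.siteid where a.citycode=122216 and b.username="test" order by a.pv limit 10
+```
+the unoptimized relational algebra, shown in Figure 8, reads out all columns, runs a series of computations on them, and only at the end selects the two output columns siteid and pv. The large amount of unused column data wastes computing resources.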
+
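+When generating the relational algebra, Doris applies many optimizations, pushing projection columns and query conditions down into the scan operation wherever possible.
+
+<div align=center>
+<img alt="Figure 8 Unoptimized relational algebra" width="20%" src="../../../static/images/blogs/principle-of-Doris-SQL-parsing/Figure_8_cn.png"/>
+</div>
+ <p align="center">Figure 8 Unoptimized relational algebra</p>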
+
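+**Specifically, this stage does the following:**
+
+- **Slot materialization**: determines which columns an expression needs to have scanned and computed; for example, the aggregate function expressions and Group By expressions of an aggregation node need to be materialized.
+
+- **Projection pushdown**: BE scans only the columns that must be read.
+
+- **Predicate pushdown**: filter conditions are pushed down to the Scan node as far as possible, provided the semantics stay correct.
+
+- **Partition and bucket pruning**: based on the information in the filter conditions, determines which partitions, and which buckets' tablets, need to be scanned.
+
+- **Join Reorder**: for Inner Join, Doris adjusts the order of the tables according to their row counts, placing the larger table first.
+
+- **Sort + Limit optimized into TopN**: an order by ... limit statement is converted into a TopN operator node for uniform handling.
+
+- **MaterializedView selection**: the best materialized view is chosen based on factors such as the columns the query needs, the filter, sort, and Join columns, the number of rows, and the number of columns.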
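As an illustration of why Sort + Limit can be treated as a single TopN operator, the following Python sketch compares a full sort followed by a limit with a bounded TopN selection. The function names are hypothetical, and using `heapq.nsmallest` is an assumption made for this example; it does not describe how the Doris TopN node is implemented.

```python
import heapq


def sort_then_limit(rows, key, limit):
    """Naive plan: sort every row, then keep the first `limit` rows."""
    return sorted(rows, key=key)[:limit]


def top_n(rows, key, limit):
    """TopN plan: keep only the `limit` smallest rows while scanning the input."""
    return heapq.nsmallest(limit, rows, key=key)
```

Both functions return the same rows for `order by pv limit N`, but the TopN form never has to hold a fully sorted copy of the input.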
+
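+Figure 9 shows an example of the optimization. Doris optimizes during the generation of the relational algebra, generating and optimizing at the same time.
+
+<div align=center>
+<img alt="Figure 9 Optimization of a single-node query plan" width="100%" src="../../../static/images/blogs/principle-of-Doris-SQL-parsing/Figure_9_cn.png"/>
+</div>
+ <p align="center">Figure 9 Optimization of a single-node query plan</p>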
+
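+# 8 Generating the Distributed Plan
+
+Once the single-node PlanNode tree is available, it is split into a distributed PlanFragment tree according to the distributed environment (a PlanFragment represents an independent unit of execution). Since a table's data is spread across multiple hosts, some computations can run in parallel.
+
+The main goals of this step are to maximize parallelism and data locality. The main approach is to split off the nodes that can run in parallel into their own PlanFragment and to put an ExchangeNode in place of each split-off node to receive its data. A DataSinkNode is added to the split-off node to send the computed data to the ExchangeNode for further processing.
+
+This step works recursively, bottom-up, over the whole PlanNode tree and creates a PlanFragment for each leaf node. When it reaches a parent node, it considers splitting off the child nodes that can run in parallel; the parent node and the remaining child nodes form a parent PlanFragment. Each split-off child node gets a DataSinkNode added as its parent, forming a child PlanFragment that points to the parent PlanFragment. This determines the direction of data flow.
+
+Join is the most common operation in queries.
+
+**Doris currently supports four join algorithms**: broadcast join, hash partition join, colocate join, and bucket shuffle join.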
+
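+**broadcast join**: sends the small table to every machine that holds the large table, then performs a hash join. When one table scans out a small amount of data, the cost of broadcast join is computed and compared with the cost of hash partition join, and the cheaper one is chosen.
+
+**hash partition join**: generally used when both tables scan out large amounts of data. It traverses all rows of each table, computes the hash of the key, and takes it modulo the number of nodes in the cluster; each row is sent to the machine selected this way, where the hash join is performed.
+
+**colocate join**: if two tables are declared at creation time to share the same data distribution, colocate join is used when their join key matches the bucketing key. Because the two tables are distributed identically, the hash join is effectively local and involves no data transfer, which greatly improves query performance.
+
+**bucket shuffle join**: preferred when the join key is the bucketing key and only one partition is involved. Because bucketing is itself a way of partitioning the data, the right table only needs to be hashed modulo the left table's bucket count. Only one copy of the right table's data is then sent over the network, which greatly reduces network transfer, as shown in Figure 10.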
+
+<div align=center>
+<img alt="Figure 10 Example of bucket shuffle join" width="40%" src="../../../static/images/blogs/principle-of-Doris-SQL-parsing/Figure_10_cn.png"/>
+</div>
+ <p align="center">Figure 10 Example of bucket shuffle join</p>
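The hash partition join described above can be sketched in Python as follows. This is a single-process simulation under stated assumptions: nodes are represented by integers, `zlib.crc32` stands in for whatever hash function the engine really uses, and the helper names `target_node`, `shuffle`, and `hash_partition_join` are hypothetical.

```python
import zlib
from collections import defaultdict


def target_node(join_key, num_nodes):
    """Pick a node by hashing the join key and taking it modulo the node count."""
    # zlib.crc32 is only a deterministic stand-in for the engine's hash function.
    return zlib.crc32(str(join_key).encode("utf-8")) % num_nodes


def shuffle(rows, key_name, num_nodes):
    """Group rows by the node they would be sent to."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[target_node(row[key_name], num_nodes)].append(row)
    return partitions


def hash_partition_join(left_rows, right_rows, key_name, num_nodes):
    """Shuffle both inputs by join key, then run a local hash join on each node."""
    left_parts = shuffle(left_rows, key_name, num_nodes)
    right_parts = shuffle(right_rows, key_name, num_nodes)
    joined = []
    for node in range(num_nodes):
        # Build a hash table from this node's share of the right table.
        table = defaultdict(list)
        for right in right_parts.get(node, []):
            table[right[key_name]].append(right)
        # Probe it with this node's share of the left table.
        for left in left_parts.get(node, []):
            for right in table.get(left[key_name], []):
                joined.append({**right, **left})
    return joined
```

Because rows with equal keys always hash to the same node, every matching pair meets on exactly one node, so the union of the local join results equals the full join.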
+
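+Figure 11 shows the core flow for creating a distributed logical plan from a single-node logical plan that contains a HashJoinNode.
+
+- PlanFragments are created for the PlanNodes bottom-up.
+
+- For a ScanNode, a PlanFragment is created directly, with this ScanNode as its RootPlanNode.
+
+- For a HashJoinNode, broadcastCost is computed first, as a reference for choosing between broadcast join and hash partition join.
+
+- The join algorithm is chosen according to various conditions.
+
+- With colocate join, the join runs locally, so no split is needed. The HashJoinNode's left child is set to the RootPlanNode of leftFragment and its right child to the RootPlanNode of rightFragment. The HashJoinNode shares one PlanFragment with leftFragment, and rightFragment is deleted.
+
+- With bucket shuffle join, the right table's data must be sent to the left table. An ExchangeNode is created first; the HashJoinNode's left child is set to the RootPlanNode of leftFragment and its right child to this ExchangeNode. The HashJoinNode shares one PlanFragment with leftFragment, and this ExchangeNode is set as the destination of rightFragment's data.
+
+- With broadcast join, the right table's data must be sent to the left table. An ExchangeNode is created first; the HashJoinNode's left child is set to the RootPlanNode of leftFragment and its right child to this ExchangeNode. The HashJoinNode shares one PlanFragment with leftFragment, and this ExchangeNode is set as the destination of rightFragment's data.
+
+- With hash partition join, the data of both the left and right tables must be repartitioned, so both children are split off. A left ExchangeNode and a right ExchangeNode are created and set as the HashJoinNode's left and right children. A separate PlanFragment is created with this HashJoinNode as its RootPlanNode. Finally, the left ExchangeNode and right ExchangeNode are set as the data destinations of leftFragment and rightFragment respectively.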
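The steps above can be sketched as a small recursive function. This is a simplified Python model, not the Doris planner: the classes `PlanNode` and `PlanFragment`, the `dest` field, and the `join_type` argument are hypothetical, the join algorithm is passed in instead of being chosen by cost, and only leaf scans and two-child joins are handled.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class PlanNode:
    name: str
    children: List["PlanNode"] = field(default_factory=list)


@dataclass(eq=False)
class PlanFragment:
    root: PlanNode
    # ExchangeNode that receives this fragment's output (None for the top fragment).
    dest: Optional[PlanNode] = None


def create_fragments(node, fragments, join_type):
    """Return the fragment containing `node`; every fragment created is appended to `fragments`."""
    if not node.children:
        # Leaf (ScanNode): create a fragment rooted at the scan itself.
        fragment = PlanFragment(root=node)
        fragments.append(fragment)
        return fragment

    left = create_fragments(node.children[0], fragments, join_type)
    right = create_fragments(node.children[1], fragments, join_type)

    if join_type == "colocate":
        # Join runs locally: merge the right fragment into the left one.
        node.children = [left.root, right.root]
        fragments.remove(right)
        left.root = node
        return left

    if join_type in ("broadcast", "bucket_shuffle"):
        # Only the right side is sent over the network.
        exchange = PlanNode("ExchangeNode")
        node.children = [left.root, exchange]
        right.dest = exchange
        left.root = node
        return left

    # Hash partition join: both sides are repartitioned into a new fragment.
    left_exchange, right_exchange = PlanNode("ExchangeNode"), PlanNode("ExchangeNode")
    node.children = [left_exchange, right_exchange]
    left.dest, right.dest = left_exchange, right_exchange
    fragment = PlanFragment(root=node)
    fragments.append(fragment)
    return fragment
```

For a plan with one HashJoinNode over two ScanNodes, this sketch yields one fragment for colocate join, two for broadcast or bucket shuffle join, and three for hash partition join.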
+
+<div align=center>
+<img alt="Figure 11 Core flow for creating a distributed logical plan from a HashJoinNode" width="50%" src="../../../static/images/blogs/principle-of-Doris-SQL-parsing/Figure_11_cn.png"/>
+</div>
+ <p align="center">Figure 11 Core flow for creating a distributed logical plan from a HashJoinNode</p>
+
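+Figure 12 shows an example of a two-table join after it has been converted into a PlanFragment tree; three PlanFragments are generated in total. The final data is output through the ResultSinkNode.
+
+<div align=center>
+<img alt="Figure 12 From the single-node plan to the distributed plan" width="50%" src="../../../static/images/blogs/principle-of-Doris-SQL-parsing/Figure_12_cn.png"/>
+</div>
+ <p align="center">Figure 12 From the single-node plan to the distributed plan</p>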
+
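+# 9 Schedule Stage
+
+This step creates the distributed physical plan from the distributed logical plan. It mainly answers the following questions:
+
+- Which BE executes which PlanFragment
+
+- Which replica of each Tablet is queried
+
+- How to run multiple instances concurrently
+
+**Figure 13 shows the core flow for creating the distributed physical plan:**
+
+**a. prepare stage**: a FragmentExecParams structure is created for each PlanFragment, representing all the parameters needed to execute it. If a PlanFragment contains a DataSinkNode, the destination PlanFragment is located, and this PlanFragment's FragmentExecParams is set as an input of the destination PlanFragment's FragmentExecParams.
+
+**b. computeScanRangeAssignment stage**: different types of join are handled differently.
+
+- computeScanRangeAssignmentByColocate: handles colocate join. Because the data in the buckets of the two joined tables is distributed identically, the join is bucket-based, so this step decides which host each bucket uses. When assigning buckets to hosts, it tries to give each host roughly the same number of buckets.
+
+- computeScanRangeAssignmentByBucket: handles bucket shuffle join. This is also a bucket-based operation, so this step likewise decides which host each bucket uses, again trying to give each host roughly the same number of buckets.
+
+- computeScanRangeAssignmentByScheduler: handles the other types of join. It decides which replica of each tablet a scanNode reads. A scanNode reads multiple tablets, and each tablet has multiple replicas. To spread scans across as many machines as possible, improving concurrency and reducing IO pressure, Doris uses a Round-Robin algorithm so that tablet scans are distributed over the machines as evenly as possible. For example, with 100 tablets to scan, 3 replicas per tablet, and 10 machines, the assignment ensures that each machine scans 10 tablets.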
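The balancing goal in the 100-tablet example can be sketched in Python as below. Note the swap: the text names a Round-Robin algorithm, while this sketch uses a greedy "least-loaded replica host" choice as a stand-in that reaches the same even spread. The function name `assign_scan_ranges` and the replica placement in the usage note are assumptions for illustration only.

```python
from collections import Counter


def assign_scan_ranges(tablet_replicas):
    """Choose one replica host per tablet, always taking the least-loaded candidate.

    tablet_replicas maps a tablet id to the list of hosts that hold a replica.
    Returns a dict mapping each tablet id to the chosen host.
    """
    load = Counter()  # number of tablets assigned to each host so far
    assignment = {}
    for tablet_id, hosts in tablet_replicas.items():
        # Ties are broken by the order of the replica list.
        chosen = min(hosts, key=lambda host: load[host])
        assignment[tablet_id] = chosen
        load[chosen] += 1
    return assignment
```

With a hypothetical placement in which tablet `t` has replicas on hosts `t % 10`, `(t + 1) % 10`, and `(t + 2) % 10`, assigning 100 tablets this way gives each of the 10 hosts exactly 10 tablets to scan.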
+
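+**c. computeFragmentExecParams stage**: this stage decides which BE each PlanFragment is sent to and how instance concurrency is handled. Once the scan address of every tablet is known, FragmentExecParams is expanded into multiple instances by address: if FragmentExecParams contains several addresses, one FInstanceExecParam instance is generated for each. If a parallelism degree is set, the execution instance for one address is further split into multiple FInstanceExecParam. Bucket shuffle join and colocate join get some special handling, but the basic idea is the same. After a FInstanceExecParam is created, it is assigned a unique ID to make tracing easier. If FragmentExecParams contains an ExchangeNode, the number of senders is computed so that it is known how many senders data must be received from. Finally, FragmentExecParams determines its destinations and fills in the destination addresses.
+
+**d. create result receiver stage**: the result receiver is where the final data is output once the query completes.
+
+**e. to thrift stage**: RPC requests are created from the FInstanceExecParam of all PlanFragments and sent to the BEs for execution. This completes the whole SQL parsing process.
+
+<div align=center>
+<img alt="Figure 13 Core flow for creating the distributed physical plan" width="60%" src="../../../static/images/blogs/principle-of-Doris-SQL-parsing/Figure_13_cn.png"/>
+</div>
+ <p align="center">Figure 13 Core flow for creating the distributed physical plan</p>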
+
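+Figure 14 shows a simple example. The PlanFragment in the figure contains one ScanNode, which scans 3 tablets; each tablet has 2 replicas, and the cluster is assumed to have 2 hosts.
+
+The computeScanRangeAssignment stage determines that replicas 1, 3, 5, 8, 10, and 12 need to be scanned; replicas 1, 3, and 5 are on host1, and replicas 8, 10, and 12 are on host2.
+
+If the global parallelism is set to 1, 2 FInstanceExecParam instances are created and sent to host1 and host2 for execution. If the global parallelism is set to 3, 3 FInstanceExecParam instances are created on host1 and 3 on host2; each instance scans one replica, which amounts to issuing 6 RPC requests.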
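The instance counts in this example can be reproduced with a short Python sketch. It is a simplified model under stated assumptions: an instance is represented as a `(host, replicas)` pair, the function name `build_instances` is hypothetical, and the way replicas are dealt out to the instances of one host is an illustrative choice rather than the Doris logic.

```python
def build_instances(ranges_by_host, parallelism):
    """Split each host's scan ranges into at most `parallelism` execution instances.

    ranges_by_host maps a host to the replicas it has to scan.
    Returns a list of (host, replicas) pairs, one per instance.
    """
    instances = []
    for host, ranges in ranges_by_host.items():
        # Never create more instances than there are ranges, and at least one per host.
        count = max(1, min(parallelism, len(ranges)))
        for i in range(count):
            # Deal the ranges out to the instances in turn.
            instances.append((host, ranges[i::count]))
    return instances
```

With `{"host1": [1, 3, 5], "host2": [8, 10, 12]}` as input, a parallelism of 1 produces 2 instances and a parallelism of 3 produces 6 instances that each scan one replica, matching the numbers in the example.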
+
+<div align=center>
+<img alt="Figure 14 The process of generating the physical plan" width="60%" src="../../../static/images/blogs/principle-of-Doris-SQL-parsing/Figure_14_cn.png"/>
+</div>
+ <p align="center">Figure 14 The process of generating the physical plan</p>
+
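+# 10 Summary
+This article first briefly introduced Doris, then described the general SQL parsing flow: lexical analysis, syntax analysis, logical plan generation, and physical plan generation. It then gave an overview of the overall architecture of SQL parsing in Doris, and finally explained the five stages Parse, Analyze, SinglePlan, DistributedPlan, and Schedule in detail, from both the algorithms and the code implementation.
+
+Doris follows the common approach to SQL parsing, but applies many optimizations based on its underlying storage architecture and its distributed nature, achieving maximum parallelism and minimal network transfer and greatly reducing the burden on the SQL execution layer.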
\ No newline at end of file
diff --git 
a/static/images/blogs/principle-of-Doris-SQL-parsing/Figure_10_cn.png 
b/static/images/blogs/principle-of-Doris-SQL-parsing/Figure_10_cn.png
new file mode 100644
index 00000000000..8b597abfd7d
Binary files /dev/null and 
b/static/images/blogs/principle-of-Doris-SQL-parsing/Figure_10_cn.png differ
diff --git 
a/static/images/blogs/principle-of-Doris-SQL-parsing/Figure_10_en.png 
b/static/images/blogs/principle-of-Doris-SQL-parsing/Figure_10_en.png
new file mode 100644
index 00000000000..8b597abfd7d
Binary files /dev/null and 
b/static/images/blogs/principle-of-Doris-SQL-parsing/Figure_10_en.png differ
diff --git 
a/static/images/blogs/principle-of-Doris-SQL-parsing/Figure_11_cn.png 
b/static/images/blogs/principle-of-Doris-SQL-parsing/Figure_11_cn.png
new file mode 100644
index 00000000000..ee926e256ca
Binary files /dev/null and 
b/static/images/blogs/principle-of-Doris-SQL-parsing/Figure_11_cn.png differ
diff --git 
a/static/images/blogs/principle-of-Doris-SQL-parsing/Figure_11_en.png 
b/static/images/blogs/principle-of-Doris-SQL-parsing/Figure_11_en.png
new file mode 100644
index 00000000000..80519ce78df
Binary files /dev/null and 
b/static/images/blogs/principle-of-Doris-SQL-parsing/Figure_11_en.png differ
diff --git 
a/static/images/blogs/principle-of-Doris-SQL-parsing/Figure_12_cn.png 
b/static/images/blogs/principle-of-Doris-SQL-parsing/Figure_12_cn.png
new file mode 100644
index 00000000000..5490d402bc9
Binary files /dev/null and 
b/static/images/blogs/principle-of-Doris-SQL-parsing/Figure_12_cn.png differ
diff --git 
a/static/images/blogs/principle-of-Doris-SQL-parsing/Figure_12_en.png 
b/static/images/blogs/principle-of-Doris-SQL-parsing/Figure_12_en.png
new file mode 100644
index 00000000000..5490d402bc9
Binary files /dev/null and 
b/static/images/blogs/principle-of-Doris-SQL-parsing/Figure_12_en.png differ
diff --git 
a/static/images/blogs/principle-of-Doris-SQL-parsing/Figure_13_cn.png 
b/static/images/blogs/principle-of-Doris-SQL-parsing/Figure_13_cn.png
new file mode 100644
index 00000000000..0326b5fb846
Binary files /dev/null and 
b/static/images/blogs/principle-of-Doris-SQL-parsing/Figure_13_cn.png differ
diff --git 
a/static/images/blogs/principle-of-Doris-SQL-parsing/Figure_13_en.png 
b/static/images/blogs/principle-of-Doris-SQL-parsing/Figure_13_en.png
new file mode 100644
index 00000000000..0326b5fb846
Binary files /dev/null and 
b/static/images/blogs/principle-of-Doris-SQL-parsing/Figure_13_en.png differ
diff --git 
a/static/images/blogs/principle-of-Doris-SQL-parsing/Figure_14_cn.png 
b/static/images/blogs/principle-of-Doris-SQL-parsing/Figure_14_cn.png
new file mode 100644
index 00000000000..863917bf2b9
Binary files /dev/null and 
b/static/images/blogs/principle-of-Doris-SQL-parsing/Figure_14_cn.png differ
diff --git 
a/static/images/blogs/principle-of-Doris-SQL-parsing/Figure_14_en.png 
b/static/images/blogs/principle-of-Doris-SQL-parsing/Figure_14_en.png
new file mode 100644
index 00000000000..863917bf2b9
Binary files /dev/null and 
b/static/images/blogs/principle-of-Doris-SQL-parsing/Figure_14_en.png differ
diff --git a/static/images/blogs/principle-of-Doris-SQL-parsing/Figure_1_cn.png 
b/static/images/blogs/principle-of-Doris-SQL-parsing/Figure_1_cn.png
new file mode 100644
index 00000000000..19b5b1b0eb1
Binary files /dev/null and 
b/static/images/blogs/principle-of-Doris-SQL-parsing/Figure_1_cn.png differ
diff --git a/static/images/blogs/principle-of-Doris-SQL-parsing/Figure_1_en.png 
b/static/images/blogs/principle-of-Doris-SQL-parsing/Figure_1_en.png
new file mode 100644
index 00000000000..5bc26950f8d
Binary files /dev/null and 
b/static/images/blogs/principle-of-Doris-SQL-parsing/Figure_1_en.png differ
diff --git a/static/images/blogs/principle-of-Doris-SQL-parsing/Figure_2_cn.png 
b/static/images/blogs/principle-of-Doris-SQL-parsing/Figure_2_cn.png
new file mode 100644
index 00000000000..9fd48fdb27e
Binary files /dev/null and 
b/static/images/blogs/principle-of-Doris-SQL-parsing/Figure_2_cn.png differ
diff --git a/static/images/blogs/principle-of-Doris-SQL-parsing/Figure_2_en.png 
b/static/images/blogs/principle-of-Doris-SQL-parsing/Figure_2_en.png
new file mode 100644
index 00000000000..e2623dd3afe
Binary files /dev/null and 
b/static/images/blogs/principle-of-Doris-SQL-parsing/Figure_2_en.png differ
diff --git a/static/images/blogs/principle-of-Doris-SQL-parsing/Figure_3_cn.png 
b/static/images/blogs/principle-of-Doris-SQL-parsing/Figure_3_cn.png
new file mode 100644
index 00000000000..e23a2280748
Binary files /dev/null and 
b/static/images/blogs/principle-of-Doris-SQL-parsing/Figure_3_cn.png differ
diff --git a/static/images/blogs/principle-of-Doris-SQL-parsing/Figure_3_en.png 
b/static/images/blogs/principle-of-Doris-SQL-parsing/Figure_3_en.png
new file mode 100644
index 00000000000..e23a2280748
Binary files /dev/null and 
b/static/images/blogs/principle-of-Doris-SQL-parsing/Figure_3_en.png differ
diff --git a/static/images/blogs/principle-of-Doris-SQL-parsing/Figure_4_cn.png 
b/static/images/blogs/principle-of-Doris-SQL-parsing/Figure_4_cn.png
new file mode 100644
index 00000000000..cf11957272e
Binary files /dev/null and 
b/static/images/blogs/principle-of-Doris-SQL-parsing/Figure_4_cn.png differ
diff --git a/static/images/blogs/principle-of-Doris-SQL-parsing/Figure_4_en.png 
b/static/images/blogs/principle-of-Doris-SQL-parsing/Figure_4_en.png
new file mode 100644
index 00000000000..822b7f83811
Binary files /dev/null and 
b/static/images/blogs/principle-of-Doris-SQL-parsing/Figure_4_en.png differ
diff --git a/static/images/blogs/principle-of-Doris-SQL-parsing/Figure_5_cn.png 
b/static/images/blogs/principle-of-Doris-SQL-parsing/Figure_5_cn.png
new file mode 100644
index 00000000000..3afc0441d0f
Binary files /dev/null and 
b/static/images/blogs/principle-of-Doris-SQL-parsing/Figure_5_cn.png differ
diff --git a/static/images/blogs/principle-of-Doris-SQL-parsing/Figure_5_en.png 
b/static/images/blogs/principle-of-Doris-SQL-parsing/Figure_5_en.png
new file mode 100644
index 00000000000..24411121186
Binary files /dev/null and 
b/static/images/blogs/principle-of-Doris-SQL-parsing/Figure_5_en.png differ
diff --git a/static/images/blogs/principle-of-Doris-SQL-parsing/Figure_6_cn.png 
b/static/images/blogs/principle-of-Doris-SQL-parsing/Figure_6_cn.png
new file mode 100644
index 00000000000..dd7a7b248fc
Binary files /dev/null and 
b/static/images/blogs/principle-of-Doris-SQL-parsing/Figure_6_cn.png differ
diff --git a/static/images/blogs/principle-of-Doris-SQL-parsing/Figure_6_en.png 
b/static/images/blogs/principle-of-Doris-SQL-parsing/Figure_6_en.png
new file mode 100644
index 00000000000..dd7a7b248fc
Binary files /dev/null and 
b/static/images/blogs/principle-of-Doris-SQL-parsing/Figure_6_en.png differ
diff --git a/static/images/blogs/principle-of-Doris-SQL-parsing/Figure_7_cn.png 
b/static/images/blogs/principle-of-Doris-SQL-parsing/Figure_7_cn.png
new file mode 100644
index 00000000000..f6e9aad4d56
Binary files /dev/null and 
b/static/images/blogs/principle-of-Doris-SQL-parsing/Figure_7_cn.png differ
diff --git a/static/images/blogs/principle-of-Doris-SQL-parsing/Figure_7_en.png 
b/static/images/blogs/principle-of-Doris-SQL-parsing/Figure_7_en.png
new file mode 100644
index 00000000000..f6e9aad4d56
Binary files /dev/null and 
b/static/images/blogs/principle-of-Doris-SQL-parsing/Figure_7_en.png differ
diff --git a/static/images/blogs/principle-of-Doris-SQL-parsing/Figure_8_cn.png 
b/static/images/blogs/principle-of-Doris-SQL-parsing/Figure_8_cn.png
new file mode 100644
index 00000000000..0ea7ae94dee
Binary files /dev/null and 
b/static/images/blogs/principle-of-Doris-SQL-parsing/Figure_8_cn.png differ
diff --git a/static/images/blogs/principle-of-Doris-SQL-parsing/Figure_8_en.png 
b/static/images/blogs/principle-of-Doris-SQL-parsing/Figure_8_en.png
new file mode 100644
index 00000000000..0ea7ae94dee
Binary files /dev/null and 
b/static/images/blogs/principle-of-Doris-SQL-parsing/Figure_8_en.png differ
diff --git a/static/images/blogs/principle-of-Doris-SQL-parsing/Figure_9_cn.png 
b/static/images/blogs/principle-of-Doris-SQL-parsing/Figure_9_cn.png
new file mode 100644
index 00000000000..c2c1f4e6b68
Binary files /dev/null and 
b/static/images/blogs/principle-of-Doris-SQL-parsing/Figure_9_cn.png differ
diff --git a/static/images/blogs/principle-of-Doris-SQL-parsing/Figure_9_en.png 
b/static/images/blogs/principle-of-Doris-SQL-parsing/Figure_9_en.png
new file mode 100644
index 00000000000..c2c1f4e6b68
Binary files /dev/null and 
b/static/images/blogs/principle-of-Doris-SQL-parsing/Figure_9_en.png differ


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org
For additional commands, e-mail: commits-h...@doris.apache.org
