This is an automated email from the ASF dual-hosted git repository. jiafengzheng pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/doris-website.git
The following commit(s) were added to refs/heads/master by this push: new d5d0a9a5bd5 fix_en_doc (#49) d5d0a9a5bd5 is described below commit d5d0a9a5bd53a6f18e336e5675d0e5dd4af8457b Author: Liqf <109049295+lemonlit...@users.noreply.github.com> AuthorDate: Wed Aug 17 08:59:23 2022 +0800 fix_en_doc (#49) --- docs/data-table/data-model.md | 36 +- docs/data-table/data-partition.md | 419 +++++++++++---------- docs/get-starting/get-starting.md | 2 +- docs/install/install-deploy.md | 136 +------ .../compilation-with-ldb-toolchain.md | 2 +- docs/install/source-install/compilation.md | 17 +- .../current/data-table/data-model.md | 8 +- .../current/get-starting/get-starting.md | 1 + 8 files changed, 258 insertions(+), 363 deletions(-) diff --git a/docs/data-table/data-model.md b/docs/data-table/data-model.md index 3df284c933b..662e7f405b3 100644 --- a/docs/data-table/data-model.md +++ b/docs/data-table/data-model.md @@ -26,7 +26,7 @@ under the License. # Data Model -This document describes Doris's data model, ROLLUP and prefix index concepts at the logical level to help users better use Doris to cope with different business scenarios. +This document describes Doris's data model,at the logical level to help users better use Doris to cope with different business scenarios. ## Basic concepts @@ -38,7 +38,7 @@ Columns can be divided into two categories: Key and Value. From a business persp Doris's data model is divided into three main categories: * Aggregate -* Uniq +* Unique * Duplicate Let's introduce them separately. @@ -234,9 +234,9 @@ Data aggregation occurs in Doris in the following three stages: Data may be aggregated to varying degrees at different times. For example, when a batch of data is just imported, it may not be aggregated with the existing data. But for users, user**can only query aggregated data**. That is, different degrees of aggregation are transparent to user queries. Users should always assume that data exists in terms of the degree of aggregation that **ultimately completes**, and **should not assume that some aggregation has not yet occurred**. (See the section [...] -## Uniq Model +## Unique Model -In some multi-dimensional analysis scenarios, users are more concerned with how to ensure the uniqueness of Key, that is, how to obtain the Primary Key uniqueness constraint. Therefore, we introduce Uniq's data model. This model is essentially a special case of aggregation model and a simplified representation of table structure. Let's give an example. +In some multi-dimensional analysis scenarios, users are more concerned with how to ensure the uniqueness of Key, that is, how to obtain the Primary Key uniqueness constraint. Therefore, we introduce Unique data model. This model is essentially a special case of aggregation model and a simplified representation of table structure. Let's give an example. |ColumnName|Type|IsKey|Comment| |---|---|---|---| @@ -303,7 +303,7 @@ PROPERTIES ( ); ``` -That is to say, Uniq model can be completely replaced by REPLACE in aggregation model. Its internal implementation and data storage are exactly the same. No further examples will be given here. +That is to say, Unique model can be completely replaced by REPLACE in aggregation model. Its internal implementation and data storage are exactly the same. No further examples will be given here. ## Duplicate Model @@ -336,14 +336,14 @@ PROPERTIES ( ); ``` -This data model is different from Aggregate and Uniq models. 
Data is stored entirely in accordance with the data in the imported file, without any aggregation. Even if the two rows of data are identical, they will be retained. -The DUPLICATE KEY specified in the table building statement is only used to specify which columns the underlying data is sorted according to. (The more appropriate name should be "Sorted Column", where the name "DUPLICATE KEY" is used to specify the data model used. For more explanations of "Sorted Column", see the section **Prefix Index**.) On the choice of DUPLICATE KEY, we recommend that the first 2-4 columns be selected appropriately. +This data model is different from Aggregate and Unique models. Data is stored entirely in accordance with the data in the imported file, without any aggregation. Even if the two rows of data are identical, they will be retained. +The DUPLICATE KEY specified in the table building statement is only used to specify which columns the underlying data is sorted according to. (The more appropriate name should be "Sorted Column", where the name "DUPLICATE KEY" is used to specify the data model used. For more explanations of "Sorted Column", see the section [Prefix Index](./index/prefix-index.md)) On the choice of DUPLICATE KEY, we recommend that the first 2-4 columns be selected appropriately. -This data model is suitable for storing raw data without aggregation requirements and primary key uniqueness constraints. For more usage scenarios, see the **Limitations of the Aggregation Model** section. +This data model is suitable for storing raw data without aggregation requirements and primary key uniqueness constraints. For more usage scenarios, see the [Limitations of the Aggregation Model](./data-model.md#limitations-of-aggregation-model) section. ## Limitations of aggregation model -Here we introduce the limitations of Aggregate model (including Uniq model). +Here we introduce the limitations of Aggregate model (including Unique model). In the aggregation model, what the model presents is the aggregated data. That is to say, any data that has not yet been aggregated (for example, two different imported batches) must be presented in some way to ensure consistency. Let's give an example. @@ -385,7 +385,9 @@ We add aggregation operator to query engine to ensure data consistency. In addition, on the aggregate column (Value), when executing aggregate class queries that are inconsistent with aggregate types, attention should be paid to semantics. For example, in the example above, we execute the following queries: -`SELECT MIN(cost) FROM table;` +``` +SELECT MIN(cost) FROM table; +``` The result is 5, not 1. @@ -393,7 +395,9 @@ At the same time, this consistency guarantee will greatly reduce the query effic Let's take the most basic count (*) query as an example: -`SELECT COUNT(*) FROM table;` +``` +SELECT COUNT(*) FROM table; +``` In other databases, such queries return results quickly. Because in the implementation, we can get the query result by counting rows at the time of import and saving count statistics information, or by scanning only a column of data to get count value at the time of query, with very little overhead. But in Doris's aggregation model, the overhead of this query is **very large**. @@ -425,9 +429,9 @@ Because the final aggregation result is: So `select count (*) from table;` The correct result should be **4**. But if we only scan the `user_id`column and add query aggregation, the final result is **3** (10001, 10002, 10003). 
If aggregated without queries, the result is **5** (a total of five rows in two batches). It can be seen that both results are wrong. -In order to get the correct result, we must read the data of `user_id` and `date`, and **together with aggregate** when querying, to return the correct result of **4**. That is to say, in the count (*) query, Doris must scan all AGGREGATE KEY columns (here are `user_id` and `date`) and aggregate them to get the semantically correct results. When aggregated columns are large, count (*) queries need to scan a large amount of data. +In order to get the correct result, we must read the data of `user_id` and `date`, and **together with aggregate** when querying, to return the correct result of **4**. That is to say, in the `count (*)` query, Doris must scan all AGGREGATE KEY columns (here are `user_id` and `date`) and aggregate them to get the semantically correct results. When aggregated columns are large, `count (*)` queries need to scan a large amount of data. -Therefore, when there are frequent count (*) queries in the business, we recommend that users simulate count (*) by adding a column with a value of 1 and aggregation type of SUM. As the table structure in the previous example, we modify it as follows: +Therefore, when there are frequent `count (*)` queries in the business, we recommend that users simulate `count (*)` by adding a column with a value of 1 and aggregation type of SUM. As the table structure in the previous example, we modify it as follows: |ColumnName|Type|AggregationType|Comment| |---|---|---|---| @@ -442,15 +446,15 @@ Another way is to **change the aggregation type of the count column above to REP ### Duplicate Model -Duplicate model has no limitation of aggregation model. Because the model does not involve aggregate semantics, when doing count (*) query, we can get the correct semantics by choosing a column of queries arbitrarily. +Duplicate model has no limitation of aggregation model. Because the model does not involve aggregate semantics, when doing `count (*)` query, we can get the correct semantics by choosing a column of queries arbitrarily. ## Suggestions for Choosing Data Model Because the data model was established when the table was built, and **could not be modified. Therefore, it is very important to select an appropriate data model**. -1. Aggregate model can greatly reduce the amount of data scanned and the amount of query computation by pre-aggregation. It is very suitable for report query scenarios with fixed patterns. But this model is not very friendly for count (*) queries. At the same time, because the aggregation method on the Value column is fixed, semantic correctness should be considered in other types of aggregation queries. +1. Aggregate model can greatly reduce the amount of data scanned and the amount of query computation by pre-aggregation. It is very suitable for report query scenarios with fixed patterns. But this model is not very friendly for `count (*)` queries. At the same time, because the aggregation method on the Value column is fixed, semantic correctness should be considered in other types of aggregation queries. -2. Uniq model guarantees the uniqueness of primary key for scenarios requiring unique primary key constraints. However, the query advantage brought by pre-aggregation such as ROLLUP cannot be exploited (because the essence is REPLACE, there is no such aggregation as SUM). +2. Unique model guarantees the uniqueness of primary key for scenarios requiring unique primary key constraints. 
However, the query advantage brought by pre-aggregation such as ROLLUP cannot be exploited (because the essence is REPLACE, there is no such aggregation as SUM). > **[Note]** The Unique model only supports the entire row update. If the user needs unique key with partial update (such as loading multiple source tables into one doris table), you can consider using the Aggregate model, setting the aggregate type of the non-primary key columns to REPLACE_IF_NOT_NULL. For detail, please refer to [CREATE TABLE Manual](../sql-manual/sql-reference/Data-Definition-Statements/Create/CREATE-TABLE.md) diff --git a/docs/data-table/data-partition.md b/docs/data-table/data-partition.md index 441e43cee8e..8d4372340b5 100644 --- a/docs/data-table/data-partition.md +++ b/docs/data-table/data-partition.md @@ -140,200 +140,200 @@ When defining columns, you can refer to the following suggestions: Doris supports two levels of data partitioning. The first layer is Partition, which supports Range and List partitioning. The second layer is the Bucket (Tablet), which only supports Hash partitioning. -It is also possible to use only one layer of partitioning. When using a layer partition, only Bucket partitioning is supported. - -1. Partition - - * The Partition column can specify one or more columns. The partition class must be a KEY column. The use of multi-column partitions is described later in the **Multi-column partitioning** summary. - * Regardless of the type of partition column, double quotes are required when writing partition values. - * There is no theoretical limit on the number of partitions. - * When you do not use Partition to build a table, the system will automatically generate a Partition with the same name as the table name. This Partition is not visible to the user and cannot be modified. - * **Do not add partitions with overlapping ranges** when creating partitions. - - #### Range Partition - - * Partition columns are usually time columns for easy management of old and new data. - - * Partition supports only the upper bound by `VALUES LESS THAN (...)`, the system will use the upper bound of the previous partition as the lower bound of the partition, and generate a left closed right open interval. Passing, also supports specifying the upper and lower bounds by `VALUES [...)`, and generating a left closed right open interval. - - * It is easier to understand by specifying `VALUES [...)`. Here is an example of the change in partition range when adding or deleting partitions using the `VALUES LESS THAN (...)` statement: - * As in the `example_range_tbl` example above, when the table is built, the following 3 partitions are automatically generated: - ``` - P201701: [MIN_VALUE, 2017-02-01) - P201702: [2017-02-01, 2017-03-01) - P201703: [2017-03-01, 2017-04-01) - ``` - - * When we add a partition p201705 VALUES LESS THAN ("2017-06-01"), the partition results are as follows: - - ``` - P201701: [MIN_VALUE, 2017-02-01) - P201702: [2017-02-01, 2017-03-01) - P201703: [2017-03-01, 2017-04-01) - P201705: [2017-04-01, 2017-06-01) - ``` - - * At this point we delete the partition p201703, the partition results are as follows: - - ``` - p201701: [MIN_VALUE, 2017-02-01) - p201702: [2017-02-01, 2017-03-01) - p201705: [2017-04-01, 2017-06-01) - ``` - - > Note that the partition range of p201702 and p201705 has not changed, and there is a hole between the two partitions: [2017-03-01, 2017-04-01). That is, if the imported data range is within this hole, it cannot be imported. 
- - * Continue to delete partition p201702, the partition results are as follows: - - ``` - p201701: [MIN_VALUE, 2017-02-01) - p201705: [2017-04-01, 2017-06-01) - ``` - - > The void range becomes: [2017-02-01, 2017-04-01) - - * Now add a partition p201702new VALUES LESS THAN ("2017-03-01"), the partition results are as follows: - - ``` - p201701: [MIN_VALUE, 2017-02-01) - p201702new: [2017-02-01, 2017-03-01) - p201705: [2017-04-01, 2017-06-01) - ``` - - > You can see that the hole size is reduced to: [2017-03-01, 2017-04-01) - - * Now delete partition p201701 and add partition p201612 VALUES LESS THAN ("2017-01-01"), the partition result is as follows: - - ``` - p201612: [MIN_VALUE, 2017-01-01) - p201702new: [2017-02-01, 2017-03-01) - p201705: [2017-04-01, 2017-06-01) - ``` - - > A new void appeared: [2017-01-01, 2017-02-01) - - - In summary, the deletion of a partition does not change the scope of an existing partition. There may be holes in deleting partitions. When a partition is added by the `VALUES LESS THAN` statement, the lower bound of the partition immediately follows the upper bound of the previous partition. - - In addition to the single-column partitioning we have seen above, Range partition also supports **multi-column partitioning**, examples are as follows: +It is also possible to use only one layer of partitioning. When using a layer partition, only Bucket partitioning is supported.Let's introduce the partitions and buckets separately: + +1. **Partition** + + - The Partition column can specify one or more columns. The partition class must be a KEY column. The use of multi-column partitions is described later in the **Multi-column partitioning** summary. + - Regardless of the type of partition column, double quotes are required when writing partition values. + - There is no theoretical limit on the number of partitions. + - When you do not use Partition to build a table, the system will automatically generate a Partition with the same name as the table name. This Partition is not visible to the user and cannot be modified. + - **Do not add partitions with overlapping ranges** when creating partitions. + + **Range Partition** + + - The partition column is usually a time column to facilitate the management of old and new data. + + - Partition supports only the upper bound by `VALUES LESS THAN (...)`, the system will use the upper bound of the previous partition as the lower bound of the partition, and generate a left closed right open interval. Passing, also supports specifying the upper and lower bounds by `VALUES [...)`, and generating a left closed right open interval. + + - It is easier to understand by specifying `VALUES [...)`. 
Here is an example of the change in partition range when adding or deleting partitions using the `VALUES LESS THAN (...)` statement: + + - As in the `example_range_tbl` example above, when the table is built, the following 3 partitions are automatically generated: + + ```text + p201701: [MIN_VALUE, 2017-02-01) + p201702: [2017-02-01, 2017-03-01) + p201703: [2017-03-01, 2017-04-01) + ``` + + - When we add a partition p201705 VALUES LESS THAN ("2017-06-01"), the partition results are as follows: + + ```text + p201701: [MIN_VALUE, 2017-02-01) + p201702: [2017-02-01, 2017-03-01) + p201703: [2017-03-01, 2017-04-01) + p201705: [2017-04-01, 2017-06-01) + ``` + + - At this point we delete the partition p201703, the partition results are as follows: + + ```text + p201701: [MIN_VALUE, 2017-02-01) + p201702: [2017-02-01, 2017-03-01) + p201705: [2017-04-01, 2017-06-01) + ``` + + > Note that the partition range of p201702 and p201705 has not changed, and there is a hole between the two partitions: [2017-03-01, 2017-04-01). That is, if the imported data range is within this hole, it cannot be imported. + + - Continue to delete partition p201702, the partition results are as follows: + + ```text + p201701: [MIN_VALUE, 2017-02-01) + p201705: [2017-04-01, 2017-06-01) + ``` + + > The void range becomes: [2017-02-01, 2017-04-01) + + - Now add a partition p201702new VALUES LESS THAN ("2017-03-01"), the partition results are as follows: + + ```text + p201701: [MIN_VALUE, 2017-02-01) + p201702new: [2017-02-01, 2017-03-01) + p201705: [2017-04-01, 2017-06-01) + ``` + + > You can see that the hole size is reduced to: [2017-03-01, 2017-04-01) + + - Now delete partition p201701 and add partition p201612 VALUES LESS THAN ("2017-01-01"), the partition result is as follows: + + ```text + p201612: [MIN_VALUE, 2017-01-01) + p201702new: [2017-02-01, 2017-03-01) + p201705: [2017-04-01, 2017-06-01) + ``` + + > A new void appeared: [2017-01-01, 2017-02-01) + + In summary, the deletion of a partition does not change the scope of an existing partition. There may be holes in deleting partitions. When a partition is added by the `VALUES LESS THAN` statement, the lower bound of the partition immediately follows the upper bound of the previous partition. + + In addition to the single-column partitioning we have seen above, Range partition also supports **multi-column partitioning**, examples are as follows: ```text - PARTITION BY RANGE(`date`, `id`) - ( - PARTITION `p201701_1000` VALUES LESS THAN ("2017-02-01", "1000"), - PARTITION `p201702_2000` VALUES LESS THAN ("2017-03-01", "2000"), - PARTITION `p201703_all` VALUES LESS THAN ("2017-04-01") - ) - ``` - - In the above example, we specify `date` (DATE type) and `id` (INT type) as partition columns. The resulting partitions in the above example are as follows: - - ``` - *p201701_1000: [(MIN_VALUE, MIN_VALUE), ("2017-02-01", "1000") ) - *p201702_2000: [("2017-02-01", "1000"), ("2017-03-01", "2000") ) - *p201703_all: [("2017-03-01", "2000"), ("2017-04-01", MIN_VALUE)) - ``` - - Note that the last partition user defaults only the partition value of the `date` column, so the partition value of the `id` column will be filled with `MIN_VALUE` by default. When the user inserts data, the partition column values are compared in order, and the corresponding partition is finally obtained. 
Examples are as follows: - - ``` - * Data --> Partition - * 2017-01-01, 200 --> p201701_1000 - * 2017-01-01, 2000 --> p201701_1000 - * 2017-02-01, 100 --> p201701_1000 - * 2017-02-01, 2000 --> p201702_2000 - * 2017-02-15, 5000 --> p201702_2000 - * 2017-03-01, 2000 --> p201703_all - * 2017-03-10, 1 --> p201703_all - * 2017-04-01, 1000 --> Unable to import - * 2017-05-01, 1000 --> Unable to import - ``` - - #### List Partition - - * The partition column supports the `BOOLEAN, TINYINT, SMALLINT, INT, BIGINT, LARGEINT, DATE, DATETIME, CHAR, VARCHAR` data type, and the partition value is an enumeration value. Partitions can be hit only if the data is one of the target partition enumeration values. - * Partition supports specifying the number of partitions contained in each partition via `VALUES IN (...) ` to specify the enumeration values contained in each partition. - * The following example illustrates how partitions change when adding or deleting partitions. - - * As in the `example_list_tbl` example above, when the table is built, the following three partitions are automatically created. - - ``` - p_cn: ("Beijing", "Shanghai", "Hong Kong") - p_usa: ("New York", "San Francisco") - p_jp: ("Tokyo") - ``` - - * When we add a partition p_uk VALUES IN ("London"), the result of the partition is as follows - - ``` - p_cn: ("Beijing", "Shanghai", "Hong Kong") - p_usa: ("New York", "San Francisco") - p_jp: ("Tokyo") - p_uk: ("London") - ``` - - * When we delete the partition p_jp, the result of the partition is as follows. - - ``` - p_cn: ("Beijing", "Shanghai", "Hong Kong") - p_usa: ("New York", "San Francisco") - p_uk: ("London") - ``` - - List partition also supports **multi-column partition**, examples are as follows: - - ```text - PARTITION BY LIST(`id`, `city`) - ( - PARTITION `p1_city` VALUES IN (("1", "Beijing"), ("1", "Shanghai")), - PARTITION `p2_city` VALUES IN (("2", "Beijing"), ("2", "Shanghai")), - PARTITION `p3_city` VALUES IN (("3", "Beijing"), ("3", "Shanghai")) - ) - ``` - - In the above example, we specify `id`(INT type) and `city`(VARCHAR type) as partition columns. The above example ends up with the following partitions. - - ``` - * p1_city: [("1", "Beijing"), ("1", "Shanghai")] - * p2_city: [("2", "Beijing"), ("2", "Shanghai")] - * p3_city: [("3", "Beijing"), ("3", "Shanghai")] - ``` - - When the user inserts data, the partition column values will be compared sequentially in order to finally get the corresponding partition. An example is as follows. - - ``` - * Data ---> Partition - * 1, Beijing ---> p1_city - * 1, Shanghai ---> p1_city - * 2, Shanghai ---> p2_city - * 3, Beijing ---> p3_city - * 1, Tianjin ---> Unable to import - * 4, Beijing ---> Unable to import - ``` - -2. Bucket - - * If a Partition is used, the `DISTRIBUTED ...` statement describes the division rules for the data in each partition. If you do not use Partition, it describes the rules for dividing the data of the entire table. - * The bucket column can be multiple columns, but it must be a Key column. The bucket column can be the same or different from the Partition column. - * The choice of bucket column is a trade-off between **query throughput** and **query concurrency**: - - 1. If you select multiple bucket columns, the data is more evenly distributed. However, if the query condition does not include the equivalent condition for all bucket columns, a query will scan all buckets. The throughput of such queries will increase, and the latency of a single query will decrease. 
This method is suitable for large throughput and low concurrent query scenarios. - 2. If you select only one or a few bucket columns, the point query can query only one bucket. This approach is suitable for high-concurrency point query scenarios. - - * There is no theoretical limit on the number of buckets. - -3. Recommendations on the number and amount of data for Partitions and Buckets. - - * The total number of tablets in a table is equal to (Partition num * Bucket num). - * The number of tablets in a table, which is slightly more than the number of disks in the entire cluster, regardless of capacity expansion. - * The data volume of a single tablet does not theoretically have an upper and lower bound, but is recommended to be in the range of 1G - 10G. If the amount of data for a single tablet is too small, the aggregation of the data is not good and the metadata management pressure is high. If the amount of data is too large, it is not conducive to the migration, completion, and increase the cost of Schema Change or Rollup operation failure retry (the granularity of these operations failure [...] - * When the tablet's data volume principle and quantity principle conflict, it is recommended to prioritize the data volume principle. - * When building a table, the number of Buckets for each partition is uniformly specified. However, when dynamically increasing partitions (`ADD PARTITION`), you can specify the number of Buckets for the new partition separately. This feature can be used to easily reduce or expand data. - * Once the number of Buckets for a Partition is specified, it cannot be changed. Therefore, when determining the number of Buckets, you need to consider the expansion of the cluster in advance. For example, there are currently only 3 hosts, and each host has 1 disk. If the number of Buckets is only set to 3 or less, then even if you add more machines later, you can't increase the concurrency. - * Give some examples: Suppose there are 10 BEs, one for each BE disk. If the total size of a table is 500MB, you can consider 4-8 shards. 5GB: 8-16. 50GB: 32. 500GB: Recommended partitions, each partition is about 50GB in size, with 16-32 shards per partition. 5TB: Recommended partitions, each with a size of around 50GB and 16-32 shards per partition. - - > Note: The amount of data in the table can be viewed by the [show data](../sql-manual/sql-reference/Show-Statements/SHOW-DATA.md) command. The result is divided by the number of copies, which is the amount of data in the table. - + PARTITION BY RANGE(`date`, `id`) + ( + PARTITION `p201701_1000` VALUES LESS THAN ("2017-02-01", "1000"), + PARTITION `p201702_2000` VALUES LESS THAN ("2017-03-01", "2000"), + PARTITION `p201703_all` VALUES LESS THAN ("2017-04-01") + ) + ``` + + In the above example, we specify `date` (DATE type) and `id` (INT type) as partition columns. The resulting partitions in the above example are as follows: + + ``` + * p201701_1000: [(MIN_VALUE, MIN_VALUE), ("2017-02-01", "1000") ) + * p201702_2000: [("2017-02-01", "1000"), ("2017-03-01", "2000") ) + * p201703_all: [("2017-03-01", "2000"), ("2017-04-01", MIN_VALUE)) + ``` + + Note that the last partition user defaults only the partition value of the `date` column, so the partition value of the `id` column will be filled with `MIN_VALUE` by default. When the user inserts data, the partition column values are compared in order, and the corresponding partition is finally obtained. 
Examples are as follows: + + ``` + * Data --> Partition + * 2017-01-01, 200 --> p201701_1000 + * 2017-01-01, 2000 --> p201701_1000 + * 2017-02-01, 100 --> p201701_1000 + * 2017-02-01, 2000 --> p201702_2000 + * 2017-02-15, 5000 --> p201702_2000 + * 2017-03-01, 2000 --> p201703_all + * 2017-03-10, 1 --> p201703_all + * 2017-04-01, 1000 --> Unable to import + * 2017-05-01, 1000 --> Unable to import + ``` + + **List Partition** + + - The partition column supports the `BOOLEAN, TINYINT, SMALLINT, INT, BIGINT, LARGEINT, DATE, DATETIME, CHAR, VARCHAR` data type, and the partition value is an enumeration value. Partitions can be hit only if the data is one of the target partition enumeration values. + + - Partition supports specifying the number of partitions contained in each partition via `VALUES IN (...) ` to specify the enumeration values contained in each partition. + + - The following example illustrates how partitions change when adding or deleting partitions. + + - As in the `example_list_tbl` example above, when the table is built, the following three partitions are automatically created. + + ```text + p_cn: ("Beijing", "Shanghai", "Hong Kong") + p_usa: ("New York", "San Francisco") + p_jp: ("Tokyo") + ``` + + - When we add a partition p_uk VALUES IN ("London"), the result of the partition is as follows: + + ```text + p_cn: ("Beijing", "Shanghai", "Hong Kong") + p_usa: ("New York", "San Francisco") + p_jp: ("Tokyo") + p_uk: ("London") + ``` + + - When we delete the partition p_jp, the result of the partition is as follows: + + ```text + p_cn: ("Beijing", "Shanghai", "Hong Kong") + p_usa: ("New York", "San Francisco") + p_uk: ("London") + ``` + + List partition also supports **multi-column partition**, examples are as follows: + + ```text + PARTITION BY LIST(`id`, `city`) + ( + PARTITION `p1_city` VALUES IN (("1", "Beijing"), ("1", "Shanghai")), + PARTITION `p2_city` VALUES IN (("2", "Beijing"), ("2", "Shanghai")), + PARTITION `p3_city` VALUES IN (("3", "Beijing"), ("3", "Shanghai")) + ) + ``` + + In the above example, we specify `id`(INT type) and `city`(VARCHAR type) as partition columns. The above example ends up with the following partitions: + + ``` + * p1_city: [("1", "Beijing"), ("1", "Shanghai")] + * p2_city: [("2", "Beijing"), ("2", "Shanghai")] + * p3_city: [("3", "Beijing"), ("3", "Shanghai")] + ``` + + When the user inserts data, the partition column values will be compared sequentially in order to finally get the corresponding partition. An example is as follows. + + ``` + * Data ---> Partition + * 1, Beijing ---> p1_city + * 1, Shanghai ---> p1_city + * 2, Shanghai ---> p2_city + * 3, Beijing ---> p3_city + * 1, Tianjin ---> 无法导入 + * 4, Beijing ---> 无法导入 + ``` + +2. **Bucket** + + - If a Partition is used, the `DISTRIBUTED ...` statement describes the division rules for the data in each partition. If you do not use Partition, it describes the rules for dividing the data of the entire table. + - The bucket column can be multiple columns, but it must be a Key column. The bucket column can be the same or different from the Partition column. + - The choice of bucket column is a trade-off between **query throughput** and **query concurrency**: + 1. If you select multiple bucket columns, the data is more evenly distributed. However, if the query condition does not include the equivalent condition for all bucket columns, a query will scan all buckets. The throughput of such queries will increase, and the latency of a single query will decrease. 
This method is suitable for large throughput and low concurrent query scenarios. + 2. If you select only one or a few bucket columns, the point query can query only one bucket. This approach is suitable for high-concurrency point query scenarios. + - There is no theoretical limit on the number of buckets. + +3. **Recommendations on the number and amount of data for Partitions and Buckets.** + + - The total number of tablets in a table is equal to (Partition num * Bucket num). + - The number of tablets in a table, which is slightly more than the number of disks in the entire cluster, regardless of capacity expansion. + - The data volume of a single tablet does not theoretically have an upper and lower bound, but is recommended to be in the range of 1G - 10G. If the amount of data for a single tablet is too small, the aggregation of the data is not good and the metadata management pressure is high. If the amount of data is too large, it is not conducive to the migration, completion, and increase the cost of Schema Change or Rollup operation failure retry (the granularity of these operations failure r [...] + - When the tablet's data volume principle and quantity principle conflict, it is recommended to prioritize the data volume principle. + - When building a table, the number of Buckets for each partition is uniformly specified. However, when dynamically increasing partitions (`ADD PARTITION`), you can specify the number of Buckets for the new partition separately. This feature can be used to easily reduce or expand data. + - Once the number of Buckets for a Partition is specified, it cannot be changed. Therefore, when determining the number of Buckets, you need to consider the expansion of the cluster in advance. For example, there are currently only 3 hosts, and each host has 1 disk. If the number of Buckets is only set to 3 or less, then even if you add more machines later, you can't increase the concurrency. + - Give some examples: Suppose there are 10 BEs, one for each BE disk. If the total size of a table is 500MB, you can consider 4-8 shards. 5GB: 8-16. 50GB: 32. 500GB: Recommended partitions, each partition is about 50GB in size, with 16-32 shards per partition. 5TB: Recommended partitions, each with a size of around 50GB and 16-32 shards per partition. + + > Note: The amount of data in the table can be viewed by the [show data](../sql-manual/sql-reference/Show-Statements/SHOW-DATA.md) command. The result is divided by the number of copies, which is the amount of data in the table. #### Compound Partitions vs Single Partitions @@ -355,43 +355,44 @@ The user can also use a single partition without using composite partitions. The In the last PROPERTIES of the table building statement, for the relevant parameters that can be set in PROPERTIES, we can check [CREATE TABLE](../sql-manual/sql-reference/Data-Definition-Statements/Create/CREATE-TABLE.md) for a detailed introduction. ### ENGINE - In this example, the type of ENGINE is olap, the default ENGINE type. In Doris, only this ENGINE type is managed and stored by Doris. Other ENGINE types, such as mysql, broker, es, etc., are essentially mappings to tables in other external databases or systems to ensure that Doris can read the data. And Doris itself does not create, manage, and store any tables and data of a non-olap ENGINE type. -### Other +### 其他 +```text `IF NOT EXISTS` indicates that if the table has not been created, it is created. 
Note that only the table name is judged here, and it is not determined whether the new table structure is the same as the existing table structure. So if there is a table with the same name but different structure, the command will also return success, but it does not mean that a new table and a new structure have been created. +``` ## common problem -### Build Table Operations FAQ +###Build Table Operations FAQ 1. If a syntax error occurs in a long build statement, a syntax error may be incomplete. Here is a list of possible syntax errors for manual error correction: - * The syntax is incorrect. Please read `HELP CREATE TABLE;` carefully to check the relevant syntax structure. - * Reserved words. When the user-defined name encounters a reserved word, it needs to be enclosed in the backquote ``. It is recommended that all custom names be generated using this symbol. - * Chinese characters or full-width characters. Non-utf8 encoded Chinese characters, or hidden full-width characters (spaces, punctuation, etc.) can cause syntax errors. It is recommended to check with a text editor with invisible characters. + - The syntax is incorrect. Please read `HELP CREATE TABLE;` carefully to check the relevant syntax structure. + - Reserved words. When the user-defined name encounters a reserved word, it needs to be enclosed in the backquote ``. It is recommended that all custom names be generated using this symbol. + - Chinese characters or full-width characters. Non-utf8 encoded Chinese characters, or hidden full-width characters (spaces, punctuation, etc.) can cause syntax errors. It is recommended to check with a text editor with invisible characters. 2. `Failed to create partition [xxx] . Timeout` - Doris builds are created in order of Partition granularity. This error may be reported when a Partition creation fails. Even if you don't use Partition, you will report `Failed to create partition` when there is a problem with the built table, because as mentioned earlier, Doris will create an unchangeable default Partition for tables that do not have a Partition specified. + Doris builds are created in order of Partition granularity. This error may be reported when a Partition creation fails. Even if you don't use Partition, you will report `Failed to create partition` when there is a problem with the built table, because as mentioned earlier, Doris will create an unchangeable default Partition for tables that do not have a Partition specified. - When this error is encountered, it is usually the BE that has encountered problems creating data fragments. You can follow the steps below to troubleshoot: + When this error is encountered, it is usually the BE that has encountered problems creating data fragments. You can follow the steps below to troubleshoot: - 1. In fe.log, find the `Failed to create partition` log for the corresponding point in time. In this log, a series of numbers like `{10001-10010}` will appear. The first number of the pair is the Backend ID and the second number is the Tablet ID. As for the pair of numbers above, on the Backend with ID 10001, creating a tablet with ID 10010 failed. - 2. Go to the be.INFO log corresponding to Backend and find the log related to the tablet id in the corresponding time period. You can find the error message. - 3. Listed below are some common tablet creation failure errors, including but not limited to: - * BE did not receive the relevant task, and the tablet id related log could not be found in be.INFO. 
Or the BE is created successfully, but the report fails. For the above questions, see [Deployment and Upgrade Documentation] to check the connectivity of FE and BE. - * Pre-allocated memory failed. It may be that the length of a line in a row in the table exceeds 100KB. - * `Too many open files`. The number of open file handles exceeds the Linux system limit. The handle limit of the Linux system needs to be modified. + - In fe.log, find the `Failed to create partition` log for the corresponding point in time. In this log, a series of numbers like `{10001-10010}` will appear. The first number of the pair is the Backend ID and the second number is the Tablet ID. As for the pair of numbers above, on the Backend with ID 10001, creating a tablet with ID 10010 failed. + - Go to the be.INFO log corresponding to Backend and find the log related to the tablet id in the corresponding time period. You can find the error message. + - Listed below are some common tablet creation failure errors, including but not limited to: + - BE did not receive the relevant task, and the tablet id related log could not be found in be.INFO. Or the BE is created successfully, but the report fails. For the above questions, see [Deployment and Upgrade Documentation](../install/install-deploy.md) to check the connectivity of FE and BE. + - Pre-allocated memory failed. It may be that the length of a line in a row in the table exceeds 100KB. + - `Too many open files`. The number of open file handles exceeds the Linux system limit. The handle limit of the Linux system needs to be modified. - You can also extend the timeout by setting `tablet_create_timeout_second=xxx` in fe.conf. The default is 2 seconds. + If there is a timeout when creating a data shard, you can also extend the timeout by setting `tablet_create_timeout_second=xxx` and `max_create_table_timeout_second=xxx` in fe.conf. Among them, `tablet_create_timeout_second` defaults to 1 second, `max_create_table_timeout_second` defaults to 60 seconds, and the overall timeout time is min(tablet_create_timeout_second * replication_num, max_create_table_timeout_second). For specific parameter settings, please refer to [fe-config](../ad [...] 3. The build table command does not return results for a long time. - Doris's table creation command is a synchronous command. The timeout of this command is currently set to be relatively simple, ie (tablet num * replication num) seconds. If you create more data fragments and have fragment creation failed, it may cause an error to be returned after waiting for a long timeout. + Doris's table creation command is a synchronous command. The timeout of this command is currently set to be relatively simple, ie (tablet num * replication num) seconds. If you create more data fragments and have fragment creation failed, it may cause an error to be returned after waiting for a long timeout. - Under normal circumstances, the statement will return in a few seconds or ten seconds. If it is more than one minute, it is recommended to cancel this operation directly and go to the FE or BE log to view the related errors. + Under normal circumstances, the statement will return in a few seconds or ten seconds. If it is more than one minute, it is recommended to cancel this operation directly and go to the FE or BE log to view the related errors. 
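To make the timeout tuning in point 2 above concrete, here is a minimal fe.conf sketch. The parameter names and defaults are the ones cited above; the values themselves are purely illustrative, not recommendations.

```text
# fe.conf -- illustrative values only, not recommendations
# (documented defaults: tablet_create_timeout_second = 1, max_create_table_timeout_second = 60)
tablet_create_timeout_second = 10
max_create_table_timeout_second = 120
# overall table-creation timeout =
#   min(tablet_create_timeout_second * replication_num, max_create_table_timeout_second)
```

Whether a change like this takes effect without restarting the FE depends on the specific parameter; the fe-config reference linked above is the authoritative source.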
## More help diff --git a/docs/get-starting/get-starting.md b/docs/get-starting/get-starting.md index 0fc753de39c..731c66497f4 100644 --- a/docs/get-starting/get-starting.md +++ b/docs/get-starting/get-starting.md @@ -236,7 +236,7 @@ For a complete parameter comparison table, please go to [Profile parameter analy - create database ```mysql - CREATE DATABASE database name; + CREATE DATABASE database_name; ```` > For more detailed syntax and best practices used by Create-DataBase, see [Create-Database](../sql-manual/sql-reference/Data-Definition-Statements/Create/CREATE-DATABASE) command manual. diff --git a/docs/install/install-deploy.md b/docs/install/install-deploy.md index 73b4b553f8d..2d294b5104b 100644 --- a/docs/install/install-deploy.md +++ b/docs/install/install-deploy.md @@ -142,7 +142,7 @@ This is a representation of [CIDR](https://en.wikipedia.org/wiki/Classless_Inter **Note**: When priority networks is configured and FE or BE is started, only the correct IP binding of FE or BE is ensured. In ADD BACKEND or ADD FRONTEND statements, you also need to specify IP matching priority networks configuration, otherwise the cluster cannot be established. Give an example: -BE is configured as `priority_networks = 10.1.3.0/24'.`. +BE is configured as `priority_networks = 10.1.3.0/24`. When you want to ADD BACKEND use: `ALTER SYSTEM ADD BACKEND "192.168.0.1:9050";` @@ -152,7 +152,7 @@ At this point, DROP must remove the BE that added errors and re-use the correct FE is the same. -BROKER does not currently have, nor does it need, priority\_networks. Broker's services are bound to 0.0.0 by default. Simply execute the correct accessible BROKER IP when ADD BROKER is used. +BROKER does not currently have, nor does it need, priority\_networks. Broker's services are bound to 0.0.0.0 by default. Simply execute the correct accessible BROKER IP when ADD BROKER is used. #### Table Name Case Sensitivity Setting @@ -201,9 +201,9 @@ See the section on `lower_case_table_names` variables in [Variables](../advanced **instructions** - * 1./home/disk1/doris.HDD : The storage medium is HDD; - * 2./home/disk2/doris.SSD : The storage medium is SSD; - * 3./home/disk2/doris : The storage medium is HDD HDD(default) + - 1./home/disk1/doris.HDD : The storage medium is HDD; + - 2./home/disk2/doris.SSD : The storage medium is SSD; + - 3./home/disk2/doris : The storage medium is HDD(default) eg.2: @@ -213,8 +213,8 @@ See the section on `lower_case_table_names` variables in [Variables](../advanced **instructions** - * 1./home/disk1/doris,medium:hdd The storage medium is HDD; - * 2./home/disk2/doris,medium:ssd The storage medium is SSD; + - 1./home/disk1/doris,medium:hdd The storage medium is HDD; + - 2./home/disk2/doris,medium:ssd The storage medium is SSD; * BE webserver_port configuration @@ -274,128 +274,6 @@ Broker is deployed as a plug-in, independent of Doris. If you need to import dat **Note: In production environments, daemons should be used to start all instances to ensure that processes are automatically pulled up after they exit, such as [Supervisor](http://supervisord.org/). For daemon startup, in 0.9.0 and previous versions, you need to modify the start_xx.sh scripts to remove the last & symbol**. Starting with version 0.10.0, call `sh start_xx.sh` directly to start. Also refer to [here](https://www.cnblogs.com/lenmom/p/9973401.html) -## Expansion and contraction - -Doris can easily expand and shrink FE, BE, Broker instances. 
- -### FE Expansion and Compression - -High availability of FE can be achieved by expanding FE to three top-one nodes. - -Users can login to Master FE through MySQL client. By: - -`SHOW PROC '/frontends';` - -To view the current FE node situation. - -You can also view the FE node through the front-end page connection: ``http://fe_hostname:fe_http_port/frontend`` or ```http://fe_hostname:fe_http_port/system?Path=//frontends```. - -All of the above methods require Doris's root user rights. - -The process of FE node expansion and contraction does not affect the current system operation. - -#### Adding FE nodes - -FE is divided into three roles: Leader, Follower and Observer. By default, a cluster can have only one Leader and multiple Followers and Observers. Leader and Follower form a Paxos selection group. If the Leader goes down, the remaining Followers will automatically select a new Leader to ensure high write availability. Observer synchronizes Leader data, but does not participate in the election. If only one FE is deployed, FE defaults to Leader. - -The first FE to start automatically becomes Leader. On this basis, several Followers and Observers can be added. - -Add Follower or Observer. Connect to the started FE using mysql-client and execute: - -`ALTER SYSTEM ADD FOLLOWER "follower_host:edit_log_port";` - -or - -`ALTER SYSTEM ADD OBSERVER "observer_host:edit_log_port";` - -The follower\_host and observer\_host is the node IP of Follower or Observer, and the edit\_log\_port in its configuration file fe.conf. - -Configure and start Follower or Observer. Follower and Observer are configured with Leader. The following commands need to be executed at the first startup: - -`bin/start_fe.sh --helper host:edit_log_port --daemon` - -The host is the node IP of Leader, and the edit\_log\_port in Lead's configuration file fe.conf. The --helper is only required when follower/observer is first startup. - -View the status of Follower or Observer. Connect to any booted FE using mysql-client and execute: - -```SHOW PROC '/frontends';``` - -You can view the FE currently joined the cluster and its corresponding roles. - -> Notes for FE expansion: -> -> 1. The number of Follower FEs (including Leaders) must be odd. It is recommended that a maximum of three constituent high availability (HA) modes be deployed. -> 2. When FE is in a highly available deployment (1 Leader, 2 Follower), we recommend that the reading service capability of FE be extended by adding Observer FE. Of course, you can continue to add Follower FE, but it's almost unnecessary. -> 3. Usually a FE node can handle 10-20 BE nodes. It is suggested that the total number of FE nodes should be less than 10. Usually three can meet most of the needs. -> 4. The helper cannot point to the FE itself, it must point to one or more existing running Master/Follower FEs. - -#### Delete FE nodes - -Delete the corresponding FE node using the following command: - -```ALTER SYSTEM DROP FOLLOWER[OBSERVER] "fe_host:edit_log_port";``` - -> Notes for FE contraction: -> -> 1. When deleting Follower FE, make sure that the remaining Follower (including Leader) nodes are odd. - -### BE Expansion and Compression - -Users can login to Leader FE through mysql-client. By: - -```SHOW PROC '/backends';``` - -To see the current BE node situation. - -You can also view the BE node through the front-end page connection: ``http://fe_hostname:fe_http_port/backend`` or ``http://fe_hostname:fe_http_port/system?Path=//backends``. 
- -All of the above methods require Doris's root user rights. - -The expansion and scaling process of BE nodes does not affect the current system operation and the tasks being performed, and does not affect the performance of the current system. Data balancing is done automatically. Depending on the amount of data available in the cluster, the cluster will be restored to load balancing in a few hours to a day. For cluster load, see the [Tablet Load Balancing Document](../admin-manual/maint-monitor/tablet-meta-tool.md). - -#### Add BE nodes - -The BE node is added in the same way as in the **BE deployment** section. The BE node is added by the `ALTER SYSTEM ADD BACKEND` command. - -> Notes for BE expansion: -> -> 1. After BE expansion, Doris will automatically balance the data according to the load, without affecting the use during the period. - -#### Delete BE nodes - -There are two ways to delete BE nodes: DROP and DECOMMISSION - -The DROP statement is as follows: - -```ALTER SYSTEM DROP BACKEND "be_host:be_heartbeat_service_port";``` - -**Note: DROP BACKEND will delete the BE directly and the data on it will not be recovered!!! So we strongly do not recommend DROP BACKEND to delete BE nodes. When you use this statement, there will be corresponding error-proof operation hints.** - -DECOMMISSION clause: - -```ALTER SYSTEM DECOMMISSION BACKEND "be_host:be_heartbeat_service_port";``` - -> DECOMMISSION notes: -> -> 1. This command is used to safely delete BE nodes. After the command is issued, Doris attempts to migrate the data on the BE to other BE nodes, and when all data is migrated, Doris automatically deletes the node. -> 2. The command is an asynchronous operation. After execution, you can see that the BE node's `SystemDecommissioned` status is true through ``SHOW PROC '/backends';` Indicates that the node is offline. -> 3. The order **does not necessarily carry out successfully**. For example, when the remaining BE storage space is insufficient to accommodate the data on the offline BE, or when the number of remaining machines does not meet the minimum number of replicas, the command cannot be completed, and the BE will always be in the state of `SystemDecommissioned` as true. -> 4. The progress of DECOMMISSION can be viewed through `SHOW PROC '/backends';` Tablet Num, and if it is in progress, Tablet Num will continue to decrease. -> 5. The operation can be carried out by: -> ```CANCEL ALTER SYSTEM DECOMMISSION BACKEND "be_host:be_heartbeat_service_port";``` -> The order was cancelled. When cancelled, the data on the BE will maintain the current amount of data remaining. Follow-up Doris re-load balancing - -**For expansion and scaling of BE nodes in multi-tenant deployment environments, please refer to the [Multi-tenant Design Document](../admin-manual/maint-monitor/multi-tenant.md).** - -### Broker Expansion and Shrinkage - -There is no rigid requirement for the number of Broker instances. Usually one physical machine is deployed. Broker addition and deletion can be accomplished by following commands: - -```ALTER SYSTEM ADD BROKER broker_name "broker_host:broker_ipc_port";``` -```ALTER SYSTEM DROP BROKER broker_name "broker_host:broker_ipc_port";``` -```ALTER SYSTEM DROP ALL BROKER broker_name;``` - -Broker is a stateless process that can be started or stopped at will. Of course, when it stops, the job running on it will fail. Just try again. 
## Common Questions diff --git a/docs/install/source-install/compilation-with-ldb-toolchain.md b/docs/install/source-install/compilation-with-ldb-toolchain.md index d164a42e053..751ae1f588f 100644 --- a/docs/install/source-install/compilation-with-ldb-toolchain.md +++ b/docs/install/source-install/compilation-with-ldb-toolchain.md @@ -67,7 +67,7 @@ This works for most Linux distributions (CentOS, Ubuntu, etc.). 2. [Apache Maven 3.6.3](https://doris-thirdparty-repo.bj.bcebos.com/thirdparty/apache-maven-3.6.3-bin.tar.gz) 3. [Node v12.13.0](https://doris-thirdparty-repo.bj.bcebos.com/thirdparty/node-v12.13.0-linux-x64.tar.gz) - If your environment is somehow minimal, additional packages should be installed before compiling Doris. The following instructions describe how to setup a minimal CentOS 6 box to compile Doris. Other linux distros should be similar. + Different Linux distributions may have different components included by default.Therefore some additional components may need to be installed.The following takes centos6 as an example, other distributions are similar: ``` # install required system packages diff --git a/docs/install/source-install/compilation.md b/docs/install/source-install/compilation.md index 8c6c9d57b3d..e3184ee9389 100644 --- a/docs/install/source-install/compilation.md +++ b/docs/install/source-install/compilation.md @@ -59,7 +59,7 @@ This document focuses on how to code Doris through source code. | apache/incubator-doris:build-env-latest | before [0efef1b](https://github.com/apache/doris/commit/0efef1b332300887ee0473f9df9bdd9d7297d824) | | | apache/doris:build-env-for-1.0.0| | 1.0.0 | | apache/doris:build-env-for-1.1.0| | 1.1.0 | -| apache/doris:build-env-ldb-toolchain-latest | trunk | trunk | +| apache/doris:build-env-ldb-toolchain-latest | trunk | | **note**: @@ -133,11 +133,15 @@ This document focuses on how to code Doris through source code. > > If you are using `build-env-for-0.15.0` or `build-env-for-1.0.0` for the first time, use the following command when compiling: > - > `sh build.sh --clean --be --fe --ui` + >``` + >sh build.sh --clean --be --fe --ui + >``` > > `build-env-ldb-toolchain-latest` and later, use the following command: > - > ` sh build.sh --clean --be --fe` + >``` + >sh build.sh --clean --be --fe + >``` > > This is because from build-env-for-0.15.0, we upgraded thrift (0.9 -> 0.13), you need to use the --clean command to force the use of the new version of thrift to generate code files, otherwise incompatible code will appear. @@ -230,6 +234,13 @@ You can try to compile Doris directly in your own Linux environment. REPOSITORY_URL contains all third-party library source code packages and their historical versions. +3. `fatal error: Killed signal terminated program ...` + + If you encounter the above error when compiling with a Docker image, it may be that the memory allocated to the image is insufficient(The default memory size allocated by Docker is 2GB, and the peak memory usage during compilation is greater than 2GB). + + Try to increase the allocated memory of the image appropriately, 4GB ~ 8GB is recommended. + + ## Special statement Starting from version 0.13, the dependency on the two third-party libraries [1] and [2] will be removed in the default compiled output. These two third-party libraries are under [GNU General Public License V3](https://www.gnu.org/licenses/gpl-3.0.en.html). 
This license is incompatible with [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0), so it should not appear in the Apache release by default. diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-table/data-model.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-table/data-model.md index 9ac5d545737..3a73dbda8c4 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-table/data-model.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-table/data-model.md @@ -344,7 +344,7 @@ PROPERTIES ( 这种数据模型区别于 Aggregate 和 Unique 模型。数据完全按照导入文件中的数据进行存储,不会有任何聚合。即使两行数据完全相同,也都会保留。 而在建表语句中指定的 DUPLICATE KEY,只是用来指明底层数据按照那些列进行排序。(更贴切的名称应该为 “Sorted Column”,这里取名 “DUPLICATE KEY” 只是用以明确表示所用的数据模型。关于 “Sorted Column”的更多解释,可以参阅[前缀索引](./index/prefix-index.md))。在 DUPLICATE KEY 的选择上,我们建议适当的选择前 2-4 列就可以。 -这种数据模型适用于既没有聚合需求,又没有主键唯一性约束的原始数据的存储。更多使用场景,可参阅**聚合模型的局限性**小节。 +这种数据模型适用于既没有聚合需求,又没有主键唯一性约束的原始数据的存储。更多使用场景,可参阅[聚合模型的局限性](./data-model.md#聚合模型的局限性)小节。 ## 聚合模型的局限性 @@ -434,7 +434,7 @@ SELECT COUNT(*) FROM table; 所以,`select count(*) from table;` 的正确结果应该为 **4**。但如果我们只扫描 `user_id` 这一列,如果加上查询时聚合,最终得到的结果是 **3**(10001, 10002, 10003)。而如果不加查询时聚合,则得到的结果是 **5**(两批次一共5行数据)。可见这两个结果都是不对的。 -为了得到正确的结果,我们必须同时读取 `user_id` 和 `date` 这两列的数据,**再加上查询时聚合**,才能返回 **4** 这个正确的结果。也就是说,在 count(*) 查询中,Doris 必须扫描所有的 AGGREGATE KEY 列(这里就是 `user_id` 和 `date`),并且聚合后,才能得到语意正确的结果。当聚合列非常多时,count(*) 查询需要扫描大量的数据。 +为了得到正确的结果,我们必须同时读取 `user_id` 和 `date` 这两列的数据,**再加上查询时聚合**,才能返回 **4** 这个正确的结果。也就是说,在 `count(*)` 查询中,Doris 必须扫描所有的 AGGREGATE KEY 列(这里就是 `user_id` 和 `date`),并且聚合后,才能得到语意正确的结果。当聚合列非常多时,`count(*)` 查询需要扫描大量的数据。 因此,当业务上有频繁的 count(*) 查询时,我们建议用户通过增加一个**值恒为 1 的,聚合类型为 SUM 的列来模拟 count(\*)**。如刚才的例子中的表结构,我们修改如下: @@ -451,13 +451,13 @@ SELECT COUNT(*) FROM table; ### Duplicate 模型 -Duplicate 模型没有聚合模型的这个局限性。因为该模型不涉及聚合语意,在做 count(*) 查询时,任意选择一列查询,即可得到语意正确的结果。 +Duplicate 模型没有聚合模型的这个局限性。因为该模型不涉及聚合语意,在做 `count(*)` 查询时,任意选择一列查询,即可得到语意正确的结果。 ## 数据模型的选择建议 因为数据模型在建表时就已经确定,且**无法修改**。所以,选择一个合适的数据模型**非常重要**。 -1. Aggregate 模型可以通过预聚合,极大地降低聚合查询时所需扫描的数据量和查询的计算量,非常适合有固定模式的报表类查询场景。但是该模型对 count(*) 查询很不友好。同时因为固定了 Value 列上的聚合方式,在进行其他类型的聚合查询时,需要考虑语意正确性。 +1. Aggregate 模型可以通过预聚合,极大地降低聚合查询时所需扫描的数据量和查询的计算量,非常适合有固定模式的报表类查询场景。但是该模型对 `count(*)` 查询很不友好。同时因为固定了 Value 列上的聚合方式,在进行其他类型的聚合查询时,需要考虑语意正确性。 2. Unique 模型针对需要唯一主键约束的场景,可以保证主键唯一性约束。但是无法利用 ROLLUP 等预聚合带来的查询优势(因为本质是 REPLACE,没有 SUM 这种聚合方式)。 diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/get-starting/get-starting.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/get-starting/get-starting.md index cd619a15fb0..f77b8715100 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/get-starting/get-starting.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/get-starting/get-starting.md @@ -468,6 +468,7 @@ FE 将查询计划拆分成为 Fragment 下发到 BE 进行任务执行。BE 在 > 1. FE_HOST 是任一 FE 所在节点 IP,8030 为 fe.conf 中的 http_port。 > 2. 可以使用任一 BE 的 IP,以及 be.conf 中的 webserver_port 进行导入。如:`BE_HOST:8040` > 3. example_db 是你导入表所在的数据库 + 本地文件 `table1_data` 以 `,` 作为数据之间的分隔,具体内容如下: ```text --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org For additional commands, e-mail: commits-h...@doris.apache.org