This is an automated email from the ASF dual-hosted git repository.

luzhijing pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris-website.git
The following commit(s) were added to refs/heads/master by this push: new 896d903df67b [docs]update en docs of data model (#474) 896d903df67b is described below commit 896d903df67be796754e93279e9585fd4f67644e Author: Luzhijing <82810928+luzhij...@users.noreply.github.com> AuthorDate: Wed Mar 27 11:29:08 2024 +0800 [docs]update en docs of data model (#474) --- .../table-design/data-model/aggregate.md | 277 +++----- .../table-design/data-model/duplicate.md | 267 +++----- .../table-design/data-model/overview.md | 13 +- .../version-2.0/table-design/data-model/tips.md | 750 ++++----------------- .../version-2.0/table-design/data-model/unique.md | 148 ++-- 5 files changed, 345 insertions(+), 1110 deletions(-) diff --git a/versioned_docs/version-2.0/table-design/data-model/aggregate.md b/versioned_docs/version-2.0/table-design/data-model/aggregate.md index 3c4fe0efe4f0..491edd960053 100644 --- a/versioned_docs/version-2.0/table-design/data-model/aggregate.md +++ b/versioned_docs/version-2.0/table-design/data-model/aggregate.md @@ -25,16 +25,15 @@ under the License. --> -## Aggregate Model -We illustrate what aggregation model is and how to use it correctly with practical examples. +The following are practical examples to illustrate what an aggregated model is and how to use it correctly. -### Example 1: Importing Data Aggregation +### Importing Data Aggregation Assume that the business has the following data table schema: | ColumnName | Type | AggregationType | Comment | -|-----------------|--------------|-----------------|-----------------------------| +| --------------- | ------------ | --------------- | --------------------------- | | userid | LARGEINT | | user id | | date | DATE | | date of data filling | | City | VARCHAR (20) | | User City | @@ -75,27 +74,31 @@ The columns in the table are divided into Key (dimension) columns and Value (ind When data are imported, rows with the same contents in the Key columns will be aggregated into one row, and their values in the Value columns will be aggregated as their `AggregationType` specify. Currently, there are several aggregation methods and "agg_state" options available: -1. SUM: Accumulate the values in multiple rows. -2. REPLACE: The newly imported value will replace the previous value. -3. MAX: Keep the maximum value. -4. MIN: Keep the minimum value. -5. REPLACE_IF_NOT_NULL: Non-null value replacement. Unlike REPLACE, it does not replace null values. -6. HLL_UNION: Aggregation method for columns of HLL type, using the HyperLogLog algorithm for aggregation. -7. BITMAP_UNION: Aggregation method for columns of BITMAP type, performing a union aggregation of bitmaps. +- SUM: Accumulate the values in multiple rows. +- REPLACE: The newly imported value will replace the previous value. +- MAX: Keep the maximum value. +- MIN: Keep the minimum value. +- REPLACE_IF_NOT_NULL: Non-null value replacement. Unlike REPLACE, it does not replace null values. +- HLL_UNION: Aggregation method for columns of HLL type, using the HyperLogLog algorithm for aggregation. +- BITMAP_UNION: Aggregation method for columns of BITMAP type, performing a union aggregation of bitmaps. + +:::tip If these aggregation methods cannot meet the requirements, you can choose to use the "agg_state" type. 
+::: + Suppose that you have the following import data (raw data): -| user\_id | date | city | age | sex | last\_visit\_date | cost | max\_dwell\_time | min\_dwell\_time | -|----------|------------|-----------|-----|-----|---------------------|------|------------------|------------------| -| 10000 | 2017-10-01 | Beijing | 20 | 0 | 2017-10-01 06:00 | 20 | 10 | 10 | -| 10000 | 2017-10-01 | Beijing | 20 | 0 | 2017-10-01 07:00 | 15 | 2 | 2 | -| 10001 | 2017-10-01 | Beijing | 30 | 1 | 2017-10-01 17:05:45 | 2 | 22 | 22 | -| 10002 | 2017-10-02 | Shanghai | 20 | 1 | 2017-10-02 12:59:12 | 200 | 5 | 5 | -| 10003 | 2017-10-02 | Guangzhou | 32 | 0 | 2017-10-02 11:20:00 | 30 | 11 | 11 | -| 10004 | 2017-10-01 | Shenzhen | 35 | 0 | 2017-10-01 10:00:15 | 100 | 3 | 3 | -| 10004 | 2017-10-03 | Shenzhen | 35 | 0 | 2017-10-03 10:20:22 | 11 | 6 | 6 | +| user\_id | date | city | age | sex | last\_visit\_date | cost | max\_dwell\_time | min\_dwell\_time | +| -------- | ---------- | --------- | ---- | ---- | ------------------- | ---- | ---------------- | ---------------- | +| 10000 | 2017-10-01 | Beijing | 20 | 0 | 2017-10-01 06:00 | 20 | 10 | 10 | +| 10000 | 2017-10-01 | Beijing | 20 | 0 | 2017-10-01 07:00 | 15 | 2 | 2 | +| 10001 | 2017-10-01 | Beijing | 30 | 1 | 2017-10-01 17:05:45 | 2 | 22 | 22 | +| 10002 | 2017-10-02 | Shanghai | 20 | 1 | 2017-10-02 12:59:12 | 200 | 5 | 5 | +| 10003 | 2017-10-02 | Guangzhou | 32 | 0 | 2017-10-02 11:20:00 | 30 | 11 | 11 | +| 10004 | 2017-10-01 | Shenzhen | 35 | 0 | 2017-10-01 10:00:15 | 100 | 3 | 3 | +| 10004 | 2017-10-03 | Shenzhen | 35 | 0 | 2017-10-03 10:20:22 | 11 | 6 | 6 | And you can import data with the following sql: @@ -111,145 +114,68 @@ insert into example_db.example_tbl_agg1 values (10004,"2017-10-03","Shenzhen",35,0,"2017-10-03 10:20:22",11,6,6); ``` -Assume that this is a table recording the user behaviors when visiting a certain commodity page. The first row of data, for example, is explained as follows: - -| Data | Description | -|------------------|-----------------------------------------------------------| -| 10000 | User id, each user uniquely identifies id | -| 2017-10-01 | Data storage time, accurate to date | -| Beijing | User City | -| 20 | User Age | -| 0 | Gender male (1 for female) | -| 2017-10-01 06:00 | User's time to visit this page, accurate to seconds | -| 20 | Consumption generated by the user's current visit | -| 10 | User's visit, time to stay on the page | +This is a table recording the user behaviors when visiting a certain commodity page. 
The first row of data, for example, is explained as follows: + +| Data | Description | +| ---------------- | ---------------------------------------- | +| 10000 | User id, each user uniquely identifies id | +| 2017-10-01 | Data storage time, accurate to date | +| Beijing | User City | +| 20 | User Age | +| 0 | Gender male (1 for female) | +| 2017-10-01 06:00 | User's time to visit this page, accurate to seconds | +| 20 | Consumption generated by the user's current visit | +| 10 | User's visit, time to stay on the page | | 10 | User's current visit, time spent on the page (redundancy) | After this batch of data is imported into Doris correctly, it will be stored in Doris as follows: -| user\_id | date | city | age | sex | last\_visit\_date | cost | max\_dwell\_time | min\_dwell\_time | -|----------|------------|-----------|-----|-----|---------------------|------|------------------|------------------| -| 10000 | 2017-10-01 | Beijing | 20 | 0 | 2017-10-01 07:00 | 35 | 10 | 2 | -| 10001 | 2017-10-01 | Beijing | 30 | 1 | 2017-10-01 17:05:45 | 2 | 22 | 22 | -| 10002 | 2017-10-02 | Shanghai | 20 | 1 | 2017-10-02 12:59:12 | 200 | 5 | 5 | -| 10003 | 2017-10-02 | Guangzhou | 32 | 0 | 2017-10-02 11:20:00 | 30 | 11 | 11 | -| 10004 | 2017-10-01 | Shenzhen | 35 | 0 | 2017-10-01 10:00:15 | 100 | 3 | 3 | -| 10004 | 2017-10-03 | Shenzhen | 35 | 0 | 2017-10-03 10:20:22 | 11 | 6 | 6 | +| user\_id | date | city | age | sex | last\_visit\_date | cost | max\_dwell\_time | min\_dwell\_time | +| -------- | ---------- | --------- | ---- | ---- | ------------------- | ---- | ---------------- | ---------------- | +| 10000 | 2017-10-01 | Beijing | 20 | 0 | 2017-10-01 07:00 | 35 | 10 | 2 | +| 10001 | 2017-10-01 | Beijing | 30 | 1 | 2017-10-01 17:05:45 | 2 | 22 | 22 | +| 10002 | 2017-10-02 | Shanghai | 20 | 1 | 2017-10-02 12:59:12 | 200 | 5 | 5 | +| 10003 | 2017-10-02 | Guangzhou | 32 | 0 | 2017-10-02 11:20:00 | 30 | 11 | 11 | +| 10004 | 2017-10-01 | Shenzhen | 35 | 0 | 2017-10-01 10:00:15 | 100 | 3 | 3 | +| 10004 | 2017-10-03 | Shenzhen | 35 | 0 | 2017-10-03 10:20:22 | 11 | 6 | 6 | -As you can see, the data of User 10000 have been aggregated to one row, while those of other users remain the same. The explanation for the aggregated data of User 10000 is as follows (the first 5 columns remain unchanged, so it starts with Column 6 `last_visit_date`): +The data of User 10000 have been aggregated to one row, while those of other users remain the same. The explanation for the aggregated data of User 10000 is as follows (the first 5 columns remain unchanged, so it starts with Column 6 `last_visit_date`): -*`2017-10-01 07:00`: The `last_visit_date` column is aggregated by REPLACE, so `2017-10-01 07:00` has replaced `2017-10-01 06:00`. +- The value in the 6th column is 2017-10-01 07:00: The `last_visit_date` column is aggregated by REPLACE, so `2017-10-01 07:00` has replaced `2017-10-01 06:00`. -> Note: When using REPLACE to aggregate data from the same import batch, the order of replacement is uncertain. That means, in this case, the data eventually saved in Doris could be `2017-10-01 06:00`. However, for different import batches, it is certain that data from the new batch will replace those from the old batch. +:::tip -*`35`: The `cost`column is aggregated by SUM, so the update value `35` is the result of `20` + `15`. +When using REPLACE to aggregate data from the same import batch, the order of replacement is uncertain. That means, in this case, the data eventually saved in Doris could be `2017-10-01 06:00`. 
However, for different import batches, it is certain that data from the new batch will replace those from the old batch. -*`10`: The `max_dwell_time` column is aggregated by MAX, so `10` is saved as it is the maximum between `10` and `2`. +::: -*`2`: The `min_dwell_time` column is aggregated by MIN, so `2` is saved as it is the minimum between `10` and `2`. +- The value in the 7th column is 35: The `cost`column is aggregated by SUM, so the update value `35` is the result of `20` + `15`. +- The value in the 8th column is 10: The `max_dwell_time` column is aggregated by MAX, so `10` is saved as it is the maximum between `10` and `2`. +- The value in the 9th column is 2: The `min_dwell_time` column is aggregated by MIN, so `2` is saved as it is the minimum between `10` and `2`. After aggregation, Doris only stores the aggregated data. In other words, the detailed raw data will no longer be available. -### Example 2: Keep Detailed Data - -Here is a modified version of the table schema in Example 1: - -| ColumnName | Type | AggregationType | Comment | -|-----------------|--------------|-----------------|-----------------------------------------------------------------------| -| user_id | LARGEINT | | User ID | -| date | DATE | | Date when the data are imported | -| timestamp | DATETIME | | Date and time when the data are imported (with second-level accuracy) | -| city | VARCHAR (20) | | User location city | -| age | SMALLINT | | User age | -| sex | TINYINT | | User gender | -| last visit date | DATETIME | REPLACE | Last visit time of the user | -| cost | BIGINT | SUM | Total consumption of the user | -| max_dwell_time | INT | MAX | Maximum user dwell time | -| min_dwell_time | INT | MIN | Minimum user dwell time | +### Import data and aggregate with existing data. -A new column `timestamp` has been added to record the date and time when the data are imported (with second-level accuracy). 
+Assuming that the table already contains the previously imported data: -```sql -CREATE TABLE IF NOT EXISTS example_db.example_tbl_agg2 -( - `user_id` LARGEINT NOT NULL COMMENT "用户id", - `date` DATE NOT NULL COMMENT "数据灌入日期时间", - `timestamp` DATETIME NOT NULL COMMENT "数据灌入日期时间戳", - `city` VARCHAR(20) COMMENT "用户所在城市", - `age` SMALLINT COMMENT "用户年龄", - `sex` TINYINT COMMENT "用户性别", - `last_visit_date` DATETIME REPLACE DEFAULT "1970-01-01 00:00:00" COMMENT "用户最后一次访问时间", - `cost` BIGINT SUM DEFAULT "0" COMMENT "用户总消费", - `max_dwell_time` INT MAX DEFAULT "0" COMMENT "用户最大停留时间", - `min_dwell_time` INT MIN DEFAULT "99999" COMMENT "用户最小停留时间" -) -AGGREGATE KEY(`user_id`, `date`, `timestamp` ,`city`, `age`, `sex`) -DISTRIBUTED BY HASH(`user_id`) BUCKETS 1 -PROPERTIES ( -"replication_allocation" = "tag.location.default: 1" -); -``` +| user_id | date | city | age | sex | last\_visit\_date | cost | max\_dwell\_time | min\_dwell\_time | +| ------- | ---------- | --------- | ---- | ---- | ------------------- | ---- | ---------------- | ---------------- | +| 10000 | 2017-10-01 | Beijing | 20 | 0 | 2017-10-01 07:00 | 35 | 10 | 2 | +| 10001 | 2017-10-01 | Beijing | 30 | 1 | 2017-10-01 17:05:45 | 2 | 22 | 22 | +| 10002 | 2017-10-02 | Shanghai | 20 | 1 | 2017-10-02 12:59:12 | 200 | 5 | 5 | +| 10003 | 2017-10-02 | Guangzhou | 32 | 0 | 2017-10-02 11:20:00 | 30 | 11 | 11 | +| 10004 | 2017-10-01 | Shenzhen | 35 | 0 | 2017-10-01 10:00:15 | 100 | 3 | 3 | +| 10004 | 2017-10-03 | Shenzhen | 35 | 0 | 2017-10-03 10:20:22 | 11 | 6 | 6 | -Suppose that the import data are as follows: +Now import a new batch of data: -| user_id | date | timestamp | city | age | sex | last\_visit\_date | cost | max\_dwell\_time | min\_dwell\_time | -|---------|------------|---------------------|-----------|-----|-----|---------------------|------|------------------|------------------| -| 10000 | 2017-10-01 | 2017-10-01 08:00:05 | Beijing | 20 | 0 | 2017-10-01 06:00 | 20 | 10 | 10 | -| 10000 | 2017-10-01 | 2017-10-01 09:00:05 | Beijing | 20 | 0 | 2017-10-01 07:00 | 15 | 2 | 2 | -| 10001 | 2017-10-01 | 2017-10-01 18:12:10 | Beijing | 30 | 1 | 2017-10-01 17:05:45 | 2 | 22 | 22 | -| 10002 | 2017-10-02 | 2017-10-02 13:10:00 | Shanghai | 20 | 1 | 2017-10-02 12:59:12 | 200 | 5 | 5 | -| 10003 | 2017-10-02 | 2017-10-02 13:15:00 | Guangzhou | 32 | 0 | 2017-10-02 11:20:00 | 30 | 11 | 11 | -| 10004 | 2017-10-01 | 2017-10-01 12:12:48 | Shenzhen | 35 | 0 | 2017-10-01 10:00:15 | 100 | 3 | 3 | -| 10004 | 2017-10-03 | 2017-10-03 12:38:20 | Shenzhen | 35 | 0 | 2017-10-03 10:20:22 | 11 | 6 | 6 | +| user_id | date | city | age | sex | last\_visit\_date | cost | max\_dwell\_time | min\_dwell\_time | +| ------- | ---------- | -------- | ---- | ---- | ------------------- | ---- | ---------------- | ---------------- | +| 10004 | 2017-10-03 | Shenzhen | 35 | 0 | 2017-10-03 11:22:00 | 44 | 19 | 19 | +| 10005 | 2017-10-03 | Changsha | 29 | 1 | 2017-10-03 18:11:02 | 3 | 1 | 1 | -And you can import data with the following sql: - -```sql -insert into example_db.example_tbl_agg2 values -(10000,"2017-10-01","2017-10-01 08:00:05","Beijing",20,0,"2017-10-01 06:00:00",20,10,10), -(10000,"2017-10-01","2017-10-01 09:00:05","Beijing",20,0,"2017-10-01 07:00:00",15,2,2), -(10001,"2017-10-01","2017-10-01 18:12:10","Beijing",30,1,"2017-10-01 17:05:45",2,22,22), -(10002,"2017-10-02","2017-10-02 13:10:00","Shanghai",20,1,"2017-10-02 12:59:12",200,5,5), -(10003,"2017-10-02","2017-10-02 13:15:00","Guangzhou",32,0,"2017-10-02 11:20:00",30,11,11), -(10004,"2017-10-01","2017-10-01 
12:12:48","Shenzhen",35,0,"2017-10-01 10:00:15",100,3,3), -(10004,"2017-10-03","2017-10-03 12:38:20","Shenzhen",35,0,"2017-10-03 10:20:22",11,6,6); -``` - -After importing, this batch of data will be stored in Doris as follows: - -| user_id | date | timestamp | city | age | sex | last\_visit\_date | cost | max\_dwell\_time | min\_dwell\_time | -|---------|------------|---------------------|-----------|-----|-----|---------------------|------|------------------|------------------| -| 10000 | 2017-10-01 | 2017-10-01 08:00:05 | Beijing | 20 | 0 | 2017-10-01 06:00 | 20 | 10 | 10 | -| 10000 | 2017-10-01 | 2017-10-01 09:00:05 | Beijing | 20 | 0 | 2017-10-01 07:00 | 15 | 2 | 2 | -| 10001 | 2017-10-01 | 2017-10-01 18:12:10 | Beijing | 30 | 1 | 2017-10-01 17:05:45 | 2 | 22 | 22 | -| 10002 | 2017-10-02 | 2017-10-02 13:10:00 | Shanghai | 20 | 1 | 2017-10-02 12:59:12 | 200 | 5 | 5 | -| 10003 | 2017-10-02 | 2017-10-02 13:15:00 | Guangzhou | 32 | 0 | 2017-10-02 11:20:00 | 30 | 11 | 11 | -| 10004 | 2017-10-01 | 2017-10-01 12:12:48 | Shenzhen | 35 | 0 | 2017-10-01 10:00:15 | 100 | 3 | 3 | -| 10004 | 2017-10-03 | 2017-10-03 12:38:20 | Shenzhen | 35 | 0 | 2017-10-03 10:20:22 | 11 | 6 | 6 | - -As you can see, the stored data are exactly the same as the import data. No aggregation has ever happened. This is because, the newly added `timestamp` column results in **difference of Keys** among the rows. That is to say, as long as the Keys of the rows are not identical in the import data, Doris can save the complete detailed data even in the Aggregate Model. - -### Example 3: Aggregate Import Data and Existing Data - -Based on Example 1, suppose that you have the following data stored in Doris: - -| user_id | date | city | age | sex | last\_visit\_date | cost | max\_dwell\_time | min\_dwell\_time | -|---------|------------|-----------|-----|-----|---------------------|------|------------------|------------------| -| 10000 | 2017-10-01 | Beijing | 20 | 0 | 2017-10-01 07:00 | 35 | 10 | 2 | -| 10001 | 2017-10-01 | Beijing | 30 | 1 | 2017-10-01 17:05:45 | 2 | 22 | 22 | -| 10002 | 2017-10-02 | Shanghai | 20 | 1 | 2017-10-02 12:59:12 | 200 | 5 | 5 | -| 10003 | 2017-10-02 | Guangzhou | 32 | 0 | 2017-10-02 11:20:00 | 30 | 11 | 11 | -| 10004 | 2017-10-01 | Shenzhen | 35 | 0 | 2017-10-01 10:00:15 | 100 | 3 | 3 | -| 10004 | 2017-10-03 | Shenzhen | 35 | 0 | 2017-10-03 10:20:22 | 11 | 6 | 6 | - -Now you need to import a new batch of data: - -| user_id | date | city | age | sex | last\_visit\_date | cost | max\_dwell\_time | min\_dwell\_time | -|---------|------------|----------|-----|-----|---------------------|------|------------------|------------------| -| 10004 | 2017-10-03 | Shenzhen | 35 | 0 | 2017-10-03 11:22:00 | 44 | 19 | 19 | -| 10005 | 2017-10-03 | Changsha | 29 | 1 | 2017-10-03 18:11:02 | 3 | 1 | 1 | - -And you can import data with the following sql: +With the following SQL: ```sql insert into example_db.example_tbl_agg1 values @@ -259,15 +185,15 @@ insert into example_db.example_tbl_agg1 values After importing, the data stored in Doris will be updated as follows: -| user_id | date | city | age | sex | last\_visit\_date | cost | max\_dwell\_time | min\_dwell\_time | -|---------|------------|-----------|-----|-----|---------------------|------|------------------|------------------| -| 10000 | 2017-10-01 | Beijing | 20 | 0 | 2017-10-01 07:00 | 35 | 10 | 2 | -| 10001 | 2017-10-01 | Beijing | 30 | 1 | 2017-10-01 17:05:45 | 2 | 22 | 22 | -| 10002 | 2017-10-02 | Shanghai | 20 | 1 | 2017-10-02 12:59:12 | 200 | 5 | 5 | -| 
10003 | 2017-10-02 | Guangzhou | 32 | 0 | 2017-10-02 11:20:00 | 30 | 11 | 11 | -| 10004 | 2017-10-01 | Shenzhen | 35 | 0 | 2017-10-01 10:00:15 | 100 | 3 | 3 | -| 10004 | 2017-10-03 | Shenzhen | 35 | 0 | 2017-10-03 11:22:00 | 55 | 19 | 6 | -| 10005 | 2017-10-03 | Changsha | 29 | 1 | 2017-10-03 18:11:02 | 3 | 1 | 1 | +| user_id | date | city | age | sex | last\_visit\_date | cost | max\_dwell\_time | min\_dwell\_time | +| ------- | ---------- | --------- | ---- | ---- | ------------------- | ---- | ---------------- | ---------------- | +| 10000 | 2017-10-01 | Beijing | 20 | 0 | 2017-10-01 07:00 | 35 | 10 | 2 | +| 10001 | 2017-10-01 | Beijing | 30 | 1 | 2017-10-01 17:05:45 | 2 | 22 | 22 | +| 10002 | 2017-10-02 | Shanghai | 20 | 1 | 2017-10-02 12:59:12 | 200 | 5 | 5 | +| 10003 | 2017-10-02 | Guangzhou | 32 | 0 | 2017-10-02 11:20:00 | 30 | 11 | 11 | +| 10004 | 2017-10-01 | Shenzhen | 35 | 0 | 2017-10-01 10:00:15 | 100 | 3 | 3 | +| 10004 | 2017-10-03 | Shenzhen | 35 | 0 | 2017-10-03 11:22:00 | 55 | 19 | 6 | +| 10005 | 2017-10-03 | Changsha | 29 | 1 | 2017-10-03 18:11:02 | 3 | 1 | 1 | As you can see, the existing data and the newly imported data of User 10004 have been aggregated. Meanwhile, the new data of User 10005 have been added. @@ -277,11 +203,16 @@ In Doris, data aggregation happens in the following 3 stages: 2. The data compaction stage of the underlying BE. At this stage, BE will aggregate data from different batches that have been imported. 3. The data query stage. The data involved in the query will be aggregated accordingly. -At different stages, data will be aggregated to varying degrees. For example, when a batch of data is just imported, it may not be aggregated with the existing data. But for users, they **can only query aggregated data**. That is, what users see are the aggregated data, and they **should not assume that what they have seen are not or partly aggregated**. (See the [Limitations of Aggregate Model](#Limitations of Aggregate Model) section for more details.) +At different stages, data will be aggregated to varying degrees. For example, when a batch of data is just imported, it may not be aggregated with the existing data. But for users, they **can only query aggregated data**. That is, what users see are the aggregated data, and they **should not assume that what they have seen are not or partly aggregated**. ### agg_state - AGG_STATE cannot be used as a key column, and when creating a table, you need to declare the signature of the aggregation function. Users do not need to specify a length or default value. The actual storage size of the data depends on the function implementation. +:::tip + +AGG_STATE cannot be used as a key column, and when creating a table, you need to declare the signature of the aggregation function. Users do not need to specify a length or default value. The actual storage size of the data depends on the function implementation. + +::: + CREATE TABLE @@ -298,15 +229,15 @@ properties("replication_num" = "1"); ``` -"agg_state" is used to declare the data type as "agg_state," and "sum/group_concat" are the signatures of aggregation functions. +`agg_state` is used to declare the data type as `agg_state,` and `sum/group_concat` are the signatures of aggregation functions. -Please note that "agg_state" is a data type, similar to "int," "array," or "string." +Please note that `agg_state` is a data type, similar to `int`, `array`, or `string`. 
-"agg_state" can only be used in conjunction with the [state](../sql-manual/sql-functions/combinators/state.md)/[merge](../sql-manual/sql-functions/combinators/merge.md)/[union](../sql-manual/sql-functions/combinators/union.md) function combinators. +`agg_state` can only be used in conjunction with the [state](../sql-manual/sql-functions/combinators/state.md)/[merge](../sql-manual/sql-functions/combinators/merge.md)/[union](../sql-manual/sql-functions/combinators/union.md) function combinators. -"agg_state" represents an intermediate result of an aggregation function. For example, with the aggregation function "sum," "agg_state" can represent the intermediate state of summing values like sum(1, 2, 3, 4, 5), rather than the final result. +`agg_state` represents an intermediate result of an aggregation function. For example, with the aggregation function `sum`, `agg_state` can represent the intermediate state of summing values like `sum(1, 2, 3, 4, 5)`, rather than the final result. -The "agg_state" type needs to be generated using the "state" function. For the current table, it would be "sum_state" and "group_concat_state" for the "sum" and "group_concat" aggregation functions, respectively. +The `agg_state` type needs to be generated using the `state` function. For the current table, it would be `sum_state` and `group_concat_state` for the "sum" and `group_concat` aggregation functions, respectively. ```sql insert into aggstate values(1,sum_state(1),group_concat_state('a')); @@ -316,9 +247,9 @@ insert into aggstate values(1,sum_state(3),group_concat_state('c')); At this point, the table contains only one row. Please note that the table below is for illustrative purposes and cannot be selected/displayed directly: -| k1 | k2 | k3 | -|----|------------|---------------------------| -| 1 | sum(1,2,3) | group_concat_state(a,b,c) | +| k1 | k2 | k3 | +| ---- | ---------- | ------------------------- | +| 1 | sum(1,2,3) | group_concat_state(a,b,c) | Insert another record. @@ -328,12 +259,12 @@ insert into aggstate values(2,sum_state(4),group_concat_state('d')); The table's structure at this moment is... -| k1 | k2 | k3 | -|----|------------|---------------------------| -| 1 | sum(1,2,3) | group_concat_state(a,b,c) | -| 2 | sum(4) | group_concat_state(d) | +| k1 | k2 | k3 | +| ---- | ---------- | ------------------------- | +| 1 | sum(1,2,3) | group_concat_state(a,b,c) | +| 2 | sum(4) | group_concat_state(d) | -We can use the 'merge' operation to combine multiple states and return the final result calculated by the aggregation function. +We can use the `merge` operation to combine multiple states and return the final result calculated by the aggregation function. ``` mysql> select sum_merge(k2) from aggstate; @@ -364,11 +295,11 @@ insert into aggstate select 3,sum_union(k2),group_concat_union(k3) from aggstate The table's structure at this moment is... -| k1 | k2 | k3 | -|----|--------------|-----------------------------| -| 1 | sum(1,2,3) | group_concat_state(a,b,c) | -| 2 | sum(4) | group_concat_state(d) | -| 3 | sum(1,2,3,4) | group_concat_state(a,b,c,d) | +| k1 | k2 | k3 | +| ---- | ------------ | --------------------------- | +| 1 | sum(1,2,3) | group_concat_state(a,b,c) | +| 2 | sum(4) | group_concat_state(d) | +| 3 | sum(1,2,3,4) | group_concat_state(a,b,c,d) | You can achieve this through a query. @@ -390,4 +321,8 @@ mysql> select sum_merge(k2) , group_concat_merge(k3)from aggstate where k1 != 2; Users can perform more detailed aggregation function operations using `agg_state`. 
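As a further illustrative sketch (not taken from the files above), `sum_merge` and `group_concat_merge` behave like ordinary aggregate functions, so the stored intermediate states can also be merged per group instead of over the whole table:

```sql
-- Illustrative only: merge the intermediate states separately for each k1
-- value of the aggstate table created above.
select k1,
       sum_merge(k2)          as summed,
       group_concat_merge(k3) as concatenated
from aggstate
group by k1;
```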
-Please note that `agg_state` comes with a certain performance overhead. +:::tip + + `agg_state` comes with a certain performance overhead. + +::: \ No newline at end of file diff --git a/versioned_docs/version-2.0/table-design/data-model/duplicate.md b/versioned_docs/version-2.0/table-design/data-model/duplicate.md index 437726ca922a..ea0fd76cedad 100644 --- a/versioned_docs/version-2.0/table-design/data-model/duplicate.md +++ b/versioned_docs/version-2.0/table-design/data-model/duplicate.md @@ -24,70 +24,77 @@ specific language governing permissions and limitations under the License. --> +In certain multidimensional analysis scenarios, the data lacks both a primary key and aggregation requirements. For these cases, the Duplicate Data Model can be employed. -## Duplicate Model +In the Duplicate Data Model, data is stored precisely as it appears in the imported file, without any aggregation. Even if two rows of data are identical, both will be retained. The Duplicate Key specified in the table creation statement serves solely to indicate which columns the data should be sorted by during storage. It is advisable to select the first 2-4 columns for the Duplicate Key. -In some multidimensional analysis scenarios, there is no need for primary keys or data aggregation. For these cases, we introduce the Duplicate Model to. Here is an example: +For instance, consider a table with the following data columns that has no requirements for primary key updates or aggregations based on aggregate keys: -| ColumnName | Type | SortKey | Comment | -|------------|----------------|---------|----------------| -| timstamp | DATETIME | Yes | Log time | -| type | INT | Yes | Log type | -| error_code | INT | Yes | Error code | -| Error_msg | VARCHAR (1024) | No | Error details | -| op_id | BIGINT | No | Operator ID | -| op_time | DATETIME | No | Operation time | +:::tip +The actual table structure and data columns have not been provided in the original text. Hence, a specific example cannot be given here. If needed, you can add the specific table structure and data columns based on your actual situation. +::: -The corresponding to CREATE TABLE statement is as follows: +| ColumnName | Type | SortKey | Comment | +| ---------- | ------------- | ------- | -------------- | +| timstamp | DATETIME | Yes | Log time | +| type | INT | Yes | Log type | +| error_code | INT | Yes | Error code | +| Error_msg | VARCHAR (128) | No | Error details | +| op_id | BIGINT | No | Operator ID | +| op_time | DATETIME | No | Operation time | + +## **Default Duplicate Model** + +When no data model (Unique, Aggregate, or Duplicate) is specified during table creation, a Duplicate model table is created by default, and the sort columns are automatically selected according to certain rules. For example, in the following table creation statement, if no data model is specified, a Duplicate model table will be established, and the system will automatically select the first three columns as the sort columns. 
``` -CREATE TABLE IF NOT EXISTS example_db.example_tbl_duplicate +CREATE TABLE IF NOT EXISTS example_tbl_by_default ( `timestamp` DATETIME NOT NULL COMMENT "Log time", `type` INT NOT NULL COMMENT "Log type", `error_code` INT COMMENT "Error code", - `error_msg` VARCHAR(1024) COMMENT "Error details", + `error_msg` VARCHAR(1024) COMMENT "Error detail message", `op_id` BIGINT COMMENT "Operator ID", `op_time` DATETIME COMMENT "Operation time" ) -DUPLICATE KEY(`timestamp`, `type`, `error_code`) DISTRIBUTED BY HASH(`type`) BUCKETS 1 PROPERTIES ( "replication_allocation" = "tag.location.default: 1" ); -``` - -Different from the Aggregate and Unique Models, the Duplicate Model stores the data as they are and executes no aggregation. Even if there are two identical rows of data, they will both be retained. -The DUPLICATE KEY in the CREATE TABLE statement is only used to specify based on which columns the data are sorted. -(A more appropriate name than DUPLICATE KEY would be SORTING COLUMN, but it is named as such to specify the data model used. For more information, -see [Prefix Index](./index/index-overview.md).) For the choice of DUPLICATE KEY, we recommend the first 2-4 columns. -The Duplicate Model is suitable for storing raw data without aggregation requirements or primary key uniqueness constraints. -For more usage scenarios, see the [Limitations of Aggregate Model](#limitations-of-aggregate-model) section. - -### Duplicate Model Without SORTING COLUMN (Since Doris 2.0) +MySQL > desc example_tbl_by_default; ++------------+---------------+------+-------+---------+-------+ +| Field | Type | Null | Key | Default | Extra | ++------------+---------------+------+-------+---------+-------+ +| timestamp | DATETIME | No | true | NULL | NONE | +| type | INT | No | true | NULL | NONE | +| error_code | INT | Yes | true | NULL | NONE | +| error_msg | VARCHAR(1024) | Yes | false | NULL | NONE | +| op_id | BIGINT | Yes | false | NULL | NONE | +| op_time | DATETIME | Yes | false | NULL | NONE | ++------------+---------------+------+-------+---------+-------+ +6 rows in set (0.01 sec) +``` -When creating a table without specifying Unique, Aggregate, or Duplicate, a table with a Duplicate model will be created by default, and the SORTING COLUMN will be automatically specified. +## **Default Duplicate Model without Sort Columns (Since V2.0 )** -When users do not need SORTING COLUMN or Prefix Index, they can configure the following table property: +When users have no sorting requirements, they can add the following configuration to the table properties. This way, when creating the default Duplicate model , the system will not automatically select any sort columns. ``` "enable_duplicate_without_keys_by_default" = "true" ``` -Then, when creating the default model, the sorting column will no longer be specified, and no prefix index will be created for the table to reduce additional overhead in importing and storing. 
- The corresponding to CREATE TABLE statement is as follows: -```sql -CREATE TABLE IF NOT EXISTS example_db.example_tbl_duplicate_without_keys_by_default +``` +CREATE TABLE IF NOT EXISTS example_tbl_duplicate_without_keys_by_default ( - `timestamp` DATETIME NOT NULL COMMENT "日志时间", - `type` INT NOT NULL COMMENT "日志类型", - `error_code` INT COMMENT "错误码", - `error_msg` VARCHAR(1024) COMMENT "错误详细信息", - `op_id` BIGINT COMMENT "负责人id", - `op_time` DATETIME COMMENT "处理时间" + `timestamp` DATETIME NOT NULL COMMENT "Log time", + `type` INT NOT NULL COMMENT "Log type", + `error_code` INT COMMENT "Error code", + `error_msg` VARCHAR(1024) COMMENT "Error detail message", + `op_id` BIGINT COMMENT "Operator ID", + `op_time` DATETIME COMMENT "Operation time" ) DISTRIBUTED BY HASH(`type`) BUCKETS 1 PROPERTIES ( @@ -109,164 +116,40 @@ MySQL > desc example_tbl_duplicate_without_keys_by_default; 6 rows in set (0.01 sec) ``` -## Limitations of Aggregate Model - -This section is about the limitations of the Aggregate Model. - -The Aggregate Model only presents the aggregated data. That means we have to ensure the presentation consistency of data that has not yet been aggregated (for example, two different import batches). The following provides further explanation with examples. - -Suppose that you have the following table schema: - -| ColumnName | Type | AggregationType | Comment | -|------------|----------|-----------------|---------------------------------| -| user\_id | LARGEINT | | User ID | -| date | DATE | | Date when the data are imported | -| cost | BIGINT | SUM | Total user consumption | - -Assume that there are two batches of data that have been imported into the storage engine as follows: - -**batch 1** - -| user\_id | date | cost | -|----------|------------|------| -| 10001 | 2017-11-20 | 50 | -| 10002 | 2017-11-21 | 39 | - -**batch 2** - -| user\_id | date | cost | -|----------|------------|------| -| 10001 | 2017-11-20 | 1 | -| 10001 | 2017-11-21 | 5 | -| 10003 | 2017-11-22 | 22 | - -As you can see, data about User 10001 in these two import batches have not yet been aggregated. However, in order to ensure that users can only query the aggregated data as follows: - -| user\_id | date | cost | -|----------|------------|------| -| 10001 | 2017-11-20 | 51 | -| 10001 | 2017-11-21 | 5 | -| 10002 | 2017-11-21 | 39 | -| 10003 | 2017-11-22 | 22 | - -We have added an aggregation operator to the query engine to ensure the presentation consistency of data. - -In addition, on the aggregate column (Value), when executing aggregate class queries that are inconsistent with the aggregate type, please pay attention to the semantics. For example, in the example above, if you execute the following query: - -`SELECT MIN(cost) FROM table;` - -The result will be 5, not 1. - -Meanwhile, this consistency guarantee could considerably reduce efficiency in some queries. - -Take the basic count (*) query as an example: - -`SELECT COUNT(*) FROM table;` - -In other databases, such queries return results quickly. Because in actual implementation, the models can get the query result by counting rows and saving the statistics upon import, or by scanning only one certain column of data to get count value upon query, with very little overhead. But in Doris's Aggregation Model, the overhead of such queries is **large**. 
- -For the previous example: - -**batch 1** - -| user\_id | date | cost | -|----------|------------|------| -| 10001 | 2017-11-20 | 50 | -| 10002 | 2017-11-21 | 39 | +## **Duplicate Model with Sort Columns ** -**batch 2** +In the table creation statement, the `Duplicate Key` can be designated to indicate that data storage should be sorted according to these key columns. When choosing the `Duplicate Key`, it is recommended to select the first 2-4 columns. -| user\_id | date | cost | -|----------|------------|------| -| 10001 | 2017-11-20 | 1 | -| 10001 | 2017-11-21 | 5 | -| 10003 | 2017-11-22 | 22 | +An example of a table creation statement is as follows, specifying sorting based on the `timestamp`, `type`, and `error_code` columns. -Since the final aggregation result is: - -| user\_id | date | cost | -|----------|------------|------| -| 10001 | 2017-11-20 | 51 | -| 10001 | 2017-11-21 | 5 | -| 10002 | 2017-11-21 | 39 | -| 10003 | 2017-11-22 | 22 | - -The correct result of `select count (*) from table;` should be **4**. But if the model only scans the `user_id` -column and operates aggregation upon query, the final result will be **3** (10001, 10002, 10003). -And if it does not operate aggregation, the final result will be **5** (a total of five rows in two batches). Apparently, both results are wrong. - -In order to get the correct result, we must read both the `user_id` and `date` column, and **performs aggregation** when querying. -That is to say, in the `count (*)` query, Doris must scan all AGGREGATE KEY columns (in this case, `user_id` and `date`) -and aggregate them to get the semantically correct results. That means if there are many aggregated columns, `count (*)` queries could involve scanning large amounts of data. - -Therefore, if you need to perform frequent `count (*)` queries, we recommend that you simulate `count (*)` by adding a -column of value 1 and aggregation type SUM. In this way, the table schema in the previous example will be modified as follows: - -| ColumnName | Type | AggregationType | Comment | -|------------|--------|-----------------|---------------------------------| -| user ID | BIGINT | | User ID | -| date | DATE | | Date when the data are imported | -| Cost | BIGINT | SUM | Total user consumption | -| count | BIGINT | SUM | For count queries | - -The above adds a count column, the value of which will always be **1**, so the result of `select count (*) from table;` -is equivalent to that of `select sum (count) from table;` The latter is much more efficient than the former. However, -this method has its shortcomings, too. That is, it requires that users will not import rows with the same values in the -AGGREGATE KEY columns. Otherwise, `select sum (count) from table;` can only express the number of rows of the originally imported data, instead of the semantics of `select count (*) from table;` - -Another method is to add a `cound` column of value 1 but aggregation type of REPLACE. Then `select sum (count) from table;` -and `select count (*) from table;` could produce the same results. Moreover, this method does not require the absence of same AGGREGATE KEY columns in the import data. - -### Merge on Write of Unique Model - -The Merge on Write implementation in the Unique Model does not impose the same limitation as the Aggregate Model. -In Merge on Write, the model adds a `delete bitmap` for each imported rowset to mark the data being overwritten or deleted. 
With the previous example, after Batch 1 is imported, the data status will be as follows: - -**batch 1** - -| user_id | date | cost | delete bit | -|---------|------------|------|------------| -| 10001 | 2017-11-20 | 50 | false | -| 10002 | 2017-11-21 | 39 | false | - -After Batch 2 is imported, the duplicate rows in the first batch will be marked as deleted, and the status of the two batches of data is as follows - -**batch 1** - -| user_id | date | cost | delete bit | -|---------|------------|------|------------| -| 10001 | 2017-11-20 | 50 | **true** | -| 10002 | 2017-11-21 | 39 | false | - -**batch 2** - -| user\_id | date | cost | delete bit | -|----------|------------|------|------------| -| 10001 | 2017-11-20 | 1 | false | -| 10001 | 2017-11-21 | 5 | false | -| 10003 | 2017-11-22 | 22 | false | - -In queries, all data marked `true` in the `delete bitmap` will not be read, so there is no need for data aggregation. -Since there are 4 valid rows in the above data, the query result should also be 4. This also enables minimum overhead since it only scans one column of data. - -In the test environment, `count(*)` queries in Merge on Write of the Unique Model deliver 10 times higher performance than that of the Aggregate Model. - -### Duplicate Model - -The Duplicate Model does not impose the same limitation as the Aggregate Model because it does not involve aggregation semantics. -For any columns, it can return the semantically correct results in `count (*)` queries. - -## Key Columns - -For the Duplicate, Aggregate, and Unique Models, the Key columns will be specified when the table is created, -but there exist some differences: In the Duplicate Model, the Key columns of the table can be regarded as just "sorting columns", -but not unique identifiers. In Aggregate and Unique Models, the Key columns are both "sorting columns" and "unique identifier columns". - -## Suggestions for Choosing Data Model +``` +CREATE TABLE IF NOT EXISTS example_tbl_duplicate +( + `timestamp` DATETIME NOT NULL COMMENT "Log time", + `type` INT NOT NULL COMMENT "Log type", + `error_code` INT COMMENT "Error code", + `error_msg` VARCHAR(1024) COMMENT "Error detail message", + `op_id` BIGINT COMMENT "Operator ID", + `op_time` DATETIME COMMENT "Operation time" +) +DUPLICATE KEY(`timestamp`, `type`, `error_code`) +DISTRIBUTED BY HASH(`type`) BUCKETS 1 +PROPERTIES ( +"replication_allocation" = "tag.location.default: 1" +); -Since the data model was established when the table was built, and **irrevocable thereafter, it is very important to select the appropriate data model**. +MySQL > desc example_tbl_duplicate; ++------------+---------------+------+-------+---------+-------+ +| Field | Type | Null | Key | Default | Extra | ++------------+---------------+------+-------+---------+-------+ +| timestamp | DATETIME | No | true | NULL | NONE | +| type | INT | No | true | NULL | NONE | +| error_code | INT | Yes | true | NULL | NONE | +| error_msg | VARCHAR(1024) | Yes | false | NULL | NONE | +| op_id | BIGINT | Yes | false | NULL | NONE | +| op_time | DATETIME | Yes | false | NULL | NONE | ++------------+---------------+------+-------+---------+-------+ +6 rows in set (0.01 sec) +``` -1. The Aggregate Model can greatly reduce the amount of data scanned and query computation by pre-aggregation. Thus, it is very suitable for report query scenarios with fixed patterns. But this model is unfriendly to `count (*)` queries. 
Meanwhile, since the aggregation method on the Value column is fixed, semantic correctness should be considered in other types of aggregation queries. -2. The Unique Model guarantees the uniqueness of primary key for scenarios requiring a unique primary key. The downside is that it cannot exploit the advantage brought by pre-aggregation such as ROLLUP in queries. Users who have high-performance requirements for aggregate queries are recommended to use the newly added Merge on Write implementation since version 1.2. -3. The Duplicate Model is suitable for ad-hoc queries of any dimensions. Although it may not be able to take advantage of the pre-aggregation feature, it is not limited by what constrains the Aggregate Model and can give full play to the advantage of columnar storage (reading only the relevant columns, but not all Key columns). -4. If user need to use partial-update, please refer to document [partial-update](../data-operate/update-delete/partial-update.md) +Data will be stored according to the original data in the imported file without any aggregation. Even if two rows of data are exactly the same, the system will retain them all. The `Duplicate Key` specified in the table creation statement is only used to indicate which columns should be used for sorting during data storage. When choosing the `Duplicate Key`, it is recommended to select the first 2-4 columns. \ No newline at end of file diff --git a/versioned_docs/version-2.0/table-design/data-model/overview.md b/versioned_docs/version-2.0/table-design/data-model/overview.md index 5a2557f50710..efcd016758c0 100644 --- a/versioned_docs/version-2.0/table-design/data-model/overview.md +++ b/versioned_docs/version-2.0/table-design/data-model/overview.md @@ -29,14 +29,15 @@ This topic introduces the data models in Doris from a logical perspective so you ## Basic Concepts -In Doris, data is logically described in the form of tables. A table consists of rows and columns. Row is a row of user data. Column is used to describe different fields in a row of data. +This document mainly describes the data model of Doris from a logical perspective, aiming to assist users in better utilizing Doris for different scenarios. -Columns can be divided into two categories: Key and Value. From a business perspective, Key and Value correspond to dimension columns and indicator columns, respectively. The key column of Doris is the column specified in the table creation statement. The column after the keyword 'unique key' or 'aggregate key' or 'duplicate key' in the table creation statement is the key column, and the rest except the key column is the value column . +In Doris, data is logically represented in the form of tables. A table comprises Rows and Columns. Row represents a single data entry from the user. This row contains a set of related values that represent different attributes or fields, which are defined by the columns (Column) of the table. -Data models in Doris fall into three types: +Columns can be broadly categorized into two types: Key and Value. From a business perspective, Key and Value can correspond to dimension columns and metric columns, respectively. In Doris, the Key columns are those specified in the table creation statement. The columns that follow the keywords `unique key`, `aggregate key`, or `duplicate key` in the table creation statement are considered Key columns, while the remaining columns are Value columns. 
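For illustration, a minimal hypothetical sketch (the table and column names are made up, not from the files above): the columns listed after `DUPLICATE KEY` become the Key columns, while the remaining column is a Value column.

```sql
CREATE TABLE IF NOT EXISTS example_db.overview_demo
(
    `event_time` DATETIME NOT NULL COMMENT "Event time",   -- Key column
    `event_type` INT NOT NULL COMMENT "Event type",        -- Key column
    `message` VARCHAR(1024) COMMENT "Event detail"         -- Value column
)
DUPLICATE KEY(`event_time`, `event_type`)
DISTRIBUTED BY HASH(`event_type`) BUCKETS 1
PROPERTIES ("replication_allocation" = "tag.location.default: 1");
```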
-* Aggregate -* Unique -* Duplicate +The data models in Doris are primarily classified into three types: +- Duplicate: This data model allows for storing duplicate rows based on the specified key columns. It is suitable for scenarios where preserving all the original data records is essential. +- Unique: In this data model, each row is uniquely identified by the combination of values in the key columns. This ensures that no duplicate rows exist for a given set of key values. +- Aggregate: This model enables the aggregation of data based on the key columns. It is commonly used for scenarios where summary or aggregated information, such as totals or averages, is required. diff --git a/versioned_docs/version-2.0/table-design/data-model/tips.md b/versioned_docs/version-2.0/table-design/data-model/tips.md index 5acf45a72070..e53f4dee915b 100644 --- a/versioned_docs/version-2.0/table-design/data-model/tips.md +++ b/versioned_docs/version-2.0/table-design/data-model/tips.md @@ -25,692 +25,176 @@ under the License. --> -# User Guide -Doris uses MySQL protocol for communication. Users can connect to Doris clusters through MySQL Client or MySQL JDBC. MySQL Client 5.1 or newer versions are recommended because they support user names of more than 16 characters. This topic walks you through how to use Doris with the example of MySQL Client. +## Suggestions for column types -## Create Users +Suggestions for column types when creating a table: -### Root User Login and Change Password - -Doris has its built-in root, and the default password is empty. - ->Note: -> ->Doris provides a default root. -> ->The root user has all the privileges about the clusters by default. Users who have both Grant_priv and Node_priv can grant these privileges to other users. Node changing privileges include adding, deleting, and offlining FE, BE, and BROKER nodes. -> ->For more instructions on privileges, please refer to [Privilege Management](../admin-manual/privilege-ldap/user-privilege.md) - -After starting the Doris program, root or admin users can connect to Doris clusters. You can use the following command to log in to Doris. After login, you will enter the corresponding MySQL command line interface. - -```sql -[root@doris ~]# mysql -h FE_HOST -P9030 -uroot -Welcome to the MySQL monitor. Commands end with ; or \g. -Your MySQL connection id is 41 -Server version: 5.1.0 Doris version 1.0.0-preview2-b48ee2734 - -Copyright (c) 2000, 2022, Oracle and/or its affiliates. - -Oracle is a registered trademark of Oracle Corporation and/or its -affiliates. Other names may be trademarks of their respective -owners. - -Type 'help;' or '\h' for help. Type '\c' to clear the current input statement. - -mysql> -``` - ->1. ` FE_HOST` is the IP address of any FE node. ` 9030 ` is the query_port configuration in fe.conf. - -After login, you can change the root password by the following command: - -```sql -mysql> SET PASSWORD FOR 'root' = PASSWORD('your_password'); -Query OK, 0 rows affected (0.00 sec) -``` +1. The Key column should precede all Value columns. +2. Whenever possible, choose integer types. This is because the calculation and search efficiency of integer types are much higher than that of strings. +3. For selecting integer types of different lengths, follow the principle of sufficiency. +4. For the lengths of VARCHAR and STRING types, also follow the principle of sufficiency. -> `your_password` is a new password for the `root` user, which can be set at will. A strong password is recommended for security. 
The new password is required in the next login. - -### Create New Users -You can create a regular user named `test` with the following command: -```sql -mysql> CREATE USER 'test' IDENTIFIED BY 'test_passwd'; -Query OK, 0 rows affected (0.00 sec) -``` +## Limitations of aggregate model -Follow-up logins can be performed with the following connection commands. +This section is about the limitations of the Aggregate Model. -```sql -[root@doris ~]# mysql -h FE_HOST -P9030 -utest -ptest_passwd -``` +The Aggregate Model only presents the aggregated data. That means we have to ensure the presentation consistency of data that has not yet been aggregated (for example, two different import batches). The following provides further explanation with examples. -> By default, the newly created regular users do not have any privileges. Privileges can be granted to these users. +Suppose that you have the following table schema: -## Create Data Table and Import Data +| ColumnName | Type | AggregationType | Comment | +| ---------- | -------- | --------------- | ------------------------------- | +| user\_id | LARGEINT | | User ID | +| date | DATE | | Date when the data are imported | +| cost | BIGINT | SUM | Total user consumption | -### Create a Database +Assume that there are two batches of data that have been imported into the storage engine as follows: -Initially, root or admin users can create a database by the following command: +**batch 1** -```sql -CREATE DATABASE example_db; -``` - -> You can use the `HELP` command to check the syntax of all commands. For example, `HELP CREATE DATABASE;`. Or you can refer to the [SHOW CREATE DATABASE](../sql-manual/sql-reference/Show-Statements/SHOW-CREATE-DATABASE.md) command manual. -> -> If you don't know the full name of the command, you can use "HELP + a field of the command" for fuzzy query. For example, if you type in `HELP CREATE`, you can find commands including `CREATE DATABASE`, `CREATE TABLE`, and `CREATE USER`. -> -> ``` ->mysql> HELP CREATE; ->Many help items for your request exist. ->To make a more specific request, please type 'help <item>', ->where <item> is one of the following ->topics: -> CREATE CATALOG -> CREATE DATABASE -> CREATE ENCRYPTKEY -> CREATE EXTERNAL TABLE -> CREATE FILE -> CREATE FUNCTION -> CREATE INDEX -> CREATE MATERIALIZED VIEW -> CREATE POLICY -> CREATE REPOSITORY -> CREATE RESOURCE -> CREATE ROLE -> CREATE ROUTINE LOAD -> CREATE SQL BLOCK RULE -> CREATE SYNC JOB -> CREATE TABLE -> CREATE TABLE AS SELECT -> CREATE TABLE LIKE -> CREATE USER -> CREATE VIEW -> CREATE WORKLOAD GROUP -> SHOW CREATE CATALOG -> SHOW CREATE DATABASE -> SHOW CREATE FUNCTION -> SHOW CREATE LOAD -> SHOW CREATE REPOSITORY -> SHOW CREATE ROUTINE LOAD -> SHOW CREATE TABLE -> ``` - -After the database is created, you can view the information about the database via the [SHOW DATABASES](../sql-manual/sql-reference/Show-Statements/SHOW-DATABASES.md) command. - -```sql -MySQL> SHOW DATABASES; -+--------------------+ -| Database | -+--------------------+ -| example_db | -| information_schema | -+--------------------+ -2 rows in set (0.00 sec) -``` - -> `information_schema` exists for compatibility with MySQL protocol, so the information might not be 100% accurate in practice. Therefore, for information about the specific databases, please query the corresponding databases directly. - -### Authorize an Account - -After `example_db` is created, root/admin users can grant read/write privileges of `example_db` to regular users, such as `test`, using the `GRANT` command. 
After authorization, user `test` can perform operations on `example_db`. - -```sql -mysql> GRANT ALL ON example_db TO test; -Query OK, 0 rows affected (0.01 sec) -``` - -### Create a Table - -You can create a table using the [CREATE TABLE](../sql-manual/sql-reference/Data-Definition-Statements/Create/CREATE-TABLE.md) command. For detailed parameters, you can send a `HELP CREATE TABLE;` command. - -Firstly, you need to switch to the target database using the [USE](../sql-manual/sql-reference/Utility-Statements/USE.md) command: - -```sql -mysql> USE example_db; -Database changed -``` - -Doris supports two table creation methods: [compound partitioning and single partitioning](./data-partition.md). The following takes the Aggregate Model as an example to demonstrate how to create tables with these two methods, respectively. - -#### Single Partitioning - -Create a logical table named `table1`. The bucketing column is the `siteid` column, and the number of buckets is 10. - -The table schema is as follows: - -* `siteid`: INT (4 bytes); default value: 10 -* `citycode`: SMALLINT (2 bytes) -* `username`: VARCHAR, with a maximum length of 32; default value: empty string -* `pv`: BIGINT (8 bytes); default value: 0; This is a metric column, and Doris will aggregate the metric columns internally. The `pv` column is aggregated by SUM. - -The corresponding CREATE TABLE statement is as follows: -```sql -CREATE TABLE table1 -( - siteid INT DEFAULT '10', - citycode SMALLINT, - username VARCHAR(32) DEFAULT '', - pv BIGINT SUM DEFAULT '0' -) -AGGREGATE KEY(siteid, citycode, username) -DISTRIBUTED BY HASH(siteid) BUCKETS 10 -PROPERTIES("replication_num" = "1"); -``` - -#### Compound Partitioning - -Create a logical table named `table2`. - -The table schema is as follows: - -* `event_day`: DATE; no default value -* `siteid`: INT (4 bytes); default value: 10 -* `citycode`: SMALLINT (2 bytes) -* `username`: VARCHAR, with a maximum length of 32; default value: empty string -* `pv`: BIGINT (8 bytes); default value: 0; This is a metric column, and Doris will aggregate the metric columns internally. The `pv` column is aggregated by SUM. - -Use the `event_day` column as the partitioning column and create 3 partitions: p201706, p201707, and p201708. - -* p201706: Range [Minimum, 2017-07-01) -* p201707: Range [2017-07-01, 2017-08-01) -* p201708: Range [2017-08-01, 2017-09-01) - -> Note that the intervals are left-closed and right-open. - -HASH bucket each partition based on `siteid`. The number of buckets per partition is 10. 
- -The corresponding CREATE TABLE statement is as follows: -```sql -CREATE TABLE table2 -( - event_day DATE, - siteid INT DEFAULT '10', - citycode SMALLINT, - username VARCHAR(32) DEFAULT '', - pv BIGINT SUM DEFAULT '0' -) -AGGREGATE KEY(event_day, siteid, citycode, username) -PARTITION BY RANGE(event_day) -( - PARTITION p201706 VALUES LESS THAN ('2017-07-01'), - PARTITION p201707 VALUES LESS THAN ('2017-08-01'), - PARTITION p201708 VALUES LESS THAN ('2017-09-01') -) -DISTRIBUTED BY HASH(siteid) BUCKETS 10 -PROPERTIES("replication_num" = "1"); -``` - -After the table is created, you can view the information of the table in `example_db`: - -```sql -MySQL> SHOW TABLES; -+----------------------+ -| Tables_in_example_db | -+----------------------+ -| table1 | -| table2 | -+----------------------+ -2 rows in set (0.01 sec) - -MySQL> DESC table1; -+----------+-------------+------+-------+---------+-------+ -| Field | Type | Null | Key | Default | Extra | -+----------+-------------+------+-------+---------+-------+ -| siteid | int(11) | Yes | true | 10 | | -| citycode | smallint(6) | Yes | true | N/A | | -| username | varchar(32) | Yes | true | | | -| pv | bigint(20) | Yes | false | 0 | SUM | -+----------+-------------+------+-------+---------+-------+ -4 rows in set (0.00 sec) - -MySQL> DESC table2; -+-----------+-------------+------+-------+---------+-------+ -| Field | Type | Null | Key | Default | Extra | -+-----------+-------------+------+-------+---------+-------+ -| event_day | date | Yes | true | N/A | | -| siteid | int(11) | Yes | true | 10 | | -| citycode | smallint(6) | Yes | true | N/A | | -| username | varchar(32) | Yes | true | | | -| pv | bigint(20) | Yes | false | 0 | SUM | -+-----------+-------------+------+-------+---------+-------+ -5 rows in set (0.00 sec) -``` - -> Note: -> -> 1. As `replication_num` is set to `1`, the above tables are created with only one copy. We recommend that you adopt the default three-copy settings to ensure high availability. -> 2. You can dynamically add or delete partitions of compoundly partitioned tables. See `HELP ALTER TABLE`. -> 3. You can import data into the specified Partition. See `HELP LOAD;`. -> 4. You can dynamically change the table schema. See `HELP ALTER TABLE;`. -> 5. You can add Rollups to Tables to improve query performance. See the Rollup-related section in "Advanced Usage". -> 6. The value of the column is nullable by default, which may affect query performance. - -### Load Data - -Doris supports a variety of data loading methods. You can refer to [Data Loading](../data-operate/import/load-manual.md) for more details. The following uses Stream Load and Broker Load as examples. - -#### Stream Load - -The Stream Load method transfers data to Doris via HTTP protocol. It can import local data directly without relying on any other systems or components. For the detailed syntax, please see `HELP STREAM LOAD;`. - -Example 1: Use "table1_20170707" as the Label, import the local file `table1_data` into `table1`. - -```bash -curl --location-trusted -u test:test_passwd -H "label:table1_20170707" -H "column_separator:," -T table1_data http://FE_HOST:8030/api/example_db/table1/_stream_load -``` - -> 1. FE_HOST is the IP address of any FE node and 8030 is the http_port in fe.conf. -> 2. You can use the IP address of any BE and the webserver_port in be.conf for import. For example: `BE_HOST:8040`. - -The local file `table1_data` uses `,` as the separator between data. 
The details are as follows:

```text
1,1,Jim,2
2,1,grace,2
3,2,tom,2
4,3,bush,3
5,3,helen,3
```

Example 2: Use "table2_20170707" as the Label, import the local file `table2_data` into `table2`.

```bash
curl --location-trusted -u test:test -H "label:table2_20170707" -H "column_separator:|" -T table2_data http://127.0.0.1:8030/api/example_db/table2/_stream_load
```

The local file `table2_data` uses `|` as the separator between data. The details are as follows:

```
2017-07-03|1|1|jim|2
2017-07-05|2|1|grace|2
2017-07-12|3|2|tom|2
2017-07-15|4|3|bush|3
2017-07-12|5|3|helen|3
```

> Note:
>
> 1. The recommended file size for Stream Load is less than 10GB. Excessive file size will result in higher retry cost.
> 2. Each batch of import data should have a Label. The Label serves as the unique identifier of the load task, and guarantees that the same batch of data will only be successfully loaded into a database once. For more details, please see [Data Loading and Atomicity](https://doris.apache.org/docs/dev/data-operate/import/import-scenes/load-atomicity/).
> 3. Stream Load is a synchronous command. The return of the command indicates that the data has been loaded successfully; otherwise the data has not been loaded.

#### Broker Load

The Broker Load method imports externally stored data via deployed Broker processes. For more details, please see `HELP BROKER LOAD;`.

Example: Use "table1_20170708" as the Label, import files on HDFS into `table1`.

```sql
LOAD LABEL table1_20170708
(
    DATA INFILE("hdfs://your.namenode.host:port/dir/table1_data")
    INTO TABLE table1
)
WITH BROKER hdfs
(
    "username"="hdfs_user",
    "password"="hdfs_password"
)
PROPERTIES
(
    "timeout"="3600",
    "max_filter_ratio"="0.1"
);
```

Broker Load is an asynchronous command. Successful execution only indicates successful submission of the task. You can check whether the import task has been completed with `SHOW LOAD;`. For example:

```sql
SHOW LOAD WHERE LABEL = "table1_20170708";
```

In the return result, if you find `FINISHED` in the `State` field, the import is successful.

For more instructions on `SHOW LOAD`, see `HELP SHOW LOAD;`.

An asynchronous import task can be cancelled before it is completed:

```sql
CANCEL LOAD WHERE LABEL = "table1_20170708";
```

## Query the Data

### Simple Query

Query example:

```sql
MySQL> SELECT * FROM table1 LIMIT 3;
+--------+----------+----------+------+
| siteid | citycode | username | pv   |
+--------+----------+----------+------+
|      2 |        1 | 'grace'  |    2 |
|      5 |        3 | 'helen'  |    3 |
|      3 |        2 | 'tom'    |    2 |
+--------+----------+----------+------+
3 rows in set (0.01 sec)

MySQL> SELECT * FROM table1 ORDER BY citycode;
+--------+----------+----------+------+
| siteid | citycode | username | pv   |
+--------+----------+----------+------+
|      2 |        1 | 'grace'  |    2 |
|      1 |        1 | 'jim'    |    2 |
|      3 |        2 | 'tom'    |    2 |
|      4 |        3 | 'bush'   |    3 |
|      5 |        3 | 'helen'  |    3 |
+--------+----------+----------+------+
5 rows in set (0.01 sec)
```

### SELECT * EXCEPT

The `SELECT * EXCEPT` statement is used to exclude one or more columns from the result. The output will not include any of the specified columns.

```sql
MySQL> SELECT * except (username, citycode) FROM table1;
+--------+------+
| siteid | pv   |
+--------+------+
|      2 |    2 |
|      5 |    3 |
|      3 |    2 |
+--------+------+
3 rows in set (0.01 sec)
```

**Note**: `SELECT * EXCEPT` does not exclude columns that do not have a name. 
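The exclusion applies only to the select list, so it can be combined with `WHERE` and `ORDER BY` in the usual way. A small sketch along the lines of the example above (same `table1`; shown as an illustration rather than verified output):

```sql
-- Exclude the pv metric column but keep filtering and ordering;
-- the remaining columns (siteid, citycode, username) are returned as usual.
SELECT * EXCEPT (pv) FROM table1 WHERE citycode > 1 ORDER BY citycode;
```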
- -### Join Query - -Query example:: - -```sql -MySQL> SELECT SUM(table1.pv) FROM table1 JOIN table2 WHERE table1.siteid = table2.siteid; -+--------------------+ -| sum(`table1`.`pv`) | -+--------------------+ -| 12 | -+--------------------+ -1 row in set (0.20 sec) -``` - -### Subquery - -Query example:: - -```sql -MySQL> SELECT SUM(pv) FROM table2 WHERE siteid IN (SELECT siteid FROM table1 WHERE siteid > 2); -+-----------+ -| sum(`pv`) | -+-----------+ -| 8 | -+-----------+ -1 row in set (0.13 sec) -``` +| user\_id | date | cost | +| -------- | ---------- | ---- | +| 10001 | 2017-11-20 | 50 | +| 10002 | 2017-11-21 | 39 | -## Change Table Schema +**batch 2** -Use the [ALTER TABLE COLUMN](../sql-manual/sql-reference/Data-Definition-Statements/Alter/ALTER-TABLE-COLUMN.md) command to modify the table Schema, including the following changes. +| user\_id | date | cost | +| -------- | ---------- | ---- | +| 10001 | 2017-11-20 | 1 | +| 10001 | 2017-11-21 | 5 | +| 10003 | 2017-11-22 | 22 | -- Adding columns -- Deleting columns -- Modify column types -- Changing the order of columns - -The following table structure changes are illustrated by using the following example. - -The Schema for the original table1 is as follows: - -```text -+----------+-------------+------+-------+---------+-------+ -| Field | Type | Null | Key | Default | Extra | -+----------+-------------+------+-------+---------+-------+ -| siteid | int(11) | No | true | 10 | | -| citycode | smallint(6) | No | true | N/A | | -| username | varchar(32) | No | true | | | -| pv | bigint(20) | No | false | 0 | SUM | -+----------+-------------+------+-------+---------+-------+ -``` - -We add a new column uv, type BIGINT, aggregation type SUM, default value 0: - -```sql -ALTER TABLE table1 ADD COLUMN uv BIGINT SUM DEFAULT '0' after pv; -``` - -After successful submission, you can check the progress of the job with the following command: - -```sql -SHOW ALTER TABLE COLUMN; -``` +As you can see, data about User 10001 in these two import batches have not yet been aggregated. However, in order to ensure that users can only query the aggregated data as follows: -When the job status is ``FINISHED``, the job is complete. The new Schema has taken effect. +| user\_id | date | cost | +| -------- | ---------- | ---- | +| 10001 | 2017-11-20 | 51 | +| 10001 | 2017-11-21 | 5 | +| 10002 | 2017-11-21 | 39 | +| 10003 | 2017-11-22 | 22 | -After ALTER TABLE is completed, you can view the latest Schema via ``DESC TABLE``. +We have added an aggregation operator to the query engine to ensure the presentation consistency of data. -``` -mysql> DESC table1; -+----------+-------------+------+-------+---------+-------+ -| Field | Type | Null | Key | Default | Extra | -+----------+-------------+------+-------+---------+-------+ -| siteid | int(11) | No | true | 10 | | -| citycode | smallint(6) | No | true | N/A | | -| username | varchar(32) | No | true | | | -| pv | bigint(20) | No | false | 0 | SUM | -| uv | bigint(20) | No | false | 0 | SUM | -+----------+-------------+------+-------+---------+-------+ -5 rows in set (0.00 sec) -``` +In addition, on the aggregate column (Value), when executing aggregate class queries that are inconsistent with the aggregate type, please pay attention to the semantics. For example, in the example above, if you execute the following query: -You can cancel the currently executing job with the following command: +`SELECT MIN(cost) FROM table;` -```sql -CANCEL ALTER TABLE COLUMN FROM table1; -``` +The result will be 5, not 1. 
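To make the `MIN(cost)` point above concrete, here is a minimal, self-contained sketch of the same example. The database and table names (`example_db`, `example_tbl_min_demo`) are illustrative and not part of the original text; the schema simply recreates the `user_id`/`date`/`cost` table with `cost` aggregated by SUM:

```sql
-- Key columns: user_id, date; value column: cost, aggregated by SUM.
CREATE TABLE IF NOT EXISTS example_db.example_tbl_min_demo
(
    `user_id` BIGINT NOT NULL COMMENT "User ID",
    `date`    DATE   NOT NULL COMMENT "Date when the data are imported",
    `cost`    BIGINT SUM DEFAULT "0" COMMENT "Total user consumption"
)
AGGREGATE KEY(`user_id`, `date`)
DISTRIBUTED BY HASH(`user_id`) BUCKETS 1
PROPERTIES ("replication_allocation" = "tag.location.default: 1");

-- batch 1
INSERT INTO example_db.example_tbl_min_demo VALUES
(10001, "2017-11-20", 50),
(10002, "2017-11-21", 39);

-- batch 2
INSERT INTO example_db.example_tbl_min_demo VALUES
(10001, "2017-11-20", 1),
(10001, "2017-11-21", 5),
(10003, "2017-11-22", 22);

-- Only the aggregated rows are visible, so the smallest observable cost is 5:
-- the raw value 1 has already been folded into 50 + 1 = 51 for (10001, 2017-11-20).
SELECT MIN(`cost`) FROM example_db.example_tbl_min_demo;
```

If the raw, row-level minimum is also needed, one option is to keep it as a separate value column with the MIN aggregation type, or to store the data in a Duplicate table instead.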
-For more help, see ``HELP ALTER TABLE``. +Meanwhile, this consistency guarantee could considerably reduce efficiency in some queries. -## Rollup +Take the basic count (*) query as an example: -Rollup can be seen as a materialized index structure for a Table, **materialized** in the sense that its data is physically independent in storage, and **indexed** in the sense that Rollup can reorder columns to increase the hit rate of prefix indexes as well as reduce Key columns to increase the aggregation level of data. +`SELECT COUNT(*) FROM table;` -You can perform various changes to Rollup using [ALTER TABLE ROLLUP](../sql-manual/sql-reference/Data-Definition-Statements/Alter/ALTER-TABLE-ROLLUP.md). +In other databases, such queries return results quickly. Because in actual implementation, the models can get the query result by counting rows and saving the statistics upon import, or by scanning only one certain column of data to get count value upon query, with very little overhead. But in Doris's Aggregation Model, the overhead of such queries is **large**. -The following is an exemplified illustration. +For the previous example: -The original schema of `table1` is as follows: +**batch 1** -``` -+----------+-------------+------+-------+---------+-------+ -| Field | Type | Null | Key | Default | Extra | -+----------+-------------+------+-------+---------+-------+ -| siteid | int(11) | No | true | 10 | | -| citycode | smallint(6) | No | true | N/A | | -| username | varchar(32) | No | true | | | -| pv | bigint(20) | No | false | 0 | SUM | -| uv | bigint(20) | No | false | 0 | SUM | -+----------+-------------+------+-------+---------+-------+ -``` +| user\_id | date | cost | +| -------- | ---------- | ---- | +| 10001 | 2017-11-20 | 50 | +| 10002 | 2017-11-21 | 39 | -For `table1`, `siteid`, `citycode`, and `username` constitute a set of Key, based on which the pv fields are aggregated; if you have a frequent need to view the total city pv, you can create a Rollup consisting of only `citycode` and `pv`: +**batch 2** -```sql -ALTER TABLE table1 ADD ROLLUP rollup_city(citycode, pv); -``` +| user\_id | date | cost | +| -------- | ---------- | ---- | +| 10001 | 2017-11-20 | 1 | +| 10001 | 2017-11-21 | 5 | +| 10003 | 2017-11-22 | 22 | -After successful submission, you can check the progress of the task with the following command: +Since the final aggregation result is: -```sql -SHOW ALTER TABLE ROLLUP; -``` +| user\_id | date | cost | +| -------- | ---------- | ---- | +| 10001 | 2017-11-20 | 51 | +| 10001 | 2017-11-21 | 5 | +| 10002 | 2017-11-21 | 39 | +| 10003 | 2017-11-22 | 22 | -If the task status is ``FINISHED``, the job is completed. +The correct result of `select count (*) from table;` should be **4**. But if the model only scans the `user_id` +column and operates aggregation upon query, the final result will be **3** (10001, 10002, 10003). +And if it does not operate aggregation, the final result will be **5** (a total of five rows in two batches). Apparently, both results are wrong. -After the Rollup is created, you can use ``DESC table1 ALL`` to check the information of the Rollup. +In order to get the correct result, we must read both the `user_id` and `date` column, and **performs aggregation** when querying. +That is to say, in the `count (*)` query, Doris must scan all AGGREGATE KEY columns (in this case, `user_id` and `date`) +and aggregate them to get the semantically correct results. 
That means if there are many aggregated columns, `count (*)` queries could involve scanning large amounts of data. -``` -mysql> desc table1 all; -+-------------+----------+-------------+------+-------+--------+-------+ -| IndexName | Field | Type | Null | Key | Default | Extra | -+-------------+----------+-------------+------+-------+---------+-------+ -| table1 | siteid | int(11) | No | true | 10 | | -| | citycode | smallint(6) | No | true | N/A | | -| | username | varchar(32) | No | true | | | -| | pv | bigint(20) | No | false | 0 | SUM | -| | uv | bigint(20) | No | false | 0 | SUM | -| | | | | | | | -| rollup_city | citycode | smallint(6) | No | true | N/A | | -| | pv | bigint(20) | No | false | 0 | SUM | -+-------------+----------+-------------+------+-------+---------+-------+ -8 rows in set (0.01 sec) -``` +Therefore, if you need to perform frequent `count (*)` queries, we recommend that you simulate `count (*)` by adding a +column of value 1 and aggregation type SUM. In this way, the table schema in the previous example will be modified as follows: -You can cancel the currently ongoing task using the following command: +| ColumnName | Type | AggregationType | Comment | +| ---------- | ------ | --------------- | ------------------------------- | +| user ID | BIGINT | | User ID | +| date | DATE | | Date when the data are imported | +| Cost | BIGINT | SUM | Total user consumption | +| count | BIGINT | SUM | For count queries | -```sql -CANCEL ALTER TABLE ROLLUP FROM table1; -``` +The above adds a count column, the value of which will always be **1**, so the result of `select count (*) from table;` +is equivalent to that of `select sum (count) from table;` The latter is much more efficient than the former. However, +this method has its shortcomings, too. That is, it requires that users will not import rows with the same values in the +AGGREGATE KEY columns. Otherwise, `select sum (count) from table;` can only express the number of rows of the originally imported data, instead of the semantics of `select count (*) from table;` -With created Rollups, you do not need to specify the Rollup in queries, but only specify the original table for the query. The program will automatically determine if Rollup should be used. You can check whether Rollup is hit or not using the ``EXPLAIN your_sql;`` command. +Another method is to add a `cound` column of value 1 but aggregation type of REPLACE. Then `select sum (count) from table;` +and `select count (*) from table;` could produce the same results. Moreover, this method does not require the absence of same AGGREGATE KEY columns in the import data. -For more help, see `HELP ALTER TABLE`. +### Merge on write of unique model +The Merge on Write implementation in the Unique Model does not impose the same limitation as the Aggregate Model. +In Merge on Write, the model adds a `delete bitmap` for each imported rowset to mark the data being overwritten or deleted. With the previous example, after Batch 1 is imported, the data status will be as follows: +**batch 1** -## Materialized Views +| user_id | date | cost | delete bit | +| ------- | ---------- | ---- | ---------- | +| 10001 | 2017-11-20 | 50 | false | +| 10002 | 2017-11-21 | 39 | false | -Materialized views are a space-for-time data analysis acceleration technique. Doris supports building materialized views on top of base tables. 
For example, a partial column-based aggregated view can be built on top of a table with a granular data model, allowing for fast querying of both granular and aggregated data. +After Batch 2 is imported, the duplicate rows in the first batch will be marked as deleted, and the status of the two batches of data is as follows -Doris can automatically ensure data consistency between materialized views and base tables, and automatically match the appropriate materialized view at query time, greatly reducing the cost of data maintenance for users and providing a consistent and transparent query acceleration experience. +**batch 1** -For more information about materialized views, see [Materialized Views](../query-acceleration/materialized-view.md) +| user_id | date | cost | delete bit | +| ------- | ---------- | ---- | ---------- | +| 10001 | 2017-11-20 | 50 | **true** | +| 10002 | 2017-11-21 | 39 | false | -## Data Table Queries +**batch 2** -### Memory Limit +| user\_id | date | cost | delete bit | +| -------- | ---------- | ---- | ---------- | +| 10001 | 2017-11-20 | 1 | false | +| 10001 | 2017-11-21 | 5 | false | +| 10003 | 2017-11-22 | 22 | false | -To prevent excessive memory usage of one single query, Doris imposes memory limit on queries. By default, one query task should consume no more than 2GB of memory on one single BE node. +In queries, all data marked `true` in the `delete bitmap` will not be read, so there is no need for data aggregation. +Since there are 4 valid rows in the above data, the query result should also be 4. This also enables minimum overhead since it only scans one column of data. -If you find a `Memory limit exceeded` error, that means the program is trying to allocate more memory than the memory limit. You can solve this by optimizing your SQL statements. +In the test environment, `count(*)` queries in Merge on Write of the Unique Model deliver 10 times higher performance than that of the Aggregate Model. -You can change the 2GB memory limit by modifying the memory parameter settings. +### Duplicate model -Show the memory limit for one query: +The Duplicate Model does not impose the same limitation as the Aggregate Model because it does not involve aggregation semantics. +For any columns, it can return the semantically correct results in `count (*)` queries. -```sql -mysql> SHOW VARIABLES LIKE "%mem_limit%"; -+---------------+------------+ -| Variable_name | Value | -+---------------+------------+ -| exec_mem_limit| 2147483648 | -+---------------+------------+ -1 row in set (0.00 sec) -``` +## Key columns - `exec_mem_limit` is measured in byte. You can change the value of `exec_mem_limit` using the `SET` command. For example, you can change it to 8GB as follows: +For the Duplicate, Aggregate, and Unique Models, the Key columns will be specified when the table is created, +but there exist some differences: In the Duplicate Model, the Key columns of the table can be regarded as just "sorting columns", +but not unique identifiers. In Aggregate and Unique Models, the Key columns are both "sorting columns" and "unique identifier columns". 
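As a rough illustration of the Key-column difference described above, the following sketch declares the same two key columns under each of the three models (the table names are made up for this comparison):

```sql
-- Duplicate model: (k1, k2) is only a sort key; rows with identical keys are all kept.
CREATE TABLE dup_demo
(
    `k1` INT,
    `k2` INT,
    `v`  VARCHAR(32)
)
DUPLICATE KEY(`k1`, `k2`)
DISTRIBUTED BY HASH(`k1`) BUCKETS 1
PROPERTIES ("replication_allocation" = "tag.location.default: 1");

-- Unique model: (k1, k2) is both the sort key and the row identifier;
-- a newly written row with the same key replaces the old one.
CREATE TABLE uniq_demo
(
    `k1` INT,
    `k2` INT,
    `v`  VARCHAR(32)
)
UNIQUE KEY(`k1`, `k2`)
DISTRIBUTED BY HASH(`k1`) BUCKETS 1
PROPERTIES ("replication_allocation" = "tag.location.default: 1");

-- Aggregate model: rows with the same (k1, k2) are merged, and each value column
-- is combined according to its aggregation type (SUM here).
CREATE TABLE agg_demo
(
    `k1` INT,
    `k2` INT,
    `v`  BIGINT SUM DEFAULT "0"
)
AGGREGATE KEY(`k1`, `k2`)
DISTRIBUTED BY HASH(`k1`) BUCKETS 1
PROPERTIES ("replication_allocation" = "tag.location.default: 1");
```

Loading the same key twice therefore leaves two rows in `dup_demo`, one (the latest) row in `uniq_demo`, and a single merged row in `agg_demo`.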
-```sql -mysql> SET exec_mem_limit = 8589934592; -Query OK, 0 rows affected (0.00 sec) -mysql> SHOW VARIABLES LIKE "%mem_limit%"; -+---------------+------------+ -| Variable_name | Value | -+---------------+------------+ -| exec_mem_limit| 8589934592 | -+---------------+------------+ -1 row in set (0.00 sec) -``` +## Suggestions for choosing data model ->- The above change is executed at the session level and is only valid for the currently connected session. The default memory limit will restore after reconnection. ->- If you need to change the global variable, you can set: `SET GLOBAL exec_mem_limit = 8589934592;`. After this, you disconnect and log back in, and then the new parameter will take effect permanently. +Since the data model was established when the table was built, and **irrevocable thereafter, it is very important to select the appropriate data model**. -### Query Timeout - -The default query timeout is set to 300 seconds. If a query is not completed within 300 seconds, it will be cancelled by the Doris system. You change this parameter and customize the timeout for your application to achieve a blocking method similar to wait(timeout). - -View the current timeout settings: - -```sql -mysql> SHOW VARIABLES LIKE "%query_timeout%"; -+---------------+-------+ -| Variable_name | Value | -+---------------+-------+ -| QUERY_TIMEOUT | 300 | -+---------------+-------+ -1 row in set (0.00 sec) -``` - -Change query timeout to 1 minute: - -```sql -mysql> SET query_timeout = 60; -Query OK, 0 rows affected (0.00 sec) -``` - ->- The current timeout check interval is 5 seconds, so if you set the query timeout to less than 5 seconds, it might not be executed accurately. ->- The above changes are also performed at the session level. You can change the global variable by `SET GLOBAL`. - -### Broadcast/Shuffle Join - -The default way to implement Join is to filter the sub table conditionally, broadcast it to each node of the overall table to form a memory Hash table, and then stream read the data of the overall table for Hash Join, but if the filtered data of the sub table cannot be put into memory, the Join will not be completed, and then usually a memory overrun error will occur. - -In this case, it is recommended to explicitly specify Shuffle Join, also known as Partitioned Join, where both the sub table and overall table are Hashed according to the Key of the Join and then perform a distributed Join, with the memory consumption being spread across all compute nodes in the cluster. - -Doris will automatically attempt a Broadcast Join and switch to a Shuffle Join if the sub table is estimated to be too large; note that if a Broadcast Join is explicitly specified at this point, it will enforce Broadcast Join. 
- -Use Broadcast Join (default): - -```sql -mysql> select sum(table1.pv) from table1 join table2 where table1.siteid = 2; -+--------------------+ -| sum(`table1`.`pv`) | -+--------------------+ -| 10 | -+--------------------+ -1 row in set (0.20 sec) -``` - -Use Broadcast Join (explicitly specified): - -```sql -mysql> select sum(table1.pv) from table1 join [broadcast] table2 where table1.siteid = 2; -+--------------------+ -| sum(`table1`.`pv`) | -+--------------------+ -| 10 | -+--------------------+ -1 row in set (0.20 sec) -``` - -Use Shuffle Join: - -```sql -mysql> select sum(table1.pv) from table1 join [shuffle] table2 where table1.siteid = 2; -+--------------------+ -| sum(`table1`.`pv`) | -+--------------------+ -| 10 | -+--------------------+ -1 row in set (0.15 sec) -``` - -### Query Retry and High Availability - -When deploying multiple FE nodes, you can deploy a load balancing layer on top of multiple FEs to achieve high availability of Doris. - -Please refer to [Load Balancing](../admin-manual/cluster-management/load-balancing.md) for details on installation, deployment, and usage. - -## Update and Delete Data - -Doris supports two methods to delete imported data. One is to use the `DELETE FROM` statement and specify the target data by the `WHERE` condition. This method is widely applicable and suitable for less frequent scheduled deletion tasks. - -The other method is only used in the Unique Models with a unique primary key. It imports the the primary key rows that are to be deleted, and the final physical deletion of the data is performed internally by Doris using the deletion mark. This method is suitable for real-time deletion of data. - -For specific instructions on deleting and updating data, see [Data Update](../data-operate/update-delete/update.md). +1. The Aggregate Model can greatly reduce the amount of data scanned and query computation by pre-aggregation. Thus, it is very suitable for report query scenarios with fixed patterns. But this model is unfriendly to `count (*)` queries. Meanwhile, since the aggregation method on the Value column is fixed, semantic correctness should be considered in other types of aggregation queries. +2. The Unique Model guarantees the uniqueness of primary key for scenarios requiring a unique primary key. The downside is that it cannot exploit the advantage brought by pre-aggregation such as ROLLUP in queries. Users who have high-performance requirements for aggregate queries are recommended to use the newly added Merge on Write implementation since version 1.2. +3. The Duplicate Model is suitable for ad-hoc queries of any dimensions. Although it may not be able to take advantage of the pre-aggregation feature, it is not limited by what constrains the Aggregate Model and can give full play to the advantage of columnar storage (reading only the relevant columns, but not all Key columns). +4. If user need to use partial-update, please refer to document [partial-update](../data-operate/update/unique-load-update) \ No newline at end of file diff --git a/versioned_docs/version-2.0/table-design/data-model/unique.md b/versioned_docs/version-2.0/table-design/data-model/unique.md index 9e5706f5a688..3c7d19615d93 100644 --- a/versioned_docs/version-2.0/table-design/data-model/unique.md +++ b/versioned_docs/version-2.0/table-design/data-model/unique.md @@ -25,29 +25,18 @@ under the License. --> -## Unique Model -When users have data update requirement, they can choose to use the Unique data model. 
The Unique model ensures the uniqueness of keys, and when a user updates a piece of data, the newly written data will overwrite the old data with the same key. +When users have data update requirements, they can choose to use the Unique data model. The unique model can ensure the uniqueness of the primary key. When a user updates a piece of data, the newly written data will overwrite the old data with the same primary key. -**Two Implementation Methods** +The Unique data model provides two implementation methods: -The Unique model provides two implementation methods: +- Merge-on-read. In the merge-on-read implementation, users will not trigger any data deduplication-related operations when writing data. All data deduplication operations are performed during queries or compaction. Therefore, the write performance of merge-on-read is better, the query performance is poor, and the memory consumption is also higher. +- Merge-on-write. Starting from version 1.2, Doris has introduced the merge-on-write implementation, which completes all data deduplication work during the data write phase, thus providing excellent query performance. On a Unique table with the merge-on-write option enabled, data that will be overwritten and updated during the import phase will be marked for deletion, and new data will be written to a new file. During a query, all marked data will be filtered out at the file level, and t [...] -- Merge-on-read: In the merge-on-read implementation, no data deduplication-related operations are triggered when writing data. All data deduplication operations occur during queries or compaction. Therefore, merge-on-read has better write performance, poorer query performance, and higher memory consumption. -- Merge-on-write: In version 1.2, we introduced the merge-on-write implementation, which performs all data deduplication during the data writing phase, providing excellent query performance. - -Since version 2.0, merge-on-write has become a mature and stable, due to its excellent query performance, we recommend the majority of users to choose this implementation. Starting from version 2.1, merge-on-write has become the default implementation for the Unique model. -For detailed differences between the two implementation methods, refer to the subsequent sections in this chapter. For performance differences between the two implementation methods, see the description in the following section [Limitations of Aggregate Model](#limitations-of-aggregate-model). - -**Semantic of Data Updates** - -- The default update semantic for the Unique model is **whole-row `UPSERT`**, meaning UPDATE OR INSERT. If the key of a row of data exists, it is updated; if it does not exist, new data is inserted. Under the whole-row `UPSERT` semantic, even if users use `insert into` to write into specific columns, Doris will fill in the columns not provided with NULL values or default values in the Planner. -- Partial column updates: If users want to update only specific fields, they need to use the merge-on-write implementation and enable support for partial column updates through specific parameters. Refer to the documentation [Partial Column Updates](../data-operate/update-delete/partial-update.md) for relevant usage recommendations. - -### Merge on Read ( Same Implementation as Aggregate Model) +Let's look at how to create a Unique model table with merge-on-read and merge-on-write using a typical user basic information table as an example. 
This table does not have aggregation requirements and only needs to ensure the uniqueness of the primary key (The primary key is user_id + username). | ColumnName | Type | IsKey | Comment | -|---------------|---------------|-------|------------------------| +| ------------- | ------------- | ----- | ---------------------- | | user_id | BIGINT | Yes | User ID | | username | VARCHAR (50) | Yes | Username | | city | VARCHAR (20) | No | User location city | @@ -57,121 +46,64 @@ For detailed differences between the two implementation methods, refer to the su | address | VARCHAR (500) | No | User address | | register_time | DATETIME | No | User registration time | -This is a typical user basic information table. There is no aggregation requirement for such data. The only concern is to ensure the uniqueness of the primary key. (The primary key here is user_id + username). The CREATE TABLE statement for the above table is as follows: +## Merge-on-Read + +The table creation statement for Merge-on-read is as follows: ``` -CREATE TABLE IF NOT EXISTS example_db.example_tbl_unique +CREATE TABLE IF NOT EXISTS example_tbl_unique ( -`user_id` LARGEINT NOT NULL COMMENT "User ID", -`username` VARCHAR (50) NOT NULL COMMENT "Username", -`city` VARCHAR (20) COMMENT "User location city", -`age` SMALLINT COMMENT "User age", -`sex` TINYINT COMMENT "User sex", -`phone` LARGEINT COMMENT "User phone number", -`address` VARCHAR (500) COMMENT "User address", -`register_time` DATETIME COMMENT "User registration time" + `user_id` LARGEINT NOT NULL COMMENT "User ID", + `username` VARCHAR(50) NOT NULL COMMENT "Username", + `city` VARCHAR(20) COMMENT "User location city", + `age` SMALLINT COMMENT "User age", + `sex` TINYINT COMMENT "User gender", + `phone` LARGEINT COMMENT "User phone number", + `address` VARCHAR(500) COMMENT "User address", + `register_time` DATETIME COMMENT "User registration time" ) -UNIQUE KEY (`user_id`, `username`) +UNIQUE KEY(`user_id`, `username`) DISTRIBUTED BY HASH(`user_id`) BUCKETS 1 PROPERTIES ( "replication_allocation" = "tag.location.default: 1" ); ``` -This is the same table schema and the CREATE TABLE statement as those of the Aggregate Model: +## Merge-on-Write -| ColumnName | Type | AggregationType | Comment | -|---------------|---------------|-----------------|------------------------| -| user_id | BIGINT | | User ID | -| username | VARCHAR (50) | | Username | -| city | VARCHAR (20) | REPLACE | User location city | -| age | SMALLINT | REPLACE | User age | -| sex | TINYINT | REPLACE | User gender | -| phone | LARGEINT | REPLACE | User phone number | -| address | VARCHAR (500) | REPLACE | User address | -| register_time | DATETIME | REPLACE | User registration time | +The table creation statement for Merge-on-write is as follows: ``` -CREATE TABLE IF NOT EXISTS example_db.example_tbl_agg3 +CREATE TABLE IF NOT EXISTS example_tbl_unique_merge_on_write ( -`user_id` LARGEINT NOT NULL COMMENT "User ID", -`username` VARCHAR (50) NOT NULL COMMENT "Username", -`city` VARCHAR (20) REPLACE COMMENT "User location city", -`sex` TINYINT REPLACE COMMENT "User gender", -`phone` LARGEINT REPLACE COMMENT "User phone number", -`address` VARCHAR(500) REPLACE COMMENT "User address", -`register_time` DATETIME REPLACE COMMENT "User registration time" + `user_id` LARGEINT NOT NULL COMMENT "User ID", + `username` VARCHAR(50) NOT NULL COMMENT "Username", + `city` VARCHAR(20) COMMENT "User location city", + `age` SMALLINT COMMENT "User age", + `sex` TINYINT COMMENT "User gender", + `phone` LARGEINT COMMENT "User 
phone number", + `address` VARCHAR(500) COMMENT "User address", + `register_time` DATETIME COMMENT "User registration time" ) -AGGREGATE KEY(`user_id`, `username`) +UNIQUE KEY(`user_id`, `username`) DISTRIBUTED BY HASH(`user_id`) BUCKETS 1 PROPERTIES ( -"replication_allocation" = "tag.location.default: 1" -); -``` - -That is to say, the Merge on Read implementation of the Unique Model is equivalent to the REPLACE aggregation type in the Aggregate Model. The internal implementation and data storage are exactly the same. - -<version since="1.2"> - -### Merge on Write - -The Merge on Write implementation of the Unique Model can deliver better performance in aggregation queries with primary key limitations. - -In Doris 1.2.0, as a new feature, Merge on Write is disabled by default(before version 2.1), and users can enable it by adding the following property: - -``` +"replication_allocation" = "tag.location.default: 1", "enable_unique_key_merge_on_write" = "true" +); ``` -In Doris 2.1, Merge on Write is enabled by default. - -> Note: -> 1. For users on version 1.2: -> 1. It is recommended to use version 1.2.4 or above, as this version addresses some bugs and stability issues. -> 2. Add the configuration item `disable_storage_page_cache=false` in `be.conf`. Failure to add this configuration item may significantly impact data import performance. -> 2. For new users, it is strongly recommended to use version 2.0 or above. In version 2.0, there has been a significant improvement and optimization in the performance and stability of merge-on-write. - -Take the previous table as an example, the corresponding to CREATE TABLE statement should be: +Users need to add the property with **enable_unique_key_merge_on_write" = "true"** when creating the table to enable Merge-on-write. ``` -CREATE TABLE IF NOT EXISTS example_db.example_tbl_unique_merge_on_write -( -`user_id` LARGEINT NOT NULL COMMENT "User ID", -`username` VARCHAR (50) NOT NULL COMMENT "Username", -`city` VARCHAR (20) COMMENT "User location city", -`age` SMALLINT COMMENT "Userage", -`sex` TINYINT COMMENT "User gender", -`phone` LARGEINT COMMENT "User phone number", -`address` VARCHAR (500) COMMENT "User address", -`register_time` DATETIME COMMENT "User registration time" -) -UNIQUE KEY (`user_id`, `username`) -DISTRIBUTED BY HASH(`user_id`) BUCKETS 1 -PROPERTIES ( -"replication_allocation" = "tag.location.default: 1" "enable_unique_key_merge_on_write" = "true" -); ``` -The table schema produced by the above statement will be different from that of the Aggregate Model. - - -| ColumnName | Type | AggregationType | Comment | -|---------------|---------------|-----------------|------------------------| -| user_id | BIGINT | | User ID | -| username | VARCHAR (50) | | Username | -| city | VARCHAR (20) | NONE | User location city | -| age | SMALLINT | NONE | User age | -| sex | TINYINT | NONE | User gender | -| phone | LARGEINT | NONE | User phone number | -| address | VARCHAR (500) | NONE | User address | -| register_time | DATETIME | NONE | User registration time | - -On a Unique table with the Merge on Write option enabled, during the import stage, the data that are to be overwritten and updated will be marked for deletion, and new data will be written in. When querying, all data marked for deletion will be filtered out at the file level, and only the latest data would be readed. This eliminates the data aggregation cost while reading, and supports many types of predicate pushdown now. 
Therefore, it can largely improve performance in many scenarios, [...] - -[NOTE] +In version 2.1, Merge-on-Write will be the default behavior for the unique key model. Therefore, if you are using Doris version 2.1, it is important to read the relevant table creation documentation. -1. The implementation method of a Unique table can only be determined during table creation and cannot be modified through schema changes. -2. The old Merge on Read cannot be seamlessly upgraded to the Merge on Write implementation (since they have completely different data organization). If you want to switch to the Merge on Write implementation, you need to manually execute `insert into unique-mow-table select * from source table` to load data to new table. +## Use attention -</version> \ No newline at end of file +- The implementation of the Unique model can only be determined during table creation and cannot be modified through schema changes. +- The Merge-on-read table cannot be seamlessly upgraded to the Merge-on-write table (due to completely different data organization methods). If you need to switch to Merge-on-write, you must manually perform an `INSERT INTO unique-mow-table SELECT * FROM source_table` to re-import the data. +- **Whole-row Updates**: The default update semantics for the Unique model is a whole-row UPSERT, which stands for UPDATE OR INSERT. If the key for a row of data exists, it will be updated; if it does not exist, new data will be inserted. Under the whole-row UPSERT semantics, even if the user specifies only certain columns for insertion using `INSERT INTO`, Doris will fill in the unprovided columns with NULL values or default values during the planning phase. +- **Partial Column Updates**: If the user wishes to update only certain fields, they must use Merge-on-write and enable support for partial column updates through specific parameters. Please refer to the documentation on partial column updates(links) for relevant usage recommendations. \ No newline at end of file --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org For additional commands, e-mail: commits-h...@doris.apache.org