This is an automated email from the ASF dual-hosted git repository.

kassiez pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris-website.git


The following commit(s) were added to refs/heads/master by this push:
     new ee3b222a40 [doc](delete) add delete overview (#1488)
ee3b222a40 is described below

commit ee3b222a4016b13df23c816bb8ee0f6686c299a1
Author: zhannngchen <zhangc...@selectdb.com>
AuthorDate: Tue Dec 17 11:22:12 2024 +0800

    [doc](delete) add delete overview (#1488)
    
    ## Versions
    
    - [x] dev
    - [x] 3.0
    - [x] 2.1
    - [ ] 2.0
    
    ## Languages
    
    - [x] Chinese
    - [x] English
    
    ## Docs Checklist
    
    - [x] Checked by AI
    - [ ] Test Cases Built
---
 docs/data-operate/delete/delete-overview.md        | 77 ++++++++++++++++++++++
 .../current/data-operate/delete/delete-overview.md | 76 +++++++++++++++++++++
 .../data-operate/delete/delete-overview.md         | 76 +++++++++++++++++++++
 .../data-operate/delete/delete-overview.md         | 76 +++++++++++++++++++++
 sidebars.json                                      |  1 +
 .../data-operate/delete/delete-overview.md         | 77 ++++++++++++++++++++++
 .../data-operate/delete/delete-overview.md         | 77 ++++++++++++++++++++++
 7 files changed, 460 insertions(+)

diff --git a/docs/data-operate/delete/delete-overview.md 
b/docs/data-operate/delete/delete-overview.md
new file mode 100644
index 0000000000..994f2274c7
--- /dev/null
+++ b/docs/data-operate/delete/delete-overview.md
@@ -0,0 +1,77 @@
+---
+{
+    "title": "Delete Overview",
+    "language": "en"
+}
+
+---
+
+<!-- 
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+In Apache Doris, the delete operation is a key feature for managing and 
cleaning data to meet the flexibility needs of users in large-scale data 
analysis scenarios. Doris's deletion mechanism supports efficient logical 
deletion and multi-version data management, achieving a good balance between 
performance and flexibility.
+
+## Implementation Mechanism of Deletion
+
+Doris's delete operation uses **logical deletion** rather than directly 
physically deleting data. The core implementation mechanisms are as follows:
+
+1. **Logical Deletion**. The delete operation does not directly remove data 
from storage but adds a delete marker to the target data. There are two main 
ways to implement logical deletion: delete predicate and delete sign.
+
+    1. Delete predicate is used for Duplicate and Aggregate models. Each 
deletion directly records a conditional predicate on the corresponding dataset 
to filter out the deleted data during queries.
+    2. Delete sign is used for the Unique Key model. Each deletion writes a 
new batch of data to overwrite the data to be deleted, and the hidden column 
`__DORIS_DELETE_SIGN__` of the new data is set to 1, indicating that the data 
has been deleted.
+    3. Performance comparison: A delete predicate is very fast. Whether it 
deletes 1 row or 100 million rows, the cost is almost the same, because it only 
writes a conditional predicate to the dataset. The write cost of a delete sign 
is proportional to the amount of data being deleted.
+
+2. **Multi-Version Data Management**. Doris supports multi-version data (MVCC, 
Multi-Version Concurrency Control), allowing concurrent operations on the same 
dataset without affecting query results. The delete operation creates a new 
version containing the delete marker, while the old version data is still 
retained.
+
+3. **Physical Deletion (Compaction)**. The periodically executed compaction 
process cleans up data marked for deletion, thereby freeing up storage space. 
This process is automatically completed by the system without user 
intervention. Note that only Base Compaction will physically delete data, while 
Cumulative Compaction only merges and reorders data, reducing the number of 
rowsets and segments.
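+
+To inspect the delete sign on a Unique Key table, hidden columns can be made visible for the current session. This is a minimal sketch that relies on the `show_hidden_columns` session variable; the table name `example_unique_tbl` is a hypothetical placeholder:
+
+```sql
+-- Make hidden columns visible for this session.
+SET show_hidden_columns = true;
+-- example_unique_tbl is a hypothetical table name.
+DESC example_unique_tbl;
+```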
+
+## Use Cases for Delete Operations
+
+Doris provides various deletion methods to meet different needs:
+
+### Conditional Deletion
+
+Users can delete rows that meet specified conditions. For example:
+
+```sql
+DELETE FROM table_name WHERE condition;
+```
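+
+As a slightly more concrete sketch, the statement below deletes matching rows from a single partition. The table `example_tbl`, the partition `p1`, and the columns `k1` and `dt` are illustrative assumptions, not names from this document:
+
+```sql
+-- Hypothetical table, partition, and column names, for illustration only.
+DELETE FROM example_tbl PARTITION p1
+WHERE k1 = 3 AND dt < '2024-01-01';
+```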
+
+### Batch Deletion via data loading
+
+During data loading, logical deletion can be achieved by overwriting. This 
method is suitable for batch deletion of a large number of keys or 
synchronizing TP database deletions during CDC binlog synchronization.
+
+### Deleting All Data
+
+In some cases, data can be deleted by directly truncating the table or 
partition. For example:
+
+```sql
+TRUNCATE TABLE table_name;
+```
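+
+To clear only specific partitions rather than the whole table, a partition list can be given. The table and partition names below are hypothetical:
+
+```sql
+-- Hypothetical table and partition names, for illustration only.
+TRUNCATE TABLE example_tbl PARTITION(p1, p2);
+```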
+
+### Atomic Overwrite Using Temporary Partitions
+
+In some cases, users may want to rewrite the data of a partition. If the data 
is deleted and then imported, there will be a period when the data is 
unavailable. In this case, users can create a corresponding temporary 
partition, import the new data into the temporary partition, and then replace 
the original partition atomically to achieve the goal.
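+
+The steps above can be sketched as follows. The sketch assumes a hypothetical range-partitioned table `example_tbl` whose lowest partition `p1` covers values below `2024-02-01`, and a hypothetical staging table `example_src` that holds the new data:
+
+```sql
+-- 1. Create a temporary partition with the same range as p1 (hypothetical names).
+ALTER TABLE example_tbl ADD TEMPORARY PARTITION tp1 VALUES LESS THAN ("2024-02-01");
+-- 2. Load the new data into the temporary partition.
+INSERT INTO example_tbl TEMPORARY PARTITION (tp1)
+SELECT * FROM example_src;
+-- 3. Atomically swap the temporary partition in for the original one.
+ALTER TABLE example_tbl REPLACE PARTITION (p1) WITH TEMPORARY PARTITION (tp1);
+```
+
+Queries continue to read the old contents of `p1` until the final statement completes, so there is no window in which the partition appears empty.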
+
+## Notes
+
+1. The delete operation generates new data versions, so frequent deletions may 
increase the number of versions, affecting query performance.
+2. Compaction is a key step in freeing up storage space. Users are advised to 
adjust the compaction strategy based on system load.
+3. Deleted data will still occupy storage until compaction is completed, so 
the delete operation itself will not immediately reduce storage usage.
+
diff --git 
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/delete/delete-overview.md
 
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/delete/delete-overview.md
new file mode 100644
index 0000000000..74b6a6049a
--- /dev/null
+++ 
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/delete/delete-overview.md
@@ -0,0 +1,76 @@
+---
+{
+    "title": "删除操作概述",
+    "language": "zh-CN"
+}
+---
+
+<!-- 
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+在 Apache Doris 中,删除操作(Delete)是一项关键功能,用于管理和清理数据,以满足用户在大规模数据分析场景中的灵活性需求。Doris 
的删除机制支持高效的标记删除和多版本数据管理,在性能和灵活性之间达到了良好的平衡。
+
+## 删除的实现机制
+
+Doris 的删除操作采用**标记删除(Logical Deletion)**的方式,而不是直接物理删除数据。以下是其核心实现机制:
+
+1. **标记删除**。删除操作不会直接从存储中移除数据,而是为目标数据添加一条删除标记。标记删除主要有两种实现方式:delete 谓词和 delete 
sign。
+
+   1. delete 谓词用于 Duplicate 模型和 Aggregate 
模型,每次删除会直接在对应的数据集上记录一个条件谓词,用于在查询时过滤掉被删除的数据。
+   2. delete sign 用于 Unique Key 模型,每次删除会新写入一批数据覆盖要被删除的数据,同时新写入的数据会将隐藏列 
`__DORIS_DELETE_SIGN__` 设置为 1,表示该数据已经被删除。
+   3. 性能比较:“delete 谓词”的操作速度非常快,无论是删除 1 条数据还是 1 
亿条数据,速度都差不多——都是写一个条件谓词到数据集上;delete sign 的写入速度与数据量成正比。
+
+2. **多版本数据管理**。Doris 支持多版本数据(MVCC,Multi-Version Concurrency 
Control),允许在同一数据集上进行并发操作而不会影响查询结果。删除操作会创建一个新的版本,其中包含删除标记,而旧版本数据仍然被保留。
+
+3. 
**物理删除(Compaction)**。定期执行的合并压缩(Compaction)过程会清理标记为删除的数据,从而释放存储空间。此过程由系统自动完成,无需用户手动干预。注意,只有
 Base Compaction 才会对数据进行物理删除,Cumulative Compaction 仅对数据进行合并及重新排序,减少 rowset 及 
segment 数量。
+
+## 删除操作的使用场景
+
+Doris 提供多种删除方式,以满足不同场景的需求:
+
+### 条件删除
+
+用户可以通过指定过滤条件,删除满足条件的行。例如:
+
+```sql
+DELETE FROM table_name WHERE condition;
+```
+
+### 通过导入进行批量删除
+
+在数据导入时,通过覆盖的方式实现逻辑删除。这种方式适用于批量删除大量的 key,或者在 CDC 同步 binlog 时同步 TP 数据库的删除操作。
+
+### 删除全部数据
+
+在某些情况下,可以通过直接清空表或分区实现对数据的删除,例如:
+
+```sql
+TRUNCATE TABLE table_name;
+```
+
+### 使用临时分区实现原子覆盖写
+
+某些情况下,用户希望能够重写某一分区的数据,但如果采用先删除再导入的方式进行,在中间会有一段时间无法查看数据。这时,用户可以先创建一个对应的临时分区,将新的数据导入到临时分区后,通过替换操作,原子性地替换原有分区,以达到目的。
+
+## 注意事项
+
+1. 删除操作会生成新的数据版本,因此频繁执行删除可能会导致版本数量增加,从而影响查询性能。
+2. 合并压缩是释放存储空间的关键步骤,建议用户根据系统负载调整压缩策略。
+3. 删除后的数据在合并压缩完成之前仍会占用存储,因此删除操作本身不会立即降低存储使用。
+
diff --git 
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/data-operate/delete/delete-overview.md
 
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/data-operate/delete/delete-overview.md
new file mode 100644
index 0000000000..74b6a6049a
--- /dev/null
+++ 
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/data-operate/delete/delete-overview.md
@@ -0,0 +1,76 @@
+---
+{
+    "title": "删除操作概述",
+    "language": "zh-CN"
+}
+---
+
+<!-- 
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+在 Apache Doris 中,删除操作(Delete)是一项关键功能,用于管理和清理数据,以满足用户在大规模数据分析场景中的灵活性需求。Doris 
的删除机制支持高效的标记删除和多版本数据管理,在性能和灵活性之间达到了良好的平衡。
+
+## 删除的实现机制
+
+Doris 的删除操作采用**标记删除(Logical Deletion)**的方式,而不是直接物理删除数据。以下是其核心实现机制:
+
+1. **标记删除**。删除操作不会直接从存储中移除数据,而是为目标数据添加一条删除标记。标记删除主要有两种实现方式:delete 谓词和 delete 
sign。
+
+   1. delete 谓词用于 Duplicate 模型和 Aggregate 
模型,每次删除会直接在对应的数据集上记录一个条件谓词,用于在查询时过滤掉被删除的数据。
+   2. delete sign 用于 Unique Key 模型,每次删除会新写入一批数据覆盖要被删除的数据,同时新写入的数据会将隐藏列 
`__DORIS_DELETE_SIGN__` 设置为 1,表示该数据已经被删除。
+   3. 性能比较:“delete 谓词”的操作速度非常快,无论是删除 1 条数据还是 1 
亿条数据,速度都差不多——都是写一个条件谓词到数据集上;delete sign 的写入速度与数据量成正比。
+
+2. **多版本数据管理**。Doris 支持多版本数据(MVCC,Multi-Version Concurrency 
Control),允许在同一数据集上进行并发操作而不会影响查询结果。删除操作会创建一个新的版本,其中包含删除标记,而旧版本数据仍然被保留。
+
+3. 
**物理删除(Compaction)**。定期执行的合并压缩(Compaction)过程会清理标记为删除的数据,从而释放存储空间。此过程由系统自动完成,无需用户手动干预。注意,只有
 Base Compaction 才会对数据进行物理删除,Cumulative Compaction 仅对数据进行合并及重新排序,减少 rowset 及 
segment 数量。
+
+## 删除操作的使用场景
+
+Doris 提供多种删除方式,以满足不同场景的需求:
+
+### 条件删除
+
+用户可以通过指定过滤条件,删除满足条件的行。例如:
+
+```sql
+DELETE FROM table_name WHERE condition;
+```
+
+### 通过导入进行批量删除
+
+在数据导入时,通过覆盖的方式实现逻辑删除。这种方式适用于批量删除大量的 key,或者在 CDC 同步 binlog 时同步 TP 数据库的删除操作。
+
+### 删除全部数据
+
+在某些情况下,可以通过直接清空表或分区实现对数据的删除,例如:
+
+```sql
+TRUNCATE TABLE table_name;
+```
+
+### 使用临时分区实现原子覆盖写
+
+某些情况下,用户希望能够重写某一分区的数据,但如果采用先删除再导入的方式进行,在中间会有一段时间无法查看数据。这时,用户可以先创建一个对应的临时分区,将新的数据导入到临时分区后,通过替换操作,原子性地替换原有分区,以达到目的。
+
+## 注意事项
+
+1. 删除操作会生成新的数据版本,因此频繁执行删除可能会导致版本数量增加,从而影响查询性能。
+2. 合并压缩是释放存储空间的关键步骤,建议用户根据系统负载调整压缩策略。
+3. 删除后的数据在合并压缩完成之前仍会占用存储,因此删除操作本身不会立即降低存储使用。
+
diff --git 
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/data-operate/delete/delete-overview.md
 
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/data-operate/delete/delete-overview.md
new file mode 100644
index 0000000000..74b6a6049a
--- /dev/null
+++ 
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/data-operate/delete/delete-overview.md
@@ -0,0 +1,76 @@
+---
+{
+    "title": "删除操作概述",
+    "language": "zh-CN"
+}
+---
+
+<!-- 
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+在 Apache Doris 中,删除操作(Delete)是一项关键功能,用于管理和清理数据,以满足用户在大规模数据分析场景中的灵活性需求。Doris 
的删除机制支持高效的标记删除和多版本数据管理,在性能和灵活性之间达到了良好的平衡。
+
+## 删除的实现机制
+
+Doris 的删除操作采用**标记删除(Logical Deletion)**的方式,而不是直接物理删除数据。以下是其核心实现机制:
+
+1. **标记删除**。删除操作不会直接从存储中移除数据,而是为目标数据添加一条删除标记。标记删除主要有两种实现方式:delete 谓词和 delete 
sign。
+
+   1. delete 谓词用于 Duplicate 模型和 Aggregate 
模型,每次删除会直接在对应的数据集上记录一个条件谓词,用于在查询时过滤掉被删除的数据。
+   2. delete sign 用于 Unique Key 模型,每次删除会新写入一批数据覆盖要被删除的数据,同时新写入的数据会将隐藏列 
`__DORIS_DELETE_SIGN__` 设置为 1,表示该数据已经被删除。
+   3. 性能比较:“delete 谓词”的操作速度非常快,无论是删除 1 条数据还是 1 
亿条数据,速度都差不多——都是写一个条件谓词到数据集上;delete sign 的写入速度与数据量成正比。
+
+2. **多版本数据管理**。Doris 支持多版本数据(MVCC,Multi-Version Concurrency 
Control),允许在同一数据集上进行并发操作而不会影响查询结果。删除操作会创建一个新的版本,其中包含删除标记,而旧版本数据仍然被保留。
+
+3. 
**物理删除(Compaction)**。定期执行的合并压缩(Compaction)过程会清理标记为删除的数据,从而释放存储空间。此过程由系统自动完成,无需用户手动干预。注意,只有
 Base Compaction 才会对数据进行物理删除,Cumulative Compaction 仅对数据进行合并及重新排序,减少 rowset 及 
segment 数量。
+
+## 删除操作的使用场景
+
+Doris 提供多种删除方式,以满足不同场景的需求:
+
+### 条件删除
+
+用户可以通过指定过滤条件,删除满足条件的行。例如:
+
+```sql
+DELETE FROM table_name WHERE condition;
+```
+
+### 通过导入进行批量删除
+
+在数据导入时,通过覆盖的方式实现逻辑删除。这种方式适用于批量删除大量的 key,或者在 CDC 同步 binlog 时同步 TP 数据库的删除操作。
+
+### 删除全部数据
+
+在某些情况下,可以通过直接清空表或分区实现对数据的删除,例如:
+
+```sql
+TRUNCATE TABLE table_name;
+```
+
+### 使用临时分区实现原子覆盖写
+
+某些情况下,用户希望能够重写某一分区的数据,但如果采用先删除再导入的方式进行,在中间会有一段时间无法查看数据。这时,用户可以先创建一个对应的临时分区,将新的数据导入到临时分区后,通过替换操作,原子性地替换原有分区,以达到目的。
+
+## 注意事项
+
+1. 删除操作会生成新的数据版本,因此频繁执行删除可能会导致版本数量增加,从而影响查询性能。
+2. 合并压缩是释放存储空间的关键步骤,建议用户根据系统负载调整压缩策略。
+3. 删除后的数据在合并压缩完成之前仍会占用存储,因此删除操作本身不会立即降低存储使用。
+
diff --git a/sidebars.json b/sidebars.json
index da87f8e895..06bb42b6c2 100644
--- a/sidebars.json
+++ b/sidebars.json
@@ -193,6 +193,7 @@
                             "type": "category",
                             "label": "Deleting Data",
                             "items": [
+                                "data-operate/delete/delete-overview",
                                 "data-operate/delete/delete-manual",
                                 "data-operate/delete/batch-delete-manual",
                                 "data-operate/delete/truncate-manual",
diff --git a/versioned_docs/version-2.1/data-operate/delete/delete-overview.md 
b/versioned_docs/version-2.1/data-operate/delete/delete-overview.md
new file mode 100644
index 0000000000..994f2274c7
--- /dev/null
+++ b/versioned_docs/version-2.1/data-operate/delete/delete-overview.md
@@ -0,0 +1,77 @@
+---
+{
+    "title": "Delete Overview",
+    "language": "en"
+}
+
+---
+
+<!-- 
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+In Apache Doris, the delete operation is a key feature for managing and 
cleaning data to meet the flexibility needs of users in large-scale data 
analysis scenarios. Doris's deletion mechanism supports efficient logical 
deletion and multi-version data management, achieving a good balance between 
performance and flexibility.
+
+## Implementation Mechanism of Deletion
+
+Doris's delete operation uses **logical deletion** rather than directly 
physically deleting data. The core implementation mechanisms are as follows:
+
+1. **Logical Deletion**. The delete operation does not directly remove data 
from storage but adds a delete marker to the target data. There are two main 
ways to implement logical deletion: delete predicate and delete sign.
+
+    1. Delete predicate is used for Duplicate and Aggregate models. Each 
deletion directly records a conditional predicate on the corresponding dataset 
to filter out the deleted data during queries.
+    2. Delete sign is used for the Unique Key model. Each deletion writes a 
new batch of data to overwrite the data to be deleted, and the hidden column 
`__DORIS_DELETE_SIGN__` of the new data is set to 1, indicating that the data 
has been deleted.
+    3. Performance comparison: A delete predicate is very fast. Whether it 
deletes 1 row or 100 million rows, the cost is almost the same, because it only 
writes a conditional predicate to the dataset. The write cost of a delete sign 
is proportional to the amount of data being deleted.
+
+2. **Multi-Version Data Management**. Doris supports multi-version data (MVCC, 
Multi-Version Concurrency Control), allowing concurrent operations on the same 
dataset without affecting query results. The delete operation creates a new 
version containing the delete marker, while the old version data is still 
retained.
+
+3. **Physical Deletion (Compaction)**. The periodically executed compaction 
process cleans up data marked for deletion, thereby freeing up storage space. 
This process is automatically completed by the system without user 
intervention. Note that only Base Compaction will physically delete data, while 
Cumulative Compaction only merges and reorders data, reducing the number of 
rowsets and segments.
+
+## Use Cases for Delete Operations
+
+Doris provides various deletion methods to meet different needs:
+
+### Conditional Deletion
+
+Users can delete rows that meet specified conditions. For example:
+
+```sql
+DELETE FROM table_name WHERE condition;
+```
+
+### Batch Deletion via data loading
+
+During data loading, logical deletion can be achieved by overwriting. This 
method is suitable for batch deletion of a large number of keys or 
synchronizing TP database deletions during CDC binlog synchronization.
+
+### Deleting All Data
+
+In some cases, data can be deleted by directly truncating the table or 
partition. For example:
+
+```sql
+TRUNCATE TABLE table_name;
+```
+
+### Atomic Overwrite Using Temporary Partitions
+
+In some cases, users may want to rewrite the data of a partition. If the data 
is deleted and then imported, there will be a period when the data is 
unavailable. In this case, users can create a corresponding temporary 
partition, import the new data into the temporary partition, and then replace 
the original partition atomically to achieve the goal.
+
+## Notes
+
+1. The delete operation generates new data versions, so frequent deletions may 
increase the number of versions, affecting query performance.
+2. Compaction is a key step in freeing up storage space. Users are advised to 
adjust the compaction strategy based on system load.
+3. Deleted data will still occupy storage until compaction is completed, so 
the delete operation itself will not immediately reduce storage usage.
+
diff --git a/versioned_docs/version-3.0/data-operate/delete/delete-overview.md 
b/versioned_docs/version-3.0/data-operate/delete/delete-overview.md
new file mode 100644
index 0000000000..994f2274c7
--- /dev/null
+++ b/versioned_docs/version-3.0/data-operate/delete/delete-overview.md
@@ -0,0 +1,77 @@
+---
+{
+    "title": "Delete Overview",
+    "language": "en"
+}
+
+---
+
+<!-- 
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+In Apache Doris, the delete operation is a key feature for managing and 
cleaning data to meet the flexibility needs of users in large-scale data 
analysis scenarios. Doris's deletion mechanism supports efficient logical 
deletion and multi-version data management, achieving a good balance between 
performance and flexibility.
+
+## Implementation Mechanism of Deletion
+
+Doris's delete operation uses **logical deletion** rather than directly 
physically deleting data. The core implementation mechanisms are as follows:
+
+1. **Logical Deletion**. The delete operation does not directly remove data 
from storage but adds a delete marker to the target data. There are two main 
ways to implement logical deletion: delete predicate and delete sign.
+
+    1. Delete predicate is used for Duplicate and Aggregate models. Each 
deletion directly records a conditional predicate on the corresponding dataset 
to filter out the deleted data during queries.
+    2. Delete sign is used for the Unique Key model. Each deletion writes a 
new batch of data to overwrite the data to be deleted, and the hidden column 
`__DORIS_DELETE_SIGN__` of the new data is set to 1, indicating that the data 
has been deleted.
+    3. Performance comparison: A delete predicate is very fast. Whether it 
deletes 1 row or 100 million rows, the cost is almost the same, because it only 
writes a conditional predicate to the dataset. The write cost of a delete sign 
is proportional to the amount of data being deleted.
+
+2. **Multi-Version Data Management**. Doris supports multi-version data (MVCC, 
Multi-Version Concurrency Control), allowing concurrent operations on the same 
dataset without affecting query results. The delete operation creates a new 
version containing the delete marker, while the old version data is still 
retained.
+
+3. **Physical Deletion (Compaction)**. The periodically executed compaction 
process cleans up data marked for deletion, thereby freeing up storage space. 
This process is automatically completed by the system without user 
intervention. Note that only Base Compaction will physically delete data, while 
Cumulative Compaction only merges and reorders data, reducing the number of 
rowsets and segments.
+
+## Use Cases for Delete Operations
+
+Doris provides various deletion methods to meet different needs:
+
+### Conditional Deletion
+
+Users can delete rows that meet specified conditions. For example:
+
+```sql
+DELETE FROM table_name WHERE condition;
+```
+
+### Batch Deletion via data loading
+
+During data loading, logical deletion can be achieved by overwriting. This 
method is suitable for batch deletion of a large number of keys or 
synchronizing TP database deletions during CDC binlog synchronization.
+
+### Deleting All Data
+
+In some cases, data can be deleted by directly truncating the table or 
partition. For example:
+
+```sql
+TRUNCATE TABLE table_name;
+```
+
+### Atomic Overwrite Using Temporary Partitions
+
+In some cases, users may want to rewrite the data of a partition. If the data 
is deleted and then imported, there will be a period when the data is 
unavailable. In this case, users can create a corresponding temporary 
partition, import the new data into the temporary partition, and then replace 
the original partition atomically to achieve the goal.
+
+## Notes
+
+1. The delete operation generates new data versions, so frequent deletions may 
increase the number of versions, affecting query performance.
+2. Compaction is a key step in freeing up storage space. Users are advised to 
adjust the compaction strategy based on system load.
+3. Deleted data will still occupy storage until compaction is completed, so 
the delete operation itself will not immediately reduce storage usage.
+

