This is an automated email from the ASF dual-hosted git repository. kassiez pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/doris-website.git
The following commit(s) were added to refs/heads/master by this push:
     new ee3b222a40 [doc](delete) add delete overview (#1488)
ee3b222a40 is described below

commit ee3b222a4016b13df23c816bb8ee0f6686c299a1
Author: zhannngchen <zhangc...@selectdb.com>
AuthorDate: Tue Dec 17 11:22:12 2024 +0800

    [doc](delete) add delete overview (#1488)

    ## Versions

    - [x] dev
    - [x] 3.0
    - [x] 2.1
    - [ ] 2.0

    ## Languages

    - [x] Chinese
    - [x] English

    ## Docs Checklist

    - [x] Checked by AI
    - [ ] Test Cases Built
---
 docs/data-operate/delete/delete-overview.md        | 77 ++++++++++++++++++++++
 .../current/data-operate/delete/delete-overview.md | 76 +++++++++++++++++++++
 .../data-operate/delete/delete-overview.md         | 76 +++++++++++++++++++++
 .../data-operate/delete/delete-overview.md         | 76 +++++++++++++++++++++
 sidebars.json                                      |  1 +
 .../data-operate/delete/delete-overview.md         | 77 ++++++++++++++++++++++
 .../data-operate/delete/delete-overview.md         | 77 ++++++++++++++++++++++
 7 files changed, 460 insertions(+)

diff --git a/docs/data-operate/delete/delete-overview.md b/docs/data-operate/delete/delete-overview.md
new file mode 100644
index 0000000000..994f2274c7
--- /dev/null
+++ b/docs/data-operate/delete/delete-overview.md
@@ -0,0 +1,77 @@
+---
+{
+    "title": "Delete Overview",
+    "language": "en"
+}
+
+---
+
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.
See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+In Apache Doris, the delete operation is a key feature for managing and cleaning data, giving users the flexibility they need in large-scale data analysis scenarios. Doris's deletion mechanism supports efficient logical deletion and multi-version data management, striking a good balance between performance and flexibility.
+
+## Implementation Mechanism of Deletion
+
+Doris's delete operation uses **logical deletion** rather than physically deleting data right away. The core implementation mechanisms are as follows:
+
+1. **Logical Deletion**. The delete operation does not remove data from storage directly; instead, it adds a delete marker to the target data. Logical deletion is implemented in two main ways: delete predicate and delete sign.
+
+    1. A delete predicate is used for the Duplicate and Aggregate models. Each deletion records a conditional predicate on the corresponding dataset, and queries use that predicate to filter out the deleted data.
+    2. A delete sign is used for the Unique Key model. Each deletion writes a new batch of data that overwrites the data to be deleted, and the hidden column `__DORIS_DELETE_SIGN__` of the new data is set to 1, indicating that the data has been deleted.
+    3. Performance comparison: a delete predicate is very fast. Whether it deletes 1 row or 100 million rows, the cost is almost the same, because the operation only writes a conditional predicate to the dataset. The write cost of a delete sign is proportional to the amount of data being deleted.
+
+2. **Multi-Version Data Management**. Doris supports multi-version data (MVCC, Multi-Version Concurrency Control), allowing concurrent operations on the same dataset without affecting query results. The delete operation creates a new version containing the delete marker, while the old version data is still retained.
+
+3. **Physical Deletion (Compaction)**.
The periodically executed compaction process cleans up data marked for deletion, thereby freeing up storage space. The system runs this process automatically, without user intervention. Note that only Base Compaction physically deletes data; Cumulative Compaction only merges and reorders data, reducing the number of rowsets and segments.
+
+## Use Cases for Delete Operations
+
+Doris provides various deletion methods to meet different needs:
+
+### Conditional Deletion
+
+Users can delete rows that meet specified conditions. For example:
+
+```sql
+DELETE FROM table_name WHERE condition;
+```
+
+### Batch Deletion via Data Loading
+
+During data loading, logical deletion can be achieved by overwriting existing rows. This method is suitable for deleting a large number of keys in one batch, or for replicating deletions from a TP (transactional) database during CDC binlog synchronization.
+
+### Deleting All Data
+
+In some cases, data can be deleted by truncating the table or partition directly. For example:
+
+```sql
+TRUNCATE TABLE table_name;
+```
+
+### Atomic Overwrite Using Temporary Partitions
+
+In some cases, users may want to rewrite the data of a partition. If the data is deleted first and then re-imported, there is a period during which the data is unavailable. To avoid this, users can create a corresponding temporary partition, import the new data into it, and then atomically replace the original partition with it.
+
+## Notes
+
+1. The delete operation generates new data versions, so frequent deletions may increase the number of versions and degrade query performance.
+2. Compaction is a key step in freeing up storage space. Users are advised to adjust the compaction strategy based on system load.
+3. Deleted data still occupies storage until compaction completes, so the delete operation itself does not immediately reduce storage usage.
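
Editorial illustration (not part of the commit): the three steps described under "Atomic Overwrite Using Temporary Partitions" can be sketched in Doris SQL as follows. This is a minimal sketch under stated assumptions: the table `sales`, its date column `sale_date`, the partition `p202401`, the temporary partition `tp202401`, the staging table `sales_staging`, and the date bounds are all hypothetical names chosen for the example, and the table is assumed to be range-partitioned by date.

```sql
-- Hypothetical table `sales`, range-partitioned on `sale_date`;
-- partition p202401 is assumed to cover January 2024.

-- 1. Create a temporary partition with the same range as the partition to rewrite.
ALTER TABLE sales ADD TEMPORARY PARTITION tp202401
    VALUES [("2024-01-01"), ("2024-02-01"));

-- 2. Load the replacement data into the temporary partition.
--    Queries keep reading the original partition p202401 in the meantime.
INSERT INTO sales TEMPORARY PARTITION (tp202401)
SELECT * FROM sales_staging
WHERE sale_date >= '2024-01-01' AND sale_date < '2024-02-01';

-- 3. Atomically swap the temporary partition in for the original one.
ALTER TABLE sales REPLACE PARTITION (p202401) WITH TEMPORARY PARTITION (tp202401);
```

Because the swap is a single metadata operation, readers see either the old or the new partition contents, never an empty one. The data of the replaced partition is expected to be discarded after a successful swap, so it is worth checking the temporary partition's contents before step 3.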
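+
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/delete/delete-overview.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/delete/delete-overview.md
new file mode 100644
index 0000000000..74b6a6049a
--- /dev/null
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/delete/delete-overview.md
@@ -0,0 +1,76 @@
+---
+{
+    "title": "Delete Operation Overview",
+    "language": "zh-CN"
+}
+---
+
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+In Apache Doris, the delete operation (Delete) is a key feature for managing and cleaning data, meeting users' need for flexibility in large-scale data analysis scenarios. Doris's deletion mechanism supports efficient logical deletion and multi-version data management, striking a good balance between performance and flexibility.
+
+## Implementation Mechanism of Deletion
+
+Doris's delete operation uses **logical deletion** (marking data as deleted) rather than physically deleting data directly. Its core implementation mechanisms are as follows:
+
+1. **Logical Deletion**. The delete operation does not remove data from storage directly; instead, it adds a delete marker to the target data. Logical deletion is implemented in two main ways: delete predicate and delete sign.
+
+    1. The delete predicate is used for the Duplicate and Aggregate models. Each deletion records a conditional predicate on the corresponding dataset, and queries use that predicate to filter out the deleted data.
+    2. The delete sign is used for the Unique Key model. Each deletion writes a new batch of data that overwrites the data to be deleted, and the newly written data sets the hidden column `__DORIS_DELETE_SIGN__` to 1, indicating that the data has been deleted.
+    3. Performance comparison: the delete predicate is very fast. Whether it deletes 1 row or 100 million rows, the speed is about the same, because the operation only writes a conditional predicate to the dataset. The write cost of the delete sign is proportional to the amount of data.
+
+2. **Multi-Version Data Management**. Doris supports multi-version data (MVCC, Multi-Version Concurrency Control), allowing concurrent operations on the same dataset without affecting query results. The delete operation creates a new version containing the delete marker, while the old version data is still retained.
+
+3. **Physical Deletion (Compaction)**. The periodically executed compaction process cleans up data marked for deletion, thereby freeing up storage space. The system completes this process automatically, without manual intervention by the user. Note that only Base Compaction physically deletes data; Cumulative Compaction only merges and reorders data, reducing the number of rowsets and segments.
+
+## Use Cases for Delete Operations
+
+Doris provides several deletion methods to meet the needs of different scenarios:
+
+### Conditional Deletion
+
+Users can specify a filter condition to delete the rows that satisfy it. For example:
+
+```sql
+DELETE FROM table_name WHERE condition;
+```
+
+### Batch Deletion via Data Loading
+
+During data loading, logical deletion is achieved by overwriting. This method is suitable for deleting a large number of keys in one batch, or for replicating the delete operations of a TP database when CDC synchronizes its binlog.
+
+### Deleting All Data
+
+In some cases, data can be deleted by directly truncating the table or partition. For example:
+
+```sql
+TRUNCATE TABLE table_name;
+```
+
+### Atomic Overwrite Using Temporary Partitions
+
+In some cases, users want to rewrite the data of a partition, but deleting first and then importing leaves a period during which the data cannot be queried. In that case, users can first create a corresponding temporary partition, import the new data into it, and then use a replace operation to atomically swap out the original partition.
+
+## Notes
+
+1. The delete operation generates new data versions, so frequent deletions may increase the number of versions and degrade query performance.
+2. Compaction is the key step in freeing up storage space. Users are advised to adjust the compaction strategy based on system load.
+3. Deleted data still occupies storage until compaction completes, so the delete operation itself does not immediately reduce storage usage.
+
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/data-operate/delete/delete-overview.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/data-operate/delete/delete-overview.md
new file mode 100644
index 0000000000..74b6a6049a
--- /dev/null
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/data-operate/delete/delete-overview.md
@@ -0,0 +1,76 @@
+---
+{
+    "title": "Delete Operation Overview",
+    "language": "zh-CN"
+}
+---
+
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.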
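See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+In Apache Doris, the delete operation (Delete) is a key feature for managing and cleaning data, meeting users' need for flexibility in large-scale data analysis scenarios. Doris's deletion mechanism supports efficient logical deletion and multi-version data management, striking a good balance between performance and flexibility.
+
+## Implementation Mechanism of Deletion
+
+Doris's delete operation uses **logical deletion** (marking data as deleted) rather than physically deleting data directly. Its core implementation mechanisms are as follows:
+
+1. **Logical Deletion**. The delete operation does not remove data from storage directly; instead, it adds a delete marker to the target data. Logical deletion is implemented in two main ways: delete predicate and delete sign.
+
+    1. The delete predicate is used for the Duplicate and Aggregate models. Each deletion records a conditional predicate on the corresponding dataset, and queries use that predicate to filter out the deleted data.
+    2. The delete sign is used for the Unique Key model. Each deletion writes a new batch of data that overwrites the data to be deleted, and the newly written data sets the hidden column `__DORIS_DELETE_SIGN__` to 1, indicating that the data has been deleted.
+    3. Performance comparison: the delete predicate is very fast. Whether it deletes 1 row or 100 million rows, the speed is about the same, because the operation only writes a conditional predicate to the dataset. The write cost of the delete sign is proportional to the amount of data.
+
+2. **Multi-Version Data Management**. Doris supports multi-version data (MVCC, Multi-Version Concurrency Control), allowing concurrent operations on the same dataset without affecting query results. The delete operation creates a new version containing the delete marker, while the old version data is still retained.
+
+3. **Physical Deletion (Compaction)**. The periodically executed compaction process cleans up data marked for deletion, thereby freeing up storage space. The system completes this process automatically, without manual intervention by the user. Note that only Base Compaction physically deletes data; Cumulative Compaction only merges and reorders data, reducing the number of rowsets and segments.
+
+## Use Cases for Delete Operations
+
+Doris provides several deletion methods to meet the needs of different scenarios:
+
+### Conditional Deletion
+
+Users can specify a filter condition to delete the rows that satisfy it. For example:
+
+```sql
+DELETE FROM table_name WHERE condition;
+```
+
+### Batch Deletion via Data Loading
+
+During data loading, logical deletion is achieved by overwriting. This method is suitable for deleting a large number of keys in one batch, or for replicating the delete operations of a TP database when CDC synchronizes its binlog.
+
+### Deleting All Data
+
+In some cases, data can be deleted by directly truncating the table or partition. For example:
+
+```sql
+TRUNCATE TABLE table_name;
+```
+
+### Atomic Overwrite Using Temporary Partitions
+
+In some cases, users want to rewrite the data of a partition, but deleting first and then importing leaves a period during which the data cannot be queried. In that case, users can first create a corresponding temporary partition, import the new data into it, and then use a replace operation to atomically swap out the original partition.
+
+## Notes
+
+1. The delete operation generates new data versions, so frequent deletions may increase the number of versions and degrade query performance.
+2. Compaction is the key step in freeing up storage space. Users are advised to adjust the compaction strategy based on system load.
+3. Deleted data still occupies storage until compaction completes, so the delete operation itself does not immediately reduce storage usage.
+
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/data-operate/delete/delete-overview.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/data-operate/delete/delete-overview.md
new file mode 100644
index 0000000000..74b6a6049a
--- /dev/null
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/data-operate/delete/delete-overview.md
@@ -0,0 +1,76 @@
+---
+{
+    "title": "Delete Operation Overview",
+    "language": "zh-CN"
+}
+---
+
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+In Apache Doris, the delete operation (Delete) is a key feature for managing and cleaning data, meeting users' need for flexibility in large-scale data analysis scenarios. Doris's deletion mechanism supports efficient logical deletion and multi-version data management, striking a good balance between performance and flexibility.
+
+## Implementation Mechanism of Deletion
+
+Doris's delete operation uses **logical deletion** (marking data as deleted) rather than physically deleting data directly. Its core implementation mechanisms are as follows:
+
+1. **Logical Deletion**. The delete operation does not remove data from storage directly; instead, it adds a delete marker to the target data. Logical deletion is implemented in two main ways: delete predicate and delete sign.
+
+    1. The delete predicate is used for the Duplicate and Aggregate models. Each deletion records a conditional predicate on the corresponding dataset, and queries use that predicate to filter out the deleted data.
+    2. The delete sign is used for the Unique Key model. Each deletion writes a new batch of data that overwrites the data to be deleted, and the newly written data sets the hidden column `__DORIS_DELETE_SIGN__` to 1, indicating that the data has been deleted.
+    3. Performance comparison: the delete predicate is very fast. Whether it deletes 1 row or 100 million rows, the speed is about the same, because the operation only writes a conditional predicate to the dataset. The write cost of the delete sign is proportional to the amount of data.
+
+2. **Multi-Version Data Management**. Doris supports multi-version data (MVCC, Multi-Version Concurrency Control), allowing concurrent operations on the same dataset without affecting query results. The delete operation creates a new version containing the delete marker, while the old version data is still retained.
+
+3. **Physical Deletion (Compaction)**. The periodically executed compaction process cleans up data marked for deletion, thereby freeing up storage space. The system completes this process automatically, without manual intervention by the user. Note that only Base Compaction physically deletes data; Cumulative Compaction only merges and reorders data, reducing the number of rowsets and segments.
+
+## Use Cases for Delete Operations
+
+Doris provides several deletion methods to meet the needs of different scenarios:
+
+### Conditional Deletion
+
+Users can specify a filter condition to delete the rows that satisfy it. For example:
+
+```sql
+DELETE FROM table_name WHERE condition;
+```
+
+### Batch Deletion via Data Loading
+
+During data loading, logical deletion is achieved by overwriting. This method is suitable for deleting a large number of keys in one batch, or for replicating the delete operations of a TP database when CDC synchronizes its binlog.
+
+### Deleting All Data
+
+In some cases, data can be deleted by directly truncating the table or partition. For example:
+
+```sql
+TRUNCATE TABLE table_name;
+```
+
+### Atomic Overwrite Using Temporary Partitions
+
+In some cases, users want to rewrite the data of a partition, but deleting first and then importing leaves a period during which the data cannot be queried. In that case, users can first create a corresponding temporary partition, import the new data into it, and then use a replace operation to atomically swap out the original partition.
+
+## Notes
+
+1. The delete operation generates new data versions, so frequent deletions may increase the number of versions and degrade query performance.
+2. Compaction is the key step in freeing up storage space. Users are advised to adjust the compaction strategy based on system load.
+3. Deleted data still occupies storage until compaction completes, so the delete operation itself does not immediately reduce storage usage.
+
diff --git a/sidebars.json b/sidebars.json
index da87f8e895..06bb42b6c2 100644
--- a/sidebars.json
+++ b/sidebars.json
@@ -193,6 +193,7 @@
     "type": "category",
     "label": "Deleting Data",
     "items": [
+        "data-operate/delete/delete-overview",
         "data-operate/delete/delete-manual",
         "data-operate/delete/batch-delete-manual",
         "data-operate/delete/truncate-manual",
diff --git a/versioned_docs/version-2.1/data-operate/delete/delete-overview.md b/versioned_docs/version-2.1/data-operate/delete/delete-overview.md
new file mode 100644
index 0000000000..994f2274c7
--- /dev/null
+++ b/versioned_docs/version-2.1/data-operate/delete/delete-overview.md
@@ -0,0 +1,77 @@
+---
+{
+    "title": "Delete Overview",
+    "language": "en"
+}
+
+---
+
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.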
You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+In Apache Doris, the delete operation is a key feature for managing and cleaning data, giving users the flexibility they need in large-scale data analysis scenarios. Doris's deletion mechanism supports efficient logical deletion and multi-version data management, striking a good balance between performance and flexibility.
+
+## Implementation Mechanism of Deletion
+
+Doris's delete operation uses **logical deletion** rather than physically deleting data right away. The core implementation mechanisms are as follows:
+
+1. **Logical Deletion**. The delete operation does not remove data from storage directly; instead, it adds a delete marker to the target data. Logical deletion is implemented in two main ways: delete predicate and delete sign.
+
+    1. A delete predicate is used for the Duplicate and Aggregate models. Each deletion records a conditional predicate on the corresponding dataset, and queries use that predicate to filter out the deleted data.
+    2. A delete sign is used for the Unique Key model. Each deletion writes a new batch of data that overwrites the data to be deleted, and the hidden column `__DORIS_DELETE_SIGN__` of the new data is set to 1, indicating that the data has been deleted.
+    3. Performance comparison: a delete predicate is very fast. Whether it deletes 1 row or 100 million rows, the cost is almost the same, because the operation only writes a conditional predicate to the dataset. The write cost of a delete sign is proportional to the amount of data being deleted.
+
+2. **Multi-Version Data Management**.
Doris supports multi-version data (MVCC, Multi-Version Concurrency Control), allowing concurrent operations on the same dataset without affecting query results. The delete operation creates a new version containing the delete marker, while the old version data is still retained.
+
+3. **Physical Deletion (Compaction)**. The periodically executed compaction process cleans up data marked for deletion, thereby freeing up storage space. The system runs this process automatically, without user intervention. Note that only Base Compaction physically deletes data; Cumulative Compaction only merges and reorders data, reducing the number of rowsets and segments.
+
+## Use Cases for Delete Operations
+
+Doris provides various deletion methods to meet different needs:
+
+### Conditional Deletion
+
+Users can delete rows that meet specified conditions. For example:
+
+```sql
+DELETE FROM table_name WHERE condition;
+```
+
+### Batch Deletion via Data Loading
+
+During data loading, logical deletion can be achieved by overwriting existing rows. This method is suitable for deleting a large number of keys in one batch, or for replicating deletions from a TP (transactional) database during CDC binlog synchronization.
+
+### Deleting All Data
+
+In some cases, data can be deleted by truncating the table or partition directly. For example:
+
+```sql
+TRUNCATE TABLE table_name;
+```
+
+### Atomic Overwrite Using Temporary Partitions
+
+In some cases, users may want to rewrite the data of a partition. If the data is deleted first and then re-imported, there is a period during which the data is unavailable. To avoid this, users can create a corresponding temporary partition, import the new data into it, and then atomically replace the original partition with it.
+
+## Notes
+
+1. The delete operation generates new data versions, so frequent deletions may increase the number of versions and degrade query performance.
+2. Compaction is a key step in freeing up storage space.
Users are advised to adjust the compaction strategy based on system load.
+3. Deleted data still occupies storage until compaction completes, so the delete operation itself does not immediately reduce storage usage.
+
diff --git a/versioned_docs/version-3.0/data-operate/delete/delete-overview.md b/versioned_docs/version-3.0/data-operate/delete/delete-overview.md
new file mode 100644
index 0000000000..994f2274c7
--- /dev/null
+++ b/versioned_docs/version-3.0/data-operate/delete/delete-overview.md
@@ -0,0 +1,77 @@
+---
+{
+    "title": "Delete Overview",
+    "language": "en"
+}
+
+---
+
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+In Apache Doris, the delete operation is a key feature for managing and cleaning data, giving users the flexibility they need in large-scale data analysis scenarios. Doris's deletion mechanism supports efficient logical deletion and multi-version data management, striking a good balance between performance and flexibility.
+
+## Implementation Mechanism of Deletion
+
+Doris's delete operation uses **logical deletion** rather than physically deleting data right away. The core implementation mechanisms are as follows:
+
+1. **Logical Deletion**.
The delete operation does not remove data from storage directly; instead, it adds a delete marker to the target data. Logical deletion is implemented in two main ways: delete predicate and delete sign.
+
+    1. A delete predicate is used for the Duplicate and Aggregate models. Each deletion records a conditional predicate on the corresponding dataset, and queries use that predicate to filter out the deleted data.
+    2. A delete sign is used for the Unique Key model. Each deletion writes a new batch of data that overwrites the data to be deleted, and the hidden column `__DORIS_DELETE_SIGN__` of the new data is set to 1, indicating that the data has been deleted.
+    3. Performance comparison: a delete predicate is very fast. Whether it deletes 1 row or 100 million rows, the cost is almost the same, because the operation only writes a conditional predicate to the dataset. The write cost of a delete sign is proportional to the amount of data being deleted.
+
+2. **Multi-Version Data Management**. Doris supports multi-version data (MVCC, Multi-Version Concurrency Control), allowing concurrent operations on the same dataset without affecting query results. The delete operation creates a new version containing the delete marker, while the old version data is still retained.
+
+3. **Physical Deletion (Compaction)**. The periodically executed compaction process cleans up data marked for deletion, thereby freeing up storage space. The system runs this process automatically, without user intervention. Note that only Base Compaction physically deletes data; Cumulative Compaction only merges and reorders data, reducing the number of rowsets and segments.
+
+## Use Cases for Delete Operations
+
+Doris provides various deletion methods to meet different needs:
+
+### Conditional Deletion
+
+Users can delete rows that meet specified conditions.
For example:
+
+```sql
+DELETE FROM table_name WHERE condition;
+```
+
+### Batch Deletion via Data Loading
+
+During data loading, logical deletion can be achieved by overwriting existing rows. This method is suitable for deleting a large number of keys in one batch, or for replicating deletions from a TP (transactional) database during CDC binlog synchronization.
+
+### Deleting All Data
+
+In some cases, data can be deleted by truncating the table or partition directly. For example:
+
+```sql
+TRUNCATE TABLE table_name;
+```
+
+### Atomic Overwrite Using Temporary Partitions
+
+In some cases, users may want to rewrite the data of a partition. If the data is deleted first and then re-imported, there is a period during which the data is unavailable. To avoid this, users can create a corresponding temporary partition, import the new data into it, and then atomically replace the original partition with it.
+
+## Notes
+
+1. The delete operation generates new data versions, so frequent deletions may increase the number of versions and degrade query performance.
+2. Compaction is a key step in freeing up storage space. Users are advised to adjust the compaction strategy based on system load.
+3. Deleted data still occupies storage until compaction completes, so the delete operation itself does not immediately reduce storage usage.
+

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org
For additional commands, e-mail: commits-h...@doris.apache.org