This is an automated email from the ASF dual-hosted git repository.
dataroaring pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris-website.git
The following commit(s) were added to refs/heads/master by this push:
new 356ba2f5b13 [improve] change table model to table type (#2830)
356ba2f5b13 is described below
commit 356ba2f5b1344b751f90be5499727dc548712e00
Author: Yongqiang YANG <[email protected]>
AuthorDate: Wed Sep 3 15:25:45 2025 +0800
[improve] change table model to table type (#2830)
## Versions
- [x] dev
- [x] 3.0
- [ ] 2.1
- [ ] 2.0
## Languages
- [ ] Chinese
- [ ] English
## Docs Checklist
- [ ] Checked by AI
- [ ] Test Cases Built
---------
Co-authored-by: Yongqiang YANG <[email protected]>
Co-authored-by: yagagagaga <[email protected]>
---
docs/table-design/data-model/aggregate.md | 14 ++++-----
docs/table-design/data-model/duplicate.md | 16 +++++-----
docs/table-design/data-model/overview.md | 14 ++++-----
docs/table-design/data-model/unique.md | 24 +++++++--------
.../docusaurus-plugin-content-docs/current.json | 6 ++--
.../version-2.1.json | 6 ++--
.../version-3.0.json | 6 ++--
.../table-design/data-model/aggregate.md | 8 ++---
.../table-design/data-model/duplicate.md | 18 ++++++------
.../table-design/data-model/overview.md | 32 ++++++++++----------
.../version-3.0/table-design/data-model/unique.md | 18 ++++++------
sidebars.json | 2 +-
.../table-design/data-model/aggregate.md | 14 ++++-----
.../table-design/data-model/duplicate.md | 16 +++++-----
.../table-design/data-model/overview.md | 34 +++++++++++-----------
.../version-3.0/table-design/data-model/unique.md | 26 ++++++++---------
versioned_sidebars/version-2.1-sidebars.json | 2 +-
versioned_sidebars/version-3.0-sidebars.json | 2 +-
18 files changed, 129 insertions(+), 129 deletions(-)
diff --git a/docs/table-design/data-model/aggregate.md
b/docs/table-design/data-model/aggregate.md
index d2bd91e7463..f38a58ce552 100644
--- a/docs/table-design/data-model/aggregate.md
+++ b/docs/table-design/data-model/aggregate.md
@@ -1,21 +1,21 @@
---
{
- "title": "Aggregate Model",
+ "title": "Aggregate Key Table",
"language": "en"
}
---
-Doris's **Aggregate Key Model** is designed to efficiently handle aggregation
operations in large-scale data queries. By performing pre-aggregation on the
data, it reduces redundancy in computations and improves query performance. The
model stores only aggregated data, omitting raw data, which saves storage space
and enhances query performance.
+Doris's **Aggregate Key Table** is designed to efficiently handle aggregation
operations in large-scale data queries. By performing pre-aggregation on the
data, it reduces redundancy in computations and improves query performance. The
table stores only aggregated data, omitting raw data, which saves storage space
and enhances query performance.
## Use Cases
-* **Summarizing Detailed Data**: The Aggregate Key Model is used in scenarios
like e-commerce platforms evaluating monthly sales, financial risk control
calculating customer transaction totals, or advertising campaigns analyzing
total ad clicks, for multidimensional summarization of detailed data.
+* **Summarizing Detailed Data**: The Aggregate Key Table is used in scenarios
like e-commerce platforms evaluating monthly sales, financial risk control
calculating customer transaction totals, or advertising campaigns analyzing
total ad clicks, for multidimensional summarization of detailed data.
* **No Need to Query Raw Detailed Data**: For use cases such as dashboard
reports or user transaction behavior analysis, where the raw data is stored in
a data lake and does not need to be retained in the database, only the
aggregated data is stored.
## Principle
-Each data import creates a version in the Aggregate Key Model, and during the
**Compaction** stage, versions are merged. When querying, data is aggregated by
the primary key:
+Each data import creates a version in the Aggregate Key Table, and during the
**Compaction** stage, versions are merged. When querying, data is aggregated by
the primary key:
* **Data Import Stage**
@@ -38,7 +38,7 @@ Each data import creates a version in the Aggregate Key
Model, and during the **
## Table Creation Instructions
-When creating a table, the **AGGREGATE KEY** keyword can be used to specify
the Aggregate Key Model. The Aggregate Key Model must specify Key columns,
which are used to aggregate Value columns during storage.
+When creating a table, the **AGGREGATE KEY** keyword can be used to specify
the Aggregate Key Table. The Aggregate Key Table must specify Key columns,
which are used to aggregate Value columns during storage.
```sql
CREATE TABLE IF NOT EXISTS example_tbl_agg
@@ -56,7 +56,7 @@ DISTRIBUTED BY HASH(user_id) BUCKETS 10;
In the example above, a fact table for user information and access behavior is
defined, where `user_id`, `load_date`, `city`, and `age` are used as Key
columns for aggregation. During data import, the Key columns are aggregated
into one row, and the Value columns are aggregated according to the specified
aggregation types.
-The following types of dimension aggregation are supported in the Aggregate
Key Model:
+The following types of dimension aggregation are supported in the Aggregate
Key Table:
| Aggregation Method | Description
|
@@ -82,7 +82,7 @@ If the aggregation methods above do not meet your business
requirements, conside
In the Aggregate Key table, data is aggregated based on the primary key. After
data insertion, aggregation operations are completed.
-
+
In the example above, there were originally 4 rows of data in the table. After
inserting 2 rows, aggregation operations on the dimension columns are performed
based on the Key columns:
diff --git a/docs/table-design/data-model/duplicate.md
b/docs/table-design/data-model/duplicate.md
index 13b2cb46d4b..92017f7b47d 100644
--- a/docs/table-design/data-model/duplicate.md
+++ b/docs/table-design/data-model/duplicate.md
@@ -1,21 +1,21 @@
---
{
- "title": "Duplicate Key Model",
+ "title": "Duplicate Key Table",
"language": "en"
}
---
-The **Duplicate Key Model** in Doris is the default table model, designed to
store individual raw data records. The `Duplicate Key` specified during table
creation determines the columns for sorting and storage, optimizing common
queries. It is recommended to choose no more than three columns as the sort
key. For more specific selection guidelines, refer to [Sort
Key](../index/prefix-index). The Duplicate Key Model has the following
characteristics:
+The **Duplicate Key Table** in Doris is the default table type, designed to
store individual raw data records. The `Duplicate Key` specified during table
creation determines the columns for sorting and storage, optimizing common
queries. It is recommended to choose no more than three columns as the sort
key. For more specific selection guidelines, refer to [Sort
Key](../index/prefix-index). The Duplicate Key Table has the following
characteristics:
-* **Preserving Raw Data**: The Duplicate Key Model retains all original data,
making it ideal for storing and querying raw data. It is recommended for use
cases requiring detailed data analysis to avoid data loss.
+* **Preserving Raw Data**: The Duplicate Key Table retains all original data,
making it ideal for storing and querying raw data. It is recommended for use
cases requiring detailed data analysis to avoid data loss.
-* **No Deduplication or Aggregation**: Unlike the Aggregate and Primary Key
models, the Duplicate Key Model does not perform deduplication or aggregation,
fully retaining identical records.
+* **No Deduplication or Aggregation**: Unlike the Aggregate Key and Unique Key
tables, the Duplicate Key Table does not perform deduplication or aggregation,
fully retaining identical records.
-* **Flexible Data Querying**: The Duplicate Key Model retains all original
data, enabling detailed extraction and aggregation across any dimension for
metadata auditing and fine-grained analysis.
+* **Flexible Data Querying**: The Duplicate Key Table retains all original
data, enabling detailed extraction and aggregation across any dimension for
metadata auditing and fine-grained analysis.
## Use Cases
-In the Duplicate Key Model, data is generally only appended, and old data is
not updated. The Duplicate Key Model is ideal for scenarios that require full
raw data:
+In the Duplicate Key Table, data is generally only appended, and old data is
not updated. The Duplicate Key Table is ideal for scenarios that require full
raw data:
* **Log Storage**: Used for storing various types of application logs, such as
access logs, error logs, etc. Each piece of data needs to be detailed for
future auditing and analysis.
@@ -26,7 +26,7 @@ In the Duplicate Key Model, data is generally only appended,
and old data is not
## Table Creation Instructions
-When creating a table, the **DUPLICATE KEY** keyword can be used to specify
the Duplicate Key Model. The Duplicate Key table must specify the Key columns,
which are used to sort the data during storage. In the following example, the
Duplicate Key table stores log information and sorts the data based on the
`log_time`, `log_type`, and `error_code` columns:
+When creating a table, the **DUPLICATE KEY** keyword can be used to specify
the Duplicate Key Table. The Duplicate Key table must specify the Key columns,
which are used to sort the data during storage. In the following example, the
Duplicate Key table stores log information and sorts the data based on the
`log_time`, `log_type`, and `error_code` columns:
```sql
CREATE TABLE IF NOT EXISTS example_tbl_duplicate
@@ -44,7 +44,7 @@ DISTRIBUTED BY HASH(log_type) BUCKETS 10;
## Data Insertion and Storage
-In a Duplicate Key table, data is not deduplicated or aggregated; inserting
data directly stores it. The Key columns in the Duplicate Key Model are used
for sorting.
+In a Duplicate Key table, data is not deduplicated or aggregated; inserting
data directly stores it. The Key columns in the Duplicate Key Table are used
for sorting.
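
The storage behavior described above can be sketched as follows. This is an illustration only and is not part of the commit: the table name `demo_dup_tbl`, its columns, and the inserted values are hypothetical. On a single-node test cluster, a `PROPERTIES ("replication_num" = "1")` clause would probably also be needed.

```sql
-- Hypothetical table for illustration; names and columns are assumptions.
CREATE TABLE IF NOT EXISTS demo_dup_tbl
(
    log_time   DATETIME NOT NULL,
    log_type   INT      NOT NULL,
    error_code INT,
    error_msg  VARCHAR(1024)
)
DUPLICATE KEY(log_time, log_type, error_code)   -- sort key only, no uniqueness
DISTRIBUTED BY HASH(log_type) BUCKETS 10;

-- Two identical rows plus one distinct row.
INSERT INTO demo_dup_tbl VALUES
    ('2024-11-01 00:00:00', 2, 2, 'timeout'),
    ('2024-11-01 00:00:00', 2, 2, 'timeout'),
    ('2024-11-02 00:00:00', 1, 2, 'success');

-- No deduplication or aggregation is applied, so all three rows,
-- including the two identical ones, are expected to be returned.
SELECT * FROM demo_dup_tbl ORDER BY log_time;
```

Because the Key columns only define the sort order here, any later aggregation (for example `COUNT(*)` grouped by `log_type`) is computed at query time over the full raw data.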

diff --git a/docs/table-design/data-model/overview.md
b/docs/table-design/data-model/overview.md
index f8540651fb6..1b89e85bead 100644
--- a/docs/table-design/data-model/overview.md
+++ b/docs/table-design/data-model/overview.md
@@ -1,15 +1,15 @@
---
{
- "title": "Table Model Overview",
+ "title": "Table Type Overview",
"language": "en"
}
---
-When creating a table in Doris, it is necessary to specify the table model to
define how data is stored and managed. Doris provides three table models: the
**Duplicate Key Model**, **Unique Key Model** and **Aggregate Key Model**,
which cater to different application scenarios. Each model has corresponding
mechanisms for data deduplication, aggregation, and updates. Choosing the
appropriate table model helps achieve business objectives while ensuring
flexibility and efficiency in data pr [...]
+When creating a table in Doris, you need to specify the table type, which
determines how data is stored and managed. In Doris, the concept of "Key Model"
is used to define the storage and management semantics of keys, and is closely
related to the table type. Doris supports three key models (or table types):
**Duplicate Key**, **Unique Key**, and **Aggregate Key**. Each key model
provides different mechanisms for data deduplication, aggregation, and update
handling, and is suitable for d [...]
-## Table Model Classification
+## Table Type Classification
-Doris supports three types of table models:
+Doris supports three table types:
* [Duplicate Key Model](./duplicate): Allows the specified Key columns to be
duplicated, and Doris's storage layer retains all written data. This model is
suitable for situations where all original data records must be preserved.
@@ -17,7 +17,7 @@ Doris supports three types of table models:
* [Aggregate Key Model](./aggregate): Allows data to be aggregated based on
the Key columns. The Doris storage layer retains aggregated data, reducing
storage space and improving query performance. This model is typically used in
situations where summary or aggregated information (such as totals or averages)
is required.
-After creating the table, the properties of the table model are confirmed and
cannot be modified. Choosing the right model for the business is crucial:
+Once a table is created, its table type is fixed and cannot be modified, so
choosing the right type for the business is crucial:
* **Duplicate Key Model** is suitable for ad-hoc queries with any dimensions.
Although it cannot leverage the benefits of pre-aggregation, it is not
constrained by aggregation models and can take advantage of the columnar
storage model (only reading relevant columns without needing to read all key
columns).
@@ -33,7 +33,7 @@ After creating the table, the properties of the table model
are confirmed and ca
In Doris, data is stored in a columnar format, and a table can be divided into
Key columns and Value columns. The Key columns are used for grouping and
sorting, while the Value columns are used for aggregation. Key columns can
consist of one or more fields, and when creating a table, data is sorted and
stored according to the columns of Aggregate Key, Unique Key, and Duplicate Key
models.
-Different table models require the specification of Key columns during table
creation, each with a different significance: for the Duplicate Key model, the
Key columns represent sorting, without any uniqueness constraints. In the
Aggregate Key and Unique Key models, aggregation is performed based on the Key
columns, which not only have sorting capabilities but also enforce uniqueness
constraints.
+Different table types require the specification of Key columns during table
creation, each with a different significance: for the Duplicate Key model, the
Key columns represent sorting, without any uniqueness constraints. In the
Aggregate Key and Unique Key models, aggregation is performed based on the Key
columns, which not only have sorting capabilities but also enforce uniqueness
constraints.
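
A minimal sketch of how the key clause changes the meaning of the same Key columns. It is an illustration only and is not part of the commit: the `demo_orders_*` table names and columns are hypothetical, and the three statements differ only in the key clause and in how the Value column is declared.

```sql
-- Duplicate Key: the Key columns only define the sort order; duplicates are kept.
CREATE TABLE IF NOT EXISTS demo_orders_dup
(
    order_id   BIGINT NOT NULL,
    order_date DATE   NOT NULL,
    amount     BIGINT
)
DUPLICATE KEY(order_id, order_date)
DISTRIBUTED BY HASH(order_id) BUCKETS 10;

-- Unique Key: rows sharing (order_id, order_date) collapse to the latest write.
CREATE TABLE IF NOT EXISTS demo_orders_uniq
(
    order_id   BIGINT NOT NULL,
    order_date DATE   NOT NULL,
    amount     BIGINT
)
UNIQUE KEY(order_id, order_date)
DISTRIBUTED BY HASH(order_id) BUCKETS 10;

-- Aggregate Key: rows sharing (order_id, order_date) are merged, and every
-- Value column must declare how it is aggregated (SUM in this sketch).
CREATE TABLE IF NOT EXISTS demo_orders_agg
(
    order_id   BIGINT NOT NULL,
    order_date DATE   NOT NULL,
    amount     BIGINT SUM DEFAULT "0"
)
AGGREGATE KEY(order_id, order_date)
DISTRIBUTED BY HASH(order_id) BUCKETS 10;
```

In all three cases the Key columns must be declared first and in the same order as in the key clause, since they also determine the sort order of the stored data.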
Proper use of the Sort Key can provide the following benefits:
@@ -53,7 +53,7 @@ When selecting a sort key, the following recommendations can
be followed:
* For the length of `VARCHAR` and `STRING` types, follow the principle of
choosing enough...
-## Table Model Comparison
+## Table Type Comparison
| | Duplicate Key Model | Unique Key Model | Aggregate Key
Model |
| --------- | ------------------ | ----------------- | --------------- |
diff --git a/docs/table-design/data-model/unique.md
b/docs/table-design/data-model/unique.md
index f83fedab50d..680f7deb327 100644
--- a/docs/table-design/data-model/unique.md
+++ b/docs/table-design/data-model/unique.md
@@ -1,38 +1,38 @@
---
{
- "title": "Unique Key Model",
+ "title": "Unique Key Table",
"language": "en"
}
---
-When data updates are required, use the **Unique Key Model**. It guarantees
the uniqueness of the Key columns so that new data overwrites existing records
with matching keys, ensuring that only the most up-to-date records are
maintained. This model is ideal for update scenarios, enabling unique-key-level
updates during data insertion.
-The Unique Key Model has the following characteristics:
+When data updates are required, use the **Unique Key Table**. It guarantees
the uniqueness of the Key columns so that new data overwrites existing records
with matching keys, ensuring that only the most up-to-date records are
maintained. This table is ideal for update scenarios, enabling unique-key-level
updates during data insertion.
+The Unique Key Table has the following characteristics:
* **Unique Key UPSERT**: During insertion, records with duplicate keys are
updated, while new keys are inserted.
-* **Automatic Deduplication**: The model ensures key uniqueness and
automatically deduplicates data based on the unique key.
+* **Automatic Deduplication**: The table ensures key uniqueness and
automatically deduplicates data based on the unique key.
* **Optimized for High-frequency Updates**: It efficiently handles
high-frequency updates while balancing update and query performance.
## Use Cases
-* **High-frequency Data Updates**: In upstream OLTP databases, where dimension
tables are frequently updated, the Unique Key Model can efficiently synchronize
the upstream updated records and perform efficient UPSERT operations.
+* **High-frequency Data Updates**: In upstream OLTP databases, where dimension
tables are frequently updated, the Unique Key Table can efficiently synchronize
the upstream updated records and perform efficient UPSERT operations.
-* **Efficient Data Deduplication**: In scenarios such as advertising campaigns
or customer relationship management systems, where deduplication is required
based on user IDs, the Unique Key Model ensures efficient deduplication.
+* **Efficient Data Deduplication**: In scenarios such as advertising campaigns
or customer relationship management systems, where deduplication is required
based on user IDs, the Unique Key Table ensures efficient deduplication.
-* **Partial Columns Updates**: In scenarios such as in user profiling where
dynamic tags change frequently, or in order consumption scenarios where the
transaction status needs to be updated. The Unique Key Model's partial column
update capability allows for changes to specific columns.
+* **Partial Column Updates**: In scenarios such as user profiling, where
dynamic tags change frequently, or order consumption, where the transaction
status needs to be updated, the Unique Key Table's partial column update
capability allows changes to specific columns.
## Implementation Methods
-In Doris, the Unique Key Model has two implementation methods:
+In Doris, the Unique Key Table has two implementation methods:
-* **Merge-on-write**: Starting from version 1.2, the default implementation of
the Unique Key Model in Doris is the merge-on-write mode. In this mode, data is
immediately merged for the same Key upon writing, ensuring that the data
storage state after each write is the final merged result of the unique key,
and only the latest result is stored. Merge-on-write provides a good balance
between query and write performance, avoiding the need to merge multiple
versions of data during queries a [...]
+* **Merge-on-write**: Starting from version 1.2, the default implementation of
the Unique Key Table in Doris is the merge-on-write mode. In this mode, data is
immediately merged for the same Key upon writing, ensuring that the data
storage state after each write is the final merged result of the unique key,
and only the latest result is stored. Merge-on-write provides a good balance
between query and write performance, avoiding the need to merge multiple
versions of data during queries a [...]
-* **Merge-on-read**: Prior to version 1.2, Doris's Unique Key Model defaulted
to merge-on-read mode. In this mode, data is not merged upon writing but is
appended incrementally, retaining multiple versions within Doris. During
queries or Compaction, data is merged by the same Key version. Merge-on-read is
suitable for write-heavy and read-light scenarios, but during queries, multiple
versions must be merged, and predicates cannot be pushed down, which may affect
query speed.
+* **Merge-on-read**: Prior to version 1.2, Doris's Unique Key Table defaulted
to merge-on-read mode. In this mode, data is not merged upon writing but is
appended incrementally, retaining multiple versions within Doris. During
queries or Compaction, data is merged by the same Key version. Merge-on-read is
suitable for write-heavy and read-light scenarios, but during queries, multiple
versions must be merged, and predicates cannot be pushed down, which may affect
query speed.
-In Doris, there are two types of update semantics for the Unique Key Model:
+In Doris, there are two types of update semantics for the Unique Key Table:
-* **Full Row Upsert**: The default update semantic for the Unique Key Model is
**full row UPSERT**, i.e., UPDATE OR INSERT. If the Key of the row exists, it
will be updated; if it does not exist, new data will be inserted. In the full
row UPSERT semantic, even if the user inserts data into specific columns using
`INSERT INTO`, Doris will fill in the missing columns with NULL values or
default values during the planner stage.
+* **Full Row Upsert**: The default update semantic for the Unique Key Table is
**full row UPSERT**, i.e., UPDATE OR INSERT. If the Key of the row exists, it
will be updated; if it does not exist, new data will be inserted. In the full
row UPSERT semantic, even if the user inserts data into specific columns using
`INSERT INTO`, Doris will fill in the missing columns with NULL values or
default values during the planner stage.
* **Partial Column Upsert**: If users want to update specific fields, they
need to use the merge-on-write implementation and enable partial column updates
support via specific parameters. Please refer to the documentation on [Partial
Column Updates](../../data-operate/update/update-of-unique-model).
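
The default full-row UPSERT semantics described above can be sketched as follows. This is an illustration only and is not part of the commit: the table `demo_user_profile`, its columns, and the values are hypothetical. The `enable_unique_key_merge_on_write` property is set explicitly here only to make the merge-on-write mode visible.

```sql
-- Hypothetical table for illustration; names and columns are assumptions.
CREATE TABLE IF NOT EXISTS demo_user_profile
(
    user_id BIGINT NOT NULL,
    city    VARCHAR(20),
    score   INT
)
UNIQUE KEY(user_id)
DISTRIBUTED BY HASH(user_id) BUCKETS 10
PROPERTIES (
    "enable_unique_key_merge_on_write" = "true"
);

INSERT INTO demo_user_profile VALUES (1, 'Beijing', 10);
-- Same key: the existing row for user_id = 1 is overwritten.
INSERT INTO demo_user_profile VALUES (1, 'Shanghai', 20);
-- New key: a new row is inserted.
INSERT INTO demo_user_profile VALUES (2, 'Shenzhen', 30);

-- Only the latest row per key is expected: one row each for user_id 1 and 2.
SELECT * FROM demo_user_profile ORDER BY user_id;

-- Under full-row UPSERT, naming only some columns still replaces the whole
-- row: city has no default in this sketch, so it is expected to become NULL.
INSERT INTO demo_user_profile (user_id, score) VALUES (1, 99);
```

The last statement shows why partial column updates have to be enabled separately: without them, columns omitted from the `INSERT INTO` list are filled with NULL or their default value rather than keeping their previous content.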
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current.json
b/i18n/zh-CN/docusaurus-plugin-content-docs/current.json
index d68e0134005..3bcf3e174bd 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/current.json
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current.json
@@ -79,9 +79,9 @@
"message": "数据表设计",
"description": "The label for category Data Table Design in sidebar docs"
},
- "sidebar.docs.category.Data Models": {
- "message": "数据模型",
- "description": "The label for category Data Models in sidebar docs"
+ "sidebar.docs.category.Table Types": {
+ "message": "表类型",
+ "description": "The label for category Table Types in sidebar docs"
},
"sidebar.docs.category.Data Partitioning": {
"message": "数据划分",
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1.json
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1.json
index b3a214054b7..93ecd757a3b 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1.json
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1.json
@@ -75,9 +75,9 @@
"message": "数据表设计",
"description": "The label for category Data Table Design in sidebar docs"
},
- "sidebar.docs.category.Data Models": {
- "message": "数据模型",
- "description": "The label for category Data Models in sidebar docs"
+ "sidebar.docs.category.Table Types": {
+ "message": "表类型",
+ "description": "The label for category Table Types in sidebar docs"
},
"sidebar.docs.category.Data Partitioning": {
"message": "数据划分",
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0.json
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0.json
index a32da03ca5f..b4dd4acfce2 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0.json
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0.json
@@ -75,9 +75,9 @@
"message": "数据表设计",
"description": "The label for category Data Table Design in sidebar docs"
},
- "sidebar.docs.category.Data Models": {
- "message": "数据模型",
- "description": "The label for category Data Models in sidebar docs"
+ "sidebar.docs.category.Table Types": {
+ "message": "表类型",
+ "description": "The label for category Table Types in sidebar docs"
},
"sidebar.docs.category.Data Partitioning": {
"message": "数据划分",
diff --git
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/table-design/data-model/aggregate.md
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/table-design/data-model/aggregate.md
index 087b9d37b37..80cf661e3fd 100644
---
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/table-design/data-model/aggregate.md
+++
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/table-design/data-model/aggregate.md
@@ -1,11 +1,11 @@
---
{
- "title": "聚合模型",
+ "title": "聚合表",
"language": "zh-CN"
}
---
-Doris
的聚合模型专为高效处理大规模数据查询中的聚合操作设计。它通过预聚合数据,减少重复计算,提升查询性能。聚合模型只存储聚合后的数据,节省存储空间并加速查询。
+Doris
的聚合表专为高效处理大规模数据查询中的聚合操作设计。它通过预聚合数据,减少重复计算,提升查询性能。聚合表只存储聚合后的数据,节省存储空间并加速查询。
## 使用场景
@@ -15,7 +15,7 @@ Doris 的聚合模型专为高效处理大规模数据查询中的聚合操作
## 原理
-每一次数据导入会在聚合模型内形成一个版本,在 Compaction 阶段进行版本合并,在查询时会按照主键进行数据聚合:
+每一次数据导入会在聚合表内形成一个版本,在 Compaction 阶段进行版本合并,在查询时会按照主键进行数据聚合:
1. **数据导入阶段**:数据按批次导入,每批次生成一个版本,并对相同聚合键的数据进行初步聚合(如求和、计数);
@@ -25,7 +25,7 @@ Doris 的聚合模型专为高效处理大规模数据查询中的聚合操作
## 建表说明
-使用 AGGREGATE KEY 关键字在建表时指定聚合模型,并指定 Key 列用于聚合 Value 列。
+使用 AGGREGATE KEY 关键字在建表时指定聚合表,并指定 Key 列用于聚合 Value 列。
```sql
CREATE TABLE IF NOT EXISTS example_tbl_agg
diff --git
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/table-design/data-model/duplicate.md
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/table-design/data-model/duplicate.md
index 5d1efdde688..3debb327536 100644
---
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/table-design/data-model/duplicate.md
+++
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/table-design/data-model/duplicate.md
@@ -1,31 +1,31 @@
---
{
- "title": "明细模型",
+ "title": "明细表",
"language": "zh-CN"
}
---
-明细模型是 Doris 中的默认建表模型,用于保存每条原始数据记录。在建表时,通过 `DUPLICATE KEY`
指定数据存储的排序列,以优化常用查询。一般建议选择三列或更少的列作为排序键,具体选择方式参考[排序键](../index/prefix-index)。明细模型具有以下特点:
+明细表是 Doris 中的默认建表模型,用于保存每条原始数据记录。在建表时,通过 `DUPLICATE KEY`
指定数据存储的排序列,以优化常用查询。一般建议选择三列或更少的列作为排序键,具体选择方式参考[排序键](../index/prefix-index)。明细表具有以下特点:
-* **保留原始数据**:明细模型保留了全量的原始数据,适合于存储与查询原始数据。对于需要进行详细数据分析的应用场景,建议使用明细模型,以避免数据丢失的风险;
+* **保留原始数据**:明细表保留了全量的原始数据,适合于存储与查询原始数据。对于需要进行详细数据分析的应用场景,建议使用明细表,以避免数据丢失的风险;
-* **不去重也不聚合**:与聚合模型与主键模型不同,明细模型不会对数据进行去重与聚合操作。即使两条相同的数据,每次插入时也会被完整保留;
+* **不去重也不聚合**:与聚合模型与主键模型不同,明细表不会对数据进行去重与聚合操作。即使两条相同的数据,每次插入时也会被完整保留;
-* **灵活的数据查询**:明细模型保留了全量的原始数据,可以从完整数据中提取细节,基于全量数据做任意维度的聚合操作,从而进行元数数据的审计及细粒度的分析。
+* **灵活的数据查询**:明细表保留了全量的原始数据,可以从完整数据中提取细节,基于全量数据做任意维度的聚合操作,从而进行元数数据的审计及细粒度的分析。
## 使用场景
-一般明细模型中的数据只进行追加,旧数据不会更新。明细模型适用于需要存储全量原始数据的场景:
+一般明细表中的数据只进行追加,旧数据不会更新。明细表适用于需要存储全量原始数据的场景:
* **日志存储**:用于存储各类的程序操作日志,如访问日志、错误日志等。每一条数据都需要被详细记录,方便后续的审计与分析;
* **用户行为数据**:在分析用户行为时,如点击数据、用户访问轨迹等,需要保留用户的详细行为,方便后续构建用户画像及对行为路径进行详细分析;
-*
**交易数据**:在某些存储交易行为或订单数据时,交易结束时一般不会发生数据变更。明细模型适合保留这一类交易信息,不遗漏任意一笔记录,方便对交易进行精确的对账。
+*
**交易数据**:在某些存储交易行为或订单数据时,交易结束时一般不会发生数据变更。明细表适合保留这一类交易信息,不遗漏任意一笔记录,方便对交易进行精确的对账。
## 建表说明
-在建表时,可以通过 `DUPLICATE KEY` 关键字指定明细模型。明细表必须指定数据的 Key
列,用于在存储时对数据进行排序。下例的明细表中存储了日志信息,并针对于 `log_time`、`log_type` 及 `error_code`
三列进行了排序:
+在建表时,可以通过 `DUPLICATE KEY` 关键字指定明细表。明细表必须指定数据的 Key
列,用于在存储时对数据进行排序。下例的明细表中存储了日志信息,并针对于 `log_time`、`log_type` 及 `error_code`
三列进行了排序:
```sql
CREATE TABLE IF NOT EXISTS example_tbl_duplicate
@@ -43,7 +43,7 @@ DISTRIBUTED BY HASH(log_type) BUCKETS 10;
## 数据插入与存储
-在明细表中,数据不进行去重与聚合,插入数据即存储数据。明细模型中 Key 列指做为排序。
+在明细表中,数据不进行去重与聚合,插入数据即存储数据。明细表中 Key 列指做为排序。

diff --git
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/table-design/data-model/overview.md
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/table-design/data-model/overview.md
index 58ad6f7a68e..f350a042e12 100644
---
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/table-design/data-model/overview.md
+++
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/table-design/data-model/overview.md
@@ -1,37 +1,37 @@
---
{
- "title": "模型概述",
+ "title": "表类型概述",
"language": "zh-CN"
}
---
-在 Doris 中建表时需要指定表模型,以定义数据存储与管理方式。在 Doris
中提供了明细模型、聚合模型以及主键模型三种表模型,可以应对不同的应用场景需求。不同的表模型具有相应的数据去重、聚合及更新机制。选择合适的表模型有助于实现业务目标,同时保证数据处理的灵活性和高效性。
+在 Doris 中建表时需要指定表类型,以定义数据存储与管理方式。在 Doris
中提供了明细表、聚合表以及主键表三种表类型,可以应对不同的应用场景需求。不同的表类型具有相应的数据去重、聚合及更新机制。选择合适的表类型有助于实现业务目标,同时保证数据处理的灵活性和高效性。
-## 表模型分类
+## 表类型分类
-在 Doris 中支持三种表模型:
+在 Doris 中支持三种表类型:
-* [明细模型](./duplicate)(Duplicate Key Model):允许指定的 Key 列重复,Doirs
存储层保留所有写入的数据,适用于必须保留所有原始数据记录的情况;
+* [明细表](./duplicate)(Duplicate Key Table):允许指定的 Key 列重复,Doirs
存储层保留所有写入的数据,适用于必须保留所有原始数据记录的情况;
-* [主键模型](./unique)(Unique Key Model):每一行的 Key 值唯一,可确保给定的 Key 列不会存在重复行,Doris
存储层对每个 key 只保留最新写入的数据,适用于数据更新的情况;
+* [主键表](./unique)(Unique Key Table):每一行的 Key 值唯一,可确保给定的 Key 列不会存在重复行,Doris
存储层对每个 key 只保留最新写入的数据,适用于数据更新的情况;
-* [聚合模型](./aggregate)(Aggregate Key Model):可根据 Key 列聚合数据,Doris
存储层保留聚合后的数据,从而可以减少存储空间和提升查询性能;通常用于需要汇总或聚合信息(如总数或平均值)的情况。
+* [聚合表](./aggregate)(Aggregate Key Table):可根据 Key 列聚合数据,Doris
存储层保留聚合后的数据,从而可以减少存储空间和提升查询性能;通常用于需要汇总或聚合信息(如总数或平均值)的情况。
-在建表后,表模型的属性已经确认,无法修改。针对业务选择合适的模型至关重要:
+在建表后,表类型已经确认,无法修改。针对业务选择合适的类型至关重要:
-* **Duplicate Key**:适合任意维度的 Ad-hoc
查询。虽然同样无法利用预聚合的特性,但是不受聚合模型的约束,可以发挥列存模型的优势(只读取相关列,而不需要读取所有 Key 列)。
+* **Duplicate Key Table**:适合任意维度的 Ad-hoc
查询。虽然同样无法利用预聚合的特性,但是不受聚合表的约束,可以发挥列列存的优势(只读取相关列,而不需要读取所有 Key 列)。
-* **Unique Key**:针对需要唯一主键约束的场景,可以保证主键唯一性约束。但是无法利用 ROLLUP 等预聚合带来的查询优势。
+* **Unique Key Table**:针对需要唯一主键约束的场景,可以保证主键唯一性约束。但是无法利用 ROLLUP 等预聚合带来的查询优势。
-* **Aggregate Key**:可以通过预聚合,极大地降低聚合查询时所需扫描的数据量和查询的计算量,非常适合有固定模式的报表类查询场景。但是该模型对
`count(*)` 查询很不友好。同时因为固定了 Value 列上的聚合方式,在进行其他类型的聚合查询时,需要考虑语意正确性。
+* **Aggregate Key
Table**:可以通过预聚合,极大地降低聚合查询时所需扫描的数据量和查询的计算量,非常适合有固定模式的报表类查询场景。但是该类型表对 `count(*)`
查询很不友好。同时因为固定了 Value 列上的聚合方式,在进行其他类型的聚合查询时,需要考虑语意正确性。
-*
**部分列更新**:请查阅文档[主键模型部分列更新](../../data-operate/update/update-of-aggregate-model)与[聚合模型部份列更新](../../data-operate/update/update-of-aggregate-model)获取相关使用建议。
+*
**部分列更新**:请查阅文档[主键表部分列更新](../../data-operate/update/update-of-aggregate-model)与[聚合表部份列更新](../../data-operate/update/update-of-aggregate-model)获取相关使用建议。
## 排序键
-在 Doris 中,数据以列的形式存储,一张表可以分为 key 列与 value 列。其中,key 列用于分组与排序,value 列用于参与聚合。Key
列可以是一个或多个字段,在建表时,按照各种表模型中,Aggregate Key、Unique Key 和 Duplicate Key 的列进行数据排序存储。
+在 Doris 中,数据以列的形式存储,一张表可以分为 key 列与 value 列。其中,key 列用于分组与排序,value 列用于参与聚合。Key
列可以是一个或多个字段,在建表时,按照各种表类型中,Aggregate Key、Unique Key 和 Duplicate Key 的列进行数据排序存储。
-不同的表模型都需要在建表时指定 Key 列,分别有不同的意义:对于 Duplicate Key 模型,Key 列表示排序,没有唯一键的约束。在
Aggregate Key 与 Unique Key 模型中,会基于 Key 列进行聚合,Key 列既有排序的能力,又有唯一键的约束。
+不同的表类型都需要在建表时指定 Key 列,分别有不同的意义:对于明细表,Key 列表示排序,没有唯一键的约束。在聚合表与主键表中,会基于 Key
列进行聚合,Key 列既有排序的能力,又有唯一键的约束。
合理使用排序键可以带来以下收益:
@@ -51,9 +51,9 @@
* 对于 `VARCHAR` 和 `STRING` 类型的长度,遵循够用即可原则。
-## 表模型能力对比
+## 表类型能力对比
-| | 明细模型 | 主键模型 | 聚合模型 |
+| | 明细表 | 主键表 | 聚合表 |
| --------- | ------------- | ---- | ---- |
| Key 列唯一约束 | 不支持,Key 列可以重复 | 支持 | 支持 |
| 同步物化视图 | 支持 | 支持 | 支持 |
diff --git
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/table-design/data-model/unique.md
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/table-design/data-model/unique.md
index 9f750719107..0d84e5bca16 100644
---
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/table-design/data-model/unique.md
+++
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/table-design/data-model/unique.md
@@ -1,17 +1,17 @@
---
{
- "title": "主键模型",
+ "title": "主键表",
"language": "zh-CN"
}
---
-当需要更新数据时,可以选择主键模型(Unique Key Model)。该模型保证 Key 列的唯一性,插入或更新数据时,新数据会覆盖具有相同 Key
的旧数据,确保数据记录为最新。与其他数据模型相比,主键模型适用于数据的更新场景,在插入过程中进行主键级别的更新覆盖。
+当需要更新数据时,可以选择主键表(Unique Key Table)。该模型保证 Key 列的唯一性,插入或更新数据时,新数据会覆盖具有相同 Key
的旧数据,确保数据记录为最新。与其他数据模型相比,主键表适用于数据的更新场景,在插入过程中进行主键级别的更新覆盖。
-主键模型有以下特点:
+主键表有以下特点:
* **基于主键进行 UPSERT**:在插入数据时,主键重复的数据会更新,主键不存在的记录会插入;
-* **基于主键进行去重**:主键模型中的 Key 列具有唯一性,会对根据主键列对数据进行去重操作;
+* **基于主键进行去重**:主键表中的 Key 列具有唯一性,会对根据主键列对数据进行去重操作;
* **高频数据更新**:支持高频数据更新场景,同时平衡数据更新性能与查询性能。
@@ -19,19 +19,19 @@
* **高频数据更新**:适用于上游 OLTP 数据库中的维度表,实时同步更新记录,并高效执行 UPSERT 操作;
-* **数据高效去重**:如广告投放和客户关系管理系统中,使用主键模型可以基于用户 ID 高效去重;
+* **数据高效去重**:如广告投放和客户关系管理系统中,使用主键表可以基于用户 ID 高效去重;
-* **需要部分列更新**:如画像标签场景需要变更频繁改动的动态标签,消费订单场景需要改变交易的状态。通过主键模型部分列更新能力可以完成某几列的变更操作。
+* **需要部分列更新**:如画像标签场景需要变更频繁改动的动态标签,消费订单场景需要改变交易的状态。通过主键表部分列更新能力可以完成某几列的变更操作。
## 实现方式
-在 Doris 中主键模型有两种实现方式:
+在 Doris 中主键表有两种实现方式:
* **写时合并**(merge-on-write):自 1.2 版本起,Doris 默认使用写时合并模式,数据在写入时立即合并相同 Key
的记录,确保存储的始终是最新数据。写时合并兼顾查询和写入性能,避免多个版本的数据合并,并支持谓词下推到存储层。大多数场景推荐使用此模式;
-* **读时合并**(merge-on-read):在 1.2 版本前,Doris
中的主键模型默认使用读时合并模式,数据在写入时并不进行合并,以增量的方式被追加存储,在 Doris 内保留多个版本。查询或 Compaction
时,会对数据进行相同 Key 的版本合并。读时合并适合写多读少的场景,在查询是需要进行多个版本合并,谓词无法下推,可能会影响到查询速度。
+* **读时合并**(merge-on-read):在 1.2 版本前,Doris
中的主键表默认使用读时合并模式,数据在写入时并不进行合并,以增量的方式被追加存储,在 Doris 内保留多个版本。查询或 Compaction
时,会对数据进行相同 Key 的版本合并。读时合并适合写多读少的场景,在查询是需要进行多个版本合并,谓词无法下推,可能会影响到查询速度。
-在 Doris 中基于主键模型更新有两种语义:
+在 Doris 中基于主键表更新有两种语义:
* **整行更新**:Unique Key 模型默认的更新语义为整行`UPSERT`,即 UPDATE OR INSERT,该行数据的 Key
如果存在,则进行更新,如果不存在,则进行新数据插入。在整行 `UPSERT` 语义下,即使用户使用 Insert Into 指定部分列进行写入,Doris
也会在 Planner 中将未提供的列使用 NULL 值或者默认值进行填充。
diff --git a/sidebars.json b/sidebars.json
index 32c7bc865e3..9ad95bb336f 100644
--- a/sidebars.json
+++ b/sidebars.json
@@ -97,7 +97,7 @@
"table-design/overview",
{
"type": "category",
- "label": "Data Models",
+ "label": "Table Types",
"items": [
"table-design/data-model/overview",
"table-design/data-model/duplicate",
diff --git a/versioned_docs/version-3.0/table-design/data-model/aggregate.md b/versioned_docs/version-3.0/table-design/data-model/aggregate.md
index cc07045a895..49facbe6b90 100644
--- a/versioned_docs/version-3.0/table-design/data-model/aggregate.md
+++ b/versioned_docs/version-3.0/table-design/data-model/aggregate.md
@@ -1,21 +1,21 @@
---
{
- "title": "Aggregate Model",
+ "title": "Aggregate Key Table",
"language": "en"
}
---
-Doris's **Aggregate Key Model** is designed to efficiently handle aggregation
operations in large-scale data queries. By performing pre-aggregation on the
data, it reduces redundancy in computations and improves query performance. The
model stores only aggregated data, omitting raw data, which saves storage space
and enhances query performance.
+Doris's **Aggregate Key Table** is designed to efficiently handle aggregation
operations in large-scale data queries. By performing pre-aggregation on the
data, it reduces redundancy in computations and improves query performance. The
table stores only aggregated data, omitting raw data, which saves storage space
and enhances query performance.
## Use Cases
-* **Summarizing Detailed Data**: The Aggregate Key Model is used in scenarios
like e-commerce platforms evaluating monthly sales, financial risk control
calculating customer transaction totals, or advertising campaigns analyzing
total ad clicks, for multidimensional summarization of detailed data.
+* **Summarizing Detailed Data**: The Aggregate Key Table is used in scenarios
like e-commerce platforms evaluating monthly sales, financial risk control
calculating customer transaction totals, or advertising campaigns analyzing
total ad clicks, for multidimensional summarization of detailed data.
* **No Need to Query Raw Detailed Data**: For use cases such as dashboard
reports or user transaction behavior analysis, where the raw data is stored in
a data lake and does not need to be retained in the database, only the
aggregated data is stored.
## Principle
-Each data import creates a version in the Aggregate Key Model, and during the
**Compaction** stage, versions are merged. When querying, data is aggregated by
the primary key:
+Each data import creates a version in the Aggregate Key Table, and during the
**Compaction** stage, versions are merged. When querying, data is aggregated by
the primary key:
* **Data Import Stage**
@@ -38,7 +38,7 @@ Each data import creates a version in the Aggregate Key Model, and during the **
## Table Creation Instructions
-When creating a table, the **AGGREGATE KEY** keyword can be used to specify
the Aggregate Key Model. The Aggregate Key Model must specify Key columns,
which are used to aggregate Value columns during storage.
+When creating a table, the **AGGREGATE KEY** keyword can be used to specify
the Aggregate Key Table. The Aggregate Key Table must specify Key columns,
which are used to aggregate Value columns during storage.
```sql
CREATE TABLE IF NOT EXISTS example_tbl_agg
@@ -56,7 +56,7 @@ DISTRIBUTED BY HASH(user_id) BUCKETS 10;
In the example above, a fact table for user information and access behavior is
defined, where `user_id`, `load_date`, `city`, and `age` are used as Key
columns for aggregation. During data import, the Key columns are aggregated
into one row, and the Value columns are aggregated according to the specified
aggregation types.
-The following types of dimension aggregation are supported in the Aggregate
Key Model:
+The following types of dimension aggregation are supported in the Aggregate
Key Table:
| Aggregation Method | Description |
@@ -81,7 +81,7 @@ If the aggregation methods above do not meet your business requirements, conside
In the Aggregate Key table, data is aggregated based on the primary key. After
data insertion, aggregation operations are completed.
-
+
In the example above, there were originally 4 rows of data in the table. After
inserting 2 rows, aggregation operations on the dimension columns are performed
based on the Key columns:
diff --git a/versioned_docs/version-3.0/table-design/data-model/duplicate.md b/versioned_docs/version-3.0/table-design/data-model/duplicate.md
index 13b2cb46d4b..92017f7b47d 100644
--- a/versioned_docs/version-3.0/table-design/data-model/duplicate.md
+++ b/versioned_docs/version-3.0/table-design/data-model/duplicate.md
@@ -1,21 +1,21 @@
---
{
- "title": "Duplicate Key Model",
+ "title": "Duplicate Key Table",
"language": "en"
}
---
-The **Duplicate Key Model** in Doris is the default table model, designed to
store individual raw data records. The `Duplicate Key` specified during table
creation determines the columns for sorting and storage, optimizing common
queries. It is recommended to choose no more than three columns as the sort
key. For more specific selection guidelines, refer to [Sort
Key](../index/prefix-index). The Duplicate Key Model has the following
characteristics:
+The **Duplicate Key Table** in Doris is the default table type, designed to
store individual raw data records. The `Duplicate Key` specified during table
creation determines the columns for sorting and storage, optimizing common
queries. It is recommended to choose no more than three columns as the sort
key. For more specific selection guidelines, refer to [Sort
Key](../index/prefix-index). The Duplicate Key Table has the following
characteristics:
-* **Preserving Raw Data**: The Duplicate Key Model retains all original data,
making it ideal for storing and querying raw data. It is recommended for use
cases requiring detailed data analysis to avoid data loss.
+* **Preserving Raw Data**: The Duplicate Key Table retains all original data,
making it ideal for storing and querying raw data. It is recommended for use
cases requiring detailed data analysis to avoid data loss.
-* **No Deduplication or Aggregation**: Unlike the Aggregate and Primary Key
models, the Duplicate Key Model does not perform deduplication or aggregation,
fully retaining identical records.
+* **No Deduplication or Aggregation**: Unlike the Aggregate and Primary Key
tables, the Duplicate Key Table does not perform deduplication or aggregation,
fully retaining identical records.
-* **Flexible Data Querying**: The Duplicate Key Model retains all original
data, enabling detailed extraction and aggregation across any dimension for
metadata auditing and fine-grained analysis.
+* **Flexible Data Querying**: The Duplicate Key Table retains all original
data, enabling detailed extraction and aggregation across any dimension for
metadata auditing and fine-grained analysis.
## Use Cases
-In the Duplicate Key Model, data is generally only appended, and old data is
not updated. The Duplicate Key Model is ideal for scenarios that require full
raw data:
+In the Duplicate Key Table, data is generally only appended, and old data is
not updated. The Duplicate Key Table is ideal for scenarios that require full
raw data:
* **Log Storage**: Used for storing various types of application logs, such as
access logs, error logs, etc. Each piece of data needs to be detailed for
future auditing and analysis.
@@ -26,7 +26,7 @@ In the Duplicate Key Model, data is generally only appended, and old data is not
## Table Creation Instructions
-When creating a table, the **DUPLICATE KEY** keyword can be used to specify
the Duplicate Key Model. The Duplicate Key table must specify the Key columns,
which are used to sort the data during storage. In the following example, the
Duplicate Key table stores log information and sorts the data based on the
`log_time`, `log_type`, and `error_code` columns:
+When creating a table, the **DUPLICATE KEY** keyword can be used to specify
the Duplicate Key Table. The Duplicate Key table must specify the Key columns,
which are used to sort the data during storage. In the following example, the
Duplicate Key table stores log information and sorts the data based on the
`log_time`, `log_type`, and `error_code` columns:
```sql
CREATE TABLE IF NOT EXISTS example_tbl_duplicate
@@ -44,7 +44,7 @@ DISTRIBUTED BY HASH(log_type) BUCKETS 10;
## Data Insertion and Storage
-In a Duplicate Key table, data is not deduplicated or aggregated; inserting
data directly stores it. The Key columns in the Duplicate Key Model are used
for sorting.
+In a Duplicate Key table, data is not deduplicated or aggregated; inserting
data directly stores it. The Key columns in the Duplicate Key Table are used
for sorting.

diff --git a/versioned_docs/version-3.0/table-design/data-model/overview.md b/versioned_docs/version-3.0/table-design/data-model/overview.md
index f8540651fb6..2ec82de91d8 100644
--- a/versioned_docs/version-3.0/table-design/data-model/overview.md
+++ b/versioned_docs/version-3.0/table-design/data-model/overview.md
@@ -1,39 +1,39 @@
---
{
- "title": "Table Model Overview",
+ "title": "Table Type Overview",
"language": "en"
}
---
-When creating a table in Doris, it is necessary to specify the table model to
define how data is stored and managed. Doris provides three table models: the
**Duplicate Key Model**, **Unique Key Model** and **Aggregate Key Model**,
which cater to different application scenarios. Each model has corresponding
mechanisms for data deduplication, aggregation, and updates. Choosing the
appropriate table model helps achieve business objectives while ensuring
flexibility and efficiency in data pr [...]
+When creating a table in Doris, it is necessary to specify the Table Type to
define how data is stored and managed. Doris provides three Table Types: the
**Duplicate Key Table**, **Unique Key Table** and **Aggregate Key Table**,
which cater to different application scenarios. Each type has corresponding
mechanisms for data deduplication, aggregation, and updates. Choosing the
appropriate Table Type helps achieve business objectives while ensuring
flexibility and efficiency in data processing.
-## Table Model Classification
+## Table Type Classification
-Doris supports three types of table models:
+Doris supports three Table Types:
-* [Duplicate Key Model](./duplicate): Allows the specified Key columns to be
duplicated, and Doris's storage layer retains all written data. This model is
suitable for situations where all original data records must be preserved.
+* [Duplicate Key Table](./duplicate): Allows the specified Key columns to be
duplicated, and Doris's storage layer retains all written data. This type is
suitable for situations where all original data records must be preserved.
-* [Unique Key Model](./unique): Ensures that each row has a unique Key value,
and guarantees that there are no duplicate rows for a given Key column. The
Doris storage layer retains only the latest written data for each key, making
this model suitable for scenarios that involve data updates.
+* [Unique Key Table](./unique): Ensures that each row has a unique Key value,
and guarantees that there are no duplicate rows for a given Key column. The
Doris storage layer retains only the latest written data for each key, making
this type suitable for scenarios that involve data updates.
-* [Aggregate Key Model](./aggregate): Allows data to be aggregated based on
the Key columns. The Doris storage layer retains aggregated data, reducing
storage space and improving query performance. This model is typically used in
situations where summary or aggregated information (such as totals or averages)
is required.
+* [Aggregate Key Table](./aggregate): Allows data to be aggregated based on
the Key columns. The Doris storage layer retains aggregated data, reducing
storage space and improving query performance. This type is typically used in
situations where summary or aggregated information (such as totals or averages)
is required.
-After creating the table, the properties of the table model are confirmed and
cannot be modified. Choosing the right model for the business is crucial:
+After creating the table, the properties of the Table Type are confirmed and
cannot be modified. Choosing the right type for the business is crucial:
-* **Duplicate Key Model** is suitable for ad-hoc queries with any dimensions.
Although it cannot leverage the benefits of pre-aggregation, it is not
constrained by aggregation models and can take advantage of the columnar
storage model (only reading relevant columns without needing to read all key
columns).
+* **Duplicate Key Table** is suitable for ad-hoc queries with any dimensions.
Although it cannot leverage the benefits of pre-aggregation, it is not
constrained by aggregation tables and can take advantage of the columnar
storage (only reading relevant columns without needing to read all key columns).
-* **Unique Key Model** is designed for scenarios where a unique key constraint
is needed, ensuring the uniqueness of the key. However, it cannot utilize the
query benefits brought by pre-aggregations such as ROLLUP.
+* **Unique Key Table** is designed for scenarios where a unique key constraint
is needed, ensuring the uniqueness of the key. However, it cannot utilize the
query benefits brought by pre-aggregations such as ROLLUP.
-* **Aggregate Key Model** can greatly reduce the data and computation required
for aggregation queries through pre-aggregation, making it ideal for
fixed-schema reporting queries. However, this model is not friendly to
`count(*)` queries. Also, because the aggregation method for the Value columns
is fixed, when performing other types of aggregation queries, semantic
correctness must be considered.
+* **Aggregate Key Table** can greatly reduce the data and computation required
for aggregation queries through pre-aggregation, making it ideal for
fixed-schema reporting queries. However, this type is not friendly to
`count(*)` queries. Also, because the aggregation method for the Value columns
is fixed, when performing other types of aggregation queries, semantic
correctness must be considered.
-* **Update partial columns**, please refer to the documentation for [Partial
Column Updates in Unique Key
Model](../../data-operate/update/update-of-aggregate-model) and [Partial Column
Updates in Aggregate
Model](../../data-operate/update/update-of-aggregate-model) for relevant usage
advice.
+* **Update partial columns**, please refer to the documentation for [Partial
Column Updates in Unique Key
Table](../../data-operate/update/update-of-aggregate-model) and [Partial Column
Updates in Aggregate Key
Table](../../data-operate/update/update-of-aggregate-model) for relevant usage
advice.
## Sort Key
-In Doris, data is stored in a columnar format, and a table can be divided into
Key columns and Value columns. The Key columns are used for grouping and
sorting, while the Value columns are used for aggregation. Key columns can
consist of one or more fields, and when creating a table, data is sorted and
stored according to the columns of Aggregate Key, Unique Key, and Duplicate Key
models.
+In Doris, data is stored in a columnar format, and a table can be divided into
Key columns and Value columns. The Key columns are used for grouping and
sorting, while the Value columns are used for aggregation. Key columns can
consist of one or more fields, and when creating a table, data is sorted and
stored according to the columns of Aggregate Key, Unique Key, and Duplicate Key
Tables.
-Different table models require the specification of Key columns during table
creation, each with a different significance: for the Duplicate Key model, the
Key columns represent sorting, without any uniqueness constraints. In the
Aggregate Key and Unique Key models, aggregation is performed based on the Key
columns, which not only have sorting capabilities but also enforce uniqueness
constraints.
+Different Table Types require the specification of Key columns during table
creation, each with a different significance: for the Duplicate Key Table, the
Key columns represent sorting, without any uniqueness constraints. In the
Aggregate Key and Unique Key Tables, aggregation is performed based on the Key
columns, which not only have sorting capabilities but also enforce uniqueness
constraints.
Proper use of the Sort Key can provide the following benefits:
@@ -41,7 +41,7 @@ Proper use of the Sort Key can provide the following benefits:
* **Data Compression Optimization**: Storing data in an ordered fashion based
on the sort key improves compression efficiency, as similar data will be
grouped together, significantly increasing the compression ratio and reducing
storage space.
-* **Reduced Deduplication Costs**: When using the Unique Key Model, the sort
key allows Doris to perform deduplication more efficiently, ensuring data
uniqueness.
+* **Reduced Deduplication Costs**: When using the Unique Key Table, the sort
key allows Doris to perform deduplication more efficiently, ensuring data
uniqueness.
When selecting a sort key, the following recommendations can be followed:
@@ -53,9 +53,9 @@ When selecting a sort key, the following recommendations can be followed:
* For the length of `VARCHAR` and `STRING` types, follow the principle of
choosing enough...
-## Table Model Comparison
+## Table Type Comparison
-| | Duplicate Key Model | Unique Key Model | Aggregate Key Model |
+| | Duplicate Key Table | Unique Key Table | Aggregate Key Table |
| --------- | ------------------ | ----------------- | --------------- |
| Key Column Uniqueness | Not Supported, Key columns can be duplicated | Supported | Supported |
| Synchronous Materialized View | Supported | Supported | Supported |
diff --git a/versioned_docs/version-3.0/table-design/data-model/unique.md b/versioned_docs/version-3.0/table-design/data-model/unique.md
index f83fedab50d..10a1cf1efa2 100644
--- a/versioned_docs/version-3.0/table-design/data-model/unique.md
+++ b/versioned_docs/version-3.0/table-design/data-model/unique.md
@@ -1,38 +1,38 @@
---
{
- "title": "Unique Key Model",
+ "title": "Unique Key Table",
"language": "en"
}
---
-When data updates are required, use the **Unique Key Model**. It guarantees
the uniqueness of the Key columns so that new data overwrites existing records
with matching keys, ensuring that only the most up-to-date records are
maintained. This model is ideal for update scenarios, enabling unique-key-level
updates during data insertion.
-The Unique Key Model has the following characteristics:
+When data updates are required, use the **Unique Key Table**. It guarantees
the uniqueness of the Key columns so that new data overwrites existing records
with matching keys, ensuring that only the most up-to-date records are
maintained. This table type is ideal for update scenarios, enabling unique-key-level
updates during data insertion.
+The Unique Key Table has the following characteristics:
* **Unique Key UPSERT**: During insertion, records with duplicate keys are
updated, while new keys are inserted.
-* **Automatic Deduplication**: The model ensures key uniqueness and
automatically deduplicates data based on the unique key.
+* **Automatic Deduplication**: The table ensures key uniqueness and
automatically deduplicates data based on the unique key.
* **Optimized for High-frequency Updates**: It efficiently handles
high-frequency updates while balancing update and query performance.
## Use Cases
-* **High-frequency Data Updates**: In upstream OLTP databases, where dimension
tables are frequently updated, the Unique Key Model can efficiently synchronize
the upstream updated records and perform efficient UPSERT operations.
+* **High-frequency Data Updates**: In upstream OLTP databases, where dimension
tables are frequently updated, the Unique Key Table can efficiently synchronize
the upstream updated records and perform efficient UPSERT operations.
-* **Efficient Data Deduplication**: In scenarios such as advertising campaigns
or customer relationship management systems, where deduplication is required
based on user IDs, the Unique Key Model ensures efficient deduplication.
+* **Efficient Data Deduplication**: In scenarios such as advertising campaigns
or customer relationship management systems, where deduplication is required
based on user IDs, the Unique Key Table ensures efficient deduplication.
-* **Partial Columns Updates**: In scenarios such as in user profiling where
dynamic tags change frequently, or in order consumption scenarios where the
transaction status needs to be updated. The Unique Key Model's partial column
update capability allows for changes to specific columns.
+* **Partial Columns Updates**: In scenarios such as in user profiling where
dynamic tags change frequently, or in order consumption scenarios where the
transaction status needs to be updated. The Unique Key Table's partial column
update capability allows for changes to specific columns.
## Implementation Methods
-In Doris, the Unique Key Model has two implementation methods:
+In Doris, the Unique Key Table has two implementation methods:
-* **Merge-on-write**: Starting from version 1.2, the default implementation of
the Unique Key Model in Doris is the merge-on-write mode. In this mode, data is
immediately merged for the same Key upon writing, ensuring that the data
storage state after each write is the final merged result of the unique key,
and only the latest result is stored. Merge-on-write provides a good balance
between query and write performance, avoiding the need to merge multiple
versions of data during queries a [...]
+* **Merge-on-write**: Starting from version 1.2, the default implementation of
the Unique Key Table in Doris is the merge-on-write mode. In this mode, data is
immediately merged for the same Key upon writing, ensuring that the data
storage state after each write is the final merged result of the unique key,
and only the latest result is stored. Merge-on-write provides a good balance
between query and write performance, avoiding the need to merge multiple
versions of data during queries a [...]
-* **Merge-on-read**: Prior to version 1.2, Doris's Unique Key Model defaulted
to merge-on-read mode. In this mode, data is not merged upon writing but is
appended incrementally, retaining multiple versions within Doris. During
queries or Compaction, data is merged by the same Key version. Merge-on-read is
suitable for write-heavy and read-light scenarios, but during queries, multiple
versions must be merged, and predicates cannot be pushed down, which may affect
query speed.
+* **Merge-on-read**: Prior to version 1.2, Doris's Unique Key Table defaulted
to merge-on-read mode. In this mode, data is not merged upon writing but is
appended incrementally, retaining multiple versions within Doris. During
queries or Compaction, data is merged by the same Key version. Merge-on-read is
suitable for write-heavy and read-light scenarios, but during queries, multiple
versions must be merged, and predicates cannot be pushed down, which may affect
query speed.
-In Doris, there are two types of update semantics for the Unique Key Model:
+In Doris, there are two types of update semantics for the Unique Key Table:
-* **Full Row Upsert**: The default update semantic for the Unique Key Model is
**full row UPSERT**, i.e., UPDATE OR INSERT. If the Key of the row exists, it
will be updated; if it does not exist, new data will be inserted. In the full
row UPSERT semantic, even if the user inserts data into specific columns using
`INSERT INTO`, Doris will fill in the missing columns with NULL values or
default values during the planner stage.
+* **Full Row Upsert**: The default update semantic for the Unique Key Table is
**full row UPSERT**, i.e., UPDATE OR INSERT. If the Key of the row exists, it
will be updated; if it does not exist, new data will be inserted. In the full
row UPSERT semantic, even if the user inserts data into specific columns using
`INSERT INTO`, Doris will fill in the missing columns with NULL values or
default values during the planner stage.
* **Partial Column Upsert**: If users want to update specific fields, they
need to use the merge-on-write implementation and enable partial column updates
support via specific parameters. Please refer to the documentation on [Partial
Column Updates](../../data-operate/update/update-of-unique-model).
@@ -85,7 +85,7 @@ PROPERTIES (
In a Unique Key table, the Key columns serve both for sorting and
deduplication. New insertions overwrite existing records with matching keys.
-
+
As shown in the example, there were 4 rows of data in the original table.
After inserting 2 new rows, the newly inserted rows are updated based on the
unique key:
diff --git a/versioned_sidebars/version-2.1-sidebars.json b/versioned_sidebars/version-2.1-sidebars.json
index c0ecd087975..b61700d2b39 100644
--- a/versioned_sidebars/version-2.1-sidebars.json
+++ b/versioned_sidebars/version-2.1-sidebars.json
@@ -72,7 +72,7 @@
"table-design/overview",
{
"type": "category",
- "label": "Data Models",
+ "label": "Table Types",
"items": [
"table-design/data-model/overview",
"table-design/data-model/duplicate",
diff --git a/versioned_sidebars/version-3.0-sidebars.json b/versioned_sidebars/version-3.0-sidebars.json
index b1e97f357a9..b301c5a9e2d 100644
--- a/versioned_sidebars/version-3.0-sidebars.json
+++ b/versioned_sidebars/version-3.0-sidebars.json
@@ -97,7 +97,7 @@
"table-design/overview",
{
"type": "category",
- "label": "Data Models",
+ "label": "Table Types",
"items": [
"table-design/data-model/overview",
"table-design/data-model/duplicate",