This is an automated email from the ASF dual-hosted git repository. zhangchen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/doris-website.git
The following commit(s) were added to refs/heads/master by this push:
     new 1816a0eeaa [doc](data update)Address comment on update of agg model and translate en doc by LLM (#1721)

1816a0eeaa is described below

commit 1816a0eeaa066f8655ac80d5fb998193fd7e82af
Author: zhannngchen <zhangc...@selectdb.com>
AuthorDate: Tue Jan 14 14:08:07 2025 +0800

    [doc](data update)Address comment on update of agg model and translate en doc by LLM (#1721)

    ## Versions

    - [x] dev
    - [x] 3.0
    - [x] 2.1
    - [ ] 2.0

    ## Languages

    - [x] Chinese
    - [x] English

    ## Docs Checklist

    - [x] Checked by AI
    - [x] Test Cases Built
---
 .../update/update-of-aggregate-model.md | 50 +++++++++-------------
 .../update/update-of-aggregate-model.md | 10 +----
 .../update/update-of-aggregate-model.md | 10 +----
 .../update/update-of-aggregate-model.md | 10 +----
 .../update/update-of-aggregate-model.md | 50 +++++++++-------------
 .../update/update-of-aggregate-model.md | 50 +++++++++-------------
 6 files changed, 69 insertions(+), 111 deletions(-)

diff --git a/docs/data-operate/update/update-of-aggregate-model.md b/docs/data-operate/update/update-of-aggregate-model.md index 4b5ef90675..3fde759a85 100644 --- a/docs/data-operate/update/update-of-aggregate-model.md +++ b/docs/data-operate/update/update-of-aggregate-model.md @@ -1,7 +1,7 @@ ---- +- { - "title": "Updating Data on Aggregate Key Model", - "language": "en" + "title": "Updating Data on Aggregate Key Model", + "language": "en" } --- @@ -24,23 +24,21 @@ specific language governing permissions and limitations under the License. --> +This document primarily introduces how to update the Doris Aggregate model based on data load. +## Whole Row Update -This guide is about ingestion-based data updates for the Aggregate Key model in Doris.
- -## Update all columns - -When importing data into an Aggregate Key model in Doris by methods like Stream Load, Broker Load, Routine Load, and Insert Into, the new values are combined with the old values to produce new aggregated values based on the column's aggregation function. These values might be generated during insertion or produced asynchronously during compaction. However, when querying, users will always receive the same returned values. +When loading data into the Aggregate model table using Doris-supported methods such as Stream Load, Broker Load, Routine Load, Insert Into, etc., the new values will be aggregated with the old values according to the column's aggregation function to produce new aggregated values. This value may be produced at the time of insertion or during asynchronous compaction, but users will get the same return value when querying. -## Partial column update for Aggregate Key model +## Partial Column Update of Aggregate Model -Tables in the Aggregate Key model are primarily used in cases with pre-aggregation requirements rather than data updates, but Doris allows partial column updates for them, too. Simply set the aggregation function to `REPLACE_IF_NOT_NULL`. +The Aggregate table is mainly used in pre-aggregation scenarios rather than data update scenarios, but partial column updates can be achieved by setting the aggregation function to REPLACE_IF_NOT_NULL. -**Create table** +**Create Table** -For the columns that need to be updated, set the aggregation function to `REPLACE_IF_NOT_NULL`. +Set the aggregation function of the fields that need to be updated to `REPLACE_IF_NOT_NULL`. 
-```Plain +```sql CREATE TABLE order_tbl ( order_id int(11) NULL, order_amount int(11) REPLACE_IF_NOT_NULL NULL, @@ -52,38 +50,32 @@ DISTRIBUTED BY HASH(order_id) BUCKETS 1 PROPERTIES ( "replication_allocation" = "tag.location.default: 1" ); -+----------+--------------+-----------------+ -| order_id | order_amount | order_status | -+----------+--------------+-----------------+ -| 1 | 100 | Pending Payment | -+----------+--------------+-----------------+ -1 row in set (0.01 sec) ``` -**Ingest data** +**Data Insertion** -For Stream Load, Broker Load, Routine Load, or INSERT INTO, you can directly write the updates to the fields. +Whether it is Stream Load, Broker Load, Routine Load, or `INSERT INTO`, directly write the data of the fields to be updated. **Example** -Using the same example as above, the corresponding Stream Load command would be (no additional headers required): +Similar to the previous example, the corresponding Stream Load command is (no additional header required): ```shell $ cat update.csv 1,To be shipped -$ curl --location-trusted -u root: -H "column_separator:," -H "columns:order_id,order_status" -T /tmp/update.csv http://127.0.0.1:8030/api/db1/order_tbl/_stream_load +curl --location-trusted -u root: -H "column_separator:," -H "columns:order_id,order_status" -T /tmp/update.csv http://127.0.0.1:8030/api/db1/order_tbl/_stream_load ``` -The corresponding `INSERT INTO` statement would be (no additional session variables required): +The corresponding `INSERT INTO` statement is (no additional session variable settings required): -```Plain -INSERT INTO order_tbl (order_id, order_status) values (1,'Delivery Pending'); +```sql +INSERT INTO order_tbl (order_id, order_status) values (1,'Shipped'); ``` -## Note +## Notes on Partial Column Updates -The Aggregate Key model does not perform additional data processing during data writing, so the writing performance in this model is the same as other models. 
However, aggregation during queries can result in performance loss. Typical aggregation queries can be 5~10 times slower than queries on Merge-on-Write tables in the Unique Key model. +The Aggregate Key model does not perform any additional processing during the write process, so the write performance is not affected and is the same as normal data load. However, the cost of aggregation during query is relatively high, and the typical aggregation query performance is 5-10 times lower than the Merge-on-Write implementation of the Unique Key model. -Under this circumstance, users cannot set a field from non-NULL to NULL, because NULL values written will be automatically neglected by the REPLACE_IF_NOT_NULL aggregation function. +Since the `REPLACE_IF_NOT_NULL` aggregation function only takes effect when the value is not NULL, users cannot change a field value to NULL. diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/update/update-of-aggregate-model.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/update/update-of-aggregate-model.md index fe2fbe3f98..b1ad17ce72 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/update/update-of-aggregate-model.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/update/update-of-aggregate-model.md @@ -26,7 +26,7 @@ under the License. 
This document mainly describes load-based updates on the Doris Aggregate model. -## Update All Columns +## Whole Row Update When loading data into the Aggregate model (Agg model) with Doris-supported load methods such as Stream Load, Broker Load, Routine Load, and Insert Into, the new value is combined with the old aggregated value according to the column's aggregation function to produce a new aggregated value. This value may be produced at insertion time or during asynchronous compaction, but users get the same result when querying. @@ -50,12 +50,6 @@ DISTRIBUTED BY HASH(order_id) BUCKETS 1 PROPERTIES ( "replication_allocation" = "tag.location.default: 1" ); -+----------+--------------+--------------+ -| order_id | order_amount | order_status | -+----------+--------------+--------------+ -| 1 | 100 | Pending Payment | -+----------+--------------+--------------+ -1 row in set (0.01 sec) ``` **Data Writing** @@ -84,4 +78,4 @@ INSERT INTO order_tbl (order_id, order_status) values (1,'To be shipped'); The Aggregate Key model performs no extra processing during writes, so write performance is unaffected and is the same as a normal data load. However, aggregation at query time is costly: typical aggregation query performance is 5-10 times lower than with the Merge-on-Write implementation of the Unique Key model. -Users cannot set a field from non-NULL to NULL; NULL values that are written are automatically ignored by the `REPLACE_IF_NOT_NULL` aggregation function. +Because the `REPLACE_IF_NOT_NULL` aggregation function takes effect only for non-NULL values, users cannot change a field's value to NULL. diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/data-operate/update/update-of-aggregate-model.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/data-operate/update/update-of-aggregate-model.md index fe2fbe3f98..b1ad17ce72 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/data-operate/update/update-of-aggregate-model.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/data-operate/update/update-of-aggregate-model.md @@ -26,7 +26,7 @@ under the License.
This document mainly describes load-based updates on the Doris Aggregate model. -## Update All Columns +## Whole Row Update When loading data into the Aggregate model (Agg model) with Doris-supported load methods such as Stream Load, Broker Load, Routine Load, and Insert Into, the new value is combined with the old aggregated value according to the column's aggregation function to produce a new aggregated value. This value may be produced at insertion time or during asynchronous compaction, but users get the same result when querying. @@ -50,12 +50,6 @@ DISTRIBUTED BY HASH(order_id) BUCKETS 1 PROPERTIES ( "replication_allocation" = "tag.location.default: 1" ); -+----------+--------------+--------------+ -| order_id | order_amount | order_status | -+----------+--------------+--------------+ -| 1 | 100 | Pending Payment | -+----------+--------------+--------------+ -1 row in set (0.01 sec) ``` **Data Writing** @@ -84,4 +78,4 @@ INSERT INTO order_tbl (order_id, order_status) values (1,'To be shipped'); The Aggregate Key model performs no extra processing during writes, so write performance is unaffected and is the same as a normal data load. However, aggregation at query time is costly: typical aggregation query performance is 5-10 times lower than with the Merge-on-Write implementation of the Unique Key model. -Users cannot set a field from non-NULL to NULL; NULL values that are written are automatically ignored by the `REPLACE_IF_NOT_NULL` aggregation function. +Because the `REPLACE_IF_NOT_NULL` aggregation function takes effect only for non-NULL values, users cannot change a field's value to NULL. diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/data-operate/update/update-of-aggregate-model.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/data-operate/update/update-of-aggregate-model.md index fe2fbe3f98..b1ad17ce72 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/data-operate/update/update-of-aggregate-model.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/data-operate/update/update-of-aggregate-model.md @@ -26,7 +26,7 @@ under the License.
This document mainly describes load-based updates on the Doris Aggregate model. -## Update All Columns +## Whole Row Update When loading data into the Aggregate model (Agg model) with Doris-supported load methods such as Stream Load, Broker Load, Routine Load, and Insert Into, the new value is combined with the old aggregated value according to the column's aggregation function to produce a new aggregated value. This value may be produced at insertion time or during asynchronous compaction, but users get the same result when querying. @@ -50,12 +50,6 @@ DISTRIBUTED BY HASH(order_id) BUCKETS 1 PROPERTIES ( "replication_allocation" = "tag.location.default: 1" ); -+----------+--------------+--------------+ -| order_id | order_amount | order_status | -+----------+--------------+--------------+ -| 1 | 100 | Pending Payment | -+----------+--------------+--------------+ -1 row in set (0.01 sec) ``` **Data Writing** @@ -84,4 +78,4 @@ INSERT INTO order_tbl (order_id, order_status) values (1,'To be shipped'); The Aggregate Key model performs no extra processing during writes, so write performance is unaffected and is the same as a normal data load. However, aggregation at query time is costly: typical aggregation query performance is 5-10 times lower than with the Merge-on-Write implementation of the Unique Key model. -Users cannot set a field from non-NULL to NULL; NULL values that are written are automatically ignored by the `REPLACE_IF_NOT_NULL` aggregation function. +Because the `REPLACE_IF_NOT_NULL` aggregation function takes effect only for non-NULL values, users cannot change a field's value to NULL. diff --git a/versioned_docs/version-2.1/data-operate/update/update-of-aggregate-model.md b/versioned_docs/version-2.1/data-operate/update/update-of-aggregate-model.md index 1a7dedcad7..3fde759a85 100644 --- a/versioned_docs/version-2.1/data-operate/update/update-of-aggregate-model.md +++ b/versioned_docs/version-2.1/data-operate/update/update-of-aggregate-model.md @@ -1,7 +1,7 @@ ---- +- { - "title": "Updating Data on Aggregate Key Model", - "language": "en" + "title": "Updating Data on Aggregate Key Model", - "language": "en" } --- @@ -24,23 +24,21 @@ specific language governing permissions and limitations under the License. --> -# Update for Aggregate Load - -This guide is about ingestion-based data updates for the Aggregate Key model in Doris. +This document primarily introduces how to update the Doris Aggregate model based on data load.
-## Update all columns +## Whole Row Update -When importing data into an Aggregate Key model in Doris by methods like Stream Load, Broker Load, Routine Load, and Insert Into, the new values are combined with the old values to produce new aggregated values based on the column's aggregation function. These values might be generated during insertion or produced asynchronously during compaction. However, when querying, users will always receive the same returned values. +When loading data into the Aggregate model table using Doris-supported methods such as Stream Load, Broker Load, Routine Load, Insert Into, etc., the new values will be aggregated with the old values according to the column's aggregation function to produce new aggregated values. This value may be produced at the time of insertion or during asynchronous compaction, but users will get the same return value when querying. -## Partial column update for Aggregate Key model +## Partial Column Update of Aggregate Model -Tables in the Aggregate Key model are primarily used in cases with pre-aggregation requirements rather than data updates, but Doris allows partial column updates for them, too. Simply set the aggregation function to `REPLACE_IF_NOT_NULL`. +The Aggregate table is mainly used in pre-aggregation scenarios rather than data update scenarios, but partial column updates can be achieved by setting the aggregation function to REPLACE_IF_NOT_NULL. -**Create table** +**Create Table** -For the columns that need to be updated, set the aggregation function to `REPLACE_IF_NOT_NULL`. +Set the aggregation function of the fields that need to be updated to `REPLACE_IF_NOT_NULL`. 
-```Plain +```sql CREATE TABLE order_tbl ( order_id int(11) NULL, order_amount int(11) REPLACE_IF_NOT_NULL NULL, @@ -52,38 +50,32 @@ DISTRIBUTED BY HASH(order_id) BUCKETS 1 PROPERTIES ( "replication_allocation" = "tag.location.default: 1" ); -+----------+--------------+-----------------+ -| order_id | order_amount | order_status | -+----------+--------------+-----------------+ -| 1 | 100 | Pending Payment | -+----------+--------------+-----------------+ -1 row in set (0.01 sec) ``` -**Ingest data** +**Data Insertion** -For Stream Load, Broker Load, Routine Load, or INSERT INTO, you can directly write the updates to the fields. +Whether it is Stream Load, Broker Load, Routine Load, or `INSERT INTO`, directly write the data of the fields to be updated. **Example** -Using the same example as above, the corresponding Stream Load command would be (no additional headers required): +Similar to the previous example, the corresponding Stream Load command is (no additional header required): ```shell $ cat update.csv 1,To be shipped -$ curl --location-trusted -u root: -H "column_separator:," -H "columns:order_id,order_status" -T /tmp/update.csv http://127.0.0.1:8030/api/db1/order_tbl/_stream_load +curl --location-trusted -u root: -H "column_separator:," -H "columns:order_id,order_status" -T /tmp/update.csv http://127.0.0.1:8030/api/db1/order_tbl/_stream_load ``` -The corresponding `INSERT INTO` statement would be (no additional session variables required): +The corresponding `INSERT INTO` statement is (no additional session variable settings required): -```Plain -INSERT INTO order_tbl (order_id, order_status) values (1,'Delivery Pending'); +```sql +INSERT INTO order_tbl (order_id, order_status) values (1,'Shipped'); ``` -## Note +## Notes on Partial Column Updates -The Aggregate Key model does not perform additional data processing during data writing, so the writing performance in this model is the same as other models. 
However, aggregation during queries can result in performance loss. Typical aggregation queries can be 5~10 times slower than queries on Merge-on-Write tables in the Unique Key model. +The Aggregate Key model does not perform any additional processing during the write process, so the write performance is not affected and is the same as normal data load. However, the cost of aggregation during query is relatively high, and the typical aggregation query performance is 5-10 times lower than the Merge-on-Write implementation of the Unique Key model. -Under this circumstance, users cannot set a field from non-NULL to NULL, because NULL values written will be automatically neglected by the REPLACE_IF_NOT_NULL aggregation function. +Since the `REPLACE_IF_NOT_NULL` aggregation function only takes effect when the value is not NULL, users cannot change a field value to NULL. diff --git a/versioned_docs/version-3.0/data-operate/update/update-of-aggregate-model.md b/versioned_docs/version-3.0/data-operate/update/update-of-aggregate-model.md index 4b5ef90675..3fde759a85 100644 --- a/versioned_docs/version-3.0/data-operate/update/update-of-aggregate-model.md +++ b/versioned_docs/version-3.0/data-operate/update/update-of-aggregate-model.md @@ -1,7 +1,7 @@ ---- +- { - "title": "Updating Data on Aggregate Key Model", - "language": "en" + "title": "Updating Data on Aggregate Key Model", + "language": "en" } --- @@ -24,23 +24,21 @@ specific language governing permissions and limitations under the License. --> +This document primarily introduces how to update the Doris Aggregate model based on data load. +## Whole Row Update -This guide is about ingestion-based data updates for the Aggregate Key model in Doris. 
- -## Update all columns - -When importing data into an Aggregate Key model in Doris by methods like Stream Load, Broker Load, Routine Load, and Insert Into, the new values are combined with the old values to produce new aggregated values based on the column's aggregation function. These values might be generated during insertion or produced asynchronously during compaction. However, when querying, users will always receive the same returned values. +When loading data into the Aggregate model table using Doris-supported methods such as Stream Load, Broker Load, Routine Load, Insert Into, etc., the new values will be aggregated with the old values according to the column's aggregation function to produce new aggregated values. This value may be produced at the time of insertion or during asynchronous compaction, but users will get the same return value when querying. -## Partial column update for Aggregate Key model +## Partial Column Update of Aggregate Model -Tables in the Aggregate Key model are primarily used in cases with pre-aggregation requirements rather than data updates, but Doris allows partial column updates for them, too. Simply set the aggregation function to `REPLACE_IF_NOT_NULL`. +The Aggregate table is mainly used in pre-aggregation scenarios rather than data update scenarios, but partial column updates can be achieved by setting the aggregation function to REPLACE_IF_NOT_NULL. -**Create table** +**Create Table** -For the columns that need to be updated, set the aggregation function to `REPLACE_IF_NOT_NULL`. +Set the aggregation function of the fields that need to be updated to `REPLACE_IF_NOT_NULL`. 
-```Plain +```sql CREATE TABLE order_tbl ( order_id int(11) NULL, order_amount int(11) REPLACE_IF_NOT_NULL NULL, @@ -52,38 +50,32 @@ DISTRIBUTED BY HASH(order_id) BUCKETS 1 PROPERTIES ( "replication_allocation" = "tag.location.default: 1" ); -+----------+--------------+-----------------+ -| order_id | order_amount | order_status | -+----------+--------------+-----------------+ -| 1 | 100 | Pending Payment | -+----------+--------------+-----------------+ -1 row in set (0.01 sec) ``` -**Ingest data** +**Data Insertion** -For Stream Load, Broker Load, Routine Load, or INSERT INTO, you can directly write the updates to the fields. +Whether it is Stream Load, Broker Load, Routine Load, or `INSERT INTO`, directly write the data of the fields to be updated. **Example** -Using the same example as above, the corresponding Stream Load command would be (no additional headers required): +Similar to the previous example, the corresponding Stream Load command is (no additional header required): ```shell $ cat update.csv 1,To be shipped -$ curl --location-trusted -u root: -H "column_separator:," -H "columns:order_id,order_status" -T /tmp/update.csv http://127.0.0.1:8030/api/db1/order_tbl/_stream_load +curl --location-trusted -u root: -H "column_separator:," -H "columns:order_id,order_status" -T /tmp/update.csv http://127.0.0.1:8030/api/db1/order_tbl/_stream_load ``` -The corresponding `INSERT INTO` statement would be (no additional session variables required): +The corresponding `INSERT INTO` statement is (no additional session variable settings required): -```Plain -INSERT INTO order_tbl (order_id, order_status) values (1,'Delivery Pending'); +```sql +INSERT INTO order_tbl (order_id, order_status) values (1,'Shipped'); ``` -## Note +## Notes on Partial Column Updates -The Aggregate Key model does not perform additional data processing during data writing, so the writing performance in this model is the same as other models. 
However, aggregation during queries can result in performance loss. Typical aggregation queries can be 5~10 times slower than queries on Merge-on-Write tables in the Unique Key model. +The Aggregate Key model does not perform any additional processing during the write process, so the write performance is not affected and is the same as normal data load. However, the cost of aggregation during query is relatively high, and the typical aggregation query performance is 5-10 times lower than the Merge-on-Write implementation of the Unique Key model. -Under this circumstance, users cannot set a field from non-NULL to NULL, because NULL values written will be automatically neglected by the REPLACE_IF_NOT_NULL aggregation function. +Since the `REPLACE_IF_NOT_NULL` aggregation function only takes effect when the value is not NULL, users cannot change a field value to NULL.