This is an automated email from the ASF dual-hosted git repository.

dataroaring pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris-website.git


The following commit(s) were added to refs/heads/master by this push:
     new ab3f192176b docs: add flexible partial column update support for Routine Load (#3296)
ab3f192176b is described below

commit ab3f192176b13fd42dddc867362800aecb7c56a4
Author: Yongqiang YANG <[email protected]>
AuthorDate: Fri Jan 23 13:07:17 2026 -0800

    docs: add flexible partial column update support for Routine Load (#3296)
    
    ## Summary
    
    This PR adds documentation for the new `unique_key_update_mode` property
    in Routine Load that enables flexible partial column updates,
    corresponding to
    [apache/doris#59896](https://github.com/apache/doris/pull/59896).
    
    ### Changes
    
    - **partial-column-update.md**:
      - Updated caution note to include Routine Load as a supported method
      - Added complete Routine Load section with CREATE and ALTER syntax examples
      - Added caution block for Routine Load-specific limitations
    
    - **routine-load-manual.md**:
      - Added `unique_key_update_mode` property to the job_properties table with all three modes documented
      - Fixed `partial_columns` link to point to correct documentation
      - Added complete "Flexible Partial Column Update" example with:
        - Sample JSON data with different columns per row
        - Table creation with required properties
        - Routine Load job creation
        - Expected results
    
    ### Feature Description
    
    The `unique_key_update_mode` property supports three modes:
    
    | Mode | Description |
    |------|-------------|
    | `UPSERT` | Default mode. Full row insert/update |
    | `UPDATE_FIXED_COLUMNS` | Partial update where all rows update the same columns (equivalent to `partial_columns=true`) |
    | `UPDATE_FLEXIBLE_COLUMNS` | **New** - Each row can update different columns |
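
    As a minimal sketch (mirroring the example this PR adds to partial-column-update.md; the database, table, broker, and topic names are placeholders), the mode is selected per Routine Load job:

    ```sql
    -- Hypothetical job; tbl1 must be a Unique Key Merge-on-Write table.
    CREATE ROUTINE LOAD db1.job1 ON tbl1
    PROPERTIES (
        "format" = "json",                                     -- flexible mode requires JSON input
        "unique_key_update_mode" = "UPDATE_FLEXIBLE_COLUMNS"   -- new property documented in this PR
    )
    FROM KAFKA (
        "kafka_broker_list" = "localhost:9092",
        "kafka_topic" = "my_topic",
        "property.kafka_default_offsets" = "OFFSET_BEGINNING"
    );
    ```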
    
    ### Limitations for UPDATE_FLEXIBLE_COLUMNS
    
    - JSON format only
    - Cannot use `jsonpaths`, `fuzzy_parse`, `COLUMNS` clause, or `WHERE` clause
    - Table must have `enable_unique_key_skip_bitmap_column=true`
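
    The last requirement is a table property; a trimmed sketch of the example table added in routine-load-manual.md (the column list here is illustrative):

    ```sql
    CREATE TABLE demo.routine_test_flexible (
        id      INT           NOT NULL COMMENT "id",
        name    VARCHAR(30)            COMMENT "name",
        balance DECIMAL(10,2)          COMMENT "balance"
    )
    UNIQUE KEY(`id`)
    DISTRIBUTED BY HASH(`id`) BUCKETS 1
    PROPERTIES (
        "replication_num" = "1",
        "enable_unique_key_merge_on_write" = "true",       -- partial updates require Merge-on-Write
        "enable_unique_key_skip_bitmap_column" = "true"    -- required by UPDATE_FLEXIBLE_COLUMNS
    );
    ```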
---
 .../import/import-way/routine-load-manual.md       | 80 ++++++++++++++++++++-
 docs/data-operate/update/partial-column-update.md  | 44 +++++++++++-
 .../import/import-way/routine-load-manual.md       | 82 +++++++++++++++++++++-
 .../data-operate/update/partial-column-update.md   | 48 ++++++++++++-
 4 files changed, 247 insertions(+), 7 deletions(-)

diff --git a/docs/data-operate/import/import-way/routine-load-manual.md 
b/docs/data-operate/import/import-way/routine-load-manual.md
index 5dd95494d2d..25774f2af53 100644
--- a/docs/data-operate/import/import-way/routine-load-manual.md
+++ b/docs/data-operate/import/import-way/routine-load-manual.md
@@ -406,7 +406,8 @@ Here are the available parameters for the job_properties 
clause:
 | strip_outer_array         | When importing JSON format data, if 
strip_outer_array is true, it indicates that the JSON data is presented as an 
array, and each element in the data will be treated as a row. Default value is 
false. Typically, JSON data in Kafka might be represented as an array with 
square brackets `[]` in the outermost layer. In this case, you can specify 
`"strip_outer_array" = "true"` to consume Topic data in array mode. For 
example, the following data will be parsed into [...]
 | send_batch_parallelism    | Used to set the parallelism of sending batch 
data. If the parallelism value exceeds the `max_send_batch_parallelism_per_job` 
in BE configuration, the coordinating BE will use the value of 
`max_send_batch_parallelism_per_job`. |
 | load_to_single_tablet     | Supports importing data to only one tablet in 
the corresponding partition per task. Default value is false. This parameter 
can only be set when importing data to OLAP tables with random bucketing. |
-| partial_columns           | Specifies whether to enable partial column 
update feature. Default value is false. This parameter can only be set when the 
table model is Unique and uses Merge on Write. Multi-table streaming does not 
support this parameter. For details, refer to [Partial Column 
Update](../../../data-operate/update/update-of-unique-model) |
+| partial_columns           | Specifies whether to enable partial column 
update feature. Default value is false. This parameter can only be set when the 
table model is Unique and uses Merge on Write. Multi-table streaming does not 
support this parameter. For details, refer to [Partial Column 
Update](../../../data-operate/update/partial-column-update) |
+| unique_key_update_mode    | Specifies the update mode for Unique Key tables. 
Available values are:<ul><li>`UPSERT` (default): Standard insert or update 
operation for the entire row.</li><li>`UPDATE_FIXED_COLUMNS`: Partial column 
update where all rows update the same columns. Equivalent to 
`partial_columns=true`.</li><li>`UPDATE_FLEXIBLE_COLUMNS`: Flexible partial 
column update where each row can update different columns. Requires JSON format 
and the table must have `enable_unique_key_s [...]
 | partial_update_new_key_behavior | When performing partial column updates on 
Unique Merge on Write table, this parameter controls how new rows are handled. 
There are two types: `APPEND` and `ERROR`.<br/>- `APPEND`: Allows inserting new 
row data<br/>- `ERROR`: Fails and reports an error when inserting new rows |
 | max_filter_ratio          | The maximum allowed filter ratio within the 
sampling window. Must be between 0 and 1 inclusive. Default value is 1.0, 
indicating any error rows can be tolerated. The sampling window is 
`max_batch_rows * 10`. If the ratio of error rows to total rows within the 
sampling window exceeds `max_filter_ratio`, the routine job will be suspended 
and require manual intervention to check data quality issues. Rows filtered by 
WHERE conditions are not counted as error rows. |
 | enclose                   | Specifies the enclosing character. When CSV data 
fields contain line or column separators, a single-byte character can be 
specified as an enclosing character for protection to prevent accidental 
truncation. For example, if the column separator is "," and the enclosing 
character is "'", the data "a,'b,c'" will have "b,c" parsed as one field. |
@@ -1362,6 +1363,83 @@ The columns in the result set provide the following 
information:
     3 rows in set (0.01 sec)
     ```
 
+**Flexible Partial Column Update**
+
+This example demonstrates how to use flexible partial column updates where 
each row can update different columns. This is useful for CDC scenarios where 
change records may contain different fields.
+
+1. Load sample data (each JSON record updates different columns):
+
+    ```json
+    {"id": 1, "balance": 150.00, "last_active": "2024-01-15 10:30:00"}
+    {"id": 2, "city": "Shanghai", "age": 28}
+    {"id": 3, "name": "Alice", "balance": 500.00, "city": "Beijing"}
+    {"id": 1, "age": 30}
+    {"id": 4, "__DORIS_DELETE_SIGN__": 1}
+    ```
+
+2. Create table (must enable Merge-on-Write and skip bitmap column):
+
+    ```sql
+    CREATE TABLE demo.routine_test_flexible (
+        id           INT            NOT NULL  COMMENT "id",
+        name         VARCHAR(30)              COMMENT "name",
+        age          INT                      COMMENT "age",
+        city         VARCHAR(50)              COMMENT "city",
+        balance      DECIMAL(10,2)            COMMENT "balance",
+        last_active  DATETIME                 COMMENT "last active time"
+    )
+    UNIQUE KEY(`id`)
+    DISTRIBUTED BY HASH(`id`) BUCKETS 1
+    PROPERTIES (
+        "replication_num" = "1",
+        "enable_unique_key_merge_on_write" = "true",
+        "enable_unique_key_skip_bitmap_column" = "true"
+    );
+    ```
+
+3. Insert initial data:
+
+    ```sql
+    INSERT INTO demo.routine_test_flexible VALUES
+    (1, 'John', 25, 'Shenzhen', 100.00, '2024-01-01 08:00:00'),
+    (2, 'Jane', 30, 'Guangzhou', 200.00, '2024-01-02 09:00:00'),
+    (3, 'Bob', 35, 'Hangzhou', 300.00, '2024-01-03 10:00:00'),
+    (4, 'Tom', 40, 'Nanjing', 400.00, '2024-01-04 11:00:00');
+    ```
+
+4. Load command:
+
+    ```sql
+    CREATE ROUTINE LOAD demo.kafka_job_flexible ON routine_test_flexible
+            PROPERTIES
+            (
+                "format" = "json",
+                "unique_key_update_mode" = "UPDATE_FLEXIBLE_COLUMNS"
+            )
+            FROM KAFKA
+            (
+                "kafka_broker_list" = "10.16.10.6:9092",
+                "kafka_topic" = "routineLoadFlexible",
+                "property.kafka_default_offsets" = "OFFSET_BEGINNING"
+            );
+    ```
+
+5. Load result:
+
+    ```sql
+    mysql> SELECT * FROM demo.routine_test_flexible ORDER BY id;
+    +------+-------+------+-----------+---------+---------------------+
+    | id   | name  | age  | city      | balance | last_active         |
+    +------+-------+------+-----------+---------+---------------------+
+    |    1 | John  |   30 | Shenzhen  |  150.00 | 2024-01-15 10:30:00 |
+    |    2 | Jane  |   28 | Shanghai  |  200.00 | 2024-01-02 09:00:00 |
+    |    3 | Alice |   35 | Beijing   |  500.00 | 2024-01-03 10:00:00 |
+    +------+-------+------+-----------+---------+---------------------+
+    3 rows in set (0.01 sec)
+    ```
+
+    Note: Row with `id=4` was deleted due to `__DORIS_DELETE_SIGN__`, and each 
row was updated with only the columns present in its corresponding JSON record.
+
 ### Loading Complex Data Types
 
 **Load Array Data Type**
diff --git a/docs/data-operate/update/partial-column-update.md 
b/docs/data-operate/update/partial-column-update.md
index 42daf124af4..a3435dd4e84 100644
--- a/docs/data-operate/update/partial-column-update.md
+++ b/docs/data-operate/update/partial-column-update.md
@@ -122,7 +122,7 @@ Previously, Doris's partial update feature required that 
every row in an import
 
 :::caution Note:
 
-1. Currently, only the Stream Load import method and tools using Stream Load 
(e.g. Doris-Flink-Connector) support this feature.
+1. Flexible column updates are supported by Stream Load, Routine Load, and 
tools using Stream Load (e.g. Doris-Flink-Connector).
 2. The import file must be in JSON format when using flexible column updates.
 :::
 
@@ -161,6 +161,48 @@ If using the Flink Doris Connector, add the following 
configuration:
 'sink.properties.unique_key_update_mode' = 'UPDATE_FLEXIBLE_COLUMNS'
 ```
 
+**Routine Load**
+
+When using Routine Load, add the following property in the `PROPERTIES` clause:
+
+```sql
+CREATE ROUTINE LOAD db1.job1 ON tbl1
+PROPERTIES (
+    "format" = "json",
+    "unique_key_update_mode" = "UPDATE_FLEXIBLE_COLUMNS"
+)
+FROM KAFKA (
+    "kafka_broker_list" = "localhost:9092",
+    "kafka_topic" = "my_topic",
+    "property.kafka_default_offsets" = "OFFSET_BEGINNING"
+);
+```
+
+You can also modify the update mode of an existing Routine Load job using 
`ALTER ROUTINE LOAD`:
+
+```sql
+-- Pause the job first
+PAUSE ROUTINE LOAD FOR db1.job1;
+
+-- Alter the update mode
+ALTER ROUTINE LOAD FOR db1.job1
+PROPERTIES (
+    "unique_key_update_mode" = "UPDATE_FLEXIBLE_COLUMNS"
+);
+
+-- Resume the job
+RESUME ROUTINE LOAD FOR db1.job1;
+```
+
+:::caution Routine Load Limitations
+When using `UPDATE_FLEXIBLE_COLUMNS` mode with Routine Load, the following 
restrictions apply:
+- The data format must be JSON (`"format" = "json"`)
+- The `jsonpaths` property cannot be specified
+- The `fuzzy_parse` option cannot be enabled
+- The `COLUMNS` clause cannot be used
+- The `WHERE` clause cannot be used
+:::
+
 #### Example
 
 Assuming the following table:
diff --git 
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/import-way/routine-load-manual.md
 
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/import-way/routine-load-manual.md
index 98d2e385c2d..9bab1fe4080 100644
--- 
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/import-way/routine-load-manual.md
+++ 
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/import-way/routine-load-manual.md
@@ -417,8 +417,9 @@ job_properties 子句具体参数选项如下:
 | strip_outer_array         | 当导入数据格式为 json 时,strip_outer_array 为 true 表示 JSON 
数据以数组的形式展现,数据中的每一个元素将被视为一行数据。默认值是 false。通常情况下,Kafka 中的 JSON 
数据可能以数组形式表示,即在最外层中包含中括号`[]`,此时,可以指定 `"strip_outer_array" = "true"`,以数组模式消费 
Topic 
中的数据。如以下数据会被解析成两行:`[{"user_id":1,"name":"Emily","age":25},{"user_id":2,"name":"Benjamin","age":35}]`
 |
 | send_batch_parallelism    | 用于设置发送批量数据的并行度。如果并行度的值超过 BE 配置中的 
`max_send_batch_parallelism_per_job`,那么作为协调点的 BE 将使用 
`max_send_batch_parallelism_per_job` 的值。 |
 | load_to_single_tablet     | 支持一个任务只导入数据到对应分区的一个 tablet,默认值为 false,该参数只允许在对带有 
random 分桶的 olap 表导数的时候设置。 |
-| partial_columns           | 指定是否开启部分列更新功能。默认值为 false。该参数只允许在表模型为 Unique 且采用 
Merge on Write 
时设置。一流多表不支持此参数。具体参考文档[列更新](../../../data-operate/update/partial-column-update.md)
 |
-| partial_update_new_key_behavior | 在 Unique Merge on Write 
表上进行部分列更新时,对新插入行的处理方式。有两种类型 `APPEND`, `ERROR`。<br/>-`APPEND`: 
允许插入新行数据<br/>-`ERROR`: 插入新行时倒入失败并报错 |
+| partial_columns           | 指定是否开启部分列更新功能。默认值为 false。该参数只允许在表模型为 Unique 且采用 
Merge on Write 
时设置。一流多表不支持此参数。具体参考文档[部分列更新](../../../data-operate/update/partial-column-update.md)
 |
+| unique_key_update_mode    | 指定 Unique Key 
表的更新模式。可选值:<ul><li>`UPSERT`(默认):标准的整行插入或更新操作。</li><li>`UPDATE_FIXED_COLUMNS`:部分列更新,所有行更新相同的列。等同于
 
`partial_columns=true`。</li><li>`UPDATE_FLEXIBLE_COLUMNS`:灵活部分列更新,每行可以更新不同的列。需要 
JSON 格式且表必须设置 `enable_unique_key_skip_bitmap_column=true`。不能与 
`jsonpaths`、`fuzzy_parse`、`COLUMNS` 子句或 `WHERE` 
子句一起使用。</li></ul>详情参考[部分列更新](../../../data-operate/update/partial-column-update#灵活部分列更新)
 |
+| partial_update_new_key_behavior | 在 Unique Merge on Write 
表上进行部分列更新时,对新插入行的处理方式。有两种类型 `APPEND`、`ERROR`。<br/>- `APPEND`:允许插入新行数据<br/>- 
`ERROR`:插入新行时导入失败并报错 |
 | max_filter_ratio          | 采样窗口内,允许的最大过滤率。必须在大于等于 0 到小于等于 1 之间。默认值是 
1.0,表示可以容忍任何错误行。采样窗口为 `max_batch_rows * 10`。即如果在采样窗口内,错误行数/总行数大于 
`max_filter_ratio`,则会导致例行作业被暂停,需要人工介入检查数据质量问题。被 where 条件过滤掉的行不算错误行。 |
 | enclose                   | 指定包围符。当 CSV 
数据字段中含有行分隔符或列分隔符时,为防止意外截断,可指定单字节字符作为包围符起到保护作用。例如列分隔符为 ",",包围符为 "'",数据为 
"a,'b,c'",则 "b,c" 会被解析为一个字段。 |
 | escape                    | 指定转义符。用于转义在字段中出现的与包围符相同的字符。例如数据为 "a,'b,'c'",包围符为 
"'",希望 "b,'c 被作为一个字段解析,则需要指定单字节转义符,例如"\",将数据修改为 "a,'b,\'c'"。 |
@@ -1370,6 +1371,83 @@ mysql> SELECT * FROM routine_test08;
     3 rows in set (0.01 sec)
     ```
 
+**灵活部分列更新**
+
+本示例演示如何使用灵活部分列更新,其中每行可以更新不同的列。这在 CDC 场景中非常有用,因为变更记录可能包含不同的字段。
+
+1. 导入数据样例(每条 JSON 记录更新不同的列):
+
+    ```json
+    {"id": 1, "balance": 150.00, "last_active": "2024-01-15 10:30:00"}
+    {"id": 2, "city": "Shanghai", "age": 28}
+    {"id": 3, "name": "Alice", "balance": 500.00, "city": "Beijing"}
+    {"id": 1, "age": 30}
+    {"id": 4, "__DORIS_DELETE_SIGN__": 1}
+    ```
+
+2. 建表(必须启用 Merge-on-Write 和 skip bitmap 列):
+
+    ```sql
+    CREATE TABLE demo.routine_test_flexible (
+        id           INT            NOT NULL  COMMENT "id",
+        name         VARCHAR(30)              COMMENT "姓名",
+        age          INT                      COMMENT "年龄",
+        city         VARCHAR(50)              COMMENT "城市",
+        balance      DECIMAL(10,2)            COMMENT "余额",
+        last_active  DATETIME                 COMMENT "最后活跃时间"
+    )
+    UNIQUE KEY(`id`)
+    DISTRIBUTED BY HASH(`id`) BUCKETS 1
+    PROPERTIES (
+        "replication_num" = "1",
+        "enable_unique_key_merge_on_write" = "true",
+        "enable_unique_key_skip_bitmap_column" = "true"
+    );
+    ```
+
+3. 插入初始数据:
+
+    ```sql
+    INSERT INTO demo.routine_test_flexible VALUES
+    (1, 'John', 25, 'Shenzhen', 100.00, '2024-01-01 08:00:00'),
+    (2, 'Jane', 30, 'Guangzhou', 200.00, '2024-01-02 09:00:00'),
+    (3, 'Bob', 35, 'Hangzhou', 300.00, '2024-01-03 10:00:00'),
+    (4, 'Tom', 40, 'Nanjing', 400.00, '2024-01-04 11:00:00');
+    ```
+
+4. 导入命令:
+
+    ```sql
+    CREATE ROUTINE LOAD demo.kafka_job_flexible ON routine_test_flexible
+            PROPERTIES
+            (
+                "format" = "json",
+                "unique_key_update_mode" = "UPDATE_FLEXIBLE_COLUMNS"
+            )
+            FROM KAFKA
+            (
+                "kafka_broker_list" = "10.16.10.6:9092",
+                "kafka_topic" = "routineLoadFlexible",
+                "property.kafka_default_offsets" = "OFFSET_BEGINNING"
+            );
+    ```
+
+5. 导入结果:
+
+    ```sql
+    mysql> SELECT * FROM demo.routine_test_flexible ORDER BY id;
+    +------+-------+------+-----------+---------+---------------------+
+    | id   | name  | age  | city      | balance | last_active         |
+    +------+-------+------+-----------+---------+---------------------+
+    |    1 | John  |   30 | Shenzhen  |  150.00 | 2024-01-15 10:30:00 |
+    |    2 | Jane  |   28 | Shanghai  |  200.00 | 2024-01-02 09:00:00 |
+    |    3 | Alice |   35 | Beijing   |  500.00 | 2024-01-03 10:00:00 |
+    +------+-------+------+-----------+---------+---------------------+
+    3 rows in set (0.01 sec)
+    ```
+
+    注意:`id=4` 的行因为 `__DORIS_DELETE_SIGN__` 被删除,每行只更新了其对应 JSON 记录中包含的列。
+
 ### 导入复杂类型
 
 **导入 Array 数据类型**
diff --git 
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/update/partial-column-update.md
 
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/update/partial-column-update.md
index 4ccc8f9ac9d..14d8afcabba 100644
--- 
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/update/partial-column-update.md
+++ 
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/update/partial-column-update.md
@@ -124,8 +124,8 @@ INSERT INTO order_tbl (order_id, order_status) VALUES (1, 
'待发货');
 
 :::caution 注意:
 
-1. 目前只有 stream load 这一种导入方式以及使用 stream load 作为其导入方式的工具 (如 
doris-flink-connector) 支持灵活列更新功能
-2. 在使用灵活列更新时导入文件必须为 json 格式的数据
+1. 灵活列更新功能支持 Stream Load、Routine Load 以及使用 Stream Load 作为其导入方式的工具(如 
Doris-Flink-Connector)
+2. 在使用灵活列更新时导入文件必须为 JSON 格式的数据
 :::
 
 #### 适用场景
@@ -157,12 +157,54 @@ unique_key_update_mode:UPDATE_FLEXIBLE_COLUMNS
 
 **Flink Doris Connector**
 
-如果使用 Flink Doris Connector, 需要添加如下配置:
+如果使用 Flink Doris Connector,需要添加如下配置:
 
 ```Plain
 'sink.properties.unique_key_update_mode' = 'UPDATE_FLEXIBLE_COLUMNS'
 ```
 
+**Routine Load**
+
+在使用 Routine Load 导入时,在 `PROPERTIES` 子句中添加如下属性:
+
+```sql
+CREATE ROUTINE LOAD db1.job1 ON tbl1
+PROPERTIES (
+    "format" = "json",
+    "unique_key_update_mode" = "UPDATE_FLEXIBLE_COLUMNS"
+)
+FROM KAFKA (
+    "kafka_broker_list" = "localhost:9092",
+    "kafka_topic" = "my_topic",
+    "property.kafka_default_offsets" = "OFFSET_BEGINNING"
+);
+```
+
+也可以使用 `ALTER ROUTINE LOAD` 来修改现有 Routine Load 作业的更新模式:
+
+```sql
+-- 首先暂停作业
+PAUSE ROUTINE LOAD FOR db1.job1;
+
+-- 修改更新模式
+ALTER ROUTINE LOAD FOR db1.job1
+PROPERTIES (
+    "unique_key_update_mode" = "UPDATE_FLEXIBLE_COLUMNS"
+);
+
+-- 恢复作业
+RESUME ROUTINE LOAD FOR db1.job1;
+```
+
+:::caution Routine Load 限制
+在 Routine Load 中使用 `UPDATE_FLEXIBLE_COLUMNS` 模式时,存在以下限制:
+- 数据格式必须为 JSON(`"format" = "json"`)
+- 不能指定 `jsonpaths` 属性
+- 不能启用 `fuzzy_parse` 选项
+- 不能使用 `COLUMNS` 子句
+- 不能使用 `WHERE` 子句
+:::
+
 #### 示例
 
 假设有如下表

