This is an automated email from the ASF dual-hosted git repository.
diwu pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris-website.git
The following commit(s) were added to refs/heads/master by this push:
new 17989cc32ce [ecosystem](flink) update flink connector 404 (#2792)
17989cc32ce is described below
commit 17989cc32ce36193c7a349161beecfd9fa6686d1
Author: wudi <[email protected]>
AuthorDate: Thu Aug 28 14:08:19 2025 +0800
[ecosystem](flink) update flink connector 404 (#2792)
## Versions
- [x] dev
- [x] 3.0
- [x] 2.1
- [ ] 2.0
## Languages
- [x] Chinese
- [x] English
## Docs Checklist
- [ ] Checked by AI
- [ ] Test Cases Built
---
docs/ecosystem/flink-doris-connector.md | 8 ++++----
.../current/ecosystem/flink-doris-connector.md | 8 ++++----
.../version-2.1/ecosystem/flink-doris-connector.md | 8 ++++----
.../version-3.0/ecosystem/flink-doris-connector.md | 8 ++++----
versioned_docs/version-2.1/ecosystem/flink-doris-connector.md | 8 ++++----
versioned_docs/version-3.0/ecosystem/flink-doris-connector.md | 8 ++++----
6 files changed, 24 insertions(+), 24 deletions(-)
diff --git a/docs/ecosystem/flink-doris-connector.md b/docs/ecosystem/flink-doris-connector.md
index 4956d0e7f81..a98d8805505 100644
--- a/docs/ecosystem/flink-doris-connector.md
+++ b/docs/ecosystem/flink-doris-connector.md
@@ -828,9 +828,9 @@ After starting the Flink cluster, you can directly run the following command:
| Key | Default Value | Required | Comment |
| --------------------------- | ------------- | -------- | ------------------------------------------------------------ |
| sink.label-prefix | -- | Y | The label prefix used for Stream Load import. In the 2pc scenario, it is required to be globally unique to ensure the EOS semantics of Flink. |
-| sink.properties.* | -- | N | Import parameters for Stream Load. For example, 'sink.properties.column_separator' = ', ' defines the column separator, and 'sink.properties.escape_delimiters' = 'true' means that special characters used as delimiters, like \x01, will be converted to binary 0x01. For JSON format import, set 'sink.properties.format' = 'json' and 'sink.properties.read_json_by_line' = 'true'. For detailed parameters, refer to [here](https://doris.apache.org/zh- [...]
+| sink.properties.* | -- | N | Import parameters for Stream Load. For example, 'sink.properties.column_separator' = ', ' defines the column separator, and 'sink.properties.escape_delimiters' = 'true' means that special characters used as delimiters, like \x01, will be converted to binary 0x01. For JSON format import, set 'sink.properties.format' = 'json' and 'sink.properties.read_json_by_line' = 'true'. For detailed parameters, refer to [here](../data-operate/import/impor [...]
| sink.enable-delete | TRUE | N | Whether to enable deletion. This option requires the Doris table to have the batch deletion feature enabled (enabled by default in Doris 0.15+), and only supports the Unique model. |
-| sink.enable-2pc | TRUE | N | Whether to enable two-phase commit (2pc). The default is true, ensuring Exactly-Once semantics. For details about two-phase commit, refer to [here](https://doris.apache.org/zh-CN/docs/dev/data-operate/import/stream-load-manual.md). |
+| sink.enable-2pc | TRUE | N | Whether to enable two-phase commit (2pc). The default is true, ensuring Exactly-Once semantics. For details about two-phase commit, refer to [here](../data-operate/transaction.md#streamload-2pc). |
| sink.buffer-size | 1MB | N | The size of the write data cache buffer, in bytes. It is not recommended to modify it; the default configuration can be used. |
| sink.buffer-count | 3 | N | The number of write data cache buffers. It is not recommended to modify it; the default configuration can be used. |
| sink.max-retries | 3 | N | The maximum number of retries after a Commit failure. The default is 3. |
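For context, the keys in the table above are set in the WITH clause of a Flink SQL sink table. A minimal sketch, assuming placeholder FE address, table identifier, and credentials (none of these values come from this commit):

```sql
-- Minimal Flink SQL sink sketch for the options documented above.
-- fenodes, table.identifier, username, and password are placeholders.
CREATE TABLE doris_sink (
    id INT,
    name STRING
) WITH (
    'connector' = 'doris',
    'fenodes' = '127.0.0.1:8030',
    'table.identifier' = 'test_db.test_table',
    'username' = 'root',
    'password' = '',
    'sink.label-prefix' = 'doris_label',          -- must be globally unique under 2pc
    'sink.enable-2pc' = 'true',                   -- default; gives Exactly-Once
    'sink.properties.format' = 'json',            -- Stream Load import parameter
    'sink.properties.read_json_by_line' = 'true'  -- Stream Load import parameter
);
```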
@@ -891,7 +891,7 @@ After starting the Flink cluster, you can directly run the following command:
| --postgres-conf | The configuration of the Postgres CDCSource, for example, --postgres-conf hostname=127.0.0.1. You can view all the configurations of Postgres-CDC [here](https://nightlies.apache.org/flink/flink-cdc-docs-release-3.2/docs/connectors/flink-sources/postgres-cdc/). Among them, hostname, username, password, database-name, schema-name, and slot.name are required. |
| --sqlserver-conf | The configuration of the SQLServer CDCSource, for example, --sqlserver-conf hostname=127.0.0.1. You can view all the configurations of SQLServer-CDC [here](https://nightlies.apache.org/flink/flink-cdc-docs-release-3.2/docs/connectors/flink-sources/sqlserver-cdc/). Among them, hostname, username, password, database-name, and schema-name are required. |
| --db2-conf | The configuration of the DB2 CDCSource, for example, --db2-conf hostname=127.0.0.1. You can view all the configurations of DB2-CDC [here](https://nightlies.apache.org/flink/flink-cdc-docs-release-3.2/docs/connectors/flink-sources/db2-cdc/). Among them, hostname, username, password, database-name, and schema-name are required. |
-| --sink-conf | All the configurations of the Doris Sink can be viewed [here](https://doris.apache.org/zh-CN/docs/dev/ecosystem/flink-doris-connector/#General Configuration Items). |
+| --sink-conf | All the configurations of the Doris Sink can be viewed [here](./flink-doris-connector.md#general-configuration-items). |
| --mongodb-conf | The configuration of the MongoDB CDCSource, for example, --mongodb-conf hosts=127.0.0.1:27017. You can view all the configurations of Mongo-CDC [here](https://nightlies.apache.org/flink/flink-cdc-docs-release-3.0/docs/connectors/flink-sources/mongodb-cdc/). Among them, hosts, username, password, and database are required. --mongodb-conf schema.sample-percent is the configuration for automatically sampling MongoDB data to create tables in Doris, and the default [...]
| --table-conf | The configuration items of the Doris table, that is, the content included in properties (except for table-buckets, which is not a properties attribute). For example, --table-conf replication_num=1, and --table-conf table-buckets="tbl1:10,tbl2:20,a.*:30,b.*:40,.*:50" means specifying the number of buckets for different tables in the order of regular expressions. If there is no match, the BUCKETS AUTO method will be used to create tables. |
| --schema-change-mode | The modes for parsing schema change, including debezium_structure and sql_parser. The debezium_structure mode is used by default: it parses the data structure used when the upstream CDC synchronizes data and judges DDL change operations from that structure. The sql_parser mode parses the DDL statements when the upstream CDC synchronizes data to judge DDL change operations, so this parsing mode is more accurate. Usage example: --s [...]
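For context, the flags in the table above (including the --sink-conf row this hunk fixes) are arguments to the connector's whole-database sync tool, launched through the Flink CLI. A hedged sketch assuming a MySQL source; the jar version, hosts, and credentials are placeholders:

```shell
# Sketch of a whole-database sync job; jar version, hosts, and credentials are placeholders.
bin/flink run \
    -c org.apache.doris.flink.tools.cdc.CdcTools \
    lib/flink-doris-connector-1.18-SNAPSHOT.jar \
    mysql-sync-database \
    --database test_db \
    --mysql-conf hostname=127.0.0.1 \
    --mysql-conf port=3306 \
    --mysql-conf username=root \
    --mysql-conf password=123456 \
    --mysql-conf database-name=test_db \
    --including-tables ".*" \
    --sink-conf fenodes=127.0.0.1:8030 \
    --sink-conf username=root \
    --sink-conf password= \
    --sink-conf jdbc-url=jdbc:mysql://127.0.0.1:9030 \
    --sink-conf sink.label-prefix=label \
    --table-conf replication_num=1
```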
@@ -1104,7 +1104,7 @@ In the whole database synchronization tool provided by the Connector, no additio
3. **errCode = 2, detailMessage = current running txns on db 10006 is 100, larger than limit 100**
-   This is because the concurrent imports into the same database exceed 100. It can be solved by adjusting the parameter `max_running_txn_num_per_db` in `fe.conf`. For specific details, please refer to [max_running_txn_num_per_db](https://doris.apache.org/zh-CN/docs/dev/admin-manual/config/fe-config/#max_running_txn_num_per_db).
+   This is because the concurrent imports into the same database exceed 100. It can be solved by adjusting the parameter `max_running_txn_num_per_db` in `fe.conf`. For specific details, please refer to [max_running_txn_num_per_db](../admin-manual/config/fe-config#max_running_txn_num_per_db).
    Meanwhile, frequently modifying the label and restarting a task may also lead to this error. In the 2pc scenario (for Duplicate/Aggregate models), the label of each task needs to be unique. And when restarting from a checkpoint, the Flink task will actively abort the transactions that have been pre-committed successfully but not yet committed. Frequent label modifications and restarts will result in a large number of pre-committed successful transactions that cannot be aborted and thu [...]
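For the limit discussed in this hunk, the FE configuration can also be inspected and adjusted at runtime with standard Doris admin statements. A sketch; the value 200 is illustrative only, and a runtime change does not survive an FE restart unless fe.conf is also updated as described above:

```sql
-- Check the current per-database transaction limit.
ADMIN SHOW FRONTEND CONFIG LIKE "max_running_txn_num_per_db";

-- Raise it at runtime; 200 is an illustrative value, not a recommendation.
ADMIN SET FRONTEND CONFIG ("max_running_txn_num_per_db" = "200");
```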
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/ecosystem/flink-doris-connector.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/ecosystem/flink-doris-connector.md
index 540acf20e11..f9c67311efa 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/ecosystem/flink-doris-connector.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/ecosystem/flink-doris-connector.md
@@ -831,9 +831,9 @@ Flink Doris Connector integrates [Flink CDC](https://nightlies.apache.org/flink
| Key | Default Value | Required | Comment |
| --------------------------- | ------------- | -------- | ------------------------------------------------------------ |
| sink.label-prefix | -- | Y | The label prefix used for Stream Load import. In the 2pc scenario, it must be globally unique to guarantee Flink's EOS semantics. |
-| sink.properties.* | -- | N | Import parameters for Stream Load. For example: 'sink.properties.column_separator' = ', ' defines the column separator, 'sink.properties.escape_delimiters' = 'true' treats special characters as delimiters, so \x01 is converted to binary 0x01. For JSON format import, set 'sink.properties.format' = 'json' and 'sink.properties.read_json_by_line' = 'true'. For detailed parameters, refer to [here](https://doris.apache.org/zh-CN/docs/dev/data-operate/import/stream-load-manual.md). Group Commit mode, for example: 'sink.properties.group_commit' = 'sync_mode' sets group commit to synchronous mode. The flink connector [...]
+| sink.properties.* | -- | N | Import parameters for Stream Load. For example: 'sink.properties.column_separator' = ', ' defines the column separator, 'sink.properties.escape_delimiters' = 'true' treats special characters as delimiters, so \x01 is converted to binary 0x01. For JSON format import, set 'sink.properties.format' = 'json' and 'sink.properties.read_json_by_line' = 'true'. For detailed parameters, refer to [here](../data-operate/import/import-way/stream-load-manual.md#导入配置参数). Group Commit mode, for example: 'sink.properties.group_commit' = 'sync_mode' sets group commit to synchronous mode. Starting from 1.6.2, the flink connector supports the import configuration g [...]
| sink.enable-delete | TRUE | N | Whether to enable deletion. This option requires the Doris table to have the batch delete feature enabled (on by default since Doris 0.15), and only supports the Unique model. |
-| sink.enable-2pc | TRUE | N | Whether to enable two-phase commit (2pc). The default is true, which guarantees Exactly-Once semantics. For details about two-phase commit, refer to [here](https://doris.apache.org/zh-CN/docs/dev/data-operate/import/stream-load-manual.md). |
+| sink.enable-2pc | TRUE | N | Whether to enable two-phase commit (2pc). The default is true, which guarantees Exactly-Once semantics. For details about two-phase commit, refer to [here](../data-operate/transaction.md#streamload-2pc). |
| sink.buffer-size | 1MB | N | The size of the write data cache buffer, in bytes. Modifying it is not recommended; the default configuration is sufficient. |
| sink.buffer-count | 3 | N | The number of write data cache buffers. Modifying it is not recommended; the default configuration is sufficient. |
| sink.max-retries | 3 | N | The maximum number of retries after a commit failure, 3 by default. |
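The Group Commit sentence in the sink.properties.* row above maps to one extra property in the same WITH clause. A hypothetical sketch with placeholder connection values:

```sql
-- Sketch: group commit in sync mode on the sink, per the row above.
-- All connection values are placeholders.
CREATE TABLE doris_gc_sink (
    id INT,
    name STRING
) WITH (
    'connector' = 'doris',
    'fenodes' = '127.0.0.1:8030',
    'table.identifier' = 'test_db.test_table',
    'username' = 'root',
    'password' = '',
    'sink.properties.group_commit' = 'sync_mode'  -- synchronous group commit
);
```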
@@ -894,7 +894,7 @@ Flink Doris Connector integrates [Flink CDC](https://nightlies.apache.org/flink
| --postgres-conf | Configuration of the Postgres CDCSource, for example --postgres-conf hostname=127.0.0.1. You can view all Postgres-CDC configurations [here](https://nightlies.apache.org/flink/flink-cdc-docs-release-3.2/docs/connectors/flink-sources/postgres-cdc/). Among them, hostname/username/password/database-name/schema-name/slot.name are required. |
| --sqlserver-conf | Configuration of the SQLServer CDCSource, for example --sqlserver-conf hostname=127.0.0.1. You can view all SQLServer-CDC configurations [here](https://nightlies.apache.org/flink/flink-cdc-docs-release-3.2/docs/connectors/flink-sources/sqlserver-cdc/). Among them, hostname/username/password/database-name/schema-name are required. |
| --db2-conf | Configuration of the DB2 CDCSource, for example --db2-conf hostname=127.0.0.1. You can view all DB2-CDC configurations [here](https://nightlies.apache.org/flink/flink-cdc-docs-release-3.2/docs/connectors/flink-sources/db2-cdc/). Among them, hostname/username/password/database-name/schema-name are required. |
-| --sink-conf | All configurations of the Doris Sink; the complete list of configuration items can be viewed [here](https://doris.apache.org/zh-CN/docs/dev/ecosystem/flink-doris-connector/#通用配置项). |
+| --sink-conf | All configurations of the Doris Sink; the complete list of configuration items can be viewed [here](./flink-doris-connector.md#sink-配置项). |
| --mongodb-conf | Configuration of the MongoDB CDCSource, for example --mongodb-conf hosts=127.0.0.1:27017. You can view all Mongo-CDC configurations [here](https://nightlies.apache.org/flink/flink-cdc-docs-release-3.2/docs/connectors/flink-sources/mongodb-cdc/). Among them, hosts/username/password/database are required. --mongodb-conf schema.sample-percent is the configuration for automatically sampling MongoDB data when creating Doris tables; the default is 0.2 |
| --table-conf | Configuration items of the Doris table, i.e. the content contained in properties (table-buckets is an exception, as it is not a properties attribute). For example --table-conf replication_num=1, while --table-conf table-buckets="tbl1:10,tbl2:20,a.*:30,b.*:40,.*:50" specifies the number of buckets for different tables in the order of the regular expressions; tables with no match are created with BUCKETS AUTO. |
| --schema-change-mode | The mode for parsing schema changes. Two parsing modes, debezium_structure and sql_parser, are supported, with debezium_structure used by default. debezium_structure parses the data structure used when the upstream CDC synchronizes data and determines DDL change operations from that structure. sql_parser parses the DDL statements of the upstream CDC synchronization data to determine DDL change operations, so this parsing mode is more accurate. Usage example: --schema-change-mode debezium_structure. Supported since 24.0.0 |
@@ -1106,7 +1106,7 @@ from KAFKA_SOURCE;
3. **errCode = 2, detailMessage = current running txns on db 10006 is 100, larger than limit 100**
-   This is because concurrent imports into the same database exceeded 100. It can be solved by adjusting the `fe.conf` parameter `max_running_txn_num_per_db`; for details, refer to [max_running_txn_num_per_db](https://doris.apache.org/zh-CN/docs/dev/admin-manual/config/fe-config/#max_running_txn_num_per_db).
+   This is because concurrent imports into the same database exceeded 100. It can be solved by adjusting the `fe.conf` parameter `max_running_txn_num_per_db`; for details, refer to [max_running_txn_num_per_db](../admin-manual/config/fe-config#max_running_txn_num_per_db).
    Meanwhile, frequently changing a task's label and restarting it may also cause this error. In the 2pc scenario (Duplicate/Aggregate models), each task's label must be unique, and only when restarting from a checkpoint will the Flink task actively abort transactions that were pre-committed successfully but not committed. Frequently changing the label and restarting leaves a large number of successfully pre-committed transactions that cannot be aborted, tying up transactions. Under the Unique model, 2pc can also be turned off to achieve idempotent writes.
4. **tablet writer write failed, tablet_id=190958, txn_id=3505530, err=-235**
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/ecosystem/flink-doris-connector.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/ecosystem/flink-doris-connector.md
index cba59afe0f0..70cbb2b4c2b 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/ecosystem/flink-doris-connector.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/ecosystem/flink-doris-connector.md
@@ -831,9 +831,9 @@ Flink Doris Connector integrates [Flink CDC](https://nightlies.apache.org/flink
| Key | Default Value | Required | Comment |
| --------------------------- | ------------- | -------- | ------------------------------------------------------------ |
| sink.label-prefix | -- | Y | The label prefix used for Stream Load import. In the 2pc scenario, it must be globally unique to guarantee Flink's EOS semantics. |
-| sink.properties.* | -- | N | Import parameters for Stream Load. For example: 'sink.properties.column_separator' = ', ' defines the column separator, 'sink.properties.escape_delimiters' = 'true' treats special characters as delimiters, so \x01 is converted to binary 0x01. For JSON format import, set 'sink.properties.format' = 'json' and 'sink.properties.read_json_by_line' = 'true'. For detailed parameters, refer to [here](https://doris.apache.org/zh-CN/docs/dev/data-operate/import/stream-load-manual.md). Group Commit mode, for example: 'sink.properties.group_commit' = 'sync_mode' sets group commit to synchronous mode. The flink connector [...]
+| sink.properties.* | -- | N | Import parameters for Stream Load. For example: 'sink.properties.column_separator' = ', ' defines the column separator, 'sink.properties.escape_delimiters' = 'true' treats special characters as delimiters, so \x01 is converted to binary 0x01. For JSON format import, set 'sink.properties.format' = 'json' and 'sink.properties.read_json_by_line' = 'true'. For detailed parameters, refer to [here](../data-operate/import/import-way/stream-load-manual.md#导入配置参数). Group Commit mode, for example: 'sink.properties.group_commit' = 'sync_mode' sets group commit to synchronous mode. Starting from 1.6.2, the flink connector supports the import configuration g [...]
| sink.enable-delete | TRUE | N | Whether to enable deletion. This option requires the Doris table to have the batch delete feature enabled (on by default since Doris 0.15), and only supports the Unique model. |
-| sink.enable-2pc | TRUE | N | Whether to enable two-phase commit (2pc). The default is true, which guarantees Exactly-Once semantics. For details about two-phase commit, refer to [here](https://doris.apache.org/zh-CN/docs/dev/data-operate/import/stream-load-manual.md). |
+| sink.enable-2pc | TRUE | N | Whether to enable two-phase commit (2pc). The default is true, which guarantees Exactly-Once semantics. For details about two-phase commit, refer to [here](../data-operate/transaction.md#streamload-2pc). |
| sink.buffer-size | 1MB | N | The size of the write data cache buffer, in bytes. Modifying it is not recommended; the default configuration is sufficient. |
| sink.buffer-count | 3 | N | The number of write data cache buffers. Modifying it is not recommended; the default configuration is sufficient. |
| sink.max-retries | 3 | N | The maximum number of retries after a commit failure, 3 by default. |
@@ -894,7 +894,7 @@ Flink Doris Connector integrates [Flink CDC](https://nightlies.apache.org/flink
| --postgres-conf | Configuration of the Postgres CDCSource, for example --postgres-conf hostname=127.0.0.1. You can view all Postgres-CDC configurations [here](https://nightlies.apache.org/flink/flink-cdc-docs-release-3.2/docs/connectors/flink-sources/postgres-cdc/). Among them, hostname/username/password/database-name/schema-name/slot.name are required. |
| --sqlserver-conf | Configuration of the SQLServer CDCSource, for example --sqlserver-conf hostname=127.0.0.1. You can view all SQLServer-CDC configurations [here](https://nightlies.apache.org/flink/flink-cdc-docs-release-3.2/docs/connectors/flink-sources/sqlserver-cdc/). Among them, hostname/username/password/database-name/schema-name are required. |
| --db2-conf | Configuration of the DB2 CDCSource, for example --db2-conf hostname=127.0.0.1. You can view all DB2-CDC configurations [here](https://nightlies.apache.org/flink/flink-cdc-docs-release-3.2/docs/connectors/flink-sources/db2-cdc/). Among them, hostname/username/password/database-name/schema-name are required. |
-| --sink-conf | All configurations of the Doris Sink; the complete list of configuration items can be viewed [here](https://doris.apache.org/zh-CN/docs/dev/ecosystem/flink-doris-connector/#通用配置项). |
+| --sink-conf | All configurations of the Doris Sink; the complete list of configuration items can be viewed [here](./flink-doris-connector.md#sink-配置项). |
| --mongodb-conf | Configuration of the MongoDB CDCSource, for example --mongodb-conf hosts=127.0.0.1:27017. You can view all Mongo-CDC configurations [here](https://nightlies.apache.org/flink/flink-cdc-docs-release-3.2/docs/connectors/flink-sources/mongodb-cdc/). Among them, hosts/username/password/database are required. --mongodb-conf schema.sample-percent is the configuration for automatically sampling MongoDB data when creating Doris tables; the default is 0.2 |
| --table-conf | Configuration items of the Doris table, i.e. the content contained in properties (table-buckets is an exception, as it is not a properties attribute). For example --table-conf replication_num=1, while --table-conf table-buckets="tbl1:10,tbl2:20,a.*:30,b.*:40,.*:50" specifies the number of buckets for different tables in the order of the regular expressions; tables with no match are created with BUCKETS AUTO. |
| --schema-change-mode | The mode for parsing schema changes. Two parsing modes, debezium_structure and sql_parser, are supported, with debezium_structure used by default. debezium_structure parses the data structure used when the upstream CDC synchronizes data and determines DDL change operations from that structure. sql_parser parses the DDL statements of the upstream CDC synchronization data to determine DDL change operations, so this parsing mode is more accurate. Usage example: --schema-change-mode debezium_structure. Supported since 24.0.0 |
@@ -1106,7 +1106,7 @@ from KAFKA_SOURCE;
3. **errCode = 2, detailMessage = current running txns on db 10006 is 100, larger than limit 100**
-   This is because concurrent imports into the same database exceeded 100. It can be solved by adjusting the `fe.conf` parameter `max_running_txn_num_per_db`; for details, refer to [max_running_txn_num_per_db](https://doris.apache.org/zh-CN/docs/dev/admin-manual/config/fe-config/#max_running_txn_num_per_db).
+   This is because concurrent imports into the same database exceeded 100. It can be solved by adjusting the `fe.conf` parameter `max_running_txn_num_per_db`; for details, refer to [max_running_txn_num_per_db](../admin-manual/config/fe-config#max_running_txn_num_per_db).
    Meanwhile, frequently changing a task's label and restarting it may also cause this error. In the 2pc scenario (Duplicate/Aggregate models), each task's label must be unique, and only when restarting from a checkpoint will the Flink task actively abort transactions that were pre-committed successfully but not committed. Frequently changing the label and restarting leaves a large number of successfully pre-committed transactions that cannot be aborted, tying up transactions. Under the Unique model, 2pc can also be turned off to achieve idempotent writes.
4. **tablet writer write failed, tablet_id=190958, txn_id=3505530, err=-235**
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/ecosystem/flink-doris-connector.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/ecosystem/flink-doris-connector.md
index 6d3558a76cf..acd1e5d6252 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/ecosystem/flink-doris-connector.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/ecosystem/flink-doris-connector.md
@@ -831,9 +831,9 @@ Flink Doris Connector integrates [Flink CDC](https://nightlies.apache.org/flink
| Key | Default Value | Required | Comment |
| --------------------------- | ------------- | -------- | ------------------------------------------------------------ |
| sink.label-prefix | -- | Y | The label prefix used for Stream Load import. In the 2pc scenario, it must be globally unique to guarantee Flink's EOS semantics. |
-| sink.properties.* | -- | N | Import parameters for Stream Load. For example: 'sink.properties.column_separator' = ', ' defines the column separator, 'sink.properties.escape_delimiters' = 'true' treats special characters as delimiters, so \x01 is converted to binary 0x01. For JSON format import, set 'sink.properties.format' = 'json' and 'sink.properties.read_json_by_line' = 'true'. For detailed parameters, refer to [here](https://doris.apache.org/zh-CN/docs/dev/data-operate/import/stream-load-manual.md). Group Commit mode, for example: 'sink.properties.group_commit' = 'sync_mode' sets group commit to synchronous mode. The flink connector [...]
+| sink.properties.* | -- | N | Import parameters for Stream Load. For example: 'sink.properties.column_separator' = ', ' defines the column separator, 'sink.properties.escape_delimiters' = 'true' treats special characters as delimiters, so \x01 is converted to binary 0x01. For JSON format import, set 'sink.properties.format' = 'json' and 'sink.properties.read_json_by_line' = 'true'. For detailed parameters, refer to [here](../data-operate/import/import-way/stream-load-manual.md#导入配置参数). Group Commit mode, for example: 'sink.properties.group_commit' = 'sync_mode' sets group commit to synchronous mode. Starting from 1.6.2, the flink connector supports the import configuration g [...]
| sink.enable-delete | TRUE | N | Whether to enable deletion. This option requires the Doris table to have the batch delete feature enabled (on by default since Doris 0.15), and only supports the Unique model. |
-| sink.enable-2pc | TRUE | N | Whether to enable two-phase commit (2pc). The default is true, which guarantees Exactly-Once semantics. For details about two-phase commit, refer to [here](https://doris.apache.org/zh-CN/docs/dev/data-operate/import/stream-load-manual.md). |
+| sink.enable-2pc | TRUE | N | Whether to enable two-phase commit (2pc). The default is true, which guarantees Exactly-Once semantics. For details about two-phase commit, refer to [here](../data-operate/transaction.md#streamload-2pc). |
| sink.buffer-size | 1MB | N | The size of the write data cache buffer, in bytes. Modifying it is not recommended; the default configuration is sufficient. |
| sink.buffer-count | 3 | N | The number of write data cache buffers. Modifying it is not recommended; the default configuration is sufficient. |
| sink.max-retries | 3 | N | The maximum number of retries after a commit failure, 3 by default. |
@@ -894,8 +894,8 @@ Flink Doris Connector integrates [Flink CDC](https://nightlies.apache.org/flink
| --postgres-conf | Configuration of the Postgres CDCSource, for example --postgres-conf hostname=127.0.0.1. You can view all Postgres-CDC configurations [here](https://nightlies.apache.org/flink/flink-cdc-docs-release-3.2/docs/connectors/flink-sources/postgres-cdc/). Among them, hostname/username/password/database-name/schema-name/slot.name are required. |
| --sqlserver-conf | Configuration of the SQLServer CDCSource, for example --sqlserver-conf hostname=127.0.0.1. You can view all SQLServer-CDC configurations [here](https://nightlies.apache.org/flink/flink-cdc-docs-release-3.2/docs/connectors/flink-sources/sqlserver-cdc/). Among them, hostname/username/password/database-name/schema-name are required. |
| --db2-conf | Configuration of the DB2 CDCSource, for example --db2-conf hostname=127.0.0.1. You can view all DB2-CDC configurations [here](https://nightlies.apache.org/flink/flink-cdc-docs-release-3.2/docs/connectors/flink-sources/db2-cdc/). Among them, hostname/username/password/database-name/schema-name are required. |
-| --sink-conf | All configurations of the Doris Sink; the complete list of configuration items can be viewed [here](https://doris.apache.org/zh-CN/docs/dev/ecosystem/flink-doris-connector/#通用配置项). |
| --mongodb-conf | Configuration of the MongoDB CDCSource, for example --mongodb-conf hosts=127.0.0.1:27017. You can view all Mongo-CDC configurations [here](https://nightlies.apache.org/flink/flink-cdc-docs-release-3.0/docs/connectors/flink-sources/mongodb-cdc/). Among them, hosts/username/password/database are required. --mongodb-conf schema.sample-percent is the configuration for automatically sampling MongoDB data when creating Doris tables; the default is 0.2 |
+| --sink-conf | All configurations of the Doris Sink; the complete list of configuration items can be viewed [here](./flink-doris-connector.md#sink-配置项). |
| --table-conf | Configuration items of the Doris table, i.e. the content contained in properties (table-buckets is an exception, as it is not a properties attribute). For example --table-conf replication_num=1, while --table-conf table-buckets="tbl1:10,tbl2:20,a.*:30,b.*:40,.*:50" specifies the number of buckets for different tables in the order of the regular expressions; tables with no match are created with BUCKETS AUTO. |
| --schema-change-mode | The mode for parsing schema changes. Two parsing modes, debezium_structure and sql_parser, are supported, with debezium_structure used by default. debezium_structure parses the data structure used when the upstream CDC synchronizes data and determines DDL change operations from that structure. sql_parser parses the DDL statements of the upstream CDC synchronization data to determine DDL change operations, so this parsing mode is more accurate. Usage example: --schema-change-mode debezium_structure. Supported since 24.0.0 |
| --single-sink | Whether to use a single Sink to synchronize all tables; when enabled, newly created upstream tables are also detected automatically and the corresponding tables are created automatically. |
@@ -1106,7 +1106,7 @@ from KAFKA_SOURCE;
3. **errCode = 2, detailMessage = current running txns on db 10006 is 100, larger than limit 100**
-   This is because concurrent imports into the same database exceeded 100. It can be solved by adjusting the `fe.conf` parameter `max_running_txn_num_per_db`; for details, refer to [max_running_txn_num_per_db](https://doris.apache.org/zh-CN/docs/dev/admin-manual/config/fe-config/#max_running_txn_num_per_db).
+   This is because concurrent imports into the same database exceeded 100. It can be solved by adjusting the `fe.conf` parameter `max_running_txn_num_per_db`; for details, refer to [max_running_txn_num_per_db](../admin-manual/config/fe-config#max_running_txn_num_per_db).
    Meanwhile, frequently changing a task's label and restarting it may also cause this error. In the 2pc scenario (Duplicate/Aggregate models), each task's label must be unique, and only when restarting from a checkpoint will the Flink task actively abort transactions that were pre-committed successfully but not committed. Frequently changing the label and restarting leaves a large number of successfully pre-committed transactions that cannot be aborted, tying up transactions. Under the Unique model, 2pc can also be turned off to achieve idempotent writes.
4. **tablet writer write failed, tablet_id=190958, txn_id=3505530, err=-235**
diff --git a/versioned_docs/version-2.1/ecosystem/flink-doris-connector.md b/versioned_docs/version-2.1/ecosystem/flink-doris-connector.md
index 4377a5d1052..fd8c410f67d 100644
--- a/versioned_docs/version-2.1/ecosystem/flink-doris-connector.md
+++ b/versioned_docs/version-2.1/ecosystem/flink-doris-connector.md
@@ -828,9 +828,9 @@ After starting the Flink cluster, you can directly run the following command:
| Key | Default Value | Required | Comment |
| --------------------------- | ------------- | -------- | ------------------------------------------------------------ |
| sink.label-prefix | -- | Y | The label prefix used for Stream Load import. In the 2pc scenario, it is required to be globally unique to ensure the EOS semantics of Flink. |
-| sink.properties.* | -- | N | Import parameters for Stream Load. For example, 'sink.properties.column_separator' = ', ' defines the column separator, and 'sink.properties.escape_delimiters' = 'true' means that special characters used as delimiters, like \x01, will be converted to binary 0x01. For JSON format import, set 'sink.properties.format' = 'json' and 'sink.properties.read_json_by_line' = 'true'. For detailed parameters, refer to [here](https://doris.apache.org/zh- [...]
+| sink.properties.* | -- | N | Import parameters for Stream Load. For example, 'sink.properties.column_separator' = ', ' defines the column separator, and 'sink.properties.escape_delimiters' = 'true' means that special characters used as delimiters, like \x01, will be converted to binary 0x01. For JSON format import, set 'sink.properties.format' = 'json' and 'sink.properties.read_json_by_line' = 'true'. For detailed parameters, refer to [here](../data-operate/import/impor [...]
| sink.enable-delete | TRUE | N | Whether to enable deletion. This option requires the Doris table to have the batch deletion feature enabled (enabled by default in Doris 0.15+), and only supports the Unique model. |
-| sink.enable-2pc | TRUE | N | Whether to enable two-phase commit (2pc). The default is true, ensuring Exactly-Once semantics. For details about two-phase commit, refer to [here](https://doris.apache.org/zh-CN/docs/dev/data-operate/import/stream-load-manual.md). |
+| sink.enable-2pc | TRUE | N | Whether to enable two-phase commit (2pc). The default is true, ensuring Exactly-Once semantics. For details about two-phase commit, refer to [here](../data-operate/transaction.md#streamload-2pc). |
| sink.buffer-size | 1MB | N | The size of the write data cache buffer, in bytes. It is not recommended to modify it; the default configuration can be used. |
| sink.buffer-count | 3 | N | The number of write data cache buffers. It is not recommended to modify it; the default configuration can be used. |
| sink.max-retries | 3 | N | The maximum number of retries after a Commit failure. The default is 3. |
@@ -891,8 +891,8 @@ After starting the Flink cluster, you can directly run the following command:
| --postgres-conf | The configuration of the Postgres CDCSource, for example, --postgres-conf hostname=127.0.0.1. You can view all the configurations of Postgres-CDC [here](https://nightlies.apache.org/flink/flink-cdc-docs-release-3.2/docs/connectors/flink-sources/postgres-cdc/). Among them, hostname, username, password, database-name, schema-name, and slot.name are required. |
| --sqlserver-conf | The configuration of the SQLServer CDCSource, for example, --sqlserver-conf hostname=127.0.0.1. You can view all the configurations of SQLServer-CDC [here](https://nightlies.apache.org/flink/flink-cdc-docs-release-3.2/docs/connectors/flink-sources/sqlserver-cdc/). Among them, hostname, username, password, database-name, and schema-name are required. |
| --db2-conf | The configuration of the DB2 CDCSource, for example, --db2-conf hostname=127.0.0.1. You can view all the configurations of DB2-CDC [here](https://nightlies.apache.org/flink/flink-cdc-docs-release-3.2/docs/connectors/flink-sources/db2-cdc/). Among them, hostname, username, password, database-name, and schema-name are required. |
-| --sink-conf | All the configurations of the Doris Sink can be viewed [here](https://doris.apache.org/zh-CN/docs/dev/ecosystem/flink-doris-connector/#General Configuration Items). |
| --mongodb-conf | The configuration of the MongoDB CDCSource, for example, --mongodb-conf hosts=127.0.0.1:27017. You can view all the configurations of Mongo-CDC [here](https://nightlies.apache.org/flink/flink-cdc-docs-release-3.2/docs/connectors/flink-sources/mongodb-cdc/). Among them, hosts, username, password, and database are required. --mongodb-conf schema.sample-percent is the configuration for automatically sampling MongoDB data to create tables in Doris, and the default [...]
+| --sink-conf | All the configurations of the Doris Sink can be viewed [here](./flink-doris-connector.md#general-configuration-items). |
| --table-conf | The configuration items of the Doris table, that is, the content included in properties (except for table-buckets, which is not a properties attribute). For example, --table-conf replication_num=1, and --table-conf table-buckets="tbl1:10,tbl2:20,a.*:30,b.*:40,.*:50" means specifying the number of buckets for different tables in the order of regular expressions. If there is no match, the BUCKETS AUTO method will be used to create tables. |
| --schema-change-mode | The modes for parsing schema change, including debezium_structure and sql_parser. The debezium_structure mode is used by default: it parses the data structure used when the upstream CDC synchronizes data and judges DDL change operations from that structure. The sql_parser mode parses the DDL statements when the upstream CDC synchronizes data to judge DDL change operations, so this parsing mode is more accurate. Usage example: --s [...]
| --single-sink | Whether to use a single Sink to synchronize all tables. After enabling it, newly created upstream tables can also be identified automatically and the corresponding tables created automatically. |
@@ -1104,7 +1104,7 @@ In the whole database synchronization tool provided by the Connector, no additio
3. **errCode = 2, detailMessage = current running txns on db 10006 is 100, larger than limit 100**
-   This is because the concurrent imports into the same database exceed 100. It can be solved by adjusting the parameter `max_running_txn_num_per_db` in `fe.conf`. For specific details, please refer to [max_running_txn_num_per_db](https://doris.apache.org/zh-CN/docs/dev/admin-manual/config/fe-config/#max_running_txn_num_per_db).
+   This is because the concurrent imports into the same database exceed 100. It can be solved by adjusting the parameter `max_running_txn_num_per_db` in `fe.conf`. For specific details, please refer to [max_running_txn_num_per_db](../admin-manual/config/fe-config#max_running_txn_num_per_db).
    Meanwhile, frequently modifying the label and restarting a task may also lead to this error. In the 2pc scenario (for Duplicate/Aggregate models), the label of each task needs to be unique. And when restarting from a checkpoint, the Flink task will actively abort the transactions that have been pre-committed successfully but not yet committed. Frequent label modifications and restarts will result in a large number of pre-committed successful transactions that cannot be aborted and thu [...]
diff --git a/versioned_docs/version-3.0/ecosystem/flink-doris-connector.md b/versioned_docs/version-3.0/ecosystem/flink-doris-connector.md
index 4377a5d1052..fd8c410f67d 100644
--- a/versioned_docs/version-3.0/ecosystem/flink-doris-connector.md
+++ b/versioned_docs/version-3.0/ecosystem/flink-doris-connector.md
@@ -828,9 +828,9 @@ After starting the Flink cluster, you can directly run the following command:
| Key | Default Value | Required | Comment |
| --------------------------- | ------------- | -------- | ------------------------------------------------------------ |
| sink.label-prefix | -- | Y | The label prefix used for Stream Load import. In the 2pc scenario, it is required to be globally unique to ensure the EOS semantics of Flink. |
-| sink.properties.* | -- | N | Import parameters for Stream Load. For example, 'sink.properties.column_separator' = ', ' defines the column separator, and 'sink.properties.escape_delimiters' = 'true' means that special characters used as delimiters, like \x01, will be converted to binary 0x01. For JSON format import, set 'sink.properties.format' = 'json' and 'sink.properties.read_json_by_line' = 'true'. For detailed parameters, refer to [here](https://doris.apache.org/zh- [...]
+| sink.properties.* | -- | N | Import parameters for Stream Load. For example, 'sink.properties.column_separator' = ', ' defines the column separator, and 'sink.properties.escape_delimiters' = 'true' means that special characters used as delimiters, like \x01, will be converted to binary 0x01. For JSON format import, set 'sink.properties.format' = 'json' and 'sink.properties.read_json_by_line' = 'true'. For detailed parameters, refer to [here](../data-operate/import/impor [...]
| sink.enable-delete | TRUE | N | Whether to enable deletion. This option requires the Doris table to have the batch deletion feature enabled (enabled by default in Doris 0.15+), and only supports the Unique model. |
-| sink.enable-2pc | TRUE | N | Whether to enable two-phase commit (2pc). The default is true, ensuring Exactly-Once semantics. For details about two-phase commit, refer to [here](https://doris.apache.org/zh-CN/docs/dev/data-operate/import/stream-load-manual.md). |
+| sink.enable-2pc | TRUE | N | Whether to enable two-phase commit (2pc). The default is true, ensuring Exactly-Once semantics. For details about two-phase commit, refer to [here](../data-operate/transaction.md#streamload-2pc). |
| sink.buffer-size | 1MB | N | The size of the write data cache buffer, in bytes. It is not recommended to modify it; the default configuration can be used. |
| sink.buffer-count | 3 | N | The number of write data cache buffers. It is not recommended to modify it; the default configuration can be used. |
| sink.max-retries | 3 | N | The maximum number of retries after a Commit failure. The default is 3. |
@@ -891,8 +891,8 @@ After starting the Flink cluster, you can directly run the following command:
| --postgres-conf | The configuration of the Postgres CDCSource, for example, --postgres-conf hostname=127.0.0.1. You can view all the configurations of Postgres-CDC [here](https://nightlies.apache.org/flink/flink-cdc-docs-release-3.2/docs/connectors/flink-sources/postgres-cdc/). Among them, hostname, username, password, database-name, schema-name, and slot.name are required. |
| --sqlserver-conf | The configuration of the SQLServer CDCSource, for example, --sqlserver-conf hostname=127.0.0.1. You can view all the configurations of SQLServer-CDC [here](https://nightlies.apache.org/flink/flink-cdc-docs-release-3.2/docs/connectors/flink-sources/sqlserver-cdc/). Among them, hostname, username, password, database-name, and schema-name are required. |
| --db2-conf | The configuration of the DB2 CDCSource, for example, --db2-conf hostname=127.0.0.1. You can view all the configurations of DB2-CDC [here](https://nightlies.apache.org/flink/flink-cdc-docs-release-3.2/docs/connectors/flink-sources/db2-cdc/). Among them, hostname, username, password, database-name, and schema-name are required. |
-| --sink-conf | All the configurations of the Doris Sink can be viewed [here](https://doris.apache.org/zh-CN/docs/dev/ecosystem/flink-doris-connector/#General Configuration Items). |
| --mongodb-conf | The configuration of the MongoDB CDCSource, for example, --mongodb-conf hosts=127.0.0.1:27017. You can view all the configurations of Mongo-CDC [here](https://nightlies.apache.org/flink/flink-cdc-docs-release-3.2/docs/connectors/flink-sources/mongodb-cdc/). Among them, hosts, username, password, and database are required. --mongodb-conf schema.sample-percent is the configuration for automatically sampling MongoDB data to create tables in Doris, and the default [...]
+| --sink-conf | All the configurations of the Doris Sink can be viewed [here](./flink-doris-connector.md#general-configuration-items). |
| --table-conf | The configuration items of the Doris table, that is, the content included in properties (except for table-buckets, which is not a properties attribute). For example, --table-conf replication_num=1, and --table-conf table-buckets="tbl1:10,tbl2:20,a.*:30,b.*:40,.*:50" means specifying the number of buckets for different tables in the order of regular expressions. If there is no match, the BUCKETS AUTO method will be used to create tables. |
| --schema-change-mode | The modes for parsing schema change, including debezium_structure and sql_parser. The debezium_structure mode is used by default: it parses the data structure used when the upstream CDC synchronizes data and judges DDL change operations from that structure. The sql_parser mode parses the DDL statements when the upstream CDC synchronizes data to judge DDL change operations, so this parsing mode is more accurate. Usage example: --s [...]
| --single-sink | Whether to use a single Sink to synchronize all tables. After enabling it, newly created upstream tables can also be identified automatically and the corresponding tables created automatically. |
@@ -1104,7 +1104,7 @@ In the whole database synchronization tool provided by the Connector, no additio
3. **errCode = 2, detailMessage = current running txns on db 10006 is 100, larger than limit 100**
-   This is because the concurrent imports into the same database exceed 100. It can be solved by adjusting the parameter `max_running_txn_num_per_db` in `fe.conf`. For specific details, please refer to [max_running_txn_num_per_db](https://doris.apache.org/zh-CN/docs/dev/admin-manual/config/fe-config/#max_running_txn_num_per_db).
+   This is because the concurrent imports into the same database exceed 100. It can be solved by adjusting the parameter `max_running_txn_num_per_db` in `fe.conf`. For specific details, please refer to [max_running_txn_num_per_db](../admin-manual/config/fe-config#max_running_txn_num_per_db).
    Meanwhile, frequently modifying the label and restarting a task may also lead to this error. In the 2pc scenario (for Duplicate/Aggregate models), the label of each task needs to be unique. And when restarting from a checkpoint, the Flink task will actively abort the transactions that have been pre-committed successfully but not yet committed. Frequent label modifications and restarts will result in a large number of pre-committed successful transactions that cannot be aborted and thu [...]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]