This is an automated email from the ASF dual-hosted git repository.

morningman pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris-website.git
The following commit(s) were added to refs/heads/master by this push:
     new a174a63e9ee [doc](automq) add AutoMQ related docs (#668)
a174a63e9ee is described below

commit a174a63e9ee513e546c3bdd1bafe88b218ec3e65
Author: KamiWan <kaiming....@automq.com>
AuthorDate: Tue May 28 13:00:55 2024 +0800

    [doc](automq) add AutoMQ related docs (#668)

    Signed-off-by: Kami Wan <kamiwan@iMac-Pro.local>
---
 docs/data-operate/import/load-manual.md            |  17 ++--
 docs/ecosystem/automq-load.md                      | 110 +++++++++++++++++++++
 .../current/ecosystem/automq-load.md               | 109 ++++++++++++++++++++
 .../version-1.2/data-operate/import/load-manual.md |  17 ++--
 .../version-1.2/ecosystem/automq-load.md           | 109 ++++++++++++++++++++
 .../version-2.0/data-operate/import/load-manual.md |  17 ++--
 .../version-2.0/ecosystem/automq-load.md           | 109 ++++++++++++++++++++
 .../version-2.1/data-operate/import/load-manual.md |   1 +
 .../version-2.1/ecosystem/automq-load.md           | 108 ++++++++++++++++++++
 sidebars.json                                      |   1 +
 .../images/automq/automq_storage_architecture.png  | Bin 0 -> 222695 bytes
 .../version-1.2/ecosystem/automq-load.md           | 110 +++++++++++++++++++++
 .../version-2.0/ecosystem/automq-load.md           | 110 +++++++++++++++++++++
 .../version-2.1/ecosystem/automq-load.md           | 110 +++++++++++++++++++++
 versioned_sidebars/version-1.2-sidebars.json       |   1 +
 versioned_sidebars/version-2.0-sidebars.json       |   1 +
 versioned_sidebars/version-2.1-sidebars.json       |   1 +
 17 files changed, 907 insertions(+), 24 deletions(-)

diff --git a/docs/data-operate/import/load-manual.md b/docs/data-operate/import/load-manual.md
index 994a6f5e345..06f51cd607d 100644
--- a/docs/data-operate/import/load-manual.md
+++ b/docs/data-operate/import/load-manual.md
@@ -32,14 +32,15 @@ Doris provides a variety of data import solutions, and you can choose different

 ### By Scene

-| Data Source                          | Loading Method                                               |
-| ------------------------------------ | ------------------------------------------------------------ |
-| Object Storage (s3), HDFS            | [Loading data using Broker](./broker-load-manual)            |
-| Local file                           | [Loading local data](./stream-load-manual)                   |
-| Kafka                                | [Subscribing to Kafka data](./routine-load-manual)           |
-| MySQL, PostgreSQL, Oracle, SQLServer | [Sync data via external table](./mysql-load-manual)          |
-| Loading via JDBC                     | [Sync data using JDBC](../../lakehouse/database/jdbc)        |
-| Loading JSON format data             | [JSON format data Loading](./load-json-format)               |
+| Data Source                          | Loading Method                                         |
+| ------------------------------------ |--------------------------------------------------------|
+| Object Storage (s3), HDFS            | [Loading data using Broker](./broker-load-manual)      |
+| Local file                           | [Loading local data](./stream-load-manual)             |
+| Kafka                                | [Subscribing to Kafka data](./routine-load-manual)     |
+| MySQL, PostgreSQL, Oracle, SQLServer | [Sync data via external table](./mysql-load-manual)    |
+| Loading via JDBC                     | [Sync data using JDBC](../../lakehouse/database/jdbc)  |
+| Loading JSON format data             | [JSON format data Loading](./load-json-format)         |
+| AutoMQ                               | [AutoMQ Load](../../ecosystem/automq-load.md)          |

 ### By Loading Method

diff --git a/docs/ecosystem/automq-load.md b/docs/ecosystem/automq-load.md
new file mode 100644
index 00000000000..b395052529f
--- /dev/null
+++ b/docs/ecosystem/automq-load.md
@@ -0,0 +1,110 @@
+---
+{
+    "title": "AutoMQ Load",
+    "language": "en"
+}
+---
+
+[AutoMQ](https://github.com/AutoMQ/automq) is a cloud-native fork of Kafka that separates storage into object storage such as S3. It remains 100% compatible with Apache Kafka® while offering up to 10x cost savings and 100x elasticity. Through its innovative shared storage architecture, it achieves capabilities such as partition reassignment in seconds, traffic self-balancing, and auto-scaling in seconds, all while ensuring high throughput and low latency.
+
+
+This article explains how to use Apache Doris Routine Load to import data from AutoMQ into Doris. For more details on Routine Load, please refer to the [Routine Load](https://doris.apache.org/docs/data-operate/import/routine-load-manual/) document.
+
+## Environment Preparation
+### Prepare Apache Doris and Test Data
+
+Ensure that a working Apache Doris cluster is already set up. For demonstration purposes, we have deployed a test Apache Doris environment on Linux following the [Deploying with Docker](https://doris.apache.org/docs/install/cluster-deployment/run-docker-cluster) document.
+Create a database and a test table:
+```
+create database automq_db;
+CREATE TABLE automq_db.users (
+    id bigint NOT NULL,
+    name string NOT NULL,
+    timestamp string NULL,
+    status string NULL
+) DISTRIBUTED BY hash (id) PROPERTIES ('replication_num' = '1');
+```
+
+### Prepare Kafka Command Line Tools
+
+Download the latest TGZ package from [AutoMQ Releases](https://github.com/AutoMQ/automq) and extract it. Assuming the extraction directory is $AUTOMQ_HOME, this article uses the scripts under $AUTOMQ_HOME/bin to create topics and generate test data.
+
+### Prepare AutoMQ and Test Data
+
+Refer to the AutoMQ [official deployment documentation](https://docs.automq.com/docs/automq-opensource/EvqhwAkpriAomHklOUzcUtybn7g) to deploy a functional cluster, ensuring network connectivity between AutoMQ and Apache Doris.
+Quickly create a topic named example_topic in AutoMQ and write a test JSON record to it by following these steps.
+
+**Create Topic**
+
+Use the Apache Kafka® command line tools shipped with AutoMQ to create the topic, ensuring that you have access to a Kafka environment and that the Kafka service is running. Here is an example command to create a topic:
+```
+$AUTOMQ_HOME/bin/kafka-topics.sh --create --topic example_topic --bootstrap-server 127.0.0.1:9092 --partitions 1 --replication-factor 1
+```
+> Tips: When executing the command, replace `topic` and `bootstrap-server` with the actual topic name and AutoMQ Bootstrap Server address.
+
+After creating the topic, you can use the following command to verify that the topic has been created successfully.
+```
+$AUTOMQ_HOME/bin/kafka-topics.sh --describe --topic example_topic --bootstrap-server 127.0.0.1:9092
+```
+
+**Generate test data**
+
+Create a JSON-formatted test record that corresponds to the table defined earlier.
+```
+{
+    "id": 1,
+    "name": "testuser",
+    "timestamp": "2023-11-10T12:00:00",
+    "status": "active"
+}
+```
+**Write test data**
+
+Use Kafka's command-line tools or a programmatic client to write the test data to the topic named `example_topic`. Below is an example using the command-line tool:
+```
+echo '{"id": 1, "name": "testuser", "timestamp": "2023-11-10T12:00:00", "status": "active"}' | $AUTOMQ_HOME/bin/kafka-console-producer.sh --broker-list 127.0.0.1:9092 --topic example_topic
+```
+To view the data just written to the topic, use the following command:
+```
+sh $AUTOMQ_HOME/bin/kafka-console-consumer.sh --bootstrap-server 127.0.0.1:9092 --topic example_topic --from-beginning
+```
+> Tips: When executing the command, replace `topic` and `bootstrap-server` with the actual topic name and AutoMQ Bootstrap Server address.
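+
+For the programmatic route mentioned above, the following is a minimal producer sketch. It assumes the third-party kafka-python package (`pip install kafka-python`) and the broker address used throughout this article; since AutoMQ is Kafka-compatible, any Kafka client library can be used in the same way.
+```
+# Minimal producer sketch (illustrative only).
+# Assumes kafka-python is installed and the AutoMQ Bootstrap Server
+# is reachable at 127.0.0.1:9092; replace with your actual address.
+import json
+from kafka import KafkaProducer
+
+producer = KafkaProducer(
+    bootstrap_servers="127.0.0.1:9092",
+    # Serialize dicts to UTF-8 JSON bytes, matching the table schema above.
+    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
+)
+producer.send("example_topic", {
+    "id": 1,
+    "name": "testuser",
+    "timestamp": "2023-11-10T12:00:00",
+    "status": "active",
+})
+producer.flush()  # block until the record is acknowledged
+producer.close()
+```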
+
+## Create a Routine Load import job
+
+In the Apache Doris command line, create a Routine Load job that accepts JSON data to continuously import data from an AutoMQ Kafka topic. For detailed parameter information of Routine Load, please refer to [Doris Routine Load](https://doris.apache.org/docs/data-operate/import/routine-load-manual/).
+```
+CREATE ROUTINE LOAD automq_example_load ON users
+COLUMNS(id, name, timestamp, status)
+PROPERTIES
+(
+    "format" = "json",
+    "jsonpaths" = "[\"$.id\",\"$.name\",\"$.timestamp\",\"$.status\"]"
+)
+FROM KAFKA
+(
+    "kafka_broker_list" = "127.0.0.1:9092",
+    "kafka_topic" = "example_topic",
+    "property.kafka_default_offsets" = "OFFSET_BEGINNING"
+);
+```
+> Tips: When executing the command, replace `kafka_broker_list` with the actual AutoMQ Bootstrap Server address.
+
+## Verify data import
+
+First, check the status of the Routine Load import job to ensure that the task is running.
+```
+show routine load\G
+```
+Then query the relevant table in the Apache Doris database, and you will see that the data has been imported successfully.
+```
+select * from users;
++------+----------+---------------------+--------+
+| id   | name     | timestamp           | status |
++------+----------+---------------------+--------+
+|    1 | testuser | 2023-11-10T12:00:00 | active |
+|    2 | testuser | 2023-11-10T12:00:00 | active |
++------+----------+---------------------+--------+
+2 rows in set (0.01 sec)
+```
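+
+If the job reports a state other than RUNNING (for example, PAUSED after repeated parse errors), it can be managed with the statements below. This is a sketch based on the Doris Routine Load manual linked above, using the job name created earlier:
+```
+-- Pause the job; it keeps its consumed offsets.
+PAUSE ROUTINE LOAD FOR automq_example_load;
+-- Resume it once the cause has been fixed.
+RESUME ROUTINE LOAD FOR automq_example_load;
+-- Permanently stop and remove the job.
+STOP ROUTINE LOAD FOR automq_example_load;
+```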
"timestamp": "2023-11-10T12:00:00", "status": "active"}' | sh kafka-console-producer.sh --broker-list 127.0.0.1:9092 --topic example_topic +``` +使用如下命令可以查看刚写入的 topic 数据: +``` +sh $AUTOMQ_HOME/bin/kafka-console-consumer.sh --bootstrap-server 127.0.0.1:9092 --topic example_topic --from-beginning +``` +> 注意:在执行命令时,需要将 topic 和 bootstarp-server 替换为实际使用的 AutoMQ Bootstarp Server 地址。 + +## 创建 Routine Load 导入作业 + +在 Apache Doris 的命令行中创建一个接收 JSON 数据的 Routine Load 作业,用来持续导入 AutoMQ Kafka topic 中的数据。具体 Routine Load 的参数说明请参考 [Doris Routine Load](https://doris.apache.org/zh-CN/docs/data-operate/import/routine-load-manual)。 +``` +CREATE ROUTINE LOAD automq_example_load ON users +COLUMNS(id, name, timestamp, status) +PROPERTIES +( + "format" = "json", + "jsonpaths" = "[\"$.id\",\"$.name\",\"$.timestamp\",\"$.status\"]" + ) +FROM KAFKA +( + "kafka_broker_list" = "127.0.0.1:9092", + "kafka_topic" = "example_topic", + "property.kafka_default_offsets" = "OFFSET_BEGINNING" +); +``` +> 注意:在执行命令时,需要将 kafka_broker_list 替换为实际使用的 AutoMQ Bootstarp Server 地址。 + +## 验证数据导入 + +首先,检查 Routine Load 导入作业的状态,确保任务正在运行中。 +``` +show routine load\G; +``` +然后查询 Apache Doris 数据库中的相关表,可以看到数据已经被成功导入。 +``` +select * from users; ++------+--------------+---------------------+--------+ +| id | name | timestamp | status | ++------+--------------+---------------------+--------+ +| 1 | 测试用户 | 2023-11-10T12:00:00 | active | +| 2 | 测试用户 | 2023-11-10T12:00:00 | active | ++------+--------------+---------------------+--------+ +2 rows in set (0.01 sec) +``` + diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-1.2/data-operate/import/load-manual.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-1.2/data-operate/import/load-manual.md index 122d71c8c2e..077044c9805 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-1.2/data-operate/import/load-manual.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-1.2/data-operate/import/load-manual.md @@ -33,14 +33,15 @@ Doris 提供多种数据导入方案,可以针对不同的数据源进行选 ### 按场景划分 | 数据源 | 导入方式 | -| ------------------------------------ | ------------------------------------------------------------ | -| 对象存储(s3),HDFS | [使用Broker导入数据](./import-scenes/external-storage-load.md) | -| 本地文件 | [导入本地数据](./import-scenes/local-file-load.md) | -| Kafka | [订阅Kafka数据](./import-scenes/kafka-load.md) | -| Mysql、PostgreSQL,Oracle,SQLServer | [通过外部表同步数据](./import-scenes/external-table-load.md) | -| 通过JDBC导入 | [使用JDBC同步数据](./import-scenes/jdbc-load.md) | -| 导入JSON格式数据 | [JSON格式数据导入](./import-way/load-json-format.md) | -| MySQL Binlog | [Binlog Load](./import-way/binlog-load-manual.md) | +|-----------------------------------|----------------------------------------------------------| +| 对象存储(s3),HDFS | [使用Broker导入数据](./import-scenes/external-storage-load.md) | +| 本地文件 | [导入本地数据](./import-scenes/local-file-load.md) | +| Kafka | [订阅Kafka数据](./import-scenes/kafka-load.md) | +| Mysql、PostgreSQL,Oracle,SQLServer | [通过外部表同步数据](./import-scenes/external-table-load.md) | +| 通过JDBC导入 | [使用JDBC同步数据](./import-scenes/jdbc-load.md) | +| 导入JSON格式数据 | [JSON格式数据导入](./import-way/load-json-format.md) | +| MySQL Binlog | [Binlog Load](./import-way/binlog-load-manual.md) | +| AutoMQ | [AutoMQ Load](../../ecosystem/automq-load.md) | ### 按导入方式划分 diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-1.2/ecosystem/automq-load.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-1.2/ecosystem/automq-load.md new file mode 100644 index 00000000000..bebbc21d841 --- /dev/null +++ 
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-1.2/ecosystem/automq-load.md
@@ -0,0 +1,109 @@
+---
+{
+    "title": "AutoMQ Load",
+    "language": "zh-CN"
+}
+
+---
+
+
+[AutoMQ](https://github.com/AutoMQ/automq) is a cloud-native Kafka redesigned for the cloud. By separating storage into object storage, it delivers up to 10x cost savings and 100x elasticity while remaining 100% compatible with Apache Kafka. Through its innovative shared storage architecture, it achieves partition reassignment in seconds, traffic self-balancing, and auto-scaling in seconds while maintaining high throughput and low latency.
+
+
+
+## Environment Preparation
+### Prepare Apache Doris and Test Data
+
+Ensure that a working Apache Doris cluster is available. For demonstration purposes, we followed the [Deploying with Docker](https://doris.apache.org/zh-CN/docs/install/cluster-deployment/run-docker-cluster) document to deploy a test Apache Doris environment on Linux.
+Create a database and a test table:
+```
+create database automq_db;
+CREATE TABLE automq_db.users (
+    id bigint NOT NULL,
+    name string NOT NULL,
+    timestamp string NULL,
+    status string NULL
+) DISTRIBUTED BY hash (id) PROPERTIES ('replication_num' = '1');
+```
+### Prepare Kafka Command Line Tools
+
+Download the latest TGZ package from [AutoMQ Releases](https://github.com/AutoMQ/automq) and extract it. Assuming the extraction directory is $AUTOMQ_HOME, this article uses the scripts under $AUTOMQ_HOME/bin to create topics and generate test data.
+
+### Prepare AutoMQ and Test Data
+
+Refer to the AutoMQ [official deployment documentation](https://docs.automq.com/docs/automq-opensource/EvqhwAkpriAomHklOUzcUtybn7g) to deploy a functional cluster, ensuring network connectivity between AutoMQ and Apache Doris.
+Quickly create a topic named example_topic in AutoMQ and write a test JSON record to it by following these steps.
+
+**Create Topic**
+
+Use the Apache Kafka command line tools to create the topic, ensuring that you have access to a Kafka environment and that the Kafka service is running. Here is an example command to create a topic:
+```
+$AUTOMQ_HOME/bin/kafka-topics.sh --create --topic example_topic --bootstrap-server 127.0.0.1:9092 --partitions 1 --replication-factor 1
+```
+When executing the command, replace `topic` and `bootstrap-server` with the actual topic name and AutoMQ Bootstrap Server address.
+After creating the topic, you can use the following command to verify that it has been created successfully.
+```
+$AUTOMQ_HOME/bin/kafka-topics.sh --describe --topic example_topic --bootstrap-server 127.0.0.1:9092
+```
+**Generate test data**
+
+Generate a JSON-formatted test record that corresponds to the table defined earlier.
+```
+{
+    "id": 1,
+    "name": "测试用户",
+    "timestamp": "2023-11-10T12:00:00",
+    "status": "active"
+}
+```
+**Write test data**
+
+Use Kafka's command-line tools or a programmatic client to write the test data to the topic named example_topic. Below is an example using the command-line tool:
+```
+echo '{"id": 1, "name": "测试用户", "timestamp": "2023-11-10T12:00:00", "status": "active"}' | $AUTOMQ_HOME/bin/kafka-console-producer.sh --broker-list 127.0.0.1:9092 --topic example_topic
+```
+Use the following command to view the data just written to the topic:
+```
+sh $AUTOMQ_HOME/bin/kafka-console-consumer.sh --bootstrap-server 127.0.0.1:9092 --topic example_topic --from-beginning
+```
+> Note: When executing the command, replace `topic` and `bootstrap-server` with the actual topic name and AutoMQ Bootstrap Server address.
+
+## Create a Routine Load import job
+
+In the Apache Doris command line, create a Routine Load job that accepts JSON data to continuously import data from the AutoMQ Kafka topic. For detailed Routine Load parameters, please refer to [Doris Routine Load](https://doris.apache.org/zh-CN/docs/data-operate/import/routine-load-manual).
+```
+CREATE ROUTINE LOAD automq_example_load ON users
+COLUMNS(id, name, timestamp, status)
+PROPERTIES
+(
+    "format" = "json",
+    "jsonpaths" = "[\"$.id\",\"$.name\",\"$.timestamp\",\"$.status\"]"
+)
+FROM KAFKA
+(
+    "kafka_broker_list" = "127.0.0.1:9092",
+    "kafka_topic" = "example_topic",
+    "property.kafka_default_offsets" = "OFFSET_BEGINNING"
+);
+```
+> Note: When executing the command, replace kafka_broker_list with the actual AutoMQ Bootstrap Server address.
+
+## Verify data import
+
+First, check the status of the Routine Load import job to ensure that the task is running.
+```
+show routine load\G
+```
+Then query the relevant table in the Apache Doris database, and you will see that the data has been imported successfully.
+```
+select * from users;
++------+--------------+---------------------+--------+
+| id   | name         | timestamp           | status |
++------+--------------+---------------------+--------+
+|    1 | 测试用户     | 2023-11-10T12:00:00 | active |
+|    2 | 测试用户     | 2023-11-10T12:00:00 | active |
++------+--------------+---------------------+--------+
+2 rows in set (0.01 sec)
+```
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/data-operate/import/load-manual.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/data-operate/import/load-manual.md
index a4d8da64049..32435eda2e9 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/data-operate/import/load-manual.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/data-operate/import/load-manual.md
@@ -32,14 +32,15 @@ Doris provides multiple data import solutions; you can choose an import method for each data source

 ### By Scenario

-| Data Source                          | Import Method                                                                           |
-| ------------------------------------ | ------------------------------------------------------------ |
-| Object storage (s3), HDFS            | [Loading data using Broker](../import/broker-load-manual)                               |
-| Local file                           | [Stream Load](../import/stream-load-manual), [MySQL Load](../import/mysql-load-manual)  |
-| Kafka                                | [Subscribing to Kafka data](../import/routine-load-manual)                              |
-| Mysql, PostgreSQL, Oracle, SQLServer | [Sync data via external table](../import/insert-into-manual)                            |
-| Loading via JDBC                     | [Sync data using JDBC](../../lakehouse/database/jdbc)                                   |
-| Loading JSON format data             | [JSON format data loading](../import/load-json-format)                                  |
+| Data Source                          | Import Method                                                                           |
+| ------------------------------------ |------------------------------------------------------------------------------------------|
+| Object storage (s3), HDFS            | [Loading data using Broker](../import/broker-load-manual)                               |
+| Local file                           | [Stream Load](../import/stream-load-manual), [MySQL Load](../import/mysql-load-manual)  |
+| Kafka                                | [Subscribing to Kafka data](../import/routine-load-manual)                              |
+| Mysql, PostgreSQL, Oracle, SQLServer | [Sync data via external table](../import/insert-into-manual)                            |
+| Loading via JDBC                     | [Sync data using JDBC](../../lakehouse/database/jdbc)                                   |
+| Loading JSON format data             | [JSON format data loading](../import/load-json-format)                                  |
+| AutoMQ                               | [Subscribing to AutoMQ data](../../ecosystem/automq-load.md)                            |

 ### By Loading Method

diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/ecosystem/automq-load.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/ecosystem/automq-load.md
new file mode 100644
index 00000000000..e01074386eb
--- /dev/null
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/ecosystem/automq-load.md
@@ -0,0 +1,109 @@
+---
+{
+    "title": "AutoMQ Data Import",
+    "language": "zh-CN"
+}
+
+---
+
+
+[AutoMQ](https://github.com/AutoMQ/automq) is a cloud-native Kafka redesigned for the cloud. By separating storage into object storage, it delivers up to 10x cost savings and 100x elasticity while remaining 100% compatible with Apache Kafka. Through its innovative shared storage architecture, it achieves partition reassignment in seconds, traffic self-balancing, and auto-scaling in seconds while maintaining high throughput and low latency.
+
+
+
+## Environment Preparation
+### Prepare Apache Doris and Test Data
+
+Ensure that a working Apache Doris cluster is available. For demonstration purposes, we followed the [Deploying with Docker](https://doris.apache.org/zh-CN/docs/install/cluster-deployment/run-docker-cluster) document to deploy a test Apache Doris environment on Linux.
+Create a database and a test table:
+```
+create database automq_db;
+CREATE TABLE automq_db.users (
+    id bigint NOT NULL,
+    name string NOT NULL,
+    timestamp string NULL,
+    status string NULL
+) DISTRIBUTED BY hash (id) PROPERTIES ('replication_num' = '1');
+```
+### Prepare Kafka Command Line Tools
+
+Download the latest TGZ package from [AutoMQ Releases](https://github.com/AutoMQ/automq) and extract it. Assuming the extraction directory is $AUTOMQ_HOME, this article uses the scripts under $AUTOMQ_HOME/bin to create topics and generate test data.
+
+### Prepare AutoMQ and Test Data
+
+Refer to the AutoMQ [official deployment documentation](https://docs.automq.com/docs/automq-opensource/EvqhwAkpriAomHklOUzcUtybn7g) to deploy a functional cluster, ensuring network connectivity between AutoMQ and Apache Doris.
+Quickly create a topic named example_topic in AutoMQ and write a test JSON record to it by following these steps.
+
+**Create Topic**
+
+Use the Apache Kafka command line tools to create the topic, ensuring that you have access to a Kafka environment and that the Kafka service is running. Here is an example command to create a topic:
+```
+$AUTOMQ_HOME/bin/kafka-topics.sh --create --topic example_topic --bootstrap-server 127.0.0.1:9092 --partitions 1 --replication-factor 1
+```
+When executing the command, replace `topic` and `bootstrap-server` with the actual topic name and AutoMQ Bootstrap Server address.
+After creating the topic, you can use the following command to verify that it has been created successfully.
+```
+$AUTOMQ_HOME/bin/kafka-topics.sh --describe --topic example_topic --bootstrap-server 127.0.0.1:9092
+```
+**Generate test data**
+
+Generate a JSON-formatted test record that corresponds to the table defined earlier.
+```
+{
+    "id": 1,
+    "name": "测试用户",
+    "timestamp": "2023-11-10T12:00:00",
+    "status": "active"
+}
+```
+**Write test data**
+
+Use Kafka's command-line tools or a programmatic client to write the test data to the topic named example_topic. Below is an example using the command-line tool:
+```
+echo '{"id": 1, "name": "测试用户", "timestamp": "2023-11-10T12:00:00", "status": "active"}' | $AUTOMQ_HOME/bin/kafka-console-producer.sh --broker-list 127.0.0.1:9092 --topic example_topic
+```
+Use the following command to view the data just written to the topic:
+```
+sh $AUTOMQ_HOME/bin/kafka-console-consumer.sh --bootstrap-server 127.0.0.1:9092 --topic example_topic --from-beginning
+```
+> Note: When executing the command, replace `topic` and `bootstrap-server` with the actual topic name and AutoMQ Bootstrap Server address.
+
+## Create a Routine Load import job
+
+In the Apache Doris command line, create a Routine Load job that accepts JSON data to continuously import data from the AutoMQ Kafka topic. For detailed Routine Load parameters, please refer to [Doris Routine Load](https://doris.apache.org/zh-CN/docs/data-operate/import/routine-load-manual).
+```
+CREATE ROUTINE LOAD automq_example_load ON users
+COLUMNS(id, name, timestamp, status)
+PROPERTIES
+(
+    "format" = "json",
+    "jsonpaths" = "[\"$.id\",\"$.name\",\"$.timestamp\",\"$.status\"]"
+)
+FROM KAFKA
+(
+    "kafka_broker_list" = "127.0.0.1:9092",
+    "kafka_topic" = "example_topic",
+    "property.kafka_default_offsets" = "OFFSET_BEGINNING"
+);
+```
+> Note: When executing the command, replace kafka_broker_list with the actual AutoMQ Bootstrap Server address.
+
+## Verify data import
+
+First, check the status of the Routine Load import job to ensure that the task is running.
+```
+show routine load\G
+```
+Then query the relevant table in the Apache Doris database, and you will see that the data has been imported successfully.
+```
+select * from users;
++------+--------------+---------------------+--------+
+| id   | name         | timestamp           | status |
++------+--------------+---------------------+--------+
+|    1 | 测试用户     | 2023-11-10T12:00:00 | active |
+|    2 | 测试用户     | 2023-11-10T12:00:00 | active |
++------+--------------+---------------------+--------+
+2 rows in set (0.01 sec)
+```

diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/data-operate/import/load-manual.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/data-operate/import/load-manual.md
index 187c3ff01ce..bfb01b14bd9 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/data-operate/import/load-manual.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/data-operate/import/load-manual.md
@@ -40,6 +40,7 @@ Doris provides multiple data import solutions; you can choose an import method for each data source
 | Mysql, PostgreSQL, Oracle, SQLServer | [Sync data via external table](./insert-into-manual)  |
 | Loading via JDBC                     | [Sync data using JDBC](../../lakehouse/database/jdbc) |
 | Loading JSON format data             | [JSON format data loading](./load-json-format)        |
+| AutoMQ                               | [AutoMQ Load](../../ecosystem/automq-load.md)         |

 ### By Loading Method

diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/ecosystem/automq-load.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/ecosystem/automq-load.md
new file mode 100644
index 00000000000..3a1e121c06e
--- /dev/null
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/ecosystem/automq-load.md
@@ -0,0 +1,108 @@
+---
+{
+    "title": "AutoMQ Data Import",
+    "language": "zh-CN"
+}
+
+---
+
+
+[AutoMQ](https://github.com/AutoMQ/automq) is a cloud-native Kafka redesigned for the cloud. By separating storage into object storage, it delivers up to 10x cost savings and 100x elasticity while remaining 100% compatible with Apache Kafka. Through its innovative shared storage architecture, it achieves partition reassignment in seconds, traffic self-balancing, and auto-scaling in seconds while maintaining high throughput and low latency.
+
+
+## Environment Preparation
+### Prepare Apache Doris and Test Data
+
+Ensure that a working Apache Doris cluster is available. For demonstration purposes, we followed the [Deploying with Docker](https://doris.apache.org/zh-CN/docs/install/cluster-deployment/run-docker-cluster) document to deploy a test Apache Doris environment on Linux.
+Create a database and a test table:
+```
+create database automq_db;
+CREATE TABLE automq_db.users (
+    id bigint NOT NULL,
+    name string NOT NULL,
+    timestamp string NULL,
+    status string NULL
+) DISTRIBUTED BY hash (id) PROPERTIES ('replication_num' = '1');
+```
+### Prepare AutoMQ Command Line Tools
+
+Download the latest TGZ package from [AutoMQ Releases](https://github.com/AutoMQ/automq) and extract it. Assuming the extraction directory is $AUTOMQ_HOME, this article uses the scripts under $AUTOMQ_HOME/bin to create topics and generate test data.
+
+### Prepare AutoMQ and Test Data
+
+Refer to the AutoMQ [official deployment documentation](https://docs.automq.com/docs/automq-opensource/EvqhwAkpriAomHklOUzcUtybn7g) to deploy a functional cluster, ensuring network connectivity between AutoMQ and Apache Doris.
+Quickly create a topic named example_topic in AutoMQ and write a test JSON record to it by following these steps.
+
+**Create Topic**
+
+Use the Apache Kafka command line tools to create the topic, ensuring that you have access to a Kafka environment and that the Kafka service is running. Here is an example command to create a topic:
+```
+$AUTOMQ_HOME/bin/kafka-topics.sh --create --topic example_topic --bootstrap-server 127.0.0.1:9092 --partitions 1 --replication-factor 1
+```
+When executing the command, replace `topic` and `bootstrap-server` with the actual topic name and AutoMQ Bootstrap Server address.
+After creating the topic, you can use the following command to verify that it has been created successfully.
+```
+$AUTOMQ_HOME/bin/kafka-topics.sh --describe --topic example_topic --bootstrap-server 127.0.0.1:9092
+```
+**Generate test data**
+
+Generate a JSON-formatted test record that corresponds to the table defined earlier.
+```
+{
+    "id": 1,
+    "name": "测试用户",
+    "timestamp": "2023-11-10T12:00:00",
+    "status": "active"
+}
+```
+**Write test data**
+
+Use Kafka's command-line tools or a programmatic client to write the test data to the topic named example_topic. Below is an example using the command-line tool:
+```
+echo '{"id": 1, "name": "测试用户", "timestamp": "2023-11-10T12:00:00", "status": "active"}' | $AUTOMQ_HOME/bin/kafka-console-producer.sh --broker-list 127.0.0.1:9092 --topic example_topic
+```
+Use the following command to view the data just written to the topic:
+```
+sh $AUTOMQ_HOME/bin/kafka-console-consumer.sh --bootstrap-server 127.0.0.1:9092 --topic example_topic --from-beginning
+```
+> Note: When executing the command, replace `topic` and `bootstrap-server` with the actual topic name and AutoMQ Bootstrap Server address.
+
+## Create a Routine Load import job
+
+In the Apache Doris command line, create a Routine Load job that accepts JSON data to continuously import data from the AutoMQ Kafka topic. For detailed Routine Load parameters, please refer to [Doris Routine Load](https://doris.apache.org/zh-CN/docs/data-operate/import/routine-load-manual).
+```
+CREATE ROUTINE LOAD automq_example_load ON users
+COLUMNS(id, name, timestamp, status)
+PROPERTIES
+(
+    "format" = "json",
+    "jsonpaths" = "[\"$.id\",\"$.name\",\"$.timestamp\",\"$.status\"]"
+)
+FROM KAFKA
+(
+    "kafka_broker_list" = "127.0.0.1:9092",
+    "kafka_topic" = "example_topic",
+    "property.kafka_default_offsets" = "OFFSET_BEGINNING"
+);
+```
+> Note: When executing the command, replace kafka_broker_list with the actual AutoMQ Bootstrap Server address.
+
+## Verify data import

+First, check the status of the Routine Load import job to ensure that the task is running.
+```
+show routine load\G
+```
+Then query the relevant table in the Apache Doris database, and you will see that the data has been imported successfully.
+```
+select * from users;
++------+--------------+---------------------+--------+
+| id   | name         | timestamp           | status |
++------+--------------+---------------------+--------+
+|    1 | 测试用户     | 2023-11-10T12:00:00 | active |
+|    2 | 测试用户     | 2023-11-10T12:00:00 | active |
++------+--------------+---------------------+--------+
+2 rows in set (0.01 sec)
+```

diff --git a/sidebars.json b/sidebars.json
index f73b0a51988..de5351eba98 100644
--- a/sidebars.json
+++ b/sidebars.json
@@ -493,6 +493,7 @@
                 "ecosystem/logstash",
                 "ecosystem/beats",
                 "ecosystem/cloudcanal",
+                "ecosystem/automq-load",
                 "ecosystem/doris-streamloader",
                 "ecosystem/hive-bitmap-udf",
                 "ecosystem/hive-hll-udf",

diff --git a/static/images/automq/automq_storage_architecture.png b/static/images/automq/automq_storage_architecture.png
new file mode 100644
index 00000000000..c0239c15d2d
Binary files /dev/null and b/static/images/automq/automq_storage_architecture.png differ

diff --git a/versioned_docs/version-1.2/ecosystem/automq-load.md b/versioned_docs/version-1.2/ecosystem/automq-load.md
new file mode 100644
index 00000000000..b395052529f
--- /dev/null
+++ b/versioned_docs/version-1.2/ecosystem/automq-load.md
@@ -0,0 +1,110 @@
+---
+{
+    "title": "AutoMQ Load",
+    "language": "en"
+}
+---
+
+[AutoMQ](https://github.com/AutoMQ/automq) is a cloud-native fork of Kafka that separates storage into object storage such as S3. It remains 100% compatible with Apache Kafka® while offering up to 10x cost savings and 100x elasticity. Through its innovative shared storage architecture, it achieves capabilities such as partition reassignment in seconds, traffic self-balancing, and auto-scaling in seconds, all while ensuring high throughput and low latency.
+
+
+This article explains how to use Apache Doris Routine Load to import data from AutoMQ into Doris. For more details on Routine Load, please refer to the [Routine Load](https://doris.apache.org/docs/data-operate/import/routine-load-manual/) document.
+
+## Environment Preparation
+### Prepare Apache Doris and Test Data
+
+Ensure that a working Apache Doris cluster is already set up. For demonstration purposes, we have deployed a test Apache Doris environment on Linux following the [Deploying with Docker](https://doris.apache.org/docs/install/cluster-deployment/run-docker-cluster) document.
+Create a database and a test table:
+```
+create database automq_db;
+CREATE TABLE automq_db.users (
+    id bigint NOT NULL,
+    name string NOT NULL,
+    timestamp string NULL,
+    status string NULL
+) DISTRIBUTED BY hash (id) PROPERTIES ('replication_num' = '1');
+```
+
+### Prepare Kafka Command Line Tools
+
+Download the latest TGZ package from [AutoMQ Releases](https://github.com/AutoMQ/automq) and extract it. Assuming the extraction directory is $AUTOMQ_HOME, this article uses the scripts under $AUTOMQ_HOME/bin to create topics and generate test data.
+
+### Prepare AutoMQ and Test Data
+
+Refer to the AutoMQ [official deployment documentation](https://docs.automq.com/docs/automq-opensource/EvqhwAkpriAomHklOUzcUtybn7g) to deploy a functional cluster, ensuring network connectivity between AutoMQ and Apache Doris.
+Quickly create a topic named example_topic in AutoMQ and write a test JSON record to it by following these steps.
+
+**Create Topic**
+
+Use the Apache Kafka® command line tools shipped with AutoMQ to create the topic, ensuring that you have access to a Kafka environment and that the Kafka service is running. Here is an example command to create a topic:
+```
+$AUTOMQ_HOME/bin/kafka-topics.sh --create --topic example_topic --bootstrap-server 127.0.0.1:9092 --partitions 1 --replication-factor 1
+```
+> Tips: When executing the command, replace `topic` and `bootstrap-server` with the actual topic name and AutoMQ Bootstrap Server address.
+
+After creating the topic, you can use the following command to verify that the topic has been created successfully.
+```
+$AUTOMQ_HOME/bin/kafka-topics.sh --describe --topic example_topic --bootstrap-server 127.0.0.1:9092
+```
+
+**Generate test data**
+
+Create a JSON-formatted test record that corresponds to the table defined earlier.
+```
+{
+    "id": 1,
+    "name": "testuser",
+    "timestamp": "2023-11-10T12:00:00",
+    "status": "active"
+}
+```
+**Write test data**
+
+Use Kafka's command-line tools or a programmatic client to write the test data to the topic named `example_topic`. Below is an example using the command-line tool:
+```
+echo '{"id": 1, "name": "testuser", "timestamp": "2023-11-10T12:00:00", "status": "active"}' | $AUTOMQ_HOME/bin/kafka-console-producer.sh --broker-list 127.0.0.1:9092 --topic example_topic
+```
+To view the data just written to the topic, use the following command:
+```
+sh $AUTOMQ_HOME/bin/kafka-console-consumer.sh --bootstrap-server 127.0.0.1:9092 --topic example_topic --from-beginning
+```
+> Tips: When executing the command, replace `topic` and `bootstrap-server` with the actual topic name and AutoMQ Bootstrap Server address.
+
+## Create a Routine Load import job
+
+In the Apache Doris command line, create a Routine Load job that accepts JSON data to continuously import data from an AutoMQ Kafka topic. For detailed parameter information of Routine Load, please refer to [Doris Routine Load](https://doris.apache.org/docs/data-operate/import/routine-load-manual/).
+```
+CREATE ROUTINE LOAD automq_example_load ON users
+COLUMNS(id, name, timestamp, status)
+PROPERTIES
+(
+    "format" = "json",
+    "jsonpaths" = "[\"$.id\",\"$.name\",\"$.timestamp\",\"$.status\"]"
+)
+FROM KAFKA
+(
+    "kafka_broker_list" = "127.0.0.1:9092",
+    "kafka_topic" = "example_topic",
+    "property.kafka_default_offsets" = "OFFSET_BEGINNING"
+);
+```
+> Tips: When executing the command, replace `kafka_broker_list` with the actual AutoMQ Bootstrap Server address.
+
+## Verify data import
+
+First, check the status of the Routine Load import job to ensure that the task is running.
+```
+show routine load\G
+```
+Then query the relevant table in the Apache Doris database, and you will see that the data has been imported successfully.
+```
+select * from users;
++------+----------+---------------------+--------+
+| id   | name     | timestamp           | status |
++------+----------+---------------------+--------+
+|    1 | testuser | 2023-11-10T12:00:00 | active |
+|    2 | testuser | 2023-11-10T12:00:00 | active |
++------+----------+---------------------+--------+
+2 rows in set (0.01 sec)
+```

diff --git a/versioned_docs/version-2.0/ecosystem/automq-load.md b/versioned_docs/version-2.0/ecosystem/automq-load.md
new file mode 100644
index 00000000000..b395052529f
--- /dev/null
+++ b/versioned_docs/version-2.0/ecosystem/automq-load.md
@@ -0,0 +1,110 @@
+---
+{
+    "title": "AutoMQ Load",
+    "language": "en"
+}
+---
+
+[AutoMQ](https://github.com/AutoMQ/automq) is a cloud-native fork of Kafka that separates storage into object storage such as S3. It remains 100% compatible with Apache Kafka® while offering up to 10x cost savings and 100x elasticity. Through its innovative shared storage architecture, it achieves capabilities such as partition reassignment in seconds, traffic self-balancing, and auto-scaling in seconds, all while ensuring high throughput and low latency.
+
+
+This article explains how to use Apache Doris Routine Load to import data from AutoMQ into Doris. For more details on Routine Load, please refer to the [Routine Load](https://doris.apache.org/docs/data-operate/import/routine-load-manual/) document.
+
+## Environment Preparation
+### Prepare Apache Doris and Test Data
+
+Ensure that a working Apache Doris cluster is already set up. For demonstration purposes, we have deployed a test Apache Doris environment on Linux following the [Deploying with Docker](https://doris.apache.org/docs/install/cluster-deployment/run-docker-cluster) document.
+Create a database and a test table:
+```
+create database automq_db;
+CREATE TABLE automq_db.users (
+    id bigint NOT NULL,
+    name string NOT NULL,
+    timestamp string NULL,
+    status string NULL
+) DISTRIBUTED BY hash (id) PROPERTIES ('replication_num' = '1');
+```
+
+### Prepare Kafka Command Line Tools
+
+Download the latest TGZ package from [AutoMQ Releases](https://github.com/AutoMQ/automq) and extract it. Assuming the extraction directory is $AUTOMQ_HOME, this article uses the scripts under $AUTOMQ_HOME/bin to create topics and generate test data.
+
+### Prepare AutoMQ and Test Data
+
+Refer to the AutoMQ [official deployment documentation](https://docs.automq.com/docs/automq-opensource/EvqhwAkpriAomHklOUzcUtybn7g) to deploy a functional cluster, ensuring network connectivity between AutoMQ and Apache Doris.
+Quickly create a topic named example_topic in AutoMQ and write a test JSON record to it by following these steps.
+
+**Create Topic**
+
+Use the Apache Kafka® command line tools shipped with AutoMQ to create the topic, ensuring that you have access to a Kafka environment and that the Kafka service is running. Here is an example command to create a topic:
+```
+$AUTOMQ_HOME/bin/kafka-topics.sh --create --topic example_topic --bootstrap-server 127.0.0.1:9092 --partitions 1 --replication-factor 1
+```
+> Tips: When executing the command, replace `topic` and `bootstrap-server` with the actual topic name and AutoMQ Bootstrap Server address.
+
+After creating the topic, you can use the following command to verify that the topic has been created successfully.
+```
+$AUTOMQ_HOME/bin/kafka-topics.sh --describe --topic example_topic --bootstrap-server 127.0.0.1:9092
+```
+
+**Generate test data**
+
+Create a JSON-formatted test record that corresponds to the table defined earlier.
+```
+{
+    "id": 1,
+    "name": "testuser",
+    "timestamp": "2023-11-10T12:00:00",
+    "status": "active"
+}
+```
+**Write test data**
+
+Use Kafka's command-line tools or a programmatic client to write the test data to the topic named `example_topic`. Below is an example using the command-line tool:
+```
+echo '{"id": 1, "name": "testuser", "timestamp": "2023-11-10T12:00:00", "status": "active"}' | $AUTOMQ_HOME/bin/kafka-console-producer.sh --broker-list 127.0.0.1:9092 --topic example_topic
+```
+To view the data just written to the topic, use the following command:
+```
+sh $AUTOMQ_HOME/bin/kafka-console-consumer.sh --bootstrap-server 127.0.0.1:9092 --topic example_topic --from-beginning
+```
+> Tips: When executing the command, replace `topic` and `bootstrap-server` with the actual topic name and AutoMQ Bootstrap Server address.
+
+## Create a Routine Load import job
+
+In the Apache Doris command line, create a Routine Load job that accepts JSON data to continuously import data from an AutoMQ Kafka topic. For detailed parameter information of Routine Load, please refer to [Doris Routine Load](https://doris.apache.org/docs/data-operate/import/routine-load-manual/).
+```
+CREATE ROUTINE LOAD automq_example_load ON users
+COLUMNS(id, name, timestamp, status)
+PROPERTIES
+(
+    "format" = "json",
+    "jsonpaths" = "[\"$.id\",\"$.name\",\"$.timestamp\",\"$.status\"]"
+)
+FROM KAFKA
+(
+    "kafka_broker_list" = "127.0.0.1:9092",
+    "kafka_topic" = "example_topic",
+    "property.kafka_default_offsets" = "OFFSET_BEGINNING"
+);
+```
+> Tips: When executing the command, replace `kafka_broker_list` with the actual AutoMQ Bootstrap Server address.
+
+## Verify data import
+
+First, check the status of the Routine Load import job to ensure that the task is running.
+```
+show routine load\G
+```
+Then query the relevant table in the Apache Doris database, and you will see that the data has been imported successfully.
+```
+select * from users;
++------+----------+---------------------+--------+
+| id   | name     | timestamp           | status |
++------+----------+---------------------+--------+
+|    1 | testuser | 2023-11-10T12:00:00 | active |
+|    2 | testuser | 2023-11-10T12:00:00 | active |
++------+----------+---------------------+--------+
+2 rows in set (0.01 sec)
+```

diff --git a/versioned_docs/version-2.1/ecosystem/automq-load.md b/versioned_docs/version-2.1/ecosystem/automq-load.md
new file mode 100644
index 00000000000..b395052529f
--- /dev/null
+++ b/versioned_docs/version-2.1/ecosystem/automq-load.md
@@ -0,0 +1,110 @@
+---
+{
+    "title": "AutoMQ Load",
+    "language": "en"
+}
+---
+
+[AutoMQ](https://github.com/AutoMQ/automq) is a cloud-native fork of Kafka that separates storage into object storage such as S3. It remains 100% compatible with Apache Kafka® while offering up to 10x cost savings and 100x elasticity. Through its innovative shared storage architecture, it achieves capabilities such as partition reassignment in seconds, traffic self-balancing, and auto-scaling in seconds, all while ensuring high throughput and low latency.
+
+
+This article explains how to use Apache Doris Routine Load to import data from AutoMQ into Doris. For more details on Routine Load, please refer to the [Routine Load](https://doris.apache.org/docs/data-operate/import/routine-load-manual/) document.
+
+## Environment Preparation
+### Prepare Apache Doris and Test Data
+
+Ensure that a working Apache Doris cluster is already set up. For demonstration purposes, we have deployed a test Apache Doris environment on Linux following the [Deploying with Docker](https://doris.apache.org/docs/install/cluster-deployment/run-docker-cluster) document.
+Create a database and a test table:
+```
+create database automq_db;
+CREATE TABLE automq_db.users (
+    id bigint NOT NULL,
+    name string NOT NULL,
+    timestamp string NULL,
+    status string NULL
+) DISTRIBUTED BY hash (id) PROPERTIES ('replication_num' = '1');
+```
+
+### Prepare Kafka Command Line Tools
+
+Download the latest TGZ package from [AutoMQ Releases](https://github.com/AutoMQ/automq) and extract it. Assuming the extraction directory is $AUTOMQ_HOME, this article uses the scripts under $AUTOMQ_HOME/bin to create topics and generate test data.
+
+### Prepare AutoMQ and Test Data
+
+Refer to the AutoMQ [official deployment documentation](https://docs.automq.com/docs/automq-opensource/EvqhwAkpriAomHklOUzcUtybn7g) to deploy a functional cluster, ensuring network connectivity between AutoMQ and Apache Doris.
+Quickly create a topic named example_topic in AutoMQ and write a test JSON record to it by following these steps.
+
+**Create Topic**
+
+Use the Apache Kafka® command line tools shipped with AutoMQ to create the topic, ensuring that you have access to a Kafka environment and that the Kafka service is running. Here is an example command to create a topic:
+```
+$AUTOMQ_HOME/bin/kafka-topics.sh --create --topic example_topic --bootstrap-server 127.0.0.1:9092 --partitions 1 --replication-factor 1
+```
+> Tips: When executing the command, replace `topic` and `bootstrap-server` with the actual topic name and AutoMQ Bootstrap Server address.
+
+After creating the topic, you can use the following command to verify that the topic has been created successfully.
+```
+$AUTOMQ_HOME/bin/kafka-topics.sh --describe --topic example_topic --bootstrap-server 127.0.0.1:9092
+```
+
+**Generate test data**
+
+Create a JSON-formatted test record that corresponds to the table defined earlier.
+```
+{
+    "id": 1,
+    "name": "testuser",
+    "timestamp": "2023-11-10T12:00:00",
+    "status": "active"
+}
+```
+**Write test data**
+
+Use Kafka's command-line tools or a programmatic client to write the test data to the topic named `example_topic`. Below is an example using the command-line tool:
+```
+echo '{"id": 1, "name": "testuser", "timestamp": "2023-11-10T12:00:00", "status": "active"}' | $AUTOMQ_HOME/bin/kafka-console-producer.sh --broker-list 127.0.0.1:9092 --topic example_topic
+```
+To view the data just written to the topic, use the following command:
+```
+sh $AUTOMQ_HOME/bin/kafka-console-consumer.sh --bootstrap-server 127.0.0.1:9092 --topic example_topic --from-beginning
+```
+> Tips: When executing the command, replace `topic` and `bootstrap-server` with the actual topic name and AutoMQ Bootstrap Server address.
+
+## Create a Routine Load import job
+
+In the Apache Doris command line, create a Routine Load job that accepts JSON data to continuously import data from an AutoMQ Kafka topic. For detailed parameter information of Routine Load, please refer to [Doris Routine Load](https://doris.apache.org/docs/data-operate/import/routine-load-manual/).
+```
+CREATE ROUTINE LOAD automq_example_load ON users
+COLUMNS(id, name, timestamp, status)
+PROPERTIES
+(
+    "format" = "json",
+    "jsonpaths" = "[\"$.id\",\"$.name\",\"$.timestamp\",\"$.status\"]"
+)
+FROM KAFKA
+(
+    "kafka_broker_list" = "127.0.0.1:9092",
+    "kafka_topic" = "example_topic",
+    "property.kafka_default_offsets" = "OFFSET_BEGINNING"
+);
+```
+> Tips: When executing the command, replace `kafka_broker_list` with the actual AutoMQ Bootstrap Server address.
+
+## Verify data import
+
+First, check the status of the Routine Load import job to ensure that the task is running.
+```
+show routine load\G
+```
+Then query the relevant table in the Apache Doris database, and you will see that the data has been imported successfully.
+```
+select * from users;
++------+----------+---------------------+--------+
+| id   | name     | timestamp           | status |
++------+----------+---------------------+--------+
+|    1 | testuser | 2023-11-10T12:00:00 | active |
+|    2 | testuser | 2023-11-10T12:00:00 | active |
++------+----------+---------------------+--------+
+2 rows in set (0.01 sec)
+```

diff --git a/versioned_sidebars/version-1.2-sidebars.json b/versioned_sidebars/version-1.2-sidebars.json
index fd11c31d590..bbd6a6faa6d 100644
--- a/versioned_sidebars/version-1.2-sidebars.json
+++ b/versioned_sidebars/version-1.2-sidebars.json
@@ -242,6 +242,7 @@
                 "ecosystem/plugin-development-manual",
                 "ecosystem/audit-plugin",
                 "ecosystem/cloudcanal",
+                "ecosystem/automq-load",
                 "ecosystem/hive-bitmap-udf",
                 {
                     "type": "category",

diff --git a/versioned_sidebars/version-2.0-sidebars.json b/versioned_sidebars/version-2.0-sidebars.json
index 7c320681c21..c5f509d9184 100644
--- a/versioned_sidebars/version-2.0-sidebars.json
+++ b/versioned_sidebars/version-2.0-sidebars.json
@@ -479,6 +479,7 @@
                 "ecosystem/logstash",
                 "ecosystem/beats",
                 "ecosystem/cloudcanal",
+                "ecosystem/automq-load",
                 "ecosystem/doris-streamloader",
                 "ecosystem/hive-bitmap-udf",
                 {

diff --git a/versioned_sidebars/version-2.1-sidebars.json b/versioned_sidebars/version-2.1-sidebars.json
index 36e763b7a30..ee2d63d30a2 100644
--- a/versioned_sidebars/version-2.1-sidebars.json
+++ b/versioned_sidebars/version-2.1-sidebars.json
@@ -490,6 +490,7 @@
                 "ecosystem/logstash",
                 "ecosystem/beats",
                 "ecosystem/cloudcanal",
+                "ecosystem/automq-load",
                 "ecosystem/doris-streamloader",
                 "ecosystem/hive-bitmap-udf",
                 "ecosystem/hive-hll-udf",

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org
For additional commands, e-mail: commits-h...@doris.apache.org