JNSimba commented on code in PR #3036:
URL: https://github.com/apache/doris-website/pull/3036#discussion_r2493181730
##########
i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/streaming-job.md:
##########
@@ -0,0 +1,262 @@
+---
+{
+ "title": "Continuous Import",
+ "language": "zh-CN"
+}
+---
+
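+## Overview
+
+Doris can create a continuous import task by combining a Job with a TVF. After the Job is submitted, Doris keeps running the import job, querying data from the TVF in real time and writing it into a Doris table.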
+
+## Supported TVFs
+
+[S3](../../sql-manual/sql-functions/table-valued-functions/s3.md) TVF
+
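+## How It Works
+
+### S3
+
+Doris traverses the files under the specified S3 directory, splits them into file lists, and writes them into the Doris table in small batches of files.
+
+**Incremental reading**
+
+After the task is created, Doris continuously reads data from the specified path and polls for new files at a fixed interval.
+
+Note: the name of a new file must be lexicographically greater than the name of the last imported file; otherwise Doris does not treat it as a new file. For example, files named file1, file2, and file3 are imported in order; if file0 is added afterwards, Doris will not import it, because it sorts lexicographically before the last imported file, file3.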
+
+## Quick Start
+
+### Create an Import Job
+
+Suppose files ending in CSV are generated periodically under an S3 directory. In that case, you can create a Job:
+
+```SQL
+CREATE JOB my_job
+ON STREAMING
+DO
+INSERT INTO db1.tbl1
+select * from S3(
+ "uri" = "s3://bucket/*.csv",
+ "s3.access_key" = "<s3_access_key>",
+ "s3.secret_key" = "<s3_secret_key>",
+ "s3.region" = "<s3_region>",
+ "s3.endpoint" = "<s3_endpoint>",
+ "format" = "<format>"
+)
+```
+
+### Check Import Status
+
+```SQL
+select * from jobs("type"="insert") where ExecuteType = "streaming"
+ Id: 1758538737484
+ Name: my_job1
+ Definer: root
+ ExecuteType: STREAMING
+RecurringStrategy: \N
+ Status: RUNNING
+ ExecuteSql: INSERT INTO test.`student1`
+SELECT * FROM S3
+(
+ "uri" = "s3://bucket/s3/demo/*.csv",
+ "format" = "csv",
+ "column_separator" = ",",
+ "s3.endpoint" = "s3.ap-southeast-1.amazonaws.com",
+ "s3.region" = "ap-southeast-1",
+ "s3.access_key" = "",
+ "s3.secret_key" = ""
+)
+ CreateTime: 2025-09-22 19:24:51
+ SucceedTaskCount: 1
+ FailedTaskCount: 0
+CanceledTaskCount: 0
+ Comment: \N
+ Properties: \N
+ CurrentOffset: {"endFile":"s3/demo/test/1.csv"}
Review Comment:
For the running offset inside a task, might it be more appropriate to show both the start and the end?
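
To make the suggestion concrete, here is a minimal Python sketch, not Doris code. It assumes a task picks up a small batch of files whose names sort after the last imported file (the rule described in the quoted doc) and reports both ends of that batch. `TaskOffset`, `start_file`, `end_file`, `next_task_offset`, and `batch_size` are hypothetical names used only for illustration; only `endFile` appears in the output quoted above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TaskOffset:
    """Hypothetical per-task offset: first and last file of one batch."""
    start_file: str
    end_file: str


def next_task_offset(listed: List[str],
                     last_end: Optional[str],
                     batch_size: int) -> Optional[TaskOffset]:
    """Pick the next batch of files and describe it by its start and end."""
    # Keep only names that are lexicographically greater than the last
    # imported file; anything else is not treated as a new file.
    pending = sorted(f for f in listed if last_end is None or f > last_end)
    if not pending:
        return None
    batch = pending[:batch_size]
    return TaskOffset(start_file=batch[0], end_file=batch[-1])


if __name__ == "__main__":
    files = ["file0", "file1", "file2", "file3", "file4", "file5"]
    # file0 sorts before file3, so it is skipped; the batch is file4..file5.
    print(next_task_offset(files, last_end="file3", batch_size=2))
```

With both ends visible, a reader of the task row could tell which range of files a running task covers, rather than only where it will stop.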
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]