This is an automated email from the ASF dual-hosted git repository.

morningman pushed a commit to branch branch-1.2-unstable
in repository https://gitbox.apache.org/repos/asf/doris.git
commit fdb42a4dd2836206052b23bc5c6ba5c11004cf0e
Author: carlvinhust2012 <huchengha...@126.com>
AuthorDate: Wed Nov 9 12:21:26 2022 +0800

    [docs](array-type) update the docs to specify how to use array function when import data (#13995)
    
    Co-authored-by: hucheng01 <huchen...@baidu.com>
---
 docs/en/docs/data-operate/import/load-manual.md    | 17 +++++++++++++++++
 docs/zh-CN/docs/data-operate/import/load-manual.md | 18 ++++++++++++++++++
 2 files changed, 35 insertions(+)

diff --git a/docs/en/docs/data-operate/import/load-manual.md b/docs/en/docs/data-operate/import/load-manual.md
index 5bc11a82da..84469e7e05 100644
--- a/docs/en/docs/data-operate/import/load-manual.md
+++ b/docs/en/docs/data-operate/import/load-manual.md
@@ -81,3 +81,20 @@ For best practices on atomicity guarantees, see Importing Transactions and Atomi
 ## Synchronous and asynchronous imports
 
 Import methods are divided into synchronous and asynchronous. For the synchronous import method, the returned result indicates whether the import succeeds or fails. For the asynchronous import method, a successful return only means that the job was submitted successfully, not that the data was imported successfully. You need to use the corresponding command to check the running status of the import job.
+
+## Importing array-type data
+
+Array functions are supported only by the vectorized execution engine; the non-vectorized engine does not support them.
+To apply array functions while importing data, first enable the vectorized engine. Then cast each input column to the array type that the function's parameters expect. After that, you can call the array function as usual.
+
+For example, in the following import, columns b14 and a13 are cast to `array<string>` before being passed to the `array_union` function.
+
+```sql
+LOAD LABEL label_03_14_49_34_898986_19090452100 (
+    DATA INFILE("hdfs://test.hdfs.com:9000/user/test/data/sys/load/array_test.data")
+    INTO TABLE `test_array_table`
+    COLUMNS TERMINATED BY "|" (`k1`, `a1`, `a2`, `a3`, `a4`, `a5`, `a6`, `a7`, `a8`, `a9`, `a10`, `a11`, `a12`, `a13`, `b14`)
+    SET(a14=array_union(cast(b14 as array<string>), cast(a13 as array<string>))) WHERE size(a2) > 270)
+    WITH BROKER "hdfs" ("username"="test_array", "password"="")
+    PROPERTIES( "max_filter_ratio"="0.8" );
+```
diff --git a/docs/zh-CN/docs/data-operate/import/load-manual.md b/docs/zh-CN/docs/data-operate/import/load-manual.md
index bdbd8765e7..4d8d80baca 100644
--- a/docs/zh-CN/docs/data-operate/import/load-manual.md
+++ b/docs/zh-CN/docs/data-operate/import/load-manual.md
@@ -82,3 +82,21 @@ Label is used to guarantee that the corresponding import job can be imported successfully only once. A
 
 Import methods are divided into synchronous and asynchronous. For the synchronous import method, the returned result indicates whether the import succeeded or failed. For the asynchronous import method, a successful return only means that the job was submitted successfully, not that the data was imported successfully; use the corresponding command to check the running status of the import job.
 
+## Importing array-type data
+
+Array functions are supported only in vectorized scenarios; non-vectorized scenarios do not support them.
+
+To apply array functions when importing data, first enable vectorization; then cast the input parameter columns to the array type according to the parameter types of the array function; finally, you can go on to use the array function.
+
+For example, in the following import, columns b14 and a13 must first be cast to `array<string>`, and then the `array_union` function is applied.
+
+```sql
+LOAD LABEL label_03_14_49_34_898986_19090452100 (
+    DATA INFILE("hdfs://test.hdfs.com:9000/user/test/data/sys/load/array_test.data")
+    INTO TABLE `test_array_table`
+    COLUMNS TERMINATED BY "|" (`k1`, `a1`, `a2`, `a3`, `a4`, `a5`, `a6`, `a7`, `a8`, `a9`, `a10`, `a11`, `a12`, `a13`, `b14`)
+    SET(a14=array_union(cast(b14 as array<string>), cast(a13 as array<string>))) WHERE size(a2) > 270)
+    WITH BROKER "hdfs" ("username"="test_array", "password"="")
+    PROPERTIES( "max_filter_ratio"="0.8" );
+```
+