This is an automated email from the ASF dual-hosted git repository.

jiafengzheng pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris-website.git

The following commit(s) were added to refs/heads/master by this push:
     new 9166bc0c6c load json format fix
9166bc0c6c is described below

commit 9166bc0c6c0f1f4f34926550469a44f34e1879e2
Author: jiafeng.zhang <zhang...@gmail.com>
AuthorDate: Fri Jul 22 11:39:56 2022 +0800

    load json format fix

    load json format fix
---
 .../import/import-way/load-json-format.md | 70 ++++++++++++++++------
 .../import/import-way/load-json-format.md | 68 ++++++++++++++++-----
 2 files changed, 105 insertions(+), 33 deletions(-)

diff --git a/docs/data-operate/import/import-way/load-json-format.md b/docs/data-operate/import/import-way/load-json-format.md
index d6097e2137..eafdc543e4 100644
--- a/docs/data-operate/import/import-way/load-json-format.md
+++ b/docs/data-operate/import/import-way/load-json-format.md
@@ -66,19 +66,32 @@ Currently only the following two Json formats are supported:
 This method must be used with the setting `strip_outer_array=true`. Doris will expand the array when parsing, and then parse each Object in turn as a row of data.
 
 2. A single row of data represented by Object
-
 Json format with Object as root node. The entire Object represents a row of data to be imported. An example is as follows:
 
 ````json
 { "id": 123, "city" : "beijing"}
 ````
-
+
 ````json
 { "id": 123, "city" : { "name" : "beijing", "region" : "haidian" }}
 ````
-
+
 This method is usually used for the Routine Load import method, such as representing a message in Kafka, that is, a row of data.
 
+3. Multiple lines of Object data separated by a fixed delimiter
+
+ A row of data represented by Object represents a row of data to be imported. The example is as follows:
+
+ ````json
+ { "id": 123, "city" : "beijing"}
+ { "id": 456, "city" : "shanghai"}
+ ...
+ ````
+
+ This method is typically used for Stream Load import methods to represent multiple rows of data in a batch of imported data.
+
+ This method must be used with the setting `read_json_by_line=true`, the special delimiter also needs to specify the `line_delimiter` parameter, the default is `\n`. When Doris parses, it will be separated according to the delimiter, and then parse each line of Object as a line of data.
+
 ### fuzzy_parse parameters
 
 In [STREAM LOAD](../../../sql-manual/sql-reference/Data-Manipulation-Statements/Load/STREAM-LOAD.md) `fuzzy_parse` parameter can be added to speed up JSON Data import efficiency.
@@ -380,7 +393,7 @@ code INT NULL
 100 beijing 1
 ````
 
-3. Import multiple rows of data
+3. Import multiple rows of data as Array
 
 ````json
 [
@@ -416,24 +429,45 @@ code INT NULL
 105 {"order1":["guangzhou"]} 6
 ````
 
-4. Transform the imported data
+4. Import multi-line data as multi-line Object
 
- The data is still the multi-line data in Example 3, and now it is necessary to add 1 to the `code` column in the imported data before importing.
+ ```json
+ {"id": 100, "city": "beijing", "code" : 1}
+ {"id": 101, "city": "shanghai"}
+ {"id": 102, "city": "tianjin", "code" : 3}
+ {"id": 103, "city": "chongqing", "code" : 4}
+ ```
 
- ```bash
- curl --location-trusted -u user:passwd -H "format: json" -H "jsonpaths: [\"$.id\",\"$.city\",\"$.code\"]" - H "strip_outer_array: true" -H "columns: id, city, tmpc, code=tmpc+1" -T data.json http://localhost:8030/api/db1/tbl1/_stream_load
- ````
+ StreamLoad import:
 
- Import result:
+```bash
+curl --location-trusted -u user:passwd -H "format: json" -H "read_json_by_line: true" -T data.json http://localhost:8030/api/db1/tbl1/_stream_load
+```
+ Import result:
+
+ 100 beijing 1
+ 101 shanghai NULL
+ 102 tianjin 3
+ 103 chongqing 4
 
- ````text
- 100 beijing 2
- 101 shanghai NULL
- 102 tianjin 4
- 103 chongqing 5
- 104 ["zhejiang","guangzhou"] 6
- 105 {"order1":["guangzhou"]} 7
- ````
+5. Transform the imported data
+
+The data is still the multi-line data in Example 3, and now it is necessary to add 1 to the `code` column in the imported data before importing.
+
+```bash
+curl --location-trusted -u user:passwd -H "format: json" -H "jsonpaths: [\"$.id\",\"$.city\",\"$.code\"]" - H "strip_outer_array: true" -H "columns: id, city, tmpc, code=tmpc+1" -T data.json http://localhost:8030/api/db1/tbl1/_stream_load
+````
+
+Import result:
+
+````text
+100 beijing 2
+101 shanghai NULL
+102 tianjin 4
+103 chongqing 5
+104 ["zhejiang","guangzhou"] 6
+105 {"order1":["guangzhou"]} 7
+````
 
 ### Routine Load
 
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/import-way/load-json-format.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/import-way/load-json-format.md
index 97a4ebf99c..25cc84b5db 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/import-way/load-json-format.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/import-way/load-json-format.md
@@ -78,6 +78,20 @@ Doris 支持导入 JSON 格式的数据。本文档主要说明在进行JSON格
 ```
 
 这种方式通常用于 Routine Load 导入方式,如表示 Kafka 中的一条消息,即一行数据。
+
+2. 以固定分隔符分隔的多行 Object 数据
+
+ Object表示的一行数据即表示要导入的一行数据,示例如下:
+
+ ```json
+ { "id": 123, "city" : "beijing"}
+ { "id": 456, "city" : "shanghai"}
+ ...
+ ```
+
+ 这种方式通常用于 Stream Load 导入方式,以便在一批导入数据中表示多行数据。
+
+ 这种方式必须配合设置 `read_json_by_line=true` 使用,特殊分隔符还需要指定`line_delimiter`参数,默认`\n`。Doris 在解析时会按照分隔符分隔,然后解析其中的每一行 Object 作为一行数据。
 
 ### fuzzy_parse 参数
 
@@ -379,7 +393,7 @@ code INT NULL
 100 beijing 1
 ```
 
-3. 导入多行数据
+3. 以 Array 形式导入多行数据
 
 ```json
 [
@@ -415,24 +429,48 @@ code INT NULL
 105 {"order1":["guangzhou"]} 6
 ```
 
-4. 对导入数据进行转换
+4. 以 多行Object 形式导入多行数据
 
- 数据依然是示例3中的多行数据,现需要对导入数据中的 `code` 列加1后导入。
+ ```
+ {"id": 100, "city": "beijing", "code" : 1}
+ {"id": 101, "city": "shanghai"}
+ {"id": 102, "city": "tianjin", "code" : 3}
+ {"id": 103, "city": "chongqing", "code" : 4}
+ ```
 
- ```bash
- curl --location-trusted -u user:passwd -H "format: json" -H "jsonpaths: [\"$.id\",\"$.city\",\"$.code\"]" -H "strip_outer_array: true" -H "columns: id, city, tmpc, code=tmpc+1" -T data.json http://localhost:8030/api/db1/tbl1/_stream_load
- ```
+StreamLoad导入:
+
+```bash
+curl --location-trusted -u user:passwd -H "format: json" -H "read_json_by_line: true" -T data.json http://localhost:8030/api/db1/tbl1/_stream_load
+```
 
- 导入结果:
+导入结果:
 
- ```text
- 100 beijing 2
- 101 shanghai NULL
- 102 tianjin 4
- 103 chongqing 5
- 104 ["zhejiang","guangzhou"] 6
- 105 {"order1":["guangzhou"]} 7
- ```
+```
+100 beijing 1
+101 shanghai NULL
+102 tianjin 3
+103 chongqing 4
+```
+
+5. 对导入数据进行转换
+
+数据依然是示例3中的多行数据,现需要对导入数据中的 `code` 列加1后导入。
+
+```bash
+curl --location-trusted -u user:passwd -H "format: json" -H "jsonpaths: [\"$.id\",\"$.city\",\"$.code\"]" -H "strip_outer_array: true" -H "columns: id, city, tmpc, code=tmpc+1" -T data.json http://localhost:8030/api/db1/tbl1/_stream_load
+```
+
+导入结果:
+
+```text
+100 beijing 2
+101 shanghai NULL
+102 tianjin 4
+103 chongqing 5
+104 ["zhejiang","guangzhou"] 6
+105 {"order1":["guangzhou"]} 7
+```
 
 ### Routine Load
 

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org
For additional commands, e-mail: commits-h...@doris.apache.org
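
The `read_json_by_line` behaviour documented in this commit (split the payload on `line_delimiter`, then treat each Object as one row) can be illustrated with a small Python sketch. This is not Doris code and makes no claim about how Doris implements the parser. The helper names `parse_json_lines` and `project_columns` are hypothetical, and the rule that a missing key becomes NULL (here `None`) is an assumption taken from the import result shown in Example 4 of the diff.

```python
import json


def parse_json_lines(payload, line_delimiter="\n"):
    """Split the payload on the delimiter and parse each non-empty piece as one JSON Object."""
    objects = []
    for piece in payload.split(line_delimiter):
        piece = piece.strip()
        if not piece:
            continue  # skip empty pieces, e.g. after a trailing delimiter
        obj = json.loads(piece)
        if not isinstance(obj, dict):
            raise ValueError("each delimited piece must be a JSON Object")
        objects.append(obj)
    return objects


def project_columns(objects, columns):
    """Map each Object to a row tuple; a missing key is assumed to become NULL (None)."""
    return [tuple(obj.get(col) for col in columns) for obj in objects]


# The sample data from Example 4 of the diff.
DATA = """{"id": 100, "city": "beijing", "code" : 1}
{"id": 101, "city": "shanghai"}
{"id": 102, "city": "tianjin", "code" : 3}
{"id": 103, "city": "chongqing", "code" : 4}
"""

rows = project_columns(parse_json_lines(DATA), ["id", "city", "code"])
for row in rows:
    print(row)
```

Run against the Example 4 data, the sketch yields four rows, with `None` standing in for the NULL `code` of the `shanghai` row. Passing a different `line_delimiter` value models the case where the documentation says the special delimiter must be specified explicitly.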