This is an automated email from the ASF dual-hosted git repository.

jiafengzheng pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris-website.git
The following commit(s) were added to refs/heads/master by this push:
     new f2036d5af57 Specify JSON root document when adding JSON import, and add JSON file size limit parameter prompt (#74)
f2036d5af57 is described below

commit f2036d5af57bfcb1b01739c8d6e8c5ec4a96d56d
Author: zy-kkk <zhong...@qq.com>
AuthorDate: Wed Aug 31 12:05:25 2022 +0800

    Specify JSON root document when adding JSON import, and add JSON file size limit parameter prompt (#74)
---
 .../import/import-way/load-json-format.md | 38 ++++++++++++++++++++++
 .../import/import-way/load-json-format.md | 38 ++++++++++++++++++++++
 2 files changed, 76 insertions(+)

diff --git a/docs/data-operate/import/import-way/load-json-format.md b/docs/data-operate/import/import-way/load-json-format.md
index eafdc543e44..d612a3faf3a 100644
--- a/docs/data-operate/import/import-way/load-json-format.md
+++ b/docs/data-operate/import/import-way/load-json-format.md
@@ -92,6 +92,12 @@ Currently only the following two Json formats are supported:
 
 This method must be used together with the setting `read_json_by_line=true`. A special delimiter also requires specifying the `line_delimiter` parameter, which defaults to `\n`. When parsing, Doris splits the data by the delimiter and then parses each line of Object as a row of data.
 
+### streaming_load_json_max_mb parameters
+
+Some data formats, such as JSON, cannot be split. Doris must read all of the data into memory before parsing can begin, so this value is used to limit the maximum amount of data that can be loaded in a single Stream Load.
+
+The default value is 100, in MB. This parameter can be modified by referring to the [BE configuration](../../../admin-manual/config/be-config.md).
+
 ### fuzzy_parse parameters
 
 In [STREAM LOAD](../../../sql-manual/sql-reference/Data-Manipulation-Statements/Load/STREAM-LOAD.md), the `fuzzy_parse` parameter can be added to speed up JSON data import.
@@ -276,6 +282,38 @@ The above example will import the value of k1 multiplied by 100. The final impor
 +------+------+
 ````
 
+## Json root
+
+Doris supports extracting the data specified in Json through a Json root.
+
+**Note: For Array type data, Doris first expands the array and then processes each element as a single row in Object format, so the examples later in this document are all explained with Json data in a single Object format.**
+
+- Do not specify Json root
+
+  If Json root is not specified, Doris will by default use the column names in the table to find the elements in the Object. An example is as follows:
+
+  The table contains two columns: `id`, `city`
+
+  The Json data is as follows:
+
+  ```json
+  { "id": 123, "name" : { "id" : "321", "city" : "shanghai" }}
+  ```
+
+  Doris then uses `id` and `city` for matching, and the final data obtained is `123` and `null`.
+
+- Specify Json root
+
+  When the import data format is json, you can specify the root node of the Json data through json_root. Doris will extract the elements of the root node through json_root for parsing. It is empty by default.
+
+  Specify Json root with `-H "json_root: $.name"`. The matched element is:
+
+  ```json
+  { "id" : "321", "city" : "shanghai" }
+  ```
+
+  This element will be treated as new Json for the subsequent import operation, and the final data obtained is `321` and `shanghai`.
+
 ## NULL and Default values
 
 Example data is as follows:
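The `streaming_load_json_max_mb` limit added above is a BE-side configuration item. As a minimal sketch of raising it to 200 MB, assuming a BE reachable at `be_host` with its HTTP server on the default port 8040 and that the runtime `update_config` endpoint described in the BE configuration docs is available, it could be changed like this:

```bash
# Raise the single-load JSON size limit to 200 MB at runtime on one BE.
# be_host and port 8040 are placeholders; adjust to the actual BE address.
curl -X POST "http://be_host:8040/api/update_config?streaming_load_json_max_mb=200"

# To make the change permanent, add the setting to be.conf and restart the BE:
#   streaming_load_json_max_mb = 200
```

A runtime change of this kind only affects the BE it is sent to and does not survive a restart, so the `be.conf` route is the one to persist.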
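To tie the `json_root` header introduced above to a complete request, here is an illustrative Stream Load command; the database `example_db`, table `example_tbl` (with columns `id` and `city`), the `root:` credentials, the file name `data.json`, and the FE address `127.0.0.1:8030` are placeholders assumed for illustration:

```bash
# data.json contains: { "id": 123, "name" : { "id" : "321", "city" : "shanghai" }}
# json_root points Doris at the "name" sub-object, so the loaded row is (321, shanghai).
curl --location-trusted -u root: \
    -H "format: json" \
    -H "json_root: $.name" \
    -T data.json \
    http://127.0.0.1:8030/api/example_db/example_tbl/_stream_load
```

Without the `json_root` header, the same request would match the top-level keys and load `123` and `null` instead, as described in the first bullet.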
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/import-way/load-json-format.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/import-way/load-json-format.md
index 25cc84b5db3..f28d591f22d 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/import-way/load-json-format.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/import-way/load-json-format.md
@@ -93,6 +93,12 @@ Doris supports importing data in JSON format. This document mainly explains
 
 This method must be used together with the setting `read_json_by_line=true`. A special delimiter also requires specifying the `line_delimiter` parameter, which defaults to `\n`. When parsing, Doris splits the data by the delimiter and then parses each line of Object as a row of data.
 
+### streaming_load_json_max_mb parameters
+
+Some data formats, such as JSON, cannot be split. Doris must read all of the data into memory before parsing can begin, so this value is used to limit the maximum amount of data that can be loaded in a single import of such formats.
+
+The default value is 100, in MB. This parameter can be modified by referring to the [BE configuration](../../../admin-manual/config/be-config.md).
+
 ### fuzzy_parse parameters
 
 In [STREAM LOAD](../../../sql-manual/sql-reference/Data-Manipulation-Statements/Load/STREAM-LOAD.md), the `fuzzy_parse` parameter can be added to speed up JSON data import.
@@ -277,6 +283,38 @@ curl -v --location-trusted -u root: -H "format: json" -H "jsonpaths: [\"$.k2\",
 +------+------+
 ```
 
+## Json root
+
+Doris supports extracting the data specified in Json through a Json root.
+
+**Note: For Array type data, Doris first expands the array and then processes each element as a single row in Object format, so the examples later in this document are all explained with Json data in a single Object format.**
+
+- Do not specify Json root
+
+  If Json root is not specified, Doris will by default use the column names in the table to find the elements in the Object. An example is as follows:
+
+  The table contains two columns: `id`, `city`
+
+  The Json data is as follows:
+
+  ```json
+  { "id": 123, "name" : { "id" : "321", "city" : "shanghai" }}
+  ```
+
+  Doris then uses `id` and `city` for matching, and the final data obtained is `123` and `null`.
+
+- Specify Json root
+
+  The root node of the Json data can be specified through json_root. Doris will extract the elements of the root node through json_root for parsing. It is empty by default.
+
+  Specify Json root with `-H "json_root: $.name"`. The matched element is:
+
+  ```json
+  { "id" : "321", "city" : "shanghai" }
+  ```
+
+  This element will be treated as new Json for the subsequent import operation, and the final data obtained is `321` and `shanghai`.
+
 ## NULL and Default values
 
 Example data is as follows:

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org
For additional commands, e-mail: commits-h...@doris.apache.org