This is an automated email from the ASF dual-hosted git repository. kassiez pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/doris-website.git
The following commit(s) were added to refs/heads/master by this push: new 4a011a276d1 [docs](load) add data type docs (#1521) 4a011a276d1 is described below commit 4a011a276d1bdfc2c4b32e715494f4e5cc6f2ebb Author: Kaijie Chen <c...@apache.org> AuthorDate: Fri Dec 27 10:14:26 2024 +0800 [docs](load) add data type docs (#1521) ## Versions - [x] dev - [x] 3.0 - [x] 2.1 - [ ] 2.0 ## Languages - [x] Chinese - [x] English ## Docs Checklist - [ ] Checked by AI - [ ] Test Cases Built --- docs/data-operate/import/complex-types/array.md | 147 +++++++++++++++++++ docs/data-operate/import/complex-types/json.md | 149 ++++++++++++++++++++ docs/data-operate/import/complex-types/map.md | 144 +++++++++++++++++++ docs/data-operate/import/complex-types/struct.md | 156 +++++++++++++++++++++ .../docusaurus-plugin-content-docs/current.json | 4 + .../data-operate/import/complex-types/array.md | 147 +++++++++++++++++++ .../data-operate/import/complex-types/json.md | 152 ++++++++++++++++++++ .../data-operate/import/complex-types/map.md | 146 +++++++++++++++++++ .../data-operate/import/complex-types/struct.md | 154 ++++++++++++++++++++ .../version-2.1.json | 4 + .../data-operate/import/complex-types/array.md | 147 +++++++++++++++++++ .../data-operate/import/complex-types/json.md | 152 ++++++++++++++++++++ .../data-operate/import/complex-types/map.md | 146 +++++++++++++++++++ .../data-operate/import/complex-types/struct.md | 154 ++++++++++++++++++++ .../version-3.0.json | 4 + .../data-operate/import/complex-types/array.md | 147 +++++++++++++++++++ .../data-operate/import/complex-types/json.md | 152 ++++++++++++++++++++ .../data-operate/import/complex-types/map.md | 146 +++++++++++++++++++ .../data-operate/import/complex-types/struct.md | 154 ++++++++++++++++++++ sidebars.json | 10 ++ .../data-operate/import/complex-types/array.md | 147 +++++++++++++++++++ .../data-operate/import/complex-types/json.md | 152 ++++++++++++++++++++ .../data-operate/import/complex-types/map.md | 146 +++++++++++++++++++ .../data-operate/import/complex-types/struct.md | 154 ++++++++++++++++++++ .../data-operate/import/complex-types/array.md | 147 +++++++++++++++++++ .../data-operate/import/complex-types/json.md | 152 ++++++++++++++++++++ .../data-operate/import/complex-types/map.md | 146 +++++++++++++++++++ .../data-operate/import/complex-types/struct.md | 154 ++++++++++++++++++++ versioned_sidebars/version-2.1-sidebars.json | 10 ++ versioned_sidebars/version-3.0-sidebars.json | 10 ++ 30 files changed, 3633 insertions(+) diff --git a/docs/data-operate/import/complex-types/array.md b/docs/data-operate/import/complex-types/array.md new file mode 100644 index 00000000000..a079606b5e0 --- /dev/null +++ b/docs/data-operate/import/complex-types/array.md @@ -0,0 +1,147 @@ +--- +{ + "title": "ARRAY", + "language": "en" +} +--- + +<!-- +Licensed to the Apache Software Foundation (ASF) under one +or more contributor license agreements. See the NOTICE file +distributed with this work for additional information +regarding copyright ownership. The ASF licenses this file +to you under the Apache License, Version 2.0 (the +"License"); you may not use this file except in compliance +with the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, +software distributed under the License is distributed on an +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, either express or implied. See the License for the +specific language governing permissions and limitations +under the License. +--> + +`ARRAY<T>` An array of T-type items, it cannot be used as a key column. + +- Before version 2.0, it was only supported in the Duplicate model table. +- Starting from version 2.0, it is supported in the non-key columns of the Unique model table. + +T-type could be any of: + +```sql +BOOLEAN, TINYINT, SMALLINT, INT, BIGINT, LARGEINT, FLOAT, DOUBLE, DECIMAL, DATE, +DATEV2, DATETIME, DATETIMEV2, CHAR, VARCHAR, STRING +``` + +## CSV format import + +### Step 1: Prepare the data + +Create the following csv file: `test_array.csv` +The separator is `|` instead of comma to distinguish it from the comma in array. + +``` +1|[1,2,3,4,5] +2|[6,7,8] +3|[] +4|null +``` + +### Step 2: Create a table in the database + +```sql +CREATE TABLE `array_test` ( + `id` INT NOT NULL, + `c_array` ARRAY<INT> NULL +) +DUPLICATE KEY(`id`) +DISTRIBUTED BY HASH(`id`) BUCKETS 1 +PROPERTIES ( + "replication_allocation" = "tag.location.default: 1" +); +``` + +### Step 3: Load data + +```bash +curl --location-trusted \ + -u "root":"" \ + -H "column_separator:|" \ + -H "columns: id, c_array" \ + -T "test_array.csv" \ + http://localhost:8040/api/testdb/array_test/_stream_load +``` + +### Step 4: Check the imported data + +```sql +mysql> SELECT * FROM array_test; ++------+-----------------+ +| id | c_array | ++------+-----------------+ +| 1 | [1, 2, 3, 4, 5] | +| 2 | [6, 7, 8] | +| 3 | [] | +| 4 | NULL | ++------+-----------------+ +4 rows in set (0.01 sec) +``` + +## JSON format import + +### Step 1: Prepare the data + +Create the following JSON file, `test_array.json` + +```json +[ + {"id":1, "c_array":[1,2,3,4,5]}, + {"id":2, "c_array":[6,7,8]}, + {"id":3, "c_array":[]}, + {"id":4, "c_array":null} +] +``` + +### Step 2: Create a table in the database + +```sql +CREATE TABLE `array_test` ( + `id` INT NOT NULL, + `c_array` ARRAY<INT> NULL +) +DUPLICATE KEY(`id`) +DISTRIBUTED BY HASH(`id`) BUCKETS 1 +PROPERTIES ( + "replication_allocation" = "tag.location.default: 1" +); +``` + +### Step 3: Load data + +```bash +curl --location-trusted \ + -u "root":"" \ + -H "format:json" \ + -H "columns: id, c_array" \ + -H "strip_outer_array:true" \ + -T "test_array.json" \ + http://localhost:8040/api/testdb/array_test/_stream_load +``` + +### Step 4: Check the imported data + +```sql +mysql> SELECT * FROM array_test; ++------+-----------------+ +| id | c_array | ++------+-----------------+ +| 1 | [1, 2, 3, 4, 5] | +| 2 | [6, 7, 8] | +| 3 | [] | +| 4 | NULL | ++------+-----------------+ +4 rows in set (0.01 sec) +``` diff --git a/docs/data-operate/import/complex-types/json.md b/docs/data-operate/import/complex-types/json.md new file mode 100644 index 00000000000..faf949d19f7 --- /dev/null +++ b/docs/data-operate/import/complex-types/json.md @@ -0,0 +1,149 @@ +--- +{ + "title": "JSON", + "language": "en" +} +--- + +<!-- +Licensed to the Apache Software Foundation (ASF) under one +or more contributor license agreements. See the NOTICE file +distributed with this work for additional information +regarding copyright ownership. The ASF licenses this file +to you under the Apache License, Version 2.0 (the +"License"); you may not use this file except in compliance +with the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, +software distributed under the License is distributed on an +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, either express or implied. See the License for the +specific language governing permissions and limitations +under the License. +--> + +The JSON data type stores JSON data efficiently in a binary format and allows access to its internal fields through JSON functions. + +By default, it supports up to 1048576 bytes (1MB), and can be increased up to 2147483643 bytes (2GB). This can be adjusted via the string_type_length_soft_limit_bytes configuration. + +Compared to storing JSON strings in a regular STRING type, the JSON type has two main advantages: + +JSON format validation during data insertion. +More efficient binary storage format, enabling faster access to JSON internal fields using functions like json_extract, compared to get_json_xx functions. +Note: In version 1.2.x, the JSON type was named JSONB. To maintain compatibility with MySQL, it was renamed to JSON starting from version 2.0.0. Older tables can still use the previous name. + +## CSV format import + +### Step 1: Prepare the data + +Create the following csv file: `test_json.csv` +The separator is `|` instead of comma to distinguish it from the comma in json. + +``` +1|{"name": "tom", "age": 35} +2|{"name": null, "age": 28} +3|{"name": "micheal", "age": null} +4|{"name": null, "age": null} +5|null +``` + +### Step 2: Create a table in the database + +```sql +CREATE TABLE json_test ( + id INT NOT NULL, + c_json JSON NULL +) +DUPLICATE KEY(id) +DISTRIBUTED BY HASH(id) BUCKETS 1 +PROPERTIES ( + "replication_allocation" = "tag.location.default: 1" +); +``` + +### Step 3: Load data + +```bash +curl --location-trusted \ + -u "root":"" \ + -H "column_separator:|" \ + -H "columns: id, c_json" \ + -T "test_json.csv" \ + http://localhost:8040/api/testdb/json_test/_stream_load +``` + +### Step 4: Check the imported data + +```sql +SELECT * FROM json_test; ++------+-------------------------------+ +| id | c_json | ++------+-------------------------------+ +| 1 | {"name":"tom","age":35} | +| 2 | {"name":null,"age":28} | +| 3 | {"name":"micheal","age":null} | +| 4 | {"name":null,"age":null} | +| 5 | null | ++------+-------------------------------+ +5 rows in set (0.01 sec) +``` + +## JSON format import + +### Step 1: Prepare the data + +Create the following JSON file, `test_json.json` + +```json +[ + {"id": 1, "c_json": {"name": "tom", "age": 35}}, + {"id": 2, "c_json": {"name": null, "age": 28}}, + {"id": 3, "c_json": {"name": "micheal", "age": null}}, + {"id": 4, "c_json": {"name": null, "age": null}}, + {"id": 5, "c_json": null} +] +``` + +### Step 2: Create a table in the database + +```sql +CREATE TABLE json_test ( + id INT NOT NULL, + c_json JSON NULL +) +DUPLICATE KEY(id) +DISTRIBUTED BY HASH(id) BUCKETS 1 +PROPERTIES ( + "replication_allocation" = "tag.location.default: 1" +); +``` + +### Step 3: Load data + +```bash +curl --location-trusted \ + -u "root":"" \ + -H "format:json" \ + -H "columns: id, c_json" \ + -H "strip_outer_array:true" \ + -T "test_json.json" \ + http://localhost:8040/api/testdb/json_test/_stream_load +``` + +### Step 4: Check the imported data + +```sql +mysql> SELECT * FROM json_test; ++------+-------------------------------+ +| id | c_json | ++------+-------------------------------+ +| 1 | {"name":"tom","age":35} | +| 2 | {"name":null,"age":28} | +| 3 | {"name":"micheal","age":null} | +| 4 | {"name":null,"age":null} | +| 5 | NULL | ++------+-------------------------------+ +5 rows in set (0.01 sec) +``` diff --git a/docs/data-operate/import/complex-types/map.md b/docs/data-operate/import/complex-types/map.md new file mode 100644 index 00000000000..256c388dca7 --- /dev/null +++ b/docs/data-operate/import/complex-types/map.md @@ -0,0 +1,144 @@ +--- +{ + "title": "MAP", + "language": "en" +} +--- + +<!-- +Licensed to the Apache Software Foundation (ASF) under one +or more contributor license agreements. See the NOTICE file +distributed with this work for additional information +regarding copyright ownership. The ASF licenses this file +to you under the Apache License, Version 2.0 (the +"License"); you may not use this file except in compliance +with the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, +software distributed under the License is distributed on an +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, either express or implied. See the License for the +specific language governing permissions and limitations +under the License. +--> + +`MAP<K, V>` A Map of K, V items, it cannot be used as a key column. Now MAP can only used in Duplicate and Unique Model Tables. + +K,V could be any of: + +```sql +BOOLEAN, TINYINT, SMALLINT, INT, BIGINT, LARGEINT, FLOAT, DOUBLE, DECIMAL, DECIMALV3, DATE, +DATEV2, DATETIME, DATETIMEV2, CHAR, VARCHAR, STRING +``` + +## CSV format import + +### Step 1: Prepare the data + +Create the following csv file: `test_map.csv` +The separator is `|` instead of comma to distinguish it from the comma in map. + +``` +1|{"Emily":101,"age":25} +2|{"Benjamin":102} +3|{} +4|null +``` + +### Step 2: Create a table in the database + +```sql +CREATE TABLE map_test ( + id INT NOT NULL, + c_map MAP<STRING, INT> NULL +) +DUPLICATE KEY(id) +DISTRIBUTED BY HASH(id) BUCKETS 1 +PROPERTIES ( + "replication_allocation" = "tag.location.default: 1" +); +``` + +### Step 3: Load data + +```bash +curl --location-trusted \ + -u "root":"" \ + -H "column_separator:|" \ + -H "columns: id, c_map" \ + -T "test_map.csv" \ + http://localhost:8040/api/testdb/map_test/_stream_load +``` + +### Step 4: Check the imported data + +```sql +mysql> SELECT * FROM map_test; ++------+-------------------------+ +| id | c_map | ++------+-------------------------+ +| 1 | {"Emily":101, "age":25} | +| 2 | {"Benjamin":102} | +| 3 | {} | +| 4 | NULL | ++------+-------------------------+ +4 rows in set (0.01 sec) +``` + +## JSON format import + +### Step 1: Prepare the data + +Create the following JSON file, `test_map.json` + +```json +[ + {"id":1, "c_map":{"Emily":101, "age":25}}, + {"id":2, "c_map":{"Benjamin":102}}, + {"id":3, "c_map":{}}, + {"id":4, "c_map":null} +] +``` + +### Step 2: Create a table in the database + +```sql +CREATE TABLE map_test ( + id INT NOT NULL, + c_map MAP<STRING, INT> NULL +) +DUPLICATE KEY(id) +DISTRIBUTED BY HASH(id) BUCKETS 1 +PROPERTIES ( + "replication_allocation" = "tag.location.default: 1" +); +``` + +### Step 3: Load data + +```bash +curl --location-trusted \ + -u "root":"" \ + -H "format:json" \ + -H "columns: id, c_map" \ + -H "strip_outer_array:true" \ + -T "test_map.json" \ + http://localhost:8040/api/testdb/map_test/_stream_load +``` + +### Step 4: Check the imported data + +```sql +mysql> SELECT * FROM map_test; ++------+-------------------------+ +| id | c_map | ++------+-------------------------+ +| 1 | {"Emily":101, "age":25} | +| 2 | {"Benjamin":102} | +| 3 | {} | +| 4 | NULL | ++------+-------------------------+ +4 rows in set (0.01 sec) +``` diff --git a/docs/data-operate/import/complex-types/struct.md b/docs/data-operate/import/complex-types/struct.md new file mode 100644 index 00000000000..9cfc23352d9 --- /dev/null +++ b/docs/data-operate/import/complex-types/struct.md @@ -0,0 +1,156 @@ +--- +{ + "title": "STRUCT", + "language": "en" +} +--- + +<!-- +Licensed to the Apache Software Foundation (ASF) under one +or more contributor license agreements. See the NOTICE file +distributed with this work for additional information +regarding copyright ownership. The ASF licenses this file +to you under the Apache License, Version 2.0 (the +"License"); you may not use this file except in compliance +with the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, +software distributed under the License is distributed on an +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, either express or implied. See the License for the +specific language governing permissions and limitations +under the License. +--> + +`STRUCT<field_name:field_type [COMMENT 'comment_string'], ... >` Represents value with structure described by multiple fields, which can be viewed as a collection of multiple columns. + +- It cannot be used as a Key column. Now STRUCT can only used in Duplicate Model Tables. + +- The names and number of Fields in a Struct is fixed and always Nullable, and a Field typically consists of the following parts. + + - field_name: Identifier naming the field, non repeatable. + - field_type: A data type. + - COMMENT: An optional string describing the field. (currently not supported) + +The currently supported types are: + +```sql +BOOLEAN, TINYINT, SMALLINT, INT, BIGINT, LARGEINT, FLOAT, DOUBLE, DECIMAL, DECIMALV3, DATE, +DATEV2, DATETIME, DATETIMEV2, CHAR, VARCHAR, STRING +``` + +## CSV format import + +### Step 1: Prepare the data + +Create the following csv file: `test_struct.csv` +The separator is `|` instead of comma to distinguish it from the comma in struct. + +``` +1|{10, 3.14, "Emily"} +2|{4, 1.5, null} +3|{7, null, "Benjamin"} +4|{} +5|null +``` + +### Step 2: Create a table in the database + +```sql +CREATE TABLE struct_test ( + id INT NOT NULL, + c_struct STRUCT<f1:INT,f2:FLOAT,f3:STRING> NULL +) +DUPLICATE KEY(id) +DISTRIBUTED BY HASH(id) BUCKETS 1 +PROPERTIES ( + "replication_allocation" = "tag.location.default: 1" +); +``` + +### Step 3: Load data + +```bash +curl --location-trusted \ + -u "root":"" \ + -H "column_separator:|" \ + -H "columns: id, c_struct" \ + -T "test_struct.csv" \ + http://localhost:8040/api/testdb/struct_test/_stream_load +``` + +### Step 4: Check the imported data + +```sql +mysql> SELECT * FROM struct_test; ++------+--------------------------------------+ +| id | c_struct | ++------+--------------------------------------+ +| 1 | {"f1":10, "f2":3.14, "f3":"Emily"} | +| 2 | {"f1":4, "f2":1.5, "f3":null} | +| 3 | {"f1":7, "f2":null, "f3":"Benjamin"} | +| 4 | {"f1":null, "f2":null, "f3":null} | +| 5 | NULL | ++------+--------------------------------------+ +5 rows in set (0.01 sec) +``` + +## JSON format import + +### Step 1: Prepare the data + +Create the following JSON file, `test_struct.json` + +```json +[ + {"id":1, "c_struct":{"f1":10, "f2":3.14, "f3":"Emily"}}, + {"id":2, "c_struct":{"f1":4, "f2":1.5, "f3":null}}, + {"id":3, "c_struct":{"f1":7, "f2":null, "f3":"Benjamin"}}, + {"id":4, "c_struct":{}}, + {"id":5, "c_struct":null} +] +``` + +### Step 2: Create a table in the database + +```sql +CREATE TABLE struct_test ( + id INT NOT NULL, + c_struct STRUCT<f1:INT,f2:FLOAT,f3:STRING> NULL +) +DUPLICATE KEY(id) +DISTRIBUTED BY HASH(id) BUCKETS 1 +PROPERTIES ( + "replication_allocation" = "tag.location.default: 1" +); +``` + +### Step 3: Load data + +```bash +curl --location-trusted \ + -u "root":"" \ + -H "format:json" \ + -H "columns: id, c_struct" \ + -H "strip_outer_array:true" \ + -T "test_struct.json" \ + http://localhost:8040/api/testdb/struct_test/_stream_load +``` + +### Step 4: Check the imported data + +```sql +mysql> SELECT * FROM struct_test; ++------+--------------------------------------+ +| id | c_struct | ++------+--------------------------------------+ +| 1 | {"f1":10, "f2":3.14, "f3":"Emily"} | +| 2 | {"f1":4, "f2":1.5, "f3":null} | +| 3 | {"f1":7, "f2":null, "f3":"Benjamin"} | +| 4 | {"f1":null, "f2":null, "f3":null} | +| 5 | NULL | ++------+--------------------------------------+ +5 rows in set (0.00 sec) +``` diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current.json b/i18n/zh-CN/docusaurus-plugin-content-docs/current.json index 9332a9deb9a..2a05c19351b 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/current.json +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current.json @@ -111,6 +111,10 @@ "message": "导入方式", "description": "The label for category Import Way in sidebar docs" }, + "sidebar.docs.category.Complex Data Types": { + "message": "复杂数据类型", + "description": "The label for category Complex Data Types in sidebar docs" + }, "sidebar.docs.category.Updating Data": { "message": "数据更新", "description": "The label for category Update in sidebar docs" diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/complex-types/array.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/complex-types/array.md new file mode 100644 index 00000000000..e67fa40cfbe --- /dev/null +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/complex-types/array.md @@ -0,0 +1,147 @@ +--- +{ + "title": "ARRAY", + "language": "zh-CN" +} +--- + +<!-- +Licensed to the Apache Software Foundation (ASF) under one +or more contributor license agreements. See the NOTICE file +distributed with this work for additional information +regarding copyright ownership. The ASF licenses this file +to you under the Apache License, Version 2.0 (the +"License"); you may not use this file except in compliance +with the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, +software distributed under the License is distributed on an +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, either express or implied. See the License for the +specific language governing permissions and limitations +under the License. +--> + +`ARRAY<T>` 表示由 T 类型元素组成的数组,不能作为 key 列使用。 + +- 2.0 之前仅支持在 Duplicate 模型的表中使用。 +- 从 2.0 版本开始支持在 Unique 模型的表中的非 key 列使用。 + +T 支持的类型有: + +```sql +BOOLEAN, TINYINT, SMALLINT, INT, BIGINT, LARGEINT, FLOAT, DOUBLE, DECIMAL, +DATE, DATEV2, DATETIME, DATETIMEV2, CHAR, VARCHAR, STRING +``` + +## CSV格式导入 + +### 第 1 步:准备数据 + +创建如下的 csv 文件:`test_array.csv` +其中分隔符使用 `|` 而不是逗号,以便和 array 中的逗号区分。 + +``` +1|[1,2,3,4,5] +2|[6,7,8] +3|[] +4|null +``` + +### 第 2 步:在数据库中建表 + +```sql +CREATE TABLE `array_test` ( + `id` INT NOT NULL, + `c_array` ARRAY<INT> NULL +) +DUPLICATE KEY(`id`) +DISTRIBUTED BY HASH(`id`) BUCKETS 1 +PROPERTIES ( + "replication_allocation" = "tag.location.default: 1" +); +``` + +### 第 3 步:导入数据 + +```bash +curl --location-trusted \ + -u "root":"" \ + -H "column_separator:|" \ + -H "columns: id, c_array" \ + -T "test_array.csv" \ + http://localhost:8040/api/testdb/array_test/_stream_load +``` + +### 第 4 步:检查导入数据 + +```sql +mysql> SELECT * FROM array_test; ++------+-----------------+ +| id | c_array | ++------+-----------------+ +| 1 | [1, 2, 3, 4, 5] | +| 2 | [6, 7, 8] | +| 3 | [] | +| 4 | NULL | ++------+-----------------+ +4 rows in set (0.01 sec) +``` + +## JSON格式导入 + +### 第 1 步:准备数据 + +创建如下的 JSON 文件,`test_array.json` + +```json +[ + {"id":1, "c_array":[1,2,3,4,5]}, + {"id":2, "c_array":[6,7,8]}, + {"id":3, "c_array":[]}, + {"id":4, "c_array":null} +] +``` + +### 第 2 步:在数据库中建表 + +```sql +CREATE TABLE `array_test` ( + `id` INT NOT NULL, + `c_array` ARRAY<INT> NULL +) +DUPLICATE KEY(`id`) +DISTRIBUTED BY HASH(`id`) BUCKETS 1 +PROPERTIES ( + "replication_allocation" = "tag.location.default: 1" +); +``` + +### 第 3 步:导入数据 + +```bash +curl --location-trusted \ + -u "root":"" \ + -H "format:json" \ + -H "columns: id, c_array" \ + -H "strip_outer_array:true" \ + -T "test_array.json" \ + http://localhost:8040/api/testdb/array_test/_stream_load +``` + +### 第 4 步:检查导入数据 + +```sql +mysql> SELECT * FROM array_test; ++------+-----------------+ +| id | c_array | ++------+-----------------+ +| 1 | [1, 2, 3, 4, 5] | +| 2 | [6, 7, 8] | +| 3 | [] | +| 4 | NULL | ++------+-----------------+ +4 rows in set (0.01 sec) +``` diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/complex-types/json.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/complex-types/json.md new file mode 100644 index 00000000000..a6a825dc0a5 --- /dev/null +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/complex-types/json.md @@ -0,0 +1,152 @@ +--- +{ + "title": "JSON", + "language": "zh-CN" +} +--- + +<!-- +Licensed to the Apache Software Foundation (ASF) under one +or more contributor license agreements. See the NOTICE file +distributed with this work for additional information +regarding copyright ownership. The ASF licenses this file +to you under the Apache License, Version 2.0 (the +"License"); you may not use this file except in compliance +with the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, +software distributed under the License is distributed on an +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, either express or implied. See the License for the +specific language governing permissions and limitations +under the License. +--> + +`JSON` 数据类型,用二进制格式高效存储 JSON 数据,通过 JSON 函数访问其内部字段。 + +默认支持 1048576 字节(1 MB),可调大到 2147483643 字节(2 GB),可通过 BE 配置`string_type_length_soft_limit_bytes` 调整。 + +与普通 String 类型存储的 JSON 字符串相比,JSON 类型有两点优势 + +1. 数据写入时进行 JSON 格式校验 +2. 二进制存储格式更加高效,通过json_extract等函数可以高效访问JSON内部字段,比get_json_xx函数快几倍 + +:::caution[注意] +在1.2.x版本中,JSON 类型的名字是 JSONB,为了尽量跟 MySQL 兼容,从 2.0.0 版本开始改名为 JSON,老的表仍然可以使用。 +::: + +## CSV格式导入 + +### 第 1 步:准备数据 + +创建如下的 csv 文件:`test_json.csv` +其中分隔符使用 `|` 而不是逗号,以便和 json 中的逗号区分。 + +``` +1|{"name": "tom", "age": 35} +2|{"name": null, "age": 28} +3|{"name": "micheal", "age": null} +4|{"name": null, "age": null} +5|null +``` + +### 第 2 步:在数据库中建表 + +```sql +CREATE TABLE json_test ( + id INT NOT NULL, + c_json JSON NULL +) +DUPLICATE KEY(id) +DISTRIBUTED BY HASH(id) BUCKETS 1 +PROPERTIES ( + "replication_allocation" = "tag.location.default: 1" +); +``` + +### 第 3 步:导入数据 + +```bash +curl --location-trusted \ + -u "root":"" \ + -H "column_separator:|" \ + -H "columns: id, c_json" \ + -T "test_json.csv" \ + http://localhost:8040/api/testdb/json_test/_stream_load +``` + +### 第 4 步:检查导入数据 + +```sql +SELECT * FROM json_test; ++------+-------------------------------+ +| id | c_json | ++------+-------------------------------+ +| 1 | {"name":"tom","age":35} | +| 2 | {"name":null,"age":28} | +| 3 | {"name":"micheal","age":null} | +| 4 | {"name":null,"age":null} | +| 5 | null | ++------+-------------------------------+ +5 rows in set (0.01 sec) +``` + +## JSON格式导入 + +### 第 1 步:准备数据 + +创建如下的 JSON 文件,`test_json.json` + +```json +[ + {"id": 1, "c_json": {"name": "tom", "age": 35}}, + {"id": 2, "c_json": {"name": null, "age": 28}}, + {"id": 3, "c_json": {"name": "micheal", "age": null}}, + {"id": 4, "c_json": {"name": null, "age": null}}, + {"id": 5, "c_json": null} +] +``` + +### 第 2 步:在数据库中建表 + +```sql +CREATE TABLE json_test ( + id INT NOT NULL, + c_json JSON NULL +) +DUPLICATE KEY(id) +DISTRIBUTED BY HASH(id) BUCKETS 1 +PROPERTIES ( + "replication_allocation" = "tag.location.default: 1" +); +``` + +### 第 3 步:导入数据 + +```bash +curl --location-trusted \ + -u "root":"" \ + -H "format:json" \ + -H "columns: id, c_json" \ + -H "strip_outer_array:true" \ + -T "test_json.json" \ + http://localhost:8040/api/testdb/json_test/_stream_load +``` + +### 第 4 步:检查导入数据 + +```sql +mysql> SELECT * FROM json_test; ++------+-------------------------------+ +| id | c_json | ++------+-------------------------------+ +| 1 | {"name":"tom","age":35} | +| 2 | {"name":null,"age":28} | +| 3 | {"name":"micheal","age":null} | +| 4 | {"name":null,"age":null} | +| 5 | NULL | ++------+-------------------------------+ +5 rows in set (0.01 sec) +``` diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/complex-types/map.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/complex-types/map.md new file mode 100644 index 00000000000..0ca127503dd --- /dev/null +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/complex-types/map.md @@ -0,0 +1,146 @@ +--- +{ + "title": "MAP", + "language": "zh-CN" +} +--- + +<!-- +Licensed to the Apache Software Foundation (ASF) under one +or more contributor license agreements. See the NOTICE file +distributed with this work for additional information +regarding copyright ownership. The ASF licenses this file +to you under the Apache License, Version 2.0 (the +"License"); you may not use this file except in compliance +with the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, +software distributed under the License is distributed on an +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, either express or implied. See the License for the +specific language governing permissions and limitations +under the License. +--> + +`MAP<K, V>` 表示由K, V类型元素组成的 map,不能作为 key 列使用。 + +- 目前支持在 Duplicate,Unique 模型的表中使用。 + +K, V 支持的类型有: + +```sql +BOOLEAN, TINYINT, SMALLINT, INT, BIGINT, LARGEINT, FLOAT, DOUBLE, DECIMAL, DECIMALV3, +DATE, DATEV2, DATETIME, DATETIMEV2, CHAR, VARCHAR, STRING +``` + +## CSV格式导入 + +### 第 1 步:准备数据 + +创建如下的 csv 文件:`test_map.csv` +其中分隔符使用 `|` 而不是逗号,以便和 map 中的逗号区分。 + +``` +1|{"Emily":101,"age":25} +2|{"Benjamin":102} +3|{} +4|null +``` + +### 第 2 步:在数据库中建表 + +```sql +CREATE TABLE map_test ( + id INT NOT NULL, + c_map MAP<STRING, INT> NULL +) +DUPLICATE KEY(id) +DISTRIBUTED BY HASH(id) BUCKETS 1 +PROPERTIES ( + "replication_allocation" = "tag.location.default: 1" +); +``` + +### 第 3 步:导入数据 + +```bash +curl --location-trusted \ + -u "root":"" \ + -H "column_separator:|" \ + -H "columns: id, c_map" \ + -T "test_map.csv" \ + http://localhost:8040/api/testdb/map_test/_stream_load +``` + +### 第 4 步:检查导入数据 + +```sql +mysql> SELECT * FROM map_test; ++------+-------------------------+ +| id | c_map | ++------+-------------------------+ +| 1 | {"Emily":101, "age":25} | +| 2 | {"Benjamin":102} | +| 3 | {} | +| 4 | NULL | ++------+-------------------------+ +4 rows in set (0.01 sec) +``` + +## JSON格式导入 + +### 第 1 步:准备数据 + +创建如下的 JSON 文件,`test_map.json` + +```json +[ + {"id":1, "c_map":{"Emily":101, "age":25}}, + {"id":2, "c_map":{"Benjamin":102}}, + {"id":3, "c_map":{}}, + {"id":4, "c_map":null} +] +``` + +### 第 2 步:在数据库中建表 + +```sql +CREATE TABLE map_test ( + id INT NOT NULL, + c_map MAP<STRING, INT> NULL +) +DUPLICATE KEY(id) +DISTRIBUTED BY HASH(id) BUCKETS 1 +PROPERTIES ( + "replication_allocation" = "tag.location.default: 1" +); +``` + +### 第 3 步:导入数据 + +```bash +curl --location-trusted \ + -u "root":"" \ + -H "format:json" \ + -H "columns: id, c_map" \ + -H "strip_outer_array:true" \ + -T "test_map.json" \ + http://localhost:8040/api/testdb/map_test/_stream_load +``` + +### 第 4 步:检查导入数据 + +```sql +mysql> SELECT * FROM map_test; ++------+-------------------------+ +| id | c_map | ++------+-------------------------+ +| 1 | {"Emily":101, "age":25} | +| 2 | {"Benjamin":102} | +| 3 | {} | +| 4 | NULL | ++------+-------------------------+ +4 rows in set (0.01 sec) +``` diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/complex-types/struct.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/complex-types/struct.md new file mode 100644 index 00000000000..19d46f40b56 --- /dev/null +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/complex-types/struct.md @@ -0,0 +1,154 @@ +--- +{ + "title": "STRUCT", + "language": "zh-CN" +} +--- + +<!-- +Licensed to the Apache Software Foundation (ASF) under one +or more contributor license agreements. See the NOTICE file +distributed with this work for additional information +regarding copyright ownership. The ASF licenses this file +to you under the Apache License, Version 2.0 (the +"License"); you may not use this file except in compliance +with the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, +software distributed under the License is distributed on an +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, either express or implied. See the License for the +specific language governing permissions and limitations +under the License. +--> + +`STRUCT<field_name:field_type [COMMENT 'comment_string'], ... >` 表示由多个 Field 组成的结构体,也可被理解为多个列的集合。 + +- 不能作为 Key 使用,目前 STRUCT 仅支持在 Duplicate 模型的表中使用。 +- 一个 Struct 中的 Field 的名字和数量固定,总是为 Nullable,一个 Field 通常由下面部分组成。 + - field_name: Field 的标识符,不可重复 + - field_type: Field 的类型 + - COMMENT: Field 的注释,可选 (暂不支持) + +当前可支持的类型有: + +```sql +BOOLEAN, TINYINT, SMALLINT, INT, BIGINT, LARGEINT, FLOAT, DOUBLE, DECIMAL, DECIMALV3, +DATE, DATEV2, DATETIME, DATETIMEV2, CHAR, VARCHAR, STRING +``` + +## CSV格式导入 + +### 第 1 步:准备数据 + +创建如下的 csv 文件:`test_struct.csv` +其中分隔符使用 `|` 而不是逗号,以便和 struct 中的逗号区分。 + +``` +1|{10, 3.14, "Emily"} +2|{4, 1.5, null} +3|{7, null, "Benjamin"} +4|{} +5|null +``` + +### 第 2 步:在数据库中建表 + +```sql +CREATE TABLE struct_test ( + id INT NOT NULL, + c_struct STRUCT<f1:INT,f2:FLOAT,f3:STRING> NULL +) +DUPLICATE KEY(id) +DISTRIBUTED BY HASH(id) BUCKETS 1 +PROPERTIES ( + "replication_allocation" = "tag.location.default: 1" +); +``` + +### 第 3 步:导入数据 + +```bash +curl --location-trusted \ + -u "root":"" \ + -H "column_separator:|" \ + -H "columns: id, c_struct" \ + -T "test_struct.csv" \ + http://localhost:8040/api/testdb/struct_test/_stream_load +``` + +### 第 4 步:检查导入数据 + +```sql +mysql> SELECT * FROM struct_test; ++------+--------------------------------------+ +| id | c_struct | ++------+--------------------------------------+ +| 1 | {"f1":10, "f2":3.14, "f3":"Emily"} | +| 2 | {"f1":4, "f2":1.5, "f3":null} | +| 3 | {"f1":7, "f2":null, "f3":"Benjamin"} | +| 4 | {"f1":null, "f2":null, "f3":null} | +| 5 | NULL | ++------+--------------------------------------+ +5 rows in set (0.01 sec) +``` + +## JSON格式导入 + +### 第 1 步:准备数据 + +创建如下的 JSON 文件,`test_struct.json` + +```json +[ + {"id":1, "c_struct":{"f1":10, "f2":3.14, "f3":"Emily"}}, + {"id":2, "c_struct":{"f1":4, "f2":1.5, "f3":null}}, + {"id":3, "c_struct":{"f1":7, "f2":null, "f3":"Benjamin"}}, + {"id":4, "c_struct":{}}, + {"id":5, "c_struct":null} +] +``` + +### 第 2 步:在数据库中建表 + +```sql +CREATE TABLE struct_test ( + id INT NOT NULL, + c_struct STRUCT<f1:INT,f2:FLOAT,f3:STRING> NULL +) +DUPLICATE KEY(id) +DISTRIBUTED BY HASH(id) BUCKETS 1 +PROPERTIES ( + "replication_allocation" = "tag.location.default: 1" +); +``` + +### 第 3 步:导入数据 + +```bash +curl --location-trusted \ + -u "root":"" \ + -H "format:json" \ + -H "columns: id, c_struct" \ + -H "strip_outer_array:true" \ + -T "test_struct.json" \ + http://localhost:8040/api/testdb/struct_test/_stream_load +``` + +### 第 4 步:检查导入数据 + +```sql +mysql> SELECT * FROM struct_test; ++------+--------------------------------------+ +| id | c_struct | ++------+--------------------------------------+ +| 1 | {"f1":10, "f2":3.14, "f3":"Emily"} | +| 2 | {"f1":4, "f2":1.5, "f3":null} | +| 3 | {"f1":7, "f2":null, "f3":"Benjamin"} | +| 4 | {"f1":null, "f2":null, "f3":null} | +| 5 | NULL | ++------+--------------------------------------+ +5 rows in set (0.00 sec) +``` diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1.json b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1.json index 7661356d9cf..3bb2abeaad1 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1.json +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1.json @@ -111,6 +111,10 @@ "message": "导入方式", "description": "The label for category Import Way in sidebar docs" }, + "sidebar.docs.category.Complex Data Types": { + "message": "复杂数据类型", + "description": "The label for category Complex Data Types in sidebar docs" + }, "sidebar.docs.category.Updating Data": { "message": "数据更新", "description": "The label for category Updating Data in sidebar docs" diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/data-operate/import/complex-types/array.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/data-operate/import/complex-types/array.md new file mode 100644 index 00000000000..e67fa40cfbe --- /dev/null +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/data-operate/import/complex-types/array.md @@ -0,0 +1,147 @@ +--- +{ + "title": "ARRAY", + "language": "zh-CN" +} +--- + +<!-- +Licensed to the Apache Software Foundation (ASF) under one +or more contributor license agreements. See the NOTICE file +distributed with this work for additional information +regarding copyright ownership. The ASF licenses this file +to you under the Apache License, Version 2.0 (the +"License"); you may not use this file except in compliance +with the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, +software distributed under the License is distributed on an +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, either express or implied. See the License for the +specific language governing permissions and limitations +under the License. +--> + +`ARRAY<T>` 表示由 T 类型元素组成的数组,不能作为 key 列使用。 + +- 2.0 之前仅支持在 Duplicate 模型的表中使用。 +- 从 2.0 版本开始支持在 Unique 模型的表中的非 key 列使用。 + +T 支持的类型有: + +```sql +BOOLEAN, TINYINT, SMALLINT, INT, BIGINT, LARGEINT, FLOAT, DOUBLE, DECIMAL, +DATE, DATEV2, DATETIME, DATETIMEV2, CHAR, VARCHAR, STRING +``` + +## CSV格式导入 + +### 第 1 步:准备数据 + +创建如下的 csv 文件:`test_array.csv` +其中分隔符使用 `|` 而不是逗号,以便和 array 中的逗号区分。 + +``` +1|[1,2,3,4,5] +2|[6,7,8] +3|[] +4|null +``` + +### 第 2 步:在数据库中建表 + +```sql +CREATE TABLE `array_test` ( + `id` INT NOT NULL, + `c_array` ARRAY<INT> NULL +) +DUPLICATE KEY(`id`) +DISTRIBUTED BY HASH(`id`) BUCKETS 1 +PROPERTIES ( + "replication_allocation" = "tag.location.default: 1" +); +``` + +### 第 3 步:导入数据 + +```bash +curl --location-trusted \ + -u "root":"" \ + -H "column_separator:|" \ + -H "columns: id, c_array" \ + -T "test_array.csv" \ + http://localhost:8040/api/testdb/array_test/_stream_load +``` + +### 第 4 步:检查导入数据 + +```sql +mysql> SELECT * FROM array_test; ++------+-----------------+ +| id | c_array | ++------+-----------------+ +| 1 | [1, 2, 3, 4, 5] | +| 2 | [6, 7, 8] | +| 3 | [] | +| 4 | NULL | ++------+-----------------+ +4 rows in set (0.01 sec) +``` + +## JSON格式导入 + +### 第 1 步:准备数据 + +创建如下的 JSON 文件,`test_array.json` + +```json +[ + {"id":1, "c_array":[1,2,3,4,5]}, + {"id":2, "c_array":[6,7,8]}, + {"id":3, "c_array":[]}, + {"id":4, "c_array":null} +] +``` + +### 第 2 步:在数据库中建表 + +```sql +CREATE TABLE `array_test` ( + `id` INT NOT NULL, + `c_array` ARRAY<INT> NULL +) +DUPLICATE KEY(`id`) +DISTRIBUTED BY HASH(`id`) BUCKETS 1 +PROPERTIES ( + "replication_allocation" = "tag.location.default: 1" +); +``` + +### 第 3 步:导入数据 + +```bash +curl --location-trusted \ + -u "root":"" \ + -H "format:json" \ + -H "columns: id, c_array" \ + -H "strip_outer_array:true" \ + -T "test_array.json" \ + http://localhost:8040/api/testdb/array_test/_stream_load +``` + +### 第 4 步:检查导入数据 + +```sql +mysql> SELECT * FROM array_test; ++------+-----------------+ +| id | c_array | ++------+-----------------+ +| 1 | [1, 2, 3, 4, 5] | +| 2 | [6, 7, 8] | +| 3 | [] | +| 4 | NULL | ++------+-----------------+ +4 rows in set (0.01 sec) +``` diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/data-operate/import/complex-types/json.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/data-operate/import/complex-types/json.md new file mode 100644 index 00000000000..a6a825dc0a5 --- /dev/null +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/data-operate/import/complex-types/json.md @@ -0,0 +1,152 @@ +--- +{ + "title": "JSON", + "language": "zh-CN" +} +--- + +<!-- +Licensed to the Apache Software Foundation (ASF) under one +or more contributor license agreements. See the NOTICE file +distributed with this work for additional information +regarding copyright ownership. The ASF licenses this file +to you under the Apache License, Version 2.0 (the +"License"); you may not use this file except in compliance +with the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, +software distributed under the License is distributed on an +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, either express or implied. See the License for the +specific language governing permissions and limitations +under the License. +--> + +`JSON` 数据类型,用二进制格式高效存储 JSON 数据,通过 JSON 函数访问其内部字段。 + +默认支持 1048576 字节(1 MB),可调大到 2147483643 字节(2 GB),可通过 BE 配置`string_type_length_soft_limit_bytes` 调整。 + +与普通 String 类型存储的 JSON 字符串相比,JSON 类型有两点优势 + +1. 数据写入时进行 JSON 格式校验 +2. 二进制存储格式更加高效,通过json_extract等函数可以高效访问JSON内部字段,比get_json_xx函数快几倍 + +:::caution[注意] +在1.2.x版本中,JSON 类型的名字是 JSONB,为了尽量跟 MySQL 兼容,从 2.0.0 版本开始改名为 JSON,老的表仍然可以使用。 +::: + +## CSV格式导入 + +### 第 1 步:准备数据 + +创建如下的 csv 文件:`test_json.csv` +其中分隔符使用 `|` 而不是逗号,以便和 json 中的逗号区分。 + +``` +1|{"name": "tom", "age": 35} +2|{"name": null, "age": 28} +3|{"name": "micheal", "age": null} +4|{"name": null, "age": null} +5|null +``` + +### 第 2 步:在数据库中建表 + +```sql +CREATE TABLE json_test ( + id INT NOT NULL, + c_json JSON NULL +) +DUPLICATE KEY(id) +DISTRIBUTED BY HASH(id) BUCKETS 1 +PROPERTIES ( + "replication_allocation" = "tag.location.default: 1" +); +``` + +### 第 3 步:导入数据 + +```bash +curl --location-trusted \ + -u "root":"" \ + -H "column_separator:|" \ + -H "columns: id, c_json" \ + -T "test_json.csv" \ + http://localhost:8040/api/testdb/json_test/_stream_load +``` + +### 第 4 步:检查导入数据 + +```sql +SELECT * FROM json_test; ++------+-------------------------------+ +| id | c_json | ++------+-------------------------------+ +| 1 | {"name":"tom","age":35} | +| 2 | {"name":null,"age":28} | +| 3 | {"name":"micheal","age":null} | +| 4 | {"name":null,"age":null} | +| 5 | null | ++------+-------------------------------+ +5 rows in set (0.01 sec) +``` + +## JSON格式导入 + +### 第 1 步:准备数据 + +创建如下的 JSON 文件,`test_json.json` + +```json +[ + {"id": 1, "c_json": {"name": "tom", "age": 35}}, + {"id": 2, "c_json": {"name": null, "age": 28}}, + {"id": 3, "c_json": {"name": "micheal", "age": null}}, + {"id": 4, "c_json": {"name": null, "age": null}}, + {"id": 5, "c_json": null} +] +``` + +### 第 2 步:在数据库中建表 + +```sql +CREATE TABLE json_test ( + id INT NOT NULL, + c_json JSON NULL +) +DUPLICATE KEY(id) +DISTRIBUTED BY HASH(id) BUCKETS 1 +PROPERTIES ( + "replication_allocation" = "tag.location.default: 1" +); +``` + +### 第 3 步:导入数据 + +```bash +curl --location-trusted \ + -u "root":"" \ + -H "format:json" \ + -H "columns: id, c_json" \ + -H "strip_outer_array:true" \ + -T "test_json.json" \ + http://localhost:8040/api/testdb/json_test/_stream_load +``` + +### 第 4 步:检查导入数据 + +```sql +mysql> SELECT * FROM json_test; ++------+-------------------------------+ +| id | c_json | ++------+-------------------------------+ +| 1 | {"name":"tom","age":35} | +| 2 | {"name":null,"age":28} | +| 3 | {"name":"micheal","age":null} | +| 4 | {"name":null,"age":null} | +| 5 | NULL | ++------+-------------------------------+ +5 rows in set (0.01 sec) +``` diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/data-operate/import/complex-types/map.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/data-operate/import/complex-types/map.md new file mode 100644 index 00000000000..0ca127503dd --- /dev/null +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/data-operate/import/complex-types/map.md @@ -0,0 +1,146 @@ +--- +{ + "title": "MAP", + "language": "zh-CN" +} +--- + +<!-- +Licensed to the Apache Software Foundation (ASF) under one +or more contributor license agreements. See the NOTICE file +distributed with this work for additional information +regarding copyright ownership. The ASF licenses this file +to you under the Apache License, Version 2.0 (the +"License"); you may not use this file except in compliance +with the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, +software distributed under the License is distributed on an +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, either express or implied. See the License for the +specific language governing permissions and limitations +under the License. +--> + +`MAP<K, V>` 表示由K, V类型元素组成的 map,不能作为 key 列使用。 + +- 目前支持在 Duplicate,Unique 模型的表中使用。 + +K, V 支持的类型有: + +```sql +BOOLEAN, TINYINT, SMALLINT, INT, BIGINT, LARGEINT, FLOAT, DOUBLE, DECIMAL, DECIMALV3, +DATE, DATEV2, DATETIME, DATETIMEV2, CHAR, VARCHAR, STRING +``` + +## CSV格式导入 + +### 第 1 步:准备数据 + +创建如下的 csv 文件:`test_map.csv` +其中分隔符使用 `|` 而不是逗号,以便和 map 中的逗号区分。 + +``` +1|{"Emily":101,"age":25} +2|{"Benjamin":102} +3|{} +4|null +``` + +### 第 2 步:在数据库中建表 + +```sql +CREATE TABLE map_test ( + id INT NOT NULL, + c_map MAP<STRING, INT> NULL +) +DUPLICATE KEY(id) +DISTRIBUTED BY HASH(id) BUCKETS 1 +PROPERTIES ( + "replication_allocation" = "tag.location.default: 1" +); +``` + +### 第 3 步:导入数据 + +```bash +curl --location-trusted \ + -u "root":"" \ + -H "column_separator:|" \ + -H "columns: id, c_map" \ + -T "test_map.csv" \ + http://localhost:8040/api/testdb/map_test/_stream_load +``` + +### 第 4 步:检查导入数据 + +```sql +mysql> SELECT * FROM map_test; ++------+-------------------------+ +| id | c_map | ++------+-------------------------+ +| 1 | {"Emily":101, "age":25} | +| 2 | {"Benjamin":102} | +| 3 | {} | +| 4 | NULL | ++------+-------------------------+ +4 rows in set (0.01 sec) +``` + +## JSON格式导入 + +### 第 1 步:准备数据 + +创建如下的 JSON 文件,`test_map.json` + +```json +[ + {"id":1, "c_map":{"Emily":101, "age":25}}, + {"id":2, "c_map":{"Benjamin":102}}, + {"id":3, "c_map":{}}, + {"id":4, "c_map":null} +] +``` + +### 第 2 步:在数据库中建表 + +```sql +CREATE TABLE map_test ( + id INT NOT NULL, + c_map MAP<STRING, INT> NULL +) +DUPLICATE KEY(id) +DISTRIBUTED BY HASH(id) BUCKETS 1 +PROPERTIES ( + "replication_allocation" = "tag.location.default: 1" +); +``` + +### 第 3 步:导入数据 + +```bash +curl --location-trusted \ + -u "root":"" \ + -H "format:json" \ + -H "columns: id, c_map" \ + -H "strip_outer_array:true" \ + -T "test_map.json" \ + http://localhost:8040/api/testdb/map_test/_stream_load +``` + +### 第 4 步:检查导入数据 + +```sql +mysql> SELECT * FROM map_test; ++------+-------------------------+ +| id | c_map | ++------+-------------------------+ +| 1 | {"Emily":101, "age":25} | +| 2 | {"Benjamin":102} | +| 3 | {} | +| 4 | NULL | ++------+-------------------------+ +4 rows in set (0.01 sec) +``` diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/data-operate/import/complex-types/struct.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/data-operate/import/complex-types/struct.md new file mode 100644 index 00000000000..19d46f40b56 --- /dev/null +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/data-operate/import/complex-types/struct.md @@ -0,0 +1,154 @@ +--- +{ + "title": "STRUCT", + "language": "zh-CN" +} +--- + +<!-- +Licensed to the Apache Software Foundation (ASF) under one +or more contributor license agreements. See the NOTICE file +distributed with this work for additional information +regarding copyright ownership. The ASF licenses this file +to you under the Apache License, Version 2.0 (the +"License"); you may not use this file except in compliance +with the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, +software distributed under the License is distributed on an +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, either express or implied. See the License for the +specific language governing permissions and limitations +under the License. +--> + +`STRUCT<field_name:field_type [COMMENT 'comment_string'], ... >` 表示由多个 Field 组成的结构体,也可被理解为多个列的集合。 + +- 不能作为 Key 使用,目前 STRUCT 仅支持在 Duplicate 模型的表中使用。 +- 一个 Struct 中的 Field 的名字和数量固定,总是为 Nullable,一个 Field 通常由下面部分组成。 + - field_name: Field 的标识符,不可重复 + - field_type: Field 的类型 + - COMMENT: Field 的注释,可选 (暂不支持) + +当前可支持的类型有: + +```sql +BOOLEAN, TINYINT, SMALLINT, INT, BIGINT, LARGEINT, FLOAT, DOUBLE, DECIMAL, DECIMALV3, +DATE, DATEV2, DATETIME, DATETIMEV2, CHAR, VARCHAR, STRING +``` + +## CSV格式导入 + +### 第 1 步:准备数据 + +创建如下的 csv 文件:`test_struct.csv` +其中分隔符使用 `|` 而不是逗号,以便和 struct 中的逗号区分。 + +``` +1|{10, 3.14, "Emily"} +2|{4, 1.5, null} +3|{7, null, "Benjamin"} +4|{} +5|null +``` + +### 第 2 步:在数据库中建表 + +```sql +CREATE TABLE struct_test ( + id INT NOT NULL, + c_struct STRUCT<f1:INT,f2:FLOAT,f3:STRING> NULL +) +DUPLICATE KEY(id) +DISTRIBUTED BY HASH(id) BUCKETS 1 +PROPERTIES ( + "replication_allocation" = "tag.location.default: 1" +); +``` + +### 第 3 步:导入数据 + +```bash +curl --location-trusted \ + -u "root":"" \ + -H "column_separator:|" \ + -H "columns: id, c_struct" \ + -T "test_struct.csv" \ + http://localhost:8040/api/testdb/struct_test/_stream_load +``` + +### 第 4 步:检查导入数据 + +```sql +mysql> SELECT * FROM struct_test; ++------+--------------------------------------+ +| id | c_struct | ++------+--------------------------------------+ +| 1 | {"f1":10, "f2":3.14, "f3":"Emily"} | +| 2 | {"f1":4, "f2":1.5, "f3":null} | +| 3 | {"f1":7, "f2":null, "f3":"Benjamin"} | +| 4 | {"f1":null, "f2":null, "f3":null} | +| 5 | NULL | ++------+--------------------------------------+ +5 rows in set (0.01 sec) +``` + +## JSON格式导入 + +### 第 1 步:准备数据 + +创建如下的 JSON 文件,`test_struct.json` + +```json +[ + {"id":1, "c_struct":{"f1":10, "f2":3.14, "f3":"Emily"}}, + {"id":2, "c_struct":{"f1":4, "f2":1.5, "f3":null}}, + {"id":3, "c_struct":{"f1":7, "f2":null, "f3":"Benjamin"}}, + {"id":4, "c_struct":{}}, + {"id":5, "c_struct":null} +] +``` + +### 第 2 步:在数据库中建表 + +```sql +CREATE TABLE struct_test ( + id INT NOT NULL, + c_struct STRUCT<f1:INT,f2:FLOAT,f3:STRING> NULL +) +DUPLICATE KEY(id) +DISTRIBUTED BY HASH(id) BUCKETS 1 +PROPERTIES ( + "replication_allocation" = "tag.location.default: 1" +); +``` + +### 第 3 步:导入数据 + +```bash +curl --location-trusted \ + -u "root":"" \ + -H "format:json" \ + -H "columns: id, c_struct" \ + -H "strip_outer_array:true" \ + -T "test_struct.json" \ + http://localhost:8040/api/testdb/struct_test/_stream_load +``` + +### 第 4 步:检查导入数据 + +```sql +mysql> SELECT * FROM struct_test; ++------+--------------------------------------+ +| id | c_struct | ++------+--------------------------------------+ +| 1 | {"f1":10, "f2":3.14, "f3":"Emily"} | +| 2 | {"f1":4, "f2":1.5, "f3":null} | +| 3 | {"f1":7, "f2":null, "f3":"Benjamin"} | +| 4 | {"f1":null, "f2":null, "f3":null} | +| 5 | NULL | ++------+--------------------------------------+ +5 rows in set (0.00 sec) +``` diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0.json b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0.json index f89507adb77..5b954ab391a 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0.json +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0.json @@ -111,6 +111,10 @@ "message": "导入方式", "description": "The label for category Import Way in sidebar docs" }, + "sidebar.docs.category.Complex Data Types": { + "message": "复杂数据类型", + "description": "The label for category Complex Data Types in sidebar docs" + }, "sidebar.docs.category.Updating Data": { "message": "数据更新", "description": "The label for category Update in sidebar docs" diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/data-operate/import/complex-types/array.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/data-operate/import/complex-types/array.md new file mode 100644 index 00000000000..e67fa40cfbe --- /dev/null +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/data-operate/import/complex-types/array.md @@ -0,0 +1,147 @@ +--- +{ + "title": "ARRAY", + "language": "zh-CN" +} +--- + +<!-- +Licensed to the Apache Software Foundation (ASF) under one +or more contributor license agreements. See the NOTICE file +distributed with this work for additional information +regarding copyright ownership. The ASF licenses this file +to you under the Apache License, Version 2.0 (the +"License"); you may not use this file except in compliance +with the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, +software distributed under the License is distributed on an +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, either express or implied. See the License for the +specific language governing permissions and limitations +under the License. +--> + +`ARRAY<T>` 表示由 T 类型元素组成的数组,不能作为 key 列使用。 + +- 2.0 之前仅支持在 Duplicate 模型的表中使用。 +- 从 2.0 版本开始支持在 Unique 模型的表中的非 key 列使用。 + +T 支持的类型有: + +```sql +BOOLEAN, TINYINT, SMALLINT, INT, BIGINT, LARGEINT, FLOAT, DOUBLE, DECIMAL, +DATE, DATEV2, DATETIME, DATETIMEV2, CHAR, VARCHAR, STRING +``` + +## CSV格式导入 + +### 第 1 步:准备数据 + +创建如下的 csv 文件:`test_array.csv` +其中分隔符使用 `|` 而不是逗号,以便和 array 中的逗号区分。 + +``` +1|[1,2,3,4,5] +2|[6,7,8] +3|[] +4|null +``` + +### 第 2 步:在数据库中建表 + +```sql +CREATE TABLE `array_test` ( + `id` INT NOT NULL, + `c_array` ARRAY<INT> NULL +) +DUPLICATE KEY(`id`) +DISTRIBUTED BY HASH(`id`) BUCKETS 1 +PROPERTIES ( + "replication_allocation" = "tag.location.default: 1" +); +``` + +### 第 3 步:导入数据 + +```bash +curl --location-trusted \ + -u "root":"" \ + -H "column_separator:|" \ + -H "columns: id, c_array" \ + -T "test_array.csv" \ + http://localhost:8040/api/testdb/array_test/_stream_load +``` + +### 第 4 步:检查导入数据 + +```sql +mysql> SELECT * FROM array_test; ++------+-----------------+ +| id | c_array | ++------+-----------------+ +| 1 | [1, 2, 3, 4, 5] | +| 2 | [6, 7, 8] | +| 3 | [] | +| 4 | NULL | ++------+-----------------+ +4 rows in set (0.01 sec) +``` + +## JSON格式导入 + +### 第 1 步:准备数据 + +创建如下的 JSON 文件,`test_array.json` + +```json +[ + {"id":1, "c_array":[1,2,3,4,5]}, + {"id":2, "c_array":[6,7,8]}, + {"id":3, "c_array":[]}, + {"id":4, "c_array":null} +] +``` + +### 第 2 步:在数据库中建表 + +```sql +CREATE TABLE `array_test` ( + `id` INT NOT NULL, + `c_array` ARRAY<INT> NULL +) +DUPLICATE KEY(`id`) +DISTRIBUTED BY HASH(`id`) BUCKETS 1 +PROPERTIES ( + "replication_allocation" = "tag.location.default: 1" +); +``` + +### 第 3 步:导入数据 + +```bash +curl --location-trusted \ + -u "root":"" \ + -H "format:json" \ + -H "columns: id, c_array" \ + -H "strip_outer_array:true" \ + -T "test_array.json" \ + http://localhost:8040/api/testdb/array_test/_stream_load +``` + +### 第 4 步:检查导入数据 + +```sql +mysql> SELECT * FROM array_test; ++------+-----------------+ +| id | c_array | ++------+-----------------+ +| 1 | [1, 2, 3, 4, 5] | +| 2 | [6, 7, 8] | +| 3 | [] | +| 4 | NULL | ++------+-----------------+ +4 rows in set (0.01 sec) +``` diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/data-operate/import/complex-types/json.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/data-operate/import/complex-types/json.md new file mode 100644 index 00000000000..a6a825dc0a5 --- /dev/null +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/data-operate/import/complex-types/json.md @@ -0,0 +1,152 @@ +--- +{ + "title": "JSON", + "language": "zh-CN" +} +--- + +<!-- +Licensed to the Apache Software Foundation (ASF) under one +or more contributor license agreements. See the NOTICE file +distributed with this work for additional information +regarding copyright ownership. The ASF licenses this file +to you under the Apache License, Version 2.0 (the +"License"); you may not use this file except in compliance +with the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, +software distributed under the License is distributed on an +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, either express or implied. See the License for the +specific language governing permissions and limitations +under the License. +--> + +`JSON` 数据类型,用二进制格式高效存储 JSON 数据,通过 JSON 函数访问其内部字段。 + +默认支持 1048576 字节(1 MB),可调大到 2147483643 字节(2 GB),可通过 BE 配置`string_type_length_soft_limit_bytes` 调整。 + +与普通 String 类型存储的 JSON 字符串相比,JSON 类型有两点优势 + +1. 数据写入时进行 JSON 格式校验 +2. 二进制存储格式更加高效,通过json_extract等函数可以高效访问JSON内部字段,比get_json_xx函数快几倍 + +:::caution[注意] +在1.2.x版本中,JSON 类型的名字是 JSONB,为了尽量跟 MySQL 兼容,从 2.0.0 版本开始改名为 JSON,老的表仍然可以使用。 +::: + +## CSV格式导入 + +### 第 1 步:准备数据 + +创建如下的 csv 文件:`test_json.csv` +其中分隔符使用 `|` 而不是逗号,以便和 json 中的逗号区分。 + +``` +1|{"name": "tom", "age": 35} +2|{"name": null, "age": 28} +3|{"name": "micheal", "age": null} +4|{"name": null, "age": null} +5|null +``` + +### 第 2 步:在数据库中建表 + +```sql +CREATE TABLE json_test ( + id INT NOT NULL, + c_json JSON NULL +) +DUPLICATE KEY(id) +DISTRIBUTED BY HASH(id) BUCKETS 1 +PROPERTIES ( + "replication_allocation" = "tag.location.default: 1" +); +``` + +### 第 3 步:导入数据 + +```bash +curl --location-trusted \ + -u "root":"" \ + -H "column_separator:|" \ + -H "columns: id, c_json" \ + -T "test_json.csv" \ + http://localhost:8040/api/testdb/json_test/_stream_load +``` + +### 第 4 步:检查导入数据 + +```sql +SELECT * FROM json_test; ++------+-------------------------------+ +| id | c_json | ++------+-------------------------------+ +| 1 | {"name":"tom","age":35} | +| 2 | {"name":null,"age":28} | +| 3 | {"name":"micheal","age":null} | +| 4 | {"name":null,"age":null} | +| 5 | null | ++------+-------------------------------+ +5 rows in set (0.01 sec) +``` + +## JSON格式导入 + +### 第 1 步:准备数据 + +创建如下的 JSON 文件,`test_json.json` + +```json +[ + {"id": 1, "c_json": {"name": "tom", "age": 35}}, + {"id": 2, "c_json": {"name": null, "age": 28}}, + {"id": 3, "c_json": {"name": "micheal", "age": null}}, + {"id": 4, "c_json": {"name": null, "age": null}}, + {"id": 5, "c_json": null} +] +``` + +### 第 2 步:在数据库中建表 + +```sql +CREATE TABLE json_test ( + id INT NOT NULL, + c_json JSON NULL +) +DUPLICATE KEY(id) +DISTRIBUTED BY HASH(id) BUCKETS 1 +PROPERTIES ( + "replication_allocation" = "tag.location.default: 1" +); +``` + +### 第 3 步:导入数据 + +```bash +curl --location-trusted \ + -u "root":"" \ + -H "format:json" \ + -H "columns: id, c_json" \ + -H "strip_outer_array:true" \ + -T "test_json.json" \ + http://localhost:8040/api/testdb/json_test/_stream_load +``` + +### 第 4 步:检查导入数据 + +```sql +mysql> SELECT * FROM json_test; ++------+-------------------------------+ +| id | c_json | ++------+-------------------------------+ +| 1 | {"name":"tom","age":35} | +| 2 | {"name":null,"age":28} | +| 3 | {"name":"micheal","age":null} | +| 4 | {"name":null,"age":null} | +| 5 | NULL | ++------+-------------------------------+ +5 rows in set (0.01 sec) +``` diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/data-operate/import/complex-types/map.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/data-operate/import/complex-types/map.md new file mode 100644 index 00000000000..0ca127503dd --- /dev/null +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/data-operate/import/complex-types/map.md @@ -0,0 +1,146 @@ +--- +{ + "title": "MAP", + "language": "zh-CN" +} +--- + +<!-- +Licensed to the Apache Software Foundation (ASF) under one +or more contributor license agreements. See the NOTICE file +distributed with this work for additional information +regarding copyright ownership. The ASF licenses this file +to you under the Apache License, Version 2.0 (the +"License"); you may not use this file except in compliance +with the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, +software distributed under the License is distributed on an +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, either express or implied. See the License for the +specific language governing permissions and limitations +under the License. +--> + +`MAP<K, V>` 表示由K, V类型元素组成的 map,不能作为 key 列使用。 + +- 目前支持在 Duplicate,Unique 模型的表中使用。 + +K, V 支持的类型有: + +```sql +BOOLEAN, TINYINT, SMALLINT, INT, BIGINT, LARGEINT, FLOAT, DOUBLE, DECIMAL, DECIMALV3, +DATE, DATEV2, DATETIME, DATETIMEV2, CHAR, VARCHAR, STRING +``` + +## CSV格式导入 + +### 第 1 步:准备数据 + +创建如下的 csv 文件:`test_map.csv` +其中分隔符使用 `|` 而不是逗号,以便和 map 中的逗号区分。 + +``` +1|{"Emily":101,"age":25} +2|{"Benjamin":102} +3|{} +4|null +``` + +### 第 2 步:在数据库中建表 + +```sql +CREATE TABLE map_test ( + id INT NOT NULL, + c_map MAP<STRING, INT> NULL +) +DUPLICATE KEY(id) +DISTRIBUTED BY HASH(id) BUCKETS 1 +PROPERTIES ( + "replication_allocation" = "tag.location.default: 1" +); +``` + +### 第 3 步:导入数据 + +```bash +curl --location-trusted \ + -u "root":"" \ + -H "column_separator:|" \ + -H "columns: id, c_map" \ + -T "test_map.csv" \ + http://localhost:8040/api/testdb/map_test/_stream_load +``` + +### 第 4 步:检查导入数据 + +```sql +mysql> SELECT * FROM map_test; ++------+-------------------------+ +| id | c_map | ++------+-------------------------+ +| 1 | {"Emily":101, "age":25} | +| 2 | {"Benjamin":102} | +| 3 | {} | +| 4 | NULL | ++------+-------------------------+ +4 rows in set (0.01 sec) +``` + +## JSON格式导入 + +### 第 1 步:准备数据 + +创建如下的 JSON 文件,`test_map.json` + +```json +[ + {"id":1, "c_map":{"Emily":101, "age":25}}, + {"id":2, "c_map":{"Benjamin":102}}, + {"id":3, "c_map":{}}, + {"id":4, "c_map":null} +] +``` + +### 第 2 步:在数据库中建表 + +```sql +CREATE TABLE map_test ( + id INT NOT NULL, + c_map MAP<STRING, INT> NULL +) +DUPLICATE KEY(id) +DISTRIBUTED BY HASH(id) BUCKETS 1 +PROPERTIES ( + "replication_allocation" = "tag.location.default: 1" +); +``` + +### 第 3 步:导入数据 + +```bash +curl --location-trusted \ + -u "root":"" \ + -H "format:json" \ + -H "columns: id, c_map" \ + -H "strip_outer_array:true" \ + -T "test_map.json" \ + http://localhost:8040/api/testdb/map_test/_stream_load +``` + +### 第 4 步:检查导入数据 + +```sql +mysql> SELECT * FROM map_test; ++------+-------------------------+ +| id | c_map | ++------+-------------------------+ +| 1 | {"Emily":101, "age":25} | +| 2 | {"Benjamin":102} | +| 3 | {} | +| 4 | NULL | ++------+-------------------------+ +4 rows in set (0.01 sec) +``` diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/data-operate/import/complex-types/struct.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/data-operate/import/complex-types/struct.md new file mode 100644 index 00000000000..19d46f40b56 --- /dev/null +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/data-operate/import/complex-types/struct.md @@ -0,0 +1,154 @@ +--- +{ + "title": "STRUCT", + "language": "zh-CN" +} +--- + +<!-- +Licensed to the Apache Software Foundation (ASF) under one +or more contributor license agreements. See the NOTICE file +distributed with this work for additional information +regarding copyright ownership. The ASF licenses this file +to you under the Apache License, Version 2.0 (the +"License"); you may not use this file except in compliance +with the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, +software distributed under the License is distributed on an +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, either express or implied. See the License for the +specific language governing permissions and limitations +under the License. +--> + +`STRUCT<field_name:field_type [COMMENT 'comment_string'], ... >` 表示由多个 Field 组成的结构体,也可被理解为多个列的集合。 + +- 不能作为 Key 使用,目前 STRUCT 仅支持在 Duplicate 模型的表中使用。 +- 一个 Struct 中的 Field 的名字和数量固定,总是为 Nullable,一个 Field 通常由下面部分组成。 + - field_name: Field 的标识符,不可重复 + - field_type: Field 的类型 + - COMMENT: Field 的注释,可选 (暂不支持) + +当前可支持的类型有: + +```sql +BOOLEAN, TINYINT, SMALLINT, INT, BIGINT, LARGEINT, FLOAT, DOUBLE, DECIMAL, DECIMALV3, +DATE, DATEV2, DATETIME, DATETIMEV2, CHAR, VARCHAR, STRING +``` + +## CSV格式导入 + +### 第 1 步:准备数据 + +创建如下的 csv 文件:`test_struct.csv` +其中分隔符使用 `|` 而不是逗号,以便和 struct 中的逗号区分。 + +``` +1|{10, 3.14, "Emily"} +2|{4, 1.5, null} +3|{7, null, "Benjamin"} +4|{} +5|null +``` + +### 第 2 步:在数据库中建表 + +```sql +CREATE TABLE struct_test ( + id INT NOT NULL, + c_struct STRUCT<f1:INT,f2:FLOAT,f3:STRING> NULL +) +DUPLICATE KEY(id) +DISTRIBUTED BY HASH(id) BUCKETS 1 +PROPERTIES ( + "replication_allocation" = "tag.location.default: 1" +); +``` + +### 第 3 步:导入数据 + +```bash +curl --location-trusted \ + -u "root":"" \ + -H "column_separator:|" \ + -H "columns: id, c_struct" \ + -T "test_struct.csv" \ + http://localhost:8040/api/testdb/struct_test/_stream_load +``` + +### 第 4 步:检查导入数据 + +```sql +mysql> SELECT * FROM struct_test; ++------+--------------------------------------+ +| id | c_struct | ++------+--------------------------------------+ +| 1 | {"f1":10, "f2":3.14, "f3":"Emily"} | +| 2 | {"f1":4, "f2":1.5, "f3":null} | +| 3 | {"f1":7, "f2":null, "f3":"Benjamin"} | +| 4 | {"f1":null, "f2":null, "f3":null} | +| 5 | NULL | ++------+--------------------------------------+ +5 rows in set (0.01 sec) +``` + +## JSON格式导入 + +### 第 1 步:准备数据 + +创建如下的 JSON 文件,`test_struct.json` + +```json +[ + {"id":1, "c_struct":{"f1":10, "f2":3.14, "f3":"Emily"}}, + {"id":2, "c_struct":{"f1":4, "f2":1.5, "f3":null}}, + {"id":3, "c_struct":{"f1":7, "f2":null, "f3":"Benjamin"}}, + {"id":4, "c_struct":{}}, + {"id":5, "c_struct":null} +] +``` + +### 第 2 步:在数据库中建表 + +```sql +CREATE TABLE struct_test ( + id INT NOT NULL, + c_struct STRUCT<f1:INT,f2:FLOAT,f3:STRING> NULL +) +DUPLICATE KEY(id) +DISTRIBUTED BY HASH(id) BUCKETS 1 +PROPERTIES ( + "replication_allocation" = "tag.location.default: 1" +); +``` + +### 第 3 步:导入数据 + +```bash +curl --location-trusted \ + -u "root":"" \ + -H "format:json" \ + -H "columns: id, c_struct" \ + -H "strip_outer_array:true" \ + -T "test_struct.json" \ + http://localhost:8040/api/testdb/struct_test/_stream_load +``` + +### 第 4 步:检查导入数据 + +```sql +mysql> SELECT * FROM struct_test; ++------+--------------------------------------+ +| id | c_struct | ++------+--------------------------------------+ +| 1 | {"f1":10, "f2":3.14, "f3":"Emily"} | +| 2 | {"f1":4, "f2":1.5, "f3":null} | +| 3 | {"f1":7, "f2":null, "f3":"Benjamin"} | +| 4 | {"f1":null, "f2":null, "f3":null} | +| 5 | NULL | ++------+--------------------------------------+ +5 rows in set (0.00 sec) +``` diff --git a/sidebars.json b/sidebars.json index 56b276706aa..11c49e7c948 100644 --- a/sidebars.json +++ b/sidebars.json @@ -214,6 +214,16 @@ "data-operate/import/import-way/mysql-load-manual" ] }, + { + "type": "category", + "label": "Complex Data Types", + "items": [ + "data-operate/import/complex-types/array", + "data-operate/import/complex-types/map", + "data-operate/import/complex-types/struct", + "data-operate/import/complex-types/json" + ] + }, "data-operate/import/load-data-format", "data-operate/import/error-data-handling", "data-operate/import/load-data-convert", diff --git a/versioned_docs/version-2.1/data-operate/import/complex-types/array.md b/versioned_docs/version-2.1/data-operate/import/complex-types/array.md new file mode 100644 index 00000000000..e67fa40cfbe --- /dev/null +++ b/versioned_docs/version-2.1/data-operate/import/complex-types/array.md @@ -0,0 +1,147 @@ +--- +{ + "title": "ARRAY", + "language": "zh-CN" +} +--- + +<!-- +Licensed to the Apache Software Foundation (ASF) under one +or more contributor license agreements. See the NOTICE file +distributed with this work for additional information +regarding copyright ownership. The ASF licenses this file +to you under the Apache License, Version 2.0 (the +"License"); you may not use this file except in compliance +with the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, +software distributed under the License is distributed on an +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, either express or implied. See the License for the +specific language governing permissions and limitations +under the License. +--> + +`ARRAY<T>` 表示由 T 类型元素组成的数组,不能作为 key 列使用。 + +- 2.0 之前仅支持在 Duplicate 模型的表中使用。 +- 从 2.0 版本开始支持在 Unique 模型的表中的非 key 列使用。 + +T 支持的类型有: + +```sql +BOOLEAN, TINYINT, SMALLINT, INT, BIGINT, LARGEINT, FLOAT, DOUBLE, DECIMAL, +DATE, DATEV2, DATETIME, DATETIMEV2, CHAR, VARCHAR, STRING +``` + +## CSV格式导入 + +### 第 1 步:准备数据 + +创建如下的 csv 文件:`test_array.csv` +其中分隔符使用 `|` 而不是逗号,以便和 array 中的逗号区分。 + +``` +1|[1,2,3,4,5] +2|[6,7,8] +3|[] +4|null +``` + +### 第 2 步:在数据库中建表 + +```sql +CREATE TABLE `array_test` ( + `id` INT NOT NULL, + `c_array` ARRAY<INT> NULL +) +DUPLICATE KEY(`id`) +DISTRIBUTED BY HASH(`id`) BUCKETS 1 +PROPERTIES ( + "replication_allocation" = "tag.location.default: 1" +); +``` + +### 第 3 步:导入数据 + +```bash +curl --location-trusted \ + -u "root":"" \ + -H "column_separator:|" \ + -H "columns: id, c_array" \ + -T "test_array.csv" \ + http://localhost:8040/api/testdb/array_test/_stream_load +``` + +### 第 4 步:检查导入数据 + +```sql +mysql> SELECT * FROM array_test; ++------+-----------------+ +| id | c_array | ++------+-----------------+ +| 1 | [1, 2, 3, 4, 5] | +| 2 | [6, 7, 8] | +| 3 | [] | +| 4 | NULL | ++------+-----------------+ +4 rows in set (0.01 sec) +``` + +## JSON格式导入 + +### 第 1 步:准备数据 + +创建如下的 JSON 文件,`test_array.json` + +```json +[ + {"id":1, "c_array":[1,2,3,4,5]}, + {"id":2, "c_array":[6,7,8]}, + {"id":3, "c_array":[]}, + {"id":4, "c_array":null} +] +``` + +### 第 2 步:在数据库中建表 + +```sql +CREATE TABLE `array_test` ( + `id` INT NOT NULL, + `c_array` ARRAY<INT> NULL +) +DUPLICATE KEY(`id`) +DISTRIBUTED BY HASH(`id`) BUCKETS 1 +PROPERTIES ( + "replication_allocation" = "tag.location.default: 1" +); +``` + +### 第 3 步:导入数据 + +```bash +curl --location-trusted \ + -u "root":"" \ + -H "format:json" \ + -H "columns: id, c_array" \ + -H "strip_outer_array:true" \ + -T "test_array.json" \ + http://localhost:8040/api/testdb/array_test/_stream_load +``` + +### 第 4 步:检查导入数据 + +```sql +mysql> SELECT * FROM array_test; ++------+-----------------+ +| id | c_array | ++------+-----------------+ +| 1 | [1, 2, 3, 4, 5] | +| 2 | [6, 7, 8] | +| 3 | [] | +| 4 | NULL | ++------+-----------------+ +4 rows in set (0.01 sec) +``` diff --git a/versioned_docs/version-2.1/data-operate/import/complex-types/json.md b/versioned_docs/version-2.1/data-operate/import/complex-types/json.md new file mode 100644 index 00000000000..a6a825dc0a5 --- /dev/null +++ b/versioned_docs/version-2.1/data-operate/import/complex-types/json.md @@ -0,0 +1,152 @@ +--- +{ + "title": "JSON", + "language": "zh-CN" +} +--- + +<!-- +Licensed to the Apache Software Foundation (ASF) under one +or more contributor license agreements. See the NOTICE file +distributed with this work for additional information +regarding copyright ownership. The ASF licenses this file +to you under the Apache License, Version 2.0 (the +"License"); you may not use this file except in compliance +with the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, +software distributed under the License is distributed on an +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, either express or implied. See the License for the +specific language governing permissions and limitations +under the License. +--> + +`JSON` 数据类型,用二进制格式高效存储 JSON 数据,通过 JSON 函数访问其内部字段。 + +默认支持 1048576 字节(1 MB),可调大到 2147483643 字节(2 GB),可通过 BE 配置`string_type_length_soft_limit_bytes` 调整。 + +与普通 String 类型存储的 JSON 字符串相比,JSON 类型有两点优势 + +1. 数据写入时进行 JSON 格式校验 +2. 二进制存储格式更加高效,通过json_extract等函数可以高效访问JSON内部字段,比get_json_xx函数快几倍 + +:::caution[注意] +在1.2.x版本中,JSON 类型的名字是 JSONB,为了尽量跟 MySQL 兼容,从 2.0.0 版本开始改名为 JSON,老的表仍然可以使用。 +::: + +## CSV格式导入 + +### 第 1 步:准备数据 + +创建如下的 csv 文件:`test_json.csv` +其中分隔符使用 `|` 而不是逗号,以便和 json 中的逗号区分。 + +``` +1|{"name": "tom", "age": 35} +2|{"name": null, "age": 28} +3|{"name": "micheal", "age": null} +4|{"name": null, "age": null} +5|null +``` + +### 第 2 步:在数据库中建表 + +```sql +CREATE TABLE json_test ( + id INT NOT NULL, + c_json JSON NULL +) +DUPLICATE KEY(id) +DISTRIBUTED BY HASH(id) BUCKETS 1 +PROPERTIES ( + "replication_allocation" = "tag.location.default: 1" +); +``` + +### 第 3 步:导入数据 + +```bash +curl --location-trusted \ + -u "root":"" \ + -H "column_separator:|" \ + -H "columns: id, c_json" \ + -T "test_json.csv" \ + http://localhost:8040/api/testdb/json_test/_stream_load +``` + +### 第 4 步:检查导入数据 + +```sql +SELECT * FROM json_test; ++------+-------------------------------+ +| id | c_json | ++------+-------------------------------+ +| 1 | {"name":"tom","age":35} | +| 2 | {"name":null,"age":28} | +| 3 | {"name":"micheal","age":null} | +| 4 | {"name":null,"age":null} | +| 5 | null | ++------+-------------------------------+ +5 rows in set (0.01 sec) +``` + +## JSON格式导入 + +### 第 1 步:准备数据 + +创建如下的 JSON 文件,`test_json.json` + +```json +[ + {"id": 1, "c_json": {"name": "tom", "age": 35}}, + {"id": 2, "c_json": {"name": null, "age": 28}}, + {"id": 3, "c_json": {"name": "micheal", "age": null}}, + {"id": 4, "c_json": {"name": null, "age": null}}, + {"id": 5, "c_json": null} +] +``` + +### 第 2 步:在数据库中建表 + +```sql +CREATE TABLE json_test ( + id INT NOT NULL, + c_json JSON NULL +) +DUPLICATE KEY(id) +DISTRIBUTED BY HASH(id) BUCKETS 1 +PROPERTIES ( + "replication_allocation" = "tag.location.default: 1" +); +``` + +### 第 3 步:导入数据 + +```bash +curl --location-trusted \ + -u "root":"" \ + -H "format:json" \ + -H "columns: id, c_json" \ + -H "strip_outer_array:true" \ + -T "test_json.json" \ + http://localhost:8040/api/testdb/json_test/_stream_load +``` + +### 第 4 步:检查导入数据 + +```sql +mysql> SELECT * FROM json_test; ++------+-------------------------------+ +| id | c_json | ++------+-------------------------------+ +| 1 | {"name":"tom","age":35} | +| 2 | {"name":null,"age":28} | +| 3 | {"name":"micheal","age":null} | +| 4 | {"name":null,"age":null} | +| 5 | NULL | ++------+-------------------------------+ +5 rows in set (0.01 sec) +``` diff --git a/versioned_docs/version-2.1/data-operate/import/complex-types/map.md b/versioned_docs/version-2.1/data-operate/import/complex-types/map.md new file mode 100644 index 00000000000..0ca127503dd --- /dev/null +++ b/versioned_docs/version-2.1/data-operate/import/complex-types/map.md @@ -0,0 +1,146 @@ +--- +{ + "title": "MAP", + "language": "zh-CN" +} +--- + +<!-- +Licensed to the Apache Software Foundation (ASF) under one +or more contributor license agreements. See the NOTICE file +distributed with this work for additional information +regarding copyright ownership. The ASF licenses this file +to you under the Apache License, Version 2.0 (the +"License"); you may not use this file except in compliance +with the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, +software distributed under the License is distributed on an +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, either express or implied. See the License for the +specific language governing permissions and limitations +under the License. +--> + +`MAP<K, V>` 表示由K, V类型元素组成的 map,不能作为 key 列使用。 + +- 目前支持在 Duplicate,Unique 模型的表中使用。 + +K, V 支持的类型有: + +```sql +BOOLEAN, TINYINT, SMALLINT, INT, BIGINT, LARGEINT, FLOAT, DOUBLE, DECIMAL, DECIMALV3, +DATE, DATEV2, DATETIME, DATETIMEV2, CHAR, VARCHAR, STRING +``` + +## CSV格式导入 + +### 第 1 步:准备数据 + +创建如下的 csv 文件:`test_map.csv` +其中分隔符使用 `|` 而不是逗号,以便和 map 中的逗号区分。 + +``` +1|{"Emily":101,"age":25} +2|{"Benjamin":102} +3|{} +4|null +``` + +### 第 2 步:在数据库中建表 + +```sql +CREATE TABLE map_test ( + id INT NOT NULL, + c_map MAP<STRING, INT> NULL +) +DUPLICATE KEY(id) +DISTRIBUTED BY HASH(id) BUCKETS 1 +PROPERTIES ( + "replication_allocation" = "tag.location.default: 1" +); +``` + +### 第 3 步:导入数据 + +```bash +curl --location-trusted \ + -u "root":"" \ + -H "column_separator:|" \ + -H "columns: id, c_map" \ + -T "test_map.csv" \ + http://localhost:8040/api/testdb/map_test/_stream_load +``` + +### 第 4 步:检查导入数据 + +```sql +mysql> SELECT * FROM map_test; ++------+-------------------------+ +| id | c_map | ++------+-------------------------+ +| 1 | {"Emily":101, "age":25} | +| 2 | {"Benjamin":102} | +| 3 | {} | +| 4 | NULL | ++------+-------------------------+ +4 rows in set (0.01 sec) +``` + +## JSON格式导入 + +### 第 1 步:准备数据 + +创建如下的 JSON 文件,`test_map.json` + +```json +[ + {"id":1, "c_map":{"Emily":101, "age":25}}, + {"id":2, "c_map":{"Benjamin":102}}, + {"id":3, "c_map":{}}, + {"id":4, "c_map":null} +] +``` + +### 第 2 步:在数据库中建表 + +```sql +CREATE TABLE map_test ( + id INT NOT NULL, + c_map MAP<STRING, INT> NULL +) +DUPLICATE KEY(id) +DISTRIBUTED BY HASH(id) BUCKETS 1 +PROPERTIES ( + "replication_allocation" = "tag.location.default: 1" +); +``` + +### 第 3 步:导入数据 + +```bash +curl --location-trusted \ + -u "root":"" \ + -H "format:json" \ + -H "columns: id, c_map" \ + -H "strip_outer_array:true" \ + -T "test_map.json" \ + http://localhost:8040/api/testdb/map_test/_stream_load +``` + +### 第 4 步:检查导入数据 + +```sql +mysql> SELECT * FROM map_test; ++------+-------------------------+ +| id | c_map | ++------+-------------------------+ +| 1 | {"Emily":101, "age":25} | +| 2 | {"Benjamin":102} | +| 3 | {} | +| 4 | NULL | ++------+-------------------------+ +4 rows in set (0.01 sec) +``` diff --git a/versioned_docs/version-2.1/data-operate/import/complex-types/struct.md b/versioned_docs/version-2.1/data-operate/import/complex-types/struct.md new file mode 100644 index 00000000000..19d46f40b56 --- /dev/null +++ b/versioned_docs/version-2.1/data-operate/import/complex-types/struct.md @@ -0,0 +1,154 @@ +--- +{ + "title": "STRUCT", + "language": "zh-CN" +} +--- + +<!-- +Licensed to the Apache Software Foundation (ASF) under one +or more contributor license agreements. See the NOTICE file +distributed with this work for additional information +regarding copyright ownership. The ASF licenses this file +to you under the Apache License, Version 2.0 (the +"License"); you may not use this file except in compliance +with the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, +software distributed under the License is distributed on an +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, either express or implied. See the License for the +specific language governing permissions and limitations +under the License. +--> + +`STRUCT<field_name:field_type [COMMENT 'comment_string'], ... >` 表示由多个 Field 组成的结构体,也可被理解为多个列的集合。 + +- 不能作为 Key 使用,目前 STRUCT 仅支持在 Duplicate 模型的表中使用。 +- 一个 Struct 中的 Field 的名字和数量固定,总是为 Nullable,一个 Field 通常由下面部分组成。 + - field_name: Field 的标识符,不可重复 + - field_type: Field 的类型 + - COMMENT: Field 的注释,可选 (暂不支持) + +当前可支持的类型有: + +```sql +BOOLEAN, TINYINT, SMALLINT, INT, BIGINT, LARGEINT, FLOAT, DOUBLE, DECIMAL, DECIMALV3, +DATE, DATEV2, DATETIME, DATETIMEV2, CHAR, VARCHAR, STRING +``` + +## CSV格式导入 + +### 第 1 步:准备数据 + +创建如下的 csv 文件:`test_struct.csv` +其中分隔符使用 `|` 而不是逗号,以便和 struct 中的逗号区分。 + +``` +1|{10, 3.14, "Emily"} +2|{4, 1.5, null} +3|{7, null, "Benjamin"} +4|{} +5|null +``` + +### 第 2 步:在数据库中建表 + +```sql +CREATE TABLE struct_test ( + id INT NOT NULL, + c_struct STRUCT<f1:INT,f2:FLOAT,f3:STRING> NULL +) +DUPLICATE KEY(id) +DISTRIBUTED BY HASH(id) BUCKETS 1 +PROPERTIES ( + "replication_allocation" = "tag.location.default: 1" +); +``` + +### 第 3 步:导入数据 + +```bash +curl --location-trusted \ + -u "root":"" \ + -H "column_separator:|" \ + -H "columns: id, c_struct" \ + -T "test_struct.csv" \ + http://localhost:8040/api/testdb/struct_test/_stream_load +``` + +### 第 4 步:检查导入数据 + +```sql +mysql> SELECT * FROM struct_test; ++------+--------------------------------------+ +| id | c_struct | ++------+--------------------------------------+ +| 1 | {"f1":10, "f2":3.14, "f3":"Emily"} | +| 2 | {"f1":4, "f2":1.5, "f3":null} | +| 3 | {"f1":7, "f2":null, "f3":"Benjamin"} | +| 4 | {"f1":null, "f2":null, "f3":null} | +| 5 | NULL | ++------+--------------------------------------+ +5 rows in set (0.01 sec) +``` + +## JSON格式导入 + +### 第 1 步:准备数据 + +创建如下的 JSON 文件,`test_struct.json` + +```json +[ + {"id":1, "c_struct":{"f1":10, "f2":3.14, "f3":"Emily"}}, + {"id":2, "c_struct":{"f1":4, "f2":1.5, "f3":null}}, + {"id":3, "c_struct":{"f1":7, "f2":null, "f3":"Benjamin"}}, + {"id":4, "c_struct":{}}, + {"id":5, "c_struct":null} +] +``` + +### 第 2 步:在数据库中建表 + +```sql +CREATE TABLE struct_test ( + id INT NOT NULL, + c_struct STRUCT<f1:INT,f2:FLOAT,f3:STRING> NULL +) +DUPLICATE KEY(id) +DISTRIBUTED BY HASH(id) BUCKETS 1 +PROPERTIES ( + "replication_allocation" = "tag.location.default: 1" +); +``` + +### 第 3 步:导入数据 + +```bash +curl --location-trusted \ + -u "root":"" \ + -H "format:json" \ + -H "columns: id, c_struct" \ + -H "strip_outer_array:true" \ + -T "test_struct.json" \ + http://localhost:8040/api/testdb/struct_test/_stream_load +``` + +### 第 4 步:检查导入数据 + +```sql +mysql> SELECT * FROM struct_test; ++------+--------------------------------------+ +| id | c_struct | ++------+--------------------------------------+ +| 1 | {"f1":10, "f2":3.14, "f3":"Emily"} | +| 2 | {"f1":4, "f2":1.5, "f3":null} | +| 3 | {"f1":7, "f2":null, "f3":"Benjamin"} | +| 4 | {"f1":null, "f2":null, "f3":null} | +| 5 | NULL | ++------+--------------------------------------+ +5 rows in set (0.00 sec) +``` diff --git a/versioned_docs/version-3.0/data-operate/import/complex-types/array.md b/versioned_docs/version-3.0/data-operate/import/complex-types/array.md new file mode 100644 index 00000000000..e67fa40cfbe --- /dev/null +++ b/versioned_docs/version-3.0/data-operate/import/complex-types/array.md @@ -0,0 +1,147 @@ +--- +{ + "title": "ARRAY", + "language": "zh-CN" +} +--- + +<!-- +Licensed to the Apache Software Foundation (ASF) under one +or more contributor license agreements. See the NOTICE file +distributed with this work for additional information +regarding copyright ownership. The ASF licenses this file +to you under the Apache License, Version 2.0 (the +"License"); you may not use this file except in compliance +with the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, +software distributed under the License is distributed on an +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, either express or implied. See the License for the +specific language governing permissions and limitations +under the License. +--> + +`ARRAY<T>` 表示由 T 类型元素组成的数组,不能作为 key 列使用。 + +- 2.0 之前仅支持在 Duplicate 模型的表中使用。 +- 从 2.0 版本开始支持在 Unique 模型的表中的非 key 列使用。 + +T 支持的类型有: + +```sql +BOOLEAN, TINYINT, SMALLINT, INT, BIGINT, LARGEINT, FLOAT, DOUBLE, DECIMAL, +DATE, DATEV2, DATETIME, DATETIMEV2, CHAR, VARCHAR, STRING +``` + +## CSV格式导入 + +### 第 1 步:准备数据 + +创建如下的 csv 文件:`test_array.csv` +其中分隔符使用 `|` 而不是逗号,以便和 array 中的逗号区分。 + +``` +1|[1,2,3,4,5] +2|[6,7,8] +3|[] +4|null +``` + +### 第 2 步:在数据库中建表 + +```sql +CREATE TABLE `array_test` ( + `id` INT NOT NULL, + `c_array` ARRAY<INT> NULL +) +DUPLICATE KEY(`id`) +DISTRIBUTED BY HASH(`id`) BUCKETS 1 +PROPERTIES ( + "replication_allocation" = "tag.location.default: 1" +); +``` + +### 第 3 步:导入数据 + +```bash +curl --location-trusted \ + -u "root":"" \ + -H "column_separator:|" \ + -H "columns: id, c_array" \ + -T "test_array.csv" \ + http://localhost:8040/api/testdb/array_test/_stream_load +``` + +### 第 4 步:检查导入数据 + +```sql +mysql> SELECT * FROM array_test; ++------+-----------------+ +| id | c_array | ++------+-----------------+ +| 1 | [1, 2, 3, 4, 5] | +| 2 | [6, 7, 8] | +| 3 | [] | +| 4 | NULL | ++------+-----------------+ +4 rows in set (0.01 sec) +``` + +## JSON格式导入 + +### 第 1 步:准备数据 + +创建如下的 JSON 文件,`test_array.json` + +```json +[ + {"id":1, "c_array":[1,2,3,4,5]}, + {"id":2, "c_array":[6,7,8]}, + {"id":3, "c_array":[]}, + {"id":4, "c_array":null} +] +``` + +### 第 2 步:在数据库中建表 + +```sql +CREATE TABLE `array_test` ( + `id` INT NOT NULL, + `c_array` ARRAY<INT> NULL +) +DUPLICATE KEY(`id`) +DISTRIBUTED BY HASH(`id`) BUCKETS 1 +PROPERTIES ( + "replication_allocation" = "tag.location.default: 1" +); +``` + +### 第 3 步:导入数据 + +```bash +curl --location-trusted \ + -u "root":"" \ + -H "format:json" \ + -H "columns: id, c_array" \ + -H "strip_outer_array:true" \ + -T "test_array.json" \ + http://localhost:8040/api/testdb/array_test/_stream_load +``` + +### 第 4 步:检查导入数据 + +```sql +mysql> SELECT * FROM array_test; ++------+-----------------+ +| id | c_array | ++------+-----------------+ +| 1 | [1, 2, 3, 4, 5] | +| 2 | [6, 7, 8] | +| 3 | [] | +| 4 | NULL | ++------+-----------------+ +4 rows in set (0.01 sec) +``` diff --git a/versioned_docs/version-3.0/data-operate/import/complex-types/json.md b/versioned_docs/version-3.0/data-operate/import/complex-types/json.md new file mode 100644 index 00000000000..a6a825dc0a5 --- /dev/null +++ b/versioned_docs/version-3.0/data-operate/import/complex-types/json.md @@ -0,0 +1,152 @@ +--- +{ + "title": "JSON", + "language": "zh-CN" +} +--- + +<!-- +Licensed to the Apache Software Foundation (ASF) under one +or more contributor license agreements. See the NOTICE file +distributed with this work for additional information +regarding copyright ownership. The ASF licenses this file +to you under the Apache License, Version 2.0 (the +"License"); you may not use this file except in compliance +with the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, +software distributed under the License is distributed on an +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, either express or implied. See the License for the +specific language governing permissions and limitations +under the License. +--> + +`JSON` 数据类型,用二进制格式高效存储 JSON 数据,通过 JSON 函数访问其内部字段。 + +默认支持 1048576 字节(1 MB),可调大到 2147483643 字节(2 GB),可通过 BE 配置`string_type_length_soft_limit_bytes` 调整。 + +与普通 String 类型存储的 JSON 字符串相比,JSON 类型有两点优势 + +1. 数据写入时进行 JSON 格式校验 +2. 二进制存储格式更加高效,通过json_extract等函数可以高效访问JSON内部字段,比get_json_xx函数快几倍 + +:::caution[注意] +在1.2.x版本中,JSON 类型的名字是 JSONB,为了尽量跟 MySQL 兼容,从 2.0.0 版本开始改名为 JSON,老的表仍然可以使用。 +::: + +## CSV格式导入 + +### 第 1 步:准备数据 + +创建如下的 csv 文件:`test_json.csv` +其中分隔符使用 `|` 而不是逗号,以便和 json 中的逗号区分。 + +``` +1|{"name": "tom", "age": 35} +2|{"name": null, "age": 28} +3|{"name": "micheal", "age": null} +4|{"name": null, "age": null} +5|null +``` + +### 第 2 步:在数据库中建表 + +```sql +CREATE TABLE json_test ( + id INT NOT NULL, + c_json JSON NULL +) +DUPLICATE KEY(id) +DISTRIBUTED BY HASH(id) BUCKETS 1 +PROPERTIES ( + "replication_allocation" = "tag.location.default: 1" +); +``` + +### 第 3 步:导入数据 + +```bash +curl --location-trusted \ + -u "root":"" \ + -H "column_separator:|" \ + -H "columns: id, c_json" \ + -T "test_json.csv" \ + http://localhost:8040/api/testdb/json_test/_stream_load +``` + +### 第 4 步:检查导入数据 + +```sql +SELECT * FROM json_test; ++------+-------------------------------+ +| id | c_json | ++------+-------------------------------+ +| 1 | {"name":"tom","age":35} | +| 2 | {"name":null,"age":28} | +| 3 | {"name":"micheal","age":null} | +| 4 | {"name":null,"age":null} | +| 5 | null | ++------+-------------------------------+ +5 rows in set (0.01 sec) +``` + +## JSON格式导入 + +### 第 1 步:准备数据 + +创建如下的 JSON 文件,`test_json.json` + +```json +[ + {"id": 1, "c_json": {"name": "tom", "age": 35}}, + {"id": 2, "c_json": {"name": null, "age": 28}}, + {"id": 3, "c_json": {"name": "micheal", "age": null}}, + {"id": 4, "c_json": {"name": null, "age": null}}, + {"id": 5, "c_json": null} +] +``` + +### 第 2 步:在数据库中建表 + +```sql +CREATE TABLE json_test ( + id INT NOT NULL, + c_json JSON NULL +) +DUPLICATE KEY(id) +DISTRIBUTED BY HASH(id) BUCKETS 1 +PROPERTIES ( + "replication_allocation" = "tag.location.default: 1" +); +``` + +### 第 3 步:导入数据 + +```bash +curl --location-trusted \ + -u "root":"" \ + -H "format:json" \ + -H "columns: id, c_json" \ + -H "strip_outer_array:true" \ + -T "test_json.json" \ + http://localhost:8040/api/testdb/json_test/_stream_load +``` + +### 第 4 步:检查导入数据 + +```sql +mysql> SELECT * FROM json_test; ++------+-------------------------------+ +| id | c_json | ++------+-------------------------------+ +| 1 | {"name":"tom","age":35} | +| 2 | {"name":null,"age":28} | +| 3 | {"name":"micheal","age":null} | +| 4 | {"name":null,"age":null} | +| 5 | NULL | ++------+-------------------------------+ +5 rows in set (0.01 sec) +``` diff --git a/versioned_docs/version-3.0/data-operate/import/complex-types/map.md b/versioned_docs/version-3.0/data-operate/import/complex-types/map.md new file mode 100644 index 00000000000..0ca127503dd --- /dev/null +++ b/versioned_docs/version-3.0/data-operate/import/complex-types/map.md @@ -0,0 +1,146 @@ +--- +{ + "title": "MAP", + "language": "zh-CN" +} +--- + +<!-- +Licensed to the Apache Software Foundation (ASF) under one +or more contributor license agreements. See the NOTICE file +distributed with this work for additional information +regarding copyright ownership. The ASF licenses this file +to you under the Apache License, Version 2.0 (the +"License"); you may not use this file except in compliance +with the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, +software distributed under the License is distributed on an +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, either express or implied. See the License for the +specific language governing permissions and limitations +under the License. +--> + +`MAP<K, V>` 表示由K, V类型元素组成的 map,不能作为 key 列使用。 + +- 目前支持在 Duplicate,Unique 模型的表中使用。 + +K, V 支持的类型有: + +```sql +BOOLEAN, TINYINT, SMALLINT, INT, BIGINT, LARGEINT, FLOAT, DOUBLE, DECIMAL, DECIMALV3, +DATE, DATEV2, DATETIME, DATETIMEV2, CHAR, VARCHAR, STRING +``` + +## CSV格式导入 + +### 第 1 步:准备数据 + +创建如下的 csv 文件:`test_map.csv` +其中分隔符使用 `|` 而不是逗号,以便和 map 中的逗号区分。 + +``` +1|{"Emily":101,"age":25} +2|{"Benjamin":102} +3|{} +4|null +``` + +### 第 2 步:在数据库中建表 + +```sql +CREATE TABLE map_test ( + id INT NOT NULL, + c_map MAP<STRING, INT> NULL +) +DUPLICATE KEY(id) +DISTRIBUTED BY HASH(id) BUCKETS 1 +PROPERTIES ( + "replication_allocation" = "tag.location.default: 1" +); +``` + +### 第 3 步:导入数据 + +```bash +curl --location-trusted \ + -u "root":"" \ + -H "column_separator:|" \ + -H "columns: id, c_map" \ + -T "test_map.csv" \ + http://localhost:8040/api/testdb/map_test/_stream_load +``` + +### 第 4 步:检查导入数据 + +```sql +mysql> SELECT * FROM map_test; ++------+-------------------------+ +| id | c_map | ++------+-------------------------+ +| 1 | {"Emily":101, "age":25} | +| 2 | {"Benjamin":102} | +| 3 | {} | +| 4 | NULL | ++------+-------------------------+ +4 rows in set (0.01 sec) +``` + +## JSON格式导入 + +### 第 1 步:准备数据 + +创建如下的 JSON 文件,`test_map.json` + +```json +[ + {"id":1, "c_map":{"Emily":101, "age":25}}, + {"id":2, "c_map":{"Benjamin":102}}, + {"id":3, "c_map":{}}, + {"id":4, "c_map":null} +] +``` + +### 第 2 步:在数据库中建表 + +```sql +CREATE TABLE map_test ( + id INT NOT NULL, + c_map MAP<STRING, INT> NULL +) +DUPLICATE KEY(id) +DISTRIBUTED BY HASH(id) BUCKETS 1 +PROPERTIES ( + "replication_allocation" = "tag.location.default: 1" +); +``` + +### 第 3 步:导入数据 + +```bash +curl --location-trusted \ + -u "root":"" \ + -H "format:json" \ + -H "columns: id, c_map" \ + -H "strip_outer_array:true" \ + -T "test_map.json" \ + http://localhost:8040/api/testdb/map_test/_stream_load +``` + +### 第 4 步:检查导入数据 + +```sql +mysql> SELECT * FROM map_test; ++------+-------------------------+ +| id | c_map | ++------+-------------------------+ +| 1 | {"Emily":101, "age":25} | +| 2 | {"Benjamin":102} | +| 3 | {} | +| 4 | NULL | ++------+-------------------------+ +4 rows in set (0.01 sec) +``` diff --git a/versioned_docs/version-3.0/data-operate/import/complex-types/struct.md b/versioned_docs/version-3.0/data-operate/import/complex-types/struct.md new file mode 100644 index 00000000000..19d46f40b56 --- /dev/null +++ b/versioned_docs/version-3.0/data-operate/import/complex-types/struct.md @@ -0,0 +1,154 @@ +--- +{ + "title": "STRUCT", + "language": "zh-CN" +} +--- + +<!-- +Licensed to the Apache Software Foundation (ASF) under one +or more contributor license agreements. See the NOTICE file +distributed with this work for additional information +regarding copyright ownership. The ASF licenses this file +to you under the Apache License, Version 2.0 (the +"License"); you may not use this file except in compliance +with the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, +software distributed under the License is distributed on an +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, either express or implied. See the License for the +specific language governing permissions and limitations +under the License. +--> + +`STRUCT<field_name:field_type [COMMENT 'comment_string'], ... >` 表示由多个 Field 组成的结构体,也可被理解为多个列的集合。 + +- 不能作为 Key 使用,目前 STRUCT 仅支持在 Duplicate 模型的表中使用。 +- 一个 Struct 中的 Field 的名字和数量固定,总是为 Nullable,一个 Field 通常由下面部分组成。 + - field_name: Field 的标识符,不可重复 + - field_type: Field 的类型 + - COMMENT: Field 的注释,可选 (暂不支持) + +当前可支持的类型有: + +```sql +BOOLEAN, TINYINT, SMALLINT, INT, BIGINT, LARGEINT, FLOAT, DOUBLE, DECIMAL, DECIMALV3, +DATE, DATEV2, DATETIME, DATETIMEV2, CHAR, VARCHAR, STRING +``` + +## CSV格式导入 + +### 第 1 步:准备数据 + +创建如下的 csv 文件:`test_struct.csv` +其中分隔符使用 `|` 而不是逗号,以便和 struct 中的逗号区分。 + +``` +1|{10, 3.14, "Emily"} +2|{4, 1.5, null} +3|{7, null, "Benjamin"} +4|{} +5|null +``` + +### 第 2 步:在数据库中建表 + +```sql +CREATE TABLE struct_test ( + id INT NOT NULL, + c_struct STRUCT<f1:INT,f2:FLOAT,f3:STRING> NULL +) +DUPLICATE KEY(id) +DISTRIBUTED BY HASH(id) BUCKETS 1 +PROPERTIES ( + "replication_allocation" = "tag.location.default: 1" +); +``` + +### 第 3 步:导入数据 + +```bash +curl --location-trusted \ + -u "root":"" \ + -H "column_separator:|" \ + -H "columns: id, c_struct" \ + -T "test_struct.csv" \ + http://localhost:8040/api/testdb/struct_test/_stream_load +``` + +### 第 4 步:检查导入数据 + +```sql +mysql> SELECT * FROM struct_test; ++------+--------------------------------------+ +| id | c_struct | ++------+--------------------------------------+ +| 1 | {"f1":10, "f2":3.14, "f3":"Emily"} | +| 2 | {"f1":4, "f2":1.5, "f3":null} | +| 3 | {"f1":7, "f2":null, "f3":"Benjamin"} | +| 4 | {"f1":null, "f2":null, "f3":null} | +| 5 | NULL | ++------+--------------------------------------+ +5 rows in set (0.01 sec) +``` + +## JSON格式导入 + +### 第 1 步:准备数据 + +创建如下的 JSON 文件,`test_struct.json` + +```json +[ + {"id":1, "c_struct":{"f1":10, "f2":3.14, "f3":"Emily"}}, + {"id":2, "c_struct":{"f1":4, "f2":1.5, "f3":null}}, + {"id":3, "c_struct":{"f1":7, "f2":null, "f3":"Benjamin"}}, + {"id":4, "c_struct":{}}, + {"id":5, "c_struct":null} +] +``` + +### 第 2 步:在数据库中建表 + +```sql +CREATE TABLE struct_test ( + id INT NOT NULL, + c_struct STRUCT<f1:INT,f2:FLOAT,f3:STRING> NULL +) +DUPLICATE KEY(id) +DISTRIBUTED BY HASH(id) BUCKETS 1 +PROPERTIES ( + "replication_allocation" = "tag.location.default: 1" +); +``` + +### 第 3 步:导入数据 + +```bash +curl --location-trusted \ + -u "root":"" \ + -H "format:json" \ + -H "columns: id, c_struct" \ + -H "strip_outer_array:true" \ + -T "test_struct.json" \ + http://localhost:8040/api/testdb/struct_test/_stream_load +``` + +### 第 4 步:检查导入数据 + +```sql +mysql> SELECT * FROM struct_test; ++------+--------------------------------------+ +| id | c_struct | ++------+--------------------------------------+ +| 1 | {"f1":10, "f2":3.14, "f3":"Emily"} | +| 2 | {"f1":4, "f2":1.5, "f3":null} | +| 3 | {"f1":7, "f2":null, "f3":"Benjamin"} | +| 4 | {"f1":null, "f2":null, "f3":null} | +| 5 | NULL | ++------+--------------------------------------+ +5 rows in set (0.00 sec) +``` diff --git a/versioned_sidebars/version-2.1-sidebars.json b/versioned_sidebars/version-2.1-sidebars.json index 51625eb3df6..4b860211e7c 100644 --- a/versioned_sidebars/version-2.1-sidebars.json +++ b/versioned_sidebars/version-2.1-sidebars.json @@ -188,6 +188,16 @@ "data-operate/import/import-way/mysql-load-manual" ] }, + { + "type": "category", + "label": "Complex Data Types", + "items": [ + "data-operate/import/complex-types/array", + "data-operate/import/complex-types/map", + "data-operate/import/complex-types/struct", + "data-operate/import/complex-types/json" + ] + }, "data-operate/import/load-data-format", "data-operate/import/error-data-handling", "data-operate/import/load-data-convert", diff --git a/versioned_sidebars/version-3.0-sidebars.json b/versioned_sidebars/version-3.0-sidebars.json index 12131db510f..c19272f7781 100644 --- a/versioned_sidebars/version-3.0-sidebars.json +++ b/versioned_sidebars/version-3.0-sidebars.json @@ -201,6 +201,16 @@ "data-operate/import/import-way/mysql-load-manual" ] }, + { + "type": "category", + "label": "Complex Data Types", + "items": [ + "data-operate/import/complex-types/array", + "data-operate/import/complex-types/map", + "data-operate/import/complex-types/struct", + "data-operate/import/complex-types/json" + ] + }, "data-operate/import/load-data-format", "data-operate/import/error-data-handling", "data-operate/import/load-data-convert", --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org For additional commands, e-mail: commits-h...@doris.apache.org