This is an automated email from the ASF dual-hosted git repository.

kassiez pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris-website.git
The following commit(s) were added to refs/heads/master by this push: new d3687b0136a [fix](docs) Fix complex types docs of en version (#1982) d3687b0136a is described below commit d3687b0136acce1a2528bb52863f9043d75314f5 Author: KassieZ <139741991+kass...@users.noreply.github.com> AuthorDate: Fri Feb 7 11:12:12 2025 +0800 [fix](docs) Fix complex types docs of en version (#1982) ## Versions - [ ] dev - [x] 3.0 - [x] 2.1 - [ ] 2.0 ## Languages - [ ] Chinese - [x] English ## Docs Checklist - [ ] Checked by AI - [ ] Test Cases Built --- docs/data-operate/import/handling-messy-data.md | 13 +++--- .../data-operate/import/complex-types/array.md | 40 +++++++++--------- .../data-operate/import/complex-types/json.md | 43 +++++++++---------- .../data-operate/import/complex-types/map.md | 38 ++++++++--------- .../data-operate/import/complex-types/struct.md | 48 +++++++++++----------- .../data-operate/import/handling-messy-data.md | 13 +++--- .../data-operate/import/complex-types/array.md | 40 +++++++++--------- .../data-operate/import/complex-types/json.md | 43 +++++++++---------- .../data-operate/import/complex-types/map.md | 38 ++++++++--------- .../data-operate/import/complex-types/struct.md | 48 +++++++++++----------- .../data-operate/import/handling-messy-data.md | 13 +++--- 11 files changed, 190 insertions(+), 187 deletions(-) diff --git a/docs/data-operate/import/handling-messy-data.md b/docs/data-operate/import/handling-messy-data.md index 8325e1eac58..6688d27e4bd 100644 --- a/docs/data-operate/import/handling-messy-data.md +++ b/docs/data-operate/import/handling-messy-data.md @@ -1,6 +1,6 @@ --- { - "title": "Handling Messy Data", + "title": "Handling Data Issues", "language": "en-US" } --- @@ -24,12 +24,15 @@ specific language governing permissions and limitations under the License. --> -During data ingestion, discrepancies may arise between source and target column data types. 
Although the load process attempts to convert these inconsistent types, issues such as type mismatches, field length exceeding limits, or precision mismatches may occur, resulting in conversion failures.
-To address these exceptional cases, Doris provides two essential control parameters:
+When loading data, the source and target column types may not always match. The load process tries to convert between them, but type mismatches, over-length fields, or precision differences can cause conversion failures.
-- Strict Mode (strict_mode): Regulates whether to filter out rows with conversion failures
-- Maximum Filter Ratio (max_filter_ratio): Defines the maximum allowable ratio of filtered data to total data during load
+To deal with these problems, Doris provides two key settings:
+
+- Strict Mode (strict_mode): Controls whether rows that fail conversion are filtered out.
+- Max Filter Ratio (max_filter_ratio): Sets the maximum allowed ratio of filtered rows to total rows during a load.
+
+Together, these settings make load failures predictable and easy to manage.

## Strict Mode
diff --git a/versioned_docs/version-2.1/data-operate/import/complex-types/array.md b/versioned_docs/version-2.1/data-operate/import/complex-types/array.md
index e67fa40cfbe..a079606b5e0 100644
--- a/versioned_docs/version-2.1/data-operate/import/complex-types/array.md
+++ b/versioned_docs/version-2.1/data-operate/import/complex-types/array.md
@@ -1,7 +1,7 @@
---
{
    "title": "ARRAY",
-    "language": "zh-CN"
+    "language": "en"
}
---
@@ -24,24 +24,24 @@ specific language governing permissions and limitations under the License. -->
-`ARRAY<T>` 表示由 T 类型元素组成的数组,不能作为 key 列使用。
+`ARRAY<T>` An array of elements of type T. It cannot be used as a key column.
-- 2.0 之前仅支持在 Duplicate 模型的表中使用。
-- 从 2.0 版本开始支持在 Unique 模型的表中的非 key 列使用。
+- Before version 2.0, it was only supported in Duplicate model tables.
+- Starting from version 2.0, it is supported in non-key columns of Unique model tables.
-T 支持的类型有: +T-type could be any of: ```sql -BOOLEAN, TINYINT, SMALLINT, INT, BIGINT, LARGEINT, FLOAT, DOUBLE, DECIMAL, -DATE, DATEV2, DATETIME, DATETIMEV2, CHAR, VARCHAR, STRING +BOOLEAN, TINYINT, SMALLINT, INT, BIGINT, LARGEINT, FLOAT, DOUBLE, DECIMAL, DATE, +DATEV2, DATETIME, DATETIMEV2, CHAR, VARCHAR, STRING ``` -## CSV格式导入 +## CSV format import -### 第 1 步:准备数据 +### Step 1: Prepare the data -创建如下的 csv 文件:`test_array.csv` -其中分隔符使用 `|` 而不是逗号,以便和 array 中的逗号区分。 +Create the following csv file: `test_array.csv` +The separator is `|` instead of comma to distinguish it from the comma in array. ``` 1|[1,2,3,4,5] @@ -50,7 +50,7 @@ DATE, DATEV2, DATETIME, DATETIMEV2, CHAR, VARCHAR, STRING 4|null ``` -### 第 2 步:在数据库中建表 +### Step 2: Create a table in the database ```sql CREATE TABLE `array_test` ( @@ -64,7 +64,7 @@ PROPERTIES ( ); ``` -### 第 3 步:导入数据 +### Step 3: Load data ```bash curl --location-trusted \ @@ -75,7 +75,7 @@ curl --location-trusted \ http://localhost:8040/api/testdb/array_test/_stream_load ``` -### 第 4 步:检查导入数据 +### Step 4: Check the imported data ```sql mysql> SELECT * FROM array_test; @@ -90,11 +90,11 @@ mysql> SELECT * FROM array_test; 4 rows in set (0.01 sec) ``` -## JSON格式导入 +## JSON format import -### 第 1 步:准备数据 +### Step 1: Prepare the data -创建如下的 JSON 文件,`test_array.json` +Create the following JSON file, `test_array.json` ```json [ @@ -105,7 +105,7 @@ mysql> SELECT * FROM array_test; ] ``` -### 第 2 步:在数据库中建表 +### Step 2: Create a table in the database ```sql CREATE TABLE `array_test` ( @@ -119,7 +119,7 @@ PROPERTIES ( ); ``` -### 第 3 步:导入数据 +### Step 3: Load data ```bash curl --location-trusted \ @@ -131,7 +131,7 @@ curl --location-trusted \ http://localhost:8040/api/testdb/array_test/_stream_load ``` -### 第 4 步:检查导入数据 +### Step 4: Check the imported data ```sql mysql> SELECT * FROM array_test; diff --git a/versioned_docs/version-2.1/data-operate/import/complex-types/json.md b/versioned_docs/version-2.1/data-operate/import/complex-types/json.md index 
a6a825dc0a5..faf949d19f7 100644
--- a/versioned_docs/version-2.1/data-operate/import/complex-types/json.md
+++ b/versioned_docs/version-2.1/data-operate/import/complex-types/json.md
@@ -1,7 +1,7 @@
---
{
    "title": "JSON",
-    "language": "zh-CN"
+    "language": "en"
}
---
@@ -24,25 +24,22 @@ specific language governing permissions and limitations under the License. -->
-`JSON` 数据类型,用二进制格式高效存储 JSON 数据,通过 JSON 函数访问其内部字段。
+The JSON data type stores JSON data efficiently in a binary format and allows access to its internal fields through JSON functions.
-默认支持 1048576 字节(1 MB),可调大到 2147483643 字节(2 GB),可通过 BE 配置`string_type_length_soft_limit_bytes` 调整。
+By default, it supports up to 1048576 bytes (1 MB) and can be increased to 2147483643 bytes (2 GB) via the BE configuration `string_type_length_soft_limit_bytes`.
-与普通 String 类型存储的 JSON 字符串相比,JSON 类型有两点优势
+Compared to storing JSON strings in a regular STRING type, the JSON type has two main advantages:
-1. 数据写入时进行 JSON 格式校验
-2. 二进制存储格式更加高效,通过json_extract等函数可以高效访问JSON内部字段,比get_json_xx函数快几倍
+1. JSON format validation during data insertion.
+2. A more efficient binary storage format, enabling faster access to JSON internal fields with functions such as json_extract, several times faster than the get_json_xx functions.
+
+Note: In version 1.2.x, the JSON type was named JSONB. To maintain compatibility with MySQL, it was renamed to JSON starting from version 2.0.0. Older tables can still use the previous name.
-:::caution[注意]
-在1.2.x版本中,JSON 类型的名字是 JSONB,为了尽量跟 MySQL 兼容,从 2.0.0 版本开始改名为 JSON,老的表仍然可以使用。
-:::
+## CSV format import
-## CSV格式导入
+### Step 1: Prepare the data
-### 第 1 步:准备数据
-
-创建如下的 csv 文件:`test_json.csv`
-其中分隔符使用 `|` 而不是逗号,以便和 json 中的逗号区分。
+Create the following csv file: `test_json.csv`
+The separator is `|` instead of a comma, to distinguish it from the commas inside the JSON values.

```
1|{"name": "tom", "age": 35}
@@ -52,7 +49,7 @@ under the License.
5|null
```
-### 第 2 步:在数据库中建表
+### Step 2: Create a table in the database

```sql
CREATE TABLE json_test (
@@ -66,7 +63,7 @@ PROPERTIES (
);
```
-### 第 3 步:导入数据
+### Step 3: Load data

```bash
curl --location-trusted \
@@ -77,7 +74,7 @@ curl --location-trusted \
    http://localhost:8040/api/testdb/json_test/_stream_load
```
-### 第 4 步:检查导入数据
+### Step 4: Check the imported data

```sql
SELECT * FROM json_test;
@@ -93,11 +90,11 @@ SELECT * FROM json_test;
5 rows in set (0.01 sec)
```
-## JSON格式导入
+## JSON format import
-### 第 1 步:准备数据
+### Step 1: Prepare the data
-创建如下的 JSON 文件,`test_json.json`
+Create the following JSON file, `test_json.json`

```json
[
@@ -109,7 +106,7 @@ SELECT * FROM json_test;
]
```
-### 第 2 步:在数据库中建表
+### Step 2: Create a table in the database

```sql
CREATE TABLE json_test (
@@ -123,7 +120,7 @@ PROPERTIES (
);
```
-### 第 3 步:导入数据
+### Step 3: Load data

```bash
curl --location-trusted \
@@ -135,7 +132,7 @@ curl --location-trusted \
    http://localhost:8040/api/testdb/json_test/_stream_load
```
-### 第 4 步:检查导入数据
+### Step 4: Check the imported data

```sql
mysql> SELECT * FROM json_test;
diff --git a/versioned_docs/version-2.1/data-operate/import/complex-types/map.md b/versioned_docs/version-2.1/data-operate/import/complex-types/map.md
index 0ca127503dd..256c388dca7 100644
--- a/versioned_docs/version-2.1/data-operate/import/complex-types/map.md
+++ b/versioned_docs/version-2.1/data-operate/import/complex-types/map.md
@@ -1,7 +1,7 @@
---
{
    "title": "MAP",
-    "language": "zh-CN"
+    "language": "en"
}
---
@@ -24,23 +24,21 @@ specific language governing permissions and limitations under the License. -->
-`MAP<K, V>` 表示由K, V类型元素组成的 map,不能作为 key 列使用。
+`MAP<K, V>` A map of K-type keys and V-type values. It cannot be used as a key column. Currently, MAP can only be used in Duplicate and Unique model tables.
-- 目前支持在 Duplicate,Unique 模型的表中使用。 - -K, V 支持的类型有: +K,V could be any of: ```sql -BOOLEAN, TINYINT, SMALLINT, INT, BIGINT, LARGEINT, FLOAT, DOUBLE, DECIMAL, DECIMALV3, -DATE, DATEV2, DATETIME, DATETIMEV2, CHAR, VARCHAR, STRING +BOOLEAN, TINYINT, SMALLINT, INT, BIGINT, LARGEINT, FLOAT, DOUBLE, DECIMAL, DECIMALV3, DATE, +DATEV2, DATETIME, DATETIMEV2, CHAR, VARCHAR, STRING ``` -## CSV格式导入 +## CSV format import -### 第 1 步:准备数据 +### Step 1: Prepare the data -创建如下的 csv 文件:`test_map.csv` -其中分隔符使用 `|` 而不是逗号,以便和 map 中的逗号区分。 +Create the following csv file: `test_map.csv` +The separator is `|` instead of comma to distinguish it from the comma in map. ``` 1|{"Emily":101,"age":25} @@ -49,7 +47,7 @@ DATE, DATEV2, DATETIME, DATETIMEV2, CHAR, VARCHAR, STRING 4|null ``` -### 第 2 步:在数据库中建表 +### Step 2: Create a table in the database ```sql CREATE TABLE map_test ( @@ -63,7 +61,7 @@ PROPERTIES ( ); ``` -### 第 3 步:导入数据 +### Step 3: Load data ```bash curl --location-trusted \ @@ -74,7 +72,7 @@ curl --location-trusted \ http://localhost:8040/api/testdb/map_test/_stream_load ``` -### 第 4 步:检查导入数据 +### Step 4: Check the imported data ```sql mysql> SELECT * FROM map_test; @@ -89,11 +87,11 @@ mysql> SELECT * FROM map_test; 4 rows in set (0.01 sec) ``` -## JSON格式导入 +## JSON format import -### 第 1 步:准备数据 +### Step 1: Prepare the data -创建如下的 JSON 文件,`test_map.json` +Create the following JSON file, `test_map.json` ```json [ @@ -104,7 +102,7 @@ mysql> SELECT * FROM map_test; ] ``` -### 第 2 步:在数据库中建表 +### Step 2: Create a table in the database ```sql CREATE TABLE map_test ( @@ -118,7 +116,7 @@ PROPERTIES ( ); ``` -### 第 3 步:导入数据 +### Step 3: Load data ```bash curl --location-trusted \ @@ -130,7 +128,7 @@ curl --location-trusted \ http://localhost:8040/api/testdb/map_test/_stream_load ``` -### 第 4 步:检查导入数据 +### Step 4: Check the imported data ```sql mysql> SELECT * FROM map_test; diff --git a/versioned_docs/version-2.1/data-operate/import/complex-types/struct.md 
b/versioned_docs/version-2.1/data-operate/import/complex-types/struct.md
index 19d46f40b56..9cfc23352d9 100644
--- a/versioned_docs/version-2.1/data-operate/import/complex-types/struct.md
+++ b/versioned_docs/version-2.1/data-operate/import/complex-types/struct.md
@@ -1,7 +1,7 @@
---
{
    "title": "STRUCT",
-    "language": "zh-CN"
+    "language": "en"
}
---
@@ -24,27 +24,29 @@ specific language governing permissions and limitations under the License. -->
-`STRUCT<field_name:field_type [COMMENT 'comment_string'], ... >` 表示由多个 Field 组成的结构体,也可被理解为多个列的集合。
+`STRUCT<field_name:field_type [COMMENT 'comment_string'], ... >` Represents a value with a structure described by multiple fields, which can be viewed as a collection of multiple columns.
-- 不能作为 Key 使用,目前 STRUCT 仅支持在 Duplicate 模型的表中使用。
-- 一个 Struct 中的 Field 的名字和数量固定,总是为 Nullable,一个 Field 通常由下面部分组成。
-  - field_name: Field 的标识符,不可重复
-  - field_type: Field 的类型
-  - COMMENT: Field 的注释,可选 (暂不支持)
+- It cannot be used as a key column. Currently, STRUCT can only be used in Duplicate model tables.
-当前可支持的类型有:
+- The names and number of fields in a STRUCT are fixed, and fields are always nullable. A field typically consists of the following parts:
+
+  - field_name: The identifier naming the field; it must be unique within the struct.
+  - field_type: The data type of the field.
+  - COMMENT: An optional string describing the field. (currently not supported)
+
+The currently supported types are:

```sql
-BOOLEAN, TINYINT, SMALLINT, INT, BIGINT, LARGEINT, FLOAT, DOUBLE, DECIMAL, DECIMALV3,
-DATE, DATEV2, DATETIME, DATETIMEV2, CHAR, VARCHAR, STRING
+BOOLEAN, TINYINT, SMALLINT, INT, BIGINT, LARGEINT, FLOAT, DOUBLE, DECIMAL, DECIMALV3, DATE,
+DATEV2, DATETIME, DATETIMEV2, CHAR, VARCHAR, STRING
```
-## CSV格式导入
+## CSV format import
-### 第 1 步:准备数据
+### Step 1: Prepare the data
-创建如下的 csv 文件:`test_struct.csv`
-其中分隔符使用 `|` 而不是逗号,以便和 struct 中的逗号区分。
+Create the following csv file: `test_struct.csv`
+The separator is `|` instead of a comma, to distinguish it from the commas inside the struct values.
``` 1|{10, 3.14, "Emily"} @@ -54,7 +56,7 @@ DATE, DATEV2, DATETIME, DATETIMEV2, CHAR, VARCHAR, STRING 5|null ``` -### 第 2 步:在数据库中建表 +### Step 2: Create a table in the database ```sql CREATE TABLE struct_test ( @@ -68,7 +70,7 @@ PROPERTIES ( ); ``` -### 第 3 步:导入数据 +### Step 3: Load data ```bash curl --location-trusted \ @@ -79,7 +81,7 @@ curl --location-trusted \ http://localhost:8040/api/testdb/struct_test/_stream_load ``` -### 第 4 步:检查导入数据 +### Step 4: Check the imported data ```sql mysql> SELECT * FROM struct_test; @@ -95,11 +97,11 @@ mysql> SELECT * FROM struct_test; 5 rows in set (0.01 sec) ``` -## JSON格式导入 +## JSON format import -### 第 1 步:准备数据 +### Step 1: Prepare the data -创建如下的 JSON 文件,`test_struct.json` +Create the following JSON file, `test_struct.json` ```json [ @@ -111,7 +113,7 @@ mysql> SELECT * FROM struct_test; ] ``` -### 第 2 步:在数据库中建表 +### Step 2: Create a table in the database ```sql CREATE TABLE struct_test ( @@ -125,7 +127,7 @@ PROPERTIES ( ); ``` -### 第 3 步:导入数据 +### Step 3: Load data ```bash curl --location-trusted \ @@ -137,7 +139,7 @@ curl --location-trusted \ http://localhost:8040/api/testdb/struct_test/_stream_load ``` -### 第 4 步:检查导入数据 +### Step 4: Check the imported data ```sql mysql> SELECT * FROM struct_test; diff --git a/versioned_docs/version-2.1/data-operate/import/handling-messy-data.md b/versioned_docs/version-2.1/data-operate/import/handling-messy-data.md index 8325e1eac58..f35646ea818 100644 --- a/versioned_docs/version-2.1/data-operate/import/handling-messy-data.md +++ b/versioned_docs/version-2.1/data-operate/import/handling-messy-data.md @@ -1,6 +1,6 @@ --- { - "title": "Handling Messy Data", + "title": "Handling Data Issues", "language": "en-US" } --- @@ -24,12 +24,15 @@ specific language governing permissions and limitations under the License. --> -During data ingestion, discrepancies may arise between source and target column data types. 
Although the load process attempts to convert these inconsistent types, issues such as type mismatches, field length exceeding limits, or precision mismatches may occur, resulting in conversion failures.
+When loading data, the source and target column types may not always match. The load process tries to convert between them, but type mismatches, over-length fields, or precision differences can cause conversion failures.
-To address these exceptional cases, Doris provides two essential control parameters:
+To deal with these problems, Doris provides two key settings:
+
+- Strict Mode (strict_mode): Controls whether rows that fail conversion are filtered out.
+- Max Filter Ratio (max_filter_ratio): Sets the maximum allowed ratio of filtered rows to total rows during a load.
+
+Together, these settings make load failures predictable and easy to manage.
-- Strict Mode (strict_mode): Regulates whether to filter out rows with conversion failures
-- Maximum Filter Ratio (max_filter_ratio): Defines the maximum allowable ratio of filtered data to total data during load

## Strict Mode
diff --git a/versioned_docs/version-3.0/data-operate/import/complex-types/array.md b/versioned_docs/version-3.0/data-operate/import/complex-types/array.md
index e67fa40cfbe..a079606b5e0 100644
--- a/versioned_docs/version-3.0/data-operate/import/complex-types/array.md
+++ b/versioned_docs/version-3.0/data-operate/import/complex-types/array.md
@@ -1,7 +1,7 @@
---
{
    "title": "ARRAY",
-    "language": "zh-CN"
+    "language": "en"
}
---
@@ -24,24 +24,24 @@ specific language governing permissions and limitations under the License. -->
-`ARRAY<T>` 表示由 T 类型元素组成的数组,不能作为 key 列使用。
+`ARRAY<T>` An array of elements of type T. It cannot be used as a key column.
-- 2.0 之前仅支持在 Duplicate 模型的表中使用。
-- 从 2.0 版本开始支持在 Unique 模型的表中的非 key 列使用。
+- Before version 2.0, it was only supported in Duplicate model tables.
+- Starting from version 2.0, it is supported in non-key columns of Unique model tables.
-T 支持的类型有: +T-type could be any of: ```sql -BOOLEAN, TINYINT, SMALLINT, INT, BIGINT, LARGEINT, FLOAT, DOUBLE, DECIMAL, -DATE, DATEV2, DATETIME, DATETIMEV2, CHAR, VARCHAR, STRING +BOOLEAN, TINYINT, SMALLINT, INT, BIGINT, LARGEINT, FLOAT, DOUBLE, DECIMAL, DATE, +DATEV2, DATETIME, DATETIMEV2, CHAR, VARCHAR, STRING ``` -## CSV格式导入 +## CSV format import -### 第 1 步:准备数据 +### Step 1: Prepare the data -创建如下的 csv 文件:`test_array.csv` -其中分隔符使用 `|` 而不是逗号,以便和 array 中的逗号区分。 +Create the following csv file: `test_array.csv` +The separator is `|` instead of comma to distinguish it from the comma in array. ``` 1|[1,2,3,4,5] @@ -50,7 +50,7 @@ DATE, DATEV2, DATETIME, DATETIMEV2, CHAR, VARCHAR, STRING 4|null ``` -### 第 2 步:在数据库中建表 +### Step 2: Create a table in the database ```sql CREATE TABLE `array_test` ( @@ -64,7 +64,7 @@ PROPERTIES ( ); ``` -### 第 3 步:导入数据 +### Step 3: Load data ```bash curl --location-trusted \ @@ -75,7 +75,7 @@ curl --location-trusted \ http://localhost:8040/api/testdb/array_test/_stream_load ``` -### 第 4 步:检查导入数据 +### Step 4: Check the imported data ```sql mysql> SELECT * FROM array_test; @@ -90,11 +90,11 @@ mysql> SELECT * FROM array_test; 4 rows in set (0.01 sec) ``` -## JSON格式导入 +## JSON format import -### 第 1 步:准备数据 +### Step 1: Prepare the data -创建如下的 JSON 文件,`test_array.json` +Create the following JSON file, `test_array.json` ```json [ @@ -105,7 +105,7 @@ mysql> SELECT * FROM array_test; ] ``` -### 第 2 步:在数据库中建表 +### Step 2: Create a table in the database ```sql CREATE TABLE `array_test` ( @@ -119,7 +119,7 @@ PROPERTIES ( ); ``` -### 第 3 步:导入数据 +### Step 3: Load data ```bash curl --location-trusted \ @@ -131,7 +131,7 @@ curl --location-trusted \ http://localhost:8040/api/testdb/array_test/_stream_load ``` -### 第 4 步:检查导入数据 +### Step 4: Check the imported data ```sql mysql> SELECT * FROM array_test; diff --git a/versioned_docs/version-3.0/data-operate/import/complex-types/json.md b/versioned_docs/version-3.0/data-operate/import/complex-types/json.md index 
a6a825dc0a5..faf949d19f7 100644
--- a/versioned_docs/version-3.0/data-operate/import/complex-types/json.md
+++ b/versioned_docs/version-3.0/data-operate/import/complex-types/json.md
@@ -1,7 +1,7 @@
---
{
    "title": "JSON",
-    "language": "zh-CN"
+    "language": "en"
}
---
@@ -24,25 +24,22 @@ specific language governing permissions and limitations under the License. -->
-`JSON` 数据类型,用二进制格式高效存储 JSON 数据,通过 JSON 函数访问其内部字段。
+The JSON data type stores JSON data efficiently in a binary format and allows access to its internal fields through JSON functions.
-默认支持 1048576 字节(1 MB),可调大到 2147483643 字节(2 GB),可通过 BE 配置`string_type_length_soft_limit_bytes` 调整。
+By default, it supports up to 1048576 bytes (1 MB) and can be increased to 2147483643 bytes (2 GB) via the BE configuration `string_type_length_soft_limit_bytes`.
-与普通 String 类型存储的 JSON 字符串相比,JSON 类型有两点优势
+Compared to storing JSON strings in a regular STRING type, the JSON type has two main advantages:
-1. 数据写入时进行 JSON 格式校验
-2. 二进制存储格式更加高效,通过json_extract等函数可以高效访问JSON内部字段,比get_json_xx函数快几倍
+1. JSON format validation during data insertion.
+2. A more efficient binary storage format, enabling faster access to JSON internal fields with functions such as json_extract, several times faster than the get_json_xx functions.
+
+Note: In version 1.2.x, the JSON type was named JSONB. To maintain compatibility with MySQL, it was renamed to JSON starting from version 2.0.0. Older tables can still use the previous name.
-:::caution[注意]
-在1.2.x版本中,JSON 类型的名字是 JSONB,为了尽量跟 MySQL 兼容,从 2.0.0 版本开始改名为 JSON,老的表仍然可以使用。
-:::
+## CSV format import
-## CSV格式导入
+### Step 1: Prepare the data
-### 第 1 步:准备数据
-
-创建如下的 csv 文件:`test_json.csv`
-其中分隔符使用 `|` 而不是逗号,以便和 json 中的逗号区分。
+Create the following csv file: `test_json.csv`
+The separator is `|` instead of a comma, to distinguish it from the commas inside the JSON values.

```
1|{"name": "tom", "age": 35}
@@ -52,7 +49,7 @@ under the License.
5|null
```
-### 第 2 步:在数据库中建表
+### Step 2: Create a table in the database

```sql
CREATE TABLE json_test (
@@ -66,7 +63,7 @@ PROPERTIES (
);
```
-### 第 3 步:导入数据
+### Step 3: Load data

```bash
curl --location-trusted \
@@ -77,7 +74,7 @@ curl --location-trusted \
    http://localhost:8040/api/testdb/json_test/_stream_load
```
-### 第 4 步:检查导入数据
+### Step 4: Check the imported data

```sql
SELECT * FROM json_test;
@@ -93,11 +90,11 @@ SELECT * FROM json_test;
5 rows in set (0.01 sec)
```
-## JSON格式导入
+## JSON format import
-### 第 1 步:准备数据
+### Step 1: Prepare the data
-创建如下的 JSON 文件,`test_json.json`
+Create the following JSON file, `test_json.json`

```json
[
@@ -109,7 +106,7 @@ SELECT * FROM json_test;
]
```
-### 第 2 步:在数据库中建表
+### Step 2: Create a table in the database

```sql
CREATE TABLE json_test (
@@ -123,7 +120,7 @@ PROPERTIES (
);
```
-### 第 3 步:导入数据
+### Step 3: Load data

```bash
curl --location-trusted \
@@ -135,7 +132,7 @@ curl --location-trusted \
    http://localhost:8040/api/testdb/json_test/_stream_load
```
-### 第 4 步:检查导入数据
+### Step 4: Check the imported data

```sql
mysql> SELECT * FROM json_test;
diff --git a/versioned_docs/version-3.0/data-operate/import/complex-types/map.md b/versioned_docs/version-3.0/data-operate/import/complex-types/map.md
index 0ca127503dd..256c388dca7 100644
--- a/versioned_docs/version-3.0/data-operate/import/complex-types/map.md
+++ b/versioned_docs/version-3.0/data-operate/import/complex-types/map.md
@@ -1,7 +1,7 @@
---
{
    "title": "MAP",
-    "language": "zh-CN"
+    "language": "en"
}
---
@@ -24,23 +24,21 @@ specific language governing permissions and limitations under the License. -->
-`MAP<K, V>` 表示由K, V类型元素组成的 map,不能作为 key 列使用。
+`MAP<K, V>` A map of K-type keys and V-type values. It cannot be used as a key column. Currently, MAP can only be used in Duplicate and Unique model tables.
-- 目前支持在 Duplicate,Unique 模型的表中使用。 - -K, V 支持的类型有: +K,V could be any of: ```sql -BOOLEAN, TINYINT, SMALLINT, INT, BIGINT, LARGEINT, FLOAT, DOUBLE, DECIMAL, DECIMALV3, -DATE, DATEV2, DATETIME, DATETIMEV2, CHAR, VARCHAR, STRING +BOOLEAN, TINYINT, SMALLINT, INT, BIGINT, LARGEINT, FLOAT, DOUBLE, DECIMAL, DECIMALV3, DATE, +DATEV2, DATETIME, DATETIMEV2, CHAR, VARCHAR, STRING ``` -## CSV格式导入 +## CSV format import -### 第 1 步:准备数据 +### Step 1: Prepare the data -创建如下的 csv 文件:`test_map.csv` -其中分隔符使用 `|` 而不是逗号,以便和 map 中的逗号区分。 +Create the following csv file: `test_map.csv` +The separator is `|` instead of comma to distinguish it from the comma in map. ``` 1|{"Emily":101,"age":25} @@ -49,7 +47,7 @@ DATE, DATEV2, DATETIME, DATETIMEV2, CHAR, VARCHAR, STRING 4|null ``` -### 第 2 步:在数据库中建表 +### Step 2: Create a table in the database ```sql CREATE TABLE map_test ( @@ -63,7 +61,7 @@ PROPERTIES ( ); ``` -### 第 3 步:导入数据 +### Step 3: Load data ```bash curl --location-trusted \ @@ -74,7 +72,7 @@ curl --location-trusted \ http://localhost:8040/api/testdb/map_test/_stream_load ``` -### 第 4 步:检查导入数据 +### Step 4: Check the imported data ```sql mysql> SELECT * FROM map_test; @@ -89,11 +87,11 @@ mysql> SELECT * FROM map_test; 4 rows in set (0.01 sec) ``` -## JSON格式导入 +## JSON format import -### 第 1 步:准备数据 +### Step 1: Prepare the data -创建如下的 JSON 文件,`test_map.json` +Create the following JSON file, `test_map.json` ```json [ @@ -104,7 +102,7 @@ mysql> SELECT * FROM map_test; ] ``` -### 第 2 步:在数据库中建表 +### Step 2: Create a table in the database ```sql CREATE TABLE map_test ( @@ -118,7 +116,7 @@ PROPERTIES ( ); ``` -### 第 3 步:导入数据 +### Step 3: Load data ```bash curl --location-trusted \ @@ -130,7 +128,7 @@ curl --location-trusted \ http://localhost:8040/api/testdb/map_test/_stream_load ``` -### 第 4 步:检查导入数据 +### Step 4: Check the imported data ```sql mysql> SELECT * FROM map_test; diff --git a/versioned_docs/version-3.0/data-operate/import/complex-types/struct.md 
b/versioned_docs/version-3.0/data-operate/import/complex-types/struct.md
index 19d46f40b56..9cfc23352d9 100644
--- a/versioned_docs/version-3.0/data-operate/import/complex-types/struct.md
+++ b/versioned_docs/version-3.0/data-operate/import/complex-types/struct.md
@@ -1,7 +1,7 @@
---
{
    "title": "STRUCT",
-    "language": "zh-CN"
+    "language": "en"
}
---
@@ -24,27 +24,29 @@ specific language governing permissions and limitations under the License. -->
-`STRUCT<field_name:field_type [COMMENT 'comment_string'], ... >` 表示由多个 Field 组成的结构体,也可被理解为多个列的集合。
+`STRUCT<field_name:field_type [COMMENT 'comment_string'], ... >` Represents a value with a structure described by multiple fields, which can be viewed as a collection of multiple columns.
-- 不能作为 Key 使用,目前 STRUCT 仅支持在 Duplicate 模型的表中使用。
-- 一个 Struct 中的 Field 的名字和数量固定,总是为 Nullable,一个 Field 通常由下面部分组成。
-  - field_name: Field 的标识符,不可重复
-  - field_type: Field 的类型
-  - COMMENT: Field 的注释,可选 (暂不支持)
+- It cannot be used as a key column. Currently, STRUCT can only be used in Duplicate model tables.
-当前可支持的类型有:
+- The names and number of fields in a STRUCT are fixed, and fields are always nullable. A field typically consists of the following parts:
+
+  - field_name: The identifier naming the field; it must be unique within the struct.
+  - field_type: The data type of the field.
+  - COMMENT: An optional string describing the field. (currently not supported)
+
+The currently supported types are:

```sql
-BOOLEAN, TINYINT, SMALLINT, INT, BIGINT, LARGEINT, FLOAT, DOUBLE, DECIMAL, DECIMALV3,
-DATE, DATEV2, DATETIME, DATETIMEV2, CHAR, VARCHAR, STRING
+BOOLEAN, TINYINT, SMALLINT, INT, BIGINT, LARGEINT, FLOAT, DOUBLE, DECIMAL, DECIMALV3, DATE,
+DATEV2, DATETIME, DATETIMEV2, CHAR, VARCHAR, STRING
```
-## CSV格式导入
+## CSV format import
-### 第 1 步:准备数据
+### Step 1: Prepare the data
-创建如下的 csv 文件:`test_struct.csv`
-其中分隔符使用 `|` 而不是逗号,以便和 struct 中的逗号区分。
+Create the following csv file: `test_struct.csv`
+The separator is `|` instead of a comma, to distinguish it from the commas inside the struct values.
``` 1|{10, 3.14, "Emily"} @@ -54,7 +56,7 @@ DATE, DATEV2, DATETIME, DATETIMEV2, CHAR, VARCHAR, STRING 5|null ``` -### 第 2 步:在数据库中建表 +### Step 2: Create a table in the database ```sql CREATE TABLE struct_test ( @@ -68,7 +70,7 @@ PROPERTIES ( ); ``` -### 第 3 步:导入数据 +### Step 3: Load data ```bash curl --location-trusted \ @@ -79,7 +81,7 @@ curl --location-trusted \ http://localhost:8040/api/testdb/struct_test/_stream_load ``` -### 第 4 步:检查导入数据 +### Step 4: Check the imported data ```sql mysql> SELECT * FROM struct_test; @@ -95,11 +97,11 @@ mysql> SELECT * FROM struct_test; 5 rows in set (0.01 sec) ``` -## JSON格式导入 +## JSON format import -### 第 1 步:准备数据 +### Step 1: Prepare the data -创建如下的 JSON 文件,`test_struct.json` +Create the following JSON file, `test_struct.json` ```json [ @@ -111,7 +113,7 @@ mysql> SELECT * FROM struct_test; ] ``` -### 第 2 步:在数据库中建表 +### Step 2: Create a table in the database ```sql CREATE TABLE struct_test ( @@ -125,7 +127,7 @@ PROPERTIES ( ); ``` -### 第 3 步:导入数据 +### Step 3: Load data ```bash curl --location-trusted \ @@ -137,7 +139,7 @@ curl --location-trusted \ http://localhost:8040/api/testdb/struct_test/_stream_load ``` -### 第 4 步:检查导入数据 +### Step 4: Check the imported data ```sql mysql> SELECT * FROM struct_test; diff --git a/versioned_docs/version-3.0/data-operate/import/handling-messy-data.md b/versioned_docs/version-3.0/data-operate/import/handling-messy-data.md index 8325e1eac58..6688d27e4bd 100644 --- a/versioned_docs/version-3.0/data-operate/import/handling-messy-data.md +++ b/versioned_docs/version-3.0/data-operate/import/handling-messy-data.md @@ -1,6 +1,6 @@ --- { - "title": "Handling Messy Data", + "title": "Handling Data Issues", "language": "en-US" } --- @@ -24,12 +24,15 @@ specific language governing permissions and limitations under the License. --> -During data ingestion, discrepancies may arise between source and target column data types. 
Although the load process attempts to convert these inconsistent types, issues such as type mismatches, field length exceeding limits, or precision mismatches may occur, resulting in conversion failures.
-To address these exceptional cases, Doris provides two essential control parameters:
+When loading data, the source and target column types may not always match. The load process tries to convert between them, but type mismatches, over-length fields, or precision differences can cause conversion failures.
-- Strict Mode (strict_mode): Regulates whether to filter out rows with conversion failures
-- Maximum Filter Ratio (max_filter_ratio): Defines the maximum allowable ratio of filtered data to total data during load
+To deal with these problems, Doris provides two key settings:
+
+- Strict Mode (strict_mode): Controls whether rows that fail conversion are filtered out.
+- Max Filter Ratio (max_filter_ratio): Sets the maximum allowed ratio of filtered rows to total rows during a load.
+
+Together, these settings make load failures predictable and easy to manage.

## Strict Mode

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org
For additional commands, e-mail: commits-h...@doris.apache.org
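The strict_mode and max_filter_ratio settings touched throughout this commit boil down to a simple acceptance rule: a load succeeds only while the fraction of filtered (unconvertible) rows stays within max_filter_ratio. A minimal sketch of that rule, assuming the default ratio of 0 — the function name `load_passes` is invented for illustration and is not Doris code:

```python
def load_passes(total_rows: int, filtered_rows: int, max_filter_ratio: float = 0.0) -> bool:
    """Return True if a load that filtered `filtered_rows` of `total_rows`
    stays within `max_filter_ratio`.

    Hypothetical helper illustrating the rule described in the docs,
    not part of the Doris codebase.
    """
    if total_rows == 0:
        return True  # nothing to load, nothing to filter
    # The load is accepted only if the filtered share does not exceed the limit.
    return filtered_rows / total_rows <= max_filter_ratio
```

With the default ratio of 0, a single filtered row fails the load; raising the ratio to 0.1 tolerates up to 10% filtered rows.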