This is an automated email from the ASF dual-hosted git repository.

jiafengzheng pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris-website.git


The following commit(s) were added to refs/heads/master by this push:
     new a0cd6239cb2 fix
a0cd6239cb2 is described below

commit a0cd6239cb26bd39bbb3572587559e9380dfbfb2
Author: jiafeng.zhang <zhang...@gmail.com>
AuthorDate: Sat Oct 22 19:55:29 2022 +0800

    fix
---
 .../Create/CREATE-TABLE.md                         |   3 +-
 .../Load/BROKER-LOAD.md                            |   9 ++
 .../Load/CREATE-ROUTINE-LOAD.md                    |   8 ++
 .../Load/STREAM-LOAD.md                            | 126 ++++++++++----------
 .../Load/BROKER-LOAD.md                            |   6 +
 .../Load/CREATE-ROUTINE-LOAD.md                    |   9 ++
 .../Load/STREAM-LOAD.md                            | 128 +++++++++++----------
 7 files changed, 163 insertions(+), 126 deletions(-)

diff --git 
a/docs/sql-manual/sql-reference/Data-Definition-Statements/Create/CREATE-TABLE.md
 
b/docs/sql-manual/sql-reference/Data-Definition-Statements/Create/CREATE-TABLE.md
index 32780c16efb..d99832c852e 100644
--- 
a/docs/sql-manual/sql-reference/Data-Definition-Statements/Create/CREATE-TABLE.md
+++ 
b/docs/sql-manual/sql-reference/Data-Definition-Statements/Create/CREATE-TABLE.md
@@ -230,7 +230,8 @@ distribution_desc
 
     Define the data bucketing method.
 
-    `DISTRIBUTED BY HASH (k1[,k2 ...]) [BUCKETS num]`
+    1. Hash bucketing. Syntax: `DISTRIBUTED BY HASH (k1[,k2 ...]) [BUCKETS num]`. Buckets data by hashing the specified key columns.
+    2. Random bucketing. Syntax: `DISTRIBUTED BY RANDOM [BUCKETS num]`. Distributes data across buckets at random (a usage sketch follows the suggestion below).
 
     Suggestion: when there is no key suitable for hash bucketing, it is recommended to use random bucketing so that the table's data is evenly distributed.
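+
+    A minimal, hypothetical sketch of random bucketing (the table name, columns, and `replication_num` below are placeholders for illustration, not from this commit):
+
+    ```sql
+    -- Duplicate-key table whose rows are spread over 10 buckets at random,
+    -- so no hash key is required in DISTRIBUTED BY.
+    CREATE TABLE example_db.random_bucket_tbl (
+        event_time DATETIME,
+        payload    VARCHAR(128)
+    )
+    DUPLICATE KEY(event_time)
+    DISTRIBUTED BY RANDOM BUCKETS 10
+    PROPERTIES ("replication_num" = "1");
+    ```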
 
diff --git 
a/docs/sql-manual/sql-reference/Data-Manipulation-Statements/Load/BROKER-LOAD.md
 
b/docs/sql-manual/sql-reference/Data-Manipulation-Statements/Load/BROKER-LOAD.md
index 881ff65a56e..d033478033e 100644
--- 
a/docs/sql-manual/sql-reference/Data-Manipulation-Statements/Load/BROKER-LOAD.md
+++ 
b/docs/sql-manual/sql-reference/Data-Manipulation-Statements/Load/BROKER-LOAD.md
@@ -167,6 +167,15 @@ WITH BROKER broker_name
   - `timezone`
 
     Specify the time zone for some functions that are affected by time zones, 
such as `strftime/alignment_timestamp/from_unixtime`, etc. Please refer to the 
[timezone](../../../../advanced/time-zone) documentation for details. If not 
specified, the "Asia/Shanghai" timezone is used
+    
+  - `send_batch_parallelism`
+  
+    Used to set the parallelism for sending batches. If this value exceeds `max_send_batch_parallelism_per_job` in the BE configuration, the coordinator BE will use the value of `max_send_batch_parallelism_per_job` instead.
+  
+  - `load_to_single_tablet`
+  
+    Boolean type. True means that each task loads data to only one tablet of the corresponding partition at a time; the default value is false. The number of tasks for the job depends on the overall concurrency. This parameter can only be set when loading data into an OLAP table that uses random bucketing (see the sketch below).
+
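+    A minimal sketch of how these two properties might appear in a load job (the label, path, table, broker name, and credentials below are placeholders, not from this commit):
+
+    ```sql
+    -- Hypothetical BROKER LOAD that caps send-batch parallelism at 2 and
+    -- lets each task write to a single tablet per partition.
+    LOAD LABEL example_db.label_single_tablet
+    (
+        DATA INFILE("hdfs://host:port/user/doris/input/file.csv")
+        INTO TABLE example_tbl
+        COLUMNS TERMINATED BY ","
+    )
+    WITH BROKER "hdfs_broker"
+    (
+        "username" = "user",
+        "password" = "pass"
+    )
+    PROPERTIES
+    (
+        "send_batch_parallelism" = "2",
+        "load_to_single_tablet" = "true"
+    );
+    ```
+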
 
 ### Example
 
diff --git 
a/docs/sql-manual/sql-reference/Data-Manipulation-Statements/Load/CREATE-ROUTINE-LOAD.md
 
b/docs/sql-manual/sql-reference/Data-Manipulation-Statements/Load/CREATE-ROUTINE-LOAD.md
index c6ec707cf9b..4988946c158 100644
--- 
a/docs/sql-manual/sql-reference/Data-Manipulation-Statements/Load/CREATE-ROUTINE-LOAD.md
+++ 
b/docs/sql-manual/sql-reference/Data-Manipulation-Statements/Load/CREATE-ROUTINE-LOAD.md
@@ -219,6 +219,14 @@ FROM data_source [data_source_properties]
 
      `-H "json_root: $.RECORDS"`
 
+  10. `send_batch_parallelism`
+
+      Integer. Used to set the parallelism for sending batches. If this value exceeds `max_send_batch_parallelism_per_job` in the BE configuration, the coordinator BE will use the value of `max_send_batch_parallelism_per_job` instead.
+
+  11. `load_to_single_tablet`
+
+      Boolean type. True means that each task loads data to only one tablet of the corresponding partition at a time; the default value is false. This parameter can only be set when loading data into an OLAP table that uses random bucketing (see the sketch after this list).
+
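+      A minimal sketch of these two properties in a routine load job (the job name, table, Kafka broker, and topic below are placeholders, not from this commit):
+
+      ```sql
+      -- Hypothetical routine load that limits send-batch parallelism and
+      -- writes each task to a single tablet per partition.
+      CREATE ROUTINE LOAD example_db.job_single_tablet ON example_tbl
+      PROPERTIES
+      (
+          "send_batch_parallelism" = "2",
+          "load_to_single_tablet" = "true"
+      )
+      FROM KAFKA
+      (
+          "kafka_broker_list" = "broker1:9092",
+          "kafka_topic" = "example_topic"
+      );
+      ```
+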
 - `FROM data_source [data_source_properties]`
 
   The type of data source. Currently supports:
diff --git 
a/docs/sql-manual/sql-reference/Data-Manipulation-Statements/Load/STREAM-LOAD.md
 
b/docs/sql-manual/sql-reference/Data-Manipulation-Statements/Load/STREAM-LOAD.md
index e38fa2b684a..3e7ded9a1f7 100644
--- 
a/docs/sql-manual/sql-reference/Data-Manipulation-Statements/Load/STREAM-LOAD.md
+++ 
b/docs/sql-manual/sql-reference/Data-Manipulation-Statements/Load/STREAM-LOAD.md
@@ -53,19 +53,19 @@ In addition, it is best for users to set the content of the 
Expect Header field
 Parameter introduction:
         Users can pass in import parameters through the Header part of HTTP
 
-1. label: The label imported once, the data of the same label cannot be 
imported multiple times. Users can avoid the problem of duplicate data import 
by specifying Label.
+1. `label`: The label of this import. Data with the same label cannot be imported more than once. Users can avoid importing the same data twice by specifying a label.
 
   Currently, Doris retains the labels of imports that succeeded within the last 30 minutes.
 
-2. column_separator: used to specify the column separator in the import file, 
the default is \t. If it is an invisible character, you need to add \x as a 
prefix and use hexadecimal to represent the separator.
+2. `column_separator`: used to specify the column separator in the import file. The default is \t. If the separator is an invisible character, you need to add \x as a prefix and express it in hexadecimal.
 
     For example, the Hive file separator \x01 should be specified as -H "column_separator:\x01".
 
     You can use a combination of multiple characters as column separators.
 
-3. line_delimiter: used to specify the newline character in the imported file, 
the default is \n. Combinations of multiple characters can be used as newlines.
+3. `line_delimiter`: used to specify the newline character in the import file. The default is \n. A combination of multiple characters can be used as the newline.
 
-4. columns: used to specify the correspondence between the columns in the 
import file and the columns in the table. If the column in the source file 
corresponds exactly to the content in the table, then there is no need to 
specify the content of this field.
+4. `columns`: used to specify the correspondence between the columns in the 
import file and the columns in the table. If the column in the source file 
corresponds exactly to the content in the table, then there is no need to 
specify the content of this field.
 
   If the source file does not correspond to the table schema, this field is required for some data conversion. Columns can take two forms: one corresponds directly to a field in the imported file and is represented by the field name;
 
@@ -81,82 +81,84 @@ Parameter introduction:
 
     Then you can specify -H "columns: col, year = year(col), month=month(col), 
day=day(col)" to complete the import
 
-5. where: used to extract part of the data. If the user needs to filter out 
the unnecessary data, he can achieve this by setting this option.
+5. `where`: used to extract part of the data. Users who need to filter out unwanted data can do so by setting this option.
 
   Example 1: to import only the rows where the k1 column equals 20180601, specify -H "where: k1 = 20180601" when importing.
 
-6. max_filter_ratio: The maximum tolerable data ratio that can be filtered 
(for reasons such as data irregularity). Zero tolerance by default. Data 
irregularities do not include rows filtered out by where conditions.
+6. `max_filter_ratio`: The maximum tolerable data ratio that can be filtered 
(for reasons such as data irregularity). Zero tolerance by default. Data 
irregularities do not include rows filtered out by where conditions.
 
-7. partitions: used to specify the partition designed for this import. If the 
user can determine the partition corresponding to the data, it is recommended 
to specify this item. Data that does not satisfy these partitions will be 
filtered out.
+7. `partitions`: used to specify the partitions involved in this import. If the user can determine which partitions the data belongs to, it is recommended to specify this item. Data that does not fall into these partitions will be filtered out.
 
   For example, to import into the p1 and p2 partitions, specify -H "partitions: p1, p2"
 
-8. timeout: Specify the import timeout. in seconds. The default is 600 
seconds. The setting range is from 1 second to 259200 seconds.
+8. `timeout`: Specifies the import timeout, in seconds. The default is 600 seconds. The allowed range is 1 to 259200 seconds.
 
-9. strict_mode: The user specifies whether to enable strict mode for this 
import. The default is off. The enable mode is -H "strict_mode: true".
+9. `strict_mode`: The user specifies whether to enable strict mode for this import. The default is off. It is enabled with -H "strict_mode: true".
 
-10. timezone: Specify the time zone used for this import. The default is 
Dongba District. This parameter affects the results of all time zone-related 
functions involved in the import.
+10. `timezone`: Specify the time zone used for this import. The default is GMT+8 (Asia/Shanghai). This parameter affects the results of all time zone-related functions involved in the import.
 
-11. exec_mem_limit: Import memory limit. Default is 2GB. The unit is bytes.
+11. `exec_mem_limit`: Import memory limit. Default is 2GB. The unit is bytes.
 
-12. format: Specify the import data format, the default is csv, and 
csv_with_names(filter out the first row of your csv file), 
csv_with_names_and_types(filter out the first two lines of your csv file), json 
format are supported.
+12. `format`: Specify the import data format. The default is csv; csv_with_names (filters out the first row of the csv file), csv_with_names_and_types (filters out the first two rows of the csv file), and json are also supported.
 
-13. jsonpaths: The way of importing json is divided into: simple mode and 
matching mode.
+13. `jsonpaths`: json can be imported in two modes: simple mode and matching mode.
 
-    Simple mode: The simple mode is not set the jsonpaths parameter. In this 
mode, the json data is required to be an object type, for example:
+      Simple mode: simple mode applies when the jsonpaths parameter is not set. In this mode, the json data must be of object type, for example:
 
-       ````
-    {"k1":1, "k2":2, "k3":"hello"}, where k1, k2, k3 are column names.
-       ````
+         ````
+      {"k1":1, "k2":2, "k3":"hello"}, where k1, k2, k3 are column names.
+         ````
 
-    Matching mode: It is relatively complex for json data and needs to match 
the corresponding value through the jsonpaths parameter.
+      Matching mode: used when the json data is relatively complex and the corresponding values need to be matched through the jsonpaths parameter.
 
-14. strip_outer_array: Boolean type, true indicates that the json data starts 
with an array object and flattens the array object, the default value is false. 
E.g:
+14. `strip_outer_array`: Boolean type. True indicates that the json data starts with an array and the array will be flattened; the default value is false. For example:
 
-       ````
-        [
-         {"k1" : 1, "v1" : 2},
-         {"k1" : 3, "v1" : 4}
-        ]
-        When strip_outer_array is true, the final import into doris will 
generate two rows of data.
-       ````
+         ````
+          [
+           {"k1" : 1, "v1" : 2},
+           {"k1" : 3, "v1" : 4}
+          ]
+          When strip_outer_array is true, the final import into doris will 
generate two rows of data.
+         ````
+
+15. `json_root`: json_root is a valid jsonpath string used to specify the root node of the json document. The default value is "".
+
+16. `merge_type`: The merge type of the data. Three types are supported: APPEND, DELETE, and MERGE. APPEND is the default and means this batch of data is appended to the existing data. DELETE means all rows with the same keys as this batch of data are deleted. MERGE must be used together with a delete condition: rows that meet the delete condition are processed with DELETE semantics and the rest with APPEND semantics, for example: `-H "merge_type: MERGE" -H "delete: flag=1"`
+
+17. `delete`: Only meaningful under MERGE; specifies the delete condition for the data.
+          function_column.sequence_col: Only applicable to UNIQUE_KEYS tables. Under the same key columns, it ensures that value columns are replaced (REPLACE) according to the source_sequence column. source_sequence can be a column in the data source or a column in the table schema.
+
+18. `fuzzy_parse`: Boolean type. True means json will be parsed using the schema of the first row. Enabling this option can improve json import efficiency, but it requires the key order of all json objects to be the same as in the first row. The default is false. Only used for the json format.
+
+19. `num_as_string`: Boolean type, true means that when parsing json data, the 
numeric type will be converted to a string, and then imported without losing 
precision.
+
+20. `read_json_by_line`: Boolean type, true to support reading one json object 
per line, the default value is false.
+
+21. `send_batch_parallelism`: Integer. Used to set the parallelism for sending batches. If this value exceeds `max_send_batch_parallelism_per_job` in the BE configuration, the coordinator BE will use the value of `max_send_batch_parallelism_per_job` instead.
+
+22. `load_to_single_tablet`: Boolean type. True means that each task loads data to only one tablet of the corresponding partition at a time; the default value is false. The number of tasks for the job depends on the overall concurrency. This parameter can only be set when loading data into an OLAP table that uses random bucketing; a curl sketch follows below.
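+
+    A minimal curl sketch showing how these two headers might be passed on a Stream Load request (fe_host, the 8030 port, credentials, example_db/example_tbl, and data.csv are placeholders, not from this commit):
+
+    ```shell
+    # Hypothetical Stream Load request that sets the two new headers.
+    curl --location-trusted -u user:passwd \
+        -H "label:example_label_1" \
+        -H "column_separator:," \
+        -H "send_batch_parallelism:2" \
+        -H "load_to_single_tablet:true" \
+        -T data.csv \
+        -XPUT http://fe_host:8030/api/example_db/example_tbl/_stream_load
+    ```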
 
-15. json_root: json_root is a valid jsonpath string, used to specify the root 
node of the json document, the default value is "".
-
-16. merge_type: The merge type of data, which supports three types: APPEND, 
DELETE, and MERGE. Among them, APPEND is the default value, which means that 
this batch of data needs to be appended to the existing data, and DELETE means 
to delete all the data with the same key as this batch of data. Line, the MERGE 
semantics need to be used in conjunction with the delete condition, which means 
that the data that meets the delete condition is processed according to the 
DELETE semantics and the [...]
-
-17. delete: Only meaningful under MERGE, indicating the deletion condition of 
the data
-        function_column.sequence_col: Only applicable to UNIQUE_KEYS. Under 
the same key column, ensure that the value column is REPLACEed according to the 
source_sequence column. The source_sequence can be a column in the data source 
or a column in the table structure.
-
-18. fuzzy_parse: Boolean type, true means that json will be parsed with the 
schema of the first row. Enabling this option can improve the efficiency of 
json import, but requires that the order of the keys of all json objects is the 
same as the first row, the default is false, only use in json format
-
-19. num_as_string: Boolean type, true means that when parsing json data, the 
numeric type will be converted to a string, and then imported without losing 
precision.
-
-20. read_json_by_line: Boolean type, true to support reading one json object 
per line, the default value is false.
-
-21. send_batch_parallelism: Integer, used to set the parallelism of sending 
batch data. If the value of parallelism exceeds 
`max_send_batch_parallelism_per_job` in the BE configuration, the BE as a 
coordination point will use the value of `max_send_batch_parallelism_per_job`.
-
-    RETURN VALUES
-        After the import is complete, the related content of this import will 
be returned in Json format. Currently includes the following fields
-        Status: Import the last status.
-            Success: Indicates that the import is successful and the data is 
already visible;
-            Publish Timeout: Indicates that the import job has been 
successfully committed, but is not immediately visible for some reason. The 
user can consider the import to be successful and not have to retry the import
-            Label Already Exists: Indicates that the Label has been occupied 
by other jobs. It may be imported successfully or it may be being imported.
-            The user needs to determine the subsequent operation through the 
get label state command
-            Others: The import failed, the user can specify the Label to retry 
the job
-        Message: Detailed description of the import status. On failure, the 
specific failure reason is returned.
-        NumberTotalRows: The total number of rows read from the data stream
-        NumberLoadedRows: The number of data rows imported this time, only 
valid in Success
-        NumberFilteredRows: The number of rows filtered out by this import, 
that is, the number of rows with unqualified data quality
-        NumberUnselectedRows: This import, the number of rows filtered out by 
the where condition
-        LoadBytes: The size of the source file data imported this time
-        LoadTimeMs: The time taken for this import
-        BeginTxnTimeMs: The time it takes to request Fe to start a 
transaction, in milliseconds.
-        StreamLoadPutTimeMs: The time it takes to request Fe to obtain the 
execution plan for importing data, in milliseconds.
-        ReadDataTimeMs: Time spent reading data, in milliseconds.
-        WriteDataTimeMs: The time taken to perform the write data operation, 
in milliseconds.
-        CommitAndPublishTimeMs: The time it takes to submit a request to Fe 
and publish the transaction, in milliseconds.
-        ErrorURL: The specific content of the filtered data, only the first 
1000 items are retained
+RETURN VALUES
+         After the import is complete, the related content of this import will 
be returned in Json format. Currently includes the following fields
+         Status: The final status of this import.
+             Success: Indicates that the import is successful and the data is 
already visible;
+             Publish Timeout: Indicates that the import job has been 
successfully committed, but is not immediately visible for some reason. The 
user can consider the import to be successful and not have to retry the import
+             Label Already Exists: Indicates that the Label has been occupied 
by other jobs. It may be imported successfully or it may be being imported.
+             The user needs to determine the subsequent operation through the 
get label state command
+             Others: The import failed; the user can specify this Label to retry the job
+         Message: Detailed description of the import status. On failure, the 
specific failure reason is returned.
+         NumberTotalRows: The total number of rows read from the data stream
+         NumberLoadedRows: The number of rows imported this time; only valid when the status is Success
+         NumberFilteredRows: The number of rows filtered out by this import, that is, rows with unqualified data quality
+         NumberUnselectedRows: The number of rows filtered out by the where condition in this import
+         LoadBytes: The size of the source file data imported this time
+         LoadTimeMs: The time taken for this import
+         BeginTxnTimeMs: The time it takes to request Fe to start a 
transaction, in milliseconds.
+         StreamLoadPutTimeMs: The time it takes to request Fe to obtain the 
execution plan for importing data, in milliseconds.
+         ReadDataTimeMs: Time spent reading data, in milliseconds.
+         WriteDataTimeMs: The time taken to perform the write data operation, 
in milliseconds.
+         CommitAndPublishTimeMs: The time it takes to submit a request to Fe 
and publish the transaction, in milliseconds.
+         ErrorURL: The specific content of the filtered data, only the first 
1000 items are retained
 
 ERRORS:
         Import error details can be viewed with the following statement:
diff --git 
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-reference/Data-Manipulation-Statements/Load/BROKER-LOAD.md
 
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-reference/Data-Manipulation-Statements/Load/BROKER-LOAD.md
index d3f5a83910b..e30e5dcc85f 100644
--- 
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-reference/Data-Manipulation-Statements/Load/BROKER-LOAD.md
+++ 
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-reference/Data-Manipulation-Statements/Load/BROKER-LOAD.md
@@ -167,6 +167,12 @@ WITH BROKER broker_name
   - `timezone`
 
     指定某些受时区影响的函数的时区,如 `strftime/alignment_timestamp/from_unixtime` 等等,具体请查阅 
[时区](../../../../advanced/time-zone.md) 文档。如果不指定,则使用 "Asia/Shanghai" 时区
+    
+  - `send_batch_parallelism`: 用于设置发送批处理数据的并行度,如果并行度的值超过 BE 配置中的 `max_send_batch_parallelism_per_job`,那么作为协调点的 BE 将使用 `max_send_batch_parallelism_per_job` 的值。
+  
+  - `load_to_single_tablet`: 布尔类型,为true表示支持一个任务只导入数据到对应分区的一个tablet,默认值为false,作业的任务数取决于整体并发度。该参数只允许在对带有random分区的olap表导数的时候设置
+  
+
 
 ### Example
 
diff --git 
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-reference/Data-Manipulation-Statements/Load/CREATE-ROUTINE-LOAD.md
 
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-reference/Data-Manipulation-Statements/Load/CREATE-ROUTINE-LOAD.md
index 6ced9ee91e4..290e2419a76 100644
--- 
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-reference/Data-Manipulation-Statements/Load/CREATE-ROUTINE-LOAD.md
+++ 
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-reference/Data-Manipulation-Statements/Load/CREATE-ROUTINE-LOAD.md
@@ -219,6 +219,15 @@ FROM data_source [data_source_properties]
      当导入数据格式为 json 时,可以通过 json_root 指定 Json 数据的根节点。Doris 将通过 json_root 
抽取根节点的元素进行解析。默认为空。
 
      `-H "json_root: $.RECORDS"`
+     
+  10. send_batch_parallelism
+      
+        整型,用于设置发送批处理数据的并行度,如果并行度的值超过 BE 配置中的 
`max_send_batch_parallelism_per_job`,那么作为协调点的 BE 将使用 
`max_send_batch_parallelism_per_job` 的值。 
+      
+  11. load_to_single_tablet
+     
+        
布尔类型,为true表示支持一个任务只导入数据到对应分区的一个tablet,默认值为false,该参数只允许在对带有random分区的olap表导数的时候设置。
+
 
 - `FROM data_source [data_source_properties]`
 
diff --git 
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-reference/Data-Manipulation-Statements/Load/STREAM-LOAD.md
 
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-reference/Data-Manipulation-Statements/Load/STREAM-LOAD.md
index c11c5d41133..5ae2c67fb4e 100644
--- 
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-reference/Data-Manipulation-Statements/Load/STREAM-LOAD.md
+++ 
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-reference/Data-Manipulation-Statements/Load/STREAM-LOAD.md
@@ -54,43 +54,43 @@ curl --location-trusted -u user:passwd [-H ""...] -T 
data.file -XPUT http://fe_h
         用户可以通过HTTP的Header部分来传入导入参数
 
 1. label: 一次导入的标签,相同标签的数据无法多次导入。用户可以通过指定Label的方式来避免一份数据重复导入的问题。
-   
+
      当前Doris内部保留30分钟内最近成功的label。
-    
+
 2. column_separator:用于指定导入文件中的列分隔符,默认为\t。如果是不可见字符,则需要加\x作为前缀,使用十六进制来表示分隔符。
-   
+
     ​    如hive文件的分隔符\x01,需要指定为-H "column_separator:\x01"。
-    
+
     ​    可以使用多个字符的组合作为列分隔符。
-    
+
 3. line_delimiter:用于指定导入文件中的换行符,默认为\n。可以使用做多个字符的组合作为换行符。
-   
+
 4. columns:用于指定导入文件中的列和 table 中的列的对应关系。如果源文件中的列正好对应表中的内容,那么是不需要指定这个字段的内容的。
-   
+
     如果源文件与表schema不对应,那么需要这个字段进行一些数据转换。这里有两种形式column,一种是直接对应导入文件中的字段,直接使用字段名表示;
-    
+
     ​    一种是衍生列,语法为 `column_name` = expression。举几个例子帮助理解。
-    
+
     ​    例1: 表中有3个列“c1, c2, c3”,源文件中的三个列一次对应的是"c3,c2,c1"; 那么需要指定-H "columns: 
c3, c2, c1"
-    
+
     ​    例2: 表中有3个列“c1, c2, c3", 源文件中前三列依次对应,但是有多余1列;那么需要指定-H "columns: c1, 
c2, c3, xxx";
-    
+
     ​    最后一个列随意指定个名称占位即可
-    
+
     ​    例3: 表中有3个列“year, month, day"三个列,源文件中只有一个时间列,为”2018-06-01 01:02:03“格式;
-    
+
     ​    那么可以指定-H "columns: col, year = year(col), month=month(col), 
day=day(col)"完成导入
-    
+
 5. where: 用于抽取部分数据。用户如果有需要将不需要的数据过滤掉,那么可以通过设定这个选项来达到。
-   
+
     例1: 只导入大于k1列等于20180601的数据,那么可以在导入时候指定-H "where: k1 = 20180601"
-    
+
 6. max_filter_ratio:最大容忍可过滤(数据不规范等原因)的数据比例。默认零容忍。数据不规范不包括通过 where 条件过滤掉的行。
 
 7. partitions: 
用于指定这次导入所设计的partition。如果用户能够确定数据对应的partition,推荐指定该项。不满足这些分区的数据将被过滤掉。
-   
+
     比如指定导入到p1, p2分区,-H "partitions: p1, p2"
-    
+
 8. timeout: 指定导入的超时时间。单位秒。默认是 600 秒。可设置范围为 1 秒 ~ 259200 秒。
 
 9. strict_mode: 用户指定此次导入是否开启严格模式,默认为关闭。开启方式为 -H "strict_mode: true"。
@@ -102,59 +102,61 @@ curl --location-trusted -u user:passwd [-H ""...] -T 
data.file -XPUT http://fe_h
 12. format: 
指定导入数据格式,默认是csv,也支持:csv_with_names(支持csv文件行首过滤),csv_with_names_and_types(支持csv文件前两行过滤)
 或 json格式。
 
 13. jsonpaths: 导入json方式分为:简单模式和匹配模式。
-    
-    简单模式:没有设置jsonpaths参数即为简单模式,这种模式下要求json数据是对象类型,例如:
-    
-       ```
-       {"k1":1, "k2":2, "k3":"hello"},其中k1,k2,k3是列名字。
-       ```
-    匹配模式:用于json数据相对复杂,需要通过jsonpaths参数匹配对应的value。
-    
+
+     简单模式:没有设置jsonpaths参数即为简单模式,这种模式下要求json数据是对象类型,例如:
+
+        ```
+        {"k1":1, "k2":2, "k3":"hello"},其中k1,k2,k3是列名字。
+        ```
+     匹配模式:用于json数据相对复杂,需要通过jsonpaths参数匹配对应的value。
+
 14. strip_outer_array: 布尔类型,为true表示json数据以数组对象开始且将数组对象中进行展平,默认值是false。例如:
-       ```
-           [
-            {"k1" : 1, "v1" : 2},
-            {"k1" : 3, "v1" : 4}
-           ]
-           当strip_outer_array为true,最后导入到doris中会生成两行数据。
-       ```
-    
+        ```
+            [
+             {"k1" : 1, "v1" : 2},
+             {"k1" : 3, "v1" : 4}
+            ]
+            当strip_outer_array为true,最后导入到doris中会生成两行数据。
+        ```
+
 15. json_root: json_root为合法的jsonpath字符串,用于指定json document的根节点,默认值为""。
-    
+
 16. merge_type: 数据的合并类型,一共支持三种类型APPEND、DELETE、MERGE 
其中,APPEND是默认值,表示这批数据全部需要追加到现有数据中,DELETE 表示删除与这批数据key相同的所有行,MERGE 语义 需要与delete 
条件联合使用,表示满足delete 条件的数据按照DELETE 语义处理其余的按照APPEND 语义处理, 示例:`-H "merge_type: 
MERGE" -H "delete: flag=1"`
 
 17. delete: 仅在 MERGE下有意义, 表示数据的删除条件
-        function_column.sequence_col: 
只适用于UNIQUE_KEYS,相同key列下,保证value列按照source_sequence列进行REPLACE, 
source_sequence可以是数据源中的列,也可以是表结构中的一列。
-    
+         function_column.sequence_col: 
只适用于UNIQUE_KEYS,相同key列下,保证value列按照source_sequence列进行REPLACE, 
source_sequence可以是数据源中的列,也可以是表结构中的一列。
+
 18. fuzzy_parse: 布尔类型,为true表示json将以第一行为schema 进行解析,开启这个选项可以提高json 
导入效率,但是要求所有json 对象的key的顺序和第一行一致, 默认为false,仅用于json 格式
-    
+
 19. num_as_string: 布尔类型,为true表示在解析json数据时会将数字类型转为字符串,然后在确保不会出现精度丢失的情况下进行导入。
-    
+
 20. read_json_by_line: 布尔类型,为true表示支持每行读取一个json对象,默认值为false。
-    
-21. send_batch_parallelism: 整型,用于设置发送批处理数据的并行度,如果并行度的值超过 BE 配置中的 
`max_send_batch_parallelism_per_job`,那么作为协调点的 BE 将使用 
`max_send_batch_parallelism_per_job` 的值。
-    
-    RETURN VALUES
-        导入完成后,会以Json格式返回这次导入的相关内容。当前包括以下字段
-        Status: 导入最后的状态。
-            Success:表示导入成功,数据已经可见;
-            Publish Timeout:表述导入作业已经成功Commit,但是由于某种原因并不能立即可见。用户可以视作已经成功不必重试导入
-            Label Already Exists: 表明该Label已经被其他作业占用,可能是导入成功,也可能是正在导入。
-            用户需要通过get label state命令来确定后续的操作
-            其他:此次导入失败,用户可以指定Label重试此次作业
-        Message: 导入状态详细的说明。失败时会返回具体的失败原因。
-        NumberTotalRows: 从数据流中读取到的总行数
-        NumberLoadedRows: 此次导入的数据行数,只有在Success时有效
-        NumberFilteredRows: 此次导入过滤掉的行数,即数据质量不合格的行数
-        NumberUnselectedRows: 此次导入,通过 where 条件被过滤掉的行数
-        LoadBytes: 此次导入的源文件数据量大小
-        LoadTimeMs: 此次导入所用的时间
-        BeginTxnTimeMs: 向Fe请求开始一个事务所花费的时间,单位毫秒。
-        StreamLoadPutTimeMs: 向Fe请求获取导入数据执行计划所花费的时间,单位毫秒。
-        ReadDataTimeMs: 读取数据所花费的时间,单位毫秒。
-        WriteDataTimeMs: 执行写入数据操作所花费的时间,单位毫秒。
-        CommitAndPublishTimeMs: 向Fe请求提交并且发布事务所花费的时间,单位毫秒。
-        ErrorURL: 被过滤数据的具体内容,仅保留前1000条
+
+21. send_batch_parallelism: 整型,用于设置发送批处理数据的并行度,如果并行度的值超过 BE 配置中的 `max_send_batch_parallelism_per_job`,那么作为协调点的 BE 将使用 `max_send_batch_parallelism_per_job` 的值。
+22. load_to_single_tablet: 布尔类型,为true表示支持一个任务只导入数据到对应分区的一个tablet,默认值为false,该参数只允许在对带有random分区的olap表导数的时候设置。
+
+**返回值:**
+
+​         导入完成后,会以Json格式返回这次导入的相关内容。当前包括以下字段
+​         Status: 导入最后的状态。
+​             Success:表示导入成功,数据已经可见;
+​             Publish Timeout:表述导入作业已经成功Commit,但是由于某种原因并不能立即可见。用户可以视作已经成功不必重试导入
+​             Label Already Exists: 表明该Label已经被其他作业占用,可能是导入成功,也可能是正在导入。
+​             用户需要通过get label state命令来确定后续的操作
+​             其他:此次导入失败,用户可以指定Label重试此次作业
+​         Message: 导入状态详细的说明。失败时会返回具体的失败原因。
+​         NumberTotalRows: 从数据流中读取到的总行数
+​         NumberLoadedRows: 此次导入的数据行数,只有在Success时有效
+​         NumberFilteredRows: 此次导入过滤掉的行数,即数据质量不合格的行数
+​         NumberUnselectedRows: 此次导入,通过 where 条件被过滤掉的行数
+​         LoadBytes: 此次导入的源文件数据量大小
+​         LoadTimeMs: 此次导入所用的时间
+​         BeginTxnTimeMs: 向Fe请求开始一个事务所花费的时间,单位毫秒。
+​         StreamLoadPutTimeMs: 向Fe请求获取导入数据执行计划所花费的时间,单位毫秒。
+​         ReadDataTimeMs: 读取数据所花费的时间,单位毫秒。
+​         WriteDataTimeMs: 执行写入数据操作所花费的时间,单位毫秒。
+​         CommitAndPublishTimeMs: 向Fe请求提交并且发布事务所花费的时间,单位毫秒。
+​         ErrorURL: 被过滤数据的具体内容,仅保留前1000条
 
 ERRORS:
         可以通过以下语句查看导入错误详细信息:


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org
For additional commands, e-mail: commits-h...@doris.apache.org
