This is an automated email from the ASF dual-hosted git repository.

jiafengzheng pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris.git
The following commit(s) were added to refs/heads/master by this push: new 0744aeb201 [fix](docs) fix the 404 bad link of website doc (#16284) 0744aeb201 is described below commit 0744aeb20190aed21fa23e22ea88156ee435fae9 Author: Hong Liu <844981...@qq.com> AuthorDate: Mon Feb 6 18:56:07 2023 +0800 [fix](docs) fix the 404 bad link of website doc (#16284) --- .../memory-management/be-oom-analysis.md | 4 +- .../admin-manual/maint-monitor/monitor-alert.md | 2 +- docs/en/docs/advanced/compute_node.md | 2 +- docs/en/docs/advanced/resource.md | 2 +- docs/en/docs/releasenotes/release-1.2.0.md | 2 +- .../Create/CREATE-RESOURCE.md | 4 +- .../Load/STREAM-LOAD.md | 212 +++++++++++---------- .../memory-management/be-oom-analysis.md | 2 +- .../admin-manual/maint-monitor/monitor-alert.md | 2 +- docs/zh-CN/docs/advanced/compute_node.md | 4 +- docs/zh-CN/docs/advanced/resource.md | 2 +- docs/zh-CN/docs/releasenotes/release-1.2.0.md | 2 +- .../Create/CREATE-RESOURCE.md | 4 +- .../Load/BROKER-LOAD.md | 2 +- .../Load/STREAM-LOAD.md | 196 ++++++++++--------- 15 files changed, 228 insertions(+), 214 deletions(-) diff --git a/docs/en/docs/admin-manual/maint-monitor/memory-management/be-oom-analysis.md b/docs/en/docs/admin-manual/maint-monitor/memory-management/be-oom-analysis.md index be8662dce6..5d855d6e03 100644 --- a/docs/en/docs/admin-manual/maint-monitor/memory-management/be-oom-analysis.md +++ b/docs/en/docs/admin-manual/maint-monitor/memory-management/be-oom-analysis.md @@ -67,9 +67,9 @@ Memory Tracker Summary: MemTrackerLimiter Label=DeleteBitmap AggCache, Type=global, Limit=-1.00 B(-1 B), Used=0(0 B), Peak=0(0 B) ``` -3. When the end of be/log/be.INFO before OOM contains the system memory exceeded log, refer to [Memory Limit Exceeded Analysis](../admin-manual/memory-management/memory-limit-exceeded-analysis. The log analysis method in md) looks at the memory usage of each category of the process. If the current `type=query` memory usage is high, if the query before OOM is known, continue to step 4, otherwise continue to step 5; if the current `type=load` memory usage is more, continue to step 6, if th [...] +3. When the end of be/log/be.INFO before OOM contains the system memory exceeded log, refer to [Memory Limit Exceeded Analysis](./memory-limit-exceeded-analysis). The log analysis method in md) looks at the memory usage of each category of the process. If the current `type=query` memory usage is high, if the query before OOM is known, continue to step 4, otherwise continue to step 5; if the current `type=load` memory usage is more, continue to step 6, if the current `type= Global `memory [...] -4. `type=query` query memory usage is high, and the query before OOM is known, such as test cluster or scheduled task, restart the BE node, refer to [Memory Tracker](../admin-manual/memory-management/memory -tracker.md) View real-time memory tracker statistics, retry the query after `set global enable_profile=true`, observe the memory usage location of specific operators, confirm whether the query memory usage is reasonable, and further consider optimizing SQL memory usage, such as adjus [...] +4. 
`type=query` query memory usage is high, and the query before OOM is known, such as test cluster or scheduled task, restart the BE node, refer to [Memory Tracker](./memory-tracker) View real-time memory tracker statistics, retry the query after `set global enable_profile=true`, observe the memory usage location of specific operators, confirm whether the query memory usage is reasonable, and further consider optimizing SQL memory usage, such as adjusting the join order . 5. `type=query` query memory usage is high, and the query before OOM is unknown, such as in an online cluster, then search `Deregister query/load memory tracker from the back to the front in `be/log/be.INFO`, queryId` and `Register query/load memory tracker, query/load id`, if the same query id prints the above two lines of logs at the same time, it means that the query or import is successful. If there is only Register but no Deregister, the query or import is still before OOM In this w [...] diff --git a/docs/en/docs/admin-manual/maint-monitor/monitor-alert.md b/docs/en/docs/admin-manual/maint-monitor/monitor-alert.md index 494a653345..8bb4940472 100644 --- a/docs/en/docs/admin-manual/maint-monitor/monitor-alert.md +++ b/docs/en/docs/admin-manual/maint-monitor/monitor-alert.md @@ -28,7 +28,7 @@ under the License. This document mainly introduces Doris's monitoring items and how to collect and display them. And how to configure alarm (TODO) -[Dashboard template click download](https://grafana.com/grafana/dashboards/9734-doris-overview/) +[Dashboard template click download](https://grafana.com/api/dashboards/9734/revisions/4/download) > Note: Before 0.9.0 (excluding), please use revision 1. For version 0.9.x, > use revision 2. For version 0.10.x, use revision 3. diff --git a/docs/en/docs/advanced/compute_node.md b/docs/en/docs/advanced/compute_node.md index 5be6e131fe..7f57d88ea1 100644 --- a/docs/en/docs/advanced/compute_node.md +++ b/docs/en/docs/advanced/compute_node.md @@ -96,7 +96,7 @@ HeartbeatFailureCounter: 0 ``` ### Usage -When using the [MultiCatalog](https://doris.apache.org/docs/dev/ecosystem/external-table/multi-catalog/) , the query will be preferentially scheduled to the compute node. +When using the [MultiCatalog](../lakehouse/multi-catalog/multi-catalog) , the query will be preferentially scheduled to the compute node. In order to balance task scheduling, FE has a `backend_num_for_federation` configuration item, which defaults to 3. When executing a federated query, the optimizer will select `backend_num_for_federation` as an alternative to the scheduler, and the scheduler will decide which node to execute on to prevent the task from being skewed. diff --git a/docs/en/docs/advanced/resource.md b/docs/en/docs/advanced/resource.md index 885240d190..4ee70efb61 100644 --- a/docs/en/docs/advanced/resource.md +++ b/docs/en/docs/advanced/resource.md @@ -132,7 +132,7 @@ PROPERTIES `driver`: Indicates the driver dynamic library used by the ODBC external table. The ODBC external table referring to the resource is required. The old MySQL external table referring to the resource is optional. 
-For the usage of ODBC resource, please refer to [ODBC of Doris](../ecosystem/external-table/odbc-of-doris.md) +For the usage of ODBC resource, please refer to [ODBC of Doris](../lakehouse/external-table/odbc) #### Example diff --git a/docs/en/docs/releasenotes/release-1.2.0.md b/docs/en/docs/releasenotes/release-1.2.0.md index 016d524e09..cd0a6cc6c4 100644 --- a/docs/en/docs/releasenotes/release-1.2.0.md +++ b/docs/en/docs/releasenotes/release-1.2.0.md @@ -69,7 +69,7 @@ When creating a table, set `"light_schema_change"="true"` in properties. - SQL Server - Clickhouse - Documentation: https://doris.apache.org/zh-CN/docs/dev/ecosystem/external-table/jdbc-of-doris/ + Documentation: [https://doris.apache.org/en/docs/dev/lakehouse/multi-catalog/jdbc](https://doris.apache.org/docs/dev/lakehouse/multi-catalog/jdbc/) > Note: The ODBC feature will be removed in a later version, please try to switch to the JDBC. diff --git a/docs/en/docs/sql-manual/sql-reference/Data-Definition-Statements/Create/CREATE-RESOURCE.md b/docs/en/docs/sql-manual/sql-reference/Data-Definition-Statements/Create/CREATE-RESOURCE.md index f0d81d9fd5..e335cc702d 100644 --- a/docs/en/docs/sql-manual/sql-reference/Data-Definition-Statements/Create/CREATE-RESOURCE.md +++ b/docs/en/docs/sql-manual/sql-reference/Data-Definition-Statements/Create/CREATE-RESOURCE.md @@ -129,7 +129,7 @@ illustrate: ); ``` - If S3 resource is used for [cold hot separation](../../../../../docs/advanced/cold_hot_separation.md), we should add more required fields. + If S3 resource is used for [cold hot separation](../../../../../docs/advanced/cold_hot_separation), we should add more required fields. ```sql CREATE RESOURCE "remote_s3" PROPERTIES @@ -203,7 +203,7 @@ illustrate: 6. Create HMS resource - HMS resource is used to create [hms catalog](../../../../ecosystem/external-table/multi-catalog.md) + HMS resource is used to create [hms catalog](../../../../ecosystem/external-table/multi-catalog) ```sql CREATE RESOURCE hms_resource PROPERTIES ( 'type'='hms', diff --git a/docs/en/docs/sql-manual/sql-reference/Data-Manipulation-Statements/Load/STREAM-LOAD.md b/docs/en/docs/sql-manual/sql-reference/Data-Manipulation-Statements/Load/STREAM-LOAD.md index 042f0b21a8..897b6f5116 100644 --- a/docs/en/docs/sql-manual/sql-reference/Data-Manipulation-Statements/Load/STREAM-LOAD.md +++ b/docs/en/docs/sql-manual/sql-reference/Data-Manipulation-Statements/Load/STREAM-LOAD.md @@ -113,13 +113,14 @@ Parameter introduction: 14. strip_outer_array: Boolean type, true indicates that the json data starts with an array object and flattens the array object, the default value is false. E.g: - ```` - [ - {"k1" : 1, "v1" : 2}, - {"k1" : 3, "v1" : 4} - ] - When strip_outer_array is true, the final import into doris will generate two rows of data. - ```` + ```` + [ + {"k1" : 1, "v1" : 2}, + {"k1" : 3, "v1" : 4} + ] + ```` + When strip_outer_array is true, the final import into doris will generate two rows of data. + 15. json_root: json_root is a valid jsonpath string, used to specify the root node of the json document, the default value is "". @@ -170,12 +171,10 @@ separated by commas. ERRORS: Import error details can be viewed with the following statement: - - ```sql - SHOW LOAD WARNINGS ON 'url - ```` - - where url is the url given by ErrorURL. + ```` + SHOW LOAD WARNINGS ON 'url' + ```` + where url is the url given by ErrorURL. 24. compress_type @@ -183,8 +182,6 @@ ERRORS: 25. trim_double_quotes: Boolean type, The default value is false. 
True means that the outermost double quotes of each field in the csv file are trimmed. -26. skip_lines: <version since="dev" type="inline"> Integer type, the default value is 0. It will skip some lines in the head of csv file. It will be disabled when format is `csv_with_names` or `csv_with_names_and_types`. </version> - ### Example 1. Import the data in the local file 'testData' into the table 'testTbl' in the database 'testDb', and use Label for deduplication. Specify a timeout of 100 seconds @@ -194,131 +191,136 @@ ERRORS: ```` 2. Import the data in the local file 'testData' into the table 'testTbl' in the database 'testDb', use Label for deduplication, and only import data whose k1 is equal to 20180601 - ```` - curl --location-trusted -u root -H "label:123" -H "where: k1=20180601" -T testData http://host:port/api/testDb/testTbl/_stream_load - ```` + + ```` + curl --location-trusted -u root -H "label:123" -H "where: k1=20180601" -T testData http://host:port/api/testDb/testTbl/_stream_load + ```` 3. Import the data in the local file 'testData' into the table 'testTbl' in the database 'testDb', allowing a 20% error rate (the user is in the defalut_cluster) - ```` - curl --location-trusted -u root -H "label:123" -H "max_filter_ratio:0.2" -T testData http://host:port/api/testDb/testTbl/_stream_load - ```` + + ```` + curl --location-trusted -u root -H "label:123" -H "max_filter_ratio:0.2" -T testData http://host:port/api/testDb/testTbl/_stream_load + ```` 4. Import the data in the local file 'testData' into the table 'testTbl' in the database 'testDb', allow a 20% error rate, and specify the column name of the file (the user is in the defalut_cluster) - ```` - curl --location-trusted -u root -H "label:123" -H "max_filter_ratio:0.2" -H "columns: k2, k1, v1" -T testData http://host:port/api/testDb/testTbl /_stream_load - ```` + + ```` + curl --location-trusted -u root -H "label:123" -H "max_filter_ratio:0.2" -H "columns: k2, k1, v1" -T testData http://host:port/api/testDb/testTbl /_stream_load + ```` 5. Import the data in the local file 'testData' into the p1, p2 partitions of the table 'testTbl' in the database 'testDb', allowing a 20% error rate. - ```` - curl --location-trusted -u root -H "label:123" -H "max_filter_ratio:0.2" -H "partitions: p1, p2" -T testData http://host:port/api/testDb/testTbl/_stream_load - ```` + + ```` + curl --location-trusted -u root -H "label:123" -H "max_filter_ratio:0.2" -H "partitions: p1, p2" -T testData http://host:port/api/testDb/testTbl/_stream_load + ```` 6. Import using streaming (user is in defalut_cluster) - ```` - seq 1 10 | awk '{OFS="\t"}{print $1, $1 * 10}' | curl --location-trusted -u root -T - http://host:port/api/testDb/testTbl/ _stream_load - ```` + + ```` + seq 1 10 | awk '{OFS="\t"}{print $1, $1 * 10}' | curl --location-trusted -u root -T - http://host:port/api/testDb/testTbl/ _stream_load + ```` 7. Import a table containing HLL columns, which can be columns in the table or columns in the data to generate HLL columns, or use hll_empty to supplement columns that are not in the data - ```` - curl --location-trusted -u root -H "columns: k1, k2, v1=hll_hash(k1), v2=hll_empty()" -T testData http://host:port/api/testDb/testTbl/_stream_load - ```` + + ```` + curl --location-trusted -u root -H "columns: k1, k2, v1=hll_hash(k1), v2=hll_empty()" -T testData http://host:port/api/testDb/testTbl/_stream_load + ```` 8. 
Import data for strict mode filtering and set the time zone to Africa/Abidjan - ```` - curl --location-trusted -u root -H "strict_mode: true" -H "timezone: Africa/Abidjan" -T testData http://host:port/api/testDb/testTbl/_stream_load - ```` + + ```` + curl --location-trusted -u root -H "strict_mode: true" -H "timezone: Africa/Abidjan" -T testData http://host:port/api/testDb/testTbl/_stream_load + ```` 9. Import a table with a BITMAP column, which can be a column in the table or a column in the data to generate a BITMAP column, or use bitmap_empty to fill an empty Bitmap - ```` - curl --location-trusted -u root -H "columns: k1, k2, v1=to_bitmap(k1), v2=bitmap_empty()" -T testData http://host:port/api/testDb/testTbl/_stream_load - ```` + ```` + curl --location-trusted -u root -H "columns: k1, k2, v1=to_bitmap(k1), v2=bitmap_empty()" -T testData http://host:port/api/testDb/testTbl/_stream_load + ```` 10. Simple mode, import json data Table Structure: -`category` varchar(512) NULL COMMENT "", -`author` varchar(512) NULL COMMENT "", -`title` varchar(512) NULL COMMENT "", -`price` double NULL COMMENT "" + `category` varchar(512) NULL COMMENT "", + `author` varchar(512) NULL COMMENT "", + `title` varchar(512) NULL COMMENT "", + `price` double NULL COMMENT "" -json data format: - -```` -{"category":"C++","author":"avc","title":"C++ primer","price":895} -```` - -Import command: - -```` -curl --location-trusted -u root -H "label:123" -H "format: json" -T testData http://host:port/api/testDb/testTbl/_stream_load -```` - -In order to improve throughput, it supports importing multiple pieces of json data at one time, each line is a json object, and \n is used as a newline by default. You need to set read_json_by_line to true. The json data format is as follows: - - -```` -{"category":"C++","author":"avc","title":"C++ primer","price":89.5} -{"category":"Java","author":"avc","title":"Effective Java","price":95} -{"category":"Linux","author":"avc","title":"Linux kernel","price":195} -```` + json data format: + ```` + {"category":"C++","author":"avc","title":"C++ primer","price":895} + ```` + + Import command: + + ```` + curl --location-trusted -u root -H "label:123" -H "format: json" -T testData http://host:port/api/testDb/testTbl/_stream_load + ```` + + In order to improve throughput, it supports importing multiple pieces of json data at one time, each line is a json object, and \n is used as a newline by default. You need to set read_json_by_line to true. The json data format is as follows: + ```` + {"category":"C++","author":"avc","title":"C++ primer","price":89.5} + {"category":"Java","author":"avc","title":"Effective Java","price":95} + {"category":"Linux","author":"avc","title":"Linux kernel","price":195} + ```` + 11. 
Match pattern, import json data json data format: -```` -[ -{"category":"xuxb111","author":"1avc","title":"SayingsoftheCentury","price":895},{"category":"xuxb222","author":"2avc"," title":"SayingsoftheCentury","price":895}, -{"category":"xuxb333","author":"3avc","title":"SayingsoftheCentury","price":895} -] -```` + ```` + [ + {"category":"xuxb111","author":"1avc","title":"SayingsoftheCentury","price":895},{"category":"xuxb222","author":"2avc"," title":"SayingsoftheCentury","price":895}, + {"category":"xuxb333","author":"3avc","title":"SayingsoftheCentury","price":895} + ] + ```` -Precise import by specifying jsonpath, such as importing only three attributes of category, author, and price + Precise import by specifying jsonpath, such as importing only three attributes of category, author, and price -```` -curl --location-trusted -u root -H "columns: category, price, author" -H "label:123" -H "format: json" -H "jsonpaths: [\"$.category\",\" $.price\",\"$.author\"]" -H "strip_outer_array: true" -T testData http://host:port/api/testDb/testTbl/_stream_load -```` + ```` + curl --location-trusted -u root -H "columns: category, price, author" -H "label:123" -H "format: json" -H "jsonpaths: [\"$.category\",\" $.price\",\"$.author\"]" -H "strip_outer_array: true" -T testData http://host:port/api/testDb/testTbl/_stream_load + ```` -illustrate: - 1) If the json data starts with an array, and each object in the array is a record, you need to set strip_outer_array to true, which means flatten the array. - 2) If the json data starts with an array, and each object in the array is a record, when setting jsonpath, our ROOT node is actually an object in the array. + illustrate: + 1) If the json data starts with an array, and each object in the array is a record, you need to set strip_outer_array to true, which means flatten the array. + 2) If the json data starts with an array, and each object in the array is a record, when setting jsonpath, our ROOT node is actually an object in the array. 12. User specified json root node json data format: -```` -{ - "RECORDS":[ -{"category":"11","title":"SayingsoftheCentury","price":895,"timestamp":1589191587}, -{"category":"22","author":"2avc","price":895,"timestamp":1589191487}, -{"category":"33","author":"3avc","title":"SayingsoftheCentury","timestamp":1589191387} -] -} -```` - -Precise import by specifying jsonpath, such as importing only three attributes of category, author, and price - -```` -curl --location-trusted -u root -H "columns: category, price, author" -H "label:123" -H "format: json" -H "jsonpaths: [\"$.category\",\" $.price\",\"$.author\"]" -H "strip_outer_array: true" -H "json_root: $.RECORDS" -T testData http://host:port/api/testDb/testTbl/_stream_load -```` - + ```` + { + "RECORDS":[ + {"category":"11","title":"SayingsoftheCentury","price":895,"timestamp":1589191587}, + {"category":"22","author":"2avc","price":895,"timestamp":1589191487}, + {"category":"33","author":"3avc","title":"SayingsoftheCentury","timestamp":1589191387} + ] + } + ```` + + Precise import by specifying jsonpath, such as importing only three attributes of category, author, and price + + ```` + curl --location-trusted -u root -H "columns: category, price, author" -H "label:123" -H "format: json" -H "jsonpaths: [\"$.category\",\" $.price\",\"$.author\"]" -H "strip_outer_array: true" -H "json_root: $.RECORDS" -T testData http://host:port/api/testDb/testTbl/_stream_load + ```` + 13. 
Delete the data with the same import key as this batch -```` -curl --location-trusted -u root -H "merge_type: DELETE" -T testData http://host:port/api/testDb/testTbl/_stream_load -```` + ```` + curl --location-trusted -u root -H "merge_type: DELETE" -T testData http://host:port/api/testDb/testTbl/_stream_load + ```` 14. Delete the columns in this batch of data that match the data whose flag is listed as true, and append other rows normally - -```` -curl --location-trusted -u root: -H "column_separator:," -H "columns: siteid, citycode, username, pv, flag" -H "merge_type: MERGE" -H "delete: flag=1" -T testData http://host:port/api/testDb/testTbl/_stream_load -```` - + + ```` + curl --location-trusted -u root: -H "column_separator:," -H "columns: siteid, citycode, username, pv, flag" -H "merge_type: MERGE" -H "delete: flag=1" -T testData http://host:port/api/testDb/testTbl/_stream_load + ```` + 15. Import data into UNIQUE_KEYS table with sequence column - -```` -curl --location-trusted -u root -H "columns: k1,k2,source_sequence,v1,v2" -H "function_column.sequence_col: source_sequence" -T testData http://host:port/api/testDb/testTbl/ _stream_load -```` - + + ```` + curl --location-trusted -u root -H "columns: k1,k2,source_sequence,v1,v2" -H "function_column.sequence_col: source_sequence" -T testData http://host:port/api/testDb/testTbl/ _stream_load + ```` + ### Keywords STREAM, LOAD diff --git a/docs/zh-CN/docs/admin-manual/maint-monitor/memory-management/be-oom-analysis.md b/docs/zh-CN/docs/admin-manual/maint-monitor/memory-management/be-oom-analysis.md index 41db09f3b3..3a63d2e387 100644 --- a/docs/zh-CN/docs/admin-manual/maint-monitor/memory-management/be-oom-analysis.md +++ b/docs/zh-CN/docs/admin-manual/maint-monitor/memory-management/be-oom-analysis.md @@ -69,7 +69,7 @@ Memory Tracker Summary: 3. 当 OOM 前 be/log/be.INFO 的最后包含系统内存超限的日志时,参考 [Memory Limit Exceeded Analysis](./memory-limit-exceeded-analysis) 中的日志分析方法,查看进程每个类别的内存使用情况。若当前是`type=query`内存使用较多,若已知 OOM 前的查询继续步骤4,否则继续步骤5;若当前是`type=load`内存使用多继续步骤6,若当前是`type=global`内存使用多继续步骤7。 -4. `type=query`查询内存使用多,且已知 OOM 前的查询时,比如测试集群或定时任务,重启BE节点,参考 [Memory Tracker](../admin-manual/memory-management/memory-tracker.md) 查看实时 memory tracker 统计,`set global enable_profile=true`后重试查询,观察具体算子的内存使用位置,确认查询内存使用是否合理,进一步考虑优化SQL内存使用,比如调整join顺序。 +4. `type=query`查询内存使用多,且已知 OOM 前的查询时,比如测试集群或定时任务,重启BE节点,参考 [Memory Tracker](./memory-tracker) 查看实时 memory tracker 统计,`set global enable_profile=true`后重试查询,观察具体算子的内存使用位置,确认查询内存使用是否合理,进一步考虑优化SQL内存使用,比如调整join顺序。 5. `type=query`查询内存使用多,且未知 OOM 前的查询时,比如位于线上集群,则在`be/log/be.INFO`从后向前搜`Deregister query/load memory tracker, queryId` 和 `Register query/load memory tracker, query/load id`,同一个query id若同时打出上述两行日志则表示查询或导入成功,若只有 Register 没有 Deregister,则这个查询或导入在 OOM 前仍在运行,这样可以得到OOM 前所有正在运行的查询和导入,按照步骤4的方法对可疑大内存查询分析其内存使用。 diff --git a/docs/zh-CN/docs/admin-manual/maint-monitor/monitor-alert.md b/docs/zh-CN/docs/admin-manual/maint-monitor/monitor-alert.md index 0906e9b1a1..e4d627955b 100644 --- a/docs/zh-CN/docs/admin-manual/maint-monitor/monitor-alert.md +++ b/docs/zh-CN/docs/admin-manual/maint-monitor/monitor-alert.md @@ -28,7 +28,7 @@ under the License. 
本文档主要介绍 Doris 的监控项及如何采集、展示监控项。以及如何配置报警(TODO) -[Dashboard 模板点击下载](https://grafana.com/grafana/dashboards/9734-doris-overview/) +[Dashboard 模板点击下载](https://grafana.com/api/dashboards/9734/revisions/4/download) > 注:0.9.0(不含)之前的版本请使用 revision 1。0.9.x 版本请使用 revision 2。0.10.x 版本请使用 revision > 3。 diff --git a/docs/zh-CN/docs/advanced/compute_node.md b/docs/zh-CN/docs/advanced/compute_node.md index 9fe03e86d3..8b2825034f 100644 --- a/docs/zh-CN/docs/advanced/compute_node.md +++ b/docs/zh-CN/docs/advanced/compute_node.md @@ -89,13 +89,13 @@ HeartbeatFailureCounter: 0 ``` ### 使用 -当查询时使用[MultiCatalog](https://doris.apache.org/zh-CN/docs/dev/ecosystem/external-table/multi-catalog)功能时, 查询会优先调度到计算节点, 为了均衡任务调度, FE有一个`backend_num_for_federation`配置项, 默认是3. +当查询时使用[MultiCatalog](../lakehouse/multi-catalog/multi-catalog)功能时, 查询会优先调度到计算节点, 为了均衡任务调度, FE有一个`backend_num_for_federation`配置项, 默认是3. 当执行联邦查询时, 优化器会选取`backend_num_for_federation`给调度器备选, 由调取器决定具体在哪个节点执行, 防止查询任务倾斜. 当计算节点个数小于`backend_num_for_federation`时, 会随机选择混合节点补齐个数;当计算节点大于`backend_num_for_federation`, 那么联邦查询任务只会在计算节点执行. ### 一些限制 -- 计算节点目前只支持[MultiCatalog](https://doris.apache.org/zh-CN/docs/dev/ecosystem/external-table/multi-catalog)对应的Hive MetaStore表类型查询语法, 普通外表的计算依然在混合节点上. +- 计算节点目前只支持[MultiCatalog](../lakehouse/multi-catalog/multi-catalog)对应的Hive MetaStore表类型查询语法, 普通外表的计算依然在混合节点上. - 计算节点由配置项控制, 但不要将混合类型节点, 修改配置为计算节点. diff --git a/docs/zh-CN/docs/advanced/resource.md b/docs/zh-CN/docs/advanced/resource.md index 88bc67114c..76ee557cd8 100644 --- a/docs/zh-CN/docs/advanced/resource.md +++ b/docs/zh-CN/docs/advanced/resource.md @@ -127,7 +127,7 @@ PROPERTIES `driver`: 标示外部表使用的driver动态库,引用该resource的ODBC外表必填,旧的mysql外表选填。 -具体如何使用可以,可以参考[ODBC of Doris](../ecosystem/external-table/odbc-of-doris.md) +具体如何使用可以,可以参考[ODBC of Doris](../lakehouse/external-table/odbc) #### 示例 diff --git a/docs/zh-CN/docs/releasenotes/release-1.2.0.md b/docs/zh-CN/docs/releasenotes/release-1.2.0.md index c474ed67bb..6c2228b3d3 100644 --- a/docs/zh-CN/docs/releasenotes/release-1.2.0.md +++ b/docs/zh-CN/docs/releasenotes/release-1.2.0.md @@ -143,7 +143,7 @@ Multi-Catalog 多源数据目录功能的目标在于能够帮助用户更方便 更多数据源的适配已经在规划之中,原则上任何支持 JDBC 协议访问的数据库均能通过 JDBC 外部表的方式来访问。而之前的 ODBC 外部表功能将会在后续的某个版本中移除,还请尽量切换到 JDBC 外表功能。 -文档:[https://doris.apache.org/zh-CN/docs/dev/ecosystem/external-table/jdbc-of-doris/](https://doris.apache.org/zh-CN/docs/dev/ecosystem/external-table/jdbc-of-doris/) +文档:[https://doris.apache.org/zh-CN/docs/dev/lakehouse/multi-catalog/jdbc](https://doris.apache.org/zh-CN/docs/dev/lakehouse/multi-catalog/jdbc) ### 6. JAVA UDF diff --git a/docs/zh-CN/docs/sql-manual/sql-reference/Data-Definition-Statements/Create/CREATE-RESOURCE.md b/docs/zh-CN/docs/sql-manual/sql-reference/Data-Definition-Statements/Create/CREATE-RESOURCE.md index 4acc877ceb..da2eddca6f 100644 --- a/docs/zh-CN/docs/sql-manual/sql-reference/Data-Definition-Statements/Create/CREATE-RESOURCE.md +++ b/docs/zh-CN/docs/sql-manual/sql-reference/Data-Definition-Statements/Create/CREATE-RESOURCE.md @@ -129,7 +129,7 @@ PROPERTIES ("key"="value", ...); ); ``` - 如果 s3 reource 在[冷热分离](../../../../../docs/advanced/cold_hot_separation.md)中使用,需要添加额外的字段。 + 如果 s3 reource 在[冷热分离](../../../../../docs/advanced/cold_hot_separation)中使用,需要添加额外的字段。 ```sql CREATE RESOURCE "remote_s3" PROPERTIES @@ -202,7 +202,7 @@ PROPERTIES ("key"="value", ...); 6. 
创建 HMS resource - HMS resource 用于 [hms catalog](../../../../ecosystem/external-table/multi-catalog.md) + HMS resource 用于 [hms catalog](../../../../ecosystem/external-table/multi-catalog) ```sql CREATE RESOURCE hms_resource PROPERTIES ( 'type'='hms', diff --git a/docs/zh-CN/docs/sql-manual/sql-reference/Data-Manipulation-Statements/Load/BROKER-LOAD.md b/docs/zh-CN/docs/sql-manual/sql-reference/Data-Manipulation-Statements/Load/BROKER-LOAD.md index 79754c2be5..83838b00a3 100644 --- a/docs/zh-CN/docs/sql-manual/sql-reference/Data-Manipulation-Statements/Load/BROKER-LOAD.md +++ b/docs/zh-CN/docs/sql-manual/sql-reference/Data-Manipulation-Statements/Load/BROKER-LOAD.md @@ -487,7 +487,7 @@ WITH BROKER broker_name 2. 取消导入任务 - 已提交切尚未结束的导入任务可以通过 [CANCEL LOAD](../CANCEL-LOAD) 命令取消。取消后,已写入的数据也会回滚,不会生效。 + 已提交切尚未结束的导入任务可以通过 [CANCEL LOAD](./CANCEL-LOAD) 命令取消。取消后,已写入的数据也会回滚,不会生效。 3. Label、导入事务、多表原子性 diff --git a/docs/zh-CN/docs/sql-manual/sql-reference/Data-Manipulation-Statements/Load/STREAM-LOAD.md b/docs/zh-CN/docs/sql-manual/sql-reference/Data-Manipulation-Statements/Load/STREAM-LOAD.md index c6d1f4188d..444509b5c9 100644 --- a/docs/zh-CN/docs/sql-manual/sql-reference/Data-Manipulation-Statements/Load/STREAM-LOAD.md +++ b/docs/zh-CN/docs/sql-manual/sql-reference/Data-Manipulation-Statements/Load/STREAM-LOAD.md @@ -168,11 +168,10 @@ curl --location-trusted -u user:passwd [-H ""...] -T data.file -XPUT http://fe_h ERRORS: 可以通过以下语句查看导入错误详细信息: - ```sql + ``` SHOW LOAD WARNINGS ON 'url' ``` - - 其中 url 为 ErrorURL 给出的 url。 + 其中 url 为 ErrorURL 给出的 url。 24. compress_type @@ -180,126 +179,139 @@ ERRORS: 25. trim_double_quotes: 布尔类型,默认值为 false,为 true 时表示裁剪掉 csv 文件每个字段最外层的双引号。 -26. skip_lines: <version since="dev" type="inline"> 整数类型, 默认值为0, 含义为跳过csv文件的前几行. 当设置format设置为 `csv_with_names` 或、`csv_with_names_and_types` 时, 该参数会失效. </version> - ### Example 1. 将本地文件'testData'中的数据导入到数据库'testDb'中'testTbl'的表,使用Label用于去重。指定超时时间为 100 秒 - ``` - curl --location-trusted -u root -H "label:123" -H "timeout:100" -T testData http://host:port/api/testDb/testTbl/_stream_load - ``` + ``` + curl --location-trusted -u root -H "label:123" -H "timeout:100" -T testData http://host:port/api/testDb/testTbl/_stream_load + ``` 2. 将本地文件'testData'中的数据导入到数据库'testDb'中'testTbl'的表,使用Label用于去重, 并且只导入k1等于20180601的数据 - ``` - curl --location-trusted -u root -H "label:123" -H "where: k1=20180601" -T testData http://host:port/api/testDb/testTbl/_stream_load - ``` + + ``` + curl --location-trusted -u root -H "label:123" -H "where: k1=20180601" -T testData http://host:port/api/testDb/testTbl/_stream_load + ``` 3. 将本地文件'testData'中的数据导入到数据库'testDb'中'testTbl'的表, 允许20%的错误率(用户是defalut_cluster中的) - ``` - curl --location-trusted -u root -H "label:123" -H "max_filter_ratio:0.2" -T testData http://host:port/api/testDb/testTbl/_stream_load - ``` + + ``` + curl --location-trusted -u root -H "label:123" -H "max_filter_ratio:0.2" -T testData http://host:port/api/testDb/testTbl/_stream_load + ``` 4. 将本地文件'testData'中的数据导入到数据库'testDb'中'testTbl'的表, 允许20%的错误率,并且指定文件的列名(用户是defalut_cluster中的) - ``` - curl --location-trusted -u root -H "label:123" -H "max_filter_ratio:0.2" -H "columns: k2, k1, v1" -T testData http://host:port/api/testDb/testTbl/_stream_load - ``` + + ``` + curl --location-trusted -u root -H "label:123" -H "max_filter_ratio:0.2" -H "columns: k2, k1, v1" -T testData http://host:port/api/testDb/testTbl/_stream_load + ``` 5. 
将本地文件'testData'中的数据导入到数据库'testDb'中'testTbl'的表中的p1, p2分区, 允许20%的错误率。 - ``` - curl --location-trusted -u root -H "label:123" -H "max_filter_ratio:0.2" -H "partitions: p1, p2" -T testData http://host:port/api/testDb/testTbl/_stream_load - ``` + + ``` + curl --location-trusted -u root -H "label:123" -H "max_filter_ratio:0.2" -H "partitions: p1, p2" -T testData http://host:port/api/testDb/testTbl/_stream_load + ``` 6. 使用streaming方式导入(用户是defalut_cluster中的) - ``` - seq 1 10 | awk '{OFS="\t"}{print $1, $1 * 10}' | curl --location-trusted -u root -T - http://host:port/api/testDb/testTbl/_stream_load - ``` + + ``` + seq 1 10 | awk '{OFS="\t"}{print $1, $1 * 10}' | curl --location-trusted -u root -T - http://host:port/api/testDb/testTbl/_stream_load + ``` 7. 导入含有HLL列的表,可以是表中的列或者数据中的列用于生成HLL列,也可使用hll_empty补充数据中没有的列 - ``` - curl --location-trusted -u root -H "columns: k1, k2, v1=hll_hash(k1), v2=hll_empty()" -T testData http://host:port/api/testDb/testTbl/_stream_load - ``` + + ``` + curl --location-trusted -u root -H "columns: k1, k2, v1=hll_hash(k1), v2=hll_empty()" -T testData http://host:port/api/testDb/testTbl/_stream_load + ``` 8. 导入数据进行严格模式过滤,并设置时区为 Africa/Abidjan - ``` - curl --location-trusted -u root -H "strict_mode: true" -H "timezone: Africa/Abidjan" -T testData http://host:port/api/testDb/testTbl/_stream_load - ``` + + ``` + curl --location-trusted -u root -H "strict_mode: true" -H "timezone: Africa/Abidjan" -T testData http://host:port/api/testDb/testTbl/_stream_load + ``` 9. 导入含有BITMAP列的表,可以是表中的列或者数据中的列用于生成BITMAP列,也可以使用bitmap_empty填充空的Bitmap - ``` - curl --location-trusted -u root -H "columns: k1, k2, v1=to_bitmap(k1), v2=bitmap_empty()" -T testData http://host:port/api/testDb/testTbl/_stream_load - ``` + + ``` + curl --location-trusted -u root -H "columns: k1, k2, v1=to_bitmap(k1), v2=bitmap_empty()" -T testData http://host:port/api/testDb/testTbl/_stream_load + ``` 10. 简单模式,导入json数据 -表结构: - -`category` varchar(512) NULL COMMENT "", -`author` varchar(512) NULL COMMENT "", -`title` varchar(512) NULL COMMENT "", -`price` double NULL COMMENT "" + + 表结构: -json数据格式: -``` -{"category":"C++","author":"avc","title":"C++ primer","price":895} -``` -导入命令: -``` -curl --location-trusted -u root -H "label:123" -H "format: json" -T testData http://host:port/api/testDb/testTbl/_stream_load -``` -为了提升吞吐量,支持一次性导入多条json数据,每行为一个json对象,默认使用\n作为换行符,需要将read_json_by_line设置为true,json数据格式如下: - -``` -{"category":"C++","author":"avc","title":"C++ primer","price":89.5} -{"category":"Java","author":"avc","title":"Effective Java","price":95} -{"category":"Linux","author":"avc","title":"Linux kernel","price":195} -``` + `category` varchar(512) NULL COMMENT "", + `author` varchar(512) NULL COMMENT "", + `title` varchar(512) NULL COMMENT "", + `price` double NULL COMMENT "" + json数据格式: + ``` + {"category":"C++","author":"avc","title":"C++ primer","price":895} + ``` + 导入命令: + ``` + curl --location-trusted -u root -H "label:123" -H "format: json" -T testData http://host:port/api/testDb/testTbl/_stream_load + ``` + 为了提升吞吐量,支持一次性导入多条json数据,每行为一个json对象,默认使用\n作为换行符,需要将read_json_by_line设置为true,json数据格式如下: + + ``` + {"category":"C++","author":"avc","title":"C++ primer","price":89.5} + {"category":"Java","author":"avc","title":"Effective Java","price":95} + {"category":"Linux","author":"avc","title":"Linux kernel","price":195} + ``` + 11. 
匹配模式,导入json数据 -json数据格式: -``` -[ -{"category":"xuxb111","author":"1avc","title":"SayingsoftheCentury","price":895},{"category":"xuxb222","author":"2avc","title":"SayingsoftheCentury","price":895}, -{"category":"xuxb333","author":"3avc","title":"SayingsoftheCentury","price":895} -] -``` -通过指定jsonpath进行精准导入,例如只导入category、author、price三个属性 -``` -curl --location-trusted -u root -H "columns: category, price, author" -H "label:123" -H "format: json" -H "jsonpaths: [\"$.category\",\"$.price\",\"$.author\"]" -H "strip_outer_array: true" -T testData http://host:port/api/testDb/testTbl/_stream_load -``` -说明: - 1)如果json数据是以数组开始,并且数组中每个对象是一条记录,则需要将strip_outer_array设置成true,表示展平数组。 - 2)如果json数据是以数组开始,并且数组中每个对象是一条记录,在设置jsonpath时,我们的ROOT节点实际上是数组中对象。 + json数据格式: + ``` + [ + {"category":"xuxb111","author":"1avc","title":"SayingsoftheCentury","price":895},{"category":"xuxb222","author":"2avc","title":"SayingsoftheCentury","price":895}, + {"category":"xuxb333","author":"3avc","title":"SayingsoftheCentury","price":895} + ] + ``` + 通过指定jsonpath进行精准导入,例如只导入category、author、price三个属性 + ``` + curl --location-trusted -u root -H "columns: category, price, author" -H "label:123" -H "format: json" -H "jsonpaths: [\"$.category\",\"$.price\",\"$.author\"]" -H "strip_outer_array: true" -T testData http://host:port/api/testDb/testTbl/_stream_load + ``` + + 说明: + 1)如果json数据是以数组开始,并且数组中每个对象是一条记录,则需要将strip_outer_array设置成true,表示展平数组。 + 2)如果json数据是以数组开始,并且数组中每个对象是一条记录,在设置jsonpath时,我们的ROOT节点实际上是数组中对象。 + 12. 用户指定json根节点 -json数据格式: -``` -{ - "RECORDS":[ -{"category":"11","title":"SayingsoftheCentury","price":895,"timestamp":1589191587}, -{"category":"22","author":"2avc","price":895,"timestamp":1589191487}, -{"category":"33","author":"3avc","title":"SayingsoftheCentury","timestamp":1589191387} -] -} -``` -通过指定jsonpath进行精准导入,例如只导入category、author、price三个属性 -``` -curl --location-trusted -u root -H "columns: category, price, author" -H "label:123" -H "format: json" -H "jsonpaths: [\"$.category\",\"$.price\",\"$.author\"]" -H "strip_outer_array: true" -H "json_root: $.RECORDS" -T testData http://host:port/api/testDb/testTbl/_stream_load -``` + + json数据格式: + ``` + { + "RECORDS":[ + {"category":"11","title":"SayingsoftheCentury","price":895,"timestamp":1589191587}, + {"category":"22","author":"2avc","price":895,"timestamp":1589191487}, + {"category":"33","author":"3avc","title":"SayingsoftheCentury","timestamp":1589191387} + ] + } + ``` + 通过指定jsonpath进行精准导入,例如只导入category、author、price三个属性 + ``` + curl --location-trusted -u root -H "columns: category, price, author" -H "label:123" -H "format: json" -H "jsonpaths: [\"$.category\",\"$.price\",\"$.author\"]" -H "strip_outer_array: true" -H "json_root: $.RECORDS" -T testData http://host:port/api/testDb/testTbl/_stream_load + ``` 13. 删除与这批导入key 相同的数据 -``` -curl --location-trusted -u root -H "merge_type: DELETE" -T testData http://host:port/api/testDb/testTbl/_stream_load -``` + ``` + curl --location-trusted -u root -H "merge_type: DELETE" -T testData http://host:port/api/testDb/testTbl/_stream_load + ``` 14. 将这批数据中与flag 列为ture 的数据相匹配的列删除,其他行正常追加 -``` -curl --location-trusted -u root: -H "column_separator:," -H "columns: siteid, citycode, username, pv, flag" -H "merge_type: MERGE" -H "delete: flag=1" -T testData http://host:port/api/testDb/testTbl/_stream_load -``` + + ``` + curl --location-trusted -u root: -H "column_separator:," -H "columns: siteid, citycode, username, pv, flag" -H "merge_type: MERGE" -H "delete: flag=1" -T testData http://host:port/api/testDb/testTbl/_stream_load + ``` 15. 
导入数据到含有sequence列的UNIQUE_KEYS表中 -``` -curl --location-trusted -u root -H "columns: k1,k2,source_sequence,v1,v2" -H "function_column.sequence_col: source_sequence" -T testData http://host:port/api/testDb/testTbl/_stream_load -``` + + ``` + curl --location-trusted -u root -H "columns: k1,k2,source_sequence,v1,v2" -H "function_column.sequence_col: source_sequence" -T testData http://host:port/api/testDb/testTbl/_stream_load + ``` + ### Keywords STREAM, LOAD @@ -410,11 +422,11 @@ curl --location-trusted -u root -H "columns: k1,k2,source_sequence,v1,v2" -H "fu Doris 的导入任务可以容忍一部分格式错误的数据。容忍率通过 `max_filter_ratio` 设置。默认为0,即表示当有一条错误数据时,整个导入任务将会失败。如果用户希望忽略部分有问题的数据行,可以将次参数设置为 0~1 之间的数值,Doris 会自动跳过哪些数据格式不正确的行。 - 关于容忍率的一些计算方式,可以参阅 [列的映射,转换与过滤](../../../../data-operate/import/import-scenes/load-data-convert.md) 文档。 + 关于容忍率的一些计算方式,可以参阅 [列的映射,转换与过滤](../../../../data-operate/import/import-scenes/load-data-convert) 文档。 7. 严格模式 - `strict_mode` 属性用于设置导入任务是否运行在严格模式下。该格式会对列映射、转换和过滤的结果产生影响。关于严格模式的具体说明,可参阅 [严格模式](../../../../data-operate/import/import-scenes/load-strict-mode.md) 文档。 + `strict_mode` 属性用于设置导入任务是否运行在严格模式下。该格式会对列映射、转换和过滤的结果产生影响。关于严格模式的具体说明,可参阅 [严格模式](../../../../data-operate/import/import-scenes/load-strict-mode) 文档。 8. 超时时间 --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org For additional commands, e-mail: commits-h...@doris.apache.org