This is an automated email from the ASF dual-hosted git repository. yiguolei pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/doris-website.git
The following commit(s) were added to refs/heads/master by this push: new 8a7f8bedc3 [docs]update workload document (#1041) 8a7f8bedc3 is described below commit 8a7f8bedc3ac640060a0b06ea1f8c888b2e75817 Author: wangbo <wan...@apache.org> AuthorDate: Thu Aug 22 22:13:34 2024 +0800 [docs]update workload document (#1041) --- .../resource-admin/use-workload-cpu-limit.md | 4 +- .../resource-admin/use-workload-local-io-limit.md | 4 +- .../resource-admin/use-workload-mem-limit.md | 4 +- .../resource-admin/use-workload-remote-io-limit.md | 4 +- .../resource-admin/workload-analysis.md | 172 +++++++++++++++ .../admin-manual/resource-admin/workload-fusing.md | 2 - docs/admin-manual/resource-admin/workload-group.md | 42 +++- .../admin-manual/resource-admin/workload-policy.md | 1 - .../resource-admin/workload-system-table.md | 230 --------------------- .../resource-admin/use-workload-cpu-limit.md | 4 +- .../resource-admin/use-workload-local-io-limit.md | 4 +- .../resource-admin/use-workload-mem-limit.md | 4 +- .../resource-admin/use-workload-remote-io-limit.md | 4 +- .../resource-admin/workload-analysis.md | 19 +- .../admin-manual/resource-admin/workload-fusing.md | 6 +- .../admin-manual/resource-admin/workload-group.md | 1 - .../admin-manual/resource-admin/workload-policy.md | 5 +- sidebars.json | 2 +- 18 files changed, 240 insertions(+), 272 deletions(-) diff --git a/docs/admin-manual/resource-admin/use-workload-cpu-limit.md b/docs/admin-manual/resource-admin/use-workload-cpu-limit.md index 99e8668198..b1eca6d96e 100644 --- a/docs/admin-manual/resource-admin/use-workload-cpu-limit.md +++ b/docs/admin-manual/resource-admin/use-workload-cpu-limit.md @@ -1,6 +1,6 @@ --- { -"title": "Use Workload CPU Limit", +"title": "Use Workload Group limit CPU", "language": "zh-CN" } --- @@ -24,8 +24,6 @@ specific language governing permissions and limitations under the License. 
--> -# Use Workload Group limit CPU - The workloads in Doris can generally be divided into three categories: Core Report Queries: These are typically used by company executives to view reports. The load may not be very high, but the availability requirements are high. These queries can be grouped together and assigned a higher-priority soft limit, ensuring they receive more CPU resources when CPU availability is limited. diff --git a/docs/admin-manual/resource-admin/use-workload-local-io-limit.md b/docs/admin-manual/resource-admin/use-workload-local-io-limit.md index 475ffd7584..44fc8c5e3d 100644 --- a/docs/admin-manual/resource-admin/use-workload-local-io-limit.md +++ b/docs/admin-manual/resource-admin/use-workload-local-io-limit.md @@ -1,6 +1,6 @@ --- { -"title": "Use Workload Local IO Limit", +"title": "Use Workload Group limit local IO", "language": "zh-CN" } --- @@ -24,8 +24,6 @@ specific language governing permissions and limitations under the License. --> -# Use Workload Group limit local IO - In OLAP systems, when performing ETL or large ad-hoc queries, a significant amount of data needs to be read. To speed up data analysis, Doris internally uses multithreading to scan multiple disk files in parallel, which generates a large amount of disk I/O and can negatively impact other queries, such as report analysis. By using Workload Groups, you can group offline ETL data processing and online report queries separately and limit the I/O bandwidth for offline data processing, thereby reducing its impact on online report analysis. 
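A minimal sketch of this grouping (the group name and bandwidth value are illustrative assumptions; per the Workload Group documentation, ```read_bytes_per_second``` caps local read I/O per data directory):

```
-- Hypothetical example: put offline ETL into its own group and cap its
-- local read I/O at 100 MB/s (104857600 bytes) per data directory.
create workload group etl_group properties('read_bytes_per_second'='104857600');
```

Online report queries left in the default group are then no longer starved of disk bandwidth by ETL scans.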
diff --git a/docs/admin-manual/resource-admin/use-workload-mem-limit.md b/docs/admin-manual/resource-admin/use-workload-mem-limit.md index 6ec6b70e79..418c5318fc 100644 --- a/docs/admin-manual/resource-admin/use-workload-mem-limit.md +++ b/docs/admin-manual/resource-admin/use-workload-mem-limit.md @@ -1,6 +1,6 @@ --- { -"title": "Use Workload Memory Limit", +"title": "Use Workload Group limit memory", "language": "zh-CN" } --- @@ -24,8 +24,6 @@ specific language governing permissions and limitations under the License. --> -# Use Workload Group limit memory - Ad-hoc queries typically have unpredictable SQL input and uncertain memory usage, which carries the risk of a few queries consuming a large amount of memory. This type of load can be assigned to a separate group. By using the hard memory limits feature of the Workload Group, sudden large queries can be prevented from consuming all available memory, which would otherwise leave no memory for other queries or cause an Out of Memory (OOM) error. When the memory usage of this Workload Group exceeds the configured hard limit, memory will be freed by killing queries, thus preventing the process memory from being completely consumed. diff --git a/docs/admin-manual/resource-admin/use-workload-remote-io-limit.md b/docs/admin-manual/resource-admin/use-workload-remote-io-limit.md index 4f5972b5c9..0c2d766dc2 100644 --- a/docs/admin-manual/resource-admin/use-workload-remote-io-limit.md +++ b/docs/admin-manual/resource-admin/use-workload-remote-io-limit.md @@ -1,6 +1,6 @@ --- { -"title": "Use Workload Remote IO Limit", +"title": "Use Workload Group limit remote IO", "language": "zh-CN" } --- @@ -24,8 +24,6 @@ specific language governing permissions and limitations under the License. --> -# Use Workload Group limit remote IO - BrokerLoad and S3Load are commonly used methods for importing large volumes of data. Users can first upload data to HDFS or S3 and then use BrokerLoad and S3Load for parallel data imports. 
To speed up the import process, Doris uses multithreading to pull data from HDFS/S3 in parallel. This can put significant pressure on HDFS/S3, potentially causing instability for other jobs running on HDFS/S3. You can use the remote I/O limitation feature of Workload Groups to limit the bandwidth used during the import process, thereby reducing the impact on other operations. diff --git a/docs/admin-manual/resource-admin/workload-analysis.md b/docs/admin-manual/resource-admin/workload-analysis.md new file mode 100644 index 0000000000..4a9baf614b --- /dev/null +++ b/docs/admin-manual/resource-admin/workload-analysis.md @@ -0,0 +1,172 @@ +--- +{ + "title": "Workload Analysis", + "language": "zh-CN" +} +--- + +<!-- +Licensed to the Apache Software Foundation (ASF) under one +or more contributor license agreements. See the NOTICE file +distributed with this work for additional information +regarding copyright ownership. The ASF licenses this file +to you under the Apache License, Version 2.0 (the +"License"); you may not use this file except in compliance +with the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, +software distributed under the License is distributed on an +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, either express or implied. See the License for the +specific language governing permissions and limitations +under the License. +--> + +Doris supports analyzing workloads in the cluster through the Workload system tables, addressing the following issues: +1. It helps users understand the resource utilization of each Workload Group, allowing for reasonable setting of resource limits to avoid waste. +2. 
When cluster availability decreases due to insufficient resources, the system tables can be used to quickly identify the current resource usage distribution, enabling the formulation of appropriate resource management decisions to restore cluster availability. + +## Workload Table +All tables are in the database ```information_schema```. +### active_queries +```active_queries``` records queries in FE: +* query_id, the query's ID. +* query_start_time, the time when the query starts executing; if the query was queued, it represents the time when execution begins after the queuing ends. +* query_time_ms, the duration of the query, measured in milliseconds. +* workload_group_id, the ID of the workload group used by the query. +* database, the database used by the SQL query. +* frontend_instance, the node name of the FE where the query is located. +* queue_start_time, if the query enters the queuing logic upon arrival, it represents the time point when the query started queuing. +* queue_end_time, if the query enters the queuing logic upon arrival, it represents the time point when the query finishes queuing. +* query_status, the current status of the query, which mainly has two values: RUNNING and QUEUED. RUNNING indicates that the query is currently running, while QUEUED means that the query is currently in the queue. +* sql, the SQL content. + +### backend_active_tasks +A query is typically divided into multiple fragments and executed on multiple BEs. The backend_active_tasks table represents the total amount of CPU and memory resources used by a query on a single BE. If the query has multiple concurrent fragments on a single BE, the data will be aggregated into a single row. +Detailed information about the fields is as follows: +* be_id, the backend's ID. +* fe_host, represents which FE the query was submitted from. +* query_id, the query's ID. +* task_time_ms, query running time in BE, measured in milliseconds.
+* task_cpu_time_ms, CPU time of the query while executing on the BE, measured in milliseconds. +* scan_rows, the number of rows scanned by the query on the current BE. If multiple tables are scanned, this value is the sum of rows scanned across all those tables. +* scan_bytes, the number of bytes scanned by the query on the current BE. If multiple tables are scanned, this value is the sum of bytes scanned across all those tables. +* be_peak_memory_bytes, the peak memory usage of the query on the current BE, measured in bytes. +* current_used_memory_bytes, the amount of memory currently used by the query on the current BE, measured in bytes. +* shuffle_send_bytes, the number of bytes sent by the query as a shuffle client on the current node. +* shuffle_send_rows, the number of rows sent by the query as a shuffle client on the current node. + +### workload_group_resource_usage +The ```workload_group_resource_usage``` table provides real-time information on the current resource usage of Workload Groups. +The field descriptions are as follows: +* be_id, the backend's ID. +* workload_group_id, the workload group's ID. +* memory_usage_bytes, the Workload Group's memory usage. +* cpu_usage_percent, the percentage of CPU usage by the Workload Group, calculated as the total CPU active time of the Workload Group in 1 second divided by the total available CPU time in 1 second. This value is the average over the last 10 seconds. +* local_scan_bytes_per_second, the number of bytes read per second by the Workload Group from local files. Note that due to Doris's Page Cache and operating system cache, this value is typically higher than what is monitored using system tools like pidstat. If multiple directories are configured, this value is the sum of I/O reads from all directories. For per-directory read I/O throughput, detailed data can be found in BE's bvar monitoring. +* remote_scan_bytes_per_second, the number of bytes read per second by the Workload Group from remote data.
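As a quick way to read these fields together (a sketch; the 80% threshold is an arbitrary illustration, not a recommended value), you can list the Workload Groups that are currently CPU-heavy on each BE:

```
select be_id, workload_group_id, cpu_usage_percent, memory_usage_bytes
from information_schema.workload_group_resource_usage
where cpu_usage_percent > 80
order by cpu_usage_percent desc;
```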
+ +## Workload Analysis and Handling Methods + +When monitoring indicates a decrease in cluster availability, you can follow these steps to address the issue: + +1. Identify the Bottleneck: First, use monitoring to roughly determine the current bottleneck in the cluster, such as high memory usage, high CPU usage, or high I/O. If all are high, prioritize resolving memory issues. +2. Examine Resource Usage: After identifying the cluster bottleneck, use the workload_group_resource_usage table to identify the Workload Groups with the highest resource usage. For example, if memory is the bottleneck, find the top N groups with the highest memory usage. +3. Reduce Query Concurrency: Once you have identified the groups with the highest resource usage, try reducing the query concurrency for these groups. Since cluster resources are already tight, avoid allowing new queries to exhaust the cluster's resources. +4. Degrade Queries: Apply degradation to the queries in the current group based on the type of bottleneck: +* For CPU Bottleneck: Set the group's CPU to a hard limit and adjust cpu_hard_limit to a lower value to proactively release CPU resources. +* For I/O Bottleneck: Limit the group's maximum I/O using the read_bytes_per_second parameter. +* For Memory Bottleneck: Set the group's memory to a hard limit and reduce the memory_limit value to free up some memory. Be aware that this may cause many queries in the group to fail. +5. Further Analysis: After completing the above steps, the cluster’s availability should improve. Conduct further analysis to determine whether the increase in resource usage is due to higher overall query concurrency in the group or specific large queries. If specific large queries are causing the issue, quickly killing these large queries can help recover the system. +6. Kill Abnormal Queries: Use backend_active_tasks in conjunction with active_queries to identify SQL queries with abnormal resource usage in the cluster. 
Then, use the KILL statement to terminate these queries and free up resources. + +## Commonly Used SQL +:::tip +Note that the active_queries table records queries running on the FE, while the backend_active_tasks table records queries running on the BE. Not all queries are registered with the FE during execution; for example, stream loads are not registered with the FE. Therefore, it is normal to get no matching results when performing a LEFT JOIN between backend_active_tasks and active_queries. + +When a query is a SELECT query, the queryId recorded in both active_queries and backend_active_tasks is the same. When a query is a stream load, the queryId in the active_queries table is empty, while the queryId in backend_active_tasks is the ID of the stream load. +::: + +1. View all current Workload Groups and display them in descending order of memory/CPU/I/O usage. +``` +select be_id,workload_group_id,memory_usage_bytes,cpu_usage_percent,local_scan_bytes_per_second + from workload_group_resource_usage +order by memory_usage_bytes desc, cpu_usage_percent desc, local_scan_bytes_per_second desc +``` + +2. CPU usage TopN SQL. + ``` + select + t1.query_id as be_query_id, + t1.query_type, + t2.query_id, + t2.workload_group_id, + t2.`database`, + t1.cpu_time, + t2.`sql` + from + (select query_id, query_type,sum(task_cpu_time_ms) as cpu_time from backend_active_tasks group by query_id, query_type) + t1 left join active_queries t2 + on t1.query_id = t2.query_id + order by cpu_time desc limit 10; + ``` + +3. Memory usage TopN SQL. + ``` + select + t1.query_id as be_query_id, + t1.query_type, + t2.query_id, + t2.workload_group_id, + t1.mem_used + from + (select query_id, query_type, sum(current_used_memory_bytes) as mem_used from backend_active_tasks group by query_id, query_type) + t1 left join active_queries t2 + on t1.query_id = t2.query_id + order by mem_used desc limit 10; + ``` + +4. Scan bytes/rows TopN SQL.
+ ``` + select + t1.query_id as be_query_id, + t1.query_type, + t2.query_id, + t2.workload_group_id, + t1.scan_rows, + t1.scan_bytes + from + (select query_id, query_type, sum(scan_rows) as scan_rows,sum(scan_bytes) as scan_bytes from backend_active_tasks group by query_id,query_type) + t1 left join active_queries t2 + on t1.query_id = t2.query_id + order by scan_rows desc,scan_bytes desc limit 10; + ``` + +5. Show workload group's scan rows/bytes. + ``` + select + t2.workload_group_id, + sum(t1.scan_rows) as wg_scan_rows, + sum(t1.scan_bytes) as wg_scan_bytes + from + (select query_id, sum(scan_rows) as scan_rows,sum(scan_bytes) as scan_bytes from backend_active_tasks group by query_id) + t1 left join active_queries t2 + on t1.query_id = t2.query_id + group by t2.workload_group_id + order by wg_scan_rows desc,wg_scan_bytes desc + ``` + +6. Show workload group's query queue details. + ``` + select + workload_group_id, + query_id, + query_status, + now() - queue_start_time as queued_time + from + active_queries + where query_status='QUEUED' + order by workload_group_id + ``` \ No newline at end of file diff --git a/docs/admin-manual/resource-admin/workload-fusing.md b/docs/admin-manual/resource-admin/workload-fusing.md index 62622235f9..a38f3136af 100644 --- a/docs/admin-manual/resource-admin/workload-fusing.md +++ b/docs/admin-manual/resource-admin/workload-fusing.md @@ -24,8 +24,6 @@ specific language governing permissions and limitations under the License. --> -# Workload Fusing - Workload Fusing is typically applied in scenarios where certain queries excessively consume resources, leading to a decline in the overall availability of the online cluster. In such cases, the cluster can be restored to normal operation by detecting and blocking the abnormal big queries during runtime.
If users have already grouped their workload based on query latency, such as assigning big queries to one Workload Group and small queries to another, the big query circuit breaking method can also be used to improve the availability of the small query group. diff --git a/docs/admin-manual/resource-admin/workload-group.md b/docs/admin-manual/resource-admin/workload-group.md index ac88a8f1b2..6b6d01ffbe 100644 --- a/docs/admin-manual/resource-admin/workload-group.md +++ b/docs/admin-manual/resource-admin/workload-group.md @@ -24,9 +24,13 @@ specific language governing permissions and limitations under the License. --> +You can use Workload Groups to manage the CPU, memory, and I/O resources used by queries and imports in the Doris cluster, and to control the maximum concurrency of queries in the cluster. Permissions for Workload Groups can be granted to specific roles and users. +Workload Groups are particularly effective in the following scenarios: +1. For scenarios where performance stability is preferred, and it is not required for query load to utilize all cluster resources, but stable query latency is desired. In such cases, you can set hard limits on CPU/I/O for Workload Groups. +2. When overall cluster load is high and availability decreases, you can restore cluster availability by degrading Workload Groups that consume excessive resources. For example, reduce the maximum query concurrency and I/O throughput for these Workload Groups. -The workload group can limit the use of compute and memory resources on a single be node for tasks within the group. Currently, query binding to workload groups is supported. +Using hard limits for resource management usually results in better stability and performance, such as configuring maximum concurrency for FE and setting hard limits on CPU. 
Soft limits on CPU typically only have an effect when the CPU is fully utilized, which can lead to increased latency due to resource contention with other Doris components (RPC) and the operating system. Configuring hard limits for Doris query loads can effectively mitigate this issue. Additionally, setting maximum c [...] ## Version Description Workload Group is a feature that has been supported since version 2.0. The main difference between version 2.0 and 2.1 is that the 2.0 version of Workload Group does not rely on CGroup, while the 2.1 version of Workload Group depends on CGroup. Therefore, when using the 2.1 version of Workload Group, the CGroup environment needs to be configured. @@ -57,6 +61,11 @@ Scenario 2: If the Workload Group is not used in version 2.0, it is also necessa * max_queue_size: Optional, the length of the query queue. When the queue is full, new queries will be rejected. The default value is 0, which means no queuing. * queue_timeout: Optional, the timeout for a query waiting in the queue, measured in milliseconds. If the query exceeds this value, an exception will be thrown directly to the client. The default value is 0, which means no queuing. * scan_thread_num: Optional, the number of threads used for scanning in the current workload group. The default value is -1, which means it does not take effect and the number of scan threads from the BE configuration prevails. The value is an integer greater than 0. +* max_remote_scan_thread_num: Optional. The maximum number of threads in the scan thread pool for reading external data sources. The default value is -1, which means the actual number of threads is determined by the BE and is typically related to the number of cores. +* min_remote_scan_thread_num: Optional. The minimum number of threads in the scan thread pool for reading external data sources. The default value is -1, which means the actual number of threads is determined by the BE and is typically related to the number of cores.
+* tag: Optional. Default is empty. Assigns a tag to the Workload Group. The sum of resources for Workload Groups with the same tag cannot exceed 100%. If multiple values are desired, they can be separated by commas. Detailed description of the tagging function will follow. +* read_bytes_per_second: Optional. Specifies the maximum I/O throughput when reading internal tables in Doris. The default value is -1, which means there is no limit on I/O bandwidth. Note that this value is not bound to disks but to folders. For example, if Doris is configured with two folders for storing internal table data, the maximum read I/O for each folder will not exceed this value. If these two folders are configured on the same disk, the maximum throughput control will be twice [...] +* remote_read_bytes_per_second: Optional. Specifies the maximum I/O throughput when reading external tables in Doris. The default value is -1, which means there is no limit on I/O bandwidth. Notes: @@ -64,6 +73,26 @@ Notes: 2 All properties are optional, but at least one property needs to be specified when creating a Workload Group. +## Grouping Workload Group By Tag +The Workload Group feature divides the resource usage of a single BE. When a user creates a Workload Group (Group A), its metadata is by default sent to all BEs and threads are started on each BE, leading to the following issues: +1. Multiple Clusters Issue: In a production environment, a Doris cluster is typically divided into several smaller clusters, such as a local storage cluster and a cluster with Compute Nodes for querying external storage. These two clusters operate independently. If a user wants to use the Workload Group feature, it would lead to the issue where the mem_limit of Workload Groups for external storage and local storage cannot exceed 100%, even though these two types of load are on completely [...] +2. Thread Resource Management: The number of threads itself is a resource.
If a process's thread quota is exhausted, it will cause the process to crash. Therefore, sending the Workload Group metadata to all nodes by default is also unreasonable. + +To address these issues, Doris implements a grouping feature for Workload Groups. The cumulative value of Workload Groups with the same tag cannot exceed 100%, but there can be multiple such tag groups within a cluster. When a BE node is tagged, it will match the corresponding Workload Groups based on specific rules. + +Example: +1. Create a Workload Group named tag_wg with the tag cn1. If none of the BEs in the cluster have been tagged, the metadata for this Workload Group will be sent to all BEs. The tag attribute can specify multiple values, separated by commas. +``` +create workload group tag_wg properties('tag'='cn1'); +``` +2. Modify the tag of a BE in the cluster to cn1. At this point, the tag_wg Workload Group will only be sent to this BE and any BE with no tag. The tag.workload_group attribute can specify multiple values, separated by commas. +``` +alter system modify backend "localhost:9050" set ("tag.workload_group" = "cn1"); +``` + +Workload Group and BE Matching Rules: +If the Workload Group's tag is empty, the Workload Group can be sent to all BEs, regardless of whether the BE has a tag or not. +If the Workload Group's tag is not empty, the Workload Group will only be sent to BEs with the same tag. ## Configure cgroup v1 @@ -99,6 +128,15 @@ doris_cgroup_cpu_path = /sys/fs/cgroup/cpu/doris It should be noted that the current workload group does not support the deployment of multiple BE on same machine. +## Note for Using Workload Groups in K8S +The CPU management for Workloads is implemented based on CGroup. To use Workload Groups within containers, you need to start the containers in privileged mode so that the Doris processes inside the container have permission to read and write CGroup files on the host. 
+ +When Doris runs inside a container, the CPU resources for Workload Groups are allocated based on the container's available resources. For example, if the host machine has 64 cores and the container is allocated 8 cores, with a CPU hard limit of 50% configured for the Workload Group, the actual number of usable cores for the Workload Group would be 4 (8 cores * 50%). + +Memory and I/O management for Workload Groups are handled internally by Doris and do not rely on external components, so there is no difference in deployment between containers and physical machines. + +If you want to use Doris on K8S, it is recommended to use the Doris Operator for deployment, as it can abstract away the underlying permission details. + +## Workload group usage 1. First, create a custom workload group. ``` @@ -198,7 +236,7 @@ ADMIN SET FRONTEND CONFIG ("enable_cpu_hard_limit" = "true"); If users expect to switch back from the CPU hard limit to the CPU soft limit, they only need to set ```enable_cpu_hard_limit=false```. The CPU Soft Limit property ```cpu_share``` will be filled with a valid value of 1024 by default (if the user has never set cpu_share before), and users can adjust cpu_share based on the priority of the Workload Group. -# Workload Group权限表 +# Workload Group Permissions Table You can view the Workload Groups that users or roles have access to through the Workload Group privilege table. Authorization related usage can refer to [grant statement](../../sql-manual/sql-statements/Account-Management-Statements/GRANT). This table currently has row-level permission control.
Root or admin accounts can view all data, while non root/admin accounts can only see data from Workload Groups that they have access to. diff --git a/docs/admin-manual/resource-admin/workload-policy.md b/docs/admin-manual/resource-admin/workload-policy.md index 3647aa8cab..659ef546e1 100644 --- a/docs/admin-manual/resource-admin/workload-policy.md +++ b/docs/admin-manual/resource-admin/workload-policy.md @@ -24,7 +24,6 @@ specific language governing permissions and limitations under the License. --> -## Backgroup The Workload Group solves the problem of isolation between different workloads, but it cannot solve the negative impact of large queries on stability within the same Group. When users encounter large queries that affect cluster stability, they can only manually handle them. Therefore, Doris has implemented Workload Policy, which supports the automation of query load management, such as automatically canceling queries with running time exceeding 5 seconds. diff --git a/docs/admin-manual/resource-admin/workload-system-table.md b/docs/admin-manual/resource-admin/workload-system-table.md deleted file mode 100644 index 3e8bd9e82e..0000000000 --- a/docs/admin-manual/resource-admin/workload-system-table.md +++ /dev/null @@ -1,230 +0,0 @@ ---- -{ -"title": "Workload System Table", -"language": "en" -} ---- - -<!-- -Licensed to the Apache Software Foundation (ASF) under one -or more contributor license agreements. See the NOTICE file -distributed with this work for additional information -regarding copyright ownership. The ASF licenses this file -to you under the Apache License, Version 2.0 (the -"License"); you may not use this file except in compliance -with the License. You may obtain a copy of the License at - - http://www.apache.org/licenses/LICENSE-2.0 - -Unless required by applicable law or agreed to in writing, -software distributed under the License is distributed on an -"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -KIND, either express or implied.
See the License for the -specific language governing permissions and limitations -under the License. ---> - -## Backgroup -Doris supports analyzing the resource usage of running workloads through the Workload system table, commonly used in the following scenarios: -1. View the resource usage of Workload Group in the cluster, including CPU and memory. -2. View the TopN SQL with the highest resource usage in the current cluster. -3. View the queue status of Workload Groups in the cluster - -Users can query this information by submitting SQL, identify the Workload Group or SQL with high resource usage in the system, and perform corresponding processing. - -## Workload System Table Introduction -At present, the system tables are in database ```information_schema```. -### active_queries -The ```active_queries``` table records the execution information of the current query on FE, and the detailed information of the fields is as follows: -* query_id, query's id -* query_start_time, the start time of the query execution; If query queues, it represents the time when execution starts after the queue ends -* query_time_ms, the query time, in milliseconds -* workload_group_id, workload group's id -* database, sql's database -* frontend_instance, the FE which query is submitted -* queue_start_time, if query queues, it means the time that query begins to queue -* queue_end_time, if query queues, it means the time that query ends to queue -* query_status, query status, it has two value RUNNING and QUEUED, RUNNIG means query is running; QUEUED means query is queued. -* sql, sql content - -### backend_active_tasks -A query is usually divided into multiple fragments to be executed on multiple BEs, and ```backend_active_tasks``` table represent the total amount of CPU and memory resources used by a query on a single BE. If this query has multiple concurrency and fragments on a single BE, it will also be summarized into one row of data. 
-The detailed information of the fields is as follows:
-* be_id, the BE's id
-* fe_host, the FE from which this query was submitted
-* query_id, the query's id
-* task_time_ms, query execution time on the current BE, in milliseconds
-* task_cpu_time_ms, query CPU time on the current BE, in milliseconds
-* scan_rows, the number of rows scanned on the current BE; if multiple tables are scanned, it is the cumulative value across those tables
-* scan_bytes, the number of bytes scanned on the current BE; if multiple tables are scanned, it is the cumulative value across those tables
-* be_peak_memory_bytes, the peak memory usage on the current BE, in bytes
-* current_used_memory_bytes, the amount of memory currently in use on the current BE, in bytes
-* shuffle_send_bytes, the number of bytes sent as a shuffle client on the BE
-* shuffle_send_rows, the number of rows sent as a shuffle client on the BE
-
-## Basic Usage
-1. Top-N SQL by resource usage
-```
-select
-    t2.query_id,
-    t2.workload_group_id,
-    t2.`database`,
-    t1.cpu_time,
-    t1.mem_used,
-    t2.`sql`
-from
-    (select query_id, sum(task_cpu_time_ms) as cpu_time, sum(current_used_memory_bytes) as mem_used from backend_active_tasks group by query_id)
-    t1 left join active_queries t2
-    on t1.query_id = t2.query_id
-order by cpu_time desc, mem_used desc limit 10;
-```
-
-2. Top-N SQL by resource usage on a single BE
-```
-select
-    t2.query_id,
-    t2.workload_group_id,
-    t2.`database`,
-    t1.cpu_time,
-    t1.mem_used,
-    t2.`sql`
-from
-    (select query_id, sum(task_cpu_time_ms) as cpu_time, sum(current_used_memory_bytes) as mem_used
-    from backend_active_tasks where be_id=12345 group by query_id)
-    t1 left join active_queries t2
-    on t1.query_id = t2.query_id
-order by cpu_time desc, mem_used desc limit 10;
-```
-
-3. Workload Group queue details
-```
-select
-    workload_group_id,
-    sum(case when query_status='QUEUED' then 1 else 0 end) as queue_num,
-    sum(case when query_status='RUNNING' then 1 else 0 end) as running_query_num
-from
-    active_queries
-group by workload_group_id;
-```
-
-4. Query queue time
-```
-select
-    workload_group_id,
-    query_id,
-    query_status,
-    now() - queue_start_time as queued_time
-from
-    active_queries
-where query_status='queued'
-order by workload_group_id;
-```
-
-## Application scenarios
-When cluster query latency rises and availability drops, overall cluster monitoring can identify the bottleneck:
-1. If BE CPU usage is saturated but memory usage is not high, the main bottleneck is likely the CPU.
-2. If both BE CPU and memory usage are high, the main bottleneck is likely memory.
-3. If BE CPU and memory usage are not high but IO usage is high, the main bottleneck is likely IO.
-4. If CPU/memory/IO usage is not high but many queries are queued, the queue parameters are likely misconfigured; try increasing the queue concurrency.
-
-After the cluster's bottleneck is identified, the workload system tables can be used to analyze which SQL is currently consuming the most resources, and those queries can then be throttled.
-
-### Find the SQL with the highest CPU usage
-1. Top-N SQL by CPU usage
-```
-select
-    t2.query_id,
-    t2.workload_group_id,
-    t2.`database`,
-    t1.cpu_time,
-    t2.`sql`
-from
-    (select query_id, sum(task_cpu_time_ms) as cpu_time from backend_active_tasks group by query_id)
-    t1 left join active_queries t2
-    on t1.query_id = t2.query_id
-order by cpu_time desc limit 10;
-```
-
-2. CPU time by workload group
-```
-select
-    t2.workload_group_id,
-    sum(t1.cpu_time) cpu_time
-from
-    (select query_id, sum(task_cpu_time_ms) as cpu_time from backend_active_tasks group by query_id)
-    t1 left join active_queries t2
-    on t1.query_id = t2.query_id
-group by workload_group_id order by cpu_time desc;
-```
-
-If a single SQL statement's CPU usage is too high, cancelling that query can bring relief.
-
-If the SQL statements with the longest CPU time come from the same workload group, CPU usage can be reduced by lowering that workload group's CPU priority or its number of scan threads.
-
-### Find the SQL with the highest memory usage
-1. Top-N SQL by memory usage
-```
-select
-    t2.query_id,
-    t2.workload_group_id,
-    t1.mem_used
-from
-    (select query_id, sum(current_used_memory_bytes) as mem_used from backend_active_tasks group by query_id)
-    t1 left join active_queries t2
-    on t1.query_id = t2.query_id
-order by mem_used desc limit 10;
-```
-
-2. Memory usage by workload group
-```
-select
-    t2.workload_group_id,
-    sum(t1.mem_used) wg_mem_used
-from
-    (select query_id, sum(current_used_memory_bytes) as mem_used from backend_active_tasks group by query_id)
-    t1 left join active_queries t2
-    on t1.query_id = t2.query_id
-group by t2.workload_group_id order by wg_mem_used desc;
-```
-
-If a single query occupies most of the memory, killing that query is the quickest way to free memory.
-
-If a lower-priority workload group is using more memory, memory can be released by downgrading that workload group:
-1. If its memory limit is configured as a soft limit, change it to a hard limit and reduce the workload group's memory limit.
-2. Reduce the workload group's query concurrency through its queuing feature.
-
-### Find queries that scan too much data
-At present, Doris does not directly collect per-query disk IO metrics, but SQL with heavy scans can be found indirectly through the number of scanned rows and scanned bytes.
-
-1. Top-N queries by scanned data
-```
-select
-    t2.query_id,
-    t2.workload_group_id,
-    t1.scan_rows,
-    t1.scan_bytes
-from
-    (select query_id, sum(scan_rows) as scan_rows, sum(scan_bytes) as scan_bytes from backend_active_tasks group by query_id)
-    t1 left join active_queries t2
-    on t1.query_id = t2.query_id
-order by scan_rows desc, scan_bytes desc limit 10;
-```
-
-2. Scanned data by workload group
-```
-select
-    t2.workload_group_id,
-    sum(t1.scan_rows) as wg_scan_rows,
-    sum(t1.scan_bytes) as wg_scan_bytes
-from
-    (select query_id, sum(scan_rows) as scan_rows, sum(scan_bytes) as scan_bytes from backend_active_tasks group by query_id)
-    t1 left join active_queries t2
-    on t1.query_id = t2.query_id
-group by t2.workload_group_id
-order by wg_scan_rows desc, wg_scan_bytes desc;
-```
-
-If a single SQL statement scans a large volume of data, killing that query can be tried to see whether it brings relief.
-
-If a certain workload group scans a large volume of data, IO can be reduced by lowering that workload group's number of scan threads.
\ No newline at end of file
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/resource-admin/use-workload-cpu-limit.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/resource-admin/use-workload-cpu-limit.md
index 180165c56b..6e09633734 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/resource-admin/use-workload-cpu-limit.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/resource-admin/use-workload-cpu-limit.md
@@ -1,6 +1,6 @@
 ---
 {
-"title": "Use Workload CPU Limit",
+"title": "使用Workload Group管理CPU资源",
 "language": "zh-CN"
 }
 ---
@@ -24,8 +24,6 @@ specific language governing permissions and limitations under the License.
 -->
 
-# 使用Workload Group管理CPU资源
-
 Doris 的负载大体可以分为三类:
 1. 核心报表查询,通常给公司高层查看报表使用,负载不一定很高,但是对可用性要求较高,这类查询可以划分到一个分组,配置较高优先级的软限,保证CPU资源不够时可以获得更多的CPU资源。
 2. Adhoc类查询,这类查询通常偏探索分析,SQL比较随机,具体的资源用量也比较未知,优先级通常不高。因此可以使用CPU硬限进行管理,并配置较低的值,避免占用过多CPU资源降低集群可用性。
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/resource-admin/use-workload-local-io-limit.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/resource-admin/use-workload-local-io-limit.md
index 0f12a7958a..d523f30bbd 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/resource-admin/use-workload-local-io-limit.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/resource-admin/use-workload-local-io-limit.md
@@ -1,6 +1,6 @@
 ---
 {
-"title": "Use Workload Local IO Limit",
+"title": "使用Workload Group管理本地IO",
 "language": "zh-CN"
 }
 ---
@@ -24,8 +24,6 @@ specific language governing permissions and limitations under the License.
 -->
 
-# 使用Workload Group管理本地IO
-
 OLAP 系统在做ETL或者大的Adhoc 查询时,需要读取大量的数据,Doris 为了加速数据分析过程,内部会使用多线程并行的方式对多个磁盘文件扫描,会产生巨大的磁盘IO,就会对其他的查询(比如报表分析)产生影响。
 可以通过Workload Group 对离线的ETL数据处理和在线的报表查询做分组,限制离线数据处理IO带宽的方式,降低它对在线报表分析的影响。
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/resource-admin/use-workload-mem-limit.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/resource-admin/use-workload-mem-limit.md
index c313223c83..af86547dac 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/resource-admin/use-workload-mem-limit.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/resource-admin/use-workload-mem-limit.md
@@ -1,6 +1,6 @@
 ---
 {
-"title": "Use Workload Memory Limit",
+"title": "使用Workload Group管理内存资源",
 "language": "zh-CN"
 }
 ---
@@ -24,8 +24,6 @@ specific language governing permissions and limitations under the License.
 -->
 
-# 使用Workload Group管理内存资源
-
 Adhoc类查询通常输入的SQL不确定,使用的内存资源也不确定,因此存在少数查询占用很大内存的风险。
 可以对这类负载划分到独立的分组,通过Workload Group对内存的硬限的功能,避免突发性的大查询占满所有内存,导致其他查询没有可用内存或者OOM。
 当这个Workload Group的内存使用超过配置的硬限值时,会通过杀死查询的方式释放内存,避免进程内存被打满。
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/resource-admin/use-workload-remote-io-limit.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/resource-admin/use-workload-remote-io-limit.md
index e2bb996794..a57990c410 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/resource-admin/use-workload-remote-io-limit.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/resource-admin/use-workload-remote-io-limit.md
@@ -1,6 +1,6 @@
 ---
 {
-"title": "Use Workload Remote IO Limit",
+"title": "使用Workload Group管理远程IO",
 "language": "zh-CN"
 }
 ---
@@ -24,8 +24,6 @@ specific language governing permissions and limitations under the License.
 -->
 
-# 使用Workload Group管理远程IO
-
 BrokerLoad和S3Load 是常用的大批量数据导入方式,用户可以把数据先上传到HDFS或者S3,然后通过Brokerload和S3Load 对数据进行并行导入。
 Doris为了加快导入速度,会使用多线程并行的方式从HDFS/S3拉取数据,此时会对HDFS/S3 产生巨大的压力,会导致HDFS/S3上运行的别的作业不稳定。
 可以通过Workload Group 远程IO的限制功能来限制导入过程中对HDFS/S3的带宽,降低对其他业务的影响。
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/resource-admin/workload-analysis.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/resource-admin/workload-analysis.md
index fd90fea7b0..7888a10243 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/resource-admin/workload-analysis.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/resource-admin/workload-analysis.md
@@ -24,7 +24,6 @@ specific language governing permissions and limitations under the License.
 -->
 
-## 背景
 Doris支持通过Workload系统表对集群中的工作负载进行分析,可以解决以下问题:
 1. 帮助用户了解每个Workload Group的资源利用率,合理的设置资源上限,避免资源的浪费。
 2. 当集群由于资源不足导致可用性下降时,可以使用系统表快速定位出目前资源使用的分布情况,从而做出相应的资源管控决策,恢复集群的可用性。
@@ -85,6 +84,12 @@
 6. 可以使用```backend_active_tasks```结合```active_queries```找出目前集群中资源用量比较异常的SQL,然后通过kill语句杀死这些SQL释放资源。
 
 ## 常用SQL
+:::tip
+需要注意的是,active_queries表记录了在FE运行的query,backend_active_tasks记录了在BE运行的query,并非所有query运行时在FE注册,比如stream load就并未在FE注册。
+因此使用backend_active_tasks表left join active_queries如果没有匹配的结果属于正常情况。
+当一个Query是select查询时,那么active_queries和backend_active_tasks中记录的queryId是一致的。当一个Query是stream load时,那么在active_queries表中的queryId为空,backend_active_tasks的queryId是该stream load的Id。
+:::
+
 1. 查看目前所有的Workload Group并依次按照内存/CPU/IO降序显示。
 ```
 select be_id,workload_group_id,memory_usage_bytes,cpu_usage_percent,local_scan_bytes_per_second
@@ -95,13 +100,15 @@ order by memory_usage_bytes,cpu_usage_percent,local_scan_bytes_per_second desc
 2. CPU使用topN的sql
 ```
 select
+    t1.query_id as be_query_id,
+    t1.query_type,
     t2.query_id,
     t2.workload_group_id,
     t2.`database`,
     t1.cpu_time,
     t2.`sql`
 from
-    (select query_id, sum(task_cpu_time_ms) as cpu_time from backend_active_tasks group by query_id)
+    (select query_id, query_type, sum(task_cpu_time_ms) as cpu_time from backend_active_tasks group by query_id, query_type)
     t1 left join active_queries t2
     on t1.query_id = t2.query_id
 order by cpu_time desc limit 10;
@@ -110,11 +117,13 @@
 3. 内存使用topN的sql
 ```
 select
+    t1.query_id as be_query_id,
+    t1.query_type,
     t2.query_id,
     t2.workload_group_id,
     t1.mem_used
 from
-    (select query_id, sum(current_used_memory_bytes) as mem_used from backend_active_tasks group by query_id)
+    (select query_id, query_type, sum(current_used_memory_bytes) as mem_used from backend_active_tasks group by query_id, query_type)
     t1 left join active_queries t2
     on t1.query_id = t2.query_id
 order by mem_used desc limit 10;
@@ -123,12 +132,14 @@
 4. 扫描数据量topN的sql
 ```
 select
+    t1.query_id as be_query_id,
+    t1.query_type,
     t2.query_id,
     t2.workload_group_id,
     t1.scan_rows,
     t1.scan_bytes
 from
-    (select query_id, sum(scan_rows) as scan_rows, sum(scan_bytes) as scan_bytes from backend_active_tasks group by query_id)
+    (select query_id, query_type, sum(scan_rows) as scan_rows, sum(scan_bytes) as scan_bytes from backend_active_tasks group by query_id, query_type)
     t1 left join active_queries t2
     on t1.query_id = t2.query_id
 order by scan_rows desc, scan_bytes desc limit 10;
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/resource-admin/workload-fusing.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/resource-admin/workload-fusing.md
index d8cd0b8a7a..6a8b06a295 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/resource-admin/workload-fusing.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/resource-admin/workload-fusing.md
@@ -1,7 +1,7 @@
 ---
 {
-"title": "Workload Fusing",
-"language": "zh-CN"
+  "title": "大查询熔断",
+  "language": "zh-CN"
 }
 ---
@@ -24,8 +24,6 @@ specific language governing permissions and limitations under the License.
 -->
 
-# 大查询熔断
-
 大查询熔断通常适用于线上集群经常出现个别占用资源过多的查询导致集群整体可用性下降的场景,此时可以通过在运行时检测异常的大查询,并且阻止这个查询运行的方式使得集群恢复正常。
 如果用户已经根据查询延迟对负载进行了分组,比如大查询划分到一个Workload Group,小查询划分到一个Workload Group,那么也可以通过大查询熔断的方式提高小查询分组的可用性。
 目前的大查询熔断策略主要是通过Workload Policy功能实现的,用户可以通过配置特定的策略实现大查询的自动熔断。
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/resource-admin/workload-group.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/resource-admin/workload-group.md
index 7b83354794..5b9912c431 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/resource-admin/workload-group.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/resource-admin/workload-group.md
@@ -24,7 +24,6 @@ specific language governing permissions and limitations under the License.
 -->
 
-# WORKLOAD GROUP
 你可以使用Workload Group管理Doris集群中查询和导入负载所使用的CPU/内存/IO资源用量,控制集群中查询的最大并发。Workload Group的使用权限可以授予给特定的角色和用户。
 
 在以下场景使用Workload Group通常会取得不错的效果:
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/resource-admin/workload-policy.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/resource-admin/workload-policy.md
index 86136a7cc6..b506b1f81b 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/resource-admin/workload-policy.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/resource-admin/workload-policy.md
@@ -1,7 +1,7 @@
 ---
 {
-"title": "Workload Policy",
-"language": "zh-CN"
+  "title": "Workload Policy",
+  "language": "zh-CN"
 }
 ---
@@ -24,7 +24,6 @@ specific language governing permissions and limitations under the License.
 -->
 
-## 功能背景
 Workload Group功能解决了不同负载间的隔离问题,但无法解决同一个Group内的大查询熔断的问题,用户遇到大查询影响集群稳定性时只能手动处理。
 因此Doris实现了Workload Policy的功能,支持查询负载管理的自动化,比如实现自动取消运行时间超过5s的查询这样的功能。
diff --git a/sidebars.json b/sidebars.json
index 1cc877ced4..6b88c99017 100644
--- a/sidebars.json
+++ b/sidebars.json
@@ -394,7 +394,7 @@
                 "items": [
                     "admin-manual/resource-admin/workload-group",
                     "admin-manual/resource-admin/workload-policy",
-                    "admin-manual/resource-admin/workload-system-table",
+                    "admin-manual/resource-admin/workload-analysis",
                     "admin-manual/resource-admin/multi-tenant",
                     "admin-manual/resource-admin/workload-fusing",
                     "admin-manual/resource-admin/use-workload-mem-limit",
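
The remediation flow that the documentation in this diff repeats (find an abnormal query via the system tables, cancel it, then throttle its workload group) can be sketched as follows. This is an illustrative sketch only: the group name `g1` and the query id are hypothetical, and the `KILL QUERY` syntax and property names (`scan_thread_num`, `cpu_share`) should be checked against the Workload Group reference for your Doris version.

```sql
-- Hypothetical query id obtained from the top-N CPU query over
-- backend_active_tasks joined with active_queries; not a real id.
KILL QUERY "490b6a7d0ba14a92-8f9e1c2d3b4a5f6d";

-- Then lower the offending group's scan-thread count and CPU weight
-- to shrink its ongoing CPU/IO footprint (property names assumed from
-- the workload-group documentation).
ALTER WORKLOAD GROUP g1 PROPERTIES (
    "scan_thread_num" = "8",
    "cpu_share" = "512"
);
```

Killing the query frees resources immediately; the `ALTER` prevents the same group from saturating the cluster again.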
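
Where the docs above suggest "reducing the workload group's query concurrency through its queuing feature", a minimal sketch might look like the statement below, assuming the queue-related properties (`max_concurrency`, `max_queue_size`, `queue_timeout`) described in the Workload Group documentation; verify names and units against your Doris version.

```sql
-- Allow at most 10 concurrent queries in group g1 (hypothetical name);
-- up to 20 more queries may wait in the queue, each for at most
-- 30000 ms before being rejected.
ALTER WORKLOAD GROUP g1 PROPERTIES (
    "max_concurrency" = "10",
    "max_queue_size" = "20",
    "queue_timeout" = "30000"
);
```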