This is an automated email from the ASF dual-hosted git repository.

kassiez pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/doris-website.git

The following commit(s) were added to refs/heads/master by this push:
     new c93755ebff9 [cloud] update compute group doc (#1508)
c93755ebff9 is described below

commit c93755ebff976c004c97a9541aaa700207ae4de9
Author: deardeng <565620...@qq.com>
AuthorDate: Fri Dec 27 10:14:38 2024 +0800

    [cloud] update compute group doc (#1508)

    ## Versions

    - [ ] dev
    - [ ] 3.0
    - [ ] 2.1
    - [ ] 2.0

    ## Languages

    - [ ] Chinese
    - [ ] English

    ## Docs Checklist

    - [ ] Checked by AI
    - [ ] Test Cases Built
---
 .../managing-compute-cluster.md                    | 51 ++++++++++++++------
 docs/compute-storage-decoupled/overview.md         | 13 +-----
 .../managing-compute-cluster.md                    | 52 ++++++++++++++-------
 .../current/compute-storage-decoupled/overview.md  | 12 +-----
 .../managing-compute-cluster.md                    | 54 +++++++++++++++-------
 .../compute-storage-decoupled/overview.md          | 14 +-----
 .../managing-compute-cluster.md                    | 52 +++++++++++++++------
 .../compute-storage-decoupled/overview.md          | 13 +-----
 8 files changed, 151 insertions(+), 110 deletions(-)

diff --git a/docs/compute-storage-decoupled/managing-compute-cluster.md b/docs/compute-storage-decoupled/managing-compute-cluster.md
index 85ed98d61f7..001db65d431 100644
--- a/docs/compute-storage-decoupled/managing-compute-cluster.md
+++ b/docs/compute-storage-decoupled/managing-compute-cluster.md
@@ -33,9 +33,38 @@ In a compute-storage decoupled architecture, one or more compute nodes (BE) can
 *Note* In versions prior to 3.0.2, this was referred to as a Compute Cluster.
+## Compute Group Usage Scenarios
+
+In a multi-compute group architecture, you can group one or more stateless BE nodes into compute clusters. By using compute cluster specification statements (use @<compute_group_name>), you can allocate specific workloads to specific compute clusters, achieving physical isolation of multiple import and query workloads.
+
+Assume there are two compute clusters: C1 and C2.
+
+- **Read-Read Isolation**: Before initiating two large queries, use `use @c1` and `use @c2` respectively to ensure that the queries run on different compute nodes. This prevents resource contention (CPU, memory, etc.) when accessing the same dataset.
+
+- **Read-Write Isolation**: Doris data imports consume substantial resources, especially in scenarios with large data volumes and high-frequency imports. To avoid resource contention between queries and imports, you can use `use @c1` and `use @c2` to specify that queries execute on C1 and imports on C2. Additionally, the C1 compute cluster can access newly imported data in the C2 compute cluster.
+
+- **Write-Write Isolation**: Similar to read-write isolation, imports can also be isolated from each other. For example, when the system has both high-frequency small imports and large batch imports, batch imports typically take longer and have higher retry costs, while high-frequency small imports are quick with lower retry costs. To prevent small imports from interfering with batch imports, you can use `use @c1` and `use @c2` to specify small imports to execute on C1 and batch imports on C2.
+
+## Default Compute Group Selection Mechanism
+
+When a user has not explicitly [set a default compute group](#setting-default-compute-group), the system will automatically select a compute group with Active BE that the user has usage permissions for. Once the default compute group is determined in a specific session, it will remain unchanged during that session unless the user explicitly changes the default setting.
+
+In different sessions, if the following situations occur, the system may automatically change the user's default compute group:
+
+- The user has lost usage permissions for the default compute group selected in the last session
+- A compute group has been added or removed
+- The previously selected default compute group no longer has Alive BE
+
+Situations one and two will definitely lead to a change in the automatically selected default compute group, while situation three may lead to a change.
+
 ## Viewing All Compute Groups
-You can view all compute groups owned by the current repository using `SHOW COMPUTE GROUPS`.
+Use the `SHOW COMPUTE GROUPS` command to view all compute groups in the current repository. The returned results will display different content based on the user's permission level:
+
+- Users with `ADMIN` privileges can view all compute groups
+- Regular users can only view compute groups for which they have usage permissions (USAGE_PRIV)
+- If a user doesn't have usage permissions for any compute groups, an empty result will be returned
+
 ```sql
 SHOW COMPUTE GROUPS;
@@ -43,7 +72,8 @@ SHOW COMPUTE GROUPS;
 ## Adding Compute Groups
-Using [Add BE ](../sql-manual/sql-statements/Cluster-Management-Statements/ALTER-SYSTEM-ADD-BACKEND.md) to add a BE into a compute group, for example:
+Managing compute groups requires `OPERATOR` privilege, which controls node management permissions. For more details, please refer to [Privilege Management](../sql-manual/sql-statements/Account-Management-Statements/GRANT.md). By default, only the root account has the `OPERATOR` privilege, but it can be granted to other accounts using the `GRANT` command.
+To add a BE and assign it to a compute group, use the [Add BE](../sql-manual/sql-statements/Cluster-Management-Statements/ALTER-SYSTEM-ADD-BACKEND.md) command. For example:
 ```sql
 ALTER SYSTEM ADD BACKEND 'host:9050' PROPERTIES ("tag.compute_group_name" = "new_group");
@@ -56,12 +86,14 @@ ALTER SYSTEM ADD BACKEND 'host:9050';
 ```
 ## Granting Compute Group Access
+Prerequisite: The current operating user has' ADMIN 'permission, or the current user belongs to the admin role.
 ```sql
 GRANT USAGE_PRIV ON COMPUTE GROUP {compute_group_name} TO {user}
 ```
 ## Revoking Compute Group Access
+Prerequisite: The current operating user has' ADMIN 'permission, or the current user belongs to the admin role.
 ```sql
 REVOKE USAGE_PRIV ON COMPUTE GROUP {compute_group_name} FROM {user}
@@ -69,7 +101,7 @@ REVOKE USAGE_PRIV ON COMPUTE GROUP {compute_group_name} FROM {user}
 ## Setting Default Compute Group
-To set the default compute group for the current user:
+To set the default compute group for the current user(This operation requires the current user to already have permission to use the computing group):
 ```sql
 SET PROPERTY 'default_compute_group' = '{clusterName}';
@@ -87,7 +119,7 @@ To view the current user's default compute group, the value of `default_compute_
 SHOW PROPERTY;
 ```
-To view the default compute group of other users, this operation requires the current user to have relevant permissions, and the value of `default_compute_group` in the returned result is the default compute group:
+To view the default compute group of other users, This operation requires the current user to have admin privileges, and the value of `default_compute_group` in the returned result is the default compute group:
 ```sql
 SHOW PROPERTY FOR {user};
@@ -113,17 +145,6 @@ SHOW COMPUTE GROUPS;
 :::
-## Default Compute Group Selection Mechanism
-
-When a user has not explicitly set a default compute group, the system will automatically select a compute group with Active BE that the user has usage permissions for. Once the default compute group is determined in a specific session, it will remain unchanged during that session unless the user explicitly changes the default setting.
-
-In different sessions, if the following situations occur, the system may automatically change the user's default compute group:
-
-- The user has lost usage permissions for the default compute group selected in the last session
-- A compute group has been added or removed
-- The previously selected default compute group no longer has Active BE
-
-Situations one and two will definitely lead to a change in the automatically selected default compute group, while situation three may lead to a change.
 ## Switching Compute Groups

diff --git a/docs/compute-storage-decoupled/overview.md b/docs/compute-storage-decoupled/overview.md
index eeab2319a00..0d4848165ad 100644
--- a/docs/compute-storage-decoupled/overview.md
+++ b/docs/compute-storage-decoupled/overview.md
@@ -93,16 +93,5 @@ The shared storage layer stores the data files, including segment files and the
 - When you have already adopted public cloud services;
 - When you have reliable shared storage systems, such as HDFS, Ceph, and object storage;
 - When you require high elastic scalability, Kubernetes containerization, or to run on a private cloud;
+- High throughput shared storage capability, allowing multiple computing groups to share data
 - When you have a dedicated team responsible for maintaining the company's entire data warehouse platform.
-
-## Workload isolation across compute clusters
-
-As mentioned earlier, a compute cluster is formed by one or more stateless BE nodes. By using the compute cluster specification statement (`use @<compute_group_name>`), you can direct specific workloads to specific compute clusters, thus realizing physical isolation of data import and query workloads.
-
-Assuming there are 2 compute clusters: C1 and C2.
-
-**Read isolation**: Before initiating two large queries, you can leverage `use @c1` and `use @c2` respectively to make the two queries run on different compute nodes. In this way, the two queries will not interfere with each other due to competition for CPU and memory resources when accessing the same dataset.
-
-**Read-write isolation**: Data imports can consume resources, especially with large data volumes and high import frequency. To avoid resource contention between queries and imports, you can specify query requests to run on C1 and import requests to run on C2 using `use @c1` and `use @c2`. Meanwhile, the `c1` compute cluster can access the newly imported data in the `c2` compute cluster.
-
-**Write-write isolation**: Data import tasks can also be isolated from each other. In some cases, the system handles both high-frequency small imports and large-scale batch imports. The batch imports often take longer and have higher retry costs, while the high-frequency small imports are the opposite. To avoid small imports interfering with batch imports, you can direct the small imports to run on `c1` and the batch imports to run on `c2` via `use @c1` and `use @c2`.

diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/compute-storage-decoupled/managing-compute-cluster.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/compute-storage-decoupled/managing-compute-cluster.md
index 1361a80b43a..fe818f05616 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/compute-storage-decoupled/managing-compute-cluster.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/compute-storage-decoupled/managing-compute-cluster.md
@@ -33,9 +33,38 @@ under the License.
 *注意* 3.0.2 之前的版本中叫做计算集群(Compute Cluster)。
+## 计算组使用场景
+
+在多计算组的架构下,可以通过将一个或多个无状态的 BE 节点组成计算集群,利用计算集群指定语句 (use @<compute_group_name>) 将特定负载分配到特定的计算集群中,从而实现多导入和查询负载的物理隔离。
+
+假设当前有两个计算集群:C1 和 C2。
+
+- **读读隔离**:在发起两个大型查询之前,分别使用 `use @c1` 和 `use @c2`,确保两个查询在不同的计算节点上运行,从而避免在访问相同数据集时因 CPU 和内存等资源竞争而相互干扰。
+
+- **读写隔离**:Doris 的数据导入会消耗大量资源,尤其是在大数据量和高频导入的场景中。为了避免查询和导入之间的资源竞争,可以通过 `use @c1` 和 `use @c2` 指定查询在 C1 上执行,导入在 C2 上执行。同时,C1 计算集群可以访问 C2 计算集群中新导入的数据。
+
+- **写写隔离**:与读写隔离类似,导入之间也可以进行隔离。例如,当系统中存在高频小量导入和大批量导入时,批量导入通常耗时较长且重试成本高,而高频小量导入耗时短且重试成本低。为了避免小量导入对批量导入的干扰,可以通过 `use @c1` 和 `use @c2`,将小量导入指定到 C1 上执行,批量导入指定到 C2 上执行。
+
+
+## 默认计算组的选择机制
+
+当用户未明确[设置默认计算组](#设置默认计算组)时,系统将自动为用户选择一个具有存活计算节点且用户具有使用权限的计算组。在特定会话中确定默认计算组后,默认计算组将在该会话期间保持不变,除非用户显式更改了默认设置。
+
+在不同次的会话中,若发生以下情况,系统可能会自动更改用户的默认计算组:
+
+- 用户失去了在上次会话中所选择默认计算组的使用权限
+- 有计算组被添加或移除
+- 上次所选择的默认计算组不再具有存活计算节点
+
+其中,情况一和情况二必定会导致系统自动选择的默认计算组更改,情况三可能会导致更改。
+
 ## 查看所有计算组
-可通过 `SHOW COMPUTE GROUPS` 查看当前仓库拥有的所有计算组。
+使用 `SHOW COMPUTE GROUPS` 命令可以查看当前仓库中的所有计算组。返回结果会根据用户权限级别显示不同内容:
+
+- 具有 `ADMIN` 权限的用户可以查看所有计算组
+- 普通用户只能查看其拥有使用权限(USAGE_PRIV)的计算组
+- 如果用户没有任何计算组的使用权限,则返回结果为空
 ```sql
 SHOW COMPUTE GROUPS;
@@ -43,7 +72,8 @@ SHOW COMPUTE GROUPS;
 ## 添加计算组
-使用[Add BE ](../sql-manual/sql-statements/Cluster-Management-Statements/ALTER-SYSTEM-ADD-BACKEND.md)命令添加 BE 并为 BE 指定计算组,示例:
+操作计算组需要具备 `OPERATOR` 权限,即节点管理权限。有关详细信息,请参阅[权限管理](../sql-manual/sql-statements/Account-Management-Statements/GRANT.md)。默认情况下,只有 root 账号拥有 `OPERATOR` 权限,但可以通过 `GRANT` 命令将此权限授予其他账号。
+要添加 BE 并为其指定计算组,请使用 [Add BE](../sql-manual/sql-statements/Cluster-Management-Statements/ALTER-SYSTEM-ADD-BACKEND.md) 命令。例如:
 ```sql
 ALTER SYSTEM ADD BACKEND 'host:9050' PROPERTIES ("tag.compute_group_name" = "new_group");
@@ -57,19 +87,20 @@ ALTER SYSTEM ADD BACKEND 'host:9050';
 ## 授予计算组访问权限
+前置条件:当前操作用户具备 `ADMIN` 权限,或者当前用户属于admin role。
 ```sql
 GRANT USAGE_PRIV ON COMPUTE GROUP {compute_group_name} TO {user}
 ```
 ## 撤销计算组访问权限
-
+前置条件:当前操作用户具备 `ADMIN` 权限,或者当前用户属于admin role。
 ```sql
 REVOKE USAGE_PRIV ON COMPUTE GROUP {compute_group_name} FROM {user}
 ```
 ## 设置默认计算组
-为当前用户设置默认计算组:
+为当前用户设置默认计算组(此操作需要当前用户已经拥有计算组的使用权限):
 ```sql
 SET PROPERTY 'default_compute_group' = '{clusterName}';
@@ -87,7 +118,7 @@ SET PROPERTY FOR {user} 'default_compute_group' = '{clusterName}';
 SHOW PROPERTY;
 ```
-查看其他用户默认计算组,此操作需要当前用户具备相关权限,返回结果中`default_compute_group` 的值即为默认计算组:
+查看其他用户默认计算组,此操作需要当前用户具备admin权限,返回结果中`default_compute_group` 的值即为默认计算组:
 ```sql
 SHOW PROPERTY FOR {user};
@@ -113,17 +144,6 @@ SHOW COMPUTE GROUPS;
 :::
-## 默认计算组的选择机制
-
-当用户未明确设置默认计算组时,系统将自动为用户选择一个具有 Active BE 且用户具有使用权限的计算组。在特定会话中确定默认计算组后,默认计算组将在该会话期间保持不变,除非用户显式更改了默认设置。
-
-在不同次的会话中,若发生以下情况,系统可能会自动更改用户的默认计算组:
-
-- 用户失去了在上次会话中所选择默认计算组的使用权限
-- 有计算组被添加或移除
-- 上次所选择的默认计算组不再具有 Active BE
-
-其中,情况一和情况二必定会导致系统自动选择的默认计算组更改,情况三可能会导致更改。
 ## 切换计算组

diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/compute-storage-decoupled/overview.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/compute-storage-decoupled/overview.md
index 11ecb9b888b..8b676ff1e68 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/compute-storage-decoupled/overview.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/compute-storage-decoupled/overview.md
@@ -83,17 +83,7 @@ Meta Service 是 Doris 存算分离元数据服务,主要负责处理导入事
 - 已使用公有云服务
 - 具备可靠的共享存储系统,比如 HDFS、Ceph、对象存储等
+- 高吞吐的共享存储能力,多计算组共享一份数据
 - 需要极致的弹性扩缩容,需要 K8S 容器化,需要运行在私有云上
 - 有专职团队维护整个公司的数据仓库平台
-## 基于存算分离实现多计算组工作负载隔离
-
-如前所述,一个或多个无状态的 BE 节点可以组成计算组,可以运用计算组指定语句 (`use @<compute_group_name>`) 将特定负载指定到特定的计算组中,从而实现多导入以及查询负载的物理隔离。
-
-假设当前存在 2 个计算组:C1 与 C2。
-
-**读读隔离**:两个(类)大查询发起之前分别通过 `use @c1`,`use @c2`实现两个查询使用不同的计算节点运行,使两个查询在访问相同数据集时,不会因 CPU 和内存等资源的竞争而相互干扰。
-
-**读写隔离**:Doris 的导入会消耗资源,特别是在大数据量和高频导入场景。为了避免查询和导入之间的资源竞争,可以通过 `use @c1`,`use @c2`指定查询请求在 C1 上执行,导入请求在 C2 上执行。同时,`c1`计算组可以访问`c2`计算组中新导入的数据。
-
-**写写隔离**:与读写隔离同理,导入和导入之间同样可以进行隔离。例如,当系统中存在高频小量导入和大批量导入时,批量导入往往耗时长,重试成本高,而高频小量导入单次耗时短,重试成本低,为了避免小量导入对批量导入造成干扰,可以通过`use @c1`,`use @c2`,将小量导入指定到 `c1` 上执行,批量导入指定到 `c2` 上执行。

diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/compute-storage-decoupled/managing-compute-cluster.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/compute-storage-decoupled/managing-compute-cluster.md
index 7dd03a5dedc..8662eefabb8 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/compute-storage-decoupled/managing-compute-cluster.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/compute-storage-decoupled/managing-compute-cluster.md
@@ -35,9 +35,38 @@ under the License.
 3.0.2 之前的版本中叫做计算集群(Compute Cluster)。
 :::
+## 计算组使用场景
+
+在多计算组的架构下,可以通过将一个或多个无状态的 BE 节点组成计算集群,利用计算集群指定语句 (use @<compute_group_name>) 将特定负载分配到特定的计算集群中,从而实现多导入和查询负载的物理隔离。
+
+假设当前有两个计算集群:C1 和 C2。
+
+- **读读隔离**:在发起两个大型查询之前,分别使用 `use @c1` 和 `use @c2`,确保两个查询在不同的计算节点上运行,从而避免在访问相同数据集时因 CPU 和内存等资源竞争而相互干扰。
+
+- **读写隔离**:Doris 的数据导入会消耗大量资源,尤其是在大数据量和高频导入的场景中。为了避免查询和导入之间的资源竞争,可以通过 `use @c1` 和 `use @c2` 指定查询在 C1 上执行,导入在 C2 上执行。同时,C1 计算集群可以访问 C2 计算集群中新导入的数据。
+
+- **写写隔离**:与读写隔离类似,导入之间也可以进行隔离。例如,当系统中存在高频小量导入和大批量导入时,批量导入通常耗时较长且重试成本高,而高频小量导入耗时短且重试成本低。为了避免小量导入对批量导入的干扰,可以通过 `use @c1` 和 `use @c2`,将小量导入指定到 C1 上执行,批量导入指定到 C2 上执行。
+
+
+## 默认计算组的选择机制
+
+当用户未明确[设置默认计算组](#设置默认计算组)时,系统将自动为用户选择一个具有存活计算节点且用户具有使用权限的计算组。在特定会话中确定默认计算组后,默认计算组将在该会话期间保持不变,除非用户显式更改了默认设置。
+
+在不同次的会话中,若发生以下情况,系统可能会自动更改用户的默认计算组:
+
+- 用户失去了在上次会话中所选择默认计算组的使用权限
+- 有计算组被添加或移除
+- 上次所选择的默认计算组不再具有存活计算节点
+
+其中,情况一和情况二必定会导致系统自动选择的默认计算组更改,情况三可能会导致更改。
+
 ## 查看所有计算组
-可通过 `SHOW COMPUTE GROUPS` 查看当前仓库拥有的所有计算组。
+使用 `SHOW COMPUTE GROUPS` 命令可以查看当前仓库中的所有计算组。返回结果会根据用户权限级别显示不同内容:
+
+- 具有 `ADMIN` 权限的用户可以查看所有计算组
+- 普通用户只能查看其拥有使用权限(USAGE_PRIV)的计算组
+- 如果用户没有任何计算组的使用权限,则返回结果为空
 ```sql
 SHOW COMPUTE GROUPS;
@@ -45,7 +74,9 @@ SHOW COMPUTE GROUPS;
 ## 添加计算组
-使用[ADD BE ](../../sql-manual/sql-statements/cluster-management/instance-management/ADD-BACKEND.md)命令添加 BE 并为 BE 指定计算组,示例:
+操作计算组需要具备 `OPERATOR` 权限,即节点管理权限。有关详细信息,请参阅[权限管理](../sql-manual/sql-statements/Account-Management-Statements/GRANT.md)。默认情况下,只有 root 账号拥有 `OPERATOR` 权限,但可以通过 `GRANT` 命令将此权限授予其他账号。
+要添加 BE 并为其指定计算组,请使用 [Add BE](../sql-manual/sql-statements/Cluster-Management-Statements/ALTER-SYSTEM-ADD-BACKEND.md) 命令。例如:
+
 ```sql
 ALTER SYSTEM ADD BACKEND 'host:9050' PROPERTIES ("tag.compute_group_name" = "new_group");
@@ -59,19 +90,22 @@ ALTER SYSTEM ADD BACKEND 'host:9050';
 ## 授予计算组访问权限
+前置条件:当前操作用户具备 `ADMIN` 权限,或者当前用户属于admin role。
+
 ```sql
 GRANT USAGE_PRIV ON COMPUTE GROUP {compute_group_name} TO {user};
 ```
 ## 撤销计算组访问权限
+前置条件:当前操作用户具备 `ADMIN` 权限,或者当前用户属于admin role。
 ```sql
 REVOKE USAGE_PRIV ON COMPUTE GROUP {compute_group_name} FROM {user};
 ```
 ## 设置默认计算组
-为当前用户设置默认计算组:
+为当前用户设置默认计算组(此操作需要当前用户已经拥有计算组的使用权限):
 ```sql
 SET PROPERTY 'default_compute_group' = '{clusterName}';
@@ -89,7 +123,7 @@ SET PROPERTY FOR {user} 'default_compute_group' = '{clusterName}';
 SHOW PROPERTY;
 ```
-查看其他用户默认计算组,此操作需要当前用户具备相关权限,返回结果中`default_compute_group` 的值即为默认计算组:
+查看其他用户默认计算组,此操作需要当前用户具备admin权限,返回结果中`default_compute_group` 的值即为默认计算组:
 ```sql
 SHOW PROPERTY FOR {user};
@@ -123,18 +157,6 @@ SHOW COMPUTE GROUPS;
 :::
-## 默认计算组的选择机制
-
-当用户未明确设置默认计算组时,系统将自动为用户选择一个具有 Active BE 且用户具有使用权限的计算组。在特定会话中确定默认计算组后,默认计算组将在该会话期间保持不变,除非用户显式更改了默认设置。
-
-在不同次的会话中,若发生以下情况,系统可能会自动更改用户的默认计算组:
-
-- 用户失去了在上次会话中所选择默认计算组的使用权限
-- 有计算组被添加或移除
-- 上次所选择的默认计算组不再具有 Active BE
-
-其中,情况一和情况二必定会导致系统自动选择的默认计算组更改,情况三可能会导致更改。
-
 ## 切换计算组
 用户可在存算分离架构中指定使用的数据库和计算组。

diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/compute-storage-decoupled/overview.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/compute-storage-decoupled/overview.md
index b37508caaab..e34f6e7b77f 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/compute-storage-decoupled/overview.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/compute-storage-decoupled/overview.md
@@ -90,16 +90,4 @@ Meta Service 是 Doris 存算分离元数据服务,主要负责处理导入事
 - 已使用公有云服务
 - 具备可靠的共享存储系统,比如 HDFS、Ceph、对象存储等
 - 需要极致的弹性扩缩容,需要 K8S 容器化,需要运行在私有云上
-- 有专职团队维护整个公司的数据仓库平台
-
-## 基于存算分离实现多计算集群工作负载隔离
-
-如前所述,一个或多个无状态的 BE 节点可以组成计算集群,可以运用计算集群指定语句 (`use @<compute_group_name>`) 将特定负载指定到特定的计算集群中,从而实现多导入以及查询负载的物理隔离。
-
-假设当前存在 2 个计算集群:C1 与 C2。
-
-**读读隔离**:两个(类)大查询发起之前分别通过 `use @c1`,`use @c2`实现两个查询使用不同的计算节点运行,使两个查询在访问相同数据集时,不会因 CPU 和内存等资源的竞争而相互干扰。
-
-**读写隔离**:Doris 的导入会消耗资源,特别是在大数据量和高频导入场景。为了避免查询和导入之间的资源竞争,可以通过 `use @c1`,`use @c2`指定查询请求在 C1 上执行,导入请求在 C2 上执行。同时,`c1`计算集群可以访问`c2`计算集群中新导入的数据。
-
-**写写隔离**:与读写隔离同理,导入和导入之间同样可以进行隔离。例如,当系统中存在高频小量导入和大批量导入时,批量导入往往耗时长,重试成本高,而高频小量导入单次耗时短,重试成本低,为了避免小量导入对批量导入造成干扰,可以通过`use @c1`,`use @c2`,将小量导入指定到 `c1` 上执行,批量导入指定到 `c2` 上执行。
+- 有专职团队维护整个公司的数据仓库平台
\ No newline at end of file

diff --git a/versioned_docs/version-3.0/compute-storage-decoupled/managing-compute-cluster.md b/versioned_docs/version-3.0/compute-storage-decoupled/managing-compute-cluster.md
index f4434cfded9..f136be704d6 100644
--- a/versioned_docs/version-3.0/compute-storage-decoupled/managing-compute-cluster.md
+++ b/versioned_docs/version-3.0/compute-storage-decoupled/managing-compute-cluster.md
@@ -33,9 +33,37 @@ In a compute-storage decoupled architecture, one or more compute nodes (BE) can
 *Note* In versions prior to 3.0.2, this was referred to as a Compute Cluster.
+## Compute Group Usage Scenarios
+
+In a multi-compute group architecture, you can group one or more stateless BE nodes into compute clusters. By using compute cluster specification statements (use @<compute_group_name>), you can allocate specific workloads to specific compute clusters, achieving physical isolation of multiple import and query workloads.
+
+Assume there are two compute clusters: C1 and C2.
+
+- **Read-Read Isolation**: Before initiating two large queries, use `use @c1` and `use @c2` respectively to ensure that the queries run on different compute nodes. This prevents resource contention (CPU, memory, etc.) when accessing the same dataset.
+
+- **Read-Write Isolation**: Doris data imports consume substantial resources, especially in scenarios with large data volumes and high-frequency imports. To avoid resource contention between queries and imports, you can use `use @c1` and `use @c2` to specify that queries execute on C1 and imports on C2. Additionally, the C1 compute cluster can access newly imported data in the C2 compute cluster.
+
+- **Write-Write Isolation**: Similar to read-write isolation, imports can also be isolated from each other. For example, when the system has both high-frequency small imports and large batch imports, batch imports typically take longer and have higher retry costs, while high-frequency small imports are quick with lower retry costs. To prevent small imports from interfering with batch imports, you can use `use @c1` and `use @c2` to specify small imports to execute on C1 and batch imports on C2.
+
+## Default Compute Group Selection Mechanism
+
+When a user has not explicitly [set a default compute group](#setting-default-compute-group), the system will automatically select a compute group with Active BE that the user has usage permissions for. Once the default compute group is determined in a specific session, it will remain unchanged during that session unless the user explicitly changes the default setting.
+
+In different sessions, if the following situations occur, the system may automatically change the user's default compute group:
+
+- The user has lost usage permissions for the default compute group selected in the last session
+- A compute group has been added or removed
+- The previously selected default compute group no longer has Alive BE
+
+Situations one and two will definitely lead to a change in the automatically selected default compute group, while situation three may lead to a change.
+
 ## Viewing All Compute Groups
-You can view all compute groups owned by the current repository using `SHOW COMPUTE GROUPS`.
+Use the `SHOW COMPUTE GROUPS` command to view all compute groups in the current repository. The returned results will display different content based on the user's permission level:
+
+- Users with `ADMIN` privileges can view all compute groups
+- Regular users can only view compute groups for which they have usage permissions (USAGE_PRIV)
+- If a user doesn't have usage permissions for any compute groups, an empty result will be returned
 ```sql
 SHOW COMPUTE GROUPS;
@@ -43,7 +71,8 @@ SHOW COMPUTE GROUPS;
 ## Adding Compute Groups
-Using [Add BE ](../sql-manual/sql-statements/cluster-management/instance-management/ADD-BACKEND.md) to add a BE into a compute group, for example:
+Managing compute groups requires `OPERATOR` privilege, which controls node management permissions. For more details, please refer to [Privilege Management](../sql-manual/sql-statements/Account-Management-Statements/GRANT.md). By default, only the root account has the `OPERATOR` privilege, but it can be granted to other accounts using the `GRANT` command.
+To add a BE and assign it to a compute group, use the [Add BE](../sql-manual/sql-statements/Cluster-Management-Statements/ALTER-SYSTEM-ADD-BACKEND.md) command. For example:
 ```sql
 ALTER SYSTEM ADD BACKEND 'host:9050' PROPERTIES ("tag.compute_group_name" = "new_group");
@@ -57,19 +86,23 @@ ALTER SYSTEM ADD BACKEND 'host:9050';
 ## Granting Compute Group Access
+Prerequisite: The current operating user has' ADMIN 'permission, or the current user belongs to the admin role.
+
 ```sql
 GRANT USAGE_PRIV ON COMPUTE GROUP {compute_group_name} TO {user}
 ```
 ## Revoking Compute Group Access
+Prerequisite: The current operating user has' ADMIN 'permission, or the current user belongs to the admin role.
+
 ```sql
 REVOKE USAGE_PRIV ON COMPUTE GROUP {compute_group_name} FROM {user}
 ```
 ## Setting Default Compute Group
-To set the default compute group for the current user:
+To set the default compute group for the current user(This operation requires the current user to already have permission to use the computing group):
 ```sql
 SET PROPERTY 'default_compute_group' = '{clusterName}';
@@ -87,7 +120,7 @@ To view the current user's default compute group, the value of `default_compute_
 SHOW PROPERTY;
 ```
-To view the default compute group of other users, this operation requires the current user to have relevant permissions, and the value of `default_compute_group` in the returned result is the default compute group:
+To view the default compute group of other users, This operation requires the current user to have admin privileges, and the value of `default_compute_group` in the returned result is the default compute group:
 ```sql
 SHOW PROPERTY FOR {user};
@@ -113,17 +146,6 @@ SHOW COMPUTE GROUPS;
 :::
-## Default Compute Group Selection Mechanism
-
-When a user has not explicitly set a default compute group, the system will automatically select a compute group with Active BE that the user has usage permissions for. Once the default compute group is determined in a specific session, it will remain unchanged during that session unless the user explicitly changes the default setting.
-
-In different sessions, if the following situations occur, the system may automatically change the user's default compute group:
-
-- The user has lost usage permissions for the default compute group selected in the last session
-- A compute group has been added or removed
-- The previously selected default compute group no longer has Active BE
-
-Situations one and two will definitely lead to a change in the automatically selected default compute group, while situation three may lead to a change.
 ## Switching Compute Groups

diff --git a/versioned_docs/version-3.0/compute-storage-decoupled/overview.md b/versioned_docs/version-3.0/compute-storage-decoupled/overview.md
index eeab2319a00..0d4848165ad 100644
--- a/versioned_docs/version-3.0/compute-storage-decoupled/overview.md
+++ b/versioned_docs/version-3.0/compute-storage-decoupled/overview.md
@@ -93,16 +93,5 @@ The shared storage layer stores the data files, including segment files and the
 - When you have already adopted public cloud services;
 - When you have reliable shared storage systems, such as HDFS, Ceph, and object storage;
 - When you require high elastic scalability, Kubernetes containerization, or to run on a private cloud;
+- High throughput shared storage capability, allowing multiple computing groups to share data
 - When you have a dedicated team responsible for maintaining the company's entire data warehouse platform.
-
-## Workload isolation across compute clusters
-
-As mentioned earlier, a compute cluster is formed by one or more stateless BE nodes. By using the compute cluster specification statement (`use @<compute_group_name>`), you can direct specific workloads to specific compute clusters, thus realizing physical isolation of data import and query workloads.
-
-Assuming there are 2 compute clusters: C1 and C2.
-
-**Read isolation**: Before initiating two large queries, you can leverage `use @c1` and `use @c2` respectively to make the two queries run on different compute nodes. In this way, the two queries will not interfere with each other due to competition for CPU and memory resources when accessing the same dataset.
-
-**Read-write isolation**: Data imports can consume resources, especially with large data volumes and high import frequency. To avoid resource contention between queries and imports, you can specify query requests to run on C1 and import requests to run on C2 using `use @c1` and `use @c2`. Meanwhile, the `c1` compute cluster can access the newly imported data in the `c2` compute cluster.
-
-**Write-write isolation**: Data import tasks can also be isolated from each other. In some cases, the system handles both high-frequency small imports and large-scale batch imports. The batch imports often take longer and have higher retry costs, while the high-frequency small imports are the opposite. To avoid small imports interfering with batch imports, you can direct the small imports to run on `c1` and the batch imports to run on `c2` via `use @c1` and `use @c2`.