This is an automated email from the ASF dual-hosted git repository. dataroaring pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/doris-website.git
The following commit(s) were added to refs/heads/master by this push: new 721074180f4 [improve] add data-distribution-concept (#1568) 721074180f4 is described below commit 721074180f41470291d66a53e009029bf628b39e Author: Yongqiang YANG <yangyongqi...@selectdb.com> AuthorDate: Tue Dec 24 17:18:23 2024 +0800 [improve] add data-distribution-concept (#1568) ## Versions - [x] dev - [ ] 3.0 - [ ] 2.1 - [ ] 2.0 ## Languages - [x] Chinese - [x] English ## Docs Checklist - [ ] Checked by AI - [ ] Test Cases Built --------- Co-authored-by: Yongqiang YANG <yangyogqi...@selectdb.com> --- .../data-partitioning/data-distribution.md | 123 +++++++++++++++++++++ .../data-partitioning/data-distribution.md | 123 +++++++++++++++++++++ sidebars.json | 2 +- 3 files changed, 247 insertions(+), 1 deletion(-) diff --git a/docs/table-design/data-partitioning/data-distribution.md b/docs/table-design/data-partitioning/data-distribution.md new file mode 100644 index 00000000000..f9036d97244 --- /dev/null +++ b/docs/table-design/data-partitioning/data-distribution.md @@ -0,0 +1,123 @@ +--- +{ + "title": "Data Distribution Concept", + "language": "en" +} +--- + +<!-- +Licensed to the Apache Software Foundation (ASF) under one +or more contributor license agreements. See the NOTICE file +distributed with this work for additional information +regarding copyright ownership. The ASF licenses this file +to you under the Apache License, Version 2.0 (the +"License"); you may not use this file except in compliance +with the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, +software distributed under the License is distributed on an +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, either express or implied. See the License for the +specific language governing permissions and limitations +under the License. +--> + +In Doris, the core of **data distribution** is to efficiently map the rows of data written to the table onto various **data shards (Tablets)** in the underlying storage through reasonable partitioning and bucketing strategies. Through data distribution strategies, Doris can fully utilize the storage and computing capabilities of multiple nodes, thereby supporting efficient storage and querying of large-scale data. + +--- + +## Overview of Data Distribution + +### Data Writing + +When writing data, Doris first allocates the rows of data to the corresponding partitions based on the table's partitioning strategy. Then, according to the bucketing strategy, the rows of data are further mapped to specific shards within the partition, determining the storage location of the data rows. + +### Query Execution + +During query execution, Doris's optimizer will trim data based on partitioning and bucketing strategies to maximize the reduction of the scanning range. In cases involving JOIN or aggregation queries, data transfer across nodes (Shuffle) may occur. Reasonable partitioning and bucketing design can reduce Shuffle and fully utilize **Colocate Join** to optimize query performance. + +--- + +## Node and Storage Architecture + +### Node Types + +A Doris cluster consists of the following two types of nodes: + +- **FE Node (Frontend)**: Manages cluster metadata (such as tables and shards) and is responsible for SQL parsing and execution planning. +- **BE Node (Backend)**: Stores data and is responsible for executing computational tasks. The results from BE are aggregated and returned to FE, which then returns them to the user. + +### Data Shard (Tablet) + +The data stored in the BE node is divided into shards, with each shard being the smallest unit of data management in Doris and the basic unit for data movement and replication. + +--- + +## Partitioning Strategy + +Partitioning is the first layer of logical division for data organization, used to divide the data in the table into smaller subsets. Doris provides the following two **partition types** and three **partition modes**: + +### Partition Types + +- **Range Partitioning**: Allocates data rows to corresponding partitions based on the value range of the partition column. +- **List Partitioning**: Allocates data rows to corresponding partitions based on specific values of the partition column. + +### Partition Modes + +- **Manual Partitioning**: Users manually create partitions (e.g., specified during table creation or added via `ALTER` statements). +- **Dynamic Partitioning**: The system automatically creates partitions based on time scheduling rules, but does not create partitions on demand when writing data. +- **Automatic Partitioning**: The system automatically creates corresponding partitions as needed during data writing, but care should be taken to avoid generating too many partitions with dirty data. + +--- + +## Bucketing Strategy + +Bucketing is the second layer of logical division for data organization, used to further divide data rows into smaller units within a partition. Doris supports the following two bucketing methods: + +- **Hash Bucketing**: Distributes data rows evenly across shards by calculating the `crc32` hash value of the bucketing column and taking the modulus of the number of buckets. +- **Random Bucketing**: Randomly allocates data rows to shards. When using Random bucketing, the `load_to_single_tablet` option can be used to optimize the quick writing of small-scale data. + +--- + +## Data Distribution Optimization + +### Colocate Join + +For large tables that frequently require JOIN or aggregation queries, the **Colocate** strategy can be enabled to place data with the same bucketing column values on the same physical node, reducing data transfer across nodes and significantly improving query performance. + +### Partition Pruning + +During queries, Doris can prune irrelevant partitions through filtering conditions, thereby reducing the data scanning range and lowering I/O costs. + +### Bucketing Parallelism + +During queries, a reasonable number of buckets can fully utilize the computational and I/O resources of the machines. + +--- + +## Data Distribution Goals + +1. **Uniform Data Distribution** + Ensure that data is evenly distributed across all BE nodes to avoid data skew that could overload certain nodes, thereby improving overall system performance. + +2. **Optimize Query Performance** + Reasonable partition pruning can significantly reduce the amount of data scanned, a reasonable number of buckets can enhance computational parallelism, and effective use of COLOCATE can lower Shuffle costs, improving JOIN and aggregation query efficiency. + +3. **Flexible Data Management** + - Time-based partitioning to store cold data (HDD) and hot data (SSD). + - Regularly delete historical partitions to free up storage space. + +4. **Control Metadata Scale** + The metadata for each shard is stored in both FE and BE, so it is necessary to reasonably control the number of shards. The empirical recommendation is: + - For every 10 million shards, FE requires at least 100GB of memory. + - The number of shards handled by a single BE should be less than 20,000. + +5. **Optimize Write Throughput** + - The number of buckets should be reasonably controlled (recommended < 128) to avoid degrading write performance. + - The number of partitions written at one time should be appropriate (recommended to write a small number of partitions at a time). + +--- + +By carefully designing and managing partitioning and bucketing strategies, Doris can efficiently support the storage and query processing of large-scale data, meeting various complex business needs. diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/table-design/data-partitioning/data-distribution.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/table-design/data-partitioning/data-distribution.md new file mode 100644 index 00000000000..ebac723fe96 --- /dev/null +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/table-design/data-partitioning/data-distribution.md @@ -0,0 +1,123 @@ +--- +{ + "title": "数据分布概念", + "language": "zh-CN" +} +--- + +<!-- +Licensed to the Apache Software Foundation (ASF) under one +or more contributor license agreements. See the NOTICE file +distributed with this work for additional information +regarding copyright ownership. The ASF licenses this file +to you under the Apache License, Version 2.0 (the +"License"); you may not use this file except in compliance +with the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, +software distributed under the License is distributed on an +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, either express or implied. See the License for the +specific language governing permissions and limitations +under the License. +--> + +在 Doris 中,**数据分布**的核心是通过合理的分区和分桶策略,将写入到表的数据行高效地映射到底层存储的各个 **数据分片(Tablet)** 上。通过数据分布策略,Doris 可以充分利用多节点的存储和计算能力,从而支持大规模数据的高效存储与查询。 + +--- + +## 数据分布概览 + +### 数据写入 + +数据写入时,Doris 首先根据表的分区策略将数据行分配到对应的分区。接着,根据分桶策略将数据行进一步映射到分区内的具体分片,从而确定了数据行的存储位置。 + +### 查询执行 + +查询运行时,Doris 的优化器会根据分区和分桶策略裁剪数据,最大化减少扫描范围。在涉及 JOIN 或聚合查询时,可能会发生跨节点的数据传输(Shuffle)。合理的分区和分桶设计可以减少 Shuffle 并充分利用 **Colocate Join** 优化查询性能。 + +--- + +## 节点与存储架构 + +### 节点类型 + +Doris 集群由以下两种节点组成: + +- **FE 节点(Frontend)**:管理集群元数据(如表、分片),负责 SQL 的解析与执行规划。 +- **BE 节点(Backend)**:存储数据,负责计算任务的执行。BE 的结果汇总后返回至 FE,再返回给用户。 + +### 数据分片(Tablet) + +BE 节点的存储数据分片的数据,每个分片是 Doris 中数据管理的最小单元,也是数据移动和复制的基本单位。 + +--- + +## 分区策略 + +分区是数据组织的第一层逻辑划分,用于将表中的数据划分为更小的子集。Doris 提供以下两种 **分区类型** 和三种 **分区模式**: + +### 分区类型 + +- **Range 分区**:根据分区列的值范围将数据行分配到对应分区。 +- **List 分区**:根据分区列的具体值将数据行分配到对应分区。 + +### 分区模式 + +- **手动分区**:用户手动创建分区(如建表时指定或通过 `ALTER` 语句增加)。 +- **动态分区**:系统根据时间调度规则自动创建分区,但写入数据时不会按需创建分区。 +- **自动分区**:数据写入时,系统根据需要自动创建相应的分区,使用时注意脏数据生成过多的分区。 + +--- + +## 分桶策略 + +分桶是数据组织的第二层逻辑划分,用于在分区内将数据行进一步划分到更小的单元。Doris 支持以下两种分桶方式: + +- **Hash 分桶**:通过计算分桶列值的 `crc32` 哈希值,并对分桶数取模,将数据行均匀分布到分片中。 +- **Random 分桶**:随机分配数据行到分片中。使用 Random 分桶时,可以使用 `load_to_single_tablet` 优化小规模数据的快速写入。 + +--- + +## 数据分布优化 + +### Colocate Join + +对于需要频繁进行 JOIN 或聚合查询的大表,可以启用 **Colocate** 策略,将相同分桶列值的数据放置在同一物理节点上,减少跨节点的数据传输,从而显著提升查询性能。 + +### 分区裁剪 + +查询时,Doris 可以通过过滤条件裁剪掉不相关的分区,从而减少数据扫描范围,降低 I/O 开销。 + +### 分桶并行 + +查询时,合理的分桶数可以充分利用机器的计算资源和 I/O 资源。 + +--- + +## 数据分布目标 + +1. **均匀数据分布** + 确保数据均匀分布在各 BE 节点上,避免数据倾斜导致部分节点过载,从而提高系统整体性能。 + +2. **优化查询性能** + 合理的分区裁剪可以大幅减少扫描的数据量,合理的分桶数可以提升计算并行度,合理利用 Colocate 可以降低 Shuffle 成本,提升 JOIN 和聚合查询效率。 + +3. **灵活数据管理** + - 按时间分区保存冷数据(HDD)与热数据(SSD)。 + - 定期删除历史分区释放存储空间。 + +4. **控制元数据规模** + 每个分片的元数据存储在 FE 和 BE 中,因此需要合理控制分片数量。经验值建议: + - 每 1000w 分片,FE 至少需 100G 内存。 + - 单个 BE 承载的分片数应小于 2w。 + +5. **优化写入吞吐** + - 分桶数应合理控制(建议 < 128),以避免写入性能下降。 + - 每次写入的分区数量应适量(建议每次写入少量分区)。 + +--- + +通过精心设计和管理分区与分桶策略,Doris 能够高效地支持大规模数据的存储与查询处理,满足各种复杂业务需求。 diff --git a/sidebars.json b/sidebars.json index 45af4f4fd55..c745eb0b32e 100644 --- a/sidebars.json +++ b/sidebars.json @@ -130,7 +130,7 @@ "type": "category", "label": "Data Partitioning", "items": [ - "table-design/data-partitioning/basic-concepts", + "table-design/data-partitioning/data-distribution", "table-design/data-partitioning/manual-partitioning", "table-design/data-partitioning/dynamic-partitioning", "table-design/data-partitioning/auto-partitioning", --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org For additional commands, e-mail: commits-h...@doris.apache.org