This is an automated email from the ASF dual-hosted git repository. luzhijing pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/doris-website.git
The following commit(s) were added to refs/heads/master by this push: new ca2103f251 [doc] Update date-partition EN doc in 2.1 & dev version (#824) ca2103f251 is described below commit ca2103f251fdf875701277422ff0b92282609795 Author: lishiqi_amy <amylee9...@163.com> AuthorDate: Fri Jul 12 12:04:31 2024 +0800 [doc] Update date-partition EN doc in 2.1 & dev version (#824) --- docs/table-design/data-partition.md | 457 +++++++++++++++++---- .../version-2.1/table-design/data-partition.md | 457 +++++++++++++++++---- 2 files changed, 734 insertions(+), 180 deletions(-) diff --git a/docs/table-design/data-partition.md b/docs/table-design/data-partition.md index 2e2a1a9fdc..ed863ab76b 100644 --- a/docs/table-design/data-partition.md +++ b/docs/table-design/data-partition.md @@ -5,7 +5,7 @@ } --- -<!-- +<!-- Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information @@ -38,31 +38,31 @@ A table consists of rows and columns: - Column: Used to describe different fields in a row of data; -- Columns can be divided into two types: Key and Value. From a business perspective, Key and Value can correspond to dimension columns and metric columns, respectively. The key columns in Doris are those specified in the table creation statement, which are the columns following the keywords `unique key`, `aggregate key`, or `duplicate key`. The remaining columns are value columns. From the perspective of the aggregation model, rows with the same Key columns will be aggregated into a sing [...] +- Columns can be divided into two types: Key and Value. From a business perspective, Key and Value can correspond to dimension columns and metric columns respectively. The key columns in Apache Doris are those specified in the table creation statement, which are the columns following the keywords `unique key`, `aggregate key`, or `duplicate key`. The remaining columns are value columns. From the perspective of the aggregation model, rows with the same Key columns will be aggregated into [...] ### Partition & Tablet -Doris supports two levels of data partitioning. The first level is Partitioning, which supports Range and List partition. The second level is Bucket (also known as Tablet), which supports Hash and Random . If no partitioning is established during table creation, Doris generates a default partition that is transparent to the user. When using the default partition, only Bucket is supported. +Apache Doris supports two levels of data partitioning. The first level is partition, which supports RANGE partitioning and LIST partitioning. The second level is tablet (also called bucket), which supports Hash bucket and Random bucket. If no partition is established during table creation, Apache Doris generates a default partition that is transparent to the user. When using the default partition, only bucket is supported. -In the Doris storage engine, data is horizontally partitioned into several tablets. Each tablet contains several rows of data. There is no overlap between the data in different tablets, and they are stored physically independently. +In the Apache Doris storage engine, data is horizontally partitioned into several tablets. Each tablet contains several rows of data. There is no overlap between the data in different tablets, and they are stored physically independently. Multiple tablets logically belong to different partitions. A single tablet belongs to only one partition, while a partition contains several tablets. Because tablets are stored physically independently, partitions can also be considered physically independent. The tablet is the smallest physical storage unit for operations such as data movement and replication. Several partitions compose a table. The partition can be considered the smallest logical management unit. -Benefits of Two-Level data partitioning: +The benefits of Apache Doris's two-level data partitioning are as follows: -- For dimensions with time or similar ordered values, such dimension columns can be used as partitioning columns. The partition granularity can be evaluated based on import frequency and partition data volume. +- Columns with ordered values can be used as partitioning columns. The partition granularity can be evaluated based on import frequency and partition data volume. -- Historical data deletion requirements: If there is a need to delete historical data (such as retaining only the data for the most recent several days), composite partition can be used to achieve this goal by deleting historical partitions. Alternatively, DELETE statements can be sent within specified partitions to delete data. +- If there is a need to delete historical data (such as retaining only the data for the most recent several days), composite partition can be used to achieve this goal by deleting historical partitions. Alternatively, `DELETE` statements can be sent within specified partitions to delete data. -- Solving data skew issues: Each partition can specify the number of buckets independently. For example, when partitioning by day and there are significant differences in data volume between days, the number of buckets for each partition can be specified to reasonably distribute data across different partitions. It is recommended to choose a column with high distinctiveness as the bucketing column. +- Each partition can specify the number of buckets independently. For example, when data is partitioned by day and there are significant differences in data volume between days, the number of buckets for each partition can be specified to reasonably distribute data across different partitions. It is recommended to choose a column with high distinctiveness as the bucketing column. -### Example of creating a table +### Example of creating a table -CREATE TABLE in Doris is a synchronous command. It returns results after the SQL execution is completed. Successful returns indicate successful table creation. For more information, please refer to [CREATE TABLE](../sql-manual/sql-statements/Data-Definition-Statements/Create/CREATE-TABLE), or input the `HELP CREATE TABLE;` command. +`CREATE TABLE` in Apache Doris is a synchronous command which returns the result once the SQL is executed. Successful returns indicate successful table creation. For more information, refer to [CREATE-TABLE](../sql-manual/sql-statements/Data-Definition-Statements/Create/CREATE-TABLE) or input the `HELP CREATE TABLE` command. -This section introduces how to create tables in Doris by range partiton and hash buckets. +The following code sample introduces how to create tables in Apache Doris by RANGE partitioning and Hash buckets. ```sql -- Range Partition @@ -94,17 +94,17 @@ PROPERTIES ); ``` -Here use the AGGREGATE KEY data model as an example. In the AGGREGATE KEY data model, all columns that are specified with an aggregation type (SUM, REPLACE, MAX, or MIN) are Value columns. The rest are the Key columns. +Here use Aggregate Key Model as an example. In Aggregate Key Model, all columns that are specified with an aggregation type (SUM, REPLACE, MAX, or MIN) are Value columns. The rest are the Key columns. -In the PROPERTIES at the end of the CREATE TABLE statement, you can find detailed information about the relevant parameters that can be set in PROPERTIES by referring to the documentation on [CREATE TABLE](../sql-manual/sql-statements/Data-Definition-Statements/Create/CREATE-TABLE). +For more information about what fields can be set in the `PROPERTIES` section of `CREATE TABLE`, refer to [CREATE-TABLE](../sql-manual/sql-statements/Data-Definition-Statements/Create/CREATE-TABLE.md). -The default type of ENGINE is OLAP. In Doris, only this OLAP ENGINE type is responsible for data management and storage by Doris itself. Other ENGINE types, such as mysql, broker, es, etc., are essentially just mappings to tables in other external databases or systems, allowing Doris to read this data. However, Doris itself does not create, manage, or store any tables or data for non-OLAP ENGINE types. +The default type of `ENGINE` is `OLAP`. Only OLAP is responsible for data management and storage by Apache Doris itself. Other engine types, such as MySQL, Broker and ES, are essentially just mappings to tables in other external databases or systems, allowing Apache Doris to read this data. However, Apache Doris itself does not create, manage, or store any tables or data for engine types except OLAP. -`IF NOT EXISTS` indicates that if the table has not been created before, it will be created. Note that this only checks if the table name exists and does not check if the schema of the new table is the same as the schema of an existing table. Therefore, if there is a table with the same name but a different schema, this command will also return successfully, but it does not mean that a new table and a new schema have been created. +`IF NOT EXISTS` indicates that if the table has not been created before, it will be created. Note that this only checks if the table name exists and does not check if the schema of the new table is the same as the schema of an existing table. Therefore, if there is a table with the same name but a different schema, this command will also return successfully, but it does not mean that a new table with a new schema has been created. -### View partition +### View partitions -You can use the `show create table` command to view the partition information of a table. +View the partiton information of a table by running the `show create table` command. ```sql > show create table example_range_tbl @@ -141,7 +141,7 @@ You can use the `show create table` command to view the partition information of +-------------------+---------------------------------------------------------------------------------------------------------+ ``` -You can use `show partitions from your_table` command to view the partition information of a table. +Or run the `show partitions from your_table` command. ``` > show partitions from example_range_tbl @@ -161,15 +161,15 @@ You can use `show partitions from your_table` command to view the partition info +---------------------+---------------------+--------------------------+----------+------------+-------------------------+-----------+ ``` -### Alter partition +### Alter partitions -You can add a new partition by using the `alter table add partition` command. +You can add a new partition by running the `alter table add partition ` command. ```sql ALTER TABLE example_range_tbl ADD PARTITION p201704 VALUES LESS THAN("2020-05-01") DISTRIBUTED BY HASH(`user_id`) BUCKETS 5; ``` -For more partition modification operations, please refer to the SQL manual on [ALTER-TABLE-PARTITION](../sql-manual/sql-statements/Data-Definition-Statements/Alter/ALTER-TABLE-PARTITION). +For more information about how to alter partitions, refer to [ALTER-TABLE-PARTITION](../sql-manual/sql-statements/Data-Definition-Statements/Alter/ALTER-TABLE-PARTITION.md). ## Manual partitioning @@ -187,13 +187,13 @@ For more partition modification operations, please refer to the SQL manual on [A - Overlapping ranges are not allowed when creating partitions. -### Range partition +### RANGE partitioning -Partition columns are usually time columns for convenient management of old and new data. Range partition supports column types such as DATE, DATETIME, TINYINT, SMALLINT, INT, BIGINT, and LARGEINT. +Partition columns are usually time columns for convenient management of old and new data. RANGE partitioning supports column types such as `DATE`, `DATETIME`, `TINYINT`, `SMALLINT`, `INT`, `BIGINT`, and `LARGEINT`. -Partition information supports four writing methods: +Partition information supports the following four writing methods: -- FIXED RANGE: the partition as a left-closed, right-open interval. +- `FIXED RANGE`: This method defines the partition as a left-closed, right-open interval. ```sql PARTITION BY RANGE(col1[, col2, ...]) @@ -214,7 +214,7 @@ PARTITION BY RANGE(`date`) ) ``` -- LESS THAN: Only define the upper bound of the partition. The lower bound is determined by the upper bound of the previous partition. +- `LESS THAN`: This method only defines the upper bound of the partition. The lower bound is determined by the upper bound of the previous partition. ```sql PARTITION BY RANGE(col1[, col2, ...]) @@ -243,7 +243,7 @@ PARTITION BY RANGE(`date`) ) ``` -- BATCH RANGE: Batch create RANGE partitions of numeric and time types, defining the partitions as left-closed, right-open intervals, and setting the step size. +- `BATCH RANGE`: This method batch creates partitions based on ranges of number or time, defining the partitions as left-closed, right-open intervals and setting the step size. ```sql PARTITION BY RANGE(int_col) @@ -271,7 +271,7 @@ PARTITION BY RANGE(`date`) ) ``` -- MULTI RANGE: Batch create RANGE partitions, defining the partitions as left-closed, right-open intervals. For example: +- `MULTI RANGE`: This method batch creates partitions based on range partitioning, defining the partitions as left-closed, right-open intervals. For example: ```sql PARTITION BY RANGE(col) @@ -284,11 +284,11 @@ PARTITION BY RANGE(col) ) ``` -### List partition +### LIST partitioning -Partition columns support data types such as BOOLEAN, TINYINT, SMALLINT, INT, BIGINT, LARGEINT, DATE, DATETIME, CHAR, and VARCHAR. Partition values are enumerated values. Only when the data is one of the enumerated values of the target partition, the partition can be hit . +Partition columns based on LIST partitioning support data types such as `BOOLEAN`, `TINYINT`, `SMALLINT`, `INT`, `BIGINT`, `LARGEINT`, `DATE`, `DATETIME`, `CHAR`, and `VARCHAR`. Partition values are enumerated values. Only when the data is one of the enumerated values of the target partition, the partition can be hit. -Partitions support specifying the enumerated values contained in each partition through VALUES IN (...). +Partitions support specifying the enumerated values contained in each partition through `VALUES IN (...)`. For example: @@ -301,7 +301,7 @@ PARTITION BY LIST(city) ) ``` -List partition also supports multi-column partitioning, for example: +LIST partitioning also supports multi-column partitioning, for example: ```sql PARTITION BY LIST(id, city) @@ -312,17 +312,91 @@ PARTITION BY LIST(id, city) ) ``` -## Dynamic partition +### NULL partitioning -Dynamic partition is designed to manage partition's Time-to-Life (TTL), reducing the burden on users. +Partition columns based on NULL partitioning must be not null columns by default. If you need to use null columns, set the session variable `allow_partition_column_nullable` to `true`. For LIST partitioning, the NULL partitioning is supported, while for RANGE partitioning, null values will be assigned to the `less than` partition. The columns are as follows: -In some usage scenarios, the user will partition the table according to the day and perform routine tasks regularly every day. At this time, the user needs to manually manage the partition. Otherwise, the data load may fail because the user does not create a partition. This brings additional maintenance costs to the user. +**LIST partitioning** -With dynamic partitioning, users can define rules for partition creation and deletion when establishing tables. The FE initiates a background thread to handle partition creation or deletion based on these user-defined rules. Users also have the flexibility to modify these rules during runtime. +```sql +mysql> create table null_list( + -> k0 varchar null + -> ) + -> partition by list (k0) + -> ( + -> PARTITION pX values in ((NULL)) + -> ) + -> DISTRIBUTED BY HASH(`k0`) BUCKETS 1 + -> properties("replication_num" = "1"); +Query OK, 0 rows affected (0.11 sec) + +mysql> insert into null_list values (null); +Query OK, 1 row affected (0.19 sec) + +mysql> select * from null_list; ++------+ +| k0 | ++------+ +| NULL | ++------+ +1 row in set (0.18 sec) +``` -It's important to note that dynamic partitioning is exclusively supported by range partitions. Currently, the functionality enables dynamic addition and deletion of partitions. +**RANGE partitioning with the `less than` partition** -:::tip +```sql +mysql> create table null_range( + -> k0 int null + -> ) + -> partition by range (k0) + -> ( + -> PARTITION p10 values less than (10), + -> PARTITION p100 values less than (100), + -> PARTITION pMAX values less than (maxvalue) + -> ) + -> DISTRIBUTED BY HASH(`k0`) BUCKETS 1 + -> properties("replication_num" = "1"); +Query OK, 0 rows affected (0.12 sec) + +mysql> insert into null_range values (null); +Query OK, 1 row affected (0.19 sec) + +mysql> select * from null_range partition(p10); ++------+ +| k0 | ++------+ +| NULL | ++------+ +1 row in set (0.18 sec) +``` + +**RANGE partitioning without the `less than` partition** + +```sql +mysql> create table null_range2( + -> k0 int null + -> ) + -> partition by range (k0) + -> ( + -> PARTITION p200 values [("100"), ("200")) + -> ) + -> DISTRIBUTED BY HASH(`k0`) BUCKETS 1 + -> properties("replication_num" = "1"); +Query OK, 0 rows affected (0.13 sec) + +mysql> insert into null_range2 values (null); +ERROR 5025 (HY000): Insert has filtered data in strict mode, tracking_url=...... +``` + +## Dynamic partitioning + +Dynamic partitioning is designed to manage the lifecycle of partitions, reducing the burden on users. + +Dynamic partitioning only supports RANGE partitioning based on `DATE` or `DATETIME` columns. It is applicable for cases where time data in partition columns grows synchronously with the real world. In such scenarios, data can be partitioned flexibly based on time data, and can be automatically stored with the cold-hot tiering strategy or recycled according to settings. + +For partitioning method that can be more widely applicable, see [Auto partitioning](https://doris.apache.org/docs/table-design/data-partition/#auto-partitioning). + +:::caution Warning This feature will be disabled when synchronized by CCR. If this table is copied by CCR, that is, PROPERTIES contains `is_being_synced = true`, it will be displayed as enabled in show create table, but will not actually take effect. When `is_being_synced` is set to `false`, these features will resume working, but the `is_being_synced` property is for CCR peripheral modules only and should not be manually set during CCR synchronization. @@ -400,15 +474,15 @@ The rules of dynamic partition are prefixed with `dynamic_partition.`: - `dynamic_partition.buckets` - The number of buckets corresponding to the dynamically created partitions. + The number of buckets corresponding to the dynamically created partitions. - `dynamic_partition.replication_num` - The replication number of dynamic partition.If not filled in, defaults to the number of table's replication number. + The replication number of dynamic partition.If not filled in, defaults to the number of table's replication number. - `dynamic_partition.start_day_of_week` - When `time_unit` is` WEEK`, this parameter is used to specify the starting point of the week. The value ranges from 1 to 7. Where 1 is Monday and 7 is Sunday. The default is 1, which means that every week starts on Monday. + When `time_unit` is` WEEK`, this parameter is used to specify the starting point of the week. The value ranges from 1 to 7. Where 1 is Monday and 7 is Sunday. The default is 1, which means that every week starts on Monday. - `dynamic_partition.start_day_of_month` @@ -490,7 +564,7 @@ The rules of dynamic partition are prefixed with `dynamic_partition.`: Note that when set to SSD, the `hot_partition_num` property will no longer take effect, all partitions will default to SSD storage media and the cooldown time will be 9999-12-31 23:59:59. -#### Create history partition rules +### Create history partition rules When `create_history_partition` is `true`, i.e. history partition creation is enabled, Doris determines the number of history partitions to be created based on `dynamic_partition.start` and `dynamic_partition.history_partition_num`. @@ -670,9 +744,35 @@ p20210523 p202006: ["2020-06-28", "2020-07-28") ``` -### Modify dynamic partition properties +### Principle and control behavior -You can modify the properties of the dynamic partition with the following command +Apache Doris sets a fixed FE control thread that continuously checks the table based on dynamic partitioning at specific time intervals (specified by the `dynamic_partition_check_interval_seconds` field) to perform the necessary partition creation and deletion operations. + +Specifically, when dynamic partitioning is executed, the following checks and operations are performed (refer to the start time of the partition as `START` and the end time as `END`, and omit `dynamic_partition.`): + +- All partitions before `START` are deleted. +- If `create_history_partition` is `false`, create all partitions between the current time and `END`; if `create_history_partition` is `true`, not only all partitions between the current time and `END` are created, but also all partitions between `START` and current time are created. If `history_partition_num` is specified, the number of created partitions before current time cannot exceed the value of `history_partition_num`. + +Note that: + +- If the partition time range intersects with the `[START, END]` range, it is considered to belong to the current dynamic partition time range. +- If the newly created partition conflicts with an existing partition, the current partition is retained, and the new partition is not created. If the conflict occurs when the table is created, DDL will occur an error. + +Therefore, after the automatic maintenance of the partition table, the state presented is as follows: + +- **No partitions are included** before the `START` time except for those specified in `reserved_history_periods`. +- **All manually created partitions** after the `END` time are retained. +- Apart from manually deleted or accidentally lost partitions, the table contains all partitions within a specific range: + - If `create_history_partition` is `true`, + - if `history_partition_num` is specified, the specific range is `[max(START, current time) - history_partition_num * time_unit), END]`; + - if `history_partition_num` is not specified, the specific range is `[START, END]`. + - If `dynamic_partition.create_history_partition` is `false`, the specific range is `[current time, END]`, also including existing partitions in `[START, current time)`. + The entire specific range is divided into multiple partition ranges based on `time_unit`. If a range intersects with an existing partition `X`, `X` is preserved; otherwise, the range will be completely covered by a partition created by dynamic partition. +- Unless the number of partitions is about to exceed `max_dynamic_partition_num`, the creation will fail. + +### Modify properties + +You can modify the properties of the dynamic partitioning with the following command: ``` ALTER TABLE tbl1 SET @@ -692,7 +792,7 @@ p20200521: ["2020-05-21", "2020-05-22") If the partition granularity is changed to MONTH at this time, the system will try to create a partition with the range `["2020-05-01", "2020-06-01")`, and this range conflicts with the existing partition. So it cannot be created. And the partition with the range `["2020-06-01", "2020-07-01")` can be created normally. Therefore, the partition between 2020-05-22 and 2020-05-30 needs to be filled manually. -### Check dynamic partition table scheduling status +### Check table scheduling status You can further view the scheduling of dynamic partitioned tables by using the following command: @@ -711,19 +811,19 @@ mysql> SHOW DYNAMIC PARTITION TABLES; 7 rows in set (0.02 sec) ``` -- LastUpdateTime: The last time of modifying dynamic partition properties -- LastSchedulerTime: The last time of performing dynamic partition scheduling -- State: The state of the last execution of dynamic partition scheduling -- LastCreatePartitionMsg: Error message of the last time to dynamically add partition scheduling -- LastDropPartitionMsg: Error message of the last execution of dynamic deletion partition scheduling +- `LastUpdateTime`: The last time of modifying dynamic partition properties +- `LastSchedulerTime`: The last time of performing dynamic partition scheduling +- `State`: The state of the last execution of dynamic partition scheduling +- `LastCreatePartitionMsg`: Error message of the last time to dynamically add partition scheduling +- `LastDropPartitionMsg`: Error message of the last execution of dynamic deletion partition scheduling -### Advanced operation +### Advanced operations -**FE Configuration Item** +**Modify FE configuration items** -- dynamic\_partition\_enable +- `dynamic\_partition\_enable` - Whether to enable Doris's dynamic partition feature. The default value is false, which is off. This parameter only affects the partitioning operation of dynamic partition tables, not normal tables. You can modify the parameters in `fe.conf` and restart FE to take effect. You can also execute the following commands at runtime to take effect: + Whether to enable Doris's dynamic partition feature. The default value is `false`, which is off. This parameter only affects the partitioning operation of dynamic partition tables, not normal tables. You can modify the parameters in `fe.conf` and restart FE to take effect. You can also execute the following commands at runtime to take effect: ```sql MySQL protocol: @@ -735,59 +835,41 @@ mysql> SHOW DYNAMIC PARTITION TABLES; `curl --location-trusted -u username:password -XGET http://fe_host:fe_http_port/api/_set_config?dynamic_partition_enable=true` ``` - To turn off dynamic partitioning globally, set this parameter to false. + To turn off dynamic partitioning globally, set this parameter to `false`. -- dynamic\_partition\_check\_interval\_seconds +- `dynamic\_partition\_check\_interval\_seconds` - The execution frequency of dynamic partition threads defaults to 3600 (1 hour), that is, scheduling is performed every 1 hour. You can modify the parameters in `fe.conf` and restart FE to take effect. You can also modify the following commands at runtime: + The execution frequency of dynamic partition threads defaults to 600 (10 minutes), that is, scheduling is performed every 10 minutes. You can modify the parameters in `fe.conf` and restart FE to take effect. You can also modify the following commands at runtime: ```sql MySQL protocol: - + `ADMIN SET FRONTEND CONFIG ("dynamic_partition_check_interval_seconds" = "7200")` - + HTTP protocol: - + `curl --location-trusted -u username:password -XGET http://fe_host:fe_http_port/api/_set_config?dynamic_partition_check_interval_seconds=432000` ``` -**Converting dynamic and manual partition tables to each other** +**Switching between dynamic partitioning and manual partitioning** -For a table, dynamic and manual partitioning can be freely converted, but they cannot exist at the same time, there is and only one state. +You can switch a table between dynamic and manual partitioning, but a table cannot be partitioned simultaneously by dynamic and manual partitioning. -**Converting Manual Partitioning to Dynamic Partitioning** +By running the `ALTER TABLE tbl_name SET ("dynamic_partition.enable" = "<true/false>")` command, you can turn on and off dynamic partitioning. -If a table is not dynamically partitioned when it is created, it can be converted to dynamic partitioning at runtime by modifying the dynamic partitioning properties with `ALTER TABLE`, an example of which can be seen with `HELP ALTER TABLE`. +When dynamic partitioning is turned off, Apache Doris will no longer manage partitions automatically, and users need to create or delete partitions manually by using `ALTER TABLE`; when dynamic partitioning is turned on, redundant partitions will be deleted according to the rules of dynamic partitioning. -When dynamic partitioning feature is enabled, Doris will no longer allow users to manage partitions manually, but will automatically manage partitions based on dynamic partition properties. +## Auto partitioning -:::tip - -If `dynamic_partition.start` is set, historical partitions with a partition range before the start offset of the dynamic partition will be deleted. - -::: - -**Converting Dynamic Partitioning to Manual Partitioning** - -The dynamic partitioning feature can be disabled by executing `ALTER TABLE tbl_name SET ("dynamic_partition.enable" = "false") ` and converting it to a manual partition table. - -When dynamic partitioning feature is disabled, Doris will no longer manage partitions automatically, and users will have to create or delete partitions manually by using `ALTER TABLE`. - -## Auto partition - -:::tip - -Doris version 2.1 starts to support automatic partitioning. To use this feature, please [download Doris 2.1](https://doris.apache.org/zh-CN/download) and refer to the documentation for version 2.1. - -::: +### Application scenario The Auto Partitioning feature supports automatic detection of whether the corresponding partition exists during the data import process. If it does not exist, the partition will be created automatically and imported normally. The auto partition function mainly solves the problem that the user expects to partition the table based on a certain column, but the data distribution of the column is scattered or unpredictable, so it is difficult to accurately create the required partitions when building or adjusting the structure of the table, or the number of partitions is so large that it is too cumbersome to create them manually. -Take the time type partition column as an example, in the Dynamic Partition function, we support the automatic creation of new partitions to accommodate real-time data at specific time periods. For real-time user behavior logs and other scenarios, this feature basically meets the requirements. However, in more complex scenarios, such as dealing with non-real-time data, the partition column is independent of the current system time and contains a large number of discrete values. At this t [...] +Take the time type partition column as an example, in dynamic partitioning, we support the automatic creation of new partitions to accommodate real-time data at specific time periods. For real-time user behavior logs and other scenarios, this feature basically meets the requirements. However, in more complex scenarios, such as dealing with non-real-time data, the partition column is independent of the current system time and contains a large number of discrete values. At this time, to im [...] -Suppose our table DDL is as follows: +Suppose the table DDL is as follows: ```sql CREATE TABLE `DAILY_TRADE_VALUE` @@ -828,7 +910,202 @@ PROPERTIES ( ); ``` -The table stores a large amount of business history data, partitioned based on the date the transaction occurred. As you can see when building the table, we need to manually create the partitions in advance. If the data range of the partitioned columns changes, for example, 2022 is added to the above table, we need to create a partition by [ALTER-TABLE-PARTITION](../../sql-manual/sql-statements/Data-Definition-Statements/Alter/ALTER-TABLE-PARTITION) to make changes to the table partition [...] +The table stores a large amount of business history data, partitioned based on the date the transaction occurred. As you can see when building the table, we need to manually create the partitions in advance. If the data range of the partitioned columns changes, for example, 2022 is added to the above table, we need to create a partition by [ALTER-TABLE-PARTITION](../../sql-manual/sql-statements/Data-Definition-Statements/Alter/ALTER-TABLE-PARTITION) to make changes to the table partition [...] + +### Syntax + +When creating a table, use the following syntax to populate the `partition_info` section in the `CREATE-TABLE`statement: + +- For RANGE partitioning: + + ```sql + AUTO PARTITION BY RANGE (FUNC_CALL_EXPR) + ( + ) + ``` + + Where, + + ```sql + FUNC_CALL_EXPR ::= date_trunc ( <partition_column>, '<interval>' ) + ``` + +:::info Note + +In Apache Doris 2.1.0 version, `FUNC_CALL_EXPR` needs not to be enclosed in parentheses. + +::: + +- For LIST partitioning: + + ```sql + AUTO PARTITION BY LIST(`partition_col`) + ( + ) + ``` + +**Sample** + + +- For RANGE partitioning: + + ```sql + CREATE TABLE `date_table` ( + `TIME_STAMP` datev2 NOT NULL COMMENT '采集日期' + ) ENGINE=OLAP + DUPLICATE KEY(`TIME_STAMP`) + AUTO PARTITION BY RANGE (date_trunc(`TIME_STAMP`, 'month')) + ( + ) + DISTRIBUTED BY HASH(`TIME_STAMP`) BUCKETS 10 + PROPERTIES ( + "replication_allocation" = "tag.location.default: 1" + ); + ``` + +- For LIST partitioning: + + ```sql + CREATE TABLE `str_table` ( + `str` varchar not null + ) ENGINE=OLAP + DUPLICATE KEY(`str`) + AUTO PARTITION BY LIST (`str`) + ( + ) + DISTRIBUTED BY HASH(`str`) BUCKETS 10 + PROPERTIES ( + "replication_allocation" = "tag.location.default: 1" + ); + ``` + +**Constraints** + +- In auto LIST partitioning, the partition name length **must** **not exceed 50 characters**. This length is derived from the concatenation and escape of contents of partition columns on corresponding data rows, so the actual allowed length may be shorter. +- In auto RANGE partitioning, the partition function only supports `date_trunc`, and the partition column supports only `DATE` or `DATETIME` formats. +- In auto LIST partitioning, function calls are not supported, and the partition column supports `BOOLEAN`, `TINYINT`, `SMALLINT`, `INT`, `BIGINT`, `LARGEINT`, `DATE`, `DATETIME`, `CHAR`, `VARCHAR` data types, with partition values being enumeration values. +- In auto LIST partitioning, for every existing value in the partition column that does not correspond to a partition, a new independent partitioning will be created. + +**NULL value partitioning** + +When the session variable `allow_partition_column_nullable` is enabled, LIST and RANGE partitioning support null columns as partition columns. + +When an actual insertion encounters a null value in the partition column: + +- For auto LIST partitioning, the corresponding NULL value partition will be created automatically: +```sql +mysql> create table auto_null_list( + -> k0 varchar null + -> ) + -> auto partition by list (k0) + -> ( + -> ) + -> DISTRIBUTED BY HASH(`k0`) BUCKETS 1 + -> properties("replication_num" = "1"); +Query OK, 0 rows affected (0.10 sec) + +mysql> insert into auto_null_list values (null); +Query OK, 1 row affected (0.28 sec) + +mysql> select * from auto_null_list; ++------+ +| k0 | ++------+ +| NULL | ++------+ +1 row in set (0.20 sec) + +mysql> select * from auto_null_list partition(pX); ++------+ +| k0 | ++------+ +| NULL | ++------+ +1 row in set (0.20 sec) +``` + +- For auto LIST partitioning, **null columns are not supported to be partition columns**. +```sql +mysql> CREATE TABLE `range_table_nullable` ( + -> `k1` INT, + -> `k2` DATETIMEV2(3), + -> `k3` DATETIMEV2(6) + -> ) ENGINE=OLAP + -> DUPLICATE KEY(`k1`) + -> AUTO PARTITION BY RANGE (date_trunc(`k2`, 'day')) + -> ( + -> ) + -> DISTRIBUTED BY HASH(`k1`) BUCKETS 16 + -> PROPERTIES ( + -> "replication_allocation" = "tag.location.default: 1" + -> ); +ERROR 1105 (HY000): errCode = 2, detailMessage = AUTO RANGE PARTITION doesn't support NULL column +``` + +### Example + +When using auto partitioning, the example in the Application scenarios section can be rewritten as: + +```sql +CREATE TABLE `DAILY_TRADE_VALUE` +( + `TRADE_DATE` datev2 NOT NULL COMMENT '交易日期', + `TRADE_ID` varchar(40) NOT NULL COMMENT '交易编号', + ...... +) +UNIQUE KEY(`TRADE_DATE`, `TRADE_ID`) +AUTO PARTITION BY RANGE (date_trunc(`TRADE_DATE`, 'year')) +( +) +DISTRIBUTED BY HASH(`TRADE_DATE`) BUCKETS 10 +PROPERTIES ( + "replication_num" = "1" +); +``` + +At this point, the new table has no default partitions: + +```sql +mysql> show partitions from `DAILY_TRADE_VALUE`; +Empty set (0.12 sec) +``` + +After inserting data and checking again, it is found that the table has created the corresponding partitions: + +```sql +mysql> insert into `DAILY_TRADE_VALUE` values ('2012-12-13', 1), ('2008-02-03', 2), ('2014-11-11', 3); +Query OK, 3 rows affected (0.88 sec) + +mysql> show partitions from `DAILY_TRADE_VALUE`; ++-------------+-----------------+----------------+---------------------+--------+--------------+--------------------------------------------------------------------------------+-----------------+---------+----------------+---------------+---------------------+---------------------+--------------------------+----------+------------+-------------------------+-----------+ +| PartitionId | PartitionName | VisibleVersion | VisibleVersionTime | State | PartitionKey | Range | DistributionKey | Buckets | ReplicationNum | StorageMedium | CooldownTime | RemoteStoragePolicy | LastConsistencyCheckTime | DataSize | IsInMemory | ReplicaAllocation | IsMutable | ++-------------+-----------------+----------------+---------------------+--------+--------------+--------------------------------------------------------------------------------+-----------------+---------+----------------+---------------+---------------------+---------------------+--------------------------+----------+------------+-------------------------+-----------+ +| 180060 | p20080101000000 | 2 | 2023-09-18 21:49:29 | NORMAL | TRADE_DATE | [types: [DATEV2]; keys: [2008-01-01]; ..types: [DATEV2]; keys: [2009-01-01]; ) | TRADE_DATE | 10 | 1 | HDD | 9999-12-31 23:59:59 | | NULL | 0.000 | false | tag.location.default: 1 | true | +| 180039 | p20120101000000 | 2 | 2023-09-18 21:49:29 | NORMAL | TRADE_DATE | [types: [DATEV2]; keys: [2012-01-01]; ..types: [DATEV2]; keys: [2013-01-01]; ) | TRADE_DATE | 10 | 1 | HDD | 9999-12-31 23:59:59 | | NULL | 0.000 | false | tag.location.default: 1 | true | +| 180018 | p20140101000000 | 2 | 2023-09-18 21:49:29 | NORMAL | TRADE_DATE | [types: [DATEV2]; keys: [2014-01-01]; ..types: [DATEV2]; keys: [2015-01-01]; ) | TRADE_DATE | 10 | 1 | HDD | 9999-12-31 23:59:59 | | NULL | 0.000 | false | tag.location.default: 1 | true | ++-------------+-----------------+----------------+---------------------+--------+--------------+--------------------------------------------------------------------------------+-----------------+---------+----------------+---------------+---------------------+---------------------+--------------------------+----------+------------+-------------------------+-----------+ +3 rows in set (0.12 sec) +``` + +It can be concluded that the partitions created by auto partitioning share the same functionality as partitions created by manual partitioning. + +### Conjunct with dynamic partitioning + +In order to maintain a clear partitioning logic, Apache Doris prohibits the simultaneous use of auto partitioning and dynamic partitioning on a single table, as this usage can easily lead to misuse. It is recommended to replace this with the standalone Auto Partitioning feature. + +:::info Note +In some early versions of Doris 2.1, this functionality was not prohibited but not recommended. +::: + +### Key points + +- Similar to regular partitioned tables, aoto LIST partitioning supports multi-column partitioning with no syntax differences. +- If partitions are created during data insertion or import processes, and the entire import process is not completed (fails or is canceled), the created partitions will not be automatically deleted. +- Tables using auto partitioning only differ in the method of partition creation, switching from manual to automatic. The original usage of the table and its created partitions remains the same as non-auto partitioning tables or partitions. +- To prevent the accidental creation of too many partitions, Apache Doris controls the maximum number of partitions an auto partitioning table can accommodate through the `max_auto_partition_num setting` in the FE configuration. This value can be adjusted if needed. +- When importing data into a table with auto partitioning enabled, the coordinator sends data with a polling interval different from regular tables. Refer to `olap_table_sink_send_interval_auto_partition_factor` in [BE Configuration](../admin-manual/config/be-config.md) for details. This setting does not have an impact after `enable_memtable_on_sink_node` is enabled. +- During data insertion using `INSERT-OVERWRITE`, if a specific partition for override is specified, the auto partitioning table behaves like a regular table during this process and does not create new partitions. +- If metadata operations are involved when importing and creating partitions, the import process may fail. ## Manual bucketing @@ -891,7 +1168,7 @@ properties("estimate_partition_size" = "100G") The new configuration parameter estimate_partition_size indicates the amount of data for a single partition. This parameter is optional and if not given, Doris will take the default value of estimate_partition_size to 10GB. -As you know from the above, a partitioned bucket is a Tablet at the physical level, and for best performance, it is recommended that the Tablet size be in the range of 1GB - 10GB. So how does the automatic bucketing projection ensure that the Tablet size is within this range? +As you know from the above, a partitioned bucket is a tablet at the physical level, and for best performance, it is recommended that the tablet size be in the range of 1GB - 10GB. So how does the automatic bucketing projection ensure that the tablet size is within this range? To summarize, there are a few principles. @@ -1050,9 +1327,9 @@ According to the above algorithm, the initial number of buckets and the number o When encountering this error, it is usually because the BE encountered a problem when creating data tablets. You can troubleshoot by following these steps: - In the fe.log, search for the `Failed to create partition` log entry at the corresponding timestamp. In this log entry, you may find a series of number pairs similar to `{10001-10010}`. The first number in the pair represents the Backend ID, and the second number represents the Tablet ID. For example, this number pair indicates that the creation of Tablet ID 10010 on Backend ID 10001 failed. - - Go to the be.INFO log of the corresponding Backend and search for tablet ID-related logs within the corresponding time period to find error messages. + - Go to the be.INFO log of the corresponding Backend and search for Tablet ID-related logs within the corresponding time period to find error messages. - Here are some common tablet creation failure errors, including but not limited to: - - The BE did not receive the relevant task. In this case, you cannot find tablet ID-related logs in be.INFO or the BE reports success but actually fails. For these issues, please refer to the [Installation and Deployment](../install/cluster-deployment/standard-deployment) section to check the connectivity between FE and BE. + - The BE did not receive the relevant task. In this case, you cannot find Tablet ID-related logs in be.INFO or the BE reports success but actually fails. For these issues, please refer to the [Installation and Deployment](../install/cluster-deployment/standard-deployment) section to check the connectivity between FE and BE. - Pre-allocated memory failure. This may be because the byte length of a row in the table exceeds 100KB. - `Too many open files`. The number of open file handles exceeds the Linux system limit. You need to modify the handle limit of the Linux system. diff --git a/versioned_docs/version-2.1/table-design/data-partition.md b/versioned_docs/version-2.1/table-design/data-partition.md index 2e2a1a9fdc..ed863ab76b 100644 --- a/versioned_docs/version-2.1/table-design/data-partition.md +++ b/versioned_docs/version-2.1/table-design/data-partition.md @@ -5,7 +5,7 @@ } --- -<!-- +<!-- Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information @@ -38,31 +38,31 @@ A table consists of rows and columns: - Column: Used to describe different fields in a row of data; -- Columns can be divided into two types: Key and Value. From a business perspective, Key and Value can correspond to dimension columns and metric columns, respectively. The key columns in Doris are those specified in the table creation statement, which are the columns following the keywords `unique key`, `aggregate key`, or `duplicate key`. The remaining columns are value columns. From the perspective of the aggregation model, rows with the same Key columns will be aggregated into a sing [...] +- Columns can be divided into two types: Key and Value. From a business perspective, Key and Value can correspond to dimension columns and metric columns respectively. The key columns in Apache Doris are those specified in the table creation statement, which are the columns following the keywords `unique key`, `aggregate key`, or `duplicate key`. The remaining columns are value columns. From the perspective of the aggregation model, rows with the same Key columns will be aggregated into [...] ### Partition & Tablet -Doris supports two levels of data partitioning. The first level is Partitioning, which supports Range and List partition. The second level is Bucket (also known as Tablet), which supports Hash and Random . If no partitioning is established during table creation, Doris generates a default partition that is transparent to the user. When using the default partition, only Bucket is supported. +Apache Doris supports two levels of data partitioning. The first level is partition, which supports RANGE partitioning and LIST partitioning. The second level is tablet (also called bucket), which supports Hash bucket and Random bucket. If no partition is established during table creation, Apache Doris generates a default partition that is transparent to the user. When using the default partition, only bucket is supported. -In the Doris storage engine, data is horizontally partitioned into several tablets. Each tablet contains several rows of data. There is no overlap between the data in different tablets, and they are stored physically independently. +In the Apache Doris storage engine, data is horizontally partitioned into several tablets. Each tablet contains several rows of data. There is no overlap between the data in different tablets, and they are stored physically independently. Multiple tablets logically belong to different partitions. A single tablet belongs to only one partition, while a partition contains several tablets. Because tablets are stored physically independently, partitions can also be considered physically independent. The tablet is the smallest physical storage unit for operations such as data movement and replication. Several partitions compose a table. The partition can be considered the smallest logical management unit. -Benefits of Two-Level data partitioning: +The benefits of Apache Doris's two-level data partitioning are as follows: -- For dimensions with time or similar ordered values, such dimension columns can be used as partitioning columns. The partition granularity can be evaluated based on import frequency and partition data volume. +- Columns with ordered values can be used as partitioning columns. The partition granularity can be evaluated based on import frequency and partition data volume. -- Historical data deletion requirements: If there is a need to delete historical data (such as retaining only the data for the most recent several days), composite partition can be used to achieve this goal by deleting historical partitions. Alternatively, DELETE statements can be sent within specified partitions to delete data. +- If there is a need to delete historical data (such as retaining only the data for the most recent several days), composite partition can be used to achieve this goal by deleting historical partitions. Alternatively, `DELETE` statements can be sent within specified partitions to delete data. -- Solving data skew issues: Each partition can specify the number of buckets independently. For example, when partitioning by day and there are significant differences in data volume between days, the number of buckets for each partition can be specified to reasonably distribute data across different partitions. It is recommended to choose a column with high distinctiveness as the bucketing column. +- Each partition can specify the number of buckets independently. For example, when data is partitioned by day and there are significant differences in data volume between days, the number of buckets for each partition can be specified to reasonably distribute data across different partitions. It is recommended to choose a column with high distinctiveness as the bucketing column. -### Example of creating a table +### Example of creating a table -CREATE TABLE in Doris is a synchronous command. It returns results after the SQL execution is completed. Successful returns indicate successful table creation. For more information, please refer to [CREATE TABLE](../sql-manual/sql-statements/Data-Definition-Statements/Create/CREATE-TABLE), or input the `HELP CREATE TABLE;` command. +`CREATE TABLE` in Apache Doris is a synchronous command which returns the result once the SQL is executed. Successful returns indicate successful table creation. For more information, refer to [CREATE-TABLE](../sql-manual/sql-statements/Data-Definition-Statements/Create/CREATE-TABLE) or input the `HELP CREATE TABLE` command. -This section introduces how to create tables in Doris by range partiton and hash buckets. +The following code sample introduces how to create tables in Apache Doris by RANGE partitioning and Hash buckets. ```sql -- Range Partition @@ -94,17 +94,17 @@ PROPERTIES ); ``` -Here use the AGGREGATE KEY data model as an example. In the AGGREGATE KEY data model, all columns that are specified with an aggregation type (SUM, REPLACE, MAX, or MIN) are Value columns. The rest are the Key columns. +Here use Aggregate Key Model as an example. In Aggregate Key Model, all columns that are specified with an aggregation type (SUM, REPLACE, MAX, or MIN) are Value columns. The rest are the Key columns. -In the PROPERTIES at the end of the CREATE TABLE statement, you can find detailed information about the relevant parameters that can be set in PROPERTIES by referring to the documentation on [CREATE TABLE](../sql-manual/sql-statements/Data-Definition-Statements/Create/CREATE-TABLE). +For more information about what fields can be set in the `PROPERTIES` section of `CREATE TABLE`, refer to [CREATE-TABLE](../sql-manual/sql-statements/Data-Definition-Statements/Create/CREATE-TABLE.md). -The default type of ENGINE is OLAP. In Doris, only this OLAP ENGINE type is responsible for data management and storage by Doris itself. Other ENGINE types, such as mysql, broker, es, etc., are essentially just mappings to tables in other external databases or systems, allowing Doris to read this data. However, Doris itself does not create, manage, or store any tables or data for non-OLAP ENGINE types. +The default type of `ENGINE` is `OLAP`. Only OLAP is responsible for data management and storage by Apache Doris itself. Other engine types, such as MySQL, Broker and ES, are essentially just mappings to tables in other external databases or systems, allowing Apache Doris to read this data. However, Apache Doris itself does not create, manage, or store any tables or data for engine types except OLAP. -`IF NOT EXISTS` indicates that if the table has not been created before, it will be created. Note that this only checks if the table name exists and does not check if the schema of the new table is the same as the schema of an existing table. Therefore, if there is a table with the same name but a different schema, this command will also return successfully, but it does not mean that a new table and a new schema have been created. +`IF NOT EXISTS` indicates that if the table has not been created before, it will be created. Note that this only checks if the table name exists and does not check if the schema of the new table is the same as the schema of an existing table. Therefore, if there is a table with the same name but a different schema, this command will also return successfully, but it does not mean that a new table with a new schema has been created. -### View partition +### View partitions -You can use the `show create table` command to view the partition information of a table. +View the partiton information of a table by running the `show create table` command. ```sql > show create table example_range_tbl @@ -141,7 +141,7 @@ You can use the `show create table` command to view the partition information of +-------------------+---------------------------------------------------------------------------------------------------------+ ``` -You can use `show partitions from your_table` command to view the partition information of a table. +Or run the `show partitions from your_table` command. ``` > show partitions from example_range_tbl @@ -161,15 +161,15 @@ You can use `show partitions from your_table` command to view the partition info +---------------------+---------------------+--------------------------+----------+------------+-------------------------+-----------+ ``` -### Alter partition +### Alter partitions -You can add a new partition by using the `alter table add partition` command. +You can add a new partition by running the `alter table add partition ` command. ```sql ALTER TABLE example_range_tbl ADD PARTITION p201704 VALUES LESS THAN("2020-05-01") DISTRIBUTED BY HASH(`user_id`) BUCKETS 5; ``` -For more partition modification operations, please refer to the SQL manual on [ALTER-TABLE-PARTITION](../sql-manual/sql-statements/Data-Definition-Statements/Alter/ALTER-TABLE-PARTITION). +For more information about how to alter partitions, refer to [ALTER-TABLE-PARTITION](../sql-manual/sql-statements/Data-Definition-Statements/Alter/ALTER-TABLE-PARTITION.md). ## Manual partitioning @@ -187,13 +187,13 @@ For more partition modification operations, please refer to the SQL manual on [A - Overlapping ranges are not allowed when creating partitions. -### Range partition +### RANGE partitioning -Partition columns are usually time columns for convenient management of old and new data. Range partition supports column types such as DATE, DATETIME, TINYINT, SMALLINT, INT, BIGINT, and LARGEINT. +Partition columns are usually time columns for convenient management of old and new data. RANGE partitioning supports column types such as `DATE`, `DATETIME`, `TINYINT`, `SMALLINT`, `INT`, `BIGINT`, and `LARGEINT`. -Partition information supports four writing methods: +Partition information supports the following four writing methods: -- FIXED RANGE: the partition as a left-closed, right-open interval. +- `FIXED RANGE`: This method defines the partition as a left-closed, right-open interval. ```sql PARTITION BY RANGE(col1[, col2, ...]) @@ -214,7 +214,7 @@ PARTITION BY RANGE(`date`) ) ``` -- LESS THAN: Only define the upper bound of the partition. The lower bound is determined by the upper bound of the previous partition. +- `LESS THAN`: This method only defines the upper bound of the partition. The lower bound is determined by the upper bound of the previous partition. ```sql PARTITION BY RANGE(col1[, col2, ...]) @@ -243,7 +243,7 @@ PARTITION BY RANGE(`date`) ) ``` -- BATCH RANGE: Batch create RANGE partitions of numeric and time types, defining the partitions as left-closed, right-open intervals, and setting the step size. +- `BATCH RANGE`: This method batch creates partitions based on ranges of number or time, defining the partitions as left-closed, right-open intervals and setting the step size. ```sql PARTITION BY RANGE(int_col) @@ -271,7 +271,7 @@ PARTITION BY RANGE(`date`) ) ``` -- MULTI RANGE: Batch create RANGE partitions, defining the partitions as left-closed, right-open intervals. For example: +- `MULTI RANGE`: This method batch creates partitions based on range partitioning, defining the partitions as left-closed, right-open intervals. For example: ```sql PARTITION BY RANGE(col) @@ -284,11 +284,11 @@ PARTITION BY RANGE(col) ) ``` -### List partition +### LIST partitioning -Partition columns support data types such as BOOLEAN, TINYINT, SMALLINT, INT, BIGINT, LARGEINT, DATE, DATETIME, CHAR, and VARCHAR. Partition values are enumerated values. Only when the data is one of the enumerated values of the target partition, the partition can be hit . +Partition columns based on LIST partitioning support data types such as `BOOLEAN`, `TINYINT`, `SMALLINT`, `INT`, `BIGINT`, `LARGEINT`, `DATE`, `DATETIME`, `CHAR`, and `VARCHAR`. Partition values are enumerated values. Only when the data is one of the enumerated values of the target partition, the partition can be hit. -Partitions support specifying the enumerated values contained in each partition through VALUES IN (...). +Partitions support specifying the enumerated values contained in each partition through `VALUES IN (...)`. For example: @@ -301,7 +301,7 @@ PARTITION BY LIST(city) ) ``` -List partition also supports multi-column partitioning, for example: +LIST partitioning also supports multi-column partitioning, for example: ```sql PARTITION BY LIST(id, city) @@ -312,17 +312,91 @@ PARTITION BY LIST(id, city) ) ``` -## Dynamic partition +### NULL partitioning -Dynamic partition is designed to manage partition's Time-to-Life (TTL), reducing the burden on users. +Partition columns based on NULL partitioning must be not null columns by default. If you need to use null columns, set the session variable `allow_partition_column_nullable` to `true`. For LIST partitioning, the NULL partitioning is supported, while for RANGE partitioning, null values will be assigned to the `less than` partition. The columns are as follows: -In some usage scenarios, the user will partition the table according to the day and perform routine tasks regularly every day. At this time, the user needs to manually manage the partition. Otherwise, the data load may fail because the user does not create a partition. This brings additional maintenance costs to the user. +**LIST partitioning** -With dynamic partitioning, users can define rules for partition creation and deletion when establishing tables. The FE initiates a background thread to handle partition creation or deletion based on these user-defined rules. Users also have the flexibility to modify these rules during runtime. +```sql +mysql> create table null_list( + -> k0 varchar null + -> ) + -> partition by list (k0) + -> ( + -> PARTITION pX values in ((NULL)) + -> ) + -> DISTRIBUTED BY HASH(`k0`) BUCKETS 1 + -> properties("replication_num" = "1"); +Query OK, 0 rows affected (0.11 sec) + +mysql> insert into null_list values (null); +Query OK, 1 row affected (0.19 sec) + +mysql> select * from null_list; ++------+ +| k0 | ++------+ +| NULL | ++------+ +1 row in set (0.18 sec) +``` -It's important to note that dynamic partitioning is exclusively supported by range partitions. Currently, the functionality enables dynamic addition and deletion of partitions. +**RANGE partitioning with the `less than` partition** -:::tip +```sql +mysql> create table null_range( + -> k0 int null + -> ) + -> partition by range (k0) + -> ( + -> PARTITION p10 values less than (10), + -> PARTITION p100 values less than (100), + -> PARTITION pMAX values less than (maxvalue) + -> ) + -> DISTRIBUTED BY HASH(`k0`) BUCKETS 1 + -> properties("replication_num" = "1"); +Query OK, 0 rows affected (0.12 sec) + +mysql> insert into null_range values (null); +Query OK, 1 row affected (0.19 sec) + +mysql> select * from null_range partition(p10); ++------+ +| k0 | ++------+ +| NULL | ++------+ +1 row in set (0.18 sec) +``` + +**RANGE partitioning without the `less than` partition** + +```sql +mysql> create table null_range2( + -> k0 int null + -> ) + -> partition by range (k0) + -> ( + -> PARTITION p200 values [("100"), ("200")) + -> ) + -> DISTRIBUTED BY HASH(`k0`) BUCKETS 1 + -> properties("replication_num" = "1"); +Query OK, 0 rows affected (0.13 sec) + +mysql> insert into null_range2 values (null); +ERROR 5025 (HY000): Insert has filtered data in strict mode, tracking_url=...... +``` + +## Dynamic partitioning + +Dynamic partitioning is designed to manage the lifecycle of partitions, reducing the burden on users. + +Dynamic partitioning only supports RANGE partitioning based on `DATE` or `DATETIME` columns. It is applicable for cases where time data in partition columns grows synchronously with the real world. In such scenarios, data can be partitioned flexibly based on time data, and can be automatically stored with the cold-hot tiering strategy or recycled according to settings. + +For partitioning method that can be more widely applicable, see [Auto partitioning](https://doris.apache.org/docs/table-design/data-partition/#auto-partitioning). + +:::caution Warning This feature will be disabled when synchronized by CCR. If this table is copied by CCR, that is, PROPERTIES contains `is_being_synced = true`, it will be displayed as enabled in show create table, but will not actually take effect. When `is_being_synced` is set to `false`, these features will resume working, but the `is_being_synced` property is for CCR peripheral modules only and should not be manually set during CCR synchronization. @@ -400,15 +474,15 @@ The rules of dynamic partition are prefixed with `dynamic_partition.`: - `dynamic_partition.buckets` - The number of buckets corresponding to the dynamically created partitions. + The number of buckets corresponding to the dynamically created partitions. - `dynamic_partition.replication_num` - The replication number of dynamic partition.If not filled in, defaults to the number of table's replication number. + The replication number of dynamic partition.If not filled in, defaults to the number of table's replication number. - `dynamic_partition.start_day_of_week` - When `time_unit` is` WEEK`, this parameter is used to specify the starting point of the week. The value ranges from 1 to 7. Where 1 is Monday and 7 is Sunday. The default is 1, which means that every week starts on Monday. + When `time_unit` is` WEEK`, this parameter is used to specify the starting point of the week. The value ranges from 1 to 7. Where 1 is Monday and 7 is Sunday. The default is 1, which means that every week starts on Monday. - `dynamic_partition.start_day_of_month` @@ -490,7 +564,7 @@ The rules of dynamic partition are prefixed with `dynamic_partition.`: Note that when set to SSD, the `hot_partition_num` property will no longer take effect, all partitions will default to SSD storage media and the cooldown time will be 9999-12-31 23:59:59. -#### Create history partition rules +### Create history partition rules When `create_history_partition` is `true`, i.e. history partition creation is enabled, Doris determines the number of history partitions to be created based on `dynamic_partition.start` and `dynamic_partition.history_partition_num`. @@ -670,9 +744,35 @@ p20210523 p202006: ["2020-06-28", "2020-07-28") ``` -### Modify dynamic partition properties +### Principle and control behavior -You can modify the properties of the dynamic partition with the following command +Apache Doris sets a fixed FE control thread that continuously checks the table based on dynamic partitioning at specific time intervals (specified by the `dynamic_partition_check_interval_seconds` field) to perform the necessary partition creation and deletion operations. + +Specifically, when dynamic partitioning is executed, the following checks and operations are performed (refer to the start time of the partition as `START` and the end time as `END`, and omit `dynamic_partition.`): + +- All partitions before `START` are deleted. +- If `create_history_partition` is `false`, create all partitions between the current time and `END`; if `create_history_partition` is `true`, not only all partitions between the current time and `END` are created, but also all partitions between `START` and current time are created. If `history_partition_num` is specified, the number of created partitions before current time cannot exceed the value of `history_partition_num`. + +Note that: + +- If the partition time range intersects with the `[START, END]` range, it is considered to belong to the current dynamic partition time range. +- If the newly created partition conflicts with an existing partition, the current partition is retained, and the new partition is not created. If the conflict occurs when the table is created, DDL will occur an error. + +Therefore, after the automatic maintenance of the partition table, the state presented is as follows: + +- **No partitions are included** before the `START` time except for those specified in `reserved_history_periods`. +- **All manually created partitions** after the `END` time are retained. +- Apart from manually deleted or accidentally lost partitions, the table contains all partitions within a specific range: + - If `create_history_partition` is `true`, + - if `history_partition_num` is specified, the specific range is `[max(START, current time) - history_partition_num * time_unit), END]`; + - if `history_partition_num` is not specified, the specific range is `[START, END]`. + - If `dynamic_partition.create_history_partition` is `false`, the specific range is `[current time, END]`, also including existing partitions in `[START, current time)`. + The entire specific range is divided into multiple partition ranges based on `time_unit`. If a range intersects with an existing partition `X`, `X` is preserved; otherwise, the range will be completely covered by a partition created by dynamic partition. +- Unless the number of partitions is about to exceed `max_dynamic_partition_num`, the creation will fail. + +### Modify properties + +You can modify the properties of the dynamic partitioning with the following command: ``` ALTER TABLE tbl1 SET @@ -692,7 +792,7 @@ p20200521: ["2020-05-21", "2020-05-22") If the partition granularity is changed to MONTH at this time, the system will try to create a partition with the range `["2020-05-01", "2020-06-01")`, and this range conflicts with the existing partition. So it cannot be created. And the partition with the range `["2020-06-01", "2020-07-01")` can be created normally. Therefore, the partition between 2020-05-22 and 2020-05-30 needs to be filled manually. -### Check dynamic partition table scheduling status +### Check table scheduling status You can further view the scheduling of dynamic partitioned tables by using the following command: @@ -711,19 +811,19 @@ mysql> SHOW DYNAMIC PARTITION TABLES; 7 rows in set (0.02 sec) ``` -- LastUpdateTime: The last time of modifying dynamic partition properties -- LastSchedulerTime: The last time of performing dynamic partition scheduling -- State: The state of the last execution of dynamic partition scheduling -- LastCreatePartitionMsg: Error message of the last time to dynamically add partition scheduling -- LastDropPartitionMsg: Error message of the last execution of dynamic deletion partition scheduling +- `LastUpdateTime`: The last time of modifying dynamic partition properties +- `LastSchedulerTime`: The last time of performing dynamic partition scheduling +- `State`: The state of the last execution of dynamic partition scheduling +- `LastCreatePartitionMsg`: Error message of the last time to dynamically add partition scheduling +- `LastDropPartitionMsg`: Error message of the last execution of dynamic deletion partition scheduling -### Advanced operation +### Advanced operations -**FE Configuration Item** +**Modify FE configuration items** -- dynamic\_partition\_enable +- `dynamic\_partition\_enable` - Whether to enable Doris's dynamic partition feature. The default value is false, which is off. This parameter only affects the partitioning operation of dynamic partition tables, not normal tables. You can modify the parameters in `fe.conf` and restart FE to take effect. You can also execute the following commands at runtime to take effect: + Whether to enable Doris's dynamic partition feature. The default value is `false`, which is off. This parameter only affects the partitioning operation of dynamic partition tables, not normal tables. You can modify the parameters in `fe.conf` and restart FE to take effect. You can also execute the following commands at runtime to take effect: ```sql MySQL protocol: @@ -735,59 +835,41 @@ mysql> SHOW DYNAMIC PARTITION TABLES; `curl --location-trusted -u username:password -XGET http://fe_host:fe_http_port/api/_set_config?dynamic_partition_enable=true` ``` - To turn off dynamic partitioning globally, set this parameter to false. + To turn off dynamic partitioning globally, set this parameter to `false`. -- dynamic\_partition\_check\_interval\_seconds +- `dynamic\_partition\_check\_interval\_seconds` - The execution frequency of dynamic partition threads defaults to 3600 (1 hour), that is, scheduling is performed every 1 hour. You can modify the parameters in `fe.conf` and restart FE to take effect. You can also modify the following commands at runtime: + The execution frequency of dynamic partition threads defaults to 600 (10 minutes), that is, scheduling is performed every 10 minutes. You can modify the parameters in `fe.conf` and restart FE to take effect. You can also modify the following commands at runtime: ```sql MySQL protocol: - + `ADMIN SET FRONTEND CONFIG ("dynamic_partition_check_interval_seconds" = "7200")` - + HTTP protocol: - + `curl --location-trusted -u username:password -XGET http://fe_host:fe_http_port/api/_set_config?dynamic_partition_check_interval_seconds=432000` ``` -**Converting dynamic and manual partition tables to each other** +**Switching between dynamic partitioning and manual partitioning** -For a table, dynamic and manual partitioning can be freely converted, but they cannot exist at the same time, there is and only one state. +You can switch a table between dynamic and manual partitioning, but a table cannot be partitioned simultaneously by dynamic and manual partitioning. -**Converting Manual Partitioning to Dynamic Partitioning** +By running the `ALTER TABLE tbl_name SET ("dynamic_partition.enable" = "<true/false>")` command, you can turn on and off dynamic partitioning. -If a table is not dynamically partitioned when it is created, it can be converted to dynamic partitioning at runtime by modifying the dynamic partitioning properties with `ALTER TABLE`, an example of which can be seen with `HELP ALTER TABLE`. +When dynamic partitioning is turned off, Apache Doris will no longer manage partitions automatically, and users need to create or delete partitions manually by using `ALTER TABLE`; when dynamic partitioning is turned on, redundant partitions will be deleted according to the rules of dynamic partitioning. -When dynamic partitioning feature is enabled, Doris will no longer allow users to manage partitions manually, but will automatically manage partitions based on dynamic partition properties. +## Auto partitioning -:::tip - -If `dynamic_partition.start` is set, historical partitions with a partition range before the start offset of the dynamic partition will be deleted. - -::: - -**Converting Dynamic Partitioning to Manual Partitioning** - -The dynamic partitioning feature can be disabled by executing `ALTER TABLE tbl_name SET ("dynamic_partition.enable" = "false") ` and converting it to a manual partition table. - -When dynamic partitioning feature is disabled, Doris will no longer manage partitions automatically, and users will have to create or delete partitions manually by using `ALTER TABLE`. - -## Auto partition - -:::tip - -Doris version 2.1 starts to support automatic partitioning. To use this feature, please [download Doris 2.1](https://doris.apache.org/zh-CN/download) and refer to the documentation for version 2.1. - -::: +### Application scenario The Auto Partitioning feature supports automatic detection of whether the corresponding partition exists during the data import process. If it does not exist, the partition will be created automatically and imported normally. The auto partition function mainly solves the problem that the user expects to partition the table based on a certain column, but the data distribution of the column is scattered or unpredictable, so it is difficult to accurately create the required partitions when building or adjusting the structure of the table, or the number of partitions is so large that it is too cumbersome to create them manually. -Take the time type partition column as an example, in the Dynamic Partition function, we support the automatic creation of new partitions to accommodate real-time data at specific time periods. For real-time user behavior logs and other scenarios, this feature basically meets the requirements. However, in more complex scenarios, such as dealing with non-real-time data, the partition column is independent of the current system time and contains a large number of discrete values. At this t [...] +Take the time type partition column as an example, in dynamic partitioning, we support the automatic creation of new partitions to accommodate real-time data at specific time periods. For real-time user behavior logs and other scenarios, this feature basically meets the requirements. However, in more complex scenarios, such as dealing with non-real-time data, the partition column is independent of the current system time and contains a large number of discrete values. At this time, to im [...] -Suppose our table DDL is as follows: +Suppose the table DDL is as follows: ```sql CREATE TABLE `DAILY_TRADE_VALUE` @@ -828,7 +910,202 @@ PROPERTIES ( ); ``` -The table stores a large amount of business history data, partitioned based on the date the transaction occurred. As you can see when building the table, we need to manually create the partitions in advance. If the data range of the partitioned columns changes, for example, 2022 is added to the above table, we need to create a partition by [ALTER-TABLE-PARTITION](../../sql-manual/sql-statements/Data-Definition-Statements/Alter/ALTER-TABLE-PARTITION) to make changes to the table partition [...] +The table stores a large amount of business history data, partitioned based on the date the transaction occurred. As you can see when building the table, we need to manually create the partitions in advance. If the data range of the partitioned columns changes, for example, 2022 is added to the above table, we need to create a partition by [ALTER-TABLE-PARTITION](../../sql-manual/sql-statements/Data-Definition-Statements/Alter/ALTER-TABLE-PARTITION) to make changes to the table partition [...] + +### Syntax + +When creating a table, use the following syntax to populate the `partition_info` section in the `CREATE-TABLE`statement: + +- For RANGE partitioning: + + ```sql + AUTO PARTITION BY RANGE (FUNC_CALL_EXPR) + ( + ) + ``` + + Where, + + ```sql + FUNC_CALL_EXPR ::= date_trunc ( <partition_column>, '<interval>' ) + ``` + +:::info Note + +In Apache Doris 2.1.0 version, `FUNC_CALL_EXPR` needs not to be enclosed in parentheses. + +::: + +- For LIST partitioning: + + ```sql + AUTO PARTITION BY LIST(`partition_col`) + ( + ) + ``` + +**Sample** + + +- For RANGE partitioning: + + ```sql + CREATE TABLE `date_table` ( + `TIME_STAMP` datev2 NOT NULL COMMENT '采集日期' + ) ENGINE=OLAP + DUPLICATE KEY(`TIME_STAMP`) + AUTO PARTITION BY RANGE (date_trunc(`TIME_STAMP`, 'month')) + ( + ) + DISTRIBUTED BY HASH(`TIME_STAMP`) BUCKETS 10 + PROPERTIES ( + "replication_allocation" = "tag.location.default: 1" + ); + ``` + +- For LIST partitioning: + + ```sql + CREATE TABLE `str_table` ( + `str` varchar not null + ) ENGINE=OLAP + DUPLICATE KEY(`str`) + AUTO PARTITION BY LIST (`str`) + ( + ) + DISTRIBUTED BY HASH(`str`) BUCKETS 10 + PROPERTIES ( + "replication_allocation" = "tag.location.default: 1" + ); + ``` + +**Constraints** + +- In auto LIST partitioning, the partition name length **must** **not exceed 50 characters**. This length is derived from the concatenation and escape of contents of partition columns on corresponding data rows, so the actual allowed length may be shorter. +- In auto RANGE partitioning, the partition function only supports `date_trunc`, and the partition column supports only `DATE` or `DATETIME` formats. +- In auto LIST partitioning, function calls are not supported, and the partition column supports `BOOLEAN`, `TINYINT`, `SMALLINT`, `INT`, `BIGINT`, `LARGEINT`, `DATE`, `DATETIME`, `CHAR`, `VARCHAR` data types, with partition values being enumeration values. +- In auto LIST partitioning, for every existing value in the partition column that does not correspond to a partition, a new independent partitioning will be created. + +**NULL value partitioning** + +When the session variable `allow_partition_column_nullable` is enabled, LIST and RANGE partitioning support null columns as partition columns. + +When an actual insertion encounters a null value in the partition column: + +- For auto LIST partitioning, the corresponding NULL value partition will be created automatically: +```sql +mysql> create table auto_null_list( + -> k0 varchar null + -> ) + -> auto partition by list (k0) + -> ( + -> ) + -> DISTRIBUTED BY HASH(`k0`) BUCKETS 1 + -> properties("replication_num" = "1"); +Query OK, 0 rows affected (0.10 sec) + +mysql> insert into auto_null_list values (null); +Query OK, 1 row affected (0.28 sec) + +mysql> select * from auto_null_list; ++------+ +| k0 | ++------+ +| NULL | ++------+ +1 row in set (0.20 sec) + +mysql> select * from auto_null_list partition(pX); ++------+ +| k0 | ++------+ +| NULL | ++------+ +1 row in set (0.20 sec) +``` + +- For auto LIST partitioning, **null columns are not supported to be partition columns**. +```sql +mysql> CREATE TABLE `range_table_nullable` ( + -> `k1` INT, + -> `k2` DATETIMEV2(3), + -> `k3` DATETIMEV2(6) + -> ) ENGINE=OLAP + -> DUPLICATE KEY(`k1`) + -> AUTO PARTITION BY RANGE (date_trunc(`k2`, 'day')) + -> ( + -> ) + -> DISTRIBUTED BY HASH(`k1`) BUCKETS 16 + -> PROPERTIES ( + -> "replication_allocation" = "tag.location.default: 1" + -> ); +ERROR 1105 (HY000): errCode = 2, detailMessage = AUTO RANGE PARTITION doesn't support NULL column +``` + +### Example + +When using auto partitioning, the example in the Application scenarios section can be rewritten as: + +```sql +CREATE TABLE `DAILY_TRADE_VALUE` +( + `TRADE_DATE` datev2 NOT NULL COMMENT '交易日期', + `TRADE_ID` varchar(40) NOT NULL COMMENT '交易编号', + ...... +) +UNIQUE KEY(`TRADE_DATE`, `TRADE_ID`) +AUTO PARTITION BY RANGE (date_trunc(`TRADE_DATE`, 'year')) +( +) +DISTRIBUTED BY HASH(`TRADE_DATE`) BUCKETS 10 +PROPERTIES ( + "replication_num" = "1" +); +``` + +At this point, the new table has no default partitions: + +```sql +mysql> show partitions from `DAILY_TRADE_VALUE`; +Empty set (0.12 sec) +``` + +After inserting data and checking again, it is found that the table has created the corresponding partitions: + +```sql +mysql> insert into `DAILY_TRADE_VALUE` values ('2012-12-13', 1), ('2008-02-03', 2), ('2014-11-11', 3); +Query OK, 3 rows affected (0.88 sec) + +mysql> show partitions from `DAILY_TRADE_VALUE`; ++-------------+-----------------+----------------+---------------------+--------+--------------+--------------------------------------------------------------------------------+-----------------+---------+----------------+---------------+---------------------+---------------------+--------------------------+----------+------------+-------------------------+-----------+ +| PartitionId | PartitionName | VisibleVersion | VisibleVersionTime | State | PartitionKey | Range | DistributionKey | Buckets | ReplicationNum | StorageMedium | CooldownTime | RemoteStoragePolicy | LastConsistencyCheckTime | DataSize | IsInMemory | ReplicaAllocation | IsMutable | ++-------------+-----------------+----------------+---------------------+--------+--------------+--------------------------------------------------------------------------------+-----------------+---------+----------------+---------------+---------------------+---------------------+--------------------------+----------+------------+-------------------------+-----------+ +| 180060 | p20080101000000 | 2 | 2023-09-18 21:49:29 | NORMAL | TRADE_DATE | [types: [DATEV2]; keys: [2008-01-01]; ..types: [DATEV2]; keys: [2009-01-01]; ) | TRADE_DATE | 10 | 1 | HDD | 9999-12-31 23:59:59 | | NULL | 0.000 | false | tag.location.default: 1 | true | +| 180039 | p20120101000000 | 2 | 2023-09-18 21:49:29 | NORMAL | TRADE_DATE | [types: [DATEV2]; keys: [2012-01-01]; ..types: [DATEV2]; keys: [2013-01-01]; ) | TRADE_DATE | 10 | 1 | HDD | 9999-12-31 23:59:59 | | NULL | 0.000 | false | tag.location.default: 1 | true | +| 180018 | p20140101000000 | 2 | 2023-09-18 21:49:29 | NORMAL | TRADE_DATE | [types: [DATEV2]; keys: [2014-01-01]; ..types: [DATEV2]; keys: [2015-01-01]; ) | TRADE_DATE | 10 | 1 | HDD | 9999-12-31 23:59:59 | | NULL | 0.000 | false | tag.location.default: 1 | true | ++-------------+-----------------+----------------+---------------------+--------+--------------+--------------------------------------------------------------------------------+-----------------+---------+----------------+---------------+---------------------+---------------------+--------------------------+----------+------------+-------------------------+-----------+ +3 rows in set (0.12 sec) +``` + +It can be concluded that the partitions created by auto partitioning share the same functionality as partitions created by manual partitioning. + +### Conjunct with dynamic partitioning + +In order to maintain a clear partitioning logic, Apache Doris prohibits the simultaneous use of auto partitioning and dynamic partitioning on a single table, as this usage can easily lead to misuse. It is recommended to replace this with the standalone Auto Partitioning feature. + +:::info Note +In some early versions of Doris 2.1, this functionality was not prohibited but not recommended. +::: + +### Key points + +- Similar to regular partitioned tables, aoto LIST partitioning supports multi-column partitioning with no syntax differences. +- If partitions are created during data insertion or import processes, and the entire import process is not completed (fails or is canceled), the created partitions will not be automatically deleted. +- Tables using auto partitioning only differ in the method of partition creation, switching from manual to automatic. The original usage of the table and its created partitions remains the same as non-auto partitioning tables or partitions. +- To prevent the accidental creation of too many partitions, Apache Doris controls the maximum number of partitions an auto partitioning table can accommodate through the `max_auto_partition_num setting` in the FE configuration. This value can be adjusted if needed. +- When importing data into a table with auto partitioning enabled, the coordinator sends data with a polling interval different from regular tables. Refer to `olap_table_sink_send_interval_auto_partition_factor` in [BE Configuration](../admin-manual/config/be-config.md) for details. This setting does not have an impact after `enable_memtable_on_sink_node` is enabled. +- During data insertion using `INSERT-OVERWRITE`, if a specific partition for override is specified, the auto partitioning table behaves like a regular table during this process and does not create new partitions. +- If metadata operations are involved when importing and creating partitions, the import process may fail. ## Manual bucketing @@ -891,7 +1168,7 @@ properties("estimate_partition_size" = "100G") The new configuration parameter estimate_partition_size indicates the amount of data for a single partition. This parameter is optional and if not given, Doris will take the default value of estimate_partition_size to 10GB. -As you know from the above, a partitioned bucket is a Tablet at the physical level, and for best performance, it is recommended that the Tablet size be in the range of 1GB - 10GB. So how does the automatic bucketing projection ensure that the Tablet size is within this range? +As you know from the above, a partitioned bucket is a tablet at the physical level, and for best performance, it is recommended that the tablet size be in the range of 1GB - 10GB. So how does the automatic bucketing projection ensure that the tablet size is within this range? To summarize, there are a few principles. @@ -1050,9 +1327,9 @@ According to the above algorithm, the initial number of buckets and the number o When encountering this error, it is usually because the BE encountered a problem when creating data tablets. You can troubleshoot by following these steps: - In the fe.log, search for the `Failed to create partition` log entry at the corresponding timestamp. In this log entry, you may find a series of number pairs similar to `{10001-10010}`. The first number in the pair represents the Backend ID, and the second number represents the Tablet ID. For example, this number pair indicates that the creation of Tablet ID 10010 on Backend ID 10001 failed. - - Go to the be.INFO log of the corresponding Backend and search for tablet ID-related logs within the corresponding time period to find error messages. + - Go to the be.INFO log of the corresponding Backend and search for Tablet ID-related logs within the corresponding time period to find error messages. - Here are some common tablet creation failure errors, including but not limited to: - - The BE did not receive the relevant task. In this case, you cannot find tablet ID-related logs in be.INFO or the BE reports success but actually fails. For these issues, please refer to the [Installation and Deployment](../install/cluster-deployment/standard-deployment) section to check the connectivity between FE and BE. + - The BE did not receive the relevant task. In this case, you cannot find Tablet ID-related logs in be.INFO or the BE reports success but actually fails. For these issues, please refer to the [Installation and Deployment](../install/cluster-deployment/standard-deployment) section to check the connectivity between FE and BE. - Pre-allocated memory failure. This may be because the byte length of a row in the table exceeds 100KB. - `Too many open files`. The number of open file handles exceeds the Linux system limit. You need to modify the handle limit of the Linux system. --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org For additional commands, e-mail: commits-h...@doris.apache.org