This is an automated email from the ASF dual-hosted git repository. w41ter pushed a commit to branch update_ccr_manual in repository https://gitbox.apache.org/repos/asf/doris-website.git
commit b46ff758ab03d147a77673b69a119e218b1fe4f2 Author: w41ter <w41te...@gmail.com> AuthorDate: Mon Dec 23 07:06:41 2024 +0000 Update CCR manual --- docs/admin-manual/data-admin/ccr/manual.md | 192 ++++++++++++++------- .../current/admin-manual/data-admin/ccr/manual.md | 150 ++++++++++------ 2 files changed, 229 insertions(+), 113 deletions(-) diff --git a/docs/admin-manual/data-admin/ccr/manual.md b/docs/admin-manual/data-admin/ccr/manual.md index d9cec3a7c12..a7f9ea5b0ac 100644 --- a/docs/admin-manual/data-admin/ccr/manual.md +++ b/docs/admin-manual/data-admin/ccr/manual.md @@ -24,14 +24,38 @@ specific language governing permissions and limitations under the License. --> -## Limitations +## Usage Requirements -### Network Constraints +### Network Requirements -- Syncer needs to be able to communicate with both the upstream and downstream FE (Frontend) and BE (Backend). +- The Syncer must be able to communicate with both the upstream and downstream FE (Frontend) and BE (Backend). -- The downstream BE and upstream BE are directly connected through the IP used by the Doris BE process (as seen in `show frontends/backends`). +- The downstream BE and upstream BE must use the IP of the Doris BE process (as seen in `show frontends/backends`), which must be directly accessible. +### Permission Requirements + +When syncing, the user must provide accounts for both upstream and downstream, and these accounts must have the following permissions: + +1. **Select_priv**: Read-only permissions on databases and tables. +2. **Load_priv**: Write permissions on databases and tables, including Load, Insert, Delete, etc. +3. **Alter_priv**: Permissions to modify databases and tables, including renaming databases/tables, adding/deleting/changing columns, adding/removing partitions, etc. +4. **Create_priv**: Permissions to create databases, tables, and views. +5. **drop_priv**: Permissions to delete databases, tables, and views. + +Additionally, **Admin privileges** are required (this may be removed in the future). These are needed to check the enable binlog configuration, which currently requires admin rights. + +### Version Requirements + +Minimum version required: v2.0.15 + +:::caution +**Starting from versions 2.1.8/3.0.4, the minimum supported Doris version for the ccr syncer is 2.1. Version 2.0 will no longer be supported.** +::: + +#### Versions Not Recommended for Use + +Doris Versions: +- 2.1.5/2.0.14: If upgrading from previous versions to these two versions, and the user has a drop partition operation, an NPE may occur during upgrade or restart. This is due to a new field introduced in these versions, which older versions don't have, causing a default value of null. This issue was fixed in 2.1.6/2.0.15. ## Start Syncer @@ -58,7 +82,7 @@ output_dir **Start options** -**--daemon** +**--daemon** Run Syncer in the background, set to false by default. @@ -66,7 +90,7 @@ Run Syncer in the background, set to false by default. bash bin/start_syncer.sh --daemon ``` -**--db_type** +**--db_type** Syncer can currently use two databases to store its metadata, `sqlite3 `(for local storage) and `mysql `(for local or remote storage). @@ -78,7 +102,7 @@ The default value is sqlite3. When using MySQL to store metadata, Syncer will use `CREATE IF NOT EXISTS `to create a database called `ccr`, where the metadata table related to CCR will be saved. -**--db_dir** +**--db_dir** **This option only works when db uses `sqlite3`.** @@ -100,9 +124,9 @@ bash bin/start_syncer.sh --db_host 127.0.0.1 --db_port 3306 --db_user root --db_ The default values of db_host and db_port are shown in the example. The default values of db_user and db_password are empty. -**--log_dir** +**--log_dir** -Output path of the logs: +Output path of the logs: ```SQL bash bin/start_syncer.sh --log_dir /path/to/ccr_syncer.log @@ -110,7 +134,7 @@ bash bin/start_syncer.sh --log_dir /path/to/ccr_syncer.log The default path is`SYNCER_OUTPUT_DIR/log` and the default file name is `ccr_syncer.log`. -**--log_level** +**--log_level** Used to specify the output level of Syncer logs. @@ -134,7 +158,7 @@ Under --daemon, the default value of log_level is `info`. When running in the foreground, log_level defaults to `trace`, and logs are saved to log_dir using the tee command. -**--host && --port** +**--host && --port** Used to specify the host and port of Syncer, where host only plays the role of distinguishing itself in the cluster, which can be understood as the name of Syncer, and the name of Syncer in the cluster is `host: port`. @@ -144,7 +168,7 @@ bash bin/start_syncer.sh --host 127.0.0.1 --port 9190 The default value of host is 127.0.0.1, and the default value of port is 9190. -**--pid_dir** +**--pid_dir** Used to specify the storage path of the pid file @@ -195,7 +219,7 @@ Specify the names of the pid files to be stopped, wrap the names in `""` and sep Follow the default configurations. -**--pid_dir** +**--pid_dir** Specify the directory where the pid file is located. The above three stopping methods all depend on the directory where the pid file is located for execution. @@ -207,7 +231,7 @@ The effect of the above example is to close the Syncer corresponding to all pid The default value is `SYNCER_OUTPUT_DIR/bin`. -**--host && --port** +**--host && --port** Stop the Syncer corresponding to host: port in the pid_dir path. @@ -217,7 +241,7 @@ bash bin/stop_syncer.sh --host 127.0.0.1 --port 9190 The default value of host is 127.0.0.1, and the default value of port is empty. That is, specifying the host alone will degrade **method 1** to **method 3**. **Method 1** will only take effect when neither the host nor the port is empty. -**--files** +**--files** Stop the Syncer corresponding to the specified pid file name in the pid_dir path. @@ -298,7 +322,7 @@ The job_name is the name specified when create_ccr. ```shell curl -X POST -H "Content-Type: application/json" -d '{ "name": "job_name" -}' http://ccr_syncer_host:ccr_syncer_port/pause +}' http://ccr_syncer_host:ccr_syncer_port/pause ``` ### Resume Job @@ -388,74 +412,114 @@ output_dir bash bin/enable_db_binlog.sh -h host -p port -u user -P password -d db ``` -## High availability of Syncer +## Syncer High Availability -The high availability of Syncer relies on MySQL. If MySQL is used as the backend storage, the Syncer can discover other Syncers. If one Syncer crashes, the others will take over its tasks. +Syncer high availability relies on MySQL. If MySQL is used as the backend storage, Syncer can detect other Syncers. If one crashes, others will take over its tasks. -## Privilege requirements +## Usage Notes -1. `select_priv`: read-only privileges for databases and tables -2. `load_priv`: write privileges for databases and tables, including load, insert, delete, etc. -3. `alter_priv`: privilege to modify databases and tables, including renaming databases/tables, adding/deleting/changing columns, adding/deleting partitions, etc. -4. `create_priv`: privilege to create databases, tables, and views -5. `drop_priv`: privilege to drop databases, tables, and views +### `IS_BEING_SYNCED` Attribute -Admin privileges are required (We are planning on removing this in future versions). This is used to check the `enable binlog config`. +When the CCR (Cluster-to-Cluster Replication) feature is enabled, a replica table (referred to as the target table, located in the target cluster) is created in the target cluster for each table in the source cluster’s sync scope (referred to as the source table, located in the source cluster). However, some features and attributes need to be disabled or erased during the creation of the replica table to ensure the correctness of the sync process. -## Feature +For example: -### Rate limit +- The source table may contain information that might not be synced to the target cluster, such as `storage_policy`, which could cause the target table creation to fail or behave abnormally. +- The source table may include dynamic features, such as dynamic partitions, which could result in behavior inconsistencies in the target table, causing partitions to be inconsistent. -BE-side configuration parameter - -```shell -download_binlog_rate_limit_kbs=1024 # Limits the download speed of Binlog (including Local Snapshot) from the source cluster to 1 MB/s in a single BE node -``` - -1. The `download_binlog_rate_limit_kbs` parameter is configured on the BE nodes of the source cluster. By setting this parameter, the data pull rate can be effectively limited. - -2. The `download_binlog_rate_limit_kbs` parameter primarily controls the speed of data transfer for each single BE node. To calculate the overall cluster rate, one would multiply the parameter value by the number of nodes in the cluster. - - -## IS_BEING_SYNCED - -:::tip -Doris v2.0 "is_being_synced" = "true" -::: - -During data synchronization using CCR, replica tables (referred to as target tables) are created in the target cluster for the tables within the synchronization scope of the source cluster (referred to as source tables). However, certain functionalities and attributes need to be disabled or cleared when creating replica tables to ensure the correctness of the synchronization process. For example: - -- The source tables may contain information that is not synchronized to the target cluster, such as `storage_policy`, which may cause the creation of the target table to fail or result in abnormal behavior. -- The source tables may have dynamic functionalities, such as dynamic partitioning, which can lead to uncontrolled behavior in the target table and result in inconsistent partitions. - -The attributes that need to be cleared during replication are: +Attributes that need to be erased due to invalidation during replication include: - `storage_policy` - `colocate_with` -The functionalities that need to be disabled during synchronization are: +Features that need to be disabled during synchronization include: - Automatic bucketing -- Dynamic partitioning +- Dynamic partitions -### Implementation +#### Implementation -When creating the target table, the Syncer controls the addition or deletion of the `is_being_synced` property. In CCR, there are two approaches to creating a target table: +When creating the target table, these attributes will be controlled by Syncer, either added or removed. In the CCR functionality, there are two ways to create a target table: -1. During table synchronization, the Syncer performs a full copy of the source table using backup/restore to obtain the target table. -2. During database synchronization, for existing tables, the Syncer also uses backup/restore to obtain the target table. For incremental tables, the Syncer creates the target table using the CreateTableRecord binlog. +1. During table synchronization, Syncer performs a full copy of the source table using backup/restore to create the target table. +2. During database synchronization, for existing tables, Syncer also uses backup/restore to create the target table. For incremental tables, Syncer creates the target table via binlog containing CreateTableRecord. -Therefore, there are two entry points for inserting the `is_being_synced` property: the restore process during full synchronization and the getDdlStmt during incremental synchronization. +Thus, there are two points of insertion for adding the `is_being_synced` attribute: during the restore process in full synchronization and during incremental synchronization via `getDdlStmt`. -During the restoration process of full synchronization, the Syncer initiates a restore operation of the snapshot from the source cluster via RPC. During this process, the `is_being_synced` property is added to the RestoreStmt and takes effect in the final restoreJob, executing the relevant logic for `is_being_synced`. +During the restore process in full synchronization, Syncer triggers a restore of the snapshot in the original cluster via RPC. In this process, it will add the `is_being_synced` attribute to the RestoreStmt, which will be applied in the final `restoreJob`, executing the related `isBeingSynced` logic. In incremental synchronization, the `getDdlStmt` method will be enhanced with a `boolean getDdlForSync` parameter to distinguish if it’s a controlled transformation into target table DDL, an [...] -During incremental synchronization, add the `boolean getDdlForSync` parameter to the getDdlStmt method to differentiate whether it is a controlled transformation to the target table DDL, and execute the relevant logic for isBeingSynced during the creation of the target table. +The erasure of invalid attributes needs no further explanation, but the disabling of the above features requires clarification: -Regarding the disabling of the functionalities mentioned above: +- **Automatic Bucketing**: Automatic bucketing is applied when creating the table to compute the appropriate number of buckets. This might cause the source table and the target table to have a different number of buckets. Therefore, during synchronization, the bucket count of the source table is retrieved, and it’s also necessary to check whether the source table is an automatically bucketed table so that the feature can be restored after synchronization. The current approach sets the au [...] +- **Dynamic Partitions**: Dynamic partitions are controlled by adding `olapTable.isBeingSynced()` to the condition for performing add/drop partition operations. This ensures that during synchronization, the target table does not periodically perform add/drop partition operations. -- Automatic bucketing: Automatic bucketing is enabled when creating a table. It calculates the appropriate number of buckets. This may result in a mismatch in the number of buckets between the source and target tables. Therefore, during synchronization, obtain the number of buckets from the source table, as well as the information about whether the source table is an automatic bucketing table in order to restore the functionality after synchronization. The current recommended approach is [...] -- Dynamic partitioning: This is implemented by adding `olapTable.isBeingSynced()` to the condition for executing add/drop partition operations. This ensures that the target table does not perform periodic add/drop partition operations during synchronization. +:::caution -### Note +Under normal circumstances, the `is_being_synced` attribute should be entirely controlled by Syncer, and users should not modify this attribute manually. + +::: -The `is_being_synced` property should be fully controlled by the Syncer, and users should not modify this property manually unless there are exceptional circumstances. +### Recommended Configuration Settings + +- `restore_reset_index_id`: If the table to be synced contains an inverted index, this must be set to `false` on the target cluster. +- `ignore_backup_tmp_partitions`: If the upstream creates temporary partitions, Doris will prohibit performing backups, causing the ccr-syncer synchronization to break. This can be avoided by setting `ignore_backup_tmp_partitions=true` in the FE configuration. + +### Notes + +- During CCR synchronization, both backup/restore jobs and binlogs are stored in FE memory. Therefore, it is recommended to allocate at least 4GB of heap memory per CCR job (for both the source and target clusters). Additionally, consider modifying the following configurations to reduce memory consumption from unrelated jobs: + - Modify FE configuration `max_backup_restore_job_num_per_db`: + This configures the number of backup/restore jobs per DB stored in memory. The default value is 10, and setting it to 2 should suffice. + - Modify the source cluster's DB/table properties to set binlog retention limits: + - `binlog.max_bytes`: Maximum memory usage for binlogs. It is recommended to keep at least 4GB (default is unlimited). + - `binlog.ttl_seconds`: Binlog retention time. In versions prior to 2.0.5, the default is unlimited; in later versions, the default is one day (86400 seconds). + For example, to set binlog TTL to one hour: `ALTER TABLE table SET ("binlog.ttl_seconds"="3600")` +- The correctness of CCR also depends on the transaction status in the target cluster. To ensure transactions are not prematurely reclaimed during synchronization, the following configurations should be increased: + - `label_num_threshold`: Controls the number of TXN labels. + - `stream_load_default_timeout_second`: Controls the TXN timeout duration. + - `label_keep_max_second`: Controls the retention time after TXN ends. + - `streaming_label_keep_max_second`: Same as above. +- If it's a database synchronization and the source cluster has many tablets, the resulting CCR jobs may be very large. In this case, several FE configurations need to be adjusted: + - `max_backup_tablets_per_job`: The maximum number of tablets involved in a single backup job. Adjust this based on the tablet count (default is 300,000). Too many tablets could risk FE OOM (Out of Memory), so consider reducing the tablet count if possible. + - `thrift_max_message_size`: Maximum RPC packet size allowed by the FE thrift server. The default is 100MB. If the snapshot info exceeds 100MB due to too many tablets, this limit needs to be adjusted (maximum is 2GB). + - The snapshot info size can be found in the CCR syncer logs, using the keyword: `snapshot response meta size: %d, job info size: %d`. The snapshot info size is approximately `meta size + job info size`. + - `fe_thrift_max_pkg_bytes`: Another parameter for RPC packet size, which needs to be adjusted in version 2.0. The default value is 20MB. + - `restore_download_task_num_per_be`: The maximum number of download tasks per BE. The default is 3, which may be too small for restore jobs. It should be adjusted to 0 (i.e., disable this limit). From versions 2.1.8 and 3.0.4, this configuration is no longer needed. + - `backup_upload_task_num_per_be`: The maximum number of upload tasks per BE. The default is 3, which may be too small for backup jobs. It should be adjusted to 0 (i.e., disable this limit). From versions 2.1.8 and 3.0.4, this configuration is no longer needed. + - In addition to the above FE configurations, if the CCR job's DB type is MySQL, some MySQL configurations need to be adjusted: + - MySQL server limits the size of the data packet returned/inserted in a single `select/insert`. Increase the following configuration to relax this limit, for example, adjusting it to the maximum of 1GB: + ```ini + [mysqld] + max_allowed_packet = 1024MB + ``` + - MySQL client also has this limit. In CCR syncer versions 2.1.6/2.0.15 and earlier, the limit is 128MB. In later versions, this can be adjusted via the `--mysql_max_allowed_packet` parameter (in bytes). The default value is 1024MB. + > Note: Starting from versions 2.1.8 and 3.0.4, CCR syncer no longer stores snapshot info in the DB, so the default data packet size is already sufficient. +- Similarly, BE-side configurations need to be adjusted: + - `thrift_max_message_size`: The maximum RPC packet size allowed by the BE thrift server. The default is 100MB. If the agent task size exceeds 100MB due to too many tablets, this limit needs to be adjusted (maximum is 2GB). + - `be_thrift_max_pkg_bytes`: The same parameter as above, only needs adjustment in version 2.0. The default value is 20MB. +- Even after modifying the above configurations, as the number of tablets increases, the resulting snapshot size might exceed 2GB, which is the threshold for Doris FE edit log and RPC message size, leading to synchronization failure. Starting from versions 2.1.8 and 3.0.4, Doris can compress the snapshot to further increase the number of tablets supported during backup and restore. Compression can be enabled with the following parameters: + - `restore_job_compressed_serialization`: Enable compression for restore jobs (affects metadata compatibility, default is off). + - `backup_job_compressed_serialization`: Enable compression for backup jobs (affects metadata compatibility, default is off). + - `enable_restore_snapshot_rpc_compression`: Enable compression for snapshot info, mainly affecting RPC (default is on). + > Note: Since identifying whether a backup/restore job is compressed requires additional code, and versions before 2.1.8 and 3.0.4 do not include this code, once a backup/restore job is generated, it cannot be rolled back to an earlier Doris version. There are two exceptions: Backup/restore jobs that are already canceled or finished will not be compressed. Therefore, waiting for the job to finish or canceling the job before rolling back will allow safe rollback. +- Inside CCR, the database/table name is used as part of the internal job label. Therefore, if a CCR job exceeds the label length limit, the FE parameter `label_regex_length` can be adjusted to relax this limit (default value is 128). +- Since backup does not currently support backing up tables with cooldown tablets, encountering such a table will cause synchronization to fail. Therefore, ensure that the `storage_policy` attribute is not set on any table before creating the CCR job. + +### Performance-Related Parameters + +- If the user's data volume is very large, and the time required for backup and restore exceeds one day (the default value), the following parameters need to be adjusted as needed: + - `backup_job_default_timeout_ms`: Timeout for backup/restore tasks. This needs to be configured on the FE of both the source and target clusters. + - Modify upstream binlog retention time: `ALTER DATABASE $db SET PROPERTIES ("binlog.ttl_seconds" = "xxxx")` + +- Downstream BE download speed is slow: + - `max_download_speed_kbps`: Maximum download speed limit for a single download thread on a downstream BE, the default is 50MB/s. + - `download_worker_count`: Number of threads performing download tasks on the downstream BE, the default is 1. This should be adjusted based on the customer’s machine type. The goal is to increase the thread count to the maximum without affecting normal read/write operations. If this parameter is adjusted, there's no need to modify `max_download_speed_kbps`. + - For example, if the customer's network card can provide a maximum bandwidth of 1GB, and the maximum allowed download speed per thread is 200MB, then `download_worker_count` should be set to 4, without changing the `max_download_speed_kbps`. + +- Limit downstream BE's binlog download speed: + - BE-side configuration parameter: + ```shell + download_binlog_rate_limit_kbs=1024 # Limit the speed of binlog (including Local Snapshot) download from the source cluster to 1MB/s for each BE node. + ``` + Detailed parameters and explanations: + 1. The `download_binlog_rate_limit_kbs` parameter is configured on the source cluster BE nodes. Setting this parameter effectively limits the data pull speed. + 2. The `download_binlog_rate_limit_kbs` parameter mainly controls the speed of a single BE node. If the overall speed of the cluster is calculated, the parameter value should be multiplied by the number of BE nodes in the cluster. \ No newline at end of file diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/data-admin/ccr/manual.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/data-admin/ccr/manual.md index 819c923561a..2ab32e928e2 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/data-admin/ccr/manual.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/data-admin/ccr/manual.md @@ -24,14 +24,38 @@ specific language governing permissions and limitations under the License. --> -## 使用限制 +## 使用要求 -### 网络约束 +### 网络要求 - 需要 Syncer 与上下游的 FE 和 BE 是互通的 - 下游 BE 与上游 BE 通过 Doris BE 进程使用的 IP (`show frontends/backends` 看到的) 是直通的。 +### 权限要求 + +Syncer 同步时需要用户提供上下游的账户,该账户需要拥有下述权限: + +1. Select_priv 对数据库、表的只读权限。 +2. Load_priv 对数据库、表的写权限。包括 Load、Insert、Delete 等。 +3. Alter_priv 对数据库、表的更改权限。包括重命名 库/表、添加/删除/变更 列、添加/删除 分区等操作。 +4. Create_priv 创建数据库、表、视图的权限。 +5. drop_priv 删除数据库、表、视图的权限。 + +此外还需要加上 Admin 权限 (之后考虑彻底移除), 这个是用来检测 enable binlog config 的,现在需要 admin 权限。 + +### 版本要求 + +版本最低要求:v2.0.15 + +:::caution +**从 2.1.8/3.0.4 开始,ccr syncer 支持的最小 doris 版本是 2.1,2.0 版本将不再支持。** +::: + +#### 不建议使用版本 + +Doris 版本 +- 2.1.5/2.0.14:如果从之前的版本升级到这两个版本,且用户有 drop partition 操作,那么会在升级、重启时碰到 NPE,原因是这个版本引入了一个新字段,旧版本没有所以默认值为 null。这个问题在 2.1.6/2.0.15 修复。 ## 启动 Syncer @@ -402,76 +426,35 @@ bash bin/enable_db_binlog.sh -h host -p port -u user -P password -d db ## Syncer 高可用 -Syncer 高可用依赖 mysql,如果使用 mysql 作为后端存储,Syncer 可以发现其它 Syncer,如果一个 crash 了,其他会分担他的任务 - -## 权限要求 - -1. Select_priv 对数据库、表的只读权限。 - -2. Load_priv 对数据库、表的写权限。包括 Load、Insert、Delete 等。 - -3. Alter_priv 对数据库、表的更改权限。包括重命名 库/表、添加/删除/变更 列、添加/删除 分区等操作。 - -4. Create_priv 创建数据库、表、视图的权限。 - -5. drop_priv 删除数据库、表、视图的权限。 - -加上 Admin 权限 (之后考虑彻底移除), 这个是用来检测 enable binlog config 的,现在需要 admin - - -### 版本要求 - -版本最低要求:v2.0.15 - - -## Feature - -### 限速 - -BE 端配置参数: - -```shell -download_binlog_rate_limit_kbs=1024 # 限制单个 BE 节点从源集群拉取 Binlog(包括 Local Snapshot)的速度为 1 MB/s -``` +Syncer 高可用依赖 mysql,如果使用 mysql 作为后端存储,Syncer 可以发现其它 Syncer,如果一个 crash 了,其他会分担它的任务。 -详细参数加说明: -1. `download_binlog_rate_limit_kbs` 参数在源集群 BE 节点配置,通过设置该参数能够有效限制数据拉取速度。 +## 使用须知 -2. `download_binlog_rate_limit_kbs` 参数主要用于设置单个 BE 节点的速度,若计算集群整体速率一般需要参数值乘以集群个数。 - - - -## IS_BEING_SYNCED 属性 - -从 Doris v2.0 "is_being_synced" = "true" +### IS_BEING_SYNCED 属性 CCR 功能在建立同步时,会在目标集群中创建源集群同步范围中表(后称源表,位于源集群)的副本表(后称目标表,位于目标集群),但是在创建副本表时需要失效或者擦除一些功能和属性以保证同步过程中的正确性。 如: - 源表中包含了可能没有被同步到目标集群的信息,如`storage_policy`等,可能会导致目标表创建失败或者行为异常。 - - 源表中可能包含一些动态功能,如动态分区等,可能导致目标表的行为不受 Syncer 控制导致 partition 不一致。 在被复制时因失效而需要擦除的属性有: - `storage_policy` - - `colocate_with` 在被同步时需要失效的功能有: - 自动分桶 - - 动态分区 -### 实现 +#### 实现 在创建目标表时,这条属性将会由 Syncer 控制添加或者删除,在 CCR 功能中,创建一个目标表有两个途径: 1. 在表同步时,Syncer 通过 backup/restore 的方式对源表进行全量复制来得到目标表。 - 2. 在库同步时,对于存量表而言,Syncer 同样通过 backup/restore 的方式来得到目标表,对于增量表而言,Syncer 会通过携带有 CreateTableRecord 的 binlog 来创建目标表。 综上,对于插入`is_being_synced`属性有两个切入点:全量同步中的 restore 过程和增量同步时的 getDdlStmt。 @@ -481,9 +464,78 @@ CCR 功能在建立同步时,会在目标集群中创建源集群同步范围 对于失效属性的擦除无需多言,对于上述功能的失效需要进行说明: - 自动分桶 自动分桶会在创建表时生效,计算当前合适的 bucket 数量,这就可能导致源表和目的表的 bucket 数目不一致。因此在同步时需要获得源表的 bucket 数目,并且也要获得源表是否为自动分桶表的信息以便结束同步后恢复功能。当前的做法是在获取 distribution 信息时默认 autobucket 为 false,在恢复表时通过检查`_auto_bucket`属性来判断源表是否为自动分桶表,如是则将目标表的 autobucket 字段设置为 true,以此来达到跳过计算 bucket 数量,直接应用源表 bucket 数量的目的。 - - 动态分区 动态分区则是通过将`olapTable.isBeingSynced()`添加到是否执行 add/drop partition 的判断中来实现的,这样目标表在被同步的过程中就不会周期性的执行 add/drop partition 操作。 -### 注意 +:::caution 在未出现异常时,`is_being_synced`属性应该完全由 Syncer 控制开启或关闭,用户不要自行修改该属性。 + +::: + +### 建议打开的配置 + +- `restore_reset_index_id`:如果要同步的表中带有 inverted index,那么必须在目标集群上配置为 `false`。 +- `ignore_backup_tmp_partitions`:如果上游有创建 tmp partition,那么 doris 会禁止做 backup,因此 ccr-syncer 同步会中断;通过在 FE 设置 `ignore_backup_tmp_partitions=true` 可以避免这个问题。 + +### 注意事项 + +- CCR 同步期间 backup/restore job 和 binlogs 都在 FE 内存中,因此建议在 FE 给每个 ccr job 都留出 4GB 及以上的堆内存(源和目标集群都需要),同时注意修改下列配置减少无关 job 对内存的消耗: + - 修改 FE 配置 `max_backup_restore_job_num_per_db`: + 记录在内存中的每个 DB 的 backup/restore job 数量。默认值是 10,设置为 2 就可以了。 + - 修改源集群 db/table property,设置 binlog 保留限制 + - `binlog.max_bytes`: binlog 最大占用内存,建议至少保留 4GB(默认无限制) + - `binlog.ttl_seconds`: binlog 保留时间,从 2.0.5 之前的老版本默认无限制;之后的版本默认值为一天(86400) + 比如要修改 binlog ttl seconds 为保留一个小时: `ALTER TABLE table SET ("binlog.ttl_seconds"="3600")` +- CCR 正确性依也赖于目标集群的事务状态,因此要保证在同步过程中事务不会过快被回收,需要调大下列配置 + - `label_num_threshold`:用于控制 TXN Label 数量 + - `stream_load_default_timeout_second`:用于控制 TXN 超时时间 + - `label_keep_max_second`: 用于控制 TXN 结束后保留时间 + - `streaming_label_keep_max_second`:同上 +- 如果是 db 同步且源集群的 tablet 数量较多,那么产生的 ccr job 可能非常大,需要修改几个 FE 的配置: + - `max_backup_tablets_per_job`: + 一次 backup 任务涉及的 tablet 上限,需要根据 tablet 数量调整(默认值为 30w,过多的 tablet 数量会有 FE OOM 风险,优先考虑能否降低 tablet 数量) + - `thrift_max_message_size`: + FE thrift server 允许的单次 RPC packet 上限,默认值为 100MB,如果 tablet 数量太多导致 snapshot info 大小超过 100MB,则需要调整该限制,最大 2GB + - Snapshot info 大小可以从 ccr syncer 日志中找到,关键字:`snapshot response meta size: %d, job info size: %d`,snapshot info 大小大约是 meta size + job info size。 + - `fe_thrift_max_pkg_bytes`: + 同上,一个额外的参数,2.0 中需要调整,默认值为 20MB + - `restore_download_task_num_per_be`: + 发送给每个 BE download task 数量上限,默认值是 3,对 restore job 来说太小了,需要调整为 0(也就是关闭这个限制); 2.1.8 和 3.0.4 起不再需要这个配置。 + - `backup_upload_task_num_per_be`: + 发送给每个 BE upload task 数量上限,默认值是 3,对 backup job 来说太小了,需要调整为 0 (也就是关闭这个限制);2.1.8 和 3.0.4 起不再需要这个配置。 + - 除了上述 FE 的配置外,如果 ccr job 的 db type 是 mysql,还需要调整 mysql 的一些配置: + - mysql 服务端会限制单次 select/insert 返回/插入数据包的大小。增加下列配置以放松该限制,比如调整到上限 1GB + ``` + [mysqld] + max_allowed_packet = 1024MB + ``` + - mysql client 也会有该限制,在 ccr syncer 2.1.6/2.0.15 及之前的版本,上限为 128MB;之后的版本可以通过参数 `--mysql_max_allowed_packet` 调整(单位 bytes),默认值为 1024MB + > 注:在 2.1.8 和 3.0.4 以后,ccr syncer 不再将 snapshot info 保存在 db 中,因此默认的数据包大小已经足够了。 +- 同上,BE 端也需要修改几个配置 + - `thrift_max_message_size`: BE thrift server 允许的单次 RPC packet 上限,默认值为 100MB,如果 tablet 数量太多导致 agent task 大小超过 100MB,则需要调整该限制,最大 2GB + - `be_thrift_max_pkg_bytes`:同上,只有 2.0 中需要调整的参数,默认值为 20MB +- 即使修改了上述配置,当 tablet 继续上升时,产生的 snapshot 大小可能会超过 2GB,也就是 doris FE edit log 和 RPC message size 的阈值,导致同步失败。从 2.1.8 和 3.0.4 开始,doris 可以通过压缩 snapshot 来进一步提高备份恢复支持的 tablet 数量。可以通过下面几个参数开启压缩: + - `restore_job_compressed_serialization`: 开启对 restore job 的压缩(影响元数据兼容性,默认关闭) + - `backup_job_compressed_serialization`: 开启对 backup job 的压缩(影响元数据兼容性,默认关闭) + - `enable_restore_snapshot_rpc_compression`: 开启对 snapshot info 的压缩,主要影响 RPC(默认开启) + > 注:由于识别 backup/restore job 是否压缩需要额外的代码,而 2.1.8 和 3.0.4 之前的代码中不包含相关代码,因此一旦有 backup/restore job 生成,那么就无法回退到更早的 doris 版本。有两种情况例外:已经 cancel 或者 finished 的 backup/restore job 不会被压缩,因此在回退前等待 backup/restore job 完成或者主动取消 job 后,就能安全回退。 +- Ccr 内部会使用 db/table 名作为一些内部 job 的 label,因此如果 ccr job 中碰到了 label 超过限制了,可以调整 FE 参数 `label_regex_length` 来放松该限制(默认值为 128) +- 由于 backup 暂时不支持备份带有 cooldown tablet 的表,如果碰到了会导致同步终端,因此需要在创建 ccr job 前检查是否有 table 设置了 `storage_policy` 属性。 + +### 性能相关参数 + +- 如果用户的数据量非常大,备份、恢复执行完需要的时间可能会超过一天(默认值),那么需要按需调整下列参数 + - `backup_job_default_timeout_ms` 备份/恢复任务超时时间,源、目标集群的 FE 都需要配置 + - 上游修改 binlog 保留时间: `ALTER DATABASE $db SET PROPERTIES ("binlog.ttl_seconds" = "xxxx")` +- 下游 BE 下载速度慢 + - `max_download_speed_kbps` 下游单个 BE 中单个下载线程的下载限速,默认值为 50MB/s + - `download_worker_count` 下游执行下载任务的线程数,默认值为 1;需要结合客户机型调整,在不影响客户正常读写时跳到最大;如果调整了这个参数,就可以不用调整 `max_download_speed_kbps`。 + - 比如客户机器网卡最大提供 1GB 的带宽,现在最大允许下载线程利用 200MB 的带宽,那么在不改变 `max_download_speed_kbps` 的情况下,`download_worker_count` 应该配置成 4。 +- 限制下游 BE 下载 binlog 速度 + BE 端配置参数: + ```shell + download_binlog_rate_limit_kbs=1024 # 限制单个 BE 节点从源集群拉取 Binlog(包括 Local Snapshot)的速度为 1 MB/s + ``` + 详细参数加说明: + 1. `download_binlog_rate_limit_kbs` 参数在源集群 BE 节点配置,通过设置该参数能够有效限制数据拉取速度。 + 2. `download_binlog_rate_limit_kbs` 参数主要用于设置单个 BE 节点的速度,若计算集群整体速率一般需要参数值乘以集群个数。 --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org For additional commands, e-mail: commits-h...@doris.apache.org