This is an automated email from the ASF dual-hosted git repository. dataroaring pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/doris-website.git
The following commit(s) were added to refs/heads/master by this push: new ec83e9d31d [improvement](ccr) improve overview of ccr (#1427) ec83e9d31d is described below commit ec83e9d31d61169721c6241727fd08c9e5b400af Author: Yongqiang YANG <yangyongqi...@selectdb.com> AuthorDate: Mon Dec 2 19:33:19 2024 +0800 [improvement](ccr) improve overview of ccr (#1427) ## Versions - [x] dev - [ ] 3.0 - [ ] 2.1 - [ ] 2.0 ## Languages - [x] Chinese - [x] English ## Docs Checklist - [ ] Checked by AI - [ ] Test Cases Built --------- Co-authored-by: Yongqiang YANG <yangyogqi...@selectdb.com> --- docs/admin-manual/data-admin/ccr/manual.md | 84 ++- docs/admin-manual/data-admin/ccr/overview.md | 602 ++------------------- docs/admin-manual/data-admin/ccr/quickstart.md | 12 +- .../current/admin-manual/data-admin/ccr/manual.md | 73 +-- .../admin-manual/data-admin/ccr/overview.md | 69 ++- .../admin-manual/data-admin/ccr/quickstart.md | 10 +- 6 files changed, 162 insertions(+), 688 deletions(-) diff --git a/docs/admin-manual/data-admin/ccr/manual.md b/docs/admin-manual/data-admin/ccr/manual.md index a425e28615..d9cec3a7c1 100644 --- a/docs/admin-manual/data-admin/ccr/manual.md +++ b/docs/admin-manual/data-admin/ccr/manual.md @@ -24,9 +24,18 @@ specific language governing permissions and limitations under the License. --> -## Start syncer +## Limitations -Start syncer according to the configurations and save a pid file in the default or specified path. The name of the pid file should follow `host_port.pid`. +### Network Constraints + +- Syncer needs to be able to communicate with both the upstream and downstream FE (Frontend) and BE (Backend). + +- The downstream BE and upstream BE are directly connected through the IP used by the Doris BE process (as seen in `show frontends/backends`). + + +## Start Syncer + +Start Syncer according to the configurations and save a pid file in the default or specified path. The name of the pid file should follow `host_port.pid`. **Output file structure** @@ -51,7 +60,7 @@ output_dir **--daemon** -Run syncer in the background, set to false by default. +Run Syncer in the background, set to false by default. ```SQL bash bin/start_syncer.sh --daemon @@ -67,7 +76,7 @@ bash bin/start_syncer.sh --db_type mysql The default value is sqlite3. -When using MySQL to store metadata, syncer will use `CREATE IF NOT EXISTS `to create a database called `ccr`, where the metadata table related to CCR will be saved. +When using MySQL to store metadata, Syncer will use `CREATE IF NOT EXISTS `to create a database called `ccr`, where the metadata table related to CCR will be saved. **--db_dir** @@ -103,7 +112,7 @@ The default path is`SYNCER_OUTPUT_DIR/log` and the default file name is `ccr_syn **--log_level** -Used to specify the output level of syncer logs. +Used to specify the output level of Syncer logs. ```SQL bash bin/start_syncer.sh --log_level info @@ -127,7 +136,7 @@ When running in the foreground, log_level defaults to `trace`, and logs are save **--host && --port** -Used to specify the host and port of syncer, where host only plays the role of distinguishing itself in the cluster, which can be understood as the name of syncer, and the name of syncer in the cluster is `host: port`. +Used to specify the host and port of Syncer, where host only plays the role of distinguishing itself in the cluster, which can be understood as the name of Syncer, and the name of Syncer in the cluster is `host: port`. 
```SQL bash bin/start_syncer.sh --host 127.0.0.1 --port 9190 @@ -139,7 +148,7 @@ The default value of host is 127.0.0.1, and the default value of port is 9190. Used to specify the storage path of the pid file -The pid file is the credentials for closing the syncer. It is used in the stop_syncer.sh script. It saves the corresponding syncer process number. In order to facilitate management of syncer, you can specify the storage path of the pid file. +The pid file is the credentials for closing the Syncer. It is used in the stop_syncer.sh script. It saves the corresponding Syncer process number. In order to facilitate management of Syncer, you can specify the storage path of the pid file. ```SQL bash bin/start_syncer.sh --pid_dir /path/to/pids @@ -147,9 +156,9 @@ bash bin/start_syncer.sh --pid_dir /path/to/pids The default value is `SYNCER_OUTPUT_DIR/bin`. -## Stop syncer +## Stop Syncer -Stop the syncer according to the process number in the pid file under the default or specified path. The name of the pid file should follow `host_port.pid`. +Stop the Syncer according to the process number in the pid file under the default or specified path. The name of the pid file should follow `host_port.pid`. **Output file structure** @@ -172,17 +181,17 @@ output_dir **Stop options** -Syncers can be stopped in three ways: +Syncer can be stopped in three ways: -1. Stop a single syncer in the directory +1. Stop a single Syncer in the directory -Specify the host and port of the syncer to be stopped. Be sure to keep it consistent with the host specified when start_syncer +Specify the host and port of the Syncer to be stopped. Be sure to keep it consistent with the host specified when start_syncer -2. Batch stop the specified syncers in the directory +2. Batch stop the specified Syncer in the directory Specify the names of the pid files to be stopped, wrap the names in `""` and separate them with spaces. -3. Stop all syncers in the directory +3. Stop all Syncers in the directory Follow the default configurations. @@ -194,13 +203,13 @@ Specify the directory where the pid file is located. The above three stopping me bash bin/stop_syncer.sh --pid_dir /path/to/pids ``` -The effect of the above example is to close the syncers corresponding to all pid files under `/path/to/pids `( **method 3** ). `-- pid_dir `can be used in combination with the above three syncer stopping methods. +The effect of the above example is to close the Syncer corresponding to all pid files under `/path/to/pids `( **method 3** ). `-- pid_dir `can be used in combination with the above three Syncer stopping methods. The default value is `SYNCER_OUTPUT_DIR/bin`. **--host && --port** -Stop the syncer corresponding to host: port in the pid_dir path. +Stop the Syncer corresponding to host: port in the pid_dir path. ```shell bash bin/stop_syncer.sh --host 127.0.0.1 --port 9190 @@ -210,7 +219,7 @@ The default value of host is 127.0.0.1, and the default value of port is empty. **--files** -Stop the syncer corresponding to the specified pid file name in the pid_dir path. +Stop the Syncer corresponding to the specified pid file name in the pid_dir path. ```shell bash bin/stop_syncer.sh --files "127.0.0.1_9190.pid 127.0.0.1_9191.pid" @@ -228,7 +237,7 @@ curl -X POST -H "Content-Type: application/json" -d {json_body} http://ccr_synce json_body: send operation information in JSON format -operator: different operations for syncer +operator: different operations for Syncer The interface returns JSON. If successful, the "success" field will be true. 
Conversely, if there is an error, it will be false, and then there will be an `ErrMsgs` field. @@ -269,7 +278,7 @@ curl -X POST -H "Content-Type: application/json" -d '{ - name: the name of the CCR synchronization task, should be unique - host, port: correspond to the host and mysql (jdbc) port of the cluster's master - thrift_port: corresponds to the rpc_port of the FE -- user, password: the credentials used by the syncer to initiate transactions, fetch data, etc. +- user, password: the credentials used by the Syncer to initiate transactions, fetch data, etc. - database, table: - If it is a database-level synchronization, fill in the database name and leave the table name empty. - If it is a table-level synchronization, specify both the database name and the table name. @@ -379,9 +388,9 @@ output_dir bash bin/enable_db_binlog.sh -h host -p port -u user -P password -d db ``` -## High availability of syncer +## High availability of Syncer -The high availability of syncers relies on MySQL. If MySQL is used as the backend storage, the syncer can discover other syncers. If one syncer crashes, the others will take over its tasks. +The high availability of Syncer relies on MySQL. If MySQL is used as the backend storage, the Syncer can discover other Syncers. If one Syncer crashes, the others will take over its tasks. ## Privilege requirements @@ -393,29 +402,6 @@ The high availability of syncers relies on MySQL. If MySQL is used as the backen Admin privileges are required (We are planning on removing this in future versions). This is used to check the `enable binlog config`. -## Usage restrictions - -### Network constraints - -- Syncer needs to have connectivity to both the upstream and downstream FEs and BEs. -- The downstream BE should have connectivity to the upstream BE. -- The external IP and Doris internal IP should be the same. In other words, the IP address visible in the output of `show frontends/backends` should be the same IP that can be directly connected to. It should not involve IP forwarding or NAT for direct connections. - -### ThriftPool constraints - -It is recommended to increase the size of the Thrift thread pool to a number greater than the number of buckets involved in a single commit operation. - -### Version requirements - -Minimum required version: V2.0.3 - -### Unsupported operations - -- Rename table -- Operations such as table drop-recovery -- Operations related to rename table, replace partition -- Concurrent backup/restore within the same database - ## Feature ### Rate limit @@ -454,14 +440,14 @@ The functionalities that need to be disabled during synchronization are: ### Implementation -When creating the target table, the syncer controls the addition or deletion of the `is_being_synced` property. In CCR, there are two approaches to creating a target table: +When creating the target table, the Syncer controls the addition or deletion of the `is_being_synced` property. In CCR, there are two approaches to creating a target table: -1. During table synchronization, the syncer performs a full copy of the source table using backup/restore to obtain the target table. -2. During database synchronization, for existing tables, the syncer also uses backup/restore to obtain the target table. For incremental tables, the syncer creates the target table using the CreateTableRecord binlog. +1. During table synchronization, the Syncer performs a full copy of the source table using backup/restore to obtain the target table. +2. 
During database synchronization, for existing tables, the Syncer also uses backup/restore to obtain the target table. For incremental tables, the Syncer creates the target table using the CreateTableRecord binlog. Therefore, there are two entry points for inserting the `is_being_synced` property: the restore process during full synchronization and the getDdlStmt during incremental synchronization. -During the restoration process of full synchronization, the syncer initiates a restore operation of the snapshot from the source cluster via RPC. During this process, the `is_being_synced` property is added to the RestoreStmt and takes effect in the final restoreJob, executing the relevant logic for `is_being_synced`. +During the restoration process of full synchronization, the Syncer initiates a restore operation of the snapshot from the source cluster via RPC. During this process, the `is_being_synced` property is added to the RestoreStmt and takes effect in the final restoreJob, executing the relevant logic for `is_being_synced`. During incremental synchronization, add the `boolean getDdlForSync` parameter to the getDdlStmt method to differentiate whether it is a controlled transformation to the target table DDL, and execute the relevant logic for isBeingSynced during the creation of the target table. @@ -472,4 +458,4 @@ Regarding the disabling of the functionalities mentioned above: ### Note -The `is_being_synced` property should be fully controlled by the syncer, and users should not modify this property manually unless there are exceptional circumstances. +The `is_being_synced` property should be fully controlled by the Syncer, and users should not modify this property manually unless there are exceptional circumstances. diff --git a/docs/admin-manual/data-admin/ccr/overview.md b/docs/admin-manual/data-admin/ccr/overview.md index 123514e8c5..4dfb160871 100644 --- a/docs/admin-manual/data-admin/ccr/overview.md +++ b/docs/admin-manual/data-admin/ccr/overview.md @@ -1,8 +1,6 @@ --- -{ - "title": "Overview", - "language": "en" -} +title: Overview +language: en-US --- <!-- @@ -26,580 +24,70 @@ under the License. ## Overview -Cross Cluster Replication (CCR) enables the synchronization of data changes from a source cluster to a target cluster at the database/table level. This feature can be used to ensure data availability for online services, isolate offline and online workloads, and build multiple data centers across various sites. +CCR (Cross Cluster Replication) is a cross-cluster data synchronization mechanism that synchronizes data changes from the source cluster to the target cluster at the database or table level. It is mainly used to improve data availability for online services, support read-write load isolation, and build a dual-region, three-center architecture. -CCR is applicable to the following scenarios: +### Use Cases -- Disaster recovery: This involves backing up enterprise data to another cluster and data center. In the event of a sudden incident causing business interruption or data loss, companies can recover data from the backup or quickly switch to the backup cluster. Disaster recovery is typically a must-have feature in use cases with high SLA requirements, such as those in finance, healthcare, and e-commerce. -- Read/write separation: This is to isolate querying and writing operations to reduce their mutual impact and improve resource utilization. 
For example, in cases of high writing pressure or high concurrency, read/write separation can distribute read and write operations to read-only and write-only database instances in various regions. This helps ensure high database performance and stability. -- Data transfer between headquarters and branch offices: In order to have unified data control and analysis within a corporation, the headquarters usually requires timely data synchronization from branch offices located in different regions. This avoids management confusion and wrong decision-making based on inconsistent data. -- Isolated upgrades: During system cluster upgrades, there might be a need to roll back to a previous version. Many traditional upgrade methods do not allow rolling back due to incompatible metadata. CCR in Doris can address this issue by building a standby cluster for upgrade and conducting dual-running verification. Users can ungrade the clusters one by one. CCR is not dependent on specific versions, making version rollback feasible. +CCR is applicable to the following common scenarios: -### Task Categories +- **Disaster Recovery and Backup**: Backing up enterprise data to another cluster and data center ensures that data can be restored or quickly switched to a backup in the event of business interruption or data loss. This high-SLA disaster recovery is commonly required in industries such as finance, healthcare, and e-commerce. -CCR supports two categories of tasks: database-level and table-level. Database-level tasks synchronize data for an entire database, while table-level tasks synchronize data for a single table. +- **Read/Write Separation**: By isolating data query operations from data write operations, the impact between read and write processes is minimized, enhancing service availability. In high-concurrency or high-write-pressure scenarios, read/write separation helps to distribute the load effectively, improving database performance and stability. -## Design +- **Data Centralization**: Group headquarters need to centrally manage and analyze data from branch offices located in different regions, avoiding management confusion and decision-making errors caused by inconsistent data, thus improving the efficiency of group management and decision-making quality. -### Concepts +- **Isolated Upgrades**: During system cluster upgrades, CCR can be used to verify and test the new cluster to avoid rollback difficulties due to version compatibility issues. Users can upgrade each cluster incrementally while ensuring data consistency. -- Source cluster: the cluster where business data is written and originates from, requiring Doris version 2.0 +- **Cluster Migration**: When migrating a Doris cluster to a new data center or replacing hardware, CCR can be used to synchronize data from the old cluster to the new one, ensuring data consistency during the migration process. -- Target cluster: the destination cluster for cross cluster replication, requiring version 2.0 +### Job Types -- Binlog: the change log of the source cluster, including schema and data changes +CCR supports two types of jobs: -- Syncer: a lightweight process +- **Database-Level Jobs**: Synchronize data for the entire database. +- **Table-Level Jobs**: Synchronize data for a specific table. Note that table-level synchronization does not support renaming or replacing tables. Additionally, Doris only supports one snapshot job running per database, so table-level full sync jobs must queue for execution. 
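In practice, the job type only changes how the `create_ccr` request (shown in the manual section of this commit) is filled in: a database-level job leaves `table` empty, while a table-level job sets both `database` and `table`. A minimal sketch follows; the hosts, ports, and job names below are placeholders rather than values taken from this commit.

```shell
# Database-level job: "table" is left empty, so the whole database "demo" is replicated (placeholder values).
curl -X POST -H "Content-Type: application/json" -d '{
    "name": "ccr_db_level_job",
    "src":  {"host": "src_fe_host",  "port": "9030", "thrift_port": "9020", "user": "root", "password": "", "database": "demo", "table": ""},
    "dest": {"host": "dest_fe_host", "port": "9030", "thrift_port": "9020", "user": "root", "password": "", "database": "demo", "table": ""}
}' http://ccr_syncer_host:9190/create_ccr

# Table-level job: both "database" and "table" are set, so only demo.example_tbl is replicated.
curl -X POST -H "Content-Type: application/json" -d '{
    "name": "ccr_table_level_job",
    "src":  {"host": "src_fe_host",  "port": "9030", "thrift_port": "9020", "user": "root", "password": "", "database": "demo", "table": "example_tbl"},
    "dest": {"host": "dest_fe_host", "port": "9030", "thrift_port": "9020", "user": "root", "password": "", "database": "demo", "table": "example_tbl"}
}' http://ccr_syncer_host:9190/create_ccr
```

Because only one snapshot job can run per database, several table-level jobs against the same database will queue their initial full syncs one after another, as noted above.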
-### Architecture description +## Principles and Architecture - +### Terminology -CCR relies on a lightweight process called syncer. Syncers retrieve binlogs from the source cluster, directly apply the metadata to the target cluster, and notify the target cluster to pull data from the source cluster. CCR allows both full and incremental data migration. +- **Source Cluster**: The cluster where the data originates, typically where business data is written. +- **Target Cluster**: The cluster where the data is synchronized to in a cross-cluster setup. +- **Binlog**: The change log from the source cluster, containing schema and data changes. +- **Syncer**: A lightweight process responsible for synchronizing data. +- **Upstream**: In a database-level job, this refers to the source database; in a table-level job, it refers to the source table. +- **Downstream**: In a database-level job, this refers to the target database; in a table-level job, it refers to the target table. -### Sync Methods +### Architecture Overview -CCR supports four synchronization methods: + -| Sync Method | Principle | Trigger Timing | -| --------------| ------------------------------------------- | ------------------------------------------------ | -| Full Sync | Full backup from upstream, restore downstream. | Triggered by the first synchronization or operation, see the feature list for details. | -| Partial Sync | Backup at the upstream table or partition level, restore at the downstream table or partition level. | Triggered by operations, see the feature list for details. | -| TXN | Incremental data synchronization, downstream starts syncing after upstream commits. | Triggered by operations, see the feature list for details. | -| SQL | Replay upstream operations' SQL at the downstream. | Triggered by operations, see the feature list for details. | +CCR relies primarily on a lightweight process called `Syncer`. `Syncer` fetch binlogs from the source cluster, apply the metadata to the target cluster, and instruct the target cluster to pull data from the source cluster, enabling full and incremental synchronization. -### Usage +### Principles -The usage of CCR is straightforward. Simply start the syncer service and send a command, and the syncers will take care of the rest. +1. **Full Synchronization**: + - The CCR job first performs full synchronization, which copies all data from the upstream to the downstream in one complete operation. -1. Deploy the source Doris cluster. -2. Deploy the target Doris cluster. -3. Both the source and target clusters need to enable binlog. Configure the following information in the fe.conf and be.conf files of the source and target clusters: +2. **Incremental Synchronization**: + - After full synchronization is complete, the CCR job continues with incremental synchronization, keeping the data between the upstream and downstream clusters consistent. -```SQL -enable_feature_binlog=true -``` +3. **Reinitiating Full Synchronization**: + - If the job encounters a DDL operation that does not support incremental synchronization, the CCR job will restart full synchronization. For a list of DDL operations that do not support incremental synchronization, refer to [Feature Details](../feature.md). + - If the upstream binlog is interrupted due to expiration or other reasons, the incremental synchronization will stop, triggering a restart of full synchronization. -4. Deploy syncers +4. **During Synchronization**: + - Incremental synchronization will pause during the full synchronization process. 
+ - After full synchronization is completed, the downstream tables will undergo atomic replacement to ensure data consistency. + - After full synchronization is complete, incremental synchronization will resume. -Build CCR syncer +### Synchronization Modes -```shell -git clone https://github.com/selectdb/ccr-syncer -cd ccr-syncer -bash build.sh <-j NUM_OF_THREAD> <--output SYNCER_OUTPUT_DIR> -cd SYNCER_OUTPUT_DIR# Contact the Doris community for a free CCR binary package -``` +CCR supports four synchronization modes: - -Start and stop syncer - - -```shell -# Start -cd bin && sh start_syncer.sh --daemon - -# Stop -sh stop_syncer.sh -``` - -5. Enable binlog in the source cluster. - -```shell --- If you want to synchronize the entire database, you can execute the following script: -vim shell/enable_db_binlog.sh -Modify host, port, user, password, and db in the source cluster -Or ./enable_db_binlog.sh --host $host --port $port --user $user --password $password --db $db - --- If you want to synchronize a single table, you can execute the following script and enable binlog for the target table: -ALTER TABLE enable_binlog SET ("binlog.enable" = "true"); -``` - -6. Launch a synchronization task to the syncer - -```shell -curl -X POST -H "Content-Type: application/json" -d '{ - "name": "ccr_test", - "src": { - "host": "localhost", - "port": "9030", - "thrift_port": "9020", - "user": "root", - "password": "", - "database": "your_db_name", - "table": "your_table_name" - }, - "dest": { - "host": "localhost", - "port": "9030", - "thrift_port": "9020", - "user": "root", - "password": "", - "database": "your_db_name", - "table": "your_table_name" - } -}' http://127.0.0.1:9190/create_ccr -``` - -Parameter description: - -```shell -name: name of the CCR synchronization task, should be unique -host, port: host and mysql(jdbc) port for the master FE for the corresponding cluster -user, password: the credentials used by the syncer to initiate transactions, fetch data, etc. -If it is synchronization at the database level, specify your_db_name and leave your_table_name empty -If it is synchronization at the table level, specify both your_db_name and your_table_name -The synchronization task name can only be used once. -``` - -## Operation manual for syncer - -### Start syncer - -Start syncer according to the configurations and save a pid file in the default or specified path. The name of the pid file should follow `host_port.pid`. - -**Output file structure** - -The file structure can be seen under the output path after compilation: - -```SQL -output_dir - bin - ccr_syncer - enable_db_binlog.sh - start_syncer.sh - stop_syncer.sh - db - [ccr.db] # Generated after running with the default configurations. - log - [ccr_syncer.log] # Generated after running with the default configurations. -``` - -**The start_syncer.sh in the following text refers to the start_syncer.sh under its corresponding path.** - -**Start options** - -**--daemon** - -Run syncer in the background, set to false by default. - -```SQL -bash bin/start_syncer.sh --daemon -``` - -**--db_type** - -Syncer can currently use two databases to store its metadata, `sqlite3 `(for local storage) and `mysql `(for local or remote storage). - -```SQL -bash bin/start_syncer.sh --db_type mysql -``` - -The default value is sqlite3. - -When using MySQL to store metadata, syncer will use `CREATE IF NOT EXISTS `to create a database called `ccr`, where the metadata table related to CCR will be saved. 
- -**--db_dir** - -**This option only works when db uses `sqlite3`.** - -It allows you to specify the name and path of the db file generated by sqlite3. - -```SQL -bash bin/start_syncer.sh --db_dir /path/to/ccr.db -``` - -The default path is `SYNCER_OUTPUT_DIR/db` and the default file name is `ccr.db`. - -**--db_host & db_port & db_user & db_password** - -**This option only works when db uses `mysql`.** - -```SQL -bash bin/start_syncer.sh --db_host 127.0.0.1 --db_port 3306 --db_user root --db_password "qwe123456" -``` - -The default values of db_host and db_port are shown in the example. The default values of db_user and db_password are empty. - -**--log_dir** - -Output path of the logs: - -```SQL -bash bin/start_syncer.sh --log_dir /path/to/ccr_syncer.log -``` - -The default path is`SYNCER_OUTPUT_DIR/log` and the default file name is `ccr_syncer.log`. - -**--log_level** - -Used to specify the output level of syncer logs. - -```SQL -bash bin/start_syncer.sh --log_level info -``` - -The format of the log is as follows, where the hook will only be printed when `log_level > info `: - -```SQL -# time level msg hooks -[2023-07-18 16:30:18] TRACE This is trace type. ccrName=xxx line=xxx -[2023-07-18 16:30:18] DEBUG This is debug type. ccrName=xxx line=xxx -[2023-07-18 16:30:18] INFO This is info type. ccrName=xxx line=xxx -[2023-07-18 16:30:18] WARN This is warn type. ccrName=xxx line=xxx -[2023-07-18 16:30:18] ERROR This is error type. ccrName=xxx line=xxx -[2023-07-18 16:30:18] FATAL This is fatal type. ccrName=xxx line=xxx -``` - -Under --daemon, the default value of log_level is `info`. - -When running in the foreground, log_level defaults to `trace`, and logs are saved to log_dir using the tee command. - -**--host && --port** - -Used to specify the host and port of syncer, where host only plays the role of distinguishing itself in the cluster, which can be understood as the name of syncer, and the name of syncer in the cluster is `host: port`. - -```SQL -bash bin/start_syncer.sh --host 127.0.0.1 --port 9190 -``` - -The default value of host is 127.0.0.1, and the default value of port is 9190. - -**--pid_dir** - -Used to specify the storage path of the pid file - -The pid file is the credentials for closing the syncer. It is used in the stop_syncer.sh script. It saves the corresponding syncer process number. In order to facilitate management of syncer, you can specify the storage path of the pid file. - -```SQL -bash bin/start_syncer.sh --pid_dir /path/to/pids -``` - -The default value is `SYNCER_OUTPUT_DIR/bin`. - -### Stop syncer - -Stop the syncer according to the process number in the pid file under the default or specified path. The name of the pid file should follow `host_port.pid`. - -**Output file structure** - -The file structure can be seen under the output path after compilation: - -```shell -output_dir - bin - ccr_syncer - enable_db_binlog.sh - start_syncer.sh - stop_syncer.sh - db - [ccr.db] # Generated after running with the default configurations. - log - [ccr_syncer.log] # Generated after running with the default configurations. -``` - -**The start_syncer.sh in the following text refers to the start_syncer.sh under its corresponding path.** - -**Stop options** - -Syncers can be stopped in three ways: - -1. Stop a single syncer in the directory - -Specify the host and port of the syncer to be stopped. Be sure to keep it consistent with the host specified when start_syncer - -2. 
Batch stop the specified syncers in the directory - -Specify the names of the pid files to be stopped, wrap the names in `""` and separate them with spaces. - -3. Stop all syncers in the directory - -Follow the default configurations. - -**--pid_dir** - -Specify the directory where the pid file is located. The above three stopping methods all depend on the directory where the pid file is located for execution. - -```shell -bash bin/stop_syncer.sh --pid_dir /path/to/pids -``` - -The effect of the above example is to close the syncers corresponding to all pid files under `/path/to/pids `( **method 3** ). `-- pid_dir `can be used in combination with the above three syncer stopping methods. - -The default value is `SYNCER_OUTPUT_DIR/bin`. - -**--host && --port** - -Stop the syncer corresponding to host: port in the pid_dir path. - -```shell -bash bin/stop_syncer.sh --host 127.0.0.1 --port 9190 -``` - -The default value of host is 127.0.0.1, and the default value of port is empty. That is, specifying the host alone will degrade **method 1** to **method 3**. **Method 1** will only take effect when neither the host nor the port is empty. - -**--files** - -Stop the syncer corresponding to the specified pid file name in the pid_dir path. - -```shell -bash bin/stop_syncer.sh --files "127.0.0.1_9190.pid 127.0.0.1_9191.pid" -``` - -The file names should be wrapped in `" "` and separated with spaces. - -### Syncer operations - -**Template for requests** - -```shell -curl -X POST -H "Content-Type: application/json" -d {json_body} http://ccr_syncer_host:ccr_syncer_port/operator -``` - -json_body: send operation information in JSON format - -operator: different operations for syncer - -The interface returns JSON. If successful, the "success" field will be true. Conversely, if there is an error, it will be false, and then there will be an `ErrMsgs` field. - -```JSON -{"success":true} - -or - -{"success":false,"error_msg":"job ccr_test not exist"} -``` - -### Create Job - -```shell -curl -X POST -H "Content-Type: application/json" -d '{ - "name": "ccr_test", - "src": { - "host": "localhost", - "port": "9030", - "thrift_port": "9020", - "user": "root", - "password": "", - "database": "demo", - "table": "example_tbl" - }, - "dest": { - "host": "localhost", - "port": "9030", - "thrift_port": "9020", - "user": "root", - "password": "", - "database": "ccrt", - "table": "copy" - } -}' http://127.0.0.1:9190/create_ccr -``` - -- name: the name of the CCR synchronization task, should be unique -- host, port: correspond to the host and mysql (jdbc) port of the cluster's master -- thrift_port: corresponds to the rpc_port of the FE -- user, password: the credentials used by the syncer to initiate transactions, fetch data, etc. -- database, table: - - If it is a database-level synchronization, fill in the database name and leave the table name empty. - - If it is a table-level synchronization, specify both the database name and the table name. - -### Get Synchronization Lag - -```shell -curl -X POST -H "Content-Type: application/json" -d '{ - "name": "job_name" -}' http://ccr_syncer_host:ccr_syncer_port/get_lag -``` - -The job_name is the name specified when create_ccr. 
- -### Pause Job - -```shell -curl -X POST -H "Content-Type: application/json" -d '{ - "name": "job_name" -}' http://ccr_syncer_host:ccr_syncer_port/pause -``` - -### Resume Job - -```shell -curl -X POST -H "Content-Type: application/json" -d '{ - "name": "job_name" -}' http://ccr_syncer_host:ccr_syncer_port/resume -``` - -### Delete Job - -```shell -curl -X POST -H "Content-Type: application/json" -d '{ - "name": "job_name" -}' http://ccr_syncer_host:ccr_syncer_port/delete -``` - -### Display Version - -```shell -curl http://ccr_syncer_host:ccr_syncer_port/version - -# > return -{"version": "2.0.1"} -``` - -### View Job Status - -```shell -curl -X POST -H "Content-Type: application/json" -d '{ - "name": "job_name" -}' http://ccr_syncer_host:ccr_syncer_port/job_status - -{ - "success": true, - "status": { - "name": "ccr_db_table_alias", - "state": "running", - "progress_state": "TableIncrementalSync" - } -} -``` - -### Desynchronize Job - -Do not sync any more. Users can swap the source and target clusters. - -```shell -curl -X POST -H "Content-Type: application/json" -d '{ - "name": "job_name" -}' http://ccr_syncer_host:ccr_syncer_port/desync -``` - -### List All Jobs - -```shell -curl http://ccr_syncer_host:ccr_syncer_port/list_jobs - -{"success":true,"jobs":["ccr_db_table_alias"]} -``` - -### Open binlog for all tables in the database - -**Output file structure** - -The file structure can be seen under the output path after compilation: - -```shell -output_dir - bin - ccr_syncer - enable_db_binlog.sh - start_syncer.sh - stop_syncer.sh - db - [ccr.db] # Generated after running with the default configurations. - log - [ccr_syncer.log] # Generated after running with the default configurations. -``` - -**The start_syncer.sh in the following text refers to the start_syncer.sh under its corresponding path.** - -**Usage** - -```shell -bash bin/enable_db_binlog.sh -h host -p port -u user -P password -d db -``` - -## High availability of syncer - -The high availability of syncers relies on MySQL. If MySQL is used as the backend storage, the syncer can discover other syncers. If one syncer crashes, the others will take over its tasks. - -### Privilege requirements - -1. `select_priv`: read-only privileges for databases and tables -2. `load_priv`: write privileges for databases and tables, including load, insert, delete, etc. -3. `alter_priv`: privilege to modify databases and tables, including renaming databases/tables, adding/deleting/changing columns, adding/deleting partitions, etc. -4. `create_priv`: privilege to create databases, tables, and views -5. `drop_priv`: privilege to drop databases, tables, and views - -Admin privileges are required (We are planning on removing this in future versions). This is used to check the `enable binlog config`. - -## Usage restrictions - -### Network constraints - -- Syncer needs to have connectivity to both the upstream and downstream FEs and BEs. -- The downstream BE should have connectivity to the upstream BE. -- The external IP and Doris internal IP should be the same. In other words, the IP address visible in the output of `show frontends/backends` should be the same IP that can be directly connected to. It should not involve IP forwarding or NAT for direct connections. - -### ThriftPool constraints - -It is recommended to increase the size of the Thrift thread pool to a number greater than the number of buckets involved in a single commit operation. 
- -### Version requirements - -Minimum required version: V2.0.3 - -### Unsupported operations - -- Rename table -- Operations such as table drop-recovery -- Operations related to rename table, replace partition -- Concurrent backup/restore within the same database - -## Feature - -### Rate limit - -BE-side configuration parameter - -```shell -download_binlog_rate_limit_kbs=1024 # Limits the download speed of Binlog (including Local Snapshot) from the source cluster to 1 MB/s in a single BE node -``` - -1. The `download_binlog_rate_limit_kbs` parameter is configured on the BE nodes of the source cluster. By setting this parameter, the data pull rate can be effectively limited. - -2. The `download_binlog_rate_limit_kbs` parameter primarily controls the speed of data transfer for each single BE node. To calculate the overall cluster rate, one would multiply the parameter value by the number of nodes in the cluster. - - -## IS_BEING_SYNCED - -:::tip -Doris v2.0 "is_being_synced" = "true" -::: - -During data synchronization using CCR, replica tables (referred to as target tables) are created in the target cluster for the tables within the synchronization scope of the source cluster (referred to as source tables). However, certain functionalities and attributes need to be disabled or cleared when creating replica tables to ensure the correctness of the synchronization process. For example: - -- The source tables may contain information that is not synchronized to the target cluster, such as `storage_policy`, which may cause the creation of the target table to fail or result in abnormal behavior. -- The source tables may have dynamic functionalities, such as dynamic partitioning, which can lead to uncontrolled behavior in the target table and result in inconsistent partitions. - -The attributes that need to be cleared during replication are: - -- `storage_policy` -- `colocate_with` - -The functionalities that need to be disabled during synchronization are: - -- Automatic bucketing -- Dynamic partitioning - -### Implementation - -When creating the target table, the syncer controls the addition or deletion of the `is_being_synced` property. In CCR, there are two approaches to creating a target table: - -1. During table synchronization, the syncer performs a full copy of the source table using backup/restore to obtain the target table. -2. During database synchronization, for existing tables, the syncer also uses backup/restore to obtain the target table. For incremental tables, the syncer creates the target table using the CreateTableRecord binlog. - -Therefore, there are two entry points for inserting the `is_being_synced` property: the restore process during full synchronization and the getDdlStmt during incremental synchronization. - -During the restoration process of full synchronization, the syncer initiates a restore operation of the snapshot from the source cluster via RPC. During this process, the `is_being_synced` property is added to the RestoreStmt and takes effect in the final restoreJob, executing the relevant logic for `is_being_synced`. - -During incremental synchronization, add the `boolean getDdlForSync` parameter to the getDdlStmt method to differentiate whether it is a controlled transformation to the target table DDL, and execute the relevant logic for isBeingSynced during the creation of the target table. - -Regarding the disabling of the functionalities mentioned above: - -- Automatic bucketing: Automatic bucketing is enabled when creating a table. 
It calculates the appropriate number of buckets. This may result in a mismatch in the number of buckets between the source and target tables. Therefore, during synchronization, obtain the number of buckets from the source table, as well as the information about whether the source table is an automatic bucketing table in order to restore the functionality after synchronization. The current recommended approach is [...] -- Dynamic partitioning: This is implemented by adding `olapTable.isBeingSynced()` to the condition for executing add/drop partition operations. This ensures that the target table does not perform periodic add/drop partition operations during synchronization. - -### Note - -The `is_being_synced` property should be fully controlled by the syncer, and users should not modify this property manually unless there are exceptional circumstances. +| Synchronization Mode | Principle | Trigger Condition | +|----------------------|---------------------------------------------------------|----------------------------------------------------------------| +| **Full Sync** | Full backup of the upstream, restore on the downstream. DB-level jobs trigger DB backup, table-level jobs trigger table backup. | Initial synchronization or specific operations trigger this. See [Feature Details](../feature.md) for triggers. | +| **Partial Sync** | Backup at the table or partition level from the upstream, restore at the same level on the downstream. | Specific operations trigger this. See [Feature Details](../feature.md) for triggers. | +| **TXN** | Incremental data synchronization, downstream starts synchronization after upstream commit. | Specific operations trigger this. See [Feature Details](../feature.md) for triggers. | +| **SQL** | Replaying upstream SQL operations on the downstream. | Specific operations trigger this. See [Feature Details](../feature.md) for triggers. | diff --git a/docs/admin-manual/data-admin/ccr/quickstart.md b/docs/admin-manual/data-admin/ccr/quickstart.md index c740efa4d7..f9d21a8028 100644 --- a/docs/admin-manual/data-admin/ccr/quickstart.md +++ b/docs/admin-manual/data-admin/ccr/quickstart.md @@ -24,7 +24,7 @@ specific language governing permissions and limitations under the License. --> -The usage of CCR is straightforward. Simply start the syncer service and send a command, and the syncers will take care of the rest. +The usage of CCR is straightforward. Simply start the syncer service and send a command, and the Syncer will take care of the rest. ## Step 1. Deploy the source Doris cluster ## Step 2. Deploy the target Doris cluster @@ -36,9 +36,9 @@ Both the source and target clusters need to enable binlog. Configure the followi enable_feature_binlog=true ``` -## Step 4. Deploy syncers +## Step 4. Deploy Syncer -Build CCR syncer +Build CCR Syncer ```shell git clone https://github.com/selectdb/ccr-syncer @@ -48,7 +48,7 @@ cd SYNCER_OUTPUT_DIR# Contact the Doris community for a free CCR binary package ``` -Start and stop syncer +Start and stop Syncer ```shell @@ -71,7 +71,7 @@ Or ./enable_db_binlog.sh --host $host --port $port --user $user --password $pass ALTER TABLE enable_binlog SET ("binlog.enable" = "true"); ``` -## Step 6. Launch a synchronization task to the syncer +## Step 6. 
Launch a synchronization task to the Syncer ```shell curl -X POST -H "Content-Type: application/json" -d '{ @@ -102,7 +102,7 @@ Parameter description: ```shell name: name of the CCR synchronization task, should be unique host, port: host and mysql(jdbc) port for the master FE for the corresponding cluster -user, password: the credentials used by the syncer to initiate transactions, fetch data, etc. +user, password: the credentials used by the Syncer to initiate transactions, fetch data, etc. If it is synchronization at the database level, specify your_db_name and leave your_table_name empty If it is synchronization at the table level, specify both your_db_name and your_table_name The synchronization task name can only be used once. diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/data-admin/ccr/manual.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/data-admin/ccr/manual.md index 678b7812a3..819c923561 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/data-admin/ccr/manual.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/data-admin/ccr/manual.md @@ -24,6 +24,15 @@ specific language governing permissions and limitations under the License. --> +## 使用限制 + +### 网络约束 + +- 需要 Syncer 与上下游的 FE 和 BE 是互通的 + +- 下游 BE 与上游 BE 通过 Doris BE 进程使用的 IP (`show frontends/backends` 看到的) 是直通的。 + + ## 启动 Syncer 根据配置选项启动 Syncer,并且在默认或指定路径下保存一个 pid 文件,pid 文件的命名方式为`host_port.pid`。 @@ -141,7 +150,7 @@ host 默认值为 127.0.0.1,port 的默认值为 9190 用于指定 pid 文件的保存路径 -pid 文件是 stop_syncer.sh 脚本用于关闭 Syncer 的凭据,里面保存了对应 Syncer 的进程号,为了方便 Syncer 的集群化管理,可以指定 pid 文件的保存路径 +pid 文件是 stop_syncer.sh 脚本用于停止 Syncer 的凭据,里面保存了对应 Syncer 的进程号,为了方便 Syncer 的集群化管理,可以指定 pid 文件的保存路径 ```sql bash bin/start_syncer.sh --pid_dir /path/to/pids @@ -151,7 +160,7 @@ bash bin/start_syncer.sh --pid_dir /path/to/pids ## 停止 Syncer -根据默认或指定路径下 pid 文件中的进程号关闭对应 Syncer,pid 文件的命名方式为`host_port.pid`。 +根据默认或指定路径下 pid 文件中的进程号停止对应 Syncer,pid 文件的命名方式为`host_port.pid`。 **输出路径下的文件结构** @@ -175,35 +184,35 @@ output_dir **停止选项** -有三种关闭方法: +有三种停止方法: -1. 关闭目录下单个 Syncer +1. 停止目录下单个 Syncer - 指定要关闭 Syncer 的 host && port,注意要与 start_syncer 时指定的 host 一致 + 指定要停止 Syncer 的 host && port,注意要与 start_syncer 时指定的 host 一致 -2. 批量关闭目录下指定 Syncer +2. 批量停止目录下指定 Syncer - 指定要关闭的 pid 文件名,以空格分隔,用`" "`包裹 + 指定要停止的 pid 文件名,以空格分隔,用`" "`包裹 -3. 关闭目录下所有 Syncer +3. 停止目录下所有 Syncer 默认即可 1. --pid_dir -指定 pid 文件所在目录,上述三种关闭方法都依赖于 pid 文件的所在目录执行 +指定 pid 文件所在目录,上述三种停止方法都依赖于 pid 文件的所在目录执行 ```shell bash bin/stop_syncer.sh --pid_dir /path/to/pids ``` -例子中的执行效果就是关闭`/path/to/pids`下所有 pid 文件对应的 Syncers(**方法 3**),`--pid_dir`可与上面三种关闭方法组合使用。 +例子中的执行效果就是停止`/path/to/pids`下所有 pid 文件对应的 Syncer(**方法 3**),`--pid_dir`可与上面三种停止方法组合使用。 默认值为`SYNCER_OUTPUT_DIR/bin` 2. --host && --port -关闭 pid_dir 路径下 host:port 对应的 Syncer +停止 pid_dir 路径下 host:port 对应的 Syncer ```shell bash bin/stop_syncer.sh --host 127.0.0.1 --port 9190 @@ -217,7 +226,7 @@ host 与 port 都不为空时**方法 1**才能生效 3. 
--files -关闭 pid_dir 路径下指定 pid 文件名对应的 Syncer +停止 pid_dir 路径下指定 pid 文件名对应的 Syncer ```shell bash bin/stop_syncer.sh --files "127.0.0.1_9190.pid 127.0.0.1_9191.pid" @@ -279,7 +288,7 @@ curl -X POST -H "Content-Type: application/json" -d '{ - thrift_port:对应 FE 的 rpc_port -- user、password:syncer 以何种身份去开启事务、拉取数据等 +- user、password:Syncer 以何种身份去开启事务、拉取数据等 - database、table: @@ -393,7 +402,7 @@ bash bin/enable_db_binlog.sh -h host -p port -u user -P password -d db ## Syncer 高可用 -Syncer 高可用依赖 mysql,如果使用 mysql 作为后端存储,Syncer 可以发现其它 syncer,如果一个 crash 了,其他会分担他的任务 +Syncer 高可用依赖 mysql,如果使用 mysql 作为后端存储,Syncer 可以发现其它 Syncer,如果一个 crash 了,其他会分担他的任务 ## 权限要求 @@ -409,33 +418,11 @@ Syncer 高可用依赖 mysql,如果使用 mysql 作为后端存储,Syncer 加上 Admin 权限 (之后考虑彻底移除), 这个是用来检测 enable binlog config 的,现在需要 admin -## 使用限制 - -### 网络约束 - -- 需要 Syncer 与上下游的 FE 和 BE 都是通的 - -- 下游 BE 与上游 BE 是通的 - -- 对外 IP 和 Doris 内部 IP 是一样的,也就是说`show frontends/backends`看到的,和能直接连的 IP 是一致的,要是直连,不能是 IP 转发或者 nat - -### ThriftPool 限制 - -开大 thrift thread pool 大小,最好是超过一次 commit 的 bucket 数目大小 ### 版本要求 -版本最低要求:v2.0.3 - -### 不支持的操作 - -- rename table 支持有点问题 - -- 不支持一些 trash 的操作,比如 table 的 drop-recovery 操作 - -- 和 rename table 有关的,replace partition 与 +版本最低要求:v2.0.15 -- 不能发生在同一个 db 上同时 backup/restore ## Feature @@ -465,7 +452,7 @@ CCR 功能在建立同步时,会在目标集群中创建源集群同步范围 - 源表中包含了可能没有被同步到目标集群的信息,如`storage_policy`等,可能会导致目标表创建失败或者行为异常。 -- 源表中可能包含一些动态功能,如动态分区等,可能导致目标表的行为不受 syncer 控制导致 partition 不一致。 +- 源表中可能包含一些动态功能,如动态分区等,可能导致目标表的行为不受 Syncer 控制导致 partition 不一致。 在被复制时因失效而需要擦除的属性有: @@ -481,15 +468,15 @@ CCR 功能在建立同步时,会在目标集群中创建源集群同步范围 ### 实现 -在创建目标表时,这条属性将会由 syncer 控制添加或者删除,在 CCR 功能中,创建一个目标表有两个途径: +在创建目标表时,这条属性将会由 Syncer 控制添加或者删除,在 CCR 功能中,创建一个目标表有两个途径: -1. 在表同步时,syncer 通过 backup/restore 的方式对源表进行全量复制来得到目标表。 +1. 在表同步时,Syncer 通过 backup/restore 的方式对源表进行全量复制来得到目标表。 -2. 在库同步时,对于存量表而言,syncer 同样通过 backup/restore 的方式来得到目标表,对于增量表而言,syncer 会通过携带有 CreateTableRecord 的 binlog 来创建目标表。 +2. 在库同步时,对于存量表而言,Syncer 同样通过 backup/restore 的方式来得到目标表,对于增量表而言,Syncer 会通过携带有 CreateTableRecord 的 binlog 来创建目标表。 综上,对于插入`is_being_synced`属性有两个切入点:全量同步中的 restore 过程和增量同步时的 getDdlStmt。 -在全量同步的 restore 过程中,syncer 会通过 rpc 发起对原集群中 snapshot 的 restore,在这个过程中为会为 RestoreStmt 添加`is_being_synced`属性,并在最终的 restoreJob 中生效,执行`isBeingSynced`的相关逻辑。在增量同步时的 getDdlStmt 中,为 getDdlStmt 方法添加参数`boolean getDdlForSync`,以区分是否为受控转化为目标表 ddl 的操作,并在创建目标表时执行`isBeingSynced`的相关逻辑。 +在全量同步的 restore 过程中,Syncer 会通过 rpc 发起对原集群中 snapshot 的 restore,在这个过程中为会为 RestoreStmt 添加`is_being_synced`属性,并在最终的 restoreJob 中生效,执行`isBeingSynced`的相关逻辑。在增量同步时的 getDdlStmt 中,为 getDdlStmt 方法添加参数`boolean getDdlForSync`,以区分是否为受控转化为目标表 ddl 的操作,并在创建目标表时执行`isBeingSynced`的相关逻辑。 对于失效属性的擦除无需多言,对于上述功能的失效需要进行说明: @@ -499,4 +486,4 @@ CCR 功能在建立同步时,会在目标集群中创建源集群同步范围 ### 注意 -在未出现异常时,`is_being_synced`属性应该完全由 syncer 控制开启或关闭,用户不要自行修改该属性。 +在未出现异常时,`is_being_synced`属性应该完全由 Syncer 控制开启或关闭,用户不要自行修改该属性。 diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/data-admin/ccr/overview.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/data-admin/ccr/overview.md index 39b1b32f05..c9f4d5c4b9 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/data-admin/ccr/overview.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/data-admin/ccr/overview.md @@ -1,8 +1,6 @@ --- -{ - "title": "概述", - "language": "zh-CN" -} +title: 概述 +language: zh-CN --- <!-- @@ -26,55 +24,70 @@ under the License. 
## 概览 -### CCR 是什么 - -CCR(Cross Cluster Replication) 是跨集群数据同步,能够在库/表级别将源集群的数据变更同步到目标集群,可用于在线服务的数据可用性、隔离在离线负载、建设两地三中心。 +CCR (Cross Cluster Replication) 是一种跨集群数据同步机制,能够在库或表级别将源集群的数据变更同步到目标集群。它主要用于提升在线服务的数据可用性、读写负载隔离和建设两地三中心架构。 ### 适用场景 -CCR 通常被用于容灾备份、读写分离、集团与公司间数据传输和隔离升级等场景。 +CCR 适用于以下几种常见场景: + +- **容灾备份**:将企业数据备份到另一集群和机房,确保在业务中断或数据丢失时能够恢复数据,或快速实现主备切换。金融、医疗、电子商务等行业通常需要这种高 SLA 的容灾备份。 -- 容灾备份:通常是将企业的数据备份到另一个集群与机房中,当突发事件导致业务中断或丢失时,可以从备份中恢复数据或快速进行主备切换。一般在对 SLA 要求比较高的场景中,都需要进行容灾备份,比如在金融、医疗、电子商务等领域中比较常见。 +- **读写分离**:通过将数据的查询操作与写入操作分离,减小读写之间的相互影响,提升服务稳定性。对于高并发或写入压力大的场景,采用读写分离可以有效分散负载,提升数据库性能和稳定性。 -- 读写分离:读写分离是将数据的查询操作和写入操作进行分离,目的是降低读写操作的相互影响并提升资源的利用率。比如在数据库写入压力过大或在高并发场景中,采用读写分离可以将读/写操作分散到多个地域的只读/只写的数据库案例上,减少读写间的互相影响,有效保证数据库的性能及稳定性。 +- **数据集中**:集团总部需统一管理和分析分布在不同地域的分公司数据,避免因数据不一致导致的管理混乱和决策错误,从而提升集团管理效率和决策质量。 -- 集团与分公司间数据传输:集团总部为了对集团内数据进行统一管控和分析,通常需要分布在各地域的分公司及时将数据传输同步到集团总部,避免因为数据不一致而引起的管理混乱和决策错误,有利于提高集团的管理效率和决策质量。 +- **隔离升级**:在进行系统集群升级时,使用 CCR 可以在新集群中进行验证和测试,避免因版本兼容问题导致的回滚困难。用户可以逐步升级各个集群,同时保证数据一致性。 -- 隔离升级:当在对系统集群升级时,有可能因为某些原因需要进行版本回滚,传统的升级模式往往会因为元数据不兼容的原因无法回滚。而使用 CCR 可以解决该问题,先构建一个备用的集群进行升级并双跑验证,用户可以依次升级各个集群,同时 CCR 也不依赖特定版本,使版本的回滚变得可行。 +- **集群迁移**:在进行 Doris 集群的机房搬迁或设备更换时,使用 CCR 可以将老集群的数据同步到新集群,确保迁移过程中的数据一致性。 ### 任务类别 -CCR 支持两个类别的任务,分别是库级别和表级别,库级别的任务同步一个库的数据,表级别的任务只同步一个表的数据。 +CCR 支持两种任务类型: + +- **库级任务**:同步整个数据库的数据。 +- **表级任务**:仅同步指定表的数据。注意,表级同步不支持重命名表或替换表操作。此外,Doris 每个数据库只能同时运行一个快照任务,因此表级同步的全量同步任务需要排队执行。 ## 原理与架构 ### 名词解释 -源集群:源头集群,业务数据写入的集群,需要 2.0 版本 +- **源集群**:数据源所在的集群,通常为业务数据写入的集群。 +- **目标集群**:跨集群同步的目标集群。 +- **binlog**:源集群的变更日志,包含了 schema 和数据变更。 +- **Syncer**:一个轻量级的进程,负责同步数据。 +- **上游**:在库级任务中指上游库,在表级任务中指上游表。 +- **下游**:在库级任务中指下游库,在表级任务中指下游表。 -目标集群:跨集群同步的目标集群,需要 2.0 版本 +### 架构说明 -binlog:源集群的变更日志,包括 schema 和数据变更 + -syncer:一个轻量级的进程 +CCR 主要依赖一个轻量级进程:`Syncer`。`Syncer` 负责从源集群获取 binlog,并将元数据应用到目标集群,通知目标集群从源集群拉取数据,从而实现全量同步和增量同步。 -上游:库级别任务时指上游库,表级别任务时指上游表。 +### 原理 -下游:库级别任务时指下游库,表级别人物时指下游表。 +1. **全量同步**: + - CCR 任务会首先进行全量同步,将上游数据一次性完整地复制到下游。 -### 架构说明 +2. **增量同步**: + - 在全量同步完成后,CCR 任务会继续进行增量同步,保持上游和下游数据的一致性。 - +3. **重新开始全量同步的情况**: + - 遇到当前不支持增量同步的 DDL 操作时,CCR 任务会重新启动全量同步。具体哪些 DDL 操作不支持增量同步,请参见[功能详情](../feature.md)。 + - 如果上游的 binlog 因为过期或其他原因中断,增量同步会停止,并触发全量同步的重新开始。 -CCR 工具主要依赖一个轻量级进程:Syncers。Syncers 会从源集群获取 binlog,直接将元数据应用于目标集群,通知目标集群从源集群拉取数据。从而实现全量和增量迁移。 +4. 
**同步过程中**: + - 在全量同步进行期间,增量同步会暂停。 + - 全量同步完成后,下游的数据表会进行原子替换,以确保数据一致性。 + - 全量同步完成后,会恢复增量同步。 ### 同步方式 CCR 支持四种同步方式: -| 同步方式 | 原理 | 触发时机 | -|------------|-----------|------------------| -| Full Sync | 上游全量backup,下游restore。 | 首次同步或者操作触发,操作见功能列表。 | -| Partial Sync | 上游表或者分区级别 Backup,下游表或者分区级别restore。 | 操作触发,操作见功能列表。 | -| TXN | 增量数据同步,上游提交之后,下游开始同步。 | 操作触发,操作见功能列表。 | -| SQL | 在下游回放上游操作的 SQL。 | 操作触发,操作见功能列表。 | +| 同步方式 | 原理 | 触发时机 | +|----------------|--------------------------------------------------------|----------------------------------------------------------| +| **Full Sync** | 上游进行全量备份,下游进行恢复。DB 级任务触发 DB 备份,表级任务触发表备份。 | 首次同步或特定操作触发。触发条件请参见[功能详情](../feature.md)。 | +| **Partial Sync** | 上游表或分区级别备份,下游表或分区级别恢复。 | 特定操作触发,触发条件请参见[功能详情](../feature.md)。 | +| **TXN** | 增量数据同步,上游提交后,下游开始同步。 | 特定操作触发,触发条件请参见[功能详情](../feature.md)。 | +| **SQL** | 在下游回放上游操作的 SQL。 | 特定操作触发,触发条件请参见[功能详情](../feature.md)。 | diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/data-admin/ccr/quickstart.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/data-admin/ccr/quickstart.md index 2ae696a034..bade155535 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/data-admin/ccr/quickstart.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/data-admin/ccr/quickstart.md @@ -25,7 +25,7 @@ under the License. --> -使用非常简单,只需把 Syncers 服务启动,给他发一个命令,剩下的交给 Syncers 完成就行。 +使用非常简单,只需把 Syncer 服务启动,给他发一个命令,剩下的交给 Syncer 完成就行。 ## 第一步. 部署源 Doris 集群 @@ -39,9 +39,9 @@ under the License. enable_feature_binlog=true ``` -## 第四步. 部署 syncers +## 第四步. 部署 Syncer -4.1. 构建 CCR syncer +4.1. 构建 CCR Syncer ```shell git clone https://github.com/selectdb/ccr-syncer @@ -53,7 +53,7 @@ enable_feature_binlog=true cd SYNCER_OUTPUT_DIR# 联系相关同学免费获取 ccr 二进制包 ``` -4.2. 启动和停止 syncer +4.2. 启动和停止 Syncer ```shell # 启动 @@ -75,7 +75,7 @@ vim shell/enable_db_binlog.sh ALTER TABLE enable_binlog SET ("binlog.enable" = "true"); ``` -## 第六步. 向 syncer 发起同步任务 +## 第六步. 向 Syncer 发起同步任务 ```shell curl -X POST -H "Content-Type: application/json" -d '{