victorsheng opened a new issue, #17835:
URL: https://github.com/apache/dolphinscheduler/issues/17835

   ### Search before asking
   
   - [x] I had searched in the 
[DSIP](https://github.com/apache/dolphinscheduler/issues/14102) and found no 
similar DSIP.
   
   
   ### Motivation
   
   Currently, upgrading Apache DolphinScheduler between major versions (e.g., 
from 1.3.x to 3.x.x) relies on the official upgrade-schema.sh script. This 
approach has several limitations for large-scale production environments:
   
   - Downtime Requirement: The master/worker nodes and the metadata database 
must be offline during the schema upgrade, which is unacceptable for 24/7 SLA 
requirements.
   - All-or-Nothing Risk: It is impossible to migrate only a subset of 
projects. If an upgrade fails, rolling back a massive database is 
time-consuming and risky.
   - Schema Complexity: Major versions (especially the jump from 1.x to 
2.x/3.x) introduced significant changes, such as the decoupling of task and 
process definitions.
   
   Using Flink-CDC as a migration engine allows for real-time metadata 
synchronization, gradual "canary" migrations of specific workflows, and zero 
downtime for the source system.
   
   ### Design Detail
   
   The migration tool will be implemented as a Flink application that captures 
changes from the source metadata database and sinks them into the target 
database after applying version-specific transformations.
   
   1. Architecture:
   
   - Source: MySQL/PostgreSQL (Source DS Database) using Flink CDC Connectors.
   - Transformation Layer: A custom MapFunction or ProcessFunction that handles 
the schema mapping logic. For example:
     - Converting the process_definition_json in 1.3.x into the decoupled 
task_definition and task_relation tables in 3.x.x.
     - Generating new snowflake IDs (Global IDs) for the target version.
   - Sink: JDBC Sink (Target DS Database).
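
   The transformation layer above could be sketched as follows (plain Java standing in for a Flink `MapFunction`; the POJO and field names are illustrative and assume the 1.3.x `process_definition_json` has already been deserialized into simple objects — the real column sets differ):

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: the record fields approximate the 1.3.x
// embedded-task model and the 3.x decoupled model.
public class ProcessDefinitionSplitter {

    // 1.3.x style: tasks embedded inside the process definition JSON
    public record OldTask(String id, String name, List<String> preTaskIds) {}
    public record OldProcessDefinition(long id, String name, List<OldTask> tasks) {}

    // 3.x style: standalone task definitions plus relation rows
    public record TaskDefinition(long code, String name) {}
    public record TaskRelation(long processCode, long preTaskCode, long postTaskCode) {}
    public record SplitResult(List<TaskDefinition> tasks, List<TaskRelation> relations) {}

    /** Stand-in for the stateful old-ID -> new-code mapping described below. */
    public interface IdMapper { long codeFor(String oldId); }

    /** Decouple embedded tasks into task_definition and task_relation rows. */
    public static SplitResult split(OldProcessDefinition process, IdMapper ids) {
        List<TaskDefinition> defs = new ArrayList<>();
        List<TaskRelation> rels = new ArrayList<>();
        for (OldTask task : process.tasks()) {
            long code = ids.codeFor(task.id());
            defs.add(new TaskDefinition(code, task.name()));
            if (task.preTaskIds().isEmpty()) {
                // Entry node: modeled here with preTaskCode = 0
                rels.add(new TaskRelation(process.id(), 0L, code));
            } else {
                for (String pre : task.preTaskIds()) {
                    rels.add(new TaskRelation(process.id(), ids.codeFor(pre), code));
                }
            }
        }
        return new SplitResult(defs, rels);
    }
}
```

   In the Flink job, `split` would run inside the `MapFunction`/`ProcessFunction`, with `IdMapper` backed by checkpointed state rather than an ad-hoc lookup.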
   
   2. Key Components:
   
   - Granular Filter: A configuration parameter (e.g., migration.project.codes) 
to allow users to select specific projects for migration.
   - Stateful Mapping: Use Flink State to maintain the mapping between old IDs 
and new IDs to ensure consistency across multiple tables.
   - Data Conversion Engine: A dedicated module to parse 1.x JSON strings and 
reconstruct them into the target version's relational model.
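
   A minimal sketch of the stateful mapping and ID generation (a `HashMap` stands in for Flink keyed `MapState`; in the real job the map would live in checkpointed state so the old-ID-to-new-ID assignment survives restarts, and the snowflake bit layout shown is illustrative, not DolphinScheduler's exact scheme):

```java
import java.util.HashMap;
import java.util.Map;

public class StatefulIdMapping {

    // Illustrative snowflake layout: 41-bit timestamp | 10-bit worker | 12-bit sequence.
    // Clock-rollback handling is omitted for brevity.
    static class SnowflakeIds {
        private final long workerId;
        private long lastTimestamp = -1L;
        private long sequence = 0L;

        SnowflakeIds(long workerId) { this.workerId = workerId & 0x3FF; }

        synchronized long nextId() {
            long ts = System.currentTimeMillis();
            if (ts == lastTimestamp) {
                sequence = (sequence + 1) & 0xFFF;
                if (sequence == 0) {            // sequence exhausted in this millisecond
                    while (ts <= lastTimestamp) ts = System.currentTimeMillis();
                }
            } else {
                sequence = 0L;
            }
            lastTimestamp = ts;
            return (ts << 22) | (workerId << 12) | sequence;
        }
    }

    private final Map<String, Long> state = new HashMap<>(); // stand-in for Flink MapState
    private final SnowflakeIds generator = new SnowflakeIds(1);

    /** Return the target-side code for an old ID, generating it exactly once. */
    public long codeFor(String oldId) {
        return state.computeIfAbsent(oldId, k -> generator.nextId());
    }
}
```

   Keeping the mapping in state is what lets the same old task ID resolve to the same new code across every table that references it.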
   
   ### Compatibility, Deprecation, and Migration Plan
   
   - Compatibility: This feature is an alternative migration path and does not 
replace the existing upgrade-schema.sh.
     - It supports "Source-Live" mode, where the source system remains 
read-write while the target system is being populated.
   - Deprecation: None.
   - Migration Plan:
     - Deploy the Target DolphinScheduler version (fresh install).
     - Configure and start the Flink-CDC migration job.
     - Perform verification on the Target environment (e.g., dry-run workflows).
     - Gradually switch the scheduling traffic from Source to Target by project.
     - Stop the CDC job once all projects are migrated.
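
   A canary migration following the steps above might be driven by a small job configuration like the one below (every key is a hypothetical example for this proposal, not an existing DolphinScheduler or Flink option):

```
# All keys below are illustrative names proposed by this DSIP
migration.source.url=jdbc:mysql://source-db:3306/dolphinscheduler
migration.target.url=jdbc:mysql://target-db:3306/dolphinscheduler
migration.source.version=1.3.5
migration.target.version=3.2.2
# Granular filter: migrate only the listed project codes
migration.project.codes=10001,10002
```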
   
   ### Test Plan
   
   - Unit Tests:
     - Validate the JSON transformation logic from 1.3.x to 3.x.x.
     - Test the ID generator and mapping state.
   - Integration Tests:
     - End-to-end migration from a standard DS 1.3.5 database to a DS 3.2.2 
database.
     - Verify workflow execution on the target side after migration.
   - Consistency Tests:
     - Compare the MD5 of process definitions between source and target.
     - Validate record counts across all core tables (t_ds_project, 
t_ds_process_definition, etc.).
   - Performance Tests:
     - Benchmark the migration speed for environments with >10,000 workflow 
definitions.
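
   The consistency check could be sketched like this (pure Java; in practice the definition strings would come from SQL queries against the source and target databases, and the normalization step would need to account for remapped IDs):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;
import java.util.List;

public class ConsistencyCheck {

    /** MD5 over the sorted, newline-joined definitions (order-insensitive). */
    public static String digest(List<String> definitions) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            String normalized = String.join("\n", definitions.stream().sorted().toList());
            return HexFormat.of().formatHex(
                md5.digest(normalized.getBytes(StandardCharsets.UTF_8)));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // MD5 is always available in the JDK
        }
    }

    /** True when record counts and digests agree between source and target. */
    public static boolean matches(List<String> source, List<String> target) {
        return source.size() == target.size() && digest(source).equals(digest(target));
    }
}
```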
   
   ### Code of Conduct
   
   - [x] I agree to follow this project's [Code of 
Conduct](https://www.apache.org/foundation/policies/conduct)
   

