peterylh opened a new pull request, #54494:
URL: https://github.com/apache/doris/pull/54494
# Add version checking support for replace partition operations
### What problem does this PR solve?
Issue Number: #52031
Problem Summary:
Currently, the `REPLACE PARTITION` operation in Apache Doris has no
version-checking mechanism, which can lead to data consistency issues during
partition replacement. When users replace partitions (for example, to adjust
the bucketing strategy or to merge small partitions), new data may be written
to the original partitions after the temporary partitions have been created
but before the replacement is executed. Because the replacement swaps the
temporary partitions in wholesale, those concurrent writes can be lost or
leave the table in an inconsistent state.
### What changes were proposed in this PR?
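This PR adds an optional version check to `REPLACE PARTITION` operations: the
caller states the version each target partition is expected to have, and the
replacement is rejected if any partition no longer matches. This ensures data
consistency and prevents data loss during partition replacement.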
**Key Changes:**
1. **Enhanced ReplacePartitionClause**: Added an `expectedVersions` field and
related methods to support version checking in the legacy analysis framework.
2. **Enhanced ReplacePartitionOp**: Added the corresponding support, including
version parsing and validation, in the Nereids optimizer framework.
3. **Version Validation Logic**: Implemented a `validatePartitionVersions()`
method in `Env.java` that checks partition versions before the replacement is
performed.
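The check performed by `validatePartitionVersions()` can be sketched roughly as follows. This is a hypothetical, self-contained illustration rather than the code in this PR: the class `PartitionVersionCheck`, its `validate` method, and passing the current versions as a plain `Map` are assumptions made for the example; the real method presumably reads versions from the catalog's partition objects.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch (not the Doris implementation): compare each expected
 * partition version with the partition's current version and refuse the
 * replacement if any partition is missing or has a different version.
 */
final class PartitionVersionCheck {
    private PartitionVersionCheck() {}

    static void validate(Map<String, Long> expectedVersions,
                         Map<String, Long> currentVersions) {
        List<String> problems = new ArrayList<>();
        for (Map.Entry<String, Long> e : expectedVersions.entrySet()) {
            Long current = currentVersions.get(e.getKey());
            if (current == null) {
                // The partition named in "versions" does not exist (any more).
                problems.add(e.getKey() + " does not exist");
            } else if (!current.equals(e.getValue())) {
                // The partition changed after the expected version was recorded.
                problems.add(e.getKey() + " expected version " + e.getValue()
                        + " but found " + current);
            }
        }
        if (!problems.isEmpty()) {
            // Report every mismatch at once so the user can fix them together.
            throw new IllegalStateException(
                    "Partition version check failed: " + String.join("; ", problems));
        }
    }
}
```

A comparison like this only closes the race window if it runs while holding the same table lock as the partition swap itself; a check performed before acquiring that lock would still allow a write to land between validation and replacement.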
**Usage Example:**
```sql
ALTER TABLE page_views
REPLACE PARTITION (p20240601, p20240602)
WITH TEMPORARY PARTITION (tp20240601, tp20240602)
PROPERTIES (
"versions" = "p20240601:2, p20240602:3",
"strict_range" = "true"
);
```
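A `versions` value such as `"p20240601:2, p20240602:3"` is a comma-separated list of `partition:version` pairs. The sketch below shows one way to parse it, assuming that whitespace around entries is ignored and that malformed, negative, or duplicate entries are rejected. `VersionsPropertyParser` is a hypothetical name, and the actual parsing rules in `ReplacePartitionOp` / `ReplacePartitionClause` may differ.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical helper (not the Doris implementation): parses a "versions"
 * property such as "p20240601:2, p20240602:3" into partition name -> version.
 */
final class VersionsPropertyParser {
    private VersionsPropertyParser() {}

    static Map<String, Long> parse(String property) {
        // LinkedHashMap keeps the partitions in the order the user wrote them.
        Map<String, Long> expected = new LinkedHashMap<>();
        if (property == null || property.trim().isEmpty()) {
            return expected; // property absent: no version check requested
        }
        for (String entry : property.split(",")) {
            String trimmed = entry.trim();
            int sep = trimmed.indexOf(':');
            // Require exactly one ':' with text on both sides.
            if (sep <= 0 || sep != trimmed.lastIndexOf(':') || sep == trimmed.length() - 1) {
                throw new IllegalArgumentException("Invalid versions entry: '" + trimmed + "'");
            }
            String partition = trimmed.substring(0, sep).trim();
            long version;
            try {
                version = Long.parseLong(trimmed.substring(sep + 1).trim());
            } catch (NumberFormatException e) {
                throw new IllegalArgumentException("Invalid version in entry: '" + trimmed + "'", e);
            }
            if (version < 0) {
                throw new IllegalArgumentException("Negative version in entry: '" + trimmed + "'");
            }
            if (expected.put(partition, version) != null) {
                throw new IllegalArgumentException("Duplicate partition in versions: " + partition);
            }
        }
        return expected;
    }
}
```

For the statement above, `parse("p20240601:2, p20240602:3")` yields `{p20240601=2, p20240602=3}`, which is the kind of expected-version map a validation step would compare against the partitions' current versions.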
**Benefits:**
- **Data Safety**: Prevents data loss by rejecting the replacement if any
target partition has been modified since its expected version was recorded
- **Consistency**: A replacement prepared from stale data fails instead of
overwriting newer writes made by concurrent operations
- **Atomicity**: Provides atomic partition replacement with version
validation
- **Backward Compatibility**: The `versions` property is optional; statements
that omit it behave as before
### Release note
**Feature**
- Add version checking support for `REPLACE PARTITION` operations to prevent
data loss and ensure consistency during partition replacement. Users can now
specify expected partition versions using the `versions` property to validate
that partitions haven't been modified during the replacement process.
### Check List (For Author)
- Test <!-- At least one of them must be included. -->
- [X] Regression test
- [ ] Unit Test
- [ ] Manual test (add detailed scripts or steps below)
- [ ] No need to test or manual test. Explain why:
- [ ] This is a refactor/code format and no logic has been changed.
- [ ] Previous test can cover this change.
- [ ] No code files have been changed.
- [ ] Other reason <!-- Add your reason? -->
- Behavior changed:
- [ ] No.
- [ ] Yes. <!-- Explain the behavior change -->
- Does this need documentation?
- [ ] No.
- [ ] Yes. <!-- Add document PR link here. eg:
https://github.com/apache/doris-website/pull/1214 -->
### Check List (For Reviewer who merge this PR)
- [ ] Confirm the release note
- [ ] Confirm test cases
- [ ] Confirm document
- [ ] Add branch pick label <!-- Add branch pick label that this PR should
merge into -->