LemonCL opened a new pull request, #60950:
URL: https://github.com/apache/doris/pull/60950
## Proposed changes
This PR fixes a critical synchronization issue between
`Tablet::_cumulative_point` (in-memory) and
`TabletMeta::_cumulative_layer_point` (persistent in RocksDB), causing
cumulative compaction to lose progress after BE restart.
## Problem description
### Root Cause
There are two separate variables tracking the cumulative point:
- `Tablet::_cumulative_point` - Runtime value, updated by compaction
- `TabletMeta::_cumulative_layer_point` - Persistent value, stored in RocksDB
**These two were never synchronized**, leading to:
1. **After compaction**: `_cumulative_point` updated to 100, but
`_cumulative_layer_point` remains -1
2. **After BE restart**: Tablet loads `_cumulative_layer_point = -1` from
RocksDB, but constructor hardcoded `_cumulative_point = -1`
3. **Result**: Cumulative compaction restarts from scratch after every BE
restart
### Impact
- ❌ Cumulative compaction progress lost after BE restart
- ❌ Clone/Restore scenarios lose cumulative point information
- ❌ Inefficient compaction due to repeated work
## What changes were proposed in this pull request?
Add bidirectional synchronization between the two cumulative point variables:
### 1. Load Path (BE Restart/Clone/Restore)
**File**: `be/src/olap/tablet.cpp:260`
```cpp
// Before: hardcoded to -1
_cumulative_point(K_INVALID_CUMULATIVE_POINT),
// After: load from TabletMeta (which was deserialized from RocksDB)
_cumulative_point(_tablet_meta->cumulative_layer_point()),
```
**Data Flow**:
```
BE Startup
→ DataDir::load()
→ TabletMetaManager::traverse_headers() [RocksDB iteration]
→ TabletManager::load_tablet_from_meta(meta_binary)
→ TabletMeta::deserialize(meta_binary)
→ _cumulative_layer_point = pb.cumulative_layer_point() [from
RocksDB]
→ Tablet::Tablet(tablet_meta)
→ _cumulative_point = _tablet_meta->cumulative_layer_point()
[FIXED]
```
### 2. Save Path (After Compaction)
**File**: `be/src/olap/tablet.cpp:341`
```cpp
void Tablet::save_meta() {
check_table_size_correctness();
// NEW: Sync in-memory value to TabletMeta before persisting
_tablet_meta->set_cumulative_layer_point(_cumulative_point);
auto res = _tablet_meta->save_meta(_data_dir);
// ...
}
```
**Data Flow**:
```
CumulativeCompaction::execute_compact()
→ update_cumulative_point()
→ Tablet::set_cumulative_layer_point(100) [updates _cumulative_point]
→ Tablet::save_meta()
→ _tablet_meta->set_cumulative_layer_point(_cumulative_point) [NEW:
sync]
→ TabletMeta::save_meta()
→ serialize to RocksDB
```
## Types of changes
What types of changes does your code introduce to Doris?
- [x] Bugfix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that could cause existing
functionality to not work as expected)
- [ ] Documentation Update (if none of the other choices apply)
- [ ] Code refactor (Modify the code structure, format or style, but without
changing the code logic)
- [ ] Optimization. Including functional usability improvements and
performance improvements.
- [ ] Dependency. Such as changes related to third-party components.
## Checklist
- [x] I have read the [**CLA
Document**](https://cla-assistant.io/apache/doris) and made sure all commits
are signed-off.
- [x] I have added necessary documentation (if appropriate)
- [x] Any dependent changes have been merged
- [ ] I have added test cases that prove my fix is effective or that my
feature works
## Further comments
### Why is this safe?
**For existing tablets (BE restart/clone/restore)**:
- `TabletMeta::_cumulative_layer_point` is correctly loaded from RocksDB via
`init_from_pb()`
- Constructor now reads this value instead of hardcoding to -1
- ✅ Restores the previously saved cumulative point
**For new tablet creation**:
- `TabletMeta` constructor sets `_cumulative_layer_point = -1` (line 171 in
tablet_meta.cpp)
- Constructor reads this value: `_cumulative_point = -1`
- ✅ Still correctly initializes new tablets
### Related TODO Comment
This fix also resolves an existing TODO:
```cpp
// TODO(ygl): lost some information here, such as cumulative layer point
// engine_storage_migration_task.cpp:348
```
### Test Plan
Manual testing:
1. Create tablet and run cumulative compaction
2. Check `_cumulative_point` is updated (e.g., to 100)
3. Restart BE
4. Verify `_cumulative_point` is 100 (not -1)
Existing tests should pass as this doesn't change behavior for correctly
implemented code paths.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]