wyxxxcat opened a new pull request, #61666:
URL: https://github.com/apache/doris/pull/61666
## Summary
This PR implements a sharded LRU cache for `meta_tablet_idx_key` lookups in
both MetaService and Recycler to reduce frequent FDB reads of immutable
metadata.
## Background
MS-side operations like `commit_rowset`, `finish_tablet_job`, and
`commit_txn` frequently read the same tablet index metadata (`TabletIndexPB`),
which is nearly immutable after creation. This causes unnecessary FDB IO
overhead.
## Implementation
### Core Components
1. **KvCache Template** (`cloud/src/common/kv_cache.h`)
- Generic sharded LRU cache with 16 shards by default
- Reduces lock contention in high-concurrency scenarios
- Supports any `KeyTuple` and `ValuePB` types
- TTL support: entries expire after configurable time
2. **KvCacheManager** (`cloud/src/common/kv_cache_manager.h`)
- Manages cache instances with configurable capacity and TTL
- Extensible for future cache types (e.g., SchemaCache)
3. **Configuration** (`cloud/src/common/config.h`)
- `ms_tablet_index_cache_capacity`: MS cache capacity (default: 500000)
- `recycler_tablet_index_cache_capacity`: Recycler cache capacity
(default: 500000)
- `tablet_index_cache_ttl_seconds`: TTL in seconds (default: 0, which disables TTL expiry)
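To make the design above concrete, here is a minimal sketch of a sharded LRU cache with optional TTL. It is not the `KvCache` template from this PR: the class name `ShardedLruCache`, its constructor parameters, and the `get`/`put`/`invalidate` signatures are assumptions chosen for illustration, and the real code works with `KeyTuple` and protobuf value types.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <list>
#include <mutex>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical sketch; names and signatures are NOT taken from the PR.
template <typename K, typename V>
class ShardedLruCache {
public:
    using Clock = std::chrono::steady_clock;

    // ttl_seconds == 0 disables expiry, matching the documented default.
    explicit ShardedLruCache(size_t capacity, int64_t ttl_seconds = 0, size_t num_shards = 16)
            : ttl_(ttl_seconds),
              per_shard_capacity_(std::max<size_t>(1, capacity / std::max<size_t>(1, num_shards))),
              shards_(num_shards) {
        assert(num_shards > 0);
    }

    void put(const K& key, V value) {
        Shard& s = shard_for(key);
        std::lock_guard<std::mutex> lock(s.mu);
        auto it = s.index.find(key);
        if (it != s.index.end()) s.lru.erase(it->second);  // replace existing entry
        s.lru.push_front(Entry{key, std::move(value), Clock::now()});
        s.index[key] = s.lru.begin();
        if (s.lru.size() > per_shard_capacity_) {  // evict least recently used
            s.index.erase(s.lru.back().key);
            s.lru.pop_back();
        }
    }

    std::optional<V> get(const K& key) {
        Shard& s = shard_for(key);
        std::lock_guard<std::mutex> lock(s.mu);
        auto it = s.index.find(key);
        if (it == s.index.end()) return std::nullopt;
        if (ttl_.count() > 0 && Clock::now() - it->second->inserted_at >= ttl_) {
            s.lru.erase(it->second);  // expired: drop and report a miss
            s.index.erase(it);
            return std::nullopt;
        }
        // Move the entry to the front; list iterators stay valid after splice.
        s.lru.splice(s.lru.begin(), s.lru, it->second);
        return it->second->value;
    }

    void invalidate(const K& key) {
        Shard& s = shard_for(key);
        std::lock_guard<std::mutex> lock(s.mu);
        auto it = s.index.find(key);
        if (it == s.index.end()) return;
        s.lru.erase(it->second);
        s.index.erase(it);
    }

private:
    struct Entry {
        K key;
        V value;
        Clock::time_point inserted_at;
    };
    struct Shard {
        std::mutex mu;  // one lock per shard keeps contention local
        std::list<Entry> lru;  // front = most recently used
        std::unordered_map<K, typename std::list<Entry>::iterator> index;
    };

    Shard& shard_for(const K& key) { return shards_[std::hash<K>{}(key) % shards_.size()]; }

    std::chrono::seconds ttl_;
    size_t per_shard_capacity_;
    std::vector<Shard> shards_;
};
```

In this sketch the total capacity is split evenly across shards, so eviction is LRU per shard rather than globally. That is a common trade-off for sharded caches; whether the PR makes the same choice is not stated in the description.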
### Integration Points
**MetaService** (`cloud/src/meta-service/meta_service.cpp`):
- Initialize global `g_ms_cache_manager` in constructor
- Add cache lookup/put in `get_tablet_idx()` function
- Transparent to callers - no API changes required
**Recycler** (`cloud/src/recycler/util.cpp`, `recycler.cpp`):
- Initialize global `g_recycler_cache_manager` in `Recycler::start()`
- Add cache lookup/put in recycler's `get_tablet_idx()` function
- Invalidate cache when deleting tablet_idx_key in `recycle_tablets()`
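The lookup/put step in `get_tablet_idx()` follows a read-through pattern, which can be sketched roughly as below. Everything here is hypothetical: `TabletIndex` is a simplified stand-in for `TabletIndexPB`, `TabletIndexCache` is a plain mutex-guarded map rather than the sharded cache, and `KvLoader` stands in for the FDB read. The actual function signature in the PR may differ.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <mutex>
#include <optional>
#include <unordered_map>

// Hypothetical stand-in for TabletIndexPB; fields chosen for illustration.
struct TabletIndex {
    int64_t table_id = 0;
    int64_t index_id = 0;
    int64_t partition_id = 0;
};

// Hypothetical minimal cache; the PR uses a sharded LRU cache instead.
class TabletIndexCache {
public:
    std::optional<TabletIndex> get(int64_t tablet_id) {
        std::lock_guard<std::mutex> lock(mu_);
        auto it = map_.find(tablet_id);
        if (it == map_.end()) return std::nullopt;
        return it->second;
    }
    void put(int64_t tablet_id, const TabletIndex& idx) {
        std::lock_guard<std::mutex> lock(mu_);
        map_[tablet_id] = idx;
    }

private:
    std::mutex mu_;
    std::unordered_map<int64_t, TabletIndex> map_;
};

// Stand-in for the KV (FDB) read; returns nullopt when the key is absent.
using KvLoader = std::function<std::optional<TabletIndex>(int64_t)>;

// Read-through lookup: serve from cache, otherwise load and populate.
std::optional<TabletIndex> get_tablet_idx_cached(TabletIndexCache& cache, int64_t tablet_id,
                                                 const KvLoader& load_from_kv) {
    assert(static_cast<bool>(load_from_kv));
    if (auto hit = cache.get(tablet_id)) return hit;  // cache hit: no KV read
    std::optional<TabletIndex> loaded = load_from_kv(tablet_id);
    if (loaded) cache.put(tablet_id, *loaded);  // only cache successful reads
    return loaded;
}
```

Not caching misses is a deliberate choice in this sketch: a tablet index that does not exist yet may be created shortly afterwards, and a cached negative result would hide it.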
### Cache Invalidation Strategy
- **MS**: Invalidation hooks can be added on `drop_tablet`/`drop_index`/`drop_partition` if
needed
- **Recycler**: Actively invalidates cache when deleting tablet_idx_key in
`recycle_tablets()`
- **TTL**: Entries automatically expire after configured TTL (if enabled)
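The Recycler-side invalidation can be pictured with the sketch below, which removes the key from the store first and then drops the cached copy. `FakeKv`, `IndexCache`, and `delete_tablet_idx` are hypothetical names for illustration, and the ordering shown is an assumption; the description does not say how `recycle_tablets()` sequences the two steps.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical in-memory stand-in for the KV store (FDB in the real system).
struct FakeKv {
    std::unordered_map<int64_t, std::string> rows;
};

// Hypothetical minimal cache holding serialized index values.
class IndexCache {
public:
    void put(int64_t tablet_id, std::string value) {
        std::lock_guard<std::mutex> lock(mu_);
        map_[tablet_id] = std::move(value);
    }
    std::optional<std::string> get(int64_t tablet_id) {
        std::lock_guard<std::mutex> lock(mu_);
        auto it = map_.find(tablet_id);
        if (it == map_.end()) return std::nullopt;
        return it->second;
    }
    void invalidate(int64_t tablet_id) {
        std::lock_guard<std::mutex> lock(mu_);
        map_.erase(tablet_id);
    }

private:
    std::mutex mu_;
    std::unordered_map<int64_t, std::string> map_;
};

// Deletes the index row, then invalidates the cache entry.
// Returns true if the row existed in the store.
bool delete_tablet_idx(FakeKv& kv, IndexCache& cache, int64_t tablet_id) {
    bool existed = kv.rows.erase(tablet_id) > 0;  // stands in for a committed delete
    cache.invalidate(tablet_id);                  // later lookups must go to the store
    return existed;
}
```

Invalidating unconditionally, even when the row was already gone, keeps the cache from outliving a delete performed elsewhere.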
## Testing
Added comprehensive unit tests in `cloud/test/kv_cache_test.cpp`:
- Basic get/put operations
- LRU eviction behavior
- Cache invalidation
- Concurrent access (8 threads)
## Performance Benefits
- Reduces FDB read operations for frequently accessed tablet metadata
- 16-way sharding minimizes lock contention under high concurrency
- Transparent integration: callers of `get_tablet_idx()` need no changes
- Dual eviction: LRU + TTL for flexible cache management
### Release note
None
### Check List (For Author)
- Test <!-- At least one of them must be included. -->
- [ ] Regression test
- [ ] Unit Test
- [ ] Manual test (add detailed scripts or steps below)
- [ ] No need to test or manual test. Explain why:
- [ ] This is a refactor/code format and no logic has been changed.
- [ ] Previous test can cover this change.
- [ ] No code files have been changed.
- [ ] Other reason <!-- Add your reason? -->
- Behavior changed:
- [ ] No.
- [ ] Yes. <!-- Explain the behavior change -->
- Does this need documentation?
- [ ] No.
- [ ] Yes. <!-- Add document PR link here. eg:
https://github.com/apache/doris-website/pull/1214 -->
### Check List (For Reviewer who merge this PR)
- [ ] Confirm the release note
- [ ] Confirm test cases
- [ ] Confirm document
- [ ] Add branch pick label <!-- Add branch pick label that this PR should
merge into -->