xuchenhao opened a new issue, #59504: URL: https://github.com/apache/doris/issues/59504
### Search before asking

- [x] I had searched in the [issues](https://github.com/apache/doris/issues?q=is%3Aissue) and found no similar issues.

### Version

master

### What's Wrong?

**Location**: `be/src/io/cache/fs_file_cache_storage.cpp` in `FSFileCacheStorage::load_cache_info_into_memory()` (around line 880)

**Description**: When loading the cache, the code measures the consistency between the RocksDB metadata and the filesystem using:

```
double difference_ratio = (static_cast<double>(estimated_file_count) - static_cast<double>(db_block_count)) / static_cast<double>(estimated_file_count);
```

This formula assumes `estimated_file_count >= db_block_count`, where:

- `estimated_file_count = directory_size / 1MB` (an upper-bound assumption)
- `db_block_count` is the number of cache blocks actually loaded from RocksDB

However, in **data lake scenarios** with many small files (<1MB), this estimate becomes an **underestimate**: `estimated_file_count < db_block_count`, so `difference_ratio` becomes negative.

**Impact**:

1. Inaccurate metric: a negative ratio does not represent the actual magnitude of the discrepancy.
2. Wrong decisions: the filesystem reload may be incorrectly skipped, because a negative `difference_ratio` falls below the threshold even when the actual discrepancy is large.

### What You Expected?

**Suggested fix**: Use the absolute value to measure the magnitude of the discrepancy:

```
double difference_ratio = std::abs(static_cast<double>(estimated_file_count) - static_cast<double>(db_block_count)) / static_cast<double>(estimated_file_count);
```

### How to Reproduce?

_No response_

### Anything Else?

_No response_

### Are you willing to submit PR?

- [ ] Yes I am willing to submit a PR!

### Code of Conduct

- [x] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
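For illustration, the sign problem described under **What's Wrong?** can be reproduced with a minimal, self-contained C++ sketch that isolates the arithmetic. The helper names `signed_difference_ratio`, `abs_difference_ratio` and `should_reload`, the rule "reload when the ratio exceeds the threshold", and the example threshold of `0.5` are hypothetical; they are not the actual Doris code. The sketch also assumes the caller guarantees `estimated_file_count > 0`.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helpers that isolate the ratio calculation from
// FSFileCacheStorage::load_cache_info_into_memory(); names are illustrative only.

// Current formula: signed, so it becomes negative when the directory-size
// estimate undercounts the blocks recorded in RocksDB.
double signed_difference_ratio(int64_t estimated_file_count, int64_t db_block_count) {
    assert(estimated_file_count > 0);  // assumption: the caller never passes a zero divisor
    return (static_cast<double>(estimated_file_count) - static_cast<double>(db_block_count)) /
           static_cast<double>(estimated_file_count);
}

// Proposed formula: absolute difference, so both over- and underestimation
// are reported as a non-negative discrepancy.
double abs_difference_ratio(int64_t estimated_file_count, int64_t db_block_count) {
    assert(estimated_file_count > 0);  // same assumption as above
    return std::abs(static_cast<double>(estimated_file_count) -
                    static_cast<double>(db_block_count)) /
           static_cast<double>(estimated_file_count);
}

// Hypothetical reload decision: reload from the filesystem when the
// discrepancy exceeds `threshold` (the real comparison in Doris may differ).
bool should_reload(double difference_ratio, double threshold) {
    return difference_ratio > threshold;
}
```

For example, a 50 MB cache directory full of small blocks gives `estimated_file_count = 50`. If RocksDB holds 200 blocks, `signed_difference_ratio(50, 200)` is `-3.0` and the hypothetical `should_reload(-3.0, 0.5)` returns `false`, while `abs_difference_ratio(50, 200)` is `3.0` and triggers the reload. When `estimated_file_count >= db_block_count`, both formulas return the same value, so the suggested fix only changes behavior in the underestimation case.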
