AshinGau opened a new pull request, #38184: URL: https://github.com/apache/doris/pull/38184
## Proposed changes

### Three Optimizations

1. **Default Value Change**: Renamed `file_cache_max_file_segment_size` to `file_cache_each_block_size`, and `file_cache_min_file_segment_size` to `file_cache_hdfs_block_size` (the same names as on the master branch). Changed the default value of `file_cache_hdfs_block_size` from 1MB to 4KB to reduce read amplification, so that less unneeded data is read along with each request. The trade-off is that smaller blocks produce a larger number of small cache files, which can affect file management and performance.
2. **Asynchronous Write**: Introduced an asynchronous write interface, `async_write`, in the `FileBlock` class. Data can now be returned to the reader without waiting for the write to the cache file to complete. Decoupling reads from cache writes reduces read latency, because read operations are no longer blocked by write operations.
3. **Background Merging Process**: Implemented a daemon process that runs when `CachedRemoteFileReader::close()` is called to check for and merge small files. It is designed to offset the side effect of the first optimization, namely the accumulation of small files. Merging small files into larger ones prevents the performance degradation caused by handling an excessive number of small files.

### Effects

Before Opt.: [image not preserved]

After Opt.: [image not preserved]

### RESTful API

```shell
# release all cached files
curl "http://${be}:${webserver_port}/api/file_cache?op=release"
# merge small cached files
curl "http://${be}:${webserver_port}/api/file_cache?op=merge"
```

### Test

After extensive stability testing, with scripts running continuously for over 2 hours, no concurrency issues or query errors were detected.
When no deletion or merging operations were performed, the total execution time for the test SQL was 7.42 seconds. Under conditions of frequent deletion and merging of cache files, the total execution time for the test SQL was 8.73 seconds. Overall, performance remained consistent with minimal fluctuation.
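
### Read Amplification Sketch

The trade-off behind the block size change in the first optimization can be illustrated with some simple arithmetic. The sketch below is not taken from the Doris code base: it assumes that a cached read is rounded out to whole blocks and that each block touched is stored as its own cache file. The helper names `blocks_touched` and `aligned_read_bytes` and the sample request (10,000 bytes at offset 5,000) are hypothetical and chosen only for illustration.

```python
# Illustrative model only: assumes reads are aligned to whole cache blocks
# and that each block maps to one cache file. Not the actual Doris logic.

def blocks_touched(offset: int, size: int, block_size: int) -> int:
    """Number of blocks that overlap the byte range [offset, offset + size)."""
    if size <= 0:
        return 0
    first = offset // block_size
    last = (offset + size - 1) // block_size
    return last - first + 1


def aligned_read_bytes(offset: int, size: int, block_size: int) -> int:
    """Bytes fetched when the request is rounded out to block boundaries."""
    return blocks_touched(offset, size, block_size) * block_size


if __name__ == "__main__":
    offset, size = 5_000, 10_000  # hypothetical request
    for label, block_size in (("1MB", 1024 * 1024), ("4KB", 4 * 1024)):
        fetched = aligned_read_bytes(offset, size, block_size)
        files = blocks_touched(offset, size, block_size)
        print(f"{label} blocks: fetched={fetched} bytes, "
              f"amplification={fetched / size:.1f}x, cache files={files}")
```

Under these assumptions, the 1MB block size fetches a full 1,048,576 bytes for a 10,000-byte request (roughly 105x amplification) but creates a single cache file, while the 4KB block size fetches 12,288 bytes (roughly 1.2x) spread over three files. This is the small-file growth that the background merging process is meant to contain.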