AshinGau opened a new pull request, #38184:
URL: https://github.com/apache/doris/pull/38184

   ## Proposed changes
   
   ### Three Optimizations
   1. **Default Value Change**: Renamed `file_cache_max_file_segment_size` to 
`file_cache_each_block_size`, and `file_cache_min_file_segment_size` to 
`file_cache_hdfs_block_size` (the same as on the master branch). Changed the 
default value of `file_cache_hdfs_block_size` from 1MB to 4KB to reduce read 
amplification: with smaller blocks, less data beyond what a query actually 
requests is fetched, which improves read performance. The trade-off is that 
the cache now holds a much larger number of small files, which can hurt file 
management and performance.
   2. **Asynchronous Write**: Introduced an asynchronous write interface, 
`async_write`, in the `FileBlock` class. Data can now be returned to the reader 
without waiting for the write to the cache file to complete. Decoupling reads 
from cache writes reduces read latency, because read operations are no longer 
blocked by write operations.
   3. **Background Merging Process**: Implemented a daemon process, triggered 
when `CachedRemoteFileReader::close()` is called, that checks for small cache 
files and merges them into larger ones. This offsets the side effect of the 
first optimization, namely the accumulation of small files, and prevents the 
performance degradation caused by handling an excessive number of small files.
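
To make the read-amplification argument in item 1 concrete, here is a minimal sketch of the arithmetic, assuming a read is served by fetching every cache block it overlaps. `AlignedRead`, `align_to_blocks`, and `read_amplification` are hypothetical names for illustration and are not taken from the Doris source; the sketch also ignores clamping to the end of the file, and treats 1MB as 1048576 bytes.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not part of Doris: the block-aligned range that has
// to be fetched in order to serve one read request.
struct AlignedRead {
    std::uint64_t begin;       // first byte fetched (inclusive)
    std::uint64_t end;         // one past the last byte fetched (exclusive)
    std::uint64_t num_blocks;  // number of cache blocks touched
};

// Expands [offset, offset + size) outward to multiples of block_size.
inline AlignedRead align_to_blocks(std::uint64_t offset, std::uint64_t size,
                                   std::uint64_t block_size) {
    assert(block_size > 0 && size > 0);
    // Round the start down and the end up to a block boundary.
    const std::uint64_t begin = offset / block_size * block_size;
    const std::uint64_t end =
            (offset + size + block_size - 1) / block_size * block_size;
    return {begin, end, (end - begin) / block_size};
}

// Ratio of bytes fetched to bytes actually requested.
inline double read_amplification(const AlignedRead& r, std::uint64_t size) {
    return static_cast<double>(r.end - r.begin) / static_cast<double>(size);
}
```

Under these assumptions, a 10240-byte read at offset 5000 touches three 4KB blocks (bytes 4096 to 16384) for an amplification of 1.2, whereas with a 1MB block it touches a single block (bytes 0 to 1048576) for an amplification of 102.4. The same read that used to create one cache file now creates three, which is the small-file cost that item 3 addresses.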
   
   ### Effects
   Before Opt.
   
![image](https://github.com/apache/doris/assets/19337507/a88dd687-f63e-4f2d-b536-42861efc77f1)
   After Opt.
   
![image](https://github.com/apache/doris/assets/19337507/7e5df9d6-f9ee-455f-adb7-4d420d60a045)
   
   ### RESTful API
   ```shell
   # release all cached files (URL quoted so the shell does not treat `?` as a glob)
   curl "http://${be}:${webserver_port}/api/file_cache?op=release"
   # merge small cached files
   curl "http://${be}:${webserver_port}/api/file_cache?op=merge"
   ```
   
   ### Test
   Stability test scripts ran continuously for over 2 hours, and no 
concurrency issues or query errors were detected. With no deletion or merging 
operations, the total execution time for the test SQL was 7.42 seconds. Under 
frequent deletion and merging of cache files, the total execution time for the 
test SQL was 8.73 seconds, an increase of 1.31 seconds (about 18%). Overall, 
performance remained stable even while cache files were being deleted and 
merged.


