Nice job! As a database, we really do need to manage memory usage more sensibly ourselves.
--
Best Regards
陈明雨 Mingyu Chen
Email: [email protected]

At 2019-09-10 22:16:43, "Zhao Chun" <[email protected]> wrote:
>Hi all
>
>I want to add a chunk allocator; the design follows.
>
>## Motivation
>In high-concurrency testing, many threads are waiting to allocate and
>release memory, and a large share of these operations come from Chunks in
>MemPool. One of the reasons for this is that MemPool is used everywhere in
>the code. On the other hand, these chunks are relatively large, 4K - 512K.
>Allocations this large easily exceed the free memory that TCMalloc reserves
>for each thread, so they have to go to the central memory.
>Therefore, I implemented a demo ChunkAllocator that keeps the released
>Chunks, avoiding frequent allocation from and release to TCMalloc. Using
>this demo to test the same high-concurrency case, the throughput more than
>doubled, from 280 QPS to 650 QPS. So based on this, I want to implement a
>ChunkAllocator to reduce the Chunk allocation and release operations that
>reach the system allocator, thus improving the performance of the system.
>
>## Design
>How to manage free Chunks? The size of a Chunk is a power of two, so we can
>maintain a separate free chunk list for each size. When a Chunk is no
>longer used, it is placed in the free list of the corresponding size.
>When allocating a new Chunk, we first try to find one in the free list of
>the corresponding size. If none is found, we allocate a new Chunk from the
>system allocator.
>
>To avoid lock contention in the Chunk Allocator, which would hurt system
>performance, we need to reduce the collision domain. The idea here is to
>maintain a Chunk Arena for each CPU core. When allocating, try to allocate
>memory from the Chunk Arena of the current core.
>
>For memory limitations, there are two options. One is to set a limit on the
>total amount of memory that can be allocated; the other is to set a limit
>on the maximum amount of free memory that is reserved. In order to be
>compatible with the current system behavior, I intend to limit only the
>total amount of reserved memory. Allocation then only fails when the system
>memory is completely drained, which is consistent with the current
>behavior. The larger the reserved free memory limit, the better the cache
>hit rate, but it also leaves more memory idle, making it hard for other
>modules to allocate memory.
>
>Which system allocator is used, malloc or mmap? Currently, malloc is used.
>If we change to mmap and do not change the system parameter
>(vm.max_map_count), memory allocation may fail even if there is memory
>available. We can implement both types of system allocator and leave a
>configuration option to choose which one performs the system memory
>allocation, with malloc as the default.
>
>Future work:
>All large memory allocations in the system can go through the Chunk
>Allocator, so that the Chunk Allocator can be changed from a reserved-memory
>limit to a memory allocation limit.
>
>## Structure
>```
>struct Chunk {
>    uint8_t* data;
>    size_t size;
>    // core id from which this chunk was allocated
>    int core_id;
>};
>// Keep free chunks for each CPU core
>class ChunkArena {
>public:
>    // Pop a free chunk from the corresponding free list
>    // Return true if success with valid chunk saved in "chunk"
>    bool pop_free_chunk(size_t size, Chunk* chunk);
>
>    // Push a free chunk into this arena for later use
>    void push_free_chunk(const Chunk& chunk);
>};
>class ChunkAllocator {
>public:
>    // Allocate memory in size, size must be power-of-two.
>    // Return Status::OK() if success, and allocated chunk info will be
>    // saved in chunk
>    Status allocate(size_t size, Chunk* chunk);
>
>    void free(const Chunk& chunk);
>};
>```
>Allocate process:
>1. Get the current core_id
>2. Try to get a free Chunk from the corresponding Arena. If successful,
>return that Chunk.
>3. Try to get a free Chunk from the Arenas of the other cores. If
>successful, return that Chunk.
>4. Allocate a Chunk from the system allocator
>Release process:
>1. Determine whether there is enough cache capacity, and if so, place the
>chunk in the free list of the corresponding Arena.
>2. Otherwise, call the system release function to release the resource
>
>I created an issue on GitHub [1] and look forward to your feedback.
>
>1. https://github.com/apache/incubator-doris/issues/1776
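
To make the allocate and release steps quoted above concrete, here is a minimal C++ sketch. It is only an illustration under assumptions, not the implementation proposed in the issue: the caller passes `core_id` explicitly (a real version would presumably look it up, e.g. with `sched_getcpu()` on Linux), `allocate` returns `bool` instead of `Status`, `malloc` is the system allocator, and `size_class`, `reserve_limit` and `reserved_bytes` are hypothetical names added for the example.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <mutex>
#include <vector>

struct Chunk {
    uint8_t* data = nullptr;
    size_t size = 0;
    int core_id = -1;  // core from which this chunk was allocated
};

// Hypothetical helper: free-list index for a power-of-two size (log2).
inline int size_class(size_t size) {
    assert(size != 0 && (size & (size - 1)) == 0);
    int idx = 0;
    while ((size_t{1} << idx) < size) ++idx;
    return idx;
}

// One arena per CPU core, with one free list per power-of-two size.
class ChunkArena {
public:
    ~ChunkArena() {
        // Return any chunks still cached to the system allocator.
        for (auto& list : _free_lists)
            for (uint8_t* p : list) std::free(p);
    }
    bool pop_free_chunk(size_t size, Chunk* chunk) {
        std::lock_guard<std::mutex> guard(_lock);
        auto& list = _free_lists[size_class(size)];
        if (list.empty()) return false;
        chunk->data = list.back();
        chunk->size = size;
        list.pop_back();
        return true;
    }
    void push_free_chunk(const Chunk& chunk) {
        std::lock_guard<std::mutex> guard(_lock);
        _free_lists[size_class(chunk.size)].push_back(chunk.data);
    }
private:
    std::mutex _lock;
    std::vector<uint8_t*> _free_lists[64];
};

class ChunkAllocator {
public:
    // reserve_limit: assumed cap on the bytes kept in free lists.
    ChunkAllocator(size_t num_cores, size_t reserve_limit)
        : _arenas(num_cores), _reserve_limit(reserve_limit) {}

    bool allocate(size_t size, size_t core_id, Chunk* chunk) {
        // Steps 2 and 3: i == 0 is the caller's arena, then the others.
        for (size_t i = 0; i < _arenas.size(); ++i) {
            size_t arena = (core_id + i) % _arenas.size();
            if (_arenas[arena].pop_free_chunk(size, chunk)) {
                _reserved.fetch_sub(size);
                chunk->core_id = static_cast<int>(core_id);
                return true;
            }
        }
        // Step 4: fall back to the system allocator.
        chunk->data = static_cast<uint8_t*>(std::malloc(size));
        if (chunk->data == nullptr) return false;
        chunk->size = size;
        chunk->core_id = static_cast<int>(core_id);
        return true;
    }

    void free(const Chunk& chunk) {
        // Release step 1: cache the chunk if it fits under the limit.
        size_t old = _reserved.load();
        while (old + chunk.size <= _reserve_limit) {
            if (_reserved.compare_exchange_weak(old, old + chunk.size)) {
                _arenas[chunk.core_id].push_free_chunk(chunk);
                return;
            }
        }
        // Release step 2: otherwise hand it back to the system.
        std::free(chunk.data);
    }

    size_t reserved_bytes() const { return _reserved.load(); }

private:
    std::vector<ChunkArena> _arenas;
    size_t _reserve_limit;
    std::atomic<size_t> _reserved{0};
};
```

In this sketch a chunk freed on core 0 can be picked up again by an allocation on core 1 through step 3, and once the reserved bytes would exceed `reserve_limit` a freed chunk goes straight back to `malloc`'s heap. A per-arena mutex is used only for simplicity; whether that is enough to remove the contention seen in the benchmark is something the real implementation would have to measure.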
