Ext3h opened a new issue, #48190:
URL: https://github.com/apache/arrow/issues/48190

   ### Describe the enhancement requested
   
   Right now there are integrations for `jemalloc` and `mimalloc` as 
alternative memory pools, but both of them only use the ***global*** heap APIs 
of the corresponding libraries. While that does work in general, there's still 
a significant performance benefit to using explicitly isolated heaps.
   
   E.g., when recording 100+ Parquet files in parallel in a single process on a system with 100+ logical processors, even `mimalloc` still performs >10% better with a dedicated heap per thread than with a single shared global heap. The main benefit comes from the fact that this library was already properly tuned to avoid memory allocation patterns that cannot be served from the backing arena, but those optimizations no longer apply universally once too many threads interfere. This does come at the cost of a slight increase in total memory consumption, but in return yields a significant reduction in page table updates (from calls into the system allocator as well as from page faults), which become quite costly as core counts grow.
   
   Given that `BaseMemoryPoolImpl` is hidden in the implementation, implementing a new memory pool with correct alignment handling, statistics reporting, etc. is unnecessarily hard for users of the external API.
   
   A variant of `MimallocAllocator` that explicitly uses the `mi_heap_*` allocation functions instead of the global `mi_*` ones should be provided directly by Arrow.
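   
   A minimal sketch of what such a pool could look like, assuming a recent Arrow where `arrow::MemoryPool`'s virtual methods take an explicit alignment parameter (these signatures have changed across releases). The class name `MimallocHeapPool` and the statistics bookkeeping are illustrative, not an existing Arrow API:

```cpp
#include <atomic>
#include <cstdint>
#include <string>

#include <mimalloc.h>

#include <arrow/memory_pool.h>
#include <arrow/status.h>

// Illustrative per-instance pool backed by a private mi_heap_t.
// Note: mi_heap_* allocation calls are only safe from the thread that
// owns the heap; mi_free() may be called from any thread and routes the
// block back to the owning heap.
class MimallocHeapPool : public arrow::MemoryPool {
 public:
  MimallocHeapPool() : heap_(mi_heap_new()) {}

  ~MimallocHeapPool() override {
    // mi_heap_delete migrates any still-live allocations to the default
    // heap; mi_heap_destroy would instead free them all at once.
    mi_heap_delete(heap_);
  }

  arrow::Status Allocate(int64_t size, int64_t alignment, uint8_t** out) override {
    void* p = mi_heap_malloc_aligned(heap_, static_cast<size_t>(size),
                                     static_cast<size_t>(alignment));
    if (p == nullptr) return arrow::Status::OutOfMemory("mimalloc heap allocation failed");
    *out = static_cast<uint8_t*>(p);
    UpdateStats(size, /*is_alloc=*/true);
    return arrow::Status::OK();
  }

  arrow::Status Reallocate(int64_t old_size, int64_t new_size, int64_t alignment,
                           uint8_t** ptr) override {
    void* p = mi_heap_realloc_aligned(heap_, *ptr, static_cast<size_t>(new_size),
                                      static_cast<size_t>(alignment));
    if (p == nullptr) return arrow::Status::OutOfMemory("mimalloc heap reallocation failed");
    *ptr = static_cast<uint8_t*>(p);
    UpdateStats(new_size - old_size, /*is_alloc=*/false);
    return arrow::Status::OK();
  }

  void Free(uint8_t* buffer, int64_t size, int64_t /*alignment*/) override {
    mi_free(buffer);  // mimalloc routes the free back to the owning heap
    bytes_allocated_.fetch_sub(size, std::memory_order_relaxed);
  }

  int64_t bytes_allocated() const override {
    return bytes_allocated_.load(std::memory_order_relaxed);
  }
  int64_t max_memory() const override {
    return max_memory_.load(std::memory_order_relaxed);
  }
  int64_t total_bytes_allocated() const override {
    return total_allocated_.load(std::memory_order_relaxed);
  }
  int64_t num_allocations() const override {
    return num_allocs_.load(std::memory_order_relaxed);
  }
  std::string backend_name() const override { return "mimalloc-heap"; }

 private:
  // Simplified accounting; a production version would follow Arrow's
  // internal statistics semantics exactly.
  void UpdateStats(int64_t diff, bool is_alloc) {
    const int64_t current =
        bytes_allocated_.fetch_add(diff, std::memory_order_relaxed) + diff;
    int64_t prev_max = max_memory_.load(std::memory_order_relaxed);
    while (current > prev_max &&
           !max_memory_.compare_exchange_weak(prev_max, current,
                                              std::memory_order_relaxed)) {
    }
    if (diff > 0) total_allocated_.fetch_add(diff, std::memory_order_relaxed);
    if (is_alloc) num_allocs_.fetch_add(1, std::memory_order_relaxed);
  }

  mi_heap_t* heap_;
  std::atomic<int64_t> bytes_allocated_{0};
  std::atomic<int64_t> max_memory_{0};
  std::atomic<int64_t> total_allocated_{0};
  std::atomic<int64_t> num_allocs_{0};
};
```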
   
   Unlike `MimallocAllocator`, which is intended to be used as a shared or singleton-like global instance, the intended use of this pool is to explicitly create a distinct pool per NUMA region, or potentially even per `ParquetFileWriter` (see the usage sketch below).
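   
   For illustration, per-writer usage could then look roughly like this, assuming the recent `parquet::arrow::FileWriter::Open` overload that takes a `MemoryPool*` (signatures also vary across releases); `MimallocHeapPool` is the hypothetical pool sketched above:

```cpp
#include <memory>
#include <string>

#include <arrow/io/file.h>
#include <arrow/result.h>
#include <arrow/status.h>
#include <arrow/table.h>
#include <parquet/arrow/writer.h>
#include <parquet/properties.h>

// Hypothetical: one dedicated heap-backed pool per writer, so concurrent
// writers never contend on a shared allocator heap. The pool is declared
// before the writer so it outlives all buffers the writer allocates.
arrow::Status WriteOneFile(const std::shared_ptr<arrow::Schema>& schema,
                           const std::shared_ptr<arrow::Table>& table,
                           const std::string& path) {
  auto pool = std::make_unique<MimallocHeapPool>();  // sketched above

  ARROW_ASSIGN_OR_RAISE(auto sink, arrow::io::FileOutputStream::Open(path));

  // Route both Parquet-level and Arrow-level allocations to the private pool.
  auto props = parquet::WriterProperties::Builder().memory_pool(pool.get())->build();

  ARROW_ASSIGN_OR_RAISE(
      auto writer,
      parquet::arrow::FileWriter::Open(*schema, pool.get(), sink, props,
                                       parquet::default_arrow_writer_properties()));
  ARROW_RETURN_NOT_OK(writer->WriteTable(*table, /*chunk_size=*/64 * 1024));
  return writer->Close();
}
```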
   
   ### Component(s)
   
   C++, Parquet

