https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96942
--- Comment #4 from Dmitriy Ovdienko <dmitriy.ovdienko at gmail dot com> ---
Created attachment 49190
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49190&action=edit
Modified solution with custom allocator based on malloc (simplified, single threaded)

Attached is a benchmark based on a malloc allocator (modified, simplified,
single threaded).

The following are the execution times for different tree depths:

                 depth_17  depth_18  depth_19
bt_pmr_0thrd     0.105s    0.313s    0.577s
bt_malloc_0thrd  0.087s    0.147s    0.448s

The command lines are:

time ./bt_pmr_0thrd <depth>
time ./bt_malloc_0thrd <depth>

At the depth=18 boundary there is about a 2x difference (0.313s vs 0.147s).
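
For readers without the attachment at hand, here is a rough sketch of what a simplified, single-threaded malloc-based allocator driving a binary-tree workload could look like. It is not the attached code: MallocAllocator, Node, build, count_and_free and run are hypothetical names, and the assumption that the benchmark builds and frees a complete binary tree of the given depth is inferred only from the bt_ prefix and the <depth> argument.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <memory>
#include <new>

// Hypothetical minimal allocator: every request goes straight to malloc/free.
// No locking and no pooling, so it is only suitable for single-threaded use.
template <typename T>
struct MallocAllocator {
    using value_type = T;

    MallocAllocator() noexcept = default;
    template <typename U>
    MallocAllocator(const MallocAllocator<U>&) noexcept {}

    T* allocate(std::size_t n) {
        void* p = std::malloc(n * sizeof(T));
        if (!p) throw std::bad_alloc();
        return static_cast<T*>(p);
    }
    void deallocate(T* p, std::size_t) noexcept { std::free(p); }
};

// The allocator is stateless, so all instances compare equal.
template <typename T, typename U>
bool operator==(const MallocAllocator<T>&, const MallocAllocator<U>&) noexcept { return true; }
template <typename T, typename U>
bool operator!=(const MallocAllocator<T>&, const MallocAllocator<U>&) noexcept { return false; }

// Assumed node layout for the tree workload.
struct Node {
    Node* left = nullptr;
    Node* right = nullptr;
};

using NodeAlloc = MallocAllocator<Node>;
using Traits = std::allocator_traits<NodeAlloc>;

// Build a complete binary tree; depth 0 is a single leaf.
Node* build(NodeAlloc& a, int depth) {
    assert(depth >= 0);
    Node* n = Traits::allocate(a, 1);
    Traits::construct(a, n);
    if (depth > 0) {
        n->left = build(a, depth - 1);
        n->right = build(a, depth - 1);
    }
    return n;
}

// Count the nodes while releasing them through the same allocator.
std::size_t count_and_free(NodeAlloc& a, Node* n) {
    if (!n) return 0;
    std::size_t c = 1 + count_and_free(a, n->left) + count_and_free(a, n->right);
    Traits::destroy(a, n);
    Traits::deallocate(a, n, 1);
    return c;
}

// One benchmark-style round: allocate the whole tree, then free it.
std::size_t run(int depth) {
    NodeAlloc a;
    return count_and_free(a, build(a, depth));
}
```

Calling run(depth) from a main() that parses <depth> from argv would give a program that can be timed the same way as the command lines above. The numbers it produces are not comparable to the table unless the workload matches the attachment.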