https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96942
--- Comment #6 from Dmitriy Ovdienko <dmitriy.ovdienko at gmail dot com> ---
> looking at cache-misses counter does not make sense here

Well, if you compare Rust and C++, the cache-misses CPU counter differs
dramatically, and so does page-faults, while the number of instructions is
about the same. Page faults, by the way, can significantly affect performance
too. That could be the reason.

I've put all the numbers into one table for convenience:

| CPU counter      | PMR            | Malloc         | Rust           |
|------------------|----------------|----------------|----------------|
| cache-references | 45,104,136     | 40,713,525     | 29,268,774     |
| cache-misses     | 24,448,475     | 14,147,648     | 12,147,041     |
| cycles           | 19,904,251,283 | 14,823,743,812 | 24,539,557,585 |
| instructions     | 30,462,013,065 | 22,306,442,507 | 31,784,741,964 |
| branches         | 4,834,392,341  | 4,331,968,591  | 4,829,547,556  |
| faults           | 234,796        | 60,227         | 68,023         |
| migrations       | 2              | 6              | 8              |

> The main gotcha here is m_b_r does not allocate on construction, but rather
> allocates 2x of the preallocation size on first call to 'allocate'

In my two previous posts I attached code that does not create any threads and
allocates/deallocates memory in a loop. So both samples have the same
behaviour.
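For anyone who wants to compare the three columns of the table on a common
scale, here is a small sketch that turns the raw counters into ratios
(cache-miss ratio, instructions per cycle, faults per million instructions).
The struct and function names (PerfSample, cache_miss_ratio, ipc,
faults_per_million_instructions, print_derived_metrics) are made up for this
illustration and are not part of the attached test programs; the numbers are
copied from the table as-is. This is only arithmetic on a single run per
variant, so treat the ratios as a hint, not as a measurement.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>

// Hypothetical container for one column of the counter table above.
struct PerfSample {
    const char* name;
    std::uint64_t cache_references;
    std::uint64_t cache_misses;
    std::uint64_t cycles;
    std::uint64_t instructions;
    std::uint64_t faults;
};

// Counter values copied from the table (one run per variant).
const PerfSample kSamples[] = {
    {"PMR",    45104136ULL, 24448475ULL, 19904251283ULL, 30462013065ULL, 234796ULL},
    {"Malloc", 40713525ULL, 14147648ULL, 14823743812ULL, 22306442507ULL, 60227ULL},
    {"Rust",   29268774ULL, 12147041ULL, 24539557585ULL, 31784741964ULL, 68023ULL},
};

// Fraction of cache references that were misses.
double cache_miss_ratio(const PerfSample& s) {
    assert(s.cache_references != 0);
    return static_cast<double>(s.cache_misses) /
           static_cast<double>(s.cache_references);
}

// Instructions per cycle.
double ipc(const PerfSample& s) {
    assert(s.cycles != 0);
    return static_cast<double>(s.instructions) /
           static_cast<double>(s.cycles);
}

// Faults normalised by the amount of work done.
double faults_per_million_instructions(const PerfSample& s) {
    assert(s.instructions != 0);
    return static_cast<double>(s.faults) * 1e6 /
           static_cast<double>(s.instructions);
}

// Prints one line of derived ratios per variant.
void print_derived_metrics() {
    for (const PerfSample& s : kSamples) {
        std::printf("%-6s  miss ratio %.3f  IPC %.3f  faults/1M instr %.2f\n",
                    s.name, cache_miss_ratio(s), ipc(s),
                    faults_per_million_instructions(s));
    }
}
```

Calling print_derived_metrics() from main prints one line per variant, which
makes it easier to see whether the cache-miss and fault counts differ once
they are normalised by references and by instruction count.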