https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96942

--- Comment #9 from Alexander Monakov <amonakov at gcc dot gnu.org> ---
The most pronounced difference for depth=18 seems to be caused by m_b_r
(std::pmr::monotonic_buffer_resource) over-allocating by 2x: internally it
mallocs twice the size given to the constructor, and then Linux pre-faults
those extra pages, penalizing the benchmark.

Dividing estimated size by 2 to counter the over-allocation effect:

    MemoryPool store (poolSize(stretch_depth) / 2);

substantially improves the benchmark for me.
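
One way to check the over-allocation on a given standard library is to put a
recording upstream resource behind the pool and compare what the pool asks for
with what the constructor was given. The following is a minimal sketch, not
code from the benchmark: LoggingResource and first_upstream_request are
hypothetical names introduced here, and the ratio you observe depends on the
library version in use.

```cpp
#include <cassert>
#include <cstddef>
#include <memory_resource>
#include <vector>

// Hypothetical helper (not part of the benchmark): records the size of every
// request the pool forwards upstream, then hands it to new_delete_resource().
struct LoggingResource : std::pmr::memory_resource {
    std::vector<std::size_t> requests;

private:
    void* do_allocate(std::size_t bytes, std::size_t align) override {
        requests.push_back(bytes);
        return std::pmr::new_delete_resource()->allocate(bytes, align);
    }
    void do_deallocate(void* p, std::size_t bytes, std::size_t align) override {
        std::pmr::new_delete_resource()->deallocate(p, bytes, align);
    }
    bool do_is_equal(const std::pmr::memory_resource& other) const noexcept override {
        return this == &other;
    }
};

// Returns the size of the first block that a monotonic_buffer_resource
// constructed with `initial_size` requests from its upstream resource.
std::size_t first_upstream_request(std::size_t initial_size) {
    LoggingResource log;  // declared first so it outlives the pool
    std::pmr::monotonic_buffer_resource pool(initial_size, &log);
    (void)pool.allocate(1);  // no initial buffer, so this must go upstream
    assert(!log.requests.empty());
    return log.requests.front();
}
```

Comparing first_upstream_request(n) against n for the pool size used in the
benchmark shows how much extra memory the resource requests up front. If the
result is close to 2*n, the division by 2 above compensates for it.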

I think the rest of the slowdown can be attributed to m_b_r simply doing more
work internally compared to your bare-bones malloc allocator (I'm seeing less
pronounced differences, though; I'm testing on a Sandybridge CPU with -O2).
