https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96942
--- Comment #11 from Jonathan Wakely <redi at gcc dot gnu.org> ---
(In reply to Alexander Monakov from comment #9)
> The most pronounced difference for depth=18 seems to be caused by m_b_r
> over-allocating by 2x: internally it mallocs 2x of the size given to the
> constructor, and then Linux pre-faults those extra pages, penalizing the
> benchmark.

It adds 11 bytes to the size given to the constructor (for its internal
bookkeeping) and then rounds up to a power of two. Since the poolSize function
actually returns sizeof(Node) more than it needs, and sizeof(Node) > 11, the
overallocation should be avoidable by simply fixing poolSize to return the
right value:

int poolSize(int depth)
{
  return ((1 << (depth + 1)) - 1) * sizeof(Node);
}

The original function returns a power of two, but the code actually creates an
odd number of nodes (there is only one node at depth zero, not two).
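
To make the arithmetic concrete, here is a rough sketch that models the behaviour described above. It is not the libstdc++ implementation: modelledAllocation just applies "add 11 bytes, round up to a power of two" as stated in this comment. The Node layout (two pointers, so 16 bytes on a typical 64-bit target) and the body of originalPoolSize are assumptions inferred from the discussion, and all helper names are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Assumed node layout for the benchmark: two child pointers.
struct Node { Node* left; Node* right; };

// The argument only works if one Node is larger than the 11 bytes of bookkeeping.
static_assert(sizeof(Node) > 11, "sketch assumes sizeof(Node) > 11 (e.g. a 64-bit target)");

// A complete binary tree of the given depth has 2^(depth+1) - 1 nodes.
constexpr std::size_t nodeCount(int depth) {
    return (std::size_t{1} << (depth + 1)) - 1;
}

// Assumed form of the original function: room for one node too many.
constexpr std::size_t originalPoolSize(int depth) {
    return (std::size_t{1} << (depth + 1)) * sizeof(Node);
}

// The corrected function from this comment, using size_t for the sketch.
constexpr std::size_t fixedPoolSize(int depth) {
    return nodeCount(depth) * sizeof(Node);
}

// Smallest power of two that is >= n.
constexpr std::size_t roundUpToPowerOfTwo(std::size_t n) {
    std::size_t p = 1;
    while (p < n) p <<= 1;
    return p;
}

// Model of the described behaviour: add 11 bytes, then round up to a power of two.
constexpr std::size_t modelledAllocation(std::size_t requested) {
    return roundUpToPowerOfTwo(requested + 11);
}

// Checks that, for the given depth, the fix halves the modelled allocation.
inline void checkFixHalvesAllocation(int depth) {
    assert(modelledAllocation(originalPoolSize(depth)) ==
           2 * modelledAllocation(fixedPoolSize(depth)));
}
```

With a 16-byte Node and depth=18, the original request is exactly 2^23 bytes, so adding 11 pushes it just past a power of two and the model rounds up to 2^24. The fixed request is 2^23 - 16, which still fits in 2^23 after adding 11.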