https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96942

--- Comment #11 from Jonathan Wakely <redi at gcc dot gnu.org> ---
(In reply to Alexander Monakov from comment #9)
> The most pronounced difference for depth=18 seems to be caused by m_b_r
> over-allocating by 2x: internally it mallocs 2x of the size given to the
> constructor, and then Linux pre-faults those extra pages, penalizing the
> benchmark.

It adds 11 bytes to the size given to the constructor (for its internal
bookkeeping) and then rounds up to a power of two.
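A minimal C++ sketch of that sizing rule, to make the arithmetic concrete. The 11-byte overhead and the round-up-to-a-power-of-two step come from the description above. `next_pow2`, `modeled_allocation` and `kBookkeepingBytes` are hypothetical names. This models the behaviour; it is not the libstdc++ implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>

// Hypothetical model of the sizing rule described in this comment: add the
// bookkeeping overhead to the requested size, then round the total up to a
// power of two. Not the actual libstdc++ code.
constexpr std::size_t kBookkeepingBytes = 11;

// Smallest power of two >= n.
std::size_t next_pow2(std::size_t n)
{
  // Guard against p overflowing to zero in the loop below.
  assert(n <= std::numeric_limits<std::size_t>::max() / 2 + 1);
  std::size_t p = 1;
  while (p < n)
    p <<= 1;
  return p;
}

// Size the resource would ask malloc for, under the model above.
std::size_t modeled_allocation(std::size_t requested)
{
  return next_pow2(requested + kBookkeepingBytes);
}
```

Under this model a request of exactly 2^23 bytes becomes a 2^24-byte allocation, while a request of 2^23 - 16 bytes still fits in 2^23. More generally, any request larger than 2^k - 11 bytes spills into the next power of two, which is where a 2x overallocation comes from.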

Since the poolSize function actually returns sizeof(Node) more than it needs,
and sizeof(Node) > 11, the overallocation should be avoidable by simply fixing
poolSize to return the right value:

int poolSize(int depth)
{
  // Nodes at depths 0..depth: 1 + 2 + ... + 2^depth == 2^(depth+1) - 1.
  return ((1 << (depth + 1)) - 1) * sizeof(Node);
}

The original function sizes the pool for a power-of-two number of nodes, but the
code actually creates an odd number of nodes (there is only one node at depth
zero, not two).
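One way to check the node count is sketched below. It assumes, as the corrected poolSize implies, that the benchmark builds a complete binary tree with a single root at depth zero and leaves at `depth`. `count_nodes` and `closed_form_nodes` are hypothetical helpers, not functions from the benchmark.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: node count of a complete binary tree with one node at
// depth 0 and two children per internal node, down to `depth`.
std::size_t count_nodes(int depth)
{
  assert(depth >= 0);
  if (depth == 0)
    return 1;                             // only the root lives at depth zero
  return 1 + 2 * count_nodes(depth - 1);  // root plus two subtrees
}

// Closed form used by the corrected poolSize: 2^(depth+1) - 1, always odd.
std::size_t closed_form_nodes(int depth)
{
  return (std::size_t{1} << (depth + 1)) - 1;
}
```

At depth 18 both give 524287 nodes. That is one fewer than 2^19, the power-of-two count the original sizing allowed for.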
