On Mon, 13 Aug 2018, Jonathan Wakely wrote:
Thanks to Lars for the suggestions.
* libsupc++/new_opa.cc (operator new(size_t, align_val_t)): Use
__is_pow2 to check for valid alignment. Avoid branching when rounding
size to multiple of alignment.
Tested x86_64-linux, committed to trunk.
Are you getting better code with __is_pow2 on many platforms? As far as I
can tell from a quick look at the patch (I didn't actually test it, so I
could be completely off), this replaces (x&(x-1))==0 with popcount(x)==1.
On a basic x86_64, popcount calls into libgcc, which doesn't seem so good.
On a more recent x86_64 (BMI1), x&(x-1) is a single instruction (blsr) that
sets a flag when the result is 0, and that's hard to beat.
Or was the goal to reject an alignment of 0, and not an optimization?
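
To make the comparison concrete, here is a minimal sketch of the two tests
side by side. The helper names are made up for illustration and this is not
the libstdc++ code; __builtin_popcountll is the GCC/Clang builtin. The only
input where the two disagree is 0: the mask test lets it through, the
popcount test does not.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helpers for illustration only (not the libstdc++ code).

// Mask test: clearing the lowest set bit of a power of two leaves 0.
// Note that it also returns true for x == 0.
constexpr bool pow2_by_mask(std::size_t x) { return (x & (x - 1)) == 0; }

// Popcount test: exactly one bit set, so x == 0 is rejected.
constexpr bool pow2_by_popcount(std::size_t x) {
  return __builtin_popcountll(x) == 1;
}

static_assert(pow2_by_mask(64) && pow2_by_popcount(64), "64 is a power of two");
static_assert(!pow2_by_mask(48) && !pow2_by_popcount(48), "48 is not");
static_assert(pow2_by_mask(0) && !pow2_by_popcount(0), "they differ only at 0");

// Returns true if both tests agree for every x in [1, limit].
bool agree_on_nonzero(std::size_t limit) {
  for (std::size_t x = 1; x <= limit; ++x)
    if (pow2_by_mask(x) != pow2_by_popcount(x)) return false;
  return true;
}
```

Compiling both helpers at -O2 with and without -mpopcnt / -mbmi is a quick
way to see which one a given target handles better.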
+ sz = (sz + align - 1) & ~(align - 1);
Note that gcc immediately replaces ~(align - 1) with -align. It does it
even if you compute align-1 on the previous line and write (sz+X)&~X in
the hope of sharing the subtraction.
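
The identity behind that rewrite holds for any unsigned value, since
-a == ~a + 1 == ~(a - 1) in modular arithmetic. A small sketch, with
hypothetical helper names and assuming align is a nonzero power of two:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helpers for illustration; align must be a nonzero power of two.

// Rounding as written in the patch.
constexpr std::size_t round_up_not(std::size_t sz, std::size_t align) {
  return (sz + align - 1) & ~(align - 1);
}

// Equivalent form using the negated alignment as the mask.
constexpr std::size_t round_up_neg(std::size_t sz, std::size_t align) {
  return (sz + align - 1) & -align;
}

// The two masks are identical for every unsigned value of align.
constexpr bool same_mask(std::size_t align) {
  return ~(align - 1) == -align;
}

static_assert(round_up_not(13, 8) == 16, "13 rounds up to 16");
static_assert(round_up_neg(13, 8) == 16, "same result with -align");
static_assert(round_up_not(16, 8) == 16, "multiples are unchanged");
static_assert(same_mask(8) && same_mask(4096), "masks match");
```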
--
Marc Glisse