Segher Boessenkool wrote:
* I would like to do the same for __builtin_ctz, but there is a catch.
The synthetic ctz sequence in terms of popcount (as presently
implemented by ia64.md, and potentially usable for at least i386 and
rs6000 as well if moved to optabs.c) produces the canonical behavior at
zero, but the synthetic sequence in terms of clz (as presently
implemented by optabs.c) produces the value -1 at zero.
I suppose you're using (assuming 32-bit, with clz(0) = 32)
ctz(x) := 31 - clz(x & -x)
now, which gives -1 for 0; and the version you're looking for is
ctz(x) := 32 - clz(~x & (x-1))
which gives 32 for 0.
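To make the difference at zero concrete, here is a small C sketch of both clz-based expansions. It is only an illustration, not the optabs.c code: clz32 and the two ctz_* functions are hypothetical helpers, and clz32 is written to return 32 for a zero input (the assumption behind the -1 and 32 results above) because __builtin_clz itself is undefined at zero.

```c
#include <stdint.h>

/* Hypothetical portable count-leading-zeros for illustration.
   Returns 32 for x == 0, the assumption both formulas rely on. */
static int clz32(uint32_t x)
{
    int n = 0;
    if (x == 0)
        return 32;
    while (!(x & 0x80000000u)) {
        x <<= 1;
        n++;
    }
    return n;
}

/* ctz(x) := 31 - clz(x & -x)
   x & -x isolates the lowest set bit; its distance from the top
   gives its position.  For x == 0 this yields 31 - 32 = -1. */
static int ctz_via_clz_lowbit(uint32_t x)
{
    return 31 - clz32(x & -x);
}

/* ctz(x) := 32 - clz(~x & (x-1))
   ~x & (x-1) is a mask of ones covering exactly the trailing zeros
   of x.  For x == 0 the mask is all ones, so this yields 32. */
static int ctz_via_clz_mask(uint32_t x)
{
    return 32 - clz32(~x & (x - 1));
}
```

Both agree for nonzero inputs (each returns 3 for x = 8); they differ only at zero. The extra step in the second form, (x-1) & ~x, is a single and-with-complement operation on chips that provide one.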
Thanks! That's, unfortunately, one more instruction, although I guess a
lot of chips have "a & ~b" as one operation.
What does the popcount version look like? Never seen that before,
but I think it will be really expensive on PowerPC.
ctz(x) := popcount(~x & (x-1))
It's the same mask as in your version of the ctz-as-clz operation, but
without the final adjustment: ~x & (x-1) turns any number
into 000...111..., where the boundary between zeroes and ones lies at the
lowest 1 in the original.
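A matching C sketch of the popcount expansion, under the same caveats: popcount32 and ctz_via_popcount are hypothetical illustration helpers, not ia64.md's sequence or any target's instruction (GCC's __builtin_popcount, which is defined at zero, could be substituted for popcount32).

```c
#include <stdint.h>

/* Hypothetical portable population count for illustration:
   each iteration clears the lowest set bit and counts it. */
static int popcount32(uint32_t x)
{
    int n = 0;
    while (x) {
        x &= x - 1;
        n++;
    }
    return n;
}

/* ctz(x) := popcount(~x & (x-1))
   The mask has ones exactly where x has trailing zeros, e.g.
   x = 0x28 (101000b) gives 000111b, so counting its ones is ctz.
   For x == 0 the mask is all ones, giving 32 with no special case. */
static int ctz_via_popcount(uint32_t x)
{
    return popcount32(~x & (x - 1));
}
```

For example, ctz_via_popcount(0x28) is 3 and ctz_via_popcount(0) is 32, which is the canonical zero behavior without any final adjustment.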
Is popcount really slow on PowerPC (compared to clz)? Ideally one
would choose between the two expansions based on RTL costs, but the only
architectures it matters for are i386 and powerpc, and neither of them
defines the cost of either clz or popcount.
zw