Segher Boessenkool wrote:
>> * I would like to do the same for __builtin_ctz, but there is a catch.
>> The synthetic ctz sequence in terms of popcount (as presently
>> implemented by ia64.md, and potentially usable for at least i386 and
>> rs6000 as well if moved to optabs.c) produces the canonical behavior at
>> zero, but the synthetic sequence in terms of clz (as presently
>> implemented by optabs.c) produces the value -1 at zero.

> I suppose you're using (assuming 32-bit)
>
>     ctz(x) := 31 - clz(x & -x)
>
> now, which gives -1 for 0; and the version you're looking for is
>
>     ctz(x) := 32 - clz(~x & (x-1))
>
> which gives 32 for 0.

Thanks! Unfortunately, that's one more instruction, although I guess many chips have "a & ~b" as a single operation.
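A minimal C sketch of the two clz-based formulas, assuming 32-bit unsigned arithmetic. GCC's __builtin_clz is undefined for a zero argument, so the sketch uses a hypothetical portable helper, clz32(), that returns 32 at zero (the canonical behavior); all function names here are illustrative only, not anything in optabs.c.

```c
#include <stdint.h>

/* Hypothetical portable count-leading-zeros that returns 32 for x == 0.
   __builtin_clz(0) is undefined, so it is deliberately not used here. */
static int clz32(uint32_t x)
{
    int n = 0;
    if (x == 0)
        return 32;
    while ((x & 0x80000000u) == 0) {
        x <<= 1;
        n++;
    }
    return n;
}

/* ctz(x) := 31 - clz(x & -x)
   x & -x isolates the lowest set bit; at x == 0 this yields 31 - 32 = -1. */
static int ctz_via_clz_isolate(uint32_t x)
{
    return 31 - clz32(x & (0u - x));
}

/* ctz(x) := 32 - clz(~x & (x-1))
   ~x & (x-1) is a mask of the bits below the lowest set bit; at x == 0
   the mask is all ones, so this yields 32 - 0 = 32. */
static int ctz_via_clz_mask(uint32_t x)
{
    return 32 - clz32(~x & (x - 1u));
}
```

With these helpers, ctz_via_clz_isolate(0) is -1 while ctz_via_clz_mask(0) is 32; for nonzero inputs such as 8 or 40 both return 3.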

> What does the popcount version look like?  Never seen that before,
> but I think it will be really expensive on PowerPC.

    ctz(x) := popcount(~x & (x-1))

It's the same mask as in your version of the ctz-as-clz operation, but counted with popcount, so no final adjustment is needed. ~x & (x-1) turns any number into 000...0111...1, where the ones occupy exactly the bit positions below the lowest set bit of the original; for x = 0 every bit is set, giving 32.
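A minimal C sketch of the popcount form, again assuming 32-bit unsigned arithmetic. popcount32() is a hypothetical portable helper (Kernighan's clear-the-lowest-bit loop) standing in for a hardware popcount instruction, and below_lowest_set_bit() is an illustrative name for the intermediate mask.

```c
#include <stdint.h>

/* Mask of the bits strictly below the lowest set bit of x.
   x-1 turns the trailing zeros into ones, clears the lowest set bit and
   leaves the higher bits alone; ~x flips every bit. ANDing them keeps only
   the trailing-zero positions. For x == 0 the result is all ones. */
static uint32_t below_lowest_set_bit(uint32_t x)
{
    return ~x & (x - 1u);
}

/* Hypothetical portable population count: each iteration clears the
   lowest set bit, so the loop runs once per set bit. */
static int popcount32(uint32_t x)
{
    int n = 0;
    while (x != 0) {
        x &= x - 1u;
        n++;
    }
    return n;
}

/* ctz(x) := popcount(~x & (x-1)); gives 32 for x == 0 with no
   final adjustment. */
static int ctz_via_popcount(uint32_t x)
{
    return popcount32(below_lowest_set_bit(x));
}
```

For example, x = 40 (binary 101000) gives the mask 000111 and a count of 3, while x = 0 gives an all-ones mask and a count of 32.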

Is popcount really slow on PowerPC (compared to clz)? Ideally one would choose between the two expansions based on RTL costs, but the only architectures where it matters are i386 and PowerPC, and neither of them defines a cost for clz or popcount.

zw
