Re: RFC: Simplify rules for ctz/clz patterns and RTL

Zack Weinberg Wed, 15 Aug 2007 11:36:03 -0700

Segher Boessenkool wrote:

* I would like to do the same for __builtin_ctz, but there is a catch.
The synthetic ctz sequence in terms of popcount (as presently
implemented by ia64.md, and potentially usable for at least i386 and
rs6000 as well if moved to optabs.c) produces the canonical behavior at
zero, but the synthetic sequence in terms of clz (as presently
implemented by optabs.c) produces the value -1 at zero.


I suppose you're using (assuming 32-bit)

    ctz(x) := 31 - clz(x & -x)

now, which gives -1 for 0; and the version you're looking for is

    ctz(x) := 32 - clz(~x & (x-1))

which gives 32 for 0.
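
For reference, the two clz-based expansions above can be checked side by side with a small C sketch. It assumes 32-bit operands and a clz that returns 32 for a zero input (GCC's __builtin_clz is undefined at zero, so a portable loop stands in). The function names are illustrative only and are not taken from optabs.c.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: count leading zeros, defined to return 32 for 0. */
int clz32(uint32_t x)
{
    int n = 0;
    if (x == 0)
        return 32;
    while ((x & 0x80000000u) == 0) {
        x <<= 1;
        n++;
    }
    return n;
}

/* ctz(x) := 31 - clz(x & -x); x & -x isolates the lowest set bit. */
int ctz_via_clz_isolate(uint32_t x)
{
    return 31 - clz32(x & (0u - x));
}

/* ctz(x) := 32 - clz(~x & (x-1)); the mask covers the trailing zeros. */
int ctz_via_clz_mask(uint32_t x)
{
    return 32 - clz32(~x & (x - 1u));
}

/* The two forms agree on every single-bit input and differ only at zero. */
void check_clz_expansions(void)
{
    uint32_t x;
    for (x = 1; x != 0; x <<= 1)
        assert(ctz_via_clz_isolate(x) == ctz_via_clz_mask(x));
    assert(ctz_via_clz_isolate(0) == -1);
    assert(ctz_via_clz_mask(0) == 32);
}
```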

Thanks! That's, unfortunately, one more instruction, although I guess a lot of chips have "a & ~b" as one operation.

What does the popcount version look like?  Never seen that before,
but I think it will be really expensive on PowerPC.


  ctz(x) := popcount(~x & (x-1))

Just the same thing as your version of the ctz-as-clz operation, but without the final adjustment. It looks like ~x & (x-1) turns any number into 000...111... where the boundary between zeroes and ones lies at the lowest 1 in the original.
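
A minimal C sketch of the popcount form, assuming 32-bit operands; popcount32 is a portable stand-in written for illustration, not the ia64.md pattern, and all names here are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: portable population count. */
int popcount32(uint32_t x)
{
    int n = 0;
    while (x != 0) {
        x &= x - 1u;   /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* ~x & (x-1): ones exactly in the positions below the lowest set bit of x. */
uint32_t trailing_mask(uint32_t x)
{
    return ~x & (x - 1u);
}

/* ctz(x) := popcount(~x & (x-1)); no final adjustment is needed. */
int ctz_via_popcount(uint32_t x)
{
    return popcount32(trailing_mask(x));
}

/* Spot checks: 0x28 is binary 101000, so the mask is 000111 and ctz is 3. */
void check_popcount_expansion(void)
{
    assert(trailing_mask(0x28u) == 0x7u);
    assert(ctz_via_popcount(0x28u) == 3);
    assert(ctz_via_popcount(0) == 32);
}
```

At zero the mask is all ones, so the result is 32, matching the second clz form rather than the first.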

Is popcount really slow on PowerPC? (Compared to clz?) Ideally one would choose between the two expansions based on RTL costs, but the only architectures it matters for are i386 and powerpc, and neither of them defines the cost of either clz or popcount.

zw
