Re: RFC: Simplify rules for ctz/clz patterns and RTL

2007-08-15 Thread Joern Rennecke
On Wed, Aug 15, 2007 at 11:55:02AM -0700, Zack Weinberg wrote: > Joern Rennecke wrote: > >The operation the nsb instruction of the SHmedia instruction set performs > >is 'count number of sign bit copies'. > >[...] > > It sounds like the SH should probably be lumped in with the x86 as not > doing

Re: RFC: Simplify rules for ctz/clz patterns and RTL

2007-08-15 Thread David Edelsohn
> Segher Boessenkool writes: >> Yes, but do we even create POPCOUNT rtx if the insn isn't >> supported? Wouldn't we expand or create libcall early? Segher> I don't know, there's only one way to find out... :-) I did check. Didn't you? David

Re: RFC: Simplify rules for ctz/clz patterns and RTL

2007-08-15 Thread Segher Boessenkool
I think the cost would be something like: +case POPCOUNT: + *total = COSTS_N_INSNS (3); + return false; Segher> Is that the cost when using popcountb? It is a lot more Segher> expensive when that instruction isn't available (like on Segher> most current machines). Yes, bu

Re: RFC: Simplify rules for ctz/clz patterns and RTL

2007-08-15 Thread David Edelsohn
> Segher Boessenkool writes: >> I think the cost would be something like: >> +case POPCOUNT: >> + *total = COSTS_N_INSNS (3); >> + return false; Segher> Is that the cost when using popcountb? It is a lot more Segher> expensive when that instruction isn't available (like on Segh

Re: RFC: Simplify rules for ctz/clz patterns and RTL

2007-08-15 Thread Segher Boessenkool
I think the cost would be something like: +case POPCOUNT: + *total = COSTS_N_INSNS (3); + return false; Is that the cost when using popcountb? It is a lot more expensive when that instruction isn't available (like on most current machines). The rest (i.e. CLZ, CTZ) loo

Re: RFC: Simplify rules for ctz/clz patterns and RTL

2007-08-15 Thread David Edelsohn
I think the cost would be something like:

Index: rs6000.c
===
--- rs6000.c (revision 127484)
+++ rs6000.c (working copy)
@@ -20292,10 +20292,15 @@
 *total += COSTS_N_INSNS (2);
 return false;
+case CTZ

Re: RFC: Simplify rules for ctz/clz patterns and RTL

2007-08-15 Thread Segher Boessenkool
I suppose you're using (assuming 32-bit) ctz(x) := 31 - clz(x & -x) now, which gives -1 for 0; and the version you're looking for is ctz(x) := 32 - clz(~x & (x-1)) which gives 32 for 0. Thanks! That's, unfortunately, one more instruction, although I guess a lot of chips have "a & ~b" a

Re: RFC: Simplify rules for ctz/clz patterns and RTL

2007-08-15 Thread David Edelsohn
> Zack Weinberg writes: Zack> Makes sense. I don't suppose I could persuade you to teach rs6000 Zack> RTX_COSTS about clz and popcount...? Sure. It's not that difficult to add to the table. David

Re: RFC: Simplify rules for ctz/clz patterns and RTL

2007-08-15 Thread Jan Hubicka
> > Is popcount really slow on PowerPC? (Compared to clz?) Ideally one > would choose between the two expansions based on RTL costs, but the only > architectures it matters for are i386 and powerpc, and neither of them > define the cost of either clz or popcount. Of course adding a popcount

Re: RFC: Simplify rules for ctz/clz patterns and RTL

2007-08-15 Thread Zack Weinberg
Joern Rennecke wrote: The score, sh and sparc instructions may or may not display canonical behavior; their ports do not define CLZ_DEFINED_VALUE_AT_ZERO and I was not able to find documentation of the relevant instruction. The operation the nsb instruction of the SHmedia instruction set perfor

Re: RFC: Simplify rules for ctz/clz patterns and RTL

2007-08-15 Thread Zack Weinberg
Andrew Pinski wrote: On 8/15/07, Zack Weinberg wrote: Is popcount really slow on PowerPC? (Compared to clz?) popcount is really popcount in bytes and then you do a multiply to get the real popcount. This is why it is slower than count leading zeros. Also popcount does not

Re: RFC: Simplify rules for ctz/clz patterns and RTL

2007-08-15 Thread Andrew Pinski
On 8/15/07, Zack Weinberg wrote: > Is popcount really slow on PowerPC? (Compared to clz?) popcount is really popcount in bytes and then you do a multiply to get the real popcount. This is why it is slower than count leading zeros. Also popcount does not exist in most powerpc'
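The "popcount in bytes, then a multiply" shape described above can be sketched in portable C. This is only an illustration: popcount_bytes32 is a hypothetical software stand-in for a per-byte count instruction, and neither function is taken from GCC or from the rs6000 port.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a per-byte population count: each byte of
   the result holds the number of set bits in the same byte of x. */
static uint32_t popcount_bytes32(uint32_t x)
{
    x = x - ((x >> 1) & 0x55555555u);                 /* 2-bit sums */
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u); /* 4-bit sums */
    x = (x + (x >> 4)) & 0x0F0F0F0Fu;                 /* 8-bit sums */
    return x;
}

/* Multiplying by 0x01010101 adds the four byte counts into the top
   byte; the shift then extracts the full 32-bit population count. */
static unsigned popcount32(uint32_t x)
{
    uint32_t sums = popcount_bytes32(x);
    return (unsigned)((uint32_t)(sums * 0x01010101u) >> 24);
}
```

The extra multiply and shift after the per-byte step is the part that has no counterpart in a single count-leading-zeros instruction, which matches the cost argument made in the message.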

Re: RFC: Simplify rules for ctz/clz patterns and RTL

2007-08-15 Thread Zack Weinberg
Segher Boessenkool wrote: * I would like to do the same for __builtin_ctz, but there is a catch. The synthetic ctz sequence in terms of popcount (as presently implemented by ia64.md, and potentially usable for at least i386 and rs6000 as well if moved to optabs.c) produces the canonical behavior

Re: RFC: Simplify rules for ctz/clz patterns and RTL

2007-08-13 Thread Joern Rennecke
> The score, sh and sparc instructions may or may not display canonical > behavior; their ports do not define CLZ_DEFINED_VALUE_AT_ZERO and I was > not able to find documentation of the relevant instruction. The operation the nsb instruction of the SHmedia instruction set performs is 'count number

Re: RFC: Simplify rules for ctz/clz patterns and RTL

2007-08-12 Thread Segher Boessenkool
I suppose you're using (assuming 32-bit) ctz(x) := 31 - clz(x & -x) now, which gives -1 for 0; and the version you're looking for is ctz(x) := 32 - clz(~x & (x-1)) which gives 32 for 0. (Straight from the venerable PowerPC Compiler Writer's Guide, btw). What does the popcount
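A minimal sketch of the two 32-bit formulas quoted above, for checking their behavior at zero. It assumes a clz that returns 32 for a zero input; clz32 here is a hypothetical reference loop, not the machine instruction or GCC's builtin.

```c
#include <assert.h>
#include <stdint.h>

/* Reference count-leading-zeros with clz32(0) == 32 (an assumption
   matching the "value at zero" used in the formulas above). */
static unsigned clz32(uint32_t x)
{
    unsigned n = 0;
    if (x == 0)
        return 32;
    while ((x & 0x80000000u) == 0) {
        x <<= 1;
        n++;
    }
    return n;
}

/* ctz(x) := 31 - clz(x & -x); gives -1 for x == 0. */
static int ctz_via_isolate(uint32_t x)
{
    return 31 - (int)clz32(x & (0u - x));
}

/* ctz(x) := 32 - clz(~x & (x-1)); gives 32 for x == 0. */
static int ctz_via_mask(uint32_t x)
{
    return 32 - (int)clz32(~x & (x - 1u));
}
```

For every nonzero input the two functions agree; they differ only at zero, where the first yields -1 and the second yields 32.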

Re: RFC: Simplify rules for ctz/clz patterns and RTL

2007-08-12 Thread Segher Boessenkool
* I would like to do the same for __builtin_ctz, but there is a catch. The synthetic ctz sequence in terms of popcount (as presently implemented by ia64.md, and potentially usable for at least i386 and rs6000 as well if moved to optabs.c) produces the canonical behavior at zero, but the synthetic
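For illustration, one common way to express ctz in terms of popcount is popcount((x - 1) & ~x). The exact sequence used in ia64.md is not quoted in this thread, so treat the identity below as an assumption rather than a copy of that pattern; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Simple reference popcount: clears the lowest set bit per iteration. */
static unsigned popcount32_loop(uint32_t x)
{
    unsigned n = 0;
    while (x != 0) {
        x &= x - 1u;
        n++;
    }
    return n;
}

/* (x - 1) & ~x sets exactly the bits below the lowest set bit of x,
   so counting them gives the number of trailing zeros. For x == 0 the
   mask is all ones, so the result is 32 (the operand width). */
static unsigned ctz_via_popcount(uint32_t x)
{
    return popcount32_loop((x - 1u) & ~x);
}
```

With this identity the value at zero falls out as the operand width without a special case, which is the "canonical behavior at zero" the message refers to.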

Re: RFC: Simplify rules for ctz/clz patterns and RTL

2007-08-11 Thread Zack Weinberg
Richard Kenner wrote: * Since no one uses it, we rip out all support for the ffs pattern and expression. There's an ffs builtin! How do we know who uses it? I am not proposing to remove the built-in (i.e. the language visible __builtin_ffs() function); only the RTL expression (ffs:MODE ...)

Re: RFC: Simplify rules for ctz/clz patterns and RTL

2007-08-11 Thread Richard Kenner
> * Since no one uses it, we rip out all support for the ffs pattern and > expression. There's an ffs builtin! How do we know who uses it? Moreover, expmed uses it as an option in expanding some comparisons.

RFC: Simplify rules for ctz/clz patterns and RTL

2007-08-10 Thread Zack Weinberg
During development of the patch I just posted for double-word clz, I went through all the back ends and audited their use of the bit-scan named patterns and RTL. It appears to me that our current handling of C[LT]Z_DEFINED_VALUE_AT_ZERO is much more complicated than it needs to be, and also that