On Wed, Aug 15, 2007 at 11:55:02AM -0700, Zack Weinberg wrote:
> Joern Rennecke wrote:
> >The operation the nsb instruction of the SHmedia instruction set performs
> >is 'count number of sign bit copies'.
> >[...]
>
> It sounds like the SH should probably be lumped in with the x86 as not
> doing
> Segher Boessenkool writes:
>> Yes, but do we even create POPCOUNT rtx if the insn isn't
>> supported? Wouldn't we expand or create libcall early?
Segher> I don't know, there's only one way to find out... :-)
I did check. Didn't you?
David
I think the cost would be something like:
+    case POPCOUNT:
+      *total = COSTS_N_INSNS (3);
+      return false;
Segher> Is that the cost when using popcountb? It is a lot more
Segher> expensive when that instruction isn't available (like on
Segher> most current machines).
Yes, bu
> Segher Boessenkool writes:
>> I think the cost would be something like:
>> +    case POPCOUNT:
>> +      *total = COSTS_N_INSNS (3);
>> +      return false;
Segher> Is that the cost when using popcountb? It is a lot more
Segher> expensive when that instruction isn't available (like on
Segh
I think the cost would be something like:
+    case POPCOUNT:
+      *total = COSTS_N_INSNS (3);
+      return false;
Is that the cost when using popcountb? It is a lot more
expensive when that instruction isn't available (like on
most current machines).
The rest (i.e. CLZ, CTZ) loo
I think the cost would be something like:
Index: rs6000.c
===
--- rs6000.c (revision 127484)
+++ rs6000.c (working copy)
@@ -20292,10 +20292,15 @@
*total += COSTS_N_INSNS (2);
return false;
+case CTZ
I suppose you're using (assuming 32-bit)
ctz(x) := 31 - clz(x & -x)
now, which gives -1 for 0; and the version you're looking for is
ctz(x) := 32 - clz(~x & (x-1))
which gives 32 for 0.
Thanks! That's, unfortunately, one more instruction, although I guess
a lot of chips have "a & ~b" a
> Zack Weinberg writes:
Zack> Makes sense. I don't suppose I could persuade you to teach rs6000
Zack> RTX_COSTS about clz and popcount...?
Sure. It's not that difficult to add to the table.
David
>
> Is popcount really slow on PowerPC? (Compared to clz?) Ideally one
> would choose between the two expansions based on RTL costs, but the only
> architectures it matters for are i386 and powerpc, and neither of them
> define the cost of either clz or popcount.
Of course adding a popcount
Joern Rennecke wrote:
The score, sh and sparc instructions may or may not display canonical
behavior; their ports do not define CLZ_DEFINED_VALUE_AT_ZERO and I was
not able to find documentation of the relevant instruction.
The operation the nsb instruction of the SHmedia instruction set perfor
Andrew Pinski wrote:
On 8/15/07, Zack Weinberg <[EMAIL PROTECTED]> wrote:
Is popcount really slow on PowerPC? (Compared to clz?)
popcount is really popcount in bytes and then you do a multiply to get
the real popcount. This is why it is slower than count leading zeros.
Also popcount does not
On 8/15/07, Zack Weinberg <[EMAIL PROTECTED]> wrote:
> Is popcount really slow on PowerPC? (Compared to clz?)
popcount is really popcount in bytes and then you do a multiply to get
the real popcount. This is why it is slower than count leading zeros.
Also popcount does not exist in most powerpc'
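
For readers following along, here is a minimal C sketch of the "popcount
in bytes, then a multiply" sequence described above. It is only an
illustration: popcount_bytes32 is a hypothetical software stand-in for a
per-byte count instruction such as the popcountb mentioned in this thread,
and neither function name comes from GCC or the rs6000 port.

```c
#include <stdint.h>

/* Hypothetical software stand-in for a per-byte population count:
   each byte of the result holds the number of set bits in the
   corresponding byte of x. */
static uint32_t popcount_bytes32(uint32_t x)
{
    x = x - ((x >> 1) & 0x55555555u);                 /* 2-bit sums */
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u); /* 4-bit sums */
    x = (x + (x >> 4)) & 0x0f0f0f0fu;                 /* 8-bit sums */
    return x;
}

/* Multiplying by 0x01010101 adds the four byte counts together in the
   top byte; the shift then extracts that sum. */
static unsigned popcount32(uint32_t x)
{
    return (uint32_t)(popcount_bytes32(x) * 0x01010101u) >> 24;
}
```

The extra multiply and shift after the per-byte count are what make the
full sequence longer than a single count-leading-zeros instruction.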
Segher Boessenkool wrote:
* I would like to do the same for __builtin_ctz, but there is a catch.
The synthetic ctz sequence in terms of popcount (as presently
implemented by ia64.md, and potentially usable for at least i386 and
rs6000 as well if moved to optabs.c) produces the canonical behavior
> The score, sh and sparc instructions may or may not display canonical
> behavior; their ports do not define CLZ_DEFINED_VALUE_AT_ZERO and I was
> not able to find documentation of the relevant instruction.
The operation the nsb instruction of the SHmedia instruction set performs
is 'count number
I suppose you're using (assuming 32-bit)
ctz(x) := 31 - clz(x & -x)
now, which gives -1 for 0; and the version you're looking for is
ctz(x) := 32 - clz(~x & (x-1))
which gives 32 for 0.
(Straight from the venerable PowerPC Compiler Writer's Guide, btw).
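
The two identities above can be checked with a short C sketch. It assumes
a 32-bit clz that returns 32 for a zero input, which is the convention both
formulas rely on; clz32 below is a portable stand-in written for this
example, and all function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Portable clz stand-in with the assumed convention clz32(0) == 32. */
static int clz32(uint32_t x)
{
    int n = 0;
    if (x == 0)
        return 32;
    while ((x & 0x80000000u) == 0) {
        x <<= 1;
        n++;
    }
    return n;
}

/* ctz(x) := 31 - clz(x & -x); gives -1 for x == 0. */
static int ctz32_isolate(uint32_t x)
{
    return 31 - clz32(x & (0u - x));
}

/* ctz(x) := 32 - clz(~x & (x-1)); gives 32 for x == 0. */
static int ctz32_mask(uint32_t x)
{
    return 32 - clz32(~x & (x - 1u));
}

/* Both forms agree on every nonzero input tried here. */
static void check_ctz_forms(void)
{
    for (int i = 0; i < 32; i++) {
        uint32_t x = (uint32_t)1 << i;
        assert(ctz32_isolate(x) == i);
        assert(ctz32_mask(x) == i);
        assert(ctz32_isolate(x | 0x80000000u) == i);
        assert(ctz32_mask(x | 0x80000000u) == i);
    }
}
```

The forms differ only at zero: x & -x is 0 there, so the first yields
31 - 32 = -1, while ~x & (x-1) is all ones, so the second yields
32 - 0 = 32.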
What does the popcount
* I would like to do the same for __builtin_ctz, but there is a catch.
The synthetic ctz sequence in terms of popcount (as presently
implemented by ia64.md, and potentially usable for at least i386 and
rs6000 as well if moved to optabs.c) produces the canonical behavior at
zero, but the synthetic
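
As an illustration of a ctz built from popcount, here is a minimal C
sketch. It uses the well-known form popcount((x - 1) & ~x); that this
matches the exact sequence in ia64.md is an assumption, and the function
names are hypothetical. popcount32_ref is a plain reference loop rather
than a hardware instruction.

```c
#include <stdint.h>

/* Reference population count: clears the lowest set bit each pass. */
static unsigned popcount32_ref(uint32_t x)
{
    unsigned n = 0;
    while (x != 0) {
        x &= x - 1u;
        n++;
    }
    return n;
}

/* (x - 1) & ~x sets exactly the bits below the lowest set bit of x,
   so counting them gives the number of trailing zeros. For x == 0 the
   mask is all ones and the result is 32, the mode width. */
static unsigned ctz32_popcount(uint32_t x)
{
    return popcount32_ref((x - 1u) & ~x);
}
```

No special case for zero is needed: the mask arithmetic alone produces
the width of the mode for a zero input.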
Richard Kenner wrote:
* Since no one uses it, we rip out all support for the ffs pattern and
expression.
There's an ffs builtin! How do we know who uses it?
I am not proposing to remove the built-in (i.e. the language visible
__builtin_ffs() function); only the RTL expression (ffs:MODE ...)
> * Since no one uses it, we rip out all support for the ffs pattern and
> expression.
There's an ffs builtin! How do we know who uses it?
Moreover, expmed uses it as an option in expanding some comparisons.
During development of the patch I just posted for double-word clz, I
went through all the back ends and audited their use of the bit-scan
named patterns and RTL. It appears to me that our current handling of
C[LT]Z_DEFINED_VALUE_AT_ZERO is much more complicated than it needs to
be, and also that