Re: RFA: pervasive SSE codegen inefficiency

2005-09-20 Thread Dale Johannesen
On Sep 19, 2005, at 9:15 PM, Richard Henderson wrote:
> On Mon, Sep 19, 2005 at 05:33:54PM -0700, Dale Johannesen wrote:
>> Do you have any constructive suggestions for how the RA might be fixed, then?
> Short term? No. But I don't see this as a short term problem.
OK. Unfortunately, it is a sh

Re: RFA: pervasive SSE codegen inefficiency

2005-09-20 Thread Daniel Berlin
On Tue, 2005-09-20 at 15:44 +0200, Giovanni Bajo wrote:
> Daniel Berlin <[EMAIL PROTECTED]> wrote:
>
> > For example, Kenny and I discovered during his prespilling work that the
> > liveness is actually calculated wrong.
> >
> > It's half-forwards (local), half-backwards (globally), instead of all

Re: RFA: pervasive SSE codegen inefficiency

2005-09-20 Thread Giovanni Bajo
Daniel Berlin <[EMAIL PROTECTED]> wrote:
> For example, Kenny and I discovered during his prespilling work that the
> liveness is actually calculated wrong.
>
> It's half-forwards (local), half-backwards (globally), instead of all
> backwards, which is how liveness is normally calculated, so we
>
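For readers unfamiliar with the "all backwards" formulation mentioned in the quote, here is a minimal sketch of textbook backward liveness over a tiny control-flow graph. It is not GCC's code: `struct block`, `compute_liveness`, and `demo_live_in` are hypothetical names invented for illustration, and registers are modeled as bits in a 32-bit mask.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_SUCCS 2

/* Hypothetical basic block; each register is one bit in a 32-bit mask. */
struct block {
    uint32_t use;          /* registers read before any write in the block */
    uint32_t def;          /* registers written in the block */
    int succ[MAX_SUCCS];   /* successor block indices, -1 when unused */
    uint32_t live_in;
    uint32_t live_out;
};

/* Backward dataflow, iterated to a fixed point:
     live_out(B) = union of live_in(S) over all successors S of B
     live_in(B)  = use(B) | (live_out(B) & ~def(B))              */
void compute_liveness(struct block *blocks, int nblocks)
{
    int changed = 1;

    assert(nblocks >= 0);
    for (int i = 0; i < nblocks; i++)
        blocks[i].live_in = blocks[i].live_out = 0;

    while (changed) {
        changed = 0;
        /* Walk blocks last-to-first so facts reach the entry in few passes. */
        for (int i = nblocks - 1; i >= 0; i--) {
            struct block *b = &blocks[i];
            uint32_t out = 0;

            for (int s = 0; s < MAX_SUCCS; s++) {
                if (b->succ[s] >= 0) {
                    assert(b->succ[s] < nblocks);
                    out |= blocks[b->succ[s]].live_in;
                }
            }
            uint32_t in = b->use | (out & ~b->def);
            if (in != b->live_in || out != b->live_out) {
                b->live_in = in;
                b->live_out = out;
                changed = 1;
            }
        }
    }
}

/* Illustrative CFG: block 0 sets r0, block 1 is a loop that reads r0 and r1
   and writes r1, block 2 reads r1. Returns live_in for the chosen block. */
uint32_t demo_live_in(int index)
{
    struct block cfg[3] = {
        { .use = 0x0, .def = 0x1, .succ = { 1, -1 } },
        { .use = 0x3, .def = 0x2, .succ = { 1, 2 } },
        { .use = 0x2, .def = 0x0, .succ = { -1, -1 } },
    };

    assert(index >= 0 && index < 3);
    compute_liveness(cfg, 3);
    return cfg[index].live_in;
}
```

In this example r1 comes out live on entry to block 0, because the loop reads it before anything writes it. Information only ever flows from successors to predecessors, which is the property the quoted message says the existing computation lacks.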

Re: RFA: pervasive SSE codegen inefficiency

2005-09-20 Thread Paolo Bonzini
So basically, pick a problem you see, and fix it. The RTL infrastructure is exceptionally good at doing some things, and exceptionally bad at doing some others. Sometimes, take into account the coding style and it is good and bad at the same time. :-( CSE, flow, etc. come to mind. All

Re: RFA: pervasive SSE codegen inefficiency

2005-09-19 Thread Richard Henderson
On Mon, Sep 19, 2005 at 05:33:54PM -0700, Dale Johannesen wrote:
> Do you have any constructive suggestions for how the RA might be fixed,
> then?
Short term? No. But I don't see this as a short term problem.
r~

Re: RFA: pervasive SSE codegen inefficiency

2005-09-19 Thread Daniel Berlin
On Mon, 2005-09-19 at 17:33 -0700, Dale Johannesen wrote:
> On Sep 19, 2005, at 5:30 PM, Richard Henderson wrote:
> >> (define_insn "*addmixed3"
> >>   [(set (match_operand:V2DI 0 "register_operand" "=x")
> >>        (subreg:V2DI (plus:SSEMODE124
> >>          (match_operand:SSEMODE124 2 "nonimmediate_oper

Re: RFA: pervasive SSE codegen inefficiency

2005-09-19 Thread Dale Johannesen
On Sep 19, 2005, at 5:30 PM, Richard Henderson wrote:
(define_insn "*addmixed3"
  [(set (match_operand:V2DI 0 "register_operand" "=x")
        (subreg:V2DI (plus:SSEMODE124
          (match_operand:SSEMODE124 2 "nonimmediate_operand" "xm")
          (subreg:SSEMODE124 (match_operand:V2DI 1 "nonimmediat

Re: RFA: pervasive SSE codegen inefficiency

2005-09-19 Thread Richard Henderson
On Mon, Sep 19, 2005 at 05:19:20PM -0700, Dale Johannesen wrote:
> (Although just which subregs are safe to look under will require more
> attention than I've given it, if we want this in.)
Look at MODES_TIEABLE or something.
> Really I don't think this is an RA problem at all.
You don't?!? Ple

Re: RFA: pervasive SSE codegen inefficiency

2005-09-19 Thread Dale Johannesen
Just to review, the second function here was the problem:
(-march=pentium4 -mtune=prescott -O2 -mfpmath=sse -msse2)
#include
__m128i foo3(__m128i z, __m128i a, int N) { int i; for (i=0; i
where the inner loop compiles to
movdqa %xmm2, %xmm0
paddw %xmm1, %xmm0
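The loop body of foo3 is cut off in this archive preview, so the following is only a sketch of the kind of code under discussion, not the original test case. It uses GCC/Clang generic vector extensions rather than `<emmintrin.h>` so that it builds on any target. The names `v2di`, `v8hu`, `add_epi16_sketch`, `accumulate16`, and `demo_lane` are hypothetical; `v2di` stands in for `__m128i`, which the patterns quoted elsewhere in the thread treat as V2DI. Lanes are unsigned here so that wraparound is well defined. The point is the cast on the way in and out of the 16-bit add, which is presumably what shows up as the subregs in the `*addmixed3` pattern.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins: v2di plays the role of __m128i (two 64-bit lanes);
   v8hu is the same 16 bytes viewed as eight 16-bit lanes. */
typedef long long v2di __attribute__((vector_size(16)));
typedef unsigned short v8hu __attribute__((vector_size(16)));

/* A 16-bit-lane add on values carried around as v2di: reinterpret both
   operands, add lane by lane, reinterpret the result back. */
static v2di add_epi16_sketch(v2di a, v2di b)
{
    return (v2di)((v8hu)a + (v8hu)b);
}

/* Assumed loop shape: accumulate a into z, n times. */
v2di accumulate16(v2di z, v2di a, int n)
{
    for (int i = 0; i < n; i++)
        z = add_epi16_sketch(z, a);
    return z;
}

/* Fill every 16-bit lane of z with start and of a with step, run the loop,
   and return the chosen lane of the result. */
unsigned demo_lane(unsigned short start, unsigned short step, int n, int lane)
{
    unsigned short zs[8], as[8], out[8];
    v2di z, a;

    assert(lane >= 0 && lane < 8);
    for (int k = 0; k < 8; k++) {
        zs[k] = start;
        as[k] = step;
    }
    memcpy(&z, zs, sizeof z);
    memcpy(&a, as, sizeof a);
    z = accumulate16(z, a, n);
    memcpy(out, &z, sizeof out);
    return out[lane];
}
```

For instance, `demo_lane(0xFFFF, 1, 1, 1)` returns 0: each 16-bit lane wraps on its own, with no carry into its neighbour, which is what distinguishes a paddw-style add from a paddq-style one on the same 128-bit value.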

Re: RFA: pervasive SSE codegen inefficiency

2005-09-15 Thread Richard Henderson
On Thu, Sep 15, 2005 at 11:07:23AM -0700, Dale Johannesen wrote:
> Having a more uniform representation for operations on __m128i
> objects would simplify things all over the place, though.
For some definition of "simplify" that doesn't actually make things simpler when it comes to the autovector

Re: RFA: pervasive SSE codegen inefficiency

2005-09-15 Thread Dale Johannesen
On Sep 14, 2005, at 9:50 PM, Andrew Pinski wrote:
On Sep 14, 2005, at 9:21 PM, Dale Johannesen wrote:
Consider the following SSE code (-march=pentium4 -mtune=prescott -O2 -mfpmath=sse -msse2)
<4256776a.c>
The first inner loop compiles to
paddq %xmm0, %xmm1
Good. The second compile

Re: RFA: pervasive SSE codegen inefficiency

2005-09-14 Thread Andrew Pinski
On Sep 14, 2005, at 9:21 PM, Dale Johannesen wrote:
Consider the following SSE code (-march=pentium4 -mtune=prescott -O2 -mfpmath=sse -msse2)
<4256776a.c>
The first inner loop compiles to
paddq %xmm0, %xmm1
Good. The second compiles to
movdqa %xmm2, %xmm0
paddw