On Mon, 21 Mar 2005, Uros Bizjak wrote:

> Hello!
>
> >typedef float v4sf __attribute__((vector_size(16)));
> >void foo(v4sf *a, v4sf *b, v4sf *c)
> >{
> >  *a = *b + *c;
> >}
> >
> >we no longer (since 4.0) synthesize v2sf (aka sse) operations
> >for f.i. -march=athlon (not that we were too successful at this
> >in 3.4 - we generated horrible code instead).  Instead for !sse2
> >architectures we generate standard i387 FP code (with some
> >unnecessary temporaries, but reasonably well).
> >
>
> SSE _is_ v4sf. 'gcc -O2 -msse -S -fomit-frame-pointer' produces:
>
> foo:
>         movl    12(%esp), %eax
>         movaps  (%eax), %xmm0
>         movl    8(%esp), %eax
>         addps   (%eax), %xmm0
>         movl    4(%esp), %eax
>         movaps  %xmm0, (%eax)
>         ret
>
> SSE2 is v2df.
>
> Athlon does not handle SSE insns.
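
For anyone who wants to reproduce the comparison, here is a minimal sketch of a test file built around the quoted function. The helper add4() is hypothetical and not part of the original example; it only exists so that the result of foo() can be checked from plain float arrays. It assumes a compiler that supports the GCC vector_size attribute.

```c
/* Sketch only: relies on the GCC vector extension used in the
   quoted example.  add4() is a hypothetical helper. */
#include <string.h>

typedef float v4sf __attribute__((vector_size(16)));

/* Same function as in the quoted example. */
void foo(v4sf *a, v4sf *b, v4sf *c)
{
  *a = *b + *c;
}

/* Hypothetical helper: adds two 4-float arrays through foo() and
   copies the result into out.  memcpy is used so that nothing is
   assumed about the alignment of the caller's arrays. */
void add4(float out[4], const float x[4], const float y[4])
{
  v4sf a, b, c;

  memcpy(&b, x, sizeof b);
  memcpy(&c, y, sizeof c);
  foo(&a, &b, &c);
  memcpy(out, &a, sizeof a);
}
```

Compiling this with 'gcc -O2 -msse -S -fomit-frame-pointer' as above, and then again with -march=athlon in place of -msse, should make it possible to compare the code generated for foo on the different compiler versions.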

Oh, so we used to expand to 3dnow?  I see that gcc 3.4 produced:

foo:
        pushl   %ebp
        movl    %esp, %ebp
        pushl   %ebx
        subl    $84, %esp
        movl    12(%ebp), %eax
        movl    16(%ebp), %edx
[...]
        movq    -64(%ebp), %mm0
        movl    %ebx, -72(%ebp)
        movl    -36(%ebp), %ebx
        movl    %ebx, -68(%ebp)
        pfadd   -72(%ebp), %mm0
        movq    %mm0, -56(%ebp)
        movl    12(%eax), %eax

etc.

This doesn't happen anymore with 4.0/4.1.

Richard.

--
Richard Guenther <richard dot guenther at uni-tuebingen dot de>
WWW: http://www.tat.physik.uni-tuebingen.de/~rguenth/