On Mon, 21 Mar 2005, Uros Bizjak wrote:

> Hello!
>
> >typedef float v4sf __attribute__((vector_size(16)));
> >void foo(v4sf *a, v4sf *b, v4sf *c)
> >{
> >        *a = *b + *c;
> >}
> >
> >we no longer (since 4.0) synthesize v2sf (aka sse) operations
> >for f.i. -march=athlon (not that we were too successful at this
> >in 3.4 - we generated horrible code instead).  Instead for !sse2
> >architectures we generate standard i387 FP code (with some
> >unnecessary temporaries, but reasonably well).
> >
> SSE _is_ v4sf. 'gcc -O2 -msse -S -fomit-frame-pointer' produces:
>
> foo:
>         movl    12(%esp), %eax
>         movaps  (%eax), %xmm0
>         movl    8(%esp), %eax
>         addps   (%eax), %xmm0
>         movl    4(%esp), %eax
>         movaps  %xmm0, (%eax)
>         ret
>
> SSE2 is v2df.
>
> Athlon does not handle SSE insns.

Oh, so we used to expand to 3DNow! instructions?  I see that gcc 3.4 produced:

foo:
        pushl   %ebp
        movl    %esp, %ebp
        pushl   %ebx
        subl    $84, %esp
        movl    12(%ebp), %eax
        movl    16(%ebp), %edx
[...]
        movq    -64(%ebp), %mm0
        movl    %ebx, -72(%ebp)
        movl    -36(%ebp), %ebx
        movl    %ebx, -68(%ebp)
        pfadd   -72(%ebp), %mm0
        movq    %mm0, -56(%ebp)
        movl    12(%eax), %eax
etc.

This doesn't happen anymore with 4.0/4.1.
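For reference, here is a minimal C sketch that was not part of the original thread. It restates the test case from the report and adds a hypothetical helper, `foo_scalar`, which spells out the per-element semantics that any lowering of the generic vector add has to preserve: a single SSE `addps`, pairs of 3DNow! `pfadd`, or four scalar i387 additions. The helper name and the use of plain `float[4]` arrays are illustrative assumptions, not anything GCC generates.

```c
#include <assert.h>

/* GCC generic vector type from the report: four floats in 16 bytes. */
typedef float v4sf __attribute__((vector_size(16)));

/* The test case from the thread.  With -msse GCC can emit one addps;
   per the thread, GCC 4.0 targeting -march=athlon falls back to
   scalar i387 code instead. */
void foo(v4sf *a, v4sf *b, v4sf *c)
{
    *a = *b + *c;
}

/* Hypothetical helper: the element-wise meaning of *a = *b + *c,
   written as four independent scalar float additions. */
void foo_scalar(float a[4], const float b[4], const float c[4])
{
    for (int i = 0; i < 4; i++)
        a[i] = b[i] + c[i];
}
```

Compiling `foo` with `-S` under different `-march`/`-msse` settings and comparing the output against `foo_scalar` is one way to see which of these lowerings a given GCC release picks.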

Richard.

--
Richard Guenther <richard dot guenther at uni-tuebingen dot de>
WWW: http://www.tat.physik.uni-tuebingen.de/~rguenth/
