On Mon, 21 Mar 2005 13:45:19 +0100 (CET), Richard Guenther
<[EMAIL PROTECTED]> wrote:
> Hi!
>
> On mainline we now use loop versioning and peeling for alignment
> for the following loop (-march=pentium4):
>
> void foo3(float * __restrict__ a, float * __restrict__ b,
>           float * __restrict__ c)
> {
>   int i;
>   for (i=0; i<4; ++i)
>     a[i] = b[i] + c[i];
> }
>
> which results only in slower and larger code.  I also cannot
> see why we zero the xmm registers before loading and why we
> load them in separate high and low halves:
>
> .L13:
>         xorps   %xmm1, %xmm1
>         movlps  (%edx,%esi), %xmm1
>         movhps  8(%edx,%esi), %xmm1
>         xorps   %xmm0, %xmm0
>         movlps  (%edx,%ebx), %xmm0
>         movhps  8(%edx,%ebx), %xmm0
>         addps   %xmm0, %xmm1
>         movaps  %xmm1, (%edx,%eax)
>         addl    $1, %ecx
>         addl    $16, %edx
>         cmpl    %ecx, -16(%ebp)
>         ja      .L13
>
> but the point is, there is nothing to gain from vectorizing the
> loop in the first place if we do not know the alignment beforehand.
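For comparison, a minimal sketch (not from the original mail; the global
arrays and the name foo3_aligned are hypothetical) of the same loop with
statically known 16-byte alignment, where one would expect the vectorizer
to need neither versioning nor peeling and to use aligned accesses directly:

/* Hypothetical variant, not part of the original test case: global
   arrays with a 16-byte alignment attribute, so alignment is provable
   at compile time and no runtime versioning or peeling should be
   generated for the vectorized loop. */
float a4[4] __attribute__((aligned(16)));
float b4[4] __attribute__((aligned(16)));
float c4[4] __attribute__((aligned(16)));

void foo3_aligned(void)
{
  int i;
  for (i = 0; i < 4; ++i)
    a4[i] = b4[i] + c4[i];
}

With -march=pentium4 one would hope for a plain movaps load, an addps
with an aligned memory operand, and a movaps store here, with no split
movlps/movhps loads and no prior xorps of the registers.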
Uh, and with -funroll-loops we seem to be lost completely, as we produce
peeling and versioning code for an eight-times-unrolled copy of a loop
that rolls only four times!  Where has the information about the loop
counter gone?  It looks like vectorization interacts badly with the rest
of the loop optimizers.  Ugh.

Richard.
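For comparison, a hand-unrolled sketch (hypothetical, not from the mail
and not compiler output) of what one would hope for when the trip count
of four is known at compile time: the loop fully unrolled, with no
residual counter, peeling, or versioning left over.

/* Hand-unrolled sketch: with the trip count fixed at 4, no loop or
   counter should survive; this is the scalar equivalent of a single
   4-wide vector add (one load/add/store sequence). */
void foo3_unrolled(float * __restrict__ a, float * __restrict__ b,
                   float * __restrict__ c)
{
  a[0] = b[0] + c[0];
  a[1] = b[1] + c[1];
  a[2] = b[2] + c[2];
  a[3] = b[3] + c[3];
}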