While out benchmarking today, I ran across code similar to this:

int *a;
int *b;
int *c;

const int ad[320];
const int bd[320];
const int cd[320];

void fill()
{
  for (int i = 0; i < 320; i++)
    {
      a[i] = ad[i];
      b[i] = bd[i];
      c[i] = cd[i];
    }
}

I was surprised and happy to see the vectoriser kick in for the copy.
The inner loop looks like:

        add     r5, r3, ip
        adds    r4, r3, r7
        vldmia  r2!, {d16-d17}
        vldmia  r1!, {d18-d19}
        adds    r0, r3, r6
        vst1.32 {q9}, [r5]
        vst1.32 {q8}, [r4]
        vldmia  r3, {d16-d17}
        adds    r3, r3, #16
        cmp     r3, r8
        vst1.32 {q8}, [r0]
        bne     .L3

so r3 is the loop variable (it also serves as the third source
pointer) and {ip, r7, r6} are the offsets from r3 to the destination
pointers.  Adding a __restrict doesn't change the code.
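
For reference, a minimal sketch of one way the qualifier could be
applied.  The message doesn't say where __restrict was placed, so
putting it on the file-scope pointers is an assumption here, as are
the placeholder initialisers and the assert:

```c
#include <assert.h>

/* Assumption: __restrict (the GCC/Clang spelling of C99 restrict) is
   placed on the file-scope pointers; the original message does not
   show the exact placement. */
int *__restrict a;
int *__restrict b;
int *__restrict c;

/* Placeholder data so the copy is observable; the arrays in the
   original snippet carry no initialisers. */
const int ad[320] = {1, 2, 3};
const int bd[320] = {4, 5, 6};
const int cd[320] = {7, 8, 9};

void fill(void)
{
  /* The pointers must be aimed at 320-element buffers beforehand. */
  assert(a && b && c);

  for (int i = 0; i < 320; i++)
    {
      a[i] = ad[i];
      b[i] = bd[i];
      c[i] = cd[i];
    }
}
```

Another possible reading is restrict-qualified local copies of the
pointers inside fill(); that placement might behave differently.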

Richard, will your auto-inc/dec changes combine the final vldmia r3
and adds r3, r3, #16 into a single vldmia r3! ?

Changing the pointers (int *a etc.) into in-file arrays like
int a[320] gives:

        vldmia  r0!, {d16-d17}
        vldmia  r5!, {d18-d19}
        vstmia  r4!, {d18-d19}
        vstmia  r1!, {d16-d17}
        vldmia  r2!, {d16-d17}
        vstmia  r3!, {d16-d17}
        cmp     r3, r6
        bne     .L2

Marking them as extern int a[320] goes back to the first form.
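
The three declaration styles discussed above can be put side by side
in one file for comparison.  This is only a sketch: the VARIANT
switch, the N macro, the placeholder initialisers and the assert are
not from the original code.

```c
#include <assert.h>

#define N 320

/* VARIANT is a hypothetical switch for this sketch:
   0 = global pointers, 1 = in-file arrays, 2 = extern arrays. */
#ifndef VARIANT
#define VARIANT 1
#endif

#if VARIANT == 0
int *a, *b, *c;               /* caller points these at N-int buffers */
#elif VARIANT == 1
int a[N], b[N], c[N];         /* defined in this translation unit */
#else
extern int a[N], b[N], c[N];  /* defined in another translation unit */
#endif

/* Placeholder data so the copy is observable; the arrays in the
   original snippet carry no initialisers. */
const int ad[N] = {1, 2, 3};
const int bd[N] = {4, 5, 6};
const int cd[N] = {7, 8, 9};

void fill(void)
{
#if VARIANT == 0
  /* Only the pointer form can be left unset. */
  assert(a && b && c);
#endif

  for (int i = 0; i < N; i++)
    {
      a[i] = ad[i];
      b[i] = bd[i];
      c[i] = cd[i];
    }
}
```

Building with -DVARIANT=0, 1 or 2 and the same optimisation flags
should make it easy to diff the generated loops.  Variant 2 needs a
second file that defines a, b and c before it will link.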

Can we always use the second form?  What optimisation is preventing it?

-- Michael
