While out benchmarking today, I ran across code similar to this:

    int *a;
    int *b;
    int *c;

    const int ad[320];
    const int bd[320];
    const int cd[320];

    void fill()
    {
        for (int i = 0; i < 320; i++) {
            a[i] = ad[i];
            b[i] = bd[i];
            c[i] = cd[i];
        }
    }

I was surprised and happy to see the vectoriser kick in for the copy.
The inner loop looks like:

    add     r5, r3, ip
    adds    r4, r3, r7
    vldmia  r2!, {d16-d17}
    vldmia  r1!, {d18-d19}
    adds    r0, r3, r6
    vst1.32 {q9}, [r5]
    vst1.32 {q8}, [r4]
    vldmia  r3, {d16-d17}
    adds    r3, r3, #16
    cmp     r3, r8
    vst1.32 {q8}, [r0]
    bne     .L3

so r3 is the loop variable and {ip,r7,r6} are the offsets from r3 to the
destination pointers. Adding a __restrict doesn't change the code.

Richard, will your auto-inc/dec changes combine the final vldmia r3 and
adds r3 into a single vldmia r3! ?

Changing the int *a into in-file arrays like int a[320] gives:

    vldmia  r0!, {d16-d17}
    vldmia  r5!, {d18-d19}
    vstmia  r4!, {d18-d19}
    vstmia  r1!, {d16-d17}
    vldmia  r2!, {d16-d17}
    vstmia  r3!, {d16-d17}
    cmp     r3, r6
    bne     .L2

Marking them as extern int a[320] goes back to the first form.

Can we always use the second form? What optimisation is preventing it?

-- Michael

_______________________________________________
linaro-toolchain mailing list
linaro-toolchain@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-toolchain
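
For anyone wanting to compare the two forms side by side, here is a minimal sketch that puts the pointer version and the in-file array version in one file. The names fill_pointers, fill_arrays, a_arr, b_arr and c_arr, the macro N, and the initial values in the source arrays are assumptions made for illustration; they are not from the original benchmark. Which code the compiler actually emits will depend on its version and flags.

```c
#define N 320

/* Source data. The non-zero initial values are an assumption so the
 * copies can be checked; the remaining elements are zero. */
const int ad[N] = {1, 2, 3};
const int bd[N] = {4, 5, 6};
const int cd[N] = {7, 8, 9};

/* First form: destinations are reached through global pointers.
 * The caller must point a, b and c at storage of at least N ints. */
int *a;
int *b;
int *c;

void fill_pointers(void)
{
    for (int i = 0; i < N; i++) {
        a[i] = ad[i];
        b[i] = bd[i];
        c[i] = cd[i];
    }
}

/* Second form: destinations are arrays defined in this file.
 * (hypothetical names, chosen to avoid clashing with the pointers) */
int a_arr[N];
int b_arr[N];
int c_arr[N];

void fill_arrays(void)
{
    for (int i = 0; i < N; i++) {
        a_arr[i] = ad[i];
        b_arr[i] = bd[i];
        c_arr[i] = cd[i];
    }
}
```

Compiling this with optimisation and NEON enabled and asking the compiler for assembly output (for example gcc's -S) should let the two inner loops be compared directly. The extern case needs the arrays defined in a second file, so it is left out here.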