[ACTIVITY] Weekly status

2011-08-28 Thread Revital Eres
Continued looking at Richard's micro-benchmarks w.r.t. SMS.
Wrote a new version of the patch to support instructions with
REG_INC_NOTE in SMS.
(http://gcc.gnu.org/ml/gcc-patches/2011-08/msg01216.html)

___
linaro-toolchain mailing list
linaro-toolchain@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-toolchain


Vectorised copy

2011-08-28 Thread Michael Hope
While out benchmarking today, I ran across code similar to this:

int *a;
int *b;
int *c;

const int ad[320];
const int bd[320];
const int cd[320];

void fill()
{
  for (int i = 0; i < 320; i++)
    {
      a[i] = ad[i];
      b[i] = bd[i];
      c[i] = cd[i];
    }
}

I was surprised and happy to see the vectoriser kick in for the copy.
The inner loop looks like:

add     r5, r3, ip
adds    r4, r3, r7
vldmia  r2!, {d16-d17}
vldmia  r1!, {d18-d19}
adds    r0, r3, r6
vst1.32 {q9}, [r5]
vst1.32 {q8}, [r4]
vldmia  r3, {d16-d17}
adds    r3, r3, #16
cmp     r3, r8
vst1.32 {q8}, [r0]
bne     .L3

so r3 is the loop variable and {ip, r7, r6} are the offsets from r3 to
the destination pointers.  Adding a __restrict doesn't change the code.
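For reference, the __restrict variant mentioned above might look like
the sketch below.  The post does not show where the qualifier was
placed, so putting it on the three global destination pointers is an
assumption, as are the N macro and the placeholder initialisers on the
source tables (added only so the copy is observable).

```c
#define N 320

/* Assumption: __restrict applied to the global destination pointers. */
int *__restrict a;
int *__restrict b;
int *__restrict c;

/* Source tables; the initial values are placeholders, not from the post. */
const int ad[N] = { 1, 2, 3 };
const int bd[N] = { 4, 5, 6 };
const int cd[N] = { 7, 8, 9 };

/* Copy each source table into the buffer its pointer refers to. */
void fill(void)
{
  for (int i = 0; i < N; i++)
    {
      a[i] = ad[i];
      b[i] = bd[i];
      c[i] = cd[i];
    }
}
```

The caller has to point a, b and c at three non-overlapping buffers of
at least N ints before calling fill().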

Richard, will your auto-inc/dec changes combine the final vldmia r3
and adds r3 into a single vldmia r3! ?

Changing the pointers (int *a) into in-file arrays such as int a[320] gives:

vldmia  r0!, {d16-d17}
vldmia  r5!, {d18-d19}
vstmia  r4!, {d18-d19}
vstmia  r1!, {d16-d17}
vldmia  r2!, {d16-d17}
vstmia  r3!, {d16-d17}
cmp     r3, r6
bne     .L2

Marking them as extern int a[320] goes back to the first form.
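A minimal sketch of the in-file array variant follows, assuming the
destinations are simply redeclared as arrays in the same translation
unit.  The N macro and the placeholder initialisers are not from the
post.  The extern form is shown only as a comment, and this sketch makes
no claim about why the two forms compile differently.

```c
#define N 320

/* In-file definitions: size and storage are visible to the compiler. */
int a[N];
int b[N];
int c[N];

/* The extern form reported to give the first code shape would be:
     extern int a[N];
     extern int b[N];
     extern int c[N];
   with the definitions living in another file.  */

/* Source tables; the initial values are placeholders, not from the post. */
const int ad[N] = { 1, 2, 3 };
const int bd[N] = { 4, 5, 6 };
const int cd[N] = { 7, 8, 9 };

/* Same copy loop as before, now writing directly into the arrays. */
void fill(void)
{
  for (int i = 0; i < N; i++)
    {
      a[i] = ad[i];
      b[i] = bd[i];
      c[i] = cd[i];
    }
}
```

Comparing the assembly for the two declaration styles is one way to
reproduce the difference described above.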

Can we always use the second form?  What optimisation is preventing it?

-- Michael
