The vectoriser can handle interleaved loads such as:
    for (int i = 0; i < N; i++)
      res[i] = a[2 * i] + a[2 * i + 1];
The vectorised code loads two consecutive vectors from a, then permutes
the elements to separate the even-indexed values from the odd-indexed
ones. It can handle stores in a similar way.
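To make the "load two vectors, then permute" strategy concrete, here is
a scalar C model of it. This is only an illustrative sketch, not the
code the vectoriser emits: the vector width VF and the function name
sum_pairs_permute are assumptions made up for the example, and the
vectors are modelled as small arrays.

```c
#include <assert.h>
#include <stddef.h>

#define VF 4u  /* hypothetical vector width in elements (assumption) */

/* Scalar model of "load two consecutive vectors, then permute".
   Computes res[i] = a[2*i] + a[2*i + 1] for i in [0, n).
   n must be a multiple of VF; a must hold 2*n elements.  */
void sum_pairs_permute(int *res, const int *a, size_t n)
{
    assert(n % VF == 0);

    for (size_t i = 0; i < n; i += VF) {
        int lo[VF], hi[VF], even[VF], odd[VF];

        /* Two consecutive vector loads from a.  */
        for (size_t j = 0; j < VF; j++) {
            lo[j] = a[2 * i + j];
            hi[j] = a[2 * i + VF + j];
        }

        /* Permute: pick the even and odd elements out of the
           concatenation lo:hi.  */
        for (size_t j = 0; j < VF; j++) {
            size_t e = 2 * j, o = 2 * j + 1;
            even[j] = e < VF ? lo[e] : hi[e - VF];
            odd[j]  = o < VF ? lo[o] : hi[o - VF];
        }

        /* Element-wise add and one vector store.  */
        for (size_t j = 0; j < VF; j++)
            res[i + j] = even[j] + odd[j];
    }
}
```

Calling sum_pairs_permute(res, a, n) gives the same results as the
scalar loop above whenever n is a multiple of VF.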
This patch series adds support for load and store instructions that have
the interleaving "built in", such as NEON's vldN and vstN. The series
is based on the outline here:
http://gcc.gnu.org/ml/gcc/2011-03/msg00322.html
except that I'm now using "internal" functions rather than built-ins.
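For readers unfamiliar with instructions that have the interleaving
"built in", the following plain C models the two-way case (the
behaviour of NEON's vld2 and vst2). It is a sketch for illustration
only: it uses neither the real NEON intrinsics nor the internal
functions from this series, and the names vec_pair, model_vld2,
model_vst2 and LANES are hypothetical.

```c
#include <stddef.h>

#define LANES 4u  /* hypothetical number of lanes per vector (assumption) */

/* Two vectors produced or consumed by one interleaving access.  */
struct vec_pair {
    int val[2][LANES];
};

/* Model of a two-way interleaving load: val[0] receives elements
   0, 2, 4, ... of p and val[1] receives elements 1, 3, 5, ...  */
struct vec_pair model_vld2(const int *p)
{
    struct vec_pair v;
    for (size_t j = 0; j < LANES; j++) {
        v.val[0][j] = p[2 * j];
        v.val[1][j] = p[2 * j + 1];
    }
    return v;
}

/* Model of the matching interleaving store: the inverse of
   model_vld2.  */
void model_vst2(int *p, struct vec_pair v)
{
    for (size_t j = 0; j < LANES; j++) {
        p[2 * j]     = v.val[0][j];
        p[2 * j + 1] = v.val[1][j];
    }
}
```

With such a load, the example loop needs no separate permute step: each
iteration does one model_vld2 on a and adds val[0] to val[1]
lane by lane.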
I'll update my internal function patch:
http://gcc.gnu.org/ml/gcc-patches/2011-04/msg00609.html
after Richard's recent changes and retest, but the patches in this
series are unaffected.
Richard