While benchmarking the auto-vectoriser on Libav, I noticed a performance
regression in gcc 4.7 (both FSF and Linaro) compared to gcc 4.6 in the AAC
decoder. I narrowed it down to this function:
static void ps_hybrid_analysis_ileave_c(float (*out)[32][2],
float L[2][38][64],
int i, int len)
{
int j;
for (; i < 64; i++) {
for (j = 0; j < len; j++) {
out[i][j][0] = L[0][j][i];
out[i][j][1] = L[1][j][i];
}
}
}
While gcc 4.6 does not attempt to vectorise this at all, 4.7 goes crazy
with a massive slowdown, about 20x slower than non-vectorised with Linaro
4.7 and much worse with FSF 4.7.
Let me know if you need more information.
--
Mans Rullgard / mru
_______________________________________________
linaro-toolchain mailing list
[email protected]
http://lists.linaro.org/mailman/listinfo/linaro-toolchain