On 11 June 2012 17:34, Ulrich Weigand <ulrich.weig...@de.ibm.com> wrote:
> Mans Rullgard <mans.rullg...@linaro.org> wrote:
>
>> static void ps_hybrid_analysis_ileave_c(float (*out)[32][2],
>>                                         float L[2][38][64],
>>                                         int i, int len)
>> {
>>     int j;
>>
>>     for (; i < 64; i++) {
>>         for (j = 0; j < len; j++) {
>>             out[i][j][0] = L[0][j][i];
>>             out[i][j][1] = L[1][j][i];
>>         }
>>     }
>> }
>>
>> While gcc 4.6 does not attempt to vectorise this at all, 4.7 goes crazy
>> with a massive slowdown, about 20x slower than non-vectorised with Linaro
>> 4.7 and much worse with FSF 4.7.
>>
>> Let me know if you need more information.
>
> Thanks for the report; I can reproduce the problem.
>
> There's a number of issues with how GCC chooses to vectorize this loop
> that we can potentially improve upon. However, it would appear that no
> matter what, it probably isn't actually helpful to try to vectorize this
> loop in the first place.

It could be beneficial to merge the stores into a single 64-bit store.
In this particular case, the destination is actually 64-bit aligned,
although there's no way for gcc to know this.

> Fortunately, the vectorizer cost model clearly recognizes this fact (and
> will classify this loop as "not vectorized: vector version will never be
> profitable").
>
> Unfortunately, it seems that on ARM, the cost model is actually off by
> default (it is enabled by default only on i386).
>
> We'll have to enable the cost model on ARM by default as well (and
> probably tune it a bit to avoid regressions on other benchmarks).
>
> However for now, I'd recommend you use -fvect-cost-model when testing
> the vectorizer on libav.

I'll add that flag and see what happens. Any other flags I should be using?

-- 
Mans Rullgard / mru
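
For reference, here is a minimal sketch of the store-merging idea raised in this thread. It is not taken from libav or from any GCC patch. `ileave_ref` restates the loop from the report. `ileave_merged` (a hypothetical name) builds each pair in a local array and writes it with one 8-byte `memcpy`, which a compiler is free to lower to a single 64-bit store. Whether it actually does so, and whether that helps on ARM, is an assumption to check against the generated assembly. `ileave_outputs_match` is an illustrative helper that only confirms the two versions write identical data.

```c
#include <assert.h>
#include <string.h>

/* Reference version: the loop from the report, two 32-bit stores per pair. */
static void ileave_ref(float (*out)[32][2], float L[2][38][64],
                       int i, int len)
{
    int j;

    assert(len >= 0 && len <= 32);
    for (; i < 64; i++) {
        for (j = 0; j < len; j++) {
            out[i][j][0] = L[0][j][i];
            out[i][j][1] = L[1][j][i];
        }
    }
}

/* Hypothetical variant: gather the pair locally, then write it with one
 * 8-byte memcpy. memcpy makes no alignment assumption, so this stays
 * valid C even when the compiler cannot prove 64-bit alignment. */
static void ileave_merged(float (*out)[32][2], float L[2][38][64],
                          int i, int len)
{
    int j;

    assert(len >= 0 && len <= 32);
    for (; i < 64; i++) {
        for (j = 0; j < len; j++) {
            float pair[2] = { L[0][j][i], L[1][j][i] };
            memcpy(out[i][j], pair, sizeof pair);
        }
    }
}

/* Illustrative self-check: returns 1 when both versions agree. */
static int ileave_outputs_match(void)
{
    static float L[2][38][64];
    static float a[64][32][2], b[64][32][2];
    int c, j, i;

    /* Fill the input with distinct, exactly representable values. */
    for (c = 0; c < 2; c++)
        for (j = 0; j < 38; j++)
            for (i = 0; i < 64; i++)
                L[c][j][i] = (float)(c * 10000 + j * 100 + i);

    ileave_ref(a, L, 3, 32);
    ileave_merged(b, L, 3, 32);
    return memcmp(a, b, sizeof a) == 0;
}
```

Each `out[i][j]` pair sits at a multiple of 8 bytes from the start of `out`, so the pairs are 64-bit aligned whenever the base pointer is. That is the property the compiler cannot see from the function signature alone.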