On 8 August 2012 16:24, Ulrich Weigand <ulrich.weig...@de.ibm.com> wrote: > > Hello, > > I've had a look at the mp3player performance regressions (just with *some* > data sets) with the vector-alignment patch. Interestingly it turns out > that the patch basically does not change the generated code for the hot > spot (inv_mdct routine) at all. (The *only* change is which bits of the > incoming pointer the run-time alignment check generated by the vectorizer > tests for. But this has no practical consequences, since the check itself > is not hot, and the *decision* made by the check is the same anyway -- > everything is in fact properly aligned at runtime.) > > The other difference, outside of code, introduced by the vector-alignment > patch is that some global arrays used to be forcibly aligned to 16 bytes by > the vectorizer, and they are now only aligned to 8 bytes. To check whether > this makes a difference, I've modified the compiler as a hack to always > force all global arrays to be 16 byte aligned. And interestingly enough, > this appears to fix this particular performance regression ... > > Any thoughts as to why this might be the case? What are the > recommendations on the ARM hardware side as to what alignment is prefered?
This suggests to me that the data layout changes now mean that some of the loads hit two-cache lines (for 8-byte alignment) as opposed to one (for 16-byte alignment) The naive comment is that everything should be aligned to 32-bytes for A9 (as that is the A9 cache line size: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0388-/CIHGGJAB.html). However for loads of less than 32-bytes alignment to the load-size is OK (as that shouldn't cross a cache line boundary). Thanks, Matt -- Matthew Gretton-Dann Linaro Toolchain Working Group matthew.gretton-d...@linaro.org _______________________________________________ linaro-toolchain mailing list linaro-toolchain@lists.linaro.org http://lists.linaro.org/mailman/listinfo/linaro-toolchain