On 8 August 2012 16:24, Ulrich Weigand <ulrich.weig...@de.ibm.com> wrote:
>
> Hello,
>
> I've had a look at the mp3player performance regressions (just with *some*
> data sets) with the vector-alignment patch.  Interestingly it turns out
> that the patch basically does not change the generated code for the hot
> spot (inv_mdct routine) at all.  (The *only* change is which bits of the
> incoming pointer the run-time alignment check generated by the vectorizer
> tests for.  But this has no practical consequences, since the check itself
> is not hot, and the *decision* made by the check is the same anyway --
> everything is in fact properly aligned at runtime.)
>
> The other difference, outside of code, introduced by the vector-alignment
> patch is that some global arrays used to be forcibly aligned to 16 bytes by
> the vectorizer, and they are now only aligned to 8 bytes.  To check whether
> this makes a difference, I've modified the compiler as a hack to always
> force all global arrays to be 16 byte aligned.   And interestingly enough,
> this appears to fix this particular performance regression ...
>
> Any thoughts as to why this might be the case?  What are the
> recommendations on the ARM hardware side as to what alignment is preferred?

This suggests to me that the data layout changes mean that some of
the loads now hit two cache lines (with 8-byte alignment) as opposed to
one (with 16-byte alignment).

The naive answer is that everything should be aligned to 32 bytes for
the A9 (as that is the A9 cache line size:
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0388-/CIHGGJAB.html).
However, for loads of less than 32 bytes, aligning to the load size is
OK (as such a load shouldn't cross a cache line boundary).
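
To make the arithmetic concrete, here is a rough sketch in C of the
check I have in mind. It assumes the 32-byte line size quoted above;
CACHE_LINE and crosses_line are names made up for this illustration,
not anything from GCC or the mp3player sources.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed L1 cache line size for the A9, as quoted above. */
#define CACHE_LINE 32u

/*
 * Hypothetical helper: returns true if a load of `size` bytes
 * (size >= 1) starting at `addr` touches more than one cache line,
 * i.e. the first and last byte fall in different lines.
 */
static bool crosses_line(uintptr_t addr, size_t size)
{
    uintptr_t first_line = addr / CACHE_LINE;
    uintptr_t last_line  = (addr + size - 1) / CACHE_LINE;
    return first_line != last_line;
}
```

With these assumptions, a 16-byte load at offset 24 (8-byte aligned
only) spans bytes 24..39 and so straddles two lines, whereas a 16-byte
load at offset 0 or 16 stays within one line. Note that an 8-byte
aligned address does not always straddle (offset 8 is fine), which
would fit with the regression showing up only for some data sets.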

Thanks,

Matt

-- 
Matthew Gretton-Dann
Linaro Toolchain Working Group
matthew.gretton-d...@linaro.org

_______________________________________________
linaro-toolchain mailing list
linaro-toolchain@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-toolchain
