http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59078

--- Comment #2 from Povilas Kanapickas <tir5c3 at yahoo dot co.uk> ---
This would be very surprising, as it would eliminate a major use case for
this instruction set feature. Unfortunately, auto-increments are not mentioned
anywhere in the ARM documentation. However, I think a reasonable guess would
be that ARM has implemented a special 'base register cache' in the memory
subsystem that hides any additional latency that auto-increments might
otherwise cause.

Anyway, my benchmarks on Cortex-A9 confirm that breaking dependencies between
subsequent vector store instructions does not improve performance.
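
For reference, the two access patterns being compared can be sketched roughly
as below. This is only an illustration in portable C, not the benchmark
mentioned above: the function names, LANES and UNROLL are hypothetical, memcpy
stands in for a vector store, and the compiler is free to pick whatever
addressing mode it likes for either variant, so the generated assembly has to
be inspected to see which form was actually emitted.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LANES  4  /* elements per "vector" store (assumed for illustration) */
#define UNROLL 4  /* stores per loop iteration (assumed for illustration) */

/* Variant A: the pointer advances after every store, so each store's
 * address depends on the previous update (post-increment style). */
void store_postinc(uint32_t *dst, const uint32_t *src, size_t blocks)
{
    for (size_t b = 0; b < blocks; ++b) {
        for (int u = 0; u < UNROLL; ++u) {
            memcpy(dst, src, LANES * sizeof *dst);
            dst += LANES;
            src += LANES;
        }
    }
}

/* Variant B: stores use fixed offsets from a base that is updated only
 * once per iteration, so the stores do not depend on each other. */
void store_offsets(uint32_t *dst, const uint32_t *src, size_t blocks)
{
    for (size_t b = 0; b < blocks; ++b) {
        memcpy(dst + 0 * LANES, src + 0 * LANES, LANES * sizeof *dst);
        memcpy(dst + 1 * LANES, src + 1 * LANES, LANES * sizeof *dst);
        memcpy(dst + 2 * LANES, src + 2 * LANES, LANES * sizeof *dst);
        memcpy(dst + 3 * LANES, src + 3 * LANES, LANES * sizeof *dst);
        dst += UNROLL * LANES;
        src += UNROLL * LANES;
    }
}

/* Returns 1 if both variants write identical data, 0 otherwise. */
int variants_agree(void)
{
    enum { BLOCKS = 8, N = BLOCKS * UNROLL * LANES };
    uint32_t src[N], a[N], b[N];

    for (size_t i = 0; i < N; ++i)
        src[i] = (uint32_t)(i * 2654435761u);
    memset(a, 0x00, sizeof a);
    memset(b, 0xff, sizeof b);

    store_postinc(a, src, BLOCKS);
    store_offsets(b, src, BLOCKS);
    return memcmp(a, b, sizeof a) == 0;
}
```

Both variants write the same data; only the address computation differs, and
that is the dependency the benchmark tried to break.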
