http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59078
--- Comment #2 from Povilas Kanapickas <tir5c3 at yahoo dot co.uk> ---
This would be very surprising, as it would eliminate a major use case for this instruction set feature. Unfortunately, auto-increments are not mentioned anywhere in the ARM documentation. However, a reasonable guess is that ARM has implemented a special 'base register cache' in the memory subsystem that hides any additional latency auto-increments might otherwise cause. In any case, my benchmarks on Cortex-A9 confirm that breaking the dependencies between subsequent vector store instructions does not improve performance.
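
For readers unfamiliar with the two store patterns being compared, here is a minimal portable C sketch. It is not the benchmark referred to above. The function names, the 4-float vector width, and the use of memcpy as a stand-in for a NEON store are all assumptions made for illustration, and a compiler is free to turn either loop into the other form.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { LANES = 4 };  /* assumed: one 128-bit vector of 32-bit floats */

/* Hypothetical stand-in for a vector store with base write-back
   (post-increment addressing): stores one vector and returns the
   updated base, which the next store must wait for. */
static float *store_postinc(float *dst, const float *v)
{
    memcpy(dst, v, LANES * sizeof *v);
    return dst + LANES;
}

/* Chained form: each store address depends on the previous write-back. */
void store_chained(float *dst, const float *src, size_t nvec)
{
    for (size_t i = 0; i < nvec; ++i)
        dst = store_postinc(dst, src + i * LANES);
}

/* Independent form: each address is derived from the unchanged base,
   so there is no dependency between successive stores. */
void store_independent(float *dst, const float *src, size_t nvec)
{
    for (size_t i = 0; i < nvec; ++i)
        memcpy(dst + i * LANES, src + i * LANES, LANES * sizeof *src);
}

/* Sanity check: both forms must write identical data. */
void check_stores_agree(void)
{
    float src[4 * LANES], a[4 * LANES] = {0}, b[4 * LANES] = {0};
    for (size_t i = 0; i < 4 * LANES; ++i)
        src[i] = (float)i;
    store_chained(a, src, 4);
    store_independent(b, src, 4);
    assert(memcmp(a, b, sizeof a) == 0);
}
```

The two functions differ only in how the store address is produced, which is the dependency the benchmark claim is about. Measuring any difference would require timing the actual generated instructions on the target core.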