http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43725
--- Comment #4 from Ramana Radhakrishnan <ramana at gcc dot gnu.org> 2010-10-05 07:16:35 UTC ---
(In reply to comment #2)
> (In reply to comment #1)
> > So the compiler is correct not to be using vld1 for this code.  The memory
> > format of int32x4_t is defined to be the format of a neon register that has
> > been filled from an array of int32 values and then stored to memory using
> > VSTM (or equivalent sequence).  The implication of all this is that
> > int32x4_t does not (necessarily) have the same memory layout as int32_t[4].
>
> Could you elaborate on this? Specifically about the case when memory format
> for VSTM and VST1 may differ.
>
> I thought that VST1 instruction could be always used as a replacement for
> VSTM, it is just a little bit less convenient in some cases because it is
> lacking some more advanced addressing modes. Moreover, VSTM is VFP
> instruction and VST1 is NEON one. So I guess mixing VSTM with true NEON
> instructions may be additionally a bad idea (for performance reasons on
> Cortex-A9 or other processors?).

The ARM ARM states that VLDM / VSTM and VLDR / VSTR for 64-bit values are
compliant with VFPv2 / VFPv3 and Advanced SIMD, i.e. they can be executed by
both units. Thus, AFAIK, there should be no performance regressions on the A9
when VLDM / VSTM or VLDR / VSTR of 64-bit registers are interleaved with other
NEON instructions.

cheers
Ramana