[Bug target/43725] Poor instructions selection, scheduling and registers allocation for ARM NEON intrinsics

ramana at gcc dot gnu.org Tue, 05 Oct 2010 00:16:55 -0700

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43725


--- Comment #4 from Ramana Radhakrishnan <ramana at gcc dot gnu.org> 2010-10-05 07:16:35 UTC ---
(In reply to comment #2)
> (In reply to comment #1)
> > So the compiler is correct not to be using vld1 for this code.  The memory
> > format of int32x4_t is defined to be the format of a neon register that has
> > been filled from an array of int32 values and then stored to memory using
> > VSTM (or equivalent sequence).  The implication of all this is that
> > int32x4_t does not (necessarily) have the same memory layout as int32_t[4].
> 
> Could you elaborate on this? Specifically about the case when memory format
> for VSTM and VST1 may differ.
> 
> I thought that VST1 instruction could be always used as a replacement for
> VSTM, it is just a little bit less convenient in some cases because it is
> lacking some more advanced addressing modes. Moreover, VSTM is VFP
> instruction and VST1 is NEON one. So I guess mixing VSTM with true NEON
> instructions may be additionally a bad idea (for performance reasons on
> Cortex-A9 or other processors?).

The ARM ARM states that VLDM / VSTM and VLDR / VSTR of 64-bit values are part
of both VFPv2 / VFPv3 and Advanced SIMD, i.e. they can be executed by both
units. Thus, AFAIK, there should be no performance penalty on the A9 when
VLDM / VSTM or VLDR / VSTR of 64-bit registers are interleaved with other
Neon instructions.
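
On the layout question in comment #2, a small model may help. The sketch
below is plain portable C, not NEON code, and only illustrates why a
doubleword load (VLDR / VLDM style) and an element load (VLD1.32 style) can
disagree on lane order. All names in it (dreg_t, lane32, store_word,
load_doubleword, load_elements32) are made up for illustration, and it
assumes that 32-bit lane i of a D register occupies bits [32*i+31 : 32*i].

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a 64-bit D register (not a real NEON type). */
typedef struct { uint64_t bits; } dreg_t;

/* Assumption: 32-bit lane i lives in bits [32*i+31 : 32*i]. */
static inline uint32_t lane32(dreg_t d, int i)
{
    return (uint32_t)(d.bits >> (32 * i));
}

/* Write one 32-bit word to memory in the requested byte order. */
static inline void store_word(uint8_t *p, uint32_t v, int big_endian)
{
    for (int i = 0; i < 4; i++) {
        int shift = big_endian ? 8 * (3 - i) : 8 * i;
        p[i] = (uint8_t)(v >> shift);
    }
}

/* Read one 32-bit word from memory in the requested byte order. */
static inline uint32_t load_word(const uint8_t *p, int big_endian)
{
    uint32_t v = 0;
    for (int i = 0; i < 4; i++) {
        int shift = big_endian ? 8 * (3 - i) : 8 * i;
        v |= (uint32_t)p[i] << shift;
    }
    return v;
}

/* VLDR / VLDM style: the 8 bytes are treated as one 64-bit doubleword,
   so in big-endian mode the word at the lower address becomes the
   high half of the register. */
static inline dreg_t load_doubleword(const uint8_t *p, int big_endian)
{
    uint64_t w1 = load_word(p, big_endian);
    uint64_t w2 = load_word(p + 4, big_endian);
    dreg_t d;
    d.bits = big_endian ? (w1 << 32) | w2 : (w2 << 32) | w1;
    return d;
}

/* VLD1.32 style: elements are loaded one by one, so the element at the
   lower address always lands in lane 0, whatever the byte order. */
static inline dreg_t load_elements32(const uint8_t *p, int big_endian)
{
    dreg_t d;
    d.bits = ((uint64_t)load_word(p + 4, big_endian) << 32)
             | load_word(p, big_endian);
    return d;
}
```

In this model the two loads give the same lanes in little-endian mode, while
in big-endian mode the doubleword load swaps the two 32-bit lanes within each
D register. That would be one case where int32x4_t and int32_t[4] do not
share a memory layout.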


cheers
Ramana
