https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93005
--- Comment #6 from Richard Earnshaw <rearnsha at gcc dot gnu.org> ---
(In reply to Joel Holdsworth from comment #5)
> I found that if I make modified versions of the intrinsics in arm_neon.h
> that are designed more along the lines of the x86_64 SSE intrinsics defined
> with a simple pointer dereference, then gcc does the right thing [1].
>
>
> #include <arm_neon.h>
>
> __extension__ extern __inline void
> __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> vst1q_s32_fixed (int32_t * __a, int32x4_t __b)
> {
>   *(int32x4_t*)__a = __b;
> }
>
> __extension__ extern __inline int32x4_t
> __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> vld1q_s32_fixed (const int32_t * __a)
> {
>   return *(const int32x4_t*)__a;
> }
>
> int32x4_t foo(int32x4_t a)
> {
>   int32_t temp[4];
>   vst1q_s32_fixed(temp, a);
>   return vld1q_s32_fixed(temp);
> }
>
>
>
> ...compiles to:
>
> foo(long __vector(4)):
>         bx      lr
>
>
> Is there any reason not to simply redefine vst1q_s32, vld1q_s32 and friends
> to stop using builtins?
>

Did you test it with big-endian?
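
One way to approach the big-endian question is a small lane-by-lane comparison between the existing vld1q_s32 intrinsic and a pointer-dereference load like the one proposed in comment #5. The following is only a sketch under stated assumptions: the helper names load_by_deref and count_lane_mismatches are hypothetical (they are not part of arm_neon.h), the buffer is deliberately 16-byte aligned, and the cast mirrors the quoted proposal, so it inherits whatever alignment and aliasing assumptions that proposal makes. No result is claimed here; the point is to build it for both endiannesses (for example with -mbig-endian where the toolchain supports it) and compare.

```c
#include <stdint.h>

#if defined(__ARM_NEON)
#include <arm_neon.h>

/* Hypothetical helper: load four lanes with a plain pointer dereference,
   in the style of vld1q_s32_fixed from comment #5. */
static inline int32x4_t load_by_deref(const int32_t *p)
{
    return *(const int32x4_t *)p;
}

/* Hypothetical helper: returns how many of the four lanes differ between
   the vld1q_s32 intrinsic and the pointer-dereference load (0..4). */
int count_lane_mismatches(void)
{
    /* Aligned so the dereference does not depend on alignment luck. */
    _Alignas(16) int32_t src[4] = { 10, 20, 30, 40 };

    int32x4_t a = vld1q_s32(src);      /* existing intrinsic */
    int32x4_t b = load_by_deref(src);  /* proposed style */

    int mismatches = 0;
    /* vgetq_lane_s32 needs a constant lane index, hence no loop. */
    mismatches += vgetq_lane_s32(a, 0) != vgetq_lane_s32(b, 0);
    mismatches += vgetq_lane_s32(a, 1) != vgetq_lane_s32(b, 1);
    mismatches += vgetq_lane_s32(a, 2) != vgetq_lane_s32(b, 2);
    mismatches += vgetq_lane_s32(a, 3) != vgetq_lane_s32(b, 3);
    return mismatches;
}

#else

/* Not a NEON target: there is nothing to compare, so report no mismatches. */
int count_lane_mismatches(void)
{
    return 0;
}

#endif
```

Calling count_lane_mismatches() from a test driver and printing the result on each target would show whether the two load styles agree there. A non-zero count on one endianness would mean the pointer-dereference definitions are not a drop-in replacement for that configuration.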