https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93005
--- Comment #8 from Richard Earnshaw <rearnsha at gcc dot gnu.org> ---
(In reply to Joel Holdsworth from comment #7)
> > Did you test it with big-endian?
>
> Good question. It seems to do the right thing in both cases:
> https://godbolt.org/z/7rDzAm

foo2(long*, __simd128_int32_t):
        vst1.64 {d0-d1}, [r0:64]
        bx      lr

Well for big-endian that is wrong. You've got a vector of 32-bit elements but
you're storing it as 64-bit elements, so when you look in memory you'll find
the elements permuted.
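
To make the permutation concrete, here is a minimal host-side sketch in plain C. It is only a model, not the real NEON code path. It assumes the architectural register layout (32-bit lane 0 in the least significant bits of d0, lane 1 in the most significant bits, and likewise lanes 2 and 3 in d1) and that on a big-endian target VST1 writes each element most significant byte first. The names model_vst1_32, model_vst1_64, store_be and load_be32 are hypothetical helpers made up for this illustration.

```c
#include <stdint.h>

/* Write one element of nbytes bytes, most significant byte first
   (how a big-endian target lays out a single element in memory). */
void store_be(uint8_t *dst, uint64_t value, int nbytes)
{
    for (int i = 0; i < nbytes; i++)
        dst[i] = (uint8_t)(value >> (8 * (nbytes - 1 - i)));
}

/* Read back one big-endian 32-bit word, as a later scalar load would. */
uint32_t load_be32(const uint8_t *src)
{
    return ((uint32_t)src[0] << 24) | ((uint32_t)src[1] << 16) |
           ((uint32_t)src[2] << 8)  |  (uint32_t)src[3];
}

/* Model of "vst1.32 {d0-d1}": four 32-bit elements, lane 0 first. */
void model_vst1_32(uint8_t mem[16], const uint32_t lanes[4])
{
    for (int i = 0; i < 4; i++)
        store_be(mem + 4 * i, lanes[i], 4);
}

/* Model of "vst1.64 {d0-d1}": two 64-bit elements. Each D register
   holds an even lane in its low half and an odd lane in its high half
   (assumed layout), and is written as one 64-bit big-endian value. */
void model_vst1_64(uint8_t mem[16], const uint32_t lanes[4])
{
    for (int i = 0; i < 2; i++) {
        uint64_t d = ((uint64_t)lanes[2 * i + 1] << 32) | lanes[2 * i];
        store_be(mem + 8 * i, d, 8);
    }
}
```

Under those assumptions, with lanes {0, 1, 2, 3}, reading the buffer back as four 32-bit words via load_be32 gives 0, 1, 2, 3 after model_vst1_32 but 1, 0, 3, 2 after model_vst1_64: each pair of 32-bit elements is swapped within its 64-bit chunk.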