Hi, This patch uses __builtin_memcpy to copy vector structures instead of using a union in each of the vst1[q]_x4 Neon intrinsics in arm_neon.h.
Add new code generation tests to verify that superfluous move instructions are not generated for the vst1q_x4 intrinsics. Regression tested and bootstrapped on aarch64-none-linux-gnu - no issues. Ok for master? Thanks, Jonathan --- gcc/ChangeLog: 2021-07-21 Jonathan Wright <jonathan.wri...@arm.com> * config/aarch64/arm_neon.h (vst1_s8_x4): Use __builtin_memcpy instead of using a union. (vst1q_s8_x4): Likewise. (vst1_s16_x4): Likewise. (vst1q_s16_x4): Likewise. (vst1_s32_x4): Likewise. (vst1q_s32_x4): Likewise. (vst1_u8_x4): Likewise. (vst1q_u8_x4): Likewise. (vst1_u16_x4): Likewise. (vst1q_u16_x4): Likewise. (vst1_u32_x4): Likewise. (vst1q_u32_x4): Likewise. (vst1_f16_x4): Likewise. (vst1q_f16_x4): Likewise. (vst1_f32_x4): Likewise. (vst1q_f32_x4): Likewise. (vst1_p8_x4): Likewise. (vst1q_p8_x4): Likewise. (vst1_p16_x4): Likewise. (vst1q_p16_x4): Likewise. (vst1_s64_x4): Likewise. (vst1_u64_x4): Likewise. (vst1_p64_x4): Likewise. (vst1q_s64_x4): Likewise. (vst1q_u64_x4): Likewise. (vst1q_p64_x4): Likewise. (vst1_f64_x4): Likewise. (vst1q_f64_x4): Likewise. gcc/testsuite/ChangeLog: * gcc.target/aarch64/vector_structure_intrinsics.c: Add new tests.
rb14697.patch
Description: rb14697.patch