> -----Original Message-----
> From: Jonathan Wright <jonathan.wri...@arm.com>
> Sent: 23 July 2021 10:42
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov <kyrylo.tkac...@arm.com>; Richard Sandiford
> <richard.sandif...@arm.com>
> Subject: [PATCH 8/8] aarch64: Use memcpy to copy vector tables in
> vst1[q]_x4 intrinsics
> 
> Hi,
> 
> This patch uses __builtin_memcpy to copy vector structures instead of
> using a union in each of the vst1[q]_x4 Neon intrinsics in arm_neon.h.
> 
> Add new code generation tests to verify that superfluous move
> instructions are not generated for the vst1q_x4 intrinsics.
> 
> Regression tested and bootstrapped on aarch64-none-linux-gnu - no
> issues.
> 
> Ok for master?

Ok, good to see that this approach avoids the superfluous moves.
Thanks,
Kyrill

> 
> Thanks,
> Jonathan
> 
> ---
> 
> gcc/ChangeLog:
> 
> 2021-07-21  Jonathan Wright  <jonathan.wri...@arm.com>
> 
>       * config/aarch64/arm_neon.h (vst1_s8_x4): Use
>       __builtin_memcpy instead of using a union.
>       (vst1q_s8_x4): Likewise.
>       (vst1_s16_x4): Likewise.
>       (vst1q_s16_x4): Likewise.
>       (vst1_s32_x4): Likewise.
>       (vst1q_s32_x4): Likewise.
>       (vst1_u8_x4): Likewise.
>       (vst1q_u8_x4): Likewise.
>       (vst1_u16_x4): Likewise.
>       (vst1q_u16_x4): Likewise.
>       (vst1_u32_x4): Likewise.
>       (vst1q_u32_x4): Likewise.
>       (vst1_f16_x4): Likewise.
>       (vst1q_f16_x4): Likewise.
>       (vst1_f32_x4): Likewise.
>       (vst1q_f32_x4): Likewise.
>       (vst1_p8_x4): Likewise.
>       (vst1q_p8_x4): Likewise.
>       (vst1_p16_x4): Likewise.
>       (vst1q_p16_x4): Likewise.
>       (vst1_s64_x4): Likewise.
>       (vst1_u64_x4): Likewise.
>       (vst1_p64_x4): Likewise.
>       (vst1q_s64_x4): Likewise.
>       (vst1q_u64_x4): Likewise.
>       (vst1q_p64_x4): Likewise.
>       (vst1_f64_x4): Likewise.
>       (vst1q_f64_x4): Likewise.
> 
> gcc/testsuite/ChangeLog:
> 
>       * gcc.target/aarch64/vector_structure_intrinsics.c: Add new
>       tests.
