Hi,

This patch uses __builtin_memcpy to copy vector structures instead of
building a new opaque structure one vector at a time in each of the
vst1[q]_x2 Neon intrinsics in arm_neon.h. This simplifies the header
file and also improves code generation: previously, superfluous move
instructions were emitted for every register extraction/set in this
additional structure.
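
For illustration only, here is a portable sketch of the two styles. It
is not the code in arm_neon.h: vec128_t, vec128x2_t, opaque_oi_t,
store_x2, st1q_x2_old and st1q_x2_new are hypothetical stand-ins
(roughly for int32x4_t, int32x4x2_t, __builtin_aarch64_simd_oi, the
store builtin and the old/new intrinsic bodies), chosen so the example
compiles on any GCC or Clang target.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins, NOT the real arm_neon.h types. */
typedef struct { int32_t lane[4]; } vec128_t;            /* one 128-bit vector */
typedef struct { vec128_t val[2]; } vec128x2_t;          /* user-visible pair */
typedef struct { unsigned char bytes[32]; } opaque_oi_t; /* opaque register pair */

/* The single-copy approach relies on both types having the same size. */
static_assert(sizeof(opaque_oi_t) == sizeof(vec128x2_t),
              "opaque type must be as large as the vector pair");

/* Stand-in for the store builtin: writes both vectors to memory. */
void store_x2(int32_t *dst, opaque_oi_t o)
{
  memcpy(dst, o.bytes, sizeof o.bytes);
}

/* Old style: fill the opaque value one vector at a time. */
void st1q_x2_old(int32_t *dst, vec128x2_t v)
{
  opaque_oi_t o;
  memcpy(o.bytes, &v.val[0], sizeof(vec128_t));
  memcpy(o.bytes + sizeof(vec128_t), &v.val[1], sizeof(vec128_t));
  store_x2(dst, o);
}

/* New style: copy the whole structure in one go. */
void st1q_x2_new(int32_t *dst, vec128x2_t v)
{
  opaque_oi_t o;
  __builtin_memcpy(&o, &v, sizeof o);
  store_x2(dst, o);
}
```

Both functions store the same bytes. The difference is only in how the
opaque value is formed, which is where the extra moves came from in the
real intrinsics.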

The patch also adds new code generation tests to verify that superfluous
move instructions are not generated for the vst1q_x2 intrinsics.

Regression tested and bootstrapped on aarch64-none-linux-gnu - no
issues.

Ok for master?

Thanks,
Jonathan

---

gcc/ChangeLog:

2021-07-23  Jonathan Wright  <jonathan.wri...@arm.com>

        * config/aarch64/arm_neon.h (vst1_s64_x2): Use
        __builtin_memcpy instead of constructing
        __builtin_aarch64_simd_oi one vector at a time.
        (vst1_u64_x2): Likewise.
        (vst1_f64_x2): Likewise.
        (vst1_s8_x2): Likewise.
        (vst1_p8_x2): Likewise.
        (vst1_s16_x2): Likewise.
        (vst1_p16_x2): Likewise.
        (vst1_s32_x2): Likewise.
        (vst1_u8_x2): Likewise.
        (vst1_u16_x2): Likewise.
        (vst1_u32_x2): Likewise.
        (vst1_f16_x2): Likewise.
        (vst1_f32_x2): Likewise.
        (vst1_p64_x2): Likewise.
        (vst1q_s8_x2): Likewise.
        (vst1q_p8_x2): Likewise.
        (vst1q_s16_x2): Likewise.
        (vst1q_p16_x2): Likewise.
        (vst1q_s32_x2): Likewise.
        (vst1q_s64_x2): Likewise.
        (vst1q_u8_x2): Likewise.
        (vst1q_u16_x2): Likewise.
        (vst1q_u32_x2): Likewise.
        (vst1q_u64_x2): Likewise.
        (vst1q_f16_x2): Likewise.
        (vst1q_f32_x2): Likewise.
        (vst1q_f64_x2): Likewise.
        (vst1q_p64_x2): Likewise.

gcc/testsuite/ChangeLog:

        * gcc.target/aarch64/vector_structure_intrinsics.c: Add new
        tests.

Attachment: rb14701.patch