Hi,

As per the subject, this patch uses a union instead of constructing a new
opaque vector structure in each of the vqtbl[234] Neon intrinsics in
arm_neon.h. This simplifies the header file and also improves code
generation: previously, superfluous move instructions were emitted for
every register extraction/set on this additional structure.

This change is safe because the C-level vector structure types (e.g.
uint8x16x4_t) already provide a tie for sequential register allocation,
which the TBL instructions require.
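
For readers without the patch to hand, the idea can be sketched in portable
C. This is an illustration only, not the code from arm_neon.h: vec16_t,
vec16x2_t, opaque_pair_t and tbl2_builtin are hypothetical stand-ins for
the Neon vector type, the C-level tuple type, the opaque builtin type and
the TBL builtin respectively. It assumes both types have the same size and
layout, which the static_assert checks for the stand-ins.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a 128-bit Neon vector. */
typedef struct { uint8_t b[16]; } vec16_t;

/* Hypothetical stand-in for a C-level tuple type such as uint8x16x2_t. */
typedef struct { vec16_t val[2]; } vec16x2_t;

/* Hypothetical stand-in for the opaque type a two-register builtin takes. */
typedef struct { uint8_t raw[32]; } opaque_pair_t;

/* The union trick relies on both views covering the same bytes. */
static_assert(sizeof(vec16x2_t) == sizeof(opaque_pair_t),
              "tuple and opaque views must have the same size");

/* Hypothetical builtin: two-register table lookup. Like TBL, an
   out-of-range index produces 0 in the corresponding result byte. */
static vec16_t tbl2_builtin(opaque_pair_t tab, vec16_t idx)
{
    vec16_t r;
    for (int i = 0; i < 16; i++)
        r.b[i] = idx.b[i] < 32 ? tab.raw[idx.b[i]] : 0;
    return r;
}

/* Old shape: build the opaque value one register at a time. */
static vec16_t lookup_by_copy(vec16x2_t tab, vec16_t idx)
{
    opaque_pair_t o;
    memcpy(o.raw, tab.val[0].b, 16);       /* "set register 0" */
    memcpy(o.raw + 16, tab.val[1].b, 16);  /* "set register 1" */
    return tbl2_builtin(o, idx);
}

/* New shape: reinterpret the whole tuple through a union. */
static vec16_t lookup_by_union(vec16x2_t tab, vec16_t idx)
{
    union { vec16x2_t i; opaque_pair_t o; } u = { tab };
    return tbl2_builtin(u.o, idx);
}
```

Both functions return the same result for any input; the union version
simply has no per-register set step for the compiler to turn into moves.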

Regression tested and bootstrapped on aarch64-none-linux-gnu - no
issues.

Ok for master?

Thanks,
Jonathan

---

gcc/ChangeLog:

2021-07-08  Jonathan Wright  <jonathan.wri...@arm.com>

        * config/aarch64/arm_neon.h (vqtbl2_s8): Use union instead of
        additional __builtin_aarch64_simd_oi structure.
        (vqtbl2_u8): Likewise.
        (vqtbl2_p8): Likewise.
        (vqtbl2q_s8): Likewise.
        (vqtbl2q_u8): Likewise.
        (vqtbl2q_p8): Likewise.
        (vqtbl3_s8): Use union instead of additional
        __builtin_aarch64_simd_ci structure.
        (vqtbl3_u8): Likewise.
        (vqtbl3_p8): Likewise.
        (vqtbl3q_s8): Likewise.
        (vqtbl3q_u8): Likewise.
        (vqtbl3q_p8): Likewise.
        (vqtbl4_s8): Use union instead of additional
        __builtin_aarch64_simd_xi structure.
        (vqtbl4_u8): Likewise.
        (vqtbl4_p8): Likewise.
        (vqtbl4q_s8): Likewise.
        (vqtbl4q_u8): Likewise.
        (vqtbl4q_p8): Likewise.

Attachment: rb14639.patch