Hi,

As subject, this patch uses a union instead of constructing a new opaque vector structure for each of the vqtbl[234] Neon intrinsics in arm_neon.h. This simplifies the header file and also improves code generation - superfluous move instructions were emitted for every register extraction/set in this additional structure.
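For readers without the patch to hand, the shape of the change can be sketched in portable C. This is only an illustration under stated assumptions: the demo_* types and functions below are hypothetical stand-ins, not the real arm_neon.h types or the __builtin_aarch64_* builtins, and a portable sketch says nothing about the moves emitted on AArch64. It only shows that reinterpreting the existing structure through a union yields the same bytes as building a second structure piece by piece.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a C-level vector structure type such as
   uint8x16x2_t: two consecutive 16-byte "registers". */
typedef struct { uint8_t val[2][16]; } demo_u8x16x2_t;

/* Hypothetical stand-in for an opaque builtin type holding the same
   32 bytes. */
typedef struct { uint8_t bytes[32]; } demo_opaque_oi_t;

/* The union trick relies on both views having the same size. */
_Static_assert(sizeof(demo_u8x16x2_t) == sizeof(demo_opaque_oi_t),
               "both views must cover the same storage");

/* Old shape: construct an additional structure one register at a time. */
demo_opaque_oi_t pack_by_copy(demo_u8x16x2_t tab)
{
    demo_opaque_oi_t o;
    memcpy(o.bytes, tab.val[0], 16);      /* set register 0 */
    memcpy(o.bytes + 16, tab.val[1], 16); /* set register 1 */
    return o;
}

/* New shape: view the existing structure through a union. */
demo_opaque_oi_t pack_by_union(demo_u8x16x2_t tab)
{
    union { demo_u8x16x2_t s; demo_opaque_oi_t o; } u;
    u.s = tab;
    return u.o;
}

/* Scalar model of a two-register table lookup: each index selects a
   byte from the 32-byte table, and an out-of-range index yields 0. */
void demo_tbl2(const demo_opaque_oi_t *t, const uint8_t idx[16],
               uint8_t out[16])
{
    assert(t != NULL);
    for (int i = 0; i < 16; i++)
        out[i] = idx[i] < 32 ? t->bytes[idx[i]] : 0;
}
```

Both pack functions return identical contents, so a lookup gives the same result whichever one feeds it.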
This change is safe because the C-level vector structure types, e.g. uint8x16x4_t, already provide a tie for sequential register allocation - which is required by the TBL instructions.

Regression tested and bootstrapped on aarch64-none-linux-gnu - no issues.

Ok for master?

Thanks,
Jonathan

---

gcc/ChangeLog:

2021-07-08  Jonathan Wright  <jonathan.wri...@arm.com>

	* config/aarch64/arm_neon.h (vqtbl2_s8): Use union instead of
	additional __builtin_aarch64_simd_oi structure.
	(vqtbl2_u8): Likewise.
	(vqtbl2_p8): Likewise.
	(vqtbl2q_s8): Likewise.
	(vqtbl2q_u8): Likewise.
	(vqtbl2q_p8): Likewise.
	(vqtbl3_s8): Use union instead of additional
	__builtin_aarch64_simd_ci structure.
	(vqtbl3_u8): Likewise.
	(vqtbl3_p8): Likewise.
	(vqtbl3q_s8): Likewise.
	(vqtbl3q_u8): Likewise.
	(vqtbl3q_p8): Likewise.
	(vqtbl4_s8): Use union instead of additional
	__builtin_aarch64_simd_xi structure.
	(vqtbl4_u8): Likewise.
	(vqtbl4_p8): Likewise.
	(vqtbl4q_s8): Likewise.
	(vqtbl4q_u8): Likewise.
	(vqtbl4q_p8): Likewise.
rb14639.patch