Hi,

This patch uses __builtin_memcpy to copy vector structures in each of the vtbx4 Neon intrinsics in arm_neon.h, instead of building a new opaque structure one vector at a time. This simplifies the header file and also improves code generation: superfluous move instructions were emitted for every register extraction/set in this additional structure.
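
As a rough illustration of the idea (not the code from the patch), the sketch below contrasts the two ways of filling an opaque register-tuple structure. It is plain portable C: vec8, vec8x4, opaque_oi, pack_one_at_a_time and pack_whole are hypothetical stand-ins for int8x8_t, int8x8x4_t, __builtin_aarch64_simd_oi and the intrinsic bodies, and the assumption is that the source structure and the opaque structure have the same size and layout.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins; these are NOT the real arm_neon.h types. */
typedef struct { int8_t lane[8]; } vec8;        /* role of int8x8_t */
typedef struct { vec8 val[4]; } vec8x4;         /* role of int8x8x4_t */
typedef struct { int8_t bytes[32]; } opaque_oi; /* role of __builtin_aarch64_simd_oi */

/* Old style: build the opaque structure piece by piece, one vector
   (and here one lane) at a time.  */
static opaque_oi
pack_one_at_a_time (vec8x4 tab)
{
  opaque_oi o;
  for (int v = 0; v < 4; v++)
    for (int l = 0; l < 8; l++)
      o.bytes[v * 8 + l] = tab.val[v].lane[l];
  return o;
}

/* New style: copy the whole structure in one go.  This relies on the
   assumption that both types have the same size and layout.  */
static opaque_oi
pack_whole (vec8x4 tab)
{
  opaque_oi o;
  _Static_assert (sizeof (opaque_oi) == sizeof (vec8x4),
                  "source and opaque structures must be the same size");
  memcpy (&o, &tab, sizeof (tab));
  return o;
}
```

Both functions produce the same bytes; the second just leaves the compiler a single block copy to optimise rather than a sequence of per-element insertions.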

Regression tested and bootstrapped on aarch64-none-linux-gnu - no issues.

Ok for master?

Thanks,
Jonathan

---

gcc/ChangeLog:

2021-07-19  Jonathan Wright  <jonathan.wri...@arm.com>

	* config/aarch64/arm_neon.h (vtbx4_s8): Use __builtin_memcpy
	instead of constructing __builtin_aarch64_simd_oi one vector
	at a time.
	(vtbx4_u8): Likewise.
	(vtbx4_p8): Likewise.

rb14674.patch