Hi all,

This patch reimplements the vshrn_n* intrinsics to use RTL builtins. These perform a narrowing right shift.
Although the intrinsic generates the half-width mode (e.g. V8HI -> V8QI), the new pattern generates a full 128-bit mode (V8HI -> V16QI) by representing the fill-with-zeroes semantics of the SHRN instruction. The narrower (V8QI) result is extracted with a lowpart subreg.
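For the little-endian case, the insn pattern might take roughly this shape (an illustrative sketch of the approach rather than the exact text of the patch; the iterator and predicate names are assumed from the existing port):

(define_insn "aarch64_shrn<mode>_insn_le"
  [(set (match_operand:<VNARROWQ2> 0 "register_operand" "=w")
        (vec_concat:<VNARROWQ2>
          ;; The shifted, truncated result lands in the low half...
          (truncate:<VNARROWQ>
            (lshiftrt:VQN (match_operand:VQN 1 "register_operand" "w")
              (match_operand:VQN 2 "aarch64_simd_shift_imm_vec_<vn_mode>")))
          ;; ... and the high half is explicitly zero.
          (match_operand:<VNARROWQ> 3 "aarch64_simd_imm_zero")))]
  "TARGET_SIMD && !BYTES_BIG_ENDIAN"
  "shrn\\t%0.<Vntype>, %1.<Vtype>, %2"
  [(set_attr "type" "neon_shift_imm_narrow_q")]
)

A big-endian variant with the vec_concat operands swapped handles the opposite lane order.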
I found this approach allows the RTL optimisers to do a better job of optimising away redundant moves in frequently-occurring SHRN+SHRN2 pairs, like in:
#include <arm_neon.h>

uint8x16_t
foo (uint16x8_t in1, uint16x8_t in2)
{
uint8x8_t tmp = vshrn_n_u16 (in2, 7);
uint8x16_t tmp2 = vshrn_high_n_u16 (tmp, in1, 4);
return tmp2;
}
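Under the hood, the expander behind the intrinsic would then, roughly speaking, emit the full-width insn into a fresh 128-bit temporary and hand back its lowpart, which is what lets the optimisers see the SHRN result and the SHRN2 input as the same register. Again an illustrative sketch; helpers like aarch64_simd_gen_const_vector_dup and lowpart_subreg are the existing GCC ones, but the patch itself is authoritative:

(define_expand "aarch64_shrn<mode>"
  [(set (match_operand:<VNARROWQ> 0 "register_operand")
        (truncate:<VNARROWQ>
          (lshiftrt:VQN (match_operand:VQN 1 "register_operand")
            (match_operand:SI 2 "aarch64_simd_shift_imm_offset_<vn_mode>"))))]
  "TARGET_SIMD"
  {
    ;; Duplicate the scalar shift amount into a vector of the input mode.
    operands[2] = aarch64_simd_gen_const_vector_dup (<MODE>mode,
                                                     INTVAL (operands[2]));
    rtx tmp = gen_reg_rtx (<VNARROWQ2>mode);
    if (BYTES_BIG_ENDIAN)
      emit_insn (gen_aarch64_shrn<mode>_insn_be (tmp, operands[1],
                                operands[2], CONST0_RTX (<VNARROWQ>mode)));
    else
      emit_insn (gen_aarch64_shrn<mode>_insn_le (tmp, operands[1],
                                operands[2], CONST0_RTX (<VNARROWQ>mode)));
    ;; The intrinsic wants the narrow half, so hand back a lowpart subreg
    ;; of the 128-bit temporary.
    emit_move_insn (operands[0], lowpart_subreg (<VNARROWQ>mode, tmp,
                                                 <VNARROWQ2>mode));
    DONE;
  }
)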
Bootstrapped and tested on aarch64-none-linux-gnu and aarch64_be-none-elf.
Pushing to trunk.
Thanks,
Kyrill
gcc/ChangeLog:
* config/aarch64/aarch64-simd-builtins.def (shrn): Define builtin.
* config/aarch64/aarch64-simd.md (aarch64_shrn<mode>_insn_le): Define.
(aarch64_shrn<mode>_insn_be): Likewise.
(aarch64_shrn<mode>): Likewise.
* config/aarch64/arm_neon.h (vshrn_n_s16): Reimplement using builtins.
(vshrn_n_s32): Likewise.
(vshrn_n_s64): Likewise.
(vshrn_n_u16): Likewise.
(vshrn_n_u32): Likewise.
(vshrn_n_u64): Likewise.
* config/aarch64/iterators.md (vn_mode): New mode attribute.