This patch allows for more efficient SVE2 vectorization of Multiply High with Round and Scale (MULHRS) patterns.

The example snippet:

    uint16_t a[N], b[N], c[N];

    void foo_round (void)
    {
      for (int i = 0; i < N; i++)
        a[i] = ((((int32_t)b[i] * (int32_t)c[i]) >> 14) + 1) >> 1;
    }

... previously vectorized to:

    foo_round:
        ...
        ptrue   p0.s
        whilelo p1.h, wzr, w2
        ld1h    {z2.h}, p1/z, [x4, x0, lsl #1]
        ld1h    {z0.h}, p1/z, [x3, x0, lsl #1]
        uunpklo z3.s, z2.h              //
        uunpklo z1.s, z0.h              //
        uunpkhi z2.s, z2.h              //
        uunpkhi z0.s, z0.h              //
        mul     z1.s, p0/m, z1.s, z3.s  //
        mul     z0.s, p0/m, z0.s, z2.s  //
        asr     z1.s, z1.s, #14         //
        asr     z0.s, z0.s, #14         //
        add     z1.s, z1.s, #1          //
        add     z0.s, z0.s, #1          //
        asr     z1.s, z1.s, #1          //
        asr     z0.s, z0.s, #1          //
        uzp1    z0.h, z1.h, z0.h        //
        st1h    {z0.h}, p1, [x1, x0, lsl #1]
        inch    x0
        whilelo p1.h, w0, w2
        b.ne    28
        ret

... and now vectorizes to:

    foo_round:
        ...
        whilelo p0.h, wzr, w2
        nop
        ld1h    {z1.h}, p0/z, [x4, x0, lsl #1]
        ld1h    {z2.h}, p0/z, [x3, x0, lsl #1]
        umullb  z0.s, z1.h, z2.h        //
        umullt  z1.s, z1.h, z2.h        //
        rshrnb  z0.h, z0.s, #15         //
        rshrnt  z0.h, z1.s, #15         //
        st1h    {z0.h}, p0, [x1, x0, lsl #1]
        inch    x0
        whilelo p0.h, w0, w2
        b.ne    28
        ret
        nop

Also supported are:

* Non-rounding cases

  The equivalent example snippet:

    void foo_trunc (void)
    {
      for (int i = 0; i < N; i++)
        a[i] = ((int32_t)b[i] * (int32_t)c[i]) >> 15;
    }

  ... vectorizes with SHRNT/SHRNB

* 32-bit and 8-bit input/output types

* Signed output types

  SMULLT/SMULLB are generated instead

SQRDMULH was considered as a potential single-instruction optimization but
saturates the intermediate value instead of truncating.

Best Regards,
Yuliang Wang


ChangeLog:

2019-08-22  Yuliang Wang  <yuliang.w...@arm.com>

	* config/aarch64/aarch64-sve2.md: support for SVE2 instructions
	[S/U]MULL[T/B] + [R]SHRN[T/B] and MULHRS pattern variants
	* config/aarch64/iterators.md: iterators and attributes for above
	* internal-fn.def: internal functions for MULH[R]S patterns
	* optabs.def: optabs definitions for above and sign variants
	* tree-vect-patterns.c (vect_recog_multhi_pattern): pattern
	recognition function for MULHRS
	* gcc.target/aarch64/sve2/mulhrs_1.c: new test for all variants
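
A scalar sketch, not part of the patch, may help when reading the two points above: why a single rounding shift by 15 can stand in for the ">> 14, + 1, >> 1" sequence, and where SQRDMULH diverges for signed 16-bit inputs. All function names here are hypothetical, the SQRDMULH model is a simplified reading of the architectural description (doubling multiply, add rounding constant, take high half, saturate), and the code assumes that right-shifting a negative value is an arithmetic shift, as it is with GCC.

```c
#include <stdint.h>

/* Rounding form exactly as written in foo_round, applied to an already
   widened product x.  64-bit arithmetic is used to stay clear of overflow. */
int64_t round_two_step (int64_t x)
{
  return ((x >> 14) + 1) >> 1;
}

/* One rounding shift: add half of the discarded range, then shift by 15. */
int64_t round_one_step (int64_t x)
{
  return (x + (INT64_C (1) << 14)) >> 15;
}

/* Count the products in [lo, hi] for which the two forms disagree. */
long count_mismatches (int64_t lo, int64_t hi)
{
  long bad = 0;
  for (int64_t x = lo; x <= hi; x++)
    if (round_two_step (x) != round_one_step (x))
      bad++;
  return bad;
}

/* Keep the low 16 bits and reinterpret them as signed (truncation). */
int16_t wrap_to_s16 (int64_t x)
{
  uint16_t u = (uint16_t) x;
  return (int16_t) (u >= 0x8000 ? (int) u - 0x10000 : (int) u);
}

/* Signed 16-bit version of the rounding pattern: truncating narrow. */
int16_t mulhrs_s16 (int16_t b, int16_t c)
{
  return wrap_to_s16 (round_two_step ((int64_t) b * c));
}

/* Simplified model of a 16-bit SQRDMULH lane (an assumption, see above):
   the result is saturated rather than truncated. */
int16_t sqrdmulh_s16_model (int16_t a, int16_t b)
{
  int64_t r = (2 * (int64_t) a * b + (INT64_C (1) << 15)) >> 16;
  if (r > INT16_MAX)
    r = INT16_MAX;
  if (r < INT16_MIN)
    r = INT16_MIN;
  return (int16_t) r;
}
```

Under these assumptions the two rounding forms agree on every product tried, and the two multiply variants agree on ordinary inputs such as 1000 * 2000. They differ when both inputs are INT16_MIN: the pattern wraps to INT16_MIN, while the SQRDMULH model saturates to INT16_MAX.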
rb11655.patch