Hi,

This patch adds support in the vectorizer for masked fold-left reductions. Passing the loop mask to the reduction itself avoids the need to insert a conditional assignment that replaces the inactive lanes with an identity value.
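
As a rough scalar model of the difference (an illustrative sketch only, not code from the patch; VL and both function names are made up for the illustration, and each inner loop stands in for one vector operation):

```c
#include <assert.h>
#include <stdbool.h>

#define VL 4  /* hypothetical number of lanes per vector */

/* Scheme without a masked reduction: inactive lanes are overwritten
   with the additive identity, then every lane is added in order.  */
double
fold_left_select (const double *x, int n)
{
  assert (n >= 0);
  double res = 0.0;
  for (int i = 0; i < n; i += VL)
    {
      double lanes[VL];
      for (int l = 0; l < VL; l++)
        {
          bool active = i + l < n;             /* loop mask */
          lanes[l] = active ? x[i + l] : 0.0;  /* masked load + select */
        }
      for (int l = 0; l < VL; l++)             /* all-true in-order add */
        res += lanes[l];
    }
  return res;
}

/* Scheme with a masked reduction: the loop mask governs the in-order
   add directly, so no identity vector or select is needed.  */
double
fold_left_masked (const double *x, int n)
{
  assert (n >= 0);
  double res = 0.0;
  for (int i = 0; i < n; i += VL)
    for (int l = 0; l < VL; l++)
      if (i + l < n)                           /* loop mask */
        res += x[i + l];
  return res;
}
```

Both functions add the active elements strictly left to right, so the sketch returns the same value from either one; the second simply does less work per iteration.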

For example, this C code:

double
f (double *restrict x, int n)
{
  double res = 0.0;
  for (int i = 0; i < n; i++)
    {
      res += x[i];
    }
  return res;
}

Produced this for SVE:

0000000000000000 <f>:
   0:	2f00e400 	movi	d0, #0x0
   4:	7100003f 	cmp	w1, #0x0
   8:	5400018d 	b.le	38 <f+0x38>
   c:	d2800002 	mov	x2, #0x0                   	// #0
  10:	93407c21 	sxtw	x1, w1
  14:	25f8c002 	mov	z2.d, #0
  18:	25e11fe0 	whilelo	p0.d, xzr, x1
  1c:	25d8e3e1 	ptrue	p1.d
  20:	a5e24001 	ld1d	{z1.d}, p0/z, [x0, x2, lsl #3]
  24:	04f0e3e2 	incd	x2
  28:	05e2c021 	sel	z1.d, p0, z1.d, z2.d
  2c:	25e11c40 	whilelo	p0.d, x2, x1
  30:	65d82420 	fadda	d0, p1, d0, z1.d
  34:	54ffff61 	b.ne	20 <f+0x20>  // b.any
  38:	d65f03c0 	ret

And now I get this:

0000000000000000 <f>:
   0:	2f00e400 	movi	d0, #0x0
   4:	7100003f 	cmp	w1, #0x0
   8:	5400012d 	b.le	2c <f+0x2c>
   c:	d2800002 	mov	x2, #0x0                   	// #0
  10:	93407c21 	sxtw	x1, w1
  14:	25e11fe0 	whilelo	p0.d, xzr, x1
  18:	a5e24001 	ld1d	{z1.d}, p0/z, [x0, x2, lsl #3]
  1c:	04f0e3e2 	incd	x2
  20:	65d82020 	fadda	d0, p0, d0, z1.d
  24:	25e11c40 	whilelo	p0.d, x2, x1
  28:	54ffff81 	b.ne	18 <f+0x18>  // b.any
  2c:	d65f03c0 	ret

I've added a new test and run the regression testing.

Ok for trunk?

Alejandro

2019-06-12  Alejandro Martinez  <alejandro.martinezvice...@arm.com>

gcc/
	* config/aarch64/aarch64-sve.md (mask_fold_left_plus_<mode>): Renamed
	from "*fold_left_plus_<mode>", updated operands order.
	* doc/md.texi (mask_fold_left_plus_@var{m}): Documented new optab.
	* internal-fn.c (mask_fold_left_direct): New define.
	(expand_mask_fold_left_optab_fn): Likewise.
	(direct_mask_fold_left_optab_supported_p): Likewise.
	* internal-fn.def (MASK_FOLD_LEFT_PLUS): New internal function.
	* optabs.def (mask_fold_left_plus_optab): New optab.
	* tree-vect-loop.c (mask_fold_left_plus_optab): New function to get a
	masked internal_fn for a reduction ifn.
	(vectorize_fold_left_reduction): Add support for masking reductions.

gcc/testsuite/
	* gcc.target/aarch64/sve/fadda_1.c: New test.

mask_fold_left_v3.patch