Hi,

This patch adds support in the vectorizer for masked fold-left reductions.
Passing the loop mask directly to the reduction avoids the need to insert a
conditional assignment that sets the inactive lanes to the identity value.

For example, this C code:

double
f (double *restrict x, int n)
{
  double res = 0.0;
  for (int i = 0; i < n; i++)
    {
      res += x[i];
    }
  return res;
}

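Previously produced this for SVE (note the "mov z2.d, #0" and "sel", which
zero the inactive lanes before an fadda under the all-true predicate p1):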

0000000000000000 <f>:
   0:   2f00e400    movi    d0, #0x0
   4:   7100003f    cmp w1, #0x0
   8:   5400018d    b.le    38 <f+0x38>
   c:   d2800002    mov x2, #0x0                    // #0
  10:   93407c21    sxtw    x1, w1
  14:   25f8c002    mov z2.d, #0
  18:   25e11fe0    whilelo p0.d, xzr, x1
  1c:   25d8e3e1    ptrue   p1.d
  20:   a5e24001    ld1d    {z1.d}, p0/z, [x0, x2, lsl #3]
  24:   04f0e3e2    incd    x2
  28:   05e2c021    sel z1.d, p0, z1.d, z2.d
  2c:   25e11c40    whilelo p0.d, x2, x1
  30:   65d82420    fadda   d0, p1, d0, z1.d
  34:   54ffff61    b.ne    20 <f+0x20>  // b.any
  38:   d65f03c0    ret

With the patch, the fadda is predicated on the loop mask p0 directly, and I
get this:

0000000000000000 <f>:
   0:   2f00e400    movi    d0, #0x0
   4:   7100003f    cmp w1, #0x0
   8:   5400012d    b.le    2c <f+0x2c>
   c:   d2800002    mov x2, #0x0                    // #0
  10:   93407c21    sxtw    x1, w1
  14:   25e11fe0    whilelo p0.d, xzr, x1
  18:   a5e24001    ld1d    {z1.d}, p0/z, [x0, x2, lsl #3]
  1c:   04f0e3e2    incd    x2
  20:   65d82020    fadda   d0, p0, d0, z1.d
  24:   25e11c40    whilelo p0.d, x2, x1
  28:   54ffff81    b.ne    18 <f+0x18>  // b.any
  2c:   d65f03c0    ret
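
To make the difference between the two sequences concrete, here is a scalar
C sketch of both schemes. It is not part of the patch: the names
fold_left_select, fold_left_masked, the vl parameter and MAX_VL are made up
for illustration, and the inner loops only emulate what the vector
instructions do lane by lane.

```c
#include <assert.h>

/* Hypothetical upper bound on the number of lanes, used only to size the
   scratch buffer in this sketch.  */
#define MAX_VL 8

/* Old scheme: inactive lanes are first replaced by the identity 0.0
   (the SEL), then every lane is accumulated in order (the FADDA under an
   all-true predicate).  */
double
fold_left_select (const double *x, int n, int vl)
{
  assert (vl > 0 && vl <= MAX_VL);
  double res = 0.0;
  for (int i = 0; i < n; i += vl)
    {
      double lanes[MAX_VL];
      for (int l = 0; l < vl; l++)
        {
          int active = i + l < n;	/* loop mask, as set by WHILELO */
          lanes[l] = active ? x[i + l] : 0.0;
        }
      for (int l = 0; l < vl; l++)
        res += lanes[l];
    }
  return res;
}

/* New scheme: the loop mask is given to the reduction itself, so inactive
   lanes are skipped and no identity vector is needed.  */
double
fold_left_masked (const double *x, int n, int vl)
{
  assert (vl > 0 && vl <= MAX_VL);
  double res = 0.0;
  for (int i = 0; i < n; i += vl)
    for (int l = 0; l < vl; l++)
      if (i + l < n)
        res += x[i + l];
  return res;
}
```

Both functions add the elements strictly left to right, so for the same
input they should agree with the scalar loop in f; the masked version just
does it without the extra select.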

I've added a new test and run regression testing. OK for trunk?

Alejandro

2019-06-12  Alejandro Martinez  <alejandro.martinezvice...@arm.com>

gcc/
        * config/aarch64/aarch64-sve.md (mask_fold_left_plus_<mode>): Renamed
        from "*fold_left_plus_<mode>", updated operands order.
        * doc/md.texi (mask_fold_left_plus_@var{m}): Documented new optab.
        * internal-fn.c (mask_fold_left_direct): New define.
        (expand_mask_fold_left_optab_fn): Likewise.
        (direct_mask_fold_left_optab_supported_p): Likewise.
        * internal-fn.def (MASK_FOLD_LEFT_PLUS): New internal function.
        * optabs.def (mask_fold_left_plus_optab): New optab.
        * tree-vect-loop.c (mask_fold_left_plus_optab): New function to get a
        masked internal_fn for a reduction ifn.
        (vectorize_fold_left_reduction): Add support for masking reductions.

gcc/testsuite/
        * gcc.target/aarch64/sve/fadda_1.c: New test.

Attachment: mask_fold_left_v3.patch