Hi,

This patch does two things.  For the general vectorizer, it adds support for
fully masked reductions over operations that have no conditional (masked)
form.  This is achieved by using a VEC_COND_EXPR, where possible, to neutralize
the inactive lanes of an operand.  At the moment this is implemented for
DOT_PROD_EXPR only, but the framework is there to extend it to other
expressions.
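
As a rough illustration of the idea (a scalar model only, not the code the
vectorizer generates), the sketch below emulates a fully masked dot-product
loop.  The names VL, dotprod_ref and dotprod_masked_model are hypothetical,
and VL is fixed at 16 lanes purely for the model; a real SVE vector length is
not known at compile time.  Inactive lanes of one input deliberately hold a
nonzero placeholder, so the result is only correct because the select forces
the other operand to zero before the unconditional multiply-accumulate.

```c
#include <stdint.h>

/* Hypothetical lane count, chosen only for this model.  */
#define VL 16

/* Plain scalar reference, equivalent to the dotprod example in this mail.  */
uint32_t
dotprod_ref (const uint8_t *x, const uint8_t *y, int n)
{
  uint32_t sum = 0;
  for (int i = 0; i < n; i++)
    sum += x[i] * y[i];
  return sum;
}

/* Scalar model of a fully masked loop around an unmasked dot product.  */
uint32_t
dotprod_masked_model (const uint8_t *x, const uint8_t *y, int n)
{
  uint32_t sum = 0;
  for (int i = 0; i < n; i += VL)
    for (int lane = 0; lane < VL; lane++)
      {
        /* Loop mask: a lane is active while its index is below n.  */
        int active = (i + lane) < n;

        /* Masked loads never touch memory for inactive lanes.  The 0xAA
           value is an assumption of this model, standing in for lane
           contents the operation must not rely on.  */
        uint8_t a = active ? x[i + lane] : 0xAA;
        uint8_t b = active ? y[i + lane] : 0xAA;

        /* VEC_COND_EXPR-style select: zero one operand in inactive lanes.  */
        uint8_t a_sel = active ? a : 0;

        /* Unconditional multiply-accumulate; inactive lanes add 0 * b.  */
        sum += (uint32_t) a_sel * b;
      }
  return sum;
}
```

Zero works here because a zero operand contributes nothing to the sum; other
operations would need their own neutral value.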

Related to that, this patch adds support for vectorizing dot products with
SVE.  The SVE support uses the new functionality to ensure that the resulting
loop is fully masked.

Given this input code:

uint32_t
dotprod (uint8_t *restrict x, uint8_t *restrict y, int n)
{
  uint32_t sum = 0;

  for (int i = 0; i < n; i++)
    {
      sum += x[i] * y[i];
    }

  return sum;
}

The resulting SVE code is:

0000000000000000 <dotprod>:
   0:   7100005f        cmp     w2, #0x0
   4:   5400024d        b.le    4c <dotprod+0x4c>
   8:   d2800003        mov     x3, #0x0                        // #0
   c:   93407c42        sxtw    x2, w2
  10:   2538c001        mov     z1.b, #0
  14:   25221fe0        whilelo p0.b, xzr, x2
  18:   2538c003        mov     z3.b, #0
  1c:   d503201f        nop
  20:   a4034002        ld1b    {z2.b}, p0/z, [x0, x3]
  24:   a4034020        ld1b    {z0.b}, p0/z, [x1, x3]
  28:   0430e3e3        incb    x3
  2c:   0523c000        sel     z0.b, p0, z0.b, z3.b
  30:   25221c60        whilelo p0.b, x3, x2
  34:   44820401        udot    z1.s, z0.b, z2.b
  38:   54ffff41        b.ne    20 <dotprod+0x20>  // b.any
  3c:   2598e3e0        ptrue   p0.s
  40:   04812021        uaddv   d1, p0, z1.s
  44:   1e260020        fmov    w0, s1
  48:   d65f03c0        ret
  4c:   1e2703e1        fmov    s1, wzr
  50:   1e260020        fmov    w0, s1
  54:   d65f03c0        ret

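Notice how udot is used inside a fully masked loop: the sel instruction zeroes
the inactive lanes of one operand (z3 holds zero) before the unpredicated udot.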

I tested this patch on an aarch64 machine by bootstrapping the compiler and
running the checks.

I admit it is too late to merge this into GCC 9, but I'm posting it anyway so
it can be considered for GCC 10.

Alejandro


gcc/ChangeLog:

2019-01-31  Alejandro Martinez  <alejandro.martinezvice...@arm.com>

        * config/aarch64/aarch64-sve.md (<sur>dot_prod<vsi2qi>): Taken from SVE
        ACLE branch.
        * config/aarch64/iterators.md: Copied Vetype_fourth, VSI2QI and vsi2qi
        from SVE ACLE branch.
        * tree-vect-loop.c (use_mask_by_cond_expr_p): New function to check if a
        VEC_COND_EXPR can be inserted to emulate a conditional internal
        function.
        (build_vect_cond_expr): Emit the VEC_COND_EXPR.
        (vectorizable_reduction): Use the functions above to vectorize, in a
        fully masked loop, codes that don't have a conditional internal
        function.

gcc/testsuite/ChangeLog:
 
2019-01-31  Alejandro Martinez  <alejandro.martinezvice...@arm.com>

        * gcc.target/aarch64/sve/dot_1.c: New test for dot product.

Attachment: dot_v1.patch