Hi,

This patch adds support for vectorizing sum of absolute differences (SAD_EXPR)
using SVE. It also uses the new functionality from the patch below to ensure
that the resulting loop is fully masked, and therefore depends on:

https://gcc.gnu.org/ml/gcc-patches/2019-02/msg00016.html

Given this input code:

#include <stdint.h>

int
sum_abs (uint8_t *restrict x, uint8_t *restrict y, int n)
{
  int sum = 0;

  for (int i = 0; i < n; i++)
    {
      sum += __builtin_abs (x[i] - y[i]);
    }

  return sum;
}
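The source loop promotes x[i] and y[i] to int before subtracting, yet the SVE
code below computes the absolute difference directly on 8-bit lanes with uabd.
That is safe because for unsigned bytes |a - b| equals max(a, b) - min(a, b),
which never exceeds 255 and so always fits in a byte. The following C sketch
checks this exhaustively for all 65536 byte pairs; abd_u8 and
abd_u8_matches_int_abs are illustrative names, not part of the patch.

```c
#include <stdint.h>
#include <stdlib.h>

/* Illustrative helper (not from the patch): absolute difference of two
   unsigned bytes computed as max - min.  The result always fits in
   8 bits, so no widening is needed before the later accumulation.  */
static uint8_t
abd_u8 (uint8_t a, uint8_t b)
{
  uint8_t hi = a > b ? a : b;
  uint8_t lo = a > b ? b : a;
  return (uint8_t) (hi - lo);
}

/* Return 1 if abd_u8 agrees with the promoted-int abs () used by the
   scalar loop for every pair of byte values, 0 otherwise.  */
static int
abd_u8_matches_int_abs (void)
{
  for (int a = 0; a < 256; a++)
    for (int b = 0; b < 256; b++)
      if (abd_u8 ((uint8_t) a, (uint8_t) b) != abs (a - b))
        return 0;
  return 1;
}
```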

The resulting SVE code is:

0000000000000000 <sum_abs>:
   0:   7100005f        cmp     w2, #0x0
   4:   5400026d        b.le    50 <sum_abs+0x50>
   8:   d2800003        mov     x3, #0x0                        // #0
   c:   93407c42        sxtw    x2, w2
  10:   2538c002        mov     z2.b, #0
  14:   25221fe0        whilelo p0.b, xzr, x2
  18:   2538c023        mov     z3.b, #1
  1c:   2518e3e1        ptrue   p1.b
  20:   a4034000        ld1b    {z0.b}, p0/z, [x0, x3]
  24:   a4034021        ld1b    {z1.b}, p0/z, [x1, x3]
  28:   0430e3e3        incb    x3
  2c:   0520c021        sel     z1.b, p0, z1.b, z0.b
  30:   25221c60        whilelo p0.b, x3, x2
  34:   040d0420        uabd    z0.b, p1/m, z0.b, z1.b
  38:   44830402        udot    z2.s, z0.b, z3.b
  3c:   54ffff21        b.ne    20 <sum_abs+0x20>  // b.any
  40:   2598e3e0        ptrue   p0.s
  44:   04812042        uaddv   d2, p0, z2.s
  48:   1e260040        fmov    w0, s2
  4c:   d65f03c0        ret
  50:   1e2703e2        fmov    s2, wzr
  54:   1e260040        fmov    w0, s2
  58:   d65f03c0        ret

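Notice how udot is used inside a fully masked loop. udot has no predicated
form, so correctness comes from the sel instead: it replaces the inactive lanes
of y (z1) with the corresponding lanes of x (z0), so uabd yields zero in those
lanes and they add nothing when udot, multiplying by the all-ones vector z3,
sums each group of four byte differences into a 32-bit lane of z2.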
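To make the masking argument concrete, here is a plain C model of the loop
above. It is a sketch under one stated assumption: a hypothetical 128-bit
vector length (16 byte lanes), whereas real SVE hardware may use any multiple
of 128 bits. VL_BYTES, VL_WORDS and sum_abs_masked_model are illustrative
names, not part of the patch. Each step is commented with the instruction it
stands for.

```c
#include <stdint.h>

/* Assumption: 16 byte lanes per vector (128-bit SVE).  */
#define VL_BYTES 16
#define VL_WORDS (VL_BYTES / 4)

/* Illustrative model of the fully masked SAD loop (not from the patch).  */
static int
sum_abs_masked_model (const uint8_t *x, const uint8_t *y, int n)
{
  uint32_t acc[VL_WORDS] = { 0 };           /* mov z2.b, #0 */

  for (long base = 0; base < n; base += VL_BYTES)
    {
      uint8_t diff[VL_BYTES];
      for (int i = 0; i < VL_BYTES; i++)
        {
          int active = base + i < n;        /* whilelo p0.b */
          /* ld1b ..., p0/z: inactive lanes read as zero and are never
             loaded from memory.  */
          uint8_t a = active ? x[base + i] : 0;
          uint8_t b = active ? y[base + i] : 0;
          b = active ? b : a;               /* sel z1.b, p0, z1.b, z0.b */
          diff[i] = (uint8_t) (a > b ? a - b : b - a);   /* uabd */
        }
      /* udot z2.s, z0.b, z3.b with z3 = all ones: each 32-bit lane
         accumulates four consecutive byte differences.  */
      for (int j = 0; j < VL_WORDS; j++)
        for (int k = 0; k < 4; k++)
          acc[j] += diff[4 * j + k];
    }

  uint64_t total = 0;                       /* uaddv d2, p0, z2.s */
  for (int j = 0; j < VL_WORDS; j++)
    total += acc[j];
  return (int) total;
}
```

Because inactive lanes contribute zero, the model gives the same result as
sum_abs for any n, including lengths that are not a multiple of the vector
length, which is why the generated code needs no scalar epilogue.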

I tested this patch on an aarch64 machine by bootstrapping the compiler and
running the testsuite.

I admit it is too late to merge this into gcc 9, but I'm posting it anyway so
it can be considered for gcc 10.

Alejandro


gcc/ChangeLog:

2019-02-04  Alejandro Martinez  <alejandro.martinezvice...@arm.com>

        * config/aarch64/aarch64-sve.md (<su>abd<mode>_3): New define_expand.
        (aarch64_<su>abd<mode>_3): Likewise.
        (*aarch64_<su>abd<mode>_3): New define_insn.
        (<sur>sad<vsi2qi>): New define_expand.
        * config/aarch64/iterators.md: Add MAX_OPP and max_opp attributes.
        Add USMAX iterator.
        * config/aarch64/predicates.md: Add aarch64_smin and aarch64_umin
        predicates.
        * tree-vect-loop.c (use_mask_by_cond_expr_p): Add SAD_EXPR.
        (build_vect_cond_expr): Likewise.

gcc/testsuite/ChangeLog:
 
2019-02-04  Alejandro Martinez  <alejandro.martinezvice...@arm.com>

        * gcc.target/aarch64/sve/sad_1.c: New test for sum of absolute
        differences.

Attachment: sad_v1.patch
Description: sad_v1.patch