On Mon, Feb 04, 2019 at 07:34:05AM -0600, Alejandro Martinez Vicente wrote:
> Hi,
> 
> This patch adds support to vectorize sum of absolute differences (SAD_EXPR)
> using SVE. It also uses the new functionality to ensure that the resulting
> loop is masked. Therefore, it depends on
> 
> https://gcc.gnu.org/ml/gcc-patches/2019-02/msg00016.html
> 
> Given this input code:
> 
> int
> sum_abs (uint8_t *restrict x, uint8_t *restrict y, int n)
> {
>   int sum = 0;
> 
>   for (int i = 0; i < n; i++)
>     {
>       sum += __builtin_abs (x[i] - y[i]);
>     }
> 
>   return sum;
> }
> 
> The resulting SVE code is:
> 
> 0000000000000000 <sum_abs>:
>    0: 7100005f        cmp     w2, #0x0
>    4: 5400026d        b.le    50 <sum_abs+0x50>
>    8: d2800003        mov     x3, #0x0                        // #0
>    c: 93407c42        sxtw    x2, w2
>   10: 2538c002        mov     z2.b, #0
>   14: 25221fe0        whilelo p0.b, xzr, x2
>   18: 2538c023        mov     z3.b, #1
>   1c: 2518e3e1        ptrue   p1.b
>   20: a4034000        ld1b    {z0.b}, p0/z, [x0, x3]
>   24: a4034021        ld1b    {z1.b}, p0/z, [x1, x3]
>   28: 0430e3e3        incb    x3
>   2c: 0520c021        sel     z1.b, p0, z1.b, z0.b
>   30: 25221c60        whilelo p0.b, x3, x2
>   34: 040d0420        uabd    z0.b, p1/m, z0.b, z1.b
>   38: 44830402        udot    z2.s, z0.b, z3.b
>   3c: 54ffff21        b.ne    20 <sum_abs+0x20>  // b.any
>   40: 2598e3e0        ptrue   p0.s
>   44: 04812042        uaddv   d2, p0, z2.s
>   48: 1e260040        fmov    w0, s2
>   4c: d65f03c0        ret
>   50: 1e2703e2        fmov    s2, wzr
>   54: 1e260040        fmov    w0, s2
>   58: d65f03c0        ret
> 
> Notice how udot is used inside a fully masked loop.
> 
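For readers following the listing, here is a rough scalar model of what the
masked loop appears to compute. It is only a sketch of my reading of the
disassembly above, not code from the patch: the name sum_abs_model and the
fixed VL of 16 bytes are assumptions made for illustration (real SVE vector
lengths vary by implementation). The point is that the sel copies x into the
inactive lanes of y, so those lanes contribute a zero difference and the
unpredicated udot can safely accumulate the whole vector.

```c
#include <stdint.h>

enum { VL = 16 };  /* assumed vector length in bytes, for illustration only */

/* Hypothetical scalar model of the masked SVE loop shown above.  */
int
sum_abs_model (const uint8_t *x, const uint8_t *y, int n)
{
  uint32_t acc[VL / 4] = { 0 };           /* models the z2.s accumulator */

  for (int base = 0; base < n; base += VL)
    {
      uint8_t diff[VL];

      for (int lane = 0; lane < VL; lane++)
        {
          int active = base + lane < n;              /* whilelo p0.b */
          uint8_t a = active ? x[base + lane] : 0;   /* ld1b, p0/z */
          uint8_t b = active ? y[base + lane] : 0;   /* ld1b, p0/z */

          b = active ? b : a;                        /* sel: inactive b = a */
          diff[lane] = a > b ? a - b : b - a;        /* uabd */
        }

      /* udot with an all-ones vector: each 32-bit lane accumulates the
         sum of its four byte-sized differences.  */
      for (int lane = 0; lane < VL; lane++)
        acc[lane / 4] += diff[lane] * 1u;
    }

  uint32_t sum = 0;
  for (int i = 0; i < VL / 4; i++)        /* uaddv: reduce across lanes */
    sum += acc[i];
  return (int) sum;
}
```

For n <= 0 the model returns 0 without touching memory, which matches the
early-exit path at offset 0x50 in the listing.
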
> I tested this patch on an aarch64 machine by bootstrapping the compiler and
> running the checks.

This doesn't give us much confidence in SVE coverage, unless you have been
running in an environment that uses SVE by default. Do you have a set of
workloads you could test the compiler against to ensure correct operation
of the SVE vectorization?
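
As a starting point, something along these lines might serve as a small
run-time check. This is only a sketch: sum_abs is copied from your
description, while sum_abs_ref and count_mismatches are hypothetical helpers
I made up for illustration. It would need to be built with SVE enabled (for
example -O3 -march=armv8.2-a+sve) and run on SVE hardware or an emulator to
say anything about the new pattern, and it is no substitute for real
workloads.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Function under test, copied from the patch description.  */
int
sum_abs (uint8_t *restrict x, uint8_t *restrict y, int n)
{
  int sum = 0;

  for (int i = 0; i < n; i++)
    {
      sum += __builtin_abs (x[i] - y[i]);
    }

  return sum;
}

/* Hypothetical reference: the volatile accumulator is there to keep
   this loop from being vectorized.  */
static int
sum_abs_ref (const uint8_t *x, const uint8_t *y, int n)
{
  volatile int sum = 0;

  for (int i = 0; i < n; i++)
    sum += abs (x[i] - y[i]);

  return sum;
}

/* Hypothetical helper: compare both versions for every length in
   [0, max_n] and return how many lengths disagree.  Sweeping all
   lengths exercises partial final vectors for any vector length.  */
int
count_mismatches (int max_n)
{
  uint8_t *x = malloc (max_n + 1);
  uint8_t *y = malloc (max_n + 1);
  int bad = 0;

  assert (x && y);
  for (int i = 0; i < max_n; i++)
    {
      x[i] = (uint8_t) (i * 37 + 11);
      y[i] = (uint8_t) (i * 101 + 200);
    }

  for (int n = 0; n <= max_n; n++)
    if (sum_abs (x, y, n) != sum_abs_ref (x, y, n))
      bad++;

  free (x);
  free (y);
  return bad;
}
```

Calling count_mismatches (600) covers lengths beyond two full vectors even
at the largest architectural SVE vector length of 256 bytes.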

> 
> I admit it is too late to merge this into gcc 9, but I'm posting it anyway so
> it can be considered for gcc 10.

Richard Sandiford has the call on whether this patch is OK for trunk now or
GCC 10. With the minimal testing it has had, I'd be uncomfortable with it as
a GCC 9 patch. That said, it is a fairly self-contained pattern for the
compiler and it would be good to see this optimization in GCC 9.

> 
> Alejandro
> 
> 
> gcc/ChangeLog:
> 
> 2019-02-04  Alejandro Martinez  <alejandro.martinezvice...@arm.com>
> 
>       * config/aarch64/aarch64-sve.md (<su>abd<mode>_3): New define_expand.
>       (aarch64_<su>abd<mode>_3): Likewise.
>       (*aarch64_<su>abd<mode>_3): New define_insn.
>       (<sur>sad<vsi2qi>): New define_expand.
>       * config/aarch64/iterators.md: Added MAX_OPP and max_opp attributes.
>       Added USMAX iterator.
>       * config/aarch64/predicates.md: Added aarch64_smin and aarch64_umin
>       predicates.
>       * tree-vect-loop.c (use_mask_by_cond_expr_p): Add SAD_EXPR.
>       (build_vect_cond_expr): Likewise.
> 
> gcc/testsuite/ChangeLog:
>  
> 2019-02-04  Alejandro Martinez  <alejandro.martinezvice...@arm.com>
> 
>       * gcc.target/aarch64/sve/sad_1.c: New test for sum of absolute
>       differences.

