On Mon, Feb 04, 2019 at 07:34:05AM -0600, Alejandro Martinez Vicente wrote:
> Hi,
>
> This patch adds support to vectorize sum of absolute differences (SAD_EXPR)
> using SVE. It also uses the new functionality to ensure that the resulting
> loop is masked. Therefore, it depends on
>
> https://gcc.gnu.org/ml/gcc-patches/2019-02/msg00016.html
>
> Given this input code:
>
> int
> sum_abs (uint8_t *restrict x, uint8_t *restrict y, int n)
> {
>   int sum = 0;
>
>   for (int i = 0; i < n; i++)
>     {
>       sum += __builtin_abs (x[i] - y[i]);
>     }
>
>   return sum;
> }
>
> The resulting SVE code is:
>
> 0000000000000000 <sum_abs>:
>    0:   7100005f    cmp     w2, #0x0
>    4:   5400026d    b.le    50 <sum_abs+0x50>
>    8:   d2800003    mov     x3, #0x0                        // #0
>    c:   93407c42    sxtw    x2, w2
>   10:   2538c002    mov     z2.b, #0
>   14:   25221fe0    whilelo p0.b, xzr, x2
>   18:   2538c023    mov     z3.b, #1
>   1c:   2518e3e1    ptrue   p1.b
>   20:   a4034000    ld1b    {z0.b}, p0/z, [x0, x3]
>   24:   a4034021    ld1b    {z1.b}, p0/z, [x1, x3]
>   28:   0430e3e3    incb    x3
>   2c:   0520c021    sel     z1.b, p0, z1.b, z0.b
>   30:   25221c60    whilelo p0.b, x3, x2
>   34:   040d0420    uabd    z0.b, p1/m, z0.b, z1.b
>   38:   44830402    udot    z2.s, z0.b, z3.b
>   3c:   54ffff21    b.ne    20 <sum_abs+0x20>  // b.any
>   40:   2598e3e0    ptrue   p0.s
>   44:   04812042    uaddv   d2, p0, z2.s
>   48:   1e260040    fmov    w0, s2
>   4c:   d65f03c0    ret
>   50:   1e2703e2    fmov    s2, wzr
>   54:   1e260040    fmov    w0, s2
>   58:   d65f03c0    ret
>
> Notice how udot is used inside a fully masked loop.
>
> I tested this patch in an aarch64 machine bootstrapping the compiler and
> running the checks.

This doesn't give us much confidence in SVE coverage; unless you have been
running in an environment using SVE by default? Do you have some set of
workloads you could test the compiler against to ensure correct operation
of the SVE vectorization?

>
> I admit it is too late to merge this into gcc 9, but I'm posting it anyway so
> it can be considered for gcc 10.

Richard Sandiford has the call on whether this patch is OK for trunk now or
GCC 10. With the minimal testing it has had, I'd be uncomfortable with it as
a GCC 9 patch. That said, it is a fairly self-contained pattern for the
compiler and it would be good to see this optimization in GCC 9.

>
> Alejandro
>
>
> gcc/Changelog:
>
> 2019-02-04  Alejandro Martinez  <alejandro.martinezvice...@arm.com>
>
>         * config/aarch64/aarch64-sve.md (<su>abd<mode>_3): New define_expand.
>         (aarch64_<su>abd<mode>_3): Likewise.
>         (*aarch64_<su>abd<mode>_3): New define_insn.
>         (<sur>sad<vsi2qi>): New define_expand.
>         * config/aarch64/iterators.md: Added MAX_OPP and max_opp attributes.
>         Added USMAX iterator.
>         * config/aarch64/predicates.md: Added aarch64_smin and aarch64_umin
>         predicates.
>         * tree-vect-loop.c (use_mask_by_cond_expr_p): Add SAD_EXPR.
>         (build_vect_cond_expr): Likewise.
>
> gcc/testsuite/Changelog:
>
> 2019-02-04  Alejandro Martinez  <alejandro.martinezvice...@arm.com>
>
>         * gcc.target/aarch64/sve/sad_1.c: New test for sum of absolute
>         differences.
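
On the testing question above, a run-time self-check along the following
lines might be one way to gain some confidence in the vectorized loop. This is
only a sketch under stated assumptions, not part of the patch: sum_abs_ref,
check_sum_abs, the buffer size N and the input pattern are all hypothetical
names and values chosen for illustration, and the noinline attribute on
sum_abs is added here only to keep the call from being folded away. It would
exercise the SVE code only if built with SVE enabled (for example
-O3 -march=armv8.2-a+sve) and run on SVE hardware or an emulator.

```c
#include <stdint.h>

/* Hypothetical size; deliberately not a multiple of any SVE vector length
   so that the final, partially masked iteration is exercised.  */
#define N 1027

/* Function under test, as in the quoted message.  The noinline attribute
   is an addition made for this sketch.  */
int __attribute__ ((noinline))
sum_abs (uint8_t *restrict x, uint8_t *restrict y, int n)
{
  int sum = 0;

  for (int i = 0; i < n; i++)
    {
      sum += __builtin_abs (x[i] - y[i]);
    }

  return sum;
}

/* Hypothetical scalar reference.  The volatile accumulator is there to
   keep the compiler from vectorizing this loop as well.  */
int __attribute__ ((noinline))
sum_abs_ref (const uint8_t *x, const uint8_t *y, int n)
{
  volatile int sum = 0;

  for (int i = 0; i < n; i++)
    {
      int d = x[i] - y[i];
      sum += d < 0 ? -d : d;
    }

  return sum;
}

/* Compare both versions for every length from 0 to N.
   Returns 0 if they always agree, 1 on the first mismatch.  */
int
check_sum_abs (void)
{
  static uint8_t x[N], y[N];

  for (int i = 0; i < N; i++)
    {
      x[i] = (uint8_t) (i * 7 + 3);
      y[i] = (uint8_t) (255 - i * 13);
    }

  for (int n = 0; n <= N; n++)
    if (sum_abs (x, y, n) != sum_abs_ref (x, y, n))
      return 1;

  return 0;
}
```

A main that calls check_sum_abs () and aborts on a nonzero result would turn
this into a run test. Sweeping n over every length up to N is meant to cover
the empty loop, lengths shorter than one vector, and lengths that leave a
partial final predicate.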