Hi Bill,

On Sun, Nov 05, 2017 at 06:25:11PM -0600, Bill Schmidt wrote:
> This patch adds support for vectorization of unsigned SAD expressions.  SAD
> vectorization uses the usad<mode> pattern to represent a widening accumulation
> of SADs performed on a narrower type.  The two cases in this patch operate
> on V16QImode and V8HImode, respectively, accumulating into V4SImode.  A
> vectorized loop on SAD operations will use these patterns in the main loop
> body and perform a final reduction to sum the 4 accumulated results in the
> V4SImode accumulator during the loop epilogue.
>
> POWER's sum-across ops (vsum4ubs and vsum4shs) unfortunately have saturating
> semantics, so they can only be used for the sum-across; the accumulation
> with previous iteration results requires a separate add.
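
For readers following along, the V16QI case described above can be modelled
in plain C.  This is only a rough sketch of the semantics as I read them from
the description, not code from the patch: the function names (sad_scalar,
usadv16qi_model, sad_vector_model) are made up for illustration, and the lane
layout (four consecutive bytes feeding each 32-bit word) is an assumption
about how the sum-across groups its inputs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar reference: the kind of loop a SAD reduction represents.  */
uint32_t sad_scalar(const uint8_t *a, const uint8_t *b, size_t n)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        int d = (int) a[i] - (int) b[i];
        sum += (uint32_t) (d < 0 ? -d : d);
    }
    return sum;
}

/* Hypothetical model of one usadv16qi step: absolute differences of 16
   bytes, summed four at a time into four 32-bit partial sums, then added
   to the running accumulator with a separate modulo add.  */
void usadv16qi_model(uint32_t acc[4], const uint8_t a[16], const uint8_t b[16])
{
    for (int lane = 0; lane < 4; lane++) {
        /* Sum-across with a zero addend: at most 4 * 255 = 1020, so the
           saturating semantics can never trigger here.  */
        uint32_t psum = 0;
        for (int j = 0; j < 4; j++) {
            int d = (int) a[4 * lane + j] - (int) b[4 * lane + j];
            psum += (uint32_t) (d < 0 ? -d : d);
        }
        /* Separate wrapping add into the accumulator (the vadduwm step).  */
        acc[lane] += psum;
    }
}

/* Main loop plus epilogue: accumulate per 16-byte block, then reduce the
   four accumulated words to a single result.  Assumes n is a multiple
   of 16; a real vectorized loop would also need a scalar tail.  */
uint32_t sad_vector_model(const uint8_t *a, const uint8_t *b, size_t n)
{
    assert(n % 16 == 0);
    uint32_t acc[4] = { 0, 0, 0, 0 };
    for (size_t i = 0; i < n; i += 16)
        usadv16qi_model(acc, a + i, b + i);
    return acc[0] + acc[1] + acc[2] + acc[3];
}
```

Feeding the sum-across a zero addend and doing the accumulation as a separate
add keeps the running total modulo 2^32, which is what the scalar loop
computes; folding the accumulator into the saturating op would clamp instead
of wrap.
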
> @@ -4184,6 +4184,51 @@
>    "vbpermd %0,%1,%2"
>    [(set_attr "type" "vecsimple")])
>
> +;; Support for SAD (sum of absolute differences).
> +
> +;; Due to saturating semantics, we can't combine the sum-across
> +;; with the vector accumulate in vsum4ubs.  A vadduwm is needed.
> +(define_expand "usadv16qi"
> +  [(use (match_operand:V4SI 0 "register_operand"))
> +   (use (match_operand:V16QI 1 "register_operand"))
> +   (use (match_operand:V16QI 2 "register_operand"))
> +   (use (match_operand:V4SI 3 "register_operand"))]
> +  "TARGET_P9_VECTOR"
> +  "
> +{
> +  rtx absd = gen_reg_rtx (V16QImode);
> +  rtx zero = gen_reg_rtx (V4SImode);
> +  rtx psum = gen_reg_rtx (V4SImode);
> +
> +  emit_insn (gen_p9_vaduv16qi3 (absd, operands[1], operands[2]));
> +  emit_insn (gen_altivec_vspltisw (zero, const0_rtx));
> +  emit_insn (gen_altivec_vsum4ubs (psum, absd, zero));
> +  emit_insn (gen_addv4si3 (operands[0], psum, operands[3]));
> +  DONE;
> +}")

No quotes around the {} block please (twice).

Other than that, looks fine to me, please commit.  Thanks,


Segher