[Bug target/117722] RISC-V: Failed to vectorize x264_pixel_sad_4x4

rdapp at gcc dot gnu.org via Gcc-bugs Wed, 11 Dec 2024 04:16:12 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117722


--- Comment #18 from Robin Dapp <rdapp at gcc dot gnu.org> ---
> But the point here really is that we don't need the widening semantics, let
> alone twice. The min+max+sub in loops with a final reducing sum should do the
> trick.

OK, I guess it can be argued that
  minus (max (a, b), min (a, b))
is generally a preferable "absdiff" expansion over
  max (d, neg (d)), where d = minus (convert (a), convert (b)).
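To make the comparison concrete, here is a scalar C model of both expansions on 8-bit unsigned elements. It only checks the arithmetic, not what the vectorizer actually emits, and the function names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Model of "minus (max (a, b), min (a, b))": everything stays in the
   8-bit element type, so no widening op is needed.  */
static uint8_t absdiff_minmax (uint8_t a, uint8_t b)
{
  uint8_t mx = a > b ? a : b;   /* umax */
  uint8_t mn = a < b ? a : b;   /* umin */
  assert (mx >= mn);            /* so the minus below never goes negative */
  return (uint8_t) (mx - mn);   /* minus */
}

/* Model of the widening variant: widen both inputs, subtract, then take
   max (d, neg (d)), i.e. abs (d), in the 16-bit type.  */
static uint16_t absdiff_widen (uint8_t a, uint8_t b)
{
  int16_t d = (int16_t) ((int16_t) a - (int16_t) b);  /* widening minus */
  int16_t nd = (int16_t) -d;                          /* neg */
  return (uint16_t) (d > nd ? d : nd);                /* max */
}

/* Compare both expansions over all 8-bit input pairs and return the
   number of pairs on which they disagree.  */
static unsigned absdiff_mismatches (void)
{
  unsigned mismatches = 0;
  for (unsigned a = 0; a < 256; a++)
    for (unsigned b = 0; b < 256; b++)
      if (absdiff_minmax ((uint8_t) a, (uint8_t) b)
          != absdiff_widen ((uint8_t) a, (uint8_t) b))
        mismatches++;
  return mismatches;
}
```

Both give the same value for every input pair; the difference is only which ops, and which element widths, the vector code needs.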

The former is neither better nor worse if a two-operand widening minus is
available and as fast as a regular minus.  When widening sub and add are
slower, the former is always better.

So if we want to get rid of the widening ops, I'd say defining [us]abd<mode>3
expanders is the way to go.  On our uarch widening ops are not slower than
regular ops, so it wouldn't matter there, but since uarchs differ we could
still do it.
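The signed variant can plausibly use the same shape with smax/smin.  Below is a scalar C sketch on 8-bit elements. It assumes the sabd result is the absolute difference read as an unsigned value of the same width, so |127 - (-128)| = 255 comes out as 0xff. The names sabd_minmax and sabd_mismatches are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a signed absolute difference on 8-bit elements
   built from smax/smin/minus.  The subtraction wraps modulo 256, like an
   8-bit vector minus would.  */
static uint8_t sabd_minmax (int8_t a, int8_t b)
{
  int8_t mx = a > b ? a : b;                          /* smax */
  int8_t mn = a < b ? a : b;                          /* smin */
  assert (mx >= mn);
  return (uint8_t) ((uint8_t) mx - (uint8_t) mn);    /* wrapping minus */
}

/* Compare against the exact |a - b| computed in int, for all pairs.  */
static unsigned sabd_mismatches (void)
{
  unsigned mismatches = 0;
  for (int a = -128; a < 128; a++)
    for (int b = -128; b < 128; b++)
      {
        int exact = a > b ? a - b : b - a;   /* 0 .. 255 */
        if (sabd_minmax ((int8_t) a, (int8_t) b) != (uint8_t) exact)
          mismatches++;
      }
  return mismatches;
}
```

The true difference max - min always lies in 0..255, so the wrapped 8-bit result is exactly the unsigned absolute difference and no widening is needed for the signed case either.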

Something like (untested):
(define_expand "uabd<mode>3"
  [(match_operand:V_VLSI 0 "register_operand")
   (match_operand:V_VLSI 1 "register_operand")
   (match_operand:V_VLSI 2 "register_operand")]
  "TARGET_VECTOR"
{
  /* uabd (a, b) = umax (a, b) - umin (a, b), all in the element mode,
     so no widening op is needed.  */
  rtx mx = gen_reg_rtx (<MODE>mode);
  rtx mn = gen_reg_rtx (<MODE>mode);
  emit_insn (gen_rtx_SET (mx, gen_rtx_UMAX (<MODE>mode, operands[1], operands[2])));
  emit_insn (gen_rtx_SET (mn, gen_rtx_UMIN (<MODE>mode, operands[1], operands[2])));
  emit_insn (gen_rtx_SET (operands[0], gen_rtx_MINUS (<MODE>mode, mx, mn)));
  DONE;
})
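For the SAD kernel, the min+max+sub scheme quoted above would amount to the following scalar C sketch: per-element umax/umin/minus in 8 bits, then a final reducing sum in a wider type.  The function name, parameter names and types are hypothetical and only loosely modeled on a 4x4 SAD; this is not x264's code:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 4x4 SAD written the way a uabd-style expansion would
   lower it: 8-bit absdiff per element, 32-bit reducing sum at the end.  */
static int sad_4x4_minmax (const uint8_t *pix1, ptrdiff_t stride1,
                           const uint8_t *pix2, ptrdiff_t stride2)
{
  int sum = 0;                                        /* reduction accumulator */
  for (int y = 0; y < 4; y++, pix1 += stride1, pix2 += stride2)
    for (int x = 0; x < 4; x++)
      {
        uint8_t mx = pix1[x] > pix2[x] ? pix1[x] : pix2[x];  /* umax */
        uint8_t mn = pix1[x] < pix2[x] ? pix1[x] : pix2[x];  /* umin */
        assert (mx >= mn);
        sum += (uint8_t) (mx - mn);                           /* minus + sum */
      }
  return sum;
}
```

Only the final accumulation needs a wider type (at most 16 * 255 = 4080 here); the per-element work stays at the input width.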
