[Bug target/105075] [nvptx] Generate sad insn (sum of absolute differences)

rguenth at gcc dot gnu.org via Gcc-bugs Mon, 28 Mar 2022 03:15:46 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105075


--- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> ---
@cindex @code{ssad@var{m}} instruction pattern
@item @samp{ssad@var{m}}
@cindex @code{usad@var{m}} instruction pattern
@item @samp{usad@var{m}}
Compute the sum of absolute differences of two signed/unsigned elements.
Operand 1 and operand 2 are of the same mode. Their absolute difference, which
is of a wider mode, is computed and added to operand 3. Operand 3 is of a mode
equal or wider than the mode of the absolute difference. The result is placed
in operand 0, which is of the same mode as operand 3.


That crucially "misses" a detail for the vector case, where the sum will
also sum across (unspecified!) lanes when operand 3 is wider than the
absolute difference and has fewer lanes than the input vectors.

The unspecified part makes it a hard fit for pattern matching (unrolled)
code when the actual output lanes are used rather than being reduced to
a single scalar in the end.
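
To make the lane problem concrete, here is a minimal sketch in plain C. It
assumes 8 unsigned 8-bit input lanes and a 2-lane 32-bit accumulator; the
function names and both lane mappings are hypothetical and chosen only for
illustration, they are not taken from any target. Both functions are
consistent with the documentation quoted above, yet they place the partial
sums in different output lanes.

```c
#include <stdint.h>
#include <stdlib.h>

#define IN_LANES  8   /* assumed: 8 x 8-bit input lanes */
#define OUT_LANES 2   /* assumed: 2 x 32-bit accumulator lanes */

/* Hypothetical mapping A: adjacent input lanes feed the same output lane,
   i.e. input lanes 0..3 go to lane 0 and input lanes 4..7 go to lane 1. */
static void usad_adjacent(const uint8_t a[IN_LANES], const uint8_t b[IN_LANES],
                          const uint32_t acc[OUT_LANES], uint32_t out[OUT_LANES])
{
    for (int k = 0; k < OUT_LANES; k++)
        out[k] = acc[k];
    for (int i = 0; i < IN_LANES; i++)
        out[i / (IN_LANES / OUT_LANES)] += (uint32_t)abs(a[i] - b[i]);
}

/* Hypothetical mapping B: even input lanes feed lane 0, odd ones lane 1. */
static void usad_interleaved(const uint8_t a[IN_LANES], const uint8_t b[IN_LANES],
                             const uint32_t acc[OUT_LANES], uint32_t out[OUT_LANES])
{
    for (int k = 0; k < OUT_LANES; k++)
        out[k] = acc[k];
    for (int i = 0; i < IN_LANES; i++)
        out[i % OUT_LANES] += (uint32_t)abs(a[i] - b[i]);
}

/* Final reduction of the accumulator lanes to a single scalar. */
static uint32_t reduce_lanes(const uint32_t v[OUT_LANES])
{
    uint32_t sum = 0;
    for (int k = 0; k < OUT_LANES; k++)
        sum += v[k];
    return sum;
}
```

The individual output lanes differ between the two mappings, while the
value after reduce_lanes is the same. That is why the ambiguity is
harmless for a full reduction but matters as soon as the lanes themselves
are consumed.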

For scalar instruction matching the patterns should be usable.
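
For the scalar case there are no lanes, so the documented semantics are
unambiguous. A rough sketch, assuming 8-bit unsigned inputs and a 32-bit
accumulator (usad_u8 and sad_loop are hypothetical names, and whether a
given GCC version matches such a loop to the optab is not something this
sketch claims):

```c
#include <stddef.h>
#include <stdint.h>

/* One scalar usad step: operand 0 = |operand 1 - operand 2| + operand 3.
   The absolute difference is formed in a wider type so it cannot wrap. */
static uint32_t usad_u8(uint8_t op1, uint8_t op2, uint32_t op3)
{
    uint16_t absdiff = (uint16_t)(op1 > op2 ? op1 - op2 : op2 - op1);
    return op3 + absdiff;
}

/* The kind of source loop such a step would accumulate for. */
static uint32_t sad_loop(const uint8_t *a, const uint8_t *b, size_t n)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum = usad_u8(a[i], b[i], sum);
    return sum;
}
```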

Note the SAD_EXPR on GENERIC has the same issue when vector types are
used - the exact semantics are unspecified.

The same is true for DOT_PROD_EXPR and WIDEN_SUM_EXPR and a bunch of others.

These days we'd go for matching them to direct internal function calls
using the {u,s}sad optabs, and I don't see any reason not to allow scalar
modes for them.  I'd rather get rid of all the tree codes we have for
vectorizer reduction patterns in favor of those, so if you can avoid
introducing new ones or adding more uses of existing ones that would be nice.
