https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105075
--- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> ---

@cindex @code{ssad@var{m}} instruction pattern
@item @samp{ssad@var{m}}
@cindex @code{usad@var{m}} instruction pattern
@item @samp{usad@var{m}}
Compute the sum of absolute differences of two signed/unsigned elements. Operand 1 and operand 2 are of the same mode. Their absolute difference, which is of a wider mode, is computed and added to operand 3. Operand 3 is of a mode equal or wider than the mode of the absolute difference. The result is placed in operand 0, which is of the same mode as operand 3.

That crucially "misses" a detail for the vector case: the sum will also sum across (unspecified!) lanes when operand 3 is wider than the absolute difference and has fewer lanes than the input vectors. The unspecified part makes it a poor fit for pattern matching (unrolled) code when the actual output lanes are used and are not reduced to a single scalar in the end. For scalar instruction matching the patterns should be usable.

Note that SAD_EXPR on GENERIC has the same issue when vector types are used: the exact semantics are unspecified. The same is true for DOT_PROD_EXPR, WIDEN_SUM_EXPR and a bunch of others.

These days we'd go for matching them to direct internal function calls using the {u,s}sad optabs, and I don't see any reason not to allow scalar modes for them. I'd rather get rid of all the tree codes we have for vectorizer reduction patterns in favor of those, so if you can avoid introducing new ones or adding more uses of existing ones, that would be nice.
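To make the lane ambiguity concrete, here is a minimal C sketch, assuming a vector usad that takes 16 x uint8 inputs and a 4 x uint32 accumulator. It models the scalar semantics from the quoted md.texi text and two hypothetical vector lowerings; the function names and both lane groupings are illustrative assumptions, not anything GCC or a particular target defines. (x86 PSADBW is one real instance of the "blocked" style: it sums eight byte differences into each 64-bit lane.) Both groupings fit the documented description, yet they give different per-lane results and agree only once all lanes are reduced to one scalar.

```c
#include <stdint.h>
#include <stdlib.h>

/* Scalar usad as described in md.texi: |a - b| is computed in a wider
   mode and added to the accumulator (operand 3). */
static uint32_t usad_scalar(uint8_t a, uint8_t b, uint32_t acc)
{
  /* Promote before subtracting so the difference cannot wrap. */
  int32_t diff = (int32_t)a - (int32_t)b;
  return acc + (uint32_t)abs(diff);
}

/* Hypothetical lowering 1: output lane i sums input lanes 4i .. 4i+3. */
static void usad_v16qi_blocked(const uint8_t a[16], const uint8_t b[16],
                               const uint32_t acc[4], uint32_t out[4])
{
  for (int i = 0; i < 4; i++) {
    out[i] = acc[i];
    for (int j = 0; j < 4; j++)
      out[i] = usad_scalar(a[4 * i + j], b[4 * i + j], out[i]);
  }
}

/* Hypothetical lowering 2: output lane i sums input lanes i, i+4, i+8, i+12.
   Equally consistent with the md.texi wording, since the lane mapping is
   unspecified. */
static void usad_v16qi_strided(const uint8_t a[16], const uint8_t b[16],
                               const uint32_t acc[4], uint32_t out[4])
{
  for (int i = 0; i < 4; i++) {
    out[i] = acc[i];
    for (int j = 0; j < 4; j++)
      out[i] = usad_scalar(a[i + 4 * j], b[i + 4 * j], out[i]);
  }
}

/* Final reduction to a single scalar: the only use that is well defined
   regardless of which lowering the target picked. */
static uint32_t reduce_sum4(const uint32_t v[4])
{
  return v[0] + v[1] + v[2] + v[3];
}
```

For example, with a[i] = 17*i, b[i] = 255 - 3*i and a zero accumulator, lane 0 is 900 under the blocked grouping and 540 under the strided one, while reduce_sum4 yields 1830 for both. Code that consumes lane 0 directly would therefore depend on a target choice the pattern does not specify; code that only uses the final reduction would not.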