https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109156
--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Tamar Christina from comment #0)
> 2. It looks like all targets that implement SAD do so with an instruction
> that does ABD and then performs a reduction.  So it looks like no target has
> the semantics for SAD.

x86, for example, does SAD on 16 QImode data and 4 SImode accumulators, which
means each accumulator sums 4 QImode absolute differences (but SAD_EXPR leaves
unspecified which ones, so SAD_EXPR is only usable when you also sum the
accumulator lanes in the end).  So I don't think 2. is true.

> So this brings up the question of why the detection wasn't done based on ABD
> instead and leaving the reduction explicit in the vectorizer.
>
> So question is, should we create a completely new standalone pattern for ABD
> or should we make ABD the thing being detected and change SAD_EXPR to
> recognize ABD + reduction.
>
> Removing SAD completely in favor of ABD + reduction means that hand
> optimized versions in targets need updating so I'm in favor of still
> emitting SAD.

I'd do a separate internal function for ABD, possibly sharing part of the
detection as you proposed.
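
To illustrate the point about unspecified lane assignment, here is a minimal
scalar sketch in C.  It is only a model under stated assumptions: the function
names and the two lane groupings (contiguous and interleaved) are hypothetical
and are not GCC internals or the behaviour of any particular instruction.  It
shows that two legal groupings give different per-lane accumulators, but the
same total once the lanes are summed.

```c
#include <stddef.h>
#include <stdint.h>

#define NBYTES 16 /* QImode inputs per vector (assumed, as in the x86 example) */
#define NLANES 4  /* SImode accumulator lanes (assumed) */

/* ABD: element-wise absolute difference, one result per input byte. */
static void abd_u8(const uint8_t *a, const uint8_t *b, uint8_t *out)
{
    for (size_t i = 0; i < NBYTES; i++)
        out[i] = (uint8_t)(a[i] > b[i] ? a[i] - b[i] : b[i] - a[i]);
}

/* Hypothetical grouping A: lane j accumulates bytes 4j .. 4j+3. */
static void sad_contiguous(const uint8_t *a, const uint8_t *b, uint32_t *acc)
{
    uint8_t d[NBYTES];
    abd_u8(a, b, d);
    for (size_t i = 0; i < NBYTES; i++)
        acc[i / (NBYTES / NLANES)] += d[i];
}

/* Hypothetical grouping B: lane j accumulates bytes j, j+4, j+8, j+12. */
static void sad_interleaved(const uint8_t *a, const uint8_t *b, uint32_t *acc)
{
    uint8_t d[NBYTES];
    abd_u8(a, b, d);
    for (size_t i = 0; i < NBYTES; i++)
        acc[i % NLANES] += d[i];
}

/* Final reduction: the only result that both groupings agree on. */
static uint32_t reduce_lanes(const uint32_t *acc)
{
    uint32_t sum = 0;
    for (size_t j = 0; j < NLANES; j++)
        sum += acc[j];
    return sum;
}
```

A caller that reads individual lanes of the accumulator would see different
values under the two groupings; only `reduce_lanes` gives a grouping-independent
answer.  A standalone ABD operation, by contrast, corresponds to `abd_u8` alone,
with any reduction left explicit.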