https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109156

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Tamar Christina from comment #0)
> 2. It looks like all targets that implement SAD do so with an instruction
> that does ABD and then perform a reduction.  So it looks like no target has
> the semantics for SAD.

x86, for example, does SAD on 16 QImode data elements and 4 SImode
accumulators, which means each accumulator sums 4 QImode absolute
differences (but SAD_EXPR leaves unspecified which ones, so SAD_EXPR is
only usable when you also sum the accumulator lanes in the end).
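
To make the lane point concrete, here is a rough scalar model in C. The
names sad_step and reduce_lanes are made up for this sketch, and the
grouping "four consecutive bytes per lane" is an arbitrary choice of the
sketch, not what any particular instruction does. The point is only that
the per-lane values depend on the grouping while the final total does not.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical scalar model of one SAD step: 16 unsigned 8-bit inputs
   are folded into 4 unsigned 32-bit accumulators.  Which differences
   land in which lane is an assumption of this sketch (i / 4).  */
static void sad_step(const uint8_t a[16], const uint8_t b[16],
                     uint32_t acc[4])
{
    for (int i = 0; i < 16; i++) {
        uint32_t d = (uint32_t)abs((int)a[i] - (int)b[i]);
        acc[i / 4] += d;
    }
}

/* The final cross-lane sum; only this value is independent of how the
   differences were distributed over the lanes.  */
static uint32_t reduce_lanes(const uint32_t acc[4])
{
    return acc[0] + acc[1] + acc[2] + acc[3];
}
```

Changing acc[i / 4] to, say, acc[i % 4] changes the individual lanes but
not the result of reduce_lanes, which is why a consumer can only rely on
the reduced value.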

So I don't think 2. is true.

> So this brings up the question of why the detection wasn't done based on ABD
> instead and leaving the reduction explicit in the vectorizer.
> 
> So question is, should we create a completely new standalone pattern for ABD
> or should we make ABD the thing being detected and change SAD_EXPR to
> recognize ABD + reduction.
> 
> Removing SAD completely in favor of ABD + reduction means that hand
> optimized versions in targets need updating so I'm in favor of still
> emitting SAD.

I'd do a separate internal function for ABD, possibly sharing part of the
detection as you proposed.
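
For illustration only, a scalar C sketch of what "ABD + reduction" means
as two separate steps. The helper names abd_u8 and widen_sum_u8 are
hypothetical and are not GCC internal functions or optabs; the sketch
assumes unsigned 8-bit inputs and a 32-bit accumulator.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical element-wise absolute difference: the result has the
   same width as the inputs and no summation happens here.  */
static void abd_u8(const uint8_t *a, const uint8_t *b, uint8_t *out,
                   size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] > b[i] ? (uint8_t)(a[i] - b[i])
                             : (uint8_t)(b[i] - a[i]);
}

/* Separate widening reduction over the ABD results.  */
static uint32_t widen_sum_u8(const uint8_t *v, size_t n)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += v[i];
    return sum;
}
```

Composing the two gives the same total as a fused SAD followed by a sum
of its accumulator lanes, while abd_u8 on its own is also usable where
no reduction is wanted.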
