https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120846

--- Comment #3 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #1)
> I guess the testcase assumes that the qi->si case gets an intermediate
> qi->hi promotion and then dotprod_hisi being used.  But it fails to check
> for the ability to do qi->hi promotion.  Is that what your target is missing?
> 
> OTOH I didn't check what the aarch64 codegen does but I understood Tamar
> that aarch64 only has hisi dotprod, not qisi.

aarch64 has both qisi and hisi dotprod when in SVE streaming mode with sme2
enabled.

That is:
```
int sdot2(int n, short* data) __arm_streaming {
  int sum = 0;
  for (int i=0; i<n; i+=1) {
    sum += data[i] * data[i];
  }
  return sum;
}
```
With `-march=armv9-a+sme2` added, GCC can autovectorize this using:
`sdot    z30.s, z27.h, z28.h`

Otherwise the aarch64 backend normally has just the qisi dotprod.
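
For comparison, a minimal sketch of what the qisi case would look like. It is the same loop with byte elements. This is an illustration and is not taken from the testcase: the name `sdot_qi` is hypothetical, and I have not checked the exact codegen. The assumption is that with something like `-march=armv8.2-a+dotprod` the vectorizer can use the byte form of `sdot` here, without needing streaming mode.
```c
/* Hypothetical qisi counterpart of sdot2 above: signed char (QI)
   elements are multiplied and accumulated into an int (SI) sum.  */
int sdot_qi(int n, const signed char *data) {
  int sum = 0;
  for (int i = 0; i < n; i += 1) {
    /* Both operands are promoted to int before the multiply.  */
    sum += data[i] * data[i];
  }
  return sum;
}
```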
