https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120846
--- Comment #3 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #1)
> I guess the testcase assumes that the qi->si case gets an intermediate
> qi->hi promotion and then dotprod_hisi being used. But it fails to check
> for the ability to do qi->hi promotion. Is that what your target is missing?
>
> OTOH I didn't check what the aarch64 codegen does but I understood Tamar
> that aarch64 only has hisi dotprod, not qisi.

aarch64 has both qisi and hisi dotprod when in SVE streaming mode with SME2
enabled. That is:
```
int sdot2(int n, short* data) __arm_streaming
{
  int sum = 0;
  for (int i = 0; i < n; i += 1) {
    sum += data[i] * data[i];
  }
  return sum;
}
```

With `-march=armv9-a+sme2` added, GCC can autovectorize this using:
        sdot    z30.s, z27.h, z28.h

Otherwise the aarch64 backend normally has just qisi.
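
For reference, the two reduction shapes being discussed can be sketched in plain C as below: "qisi" multiplies 8-bit (QI) inputs and accumulates into a 32-bit (SI) sum, while "hisi" does the same with 16-bit (HI) inputs. The function names `dot_qisi` and `dot_hisi` are hypothetical and only illustrate the loop shapes; whether either loop is actually turned into a dot-product instruction depends on the target and flags, and this sketch makes no claim about that.

```c
/* "qisi" shape: 8-bit (QI) inputs, 32-bit (SI) accumulator.
   Hypothetical name, for illustration only. */
int dot_qisi(int n, const signed char *a, const signed char *b)
{
  int sum = 0;
  for (int i = 0; i < n; i += 1)
    sum += a[i] * b[i];   /* operands promote to int before the multiply */
  return sum;
}

/* "hisi" shape: 16-bit (HI) inputs, 32-bit (SI) accumulator.
   Hypothetical name, for illustration only. */
int dot_hisi(int n, const short *a, const short *b)
{
  int sum = 0;
  for (int i = 0; i < n; i += 1)
    sum += a[i] * b[i];   /* operands promote to int before the multiply */
  return sum;
}
```

The `sdot2` testcase above is the hisi shape with both operands taken from the same array.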