https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101939
--- Comment #3 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Your inline-asm is incorrect too.
It should be:
asm(
"vpaddsw %[tmp0], %[tmp1], %[tmp0]\n\t"
"vpmaddwd %[tmp0], %[ones], %[tmp0]\n\t"
"vpaddd %[acc], %[tmp0], %[acc]\n\t"
: [acc]"+v"(acc), [tmp0]"+&v"(tmp0)
: [tmp1]"v"(tmp1), [ones]"v"(_mm256_set1_epi16(1))
);
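This is because the asm still writes to tmp0, so tmp0 has to be a read-write
("+") operand. It also needs the early-clobber ("&") modifier, since it is
written before the [ones] input is read for the last time.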
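
For anyone who wants to check the constraints, here is a minimal sketch that wraps the asm above in a function and puts a plain-intrinsics version next to it. The function names accumulate_asm and accumulate_ref, the array-based interface and the target("avx2") attribute are assumptions made for this example only; they are not taken from the report.

```c
#include <immintrin.h>
#include <stdint.h>

/* Hypothetical wrapper (not from the report):
   acc += pairwise_sum(saturating_add(a, b)), using the corrected asm. */
__attribute__((target("avx2")))
void accumulate_asm(int32_t acc_io[8], const int16_t a[16], const int16_t b[16])
{
    __m256i acc  = _mm256_loadu_si256((const __m256i *)acc_io);
    __m256i tmp0 = _mm256_loadu_si256((const __m256i *)a);
    __m256i tmp1 = _mm256_loadu_si256((const __m256i *)b);
    __m256i ones = _mm256_set1_epi16(1);

    /* __asm__ is the spelling of asm that is also accepted under -std=c99. */
    __asm__(
        "vpaddsw %[tmp0], %[tmp1], %[tmp0]\n\t"  /* tmp0 = adds_epi16(tmp1, tmp0) */
        "vpmaddwd %[tmp0], %[ones], %[tmp0]\n\t" /* tmp0 = madd_epi16(ones, tmp0) */
        "vpaddd %[acc], %[tmp0], %[acc]\n\t"     /* acc  = add_epi32(tmp0, acc)   */
        : [acc]"+v"(acc), [tmp0]"+&v"(tmp0)      /* both are read and written     */
        : [tmp1]"v"(tmp1), [ones]"v"(ones)       /* read-only inputs              */
    );

    _mm256_storeu_si256((__m256i *)acc_io, acc);
}

/* Same computation written with intrinsics, for comparison. */
__attribute__((target("avx2")))
void accumulate_ref(int32_t acc_io[8], const int16_t a[16], const int16_t b[16])
{
    __m256i acc  = _mm256_loadu_si256((const __m256i *)acc_io);
    __m256i tmp0 = _mm256_loadu_si256((const __m256i *)a);
    __m256i tmp1 = _mm256_loadu_si256((const __m256i *)b);
    __m256i ones = _mm256_set1_epi16(1);

    tmp0 = _mm256_adds_epi16(tmp0, tmp1);
    tmp0 = _mm256_madd_epi16(tmp0, ones);
    acc  = _mm256_add_epi32(acc, tmp0);

    _mm256_storeu_si256((__m256i *)acc_io, acc);
}
```

Both functions should give the same result on a CPU with AVX2. If tmp0 were listed only as an input, the compiler would be free to assume its register still holds the original value after the asm, or to share that register with another input.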
