https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101939
--- Comment #3 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Your inline-asm is incorrect too.
It should be:
    asm(
        "vpaddsw %[tmp0], %[tmp1], %[tmp0]\n\t"
        "vpmaddwd %[tmp0], %[ones], %[tmp0]\n\t"
        "vpaddd %[acc], %[tmp0], %[acc]\n\t"
        : [acc]"+v"(acc), [tmp0]"+&v"(tmp0)
        : [tmp1]"v"(tmp1), [ones]"v"(_mm256_set1_epi16(1))
    );

Because you write to tmp0 still.
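
The point about tmp0 comes down to the "+" constraint modifier: an operand that the asm both reads and writes must be declared as an output with "+", not as a plain input. A minimal scalar sketch of the same rule follows. It is an illustration only, not the AVX2 code from this report; the function name add_in_place is hypothetical, and the plain C fallback is there only so the snippet also builds on non-x86 targets.

```c
/* Hypothetical helper: adds x to acc using GCC extended asm on x86. */
static int add_in_place(int acc, int x)
{
#if defined(__x86_64__) || defined(__i386__)
    /* AT&T syntax: the destination is the last operand.
     * acc is read and then overwritten by addl, so it is listed as an
     * output with "+r". x is only read, so it is a plain "r" input.
     * addl modifies the flags, hence the "cc" clobber. */
    __asm__("addl %[x], %[acc]"
            : [acc] "+r"(acc)
            : [x] "r"(x)
            : "cc");
#else
    /* Fallback for other architectures (assumption: same semantics). */
    acc += x;
#endif
    return acc;
}
```

Calling add_in_place(40, 2) should return 42. If acc were listed only as an input, the compiler would be free to assume its register still held the old value after the asm. The extra "&" in "+&v" for tmp0 marks the operand as early-clobber, which tells the compiler not to assign it the same register as an input operand.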