https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68780
--- Comment #1 from James Almer <jamrial at gmail dot com> ---
What I assume you want is _mm256_mullo_epi32(a, b), which maps to the vpmulld instruction (Multiply the packed 32-bit integers in a and b, producing intermediate 64-bit integers, and store the low 32 bits of the intermediate integers in dst). For your testcase it would result in eight 32-bit integers with value 2.

_mm256_mul_epi32(a, b) maps to vpmuldq (Multiply the low 32-bit integers from each packed 64-bit element in a and b, and store the signed 64-bit results in dst). For your testcase it correctly gives four 64-bit integers with value 2, so the output you are seeing is the expected behaviour of that intrinsic.
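
To make the difference concrete, here is a rough scalar model of the two operations in portable C. It is only a sketch of the lane semantics quoted above, not what GCC emits. The names model_mullo_epi32, model_mul_epi32 and model_self_check are made up for illustration, and the inputs (all lanes 1 times all lanes 2) are an assumption, since the original testcase is not reproduced in this comment.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of _mm256_mullo_epi32 (vpmulld): eight independent 32-bit
   lanes, each result keeps only the low 32 bits of the product. */
void model_mullo_epi32(const int32_t a[8], const int32_t b[8], int32_t dst[8])
{
    for (int i = 0; i < 8; i++) {
        /* Unsigned multiply wraps modulo 2^32, i.e. yields the low half. */
        uint32_t low = (uint32_t)a[i] * (uint32_t)b[i];
        dst[i] = (int32_t)low;
    }
}

/* Scalar model of _mm256_mul_epi32 (vpmuldq): the register is viewed as
   four 64-bit elements; only the low 32 bits of each element (the
   even-numbered 32-bit lanes) are multiplied, giving signed 64-bit results.
   The odd-numbered 32-bit lanes are ignored. */
void model_mul_epi32(const int32_t a[8], const int32_t b[8], int64_t dst[4])
{
    for (int i = 0; i < 4; i++)
        dst[i] = (int64_t)a[2 * i] * (int64_t)b[2 * i];
}

/* Assumed inputs: every 32-bit lane of a is 1, every lane of b is 2. */
void model_self_check(void)
{
    int32_t a[8], b[8], lo[8];
    int64_t wide[4];

    for (int i = 0; i < 8; i++) {
        a[i] = 1;
        b[i] = 2;
    }

    model_mullo_epi32(a, b, lo);
    model_mul_epi32(a, b, wide);

    for (int i = 0; i < 8; i++)
        assert(lo[i] == 2);   /* eight 32-bit results */
    for (int i = 0; i < 4; i++)
        assert(wide[i] == 2); /* four 64-bit results */
}
```

Under those assumed inputs, reading the _mm256_mul_epi32 result back as eight 32-bit values would show 2 in the even positions and 0 in the odd ones, because each 64-bit result of 2 has a zero upper half. That may be what looked like a wrong result.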