https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68780

--- Comment #1 from James Almer <jamrial at gmail dot com> ---
What I assume you want is _mm256_mullo_epi32(a, b), which maps to the vpmulld
instruction (Multiply the packed 32-bit integers in a and b, producing
intermediate 64-bit integers, and store the low 32 bits of the intermediate
integers in dst). For your testcase that would result in eight 32-bit integers
with value 2.

_mm256_mul_epi32(a, b) maps to vpmuldq (Multiply the low 32-bit integers from
each packed 64-bit element in a and b, and store the signed 64-bit results in
dst), which for your testcase correctly gives four 64-bit integers with value
2.
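
To make the difference concrete, here is a rough scalar sketch of what the two
instructions compute per lane. It is only a model for illustration, not the
real intrinsics: the names model_mullo_epi32 and model_mul_epi32 are
hypothetical, the 256-bit registers are represented as plain arrays of eight
32-bit lanes in little-endian lane order, and the inputs in the usage note
(all lanes 1 times all lanes 2) are an assumption, since the original testcase
is not quoted here.

```c
#include <stdint.h>

/* Scalar model of vpmulld (_mm256_mullo_epi32): eight independent 32-bit
 * lanes; each lane keeps only the low 32 bits of its 64-bit product. */
static void model_mullo_epi32(const int32_t a[8], const int32_t b[8],
                              int32_t dst[8])
{
    for (int i = 0; i < 8; i++) {
        int64_t wide = (int64_t)a[i] * (int64_t)b[i];
        /* Truncate to the low 32 bits. The unsigned-to-signed conversion is
         * implementation-defined in C; GCC wraps modulo 2^32. */
        dst[i] = (int32_t)(uint32_t)wide;
    }
}

/* Scalar model of vpmuldq (_mm256_mul_epi32): the inputs are treated as four
 * 64-bit elements; only the low 32 bits of each element (the even-indexed
 * 32-bit lanes) are read, sign-extended, and multiplied to a full signed
 * 64-bit result. The odd-indexed 32-bit lanes are ignored. */
static void model_mul_epi32(const int32_t a[8], const int32_t b[8],
                            int64_t dst[4])
{
    for (int i = 0; i < 4; i++)
        dst[i] = (int64_t)a[2 * i] * (int64_t)b[2 * i];
}
```

With every 32-bit lane of a set to 1 and every lane of b set to 2 (assumed
inputs), the first model fills eight 32-bit results with 2 and the second fills
four 64-bit results with 2, which matches the behaviour described above. The
two only diverge in value once a product no longer fits in 32 bits.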
