https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99754
Bug ID: 99754
Summary: [sse2] new _mm_loadu_si16 and _mm_loadu_si32
implemented incorrectly
Product: gcc
Version: 11.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: [email protected]
Target Milestone: ---
Created attachment 50470
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50470&action=edit
Trivial patch
_mm_loadu_si16 and _mm_loadu_si32 were implemented in GCC 11, but incorrectly.
The value pointed to by the argument is supposed to go in the first (lowest)
element, but _mm_set_epi16 / _mm_set_epi32 take their arguments from the
highest element down to the lowest, so in GCC the value ends up in the *last*
element.
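
To illustrate the argument order, here is a minimal sketch that uses only long-standing SSE2 intrinsics, so it does not depend on the GCC 11 header. The helper names (lanes16, value_as_first_arg, value_as_last_arg, check_placement) are made up for this example and are not part of any header.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: copy the eight 16-bit elements of v into out[0..7],
   where out[0] is the first (lowest) element. */
static void lanes16(__m128i v, uint16_t out[8])
{
    _mm_storeu_si128((__m128i *)out, v);
}

/* Value passed as the FIRST argument of _mm_set_epi16. */
static __m128i value_as_first_arg(uint16_t x)
{
    return _mm_set_epi16((short)x, 0, 0, 0, 0, 0, 0, 0);
}

/* Value passed as the LAST argument of _mm_set_epi16. */
static __m128i value_as_last_arg(uint16_t x)
{
    return _mm_set_epi16(0, 0, 0, 0, 0, 0, 0, (short)x);
}

/* Checks where the value lands in each case. */
static void check_placement(void)
{
    uint16_t first[8], last[8];

    lanes16(value_as_first_arg(0x1234), first);
    lanes16(value_as_last_arg(0x1234), last);

    /* First argument -> highest element (index 7). */
    assert(first[7] == 0x1234 && first[0] == 0);
    /* Last argument -> lowest element (index 0). */
    assert(last[0] == 0x1234 && last[7] == 0);
}
```

Calling check_placement() should pass on any SSE2 target: the first argument of _mm_set_epi16 lands in element 7, and only the last argument lands in element 0.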
The most straightforward solution would be to change the _mm_set_* calls so the
input is used for the last argument instead of the first (patch attached).
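
For reference, a rough sketch of what that change amounts to. This is not the attached patch and not the actual emmintrin.h code; sketch_loadu_si16, sketch_loadu_si32 and check_loads are hypothetical stand-ins, and memcpy is assumed here purely as a portable way to express the unaligned load.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <string.h>

/* Hypothetical stand-in for _mm_loadu_si16: the loaded value is passed as
   the LAST argument, so it lands in the first (lowest) 16-bit element. */
static __m128i sketch_loadu_si16(const void *p)
{
    short x;
    memcpy(&x, p, sizeof x);  /* unaligned 16-bit load */
    return _mm_set_epi16(0, 0, 0, 0, 0, 0, 0, x);
}

/* Hypothetical stand-in for _mm_loadu_si32, same idea with 32-bit elements. */
static __m128i sketch_loadu_si32(const void *p)
{
    int x;
    memcpy(&x, p, sizeof x);  /* unaligned 32-bit load */
    return _mm_set_epi32(0, 0, 0, x);
}

/* Checks that the loaded value shows up in the low element. */
static void check_loads(void)
{
    /* buf + 1 is deliberately misaligned; x86 is little-endian. */
    unsigned char buf[5] = { 0x00, 0x78, 0x56, 0x34, 0x12 };

    /* _mm_cvtsi128_si32 returns the low 32 bits of the vector. */
    assert(_mm_cvtsi128_si32(sketch_loadu_si32(buf + 1)) == 0x12345678);
    assert(_mm_cvtsi128_si32(sketch_loadu_si16(buf + 1)) == 0x5678);
}
```

With the value in the last argument position, reading back the low element with _mm_cvtsi128_si32 returns the loaded value, which is the behavior described above for the first element.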
FWIW, here is LLVM's implementation:
<https://github.com/llvm/llvm-project/blob/a76d0207d5f94af698525d7dc1f0953ed35901a6/clang/lib/Headers/emmintrin.h#L1670-L1710>.
I've verified that LLVM's implementation matches ICC's.