https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103611
--- Comment #4 from John Platts <john_platts at hotmail dot com> ---
(In reply to Andrew Pinski from comment #3)
> Hmm, GCC 4.8.1-5.5.0 produces:
> long long SSE2ExtractInt64<0>(long long __vector):
> .LFB499:
>         .cfi_startproc
>         pshufd  xmm1, xmm0, 1
>         movd    eax, xmm0
>         movd    edx, xmm1
>         ret
> long long SSE2ExtractInt64<1>(long long __vector):
> .LFB500:
>         .cfi_startproc
>         pshufd  xmm1, xmm0, 3
>         pshufd  xmm0, xmm0, 2
>         movd    edx, xmm1
>         movd    eax, xmm0
>         ret
>
> For the code in comment #0.
> And always used memory for code in comment #2.

I have noticed that the suboptimal code generation for the code in comment #0
and comment #1 does not happen with GCC 5.5 or earlier, but it does happen with
GCC 6.1 or later.
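
For readers without comment #0 at hand, here is a minimal sketch of a function that would map onto the instruction sequence quoted above (one pshufd per 32-bit half, then movd). It is an assumption for illustration only, not the code from comment #0: the body, the `Lane` parameter name, and the choice of `_mm_shuffle_epi32` and `_mm_cvtsi128_si32` are hypothetical. Only the name `SSE2ExtractInt64` and the lane indices 0 and 1 come from the quoted output. The edx:eax return in that output suggests a 32-bit x86 target, which is why the sketch assembles the result from two 32-bit halves.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cstdint>

// Hypothetical reconstruction for illustration; NOT the code from comment #0.
// Extracts 64-bit lane `Lane` (0 or 1) of `v` as two 32-bit halves.
template <int Lane>
inline long long SSE2ExtractInt64(__m128i v) {
    static_assert(Lane == 0 || Lane == 1, "Lane must be 0 or 1");

    // Move each 32-bit half of the requested lane into element 0 (pshufd).
    const __m128i lo_vec = _mm_shuffle_epi32(v, Lane * 2);
    const __m128i hi_vec = _mm_shuffle_epi32(v, Lane * 2 + 1);

    // Read element 0 of each shuffled vector (movd).
    const std::uint32_t lo =
        static_cast<std::uint32_t>(_mm_cvtsi128_si32(lo_vec));
    const std::uint32_t hi =
        static_cast<std::uint32_t>(_mm_cvtsi128_si32(hi_vec));

    // Combine the halves into one 64-bit value.
    return static_cast<long long>(
        (static_cast<std::uint64_t>(hi) << 32) | lo);
}
```

Compiling something of this shape with -O2 -m32 -msse2 -masm=intel on GCC 5.5 and on GCC 6.1 or later would be one way to compare the register-only sequence quoted above against the newer output; whether this exact body reproduces the regression is not confirmed here.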