https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98167
--- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> ---
So
__m128d f(__m128d a, __m128d b) {
return _mm_mul_pd(_mm_shuffle_pd(a, a, 0), _mm_shuffle_pd(b, b, 0));
}
is expanded as
_3 = VEC_PERM_EXPR <b_2(D), b_2(D), { 0, 0 }>;
_5 = VEC_PERM_EXPR <a_4(D), a_4(D), { 0, 0 }>;
_6 = _3 * _5;
return _6;
but vector lowering's ssa_uniform_vector_p doesn't yet handle VEC_PERM_EXPRs
with an all-zero permute.  Hacking that in (without fixing the fallout) produces
<bb 2> [local count: 1073741824]:
_7 = BIT_FIELD_REF <b_2(D), 64, 0>;
_8 = BIT_FIELD_REF <a_4(D), 64, 0>;
_9 = _7 * _8;
_6 = {_9, _9};
and
f:
.LFB534:
.cfi_startproc
mulsd %xmm1, %xmm0
unpcklpd %xmm0, %xmm0
ret