https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98167
--- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> ---
So
__m128d f(__m128d a, __m128d b) {
return _mm_mul_pd(_mm_shuffle_pd(a, a, 0), _mm_shuffle_pd(b, b, 0));
}
is expanded as
_3 = VEC_PERM_EXPR <b_2(D), b_2(D), { 0, 0 }>;
_5 = VEC_PERM_EXPR <a_4(D), a_4(D), { 0, 0 }>;
_6 = _3 * _5;
return _6;
but vector lowering's ssa_uniform_vector_p doesn't yet handle VEC_PERM_EXPRs
with an all-zero permute.  Hacking that in (without fixing the fallout) produces
<bb 2> [local count: 1073741824]:
_7 = BIT_FIELD_REF <b_2(D), 64, 0>;
_8 = BIT_FIELD_REF <a_4(D), 64, 0>;
_9 = _7 * _8;
_6 = {_9, _9};
and
f:
.LFB534:
.cfi_startproc
mulsd %xmm1, %xmm0
unpcklpd %xmm0, %xmm0
ret