[Bug target/54174] Missed optimization: Unnecessary vmovaps generated for __builtin_ia32_vextractf128_ps256(v, 0)

crazylht at gmail dot com via Gcc-bugs Mon, 23 Aug 2021 04:29:36 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54174


--- Comment #3 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Richard Biener from comment #1)
> That's more likely a register allocator issue.

Yes. LRA (GCC's Local Register Allocator) assigns registers from back to front,
so changing the source code as below eliminates the redundant mov.

typedef float v4sf __attribute__ ((vector_size (4*4)));
typedef float v8sf __attribute__ ((vector_size (4*8)));

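v4sf add(v8sf v)
{
  /* Extract the high 128-bit half (index 1) first and the low half
     (index 0) second; per the explanation above, this ordering lets
     LRA avoid the extra vmovaps.  Build with -mavx.  */
  v4sf b = __builtin_ia32_vextractf128_ps256(v, 1);
  v4sf a = __builtin_ia32_vextractf128_ps256(v, 0);
  return a + b;
}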
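For comparison, the sketch below puts the original ordering (index 0 extracted first, the form named in the bug title) next to the reordered workaround and checks that both compute the same value. The names add_orig, add_reordered and sum_halves are hypothetical. The target("avx") attribute is an assumption used only so the file builds without -mavx; whether the vmovaps actually disappears depends on the GCC version and can be checked by comparing the output of `gcc -O2 -S` for the two functions.

```c
#include <string.h>

typedef float v4sf __attribute__ ((vector_size (4*4)));
typedef float v8sf __attribute__ ((vector_size (4*8)));

/* Original ordering: low half (index 0) extracted first.
   This is the form reported to get the extra vmovaps.  */
__attribute__ ((target ("avx")))
static v4sf add_orig (v8sf v)
{
  v4sf a = __builtin_ia32_vextractf128_ps256 (v, 0);
  v4sf b = __builtin_ia32_vextractf128_ps256 (v, 1);
  return a + b;
}

/* Reordered workaround: high half (index 1) extracted first.  */
__attribute__ ((target ("avx")))
static v4sf add_reordered (v8sf v)
{
  v4sf b = __builtin_ia32_vextractf128_ps256 (v, 1);
  v4sf a = __builtin_ia32_vextractf128_ps256 (v, 0);
  return a + b;
}

/* Hypothetical wrapper taking plain arrays, so that callers built
   without AVX never pass a 256-bit vector by value (that would cross
   an AVX/non-AVX calling-convention boundary).  */
__attribute__ ((target ("avx")))
void sum_halves (const float in[8], float out_orig[4], float out_reord[4])
{
  v8sf v;
  memcpy (&v, in, sizeof v);
  v4sf r1 = add_orig (v);
  v4sf r2 = add_reordered (v);
  memcpy (out_orig, &r1, sizeof r1);
  memcpy (out_reord, &r2, sizeof r2);
}
```

Both functions return element i as v[i] + v[i + 4]; only the order of the two extracts differs, which is what affects LRA's allocation order.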
