https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54174

--- Comment #3 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Richard Biener from comment #1)
> That's more likely a register allocator issue.

Yes, LRA allocates registers from back to front, which means that changing the
source code as shown below will eliminate the redundant mov.

typedef float v4sf __attribute__ ((vector_size (4*4)));
typedef float v8sf __attribute__ ((vector_size (4*8)));

v4sf add(v8sf v)
{
  v4sf b = __builtin_ia32_vextractf128_ps256(v, 1);
  v4sf a = __builtin_ia32_vextractf128_ps256(v, 0);
  return a + b;
}
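
For comparison, here is a rough portable sketch of the two orderings side by
side. It assumes the original testcase extracted the low half first, which is
not shown in this comment. The helpers low_half/high_half and the names
add_low_first/add_high_first are hypothetical stand-ins for the builtin, written
with generic vector subscripting so the snippet builds without -mavx (GCC may
emit a -Wpsabi note on x86 in that case). It only shows that the two orderings
compute the same value; whether the extra mov appears still has to be checked
in the -O2 -mavx assembly of the builtin version.

```c
typedef float v4sf __attribute__ ((vector_size (4*4)));
typedef float v8sf __attribute__ ((vector_size (4*8)));

/* Hypothetical stand-in for __builtin_ia32_vextractf128_ps256(v, 0). */
static v4sf low_half(v8sf v)
{
  return (v4sf){ v[0], v[1], v[2], v[3] };
}

/* Hypothetical stand-in for __builtin_ia32_vextractf128_ps256(v, 1). */
static v4sf high_half(v8sf v)
{
  return (v4sf){ v[4], v[5], v[6], v[7] };
}

/* Assumed original ordering: low half extracted first. */
v4sf add_low_first(v8sf v)
{
  v4sf a = low_half(v);
  v4sf b = high_half(v);
  return a + b;
}

/* Ordering from this comment: high half extracted first. */
v4sf add_high_first(v8sf v)
{
  v4sf b = high_half(v);
  v4sf a = low_half(v);
  return a + b;
}
```

Both functions return the same vector for any input, so the reordering is purely
a hint to the allocator and not a semantic change.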
