https://gcc.gnu.org/bugzilla/show_bug.cgi?id=28831
--- Comment #44 from rguenther at suse dot de <rguenther at suse dot de> ---
On Thu, 23 May 2024, mkretz at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=28831
>
> --- Comment #43 from Matthias Kretz (Vir) <mkretz at gcc dot gnu.org> ---
> I see this issue in SIMD programming. Example (on x86_64 with only '-O2', i.e.
> without AVX512) https://compiler-explorer.com/z/K64djP356:
>
> typedef int V __attribute__((vector_size(64)));
>
> V gen();
>
> void g0(V const&, V const&);
> void g1(V, V);
>
> void constref()
> {
>   g0(gen(), gen());
> }
>
> void byvalue()
> {
>   g1(gen(), gen());
> }
>
> Both the 'constref' and 'byvalue' cases copy every V argument before calling
> g0/g1.

The copy on GIMPLE is due to IL constraints:

  _10 = gen ();

  <bb 4> :
  D.2805 = _10;
  g0 (&D.2805, &D.2806);

when the call has a register type return value the LHS of the call
statement has to be a register (SSA name).  But the argument to
g0 has to be memory, so we get the extra copy.  Now, w/o AVX512
that "register" doesn't work out and we allocate it to memory causing
a memory-to-memory copy.  That's also because in vector lowering we
do not lower those register-to-memory stores (doing that would possibly
help a bit, as would more clever expansion of the copy or more clever
expanding of _10)
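
For anyone who wants to reproduce the dump locally, here is a self-contained sketch of the test case from comment #43. The bodies of gen/g0/g1, the 'sink' variable and the noinline attributes are assumptions added here so that the file links and the calls stay visible; the original only declares these functions. Compiling with something like 'g++ -O2 -fdump-tree-optimized' should show the SSA name on the LHS of each gen () call followed by the store into the addressable temporary, though the exact SSA names and D.NNNN numbers will differ. GCC may also emit a -Wpsabi note about 64-byte vectors when AVX512 is not enabled.

```cpp
// Vector type from the original test case: 16 ints, 64 bytes.
typedef int V __attribute__((vector_size(64)));

// Hypothetical sink so the callees have an observable effect.
int sink = 0;

// Hypothetical definition; noinline keeps the call and its
// register-type return value visible in the dump.
__attribute__((noinline)) V gen()
{
  V v = {};
  for (int i = 0; i < 16; ++i)
    v[i] = i;               // lanes hold 0..15
  return v;
}

// Takes its arguments by reference, so the caller must
// materialize each gen() result in memory.
__attribute__((noinline)) void g0(V const& a, V const& b)
{
  V s = a + b;              // lane i holds 2*i
  sink = s[0] + s[15];
}

// Takes its arguments by value.
__attribute__((noinline)) void g1(V a, V b)
{
  V s = a + b;              // lane i holds 2*i
  sink = s[1] + s[15];
}

void constref()
{
  g0(gen(), gen());
}

void byvalue()
{
  g1(gen(), gen());
}
```

The definitions are deliberately trivial; only the call sites in constref() and byvalue() matter for the copy discussed above.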