https://gcc.gnu.org/bugzilla/show_bug.cgi?id=28831

--- Comment #44 from rguenther at suse dot de <rguenther at suse dot de> ---
On Thu, 23 May 2024, mkretz at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=28831
> 
> --- Comment #43 from Matthias Kretz (Vir) <mkretz at gcc dot gnu.org> ---
> I see this issue in SIMD programming. Example (on x86_64 with only '-O2', i.e.
> without AVX512) https://compiler-explorer.com/z/K64djP356:
> 
> typedef int V __attribute__((vector_size(64)));
> 
> V gen();
> 
> void g0(V const&, V const&);
> void g1(V, V);
> 
> void constref()
> {
>   g0(gen(), gen());
> }
> 
> void byvalue()
> {
>   g1(gen(), gen());
> }
> 
> Both the 'constref' and 'byvalue' cases copy every V argument before calling
> g0/g1.

The copy on GIMPLE is due to IL constraints:

  _10 = gen ();

  <bb 4> :
  D.2805 = _10;
  g0 (&D.2805, &D.2806);

When the call returns a value of register type, the LHS of the call
statement has to be a register (an SSA name).  But the argument to
g0 has to be in memory, so we get the extra copy.  Now, without AVX512
there is no hard register that can hold that "register", so we allocate
it to memory, which turns the copy into a memory-to-memory copy.

That's also because vector lowering does not lower those
register-to-memory stores (doing that would possibly help a bit,
as would a more clever expansion of the copy or a more clever
expansion of _10).
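
To illustrate what lowering such a register-to-memory store could mean, here
is a rough source-level sketch, not what GCC does today and not GCC code.
It assumes the 64-byte V from the testcase is split into four 16-byte pieces
that fit SSE registers; the names V4 and store_piecewise are hypothetical
and exist only for this example.

```cpp
#include <cstring>

typedef int V  __attribute__((vector_size(64)));  // as in the testcase
typedef int V4 __attribute__((vector_size(16)));  // hypothetical: one SSE-sized piece

// Hypothetical helper (not part of GCC): store a 64-byte vector value to
// memory as four 16-byte pieces, instead of one 64-byte store that has no
// matching hard register without AVX512.
inline void store_piecewise(V *dst, const V &src)
{
  const char *from = reinterpret_cast<const char *>(&src);
  char *to = reinterpret_cast<char *>(dst);
  for (int i = 0; i < 4; ++i)
    {
      V4 piece;  // small enough to live in a single SSE register
      std::memcpy(&piece, from + 16 * i, sizeof piece);
      std::memcpy(to + 16 * i, &piece, sizeof piece);
    }
}
```

Whether a lowering along these lines would actually remove the
memory-to-memory copy depends on what expansion makes of _10, so treat this
only as a picture of the idea.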
