inefficient spilling of AVX2 ymm registers

rguenth at gcc dot gnu.org via Gcc-bugs Tue, 06 Apr 2021 01:25:31 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99912


Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rguenth at gcc dot gnu.org
           Keywords|                            |missed-optimization
             Target|                            |x86_64-*-*

--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
Which function does the loop kernel reside in?  I see you have some lambdas
in Z4c_RHS, done fancy as out-of-line functions, that do look like they
could comprise the actual kernels.  In apply_upwind_diss I see cases without
stack usage.

I'm looking at -O2 -march=skylake compiles

Note that with C++ it's easy to retain some abstraction and thus misinterpret
stack accesses as spilling where they are aggregates not eliminated.  For
example in one of the lambdas I see

  _61489 = __builtin_ia32_maskloadpd256 (_104487, _61513);
  D.545024[1].elts.car = _61489;
...
  MEM[(struct vect *)&D.544982].elts._M_elems[1] = MEM[(const struct simd
&)&D.545024 + 32];
...
  MEM[(struct mat3 *)&vars + 992B] = MEM[(const struct mat3 &)&D.544982];

and D.544982 is later variable indexed in some MIN/MAX, FMA using code
(instead of using 'vars' there).  Looking at what -fdump-tree-optimized
produces is sometimes pointing at problems.

That said, the code is large so please point at some source lines within the
important kernel(s) (of the preprocessed source, that is) and the compile
options used.

[Bug target/99912] Unnecessary / inefficient spilling of AVX2 ymm registers

Reply via email to