https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99912

--- Comment #4 from Erik Schnetter <schnetter at gmail dot com> ---
I build with the following compiler options:

/Users/eschnett/src/CarpetX/Cactus/view-compilers/bin/g++  -fopenmp -Wall -pipe
-g -march=skylake -std=gnu++17 -O3 -fcx-limited-range -fexcess-precision=fast
-fno-math-errno -fno-rounding-math -fno-signaling-nans
-funsafe-math-optimizations   -c -o configs/sim/build/Z4c/rhs.cxx.o
configs/sim/build/Z4c/rhs.cxx.ii

One of the kernels in question (the one I describe above) is the C++ lambda in
lines 281013 to 281119. The call to the "noinline" function ensures that the
kernel (and surrounding for loops) is compiled as a separate function, which
produces more efficient code. The function "grid.loop_int_device" contains
essentially three nested for loops, and the actual kernel is the C++ lambda in
lines 281015 to 281118.
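
For readers without the preprocessed file at hand, the pattern described above
can be sketched roughly as follows. This is only an illustration, not the code
from rhs.cxx.ii: the names "loop_3d" and "sum_indices" and the trivial kernel
body are hypothetical, and the real "grid.loop_int_device" is assumed to be
considerably more involved.

```cpp
// Hypothetical stand-in for the "noinline" helper. The attribute asks GCC
// not to inline this function into its caller, so the three nested loops
// and the kernel passed in are emitted together as a separate function.
template <typename F>
__attribute__((noinline)) void loop_3d(int ni, int nj, int nk,
                                       const F &kernel) {
  for (int k = 0; k < nk; ++k)
    for (int j = 0; j < nj; ++j)
      for (int i = 0; i < ni; ++i)
        kernel(i, j, k); // the lambda can still be inlined into the loops
}

// Hypothetical caller: the lambda plays the role of the kernel.
double sum_indices(int ni, int nj, int nk) {
  double sum = 0.0;
  loop_3d(ni, nj, nk, [&](int i, int j, int k) { sum += i + j + k; });
  return sum;
}
```

The noinline attribute only keeps the loop nest out of its caller; the lambda
itself remains free to be inlined into the innermost loop.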

I'll have a look at -fdump-tree-optimized.
