https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116785

            Bug ID: 116785
           Summary: RAJAPerf REDUCE_SUM regresses with commit
                    f0a02467bbc35a478eb82f5a8a7e8870827b51fc
           Product: gcc
           Version: 15.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: kugan at gcc dot gnu.org
  Target Milestone: ---

Some of the loops in RAJAPerf are no longer vectorized with this change. This
results in a ~64% regression for this and some other kernels. The regression can
also be observed against gcc 11 (the only other version I tried).

g++ -Ofast -S CONVECTION3DPA-Seq.cpp.ii  -fopt-info-vec -fpermissive

shows:
/proj/ta/tests/OpenMP_4.5_Test_Suite/benchmarks/RAJAPerf/src/apps/CONVECTION3DPA-Seq.cpp:80:31:
optimized: loop vectorized using 16 byte vectors
/proj/ta/tests/OpenMP_4.5_Test_Suite/benchmarks/RAJAPerf/src/apps/CONVECTION3DPA-Seq.cpp:80:31:
optimized:  loop versioned for vectorization because of possible aliasing
/proj/ta/tests/OpenMP_4.5_Test_Suite/benchmarks/RAJAPerf/src/apps/CONVECTION3DPA-Seq.cpp:47:31:
optimized: loop vectorized using 16 byte vectors
/proj/ta/tests/OpenMP_4.5_Test_Suite/benchmarks/RAJAPerf/src/apps/CONVECTION3DPA-Seq.cpp:47:31:
optimized:  loop versioned for vectorization because of possible aliasing
/proj/ta/tests/OpenMP_4.5_Test_Suite/benchmarks/RAJAPerf/tpl/RAJA/include/RAJA/policy/loop/launch.hpp:101:23:
optimized: loop vectorized using 16 byte vectors
/proj/ta/tests/OpenMP_4.5_Test_Suite/benchmarks/RAJAPerf/tpl/RAJA/include/RAJA/policy/loop/launch.hpp:101:23:
optimized:  loop versioned for vectorization because of possible aliasing
/proj/ta/tests/OpenMP_4.5_Test_Suite/benchmarks/RAJAPerf/tpl/RAJA/include/RAJA/policy/loop/launch.hpp:101:23:
optimized: loop vectorized using 16 byte vectors
/proj/ta/tests/OpenMP_4.5_Test_Suite/benchmarks/RAJAPerf/tpl/RAJA/include/RAJA/policy/loop/launch.hpp:101:23:
optimized:  loop versioned for vectorization because of possible aliasing

With the patch reverted:
/proj/ta/tests/OpenMP_4.5_Test_Suite/benchmarks/RAJAPerf/src/apps/CONVECTION3DPA-Seq.cpp:100:29:
optimized: loop vectorized using 16 byte vectors
/proj/ta/tests/OpenMP_4.5_Test_Suite/benchmarks/RAJAPerf/src/apps/CONVECTION3DPA-Seq.cpp:89:29:
optimized: loop vectorized using 16 byte vectors
/proj/ta/tests/OpenMP_4.5_Test_Suite/benchmarks/RAJAPerf/src/apps/CONVECTION3DPA-Seq.cpp:80:31:
optimized: loop vectorized using 16 byte vectors
/proj/ta/tests/OpenMP_4.5_Test_Suite/benchmarks/RAJAPerf/src/apps/CONVECTION3DPA-Seq.cpp:67:29:
optimized: loop vectorized using 16 byte vectors
/proj/ta/tests/OpenMP_4.5_Test_Suite/benchmarks/RAJAPerf/src/apps/CONVECTION3DPA-Seq.cpp:58:31:
optimized: loop vectorized using 16 byte vectors
/proj/ta/tests/OpenMP_4.5_Test_Suite/benchmarks/RAJAPerf/src/apps/CONVECTION3DPA-Seq.cpp:47:31:
optimized: loop vectorized using 16 byte vectors
/proj/ta/tests/OpenMP_4.5_Test_Suite/benchmarks/RAJAPerf/tpl/RAJA/include/RAJA/policy/loop/launch.hpp:101:23:
optimized: loop vectorized using 16 byte vectors
/proj/ta/tests/OpenMP_4.5_Test_Suite/benchmarks/RAJAPerf/tpl/RAJA/include/RAJA/policy/loop/launch.hpp:101:23:
optimized: loop vectorized using 16 byte vectors
/proj/ta/tests/OpenMP_4.5_Test_Suite/benchmarks/RAJAPerf/tpl/RAJA/include/RAJA/policy/loop/launch.hpp:101:23:
optimized: loop vectorized using 16 byte vectors
/proj/ta/tests/OpenMP_4.5_Test_Suite/benchmarks/RAJAPerf/tpl/RAJA/include/RAJA/policy/loop/launch.hpp:101:23:
optimized: loop vectorized using 16 byte vectors
/proj/ta/tests/OpenMP_4.5_Test_Suite/benchmarks/RAJAPerf/tpl/RAJA/include/RAJA/policy/loop/launch.hpp:101:23:
optimized: loop vectorized using 16 byte vectors
/proj/ta/tests/OpenMP_4.5_Test_Suite/benchmarks/RAJAPerf/tpl/RAJA/include/RAJA/policy/loop/launch.hpp:101:23:
optimized: loop vectorized using 16 byte vectors
/proj/ta/tests/OpenMP_4.5_Test_Suite/benchmarks/RAJAPerf/src/apps/CONVECTION3DPA-Seq.cpp:162:41:
optimized: basic block part vectorized using 16 byte vectors
/proj/ta/tests/OpenMP_4.5_Test_Suite/benchmarks/RAJAPerf/src/apps/CONVECTION3DPA-Seq.cpp:40:29:
optimized: basic block part vectorized using 16 byte vectors

g++ -v
Using built-in specs.
COLLECT_GCC=/local/home/kvivekananda/install/bin/g++
COLLECT_LTO_WRAPPER=/local/home/kvivekananda/install/libexec/gcc/aarch64-unknown-linux-gnu/15.0.0/lto-wrapper
Target: aarch64-unknown-linux-gnu
Configured with: ../gcc/configure --disable-bootstrap --enable-multiarch=yes
--enable-languages=c,c++,fortran,lto --prefix=/local/home/kvivekananda/install
: (reconfigured) ../gcc/configure --disable-bootstrap --enable-multiarch=yes
--prefix=/local/home/kvivekananda/install --enable-languages=c,c++,fortran,lto
--no-create --no-recursion
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 15.0.0 20240917 (experimental) (GCC)
