https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116785
Bug ID: 116785
Summary: RAJAPerf REDUCE_SUM regresses with commit f0a02467bbc35a478eb82f5a8a7e8870827b51fc
Product: gcc
Version: 15.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: kugan at gcc dot gnu.org
Target Milestone: ---

Some of the loops in RAJAPerf are no longer vectorized with this change. This results in a ~64% regression for this and some other kernels. The regression can also be observed against gcc 11 (I tried only this version).

g++ -Ofast -S CONVECTION3DPA-Seq.cpp.ii -fopt-info-vec -fpermissive

shows:

/proj/ta/tests/OpenMP_4.5_Test_Suite/benchmarks/RAJAPerf/src/apps/CONVECTION3DPA-Seq.cpp:80:31: optimized: loop vectorized using 16 byte vectors
/proj/ta/tests/OpenMP_4.5_Test_Suite/benchmarks/RAJAPerf/src/apps/CONVECTION3DPA-Seq.cpp:80:31: optimized: loop versioned for vectorization because of possible aliasing
/proj/ta/tests/OpenMP_4.5_Test_Suite/benchmarks/RAJAPerf/src/apps/CONVECTION3DPA-Seq.cpp:47:31: optimized: loop vectorized using 16 byte vectors
/proj/ta/tests/OpenMP_4.5_Test_Suite/benchmarks/RAJAPerf/src/apps/CONVECTION3DPA-Seq.cpp:47:31: optimized: loop versioned for vectorization because of possible aliasing
/proj/ta/tests/OpenMP_4.5_Test_Suite/benchmarks/RAJAPerf/tpl/RAJA/include/RAJA/policy/loop/launch.hpp:101:23: optimized: loop vectorized using 16 byte vectors
/proj/ta/tests/OpenMP_4.5_Test_Suite/benchmarks/RAJAPerf/tpl/RAJA/include/RAJA/policy/loop/launch.hpp:101:23: optimized: loop versioned for vectorization because of possible aliasing
/proj/ta/tests/OpenMP_4.5_Test_Suite/benchmarks/RAJAPerf/tpl/RAJA/include/RAJA/policy/loop/launch.hpp:101:23: optimized: loop vectorized using 16 byte vectors
/proj/ta/tests/OpenMP_4.5_Test_Suite/benchmarks/RAJAPerf/tpl/RAJA/include/RAJA/policy/loop/launch.hpp:101:23: optimized: loop versioned for vectorization because of possible aliasing

With the patch reverted:

/proj/ta/tests/OpenMP_4.5_Test_Suite/benchmarks/RAJAPerf/src/apps/CONVECTION3DPA-Seq.cpp:100:29: optimized: loop vectorized using 16 byte vectors
/proj/ta/tests/OpenMP_4.5_Test_Suite/benchmarks/RAJAPerf/src/apps/CONVECTION3DPA-Seq.cpp:89:29: optimized: loop vectorized using 16 byte vectors
/proj/ta/tests/OpenMP_4.5_Test_Suite/benchmarks/RAJAPerf/src/apps/CONVECTION3DPA-Seq.cpp:80:31: optimized: loop vectorized using 16 byte vectors
/proj/ta/tests/OpenMP_4.5_Test_Suite/benchmarks/RAJAPerf/src/apps/CONVECTION3DPA-Seq.cpp:67:29: optimized: loop vectorized using 16 byte vectors
/proj/ta/tests/OpenMP_4.5_Test_Suite/benchmarks/RAJAPerf/src/apps/CONVECTION3DPA-Seq.cpp:58:31: optimized: loop vectorized using 16 byte vectors
/proj/ta/tests/OpenMP_4.5_Test_Suite/benchmarks/RAJAPerf/src/apps/CONVECTION3DPA-Seq.cpp:47:31: optimized: loop vectorized using 16 byte vectors
/proj/ta/tests/OpenMP_4.5_Test_Suite/benchmarks/RAJAPerf/tpl/RAJA/include/RAJA/policy/loop/launch.hpp:101:23: optimized: loop vectorized using 16 byte vectors
/proj/ta/tests/OpenMP_4.5_Test_Suite/benchmarks/RAJAPerf/tpl/RAJA/include/RAJA/policy/loop/launch.hpp:101:23: optimized: loop vectorized using 16 byte vectors
/proj/ta/tests/OpenMP_4.5_Test_Suite/benchmarks/RAJAPerf/tpl/RAJA/include/RAJA/policy/loop/launch.hpp:101:23: optimized: loop vectorized using 16 byte vectors
/proj/ta/tests/OpenMP_4.5_Test_Suite/benchmarks/RAJAPerf/tpl/RAJA/include/RAJA/policy/loop/launch.hpp:101:23: optimized: loop vectorized using 16 byte vectors
/proj/ta/tests/OpenMP_4.5_Test_Suite/benchmarks/RAJAPerf/tpl/RAJA/include/RAJA/policy/loop/launch.hpp:101:23: optimized: loop vectorized using 16 byte vectors
/proj/ta/tests/OpenMP_4.5_Test_Suite/benchmarks/RAJAPerf/tpl/RAJA/include/RAJA/policy/loop/launch.hpp:101:23: optimized: loop vectorized using 16 byte vectors
/proj/ta/tests/OpenMP_4.5_Test_Suite/benchmarks/RAJAPerf/src/apps/CONVECTION3DPA-Seq.cpp:162:41: optimized: basic block part vectorized using 16 byte vectors
/proj/ta/tests/OpenMP_4.5_Test_Suite/benchmarks/RAJAPerf/src/apps/CONVECTION3DPA-Seq.cpp:40:29: optimized: basic block part vectorized using 16 byte vectors

g++ -v
Using built-in specs.
COLLECT_GCC=/local/home/kvivekananda/install/bin/g++
COLLECT_LTO_WRAPPER=/local/home/kvivekananda/install/libexec/gcc/aarch64-unknown-linux-gnu/15.0.0/lto-wrapper
Target: aarch64-unknown-linux-gnu
Configured with: ../gcc/configure --disable-bootstrap --enable-multiarch=yes --enable-languages=c,c++,fortran,lto --prefix=/local/home/kvivekananda/install : (reconfigured) ../gcc/configure --disable-bootstrap --enable-multiarch=yes --prefix=/local/home/kvivekananda/install --enable-languages=c,c++,fortran,lto --no-create --no-recursion
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 15.0.0 20240917 (experimental) (GCC)
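
For readers without the RAJAPerf sources at hand, the kind of loop named in the summary can be sketched as below. This is only an illustrative stand-in: the function name `reduce_sum` and its signature are hypothetical, it is not taken from RAJAPerf, and it has not been checked whether a loop this small shows the reported regression.

```cpp
#include <cstddef>

// Hypothetical stand-in for a sum-reduction kernel; NOT the RAJAPerf source.
double reduce_sum(const double* x, std::size_t n)
{
    double sum = 0.0;
    // -Ofast implies -ffast-math, which permits reassociating this
    // floating-point reduction so that the loop can be vectorized.
    for (std::size_t i = 0; i < n; ++i)
        sum += x[i];
    return sum;
}
```

Compiling such a file with `g++ -Ofast -c -fopt-info-vec` before and after the commit, and comparing the reported "loop vectorized" lines, is the same kind of check as the one shown for CONVECTION3DPA-Seq.cpp above.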