https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69564
--- Comment #12 from Jakub Jelinek <jakub at gcc dot gnu.org> ---

I don't see any difference though, neither with the fold-const.c change nor
with loop-invert.patch (at least on my Haswell-E, -g -Ofast, x86_64, single
runs only). The numbers do clearly show the LU slowdown with C++, though, and
also that clang wins on SOR and Sparse matmult, while we win significantly on
MonteCarlo and less significantly on FFT; C LU is comparable:

gcc trunk 20160224
Composite Score:         2482.97
FFT             Mflops:  1982.24    (N=1024)
SOR             Mflops:  1904.08    (100 x 100)
MonteCarlo:     Mflops:   677.65
Sparse matmult  Mflops:  2775.38    (N=1000, nz=5000)
LU              Mflops:  5075.48    (M=100, N=100)

g++ trunk 20160224
Composite Score:         2314.35
FFT             Mflops:  1986.77    (N=1024)
SOR             Mflops:  1903.29    (100 x 100)
MonteCarlo:     Mflops:   678.80
Sparse matmult  Mflops:  2775.33    (N=1000, nz=5000)
LU              Mflops:  4227.54    (M=100, N=100)

g++ trunk 20160224 + fold-const.c MIN/MAX change
Composite Score:         2331.88
FFT             Mflops:  1983.28    (N=1024)
SOR             Mflops:  1906.04    (100 x 100)
MonteCarlo:     Mflops:   676.53
Sparse matmult  Mflops:  2823.60    (N=1000, nz=5000)
LU              Mflops:  4269.96    (M=100, N=100)

g++ trunk 20160224 + fold-const.c MIN/MAX change + loop-invert.patch
Composite Score:         2332.00
FFT             Mflops:  1983.18    (N=1024)
SOR             Mflops:  1905.64    (100 x 100)
MonteCarlo:     Mflops:   674.50
Sparse matmult  Mflops:  2823.55    (N=1000, nz=5000)
LU              Mflops:  4273.14    (M=100, N=100)

clang 3.8
Composite Score:         2418.13
FFT             Mflops:  1583.23    (N=1024)
SOR             Mflops:  2130.27    (100 x 100)
MonteCarlo:     Mflops:   281.80
Sparse matmult  Mflops:  3026.40    (N=1000, nz=5000)
LU              Mflops:  5068.95    (M=100, N=100)

clang++ 3.8
Composite Score:         2434.04
FFT             Mflops:  1595.89    (N=1024)
SOR             Mflops:  2131.09    (100 x 100)
MonteCarlo:     Mflops:   281.63
Sparse matmult  Mflops:  3001.59    (N=1000, nz=5000)
LU              Mflops:  5159.98    (M=100, N=100)
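
As a rough sanity check on the figures above (not part of the original comment), the sketch below recomputes each composite score as the arithmetic mean of the five kernel results, which is how SciMark 2.0 reports it, and expresses the C++ LU results as a percentage change against the C build. The dictionary keys and the helper names `composite` and `relative_change` are illustrative only; the numbers are copied from the runs listed above, and since these are single runs the percentages should be read as indicative at best.

```python
# Mflops per kernel, in the order FFT, SOR, MonteCarlo, Sparse matmult, LU.
# Values are copied from the benchmark runs quoted in the comment.
RESULTS = {
    "gcc trunk": (1982.24, 1904.08, 677.65, 2775.38, 5075.48),
    "g++ trunk": (1986.77, 1903.29, 678.80, 2775.33, 4227.54),
    "g++ trunk + fold-const.c": (1983.28, 1906.04, 676.53, 2823.60, 4269.96),
    "g++ trunk + fold-const.c + loop-invert": (1983.18, 1905.64, 674.50, 2823.55, 4273.14),
    "clang 3.8": (1583.23, 2130.27, 281.80, 3026.40, 5068.95),
    "clang++ 3.8": (1595.89, 2131.09, 281.63, 3001.59, 5159.98),
}

LU = 4  # index of the LU kernel in each tuple


def composite(kernels):
    """Arithmetic mean of the kernel Mflops figures."""
    return sum(kernels) / len(kernels)


def relative_change(new, base):
    """Percentage change of `new` relative to `base` (negative means slower)."""
    return (new - base) / base * 100.0


if __name__ == "__main__":
    c_lu = RESULTS["gcc trunk"][LU]
    for name, kernels in RESULTS.items():
        print(f"{name:40s} composite {composite(kernels):8.2f}  "
              f"LU vs gcc {relative_change(kernels[LU], c_lu):+6.1f}%")
```

On these figures the unpatched g++ LU result is roughly 17% below the gcc one, and the two patches narrow that gap by only about one percentage point, which matches the observation that neither change makes a visible difference.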