https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103976
Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2022-01-11
             Status|UNCONFIRMED                 |NEW
           Keywords|                            |missed-optimization
     Ever confirmed|0                           |1

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
Confirmed.  The kernel is outlined even in the if (0) path but eventually
executed serially (which is faster than with using threads).  The only
difference with using if (1) is

--- a-t.c.244t.optimized0       2022-01-11 14:07:52.152665056 +0100
+++ a-t.c.244t.optimized        2022-01-11 14:07:58.696751625 +0100
@@ -121,7 +121,7 @@
   # sum_17 = PHI <sum_10(3), 0.0(2)>
   # ivtmp_4 = PHI <ivtmp_3(3), 100000000(2)>
   .omp_data_o.1.sum = sum_17;
-  __builtin_GOMP_parallel (main._omp_fn.0, &.omp_data_o.1, 1, 0);
+  __builtin_GOMP_parallel (main._omp_fn.0, &.omp_data_o.1, 0, 0);
   sum_10 = .omp_data_o.1.sum;
   .omp_data_o.1 ={v} {CLOBBER};
   ivtmp_3 = ivtmp_4 + 4294967295;

the loop kernel still executes workload computation and reduction commoning
with atomics.  Without -fopenmp we unroll the kernel and constant evaluate
all 1./j
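
For illustration, here is a minimal C sketch of the kind of loop nest the dump suggests. It is a hypothetical reconstruction, not the reporter's test case: the function name harmonic_sum and its parameters are invented here, and only the outlined parallel region, the reduction on sum, and the 1./j terms are taken from the comment.

```c
#include <assert.h>

/* Hypothetical kernel (names are assumptions, not from the bug report):
   repeat a small reduction `outer` times, summing 1/j for j = 1..inner.  */
double
harmonic_sum (int outer, int inner)
{
  double sum = 0.0;

  assert (outer >= 0 && inner >= 0);

  for (int i = 0; i < outer; i++)
    {
      /* With -fopenmp the body is outlined into a separate function and
	 run through GOMP_parallel even though the if clause is a constant
	 0; only the requested thread count changes.  Without -fopenmp the
	 pragma is ignored and this is an ordinary serial loop.  */
#pragma omp parallel for reduction(+:sum) if(0)
      for (int j = 1; j <= inner; j++)
	sum += 1. / j;
    }

  return sum;
}
```

Building a caller of this with something like gcc -O2 -fopenmp -fdump-tree-optimized, once with if(0) and once with if(1), should give two dumps that differ only in the third argument to __builtin_GOMP_parallel, in the same way as the diff in the comment.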