https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103976

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2022-01-11
             Status|UNCONFIRMED                 |NEW
           Keywords|                            |missed-optimization
     Ever confirmed|0                           |1

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
Confirmed.  The kernel is outlined even on the if (0) path, but it is
eventually executed serially (which is faster than using threads).
The only difference from the if (1) case is the third argument of the
GOMP_parallel call:

--- a-t.c.244t.optimized0       2022-01-11 14:07:52.152665056 +0100
+++ a-t.c.244t.optimized        2022-01-11 14:07:58.696751625 +0100
@@ -121,7 +121,7 @@
   # sum_17 = PHI <sum_10(3), 0.0(2)>
   # ivtmp_4 = PHI <ivtmp_3(3), 100000000(2)>
   .omp_data_o.1.sum = sum_17;
-  __builtin_GOMP_parallel (main._omp_fn.0, &.omp_data_o.1, 1, 0);
+  __builtin_GOMP_parallel (main._omp_fn.0, &.omp_data_o.1, 0, 0);
   sum_10 = .omp_data_o.1.sum;
   .omp_data_o.1 ={v} {CLOBBER};
   ivtmp_3 = ivtmp_4 + 4294967295;

The loop kernel still executes the workload computation and combines
the reduction result using atomics.  Without -fopenmp we unroll the
kernel and constant-evaluate all 1./j.
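
For illustration, a minimal sketch of the kind of code the dump above could come from.  This is not the reporter's test case: the function name repeated_sum, the inner trip count of 4 and the exact clause spelling are assumptions, chosen only to match the reduction on sum, the 1./j terms and the outer loop seen in the dump.

```c
#include <assert.h>

/* Hypothetical reproducer; the name and the loop bounds are assumptions,
   not taken from the bug report. */
double repeated_sum(long outer)
{
    double sum = 0.0;

    assert(outer >= 0);
    for (long i = 0; i < outer; i++) {
        /* if (0) asks for serial execution.  According to the comment
           above, with -fopenmp the body is still outlined and reached
           through GOMP_parallel instead of being unrolled and folded. */
        #pragma omp parallel for reduction(+:sum) if (0)
        for (int j = 1; j <= 4; j++)
            sum += 1.0 / j;
    }
    return sum;
}
```

Calling repeated_sum (100000000) from main and building once with and once without -fopenmp, adding -fdump-tree-optimized, should allow comparing the two optimized dumps in the same way as above.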
