https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97213
Bug ID: 97213
Summary: OpenMP "if" is dramatically slower than code-level "if" - why?
Product: gcc
Version: 10.2.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: libgomp
Assignee: unassigned at gcc dot gnu.org
Reporter: ttsiodras at gmail dot com
CC: jakub at gcc dot gnu.org
Target Milestone: ---

In trying to understand how OpenMP `task` works, I wrote this benchmark:

    #include <omp.h>
    #include <stdio.h>

    long fib(int val)
    {
        if (val < 2)
            return val;

        long total = 0;
        {
    #pragma omp task shared(total) if(val==45)
            total += fib(val-1);
    #pragma omp task shared(total) if(val==45)
            total += fib(val-2);
    #pragma omp taskwait
        }
        return total;
    }

    int main()
    {
    #pragma omp parallel
    #pragma omp single
        {
            long res = fib(45);
            printf("fib(45)=%ld\n", res);
        }
    }

It's a simple Fibonacci calculation that only spawns two tasks, at the top level of fib(45) - basically, one thread does fib(44), the other does fib(43), and the results are added and returned. I know there's a chance of a race on the "+=" of the total - but that's not the point of this...

Here's the performance on my i5 laptop:

    $ gcc -O2 with_openmp_if.c -fopenmp
    $ time ./a.out
    fib(45)=1134903170

    real    1m4.244s
    user    1m44.696s
    sys     0m0.010s

64 seconds...

Now compare this to the same code, but with the "if" moved from the OpenMP level to the user-code level - i.e. this change in "fib":

    long fib(int val)
    {
        if (val < 2)
            return val;

        long total = 0;
        {
            if (val == 45) {
    #pragma omp task shared(total)
                total += fib(val-1);
    #pragma omp task shared(total)
                total += fib(val-2);
    #pragma omp taskwait
            } else
                return fib(val-1) + fib(val-2);
        }
        return total;
    }

    $ gcc -O2 with_normal_if.c -fopenmp
    $ time ./a.out
    fib(45)=1134903170

    real    0m8.585s
    user    0m14.021s
    sys     0m0.011s

We go from 64 seconds down to 8.5 seconds.

Why? What does the OpenMP-level "if" do so differently that the code runs nearly an order of magnitude slower?
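For comparison, here is a minimal sketch of the "code-level if" variant with the race on `total` removed: each task writes to its own variable, and everything below the top level runs as plain serial recursion with no task construct on the recursive path. The names `fib_serial` and `fib_top` are hypothetical and do not appear in the report. The sketch does not by itself explain the slowdown. It only relies on one point from the OpenMP specification: a task whose `if` clause evaluates to false is still generated, as an undeferred task that the encountering thread executes immediately. Whether that per-call task handling accounts for the measured difference is an assumption here, not something the report confirms.

```c
#include <stdio.h>

/* Plain serial recursion: no OpenMP construct is encountered here. */
long fib_serial(int val)
{
    if (val < 2)
        return val;
    return fib_serial(val - 1) + fib_serial(val - 2);
}

/* Top level only: spawn one task per branch, each writing to its own
 * variable, so there is no shared "+=" and therefore no race.
 * Hypothetical helper, not part of the original report. */
long fib_top(int val)
{
    if (val < 2)
        return val;

    long a = 0, b = 0;
    #pragma omp task shared(a)
    a = fib_serial(val - 1);
    #pragma omp task shared(b)
    b = fib_serial(val - 2);
    #pragma omp taskwait   /* both results must be ready before the sum */
    return a + b;
}
```

To reproduce the setup from the report, call `fib_top(45)` inside the same `#pragma omp parallel` / `#pragma omp single` region as the original `main` and build with `gcc -O2 -fopenmp`. Compiled without `-fopenmp`, the pragmas are ignored and the function simply runs serially.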