https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97213
Bug ID: 97213
Summary: OpenMP "if" is dramatically slower than code-level "if" - why?
Product: gcc
Version: 10.2.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: libgomp
Assignee: unassigned at gcc dot gnu.org
Reporter: ttsiodras at gmail dot com
CC: jakub at gcc dot gnu.org
Target Milestone: ---
In trying to understand how OpenMP `task` works, I did this benchmark:
#include <stdio.h>
#include <omp.h>

long fib(int val)
{
    if (val < 2)
        return val;

    long total = 0;
    {
        #pragma omp task shared(total) if(val==45)
        total += fib(val-1);
        #pragma omp task shared(total) if(val==45)
        total += fib(val-2);
        #pragma omp taskwait
    }
    return total;
}

int main()
{
    #pragma omp parallel
    #pragma omp single
    {
        long res = fib(45);
        printf("fib(45)=%ld\n", res);
    }
}

It's a simple Fibonacci calculation that spawns two tasks only at the
top level of fib(45) - basically, one thread does fib(44), the other does
fib(43), and the results are added and returned.
I know there's a chance of a race on the "+=" of the total, but that's not
the point here. Here's the performance on my i5 laptop:
$ gcc -O2 with_openmp_if.c -fopenmp
$ time ./a.out
fib(45)=1134903170
real    1m4.244s
user    1m44.696s
sys     0m0.010s
That's 64 seconds. Now compare this to the same code, but with the "if" moved
from the OpenMP level to the user-code level, i.e. this change in "fib":
long fib(int val)
{
    if (val < 2)
        return val;

    long total = 0;
    {
        if (val == 45) {
            #pragma omp task shared(total)
            total += fib(val-1);
            #pragma omp task shared(total)
            total += fib(val-2);
            #pragma omp taskwait
        } else
            return fib(val-1) + fib(val-2);
    }
    return total;
}

$ gcc -O2 with_normal_if.c -fopenmp
$ time ./a.out
fib(45)=1134903170
real    0m8.585s
user    0m14.021s
sys     0m0.011s
We go from 64 seconds down to 8.5 seconds.
Why?
What does the OpenMP-level "if" do so differently that it costs nearly an
order of magnitude in performance?
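
For anyone trying to reproduce this, here is a minimal sketch that puts both
variants in one file and avoids the "+=" race by giving each task its own
result slot. The names fib_if_clause, fib_plain_if, run_fib and the cutoff
parameter are made up for illustration and are not part of the original test
case. One difference is visible from the source alone: in the if-clause
variant every call with val >= 2 still encounters two task constructs and a
taskwait, while in the plain-if variant only the top-level call does. Whether
that accounts for the whole gap is an assumption, not something measured here.

```c
#include <assert.h>

/* Variant A: the cutoff lives in the OpenMP "if" clause.
 * Every call with val >= 2 encounters both task constructs and the
 * taskwait, even when the clause evaluates to false. */
long fib_if_clause(int val, int cutoff)
{
    if (val < 2)
        return val;

    long a = 0, b = 0;   /* one slot per task, so no race on "+=" */
    #pragma omp task shared(a) if(val == cutoff)
    a = fib_if_clause(val - 1, cutoff);
    #pragma omp task shared(b) if(val == cutoff)
    b = fib_if_clause(val - 2, cutoff);
    #pragma omp taskwait
    return a + b;
}

/* Variant B: the cutoff is an ordinary branch.
 * Calls with val != cutoff never reach an OpenMP construct. */
long fib_plain_if(int val, int cutoff)
{
    if (val < 2)
        return val;
    if (val != cutoff)
        return fib_plain_if(val - 1, cutoff) + fib_plain_if(val - 2, cutoff);

    long a = 0, b = 0;
    #pragma omp task shared(a)
    a = fib_plain_if(val - 1, cutoff);
    #pragma omp task shared(b)
    b = fib_plain_if(val - 2, cutoff);
    #pragma omp taskwait
    return a + b;
}

/* Hypothetical driver: runs one variant inside parallel + single,
 * using val itself as the cutoff (45 in the report above). */
long run_fib(int val, int use_if_clause)
{
    assert(val >= 0);
    long res = 0;
    #pragma omp parallel
    #pragma omp single
    {
        res = use_if_clause ? fib_if_clause(val, val)
                            : fib_plain_if(val, val);
    }
    return res;
}
```

Calling run_fib(45, 1) and run_fib(45, 0) from main, built with
"gcc -O2 -fopenmp", and timing each call separately should correspond to the
two measurements above. Both calls must return the same value.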