| Field | Value |
|---|---|
| Issue | 145165 |
| Summary | `#pragma omp critical` / `#pragma omp atomic` for a shared variable on device yields wrong values on nvptx. |
| Labels | new issue |
| Assignees | |
| Reporter | bschulz81 |

Compile the following program with `-fopenmp -fopenmp-targets=nvptx64-nvidia-cuda`:
```
#include <omp.h>
#include <vector>
#include <iostream>

int main()
{
    size_t elements = 20;
    std::vector<double> v1(elements), v2(elements);

    // Fill both vectors on the host; index 0 stays value-initialized to 0.
    #pragma omp parallel for simd
    for (size_t i = 1; i < elements; i++)
    {
        v1[i] = (double)i;
        v2[i] = (double)i;
    }

    double *v1d = v1.data(), *v2d = v2.data();
    double tmp = 0;

    // Copy both arrays to the device.
    #pragma omp target enter data map (to:v1d[0:elements])
    #pragma omp target enter data map (to:v2d[0:elements])

    // Accumulate the dot product on the device, guarding each update of tmp.
    #pragma omp target teams distribute parallel for shared(tmp)
    for (size_t i = 1; i < 20; i++)
    {
        #pragma omp critical
        tmp += v1d[i] * v2d[i];
    }

    std::cout << tmp << "\n";
}
```
This prints 0 instead of the expected 2470.
If one replaces
`#pragma omp target teams distribute parallel for shared(tmp)`
with
`#pragma omp parallel for shared(tmp)`
so that the loop runs on the host, the result is correct.
If one instead replaces
`#pragma omp target teams distribute parallel for shared(tmp)`
with
`#pragma omp target teams distribute reduction(+:tmp)`
the result is also correct, even though according to the OpenMP 5.x standard
`teams distribute` has no reduction clause.
If one replaces
`#pragma omp critical`
with
`#pragma omp atomic`
the result is the same as with `critical`, i.e. still wrong.
_______________________________________________
llvm-bugs mailing list
llvm-bugs@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-bugs