Author: William S. Moses
Date: 2022-03-06T18:34:25-05:00
New Revision: 87ec6f41bba6d72a3408e71cf19ae56feff523bc
URL:
https://github.com/llvm/llvm-project/commit/87ec6f41bba6d72a3408e71cf19ae56feff523bc
DIFF:
https://github.com/llvm/llvm-project/commit/87ec6f41bba6d72a3408e71cf19ae56feff523bc.diff
LOG: [OpenMPIRBuilder] Allocate temporary at the correct block in a nested
parallel
The OpenMPIRBuilder has a bug. Specifically, suppose you have two nested openmp
parallel regions (writing with MLIR for ease)
```
omp.parallel {
%a = ...
omp.parallel {
use(%a)
}
}
```
As OpenMP only permits pointer-like inputs, the builder will wrap all of the
inputs into a stack allocation, and then pass this
allocation to the inner parallel. For example, we would want to get something
like the following:
```
omp.parallel {
%a = ...
%tmp = alloc
store %tmp[] = %a
kmpc_fork(outlined, %tmp)
}
```
However, in practice, this is not what currently occurs in the context of
nested parallel regions. Specifically to the OpenMPIRBuilder,
the entirety of the function (at the LLVM level) is currently inlined with
blocks marking the corresponding start and end of each
region.
```
entry:
...
parallel1:
%a = ...
...
parallel2:
use(%a)
...
endparallel2:
...
endparallel1:
...
```
When the allocation is inserted, it presently inserted into the parent of the
entire function (e.g. entry) rather than the parent
allocation scope to the function being outlined. If we were outlining
parallel2, the corresponding alloca location would be parallel1.
This causes a variety of bugs, including
https://github.com/llvm/llvm-project/issues/54165 as one example.
This PR allows the stack allocation to be created at the correct allocation
block, and thus remedies such issues.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D121061
Added:
mlir/test/Target/LLVMIR/openmp-nested.mlir
Modified:
clang/test/OpenMP/irbuilder_nested_openmp_parallel_empty.c
clang/test/OpenMP/irbuilder_nested_parallel_for.c
llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h
llvm/include/llvm/Transforms/Utils/CodeExtractor.h
llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
llvm/lib/Transforms/IPO/HotColdSplitting.cpp
llvm/lib/Transforms/IPO/IROutliner.cpp
llvm/lib/Transforms/Utils/CodeExtractor.cpp
Removed:
diff --git a/clang/test/OpenMP/irbuilder_nested_openmp_parallel_empty.c
b/clang/test/OpenMP/irbuilder_nested_openmp_parallel_empty.c
index 56aa41b0db466..2c1139d7ef7df 100644
--- a/clang/test/OpenMP/irbuilder_nested_openmp_parallel_empty.c
+++ b/clang/test/OpenMP/irbuilder_nested_openmp_parallel_empty.c
@@ -33,8 +33,7 @@ void nested_parallel_0(void) {
// ALL-LABEL: @_Z17nested_parallel_1Pfid(
// ALL-NEXT: entry:
-// ALL-NEXT:[[STRUCTARG14:%.*]] = alloca { { i32*, double*, float** }*,
i32*, double*, float** }, align 8
-// ALL-NEXT:[[STRUCTARG:%.*]] = alloca { i32*, double*, float** }, align 8
+// ALL-NEXT:[[STRUCTARG14:%.*]] = alloca { i32*, double*, float** }, align
8
// ALL-NEXT:[[R_ADDR:%.*]] = alloca float*, align 8
// ALL-NEXT:[[A_ADDR:%.*]] = alloca i32, align 4
// ALL-NEXT:[[B_ADDR:%.*]] = alloca double, align 8
@@ -44,15 +43,13 @@ void nested_parallel_0(void) {
// ALL-NEXT:[[OMP_GLOBAL_THREAD_NUM:%.*]] = call i32
@__kmpc_global_thread_num(%struct.ident_t* @[[GLOB1]])
// ALL-NEXT:br label [[OMP_PARALLEL:%.*]]
// ALL: omp_parallel:
-// ALL-NEXT:[[GEP_STRUCTARG:%.*]] = getelementptr { { i32*, double*,
float** }*, i32*, double*, float** }, { { i32*, double*, float** }*, i32*,
double*, float** }* [[STRUCTARG14]], i32 0, i32 0
-// ALL-NEXT:store { i32*, double*, float** }* [[STRUCTARG]], { i32*,
double*, float** }** [[GEP_STRUCTARG]], align 8
-// ALL-NEXT:[[GEP_A_ADDR15:%.*]] = getelementptr { { i32*, double*,
float** }*, i32*, double*, float** }, { { i32*, double*, float** }*, i32*,
double*, float** }* [[STRUCTARG14]], i32 0, i32 1
+// ALL-NEXT:[[GEP_A_ADDR15:%.*]] = getelementptr { i32*, double*, float**
}, { i32*, double*, float** }* [[STRUCTARG14]], i32 0, i32 0
// ALL-NEXT:store i32* [[A_ADDR]], i32** [[GEP_A_ADDR15]], align 8
-// ALL-NEXT:[[GEP_B_ADDR16:%.*]] = getelementptr { { i32*, double*,
float** }*, i32*, double*, float** }, { { i32*, double*, float** }*, i32*,
double*, float** }* [[STRUCTARG14]], i32 0, i32 2
+// ALL-NEXT:[[GEP_B_ADDR16:%.*]] = getelementptr { i32*, double*, float**
}, { i32*, double*, float** }* [[STRUCTARG14]], i32 0, i32 1
// ALL-NEXT:store double* [[B_ADDR]], double** [[GEP_B_ADDR16]], align 8
-// ALL-NEXT:[[GEP_R_ADDR17:%.*]] = getelementptr { { i32*, double*,
float** }*, i32*, double*, float** }, { { i32*, double*, float** }*, i32*,
double*, float** }* [[STRUCTARG14]], i32 0, i32 3
+// ALL-NEXT:[[GEP_R_ADDR17:%.*]] = getelementptr { i32*, double*,