[llvm-branch-commits] [mlir] [MLIR][OpenMP] Clause-based OpenMP operation definition (PR #92523)
@@ -315,38 +266,17 @@ def SectionsOp : OpenMP_Op<"sections", [AttrSizedOperandSegments, into the final value, which is available in the accumulator after all the sections complete. -The $allocators_vars and $allocate_vars parameters are a variadic list of values -that specify the memory allocator to be used to obtain storage for private values. - -The `nowait` attribute, when present, signifies that there should be no -implicit barrier at the end of the construct. - }]; - let arguments = (ins Variadic:$reduction_vars, - OptionalAttr:$reductions, - Variadic:$allocate_vars, - Variadic:$allocators_vars, - UnitAttr:$nowait); +The optional `byref` attribute controls whether reduction arguments are +passed by reference or by value. tblah wrote: Shouldn't this use the machinery in `OpenMP_ReductionClause`? https://github.com/llvm/llvm-project/pull/92523 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [mlir] [MLIR][OpenMP] Clause-based OpenMP operation definition (PR #92523)
@@ -920,6 +663,9 @@ def TaskloopOp : OpenMP_Op<"taskloop", [AttrSizedOperandSegments, specifies how to combine the values from each iteration into the final value, which is available in the accumulator after the loop completes. +The optional `byref` attribute controls whether reduction arguments are +passed by reference or by value. + tblah wrote: Here too. I guess we need separate work to standardize how reductions work over different ops? https://github.com/llvm/llvm-project/pull/92523 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [mlir] [MLIR][OpenMP] Clause-based OpenMP operation definition (PR #92523)
tblah wrote:
> @tblah thanks for giving this a look. Basically the issue here with the reduction clause is that its description is currently different for each operation that accepts it.
> It would be great to agree on a generic description for that clause that would work for all ops, so that it makes sense while reading the generated documentation and also being centralized into the clause definition rather than each operation. I created a gist [here](https://gist.github.com/skatrak/ee6e1829980867f8119f91f84cb284b1) with the whole list of descriptions that currently exist for `reduction`, in case that helps with figuring out a shared description.

Thanks for taking the time to create the gist. Some thoughts:
- I think some of the descriptions are out of date. `omp.reduction` has been removed: https://github.com/llvm/llvm-project/pull/92732
- Assuming that `omp.reduction` really was unused, I think that leaves byref as the main difference? I guess fixing byref is on me (https://github.com/llvm/llvm-project/pull/92244). Unfortunately I can't work on this immediately so I won't hold up this PR for it.

https://github.com/llvm/llvm-project/pull/92523 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [mlir] [MLIR][OpenMP] Clause-based OpenMP operation definition (PR #92523)
https://github.com/tblah approved this pull request. LGTM, but please wait for another reviewer https://github.com/llvm/llvm-project/pull/92523 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [mlir] [MLIR][OpenMP] Clause-based OpenMP operation definition (PR #92523)
https://github.com/tblah edited https://github.com/llvm/llvm-project/pull/92523 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [mlir] [MLIR][OpenMP] Clause-based OpenMP operation definition (PR #92523)
tblah wrote:
> I guess fixing byref is on me (#92244). Unfortunately I can't work on this immediately so I won't hold up this PR for it.

@skatrak does https://github.com/llvm/llvm-project/pull/96215 cover everything you need? https://github.com/llvm/llvm-project/pull/92523 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [flang] [Flang][OpenMP] Update flang with changes to the OpenMP dialect (PR #92524)
https://github.com/tblah approved this pull request. LGTM, thanks! https://github.com/llvm/llvm-project/pull/92524 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [flang] [Flang][OpenMP] NFC: Share DataSharingProcessor creation logic for all loop directives (PR #97565)
https://github.com/tblah approved this pull request. LGTM, thanks! https://github.com/llvm/llvm-project/pull/97565 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [flang] [Flang][OpenMP] Refactor loop-related lowering for composite support (PR #97566)
https://github.com/tblah edited https://github.com/llvm/llvm-project/pull/97566 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [flang] [Flang][OpenMP] Refactor loop-related lowering for composite support (PR #97566)
@@ -518,8 +518,8 @@ struct OpWithBodyGenInfo { } OpWithBodyGenInfo & - setReductions(llvm::SmallVectorImpl *value1, -llvm::SmallVectorImpl *value2) { + setReductions(llvm::ArrayRef *value1, +llvm::ArrayRef *value2) { tblah wrote: Can we pass these by value instead of as pointers now? https://github.com/llvm/llvm-project/pull/97566 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
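For illustration of the suggestion above: `llvm::ArrayRef` is a small, non-owning (pointer, size) view, so it is normally passed by value rather than through a pointer. A minimal sketch under that assumption, with placeholder element types and member names since the real ones are elided in the diff above:

```c++
#include "llvm/ADT/ArrayRef.h"
#include "llvm/ADT/SmallVector.h"

// Hypothetical sketch: the setter takes ArrayRef by value; callers can pass a
// SmallVector directly, and an empty ArrayRef{} replaces the old nullptr case.
// Note that ArrayRef does not own its elements, so the caller must keep the
// underlying storage alive for as long as the view is used.
struct GenInfoSketch {
  llvm::ArrayRef<int> reductionTypes;

  GenInfoSketch &setReductions(llvm::ArrayRef<int> types) {
    reductionTypes = types;
    return *this;
  }
};

int main() {
  llvm::SmallVector<int> types{1, 2, 3};
  GenInfoSketch info;
  info.setReductions(types); // implicit SmallVector -> ArrayRef conversion
  return 0;
}
```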
[llvm-branch-commits] [flang] [Flang][OpenMP] Refactor loop-related lowering for composite support (PR #97566)
https://github.com/tblah commented: This looks good overall. Do you expect there to be a lot more wrapper operations in the near future? If so, it looks like there is a common pattern that could be further abstracted. Something like

```c++
template <typename OP>
static OP genWrapperOp(..., llvm::ArrayRef<mlir::Type> extraBlockArgTypes) {
  fir::FirOpBuilder &firOpBuilder = ...;
  auto op = firOpBuilder.create<OP>(loc, clauseOps);
  llvm::SmallVector<mlir::Location> blockArgLocs(extraBlockArgTypes.size(), loc);
  firOpBuilder.createBlock(...)
  firOpBuilder.setInsertionPoint(...)
  return op;
}
```

https://github.com/llvm/llvm-project/pull/97566 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [flang] [Flang][OpenMP] Refactor loop-related lowering for composite support (PR #97566)
@@ -1492,7 +1466,7 @@ genParallelOp(lower::AbstractConverter &converter, lower::SymMap &symTable, firOpBuilder.createBlock(&region, /*insertPt=*/{}, allRegionArgTypes, allRegionArgLocs); -llvm::SmallVector allSymbols = reductionSyms; +llvm::SmallVector allSymbols(reductionSyms); tblah wrote: ultra-nit, feel free to ignore. Personally I don't like using `()` here because it can parse (to humans and computers) like a function declaration.

```suggestion
llvm::SmallVector allSymbols{reductionSyms};
```

https://github.com/llvm/llvm-project/pull/97566 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
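For readers unfamiliar with the nit above, here is a minimal standalone illustration (hypothetical types, not from the patch) of why the `()` spelling can read like a declaration while `{}` cannot:

```c++
#include <vector>

struct Sym {};

int main() {
  // Most vexing parse: this declares a function `a` returning std::vector<Sym>;
  // it does not define an empty vector.
  std::vector<Sym> a();

  // The brace form is unambiguously an object definition.
  std::vector<Sym> b{};

  // With an argument, the ()-form is an object as far as the compiler is
  // concerned, but to a reader it can still look like a declaration; the {}
  // spelling avoids that ambiguity.
  std::vector<Sym> source;
  std::vector<Sym> byParens(source);
  std::vector<Sym> byBraces{source};
  return 0;
}
```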
[llvm-branch-commits] [flang] [Flang][OpenMP] Prevent allocas from being inserted into loop wrappers (PR #97563)
https://github.com/tblah approved this pull request. Code changes look good. I would prefer to see a lit test for this code path. It is good enough if this is used by a test added later in your PR stack. Otherwise, please could you add a lit test which hits this condition so that we can catch if this gets broken by any changes in the future. https://github.com/llvm/llvm-project/pull/97563 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [flang] [Flang][OpenMP] Add lowering support for DISTRIBUTE SIMD (PR #97819)
https://github.com/tblah approved this pull request. LGTM, thanks! https://github.com/llvm/llvm-project/pull/97819 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [flang] [mlir] [Flang][OpenMP] Add lowering support for DO SIMD (PR #97718)
https://github.com/tblah approved this pull request. LGTM https://github.com/llvm/llvm-project/pull/97718 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [mlir] [MLIR][OpenMP] Add missing clauses to OpenMP op definitions (PR #99507)
https://github.com/tblah commented: I think it would be best to add errors for these unsupported clauses to the OpenMPToLLVMIR translation, so that clauses in MLIR are not silently ignored. https://github.com/llvm/llvm-project/pull/99507 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
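As a rough sketch of what such a check could look like in the OpenMP-to-LLVM-IR translation; the op and clause accessor used here are illustrative assumptions, not necessarily the clauses added by this PR:

```c++
#include "mlir/Dialect/OpenMP/OpenMPDialect.h"
#include "mlir/Support/LogicalResult.h"

// Hypothetical sketch: emit an error (and fail translation) for a clause that
// is representable in the dialect but not yet lowered, rather than silently
// ignoring it.
static mlir::LogicalResult checkUnsupportedClauses(mlir::omp::SimdOp op) {
  if (!op.getLinearVars().empty())
    return op.emitError(
        "linear clause is not yet supported for translation to LLVM IR");
  return mlir::success();
}
```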
[llvm-branch-commits] [mlir] [MLIR][OpenMP] Add missing clauses to OpenMP op definitions (PR #99507)
@@ -238,9 +237,9 @@ def SectionOp : OpenMP_Op<"section", [HasParent<"SectionsOp">], def SectionsOp : OpenMP_Op<"sections", traits = [ AttrSizedOperandSegments ], clauses = [ -// TODO: Complete clause list (private). // TODO: Sort clauses alphabetically. tblah wrote: nit: might as well sort the clauses while you are changing the line https://github.com/llvm/llvm-project/pull/99507 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [mlir] [MLIR][OpenMP] Add missing clauses to OpenMP op definitions (PR #99507)
@@ -432,9 +439,10 @@ def SimdOp : OpenMP_Op<"simd", traits = [ AttrSizedOperandSegments, DeclareOpInterfaceMethods, RecursiveMemoryEffects, SingleBlock ], clauses = [ -// TODO: Complete clause list (linear, private, reduction). +// TODO: Complete clause list (linear). tblah wrote: Why did you decide not to add linear while adding the other missing clauses? https://github.com/llvm/llvm-project/pull/99507 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [mlir] [MLIR][OpenMP] Add missing clauses to OpenMP op definitions (PR #99507)
https://github.com/tblah edited https://github.com/llvm/llvm-project/pull/99507 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [mlir] [MLIR][OpenMP] Create `LoopRelatedClause` (PR #99506)
https://github.com/tblah approved this pull request. LGTM, thanks! https://github.com/llvm/llvm-project/pull/99506 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [flang] [Flang][OpenMP] Add frontend support for -fopenmp-targets (PR #100155)
https://github.com/tblah approved this pull request. LGTM, thanks https://github.com/llvm/llvm-project/pull/100155 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [flang] release/19.x: [flang][debug] Set scope of internal functions correctly. (#99531) (PR #100727)
https://github.com/tblah approved this pull request. https://github.com/llvm/llvm-project/pull/100727 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [flang] [mlir] [MLIR][OpenMP] NFC: Sort clauses alphabetically (2/2) (PR #101194)
https://github.com/tblah approved this pull request. LGTM, thanks again! https://github.com/llvm/llvm-project/pull/101194 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [flang] [flang] Lower omp.workshare to other omp constructs (PR #101446)
@@ -0,0 +1,259 @@ +//===- LowerWorkshare.cpp - special cases for bufferization ---===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===--===// +// Lower omp workshare construct. +//===--===// + +#include "flang/Optimizer/Dialect/FIROps.h" +#include "flang/Optimizer/Dialect/FIRType.h" +#include "flang/Optimizer/OpenMP/Passes.h" +#include "mlir/Dialect/OpenMP/OpenMPDialect.h" +#include "mlir/IR/BuiltinOps.h" +#include "mlir/IR/IRMapping.h" +#include "mlir/IR/OpDefinition.h" +#include "mlir/IR/PatternMatch.h" +#include "mlir/Support/LLVM.h" +#include "mlir/Transforms/GreedyPatternRewriteDriver.h" +#include "llvm/ADT/STLExtras.h" +#include "llvm/ADT/iterator_range.h" + +#include + +namespace flangomp { +#define GEN_PASS_DEF_LOWERWORKSHARE +#include "flang/Optimizer/OpenMP/Passes.h.inc" +} // namespace flangomp + +#define DEBUG_TYPE "lower-workshare" + +using namespace mlir; + +namespace flangomp { +bool shouldUseWorkshareLowering(Operation *op) { + auto workshare = dyn_cast(op->getParentOp()); + if (!workshare) +return false; + return workshare->getParentOfType(); +} +} // namespace flangomp + +namespace { + +struct SingleRegion { + Block::iterator begin, end; +}; + +static bool isSupportedByFirAlloca(Type ty) { + return !isa(ty); +} + +static bool isSafeToParallelize(Operation *op) { + if (isa(op)) +return true; + + llvm::SmallVector effects; + MemoryEffectOpInterface interface = dyn_cast(op); + if (!interface) { +return false; + } + interface.getEffects(effects); + if (effects.empty()) +return true; + + return false; +} + +/// Lowers workshare to a sequence of single-thread regions and parallel loops +/// +/// For example: +/// +/// omp.workshare { +/// %a = fir.allocmem +/// omp.wsloop {} +/// fir.call Assign %b %a +/// fir.freemem %a +/// } +/// +/// becomes +/// +/// omp.single { +/// %a = fir.allocmem +/// fir.store %a %tmp +/// } +/// %a_reloaded = fir.load %tmp +/// omp.wsloop {} +/// omp.single { +/// fir.call Assign %b %a_reloaded +/// fir.freemem %a_reloaded +/// } +/// +/// Note that we allocate temporary memory for values in omp.single's which need +/// to be accessed in all threads in the closest omp.parallel +/// +/// TODO currently we need to be able to access the encompassing omp.parallel so +/// that we can allocate temporaries accessible by all threads outside of it. +/// In case we do not find it, we fall back to converting the omp.workshare to +/// omp.single. +/// To better handle this we should probably enable yielding values out of an +/// omp.single which will be supported by the omp runtime. tblah wrote: Could you use the copyprivate clause to broadcast values from the omp.single to all other threads? https://github.com/llvm/llvm-project/pull/101446 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [flang] [flang] Lower omp.workshare to other omp constructs (PR #101446)
https://github.com/tblah edited https://github.com/llvm/llvm-project/pull/101446 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [flang] [flang] Lower omp.workshare to other omp constructs (PR #101446)
https://github.com/tblah commented: Thank you for your work so far. This is a great start. What is the plan for transforming do loops generated by lowering (e.g. that do not become hlfir.elemental operations and are not generated by hlfir bufferization)? https://github.com/llvm/llvm-project/pull/101446 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [flang] [flang] Lower omp.workshare to other omp constructs (PR #101446)
@@ -0,0 +1,259 @@ +//===- LowerWorkshare.cpp - special cases for bufferization ---===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===--===// +// Lower omp workshare construct. +//===--===// + +#include "flang/Optimizer/Dialect/FIROps.h" +#include "flang/Optimizer/Dialect/FIRType.h" +#include "flang/Optimizer/OpenMP/Passes.h" +#include "mlir/Dialect/OpenMP/OpenMPDialect.h" +#include "mlir/IR/BuiltinOps.h" +#include "mlir/IR/IRMapping.h" +#include "mlir/IR/OpDefinition.h" +#include "mlir/IR/PatternMatch.h" +#include "mlir/Support/LLVM.h" +#include "mlir/Transforms/GreedyPatternRewriteDriver.h" +#include "llvm/ADT/STLExtras.h" +#include "llvm/ADT/iterator_range.h" + +#include + +namespace flangomp { +#define GEN_PASS_DEF_LOWERWORKSHARE +#include "flang/Optimizer/OpenMP/Passes.h.inc" +} // namespace flangomp + +#define DEBUG_TYPE "lower-workshare" + +using namespace mlir; + +namespace flangomp { +bool shouldUseWorkshareLowering(Operation *op) { + auto workshare = dyn_cast(op->getParentOp()); + if (!workshare) +return false; + return workshare->getParentOfType(); +} +} // namespace flangomp + +namespace { + +struct SingleRegion { + Block::iterator begin, end; +}; + +static bool isSupportedByFirAlloca(Type ty) { + return !isa(ty); +} + +static bool isSafeToParallelize(Operation *op) { + if (isa(op)) +return true; + + llvm::SmallVector effects; + MemoryEffectOpInterface interface = dyn_cast(op); + if (!interface) { +return false; + } + interface.getEffects(effects); + if (effects.empty()) +return true; + + return false; +} + +/// Lowers workshare to a sequence of single-thread regions and parallel loops +/// +/// For example: +/// +/// omp.workshare { +/// %a = fir.allocmem +/// omp.wsloop {} +/// fir.call Assign %b %a +/// fir.freemem %a +/// } +/// +/// becomes +/// +/// omp.single { +/// %a = fir.allocmem +/// fir.store %a %tmp +/// } +/// %a_reloaded = fir.load %tmp +/// omp.wsloop {} +/// omp.single { +/// fir.call Assign %b %a_reloaded +/// fir.freemem %a_reloaded +/// } +/// +/// Note that we allocate temporary memory for values in omp.single's which need +/// to be accessed in all threads in the closest omp.parallel +/// +/// TODO currently we need to be able to access the encompassing omp.parallel so +/// that we can allocate temporaries accessible by all threads outside of it. +/// In case we do not find it, we fall back to converting the omp.workshare to +/// omp.single. +/// To better handle this we should probably enable yielding values out of an +/// omp.single which will be supported by the omp runtime. 
+void lowerWorkshare(mlir::omp::WorkshareOp wsOp) { + assert(wsOp.getRegion().getBlocks().size() == 1); + + Location loc = wsOp->getLoc(); + + omp::ParallelOp parallelOp = wsOp->getParentOfType(); + if (!parallelOp) { +wsOp.emitWarning("cannot handle workshare, converting to single"); +Operation *terminator = wsOp.getRegion().front().getTerminator(); +wsOp->getBlock()->getOperations().splice( +wsOp->getIterator(), wsOp.getRegion().front().getOperations()); +terminator->erase(); +return; + } + + OpBuilder allocBuilder(parallelOp); + OpBuilder rootBuilder(wsOp); + IRMapping rootMapping; + + omp::SingleOp singleOp = nullptr; + + auto mapReloadedValue = [&](Value v, OpBuilder singleBuilder, + IRMapping singleMapping) { +if (auto reloaded = rootMapping.lookupOrNull(v)) + return; +Type llvmPtrTy = LLVM::LLVMPointerType::get(allocBuilder.getContext()); +Type ty = v.getType(); +Value alloc, reloaded; +if (isSupportedByFirAlloca(ty)) { + alloc = allocBuilder.create(loc, ty); + singleBuilder.create(loc, singleMapping.lookup(v), alloc); + reloaded = rootBuilder.create(loc, ty, alloc); +} else { tblah wrote: What types are you seeing used which cannot be supported by fir alloca? https://github.com/llvm/llvm-project/pull/101446 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [flang] [flang] Lower omp.workshare to other omp constructs (PR #101446)
@@ -0,0 +1,259 @@ +//===- LowerWorkshare.cpp - special cases for bufferization ---===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===--===// +// Lower omp workshare construct. +//===--===// + +#include "flang/Optimizer/Dialect/FIROps.h" +#include "flang/Optimizer/Dialect/FIRType.h" +#include "flang/Optimizer/OpenMP/Passes.h" +#include "mlir/Dialect/OpenMP/OpenMPDialect.h" +#include "mlir/IR/BuiltinOps.h" +#include "mlir/IR/IRMapping.h" +#include "mlir/IR/OpDefinition.h" +#include "mlir/IR/PatternMatch.h" +#include "mlir/Support/LLVM.h" +#include "mlir/Transforms/GreedyPatternRewriteDriver.h" +#include "llvm/ADT/STLExtras.h" +#include "llvm/ADT/iterator_range.h" + +#include + +namespace flangomp { +#define GEN_PASS_DEF_LOWERWORKSHARE +#include "flang/Optimizer/OpenMP/Passes.h.inc" +} // namespace flangomp + +#define DEBUG_TYPE "lower-workshare" + +using namespace mlir; + +namespace flangomp { +bool shouldUseWorkshareLowering(Operation *op) { + auto workshare = dyn_cast(op->getParentOp()); + if (!workshare) +return false; + return workshare->getParentOfType(); +} +} // namespace flangomp + +namespace { + +struct SingleRegion { + Block::iterator begin, end; +}; + +static bool isSupportedByFirAlloca(Type ty) { + return !isa(ty); +} + +static bool isSafeToParallelize(Operation *op) { + if (isa(op)) +return true; + + llvm::SmallVector effects; + MemoryEffectOpInterface interface = dyn_cast(op); + if (!interface) { +return false; + } + interface.getEffects(effects); + if (effects.empty()) +return true; + + return false; +} + +/// Lowers workshare to a sequence of single-thread regions and parallel loops +/// +/// For example: +/// +/// omp.workshare { +/// %a = fir.allocmem +/// omp.wsloop {} +/// fir.call Assign %b %a +/// fir.freemem %a +/// } +/// +/// becomes +/// +/// omp.single { +/// %a = fir.allocmem +/// fir.store %a %tmp +/// } +/// %a_reloaded = fir.load %tmp +/// omp.wsloop {} +/// omp.single { +/// fir.call Assign %b %a_reloaded +/// fir.freemem %a_reloaded +/// } +/// +/// Note that we allocate temporary memory for values in omp.single's which need +/// to be accessed in all threads in the closest omp.parallel +/// +/// TODO currently we need to be able to access the encompassing omp.parallel so +/// that we can allocate temporaries accessible by all threads outside of it. +/// In case we do not find it, we fall back to converting the omp.workshare to +/// omp.single. +/// To better handle this we should probably enable yielding values out of an +/// omp.single which will be supported by the omp runtime. 
+void lowerWorkshare(mlir::omp::WorkshareOp wsOp) { + assert(wsOp.getRegion().getBlocks().size() == 1); + + Location loc = wsOp->getLoc(); + + omp::ParallelOp parallelOp = wsOp->getParentOfType(); + if (!parallelOp) { +wsOp.emitWarning("cannot handle workshare, converting to single"); +Operation *terminator = wsOp.getRegion().front().getTerminator(); +wsOp->getBlock()->getOperations().splice( +wsOp->getIterator(), wsOp.getRegion().front().getOperations()); +terminator->erase(); +return; + } + + OpBuilder allocBuilder(parallelOp); + OpBuilder rootBuilder(wsOp); + IRMapping rootMapping; + + omp::SingleOp singleOp = nullptr; + + auto mapReloadedValue = [&](Value v, OpBuilder singleBuilder, + IRMapping singleMapping) { +if (auto reloaded = rootMapping.lookupOrNull(v)) + return; +Type llvmPtrTy = LLVM::LLVMPointerType::get(allocBuilder.getContext()); +Type ty = v.getType(); +Value alloc, reloaded; +if (isSupportedByFirAlloca(ty)) { + alloc = allocBuilder.create(loc, ty); + singleBuilder.create(loc, singleMapping.lookup(v), alloc); + reloaded = rootBuilder.create(loc, ty, alloc); +} else { + auto one = allocBuilder.create( + loc, allocBuilder.getI32Type(), 1); + alloc = + allocBuilder.create(loc, llvmPtrTy, llvmPtrTy, one); + Value toStore = singleBuilder + .create( + loc, llvmPtrTy, singleMapping.lookup(v)) + .getResult(0); + singleBuilder.create(loc, toStore, alloc); + reloaded = rootBuilder.create(loc, llvmPtrTy, alloc); + reloaded = + rootBuilder.create(loc, ty, reloaded) + .getResult(0); +} +rootMapping.map(v, reloaded); + }; + + auto moveToSingle = [&](SingleRegion sr, OpBuilder singleBuilder) { +IRMapping singleMapping = rootMapping; + +for (Operation &op : llvm::make_range(sr.begin, sr.end)) { + singleBuilder.clone(op, singleMapping); + if (i
[llvm-branch-commits] [flang] [flang] Lower omp.workshare to other omp constructs (PR #101446)
@@ -344,6 +345,7 @@ inline void createHLFIRToFIRPassPipeline( pm.addPass(hlfir::createLowerHLFIRIntrinsics()); pm.addPass(hlfir::createBufferizeHLFIR()); pm.addPass(hlfir::createConvertHLFIRtoFIR()); + pm.addPass(flangomp::createLowerWorkshare()); tblah wrote: The other OpenMP passes are added in `createOpenMPFIRPassPipeline`, which is only called when `-fopenmp` is used. It would be convenient if this new pass could stay with the other OpenMP passes. Currently those passes are run immediately after lowering. There are comments which say they have to be run immediately after lowering, but at a glance it isn't obvious why they couldn't be run here after HLFIR. @agozillon what do you think? https://github.com/llvm/llvm-project/pull/101446 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [flang] [flang] Lower omp.workshare to other omp constructs (PR #101446)
@@ -0,0 +1,18 @@ +//===-- Passes.td - HLFIR pass definition file -*- tablegen -*-===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===--===// + +#ifndef FORTRAN_DIALECT_OPENMP_PASSES +#define FORTRAN_DIALECT_OPENMP_PASSES + +include "mlir/Pass/PassBase.td" + +def LowerWorkshare : Pass<"lower-workshare"> { tblah wrote: This pass doesn't have an operation type associated with it and so `pm.addPass` will run it on every operation in the module (and the module itself). I think we can only get workshare operations inside of functions so maybe this should be run on func.func. Maybe for more future proofing you could run it on all top level operations (e.g. `addNestedPassToAllTopLevelOperations` instead of `pm.addPass`). I think the pass has to be run on the parent of the workshare loop not on the workshare loop operation itself because operations are inserted and removed from that parent. https://github.com/llvm/llvm-project/pull/101446 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
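For reference, a sketch of the function-anchored registration, assuming the tablegen-generated `createLowerWorkshare()` factory (names illustrative):

```c++
#include "flang/Optimizer/OpenMP/Passes.h"
#include "mlir/Dialect/Func/IR/FuncOps.h"
#include "mlir/Pass/PassManager.h"

// Hypothetical sketch: run the pass once per func.func instead of on every
// operation in the module. Using a helper such as
// addNestedPassToAllTopLevelOperations would be the more future-proof
// alternative mentioned above.
static void addLowerWorksharePass(mlir::PassManager &pm) {
  pm.addNestedPass<mlir::func::FuncOp>(flangomp::createLowerWorkshare());
}
```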
[llvm-branch-commits] [flang] [flang] Lower omp.workshare to other omp constructs (PR #101446)
@@ -0,0 +1,259 @@ +//===- LowerWorkshare.cpp - special cases for bufferization ---===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===--===// +// Lower omp workshare construct. +//===--===// + +#include "flang/Optimizer/Dialect/FIROps.h" +#include "flang/Optimizer/Dialect/FIRType.h" +#include "flang/Optimizer/OpenMP/Passes.h" +#include "mlir/Dialect/OpenMP/OpenMPDialect.h" +#include "mlir/IR/BuiltinOps.h" +#include "mlir/IR/IRMapping.h" +#include "mlir/IR/OpDefinition.h" +#include "mlir/IR/PatternMatch.h" +#include "mlir/Support/LLVM.h" +#include "mlir/Transforms/GreedyPatternRewriteDriver.h" +#include "llvm/ADT/STLExtras.h" +#include "llvm/ADT/iterator_range.h" + +#include + +namespace flangomp { +#define GEN_PASS_DEF_LOWERWORKSHARE +#include "flang/Optimizer/OpenMP/Passes.h.inc" +} // namespace flangomp + +#define DEBUG_TYPE "lower-workshare" + +using namespace mlir; + +namespace flangomp { +bool shouldUseWorkshareLowering(Operation *op) { + auto workshare = dyn_cast(op->getParentOp()); + if (!workshare) +return false; + return workshare->getParentOfType(); +} +} // namespace flangomp + +namespace { + +struct SingleRegion { + Block::iterator begin, end; +}; + +static bool isSupportedByFirAlloca(Type ty) { + return !isa(ty); +} + +static bool isSafeToParallelize(Operation *op) { + if (isa(op)) +return true; + + llvm::SmallVector effects; + MemoryEffectOpInterface interface = dyn_cast(op); + if (!interface) { +return false; + } + interface.getEffects(effects); + if (effects.empty()) +return true; + + return false; +} + +/// Lowers workshare to a sequence of single-thread regions and parallel loops +/// +/// For example: +/// +/// omp.workshare { +/// %a = fir.allocmem +/// omp.wsloop {} +/// fir.call Assign %b %a +/// fir.freemem %a +/// } +/// +/// becomes +/// +/// omp.single { +/// %a = fir.allocmem +/// fir.store %a %tmp +/// } +/// %a_reloaded = fir.load %tmp +/// omp.wsloop {} +/// omp.single { +/// fir.call Assign %b %a_reloaded +/// fir.freemem %a_reloaded +/// } +/// +/// Note that we allocate temporary memory for values in omp.single's which need +/// to be accessed in all threads in the closest omp.parallel +/// +/// TODO currently we need to be able to access the encompassing omp.parallel so +/// that we can allocate temporaries accessible by all threads outside of it. +/// In case we do not find it, we fall back to converting the omp.workshare to +/// omp.single. +/// To better handle this we should probably enable yielding values out of an +/// omp.single which will be supported by the omp runtime. 
+void lowerWorkshare(mlir::omp::WorkshareOp wsOp) { + assert(wsOp.getRegion().getBlocks().size() == 1); + + Location loc = wsOp->getLoc(); + + omp::ParallelOp parallelOp = wsOp->getParentOfType(); + if (!parallelOp) { +wsOp.emitWarning("cannot handle workshare, converting to single"); +Operation *terminator = wsOp.getRegion().front().getTerminator(); +wsOp->getBlock()->getOperations().splice( +wsOp->getIterator(), wsOp.getRegion().front().getOperations()); +terminator->erase(); +return; + } + + OpBuilder allocBuilder(parallelOp); + OpBuilder rootBuilder(wsOp); + IRMapping rootMapping; + + omp::SingleOp singleOp = nullptr; + + auto mapReloadedValue = [&](Value v, OpBuilder singleBuilder, + IRMapping singleMapping) { +if (auto reloaded = rootMapping.lookupOrNull(v)) + return; +Type llvmPtrTy = LLVM::LLVMPointerType::get(allocBuilder.getContext()); +Type ty = v.getType(); +Value alloc, reloaded; +if (isSupportedByFirAlloca(ty)) { + alloc = allocBuilder.create(loc, ty); + singleBuilder.create(loc, singleMapping.lookup(v), alloc); tblah wrote: What about more complicated types e.g. an allocatable array? If you only shallow copy the descriptor won't every thread in the parallel region try to free the same allocation? https://github.com/llvm/llvm-project/pull/101446 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [flang] [flang] Lower omp.workshare to other omp constructs (PR #101446)
@@ -2,3 +2,4 @@ add_subdirectory(CodeGen) add_subdirectory(Dialect) add_subdirectory(HLFIR) add_subdirectory(Transforms) +add_subdirectory(OpenMP) tblah wrote: There are some other OpenMP passes already in `flang/lib/Optimizer/Transforms/OMP*.cpp`. I prefer creating a separate subdirectory as you have done here. Please could you move the other passes here too. https://github.com/llvm/llvm-project/pull/101446 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [flang] [flang] Lower omp.workshare to other omp constructs (PR #101446)
@@ -792,7 +793,8 @@ struct ElementalOpConversion // Generate a loop nest looping around the fir.elemental shape and clone // fir.elemental region inside the inner loop. hlfir::LoopNest loopNest = -hlfir::genLoopNest(loc, builder, extents, !elemental.isOrdered()); +hlfir::genLoopNest(loc, builder, extents, !elemental.isOrdered(), + flangomp::shouldUseWorkshareLowering(elemental)); tblah wrote: I'm a bit worried about hlfir.elementals with the isOrdered flag being lowered as workshare loops. This could happen for example for elemental calls of impure functions. Does the standard say how these should be handled? https://github.com/llvm/llvm-project/pull/101446 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [flang] [flang] Lower omp.workshare to other omp constructs (PR #101446)
@@ -0,0 +1,259 @@ +//===- LowerWorkshare.cpp - special cases for bufferization ---===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===--===// +// Lower omp workshare construct. +//===--===// + +#include "flang/Optimizer/Dialect/FIROps.h" +#include "flang/Optimizer/Dialect/FIRType.h" +#include "flang/Optimizer/OpenMP/Passes.h" +#include "mlir/Dialect/OpenMP/OpenMPDialect.h" +#include "mlir/IR/BuiltinOps.h" +#include "mlir/IR/IRMapping.h" +#include "mlir/IR/OpDefinition.h" +#include "mlir/IR/PatternMatch.h" +#include "mlir/Support/LLVM.h" +#include "mlir/Transforms/GreedyPatternRewriteDriver.h" +#include "llvm/ADT/STLExtras.h" +#include "llvm/ADT/iterator_range.h" + +#include + +namespace flangomp { +#define GEN_PASS_DEF_LOWERWORKSHARE +#include "flang/Optimizer/OpenMP/Passes.h.inc" +} // namespace flangomp + +#define DEBUG_TYPE "lower-workshare" + +using namespace mlir; + +namespace flangomp { +bool shouldUseWorkshareLowering(Operation *op) { + auto workshare = dyn_cast(op->getParentOp()); tblah wrote: why does the workshare op have to be the immediate parent? Couldn't there be another operation in between (e.g a `fir.if`?) https://github.com/llvm/llvm-project/pull/101446 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [flang] [flang] Lower omp.workshare to other omp constructs (PR #101446)
@@ -0,0 +1,259 @@ +//===- LowerWorkshare.cpp - special cases for bufferization ---===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===--===// +// Lower omp workshare construct. +//===--===// + +#include "flang/Optimizer/Dialect/FIROps.h" +#include "flang/Optimizer/Dialect/FIRType.h" +#include "flang/Optimizer/OpenMP/Passes.h" +#include "mlir/Dialect/OpenMP/OpenMPDialect.h" +#include "mlir/IR/BuiltinOps.h" +#include "mlir/IR/IRMapping.h" +#include "mlir/IR/OpDefinition.h" +#include "mlir/IR/PatternMatch.h" +#include "mlir/Support/LLVM.h" +#include "mlir/Transforms/GreedyPatternRewriteDriver.h" +#include "llvm/ADT/STLExtras.h" +#include "llvm/ADT/iterator_range.h" + +#include + +namespace flangomp { +#define GEN_PASS_DEF_LOWERWORKSHARE +#include "flang/Optimizer/OpenMP/Passes.h.inc" +} // namespace flangomp + +#define DEBUG_TYPE "lower-workshare" + +using namespace mlir; + +namespace flangomp { +bool shouldUseWorkshareLowering(Operation *op) { + auto workshare = dyn_cast(op->getParentOp()); + if (!workshare) +return false; + return workshare->getParentOfType(); +} +} // namespace flangomp + +namespace { + +struct SingleRegion { + Block::iterator begin, end; +}; + +static bool isSupportedByFirAlloca(Type ty) { + return !isa(ty); +} + +static bool isSafeToParallelize(Operation *op) { + if (isa(op)) +return true; + + llvm::SmallVector effects; + MemoryEffectOpInterface interface = dyn_cast(op); + if (!interface) { +return false; + } + interface.getEffects(effects); + if (effects.empty()) +return true; + + return false; +} + +/// Lowers workshare to a sequence of single-thread regions and parallel loops +/// +/// For example: +/// +/// omp.workshare { +/// %a = fir.allocmem +/// omp.wsloop {} +/// fir.call Assign %b %a +/// fir.freemem %a +/// } +/// +/// becomes +/// +/// omp.single { +/// %a = fir.allocmem +/// fir.store %a %tmp +/// } +/// %a_reloaded = fir.load %tmp +/// omp.wsloop {} +/// omp.single { +/// fir.call Assign %b %a_reloaded +/// fir.freemem %a_reloaded +/// } +/// +/// Note that we allocate temporary memory for values in omp.single's which need +/// to be accessed in all threads in the closest omp.parallel +/// +/// TODO currently we need to be able to access the encompassing omp.parallel so +/// that we can allocate temporaries accessible by all threads outside of it. +/// In case we do not find it, we fall back to converting the omp.workshare to +/// omp.single. +/// To better handle this we should probably enable yielding values out of an +/// omp.single which will be supported by the omp runtime. 
+void lowerWorkshare(mlir::omp::WorkshareOp wsOp) { + assert(wsOp.getRegion().getBlocks().size() == 1); + + Location loc = wsOp->getLoc(); + + omp::ParallelOp parallelOp = wsOp->getParentOfType(); + if (!parallelOp) { +wsOp.emitWarning("cannot handle workshare, converting to single"); +Operation *terminator = wsOp.getRegion().front().getTerminator(); +wsOp->getBlock()->getOperations().splice( +wsOp->getIterator(), wsOp.getRegion().front().getOperations()); +terminator->erase(); +return; + } + + OpBuilder allocBuilder(parallelOp); + OpBuilder rootBuilder(wsOp); + IRMapping rootMapping; + + omp::SingleOp singleOp = nullptr; + + auto mapReloadedValue = [&](Value v, OpBuilder singleBuilder, + IRMapping singleMapping) { +if (auto reloaded = rootMapping.lookupOrNull(v)) + return; +Type llvmPtrTy = LLVM::LLVMPointerType::get(allocBuilder.getContext()); +Type ty = v.getType(); +Value alloc, reloaded; +if (isSupportedByFirAlloca(ty)) { + alloc = allocBuilder.create(loc, ty); + singleBuilder.create(loc, singleMapping.lookup(v), alloc); + reloaded = rootBuilder.create(loc, ty, alloc); +} else { + auto one = allocBuilder.create( + loc, allocBuilder.getI32Type(), 1); + alloc = + allocBuilder.create(loc, llvmPtrTy, llvmPtrTy, one); + Value toStore = singleBuilder + .create( + loc, llvmPtrTy, singleMapping.lookup(v)) + .getResult(0); + singleBuilder.create(loc, toStore, alloc); + reloaded = rootBuilder.create(loc, llvmPtrTy, alloc); + reloaded = + rootBuilder.create(loc, ty, reloaded) + .getResult(0); +} +rootMapping.map(v, reloaded); + }; + + auto moveToSingle = [&](SingleRegion sr, OpBuilder singleBuilder) { +IRMapping singleMapping = rootMapping; + +for (Operation &op : llvm::make_range(sr.begin, sr.end)) { + singleBuilder.clone(op, singleMapping); + if (i
[llvm-branch-commits] [flang] [flang] Lower omp.workshare to other omp constructs (PR #101446)
@@ -0,0 +1,259 @@ +//===- LowerWorkshare.cpp - special cases for bufferization ---===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===--===// +// Lower omp workshare construct. +//===--===// + +#include "flang/Optimizer/Dialect/FIROps.h" +#include "flang/Optimizer/Dialect/FIRType.h" +#include "flang/Optimizer/OpenMP/Passes.h" +#include "mlir/Dialect/OpenMP/OpenMPDialect.h" +#include "mlir/IR/BuiltinOps.h" +#include "mlir/IR/IRMapping.h" +#include "mlir/IR/OpDefinition.h" +#include "mlir/IR/PatternMatch.h" +#include "mlir/Support/LLVM.h" +#include "mlir/Transforms/GreedyPatternRewriteDriver.h" +#include "llvm/ADT/STLExtras.h" +#include "llvm/ADT/iterator_range.h" + +#include + +namespace flangomp { +#define GEN_PASS_DEF_LOWERWORKSHARE +#include "flang/Optimizer/OpenMP/Passes.h.inc" +} // namespace flangomp + +#define DEBUG_TYPE "lower-workshare" + +using namespace mlir; + +namespace flangomp { +bool shouldUseWorkshareLowering(Operation *op) { + auto workshare = dyn_cast(op->getParentOp()); + if (!workshare) +return false; + return workshare->getParentOfType(); +} +} // namespace flangomp + +namespace { + +struct SingleRegion { + Block::iterator begin, end; +}; + +static bool isSupportedByFirAlloca(Type ty) { + return !isa(ty); +} + +static bool isSafeToParallelize(Operation *op) { + if (isa(op)) +return true; + + llvm::SmallVector effects; + MemoryEffectOpInterface interface = dyn_cast(op); + if (!interface) { +return false; + } + interface.getEffects(effects); + if (effects.empty()) +return true; + + return false; +} + +/// Lowers workshare to a sequence of single-thread regions and parallel loops +/// +/// For example: +/// +/// omp.workshare { +/// %a = fir.allocmem +/// omp.wsloop {} +/// fir.call Assign %b %a +/// fir.freemem %a +/// } +/// +/// becomes +/// +/// omp.single { +/// %a = fir.allocmem +/// fir.store %a %tmp +/// } +/// %a_reloaded = fir.load %tmp +/// omp.wsloop {} +/// omp.single { +/// fir.call Assign %b %a_reloaded +/// fir.freemem %a_reloaded +/// } +/// +/// Note that we allocate temporary memory for values in omp.single's which need +/// to be accessed in all threads in the closest omp.parallel +/// +/// TODO currently we need to be able to access the encompassing omp.parallel so +/// that we can allocate temporaries accessible by all threads outside of it. +/// In case we do not find it, we fall back to converting the omp.workshare to +/// omp.single. +/// To better handle this we should probably enable yielding values out of an +/// omp.single which will be supported by the omp runtime. 
+void lowerWorkshare(mlir::omp::WorkshareOp wsOp) { + assert(wsOp.getRegion().getBlocks().size() == 1); + + Location loc = wsOp->getLoc(); + + omp::ParallelOp parallelOp = wsOp->getParentOfType(); + if (!parallelOp) { +wsOp.emitWarning("cannot handle workshare, converting to single"); +Operation *terminator = wsOp.getRegion().front().getTerminator(); +wsOp->getBlock()->getOperations().splice( +wsOp->getIterator(), wsOp.getRegion().front().getOperations()); +terminator->erase(); +return; + } + + OpBuilder allocBuilder(parallelOp); + OpBuilder rootBuilder(wsOp); + IRMapping rootMapping; + + omp::SingleOp singleOp = nullptr; + + auto mapReloadedValue = [&](Value v, OpBuilder singleBuilder, + IRMapping singleMapping) { +if (auto reloaded = rootMapping.lookupOrNull(v)) + return; +Type llvmPtrTy = LLVM::LLVMPointerType::get(allocBuilder.getContext()); +Type ty = v.getType(); +Value alloc, reloaded; +if (isSupportedByFirAlloca(ty)) { + alloc = allocBuilder.create(loc, ty); + singleBuilder.create(loc, singleMapping.lookup(v), alloc); tblah wrote: Ahh you are correct for the allocatable array case. If the fir.freemem or runtime destroy call is in a single region this will be fine. https://github.com/llvm/llvm-project/pull/101446 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [flang] [flang] Lower omp.workshare to other omp constructs (PR #101446)
https://github.com/tblah commented: Thank you for all of the updates!
> So the LowerWorksharePass that I have implemented here is tasked with parallelizing the loops nested in workshare_loop_wrapper and both the Fortran->mlir frontend and the hlfir lowering passes would be responsible for emitting the workshare_loop_wrapper ops where appropriate. For that I have started with some of the obvious lowerings in the hlfir bufferizations, but perhaps that can be done gradually and not everything needs to be covered by this PR. Let me know what you think.

Doing it gradually sounds good to me. When you take this out of draft, please document in the commit message exactly what is and is not supported at this point. https://github.com/llvm/llvm-project/pull/101446 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [flang] [flang] Lower omp.workshare to other omp constructs (PR #101446)
https://github.com/tblah edited https://github.com/llvm/llvm-project/pull/101446 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [flang] [flang] Lower omp.workshare to other omp constructs (PR #101446)
@@ -0,0 +1,399 @@ +//===- LowerWorkshare.cpp - special cases for bufferization ---===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===--===// +// Lower omp workshare construct. +//===--===// + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include + +namespace flangomp { +#define GEN_PASS_DEF_LOWERWORKSHARE +#include "flang/Optimizer/OpenMP/Passes.h.inc" +} // namespace flangomp + +#define DEBUG_TYPE "lower-workshare" + +using namespace mlir; + +namespace flangomp { +bool shouldUseWorkshareLowering(Operation *op) { + // TODO this is insufficient, as we could have + // omp.parallel { + // omp.workshare { + // omp.parallel { + // hlfir.elemental {} + // + // Then this hlfir.elemental shall _not_ use the lowering for workshare + // + // Standard says: + // For a parallel construct, the construct is a unit of work with respect to + // the workshare construct. The statements contained in the parallel + // construct are executed by a new thread team. + // + // TODO similarly for single, critical, etc. Need to think through the + // patterns and implement this function. + // + return op->getParentOfType(); +} +} // namespace flangomp + +namespace { + +struct SingleRegion { + Block::iterator begin, end; +}; + +static bool mustParallelizeOp(Operation *op) { + // TODO as in shouldUseWorkshareLowering we be careful not to pick up + // workshare_loop_wrapper in nested omp.parallel ops + // + // e.g. + // + // omp.parallel { + // omp.workshare { + // omp.parallel { + // omp.workshare { + // omp.workshare_loop_wrapper {} + return op + ->walk( + [](omp::WorkshareLoopWrapperOp) { return WalkResult::interrupt(); }) + .wasInterrupted(); +} + +static bool isSafeToParallelize(Operation *op) { + return isa(op) || isa(op) || + isMemoryEffectFree(op); +} + +static mlir::func::FuncOp createCopyFunc(mlir::Location loc, mlir::Type varType, tblah wrote: Could this re-use `createCopyFunc` from `ClauseProcessor.cpp`? It would be good to only implement copyprivate in one place. If not, please could you document the difference between workshare copyprivate and the implementation of the copyprivate clause. https://github.com/llvm/llvm-project/pull/101446 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [mlir] [mlir][OpenMP] Convert reduction alloc region to LLVMIR (PR #102524)
https://github.com/tblah created https://github.com/llvm/llvm-project/pull/102524 The intention of this change is to ensure that allocas end up in the entry block not spread out amongst complex reduction variable initialization code. The tests we have are quite minimized for readability and maintainability, making the benefits less obvious. The use case for this is when there are multiple reduction variables each will multiple blocks inside of the init region for that reduction. 2/3 Part 1: https://github.com/llvm/llvm-project/pull/102522 >From 05093116a50eff847d3cfea08b5e7e02cd80a527 Mon Sep 17 00:00:00 2001 From: Tom Eccles Date: Wed, 31 Jul 2024 10:10:33 + Subject: [PATCH] [mlir][OpenMP] Convert reduction alloc region to LLVMIR The intention of this change is to ensure that allocas end up in the entry block not spread out amongst complex reduction variable initialization code. The tests we have are quite minimized for readability and maintainability, making the benefits less obvious. The use case for this is when there are multiple reduction variables each will multiple blocks inside of the init region for that reduction. --- .../OpenMP/OpenMPToLLVMIRTranslation.cpp | 134 +- mlir/test/Target/LLVMIR/openmp-private.mlir | 6 +- .../openmp-reduction-array-sections.mlir | 14 +- .../Target/LLVMIR/openmp-reduction-byref.mlir | 12 +- 4 files changed, 119 insertions(+), 47 deletions(-) diff --git a/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp b/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp index 458d05d5059db7..e3caf90696d699 100644 --- a/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp +++ b/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp @@ -594,45 +594,85 @@ convertOmpOrderedRegion(Operation &opInst, llvm::IRBuilderBase &builder, /// Allocate space for privatized reduction variables. 
template -static void allocByValReductionVars( -T loop, ArrayRef reductionArgs, llvm::IRBuilderBase &builder, -LLVM::ModuleTranslation &moduleTranslation, -llvm::OpenMPIRBuilder::InsertPointTy &allocaIP, -SmallVectorImpl &reductionDecls, -SmallVectorImpl &privateReductionVariables, -DenseMap &reductionVariableMap, -llvm::ArrayRef isByRefs) { +static LogicalResult +allocReductionVars(T loop, ArrayRef reductionArgs, + llvm::IRBuilderBase &builder, + LLVM::ModuleTranslation &moduleTranslation, + llvm::OpenMPIRBuilder::InsertPointTy &allocaIP, + SmallVectorImpl &reductionDecls, + SmallVectorImpl &privateReductionVariables, + DenseMap &reductionVariableMap, + llvm::ArrayRef isByRefs) { llvm::IRBuilderBase::InsertPointGuard guard(builder); builder.SetInsertPoint(allocaIP.getBlock()->getTerminator()); + // delay creating stores until after all allocas + SmallVector> storesToCreate; + storesToCreate.reserve(loop.getNumReductionVars()); + for (std::size_t i = 0; i < loop.getNumReductionVars(); ++i) { -if (isByRefs[i]) - continue; -llvm::Value *var = builder.CreateAlloca( -moduleTranslation.convertType(reductionDecls[i].getType())); -moduleTranslation.mapValue(reductionArgs[i], var); -privateReductionVariables[i] = var; -reductionVariableMap.try_emplace(loop.getReductionVars()[i], var); +Region &allocRegion = reductionDecls[i].getAllocRegion(); +if (isByRefs[i]) { + if (allocRegion.empty()) +continue; + + SmallVector phis; + if (failed(inlineConvertOmpRegions(allocRegion, "omp.reduction.alloc", + builder, moduleTranslation, &phis))) +return failure(); + assert(phis.size() == 1 && "expected one allocation to be yielded"); + + builder.SetInsertPoint(allocaIP.getBlock()->getTerminator()); + + // Allocate reduction variable (which is a pointer to the real reduction + // variable allocated in the inlined region) + llvm::Value *var = builder.CreateAlloca( + moduleTranslation.convertType(reductionDecls[i].getType())); + storesToCreate.emplace_back(phis[0], var); + + privateReductionVariables[i] = var; + moduleTranslation.mapValue(reductionArgs[i], phis[0]); + reductionVariableMap.try_emplace(loop.getReductionVars()[i], phis[0]); +} else { + assert(allocRegion.empty() && + "allocaction is implicit for by-val reduction"); + llvm::Value *var = builder.CreateAlloca( + moduleTranslation.convertType(reductionDecls[i].getType())); + moduleTranslation.mapValue(reductionArgs[i], var); + privateReductionVariables[i] = var; + reductionVariableMap.try_emplace(loop.getReductionVars()[i], var); +} } + + // TODO: further delay this so it doesn't come in the entry block at all + for (auto [data, addr] : storesToCreate) +builder
[llvm-branch-commits] [flang] [flang][OpenMP] use reduction alloc region (PR #102525)
https://github.com/tblah created https://github.com/llvm/llvm-project/pull/102525 I removed the `*-hlfir*` tests because they are duplicate now that the other tests have been updated to use the HLFIR lowering. 3/3 Part 1: https://github.com/llvm/llvm-project/pull/102522 Part 2: https://github.com/llvm/llvm-project/pull/102524 >From 2c0a25e9821d1f08e471d0d215cc22250252c580 Mon Sep 17 00:00:00 2001 From: Tom Eccles Date: Tue, 30 Jul 2024 10:56:36 + Subject: [PATCH] [flang][OpenMP] use reduction alloc region I removed the `*-hlfir*` tests because they are duplicate now that the other tests have been updated to use the HLFIR lowering. --- flang/lib/Lower/OpenMP/ReductionProcessor.cpp | 86 ++- .../delayed-privatization-reduction-byref.f90 | 2 +- .../OpenMP/parallel-reduction-add-byref.f90 | 20 +++-- .../parallel-reduction-allocatable-array.f90 | 14 +-- .../OpenMP/parallel-reduction-array-lb.f90| 12 +-- .../Lower/OpenMP/parallel-reduction-array.f90 | 12 +-- .../OpenMP/parallel-reduction-array2.f90 | 12 +-- .../Lower/OpenMP/parallel-reduction-byref.f90 | 12 +-- .../Lower/OpenMP/parallel-reduction-mixed.f90 | 4 +- .../parallel-reduction-pointer-array.f90 | 14 +-- .../test/Lower/OpenMP/parallel-reduction3.f90 | 12 +-- .../OpenMP/reduction-array-intrinsic.f90 | 12 +-- .../Lower/OpenMP/sections-array-reduction.f90 | 5 +- .../OpenMP/wsloop-reduction-add-byref.f90 | 48 ++- .../wsloop-reduction-add-hlfir-byref.f90 | 58 - .../OpenMP/wsloop-reduction-add-hlfir.f90 | 54 ...oop-reduction-allocatable-array-minmax.f90 | 28 +++--- .../OpenMP/wsloop-reduction-allocatable.f90 | 14 +-- .../wsloop-reduction-array-assumed-shape.f90 | 12 +-- .../Lower/OpenMP/wsloop-reduction-array.f90 | 12 +-- .../Lower/OpenMP/wsloop-reduction-array2.f90 | 12 +-- .../OpenMP/wsloop-reduction-iand-byref.f90| 10 ++- .../OpenMP/wsloop-reduction-ieor-byref.f90| 10 ++- .../OpenMP/wsloop-reduction-ior-byref.f90 | 10 ++- .../wsloop-reduction-logical-and-byref.f90| 12 +-- .../wsloop-reduction-logical-eqv-byref.f90| 12 +-- .../wsloop-reduction-logical-neqv-byref.f90 | 12 +-- .../wsloop-reduction-logical-or-byref.f90 | 12 +-- .../OpenMP/wsloop-reduction-max-byref.f90 | 18 ++-- .../wsloop-reduction-max-hlfir-byref.f90 | 64 -- .../OpenMP/wsloop-reduction-max-hlfir.f90 | 60 - .../OpenMP/wsloop-reduction-min-byref.f90 | 18 ++-- .../OpenMP/wsloop-reduction-mul-byref.f90 | 40 + .../wsloop-reduction-multiple-clauses.f90 | 12 +-- .../Lower/OpenMP/wsloop-reduction-pointer.f90 | 8 +- 35 files changed, 317 insertions(+), 436 deletions(-) delete mode 100644 flang/test/Lower/OpenMP/wsloop-reduction-add-hlfir-byref.f90 delete mode 100644 flang/test/Lower/OpenMP/wsloop-reduction-add-hlfir.f90 delete mode 100644 flang/test/Lower/OpenMP/wsloop-reduction-max-hlfir-byref.f90 delete mode 100644 flang/test/Lower/OpenMP/wsloop-reduction-max-hlfir.f90 diff --git a/flang/lib/Lower/OpenMP/ReductionProcessor.cpp b/flang/lib/Lower/OpenMP/ReductionProcessor.cpp index c3c1f363033c27..c87182abe3d187 100644 --- a/flang/lib/Lower/OpenMP/ReductionProcessor.cpp +++ b/flang/lib/Lower/OpenMP/ReductionProcessor.cpp @@ -489,23 +489,57 @@ static mlir::Type unwrapSeqOrBoxedType(mlir::Type ty) { return ty; } -static mlir::Value -createReductionInitRegion(fir::FirOpBuilder &builder, mlir::Location loc, - mlir::omp::DeclareReductionOp &reductionDecl, - const ReductionProcessor::ReductionIdentifier redId, - mlir::Type type, bool isByRef) { +static void createReductionAllocAndInitRegions( +fir::FirOpBuilder &builder, mlir::Location loc, +mlir::omp::DeclareReductionOp &reductionDecl, 
+const ReductionProcessor::ReductionIdentifier redId, mlir::Type type, +bool isByRef) { + auto yield = [&](mlir::Value ret) { +builder.create(loc, ret); + }; + + mlir::Block *allocBlock = nullptr; + mlir::Block *initBlock = nullptr; + if (isByRef) { +allocBlock = +builder.createBlock(&reductionDecl.getAllocRegion(), +reductionDecl.getAllocRegion().end(), {}, {}); +initBlock = builder.createBlock(&reductionDecl.getInitializerRegion(), +reductionDecl.getInitializerRegion().end(), +{type, type}, {loc, loc}); + } else { +initBlock = builder.createBlock(&reductionDecl.getInitializerRegion(), +reductionDecl.getInitializerRegion().end(), +{type}, {loc}); + } + mlir::Type ty = fir::unwrapRefType(type); + builder.setInsertionPointToEnd(initBlock); mlir::Value initValue = ReductionProcessor::getReductionInitValue( loc, unwrapSeqOrBoxedType(ty), redId, builder); if (fir::isa_trivial(ty)) { if
[llvm-branch-commits] [mlir] [mlir][OpenMP] Convert reduction alloc region to LLVMIR (PR #102524)
https://github.com/tblah edited https://github.com/llvm/llvm-project/pull/102524 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [mlir] [mlir][OpenMP] Convert reduction alloc region to LLVMIR (PR #102524)
tblah wrote: ping for review https://github.com/llvm/llvm-project/pull/102524 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [flang] [flang][OpenMP] use reduction alloc region (PR #102525)
tblah wrote: ping for review https://github.com/llvm/llvm-project/pull/102525 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [mlir] [mlir][OpenMP] Convert reduction alloc region to LLVMIR (PR #102524)
@@ -594,45 +594,85 @@ convertOmpOrderedRegion(Operation &opInst, llvm::IRBuilderBase &builder, /// Allocate space for privatized reduction variables. template -static void allocByValReductionVars( -T loop, ArrayRef reductionArgs, llvm::IRBuilderBase &builder, -LLVM::ModuleTranslation &moduleTranslation, -llvm::OpenMPIRBuilder::InsertPointTy &allocaIP, -SmallVectorImpl &reductionDecls, -SmallVectorImpl &privateReductionVariables, -DenseMap &reductionVariableMap, -llvm::ArrayRef isByRefs) { +static LogicalResult +allocReductionVars(T loop, ArrayRef reductionArgs, + llvm::IRBuilderBase &builder, + LLVM::ModuleTranslation &moduleTranslation, + llvm::OpenMPIRBuilder::InsertPointTy &allocaIP, + SmallVectorImpl &reductionDecls, + SmallVectorImpl &privateReductionVariables, + DenseMap &reductionVariableMap, + llvm::ArrayRef isByRefs) { llvm::IRBuilderBase::InsertPointGuard guard(builder); builder.SetInsertPoint(allocaIP.getBlock()->getTerminator()); + // delay creating stores until after all allocas + SmallVector> storesToCreate; + storesToCreate.reserve(loop.getNumReductionVars()); + for (std::size_t i = 0; i < loop.getNumReductionVars(); ++i) { -if (isByRefs[i]) - continue; -llvm::Value *var = builder.CreateAlloca( -moduleTranslation.convertType(reductionDecls[i].getType())); -moduleTranslation.mapValue(reductionArgs[i], var); -privateReductionVariables[i] = var; -reductionVariableMap.try_emplace(loop.getReductionVars()[i], var); +Region &allocRegion = reductionDecls[i].getAllocRegion(); +if (isByRefs[i]) { + if (allocRegion.empty()) tblah wrote: The alloc region is optional. If it isn't included it could still be included in the initialization region as normal. This could happen for example if there is no part of allocation that is on the stack (because we don't want a call to malloc mixed into the middle of allocas). https://github.com/llvm/llvm-project/pull/102524 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
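To illustrate the case described in the comment above: when a by-ref reduction declaration carries no alloc region, the private copy has to come out of the initializer region instead. A minimal C++ sketch of that decision, assuming only the `mlir::Region::empty()` check (the enum and helper names here are illustrative, not part of the actual translation code):

#include "mlir/IR/Region.h"

// Where the private copy of a by-ref reduction variable comes from.
enum class ByRefPrivateCopySource {
  AllocRegion, // stack (or partly stack) allocation, safe to inline among allocas
  InitRegion   // alloc region absent: the init region allocates (e.g. heap only)
};

static ByRefPrivateCopySource classifyByRefReduction(mlir::Region &allocRegion) {
  // The alloc region is optional, so an empty region means the whole
  // allocation is expected to happen in the initializer region.
  return allocRegion.empty() ? ByRefPrivateCopySource::InitRegion
                             : ByRefPrivateCopySource::AllocRegion;
}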
[llvm-branch-commits] [mlir] [MLIR][omp] Add omp.workshare op (PR #101443)
https://github.com/tblah approved this pull request. LGTM. Thanks for the updates https://github.com/llvm/llvm-project/pull/101443 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [flang] [flang][omp] Emit omp.workshare in frontend (PR #101444)
tblah wrote: > Should we have a `-use-experimental-workshare` or similar flag to facilitate > some temporary in-tree development as this may require more moving pieces? A flag like that sounds appropriate, yes. The current code changes look good. https://github.com/llvm/llvm-project/pull/101444 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [mlir] [flang] [mlir][flang][openmp] Rework wsloop reduction operations (PR #80019)
tblah wrote: Please could you update the documentation for reductions on line 442 - I presume we don't want to encourage `omp.reduction` operations anymore https://github.com/llvm/llvm-project/pull/80019 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [mlir] [flang] [mlir][flang][openmp] Rework wsloop reduction operations (PR #80019)
@@ -398,11 +400,39 @@ struct ParallelOpLowering : public OpRewritePattern { // Replace the reduction operations contained in this loop. Must be done // here rather than in a separate pattern to have access to the list of // reduction variables. +unsigned int reductionIndex = 0; for (auto [x, y] : llvm::zip_equal(reductionVariables, reduce.getOperands())) { OpBuilder::InsertionGuard guard(rewriter); rewriter.setInsertionPoint(reduce); - rewriter.create(reduce.getLoc(), y, x); + Region &redRegion = + ompReductionDecls[reductionIndex].getReductionRegion(); + assert(redRegion.hasOneBlock() && + "expect reduction region to have one block"); tblah wrote: Please could you add a comment explaining why a reduction region must have only one block, or add a TODO for multiple blocks. https://github.com/llvm/llvm-project/pull/80019 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
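For illustration, the requested comment could read roughly like the sketch below; the wording is a suggestion only, and the reasoning (single-block splicing) is inferred from the surrounding rewrite, not quoted from the patch.

// The combiner body is cloned in place of the scf.reduce op from its
// single block, so multi-block combiner regions cannot be spliced in here
// without extra control-flow handling.
// TODO: support combiner regions with more than one block.
assert(redRegion.hasOneBlock() &&
       "expect reduction region to have one block");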
[llvm-branch-commits] [mlir] [flang] [mlir][flang][openmp] Rework wsloop reduction operations (PR #80019)
@@ -398,11 +400,39 @@ struct ParallelOpLowering : public OpRewritePattern { // Replace the reduction operations contained in this loop. Must be done // here rather than in a separate pattern to have access to the list of // reduction variables. +unsigned int reductionIndex = 0; for (auto [x, y] : llvm::zip_equal(reductionVariables, reduce.getOperands())) { tblah wrote: nit: you could add `ompReductionDecls` to the `llvm::zip_equal` so that the loop handles the iteration. https://github.com/llvm/llvm-project/pull/80019 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
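For reference, a self-contained sketch of the suggested shape, with the index variable folded into the zip (the container element types are stand-ins, and the variable names follow the quoted diff rather than the final code):

#include "llvm/ADT/STLExtras.h"
#include "llvm/ADT/SmallVector.h"

void forEachReduction(llvm::SmallVectorImpl<int> &reductionVariables,
                      llvm::SmallVectorImpl<int> &reduceOperands,
                      llvm::SmallVectorImpl<int> &ompReductionDecls) {
  // zip_equal asserts that all three ranges have the same length, so the
  // manually incremented reductionIndex is no longer needed.
  for (auto [var, operand, decl] :
       llvm::zip_equal(reductionVariables, reduceOperands, ompReductionDecls)) {
    (void)var;
    (void)operand;
    (void)decl; // the loop body uses the three elements directly
  }
}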
[llvm-branch-commits] [flang] [mlir] [mlir][flang][openmp] Rework wsloop reduction operations (PR #80019)
https://github.com/tblah approved this pull request. Thanks for the update! https://github.com/llvm/llvm-project/pull/80019 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [flang] [llvm] [mlir] [flang][OpenMP][OMPIRBuilder][mlir] Optionally pass reduction vars by ref (PR #84304)
tblah wrote: Co-authored with @Leporacanthicus (GitHub seems to have taken the tag out of the commit message but shows it in the UI). https://github.com/llvm/llvm-project/pull/84304 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [flang] [flang][TBAABuilder] not all loads and stores are inside of functions (PR #84305)
https://github.com/tblah created https://github.com/llvm/llvm-project/pull/84305 TBAA builder assumed that all loads/stores are inside of functions and hit an assertion once it found loads and stores inside of an omp::ReductionDeclareOp. For now just don't add TBAA tags to those loads and stores. They would end up in a different TBAA tree to the host function after OpenMPIRBuilder inlines them anyway so there isn't an easy way of making this work. Commit series for by-ref OpenMP reductions: 2/3 >From ba0182b73b1c6f8b91efa8d74b185b3bfc0f1cb6 Mon Sep 17 00:00:00 2001 From: Tom Eccles Date: Thu, 15 Feb 2024 13:29:00 + Subject: [PATCH] [flang][TBAABuilder] not all loads and stores are inside of functions TBAA builder assumed that all loads/stores are inside of functions and hit an assertion once it found loads and stores inside of an omp::ReductionDeclareOp. For now just don't add TBAA tags to those loads and stores. They would end up in a different TBAA tree to the host function after OpenMPIRBuilder inlines them anyway so there isn't an easy way of making this work. Commit series for by-ref OpenMP reductions: 2/3 --- flang/lib/Optimizer/CodeGen/TBAABuilder.cpp | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/flang/lib/Optimizer/CodeGen/TBAABuilder.cpp b/flang/lib/Optimizer/CodeGen/TBAABuilder.cpp index 8e7f59f76383c9..b1b0e9b766a625 100644 --- a/flang/lib/Optimizer/CodeGen/TBAABuilder.cpp +++ b/flang/lib/Optimizer/CodeGen/TBAABuilder.cpp @@ -102,7 +102,8 @@ void TBAABuilder::attachTBAATag(AliasAnalysisOpInterface op, Type baseFIRType, return; mlir::LLVM::LLVMFuncOp func = op->getParentOfType(); - assert(func && "func.func should have already been converted to llvm.func"); + if (!func) +return; ++tagAttachmentCounter; if (tagAttachmentLimit != kTagAttachmentUnlimited && ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [flang] [flang][TBAABuilder] not all loads and stores are inside of functions (PR #84305)
tblah wrote: Next PR in the series: https://github.com/llvm/llvm-project/pull/84304 https://github.com/llvm/llvm-project/pull/84305 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [flang] [flang] support fir.alloca operations inside of omp reduction ops (PR #84952)
https://github.com/tblah created https://github.com/llvm/llvm-project/pull/84952 Advise to place the alloca at the start of the first block of whichever region (init or combiner) we are currently inside. It probably isn't safe to put an alloca inside of a combiner region because this will be executed multiple times. But that would be a bug to fix in Lower/OpenMP.cpp, not here. OpenMP array reductions 1/6 >From 63c7fe3c522047b4df964a137ffdf0bbc38acec8 Mon Sep 17 00:00:00 2001 From: Tom Eccles Date: Wed, 14 Feb 2024 15:22:02 + Subject: [PATCH] [flang] support fir.alloca operations inside of omp reduction ops Advise to place the alloca at the start of the first block of whichever region (init or combiner) we are currently inside. It probably isn't safe to put an alloca inside of a combiner region because this will be executed multiple times. But that would be a bug to fix in Lower/OpenMP.cpp, not here. --- flang/lib/Optimizer/Builder/FIRBuilder.cpp | 2 ++ flang/lib/Optimizer/CodeGen/CodeGen.cpp| 11 +-- 2 files changed, 11 insertions(+), 2 deletions(-) diff --git a/flang/lib/Optimizer/Builder/FIRBuilder.cpp b/flang/lib/Optimizer/Builder/FIRBuilder.cpp index 12da7412888a3b..f7327a299d9a5e 100644 --- a/flang/lib/Optimizer/Builder/FIRBuilder.cpp +++ b/flang/lib/Optimizer/Builder/FIRBuilder.cpp @@ -208,6 +208,8 @@ mlir::Block *fir::FirOpBuilder::getAllocaBlock() { .getParentOfType()) { return ompOutlineableIface.getAllocaBlock(); } + if (mlir::isa(getRegion().getParentOp())) +return &getRegion().front(); if (auto accRecipeIface = getRegion().getParentOfType()) { return accRecipeIface.getAllocaBlock(getRegion()); diff --git a/flang/lib/Optimizer/CodeGen/CodeGen.cpp b/flang/lib/Optimizer/CodeGen/CodeGen.cpp index f81a08388da722..123eb6e4e6a255 100644 --- a/flang/lib/Optimizer/CodeGen/CodeGen.cpp +++ b/flang/lib/Optimizer/CodeGen/CodeGen.cpp @@ -410,8 +410,15 @@ class FIROpConversion : public mlir::ConvertOpToLLVMPattern { mlir::ConversionPatternRewriter &rewriter) const { auto thisPt = rewriter.saveInsertionPoint(); mlir::Operation *parentOp = rewriter.getInsertionBlock()->getParentOp(); -mlir::Block *insertBlock = getBlockForAllocaInsert(parentOp); -rewriter.setInsertionPointToStart(insertBlock); +if (mlir::isa(parentOp)) { + // ReductionDeclareOp has multiple child regions. We want to get the first + // block of whichever of those regions we are currently in + mlir::Region *parentRegion = rewriter.getInsertionBlock()->getParent(); + rewriter.setInsertionPointToStart(&parentRegion->front()); +} else { + mlir::Block *insertBlock = getBlockForAllocaInsert(parentOp); + rewriter.setInsertionPointToStart(insertBlock); +} auto size = genI32Constant(loc, rewriter, 1); unsigned allocaAs = getAllocaAddressSpace(rewriter); unsigned programAs = getProgramAddressSpace(rewriter); ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [flang] [flang] run CFG conversion on omp reduction declare ops (PR #84953)
https://github.com/tblah created https://github.com/llvm/llvm-project/pull/84953 Most FIR passes only look for FIR operations inside of functions (either because they run only on func.func or they run on the module but iterate over functions internally). But there can also be FIR operations inside of fir.global, some OpenMP and OpenACC container operations. This has worked so far for fir.global and OpenMP reductions because they only contained very simple FIR code which doesn't need most passes to be lowered into LLVM IR. I am not sure how OpenACC works. In the long run, I hope to see a more systematic approach to making sure that every pass runs on all of these container operations. I will write an RFC for this soon. In the meantime, this pass duplicates the CFG conversion pass to also run on omp reduction operations. This is similar to how the AbstractResult pass is already duplicated for fir.global operations. OpenMP array reductions 2/6 >From 192da3c05fd8c0759f280e0895ffc2f09b2203e4 Mon Sep 17 00:00:00 2001 From: Tom Eccles Date: Thu, 15 Feb 2024 12:12:29 + Subject: [PATCH] [flang] run CFG conversion on omp reduction declare ops Most FIR passes only look for FIR operations inside of functions (either because they run only on func.func or they run on the module but iterate over functions internally). But there can also be FIR operations inside of fir.global, some OpenMP and OpenACC container operations. This has worked so far for fir.global and OpenMP reductions because they only contained very simple FIR code which doesn't need most passes to be lowered into LLVM IR. I am not sure how OpenACC works. In the long run, I hope to see a more systematic approach to making sure that every pass runs on all of these container operations. I will write an RFC for this soon. In the meantime, this pass duplicates the CFG conversion pass to also run on omp reduction operations. This is similar to how the AbstractResult pass is already duplicated for fir.global operations. 
Co-authored-by: Mats Petersson --- .../flang/Optimizer/Transforms/Passes.h | 7 +++-- .../flang/Optimizer/Transforms/Passes.td | 12 ++-- flang/include/flang/Tools/CLOptions.inc | 4 ++- .../Transforms/ControlFlowConverter.cpp | 30 ++- flang/test/Driver/bbc-mlir-pass-pipeline.f90 | 5 +++- .../test/Driver/mlir-debug-pass-pipeline.f90 | 16 +- flang/test/Driver/mlir-pass-pipeline.f90 | 16 ++ flang/test/Fir/array-value-copy-2.fir | 4 +-- flang/test/Fir/basic-program.fir | 5 +++- .../Fir/convert-to-llvm-openmp-and-fir.fir| 2 +- flang/test/Fir/loop01.fir | 2 +- flang/test/Fir/loop02.fir | 4 +-- flang/test/Lower/OpenMP/FIR/flush.f90 | 2 +- flang/test/Lower/OpenMP/FIR/master.f90| 2 +- .../Lower/OpenMP/FIR/parallel-sections.f90| 2 +- 15 files changed, 76 insertions(+), 37 deletions(-) diff --git a/flang/include/flang/Optimizer/Transforms/Passes.h b/flang/include/flang/Optimizer/Transforms/Passes.h index e1d22c8c986da7..adf747ebdb400b 100644 --- a/flang/include/flang/Optimizer/Transforms/Passes.h +++ b/flang/include/flang/Optimizer/Transforms/Passes.h @@ -11,6 +11,7 @@ #include "flang/Optimizer/Dialect/FIROps.h" #include "mlir/Dialect/LLVMIR/LLVMAttrs.h" +#include "mlir/Dialect/OpenMP/OpenMPDialect.h" #include "mlir/Pass/Pass.h" #include "mlir/Pass/PassRegistry.h" #include @@ -37,7 +38,8 @@ namespace fir { #define GEN_PASS_DECL_ANNOTATECONSTANTOPERANDS #define GEN_PASS_DECL_ARRAYVALUECOPY #define GEN_PASS_DECL_CHARACTERCONVERSION -#define GEN_PASS_DECL_CFGCONVERSION +#define GEN_PASS_DECL_CFGCONVERSIONONFUNC +#define GEN_PASS_DECL_CFGCONVERSIONONREDUCTION #define GEN_PASS_DECL_EXTERNALNAMECONVERSION #define GEN_PASS_DECL_MEMREFDATAFLOWOPT #define GEN_PASS_DECL_SIMPLIFYINTRINSICS @@ -53,7 +55,8 @@ std::unique_ptr createAbstractResultOnGlobalOptPass(); std::unique_ptr createAffineDemotionPass(); std::unique_ptr createArrayValueCopyPass(fir::ArrayValueCopyOptions options = {}); -std::unique_ptr createFirToCfgPass(); +std::unique_ptr createFirToCfgOnFuncPass(); +std::unique_ptr createFirToCfgOnReductionPass(); std::unique_ptr createCharacterConversionPass(); std::unique_ptr createExternalNameConversionPass(); std::unique_ptr diff --git a/flang/include/flang/Optimizer/Transforms/Passes.td b/flang/include/flang/Optimizer/Transforms/Passes.td index 5fb576fd876254..e6ea92d814400f 100644 --- a/flang/include/flang/Optimizer/Transforms/Passes.td +++ b/flang/include/flang/Optimizer/Transforms/Passes.td @@ -145,7 +145,8 @@ def CharacterConversion : Pass<"character-conversion"> { ]; } -def CFGConversion : Pass<"cfg-conversion", "::mlir::func::FuncOp"> { +class CFGConversionBase + : Pass<"cfg-conversion-on-" # optExt # "-opt", operation> { let summary = "Convert FIR structured control flow ops to CFG ops."; let description = [{ Trans
[llvm-branch-commits] [flang] [flang][CodeGen] Run PreCGRewrite on omp reduction declare ops (PR #84954)
https://github.com/tblah created https://github.com/llvm/llvm-project/pull/84954 OpenMP reduction declare operations can contain FIR code which needs to be lowered to LLVM. With array reductions, these regions can contain more complicated operations which need PreCGRewriting. A similar extra case was already needed for fir::GlobalOp. OpenMP array reductions 3/6 >From f951d16cf6cb1ab221f47ca2e712020b9af0af87 Mon Sep 17 00:00:00 2001 From: Tom Eccles Date: Fri, 1 Mar 2024 16:59:09 + Subject: [PATCH] [flang][CodeGen] Run PreCGRewrite on omp reduction declare ops OpenMP reduction declare operations can contain FIR code which needs to be lowered to LLVM. With array reductions, these regions can contain more complicated operations which need PreCGRewriting. A similar extra case was already needed for fir::GlobalOp. --- flang/lib/Optimizer/CodeGen/PreCGRewrite.cpp | 5 + 1 file changed, 5 insertions(+) diff --git a/flang/lib/Optimizer/CodeGen/PreCGRewrite.cpp b/flang/lib/Optimizer/CodeGen/PreCGRewrite.cpp index 0170b56367cf3c..dd935e71762355 100644 --- a/flang/lib/Optimizer/CodeGen/PreCGRewrite.cpp +++ b/flang/lib/Optimizer/CodeGen/PreCGRewrite.cpp @@ -22,6 +22,7 @@ #include "mlir/Transforms/RegionUtils.h" #include "llvm/ADT/STLExtras.h" #include "llvm/Support/Debug.h" +#include namespace fir { #define GEN_PASS_DEF_CODEGENREWRITE @@ -319,6 +320,10 @@ class CodeGenRewrite : public fir::impl::CodeGenRewriteBase { runOn(func, func.getBody()); for (auto global : mod.getOps()) runOn(global, global.getRegion()); +for (auto omp : mod.getOps()) { + runOn(omp, omp.getInitializerRegion()); + runOn(omp, omp.getReductionRegion()); +} } }; ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [mlir] [mlir][LLVM] erase call mappings in forgetMapping() (PR #84955)
https://github.com/tblah created https://github.com/llvm/llvm-project/pull/84955 It looks like the mappings for call instructions were forgotten here. This fixes a bug in OpenMP when in-lining a region containing call operations multiple times. OpenMP array reductions 4/6 >From c62b31262bc619145866a304e10925a35462f5bf Mon Sep 17 00:00:00 2001 From: Tom Eccles Date: Wed, 21 Feb 2024 14:22:39 + Subject: [PATCH] [mlir][LLVM] erase call mappings in forgetMapping() It looks like the mappings for call instructions were forgotten here. This fixes a bug in OpenMP when inlining a region containing call operations multiple times. --- mlir/lib/Target/LLVMIR/ModuleTranslation.cpp | 2 ++ 1 file changed, 2 insertions(+) diff --git a/mlir/lib/Target/LLVMIR/ModuleTranslation.cpp b/mlir/lib/Target/LLVMIR/ModuleTranslation.cpp index c00628a420a000..995544238e4a3c 100644 --- a/mlir/lib/Target/LLVMIR/ModuleTranslation.cpp +++ b/mlir/lib/Target/LLVMIR/ModuleTranslation.cpp @@ -716,6 +716,8 @@ void ModuleTranslation::forgetMapping(Region ®ion) { branchMapping.erase(&op); if (isa(op)) globalsMapping.erase(&op); +if (isa(op)) + callMapping.erase(&op); llvm::append_range( toProcess, llvm::map_range(op.getRegions(), [](Region &r) { return &r; })); ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [flang] [flang][NFC] move extractSequenceType helper out of OpenACC to share code (PR #84957)
https://github.com/tblah created https://github.com/llvm/llvm-project/pull/84957 Moving extractSequenceType to FIRType.h so that this can also be used from OpenMP. OpenMP array reductions 5/6 >From 2ff12fa0a580cb060f208d173d9af72bfa49d3b2 Mon Sep 17 00:00:00 2001 From: Tom Eccles Date: Fri, 1 Mar 2024 15:47:24 + Subject: [PATCH] [flang][NFC] move extractSequenceType helper out of OpenACC to share code Moving extractSequenceType to FIRType.h so that this can also be used from OpenMP. --- .../include/flang/Optimizer/Dialect/FIRType.h | 3 +++ flang/lib/Lower/OpenACC.cpp | 21 --- flang/lib/Optimizer/Dialect/FIRType.cpp | 12 +++ 3 files changed, 19 insertions(+), 17 deletions(-) diff --git a/flang/include/flang/Optimizer/Dialect/FIRType.h b/flang/include/flang/Optimizer/Dialect/FIRType.h index a526b4ddf3b98c..7fcd9c1babf24f 100644 --- a/flang/include/flang/Optimizer/Dialect/FIRType.h +++ b/flang/include/flang/Optimizer/Dialect/FIRType.h @@ -237,6 +237,9 @@ inline mlir::Type unwrapSequenceType(mlir::Type t) { return t; } +/// Return the nested sequence type if any. +mlir::Type extractSequenceType(mlir::Type ty); + inline mlir::Type unwrapRefType(mlir::Type t) { if (auto eleTy = dyn_cast_ptrEleTy(t)) return eleTy; diff --git a/flang/lib/Lower/OpenACC.cpp b/flang/lib/Lower/OpenACC.cpp index d2c6006ecf914a..6539de4d88304c 100644 --- a/flang/lib/Lower/OpenACC.cpp +++ b/flang/lib/Lower/OpenACC.cpp @@ -406,19 +406,6 @@ fir::ShapeOp genShapeOp(mlir::OpBuilder &builder, fir::SequenceType seqTy, return builder.create(loc, extents); } -/// Return the nested sequence type if any. -static mlir::Type extractSequenceType(mlir::Type ty) { - if (mlir::isa(ty)) -return ty; - if (auto boxTy = mlir::dyn_cast(ty)) -return extractSequenceType(boxTy.getEleTy()); - if (auto heapTy = mlir::dyn_cast(ty)) -return extractSequenceType(heapTy.getEleTy()); - if (auto ptrTy = mlir::dyn_cast(ty)) -return extractSequenceType(ptrTy.getEleTy()); - return mlir::Type{}; -} - template static void genPrivateLikeInitRegion(mlir::OpBuilder &builder, RecipeOp recipe, mlir::Type ty, mlir::Location loc) { @@ -454,7 +441,7 @@ static void genPrivateLikeInitRegion(mlir::OpBuilder &builder, RecipeOp recipe, } } } else if (auto boxTy = mlir::dyn_cast_or_null(ty)) { -mlir::Type innerTy = extractSequenceType(boxTy); +mlir::Type innerTy = fir::extractSequenceType(boxTy); if (!innerTy) TODO(loc, "Unsupported boxed type in OpenACC privatization"); fir::FirOpBuilder firBuilder{builder, recipe.getOperation()}; @@ -688,7 +675,7 @@ mlir::acc::FirstprivateRecipeOp Fortran::lower::createOrGetFirstprivateRecipe( } else if (auto boxTy = mlir::dyn_cast_or_null(ty)) { fir::FirOpBuilder firBuilder{builder, recipe.getOperation()}; llvm::SmallVector tripletArgs; -mlir::Type innerTy = extractSequenceType(boxTy); +mlir::Type innerTy = fir::extractSequenceType(boxTy); fir::SequenceType seqTy = mlir::dyn_cast_or_null(innerTy); if (!seqTy) @@ -1018,7 +1005,7 @@ static mlir::Value genReductionInitRegion(fir::FirOpBuilder &builder, return declareOp.getBase(); } } else if (auto boxTy = mlir::dyn_cast_or_null(ty)) { -mlir::Type innerTy = extractSequenceType(boxTy); +mlir::Type innerTy = fir::extractSequenceType(boxTy); if (!mlir::isa(innerTy)) TODO(loc, "Unsupported boxed type for reduction"); // Create the private copy from the initial fir.box. 
@@ -1230,7 +1217,7 @@ static void genCombiner(fir::FirOpBuilder &builder, mlir::Location loc, builder.create(loc, res, addr1); builder.setInsertionPointAfter(loops[0]); } else if (auto boxTy = mlir::dyn_cast(ty)) { -mlir::Type innerTy = extractSequenceType(boxTy); +mlir::Type innerTy = fir::extractSequenceType(boxTy); fir::SequenceType seqTy = mlir::dyn_cast_or_null(innerTy); if (!seqTy) diff --git a/flang/lib/Optimizer/Dialect/FIRType.cpp b/flang/lib/Optimizer/Dialect/FIRType.cpp index 8a2c681d958609..5c4cad6d208344 100644 --- a/flang/lib/Optimizer/Dialect/FIRType.cpp +++ b/flang/lib/Optimizer/Dialect/FIRType.cpp @@ -254,6 +254,18 @@ bool hasDynamicSize(mlir::Type t) { return false; } +mlir::Type extractSequenceType(mlir::Type ty) { + if (mlir::isa(ty)) +return ty; + if (auto boxTy = mlir::dyn_cast(ty)) +return extractSequenceType(boxTy.getEleTy()); + if (auto heapTy = mlir::dyn_cast(ty)) +return extractSequenceType(heapTy.getEleTy()); + if (auto ptrTy = mlir::dyn_cast(ty)) +return extractSequenceType(ptrTy.getEleTy()); + return mlir::Type{}; +} + bool isPointerType(mlir::Type ty) { if (auto refTy = fir::dyn_cast_ptrEleTy(ty)) ty = refTy; ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinf
[llvm-branch-commits] [flang] [flang][OpenMP] lower simple array reductions (PR #84958)
https://github.com/tblah created https://github.com/llvm/llvm-project/pull/84958 This has been tested with arrays with compile-time constant bounds. Allocatable arrays and arrays with non-constant bounds are not yet supported. User-defined reduction functions are also not yet supported. The design is intended to work for arrays with non-constant bounds too without a lot of extra work (mostly there are bugs in OpenMPIRBuilder I haven't fixed yet). We need some way to get these runtime bounds into the reduction init and combiner regions. To keep things simple for now I opted to always box the array arguments so the box can be passed as one argument and the lower bounds and extents read from the box. This has the disadvantage of resulting in fir.box_dim operations inside of the critical section. If these prove to be a performance issue, we could follow OpenACC reading box lower bounds and extents before the reduction and passing them as block arguments to the reduction init and combiner regions. I would prefer to keep things simple for now. Note: this implementation only works when the HLFIR lowering is used. I don't think it is worth supporting FIR-only lowering because the plan is for that to be removed soon. OpenMP array reductions 6/6 >From bd668cd95d95d1e5b9c8436875c14878c98902ff Mon Sep 17 00:00:00 2001 From: Tom Eccles Date: Mon, 12 Feb 2024 14:03:00 + Subject: [PATCH] [flang][OpenMP] lower simple array reductions This has been tested with arrays with compile-time constant bounds. Allocatable arrays and arrays with non-constant bounds are not yet supported. User-defined reduction functions are also not yet supported. The design is intended to work for arrays with non-constant bounds too without a lot of extra work (mostly there are bugs in OpenMPIRBuilder I haven't fixed yet). We need some way to get these runtime bounds into the reduction init and combiner regions. To keep things simple for now I opted to always box the array arguments so the box can be passed as one argument and the lower bounds and extents read from the box. This has the disadvantage of resulting in fir.box_dim operations inside of the critical section. If these prove to be a performance issue, we could follow OpenACC reading box lower bounds and extents before the reduction and passing them as block arguments to the reduction init and combiner regions. I would prefer to keep things simple for now. Note: this implementation only works when the HLFIR lowering is used. I don't think it is worth supporting FIR-only lowering because the plan is for that to be removed soon. 
--- flang/lib/Lower/OpenMP/ReductionProcessor.cpp | 283 ++ flang/lib/Lower/OpenMP/ReductionProcessor.h | 2 +- .../Lower/OpenMP/Todo/reduction-arrays.f90| 15 - .../Lower/OpenMP/parallel-reduction-array.f90 | 74 + .../Lower/OpenMP/wsloop-reduction-array.f90 | 84 ++ 5 files changed, 389 insertions(+), 69 deletions(-) delete mode 100644 flang/test/Lower/OpenMP/Todo/reduction-arrays.f90 create mode 100644 flang/test/Lower/OpenMP/parallel-reduction-array.f90 create mode 100644 flang/test/Lower/OpenMP/wsloop-reduction-array.f90 diff --git a/flang/lib/Lower/OpenMP/ReductionProcessor.cpp b/flang/lib/Lower/OpenMP/ReductionProcessor.cpp index e6a63dd4b939ce..de2f11f5b9512e 100644 --- a/flang/lib/Lower/OpenMP/ReductionProcessor.cpp +++ b/flang/lib/Lower/OpenMP/ReductionProcessor.cpp @@ -13,6 +13,7 @@ #include "ReductionProcessor.h" #include "flang/Lower/AbstractConverter.h" +#include "flang/Optimizer/Builder/HLFIRTools.h" #include "flang/Optimizer/Builder/Todo.h" #include "flang/Optimizer/Dialect/FIRType.h" #include "flang/Optimizer/HLFIR/HLFIROps.h" @@ -92,10 +93,42 @@ std::string ReductionProcessor::getReductionName(llvm::StringRef name, if (isByRef) byrefAddition = "_byref"; - return (llvm::Twine(name) + - (ty.isIntOrIndex() ? llvm::Twine("_i_") : llvm::Twine("_f_")) + - llvm::Twine(ty.getIntOrFloatBitWidth()) + byrefAddition) - .str(); + if (fir::isa_trivial(ty)) +return (llvm::Twine(name) + +(ty.isIntOrIndex() ? llvm::Twine("_i_") : llvm::Twine("_f_")) + +llvm::Twine(ty.getIntOrFloatBitWidth()) + byrefAddition) +.str(); + + // creates a name like reduction_i_64_box_ux4x3 + if (auto boxTy = mlir::dyn_cast_or_null(ty)) { +// TODO: support for allocatable boxes: +// !fir.box>> +fir::SequenceType seqTy = fir::unwrapRefType(boxTy.getEleTy()) + .dyn_cast_or_null(); +if (!seqTy) + return {}; + +std::string prefix = getReductionName( +name, fir::unwrapSeqOrBoxedSeqType(ty), /*isByRef=*/false); +if (prefix.empty()) + return {}; +std::stringstream tyStr; +tyStr << prefix << "_box_"; +bool first = true; +for (std::int64_t extent : seqTy.getShape()) { + if (first) +first = false; + else +tyStr << "x"; + if (exte
[llvm-branch-commits] [flang] [flang] support fir.alloca operations inside of omp reduction ops (PR #84952)
https://github.com/tblah edited https://github.com/llvm/llvm-project/pull/84952 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [flang] [flang] run CFG conversion on omp reduction declare ops (PR #84953)
https://github.com/tblah edited https://github.com/llvm/llvm-project/pull/84953 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [flang] [flang][CodeGen] Run PreCGRewrite on omp reduction declare ops (PR #84954)
https://github.com/tblah edited https://github.com/llvm/llvm-project/pull/84954 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [mlir] [mlir][LLVM] erase call mappings in forgetMapping() (PR #84955)
https://github.com/tblah edited https://github.com/llvm/llvm-project/pull/84955 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [flang] [flang][NFC] move extractSequenceType helper out of OpenACC to share code (PR #84957)
https://github.com/tblah edited https://github.com/llvm/llvm-project/pull/84957 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [flang] [flang][OpenMP] lower simple array reductions (PR #84958)
https://github.com/tblah edited https://github.com/llvm/llvm-project/pull/84958 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [flang] [flang][OpenMP] Convert unique clauses in ClauseProcessor (PR #81622)
https://github.com/tblah approved this pull request. LGTM https://github.com/llvm/llvm-project/pull/81622 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [flang] [flang][OpenMP] Convert repeatable clauses (except Map) in ClauseProc… (PR #81623)
@@ -87,50 +87,44 @@ getSimdModifier(const omp::clause::Schedule &clause) { static void genAllocateClause(Fortran::lower::AbstractConverter &converter, - const Fortran::parser::OmpAllocateClause &ompAllocateClause, + const omp::clause::Allocate &clause, llvm::SmallVectorImpl &allocatorOperands, llvm::SmallVectorImpl &allocateOperands) { fir::FirOpBuilder &firOpBuilder = converter.getFirOpBuilder(); mlir::Location currentLocation = converter.getCurrentLocation(); Fortran::lower::StatementContext stmtCtx; mlir::Value allocatorOperand; - const Fortran::parser::OmpObjectList &ompObjectList = - std::get(ompAllocateClause.t); - const auto &allocateModifier = std::get< - std::optional>( - ompAllocateClause.t); + const omp::ObjectList &objectList = std::get(clause.t); + const auto &modifier = + std::get>(clause.t); // If the allocate modifier is present, check if we only use the allocator // submodifier. ALIGN in this context is unimplemented const bool onlyAllocator = - allocateModifier && - std::holds_alternative< - Fortran::parser::OmpAllocateClause::AllocateModifier::Allocator>( - allocateModifier->u); + modifier && + std::holds_alternative( + modifier->u); - if (allocateModifier && !onlyAllocator) { + if (modifier && !onlyAllocator) { TODO(currentLocation, "OmpAllocateClause ALIGN modifier"); } // Check if allocate clause has allocator specified. If so, add it // to list of allocators, otherwise, add default allocator to // list of allocators. if (onlyAllocator) { -const auto &allocatorValue = std::get< -Fortran::parser::OmpAllocateClause::AllocateModifier::Allocator>( -allocateModifier->u); -allocatorOperand = fir::getBase(converter.genExprValue( -*Fortran::semantics::GetExpr(allocatorValue.v), stmtCtx)); -allocatorOperands.insert(allocatorOperands.end(), ompObjectList.v.size(), - allocatorOperand); +const auto &value = +std::get(modifier->u); +mlir::Value operand = +fir::getBase(converter.genExprValue(value.v, stmtCtx)); tblah wrote: Why did you drop `Fortran::semantics::GetExpr`? https://github.com/llvm/llvm-project/pull/81623 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [flang] [flang][OpenMP] Convert repeatable clauses (except Map) in ClauseProc… (PR #81623)
@@ -181,45 +172,41 @@ genDependKindAttr(fir::FirOpBuilder &firOpBuilder, pbKind); } -static mlir::Value getIfClauseOperand( -Fortran::lower::AbstractConverter &converter, -const Fortran::parser::OmpClause::If *ifClause, -Fortran::parser::OmpIfClause::DirectiveNameModifier directiveName, -mlir::Location clauseLocation) { +static mlir::Value +getIfClauseOperand(Fortran::lower::AbstractConverter &converter, + const omp::clause::If &clause, + omp::clause::If::DirectiveNameModifier directiveName, + mlir::Location clauseLocation) { // Only consider the clause if it's intended for the given directive. - auto &directive = std::get< - std::optional>( - ifClause->v.t); + auto &directive = + std::get>(clause.t); if (directive && directive.value() != directiveName) return nullptr; Fortran::lower::StatementContext stmtCtx; fir::FirOpBuilder &firOpBuilder = converter.getFirOpBuilder(); - auto &expr = std::get(ifClause->v.t); mlir::Value ifVal = fir::getBase( - converter.genExprValue(*Fortran::semantics::GetExpr(expr), stmtCtx)); + converter.genExprValue(std::get(clause.t), stmtCtx)); return firOpBuilder.createConvert(clauseLocation, firOpBuilder.getI1Type(), ifVal); } static void addUseDeviceClause(Fortran::lower::AbstractConverter &converter, - const Fortran::parser::OmpObjectList &useDeviceClause, + const omp::ObjectList &objects, llvm::SmallVectorImpl &operands, llvm::SmallVectorImpl &useDeviceTypes, llvm::SmallVectorImpl &useDeviceLocs, llvm::SmallVectorImpl &useDeviceSymbols) { - genObjectList(useDeviceClause, converter, operands); + genObjectList(objects, converter, operands); for (mlir::Value &operand : operands) { checkMapType(operand.getLoc(), operand.getType()); useDeviceTypes.push_back(operand.getType()); useDeviceLocs.push_back(operand.getLoc()); } - for (const Fortran::parser::OmpObject &ompObject : useDeviceClause.v) { -Fortran::semantics::Symbol *sym = getOmpObjectSymbol(ompObject); -useDeviceSymbols.push_back(sym); - } + for (const omp::Object &object : objects) +useDeviceSymbols.push_back(object.id()); tblah wrote: Why have you dropped the call to `getOmpObjectSymbol`? https://github.com/llvm/llvm-project/pull/81623 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [flang] [flang][OpenMP] Convert repeatable clauses (except Map) in ClauseProc… (PR #81623)
@@ -181,45 +172,41 @@ genDependKindAttr(fir::FirOpBuilder &firOpBuilder, pbKind); } -static mlir::Value getIfClauseOperand( -Fortran::lower::AbstractConverter &converter, -const Fortran::parser::OmpClause::If *ifClause, -Fortran::parser::OmpIfClause::DirectiveNameModifier directiveName, -mlir::Location clauseLocation) { +static mlir::Value +getIfClauseOperand(Fortran::lower::AbstractConverter &converter, + const omp::clause::If &clause, + omp::clause::If::DirectiveNameModifier directiveName, + mlir::Location clauseLocation) { // Only consider the clause if it's intended for the given directive. - auto &directive = std::get< - std::optional>( - ifClause->v.t); + auto &directive = + std::get>(clause.t); if (directive && directive.value() != directiveName) return nullptr; Fortran::lower::StatementContext stmtCtx; fir::FirOpBuilder &firOpBuilder = converter.getFirOpBuilder(); - auto &expr = std::get(ifClause->v.t); mlir::Value ifVal = fir::getBase( - converter.genExprValue(*Fortran::semantics::GetExpr(expr), stmtCtx)); + converter.genExprValue(std::get(clause.t), stmtCtx)); tblah wrote: Here too https://github.com/llvm/llvm-project/pull/81623 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [flang] [flang] support fir.alloca operations inside of omp reduction ops (PR #84952)
@@ -410,8 +410,15 @@ class FIROpConversion : public mlir::ConvertOpToLLVMPattern { mlir::ConversionPatternRewriter &rewriter) const { auto thisPt = rewriter.saveInsertionPoint(); mlir::Operation *parentOp = rewriter.getInsertionBlock()->getParentOp(); -mlir::Block *insertBlock = getBlockForAllocaInsert(parentOp); -rewriter.setInsertionPointToStart(insertBlock); +if (mlir::isa(parentOp)) { + // ReductionDeclareOp has multiple child regions. We want to get the first + // block of whichever of those regions we are currently in + mlir::Region *parentRegion = rewriter.getInsertionBlock()->getParent(); + rewriter.setInsertionPointToStart(&parentRegion->front()); +} else { + mlir::Block *insertBlock = getBlockForAllocaInsert(parentOp); + rewriter.setInsertionPointToStart(insertBlock); +} tblah wrote: That function doesn't have access to the rewriter. I could have added it as an argument, but I felt that it reads a bit weird there because it isn't obvious out of context what the conditions on the rewriter are. The awkwardness comes from the use of multiple regions in the reduction declare op. We need to make sure the alloca ends up in the same region that we are currently inserting into. https://github.com/llvm/llvm-project/pull/84952 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
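A minimal sketch of the rule described above, using only core MLIR accessors (the helper name is illustrative, not an actual FIROpConversion member):

#include "mlir/IR/Region.h"

// For a multi-region op such as omp.reduction.declare, an alloca must not
// escape the region currently being converted, so the front block of the
// insertion block's own region is used instead of a block taken from the
// op's first region.
static mlir::Block *allocaBlockForCurrentRegion(mlir::Block *insertionBlock) {
  return &insertionBlock->getParent()->front();
}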
[llvm-branch-commits] [flang] [flang][OpenMP] Convert repeatable clauses (except Map) in ClauseProc… (PR #81623)
https://github.com/tblah approved this pull request. Thanks for explaining. LGTM https://github.com/llvm/llvm-project/pull/81623 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [flang] [flang][OpenMP] Convert processTODO and remove unused objects (PR #81627)
https://github.com/tblah approved this pull request. LGTM, thanks https://github.com/llvm/llvm-project/pull/81627 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [flang] [flang][CodeGen] Run PreCGRewrite on omp reduction declare ops (PR #84954)
https://github.com/tblah updated https://github.com/llvm/llvm-project/pull/84954 >From f951d16cf6cb1ab221f47ca2e712020b9af0af87 Mon Sep 17 00:00:00 2001 From: Tom Eccles Date: Fri, 1 Mar 2024 16:59:09 + Subject: [PATCH 1/3] [flang][CodeGen] Run PreCGRewrite on omp reduction declare ops OpenMP reduction declare operations can contain FIR code which needs to be lowered to LLVM. With array reductions, these regions can contain more complicated operations which need PreCGRewriting. A similar extra case was already needed for fir::GlobalOp. --- flang/lib/Optimizer/CodeGen/PreCGRewrite.cpp | 5 + 1 file changed, 5 insertions(+) diff --git a/flang/lib/Optimizer/CodeGen/PreCGRewrite.cpp b/flang/lib/Optimizer/CodeGen/PreCGRewrite.cpp index 0170b56367cf3c..dd935e71762355 100644 --- a/flang/lib/Optimizer/CodeGen/PreCGRewrite.cpp +++ b/flang/lib/Optimizer/CodeGen/PreCGRewrite.cpp @@ -22,6 +22,7 @@ #include "mlir/Transforms/RegionUtils.h" #include "llvm/ADT/STLExtras.h" #include "llvm/Support/Debug.h" +#include namespace fir { #define GEN_PASS_DEF_CODEGENREWRITE @@ -319,6 +320,10 @@ class CodeGenRewrite : public fir::impl::CodeGenRewriteBase { runOn(func, func.getBody()); for (auto global : mod.getOps()) runOn(global, global.getRegion()); +for (auto omp : mod.getOps()) { + runOn(omp, omp.getInitializerRegion()); + runOn(omp, omp.getReductionRegion()); +} } }; >From b909193418789d1bcb572b69070fdca9c2d35a7c Mon Sep 17 00:00:00 2001 From: Tom Eccles Date: Thu, 14 Mar 2024 07:54:12 + Subject: [PATCH 2/3] Fix include --- flang/lib/Optimizer/CodeGen/PreCGRewrite.cpp | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/flang/lib/Optimizer/CodeGen/PreCGRewrite.cpp b/flang/lib/Optimizer/CodeGen/PreCGRewrite.cpp index dd935e71762355..097845e447842b 100644 --- a/flang/lib/Optimizer/CodeGen/PreCGRewrite.cpp +++ b/flang/lib/Optimizer/CodeGen/PreCGRewrite.cpp @@ -18,11 +18,11 @@ #include "flang/Optimizer/Dialect/FIROps.h" #include "flang/Optimizer/Dialect/FIRType.h" #include "flang/Optimizer/Dialect/Support/FIRContext.h" +#include "mlir/Dialect/OpenMP/OpenMPDialect.h" #include "mlir/Transforms/DialectConversion.h" #include "mlir/Transforms/RegionUtils.h" #include "llvm/ADT/STLExtras.h" #include "llvm/Support/Debug.h" -#include namespace fir { #define GEN_PASS_DEF_CODEGENREWRITE >From 9d5026a16f4de4037d1fefa77d5c913085183150 Mon Sep 17 00:00:00 2001 From: Tom Eccles Date: Thu, 14 Mar 2024 08:02:40 + Subject: [PATCH 3/3] Run PreCGRewrite on all regions in the module --- flang/lib/Optimizer/CodeGen/PreCGRewrite.cpp | 13 +++-- 1 file changed, 3 insertions(+), 10 deletions(-) diff --git a/flang/lib/Optimizer/CodeGen/PreCGRewrite.cpp b/flang/lib/Optimizer/CodeGen/PreCGRewrite.cpp index 097845e447842b..410e6400c9be14 100644 --- a/flang/lib/Optimizer/CodeGen/PreCGRewrite.cpp +++ b/flang/lib/Optimizer/CodeGen/PreCGRewrite.cpp @@ -284,7 +284,7 @@ class DeclareOpConversion : public mlir::OpRewritePattern { class CodeGenRewrite : public fir::impl::CodeGenRewriteBase { public: - void runOn(mlir::Operation *op, mlir::Region ®ion) { + void runOn(mlir::Operation *op) { auto &context = getContext(); mlir::ConversionTarget target(context); target.addLegalDialect { void runOnOperation() override final { // Call runOn on all top level regions that may contain emboxOp/arrayCoorOp. 
-auto mod = getOperation(); -for (auto func : mod.getOps()) - runOn(func, func.getBody()); -for (auto global : mod.getOps()) - runOn(global, global.getRegion()); -for (auto omp : mod.getOps()) { - runOn(omp, omp.getInitializerRegion()); - runOn(omp, omp.getReductionRegion()); -} +mlir::ModuleOp mod = getOperation(); +runOn(mod); } }; ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [flang] [flang][CodeGen] Run PreCGRewrite on omp reduction declare ops (PR #84954)
@@ -319,6 +320,10 @@ class CodeGenRewrite : public fir::impl::CodeGenRewriteBase { runOn(func, func.getBody()); for (auto global : mod.getOps()) runOn(global, global.getRegion()); +for (auto omp : mod.getOps()) { tblah wrote: Good idea. That does indeed work https://github.com/llvm/llvm-project/pull/84954 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [flang] [flang] run CFG conversion on omp reduction declare ops (PR #84953)
https://github.com/tblah updated https://github.com/llvm/llvm-project/pull/84953 >From 192da3c05fd8c0759f280e0895ffc2f09b2203e4 Mon Sep 17 00:00:00 2001 From: Tom Eccles Date: Thu, 15 Feb 2024 12:12:29 + Subject: [PATCH 1/2] [flang] run CFG conversion on omp reduction declare ops Most FIR passes only look for FIR operations inside of functions (either because they run only on func.func or they run on the module but iterate over functions internally). But there can also be FIR operations inside of fir.global, some OpenMP and OpenACC container operations. This has worked so far for fir.global and OpenMP reductions because they only contained very simple FIR code which doesn't need most passes to be lowered into LLVM IR. I am not sure how OpenACC works. In the long run, I hope to see a more systematic approach to making sure that every pass runs on all of these container operations. I will write an RFC for this soon. In the meantime, this pass duplicates the CFG conversion pass to also run on omp reduction operations. This is similar to how the AbstractResult pass is already duplicated for fir.global operations. Co-authored-by: Mats Petersson --- .../flang/Optimizer/Transforms/Passes.h | 7 +++-- .../flang/Optimizer/Transforms/Passes.td | 12 ++-- flang/include/flang/Tools/CLOptions.inc | 4 ++- .../Transforms/ControlFlowConverter.cpp | 30 ++- flang/test/Driver/bbc-mlir-pass-pipeline.f90 | 5 +++- .../test/Driver/mlir-debug-pass-pipeline.f90 | 16 +- flang/test/Driver/mlir-pass-pipeline.f90 | 16 ++ flang/test/Fir/array-value-copy-2.fir | 4 +-- flang/test/Fir/basic-program.fir | 5 +++- .../Fir/convert-to-llvm-openmp-and-fir.fir| 2 +- flang/test/Fir/loop01.fir | 2 +- flang/test/Fir/loop02.fir | 4 +-- flang/test/Lower/OpenMP/FIR/flush.f90 | 2 +- flang/test/Lower/OpenMP/FIR/master.f90| 2 +- .../Lower/OpenMP/FIR/parallel-sections.f90| 2 +- 15 files changed, 76 insertions(+), 37 deletions(-) diff --git a/flang/include/flang/Optimizer/Transforms/Passes.h b/flang/include/flang/Optimizer/Transforms/Passes.h index e1d22c8c986da7..adf747ebdb400b 100644 --- a/flang/include/flang/Optimizer/Transforms/Passes.h +++ b/flang/include/flang/Optimizer/Transforms/Passes.h @@ -11,6 +11,7 @@ #include "flang/Optimizer/Dialect/FIROps.h" #include "mlir/Dialect/LLVMIR/LLVMAttrs.h" +#include "mlir/Dialect/OpenMP/OpenMPDialect.h" #include "mlir/Pass/Pass.h" #include "mlir/Pass/PassRegistry.h" #include @@ -37,7 +38,8 @@ namespace fir { #define GEN_PASS_DECL_ANNOTATECONSTANTOPERANDS #define GEN_PASS_DECL_ARRAYVALUECOPY #define GEN_PASS_DECL_CHARACTERCONVERSION -#define GEN_PASS_DECL_CFGCONVERSION +#define GEN_PASS_DECL_CFGCONVERSIONONFUNC +#define GEN_PASS_DECL_CFGCONVERSIONONREDUCTION #define GEN_PASS_DECL_EXTERNALNAMECONVERSION #define GEN_PASS_DECL_MEMREFDATAFLOWOPT #define GEN_PASS_DECL_SIMPLIFYINTRINSICS @@ -53,7 +55,8 @@ std::unique_ptr createAbstractResultOnGlobalOptPass(); std::unique_ptr createAffineDemotionPass(); std::unique_ptr createArrayValueCopyPass(fir::ArrayValueCopyOptions options = {}); -std::unique_ptr createFirToCfgPass(); +std::unique_ptr createFirToCfgOnFuncPass(); +std::unique_ptr createFirToCfgOnReductionPass(); std::unique_ptr createCharacterConversionPass(); std::unique_ptr createExternalNameConversionPass(); std::unique_ptr diff --git a/flang/include/flang/Optimizer/Transforms/Passes.td b/flang/include/flang/Optimizer/Transforms/Passes.td index 5fb576fd876254..e6ea92d814400f 100644 --- a/flang/include/flang/Optimizer/Transforms/Passes.td +++ b/flang/include/flang/Optimizer/Transforms/Passes.td @@ 
-145,7 +145,8 @@ def CharacterConversion : Pass<"character-conversion"> { ]; } -def CFGConversion : Pass<"cfg-conversion", "::mlir::func::FuncOp"> { +class CFGConversionBase + : Pass<"cfg-conversion-on-" # optExt # "-opt", operation> { let summary = "Convert FIR structured control flow ops to CFG ops."; let description = [{ Transform the `fir.do_loop`, `fir.if`, `fir.iterate_while` and @@ -154,7 +155,6 @@ def CFGConversion : Pass<"cfg-conversion", "::mlir::func::FuncOp"> { This pass is required before code gen to the LLVM IR dialect. }]; - let constructor = "::fir::createFirToCfgPass()"; let dependentDialects = [ "fir::FIROpsDialect", "mlir::func::FuncDialect" ]; @@ -165,6 +165,14 @@ def CFGConversion : Pass<"cfg-conversion", "::mlir::func::FuncOp"> { ]; } +def CFGConversionOnFunc : CFGConversionBase<"func", "mlir::func::FuncOp"> { + let constructor = "::fir::createFirToCfgOnFuncPass()"; +} + +def CFGConversionOnReduction : CFGConversionBase<"reduce", "mlir::omp::ReductionDeclareOp"> { + let constructor = "::fir::createFirToCfgOnReductionPass()"; +} + def ExternalNameConversion : Pass<"external-name-interop", "mlir::ModuleOp"> { let summary = "Convert name for
[llvm-branch-commits] [flang] [flang] run CFG conversion on omp reduction declare ops (PR #84953)
tblah wrote: > Wouldn't it be cleaner to expose the patterns via a > `populateFirCfgConversionPatterns` function and reuse it in your extra pass > instead of making two passes from the initial file? > > We did this recently for the FirToLLVM patterns. #83492 Thanks for taking a look at this. I have extracted the patterns into an externally visible function, but I don't understand the benefit of splitting the two CFG conversion passes out into different files. I think this could be confusing because it would be unclear which file should contain the definition of those conversion patterns. Keeping it in one file makes it clear that both do exactly the same thing on different target operations. https://github.com/llvm/llvm-project/pull/84953 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [flang] [flang][CodeGen] Run PreCGRewrite on omp reduction declare ops (PR #84954)
https://github.com/tblah updated https://github.com/llvm/llvm-project/pull/84954 >From f951d16cf6cb1ab221f47ca2e712020b9af0af87 Mon Sep 17 00:00:00 2001 From: Tom Eccles Date: Fri, 1 Mar 2024 16:59:09 + Subject: [PATCH 1/4] [flang][CodeGen] Run PreCGRewrite on omp reduction declare ops OpenMP reduction declare operations can contain FIR code which needs to be lowered to LLVM. With array reductions, these regions can contain more complicated operations which need PreCGRewriting. A similar extra case was already needed for fir::GlobalOp. --- flang/lib/Optimizer/CodeGen/PreCGRewrite.cpp | 5 + 1 file changed, 5 insertions(+) diff --git a/flang/lib/Optimizer/CodeGen/PreCGRewrite.cpp b/flang/lib/Optimizer/CodeGen/PreCGRewrite.cpp index 0170b56367cf3c..dd935e71762355 100644 --- a/flang/lib/Optimizer/CodeGen/PreCGRewrite.cpp +++ b/flang/lib/Optimizer/CodeGen/PreCGRewrite.cpp @@ -22,6 +22,7 @@ #include "mlir/Transforms/RegionUtils.h" #include "llvm/ADT/STLExtras.h" #include "llvm/Support/Debug.h" +#include namespace fir { #define GEN_PASS_DEF_CODEGENREWRITE @@ -319,6 +320,10 @@ class CodeGenRewrite : public fir::impl::CodeGenRewriteBase { runOn(func, func.getBody()); for (auto global : mod.getOps()) runOn(global, global.getRegion()); +for (auto omp : mod.getOps()) { + runOn(omp, omp.getInitializerRegion()); + runOn(omp, omp.getReductionRegion()); +} } }; >From b909193418789d1bcb572b69070fdca9c2d35a7c Mon Sep 17 00:00:00 2001 From: Tom Eccles Date: Thu, 14 Mar 2024 07:54:12 + Subject: [PATCH 2/4] Fix include --- flang/lib/Optimizer/CodeGen/PreCGRewrite.cpp | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/flang/lib/Optimizer/CodeGen/PreCGRewrite.cpp b/flang/lib/Optimizer/CodeGen/PreCGRewrite.cpp index dd935e71762355..097845e447842b 100644 --- a/flang/lib/Optimizer/CodeGen/PreCGRewrite.cpp +++ b/flang/lib/Optimizer/CodeGen/PreCGRewrite.cpp @@ -18,11 +18,11 @@ #include "flang/Optimizer/Dialect/FIROps.h" #include "flang/Optimizer/Dialect/FIRType.h" #include "flang/Optimizer/Dialect/Support/FIRContext.h" +#include "mlir/Dialect/OpenMP/OpenMPDialect.h" #include "mlir/Transforms/DialectConversion.h" #include "mlir/Transforms/RegionUtils.h" #include "llvm/ADT/STLExtras.h" #include "llvm/Support/Debug.h" -#include namespace fir { #define GEN_PASS_DEF_CODEGENREWRITE >From 9d5026a16f4de4037d1fefa77d5c913085183150 Mon Sep 17 00:00:00 2001 From: Tom Eccles Date: Thu, 14 Mar 2024 08:02:40 + Subject: [PATCH 3/4] Run PreCGRewrite on all regions in the module --- flang/lib/Optimizer/CodeGen/PreCGRewrite.cpp | 13 +++-- 1 file changed, 3 insertions(+), 10 deletions(-) diff --git a/flang/lib/Optimizer/CodeGen/PreCGRewrite.cpp b/flang/lib/Optimizer/CodeGen/PreCGRewrite.cpp index 097845e447842b..410e6400c9be14 100644 --- a/flang/lib/Optimizer/CodeGen/PreCGRewrite.cpp +++ b/flang/lib/Optimizer/CodeGen/PreCGRewrite.cpp @@ -284,7 +284,7 @@ class DeclareOpConversion : public mlir::OpRewritePattern { class CodeGenRewrite : public fir::impl::CodeGenRewriteBase { public: - void runOn(mlir::Operation *op, mlir::Region ®ion) { + void runOn(mlir::Operation *op) { auto &context = getContext(); mlir::ConversionTarget target(context); target.addLegalDialect { void runOnOperation() override final { // Call runOn on all top level regions that may contain emboxOp/arrayCoorOp. 
-auto mod = getOperation(); -for (auto func : mod.getOps()) - runOn(func, func.getBody()); -for (auto global : mod.getOps()) - runOn(global, global.getRegion()); -for (auto omp : mod.getOps()) { - runOn(omp, omp.getInitializerRegion()); - runOn(omp, omp.getReductionRegion()); -} +mlir::ModuleOp mod = getOperation(); +runOn(mod); } }; >From c7ffde450f0d3480be4532c4b21ed5a036fb8424 Mon Sep 17 00:00:00 2001 From: Tom Eccles Date: Thu, 14 Mar 2024 14:03:13 + Subject: [PATCH 4/4] Remove unessecary include --- flang/lib/Optimizer/CodeGen/PreCGRewrite.cpp | 1 - 1 file changed, 1 deletion(-) diff --git a/flang/lib/Optimizer/CodeGen/PreCGRewrite.cpp b/flang/lib/Optimizer/CodeGen/PreCGRewrite.cpp index 410e6400c9be14..4a05ad717f02f2 100644 --- a/flang/lib/Optimizer/CodeGen/PreCGRewrite.cpp +++ b/flang/lib/Optimizer/CodeGen/PreCGRewrite.cpp @@ -18,7 +18,6 @@ #include "flang/Optimizer/Dialect/FIROps.h" #include "flang/Optimizer/Dialect/FIRType.h" #include "flang/Optimizer/Dialect/Support/FIRContext.h" -#include "mlir/Dialect/OpenMP/OpenMPDialect.h" #include "mlir/Transforms/DialectConversion.h" #include "mlir/Transforms/RegionUtils.h" #include "llvm/ADT/STLExtras.h" ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [flang] [flang] run CFG conversion on omp reduction declare ops (PR #84953)
tblah wrote: > Wouldn't applying the patterns on the module in a single pass work here as > well? Yes, that would work but we would lose parallelism. I wanted to keep that because there was talk in the past about breaking up the existing Module passes so that different functions (or other container operations) can be processed in parallel. https://github.com/llvm/llvm-project/pull/84953 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [flang] [flang][CodeGen] Run PreCGRewrite on omp reduction declare ops (PR #84954)
https://github.com/tblah updated https://github.com/llvm/llvm-project/pull/84954 >From f951d16cf6cb1ab221f47ca2e712020b9af0af87 Mon Sep 17 00:00:00 2001 From: Tom Eccles Date: Fri, 1 Mar 2024 16:59:09 + Subject: [PATCH 1/5] [flang][CodeGen] Run PreCGRewrite on omp reduction declare ops OpenMP reduction declare operations can contain FIR code which needs to be lowered to LLVM. With array reductions, these regions can contain more complicated operations which need PreCGRewriting. A similar extra case was already needed for fir::GlobalOp. --- flang/lib/Optimizer/CodeGen/PreCGRewrite.cpp | 5 + 1 file changed, 5 insertions(+) diff --git a/flang/lib/Optimizer/CodeGen/PreCGRewrite.cpp b/flang/lib/Optimizer/CodeGen/PreCGRewrite.cpp index 0170b56367cf3c..dd935e71762355 100644 --- a/flang/lib/Optimizer/CodeGen/PreCGRewrite.cpp +++ b/flang/lib/Optimizer/CodeGen/PreCGRewrite.cpp @@ -22,6 +22,7 @@ #include "mlir/Transforms/RegionUtils.h" #include "llvm/ADT/STLExtras.h" #include "llvm/Support/Debug.h" +#include namespace fir { #define GEN_PASS_DEF_CODEGENREWRITE @@ -319,6 +320,10 @@ class CodeGenRewrite : public fir::impl::CodeGenRewriteBase { runOn(func, func.getBody()); for (auto global : mod.getOps()) runOn(global, global.getRegion()); +for (auto omp : mod.getOps()) { + runOn(omp, omp.getInitializerRegion()); + runOn(omp, omp.getReductionRegion()); +} } }; >From b909193418789d1bcb572b69070fdca9c2d35a7c Mon Sep 17 00:00:00 2001 From: Tom Eccles Date: Thu, 14 Mar 2024 07:54:12 + Subject: [PATCH 2/5] Fix include --- flang/lib/Optimizer/CodeGen/PreCGRewrite.cpp | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/flang/lib/Optimizer/CodeGen/PreCGRewrite.cpp b/flang/lib/Optimizer/CodeGen/PreCGRewrite.cpp index dd935e71762355..097845e447842b 100644 --- a/flang/lib/Optimizer/CodeGen/PreCGRewrite.cpp +++ b/flang/lib/Optimizer/CodeGen/PreCGRewrite.cpp @@ -18,11 +18,11 @@ #include "flang/Optimizer/Dialect/FIROps.h" #include "flang/Optimizer/Dialect/FIRType.h" #include "flang/Optimizer/Dialect/Support/FIRContext.h" +#include "mlir/Dialect/OpenMP/OpenMPDialect.h" #include "mlir/Transforms/DialectConversion.h" #include "mlir/Transforms/RegionUtils.h" #include "llvm/ADT/STLExtras.h" #include "llvm/Support/Debug.h" -#include namespace fir { #define GEN_PASS_DEF_CODEGENREWRITE >From 9d5026a16f4de4037d1fefa77d5c913085183150 Mon Sep 17 00:00:00 2001 From: Tom Eccles Date: Thu, 14 Mar 2024 08:02:40 + Subject: [PATCH 3/5] Run PreCGRewrite on all regions in the module --- flang/lib/Optimizer/CodeGen/PreCGRewrite.cpp | 13 +++-- 1 file changed, 3 insertions(+), 10 deletions(-) diff --git a/flang/lib/Optimizer/CodeGen/PreCGRewrite.cpp b/flang/lib/Optimizer/CodeGen/PreCGRewrite.cpp index 097845e447842b..410e6400c9be14 100644 --- a/flang/lib/Optimizer/CodeGen/PreCGRewrite.cpp +++ b/flang/lib/Optimizer/CodeGen/PreCGRewrite.cpp @@ -284,7 +284,7 @@ class DeclareOpConversion : public mlir::OpRewritePattern { class CodeGenRewrite : public fir::impl::CodeGenRewriteBase { public: - void runOn(mlir::Operation *op, mlir::Region ®ion) { + void runOn(mlir::Operation *op) { auto &context = getContext(); mlir::ConversionTarget target(context); target.addLegalDialect { void runOnOperation() override final { // Call runOn on all top level regions that may contain emboxOp/arrayCoorOp. 
-auto mod = getOperation(); -for (auto func : mod.getOps()) - runOn(func, func.getBody()); -for (auto global : mod.getOps()) - runOn(global, global.getRegion()); -for (auto omp : mod.getOps()) { - runOn(omp, omp.getInitializerRegion()); - runOn(omp, omp.getReductionRegion()); -} +mlir::ModuleOp mod = getOperation(); +runOn(mod); } }; >From c7ffde450f0d3480be4532c4b21ed5a036fb8424 Mon Sep 17 00:00:00 2001 From: Tom Eccles Date: Thu, 14 Mar 2024 14:03:13 + Subject: [PATCH 4/5] Remove unessecary include --- flang/lib/Optimizer/CodeGen/PreCGRewrite.cpp | 1 - 1 file changed, 1 deletion(-) diff --git a/flang/lib/Optimizer/CodeGen/PreCGRewrite.cpp b/flang/lib/Optimizer/CodeGen/PreCGRewrite.cpp index 410e6400c9be14..4a05ad717f02f2 100644 --- a/flang/lib/Optimizer/CodeGen/PreCGRewrite.cpp +++ b/flang/lib/Optimizer/CodeGen/PreCGRewrite.cpp @@ -18,7 +18,6 @@ #include "flang/Optimizer/Dialect/FIROps.h" #include "flang/Optimizer/Dialect/FIRType.h" #include "flang/Optimizer/Dialect/Support/FIRContext.h" -#include "mlir/Dialect/OpenMP/OpenMPDialect.h" #include "mlir/Transforms/DialectConversion.h" #include "mlir/Transforms/RegionUtils.h" #include "llvm/ADT/STLExtras.h" >From 790025615d31ad0352f2befdd937f8e5d94f6c67 Mon Sep 17 00:00:00 2001 From: Tom Eccles Date: Fri, 15 Mar 2024 11:28:30 + Subject: [PATCH 5/5] Remove runOn --- flang/lib/Optimizer/CodeGen/PreCGRewrite.cpp | 14 +- 1 file changed, 5 insertions(+), 9 deletions(-) diff --git a/flang/li
[llvm-branch-commits] [mlir] [mlir][LLVM] erase call mappings in forgetMapping() (PR #84955)
https://github.com/tblah updated https://github.com/llvm/llvm-project/pull/84955 >From c62b31262bc619145866a304e10925a35462f5bf Mon Sep 17 00:00:00 2001 From: Tom Eccles Date: Wed, 21 Feb 2024 14:22:39 + Subject: [PATCH 1/2] [mlir][LLVM] erase call mappings in forgetMapping() It looks like the mappings for call instructions were forgotten here. This fixes a bug in OpenMP when inlining a region containing call operations multiple times. --- mlir/lib/Target/LLVMIR/ModuleTranslation.cpp | 2 ++ 1 file changed, 2 insertions(+) diff --git a/mlir/lib/Target/LLVMIR/ModuleTranslation.cpp b/mlir/lib/Target/LLVMIR/ModuleTranslation.cpp index c00628a420a000..995544238e4a3c 100644 --- a/mlir/lib/Target/LLVMIR/ModuleTranslation.cpp +++ b/mlir/lib/Target/LLVMIR/ModuleTranslation.cpp @@ -716,6 +716,8 @@ void ModuleTranslation::forgetMapping(Region ®ion) { branchMapping.erase(&op); if (isa(op)) globalsMapping.erase(&op); +if (isa(op)) + callMapping.erase(&op); llvm::append_range( toProcess, llvm::map_range(op.getRegions(), [](Region &r) { return &r; })); >From 83b87ec9057dc9fbe529c652b5020452fb9585bb Mon Sep 17 00:00:00 2001 From: Tom Eccles Date: Fri, 15 Mar 2024 14:05:53 + Subject: [PATCH 2/2] Add test --- .../Target/LLVMIR/openmp-reduction-call.mlir | 47 +++ 1 file changed, 47 insertions(+) create mode 100644 mlir/test/Target/LLVMIR/openmp-reduction-call.mlir diff --git a/mlir/test/Target/LLVMIR/openmp-reduction-call.mlir b/mlir/test/Target/LLVMIR/openmp-reduction-call.mlir new file mode 100644 index 00..60419a85f66aa2 --- /dev/null +++ b/mlir/test/Target/LLVMIR/openmp-reduction-call.mlir @@ -0,0 +1,47 @@ +// RUN: mlir-translate -mlir-to-llvmir %s | FileCheck %s + +// Test that we don't crash when there is a call operation in the combiner + +omp.reduction.declare @add_f32 : f32 +init { +^bb0(%arg: f32): + %0 = llvm.mlir.constant(0.0 : f32) : f32 + omp.yield (%0 : f32) +} +combiner { +^bb1(%arg0: f32, %arg1: f32): +// test this call here: + llvm.call @test_call() : () -> () + %1 = llvm.fadd %arg0, %arg1 : f32 + omp.yield (%1 : f32) +} +atomic { +^bb2(%arg2: !llvm.ptr, %arg3: !llvm.ptr): + %2 = llvm.load %arg3 : !llvm.ptr -> f32 + llvm.atomicrmw fadd %arg2, %2 monotonic : !llvm.ptr, f32 + omp.yield +} + +llvm.func @simple_reduction(%lb : i64, %ub : i64, %step : i64) { + %c1 = llvm.mlir.constant(1 : i32) : i32 + %0 = llvm.alloca %c1 x i32 : (i32) -> !llvm.ptr + omp.parallel { +omp.wsloop reduction(@add_f32 %0 -> %prv : !llvm.ptr) +for (%iv) : i64 = (%lb) to (%ub) step (%step) { + %1 = llvm.mlir.constant(2.0 : f32) : f32 + %2 = llvm.load %prv : !llvm.ptr -> f32 + %3 = llvm.fadd %1, %2 : f32 + llvm.store %3, %prv : f32, !llvm.ptr + omp.yield +} +omp.terminator + } + llvm.return +} + +llvm.func @test_call() -> () + +// Call to the troublesome function will have been inlined twice: once into +// main and once into the outlined function +// CHECK: call void @test_call() +// CHECK: call void @test_call() ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [flang] [flang][OpenMP] lower simple array reductions (PR #84958)
@@ -390,15 +559,35 @@ void ReductionProcessor::addReductionDecl( // initial pass to collect all recuction vars so we can figure out if this // should happen byref + fir::FirOpBuilder &builder = converter.getFirOpBuilder(); for (const Fortran::parser::OmpObject &ompObject : objectList.v) { if (const auto *name{ Fortran::parser::Unwrap(ompObject)}) { if (const Fortran::semantics::Symbol * symbol{name->symbol}) { if (reductionSymbols) reductionSymbols->push_back(symbol); mlir::Value symVal = converter.getSymbolAddress(*symbol); -if (auto declOp = symVal.getDefiningOp()) +auto redType = mlir::cast(symVal.getType()); tblah wrote: The existing upstream code already does this; I was mostly moving code around here. See line 434 in the old version of the code. This even predates the by-ref reduction changes. See https://github.com/llvm/llvm-project/blob/f18d78b477c76bc09dc580cdaedd55e121f5ebf5/flang/lib/Lower/OpenMP/ReductionProcessor.cpp#L347 For example, if you look at the LLVM code we generate for by-val reductions (which should be identical after my changes), it loads the reduction variables unconditionally, so the value must be some kind of pointer. https://github.com/llvm/llvm-project/pull/84958 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
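To make the invariant described above concrete, here is a minimal sketch (an assumption based on the surrounding diff, not a quote of the upstream code): at this point in lowering, every reduction input is expected to be reference-like, whether the reduction is later lowered by value or by reference.

#include "flang/Optimizer/Dialect/FIRType.h"
#include "mlir/IR/Value.h"
#include <cassert>

// Sketch: assert that the symbol address used as a reduction input is a
// fir.ref or a boxed value, i.e. "some kind of pointer".
static void checkReductionInputIsReference(mlir::Value symVal) {
  assert(mlir::isa<fir::ReferenceType, fir::BaseBoxType>(symVal.getType()) &&
         "reduction input variable should be a reference at this point");
  (void)symVal;
}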
[llvm-branch-commits] [flang] [flang][OpenMP] lower simple array reductions (PR #84958)
@@ -283,13 +316,166 @@ mlir::Value ReductionProcessor::createScalarCombiner( return reductionOp; } +/// Create reduction combiner region for reduction variables which are boxed +/// arrays +static void genBoxCombiner(fir::FirOpBuilder &builder, mlir::Location loc, + ReductionProcessor::ReductionIdentifier redId, + fir::BaseBoxType boxTy, mlir::Value lhs, + mlir::Value rhs) { + fir::SequenceType seqTy = + mlir::dyn_cast_or_null(boxTy.getEleTy()); + // TODO: support allocatable arrays: !fir.box>> + if (!seqTy || seqTy.hasUnknownShape()) +TODO(loc, "Unsupported boxed type in OpenMP reduction"); + + // load fir.ref> + mlir::Value lhsAddr = lhs; + lhs = builder.create(loc, lhs); + rhs = builder.create(loc, rhs); + + const unsigned rank = seqTy.getDimension(); + llvm::SmallVector extents; + extents.reserve(rank); + llvm::SmallVector lbAndExtents; + lbAndExtents.reserve(rank * 2); + + // Get box lowerbounds and extents: + mlir::Type idxTy = builder.getIndexType(); + for (unsigned i = 0; i < rank; ++i) { +// TODO: ideally we want to hoist box reads out of the critical section. +// We could do this by having box dimensions in block arguments like +// OpenACC does +mlir::Value dim = builder.createIntegerConstant(loc, idxTy, i); +auto dimInfo = +builder.create(loc, idxTy, idxTy, idxTy, lhs, dim); +extents.push_back(dimInfo.getExtent()); +lbAndExtents.push_back(dimInfo.getLowerBound()); +lbAndExtents.push_back(dimInfo.getExtent()); + } + + auto shapeShiftTy = fir::ShapeShiftType::get(builder.getContext(), rank); + auto shapeShift = + builder.create(loc, shapeShiftTy, lbAndExtents); + + // Iterate over array elements, applying the equivalent scalar reduction: + + // A hlfir::elemental here gets inlined with a temporary so create the + // loop nest directly. + // This function already controls all of the code in this region so we + // know this won't miss any opportuinties for clever elemental inlining + hlfir::LoopNest nest = + hlfir::genLoopNest(loc, builder, extents, /*isUnordered=*/true); + builder.setInsertionPointToStart(nest.innerLoop.getBody()); + mlir::Type refTy = fir::ReferenceType::get(seqTy.getEleTy()); + auto lhsEleAddr = builder.create( + loc, refTy, lhs, shapeShift, /*slice=*/mlir::Value{}, + nest.oneBasedIndices, /*typeparms=*/mlir::ValueRange{}); + auto rhsEleAddr = builder.create( + loc, refTy, rhs, shapeShift, /*slice=*/mlir::Value{}, + nest.oneBasedIndices, /*typeparms=*/mlir::ValueRange{}); + auto lhsEle = builder.create(loc, lhsEleAddr); + auto rhsEle = builder.create(loc, rhsEleAddr); + mlir::Value scalarReduction = ReductionProcessor::createScalarCombiner( + builder, loc, redId, refTy, lhsEle, rhsEle); + builder.create(loc, scalarReduction, lhsEleAddr); + + builder.setInsertionPointAfter(nest.outerLoop); + builder.create(loc, lhsAddr); +} + +// generate combiner region for reduction operations +static void genCombiner(fir::FirOpBuilder &builder, mlir::Location loc, +ReductionProcessor::ReductionIdentifier redId, +mlir::Type ty, mlir::Value lhs, mlir::Value rhs, +bool isByRef) { + ty = fir::unwrapRefType(ty); + + if (fir::isa_trivial(ty)) { +mlir::Value lhsLoaded = builder.loadIfRef(loc, lhs); +mlir::Value rhsLoaded = builder.loadIfRef(loc, rhs); + +mlir::Value result = ReductionProcessor::createScalarCombiner( +builder, loc, redId, ty, lhsLoaded, rhsLoaded); +if (isByRef) { + builder.create(loc, result, lhs); + builder.create(loc, lhs); +} else { + builder.create(loc, result); +} +return; + } + // all arrays should have been boxed + if (auto boxTy = mlir::dyn_cast(ty)) { 
+genBoxCombiner(builder, loc, redId, boxTy, lhs, rhs); +return; + } + + TODO(loc, "OpenMP genCombiner for unsupported reduction variable type"); +} + +static mlir::Value +createReductionInitRegion(fir::FirOpBuilder &builder, mlir::Location loc, + const ReductionProcessor::ReductionIdentifier redId, + mlir::Type type, bool isByRef) { + mlir::Type ty = fir::unwrapRefType(type); + mlir::Value initValue = ReductionProcessor::getReductionInitValue( + loc, fir::unwrapSeqOrBoxedSeqType(ty), redId, builder); + + if (fir::isa_trivial(ty)) { +if (isByRef) { + mlir::Value alloca = builder.create(loc, ty); + builder.createStoreWithConvert(loc, initValue, alloca); + return alloca; +} +// by val +return initValue; + } + + // all arrays are boxed + if (auto boxTy = mlir::dyn_cast_or_null(ty)) { +assert(isByRef && "passing arrays by value is unsupported"); +// TODO: support allocatable arrays: !fir.box>> +mlir::Type innerTy = fir::extractSequenceType(boxTy); +if (!mlir::isa(innerTy)) +
[llvm-branch-commits] [flang] [flang][OpenMP] lower simple array reductions (PR #84958)
@@ -0,0 +1,74 @@ +! RUN: bbc -emit-hlfir -fopenmp -o - %s 2>&1 | FileCheck %s +! RUN: %flang_fc1 -emit-hlfir -fopenmp -o - %s 2>&1 | FileCheck %s + +program reduce +integer, dimension(3) :: i = 0 + +!$omp parallel reduction(+:i) +i(1) = 1 +i(2) = 2 +i(3) = 3 tblah wrote: I was just trying to create a test with a very simple body. Do you want a test with the reduction operation in the body as well? https://github.com/llvm/llvm-project/pull/84958 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [flang] [flang][OpenMP] lower simple array reductions (PR #84958)
@@ -92,10 +93,42 @@ std::string ReductionProcessor::getReductionName(llvm::StringRef name, if (isByRef) byrefAddition = "_byref"; - return (llvm::Twine(name) + - (ty.isIntOrIndex() ? llvm::Twine("_i_") : llvm::Twine("_f_")) + - llvm::Twine(ty.getIntOrFloatBitWidth()) + byrefAddition) - .str(); + if (fir::isa_trivial(ty)) +return (llvm::Twine(name) + +(ty.isIntOrIndex() ? llvm::Twine("_i_") : llvm::Twine("_f_")) + +llvm::Twine(ty.getIntOrFloatBitWidth()) + byrefAddition) +.str(); + + // creates a name like reduction_i_64_box_ux4x3 + if (auto boxTy = mlir::dyn_cast_or_null(ty)) { +// TODO: support for allocatable boxes: +// !fir.box>> +fir::SequenceType seqTy = fir::unwrapRefType(boxTy.getEleTy()) + .dyn_cast_or_null(); +if (!seqTy) + return {}; + +std::string prefix = getReductionName( +name, fir::unwrapSeqOrBoxedSeqType(ty), /*isByRef=*/false); +if (prefix.empty()) + return {}; +std::stringstream tyStr; +tyStr << prefix << "_box_"; +bool first = true; +for (std::int64_t extent : seqTy.getShape()) { + if (first) +first = false; + else +tyStr << "x"; + if (extent == seqTy.getUnknownExtent()) +tyStr << 'u'; // I'm not sure that '?' is safe in symbol names + else +tyStr << extent; +} +return (tyStr.str() + byrefAddition).str(); + } + + return {}; tblah wrote: Yeah that looks much better, thanks! I will do it in a separate patch because I think it might change existing names. https://github.com/llvm/llvm-project/pull/84958 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [flang] [flang][OpenMP] lower simple array reductions (PR #84958)
https://github.com/tblah updated https://github.com/llvm/llvm-project/pull/84958 >From bd668cd95d95d1e5b9c8436875c14878c98902ff Mon Sep 17 00:00:00 2001 From: Tom Eccles Date: Mon, 12 Feb 2024 14:03:00 + Subject: [PATCH 1/4] [flang][OpenMP] lower simple array reductions This has been tested with arrays with compile-time constant bounds. Allocatable arrays and arrays with non-constant bounds are not yet supported. User-defined reduction functions are also not yet supported. The design is intended to work for arrays with non-constant bounds too without a lot of extra work (mostly there are bugs in OpenMPIRBuilder I haven't fixed yet). We need some way to get these runtime bounds into the reduction init and combiner regions. To keep things simple for now I opted to always box the array arguments so the box can be passed as one argument and the lower bounds and extents read from the box. This has the disadvantage of resulting in fir.box_dim operations inside of the critical section. If these prove to be a performance issue, we could follow OpenACC reading box lower bounds and extents before the reduction and passing them as block arguments to the reduction init and combiner regions. I would prefer to keep things simple for now. Note: this implementation only works when the HLFIR lowering is used. I don't think it is worth supporting FIR-only lowering because the plan is for that to be removed soon. --- flang/lib/Lower/OpenMP/ReductionProcessor.cpp | 283 ++ flang/lib/Lower/OpenMP/ReductionProcessor.h | 2 +- .../Lower/OpenMP/Todo/reduction-arrays.f90| 15 - .../Lower/OpenMP/parallel-reduction-array.f90 | 74 + .../Lower/OpenMP/wsloop-reduction-array.f90 | 84 ++ 5 files changed, 389 insertions(+), 69 deletions(-) delete mode 100644 flang/test/Lower/OpenMP/Todo/reduction-arrays.f90 create mode 100644 flang/test/Lower/OpenMP/parallel-reduction-array.f90 create mode 100644 flang/test/Lower/OpenMP/wsloop-reduction-array.f90 diff --git a/flang/lib/Lower/OpenMP/ReductionProcessor.cpp b/flang/lib/Lower/OpenMP/ReductionProcessor.cpp index e6a63dd4b939ce..de2f11f5b9512e 100644 --- a/flang/lib/Lower/OpenMP/ReductionProcessor.cpp +++ b/flang/lib/Lower/OpenMP/ReductionProcessor.cpp @@ -13,6 +13,7 @@ #include "ReductionProcessor.h" #include "flang/Lower/AbstractConverter.h" +#include "flang/Optimizer/Builder/HLFIRTools.h" #include "flang/Optimizer/Builder/Todo.h" #include "flang/Optimizer/Dialect/FIRType.h" #include "flang/Optimizer/HLFIR/HLFIROps.h" @@ -92,10 +93,42 @@ std::string ReductionProcessor::getReductionName(llvm::StringRef name, if (isByRef) byrefAddition = "_byref"; - return (llvm::Twine(name) + - (ty.isIntOrIndex() ? llvm::Twine("_i_") : llvm::Twine("_f_")) + - llvm::Twine(ty.getIntOrFloatBitWidth()) + byrefAddition) - .str(); + if (fir::isa_trivial(ty)) +return (llvm::Twine(name) + +(ty.isIntOrIndex() ? 
llvm::Twine("_i_") : llvm::Twine("_f_")) + +llvm::Twine(ty.getIntOrFloatBitWidth()) + byrefAddition) +.str(); + + // creates a name like reduction_i_64_box_ux4x3 + if (auto boxTy = mlir::dyn_cast_or_null(ty)) { +// TODO: support for allocatable boxes: +// !fir.box>> +fir::SequenceType seqTy = fir::unwrapRefType(boxTy.getEleTy()) + .dyn_cast_or_null(); +if (!seqTy) + return {}; + +std::string prefix = getReductionName( +name, fir::unwrapSeqOrBoxedSeqType(ty), /*isByRef=*/false); +if (prefix.empty()) + return {}; +std::stringstream tyStr; +tyStr << prefix << "_box_"; +bool first = true; +for (std::int64_t extent : seqTy.getShape()) { + if (first) +first = false; + else +tyStr << "x"; + if (extent == seqTy.getUnknownExtent()) +tyStr << 'u'; // I'm not sure that '?' is safe in symbol names + else +tyStr << extent; +} +return (tyStr.str() + byrefAddition).str(); + } + + return {}; } std::string ReductionProcessor::getReductionName( @@ -283,6 +316,156 @@ mlir::Value ReductionProcessor::createScalarCombiner( return reductionOp; } +/// Create reduction combiner region for reduction variables which are boxed +/// arrays +static void genBoxCombiner(fir::FirOpBuilder &builder, mlir::Location loc, + ReductionProcessor::ReductionIdentifier redId, + fir::BaseBoxType boxTy, mlir::Value lhs, + mlir::Value rhs) { + fir::SequenceType seqTy = + mlir::dyn_cast_or_null(boxTy.getEleTy()); + // TODO: support allocatable arrays: !fir.box>> + if (!seqTy || seqTy.hasUnknownShape()) +TODO(loc, "Unsupported boxed type in OpenMP reduction"); + + // load fir.ref> + mlir::Value lhsAddr = lhs; + lhs = builder.create(loc, lhs); + rhs = builder.create(loc, rhs); + + const unsigned rank = seqTy.getDimension(); + llvm::SmallVector extents; + extents.reserve(
[llvm-branch-commits] [flang] [flang][OpenMP] lower simple array reductions (PR #84958)
https://github.com/tblah updated https://github.com/llvm/llvm-project/pull/84958 >From bd668cd95d95d1e5b9c8436875c14878c98902ff Mon Sep 17 00:00:00 2001 From: Tom Eccles Date: Mon, 12 Feb 2024 14:03:00 + Subject: [PATCH 1/5] [flang][OpenMP] lower simple array reductions This has been tested with arrays with compile-time constant bounds. Allocatable arrays and arrays with non-constant bounds are not yet supported. User-defined reduction functions are also not yet supported. The design is intended to work for arrays with non-constant bounds too without a lot of extra work (mostly there are bugs in OpenMPIRBuilder I haven't fixed yet). We need some way to get these runtime bounds into the reduction init and combiner regions. To keep things simple for now I opted to always box the array arguments so the box can be passed as one argument and the lower bounds and extents read from the box. This has the disadvantage of resulting in fir.box_dim operations inside of the critical section. If these prove to be a performance issue, we could follow OpenACC reading box lower bounds and extents before the reduction and passing them as block arguments to the reduction init and combiner regions. I would prefer to keep things simple for now. Note: this implementation only works when the HLFIR lowering is used. I don't think it is worth supporting FIR-only lowering because the plan is for that to be removed soon. --- flang/lib/Lower/OpenMP/ReductionProcessor.cpp | 283 ++ flang/lib/Lower/OpenMP/ReductionProcessor.h | 2 +- .../Lower/OpenMP/Todo/reduction-arrays.f90| 15 - .../Lower/OpenMP/parallel-reduction-array.f90 | 74 + .../Lower/OpenMP/wsloop-reduction-array.f90 | 84 ++ 5 files changed, 389 insertions(+), 69 deletions(-) delete mode 100644 flang/test/Lower/OpenMP/Todo/reduction-arrays.f90 create mode 100644 flang/test/Lower/OpenMP/parallel-reduction-array.f90 create mode 100644 flang/test/Lower/OpenMP/wsloop-reduction-array.f90 diff --git a/flang/lib/Lower/OpenMP/ReductionProcessor.cpp b/flang/lib/Lower/OpenMP/ReductionProcessor.cpp index e6a63dd4b939ce..de2f11f5b9512e 100644 --- a/flang/lib/Lower/OpenMP/ReductionProcessor.cpp +++ b/flang/lib/Lower/OpenMP/ReductionProcessor.cpp @@ -13,6 +13,7 @@ #include "ReductionProcessor.h" #include "flang/Lower/AbstractConverter.h" +#include "flang/Optimizer/Builder/HLFIRTools.h" #include "flang/Optimizer/Builder/Todo.h" #include "flang/Optimizer/Dialect/FIRType.h" #include "flang/Optimizer/HLFIR/HLFIROps.h" @@ -92,10 +93,42 @@ std::string ReductionProcessor::getReductionName(llvm::StringRef name, if (isByRef) byrefAddition = "_byref"; - return (llvm::Twine(name) + - (ty.isIntOrIndex() ? llvm::Twine("_i_") : llvm::Twine("_f_")) + - llvm::Twine(ty.getIntOrFloatBitWidth()) + byrefAddition) - .str(); + if (fir::isa_trivial(ty)) +return (llvm::Twine(name) + +(ty.isIntOrIndex() ? 
llvm::Twine("_i_") : llvm::Twine("_f_")) + +llvm::Twine(ty.getIntOrFloatBitWidth()) + byrefAddition) +.str(); + + // creates a name like reduction_i_64_box_ux4x3 + if (auto boxTy = mlir::dyn_cast_or_null(ty)) { +// TODO: support for allocatable boxes: +// !fir.box>> +fir::SequenceType seqTy = fir::unwrapRefType(boxTy.getEleTy()) + .dyn_cast_or_null(); +if (!seqTy) + return {}; + +std::string prefix = getReductionName( +name, fir::unwrapSeqOrBoxedSeqType(ty), /*isByRef=*/false); +if (prefix.empty()) + return {}; +std::stringstream tyStr; +tyStr << prefix << "_box_"; +bool first = true; +for (std::int64_t extent : seqTy.getShape()) { + if (first) +first = false; + else +tyStr << "x"; + if (extent == seqTy.getUnknownExtent()) +tyStr << 'u'; // I'm not sure that '?' is safe in symbol names + else +tyStr << extent; +} +return (tyStr.str() + byrefAddition).str(); + } + + return {}; } std::string ReductionProcessor::getReductionName( @@ -283,6 +316,156 @@ mlir::Value ReductionProcessor::createScalarCombiner( return reductionOp; } +/// Create reduction combiner region for reduction variables which are boxed +/// arrays +static void genBoxCombiner(fir::FirOpBuilder &builder, mlir::Location loc, + ReductionProcessor::ReductionIdentifier redId, + fir::BaseBoxType boxTy, mlir::Value lhs, + mlir::Value rhs) { + fir::SequenceType seqTy = + mlir::dyn_cast_or_null(boxTy.getEleTy()); + // TODO: support allocatable arrays: !fir.box>> + if (!seqTy || seqTy.hasUnknownShape()) +TODO(loc, "Unsupported boxed type in OpenMP reduction"); + + // load fir.ref> + mlir::Value lhsAddr = lhs; + lhs = builder.create(loc, lhs); + rhs = builder.create(loc, rhs); + + const unsigned rank = seqTy.getDimension(); + llvm::SmallVector extents; + extents.reserve(
[llvm-branch-commits] [flang] [flang][OpenMP] lower simple array reductions (PR #84958)
@@ -92,10 +93,42 @@ std::string ReductionProcessor::getReductionName(llvm::StringRef name, if (isByRef) byrefAddition = "_byref"; - return (llvm::Twine(name) + - (ty.isIntOrIndex() ? llvm::Twine("_i_") : llvm::Twine("_f_")) + - llvm::Twine(ty.getIntOrFloatBitWidth()) + byrefAddition) - .str(); + if (fir::isa_trivial(ty)) +return (llvm::Twine(name) + +(ty.isIntOrIndex() ? llvm::Twine("_i_") : llvm::Twine("_f_")) + +llvm::Twine(ty.getIntOrFloatBitWidth()) + byrefAddition) +.str(); + + // creates a name like reduction_i_64_box_ux4x3 + if (auto boxTy = mlir::dyn_cast_or_null(ty)) { +// TODO: support for allocatable boxes: +// !fir.box>> +fir::SequenceType seqTy = fir::unwrapRefType(boxTy.getEleTy()) + .dyn_cast_or_null(); +if (!seqTy) + return {}; + +std::string prefix = getReductionName( +name, fir::unwrapSeqOrBoxedSeqType(ty), /*isByRef=*/false); +if (prefix.empty()) + return {}; +std::stringstream tyStr; +tyStr << prefix << "_box_"; +bool first = true; +for (std::int64_t extent : seqTy.getShape()) { + if (first) +first = false; + else +tyStr << "x"; + if (extent == seqTy.getUnknownExtent()) +tyStr << 'u'; // I'm not sure that '?' is safe in symbol names + else +tyStr << extent; +} +return (tyStr.str() + byrefAddition).str(); + } + + return {}; tblah wrote: https://github.com/llvm/llvm-project/pull/85666 https://github.com/llvm/llvm-project/pull/84958 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [mlir] [mlir][LLVM] erase call mappings in forgetMapping() (PR #84955)
https://github.com/tblah updated https://github.com/llvm/llvm-project/pull/84955 >From 0b2f5fee61d170b0a2197fd5da92f0e84b3b14f4 Mon Sep 17 00:00:00 2001 From: Tom Eccles Date: Wed, 21 Feb 2024 14:22:39 + Subject: [PATCH 1/3] [mlir][LLVM] erase call mappings in forgetMapping() It looks like the mappings for call instructions were forgotten here. This fixes a bug in OpenMP when inlining a region containing call operations multiple times. --- mlir/lib/Target/LLVMIR/ModuleTranslation.cpp | 2 ++ 1 file changed, 2 insertions(+) diff --git a/mlir/lib/Target/LLVMIR/ModuleTranslation.cpp b/mlir/lib/Target/LLVMIR/ModuleTranslation.cpp index c00628a420a000..995544238e4a3c 100644 --- a/mlir/lib/Target/LLVMIR/ModuleTranslation.cpp +++ b/mlir/lib/Target/LLVMIR/ModuleTranslation.cpp @@ -716,6 +716,8 @@ void ModuleTranslation::forgetMapping(Region ®ion) { branchMapping.erase(&op); if (isa(op)) globalsMapping.erase(&op); +if (isa(op)) + callMapping.erase(&op); llvm::append_range( toProcess, llvm::map_range(op.getRegions(), [](Region &r) { return &r; })); >From d6955d43534d23aaf58c10090c1a5a2167018796 Mon Sep 17 00:00:00 2001 From: Tom Eccles Date: Fri, 15 Mar 2024 14:05:53 + Subject: [PATCH 2/3] Add test --- .../Target/LLVMIR/openmp-reduction-call.mlir | 47 +++ 1 file changed, 47 insertions(+) create mode 100644 mlir/test/Target/LLVMIR/openmp-reduction-call.mlir diff --git a/mlir/test/Target/LLVMIR/openmp-reduction-call.mlir b/mlir/test/Target/LLVMIR/openmp-reduction-call.mlir new file mode 100644 index 00..60419a85f66aa2 --- /dev/null +++ b/mlir/test/Target/LLVMIR/openmp-reduction-call.mlir @@ -0,0 +1,47 @@ +// RUN: mlir-translate -mlir-to-llvmir %s | FileCheck %s + +// Test that we don't crash when there is a call operation in the combiner + +omp.reduction.declare @add_f32 : f32 +init { +^bb0(%arg: f32): + %0 = llvm.mlir.constant(0.0 : f32) : f32 + omp.yield (%0 : f32) +} +combiner { +^bb1(%arg0: f32, %arg1: f32): +// test this call here: + llvm.call @test_call() : () -> () + %1 = llvm.fadd %arg0, %arg1 : f32 + omp.yield (%1 : f32) +} +atomic { +^bb2(%arg2: !llvm.ptr, %arg3: !llvm.ptr): + %2 = llvm.load %arg3 : !llvm.ptr -> f32 + llvm.atomicrmw fadd %arg2, %2 monotonic : !llvm.ptr, f32 + omp.yield +} + +llvm.func @simple_reduction(%lb : i64, %ub : i64, %step : i64) { + %c1 = llvm.mlir.constant(1 : i32) : i32 + %0 = llvm.alloca %c1 x i32 : (i32) -> !llvm.ptr + omp.parallel { +omp.wsloop reduction(@add_f32 %0 -> %prv : !llvm.ptr) +for (%iv) : i64 = (%lb) to (%ub) step (%step) { + %1 = llvm.mlir.constant(2.0 : f32) : f32 + %2 = llvm.load %prv : !llvm.ptr -> f32 + %3 = llvm.fadd %1, %2 : f32 + llvm.store %3, %prv : f32, !llvm.ptr + omp.yield +} +omp.terminator + } + llvm.return +} + +llvm.func @test_call() -> () + +// Call to the troublesome function will have been inlined twice: once into +// main and once into the outlined function +// CHECK: call void @test_call() +// CHECK: call void @test_call() >From 02550e13fc8e9d3e653b8e689a4479cfdd0bfcc8 Mon Sep 17 00:00:00 2001 From: Tom Eccles Date: Mon, 18 Mar 2024 17:16:55 + Subject: [PATCH 3/3] Simplify test --- .../Target/LLVMIR/openmp-reduction-call.mlir | 20 +-- 1 file changed, 5 insertions(+), 15 deletions(-) diff --git a/mlir/test/Target/LLVMIR/openmp-reduction-call.mlir b/mlir/test/Target/LLVMIR/openmp-reduction-call.mlir index 60419a85f66aa2..9aa9ac7e4b5a52 100644 --- a/mlir/test/Target/LLVMIR/openmp-reduction-call.mlir +++ b/mlir/test/Target/LLVMIR/openmp-reduction-call.mlir @@ -15,25 +15,15 @@ combiner { %1 = llvm.fadd %arg0, %arg1 : f32 omp.yield (%1 
: f32) } -atomic { -^bb2(%arg2: !llvm.ptr, %arg3: !llvm.ptr): - %2 = llvm.load %arg3 : !llvm.ptr -> f32 - llvm.atomicrmw fadd %arg2, %2 monotonic : !llvm.ptr, f32 - omp.yield -} llvm.func @simple_reduction(%lb : i64, %ub : i64, %step : i64) { %c1 = llvm.mlir.constant(1 : i32) : i32 %0 = llvm.alloca %c1 x i32 : (i32) -> !llvm.ptr - omp.parallel { -omp.wsloop reduction(@add_f32 %0 -> %prv : !llvm.ptr) -for (%iv) : i64 = (%lb) to (%ub) step (%step) { - %1 = llvm.mlir.constant(2.0 : f32) : f32 - %2 = llvm.load %prv : !llvm.ptr -> f32 - %3 = llvm.fadd %1, %2 : f32 - llvm.store %3, %prv : f32, !llvm.ptr - omp.yield -} + omp.parallel reduction(@add_f32 %0 -> %prv : !llvm.ptr) { +%1 = llvm.mlir.constant(2.0 : f32) : f32 +%2 = llvm.load %prv : !llvm.ptr -> f32 +%3 = llvm.fadd %1, %2 : f32 +llvm.store %3, %prv : f32, !llvm.ptr omp.terminator } llvm.return ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [mlir] [mlir][LLVM] erase call mappings in forgetMapping() (PR #84955)
tblah wrote: Sorry about the force push. GitHub wasn't showing the new commit (02550e1). It is just a rebase onto the target branch. https://github.com/llvm/llvm-project/pull/84955 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [mlir] [mlir][OpenMP] map argument to reduction initialization region (PR #86979)
https://github.com/tblah created https://github.com/llvm/llvm-project/pull/86979 The argument to the initialization region of reduction declarations was never mapped. This meant that if this argument was accessed inside the initialization region, that mlir operation would be translated to an llvm operation with a null argument (failing verification). Adding the mapping ensures that the right LLVM value can be found when inlining and converting the initialization region. We have to separately establish and clean up these mappings for each use of the reduction declaration because repeated usage of the same declaration will inline it using a different concrete value for the block argument. >From aa5f843292a941dc976bd445d1dce4916cfb7438 Mon Sep 17 00:00:00 2001 From: Tom Eccles Date: Mon, 25 Mar 2024 16:36:02 + Subject: [PATCH] [mlir][OpenMP] map argument to reduction initialization region The argument to the initialization region of reduction declarations was never mapped. This meant that if this argument was accessed inside the initialization region, that mlir operation would be translated to an llvm operation with a null argument (failing verification). Adding the mapping ensures that the right LLVM value can be found when inlining and converting the initialization region. We have to separately establish and clean up these mappings for each use of the reduction declaration because repeated usage of the same declaration will inline it using a different concrete value for the block argument. --- .../OpenMP/OpenMPToLLVMIRTranslation.cpp | 35 + .../LLVMIR/openmp-reduction-init-arg.mlir | 147 ++ 2 files changed, 182 insertions(+) create mode 100644 mlir/test/Target/LLVMIR/openmp-reduction-init-arg.mlir diff --git a/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp b/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp index cacf2c37e38d1c..c4bf6a20ebe71c 100644 --- a/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp +++ b/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp @@ -825,6 +825,25 @@ static void allocByValReductionVars( } } +/// Map input argument to all reduction initialization regions +template +static void +mapInitializationArg(T loop, LLVM::ModuleTranslation &moduleTranslation, + SmallVectorImpl &reductionDecls, + unsigned i) { + // map input argument to the initialization region + mlir::omp::DeclareReductionOp &reduction = reductionDecls[i]; + Region &initializerRegion = reduction.getInitializerRegion(); + Block &entry = initializerRegion.front(); + assert(entry.getNumArguments() == 1 && + "the initialization region has one argument"); + + mlir::Value mlirSource = loop.getReductionVars()[i]; + llvm::Value *llvmSource = moduleTranslation.lookupValue(mlirSource); + assert(llvmSource && "lookup reduction var"); + moduleTranslation.mapValue(entry.getArgument(0), llvmSource); +} + /// Collect reduction info template static void collectReductionInfo( @@ -902,6 +921,10 @@ convertOmpWsloop(Operation &opInst, llvm::IRBuilderBase &builder, loop.getRegion().getArguments().take_back(loop.getNumReductionVars()); for (unsigned i = 0; i < loop.getNumReductionVars(); ++i) { SmallVector phis; + +// map block argument to initializer region +mapInitializationArg(loop, moduleTranslation, reductionDecls, i); + if (failed(inlineConvertOmpRegions(reductionDecls[i].getInitializerRegion(), "omp.reduction.neutral", builder, moduleTranslation, &phis))) @@ -925,6 +948,11 @@ convertOmpWsloop(Operation &opInst, llvm::IRBuilderBase &builder, 
builder.CreateStore(phis[0], privateReductionVariables[i]); // the rest was handled in allocByValReductionVars } + +// forget the mapping for the initializer region because we might need a +// different mapping if this reduction declaration is re-used for a +// different variable +moduleTranslation.forgetMapping(reductionDecls[i].getInitializerRegion()); } // Store the mapping between reduction variables and their private copies on @@ -1118,6 +1146,9 @@ convertOmpParallel(omp::ParallelOp opInst, llvm::IRBuilderBase &builder, opInst.getNumReductionVars()); for (unsigned i = 0; i < opInst.getNumReductionVars(); ++i) { SmallVector phis; + + // map the block argument + mapInitializationArg(opInst, moduleTranslation, reductionDecls, i); if (failed(inlineConvertOmpRegions( reductionDecls[i].getInitializerRegion(), "omp.reduction.neutral", builder, moduleTranslation, &phis))) @@ -1144,6 +1175,10 @@ convertOmpParallel(omp::ParallelOp opInst, llvm::IRBuilderBase &builder, builder.CreateStore(phis[0], privateReductionVariables[i]); // the rest is done in allocByValReduc
[llvm-branch-commits] [mlir] [mlir][OpenMP] map argument to reduction initialization region (PR #86979)
https://github.com/tblah edited https://github.com/llvm/llvm-project/pull/86979 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
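Distilled from the #86979 patch above, the map-then-forget pattern for reduction initializer regions looks roughly like the sketch below. The helper declarations paraphrase the signatures visible in the diff and are assumptions made for the sake of a self-contained example, not the exact upstream code.

#include "mlir/Dialect/OpenMP/OpenMPDialect.h"
#include "mlir/Target/LLVMIR/ModuleTranslation.h"
#include "llvm/ADT/SmallVector.h"
#include "llvm/IR/IRBuilder.h"

// Helpers defined in OpenMPToLLVMIRTranslation.cpp (see the patch above);
// signatures paraphrased here.
template <typename T>
void mapInitializationArg(T op, mlir::LLVM::ModuleTranslation &moduleTranslation,
                          llvm::SmallVectorImpl<mlir::omp::DeclareReductionOp> &reductionDecls,
                          unsigned i);
mlir::LogicalResult
inlineConvertOmpRegions(mlir::Region &region, llvm::StringRef blockName,
                        llvm::IRBuilderBase &builder,
                        mlir::LLVM::ModuleTranslation &moduleTranslation,
                        llvm::SmallVectorImpl<llvm::Value *> *phis);

// Per use of a declaration: bind the init-region block argument to this
// use's reduction variable, inline the region, then drop the binding so the
// same declaration can be reused with a different concrete value.
template <typename OpTy>
mlir::LogicalResult translateReductionInits(
    OpTy op, llvm::IRBuilderBase &builder,
    mlir::LLVM::ModuleTranslation &moduleTranslation,
    llvm::SmallVectorImpl<mlir::omp::DeclareReductionOp> &reductionDecls) {
  for (unsigned i = 0; i < op.getNumReductionVars(); ++i) {
    mapInitializationArg(op, moduleTranslation, reductionDecls, i);
    llvm::SmallVector<llvm::Value *> phis;
    if (failed(inlineConvertOmpRegions(
            reductionDecls[i].getInitializerRegion(), "omp.reduction.neutral",
            builder, moduleTranslation, &phis)))
      return mlir::failure();
    moduleTranslation.forgetMapping(reductionDecls[i].getInitializerRegion());
  }
  return mlir::success();
}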
[llvm-branch-commits] [flang] [flang][OpenMP] lower reductions of assumed shape arrays (PR #86982)
https://github.com/tblah created https://github.com/llvm/llvm-project/pull/86982 Patch 1: https://github.com/llvm/llvm-project/pull/86978 Patch 2: https://github.com/llvm/llvm-project/pull/86979 >From 9f68c844b6f4c4a52002cd9d90cd158b10e64bf2 Mon Sep 17 00:00:00 2001 From: Tom Eccles Date: Tue, 19 Mar 2024 15:41:59 + Subject: [PATCH] [flang][OpenMP] lower reductions of assumed shape arrays --- flang/lib/Lower/OpenMP/ReductionProcessor.cpp | 25 +- .../wsloop-reduction-array-assumed-shape.f90 | 90 +++ 2 files changed, 112 insertions(+), 3 deletions(-) create mode 100644 flang/test/Lower/OpenMP/wsloop-reduction-array-assumed-shape.f90 diff --git a/flang/lib/Lower/OpenMP/ReductionProcessor.cpp b/flang/lib/Lower/OpenMP/ReductionProcessor.cpp index 0d05ca5aee658b..afb1a6e7107641 100644 --- a/flang/lib/Lower/OpenMP/ReductionProcessor.cpp +++ b/flang/lib/Lower/OpenMP/ReductionProcessor.cpp @@ -522,11 +522,16 @@ void ReductionProcessor::addDeclareReduction( if (reductionSymbols) reductionSymbols->push_back(symbol); mlir::Value symVal = converter.getSymbolAddress(*symbol); -auto redType = mlir::cast(symVal.getType()); +mlir::Type eleType; +auto refType = mlir::dyn_cast_or_null(symVal.getType()); +if (refType) + eleType = refType.getEleTy(); +else + eleType = symVal.getType(); // all arrays must be boxed so that we have convenient access to all the // information needed to iterate over the array -if (mlir::isa(redType.getEleTy())) { +if (mlir::isa(eleType)) { hlfir::Entity entity{symVal}; entity = genVariableBox(currentLocation, builder, entity); mlir::Value box = entity.getBase(); @@ -538,11 +543,25 @@ void ReductionProcessor::addDeclareReduction( builder.create(currentLocation, box, alloca); symVal = alloca; - redType = mlir::cast(symVal.getType()); +} else if (mlir::isa(symVal.getType())) { + // boxed arrays are passed as values not by reference. Unfortunately, + // we can't pass a box by value to omp.redution_declare, so turn it + // into a reference + + auto alloca = + builder.create(currentLocation, symVal.getType()); + builder.create(currentLocation, symVal, alloca); + symVal = alloca; } else if (auto declOp = symVal.getDefiningOp()) { symVal = declOp.getBase(); } +// this isn't the same as the by-val and by-ref passing later in the +// pipeline. Both styles assume that the variable is a reference at +// this point +assert(mlir::isa(symVal.getType()) && + "reduction input var is a reference"); + reductionVars.push_back(symVal); } const bool isByRef = doReductionByRef(reductionVars); diff --git a/flang/test/Lower/OpenMP/wsloop-reduction-array-assumed-shape.f90 b/flang/test/Lower/OpenMP/wsloop-reduction-array-assumed-shape.f90 new file mode 100644 index 00..a1f339faea5cd5 --- /dev/null +++ b/flang/test/Lower/OpenMP/wsloop-reduction-array-assumed-shape.f90 @@ -0,0 +1,90 @@ +! RUN: bbc -emit-hlfir -fopenmp -o - %s | FileCheck %s +! RUN: %flang_fc1 -emit-hlfir -fopenmp -o - %s | FileCheck %s + +program reduce_assumed_shape +real(8), dimension(2) :: r +r = 0 +call reduce(r) +print *, r + +contains +subroutine reduce(r) + implicit none + real(8),intent(inout) :: r(:) + integer :: i = 0 + + !$omp parallel do reduction(+:r) + do i=0,10 +r(1) = i +r(2) = 1 + enddo + !$omp end parallel do +end subroutine +end program + +! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_Uxf64 : !fir.ref>> init { +! CHECK: ^bb0(%[[VAL_0:.*]]: !fir.ref>>): +! CHECK: %[[VAL_1:.*]] = arith.constant 0.00e+00 : f64 +! CHECK: %[[VAL_2:.*]] = fir.load %[[VAL_0]] : !fir.ref>> +! 
CHECK: %[[VAL_3:.*]] = arith.constant 0 : index +! CHECK: %[[VAL_4:.*]]:3 = fir.box_dims %[[VAL_2]], %[[VAL_3]] : (!fir.box>, index) -> (index, index, index) +! CHECK: %[[VAL_5:.*]] = fir.shape %[[VAL_4]]#1 : (index) -> !fir.shape<1> +! CHECK: %[[VAL_6:.*]] = fir.alloca !fir.array, %[[VAL_4]]#1 {bindc_name = ".tmp"} +! CHECK: %[[VAL_7:.*]]:2 = hlfir.declare %[[VAL_6]](%[[VAL_5]]) {uniq_name = ".tmp"} : (!fir.ref>, !fir.shape<1>) -> (!fir.box>, !fir.ref>) +! CHECK: hlfir.assign %[[VAL_1]] to %[[VAL_7]]#0 : f64, !fir.box> +! CHECK: %[[VAL_8:.*]] = fir.alloca !fir.box> +! CHECK: fir.store %[[VAL_7]]#0 to %[[VAL_8]] : !fir.ref>> +! CHECK: omp.yield(%[[VAL_8]] : !fir.ref>>) + +! CHECK-LABEL: } combiner { +! CHECK: ^bb0(%[[VAL_0:.*]]: !fir.ref>>, %[[VAL_1:.*]]: !fir.ref>>): +! CHECK: %[[VAL_2:.*]] = fir.load %[[VAL_0]] : !fir.ref>> +! CHECK: %[[VAL_3:.*]] = fir.load %[[VAL_1]] : !fir.ref>> +! CHECK: %[[VAL_4:.*]] = arith.constant 0 : index +! CHECK: %[[VAL_5:.*]]:3 = fir.box_dims %[[VAL_2]], %[[VAL_4]] : (!fir.box>
[llvm-branch-commits] [mlir] [MLIR][OpenMP] Update op verifiers dependent on omp.wsloop (2/5) (PR #89211)
@@ -1977,9 +1977,10 @@ LogicalResult OrderedRegionOp::verify() { if (getSimd()) return failure(); - if (auto container = (*this)->getParentOfType()) { -if (!container.getOrderedValAttr() || -container.getOrderedValAttr().getInt() != 0) + if (auto loopOp = dyn_cast((*this)->getParentOp())) { tblah wrote: Could there be any case where there is another operation in between the OrderedRegionOp and the LoopNestOp? If not, maybe we should check the parent explicitly https://github.com/llvm/llvm-project/pull/89211 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [mlir] [MLIR][SCF] Update scf.parallel lowering to OpenMP (3/5) (PR #89212)
https://github.com/tblah approved this pull request. LGTM, thanks https://github.com/llvm/llvm-project/pull/89212 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [mlir] [MLIR][OpenMP] Update omp.wsloop translation to LLVM IR (4/5) (PR #89214)
https://github.com/tblah approved this pull request. LGTM https://github.com/llvm/llvm-project/pull/89214 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [mlir] [MLIR][OpenMP] Update op verifiers dependent on omp.wsloop (2/5) (PR #89211)
https://github.com/tblah approved this pull request. LGTM. I'm sorry this fell off my radar https://github.com/llvm/llvm-project/pull/89211 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [flang] [flang][OpenMP] Implement getIterationVariableSymbol helper function,… (PR #90087)
https://github.com/tblah approved this pull request. LG https://github.com/llvm/llvm-project/pull/90087 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [flang] [flang][OpenMP] Pass symTable to all genXYZ functions, NFC (PR #90090)
https://github.com/tblah approved this pull request. LGTM, thanks https://github.com/llvm/llvm-project/pull/90090 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [flang] [flang][OpenMP] Don't pass clauses to op-generating functions anymore (PR #90108)
https://github.com/tblah approved this pull request. LG. Thanks for the cleanup https://github.com/llvm/llvm-project/pull/90108 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits