[llvm-branch-commits] [clang] PR for llvm/llvm-project#80432 (PR #80433)
https://github.com/dtemirbulatov approved this pull request. LGTM. https://github.com/llvm/llvm-project/pull/80433 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [clang] PR for llvm/llvm-project#79614 (PR #79870)
labrinea wrote: Ping! @tstellar do I need to take any actions for this to show up in the release board? I am not seeing it under `needs triage` or `needs merge`. https://github.com/llvm/llvm-project/pull/79870
[llvm-branch-commits] [flang] [mlir] [mlir][flang][openmp] Rework wsloop reduction operations (PR #80019)
https://github.com/kiranchandramohan ready_for_review https://github.com/llvm/llvm-project/pull/80019
[llvm-branch-commits] [flang] [mlir] [mlir][flang][openmp] Rework wsloop reduction operations (PR #80019)
llvmbot wrote:

@llvm/pr-subscribers-mlir-llvm

Author: David Truby (DavidTruby)

Changes

This patch reworks the way that wsloop reduction operations function to better match the expected semantics from the OpenMP specification, following the rework of parallel reductions.

The new semantics create a private reduction variable as a block argument which should be used normally for all operations on that variable in the region; this private variable is then combined with the others into the shared variable. This way no special omp.reduction operations are needed inside the region. These block arguments follow the loop control block arguments.

---

Patch is 361.55 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/80019.diff

37 Files Affected:

- (modified) flang/lib/Lower/OpenMP.cpp (+37-19)
- (modified) flang/test/Fir/convert-to-llvm-openmp-and-fir.fir (+16-4)
- (modified) flang/test/Lower/OpenMP/FIR/wsloop-reduction-add.f90 (+266-166)
- (modified) flang/test/Lower/OpenMP/FIR/wsloop-reduction-iand.f90 (+5-3)
- (modified) flang/test/Lower/OpenMP/FIR/wsloop-reduction-ieor.f90 (+5-3)
- (modified) flang/test/Lower/OpenMP/FIR/wsloop-reduction-ior.f90 (+4-2)
- (removed) flang/test/Lower/OpenMP/FIR/wsloop-reduction-logical-and.f90 (-137)
- (modified) flang/test/Lower/OpenMP/FIR/wsloop-reduction-logical-eqv.f90 (+138-93)
- (modified) flang/test/Lower/OpenMP/FIR/wsloop-reduction-logical-neqv.f90 (+140-93)
- (removed) flang/test/Lower/OpenMP/FIR/wsloop-reduction-logical-or.f90 (-137)
- (modified) flang/test/Lower/OpenMP/FIR/wsloop-reduction-max.f90 (+8-5)
- (modified) flang/test/Lower/OpenMP/FIR/wsloop-reduction-min.f90 (+8-4)
- (removed) flang/test/Lower/OpenMP/FIR/wsloop-reduction-mul.f90 (-274)
- (modified) flang/test/Lower/OpenMP/default-clause.f90 (+1-1)
- (modified) flang/test/Lower/OpenMP/wsloop-reduction-add-hlfir.f90 (+37-28)
- (modified) flang/test/Lower/OpenMP/wsloop-reduction-add.f90 (+312-187)
- (modified) flang/test/Lower/OpenMP/wsloop-reduction-iand.f90 (+40-24)
- (modified) flang/test/Lower/OpenMP/wsloop-reduction-ieor.f90 (+6-3)
- (modified) flang/test/Lower/OpenMP/wsloop-reduction-ior.f90 (+41-24)
- (modified) flang/test/Lower/OpenMP/wsloop-reduction-logical-and.f90 (+158-97)
- (modified) flang/test/Lower/OpenMP/wsloop-reduction-logical-eqv.f90 (+153-96)
- (modified) flang/test/Lower/OpenMP/wsloop-reduction-logical-neqv.f90 (+158-96)
- (modified) flang/test/Lower/OpenMP/wsloop-reduction-logical-or.f90 (+155-96)
- (modified) flang/test/Lower/OpenMP/wsloop-reduction-max-2.f90 (+2-1)
- (modified) flang/test/Lower/OpenMP/wsloop-reduction-max-hlfir.f90 (+41-19)
- (modified) flang/test/Lower/OpenMP/wsloop-reduction-max.f90 (+105-48)
- (modified) flang/test/Lower/OpenMP/wsloop-reduction-min.f90 (+107-48)
- (modified) flang/test/Lower/OpenMP/wsloop-reduction-mul.f90 (+282-186)
- (modified) mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td (+3-6)
- (modified) mlir/lib/Conversion/SCFToOpenMP/SCFToOpenMP.cpp (+31-1)
- (modified) mlir/lib/Dialect/OpenMP/IR/OpenMPDialect.cpp (+90-11)
- (modified) mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp (+14-24)
- (modified) mlir/test/Conversion/OpenMPToLLVM/convert-to-llvmir.mlir (+10-4)
- (modified) mlir/test/Conversion/SCFToOpenMP/reductions.mlir (+22-8)
- (modified) mlir/test/Dialect/OpenMP/invalid.mlir (+3-32)
- (modified) mlir/test/Dialect/OpenMP/ops.mlir (+17-9)
- (modified) mlir/test/Target/LLVMIR/openmp-reduction.mlir (+30-14)

```diff
diff --git a/flang/lib/Lower/OpenMP.cpp b/flang/lib/Lower/OpenMP.cpp
index fcf10b26c135b..74cd6c27b3440 100644
--- a/flang/lib/Lower/OpenMP.cpp
+++ b/flang/lib/Lower/OpenMP.cpp
@@ -2274,6 +2274,12 @@ static void createBodyOfOp(
     return undef.getDefiningOp();
   };
 
+  llvm::SmallVector<mlir::Type> blockArgTypes;
+  llvm::SmallVector<mlir::Location> blockArgLocs;
+  blockArgTypes.reserve(loopArgs.size() + reductionArgs.size());
+  blockArgLocs.reserve(blockArgTypes.size());
+  mlir::Block *entryBlock;
+
   // If an argument for the region is provided then create the block with that
   // argument. Also update the symbol's address with the mlir argument value.
   // e.g. For loops the argument is the induction variable. And all further
@@ -2283,11 +2289,21 @@ static void createBodyOfOp(
     for (const Fortran::semantics::Symbol *arg : loopArgs)
       loopVarTypeSize = std::max(loopVarTypeSize, arg->GetUltimate().size());
     mlir::Type loopVarType = getLoopVarType(converter, loopVarTypeSize);
-    llvm::SmallVector<mlir::Type> tiv(loopArgs.size(), loopVarType);
-    llvm::SmallVector<mlir::Location> locs(loopArgs.size(), loc);
-    firOpBuilder.createBlock(&op.getRegion(), {}, tiv, locs);
-    // The argument is not currently in memory, so make a temporary for the
-    // argument, and store it there, then bind that location to the argument.
+    std::fill_n(std::back_inserter(blockArgTypes), loopArgs.size(),
+                loopVarType);
+    std::fill_n(std::
```
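The reduction semantics described in the patch summary (each worker updates a private variable with ordinary operations, and the private values are combined into the shared variable afterwards) can be modelled roughly as below. This is a sequential sketch in plain C++, not MLIR or flang code; `reduceSum`, `numWorkers`, and the chunking scheme are hypothetical names and choices made only for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Sequential model of a worksharing-loop sum reduction (no real threads).
// Each "worker" owns a private accumulator, which plays the role of the
// private reduction block argument described in the patch summary.
int reduceSum(const std::vector<int> &data, std::size_t numWorkers,
              int shared) {
  assert(numWorkers > 0 && "need at least one worker");
  // Split the iteration space into contiguous chunks, one per worker.
  const std::size_t chunk = (data.size() + numWorkers - 1) / numWorkers;
  // Private copies start at the identity of the combiner (0 for addition).
  std::vector<int> privates(numWorkers, 0);
  for (std::size_t w = 0; w < numWorkers; ++w) {
    const std::size_t begin = std::min(w * chunk, data.size());
    const std::size_t end = std::min(begin + chunk, data.size());
    int &priv = privates[w];
    for (std::size_t i = begin; i < end; ++i)
      priv += data[i]; // ordinary operation, no special reduction op
  }
  // Combine every private value into the shared variable.
  for (int priv : privates)
    shared += priv;
  return shared;
}
```

The loop body only touches `priv`; the combination step is the single place where the shared value is updated, which is the property the summary attributes to the new block-argument form.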
[llvm-branch-commits] [clang] [Release Notes][FMV] Document support for rcpc3 and mops features. (PR #80152)
labrinea wrote: @tstellar could you please merge this patch on the release branch? Cheers. https://github.com/llvm/llvm-project/pull/80152
[llvm-branch-commits] [mlir] [flang] [mlir][flang][openmp] Rework wsloop reduction operations (PR #80019)
tblah wrote: Please could you update the documentation for reductions on line 442 - I presume we don't want to encourage `omp.reduction` operations anymore https://github.com/llvm/llvm-project/pull/80019
[llvm-branch-commits] [mlir] [flang] [mlir][flang][openmp] Rework wsloop reduction operations (PR #80019)
@@ -398,11 +400,39 @@ struct ParallelOpLowering : public OpRewritePattern<scf::ParallelOp> {
       // Replace the reduction operations contained in this loop. Must be done
       // here rather than in a separate pattern to have access to the list of
       // reduction variables.
+      unsigned int reductionIndex = 0;
       for (auto [x, y] :
            llvm::zip_equal(reductionVariables, reduce.getOperands())) {
         OpBuilder::InsertionGuard guard(rewriter);
         rewriter.setInsertionPoint(reduce);
-        rewriter.create<omp::ReductionOp>(reduce.getLoc(), y, x);
+        Region &redRegion =
+            ompReductionDecls[reductionIndex].getReductionRegion();
+        assert(redRegion.hasOneBlock() &&
+               "expect reduction region to have one block");

tblah wrote:

Please could you add a comment explaining why a reduction region must have only one block, or add a TODO for multiple blocks.

https://github.com/llvm/llvm-project/pull/80019
[llvm-branch-commits] [mlir] [flang] [mlir][flang][openmp] Rework wsloop reduction operations (PR #80019)
@@ -398,11 +400,39 @@ struct ParallelOpLowering : public OpRewritePattern<scf::ParallelOp> {
       // Replace the reduction operations contained in this loop. Must be done
       // here rather than in a separate pattern to have access to the list of
       // reduction variables.
+      unsigned int reductionIndex = 0;
       for (auto [x, y] :
            llvm::zip_equal(reductionVariables, reduce.getOperands())) {

tblah wrote:

nit: you could add `ompReductionDecls` to the `llvm::zip_equal` so that the loop handles the iteration.

https://github.com/llvm/llvm-project/pull/80019
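The nit above, iterating three sequences in lockstep instead of maintaining a separate `reductionIndex`, might look roughly like the following. This is a standalone sketch that does not use LLVM: `forEachZippedEqual` is a hypothetical stand-in for `llvm::zip_equal` over three ranges, and `describeReductions` with its string and integer element types is invented purely to make the example runnable.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for zipping three ranges of equal length: visits the
// i-th element of each vector together and checks that the lengths agree.
template <typename A, typename B, typename C, typename Fn>
void forEachZippedEqual(const std::vector<A> &as, const std::vector<B> &bs,
                        const std::vector<C> &cs, Fn fn) {
  assert(as.size() == bs.size() && bs.size() == cs.size() &&
         "zipped ranges must have equal length");
  for (std::size_t i = 0; i < as.size(); ++i)
    fn(as[i], bs[i], cs[i]);
}

// Pairs each reduction variable with its operand and its declaration name,
// without a hand-maintained index counter next to the loop.
std::vector<std::string>
describeReductions(const std::vector<std::string> &vars,
                   const std::vector<int> &operands,
                   const std::vector<std::string> &decls) {
  std::vector<std::string> out;
  forEachZippedEqual(
      vars, operands, decls,
      [&](const std::string &x, int y, const std::string &decl) {
        out.push_back(decl + "(" + x + ", " + std::to_string(y) + ")");
      });
  return out;
}
```

Folding the third sequence into the zip keeps the length check and the element pairing in one place, so the loop body cannot drift out of step with a counter that someone forgets to increment.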
[llvm-branch-commits] [clang] PR for llvm/llvm-project#80432 (PR #80433)
dtemirbulatov wrote:

> @dtemirbulatov What do you think about merging this PR to the release branch?

no objections.

https://github.com/llvm/llvm-project/pull/80433
[llvm-branch-commits] [llvm] PR for llvm/llvm-project#80694 (PR #80695)
https://github.com/llvmbot created https://github.com/llvm/llvm-project/pull/80695

resolves llvm/llvm-project#80694

>From ba5a8cd31193ed21602781c7f0f23ddd380401cf Mon Sep 17 00:00:00 2001
From: Pierre van Houtryve
Date: Mon, 5 Feb 2024 14:36:15 +0100
Subject: [PATCH] [AMDGPU][PromoteAlloca] Support memsets to ptr allocas (#80678)

Fixes #80366

(cherry picked from commit 4e958abf2f44d08129eafd5b6a4ee2bd3584ed22)
---
 .../lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp | 16 --
 .../CodeGen/AMDGPU/promote-alloca-memset.ll | 54 +++
 2 files changed, 66 insertions(+), 4 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp b/llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp
index 5e73411cae9b7..c1b244f50d93f 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp
@@ -521,10 +521,18 @@ static Value *promoteAllocaUserToVector(
       // For memset, we don't need to know the previous value because we
       // currently only allow memsets that cover the whole alloca.
       Value *Elt = MSI->getOperand(1);
-      if (DL.getTypeStoreSize(VecEltTy) > 1) {
-        Value *EltBytes =
-            Builder.CreateVectorSplat(DL.getTypeStoreSize(VecEltTy), Elt);
-        Elt = Builder.CreateBitCast(EltBytes, VecEltTy);
+      const unsigned BytesPerElt = DL.getTypeStoreSize(VecEltTy);
+      if (BytesPerElt > 1) {
+        Value *EltBytes = Builder.CreateVectorSplat(BytesPerElt, Elt);
+
+        // If the element type of the vector is a pointer, we need to first cast
+        // to an integer, then use a PtrCast.
+        if (VecEltTy->isPointerTy()) {
+          Type *PtrInt = Builder.getIntNTy(BytesPerElt * 8);
+          Elt = Builder.CreateBitCast(EltBytes, PtrInt);
+          Elt = Builder.CreateIntToPtr(Elt, VecEltTy);
+        } else
+          Elt = Builder.CreateBitCast(EltBytes, VecEltTy);
       }
 
       return Builder.CreateVectorSplat(VectorTy->getElementCount(), Elt);
diff --git a/llvm/test/CodeGen/AMDGPU/promote-alloca-memset.ll b/llvm/test/CodeGen/AMDGPU/promote-alloca-memset.ll
index 15af1f17e230e..f1e2737b370ef 100644
--- a/llvm/test/CodeGen/AMDGPU/promote-alloca-memset.ll
+++ b/llvm/test/CodeGen/AMDGPU/promote-alloca-memset.ll
@@ -84,4 +84,58 @@ entry:
   ret void
 }
 
+define amdgpu_kernel void @memset_array_ptr_alloca(ptr %out) {
+; CHECK-LABEL: @memset_array_ptr_alloca(
+; CHECK-NEXT:    store i64 0, ptr [[OUT:%.*]], align 8
+; CHECK-NEXT:    ret void
+;
+  %alloca = alloca [6 x ptr], align 16, addrspace(5)
+  call void @llvm.memset.p5.i64(ptr addrspace(5) %alloca, i8 0, i64 48, i1 false)
+  %load = load i64, ptr addrspace(5) %alloca
+  store i64 %load, ptr %out
+  ret void
+}
+
+define amdgpu_kernel void @memset_vector_ptr_alloca(ptr %out) {
+; CHECK-LABEL: @memset_vector_ptr_alloca(
+; CHECK-NEXT:    store i64 0, ptr [[OUT:%.*]], align 8
+; CHECK-NEXT:    ret void
+;
+  %alloca = alloca <6 x ptr>, align 16, addrspace(5)
+  call void @llvm.memset.p5.i64(ptr addrspace(5) %alloca, i8 0, i64 48, i1 false)
+  %load = load i64, ptr addrspace(5) %alloca
+  store i64 %load, ptr %out
+  ret void
+}
+
+define amdgpu_kernel void @memset_array_of_array_ptr_alloca(ptr %out) {
+; CHECK-LABEL: @memset_array_of_array_ptr_alloca(
+; CHECK-NEXT:    [[ALLOCA:%.*]] = alloca [2 x [3 x ptr]], align 16, addrspace(5)
+; CHECK-NEXT:    call void @llvm.memset.p5.i64(ptr addrspace(5) [[ALLOCA]], i8 0, i64 48, i1 false)
+; CHECK-NEXT:    [[LOAD:%.*]] = load i64, ptr addrspace(5) [[ALLOCA]], align 8
+; CHECK-NEXT:    store i64 [[LOAD]], ptr [[OUT:%.*]], align 8
+; CHECK-NEXT:    ret void
+;
+  %alloca = alloca [2 x [3 x ptr]], align 16, addrspace(5)
+  call void @llvm.memset.p5.i64(ptr addrspace(5) %alloca, i8 0, i64 48, i1 false)
+  %load = load i64, ptr addrspace(5) %alloca
+  store i64 %load, ptr %out
+  ret void
+}
+
+define amdgpu_kernel void @memset_array_of_vec_ptr_alloca(ptr %out) {
+; CHECK-LABEL: @memset_array_of_vec_ptr_alloca(
+; CHECK-NEXT:    [[ALLOCA:%.*]] = alloca [2 x <3 x ptr>], align 16, addrspace(5)
+; CHECK-NEXT:    call void @llvm.memset.p5.i64(ptr addrspace(5) [[ALLOCA]], i8 0, i64 48, i1 false)
+; CHECK-NEXT:    [[LOAD:%.*]] = load i64, ptr addrspace(5) [[ALLOCA]], align 8
+; CHECK-NEXT:    store i64 [[LOAD]], ptr [[OUT:%.*]], align 8
+; CHECK-NEXT:    ret void
+;
+  %alloca = alloca [2 x <3 x ptr>], align 16, addrspace(5)
+  call void @llvm.memset.p5.i64(ptr addrspace(5) %alloca, i8 0, i64 48, i1 false)
+  %load = load i64, ptr addrspace(5) %alloca
+  store i64 %load, ptr %out
+  ret void
+}
+
 declare void @llvm.memset.p5.i64(ptr addrspace(5) nocapture writeonly, i8, i64, i1 immarg)
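The value computed by the patched memset path can be illustrated outside of LLVM. The sketch below, in plain C++, replicates the memset byte across `BytesPerElt` bytes, treats the result as an integer of that width, and only then converts it to a pointer, mirroring the splat, bitcast-to-integer, and `inttoptr` sequence in the diff (an IR `bitcast` does not convert between vector and pointer types, which appears to be why the integer step is needed). The helper names `splatByteToInt` and `intToPtr` are hypothetical, and the 8-byte limit is an assumption of this sketch. In the tests, a `[6 x ptr]` alloca covered by a 48-byte memset implies 8 bytes per element.

```cpp
#include <cassert>
#include <cstdint>

// Replicates one memset byte across bytesPerElt bytes and returns the result
// as an integer, the analogue of a byte-vector splat followed by a bitcast
// to an integer of the same width. Assumes 1 <= bytesPerElt <= 8.
std::uint64_t splatByteToInt(std::uint8_t byte, unsigned bytesPerElt) {
  assert(bytesPerElt >= 1 && bytesPerElt <= 8 && "sketch supports up to i64");
  std::uint64_t value = 0;
  for (unsigned i = 0; i < bytesPerElt; ++i)
    value = (value << 8) | byte; // every byte is the same, so order is moot
  return value;
}

// Integer-to-pointer step, the analogue of inttoptr on the splatted integer.
void *intToPtr(std::uint64_t bits) {
  return reinterpret_cast<void *>(static_cast<std::uintptr_t>(bits));
}
```

For the zero memsets in the new tests, the splatted integer is 0, which is consistent with the `store i64 0` that the CHECK lines expect after promotion.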
[llvm-branch-commits] [llvm] PR for llvm/llvm-project#80694 (PR #80695)
https://github.com/llvmbot milestoned https://github.com/llvm/llvm-project/pull/80695
[llvm-branch-commits] [llvm] PR for llvm/llvm-project#80694 (PR #80695)
llvmbot wrote: @arsenm What do you think about merging this PR to the release branch? https://github.com/llvm/llvm-project/pull/80695
[llvm-branch-commits] [llvm] PR for llvm/llvm-project#80694 (PR #80695)
llvmbot wrote:

@llvm/pr-subscribers-backend-amdgpu

Author: None (llvmbot)

Changes

resolves llvm/llvm-project#80694

---

Full diff: https://github.com/llvm/llvm-project/pull/80695.diff

2 Files Affected:

- (modified) llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp (+12-4)
- (modified) llvm/test/CodeGen/AMDGPU/promote-alloca-memset.ll (+54)

https://github.com/llvm/llvm-project/pull/80695
[llvm-branch-commits] [llvm] [VPlan] Explicitly handle scalar pointer inductions. (PR #80273)
https://github.com/ayalz commented: Nice refactoring clean-up! Adding some comments. https://github.com/llvm/llvm-project/pull/80273
[llvm-branch-commits] [llvm] [VPlan] Explicitly handle scalar pointer inductions. (PR #80273)
@@ -489,6 +490,23 @@ Value *VPInstruction::generateInstruction(VPTransformState &State,
     return ReducedPartRdx;
   }
+  case VPInstruction::PtrAdd: {
+    if (vputils::onlyFirstLaneUsed(this)) {
+      auto *P = Builder.CreatePtrAdd(
+          State.get(getOperand(0), VPIteration(Part, 0)),
+          State.get(getOperand(1), VPIteration(Part, 0)), Name);
+      State.set(this, P, VPIteration(Part, 0));
+    } else {
+      for (unsigned Lane = 0; Lane != State.VF.getKnownMinValue(); ++Lane) {
+        Value *P = Builder.CreatePtrAdd(
+            State.get(getOperand(0), VPIteration(Part, Lane)),
+            State.get(getOperand(1), VPIteration(Part, Lane)), Name);
+
+        State.set(this, P, VPIteration(Part, Lane));
+      }
+    }
+    return nullptr;

ayalz wrote:

Better for generateInstruction() to continue to generate and return a single per-part Value, which is then set in State, possibly renaming it generateValuePerPart(), and have a separate generateValuePerLane(), currently to be invoked only for PtrAdd having all lanes used?

https://github.com/llvm/llvm-project/pull/80273
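The control flow of the `PtrAdd` case quoted above (one result when only the first lane is used, otherwise one result per lane) can be modelled with plain integers. This is a hypothetical illustration rather than VPlan code: `ptrAddLanes` and its vector-of-addresses representation are invented here, and the vector length stands in for `State.VF.getKnownMinValue()`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of one unroll part: base[lane] and offset[lane] are the
// per-lane operands of a pointer add. Returns a single sum when only lane 0
// is needed, otherwise one sum per lane.
std::vector<std::uint64_t> ptrAddLanes(const std::vector<std::uint64_t> &base,
                                       const std::vector<std::uint64_t> &offset,
                                       bool onlyFirstLaneUsed) {
  assert(!base.empty() && base.size() == offset.size() &&
         "operands must cover the same, non-zero number of lanes");
  const std::size_t lanes = onlyFirstLaneUsed ? 1 : base.size();
  std::vector<std::uint64_t> result;
  result.reserve(lanes);
  for (std::size_t lane = 0; lane != lanes; ++lane)
    result.push_back(base[lane] + offset[lane]);
  return result;
}
```

Returning the per-lane results from one helper, instead of setting state and returning a null value, is one way to read the reviewer's suggestion of a separate per-lane generation function.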
[llvm-branch-commits] [llvm] [VPlan] Explicitly handle scalar pointer inductions. (PR #80273)
https://github.com/ayalz edited https://github.com/llvm/llvm-project/pull/80273
[llvm-branch-commits] [llvm] [VPlan] Explicitly handle scalar pointer inductions. (PR #80273)
@@ -540,6 +560,7 @@ bool VPInstruction::onlyFirstLaneUsed(const VPValue *Op) const {
   default:
     return false;
   case Instruction::ICmp:
+  case VPInstruction::PtrAdd:
     // TODO: Cover additional opcodes.

ayalz wrote:

nit: better place this TODO under default?

https://github.com/llvm/llvm-project/pull/80273
[llvm-branch-commits] [llvm] [VPlan] Explicitly handle scalar pointer inductions. (PR #80273)
@@ -537,6 +542,30 @@ void VPlanTransforms::optimizeInductions(VPlan &Plan, ScalarEvolution &SE) {
   bool HasOnlyVectorVFs = !Plan.hasVF(ElementCount::getFixed(1));
   VPBasicBlock::iterator InsertPt = HeaderVPBB->getFirstNonPhi();
   for (VPRecipeBase &Phi : HeaderVPBB->phis()) {
+    if (auto *PtrIV = dyn_cast<VPWidenPointerInductionRecipe>(&Phi)) {
+      if (!PtrIV->onlyScalarsGenerated(Plan.hasScalableVF()))
+        continue;
+
+      const InductionDescriptor &ID = PtrIV->getInductionDescriptor();
+      VPValue *StartV = Plan.getVPValueOrAddLiveIn(
+          ConstantInt::get(ID.getStep()->getType(), 0));
+      VPValue *StepV = PtrIV->getOperand(1);
+      VPRecipeBase *Steps =

ayalz wrote:

Have createScalarIVSteps() return a VPSingleDefRecipe (or even VPScalarIVStepsRecipe) to avoid going through getDefiningRecipe() and getVPSingleValue()?

https://github.com/llvm/llvm-project/pull/80273
[llvm-branch-commits] [llvm] [VPlan] Explicitly handle scalar pointer inductions. (PR #80273)
@@ -857,11 +857,7 @@ void VPlan::execute(VPTransformState *State) {
       Phi = cast<PHINode>(State->get(R.getVPSingleValue(), 0));
     } else {
       auto *WidenPhi = cast<VPWidenPointerInductionRecipe>(&R);
-      // TODO: Split off the case that all users of a pointer phi are scalar
-      // from the VPWidenPointerInductionRecipe.
-      if (WidenPhi->onlyScalarsGenerated(State->VF.isScalable()))
-        continue;
-
+      assert(!WidenPhi->onlyScalarsGenerated(State->VF.isScalable()));

ayalz wrote:

nit: assert message.

https://github.com/llvm/llvm-project/pull/80273
[llvm-branch-commits] [llvm] [VPlan] Explicitly handle scalar pointer inductions. (PR #80273)
@@ -546,9 +575,10 @@ void VPlanTransforms::optimizeInductions(VPlan &Plan, ScalarEvolution &SE) {
       continue;

     const InductionDescriptor &ID = WideIV->getInductionDescriptor();
-    VPValue *Steps = createScalarIVSteps(Plan, ID, SE, WideIV->getTruncInst(),
-                                         WideIV->getStartValue(),
-                                         WideIV->getStepValue(), InsertPt);
+    VPValue *Steps = createScalarIVSteps(
+        Plan, ID.getKind(), SE, WideIV->getTruncInst(), WideIV->getStartValue(),
+        WideIV->getStepValue(), ID.getInductionOpcode(), InsertPt,
+        dyn_cast_or_null<FPMathOperator>(ID.getInductionBinOp()));

ayalz wrote:

nit: seems more logical to group the three fields of ID and pass them as adjacent parameters?

https://github.com/llvm/llvm-project/pull/80273
[llvm-branch-commits] [llvm] [VPlan] Explicitly handle scalar pointer inductions. (PR #80273)
@@ -540,6 +560,7 @@ bool VPInstruction::onlyFirstLaneUsed(const VPValue *Op) const {
   default:
     return false;
   case Instruction::ICmp:
+  case VPInstruction::PtrAdd:
     // TODO: Cover additional opcodes.
     return vputils::onlyFirstLaneUsed(this);
   case VPInstruction::ComputeReductionResult:

ayalz wrote:

nit (unrelated): fall-through to join `true` cases.

https://github.com/llvm/llvm-project/pull/80273
[llvm-branch-commits] [llvm] [VPlan] Explicitly handle scalar pointer inductions. (PR #80273)
@@ -2503,6 +2504,12 @@ class VPDerivedIVRecipe : public VPSingleDefRecipe {
             dyn_cast_or_null<FPMathOperator>(IndDesc.getInductionBinOp()),
             Start, CanonicalIV, Step) {}

+  VPDerivedIVRecipe(InductionDescriptor::InductionKind Kind, VPValue *Start,
+                    VPCanonicalIVPHIRecipe *CanonicalIV, VPValue *Step,
+                    FPMathOperator *FPBinOp)

ayalz wrote:

nit: this is identical to the private constructor above, except for accepting a non-const FPBinOp?

https://github.com/llvm/llvm-project/pull/80273
[llvm-branch-commits] [llvm] [VPlan] Explicitly handle scalar pointer inductions. (PR #80273)
@@ -537,6 +542,30 @@ void VPlanTransforms::optimizeInductions(VPlan &Plan, ScalarEvolution &SE) {
   bool HasOnlyVectorVFs = !Plan.hasVF(ElementCount::getFixed(1));
   VPBasicBlock::iterator InsertPt = HeaderVPBB->getFirstNonPhi();
   for (VPRecipeBase &Phi : HeaderVPBB->phis()) {
+    if (auto *PtrIV = dyn_cast<VPWidenPointerInductionRecipe>(&Phi)) {
+      if (!PtrIV->onlyScalarsGenerated(Plan.hasScalableVF()))
+        continue;
+
+      const InductionDescriptor &ID = PtrIV->getInductionDescriptor();
+      VPValue *StartV = Plan.getVPValueOrAddLiveIn(
+          ConstantInt::get(ID.getStep()->getType(), 0));

ayalz wrote:

nit: would getting the Type of ID.getStartValue() be more consistent?

https://github.com/llvm/llvm-project/pull/80273
[llvm-branch-commits] [llvm] [VPlan] Explicitly handle scalar pointer inductions. (PR #80273)
@@ -489,15 +489,18 @@ void VPlanTransforms::removeDeadRecipes(VPlan &Plan) {
   }
 }

-static VPValue *createScalarIVSteps(VPlan &Plan, const InductionDescriptor &ID,
+static VPValue *createScalarIVSteps(VPlan &Plan,
+                                    InductionDescriptor::InductionKind Kind,
                                     ScalarEvolution &SE, Instruction *TruncI,
                                     VPValue *StartV, VPValue *Step,
-                                    VPBasicBlock::iterator IP) {
+                                    Instruction::BinaryOps InductionOpcode,
+                                    VPBasicBlock::iterator IP,
+                                    FPMathOperator *FPBinOp = nullptr) {
   VPBasicBlock *HeaderVPBB = Plan.getVectorLoopRegion()->getEntryBasicBlock();
   VPCanonicalIVPHIRecipe *CanonicalIV = Plan.getCanonicalIV();
   VPSingleDefRecipe *BaseIV = CanonicalIV;
-  if (!CanonicalIV->isCanonical(ID.getKind(), StartV, Step)) {
-    BaseIV = new VPDerivedIVRecipe(ID, StartV, CanonicalIV, Step);
+  if (!CanonicalIV->isCanonical(Kind, StartV, Step)) {
+    BaseIV = new VPDerivedIVRecipe(Kind, StartV, CanonicalIV, Step, FPBinOp);

ayalz wrote:

Should this refactoring to accept and pass Kind instead of ID be pushed as a separate simplifying preparation?

https://github.com/llvm/llvm-project/pull/80273
[llvm-branch-commits] [llvm] [VPlan] Explicitly handle scalar pointer inductions. (PR #80273)
@@ -515,6 +533,8 @@ void VPInstruction::execute(VPTransformState &State) {
     State.Builder.setFastMathFlags(getFastMathFlags());
   for (unsigned Part = 0; Part < State.UF; ++Part) {
     Value *GeneratedValue = generateInstruction(State, Part);
+    if (!GeneratedValue)
+      continue;
     if (!hasResult())
       continue;

ayalz wrote:

```suggestion
    if (!GeneratedValue || !hasResult())
      continue;
```

https://github.com/llvm/llvm-project/pull/80273
[llvm-branch-commits] [llvm] [VPlan] Explicitly handle scalar pointer inductions. (PR #80273)
@@ -515,6 +533,8 @@ void VPInstruction::execute(VPTransformState &State) {
     State.Builder.setFastMathFlags(getFastMathFlags());
   for (unsigned Part = 0; Part < State.UF; ++Part) {
     Value *GeneratedValue = generateInstruction(State, Part);
+    if (!GeneratedValue)
+      continue;
     if (!hasResult())
       continue;
     assert(GeneratedValue && "generateInstruction must produce a value");

ayalz wrote:

This assert is now redundant.

https://github.com/llvm/llvm-project/pull/80273
[llvm-branch-commits] [llvm] [VPlan] Explicitly handle scalar pointer inductions. (PR #80273)
@@ -537,6 +542,30 @@ void VPlanTransforms::optimizeInductions(VPlan &Plan, ScalarEvolution &SE) {
   bool HasOnlyVectorVFs = !Plan.hasVF(ElementCount::getFixed(1));
   VPBasicBlock::iterator InsertPt = HeaderVPBB->getFirstNonPhi();
   for (VPRecipeBase &Phi : HeaderVPBB->phis()) {

ayalz wrote:

nit: worth adding a comment describing what unfolds next. Plus revisit the documentation of optimizeInductions().

https://github.com/llvm/llvm-project/pull/80273
[llvm-branch-commits] [llvm] PR for llvm/llvm-project#80694 (PR #80695)
https://github.com/Pierre-vh approved this pull request. https://github.com/llvm/llvm-project/pull/80695
[llvm-branch-commits] [llvm] PR for llvm/llvm-project#79175 (PR #80274)
https://github.com/fhahn approved this pull request. LGTM as this fixes a miscompile https://github.com/llvm/llvm-project/pull/80274
[llvm-branch-commits] [llvm] PR for llvm/llvm-project#80694 (PR #80695)
https://github.com/llvmbot updated https://github.com/llvm/llvm-project/pull/80695 >From 09303e727e515a7856d5f4cb100c5a9dec00b626 Mon Sep 17 00:00:00 2001 From: Pierre van Houtryve Date: Mon, 5 Feb 2024 14:36:15 +0100 Subject: [PATCH] [AMDGPU][PromoteAlloca] Support memsets to ptr allocas (#80678) Fixes #80366 (cherry picked from commit 4e958abf2f44d08129eafd5b6a4ee2bd3584ed22) --- .../lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp | 16 -- .../CodeGen/AMDGPU/promote-alloca-memset.ll | 54 +++ 2 files changed, 66 insertions(+), 4 deletions(-) diff --git a/llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp b/llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp index 5e73411cae9b70..c1b244f50d93f8 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp +++ b/llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp @@ -521,10 +521,18 @@ static Value *promoteAllocaUserToVector( // For memset, we don't need to know the previous value because we // currently only allow memsets that cover the whole alloca. Value *Elt = MSI->getOperand(1); - if (DL.getTypeStoreSize(VecEltTy) > 1) { -Value *EltBytes = -Builder.CreateVectorSplat(DL.getTypeStoreSize(VecEltTy), Elt); -Elt = Builder.CreateBitCast(EltBytes, VecEltTy); + const unsigned BytesPerElt = DL.getTypeStoreSize(VecEltTy); + if (BytesPerElt > 1) { +Value *EltBytes = Builder.CreateVectorSplat(BytesPerElt, Elt); + +// If the element type of the vector is a pointer, we need to first cast +// to an integer, then use a PtrCast. 
+if (VecEltTy->isPointerTy()) { + Type *PtrInt = Builder.getIntNTy(BytesPerElt * 8); + Elt = Builder.CreateBitCast(EltBytes, PtrInt); + Elt = Builder.CreateIntToPtr(Elt, VecEltTy); +} else + Elt = Builder.CreateBitCast(EltBytes, VecEltTy); } return Builder.CreateVectorSplat(VectorTy->getElementCount(), Elt); diff --git a/llvm/test/CodeGen/AMDGPU/promote-alloca-memset.ll b/llvm/test/CodeGen/AMDGPU/promote-alloca-memset.ll index 15af1f17e230ec..f1e2737b370ef0 100644 --- a/llvm/test/CodeGen/AMDGPU/promote-alloca-memset.ll +++ b/llvm/test/CodeGen/AMDGPU/promote-alloca-memset.ll @@ -84,4 +84,58 @@ entry: ret void } +define amdgpu_kernel void @memset_array_ptr_alloca(ptr %out) { +; CHECK-LABEL: @memset_array_ptr_alloca( +; CHECK-NEXT:store i64 0, ptr [[OUT:%.*]], align 8 +; CHECK-NEXT:ret void +; + %alloca = alloca [6 x ptr], align 16, addrspace(5) + call void @llvm.memset.p5.i64(ptr addrspace(5) %alloca, i8 0, i64 48, i1 false) + %load = load i64, ptr addrspace(5) %alloca + store i64 %load, ptr %out + ret void +} + +define amdgpu_kernel void @memset_vector_ptr_alloca(ptr %out) { +; CHECK-LABEL: @memset_vector_ptr_alloca( +; CHECK-NEXT:store i64 0, ptr [[OUT:%.*]], align 8 +; CHECK-NEXT:ret void +; + %alloca = alloca <6 x ptr>, align 16, addrspace(5) + call void @llvm.memset.p5.i64(ptr addrspace(5) %alloca, i8 0, i64 48, i1 false) + %load = load i64, ptr addrspace(5) %alloca + store i64 %load, ptr %out + ret void +} + +define amdgpu_kernel void @memset_array_of_array_ptr_alloca(ptr %out) { +; CHECK-LABEL: @memset_array_of_array_ptr_alloca( +; CHECK-NEXT:[[ALLOCA:%.*]] = alloca [2 x [3 x ptr]], align 16, addrspace(5) +; CHECK-NEXT:call void @llvm.memset.p5.i64(ptr addrspace(5) [[ALLOCA]], i8 0, i64 48, i1 false) +; CHECK-NEXT:[[LOAD:%.*]] = load i64, ptr addrspace(5) [[ALLOCA]], align 8 +; CHECK-NEXT:store i64 [[LOAD]], ptr [[OUT:%.*]], align 8 +; CHECK-NEXT:ret void +; + %alloca = alloca [2 x [3 x ptr]], align 16, addrspace(5) + call void @llvm.memset.p5.i64(ptr 
addrspace(5) %alloca, i8 0, i64 48, i1 false) + %load = load i64, ptr addrspace(5) %alloca + store i64 %load, ptr %out + ret void +} + +define amdgpu_kernel void @memset_array_of_vec_ptr_alloca(ptr %out) { +; CHECK-LABEL: @memset_array_of_vec_ptr_alloca( +; CHECK-NEXT:[[ALLOCA:%.*]] = alloca [2 x <3 x ptr>], align 16, addrspace(5) +; CHECK-NEXT:call void @llvm.memset.p5.i64(ptr addrspace(5) [[ALLOCA]], i8 0, i64 48, i1 false) +; CHECK-NEXT:[[LOAD:%.*]] = load i64, ptr addrspace(5) [[ALLOCA]], align 8 +; CHECK-NEXT:store i64 [[LOAD]], ptr [[OUT:%.*]], align 8 +; CHECK-NEXT:ret void +; + %alloca = alloca [2 x <3 x ptr>], align 16, addrspace(5) + call void @llvm.memset.p5.i64(ptr addrspace(5) %alloca, i8 0, i64 48, i1 false) + %load = load i64, ptr addrspace(5) %alloca + store i64 %load, ptr %out + ret void +} + declare void @llvm.memset.p5.i64(ptr addrspace(5) nocapture writeonly, i8, i64, i1 immarg) ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
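A minimal sketch of the byte-splat arithmetic behind the patch above, written as a standalone C++ model rather than with the LLVM IRBuilder API. As the diff shows, the patch replicates the memset byte across `BytesPerElt` bytes, bitcasts the result to an integer of `BytesPerElt * 8` bits, and only then converts that integer to a pointer when `VecEltTy` is a pointer type. The helper `splatByte` below is hypothetical (it does not exist in LLVM) and models only the intermediate integer bit pattern, assuming elements of at most 8 bytes.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not part of LLVM: returns the integer whose
// BytesPerElt bytes are all equal to Byte. This mirrors the value obtained
// by splatting the memset byte and reinterpreting it as one integer.
inline std::uint64_t splatByte(std::uint8_t Byte, unsigned BytesPerElt) {
  // Assumption of this sketch: the element fits in a 64-bit integer.
  assert(BytesPerElt >= 1 && BytesPerElt <= 8 && "sketch handles 1..8 bytes");
  std::uint64_t Bits = 0;
  for (unsigned I = 0; I < BytesPerElt; ++I)
    Bits = (Bits << 8) | Byte; // append one more copy of the byte
  return Bits;
}
```

In the tests added by the patch the memset value is 0 and the allocas hold six 8-byte `ptr` elements (48 bytes total), so the modeled integer is 0, which is consistent with the `store i64 0` in the CHECK lines.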
[llvm-branch-commits] [libcxx] PR for llvm/llvm-project#80168 (PR #80702)
https://github.com/llvmbot milestoned https://github.com/llvm/llvm-project/pull/80702
[llvm-branch-commits] [libcxx] PR for llvm/llvm-project#80168 (PR #80702)
https://github.com/llvmbot created https://github.com/llvm/llvm-project/pull/80702 resolves llvm/llvm-project#80168 >From c04bd5109fe4a15d24e6c66cb91567d0589d33c3 Mon Sep 17 00:00:00 2001 From: Louis Dionne Date: Mon, 5 Feb 2024 11:05:46 -0500 Subject: [PATCH] [libc++] Add missing conditionals for feature-test macros (#80168) We noticed that some feature-test macros were not conditional on configuration flags like _LIBCPP_HAS_NO_FILESYSTEM. As a result, code attempting to use FTMs would not work as intended. This patch adds conditionals for a few feature-test macros, but more issues may exist. rdar://122020466 (cherry picked from commit f2c84211d2834c73ff874389c6bb47b1c76d391a) --- libcxx/include/version| 14 +- .../filesystem.version.compile.pass.cpp | 16 +- .../fstream.version.compile.pass.cpp | 16 +- .../iomanip.version.compile.pass.cpp | 80 +--- .../mutex.version.compile.pass.cpp| 64 +-- .../version.version.compile.pass.cpp | 176 -- .../generate_feature_test_macro_components.py | 10 +- 7 files changed, 254 insertions(+), 122 deletions(-) diff --git a/libcxx/include/version b/libcxx/include/version index 9e26da8c1b242..d356976d6454a 100644 --- a/libcxx/include/version +++ b/libcxx/include/version @@ -266,7 +266,9 @@ __cpp_lib_within_lifetime 202306L # define __cpp_lib_make_reverse_iterator201402L # define __cpp_lib_make_unique 201304L # define __cpp_lib_null_iterators 201304L -# define __cpp_lib_quoted_string_io 201304L +# if !defined(_LIBCPP_HAS_NO_LOCALIZATION) +# define __cpp_lib_quoted_string_io 201304L +# endif # define __cpp_lib_result_of_sfinae 201210L # define __cpp_lib_robust_nonmodifying_seq_ops 201304L # if !defined(_LIBCPP_HAS_NO_THREADS) @@ -294,7 +296,7 @@ __cpp_lib_within_lifetime 202306L # define __cpp_lib_clamp201603L # define __cpp_lib_enable_shared_from_this 201603L // # define __cpp_lib_execution201603L -# if _LIBCPP_AVAILABILITY_HAS_FILESYSTEM_LIBRARY +# if !defined(_LIBCPP_HAS_NO_FILESYSTEM) && _LIBCPP_AVAILABILITY_HAS_FILESYSTEM_LIBRARY # 
define __cpp_lib_filesystem 201703L # endif # define __cpp_lib_gcd_lcm 201606L @@ -323,7 +325,9 @@ __cpp_lib_within_lifetime 202306L // # define __cpp_lib_parallel_algorithm 201603L # define __cpp_lib_raw_memory_algorithms201606L # define __cpp_lib_sample 201603L -# define __cpp_lib_scoped_lock 201703L +# if !defined(_LIBCPP_HAS_NO_THREADS) +# define __cpp_lib_scoped_lock201703L +# endif # if !defined(_LIBCPP_HAS_NO_THREADS) # define __cpp_lib_shared_mutex 201505L # endif @@ -496,7 +500,9 @@ __cpp_lib_within_lifetime 202306L // # define __cpp_lib_freestanding_optional202311L // # define __cpp_lib_freestanding_string_view 202311L // # define __cpp_lib_freestanding_variant 202311L -# define __cpp_lib_fstream_native_handle202306L +# if !defined(_LIBCPP_HAS_NO_FILESYSTEM) && !defined(_LIBCPP_HAS_NO_LOCALIZATION) +# define __cpp_lib_fstream_native_handle 202306L +# endif // # define __cpp_lib_function_ref 202306L // # define __cpp_lib_hazard_pointer 202306L // # define __cpp_lib_linalg 202311L diff --git a/libcxx/test/std/language.support/support.limits/support.limits.general/filesystem.version.compile.pass.cpp b/libcxx/test/std/language.support/support.limits/support.limits.general/filesystem.version.compile.pass.cpp index 46ccde800c179..3f03e8be9aeab 100644 --- a/libcxx/test/std/language.support/support.limits/support.limits.general/filesystem.version.compile.pass.cpp +++ b/libcxx/test/std/language.support/support.limits/support.limits.general/filesystem.version.compile.pass.cpp @@ -51,7 +51,7 @@ # error "__cpp_lib_char8_t should not be defined before c++20" # endif -# if !defined(_LIBCPP_VERSION) || _LIBCPP_AVAILABILITY_HAS_FILESYSTEM_LIBRARY +# if !defined(_LIBCPP_VERSION) || (!defined(_LIBCPP_HAS_NO_FILESYSTEM) && _LIBCPP_AVAILABILITY_HAS_FILESYSTEM_LIBRARY) # ifndef __cpp_lib_filesystem # error "__cpp_lib_filesystem should be defined in c++17" # endif @@ -60,7 +60,7 @@ # endif # else # ifdef __cpp_lib_filesystem -# error "__cpp_lib_filesystem should not be defined 
when the requirement '!defined(_LIBCPP_VERSION) || _LIBCPP_AVAILABILITY_HAS_FILESYSTEM_LIBRARY' is not met!" +# error "__cpp_lib_filesystem
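To illustrate why these conditionals matter to user code, here is a hedged sketch of a function that relies on `__cpp_lib_filesystem` to choose between `std::filesystem` and a manual fallback. The name `fileNameOf` is hypothetical and not part of libc++, and the fallback only understands `/` separators. If the macro were defined on a configuration built without filesystem support, the first branch would be selected and fail to compile, which is the kind of breakage the commit message describes.

```cpp
#include <string>

// <version> is where the feature-test macros live; guard the include so the
// sketch also builds with older language modes or toolchains.
#if defined(__has_include)
#  if __has_include(<version>)
#    include <version>
#  endif
#endif

#if defined(__cpp_lib_filesystem)
#  include <filesystem>
#endif

// Hypothetical helper: returns the last path component of `path`.
inline std::string fileNameOf(const std::string &path) {
#if defined(__cpp_lib_filesystem)
  // The library advertises <filesystem>, so use it.
  return std::filesystem::path(path).filename().string();
#else
  // Fallback for configurations without filesystem support.
  const std::string::size_type pos = path.find_last_of('/');
  return pos == std::string::npos ? path : path.substr(pos + 1);
#endif
}
```

Both branches give the same answer for simple inputs such as `dir/sub/file.txt`, so callers do not need to know which one was compiled in.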
[llvm-branch-commits] [libcxx] PR for llvm/llvm-project#80168 (PR #80702)
llvmbot wrote: @ldionne What do you think about merging this PR to the release branch? https://github.com/llvm/llvm-project/pull/80702
[llvm-branch-commits] [llvm] PR for llvm/llvm-project#79571 (PR #79572)
https://github.com/alinas approved this pull request. There are some pre-merge failures to review, but including this in the release makes sense. https://github.com/llvm/llvm-project/pull/79572
[llvm-branch-commits] [libcxx] PR for llvm/llvm-project#80168 (PR #80702)
llvmbot wrote: @llvm/pr-subscribers-libcxx Author: None (llvmbot) Changes resolves llvm/llvm-project#80168 --- Patch is 30.77 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/80702.diff 7 Files Affected: - (modified) libcxx/include/version (+10-4) - (modified) libcxx/test/std/language.support/support.limits/support.limits.general/filesystem.version.compile.pass.cpp (+8-8) - (modified) libcxx/test/std/language.support/support.limits/support.limits.general/fstream.version.compile.pass.cpp (+11-5) - (modified) libcxx/test/std/language.support/support.limits/support.limits.general/iomanip.version.compile.pass.cpp (+55-25) - (modified) libcxx/test/std/language.support/support.limits/support.limits.general/mutex.version.compile.pass.cpp (+44-20) - (modified) libcxx/test/std/language.support/support.limits/support.limits.general/version.version.compile.pass.cpp (+118-58) - (modified) libcxx/utils/generate_feature_test_macro_components.py (+8-2) ``diff diff --git a/libcxx/include/version b/libcxx/include/version index 9e26da8c1b242..d356976d6454a 100644 --- a/libcxx/include/version +++ b/libcxx/include/version @@ -266,7 +266,9 @@ __cpp_lib_within_lifetime 202306L # define __cpp_lib_make_reverse_iterator201402L # define __cpp_lib_make_unique 201304L # define __cpp_lib_null_iterators 201304L -# define __cpp_lib_quoted_string_io 201304L +# if !defined(_LIBCPP_HAS_NO_LOCALIZATION) +# define __cpp_lib_quoted_string_io 201304L +# endif # define __cpp_lib_result_of_sfinae 201210L # define __cpp_lib_robust_nonmodifying_seq_ops 201304L # if !defined(_LIBCPP_HAS_NO_THREADS) @@ -294,7 +296,7 @@ __cpp_lib_within_lifetime 202306L # define __cpp_lib_clamp201603L # define __cpp_lib_enable_shared_from_this 201603L // # define __cpp_lib_execution201603L -# if _LIBCPP_AVAILABILITY_HAS_FILESYSTEM_LIBRARY +# if !defined(_LIBCPP_HAS_NO_FILESYSTEM) && _LIBCPP_AVAILABILITY_HAS_FILESYSTEM_LIBRARY # define __cpp_lib_filesystem 201703L # endif # define 
__cpp_lib_gcd_lcm 201606L @@ -323,7 +325,9 @@ __cpp_lib_within_lifetime 202306L // # define __cpp_lib_parallel_algorithm 201603L # define __cpp_lib_raw_memory_algorithms201606L # define __cpp_lib_sample 201603L -# define __cpp_lib_scoped_lock 201703L +# if !defined(_LIBCPP_HAS_NO_THREADS) +# define __cpp_lib_scoped_lock201703L +# endif # if !defined(_LIBCPP_HAS_NO_THREADS) # define __cpp_lib_shared_mutex 201505L # endif @@ -496,7 +500,9 @@ __cpp_lib_within_lifetime 202306L // # define __cpp_lib_freestanding_optional202311L // # define __cpp_lib_freestanding_string_view 202311L // # define __cpp_lib_freestanding_variant 202311L -# define __cpp_lib_fstream_native_handle202306L +# if !defined(_LIBCPP_HAS_NO_FILESYSTEM) && !defined(_LIBCPP_HAS_NO_LOCALIZATION) +# define __cpp_lib_fstream_native_handle 202306L +# endif // # define __cpp_lib_function_ref 202306L // # define __cpp_lib_hazard_pointer 202306L // # define __cpp_lib_linalg 202311L diff --git a/libcxx/test/std/language.support/support.limits/support.limits.general/filesystem.version.compile.pass.cpp b/libcxx/test/std/language.support/support.limits/support.limits.general/filesystem.version.compile.pass.cpp index 46ccde800c179..3f03e8be9aeab 100644 --- a/libcxx/test/std/language.support/support.limits/support.limits.general/filesystem.version.compile.pass.cpp +++ b/libcxx/test/std/language.support/support.limits/support.limits.general/filesystem.version.compile.pass.cpp @@ -51,7 +51,7 @@ # error "__cpp_lib_char8_t should not be defined before c++20" # endif -# if !defined(_LIBCPP_VERSION) || _LIBCPP_AVAILABILITY_HAS_FILESYSTEM_LIBRARY +# if !defined(_LIBCPP_VERSION) || (!defined(_LIBCPP_HAS_NO_FILESYSTEM) && _LIBCPP_AVAILABILITY_HAS_FILESYSTEM_LIBRARY) # ifndef __cpp_lib_filesystem # error "__cpp_lib_filesystem should be defined in c++17" # endif @@ -60,7 +60,7 @@ # endif # else # ifdef __cpp_lib_filesystem -# error "__cpp_lib_filesystem should not be defined when the requirement '!defined(_LIBCPP_VERSION) || 
_LIBCPP_AVAILABILITY_HAS_FILESYSTEM_LIBRARY' is not met!" +# error "__cpp_lib_filesystem should not be defined when the requirement '!defined(_LIBCPP_VERSION) || (!defined(_LIBCPP_HAS_NO_FILESYSTEM) && _LIBCPP_AVAILABI
[llvm-branch-commits] [libcxx] PR for llvm/llvm-project#80168 (PR #80702)
ldionne wrote:

> @ldionne What do you think about merging this PR to the release branch?

Approved!

https://github.com/llvm/llvm-project/pull/80702
[llvm-branch-commits] [mlir] 4ce4248 - Revert "[mlir][openacc] Add legalize data pass for compute operation (#80351)"
Author: Valentin Clement (バレンタイン クレメン) Date: 2024-02-05T08:47:02-08:00 New Revision: 4ce4248b450f71324d547d78fdf3dd48bb76d587 URL: https://github.com/llvm/llvm-project/commit/4ce4248b450f71324d547d78fdf3dd48bb76d587 DIFF: https://github.com/llvm/llvm-project/commit/4ce4248b450f71324d547d78fdf3dd48bb76d587.diff LOG: Revert "[mlir][openacc] Add legalize data pass for compute operation (#80351)" This reverts commit 29d47513b3ce706b5df66409170e40ba39f3795a. Added: Modified: flang/include/flang/Optimizer/Support/InitFIR.h mlir/include/mlir/Dialect/OpenACC/CMakeLists.txt mlir/include/mlir/InitAllPasses.h mlir/lib/Dialect/OpenACC/CMakeLists.txt Removed: flang/test/Fir/OpenACC/legalize-data.fir mlir/include/mlir/Dialect/OpenACC/Transforms/CMakeLists.txt mlir/include/mlir/Dialect/OpenACC/Transforms/Passes.h mlir/include/mlir/Dialect/OpenACC/Transforms/Passes.td mlir/lib/Dialect/OpenACC/IR/CMakeLists.txt mlir/lib/Dialect/OpenACC/Transforms/CMakeLists.txt mlir/lib/Dialect/OpenACC/Transforms/LegalizeData.cpp mlir/test/Dialect/OpenACC/legalize-data.mlir diff --git a/flang/include/flang/Optimizer/Support/InitFIR.h b/flang/include/flang/Optimizer/Support/InitFIR.h index b5c41699205f4..8c47ad3d9f445 100644 --- a/flang/include/flang/Optimizer/Support/InitFIR.h +++ b/flang/include/flang/Optimizer/Support/InitFIR.h @@ -19,7 +19,6 @@ #include "mlir/Dialect/Affine/Passes.h" #include "mlir/Dialect/Complex/IR/Complex.h" #include "mlir/Dialect/Func/Extensions/InlinerExtension.h" -#include "mlir/Dialect/OpenACC/Transforms/Passes.h" #include "mlir/InitAllDialects.h" #include "mlir/Pass/Pass.h" #include "mlir/Pass/PassRegistry.h" @@ -75,7 +74,6 @@ inline void loadDialects(mlir::MLIRContext &context) { /// Register the standard passes we use. This comes from registerAllPasses(), /// but is a smaller set since we aren't using many of the passes found there. 
inline void registerMLIRPassesForFortranTools() { - mlir::acc::registerOpenACCPasses(); mlir::registerCanonicalizerPass(); mlir::registerCSEPass(); mlir::affine::registerAffineLoopFusionPass(); diff --git a/flang/test/Fir/OpenACC/legalize-data.fir b/flang/test/Fir/OpenACC/legalize-data.fir deleted file mode 100644 index 3b8695434e6e4..0 --- a/flang/test/Fir/OpenACC/legalize-data.fir +++ /dev/null @@ -1,24 +0,0 @@ -// RUN: fir-opt -split-input-file --openacc-legalize-data %s | FileCheck %s - -func.func @_QPsub1(%arg0: !fir.ref {fir.bindc_name = "i"}) { - %0:2 = hlfir.declare %arg0 {uniq_name = "_QFsub1Ei"} : (!fir.ref) -> (!fir.ref, !fir.ref) - %1 = acc.copyin varPtr(%0#0 : !fir.ref) -> !fir.ref {dataClause = #acc, name = "i"} - acc.parallel dataOperands(%1 : !fir.ref) { -%c0_i32 = arith.constant 0 : i32 -hlfir.assign %c0_i32 to %0#0 : i32, !fir.ref -acc.yield - } - acc.copyout accPtr(%1 : !fir.ref) to varPtr(%0#0 : !fir.ref) {dataClause = #acc, name = "i"} - return -} - -// CHECK-LABEL: func.func @_QPsub1 -// CHECK-SAME: (%[[ARG0:.*]]: !fir.ref {fir.bindc_name = "i"}) -// CHECK: %[[I:.*]]:2 = hlfir.declare %[[ARG0]] {uniq_name = "_QFsub1Ei"} : (!fir.ref) -> (!fir.ref, !fir.ref) -// CHECK: %[[COPYIN:.*]] = acc.copyin varPtr(%[[I]]#0 : !fir.ref) -> !fir.ref {dataClause = #acc, name = "i"} -// CHECK: acc.parallel dataOperands(%[[COPYIN]] : !fir.ref) { -// CHECK: %c0_i32 = arith.constant 0 : i32 -// CHECK: hlfir.assign %c0{{.*}} to %[[COPYIN]] : i32, !fir.ref -// CHECK: acc.yield -// CHECK: } -// CHECK: acc.copyout accPtr(%[[COPYIN]] : !fir.ref) to varPtr(%[[I]]#0 : !fir.ref) {dataClause = #acc, name = "i"} diff --git a/mlir/include/mlir/Dialect/OpenACC/CMakeLists.txt b/mlir/include/mlir/Dialect/OpenACC/CMakeLists.txt index 8a4b1c7b196ea..56ba2976ee5d4 100644 --- a/mlir/include/mlir/Dialect/OpenACC/CMakeLists.txt +++ b/mlir/include/mlir/Dialect/OpenACC/CMakeLists.txt @@ -1,5 +1,3 @@ -add_subdirectory(Transforms) - set(LLVM_TARGET_DEFINITIONS 
${LLVM_MAIN_INCLUDE_DIR}/llvm/Frontend/OpenACC/ACC.td) mlir_tablegen(AccCommon.td --gen-directive-decl --directives-dialect=OpenACC) add_public_tablegen_target(acc_common_td) diff --git a/mlir/include/mlir/Dialect/OpenACC/Transforms/CMakeLists.txt b/mlir/include/mlir/Dialect/OpenACC/Transforms/CMakeLists.txt deleted file mode 100644 index ddbd5839576fc..0 --- a/mlir/include/mlir/Dialect/OpenACC/Transforms/CMakeLists.txt +++ /dev/null @@ -1,5 +0,0 @@ -set(LLVM_TARGET_DEFINITIONS Passes.td) -mlir_tablegen(Passes.h.inc -gen-pass-decls -name OpenACC) -add_public_tablegen_target(MLIROpenACCPassIncGen) - -add_mlir_doc(Passes OpenACCPasses ./ -gen-pass-doc) diff --git a/mlir/include/mlir/Dialect/OpenACC/Transforms/Passes.h b/mlir/include/mlir/Dialect/OpenACC/Transforms/Passes.h deleted file mode 100644 index 5a11056cda609..0 --- a/mlir/include/mlir/Dialect/OpenACC/Transforms/Passes.h +++ /dev/null @@ -1,40 +
[llvm-branch-commits] [mlir] 248b916 - Revert "[mlir][vector] Drop inner unit dims for transfer ops on dynamic shape…"
Author: Han-Chung Wang Date: 2024-02-05T09:06:32-08:00 New Revision: 248b9161015f4c030294359182169e96737998c3 URL: https://github.com/llvm/llvm-project/commit/248b9161015f4c030294359182169e96737998c3 DIFF: https://github.com/llvm/llvm-project/commit/248b9161015f4c030294359182169e96737998c3.diff LOG: Revert "[mlir][vector] Drop inner unit dims for transfer ops on dynamic shape…" This reverts commit 66347e516e22f9159b86024071fb92f364ac4418. Added: Modified: mlir/lib/Dialect/Vector/Transforms/VectorTransforms.cpp mlir/test/Dialect/Vector/vector-transfer-collapse-inner-most-dims.mlir Removed: diff --git a/mlir/lib/Dialect/Vector/Transforms/VectorTransforms.cpp b/mlir/lib/Dialect/Vector/Transforms/VectorTransforms.cpp index 8363e73857e5c..12aa11e9e33f5 100644 --- a/mlir/lib/Dialect/Vector/Transforms/VectorTransforms.cpp +++ b/mlir/lib/Dialect/Vector/Transforms/VectorTransforms.cpp @@ -1236,7 +1236,7 @@ class DropInnerMostUnitDimsTransferRead return failure(); auto srcType = dyn_cast(readOp.getSource().getType()); -if (!srcType) +if (!srcType || !srcType.hasStaticShape()) return failure(); if (!readOp.getPermutationMap().isMinorIdentity()) @@ -1260,21 +1260,19 @@ class DropInnerMostUnitDimsTransferRead targetType.getElementType()); auto loc = readOp.getLoc(); -SmallVector sizes = -memref::getMixedSizes(rewriter, loc, readOp.getSource()); -SmallVector offsets(srcType.getRank(), - rewriter.getIndexAttr(0)); -SmallVector strides(srcType.getRank(), - rewriter.getIndexAttr(1)); MemRefType resultMemrefType = getMemRefTypeWithDroppingInnerDims(rewriter, srcType, dimsToDrop); +SmallVector offsets(srcType.getRank(), 0); +SmallVector strides(srcType.getRank(), 1); + ArrayAttr inBoundsAttr = readOp.getInBounds() ? 
rewriter.getArrayAttr( readOp.getInBoundsAttr().getValue().drop_back(dimsToDrop)) : ArrayAttr(); Value rankedReducedView = rewriter.create( -loc, resultMemrefType, readOp.getSource(), offsets, sizes, strides); +loc, resultMemrefType, readOp.getSource(), offsets, srcType.getShape(), +strides); auto permMap = getTransferMinorIdentityMap( cast(rankedReducedView.getType()), resultTargetVecType); Value result = rewriter.create( @@ -1320,7 +1318,7 @@ class DropInnerMostUnitDimsTransferWrite return failure(); auto srcType = dyn_cast(writeOp.getSource().getType()); -if (!srcType) +if (!srcType || !srcType.hasStaticShape()) return failure(); if (!writeOp.getPermutationMap().isMinorIdentity()) @@ -1343,23 +1341,20 @@ class DropInnerMostUnitDimsTransferWrite VectorType::get(targetType.getShape().drop_back(dimsToDrop), targetType.getElementType()); -Location loc = writeOp.getLoc(); -SmallVector sizes = -memref::getMixedSizes(rewriter, loc, writeOp.getSource()); -SmallVector offsets(srcType.getRank(), - rewriter.getIndexAttr(0)); -SmallVector strides(srcType.getRank(), - rewriter.getIndexAttr(1)); MemRefType resultMemrefType = getMemRefTypeWithDroppingInnerDims(rewriter, srcType, dimsToDrop); +SmallVector offsets(srcType.getRank(), 0); +SmallVector strides(srcType.getRank(), 1); ArrayAttr inBoundsAttr = writeOp.getInBounds() ? 
rewriter.getArrayAttr( writeOp.getInBoundsAttr().getValue().drop_back(dimsToDrop)) : ArrayAttr(); +Location loc = writeOp.getLoc(); Value rankedReducedView = rewriter.create( -loc, resultMemrefType, writeOp.getSource(), offsets, sizes, strides); +loc, resultMemrefType, writeOp.getSource(), offsets, srcType.getShape(), +strides); auto permMap = getTransferMinorIdentityMap( cast(rankedReducedView.getType()), resultTargetVecType); diff --git a/mlir/test/Dialect/Vector/vector-transfer-collapse-inner-most-dims.mlir b/mlir/test/Dialect/Vector/vector-transfer-collapse-inner-most-dims.mlir index 3984f17f9e8cd..d6d69c8af8850 100644 --- a/mlir/test/Dialect/Vector/vector-transfer-collapse-inner-most-dims.mlir +++ b/mlir/test/Dialect/Vector/vector-transfer-collapse-inner-most-dims.mlir @@ -16,25 +16,6 @@ func.func @contiguous_inner_most_view(%in: memref<1x1x8x1xf32, strided<[3072, 8, // - -func.func @contiguous_outer_dyn_inner_most_view(%in: memref>) -> vector<1x8x1xf32>{ - %c0 = arith.constant 0 : index - %cst = arith.constant 0.0 : f32 - %0 = vector.transfer_read %in[%c0, %c0, %c0, %c0], %cst {in_bounds = [true, true, true]} : memref>, vector<1x8x1xf32> - return %0 : vector<1x8x1xf32> -} -//
[llvm-branch-commits] [llvm] PR for llvm/llvm-project#80715 (PR #80716)
https://github.com/llvmbot milestoned https://github.com/llvm/llvm-project/pull/80716
[llvm-branch-commits] [llvm] PR for llvm/llvm-project#80715 (PR #80716)
llvmbot wrote: @yxsamliu What do you think about merging this PR to the release branch? https://github.com/llvm/llvm-project/pull/80716
[llvm-branch-commits] [llvm] PR for llvm/llvm-project#80715 (PR #80716)
https://github.com/yxsamliu approved this pull request. https://github.com/llvm/llvm-project/pull/80716
[llvm-branch-commits] [llvm] PR for llvm/llvm-project#80715 (PR #80716)
llvmbot wrote: @llvm/pr-subscribers-backend-amdgpu Author: None (llvmbot) Changes resolves llvm/llvm-project#80715 --- Patch is 295.23 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/80716.diff 3 Files Affected: - (modified) llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp (+1) - (modified) llvm/test/CodeGen/AMDGPU/div_i128.ll (+5443-6) - (added) llvm/test/CodeGen/AMDGPU/div_v2i128.ll (+25) ``diff diff --git a/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp b/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp index 55d95154c75878..2af53a664ff173 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp +++ b/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp @@ -577,6 +577,7 @@ AMDGPUTargetLowering::AMDGPUTargetLowering(const TargetMachine &TM, ISD::AssertSext, ISD::INTRINSIC_WO_CHAIN}); setMaxAtomicSizeInBitsSupported(64); + setMaxDivRemBitWidthSupported(64); } bool AMDGPUTargetLowering::mayIgnoreSignedZero(SDValue Op) const { diff --git a/llvm/test/CodeGen/AMDGPU/div_i128.ll b/llvm/test/CodeGen/AMDGPU/div_i128.ll index 4aa97c57cbd9c2..5296ad3ab51d31 100644 --- a/llvm/test/CodeGen/AMDGPU/div_i128.ll +++ b/llvm/test/CodeGen/AMDGPU/div_i128.ll @@ -1,9 +1,5446 @@ -; RUN: not --crash llc -global-isel=0 -mtriple=amdgcn-amd-amdhsa -verify-machineinstrs -o - %s 2>&1 | FileCheck -check-prefix=SDAG-ERR %s -; RUN: not --crash llc -global-isel=1 -mtriple=amdgcn-amd-amdhsa -verify-machineinstrs -o - %s 2>&1 | FileCheck -check-prefix=GISEL-ERR %s +; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 4 +; RUN: llc -global-isel=0 -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -o - %s | FileCheck -check-prefixes=GFX9,GFX9-SDAG %s +; RUN: llc -O0 -global-isel=0 -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -o - %s | FileCheck -check-prefixes=GFX9-O0,GFX9-SDAG-O0 %s + +; FIXME: GlobalISel missing the power-of-2 cases in legalization. 
https://github.com/llvm/llvm-project/issues/80671 +; xUN: llc -global-isel=1 -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -o - %s | FileCheck -check-prefixes=GFX9,GFX9 %s +; xUN: llc -O0 -global-isel=1 -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -o - %s | FileCheck -check-prefixes=GFX9-O0,GFX9-O0 %s -; SDAG-ERR: LLVM ERROR: unsupported libcall legalization -; GISEL-ERR: LLVM ERROR: unable to legalize instruction: %{{[0-9]+}}:_(s128) = G_SDIV %{{[0-9]+}}:_, %{{[0-9]+}}:_ (in function: v_sdiv_i128_vv) define i128 @v_sdiv_i128_vv(i128 %lhs, i128 %rhs) { - %shl = sdiv i128 %lhs, %rhs - ret i128 %shl +; GFX9-LABEL: v_sdiv_i128_vv: +; GFX9: ; %bb.0: ; %_udiv-special-cases +; GFX9-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) +; GFX9-NEXT:v_ashrrev_i32_e32 v16, 31, v3 +; GFX9-NEXT:v_xor_b32_e32 v0, v16, v0 +; GFX9-NEXT:v_xor_b32_e32 v1, v16, v1 +; GFX9-NEXT:v_sub_co_u32_e32 v8, vcc, v0, v16 +; GFX9-NEXT:v_xor_b32_e32 v2, v16, v2 +; GFX9-NEXT:v_subb_co_u32_e32 v9, vcc, v1, v16, vcc +; GFX9-NEXT:v_ashrrev_i32_e32 v17, 31, v7 +; GFX9-NEXT:v_xor_b32_e32 v3, v16, v3 +; GFX9-NEXT:v_subb_co_u32_e32 v10, vcc, v2, v16, vcc +; GFX9-NEXT:v_subb_co_u32_e32 v11, vcc, v3, v16, vcc +; GFX9-NEXT:v_xor_b32_e32 v3, v17, v4 +; GFX9-NEXT:v_xor_b32_e32 v2, v17, v5 +; GFX9-NEXT:v_sub_co_u32_e32 v20, vcc, v3, v17 +; GFX9-NEXT:v_xor_b32_e32 v0, v17, v6 +; GFX9-NEXT:v_subb_co_u32_e32 v21, vcc, v2, v17, vcc +; GFX9-NEXT:v_xor_b32_e32 v1, v17, v7 +; GFX9-NEXT:v_subb_co_u32_e32 v0, vcc, v0, v17, vcc +; GFX9-NEXT:v_subb_co_u32_e32 v1, vcc, v1, v17, vcc +; GFX9-NEXT:v_or_b32_e32 v3, v21, v1 +; GFX9-NEXT:v_or_b32_e32 v2, v20, v0 +; GFX9-NEXT:v_cmp_eq_u64_e32 vcc, 0, v[2:3] +; GFX9-NEXT:v_or_b32_e32 v3, v9, v11 +; GFX9-NEXT:v_or_b32_e32 v2, v8, v10 +; GFX9-NEXT:v_cmp_eq_u64_e64 s[4:5], 0, v[2:3] +; GFX9-NEXT:v_ffbh_u32_e32 v2, v0 +; GFX9-NEXT:v_add_u32_e32 v2, 32, v2 +; GFX9-NEXT:v_ffbh_u32_e32 v3, v1 +; GFX9-NEXT:v_min_u32_e32 v2, v2, v3 +; GFX9-NEXT:v_ffbh_u32_e32 v3, v20 +; GFX9-NEXT:v_add_u32_e32 v3, 32, v3 
+; GFX9-NEXT:v_ffbh_u32_e32 v4, v21 +; GFX9-NEXT:v_min_u32_e32 v3, v3, v4 +; GFX9-NEXT:s_or_b64 s[4:5], vcc, s[4:5] +; GFX9-NEXT:v_add_co_u32_e32 v3, vcc, 64, v3 +; GFX9-NEXT:v_addc_co_u32_e64 v4, s[6:7], 0, 0, vcc +; GFX9-NEXT:v_cmp_ne_u64_e32 vcc, 0, v[0:1] +; GFX9-NEXT:v_ffbh_u32_e32 v5, v11 +; GFX9-NEXT:v_cndmask_b32_e32 v2, v3, v2, vcc +; GFX9-NEXT:v_ffbh_u32_e32 v3, v10 +; GFX9-NEXT:v_add_u32_e32 v3, 32, v3 +; GFX9-NEXT:v_min_u32_e32 v3, v3, v5 +; GFX9-NEXT:v_ffbh_u32_e32 v5, v8 +; GFX9-NEXT:v_add_u32_e32 v5, 32, v5 +; GFX9-NEXT:v_ffbh_u32_e32 v6, v9 +; GFX9-NEXT:v_min_u32_e32 v5, v5, v6 +; GFX9-NEXT:v_cndmask_b32_e64 v4, v4, 0, vcc +; GFX9-NEXT:v_add_co_u32_e32 v5, vcc, 64, v5 +; GFX9-NEXT:v_addc_co_u32_e64 v6, s[6:7], 0, 0, vcc +; GFX9-NEXT:v_cmp_ne_u64_e32 vcc, 0, v[10:11] +; GFX9-NEXT:
[llvm-branch-commits] [libcxx] PR for llvm/llvm-project#80718 (PR #80720)
https://github.com/llvmbot milestoned https://github.com/llvm/llvm-project/pull/80720
[llvm-branch-commits] [libcxx] PR for llvm/llvm-project#80718 (PR #80720)
llvmbot wrote: @philnik777 What do you think about merging this PR to the release branch? https://github.com/llvm/llvm-project/pull/80720
[llvm-branch-commits] [libcxx] PR for llvm/llvm-project#80718 (PR #80720)
https://github.com/llvmbot created https://github.com/llvm/llvm-project/pull/80720 resolves llvm/llvm-project#80718 >From 8df2173d644846197b4285bccc3aba3a651d1521 Mon Sep 17 00:00:00 2001 From: Dimitry Andric Date: Mon, 5 Feb 2024 17:41:12 +0100 Subject: [PATCH] [libc++] Rename __bit_reference template parameter to avoid conflict (#80661) As of 4d20cfcf4eb08217ed37c4d4c38dc395d7a66d26, `__bit_reference` contains a template `__fill_n` with a bool `_FillValue` parameter. Unfortunately there is a relatively widely used piece of scientific software called NetCDF, which exposes a (C) macro `_FillValue` in its public headers. When building the NetCDF C++ bindings, this quickly leads to compilation errors when the macro interferes with the template in `__bit_reference`. Rename the parameter to `_FillVal` to avoid the conflict. (cherry picked from commit 1ec252298925de50b27930c557ba9de3cc397afe) --- libcxx/include/__bit_reference | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/libcxx/include/__bit_reference b/libcxx/include/__bit_reference index 9032b8f018093..3a5339b72ddc3 100644 --- a/libcxx/include/__bit_reference +++ b/libcxx/include/__bit_reference @@ -173,7 +173,7 @@ private: // fill_n -template +template _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_HIDE_FROM_ABI void __fill_n(__bit_iterator<_Cp, false> __first, typename _Cp::size_type __n) { using _It= __bit_iterator<_Cp, false>; @@ -185,7 +185,7 @@ __fill_n(__bit_iterator<_Cp, false> __first, typename _Cp::size_type __n) { __storage_type __clz_f = static_cast<__storage_type>(__bits_per_word - __first.__ctz_); __storage_type __dn= std::min(__clz_f, __n); __storage_type __m = (~__storage_type(0) << __first.__ctz_) & (~__storage_type(0) >> (__clz_f - __dn)); -if (_FillValue) +if (_FillVal) *__first.__seg_ |= __m; else *__first.__seg_ &= ~__m; @@ -194,13 +194,13 @@ __fill_n(__bit_iterator<_Cp, false> __first, typename _Cp::size_type __n) { } // do middle whole words __storage_type __nw = __n / 
__bits_per_word; - std::fill_n(std::__to_address(__first.__seg_), __nw, _FillValue ? static_cast<__storage_type>(-1) : 0); + std::fill_n(std::__to_address(__first.__seg_), __nw, _FillVal ? static_cast<__storage_type>(-1) : 0); __n -= __nw * __bits_per_word; // do last partial word if (__n > 0) { __first.__seg_ += __nw; __storage_type __m = ~__storage_type(0) >> (__bits_per_word - __n); -if (_FillValue) +if (_FillVal) *__first.__seg_ |= __m; else *__first.__seg_ &= ~__m; @@ -1007,7 +1007,7 @@ private: friend class __bit_iterator<_Cp, true>; template friend struct __bit_array; - template + template _LIBCPP_CONSTEXPR_SINCE_CXX20 friend void __fill_n(__bit_iterator<_Dp, false> __first, typename _Dp::size_type __n); template ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
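The kind of clash described in the commit message above can be reproduced outside libc++. The following is only a sketch under stated assumptions: the macro name `FillValue` and its expansion are hypothetical stand-ins for what a third-party C header might leak (non-reserved names are used here so the example stays valid user code), and `fill_word` is an illustrative helper, not the real `__fill_n`.

```cpp
#include <cstdint>

// Hypothetical macro, standing in for one leaked by a third-party C header.
// Both the name and the expansion are assumptions made for illustration.
#define FillValue "FillValue"

// Had the template parameter been spelled `FillValue`, the preprocessor would
// have rewritten the declaration to `template <bool "FillValue">`, which does
// not compile. A different spelling keeps it out of the macro's reach.
template <bool FillVal>
constexpr std::uint32_t fill_word(std::uint32_t word, std::uint32_t mask) {
  // Set the masked bits when filling with true, clear them otherwise.
  return FillVal ? (word | mask) : (word & ~mask);
}

// Compile-time checks of both instantiations.
static_assert(fill_word<true>(0x00u, 0xF0u) == 0xF0u, "masked bits are set");
static_assert(fill_word<false>(0xFFu, 0x0Fu) == 0xF0u, "masked bits are cleared");
```

Renaming the parameter, rather than adding `#undef` around the header, leaves the macro untouched for code that includes both headers.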
[llvm-branch-commits] [libcxx] PR for llvm/llvm-project#80718 (PR #80720)
llvmbot wrote: @llvm/pr-subscribers-libcxx Author: None (llvmbot) Changes resolves llvm/llvm-project#80718 --- Full diff: https://github.com/llvm/llvm-project/pull/80720.diff 1 Files Affected: - (modified) libcxx/include/__bit_reference (+5-5) ``diff diff --git a/libcxx/include/__bit_reference b/libcxx/include/__bit_reference index 9032b8f018093..3a5339b72ddc3 100644 --- a/libcxx/include/__bit_reference +++ b/libcxx/include/__bit_reference @@ -173,7 +173,7 @@ private: // fill_n -template +template _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_HIDE_FROM_ABI void __fill_n(__bit_iterator<_Cp, false> __first, typename _Cp::size_type __n) { using _It= __bit_iterator<_Cp, false>; @@ -185,7 +185,7 @@ __fill_n(__bit_iterator<_Cp, false> __first, typename _Cp::size_type __n) { __storage_type __clz_f = static_cast<__storage_type>(__bits_per_word - __first.__ctz_); __storage_type __dn= std::min(__clz_f, __n); __storage_type __m = (~__storage_type(0) << __first.__ctz_) & (~__storage_type(0) >> (__clz_f - __dn)); -if (_FillValue) +if (_FillVal) *__first.__seg_ |= __m; else *__first.__seg_ &= ~__m; @@ -194,13 +194,13 @@ __fill_n(__bit_iterator<_Cp, false> __first, typename _Cp::size_type __n) { } // do middle whole words __storage_type __nw = __n / __bits_per_word; - std::fill_n(std::__to_address(__first.__seg_), __nw, _FillValue ? static_cast<__storage_type>(-1) : 0); + std::fill_n(std::__to_address(__first.__seg_), __nw, _FillVal ? 
static_cast<__storage_type>(-1) : 0); __n -= __nw * __bits_per_word; // do last partial word if (__n > 0) { __first.__seg_ += __nw; __storage_type __m = ~__storage_type(0) >> (__bits_per_word - __n); -if (_FillValue) +if (_FillVal) *__first.__seg_ |= __m; else *__first.__seg_ &= ~__m; @@ -1007,7 +1007,7 @@ private: friend class __bit_iterator<_Cp, true>; template friend struct __bit_array; - template + template _LIBCPP_CONSTEXPR_SINCE_CXX20 friend void __fill_n(__bit_iterator<_Dp, false> __first, typename _Dp::size_type __n); template `` https://github.com/llvm/llvm-project/pull/80720 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] PR for llvm/llvm-project#80348 (PR #80585)
tstellar wrote: /cherry-pick 4b34558f43121df9b863ff2492f74fb2e65a5af1. https://github.com/llvm/llvm-project/pull/80585
[llvm-branch-commits] [llvm] PR for llvm/llvm-project#80348 (PR #80585)
llvmbot wrote: Failed to cherry-pick: 4b34558f43121df9b863ff2492f74fb2e65a5af1. https://github.com/llvm/llvm-project/actions/runs/7789532649 Please manually backport the fix and push it to your github fork. Once this is done, please create a [pull request](https://github.com/llvm/llvm-project/compare) https://github.com/llvm/llvm-project/pull/80585
[llvm-branch-commits] [llvm] PR for llvm/llvm-project#80699 (PR #80729)
https://github.com/llvmbot created https://github.com/llvm/llvm-project/pull/80729 resolves llvm/llvm-project#80699 >From 29a91711a135622cf74989e100274ab46c8c0bc1 Mon Sep 17 00:00:00 2001 From: Jeremy Morse Date: Wed, 24 Jan 2024 17:45:43 + Subject: [PATCH] [BPI] Transfer value-handles when assign/move constructing BPI (#4) Background: BPI stores a collection of edge branch-probabilities, and also a set of Callback value-handles for the blocks in the edge-collection. When a block is deleted, BPI's eraseBlock method is called to clear the edge-collection of references to that block, to avoid dangling pointers. However, when move-constructing or assigning a BPI object, the edge-collection gets moved, but the value-handles are discarded. This can lead to stale entries in the edge-collection when blocks are deleted without the callback -- not normally a problem, but if a new block is allocated with the same address as an old block, spurious branch probabilities will be recorded about it. The fix is to transfer the handles from the source BPI object. This was exposed by an unrelated debug-info change; it probably just shifted around allocation orders to expose this. 
Detected as nondeterminism and reduced by Zequan Wu: https://github.com/llvm/llvm-project/commit/f1b0a544514f3d343f32a41de9d6fb0b6cbb6021#commitcomment-136737090 (No test because IMHO testing for a behaviour that varies with memory allocators is likely futile; I can add the reproducer with a CHECK for the relevant branch weights if it's desired though) (cherry picked from commit 604a6c409e8473b212952b8633d92bbdb22a45c9) --- llvm/include/llvm/Analysis/BranchProbabilityInfo.h | 13 +++-- 1 file changed, 11 insertions(+), 2 deletions(-) diff --git a/llvm/include/llvm/Analysis/BranchProbabilityInfo.h b/llvm/include/llvm/Analysis/BranchProbabilityInfo.h index 6b9d178182011..91e1872e9bd6f 100644 --- a/llvm/include/llvm/Analysis/BranchProbabilityInfo.h +++ b/llvm/include/llvm/Analysis/BranchProbabilityInfo.h @@ -122,16 +122,23 @@ class BranchProbabilityInfo { } BranchProbabilityInfo(BranchProbabilityInfo &&Arg) - : Probs(std::move(Arg.Probs)), LastF(Arg.LastF), -EstimatedBlockWeight(std::move(Arg.EstimatedBlockWeight)) {} + : Handles(std::move(Arg.Handles)), Probs(std::move(Arg.Probs)), +LastF(Arg.LastF), +EstimatedBlockWeight(std::move(Arg.EstimatedBlockWeight)) { +for (auto &Handle : Handles) + Handle.setBPI(this); + } BranchProbabilityInfo(const BranchProbabilityInfo &) = delete; BranchProbabilityInfo &operator=(const BranchProbabilityInfo &) = delete; BranchProbabilityInfo &operator=(BranchProbabilityInfo &&RHS) { releaseMemory(); +Handles = std::move(RHS.Handles); Probs = std::move(RHS.Probs); EstimatedBlockWeight = std::move(RHS.EstimatedBlockWeight); +for (auto &Handle : Handles) + Handle.setBPI(this); return *this; } @@ -279,6 +286,8 @@ class BranchProbabilityInfo { } public: +void setBPI(BranchProbabilityInfo *BPI) { this->BPI = BPI; } + BasicBlockCallbackVH(const Value *V, BranchProbabilityInfo *BPI = nullptr) : CallbackVH(const_cast(V)), BPI(BPI) {} }; ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org 
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
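The pattern in the patch above (move the handles along with the data, then point each handle back at the new owner) can be sketched in isolation. This is a simplified illustration, not LLVM code: `Registry` and `Handle` are hypothetical stand-ins for `BranchProbabilityInfo` and `BasicBlockCallbackVH`, and blocks are reduced to integer ids.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

class Registry;  // hypothetical owner type

// Hypothetical handle that remembers which owner it should notify.
struct Handle {
  int block_id;
  Registry *owner;
  void setOwner(Registry *o) { owner = o; }
};

class Registry {
public:
  Registry() = default;
  Registry(const Registry &) = delete;
  Registry &operator=(const Registry &) = delete;

  // Move construction: take the handles, then re-point them at this object.
  Registry(Registry &&other) noexcept : handles_(std::move(other.handles_)) {
    repoint();
  }

  // Move assignment: same idea, guarded against self-assignment.
  Registry &operator=(Registry &&other) noexcept {
    if (this != &other) {
      handles_ = std::move(other.handles_);
      repoint();
    }
    return *this;
  }

  // Start tracking a block; the new handle points back at this owner.
  void track(int block_id) { handles_.push_back(Handle{block_id, this}); }

  std::size_t size() const { return handles_.size(); }

  // True when no handle still refers to a moved-from owner.
  bool allHandlesPointHere() const {
    for (const Handle &h : handles_)
      if (h.owner != this)
        return false;
    return true;
  }

private:
  void repoint() {
    for (Handle &h : handles_)
      h.setOwner(this);
  }

  std::vector<Handle> handles_;
};
```

Without the `repoint()` call, the moved handles would keep the address of the moved-from object, which is the back-pointer counterpart of the discarded-handle problem the commit message describes.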
[llvm-branch-commits] [llvm] PR for llvm/llvm-project#80699 (PR #80729)
https://github.com/llvmbot milestoned https://github.com/llvm/llvm-project/pull/80729
[llvm-branch-commits] [llvm] PR for llvm/llvm-project#80699 (PR #80729)
llvmbot wrote: @ZequanWu What do you think about merging this PR to the release branch? https://github.com/llvm/llvm-project/pull/80729
[llvm-branch-commits] [llvm] PR for llvm/llvm-project#80699 (PR #80729)
llvmbot wrote: @llvm/pr-subscribers-llvm-analysis Author: None (llvmbot) Changes resolves llvm/llvm-project#80699 --- Full diff: https://github.com/llvm/llvm-project/pull/80729.diff 1 Files Affected: - (modified) llvm/include/llvm/Analysis/BranchProbabilityInfo.h (+11-2) ``diff diff --git a/llvm/include/llvm/Analysis/BranchProbabilityInfo.h b/llvm/include/llvm/Analysis/BranchProbabilityInfo.h index 6b9d178182011..91e1872e9bd6f 100644 --- a/llvm/include/llvm/Analysis/BranchProbabilityInfo.h +++ b/llvm/include/llvm/Analysis/BranchProbabilityInfo.h @@ -122,16 +122,23 @@ class BranchProbabilityInfo { } BranchProbabilityInfo(BranchProbabilityInfo &&Arg) - : Probs(std::move(Arg.Probs)), LastF(Arg.LastF), -EstimatedBlockWeight(std::move(Arg.EstimatedBlockWeight)) {} + : Handles(std::move(Arg.Handles)), Probs(std::move(Arg.Probs)), +LastF(Arg.LastF), +EstimatedBlockWeight(std::move(Arg.EstimatedBlockWeight)) { +for (auto &Handle : Handles) + Handle.setBPI(this); + } BranchProbabilityInfo(const BranchProbabilityInfo &) = delete; BranchProbabilityInfo &operator=(const BranchProbabilityInfo &) = delete; BranchProbabilityInfo &operator=(BranchProbabilityInfo &&RHS) { releaseMemory(); +Handles = std::move(RHS.Handles); Probs = std::move(RHS.Probs); EstimatedBlockWeight = std::move(RHS.EstimatedBlockWeight); +for (auto &Handle : Handles) + Handle.setBPI(this); return *this; } @@ -279,6 +286,8 @@ class BranchProbabilityInfo { } public: +void setBPI(BranchProbabilityInfo *BPI) { this->BPI = BPI; } + BasicBlockCallbackVH(const Value *V, BranchProbabilityInfo *BPI = nullptr) : CallbackVH(const_cast(V)), BPI(BPI) {} }; `` https://github.com/llvm/llvm-project/pull/80729 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] PR for llvm/llvm-project#80597 (PR #80731)
https://github.com/llvmbot created https://github.com/llvm/llvm-project/pull/80731 resolves llvm/llvm-project#80597 >From 3df992ed00f46a44492416cce46121f5e4fc0716 Mon Sep 17 00:00:00 2001 From: Yingwei Zheng Date: Tue, 6 Feb 2024 01:29:38 +0800 Subject: [PATCH] [InstCombine] Fix assertion failure in issue80597 (#80614) The assertion in #80597 failed when we were trying to compute known bits of a value in an unreachable BB. https://github.com/llvm/llvm-project/blob/859b09da08c2a47026ba0a7d2f21b7dca705864d/llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp#L749-L810 In this case, `SignBits` is 30 (deduced from instr info), but `Known` is `110101011101000101000?0?` (deduced from dom cond). Setting high bits of `lshr Known, 1` will lead to conflict. This patch masks out high bits of `Known.Zero` to address this problem. Fixes #80597. (cherry picked from commit cb8d83a77c25e529f58eba17bb1ec76069a04e90) --- .../InstCombineSimplifyDemanded.cpp | 3 ++ llvm/test/Transforms/InstCombine/pr80597.ll | 33 +++ 2 files changed, 36 insertions(+) create mode 100644 llvm/test/Transforms/InstCombine/pr80597.ll diff --git a/llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp b/llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp index a8a5f9831e15e..79873a9b4cbb4 100644 --- a/llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp +++ b/llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp @@ -802,6 +802,9 @@ Value *InstCombinerImpl::SimplifyDemandedUseBits(Value *V, APInt DemandedMask, return InsertNewInstWith(LShr, I->getIterator()); } else if (Known.One[BitWidth-ShiftAmt-1]) { // New bits are known one. Known.One |= HighBits; +// SignBits may be out-of-sync with Known.countMinSignBits(). Mask out +// high bits of Known.Zero to avoid conflicts. 
+Known.Zero &= ~HighBits; } } else { computeKnownBits(I, Known, Depth, CxtI); diff --git a/llvm/test/Transforms/InstCombine/pr80597.ll b/llvm/test/Transforms/InstCombine/pr80597.ll new file mode 100644 index 0..5feae4a06c45c --- /dev/null +++ b/llvm/test/Transforms/InstCombine/pr80597.ll @@ -0,0 +1,33 @@ +; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 4 +; RUN: opt -S -passes=instcombine < %s | FileCheck %s + +define i64 @pr80597(i1 %cond) { +; CHECK-LABEL: define i64 @pr80597( +; CHECK-SAME: i1 [[COND:%.*]]) { +; CHECK-NEXT: entry: +; CHECK-NEXT:[[ADD:%.*]] = select i1 [[COND]], i64 0, i64 -12884901888 +; CHECK-NEXT:[[SEXT1:%.*]] = add nsw i64 [[ADD]], 8836839514384105472 +; CHECK-NEXT:[[CMP:%.*]] = icmp ult i64 [[SEXT1]], -34359738368 +; CHECK-NEXT:br i1 [[CMP]], label [[IF_THEN:%.*]], label [[IF_ELSE:%.*]] +; CHECK: if.else: +; CHECK-NEXT:[[SEXT2:%.*]] = ashr exact i64 [[ADD]], 1 +; CHECK-NEXT:[[ASHR:%.*]] = or i64 [[SEXT2]], 4418419761487020032 +; CHECK-NEXT:ret i64 [[ASHR]] +; CHECK: if.then: +; CHECK-NEXT:ret i64 0 +; +entry: + %add = select i1 %cond, i64 0, i64 4294967293 + %add8 = shl i64 %add, 32 + %sext1 = add i64 %add8, 8836839514384105472 + %cmp = icmp ult i64 %sext1, -34359738368 + br i1 %cmp, label %if.then, label %if.else + +if.else: + %sext2 = or i64 %add8, 8836839522974040064 + %ashr = ashr i64 %sext2, 1 + ret i64 %ashr + +if.then: + ret i64 0 +} ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
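The conflict that the patch above avoids can be illustrated with a miniature known-bits pair. This is a sketch under assumptions, not the LLVM `KnownBits` class: `KnownBits8` is a hypothetical 8-bit stand-in, and the bit patterns are invented for the example rather than taken from the failing test.

```cpp
#include <cstdint>

// Hypothetical 8-bit miniature of a known-bits pair: a bit set in Zero is
// known to be 0, a bit set in One is known to be 1.
struct KnownBits8 {
  std::uint8_t Zero;
  std::uint8_t One;

  // A bit cannot be known-zero and known-one at the same time.
  constexpr bool hasConflict() const { return (Zero & One) != 0; }
};

// Mark the high bits as known one. When MaskZero is true, also clear those
// bits from Zero, mirroring `Known.Zero &= ~HighBits` in the patch.
constexpr KnownBits8 setHighOnes(KnownBits8 K, std::uint8_t HighBits,
                                 bool MaskZero) {
  K.One = static_cast<std::uint8_t>(K.One | HighBits);
  if (MaskZero)
    K.Zero = static_cast<std::uint8_t>(K.Zero & ~HighBits);
  return K;
}

// Bit 6 is believed zero (say, from a dominating condition) while the sign
// information claims the top two bits are ones.
static_assert(setHighOnes(KnownBits8{0x40, 0x20}, 0xC0, false).hasConflict(),
              "without masking, bit 6 is both known-zero and known-one");
static_assert(!setHighOnes(KnownBits8{0x40, 0x20}, 0xC0, true).hasConflict(),
              "masking Zero removes the contradiction");
```

Clearing the overlapping bits from `Zero` resolves the disagreement in favour of the newly set ones, which is acceptable in the case the commit message describes because the value lives in an unreachable block.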
[llvm-branch-commits] [llvm] PR for llvm/llvm-project#80597 (PR #80731)
https://github.com/llvmbot milestoned https://github.com/llvm/llvm-project/pull/80731
[llvm-branch-commits] [llvm] PR for llvm/llvm-project#80597 (PR #80731)
llvmbot wrote: @nikic What do you think about merging this PR to the release branch? https://github.com/llvm/llvm-project/pull/80731
[llvm-branch-commits] [llvm] PR for llvm/llvm-project#80597 (PR #80731)
llvmbot wrote: @llvm/pr-subscribers-llvm-transforms Author: None (llvmbot) Changes resolves llvm/llvm-project#80597 --- Full diff: https://github.com/llvm/llvm-project/pull/80731.diff 2 Files Affected: - (modified) llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp (+3) - (added) llvm/test/Transforms/InstCombine/pr80597.ll (+33) ``diff diff --git a/llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp b/llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp index a8a5f9831e15e..79873a9b4cbb4 100644 --- a/llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp +++ b/llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp @@ -802,6 +802,9 @@ Value *InstCombinerImpl::SimplifyDemandedUseBits(Value *V, APInt DemandedMask, return InsertNewInstWith(LShr, I->getIterator()); } else if (Known.One[BitWidth-ShiftAmt-1]) { // New bits are known one. Known.One |= HighBits; +// SignBits may be out-of-sync with Known.countMinSignBits(). Mask out +// high bits of Known.Zero to avoid conflicts. 
+Known.Zero &= ~HighBits; } } else { computeKnownBits(I, Known, Depth, CxtI); diff --git a/llvm/test/Transforms/InstCombine/pr80597.ll b/llvm/test/Transforms/InstCombine/pr80597.ll new file mode 100644 index 0..5feae4a06c45c --- /dev/null +++ b/llvm/test/Transforms/InstCombine/pr80597.ll @@ -0,0 +1,33 @@ +; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 4 +; RUN: opt -S -passes=instcombine < %s | FileCheck %s + +define i64 @pr80597(i1 %cond) { +; CHECK-LABEL: define i64 @pr80597( +; CHECK-SAME: i1 [[COND:%.*]]) { +; CHECK-NEXT: entry: +; CHECK-NEXT:[[ADD:%.*]] = select i1 [[COND]], i64 0, i64 -12884901888 +; CHECK-NEXT:[[SEXT1:%.*]] = add nsw i64 [[ADD]], 8836839514384105472 +; CHECK-NEXT:[[CMP:%.*]] = icmp ult i64 [[SEXT1]], -34359738368 +; CHECK-NEXT:br i1 [[CMP]], label [[IF_THEN:%.*]], label [[IF_ELSE:%.*]] +; CHECK: if.else: +; CHECK-NEXT:[[SEXT2:%.*]] = ashr exact i64 [[ADD]], 1 +; CHECK-NEXT:[[ASHR:%.*]] = or i64 [[SEXT2]], 4418419761487020032 +; CHECK-NEXT:ret i64 [[ASHR]] +; CHECK: if.then: +; CHECK-NEXT:ret i64 0 +; +entry: + %add = select i1 %cond, i64 0, i64 4294967293 + %add8 = shl i64 %add, 32 + %sext1 = add i64 %add8, 8836839514384105472 + %cmp = icmp ult i64 %sext1, -34359738368 + br i1 %cmp, label %if.then, label %if.else + +if.else: + %sext2 = or i64 %add8, 8836839522974040064 + %ashr = ashr i64 %sext2, 1 + ret i64 %ashr + +if.then: + ret i64 0 +} `` https://github.com/llvm/llvm-project/pull/80731 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [clang] PR for llvm/llvm-project#80432 (PR #80433)
https://github.com/llvmbot updated https://github.com/llvm/llvm-project/pull/80433 >From 1a791e84d9e6ef0e6be1a15e64b78a8fcc18467a Mon Sep 17 00:00:00 2001 From: Sander de Smalen Date: Fri, 2 Feb 2024 11:56:38 + Subject: [PATCH] [Clang][AArch64] Emit 'unimplemented' diagnostic for SME (#80295) When a function F that has ZA and ZT0 state calls another function G that only shares ZT0 state with its caller, F will have to save ZA before the call to G and restore it afterwards (rather than setting up a lazy-save). This is not yet implemented in LLVM and does not result in a compile-time error either. So instead of silently generating incorrect code, it's better to emit an error saying this is not yet implemented. (cherry picked from commit 319f4c03ba2909c7240ac157cc46216bf1518c10) --- .../clang/Basic/DiagnosticSemaKinds.td| 6 +++ clang/lib/Sema/SemaChecking.cpp | 50 +-- clang/test/Sema/aarch64-sme-func-attrs.c | 13 - 3 files changed, 40 insertions(+), 29 deletions(-) diff --git a/clang/include/clang/Basic/DiagnosticSemaKinds.td b/clang/include/clang/Basic/DiagnosticSemaKinds.td index a1c32abb4dcd88..ef8c111b1d8cc8 100644 --- a/clang/include/clang/Basic/DiagnosticSemaKinds.td +++ b/clang/include/clang/Basic/DiagnosticSemaKinds.td @@ -3711,6 +3711,12 @@ def err_sme_za_call_no_za_state : Error< "call to a shared ZA function requires the caller to have ZA state">; def err_sme_zt0_call_no_zt0_state : Error< "call to a shared ZT0 function requires the caller to have ZT0 state">; +def err_sme_unimplemented_za_save_restore : Error< + "call to a function that shares state other than 'za' from a " + "function that has live 'za' state requires a spill/fill of ZA, which is not yet " + "implemented">; +def note_sme_use_preserves_za : Note< + "add '__arm_preserves(\"za\")' to the callee if it preserves ZA">; def err_sme_definition_using_sm_in_non_sme_target : Error< "function executed in streaming-SVE mode requires 'sme'">; def err_sme_definition_using_za_in_non_sme_target : Error< diff 
--git a/clang/lib/Sema/SemaChecking.cpp b/clang/lib/Sema/SemaChecking.cpp index 25e9af1ea3f362..09b7e1c62fbd7b 100644 --- a/clang/lib/Sema/SemaChecking.cpp +++ b/clang/lib/Sema/SemaChecking.cpp @@ -7545,47 +7545,43 @@ void Sema::checkCall(NamedDecl *FDecl, const FunctionProtoType *Proto, } } -// If the callee uses AArch64 SME ZA state but the caller doesn't define -// any, then this is an error. -FunctionType::ArmStateValue ArmZAState = +FunctionType::ArmStateValue CalleeArmZAState = FunctionType::getArmZAState(ExtInfo.AArch64SMEAttributes); -if (ArmZAState != FunctionType::ARM_None) { +FunctionType::ArmStateValue CalleeArmZT0State = +FunctionType::getArmZT0State(ExtInfo.AArch64SMEAttributes); +if (CalleeArmZAState != FunctionType::ARM_None || +CalleeArmZT0State != FunctionType::ARM_None) { bool CallerHasZAState = false; + bool CallerHasZT0State = false; if (const auto *CallerFD = dyn_cast(CurContext)) { auto *Attr = CallerFD->getAttr(); if (Attr && Attr->isNewZA()) CallerHasZAState = true; -else if (const auto *FPT = - CallerFD->getType()->getAs()) - CallerHasZAState = FunctionType::getArmZAState( - FPT->getExtProtoInfo().AArch64SMEAttributes) != - FunctionType::ARM_None; - } - - if (!CallerHasZAState) -Diag(Loc, diag::err_sme_za_call_no_za_state); -} - -// If the callee uses AArch64 SME ZT0 state but the caller doesn't define -// any, then this is an error. 
-FunctionType::ArmStateValue ArmZT0State = -FunctionType::getArmZT0State(ExtInfo.AArch64SMEAttributes); -if (ArmZT0State != FunctionType::ARM_None) { - bool CallerHasZT0State = false; - if (const auto *CallerFD = dyn_cast(CurContext)) { -auto *Attr = CallerFD->getAttr(); if (Attr && Attr->isNewZT0()) CallerHasZT0State = true; -else if (const auto *FPT = - CallerFD->getType()->getAs()) - CallerHasZT0State = +if (const auto *FPT = CallerFD->getType()->getAs()) { + CallerHasZAState |= + FunctionType::getArmZAState( + FPT->getExtProtoInfo().AArch64SMEAttributes) != + FunctionType::ARM_None; + CallerHasZT0State |= FunctionType::getArmZT0State( FPT->getExtProtoInfo().AArch64SMEAttributes) != FunctionType::ARM_None; +} } - if (!CallerHasZT0State) + if (CalleeArmZAState != FunctionType::ARM_None && !CallerHasZAState) +Diag(Loc, diag::err_sme_za_call_no_za_state); + + if (CalleeArmZT0State != FunctionType::ARM_None && !CallerHasZT0State) Diag(Loc, diag::err_sme_zt0_call_no_zt0_state); + + if (CallerHasZAState && CalleeAr
[llvm-branch-commits] [clang] 1a791e8 - [Clang][AArch64] Emit 'unimplemented' diagnostic for SME (#80295)
Author: Sander de Smalen Date: 2024-02-05T11:20:35-08:00 New Revision: 1a791e84d9e6ef0e6be1a15e64b78a8fcc18467a URL: https://github.com/llvm/llvm-project/commit/1a791e84d9e6ef0e6be1a15e64b78a8fcc18467a DIFF: https://github.com/llvm/llvm-project/commit/1a791e84d9e6ef0e6be1a15e64b78a8fcc18467a.diff LOG: [Clang][AArch64] Emit 'unimplemented' diagnostic for SME (#80295) When a function F that has ZA and ZT0 state calls another function G that only shares ZT0 state with its caller, F will have to save ZA before the call to G and restore it afterwards (rather than setting up a lazy-save). This is not yet implemented in LLVM and does not result in a compile-time error either. So instead of silently generating incorrect code, it's better to emit an error saying this is not yet implemented. (cherry picked from commit 319f4c03ba2909c7240ac157cc46216bf1518c10) Added: Modified: clang/include/clang/Basic/DiagnosticSemaKinds.td clang/lib/Sema/SemaChecking.cpp clang/test/Sema/aarch64-sme-func-attrs.c Removed: diff --git a/clang/include/clang/Basic/DiagnosticSemaKinds.td b/clang/include/clang/Basic/DiagnosticSemaKinds.td index a1c32abb4dcd8..ef8c111b1d8cc 100644 --- a/clang/include/clang/Basic/DiagnosticSemaKinds.td +++ b/clang/include/clang/Basic/DiagnosticSemaKinds.td @@ -3711,6 +3711,12 @@ def err_sme_za_call_no_za_state : Error< "call to a shared ZA function requires the caller to have ZA state">; def err_sme_zt0_call_no_zt0_state : Error< "call to a shared ZT0 function requires the caller to have ZT0 state">; +def err_sme_unimplemented_za_save_restore : Error< + "call to a function that shares state other than 'za' from a " + "function that has live 'za' state requires a spill/fill of ZA, which is not yet " + "implemented">; +def note_sme_use_preserves_za : Note< + "add '__arm_preserves(\"za\")' to the callee if it preserves ZA">; def err_sme_definition_using_sm_in_non_sme_target : Error< "function executed in streaming-SVE mode requires 'sme'">; def 
err_sme_definition_using_za_in_non_sme_target : Error< diff --git a/clang/lib/Sema/SemaChecking.cpp b/clang/lib/Sema/SemaChecking.cpp index 25e9af1ea3f36..09b7e1c62fbd7 100644 --- a/clang/lib/Sema/SemaChecking.cpp +++ b/clang/lib/Sema/SemaChecking.cpp @@ -7545,47 +7545,43 @@ void Sema::checkCall(NamedDecl *FDecl, const FunctionProtoType *Proto, } } -// If the callee uses AArch64 SME ZA state but the caller doesn't define -// any, then this is an error. -FunctionType::ArmStateValue ArmZAState = +FunctionType::ArmStateValue CalleeArmZAState = FunctionType::getArmZAState(ExtInfo.AArch64SMEAttributes); -if (ArmZAState != FunctionType::ARM_None) { +FunctionType::ArmStateValue CalleeArmZT0State = +FunctionType::getArmZT0State(ExtInfo.AArch64SMEAttributes); +if (CalleeArmZAState != FunctionType::ARM_None || +CalleeArmZT0State != FunctionType::ARM_None) { bool CallerHasZAState = false; + bool CallerHasZT0State = false; if (const auto *CallerFD = dyn_cast(CurContext)) { auto *Attr = CallerFD->getAttr(); if (Attr && Attr->isNewZA()) CallerHasZAState = true; -else if (const auto *FPT = - CallerFD->getType()->getAs()) - CallerHasZAState = FunctionType::getArmZAState( - FPT->getExtProtoInfo().AArch64SMEAttributes) != - FunctionType::ARM_None; - } - - if (!CallerHasZAState) -Diag(Loc, diag::err_sme_za_call_no_za_state); -} - -// If the callee uses AArch64 SME ZT0 state but the caller doesn't define -// any, then this is an error. 
-FunctionType::ArmStateValue ArmZT0State = -FunctionType::getArmZT0State(ExtInfo.AArch64SMEAttributes); -if (ArmZT0State != FunctionType::ARM_None) { - bool CallerHasZT0State = false; - if (const auto *CallerFD = dyn_cast(CurContext)) { -auto *Attr = CallerFD->getAttr(); if (Attr && Attr->isNewZT0()) CallerHasZT0State = true; -else if (const auto *FPT = - CallerFD->getType()->getAs()) - CallerHasZT0State = +if (const auto *FPT = CallerFD->getType()->getAs()) { + CallerHasZAState |= + FunctionType::getArmZAState( + FPT->getExtProtoInfo().AArch64SMEAttributes) != + FunctionType::ARM_None; + CallerHasZT0State |= FunctionType::getArmZT0State( FPT->getExtProtoInfo().AArch64SMEAttributes) != FunctionType::ARM_None; +} } - if (!CallerHasZT0State) + if (CalleeArmZAState != FunctionType::ARM_None && !CallerHasZAState) +Diag(Loc, diag::err_sme_za_call_no_za_state); + + if (CalleeArmZT0State != FunctionType::ARM_None && !CallerHasZT0State) Dia
[llvm-branch-commits] [clang] PR for llvm/llvm-project#80432 (PR #80433)
https://github.com/tstellar closed https://github.com/llvm/llvm-project/pull/80433
[llvm-branch-commits] [llvm] PR for llvm/llvm-project#79571 (PR #79572)
https://github.com/llvmbot updated https://github.com/llvm/llvm-project/pull/79572 >From 43db795259d91ddb3b12596e8aec3dddbd1fb583 Mon Sep 17 00:00:00 2001 From: Nikita Popov Date: Wed, 24 Jan 2024 10:15:42 +0100 Subject: [PATCH] [MSSAUpdater] Handle simplified accesses when updating phis (#78272) This is a followup to #76819. After those changes, we can still run into an assertion failure for a slight variation of the test case: When fixing up MemoryPhis, we map the incoming access to the access of the cloned instruction -- which may now no longer exist. Fix this by reusing the getNewDefiningAccessForClone() helper, which will look upwards for a new defining access in that case. (cherry picked from commit a7a1b8b17e264fb0f2d2b4165cf9a7f5094b08b3) --- llvm/lib/Analysis/MemorySSAUpdater.cpp| 22 +--- .../memssa-readnone-access.ll | 104 ++ 2 files changed, 107 insertions(+), 19 deletions(-) diff --git a/llvm/lib/Analysis/MemorySSAUpdater.cpp b/llvm/lib/Analysis/MemorySSAUpdater.cpp index e87ae7d71fffe2..aa550f0b6a7bfd 100644 --- a/llvm/lib/Analysis/MemorySSAUpdater.cpp +++ b/llvm/lib/Analysis/MemorySSAUpdater.cpp @@ -692,25 +692,9 @@ void MemorySSAUpdater::updateForClonedLoop(const LoopBlocksRPO &LoopBlocks, continue; // Determine incoming value and add it as incoming from IncBB. 
- if (MemoryUseOrDef *IncMUD = dyn_cast(IncomingAccess)) { -if (!MSSA->isLiveOnEntryDef(IncMUD)) { - Instruction *IncI = IncMUD->getMemoryInst(); - assert(IncI && "Found MemoryUseOrDef with no Instruction."); - if (Instruction *NewIncI = - cast_or_null(VMap.lookup(IncI))) { -IncMUD = MSSA->getMemoryAccess(NewIncI); -assert(IncMUD && - "MemoryUseOrDef cannot be null, all preds processed."); - } -} -NewPhi->addIncoming(IncMUD, IncBB); - } else { -MemoryPhi *IncPhi = cast(IncomingAccess); -if (MemoryAccess *NewDefPhi = MPhiMap.lookup(IncPhi)) - NewPhi->addIncoming(NewDefPhi, IncBB); -else - NewPhi->addIncoming(IncPhi, IncBB); - } + NewPhi->addIncoming( + getNewDefiningAccessForClone(IncomingAccess, VMap, MPhiMap, MSSA), + IncBB); } if (auto *SingleAccess = onlySingleValue(NewPhi)) { MPhiMap[Phi] = SingleAccess; diff --git a/llvm/test/Transforms/SimpleLoopUnswitch/memssa-readnone-access.ll b/llvm/test/Transforms/SimpleLoopUnswitch/memssa-readnone-access.ll index 2aaf777683e116..c6e6608d4be383 100644 --- a/llvm/test/Transforms/SimpleLoopUnswitch/memssa-readnone-access.ll +++ b/llvm/test/Transforms/SimpleLoopUnswitch/memssa-readnone-access.ll @@ -115,3 +115,107 @@ split: exit: ret void } + +; Variants of the above test with swapped branch destinations. 
+ +define void @test1_swapped(i1 %c) { +; CHECK-LABEL: define void @test1_swapped( +; CHECK-SAME: i1 [[C:%.*]]) { +; CHECK-NEXT: start: +; CHECK-NEXT:[[C_FR:%.*]] = freeze i1 [[C]] +; CHECK-NEXT:br i1 [[C_FR]], label [[START_SPLIT_US:%.*]], label [[START_SPLIT:%.*]] +; CHECK: start.split.us: +; CHECK-NEXT:br label [[LOOP_US:%.*]] +; CHECK: loop.us: +; CHECK-NEXT:call void @foo() +; CHECK-NEXT:br label [[LOOP_US]] +; CHECK: start.split: +; CHECK-NEXT:br label [[LOOP:%.*]] +; CHECK: loop: +; CHECK-NEXT:call void @foo() +; CHECK-NEXT:br label [[EXIT:%.*]] +; CHECK: exit: +; CHECK-NEXT:ret void +; +start: + br label %loop + +loop: + %fn = load ptr, ptr @vtable, align 8 + call void %fn() + br i1 %c, label %loop, label %exit + +exit: + ret void +} + +define void @test2_swapped(i1 %c, ptr %p) { +; CHECK-LABEL: define void @test2_swapped( +; CHECK-SAME: i1 [[C:%.*]], ptr [[P:%.*]]) { +; CHECK-NEXT:[[C_FR:%.*]] = freeze i1 [[C]] +; CHECK-NEXT:br i1 [[C_FR]], label [[DOTSPLIT_US:%.*]], label [[DOTSPLIT:%.*]] +; CHECK: .split.us: +; CHECK-NEXT:br label [[LOOP_US:%.*]] +; CHECK: loop.us: +; CHECK-NEXT:call void @foo() +; CHECK-NEXT:call void @bar() +; CHECK-NEXT:br label [[LOOP_US]] +; CHECK: .split: +; CHECK-NEXT:br label [[LOOP:%.*]] +; CHECK: loop: +; CHECK-NEXT:call void @foo() +; CHECK-NEXT:call void @bar() +; CHECK-NEXT:br label [[EXIT:%.*]] +; CHECK: exit: +; CHECK-NEXT:ret void +; + br label %loop + +loop: + %fn = load ptr, ptr @vtable, align 8 + call void %fn() + call void @bar() + br i1 %c, label %loop, label %exit + +exit: + ret void +} + +define void @test3_swapped(i1 %c, ptr %p) { +; CHECK-LABEL: define void @test3_swapped( +; CHECK-SAME: i1 [[C:%.*]], ptr [[P:%.*]]) { +; CHECK-NEXT:[[C_FR:%.*]] = freeze i1 [[C]] +; CHECK-NEXT:br i1 [[C_FR]], label [[DOTSPLIT_US:%.*]], label [[DOTSPLIT:%.*]] +; CHECK: .split.us: +; CHECK-NEXT:br label [[LOOP_US:%.*]] +; CHECK: loop.us: +; CHECK-NEXT:br label [[SPLIT_US:%.*]] +; CHECK:
[llvm-branch-commits] [llvm] 43db795 - [MSSAUpdater] Handle simplified accesses when updating phis (#78272)
Author: Nikita Popov Date: 2024-02-05T11:23:33-08:00 New Revision: 43db795259d91ddb3b12596e8aec3dddbd1fb583 URL: https://github.com/llvm/llvm-project/commit/43db795259d91ddb3b12596e8aec3dddbd1fb583 DIFF: https://github.com/llvm/llvm-project/commit/43db795259d91ddb3b12596e8aec3dddbd1fb583.diff LOG: [MSSAUpdater] Handle simplified accesses when updating phis (#78272) This is a followup to #76819. After those changes, we can still run into an assertion failure for a slight variation of the test case: When fixing up MemoryPhis, we map the incoming access to the access of the cloned instruction -- which may now no longer exist. Fix this by reusing the getNewDefiningAccessForClone() helper, which will look upwards for a new defining access in that case. (cherry picked from commit a7a1b8b17e264fb0f2d2b4165cf9a7f5094b08b3) Added: Modified: llvm/lib/Analysis/MemorySSAUpdater.cpp llvm/test/Transforms/SimpleLoopUnswitch/memssa-readnone-access.ll Removed: diff --git a/llvm/lib/Analysis/MemorySSAUpdater.cpp b/llvm/lib/Analysis/MemorySSAUpdater.cpp index e87ae7d71fffe..aa550f0b6a7bf 100644 --- a/llvm/lib/Analysis/MemorySSAUpdater.cpp +++ b/llvm/lib/Analysis/MemorySSAUpdater.cpp @@ -692,25 +692,9 @@ void MemorySSAUpdater::updateForClonedLoop(const LoopBlocksRPO &LoopBlocks, continue; // Determine incoming value and add it as incoming from IncBB. 
- if (MemoryUseOrDef *IncMUD = dyn_cast(IncomingAccess)) { -if (!MSSA->isLiveOnEntryDef(IncMUD)) { - Instruction *IncI = IncMUD->getMemoryInst(); - assert(IncI && "Found MemoryUseOrDef with no Instruction."); - if (Instruction *NewIncI = - cast_or_null(VMap.lookup(IncI))) { -IncMUD = MSSA->getMemoryAccess(NewIncI); -assert(IncMUD && - "MemoryUseOrDef cannot be null, all preds processed."); - } -} -NewPhi->addIncoming(IncMUD, IncBB); - } else { -MemoryPhi *IncPhi = cast(IncomingAccess); -if (MemoryAccess *NewDefPhi = MPhiMap.lookup(IncPhi)) - NewPhi->addIncoming(NewDefPhi, IncBB); -else - NewPhi->addIncoming(IncPhi, IncBB); - } + NewPhi->addIncoming( + getNewDefiningAccessForClone(IncomingAccess, VMap, MPhiMap, MSSA), + IncBB); } if (auto *SingleAccess = onlySingleValue(NewPhi)) { MPhiMap[Phi] = SingleAccess; diff --git a/llvm/test/Transforms/SimpleLoopUnswitch/memssa-readnone-access.ll b/llvm/test/Transforms/SimpleLoopUnswitch/memssa-readnone-access.ll index 2aaf777683e11..c6e6608d4be38 100644 --- a/llvm/test/Transforms/SimpleLoopUnswitch/memssa-readnone-access.ll +++ b/llvm/test/Transforms/SimpleLoopUnswitch/memssa-readnone-access.ll @@ -115,3 +115,107 @@ split: exit: ret void } + +; Variants of the above test with swapped branch destinations. 
+ +define void @test1_swapped(i1 %c) { +; CHECK-LABEL: define void @test1_swapped( +; CHECK-SAME: i1 [[C:%.*]]) { +; CHECK-NEXT: start: +; CHECK-NEXT:[[C_FR:%.*]] = freeze i1 [[C]] +; CHECK-NEXT:br i1 [[C_FR]], label [[START_SPLIT_US:%.*]], label [[START_SPLIT:%.*]] +; CHECK: start.split.us: +; CHECK-NEXT:br label [[LOOP_US:%.*]] +; CHECK: loop.us: +; CHECK-NEXT:call void @foo() +; CHECK-NEXT:br label [[LOOP_US]] +; CHECK: start.split: +; CHECK-NEXT:br label [[LOOP:%.*]] +; CHECK: loop: +; CHECK-NEXT:call void @foo() +; CHECK-NEXT:br label [[EXIT:%.*]] +; CHECK: exit: +; CHECK-NEXT:ret void +; +start: + br label %loop + +loop: + %fn = load ptr, ptr @vtable, align 8 + call void %fn() + br i1 %c, label %loop, label %exit + +exit: + ret void +} + +define void @test2_swapped(i1 %c, ptr %p) { +; CHECK-LABEL: define void @test2_swapped( +; CHECK-SAME: i1 [[C:%.*]], ptr [[P:%.*]]) { +; CHECK-NEXT:[[C_FR:%.*]] = freeze i1 [[C]] +; CHECK-NEXT:br i1 [[C_FR]], label [[DOTSPLIT_US:%.*]], label [[DOTSPLIT:%.*]] +; CHECK: .split.us: +; CHECK-NEXT:br label [[LOOP_US:%.*]] +; CHECK: loop.us: +; CHECK-NEXT:call void @foo() +; CHECK-NEXT:call void @bar() +; CHECK-NEXT:br label [[LOOP_US]] +; CHECK: .split: +; CHECK-NEXT:br label [[LOOP:%.*]] +; CHECK: loop: +; CHECK-NEXT:call void @foo() +; CHECK-NEXT:call void @bar() +; CHECK-NEXT:br label [[EXIT:%.*]] +; CHECK: exit: +; CHECK-NEXT:ret void +; + br label %loop + +loop: + %fn = load ptr, ptr @vtable, align 8 + call void %fn() + call void @bar() + br i1 %c, label %loop, label %exit + +exit: + ret void +} + +define void @test3_swapped(i1 %c, ptr %p) { +; CHECK-LABEL: define void @test3_swapped( +; CHECK-SAME: i1 [[C:%.*]], ptr [[P:%.*]]) { +; CHECK-NEXT:[[C_FR:%.*]] = freeze i1 [[C]] +; CHECK-NEXT:br i1 [[C_FR]], label [[DOTSPLIT_US:%.*]], label [[DOTSPLIT:%.*]] +; CHECK: .split.us: +; CHEC
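The "look upwards for a new defining access" behaviour described in the log message above can be illustrated with a toy model. This is only a sketch under stated assumptions: `Access`, `CloneMap`, `PhiMap` and `newDefiningAccessForClone` are hypothetical stand-ins written for this note, not the MemorySSA classes or the real `getNewDefiningAccessForClone()` helper, and the handling of uncloned accesses is a guess at the intent rather than a copy of the LLVM code.

```cpp
#include <cassert>
#include <unordered_map>

// Toy stand-in for a memory access; NOT the LLVM MemoryAccess hierarchy.
struct Access {
  enum Kind { LiveOnEntry, Def, Use, Phi } kind;
  Access *defining; // access that defines this one (null for LiveOnEntry/Phi)
};

// Hypothetical maps for the sketch:
//  - CloneMap: original access -> access of the cloned instruction, or
//    nullptr when the clone was simplified and no longer has an access.
//  - PhiMap: original phi -> phi created for the cloned block.
using CloneMap = std::unordered_map<const Access *, Access *>;
using PhiMap = std::unordered_map<const Access *, Access *>;

// Returns the access that a cloned phi should use in place of A.
Access *newDefiningAccessForClone(Access *A, const CloneMap &clones,
                                  const PhiMap &phis) {
  assert(A && "expected a non-null access");
  if (A->kind == Access::Phi) {
    auto it = phis.find(A);
    return it != phis.end() ? it->second : A;
  }
  if (A->kind == Access::LiveOnEntry)
    return A;
  auto it = clones.find(A);
  if (it == clones.end())
    return A; // instruction was not cloned: keep the original access
  Access *cloned = it->second;
  if (cloned && cloned->kind == Access::Def)
    return cloned; // the clone still defines memory: use it directly
  // The clone was simplified (no access, or only a use): walk up the
  // original defining chain instead of asserting.
  return newDefiningAccessForClone(A->defining, clones, phis);
}
```

In this model, if the clone of the incoming definition has disappeared, the phi ends up pointing at the clone of the next definition further up the chain, which is the situation the swapped-branch tests are meant to exercise.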
[llvm-branch-commits] [llvm] PR for llvm/llvm-project#79571 (PR #79572)
https://github.com/tstellar closed https://github.com/llvm/llvm-project/pull/79572
[llvm-branch-commits] [llvm] PR for llvm/llvm-project#80699 (PR #80729)
https://github.com/ZequanWu approved this pull request. https://github.com/llvm/llvm-project/pull/80729
[llvm-branch-commits] [llvm] PR for llvm/llvm-project#77871 (PR #80513)
https://github.com/llvmbot updated https://github.com/llvm/llvm-project/pull/80513 >From a6817b7315af5da94cfbe69767c8e8f827fecbca Mon Sep 17 00:00:00 2001 From: NAKAMURA Takumi Date: Fri, 2 Feb 2024 18:37:10 +0900 Subject: [PATCH 1/2] CoverageMappingWriter: Emit `Decision` before `Expansion` (#78966) To relax scanning record, tweak order by `Decision < Expansion`, or `Expansion` could not be distinguished whether it belonged to `Decision` or not. Relevant to #77871 (cherry picked from commit 438fe1db09b0c20708ea1020519d8073c37feae8) --- .../Coverage/CoverageMappingWriter.cpp| 10 +- .../ProfileData/CoverageMappingTest.cpp | 36 +++ 2 files changed, 45 insertions(+), 1 deletion(-) diff --git a/llvm/lib/ProfileData/Coverage/CoverageMappingWriter.cpp b/llvm/lib/ProfileData/Coverage/CoverageMappingWriter.cpp index 1c7d8a8909c48..27727f216b051 100644 --- a/llvm/lib/ProfileData/Coverage/CoverageMappingWriter.cpp +++ b/llvm/lib/ProfileData/Coverage/CoverageMappingWriter.cpp @@ -167,7 +167,15 @@ void CoverageMappingWriter::write(raw_ostream &OS) { return LHS.FileID < RHS.FileID; if (LHS.startLoc() != RHS.startLoc()) return LHS.startLoc() < RHS.startLoc(); -return LHS.Kind < RHS.Kind; + +// Put `Decision` before `Expansion`. +auto getKindKey = [](CounterMappingRegion::RegionKind Kind) { + return (Kind == CounterMappingRegion::MCDCDecisionRegion + ? 2 * CounterMappingRegion::ExpansionRegion - 1 + : 2 * Kind); +}; + +return getKindKey(LHS.Kind) < getKindKey(RHS.Kind); }); // Write out the fileid -> filename mapping. 
diff --git a/llvm/unittests/ProfileData/CoverageMappingTest.cpp b/llvm/unittests/ProfileData/CoverageMappingTest.cpp index 23f66a0232ddb..2849781a9dc43 100644 --- a/llvm/unittests/ProfileData/CoverageMappingTest.cpp +++ b/llvm/unittests/ProfileData/CoverageMappingTest.cpp @@ -890,6 +890,42 @@ TEST_P(CoverageMappingTest, non_code_region_bitmask) { ASSERT_EQ(1U, Names.size()); } +// Test the order of MCDCDecision before Expansion +TEST_P(CoverageMappingTest, decision_before_expansion) { + startFunction("foo", 0x1234); + addCMR(Counter::getCounter(0), "foo", 3, 23, 5, 2); + + // This(4:11) was put after Expansion(4:11) before the fix + addMCDCDecisionCMR(0, 2, "foo", 4, 11, 4, 20); + + addExpansionCMR("foo", "A", 4, 11, 4, 12); + addExpansionCMR("foo", "B", 4, 19, 4, 20); + addCMR(Counter::getCounter(0), "A", 1, 14, 1, 17); + addCMR(Counter::getCounter(0), "A", 1, 14, 1, 17); + addMCDCBranchCMR(Counter::getCounter(0), Counter::getCounter(1), 1, 2, 0, "A", + 1, 14, 1, 17); + addCMR(Counter::getCounter(1), "B", 1, 14, 1, 17); + addMCDCBranchCMR(Counter::getCounter(1), Counter::getCounter(2), 2, 0, 0, "B", + 1, 14, 1, 17); + + // InputFunctionCoverageData::Regions is rewritten after the write. 
+ auto InputRegions = InputFunctions.back().Regions; + + writeAndReadCoverageRegions(); + + const auto &OutputRegions = OutputFunctions.back().Regions; + + size_t N = ArrayRef(InputRegions).size(); + ASSERT_EQ(N, OutputRegions.size()); + for (size_t I = 0; I < N; ++I) { +ASSERT_EQ(InputRegions[I].Kind, OutputRegions[I].Kind); +ASSERT_EQ(InputRegions[I].FileID, OutputRegions[I].FileID); +ASSERT_EQ(InputRegions[I].ExpandedFileID, OutputRegions[I].ExpandedFileID); +ASSERT_EQ(InputRegions[I].startLoc(), OutputRegions[I].startLoc()); +ASSERT_EQ(InputRegions[I].endLoc(), OutputRegions[I].endLoc()); + } +} + TEST_P(CoverageMappingTest, strip_filename_prefix) { ProfileWriter.addRecord({"file1:func", 0x1234, {0}}, Err); >From b50a84e303378df35996d7330aa80aa4ea1f497a Mon Sep 17 00:00:00 2001 From: NAKAMURA Takumi Date: Fri, 2 Feb 2024 20:34:12 +0900 Subject: [PATCH 2/2] [Coverage] Let `Decision` take account of expansions (#78969) The current implementation (D138849) assumes `Branch`(es) would follow after the corresponding `Decision`. It is not true if `Branch`(es) are forwarded to expanded file ID. As a result, consecutive `Decision`(s) would be confused with insufficient number of `Branch`(es). `Expansion` will point `Branch`(es) in other file IDs if `Expansion` is included in the range of `Decision`. Fixes #77871 - Co-authored-by: Alan Phipps (cherry picked from commit d912f1f0cb49465b08f82fae89ece222404e5640) --- .../ProfileData/Coverage/CoverageMapping.cpp | 240 ++ llvm/test/tools/llvm-cov/Inputs/mcdc-macro.c | 20 ++ llvm/test/tools/llvm-cov/Inputs/mcdc-macro.o | Bin 0 -> 6424 bytes .../tools/llvm-cov/Inputs/mcdc-macro.proftext | 62 + llvm/test/tools/llvm-cov/mcdc-macro.test | 99 5 files changed, 378 insertions(+), 43 deletions(-) create mode 100644 llvm/test/tools/llvm-cov/Inputs/mcdc-macro.c create mode 100644 llvm/test/tools/llvm-cov/Inputs/mcdc-macro.o create mode 100644 llvm/test/tools/llvm-cov/Inputs/mcdc-macro.proftext create mode 10064
[llvm-branch-commits] [llvm] a6817b7 - CoverageMappingWriter: Emit `Decision` before `Expansion` (#78966)
Author: NAKAMURA Takumi Date: 2024-02-05T11:32:50-08:00 New Revision: a6817b7315af5da94cfbe69767c8e8f827fecbca URL: https://github.com/llvm/llvm-project/commit/a6817b7315af5da94cfbe69767c8e8f827fecbca DIFF: https://github.com/llvm/llvm-project/commit/a6817b7315af5da94cfbe69767c8e8f827fecbca.diff LOG: CoverageMappingWriter: Emit `Decision` before `Expansion` (#78966) To relax scanning record, tweak order by `Decision < Expansion`, or `Expansion` could not be distinguished whether it belonged to `Decision` or not. Relevant to #77871 (cherry picked from commit 438fe1db09b0c20708ea1020519d8073c37feae8) Added: Modified: llvm/lib/ProfileData/Coverage/CoverageMappingWriter.cpp llvm/unittests/ProfileData/CoverageMappingTest.cpp Removed: diff --git a/llvm/lib/ProfileData/Coverage/CoverageMappingWriter.cpp b/llvm/lib/ProfileData/Coverage/CoverageMappingWriter.cpp index 1c7d8a8909c48..27727f216b051 100644 --- a/llvm/lib/ProfileData/Coverage/CoverageMappingWriter.cpp +++ b/llvm/lib/ProfileData/Coverage/CoverageMappingWriter.cpp @@ -167,7 +167,15 @@ void CoverageMappingWriter::write(raw_ostream &OS) { return LHS.FileID < RHS.FileID; if (LHS.startLoc() != RHS.startLoc()) return LHS.startLoc() < RHS.startLoc(); -return LHS.Kind < RHS.Kind; + +// Put `Decision` before `Expansion`. +auto getKindKey = [](CounterMappingRegion::RegionKind Kind) { + return (Kind == CounterMappingRegion::MCDCDecisionRegion + ? 2 * CounterMappingRegion::ExpansionRegion - 1 + : 2 * Kind); +}; + +return getKindKey(LHS.Kind) < getKindKey(RHS.Kind); }); // Write out the fileid -> filename mapping. 
diff --git a/llvm/unittests/ProfileData/CoverageMappingTest.cpp b/llvm/unittests/ProfileData/CoverageMappingTest.cpp index 23f66a0232ddb..2849781a9dc43 100644 --- a/llvm/unittests/ProfileData/CoverageMappingTest.cpp +++ b/llvm/unittests/ProfileData/CoverageMappingTest.cpp @@ -890,6 +890,42 @@ TEST_P(CoverageMappingTest, non_code_region_bitmask) { ASSERT_EQ(1U, Names.size()); } +// Test the order of MCDCDecision before Expansion +TEST_P(CoverageMappingTest, decision_before_expansion) { + startFunction("foo", 0x1234); + addCMR(Counter::getCounter(0), "foo", 3, 23, 5, 2); + + // This(4:11) was put after Expansion(4:11) before the fix + addMCDCDecisionCMR(0, 2, "foo", 4, 11, 4, 20); + + addExpansionCMR("foo", "A", 4, 11, 4, 12); + addExpansionCMR("foo", "B", 4, 19, 4, 20); + addCMR(Counter::getCounter(0), "A", 1, 14, 1, 17); + addCMR(Counter::getCounter(0), "A", 1, 14, 1, 17); + addMCDCBranchCMR(Counter::getCounter(0), Counter::getCounter(1), 1, 2, 0, "A", + 1, 14, 1, 17); + addCMR(Counter::getCounter(1), "B", 1, 14, 1, 17); + addMCDCBranchCMR(Counter::getCounter(1), Counter::getCounter(2), 2, 0, 0, "B", + 1, 14, 1, 17); + + // InputFunctionCoverageData::Regions is rewritten after the write. 
+ auto InputRegions = InputFunctions.back().Regions; + + writeAndReadCoverageRegions(); + + const auto &OutputRegions = OutputFunctions.back().Regions; + + size_t N = ArrayRef(InputRegions).size(); + ASSERT_EQ(N, OutputRegions.size()); + for (size_t I = 0; I < N; ++I) { +ASSERT_EQ(InputRegions[I].Kind, OutputRegions[I].Kind); +ASSERT_EQ(InputRegions[I].FileID, OutputRegions[I].FileID); +ASSERT_EQ(InputRegions[I].ExpandedFileID, OutputRegions[I].ExpandedFileID); +ASSERT_EQ(InputRegions[I].startLoc(), OutputRegions[I].startLoc()); +ASSERT_EQ(InputRegions[I].endLoc(), OutputRegions[I].endLoc()); + } +} + TEST_P(CoverageMappingTest, strip_filename_prefix) { ProfileWriter.addRecord({"file1:func", 0x1234, {0}}, Err); ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
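The effect of the `getKindKey` tweak in the diff above can be checked in isolation. The sketch below assumes the enumerator order `CodeRegion = 0, ExpansionRegion = 1, ...` for `RegionKind`; that order, the simplified `Region` struct and the `sortRegions` helper are assumptions made for illustration and are not taken from the patch. Doubling every key leaves an odd slot free just below `Expansion`, and `Decision` is moved into it.

```cpp
#include <algorithm>
#include <vector>

// Assumed to mirror the order of CounterMappingRegion::RegionKind.
enum RegionKind {
  CodeRegion,
  ExpansionRegion,
  SkippedRegion,
  GapRegion,
  BranchRegion,
  MCDCDecisionRegion,
  MCDCBranchRegion
};

// Same arithmetic as the lambda in the patch: every kind gets an even key,
// except Decision, which gets the odd key just below Expansion.
constexpr int getKindKey(RegionKind Kind) {
  return Kind == MCDCDecisionRegion ? 2 * ExpansionRegion - 1 : 2 * Kind;
}

// Simplified region: only the fields the comparator looks at.
struct Region {
  unsigned FileID;
  unsigned Line, Col; // start location
  RegionKind Kind;
};

// Hypothetical helper reproducing the sort order: file, start, kind key.
void sortRegions(std::vector<Region> &Regions) {
  std::stable_sort(Regions.begin(), Regions.end(),
                   [](const Region &LHS, const Region &RHS) {
                     if (LHS.FileID != RHS.FileID)
                       return LHS.FileID < RHS.FileID;
                     if (LHS.Line != RHS.Line)
                       return LHS.Line < RHS.Line;
                     if (LHS.Col != RHS.Col)
                       return LHS.Col < RHS.Col;
                     return getKindKey(LHS.Kind) < getKindKey(RHS.Kind);
                   });
}
```

Under that assumed enumerator order the keys come out as Code = 0, Decision = 1, Expansion = 2, so a `Decision` and an `Expansion` that both start at 4:11 are emitted in that order, as the unit test expects.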
[llvm-branch-commits] [llvm] b50a84e - [Coverage] Let `Decision` take account of expansions (#78969)
Author: NAKAMURA Takumi Date: 2024-02-05T11:32:51-08:00 New Revision: b50a84e303378df35996d7330aa80aa4ea1f497a URL: https://github.com/llvm/llvm-project/commit/b50a84e303378df35996d7330aa80aa4ea1f497a DIFF: https://github.com/llvm/llvm-project/commit/b50a84e303378df35996d7330aa80aa4ea1f497a.diff LOG: [Coverage] Let `Decision` take account of expansions (#78969) The current implementation (D138849) assumes `Branch`(es) would follow after the corresponding `Decision`. It is not true if `Branch`(es) are forwarded to expanded file ID. As a result, consecutive `Decision`(s) would be confused with insufficient number of `Branch`(es). `Expansion` will point `Branch`(es) in other file IDs if `Expansion` is included in the range of `Decision`. Fixes #77871 - Co-authored-by: Alan Phipps (cherry picked from commit d912f1f0cb49465b08f82fae89ece222404e5640) Added: llvm/test/tools/llvm-cov/Inputs/mcdc-macro.c llvm/test/tools/llvm-cov/Inputs/mcdc-macro.o llvm/test/tools/llvm-cov/Inputs/mcdc-macro.proftext llvm/test/tools/llvm-cov/mcdc-macro.test Modified: llvm/lib/ProfileData/Coverage/CoverageMapping.cpp Removed: diff --git a/llvm/lib/ProfileData/Coverage/CoverageMapping.cpp b/llvm/lib/ProfileData/Coverage/CoverageMapping.cpp index da8e1d87319dd..a357b4cb49211 100644 --- a/llvm/lib/ProfileData/Coverage/CoverageMapping.cpp +++ b/llvm/lib/ProfileData/Coverage/CoverageMapping.cpp @@ -14,6 +14,7 @@ #include "llvm/ProfileData/Coverage/CoverageMapping.h" #include "llvm/ADT/ArrayRef.h" #include "llvm/ADT/DenseMap.h" +#include "llvm/ADT/STLExtras.h" #include "llvm/ADT/SmallBitVector.h" #include "llvm/ADT/SmallVector.h" #include "llvm/ADT/StringExtras.h" @@ -583,6 +584,160 @@ static unsigned getMaxBitmapSize(const CounterMappingContext &Ctx, return MaxBitmapID + (SizeInBits / CHAR_BIT); } +namespace { + +/// Collect Decisions, Branchs, and Expansions and associate them. +class MCDCDecisionRecorder { +private: + /// This holds the DecisionRegion and MCDCBranches under it. 
+ /// Also traverses Expansion(s). + /// The Decision has the number of MCDCBranches and will complete + /// when it is filled with unique ConditionID of MCDCBranches. + struct DecisionRecord { +const CounterMappingRegion *DecisionRegion; + +/// They are reflected from DecisionRegion for convenience. +LineColPair DecisionStartLoc; +LineColPair DecisionEndLoc; + +/// This is passed to `MCDCRecordProcessor`, so this should be compatible +/// to`ArrayRef`. +SmallVector MCDCBranches; + +/// IDs that are stored in MCDCBranches +/// Complete when all IDs (1 to NumConditions) are met. +DenseSet ConditionIDs; + +/// Set of IDs of Expansion(s) that are relevant to DecisionRegion +/// and its children (via expansions). +/// FileID pointed by ExpandedFileID is dedicated to the expansion, so +/// the location in the expansion doesn't matter. +DenseSet ExpandedFileIDs; + +DecisionRecord(const CounterMappingRegion &Decision) +: DecisionRegion(&Decision), DecisionStartLoc(Decision.startLoc()), + DecisionEndLoc(Decision.endLoc()) { + assert(Decision.Kind == CounterMappingRegion::MCDCDecisionRegion); +} + +/// Determine whether DecisionRecord dominates `R`. +bool dominates(const CounterMappingRegion &R) const { + // Determine whether `R` is included in `DecisionRegion`. + if (R.FileID == DecisionRegion->FileID && + R.startLoc() >= DecisionStartLoc && R.endLoc() <= DecisionEndLoc) +return true; + + // Determine whether `R` is pointed by any of Expansions. 
+ return ExpandedFileIDs.contains(R.FileID); +} + +enum Result { + NotProcessed = 0, /// Irrelevant to this Decision + Processed,/// Added to this Decision + Completed,/// Added and filled this Decision +}; + +/// Add Branch into the Decision +/// \param Branch expects MCDCBranchRegion +/// \returns NotProcessed/Processed/Completed +Result addBranch(const CounterMappingRegion &Branch) { + assert(Branch.Kind == CounterMappingRegion::MCDCBranchRegion); + + auto ConditionID = Branch.MCDCParams.ID; + assert(ConditionID > 0 && "ConditionID should begin with 1"); + + if (ConditionIDs.contains(ConditionID) || + ConditionID > DecisionRegion->MCDCParams.NumConditions) +return NotProcessed; + + if (!this->dominates(Branch)) +return NotProcessed; + + assert(MCDCBranches.size() < DecisionRegion->MCDCParams.NumConditions); + + // Put `ID=1` in front of `MCDCBranches` for convenience + // even if `MCDCBranches` is not topological. + if (ConditionID == 1) +MCDCBranches.insert(MCDCBranches.begin(), &Branch); + else +MCDCBranches.push_back(&Branch); + + // Mark `ID` as `assigned`. + ConditionIDs.insert(Con
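The quoted diff is cut off inside `addBranch`, so the following is a simplified, standalone sketch of the bookkeeping it describes rather than the patch itself. `Region`, the plain `std::set` containers and the final "completed when all conditions are present" check are assumptions inferred from the comments in the diff (the `Result` enum and the "complete when it is filled" note), not code copied from LLVM.

```cpp
#include <cassert>
#include <set>
#include <utility>
#include <vector>

using LineCol = std::pair<unsigned, unsigned>;

// Simplified region; ConditionID is 1-based and only used for branches.
struct Region {
  unsigned FileID;
  LineCol Start, End;
  unsigned ConditionID;
};

struct DecisionRecord {
  Region Decision;
  unsigned NumConditions;
  std::vector<const Region *> Branches; // ID 1 is kept at the front
  std::set<unsigned> ConditionIDs;      // IDs already assigned
  std::set<unsigned> ExpandedFileIDs;   // expansions under this decision

  enum Result { NotProcessed, Processed, Completed };

  // R belongs to this decision if it lies inside the decision's range in
  // the same file, or lives in a file reached through a recorded expansion.
  bool dominates(const Region &R) const {
    if (R.FileID == Decision.FileID && R.Start >= Decision.Start &&
        R.End <= Decision.End)
      return true;
    return ExpandedFileIDs.count(R.FileID) != 0;
  }

  Result addBranch(const Region &Branch) {
    assert(Branch.ConditionID > 0 && "ConditionID should begin with 1");
    if (ConditionIDs.count(Branch.ConditionID) ||
        Branch.ConditionID > NumConditions)
      return NotProcessed;
    if (!dominates(Branch))
      return NotProcessed;
    if (Branch.ConditionID == 1)
      Branches.insert(Branches.begin(), &Branch);
    else
      Branches.push_back(&Branch);
    ConditionIDs.insert(Branch.ConditionID);
    // Assumed completion rule: done once every condition ID is present.
    return Branches.size() == NumConditions ? Completed : Processed;
  }
};
```

A branch in an unrelated file, or one whose ID was already seen, is reported as `NotProcessed`, which would let a caller offer the same branch to the next pending decision.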
[llvm-branch-commits] [llvm] PR for llvm/llvm-project#77871 (PR #80513)
https://github.com/tstellar closed https://github.com/llvm/llvm-project/pull/80513
[llvm-branch-commits] [flang] [flang][OpenMP] Add support for copyprivate (PR #80485)
@@ -1092,6 +1040,79 @@ class FirConverter : public Fortran::lower::AbstractConverter { return true; } + void copyVar(const Fortran::semantics::Symbol &sym, + const Fortran::lower::SymbolBox &lhs_sb, + const Fortran::lower::SymbolBox &rhs_sb) { +mlir::Location loc = genLocation(sym.name()); +if (lowerToHighLevelFIR()) + copyVarHLFIR(loc, lhs_sb.getAddr(), rhs_sb.getAddr()); +else + copyVarFIR(loc, sym, lhs_sb, rhs_sb); + } + + void copyVarHLFIR(mlir::Location loc, mlir::Value dst, mlir::Value src) { +assert(lowerToHighLevelFIR()); +hlfir::Entity lhs{dst}; +hlfir::Entity rhs{src}; +// Temporary_lhs is set to true in hlfir.assign below to avoid user +// assignment to be used and finalization to be called on the LHS. +// This may or may not be correct but mimics the current behaviour +// without HLFIR. +auto copyData = [&](hlfir::Entity l, hlfir::Entity r) { + // Dereference RHS and load it if trivial scalar. + r = hlfir::loadTrivialScalar(loc, *builder, r); + builder->create( + loc, r, l, + /*isWholeAllocatableAssignment=*/false, + /*keepLhsLengthInAllocatableAssignment=*/false, + /*temporary_lhs=*/true); +}; +if (lhs.isAllocatable()) { + // Deep copy allocatable if it is allocated. + // Note that when allocated, the RHS is already allocated with the LHS + // shape for copy on entry in createHostAssociateVarClone. + // For lastprivate, this assumes that the RHS was not reallocated in + // the OpenMP region. + lhs = hlfir::derefPointersAndAllocatables(loc, *builder, lhs); + mlir::Value addr = hlfir::genVariableRawAddress(loc, *builder, lhs); + mlir::Value isAllocated = builder->genIsNotNullAddr(loc, addr); + builder->genIfThen(loc, isAllocated) + .genThen([&]() { +// Copy the DATA, not the descriptors. +copyData(lhs, rhs); + }) + .end(); +} else if (lhs.isPointer()) { + // Set LHS target to the target of RHS (do not copy the RHS + // target data into the LHS target storage). 
+ auto loadVal = builder->create(loc, rhs); + builder->create(loc, loadVal, lhs); +} else { + // Non ALLOCATABLE/POINTER variable. Simple DATA copy. + copyData(lhs, rhs); +} + } + + void copyVarFIR(mlir::Location loc, const Fortran::semantics::Symbol &sym, + const Fortran::lower::SymbolBox &lhs_sb, + const Fortran::lower::SymbolBox &rhs_sb) { +assert(!lowerToHighLevelFIR()); +fir::ExtendedValue lhs = symBoxToExtendedValue(lhs_sb); +fir::ExtendedValue rhs = symBoxToExtendedValue(rhs_sb); +mlir::Type symType = genType(sym); +if (auto seqTy = symType.dyn_cast()) { + Fortran::lower::StatementContext stmtCtx; + Fortran::lower::createSomeArrayAssignment(*this, lhs, rhs, localSymbols, +stmtCtx); + stmtCtx.finalizeAndReset(); +} else if (lhs.getBoxOf()) { + fir::factory::CharacterExprHelper{*builder, loc}.createAssign(lhs, rhs); +} else { + auto loadVal = builder->create(loc, fir::getBase(rhs)); + builder->create(loc, loadVal, fir::getBase(lhs)); +} + } + luporl wrote: I guess it would be possible to move this to OpenMP.cpp, but this would mean duplicating around 40 lines of code. The `copyVarHLFIR()` code was extracted from `copyHostAssociateVar()`, that now calls `copyVar()` instead. Can we keep it in the converter to avoid code duplication? https://github.com/llvm/llvm-project/pull/80485 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [compiler-rt] PR for llvm/llvm-project#80543 (PR #80544)
https://github.com/llvmbot updated https://github.com/llvm/llvm-project/pull/80544 >From 7a5cba8bea8f774d48db1b0426bcc102edd2b69f Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Martin=20Storsj=C3=B6?= Date: Sat, 3 Feb 2024 14:52:49 +0100 Subject: [PATCH] [compiler-rt] Remove duplicate MS names for chkstk symbols (#80450) Prior to 885d7b759b5c166c07c07f4c58c6e0ba110fb0c2, the builtins library contained two chkstk implementations for each of i386 and x86_64, one that was used in mingw environments, and one unused (with a symbol name not matching anything that is used anywhere). Some of the functions additionally had other, also unused, aliases. After cleaning this up in 885d7b759b5c166c07c07f4c58c6e0ba110fb0c2, the unused symbol names were removed. At the same time, symbol aliases were added for the names as they are used by MSVC; the functions are functionally equivalent, but have different names between mingw and MSVC style environments. By adding a symbol alias (so that one object file contains two different symbols for the same function), users can run into problems with duplicate definitions, if they themselves define one of the symbols (for various reasons), but need to link in the other one. This happens for Wine, which provides their own definition of "__chkstk", but when built in mingw mode does need compiler-rt to provide the mingw specific symbol names; see https://github.com/mstorsjo/llvm-mingw/issues/397. To avoid the issue, remove the extra MS style names. They weren't entirely usable as such for MSVC style environments anyway, as compiler-rt builtins don't build these object files at all, when built in MSVC mode; thus, the effort to provide them for MSVC style environments in 885d7b759b5c166c07c07f4c58c6e0ba110fb0c2 was a half-hearted step towards that. 
If we really do want to provide those functions (as an alternative to the ones provided by MSVC itself), we should do it in a separate object file (even if the function implementation is the same), so that users who have a definition of one of them but need a definition of the other, won't have conflicts. Additionally, if we do want to provide them for MSVC, those files actually should be built when building the builtins in MSVC mode as well (see compiler-rt/lib/builtins/CMakeLists.txt). If we do that, there's a risk that an MSVC style build ends up linking in and preferring our implementation over the one provided by MSVC, which would be suboptimal. Our implementation always probes the requested amount of stack, while the MSVC one checks the amount of allocated stack and only probes as much as really is needed. In short - this reverts the situation to what it was in the 17.x release series (except for unused functions that have been removed). (cherry picked from commit 248aeac1ad2cf4f583490dd1312a5b448d2bb8cc) --- compiler-rt/lib/builtins/i386/chkstk.S | 2 -- compiler-rt/lib/builtins/x86_64/chkstk.S | 2 -- 2 files changed, 4 deletions(-) diff --git a/compiler-rt/lib/builtins/i386/chkstk.S b/compiler-rt/lib/builtins/i386/chkstk.S index a84bb0ee30070..cdd9a4c2a5752 100644 --- a/compiler-rt/lib/builtins/i386/chkstk.S +++ b/compiler-rt/lib/builtins/i386/chkstk.S @@ -14,7 +14,6 @@ .text .balign 4 DEFINE_COMPILERRT_FUNCTION(_alloca) // _chkstk and _alloca are the same function -DEFINE_COMPILERRT_FUNCTION(_chkstk) push %ecx cmp$0x1000,%eax lea8(%esp),%ecx // esp before calling this routine -> ecx @@ -35,7 +34,6 @@ DEFINE_COMPILERRT_FUNCTION(_chkstk) push (%eax) // push return address onto the stack sub%esp,%eax// restore the original value in eax ret -END_COMPILERRT_FUNCTION(_chkstk) END_COMPILERRT_FUNCTION(_alloca) #endif // __i386__ diff --git a/compiler-rt/lib/builtins/x86_64/chkstk.S b/compiler-rt/lib/builtins/x86_64/chkstk.S index 494ee261193bc..ad7953a116ac7 100644 
--- a/compiler-rt/lib/builtins/x86_64/chkstk.S +++ b/compiler-rt/lib/builtins/x86_64/chkstk.S @@ -18,7 +18,6 @@ .text .balign 4 DEFINE_COMPILERRT_FUNCTION(___chkstk_ms) -DEFINE_COMPILERRT_FUNCTION(__chkstk) push %rcx push %rax cmp$0x1000,%rax @@ -36,7 +35,6 @@ DEFINE_COMPILERRT_FUNCTION(__chkstk) pop%rax pop%rcx ret -END_COMPILERRT_FUNCTION(__chkstk) END_COMPILERRT_FUNCTION(___chkstk_ms) #endif // __x86_64__ ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [compiler-rt] 7a5cba8 - [compiler-rt] Remove duplicate MS names for chkstk symbols (#80450)
Author: Martin Storsjö Date: 2024-02-05T11:39:38-08:00 New Revision: 7a5cba8bea8f774d48db1b0426bcc102edd2b69f URL: https://github.com/llvm/llvm-project/commit/7a5cba8bea8f774d48db1b0426bcc102edd2b69f DIFF: https://github.com/llvm/llvm-project/commit/7a5cba8bea8f774d48db1b0426bcc102edd2b69f.diff LOG: [compiler-rt] Remove duplicate MS names for chkstk symbols (#80450) Prior to 885d7b759b5c166c07c07f4c58c6e0ba110fb0c2, the builtins library contained two chkstk implementations for each of i386 and x86_64, one that was used in mingw environments, and one unused (with a symbol name not matching anything that is used anywhere). Some of the functions additionally had other, also unused, aliases. After cleaning this up in 885d7b759b5c166c07c07f4c58c6e0ba110fb0c2, the unused symbol names were removed. At the same time, symbol aliases were added for the names as they are used by MSVC; the functions are functionally equivalent, but have different names between mingw and MSVC style environments. By adding a symbol alias (so that one object file contains two different symbols for the same function), users can run into problems with duplicate definitions, if they themselves define one of the symbols (for various reasons), but need to link in the other one. This happens for Wine, which provides their own definition of "__chkstk", but when built in mingw mode does need compiler-rt to provide the mingw specific symbol names; see https://github.com/mstorsjo/llvm-mingw/issues/397. To avoid the issue, remove the extra MS style names. They weren't entirely usable as such for MSVC style environments anyway, as compiler-rt builtins don't build these object files at all, when built in MSVC mode; thus, the effort to provide them for MSVC style environments in 885d7b759b5c166c07c07f4c58c6e0ba110fb0c2 was a half-hearted step towards that. 
If we really do want to provide those functions (as an alternative to the ones provided by MSVC itself), we should do it in a separate object file (even if the function implementation is the same), so that users who have a definition of one of them but need a definition of the other, won't have conflicts. Additionally, if we do want to provide them for MSVC, those files actually should be built when building the builtins in MSVC mode as well (see compiler-rt/lib/builtins/CMakeLists.txt). If we do that, there's a risk that an MSVC style build ends up linking in and preferring our implementation over the one provided by MSVC, which would be suboptimal. Our implementation always probes the requested amount of stack, while the MSVC one checks the amount of allocated stack and only probes as much as really is needed. In short - this reverts the situation to what it was in the 17.x release series (except for unused functions that have been removed). (cherry picked from commit 248aeac1ad2cf4f583490dd1312a5b448d2bb8cc) Added: Modified: compiler-rt/lib/builtins/i386/chkstk.S compiler-rt/lib/builtins/x86_64/chkstk.S Removed: diff --git a/compiler-rt/lib/builtins/i386/chkstk.S b/compiler-rt/lib/builtins/i386/chkstk.S index a84bb0ee30070..cdd9a4c2a5752 100644 --- a/compiler-rt/lib/builtins/i386/chkstk.S +++ b/compiler-rt/lib/builtins/i386/chkstk.S @@ -14,7 +14,6 @@ .text .balign 4 DEFINE_COMPILERRT_FUNCTION(_alloca) // _chkstk and _alloca are the same function -DEFINE_COMPILERRT_FUNCTION(_chkstk) push %ecx cmp$0x1000,%eax lea8(%esp),%ecx // esp before calling this routine -> ecx @@ -35,7 +34,6 @@ DEFINE_COMPILERRT_FUNCTION(_chkstk) push (%eax) // push return address onto the stack sub%esp,%eax// restore the original value in eax ret -END_COMPILERRT_FUNCTION(_chkstk) END_COMPILERRT_FUNCTION(_alloca) #endif // __i386__ diff --git a/compiler-rt/lib/builtins/x86_64/chkstk.S b/compiler-rt/lib/builtins/x86_64/chkstk.S index 494ee261193bc..ad7953a116ac7 100644 --- 
a/compiler-rt/lib/builtins/x86_64/chkstk.S +++ b/compiler-rt/lib/builtins/x86_64/chkstk.S @@ -18,7 +18,6 @@ .text .balign 4 DEFINE_COMPILERRT_FUNCTION(___chkstk_ms) -DEFINE_COMPILERRT_FUNCTION(__chkstk) push %rcx push %rax cmp$0x1000,%rax @@ -36,7 +35,6 @@ DEFINE_COMPILERRT_FUNCTION(__chkstk) pop%rax pop%rcx ret -END_COMPILERRT_FUNCTION(__chkstk) END_COMPILERRT_FUNCTION(___chkstk_ms) #endif // __x86_64__ ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [compiler-rt] PR for llvm/llvm-project#80543 (PR #80544)
https://github.com/tstellar closed https://github.com/llvm/llvm-project/pull/80544
[llvm-branch-commits] [llvm] PR for llvm/llvm-project#79175 (PR #80274)
https://github.com/llvmbot updated https://github.com/llvm/llvm-project/pull/80274 >From aa6980841e587eba9c98bf54c51f5414f8a15871 Mon Sep 17 00:00:00 2001 From: Nikita Popov Date: Wed, 24 Jan 2024 12:33:57 +0100 Subject: [PATCH 1/3] [Loads] Use BatchAAResults for available value APIs (NFCI) This allows caching AA queries both within and across the calls, and enables us to use a custom AAQI configuration. (cherry picked from commit 89dae798cc77789a43e9a60173f647dae03a65fe) --- llvm/include/llvm/Analysis/Loads.h | 12 ++-- llvm/lib/Analysis/Lint.cpp | 3 ++- llvm/lib/Analysis/Loads.cpp | 9 - .../InstCombine/InstCombineLoadStoreAlloca.cpp | 3 ++- llvm/lib/Transforms/Scalar/JumpThreading.cpp | 11 ++- 5 files changed, 20 insertions(+), 18 deletions(-) diff --git a/llvm/include/llvm/Analysis/Loads.h b/llvm/include/llvm/Analysis/Loads.h index 2880ed33a34cbc..0926093bba99de 100644 --- a/llvm/include/llvm/Analysis/Loads.h +++ b/llvm/include/llvm/Analysis/Loads.h @@ -18,7 +18,7 @@ namespace llvm { -class AAResults; +class BatchAAResults; class AssumptionCache; class DataLayout; class DominatorTree; @@ -129,11 +129,10 @@ extern cl::opt DefMaxInstsToScan; /// location in memory, as opposed to the value operand of a store. /// /// \returns The found value, or nullptr if no value is found. -Value *FindAvailableLoadedValue(LoadInst *Load, -BasicBlock *ScanBB, +Value *FindAvailableLoadedValue(LoadInst *Load, BasicBlock *ScanBB, BasicBlock::iterator &ScanFrom, unsigned MaxInstsToScan = DefMaxInstsToScan, -AAResults *AA = nullptr, +BatchAAResults *AA = nullptr, bool *IsLoadCSE = nullptr, unsigned *NumScanedInst = nullptr); @@ -141,7 +140,8 @@ Value *FindAvailableLoadedValue(LoadInst *Load, /// FindAvailableLoadedValue() for the case where we are not interested in /// finding the closest clobbering instruction if no available load is found. /// This overload cannot be used to scan across multiple blocks. 
-Value *FindAvailableLoadedValue(LoadInst *Load, AAResults &AA, bool *IsLoadCSE, +Value *FindAvailableLoadedValue(LoadInst *Load, BatchAAResults &AA, +bool *IsLoadCSE, unsigned MaxInstsToScan = DefMaxInstsToScan); /// Scan backwards to see if we have the value of the given pointer available @@ -170,7 +170,7 @@ Value *FindAvailableLoadedValue(LoadInst *Load, AAResults &AA, bool *IsLoadCSE, Value *findAvailablePtrLoadStore(const MemoryLocation &Loc, Type *AccessTy, bool AtLeastAtomic, BasicBlock *ScanBB, BasicBlock::iterator &ScanFrom, - unsigned MaxInstsToScan, AAResults *AA, + unsigned MaxInstsToScan, BatchAAResults *AA, bool *IsLoadCSE, unsigned *NumScanedInst); /// Returns true if a pointer value \p A can be replace with another pointer diff --git a/llvm/lib/Analysis/Lint.cpp b/llvm/lib/Analysis/Lint.cpp index 1ebc593016bc0d..16635097d20afe 100644 --- a/llvm/lib/Analysis/Lint.cpp +++ b/llvm/lib/Analysis/Lint.cpp @@ -657,11 +657,12 @@ Value *Lint::findValueImpl(Value *V, bool OffsetOk, BasicBlock::iterator BBI = L->getIterator(); BasicBlock *BB = L->getParent(); SmallPtrSet VisitedBlocks; +BatchAAResults BatchAA(*AA); for (;;) { if (!VisitedBlocks.insert(BB).second) break; if (Value *U = - FindAvailableLoadedValue(L, BB, BBI, DefMaxInstsToScan, AA)) + FindAvailableLoadedValue(L, BB, BBI, DefMaxInstsToScan, &BatchAA)) return findValueImpl(U, OffsetOk, Visited); if (BBI != BB->begin()) break; diff --git a/llvm/lib/Analysis/Loads.cpp b/llvm/lib/Analysis/Loads.cpp index 97d21db86abf28..6bf0d2f56eb4eb 100644 --- a/llvm/lib/Analysis/Loads.cpp +++ b/llvm/lib/Analysis/Loads.cpp @@ -450,11 +450,10 @@ llvm::DefMaxInstsToScan("available-load-scan-limit", cl::init(6), cl::Hidden, "to scan backward from a given instruction, when searching for " "available loaded value")); -Value *llvm::FindAvailableLoadedValue(LoadInst *Load, - BasicBlock *ScanBB, +Value *llvm::FindAvailableLoadedValue(LoadInst *Load, BasicBlock *ScanBB, BasicBlock::iterator &ScanFrom, unsigned MaxInstsToScan, 
- AAResults *AA, bool *IsLoad, + BatchAAResults *AA, bool *IsLoad, unsigned *NumScanedInst) { // Don't CSE load that
[llvm-branch-commits] [llvm] aa69808 - [Loads] Use BatchAAResults for available value APIs (NFCI)
Author: Nikita Popov Date: 2024-02-05T11:41:54-08:00 New Revision: aa6980841e587eba9c98bf54c51f5414f8a15871 URL: https://github.com/llvm/llvm-project/commit/aa6980841e587eba9c98bf54c51f5414f8a15871 DIFF: https://github.com/llvm/llvm-project/commit/aa6980841e587eba9c98bf54c51f5414f8a15871.diff LOG: [Loads] Use BatchAAResults for available value APIs (NFCI) This allows caching AA queries both within and across the calls, and enables us to use a custom AAQI configuration. (cherry picked from commit 89dae798cc77789a43e9a60173f647dae03a65fe) Added: Modified: llvm/include/llvm/Analysis/Loads.h llvm/lib/Analysis/Lint.cpp llvm/lib/Analysis/Loads.cpp llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp llvm/lib/Transforms/Scalar/JumpThreading.cpp Removed: diff --git a/llvm/include/llvm/Analysis/Loads.h b/llvm/include/llvm/Analysis/Loads.h index 2880ed33a34cb..0926093bba99d 100644 --- a/llvm/include/llvm/Analysis/Loads.h +++ b/llvm/include/llvm/Analysis/Loads.h @@ -18,7 +18,7 @@ namespace llvm { -class AAResults; +class BatchAAResults; class AssumptionCache; class DataLayout; class DominatorTree; @@ -129,11 +129,10 @@ extern cl::opt DefMaxInstsToScan; /// location in memory, as opposed to the value operand of a store. /// /// \returns The found value, or nullptr if no value is found. -Value *FindAvailableLoadedValue(LoadInst *Load, -BasicBlock *ScanBB, +Value *FindAvailableLoadedValue(LoadInst *Load, BasicBlock *ScanBB, BasicBlock::iterator &ScanFrom, unsigned MaxInstsToScan = DefMaxInstsToScan, -AAResults *AA = nullptr, +BatchAAResults *AA = nullptr, bool *IsLoadCSE = nullptr, unsigned *NumScanedInst = nullptr); @@ -141,7 +140,8 @@ Value *FindAvailableLoadedValue(LoadInst *Load, /// FindAvailableLoadedValue() for the case where we are not interested in /// finding the closest clobbering instruction if no available load is found. /// This overload cannot be used to scan across multiple blocks. 
-Value *FindAvailableLoadedValue(LoadInst *Load, AAResults &AA, bool *IsLoadCSE, +Value *FindAvailableLoadedValue(LoadInst *Load, BatchAAResults &AA, +bool *IsLoadCSE, unsigned MaxInstsToScan = DefMaxInstsToScan); /// Scan backwards to see if we have the value of the given pointer available @@ -170,7 +170,7 @@ Value *FindAvailableLoadedValue(LoadInst *Load, AAResults &AA, bool *IsLoadCSE, Value *findAvailablePtrLoadStore(const MemoryLocation &Loc, Type *AccessTy, bool AtLeastAtomic, BasicBlock *ScanBB, BasicBlock::iterator &ScanFrom, - unsigned MaxInstsToScan, AAResults *AA, + unsigned MaxInstsToScan, BatchAAResults *AA, bool *IsLoadCSE, unsigned *NumScanedInst); /// Returns true if a pointer value \p A can be replace with another pointer diff --git a/llvm/lib/Analysis/Lint.cpp b/llvm/lib/Analysis/Lint.cpp index 1ebc593016bc0..16635097d20af 100644 --- a/llvm/lib/Analysis/Lint.cpp +++ b/llvm/lib/Analysis/Lint.cpp @@ -657,11 +657,12 @@ Value *Lint::findValueImpl(Value *V, bool OffsetOk, BasicBlock::iterator BBI = L->getIterator(); BasicBlock *BB = L->getParent(); SmallPtrSet VisitedBlocks; +BatchAAResults BatchAA(*AA); for (;;) { if (!VisitedBlocks.insert(BB).second) break; if (Value *U = - FindAvailableLoadedValue(L, BB, BBI, DefMaxInstsToScan, AA)) + FindAvailableLoadedValue(L, BB, BBI, DefMaxInstsToScan, &BatchAA)) return findValueImpl(U, OffsetOk, Visited); if (BBI != BB->begin()) break; diff --git a/llvm/lib/Analysis/Loads.cpp b/llvm/lib/Analysis/Loads.cpp index 97d21db86abf2..6bf0d2f56eb4e 100644 --- a/llvm/lib/Analysis/Loads.cpp +++ b/llvm/lib/Analysis/Loads.cpp @@ -450,11 +450,10 @@ llvm::DefMaxInstsToScan("available-load-scan-limit", cl::init(6), cl::Hidden, "to scan backward from a given instruction, when searching for " "available loaded value")); -Value *llvm::FindAvailableLoadedValue(LoadInst *Load, - BasicBlock *ScanBB, +Value *llvm::FindAvailableLoadedValue(LoadInst *Load, BasicBlock *ScanBB, BasicBlock::iterator &ScanFrom, unsigned MaxInstsToScan, - 
AAResults *AA, bool *IsLoad, + BatchAAResults *AA, bool *IsLoad, unsigned *NumScanedInst) { // Don't C
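The log message above says that passing a batch object "allows caching AA queries both within and across the calls". A minimal sketch of that general idea follows. It is a toy model, not LLVM code: `BatchQueryCache`, `PairQuery`, and the integer keys are all hypothetical stand-ins, and the cache is only sound as long as the facts being queried do not change while the batch object is alive.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <utility>

// Hypothetical stand-in for an expensive pairwise query (for example, "may
// these two locations alias?"). Not an LLVM type.
using PairQuery = std::function<bool(int, int)>;

// Toy batch wrapper: answers are remembered for the lifetime of the batch
// object, so every caller that shares the same batch reuses earlier results.
class BatchQueryCache {
public:
  explicit BatchQueryCache(PairQuery Fn) : Underlying(std::move(Fn)) {}

  bool query(int A, int B) {
    const std::pair<int, int> Key(A, B);
    auto It = Cache.find(Key);
    if (It != Cache.end())
      return It->second; // Served from the cache, no recomputation.
    ++Misses;
    const bool Result = Underlying(A, B);
    Cache.emplace(Key, Result);
    return Result;
  }

  // Number of queries that had to be forwarded to the underlying function.
  std::size_t misses() const { return Misses; }

private:
  PairQuery Underlying;
  std::map<std::pair<int, int>, bool> Cache;
  std::size_t Misses = 0;
};
```

Creating one such object outside a loop and passing it by pointer into each call, as the Lint.cpp hunk does with `BatchAA`, is what lets results carry over from one call to the next.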
[llvm-branch-commits] [llvm] 28879ab - [AA][JumpThreading] Don't use DomTree for AA in JumpThreading (#79294)
Author: Nikita Popov Date: 2024-02-05T11:41:55-08:00 New Revision: 28879ab8276e7237bfc86f4c7d7890fd4311d334 URL: https://github.com/llvm/llvm-project/commit/28879ab8276e7237bfc86f4c7d7890fd4311d334 DIFF: https://github.com/llvm/llvm-project/commit/28879ab8276e7237bfc86f4c7d7890fd4311d334.diff LOG: [AA][JumpThreading] Don't use DomTree for AA in JumpThreading (#79294) JumpThreading may perform AA queries while the dominator tree is not up to date, which may result in miscompilations. Fix this by adding a new AAQI option to disable the use of the dominator tree in BasicAA. Fixes https://github.com/llvm/llvm-project/issues/79175. (cherry picked from commit 4f32f5d5720fbef06672714a62376f236a36aef5) Added: Modified: llvm/include/llvm/Analysis/AliasAnalysis.h llvm/include/llvm/Analysis/BasicAliasAnalysis.h llvm/lib/Analysis/BasicAliasAnalysis.cpp llvm/lib/Transforms/Scalar/JumpThreading.cpp llvm/test/Transforms/JumpThreading/pr79175.ll Removed: diff --git a/llvm/include/llvm/Analysis/AliasAnalysis.h b/llvm/include/llvm/Analysis/AliasAnalysis.h index d6f732d35fd4c..e8e4f491be5a3 100644 --- a/llvm/include/llvm/Analysis/AliasAnalysis.h +++ b/llvm/include/llvm/Analysis/AliasAnalysis.h @@ -287,6 +287,10 @@ class AAQueryInfo { /// store %l, ... bool MayBeCrossIteration = false; + /// Whether alias analysis is allowed to use the dominator tree, for use by + /// passes that lazily update the DT while performing AA queries. + bool UseDominatorTree = true; + AAQueryInfo(AAResults &AAR, CaptureInfo *CI) : AAR(AAR), CI(CI) {} }; @@ -668,6 +672,9 @@ class BatchAAResults { void enableCrossIterationMode() { AAQI.MayBeCrossIteration = true; } + + /// Disable the use of the dominator tree during alias analysis queries. 
+ void disableDominatorTree() { AAQI.UseDominatorTree = false; } }; /// Temporary typedef for legacy code that uses a generic \c AliasAnalysis diff --git a/llvm/include/llvm/Analysis/BasicAliasAnalysis.h b/llvm/include/llvm/Analysis/BasicAliasAnalysis.h index afc1811239f28..7eca82729430d 100644 --- a/llvm/include/llvm/Analysis/BasicAliasAnalysis.h +++ b/llvm/include/llvm/Analysis/BasicAliasAnalysis.h @@ -43,20 +43,26 @@ class BasicAAResult : public AAResultBase { const Function &F; const TargetLibraryInfo &TLI; AssumptionCache &AC; - DominatorTree *DT; + /// Use getDT() instead of accessing this member directly, in order to + /// respect the AAQI.UseDominatorTree option. + DominatorTree *DT_; + + DominatorTree *getDT(const AAQueryInfo &AAQI) const { +return AAQI.UseDominatorTree ? DT_ : nullptr; + } public: BasicAAResult(const DataLayout &DL, const Function &F, const TargetLibraryInfo &TLI, AssumptionCache &AC, DominatorTree *DT = nullptr) - : DL(DL), F(F), TLI(TLI), AC(AC), DT(DT) {} + : DL(DL), F(F), TLI(TLI), AC(AC), DT_(DT) {} BasicAAResult(const BasicAAResult &Arg) : AAResultBase(Arg), DL(Arg.DL), F(Arg.F), TLI(Arg.TLI), AC(Arg.AC), -DT(Arg.DT) {} +DT_(Arg.DT_) {} BasicAAResult(BasicAAResult &&Arg) : AAResultBase(std::move(Arg)), DL(Arg.DL), F(Arg.F), TLI(Arg.TLI), -AC(Arg.AC), DT(Arg.DT) {} +AC(Arg.AC), DT_(Arg.DT_) {} /// Handle invalidation events in the new pass manager. bool invalidate(Function &Fn, const PreservedAnalyses &PA, diff --git a/llvm/lib/Analysis/BasicAliasAnalysis.cpp b/llvm/lib/Analysis/BasicAliasAnalysis.cpp index 3178e2d278167..1028b52a79123 100644 --- a/llvm/lib/Analysis/BasicAliasAnalysis.cpp +++ b/llvm/lib/Analysis/BasicAliasAnalysis.cpp @@ -89,7 +89,7 @@ bool BasicAAResult::invalidate(Function &Fn, const PreservedAnalyses &PA, // may be created without handles to some analyses and in that case don't // depend on them. 
if (Inv.invalidate(Fn, PA) || - (DT && Inv.invalidate(Fn, PA))) + (DT_ && Inv.invalidate(Fn, PA))) return true; // Otherwise this analysis result remains valid. @@ -1063,6 +1063,7 @@ AliasResult BasicAAResult::aliasGEP( : AliasResult::MayAlias; } + DominatorTree *DT = getDT(AAQI); DecomposedGEP DecompGEP1 = DecomposeGEPExpression(GEP1, DL, &AC, DT); DecomposedGEP DecompGEP2 = DecomposeGEPExpression(V2, DL, &AC, DT); @@ -1556,6 +1557,7 @@ AliasResult BasicAAResult::aliasCheck(const Value *V1, LocationSize V1Size, const Value *HintO1 = getUnderlyingObject(Hint1); const Value *HintO2 = getUnderlyingObject(Hint2); +DominatorTree *DT = getDT(AAQI); auto ValidAssumeForPtrContext = [&](const Value *Ptr) { if (const Instruction *PtrI = dyn_cast(Ptr)) { return isValidAssumeForContext(Assume, PtrI, DT, @@ -1735,7 +1737,7 @@ bool BasicAAResult::isValueEqualInPotentialCycles(const Value *V, if (!Inst || Inst->getParent()->isEntryBlo
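The header change above renames the member to `DT_` and routes every use through `getDT(AAQI)`, so a query that has disabled the dominator tree can never reach it by accident. A stripped-down sketch of that accessor pattern, using toy types (`ToyDomTree`, `ToyQueryInfo`, `ToyBasicAA` are hypothetical and only mirror the shape of the patch):

```cpp
// Toy stand-ins for illustration only; these are not the LLVM classes.
struct ToyDomTree {};

struct ToyQueryInfo {
  // Mirrors the new per-query option: defaults to using the tree.
  bool UseDominatorTree = true;
};

class ToyBasicAA {
  // Deliberately awkward name: code should call getDT() rather than touch
  // the member, so the per-query option is always respected.
  ToyDomTree *DT_;

public:
  explicit ToyBasicAA(ToyDomTree *DT = nullptr) : DT_(DT) {}

  // Returns the tree only if this particular query is allowed to use it.
  ToyDomTree *getDT(const ToyQueryInfo &QI) const {
    return QI.UseDominatorTree ? DT_ : nullptr;
  }
};
```

Callers that already handle a null tree (the analysis may be constructed without one) need no further changes, which is presumably why the patch can simply shadow `DT` with a local `getDT(AAQI)` result at each use site.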
[llvm-branch-commits] [llvm] PR for llvm/llvm-project#79175 (PR #80274)
https://github.com/tstellar closed https://github.com/llvm/llvm-project/pull/80274
[llvm-branch-commits] [llvm] a581690 - [JumpThreading] Add test for #79175 (NFC)
Author: Nikita Popov Date: 2024-02-05T11:41:55-08:00 New Revision: a581690c57d153f329ded71004a8616b93cb88ca URL: https://github.com/llvm/llvm-project/commit/a581690c57d153f329ded71004a8616b93cb88ca DIFF: https://github.com/llvm/llvm-project/commit/a581690c57d153f329ded71004a8616b93cb88ca.diff LOG: [JumpThreading] Add test for #79175 (NFC) (cherry picked from commit 7143b451d71fe314730f7610d7908e3b9611815c) Added: llvm/test/Transforms/JumpThreading/pr79175.ll Modified: Removed: diff --git a/llvm/test/Transforms/JumpThreading/pr79175.ll b/llvm/test/Transforms/JumpThreading/pr79175.ll new file mode 100644 index 0..6815aabb26dfc --- /dev/null +++ b/llvm/test/Transforms/JumpThreading/pr79175.ll @@ -0,0 +1,64 @@ +; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 4 +; RUN: opt -S -passes=jump-threading < %s | FileCheck %s + +@f = external global i32 + +; Make sure the value of @f is reloaded prior to the final comparison. +; FIXME: This is a miscompile. 
+define i32 @test(i64 %idx, i32 %val) { +; CHECK-LABEL: define i32 @test( +; CHECK-SAME: i64 [[IDX:%.*]], i32 [[VAL:%.*]]) { +; CHECK-NEXT: entry: +; CHECK-NEXT:[[CMP:%.*]] = icmp slt i64 [[IDX]], 1 +; CHECK-NEXT:br i1 [[CMP]], label [[FOR_BODY:%.*]], label [[RETURN:%.*]] +; CHECK: for.body: +; CHECK-NEXT:[[F:%.*]] = load i32, ptr @f, align 4 +; CHECK-NEXT:[[CMP1:%.*]] = icmp eq i32 [[F]], 0 +; CHECK-NEXT:br i1 [[CMP1]], label [[COND_END_THREAD:%.*]], label [[COND_END:%.*]] +; CHECK: cond.end: +; CHECK-NEXT:[[CMP_I:%.*]] = icmp sgt i32 [[VAL]], 0 +; CHECK-NEXT:[[COND_FR:%.*]] = freeze i1 [[CMP_I]] +; CHECK-NEXT:br i1 [[COND_FR]], label [[COND_END_THREAD]], label [[TMP0:%.*]] +; CHECK: cond.end.thread: +; CHECK-NEXT:[[F_RELOAD_PR:%.*]] = load i32, ptr @f, align 4 +; CHECK-NEXT:br label [[TMP0]] +; CHECK: 0: +; CHECK-NEXT:[[F_RELOAD:%.*]] = phi i32 [ [[F]], [[COND_END]] ], [ [[F_RELOAD_PR]], [[COND_END_THREAD]] ] +; CHECK-NEXT:[[TMP1:%.*]] = phi i32 [ 0, [[COND_END_THREAD]] ], [ [[VAL]], [[COND_END]] ] +; CHECK-NEXT:[[F_IDX:%.*]] = getelementptr inbounds i32, ptr @f, i64 [[IDX]] +; CHECK-NEXT:store i32 [[TMP1]], ptr [[F_IDX]], align 4 +; CHECK-NEXT:[[CMP3:%.*]] = icmp slt i32 [[F_RELOAD]], 1 +; CHECK-NEXT:br i1 [[CMP3]], label [[RETURN2:%.*]], label [[RETURN]] +; CHECK: return: +; CHECK-NEXT:ret i32 0 +; CHECK: return2: +; CHECK-NEXT:ret i32 1 +; +entry: + %cmp = icmp slt i64 %idx, 1 + br i1 %cmp, label %for.body, label %return + +for.body: + %f = load i32, ptr @f, align 4 + %cmp1 = icmp eq i32 %f, 0 + br i1 %cmp1, label %cond.end, label %cond.false + +cond.false: + br label %cond.end + +cond.end: + %phi = phi i32 [ %val, %cond.false ], [ 1, %for.body ] + %cmp.i = icmp sgt i32 %phi, 0 + %sel = select i1 %cmp.i, i32 0, i32 %phi + %f.idx = getelementptr inbounds i32, ptr @f, i64 %idx + store i32 %sel, ptr %f.idx, align 4 + %f.reload = load i32, ptr @f, align 4 + %cmp3 = icmp slt i32 %f.reload, 1 + br i1 %cmp3, label %return2, label %return + +return: + ret i32 0 + 
+return2: + ret i32 1 +}
[llvm-branch-commits] [llvm] PR for llvm/llvm-project#80597 (PR #80731)
https://github.com/nikic approved this pull request. https://github.com/llvm/llvm-project/pull/80731
[llvm-branch-commits] [clang] PR for llvm/llvm-project#80599 (PR #80600)
https://github.com/llvmbot updated https://github.com/llvm/llvm-project/pull/80600

>From f7d0a0e7aec97eb7f0719f0f3cfcf94ad823fedd Mon Sep 17 00:00:00 2001
From: Koakuma
Date: Sun, 4 Feb 2024 11:08:00 +0700
Subject: [PATCH] [clang] Add GCC-compatible code model names for sparc64

This adds GCC-compatible names for code model selection on 64-bit SPARC with absolute code.

Testing with a 2-stage build then running codegen tests works okay under all of the supported code models.

(32-bit target does not have selectable code models)

Reviewed By: @brad0, @MaskRay

(cherry picked from commit b0f0babff22e9c0af74535b05e2c6424392bb24a)
---
 clang/lib/Driver/ToolChains/Clang.cpp | 8
 clang/test/Driver/sparc64-codemodel.c | 6 ++
 2 files changed, 14 insertions(+)
 create mode 100644 clang/test/Driver/sparc64-codemodel.c

diff --git a/clang/lib/Driver/ToolChains/Clang.cpp b/clang/lib/Driver/ToolChains/Clang.cpp
index 8092fc050b0ee..54de8edd9a039 100644
--- a/clang/lib/Driver/ToolChains/Clang.cpp
+++ b/clang/lib/Driver/ToolChains/Clang.cpp
@@ -5779,6 +5779,14 @@ void Clang::ConstructJob(Compilation &C, const JobAction &JA,
       // NVPTX/AMDGPU does not care about the code model and will accept
       // whatever works for the host.
       Ok = true;
+    } else if (Triple.isSPARC64()) {
+      if (CM == "medlow")
+        CM = "small";
+      else if (CM == "medmid")
+        CM = "medium";
+      else if (CM == "medany")
+        CM = "large";
+      Ok = CM == "small" || CM == "medium" || CM == "large";
     }
     if (Ok) {
       CmdArgs.push_back(Args.MakeArgString("-mcmodel=" + CM));
diff --git a/clang/test/Driver/sparc64-codemodel.c b/clang/test/Driver/sparc64-codemodel.c
new file mode 100644
index 0..e4b01fd61b6fa
--- /dev/null
+++ b/clang/test/Driver/sparc64-codemodel.c
@@ -0,0 +1,6 @@
+// RUN: %clang --target=sparc64 -mcmodel=medlow %s -### 2>&1 | FileCheck -check-prefix=MEDLOW %s
+// RUN: %clang --target=sparc64 -mcmodel=medmid %s -### 2>&1 | FileCheck -check-prefix=MEDMID %s
+// RUN: %clang --target=sparc64 -mcmodel=medany %s -### 2>&1 | FileCheck -check-prefix=MEDANY %s
+// MEDLOW: "-mcmodel=small"
+// MEDMID: "-mcmodel=medium"
+// MEDANY: "-mcmodel=large"
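The driver hunk in this patch boils down to a small name translation followed by a validity check. As a standalone sketch of that logic only (`normalizeSparc64CodeModel` is a hypothetical helper, not a function in clang, and the empty-string convention for "rejected" is an assumption made for this example):

```cpp
#include <string>

// Hypothetical helper mirroring the mapping in the patch: GCC-style names are
// translated to the generic ones, then only the three generic names are
// accepted. Returns an empty string when the name is not accepted.
std::string normalizeSparc64CodeModel(std::string CM) {
  if (CM == "medlow")
    CM = "small";
  else if (CM == "medmid")
    CM = "medium";
  else if (CM == "medany")
    CM = "large";

  const bool Ok = CM == "small" || CM == "medium" || CM == "large";
  return Ok ? CM : std::string();
}
```

With this mapping, `medlow`, `medmid`, and `medany` come out as `small`, `medium`, and `large`, which matches the `-mcmodel=` strings that the new driver test checks for.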
[llvm-branch-commits] [clang] f7d0a0e - [clang] Add GCC-compatible code model names for sparc64
Author: Koakuma
Date: 2024-02-05T11:46:24-08:00
New Revision: f7d0a0e7aec97eb7f0719f0f3cfcf94ad823fedd

URL: https://github.com/llvm/llvm-project/commit/f7d0a0e7aec97eb7f0719f0f3cfcf94ad823fedd
DIFF: https://github.com/llvm/llvm-project/commit/f7d0a0e7aec97eb7f0719f0f3cfcf94ad823fedd.diff

LOG: [clang] Add GCC-compatible code model names for sparc64

This adds GCC-compatible names for code model selection on 64-bit SPARC with absolute code.

Testing with a 2-stage build then running codegen tests works okay under all of the supported code models.

(32-bit target does not have selectable code models)

Reviewed By: @brad0, @MaskRay

(cherry picked from commit b0f0babff22e9c0af74535b05e2c6424392bb24a)

Added:
    clang/test/Driver/sparc64-codemodel.c

Modified:
    clang/lib/Driver/ToolChains/Clang.cpp

Removed:


diff --git a/clang/lib/Driver/ToolChains/Clang.cpp b/clang/lib/Driver/ToolChains/Clang.cpp
index 8092fc050b0ee..54de8edd9a039 100644
--- a/clang/lib/Driver/ToolChains/Clang.cpp
+++ b/clang/lib/Driver/ToolChains/Clang.cpp
@@ -5779,6 +5779,14 @@ void Clang::ConstructJob(Compilation &C, const JobAction &JA,
       // NVPTX/AMDGPU does not care about the code model and will accept
       // whatever works for the host.
       Ok = true;
+    } else if (Triple.isSPARC64()) {
+      if (CM == "medlow")
+        CM = "small";
+      else if (CM == "medmid")
+        CM = "medium";
+      else if (CM == "medany")
+        CM = "large";
+      Ok = CM == "small" || CM == "medium" || CM == "large";
     }
     if (Ok) {
       CmdArgs.push_back(Args.MakeArgString("-mcmodel=" + CM));
diff --git a/clang/test/Driver/sparc64-codemodel.c b/clang/test/Driver/sparc64-codemodel.c
new file mode 100644
index 0..e4b01fd61b6fa
--- /dev/null
+++ b/clang/test/Driver/sparc64-codemodel.c
@@ -0,0 +1,6 @@
+// RUN: %clang --target=sparc64 -mcmodel=medlow %s -### 2>&1 | FileCheck -check-prefix=MEDLOW %s
+// RUN: %clang --target=sparc64 -mcmodel=medmid %s -### 2>&1 | FileCheck -check-prefix=MEDMID %s
+// RUN: %clang --target=sparc64 -mcmodel=medany %s -### 2>&1 | FileCheck -check-prefix=MEDANY %s
+// MEDLOW: "-mcmodel=small"
+// MEDMID: "-mcmodel=medium"
+// MEDANY: "-mcmodel=large"
[llvm-branch-commits] [clang] PR for llvm/llvm-project#80599 (PR #80600)
https://github.com/tstellar closed https://github.com/llvm/llvm-project/pull/80600
[llvm-branch-commits] [llvm] PR for llvm/llvm-project#80694 (PR #80695)
https://github.com/llvmbot updated https://github.com/llvm/llvm-project/pull/80695 >From 47fbb649e12f7016ee60a5918bda26c01f2ea543 Mon Sep 17 00:00:00 2001 From: Pierre van Houtryve Date: Mon, 5 Feb 2024 14:36:15 +0100 Subject: [PATCH] [AMDGPU][PromoteAlloca] Support memsets to ptr allocas (#80678) Fixes #80366 (cherry picked from commit 4e958abf2f44d08129eafd5b6a4ee2bd3584ed22) --- .../lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp | 16 -- .../CodeGen/AMDGPU/promote-alloca-memset.ll | 54 +++ 2 files changed, 66 insertions(+), 4 deletions(-) diff --git a/llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp b/llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp index 5e73411cae9b70..c1b244f50d93f8 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp +++ b/llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp @@ -521,10 +521,18 @@ static Value *promoteAllocaUserToVector( // For memset, we don't need to know the previous value because we // currently only allow memsets that cover the whole alloca. Value *Elt = MSI->getOperand(1); - if (DL.getTypeStoreSize(VecEltTy) > 1) { -Value *EltBytes = -Builder.CreateVectorSplat(DL.getTypeStoreSize(VecEltTy), Elt); -Elt = Builder.CreateBitCast(EltBytes, VecEltTy); + const unsigned BytesPerElt = DL.getTypeStoreSize(VecEltTy); + if (BytesPerElt > 1) { +Value *EltBytes = Builder.CreateVectorSplat(BytesPerElt, Elt); + +// If the element type of the vector is a pointer, we need to first cast +// to an integer, then use a PtrCast. 
+if (VecEltTy->isPointerTy()) { + Type *PtrInt = Builder.getIntNTy(BytesPerElt * 8); + Elt = Builder.CreateBitCast(EltBytes, PtrInt); + Elt = Builder.CreateIntToPtr(Elt, VecEltTy); +} else + Elt = Builder.CreateBitCast(EltBytes, VecEltTy); } return Builder.CreateVectorSplat(VectorTy->getElementCount(), Elt); diff --git a/llvm/test/CodeGen/AMDGPU/promote-alloca-memset.ll b/llvm/test/CodeGen/AMDGPU/promote-alloca-memset.ll index 15af1f17e230ec..f1e2737b370ef0 100644 --- a/llvm/test/CodeGen/AMDGPU/promote-alloca-memset.ll +++ b/llvm/test/CodeGen/AMDGPU/promote-alloca-memset.ll @@ -84,4 +84,58 @@ entry: ret void } +define amdgpu_kernel void @memset_array_ptr_alloca(ptr %out) { +; CHECK-LABEL: @memset_array_ptr_alloca( +; CHECK-NEXT:store i64 0, ptr [[OUT:%.*]], align 8 +; CHECK-NEXT:ret void +; + %alloca = alloca [6 x ptr], align 16, addrspace(5) + call void @llvm.memset.p5.i64(ptr addrspace(5) %alloca, i8 0, i64 48, i1 false) + %load = load i64, ptr addrspace(5) %alloca + store i64 %load, ptr %out + ret void +} + +define amdgpu_kernel void @memset_vector_ptr_alloca(ptr %out) { +; CHECK-LABEL: @memset_vector_ptr_alloca( +; CHECK-NEXT:store i64 0, ptr [[OUT:%.*]], align 8 +; CHECK-NEXT:ret void +; + %alloca = alloca <6 x ptr>, align 16, addrspace(5) + call void @llvm.memset.p5.i64(ptr addrspace(5) %alloca, i8 0, i64 48, i1 false) + %load = load i64, ptr addrspace(5) %alloca + store i64 %load, ptr %out + ret void +} + +define amdgpu_kernel void @memset_array_of_array_ptr_alloca(ptr %out) { +; CHECK-LABEL: @memset_array_of_array_ptr_alloca( +; CHECK-NEXT:[[ALLOCA:%.*]] = alloca [2 x [3 x ptr]], align 16, addrspace(5) +; CHECK-NEXT:call void @llvm.memset.p5.i64(ptr addrspace(5) [[ALLOCA]], i8 0, i64 48, i1 false) +; CHECK-NEXT:[[LOAD:%.*]] = load i64, ptr addrspace(5) [[ALLOCA]], align 8 +; CHECK-NEXT:store i64 [[LOAD]], ptr [[OUT:%.*]], align 8 +; CHECK-NEXT:ret void +; + %alloca = alloca [2 x [3 x ptr]], align 16, addrspace(5) + call void @llvm.memset.p5.i64(ptr 
addrspace(5) %alloca, i8 0, i64 48, i1 false) + %load = load i64, ptr addrspace(5) %alloca + store i64 %load, ptr %out + ret void +} + +define amdgpu_kernel void @memset_array_of_vec_ptr_alloca(ptr %out) { +; CHECK-LABEL: @memset_array_of_vec_ptr_alloca( +; CHECK-NEXT:[[ALLOCA:%.*]] = alloca [2 x <3 x ptr>], align 16, addrspace(5) +; CHECK-NEXT:call void @llvm.memset.p5.i64(ptr addrspace(5) [[ALLOCA]], i8 0, i64 48, i1 false) +; CHECK-NEXT:[[LOAD:%.*]] = load i64, ptr addrspace(5) [[ALLOCA]], align 8 +; CHECK-NEXT:store i64 [[LOAD]], ptr [[OUT:%.*]], align 8 +; CHECK-NEXT:ret void +; + %alloca = alloca [2 x <3 x ptr>], align 16, addrspace(5) + call void @llvm.memset.p5.i64(ptr addrspace(5) %alloca, i8 0, i64 48, i1 false) + %load = load i64, ptr addrspace(5) %alloca + store i64 %load, ptr %out + ret void +} + declare void @llvm.memset.p5.i64(ptr addrspace(5) nocapture writeonly, i8, i64, i1 immarg) ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
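The PromoteAlloca change in this patch builds the per-element memset value by repeating the byte `BytesPerElt` times, reinterpreting those bytes as an integer of the same width, and only then converting to a pointer. A plain C++ analogy of that sequence is sketched below. It does not use the LLVM `IRBuilder` API: `splatByte` and `splatBytePointer` are hypothetical helpers, and the integer-to-pointer conversion is implementation-defined in C++, so this only illustrates the order of the steps.

```cpp
#include <cassert>
#include <cstdint>

// Repeat one byte across NumBytes bytes of an integer. This plays the role of
// "splat the byte, then bitcast the bytes to an iN" in the patch.
std::uint64_t splatByte(std::uint8_t Byte, unsigned NumBytes) {
  assert(NumBytes >= 1 && NumBytes <= 8 && "sketch handles at most 64 bits");
  std::uint64_t Result = 0;
  for (unsigned I = 0; I < NumBytes; ++I)
    Result = (Result << 8) | Byte;
  return Result;
}

// Pointer case: go through a pointer-sized integer first, then convert that
// integer to a pointer (the analogue of the extra IntToPtr step).
void *splatBytePointer(std::uint8_t Byte) {
  const std::uintptr_t Bits =
      static_cast<std::uintptr_t>(splatByte(Byte, sizeof(void *)));
  return reinterpret_cast<void *>(Bits);
}
```

For the zero-fill memsets exercised by the new tests, every element ends up as an all-zero bit pattern, which is consistent with the promoted functions storing `i64 0`.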
[llvm-branch-commits] [llvm] 47fbb64 - [AMDGPU][PromoteAlloca] Support memsets to ptr allocas (#80678)
Author: Pierre van Houtryve Date: 2024-02-05T11:48:14-08:00 New Revision: 47fbb649e12f7016ee60a5918bda26c01f2ea543 URL: https://github.com/llvm/llvm-project/commit/47fbb649e12f7016ee60a5918bda26c01f2ea543 DIFF: https://github.com/llvm/llvm-project/commit/47fbb649e12f7016ee60a5918bda26c01f2ea543.diff LOG: [AMDGPU][PromoteAlloca] Support memsets to ptr allocas (#80678) Fixes #80366 (cherry picked from commit 4e958abf2f44d08129eafd5b6a4ee2bd3584ed22) Added: Modified: llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp llvm/test/CodeGen/AMDGPU/promote-alloca-memset.ll Removed: diff --git a/llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp b/llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp index 5e73411cae9b7..c1b244f50d93f 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp +++ b/llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp @@ -521,10 +521,18 @@ static Value *promoteAllocaUserToVector( // For memset, we don't need to know the previous value because we // currently only allow memsets that cover the whole alloca. Value *Elt = MSI->getOperand(1); - if (DL.getTypeStoreSize(VecEltTy) > 1) { -Value *EltBytes = -Builder.CreateVectorSplat(DL.getTypeStoreSize(VecEltTy), Elt); -Elt = Builder.CreateBitCast(EltBytes, VecEltTy); + const unsigned BytesPerElt = DL.getTypeStoreSize(VecEltTy); + if (BytesPerElt > 1) { +Value *EltBytes = Builder.CreateVectorSplat(BytesPerElt, Elt); + +// If the element type of the vector is a pointer, we need to first cast +// to an integer, then use a PtrCast. 
+if (VecEltTy->isPointerTy()) { + Type *PtrInt = Builder.getIntNTy(BytesPerElt * 8); + Elt = Builder.CreateBitCast(EltBytes, PtrInt); + Elt = Builder.CreateIntToPtr(Elt, VecEltTy); +} else + Elt = Builder.CreateBitCast(EltBytes, VecEltTy); } return Builder.CreateVectorSplat(VectorTy->getElementCount(), Elt); diff --git a/llvm/test/CodeGen/AMDGPU/promote-alloca-memset.ll b/llvm/test/CodeGen/AMDGPU/promote-alloca-memset.ll index 15af1f17e230e..f1e2737b370ef 100644 --- a/llvm/test/CodeGen/AMDGPU/promote-alloca-memset.ll +++ b/llvm/test/CodeGen/AMDGPU/promote-alloca-memset.ll @@ -84,4 +84,58 @@ entry: ret void } +define amdgpu_kernel void @memset_array_ptr_alloca(ptr %out) { +; CHECK-LABEL: @memset_array_ptr_alloca( +; CHECK-NEXT:store i64 0, ptr [[OUT:%.*]], align 8 +; CHECK-NEXT:ret void +; + %alloca = alloca [6 x ptr], align 16, addrspace(5) + call void @llvm.memset.p5.i64(ptr addrspace(5) %alloca, i8 0, i64 48, i1 false) + %load = load i64, ptr addrspace(5) %alloca + store i64 %load, ptr %out + ret void +} + +define amdgpu_kernel void @memset_vector_ptr_alloca(ptr %out) { +; CHECK-LABEL: @memset_vector_ptr_alloca( +; CHECK-NEXT:store i64 0, ptr [[OUT:%.*]], align 8 +; CHECK-NEXT:ret void +; + %alloca = alloca <6 x ptr>, align 16, addrspace(5) + call void @llvm.memset.p5.i64(ptr addrspace(5) %alloca, i8 0, i64 48, i1 false) + %load = load i64, ptr addrspace(5) %alloca + store i64 %load, ptr %out + ret void +} + +define amdgpu_kernel void @memset_array_of_array_ptr_alloca(ptr %out) { +; CHECK-LABEL: @memset_array_of_array_ptr_alloca( +; CHECK-NEXT:[[ALLOCA:%.*]] = alloca [2 x [3 x ptr]], align 16, addrspace(5) +; CHECK-NEXT:call void @llvm.memset.p5.i64(ptr addrspace(5) [[ALLOCA]], i8 0, i64 48, i1 false) +; CHECK-NEXT:[[LOAD:%.*]] = load i64, ptr addrspace(5) [[ALLOCA]], align 8 +; CHECK-NEXT:store i64 [[LOAD]], ptr [[OUT:%.*]], align 8 +; CHECK-NEXT:ret void +; + %alloca = alloca [2 x [3 x ptr]], align 16, addrspace(5) + call void @llvm.memset.p5.i64(ptr 
addrspace(5) %alloca, i8 0, i64 48, i1 false) + %load = load i64, ptr addrspace(5) %alloca + store i64 %load, ptr %out + ret void +} + +define amdgpu_kernel void @memset_array_of_vec_ptr_alloca(ptr %out) { +; CHECK-LABEL: @memset_array_of_vec_ptr_alloca( +; CHECK-NEXT:[[ALLOCA:%.*]] = alloca [2 x <3 x ptr>], align 16, addrspace(5) +; CHECK-NEXT:call void @llvm.memset.p5.i64(ptr addrspace(5) [[ALLOCA]], i8 0, i64 48, i1 false) +; CHECK-NEXT:[[LOAD:%.*]] = load i64, ptr addrspace(5) [[ALLOCA]], align 8 +; CHECK-NEXT:store i64 [[LOAD]], ptr [[OUT:%.*]], align 8 +; CHECK-NEXT:ret void +; + %alloca = alloca [2 x <3 x ptr>], align 16, addrspace(5) + call void @llvm.memset.p5.i64(ptr addrspace(5) %alloca, i8 0, i64 48, i1 false) + %load = load i64, ptr addrspace(5) %alloca + store i64 %load, ptr %out + ret void +} + declare void @llvm.memset.p5.i64(ptr addrspace(5) nocapture writeonly, i8, i64, i1 immarg) ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] PR for llvm/llvm-project#80694 (PR #80695)
https://github.com/tstellar closed https://github.com/llvm/llvm-project/pull/80695
[llvm-branch-commits] [libcxx] PR for llvm/llvm-project#80168 (PR #80702)
https://github.com/llvmbot updated https://github.com/llvm/llvm-project/pull/80702 >From 72533964036dca3ce806044e92a1e70584e3aca9 Mon Sep 17 00:00:00 2001 From: Louis Dionne Date: Mon, 5 Feb 2024 11:05:46 -0500 Subject: [PATCH] [libc++] Add missing conditionals for feature-test macros (#80168) We noticed that some feature-test macros were not conditional on configuration flags like _LIBCPP_HAS_NO_FILESYSTEM. As a result, code attempting to use FTMs would not work as intended. This patch adds conditionals for a few feature-test macros, but more issues may exist. rdar://122020466 (cherry picked from commit f2c84211d2834c73ff874389c6bb47b1c76d391a) --- libcxx/include/version| 14 +- .../filesystem.version.compile.pass.cpp | 16 +- .../fstream.version.compile.pass.cpp | 16 +- .../iomanip.version.compile.pass.cpp | 80 +--- .../mutex.version.compile.pass.cpp| 64 +-- .../version.version.compile.pass.cpp | 176 -- .../generate_feature_test_macro_components.py | 10 +- 7 files changed, 254 insertions(+), 122 deletions(-) diff --git a/libcxx/include/version b/libcxx/include/version index 9e26da8c1b2425..d356976d6454ad 100644 --- a/libcxx/include/version +++ b/libcxx/include/version @@ -266,7 +266,9 @@ __cpp_lib_within_lifetime 202306L # define __cpp_lib_make_reverse_iterator201402L # define __cpp_lib_make_unique 201304L # define __cpp_lib_null_iterators 201304L -# define __cpp_lib_quoted_string_io 201304L +# if !defined(_LIBCPP_HAS_NO_LOCALIZATION) +# define __cpp_lib_quoted_string_io 201304L +# endif # define __cpp_lib_result_of_sfinae 201210L # define __cpp_lib_robust_nonmodifying_seq_ops 201304L # if !defined(_LIBCPP_HAS_NO_THREADS) @@ -294,7 +296,7 @@ __cpp_lib_within_lifetime 202306L # define __cpp_lib_clamp201603L # define __cpp_lib_enable_shared_from_this 201603L // # define __cpp_lib_execution201603L -# if _LIBCPP_AVAILABILITY_HAS_FILESYSTEM_LIBRARY +# if !defined(_LIBCPP_HAS_NO_FILESYSTEM) && _LIBCPP_AVAILABILITY_HAS_FILESYSTEM_LIBRARY # define __cpp_lib_filesystem 
201703L # endif # define __cpp_lib_gcd_lcm 201606L @@ -323,7 +325,9 @@ __cpp_lib_within_lifetime 202306L // # define __cpp_lib_parallel_algorithm 201603L # define __cpp_lib_raw_memory_algorithms201606L # define __cpp_lib_sample 201603L -# define __cpp_lib_scoped_lock 201703L +# if !defined(_LIBCPP_HAS_NO_THREADS) +# define __cpp_lib_scoped_lock201703L +# endif # if !defined(_LIBCPP_HAS_NO_THREADS) # define __cpp_lib_shared_mutex 201505L # endif @@ -496,7 +500,9 @@ __cpp_lib_within_lifetime 202306L // # define __cpp_lib_freestanding_optional202311L // # define __cpp_lib_freestanding_string_view 202311L // # define __cpp_lib_freestanding_variant 202311L -# define __cpp_lib_fstream_native_handle202306L +# if !defined(_LIBCPP_HAS_NO_FILESYSTEM) && !defined(_LIBCPP_HAS_NO_LOCALIZATION) +# define __cpp_lib_fstream_native_handle 202306L +# endif // # define __cpp_lib_function_ref 202306L // # define __cpp_lib_hazard_pointer 202306L // # define __cpp_lib_linalg 202311L diff --git a/libcxx/test/std/language.support/support.limits/support.limits.general/filesystem.version.compile.pass.cpp b/libcxx/test/std/language.support/support.limits/support.limits.general/filesystem.version.compile.pass.cpp index 46ccde800c1796..3f03e8be9aeab3 100644 --- a/libcxx/test/std/language.support/support.limits/support.limits.general/filesystem.version.compile.pass.cpp +++ b/libcxx/test/std/language.support/support.limits/support.limits.general/filesystem.version.compile.pass.cpp @@ -51,7 +51,7 @@ # error "__cpp_lib_char8_t should not be defined before c++20" # endif -# if !defined(_LIBCPP_VERSION) || _LIBCPP_AVAILABILITY_HAS_FILESYSTEM_LIBRARY +# if !defined(_LIBCPP_VERSION) || (!defined(_LIBCPP_HAS_NO_FILESYSTEM) && _LIBCPP_AVAILABILITY_HAS_FILESYSTEM_LIBRARY) # ifndef __cpp_lib_filesystem # error "__cpp_lib_filesystem should be defined in c++17" # endif @@ -60,7 +60,7 @@ # endif # else # ifdef __cpp_lib_filesystem -# error "__cpp_lib_filesystem should not be defined when the requirement 
'!defined(_LIBCPP_VERSION) || _LIBCPP_AVAILABILITY_HAS_FILESYSTEM_LIBRARY' is not met!" +# error "__cpp_lib_filesystem should not be defined when the
[llvm-branch-commits] [libcxx] 7253396 - [libc++] Add missing conditionals for feature-test macros (#80168)
Author: Louis Dionne Date: 2024-02-05T11:49:51-08:00 New Revision: 72533964036dca3ce806044e92a1e70584e3aca9 URL: https://github.com/llvm/llvm-project/commit/72533964036dca3ce806044e92a1e70584e3aca9 DIFF: https://github.com/llvm/llvm-project/commit/72533964036dca3ce806044e92a1e70584e3aca9.diff LOG: [libc++] Add missing conditionals for feature-test macros (#80168) We noticed that some feature-test macros were not conditional on configuration flags like _LIBCPP_HAS_NO_FILESYSTEM. As a result, code attempting to use FTMs would not work as intended. This patch adds conditionals for a few feature-test macros, but more issues may exist. rdar://122020466 (cherry picked from commit f2c84211d2834c73ff874389c6bb47b1c76d391a) Added: Modified: libcxx/include/version libcxx/test/std/language.support/support.limits/support.limits.general/filesystem.version.compile.pass.cpp libcxx/test/std/language.support/support.limits/support.limits.general/fstream.version.compile.pass.cpp libcxx/test/std/language.support/support.limits/support.limits.general/iomanip.version.compile.pass.cpp libcxx/test/std/language.support/support.limits/support.limits.general/mutex.version.compile.pass.cpp libcxx/test/std/language.support/support.limits/support.limits.general/version.version.compile.pass.cpp libcxx/utils/generate_feature_test_macro_components.py Removed: diff --git a/libcxx/include/version b/libcxx/include/version index 9e26da8c1b242..d356976d6454a 100644 --- a/libcxx/include/version +++ b/libcxx/include/version @@ -266,7 +266,9 @@ __cpp_lib_within_lifetime 202306L # define __cpp_lib_make_reverse_iterator201402L # define __cpp_lib_make_unique 201304L # define __cpp_lib_null_iterators 201304L -# define __cpp_lib_quoted_string_io 201304L +# if !defined(_LIBCPP_HAS_NO_LOCALIZATION) +# define __cpp_lib_quoted_string_io 201304L +# endif # define __cpp_lib_result_of_sfinae 201210L # define __cpp_lib_robust_nonmodifying_seq_ops 201304L # if !defined(_LIBCPP_HAS_NO_THREADS) @@ -294,7 +296,7 @@ 
__cpp_lib_within_lifetime 202306L # define __cpp_lib_clamp201603L # define __cpp_lib_enable_shared_from_this 201603L // # define __cpp_lib_execution201603L -# if _LIBCPP_AVAILABILITY_HAS_FILESYSTEM_LIBRARY +# if !defined(_LIBCPP_HAS_NO_FILESYSTEM) && _LIBCPP_AVAILABILITY_HAS_FILESYSTEM_LIBRARY # define __cpp_lib_filesystem 201703L # endif # define __cpp_lib_gcd_lcm 201606L @@ -323,7 +325,9 @@ __cpp_lib_within_lifetime 202306L // # define __cpp_lib_parallel_algorithm 201603L # define __cpp_lib_raw_memory_algorithms201606L # define __cpp_lib_sample 201603L -# define __cpp_lib_scoped_lock 201703L +# if !defined(_LIBCPP_HAS_NO_THREADS) +# define __cpp_lib_scoped_lock201703L +# endif # if !defined(_LIBCPP_HAS_NO_THREADS) # define __cpp_lib_shared_mutex 201505L # endif @@ -496,7 +500,9 @@ __cpp_lib_within_lifetime 202306L // # define __cpp_lib_freestanding_optional202311L // # define __cpp_lib_freestanding_string_view 202311L // # define __cpp_lib_freestanding_variant 202311L -# define __cpp_lib_fstream_native_handle202306L +# if !defined(_LIBCPP_HAS_NO_FILESYSTEM) && !defined(_LIBCPP_HAS_NO_LOCALIZATION) +# define __cpp_lib_fstream_native_handle 202306L +# endif // # define __cpp_lib_function_ref 202306L // # define __cpp_lib_hazard_pointer 202306L // # define __cpp_lib_linalg 202311L diff --git a/libcxx/test/std/language.support/support.limits/support.limits.general/filesystem.version.compile.pass.cpp b/libcxx/test/std/language.support/support.limits/support.limits.general/filesystem.version.compile.pass.cpp index 46ccde800c179..3f03e8be9aeab 100644 --- a/libcxx/test/std/language.support/support.limits/support.limits.general/filesystem.version.compile.pass.cpp +++ b/libcxx/test/std/language.support/support.limits/support.limits.general/filesystem.version.compile.pass.cpp @@ -51,7 +51,7 @@ # error "__cpp_lib_char8_t should not be defined before c++20" # endif -# if !defined(_LIBCPP_VERSION) || _LIBCPP_AVAILABILITY_HAS_FILESYSTEM_LIBRARY +# if !defined(_LIBCPP_VERSION) 
|| (!defined(_LIBCPP_HAS_NO_FILESYSTEM) && _LIBCPP_AVAILABILITY_HAS_FILESYSTEM_LIBRARY) # ifndef __cpp_lib_filesystem # error "__cpp_lib_fil
[llvm-branch-commits] [libcxx] PR for llvm/llvm-project#80168 (PR #80702)
https://github.com/tstellar closed https://github.com/llvm/llvm-project/pull/80702
[llvm-branch-commits] [libcxx] PR for llvm/llvm-project#80718 (PR #80720)
https://github.com/philnik777 approved this pull request. https://github.com/llvm/llvm-project/pull/80720
[llvm-branch-commits] [clang] PR for llvm/llvm-project#78892 (PR #80259)
https://github.com/HazardyKnusperkeks approved this pull request. As stated in the discussion, it is an absolute must to merge this into the release. In my opinion, we can't just drop an option for the next release. https://github.com/llvm/llvm-project/pull/80259
[llvm-branch-commits] [libcxx] PR for llvm/llvm-project#80718 (PR #80720)
https://github.com/llvmbot updated https://github.com/llvm/llvm-project/pull/80720 >From 984fe4054a4e67ed3a781e15a4269a2a89b5f424 Mon Sep 17 00:00:00 2001 From: Dimitry Andric Date: Mon, 5 Feb 2024 17:41:12 +0100 Subject: [PATCH] [libc++] Rename __bit_reference template parameter to avoid conflict (#80661) As of 4d20cfcf4eb08217ed37c4d4c38dc395d7a66d26, `__bit_reference` contains a template `__fill_n` with a bool `_FillValue` parameter. Unfortunately there is a relatively widely used piece of scientific software called NetCDF, which exposes a (C) macro `_FillValue` in its public headers. When building the NetCDF C++ bindings, this quickly leads to compilation errors when the macro interferes with the template in `__bit_reference`. Rename the parameter to `_FillVal` to avoid the conflict. (cherry picked from commit 1ec252298925de50b27930c557ba9de3cc397afe) --- libcxx/include/__bit_reference | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/libcxx/include/__bit_reference b/libcxx/include/__bit_reference index 9032b8f0180937..3a5339b72ddc31 100644 --- a/libcxx/include/__bit_reference +++ b/libcxx/include/__bit_reference @@ -173,7 +173,7 @@ private: // fill_n -template +template _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_HIDE_FROM_ABI void __fill_n(__bit_iterator<_Cp, false> __first, typename _Cp::size_type __n) { using _It= __bit_iterator<_Cp, false>; @@ -185,7 +185,7 @@ __fill_n(__bit_iterator<_Cp, false> __first, typename _Cp::size_type __n) { __storage_type __clz_f = static_cast<__storage_type>(__bits_per_word - __first.__ctz_); __storage_type __dn= std::min(__clz_f, __n); __storage_type __m = (~__storage_type(0) << __first.__ctz_) & (~__storage_type(0) >> (__clz_f - __dn)); -if (_FillValue) +if (_FillVal) *__first.__seg_ |= __m; else *__first.__seg_ &= ~__m; @@ -194,13 +194,13 @@ __fill_n(__bit_iterator<_Cp, false> __first, typename _Cp::size_type __n) { } // do middle whole words __storage_type __nw = __n / __bits_per_word; - 
std::fill_n(std::__to_address(__first.__seg_), __nw, _FillValue ? static_cast<__storage_type>(-1) : 0); + std::fill_n(std::__to_address(__first.__seg_), __nw, _FillVal ? static_cast<__storage_type>(-1) : 0); __n -= __nw * __bits_per_word; // do last partial word if (__n > 0) { __first.__seg_ += __nw; __storage_type __m = ~__storage_type(0) >> (__bits_per_word - __n); -if (_FillValue) +if (_FillVal) *__first.__seg_ |= __m; else *__first.__seg_ &= ~__m; @@ -1007,7 +1007,7 @@ private: friend class __bit_iterator<_Cp, true>; template friend struct __bit_array; - template + template _LIBCPP_CONSTEXPR_SINCE_CXX20 friend void __fill_n(__bit_iterator<_Dp, false> __first, typename _Dp::size_type __n); template
[llvm-branch-commits] [libcxx] 984fe40 - [libc++] Rename __bit_reference template parameter to avoid conflict (#80661)
Author: Dimitry Andric Date: 2024-02-05T13:21:41-08:00 New Revision: 984fe4054a4e67ed3a781e15a4269a2a89b5f424 URL: https://github.com/llvm/llvm-project/commit/984fe4054a4e67ed3a781e15a4269a2a89b5f424 DIFF: https://github.com/llvm/llvm-project/commit/984fe4054a4e67ed3a781e15a4269a2a89b5f424.diff LOG: [libc++] Rename __bit_reference template parameter to avoid conflict (#80661) As of 4d20cfcf4eb08217ed37c4d4c38dc395d7a66d26, `__bit_reference` contains a template `__fill_n` with a bool `_FillValue` parameter. Unfortunately there is a relatively widely used piece of scientific software called NetCDF, which exposes a (C) macro `_FillValue` in its public headers. When building the NetCDF C++ bindings, this quickly leads to compilation errors when the macro interferes with the template in `__bit_reference`. Rename the parameter to `_FillVal` to avoid the conflict. (cherry picked from commit 1ec252298925de50b27930c557ba9de3cc397afe) Added: Modified: libcxx/include/__bit_reference Removed: diff --git a/libcxx/include/__bit_reference b/libcxx/include/__bit_reference index 9032b8f018093..3a5339b72ddc3 100644 --- a/libcxx/include/__bit_reference +++ b/libcxx/include/__bit_reference @@ -173,7 +173,7 @@ private: // fill_n -template +template _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_HIDE_FROM_ABI void __fill_n(__bit_iterator<_Cp, false> __first, typename _Cp::size_type __n) { using _It= __bit_iterator<_Cp, false>; @@ -185,7 +185,7 @@ __fill_n(__bit_iterator<_Cp, false> __first, typename _Cp::size_type __n) { __storage_type __clz_f = static_cast<__storage_type>(__bits_per_word - __first.__ctz_); __storage_type __dn= std::min(__clz_f, __n); __storage_type __m = (~__storage_type(0) << __first.__ctz_) & (~__storage_type(0) >> (__clz_f - __dn)); -if (_FillValue) +if (_FillVal) *__first.__seg_ |= __m; else *__first.__seg_ &= ~__m; @@ -194,13 +194,13 @@ __fill_n(__bit_iterator<_Cp, false> __first, typename _Cp::size_type __n) { } // do middle whole words __storage_type __nw = __n / 
__bits_per_word; - std::fill_n(std::__to_address(__first.__seg_), __nw, _FillValue ? static_cast<__storage_type>(-1) : 0); + std::fill_n(std::__to_address(__first.__seg_), __nw, _FillVal ? static_cast<__storage_type>(-1) : 0); __n -= __nw * __bits_per_word; // do last partial word if (__n > 0) { __first.__seg_ += __nw; __storage_type __m = ~__storage_type(0) >> (__bits_per_word - __n); -if (_FillValue) +if (_FillVal) *__first.__seg_ |= __m; else *__first.__seg_ &= ~__m; @@ -1007,7 +1007,7 @@ private: friend class __bit_iterator<_Cp, true>; template friend struct __bit_array; - template + template _LIBCPP_CONSTEXPR_SINCE_CXX20 friend void __fill_n(__bit_iterator<_Dp, false> __first, typename _Dp::size_type __n); template
[llvm-branch-commits] [libcxx] PR for llvm/llvm-project#80718 (PR #80720)
https://github.com/tstellar closed https://github.com/llvm/llvm-project/pull/80720
[llvm-branch-commits] [clang] [18.x][Docs] Add release note about Clang-defined target OS macros (PR #80044)
tstellar wrote: Looks like this patch caused the documentation build to fail. https://github.com/llvm/llvm-project/pull/80044
[llvm-branch-commits] [llvm] PR for llvm/llvm-project#80752 (PR #80754)
https://github.com/llvmbot created https://github.com/llvm/llvm-project/pull/80754 resolves llvm/llvm-project#80752 >From 3ac083df943f040770b9d324956fb066bb8db27d Mon Sep 17 00:00:00 2001 From: Billy Laws Date: Wed, 31 Jan 2024 02:32:15 + Subject: [PATCH 1/2] [AArch64] Fix variadic tail-calls on ARM64EC (#79774) ARM64EC varargs calls expect that x4 = sp at entry, special handling is needed to ensure this with tail calls since they occur after the epilogue and the x4 write happens before. I tried going through AArch64MachineFrameLowering for this, hoping to avoid creating the dummy object but this was the best I could do since the stack info that uses isn't populated at this stage, CreateFixedObject also explicitly forbids 0 sized objects. (cherry picked from commit c761b4a5e4cc003a2c850898e1dc67d2637cfb0c) --- .../Target/AArch64/AArch64ISelLowering.cpp| 10 - llvm/test/CodeGen/AArch64/arm64ec-varargs.ll | 37 +++ llvm/test/CodeGen/AArch64/vararg-tallcall.ll | 8 3 files changed, 54 insertions(+), 1 deletion(-) diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp index e97f5e3220148..957b556edaf31 100644 --- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp +++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp @@ -8007,11 +8007,19 @@ AArch64TargetLowering::LowerCall(CallLoweringInfo &CLI, } if (IsVarArg && Subtarget->isWindowsArm64EC()) { +SDValue ParamPtr = StackPtr; +if (IsTailCall) { + // Create a dummy object at the top of the stack that can be used to get + // the SP after the epilogue + int FI = MF.getFrameInfo().CreateFixedObject(1, FPDiff, true); + ParamPtr = DAG.getFrameIndex(FI, PtrVT); +} + // For vararg calls, the Arm64EC ABI requires values in x4 and x5 // describing the argument list. x4 contains the address of the // first stack parameter. x5 contains the size in bytes of all parameters // passed on the stack. 
-RegsToPass.emplace_back(AArch64::X4, StackPtr); +RegsToPass.emplace_back(AArch64::X4, ParamPtr); RegsToPass.emplace_back(AArch64::X5, DAG.getConstant(NumBytes, DL, MVT::i64)); } diff --git a/llvm/test/CodeGen/AArch64/arm64ec-varargs.ll b/llvm/test/CodeGen/AArch64/arm64ec-varargs.ll index dc16b3a1a0f27..844fc52ddade6 100644 --- a/llvm/test/CodeGen/AArch64/arm64ec-varargs.ll +++ b/llvm/test/CodeGen/AArch64/arm64ec-varargs.ll @@ -100,5 +100,42 @@ define void @varargs_many_argscalleer() nounwind { ret void } +define void @varargs_caller_tail() nounwind { +; CHECK-LABEL: varargs_caller_tail: +; CHECK:// %bb.0: +; CHECK-NEXT:sub sp, sp, #48 +; CHECK-NEXT:mov x4, sp +; CHECK-NEXT:add x8, sp, #16 +; CHECK-NEXT:mov x9, #4617315517961601024// =0x4014 +; CHECK-NEXT:mov x0, #4607182418800017408// =0x3ff0 +; CHECK-NEXT:mov w1, #2 // =0x2 +; CHECK-NEXT:mov x2, #4613937818241073152// =0x4008 +; CHECK-NEXT:mov w3, #4 // =0x4 +; CHECK-NEXT:mov w5, #16 // =0x10 +; CHECK-NEXT:stp xzr, x30, [sp, #24] // 8-byte Folded Spill +; CHECK-NEXT:stp x9, x8, [sp] +; CHECK-NEXT:str xzr, [sp, #16] +; CHECK-NEXT:.weak_anti_dep varargs_callee +; CHECK-NEXT:.set varargs_callee, "#varargs_callee"@WEAKREF +; CHECK-NEXT:.weak_anti_dep "#varargs_callee" +; CHECK-NEXT:.set "#varargs_callee", varargs_callee@WEAKREF +; CHECK-NEXT:bl "#varargs_callee" +; CHECK-NEXT:ldr x30, [sp, #32] // 8-byte Folded Reload +; CHECK-NEXT:add x4, sp, #48 +; CHECK-NEXT:mov x0, #4607182418800017408// =0x3ff0 +; CHECK-NEXT:mov w1, #4 // =0x4 +; CHECK-NEXT:mov w2, #3 // =0x3 +; CHECK-NEXT:mov w3, #2 // =0x2 +; CHECK-NEXT:mov x5, xzr +; CHECK-NEXT:add sp, sp, #48 +; CHECK-NEXT:.weak_anti_dep varargs_callee +; CHECK-NEXT:.set varargs_callee, "#varargs_callee"@WEAKREF +; CHECK-NEXT:.weak_anti_dep "#varargs_callee" +; CHECK-NEXT:.set "#varargs_callee", varargs_callee@WEAKREF +; CHECK-NEXT:b "#varargs_callee" + call void (double, ...) 
@varargs_callee(double 1.0, i32 2, double 3.0, i32 4, double 5.0, <2 x double> ) + tail call void (double, ...) @varargs_callee(double 1.0, i32 4, i32 3, i32 2) + ret void +} declare void @llvm.va_start(ptr) diff --git a/llvm/test/CodeGen/AArch64/vararg-tallcall.ll b/llvm/test/CodeGen/AArch64/vararg-tallcall.ll index 2d6db1642247d..812837639196e 100644 --- a/llvm/test/CodeGen/AArch64/vararg-tallcall.ll +++ b/llvm/test/CodeGen/AArch64/vara
[llvm-branch-commits] [llvm] PR for llvm/llvm-project#80752 (PR #80754)
https://github.com/llvmbot milestoned https://github.com/llvm/llvm-project/pull/80754
[llvm-branch-commits] [llvm] PR for llvm/llvm-project#80752 (PR #80754)
llvmbot wrote: @efriedma-quic @cjacek What do you think about merging this PR to the release branch? https://github.com/llvm/llvm-project/pull/80754
[llvm-branch-commits] [llvm] PR for llvm/llvm-project#80752 (PR #80754)
llvmbot wrote: @llvm/pr-subscribers-backend-aarch64 Author: None (llvmbot) Changes resolves llvm/llvm-project#80752 --- Full diff: https://github.com/llvm/llvm-project/pull/80754.diff 5 Files Affected: - (modified) llvm/lib/Target/AArch64/AArch64Arm64ECCallLowering.cpp (+27-21) - (modified) llvm/lib/Target/AArch64/AArch64ISelLowering.cpp (+9-1) - (modified) llvm/test/CodeGen/AArch64/arm64ec-entry-thunks.ll (+2-2) - (modified) llvm/test/CodeGen/AArch64/arm64ec-varargs.ll (+37) - (modified) llvm/test/CodeGen/AArch64/vararg-tallcall.ll (+8) ``diff diff --git a/llvm/lib/Target/AArch64/AArch64Arm64ECCallLowering.cpp b/llvm/lib/Target/AArch64/AArch64Arm64ECCallLowering.cpp index 11248bb7aef31..91b4f18c73c93 100644 --- a/llvm/lib/Target/AArch64/AArch64Arm64ECCallLowering.cpp +++ b/llvm/lib/Target/AArch64/AArch64Arm64ECCallLowering.cpp @@ -43,6 +43,8 @@ static cl::opt GenerateThunks("arm64ec-generate-thunks", cl::Hidden, namespace { +enum class ThunkType { GuestExit, Entry, Exit }; + class AArch64Arm64ECCallLowering : public ModulePass { public: static char ID; @@ -69,14 +71,14 @@ class AArch64Arm64ECCallLowering : public ModulePass { Type *I64Ty; Type *VoidTy; - void getThunkType(FunctionType *FT, AttributeList AttrList, bool EntryThunk, + void getThunkType(FunctionType *FT, AttributeList AttrList, ThunkType TT, raw_ostream &Out, FunctionType *&Arm64Ty, FunctionType *&X64Ty); void getThunkRetType(FunctionType *FT, AttributeList AttrList, raw_ostream &Out, Type *&Arm64RetTy, Type *&X64RetTy, SmallVectorImpl &Arm64ArgTypes, SmallVectorImpl &X64ArgTypes, bool &HasSretPtr); - void getThunkArgTypes(FunctionType *FT, AttributeList AttrList, + void getThunkArgTypes(FunctionType *FT, AttributeList AttrList, ThunkType TT, raw_ostream &Out, SmallVectorImpl &Arm64ArgTypes, SmallVectorImpl &X64ArgTypes, bool HasSretPtr); @@ -89,10 +91,11 @@ class AArch64Arm64ECCallLowering : public ModulePass { void AArch64Arm64ECCallLowering::getThunkType(FunctionType *FT, AttributeList AttrList, - 
bool EntryThunk, raw_ostream &Out, + ThunkType TT, raw_ostream &Out, FunctionType *&Arm64Ty, FunctionType *&X64Ty) { - Out << (EntryThunk ? "$ientry_thunk$cdecl$" : "$iexit_thunk$cdecl$"); + Out << (TT == ThunkType::Entry ? "$ientry_thunk$cdecl$" + : "$iexit_thunk$cdecl$"); Type *Arm64RetTy; Type *X64RetTy; @@ -102,8 +105,8 @@ void AArch64Arm64ECCallLowering::getThunkType(FunctionType *FT, // The first argument to a thunk is the called function, stored in x9. // For exit thunks, we pass the called function down to the emulator; - // for entry thunks, we just call the Arm64 function directly. - if (!EntryThunk) + // for entry/guest exit thunks, we just call the Arm64 function directly. + if (TT == ThunkType::Exit) Arm64ArgTypes.push_back(PtrTy); X64ArgTypes.push_back(PtrTy); @@ -111,14 +114,16 @@ void AArch64Arm64ECCallLowering::getThunkType(FunctionType *FT, getThunkRetType(FT, AttrList, Out, Arm64RetTy, X64RetTy, Arm64ArgTypes, X64ArgTypes, HasSretPtr); - getThunkArgTypes(FT, AttrList, Out, Arm64ArgTypes, X64ArgTypes, HasSretPtr); + getThunkArgTypes(FT, AttrList, TT, Out, Arm64ArgTypes, X64ArgTypes, + HasSretPtr); - Arm64Ty = FunctionType::get(Arm64RetTy, Arm64ArgTypes, false); + Arm64Ty = FunctionType::get(Arm64RetTy, Arm64ArgTypes, + TT == ThunkType::Entry && FT->isVarArg()); X64Ty = FunctionType::get(X64RetTy, X64ArgTypes, false); } void AArch64Arm64ECCallLowering::getThunkArgTypes( -FunctionType *FT, AttributeList AttrList, raw_ostream &Out, +FunctionType *FT, AttributeList AttrList, ThunkType TT, raw_ostream &Out, SmallVectorImpl &Arm64ArgTypes, SmallVectorImpl &X64ArgTypes, bool HasSretPtr) { @@ -151,14 +156,16 @@ void AArch64Arm64ECCallLowering::getThunkArgTypes( X64ArgTypes.push_back(I64Ty); } -// x4 -Arm64ArgTypes.push_back(PtrTy); -X64ArgTypes.push_back(PtrTy); -// x5 -Arm64ArgTypes.push_back(I64Ty); -// FIXME: x5 isn't actually passed/used by the x64 side; revisit once we -// have proper isel for varargs -X64ArgTypes.push_back(I64Ty); +if (TT != 
ThunkType::Entry) { + // x4 + Arm64ArgTypes.push_back(PtrTy); + X64ArgTypes.push_back(PtrTy); + // x5 + Arm64ArgTypes.push_back(I64Ty); + // FIXME: x5 isn't actually passed/used by the x64 side; revisit once we + // have proper isel for varargs + X64ArgTypes.push_back(
[llvm-branch-commits] [compiler-rt] [llvm] [NFC] (PR #80762)
https://github.com/minglotus-6 created https://github.com/llvm/llvm-project/pull/80762 None
[llvm-branch-commits] [compiler-rt] [llvm] [NFC] (PR #80762)
bwendling wrote: Please add a title and description. https://github.com/llvm/llvm-project/pull/80762
[llvm-branch-commits] [llvm] PR for llvm/llvm-project#80715 (PR #80716)
https://github.com/tstellar closed https://github.com/llvm/llvm-project/pull/80716