[llvm-branch-commits] [llvm] 908c1ba - [RISCV] Fix incorrect extend type in vwmulsu combine.

2022-02-21 Thread Tom Stellard via llvm-branch-commits

Author: Craig Topper
Date: 2022-02-21T11:20:52-08:00
New Revision: 908c1bae6e7fa2339e8b9b7856d849f20e98f653

URL: 
https://github.com/llvm/llvm-project/commit/908c1bae6e7fa2339e8b9b7856d849f20e98f653
DIFF: 
https://github.com/llvm/llvm-project/commit/908c1bae6e7fa2339e8b9b7856d849f20e98f653.diff

LOG: [RISCV] Fix incorrect extend type in vwmulsu combine.

While matching a widening multiply, if we matched an extend from i8->i32,
i16->i64, or i8->i64, we need to reintroduce a narrower extend. If we're
matching a vwmulsu, we need to use a sext for op0 and a zext for op1.

This bug exists in LLVM 14 and will need to be backported.

Differential Revision: https://reviews.llvm.org/D119618

(cherry picked from commit 478c237e21b2c3a83e46f26fcbeb3876682f9b14)
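
For illustration, here is a hand-written sketch (not part of the commit) of
the kind of IR the combine matches. Both operands are extended from i8
straight to i32, so when the multiply becomes a vwmulsu at SEW=16 the combine
has to materialize the missing i8->i16 extends itself: a sext for the signed
operand and a zext for the unsigned one.

```
define <2 x i32> @example(<2 x i8> %s, <2 x i8> %u) {
  %ss = sext <2 x i8> %s to <2 x i32>  ; signed side: needs an i8->i16 vsext
  %uu = zext <2 x i8> %u to <2 x i32>  ; unsigned side: needs an i8->i16 vzext
  %m  = mul <2 x i32> %ss, %uu         ; becomes vwmulsu.vv at SEW=16
  ret <2 x i32> %m
}
```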

Added: 


Modified: 
llvm/lib/Target/RISCV/RISCVISelLowering.cpp
llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vwmulsu.ll

Removed: 




diff  --git a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp 
b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
index 97d24c8e9c0b..2fe491ad5ea4 100644
--- a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+++ b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
@@ -7443,6 +7443,8 @@ static SDValue combineMUL_VLToVWMUL_VL(SDNode *N, 
SelectionDAG &DAG,
   unsigned ExtOpc = IsSignExt ? RISCVISD::VSEXT_VL : RISCVISD::VZEXT_VL;
   if (Op0.getValueType() != NarrowVT)
 Op0 = DAG.getNode(ExtOpc, DL, NarrowVT, Op0, Mask, VL);
+  // vwmulsu requires second operand to be zero extended.
+  ExtOpc = IsVWMULSU ? RISCVISD::VZEXT_VL : ExtOpc;
   if (Op1.getValueType() != NarrowVT)
 Op1 = DAG.getNode(ExtOpc, DL, NarrowVT, Op1, Mask, VL);
 

diff  --git a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vwmulsu.ll 
b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vwmulsu.ll
index 6c204b24ae2b..ffe4c5613090 100644
--- a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vwmulsu.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vwmulsu.ll
@@ -375,7 +375,7 @@ define <2 x i32> @vwmulsu_v2i32_v2i8(<2 x i8>* %x, <2 x i8>* %y) {
 ; CHECK-NEXT:vle8.v v8, (a0)
 ; CHECK-NEXT:vle8.v v9, (a1)
 ; CHECK-NEXT:vsetvli zero, zero, e16, mf4, ta, mu
-; CHECK-NEXT:vsext.vf2 v10, v8
+; CHECK-NEXT:vzext.vf2 v10, v8
 ; CHECK-NEXT:vsext.vf2 v11, v9
 ; CHECK-NEXT:vwmulsu.vv v8, v11, v10
 ; CHECK-NEXT:ret
@@ -394,7 +394,7 @@ define <4 x i32> @vwmulsu_v4i32_v4i8_v4i16(<4 x i8>* %x, <4 x i16>* %y) {
 ; CHECK-NEXT:vle8.v v8, (a0)
 ; CHECK-NEXT:vle16.v v9, (a1)
 ; CHECK-NEXT:vsetvli zero, zero, e16, mf2, ta, mu
-; CHECK-NEXT:vsext.vf2 v10, v8
+; CHECK-NEXT:vzext.vf2 v10, v8
 ; CHECK-NEXT:vwmulsu.vv v8, v9, v10
 ; CHECK-NEXT:ret
   %a = load <4 x i8>, <4 x i8>* %x



___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] 1e34070 - [PowerPC] Add default handling for single element vectors, and split/promote vNi1 vectors.

2022-02-21 Thread Tom Stellard via llvm-branch-commits

Author: Amy Kwan
Date: 2022-02-21T11:20:52-08:00
New Revision: 1e340705f142ee7379272f94df888d6ddaaa32db

URL: 
https://github.com/llvm/llvm-project/commit/1e340705f142ee7379272f94df888d6ddaaa32db
DIFF: 
https://github.com/llvm/llvm-project/commit/1e340705f142ee7379272f94df888d6ddaaa32db.diff

LOG: [PowerPC] Add default handling for single element vectors, and 
split/promote vNi1 vectors.

This patch updates the handling of vectors in getPreferredVectorAction():

For single-element and scalable vectors, fall back to default vector
legalization handling. For vNi1 vectors, add handling to either split or
promote them in order to prevent the production of wide v256i1/v512i1 types.

The following assertion is fixed by this patch, as we ended up producing the
wide vector types (that are used for MMA) in the backend prior to this fix.

```
Assertion failed: VT.getSizeInBits() == Operand.getValueSizeInBits() &&
"Cannot BITCAST between types of different sizes!"
```

Differential Revision: https://reviews.llvm.org/D119521

(cherry picked from commit ac5a5a9cfe7c83ee5fbbc48118b4239e7e6cf6c9)
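
As a rough mirror of the new preference order, here is a standalone sketch
with invented names (not the LLVM API; the real logic is in
getPreferredVectorAction() in the diff below):

```
#include <cstdio>

enum Action { Default, SplitVector, PromoteInteger, WidenVector };

// Sketch: decide the legalization action from element width and count.
Action preferredAction(unsigned EltBits, unsigned NumElts, bool Scalable) {
  if (Scalable || NumElts == 1)
    return Default;                // fall back to generic handling
  if (EltBits == 1 && EltBits * NumElts > 16)
    return SplitVector;            // halve wide vNi1 types (v256i1/v512i1)
  if (EltBits == 1)
    return PromoteInteger;         // small vNi1: promote the elements
  if (EltBits % 8 == 0)
    return WidenVector;            // byte-sized elements: widen
  return Default;
}

int main() { std::printf("%d\n", preferredAction(1, 256, false)); } // prints 1
```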

Added: 
llvm/test/CodeGen/PowerPC/p10-handle-split-promote-vec.ll

Modified: 
llvm/lib/Target/PowerPC/PPCISelLowering.h

Removed: 




diff  --git a/llvm/lib/Target/PowerPC/PPCISelLowering.h 
b/llvm/lib/Target/PowerPC/PPCISelLowering.h
index eb52e4aa6273..b195b1f2556a 100644
--- a/llvm/lib/Target/PowerPC/PPCISelLowering.h
+++ b/llvm/lib/Target/PowerPC/PPCISelLowering.h
@@ -765,8 +765,19 @@ namespace llvm {
 /// then the VPERM for the shuffle. All in all a very slow sequence.
 TargetLoweringBase::LegalizeTypeAction getPreferredVectorAction(MVT VT)
   const override {
-  if (!VT.isScalableVector() && VT.getVectorNumElements() != 1 &&
-  VT.getScalarSizeInBits() % 8 == 0)
+  // Default handling for scalable and single-element vectors.
+  if (VT.isScalableVector() || VT.getVectorNumElements() == 1)
+return TargetLoweringBase::getPreferredVectorAction(VT);
+
+  // Split and promote vNi1 vectors so we don't produce v256i1/v512i1
+  // types as those are only for MMA instructions.
+  if (VT.getScalarSizeInBits() == 1 && VT.getSizeInBits() > 16)
+return TypeSplitVector;
+  if (VT.getScalarSizeInBits() == 1)
+return TypePromoteInteger;
+
+  // Widen vectors that have reasonably sized elements.
+  if (VT.getScalarSizeInBits() % 8 == 0)
 return TypeWidenVector;
   return TargetLoweringBase::getPreferredVectorAction(VT);
 }

diff  --git a/llvm/test/CodeGen/PowerPC/p10-handle-split-promote-vec.ll 
b/llvm/test/CodeGen/PowerPC/p10-handle-split-promote-vec.ll
new file mode 100644
index ..ad0bd404d313
--- /dev/null
+++ b/llvm/test/CodeGen/PowerPC/p10-handle-split-promote-vec.ll
@@ -0,0 +1,212 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc -verify-machineinstrs -mtriple=powerpc64le-unknown-linux-gnu \
+; RUN:   -mcpu=pwr10 -ppc-asm-full-reg-names -ppc-vsr-nums-as-vr \
+; RUN:   < %s | FileCheck %s
+; RUN: llc -verify-machineinstrs -mtriple=powerpc64-ibm-aix -vec-extabi \
+; RUN:   -mcpu=pwr10 < %s | FileCheck %s -check-prefix=CHECK-AIX
+
+define i32 @SplitPromoteVectorTest(i32 %Opc) align 2 {
+; CHECK-LABEL: SplitPromoteVectorTest:
+; CHECK:   # %bb.0: # %entry
+; CHECK-NEXT:plxv v3, .LCPI0_0@PCREL(0), 1
+; CHECK-NEXT:mtvsrws v2, r3
+; CHECK-NEXT:li r5, 4
+; CHECK-NEXT:li r8, 0
+; CHECK-NEXT:vcmpequw v3, v2, v3
+; CHECK-NEXT:vextubrx r6, r5, v3
+; CHECK-NEXT:vextubrx r4, r8, v3
+; CHECK-NEXT:rlwimi r4, r6, 1, 30, 30
+; CHECK-NEXT:li r6, 8
+; CHECK-NEXT:vextubrx r7, r6, v3
+; CHECK-NEXT:rlwimi r4, r7, 2, 29, 29
+; CHECK-NEXT:li r7, 12
+; CHECK-NEXT:vextubrx r9, r7, v3
+; CHECK-NEXT:plxv v3, .LCPI0_1@PCREL(0), 1
+; CHECK-NEXT:rlwimi r4, r9, 3, 28, 28
+; CHECK-NEXT:vcmpequw v3, v2, v3
+; CHECK-NEXT:vextubrx r9, r8, v3
+; CHECK-NEXT:rlwimi r4, r9, 4, 27, 27
+; CHECK-NEXT:vextubrx r9, r5, v3
+; CHECK-NEXT:rlwimi r4, r9, 5, 26, 26
+; CHECK-NEXT:vextubrx r9, r6, v3
+; CHECK-NEXT:rlwimi r4, r9, 6, 25, 25
+; CHECK-NEXT:vextubrx r9, r7, v3
+; CHECK-NEXT:plxv v3, .LCPI0_2@PCREL(0), 1
+; CHECK-NEXT:rlwimi r4, r9, 7, 24, 24
+; CHECK-NEXT:vcmpequw v3, v2, v3
+; CHECK-NEXT:vextubrx r9, r8, v3
+; CHECK-NEXT:rlwimi r4, r9, 8, 23, 23
+; CHECK-NEXT:vextubrx r9, r5, v3
+; CHECK-NEXT:rlwimi r4, r9, 9, 22, 22
+; CHECK-NEXT:vextubrx r9, r6, v3
+; CHECK-NEXT:rlwimi r4, r9, 10, 21, 21
+; CHECK-NEXT:vextubrx r9, r7, v3
+; CHECK-NEXT:plxv v3, .LCPI0_3@PCREL(0), 1
+; CHECK-NEXT:rlwimi r4, r9, 11, 20, 20
+; CHECK-NEXT:vcmpequw v3, v2, v3
+; CHECK-NEXT:vextubrx r9, r8, v3
+; CHECK-NEXT:rlwimi r4, r9, 12, 19, 19
+; CHECK-NEXT:vextubrx r9, r5, v3
+; CHECK-NEXT:rlwimi r4, 

[llvm-branch-commits] [llvm] 7d8e83d - [funcattrs] check reachability to improve noreturn

2022-02-21 Thread Tom Stellard via llvm-branch-commits

Author: Nick Desaulniers
Date: 2022-02-21T11:52:11-08:00
New Revision: 7d8e83dab37af0516bd9bbaafe818c29fbab7062

URL: 
https://github.com/llvm/llvm-project/commit/7d8e83dab37af0516bd9bbaafe818c29fbab7062
DIFF: 
https://github.com/llvm/llvm-project/commit/7d8e83dab37af0516bd9bbaafe818c29fbab7062.diff

LOG: [funcattrs] check reachability to improve noreturn

There was a FIXME in the code pertaining to attributing functions as
noreturn. By using reachability, we can conclude that if none of the
blocks reachable from the entry return, then the function is noreturn.

Previously, the code only checked whether any block returned; blocks
that are unreachable don't matter.

This improves codegen for the Linux kernel.

Fixes: https://github.com/ClangBuiltLinux/linux/issues/1563

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D119571

(cherry picked from commit 9dcb0061657e9b7f321fa6c295960c8f829ed6f1)
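
A hand-written example (hedged; the actual test update is in the diff below)
of a function that can now get the attribute: the only block reachable from
the entry cannot return, and the returning block is dead, so the function as
a whole is noreturn.

```
define i32 @f() {
entry:
  call void @will_not_return()
  unreachable

dead:                              ; no predecessors, unreachable from entry
  ret i32 0
}

declare void @will_not_return() noreturn
```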

Added: 


Modified: 
llvm/lib/Transforms/IPO/FunctionAttrs.cpp
llvm/test/Transforms/FunctionAttrs/noreturn.ll

Removed: 




diff  --git a/llvm/lib/Transforms/IPO/FunctionAttrs.cpp 
b/llvm/lib/Transforms/IPO/FunctionAttrs.cpp
index 213a998d5bba2..e2f1944cee63f 100644
--- a/llvm/lib/Transforms/IPO/FunctionAttrs.cpp
+++ b/llvm/lib/Transforms/IPO/FunctionAttrs.cpp
@@ -1614,6 +1614,26 @@ static bool basicBlockCanReturn(BasicBlock &BB) {
   return none_of(BB, instructionDoesNotReturn);
 }
 
+// FIXME: this doesn't handle recursion.
+static bool canReturn(Function &F) {
+  SmallVector<BasicBlock *, 16> Worklist;
+  SmallPtrSet<BasicBlock *, 16> Visited;
+
+  Visited.insert(&F.front());
+  Worklist.push_back(&F.front());
+
+  do {
+BasicBlock *BB = Worklist.pop_back_val();
+if (basicBlockCanReturn(*BB))
+  return true;
+for (BasicBlock *Succ : successors(BB))
+  if (Visited.insert(Succ).second)
+Worklist.push_back(Succ);
+  } while (!Worklist.empty());
+
+  return false;
+}
+
 // Set the noreturn function attribute if possible.
 static void addNoReturnAttrs(const SCCNodeSet &SCCNodes,
                              SmallSet<Function *, 8> &Changed) {
@@ -1622,9 +1642,7 @@ static void addNoReturnAttrs(const SCCNodeSet &SCCNodes,
 F->doesNotReturn())
   continue;
 
-// The function can return if any basic blocks can return.
-// FIXME: this doesn't handle recursion or unreachable blocks.
-if (none_of(*F, basicBlockCanReturn)) {
+if (!canReturn(*F)) {
   F->setDoesNotReturn();
   Changed.insert(F);
 }

diff  --git a/llvm/test/Transforms/FunctionAttrs/noreturn.ll 
b/llvm/test/Transforms/FunctionAttrs/noreturn.ll
index eba56c9630adb..6bc1e32ed5165 100644
--- a/llvm/test/Transforms/FunctionAttrs/noreturn.ll
+++ b/llvm/test/Transforms/FunctionAttrs/noreturn.ll
@@ -40,9 +40,8 @@ end:
   ret i32 %c
 }
 
-; CHECK-NOT: Function Attrs: {{.*}}noreturn
+; CHECK: Function Attrs: {{.*}}noreturn
 ; CHECK: @caller5()
-; We currently don't handle unreachable blocks.
 define i32 @caller5() {
 entry:
   %c = call i32 @noreturn()
@@ -87,4 +86,4 @@ define void @coro() "coroutine.presplit"="1" {
 }
 
 declare token @llvm.coro.id.retcon.once(i32 %size, i32 %align, i8* %buffer, 
i8* %prototype, i8* %alloc, i8* %free)
-declare i1 @llvm.coro.end(i8*, i1)
\ No newline at end of file
+declare i1 @llvm.coro.end(i8*, i1)



___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] e1b3afb - [SLP] Simplify indices processing for insertelements

2022-02-21 Thread Tom Stellard via llvm-branch-commits

Author: Anton Afanasyev
Date: 2022-02-21T11:51:51-08:00
New Revision: e1b3afbbdef10a8ee085dddfb24afddef2cf70b2

URL: 
https://github.com/llvm/llvm-project/commit/e1b3afbbdef10a8ee085dddfb24afddef2cf70b2
DIFF: 
https://github.com/llvm/llvm-project/commit/e1b3afbbdef10a8ee085dddfb24afddef2cf70b2.diff

LOG: [SLP] Simplify indices processing for insertelements

Get rid of non-constant and undef indices of insertelements
at the `buildTree()` stage. Fix bugs.

Differential Revision: https://reviews.llvm.org/D119623
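
As a hedged sketch (invented, not the added test) of the shape being handled:
an insertelement whose index is non-constant cannot be part of a vectorizable
buildvector, so buildTree() now bails out and gathers such bundles up front
instead of threading UndefMaskElem sentinels through later stages.

```
define <4 x float> @nonconst_index(<4 x float> %v, float %x, i32 %i) {
  %ins = insertelement <4 x float> %v, float %x, i32 %i
  ret <4 x float> %ins
}
```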

Added: 
llvm/test/Transforms/SLPVectorizer/X86/insert-crash-index.ll

Modified: 
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp

Removed: 




diff  --git a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp 
b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
index 25bf69729c70f..8a58b52df45d0 100644
--- a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
+++ b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
@@ -666,19 +666,18 @@ static void inversePermutation(ArrayRef Indices,
 
 /// \returns inserting index of InsertElement or InsertValue instruction,
 /// using Offset as base offset for index.
-static Optional<int> getInsertIndex(Value *InsertInst, unsigned Offset) {
+static Optional<unsigned> getInsertIndex(Value *InsertInst,
+                                         unsigned Offset = 0) {
   int Index = Offset;
   if (auto *IE = dyn_cast<InsertElementInst>(InsertInst)) {
     if (auto *CI = dyn_cast<ConstantInt>(IE->getOperand(2))) {
       auto *VT = cast<FixedVectorType>(IE->getType());
       if (CI->getValue().uge(VT->getNumElements()))
-        return UndefMaskElem;
+        return None;
       Index *= VT->getNumElements();
       Index += CI->getZExtValue();
       return Index;
     }
-    if (isa<UndefValue>(IE->getOperand(2)))
-      return UndefMaskElem;
     return None;
   }
 
@@ -3848,13 +3847,15 @@ void BoUpSLP::buildTree_rec(ArrayRef<Value *> VL, unsigned Depth,
   // Check that we have a buildvector and not a shuffle of 2 or more
   // 
diff erent vectors.
   ValueSet SourceVectors;
-      int MinIdx = std::numeric_limits<int>::max();
       for (Value *V : VL) {
         SourceVectors.insert(cast<InsertElementInst>(V)->getOperand(0));
-        Optional<int> Idx = *getInsertIndex(V, 0);
-        if (!Idx || *Idx == UndefMaskElem)
-          continue;
-        MinIdx = std::min(MinIdx, *Idx);
+if (getInsertIndex(V) == None) {
+  LLVM_DEBUG(dbgs() << "SLP: Gather of insertelement vectors with "
+   "non-constant or undef index.\n");
+  newTreeEntry(VL, None /*not vectorized*/, S, UserTreeIdx);
+  BS.cancelScheduling(VL, VL0);
+  return;
+}
   }
 
   if (count_if(VL, [&SourceVectors](Value *V) {
@@ -3876,10 +3877,8 @@ void BoUpSLP::buildTree_rec(ArrayRef<Value *> VL, unsigned Depth,
 decltype(OrdCompare)>
   Indices(OrdCompare);
   for (int I = 0, E = VL.size(); I < E; ++I) {
-        Optional<int> Idx = *getInsertIndex(VL[I], 0);
-        if (!Idx || *Idx == UndefMaskElem)
-          continue;
-        Indices.emplace(*Idx, I);
+unsigned Idx = *getInsertIndex(VL[I]);
+Indices.emplace(Idx, I);
   }
   OrdersType CurrentOrder(VL.size(), VL.size());
   bool IsIdentity = true;
@@ -5006,12 +5005,10 @@ InstructionCost BoUpSLP::getEntryCost(const TreeEntry *E,
       SmallVector<int> PrevMask(NumElts, UndefMaskElem);
   Mask.swap(PrevMask);
   for (unsigned I = 0; I < NumScalars; ++I) {
-        Optional<int> InsertIdx = getInsertIndex(VL[PrevMask[I]], 0);
-        if (!InsertIdx || *InsertIdx == UndefMaskElem)
-          continue;
-        DemandedElts.setBit(*InsertIdx);
-        IsIdentity &= *InsertIdx - Offset == I;
-        Mask[*InsertIdx - Offset] = I;
+unsigned InsertIdx = *getInsertIndex(VL[PrevMask[I]]);
+DemandedElts.setBit(InsertIdx);
+IsIdentity &= InsertIdx - Offset == I;
+Mask[InsertIdx - Offset] = I;
   }
   assert(Offset < NumElts && "Failed to find vector index offset");
 
@@ -5685,9 +5682,7 @@ InstructionCost BoUpSLP::getTreeCost(ArrayRef<Value *> VectorizedVals) {
     // to detect it as a final shuffled/identity match.
     if (auto *VU = dyn_cast_or_null<InsertElementInst>(EU.User)) {
       if (auto *FTy = dyn_cast<FixedVectorType>(VU->getType())) {
-        Optional<int> InsertIdx = getInsertIndex(VU, 0);
-        if (!InsertIdx || *InsertIdx == UndefMaskElem)
-          continue;
+        unsigned InsertIdx = *getInsertIndex(VU);
         auto *It = find_if(FirstUsers, [VU](Value *V) {
           return areTwoInsertFromSameBuildVector(VU,
                                                  cast<InsertElementInst>(V));
@@ -5717,9 +5712,8 @@ InstructionCost BoUpSLP::getTreeCost(ArrayRef<Value *> VectorizedVals) {
 } else {
   VecId = std::distance(FirstUsers.begin(), It);
 }
-int Idx = *InsertIdx;
-ShuffleMask[VecId][Idx] = EU.Lane;
-DemandedElts[VecId].setBit(Idx);
+ShuffleMask[VecId][InsertIdx] = EU.Lane;
+DemandedElts[VecId]

[llvm-branch-commits] [libcxx] c06cc1c - [libc++] Fix std::__debug_less in c++17.

2022-02-21 Thread Tom Stellard via llvm-branch-commits

Author: Jordan Rupprecht
Date: 2022-02-21T11:52:32-08:00
New Revision: c06cc1c3a7f8e20f6243433d3d15986d8758be30

URL: 
https://github.com/llvm/llvm-project/commit/c06cc1c3a7f8e20f6243433d3d15986d8758be30
DIFF: 
https://github.com/llvm/llvm-project/commit/c06cc1c3a7f8e20f6243433d3d15986d8758be30.diff

LOG: [libc++] Fix std::__debug_less in c++17.

b07b5bd72716625e0976a84d23652d94d8d0165a adds a use of `__comp_ref_type.h` to
`std::min`. When libc++ is built with `-D_LIBCPP_DEBUG=0`, this enables
`std::__debug_less`, which is only marked constexpr after C++17.

`std::min` itself is marked `constexpr` as of C++14, so by extension,
`std::__debug_less` should also be marked `constexpr` for the same versions so
that `std::min` can use it. This change lowers the guard from `> 17` to `> 11`.

Reproducer in godbolt: https://godbolt.org/z/ans3TGsj8

```
#include <algorithm>

constexpr int x() { return std::min({1, 2, 3, 4}); }

static_assert(x() == 1);
```

Reviewed By: #libc, philnik, Quuxplusone, ldionne

Differential Revision: https://reviews.llvm.org/D118940

(cherry picked from commit 99e5c5256ff2eeebd977f2c82567f52b1e534a1f)

Added: 


Modified: 
libcxx/include/__algorithm/comp_ref_type.h
libcxx/test/libcxx/algorithms/debug_less.pass.cpp

Removed: 




diff  --git a/libcxx/include/__algorithm/comp_ref_type.h 
b/libcxx/include/__algorithm/comp_ref_type.h
index 6cc6405686f5d..0802d2496f5c0 100644
--- a/libcxx/include/__algorithm/comp_ref_type.h
+++ b/libcxx/include/__algorithm/comp_ref_type.h
@@ -28,11 +28,11 @@ template <class _Compare>
 struct __debug_less
 {
 _Compare &__comp_;
-_LIBCPP_CONSTEXPR_AFTER_CXX17
+_LIBCPP_CONSTEXPR_AFTER_CXX11
 __debug_less(_Compare& __c) : __comp_(__c) {}
 
 template <class _Tp, class _Up>
-_LIBCPP_CONSTEXPR_AFTER_CXX17
+_LIBCPP_CONSTEXPR_AFTER_CXX11
 bool operator()(const _Tp& __x,  const _Up& __y)
 {
 bool __r = __comp_(__x, __y);
@@ -42,7 +42,7 @@ struct __debug_less
 }
 
 template <class _Tp, class _Up>
-_LIBCPP_CONSTEXPR_AFTER_CXX17
+_LIBCPP_CONSTEXPR_AFTER_CXX11
 bool operator()(_Tp& __x,  _Up& __y)
 {
 bool __r = __comp_(__x, __y);
@@ -52,7 +52,7 @@ struct __debug_less
 }
 
 template <class _LHS, class _RHS>
-_LIBCPP_CONSTEXPR_AFTER_CXX17
+_LIBCPP_CONSTEXPR_AFTER_CXX11
 inline _LIBCPP_INLINE_VISIBILITY
 decltype((void)declval<_Compare&>()(
 declval<_LHS &>(), declval<_RHS &>()))
@@ -62,7 +62,7 @@ struct __debug_less
 }
 
 template <class _LHS, class _RHS>
-_LIBCPP_CONSTEXPR_AFTER_CXX17
+_LIBCPP_CONSTEXPR_AFTER_CXX11
 inline _LIBCPP_INLINE_VISIBILITY
 void __do_compare_assert(long, _LHS &, _RHS &) {}
 };

diff  --git a/libcxx/test/libcxx/algorithms/debug_less.pass.cpp 
b/libcxx/test/libcxx/algorithms/debug_less.pass.cpp
index 6dd56955001d0..46735cd31c5cf 100644
--- a/libcxx/test/libcxx/algorithms/debug_less.pass.cpp
+++ b/libcxx/test/libcxx/algorithms/debug_less.pass.cpp
@@ -270,7 +270,7 @@ void test_value_categories() {
 assert(dl(static_cast(1), static_cast(2)));
 }
 
-#if TEST_STD_VER > 17
+#if TEST_STD_VER > 11
 constexpr bool test_constexpr() {
 std::less<> cmp{};
 __debug_less<std::less<> > dcmp(cmp);
@@ -287,7 +287,7 @@ int main(int, char**) {
 test_non_const_arg_cmp();
 test_value_iterator();
 test_value_categories();
-#if TEST_STD_VER > 17
+#if TEST_STD_VER > 11
 static_assert(test_constexpr(), "");
 #endif
 return 0;



___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] f3cfaf8 - [MemoryBuiltins][FIX] Adjust index type size properly wrt. AS casts

2022-02-21 Thread Tom Stellard via llvm-branch-commits

Author: Johannes Doerfert
Date: 2022-02-21T13:47:25-08:00
New Revision: f3cfaf8bc8eb772417294c2d3c847cd750337224

URL: 
https://github.com/llvm/llvm-project/commit/f3cfaf8bc8eb772417294c2d3c847cd750337224
DIFF: 
https://github.com/llvm/llvm-project/commit/f3cfaf8bc8eb772417294c2d3c847cd750337224.diff

LOG: [MemoryBuiltins][FIX] Adjust index type size properly wrt. AS casts

Use existing functionality to strip constant offsets, which works well
with AS casts and avoids code duplication.

Since we strip AS casts during the computation of the offset, we also
need to adjust the APInt properly to avoid mismatches in the bit width.
This code ensures the caller of `compute` sees APInts that match the
index type size of the value passed to `compute`, not that of the value
left after stripping the pointer casts.

Fixes #53559.

Differential Revision: https://reviews.llvm.org/D118727

(cherry picked from commit 29c8ebad10b4c349a185438fed52e08426d603e1)
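
A hand-written illustration (assumed, not from the commit) of the situation
being fixed: the value handed to compute() lives in a 64-bit address space,
but stripping the addrspacecast lands on a 32-bit pointer, so the offset
APInt must be resized back to 64 bits before being returned to the caller.

```
target datalayout = "p:64:64-p5:32:32"

define i64 @size(i8 addrspace(5)* %p) {
  %c = addrspacecast i8 addrspace(5)* %p to i8*
  %g = getelementptr i8, i8* %c, i64 4
  %s = call i64 @llvm.objectsize.i64.p0i8(i8* %g, i1 false, i1 true, i1 true)
  ret i64 %s
}

declare i64 @llvm.objectsize.i64.p0i8(i8*, i1, i1, i1)
```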

Added: 


Modified: 
llvm/include/llvm/Analysis/MemoryBuiltins.h
llvm/lib/Analysis/MemoryBuiltins.cpp
llvm/test/Transforms/InstCombine/builtin-dynamic-object-size.ll

Removed: 




diff  --git a/llvm/include/llvm/Analysis/MemoryBuiltins.h 
b/llvm/include/llvm/Analysis/MemoryBuiltins.h
index d5b60ee540e06..ce4413682bdc8 100644
--- a/llvm/include/llvm/Analysis/MemoryBuiltins.h
+++ b/llvm/include/llvm/Analysis/MemoryBuiltins.h
@@ -210,7 +210,6 @@ class ObjectSizeOffsetVisitor
   SizeOffsetType visitConstantPointerNull(ConstantPointerNull&);
   SizeOffsetType visitExtractElementInst(ExtractElementInst &I);
   SizeOffsetType visitExtractValueInst(ExtractValueInst &I);
-  SizeOffsetType visitGEPOperator(GEPOperator &GEP);
   SizeOffsetType visitGlobalAlias(GlobalAlias &GA);
   SizeOffsetType visitGlobalVariable(GlobalVariable &GV);
   SizeOffsetType visitIntToPtrInst(IntToPtrInst&);
@@ -221,6 +220,7 @@ class ObjectSizeOffsetVisitor
   SizeOffsetType visitInstruction(Instruction &I);
 
 private:
+  SizeOffsetType computeImpl(Value *V);
   bool CheckedZextOrTrunc(APInt &I);
 };
 

diff  --git a/llvm/lib/Analysis/MemoryBuiltins.cpp 
b/llvm/lib/Analysis/MemoryBuiltins.cpp
index 208f93aa1ac63..9e26f292b789a 100644
--- a/llvm/lib/Analysis/MemoryBuiltins.cpp
+++ b/llvm/lib/Analysis/MemoryBuiltins.cpp
@@ -573,18 +573,48 @@ ObjectSizeOffsetVisitor::ObjectSizeOffsetVisitor(const 
DataLayout &DL,
 }
 
 SizeOffsetType ObjectSizeOffsetVisitor::compute(Value *V) {
+  unsigned InitialIntTyBits = DL.getIndexTypeSizeInBits(V->getType());
+
+  // Stripping pointer casts can strip address space casts which can change the
+  // index type size. The invariant is that we use the value type to determine
+  // the index type size and if we stripped address space casts we have to
+  // readjust the APInt as we pass it upwards in order for the APInt to match
+  // the type the caller passed in.
+  APInt Offset(InitialIntTyBits, 0);
+  V = V->stripAndAccumulateConstantOffsets(
+  DL, Offset, /* AllowNonInbounds */ true, /* AllowInvariantGroup */ true);
+
+  // Later we use the index type size and zero but it will match the type of the
+  // value that is passed to computeImpl.
   IntTyBits = DL.getIndexTypeSizeInBits(V->getType());
   Zero = APInt::getZero(IntTyBits);
 
-  V = V->stripPointerCasts();
+  bool IndexTypeSizeChanged = InitialIntTyBits != IntTyBits;
+  if (!IndexTypeSizeChanged && Offset.isZero())
+return computeImpl(V);
+
+  // We stripped an address space cast that changed the index type size or we
+  // accumulated some constant offset (or both). Readjust the bit width to match
+  // the argument index type size and apply the offset, as required.
+  SizeOffsetType SOT = computeImpl(V);
+  if (IndexTypeSizeChanged) {
+if (knownSize(SOT) && !::CheckedZextOrTrunc(SOT.first, InitialIntTyBits))
+  SOT.first = APInt();
+    if (knownOffset(SOT) && !::CheckedZextOrTrunc(SOT.second, InitialIntTyBits))
+      SOT.second = APInt();
+  }
+  // If the computed offset is "unknown" we cannot add the stripped offset.
+  return {SOT.first,
+  SOT.second.getBitWidth() > 1 ? SOT.second + Offset : SOT.second};
+}
+
+SizeOffsetType ObjectSizeOffsetVisitor::computeImpl(Value *V) {
   if (Instruction *I = dyn_cast<Instruction>(V)) {
     // If we have already seen this instruction, bail out. Cycles can happen in
     // unreachable code after constant propagation.
     if (!SeenInsts.insert(I).second)
       return unknown();
 
-    if (GEPOperator *GEP = dyn_cast<GEPOperator>(V))
-      return visitGEPOperator(*GEP);
     return visit(*I);
   }
   if (Argument *A = dyn_cast<Argument>(V))
@@ -597,12 +627,6 @@ SizeOffsetType ObjectSizeOffsetVisitor::compute(Value *V) {
     return visitGlobalVariable(*GV);
   if (UndefValue *UV = dyn_cast<UndefValue>(V))
     return visitUndefValue(*UV);
-  if (ConstantExpr *CE = dyn_cast<ConstantExpr>(V)) {
-    if (CE->getOpcode() == Instruction::IntToPtr)
-      return unknown(); // clueless
-    if (CE->getOpcode() ==

[llvm-branch-commits] [clang-tools-extra] fef110b - [clangd] Fix building SerializationTests unit test on OpenBSD

2022-02-21 Thread Tom Stellard via llvm-branch-commits

Author: Brad Smith
Date: 2022-02-21T13:48:22-08:00
New Revision: fef110bf8b2b3017b92c61de27ed5cd034b6b0e2

URL: 
https://github.com/llvm/llvm-project/commit/fef110bf8b2b3017b92c61de27ed5cd034b6b0e2
DIFF: 
https://github.com/llvm/llvm-project/commit/fef110bf8b2b3017b92c61de27ed5cd034b6b0e2.diff

LOG: [clangd] Fix building SerializationTests unit test on OpenBSD

This fixes building the unit tests on OpenBSD. OpenBSD does not support 
RLIMIT_AS.

Reviewed By: kadircet

Differential Revision: https://reviews.llvm.org/D119989

(cherry picked from commit f374c8ddf2dd4920190cac0ea81e18a74040ddda)

Added: 


Modified: 
clang-tools-extra/clangd/unittests/SerializationTests.cpp

Removed: 




diff  --git a/clang-tools-extra/clangd/unittests/SerializationTests.cpp 
b/clang-tools-extra/clangd/unittests/SerializationTests.cpp
index 290e20a082d66..6070b229f31c7 100644
--- a/clang-tools-extra/clangd/unittests/SerializationTests.cpp
+++ b/clang-tools-extra/clangd/unittests/SerializationTests.cpp
@@ -308,9 +308,9 @@ TEST(SerializationTest, CmdlTest) {
   }
 }
 
-// rlimit is part of POSIX.
+// rlimit is part of POSIX. RLIMIT_AS does not exist in OpenBSD.
 // Sanitizers use a lot of address space, so we can't apply strict limits.
-#if LLVM_ON_UNIX && !LLVM_ADDRESS_SANITIZER_BUILD &&                          \
+#if LLVM_ON_UNIX && defined(RLIMIT_AS) && !LLVM_ADDRESS_SANITIZER_BUILD &&    \
 !LLVM_MEMORY_SANITIZER_BUILD
 class ScopedMemoryLimit {
   struct rlimit OriginalLimit;



___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] 56ac6db - [RuntimeDyld] Fix building on OpenBSD

2022-02-21 Thread Tom Stellard via llvm-branch-commits

Author: Brad Smith
Date: 2022-02-21T13:48:49-08:00
New Revision: 56ac6dbc736904c3e812c575853e1e683aa0ed6a

URL: 
https://github.com/llvm/llvm-project/commit/56ac6dbc736904c3e812c575853e1e683aa0ed6a
DIFF: 
https://github.com/llvm/llvm-project/commit/56ac6dbc736904c3e812c575853e1e683aa0ed6a.diff

LOG: [RuntimeDyld] Fix building on OpenBSD

With https://reviews.llvm.org/D105466 the tree does not build on OpenBSD/amd64.
Moritz suggested only building this code on Linux.

Reviewed By: MoritzS

Differential Revision: https://reviews.llvm.org/D119991

(cherry picked from commit 7db1d4d8da4d4dfc5d0240825e8c4d536a12b19c)

Added: 


Modified: 
llvm/tools/llvm-rtdyld/llvm-rtdyld.cpp

Removed: 




diff  --git a/llvm/tools/llvm-rtdyld/llvm-rtdyld.cpp 
b/llvm/tools/llvm-rtdyld/llvm-rtdyld.cpp
index 21339a3f8f3d8..893d8a55c8950 100644
--- a/llvm/tools/llvm-rtdyld/llvm-rtdyld.cpp
+++ b/llvm/tools/llvm-rtdyld/llvm-rtdyld.cpp
@@ -286,7 +286,7 @@ class TrivialMemoryManager : public RTDyldMemoryManager {
   uintptr_t SlabSize = 0;
   uintptr_t CurrentSlabOffset = 0;
   SectionIDMap *SecIDMap = nullptr;
-#if defined(__x86_64__) && defined(__ELF__)
+#if defined(__x86_64__) && defined(__ELF__) && defined(__linux__)
   unsigned UsedTLSStorage = 0;
 #endif
 };
@@ -350,7 +350,7 @@ uint8_t *TrivialMemoryManager::allocateDataSection(uintptr_t Size,
 
 // In case the execution needs TLS storage, we define a very small TLS memory
 // area here that will be used in allocateTLSSection().
-#if defined(__x86_64__) && defined(__ELF__)
+#if defined(__x86_64__) && defined(__ELF__) && defined(__linux__)
 extern "C" {
 alignas(16) __attribute__((visibility("hidden"), tls_model("initial-exec"),
used)) thread_local char LLVMRTDyldTLSSpace[16];
@@ -361,7 +361,7 @@ TrivialMemoryManager::TLSSection
 TrivialMemoryManager::allocateTLSSection(uintptr_t Size, unsigned Alignment,
  unsigned SectionID,
  StringRef SectionName) {
-#if defined(__x86_64__) && defined(__ELF__)
+#if defined(__x86_64__) && defined(__ELF__) && defined(__linux__)
   if (Size + UsedTLSStorage > sizeof(LLVMRTDyldTLSSpace)) {
 return {};
   }



___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] 9bf8897 - [OpenMP] Add RTL function to externalization RAII

2022-02-21 Thread Tom Stellard via llvm-branch-commits

Author: Joseph Huber
Date: 2022-02-21T13:49:10-08:00
New Revision: 9bf8897c84f797c98ec4c2e1475f5f1539f8277b

URL: 
https://github.com/llvm/llvm-project/commit/9bf8897c84f797c98ec4c2e1475f5f1539f8277b
DIFF: 
https://github.com/llvm/llvm-project/commit/9bf8897c84f797c98ec4c2e1475f5f1539f8277b.diff

LOG: [OpenMP] Add RTL function to externalization RAII

This patch adds the '__kmpc_get_hardware_num_threads_in_block'
OpenMP RTL function to the externalization RAII struct. This was getting
optimized out and then being replaced with an undefined value once added
back in, causing bugs for complex reductions.

Fixes #53909.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D120076

(cherry picked from commit 74cacf212bb31f8ba837b7eb2434258dd79eaccb)
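
The general shape of such an externalization helper, as a hedged standalone
sketch (the struct and its name are invented here; the real one lives in
OpenMPOpt.cpp): while the object is alive, the function has external linkage
so no pass can conclude it is unused and delete it, and the destructor
restores the original linkage.

```
#include "llvm/IR/Function.h"

struct ExternalizeDuringOpt {
  llvm::Function *F;
  llvm::GlobalValue::LinkageTypes OldLinkage;

  explicit ExternalizeDuringOpt(llvm::Function *Fn)
      : F(Fn), OldLinkage(Fn ? Fn->getLinkage()
                             : llvm::GlobalValue::ExternalLinkage) {
    if (F)
      F->setLinkage(llvm::GlobalValue::ExternalLinkage); // keep it alive
  }
  ~ExternalizeDuringOpt() {
    if (F)
      F->setLinkage(OldLinkage); // restore once the pass is done
  }
};
```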

Added: 


Modified: 
llvm/lib/Transforms/IPO/OpenMPOpt.cpp
llvm/test/Transforms/OpenMP/get_hardware_num_threads_in_block_fold.ll

Removed: 




diff  --git a/llvm/lib/Transforms/IPO/OpenMPOpt.cpp 
b/llvm/lib/Transforms/IPO/OpenMPOpt.cpp
index 520b6ebf9e74f..5113c0c67acc6 100644
--- a/llvm/lib/Transforms/IPO/OpenMPOpt.cpp
+++ b/llvm/lib/Transforms/IPO/OpenMPOpt.cpp
@@ -2119,6 +2119,8 @@ struct OpenMPOpt {
OMPRTL___kmpc_barrier_simple_generic);
     ExternalizationRAII ThreadId(OMPInfoCache,
                                  OMPRTL___kmpc_get_hardware_thread_id_in_block);
+ExternalizationRAII NumThreads(
+OMPInfoCache, OMPRTL___kmpc_get_hardware_num_threads_in_block);
 ExternalizationRAII WarpSize(OMPInfoCache, OMPRTL___kmpc_get_warp_size);
 
 registerAAs(IsModulePass);

diff --git a/llvm/test/Transforms/OpenMP/get_hardware_num_threads_in_block_fold.ll b/llvm/test/Transforms/OpenMP/get_hardware_num_threads_in_block_fold.ll
index b72031a9b68c0..57eaebc7e141c 100644
--- a/llvm/test/Transforms/OpenMP/get_hardware_num_threads_in_block_fold.ll
+++ b/llvm/test/Transforms/OpenMP/get_hardware_num_threads_in_block_fold.ll
@@ -178,7 +178,16 @@ entry:
   ret void
 }
 
-declare i32 @__kmpc_get_hardware_num_threads_in_block()
+define internal i32 @__kmpc_get_hardware_num_threads_in_block() {
+; CHECK-LABEL: define {{[^@]+}}@__kmpc_get_hardware_num_threads_in_block
+; CHECK-SAME: () #[[ATTR1]] {
+; CHECK-NEXT:    [[RET:%.*]] = call i32 @__kmpc_get_hardware_num_threads_in_block_dummy()
+; CHECK-NEXT:ret i32 [[RET]]
+;
+  %ret = call i32 @__kmpc_get_hardware_num_threads_in_block_dummy()
+  ret i32 %ret
+}
+declare i32 @__kmpc_get_hardware_num_threads_in_block_dummy()
 declare i32 @__kmpc_target_init(%struct.ident_t*, i8, i1 zeroext, i1 zeroext) #1
 declare void @__kmpc_target_deinit(%struct.ident_t* nocapture readnone, i8, i1 zeroext) #1
 declare void @__kmpc_parallel_51(%struct.ident_t*, i32, i32, i32, i32, i8*, i8*, i8**, i64)



___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [openmp] b3d3501 - [OpenMP][FIX] Eliminate race on the IsSPMD global

2022-02-21 Thread Tom Stellard via llvm-branch-commits

Author: Johannes Doerfert
Date: 2022-02-21T13:49:32-08:00
New Revision: b3d3501aa128e7c2cb715319918c0ca66384d3cc

URL: 
https://github.com/llvm/llvm-project/commit/b3d3501aa128e7c2cb715319918c0ca66384d3cc
DIFF: 
https://github.com/llvm/llvm-project/commit/b3d3501aa128e7c2cb715319918c0ca66384d3cc.diff

LOG: [OpenMP][FIX] Eliminate race on the IsSPMD global

The `IsSPMD` global can only be read by threads other than the main
thread *after* initialization is complete. To allow usage of
`mapping::getBlockSize` before initialization is done, we can pass the
`IsSPMD` state explicitly. This is similar to other APIs that take
`IsSPMD` explicitly to avoid such a race, e.g.,
`mapping::isInitialThreadInLevel0(IsSPMD)`.

Fixes https://github.com/llvm/llvm-project/issues/53857

(cherry picked from commit 57b4c5267b7293b1990d8418477b24732ba0468b)

Added: 


Modified: 
openmp/libomptarget/DeviceRTL/include/Mapping.h
openmp/libomptarget/DeviceRTL/src/Kernel.cpp
openmp/libomptarget/DeviceRTL/src/Mapping.cpp
openmp/libomptarget/DeviceRTL/src/State.cpp

Removed: 




diff  --git a/openmp/libomptarget/DeviceRTL/include/Mapping.h 
b/openmp/libomptarget/DeviceRTL/include/Mapping.h
index 4f65d28da513f..36cfae7c5efa4 100644
--- a/openmp/libomptarget/DeviceRTL/include/Mapping.h
+++ b/openmp/libomptarget/DeviceRTL/include/Mapping.h
@@ -79,7 +79,12 @@ uint32_t getNumberOfWarpsInBlock();
 uint32_t getBlockId();
 
 /// Return the block size, thus number of threads in the block.
+///
+/// Note: The version taking \p IsSPMD mode explicitly can be used during the
+/// initialization of the target region, that is before `mapping::isSPMDMode()`
+/// can be called by any thread other than the main one.
 uint32_t getBlockSize();
+uint32_t getBlockSize(bool IsSPMD);
 
 /// Return the number of blocks in the kernel.
 uint32_t getNumberOfBlocks();

diff  --git a/openmp/libomptarget/DeviceRTL/src/Kernel.cpp 
b/openmp/libomptarget/DeviceRTL/src/Kernel.cpp
index 65b554b729731..8b7a8a2495c45 100644
--- a/openmp/libomptarget/DeviceRTL/src/Kernel.cpp
+++ b/openmp/libomptarget/DeviceRTL/src/Kernel.cpp
@@ -100,7 +100,7 @@ int32_t __kmpc_target_init(IdentTy *Ident, int8_t Mode,
   // doing any work.  mapping::getBlockSize() does not include any of the main
   // thread's warp, so none of its threads can ever be active worker threads.
   if (UseGenericStateMachine &&
-  mapping::getThreadIdInBlock() < mapping::getBlockSize())
+  mapping::getThreadIdInBlock() < mapping::getBlockSize(IsSPMD))
 genericStateMachine(Ident);
 
   return mapping::getThreadIdInBlock();

diff  --git a/openmp/libomptarget/DeviceRTL/src/Mapping.cpp 
b/openmp/libomptarget/DeviceRTL/src/Mapping.cpp
index 75a500f39d20a..7f9f837ae98e4 100644
--- a/openmp/libomptarget/DeviceRTL/src/Mapping.cpp
+++ b/openmp/libomptarget/DeviceRTL/src/Mapping.cpp
@@ -212,11 +212,14 @@ uint32_t mapping::getThreadIdInBlock() {
 
 uint32_t mapping::getWarpSize() { return impl::getWarpSize(); }
 
-uint32_t mapping::getBlockSize() {
+uint32_t mapping::getBlockSize(bool IsSPMD) {
   uint32_t BlockSize = mapping::getNumberOfProcessorElements() -
-   (!mapping::isSPMDMode() * impl::getWarpSize());
+   (!IsSPMD * impl::getWarpSize());
   return BlockSize;
 }
+uint32_t mapping::getBlockSize() {
+  return mapping::getBlockSize(mapping::isSPMDMode());
+}
 
 uint32_t mapping::getKernelSize() { return impl::getKernelSize(); }
 

diff  --git a/openmp/libomptarget/DeviceRTL/src/State.cpp 
b/openmp/libomptarget/DeviceRTL/src/State.cpp
index 800176eb5eda5..a04f5cccb1738 100644
--- a/openmp/libomptarget/DeviceRTL/src/State.cpp
+++ b/openmp/libomptarget/DeviceRTL/src/State.cpp
@@ -236,7 +236,7 @@ struct TeamStateTy {
 TeamStateTy SHARED(TeamState);
 
 void TeamStateTy::init(bool IsSPMD) {
-  ICVState.NThreadsVar = mapping::getBlockSize();
+  ICVState.NThreadsVar = mapping::getBlockSize(IsSPMD);
   ICVState.LevelVar = 0;
   ICVState.ActiveLevelVar = 0;
   ICVState.MaxActiveLevelsVar = 1;



___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] 5593af7 - [Attributor][FIX] Heap2Stack needs to use the alloca AS

2022-02-21 Thread Tom Stellard via llvm-branch-commits

Author: Johannes Doerfert
Date: 2022-02-21T13:49:53-08:00
New Revision: 5593af72d0c53aa0f1ec1653f5bcfaaf1baeec5f

URL: 
https://github.com/llvm/llvm-project/commit/5593af72d0c53aa0f1ec1653f5bcfaaf1baeec5f
DIFF: 
https://github.com/llvm/llvm-project/commit/5593af72d0c53aa0f1ec1653f5bcfaaf1baeec5f.diff

LOG: [Attributor][FIX] Heap2Stack needs to use the alloca AS

When we move an allocation from the heap to the stack, we need to
allocate it in the alloca AS and then cast the result. We also now insert
the alloca right before the allocation call rather than after it.

Fixes https://github.com/llvm/llvm-project/issues/53858

(cherry picked from commit 8ad39fbaf23893b3384cafa0f179d35dcf3c672b)
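
Roughly, on a target whose datalayout declares an alloca address space (as
the updated test does with "A5"), a rewritten call now looks like this
hand-written sketch rather than an alloca in the default address space:

```
target datalayout = "A5"

define void @sketch() {
  ; replaces e.g. %p = call i8* @__kmpc_alloc_shared(i64 16)
  %a = alloca i8, i64 16, align 8, addrspace(5)
  %p = addrspacecast i8 addrspace(5)* %a to i8*
  call void @use(i8* %p)
  ret void
}

declare void @use(i8*)
```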

Added: 


Modified: 
llvm/lib/Transforms/IPO/AttributorAttributes.cpp
llvm/test/Transforms/Attributor/heap_to_stack_gpu.ll
llvm/test/Transforms/OpenMP/spmdization.ll

Removed: 




diff  --git a/llvm/lib/Transforms/IPO/AttributorAttributes.cpp 
b/llvm/lib/Transforms/IPO/AttributorAttributes.cpp
index de36d5d89a185..6dadfebae038d 100644
--- a/llvm/lib/Transforms/IPO/AttributorAttributes.cpp
+++ b/llvm/lib/Transforms/IPO/AttributorAttributes.cpp
@@ -32,6 +32,7 @@
 #include "llvm/Analysis/ValueTracking.h"
 #include "llvm/IR/Assumptions.h"
 #include "llvm/IR/Constants.h"
+#include "llvm/IR/DataLayout.h"
 #include "llvm/IR/IRBuilder.h"
 #include "llvm/IR/Instruction.h"
 #include "llvm/IR/Instructions.h"
@@ -6026,13 +6027,13 @@ struct AAHeapToStackFunction final : public AAHeapToStack {
   else
 A.emitRemark(AI.CB, "HeapToStack", Remark);
 
+  const DataLayout &DL = A.getInfoCache().getDL();
   Value *Size;
   Optional<APInt> SizeAPI = getSize(A, *this, AI);
   if (SizeAPI.hasValue()) {
 Size = ConstantInt::get(AI.CB->getContext(), *SizeAPI);
   } else {
 LLVMContext &Ctx = AI.CB->getContext();
-auto &DL = A.getInfoCache().getDL();
 ObjectSizeOpts Opts;
 ObjectSizeOffsetEvaluator Eval(DL, TLI, Ctx, Opts);
 SizeOffsetEvalType SizeOffsetPair = Eval.compute(AI.CB);
@@ -6052,14 +6053,14 @@ struct AAHeapToStackFunction final : public AAHeapToStack {
 max(Alignment, MaybeAlign(AlignmentAPI.getValue().getZExtValue()));
   }
 
-      unsigned AS = cast<PointerType>(AI.CB->getType())->getAddressSpace();
-      Instruction *Alloca =
-          new AllocaInst(Type::getInt8Ty(F->getContext()), AS, Size, Alignment,
-                         "", AI.CB->getNextNode());
+  // TODO: Hoist the alloca towards the function entry.
+  unsigned AS = DL.getAllocaAddrSpace();
+      Instruction *Alloca = new AllocaInst(Type::getInt8Ty(F->getContext()), AS,
+                                           Size, Alignment, "", AI.CB);
 
   if (Alloca->getType() != AI.CB->getType())
-Alloca = new BitCastInst(Alloca, AI.CB->getType(), "malloc_bc",
- Alloca->getNextNode());
+Alloca = BitCastInst::CreatePointerBitCastOrAddrSpaceCast(
+Alloca, AI.CB->getType(), "malloc_cast", AI.CB);
 
   auto *I8Ty = Type::getInt8Ty(F->getContext());
   auto *InitVal = getInitialValueOfAllocation(AI.CB, TLI, I8Ty);

diff  --git a/llvm/test/Transforms/Attributor/heap_to_stack_gpu.ll 
b/llvm/test/Transforms/Attributor/heap_to_stack_gpu.ll
index 0f207e4027599..5ee0a6892ac69 100644
--- a/llvm/test/Transforms/Attributor/heap_to_stack_gpu.ll
+++ b/llvm/test/Transforms/Attributor/heap_to_stack_gpu.ll
@@ -4,7 +4,12 @@
 ; RUN: opt -attributor-cgscc -enable-new-pm=0 -attributor-manifest-internal  
-attributor-annotate-decl-cs -S < %s | FileCheck %s 
--check-prefixes=CHECK,NOT_TUNIT_NPM,NOT_TUNIT_OPM,NOT_CGSCC_NPM,IS__CGSCC,ISOPM,IS__CGSCC_OPM
 ; RUN: opt -aa-pipeline=basic-aa -passes=attributor-cgscc 
-attributor-manifest-internal  -attributor-annotate-decl-cs -S < %s | FileCheck 
%s 
--check-prefixes=CHECK,NOT_TUNIT_NPM,NOT_TUNIT_OPM,NOT_CGSCC_OPM,IS__CGSCC,ISNPM,IS__CGSCC_NPM
 
+; FIXME: amdgpu doesn't claim malloc is a thing, so the test is somewhat
+; useless except the __kmpc_alloc_shared part which now also covers the important
+; part this test was initially designed for, make sure the "is freed" check is
+; part this test was initially designed for, make sure the "is freed" check is
+; not sufficient on a GPU.
 target triple = "amdgcn-amd-amdhsa"
+target datalayout = "A5"
 
 declare noalias i8* @malloc(i64)
 
@@ -20,6 +25,7 @@ declare void @no_sync_func(i8* nocapture %p) nofree nosync willreturn
 
 declare void @nofree_func(i8* nocapture %p) nofree  nosync willreturn
 
+declare void @usei8(i8* %p)
 declare void @foo(i32* %p)
 
 declare void @foo_nounw(i32* %p) nounwind nofree
@@ -663,6 +669,43 @@ define void @test16d(i8 %v, i8** %P) {
   store i8* %1, i8** %P
   ret void
 }
+
+declare i8* @__kmpc_alloc_shared(i64)
+declare void @__kmpc_free_shared(i8* nocapture, i64)
+
+define void @test17() {
+; ISOPM-LABEL: define {{[^@]+}}@test17() {
+; ISOPM-NEXT

[llvm-branch-commits] [llvm] 08ad9ae - [InstSimplify] Strip offsets once in computePointerICmp()

2022-02-21 Thread Tom Stellard via llvm-branch-commits

Author: Nikita Popov
Date: 2022-02-21T14:21:13-08:00
New Revision: 08ad9ae10f327e3a511f1df7423f508103df1525

URL: 
https://github.com/llvm/llvm-project/commit/08ad9ae10f327e3a511f1df7423f508103df1525
DIFF: 
https://github.com/llvm/llvm-project/commit/08ad9ae10f327e3a511f1df7423f508103df1525.diff

LOG: [InstSimplify] Strip offsets once in computePointerICmp()

Instead of doing an inbounds strip first and another non-inbounds
strip afterward for equality comparisons, directly do a single
inbounds or non-inbounds strip based on whether we have an equality
predicate or not.

This is NFC-ish in that the alloca equality codepath is the only
part that sees additional non-inbounds offsets now, and for that
codepath it doesn't matter whether or not the GEP is inbounds, as
it does a stronger check itself. InstCombine would infer inbounds
for such GEPs.

(cherry picked from commit f35af77573d9e80bf6e61b3fdd20fe55191e962f)
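
A hand-written example (hedged, not from the patch) of what the equality
path handles in a single pass: both pointers strip to the same base even
though the GEPs are not inbounds, so the compare folds by comparing the
accumulated constant offsets 4 and 8.

```
define i1 @same_base(i8* %p) {
  %a = getelementptr i8, i8* %p, i64 4   ; note: not inbounds
  %b = getelementptr i8, i8* %p, i64 8
  %c = icmp eq i8* %a, %b                ; simplifies to false
  ret i1 %c
}
```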

Added: 


Modified: 
llvm/lib/Analysis/InstructionSimplify.cpp

Removed: 




diff  --git a/llvm/lib/Analysis/InstructionSimplify.cpp 
b/llvm/lib/Analysis/InstructionSimplify.cpp
index 4775340b3438..60895d3ced1a 100644
--- a/llvm/lib/Analysis/InstructionSimplify.cpp
+++ b/llvm/lib/Analysis/InstructionSimplify.cpp
@@ -2588,8 +2588,14 @@ computePointerICmp(CmpInst::Predicate Pred, Value *LHS, 
Value *RHS,
   // numerous hazards. AliasAnalysis and its utilities rely on special rules
   // governing loads and stores which don't apply to icmps. Also, AliasAnalysis
   // doesn't need to guarantee pointer inequality when it says NoAlias.
-  Constant *LHSOffset = stripAndComputeConstantOffsets(DL, LHS);
-  Constant *RHSOffset = stripAndComputeConstantOffsets(DL, RHS);
+
+  // Even if an non-inbounds GEP occurs along the path we can still optimize
+  // equality comparisons concerning the result.
+  bool AllowNonInbounds = ICmpInst::isEquality(Pred);
+  Constant *LHSOffset =
+  stripAndComputeConstantOffsets(DL, LHS, AllowNonInbounds);
+  Constant *RHSOffset =
+  stripAndComputeConstantOffsets(DL, RHS, AllowNonInbounds);
 
   // If LHS and RHS are related via constant offsets to the same base
   // value, we can replace it with an icmp which just compares the offsets.
@@ -2659,17 +2665,6 @@ computePointerICmp(CmpInst::Predicate Pred, Value *LHS, 
Value *RHS,
 !CmpInst::isTrueWhenEqual(Pred));
 }
 
-// Even if an non-inbounds GEP occurs along the path we can still optimize
-// equality comparisons concerning the result. We avoid walking the whole
-// chain again by starting where the last calls to
-// stripAndComputeConstantOffsets left off and accumulate the offsets.
-Constant *LHSNoBound = stripAndComputeConstantOffsets(DL, LHS, true);
-Constant *RHSNoBound = stripAndComputeConstantOffsets(DL, RHS, true);
-    if (LHS == RHS)
-      return ConstantExpr::getICmp(Pred,
-                                   ConstantExpr::getAdd(LHSOffset, LHSNoBound),
-                                   ConstantExpr::getAdd(RHSOffset, RHSNoBound));
-
 // If one side of the equality comparison must come from a noalias call
 // (meaning a system memory allocation function), and the other side must
 // come from a pointer that cannot overlap with dynamically-allocated



___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits