[llvm-branch-commits] [llvm] [LAA] Use PSE::getSymbolicMaxBackedgeTakenCount. (PR #93499)

2024-05-28 Thread Philip Reames via llvm-branch-commits


@@ -2055,9 +2055,9 @@ MemoryDepChecker::Dependence::DepType MemoryDepChecker::isDependent(
   // stride multiplied by the backedge taken count, the accesses are
   // independent, i.e. they are far enough apart that accesses won't access
   // the same location across all loop iterations.
-  if (HasSameSize &&
-  isSafeDependenceDistance(DL, SE, *(PSE.getBackedgeTakenCount()), *Dist,
-   MaxStride, TypeByteSize))
+  if (HasSameSize && isSafeDependenceDistance(

preames wrote:

The doc comment on isSafeDependenceDistance needs to be updated.  I think it's 
still correct, but there's a difference between an exact BTC and a bound on the 
BTC.
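
To spell out why a bound is still sound for this check, a minimal 
self-contained C++ sketch (illustrative names and formula shape only; not the 
actual LAA code):

  #include <cstdint>

  // Accesses are independent when the dependence distance exceeds the
  // largest byte span any access travels: MaxStride * BTC * TypeByteSize.
  // Substituting an upper bound BTCBound >= BTC only enlarges that span,
  // so a "true" answer can only become more conservative.
  bool isSafeDistanceSketch(uint64_t Dist, uint64_t MaxStride,
                            uint64_t TypeByteSize, uint64_t BTCBound) {
    uint64_t MaxSpan = MaxStride * BTCBound * TypeByteSize;
    return Dist > MaxSpan;
  }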

https://github.com/llvm/llvm-project/pull/93499


[llvm-branch-commits] [llvm] [LAA] Use PSE::getSymbolicMaxBackedgeTakenCount. (PR #93499)

2024-05-28 Thread Philip Reames via llvm-branch-commits


@@ -3004,7 +3004,7 @@ void LoopAccessInfo::collectStridedAccess(Value *MemAccess) {
   // of various possible stride specializations, considering the alternatives
   // of using gather/scatters (if available).
 
-  const SCEV *BETakenCount = PSE->getBackedgeTakenCount();
+  const SCEV *BETakenCount = PSE->getSymbolicMaxBackedgeTakenCount();

preames wrote:

Not related to your change, but this whole block of code is just weird.  It is 
basically proving a more precise trip count; why is it in LAA at all?  
Wouldn't simply exiting early on small-BTC loops be sufficient?
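
A rough sketch of that alternative (hypothetical code mirroring the nearby LAA 
names; the threshold is made up):

  // Bail out of stride specialization when the loop is known to be short,
  // rather than proving a more precise trip count inside LAA.
  const SCEV *BETakenCount = PSE->getSymbolicMaxBackedgeTakenCount();
  if (isa<SCEVCouldNotCompute>(BETakenCount))
    return;
  if (const auto *C = dyn_cast<SCEVConstant>(BETakenCount))
    if (C->getAPInt().ult(16)) // illustrative small-trip-count threshold
      return;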

https://github.com/llvm/llvm-project/pull/93499


[llvm-branch-commits] [llvm] [LAA] Use PSE::getSymbolicMaxBackedgeTakenCount. (PR #93499)

2024-05-28 Thread Philip Reames via llvm-branch-commits


@@ -1506,6 +1506,16 @@ bool LoopVectorizationLegality::canVectorize(bool UseVPlanNativePath) {
   return false;
   }
 
+  if (isa<SCEVCouldNotCompute>(PSE.getBackedgeTakenCount())) {

preames wrote:

What about the other in-tree users of LAA?  Have you audited them?  If not, can 
you add bailouts to ensure we're not breaking anything with this transition?
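
For example, a conservative bailout for such a client could look like this (a 
sketch, assuming the client previously relied on an exact count):

  // Preserve the old behavior: reject loops where ScalarEvolution cannot
  // compute an exact backedge-taken count.
  if (isa<SCEVCouldNotCompute>(PSE.getBackedgeTakenCount()))
    return false;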

https://github.com/llvm/llvm-project/pull/93499


[llvm-branch-commits] [llvm] [LAA] Use PSE::getSymbolicMaxBackedgeTakenCount. (PR #93499)

2024-05-28 Thread Philip Reames via llvm-branch-commits


@@ -2395,7 +2395,7 @@ bool LoopAccessInfo::canAnalyzeLoop() {
   }
 
   // ScalarEvolution needs to be able to find the exit count.
-  const SCEV *ExitCount = PSE->getBackedgeTakenCount();
+  const SCEV *ExitCount = PSE->getSymbolicMaxBackedgeTakenCount();

preames wrote:

Update the comment to say this is a bound on the BTC.
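
One possible wording (illustrative only):

  // ScalarEvolution needs to be able to compute at least a bound on the
  // backedge-taken count; an exact count is not required here.
  const SCEV *ExitCount = PSE->getSymbolicMaxBackedgeTakenCount();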

https://github.com/llvm/llvm-project/pull/93499


[llvm-branch-commits] [llvm] [RISCV] Fix vmerge.vvm/vmv.v.v getting folded into ops with mismatching EEW (PR #101464)

2024-08-01 Thread Philip Reames via llvm-branch-commits

https://github.com/preames approved this pull request.


https://github.com/llvm/llvm-project/pull/101464


[llvm-branch-commits] [llvm] release/19.x: [RISCV] Use APInt in isSimpleVIDSequence to account for index overflow (#100072) (PR #101124)

2024-08-01 Thread Philip Reames via llvm-branch-commits

preames wrote:

Talked with Luke about this one offline.  On reflection, both of us are a bit 
unsure about the balance of risk vs reward here.  The miscompile is not a 
regression, and occurs in what we think is a pretty unusual configuration.  The 
fix landed recently, and while there are no known problems, there's always risk 
in a backport.  This could easily go either way, but I think we can skip 
backporting this.  

https://github.com/llvm/llvm-project/pull/101124


[llvm-branch-commits] [llvm] Revert "[RISCV] Recurse on first operand of two operand shuffles (#79180)" (PR #80238)

2024-02-13 Thread Philip Reames via llvm-branch-commits

preames wrote:

@tstellar This backport has been outstanding for a while now.

https://github.com/llvm/llvm-project/pull/80238


[llvm-branch-commits] [clang] [llvm] [RISCV] Add subtarget features for profiles (PR #84877)

2024-04-26 Thread Philip Reames via llvm-branch-commits

https://github.com/preames approved this pull request.

LGTM

https://github.com/llvm/llvm-project/pull/84877


[llvm-branch-commits] [clang] [llvm] release/18.x: [RISCV] Re-separate unaligned scalar and vector memory features in the backend. (PR #92143)

2024-05-15 Thread Philip Reames via llvm-branch-commits

preames wrote:

I don't think we need to backport this at all.  None of the in-tree CPUs fall 
into the category where the distinction is important, and I don't feel we have 
any obligation to backport support for out-of-tree forks.

https://github.com/llvm/llvm-project/pull/92143


[llvm-branch-commits] [clang] [llvm] release/18.x: [RISCV] Re-separate unaligned scalar and vector memory features in the backend. (PR #92143)

2024-05-16 Thread Philip Reames via llvm-branch-commits

preames wrote:

I'm not strongly opposed to this or anything, but it feels questionable to be 
doing a backport to change the target-feature syntax.  My understanding is that 
these are purely internal names.  This isn't a documented public interface.

https://github.com/llvm/llvm-project/pull/92143


[llvm-branch-commits] [llvm] release/18.x: [RISCV] Re-separate unaligned scalar and vector memory features in the backend. (PR #92143)

2024-05-17 Thread Philip Reames via llvm-branch-commits

preames wrote:

I'm fine with this approach.  No strong opinion either way, but definitely 
don't let my previous comments be blocking here.

https://github.com/llvm/llvm-project/pull/92143


[llvm-branch-commits] [llvm] 73e9633 - [RISCV] Add test coverage for partial buildvecs idioms

2023-11-17 Thread Philip Reames via llvm-branch-commits

Author: Philip Reames
Date: 2023-11-16T13:33:12-08:00
New Revision: 73e963379e4d06ca75625f63a5604c286fe37040

URL: 
https://github.com/llvm/llvm-project/commit/73e963379e4d06ca75625f63a5604c286fe37040
DIFF: 
https://github.com/llvm/llvm-project/commit/73e963379e4d06ca75625f63a5604c286fe37040.diff

LOG: [RISCV] Add test coverage for partial buildvecs idioms

Test coverage for an upcoming set of changes

Added: 


Modified: 
llvm/test/CodeGen/RISCV/rvv/fixed-vectors-buildvec-of-binop.ll
llvm/test/CodeGen/RISCV/rvv/fixed-vectors-int-buildvec.ll

Removed: 




diff  --git a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-buildvec-of-binop.ll 
b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-buildvec-of-binop.ll
index 717dfb1bfd00537..8055944fc5468f3 100644
--- a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-buildvec-of-binop.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-buildvec-of-binop.ll
@@ -446,6 +446,25 @@ define <4 x i32> @add_general_splat(i32 %a, i32 %b, i32 
%c, i32 %d, i32 %e) {
 ; This test previously failed with an assertion failure because constant shift
 ; amounts are type legalized early.
 define void @buggy(i32 %0) #0 {
+; RV32-LABEL: buggy:
+; RV32:   # %bb.0: # %entry
+; RV32-NEXT:vsetivli zero, 4, e32, m1, ta, ma
+; RV32-NEXT:vmv.v.x v8, a0
+; RV32-NEXT:vadd.vv v8, v8, v8
+; RV32-NEXT:vor.vi v8, v8, 1
+; RV32-NEXT:vrgather.vi v9, v8, 0
+; RV32-NEXT:vse32.v v9, (zero)
+; RV32-NEXT:ret
+;
+; RV64-LABEL: buggy:
+; RV64:   # %bb.0: # %entry
+; RV64-NEXT:slli a0, a0, 1
+; RV64-NEXT:vsetivli zero, 4, e32, m1, ta, ma
+; RV64-NEXT:vmv.v.x v8, a0
+; RV64-NEXT:vor.vi v8, v8, 1
+; RV64-NEXT:vrgather.vi v9, v8, 0
+; RV64-NEXT:vse32.v v9, (zero)
+; RV64-NEXT:ret
 entry:
   %mul.us.us.i.3 = shl i32 %0, 1
   %1 = insertelement <4 x i32> zeroinitializer, i32 %mul.us.us.i.3, i64 0
@@ -454,3 +473,96 @@ entry:
   store <4 x i32> %3, ptr null, align 16
   ret void
 }
+
+
+define <8 x i32> @add_constant_rhs_8xi32_vector_in(<8 x i32> %vin, i32 %a, i32 
%b, i32 %c, i32 %d) {
+; CHECK-LABEL: add_constant_rhs_8xi32_vector_in:
+; CHECK:   # %bb.0:
+; CHECK-NEXT:addi a0, a0, 23
+; CHECK-NEXT:addi a1, a1, 25
+; CHECK-NEXT:addi a2, a2, 1
+; CHECK-NEXT:addi a3, a3, 2047
+; CHECK-NEXT:addi a3, a3, 308
+; CHECK-NEXT:vsetivli zero, 2, e32, m1, tu, ma
+; CHECK-NEXT:vmv.s.x v8, a0
+; CHECK-NEXT:vmv.s.x v10, a1
+; CHECK-NEXT:vslideup.vi v8, v10, 1
+; CHECK-NEXT:vmv.s.x v10, a2
+; CHECK-NEXT:vsetivli zero, 3, e32, m1, tu, ma
+; CHECK-NEXT:vslideup.vi v8, v10, 2
+; CHECK-NEXT:vmv.s.x v10, a3
+; CHECK-NEXT:vsetivli zero, 4, e32, m1, tu, ma
+; CHECK-NEXT:vslideup.vi v8, v10, 3
+; CHECK-NEXT:ret
+  %e0 = add i32 %a, 23
+  %e1 = add i32 %b, 25
+  %e2 = add i32 %c, 1
+  %e3 = add i32 %d, 2355
+  %v0 = insertelement <8 x i32> %vin, i32 %e0, i32 0
+  %v1 = insertelement <8 x i32> %v0, i32 %e1, i32 1
+  %v2 = insertelement <8 x i32> %v1, i32 %e2, i32 2
+  %v3 = insertelement <8 x i32> %v2, i32 %e3, i32 3
+  ret <8 x i32> %v3
+}
+
+define <8 x i32> @add_constant_rhs_8xi32_vector_in2(<8 x i32> %vin, i32 %a, 
i32 %b, i32 %c, i32 %d) {
+; CHECK-LABEL: add_constant_rhs_8xi32_vector_in2:
+; CHECK:   # %bb.0:
+; CHECK-NEXT:addi a0, a0, 23
+; CHECK-NEXT:addi a1, a1, 25
+; CHECK-NEXT:addi a2, a2, 1
+; CHECK-NEXT:addi a3, a3, 2047
+; CHECK-NEXT:addi a3, a3, 308
+; CHECK-NEXT:vsetivli zero, 5, e32, m2, tu, ma
+; CHECK-NEXT:vmv.s.x v10, a0
+; CHECK-NEXT:vslideup.vi v8, v10, 4
+; CHECK-NEXT:vmv.s.x v10, a1
+; CHECK-NEXT:vsetivli zero, 6, e32, m2, tu, ma
+; CHECK-NEXT:vslideup.vi v8, v10, 5
+; CHECK-NEXT:vmv.s.x v10, a2
+; CHECK-NEXT:vsetivli zero, 7, e32, m2, tu, ma
+; CHECK-NEXT:vslideup.vi v8, v10, 6
+; CHECK-NEXT:vmv.s.x v10, a3
+; CHECK-NEXT:vsetivli zero, 8, e32, m2, ta, ma
+; CHECK-NEXT:vslideup.vi v8, v10, 7
+; CHECK-NEXT:ret
+  %e0 = add i32 %a, 23
+  %e1 = add i32 %b, 25
+  %e2 = add i32 %c, 1
+  %e3 = add i32 %d, 2355
+  %v0 = insertelement <8 x i32> %vin, i32 %e0, i32 4
+  %v1 = insertelement <8 x i32> %v0, i32 %e1, i32 5
+  %v2 = insertelement <8 x i32> %v1, i32 %e2, i32 6
+  %v3 = insertelement <8 x i32> %v2, i32 %e3, i32 7
+  ret <8 x i32> %v3
+}
+
+define <8 x i32> @add_constant_rhs_8xi32_vector_in3(<8 x i32> %vin, i32 %a, 
i32 %b, i32 %c, i32 %d) {
+; CHECK-LABEL: add_constant_rhs_8xi32_vector_in3:
+; CHECK:   # %bb.0:
+; CHECK-NEXT:addi a0, a0, 23
+; CHECK-NEXT:addi a1, a1, 25
+; CHECK-NEXT:addi a2, a2, 1
+; CHECK-NEXT:addi a3, a3, 2047
+; CHECK-NEXT:addi a3, a3, 308
+; CHECK-NEXT:vsetivli zero, 3, e32, m1, tu, ma
+; CHECK-NEXT:vmv.s.x v8, a0
+; CHECK-NEXT:vmv.s.x v10, a1
+; CHECK-NEXT:vslideup.vi v8, v10, 2
+; CHECK-NEXT:vmv.s.x v10, a2
+; CHECK-NEXT:vsetivli zero, 5, e32, m2, tu, ma
+; CHECK-NE

[llvm-branch-commits] [llvm] 1aa493f - [RISCV] Further expand coverage for insert_vector_elt patterns

2023-11-17 Thread Philip Reames via llvm-branch-commits

Author: Philip Reames
Date: 2023-11-16T14:14:31-08:00
New Revision: 1aa493f0645395908fe77bc69bce93fd4e80b1e8

URL: 
https://github.com/llvm/llvm-project/commit/1aa493f0645395908fe77bc69bce93fd4e80b1e8
DIFF: 
https://github.com/llvm/llvm-project/commit/1aa493f0645395908fe77bc69bce93fd4e80b1e8.diff

LOG: [RISCV] Further expand coverage for insert_vector_elt patterns

Added: 
llvm/test/CodeGen/RISCV/rvv/concat-vector-insert-elt.ll

Modified: 
llvm/test/CodeGen/RISCV/rvv/fixed-vectors-buildvec-of-binop.ll

Removed: 




diff  --git a/llvm/test/CodeGen/RISCV/rvv/concat-vector-insert-elt.ll 
b/llvm/test/CodeGen/RISCV/rvv/concat-vector-insert-elt.ll
new file mode 100644
index 000..9193f7aef4b8757
--- /dev/null
+++ b/llvm/test/CodeGen/RISCV/rvv/concat-vector-insert-elt.ll
@@ -0,0 +1,241 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc -mtriple=riscv32 -mattr=+v -target-abi=ilp32 \
+; RUN: -verify-machineinstrs < %s | FileCheck %s 
--check-prefixes=CHECK,RV32
+; RUN: llc -mtriple=riscv64 -mattr=+v -target-abi=lp64 \
+; RUN: -verify-machineinstrs < %s | FileCheck %s 
--check-prefixes=CHECK,RV64
+
+define void @v4xi8_concat_vector_insert_idx0(ptr %a, ptr %b, i8 %x) {
+; CHECK-LABEL: v4xi8_concat_vector_insert_idx0:
+; CHECK:   # %bb.0:
+; CHECK-NEXT:vsetivli zero, 2, e8, mf8, ta, ma
+; CHECK-NEXT:vle8.v v8, (a0)
+; CHECK-NEXT:vle8.v v9, (a1)
+; CHECK-NEXT:vsetivli zero, 4, e8, mf4, ta, ma
+; CHECK-NEXT:vslideup.vi v8, v9, 2
+; CHECK-NEXT:vmv.s.x v9, a2
+; CHECK-NEXT:vsetivli zero, 2, e8, mf4, tu, ma
+; CHECK-NEXT:vslideup.vi v8, v9, 1
+; CHECK-NEXT:vsetivli zero, 4, e8, mf4, ta, ma
+; CHECK-NEXT:vse8.v v8, (a0)
+; CHECK-NEXT:ret
+  %v1 = load <2 x i8>, ptr %a
+  %v2 = load <2 x i8>, ptr %b
  %concat = shufflevector <2 x i8> %v1, <2 x i8> %v2, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
+  %ins = insertelement <4 x i8> %concat, i8 %x, i32 1
+  store <4 x i8> %ins, ptr %a
+  ret void
+}
+
+define void @v4xi8_concat_vector_insert_idx1(ptr %a, ptr %b, i8 %x) {
+; CHECK-LABEL: v4xi8_concat_vector_insert_idx1:
+; CHECK:   # %bb.0:
+; CHECK-NEXT:vsetivli zero, 2, e8, mf8, ta, ma
+; CHECK-NEXT:vle8.v v8, (a0)
+; CHECK-NEXT:vle8.v v9, (a1)
+; CHECK-NEXT:vsetivli zero, 4, e8, mf4, ta, ma
+; CHECK-NEXT:vslideup.vi v8, v9, 2
+; CHECK-NEXT:vmv.s.x v9, a2
+; CHECK-NEXT:vsetivli zero, 2, e8, mf4, tu, ma
+; CHECK-NEXT:vslideup.vi v8, v9, 1
+; CHECK-NEXT:vsetivli zero, 4, e8, mf4, ta, ma
+; CHECK-NEXT:vse8.v v8, (a0)
+; CHECK-NEXT:ret
+  %v1 = load <2 x i8>, ptr %a
+  %v2 = load <2 x i8>, ptr %b
  %concat = shufflevector <2 x i8> %v1, <2 x i8> %v2, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
+  %ins = insertelement <4 x i8> %concat, i8 %x, i32 1
+  store <4 x i8> %ins, ptr %a
+  ret void
+}
+
+define void @v4xi8_concat_vector_insert_idx2(ptr %a, ptr %b, i8 %x) {
+; CHECK-LABEL: v4xi8_concat_vector_insert_idx2:
+; CHECK:   # %bb.0:
+; CHECK-NEXT:vsetivli zero, 2, e8, mf8, ta, ma
+; CHECK-NEXT:vle8.v v8, (a0)
+; CHECK-NEXT:vle8.v v9, (a1)
+; CHECK-NEXT:vsetivli zero, 4, e8, mf4, ta, ma
+; CHECK-NEXT:vslideup.vi v8, v9, 2
+; CHECK-NEXT:vmv.s.x v9, a2
+; CHECK-NEXT:vsetivli zero, 3, e8, mf4, tu, ma
+; CHECK-NEXT:vslideup.vi v8, v9, 2
+; CHECK-NEXT:vsetivli zero, 4, e8, mf4, ta, ma
+; CHECK-NEXT:vse8.v v8, (a0)
+; CHECK-NEXT:ret
+  %v1 = load <2 x i8>, ptr %a
+  %v2 = load <2 x i8>, ptr %b
  %concat = shufflevector <2 x i8> %v1, <2 x i8> %v2, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
+  %ins = insertelement <4 x i8> %concat, i8 %x, i32 2
+  store <4 x i8> %ins, ptr %a
+  ret void
+}
+
+define void @v4xi8_concat_vector_insert_idx3(ptr %a, ptr %b, i8 %x) {
+; CHECK-LABEL: v4xi8_concat_vector_insert_idx3:
+; CHECK:   # %bb.0:
+; CHECK-NEXT:vsetivli zero, 2, e8, mf8, ta, ma
+; CHECK-NEXT:vle8.v v8, (a0)
+; CHECK-NEXT:vle8.v v9, (a1)
+; CHECK-NEXT:vsetivli zero, 4, e8, mf4, ta, ma
+; CHECK-NEXT:vslideup.vi v8, v9, 2
+; CHECK-NEXT:vmv.s.x v9, a2
+; CHECK-NEXT:vslideup.vi v8, v9, 3
+; CHECK-NEXT:vse8.v v8, (a0)
+; CHECK-NEXT:ret
+  %v1 = load <2 x i8>, ptr %a
+  %v2 = load <2 x i8>, ptr %b
  %concat = shufflevector <2 x i8> %v1, <2 x i8> %v2, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
+  %ins = insertelement <4 x i8> %concat, i8 %x, i32 3
+  store <4 x i8> %ins, ptr %a
+  ret void
+}
+
+define void @v4xi64_concat_vector_insert_idx0(ptr %a, ptr %b, i64 %x) {
+; RV32-LABEL: v4xi64_concat_vector_insert_idx0:
+; RV32:   # %bb.0:
+; RV32-NEXT:vsetivli zero, 2, e64, m1, ta, ma
+; RV32-NEXT:vle64.v v8, (a0)
+; RV32-NEXT:vle64.v v10, (a1)
+; RV32-NEXT:vsetivli zero, 4, e64, m2, ta, ma
+; RV32-NEXT:vslideup.vi v8, v10, 2
+; RV32-NEXT:vsetivli zero, 2, e32, m1, ta, ma
+; RV32-NEXT:vslide1down.vx v10, v8, a2
+; RV32-NEXT:vslide1down.vx v10, v10, a3
+; RV32-NEXT:vsetivli zero, 2, e64, m1, tu, ma
+; RV32-N

[llvm-branch-commits] [llvm] 233971b - [RISCV] Fix typo in a test and regen another to reduce test diff

2023-11-17 Thread Philip Reames via llvm-branch-commits

Author: Philip Reames
Date: 2023-11-16T14:28:16-08:00
New Revision: 233971b475a48d9ad8c61632660a1b45186897cc

URL: 
https://github.com/llvm/llvm-project/commit/233971b475a48d9ad8c61632660a1b45186897cc
DIFF: 
https://github.com/llvm/llvm-project/commit/233971b475a48d9ad8c61632660a1b45186897cc.diff

LOG: [RISCV] Fix typo in a test and regen another to reduce test diff

Added: 


Modified: 
llvm/test/CodeGen/RISCV/rvv/concat-vector-insert-elt.ll
llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fp-buildvec.ll

Removed: 




diff  --git a/llvm/test/CodeGen/RISCV/rvv/concat-vector-insert-elt.ll 
b/llvm/test/CodeGen/RISCV/rvv/concat-vector-insert-elt.ll
index 9193f7aef4b8757..3fc22818a2406a5 100644
--- a/llvm/test/CodeGen/RISCV/rvv/concat-vector-insert-elt.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/concat-vector-insert-elt.ll
@@ -12,16 +12,14 @@ define void @v4xi8_concat_vector_insert_idx0(ptr %a, ptr 
%b, i8 %x) {
 ; CHECK-NEXT:vle8.v v9, (a1)
 ; CHECK-NEXT:vsetivli zero, 4, e8, mf4, ta, ma
 ; CHECK-NEXT:vslideup.vi v8, v9, 2
-; CHECK-NEXT:vmv.s.x v9, a2
-; CHECK-NEXT:vsetivli zero, 2, e8, mf4, tu, ma
-; CHECK-NEXT:vslideup.vi v8, v9, 1
-; CHECK-NEXT:vsetivli zero, 4, e8, mf4, ta, ma
+; CHECK-NEXT:vsetvli zero, zero, e8, mf4, tu, ma
+; CHECK-NEXT:vmv.s.x v8, a2
 ; CHECK-NEXT:vse8.v v8, (a0)
 ; CHECK-NEXT:ret
   %v1 = load <2 x i8>, ptr %a
   %v2 = load <2 x i8>, ptr %b
   %concat = shufflevector <2 x i8> %v1, <2 x i8> %v2, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
-  %ins = insertelement <4 x i8> %concat, i8 %x, i32 1
+  %ins = insertelement <4 x i8> %concat, i8 %x, i32 0
   store <4 x i8> %ins, ptr %a
   ret void
 }
@@ -98,11 +96,9 @@ define void @v4xi64_concat_vector_insert_idx0(ptr %a, ptr 
%b, i64 %x) {
 ; RV32-NEXT:vle64.v v10, (a1)
 ; RV32-NEXT:vsetivli zero, 4, e64, m2, ta, ma
 ; RV32-NEXT:vslideup.vi v8, v10, 2
-; RV32-NEXT:vsetivli zero, 2, e32, m1, ta, ma
-; RV32-NEXT:vslide1down.vx v10, v8, a2
-; RV32-NEXT:vslide1down.vx v10, v10, a3
-; RV32-NEXT:vsetivli zero, 2, e64, m1, tu, ma
-; RV32-NEXT:vslideup.vi v8, v10, 1
+; RV32-NEXT:vsetivli zero, 2, e32, m1, tu, ma
+; RV32-NEXT:vslide1down.vx v8, v8, a2
+; RV32-NEXT:vslide1down.vx v8, v8, a3
 ; RV32-NEXT:vsetivli zero, 4, e64, m2, ta, ma
 ; RV32-NEXT:vse64.v v8, (a0)
 ; RV32-NEXT:ret
@@ -114,16 +110,14 @@ define void @v4xi64_concat_vector_insert_idx0(ptr %a, ptr 
%b, i64 %x) {
 ; RV64-NEXT:vle64.v v10, (a1)
 ; RV64-NEXT:vsetivli zero, 4, e64, m2, ta, ma
 ; RV64-NEXT:vslideup.vi v8, v10, 2
-; RV64-NEXT:vmv.s.x v10, a2
-; RV64-NEXT:vsetivli zero, 2, e64, m1, tu, ma
-; RV64-NEXT:vslideup.vi v8, v10, 1
-; RV64-NEXT:vsetivli zero, 4, e64, m2, ta, ma
+; RV64-NEXT:vsetvli zero, zero, e64, m2, tu, ma
+; RV64-NEXT:vmv.s.x v8, a2
 ; RV64-NEXT:vse64.v v8, (a0)
 ; RV64-NEXT:ret
   %v1 = load <2 x i64>, ptr %a
   %v2 = load <2 x i64>, ptr %b
   %concat = shufflevector <2 x i64> %v1, <2 x i64> %v2, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
-  %ins = insertelement <4 x i64> %concat, i64 %x, i32 1
+  %ins = insertelement <4 x i64> %concat, i64 %x, i32 0
   store <4 x i64> %ins, ptr %a
   ret void
 }

diff  --git a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fp-buildvec.ll 
b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fp-buildvec.ll
index d1ea56a1ff93819..2d8bae7092242d3 100644
--- a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fp-buildvec.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fp-buildvec.ll
@@ -1080,6 +1080,13 @@ define <32 x double> @buildvec_v32f64(double %e0, double 
%e1, double %e2, double
 ; FIXME: These constants have enough sign bits that we could use vmv.v.x/i and
 ; vsext, but we don't support this for FP yet.
 define <2 x float> @signbits() {
+; CHECK-LABEL: signbits:
+; CHECK:   # %bb.0: # %entry
+; CHECK-NEXT:lui a0, %hi(.LCPI24_0)
+; CHECK-NEXT:addi a0, a0, %lo(.LCPI24_0)
+; CHECK-NEXT:vsetivli zero, 2, e32, mf2, ta, ma
+; CHECK-NEXT:vle32.v v8, (a0)
+; CHECK-NEXT:ret
 entry:
   ret <2 x float> 
 }





[llvm-branch-commits] [llvm] 52b413f - [RISCV] Precommit tests for buildvector lowering with exact VLEN

2023-11-27 Thread Philip Reames via llvm-branch-commits

Author: Philip Reames
Date: 2023-11-27T16:48:20-08:00
New Revision: 52b413f25ae79b07df88c0224adec4a6d7dabecc

URL: 
https://github.com/llvm/llvm-project/commit/52b413f25ae79b07df88c0224adec4a6d7dabecc
DIFF: 
https://github.com/llvm/llvm-project/commit/52b413f25ae79b07df88c0224adec4a6d7dabecc.diff

LOG: [RISCV] Precommit tests for buildvector lowering with exact VLEN

Added: 


Modified: 
llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fp-buildvec.ll
llvm/test/CodeGen/RISCV/rvv/fixed-vectors-int-buildvec.ll

Removed: 




diff  --git a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fp-buildvec.ll 
b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fp-buildvec.ll
index 05aa5f9807b9fc4..31ed3083e05a114 100644
--- a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fp-buildvec.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fp-buildvec.ll
@@ -1077,13 +1077,252 @@ define <32 x double> @buildvec_v32f64(double %e0, 
double %e1, double %e2, double
   ret <32 x double> %v31
 }
 
+define <32 x double> @buildvec_v32f64_exact_vlen(double %e0, double %e1, 
double %e2, double %e3, double %e4, double %e5, double %e6, double %e7, double 
%e8, double %e9, double %e10, double %e11, double %e12, double %e13, double 
%e14, double %e15, double %e16, double %e17, double %e18, double %e19, double 
%e20, double %e21, double %e22, double %e23, double %e24, double %e25, double 
%e26, double %e27, double %e28, double %e29, double %e30, double %e31) 
vscale_range(2,2) {
+; RV32-LABEL: buildvec_v32f64_exact_vlen:
+; RV32:   # %bb.0:
+; RV32-NEXT:addi sp, sp, -512
+; RV32-NEXT:.cfi_def_cfa_offset 512
+; RV32-NEXT:sw ra, 508(sp) # 4-byte Folded Spill
+; RV32-NEXT:sw s0, 504(sp) # 4-byte Folded Spill
+; RV32-NEXT:fsd fs0, 496(sp) # 8-byte Folded Spill
+; RV32-NEXT:fsd fs1, 488(sp) # 8-byte Folded Spill
+; RV32-NEXT:fsd fs2, 480(sp) # 8-byte Folded Spill
+; RV32-NEXT:fsd fs3, 472(sp) # 8-byte Folded Spill
+; RV32-NEXT:fsd fs4, 464(sp) # 8-byte Folded Spill
+; RV32-NEXT:fsd fs5, 456(sp) # 8-byte Folded Spill
+; RV32-NEXT:fsd fs6, 448(sp) # 8-byte Folded Spill
+; RV32-NEXT:fsd fs7, 440(sp) # 8-byte Folded Spill
+; RV32-NEXT:fsd fs8, 432(sp) # 8-byte Folded Spill
+; RV32-NEXT:fsd fs9, 424(sp) # 8-byte Folded Spill
+; RV32-NEXT:fsd fs10, 416(sp) # 8-byte Folded Spill
+; RV32-NEXT:fsd fs11, 408(sp) # 8-byte Folded Spill
+; RV32-NEXT:.cfi_offset ra, -4
+; RV32-NEXT:.cfi_offset s0, -8
+; RV32-NEXT:.cfi_offset fs0, -16
+; RV32-NEXT:.cfi_offset fs1, -24
+; RV32-NEXT:.cfi_offset fs2, -32
+; RV32-NEXT:.cfi_offset fs3, -40
+; RV32-NEXT:.cfi_offset fs4, -48
+; RV32-NEXT:.cfi_offset fs5, -56
+; RV32-NEXT:.cfi_offset fs6, -64
+; RV32-NEXT:.cfi_offset fs7, -72
+; RV32-NEXT:.cfi_offset fs8, -80
+; RV32-NEXT:.cfi_offset fs9, -88
+; RV32-NEXT:.cfi_offset fs10, -96
+; RV32-NEXT:.cfi_offset fs11, -104
+; RV32-NEXT:addi s0, sp, 512
+; RV32-NEXT:.cfi_def_cfa s0, 0
+; RV32-NEXT:andi sp, sp, -128
+; RV32-NEXT:sw a0, 120(sp)
+; RV32-NEXT:sw a1, 124(sp)
+; RV32-NEXT:fld ft0, 120(sp)
+; RV32-NEXT:sw a2, 120(sp)
+; RV32-NEXT:sw a3, 124(sp)
+; RV32-NEXT:fld ft1, 120(sp)
+; RV32-NEXT:sw a4, 120(sp)
+; RV32-NEXT:sw a5, 124(sp)
+; RV32-NEXT:fld ft2, 120(sp)
+; RV32-NEXT:sw a6, 120(sp)
+; RV32-NEXT:sw a7, 124(sp)
+; RV32-NEXT:fld ft3, 120(sp)
+; RV32-NEXT:fld ft4, 0(s0)
+; RV32-NEXT:fld ft5, 8(s0)
+; RV32-NEXT:fld ft6, 16(s0)
+; RV32-NEXT:fld ft7, 24(s0)
+; RV32-NEXT:fld ft8, 32(s0)
+; RV32-NEXT:fld ft9, 40(s0)
+; RV32-NEXT:fld ft10, 48(s0)
+; RV32-NEXT:fld ft11, 56(s0)
+; RV32-NEXT:fld fs0, 64(s0)
+; RV32-NEXT:fld fs1, 72(s0)
+; RV32-NEXT:fld fs2, 80(s0)
+; RV32-NEXT:fld fs3, 88(s0)
+; RV32-NEXT:fld fs4, 96(s0)
+; RV32-NEXT:fld fs5, 104(s0)
+; RV32-NEXT:fld fs6, 112(s0)
+; RV32-NEXT:fld fs7, 120(s0)
+; RV32-NEXT:fld fs8, 152(s0)
+; RV32-NEXT:fld fs9, 144(s0)
+; RV32-NEXT:fld fs10, 136(s0)
+; RV32-NEXT:fld fs11, 128(s0)
+; RV32-NEXT:fsd fs8, 248(sp)
+; RV32-NEXT:fsd fs9, 240(sp)
+; RV32-NEXT:fsd fs10, 232(sp)
+; RV32-NEXT:fsd fs11, 224(sp)
+; RV32-NEXT:fsd fs7, 216(sp)
+; RV32-NEXT:fsd fs6, 208(sp)
+; RV32-NEXT:fsd fs5, 200(sp)
+; RV32-NEXT:fsd fs4, 192(sp)
+; RV32-NEXT:fsd fs3, 184(sp)
+; RV32-NEXT:fsd fs2, 176(sp)
+; RV32-NEXT:fsd fs1, 168(sp)
+; RV32-NEXT:fsd fs0, 160(sp)
+; RV32-NEXT:fsd ft11, 152(sp)
+; RV32-NEXT:fsd ft10, 144(sp)
+; RV32-NEXT:fsd ft9, 136(sp)
+; RV32-NEXT:fsd ft8, 128(sp)
+; RV32-NEXT:fsd ft7, 376(sp)
+; RV32-NEXT:fsd ft6, 368(sp)
+; RV32-NEXT:fsd ft5, 360(sp)
+; RV32-NEXT:fsd ft4, 352(sp)
+; RV32-NEXT:fsd fa7, 312(sp)
+; RV32-NEXT:fsd fa6, 304(sp)
+; RV32-NEXT:fsd fa5, 296(sp)
+; RV32-NEXT:fsd fa4, 288(sp)
+; RV32-NEXT:   


[llvm-branch-commits] [llvm] Revert "[RISCV] Recurse on first operand of two operand shuffles (#79180)" (PR #80238)

2024-01-31 Thread Philip Reames via llvm-branch-commits

https://github.com/preames created 
https://github.com/llvm/llvm-project/pull/80238

This reverts commit bdc41106ee48dce59c500c9a3957af947f30c8c3 on the
release/18.x branch.  This change was the first in a mini-series,
and while I'm not aware of any particular problem from having it on
its own in the branch, it seems safer to ship with the previous
known good state.

@tstellar This is my first backport in the new process, so please bear with me 
and double check I got all pieces of this right.

>From 98e43e0054ab81e3455011933e1bdf64bd59e148 Mon Sep 17 00:00:00 2001
From: Philip Reames 
Date: Wed, 31 Jan 2024 14:44:39 -0800
Subject: [PATCH] Revert "[RISCV] Recurse on first operand of two operand
 shuffles (#79180)"

This reverts commit bdc41106ee48dce59c500c9a3957af947f30c8c3 on the
release/18.x branch.  This change was the first in a mini-series,
and while I'm not aware of any particular problem from having it on
its own in the branch, it seems safer to ship with the previous
known good state.
---
 llvm/lib/Target/RISCV/RISCVISelLowering.cpp   |  92 ++---
 .../RISCV/rvv/fixed-vectors-fp-interleave.ll  |  41 +-
 .../RISCV/rvv/fixed-vectors-int-interleave.ll |  63 +--
 .../RISCV/rvv/fixed-vectors-int-shuffles.ll   |  43 +-
 .../rvv/fixed-vectors-interleaved-access.ll   | 387 +-
 .../rvv/fixed-vectors-shuffle-transpose.ll| 128 +++---
 6 files changed, 407 insertions(+), 347 deletions(-)

diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp 
b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
index 47c6cd6e5487b..c8f7b5c35a381 100644
--- a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+++ b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
@@ -5033,60 +5033,56 @@ static SDValue lowerVECTOR_SHUFFLE(SDValue Op, 
SelectionDAG &DAG,
   MVT IndexContainerVT =
   ContainerVT.changeVectorElementType(IndexVT.getScalarType());
 
-  // Base case for the recursion just below - handle the worst case
-  // single source permutation.  Note that all the splat variants
-  // are handled above.
-  if (V2.isUndef()) {
+  SDValue Gather;
+  // TODO: This doesn't trigger for i64 vectors on RV32, since there we
+  // encounter a bitcasted BUILD_VECTOR with low/high i32 values.
+  if (SDValue SplatValue = DAG.getSplatValue(V1, /*LegalTypes*/ true)) {
+Gather = lowerScalarSplat(SDValue(), SplatValue, VL, ContainerVT, DL, DAG,
+  Subtarget);
+  } else {
 V1 = convertToScalableVector(ContainerVT, V1, DAG, Subtarget);
-SDValue LHSIndices = DAG.getBuildVector(IndexVT, DL, GatherIndicesLHS);
-LHSIndices = convertToScalableVector(IndexContainerVT, LHSIndices, DAG,
- Subtarget);
-SDValue Gather = DAG.getNode(GatherVVOpc, DL, ContainerVT, V1, LHSIndices,
- DAG.getUNDEF(ContainerVT), TrueMask, VL);
-return convertFromScalableVector(VT, Gather, DAG, Subtarget);
-  }
-
-  // Translate the gather index we computed above (and possibly swapped)
-  // back to a shuffle mask.  This step should disappear once we complete
-  // the migration to recursive design.
-  SmallVector ShuffleMaskLHS;
-  ShuffleMaskLHS.reserve(GatherIndicesLHS.size());
-  for (SDValue GatherIndex : GatherIndicesLHS) {
-if (GatherIndex.isUndef()) {
-  ShuffleMaskLHS.push_back(-1);
-  continue;
+// If only one index is used, we can use a "splat" vrgather.
+// TODO: We can splat the most-common index and fix-up any stragglers, if
+// that's beneficial.
+if (LHSIndexCounts.size() == 1) {
+  int SplatIndex = LHSIndexCounts.begin()->getFirst();
+  Gather = DAG.getNode(GatherVXOpc, DL, ContainerVT, V1,
+   DAG.getConstant(SplatIndex, DL, XLenVT),
+   DAG.getUNDEF(ContainerVT), TrueMask, VL);
+} else {
+  SDValue LHSIndices = DAG.getBuildVector(IndexVT, DL, GatherIndicesLHS);
+  LHSIndices =
+  convertToScalableVector(IndexContainerVT, LHSIndices, DAG, 
Subtarget);
+
+  Gather = DAG.getNode(GatherVVOpc, DL, ContainerVT, V1, LHSIndices,
+   DAG.getUNDEF(ContainerVT), TrueMask, VL);
 }
-auto *IdxC = cast(GatherIndex);
-ShuffleMaskLHS.push_back(IdxC->getZExtValue());
   }
 
-  // Recursively invoke lowering for the LHS as if there were no RHS.
-  // This allows us to leverage all of our single source permute tricks.
-  SDValue Gather =
-DAG.getVectorShuffle(VT, DL, V1, DAG.getUNDEF(VT), ShuffleMaskLHS);
-  Gather = convertToScalableVector(ContainerVT, Gather, DAG, Subtarget);
+  // If a second vector operand is used by this shuffle, blend it in with an
+  // additional vrgather.
+  if (!V2.isUndef()) {
+V2 = convertToScalableVector(ContainerVT, V2, DAG, Subtarget);
 
-  // Blend in second vector source with an additional vrgather.
-  V2 = convertToScalableVector(ContainerVT, V2, DAG, Subtarget);
+MVT MaskContainerVT = ContainerVT.changeVectorElementType(MVT::i1);
+SelectMask =
+convert

[llvm-branch-commits] [flang] [libc] [compiler-rt] [clang] [libcxx] [llvm] [RISCV] Support select optimization (PR #80124)

2024-01-31 Thread Philip Reames via llvm-branch-commits

preames wrote:

> and the measurement data still stands for RISCV.

Please give the measurement data in this review or a direct link to it.  I 
tried searching for it, and did not immediately find it.

https://github.com/llvm/llvm-project/pull/80124


[llvm-branch-commits] [llvm] [compiler-rt] [flang] [clang] [libcxx] [libc] [RISCV] Support select optimization (PR #80124)

2024-02-01 Thread Philip Reames via llvm-branch-commits

preames wrote:

JFYI, I don't find the AArch64 data particularly convincing for RISCV.  The 
magnitude of the change even on AArch64 is small, and could easily be swung one 
direction or the other by differences in implementation between the backends.  

https://github.com/llvm/llvm-project/pull/80124


[llvm-branch-commits] [llvm] 9f61fbd - [LV] Relax assumption that LCSSA implies single entry

2021-01-12 Thread Philip Reames via llvm-branch-commits

Author: Philip Reames
Date: 2021-01-12T12:34:52-08:00
New Revision: 9f61fbd75ae1757d77988b37562de4d6583579aa

URL: 
https://github.com/llvm/llvm-project/commit/9f61fbd75ae1757d77988b37562de4d6583579aa
DIFF: 
https://github.com/llvm/llvm-project/commit/9f61fbd75ae1757d77988b37562de4d6583579aa.diff

LOG: [LV] Relax assumption that LCSSA implies single entry

This relates to the ongoing effort to support vectorization of multiple exit 
loops (see D93317).

The previous code assumed that LCSSA phis were always single entry before the 
vectorizer ran. This was correct, but only because the vectorizer allowed only 
a single exiting edge. There's nothing in the definition of LCSSA which 
requires single entry phis.

A common case where this comes up is with a loop with multiple exiting blocks 
which all reach a common exit block. (e.g. see the test updates)
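
To illustrate the shape of loop involved, a small hypothetical C++ example 
(not taken from the test updates) whose IR has two exiting edges reaching one 
common exit block, so the LCSSA phi for i gets two incoming values:

  int first_match(const int *a, int n, int key) {
    int i = 0;
    for (; i < n; ++i)   // exiting edge #1: the loop test fails
      if (a[i] == key)
        break;           // exiting edge #2: the early exit
    return i;            // under LCSSA, this use of i is a two-entry phi
  }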

Differential Revision: https://reviews.llvm.org/D93725

Added: 


Modified: 
llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
llvm/test/Transforms/LoopVectorize/first-order-recurrence-complex.ll
llvm/test/Transforms/LoopVectorize/loop-form.ll

Removed: 




diff  --git a/llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp 
b/llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
index 3906b11ba4b9..e3e522958c3a 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
@@ -1101,8 +1101,7 @@ bool LoopVectorizationLegality::canVectorizeLoopCFG(Loop 
*Lp,
   // TODO: This restriction can be relaxed in the near future, it's here solely
   // to allow separation of changes for review. We need to generalize the phi
   // update logic in a number of places.
-  BasicBlock *ExitBB = Lp->getUniqueExitBlock();
-  if (!ExitBB) {
+  if (!Lp->getUniqueExitBlock()) {
 reportVectorizationFailure("The loop must have a unique exit block",
 "loop control flow is not understood by vectorizer",
 "CFGNotUnderstood", ORE, TheLoop);
@@ -1110,24 +1109,7 @@ bool LoopVectorizationLegality::canVectorizeLoopCFG(Loop 
*Lp,
   Result = false;
 else
   return false;
-  } else {
-// The existing code assumes that LCSSA implies that phis are single entry
-// (which was true when we had at most a single exiting edge from the 
latch).
-// In general, there's nothing which prevents an LCSSA phi in exit block 
from
-// having two or more values if there are multiple exiting edges leading to
-// the exit block.  (TODO: implement general case)
-if (!llvm::empty(ExitBB->phis()) && !ExitBB->getSinglePredecessor()) {
-  reportVectorizationFailure("The loop must have no live-out values if "
- "it has more than one exiting block",
-  "loop control flow is not understood by vectorizer",
-  "CFGNotUnderstood", ORE, TheLoop);
-  if (DoExtraAnalysis)
-Result = false;
-  else
-return false;
-}
   }
-
   return Result;
 }
 

diff  --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp 
b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index e6cadf8f8796..5ae400fb5dc9 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -633,10 +633,11 @@ class InnerLoopVectorizer {
   /// Clear NSW/NUW flags from reduction instructions if necessary.
   void clearReductionWrapFlags(RecurrenceDescriptor &RdxDesc);
 
-  /// The Loop exit block may have single value PHI nodes with some
-  /// incoming value. While vectorizing we only handled real values
-  /// that were defined inside the loop and we should have one value for
-  /// each predecessor of its parent basic block. See PR14725.
+  /// Fixup the LCSSA phi nodes in the unique exit block.  This simply
+  /// means we need to add the appropriate incoming value from the middle
+  /// block as exiting edges from the scalar epilogue loop (if present) are
+  /// already in place, and we exit the vector loop exclusively to the middle
+  /// block.
   void fixLCSSAPHIs();
 
   /// Iteratively sink the scalarized operands of a predicated instruction into
@@ -4149,11 +4150,14 @@ void 
InnerLoopVectorizer::fixFirstOrderRecurrence(PHINode *Phi) {
   // vector recurrence we extracted in the middle block. Since the loop is in
   // LCSSA form, we just need to find all the phi nodes for the original scalar
   // recurrence in the exit block, and then add an edge for the middle block.
-  for (PHINode &LCSSAPhi : LoopExitBlock->phis()) {
-if (LCSSAPhi.getIncomingValue(0) == Phi) {
+  // Note that LCSSA does not imply single entry when the original scalar loop
+  // had multiple exiting edges (as we always run the last iteration in the
+  // scalar epilogue); in that case, the exiting path through middle will be
+  // dynamically dead

[llvm-branch-commits] [llvm] caafdf0 - [LV] Weaken spuriously strong assert in LoopVersioning

2021-01-12 Thread Philip Reames via llvm-branch-commits

Author: Philip Reames
Date: 2021-01-12T12:57:13-08:00
New Revision: caafdf07bbccbe89219539e2b56043c2a98358f1

URL: 
https://github.com/llvm/llvm-project/commit/caafdf07bbccbe89219539e2b56043c2a98358f1
DIFF: 
https://github.com/llvm/llvm-project/commit/caafdf07bbccbe89219539e2b56043c2a98358f1.diff

LOG: [LV] Weaken spuriously strong assert in LoopVersioning

LoopVectorize uses some utilities on LoopVersioning, but doesn't actually use 
it for, you know, versioning.  As a result, the precondition LoopVersioning 
expects is too strong for this user.  At the moment, LoopVectorize supports any 
loop with a unique exit block, so check the same precondition here.

Really, the whole class structure here is a mess.  We should separate the 
actual versioning from the metadata updates, but that's a bigger problem.
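
For reference, the precondition difference behind the assert change (my 
paraphrase of the Loop API, as a comment sketch; only the assert line itself 
comes from the patch):

  // getExitBlock():       non-null only when the loop has a single exiting
  //                       edge, i.e. exactly one edge leaves the loop.
  // getUniqueExitBlock(): non-null when all exiting edges, however many,
  //                       lead to the same block outside the loop.
  // LoopVectorize guarantees only the weaker, latter property, hence:
  assert(L->getUniqueExitBlock() && "No single exit block");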

Added: 


Modified: 
llvm/lib/Transforms/Utils/LoopVersioning.cpp

Removed: 




diff  --git a/llvm/lib/Transforms/Utils/LoopVersioning.cpp 
b/llvm/lib/Transforms/Utils/LoopVersioning.cpp
index b54aee35d56d..599bd1feb2bc 100644
--- a/llvm/lib/Transforms/Utils/LoopVersioning.cpp
+++ b/llvm/lib/Transforms/Utils/LoopVersioning.cpp
@@ -44,7 +44,7 @@ LoopVersioning::LoopVersioning(const LoopAccessInfo &LAI,
   AliasChecks(Checks.begin(), Checks.end()),
   Preds(LAI.getPSE().getUnionPredicate()), LAI(LAI), LI(LI), DT(DT),
   SE(SE) {
-  assert(L->getExitBlock() && "No single exit block");
+  assert(L->getUniqueExitBlock() && "No single exit block");
 }
 
 void LoopVersioning::versionLoop(





[llvm-branch-commits] [llvm] 7011086 - [test] Autogen a loop vectorizer test to make future changes visible

2021-01-17 Thread Philip Reames via llvm-branch-commits

Author: Philip Reames
Date: 2021-01-17T20:03:22-08:00
New Revision: 7011086dc1cd5575f971db0138a62387939e6a73

URL: 
https://github.com/llvm/llvm-project/commit/7011086dc1cd5575f971db0138a62387939e6a73
DIFF: 
https://github.com/llvm/llvm-project/commit/7011086dc1cd5575f971db0138a62387939e6a73.diff

LOG: [test] Autogen a loop vectorizer test to make future changes visible

Added: 


Modified: 
llvm/test/Transforms/LoopVectorize/interleaved-accesses.ll

Removed: 




diff  --git a/llvm/test/Transforms/LoopVectorize/interleaved-accesses.ll 
b/llvm/test/Transforms/LoopVectorize/interleaved-accesses.ll
index dbc90bcf4519..0d4bdf0ecac3 100644
--- a/llvm/test/Transforms/LoopVectorize/interleaved-accesses.ll
+++ b/llvm/test/Transforms/LoopVectorize/interleaved-accesses.ll
@@ -1,3 +1,4 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
 ; RUN: opt -S -loop-vectorize -instcombine -force-vector-width=4 
-force-vector-interleave=1 -enable-interleaved-mem-accesses=true 
-runtime-memory-check-threshold=24 < %s | FileCheck %s
 
 target datalayout = "e-m:e-i64:64-i128:128-n32:64-S128"
@@ -16,19 +17,48 @@ target datalayout = "e-m:e-i64:64-i128:128-n32:64-S128"
 ;   }
 ; }
 
-; CHECK-LABEL: @test_array_load2_store2(
-; CHECK: %wide.vec = load <8 x i32>, <8 x i32>* %{{.*}}, align 4
-; CHECK: shufflevector <8 x i32> %wide.vec, <8 x i32> poison, <4 x i32> <i32 0, i32 2, i32 4, i32 6>
-; CHECK: shufflevector <8 x i32> %wide.vec, <8 x i32> poison, <4 x i32> <i32 1, i32 3, i32 5, i32 7>
-; CHECK: add nsw <4 x i32>
-; CHECK: mul nsw <4 x i32>
-; CHECK: %interleaved.vec = shufflevector <4 x i32> {{.*}}, <8 x i32> <i32 0, i32 4, i32 1, i32 5, i32 2, i32 6, i32 3, i32 7>
-; CHECK: store <8 x i32> %interleaved.vec, <8 x i32>* %{{.*}}, align 4
 
 @AB = common global [1024 x i32] zeroinitializer, align 4
 @CD = common global [1024 x i32] zeroinitializer, align 4
 
 define void @test_array_load2_store2(i32 %C, i32 %D) {
+; CHECK-LABEL: @test_array_load2_store2(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
+; CHECK:   vector.ph:
+; CHECK-NEXT:[[BROADCAST_SPLATINSERT:%.*]] = insertelement <4 x i32> 
poison, i32 [[C:%.*]], i32 0
+; CHECK-NEXT:[[BROADCAST_SPLAT:%.*]] = shufflevector <4 x i32> 
[[BROADCAST_SPLATINSERT]], <4 x i32> poison, <4 x i32> zeroinitializer
+; CHECK-NEXT:[[BROADCAST_SPLATINSERT2:%.*]] = insertelement <4 x i32> 
poison, i32 [[D:%.*]], i32 0
+; CHECK-NEXT:[[BROADCAST_SPLAT3:%.*]] = shufflevector <4 x i32> 
[[BROADCAST_SPLATINSERT2]], <4 x i32> poison, <4 x i32> zeroinitializer
+; CHECK-NEXT:br label [[VECTOR_BODY:%.*]]
+; CHECK:   vector.body:
+; CHECK-NEXT:[[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ 
[[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
+; CHECK-NEXT:[[OFFSET_IDX:%.*]] = shl i64 [[INDEX]], 1
+; CHECK-NEXT:[[TMP0:%.*]] = getelementptr inbounds [1024 x i32], [1024 x 
i32]* @AB, i64 0, i64 [[OFFSET_IDX]]
+; CHECK-NEXT:[[TMP1:%.*]] = bitcast i32* [[TMP0]] to <8 x i32>*
+; CHECK-NEXT:[[WIDE_VEC:%.*]] = load <8 x i32>, <8 x i32>* [[TMP1]], align 
4
+; CHECK-NEXT:[[STRIDED_VEC:%.*]] = shufflevector <8 x i32> [[WIDE_VEC]], 
<8 x i32> poison, <4 x i32> <i32 0, i32 2, i32 4, i32 6>
+; CHECK-NEXT:[[STRIDED_VEC1:%.*]] = shufflevector <8 x i32> [[WIDE_VEC]], 
<8 x i32> poison, <4 x i32> <i32 1, i32 3, i32 5, i32 7>
+; CHECK-NEXT:[[TMP2:%.*]] = or i64 [[OFFSET_IDX]], 1
+; CHECK-NEXT:[[TMP3:%.*]] = add nsw <4 x i32> [[STRIDED_VEC]], 
[[BROADCAST_SPLAT]]
+; CHECK-NEXT:[[TMP4:%.*]] = mul nsw <4 x i32> [[STRIDED_VEC1]], 
[[BROADCAST_SPLAT3]]
+; CHECK-NEXT:[[TMP5:%.*]] = getelementptr inbounds [1024 x i32], [1024 x 
i32]* @CD, i64 0, i64 [[TMP2]]
+; CHECK-NEXT:[[TMP6:%.*]] = getelementptr inbounds i32, i32* [[TMP5]], i64 
-1
+; CHECK-NEXT:[[TMP7:%.*]] = bitcast i32* [[TMP6]] to <8 x i32>*
+; CHECK-NEXT:[[INTERLEAVED_VEC:%.*]] = shufflevector <4 x i32> [[TMP3]], 
<4 x i32> [[TMP4]], <8 x i32> <i32 0, i32 4, i32 1, i32 5, i32 2, i32 6, i32 3, i32 7>
+; CHECK-NEXT:store <8 x i32> [[INTERLEAVED_VEC]], <8 x i32>* [[TMP7]], 
align 4
+; CHECK-NEXT:[[INDEX_NEXT]] = add i64 [[INDEX]], 4
+; CHECK-NEXT:[[TMP8:%.*]] = icmp eq i64 [[INDEX_NEXT]], 512
+; CHECK-NEXT:br i1 [[TMP8]], label [[MIDDLE_BLOCK:%.*]], label 
[[VECTOR_BODY]], [[LOOP0:!llvm.loop !.*]]
+; CHECK:   middle.block:
+; CHECK-NEXT:br i1 true, label [[FOR_END:%.*]], label [[SCALAR_PH]]
+; CHECK:   scalar.ph:
+; CHECK-NEXT:br label [[FOR_BODY:%.*]]
+; CHECK:   for.body:
+; CHECK-NEXT:br i1 undef, label [[FOR_BODY]], label [[FOR_END]], 
[[LOOP2:!llvm.loop !.*]]
+; CHECK:   for.end:
+; CHECK-NEXT:ret void
+;
 entry:
   br label %for.body
 
@@ -67,24 +97,48 @@ for.end:  ; preds = 
%for.body
 ;   }
 ; }
 
-; CHECK-LABEL: @test_struct_array_load3_store3(
-; CHECK: %wide.vec = load <12 x i32>, <12 x i32>* {{.*}}, align 4
-; CHECK: shufflevector <12 x i32> %wide.vec, <12 x i32> poison, <4 x i32> 
-; CHECK: shufflevector <12 x i32> %wide.vec, <12 x i32> poison, <4 x i32

[llvm-branch-commits] [llvm] 8356610 - [test] pre commit a couple more tests for vectorizing multiple exit loops

2021-01-17 Thread Philip Reames via llvm-branch-commits

Author: Philip Reames
Date: 2021-01-17T20:29:13-08:00
New Revision: 8356610f8d48ca7ecbb930dd9b987e4269784710

URL: 
https://github.com/llvm/llvm-project/commit/8356610f8d48ca7ecbb930dd9b987e4269784710
DIFF: 
https://github.com/llvm/llvm-project/commit/8356610f8d48ca7ecbb930dd9b987e4269784710.diff

LOG: [test] pre commit a couple more tests for vectorizing multiple exit loops

Added: 


Modified: 
llvm/test/Transforms/LoopVectorize/loop-form.ll

Removed: 




diff  --git a/llvm/test/Transforms/LoopVectorize/loop-form.ll 
b/llvm/test/Transforms/LoopVectorize/loop-form.ll
index 5b2dd81a395b..91780789088b 100644
--- a/llvm/test/Transforms/LoopVectorize/loop-form.ll
+++ b/llvm/test/Transforms/LoopVectorize/loop-form.ll
@@ -588,6 +588,140 @@ if.end2:
   ret i32 1
 }
 
+; LCSSA, common value each exit
+define i32 @multiple_exit_blocks2(i16* %p, i32 %n) {
+; CHECK-LABEL: @multiple_exit_blocks2(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:br label [[FOR_COND:%.*]]
+; CHECK:   for.cond:
+; CHECK-NEXT:[[I:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[INC:%.*]], 
[[FOR_BODY:%.*]] ]
+; CHECK-NEXT:[[CMP:%.*]] = icmp slt i32 [[I]], [[N:%.*]]
+; CHECK-NEXT:br i1 [[CMP]], label [[FOR_BODY]], label [[IF_END:%.*]]
+; CHECK:   for.body:
+; CHECK-NEXT:[[IPROM:%.*]] = sext i32 [[I]] to i64
+; CHECK-NEXT:[[B:%.*]] = getelementptr inbounds i16, i16* [[P:%.*]], i64 
[[IPROM]]
+; CHECK-NEXT:store i16 0, i16* [[B]], align 4
+; CHECK-NEXT:[[INC]] = add nsw i32 [[I]], 1
+; CHECK-NEXT:[[CMP2:%.*]] = icmp slt i32 [[I]], 2096
+; CHECK-NEXT:br i1 [[CMP2]], label [[FOR_COND]], label [[IF_END2:%.*]]
+; CHECK:   if.end:
+; CHECK-NEXT:[[I_LCSSA:%.*]] = phi i32 [ [[I]], [[FOR_COND]] ]
+; CHECK-NEXT:ret i32 [[I_LCSSA]]
+; CHECK:   if.end2:
+; CHECK-NEXT:[[I_LCSSA1:%.*]] = phi i32 [ [[I]], [[FOR_BODY]] ]
+; CHECK-NEXT:ret i32 [[I_LCSSA1]]
+;
+; TAILFOLD-LABEL: @multiple_exit_blocks2(
+; TAILFOLD-NEXT:  entry:
+; TAILFOLD-NEXT:br label [[FOR_COND:%.*]]
+; TAILFOLD:   for.cond:
+; TAILFOLD-NEXT:[[I:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[INC:%.*]], 
[[FOR_BODY:%.*]] ]
+; TAILFOLD-NEXT:[[CMP:%.*]] = icmp slt i32 [[I]], [[N:%.*]]
+; TAILFOLD-NEXT:br i1 [[CMP]], label [[FOR_BODY]], label [[IF_END:%.*]]
+; TAILFOLD:   for.body:
+; TAILFOLD-NEXT:[[IPROM:%.*]] = sext i32 [[I]] to i64
+; TAILFOLD-NEXT:[[B:%.*]] = getelementptr inbounds i16, i16* [[P:%.*]], 
i64 [[IPROM]]
+; TAILFOLD-NEXT:store i16 0, i16* [[B]], align 4
+; TAILFOLD-NEXT:[[INC]] = add nsw i32 [[I]], 1
+; TAILFOLD-NEXT:[[CMP2:%.*]] = icmp slt i32 [[I]], 2096
+; TAILFOLD-NEXT:br i1 [[CMP2]], label [[FOR_COND]], label [[IF_END2:%.*]]
+; TAILFOLD:   if.end:
+; TAILFOLD-NEXT:[[I_LCSSA:%.*]] = phi i32 [ [[I]], [[FOR_COND]] ]
+; TAILFOLD-NEXT:ret i32 [[I_LCSSA]]
+; TAILFOLD:   if.end2:
+; TAILFOLD-NEXT:[[I_LCSSA1:%.*]] = phi i32 [ [[I]], [[FOR_BODY]] ]
+; TAILFOLD-NEXT:ret i32 [[I_LCSSA1]]
+;
+entry:
+  br label %for.cond
+
+for.cond:
+  %i = phi i32 [ 0, %entry ], [ %inc, %for.body ]
+  %cmp = icmp slt i32 %i, %n
+  br i1 %cmp, label %for.body, label %if.end
+
+for.body:
+  %iprom = sext i32 %i to i64
+  %b = getelementptr inbounds i16, i16* %p, i64 %iprom
+  store i16 0, i16* %b, align 4
+  %inc = add nsw i32 %i, 1
+  %cmp2 = icmp slt i32 %i, 2096
+  br i1 %cmp2, label %for.cond, label %if.end2
+
+if.end:
+  ret i32 %i
+
+if.end2:
+  ret i32 %i
+}
+
+; LCSSA, distinct value each exit
+define i32 @multiple_exit_blocks3(i16* %p, i32 %n) {
+; CHECK-LABEL: @multiple_exit_blocks3(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:br label [[FOR_COND:%.*]]
+; CHECK:   for.cond:
+; CHECK-NEXT:[[I:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[INC:%.*]], 
[[FOR_BODY:%.*]] ]
+; CHECK-NEXT:[[CMP:%.*]] = icmp slt i32 [[I]], [[N:%.*]]
+; CHECK-NEXT:br i1 [[CMP]], label [[FOR_BODY]], label [[IF_END:%.*]]
+; CHECK:   for.body:
+; CHECK-NEXT:[[IPROM:%.*]] = sext i32 [[I]] to i64
+; CHECK-NEXT:[[B:%.*]] = getelementptr inbounds i16, i16* [[P:%.*]], i64 
[[IPROM]]
+; CHECK-NEXT:store i16 0, i16* [[B]], align 4
+; CHECK-NEXT:[[INC]] = add nsw i32 [[I]], 1
+; CHECK-NEXT:[[CMP2:%.*]] = icmp slt i32 [[I]], 2096
+; CHECK-NEXT:br i1 [[CMP2]], label [[FOR_COND]], label [[IF_END2:%.*]]
+; CHECK:   if.end:
+; CHECK-NEXT:[[I_LCSSA:%.*]] = phi i32 [ [[I]], [[FOR_COND]] ]
+; CHECK-NEXT:ret i32 [[I_LCSSA]]
+; CHECK:   if.end2:
+; CHECK-NEXT:[[INC_LCSSA:%.*]] = phi i32 [ [[INC]], [[FOR_BODY]] ]
+; CHECK-NEXT:ret i32 [[INC_LCSSA]]
+;
+; TAILFOLD-LABEL: @multiple_exit_blocks3(
+; TAILFOLD-NEXT:  entry:
+; TAILFOLD-NEXT:br label [[FOR_COND:%.*]]
+; TAILFOLD:   for.cond:
+; TAILFOLD-NEXT:[[I:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[INC:%.*]], 
[[FOR_BODY:%.*]] ]
+; TAILFOLD-NEXT:[[CMP:%.*]] = icmp slt i32 [[I]], [[N:%.*]]
+; TAILFOLD-

[llvm-branch-commits] [llvm] ef51eed - [LoopDeletion] Handle inner loops w/untaken backedges

2021-01-22 Thread Philip Reames via llvm-branch-commits

Author: Philip Reames
Date: 2021-01-22T16:31:29-08:00
New Revision: ef51eed37b7ed67b3c0e5f70fa61d681ba21787d

URL: 
https://github.com/llvm/llvm-project/commit/ef51eed37b7ed67b3c0e5f70fa61d681ba21787d
DIFF: 
https://github.com/llvm/llvm-project/commit/ef51eed37b7ed67b3c0e5f70fa61d681ba21787d.diff

LOG: [LoopDeletion] Handle inner loops w/untaken backedges

This builds on the restricted (post-initial-revert) form of D93906, and adds 
back support for breaking backedges of inner loops. It turns out the original 
invalidation logic wasn't quite right, specifically around the handling of 
LCSSA.

When breaking the backedge of an inner loop, we can cause blocks which were in 
the outer loop only because they were also included in a sub-loop to be removed 
from both loops. This results in the exit block set for our original parent 
loop changing, and thus a need for new LCSSA phi nodes.

This case happens when the inner loop has an exit block which is also an exit 
block of the parent, and there's a block in the child which reaches an exit to 
said block without also reaching an exit to the parent loop.

(I'm describing this in terms of the immediate parent, but the problem is 
general for any transitive parent in the nest.)
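
To make the hazard concrete, an illustrative C++ loop nest (hypothetical, not 
the actual loop_nest_lcssa test) where the inner loop exits directly out of 
both loops, so breaking the never-taken inner backedge can pull blocks out of 
both loops at once and change the outer loop's exit-block set:

  int nest(const int *a, int n) {
    int sum = 0;
    for (int i = 0; i < n; ++i) {     // outer loop
      for (int j = 0; j < 1; ++j) {   // inner loop; backedge provably untaken
        sum += a[i];
        if (sum > 100)
          return sum;                 // exits the inner AND the outer loop
      }
    }
    return sum;                       // normal exit of the outer loop
  }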

The approach implemented here involves a potentially expensive LCSSA rebuild.  
Perf testing during review didn't show anything concerning, but we may end up 
needing to revert this if anyone encounters a practical compile time issue.

Differential Revision: https://reviews.llvm.org/D94378

Added: 


Modified: 
llvm/lib/Transforms/Scalar/LoopDeletion.cpp
llvm/lib/Transforms/Utils/LoopUtils.cpp
llvm/test/Transforms/IndVarSimplify/X86/pr45360.ll
llvm/test/Transforms/LoopDeletion/zero-btc.ll

Removed: 




diff  --git a/llvm/lib/Transforms/Scalar/LoopDeletion.cpp 
b/llvm/lib/Transforms/Scalar/LoopDeletion.cpp
index bd5cdeabb9bd..1266c93316fa 100644
--- a/llvm/lib/Transforms/Scalar/LoopDeletion.cpp
+++ b/llvm/lib/Transforms/Scalar/LoopDeletion.cpp
@@ -151,14 +151,6 @@ breakBackedgeIfNotTaken(Loop *L, DominatorTree &DT, 
ScalarEvolution &SE,
   if (!BTC->isZero())
 return LoopDeletionResult::Unmodified;
 
-  // For non-outermost loops, the tricky case is that we can drop blocks
-  // out of both inner and outer loops at the same time.  This results in
-  // new exiting block for the outer loop appearing, and possibly needing
-  // an lcssa phi inserted.  (See loop_nest_lcssa test case in zero-btc.ll)
-  // TODO: We can handle a bunch of cases here without much work, revisit.
-  if (!L->isOutermost())
-return LoopDeletionResult::Unmodified;
-
   breakLoopBackedge(L, DT, SE, LI, MSSA);
   return LoopDeletionResult::Deleted;
 }

diff  --git a/llvm/lib/Transforms/Utils/LoopUtils.cpp 
b/llvm/lib/Transforms/Utils/LoopUtils.cpp
index e6575ee2caf2..8d167923db00 100644
--- a/llvm/lib/Transforms/Utils/LoopUtils.cpp
+++ b/llvm/lib/Transforms/Utils/LoopUtils.cpp
@@ -761,13 +761,18 @@ void llvm::deleteDeadLoop(Loop *L, DominatorTree *DT, 
ScalarEvolution *SE,
   }
 }
 
+static Loop *getOutermostLoop(Loop *L) {
+  while (Loop *Parent = L->getParentLoop())
+L = Parent;
+  return L;
+}
+
 void llvm::breakLoopBackedge(Loop *L, DominatorTree &DT, ScalarEvolution &SE,
  LoopInfo &LI, MemorySSA *MSSA) {
-
-  assert(L->isOutermost() && "Can't yet preserve LCSSA for this case");
   auto *Latch = L->getLoopLatch();
   assert(Latch && "multiple latches not yet supported");
   auto *Header = L->getHeader();
+  Loop *OutermostLoop = getOutermostLoop(L);
 
   SE.forgetLoop(L);
 
@@ -790,6 +795,14 @@ void llvm::breakLoopBackedge(Loop *L, DominatorTree &DT, 
ScalarEvolution &SE,
   // Erase (and destroy) this loop instance.  Handles relinking sub-loops
   // and blocks within the loop as needed.
   LI.erase(L);
+
+  // If the loop we broke had a parent, then changeToUnreachable might have
+  // caused a block to be removed from the parent loop (see loop_nest_lcssa
+  // test case in zero-btc.ll for an example), thus changing the parent's
+  // exit blocks.  If that happened, we need to rebuild LCSSA on the outermost
+  // loop which might have a had a block removed.
+  if (OutermostLoop != L)
+formLCSSARecursively(*OutermostLoop, DT, &LI, &SE);
 }
 
 

diff  --git a/llvm/test/Transforms/IndVarSimplify/X86/pr45360.ll 
b/llvm/test/Transforms/IndVarSimplify/X86/pr45360.ll
index d0857fa707b1..397c23cfd3ea 100644
--- a/llvm/test/Transforms/IndVarSimplify/X86/pr45360.ll
+++ b/llvm/test/Transforms/IndVarSimplify/X86/pr45360.ll
@@ -23,8 +23,8 @@ define dso_local i32 @main() {
 ; CHECK-NEXT:[[I6:%.*]] = load i32, i32* @a, align 4
 ; CHECK-NEXT:[[I24:%.*]] = load i32, i32* @b, align 4
 ; CHECK-NEXT:[[D_PROMOTED9:%.*]] = load i32, i32* @d, align 4
-; CHECK-NEXT:br label [[BB1:%.*]]
-; CHECK:   bb1:
+; CHECK-NEXT:br label [[BB13_PREHEADER:%.*]]
+; CHECK:   bb13.p

[llvm-branch-commits] [llvm] 4b33b23 - Reapply "[LV] Vectorize (some) early and multiple exit loops"" w/fix for builder

2020-12-28 Thread Philip Reames via llvm-branch-commits

Author: Philip Reames
Date: 2020-12-28T10:13:28-08:00
New Revision: 4b33b2387787aef5020450cdcc8dde231eb0a5fc

URL: 
https://github.com/llvm/llvm-project/commit/4b33b2387787aef5020450cdcc8dde231eb0a5fc
DIFF: 
https://github.com/llvm/llvm-project/commit/4b33b2387787aef5020450cdcc8dde231eb0a5fc.diff

LOG: Reapply "[LV] Vectorize (some) early and multiple exit loops"" w/fix for 
builder

This reverts commit 4ffcd4fe9ac2ee948948f732baa16663eb63f1c7 thus restoring 
e4df6a40dad.

The only change from the original patch is to add "llvm::" before the call to 
empty(iterator_range).  This is a speculative fix for the ambiguity reported on 
some builders.
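
For reference, the ambiguity class looks like the following minimal sketch
(simplified signature; this is not the literal buildbot failure):

#include <iterator> // std::empty
#include <vector>

namespace llvm {
// Simplified stand-in for llvm::empty(iterator_range/container).
template <typename R> bool empty(const R &Range) {
  return Range.begin() == Range.end();
}
} // namespace llvm

using namespace llvm;

bool isEmpty(const std::vector<int> &V) {
  // return empty(V);    // ambiguous: llvm::empty vs. std::empty found by ADL
  return llvm::empty(V); // qualifying the call removes the ambiguity
}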

Added: 


Modified: 
llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
llvm/test/Transforms/LoopVectorize/control-flow.ll
llvm/test/Transforms/LoopVectorize/loop-form.ll
llvm/test/Transforms/LoopVectorize/loop-legality-checks.ll

Removed: 




diff  --git a/llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp 
b/llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
index 60e1cc9a4a59..65b3132dc3f1 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
@@ -1095,9 +1095,15 @@ bool LoopVectorizationLegality::canVectorizeLoopCFG(Loop 
*Lp,
   return false;
   }
 
-  // We must have a single exiting block.
-  if (!Lp->getExitingBlock()) {
-reportVectorizationFailure("The loop must have an exiting block",
+  // We currently must have a single "exit block" after the loop. Note that
+  // multiple "exiting blocks" inside the loop are allowed, provided they all
+  // reach the single exit block.
+  // TODO: This restriction can be relaxed in the near future, it's here solely
+  // to allow separation of changes for review. We need to generalize the phi
+  // update logic in a number of places.
+  BasicBlock *ExitBB = Lp->getUniqueExitBlock();
+  if (!ExitBB) {
+reportVectorizationFailure("The loop must have a unique exit block",
 "loop control flow is not understood by vectorizer",
 "CFGNotUnderstood", ORE, TheLoop);
 if (DoExtraAnalysis)
@@ -1106,11 +1112,14 @@ bool 
LoopVectorizationLegality::canVectorizeLoopCFG(Loop *Lp,
   return false;
   }
 
-  // We only handle bottom-tested loops, i.e. loop in which the condition is
-  // checked at the end of each iteration. With that we can assume that all
-  // instructions in the loop are executed the same number of times.
-  if (Lp->getExitingBlock() != Lp->getLoopLatch()) {
-reportVectorizationFailure("The exiting block is not the loop latch",
+  // The existing code assumes that LCSSA implies that phis are single entry
+  // (which was true when we had at most a single exiting edge from the latch).
+  // In general, there's nothing which prevents an LCSSA phi in exit block from
+  // having two or more values if there are multiple exiting edges leading to
+  // the exit block.  (TODO: implement general case)
+  if (!llvm::empty(ExitBB->phis()) && !ExitBB->getSinglePredecessor()) {
+reportVectorizationFailure("The loop must have no live-out values if "
+   "it has more than one exiting block",
 "loop control flow is not understood by vectorizer",
 "CFGNotUnderstood", ORE, TheLoop);
 if (DoExtraAnalysis)

diff  --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp 
b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index 5889d5e55339..c48b650c3c3e 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -837,7 +837,8 @@ class InnerLoopVectorizer {
   /// Middle Block between the vector and the scalar.
   BasicBlock *LoopMiddleBlock;
 
-  /// The ExitBlock of the scalar loop.
+  /// The (unique) ExitBlock of the scalar loop.  Note that
+  /// there can be multiple exiting edges reaching this block.
   BasicBlock *LoopExitBlock;
 
   /// The vector loop body.
@@ -1548,11 +1549,16 @@ class LoopVectorizationCostModel {
 return InterleaveInfo.getInterleaveGroup(Instr);
   }
 
-  /// Returns true if an interleaved group requires a scalar iteration
-  /// to handle accesses with gaps, and there is nothing preventing us from
-  /// creating a scalar epilogue.
+  /// Returns true if we're required to use a scalar epilogue for at least
+  /// the final iteration of the original loop.
   bool requiresScalarEpilogue() const {
-return isScalarEpilogueAllowed() && 
InterleaveInfo.requiresScalarEpilogue();
+if (!isScalarEpilogueAllowed())
+  return false;
+// If we might exit from anywhere but the latch, must run the exiting
+// iteration in scalar form.
+if (TheLoop->getExitingBlock() != TheLoop->getLoopLatch())
+  return true;
+return InterleaveInfo.requiresScalarEpilogue();
   }
 
   /// Return

[llvm-branch-commits] [llvm] dd6bb36 - [LoopDeletion] Break backedge of loops when known not taken

2021-01-04 Thread Philip Reames via llvm-branch-commits

Author: Philip Reames
Date: 2021-01-04T09:19:29-08:00
New Revision: dd6bb367d19e3bf18353e40de54d35480999a930

URL: 
https://github.com/llvm/llvm-project/commit/dd6bb367d19e3bf18353e40de54d35480999a930
DIFF: 
https://github.com/llvm/llvm-project/commit/dd6bb367d19e3bf18353e40de54d35480999a930.diff

LOG: [LoopDeletion] Break backedge of loops when known not taken

The basic idea is that if SCEV can prove the backedge isn't taken, we can go 
ahead and get rid of the backedge (and thus the loop) while leaving the rest of 
the control in place. This nicely handles cases with dispatch between multiple 
exits and internal side effects.
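
As a source-level analogue (the pass itself works on LLVM IR; this C++ is just
to show the shape being targeted):

extern int G;

// The do-while's backedge is provably never taken (backedge taken count of
// zero), so the backedge branch can simply be deleted. The store side
// effects and the dispatch to either exit remain as straight-line code.
void sketch(bool cond) {
  do {
    G = 0;
    if (cond)
      break;       // one of multiple exits
    G = 1;
  } while (false); // untaken backedge: removable without deleting the body
}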

Differential Revision: https://reviews.llvm.org/D93906

Added: 
llvm/test/Transforms/LoopDeletion/zero-btc.ll

Modified: 
llvm/include/llvm/Transforms/Utils/LoopUtils.h
llvm/lib/Transforms/Scalar/LoopDeletion.cpp
llvm/lib/Transforms/Utils/LoopUtils.cpp
llvm/test/Transforms/IndVarSimplify/X86/pr45360.ll
llvm/test/Transforms/IndVarSimplify/exit_value_test2.ll
llvm/test/Transforms/LoopDeletion/update-scev.ll

Removed: 




diff  --git a/llvm/include/llvm/Transforms/Utils/LoopUtils.h 
b/llvm/include/llvm/Transforms/Utils/LoopUtils.h
index b29add4cba0e5..82c0d9e070d78 100644
--- a/llvm/include/llvm/Transforms/Utils/LoopUtils.h
+++ b/llvm/include/llvm/Transforms/Utils/LoopUtils.h
@@ -179,6 +179,12 @@ bool hoistRegion(DomTreeNode *, AAResults *, LoopInfo *, 
DominatorTree *,
 void deleteDeadLoop(Loop *L, DominatorTree *DT, ScalarEvolution *SE,
 LoopInfo *LI, MemorySSA *MSSA = nullptr);
 
+/// Remove the backedge of the specified loop.  Handles loop nests and general
+/// loop structures subject to the precondition that the loop has a single
+/// latch block.  Preserves all listed analyses.
+void breakLoopBackedge(Loop *L, DominatorTree &DT, ScalarEvolution &SE,
+   LoopInfo &LI, MemorySSA *MSSA);
+
 /// Try to promote memory values to scalars by sinking stores out of
 /// the loop and moving loads to before the loop.  We do this by looping over
 /// the stores in the loop, looking for stores to Must pointers which are

diff  --git a/llvm/lib/Transforms/Scalar/LoopDeletion.cpp 
b/llvm/lib/Transforms/Scalar/LoopDeletion.cpp
index 065db647561ec..04120032f0f41 100644
--- a/llvm/lib/Transforms/Scalar/LoopDeletion.cpp
+++ b/llvm/lib/Transforms/Scalar/LoopDeletion.cpp
@@ -26,6 +26,7 @@
 #include "llvm/Transforms/Scalar.h"
 #include "llvm/Transforms/Scalar/LoopPassManager.h"
 #include "llvm/Transforms/Utils/LoopUtils.h"
+
 using namespace llvm;
 
 #define DEBUG_TYPE "loop-delete"
@@ -38,6 +39,14 @@ enum class LoopDeletionResult {
   Deleted,
 };
 
+static LoopDeletionResult merge(LoopDeletionResult A, LoopDeletionResult B) {
+  if (A == LoopDeletionResult::Deleted || B == LoopDeletionResult::Deleted)
+return LoopDeletionResult::Deleted;
+  if (A == LoopDeletionResult::Modified || B == LoopDeletionResult::Modified)
+return LoopDeletionResult::Modified;
+  return LoopDeletionResult::Unmodified;
+}
+
 /// Determines if a loop is dead.
 ///
 /// This assumes that we've already checked for unique exit and exiting blocks,
@@ -126,6 +135,26 @@ static bool isLoopNeverExecuted(Loop *L) {
   return true;
 }
 
+/// If we can prove the backedge is untaken, remove it.  This destroys the
+/// loop, but leaves the (now trivially loop invariant) control flow and
+/// side effects (if any) in place.
+static LoopDeletionResult
+breakBackedgeIfNotTaken(Loop *L, DominatorTree &DT, ScalarEvolution &SE,
+LoopInfo &LI, MemorySSA *MSSA,
+OptimizationRemarkEmitter &ORE) {
+  assert(L->isLCSSAForm(DT) && "Expected LCSSA!");
+
+  if (!L->getLoopLatch())
+return LoopDeletionResult::Unmodified;
+
+  auto *BTC = SE.getBackedgeTakenCount(L);
+  if (!BTC->isZero())
+return LoopDeletionResult::Unmodified;
+
+  breakLoopBackedge(L, DT, SE, LI, MSSA);
+  return LoopDeletionResult::Deleted;
+}
+
 /// Remove a loop if it is dead.
 ///
 /// A loop is considered dead if it does not impact the observable behavior of
@@ -162,7 +191,6 @@ static LoopDeletionResult deleteLoopIfDead(Loop *L, 
DominatorTree &DT,
 return LoopDeletionResult::Unmodified;
   }
 
-
   BasicBlock *ExitBlock = L->getUniqueExitBlock();
 
   if (ExitBlock && isLoopNeverExecuted(L)) {
@@ -240,6 +268,14 @@ PreservedAnalyses LoopDeletionPass::run(Loop &L, 
LoopAnalysisManager &AM,
   // but ORE cannot be preserved (see comment before the pass definition).
   OptimizationRemarkEmitter ORE(L.getHeader()->getParent());
   auto Result = deleteLoopIfDead(&L, AR.DT, AR.SE, AR.LI, AR.MSSA, ORE);
+
+  // If we can prove the backedge isn't taken, just break it and be done.  This
+  // leaves the loop structure in place which means it can handle dispatching
+  // to the right exit based on whatever loop invariant structure remains.
+  if (Result != LoopDeletionR

[llvm-branch-commits] [llvm] 7c63aac - Revert "[LoopDeletion] Break backedge of loops when known not taken"

2021-01-04 Thread Philip Reames via llvm-branch-commits

Author: Philip Reames
Date: 2021-01-04T09:50:47-08:00
New Revision: 7c63aac7bd4e5ce3402f2ef7c1d5b66047230147

URL: 
https://github.com/llvm/llvm-project/commit/7c63aac7bd4e5ce3402f2ef7c1d5b66047230147
DIFF: 
https://github.com/llvm/llvm-project/commit/7c63aac7bd4e5ce3402f2ef7c1d5b66047230147.diff

LOG: Revert "[LoopDeletion] Break backedge of loops when known not taken"

This reverts commit dd6bb367d19e3bf18353e40de54d35480999a930.

Multi-stage builders are showing an assertion failure w/LCSSA not being 
preserved on entry to IndVars.  The reason isn't clear; reverting while 
investigating.

Added: 


Modified: 
llvm/include/llvm/Transforms/Utils/LoopUtils.h
llvm/lib/Transforms/Scalar/LoopDeletion.cpp
llvm/lib/Transforms/Utils/LoopUtils.cpp
llvm/test/Transforms/IndVarSimplify/X86/pr45360.ll
llvm/test/Transforms/IndVarSimplify/exit_value_test2.ll
llvm/test/Transforms/LoopDeletion/update-scev.ll

Removed: 
llvm/test/Transforms/LoopDeletion/zero-btc.ll



diff  --git a/llvm/include/llvm/Transforms/Utils/LoopUtils.h 
b/llvm/include/llvm/Transforms/Utils/LoopUtils.h
index 82c0d9e070d7..b29add4cba0e 100644
--- a/llvm/include/llvm/Transforms/Utils/LoopUtils.h
+++ b/llvm/include/llvm/Transforms/Utils/LoopUtils.h
@@ -179,12 +179,6 @@ bool hoistRegion(DomTreeNode *, AAResults *, LoopInfo *, 
DominatorTree *,
 void deleteDeadLoop(Loop *L, DominatorTree *DT, ScalarEvolution *SE,
 LoopInfo *LI, MemorySSA *MSSA = nullptr);
 
-/// Remove the backedge of the specified loop.  Handles loop nests and general
-/// loop structures subject to the precondition that the loop has a single
-/// latch block.  Preserves all listed analyses.
-void breakLoopBackedge(Loop *L, DominatorTree &DT, ScalarEvolution &SE,
-   LoopInfo &LI, MemorySSA *MSSA);
-
 /// Try to promote memory values to scalars by sinking stores out of
 /// the loop and moving loads to before the loop.  We do this by looping over
 /// the stores in the loop, looking for stores to Must pointers which are

diff  --git a/llvm/lib/Transforms/Scalar/LoopDeletion.cpp 
b/llvm/lib/Transforms/Scalar/LoopDeletion.cpp
index 04120032f0f4..065db647561e 100644
--- a/llvm/lib/Transforms/Scalar/LoopDeletion.cpp
+++ b/llvm/lib/Transforms/Scalar/LoopDeletion.cpp
@@ -26,7 +26,6 @@
 #include "llvm/Transforms/Scalar.h"
 #include "llvm/Transforms/Scalar/LoopPassManager.h"
 #include "llvm/Transforms/Utils/LoopUtils.h"
-
 using namespace llvm;
 
 #define DEBUG_TYPE "loop-delete"
@@ -39,14 +38,6 @@ enum class LoopDeletionResult {
   Deleted,
 };
 
-static LoopDeletionResult merge(LoopDeletionResult A, LoopDeletionResult B) {
-  if (A == LoopDeletionResult::Deleted || B == LoopDeletionResult::Deleted)
-return LoopDeletionResult::Deleted;
-  if (A == LoopDeletionResult::Modified || B == LoopDeletionResult::Modified)
-return LoopDeletionResult::Modified;
-  return LoopDeletionResult::Unmodified;
-}
-
 /// Determines if a loop is dead.
 ///
 /// This assumes that we've already checked for unique exit and exiting blocks,
@@ -135,26 +126,6 @@ static bool isLoopNeverExecuted(Loop *L) {
   return true;
 }
 
-/// If we can prove the backedge is untaken, remove it.  This destroys the
-/// loop, but leaves the (now trivially loop invariant) control flow and
-/// side effects (if any) in place.
-static LoopDeletionResult
-breakBackedgeIfNotTaken(Loop *L, DominatorTree &DT, ScalarEvolution &SE,
-LoopInfo &LI, MemorySSA *MSSA,
-OptimizationRemarkEmitter &ORE) {
-  assert(L->isLCSSAForm(DT) && "Expected LCSSA!");
-
-  if (!L->getLoopLatch())
-return LoopDeletionResult::Unmodified;
-
-  auto *BTC = SE.getBackedgeTakenCount(L);
-  if (!BTC->isZero())
-return LoopDeletionResult::Unmodified;
-
-  breakLoopBackedge(L, DT, SE, LI, MSSA);
-  return LoopDeletionResult::Deleted;
-}
-
 /// Remove a loop if it is dead.
 ///
 /// A loop is considered dead if it does not impact the observable behavior of
@@ -191,6 +162,7 @@ static LoopDeletionResult deleteLoopIfDead(Loop *L, 
DominatorTree &DT,
 return LoopDeletionResult::Unmodified;
   }
 
+
   BasicBlock *ExitBlock = L->getUniqueExitBlock();
 
   if (ExitBlock && isLoopNeverExecuted(L)) {
@@ -268,14 +240,6 @@ PreservedAnalyses LoopDeletionPass::run(Loop &L, 
LoopAnalysisManager &AM,
   // but ORE cannot be preserved (see comment before the pass definition).
   OptimizationRemarkEmitter ORE(L.getHeader()->getParent());
   auto Result = deleteLoopIfDead(&L, AR.DT, AR.SE, AR.LI, AR.MSSA, ORE);
-
-  // If we can prove the backedge isn't taken, just break it and be done.  This
-  // leaves the loop structure in place which means it can handle dispatching
-  // to the right exit based on whatever loop invariant structure remains.
-  if (Result != LoopDeletionResult::Deleted)
-Result = merge(Result, breakBackedgeIfNotTaken(&L, AR.DT, AR.SE, AR.LI,
- 

[llvm-branch-commits] [llvm] 377dcfd - [Tests] Auto update a vectorizer test to simplify future diff

2021-01-10 Thread Philip Reames via llvm-branch-commits

Author: Philip Reames
Date: 2021-01-10T12:23:22-08:00
New Revision: 377dcfd5c15d8e2c9e71a171635529052a96e244

URL: 
https://github.com/llvm/llvm-project/commit/377dcfd5c15d8e2c9e71a171635529052a96e244
DIFF: 
https://github.com/llvm/llvm-project/commit/377dcfd5c15d8e2c9e71a171635529052a96e244.diff

LOG: [Tests] Auto update a vectorizer test to simplify future diff

Added: 


Modified: 
llvm/test/Transforms/LoopVectorize/first-order-recurrence-complex.ll

Removed: 




diff  --git 
a/llvm/test/Transforms/LoopVectorize/first-order-recurrence-complex.ll 
b/llvm/test/Transforms/LoopVectorize/first-order-recurrence-complex.ll
index a3cdf7bf3e40..208e1a219be8 100644
--- a/llvm/test/Transforms/LoopVectorize/first-order-recurrence-complex.ll
+++ b/llvm/test/Transforms/LoopVectorize/first-order-recurrence-complex.ll
@@ -1,3 +1,4 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
 ; RUN: opt -loop-vectorize -force-vector-width=4 -force-vector-interleave=1 -S 
%s | FileCheck %s
 
 
@@ -7,32 +8,62 @@
 ; Test case for PR43398.
 
 define void @can_sink_after_store(i32 %x, i32* %ptr, i64 %tc) 
local_unnamed_addr #0 {
-; CHECK-LABEL: vector.ph:
-; CHECK:%broadcast.splatinsert = insertelement <4 x i32> poison, i32 
%x, i32 0
-; CHECK-NEXT:   %broadcast.splat = shufflevector <4 x i32> 
%broadcast.splatinsert, <4 x i32> poison, <4 x i32> zeroinitializer
-; CHECK-NEXT:   %vector.recur.init = insertelement <4 x i32> poison, i32 
%.pre, i32 3
-; CHECK-NEXT:br label %vector.body
-
-; CHECK-LABEL: vector.body:
-; CHECK-NEXT:   %index = phi i64 [ 0, %vector.ph ], [ %index.next, 
%vector.body ]
-; CHECK-NEXT:   %vector.recur = phi <4 x i32> [ %vector.recur.init, %vector.ph 
], [ %wide.load, %vector.body ]
-; CHECK-NEXT:   %offset.idx = add i64 1, %index
-; CHECK-NEXT:   %0 = add i64 %offset.idx, 0
-; CHECK-NEXT:   %1 = getelementptr inbounds [257 x i32], [257 x i32]* @p, i64 
0, i64 %0
-; CHECK-NEXT:   %2 = getelementptr inbounds i32, i32* %1, i32 0
-; CHECK-NEXT:   %3 = bitcast i32* %2 to <4 x i32>*
-; CHECK-NEXT:   %wide.load = load <4 x i32>, <4 x i32>* %3, align 4
-; CHECK-NEXT:   %4 = shufflevector <4 x i32> %vector.recur, <4 x i32> 
%wide.load, <4 x i32> 
-; CHECK-NEXT:   %5 = add <4 x i32> %4, %broadcast.splat
-; CHECK-NEXT:   %6 = add <4 x i32> %5, %wide.load
-; CHECK-NEXT:   %7 = getelementptr inbounds [257 x i32], [257 x i32]* @q, i64 
0, i64 %0
-; CHECK-NEXT:   %8 = getelementptr inbounds i32, i32* %7, i32 0
-; CHECK-NEXT:   %9 = bitcast i32* %8 to <4 x i32>*
-; CHECK-NEXT:   store <4 x i32> %6, <4 x i32>* %9, align 4
-; CHECK-NEXT:   %index.next = add i64 %index, 4
-; CHECK-NEXT:   %10 = icmp eq i64 %index.next, 1996
-; CHECK-NEXT:   br i1 %10, label %middle.block, label %vector.body
+; CHECK-LABEL: @can_sink_after_store(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:br label [[PREHEADER:%.*]]
+; CHECK:   preheader:
+; CHECK-NEXT:[[IDX_PHI_TRANS:%.*]] = getelementptr inbounds [257 x i32], 
[257 x i32]* @p, i64 0, i64 1
+; CHECK-NEXT:[[DOTPRE:%.*]] = load i32, i32* [[IDX_PHI_TRANS]], align 4
+; CHECK-NEXT:br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
+; CHECK:   vector.ph:
+; CHECK-NEXT:[[BROADCAST_SPLATINSERT:%.*]] = insertelement <4 x i32> 
poison, i32 [[X:%.*]], i32 0
+; CHECK-NEXT:[[BROADCAST_SPLAT:%.*]] = shufflevector <4 x i32> 
[[BROADCAST_SPLATINSERT]], <4 x i32> poison, <4 x i32> zeroinitializer
+; CHECK-NEXT:[[VECTOR_RECUR_INIT:%.*]] = insertelement <4 x i32> poison, 
i32 [[DOTPRE]], i32 3
+; CHECK-NEXT:br label [[VECTOR_BODY:%.*]]
+; CHECK:   vector.body:
+; CHECK-NEXT:[[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ 
[[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
+; CHECK-NEXT:[[VECTOR_RECUR:%.*]] = phi <4 x i32> [ [[VECTOR_RECUR_INIT]], 
[[VECTOR_PH]] ], [ [[WIDE_LOAD:%.*]], [[VECTOR_BODY]] ]
+; CHECK-NEXT:[[OFFSET_IDX:%.*]] = add i64 1, [[INDEX]]
+; CHECK-NEXT:[[TMP0:%.*]] = add i64 [[OFFSET_IDX]], 0
+; CHECK-NEXT:[[TMP1:%.*]] = getelementptr inbounds [257 x i32], [257 x 
i32]* @p, i64 0, i64 [[TMP0]]
+; CHECK-NEXT:[[TMP2:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i32 0
+; CHECK-NEXT:[[TMP3:%.*]] = bitcast i32* [[TMP2]] to <4 x i32>*
+; CHECK-NEXT:[[WIDE_LOAD]] = load <4 x i32>, <4 x i32>* [[TMP3]], align 4
+; CHECK-NEXT:[[TMP4:%.*]] = shufflevector <4 x i32> [[VECTOR_RECUR]], <4 x 
i32> [[WIDE_LOAD]], <4 x i32> 
+; CHECK-NEXT:[[TMP5:%.*]] = add <4 x i32> [[TMP4]], [[BROADCAST_SPLAT]]
+; CHECK-NEXT:[[TMP6:%.*]] = add <4 x i32> [[TMP5]], [[WIDE_LOAD]]
+; CHECK-NEXT:[[TMP7:%.*]] = getelementptr inbounds [257 x i32], [257 x 
i32]* @q, i64 0, i64 [[TMP0]]
+; CHECK-NEXT:[[TMP8:%.*]] = getelementptr inbounds i32, i32* [[TMP7]], i32 0
+; CHECK-NEXT:[[TMP9:%.*]] = bitcast i32* [[TMP8]] to <4 x i32>*
+; CHECK-NEXT:store <4 x i32> [[TMP6]], <4 x i32>* [[TMP9]], align 4
+; CHECK-NEX

[llvm-branch-commits] [llvm] 86d6f7e - Precommit tests requested for D93725

2021-01-10 Thread Philip Reames via llvm-branch-commits

Author: Philip Reames
Date: 2021-01-10T12:29:34-08:00
New Revision: 86d6f7e90a1deab93e357b8f356e29d4a24fa3ac

URL: 
https://github.com/llvm/llvm-project/commit/86d6f7e90a1deab93e357b8f356e29d4a24fa3ac
DIFF: 
https://github.com/llvm/llvm-project/commit/86d6f7e90a1deab93e357b8f356e29d4a24fa3ac.diff

LOG: Precommit tests requested for D93725

Added: 


Modified: 
llvm/test/Transforms/LoopVectorize/first-order-recurrence-complex.ll
llvm/test/Transforms/LoopVectorize/loop-form.ll

Removed: 




diff  --git 
a/llvm/test/Transforms/LoopVectorize/first-order-recurrence-complex.ll 
b/llvm/test/Transforms/LoopVectorize/first-order-recurrence-complex.ll
index 208e1a219be8..ef3d3e659e5a 100644
--- a/llvm/test/Transforms/LoopVectorize/first-order-recurrence-complex.ll
+++ b/llvm/test/Transforms/LoopVectorize/first-order-recurrence-complex.ll
@@ -432,3 +432,94 @@ loop.latch:; preds 
= %if.then122, %for.b
 exit:
   ret void
 }
+
+; A recurrence in a multiple exit loop.
+define i16 @multiple_exit(i16* %p, i32 %n) {
+; CHECK-LABEL: @multiple_exit(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:br label [[FOR_COND:%.*]]
+; CHECK:   for.cond:
+; CHECK-NEXT:[[I:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[INC:%.*]], 
[[FOR_BODY:%.*]] ]
+; CHECK-NEXT:[[REC:%.*]] = phi i16 [ 0, [[ENTRY]] ], [ [[REC_NEXT:%.*]], 
[[FOR_BODY]] ]
+; CHECK-NEXT:[[IPROM:%.*]] = sext i32 [[I]] to i64
+; CHECK-NEXT:[[B:%.*]] = getelementptr inbounds i16, i16* [[P:%.*]], i64 
[[IPROM]]
+; CHECK-NEXT:[[REC_NEXT]] = load i16, i16* [[B]], align 2
+; CHECK-NEXT:[[CMP:%.*]] = icmp slt i32 [[I]], [[N:%.*]]
+; CHECK-NEXT:br i1 [[CMP]], label [[FOR_BODY]], label [[IF_END:%.*]]
+; CHECK:   for.body:
+; CHECK-NEXT:store i16 [[REC]], i16* [[B]], align 4
+; CHECK-NEXT:[[INC]] = add nsw i32 [[I]], 1
+; CHECK-NEXT:[[CMP2:%.*]] = icmp slt i32 [[I]], 2096
+; CHECK-NEXT:br i1 [[CMP2]], label [[FOR_COND]], label [[IF_END]]
+; CHECK:   if.end:
+; CHECK-NEXT:[[REC_LCSSA:%.*]] = phi i16 [ [[REC]], [[FOR_BODY]] ], [ 
[[REC]], [[FOR_COND]] ]
+; CHECK-NEXT:ret i16 [[REC_LCSSA]]
+;
+entry:
+  br label %for.cond
+
+for.cond:
+  %i = phi i32 [ 0, %entry ], [ %inc, %for.body ]
+  %rec = phi i16 [0, %entry], [ %rec.next, %for.body ]
+  %iprom = sext i32 %i to i64
+  %b = getelementptr inbounds i16, i16* %p, i64 %iprom
+  %rec.next = load i16, i16* %b
+  %cmp = icmp slt i32 %i, %n
+  br i1 %cmp, label %for.body, label %if.end
+
+for.body:
+  store i16 %rec , i16* %b, align 4
+  %inc = add nsw i32 %i, 1
+  %cmp2 = icmp slt i32 %i, 2096
+  br i1 %cmp2, label %for.cond, label %if.end
+
+if.end:
+  ret i16 %rec
+}
+
+
+; A multiple exit case where one of the exiting edges involves a value
+; from the recurrence and one does not.
+define i16 @multiple_exit2(i16* %p, i32 %n) {
+; CHECK-LABEL: @multiple_exit2(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:br label [[FOR_COND:%.*]]
+; CHECK:   for.cond:
+; CHECK-NEXT:[[I:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[INC:%.*]], 
[[FOR_BODY:%.*]] ]
+; CHECK-NEXT:[[REC:%.*]] = phi i16 [ 0, [[ENTRY]] ], [ [[REC_NEXT:%.*]], 
[[FOR_BODY]] ]
+; CHECK-NEXT:[[IPROM:%.*]] = sext i32 [[I]] to i64
+; CHECK-NEXT:[[B:%.*]] = getelementptr inbounds i16, i16* [[P:%.*]], i64 
[[IPROM]]
+; CHECK-NEXT:[[REC_NEXT]] = load i16, i16* [[B]], align 2
+; CHECK-NEXT:[[CMP:%.*]] = icmp slt i32 [[I]], [[N:%.*]]
+; CHECK-NEXT:br i1 [[CMP]], label [[FOR_BODY]], label [[IF_END:%.*]]
+; CHECK:   for.body:
+; CHECK-NEXT:store i16 [[REC]], i16* [[B]], align 4
+; CHECK-NEXT:[[INC]] = add nsw i32 [[I]], 1
+; CHECK-NEXT:[[CMP2:%.*]] = icmp slt i32 [[I]], 2096
+; CHECK-NEXT:br i1 [[CMP2]], label [[FOR_COND]], label [[IF_END]]
+; CHECK:   if.end:
+; CHECK-NEXT:[[REC_LCSSA:%.*]] = phi i16 [ [[REC]], [[FOR_COND]] ], [ 10, 
[[FOR_BODY]] ]
+; CHECK-NEXT:ret i16 [[REC_LCSSA]]
+;
+entry:
+  br label %for.cond
+
+for.cond:
+  %i = phi i32 [ 0, %entry ], [ %inc, %for.body ]
+  %rec = phi i16 [0, %entry], [ %rec.next, %for.body ]
+  %iprom = sext i32 %i to i64
+  %b = getelementptr inbounds i16, i16* %p, i64 %iprom
+  %rec.next = load i16, i16* %b
+  %cmp = icmp slt i32 %i, %n
+  br i1 %cmp, label %for.body, label %if.end
+
+for.body:
+  store i16 %rec , i16* %b, align 4
+  %inc = add nsw i32 %i, 1
+  %cmp2 = icmp slt i32 %i, 2096
+  br i1 %cmp2, label %for.cond, label %if.end
+
+if.end:
+  %rec.lcssa = phi i16 [ %rec, %for.cond ], [ 10, %for.body ]
+  ret i16 %rec.lcssa
+}

diff  --git a/llvm/test/Transforms/LoopVectorize/loop-form.ll 
b/llvm/test/Transforms/LoopVectorize/loop-form.ll
index f93c038de6bb..bf94505aec2c 100644
--- a/llvm/test/Transforms/LoopVectorize/loop-form.ll
+++ b/llvm/test/Transforms/LoopVectorize/loop-form.ll
@@ -869,3 +869,126 @@ loop.latch:
 exit:
   ret void
 }
+
+define i32 @reduction(i32* %addr) {
+; CHECK-LABEL: @redu

[llvm-branch-commits] [llvm] fc8ab25 - [Tests] Precommit tests from to simplify rebase

2021-01-10 Thread Philip Reames via llvm-branch-commits

Author: Philip Reames
Date: 2021-01-10T12:42:08-08:00
New Revision: fc8ab254472972816956c69d16e8b35bc91cc2ab

URL: 
https://github.com/llvm/llvm-project/commit/fc8ab254472972816956c69d16e8b35bc91cc2ab
DIFF: 
https://github.com/llvm/llvm-project/commit/fc8ab254472972816956c69d16e8b35bc91cc2ab.diff

LOG: [Tests] Precommit tests from to simplify rebase

Added: 
llvm/test/Transforms/LoopDeletion/zero-btc.ll

Modified: 


Removed: 




diff  --git a/llvm/test/Transforms/LoopDeletion/zero-btc.ll 
b/llvm/test/Transforms/LoopDeletion/zero-btc.ll
new file mode 100644
index ..b56e30e8f1be
--- /dev/null
+++ b/llvm/test/Transforms/LoopDeletion/zero-btc.ll
@@ -0,0 +1,319 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
+; RUN: opt < %s -loop-deletion -S | FileCheck %s
+
+@G = external global i32
+
+define void @test_trivial() {
+; CHECK-LABEL: @test_trivial(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:br label [[LOOP:%.*]]
+; CHECK:   loop:
+; CHECK-NEXT:store i32 0, i32* @G, align 4
+; CHECK-NEXT:br i1 false, label [[LOOP]], label [[EXIT:%.*]]
+; CHECK:   exit:
+; CHECK-NEXT:ret void
+;
+entry:
+  br label %loop
+
+loop:
+  store i32 0, i32* @G
+  br i1 false, label %loop, label %exit
+
+exit:
+  ret void
+}
+
+
+define void @test_bottom_tested() {
+; CHECK-LABEL: @test_bottom_tested(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:br label [[LOOP:%.*]]
+; CHECK:   loop:
+; CHECK-NEXT:[[IV:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[IV_INC:%.*]], 
[[LOOP]] ]
+; CHECK-NEXT:store i32 0, i32* @G, align 4
+; CHECK-NEXT:[[IV_INC]] = add i32 [[IV]], 1
+; CHECK-NEXT:[[BE_TAKEN:%.*]] = icmp ne i32 [[IV_INC]], 1
+; CHECK-NEXT:br i1 [[BE_TAKEN]], label [[LOOP]], label [[EXIT:%.*]]
+; CHECK:   exit:
+; CHECK-NEXT:ret void
+;
+entry:
+  br label %loop
+
+loop:
+  %iv = phi i32 [ 0, %entry], [ %iv.inc, %loop ]
+  store i32 0, i32* @G
+  %iv.inc = add i32 %iv, 1
+  %be_taken = icmp ne i32 %iv.inc, 1
+  br i1 %be_taken, label %loop, label %exit
+
+exit:
+  ret void
+}
+
+define void @test_early_exit() {
+; CHECK-LABEL: @test_early_exit(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:br label [[LOOP:%.*]]
+; CHECK:   loop:
+; CHECK-NEXT:[[IV:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[IV_INC:%.*]], 
[[LATCH:%.*]] ]
+; CHECK-NEXT:store i32 0, i32* @G, align 4
+; CHECK-NEXT:[[IV_INC]] = add i32 [[IV]], 1
+; CHECK-NEXT:[[BE_TAKEN:%.*]] = icmp ne i32 [[IV_INC]], 1
+; CHECK-NEXT:br i1 [[BE_TAKEN]], label [[LATCH]], label [[EXIT:%.*]]
+; CHECK:   latch:
+; CHECK-NEXT:br label [[LOOP]]
+; CHECK:   exit:
+; CHECK-NEXT:ret void
+;
+entry:
+  br label %loop
+
+loop:
+  %iv = phi i32 [ 0, %entry], [ %iv.inc, %latch ]
+  store i32 0, i32* @G
+  %iv.inc = add i32 %iv, 1
+  %be_taken = icmp ne i32 %iv.inc, 1
+  br i1 %be_taken, label %latch, label %exit
+latch:
+  br label %loop
+
+exit:
+  ret void
+}
+
+define void @test_multi_exit1() {
+; CHECK-LABEL: @test_multi_exit1(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:br label [[LOOP:%.*]]
+; CHECK:   loop:
+; CHECK-NEXT:[[IV:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[IV_INC:%.*]], 
[[LATCH:%.*]] ]
+; CHECK-NEXT:store i32 0, i32* @G, align 4
+; CHECK-NEXT:[[IV_INC]] = add i32 [[IV]], 1
+; CHECK-NEXT:[[BE_TAKEN:%.*]] = icmp ne i32 [[IV_INC]], 1
+; CHECK-NEXT:br i1 [[BE_TAKEN]], label [[LATCH]], label [[EXIT:%.*]]
+; CHECK:   latch:
+; CHECK-NEXT:store i32 1, i32* @G, align 4
+; CHECK-NEXT:[[COND2:%.*]] = icmp ult i32 [[IV_INC]], 30
+; CHECK-NEXT:br i1 [[COND2]], label [[LOOP]], label [[EXIT]]
+; CHECK:   exit:
+; CHECK-NEXT:ret void
+;
+entry:
+  br label %loop
+
+loop:
+  %iv = phi i32 [ 0, %entry], [ %iv.inc, %latch ]
+  store i32 0, i32* @G
+  %iv.inc = add i32 %iv, 1
+  %be_taken = icmp ne i32 %iv.inc, 1
+  br i1 %be_taken, label %latch, label %exit
+latch:
+  store i32 1, i32* @G
+  %cond2 = icmp ult i32 %iv.inc, 30
+  br i1 %cond2, label %loop, label %exit
+
+exit:
+  ret void
+}
+
+define void @test_multi_exit2() {
+; CHECK-LABEL: @test_multi_exit2(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:br label [[LOOP:%.*]]
+; CHECK:   loop:
+; CHECK-NEXT:store i32 0, i32* @G, align 4
+; CHECK-NEXT:br i1 true, label [[LATCH:%.*]], label [[EXIT:%.*]]
+; CHECK:   latch:
+; CHECK-NEXT:store i32 1, i32* @G, align 4
+; CHECK-NEXT:br i1 false, label [[LOOP]], label [[EXIT]]
+; CHECK:   exit:
+; CHECK-NEXT:ret void
+;
+entry:
+  br label %loop
+
+loop:
+  store i32 0, i32* @G
+  br i1 true, label %latch, label %exit
+latch:
+  store i32 1, i32* @G
+  br i1 false, label %loop, label %exit
+
+exit:
+  ret void
+}
+
+; TODO: SCEV seems not to recognize this as a zero btc loop
+define void @test_multi_exit3(i1 %cond1) {
+; CHECK-LABEL: @test_multi_exit3(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:br label [[LOOP:%.*]]
+; CHECK:   loop:
+;

[llvm-branch-commits] [llvm] 4739dd6 - [LoopDeletion] Break backedge of outermost loops when known not taken

2021-01-10 Thread Philip Reames via llvm-branch-commits

Author: Philip Reames
Date: 2021-01-10T16:02:33-08:00
New Revision: 4739dd67e7a08b715f1d23f71fb4af16007fe80a

URL: 
https://github.com/llvm/llvm-project/commit/4739dd67e7a08b715f1d23f71fb4af16007fe80a
DIFF: 
https://github.com/llvm/llvm-project/commit/4739dd67e7a08b715f1d23f71fb4af16007fe80a.diff

LOG: [LoopDeletion] Break backedge of outermost loops when known not taken

This is a resubmit of dd6bb367 (which was reverted due to stage2 build failures 
in 7c63aac), with the additional restriction added to the transform to only 
consider outer most loops.

As shown in the added test case, ensuring LCSSA is up to date when deleting an 
inner loop is tricky as we may actually need to remove blocks from any outer 
loops, thus changing the exit block set.  For the moment, just avoid 
transforming this case.  I plan to return to this case in a follow-up patch and 
see if we can do better.

Original commit message follows...

The basic idea is that if SCEV can prove the backedge isn't taken, we can go 
ahead and get rid of the backedge (and thus the loop) while leaving the rest of 
the control in place. This nicely handles cases with dispatch between multiple 
exits and internal side effects.

Differential Revision: https://reviews.llvm.org/D93906

Added: 


Modified: 
llvm/include/llvm/Transforms/Utils/LoopUtils.h
llvm/lib/Transforms/Scalar/LoopDeletion.cpp
llvm/lib/Transforms/Utils/LoopUtils.cpp
llvm/test/Transforms/IndVarSimplify/exit_value_test2.ll
llvm/test/Transforms/LoopDeletion/update-scev.ll
llvm/test/Transforms/LoopDeletion/zero-btc.ll

Removed: 




diff  --git a/llvm/include/llvm/Transforms/Utils/LoopUtils.h 
b/llvm/include/llvm/Transforms/Utils/LoopUtils.h
index 80c6b09d9cf0..940747b5b2ea 100644
--- a/llvm/include/llvm/Transforms/Utils/LoopUtils.h
+++ b/llvm/include/llvm/Transforms/Utils/LoopUtils.h
@@ -179,6 +179,12 @@ bool hoistRegion(DomTreeNode *, AAResults *, LoopInfo *, 
DominatorTree *,
 void deleteDeadLoop(Loop *L, DominatorTree *DT, ScalarEvolution *SE,
 LoopInfo *LI, MemorySSA *MSSA = nullptr);
 
+/// Remove the backedge of the specified loop.  Handles loop nests and general
+/// loop structures subject to the precondition that the loop has no parent
+/// loop and has a single latch block.  Preserves all listed analyses.
+void breakLoopBackedge(Loop *L, DominatorTree &DT, ScalarEvolution &SE,
+   LoopInfo &LI, MemorySSA *MSSA);
+
 /// Try to promote memory values to scalars by sinking stores out of
 /// the loop and moving loads to before the loop.  We do this by looping over
 /// the stores in the loop, looking for stores to Must pointers which are

diff  --git a/llvm/lib/Transforms/Scalar/LoopDeletion.cpp 
b/llvm/lib/Transforms/Scalar/LoopDeletion.cpp
index a94676eadeab..bd5cdeabb9bd 100644
--- a/llvm/lib/Transforms/Scalar/LoopDeletion.cpp
+++ b/llvm/lib/Transforms/Scalar/LoopDeletion.cpp
@@ -26,6 +26,7 @@
 #include "llvm/Transforms/Scalar.h"
 #include "llvm/Transforms/Scalar/LoopPassManager.h"
 #include "llvm/Transforms/Utils/LoopUtils.h"
+
 using namespace llvm;
 
 #define DEBUG_TYPE "loop-delete"
@@ -38,6 +39,14 @@ enum class LoopDeletionResult {
   Deleted,
 };
 
+static LoopDeletionResult merge(LoopDeletionResult A, LoopDeletionResult B) {
+  if (A == LoopDeletionResult::Deleted || B == LoopDeletionResult::Deleted)
+return LoopDeletionResult::Deleted;
+  if (A == LoopDeletionResult::Modified || B == LoopDeletionResult::Modified)
+return LoopDeletionResult::Modified;
+  return LoopDeletionResult::Unmodified;
+}
+
 /// Determines if a loop is dead.
 ///
 /// This assumes that we've already checked for unique exit and exiting blocks,
@@ -126,6 +135,34 @@ static bool isLoopNeverExecuted(Loop *L) {
   return true;
 }
 
+/// If we can prove the backedge is untaken, remove it.  This destroys the
+/// loop, but leaves the (now trivially loop invariant) control flow and
+/// side effects (if any) in place.
+static LoopDeletionResult
+breakBackedgeIfNotTaken(Loop *L, DominatorTree &DT, ScalarEvolution &SE,
+LoopInfo &LI, MemorySSA *MSSA,
+OptimizationRemarkEmitter &ORE) {
+  assert(L->isLCSSAForm(DT) && "Expected LCSSA!");
+
+  if (!L->getLoopLatch())
+return LoopDeletionResult::Unmodified;
+
+  auto *BTC = SE.getBackedgeTakenCount(L);
+  if (!BTC->isZero())
+return LoopDeletionResult::Unmodified;
+
+  // For non-outermost loops, the tricky case is that we can drop blocks
+  // out of both inner and outer loops at the same time.  This results in
+  // new exiting block for the outer loop appearing, and possibly needing
+  // an lcssa phi inserted.  (See loop_nest_lcssa test case in zero-btc.ll)
+  // TODO: We can handle a bunch of cases here without much work, revisit.
+  if (!L->isOutermost())
+return LoopDeletionResult::Unmodified;
+
+  breakLoopBackedge(L, DT, SE, LI, MSSA)

[llvm-branch-commits] [llvm] f5fe849 - [LAA] Relax restrictions on early exits in loop structure

2020-12-14 Thread Philip Reames via llvm-branch-commits

Author: Philip Reames
Date: 2020-12-14T12:44:01-08:00
New Revision: f5fe8493e5acfd70da61993cd370816978b9ef85

URL: 
https://github.com/llvm/llvm-project/commit/f5fe8493e5acfd70da61993cd370816978b9ef85
DIFF: 
https://github.com/llvm/llvm-project/commit/f5fe8493e5acfd70da61993cd370816978b9ef85.diff

LOG: [LAA] Relax restrictions on early exits in loop structure

This is a preparation patch for supporting multiple exits in the loop 
vectorizer, by itself it should be mostly NFC. This patch moves the loop 
structure checks from LAA to their respective consumers (where duplicates don't 
already exist).  Moving the checks does end up changing some of the 
optimization warnings and debug output slightly, but nothing that appears to be 
a regression.

Why do this? Well, after auditing the code, I can't actually find anything in 
LAA itself which relies on having all instructions within a loop execute an 
equal number of times. This patch simply makes this explicit so that if one 
consumer - say LV in the near future (hopefully) - wants to handle a broader 
class of loops, it can do so.
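
The consumer-side pattern looks roughly like this (sketch only; processLoop is
a hypothetical stand-in for each consumer's real entry point, while the Loop
queries are the in-tree ones used in the diffs below):

#include "llvm/ADT/ArrayRef.h"
#include "llvm/Analysis/LoopInfo.h"
using namespace llvm;

static bool processLoop(Loop *L); // hypothetical consumer-specific work

static bool runOnWorklist(ArrayRef<Loop *> Worklist) {
  bool Changed = false;
  for (Loop *L : Worklist) {
    // Re-impose the shape LAA used to require: bottom-tested with a single
    // exiting block, so all instructions run the same number of times.
    if (!L->isRotatedForm() || !L->getExitingBlock())
      continue;
    Changed |= processLoop(L);
  }
  return Changed;
}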

Differential Revision: https://reviews.llvm.org/D92066

Added: 


Modified: 
llvm/lib/Analysis/LoopAccessAnalysis.cpp
llvm/lib/Transforms/Scalar/LoopDistribute.cpp
llvm/lib/Transforms/Scalar/LoopLoadElimination.cpp
llvm/lib/Transforms/Utils/LoopVersioning.cpp

Removed: 




diff  --git a/llvm/lib/Analysis/LoopAccessAnalysis.cpp 
b/llvm/lib/Analysis/LoopAccessAnalysis.cpp
index 65d39161c1be..be340a3b3130 100644
--- a/llvm/lib/Analysis/LoopAccessAnalysis.cpp
+++ b/llvm/lib/Analysis/LoopAccessAnalysis.cpp
@@ -1781,26 +1781,6 @@ bool LoopAccessInfo::canAnalyzeLoop() {
 return false;
   }
 
-  // We must have a single exiting block.
-  if (!TheLoop->getExitingBlock()) {
-LLVM_DEBUG(
-dbgs() << "LAA: loop control flow is not understood by analyzer\n");
-recordAnalysis("CFGNotUnderstood")
-<< "loop control flow is not understood by analyzer";
-return false;
-  }
-
-  // We only handle bottom-tested loops, i.e. loop in which the condition is
-  // checked at the end of each iteration. With that we can assume that all
-  // instructions in the loop are executed the same number of times.
-  if (TheLoop->getExitingBlock() != TheLoop->getLoopLatch()) {
-LLVM_DEBUG(
-dbgs() << "LAA: loop control flow is not understood by analyzer\n");
-recordAnalysis("CFGNotUnderstood")
-<< "loop control flow is not understood by analyzer";
-return false;
-  }
-
   // ScalarEvolution needs to be able to find the exit count.
   const SCEV *ExitCount = PSE->getBackedgeTakenCount();
   if (isa(ExitCount)) {

diff  --git a/llvm/lib/Transforms/Scalar/LoopDistribute.cpp 
b/llvm/lib/Transforms/Scalar/LoopDistribute.cpp
index 98d67efef922..3dd7d9dce67a 100644
--- a/llvm/lib/Transforms/Scalar/LoopDistribute.cpp
+++ b/llvm/lib/Transforms/Scalar/LoopDistribute.cpp
@@ -670,15 +670,17 @@ class LoopDistributeForLoop {
   << L->getHeader()->getParent()->getName()
   << "\" checking " << *L << "\n");
 
+// Having a single exit block implies there's also one exiting block.
 if (!L->getExitBlock())
   return fail("MultipleExitBlocks", "multiple exit blocks");
 if (!L->isLoopSimplifyForm())
   return fail("NotLoopSimplifyForm",
   "loop is not in loop-simplify form");
+if (!L->isRotatedForm())
+  return fail("NotBottomTested", "loop is not bottom tested");
 
 BasicBlock *PH = L->getLoopPreheader();
 
-// LAA will check that we only have a single exiting block.
 LAI = &GetLAA(*L);
 
 // Currently, we only distribute to isolate the part of the loop with

diff  --git a/llvm/lib/Transforms/Scalar/LoopLoadElimination.cpp 
b/llvm/lib/Transforms/Scalar/LoopLoadElimination.cpp
index 475448740ae4..56afddead619 100644
--- a/llvm/lib/Transforms/Scalar/LoopLoadElimination.cpp
+++ b/llvm/lib/Transforms/Scalar/LoopLoadElimination.cpp
@@ -632,6 +632,9 @@ eliminateLoadsAcrossLoops(Function &F, LoopInfo &LI, 
DominatorTree &DT,
 
   // Now walk the identified inner loops.
   for (Loop *L : Worklist) {
+// Match historical behavior
+if (!L->isRotatedForm() || !L->getExitingBlock())
+  continue;
 // The actual work is performed by LoadEliminationForLoop.
 LoadEliminationForLoop LEL(L, &LI, GetLAI(*L), &DT, BFI, PSI);
 Changed |= LEL.processLoop();

diff  --git a/llvm/lib/Transforms/Utils/LoopVersioning.cpp 
b/llvm/lib/Transforms/Utils/LoopVersioning.cpp
index 03eb41b5ee0d..b605cb2fb865 100644
--- a/llvm/lib/Transforms/Utils/LoopVersioning.cpp
+++ b/llvm/lib/Transforms/Utils/LoopVersioning.cpp
@@ -269,8 +269,11 @@ bool runImpl(LoopInfo *LI, function_ref GetLAA,
   // Now walk the identified inner loops.
   bool Changed = false;
   for (Loop *L : Worklist) {
+if (!L->isLoopSimplifyForm() || !L->isRotatedForm() ||

[llvm-branch-commits] [clang] 3b3eb7f - Speculative fix for build bot failures

2020-12-14 Thread Philip Reames via llvm-branch-commits

Author: Philip Reames
Date: 2020-12-14T13:44:40-08:00
New Revision: 3b3eb7f07ff97feb64a1975587bb473f1f3efa6b

URL: 
https://github.com/llvm/llvm-project/commit/3b3eb7f07ff97feb64a1975587bb473f1f3efa6b
DIFF: 
https://github.com/llvm/llvm-project/commit/3b3eb7f07ff97feb64a1975587bb473f1f3efa6b.diff

LOG: Speculative fix for build bot failures

(The clang build fails for me locally, so this is based on build bot output and 
a guess as to the root cause.)

f5fe849 made the execution of LAA conditional, so I'm guessing that's the root 
cause.

Added: 


Modified: 
clang/test/CodeGen/thinlto-distributed-newpm.ll

Removed: 




diff  --git a/clang/test/CodeGen/thinlto-distributed-newpm.ll 
b/clang/test/CodeGen/thinlto-distributed-newpm.ll
index 75ea4064d6af..8fe53762837e 100644
--- a/clang/test/CodeGen/thinlto-distributed-newpm.ll
+++ b/clang/test/CodeGen/thinlto-distributed-newpm.ll
@@ -183,7 +183,6 @@
 ; CHECK-O: Running analysis: PostDominatorTreeAnalysis on main
 ; CHECK-O: Running analysis: DemandedBitsAnalysis on main
 ; CHECK-O: Running pass: LoopLoadEliminationPass on main
-; CHECK-O: Running analysis: LoopAccessAnalysis on Loop at depth 1 containing: 
%b
 ; CHECK-O: Running pass: InstCombinePass on main
 ; CHECK-O: Running pass: SimplifyCFGPass on main
 ; CHECK-O: Running pass: SLPVectorizerPass on main





[llvm-branch-commits] [llvm] 99ac886 - [tests][LV] precommit tests for D93317

2020-12-15 Thread Philip Reames via llvm-branch-commits

Author: Philip Reames
Date: 2020-12-15T10:53:34-08:00
New Revision: 99ac8868cfb403aeffe5b3f13e3487eed79e67b9

URL: 
https://github.com/llvm/llvm-project/commit/99ac8868cfb403aeffe5b3f13e3487eed79e67b9
DIFF: 
https://github.com/llvm/llvm-project/commit/99ac8868cfb403aeffe5b3f13e3487eed79e67b9.diff

LOG: [tests][LV] precommit tests for D93317

Added: 


Modified: 
llvm/test/Transforms/LoopVectorize/loop-form.ll

Removed: 




diff  --git a/llvm/test/Transforms/LoopVectorize/loop-form.ll 
b/llvm/test/Transforms/LoopVectorize/loop-form.ll
index 3bbe8100e34e..cebe7844bb11 100644
--- a/llvm/test/Transforms/LoopVectorize/loop-form.ll
+++ b/llvm/test/Transforms/LoopVectorize/loop-form.ll
@@ -1,16 +1,80 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
 ; RUN: opt -S -loop-vectorize < %s | FileCheck %s
 target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"
 
-; Check that we vectorize only bottom-tested loops.
-; This is a reduced testcase from PR21302.
+define void @bottom_tested(i16* %p, i32 %n) {
+; CHECK-LABEL: @bottom_tested(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:[[TMP0:%.*]] = icmp sgt i32 [[N:%.*]], 0
+; CHECK-NEXT:[[SMAX:%.*]] = select i1 [[TMP0]], i32 [[N]], i32 0
+; CHECK-NEXT:[[TMP1:%.*]] = add nuw i32 [[SMAX]], 1
+; CHECK-NEXT:[[MIN_ITERS_CHECK:%.*]] = icmp ult i32 [[TMP1]], 2
+; CHECK-NEXT:br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label 
[[VECTOR_PH:%.*]]
+; CHECK:   vector.ph:
+; CHECK-NEXT:[[N_MOD_VF:%.*]] = urem i32 [[TMP1]], 2
+; CHECK-NEXT:[[N_VEC:%.*]] = sub i32 [[TMP1]], [[N_MOD_VF]]
+; CHECK-NEXT:br label [[VECTOR_BODY:%.*]]
+; CHECK:   vector.body:
+; CHECK-NEXT:[[INDEX:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ 
[[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
+; CHECK-NEXT:[[TMP2:%.*]] = add i32 [[INDEX]], 0
+; CHECK-NEXT:[[TMP3:%.*]] = sext i32 [[TMP2]] to i64
+; CHECK-NEXT:[[TMP4:%.*]] = getelementptr inbounds i16, i16* [[P:%.*]], 
i64 [[TMP3]]
+; CHECK-NEXT:[[TMP5:%.*]] = getelementptr inbounds i16, i16* [[TMP4]], i32 0
+; CHECK-NEXT:[[TMP6:%.*]] = bitcast i16* [[TMP5]] to <2 x i16>*
+; CHECK-NEXT:store <2 x i16> zeroinitializer, <2 x i16>* [[TMP6]], align 4
+; CHECK-NEXT:[[INDEX_NEXT]] = add i32 [[INDEX]], 2
+; CHECK-NEXT:[[TMP7:%.*]] = icmp eq i32 [[INDEX_NEXT]], [[N_VEC]]
+; CHECK-NEXT:br i1 [[TMP7]], label [[MIDDLE_BLOCK:%.*]], label 
[[VECTOR_BODY]], [[LOOP0:!llvm.loop !.*]]
+; CHECK:   middle.block:
+; CHECK-NEXT:[[CMP_N:%.*]] = icmp eq i32 [[TMP1]], [[N_VEC]]
+; CHECK-NEXT:br i1 [[CMP_N]], label [[IF_END:%.*]], label [[SCALAR_PH]]
+; CHECK:   scalar.ph:
+; CHECK-NEXT:[[BC_RESUME_VAL:%.*]] = phi i32 [ [[N_VEC]], [[MIDDLE_BLOCK]] 
], [ 0, [[ENTRY:%.*]] ]
+; CHECK-NEXT:br label [[FOR_COND:%.*]]
+; CHECK:   for.cond:
+; CHECK-NEXT:[[I:%.*]] = phi i32 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ 
[[INC:%.*]], [[FOR_COND]] ]
+; CHECK-NEXT:[[IPROM:%.*]] = sext i32 [[I]] to i64
+; CHECK-NEXT:[[B:%.*]] = getelementptr inbounds i16, i16* [[P]], i64 
[[IPROM]]
+; CHECK-NEXT:store i16 0, i16* [[B]], align 4
+; CHECK-NEXT:[[INC]] = add nsw i32 [[I]], 1
+; CHECK-NEXT:[[CMP:%.*]] = icmp slt i32 [[I]], [[N]]
+; CHECK-NEXT:br i1 [[CMP]], label [[FOR_COND]], label [[IF_END]], 
[[LOOP2:!llvm.loop !.*]]
+; CHECK:   if.end:
+; CHECK-NEXT:ret void
 ;
-; rdar://problem/18886083
+entry:
+  br label %for.cond
+
+for.cond:
+  %i = phi i32 [ 0, %entry ], [ %inc, %for.cond ]
+  %iprom = sext i32 %i to i64
+  %b = getelementptr inbounds i16, i16* %p, i64 %iprom
+  store i16 0, i16* %b, align 4
+  %inc = add nsw i32 %i, 1
+  %cmp = icmp slt i32 %i, %n
+  br i1 %cmp, label %for.cond, label %if.end
 
-%struct.X = type { i32, i16 }
-; CHECK-LABEL: @foo(
-; CHECK-NOT: vector.body
+if.end:
+  ret void
+}
 
-define void @foo(i32 %n) {
+define void @early_exit(i16* %p, i32 %n) {
+; CHECK-LABEL: @early_exit(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:br label [[FOR_COND:%.*]]
+; CHECK:   for.cond:
+; CHECK-NEXT:[[I:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[INC:%.*]], 
[[FOR_BODY:%.*]] ]
+; CHECK-NEXT:[[CMP:%.*]] = icmp slt i32 [[I]], [[N:%.*]]
+; CHECK-NEXT:br i1 [[CMP]], label [[FOR_BODY]], label [[IF_END:%.*]]
+; CHECK:   for.body:
+; CHECK-NEXT:[[IPROM:%.*]] = sext i32 [[I]] to i64
+; CHECK-NEXT:[[B:%.*]] = getelementptr inbounds i16, i16* [[P:%.*]], i64 
[[IPROM]]
+; CHECK-NEXT:store i16 0, i16* [[B]], align 4
+; CHECK-NEXT:[[INC]] = add nsw i32 [[I]], 1
+; CHECK-NEXT:br label [[FOR_COND]]
+; CHECK:   if.end:
+; CHECK-NEXT:ret void
+;
 entry:
   br label %for.cond
 
@@ -21,7 +85,7 @@ for.cond:
 
 for.body:
   %iprom = sext i32 %i to i64
-  %b = getelementptr inbounds %struct.X, %struct.X* undef, i64 %iprom, i32 1
+  %b = getelementptr inbounds i16, i16* %p, i64 %iprom
   store i16 0, i16* %b, align 4
   %inc = add

[llvm-branch-commits] [llvm] a048e2f - [tests] fix an accidental target dependence added in 99ac8868

2020-12-15 Thread Philip Reames via llvm-branch-commits

Author: Philip Reames
Date: 2020-12-15T11:07:30-08:00
New Revision: a048e2fa1d0285a3582bd224d5652dbf1dc91cb4

URL: 
https://github.com/llvm/llvm-project/commit/a048e2fa1d0285a3582bd224d5652dbf1dc91cb4
DIFF: 
https://github.com/llvm/llvm-project/commit/a048e2fa1d0285a3582bd224d5652dbf1dc91cb4.diff

LOG: [tests] fix an accidental target dependence added in 99ac8868

Added: 


Modified: 
llvm/test/Transforms/LoopVectorize/loop-form.ll

Removed: 




diff  --git a/llvm/test/Transforms/LoopVectorize/loop-form.ll 
b/llvm/test/Transforms/LoopVectorize/loop-form.ll
index cebe7844bb11..298143ba726c 100644
--- a/llvm/test/Transforms/LoopVectorize/loop-form.ll
+++ b/llvm/test/Transforms/LoopVectorize/loop-form.ll
@@ -1,5 +1,5 @@
 ; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
-; RUN: opt -S -loop-vectorize < %s | FileCheck %s
+; RUN: opt -S -loop-vectorize -force-vector-width=2 < %s | FileCheck %s
 target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"
 
 define void @bottom_tested(i16* %p, i32 %n) {





[llvm-branch-commits] [llvm] a81db8b - [LV] Restructure handling of -prefer-predicate-over-epilogue option [NFC]

2020-12-15 Thread Philip Reames via llvm-branch-commits

Author: Philip Reames
Date: 2020-12-15T12:38:13-08:00
New Revision: a81db8b3159e72a6d2ecb2318024316e4aa30933

URL: 
https://github.com/llvm/llvm-project/commit/a81db8b3159e72a6d2ecb2318024316e4aa30933
DIFF: 
https://github.com/llvm/llvm-project/commit/a81db8b3159e72a6d2ecb2318024316e4aa30933.diff

LOG: [LV] Restructure handling of -prefer-predicate-over-epilogue option [NFC]

This should be purely non-functional.  When touching this code for another 
reason, I found the handling of the PredicateOrDontVectorize piece here very 
confusing.  Let's make it an explicit state (instead of an implicit combination 
of two variables), and use early return for options/hint processing.

Added: 


Modified: 
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

Removed: 




diff  --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp 
b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index c96637762658..6e506a4d71a4 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -1201,7 +1201,10 @@ enum ScalarEpilogueLowering {
   CM_ScalarEpilogueNotAllowedLowTripLoop,
 
   // Loop hint predicate indicating an epilogue is undesired.
-  CM_ScalarEpilogueNotNeededUsePredicate
+  CM_ScalarEpilogueNotNeededUsePredicate,
+
+  // Directive indicating we must either tail fold or not vectorize
+  CM_ScalarEpilogueNotAllowedUsePredicate
 };
 
 /// LoopVectorizationCostModel - estimates the expected speedups due to
@@ -5463,6 +5466,8 @@ LoopVectorizationCostModel::computeMaxVF(ElementCount 
UserVF, unsigned UserIC) {
   switch (ScalarEpilogueStatus) {
   case CM_ScalarEpilogueAllowed:
 return MaxVF;
+  case CM_ScalarEpilogueNotAllowedUsePredicate:
+LLVM_FALLTHROUGH;
   case CM_ScalarEpilogueNotNeededUsePredicate:
 LLVM_DEBUG(
 dbgs() << "LV: vector predicate hint/switch found.\n"
@@ -5522,16 +5527,17 @@ LoopVectorizationCostModel::computeMaxVF(ElementCount 
UserVF, unsigned UserIC) {
   // If there was a tail-folding hint/switch, but we can't fold the tail by
   // masking, fallback to a vectorization with a scalar epilogue.
   if (ScalarEpilogueStatus == CM_ScalarEpilogueNotNeededUsePredicate) {
-if (PreferPredicateOverEpilogue == 
PreferPredicateTy::PredicateOrDontVectorize) {
-  LLVM_DEBUG(dbgs() << "LV: Can't fold tail by masking: don't 
vectorize\n");
-  return None;
-}
 LLVM_DEBUG(dbgs() << "LV: Cannot fold tail by masking: vectorize with a "
  "scalar epilogue instead.\n");
 ScalarEpilogueStatus = CM_ScalarEpilogueAllowed;
 return MaxVF;
   }
 
+  if (ScalarEpilogueStatus == CM_ScalarEpilogueNotAllowedUsePredicate) {
+LLVM_DEBUG(dbgs() << "LV: Can't fold tail by masking: don't vectorize\n");
+return None;
+  }
+  
   if (TC == 0) {
 reportVectorizationFailure(
 "Unable to calculate the loop count due to complex control flow",
@@ -8855,22 +8861,29 @@ static ScalarEpilogueLowering getScalarEpilogueLowering(
   Hints.getForce() != LoopVectorizeHints::FK_Enabled))
 return CM_ScalarEpilogueNotAllowedOptSize;
 
-  bool PredicateOptDisabled = PreferPredicateOverEpilogue.getNumOccurrences() 
&&
-  !PreferPredicateOverEpilogue;
+  // 2) If set, obey the directives
+  if (PreferPredicateOverEpilogue.getNumOccurrences()) {
+switch (PreferPredicateOverEpilogue) {
+case PreferPredicateTy::ScalarEpilogue:
+  return CM_ScalarEpilogueAllowed;
+case PreferPredicateTy::PredicateElseScalarEpilogue:
+  return CM_ScalarEpilogueNotNeededUsePredicate;
+case PreferPredicateTy::PredicateOrDontVectorize:
+  return CM_ScalarEpilogueNotAllowedUsePredicate;
+};
+  }
 
-  // 2) Next, if disabling predication is requested on the command line, honour
-  // this and request a scalar epilogue.
-  if (PredicateOptDisabled)
+  // 3) If set, obey the hints
+  switch (Hints.getPredicate()) {
+  case LoopVectorizeHints::FK_Enabled:
+return CM_ScalarEpilogueNotNeededUsePredicate;
+  case LoopVectorizeHints::FK_Disabled:
 return CM_ScalarEpilogueAllowed;
+  };
 
-  // 3) and 4) look if enabling predication is requested on the command line,
-  // with a loop hint, or if the TTI hook indicates this is profitable, request
-  // predication.
-  if (PreferPredicateOverEpilogue ||
-  Hints.getPredicate() == LoopVectorizeHints::FK_Enabled ||
-  (TTI->preferPredicateOverEpilogue(L, LI, *SE, *AC, TLI, DT,
-LVL.getLAI()) &&
-   Hints.getPredicate() != LoopVectorizeHints::FK_Disabled))
+  // 4) if the TTI hook indicates this is profitable, request predication.
+  if (TTI->preferPredicateOverEpilogue(L, LI, *SE, *AC, TLI, DT,
+   LVL.getLAI()))
 return CM_ScalarEpilogueNotNeededUsePredicate;
 
   return CM_ScalarEpilogueAllowed;




[llvm-branch-commits] [llvm] af7ef89 - [LV] Extend dead instruction detection to multiple exiting blocks

2020-12-15 Thread Philip Reames via llvm-branch-commits

Author: Philip Reames
Date: 2020-12-15T18:46:32-08:00
New Revision: af7ef895d4951cd41c5e055c84469b4fd229d50c

URL: 
https://github.com/llvm/llvm-project/commit/af7ef895d4951cd41c5e055c84469b4fd229d50c
DIFF: 
https://github.com/llvm/llvm-project/commit/af7ef895d4951cd41c5e055c84469b4fd229d50c.diff

LOG: [LV] Extend dead instruction detection to multiple exiting blocks

Given we haven't yet enabled multiple exiting blocks, this is currently 
non-functional, but it's an obvious extension which cleans up a later patch.

I don't think this is worth review (as it's pretty obvious); if anyone 
disagrees, feel free to revert or comment and I will.

Added: 


Modified: 
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

Removed: 




diff  --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp 
b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index 6e506a4d71a4..cbeb6a32825f 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -7472,16 +7472,23 @@ void 
LoopVectorizationPlanner::executePlan(InnerLoopVectorizer &ILV,
 
 void LoopVectorizationPlanner::collectTriviallyDeadInstructions(
 SmallPtrSetImpl &DeadInstructions) {
-  BasicBlock *Latch = OrigLoop->getLoopLatch();
 
-  // We create new control-flow for the vectorized loop, so the original
-  // condition will be dead after vectorization if it's only used by the
-  // branch.
-  auto *Cmp = dyn_cast(Latch->getTerminator()->getOperand(0));
-  if (Cmp && Cmp->hasOneUse()) {
-DeadInstructions.insert(Cmp);
+  // We create new control-flow for the vectorized loop, so the original exit
+  // conditions will be dead after vectorization if it's only used by the
+  // terminator
+  SmallVector ExitingBlocks;
+  OrigLoop->getExitingBlocks(ExitingBlocks);
+  for (auto *BB : ExitingBlocks) {
+auto *Cmp = dyn_cast(BB->getTerminator()->getOperand(0));
+if (!Cmp || !Cmp->hasOneUse())
+  continue;
+
+// TODO: we should introduce a getUniqueExitingBlocks on Loop
+if (!DeadInstructions.insert(Cmp).second)
+  continue;
 
 // The operands of the icmp is often a dead trunc, used by IndUpdate.
+// TODO: can recurse through operands in general
 for (Value *Op : Cmp->operands()) {
   if (isa(Op) && Op->hasOneUse())
   DeadInstructions.insert(cast(Op));
@@ -7491,6 +7498,7 @@ void 
LoopVectorizationPlanner::collectTriviallyDeadInstructions(
   // We create new "steps" for induction variable updates to which the original
   // induction variables map. An original update instruction will be dead if
   // all its users except the induction variable are dead.
+  auto *Latch = OrigLoop->getLoopLatch();
   for (auto &Induction : Legal->getInductionVars()) {
 PHINode *Ind = Induction.first;
 auto *IndUpdate = cast(Ind->getIncomingValueForBlock(Latch));





[llvm-branch-commits] [llvm] 1f6e155 - [LV] Weaken an unnecessarily strong assert [NFC]

2020-12-15 Thread Philip Reames via llvm-branch-commits

Author: Philip Reames
Date: 2020-12-15T19:07:53-08:00
New Revision: 1f6e15566f147f5814b0fe04df71a8d6acc4e689

URL: 
https://github.com/llvm/llvm-project/commit/1f6e15566f147f5814b0fe04df71a8d6acc4e689
DIFF: 
https://github.com/llvm/llvm-project/commit/1f6e15566f147f5814b0fe04df71a8d6acc4e689.diff

LOG: [LV] Weaken an unnecessarily strong assert [NFC]

Account for the fact that (in the future) the latch might be a switch, not a 
branch.  The existing code is correct, minus the assert.

Added: 


Modified: 
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

Removed: 




diff  --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp 
b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index cbeb6a32825f..37863b035067 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -3409,14 +3409,7 @@ BasicBlock 
*InnerLoopVectorizer::completeLoopSkeleton(Loop *L,
   Value *Count = getOrCreateTripCount(L);
   Value *VectorTripCount = getOrCreateVectorTripCount(L);
 
-  // We need the OrigLoop (scalar loop part) latch terminator to help
-  // produce correct debug info for the middle block BB instructions.
-  // The legality check stage guarantees that the loop will have a single
-  // latch.
-  assert(isa<BranchInst>(OrigLoop->getLoopLatch()->getTerminator()) &&
- "Scalar loop latch terminator isn't a branch");
-  BranchInst *ScalarLatchBr =
-  cast<BranchInst>(OrigLoop->getLoopLatch()->getTerminator());
+  auto *ScalarLatchTerm = OrigLoop->getLoopLatch()->getTerminator();
 
   // Add a check in the middle block to see if we have completed
   // all of the iterations in the first vector loop.
@@ -3428,16 +3421,16 @@ BasicBlock 
*InnerLoopVectorizer::completeLoopSkeleton(Loop *L,
VectorTripCount, "cmp.n",
LoopMiddleBlock->getTerminator());
 
-// Here we use the same DebugLoc as the scalar loop latch branch instead
+// Here we use the same DebugLoc as the scalar loop latch terminator instead
 // of the corresponding compare because they may have ended up with
 // different line numbers and we want to avoid awkward line stepping while
 // debugging. Eg. if the compare has got a line number inside the loop.
-cast<Instruction>(CmpN)->setDebugLoc(ScalarLatchBr->getDebugLoc());
+cast<Instruction>(CmpN)->setDebugLoc(ScalarLatchTerm->getDebugLoc());
   }
 
   BranchInst *BrInst =
   BranchInst::Create(LoopExitBlock, LoopScalarPreHeader, CmpN);
-  BrInst->setDebugLoc(ScalarLatchBr->getDebugLoc());
+  BrInst->setDebugLoc(ScalarLatchTerm->getDebugLoc());
   ReplaceInstWithInst(LoopMiddleBlock->getTerminator(), BrInst);
 
   // Get ready to start creating new instructions into the vectorized body.



___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] f106b28 - [tests] precommit a test mentioned in review for D93317

2020-12-22 Thread Philip Reames via llvm-branch-commits

Author: Philip Reames
Date: 2020-12-22T09:47:19-08:00
New Revision: f106b281be24df4b5ed4553c3c09c885610cd2b8

URL: 
https://github.com/llvm/llvm-project/commit/f106b281be24df4b5ed4553c3c09c885610cd2b8
DIFF: 
https://github.com/llvm/llvm-project/commit/f106b281be24df4b5ed4553c3c09c885610cd2b8.diff

LOG: [tests] precommit a test mentioned in review for D93317

Added: 


Modified: 
llvm/test/Transforms/LoopVectorize/loop-form.ll

Removed: 




diff  --git a/llvm/test/Transforms/LoopVectorize/loop-form.ll 
b/llvm/test/Transforms/LoopVectorize/loop-form.ll
index 298143ba726c..72f2215bb934 100644
--- a/llvm/test/Transforms/LoopVectorize/loop-form.ll
+++ b/llvm/test/Transforms/LoopVectorize/loop-form.ll
@@ -338,3 +338,91 @@ if.end:
 if.end2:
   ret i32 1
 }
+
+define i32 @multiple_latch1(i16* %p) {
+; CHECK-LABEL: @multiple_latch1(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:br label [[FOR_BODY:%.*]]
+; CHECK:   for.body:
+; CHECK-NEXT:[[I_02:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[INC:%.*]], 
[[FOR_BODY_BACKEDGE:%.*]] ]
+; CHECK-NEXT:[[INC]] = add nsw i32 [[I_02]], 1
+; CHECK-NEXT:[[CMP:%.*]] = icmp slt i32 [[INC]], 16
+; CHECK-NEXT:br i1 [[CMP]], label [[FOR_BODY_BACKEDGE]], label 
[[FOR_SECOND:%.*]]
+; CHECK:   for.second:
+; CHECK-NEXT:[[IPROM:%.*]] = sext i32 [[I_02]] to i64
+; CHECK-NEXT:[[B:%.*]] = getelementptr inbounds i16, i16* [[P:%.*]], i64 
[[IPROM]]
+; CHECK-NEXT:store i16 0, i16* [[B]], align 4
+; CHECK-NEXT:[[CMPS:%.*]] = icmp sgt i32 [[INC]], 16
+; CHECK-NEXT:br i1 [[CMPS]], label [[FOR_BODY_BACKEDGE]], label 
[[FOR_END:%.*]]
+; CHECK:   for.body.backedge:
+; CHECK-NEXT:br label [[FOR_BODY]]
+; CHECK:   for.end:
+; CHECK-NEXT:ret i32 0
+;
+entry:
+  br label %for.body
+
+for.body:
+  %i.02 = phi i32 [ 0, %entry ], [ %inc, %for.body.backedge]
+  %inc = add nsw i32 %i.02, 1
+  %cmp = icmp slt i32 %inc, 16
+  br i1 %cmp, label %for.body.backedge, label %for.second
+
+for.second:
+  %iprom = sext i32 %i.02 to i64
+  %b = getelementptr inbounds i16, i16* %p, i64 %iprom
+  store i16 0, i16* %b, align 4
+  %cmps = icmp sgt i32 %inc, 16
+  br i1 %cmps, label %for.body.backedge, label %for.end
+
+for.body.backedge:
+  br label %for.body
+
+for.end:
+  ret i32 0
+}
+
+
; two back branches - loop simplify will convert this to the same form
; as the previous case before the vectorizer sees it, but show that it does.
+define i32 @multiple_latch2(i16* %p) {
+; CHECK-LABEL: @multiple_latch2(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:br label [[FOR_BODY:%.*]]
+; CHECK:   for.body:
+; CHECK-NEXT:[[I_02:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[INC:%.*]], 
[[FOR_BODY_BACKEDGE:%.*]] ]
+; CHECK-NEXT:[[INC]] = add nsw i32 [[I_02]], 1
+; CHECK-NEXT:[[CMP:%.*]] = icmp slt i32 [[INC]], 16
+; CHECK-NEXT:br i1 [[CMP]], label [[FOR_BODY_BACKEDGE]], label 
[[FOR_SECOND:%.*]]
+; CHECK:   for.body.backedge:
+; CHECK-NEXT:br label [[FOR_BODY]]
+; CHECK:   for.second:
+; CHECK-NEXT:[[IPROM:%.*]] = sext i32 [[I_02]] to i64
+; CHECK-NEXT:[[B:%.*]] = getelementptr inbounds i16, i16* [[P:%.*]], i64 
[[IPROM]]
+; CHECK-NEXT:store i16 0, i16* [[B]], align 4
+; CHECK-NEXT:[[CMPS:%.*]] = icmp sgt i32 [[INC]], 16
+; CHECK-NEXT:br i1 [[CMPS]], label [[FOR_BODY_BACKEDGE]], label 
[[FOR_END:%.*]]
+; CHECK:   for.end:
+; CHECK-NEXT:ret i32 0
+;
+entry:
+  br label %for.body
+
+for.body:
+  %i.02 = phi i32 [ 0, %entry ], [ %inc, %for.body ], [%inc, %for.second]
+  %inc = add nsw i32 %i.02, 1
+  %cmp = icmp slt i32 %inc, 16
+  br i1 %cmp, label %for.body, label %for.second
+
+for.second:
+  %iprom = sext i32 %i.02 to i64
+  %b = getelementptr inbounds i16, i16* %p, i64 %iprom
+  store i16 0, i16* %b, align 4
+  %cmps = icmp sgt i32 %inc, 16
+  br i1 %cmps, label %for.body, label %for.end
+
+for.end:
+  ret i32 0
+}
+
+declare void @foo()



___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] e4df6a4 - [LV] Vectorize (some) early and multiple exit loops

2020-12-28 Thread Philip Reames via llvm-branch-commits

Author: Philip Reames
Date: 2020-12-28T09:40:42-08:00
New Revision: e4df6a40dad66e989a4333c11d39cf3ed9635135

URL: 
https://github.com/llvm/llvm-project/commit/e4df6a40dad66e989a4333c11d39cf3ed9635135
DIFF: 
https://github.com/llvm/llvm-project/commit/e4df6a40dad66e989a4333c11d39cf3ed9635135.diff

LOG: [LV] Vectorize (some) early and multiple exit loops

This patch is a major step towards supporting multiple exit loops in the 
vectorizer. This patch on its own extends the loop forms allowed in two ways:

1) single exit loops which are not bottom tested
2) multiple exit loops w/ a single exit block reached from all exits and no 
phis in the exit block (because of LCSSA this implies no values defined in the 
loop used later)

The restrictions on multiple exit loop structures will be removed in follow up 
patches; disallowing cases for now makes the code changes smaller and more 
obvious. As before, we can only handle loops with entirely analyzable exits. 
Removing that restriction is much harder, and is not part of currently planned 
efforts.

The basic idea here is that we can force the last iteration to run in the 
scalar epilogue loop (if we have one). From the definition of SCEV's backedge 
taken count, we know that no earlier iteration can exit the vector body. As 
such, we can leave the decision on which exit to be taken to the scalar code 
and generate a bottom tested vector loop which runs all but the last iteration.

The existing code already had the notion of requiring one iteration in the 
scalar epilogue, this patch is mainly about generalizing that support slightly, 
making sure we don't try to use this mechanism when tail folding, and updating 
the code to reflect the difference between a single exit block and a unique 
exit block (very mechanical).

Differential Revision: https://reviews.llvm.org/D93317

Added: 


Modified: 
llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
llvm/test/Transforms/LoopVectorize/control-flow.ll
llvm/test/Transforms/LoopVectorize/loop-form.ll
llvm/test/Transforms/LoopVectorize/loop-legality-checks.ll

Removed: 




diff  --git a/llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp 
b/llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
index 60e1cc9a4a59..911309c9421c 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
@@ -1095,9 +1095,15 @@ bool LoopVectorizationLegality::canVectorizeLoopCFG(Loop 
*Lp,
   return false;
   }
 
-  // We must have a single exiting block.
-  if (!Lp->getExitingBlock()) {
-reportVectorizationFailure("The loop must have an exiting block",
+  // We currently must have a single "exit block" after the loop. Note that
+  // multiple "exiting blocks" inside the loop are allowed, provided they all
+  // reach the single exit block.
+  // TODO: This restriction can be relaxed in the near future, it's here solely
+  // to allow separation of changes for review. We need to generalize the phi
+  // update logic in a number of places.
+  BasicBlock *ExitBB = Lp->getUniqueExitBlock();
+  if (!ExitBB) {
+reportVectorizationFailure("The loop must have a unique exit block",
 "loop control flow is not understood by vectorizer",
 "CFGNotUnderstood", ORE, TheLoop);
 if (DoExtraAnalysis)
@@ -1106,11 +1112,14 @@ bool 
LoopVectorizationLegality::canVectorizeLoopCFG(Loop *Lp,
   return false;
   }
 
-  // We only handle bottom-tested loops, i.e. loop in which the condition is
-  // checked at the end of each iteration. With that we can assume that all
-  // instructions in the loop are executed the same number of times.
-  if (Lp->getExitingBlock() != Lp->getLoopLatch()) {
-reportVectorizationFailure("The exiting block is not the loop latch",
+  // The existing code assumes that LCSSA implies that phis are single entry
+  // (which was true when we had at most a single exiting edge from the latch).
+  // In general, there's nothing which prevents an LCSSA phi in exit block from
+  // having two or more values if there are multiple exiting edges leading to
+  // the exit block.  (TODO: implement general case)
+  if (!empty(ExitBB->phis()) && !ExitBB->getSinglePredecessor()) {
+reportVectorizationFailure("The loop must have no live-out values if "
+   "it has more than one exiting block",
 "loop control flow is not understood by vectorizer",
 "CFGNotUnderstood", ORE, TheLoop);
 if (DoExtraAnalysis)

diff  --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp 
b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index 5889d5e55339..c48b650c3c3e 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -837,7 +837,8 @@ class InnerLoopVectorizer {
 

[llvm-branch-commits] [llvm] b06a2ad - [LoopVectorizer] Lower uniform loads as a single load (instead of relying on CSE)

2020-11-23 Thread Philip Reames via llvm-branch-commits

Author: Philip Reames
Date: 2020-11-23T15:32:17-08:00
New Revision: b06a2ad94f45abc18970ecc3cec93d140d036d8f

URL: 
https://github.com/llvm/llvm-project/commit/b06a2ad94f45abc18970ecc3cec93d140d036d8f
DIFF: 
https://github.com/llvm/llvm-project/commit/b06a2ad94f45abc18970ecc3cec93d140d036d8f.diff

LOG: [LoopVectorizer] Lower uniform loads as a single load (instead of relying 
on CSE)

A uniform load is one which loads from a uniform address across all lanes. As 
currently implemented, we cost model such loads as if we did a single scalar 
load + a broadcast, but the actual lowering replicates the load once per lane.

This change tweaks the lowering to use the REPLICATE strategy by marking such 
loads (and the computation leading to their memory operand) as uniform after 
vectorization. This is a useful change in itself, but it's real purpose is to 
pave the way for a following change which will generalize our uniformity logic.

In review discussion, there was an issue raised with coupling cost modeling 
with the lowering strategy for uniform inputs.  The discussion on that item 
remains unsettled and is pending larger architectural discussion.  We decided 
to move forward with this patch as is, and revise as warranted once the bigger 
picture design questions are settled.

Differential Revision: https://reviews.llvm.org/D91398

Added: 


Modified: 
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
llvm/test/Transforms/LoopVectorize/X86/cost-model-assert.ll
llvm/test/Transforms/LoopVectorize/X86/uniform_mem_op.ll
llvm/test/Transforms/LoopVectorize/multiple-strides-vectorization.ll

Removed: 




diff  --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp 
b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index a6cdcd720343..15a3bd39c0f9 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -2661,7 +2661,12 @@ void 
InnerLoopVectorizer::scalarizeInstruction(Instruction *Instr, VPUser &User,
   // Replace the operands of the cloned instructions with their scalar
   // equivalents in the new loop.
   for (unsigned op = 0, e = User.getNumOperands(); op != e; ++op) {
-auto *NewOp = State.get(User.getOperand(op), Instance);
+auto *Operand = dyn_cast<Instruction>(Instr->getOperand(op));
+auto InputInstance = Instance;
+if (!Operand || !OrigLoop->contains(Operand) ||
+(Cost->isUniformAfterVectorization(Operand, State.VF)))
+  InputInstance.Lane = 0;
+auto *NewOp = State.get(User.getOperand(op), InputInstance);
 Cloned->setOperand(op, NewOp);
   }
   addNewMetadata(Cloned, Instr);
@@ -5031,6 +5036,11 @@ void 
LoopVectorizationCostModel::collectLoopUniforms(ElementCount VF) {
   // replicating region where only a single instance out of VF should be 
formed.
   // TODO: optimize such seldom cases if found important, see PR40816.
   auto addToWorklistIfAllowed = [&](Instruction *I) -> void {
+if (isOutOfScope(I)) {
+  LLVM_DEBUG(dbgs() << "LV: Found not uniform due to scope: "
+<< *I << "\n");
+  return;
+}
 if (isScalarWithPredication(I, VF)) {
   LLVM_DEBUG(dbgs() << "LV: Found not uniform being ScalarWithPredication: 
"
 << *I << "\n");
@@ -5051,16 +5061,25 @@ void 
LoopVectorizationCostModel::collectLoopUniforms(ElementCount VF) {
   // are pointers that are treated like consecutive pointers during
   // vectorization. The pointer operands of interleaved accesses are an
   // example.
-  SmallSetVector<Instruction *, 8> ConsecutiveLikePtrs;
+  SmallSetVector<Value *, 8> ConsecutiveLikePtrs;
 
   // Holds pointer operands of instructions that are possibly non-uniform.
-  SmallPtrSet<Instruction *, 8> PossibleNonUniformPtrs;
+  SmallPtrSet<Value *, 8> PossibleNonUniformPtrs;
 
   auto isUniformDecision = [&](Instruction *I, ElementCount VF) {
 InstWidening WideningDecision = getWideningDecision(I, VF);
 assert(WideningDecision != CM_Unknown &&
"Widening decision should be ready at this moment");
 
+// The address of a uniform mem op is itself uniform.  We exclude stores
+// here as there's an assumption in the current code that all uses of
+// uniform instructions are uniform and, as noted below, uniform stores are
+// still handled via replication (i.e. aren't uniform after vectorization).
+if (isa<LoadInst>(I) && Legal->isUniformMemOp(*I)) {
+  assert(WideningDecision == CM_Scalarize);
+  return true;
+}
+
 return (WideningDecision == CM_Widen ||
 WideningDecision == CM_Widen_Reverse ||
 WideningDecision == CM_Interleave);
@@ -5076,10 +5095,21 @@ void 
LoopVectorizationCostModel::collectLoopUniforms(ElementCount VF) {
   for (auto *BB : TheLoop->blocks())
 for (auto &I : *BB) {
   // If there's no pointer operand, there's nothing to do.
-  auto *Ptr = dyn_cast_or_null<Instruction>(getLoadStorePointerOperand(&I));
+  auto *Ptr = getLoadStorePo

[llvm-branch-commits] [llvm] d6239b3 - [test] pre-commit test for D91451

2020-11-23 Thread Philip Reames via llvm-branch-commits

Author: Philip Reames
Date: 2020-11-23T15:36:08-08:00
New Revision: d6239b3ea6c143a0c395eb3b8512677feaf6acc0

URL: 
https://github.com/llvm/llvm-project/commit/d6239b3ea6c143a0c395eb3b8512677feaf6acc0
DIFF: 
https://github.com/llvm/llvm-project/commit/d6239b3ea6c143a0c395eb3b8512677feaf6acc0.diff

LOG: [test] pre-commit test for D91451

Added: 


Modified: 
llvm/test/Transforms/LoopVectorize/X86/uniform_mem_op.ll

Removed: 




diff  --git a/llvm/test/Transforms/LoopVectorize/X86/uniform_mem_op.ll 
b/llvm/test/Transforms/LoopVectorize/X86/uniform_mem_op.ll
index 3c0ec386f073..a7e38c2115fb 100644
--- a/llvm/test/Transforms/LoopVectorize/X86/uniform_mem_op.ll
+++ b/llvm/test/Transforms/LoopVectorize/X86/uniform_mem_op.ll
@@ -131,6 +131,68 @@ loopexit:
   ret i32 %accum.next
 }
 
+define i32 @uniform_address(i32* align(4) %addr, i32 %byte_offset) {
+; CHECK-LABEL: @uniform_address(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
+; CHECK:   vector.ph:
+; CHECK-NEXT:br label [[VECTOR_BODY:%.*]]
+; CHECK:   vector.body:
+; CHECK-NEXT:[[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ 
[[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
+; CHECK-NEXT:[[TMP0:%.*]] = add i64 [[INDEX]], 0
+; CHECK-NEXT:[[TMP1:%.*]] = add i64 [[INDEX]], 4
+; CHECK-NEXT:[[TMP2:%.*]] = add i64 [[INDEX]], 8
+; CHECK-NEXT:[[TMP3:%.*]] = add i64 [[INDEX]], 12
+; CHECK-NEXT:[[TMP4:%.*]] = udiv i32 [[BYTE_OFFSET:%.*]], 4
+; CHECK-NEXT:[[TMP5:%.*]] = udiv i32 [[BYTE_OFFSET]], 4
+; CHECK-NEXT:[[TMP6:%.*]] = udiv i32 [[BYTE_OFFSET]], 4
+; CHECK-NEXT:[[TMP7:%.*]] = udiv i32 [[BYTE_OFFSET]], 4
+; CHECK-NEXT:[[TMP8:%.*]] = getelementptr i32, i32* [[ADDR:%.*]], i32 
[[TMP4]]
+; CHECK-NEXT:[[TMP9:%.*]] = getelementptr i32, i32* [[ADDR]], i32 [[TMP5]]
+; CHECK-NEXT:[[TMP10:%.*]] = getelementptr i32, i32* [[ADDR]], i32 [[TMP6]]
+; CHECK-NEXT:[[TMP11:%.*]] = getelementptr i32, i32* [[ADDR]], i32 [[TMP7]]
+; CHECK-NEXT:[[TMP12:%.*]] = load i32, i32* [[TMP8]], align 4
+; CHECK-NEXT:[[TMP13:%.*]] = load i32, i32* [[TMP9]], align 4
+; CHECK-NEXT:[[TMP14:%.*]] = load i32, i32* [[TMP10]], align 4
+; CHECK-NEXT:[[TMP15:%.*]] = load i32, i32* [[TMP11]], align 4
+; CHECK-NEXT:[[INDEX_NEXT]] = add i64 [[INDEX]], 16
+; CHECK-NEXT:[[TMP16:%.*]] = icmp eq i64 [[INDEX_NEXT]], 4096
+; CHECK-NEXT:br i1 [[TMP16]], label [[MIDDLE_BLOCK:%.*]], label 
[[VECTOR_BODY]], [[LOOP6:!llvm.loop !.*]]
+; CHECK:   middle.block:
+; CHECK-NEXT:[[CMP_N:%.*]] = icmp eq i64 4097, 4096
+; CHECK-NEXT:br i1 [[CMP_N]], label [[LOOPEXIT:%.*]], label [[SCALAR_PH]]
+; CHECK:   scalar.ph:
+; CHECK-NEXT:[[BC_RESUME_VAL:%.*]] = phi i64 [ 4096, [[MIDDLE_BLOCK]] ], [ 
0, [[ENTRY:%.*]] ]
+; CHECK-NEXT:br label [[FOR_BODY:%.*]]
+; CHECK:   for.body:
+; CHECK-NEXT:[[IV:%.*]] = phi i64 [ [[IV_NEXT:%.*]], [[FOR_BODY]] ], [ 
[[BC_RESUME_VAL]], [[SCALAR_PH]] ]
+; CHECK-NEXT:[[OFFSET:%.*]] = udiv i32 [[BYTE_OFFSET]], 4
+; CHECK-NEXT:[[GEP:%.*]] = getelementptr i32, i32* [[ADDR]], i32 [[OFFSET]]
+; CHECK-NEXT:[[LOAD:%.*]] = load i32, i32* [[GEP]], align 4
+; CHECK-NEXT:[[IV_NEXT]] = add nuw nsw i64 [[IV]], 1
+; CHECK-NEXT:[[EXITCOND:%.*]] = icmp eq i64 [[IV]], 4096
+; CHECK-NEXT:br i1 [[EXITCOND]], label [[LOOPEXIT]], label [[FOR_BODY]], 
[[LOOP7:!llvm.loop !.*]]
+; CHECK:   loopexit:
+; CHECK-NEXT:[[LOAD_LCSSA:%.*]] = phi i32 [ [[LOAD]], [[FOR_BODY]] ], [ 
[[TMP15]], [[MIDDLE_BLOCK]] ]
+; CHECK-NEXT:ret i32 [[LOAD_LCSSA]]
+;
+entry:
+  br label %for.body
+
+for.body:
+  %iv = phi i64 [ %iv.next, %for.body ], [ 0, %entry ]
+  %offset = udiv i32 %byte_offset, 4
+  %gep = getelementptr i32, i32* %addr, i32 %offset
+  %load = load i32, i32* %gep
+  %iv.next = add nuw nsw i64 %iv, 1
+  %exitcond = icmp eq i64 %iv, 4096
+  br i1 %exitcond, label %loopexit, label %for.body
+
+loopexit:
+  ret i32 %load
+}
+
+
 
 define void @uniform_store_uniform_value(i32* align(4) %addr) {
 ; CHECK-LABEL: @uniform_store_uniform_value(
@@ -162,7 +224,7 @@ define void @uniform_store_uniform_value(i32* align(4) 
%addr) {
 ; CHECK-NEXT:store i32 0, i32* [[ADDR]], align 4
 ; CHECK-NEXT:[[INDEX_NEXT]] = add i64 [[INDEX]], 16
 ; CHECK-NEXT:[[TMP4:%.*]] = icmp eq i64 [[INDEX_NEXT]], 4096
-; CHECK-NEXT:br i1 [[TMP4]], label [[MIDDLE_BLOCK:%.*]], label 
[[VECTOR_BODY]], [[LOOP6:!llvm.loop !.*]]
+; CHECK-NEXT:br i1 [[TMP4]], label [[MIDDLE_BLOCK:%.*]], label 
[[VECTOR_BODY]], [[LOOP8:!llvm.loop !.*]]
 ; CHECK:   middle.block:
 ; CHECK-NEXT:[[CMP_N:%.*]] = icmp eq i64 4097, 4096
 ; CHECK-NEXT:br i1 [[CMP_N]], label [[LOOPEXIT:%.*]], label [[SCALAR_PH]]
@@ -174,7 +236,7 @@ define void @uniform_store_uniform_value(i32* align(4) 
%addr) {
 ; CHECK-NEXT:store i32 0, i32* [[ADDR]], align 4
 ; CHECK-NEXT:[[IV_NEXT]] = a

[llvm-branch-commits] [llvm] b3a8a15 - [LAA] Minor code style tweaks [NFC]

2020-11-24 Thread Philip Reames via llvm-branch-commits

Author: Philip Reames
Date: 2020-11-24T15:49:27-08:00
New Revision: b3a8a153433f65c419b891ae6763f458b33e9605

URL: 
https://github.com/llvm/llvm-project/commit/b3a8a153433f65c419b891ae6763f458b33e9605
DIFF: 
https://github.com/llvm/llvm-project/commit/b3a8a153433f65c419b891ae6763f458b33e9605.diff

LOG: [LAA] Minor code style tweaks [NFC]

Added: 


Modified: 
llvm/lib/Analysis/LoopAccessAnalysis.cpp

Removed: 




diff  --git a/llvm/lib/Analysis/LoopAccessAnalysis.cpp 
b/llvm/lib/Analysis/LoopAccessAnalysis.cpp
index 34de1a052ddf..0bffa7dbddec 100644
--- a/llvm/lib/Analysis/LoopAccessAnalysis.cpp
+++ b/llvm/lib/Analysis/LoopAccessAnalysis.cpp
@@ -149,27 +149,23 @@ const SCEV 
*llvm::replaceSymbolicStrideSCEV(PredicatedScalarEvolution &PSE,
   // symbolic stride replaced by one.
   ValueToValueMap::const_iterator SI =
   PtrToStride.find(OrigPtr ? OrigPtr : Ptr);
-  if (SI != PtrToStride.end()) {
-Value *StrideVal = SI->second;
+  if (SI == PtrToStride.end())
+// For a non-symbolic stride, just return the original expression.
+return OrigSCEV;
 
-// Strip casts.
-StrideVal = stripIntegerCast(StrideVal);
+  Value *StrideVal = stripIntegerCast(SI->second);
 
-ScalarEvolution *SE = PSE.getSE();
-const auto *U = cast<SCEVUnknown>(SE->getSCEV(StrideVal));
-const auto *CT =
-static_cast<const SCEVConstant *>(SE->getOne(StrideVal->getType()));
+  ScalarEvolution *SE = PSE.getSE();
+  const auto *U = cast<SCEVUnknown>(SE->getSCEV(StrideVal));
+  const auto *CT =
+static_cast<const SCEVConstant *>(SE->getOne(StrideVal->getType()));
 
-PSE.addPredicate(*SE->getEqualPredicate(U, CT));
-auto *Expr = PSE.getSCEV(Ptr);
+  PSE.addPredicate(*SE->getEqualPredicate(U, CT));
+  auto *Expr = PSE.getSCEV(Ptr);
 
-LLVM_DEBUG(dbgs() << "LAA: Replacing SCEV: " << *OrigSCEV
-  << " by: " << *Expr << "\n");
-return Expr;
-  }
-
-  // Otherwise, just return the SCEV of the original pointer.
-  return OrigSCEV;
+  LLVM_DEBUG(dbgs() << "LAA: Replacing SCEV: " << *OrigSCEV
+<< " by: " << *Expr << "\n");
+  return Expr;
 }
 
 RuntimeCheckingPtrGroup::RuntimeCheckingPtrGroup(
@@ -2150,12 +2146,8 @@ bool LoopAccessInfo::isUniform(Value *V) const {
 }
 
 void LoopAccessInfo::collectStridedAccess(Value *MemAccess) {
-  Value *Ptr = nullptr;
-  if (LoadInst *LI = dyn_cast<LoadInst>(MemAccess))
-Ptr = LI->getPointerOperand();
-  else if (StoreInst *SI = dyn_cast<StoreInst>(MemAccess))
-Ptr = SI->getPointerOperand();
-  else
+  Value *Ptr = getLoadStorePointerOperand(MemAccess);
+  if (!Ptr)
 return;
 
   Value *Stride = getStrideFromPointer(Ptr, PSE->getSE(), TheLoop);



___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] 10ddb92 - [SCEV] Use isa<> pattern for testing for CouldNotCompute [NFC]

2020-11-24 Thread Philip Reames via llvm-branch-commits

Author: Philip Reames
Date: 2020-11-24T18:47:49-08:00
New Revision: 10ddb927c1c3ee6af0436c23f93fe1da6de7b99a

URL: 
https://github.com/llvm/llvm-project/commit/10ddb927c1c3ee6af0436c23f93fe1da6de7b99a
DIFF: 
https://github.com/llvm/llvm-project/commit/10ddb927c1c3ee6af0436c23f93fe1da6de7b99a.diff

LOG: [SCEV] Use isa<> pattern for testing for CouldNotCompute [NFC]

Some older code - and code copied from older code - still directly tested 
against the singleton result of SE::getCouldNotCompute.  Using the 
isa<SCEVCouldNotCompute> form is both shorter and more readable.
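
For reference, the two spellings side by side (ExitCount stands in for any
const SCEV * result):

  // Old style: compare against the singleton node.
  if (ExitCount == SE->getCouldNotCompute())
    return false;
  // Preferred: query the kind directly.
  if (isa<SCEVCouldNotCompute>(ExitCount))
    return false;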

Added: 


Modified: 
llvm/lib/Analysis/LoopAccessAnalysis.cpp
llvm/lib/Transforms/Scalar/LoopInterchange.cpp
llvm/lib/Transforms/Scalar/LoopVersioningLICM.cpp
llvm/lib/Transforms/Scalar/PlaceSafepoints.cpp
llvm/lib/Transforms/Utils/ScalarEvolutionExpander.cpp
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

Removed: 




diff  --git a/llvm/lib/Analysis/LoopAccessAnalysis.cpp 
b/llvm/lib/Analysis/LoopAccessAnalysis.cpp
index 0bffa7dbddec..78f63c63cb40 100644
--- a/llvm/lib/Analysis/LoopAccessAnalysis.cpp
+++ b/llvm/lib/Analysis/LoopAccessAnalysis.cpp
@@ -1803,7 +1803,7 @@ bool LoopAccessInfo::canAnalyzeLoop() {
 
   // ScalarEvolution needs to be able to find the exit count.
   const SCEV *ExitCount = PSE->getBackedgeTakenCount();
-  if (ExitCount == PSE->getSE()->getCouldNotCompute()) {
+  if (isa<SCEVCouldNotCompute>(ExitCount)) {
 recordAnalysis("CantComputeNumberOfIterations")
 << "could not determine number of loop iterations";
 LLVM_DEBUG(dbgs() << "LAA: SCEV could not compute the loop exit count.\n");

diff  --git a/llvm/lib/Transforms/Scalar/LoopInterchange.cpp 
b/llvm/lib/Transforms/Scalar/LoopInterchange.cpp
index 81b7c3a8338a..f676ffc18e2d 100644
--- a/llvm/lib/Transforms/Scalar/LoopInterchange.cpp
+++ b/llvm/lib/Transforms/Scalar/LoopInterchange.cpp
@@ -452,7 +452,7 @@ struct LoopInterchange {
   bool isComputableLoopNest(LoopVector LoopList) {
 for (Loop *L : LoopList) {
   const SCEV *ExitCountOuter = SE->getBackedgeTakenCount(L);
-  if (ExitCountOuter == SE->getCouldNotCompute()) {
+  if (isa<SCEVCouldNotCompute>(ExitCountOuter)) {
 LLVM_DEBUG(dbgs() << "Couldn't compute backedge count\n");
 return false;
   }

diff  --git a/llvm/lib/Transforms/Scalar/LoopVersioningLICM.cpp 
b/llvm/lib/Transforms/Scalar/LoopVersioningLICM.cpp
index 3d0ce87047ad..2ff1e8480749 100644
--- a/llvm/lib/Transforms/Scalar/LoopVersioningLICM.cpp
+++ b/llvm/lib/Transforms/Scalar/LoopVersioningLICM.cpp
@@ -267,7 +267,7 @@ bool LoopVersioningLICM::legalLoopStructure() {
   // We need to be able to compute the loop trip count in order
   // to generate the bound checks.
   const SCEV *ExitCount = SE->getBackedgeTakenCount(CurLoop);
-  if (ExitCount == SE->getCouldNotCompute()) {
+  if (isa<SCEVCouldNotCompute>(ExitCount)) {
 LLVM_DEBUG(dbgs() << "loop does not has trip count\n");
 return false;
   }

diff  --git a/llvm/lib/Transforms/Scalar/PlaceSafepoints.cpp 
b/llvm/lib/Transforms/Scalar/PlaceSafepoints.cpp
index 4553b23532f2..ca114581a515 100644
--- a/llvm/lib/Transforms/Scalar/PlaceSafepoints.cpp
+++ b/llvm/lib/Transforms/Scalar/PlaceSafepoints.cpp
@@ -243,7 +243,7 @@ static bool mustBeFiniteCountedLoop(Loop *L, 
ScalarEvolution *SE,
 BasicBlock *Pred) {
   // A conservative bound on the loop as a whole.
   const SCEV *MaxTrips = SE->getConstantMaxBackedgeTakenCount(L);
-  if (MaxTrips != SE->getCouldNotCompute() &&
+  if (!isa<SCEVCouldNotCompute>(MaxTrips) &&
   SE->getUnsignedRange(MaxTrips).getUnsignedMax().isIntN(
   CountedLoopTripWidth))
 return true;
@@ -255,7 +255,7 @@ static bool mustBeFiniteCountedLoop(Loop *L, 
ScalarEvolution *SE,
 // This returns an exact expression only.  TODO: We really only need an
 // upper bound here, but SE doesn't expose that.
 const SCEV *MaxExec = SE->getExitCount(L, Pred);
-if (MaxExec != SE->getCouldNotCompute() &&
+if (!isa<SCEVCouldNotCompute>(MaxExec) &&
 SE->getUnsignedRange(MaxExec).getUnsignedMax().isIntN(
 CountedLoopTripWidth))
 return true;

diff  --git a/llvm/lib/Transforms/Utils/ScalarEvolutionExpander.cpp 
b/llvm/lib/Transforms/Utils/ScalarEvolutionExpander.cpp
index 877495be2dcd..c7e37fe0d1b3 100644
--- a/llvm/lib/Transforms/Utils/ScalarEvolutionExpander.cpp
+++ b/llvm/lib/Transforms/Utils/ScalarEvolutionExpander.cpp
@@ -2468,7 +2468,7 @@ Value *SCEVExpander::generateOverflowCheck(const 
SCEVAddRecExpr *AR,
   const SCEV *ExitCount =
   SE.getPredicatedBackedgeTakenCount(AR->getLoop(), Pred);
 
-  assert(ExitCount != SE.getCouldNotCompute() && "Invalid loop count");
+  assert(!isa(ExitCount) && "Invalid loop count");
 
   const SCEV *Step = AR->getStepRecurrence(SE);
   const SCEV *Start = AR->getStart();

diff  --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp 
b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index af314ae4b27b..e29a0a8b

[llvm-branch-commits] [llvm] d93b8ac - [BasicAA] Add print routines to DecomposedGEP for ease of debugging

2020-12-03 Thread Philip Reames via llvm-branch-commits

Author: Philip Reames
Date: 2020-12-03T12:43:39-08:00
New Revision: d93b8acd0949f65de5e7360c79f04a98a66cbd9d

URL: 
https://github.com/llvm/llvm-project/commit/d93b8acd0949f65de5e7360c79f04a98a66cbd9d
DIFF: 
https://github.com/llvm/llvm-project/commit/d93b8acd0949f65de5e7360c79f04a98a66cbd9d.diff

LOG: [BasicAA] Add print routines to DecomposedGEP for ease of debugging

Added: 


Modified: 
llvm/include/llvm/Analysis/BasicAliasAnalysis.h

Removed: 




diff  --git a/llvm/include/llvm/Analysis/BasicAliasAnalysis.h 
b/llvm/include/llvm/Analysis/BasicAliasAnalysis.h
index 7f3cbba0b6af..e59fd6919f66 100644
--- a/llvm/include/llvm/Analysis/BasicAliasAnalysis.h
+++ b/llvm/include/llvm/Analysis/BasicAliasAnalysis.h
@@ -126,6 +126,14 @@ class BasicAAResult : public AAResultBase<BasicAAResult> {
 bool operator!=(const VariableGEPIndex &Other) const {
   return !operator==(Other);
 }
+
+void dump() const { print(dbgs()); }
+void print(raw_ostream &OS) const {
+  OS << "(V=" << V->getName()
+<< ", zextbits=" << ZExtBits
+<< ", sextbits=" << SExtBits
+<< ", scale=" << Scale << ")";
+}
   };
 
   // Represents the internal structure of a GEP, decomposed into a base 
pointer,
@@ -139,6 +139,20 @@ class BasicAAResult : public AAResultBase<BasicAAResult> {
SmallVector<VariableGEPIndex, 4> VarIndices;
 // Is GEP index scale compile-time constant.
 bool HasCompileTimeConstantScale;
+
+void dump() const { print(dbgs()); }
+void print(raw_ostream &OS) const {
+  OS << "(DecomposedGEP Base=" << Base->getName()
+<< ", Offset=" << Offset
+<< ", VarIndices=[" << Offset;
+  for (size_t i = 0; i < VarIndices.size(); i++) {
+   if (i != 0)
+ OS << ", ";
+   VarIndices[i].print(OS);
+  }
+  OS << "], HasCompileTimeConstantScale=" << HasCompileTimeConstantScale
+<< ")";
+}
   };
 
   /// Tracks phi nodes we have visited.



___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] 17b195b - [BasicAA] Minor formatting improvements for printers

2020-12-03 Thread Philip Reames via llvm-branch-commits

Author: Philip Reames
Date: 2020-12-03T13:08:56-08:00
New Revision: 17b195b632a780adf637432beda63c91eea2c106

URL: 
https://github.com/llvm/llvm-project/commit/17b195b632a780adf637432beda63c91eea2c106
DIFF: 
https://github.com/llvm/llvm-project/commit/17b195b632a780adf637432beda63c91eea2c106.diff

LOG: [BasicAA] Minor formatting improvements for printers

Added: 


Modified: 
llvm/include/llvm/Analysis/BasicAliasAnalysis.h

Removed: 




diff  --git a/llvm/include/llvm/Analysis/BasicAliasAnalysis.h 
b/llvm/include/llvm/Analysis/BasicAliasAnalysis.h
index e59fd6919f66..4a149387eb74 100644
--- a/llvm/include/llvm/Analysis/BasicAliasAnalysis.h
+++ b/llvm/include/llvm/Analysis/BasicAliasAnalysis.h
@@ -132,7 +132,7 @@ class BasicAAResult : public AAResultBase<BasicAAResult> {
   OS << "(V=" << V->getName()
 << ", zextbits=" << ZExtBits
 << ", sextbits=" << SExtBits
-<< ", scale=" << Scale << ")";
+<< ", scale=" << Scale << ")\n";
 }
   };
 
@@ -152,14 +152,14 @@ class BasicAAResult : public AAResultBase<BasicAAResult> {
 void print(raw_ostream &OS) const {
   OS << "(DecomposedGEP Base=" << Base->getName()
 << ", Offset=" << Offset
-<< ", VarIndices=[" << Offset;
+<< ", VarIndices=[";
   for (size_t i = 0; i < VarIndices.size(); i++) {
if (i != 0)
  OS << ", ";
VarIndices[i].print(OS);
   }
   OS << "], HasCompileTimeConstantScale=" << HasCompileTimeConstantScale
-<< ")";
+<< ")\n";
 }
   };
 



___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] 55db6ec - [BasicAA] Move newline to dump from printer

2020-12-03 Thread Philip Reames via llvm-branch-commits

Author: Philip Reames
Date: 2020-12-03T14:35:43-08:00
New Revision: 55db6ec1cc20d32ad179e0059aafcc545125fca6

URL: 
https://github.com/llvm/llvm-project/commit/55db6ec1cc20d32ad179e0059aafcc545125fca6
DIFF: 
https://github.com/llvm/llvm-project/commit/55db6ec1cc20d32ad179e0059aafcc545125fca6.diff

LOG: [BasicAA] Move newline to dump from printer

Added: 


Modified: 
llvm/include/llvm/Analysis/BasicAliasAnalysis.h

Removed: 




diff  --git a/llvm/include/llvm/Analysis/BasicAliasAnalysis.h 
b/llvm/include/llvm/Analysis/BasicAliasAnalysis.h
index 4a149387eb74..d9a174951695 100644
--- a/llvm/include/llvm/Analysis/BasicAliasAnalysis.h
+++ b/llvm/include/llvm/Analysis/BasicAliasAnalysis.h
@@ -127,12 +127,15 @@ class BasicAAResult : public AAResultBase<BasicAAResult> {
   return !operator==(Other);
 }
 
-void dump() const { print(dbgs()); }
+void dump() const {
+  print(dbgs());
+  dbgs() << "\n";
+}
 void print(raw_ostream &OS) const {
   OS << "(V=" << V->getName()
 << ", zextbits=" << ZExtBits
 << ", sextbits=" << SExtBits
-<< ", scale=" << Scale << ")\n";
+<< ", scale=" << Scale << ")";
 }
   };
 
@@ -148,7 +148,10 @@ class BasicAAResult : public AAResultBase<BasicAAResult> {
 // Is GEP index scale compile-time constant.
 bool HasCompileTimeConstantScale;
 
-void dump() const { print(dbgs()); }
+void dump() const {
+  print(dbgs());
+  dbgs() << "\n";
+}
 void print(raw_ostream &OS) const {
   OS << "(DecomposedGEP Base=" << Base->getName()
 << ", Offset=" << Offset
@@ -159,7 +159,7 @@ class BasicAAResult : public AAResultBase<BasicAAResult> {
VarIndices[i].print(OS);
   }
   OS << "], HasCompileTimeConstantScale=" << HasCompileTimeConstantScale
-<< ")\n";
+<< ")";
 }
   };
 



___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] 0c866a3 - [LoopVec] Support non-instructions as argument to uniform mem ops

2020-12-03 Thread Philip Reames via llvm-branch-commits

Author: Philip Reames
Date: 2020-12-03T14:51:44-08:00
New Revision: 0c866a3d6aa492b01c29a2c582c56c0fd75c2970

URL: 
https://github.com/llvm/llvm-project/commit/0c866a3d6aa492b01c29a2c582c56c0fd75c2970
DIFF: 
https://github.com/llvm/llvm-project/commit/0c866a3d6aa492b01c29a2c582c56c0fd75c2970.diff

LOG: [LoopVec] Support non-instructions as argument to uniform mem ops

The initial step of the uniform-after-vectorization (lane-0 demanded only) 
analysis was very awkwardly written. It would revisit the use list of each 
pointer operand of a widened load/store. As a result, it was in the worst case 
operand of a widened load/store. As a result, it was in the worst case O(N^2) 
where N was the number of instructions in a loop, and had restricted operand 
Value types to reduce the size of use lists.

This patch replaces the original algorithm with one which is at most O(2N) in 
the number of instructions in the loop. (The key observation is that each use 
of a potentially interesting pointer is visited at most twice, once on first 
scan, once in the use list of *its* operand. Only instructions within the loop 
have their uses scanned.)
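
A minimal sketch of the new shape (isUniformMemOpUse is a stand-in name of
mine; the real patch appears below):

  // First pass: record pointers with at least one lane-0-only use.
  SmallPtrSet<Value *, 8> HasUniformUse;
  for (BasicBlock *BB : TheLoop->blocks())
    for (Instruction &I : *BB)
      if (Value *Ptr = getLoadStorePointerOperand(&I))
        if (isUniformMemOpUse(&I))
          HasUniformUse.insert(Ptr);
  // Second pass: grow the uniform worklist from those seeds, visiting each
  // candidate's operands once, so every use is touched at most twice.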

In the process, we remove a restriction which required the operand of the 
uniform mem op to itself be an instruction.  This allows detection of uniform 
mem ops involving global addresses.

Differential Revision: https://reviews.llvm.org/D92056

Added: 


Modified: 
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
llvm/test/Transforms/LoopVectorize/X86/cost-model-assert.ll
llvm/test/Transforms/LoopVectorize/X86/uniform_mem_op.ll
llvm/test/Transforms/LoopVectorize/pr44488-predication.ll

Removed: 




diff  --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp 
b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index daa100ebe8cd..8c02be8530be 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -5252,24 +5252,13 @@ void 
LoopVectorizationCostModel::collectLoopUniforms(ElementCount VF) {
   if (Cmp && TheLoop->contains(Cmp) && Cmp->hasOneUse())
 addToWorklistIfAllowed(Cmp);
 
-  // Holds consecutive and consecutive-like pointers. Consecutive-like pointers
-  // are pointers that are treated like consecutive pointers during
-  // vectorization. The pointer operands of interleaved accesses are an
-  // example.
-  SmallSetVector<Value *, 8> ConsecutiveLikePtrs;
-
-  // Holds pointer operands of instructions that are possibly non-uniform.
-  SmallPtrSet<Value *, 8> PossibleNonUniformPtrs;
-
   auto isUniformDecision = [&](Instruction *I, ElementCount VF) {
 InstWidening WideningDecision = getWideningDecision(I, VF);
 assert(WideningDecision != CM_Unknown &&
"Widening decision should be ready at this moment");
 
-// The address of a uniform mem op is itself uniform.  We exclude stores
-// here as there's an assumption in the current code that all uses of
-// uniform instructions are uniform and, as noted below, uniform stores are
-// still handled via replication (i.e. aren't uniform after vectorization).
+// A uniform memory op is itself uniform.  We exclude uniform stores
+// here as they demand the last lane, not the first one.
 if (isa<LoadInst>(I) && Legal->isUniformMemOp(*I)) {
   assert(WideningDecision == CM_Scalarize);
   return true;
@@ -5287,14 +5276,15 @@ void 
LoopVectorizationCostModel::collectLoopUniforms(ElementCount VF) {
 return getLoadStorePointerOperand(I) == Ptr && isUniformDecision(I, VF);
   };
   
-  // Iterate over the instructions in the loop, and collect all
-  // consecutive-like pointer operands in ConsecutiveLikePtrs. If it's possible
-  // that a consecutive-like pointer operand will be scalarized, we collect it
-  // in PossibleNonUniformPtrs instead. We use two sets here because a single
-  // getelementptr instruction can be used by both vectorized and scalarized
-  // memory instructions. For example, if a loop loads and stores from the same
-  // location, but the store is conditional, the store will be scalarized, and
-  // the getelementptr won't remain uniform.
+  // Holds a list of values which are known to have at least one uniform use.
+  // Note that there may be other uses which aren't uniform.  A "uniform use"
+  // here is something which only demands lane 0 of the unrolled iterations;
+  // it does not imply that all lanes produce the same value (e.g. this is not 
+  // the usual meaning of uniform)
+  SmallPtrSet<Value *, 8> HasUniformUse;
+
+  // Scan the loop for instructions which are either a) known to have only
+  // lane 0 demanded or b) are uses which demand only lane 0 of their operand.
   for (auto *BB : TheLoop->blocks())
 for (auto &I : *BB) {
   // If there's no pointer operand, there's nothing to do.
@@ -5302,45 +5292,31 @@ void 
LoopVectorizationCostModel::collectLoopUniforms(ElementCount VF) {
   if (!Ptr)
 continue;
 
-  // For now, avoid walking use lists in other functions.
-  // 

[llvm-branch-commits] [llvm] 0129cd5 - Use deref facts derived from minimum object size of allocations

2020-12-03 Thread Philip Reames via llvm-branch-commits

Author: Philip Reames
Date: 2020-12-03T15:01:14-08:00
New Revision: 0129cd503575076556935a16f458b0a3c2e30646

URL: 
https://github.com/llvm/llvm-project/commit/0129cd503575076556935a16f458b0a3c2e30646
DIFF: 
https://github.com/llvm/llvm-project/commit/0129cd503575076556935a16f458b0a3c2e30646.diff

LOG: Use deref facts derived from minimum object size of allocations

This change should be fairly straightforward. If we've reached a call, check 
to see if we can tell the result is dereferenceable from information about the 
minimum object size returned by the call.

To control compile time impact, I'm only adding the call for base facts in the 
routine. getObjectSize can also do recursive reasoning, and we don't want that 
general capability here.

As a follow up patch (without separate review), I will plumb through the 
missing TLI parameter. That will have the effect of extending this to known 
libcalls - malloc, new, and the like - whereas currently this only covers calls 
with the explicit allocsize attribute.

Differential Revision: https://reviews.llvm.org/D90341

Added: 


Modified: 
llvm/lib/Analysis/Loads.cpp
llvm/test/Transforms/LoopVectorize/X86/load-deref-pred.ll

Removed: 




diff  --git a/llvm/lib/Analysis/Loads.cpp b/llvm/lib/Analysis/Loads.cpp
index 2ca35a4344ec..8f373f70f216 100644
--- a/llvm/lib/Analysis/Loads.cpp
+++ b/llvm/lib/Analysis/Loads.cpp
@@ -12,7 +12,9 @@
 
 #include "llvm/Analysis/Loads.h"
 #include "llvm/Analysis/AliasAnalysis.h"
+#include "llvm/Analysis/CaptureTracking.h"
 #include "llvm/Analysis/LoopInfo.h"
+#include "llvm/Analysis/MemoryBuiltins.h"
 #include "llvm/Analysis/ScalarEvolution.h"
 #include "llvm/Analysis/ScalarEvolutionExpressions.h"
 #include "llvm/Analysis/ValueTracking.h"
@@ -107,11 +109,50 @@ static bool isDereferenceableAndAlignedPointer(
 return isDereferenceableAndAlignedPointer(ASC->getOperand(0), Alignment,
   Size, DL, CtxI, DT, Visited, 
MaxDepth);
 
-  if (const auto *Call = dyn_cast(V))
+  if (const auto *Call = dyn_cast(V)) {
 if (auto *RP = getArgumentAliasingToReturnedPointer(Call, true))
   return isDereferenceableAndAlignedPointer(RP, Alignment, Size, DL, CtxI,
 DT, Visited, MaxDepth);
 
+// If we have a call we can't recurse through, check to see if this is an
+// allocation function for which we can establish a minimum object size.
+// Such a minimum object size is analogous to a deref_or_null attribute in
+// that we still need to prove the result non-null at point of use.
+// NOTE: We can only use the object size as a base fact as we a) need to
+// prove alignment too, and b) don't want the compile time impact of a
+// separate recursive walk.
+ObjectSizeOpts Opts;
+// TODO: It may be okay to round to align, but that would imply that
+// accessing slightly out of bounds was legal, and we're currently
+// inconsistent about that.  For the moment, be conservative.
+Opts.RoundToAlign = false;
+Opts.NullIsUnknownSize = true;
+uint64_t ObjSize;
+// TODO: Plumb through TLI so that malloc routines and such work.
+if (getObjectSize(V, ObjSize, DL, nullptr, Opts)) {
+  APInt KnownDerefBytes(Size.getBitWidth(), ObjSize);
+  if (KnownDerefBytes.getBoolValue() && KnownDerefBytes.uge(Size) &&
+  isKnownNonZero(V, DL, 0, nullptr, CtxI, DT) &&
+  // TODO: We're currently inconsistent about whether deref(N) is a
+  // global fact or a point in time fact.  Once D61652 eventually
+  // lands, this check will be restricted to the point in time
+  // variant. For that variant, we need to prove that object hasn't
+  // been conditionally freed before the context instruction - if it has, we
+  // might be hoisting over the inverse conditional and creating a
+  // dynamic use after free. 
+  !PointerMayBeCapturedBefore(V, true, true, CtxI, DT, true)) {
+// As we recursed through GEPs to get here, we've incrementally
+// checked that each step advanced by a multiple of the alignment. If
+// our base is properly aligned, then the original offset accessed
+// must also be. 
+Type *Ty = V->getType();
+assert(Ty->isSized() && "must be sized");
+APInt Offset(DL.getTypeStoreSizeInBits(Ty), 0);
+return isAligned(V, Offset, Alignment, DL);
+  }
+}
+  }
+
   // If we don't know, assume the worst.
   return false;
 }

diff  --git a/llvm/test/Transforms/LoopVectorize/X86/load-deref-pred.ll 
b/llvm/test/Transforms/LoopVectorize/X86/load-deref-pred.ll
index 7937c71b7705..167285707e02 100644
--- a/llvm/test/Transforms/LoopVectorize/X86/load-deref-pred.ll
+++ b/llvm/test/Transforms/LoopVectorize/X86/load-deref-pred.ll
@@ -8,7 +8,7 @@ target triple = "x86_64-unknown-linux-

[llvm-branch-commits] [llvm] 99f79cb - [test] precommit test for D92698

2020-12-04 Thread Philip Reames via llvm-branch-commits

Author: Philip Reames
Date: 2020-12-04T15:17:39-08:00
New Revision: 99f79cbf31cc6ccdfa1aed253a64c5e8012f4ef7

URL: 
https://github.com/llvm/llvm-project/commit/99f79cbf31cc6ccdfa1aed253a64c5e8012f4ef7
DIFF: 
https://github.com/llvm/llvm-project/commit/99f79cbf31cc6ccdfa1aed253a64c5e8012f4ef7.diff

LOG: [test] precommit test for D92698

Added: 


Modified: 
llvm/test/Analysis/ValueTracking/known-non-equal.ll

Removed: 




diff  --git a/llvm/test/Analysis/ValueTracking/known-non-equal.ll 
b/llvm/test/Analysis/ValueTracking/known-non-equal.ll
index d28b3f4f63a3..ae2251b97ac4 100644
--- a/llvm/test/Analysis/ValueTracking/known-non-equal.ll
+++ b/llvm/test/Analysis/ValueTracking/known-non-equal.ll
@@ -1,20 +1,140 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
 ; RUN: opt -instsimplify < %s -S | FileCheck %s
 
-; CHECK: define i1 @test
 define i1 @test(i8* %pq, i8 %B) {
+; CHECK-LABEL: @test(
+; CHECK-NEXT:ret i1 false
+;
   %q = load i8, i8* %pq, !range !0 ; %q is known nonzero; no known bits
   %A = add nsw i8 %B, %q
   %cmp = icmp eq i8 %A, %B
-  ; CHECK: ret i1 false
   ret i1 %cmp
 }
 
-; CHECK: define i1 @test2
 define i1 @test2(i8 %a, i8 %b) {
+; CHECK-LABEL: @test2(
+; CHECK-NEXT:ret i1 false
+;
   %A = or i8 %a, 2; %A[1] = 1
   %B = and i8 %b, -3  ; %B[1] = 0
   %cmp = icmp eq i8 %A, %B ; %A[1] and %B[1] are contradictory.
-  ; CHECK: ret i1 false
+  ret i1 %cmp
+}
+
+define i1 @test3(i8 %B) {
+; CHECK-LABEL: @test3(
+; CHECK-NEXT:ret i1 false
+;
+  %A = add nsw i8 %B, 1
+  %cmp = icmp eq i8 %A, %B
+  ret i1 %cmp
+}
+
+define i1 @sext(i8 %B) {
+; CHECK-LABEL: @sext(
+; CHECK-NEXT:ret i1 false
+;
+  %A = add nsw i8 %B, 1
+  %A.cast = sext i8 %A to i32
+  %B.cast = sext i8 %B to i32
+  %cmp = icmp eq i32 %A.cast, %B.cast
+  ret i1 %cmp
+}
+
+define i1 @zext(i8 %B) {
+; CHECK-LABEL: @zext(
+; CHECK-NEXT:ret i1 false
+;
+  %A = add nsw i8 %B, 1
+  %A.cast = zext i8 %A to i32
+  %B.cast = zext i8 %B to i32
+  %cmp = icmp eq i32 %A.cast, %B.cast
+  ret i1 %cmp
+}
+
+define i1 @inttoptr(i32 %B) {
+; CHECK-LABEL: @inttoptr(
+; CHECK-NEXT:[[A:%.*]] = add nsw i32 [[B:%.*]], 1
+; CHECK-NEXT:[[A_CAST:%.*]] = inttoptr i32 [[A]] to i8*
+; CHECK-NEXT:[[B_CAST:%.*]] = inttoptr i32 [[B]] to i8*
+; CHECK-NEXT:[[CMP:%.*]] = icmp eq i8* [[A_CAST]], [[B_CAST]]
+; CHECK-NEXT:ret i1 [[CMP]]
+;
+  %A = add nsw i32 %B, 1
+  %A.cast = inttoptr i32 %A to i8*
+  %B.cast = inttoptr i32 %B to i8*
+  %cmp = icmp eq i8* %A.cast, %B.cast
+  ret i1 %cmp
+}
+
+define i1 @ptrtoint(i32* %B) {
+; CHECK-LABEL: @ptrtoint(
+; CHECK-NEXT:[[A:%.*]] = getelementptr inbounds i32, i32* [[B:%.*]], i32 1
+; CHECK-NEXT:[[A_CAST:%.*]] = ptrtoint i32* [[A]] to i32
+; CHECK-NEXT:[[B_CAST:%.*]] = ptrtoint i32* [[B]] to i32
+; CHECK-NEXT:[[CMP:%.*]] = icmp eq i32 [[A_CAST]], [[B_CAST]]
+; CHECK-NEXT:ret i1 [[CMP]]
+;
+  %A = getelementptr inbounds i32, i32* %B, i32 1
+  %A.cast = ptrtoint i32* %A to i32
+  %B.cast = ptrtoint i32* %B to i32
+  %cmp = icmp eq i32 %A.cast, %B.cast
+  ret i1 %cmp
+}
+
+define i1 @add1(i8 %B, i8 %C) {
+; CHECK-LABEL: @add1(
+; CHECK-NEXT:ret i1 false
+;
+  %A = add i8 %B, 1
+  %A.op = add i8 %A, %C
+  %B.op = add i8 %B, %C
+
+  %cmp = icmp eq i8 %A.op, %B.op
+  ret i1 %cmp
+}
+
+define i1 @add2(i8 %B, i8 %C) {
+; CHECK-LABEL: @add2(
+; CHECK-NEXT:ret i1 false
+;
+  %A = add i8 %B, 1
+  %A.op = add i8 %C, %A
+  %B.op = add i8 %C, %B
+
+  %cmp = icmp eq i8 %A.op, %B.op
+  ret i1 %cmp
+}
+
+define i1 @sub1(i8 %B, i8 %C) {
+; CHECK-LABEL: @sub1(
+; CHECK-NEXT:[[A:%.*]] = add i8 [[B:%.*]], 1
+; CHECK-NEXT:[[A_OP:%.*]] = sub i8 [[A]], [[C:%.*]]
+; CHECK-NEXT:[[B_OP:%.*]] = sub i8 [[B]], [[C]]
+; CHECK-NEXT:[[CMP:%.*]] = icmp eq i8 [[A_OP]], [[B_OP]]
+; CHECK-NEXT:ret i1 [[CMP]]
+;
+  %A = add i8 %B, 1
+  %A.op = sub i8 %A, %C
+  %B.op = sub i8 %B, %C
+
+  %cmp = icmp eq i8 %A.op, %B.op
+  ret i1 %cmp
+}
+
+define i1 @sub2(i8 %B, i8 %C) {
+; CHECK-LABEL: @sub2(
+; CHECK-NEXT:[[A:%.*]] = add i8 [[B:%.*]], 1
+; CHECK-NEXT:[[A_OP:%.*]] = sub i8 [[C:%.*]], [[A]]
+; CHECK-NEXT:[[B_OP:%.*]] = sub i8 [[C]], [[B]]
+; CHECK-NEXT:[[CMP:%.*]] = icmp eq i8 [[A_OP]], [[B_OP]]
+; CHECK-NEXT:ret i1 [[CMP]]
+;
+  %A = add i8 %B, 1
+  %A.op = sub i8 %C, %A
+  %B.op = sub i8 %C, %B
+
+  %cmp = icmp eq i8 %A.op, %B.op
   ret i1 %cmp
 }
 



___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] bfda694 - [BasicAA] Fix a bug with relational reasoning across iterations

2020-12-05 Thread Philip Reames via llvm-branch-commits

Author: Philip Reames
Date: 2020-12-05T14:10:21-08:00
New Revision: bfda69416c6d0a76b40644b1b0cbc1cbca254a61

URL: 
https://github.com/llvm/llvm-project/commit/bfda69416c6d0a76b40644b1b0cbc1cbca254a61
DIFF: 
https://github.com/llvm/llvm-project/commit/bfda69416c6d0a76b40644b1b0cbc1cbca254a61.diff

LOG: [BasicAA] Fix a bug with relational reasoning across iterations

Due to the recursion through phis basicaa does, the code needs to be extremely 
careful not to reason about equality between values which might represent 
distinct iterations. I'm generally skeptical of the correctness of the whole 
scheme, but this particular patch fixes one particular instance which is 
demonstrably incorrect.

Interestingly, this appears to be the second attempted fix for the same issue. 
The former fix is incomplete and doesn't address the actual issue.

Differential Revision: https://reviews.llvm.org/D92694

Added: 


Modified: 
llvm/include/llvm/Analysis/BasicAliasAnalysis.h
llvm/lib/Analysis/BasicAliasAnalysis.cpp
llvm/test/Analysis/BasicAA/phi-aa.ll

Removed: 




diff  --git a/llvm/include/llvm/Analysis/BasicAliasAnalysis.h 
b/llvm/include/llvm/Analysis/BasicAliasAnalysis.h
index d9a174951695..eedecd2a4381 100644
--- a/llvm/include/llvm/Analysis/BasicAliasAnalysis.h
+++ b/llvm/include/llvm/Analysis/BasicAliasAnalysis.h
@@ -202,6 +202,12 @@ class BasicAAResult : public AAResultBase<BasicAAResult> {
   const DecomposedGEP &DecompGEP, const DecomposedGEP &DecompObject,
   LocationSize ObjectAccessSize);
 
+  AliasResult aliasSameBasePointerGEPs(const GEPOperator *GEP1,
+   LocationSize MaybeV1Size,
+   const GEPOperator *GEP2,
+   LocationSize MaybeV2Size,
+   const DataLayout &DL);
+
   /// A Heuristic for aliasGEP that searches for a constant offset
   /// between the variables.
   ///

diff  --git a/llvm/lib/Analysis/BasicAliasAnalysis.cpp 
b/llvm/lib/Analysis/BasicAliasAnalysis.cpp
index 2fb353eabb6e..5e611a9e193c 100644
--- a/llvm/lib/Analysis/BasicAliasAnalysis.cpp
+++ b/llvm/lib/Analysis/BasicAliasAnalysis.cpp
@@ -1032,11 +1032,11 @@ ModRefInfo BasicAAResult::getModRefInfo(const CallBase 
*Call1,
 
 /// Provide ad-hoc rules to disambiguate accesses through two GEP operators,
 /// both having the exact same pointer operand.
-static AliasResult aliasSameBasePointerGEPs(const GEPOperator *GEP1,
-LocationSize MaybeV1Size,
-const GEPOperator *GEP2,
-LocationSize MaybeV2Size,
-const DataLayout &DL) {
+AliasResult BasicAAResult::aliasSameBasePointerGEPs(const GEPOperator *GEP1,
+LocationSize MaybeV1Size,
+const GEPOperator *GEP2,
+LocationSize MaybeV2Size,
+const DataLayout &DL) {
   assert(GEP1->getPointerOperand()->stripPointerCastsAndInvariantGroups() ==
  GEP2->getPointerOperand()->stripPointerCastsAndInvariantGroups() 
&&
  GEP1->getPointerOperandType() == GEP2->getPointerOperandType() &&
@@ -1126,24 +1126,12 @@ static AliasResult aliasSameBasePointerGEPs(const 
GEPOperator *GEP1,
 if (C1 && C2)
   return NoAlias;
 {
+  // If we're not potentially reasoning about values from 
diff erent
+  // iterations, see if we can prove them inequal.
   Value *GEP1LastIdx = GEP1->getOperand(GEP1->getNumOperands() - 1);
   Value *GEP2LastIdx = GEP2->getOperand(GEP2->getNumOperands() - 1);
-  if (isa<PHINode>(GEP1LastIdx) || isa<PHINode>(GEP2LastIdx)) {
-// If one of the indices is a PHI node, be safe and only use
-// computeKnownBits so we don't make any assumptions about the
-// relationships between the two indices. This is important if we're
-// asking about values from different loop iterations. See PR32314.
-// TODO: We may be able to change the check so we only do this when
-// we definitely looked through a PHINode.
-if (GEP1LastIdx != GEP2LastIdx &&
-GEP1LastIdx->getType() == GEP2LastIdx->getType()) {
-  KnownBits Known1 = computeKnownBits(GEP1LastIdx, DL);
-  KnownBits Known2 = computeKnownBits(GEP2LastIdx, DL);
-  if (Known1.Zero.intersects(Known2.One) ||
-  Known1.One.intersects(Known2.Zero))
-return NoAlias;
-}
-  } else if (isKnownNonEqual(GEP1LastIdx, GEP2LastIdx, DL))
+  if (VisitedPhiBBs.empty() &&
+  isKnownNonEqual(GEP1LastIdx, GEP2LastIdx, DL))
 return NoAlias;
 }
   }

diff  --git a/llvm/test/Analysis/BasicAA/phi-

[llvm-branch-commits] [llvm] 8f07629 - Add recursive decomposition reasoning to isKnownNonEqual

2020-12-05 Thread Philip Reames via llvm-branch-commits

Author: Philip Reames
Date: 2020-12-05T15:58:19-08:00
New Revision: 8f076291be41467560ebf73738561225d2b67206

URL: 
https://github.com/llvm/llvm-project/commit/8f076291be41467560ebf73738561225d2b67206
DIFF: 
https://github.com/llvm/llvm-project/commit/8f076291be41467560ebf73738561225d2b67206.diff

LOG: Add recursive decomposition reasoning to isKnownNonEqual

The basic idea is that by looking through operand instructions which don't 
change the equality result, we can push the existing known-bits comparison 
down past instructions which would otherwise obscure it.
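
Concretely (my own illustration, not from the patch): known bits prove
V + 1 != V, and sext is 1-to-1, so the comparison can be pushed through the
casts.

  // IRBuilder sketch: A2 vs B2 reduces to (V + 1) vs V once the sexts
  // are looked through, which the known-bits check can refute.
  Value *A  = IRB.CreateNSWAdd(V, ConstantInt::get(V->getType(), 1));
  Value *A2 = IRB.CreateSExt(A, IRB.getInt32Ty());
  Value *B2 = IRB.CreateSExt(V, IRB.getInt32Ty());
  // isKnownNonEqual(A2, B2, ...) can now return true via the recursion.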

We have analogous handling in InstSimplify for most - though weirdly not all - 
of these cases starting from an icmp root. It's a bit unfortunate to duplicate 
logic, but since my actual goal is to extend BasicAA, the icmp logic doesn't 
help. (And just makes it hard to test here.)  The BasicAA change will be posted 
separately for review.

Differential Revision: https://reviews.llvm.org/D92698

Added: 


Modified: 
llvm/lib/Analysis/ValueTracking.cpp
llvm/test/Analysis/ValueTracking/known-non-equal.ll

Removed: 




diff  --git a/llvm/lib/Analysis/ValueTracking.cpp 
b/llvm/lib/Analysis/ValueTracking.cpp
index 32e0ca321dec..a1bb6e2eea78 100644
--- a/llvm/lib/Analysis/ValueTracking.cpp
+++ b/llvm/lib/Analysis/ValueTracking.cpp
@@ -350,13 +350,14 @@ bool llvm::isKnownNegative(const Value *V, const 
DataLayout &DL, unsigned Depth,
   return Known.isNegative();
 }
 
-static bool isKnownNonEqual(const Value *V1, const Value *V2, const Query &Q);
+static bool isKnownNonEqual(const Value *V1, const Value *V2, unsigned Depth,
+const Query &Q);
 
 bool llvm::isKnownNonEqual(const Value *V1, const Value *V2,
const DataLayout &DL, AssumptionCache *AC,
const Instruction *CxtI, const DominatorTree *DT,
bool UseInstrInfo) {
-  return ::isKnownNonEqual(V1, V2,
+  return ::isKnownNonEqual(V1, V2, 0,
Query(DL, AC, safeCxtI(V1, safeCxtI(V2, CxtI)), DT,
  UseInstrInfo, /*ORE=*/nullptr));
 }
@@ -2486,7 +2487,8 @@ bool isKnownNonZero(const Value* V, unsigned Depth, const 
Query& Q) {
 }
 
 /// Return true if V2 == V1 + X, where X is known non-zero.
-static bool isAddOfNonZero(const Value *V1, const Value *V2, const Query &Q) {
+static bool isAddOfNonZero(const Value *V1, const Value *V2, unsigned Depth,
+   const Query &Q) {
   const BinaryOperator *BO = dyn_cast<BinaryOperator>(V1);
   if (!BO || BO->getOpcode() != Instruction::Add)
 return false;
@@ -2497,24 +2499,54 @@ static bool isAddOfNonZero(const Value *V1, const Value 
*V2, const Query &Q) {
 Op = BO->getOperand(0);
   else
 return false;
-  return isKnownNonZero(Op, 0, Q);
+  return isKnownNonZero(Op, Depth + 1, Q);
 }
 
 /// Return true if it is known that V1 != V2.
-static bool isKnownNonEqual(const Value *V1, const Value *V2, const Query &Q) {
+static bool isKnownNonEqual(const Value *V1, const Value *V2, unsigned Depth,
+const Query &Q) {
   if (V1 == V2)
 return false;
   if (V1->getType() != V2->getType())
 // We can't look through casts yet.
 return false;
-  if (isAddOfNonZero(V1, V2, Q) || isAddOfNonZero(V2, V1, Q))
+
+  if (Depth >= MaxAnalysisRecursionDepth)
+return false;
+
+  // See if we can recurse through (exactly one of) our operands.
+  auto *O1 = dyn_cast<Operator>(V1);
+  auto *O2 = dyn_cast<Operator>(V2);
+  if (O1 && O2 && O1->getOpcode() == O2->getOpcode()) {
+switch (O1->getOpcode()) {
+default: break;
+case Instruction::Add:
+case Instruction::Sub:
+  // Assume operand order has been canonicalized
+  if (O1->getOperand(0) == O2->getOperand(0))
+return isKnownNonEqual(O1->getOperand(1), O2->getOperand(1),
+   Depth + 1, Q);
+  if (O1->getOperand(1) == O2->getOperand(1))
+return isKnownNonEqual(O1->getOperand(0), O2->getOperand(0),
+   Depth + 1, Q);
+  break;
+case Instruction::SExt:
+case Instruction::ZExt:
+  if (O1->getOperand(0)->getType() == O2->getOperand(0)->getType())
+return isKnownNonEqual(O1->getOperand(0), O2->getOperand(0),
+   Depth + 1, Q);
+  break;
+};
+  }
+  
+  if (isAddOfNonZero(V1, V2, Depth, Q) || isAddOfNonZero(V2, V1, Depth, Q))
 return true;
 
   if (V1->getType()->isIntOrIntVectorTy()) {
 // Are any known bits in V1 contradictory to known bits in V2? If V1
 // has a known zero where V2 has a known one, they must not be equal.
-KnownBits Known1 = computeKnownBits(V1, 0, Q);
-KnownBits Known2 = computeKnownBits(V2, 0, Q);
+KnownBits Known1 = computeKnownBits(V1, Depth, Q);
+KnownBits Known2 = computeKnownBits(V2, Depth, Q);
 
    if (Known1.Zero.intersects(Known2.One) ||
        Known2.Zero.intersects(Known1.One))
      return true;

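A standalone sketch (not from the commit) of why the recursion above is sound:
the operations it looks through are injective when one operand is shared, so
for fixed-width integers B != C implies A + B != A + C even with wrap-around.
A brute-force i8 check:

    // Exhaustive i8 check that wrap-around addition with a shared operand
    // never maps two distinct values to the same result.
    #include <cstdint>
    #include <cstdio>

    int main() {
      for (unsigned A = 0; A < 256; ++A)
        for (unsigned B = 0; B < 256; ++B)
          for (unsigned C = 0; C < 256; ++C)
            if (B != C && uint8_t(A + B) == uint8_t(A + C)) {
              std::puts("counterexample found");
              return 1;
            }
      std::puts("add is injective in its free operand for i8");
      return 0;
    }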
[llvm-branch-commits] [llvm] 2656885 - Teach isKnownNonEqual how to recurse through invertible multiplies

2020-12-07 Thread Philip Reames via llvm-branch-commits

Author: Philip Reames
Date: 2020-12-07T14:52:08-08:00
New Revision: 2656885390f17cceae142b4265c337fcee2410c0

URL: 
https://github.com/llvm/llvm-project/commit/2656885390f17cceae142b4265c337fcee2410c0
DIFF: 
https://github.com/llvm/llvm-project/commit/2656885390f17cceae142b4265c337fcee2410c0.diff

LOG: Teach isKnownNonEqual how to recurse through invertible multiplies

Build on the work started in 8f07629, and add the multiply case. In the 
process, more clearly describe the requirement for the operation we're looking 
through.

Differential Revision: https://reviews.llvm.org/D92726

Added: 


Modified: 
llvm/lib/Analysis/ValueTracking.cpp
llvm/test/Analysis/ValueTracking/known-non-equal.ll

Removed: 




diff --git a/llvm/lib/Analysis/ValueTracking.cpp b/llvm/lib/Analysis/ValueTracking.cpp
index a1bb6e2eea78..eeb505868703 100644
--- a/llvm/lib/Analysis/ValueTracking.cpp
+++ b/llvm/lib/Analysis/ValueTracking.cpp
@@ -2502,6 +2502,7 @@ static bool isAddOfNonZero(const Value *V1, const Value *V2, unsigned Depth,
   return isKnownNonZero(Op, Depth + 1, Q);
 }
 
+
 /// Return true if it is known that V1 != V2.
 static bool isKnownNonEqual(const Value *V1, const Value *V2, unsigned Depth,
 const Query &Q) {
@@ -2514,7 +2515,9 @@ static bool isKnownNonEqual(const Value *V1, const Value *V2, unsigned Depth,
   if (Depth >= MaxAnalysisRecursionDepth)
 return false;
 
-  // See if we can recurse through (exactly one of) our operands.
+  // See if we can recurse through (exactly one of) our operands.  This
+  // requires our operation be 1-to-1 and map every input value to exactly
+  // one output value.  Such an operation is invertible.
   auto *O1 = dyn_cast<Operator>(V1);
   auto *O2 = dyn_cast<Operator>(V2);
   if (O1 && O2 && O1->getOpcode() == O2->getOpcode()) {
@@ -2530,6 +2533,23 @@ static bool isKnownNonEqual(const Value *V1, const Value *V2, unsigned Depth,
 return isKnownNonEqual(O1->getOperand(0), O2->getOperand(0),
Depth + 1, Q);
   break;
+case Instruction::Mul:
+  // invertible if A * B == (A * B) mod 2^N where A and B are integers
+  // and N is the bitwidth.  The nsw case is non-obvious, but proven by
+  // alive2: https://alive2.llvm.org/ce/z/Z6D5qK
+  if ((!cast<OverflowingBinaryOperator>(O1)->hasNoUnsignedWrap() ||
+       !cast<OverflowingBinaryOperator>(O2)->hasNoUnsignedWrap()) &&
+      (!cast<OverflowingBinaryOperator>(O1)->hasNoSignedWrap() ||
+       !cast<OverflowingBinaryOperator>(O2)->hasNoSignedWrap()))
+    break;
+
+  // Assume operand order has been canonicalized
+  if (O1->getOperand(1) == O2->getOperand(1) &&
+      isa<ConstantInt>(O1->getOperand(1)) &&
+      !cast<ConstantInt>(O1->getOperand(1))->isZero())
+return isKnownNonEqual(O1->getOperand(0), O2->getOperand(0),
+   Depth + 1, Q);
+  break;
 case Instruction::SExt:
 case Instruction::ZExt:
   if (O1->getOperand(0)->getType() == O2->getOperand(0)->getType())

diff --git a/llvm/test/Analysis/ValueTracking/known-non-equal.ll b/llvm/test/Analysis/ValueTracking/known-non-equal.ll
index 664542f632ab..8bc9a86c9a93 100644
--- a/llvm/test/Analysis/ValueTracking/known-non-equal.ll
+++ b/llvm/test/Analysis/ValueTracking/known-non-equal.ll
@@ -130,4 +130,76 @@ define i1 @sub2(i8 %B, i8 %C) {
   ret i1 %cmp
 }
 
+; op could wrap mapping two values to the same output value.
+define i1 @mul1(i8 %B) {
+; CHECK-LABEL: @mul1(
+; CHECK-NEXT:[[A:%.*]] = add i8 [[B:%.*]], 1
+; CHECK-NEXT:[[A_OP:%.*]] = mul i8 [[A]], 27
+; CHECK-NEXT:[[B_OP:%.*]] = mul i8 [[B]], 27
+; CHECK-NEXT:[[CMP:%.*]] = icmp eq i8 [[A_OP]], [[B_OP]]
+; CHECK-NEXT:ret i1 [[CMP]]
+;
+  %A = add i8 %B, 1
+  %A.op = mul i8 %A, 27
+  %B.op = mul i8 %B, 27
+
+  %cmp = icmp eq i8 %A.op, %B.op
+  ret i1 %cmp
+}
+
+define i1 @mul2(i8 %B) {
+; CHECK-LABEL: @mul2(
+; CHECK-NEXT:ret i1 false
+;
+  %A = add i8 %B, 1
+  %A.op = mul nuw i8 %A, 27
+  %B.op = mul nuw i8 %B, 27
+
+  %cmp = icmp eq i8 %A.op, %B.op
+  ret i1 %cmp
+}
+
+define i1 @mul3(i8 %B) {
+; CHECK-LABEL: @mul3(
+; CHECK-NEXT:ret i1 false
+;
+  %A = add i8 %B, 1
+  %A.op = mul nsw i8 %A, 27
+  %B.op = mul nsw i8 %B, 27
+
+  %cmp = icmp eq i8 %A.op, %B.op
+  ret i1 %cmp
+}
+
+; Multiply by zero collapses all values to one
+define i1 @mul4(i8 %B) {
+; CHECK-LABEL: @mul4(
+; CHECK-NEXT:ret i1 true
+;
+  %A = add i8 %B, 1
+  %A.op = mul nuw i8 %A, 0
+  %B.op = mul nuw i8 %B, 0
+
+  %cmp = icmp eq i8 %A.op, %B.op
+  ret i1 %cmp
+}
+
+; C might be zero, we can't tell
+define i1 @mul5(i8 %B, i8 %C) {
+; CHECK-LABEL: @mul5(
+; CHECK-NEXT:[[A:%.*]] = add i8 [[B:%.*]], 1
+; CHECK-NEXT:[[A_OP:%.*]] = mul nuw nsw i8 [[A]], [[C:%.*]]
+; CHECK-NEXT:[[B_OP:%.*]] = mul nuw nsw i8 [[B]], [[C]]
+; CHECK-NEXT:[[CMP:%.*]] = icmp eq i8 [[A_OP]], [[B_OP]]
+; CHECK-NEXT:ret i1 [[CMP]]
+;
+  %A = add i8 %B, 1
+  %A.op = mul nsw nuw i8 %A, %C
  %B.op = mul nsw nuw i8 %B, %C

  %cmp = icmp eq i8 %A.op, %B.op
  ret i1 %cmp
}
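A sketch (not part of the patch or its tests) of the modular-arithmetic claim
in the new comment: multiplication by an odd constant is a bijection mod 2^N,
so A*C == B*C forces A == B even without flags, while even multipliers need
the nuw/nsw reasoning the patch checks. Exhaustively for i8:

    // Count distinct results of A * C mod 2^8. Odd C yields all 256 values;
    // even C collapses distinct inputs onto the same output.
    #include <cstdint>
    #include <cstdio>
    #include <set>

    static bool isBijectionMod256(uint8_t C) {
      std::set<uint8_t> Seen;
      for (unsigned A = 0; A < 256; ++A)
        Seen.insert(uint8_t(A * C));
      return Seen.size() == 256;
    }

    int main() {
      std::printf("C=27: %d\n", isBijectionMod256(27)); // 1: invertible
      std::printf("C=4:  %d\n", isBijectionMod256(4));  // 0: collapses
    }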

[llvm-branch-commits] [llvm] 5171b7b - [indvars] Common a bit of code [NFC]

2020-12-08 Thread Philip Reames via llvm-branch-commits

Author: Philip Reames
Date: 2020-12-08T15:25:48-08:00
New Revision: 5171b7b40e9813e3fbfaf1e1e3372895c9ff6081

URL: 
https://github.com/llvm/llvm-project/commit/5171b7b40e9813e3fbfaf1e1e3372895c9ff6081
DIFF: 
https://github.com/llvm/llvm-project/commit/5171b7b40e9813e3fbfaf1e1e3372895c9ff6081.diff

LOG: [indvars] Common a bit of code [NFC]

Added: 


Modified: 
llvm/lib/Transforms/Utils/SimplifyIndVar.cpp

Removed: 




diff --git a/llvm/lib/Transforms/Utils/SimplifyIndVar.cpp b/llvm/lib/Transforms/Utils/SimplifyIndVar.cpp
index c02264aec600..189130f0e0ac 100644
--- a/llvm/lib/Transforms/Utils/SimplifyIndVar.cpp
+++ b/llvm/lib/Transforms/Utils/SimplifyIndVar.cpp
@@ -1272,28 +1272,8 @@ Instruction *WidenIV::cloneArithmeticIVUser(WidenIV::NarrowIVDefUse DU,
 }
 
 // WideUse is "WideDef `op.wide` X" as described in the comment.
-const SCEV *WideUse = nullptr;
-
-switch (NarrowUse->getOpcode()) {
-default:
-  llvm_unreachable("No other possibility!");
-
-case Instruction::Add:
-  WideUse = SE->getAddExpr(WideLHS, WideRHS);
-  break;
-
-case Instruction::Mul:
-  WideUse = SE->getMulExpr(WideLHS, WideRHS);
-  break;
-
-case Instruction::UDiv:
-  WideUse = SE->getUDivExpr(WideLHS, WideRHS);
-  break;
-
-case Instruction::Sub:
-  WideUse = SE->getMinusSCEV(WideLHS, WideRHS);
-  break;
-}
+const SCEV *WideUse =
+  getSCEVByOpCode(WideLHS, WideRHS, NarrowUse->getOpcode());
 
 return WideUse == WideAR;
   };
@@ -1332,14 +1312,18 @@ WidenIV::ExtendKind WidenIV::getExtendKind(Instruction *I) {
 
 const SCEV *WidenIV::getSCEVByOpCode(const SCEV *LHS, const SCEV *RHS,
  unsigned OpCode) const {
-  if (OpCode == Instruction::Add)
+  switch (OpCode) {
+  case Instruction::Add:
 return SE->getAddExpr(LHS, RHS);
-  if (OpCode == Instruction::Sub)
+  case Instruction::Sub:
 return SE->getMinusSCEV(LHS, RHS);
-  if (OpCode == Instruction::Mul)
+  case Instruction::Mul:
 return SE->getMulExpr(LHS, RHS);
-
-  llvm_unreachable("Unsupported opcode.");
+  case Instruction::UDiv:
+return SE->getUDivExpr(LHS, RHS);
+  default:
+llvm_unreachable("Unsupported opcode.");
+  };
 }
 
 /// No-wrap operations can transfer sign extension of their result to their
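In miniature, the commoning pattern this applies (illustrative names only,
not LLVM's API): every call site dispatches through one opcode-keyed helper
instead of repeating the switch.

    // A standalone analogue of routing opcode dispatch through one helper.
    enum class Op { Add, Sub, Mul, UDiv };

    long applyOp(long L, long R, Op O) {
      switch (O) {
      case Op::Add:  return L + R;
      case Op::Sub:  return L - R;
      case Op::Mul:  return L * R;
      case Op::UDiv: return long((unsigned long)L / (unsigned long)R);
      }
      return 0; // unreachable for a valid Op
    }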



___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [RISCV] Add initial support of memcmp expansion (PR #107548)

2024-11-04 Thread Philip Reames via llvm-branch-commits

preames wrote:

At a macro level, it looks like ExpandMemCmp is making some problematic choices 
around unaligned loads and stores.  As I commented before, ExpandMemCmp appears 
to be blindly emitting unaligned accesses (counted as one against the budget) 
without accounting for the fact that such loads are going to be scalarized 
again (i.e. resulting in N x loads, where N is the type size).  I think we need 
to fix this.  In particular, the discussion around Zbb and Zbkb in this review 
seems to mostly come from cases where unaligned loads/stores are being expanded 
implicitly.

I don't believe this change should move forward until the underlying issue in 
ExpandMemCmp has been addressed.
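To make the accounting concrete, a sketch (assuming a target without fast
misaligned access; this is not ExpandMemCmp's actual output) of what the
budget counts as a single unaligned 64-bit load once it is scalarized:

    // "One" unaligned 64-bit load becomes eight byte loads plus shifts/ors,
    // i.e. roughly 8x the cost the expansion budget accounted for.
    #include <cstdint>

    uint64_t loadUnaligned64(const uint8_t *P) {
      uint64_t V = 0;
      for (int I = 0; I < 8; ++I)
        V |= uint64_t(P[I]) << (8 * I); // one byte load per iteration
      return V;
    }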

https://github.com/llvm/llvm-project/pull/107548
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [RISCV] Add initial support of memcmp expansion (PR #107548)

2024-11-05 Thread Philip Reames via llvm-branch-commits

https://github.com/preames approved this pull request.

LGTM

https://github.com/llvm/llvm-project/pull/107548
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [RISCV] Set DisableLatencyHeuristic to true (PR #115858)

2024-11-26 Thread Philip Reames via llvm-branch-commits

preames wrote:

Given @michaelmaitland's data, @wangpc-pp, the burden shifts to you to clearly 
justify in which cases this is profitable and to figure out how to selectively 
enable this only in those cases.  I agree with @michaelmaitland's conclusion 
that this should not move forward otherwise.  

@michaelmaitland Can you say anything about the magnitude of regression in 
either case?  I assume they were statistically significant given you mention 
them, but are these small regressions or largish ones?

https://github.com/llvm/llvm-project/pull/115858
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [RISCV] Enable ShouldTrackLaneMasks when having vector instructions (PR #115843)

2024-12-04 Thread Philip Reames via llvm-branch-commits

preames wrote:

> Ping.

I went and dug through the diffs in the tests.  I see no obvious evidence of 
performance improvement, and a couple of regressions (see 
vector_interleave_nxv16f64_nxv8f64).  I don't think this patch should move 
forward unless we have a justification for why we think this is a net 
performance win.  The easiest way to make said argument is to share 
measurements from some benchmark set (e.g. spec) on some vector hardware (e.g. 
bp3).  

I'll note that from a conceptual standpoint this patch does seem to make sense. 
 My worry (triggered by the regression noted above) is that this may be 
exposing some other issue and that we need to unwind things a bit before this 
can land.  

https://github.com/llvm/llvm-project/pull/115843
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] release/20.x: [RISCV] Use getSignedConstant for negative values. (#125903) (PR #125953)

2025-02-05 Thread Philip Reames via llvm-branch-commits

https://github.com/preames approved this pull request.

LGTM

https://github.com/llvm/llvm-project/pull/125953
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] release/20.x: [RISCV] Check isFixedLengthVector before calling getVectorNumElements in getSingleShuffleSrc. (#125455) (PR #125590)

2025-02-10 Thread Philip Reames via llvm-branch-commits

https://github.com/preames approved this pull request.

LGTM

https://github.com/llvm/llvm-project/pull/125590
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] DAG: Avoid breaking legal vector_shuffle with multiple uses (PR #123712)

2025-01-21 Thread Philip Reames via llvm-branch-commits

https://github.com/preames approved this pull request.

LGTM

I suspect we'll want to refine the profitability here over time, but this 
seems reasonable as a stepping stone.

https://github.com/llvm/llvm-project/pull/123712
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] DAG: Avoid forming shufflevector from a single extract_vector_elt (PR #122672)

2025-01-14 Thread Philip Reames via llvm-branch-commits

preames wrote:

> > * BuildVector w/one non-zero non-undef source, repeated 100 times (i.e. 
> > splat or select of two splats)
> 
> I don't follow, this is a 2 element vector, how can you have 100 variants?

Isn't the condition in code in terms of VecIn.size() == 2?  I believe that 
VecIn is the *unique* input elements, right?  Which is distinct from the number 
of elements in the destination type?  (Am I just misreading?  I only skimmed 
this.)

> > If the target isn't optimally lowering the splat or select of splat case in 
> > the shuffle lowering, maybe we should just adjust the target lowering to do 
> > so?
> 
> It's not a lowering issue, it's the effect on every other combine. We'd have 
> to special case 1 element + 1 undef shuffles everywhere we handle 
> extract_vector_elt now, which is just excessive complexity. #122671 is almost 
> an alternative in one instance, but still shows expanding complexity of 
> handling this edge case.

Honestly,  #122671 (from the review description only) sounds like a worthwhile 
change.  That's not a hugely compelling argument here.  Let's settle the prior 
point, and then return to this.  If I'm just misreading something, let's not 
waste time discussing this.  



https://github.com/llvm/llvm-project/pull/122672
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] DAG: Avoid forming shufflevector from a single extract_vector_elt (PR #122672)

2025-01-14 Thread Philip Reames via llvm-branch-commits

https://github.com/preames commented:

I don't think the heuristic here is quite what you want.  I believe this 
heuristic disables both of the following cases:
* BuildVector w/one non-zero non-undef element
* BuildVector w/one non-zero non-undef source, repeated 100 times (i.e. splat 
or select of two splats)

Disabling the former seems defensible; doing so for the latter, less so.  

Though honestly, I'm not sure of this change as a whole.  Having a single 
canonical form seems valuable here.  If the target isn't optimally lowering the 
splat or select of splat case in the shuffle lowering, maybe we should just 
adjust the target lowering to do so?

https://github.com/llvm/llvm-project/pull/122672
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] DAG: Avoid forming shufflevector from a single extract_vector_elt (PR #122672)

2025-01-14 Thread Philip Reames via llvm-branch-commits

preames wrote:

> > Isn't the condition in code in terms of VecIn.size() == 2? I believe that 
> > VecIn is the _unique_ input elements, right? Which is distinct from the 
> > number of elements in the destination type? (Am I just misreading? I only 
> > skimmed this.)
> 
> VecIn is collecting only extract_vector_elts feeding the build_vector. So 
> it's true it's not only a 2 element vector, in general (but the standard case 
> of building a complete vector is 2 elements). The other skipped elements are 
> all constant or undef.
> 
> A 2 element shuffle just happens to the only case I care about which I'm 
> trying to make legal (and really only the odd -> even case is of any use).

This is exactly the distinction I'm trying to get at.  Avoiding the creation of 
a 1-2 element shuffle seems quite reasonable.  Avoiding the creation of a 100 
element splat shuffle does not.  I think you need to add an explicit condition 
in terms of the number of elements in the result, not the number of *unique* 
elements in the result.  
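A sketch of that distinction (the name NumResultElts is illustrative, not the
DAGCombiner's): the bail-out should key off the width of the result, so a tiny
shuffle is avoided while a wide splat-like shuffle is still formed.

    // Illustrative only: avoid forming the shuffle only when the *result*
    // is tiny, regardless of how few unique extract sources feed it.
    bool shouldAvoidFormingShuffle(unsigned NumResultElts) {
      return NumResultElts <= 2; // a 100-element splat shuffle stays worthwhile
    }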

https://github.com/llvm/llvm-project/pull/122672
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] DAG: Avoid forming shufflevector from a single extract_vector_elt (PR #122672)

2025-01-16 Thread Philip Reames via llvm-branch-commits

https://github.com/preames approved this pull request.

LGTM

https://github.com/llvm/llvm-project/pull/122672
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [RISCV] Support non-power-of-2 types when expanding memcmp (PR #114971)

2025-06-16 Thread Philip Reames via llvm-branch-commits


@@ -16190,13 +16186,20 @@ combineVectorSizedSetCCEquality(EVT VT, SDValue X, SDValue Y, ISD::CondCode CC,
 return SDValue();
 
   unsigned VecSize = OpSize / 8;

preames wrote:

Where in the code above do we have a guarantee that OpSize is a multiple of 8?  
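One way to make the concern concrete (a sketch of the assumed invariant, not
the patch itself): flooring division silently drops bits when the size is not
byte-aligned, so the guarantee needs to be established or asserted somewhere.

    // e.g. a hypothetical 17-bit OpSize would yield VecSize == 2 and quietly
    // ignore the final bit without a check like this.
    #include <cassert>

    unsigned vecSizeInBytes(unsigned OpSizeInBits) {
      assert(OpSizeInBits % 8 == 0 && "flooring division would drop bits");
      return OpSizeInBits / 8;
    }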

https://github.com/llvm/llvm-project/pull/114971
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits