[llvm-branch-commits] [llvm] [LAA] Use PSE::getSymbolicMaxBackedgeTakenCount. (PR #93499)
@@ -2055,9 +2055,9 @@ MemoryDepChecker::Dependence::DepType MemoryDepChecker::isDependent( // stride multiplied by the backedge taken count, the accesses are independent, // i.e. they are far enough apart that accesses won't access the same // location across all loop iterations. - if (HasSameSize && - isSafeDependenceDistance(DL, SE, *(PSE.getBackedgeTakenCount()), *Dist, - MaxStride, TypeByteSize)) + if (HasSameSize && isSafeDependenceDistance( preames wrote: The doc comment on isSafeDependenceDistance needs to be updated. I think it's correct, but there's a difference between an exact BTC and a bound on BTC. https://github.com/llvm/llvm-project/pull/93499 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
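To make the exact-vs-bound distinction concrete, here is a minimal sketch of the reasoning the check relies on (simplified pseudo-version, not the actual LAA implementation):

```cpp
// Simplified sketch of the isSafeDependenceDistance reasoning: if the
// byte distance between the two accesses exceeds the maximum stride
// times the backedge-taken count, the accesses can never touch the same
// location within the loop.  The key point for this review: substituting
// any *upper bound* on the BTC is conservative, because overestimating
// the trip count can only make the check fail more often, never pass
// incorrectly.  (Overflow handling omitted for brevity.)
bool isSafeDistanceSketch(uint64_t Dist, uint64_t MaxStrideBytes,
                          uint64_t MaxBTC) {
  return Dist > MaxStrideBytes * MaxBTC;
}
```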
[llvm-branch-commits] [llvm] [LAA] Use PSE::getSymbolicMaxBackedgeTakenCount. (PR #93499)
@@ -3004,7 +3004,7 @@ void LoopAccessInfo::collectStridedAccess(Value *MemAccess) { // of various possible stride specializations, considering the alternatives // of using gather/scatters (if available). - const SCEV *BETakenCount = PSE->getBackedgeTakenCount(); + const SCEV *BETakenCount = PSE->getSymbolicMaxBackedgeTakenCount(); preames wrote: Not related to your change - but this whole block of code is just weird. This is basically proving a more precise trip count; why is it in LAA at all? Wouldn't simply early exiting on small BTC loops be sufficient? https://github.com/llvm/llvm-project/pull/93499 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
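For reference, the early exit being suggested could look roughly like the following; this is a sketch, and `TinyTripCountThreshold` is a hypothetical name rather than an existing LAA constant:

```cpp
// Sketch: bail out of stride speculation entirely when the trip count is
// provably tiny, instead of re-proving a more precise trip count in LAA.
if (const auto *ConstBTC = dyn_cast<SCEVConstant>(BETakenCount))
  if (ConstBTC->getAPInt().ult(TinyTripCountThreshold)) // hypothetical threshold
    return;
```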
[llvm-branch-commits] [llvm] [LAA] Use PSE::getSymbolicMaxBackedgeTakenCount. (PR #93499)
@@ -1506,6 +1506,16 @@ bool LoopVectorizationLegality::canVectorize(bool UseVPlanNativePath) { return false; } + if (isa<SCEVCouldNotCompute>(PSE.getBackedgeTakenCount())) { preames wrote: What about the other users of LAA in tree? Have you audited them? If not, can you add bailouts to ensure we're not breaking anything with this transition? https://github.com/llvm/llvm-project/pull/93499 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
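A bailout of the requested kind could be as small as the following, placed in each unaudited LAA client's legality check (a sketch; the right placement per client is exactly what the audit would need to determine):

```cpp
// Sketch: preserve the old behavior for unaudited LAA users by refusing
// to proceed when only a symbolic bound, not an exact backedge-taken
// count, is available.
if (isa<SCEVCouldNotCompute>(PSE.getBackedgeTakenCount()))
  return false;
```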
[llvm-branch-commits] [llvm] [LAA] Use PSE::getSymbolicMaxBackedgeTakenCount. (PR #93499)
@@ -2395,7 +2395,7 @@ bool LoopAccessInfo::canAnalyzeLoop() { } // ScalarEvolution needs to be able to find the exit count. - const SCEV *ExitCount = PSE->getBackedgeTakenCount(); + const SCEV *ExitCount = PSE->getSymbolicMaxBackedgeTakenCount(); preames wrote: Update the comments to say bound on the btc. https://github.com/llvm/llvm-project/pull/93499 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
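A possible rewording along the requested lines (a suggestion only, not part of the patch):

```cpp
// ScalarEvolution needs to be able to compute at least an upper bound on
// the backedge-taken count; an exact count is not required.
const SCEV *ExitCount = PSE->getSymbolicMaxBackedgeTakenCount();
```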
[llvm-branch-commits] [llvm] [RISCV] Fix vmerge.vvm/vmv.v.v getting folded into ops with mismatching EEW (PR #101464)
https://github.com/preames approved this pull request. https://github.com/llvm/llvm-project/pull/101464 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] release/19.x: [RISCV] Use APInt in isSimpleVIDSequence to account for index overflow (#100072) (PR #101124)
preames wrote: Talked with Luke about this one offline. On reflection, both of us are a bit unsure about the balance of risk vs reward here. The miscompile is not a regression, and occurs in what we think is a pretty unusual configuration. The fix landed recently, and while there are no known problems, there's always risk in a backport. This could easily go either way, but I think we can skip backporting this. https://github.com/llvm/llvm-project/pull/101124 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] Revert "[RISCV] Recurse on first operand of two operand shuffles (#79180)" (PR #80238)
preames wrote: @tstellar This backport has been outstanding for a while now. https://github.com/llvm/llvm-project/pull/80238 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [clang] [llvm] [RISCV] Add subtarget features for profiles (PR #84877)
https://github.com/preames approved this pull request. LGTM https://github.com/llvm/llvm-project/pull/84877 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [clang] [llvm] release/18.x: [RISCV] Re-separate unaligned scalar and vector memory features in the backend. (PR #92143)
preames wrote: I don't think we need to backport this at all. None of the in-tree CPUs fall into the category where the distinction is important, and I don't feel we have any obligation to backport support for out-of-tree forks. https://github.com/llvm/llvm-project/pull/92143 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [clang] [llvm] release/18.x: [RISCV] Re-separate unaligned scalar and vector memory features in the backend. (PR #92143)
preames wrote: I'm not strongly opposed to this or anything, but it feels questionable to be doing a backport to change the target-feature syntax. My understanding is that these are purely internal names. This isn't a documented public interface. https://github.com/llvm/llvm-project/pull/92143 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] release/18.x: [RISCV] Re-separate unaligned scalar and vector memory features in the backend. (PR #92143)
preames wrote: I'm fine with this approach. No strong opinion either way, but definitely don't let my previous comments be blocking here. https://github.com/llvm/llvm-project/pull/92143 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] 73e9633 - [RISCV] Add test coverage for partial buildvecs idioms
Author: Philip Reames Date: 2023-11-16T13:33:12-08:00 New Revision: 73e963379e4d06ca75625f63a5604c286fe37040 URL: https://github.com/llvm/llvm-project/commit/73e963379e4d06ca75625f63a5604c286fe37040 DIFF: https://github.com/llvm/llvm-project/commit/73e963379e4d06ca75625f63a5604c286fe37040.diff LOG: [RISCV] Add test coverage for partial buildvecs idioms Test coverage for an upcoming set of changes Added: Modified: llvm/test/CodeGen/RISCV/rvv/fixed-vectors-buildvec-of-binop.ll llvm/test/CodeGen/RISCV/rvv/fixed-vectors-int-buildvec.ll Removed: diff --git a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-buildvec-of-binop.ll b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-buildvec-of-binop.ll index 717dfb1bfd00537..8055944fc5468f3 100644 --- a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-buildvec-of-binop.ll +++ b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-buildvec-of-binop.ll @@ -446,6 +446,25 @@ define <4 x i32> @add_general_splat(i32 %a, i32 %b, i32 %c, i32 %d, i32 %e) { ; This test previously failed with an assertion failure because constant shift ; amounts are type legalized early. define void @buggy(i32 %0) #0 { +; RV32-LABEL: buggy: +; RV32: # %bb.0: # %entry +; RV32-NEXT:vsetivli zero, 4, e32, m1, ta, ma +; RV32-NEXT:vmv.v.x v8, a0 +; RV32-NEXT:vadd.vv v8, v8, v8 +; RV32-NEXT:vor.vi v8, v8, 1 +; RV32-NEXT:vrgather.vi v9, v8, 0 +; RV32-NEXT:vse32.v v9, (zero) +; RV32-NEXT:ret +; +; RV64-LABEL: buggy: +; RV64: # %bb.0: # %entry +; RV64-NEXT:slli a0, a0, 1 +; RV64-NEXT:vsetivli zero, 4, e32, m1, ta, ma +; RV64-NEXT:vmv.v.x v8, a0 +; RV64-NEXT:vor.vi v8, v8, 1 +; RV64-NEXT:vrgather.vi v9, v8, 0 +; RV64-NEXT:vse32.v v9, (zero) +; RV64-NEXT:ret entry: %mul.us.us.i.3 = shl i32 %0, 1 %1 = insertelement <4 x i32> zeroinitializer, i32 %mul.us.us.i.3, i64 0 @@ -454,3 +473,96 @@ entry: store <4 x i32> %3, ptr null, align 16 ret void } + + +define <8 x i32> @add_constant_rhs_8xi32_vector_in(<8 x i32> %vin, i32 %a, i32 %b, i32 %c, i32 %d) { +; CHECK-LABEL: add_constant_rhs_8xi32_vector_in: +; CHECK: # %bb.0: +; CHECK-NEXT:addi a0, a0, 23 +; CHECK-NEXT:addi a1, a1, 25 +; CHECK-NEXT:addi a2, a2, 1 +; CHECK-NEXT:addi a3, a3, 2047 +; CHECK-NEXT:addi a3, a3, 308 +; CHECK-NEXT:vsetivli zero, 2, e32, m1, tu, ma +; CHECK-NEXT:vmv.s.x v8, a0 +; CHECK-NEXT:vmv.s.x v10, a1 +; CHECK-NEXT:vslideup.vi v8, v10, 1 +; CHECK-NEXT:vmv.s.x v10, a2 +; CHECK-NEXT:vsetivli zero, 3, e32, m1, tu, ma +; CHECK-NEXT:vslideup.vi v8, v10, 2 +; CHECK-NEXT:vmv.s.x v10, a3 +; CHECK-NEXT:vsetivli zero, 4, e32, m1, tu, ma +; CHECK-NEXT:vslideup.vi v8, v10, 3 +; CHECK-NEXT:ret + %e0 = add i32 %a, 23 + %e1 = add i32 %b, 25 + %e2 = add i32 %c, 1 + %e3 = add i32 %d, 2355 + %v0 = insertelement <8 x i32> %vin, i32 %e0, i32 0 + %v1 = insertelement <8 x i32> %v0, i32 %e1, i32 1 + %v2 = insertelement <8 x i32> %v1, i32 %e2, i32 2 + %v3 = insertelement <8 x i32> %v2, i32 %e3, i32 3 + ret <8 x i32> %v3 +} + +define <8 x i32> @add_constant_rhs_8xi32_vector_in2(<8 x i32> %vin, i32 %a, i32 %b, i32 %c, i32 %d) { +; CHECK-LABEL: add_constant_rhs_8xi32_vector_in2: +; CHECK: # %bb.0: +; CHECK-NEXT:addi a0, a0, 23 +; CHECK-NEXT:addi a1, a1, 25 +; CHECK-NEXT:addi a2, a2, 1 +; CHECK-NEXT:addi a3, a3, 2047 +; CHECK-NEXT:addi a3, a3, 308 +; CHECK-NEXT:vsetivli zero, 5, e32, m2, tu, ma +; CHECK-NEXT:vmv.s.x v10, a0 +; CHECK-NEXT:vslideup.vi v8, v10, 4 +; CHECK-NEXT:vmv.s.x v10, a1 +; CHECK-NEXT:vsetivli zero, 6, e32, m2, tu, ma +; CHECK-NEXT:vslideup.vi v8, v10, 5 +; CHECK-NEXT:vmv.s.x v10, a2 +; CHECK-NEXT:vsetivli zero, 7, e32, m2, tu, ma +; CHECK-NEXT:vslideup.vi v8, 
v10, 6 +; CHECK-NEXT:vmv.s.x v10, a3 +; CHECK-NEXT:vsetivli zero, 8, e32, m2, ta, ma +; CHECK-NEXT:vslideup.vi v8, v10, 7 +; CHECK-NEXT:ret + %e0 = add i32 %a, 23 + %e1 = add i32 %b, 25 + %e2 = add i32 %c, 1 + %e3 = add i32 %d, 2355 + %v0 = insertelement <8 x i32> %vin, i32 %e0, i32 4 + %v1 = insertelement <8 x i32> %v0, i32 %e1, i32 5 + %v2 = insertelement <8 x i32> %v1, i32 %e2, i32 6 + %v3 = insertelement <8 x i32> %v2, i32 %e3, i32 7 + ret <8 x i32> %v3 +} + +define <8 x i32> @add_constant_rhs_8xi32_vector_in3(<8 x i32> %vin, i32 %a, i32 %b, i32 %c, i32 %d) { +; CHECK-LABEL: add_constant_rhs_8xi32_vector_in3: +; CHECK: # %bb.0: +; CHECK-NEXT:addi a0, a0, 23 +; CHECK-NEXT:addi a1, a1, 25 +; CHECK-NEXT:addi a2, a2, 1 +; CHECK-NEXT:addi a3, a3, 2047 +; CHECK-NEXT:addi a3, a3, 308 +; CHECK-NEXT:vsetivli zero, 3, e32, m1, tu, ma +; CHECK-NEXT:vmv.s.x v8, a0 +; CHECK-NEXT:vmv.s.x v10, a1 +; CHECK-NEXT:vslideup.vi v8, v10, 2 +; CHECK-NEXT:vmv.s.x v10, a2 +; CHECK-NEXT:vsetivli zero, 5, e32, m2, tu, ma +; CHECK-NE
[llvm-branch-commits] [llvm] 1aa493f - [RISCV] Further expand coverage for insert_vector_elt patterns
Author: Philip Reames Date: 2023-11-16T14:14:31-08:00 New Revision: 1aa493f0645395908fe77bc69bce93fd4e80b1e8 URL: https://github.com/llvm/llvm-project/commit/1aa493f0645395908fe77bc69bce93fd4e80b1e8 DIFF: https://github.com/llvm/llvm-project/commit/1aa493f0645395908fe77bc69bce93fd4e80b1e8.diff LOG: [RISCV] Further expand coverage for insert_vector_elt patterns Added: llvm/test/CodeGen/RISCV/rvv/concat-vector-insert-elt.ll Modified: llvm/test/CodeGen/RISCV/rvv/fixed-vectors-buildvec-of-binop.ll Removed: diff --git a/llvm/test/CodeGen/RISCV/rvv/concat-vector-insert-elt.ll b/llvm/test/CodeGen/RISCV/rvv/concat-vector-insert-elt.ll new file mode 100644 index 000..9193f7aef4b8757 --- /dev/null +++ b/llvm/test/CodeGen/RISCV/rvv/concat-vector-insert-elt.ll @@ -0,0 +1,241 @@ +; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py +; RUN: llc -mtriple=riscv32 -mattr=+v -target-abi=ilp32 \ +; RUN: -verify-machineinstrs < %s | FileCheck %s --check-prefixes=CHECK,RV32 +; RUN: llc -mtriple=riscv64 -mattr=+v -target-abi=lp64 \ +; RUN: -verify-machineinstrs < %s | FileCheck %s --check-prefixes=CHECK,RV64 + +define void @v4xi8_concat_vector_insert_idx0(ptr %a, ptr %b, i8 %x) { +; CHECK-LABEL: v4xi8_concat_vector_insert_idx0: +; CHECK: # %bb.0: +; CHECK-NEXT:vsetivli zero, 2, e8, mf8, ta, ma +; CHECK-NEXT:vle8.v v8, (a0) +; CHECK-NEXT:vle8.v v9, (a1) +; CHECK-NEXT:vsetivli zero, 4, e8, mf4, ta, ma +; CHECK-NEXT:vslideup.vi v8, v9, 2 +; CHECK-NEXT:vmv.s.x v9, a2 +; CHECK-NEXT:vsetivli zero, 2, e8, mf4, tu, ma +; CHECK-NEXT:vslideup.vi v8, v9, 1 +; CHECK-NEXT:vsetivli zero, 4, e8, mf4, ta, ma +; CHECK-NEXT:vse8.v v8, (a0) +; CHECK-NEXT:ret + %v1 = load <2 x i8>, ptr %a + %v2 = load <2 x i8>, ptr %b + %concat = shufflevector <2 x i8> %v1, <2 x i8> %v2, <4 x i32> + %ins = insertelement <4 x i8> %concat, i8 %x, i32 1 + store <4 x i8> %ins, ptr %a + ret void +} + +define void @v4xi8_concat_vector_insert_idx1(ptr %a, ptr %b, i8 %x) { +; CHECK-LABEL: v4xi8_concat_vector_insert_idx1: +; CHECK: # %bb.0: +; CHECK-NEXT:vsetivli zero, 2, e8, mf8, ta, ma +; CHECK-NEXT:vle8.v v8, (a0) +; CHECK-NEXT:vle8.v v9, (a1) +; CHECK-NEXT:vsetivli zero, 4, e8, mf4, ta, ma +; CHECK-NEXT:vslideup.vi v8, v9, 2 +; CHECK-NEXT:vmv.s.x v9, a2 +; CHECK-NEXT:vsetivli zero, 2, e8, mf4, tu, ma +; CHECK-NEXT:vslideup.vi v8, v9, 1 +; CHECK-NEXT:vsetivli zero, 4, e8, mf4, ta, ma +; CHECK-NEXT:vse8.v v8, (a0) +; CHECK-NEXT:ret + %v1 = load <2 x i8>, ptr %a + %v2 = load <2 x i8>, ptr %b + %concat = shufflevector <2 x i8> %v1, <2 x i8> %v2, <4 x i32> + %ins = insertelement <4 x i8> %concat, i8 %x, i32 1 + store <4 x i8> %ins, ptr %a + ret void +} + +define void @v4xi8_concat_vector_insert_idx2(ptr %a, ptr %b, i8 %x) { +; CHECK-LABEL: v4xi8_concat_vector_insert_idx2: +; CHECK: # %bb.0: +; CHECK-NEXT:vsetivli zero, 2, e8, mf8, ta, ma +; CHECK-NEXT:vle8.v v8, (a0) +; CHECK-NEXT:vle8.v v9, (a1) +; CHECK-NEXT:vsetivli zero, 4, e8, mf4, ta, ma +; CHECK-NEXT:vslideup.vi v8, v9, 2 +; CHECK-NEXT:vmv.s.x v9, a2 +; CHECK-NEXT:vsetivli zero, 3, e8, mf4, tu, ma +; CHECK-NEXT:vslideup.vi v8, v9, 2 +; CHECK-NEXT:vsetivli zero, 4, e8, mf4, ta, ma +; CHECK-NEXT:vse8.v v8, (a0) +; CHECK-NEXT:ret + %v1 = load <2 x i8>, ptr %a + %v2 = load <2 x i8>, ptr %b + %concat = shufflevector <2 x i8> %v1, <2 x i8> %v2, <4 x i32> + %ins = insertelement <4 x i8> %concat, i8 %x, i32 2 + store <4 x i8> %ins, ptr %a + ret void +} + +define void @v4xi8_concat_vector_insert_idx3(ptr %a, ptr %b, i8 %x) { +; CHECK-LABEL: v4xi8_concat_vector_insert_idx3: +; CHECK: # 
%bb.0: +; CHECK-NEXT:vsetivli zero, 2, e8, mf8, ta, ma +; CHECK-NEXT:vle8.v v8, (a0) +; CHECK-NEXT:vle8.v v9, (a1) +; CHECK-NEXT:vsetivli zero, 4, e8, mf4, ta, ma +; CHECK-NEXT:vslideup.vi v8, v9, 2 +; CHECK-NEXT:vmv.s.x v9, a2 +; CHECK-NEXT:vslideup.vi v8, v9, 3 +; CHECK-NEXT:vse8.v v8, (a0) +; CHECK-NEXT:ret + %v1 = load <2 x i8>, ptr %a + %v2 = load <2 x i8>, ptr %b + %concat = shufflevector <2 x i8> %v1, <2 x i8> %v2, <4 x i32> + %ins = insertelement <4 x i8> %concat, i8 %x, i32 3 + store <4 x i8> %ins, ptr %a + ret void +} + +define void @v4xi64_concat_vector_insert_idx0(ptr %a, ptr %b, i64 %x) { +; RV32-LABEL: v4xi64_concat_vector_insert_idx0: +; RV32: # %bb.0: +; RV32-NEXT:vsetivli zero, 2, e64, m1, ta, ma +; RV32-NEXT:vle64.v v8, (a0) +; RV32-NEXT:vle64.v v10, (a1) +; RV32-NEXT:vsetivli zero, 4, e64, m2, ta, ma +; RV32-NEXT:vslideup.vi v8, v10, 2 +; RV32-NEXT:vsetivli zero, 2, e32, m1, ta, ma +; RV32-NEXT:vslide1down.vx v10, v8, a2 +; RV32-NEXT:vslide1down.vx v10, v10, a3 +; RV32-NEXT:vsetivli zero, 2, e64, m1, tu, ma +; RV32-N
[llvm-branch-commits] [llvm] 233971b - [RISCV] Fix typo in a test and regen another to reduce test diff
Author: Philip Reames Date: 2023-11-16T14:28:16-08:00 New Revision: 233971b475a48d9ad8c61632660a1b45186897cc URL: https://github.com/llvm/llvm-project/commit/233971b475a48d9ad8c61632660a1b45186897cc DIFF: https://github.com/llvm/llvm-project/commit/233971b475a48d9ad8c61632660a1b45186897cc.diff LOG: [RISCV] Fix typo in a test and regen another to reduce test diff Added: Modified: llvm/test/CodeGen/RISCV/rvv/concat-vector-insert-elt.ll llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fp-buildvec.ll Removed: diff --git a/llvm/test/CodeGen/RISCV/rvv/concat-vector-insert-elt.ll b/llvm/test/CodeGen/RISCV/rvv/concat-vector-insert-elt.ll index 9193f7aef4b8757..3fc22818a2406a5 100644 --- a/llvm/test/CodeGen/RISCV/rvv/concat-vector-insert-elt.ll +++ b/llvm/test/CodeGen/RISCV/rvv/concat-vector-insert-elt.ll @@ -12,16 +12,14 @@ define void @v4xi8_concat_vector_insert_idx0(ptr %a, ptr %b, i8 %x) { ; CHECK-NEXT:vle8.v v9, (a1) ; CHECK-NEXT:vsetivli zero, 4, e8, mf4, ta, ma ; CHECK-NEXT:vslideup.vi v8, v9, 2 -; CHECK-NEXT:vmv.s.x v9, a2 -; CHECK-NEXT:vsetivli zero, 2, e8, mf4, tu, ma -; CHECK-NEXT:vslideup.vi v8, v9, 1 -; CHECK-NEXT:vsetivli zero, 4, e8, mf4, ta, ma +; CHECK-NEXT:vsetvli zero, zero, e8, mf4, tu, ma +; CHECK-NEXT:vmv.s.x v8, a2 ; CHECK-NEXT:vse8.v v8, (a0) ; CHECK-NEXT:ret %v1 = load <2 x i8>, ptr %a %v2 = load <2 x i8>, ptr %b %concat = shufflevector <2 x i8> %v1, <2 x i8> %v2, <4 x i32> - %ins = insertelement <4 x i8> %concat, i8 %x, i32 1 + %ins = insertelement <4 x i8> %concat, i8 %x, i32 0 store <4 x i8> %ins, ptr %a ret void } @@ -98,11 +96,9 @@ define void @v4xi64_concat_vector_insert_idx0(ptr %a, ptr %b, i64 %x) { ; RV32-NEXT:vle64.v v10, (a1) ; RV32-NEXT:vsetivli zero, 4, e64, m2, ta, ma ; RV32-NEXT:vslideup.vi v8, v10, 2 -; RV32-NEXT:vsetivli zero, 2, e32, m1, ta, ma -; RV32-NEXT:vslide1down.vx v10, v8, a2 -; RV32-NEXT:vslide1down.vx v10, v10, a3 -; RV32-NEXT:vsetivli zero, 2, e64, m1, tu, ma -; RV32-NEXT:vslideup.vi v8, v10, 1 +; RV32-NEXT:vsetivli zero, 2, e32, m1, tu, ma +; RV32-NEXT:vslide1down.vx v8, v8, a2 +; RV32-NEXT:vslide1down.vx v8, v8, a3 ; RV32-NEXT:vsetivli zero, 4, e64, m2, ta, ma ; RV32-NEXT:vse64.v v8, (a0) ; RV32-NEXT:ret @@ -114,16 +110,14 @@ define void @v4xi64_concat_vector_insert_idx0(ptr %a, ptr %b, i64 %x) { ; RV64-NEXT:vle64.v v10, (a1) ; RV64-NEXT:vsetivli zero, 4, e64, m2, ta, ma ; RV64-NEXT:vslideup.vi v8, v10, 2 -; RV64-NEXT:vmv.s.x v10, a2 -; RV64-NEXT:vsetivli zero, 2, e64, m1, tu, ma -; RV64-NEXT:vslideup.vi v8, v10, 1 -; RV64-NEXT:vsetivli zero, 4, e64, m2, ta, ma +; RV64-NEXT:vsetvli zero, zero, e64, m2, tu, ma +; RV64-NEXT:vmv.s.x v8, a2 ; RV64-NEXT:vse64.v v8, (a0) ; RV64-NEXT:ret %v1 = load <2 x i64>, ptr %a %v2 = load <2 x i64>, ptr %b %concat = shufflevector <2 x i64> %v1, <2 x i64> %v2, <4 x i32> - %ins = insertelement <4 x i64> %concat, i64 %x, i32 1 + %ins = insertelement <4 x i64> %concat, i64 %x, i32 0 store <4 x i64> %ins, ptr %a ret void } diff --git a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fp-buildvec.ll b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fp-buildvec.ll index d1ea56a1ff93819..2d8bae7092242d3 100644 --- a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fp-buildvec.ll +++ b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fp-buildvec.ll @@ -1080,6 +1080,13 @@ define <32 x double> @buildvec_v32f64(double %e0, double %e1, double %e2, double ; FIXME: These constants have enough sign bits that we could use vmv.v.x/i and ; vsext, but we don't support this for FP yet. 
define <2 x float> @signbits() { +; CHECK-LABEL: signbits: +; CHECK: # %bb.0: # %entry +; CHECK-NEXT:lui a0, %hi(.LCPI24_0) +; CHECK-NEXT:addi a0, a0, %lo(.LCPI24_0) +; CHECK-NEXT:vsetivli zero, 2, e32, mf2, ta, ma +; CHECK-NEXT:vle32.v v8, (a0) +; CHECK-NEXT:ret entry: ret <2 x float> } ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] 52b413f - [RISCV] Precommit tests for buildvector lowering with exact VLEN
Author: Philip Reames Date: 2023-11-27T16:48:20-08:00 New Revision: 52b413f25ae79b07df88c0224adec4a6d7dabecc URL: https://github.com/llvm/llvm-project/commit/52b413f25ae79b07df88c0224adec4a6d7dabecc DIFF: https://github.com/llvm/llvm-project/commit/52b413f25ae79b07df88c0224adec4a6d7dabecc.diff LOG: [RISCV] Precommit tests for buildvector lowering with exact VLEN Added: Modified: llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fp-buildvec.ll llvm/test/CodeGen/RISCV/rvv/fixed-vectors-int-buildvec.ll Removed: diff --git a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fp-buildvec.ll b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fp-buildvec.ll index 05aa5f9807b9fc4..31ed3083e05a114 100644 --- a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fp-buildvec.ll +++ b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fp-buildvec.ll @@ -1077,13 +1077,252 @@ define <32 x double> @buildvec_v32f64(double %e0, double %e1, double %e2, double ret <32 x double> %v31 } +define <32 x double> @buildvec_v32f64_exact_vlen(double %e0, double %e1, double %e2, double %e3, double %e4, double %e5, double %e6, double %e7, double %e8, double %e9, double %e10, double %e11, double %e12, double %e13, double %e14, double %e15, double %e16, double %e17, double %e18, double %e19, double %e20, double %e21, double %e22, double %e23, double %e24, double %e25, double %e26, double %e27, double %e28, double %e29, double %e30, double %e31) vscale_range(2,2) { +; RV32-LABEL: buildvec_v32f64_exact_vlen: +; RV32: # %bb.0: +; RV32-NEXT:addi sp, sp, -512 +; RV32-NEXT:.cfi_def_cfa_offset 512 +; RV32-NEXT:sw ra, 508(sp) # 4-byte Folded Spill +; RV32-NEXT:sw s0, 504(sp) # 4-byte Folded Spill +; RV32-NEXT:fsd fs0, 496(sp) # 8-byte Folded Spill +; RV32-NEXT:fsd fs1, 488(sp) # 8-byte Folded Spill +; RV32-NEXT:fsd fs2, 480(sp) # 8-byte Folded Spill +; RV32-NEXT:fsd fs3, 472(sp) # 8-byte Folded Spill +; RV32-NEXT:fsd fs4, 464(sp) # 8-byte Folded Spill +; RV32-NEXT:fsd fs5, 456(sp) # 8-byte Folded Spill +; RV32-NEXT:fsd fs6, 448(sp) # 8-byte Folded Spill +; RV32-NEXT:fsd fs7, 440(sp) # 8-byte Folded Spill +; RV32-NEXT:fsd fs8, 432(sp) # 8-byte Folded Spill +; RV32-NEXT:fsd fs9, 424(sp) # 8-byte Folded Spill +; RV32-NEXT:fsd fs10, 416(sp) # 8-byte Folded Spill +; RV32-NEXT:fsd fs11, 408(sp) # 8-byte Folded Spill +; RV32-NEXT:.cfi_offset ra, -4 +; RV32-NEXT:.cfi_offset s0, -8 +; RV32-NEXT:.cfi_offset fs0, -16 +; RV32-NEXT:.cfi_offset fs1, -24 +; RV32-NEXT:.cfi_offset fs2, -32 +; RV32-NEXT:.cfi_offset fs3, -40 +; RV32-NEXT:.cfi_offset fs4, -48 +; RV32-NEXT:.cfi_offset fs5, -56 +; RV32-NEXT:.cfi_offset fs6, -64 +; RV32-NEXT:.cfi_offset fs7, -72 +; RV32-NEXT:.cfi_offset fs8, -80 +; RV32-NEXT:.cfi_offset fs9, -88 +; RV32-NEXT:.cfi_offset fs10, -96 +; RV32-NEXT:.cfi_offset fs11, -104 +; RV32-NEXT:addi s0, sp, 512 +; RV32-NEXT:.cfi_def_cfa s0, 0 +; RV32-NEXT:andi sp, sp, -128 +; RV32-NEXT:sw a0, 120(sp) +; RV32-NEXT:sw a1, 124(sp) +; RV32-NEXT:fld ft0, 120(sp) +; RV32-NEXT:sw a2, 120(sp) +; RV32-NEXT:sw a3, 124(sp) +; RV32-NEXT:fld ft1, 120(sp) +; RV32-NEXT:sw a4, 120(sp) +; RV32-NEXT:sw a5, 124(sp) +; RV32-NEXT:fld ft2, 120(sp) +; RV32-NEXT:sw a6, 120(sp) +; RV32-NEXT:sw a7, 124(sp) +; RV32-NEXT:fld ft3, 120(sp) +; RV32-NEXT:fld ft4, 0(s0) +; RV32-NEXT:fld ft5, 8(s0) +; RV32-NEXT:fld ft6, 16(s0) +; RV32-NEXT:fld ft7, 24(s0) +; RV32-NEXT:fld ft8, 32(s0) +; RV32-NEXT:fld ft9, 40(s0) +; RV32-NEXT:fld ft10, 48(s0) +; RV32-NEXT:fld ft11, 56(s0) +; RV32-NEXT:fld fs0, 64(s0) +; RV32-NEXT:fld fs1, 72(s0) +; RV32-NEXT:fld fs2, 80(s0) +; RV32-NEXT:fld fs3, 88(s0) +; RV32-NEXT:fld fs4, 
96(s0) +; RV32-NEXT:fld fs5, 104(s0) +; RV32-NEXT:fld fs6, 112(s0) +; RV32-NEXT:fld fs7, 120(s0) +; RV32-NEXT:fld fs8, 152(s0) +; RV32-NEXT:fld fs9, 144(s0) +; RV32-NEXT:fld fs10, 136(s0) +; RV32-NEXT:fld fs11, 128(s0) +; RV32-NEXT:fsd fs8, 248(sp) +; RV32-NEXT:fsd fs9, 240(sp) +; RV32-NEXT:fsd fs10, 232(sp) +; RV32-NEXT:fsd fs11, 224(sp) +; RV32-NEXT:fsd fs7, 216(sp) +; RV32-NEXT:fsd fs6, 208(sp) +; RV32-NEXT:fsd fs5, 200(sp) +; RV32-NEXT:fsd fs4, 192(sp) +; RV32-NEXT:fsd fs3, 184(sp) +; RV32-NEXT:fsd fs2, 176(sp) +; RV32-NEXT:fsd fs1, 168(sp) +; RV32-NEXT:fsd fs0, 160(sp) +; RV32-NEXT:fsd ft11, 152(sp) +; RV32-NEXT:fsd ft10, 144(sp) +; RV32-NEXT:fsd ft9, 136(sp) +; RV32-NEXT:fsd ft8, 128(sp) +; RV32-NEXT:fsd ft7, 376(sp) +; RV32-NEXT:fsd ft6, 368(sp) +; RV32-NEXT:fsd ft5, 360(sp) +; RV32-NEXT:fsd ft4, 352(sp) +; RV32-NEXT:fsd fa7, 312(sp) +; RV32-NEXT:fsd fa6, 304(sp) +; RV32-NEXT:fsd fa5, 296(sp) +; RV32-NEXT:fsd fa4, 288(sp) +; RV32-NEXT:
[llvm-branch-commits] [llvm] Revert "[RISCV] Recurse on first operand of two operand shuffles (#79180)" (PR #80238)
https://github.com/preames created https://github.com/llvm/llvm-project/pull/80238 This reverts commit bdc41106ee48dce59c500c9a3957af947f30c8c3 on the release/18.x branch. This change was the first in a mini-series and while I'm not aware of any particular problem from having it on it's own in the branch, it seems safer to ship with the previous known good state. @tstellar This is my first backport in the new process, so please bear with me and double check I got all pieces of this right. >From 98e43e0054ab81e3455011933e1bdf64bd59e148 Mon Sep 17 00:00:00 2001 From: Philip Reames Date: Wed, 31 Jan 2024 14:44:39 -0800 Subject: [PATCH] Revert "[RISCV] Recurse on first operand of two operand shuffles (#79180)" This reverts commit bdc41106ee48dce59c500c9a3957af947f30c8c3 on the release/18.x branch. This change was the first in a mini-series and while I'm not aware of any particular problem from having it on it's own in the branch, it seems safer to ship with the previous known good state. --- llvm/lib/Target/RISCV/RISCVISelLowering.cpp | 92 ++--- .../RISCV/rvv/fixed-vectors-fp-interleave.ll | 41 +- .../RISCV/rvv/fixed-vectors-int-interleave.ll | 63 +-- .../RISCV/rvv/fixed-vectors-int-shuffles.ll | 43 +- .../rvv/fixed-vectors-interleaved-access.ll | 387 +- .../rvv/fixed-vectors-shuffle-transpose.ll| 128 +++--- 6 files changed, 407 insertions(+), 347 deletions(-) diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp index 47c6cd6e5487b..c8f7b5c35a381 100644 --- a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp +++ b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp @@ -5033,60 +5033,56 @@ static SDValue lowerVECTOR_SHUFFLE(SDValue Op, SelectionDAG &DAG, MVT IndexContainerVT = ContainerVT.changeVectorElementType(IndexVT.getScalarType()); - // Base case for the recursion just below - handle the worst case - // single source permutation. Note that all the splat variants - // are handled above. - if (V2.isUndef()) { + SDValue Gather; + // TODO: This doesn't trigger for i64 vectors on RV32, since there we + // encounter a bitcasted BUILD_VECTOR with low/high i32 values. + if (SDValue SplatValue = DAG.getSplatValue(V1, /*LegalTypes*/ true)) { +Gather = lowerScalarSplat(SDValue(), SplatValue, VL, ContainerVT, DL, DAG, + Subtarget); + } else { V1 = convertToScalableVector(ContainerVT, V1, DAG, Subtarget); -SDValue LHSIndices = DAG.getBuildVector(IndexVT, DL, GatherIndicesLHS); -LHSIndices = convertToScalableVector(IndexContainerVT, LHSIndices, DAG, - Subtarget); -SDValue Gather = DAG.getNode(GatherVVOpc, DL, ContainerVT, V1, LHSIndices, - DAG.getUNDEF(ContainerVT), TrueMask, VL); -return convertFromScalableVector(VT, Gather, DAG, Subtarget); - } - - // Translate the gather index we computed above (and possibly swapped) - // back to a shuffle mask. This step should disappear once we complete - // the migration to recursive design. - SmallVector ShuffleMaskLHS; - ShuffleMaskLHS.reserve(GatherIndicesLHS.size()); - for (SDValue GatherIndex : GatherIndicesLHS) { -if (GatherIndex.isUndef()) { - ShuffleMaskLHS.push_back(-1); - continue; +// If only one index is used, we can use a "splat" vrgather. +// TODO: We can splat the most-common index and fix-up any stragglers, if +// that's beneficial. 
+if (LHSIndexCounts.size() == 1) { + int SplatIndex = LHSIndexCounts.begin()->getFirst(); + Gather = DAG.getNode(GatherVXOpc, DL, ContainerVT, V1, + DAG.getConstant(SplatIndex, DL, XLenVT), + DAG.getUNDEF(ContainerVT), TrueMask, VL); +} else { + SDValue LHSIndices = DAG.getBuildVector(IndexVT, DL, GatherIndicesLHS); + LHSIndices = + convertToScalableVector(IndexContainerVT, LHSIndices, DAG, Subtarget); + + Gather = DAG.getNode(GatherVVOpc, DL, ContainerVT, V1, LHSIndices, + DAG.getUNDEF(ContainerVT), TrueMask, VL); } -auto *IdxC = cast(GatherIndex); -ShuffleMaskLHS.push_back(IdxC->getZExtValue()); } - // Recursively invoke lowering for the LHS as if there were no RHS. - // This allows us to leverage all of our single source permute tricks. - SDValue Gather = -DAG.getVectorShuffle(VT, DL, V1, DAG.getUNDEF(VT), ShuffleMaskLHS); - Gather = convertToScalableVector(ContainerVT, Gather, DAG, Subtarget); + // If a second vector operand is used by this shuffle, blend it in with an + // additional vrgather. + if (!V2.isUndef()) { +V2 = convertToScalableVector(ContainerVT, V2, DAG, Subtarget); - // Blend in second vector source with an additional vrgather. - V2 = convertToScalableVector(ContainerVT, V2, DAG, Subtarget); +MVT MaskContainerVT = ContainerVT.changeVectorElementType(MVT::i1); +SelectMask = +convert
[llvm-branch-commits] [flang] [libc] [compiler-rt] [clang] [libcxx] [llvm] [RISCV] Support select optimization (PR #80124)
preames wrote: > and the measurement data still stands for RISCV. Please give the measurement data in this review or a direct link to it. I tried searching for it, and did not immediately find it. https://github.com/llvm/llvm-project/pull/80124 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [compiler-rt] [flang] [clang] [libcxx] [libc] [RISCV] Support select optimization (PR #80124)
preames wrote: JFYI, I don't find the AArch64 data particularly convincing for RISCV. The magnitude of the change even on AArch64 is small, and could easily be swung one direction or the other by differences in implementation between the backends. https://github.com/llvm/llvm-project/pull/80124 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] 9f61fbd - [LV] Relax assumption that LCSSA implies single entry
Author: Philip Reames Date: 2021-01-12T12:34:52-08:00 New Revision: 9f61fbd75ae1757d77988b37562de4d6583579aa URL: https://github.com/llvm/llvm-project/commit/9f61fbd75ae1757d77988b37562de4d6583579aa DIFF: https://github.com/llvm/llvm-project/commit/9f61fbd75ae1757d77988b37562de4d6583579aa.diff LOG: [LV] Relax assumption that LCSSA implies single entry This relates to the ongoing effort to support vectorization of multiple exit loops (see D93317). The previous code assumed that LCSSA phis were always single entry before the vectorizer ran. This was correct, but only because the vectorizer allowed only a single exiting edge. There's nothing in the definition of LCSSA which requires single entry phis. A common case where this comes up is with a loop with multiple exiting blocks which all reach a common exit block. (e.g. see the test updates) Differential Revision: https://reviews.llvm.org/D93725 Added: Modified: llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp llvm/lib/Transforms/Vectorize/LoopVectorize.cpp llvm/test/Transforms/LoopVectorize/first-order-recurrence-complex.ll llvm/test/Transforms/LoopVectorize/loop-form.ll Removed: diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp index 3906b11ba4b9..e3e522958c3a 100644 --- a/llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp +++ b/llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp @@ -1101,8 +1101,7 @@ bool LoopVectorizationLegality::canVectorizeLoopCFG(Loop *Lp, // TODO: This restriction can be relaxed in the near future, it's here solely // to allow separation of changes for review. We need to generalize the phi // update logic in a number of places. - BasicBlock *ExitBB = Lp->getUniqueExitBlock(); - if (!ExitBB) { + if (!Lp->getUniqueExitBlock()) { reportVectorizationFailure("The loop must have a unique exit block", "loop control flow is not understood by vectorizer", "CFGNotUnderstood", ORE, TheLoop); @@ -1110,24 +1109,7 @@ bool LoopVectorizationLegality::canVectorizeLoopCFG(Loop *Lp, Result = false; else return false; - } else { -// The existing code assumes that LCSSA implies that phis are single entry -// (which was true when we had at most a single exiting edge from the latch). -// In general, there's nothing which prevents an LCSSA phi in exit block from -// having two or more values if there are multiple exiting edges leading to -// the exit block. (TODO: implement general case) -if (!llvm::empty(ExitBB->phis()) && !ExitBB->getSinglePredecessor()) { - reportVectorizationFailure("The loop must have no live-out values if " - "it has more than one exiting block", - "loop control flow is not understood by vectorizer", - "CFGNotUnderstood", ORE, TheLoop); - if (DoExtraAnalysis) -Result = false; - else -return false; -} } - return Result; } diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp index e6cadf8f8796..5ae400fb5dc9 100644 --- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp +++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp @@ -633,10 +633,11 @@ class InnerLoopVectorizer { /// Clear NSW/NUW flags from reduction instructions if necessary. void clearReductionWrapFlags(RecurrenceDescriptor &RdxDesc); - /// The Loop exit block may have single value PHI nodes with some - /// incoming value. While vectorizing we only handled real values - /// that were defined inside the loop and we should have one value for - /// each predecessor of its parent basic block. 
See PR14725. + /// Fixup the LCSSA phi nodes in the unique exit block. This simply + /// means we need to add the appropriate incoming value from the middle + /// block as exiting edges from the scalar epilogue loop (if present) are + /// already in place, and we exit the vector loop exclusively to the middle + /// block. void fixLCSSAPHIs(); /// Iteratively sink the scalarized operands of a predicated instruction into @@ -4149,11 +4150,14 @@ void InnerLoopVectorizer::fixFirstOrderRecurrence(PHINode *Phi) { // vector recurrence we extracted in the middle block. Since the loop is in // LCSSA form, we just need to find all the phi nodes for the original scalar // recurrence in the exit block, and then add an edge for the middle block. - for (PHINode &LCSSAPhi : LoopExitBlock->phis()) { -if (LCSSAPhi.getIncomingValue(0) == Phi) { + // Note that LCSSA does not imply single entry when the original scalar loop + // had multiple exiting edges (as we always run the last iteration in the + // scalar epilogue); in that case, the exiting path through middle will be + // dynamically dead
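As an illustration of the relaxed invariant (hand-written IR, not taken from the patch's tests): when two exiting blocks both branch to a common exit, the loop is still in valid LCSSA form even though the exit phi has two entries:

```llvm
; Both exiting blocks reach %exit, so the LCSSA phi for %v has two
; incoming values; nothing in the LCSSA definition forbids this.
exit:
  %v.lcssa = phi i32 [ %v, %header ], [ %v, %latch ]
  ret i32 %v.lcssa
```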
[llvm-branch-commits] [llvm] caafdf0 - [LV] Weaken spuriously strong assert in LoopVersioning
Author: Philip Reames Date: 2021-01-12T12:57:13-08:00 New Revision: caafdf07bbccbe89219539e2b56043c2a98358f1 URL: https://github.com/llvm/llvm-project/commit/caafdf07bbccbe89219539e2b56043c2a98358f1 DIFF: https://github.com/llvm/llvm-project/commit/caafdf07bbccbe89219539e2b56043c2a98358f1.diff LOG: [LV] Weaken spuriously strong assert in LoopVersioning LoopVectorize uses some utilities on LoopVersioning, but doesn't actually use it for, you know, versioning. As a result, the precondition LoopVersioning expects is too strong for this user. At the moment, LoopVectorize supports any loop with a unique exit block, so check the same precondition here. Really, the whole class structure here is a mess. We should separate the actual versioning from the metadata updates, but that's a bigger problem. Added: Modified: llvm/lib/Transforms/Utils/LoopVersioning.cpp Removed: diff --git a/llvm/lib/Transforms/Utils/LoopVersioning.cpp b/llvm/lib/Transforms/Utils/LoopVersioning.cpp index b54aee35d56d..599bd1feb2bc 100644 --- a/llvm/lib/Transforms/Utils/LoopVersioning.cpp +++ b/llvm/lib/Transforms/Utils/LoopVersioning.cpp @@ -44,7 +44,7 @@ LoopVersioning::LoopVersioning(const LoopAccessInfo &LAI, AliasChecks(Checks.begin(), Checks.end()), Preds(LAI.getPSE().getUnionPredicate()), LAI(LAI), LI(LI), DT(DT), SE(SE) { - assert(L->getExitBlock() && "No single exit block"); + assert(L->getUniqueExitBlock() && "No single exit block"); } void LoopVersioning::versionLoop( ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
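The distinction being relied on: Loop::getExitBlock() returns null as soon as there is more than one exiting edge, even when every edge targets the same block, while getUniqueExitBlock() deduplicates the targets. A hand-written illustration of the shape LoopVectorize now accepts:

```llvm
; Two exiting edges, one exit block: getExitBlock() is null here, but
; getUniqueExitBlock() returns %exit, which is all the vectorizer needs.
header:
  br i1 %early, label %exit, label %latch
latch:
  br i1 %done, label %exit, label %header
```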
[llvm-branch-commits] [llvm] 7011086 - [test] Autogen a loop vectorizer test to make future changes visible
Author: Philip Reames Date: 2021-01-17T20:03:22-08:00 New Revision: 7011086dc1cd5575f971db0138a62387939e6a73 URL: https://github.com/llvm/llvm-project/commit/7011086dc1cd5575f971db0138a62387939e6a73 DIFF: https://github.com/llvm/llvm-project/commit/7011086dc1cd5575f971db0138a62387939e6a73.diff LOG: [test] Autogen a loop vectorizer test to make future changes visible Added: Modified: llvm/test/Transforms/LoopVectorize/interleaved-accesses.ll Removed: diff --git a/llvm/test/Transforms/LoopVectorize/interleaved-accesses.ll b/llvm/test/Transforms/LoopVectorize/interleaved-accesses.ll index dbc90bcf4519..0d4bdf0ecac3 100644 --- a/llvm/test/Transforms/LoopVectorize/interleaved-accesses.ll +++ b/llvm/test/Transforms/LoopVectorize/interleaved-accesses.ll @@ -1,3 +1,4 @@ +; NOTE: Assertions have been autogenerated by utils/update_test_checks.py ; RUN: opt -S -loop-vectorize -instcombine -force-vector-width=4 -force-vector-interleave=1 -enable-interleaved-mem-accesses=true -runtime-memory-check-threshold=24 < %s | FileCheck %s target datalayout = "e-m:e-i64:64-i128:128-n32:64-S128" @@ -16,19 +17,48 @@ target datalayout = "e-m:e-i64:64-i128:128-n32:64-S128" ; } ; } -; CHECK-LABEL: @test_array_load2_store2( -; CHECK: %wide.vec = load <8 x i32>, <8 x i32>* %{{.*}}, align 4 -; CHECK: shufflevector <8 x i32> %wide.vec, <8 x i32> poison, <4 x i32> -; CHECK: shufflevector <8 x i32> %wide.vec, <8 x i32> poison, <4 x i32> -; CHECK: add nsw <4 x i32> -; CHECK: mul nsw <4 x i32> -; CHECK: %interleaved.vec = shufflevector <4 x i32> {{.*}}, <8 x i32> -; CHECK: store <8 x i32> %interleaved.vec, <8 x i32>* %{{.*}}, align 4 @AB = common global [1024 x i32] zeroinitializer, align 4 @CD = common global [1024 x i32] zeroinitializer, align 4 define void @test_array_load2_store2(i32 %C, i32 %D) { +; CHECK-LABEL: @test_array_load2_store2( +; CHECK-NEXT: entry: +; CHECK-NEXT:br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]] +; CHECK: vector.ph: +; CHECK-NEXT:[[BROADCAST_SPLATINSERT:%.*]] = insertelement <4 x i32> poison, i32 [[C:%.*]], i32 0 +; CHECK-NEXT:[[BROADCAST_SPLAT:%.*]] = shufflevector <4 x i32> [[BROADCAST_SPLATINSERT]], <4 x i32> poison, <4 x i32> zeroinitializer +; CHECK-NEXT:[[BROADCAST_SPLATINSERT2:%.*]] = insertelement <4 x i32> poison, i32 [[D:%.*]], i32 0 +; CHECK-NEXT:[[BROADCAST_SPLAT3:%.*]] = shufflevector <4 x i32> [[BROADCAST_SPLATINSERT2]], <4 x i32> poison, <4 x i32> zeroinitializer +; CHECK-NEXT:br label [[VECTOR_BODY:%.*]] +; CHECK: vector.body: +; CHECK-NEXT:[[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ] +; CHECK-NEXT:[[OFFSET_IDX:%.*]] = shl i64 [[INDEX]], 1 +; CHECK-NEXT:[[TMP0:%.*]] = getelementptr inbounds [1024 x i32], [1024 x i32]* @AB, i64 0, i64 [[OFFSET_IDX]] +; CHECK-NEXT:[[TMP1:%.*]] = bitcast i32* [[TMP0]] to <8 x i32>* +; CHECK-NEXT:[[WIDE_VEC:%.*]] = load <8 x i32>, <8 x i32>* [[TMP1]], align 4 +; CHECK-NEXT:[[STRIDED_VEC:%.*]] = shufflevector <8 x i32> [[WIDE_VEC]], <8 x i32> poison, <4 x i32> +; CHECK-NEXT:[[STRIDED_VEC1:%.*]] = shufflevector <8 x i32> [[WIDE_VEC]], <8 x i32> poison, <4 x i32> +; CHECK-NEXT:[[TMP2:%.*]] = or i64 [[OFFSET_IDX]], 1 +; CHECK-NEXT:[[TMP3:%.*]] = add nsw <4 x i32> [[STRIDED_VEC]], [[BROADCAST_SPLAT]] +; CHECK-NEXT:[[TMP4:%.*]] = mul nsw <4 x i32> [[STRIDED_VEC1]], [[BROADCAST_SPLAT3]] +; CHECK-NEXT:[[TMP5:%.*]] = getelementptr inbounds [1024 x i32], [1024 x i32]* @CD, i64 0, i64 [[TMP2]] +; CHECK-NEXT:[[TMP6:%.*]] = getelementptr inbounds i32, i32* [[TMP5]], i64 -1 +; CHECK-NEXT:[[TMP7:%.*]] = bitcast 
i32* [[TMP6]] to <8 x i32>* +; CHECK-NEXT:[[INTERLEAVED_VEC:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> [[TMP4]], <8 x i32> +; CHECK-NEXT:store <8 x i32> [[INTERLEAVED_VEC]], <8 x i32>* [[TMP7]], align 4 +; CHECK-NEXT:[[INDEX_NEXT]] = add i64 [[INDEX]], 4 +; CHECK-NEXT:[[TMP8:%.*]] = icmp eq i64 [[INDEX_NEXT]], 512 +; CHECK-NEXT:br i1 [[TMP8]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], [[LOOP0:!llvm.loop !.*]] +; CHECK: middle.block: +; CHECK-NEXT:br i1 true, label [[FOR_END:%.*]], label [[SCALAR_PH]] +; CHECK: scalar.ph: +; CHECK-NEXT:br label [[FOR_BODY:%.*]] +; CHECK: for.body: +; CHECK-NEXT:br i1 undef, label [[FOR_BODY]], label [[FOR_END]], [[LOOP2:!llvm.loop !.*]] +; CHECK: for.end: +; CHECK-NEXT:ret void +; entry: br label %for.body @@ -67,24 +97,48 @@ for.end: ; preds = %for.body ; } ; } -; CHECK-LABEL: @test_struct_array_load3_store3( -; CHECK: %wide.vec = load <12 x i32>, <12 x i32>* {{.*}}, align 4 -; CHECK: shufflevector <12 x i32> %wide.vec, <12 x i32> poison, <4 x i32> -; CHECK: shufflevector <12 x i32> %wide.vec, <12 x i32> poison, <4 x i32
[llvm-branch-commits] [llvm] 8356610 - [test] pre commit a couple more tests for vectorizing multiple exit loops
Author: Philip Reames Date: 2021-01-17T20:29:13-08:00 New Revision: 8356610f8d48ca7ecbb930dd9b987e4269784710 URL: https://github.com/llvm/llvm-project/commit/8356610f8d48ca7ecbb930dd9b987e4269784710 DIFF: https://github.com/llvm/llvm-project/commit/8356610f8d48ca7ecbb930dd9b987e4269784710.diff LOG: [test] pre commit a couple more tests for vectorizing multiple exit loops Added: Modified: llvm/test/Transforms/LoopVectorize/loop-form.ll Removed: diff --git a/llvm/test/Transforms/LoopVectorize/loop-form.ll b/llvm/test/Transforms/LoopVectorize/loop-form.ll index 5b2dd81a395b..91780789088b 100644 --- a/llvm/test/Transforms/LoopVectorize/loop-form.ll +++ b/llvm/test/Transforms/LoopVectorize/loop-form.ll @@ -588,6 +588,140 @@ if.end2: ret i32 1 } +; LCSSA, common value each exit +define i32 @multiple_exit_blocks2(i16* %p, i32 %n) { +; CHECK-LABEL: @multiple_exit_blocks2( +; CHECK-NEXT: entry: +; CHECK-NEXT:br label [[FOR_COND:%.*]] +; CHECK: for.cond: +; CHECK-NEXT:[[I:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[INC:%.*]], [[FOR_BODY:%.*]] ] +; CHECK-NEXT:[[CMP:%.*]] = icmp slt i32 [[I]], [[N:%.*]] +; CHECK-NEXT:br i1 [[CMP]], label [[FOR_BODY]], label [[IF_END:%.*]] +; CHECK: for.body: +; CHECK-NEXT:[[IPROM:%.*]] = sext i32 [[I]] to i64 +; CHECK-NEXT:[[B:%.*]] = getelementptr inbounds i16, i16* [[P:%.*]], i64 [[IPROM]] +; CHECK-NEXT:store i16 0, i16* [[B]], align 4 +; CHECK-NEXT:[[INC]] = add nsw i32 [[I]], 1 +; CHECK-NEXT:[[CMP2:%.*]] = icmp slt i32 [[I]], 2096 +; CHECK-NEXT:br i1 [[CMP2]], label [[FOR_COND]], label [[IF_END2:%.*]] +; CHECK: if.end: +; CHECK-NEXT:[[I_LCSSA:%.*]] = phi i32 [ [[I]], [[FOR_COND]] ] +; CHECK-NEXT:ret i32 [[I_LCSSA]] +; CHECK: if.end2: +; CHECK-NEXT:[[I_LCSSA1:%.*]] = phi i32 [ [[I]], [[FOR_BODY]] ] +; CHECK-NEXT:ret i32 [[I_LCSSA1]] +; +; TAILFOLD-LABEL: @multiple_exit_blocks2( +; TAILFOLD-NEXT: entry: +; TAILFOLD-NEXT:br label [[FOR_COND:%.*]] +; TAILFOLD: for.cond: +; TAILFOLD-NEXT:[[I:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[INC:%.*]], [[FOR_BODY:%.*]] ] +; TAILFOLD-NEXT:[[CMP:%.*]] = icmp slt i32 [[I]], [[N:%.*]] +; TAILFOLD-NEXT:br i1 [[CMP]], label [[FOR_BODY]], label [[IF_END:%.*]] +; TAILFOLD: for.body: +; TAILFOLD-NEXT:[[IPROM:%.*]] = sext i32 [[I]] to i64 +; TAILFOLD-NEXT:[[B:%.*]] = getelementptr inbounds i16, i16* [[P:%.*]], i64 [[IPROM]] +; TAILFOLD-NEXT:store i16 0, i16* [[B]], align 4 +; TAILFOLD-NEXT:[[INC]] = add nsw i32 [[I]], 1 +; TAILFOLD-NEXT:[[CMP2:%.*]] = icmp slt i32 [[I]], 2096 +; TAILFOLD-NEXT:br i1 [[CMP2]], label [[FOR_COND]], label [[IF_END2:%.*]] +; TAILFOLD: if.end: +; TAILFOLD-NEXT:[[I_LCSSA:%.*]] = phi i32 [ [[I]], [[FOR_COND]] ] +; TAILFOLD-NEXT:ret i32 [[I_LCSSA]] +; TAILFOLD: if.end2: +; TAILFOLD-NEXT:[[I_LCSSA1:%.*]] = phi i32 [ [[I]], [[FOR_BODY]] ] +; TAILFOLD-NEXT:ret i32 [[I_LCSSA1]] +; +entry: + br label %for.cond + +for.cond: + %i = phi i32 [ 0, %entry ], [ %inc, %for.body ] + %cmp = icmp slt i32 %i, %n + br i1 %cmp, label %for.body, label %if.end + +for.body: + %iprom = sext i32 %i to i64 + %b = getelementptr inbounds i16, i16* %p, i64 %iprom + store i16 0, i16* %b, align 4 + %inc = add nsw i32 %i, 1 + %cmp2 = icmp slt i32 %i, 2096 + br i1 %cmp2, label %for.cond, label %if.end2 + +if.end: + ret i32 %i + +if.end2: + ret i32 %i +} + +; LCSSA, distinct value each exit +define i32 @multiple_exit_blocks3(i16* %p, i32 %n) { +; CHECK-LABEL: @multiple_exit_blocks3( +; CHECK-NEXT: entry: +; CHECK-NEXT:br label [[FOR_COND:%.*]] +; CHECK: for.cond: +; CHECK-NEXT:[[I:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[INC:%.*]], 
[[FOR_BODY:%.*]] ] +; CHECK-NEXT:[[CMP:%.*]] = icmp slt i32 [[I]], [[N:%.*]] +; CHECK-NEXT:br i1 [[CMP]], label [[FOR_BODY]], label [[IF_END:%.*]] +; CHECK: for.body: +; CHECK-NEXT:[[IPROM:%.*]] = sext i32 [[I]] to i64 +; CHECK-NEXT:[[B:%.*]] = getelementptr inbounds i16, i16* [[P:%.*]], i64 [[IPROM]] +; CHECK-NEXT:store i16 0, i16* [[B]], align 4 +; CHECK-NEXT:[[INC]] = add nsw i32 [[I]], 1 +; CHECK-NEXT:[[CMP2:%.*]] = icmp slt i32 [[I]], 2096 +; CHECK-NEXT:br i1 [[CMP2]], label [[FOR_COND]], label [[IF_END2:%.*]] +; CHECK: if.end: +; CHECK-NEXT:[[I_LCSSA:%.*]] = phi i32 [ [[I]], [[FOR_COND]] ] +; CHECK-NEXT:ret i32 [[I_LCSSA]] +; CHECK: if.end2: +; CHECK-NEXT:[[INC_LCSSA:%.*]] = phi i32 [ [[INC]], [[FOR_BODY]] ] +; CHECK-NEXT:ret i32 [[INC_LCSSA]] +; +; TAILFOLD-LABEL: @multiple_exit_blocks3( +; TAILFOLD-NEXT: entry: +; TAILFOLD-NEXT:br label [[FOR_COND:%.*]] +; TAILFOLD: for.cond: +; TAILFOLD-NEXT:[[I:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[INC:%.*]], [[FOR_BODY:%.*]] ] +; TAILFOLD-NEXT:[[CMP:%.*]] = icmp slt i32 [[I]], [[N:%.*]] +; TAILFOLD-
[llvm-branch-commits] [llvm] ef51eed - [LoopDeletion] Handle inner loops w/untaken backedges
Author: Philip Reames Date: 2021-01-22T16:31:29-08:00 New Revision: ef51eed37b7ed67b3c0e5f70fa61d681ba21787d URL: https://github.com/llvm/llvm-project/commit/ef51eed37b7ed67b3c0e5f70fa61d681ba21787d DIFF: https://github.com/llvm/llvm-project/commit/ef51eed37b7ed67b3c0e5f70fa61d681ba21787d.diff LOG: [LoopDeletion] Handle inner loops w/untaken backedges This builds on the restricted after initial revert form of D93906, and adds back support for breaking backedges of inner loops. It turns out the original invalidation logic wasn't quite right, specifically around the handling of LCSSA. When breaking the backedge of an inner loop, we can cause blocks which were in the outer loop only because they were also included in a sub-loop to be removed from both loops. This results in the exit block set for our original parent loop changing, and thus a need for new LCSSA phi nodes. This case happens when the inner loop has an exit block which is also an exit block of the parent, and there's a block in the child which reaches an exit to said block without also reaching an exit to the parent loop. (I'm describing this in terms of the immediate parent, but the problem is general for any transitive parent in the nest.) The approach implemented here involves a potentially expensive LCSSA rebuild. Perf testing during review didn't show anything concerning, but we may end up needing to revert this if anyone encounters a practical compile time issue. Differential Revision: https://reviews.llvm.org/D94378 Added: Modified: llvm/lib/Transforms/Scalar/LoopDeletion.cpp llvm/lib/Transforms/Utils/LoopUtils.cpp llvm/test/Transforms/IndVarSimplify/X86/pr45360.ll llvm/test/Transforms/LoopDeletion/zero-btc.ll Removed: diff --git a/llvm/lib/Transforms/Scalar/LoopDeletion.cpp b/llvm/lib/Transforms/Scalar/LoopDeletion.cpp index bd5cdeabb9bd..1266c93316fa 100644 --- a/llvm/lib/Transforms/Scalar/LoopDeletion.cpp +++ b/llvm/lib/Transforms/Scalar/LoopDeletion.cpp @@ -151,14 +151,6 @@ breakBackedgeIfNotTaken(Loop *L, DominatorTree &DT, ScalarEvolution &SE, if (!BTC->isZero()) return LoopDeletionResult::Unmodified; - // For non-outermost loops, the tricky case is that we can drop blocks - // out of both inner and outer loops at the same time. This results in - // new exiting block for the outer loop appearing, and possibly needing - // an lcssa phi inserted. (See loop_nest_lcssa test case in zero-btc.ll) - // TODO: We can handle a bunch of cases here without much work, revisit. 
- if (!L->isOutermost()) -return LoopDeletionResult::Unmodified; - breakLoopBackedge(L, DT, SE, LI, MSSA); return LoopDeletionResult::Deleted; } diff --git a/llvm/lib/Transforms/Utils/LoopUtils.cpp b/llvm/lib/Transforms/Utils/LoopUtils.cpp index e6575ee2caf2..8d167923db00 100644 --- a/llvm/lib/Transforms/Utils/LoopUtils.cpp +++ b/llvm/lib/Transforms/Utils/LoopUtils.cpp @@ -761,13 +761,18 @@ void llvm::deleteDeadLoop(Loop *L, DominatorTree *DT, ScalarEvolution *SE, } } +static Loop *getOutermostLoop(Loop *L) { + while (Loop *Parent = L->getParentLoop()) +L = Parent; + return L; +} + void llvm::breakLoopBackedge(Loop *L, DominatorTree &DT, ScalarEvolution &SE, LoopInfo &LI, MemorySSA *MSSA) { - - assert(L->isOutermost() && "Can't yet preserve LCSSA for this case"); auto *Latch = L->getLoopLatch(); assert(Latch && "multiple latches not yet supported"); auto *Header = L->getHeader(); + Loop *OutermostLoop = getOutermostLoop(L); SE.forgetLoop(L); @@ -790,6 +795,14 @@ void llvm::breakLoopBackedge(Loop *L, DominatorTree &DT, ScalarEvolution &SE, // Erase (and destroy) this loop instance. Handles relinking sub-loops // and blocks within the loop as needed. LI.erase(L); + + // If the loop we broke had a parent, then changeToUnreachable might have + // caused a block to be removed from the parent loop (see loop_nest_lcssa + // test case in zero-btc.ll for an example), thus changing the parent's + // exit blocks. If that happened, we need to rebuild LCSSA on the outermost + // loop which might have a had a block removed. + if (OutermostLoop != L) +formLCSSARecursively(*OutermostLoop, DT, &LI, &SE); } diff --git a/llvm/test/Transforms/IndVarSimplify/X86/pr45360.ll b/llvm/test/Transforms/IndVarSimplify/X86/pr45360.ll index d0857fa707b1..397c23cfd3ea 100644 --- a/llvm/test/Transforms/IndVarSimplify/X86/pr45360.ll +++ b/llvm/test/Transforms/IndVarSimplify/X86/pr45360.ll @@ -23,8 +23,8 @@ define dso_local i32 @main() { ; CHECK-NEXT:[[I6:%.*]] = load i32, i32* @a, align 4 ; CHECK-NEXT:[[I24:%.*]] = load i32, i32* @b, align 4 ; CHECK-NEXT:[[D_PROMOTED9:%.*]] = load i32, i32* @d, align 4 -; CHECK-NEXT:br label [[BB1:%.*]] -; CHECK: bb1: +; CHECK-NEXT:br label [[BB13_PREHEADER:%.*]] +; CHECK: bb13.p
[llvm-branch-commits] [llvm] 4b33b23 - Reapply "[LV] Vectorize (some) early and multiple exit loops"" w/fix for builder
Author: Philip Reames Date: 2020-12-28T10:13:28-08:00 New Revision: 4b33b2387787aef5020450cdcc8dde231eb0a5fc URL: https://github.com/llvm/llvm-project/commit/4b33b2387787aef5020450cdcc8dde231eb0a5fc DIFF: https://github.com/llvm/llvm-project/commit/4b33b2387787aef5020450cdcc8dde231eb0a5fc.diff LOG: Reapply "[LV] Vectorize (some) early and multiple exit loops"" w/fix for builder This reverts commit 4ffcd4fe9ac2ee948948f732baa16663eb63f1c7 thus restoring e4df6a40dad. The only change from the original patch is to add "llvm::" before the call to empty(iterator_range). This is a speculative fix for the ambiguity reported on some builders. Added: Modified: llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp llvm/lib/Transforms/Vectorize/LoopVectorize.cpp llvm/test/Transforms/LoopVectorize/control-flow.ll llvm/test/Transforms/LoopVectorize/loop-form.ll llvm/test/Transforms/LoopVectorize/loop-legality-checks.ll Removed: diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp index 60e1cc9a4a59..65b3132dc3f1 100644 --- a/llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp +++ b/llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp @@ -1095,9 +1095,15 @@ bool LoopVectorizationLegality::canVectorizeLoopCFG(Loop *Lp, return false; } - // We must have a single exiting block. - if (!Lp->getExitingBlock()) { -reportVectorizationFailure("The loop must have an exiting block", + // We currently must have a single "exit block" after the loop. Note that + // multiple "exiting blocks" inside the loop are allowed, provided they all + // reach the single exit block. + // TODO: This restriction can be relaxed in the near future, it's here solely + // to allow separation of changes for review. We need to generalize the phi + // update logic in a number of places. + BasicBlock *ExitBB = Lp->getUniqueExitBlock(); + if (!ExitBB) { +reportVectorizationFailure("The loop must have a unique exit block", "loop control flow is not understood by vectorizer", "CFGNotUnderstood", ORE, TheLoop); if (DoExtraAnalysis) @@ -1106,11 +1112,14 @@ bool LoopVectorizationLegality::canVectorizeLoopCFG(Loop *Lp, return false; } - // We only handle bottom-tested loops, i.e. loop in which the condition is - // checked at the end of each iteration. With that we can assume that all - // instructions in the loop are executed the same number of times. - if (Lp->getExitingBlock() != Lp->getLoopLatch()) { -reportVectorizationFailure("The exiting block is not the loop latch", + // The existing code assumes that LCSSA implies that phis are single entry + // (which was true when we had at most a single exiting edge from the latch). + // In general, there's nothing which prevents an LCSSA phi in exit block from + // having two or more values if there are multiple exiting edges leading to + // the exit block. 
(TODO: implement general case) + if (!llvm::empty(ExitBB->phis()) && !ExitBB->getSinglePredecessor()) { +reportVectorizationFailure("The loop must have no live-out values if " + "it has more than one exiting block", "loop control flow is not understood by vectorizer", "CFGNotUnderstood", ORE, TheLoop); if (DoExtraAnalysis) diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp index 5889d5e55339..c48b650c3c3e 100644 --- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp +++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp @@ -837,7 +837,8 @@ class InnerLoopVectorizer { /// Middle Block between the vector and the scalar. BasicBlock *LoopMiddleBlock; - /// The ExitBlock of the scalar loop. + /// The (unique) ExitBlock of the scalar loop. Note that + /// there can be multiple exiting edges reaching this block. BasicBlock *LoopExitBlock; /// The vector loop body. @@ -1548,11 +1549,16 @@ class LoopVectorizationCostModel { return InterleaveInfo.getInterleaveGroup(Instr); } - /// Returns true if an interleaved group requires a scalar iteration - /// to handle accesses with gaps, and there is nothing preventing us from - /// creating a scalar epilogue. + /// Returns true if we're required to use a scalar epilogue for at least + /// the final iteration of the original loop. bool requiresScalarEpilogue() const { -return isScalarEpilogueAllowed() && InterleaveInfo.requiresScalarEpilogue(); +if (!isScalarEpilogueAllowed()) + return false; +// If we might exit from anywhere but the latch, must run the exiting +// iteration in scalar form. +if (TheLoop->getExitingBlock() != TheLoop->getLoopLatch()) + return true; +return InterleaveInfo.requiresScalarEpilogue(); } /// Return
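To illustrate the two relaxed checks above with hand-written IR (an assumed example, not taken from the patch's tests): the loop below has two exiting blocks but a unique, phi-free exit block, so it now passes the CFG legality checks. If %exit instead began with an LCSSA phi merging values from the two exiting edges, the second check would still reject it.

define void @two_exiting_one_exit(i32* %p, i32 %n) {
entry:
  br label %header

header:
  %iv = phi i32 [ 0, %entry ], [ %iv.next, %latch ]
  %early = icmp sgt i32 %iv, %n            ; first exiting edge
  br i1 %early, label %exit, label %latch

latch:
  %addr = getelementptr inbounds i32, i32* %p, i32 %iv
  store i32 0, i32* %addr, align 4
  %iv.next = add nsw i32 %iv, 1
  %done = icmp eq i32 %iv.next, 1024       ; second exiting edge
  br i1 %done, label %exit, label %header

exit:                                      ; unique exit block, no phis
  ret void
}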
[llvm-branch-commits] [llvm] dd6bb36 - [LoopDeletion] Break backedge of loops when known not taken
Author: Philip Reames Date: 2021-01-04T09:19:29-08:00 New Revision: dd6bb367d19e3bf18353e40de54d35480999a930 URL: https://github.com/llvm/llvm-project/commit/dd6bb367d19e3bf18353e40de54d35480999a930 DIFF: https://github.com/llvm/llvm-project/commit/dd6bb367d19e3bf18353e40de54d35480999a930.diff LOG: [LoopDeletion] Break backedge of loops when known not taken The basic idea is that if SCEV can prove the backedge isn't taken, we can go ahead and get rid of the backedge (and thus the loop) while leaving the rest of the control in place. This nicely handles cases with dispatch between multiple exits and internal side effects. Differential Revision: https://reviews.llvm.org/D93906 Added: llvm/test/Transforms/LoopDeletion/zero-btc.ll Modified: llvm/include/llvm/Transforms/Utils/LoopUtils.h llvm/lib/Transforms/Scalar/LoopDeletion.cpp llvm/lib/Transforms/Utils/LoopUtils.cpp llvm/test/Transforms/IndVarSimplify/X86/pr45360.ll llvm/test/Transforms/IndVarSimplify/exit_value_test2.ll llvm/test/Transforms/LoopDeletion/update-scev.ll Removed: diff --git a/llvm/include/llvm/Transforms/Utils/LoopUtils.h b/llvm/include/llvm/Transforms/Utils/LoopUtils.h index b29add4cba0e5..82c0d9e070d78 100644 --- a/llvm/include/llvm/Transforms/Utils/LoopUtils.h +++ b/llvm/include/llvm/Transforms/Utils/LoopUtils.h @@ -179,6 +179,12 @@ bool hoistRegion(DomTreeNode *, AAResults *, LoopInfo *, DominatorTree *, void deleteDeadLoop(Loop *L, DominatorTree *DT, ScalarEvolution *SE, LoopInfo *LI, MemorySSA *MSSA = nullptr); +/// Remove the backedge of the specified loop. Handles loop nests and general +/// loop structures subject to the precondition that the loop has a single +/// latch block. Preserves all listed analyses. +void breakLoopBackedge(Loop *L, DominatorTree &DT, ScalarEvolution &SE, + LoopInfo &LI, MemorySSA *MSSA); + /// Try to promote memory values to scalars by sinking stores out of /// the loop and moving loads to before the loop. We do this by looping over /// the stores in the loop, looking for stores to Must pointers which are diff --git a/llvm/lib/Transforms/Scalar/LoopDeletion.cpp b/llvm/lib/Transforms/Scalar/LoopDeletion.cpp index 065db647561ec..04120032f0f41 100644 --- a/llvm/lib/Transforms/Scalar/LoopDeletion.cpp +++ b/llvm/lib/Transforms/Scalar/LoopDeletion.cpp @@ -26,6 +26,7 @@ #include "llvm/Transforms/Scalar.h" #include "llvm/Transforms/Scalar/LoopPassManager.h" #include "llvm/Transforms/Utils/LoopUtils.h" + using namespace llvm; #define DEBUG_TYPE "loop-delete" @@ -38,6 +39,14 @@ enum class LoopDeletionResult { Deleted, }; +static LoopDeletionResult merge(LoopDeletionResult A, LoopDeletionResult B) { + if (A == LoopDeletionResult::Deleted || B == LoopDeletionResult::Deleted) +return LoopDeletionResult::Deleted; + if (A == LoopDeletionResult::Modified || B == LoopDeletionResult::Modified) +return LoopDeletionResult::Modified; + return LoopDeletionResult::Unmodified; +} + /// Determines if a loop is dead. /// /// This assumes that we've already checked for unique exit and exiting blocks, @@ -126,6 +135,26 @@ static bool isLoopNeverExecuted(Loop *L) { return true; } +/// If we can prove the backedge is untaken, remove it. This destroys the +/// loop, but leaves the (now trivially loop invariant) control flow and +/// side effects (if any) in place. 
+static LoopDeletionResult +breakBackedgeIfNotTaken(Loop *L, DominatorTree &DT, ScalarEvolution &SE, +LoopInfo &LI, MemorySSA *MSSA, +OptimizationRemarkEmitter &ORE) { + assert(L->isLCSSAForm(DT) && "Expected LCSSA!"); + + if (!L->getLoopLatch()) +return LoopDeletionResult::Unmodified; + + auto *BTC = SE.getBackedgeTakenCount(L); + if (!BTC->isZero()) +return LoopDeletionResult::Unmodified; + + breakLoopBackedge(L, DT, SE, LI, MSSA); + return LoopDeletionResult::Deleted; +} + /// Remove a loop if it is dead. /// /// A loop is considered dead if it does not impact the observable behavior of @@ -162,7 +191,6 @@ static LoopDeletionResult deleteLoopIfDead(Loop *L, DominatorTree &DT, return LoopDeletionResult::Unmodified; } - BasicBlock *ExitBlock = L->getUniqueExitBlock(); if (ExitBlock && isLoopNeverExecuted(L)) { @@ -240,6 +268,14 @@ PreservedAnalyses LoopDeletionPass::run(Loop &L, LoopAnalysisManager &AM, // but ORE cannot be preserved (see comment before the pass definition). OptimizationRemarkEmitter ORE(L.getHeader()->getParent()); auto Result = deleteLoopIfDead(&L, AR.DT, AR.SE, AR.LI, AR.MSSA, ORE); + + // If we can prove the backedge isn't taken, just break it and be done. This + // leaves the loop structure in place which means it can handle dispatching + // to the right exit based on whatever loop invariant structure remains. + if (Result != LoopDeletionR
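In IR terms the transform is small; a rough before/after sketch for a trivial zero-trip-count loop (the exact output depends on how the backedge is removed, since the implementation splits the backedge and marks the split block unreachable):

; before: SCEV computes a backedge-taken count of zero
loop:
  store i32 0, i32* @G, align 4
  br i1 false, label %loop, label %exit

; after breakLoopBackedge: the side effect and the branch to %exit
; survive, but there is no longer a loop
loop:
  store i32 0, i32* @G, align 4
  br i1 false, label %dead, label %exit

dead:
  unreachable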
[llvm-branch-commits] [llvm] 7c63aac - Revert "[LoopDeletion] Break backedge of loops when known not taken"
Author: Philip Reames Date: 2021-01-04T09:50:47-08:00 New Revision: 7c63aac7bd4e5ce3402f2ef7c1d5b66047230147 URL: https://github.com/llvm/llvm-project/commit/7c63aac7bd4e5ce3402f2ef7c1d5b66047230147 DIFF: https://github.com/llvm/llvm-project/commit/7c63aac7bd4e5ce3402f2ef7c1d5b66047230147.diff LOG: Revert "[LoopDeletion] Break backedge of loops when known not taken" This reverts commit dd6bb367d19e3bf18353e40de54d35480999a930. Multi-stage builders are showing an assertion failure w/LCSSA not being preserved on entry to IndVars. Reason isn't clear, reverting while investigating. Added: Modified: llvm/include/llvm/Transforms/Utils/LoopUtils.h llvm/lib/Transforms/Scalar/LoopDeletion.cpp llvm/lib/Transforms/Utils/LoopUtils.cpp llvm/test/Transforms/IndVarSimplify/X86/pr45360.ll llvm/test/Transforms/IndVarSimplify/exit_value_test2.ll llvm/test/Transforms/LoopDeletion/update-scev.ll Removed: llvm/test/Transforms/LoopDeletion/zero-btc.ll diff --git a/llvm/include/llvm/Transforms/Utils/LoopUtils.h b/llvm/include/llvm/Transforms/Utils/LoopUtils.h index 82c0d9e070d7..b29add4cba0e 100644 --- a/llvm/include/llvm/Transforms/Utils/LoopUtils.h +++ b/llvm/include/llvm/Transforms/Utils/LoopUtils.h @@ -179,12 +179,6 @@ bool hoistRegion(DomTreeNode *, AAResults *, LoopInfo *, DominatorTree *, void deleteDeadLoop(Loop *L, DominatorTree *DT, ScalarEvolution *SE, LoopInfo *LI, MemorySSA *MSSA = nullptr); -/// Remove the backedge of the specified loop. Handles loop nests and general -/// loop structures subject to the precondition that the loop has a single -/// latch block. Preserves all listed analyses. -void breakLoopBackedge(Loop *L, DominatorTree &DT, ScalarEvolution &SE, - LoopInfo &LI, MemorySSA *MSSA); - /// Try to promote memory values to scalars by sinking stores out of /// the loop and moving loads to before the loop. We do this by looping over /// the stores in the loop, looking for stores to Must pointers which are diff --git a/llvm/lib/Transforms/Scalar/LoopDeletion.cpp b/llvm/lib/Transforms/Scalar/LoopDeletion.cpp index 04120032f0f4..065db647561e 100644 --- a/llvm/lib/Transforms/Scalar/LoopDeletion.cpp +++ b/llvm/lib/Transforms/Scalar/LoopDeletion.cpp @@ -26,7 +26,6 @@ #include "llvm/Transforms/Scalar.h" #include "llvm/Transforms/Scalar/LoopPassManager.h" #include "llvm/Transforms/Utils/LoopUtils.h" - using namespace llvm; #define DEBUG_TYPE "loop-delete" @@ -39,14 +38,6 @@ enum class LoopDeletionResult { Deleted, }; -static LoopDeletionResult merge(LoopDeletionResult A, LoopDeletionResult B) { - if (A == LoopDeletionResult::Deleted || B == LoopDeletionResult::Deleted) -return LoopDeletionResult::Deleted; - if (A == LoopDeletionResult::Modified || B == LoopDeletionResult::Modified) -return LoopDeletionResult::Modified; - return LoopDeletionResult::Unmodified; -} - /// Determines if a loop is dead. /// /// This assumes that we've already checked for unique exit and exiting blocks, @@ -135,26 +126,6 @@ static bool isLoopNeverExecuted(Loop *L) { return true; } -/// If we can prove the backedge is untaken, remove it. This destroys the -/// loop, but leaves the (now trivially loop invariant) control flow and -/// side effects (if any) in place. 
-static LoopDeletionResult -breakBackedgeIfNotTaken(Loop *L, DominatorTree &DT, ScalarEvolution &SE, -LoopInfo &LI, MemorySSA *MSSA, -OptimizationRemarkEmitter &ORE) { - assert(L->isLCSSAForm(DT) && "Expected LCSSA!"); - - if (!L->getLoopLatch()) -return LoopDeletionResult::Unmodified; - - auto *BTC = SE.getBackedgeTakenCount(L); - if (!BTC->isZero()) -return LoopDeletionResult::Unmodified; - - breakLoopBackedge(L, DT, SE, LI, MSSA); - return LoopDeletionResult::Deleted; -} - /// Remove a loop if it is dead. /// /// A loop is considered dead if it does not impact the observable behavior of @@ -191,6 +162,7 @@ static LoopDeletionResult deleteLoopIfDead(Loop *L, DominatorTree &DT, return LoopDeletionResult::Unmodified; } + BasicBlock *ExitBlock = L->getUniqueExitBlock(); if (ExitBlock && isLoopNeverExecuted(L)) { @@ -268,14 +240,6 @@ PreservedAnalyses LoopDeletionPass::run(Loop &L, LoopAnalysisManager &AM, // but ORE cannot be preserved (see comment before the pass definition). OptimizationRemarkEmitter ORE(L.getHeader()->getParent()); auto Result = deleteLoopIfDead(&L, AR.DT, AR.SE, AR.LI, AR.MSSA, ORE); - - // If we can prove the backedge isn't taken, just break it and be done. This - // leaves the loop structure in place which means it can handle dispatching - // to the right exit based on whatever loop invariant structure remains. - if (Result != LoopDeletionResult::Deleted) -Result = merge(Result, breakBackedgeIfNotTaken(&L, AR.DT, AR.SE, AR.LI, -
[llvm-branch-commits] [llvm] 377dcfd - [Tests] Auto update a vectorizer test to simplify future diff
Author: Philip Reames Date: 2021-01-10T12:23:22-08:00 New Revision: 377dcfd5c15d8e2c9e71a171635529052a96e244 URL: https://github.com/llvm/llvm-project/commit/377dcfd5c15d8e2c9e71a171635529052a96e244 DIFF: https://github.com/llvm/llvm-project/commit/377dcfd5c15d8e2c9e71a171635529052a96e244.diff LOG: [Tests] Auto update a vectorizer test to simplify future diff Added: Modified: llvm/test/Transforms/LoopVectorize/first-order-recurrence-complex.ll Removed: diff --git a/llvm/test/Transforms/LoopVectorize/first-order-recurrence-complex.ll b/llvm/test/Transforms/LoopVectorize/first-order-recurrence-complex.ll index a3cdf7bf3e40..208e1a219be8 100644 --- a/llvm/test/Transforms/LoopVectorize/first-order-recurrence-complex.ll +++ b/llvm/test/Transforms/LoopVectorize/first-order-recurrence-complex.ll @@ -1,3 +1,4 @@ +; NOTE: Assertions have been autogenerated by utils/update_test_checks.py ; RUN: opt -loop-vectorize -force-vector-width=4 -force-vector-interleave=1 -S %s | FileCheck %s @@ -7,32 +8,62 @@ ; Test case for PR43398. define void @can_sink_after_store(i32 %x, i32* %ptr, i64 %tc) local_unnamed_addr #0 { -; CHECK-LABEL: vector.ph: -; CHECK:%broadcast.splatinsert = insertelement <4 x i32> poison, i32 %x, i32 0 -; CHECK-NEXT: %broadcast.splat = shufflevector <4 x i32> %broadcast.splatinsert, <4 x i32> poison, <4 x i32> zeroinitializer -; CHECK-NEXT: %vector.recur.init = insertelement <4 x i32> poison, i32 %.pre, i32 3 -; CHECK-NEXT:br label %vector.body - -; CHECK-LABEL: vector.body: -; CHECK-NEXT: %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ] -; CHECK-NEXT: %vector.recur = phi <4 x i32> [ %vector.recur.init, %vector.ph ], [ %wide.load, %vector.body ] -; CHECK-NEXT: %offset.idx = add i64 1, %index -; CHECK-NEXT: %0 = add i64 %offset.idx, 0 -; CHECK-NEXT: %1 = getelementptr inbounds [257 x i32], [257 x i32]* @p, i64 0, i64 %0 -; CHECK-NEXT: %2 = getelementptr inbounds i32, i32* %1, i32 0 -; CHECK-NEXT: %3 = bitcast i32* %2 to <4 x i32>* -; CHECK-NEXT: %wide.load = load <4 x i32>, <4 x i32>* %3, align 4 -; CHECK-NEXT: %4 = shufflevector <4 x i32> %vector.recur, <4 x i32> %wide.load, <4 x i32> -; CHECK-NEXT: %5 = add <4 x i32> %4, %broadcast.splat -; CHECK-NEXT: %6 = add <4 x i32> %5, %wide.load -; CHECK-NEXT: %7 = getelementptr inbounds [257 x i32], [257 x i32]* @q, i64 0, i64 %0 -; CHECK-NEXT: %8 = getelementptr inbounds i32, i32* %7, i32 0 -; CHECK-NEXT: %9 = bitcast i32* %8 to <4 x i32>* -; CHECK-NEXT: store <4 x i32> %6, <4 x i32>* %9, align 4 -; CHECK-NEXT: %index.next = add i64 %index, 4 -; CHECK-NEXT: %10 = icmp eq i64 %index.next, 1996 -; CHECK-NEXT: br i1 %10, label %middle.block, label %vector.body +; CHECK-LABEL: @can_sink_after_store( +; CHECK-NEXT: entry: +; CHECK-NEXT:br label [[PREHEADER:%.*]] +; CHECK: preheader: +; CHECK-NEXT:[[IDX_PHI_TRANS:%.*]] = getelementptr inbounds [257 x i32], [257 x i32]* @p, i64 0, i64 1 +; CHECK-NEXT:[[DOTPRE:%.*]] = load i32, i32* [[IDX_PHI_TRANS]], align 4 +; CHECK-NEXT:br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]] +; CHECK: vector.ph: +; CHECK-NEXT:[[BROADCAST_SPLATINSERT:%.*]] = insertelement <4 x i32> poison, i32 [[X:%.*]], i32 0 +; CHECK-NEXT:[[BROADCAST_SPLAT:%.*]] = shufflevector <4 x i32> [[BROADCAST_SPLATINSERT]], <4 x i32> poison, <4 x i32> zeroinitializer +; CHECK-NEXT:[[VECTOR_RECUR_INIT:%.*]] = insertelement <4 x i32> poison, i32 [[DOTPRE]], i32 3 +; CHECK-NEXT:br label [[VECTOR_BODY:%.*]] +; CHECK: vector.body: +; CHECK-NEXT:[[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], 
[[VECTOR_BODY]] ] +; CHECK-NEXT:[[VECTOR_RECUR:%.*]] = phi <4 x i32> [ [[VECTOR_RECUR_INIT]], [[VECTOR_PH]] ], [ [[WIDE_LOAD:%.*]], [[VECTOR_BODY]] ] +; CHECK-NEXT:[[OFFSET_IDX:%.*]] = add i64 1, [[INDEX]] +; CHECK-NEXT:[[TMP0:%.*]] = add i64 [[OFFSET_IDX]], 0 +; CHECK-NEXT:[[TMP1:%.*]] = getelementptr inbounds [257 x i32], [257 x i32]* @p, i64 0, i64 [[TMP0]] +; CHECK-NEXT:[[TMP2:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i32 0 +; CHECK-NEXT:[[TMP3:%.*]] = bitcast i32* [[TMP2]] to <4 x i32>* +; CHECK-NEXT:[[WIDE_LOAD]] = load <4 x i32>, <4 x i32>* [[TMP3]], align 4 +; CHECK-NEXT:[[TMP4:%.*]] = shufflevector <4 x i32> [[VECTOR_RECUR]], <4 x i32> [[WIDE_LOAD]], <4 x i32> +; CHECK-NEXT:[[TMP5:%.*]] = add <4 x i32> [[TMP4]], [[BROADCAST_SPLAT]] +; CHECK-NEXT:[[TMP6:%.*]] = add <4 x i32> [[TMP5]], [[WIDE_LOAD]] +; CHECK-NEXT:[[TMP7:%.*]] = getelementptr inbounds [257 x i32], [257 x i32]* @q, i64 0, i64 [[TMP0]] +; CHECK-NEXT:[[TMP8:%.*]] = getelementptr inbounds i32, i32* [[TMP7]], i32 0 +; CHECK-NEXT:[[TMP9:%.*]] = bitcast i32* [[TMP8]] to <4 x i32>* +; CHECK-NEXT:store <4 x i32> [[TMP6]], <4 x i32>* [[TMP9]], align 4 +; CHECK-NEX
[llvm-branch-commits] [llvm] 86d6f7e - Precommit tests requested for D93725
Author: Philip Reames Date: 2021-01-10T12:29:34-08:00 New Revision: 86d6f7e90a1deab93e357b8f356e29d4a24fa3ac URL: https://github.com/llvm/llvm-project/commit/86d6f7e90a1deab93e357b8f356e29d4a24fa3ac DIFF: https://github.com/llvm/llvm-project/commit/86d6f7e90a1deab93e357b8f356e29d4a24fa3ac.diff LOG: Precommit tests requested for D93725 Added: Modified: llvm/test/Transforms/LoopVectorize/first-order-recurrence-complex.ll llvm/test/Transforms/LoopVectorize/loop-form.ll Removed: diff --git a/llvm/test/Transforms/LoopVectorize/first-order-recurrence-complex.ll b/llvm/test/Transforms/LoopVectorize/first-order-recurrence-complex.ll index 208e1a219be8..ef3d3e659e5a 100644 --- a/llvm/test/Transforms/LoopVectorize/first-order-recurrence-complex.ll +++ b/llvm/test/Transforms/LoopVectorize/first-order-recurrence-complex.ll @@ -432,3 +432,94 @@ loop.latch:; preds = %if.then122, %for.b exit: ret void } + +; A recurrence in a multiple exit loop. +define i16 @multiple_exit(i16* %p, i32 %n) { +; CHECK-LABEL: @multiple_exit( +; CHECK-NEXT: entry: +; CHECK-NEXT:br label [[FOR_COND:%.*]] +; CHECK: for.cond: +; CHECK-NEXT:[[I:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[INC:%.*]], [[FOR_BODY:%.*]] ] +; CHECK-NEXT:[[REC:%.*]] = phi i16 [ 0, [[ENTRY]] ], [ [[REC_NEXT:%.*]], [[FOR_BODY]] ] +; CHECK-NEXT:[[IPROM:%.*]] = sext i32 [[I]] to i64 +; CHECK-NEXT:[[B:%.*]] = getelementptr inbounds i16, i16* [[P:%.*]], i64 [[IPROM]] +; CHECK-NEXT:[[REC_NEXT]] = load i16, i16* [[B]], align 2 +; CHECK-NEXT:[[CMP:%.*]] = icmp slt i32 [[I]], [[N:%.*]] +; CHECK-NEXT:br i1 [[CMP]], label [[FOR_BODY]], label [[IF_END:%.*]] +; CHECK: for.body: +; CHECK-NEXT:store i16 [[REC]], i16* [[B]], align 4 +; CHECK-NEXT:[[INC]] = add nsw i32 [[I]], 1 +; CHECK-NEXT:[[CMP2:%.*]] = icmp slt i32 [[I]], 2096 +; CHECK-NEXT:br i1 [[CMP2]], label [[FOR_COND]], label [[IF_END]] +; CHECK: if.end: +; CHECK-NEXT:[[REC_LCSSA:%.*]] = phi i16 [ [[REC]], [[FOR_BODY]] ], [ [[REC]], [[FOR_COND]] ] +; CHECK-NEXT:ret i16 [[REC_LCSSA]] +; +entry: + br label %for.cond + +for.cond: + %i = phi i32 [ 0, %entry ], [ %inc, %for.body ] + %rec = phi i16 [0, %entry], [ %rec.next, %for.body ] + %iprom = sext i32 %i to i64 + %b = getelementptr inbounds i16, i16* %p, i64 %iprom + %rec.next = load i16, i16* %b + %cmp = icmp slt i32 %i, %n + br i1 %cmp, label %for.body, label %if.end + +for.body: + store i16 %rec , i16* %b, align 4 + %inc = add nsw i32 %i, 1 + %cmp2 = icmp slt i32 %i, 2096 + br i1 %cmp2, label %for.cond, label %if.end + +if.end: + ret i16 %rec +} + + +; A multiple exit case where one of the exiting edges involves a value +; from the recurrence and one does not. 
+define i16 @multiple_exit2(i16* %p, i32 %n) { +; CHECK-LABEL: @multiple_exit2( +; CHECK-NEXT: entry: +; CHECK-NEXT:br label [[FOR_COND:%.*]] +; CHECK: for.cond: +; CHECK-NEXT:[[I:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[INC:%.*]], [[FOR_BODY:%.*]] ] +; CHECK-NEXT:[[REC:%.*]] = phi i16 [ 0, [[ENTRY]] ], [ [[REC_NEXT:%.*]], [[FOR_BODY]] ] +; CHECK-NEXT:[[IPROM:%.*]] = sext i32 [[I]] to i64 +; CHECK-NEXT:[[B:%.*]] = getelementptr inbounds i16, i16* [[P:%.*]], i64 [[IPROM]] +; CHECK-NEXT:[[REC_NEXT]] = load i16, i16* [[B]], align 2 +; CHECK-NEXT:[[CMP:%.*]] = icmp slt i32 [[I]], [[N:%.*]] +; CHECK-NEXT:br i1 [[CMP]], label [[FOR_BODY]], label [[IF_END:%.*]] +; CHECK: for.body: +; CHECK-NEXT:store i16 [[REC]], i16* [[B]], align 4 +; CHECK-NEXT:[[INC]] = add nsw i32 [[I]], 1 +; CHECK-NEXT:[[CMP2:%.*]] = icmp slt i32 [[I]], 2096 +; CHECK-NEXT:br i1 [[CMP2]], label [[FOR_COND]], label [[IF_END]] +; CHECK: if.end: +; CHECK-NEXT:[[REC_LCSSA:%.*]] = phi i16 [ [[REC]], [[FOR_COND]] ], [ 10, [[FOR_BODY]] ] +; CHECK-NEXT:ret i16 [[REC_LCSSA]] +; +entry: + br label %for.cond + +for.cond: + %i = phi i32 [ 0, %entry ], [ %inc, %for.body ] + %rec = phi i16 [0, %entry], [ %rec.next, %for.body ] + %iprom = sext i32 %i to i64 + %b = getelementptr inbounds i16, i16* %p, i64 %iprom + %rec.next = load i16, i16* %b + %cmp = icmp slt i32 %i, %n + br i1 %cmp, label %for.body, label %if.end + +for.body: + store i16 %rec , i16* %b, align 4 + %inc = add nsw i32 %i, 1 + %cmp2 = icmp slt i32 %i, 2096 + br i1 %cmp2, label %for.cond, label %if.end + +if.end: + %rec.lcssa = phi i16 [ %rec, %for.cond ], [ 10, %for.body ] + ret i16 %rec.lcssa +} diff --git a/llvm/test/Transforms/LoopVectorize/loop-form.ll b/llvm/test/Transforms/LoopVectorize/loop-form.ll index f93c038de6bb..bf94505aec2c 100644 --- a/llvm/test/Transforms/LoopVectorize/loop-form.ll +++ b/llvm/test/Transforms/LoopVectorize/loop-form.ll @@ -869,3 +869,126 @@ loop.latch: exit: ret void } + +define i32 @reduction(i32* %addr) { +; CHECK-LABEL: @redu
[llvm-branch-commits] [llvm] fc8ab25 - [Tests] Precommit tests from to simplify rebase
Author: Philip Reames Date: 2021-01-10T12:42:08-08:00 New Revision: fc8ab254472972816956c69d16e8b35bc91cc2ab URL: https://github.com/llvm/llvm-project/commit/fc8ab254472972816956c69d16e8b35bc91cc2ab DIFF: https://github.com/llvm/llvm-project/commit/fc8ab254472972816956c69d16e8b35bc91cc2ab.diff LOG: [Tests] Precommit tests from to simplify rebase Added: llvm/test/Transforms/LoopDeletion/zero-btc.ll Modified: Removed: diff --git a/llvm/test/Transforms/LoopDeletion/zero-btc.ll b/llvm/test/Transforms/LoopDeletion/zero-btc.ll new file mode 100644 index ..b56e30e8f1be --- /dev/null +++ b/llvm/test/Transforms/LoopDeletion/zero-btc.ll @@ -0,0 +1,319 @@ +; NOTE: Assertions have been autogenerated by utils/update_test_checks.py +; RUN: opt < %s -loop-deletion -S | FileCheck %s + +@G = external global i32 + +define void @test_trivial() { +; CHECK-LABEL: @test_trivial( +; CHECK-NEXT: entry: +; CHECK-NEXT:br label [[LOOP:%.*]] +; CHECK: loop: +; CHECK-NEXT:store i32 0, i32* @G, align 4 +; CHECK-NEXT:br i1 false, label [[LOOP]], label [[EXIT:%.*]] +; CHECK: exit: +; CHECK-NEXT:ret void +; +entry: + br label %loop + +loop: + store i32 0, i32* @G + br i1 false, label %loop, label %exit + +exit: + ret void +} + + +define void @test_bottom_tested() { +; CHECK-LABEL: @test_bottom_tested( +; CHECK-NEXT: entry: +; CHECK-NEXT:br label [[LOOP:%.*]] +; CHECK: loop: +; CHECK-NEXT:[[IV:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[IV_INC:%.*]], [[LOOP]] ] +; CHECK-NEXT:store i32 0, i32* @G, align 4 +; CHECK-NEXT:[[IV_INC]] = add i32 [[IV]], 1 +; CHECK-NEXT:[[BE_TAKEN:%.*]] = icmp ne i32 [[IV_INC]], 1 +; CHECK-NEXT:br i1 [[BE_TAKEN]], label [[LOOP]], label [[EXIT:%.*]] +; CHECK: exit: +; CHECK-NEXT:ret void +; +entry: + br label %loop + +loop: + %iv = phi i32 [ 0, %entry], [ %iv.inc, %loop ] + store i32 0, i32* @G + %iv.inc = add i32 %iv, 1 + %be_taken = icmp ne i32 %iv.inc, 1 + br i1 %be_taken, label %loop, label %exit + +exit: + ret void +} + +define void @test_early_exit() { +; CHECK-LABEL: @test_early_exit( +; CHECK-NEXT: entry: +; CHECK-NEXT:br label [[LOOP:%.*]] +; CHECK: loop: +; CHECK-NEXT:[[IV:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[IV_INC:%.*]], [[LATCH:%.*]] ] +; CHECK-NEXT:store i32 0, i32* @G, align 4 +; CHECK-NEXT:[[IV_INC]] = add i32 [[IV]], 1 +; CHECK-NEXT:[[BE_TAKEN:%.*]] = icmp ne i32 [[IV_INC]], 1 +; CHECK-NEXT:br i1 [[BE_TAKEN]], label [[LATCH]], label [[EXIT:%.*]] +; CHECK: latch: +; CHECK-NEXT:br label [[LOOP]] +; CHECK: exit: +; CHECK-NEXT:ret void +; +entry: + br label %loop + +loop: + %iv = phi i32 [ 0, %entry], [ %iv.inc, %latch ] + store i32 0, i32* @G + %iv.inc = add i32 %iv, 1 + %be_taken = icmp ne i32 %iv.inc, 1 + br i1 %be_taken, label %latch, label %exit +latch: + br label %loop + +exit: + ret void +} + +define void @test_multi_exit1() { +; CHECK-LABEL: @test_multi_exit1( +; CHECK-NEXT: entry: +; CHECK-NEXT:br label [[LOOP:%.*]] +; CHECK: loop: +; CHECK-NEXT:[[IV:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[IV_INC:%.*]], [[LATCH:%.*]] ] +; CHECK-NEXT:store i32 0, i32* @G, align 4 +; CHECK-NEXT:[[IV_INC]] = add i32 [[IV]], 1 +; CHECK-NEXT:[[BE_TAKEN:%.*]] = icmp ne i32 [[IV_INC]], 1 +; CHECK-NEXT:br i1 [[BE_TAKEN]], label [[LATCH]], label [[EXIT:%.*]] +; CHECK: latch: +; CHECK-NEXT:store i32 1, i32* @G, align 4 +; CHECK-NEXT:[[COND2:%.*]] = icmp ult i32 [[IV_INC]], 30 +; CHECK-NEXT:br i1 [[COND2]], label [[LOOP]], label [[EXIT]] +; CHECK: exit: +; CHECK-NEXT:ret void +; +entry: + br label %loop + +loop: + %iv = phi i32 [ 0, %entry], [ %iv.inc, %latch ] + store i32 0, i32* @G + %iv.inc = add 
i32 %iv, 1 + %be_taken = icmp ne i32 %iv.inc, 1 + br i1 %be_taken, label %latch, label %exit +latch: + store i32 1, i32* @G + %cond2 = icmp ult i32 %iv.inc, 30 + br i1 %cond2, label %loop, label %exit + +exit: + ret void +} + +define void @test_multi_exit2() { +; CHECK-LABEL: @test_multi_exit2( +; CHECK-NEXT: entry: +; CHECK-NEXT:br label [[LOOP:%.*]] +; CHECK: loop: +; CHECK-NEXT:store i32 0, i32* @G, align 4 +; CHECK-NEXT:br i1 true, label [[LATCH:%.*]], label [[EXIT:%.*]] +; CHECK: latch: +; CHECK-NEXT:store i32 1, i32* @G, align 4 +; CHECK-NEXT:br i1 false, label [[LOOP]], label [[EXIT]] +; CHECK: exit: +; CHECK-NEXT:ret void +; +entry: + br label %loop + +loop: + store i32 0, i32* @G + br i1 true, label %latch, label %exit +latch: + store i32 1, i32* @G + br i1 false, label %loop, label %exit + +exit: + ret void +} + +; TODO: SCEV seems not to recognize this as a zero btc loop +define void @test_multi_exit3(i1 %cond1) { +; CHECK-LABEL: @test_multi_exit3( +; CHECK-NEXT: entry: +; CHECK-NEXT:br label [[LOOP:%.*]] +; CHECK: loop: +;
[llvm-branch-commits] [llvm] 4739dd6 - [LoopDeletion] Break backedge of outermost loops when known not taken
Author: Philip Reames Date: 2021-01-10T16:02:33-08:00 New Revision: 4739dd67e7a08b715f1d23f71fb4af16007fe80a URL: https://github.com/llvm/llvm-project/commit/4739dd67e7a08b715f1d23f71fb4af16007fe80a DIFF: https://github.com/llvm/llvm-project/commit/4739dd67e7a08b715f1d23f71fb4af16007fe80a.diff LOG: [LoopDeletion] Break backedge of outermost loops when known not taken This is a resubmit of dd6bb367 (which was reverted due to stage2 build failures in 7c63aac), with the additional restriction added to the transform to only consider outermost loops. As shown in the added test case, ensuring LCSSA is up to date when deleting an inner loop is tricky as we may actually need to remove blocks from any outer loops, thus changing the exit block set. For the moment, just avoid transforming this case. I plan to return to this case in a follow-up patch and see if we can do better. Original commit message follows... The basic idea is that if SCEV can prove the backedge isn't taken, we can go ahead and get rid of the backedge (and thus the loop) while leaving the rest of the control in place. This nicely handles cases with dispatch between multiple exits and internal side effects. Differential Revision: https://reviews.llvm.org/D93906 Added: Modified: llvm/include/llvm/Transforms/Utils/LoopUtils.h llvm/lib/Transforms/Scalar/LoopDeletion.cpp llvm/lib/Transforms/Utils/LoopUtils.cpp llvm/test/Transforms/IndVarSimplify/exit_value_test2.ll llvm/test/Transforms/LoopDeletion/update-scev.ll llvm/test/Transforms/LoopDeletion/zero-btc.ll Removed: diff --git a/llvm/include/llvm/Transforms/Utils/LoopUtils.h b/llvm/include/llvm/Transforms/Utils/LoopUtils.h index 80c6b09d9cf0..940747b5b2ea 100644 --- a/llvm/include/llvm/Transforms/Utils/LoopUtils.h +++ b/llvm/include/llvm/Transforms/Utils/LoopUtils.h @@ -179,6 +179,12 @@ bool hoistRegion(DomTreeNode *, AAResults *, LoopInfo *, DominatorTree *, void deleteDeadLoop(Loop *L, DominatorTree *DT, ScalarEvolution *SE, LoopInfo *LI, MemorySSA *MSSA = nullptr); +/// Remove the backedge of the specified loop. Handles loop nests and general +/// loop structures subject to the precondition that the loop has no parent +/// loop and has a single latch block. Preserves all listed analyses. +void breakLoopBackedge(Loop *L, DominatorTree &DT, ScalarEvolution &SE, + LoopInfo &LI, MemorySSA *MSSA); + /// Try to promote memory values to scalars by sinking stores out of /// the loop and moving loads to before the loop. We do this by looping over /// the stores in the loop, looking for stores to Must pointers which are diff --git a/llvm/lib/Transforms/Scalar/LoopDeletion.cpp b/llvm/lib/Transforms/Scalar/LoopDeletion.cpp index a94676eadeab..bd5cdeabb9bd 100644 --- a/llvm/lib/Transforms/Scalar/LoopDeletion.cpp +++ b/llvm/lib/Transforms/Scalar/LoopDeletion.cpp @@ -26,6 +26,7 @@ #include "llvm/Transforms/Scalar.h" #include "llvm/Transforms/Scalar/LoopPassManager.h" #include "llvm/Transforms/Utils/LoopUtils.h" + using namespace llvm; #define DEBUG_TYPE "loop-delete" @@ -38,6 +39,14 @@ enum class LoopDeletionResult { Deleted, }; +static LoopDeletionResult merge(LoopDeletionResult A, LoopDeletionResult B) { + if (A == LoopDeletionResult::Deleted || B == LoopDeletionResult::Deleted) +return LoopDeletionResult::Deleted; + if (A == LoopDeletionResult::Modified || B == LoopDeletionResult::Modified) +return LoopDeletionResult::Modified; + return LoopDeletionResult::Unmodified; +} + /// Determines if a loop is dead.
/// /// This assumes that we've already checked for unique exit and exiting blocks, @@ -126,6 +135,34 @@ static bool isLoopNeverExecuted(Loop *L) { return true; } +/// If we can prove the backedge is untaken, remove it. This destroys the +/// loop, but leaves the (now trivially loop invariant) control flow and +/// side effects (if any) in place. +static LoopDeletionResult +breakBackedgeIfNotTaken(Loop *L, DominatorTree &DT, ScalarEvolution &SE, +LoopInfo &LI, MemorySSA *MSSA, +OptimizationRemarkEmitter &ORE) { + assert(L->isLCSSAForm(DT) && "Expected LCSSA!"); + + if (!L->getLoopLatch()) +return LoopDeletionResult::Unmodified; + + auto *BTC = SE.getBackedgeTakenCount(L); + if (!BTC->isZero()) +return LoopDeletionResult::Unmodified; + + // For non-outermost loops, the tricky case is that we can drop blocks + // out of both inner and outer loops at the same time. This results in + // new exiting block for the outer loop appearing, and possibly needing + // an lcssa phi inserted. (See loop_nest_lcssa test case in zero-btc.ll) + // TODO: We can handle a bunch of cases here without much work, revisit. + if (!L->isOutermost()) +return LoopDeletionResult::Unmodified; + + breakLoopBackedge(L, DT, SE, LI, MSSA)
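For the 'dispatch between multiple exits' case the original log mentions, a hypothetical before/after sketch; breaking the untaken backedge leaves loop-invariant straight-line control flow that still selects the correct exit:

; before: backedge provably untaken, two exits, side effects in both blocks
loop:
  store i32 0, i32* @G, align 4
  br i1 %c, label %exit.a, label %latch
latch:
  store i32 1, i32* @G, align 4
  br i1 false, label %loop, label %exit.b

; after: same blocks and stores, backedge removed
loop:
  store i32 0, i32* @G, align 4
  br i1 %c, label %exit.a, label %latch
latch:
  store i32 1, i32* @G, align 4
  br i1 false, label %dead, label %exit.b
dead:
  unreachable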
[llvm-branch-commits] [llvm] f5fe849 - [LAA] Relax restrictions on early exits in loop structure
Author: Philip Reames Date: 2020-12-14T12:44:01-08:00 New Revision: f5fe8493e5acfd70da61993cd370816978b9ef85 URL: https://github.com/llvm/llvm-project/commit/f5fe8493e5acfd70da61993cd370816978b9ef85 DIFF: https://github.com/llvm/llvm-project/commit/f5fe8493e5acfd70da61993cd370816978b9ef85.diff LOG: [LAA] Relax restrictions on early exits in loop structure This is a preparation patch for supporting multiple exits in the loop vectorizer; by itself it should be mostly NFC. This patch moves the loop structure checks from LAA to their respective consumers (where duplicates don't already exist). Moving the checks does end up changing some of the optimization warnings and debug output slightly, but nothing that appears to be a regression. Why do this? Well, after auditing the code, I can't actually find anything in LAA itself which relies on having all instructions within a loop execute an equal number of times. This patch simply makes this explicit so that if one consumer - say LV in the near future (hopefully) - wants to handle a broader class of loops, it can do so. Differential Revision: https://reviews.llvm.org/D92066 Added: Modified: llvm/lib/Analysis/LoopAccessAnalysis.cpp llvm/lib/Transforms/Scalar/LoopDistribute.cpp llvm/lib/Transforms/Scalar/LoopLoadElimination.cpp llvm/lib/Transforms/Utils/LoopVersioning.cpp Removed: diff --git a/llvm/lib/Analysis/LoopAccessAnalysis.cpp b/llvm/lib/Analysis/LoopAccessAnalysis.cpp index 65d39161c1be..be340a3b3130 100644 --- a/llvm/lib/Analysis/LoopAccessAnalysis.cpp +++ b/llvm/lib/Analysis/LoopAccessAnalysis.cpp @@ -1781,26 +1781,6 @@ bool LoopAccessInfo::canAnalyzeLoop() { return false; } - // We must have a single exiting block. - if (!TheLoop->getExitingBlock()) { -LLVM_DEBUG( -dbgs() << "LAA: loop control flow is not understood by analyzer\n"); -recordAnalysis("CFGNotUnderstood") -<< "loop control flow is not understood by analyzer"; -return false; - } - - // We only handle bottom-tested loops, i.e. loop in which the condition is - // checked at the end of each iteration. With that we can assume that all - // instructions in the loop are executed the same number of times. - if (TheLoop->getExitingBlock() != TheLoop->getLoopLatch()) { -LLVM_DEBUG( -dbgs() << "LAA: loop control flow is not understood by analyzer\n"); -recordAnalysis("CFGNotUnderstood") -<< "loop control flow is not understood by analyzer"; -return false; - } - // ScalarEvolution needs to be able to find the exit count. const SCEV *ExitCount = PSE->getBackedgeTakenCount(); if (isa(ExitCount)) { diff --git a/llvm/lib/Transforms/Scalar/LoopDistribute.cpp b/llvm/lib/Transforms/Scalar/LoopDistribute.cpp index 98d67efef922..3dd7d9dce67a 100644 --- a/llvm/lib/Transforms/Scalar/LoopDistribute.cpp +++ b/llvm/lib/Transforms/Scalar/LoopDistribute.cpp @@ -670,15 +670,17 @@ class LoopDistributeForLoop { << L->getHeader()->getParent()->getName() << "\" checking " << *L << "\n"); +// Having a single exit block implies there's also one exiting block. if (!L->getExitBlock()) return fail("MultipleExitBlocks", "multiple exit blocks"); if (!L->isLoopSimplifyForm()) return fail("NotLoopSimplifyForm", "loop is not in loop-simplify form"); +if (!L->isRotatedForm()) + return fail("NotBottomTested", "loop is not bottom tested"); BasicBlock *PH = L->getLoopPreheader(); -// LAA will check that we only have a single exiting block.
LAI = &GetLAA(*L); // Currently, we only distribute to isolate the part of the loop with diff --git a/llvm/lib/Transforms/Scalar/LoopLoadElimination.cpp b/llvm/lib/Transforms/Scalar/LoopLoadElimination.cpp index 475448740ae4..56afddead619 100644 --- a/llvm/lib/Transforms/Scalar/LoopLoadElimination.cpp +++ b/llvm/lib/Transforms/Scalar/LoopLoadElimination.cpp @@ -632,6 +632,9 @@ eliminateLoadsAcrossLoops(Function &F, LoopInfo &LI, DominatorTree &DT, // Now walk the identified inner loops. for (Loop *L : Worklist) { +// Match historical behavior +if (!L->isRotatedForm() || !L->getExitingBlock()) + continue; // The actual work is performed by LoadEliminationForLoop. LoadEliminationForLoop LEL(L, &LI, GetLAI(*L), &DT, BFI, PSI); Changed |= LEL.processLoop(); diff --git a/llvm/lib/Transforms/Utils/LoopVersioning.cpp b/llvm/lib/Transforms/Utils/LoopVersioning.cpp index 03eb41b5ee0d..b605cb2fb865 100644 --- a/llvm/lib/Transforms/Utils/LoopVersioning.cpp +++ b/llvm/lib/Transforms/Utils/LoopVersioning.cpp @@ -269,8 +269,11 @@ bool runImpl(LoopInfo *LI, function_ref GetLAA, // Now walk the identified inner loops. bool Changed = false; for (Loop *L : Worklist) { +if (!L->isLoopSimplifyForm() || !L->isRotatedForm() ||
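For reference, a hand-written example of the loop shape these consumer-side bailouts screen out: a top-tested loop whose exiting block is not the latch. After this patch LAA itself no longer rejects it; each consumer decides.

define void @top_tested(i16* %p, i32 %n) {
entry:
  br label %for.cond

for.cond:                             ; the exiting block...
  %i = phi i32 [ 0, %entry ], [ %inc, %for.body ]
  %cmp = icmp slt i32 %i, %n
  br i1 %cmp, label %for.body, label %exit

for.body:                             ; ...is not the latch
  %iprom = sext i32 %i to i64
  %b = getelementptr inbounds i16, i16* %p, i64 %iprom
  store i16 0, i16* %b, align 2
  %inc = add nsw i32 %i, 1
  br label %for.cond

exit:
  ret void
}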
[llvm-branch-commits] [clang] 3b3eb7f - Speculative fix for build bot failures
Author: Philip Reames Date: 2020-12-14T13:44:40-08:00 New Revision: 3b3eb7f07ff97feb64a1975587bb473f1f3efa6b URL: https://github.com/llvm/llvm-project/commit/3b3eb7f07ff97feb64a1975587bb473f1f3efa6b DIFF: https://github.com/llvm/llvm-project/commit/3b3eb7f07ff97feb64a1975587bb473f1f3efa6b.diff LOG: Speculative fix for build bot failures (The clang build fails for me locally, so this is based on build bot output and a guess as to root cause.) f5fe849 made the execution of LAA conditional, so I'm guessing that's the root cause. Added: Modified: clang/test/CodeGen/thinlto-distributed-newpm.ll Removed: diff --git a/clang/test/CodeGen/thinlto-distributed-newpm.ll b/clang/test/CodeGen/thinlto-distributed-newpm.ll index 75ea4064d6af..8fe53762837e 100644 --- a/clang/test/CodeGen/thinlto-distributed-newpm.ll +++ b/clang/test/CodeGen/thinlto-distributed-newpm.ll @@ -183,7 +183,6 @@ ; CHECK-O: Running analysis: PostDominatorTreeAnalysis on main ; CHECK-O: Running analysis: DemandedBitsAnalysis on main ; CHECK-O: Running pass: LoopLoadEliminationPass on main -; CHECK-O: Running analysis: LoopAccessAnalysis on Loop at depth 1 containing: %b ; CHECK-O: Running pass: InstCombinePass on main ; CHECK-O: Running pass: SimplifyCFGPass on main ; CHECK-O: Running pass: SLPVectorizerPass on main ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] 99ac886 - [tests][LV] precommit tests for D93317
Author: Philip Reames Date: 2020-12-15T10:53:34-08:00 New Revision: 99ac8868cfb403aeffe5b3f13e3487eed79e67b9 URL: https://github.com/llvm/llvm-project/commit/99ac8868cfb403aeffe5b3f13e3487eed79e67b9 DIFF: https://github.com/llvm/llvm-project/commit/99ac8868cfb403aeffe5b3f13e3487eed79e67b9.diff LOG: [tests][LV] precommit tests for D93317 Added: Modified: llvm/test/Transforms/LoopVectorize/loop-form.ll Removed: diff --git a/llvm/test/Transforms/LoopVectorize/loop-form.ll b/llvm/test/Transforms/LoopVectorize/loop-form.ll index 3bbe8100e34e..cebe7844bb11 100644 --- a/llvm/test/Transforms/LoopVectorize/loop-form.ll +++ b/llvm/test/Transforms/LoopVectorize/loop-form.ll @@ -1,16 +1,80 @@ +; NOTE: Assertions have been autogenerated by utils/update_test_checks.py ; RUN: opt -S -loop-vectorize < %s | FileCheck %s target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128" -; Check that we vectorize only bottom-tested loops. -; This is a reduced testcase from PR21302. +define void @bottom_tested(i16* %p, i32 %n) { +; CHECK-LABEL: @bottom_tested( +; CHECK-NEXT: entry: +; CHECK-NEXT:[[TMP0:%.*]] = icmp sgt i32 [[N:%.*]], 0 +; CHECK-NEXT:[[SMAX:%.*]] = select i1 [[TMP0]], i32 [[N]], i32 0 +; CHECK-NEXT:[[TMP1:%.*]] = add nuw i32 [[SMAX]], 1 +; CHECK-NEXT:[[MIN_ITERS_CHECK:%.*]] = icmp ult i32 [[TMP1]], 2 +; CHECK-NEXT:br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]] +; CHECK: vector.ph: +; CHECK-NEXT:[[N_MOD_VF:%.*]] = urem i32 [[TMP1]], 2 +; CHECK-NEXT:[[N_VEC:%.*]] = sub i32 [[TMP1]], [[N_MOD_VF]] +; CHECK-NEXT:br label [[VECTOR_BODY:%.*]] +; CHECK: vector.body: +; CHECK-NEXT:[[INDEX:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ] +; CHECK-NEXT:[[TMP2:%.*]] = add i32 [[INDEX]], 0 +; CHECK-NEXT:[[TMP3:%.*]] = sext i32 [[TMP2]] to i64 +; CHECK-NEXT:[[TMP4:%.*]] = getelementptr inbounds i16, i16* [[P:%.*]], i64 [[TMP3]] +; CHECK-NEXT:[[TMP5:%.*]] = getelementptr inbounds i16, i16* [[TMP4]], i32 0 +; CHECK-NEXT:[[TMP6:%.*]] = bitcast i16* [[TMP5]] to <2 x i16>* +; CHECK-NEXT:store <2 x i16> zeroinitializer, <2 x i16>* [[TMP6]], align 4 +; CHECK-NEXT:[[INDEX_NEXT]] = add i32 [[INDEX]], 2 +; CHECK-NEXT:[[TMP7:%.*]] = icmp eq i32 [[INDEX_NEXT]], [[N_VEC]] +; CHECK-NEXT:br i1 [[TMP7]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], [[LOOP0:!llvm.loop !.*]] +; CHECK: middle.block: +; CHECK-NEXT:[[CMP_N:%.*]] = icmp eq i32 [[TMP1]], [[N_VEC]] +; CHECK-NEXT:br i1 [[CMP_N]], label [[IF_END:%.*]], label [[SCALAR_PH]] +; CHECK: scalar.ph: +; CHECK-NEXT:[[BC_RESUME_VAL:%.*]] = phi i32 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ] +; CHECK-NEXT:br label [[FOR_COND:%.*]] +; CHECK: for.cond: +; CHECK-NEXT:[[I:%.*]] = phi i32 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INC:%.*]], [[FOR_COND]] ] +; CHECK-NEXT:[[IPROM:%.*]] = sext i32 [[I]] to i64 +; CHECK-NEXT:[[B:%.*]] = getelementptr inbounds i16, i16* [[P]], i64 [[IPROM]] +; CHECK-NEXT:store i16 0, i16* [[B]], align 4 +; CHECK-NEXT:[[INC]] = add nsw i32 [[I]], 1 +; CHECK-NEXT:[[CMP:%.*]] = icmp slt i32 [[I]], [[N]] +; CHECK-NEXT:br i1 [[CMP]], label [[FOR_COND]], label [[IF_END]], [[LOOP2:!llvm.loop !.*]] +; CHECK: if.end: +; CHECK-NEXT:ret void ; -; rdar://problem/18886083 +entry: + br label %for.cond + +for.cond: + %i = phi i32 [ 0, %entry ], [ %inc, %for.cond ] + %iprom = sext i32 %i to i64 + %b = getelementptr inbounds i16, i16* %p, i64 %iprom + store i16 0, i16* %b, align 4 + %inc = add nsw i32 %i, 1 + %cmp = icmp slt i32 %i, %n + br i1 %cmp, label %for.cond, label %if.end -%struct.X = type { i32, 
i16 } -; CHECK-LABEL: @foo( -; CHECK-NOT: vector.body +if.end: + ret void +} -define void @foo(i32 %n) { +define void @early_exit(i16* %p, i32 %n) { +; CHECK-LABEL: @early_exit( +; CHECK-NEXT: entry: +; CHECK-NEXT:br label [[FOR_COND:%.*]] +; CHECK: for.cond: +; CHECK-NEXT:[[I:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[INC:%.*]], [[FOR_BODY:%.*]] ] +; CHECK-NEXT:[[CMP:%.*]] = icmp slt i32 [[I]], [[N:%.*]] +; CHECK-NEXT:br i1 [[CMP]], label [[FOR_BODY]], label [[IF_END:%.*]] +; CHECK: for.body: +; CHECK-NEXT:[[IPROM:%.*]] = sext i32 [[I]] to i64 +; CHECK-NEXT:[[B:%.*]] = getelementptr inbounds i16, i16* [[P:%.*]], i64 [[IPROM]] +; CHECK-NEXT:store i16 0, i16* [[B]], align 4 +; CHECK-NEXT:[[INC]] = add nsw i32 [[I]], 1 +; CHECK-NEXT:br label [[FOR_COND]] +; CHECK: if.end: +; CHECK-NEXT:ret void +; entry: br label %for.cond @@ -21,7 +85,7 @@ for.cond: for.body: %iprom = sext i32 %i to i64 - %b = getelementptr inbounds %struct.X, %struct.X* undef, i64 %iprom, i32 1 + %b = getelementptr inbounds i16, i16* %p, i64 %iprom store i16 0, i16* %b, align 4 %inc = add
[llvm-branch-commits] [llvm] a048e2f - [tests] fix an accidental target dependence added in 99ac8868
Author: Philip Reames Date: 2020-12-15T11:07:30-08:00 New Revision: a048e2fa1d0285a3582bd224d5652dbf1dc91cb4 URL: https://github.com/llvm/llvm-project/commit/a048e2fa1d0285a3582bd224d5652dbf1dc91cb4 DIFF: https://github.com/llvm/llvm-project/commit/a048e2fa1d0285a3582bd224d5652dbf1dc91cb4.diff LOG: [tests] fix an accidental target dependence added in 99ac8868 Added: Modified: llvm/test/Transforms/LoopVectorize/loop-form.ll Removed: diff --git a/llvm/test/Transforms/LoopVectorize/loop-form.ll b/llvm/test/Transforms/LoopVectorize/loop-form.ll index cebe7844bb11..298143ba726c 100644 --- a/llvm/test/Transforms/LoopVectorize/loop-form.ll +++ b/llvm/test/Transforms/LoopVectorize/loop-form.ll @@ -1,5 +1,5 @@ ; NOTE: Assertions have been autogenerated by utils/update_test_checks.py -; RUN: opt -S -loop-vectorize < %s | FileCheck %s +; RUN: opt -S -loop-vectorize -force-vector-width=2 < %s | FileCheck %s target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128" define void @bottom_tested(i16* %p, i32 %n) { ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] a81db8b - [LV] Restructure handling of -prefer-predicate-over-epilogue option [NFC]
Author: Philip Reames Date: 2020-12-15T12:38:13-08:00 New Revision: a81db8b3159e72a6d2ecb2318024316e4aa30933 URL: https://github.com/llvm/llvm-project/commit/a81db8b3159e72a6d2ecb2318024316e4aa30933 DIFF: https://github.com/llvm/llvm-project/commit/a81db8b3159e72a6d2ecb2318024316e4aa30933.diff LOG: [LV] Restructure handling of -prefer-predicate-over-epilogue option [NFC] This should be purely non-functional. When touching this code for another reason, I found the handling of the PredicateOrDontVectorize piece here very confusing. Let's make it an explicit state (instead of an implicit combination of two variables), and use early return for options/hint processing. Added: Modified: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp Removed: diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp index c96637762658..6e506a4d71a4 100644 --- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp +++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp @@ -1201,7 +1201,10 @@ enum ScalarEpilogueLowering { CM_ScalarEpilogueNotAllowedLowTripLoop, // Loop hint predicate indicating an epilogue is undesired. - CM_ScalarEpilogueNotNeededUsePredicate + CM_ScalarEpilogueNotNeededUsePredicate, + + // Directive indicating we must either tail fold or not vectorize + CM_ScalarEpilogueNotAllowedUsePredicate }; /// LoopVectorizationCostModel - estimates the expected speedups due to @@ -5463,6 +5466,8 @@ LoopVectorizationCostModel::computeMaxVF(ElementCount UserVF, unsigned UserIC) { switch (ScalarEpilogueStatus) { case CM_ScalarEpilogueAllowed: return MaxVF; + case CM_ScalarEpilogueNotAllowedUsePredicate: +LLVM_FALLTHROUGH; case CM_ScalarEpilogueNotNeededUsePredicate: LLVM_DEBUG( dbgs() << "LV: vector predicate hint/switch found.\n" @@ -5522,16 +5527,17 @@ LoopVectorizationCostModel::computeMaxVF(ElementCount UserVF, unsigned UserIC) { // If there was a tail-folding hint/switch, but we can't fold the tail by // masking, fallback to a vectorization with a scalar epilogue. 
if (ScalarEpilogueStatus == CM_ScalarEpilogueNotNeededUsePredicate) { -if (PreferPredicateOverEpilogue == PreferPredicateTy::PredicateOrDontVectorize) { - LLVM_DEBUG(dbgs() << "LV: Can't fold tail by masking: don't vectorize\n"); - return None; -} LLVM_DEBUG(dbgs() << "LV: Cannot fold tail by masking: vectorize with a " "scalar epilogue instead.\n"); ScalarEpilogueStatus = CM_ScalarEpilogueAllowed; return MaxVF; } + if (ScalarEpilogueStatus == CM_ScalarEpilogueNotAllowedUsePredicate) { +LLVM_DEBUG(dbgs() << "LV: Can't fold tail by masking: don't vectorize\n"); +return None; + } + if (TC == 0) { reportVectorizationFailure( "Unable to calculate the loop count due to complex control flow", @@ -8855,22 +8861,29 @@ static ScalarEpilogueLowering getScalarEpilogueLowering( Hints.getForce() != LoopVectorizeHints::FK_Enabled)) return CM_ScalarEpilogueNotAllowedOptSize; - bool PredicateOptDisabled = PreferPredicateOverEpilogue.getNumOccurrences() && - !PreferPredicateOverEpilogue; + // 2) If set, obey the directives + if (PreferPredicateOverEpilogue.getNumOccurrences()) { +switch (PreferPredicateOverEpilogue) { +case PreferPredicateTy::ScalarEpilogue: + return CM_ScalarEpilogueAllowed; +case PreferPredicateTy::PredicateElseScalarEpilogue: + return CM_ScalarEpilogueNotNeededUsePredicate; +case PreferPredicateTy::PredicateOrDontVectorize: + return CM_ScalarEpilogueNotAllowedUsePredicate; +}; + } - // 2) Next, if disabling predication is requested on the command line, honour - // this and request a scalar epilogue. - if (PredicateOptDisabled) + // 3) If set, obey the hints + switch (Hints.getPredicate()) { + case LoopVectorizeHints::FK_Enabled: +return CM_ScalarEpilogueNotNeededUsePredicate; + case LoopVectorizeHints::FK_Disabled: return CM_ScalarEpilogueAllowed; + }; - // 3) and 4) look if enabling predication is requested on the command line, - // with a loop hint, or if the TTI hook indicates this is profitable, request - // predication. - if (PreferPredicateOverEpilogue || - Hints.getPredicate() == LoopVectorizeHints::FK_Enabled || - (TTI->preferPredicateOverEpilogue(L, LI, *SE, *AC, TLI, DT, -LVL.getLAI()) && - Hints.getPredicate() != LoopVectorizeHints::FK_Disabled)) + // 4) if the TTI hook indicates this is profitable, request predication. + if (TTI->preferPredicateOverEpilogue(L, LI, *SE, *AC, TLI, DT, + LVL.getLAI())) return CM_ScalarEpilogueNotNeededUsePredicate; return CM_ScalarEpilogueAllowed; _
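For context, the hint path in getScalarEpilogueLowering above (step 3) is driven by loop metadata. A hand-written sketch of the shape involved, assuming the usual llvm.loop.vectorize.predicate.enable spelling; with this hint, Hints.getPredicate() returns FK_Enabled and the function maps it to CM_ScalarEpilogueNotNeededUsePredicate:

; latch terminator of a loop carrying the predication hint
latch:
  %done = icmp eq i32 %iv.next, %n
  br i1 %done, label %exit, label %header, !llvm.loop !0

!0 = distinct !{!0, !1}
!1 = !{!"llvm.loop.vectorize.predicate.enable", i1 true}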
[llvm-branch-commits] [llvm] af7ef89 - [LV] Extend dead instruction detection to multiple exiting blocks
Author: Philip Reames Date: 2020-12-15T18:46:32-08:00 New Revision: af7ef895d4951cd41c5e055c84469b4fd229d50c URL: https://github.com/llvm/llvm-project/commit/af7ef895d4951cd41c5e055c84469b4fd229d50c DIFF: https://github.com/llvm/llvm-project/commit/af7ef895d4951cd41c5e055c84469b4fd229d50c.diff LOG: [LV] Extend dead instruction detection to multiple exiting blocks Given we haven't yet enabled multiple exiting blocks, this is currently non-functional, but it's an obvious extension which cleans up a later patch. I don't think this is worth review (as it's pretty obvious); if anyone disagrees, feel free to revert or comment and I will. Added: Modified: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp Removed: diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp index 6e506a4d71a4..cbeb6a32825f 100644 --- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp +++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp @@ -7472,16 +7472,23 @@ void LoopVectorizationPlanner::executePlan(InnerLoopVectorizer &ILV, void LoopVectorizationPlanner::collectTriviallyDeadInstructions( SmallPtrSetImpl &DeadInstructions) { - BasicBlock *Latch = OrigLoop->getLoopLatch(); - // We create new control-flow for the vectorized loop, so the original - // condition will be dead after vectorization if it's only used by the - // branch. - auto *Cmp = dyn_cast(Latch->getTerminator()->getOperand(0)); - if (Cmp && Cmp->hasOneUse()) { -DeadInstructions.insert(Cmp); + // We create new control-flow for the vectorized loop, so the original exit + // conditions will be dead after vectorization if it's only used by the + // terminator + SmallVector ExitingBlocks; + OrigLoop->getExitingBlocks(ExitingBlocks); + for (auto *BB : ExitingBlocks) { +auto *Cmp = dyn_cast(BB->getTerminator()->getOperand(0)); +if (!Cmp || !Cmp->hasOneUse()) + continue; + +// TODO: we should introduce a getUniqueExitingBlocks on Loop +if (!DeadInstructions.insert(Cmp).second) + continue; // The operands of the icmp is often a dead trunc, used by IndUpdate. +// TODO: can recurse through operands in general for (Value *Op : Cmp->operands()) { if (isa(Op) && Op->hasOneUse()) DeadInstructions.insert(cast(Op)); @@ -7491,6 +7498,7 @@ void LoopVectorizationPlanner::collectTriviallyDeadInstructions( // We create new "steps" for induction variable updates to which the original // induction variables map. An original update instruction will be dead if // all its users except the induction variable are dead. + auto *Latch = OrigLoop->getLoopLatch(); for (auto &Induction : Legal->getInductionVars()) { PHINode *Ind = Induction.first; auto *IndUpdate = cast(Ind->getIncomingValueForBlock(Latch)); ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
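A hypothetical two-exit loop showing what the generalized walk now collects: each exiting block's compare is used only by its terminator, so both %early and %done land in DeadInstructions once the vectorizer installs its own control flow.

header:
  %iv = phi i64 [ 0, %entry ], [ %iv.next, %latch ]
  %early = icmp eq i64 %iv, %n           ; single use: the branch below
  br i1 %early, label %exit, label %latch

latch:
  %iv.next = add nuw i64 %iv, 1
  %done = icmp eq i64 %iv.next, 1024     ; single use: the branch below
  br i1 %done, label %exit, label %header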
[llvm-branch-commits] [llvm] 1f6e155 - [LV] Weaken an unnecessarily strong assert [NFC]
Author: Philip Reames Date: 2020-12-15T19:07:53-08:00 New Revision: 1f6e15566f147f5814b0fe04df71a8d6acc4e689 URL: https://github.com/llvm/llvm-project/commit/1f6e15566f147f5814b0fe04df71a8d6acc4e689 DIFF: https://github.com/llvm/llvm-project/commit/1f6e15566f147f5814b0fe04df71a8d6acc4e689.diff LOG: [LV] Weaken an unnecessarily strong assert [NFC] Account for the fact that (in the future) the latch might be a switch, not a branch. The existing code is correct, minus the assert. Added: Modified: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp Removed: diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp index cbeb6a32825f..37863b035067 100644 --- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp +++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp @@ -3409,14 +3409,7 @@ BasicBlock *InnerLoopVectorizer::completeLoopSkeleton(Loop *L, Value *Count = getOrCreateTripCount(L); Value *VectorTripCount = getOrCreateVectorTripCount(L); - // We need the OrigLoop (scalar loop part) latch terminator to help - // produce correct debug info for the middle block BB instructions. - // The legality check stage guarantees that the loop will have a single - // latch. - assert(isa(OrigLoop->getLoopLatch()->getTerminator()) && - "Scalar loop latch terminator isn't a branch"); - BranchInst *ScalarLatchBr = - cast(OrigLoop->getLoopLatch()->getTerminator()); + auto *ScalarLatchTerm = OrigLoop->getLoopLatch()->getTerminator(); // Add a check in the middle block to see if we have completed // all of the iterations in the first vector loop. @@ -3428,16 +3421,16 @@ BasicBlock *InnerLoopVectorizer::completeLoopSkeleton(Loop *L, VectorTripCount, "cmp.n", LoopMiddleBlock->getTerminator()); -// Here we use the same DebugLoc as the scalar loop latch branch instead +// Here we use the same DebugLoc as the scalar loop latch terminator instead // of the corresponding compare because they may have ended up with // different line numbers and we want to avoid awkward line stepping while // debugging. Eg. if the compare has got a line number inside the loop. -cast(CmpN)->setDebugLoc(ScalarLatchBr->getDebugLoc()); +cast(CmpN)->setDebugLoc(ScalarLatchTerm->getDebugLoc()); } BranchInst *BrInst = BranchInst::Create(LoopExitBlock, LoopScalarPreHeader, CmpN); - BrInst->setDebugLoc(ScalarLatchBr->getDebugLoc()); + BrInst->setDebugLoc(ScalarLatchTerm->getDebugLoc()); ReplaceInstWithInst(LoopMiddleBlock->getTerminator(), BrInst); // Get ready to start creating new instructions into the vectorized body. ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
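The shape the weakened assert now tolerates, sketched by hand: a latch whose terminator is a switch rather than a conditional branch. getDebugLoc works the same on either terminator, which is all the surrounding code needs.

latch:
  %iv.next = add i32 %iv, 1
  switch i32 %iv.next, label %header [
    i32 16, label %exit.a
    i32 32, label %exit.b
  ]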
[llvm-branch-commits] [llvm] f106b28 - [tests] precommit a test mentioned in review for D93317
Author: Philip Reames Date: 2020-12-22T09:47:19-08:00 New Revision: f106b281be24df4b5ed4553c3c09c885610cd2b8 URL: https://github.com/llvm/llvm-project/commit/f106b281be24df4b5ed4553c3c09c885610cd2b8 DIFF: https://github.com/llvm/llvm-project/commit/f106b281be24df4b5ed4553c3c09c885610cd2b8.diff LOG: [tests] precommit a test mentioned in review for D93317 Added: Modified: llvm/test/Transforms/LoopVectorize/loop-form.ll Removed: diff --git a/llvm/test/Transforms/LoopVectorize/loop-form.ll b/llvm/test/Transforms/LoopVectorize/loop-form.ll index 298143ba726c..72f2215bb934 100644 --- a/llvm/test/Transforms/LoopVectorize/loop-form.ll +++ b/llvm/test/Transforms/LoopVectorize/loop-form.ll @@ -338,3 +338,91 @@ if.end: if.end2: ret i32 1 } + +define i32 @multiple_latch1(i16* %p) { +; CHECK-LABEL: @multiple_latch1( +; CHECK-NEXT: entry: +; CHECK-NEXT:br label [[FOR_BODY:%.*]] +; CHECK: for.body: +; CHECK-NEXT:[[I_02:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[INC:%.*]], [[FOR_BODY_BACKEDGE:%.*]] ] +; CHECK-NEXT:[[INC]] = add nsw i32 [[I_02]], 1 +; CHECK-NEXT:[[CMP:%.*]] = icmp slt i32 [[INC]], 16 +; CHECK-NEXT:br i1 [[CMP]], label [[FOR_BODY_BACKEDGE]], label [[FOR_SECOND:%.*]] +; CHECK: for.second: +; CHECK-NEXT:[[IPROM:%.*]] = sext i32 [[I_02]] to i64 +; CHECK-NEXT:[[B:%.*]] = getelementptr inbounds i16, i16* [[P:%.*]], i64 [[IPROM]] +; CHECK-NEXT:store i16 0, i16* [[B]], align 4 +; CHECK-NEXT:[[CMPS:%.*]] = icmp sgt i32 [[INC]], 16 +; CHECK-NEXT:br i1 [[CMPS]], label [[FOR_BODY_BACKEDGE]], label [[FOR_END:%.*]] +; CHECK: for.body.backedge: +; CHECK-NEXT:br label [[FOR_BODY]] +; CHECK: for.end: +; CHECK-NEXT:ret i32 0 +; +entry: + br label %for.body + +for.body: + %i.02 = phi i32 [ 0, %entry ], [ %inc, %for.body.backedge] + %inc = add nsw i32 %i.02, 1 + %cmp = icmp slt i32 %inc, 16 + br i1 %cmp, label %for.body.backedge, label %for.second + +for.second: + %iprom = sext i32 %i.02 to i64 + %b = getelementptr inbounds i16, i16* %p, i64 %iprom + store i16 0, i16* %b, align 4 + %cmps = icmp sgt i32 %inc, 16 + br i1 %cmps, label %for.body.backedge, label %for.end + +for.body.backedge: + br label %for.body + +for.end: + ret i32 0 +} + + +; two back branches - loop simplify will convert this to the same form +; as the previous case before the vectorizer sees it, but show that here.
+define i32 @multiple_latch2(i16* %p) { +; CHECK-LABEL: @multiple_latch2( +; CHECK-NEXT: entry: +; CHECK-NEXT:br label [[FOR_BODY:%.*]] +; CHECK: for.body: +; CHECK-NEXT:[[I_02:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[INC:%.*]], [[FOR_BODY_BACKEDGE:%.*]] ] +; CHECK-NEXT:[[INC]] = add nsw i32 [[I_02]], 1 +; CHECK-NEXT:[[CMP:%.*]] = icmp slt i32 [[INC]], 16 +; CHECK-NEXT:br i1 [[CMP]], label [[FOR_BODY_BACKEDGE]], label [[FOR_SECOND:%.*]] +; CHECK: for.body.backedge: +; CHECK-NEXT:br label [[FOR_BODY]] +; CHECK: for.second: +; CHECK-NEXT:[[IPROM:%.*]] = sext i32 [[I_02]] to i64 +; CHECK-NEXT:[[B:%.*]] = getelementptr inbounds i16, i16* [[P:%.*]], i64 [[IPROM]] +; CHECK-NEXT:store i16 0, i16* [[B]], align 4 +; CHECK-NEXT:[[CMPS:%.*]] = icmp sgt i32 [[INC]], 16 +; CHECK-NEXT:br i1 [[CMPS]], label [[FOR_BODY_BACKEDGE]], label [[FOR_END:%.*]] +; CHECK: for.end: +; CHECK-NEXT:ret i32 0 +; +entry: + br label %for.body + +for.body: + %i.02 = phi i32 [ 0, %entry ], [ %inc, %for.body ], [%inc, %for.second] + %inc = add nsw i32 %i.02, 1 + %cmp = icmp slt i32 %inc, 16 + br i1 %cmp, label %for.body, label %for.second + +for.second: + %iprom = sext i32 %i.02 to i64 + %b = getelementptr inbounds i16, i16* %p, i64 %iprom + store i16 0, i16* %b, align 4 + %cmps = icmp sgt i32 %inc, 16 + br i1 %cmps, label %for.body, label %for.end + +for.end: + ret i32 0 +} + +declare void @foo() ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] e4df6a4 - [LV] Vectorize (some) early and multiple exit loops
Author: Philip Reames Date: 2020-12-28T09:40:42-08:00 New Revision: e4df6a40dad66e989a4333c11d39cf3ed9635135 URL: https://github.com/llvm/llvm-project/commit/e4df6a40dad66e989a4333c11d39cf3ed9635135 DIFF: https://github.com/llvm/llvm-project/commit/e4df6a40dad66e989a4333c11d39cf3ed9635135.diff LOG: [LV] Vectorize (some) early and multiple exit loops This patch is a major step towards supporting multiple exit loops in the vectorizer. This patch on its own extends the loop forms allowed in two ways: single exit loops which are not bottom tested, and multiple exit loops w/ a single exit block reached from all exits and no phis in the exit block (because of LCSSA this implies no values defined in the loop used later). The restrictions on multiple exit loop structures will be removed in follow up patches; disallowing cases for now makes the code changes smaller and more obvious. As before, we can only handle loops with entirely analyzable exits. Removing that restriction is much harder, and is not part of currently planned efforts. The basic idea here is that we can force the last iteration to run in the scalar epilogue loop (if we have one). From the definition of SCEV's backedge taken count, we know that no earlier iteration can exit the vector body. As such, we can leave the decision on which exit to be taken to the scalar code and generate a bottom tested vector loop which runs all but the last iteration. The existing code already had the notion of requiring one iteration in the scalar epilogue; this patch is mainly about generalizing that support slightly, making sure we don't try to use this mechanism when tail folding, and updating the code to reflect the difference between a single exit block and a unique exit block (very mechanical). Differential Revision: https://reviews.llvm.org/D93317 Added: Modified: llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp llvm/lib/Transforms/Vectorize/LoopVectorize.cpp llvm/test/Transforms/LoopVectorize/control-flow.ll llvm/test/Transforms/LoopVectorize/loop-form.ll llvm/test/Transforms/LoopVectorize/loop-legality-checks.ll Removed: diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp index 60e1cc9a4a59..911309c9421c 100644 --- a/llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp +++ b/llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp @@ -1095,9 +1095,15 @@ bool LoopVectorizationLegality::canVectorizeLoopCFG(Loop *Lp, return false; } - // We must have a single exiting block. - if (!Lp->getExitingBlock()) { -reportVectorizationFailure("The loop must have an exiting block", + // We currently must have a single "exit block" after the loop. Note that + // multiple "exiting blocks" inside the loop are allowed, provided they all + // reach the single exit block. + // TODO: This restriction can be relaxed in the near future, it's here solely + // to allow separation of changes for review. We need to generalize the phi + // update logic in a number of places. + BasicBlock *ExitBB = Lp->getUniqueExitBlock(); + if (!ExitBB) { +reportVectorizationFailure("The loop must have a unique exit block", "loop control flow is not understood by vectorizer", "CFGNotUnderstood", ORE, TheLoop); if (DoExtraAnalysis) @@ -1106,11 +1112,14 @@ bool LoopVectorizationLegality::canVectorizeLoopCFG(Loop *Lp, return false; } - // We only handle bottom-tested loops, i.e. loop in which the condition is - // checked at the end of each iteration.
With that we can assume that all - // instructions in the loop are executed the same number of times. - if (Lp->getExitingBlock() != Lp->getLoopLatch()) { -reportVectorizationFailure("The exiting block is not the loop latch", + // The existing code assumes that LCSSA implies that phis are single entry + // (which was true when we had at most a single exiting edge from the latch). + // In general, there's nothing which prevents an LCSSA phi in exit block from + // having two or more values if there are multiple exiting edges leading to + // the exit block. (TODO: implement general case) + if (!empty(ExitBB->phis()) && !ExitBB->getSinglePredecessor()) { +reportVectorizationFailure("The loop must have no live-out values if " + "it has more than one exiting block", "loop control flow is not understood by vectorizer", "CFGNotUnderstood", ORE, TheLoop); if (DoExtraAnalysis) diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp index 5889d5e55339..c48b650c3c3e 100644 --- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp +++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp @@ -837,7 +837,8 @@ class InnerLoopVectorizer {
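To make the newly-allowed shape concrete, here is a minimal, hypothetical IR sketch (mine, not taken from the patch's tests) of a multiple exit loop this change can now vectorize: two exiting blocks, one shared exit block, no phis in the exit, and both exit tests analyzable by SCEV:

  define void @multi_exit(i32* %p, i32 %n) {
  entry:
    br label %loop
  loop:
    %iv = phi i32 [ 0, %entry ], [ %iv.next, %latch ]
    %early = icmp eq i32 %iv, %n          ; first exiting block
    br i1 %early, label %exit, label %latch
  latch:
    %gep = getelementptr i32, i32* %p, i32 %iv
    store i32 0, i32* %gep
    %iv.next = add nuw nsw i32 %iv, 1
    %done = icmp eq i32 %iv.next, 1024    ; second exiting block (the latch)
    br i1 %done, label %exit, label %loop
  exit:
    ret void
  }

The vector body runs all but the last iteration; the scalar epilogue then decides which of the two exits is actually taken.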
[llvm-branch-commits] [llvm] b06a2ad - [LoopVectorizer] Lower uniform loads as a single load (instead of relying on CSE)
Author: Philip Reames Date: 2020-11-23T15:32:17-08:00 New Revision: b06a2ad94f45abc18970ecc3cec93d140d036d8f URL: https://github.com/llvm/llvm-project/commit/b06a2ad94f45abc18970ecc3cec93d140d036d8f DIFF: https://github.com/llvm/llvm-project/commit/b06a2ad94f45abc18970ecc3cec93d140d036d8f.diff LOG: [LoopVectorizer] Lower uniform loads as a single load (instead of relying on CSE) A uniform load is one which loads from a uniform address across all lanes. As currently implemented, we cost model such loads as if we did a single scalar load + a broadcast, but the actual lowering replicates the load once per lane. This change tweaks the lowering to use the REPLICATE strategy by marking such loads (and the computation leading to their memory operand) as uniform after vectorization. This is a useful change in itself, but its real purpose is to pave the way for a following change which will generalize our uniformity logic. In review discussion, there was an issue raised with coupling cost modeling with the lowering strategy for uniform inputs. The discussion on that item remains unsettled and is pending larger architectural discussion. We decided to move forward with this patch as is, and revise as warranted once the bigger picture design questions are settled. Differential Revision: https://reviews.llvm.org/D91398 Added: Modified: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp llvm/test/Transforms/LoopVectorize/X86/cost-model-assert.ll llvm/test/Transforms/LoopVectorize/X86/uniform_mem_op.ll llvm/test/Transforms/LoopVectorize/multiple-strides-vectorization.ll Removed: diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp index a6cdcd720343..15a3bd39c0f9 100644 --- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp +++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp @@ -2661,7 +2661,12 @@ void InnerLoopVectorizer::scalarizeInstruction(Instruction *Instr, VPUser &User, // Replace the operands of the cloned instructions with their scalar // equivalents in the new loop. for (unsigned op = 0, e = User.getNumOperands(); op != e; ++op) { -auto *NewOp = State.get(User.getOperand(op), Instance); +auto *Operand = dyn_cast(Instr->getOperand(op)); +auto InputInstance = Instance; +if (!Operand || !OrigLoop->contains(Operand) || +(Cost->isUniformAfterVectorization(Operand, State.VF))) + InputInstance.Lane = 0; +auto *NewOp = State.get(User.getOperand(op), InputInstance); Cloned->setOperand(op, NewOp); } addNewMetadata(Cloned, Instr); @@ -5031,6 +5036,11 @@ void LoopVectorizationCostModel::collectLoopUniforms(ElementCount VF) { // replicating region where only a single instance out of VF should be formed. // TODO: optimize such seldom cases if found important, see PR40816. auto addToWorklistIfAllowed = [&](Instruction *I) -> void { +if (isOutOfScope(I)) { + LLVM_DEBUG(dbgs() << "LV: Found not uniform due to scope: " +<< *I << "\n"); + return; +} if (isScalarWithPredication(I, VF)) { LLVM_DEBUG(dbgs() << "LV: Found not uniform being ScalarWithPredication: " << *I << "\n"); @@ -5051,16 +5061,25 @@ void LoopVectorizationCostModel::collectLoopUniforms(ElementCount VF) { // are pointers that are treated like consecutive pointers during // vectorization. The pointer operands of interleaved accesses are an // example. - SmallSetVector ConsecutiveLikePtrs; + SmallSetVector ConsecutiveLikePtrs; // Holds pointer operands of instructions that are possibly non-uniform. 
- SmallPtrSet PossibleNonUniformPtrs; + SmallPtrSet PossibleNonUniformPtrs; auto isUniformDecision = [&](Instruction *I, ElementCount VF) { InstWidening WideningDecision = getWideningDecision(I, VF); assert(WideningDecision != CM_Unknown && "Widening decision should be ready at this moment"); +// The address of a uniform mem op is itself uniform. We exclude stores +// here as there's an assumption in the current code that all uses of +// uniform instructions are uniform and, as noted below, uniform stores are +// still handled via replication (i.e. aren't uniform after vectorization). +if (isa(I) && Legal->isUniformMemOp(*I)) { + assert(WideningDecision == CM_Scalarize); + return true; +} + return (WideningDecision == CM_Widen || WideningDecision == CM_Widen_Reverse || WideningDecision == CM_Interleave); @@ -5076,10 +5095,21 @@ void LoopVectorizationCostModel::collectLoopUniforms(ElementCount VF) { for (auto *BB : TheLoop->blocks()) for (auto &I : *BB) { // If there's no pointer operand, there's nothing to do. - auto *Ptr = dyn_cast_or_null(getLoadStorePointerOperand(&I)); + auto *Ptr = getLoadStorePo
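As a rough before/after sketch of the lowering this commit changes (illustrative only, VF=4, hand-written rather than compiler output, with @g standing in for a hypothetical loop-invariant address): previously a load from a uniform address was replicated once per lane,

  %v0 = load i32, i32* @g    ; lane 0
  %v1 = load i32, i32* @g    ; lane 1
  %v2 = load i32, i32* @g    ; lane 2
  %v3 = load i32, i32* @g    ; lane 3

and CSE was left to clean up the duplicates. After the patch, only the lane-0 instance is emitted, because the load and its address computation are marked uniform after vectorization:

  %v = load i32, i32* @g     ; single scalar load; lane-demanding users broadcast %v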
[llvm-branch-commits] [llvm] d6239b3 - [test] pre-commit test for D91451
Author: Philip Reames Date: 2020-11-23T15:36:08-08:00 New Revision: d6239b3ea6c143a0c395eb3b8512677feaf6acc0 URL: https://github.com/llvm/llvm-project/commit/d6239b3ea6c143a0c395eb3b8512677feaf6acc0 DIFF: https://github.com/llvm/llvm-project/commit/d6239b3ea6c143a0c395eb3b8512677feaf6acc0.diff LOG: [test] pre-comit test for D91451 Added: Modified: llvm/test/Transforms/LoopVectorize/X86/uniform_mem_op.ll Removed: diff --git a/llvm/test/Transforms/LoopVectorize/X86/uniform_mem_op.ll b/llvm/test/Transforms/LoopVectorize/X86/uniform_mem_op.ll index 3c0ec386f073..a7e38c2115fb 100644 --- a/llvm/test/Transforms/LoopVectorize/X86/uniform_mem_op.ll +++ b/llvm/test/Transforms/LoopVectorize/X86/uniform_mem_op.ll @@ -131,6 +131,68 @@ loopexit: ret i32 %accum.next } +define i32 @uniform_address(i32* align(4) %addr, i32 %byte_offset) { +; CHECK-LABEL: @uniform_address( +; CHECK-NEXT: entry: +; CHECK-NEXT:br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]] +; CHECK: vector.ph: +; CHECK-NEXT:br label [[VECTOR_BODY:%.*]] +; CHECK: vector.body: +; CHECK-NEXT:[[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ] +; CHECK-NEXT:[[TMP0:%.*]] = add i64 [[INDEX]], 0 +; CHECK-NEXT:[[TMP1:%.*]] = add i64 [[INDEX]], 4 +; CHECK-NEXT:[[TMP2:%.*]] = add i64 [[INDEX]], 8 +; CHECK-NEXT:[[TMP3:%.*]] = add i64 [[INDEX]], 12 +; CHECK-NEXT:[[TMP4:%.*]] = udiv i32 [[BYTE_OFFSET:%.*]], 4 +; CHECK-NEXT:[[TMP5:%.*]] = udiv i32 [[BYTE_OFFSET]], 4 +; CHECK-NEXT:[[TMP6:%.*]] = udiv i32 [[BYTE_OFFSET]], 4 +; CHECK-NEXT:[[TMP7:%.*]] = udiv i32 [[BYTE_OFFSET]], 4 +; CHECK-NEXT:[[TMP8:%.*]] = getelementptr i32, i32* [[ADDR:%.*]], i32 [[TMP4]] +; CHECK-NEXT:[[TMP9:%.*]] = getelementptr i32, i32* [[ADDR]], i32 [[TMP5]] +; CHECK-NEXT:[[TMP10:%.*]] = getelementptr i32, i32* [[ADDR]], i32 [[TMP6]] +; CHECK-NEXT:[[TMP11:%.*]] = getelementptr i32, i32* [[ADDR]], i32 [[TMP7]] +; CHECK-NEXT:[[TMP12:%.*]] = load i32, i32* [[TMP8]], align 4 +; CHECK-NEXT:[[TMP13:%.*]] = load i32, i32* [[TMP9]], align 4 +; CHECK-NEXT:[[TMP14:%.*]] = load i32, i32* [[TMP10]], align 4 +; CHECK-NEXT:[[TMP15:%.*]] = load i32, i32* [[TMP11]], align 4 +; CHECK-NEXT:[[INDEX_NEXT]] = add i64 [[INDEX]], 16 +; CHECK-NEXT:[[TMP16:%.*]] = icmp eq i64 [[INDEX_NEXT]], 4096 +; CHECK-NEXT:br i1 [[TMP16]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], [[LOOP6:!llvm.loop !.*]] +; CHECK: middle.block: +; CHECK-NEXT:[[CMP_N:%.*]] = icmp eq i64 4097, 4096 +; CHECK-NEXT:br i1 [[CMP_N]], label [[LOOPEXIT:%.*]], label [[SCALAR_PH]] +; CHECK: scalar.ph: +; CHECK-NEXT:[[BC_RESUME_VAL:%.*]] = phi i64 [ 4096, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ] +; CHECK-NEXT:br label [[FOR_BODY:%.*]] +; CHECK: for.body: +; CHECK-NEXT:[[IV:%.*]] = phi i64 [ [[IV_NEXT:%.*]], [[FOR_BODY]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ] +; CHECK-NEXT:[[OFFSET:%.*]] = udiv i32 [[BYTE_OFFSET]], 4 +; CHECK-NEXT:[[GEP:%.*]] = getelementptr i32, i32* [[ADDR]], i32 [[OFFSET]] +; CHECK-NEXT:[[LOAD:%.*]] = load i32, i32* [[GEP]], align 4 +; CHECK-NEXT:[[IV_NEXT]] = add nuw nsw i64 [[IV]], 1 +; CHECK-NEXT:[[EXITCOND:%.*]] = icmp eq i64 [[IV]], 4096 +; CHECK-NEXT:br i1 [[EXITCOND]], label [[LOOPEXIT]], label [[FOR_BODY]], [[LOOP7:!llvm.loop !.*]] +; CHECK: loopexit: +; CHECK-NEXT:[[LOAD_LCSSA:%.*]] = phi i32 [ [[LOAD]], [[FOR_BODY]] ], [ [[TMP15]], [[MIDDLE_BLOCK]] ] +; CHECK-NEXT:ret i32 [[LOAD_LCSSA]] +; +entry: + br label %for.body + +for.body: + %iv = phi i64 [ %iv.next, %for.body ], [ 0, %entry ] + %offset = udiv i32 %byte_offset, 4 + %gep = getelementptr i32, i32* %addr, 
i32 %offset + %load = load i32, i32* %gep + %iv.next = add nuw nsw i64 %iv, 1 + %exitcond = icmp eq i64 %iv, 4096 + br i1 %exitcond, label %loopexit, label %for.body + +loopexit: + ret i32 %load +} + + define void @uniform_store_uniform_value(i32* align(4) %addr) { ; CHECK-LABEL: @uniform_store_uniform_value( @@ -162,7 +224,7 @@ define void @uniform_store_uniform_value(i32* align(4) %addr) { ; CHECK-NEXT:store i32 0, i32* [[ADDR]], align 4 ; CHECK-NEXT:[[INDEX_NEXT]] = add i64 [[INDEX]], 16 ; CHECK-NEXT:[[TMP4:%.*]] = icmp eq i64 [[INDEX_NEXT]], 4096 -; CHECK-NEXT:br i1 [[TMP4]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], [[LOOP6:!llvm.loop !.*]] +; CHECK-NEXT:br i1 [[TMP4]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], [[LOOP8:!llvm.loop !.*]] ; CHECK: middle.block: ; CHECK-NEXT:[[CMP_N:%.*]] = icmp eq i64 4097, 4096 ; CHECK-NEXT:br i1 [[CMP_N]], label [[LOOPEXIT:%.*]], label [[SCALAR_PH]] @@ -174,7 +236,7 @@ define void @uniform_store_uniform_value(i32* align(4) %addr) { ; CHECK-NEXT:store i32 0, i32* [[ADDR]], align 4 ; CHECK-NEXT:[[IV_NEXT]] = a
[llvm-branch-commits] [llvm] b3a8a15 - [LAA] Minor code style tweaks [NFC]
Author: Philip Reames Date: 2020-11-24T15:49:27-08:00 New Revision: b3a8a153433f65c419b891ae6763f458b33e9605 URL: https://github.com/llvm/llvm-project/commit/b3a8a153433f65c419b891ae6763f458b33e9605 DIFF: https://github.com/llvm/llvm-project/commit/b3a8a153433f65c419b891ae6763f458b33e9605.diff LOG: [LAA] Minor code style tweaks [NFC] Added: Modified: llvm/lib/Analysis/LoopAccessAnalysis.cpp Removed: diff --git a/llvm/lib/Analysis/LoopAccessAnalysis.cpp b/llvm/lib/Analysis/LoopAccessAnalysis.cpp index 34de1a052ddf..0bffa7dbddec 100644 --- a/llvm/lib/Analysis/LoopAccessAnalysis.cpp +++ b/llvm/lib/Analysis/LoopAccessAnalysis.cpp @@ -149,27 +149,23 @@ const SCEV *llvm::replaceSymbolicStrideSCEV(PredicatedScalarEvolution &PSE, // symbolic stride replaced by one. ValueToValueMap::const_iterator SI = PtrToStride.find(OrigPtr ? OrigPtr : Ptr); - if (SI != PtrToStride.end()) { -Value *StrideVal = SI->second; + if (SI == PtrToStride.end()) +// For a non-symbolic stride, just return the original expression. +return OrigSCEV; -// Strip casts. -StrideVal = stripIntegerCast(StrideVal); + Value *StrideVal = stripIntegerCast(SI->second); -ScalarEvolution *SE = PSE.getSE(); -const auto *U = cast(SE->getSCEV(StrideVal)); -const auto *CT = -static_cast(SE->getOne(StrideVal->getType())); + ScalarEvolution *SE = PSE.getSE(); + const auto *U = cast(SE->getSCEV(StrideVal)); + const auto *CT = +static_cast(SE->getOne(StrideVal->getType())); -PSE.addPredicate(*SE->getEqualPredicate(U, CT)); -auto *Expr = PSE.getSCEV(Ptr); + PSE.addPredicate(*SE->getEqualPredicate(U, CT)); + auto *Expr = PSE.getSCEV(Ptr); -LLVM_DEBUG(dbgs() << "LAA: Replacing SCEV: " << *OrigSCEV - << " by: " << *Expr << "\n"); -return Expr; - } - - // Otherwise, just return the SCEV of the original pointer. - return OrigSCEV; + LLVM_DEBUG(dbgs() << "LAA: Replacing SCEV: " << *OrigSCEV +<< " by: " << *Expr << "\n"); + return Expr; } RuntimeCheckingPtrGroup::RuntimeCheckingPtrGroup( @@ -2150,12 +2146,8 @@ bool LoopAccessInfo::isUniform(Value *V) const { } void LoopAccessInfo::collectStridedAccess(Value *MemAccess) { - Value *Ptr = nullptr; - if (LoadInst *LI = dyn_cast(MemAccess)) -Ptr = LI->getPointerOperand(); - else if (StoreInst *SI = dyn_cast(MemAccess)) -Ptr = SI->getPointerOperand(); - else + Value *Ptr = getLoadStorePointerOperand(MemAccess); + if (!Ptr) return; Value *Stride = getStrideFromPointer(Ptr, PSE->getSE(), TheLoop); ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] 10ddb92 - [SCEV] Use isa<> pattern for testing for CouldNotCompute [NFC]
Author: Philip Reames Date: 2020-11-24T18:47:49-08:00 New Revision: 10ddb927c1c3ee6af0436c23f93fe1da6de7b99a URL: https://github.com/llvm/llvm-project/commit/10ddb927c1c3ee6af0436c23f93fe1da6de7b99a DIFF: https://github.com/llvm/llvm-project/commit/10ddb927c1c3ee6af0436c23f93fe1da6de7b99a.diff LOG: [SCEV] Use isa<> pattern for testing for CouldNotCompute [NFC] Some older code - and code copied from older code - still directly tested against the singleton result of SE::getCouldNotCompute. Using the isa form is both shorter and more readable. Added: Modified: llvm/lib/Analysis/LoopAccessAnalysis.cpp llvm/lib/Transforms/Scalar/LoopInterchange.cpp llvm/lib/Transforms/Scalar/LoopVersioningLICM.cpp llvm/lib/Transforms/Scalar/PlaceSafepoints.cpp llvm/lib/Transforms/Utils/ScalarEvolutionExpander.cpp llvm/lib/Transforms/Vectorize/LoopVectorize.cpp Removed: diff --git a/llvm/lib/Analysis/LoopAccessAnalysis.cpp b/llvm/lib/Analysis/LoopAccessAnalysis.cpp index 0bffa7dbddec..78f63c63cb40 100644 --- a/llvm/lib/Analysis/LoopAccessAnalysis.cpp +++ b/llvm/lib/Analysis/LoopAccessAnalysis.cpp @@ -1803,7 +1803,7 @@ bool LoopAccessInfo::canAnalyzeLoop() { // ScalarEvolution needs to be able to find the exit count. const SCEV *ExitCount = PSE->getBackedgeTakenCount(); - if (ExitCount == PSE->getSE()->getCouldNotCompute()) { + if (isa(ExitCount)) { recordAnalysis("CantComputeNumberOfIterations") << "could not determine number of loop iterations"; LLVM_DEBUG(dbgs() << "LAA: SCEV could not compute the loop exit count.\n"); diff --git a/llvm/lib/Transforms/Scalar/LoopInterchange.cpp b/llvm/lib/Transforms/Scalar/LoopInterchange.cpp index 81b7c3a8338a..f676ffc18e2d 100644 --- a/llvm/lib/Transforms/Scalar/LoopInterchange.cpp +++ b/llvm/lib/Transforms/Scalar/LoopInterchange.cpp @@ -452,7 +452,7 @@ struct LoopInterchange { bool isComputableLoopNest(LoopVector LoopList) { for (Loop *L : LoopList) { const SCEV *ExitCountOuter = SE->getBackedgeTakenCount(L); - if (ExitCountOuter == SE->getCouldNotCompute()) { + if (isa(ExitCountOuter)) { LLVM_DEBUG(dbgs() << "Couldn't compute backedge count\n"); return false; } diff --git a/llvm/lib/Transforms/Scalar/LoopVersioningLICM.cpp b/llvm/lib/Transforms/Scalar/LoopVersioningLICM.cpp index 3d0ce87047ad..2ff1e8480749 100644 --- a/llvm/lib/Transforms/Scalar/LoopVersioningLICM.cpp +++ b/llvm/lib/Transforms/Scalar/LoopVersioningLICM.cpp @@ -267,7 +267,7 @@ bool LoopVersioningLICM::legalLoopStructure() { // We need to be able to compute the loop trip count in order // to generate the bound checks. const SCEV *ExitCount = SE->getBackedgeTakenCount(CurLoop); - if (ExitCount == SE->getCouldNotCompute()) { + if (isa(ExitCount)) { LLVM_DEBUG(dbgs() << "loop does not has trip count\n"); return false; } diff --git a/llvm/lib/Transforms/Scalar/PlaceSafepoints.cpp b/llvm/lib/Transforms/Scalar/PlaceSafepoints.cpp index 4553b23532f2..ca114581a515 100644 --- a/llvm/lib/Transforms/Scalar/PlaceSafepoints.cpp +++ b/llvm/lib/Transforms/Scalar/PlaceSafepoints.cpp @@ -243,7 +243,7 @@ static bool mustBeFiniteCountedLoop(Loop *L, ScalarEvolution *SE, BasicBlock *Pred) { // A conservative bound on the loop as a whole. const SCEV *MaxTrips = SE->getConstantMaxBackedgeTakenCount(L); - if (MaxTrips != SE->getCouldNotCompute() && + if (!isa(MaxTrips) && SE->getUnsignedRange(MaxTrips).getUnsignedMax().isIntN( CountedLoopTripWidth)) return true; @@ -255,7 +255,7 @@ static bool mustBeFiniteCountedLoop(Loop *L, ScalarEvolution *SE, // This returns an exact expression only. 
TODO: We really only need an // upper bound here, but SE doesn't expose that. const SCEV *MaxExec = SE->getExitCount(L, Pred); -if (MaxExec != SE->getCouldNotCompute() && +if (!isa(MaxExec) && SE->getUnsignedRange(MaxExec).getUnsignedMax().isIntN( CountedLoopTripWidth)) return true; diff --git a/llvm/lib/Transforms/Utils/ScalarEvolutionExpander.cpp b/llvm/lib/Transforms/Utils/ScalarEvolutionExpander.cpp index 877495be2dcd..c7e37fe0d1b3 100644 --- a/llvm/lib/Transforms/Utils/ScalarEvolutionExpander.cpp +++ b/llvm/lib/Transforms/Utils/ScalarEvolutionExpander.cpp @@ -2468,7 +2468,7 @@ Value *SCEVExpander::generateOverflowCheck(const SCEVAddRecExpr *AR, const SCEV *ExitCount = SE.getPredicatedBackedgeTakenCount(AR->getLoop(), Pred); - assert(ExitCount != SE.getCouldNotCompute() && "Invalid loop count"); + assert(!isa(ExitCount) && "Invalid loop count"); const SCEV *Step = AR->getStepRecurrence(SE); const SCEV *Start = AR->getStart(); diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp index af314ae4b27b..e29a0a8b
[llvm-branch-commits] [llvm] d93b8ac - [BasicAA] Add print routines to DecomposedGEP for ease of debugging
Author: Philip Reames Date: 2020-12-03T12:43:39-08:00 New Revision: d93b8acd0949f65de5e7360c79f04a98a66cbd9d URL: https://github.com/llvm/llvm-project/commit/d93b8acd0949f65de5e7360c79f04a98a66cbd9d DIFF: https://github.com/llvm/llvm-project/commit/d93b8acd0949f65de5e7360c79f04a98a66cbd9d.diff LOG: [BasicAA] Add print routines to DecomposedGEP for ease of debugging Added: Modified: llvm/include/llvm/Analysis/BasicAliasAnalysis.h Removed: diff --git a/llvm/include/llvm/Analysis/BasicAliasAnalysis.h b/llvm/include/llvm/Analysis/BasicAliasAnalysis.h index 7f3cbba0b6af..e59fd6919f66 100644 --- a/llvm/include/llvm/Analysis/BasicAliasAnalysis.h +++ b/llvm/include/llvm/Analysis/BasicAliasAnalysis.h @@ -126,6 +126,14 @@ class BasicAAResult : public AAResultBase { bool operator!=(const VariableGEPIndex &Other) const { return !operator==(Other); } + +void dump() const { print(dbgs()); } +void print(raw_ostream &OS) const { + OS << "(V=" << V->getName() +<< ", zextbits=" << ZExtBits +<< ", sextbits=" << SExtBits +<< ", scale=" << Scale << ")"; +} }; // Represents the internal structure of a GEP, decomposed into a base pointer, @@ -139,6 +147,20 @@ class BasicAAResult : public AAResultBase { SmallVector VarIndices; // Is GEP index scale compile-time constant. bool HasCompileTimeConstantScale; + +void dump() const { print(dbgs()); } +void print(raw_ostream &OS) const { + OS << "(DecomposedGEP Base=" << Base->getName() +<< ", Offset=" << Offset +<< ", VarIndices=[" << Offset; + for (size_t i = 0; i < VarIndices.size(); i++) { + if (i != 0) + OS << ", "; + VarIndices[i].print(OS); + } + OS << "], HasCompileTimeConstantScale=" << HasCompileTimeConstantScale +<< ")"; +} }; /// Tracks phi nodes we have visited. ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] 17b195b - [BasicAA] Minor formatting improvements for printers
Author: Philip Reames Date: 2020-12-03T13:08:56-08:00 New Revision: 17b195b632a780adf637432beda63c91eea2c106 URL: https://github.com/llvm/llvm-project/commit/17b195b632a780adf637432beda63c91eea2c106 DIFF: https://github.com/llvm/llvm-project/commit/17b195b632a780adf637432beda63c91eea2c106.diff LOG: [BasicAA] Minor formatting improvements for printers Added: Modified: llvm/include/llvm/Analysis/BasicAliasAnalysis.h Removed: diff --git a/llvm/include/llvm/Analysis/BasicAliasAnalysis.h b/llvm/include/llvm/Analysis/BasicAliasAnalysis.h index e59fd6919f66..4a149387eb74 100644 --- a/llvm/include/llvm/Analysis/BasicAliasAnalysis.h +++ b/llvm/include/llvm/Analysis/BasicAliasAnalysis.h @@ -132,7 +132,7 @@ class BasicAAResult : public AAResultBase { OS << "(V=" << V->getName() << ", zextbits=" << ZExtBits << ", sextbits=" << SExtBits -<< ", scale=" << Scale << ")"; +<< ", scale=" << Scale << ")\n"; } }; @@ -152,14 +152,14 @@ class BasicAAResult : public AAResultBase { void print(raw_ostream &OS) const { OS << "(DecomposedGEP Base=" << Base->getName() << ", Offset=" << Offset -<< ", VarIndices=[" << Offset; +<< ", VarIndices=["; for (size_t i = 0; i < VarIndices.size(); i++) { if (i != 0) OS << ", "; VarIndices[i].print(OS); } OS << "], HasCompileTimeConstantScale=" << HasCompileTimeConstantScale -<< ")"; +<< ")\n"; } }; ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] 55db6ec - [BasicAA] Move newline to dump from printer
Author: Philip Reames Date: 2020-12-03T14:35:43-08:00 New Revision: 55db6ec1cc20d32ad179e0059aafcc545125fca6 URL: https://github.com/llvm/llvm-project/commit/55db6ec1cc20d32ad179e0059aafcc545125fca6 DIFF: https://github.com/llvm/llvm-project/commit/55db6ec1cc20d32ad179e0059aafcc545125fca6.diff LOG: [BasicAA] Move newline to dump from printer Added: Modified: llvm/include/llvm/Analysis/BasicAliasAnalysis.h Removed: diff --git a/llvm/include/llvm/Analysis/BasicAliasAnalysis.h b/llvm/include/llvm/Analysis/BasicAliasAnalysis.h index 4a149387eb74..d9a174951695 100644 --- a/llvm/include/llvm/Analysis/BasicAliasAnalysis.h +++ b/llvm/include/llvm/Analysis/BasicAliasAnalysis.h @@ -127,12 +127,15 @@ class BasicAAResult : public AAResultBase { return !operator==(Other); } -void dump() const { print(dbgs()); } +void dump() const { + print(dbgs()); + dbgs() << "\n"; +} void print(raw_ostream &OS) const { OS << "(V=" << V->getName() << ", zextbits=" << ZExtBits << ", sextbits=" << SExtBits -<< ", scale=" << Scale << ")\n"; +<< ", scale=" << Scale << ")"; } }; @@ -148,7 +151,10 @@ class BasicAAResult : public AAResultBase { // Is GEP index scale compile-time constant. bool HasCompileTimeConstantScale; -void dump() const { print(dbgs()); } +void dump() const { + print(dbgs()); + dbgs() << "\n"; +} void print(raw_ostream &OS) const { OS << "(DecomposedGEP Base=" << Base->getName() << ", Offset=" << Offset @@ -159,7 +165,7 @@ class BasicAAResult : public AAResultBase { VarIndices[i].print(OS); } OS << "], HasCompileTimeConstantScale=" << HasCompileTimeConstantScale -<< ")\n"; +<< ")"; } }; ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
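With all three printer tweaks applied, a debugging session might look like the following (an illustrative, invented example; DecompGEP1 is just a local DecomposedGEP variable and the field values are made up), which is the whole point of having dump() available:

  (gdb) call DecompGEP1.dump()
  (DecomposedGEP Base=p, Offset=4, VarIndices=[(V=i, zextbits=0, sextbits=0, scale=4)], HasCompileTimeConstantScale=1)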
[llvm-branch-commits] [llvm] 0c866a3 - [LoopVec] Support non-instructions as argument to uniform mem ops
Author: Philip Reames Date: 2020-12-03T14:51:44-08:00 New Revision: 0c866a3d6aa492b01c29a2c582c56c0fd75c2970 URL: https://github.com/llvm/llvm-project/commit/0c866a3d6aa492b01c29a2c582c56c0fd75c2970 DIFF: https://github.com/llvm/llvm-project/commit/0c866a3d6aa492b01c29a2c582c56c0fd75c2970.diff LOG: [LoopVec] Support non-instructions as argument to uniform mem ops The initial step of the uniform-after-vectorization (lane-0 demanded only) analysis was very awkwardly written. It would revisit use list of each pointer operand of a widened load/store. As a result, it was in the worst case O(N^2) where N was the number of instructions in a loop, and had restricted operand Value types to reduce the size of use lists. This patch replaces the original algorithm with one which is at most O(2N) in the number of instructions in the loop. (The key observation is that each use of a potentially interesting pointer is visited at most twice, once on first scan, once in the use list of *it's* operand. Only instructions within the loop have their uses scanned.) In the process, we remove a restriction which required the operand of the uniform mem op to itself be an instruction. This allows detection of uniform mem ops involving global addresses. Differential Revision: https://reviews.llvm.org/D92056 Added: Modified: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp llvm/test/Transforms/LoopVectorize/X86/cost-model-assert.ll llvm/test/Transforms/LoopVectorize/X86/uniform_mem_op.ll llvm/test/Transforms/LoopVectorize/pr44488-predication.ll Removed: diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp index daa100ebe8cd..8c02be8530be 100644 --- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp +++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp @@ -5252,24 +5252,13 @@ void LoopVectorizationCostModel::collectLoopUniforms(ElementCount VF) { if (Cmp && TheLoop->contains(Cmp) && Cmp->hasOneUse()) addToWorklistIfAllowed(Cmp); - // Holds consecutive and consecutive-like pointers. Consecutive-like pointers - // are pointers that are treated like consecutive pointers during - // vectorization. The pointer operands of interleaved accesses are an - // example. - SmallSetVector ConsecutiveLikePtrs; - - // Holds pointer operands of instructions that are possibly non-uniform. - SmallPtrSet PossibleNonUniformPtrs; - auto isUniformDecision = [&](Instruction *I, ElementCount VF) { InstWidening WideningDecision = getWideningDecision(I, VF); assert(WideningDecision != CM_Unknown && "Widening decision should be ready at this moment"); -// The address of a uniform mem op is itself uniform. We exclude stores -// here as there's an assumption in the current code that all uses of -// uniform instructions are uniform and, as noted below, uniform stores are -// still handled via replication (i.e. aren't uniform after vectorization). +// A uniform memory op is itself uniform. We exclude uniform stores +// here as they demand the last lane, not the first one. if (isa(I) && Legal->isUniformMemOp(*I)) { assert(WideningDecision == CM_Scalarize); return true; @@ -5287,14 +5276,15 @@ void LoopVectorizationCostModel::collectLoopUniforms(ElementCount VF) { return getLoadStorePointerOperand(I) == Ptr && isUniformDecision(I, VF); }; - // Iterate over the instructions in the loop, and collect all - // consecutive-like pointer operands in ConsecutiveLikePtrs. If it's possible - // that a consecutive-like pointer operand will be scalarized, we collect it - // in PossibleNonUniformPtrs instead. 
We use two sets here because a single - // getelementptr instruction can be used by both vectorized and scalarized - // memory instructions. For example, if a loop loads and stores from the same - // location, but the store is conditional, the store will be scalarized, and - // the getelementptr won't remain uniform. + // Holds a list of values which are known to have at least one uniform use. + // Note that there may be other uses which aren't uniform. A "uniform use" + // here is something which only demands lane 0 of the unrolled iterations; + // it does not imply that all lanes produce the same value (e.g. this is not + // the usual meaning of uniform) + SmallPtrSet HasUniformUse; + + // Scan the loop for instructions which are either a) known to have only + // lane 0 demanded or b) are uses which demand only lane 0 of their operand. for (auto *BB : TheLoop->blocks()) for (auto &I : *BB) { // If there's no pointer operand, there's nothing to do. @@ -5302,45 +5292,31 @@ void LoopVectorizationCostModel::collectLoopUniforms(ElementCount VF) { if (!Ptr) continue; - // For now, avoid walking use lists in other functions. - //
[llvm-branch-commits] [llvm] 0129cd5 - Use deref facts derived from minimum object size of allocations
Author: Philip Reames Date: 2020-12-03T15:01:14-08:00 New Revision: 0129cd503575076556935a16f458b0a3c2e30646 URL: https://github.com/llvm/llvm-project/commit/0129cd503575076556935a16f458b0a3c2e30646 DIFF: https://github.com/llvm/llvm-project/commit/0129cd503575076556935a16f458b0a3c2e30646.diff LOG: Use deref facts derived from minimum object size of allocations This change should be fairly straightforward. If we've reached a call, check to see if we can tell the result is dereferenceable from information about the minimum object size returned by the call. To control compile time impact, I'm only adding the call for base facts in the routine. getObjectSize can also do recursive reasoning, and we don't want that general capability here. As a follow up patch (without separate review), I will plumb through the missing TLI parameter. That will have the effect of extending this to known libcalls - malloc, new, and the like - whereas currently this only covers calls with the explicit allocsize attribute. Differential Revision: https://reviews.llvm.org/D90341 Added: Modified: llvm/lib/Analysis/Loads.cpp llvm/test/Transforms/LoopVectorize/X86/load-deref-pred.ll Removed: diff --git a/llvm/lib/Analysis/Loads.cpp b/llvm/lib/Analysis/Loads.cpp index 2ca35a4344ec..8f373f70f216 100644 --- a/llvm/lib/Analysis/Loads.cpp +++ b/llvm/lib/Analysis/Loads.cpp @@ -12,7 +12,9 @@ #include "llvm/Analysis/Loads.h" #include "llvm/Analysis/AliasAnalysis.h" +#include "llvm/Analysis/CaptureTracking.h" #include "llvm/Analysis/LoopInfo.h" +#include "llvm/Analysis/MemoryBuiltins.h" #include "llvm/Analysis/ScalarEvolution.h" #include "llvm/Analysis/ScalarEvolutionExpressions.h" #include "llvm/Analysis/ValueTracking.h" @@ -107,11 +109,50 @@ static bool isDereferenceableAndAlignedPointer( return isDereferenceableAndAlignedPointer(ASC->getOperand(0), Alignment, Size, DL, CtxI, DT, Visited, MaxDepth); - if (const auto *Call = dyn_cast(V)) + if (const auto *Call = dyn_cast(V)) { if (auto *RP = getArgumentAliasingToReturnedPointer(Call, true)) return isDereferenceableAndAlignedPointer(RP, Alignment, Size, DL, CtxI, DT, Visited, MaxDepth); +// If we have a call we can't recurse through, check to see if this is an +// allocation function for which we can establish a minimum object size. +// Such a minimum object size is analogous to a deref_or_null attribute in +// that we still need to prove the result non-null at point of use. +// NOTE: We can only use the object size as a base fact as we a) need to +// prove alignment too, and b) don't want the compile time impact of a +// separate recursive walk. +ObjectSizeOpts Opts; +// TODO: It may be okay to round to align, but that would imply that +// accessing slightly out of bounds was legal, and we're currently +// inconsistent about that. For the moment, be conservative. +Opts.RoundToAlign = false; +Opts.NullIsUnknownSize = true; +uint64_t ObjSize; +// TODO: Plumb through TLI so that malloc routines and such work. +if (getObjectSize(V, ObjSize, DL, nullptr, Opts)) { + APInt KnownDerefBytes(Size.getBitWidth(), ObjSize); + if (KnownDerefBytes.getBoolValue() && KnownDerefBytes.uge(Size) && + isKnownNonZero(V, DL, 0, nullptr, CtxI, DT) && + // TODO: We're currently inconsistent about whether deref(N) is a + // global fact or a point in time fact. Once D61652 eventually + // lands, this check will be restricted to the point in time + // variant. 
For that variant, we need to prove that the object hasn't + // been conditionally freed before the context instruction - if it has, we + // might be hoisting over the inverse conditional and creating a + // dynamic use after free. + !PointerMayBeCapturedBefore(V, true, true, CtxI, DT, true)) { +// As we recursed through GEPs to get here, we've incrementally +// checked that each step advanced by a multiple of the alignment. If +// our base is properly aligned, then the original offset accessed +// must also be. +Type *Ty = V->getType(); +assert(Ty->isSized() && "must be sized"); +APInt Offset(DL.getTypeStoreSizeInBits(Ty), 0); +return isAligned(V, Offset, Alignment, DL); + } +} + } + // If we don't know, assume the worst. return false; } diff --git a/llvm/test/Transforms/LoopVectorize/X86/load-deref-pred.ll b/llvm/test/Transforms/LoopVectorize/X86/load-deref-pred.ll index 7937c71b7705..167285707e02 100644 --- a/llvm/test/Transforms/LoopVectorize/X86/load-deref-pred.ll +++ b/llvm/test/Transforms/LoopVectorize/X86/load-deref-pred.ll @@ -8,7 +8,7 @@ target triple = "x86_64-unknown-linux-
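A small, hypothetical IR example of the new base fact (mine, not from the patch's tests; @alloc is an invented allocator): the allocsize(0) attribute gives a 16-byte minimum object size for the call result, and together with the non-null and alignment facts on the call, that is enough to prove the load dereferenceable:

  declare i8* @alloc(i64) allocsize(0)

  define i32 @f() {
    %mem = call nonnull align 4 i8* @alloc(i64 16)
    %p = bitcast i8* %mem to i32*
    %v = load i32, i32* %p   ; within the known 16-byte minimum object size
    ret i32 %v
  }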
[llvm-branch-commits] [llvm] 99f79cb - [test] precommit test for D92698
Author: Philip Reames Date: 2020-12-04T15:17:39-08:00 New Revision: 99f79cbf31cc6ccdfa1aed253a64c5e8012f4ef7 URL: https://github.com/llvm/llvm-project/commit/99f79cbf31cc6ccdfa1aed253a64c5e8012f4ef7 DIFF: https://github.com/llvm/llvm-project/commit/99f79cbf31cc6ccdfa1aed253a64c5e8012f4ef7.diff LOG: [test] precommit test for D92698 Added: Modified: llvm/test/Analysis/ValueTracking/known-non-equal.ll Removed: diff --git a/llvm/test/Analysis/ValueTracking/known-non-equal.ll b/llvm/test/Analysis/ValueTracking/known-non-equal.ll index d28b3f4f63a3..ae2251b97ac4 100644 --- a/llvm/test/Analysis/ValueTracking/known-non-equal.ll +++ b/llvm/test/Analysis/ValueTracking/known-non-equal.ll @@ -1,20 +1,140 @@ +; NOTE: Assertions have been autogenerated by utils/update_test_checks.py ; RUN: opt -instsimplify < %s -S | FileCheck %s -; CHECK: define i1 @test define i1 @test(i8* %pq, i8 %B) { +; CHECK-LABEL: @test( +; CHECK-NEXT:ret i1 false +; %q = load i8, i8* %pq, !range !0 ; %q is known nonzero; no known bits %A = add nsw i8 %B, %q %cmp = icmp eq i8 %A, %B - ; CHECK: ret i1 false ret i1 %cmp } -; CHECK: define i1 @test2 define i1 @test2(i8 %a, i8 %b) { +; CHECK-LABEL: @test2( +; CHECK-NEXT:ret i1 false +; %A = or i8 %a, 2; %A[1] = 1 %B = and i8 %b, -3 ; %B[1] = 0 %cmp = icmp eq i8 %A, %B ; %A[1] and %B[1] are contradictory. - ; CHECK: ret i1 false + ret i1 %cmp +} + +define i1 @test3(i8 %B) { +; CHECK-LABEL: @test3( +; CHECK-NEXT:ret i1 false +; + %A = add nsw i8 %B, 1 + %cmp = icmp eq i8 %A, %B + ret i1 %cmp +} + +define i1 @sext(i8 %B) { +; CHECK-LABEL: @sext( +; CHECK-NEXT:ret i1 false +; + %A = add nsw i8 %B, 1 + %A.cast = sext i8 %A to i32 + %B.cast = sext i8 %B to i32 + %cmp = icmp eq i32 %A.cast, %B.cast + ret i1 %cmp +} + +define i1 @zext(i8 %B) { +; CHECK-LABEL: @zext( +; CHECK-NEXT:ret i1 false +; + %A = add nsw i8 %B, 1 + %A.cast = zext i8 %A to i32 + %B.cast = zext i8 %B to i32 + %cmp = icmp eq i32 %A.cast, %B.cast + ret i1 %cmp +} + +define i1 @inttoptr(i32 %B) { +; CHECK-LABEL: @inttoptr( +; CHECK-NEXT:[[A:%.*]] = add nsw i32 [[B:%.*]], 1 +; CHECK-NEXT:[[A_CAST:%.*]] = inttoptr i32 [[A]] to i8* +; CHECK-NEXT:[[B_CAST:%.*]] = inttoptr i32 [[B]] to i8* +; CHECK-NEXT:[[CMP:%.*]] = icmp eq i8* [[A_CAST]], [[B_CAST]] +; CHECK-NEXT:ret i1 [[CMP]] +; + %A = add nsw i32 %B, 1 + %A.cast = inttoptr i32 %A to i8* + %B.cast = inttoptr i32 %B to i8* + %cmp = icmp eq i8* %A.cast, %B.cast + ret i1 %cmp +} + +define i1 @ptrtoint(i32* %B) { +; CHECK-LABEL: @ptrtoint( +; CHECK-NEXT:[[A:%.*]] = getelementptr inbounds i32, i32* [[B:%.*]], i32 1 +; CHECK-NEXT:[[A_CAST:%.*]] = ptrtoint i32* [[A]] to i32 +; CHECK-NEXT:[[B_CAST:%.*]] = ptrtoint i32* [[B]] to i32 +; CHECK-NEXT:[[CMP:%.*]] = icmp eq i32 [[A_CAST]], [[B_CAST]] +; CHECK-NEXT:ret i1 [[CMP]] +; + %A = getelementptr inbounds i32, i32* %B, i32 1 + %A.cast = ptrtoint i32* %A to i32 + %B.cast = ptrtoint i32* %B to i32 + %cmp = icmp eq i32 %A.cast, %B.cast + ret i1 %cmp +} + +define i1 @add1(i8 %B, i8 %C) { +; CHECK-LABEL: @add1( +; CHECK-NEXT:ret i1 false +; + %A = add i8 %B, 1 + %A.op = add i8 %A, %C + %B.op = add i8 %B, %C + + %cmp = icmp eq i8 %A.op, %B.op + ret i1 %cmp +} + +define i1 @add2(i8 %B, i8 %C) { +; CHECK-LABEL: @add2( +; CHECK-NEXT:ret i1 false +; + %A = add i8 %B, 1 + %A.op = add i8 %C, %A + %B.op = add i8 %C, %B + + %cmp = icmp eq i8 %A.op, %B.op + ret i1 %cmp +} + +define i1 @sub1(i8 %B, i8 %C) { +; CHECK-LABEL: @sub1( +; CHECK-NEXT:[[A:%.*]] = add i8 [[B:%.*]], 1 +; CHECK-NEXT:[[A_OP:%.*]] = sub i8 [[A]], [[C:%.*]] +; 
CHECK-NEXT:[[B_OP:%.*]] = sub i8 [[B]], [[C]] +; CHECK-NEXT:[[CMP:%.*]] = icmp eq i8 [[A_OP]], [[B_OP]] +; CHECK-NEXT:ret i1 [[CMP]] +; + %A = add i8 %B, 1 + %A.op = sub i8 %A, %C + %B.op = sub i8 %B, %C + + %cmp = icmp eq i8 %A.op, %B.op + ret i1 %cmp +} + +define i1 @sub2(i8 %B, i8 %C) { +; CHECK-LABEL: @sub2( +; CHECK-NEXT:[[A:%.*]] = add i8 [[B:%.*]], 1 +; CHECK-NEXT:[[A_OP:%.*]] = sub i8 [[C:%.*]], [[A]] +; CHECK-NEXT:[[B_OP:%.*]] = sub i8 [[C]], [[B]] +; CHECK-NEXT:[[CMP:%.*]] = icmp eq i8 [[A_OP]], [[B_OP]] +; CHECK-NEXT:ret i1 [[CMP]] +; + %A = add i8 %B, 1 + %A.op = sub i8 %C, %A + %B.op = sub i8 %C, %B + + %cmp = icmp eq i8 %A.op, %B.op ret i1 %cmp } ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] bfda694 - [BasicAA] Fix a bug with relational reasoning across iterations
Author: Philip Reames Date: 2020-12-05T14:10:21-08:00 New Revision: bfda69416c6d0a76b40644b1b0cbc1cbca254a61 URL: https://github.com/llvm/llvm-project/commit/bfda69416c6d0a76b40644b1b0cbc1cbca254a61 DIFF: https://github.com/llvm/llvm-project/commit/bfda69416c6d0a76b40644b1b0cbc1cbca254a61.diff LOG: [BasicAA] Fix a bug with relational reasoning across iterations Due to the recursion through phis basicaa does, the code needs to be extremely careful not to reason about equality between values which might represent distinct iterations. I'm generally skeptical of the correctness of the whole scheme, but this particular patch fixes one particular instance which is demonstrably incorrect. Interestingly, this appears to be the second attempted fix for the same issue. The former fix is incomplete and doesn't address the actual issue. Differential Revision: https://reviews.llvm.org/D92694 Added: Modified: llvm/include/llvm/Analysis/BasicAliasAnalysis.h llvm/lib/Analysis/BasicAliasAnalysis.cpp llvm/test/Analysis/BasicAA/phi-aa.ll Removed: diff --git a/llvm/include/llvm/Analysis/BasicAliasAnalysis.h b/llvm/include/llvm/Analysis/BasicAliasAnalysis.h index d9a174951695..eedecd2a4381 100644 --- a/llvm/include/llvm/Analysis/BasicAliasAnalysis.h +++ b/llvm/include/llvm/Analysis/BasicAliasAnalysis.h @@ -202,6 +202,12 @@ class BasicAAResult : public AAResultBase { const DecomposedGEP &DecompGEP, const DecomposedGEP &DecompObject, LocationSize ObjectAccessSize); + AliasResult aliasSameBasePointerGEPs(const GEPOperator *GEP1, + LocationSize MaybeV1Size, + const GEPOperator *GEP2, + LocationSize MaybeV2Size, + const DataLayout &DL); + /// A Heuristic for aliasGEP that searches for a constant offset /// between the variables. /// diff --git a/llvm/lib/Analysis/BasicAliasAnalysis.cpp b/llvm/lib/Analysis/BasicAliasAnalysis.cpp index 2fb353eabb6e..5e611a9e193c 100644 --- a/llvm/lib/Analysis/BasicAliasAnalysis.cpp +++ b/llvm/lib/Analysis/BasicAliasAnalysis.cpp @@ -1032,11 +1032,11 @@ ModRefInfo BasicAAResult::getModRefInfo(const CallBase *Call1, /// Provide ad-hoc rules to disambiguate accesses through two GEP operators, /// both having the exact same pointer operand. -static AliasResult aliasSameBasePointerGEPs(const GEPOperator *GEP1, -LocationSize MaybeV1Size, -const GEPOperator *GEP2, -LocationSize MaybeV2Size, -const DataLayout &DL) { +AliasResult BasicAAResult::aliasSameBasePointerGEPs(const GEPOperator *GEP1, +LocationSize MaybeV1Size, +const GEPOperator *GEP2, +LocationSize MaybeV2Size, +const DataLayout &DL) { assert(GEP1->getPointerOperand()->stripPointerCastsAndInvariantGroups() == GEP2->getPointerOperand()->stripPointerCastsAndInvariantGroups() && GEP1->getPointerOperandType() == GEP2->getPointerOperandType() && @@ -1126,24 +1126,12 @@ static AliasResult aliasSameBasePointerGEPs(const GEPOperator *GEP1, if (C1 && C2) return NoAlias; { + // If we're not potentially reasoning about values from different + // iterations, see if we can prove them inequal. Value *GEP1LastIdx = GEP1->getOperand(GEP1->getNumOperands() - 1); Value *GEP2LastIdx = GEP2->getOperand(GEP2->getNumOperands() - 1); - if (isa(GEP1LastIdx) || isa(GEP2LastIdx)) { -// If one of the indices is a PHI node, be safe and only use -// computeKnownBits so we don't make any assumptions about the -// relationships between the two indices. This is important if we're -// asking about values from different loop iterations. See PR32314. -// TODO: We may be able to change the check so we only do this when -// we definitely looked through a PHINode. 
-if (GEP1LastIdx != GEP2LastIdx && -GEP1LastIdx->getType() == GEP2LastIdx->getType()) { - KnownBits Known1 = computeKnownBits(GEP1LastIdx, DL); - KnownBits Known2 = computeKnownBits(GEP2LastIdx, DL); - if (Known1.Zero.intersects(Known2.One) || - Known1.One.intersects(Known2.Zero)) -return NoAlias; -} - } else if (isKnownNonEqual(GEP1LastIdx, GEP2LastIdx, DL)) + if (VisitedPhiBBs.empty() && + isKnownNonEqual(GEP1LastIdx, GEP2LastIdx, DL)) return NoAlias; } } diff --git a/llvm/test/Analysis/BasicAA/phi-
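The hazard being fixed, as a reduced hypothetical sketch in the spirit of the PR32314 pattern cited above (mine, not the patch's test):

  @a = global [8 x i32] zeroinitializer

  loop:
    %i = phi i64 [ 0, %entry ], [ %i.next, %loop ]
    %i.next = add i64 %i, 1
    %p = getelementptr [8 x i32], [8 x i32]* @a, i64 0, i64 %i
    %q = getelementptr [8 x i32], [8 x i32]* @a, i64 0, i64 %i.next
    ...

Within a single iteration, isKnownNonEqual can prove %i != %i.next, so a NoAlias answer looks justified. But once BasicAA has recursed through the phi (VisitedPhiBBs non-empty, per the check in the diff), the two GEPs may stand for different iterations, and %i.next from iteration k is exactly %i from iteration k+1, so the locations do alias.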
[llvm-branch-commits] [llvm] 8f07629 - Add recursive decomposition reasoning to isKnownNonEqual
Author: Philip Reames Date: 2020-12-05T15:58:19-08:00 New Revision: 8f076291be41467560ebf73738561225d2b67206 URL: https://github.com/llvm/llvm-project/commit/8f076291be41467560ebf73738561225d2b67206 DIFF: https://github.com/llvm/llvm-project/commit/8f076291be41467560ebf73738561225d2b67206.diff LOG: Add recursive decomposition reasoning to isKnownNonEqual The basic idea is that by looking through operand instructions which don't change the equality result we can push the existing known bits comparison down past instructions which would obscure them. We have analogous handling in InstSimplify for most - though weirdly not all - of these cases starting from an icmp root. It's a bit unfortunate to duplicate logic, but since my actual goal is to extend BasicAA, the icmp logic doesn't help. (And just makes it hard to test here.) The BasicAA change will be posted separately for review. Differential Revision: https://reviews.llvm.org/D92698 Added: Modified: llvm/lib/Analysis/ValueTracking.cpp llvm/test/Analysis/ValueTracking/known-non-equal.ll Removed: diff --git a/llvm/lib/Analysis/ValueTracking.cpp b/llvm/lib/Analysis/ValueTracking.cpp index 32e0ca321dec..a1bb6e2eea78 100644 --- a/llvm/lib/Analysis/ValueTracking.cpp +++ b/llvm/lib/Analysis/ValueTracking.cpp @@ -350,13 +350,14 @@ bool llvm::isKnownNegative(const Value *V, const DataLayout &DL, unsigned Depth, return Known.isNegative(); } -static bool isKnownNonEqual(const Value *V1, const Value *V2, const Query &Q); +static bool isKnownNonEqual(const Value *V1, const Value *V2, unsigned Depth, +const Query &Q); bool llvm::isKnownNonEqual(const Value *V1, const Value *V2, const DataLayout &DL, AssumptionCache *AC, const Instruction *CxtI, const DominatorTree *DT, bool UseInstrInfo) { - return ::isKnownNonEqual(V1, V2, + return ::isKnownNonEqual(V1, V2, 0, Query(DL, AC, safeCxtI(V1, safeCxtI(V2, CxtI)), DT, UseInstrInfo, /*ORE=*/nullptr)); } @@ -2486,7 +2487,8 @@ bool isKnownNonZero(const Value* V, unsigned Depth, const Query& Q) { } /// Return true if V2 == V1 + X, where X is known non-zero. -static bool isAddOfNonZero(const Value *V1, const Value *V2, const Query &Q) { +static bool isAddOfNonZero(const Value *V1, const Value *V2, unsigned Depth, + const Query &Q) { const BinaryOperator *BO = dyn_cast(V1); if (!BO || BO->getOpcode() != Instruction::Add) return false; @@ -2497,24 +2499,54 @@ static bool isAddOfNonZero(const Value *V1, const Value *V2, const Query &Q) { Op = BO->getOperand(0); else return false; - return isKnownNonZero(Op, 0, Q); + return isKnownNonZero(Op, Depth + 1, Q); } /// Return true if it is known that V1 != V2. -static bool isKnownNonEqual(const Value *V1, const Value *V2, const Query &Q) { +static bool isKnownNonEqual(const Value *V1, const Value *V2, unsigned Depth, +const Query &Q) { if (V1 == V2) return false; if (V1->getType() != V2->getType()) // We can't look through casts yet. return false; - if (isAddOfNonZero(V1, V2, Q) || isAddOfNonZero(V2, V1, Q)) + + if (Depth >= MaxAnalysisRecursionDepth) +return false; + + // See if we can recurse through (exactly one of) our operands. 
+ auto *O1 = dyn_cast(V1); + auto *O2 = dyn_cast(V2); + if (O1 && O2 && O1->getOpcode() == O2->getOpcode()) { +switch (O1->getOpcode()) { +default: break; +case Instruction::Add: +case Instruction::Sub: + // Assume operand order has been canonicalized + if (O1->getOperand(0) == O2->getOperand(0)) +return isKnownNonEqual(O1->getOperand(1), O2->getOperand(1), + Depth + 1, Q); + if (O1->getOperand(1) == O2->getOperand(1)) +return isKnownNonEqual(O1->getOperand(0), O2->getOperand(0), + Depth + 1, Q); + break; +case Instruction::SExt: +case Instruction::ZExt: + if (O1->getOperand(0)->getType() == O2->getOperand(0)->getType()) +return isKnownNonEqual(O1->getOperand(0), O2->getOperand(0), + Depth + 1, Q); + break; +}; + } + + if (isAddOfNonZero(V1, V2, Depth, Q) || isAddOfNonZero(V2, V1, Depth, Q)) return true; if (V1->getType()->isIntOrIntVectorTy()) { // Are any known bits in V1 contradictory to known bits in V2? If V1 // has a known zero where V2 has a known one, they must not be equal. -KnownBits Known1 = computeKnownBits(V1, 0, Q); -KnownBits Known2 = computeKnownBits(V2, 0, Q); +KnownBits Known1 = computeKnownBits(V1, Depth, Q); +KnownBits Known2 = computeKnownBits(V2, Depth, Q); if (Known1.Zero.intersects(Known2.One) |
[llvm-branch-commits] [llvm] 2656885 - Teach isKnownNonEqual how to recurse through invertible multiplies
Author: Philip Reames Date: 2020-12-07T14:52:08-08:00 New Revision: 2656885390f17cceae142b4265c337fcee2410c0 URL: https://github.com/llvm/llvm-project/commit/2656885390f17cceae142b4265c337fcee2410c0 DIFF: https://github.com/llvm/llvm-project/commit/2656885390f17cceae142b4265c337fcee2410c0.diff LOG: Teach isKnownNonEqual how to recurse through invertible multiplies Build on the work started in 8f07629, and add the multiply case. In the process, more clearly describe the requirement for the operation we're looking through. Differential Revision: https://reviews.llvm.org/D92726 Added: Modified: llvm/lib/Analysis/ValueTracking.cpp llvm/test/Analysis/ValueTracking/known-non-equal.ll Removed: diff --git a/llvm/lib/Analysis/ValueTracking.cpp b/llvm/lib/Analysis/ValueTracking.cpp index a1bb6e2eea78..eeb505868703 100644 --- a/llvm/lib/Analysis/ValueTracking.cpp +++ b/llvm/lib/Analysis/ValueTracking.cpp @@ -2502,6 +2502,7 @@ static bool isAddOfNonZero(const Value *V1, const Value *V2, unsigned Depth, return isKnownNonZero(Op, Depth + 1, Q); } + /// Return true if it is known that V1 != V2. static bool isKnownNonEqual(const Value *V1, const Value *V2, unsigned Depth, const Query &Q) { @@ -2514,7 +2515,9 @@ static bool isKnownNonEqual(const Value *V1, const Value *V2, unsigned Depth, if (Depth >= MaxAnalysisRecursionDepth) return false; - // See if we can recurse through (exactly one of) our operands. + // See if we can recurse through (exactly one of) our operands. This + // requires our operation be 1-to-1 and map every input value to exactly + // one output value. Such an operation is invertible. auto *O1 = dyn_cast(V1); auto *O2 = dyn_cast(V2); if (O1 && O2 && O1->getOpcode() == O2->getOpcode()) { @@ -2530,6 +2533,23 @@ static bool isKnownNonEqual(const Value *V1, const Value *V2, unsigned Depth, return isKnownNonEqual(O1->getOperand(0), O2->getOperand(0), Depth + 1, Q); break; +case Instruction::Mul: + // invertible if A * B == (A * B) mod 2^N where A and B are integers + // and N is the bitwidth. The nsw case is non-obvious, but proven by + // alive2: https://alive2.llvm.org/ce/z/Z6D5qK + if ((!cast(O1)->hasNoUnsignedWrap() || + !cast(O2)->hasNoUnsignedWrap()) && + (!cast(O1)->hasNoSignedWrap() || + !cast(O2)->hasNoSignedWrap())) +break; + + // Assume operand order has been canonicalized + if (O1->getOperand(1) == O2->getOperand(1) && + isa(O1->getOperand(1)) && + !cast(O1->getOperand(1))->isZero()) +return isKnownNonEqual(O1->getOperand(0), O2->getOperand(0), + Depth + 1, Q); + break; case Instruction::SExt: case Instruction::ZExt: if (O1->getOperand(0)->getType() == O2->getOperand(0)->getType()) diff --git a/llvm/test/Analysis/ValueTracking/known-non-equal.ll b/llvm/test/Analysis/ValueTracking/known-non-equal.ll index 664542f632ab..8bc9a86c9a93 100644 --- a/llvm/test/Analysis/ValueTracking/known-non-equal.ll +++ b/llvm/test/Analysis/ValueTracking/known-non-equal.ll @@ -130,4 +130,76 @@ define i1 @sub2(i8 %B, i8 %C) { ret i1 %cmp } +; op could wrap mapping two values to the same output value. 
+define i1 @mul1(i8 %B) { +; CHECK-LABEL: @mul1( +; CHECK-NEXT:[[A:%.*]] = add i8 [[B:%.*]], 1 +; CHECK-NEXT:[[A_OP:%.*]] = mul i8 [[A]], 27 +; CHECK-NEXT:[[B_OP:%.*]] = mul i8 [[B]], 27 +; CHECK-NEXT:[[CMP:%.*]] = icmp eq i8 [[A_OP]], [[B_OP]] +; CHECK-NEXT:ret i1 [[CMP]] +; + %A = add i8 %B, 1 + %A.op = mul i8 %A, 27 + %B.op = mul i8 %B, 27 + + %cmp = icmp eq i8 %A.op, %B.op + ret i1 %cmp +} + +define i1 @mul2(i8 %B) { +; CHECK-LABEL: @mul2( +; CHECK-NEXT:ret i1 false +; + %A = add i8 %B, 1 + %A.op = mul nuw i8 %A, 27 + %B.op = mul nuw i8 %B, 27 + + %cmp = icmp eq i8 %A.op, %B.op + ret i1 %cmp +} + +define i1 @mul3(i8 %B) { +; CHECK-LABEL: @mul3( +; CHECK-NEXT:ret i1 false +; + %A = add i8 %B, 1 + %A.op = mul nsw i8 %A, 27 + %B.op = mul nsw i8 %B, 27 + + %cmp = icmp eq i8 %A.op, %B.op + ret i1 %cmp +} + +; Multiply by zero collapses all values to one +define i1 @mul4(i8 %B) { +; CHECK-LABEL: @mul4( +; CHECK-NEXT:ret i1 true +; + %A = add i8 %B, 1 + %A.op = mul nuw i8 %A, 0 + %B.op = mul nuw i8 %B, 0 + + %cmp = icmp eq i8 %A.op, %B.op + ret i1 %cmp +} + +; C might be zero, we can't tell +define i1 @mul5(i8 %B, i8 %C) { +; CHECK-LABEL: @mul5( +; CHECK-NEXT:[[A:%.*]] = add i8 [[B:%.*]], 1 +; CHECK-NEXT:[[A_OP:%.*]] = mul nuw nsw i8 [[A]], [[C:%.*]] +; CHECK-NEXT:[[B_OP:%.*]] = mul nuw nsw i8 [[B]], [[C]] +; CHECK-NEXT:[[CMP:%.*]] = icmp eq i8 [[A_OP]], [[B_OP]] +; CHECK-NEXT:ret i1 [[CMP]] +; + %A = add i8 %B, 1 + %A.op = mul nsw nuw i8 %A, %C + %B.op = mul nsw nuw i
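A concrete instance of the invertibility argument (my arithmetic, not taken from the commit): on i8, multiplication by 27 is a bijection mod 2^8 because gcd(27, 256) = 1. Indeed,

  27 * 19 = 513 = 2 * 256 + 1, so 27 * 19 == 1 (mod 2^8)

meaning 19 is the multiplicative inverse of 27, and 27*x == 27*y (mod 256) forces x == y. Note the patch does not reason about oddness directly; it keys on the nuw/nsw flags plus a known non-zero constant operand, which is why the flagless @mul1 above is left unsimplified while @mul2 (nuw) and @mul3 (nsw) fold to false.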
[llvm-branch-commits] [llvm] 5171b7b - [indvars] Common a bit of code [NFC]
Author: Philip Reames Date: 2020-12-08T15:25:48-08:00 New Revision: 5171b7b40e9813e3fbfaf1e1e3372895c9ff6081 URL: https://github.com/llvm/llvm-project/commit/5171b7b40e9813e3fbfaf1e1e3372895c9ff6081 DIFF: https://github.com/llvm/llvm-project/commit/5171b7b40e9813e3fbfaf1e1e3372895c9ff6081.diff LOG: [indvars] Common a bit of code [NFC] Added: Modified: llvm/lib/Transforms/Utils/SimplifyIndVar.cpp Removed: diff --git a/llvm/lib/Transforms/Utils/SimplifyIndVar.cpp b/llvm/lib/Transforms/Utils/SimplifyIndVar.cpp index c02264aec600..189130f0e0ac 100644 --- a/llvm/lib/Transforms/Utils/SimplifyIndVar.cpp +++ b/llvm/lib/Transforms/Utils/SimplifyIndVar.cpp @@ -1272,28 +1272,8 @@ Instruction *WidenIV::cloneArithmeticIVUser(WidenIV::NarrowIVDefUse DU, } // WideUse is "WideDef `op.wide` X" as described in the comment. -const SCEV *WideUse = nullptr; - -switch (NarrowUse->getOpcode()) { -default: - llvm_unreachable("No other possibility!"); - -case Instruction::Add: - WideUse = SE->getAddExpr(WideLHS, WideRHS); - break; - -case Instruction::Mul: - WideUse = SE->getMulExpr(WideLHS, WideRHS); - break; - -case Instruction::UDiv: - WideUse = SE->getUDivExpr(WideLHS, WideRHS); - break; - -case Instruction::Sub: - WideUse = SE->getMinusSCEV(WideLHS, WideRHS); - break; -} +const SCEV *WideUse = + getSCEVByOpCode(WideLHS, WideRHS, NarrowUse->getOpcode()); return WideUse == WideAR; }; @@ -1332,14 +1312,18 @@ WidenIV::ExtendKind WidenIV::getExtendKind(Instruction *I) { const SCEV *WidenIV::getSCEVByOpCode(const SCEV *LHS, const SCEV *RHS, unsigned OpCode) const { - if (OpCode == Instruction::Add) + switch (OpCode) { + case Instruction::Add: return SE->getAddExpr(LHS, RHS); - if (OpCode == Instruction::Sub) + case Instruction::Sub: return SE->getMinusSCEV(LHS, RHS); - if (OpCode == Instruction::Mul) + case Instruction::Mul: return SE->getMulExpr(LHS, RHS); - - llvm_unreachable("Unsupported opcode."); + case Instruction::UDiv: +return SE->getUDivExpr(LHS, RHS); + default: +llvm_unreachable("Unsupported opcode."); + }; } /// No-wrap operations can transfer sign extension of their result to their ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [RISCV] Add initial support of memcmp expansion (PR #107548)
preames wrote: At a macro level, it looks like ExpandMemCmp is making some problematic choices around unaligned loads and stores. As I commented before, ExpandMemCmp appears to be blindly emitting unaligned accesses (counted as one against budget) without accounting for the fact that such loads are going to be scalarized again (i.e. resulting in N x loads, where N is the type size). I think we need to fix this. In particular, the discussion around Zbb and Zbkb in this review seems to mostly come from cases where unaligned load/store are being expanded implicitly. I don't believe this change should move forward until the underlying issue in ExpandMemCmp has been addressed. https://github.com/llvm/llvm-project/pull/107548 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
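To make the accounting problem concrete (a hand-written sketch, not output from the pass): for memcmp(a, b, 8) with byte-aligned pointers, ExpandMemCmp budgets a single load pair,

  %x = load i64, ptr %pa, align 1   ; counted as one load against the budget
  %y = load i64, ptr %pb, align 1   ; counted as one load against the budget

but on a target without fast unaligned access, each align-1 i64 load is later legalized into eight i8 loads plus shift/or reassembly, so the real cost is roughly sixteen loads and a long dependency chain rather than the two loads the expansion budget charged.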
[llvm-branch-commits] [llvm] [RISCV] Add initial support of memcmp expansion (PR #107548)
https://github.com/preames approved this pull request. LGTM https://github.com/llvm/llvm-project/pull/107548 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [RISCV] Set DisableLatencyHeuristic to true (PR #115858)
preames wrote: Given @michaelmaitland's data, @wangpc-pp the burden shifts to you to clearly justify in which cases this is profitable and to figure out how to selectively enable it only in the profitable cases. I agree with @michaelmaitland's conclusion that this should not move forward otherwise. @michaelmaitland Can you say anything about the magnitude of the regressions in either case? I assume they were statistically significant given you mention them, but are these small regressions or largish ones? https://github.com/llvm/llvm-project/pull/115858 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [RISCV] Enable ShouldTrackLaneMasks when having vector instructions (PR #115843)
preames wrote: > Ping. I went and dug through the diffs in the tests. I see no obvious evidence of performance improvement, and a couple of regressions (see vector_interleave_nxv16f64_nxv8f64). I don't think this patch should move forward unless we have a justification for why we think this is a net performance win. The easiest way to make said argument is to share measurements from some benchmark set (e.g. spec) on some vector hardware (e.g. bp3). I'll note that from a conceptual standpoint this patch does seem to make sense. My worry (triggered by the regression noted above) is that this may be exposing some other issue and that we need to unwind things a bit before this can land. https://github.com/llvm/llvm-project/pull/115843 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] release/20.x: [RISCV] Use getSignedConstant for negative values. (#125903) (PR #125953)
https://github.com/preames approved this pull request. LGTM https://github.com/llvm/llvm-project/pull/125953 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] release/20.x: [RISCV] Check isFixedLengthVector before calling getVectorNumElements in getSingleShuffleSrc. (#125455) (PR #125590)
https://github.com/preames approved this pull request. LGTM https://github.com/llvm/llvm-project/pull/125590 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] DAG: Avoid breaking legal vector_shuffle with multiple uses (PR #123712)
https://github.com/preames approved this pull request. LGTM I suspect we'll want to refine the profitability here over time, but this seems reasonable as a stepping stone. https://github.com/llvm/llvm-project/pull/123712 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] DAG: Avoid forming shufflevector from a single extract_vector_elt (PR #122672)
preames wrote: > > * BuildVector w/one non-zero non-undef source, repeated 100 times (i.e. > > splat or select of two splats) > > I don't follow, this is a 2 element vector, how can you have 100 variants? Isn't the condition in the code in terms of VecIn.size() == 2? I believe that VecIn is the *unique* input elements, right? Which is distinct from the number of elements in the destination type? (Am I just misreading? I only skimmed this.) > > If the target isn't optimally lowering the splat or select of splat case in > > the shuffle lowering, maybe we should just adjust the target lowering to do > > so? > > It's not a lowering issue, it's the effect on every other combine. We'd have > to special case 1 element + 1 undef shuffles everywhere we handle > extract_vector_elt now, which is just excessive complexity. #122671 is almost > an alternative in one instance, but still shows the expanding complexity of > handling this edge case. Honestly, #122671 (from the review description only) sounds like a worthwhile change, so that's not a hugely compelling argument here. Let's settle the prior point, and then return to this. If I'm just misreading something, let's not waste time discussing this. https://github.com/llvm/llvm-project/pull/122672 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] DAG: Avoid forming shufflevector from a single extract_vector_elt (PR #122672)
https://github.com/preames commented: I don't think the heuristic here is quite what you want. I believe this heuristic disables both of the following cases: * BuildVector w/one non-zero non-undef element * BuildVector w/one non-zero non-undef source, repeated 100 times (i.e. splat or select of two splats) Disabling the former seems defensible; doing so for the latter, less so. Though honestly, I'm not sure about this change as a whole. Having a single canonical form seems valuable here. If the target isn't optimally lowering the splat or select of splat case in the shuffle lowering, maybe we should just adjust the target lowering to do so? https://github.com/llvm/llvm-project/pull/122672 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] DAG: Avoid forming shufflevector from a single extract_vector_elt (PR #122672)
preames wrote: > > Isn't the condition in the code in terms of VecIn.size() == 2? I believe that > > VecIn is the _unique_ input elements, right? Which is distinct from the > > number of elements in the destination type? (Am I just misreading? I only > > skimmed this.) > > VecIn is collecting only extract_vector_elts feeding the build_vector. So > it's true it's not only a 2 element vector, in general (but the standard case > of building a complete vector is 2 elements). The other skipped elements are > all constant or undef. > > A 2 element shuffle just happens to be the only case I care about which I'm > trying to make legal (and really only the odd -> even case is of any use). This is exactly the distinction I'm trying to get at. Avoiding the creation of a 1-2 element shuffle seems quite reasonable. Avoiding the creation of a 100 element splat shuffle does not. I think you need to add an explicit condition in terms of the number of elements in the result, not the number of *unique* input elements. https://github.com/llvm/llvm-project/pull/122672 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
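For concreteness, here is a minimal sketch of the guard being suggested, assuming it sits at the top of a reduceBuildVecToShuffle-style combine in DAGCombiner. The function name and surrounding structure are assumptions for illustration, not the patch's actual code.

```cpp
#include "llvm/CodeGen/SelectionDAG.h"

using namespace llvm;

// Sketch: key the bailout to the width of the BUILD_VECTOR's result,
// not to the number of unique extract_vector_elt inputs (VecIn.size()),
// so that a wide splat-like build_vector with one unique source is
// still turned into a shuffle.
static SDValue reduceBuildVecToShuffleSketch(SDNode *N) {
  EVT VT = N->getValueType(0);
  // Only skip the genuinely tiny shuffles; a 100-element splat shuffle
  // built from a single unique extract should still be formed.
  if (VT.getVectorNumElements() <= 2)
    return SDValue();
  // ... collect VecIn and build the shuffle as before ...
  return SDValue();
}
```

The design point is that VecIn.size() measures how many distinct sources feed the build_vector, while the profitability question here is about how wide the resulting shuffle would be; the two only coincide for small result types.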
[llvm-branch-commits] [llvm] DAG: Avoid forming shufflevector from a single extract_vector_elt (PR #122672)
https://github.com/preames approved this pull request. LGTM https://github.com/llvm/llvm-project/pull/122672 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [RISCV] Support non-power-of-2 types when expanding memcmp (PR #114971)
@@ -16190,13 +16186,20 @@ combineVectorSizedSetCCEquality(EVT VT, SDValue X, SDValue Y, ISD::CondCode CC, return SDValue(); unsigned VecSize = OpSize / 8; preames wrote: Where in the code above do we have a guarantee that OpSize is a multiple of 8? https://github.com/llvm/llvm-project/pull/114971 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
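If nothing upstream establishes that guarantee, one option is to make the assumption explicit at this point. A minimal sketch, assuming OpSize is in bits as the division by 8 suggests:

```cpp
// Bail out rather than silently round down if the compared operand is
// not a whole number of bytes.
if (OpSize % 8 != 0)
  return SDValue();
unsigned VecSize = OpSize / 8;
```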