After llvm commit de2fed61528a5584dc54c47f6754408597be24de
Author: Philip Reames <listm...@philipreames.com>

    [unroll] Keep unrolled iterations with initial iteration

the following benchmarks slowed down by more than 2%:
- 464.h264ref slowed down by 6% from 10902 to 11518 perf samples
  - 464.h264ref:[.] FastFullPelBlockMotionSearch slowed down by 43% from 1494 to 2141 perf samples
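
The percentages above can be recomputed from the perf sample counts.  Below is
a minimal sketch, assuming a POSIX shell with awk; slowdown_pct is a
hypothetical helper name used only for illustration and is not part of the
Linaro jenkins-scripts.

```shell
# Hypothetical helper (not part of jenkins-scripts): relative slowdown in
# percent, rounded to the nearest integer, given the "before" and "after"
# perf sample counts.
slowdown_pct() {
  awk -v before="$1" -v after="$2" \
    'BEGIN { printf "%.0f\n", (after - before) * 100 / before }'
}

slowdown_pct 10902 11518   # 464.h264ref overall: prints 6
slowdown_pct 1494 2141     # FastFullPelBlockMotionSearch: prints 43
```

The first figure is about 5.65% before rounding, which is why the report shows
6% for the benchmark as a whole.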

The reproducer instructions below can be used to rebuild both the "first_bad"
and "last_good" cross-toolchains used in this bisection.  Naturally, the
scripts will fail when triggering benchmarking jobs if you don't have access to
Linaro TCWG CI.

For your convenience, we have uploaded tarballs with pre-processed source and 
assembly files at:
- First_bad save-temps: 
https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-aarch64-spec2k6-O3/34/artifact/artifacts/build-de2fed61528a5584dc54c47f6754408597be24de/save-temps/
- Last_good save-temps: 
https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-aarch64-spec2k6-O3/34/artifact/artifacts/build-da25f968a90ad4560fc920a6d18fc2a0221d2750/save-temps/
- Baseline save-temps: 
https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-aarch64-spec2k6-O3/34/artifact/artifacts/build-baseline/save-temps/

Configuration:
- Benchmark: SPEC CPU2006
- Toolchain: Clang + Glibc + LLVM Linker
- Version: all components were built from their tip of trunk
- Target: aarch64-linux-gnu
- Compiler flags: -O3
- Hardware: NVidia TX1 4x Cortex-A57

This benchmarking CI is a work in progress, and we welcome feedback and 
suggestions at linaro-toolchain@lists.linaro.org .  Our improvement plans 
include adding support for SPEC CPU2017 benchmarks and providing 
"perf report/annotate" data behind these reports.

THIS IS THE END OF INTERESTING STUFF.  BELOW ARE LINKS TO BUILDS, REPRODUCTION 
INSTRUCTIONS, AND THE RAW COMMIT.

This commit has regressed these CI configurations:
 - tcwg_bmk_llvm_tx1/llvm-master-aarch64-spec2k6-O3

First_bad build: 
https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-aarch64-spec2k6-O3/34/artifact/artifacts/build-de2fed61528a5584dc54c47f6754408597be24de/
Last_good build: 
https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-aarch64-spec2k6-O3/34/artifact/artifacts/build-da25f968a90ad4560fc920a6d18fc2a0221d2750/
Baseline build: 
https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-aarch64-spec2k6-O3/34/artifact/artifacts/build-baseline/
Even more details: 
https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-aarch64-spec2k6-O3/34/artifact/artifacts/

Reproduce builds:
<cut>
mkdir investigate-llvm-de2fed61528a5584dc54c47f6754408597be24de
cd investigate-llvm-de2fed61528a5584dc54c47f6754408597be24de

# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts

# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-aarch64-spec2k6-O3/34/artifact/artifacts/manifests/build-baseline.sh --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-aarch64-spec2k6-O3/34/artifact/artifacts/manifests/build-parameters.sh --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-aarch64-spec2k6-O3/34/artifact/artifacts/test.sh --fail
chmod +x artifacts/test.sh

# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh

# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/

cd llvm

# Reproduce first_bad build
git checkout --detach de2fed61528a5584dc54c47f6754408597be24de
../artifacts/test.sh

# Reproduce last_good build
git checkout --detach da25f968a90ad4560fc920a6d18fc2a0221d2750
../artifacts/test.sh

cd ..
</cut>

Full commit (up to 1000 lines):
<cut>
commit de2fed61528a5584dc54c47f6754408597be24de
Author: Philip Reames <listm...@philipreames.com>
Date:   Fri Nov 12 11:35:28 2021 -0800

    [unroll] Keep unrolled iterations with initial iteration
    
    The unrolling code was previously inserting new cloned blocks at the end of the function.  The result of this with typical loop structures is that the new iterations are placed far from the initial iteration.
    
    With unrolling, the general assumption is that the a) the loop is reasonable hot, and b) the first Count-1 copies of the loop are rarely (if ever) loop exiting.  As such, placing Count-1 copies out of line is a fairly poor code placement choice.  We'd much rather fall through into the hot (non-exiting) path.  For code with branch profiles, later layout would fix this, but this may have a positive impact on non-PGO compiled code.
    
    However, the real motivation for this change isn't performance.  Its readability and human understanding.  Having to jump around long distances in an IR file to trace an unrolled loop structure is error prone and tedious.
---
 llvm/lib/Transforms/Utils/LoopUnroll.cpp           |    6 +-
 llvm/test/DebugInfo/unrolled-loop-remainder.ll     |   86 +-
 .../Transforms/LoopUnroll/2011-08-08-PhiUpdate.ll  |   66 +-
 .../Transforms/LoopUnroll/2011-08-09-PhiUpdate.ll  |   24 +-
 .../LoopUnroll/AArch64/runtime-unroll-generic.ll   |    4 +-
 .../LoopUnroll/AArch64/thresholdO3-cost-model.ll   |    8 +-
 .../LoopUnroll/AArch64/unroll-upperbound.ll        |    4 +-
 .../Transforms/LoopUnroll/ARM/loop-unrolling.ll    |    4 +-
 .../test/Transforms/LoopUnroll/ARM/multi-blocks.ll |  230 +-
 llvm/test/Transforms/LoopUnroll/ARM/upperbound.ll  |   10 +-
 .../LoopUnroll/full-unroll-keep-first-exit.ll      |   16 +-
 .../full-unroll-one-unpredictable-exit.ll          |   16 +-
 llvm/test/Transforms/LoopUnroll/multiple-exits.ll  |    8 +-
 llvm/test/Transforms/LoopUnroll/nonlatchcondbr.ll  |   20 +-
 .../LoopUnroll/partial-unroll-non-latch-exit.ll    |   14 +-
 .../partially-unroll-unconditional-latch.ll        |    4 +-
 .../LoopUnroll/runtime-loop-at-most-two-exits.ll   |  120 +-
 .../runtime-loop-multiexit-dom-verify.ll           |  206 +-
 .../LoopUnroll/runtime-loop-multiple-exits.ll      | 2560 ++++++++++----------
 llvm/test/Transforms/LoopUnroll/runtime-loop5.ll   |   34 +-
 .../LoopUnroll/runtime-multiexit-heuristic.ll      |  122 +-
 .../LoopUnroll/runtime-small-upperbound.ll         |    8 +-
 .../LoopUnroll/runtime-unroll-remainder.ll         |   62 +-
 llvm/test/Transforms/LoopUnroll/scevunroll.ll      |   48 +-
 .../Transforms/LoopUnroll/shifted-tripcount.ll     |    4 +-
 ...er-exiting-with-phis-multiple-exiting-blocks.ll |   20 +-
 .../LoopUnroll/unroll-unconditional-latch.ll       |   12 +-
 .../Transforms/LoopUnrollAndJam/unroll-and-jam.ll  |   68 +-
 .../PhaseOrdering/AArch64/matrix-extract-insert.ll |    4 +-
 29 files changed, 1896 insertions(+), 1892 deletions(-)

diff --git a/llvm/lib/Transforms/Utils/LoopUnroll.cpp b/llvm/lib/Transforms/Utils/LoopUnroll.cpp
index ce463927fd50..b0c622b98d5e 100644
--- a/llvm/lib/Transforms/Utils/LoopUnroll.cpp
+++ b/llvm/lib/Transforms/Utils/LoopUnroll.cpp
@@ -514,6 +514,10 @@ LoopUnrollResult llvm::UnrollLoop(Loop *L, UnrollLoopOptions ULO, LoopInfo *LI,
   SmallVector<MDNode *, 6> LoopLocalNoAliasDeclScopes;
   identifyNoAliasScopesToClone(L->getBlocks(), LoopLocalNoAliasDeclScopes);
 
+  // We place the unrolled iterations immediately after the original loop
+  // latch.  This is a reasonable default placement if we don't have block
+  // frequencies, and if we do, well the layout will be adjusted later.
+  auto BlockInsertPt = std::next(LatchBlock->getIterator());
   for (unsigned It = 1; It != ULO.Count; ++It) {
     SmallVector<BasicBlock *, 8> NewBlocks;
     SmallDenseMap<const Loop *, Loop *, 4> NewLoops;
@@ -522,7 +526,7 @@ LoopUnrollResult llvm::UnrollLoop(Loop *L, UnrollLoopOptions ULO, LoopInfo *LI,
     for (LoopBlocksDFS::RPOIterator BB = BlockBegin; BB != BlockEnd; ++BB) {
       ValueToValueMapTy VMap;
       BasicBlock *New = CloneBasicBlock(*BB, VMap, "." + Twine(It));
-      Header->getParent()->getBasicBlockList().push_back(New);
+      Header->getParent()->getBasicBlockList().insert(BlockInsertPt, New);
 
       assert((*BB != Header || LI->getLoopFor(*BB) == L) &&
              "Header should not be in a sub-loop");
diff --git a/llvm/test/DebugInfo/unrolled-loop-remainder.ll 
b/llvm/test/DebugInfo/unrolled-loop-remainder.ll
index 83c30dec780d..ba4ce1f409f6 100644
--- a/llvm/test/DebugInfo/unrolled-loop-remainder.ll
+++ b/llvm/test/DebugInfo/unrolled-loop-remainder.ll
@@ -38,71 +38,71 @@ define i32 @func_c() local_unnamed_addr #0 !dbg !14 {
 ; CHECK-NEXT:    [[PROL_ITER_SUB:%.*]] = sub i32 [[XTRAITER]], 1, !dbg 
[[DBG24]]
 ; CHECK-NEXT:    [[PROL_ITER_CMP:%.*]] = icmp ne i32 [[PROL_ITER_SUB]], 0, 
!dbg [[DBG24]]
 ; CHECK-NEXT:    br i1 [[PROL_ITER_CMP]], label [[FOR_BODY_PROL_1:%.*]], label 
[[FOR_BODY_PROL_LOOPEXIT_UNR_LCSSA:%.*]], !dbg [[DBG24]]
+; CHECK:       for.body.prol.1:
+; CHECK-NEXT:    [[ARRAYIDX_PROL_1:%.*]] = getelementptr inbounds i32, i32* 
[[TMP6]], i64 1, !dbg [[DBG28]]
+; CHECK-NEXT:    [[TMP7:%.*]] = load i32, i32* [[ARRAYIDX_PROL_1]], align 4, 
!dbg [[DBG28]], !tbaa [[TBAA20]]
+; CHECK-NEXT:    [[CONV_PROL_1:%.*]] = sext i32 [[TMP7]] to i64, !dbg [[DBG28]]
+; CHECK-NEXT:    [[TMP8:%.*]] = inttoptr i64 [[CONV_PROL_1]] to i32*, !dbg 
[[DBG28]]
+; CHECK-NEXT:    [[ADD_PROL_1:%.*]] = add nsw i32 [[ADD_PROL]], 2, !dbg 
[[DBG29]]
+; CHECK-NEXT:    [[PROL_ITER_SUB_1:%.*]] = sub i32 [[PROL_ITER_SUB]], 1, !dbg 
[[DBG24]]
+; CHECK-NEXT:    [[PROL_ITER_CMP_1:%.*]] = icmp ne i32 [[PROL_ITER_SUB_1]], 0, 
!dbg [[DBG24]]
+; CHECK-NEXT:    br i1 [[PROL_ITER_CMP_1]], label [[FOR_BODY_PROL_2:%.*]], 
label [[FOR_BODY_PROL_LOOPEXIT_UNR_LCSSA]], !dbg [[DBG24]]
+; CHECK:       for.body.prol.2:
+; CHECK-NEXT:    [[ARRAYIDX_PROL_2:%.*]] = getelementptr inbounds i32, i32* 
[[TMP8]], i64 1, !dbg [[DBG28]]
+; CHECK-NEXT:    [[TMP9:%.*]] = load i32, i32* [[ARRAYIDX_PROL_2]], align 4, 
!dbg [[DBG28]], !tbaa [[TBAA20]]
+; CHECK-NEXT:    [[CONV_PROL_2:%.*]] = sext i32 [[TMP9]] to i64, !dbg [[DBG28]]
+; CHECK-NEXT:    [[TMP10:%.*]] = inttoptr i64 [[CONV_PROL_2]] to i32*, !dbg 
[[DBG28]]
+; CHECK-NEXT:    [[ADD_PROL_2:%.*]] = add nsw i32 [[ADD_PROL_1]], 2, !dbg 
[[DBG29]]
+; CHECK-NEXT:    br label [[FOR_BODY_PROL_LOOPEXIT_UNR_LCSSA]]
 ; CHECK:       for.body.prol.loopexit.unr-lcssa:
-; CHECK-NEXT:    [[DOTLCSSA_UNR_PH:%.*]] = phi i32* [ [[TMP6]], 
[[FOR_BODY_PROL]] ], [ [[TMP20:%.*]], [[FOR_BODY_PROL_1]] ], [ [[TMP22:%.*]], 
[[FOR_BODY_PROL_2:%.*]] ]
-; CHECK-NEXT:    [[DOTUNR_PH:%.*]] = phi i32* [ [[TMP6]], [[FOR_BODY_PROL]] ], 
[ [[TMP20]], [[FOR_BODY_PROL_1]] ], [ [[TMP22]], [[FOR_BODY_PROL_2]] ]
-; CHECK-NEXT:    [[DOTUNR1_PH:%.*]] = phi i32 [ [[ADD_PROL]], 
[[FOR_BODY_PROL]] ], [ [[ADD_PROL_1:%.*]], [[FOR_BODY_PROL_1]] ], [ 
[[ADD_PROL_2:%.*]], [[FOR_BODY_PROL_2]] ]
+; CHECK-NEXT:    [[DOTLCSSA_UNR_PH:%.*]] = phi i32* [ [[TMP6]], 
[[FOR_BODY_PROL]] ], [ [[TMP8]], [[FOR_BODY_PROL_1]] ], [ [[TMP10]], 
[[FOR_BODY_PROL_2]] ]
+; CHECK-NEXT:    [[DOTUNR_PH:%.*]] = phi i32* [ [[TMP6]], [[FOR_BODY_PROL]] ], 
[ [[TMP8]], [[FOR_BODY_PROL_1]] ], [ [[TMP10]], [[FOR_BODY_PROL_2]] ]
+; CHECK-NEXT:    [[DOTUNR1_PH:%.*]] = phi i32 [ [[ADD_PROL]], 
[[FOR_BODY_PROL]] ], [ [[ADD_PROL_1]], [[FOR_BODY_PROL_1]] ], [ [[ADD_PROL_2]], 
[[FOR_BODY_PROL_2]] ]
 ; CHECK-NEXT:    br label [[FOR_BODY_PROL_LOOPEXIT]], !dbg [[DBG24]]
 ; CHECK:       for.body.prol.loopexit:
 ; CHECK-NEXT:    [[DOTLCSSA_UNR:%.*]] = phi i32* [ undef, [[FOR_BODY_LR_PH]] 
], [ [[DOTLCSSA_UNR_PH]], [[FOR_BODY_PROL_LOOPEXIT_UNR_LCSSA]] ]
 ; CHECK-NEXT:    [[DOTUNR:%.*]] = phi i32* [ [[A_PROMOTED]], 
[[FOR_BODY_LR_PH]] ], [ [[DOTUNR_PH]], [[FOR_BODY_PROL_LOOPEXIT_UNR_LCSSA]] ]
 ; CHECK-NEXT:    [[DOTUNR1:%.*]] = phi i32 [ [[DOTPR]], [[FOR_BODY_LR_PH]] ], 
[ [[DOTUNR1_PH]], [[FOR_BODY_PROL_LOOPEXIT_UNR_LCSSA]] ]
-; CHECK-NEXT:    [[TMP7:%.*]] = icmp ult i32 [[TMP3]], 3, !dbg [[DBG24]]
-; CHECK-NEXT:    br i1 [[TMP7]], label [[FOR_COND_FOR_END_CRIT_EDGE:%.*]], 
label [[FOR_BODY_LR_PH_NEW:%.*]], !dbg [[DBG24]]
+; CHECK-NEXT:    [[TMP11:%.*]] = icmp ult i32 [[TMP3]], 3, !dbg [[DBG24]]
+; CHECK-NEXT:    br i1 [[TMP11]], label [[FOR_COND_FOR_END_CRIT_EDGE:%.*]], 
label [[FOR_BODY_LR_PH_NEW:%.*]], !dbg [[DBG24]]
 ; CHECK:       for.body.lr.ph.new:
 ; CHECK-NEXT:    br label [[FOR_BODY:%.*]], !dbg [[DBG24]]
 ; CHECK:       for.body:
-; CHECK-NEXT:    [[TMP8:%.*]] = phi i32* [ [[DOTUNR]], [[FOR_BODY_LR_PH_NEW]] 
], [ [[TMP17:%.*]], [[FOR_BODY]] ], !dbg [[DBG28]]
-; CHECK-NEXT:    [[TMP9:%.*]] = phi i32 [ [[DOTUNR1]], [[FOR_BODY_LR_PH_NEW]] 
], [ [[ADD_3:%.*]], [[FOR_BODY]] ]
-; CHECK-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds i32, i32* [[TMP8]], 
i64 1, !dbg [[DBG28]]
-; CHECK-NEXT:    [[TMP10:%.*]] = load i32, i32* [[ARRAYIDX]], align 4, !dbg 
[[DBG28]], !tbaa [[TBAA20]]
-; CHECK-NEXT:    [[CONV:%.*]] = sext i32 [[TMP10]] to i64, !dbg [[DBG28]]
-; CHECK-NEXT:    [[TMP11:%.*]] = inttoptr i64 [[CONV]] to i32*, !dbg [[DBG28]]
-; CHECK-NEXT:    [[ADD:%.*]] = add nsw i32 [[TMP9]], 2, !dbg [[DBG29]]
-; CHECK-NEXT:    [[ARRAYIDX_1:%.*]] = getelementptr inbounds i32, i32* 
[[TMP11]], i64 1, !dbg [[DBG28]]
-; CHECK-NEXT:    [[TMP12:%.*]] = load i32, i32* [[ARRAYIDX_1]], align 4, !dbg 
[[DBG28]], !tbaa [[TBAA20]]
-; CHECK-NEXT:    [[CONV_1:%.*]] = sext i32 [[TMP12]] to i64, !dbg [[DBG28]]
-; CHECK-NEXT:    [[TMP13:%.*]] = inttoptr i64 [[CONV_1]] to i32*, !dbg 
[[DBG28]]
+; CHECK-NEXT:    [[TMP12:%.*]] = phi i32* [ [[DOTUNR]], [[FOR_BODY_LR_PH_NEW]] 
], [ [[TMP21:%.*]], [[FOR_BODY]] ], !dbg [[DBG28]]
+; CHECK-NEXT:    [[TMP13:%.*]] = phi i32 [ [[DOTUNR1]], [[FOR_BODY_LR_PH_NEW]] 
], [ [[ADD_3:%.*]], [[FOR_BODY]] ]
+; CHECK-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds i32, i32* 
[[TMP12]], i64 1, !dbg [[DBG28]]
+; CHECK-NEXT:    [[TMP14:%.*]] = load i32, i32* [[ARRAYIDX]], align 4, !dbg 
[[DBG28]], !tbaa [[TBAA20]]
+; CHECK-NEXT:    [[CONV:%.*]] = sext i32 [[TMP14]] to i64, !dbg [[DBG28]]
+; CHECK-NEXT:    [[TMP15:%.*]] = inttoptr i64 [[CONV]] to i32*, !dbg [[DBG28]]
+; CHECK-NEXT:    [[ADD:%.*]] = add nsw i32 [[TMP13]], 2, !dbg [[DBG29]]
+; CHECK-NEXT:    [[ARRAYIDX_1:%.*]] = getelementptr inbounds i32, i32* 
[[TMP15]], i64 1, !dbg [[DBG28]]
+; CHECK-NEXT:    [[TMP16:%.*]] = load i32, i32* [[ARRAYIDX_1]], align 4, !dbg 
[[DBG28]], !tbaa [[TBAA20]]
+; CHECK-NEXT:    [[CONV_1:%.*]] = sext i32 [[TMP16]] to i64, !dbg [[DBG28]]
+; CHECK-NEXT:    [[TMP17:%.*]] = inttoptr i64 [[CONV_1]] to i32*, !dbg 
[[DBG28]]
 ; CHECK-NEXT:    [[ADD_1:%.*]] = add nsw i32 [[ADD]], 2, !dbg [[DBG29]]
-; CHECK-NEXT:    [[ARRAYIDX_2:%.*]] = getelementptr inbounds i32, i32* 
[[TMP13]], i64 1, !dbg [[DBG28]]
-; CHECK-NEXT:    [[TMP14:%.*]] = load i32, i32* [[ARRAYIDX_2]], align 4, !dbg 
[[DBG28]], !tbaa [[TBAA20]]
-; CHECK-NEXT:    [[CONV_2:%.*]] = sext i32 [[TMP14]] to i64, !dbg [[DBG28]]
-; CHECK-NEXT:    [[TMP15:%.*]] = inttoptr i64 [[CONV_2]] to i32*, !dbg 
[[DBG28]]
+; CHECK-NEXT:    [[ARRAYIDX_2:%.*]] = getelementptr inbounds i32, i32* 
[[TMP17]], i64 1, !dbg [[DBG28]]
+; CHECK-NEXT:    [[TMP18:%.*]] = load i32, i32* [[ARRAYIDX_2]], align 4, !dbg 
[[DBG28]], !tbaa [[TBAA20]]
+; CHECK-NEXT:    [[CONV_2:%.*]] = sext i32 [[TMP18]] to i64, !dbg [[DBG28]]
+; CHECK-NEXT:    [[TMP19:%.*]] = inttoptr i64 [[CONV_2]] to i32*, !dbg 
[[DBG28]]
 ; CHECK-NEXT:    [[ADD_2:%.*]] = add nsw i32 [[ADD_1]], 2, !dbg [[DBG29]]
-; CHECK-NEXT:    [[ARRAYIDX_3:%.*]] = getelementptr inbounds i32, i32* 
[[TMP15]], i64 1, !dbg [[DBG28]]
-; CHECK-NEXT:    [[TMP16:%.*]] = load i32, i32* [[ARRAYIDX_3]], align 4, !dbg 
[[DBG28]], !tbaa [[TBAA20]]
-; CHECK-NEXT:    [[CONV_3:%.*]] = sext i32 [[TMP16]] to i64, !dbg [[DBG28]]
-; CHECK-NEXT:    [[TMP17]] = inttoptr i64 [[CONV_3]] to i32*, !dbg [[DBG28]]
+; CHECK-NEXT:    [[ARRAYIDX_3:%.*]] = getelementptr inbounds i32, i32* 
[[TMP19]], i64 1, !dbg [[DBG28]]
+; CHECK-NEXT:    [[TMP20:%.*]] = load i32, i32* [[ARRAYIDX_3]], align 4, !dbg 
[[DBG28]], !tbaa [[TBAA20]]
+; CHECK-NEXT:    [[CONV_3:%.*]] = sext i32 [[TMP20]] to i64, !dbg [[DBG28]]
+; CHECK-NEXT:    [[TMP21]] = inttoptr i64 [[CONV_3]] to i32*, !dbg [[DBG28]]
 ; CHECK-NEXT:    [[ADD_3]] = add nsw i32 [[ADD_2]], 2, !dbg [[DBG29]]
 ; CHECK-NEXT:    [[TOBOOL_3:%.*]] = icmp eq i32 [[ADD_3]], 0, !dbg [[DBG24]]
 ; CHECK-NEXT:    br i1 [[TOBOOL_3]], label 
[[FOR_COND_FOR_END_CRIT_EDGE_UNR_LCSSA:%.*]], label [[FOR_BODY]], !dbg 
[[DBG24]], !llvm.loop [[LOOP30:![0-9]+]]
 ; CHECK:       for.cond.for.end_crit_edge.unr-lcssa:
-; CHECK-NEXT:    [[DOTLCSSA_PH:%.*]] = phi i32* [ [[TMP17]], [[FOR_BODY]] ]
+; CHECK-NEXT:    [[DOTLCSSA_PH:%.*]] = phi i32* [ [[TMP21]], [[FOR_BODY]] ]
 ; CHECK-NEXT:    br label [[FOR_COND_FOR_END_CRIT_EDGE]], !dbg [[DBG24]]
 ; CHECK:       for.cond.for.end_crit_edge:
 ; CHECK-NEXT:    [[DOTLCSSA:%.*]] = phi i32* [ [[DOTLCSSA_UNR]], 
[[FOR_BODY_PROL_LOOPEXIT]] ], [ [[DOTLCSSA_PH]], 
[[FOR_COND_FOR_END_CRIT_EDGE_UNR_LCSSA]] ], !dbg [[DBG28]]
-; CHECK-NEXT:    [[TMP18:%.*]] = add i32 [[TMP2]], 2, !dbg [[DBG24]]
+; CHECK-NEXT:    [[TMP22:%.*]] = add i32 [[TMP2]], 2, !dbg [[DBG24]]
 ; CHECK-NEXT:    store i32* [[DOTLCSSA]], i32** @a, align 8, !dbg [[DBG25]], 
!tbaa [[TBAA26]]
-; CHECK-NEXT:    store i32 [[TMP18]], i32* @b, align 4, !dbg 
[[DBG33:![0-9]+]], !tbaa [[TBAA20]]
+; CHECK-NEXT:    store i32 [[TMP22]], i32* @b, align 4, !dbg 
[[DBG33:![0-9]+]], !tbaa [[TBAA20]]
 ; CHECK-NEXT:    br label [[FOR_END]], !dbg [[DBG24]]
 ; CHECK:       for.end:
 ; CHECK-NEXT:    ret i32 undef, !dbg [[DBG34:![0-9]+]]
-; CHECK:       for.body.prol.1:
-; CHECK-NEXT:    [[ARRAYIDX_PROL_1:%.*]] = getelementptr inbounds i32, i32* 
[[TMP6]], i64 1, !dbg [[DBG28]]
-; CHECK-NEXT:    [[TMP19:%.*]] = load i32, i32* [[ARRAYIDX_PROL_1]], align 4, 
!dbg [[DBG28]], !tbaa [[TBAA20]]
-; CHECK-NEXT:    [[CONV_PROL_1:%.*]] = sext i32 [[TMP19]] to i64, !dbg 
[[DBG28]]
-; CHECK-NEXT:    [[TMP20]] = inttoptr i64 [[CONV_PROL_1]] to i32*, !dbg 
[[DBG28]]
-; CHECK-NEXT:    [[ADD_PROL_1]] = add nsw i32 [[ADD_PROL]], 2, !dbg [[DBG29]]
-; CHECK-NEXT:    [[PROL_ITER_SUB_1:%.*]] = sub i32 [[PROL_ITER_SUB]], 1, !dbg 
[[DBG24]]
-; CHECK-NEXT:    [[PROL_ITER_CMP_1:%.*]] = icmp ne i32 [[PROL_ITER_SUB_1]], 0, 
!dbg [[DBG24]]
-; CHECK-NEXT:    br i1 [[PROL_ITER_CMP_1]], label [[FOR_BODY_PROL_2]], label 
[[FOR_BODY_PROL_LOOPEXIT_UNR_LCSSA]], !dbg [[DBG24]]
-; CHECK:       for.body.prol.2:
-; CHECK-NEXT:    [[ARRAYIDX_PROL_2:%.*]] = getelementptr inbounds i32, i32* 
[[TMP20]], i64 1, !dbg [[DBG28]]
-; CHECK-NEXT:    [[TMP21:%.*]] = load i32, i32* [[ARRAYIDX_PROL_2]], align 4, 
!dbg [[DBG28]], !tbaa [[TBAA20]]
-; CHECK-NEXT:    [[CONV_PROL_2:%.*]] = sext i32 [[TMP21]] to i64, !dbg 
[[DBG28]]
-; CHECK-NEXT:    [[TMP22]] = inttoptr i64 [[CONV_PROL_2]] to i32*, !dbg 
[[DBG28]]
-; CHECK-NEXT:    [[ADD_PROL_2]] = add nsw i32 [[ADD_PROL_1]], 2, !dbg [[DBG29]]
-; CHECK-NEXT:    br label [[FOR_BODY_PROL_LOOPEXIT_UNR_LCSSA]]
 ;
 entry:
   %.pr = load i32, i32* @b, align 4, !dbg !17, !tbaa !20
diff --git a/llvm/test/Transforms/LoopUnroll/2011-08-08-PhiUpdate.ll 
b/llvm/test/Transforms/LoopUnroll/2011-08-08-PhiUpdate.ll
index 3e611430d69e..7bb2d732195a 100644
--- a/llvm/test/Transforms/LoopUnroll/2011-08-08-PhiUpdate.ll
+++ b/llvm/test/Transforms/LoopUnroll/2011-08-08-PhiUpdate.ll
@@ -17,24 +17,24 @@ define void @test1(i32 %i, i32 %j) nounwind uwtable ssp {
 ; CHECK-NEXT:    [[SUB5:%.*]] = sub i32 [[SUB]], [[J:%.*]]
 ; CHECK-NEXT:    [[COND2:%.*]] = call zeroext i1 @check()
 ; CHECK-NEXT:    br i1 [[COND2]], label [[IF_THEN_LOOPEXIT:%.*]], label 
[[IF_ELSE_1:%.*]]
-; CHECK:       if.then.loopexit:
-; CHECK-NEXT:    [[SUB5_LCSSA:%.*]] = phi i32 [ [[SUB5]], [[IF_ELSE]] ], [ 
[[SUB5_1:%.*]], [[IF_ELSE_1]] ], [ [[SUB5_2:%.*]], [[IF_ELSE_2:%.*]] ], [ 
[[SUB5_3]], [[IF_ELSE_3]] ]
-; CHECK-NEXT:    br label [[IF_THEN]]
-; CHECK:       if.then:
-; CHECK-NEXT:    [[I_TR:%.*]] = phi i32 [ [[I]], [[ENTRY:%.*]] ], [ 
[[SUB5_LCSSA]], [[IF_THEN_LOOPEXIT]] ]
-; CHECK-NEXT:    ret void
 ; CHECK:       if.else.1:
-; CHECK-NEXT:    [[SUB5_1]] = sub i32 [[SUB5]], [[J]]
+; CHECK-NEXT:    [[SUB5_1:%.*]] = sub i32 [[SUB5]], [[J]]
 ; CHECK-NEXT:    [[COND2_1:%.*]] = call zeroext i1 @check()
-; CHECK-NEXT:    br i1 [[COND2_1]], label [[IF_THEN_LOOPEXIT]], label 
[[IF_ELSE_2]]
+; CHECK-NEXT:    br i1 [[COND2_1]], label [[IF_THEN_LOOPEXIT]], label 
[[IF_ELSE_2:%.*]]
 ; CHECK:       if.else.2:
-; CHECK-NEXT:    [[SUB5_2]] = sub i32 [[SUB5_1]], [[J]]
+; CHECK-NEXT:    [[SUB5_2:%.*]] = sub i32 [[SUB5_1]], [[J]]
 ; CHECK-NEXT:    [[COND2_2:%.*]] = call zeroext i1 @check()
 ; CHECK-NEXT:    br i1 [[COND2_2]], label [[IF_THEN_LOOPEXIT]], label 
[[IF_ELSE_3]]
 ; CHECK:       if.else.3:
 ; CHECK-NEXT:    [[SUB5_3]] = sub i32 [[SUB5_2]], [[J]]
 ; CHECK-NEXT:    [[COND2_3:%.*]] = call zeroext i1 @check()
 ; CHECK-NEXT:    br i1 [[COND2_3]], label [[IF_THEN_LOOPEXIT]], label 
[[IF_ELSE]], !llvm.loop [[LOOP0:![0-9]+]]
+; CHECK:       if.then.loopexit:
+; CHECK-NEXT:    [[SUB5_LCSSA:%.*]] = phi i32 [ [[SUB5]], [[IF_ELSE]] ], [ 
[[SUB5_1]], [[IF_ELSE_1]] ], [ [[SUB5_2]], [[IF_ELSE_2]] ], [ [[SUB5_3]], 
[[IF_ELSE_3]] ]
+; CHECK-NEXT:    br label [[IF_THEN]]
+; CHECK:       if.then:
+; CHECK-NEXT:    [[I_TR:%.*]] = phi i32 [ [[I]], [[ENTRY:%.*]] ], [ 
[[SUB5_LCSSA]], [[IF_THEN_LOOPEXIT]] ]
+; CHECK-NEXT:    ret void
 ;
 entry:
   %cond1 = call zeroext i1 @check()
@@ -77,17 +77,11 @@ define i32 @test2(i32* nocapture %p, i32 %n) nounwind 
readonly {
 ; CHECK-NEXT:    [[INDVAR_NEXT:%.*]] = add nuw nsw i64 [[INDVAR]], 1
 ; CHECK-NEXT:    [[EXITCOND:%.*]] = icmp ne i64 [[INDVAR_NEXT]], [[TMP]]
 ; CHECK-NEXT:    br i1 [[EXITCOND]], label [[BB_1:%.*]], label 
[[BB1_BB2_CRIT_EDGE:%.*]]
-; CHECK:       bb1.bb2_crit_edge:
-; CHECK-NEXT:    [[DOTLCSSA:%.*]] = phi i32 [ [[TMP2]], [[BB1]] ], [ 
[[TMP4:%.*]], [[BB1_1:%.*]] ], [ [[TMP6:%.*]], [[BB1_2:%.*]] ], [ [[TMP8]], 
[[BB1_3]] ]
-; CHECK-NEXT:    br label [[BB2]]
-; CHECK:       bb2:
-; CHECK-NEXT:    [[S_0_LCSSA:%.*]] = phi i32 [ [[DOTLCSSA]], 
[[BB1_BB2_CRIT_EDGE]] ], [ 0, [[ENTRY:%.*]] ]
-; CHECK-NEXT:    ret i32 [[S_0_LCSSA]]
 ; CHECK:       bb.1:
 ; CHECK-NEXT:    [[SCEVGEP_1:%.*]] = getelementptr i32, i32* [[P]], i64 
[[INDVAR_NEXT]]
 ; CHECK-NEXT:    [[TMP3:%.*]] = load i32, i32* [[SCEVGEP_1]], align 1
-; CHECK-NEXT:    [[TMP4]] = add nsw i32 [[TMP3]], [[TMP2]]
-; CHECK-NEXT:    br label [[BB1_1]]
+; CHECK-NEXT:    [[TMP4:%.*]] = add nsw i32 [[TMP3]], [[TMP2]]
+; CHECK-NEXT:    br label [[BB1_1:%.*]]
 ; CHECK:       bb1.1:
 ; CHECK-NEXT:    [[INDVAR_NEXT_1:%.*]] = add nuw nsw i64 [[INDVAR_NEXT]], 1
 ; CHECK-NEXT:    [[EXITCOND_1:%.*]] = icmp ne i64 [[INDVAR_NEXT_1]], [[TMP]]
@@ -95,8 +89,8 @@ define i32 @test2(i32* nocapture %p, i32 %n) nounwind 
readonly {
 ; CHECK:       bb.2:
 ; CHECK-NEXT:    [[SCEVGEP_2:%.*]] = getelementptr i32, i32* [[P]], i64 
[[INDVAR_NEXT_1]]
 ; CHECK-NEXT:    [[TMP5:%.*]] = load i32, i32* [[SCEVGEP_2]], align 1
-; CHECK-NEXT:    [[TMP6]] = add nsw i32 [[TMP5]], [[TMP4]]
-; CHECK-NEXT:    br label [[BB1_2]]
+; CHECK-NEXT:    [[TMP6:%.*]] = add nsw i32 [[TMP5]], [[TMP4]]
+; CHECK-NEXT:    br label [[BB1_2:%.*]]
 ; CHECK:       bb1.2:
 ; CHECK-NEXT:    [[INDVAR_NEXT_2:%.*]] = add nuw nsw i64 [[INDVAR_NEXT_1]], 1
 ; CHECK-NEXT:    [[EXITCOND_2:%.*]] = icmp ne i64 [[INDVAR_NEXT_2]], [[TMP]]
@@ -110,6 +104,12 @@ define i32 @test2(i32* nocapture %p, i32 %n) nounwind 
readonly {
 ; CHECK-NEXT:    [[INDVAR_NEXT_3]] = add i64 [[INDVAR_NEXT_2]], 1
 ; CHECK-NEXT:    [[EXITCOND_3:%.*]] = icmp ne i64 [[INDVAR_NEXT_3]], [[TMP]]
 ; CHECK-NEXT:    br i1 [[EXITCOND_3]], label [[BB]], label 
[[BB1_BB2_CRIT_EDGE]], !llvm.loop [[LOOP2:![0-9]+]]
+; CHECK:       bb1.bb2_crit_edge:
+; CHECK-NEXT:    [[DOTLCSSA:%.*]] = phi i32 [ [[TMP2]], [[BB1]] ], [ [[TMP4]], 
[[BB1_1]] ], [ [[TMP6]], [[BB1_2]] ], [ [[TMP8]], [[BB1_3]] ]
+; CHECK-NEXT:    br label [[BB2]]
+; CHECK:       bb2:
+; CHECK-NEXT:    [[S_0_LCSSA:%.*]] = phi i32 [ [[DOTLCSSA]], 
[[BB1_BB2_CRIT_EDGE]] ], [ 0, [[ENTRY:%.*]] ]
+; CHECK-NEXT:    ret i32 [[S_0_LCSSA]]
 ;
 entry:
   %0 = icmp sgt i32 %n, 0                         ; <i1> [#uses=1]
@@ -162,20 +162,12 @@ define i32 @test3() nounwind uwtable ssp align 2 {
 ; CHECK:       do.cond:
 ; CHECK-NEXT:    [[COND3:%.*]] = call zeroext i1 @check()
 ; CHECK-NEXT:    br i1 [[COND3]], label [[DO_END:%.*]], label [[DO_BODY_1:%.*]]
-; CHECK:       do.end:
-; CHECK-NEXT:    br label [[RETURN]]
-; CHECK:       return.loopexit:
-; CHECK-NEXT:    [[TMP7_I_LCSSA:%.*]] = phi i32 [ [[TMP7_I]], 
[[LAND_LHS_TRUE]] ], [ [[TMP7_I_1:%.*]], [[LAND_LHS_TRUE_1:%.*]] ], [ 
[[TMP7_I_2:%.*]], [[LAND_LHS_TRUE_2:%.*]] ], [ [[TMP7_I_3:%.*]], 
[[LAND_LHS_TRUE_3:%.*]] ]
-; CHECK-NEXT:    br label [[RETURN]]
-; CHECK:       return:
-; CHECK-NEXT:    [[RETVAL_0:%.*]] = phi i32 [ 0, [[DO_END]] ], [ 0, 
[[ENTRY:%.*]] ], [ [[TMP7_I_LCSSA]], [[RETURN_LOOPEXIT]] ]
-; CHECK-NEXT:    ret i32 [[RETVAL_0]]
 ; CHECK:       do.body.1:
 ; CHECK-NEXT:    [[COND2_1:%.*]] = call zeroext i1 @check()
 ; CHECK-NEXT:    br i1 [[COND2_1]], label [[EXIT_1:%.*]], label 
[[DO_COND_1:%.*]]
 ; CHECK:       exit.1:
-; CHECK-NEXT:    [[TMP7_I_1]] = load i32, i32* undef, align 8
-; CHECK-NEXT:    br i1 undef, label [[DO_COND_1]], label [[LAND_LHS_TRUE_1]]
+; CHECK-NEXT:    [[TMP7_I_1:%.*]] = load i32, i32* undef, align 8
+; CHECK-NEXT:    br i1 undef, label [[DO_COND_1]], label 
[[LAND_LHS_TRUE_1:%.*]]
 ; CHECK:       land.lhs.true.1:
 ; CHECK-NEXT:    br i1 true, label [[RETURN_LOOPEXIT]], label [[DO_COND_1]]
 ; CHECK:       do.cond.1:
@@ -185,8 +177,8 @@ define i32 @test3() nounwind uwtable ssp align 2 {
 ; CHECK-NEXT:    [[COND2_2:%.*]] = call zeroext i1 @check()
 ; CHECK-NEXT:    br i1 [[COND2_2]], label [[EXIT_2:%.*]], label 
[[DO_COND_2:%.*]]
 ; CHECK:       exit.2:
-; CHECK-NEXT:    [[TMP7_I_2]] = load i32, i32* undef, align 8
-; CHECK-NEXT:    br i1 undef, label [[DO_COND_2]], label [[LAND_LHS_TRUE_2]]
+; CHECK-NEXT:    [[TMP7_I_2:%.*]] = load i32, i32* undef, align 8
+; CHECK-NEXT:    br i1 undef, label [[DO_COND_2]], label 
[[LAND_LHS_TRUE_2:%.*]]
 ; CHECK:       land.lhs.true.2:
 ; CHECK-NEXT:    br i1 true, label [[RETURN_LOOPEXIT]], label [[DO_COND_2]]
 ; CHECK:       do.cond.2:
@@ -196,13 +188,21 @@ define i32 @test3() nounwind uwtable ssp align 2 {
 ; CHECK-NEXT:    [[COND2_3:%.*]] = call zeroext i1 @check()
 ; CHECK-NEXT:    br i1 [[COND2_3]], label [[EXIT_3:%.*]], label 
[[DO_COND_3:%.*]]
 ; CHECK:       exit.3:
-; CHECK-NEXT:    [[TMP7_I_3]] = load i32, i32* undef, align 8
-; CHECK-NEXT:    br i1 undef, label [[DO_COND_3]], label [[LAND_LHS_TRUE_3]]
+; CHECK-NEXT:    [[TMP7_I_3:%.*]] = load i32, i32* undef, align 8
+; CHECK-NEXT:    br i1 undef, label [[DO_COND_3]], label 
[[LAND_LHS_TRUE_3:%.*]]
 ; CHECK:       land.lhs.true.3:
 ; CHECK-NEXT:    br i1 true, label [[RETURN_LOOPEXIT]], label [[DO_COND_3]]
 ; CHECK:       do.cond.3:
 ; CHECK-NEXT:    [[COND3_3:%.*]] = call zeroext i1 @check()
 ; CHECK-NEXT:    br i1 [[COND3_3]], label [[DO_END]], label [[DO_BODY]], 
!llvm.loop [[LOOP3:![0-9]+]]
+; CHECK:       do.end:
+; CHECK-NEXT:    br label [[RETURN]]
+; CHECK:       return.loopexit:
+; CHECK-NEXT:    [[TMP7_I_LCSSA:%.*]] = phi i32 [ [[TMP7_I]], 
[[LAND_LHS_TRUE]] ], [ [[TMP7_I_1]], [[LAND_LHS_TRUE_1]] ], [ [[TMP7_I_2]], 
[[LAND_LHS_TRUE_2]] ], [ [[TMP7_I_3]], [[LAND_LHS_TRUE_3]] ]
+; CHECK-NEXT:    br label [[RETURN]]
+; CHECK:       return:
+; CHECK-NEXT:    [[RETVAL_0:%.*]] = phi i32 [ 0, [[DO_END]] ], [ 0, 
[[ENTRY:%.*]] ], [ [[TMP7_I_LCSSA]], [[RETURN_LOOPEXIT]] ]
+; CHECK-NEXT:    ret i32 [[RETVAL_0]]
 ;
 entry:
   %cond1 = call zeroext i1 @check()
diff --git a/llvm/test/Transforms/LoopUnroll/2011-08-09-PhiUpdate.ll 
b/llvm/test/Transforms/LoopUnroll/2011-08-09-PhiUpdate.ll
index be4b6ff64fdd..af648bae8642 100644
--- a/llvm/test/Transforms/LoopUnroll/2011-08-09-PhiUpdate.ll
+++ b/llvm/test/Transforms/LoopUnroll/2011-08-09-PhiUpdate.ll
@@ -33,16 +33,13 @@ define i32 @foo() uwtable ssp align 2 {
 ; CHECK:       do.cond:
 ; CHECK-NEXT:    [[CMP18:%.*]] = icmp sgt i32 [[CALL2]], -1
 ; CHECK-NEXT:    br i1 [[CMP18]], label [[LAND_LHS_TRUE_I_1:%.*]], label 
[[RETURN]]
-; CHECK:       return:
-; CHECK-NEXT:    [[RETVAL_0:%.*]] = phi i32 [ [[TMP7_I]], [[LAND_LHS_TRUE]] ], 
[ 0, [[DO_COND]] ], [ [[TMP7_I_1:%.*]], [[LAND_LHS_TRUE_1:%.*]] ], [ 0, 
[[DO_COND_1:%.*]] ], [ [[TMP7_I_2:%.*]], [[LAND_LHS_TRUE_2:%.*]] ], [ 0, 
[[DO_COND_2:%.*]] ], [ [[TMP7_I_3:%.*]], [[LAND_LHS_TRUE_3:%.*]] ], [ 0, 
[[DO_COND_3:%.*]] ]
-; CHECK-NEXT:    ret i32 [[RETVAL_0]]
 ; CHECK:       land.lhs.true.i.1:
 ; CHECK-NEXT:    [[CMP4_I_1:%.*]] = call zeroext i1 @check() #[[ATTR0]]
-; CHECK-NEXT:    br i1 [[CMP4_I_1]], label [[BAR_EXIT_1:%.*]], label 
[[DO_COND_1]]
+; CHECK-NEXT:    br i1 [[CMP4_I_1]], label [[BAR_EXIT_1:%.*]], label 
[[DO_COND_1:%.*]]
 ; CHECK:       bar.exit.1:
-; CHECK-NEXT:    [[TMP7_I_1]] = call i32 @getval() #[[ATTR0]]
+; CHECK-NEXT:    [[TMP7_I_1:%.*]] = call i32 @getval() #[[ATTR0]]
 ; CHECK-NEXT:    [[CMP_NOT_1:%.*]] = icmp eq i32 [[TMP7_I_1]], 0
-; CHECK-NEXT:    br i1 [[CMP_NOT_1]], label [[DO_COND_1]], label 
[[LAND_LHS_TRUE_1]]
+; CHECK-NEXT:    br i1 [[CMP_NOT_1]], label [[DO_COND_1]], label 
[[LAND_LHS_TRUE_1:%.*]]
 ; CHECK:       land.lhs.true.1:
 ; CHECK-NEXT:    [[CALL10_1:%.*]] = call i32 @getval()
 ; CHECK-NEXT:    [[CMP11_1:%.*]] = icmp eq i32 [[CALL10_1]], 0
@@ -52,11 +49,11 @@ define i32 @foo() uwtable ssp align 2 {
 ; CHECK-NEXT:    br i1 [[CMP18_1]], label [[LAND_LHS_TRUE_I_2:%.*]], label 
[[RETURN]]
 ; CHECK:       land.lhs.true.i.2:
 ; CHECK-NEXT:    [[CMP4_I_2:%.*]] = call zeroext i1 @check() #[[ATTR0]]
-; CHECK-NEXT:    br i1 [[CMP4_I_2]], label [[BAR_EXIT_2:%.*]], label 
[[DO_COND_2]]
+; CHECK-NEXT:    br i1 [[CMP4_I_2]], label [[BAR_EXIT_2:%.*]], label 
[[DO_COND_2:%.*]]
 ; CHECK:       bar.exit.2:
-; CHECK-NEXT:    [[TMP7_I_2]] = call i32 @getval() #[[ATTR0]]
+; CHECK-NEXT:    [[TMP7_I_2:%.*]] = call i32 @getval() #[[ATTR0]]
 ; CHECK-NEXT:    [[CMP_NOT_2:%.*]] = icmp eq i32 [[TMP7_I_2]], 0
-; CHECK-NEXT:    br i1 [[CMP_NOT_2]], label [[DO_COND_2]], label 
[[LAND_LHS_TRUE_2]]
+; CHECK-NEXT:    br i1 [[CMP_NOT_2]], label [[DO_COND_2]], label 
[[LAND_LHS_TRUE_2:%.*]]
 ; CHECK:       land.lhs.true.2:
 ; CHECK-NEXT:    [[CALL10_2:%.*]] = call i32 @getval()
 ; CHECK-NEXT:    [[CMP11_2:%.*]] = icmp eq i32 [[CALL10_2]], 0
@@ -66,11 +63,11 @@ define i32 @foo() uwtable ssp align 2 {
 ; CHECK-NEXT:    br i1 [[CMP18_2]], label [[LAND_LHS_TRUE_I_3:%.*]], label 
[[RETURN]]
 ; CHECK:       land.lhs.true.i.3:
 ; CHECK-NEXT:    [[CMP4_I_3:%.*]] = call zeroext i1 @check() #[[ATTR0]]
-; CHECK-NEXT:    br i1 [[CMP4_I_3]], label [[BAR_EXIT_3:%.*]], label 
[[DO_COND_3]]
+; CHECK-NEXT:    br i1 [[CMP4_I_3]], label [[BAR_EXIT_3:%.*]], label 
[[DO_COND_3:%.*]]
 ; CHECK:       bar.exit.3:
-; CHECK-NEXT:    [[TMP7_I_3]] = call i32 @getval() #[[ATTR0]]
+; CHECK-NEXT:    [[TMP7_I_3:%.*]] = call i32 @getval() #[[ATTR0]]
 ; CHECK-NEXT:    [[CMP_NOT_3:%.*]] = icmp eq i32 [[TMP7_I_3]], 0
-; CHECK-NEXT:    br i1 [[CMP_NOT_3]], label [[DO_COND_3]], label 
[[LAND_LHS_TRUE_3]]
+; CHECK-NEXT:    br i1 [[CMP_NOT_3]], label [[DO_COND_3]], label 
[[LAND_LHS_TRUE_3:%.*]]
 ; CHECK:       land.lhs.true.3:
 ; CHECK-NEXT:    [[CALL10_3:%.*]] = call i32 @getval()
 ; CHECK-NEXT:    [[CMP11_3:%.*]] = icmp eq i32 [[CALL10_3]], 0
@@ -78,6 +75,9 @@ define i32 @foo() uwtable ssp align 2 {
 ; CHECK:       do.cond.3:
 ; CHECK-NEXT:    [[CMP18_3:%.*]] = icmp sgt i32 [[CALL2]], -1
 ; CHECK-NEXT:    br i1 [[CMP18_3]], label [[LAND_LHS_TRUE_I]], label 
[[RETURN]], !llvm.loop [[LOOP0:![0-9]+]]
+; CHECK:       return:
+; CHECK-NEXT:    [[RETVAL_0:%.*]] = phi i32 [ [[TMP7_I]], [[LAND_LHS_TRUE]] ], [ 0, [[DO_COND]] ], [ [[TMP7_I_1]], [[LAND_LHS_TRUE_1]] ], [ 0, [[DO_COND_1]] ], [ [[TMP7_I_2]], [[LAND_LHS_TRUE_2]] ], [ 0, [[DO_COND_2]] ], [ [[TMP7_I_3]], [[LAND_LHS_TRUE_3]] ], [ 0, [[DO_COND_3]] ]
+; CHECK-NEXT:    ret i32 [[RETVAL_0]]
 ;
 entry:
   br i1 undef, label %return, label %if.end
diff --git a/llvm/test/Transforms/LoopUnroll/AArch64/runtime-unroll-generic.ll b/llvm/test/Transforms/LoopUnroll/AArch64/runtime-unroll-generic.ll
index 5bbab929c936..5c8f9ca01679 100644
--- a/llvm/test/Transforms/LoopUnroll/AArch64/runtime-unroll-generic.ll
+++ b/llvm/test/Transforms/LoopUnroll/AArch64/runtime-unroll-generic.ll
@@ -67,8 +67,6 @@ define void @runtime_unroll_generic(i32 %arg_0, i32* %arg_1, i16* %arg_2, i16* %
 ; CHECK-A55-NEXT:    store i32 [[ADD21_EPIL]], i32* [[ARRAYIDX20]], align 4
 ; CHECK-A55-NEXT:    [[EPIL_ITER_CMP_NOT:%.*]] = icmp eq i32 [[XTRAITER]], 1
 ; CHECK-A55-NEXT:    br i1 [[EPIL_ITER_CMP_NOT]], label [[FOR_END]], label 
[[FOR_BODY6_EPIL_1:%.*]]
-; CHECK-A55:       for.end:
-; CHECK-A55-NEXT:    ret void
 ; CHECK-A55:       for.body6.epil.1:
 ; CHECK-A55-NEXT:    [[TMP14:%.*]] = load i16, i16* [[ARRAYIDX10]], align 2
 ; CHECK-A55-NEXT:    [[CONV_EPIL_1:%.*]] = sext i16 [[TMP14]] to i32
@@ -90,6 +88,8 @@ define void @runtime_unroll_generic(i32 %arg_0, i32* %arg_1, i16* %arg_2, i16* %
 ; CHECK-A55-NEXT:    [[ADD21_EPIL_2:%.*]] = add nsw i32 [[MUL16_EPIL_2]], 
[[TMP19]]
 ; CHECK-A55-NEXT:    store i32 [[ADD21_EPIL_2]], i32* [[ARRAYIDX20]], align 4
 ; CHECK-A55-NEXT:    br label [[FOR_END]]
+; CHECK-A55:       for.end:
+; CHECK-A55-NEXT:    ret void
 ;
 ; CHECK-GENERIC-LABEL: @runtime_unroll_generic(
 ; CHECK-GENERIC-NEXT:  entry:
diff --git a/llvm/test/Transforms/LoopUnroll/AArch64/thresholdO3-cost-model.ll b/llvm/test/Transforms/LoopUnroll/AArch64/thresholdO3-cost-model.ll
index ee07518f8cac..5c6ac690c0ca 100644
--- a/llvm/test/Transforms/LoopUnroll/AArch64/thresholdO3-cost-model.ll
+++ b/llvm/test/Transforms/LoopUnroll/AArch64/thresholdO3-cost-model.ll
@@ -21,10 +21,6 @@ define i32 @tripcount_11() {
 ; CHECK-NEXT:    br label [[DO_BODY6:%.*]]
 ; CHECK:       for.cond:
 ; CHECK-NEXT:    br i1 true, label [[FOR_COND_1:%.*]], label [[IF_THEN11:%.*]]
-; CHECK:       do.body6:
-; CHECK-NEXT:    br i1 true, label [[FOR_COND:%.*]], label [[IF_THEN11]]
-; CHECK:       if.then11:
-; CHECK-NEXT:    unreachable
 ; CHECK:       for.cond.1:
 ; CHECK-NEXT:    br i1 true, label [[FOR_COND_2:%.*]], label [[IF_THEN11]]
 ; CHECK:       for.cond.2:
@@ -45,6 +41,10 @@ define i32 @tripcount_11() {
 ; CHECK-NEXT:    br i1 true, label [[FOR_COND_10:%.*]], label [[IF_THEN11]]
 ; CHECK:       for.cond.10:
 ; CHECK-NEXT:    ret i32 0
+; CHECK:       do.body6:
+; CHECK-NEXT:    br i1 true, label [[FOR_COND:%.*]], label [[IF_THEN11]]
+; CHECK:       if.then11:
+; CHECK-NEXT:    unreachable
 ;
 do.body6.preheader:
   br label %do.body6
diff --git a/llvm/test/Transforms/LoopUnroll/AArch64/unroll-upperbound.ll b/llvm/test/Transforms/LoopUnroll/AArch64/unroll-upperbound.ll
index 3b82365d1a6e..ee905e5b10fe 100644
--- a/llvm/test/Transforms/LoopUnroll/AArch64/unroll-upperbound.ll
+++ b/llvm/test/Transforms/LoopUnroll/AArch64/unroll-upperbound.ll
@@ -18,8 +18,6 @@ define void @test(i1 %cond) {
 ; CHECK-NEXT:    br label [[LATCH]]
 ; CHECK:       latch:
 ; CHECK-NEXT:    br i1 false, label [[FOR_END:%.*]], label [[FOR_BODY_1:%.*]]
-; CHECK:       for.end:
-; CHECK-NEXT:    ret void
 ; CHECK:       for.body.1:
 ; CHECK-NEXT:    switch i32 1, label [[SW_DEFAULT_1:%.*]] [
 ; CHECK-NEXT:    i32 2, label [[LATCH_1:%.*]]
@@ -38,6 +36,8 @@ define void @test(i1 %cond) {
 ; CHECK-NEXT:    br label [[LATCH_2]]
 ; CHECK:       latch.2:
 ; CHECK-NEXT:    br label [[FOR_END]]
+; CHECK:       for.end:
+; CHECK-NEXT:    ret void
 ;
 entry:
   %0 = select i1 %cond, i32 2, i32 3
diff --git a/llvm/test/Transforms/LoopUnroll/ARM/loop-unrolling.ll b/llvm/test/Transforms/LoopUnroll/ARM/loop-unrolling.ll
index f2e748ade0a2..e12dbf031b3b 100644
--- a/llvm/test/Transforms/LoopUnroll/ARM/loop-unrolling.ll
+++ b/llvm/test/Transforms/LoopUnroll/ARM/loop-unrolling.ll
@@ -121,14 +121,14 @@ for.body4:
 ; CHECK-NOUNROLL: br
 
 ; CHECK-UNROLL: for.body4.epil:
+; CHECK-UNROLL: for.body4.epil.1:
+; CHECK-UNROLL: for.body4.epil.2:
 ; CHECK-UNROLL: [[IV0:%[a-z.0-9]+]] = phi i32 [ 0, [[PRE:%[a-z0-9.]+]] ], [ 
[[IV4:%[a-z.0-9]+]], %for.body4 ]
 ; CHECK-UNROLL: [[IV1:%[a-z.0-9]+]] = add nuw nsw i32 [[IV0]], 1
 ; CHECK-UNROLL: [[IV2:%[a-z.0-9]+]] = add nuw nsw i32 [[IV1]], 1
 ; CHECK-UNROLL: [[IV3:%[a-z.0-9]+]] = add nuw nsw i32 [[IV2]], 1
 ; CHECK-UNROLL: [[IV4]] = add nuw i32 [[IV3]], 1
 ; CHECK-UNROLL: br
-; CHECK-UNROLL: for.body4.epil.1:
-; CHECK-UNROLL: for.body4.epil.2:
 
   %w.024 = phi i32 [ 0, %for.body4.lr.ph ], [ %inc, %for.body4 ]
   %add = add i32 %w.024, %mul
diff --git a/llvm/test/Transforms/LoopUnroll/ARM/multi-blocks.ll b/llvm/test/Transforms/LoopUnroll/ARM/multi-blocks.ll
index 156c0ab10658..8c4257698ab7 100644
--- a/llvm/test/Transforms/LoopUnroll/ARM/multi-blocks.ll
+++ b/llvm/test/Transforms/LoopUnroll/ARM/multi-blocks.ll
@@ -45,8 +45,37 @@ define void @test_three_blocks(i32* nocapture %Output,
 ; CHECK-NEXT:    [[EPIL_ITER_SUB:%.*]] = sub i32 [[XTRAITER]], 1
 ; CHECK-NEXT:    [[EPIL_ITER_CMP:%.*]] = icmp ne i32 [[EPIL_ITER_SUB]], 0
 ; CHECK-NEXT:    br i1 [[EPIL_ITER_CMP]], label [[FOR_BODY_EPIL_1:%.*]], label 
[[FOR_COND_CLEANUP_LOOPEXIT_EPILOG_LCSSA:%.*]]
+; CHECK:       for.body.epil.1:
+; CHECK-NEXT:    [[ARRAYIDX_EPIL_1:%.*]] = getelementptr inbounds i32, i32* 
[[CONDITION]], i32 [[INC_EPIL]]
+; CHECK-NEXT:    [[TMP4:%.*]] = load i32, i32* [[ARRAYIDX_EPIL_1]], align 4
+; CHECK-NEXT:    [[TOBOOL_EPIL_1:%.*]] = icmp eq i32 [[TMP4]], 0
+; CHECK-NEXT:    br i1 [[TOBOOL_EPIL_1]], label [[FOR_INC_EPIL_1:%.*]], label 
[[IF_THEN_EPIL_1:%.*]]
+; CHECK:       if.then.epil.1:
+; CHECK-NEXT:    [[ARRAYIDX1_EPIL_1:%.*]] = getelementptr inbounds i32, i32* 
[[INPUT]], i32 [[INC_EPIL]]
+; CHECK-NEXT:    [[TMP5:%.*]] = load i32, i32* [[ARRAYIDX1_EPIL_1]], align 4
+; CHECK-NEXT:    [[ADD_EPIL_1:%.*]] = add i32 [[TMP5]], [[TEMP_1_EPIL]]
+; CHECK-NEXT:    br label [[FOR_INC_EPIL_1]]
+; CHECK:       for.inc.epil.1:
+; CHECK-NEXT:    [[TEMP_1_EPIL_1:%.*]] = phi i32 [ [[ADD_EPIL_1]], 
[[IF_THEN_EPIL_1]] ], [ [[TEMP_1_EPIL]], [[FOR_BODY_EPIL_1]] ]
+; CHECK-NEXT:    [[INC_EPIL_1:%.*]] = add nuw i32 [[INC_EPIL]], 1
+; CHECK-NEXT:    [[EPIL_ITER_SUB_1:%.*]] = sub i32 [[EPIL_ITER_SUB]], 1
+; CHECK-NEXT:    [[EPIL_ITER_CMP_1:%.*]] = icmp ne i32 [[EPIL_ITER_SUB_1]], 0
+; CHECK-NEXT:    br i1 [[EPIL_ITER_CMP_1]], label [[FOR_BODY_EPIL_2:%.*]], 
label [[FOR_COND_CLEANUP_LOOPEXIT_EPILOG_LCSSA]]
+; CHECK:       for.body.epil.2:
+; CHECK-NEXT:    [[ARRAYIDX_EPIL_2:%.*]] = getelementptr inbounds i32, i32* 
[[CONDITION]], i32 [[INC_EPIL_1]]
+; CHECK-NEXT:    [[TMP6:%.*]] = load i32, i32* [[ARRAYIDX_EPIL_2]], align 4
+; CHECK-NEXT:    [[TOBOOL_EPIL_2:%.*]] = icmp eq i32 [[TMP6]], 0
+; CHECK-NEXT:    br i1 [[TOBOOL_EPIL_2]], label [[FOR_INC_EPIL_2:%.*]], label 
[[IF_THEN_EPIL_2:%.*]]
+; CHECK:       if.then.epil.2:
+; CHECK-NEXT:    [[ARRAYIDX1_EPIL_2:%.*]] = getelementptr inbounds i32, i32* 
[[INPUT]], i32 [[INC_EPIL_1]]
+; CHECK-NEXT:    [[TMP7:%.*]] = load i32, i32* [[ARRAYIDX1_EPIL_2]], align 4
+; CHECK-NEXT:    [[ADD_EPIL_2:%.*]] = add i32 [[TMP7]], [[TEMP_1_EPIL_1]]
+; CHECK-NEXT:    br label [[FOR_INC_EPIL_2]]
+; CHECK:       for.inc.epil.2:
+; CHECK-NEXT:    [[TEMP_1_EPIL_2:%.*]] = phi i32 [ [[ADD_EPIL_2]], 
[[IF_THEN_EPIL_2]] ], [ [[TEMP_1_EPIL_1]], [[FOR_BODY_EPIL_2]] ]
+; CHECK-NEXT:    br label [[FOR_COND_CLEANUP_LOOPEXIT_EPILOG_LCSSA]]
 ; CHECK:       for.cond.cleanup.loopexit.epilog-lcssa:
-; CHECK-NEXT:    [[TEMP_1_LCSSA_PH1:%.*]] = phi i32 [ [[TEMP_1_EPIL]], [[FOR_INC_EPIL]] ], [ [[TEMP_1_EPIL_1:%.*]], [[FOR_INC_EPIL_1:%.*]] ], [ [[TEMP_1_EPIL_2:%.*]], [[FOR_INC_EPIL_2:%.*]] ]
+; CHECK-NEXT:    [[TEMP_1_LCSSA_PH1:%.*]] = phi i32 [ [[TEMP_1_EPIL]], [[FOR_INC_EPIL]] ], [ [[TEMP_1_EPIL_1]], [[FOR_INC_EPIL_1]] ], [ [[TEMP_1_EPIL_2]], [[FOR_INC_EPIL_2]] ]
 ; CHECK-NEXT:    br label [[FOR_COND_CLEANUP_LOOPEXIT]]
 ; CHECK:       for.cond.cleanup.loopexit:
 ; CHECK-NEXT:    [[TEMP_1_LCSSA:%.*]] = phi i32 [ [[TEMP_1_LCSSA_PH]], 
[[FOR_COND_CLEANUP_LOOPEXIT_UNR_LCSSA]] ], [ [[TEMP_1_LCSSA_PH1]], 
[[FOR_COND_CLEANUP_LOOPEXIT_EPILOG_LCSSA]] ]
@@ -60,51 +89,22 @@ define void @test_three_blocks(i32* nocapture %Output,
 ; CHECK-NEXT:    [[TEMP_09:%.*]] = phi i32 [ 0, [[FOR_BODY_PREHEADER_NEW]] ], 
[ [[TEMP_1_3]], [[FOR_INC_3]] ]
 ; CHECK-NEXT:    [[NITER:%.*]] = phi i32 [ [[UNROLL_ITER]], 
[[FOR_BODY_PREHEADER_NEW]] ], [ [[NITER_NSUB_3:%.*]], [[FOR_INC_3]] ]
 ; CHECK-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds i32, i32* 
[[CONDITION]], i32 [[J_010]]
-; CHECK-NEXT:    [[TMP4:%.*]] = load i32, i32* [[ARRAYIDX]], align 4
-; CHECK-NEXT:    [[TOBOOL:%.*]] = icmp eq i32 [[TMP4]], 0
+; CHECK-NEXT:    [[TMP8:%.*]] = load i32, i32* [[ARRAYIDX]], align 4
+; CHECK-NEXT:    [[TOBOOL:%.*]] = icmp eq i32 [[TMP8]], 0
 ; CHECK-NEXT:    br i1 [[TOBOOL]], label [[FOR_INC:%.*]], label [[IF_THEN:%.*]]
 ; CHECK:       if.then:
 ; CHECK-NEXT:    [[ARRAYIDX1:%.*]] = getelementptr inbounds i32, i32* 
[[INPUT]], i32 [[J_010]]
-; CHECK-NEXT:    [[TMP5:%.*]] = load i32, i32* [[ARRAYIDX1]], align 4
-; CHECK-NEXT:    [[ADD:%.*]] = add i32 [[TMP5]], [[TEMP_09]]
+; CHECK-NEXT:    [[TMP9:%.*]] = load i32, i32* [[ARRAYIDX1]], align 4
+; CHECK-NEXT:    [[ADD:%.*]] = add i32 [[TMP9]], [[TEMP_09]]
 ; CHECK-NEXT:    br label [[FOR_INC]]
 ; CHECK:       for.inc:
 ; CHECK-NEXT:    [[TEMP_1:%.*]] = phi i32 [ [[ADD]], [[IF_THEN]] ], [ 
[[TEMP_09]], [[FOR_BODY]] ]
 ; CHECK-NEXT:    [[INC:%.*]] = add nuw nsw i32 [[J_010]], 1
 ; CHECK-NEXT:    [[NITER_NSUB:%.*]] = sub i32 [[NITER]], 1
 ; CHECK-NEXT:    [[ARRAYIDX_1:%.*]] = getelementptr inbounds i32, i32* 
[[CONDITION]], i32 [[INC]]
-; CHECK-NEXT:    [[TMP6:%.*]] = load i32, i32* [[ARRAYIDX_1]], align 4
-; CHECK-NEXT:    [[TOBOOL_1:%.*]] = icmp eq i32 [[TMP6]], 0
+; CHECK-NEXT:    [[TMP10:%.*]] = load i32, i32* [[ARRAYIDX_1]], align 4
+; CHECK-NEXT:    [[TOBOOL_1:%.*]] = icmp eq i32 [[TMP10]], 0
 ; CHECK-NEXT:    br i1 [[TOBOOL_1]], label [[FOR_INC_1:%.*]], label 
[[IF_THEN_1:%.*]]
-; CHECK:       for.body.epil.1:
-; CHECK-NEXT:    [[ARRAYIDX_EPIL_1:%.*]] = getelementptr inbounds i32, i32* 
[[CONDITION]], i32 [[INC_EPIL]]
-; CHECK-NEXT:    [[TMP7:%.*]] = load i32, i32* [[ARRAYIDX_EPIL_1]], align 4
-; CHECK-NEXT:    [[TOBOOL_EPIL_1:%.*]] = icmp eq i32 [[TMP7]], 0
-; CHECK-NEXT:    br i1 [[TOBOOL_EPIL_1]], label [[FOR_INC_EPIL_1]], label 
[[IF_THEN_EPIL_1:%.*]]
-; CHECK:       if.then.epil.1:
-; CHECK-NEXT:    [[ARRAYIDX1_EPIL_1:%.*]] = getelementptr inbounds i32, i32* 
[[INPUT]], i32 [[INC_EPIL]]
-; CHECK-NEXT:    [[TMP8:%.*]] = load i32, i32* [[ARRAYIDX1_EPIL_1]], align 4
-; CHECK-NEXT:    [[ADD_EPIL_1:%.*]] = add i32 [[TMP8]], [[TEMP_1_EPIL]]
-; CHECK-NEXT:    br label [[FOR_INC_EPIL_1]]
-; CHECK:       for.inc.epil.1:
-; CHECK-NEXT:    [[TEMP_1_EPIL_1]] = phi i32 [ [[ADD_EPIL_1]], 
[[IF_THEN_EPIL_1]] ], [ [[TEMP_1_EPIL]], [[FOR_BODY_EPIL_1]] ]
-; CHECK-NEXT:    [[INC_EPIL_1:%.*]] = add nuw i32 [[INC_EPIL]], 1
-; CHECK-NEXT:    [[EPIL_ITER_SUB_1:%.*]] = sub i32 [[EPIL_ITER_SUB]], 1
-; CHECK-NEXT:    [[EPIL_ITER_CMP_1:%.*]] = icmp ne i32 [[EPIL_ITER_SUB_1]], 0
-; CHECK-NEXT:    br i1 [[EPIL_ITER_CMP_1]], label [[FOR_BODY_EPIL_2:%.*]], 
label [[FOR_COND_CLEANUP_LOOPEXIT_EPILOG_LCSSA]]
-; CHECK:       for.body.epil.2:
-; CHECK-NEXT:    [[ARRAYIDX_EPIL_2:%.*]] = getelementptr inbounds i32, i32* 
[[CONDITION]], i32 [[INC_EPIL_1]]
-; CHECK-NEXT:    [[TMP9:%.*]] = load i32, i32* [[ARRAYIDX_EPIL_2]], align 4
-; CHECK-NEXT:    [[TOBOOL_EPIL_2:%.*]] = icmp eq i32 [[TMP9]], 0
-; CHECK-NEXT:    br i1 [[TOBOOL_EPIL_2]], label [[FOR_INC_EPIL_2]], label 
[[IF_THEN_EPIL_2:%.*]]
-; CHECK:       if.then.epil.2:
-; CHECK-NEXT:    [[ARRAYIDX1_EPIL_2:%.*]] = getelementptr inbounds i32, i32* 
[[INPUT]], i32 [[INC_EPIL_1]]
-; CHECK-NEXT:    [[TMP10:%.*]] = load i32, i32* [[ARRAYIDX1_EPIL_2]], align 4
-; CHECK-NEXT:    [[ADD_EPIL_2:%.*]] = add i32 [[TMP10]], [[TEMP_1_EPIL_1]]
-; CHECK-NEXT:    br label [[FOR_INC_EPIL_2]]
-; CHECK:       for.inc.epil.2:
-; CHECK-NEXT:    [[TEMP_1_EPIL_2]] = phi i32 [ [[ADD_EPIL_2]], 
[[IF_THEN_EPIL_2]] ], [ [[TEMP_1_EPIL_1]], [[FOR_BODY_EPIL_2]] ]
-; CHECK-NEXT:    br label [[FOR_COND_CLEANUP_LOOPEXIT_EPILOG_LCSSA]]
 ; CHECK:       if.then.1:
 ; CHECK-NEXT:    [[ARRAYIDX1_1:%.*]] = getelementptr inbounds i32, i32* 
[[INPUT]], i32 [[INC]]
 ; CHECK-NEXT:    [[TMP11:%.*]] = load i32, i32* [[ARRAYIDX1_1]], align 4
@@ -203,41 +203,34 @@ define void @test_two_exits(i32* nocapture %Output,
 ; CHECK-NEXT:    [[INC:%.*]] = add nuw nsw i32 [[J_016]], 1
 ; CHECK-NEXT:    [[CMP:%.*]] = icmp ult i32 [[INC]], [[MAXJ]]
 ; CHECK-NEXT:    br i1 [[CMP]], label [[FOR_BODY_1:%.*]], label 
[[CLEANUP_LOOPEXIT]]
-; CHECK:       cleanup.loopexit:
-; CHECK-NEXT:    [[TEMP_0_LCSSA_PH:%.*]] = phi i32 [ [[TEMP_0_ADD]], [[IF_END]] ], [ [[TEMP_015]], [[FOR_BODY]] ], [ [[TEMP_0_ADD]], [[FOR_BODY_1]] ], [ [[TEMP_0_ADD_1:%.*]], [[IF_END_1:%.*]] ], [ [[TEMP_0_ADD_1]], [[FOR_BODY_2:%.*]] ], [ [[TEMP_0_ADD_2:%.*]], [[IF_END_2:%.*]] ], [ [[TEMP_0_ADD_2]], [[FOR_BODY_3:%.*]] ], [ [[TEMP_0_ADD_3]], [[IF_END_3]] ]
-; CHECK-NEXT:    br label [[CLEANUP]]
-; CHECK:       cleanup:
-; CHECK-NEXT:    [[TEMP_0_LCSSA:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[TEMP_0_LCSSA_PH]], [[CLEANUP_LOOPEXIT]] ]
-; CHECK-NEXT:    store i32 [[TEMP_0_LCSSA]], i32* [[OUTPUT:%.*]], align 4
-; CHECK-NEXT:    ret void
 ; CHECK:       for.body.1:
 ; CHECK-NEXT:    [[ARRAYIDX_1:%.*]] = getelementptr inbounds i32, i32* 
[[INPUT]], i32 [[INC]]
 ; CHECK-NEXT:    [[TMP2:%.*]] = load i32, i32* [[ARRAYIDX_1]], align 4
 ; CHECK-NEXT:    [[CMP1_1:%.*]] = icmp ugt i32 [[TMP2]], 65535
-; CHECK-NEXT:    br i1 [[CMP1_1]], label [[CLEANUP_LOOPEXIT]], label 
[[IF_END_1]]
+; CHECK-NEXT:    br i1 [[CMP1_1]], label [[CLEANUP_LOOPEXIT]], label 
[[IF_END_1:%.*]]
 ; CHECK:       if.end.1:
 ; CHECK-NEXT:    [[ARRAYIDX2_1:%.*]] = getelementptr inbounds i32, i32* 
[[CONDITION]], i32 [[INC]]
 ; CHECK-NEXT:    [[TMP3:%.*]] = load i32, i32* [[ARRAYIDX2_1]], align 4
 ; CHECK-NEXT:    [[TOBOOL_1:%.*]] = icmp eq i32 [[TMP3]], 0
 ; CHECK-NEXT:    [[ADD_1:%.*]] = select i1 [[TOBOOL_1]], i32 0, i32 [[TMP2]]
-; CHECK-NEXT:    [[TEMP_0_ADD_1]] = add i32 [[ADD_1]], [[TEMP_0_ADD]]
+; CHECK-NEXT:    [[TEMP_0_ADD_1:%.*]] = add i32 [[ADD_1]], [[TEMP_0_ADD]]
 ; CHECK-NEXT:    [[INC_1:%.*]] = add nuw nsw i32 [[INC]], 1
 ; CHECK-NEXT:    [[CMP_1:%.*]] = icmp ult i32 [[INC_1]], [[MAXJ]]
-; CHECK-NEXT:    br i1 [[CMP_1]], label [[FOR_BODY_2]], label 
[[CLEANUP_LOOPEXIT]]
+; CHECK-NEXT:    br i1 [[CMP_1]], label [[FOR_BODY_2:%.*]], label 
[[CLEANUP_LOOPEXIT]]
 ; CHECK:       for.body.2:
 ; CHECK-NEXT:    [[ARRAYIDX_2:%.*]] = getelementptr inbounds i32, i32* 
[[INPUT]], i32 [[INC_1]]
 ; CHECK-NEXT:    [[TMP4:%.*]] = load i32, i32* [[ARRAYIDX_2]], align 4
 ; CHECK-NEXT:    [[CMP1_2:%.*]] = icmp ugt i32 [[TMP4]], 65535
-; CHECK-NEXT:    br i1 [[CMP1_2]], label [[CLEANUP_LOOPEXIT]], label 
[[IF_END_2]]
+; CHECK-NEXT:    br i1 [[CMP1_2]], label [[CLEANUP_LOOPEXIT]], label 
[[IF_END_2:%.*]]
 ; CHECK:       if.end.2:
 ; CHECK-NEXT:    [[ARRAYIDX2_2:%.*]] = getelementptr inbounds i32, i32* 
[[CONDITION]], i32 [[INC_1]]
 ; CHECK-NEXT:    [[TMP5:%.*]] = load i32, i32* [[ARRAYIDX2_2]], align 4
 ; CHECK-NEXT:    [[TOBOOL_2:%.*]] = icmp eq i32 [[TMP5]], 0
 ; CHECK-NEXT:    [[ADD_2:%.*]] = select i1 [[TOBOOL_2]], i32 0, i32 [[TMP4]]
-; CHECK-NEXT:    [[TEMP_0_ADD_2]] = add i32 [[ADD_2]], [[TEMP_0_ADD_1]]
+; CHECK-NEXT:    [[TEMP_0_ADD_2:%.*]] = add i32 [[ADD_2]], [[TEMP_0_ADD_1]]
 ; CHECK-NEXT:    [[INC_2:%.*]] = add nuw nsw i32 [[INC_1]], 1
 ; CHECK-NEXT:    [[CMP_2:%.*]] = icmp ult i32 [[INC_2]], [[MAXJ]]
-; CHECK-NEXT:    br i1 [[CMP_2]], label [[FOR_BODY_3]], label 
[[CLEANUP_LOOPEXIT]]
+; CHECK-NEXT:    br i1 [[CMP_2]], label [[FOR_BODY_3:%.*]], label 
[[CLEANUP_LOOPEXIT]]
 ; CHECK:       for.body.3:
 ; CHECK-NEXT:    [[ARRAYIDX_3:%.*]] = getelementptr inbounds i32, i32* 
[[INPUT]], i32 [[INC_2]]
 ; CHECK-NEXT:    [[TMP6:%.*]] = load i32, i32* [[ARRAYIDX_3]], align 4
@@ -252,6 +245,13 @@ define void @test_two_exits(i32* nocapture %Output,
 ; CHECK-NEXT:    [[INC_3]] = add nuw i32 [[INC_2]], 1
 ; CHECK-NEXT:    [[CMP_3:%.*]] = icmp ult i32 [[INC_3]], [[MAXJ]]
 ; CHECK-NEXT:    br i1 [[CMP_3]], label [[FOR_BODY]], label 
[[CLEANUP_LOOPEXIT]]
+; CHECK:       cleanup.loopexit:
+; CHECK-NEXT:    [[TEMP_0_LCSSA_PH:%.*]] = phi i32 [ [[TEMP_0_ADD]], [[IF_END]] ], [ [[TEMP_015]], [[FOR_BODY]] ], [ [[TEMP_0_ADD]], [[FOR_BODY_1]] ], [ [[TEMP_0_ADD_1]], [[IF_END_1]] ], [ [[TEMP_0_ADD_1]], [[FOR_BODY_2]] ], [ [[TEMP_0_ADD_2]], [[IF_END_2]] ], [ [[TEMP_0_ADD_2]], [[FOR_BODY_3]] ], [ [[TEMP_0_ADD_3]], [[IF_END_3]] ]
+; CHECK-NEXT:    br label [[CLEANUP]]
+; CHECK:       cleanup:
+; CHECK-NEXT:    [[TEMP_0_LCSSA:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[TEMP_0_LCSSA_PH]], [[CLEANUP_LOOPEXIT]] ]
+; CHECK-NEXT:    store i32 [[TEMP_0_LCSSA]], i32* [[OUTPUT:%.*]], align 4
+; CHECK-NEXT:    ret void
 ;
   i32* nocapture readonly %Condition,
   i32* nocapture readonly %Input,
@@ -417,100 +417,100 @@ define void @test_four_blocks(i32* nocapture %Output,
 ; CHECK-NEXT:    [[EPIL_ITER_SUB:%.*]] = sub i32 [[XTRAITER]], 1
 ; CHECK-NEXT:    [[EPIL_ITER_CMP:%.*]] = icmp ne i32 [[EPIL_ITER_SUB]], 0
 ; CHECK-NEXT:    br i1 [[EPIL_ITER_CMP]], label [[FOR_BODY_EPIL_1:%.*]], label 
[[FOR_COND_CLEANUP_LOOPEXIT_EPILOG_LCSSA:%.*]]
-; CHECK:       for.cond.cleanup.loopexit.epilog-lcssa:
-; CHECK-NEXT:    [[TEMP_1_LCSSA_PH1:%.*]] = phi i32 [ [[TEMP_1_EPIL]], [[FOR_INC_EPIL]] ], [ [[TEMP_1_EPIL_1:%.*]], [[FOR_INC_EPIL_1:%.*]] ], [ [[TEMP_1_EPIL_2:%.*]], [[FOR_INC_EPIL_2:%.*]] ]
-; CHECK-NEXT:    br label [[FOR_COND_CLEANUP_LOOPEXIT]]
-; CHECK:       for.cond.cleanup.loopexit:
-; CHECK-NEXT:    [[TEMP_1_LCSSA:%.*]] = phi i32 [ [[TEMP_1_LCSSA_PH]], [[FOR_COND_CLEANUP_LOOPEXIT_UNR_LCSSA]] ], [ [[TEMP_1_LCSSA_PH1]], [[FOR_COND_CLEANUP_LOOPEXIT_EPILOG_LCSSA]] ]
-; CHECK-NEXT:    br label [[FOR_COND_CLEANUP]]
-; CHECK:       for.cond.cleanup:
-; CHECK-NEXT:    [[TEMP_0_LCSSA:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ 
[[TEMP_1_LCSSA]], [[FOR_COND_CLEANUP_LOOPEXIT]] ]
-; CHECK-NEXT:    store i32 [[TEMP_0_LCSSA]], i32* [[OUTPUT:%.*]], align 4
-; CHECK-NEXT:    ret void
-; CHECK:       for.body:
-; CHECK-NEXT:    [[TMP6:%.*]] = phi i32 [ [[DOTPRE]], [[FOR_BODY_LR_PH_NEW]] 
], [ [[TMP23]], [[FOR_INC_3]] ]
-; CHECK-NEXT:    [[J_027:%.*]] = phi i32 [ 1, [[FOR_BODY_LR_PH_NEW]] ], [ 
[[INC_3]], [[FOR_INC_3]] ]
-; CHECK-NEXT:    [[TEMP_026:%.*]] = phi i32 [ 0, [[FOR_BODY_LR_PH_NEW]] ], [ 
[[TEMP_1_3]], [[FOR_INC_3]] ]
-; CHECK-NEXT:    [[NITER:%.*]] = phi i32 [ [[UNROLL_ITER]], 
[[FOR_BODY_LR_PH_NEW]] ], [ [[NITER_NSUB_3:%.*]], [[FOR_INC_3]] ]
-; CHECK-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds i32, i32* 
[[CONDITION]], i32 [[J_027]]
-; CHECK-NEXT:    [[TMP7:%.*]] = load i32, i32* [[ARRAYIDX]], align 4
-; CHECK-NEXT:    [[CMP1:%.*]] = icmp ugt i32 [[TMP7]], 65535
-; CHECK-NEXT:    [[ARRAYIDX2:%.*]] = getelementptr inbounds i32, i32* 
[[INPUT]], i32 [[J_027]]
-; CHECK-NEXT:    [[TMP8:%.*]] = load i32, i32* [[ARRAYIDX2]], align 4
-; CHECK-NEXT:    [[CMP4:%.*]] = icmp ugt i32 [[TMP8]], [[TMP6]]
-; CHECK-NEXT:    br i1 [[CMP1]], label [[IF_THEN:%.*]], label [[IF_ELSE:%.*]]
-; CHECK:       if.then:
-; CHECK-NEXT:    [[COND:%.*]] = zext i1 [[CMP4]] to i32
-; CHECK-NEXT:    [[ADD:%.*]] = add i32 [[TEMP_026]], [[COND]]
-; CHECK-NEXT:    br label [[FOR_INC:%.*]]
-; CHECK:       if.else:
-; CHECK-NEXT:    [[NOT_CMP4:%.*]] = xor i1 [[CMP4]], true
-; CHECK-NEXT:    [[SUB:%.*]] = sext i1 [[NOT_CMP4]] to i32
-; CHECK-NEXT:    [[SUB10_SINK:%.*]] = add i32 [[J_027]], [[SUB]]
-; CHECK-NEXT:    [[ARRAYIDX11:%.*]] = getelementptr inbounds i32, i32* 
[[INPUT]], i32 [[SUB10_SINK]]
-; CHECK-NEXT:    [[TMP9:%.*]] = load i32, i32* [[ARRAYIDX11]], align 4
-; CHECK-NEXT:    [[SUB13:%.*]] = sub i32 [[TEMP_026]], [[TMP9]]
-; CHECK-NEXT:    br label [[FOR_INC]]
-; CHECK:       for.inc:
-; CHECK-NEXT:    [[TEMP_1:%.*]] = phi i32 [ [[ADD]], [[IF_THEN]] ], [ 
[[SUB13]], [[IF_ELSE]] ]
-; CHECK-NEXT:    [[INC:%.*]] = add nuw nsw i32 [[J_027]], 1
-; CHECK-NEXT:    [[NITER_NSUB:%.*]] = sub i32 [[NITER]], 1
-; CHECK-NEXT:    [[ARRAYIDX_1:%.*]] = getelementptr inbounds i32, i32* 
[[CONDITION]], i32 [[INC]]
-; CHECK-NEXT:    [[TMP10:%.*]] = load i32, i32* [[ARRAYIDX_1]], align 4
-; CHECK-NEXT:    [[CMP1_1:%.*]] = icmp ugt i32 [[TMP10]], 65535
-; CHECK-NEXT:    [[ARRAYIDX2_1:%.*]] = getelementptr inbounds i32, i32* 
[[INPUT]], i32 [[INC]]
-; CHECK-NEXT:    [[TMP11:%.*]] = load i32, i32* [[ARRAYIDX2_1]], align 4
-; CHECK-NEXT:    [[CMP4_1:%.*]] = icmp ugt i32 [[TMP11]], [[TMP8]]
-; CHECK-NEXT:    br i1 [[CMP1_1]], label [[IF_THEN_1:%.*]], label 
[[IF_ELSE_1:%.*]]
 ; CHECK:       for.body.epil.1:
 ; CHECK-NEXT:    [[ARRAYIDX_EPIL_1:%.*]] = getelementptr inbounds i32, i32* 
[[CONDITION]], i32 [[INC_EPIL]]
-; CHECK-NEXT:    [[TMP12:%.*]] = load i32, i32* [[ARRAYIDX_EPIL_1]], align 4
-; CHECK-NEXT:    [[CMP1_EPIL_1:%.*]] = icmp ugt i32 [[TMP12]], 65535
+; CHECK-NEXT:    [[TMP6:%.*]] = load i32, i32* [[ARRAYIDX_EPIL_1]], align 4
+; CHECK-NEXT:    [[CMP1_EPIL_1:%.*]] = icmp ugt i32 [[TMP6]], 65535
 ; CHECK-NEXT:    [[ARRAYIDX2_EPIL_1:%.*]] = getelementptr inbounds i32, i32* 
[[INPUT]], i32 [[INC_EPIL]]
-; CHECK-NEXT:    [[TMP13:%.*]] = load i32, i32* [[ARRAYIDX2_EPIL_1]], align 4
-; CHECK-NEXT:    [[CMP4_EPIL_1:%.*]] = icmp ugt i32 [[TMP13]], [[TMP4]]
+; CHECK-NEXT:    [[TMP7:%.*]] = load i32, i32* [[ARRAYIDX2_EPIL_1]], align 4
+; CHECK-NEXT:    [[CMP4_EPIL_1:%.*]] = icmp ugt i32 [[TMP7]], [[TMP4]]
 ; CHECK-NEXT:    br i1 [[CMP1_EPIL_1]], label [[IF_THEN_EPIL_1:%.*]], label 
[[IF_ELSE_EPIL_1:%.*]]
 ; CHECK:       if.else.epil.1:
 ; CHECK-NEXT:    [[NOT_CMP4_EPIL_1:%.*]] = xor i1 [[CMP4_EPIL_1]], true
 ; CHECK-NEXT:    [[SUB_EPIL_1:%.*]] = sext i1 [[NOT_CMP4_EPIL_1]] to i32
 ; CHECK-NEXT:    [[SUB10_SINK_EPIL_1:%.*]] = add i32 [[INC_EPIL]], 
[[SUB_EPIL_1]]
 ; CHECK-NEXT:    [[ARRAYIDX11_EPIL_1:%.*]] = getelementptr inbounds i32, i32* 
[[INPUT]], i32 [[SUB10_SINK_EPIL_1]]
-; CHECK-NEXT:    [[TMP14:%.*]] = load i32, i32* [[ARRAYIDX11_EPIL_1]], align 4
-; CHECK-NEXT:    [[SUB13_EPIL_1:%.*]] = sub i32 [[TEMP_1_EPIL]], [[TMP14]]
-; CHECK-NEXT:    br label [[FOR_INC_EPIL_1]]
+; CHECK-NEXT:    [[TMP8:%.*]] = load i32, i32* [[ARRAYIDX11_EPIL_1]], align 4
+; CHECK-NEXT:    [[SUB13_EPIL_1:%.*]] = sub i32 [[TEMP_1_EPIL]], [[TMP8]]
+; CHECK-NEXT:    br label [[FOR_INC_EPIL_1:%.*]]
 ; CHECK:       if.then.epil.1:
 ; CHECK-NEXT:    [[COND_EPIL_1:%.*]] = zext i1 [[CMP4_EPIL_1]] to i32
 ; CHECK-NEXT:    [[ADD_EPIL_1:%.*]] = add i32 [[TEMP_1_EPIL]], [[COND_EPIL_1]]
 ; CHECK-NEXT:    br label [[FOR_INC_EPIL_1]]
 ; CHECK:       for.inc.epil.1:
-; CHECK-NEXT:    [[TEMP_1_EPIL_1]] = phi i32 [ [[ADD_EPIL_1]], 
[[IF_THEN_EPIL_1]] ], [ [[SUB13_EPIL_1]], [[IF_ELSE_EPIL_1]] ]
+; CHECK-NEXT:    [[TEMP_1_EPIL_1:%.*]] = phi i32 [ [[ADD_EPIL_1]], 
[[IF_THEN_EPIL_1]] ], [ [[SUB13_EPIL_1]], [[IF_ELSE_EPIL_1]] ]
 ; CHECK-NEXT:    [[INC_EPIL_1:%.*]] = add nuw i32 [[INC_EPIL]], 1
 ; CHECK-NEXT:    [[EPIL_ITER_SUB_1:%.*]] = sub i32 [[EPIL_ITER_SUB]], 1
 ; CHECK-NEXT:    [[EPIL_ITER_CMP_1:%.*]] = icmp ne i32 [[EPIL_ITER_SUB_1]], 0
 ; CHECK-NEXT:    br i1 [[EPIL_ITER_CMP_1]], label [[FOR_BODY_EPIL_2:%.*]], 
label [[FOR_COND_CLEANUP_LOOPEXIT_EPILOG_LCSSA]]
 ; CHECK:       for.body.epil.2:
 ; CHECK-NEXT:    [[ARRAYIDX_EPIL_2:%.*]] = getelementptr inbounds i32, i32* 
[[CONDITION]], i32 [[INC_EPIL_1]]
-; CHECK-NEXT:    [[TMP15:%.*]] = load i32, i32* [[ARRAYIDX_EPIL_2]], align 4
-; CHECK-NEXT:    [[CMP1_EPIL_2:%.*]] = icmp ugt i32 [[TMP15]], 65535
+; CHECK-NEXT:    [[TMP9:%.*]] = load i32, i32* [[ARRAYIDX_EPIL_2]], align 4
+; CHECK-NEXT:    [[CMP1_EPIL_2:%.*]] = icmp ugt i32 [[TMP9]], 65535
 ; CHECK-NEXT:    [[ARRAYIDX2_EPIL_2:%.*]] = getelementptr inbounds i32, i32* 
[[INPUT]], i32 [[INC_EPIL_1]]
-; CHECK-NEXT:    [[TMP16:%.*]] = load i32, i32* [[ARRAYIDX2_EPIL_2]], align 4
-; CHECK-NEXT:    [[CMP4_EPIL_2:%.*]] = icmp ugt i32 [[TMP16]], [[TMP13]]
+; CHECK-NEXT:    [[TMP10:%.*]] = load i32, i32* [[ARRAYIDX2_EPIL_2]], align 4
+; CHECK-NEXT:    [[CMP4_EPIL_2:%.*]] = icmp ugt i32 [[TMP10]], [[TMP7]]
 ; CHECK-NEXT:    br i1 [[CMP1_EPIL_2]], label [[IF_THEN_EPIL_2:%.*]], label 
[[IF_ELSE_EPIL_2:%.*]]
 ; CHECK:       if.else.epil.2:
 ; CHECK-NEXT:    [[NOT_CMP4_EPIL_2:%.*]] = xor i1 [[CMP4_EPIL_2]], true
 ; CHECK-NEXT:    [[SUB_EPIL_2:%.*]] = sext i1 [[NOT_CMP4_EPIL_2]] to i32
 ; CHECK-NEXT:    [[SUB10_SINK_EPIL_2:%.*]] = add i32 [[INC_EPIL_1]], 
[[SUB_EPIL_2]]
 ; CHECK-NEXT:    [[ARRAYIDX11_EPIL_2:%.*]] = getelementptr inbounds i32, i32* 
[[INPUT]], i32 [[SUB10_SINK_EPIL_2]]
-; CHECK-NEXT:    [[TMP17:%.*]] = load i32, i32* [[ARRAYIDX11_EPIL_2]], align 4
-; CHECK-NEXT:    [[SUB13_EPIL_2:%.*]] = sub i32 [[TEMP_1_EPIL_1]], [[TMP17]]
-; CHECK-NEXT:    br label [[FOR_INC_EPIL_2]]
+; CHECK-NEXT:    [[TMP11:%.*]] = load i32, i32* [[ARRAYIDX11_EPIL_2]], align 4
+; CHECK-NEXT:    [[SUB13_EPIL_2:%.*]] = sub i32 [[TEMP_1_EPIL_1]], [[TMP11]]
+; CHECK-NEXT:    br label [[FOR_INC_EPIL_2:%.*]]
 ; CHECK:       if.then.epil.2:
 ; CHECK-NEXT:    [[COND_EPIL_2:%.*]] = zext i1 [[CMP4_EPIL_2]] to i32
 ; CHECK-NEXT:    [[ADD_EPIL_2:%.*]] = add i32 [[TEMP_1_EPIL_1]], 
[[COND_EPIL_2]]
 ; CHECK-NEXT:    br label [[FOR_INC_EPIL_2]]
 ; CHECK:       for.inc.epil.2:
-; CHECK-NEXT:    [[TEMP_1_EPIL_2]] = phi i32 [ [[ADD_EPIL_2]], 
[[IF_THEN_EPIL_2]] ], [ [[SUB13_EPIL_2]], [[IF_ELSE_EPIL_2]] ]
+; CHECK-NEXT:    [[TEMP_1_EPIL_2:%.*]] = phi i32 [ [[ADD_EPIL_2]], 
[[IF_THEN_EPIL_2]] ], [ [[SUB13_EPIL_2]], [[IF_ELSE_EPIL_2]] ]
 ; CHECK-NEXT:    br label [[FOR_COND_CLEANUP_LOOPEXIT_EPILOG_LCSSA]]
+; CHECK:       for.cond.cleanup.loopexit.epilog-lcssa:
+; CHECK-NEXT:    [[TEMP_1_LCSSA_PH1:%.*]] = phi i32 [ [[TEMP_1_EPIL]], [[FOR_INC_EPIL]] ], [ [[TEMP_1_EPIL_1]], [[FOR_INC_EPIL_1]] ], [ [[TEMP_1_EPIL_2]], [[FOR_INC_EPIL_2]] ]
+; CHECK-NEXT:    br label [[FOR_COND_CLEANUP_LOOPEXIT]]
+; CHECK:       for.cond.cleanup.loopexit:
+; CHECK-NEXT:    [[TEMP_1_LCSSA:%.*]] = phi i32 [ [[TEMP_1_LCSSA_PH]], [[FOR_COND_CLEANUP_LOOPEXIT_UNR_LCSSA]] ], [ [[TEMP_1_LCSSA_PH1]], [[FOR_COND_CLEANUP_LOOPEXIT_EPILOG_LCSSA]] ]
+; CHECK-NEXT:    br label [[FOR_COND_CLEANUP]]
+; CHECK:       for.cond.cleanup:
+; CHECK-NEXT:    [[TEMP_0_LCSSA:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ 
[[TEMP_1_LCSSA]], [[FOR_COND_CLEANUP_LOOPEXIT]] ]
+; CHECK-NEXT:    store i32 [[TEMP_0_LCSSA]], i32* [[OUTPUT:%.*]], align 4
+; CHECK-NEXT:    ret void
+; CHECK:       for.body:
+; CHECK-NEXT:    [[TMP12:%.*]] = phi i32 [ [[DOTPRE]], [[FOR_BODY_LR_PH_NEW]] 
], [ [[TMP23]], [[FOR_INC_3]] ]
+; CHECK-NEXT:    [[J_027:%.*]] = phi i32 [ 1, [[FOR_BODY_LR_PH_NEW]] ], [ 
[[INC_3]], [[FOR_INC_3]] ]
+; CHECK-NEXT:    [[TEMP_026:%.*]] = phi i32 [ 0, [[FOR_BODY_LR_PH_NEW]] ], [ 
[[TEMP_1_3]], [[FOR_INC_3]] ]
+; CHECK-NEXT:    [[NITER:%.*]] = phi i32 [ [[UNROLL_ITER]], 
[[FOR_BODY_LR_PH_NEW]] ], [ [[NITER_NSUB_3:%.*]], [[FOR_INC_3]] ]
+; CHECK-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds i32, i32* 
[[CONDITION]], i32 [[J_027]]
+; CHECK-NEXT:    [[TMP13:%.*]] = load i32, i32* [[ARRAYIDX]], align 4
+; CHECK-NEXT:    [[CMP1:%.*]] = icmp ugt i32 [[TMP13]], 65535
+; CHECK-NEXT:    [[ARRAYIDX2:%.*]] = getelementptr inbounds i32, i32* 
[[INPUT]], i32 [[J_027]]
+; CHECK-NEXT:    [[TMP14:%.*]] = load i32, i32* [[ARRAYIDX2]], align 4
+; CHECK-NEXT:    [[CMP4:%.*]] = icmp ugt i32 [[TMP14]], [[TMP12]]
+; CHECK-NEXT:    br i1 [[CMP1]], label [[IF_THEN:%.*]], label [[IF_ELSE:%.*]]
+; CHECK:       if.then:
+; CHECK-NEXT:    [[COND:%.*]] = zext i1 [[CMP4]] to i32
+; CHECK-NEXT:    [[ADD:%.*]] = add i32 [[TEMP_026]], [[COND]]
+; CHECK-NEXT:    br label [[FOR_INC:%.*]]
+; CHECK:       if.else:
+; CHECK-NEXT:    [[NOT_CMP4:%.*]] = xor i1 [[CMP4]], true
+; CHECK-NEXT:    [[SUB:%.*]] = sext i1 [[NOT_CMP4]] to i32
+; CHECK-NEXT:    [[SUB10_SINK:%.*]] = add i32 [[J_027]], [[SUB]]
+; CHECK-NEXT:    [[ARRAYIDX11:%.*]] = getelementptr inbounds i32, i32* 
[[INPUT]], i32 [[SUB10_SINK]]
+; CHECK-NEXT:    [[TMP15:%.*]] = load i32, i32* [[ARRAYIDX11]], align 4
+; CHECK-NEXT:    [[SUB13:%.*]] = sub i32 [[TEMP_026]], [[TMP15]]
+; CHECK-NEXT:    br label [[FOR_INC]]
+; CHECK:       for.inc:
+; CHECK-NEXT:    [[TEMP_1:%.*]] = phi i32 [ [[ADD]], [[IF_THEN]] ], [ 
[[SUB13]], [[IF_ELSE]] ]
+; CHECK-NEXT:    [[INC:%.*]] = add nuw nsw i32 [[J_027]], 1
+; CHECK-NEXT:    [[NITER_NSUB:%.*]] = sub i32 [[NITER]], 1
+; CHECK-NEXT:    [[ARRAYIDX_1:%.*]] = getelementptr inbounds i32, i32* 
[[CONDITION]], i32 [[INC]]
+; CHECK-NEXT:    [[TMP16:%.*]] = load i32, i32* [[ARRAYIDX_1]], align 4
+; CHECK-NEXT:    [[CMP1_1:%.*]] = icmp ugt i32 [[TMP16]], 65535
+; CHECK-NEXT:    [[ARRAYIDX2_1:%.*]] = getelementptr inbounds i32, i32* 
[[INPUT]], i32 [[INC]]
+; CHECK-NEXT:    [[TMP17:%.*]] = load i32, i32* [[ARRAYIDX2_1]], align 4
+; CHECK-NEXT:    [[CMP4_1:%.*]] = icmp ugt i32 [[TMP17]], [[TMP14]]
+; CHECK-NEXT:    br i1 [[CMP1_1]], label [[IF_THEN_1:%.*]], label 
[[IF_ELSE_1:%.*]]
 ; CHECK:       if.else.1:
 ; CHECK-NEXT:    [[NOT_CMP4_1:%.*]] = xor i1 [[CMP4_1]], true
 ; CHECK-NEXT:    [[SUB_1:%.*]] = sext i1 [[NOT_CMP4_1]] to i32
@@ -532,7 +532,7 @@ define void @test_four_blocks(i32* nocapture %Output,
 ; CHECK-NEXT:    [[CMP1_2:%.*]] = icmp ugt i32 [[TMP19]], 65535
 ; CHECK-NEXT:    [[ARRAYIDX2_2:%.*]] = getelementptr inbounds i32, i32* 
[[INPUT]], i32 [[INC_1]]
 ; CHECK-NEXT:    [[TMP20:%.*]] = load i32, i32* [[ARRAYIDX2_2]], align 4
-; CHECK-NEXT:    [[CMP4_2:%.*]] = icmp ugt i32 [[TMP20]], [[TMP11]]
+; CHECK-NEXT:    [[CMP4_2:%.*]] = icmp ugt i32 [[TMP20]], [[TMP17]]
 ; CHECK-NEXT:    br i1 [[CMP1_2]], label [[IF_THEN_2:%.*]], label 
[[IF_ELSE_2:%.*]]
 ; CHECK:       if.else.2:
 ; CHECK-NEXT:    [[NOT_CMP4_2:%.*]] = xor i1 [[CMP4_2]], true
@@ -742,10 +742,6 @@ define void @iterate_inc(%struct.Node* %n, i32 %limit) {
 ; CHECK-NEXT:    [[TMP2:%.*]] = load %struct.Node*, %struct.Node** [[TMP1]], 
align 4
 ; CHECK-NEXT:    [[TOBOOL:%.*]] = icmp eq %struct.Node* [[TMP2]], null
 ; CHECK-NEXT:    br i1 [[TOBOOL]], label [[WHILE_END_LOOPEXIT]], label 
[[LAND_RHS_1:%.*]]
-; CHECK:       while.end.loopexit:
-; CHECK-NEXT:    br label [[WHILE_END]]
-; CHECK:       while.end:
-; CHECK-NEXT:    ret void
 ; CHECK:       land.rhs.1:
 ; CHECK-NEXT:    [[VAL_1:%.*]] = getelementptr inbounds [[STRUCT_NODE]], 
%struct.Node* [[TMP2]], i32 0, i32 1
 ; CHECK-NEXT:    [[TMP3:%.*]] = load i32, i32* [[VAL_1]], align 4
@@ -782,6 +778,10 @@ define void @iterate_inc(%struct.Node* %n, i32 %limit) {
 ; CHECK-NEXT:    [[TMP11]] = load %struct.Node*, %struct.Node** [[TMP10]], 
align 4
 ; CHECK-NEXT:    [[TOBOOL_3:%.*]] = icmp eq %struct.Node* [[TMP11]], null
 ; CHECK-NEXT:    br i1 [[TOBOOL_3]], label [[WHILE_END_LOOPEXIT]], label 
[[LAND_RHS]]
+; CHECK:       while.end.loopexit:
+; CHECK-NEXT:    br label [[WHILE_END]]
+; CHECK:       while.end:
+; CHECK-NEXT:    ret void
 ;
 entry:
   %tobool5 = icmp eq %struct.Node* %n, null
diff --git a/llvm/test/Transforms/LoopUnroll/ARM/upperbound.ll b/llvm/test/Transforms/LoopUnroll/ARM/upperbound.ll
index ea18d3aa1054..33151c68b319 100644
--- a/llvm/test/Transforms/LoopUnroll/ARM/upperbound.ll
+++ b/llvm/test/Transforms/LoopUnroll/ARM/upperbound.ll
@@ -20,8 +20,6 @@ define void @test(i32* %x, i32 %n) {
 ; CHECK-NEXT:    [[INCDEC_PTR:%.*]] = getelementptr inbounds i32, i32* [[X]], 
i64 1
 ; CHECK-NEXT:    [[CMP:%.*]] = icmp sgt i32 [[REM]], 1
 ; CHECK-NEXT:    br i1 [[CMP]], label [[WHILE_BODY_1:%.*]], label [[WHILE_END]]
-; CHECK:       while.end:
-; CHECK-NEXT:    ret void
 ; CHECK:       while.body.1:
 ; CHECK-NEXT:    [[TMP1:%.*]] = load i32, i32* [[INCDEC_PTR]], align 4
 ; CHECK-NEXT:    [[CMP1_1:%.*]] = icmp slt i32 [[TMP1]], 10
@@ -40,6 +38,8 @@ define void @test(i32* %x, i32 %n) {
 ; CHECK:       if.then.2:
 ; CHECK-NEXT:    store i32 0, i32* [[INCDEC_PTR_1]], align 4
 ; CHECK-NEXT:    br label [[WHILE_END]]
+; CHECK:       while.end:
+; CHECK-NEXT:    ret void
 ;
 entry:
   %sub = add nsw i32 %n, -1
@@ -76,9 +76,9 @@ define i32 @test2(i32 %l86) {
 ; CHECK-NEXT:    [[L86_OFF:%.*]] = add i32 [[L86:%.*]], -1
 ; CHECK-NEXT:    [[SWITCH:%.*]] = icmp ult i32 [[L86_OFF]], 24
 ; CHECK-NEXT:    [[DOTNOT30:%.*]] = icmp ne i32 [[L86]], 25
-; CHECK-NEXT:    [[SPEC_SELECT24:%.*]] = zext i1 [[DOTNOT30]] to i32
-; CHECK-NEXT:    [[COMMON_RET31_OP:%.*]] = select i1 [[SWITCH]], i32 0, i32 [[SPEC_SELECT24]]
-; CHECK-NEXT:    ret i32 [[COMMON_RET31_OP]]
+; CHECK-NEXT:    [[SPEC_SELECT:%.*]] = zext i1 [[DOTNOT30]] to i32
+; CHECK-NEXT:    [[COMMON_RET_OP:%.*]] = select i1 [[SWITCH]], i32 0, i32 [[SPEC_SELECT]]
+; CHECK-NEXT:    ret i32 [[COMMON_RET_OP]]
 ;
 entry:
   br label %for.body.i.i
diff --git a/llvm/test/Transforms/LoopUnroll/full-unroll-keep-first-exit.ll b/llvm/test/Transforms/LoopUnroll/full-unroll-keep-first-exit.ll
index 316051715584..cdc8e944715e 100644
--- a/llvm/test/Transforms/LoopUnroll/full-unroll-keep-first-exit.ll
+++ b/llvm/test/Transforms/LoopUnroll/full-unroll-keep-first-exit.ll
@@ -15,12 +15,12 @@ define void @s32_max1(i32 %n, i32* %p) {
 ; CHECK-NEXT:    [[INC:%.*]] = add i32 [[N]], 1
 ; CHECK-NEXT:    [[CMP:%.*]] = icmp slt i32 [[N]], [[ADD]]
 ; CHECK-NEXT:    br i1 [[CMP]], label [[DO_BODY_1:%.*]], label [[DO_END:%.*]]
-; CHECK:       do.end:
-; CHECK-NEXT:    ret void
 ; CHECK:       do.body.1:
 ; CHECK-NEXT:    [[ARRAYIDX_1:%.*]] = getelementptr i32, i32* [[P]], i32 [[INC]]
 ; CHECK-NEXT:    store i32 [[INC]], i32* [[ARRAYIDX_1]], align 4
 ; CHECK-NEXT:    br label [[DO_END]]
+; CHECK:       do.end:
+; CHECK-NEXT:    ret void
 ;
 entry:
   %add = add i32 %n, 1
@@ -51,8 +51,6 @@ define void @s32_max2(i32 %n, i32* %p) {
 ; CHECK-NEXT:    [[INC:%.*]] = add i32 [[N]], 1
 ; CHECK-NEXT:    [[CMP:%.*]] = icmp slt i32 [[N]], [[ADD]]
 ; CHECK-NEXT:    br i1 [[CMP]], label [[DO_BODY_1:%.*]], label [[DO_END:%.*]]
-; CHECK:       do.end:
-; CHECK-NEXT:    ret void
 ; CHECK:       do.body.1:
 ; CHECK-NEXT:    [[ARRAYIDX_1:%.*]] = getelementptr i32, i32* [[P]], i32 [[INC]]
 ; CHECK-NEXT:    store i32 [[INC]], i32* [[ARRAYIDX_1]], align 4
@@ -60,6 +58,8 @@ define void @s32_max2(i32 %n, i32* %p) {
 ; CHECK-NEXT:    [[ARRAYIDX_2:%.*]] = getelementptr i32, i32* [[P]], i32 [[INC_1]]
 ; CHECK-NEXT:    store i32 [[INC_1]], i32* [[ARRAYIDX_2]], align 4
 ; CHECK-NEXT:    br label [[DO_END]]
+; CHECK:       do.end:
+; CHECK-NEXT:    ret void
 ;
 entry:
   %add = add i32 %n, 2
@@ -163,12 +163,12 @@ define void @u32_max1(i32 %n, i32* %p) {
 ; CHECK-NEXT:    [[INC:%.*]] = add i32 [[N]], 1
 ; CHECK-NEXT:    [[CMP:%.*]] = icmp ult i32 [[N]], [[ADD]]
 ; CHECK-NEXT:    br i1 [[CMP]], label [[DO_BODY_1:%.*]], label [[DO_END:%.*]]
-; CHECK:       do.end:
-; CHECK-NEXT:    ret void
 ; CHECK:       do.body.1:
 ; CHECK-NEXT:    [[ARRAYIDX_1:%.*]] = getelementptr i32, i32* [[P]], i32 [[INC]]
 ; CHECK-NEXT:    store i32 [[INC]], i32* [[ARRAYIDX_1]], align 4
 ; CHECK-NEXT:    br label [[DO_END]]
+; CHECK:       do.end:
+; CHECK-NEXT:    ret void
 ;
 entry:
   %add = add i32 %n, 1
@@ -199,8 +199,6 @@ define void @u32_max2(i32 %n, i32* %p) {
 ; CHECK-NEXT:    [[INC:%.*]] = add i32 [[N]], 1
 ; CHECK-NEXT:    [[CMP:%.*]] = icmp ult i32 [[N]], [[ADD]]
 ; CHECK-NEXT:    br i1 [[CMP]], label [[DO_BODY_1:%.*]], label [[DO_END:%.*]]
-; CHECK:       do.end:
-; CHECK-NEXT:    ret void
 ; CHECK:       do.body.1:
 ; CHECK-NEXT:    [[ARRAYIDX_1:%.*]] = getelementptr i32, i32* [[P]], i32 [[INC]]
 ; CHECK-NEXT:    store i32 [[INC]], i32* [[ARRAYIDX_1]], align 4
@@ -208,6 +206,8 @@ define void @u32_max2(i32 %n, i32* %p) {
 ; CHECK-NEXT:    [[ARRAYIDX_2:%.*]] = getelementptr i32, i32* [[P]], i32 [[INC_1]]
 ; CHECK-NEXT:    store i32 [[INC_1]], i32* [[ARRAYIDX_2]], align 4
 ; CHECK-NEXT:    br label [[DO_END]]
+; CHECK:       do.end:
+; CHECK-NEXT:    ret void
 ;
 entry:
   %add = add i32 %n, 2
diff --git a/llvm/test/Transforms/LoopUnroll/full-unroll-one-unpredictable-exit.ll b/llvm/test/Transforms/LoopUnroll/full-unroll-one-unpredictable-exit.ll
index 095a7c1e1dd1..b7d7e00fa0c9 100644
--- a/llvm/test/Transforms/LoopUnroll/full-unroll-one-unpredictable-exit.ll
+++ b/llvm/test/Transforms/LoopUnroll/full-unroll-one-unpredictable-exit.ll
@@ -34,11 +34,11 @@ define i1 @test_latch() {
 ; CHECK-NEXT:    [[LOAD2_1:%.*]] = load i64, i64* [[GEP2_1]], align 8
 ; CHECK-NEXT:    [[EXITCOND2_1:%.*]] = icmp eq i64 [[LOAD1_1]], [[LOAD2_1]]
 ; CHECK-NEXT:    br i1 [[EXITCOND2_1]], label [[LATCH_1:%.*]], label [[EXIT]]
+; CHECK:       latch.1:
+; CHECK-NEXT:    br label [[EXIT]]
 ; CHECK:       exit:
 ; CHECK-NEXT:    [[EXIT_VAL:%.*]] = phi i1 [ false, [[LOOP]] ], [ false, [[LATCH]] ], [ true, [[LATCH_1]] ]
 ; CHECK-NEXT:    ret i1 [[EXIT_VAL]]
-; CHECK:       latch.1:
-; CHECK-NEXT:    br label [[EXIT]]
 ;
 start:
   %a1 = alloca [2 x i64], align 8
@@ -95,22 +95,22 @@ define i1 @test_non_latch() {
 ; CHECK-NEXT:    [[LOAD2:%.*]] = load i64, i64* [[GEP2]], align 8
 ; CHECK-NEXT:    [[EXITCOND2:%.*]] = icmp eq i64 [[LOAD1]], [[LOAD2]]
 ; CHECK-NEXT:    br i1 [[EXITCOND2]], label [[LOOP_1:%.*]], label [[EXIT:%.*]]
-; CHECK:       exit:
-; CHECK-NEXT:    [[EXIT_VAL:%.*]] = phi i1 [ false, [[LATCH]] ], [ false, [[LATCH_1:%.*]] ], [ true, [[LOOP_2:%.*]] ], [ false, [[LATCH_2:%.*]] ]
-; CHECK-NEXT:    ret i1 [[EXIT_VAL]]
 ; CHECK:       loop.1:
-; CHECK-NEXT:    br label [[LATCH_1]]
+; CHECK-NEXT:    br label [[LATCH_1:%.*]]
 ; CHECK:       latch.1:
 ; CHECK-NEXT:    [[GEP1_1:%.*]] = getelementptr inbounds [2 x i64], [2 x i64]* [[A1]], i64 0, i64 1
 ; CHECK-NEXT:    [[GEP2_1:%.*]] = getelementptr inbounds [2 x i64], [2 x i64]* [[A2]], i64 0, i64 1
 ; CHECK-NEXT:    [[LOAD1_1:%.*]] = load i64, i64* [[GEP1_1]], align 8
 ; CHECK-NEXT:    [[LOAD2_1:%.*]] = load i64, i64* [[GEP2_1]], align 8
 ; CHECK-NEXT:    [[EXITCOND2_1:%.*]] = icmp eq i64 [[LOAD1_1]], [[LOAD2_1]]
-; CHECK-NEXT:    br i1 [[EXITCOND2_1]], label [[LOOP_2]], label [[EXIT]]
+; CHECK-NEXT:    br i1 [[EXITCOND2_1]], label [[LOOP_2:%.*]], label [[EXIT]]
 ; CHECK:       loop.2:
-; CHECK-NEXT:    br i1 true, label [[EXIT]], label [[LATCH_2]]
+; CHECK-NEXT:    br i1 true, label [[EXIT]], label [[LATCH_2:%.*]]
 ; CHECK:       latch.2:
 ; CHECK-NEXT:    br label [[EXIT]]
+; CHECK:       exit:
+; CHECK-NEXT:    [[EXIT_VAL:%.*]] = phi i1 [ false, [[LATCH]] ], [ false, [[LATCH_1]] ], [ true, [[LOOP_2]] ], [ false, [[LATCH_2]] ]
+; CHECK-NEXT:    ret i1 [[EXIT_VAL]]
 ;
 start:
   %a1 = alloca [2 x i64], align 8
diff --git a/llvm/test/Transforms/LoopUnroll/multiple-exits.ll b/llvm/test/Transforms/LoopUnroll/multiple-exits.ll
index 0bea86350b99..9f40f51c10e6 100644
--- a/llvm/test/Transforms/LoopUnroll/multiple-exits.ll
+++ b/llvm/test/Transforms/LoopUnroll/multiple-exits.ll
@@ -14,8 +14,6 @@ define void @test1() {
 ; CHECK-NEXT:    call void @bar()
 ; CHECK-NEXT:    call void @bar()
 ; CHECK-NEXT:    br label [[LATCH_1:%.*]]
-; CHECK:       exit:
-; CHECK-NEXT:    ret void
 ; CHECK:       latch.1:
</cut>
