Successfully identified a regression in *llvm* in CI configuration
tcwg_bmk_llvm_tk1/llvm-release-arm-spec2k6-O3. So far, this commit has
regressed the following CI configurations:
 - tcwg_bmk_llvm_tk1/llvm-release-arm-spec2k6-O3

Culprit:
<cut>
commit a838a4f69f500fc8e39fb4c9a1476f162ccf8423
Author: David Green <david.gr...@arm.com>
Date:   Mon Feb 15 13:17:21 2021 +0000

    [ARM] Extend search for increment in load/store optimizer
    
    Currently the findIncDecAfter will only look at the next instruction for
    post-inc candidates in the load/store optimizer. This extends that to a
    search through the current BB, until an instruction that modifies or
    uses the increment reg is found. This allows more post-inc load/stores
    and ldm/stm's to be created, especially in cases where a schedule might
    move instructions further apart.
    
    We make sure not to look any further for an SP, as that might invalidate
    stack slots that are still in use.
    
    Differential Revision: https://reviews.llvm.org/D95881
</cut>
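The commit message above describes the change in behaviour: instead of only checking the instruction immediately after a load/store for a mergeable increment, the optimizer now scans forward through the basic block until it hits an unrelated use or def of the base register, with SP still restricted to the very next instruction. The following is a minimal Python model of that search, written for illustration only — `Instr`, its fields, and `find_inc_dec_after` are invented here and are not the LLVM C++ implementation:

```python
# Illustrative model of the extended findIncDecAfter search described in the
# culprit commit message. All names and data structures here are assumptions
# made for this sketch; the real code lives in ARMLoadStoreOptimizer.cpp.
from dataclasses import dataclass, field

SP = "sp"  # stack-pointer register name (assumption for this sketch)

@dataclass
class Instr:
    opcode: str
    defs: set = field(default_factory=set)   # registers written
    uses: set = field(default_factory=set)   # registers read
    inc_of: str = ""                         # register incremented, if any
    offset: int = 0                          # increment amount in bytes

def find_inc_dec_after(instrs, start, reg):
    """Search forward from instrs[start + 1] for an increment of `reg`.

    Old behaviour: only the very next (non-debug) instruction was checked.
    New behaviour: keep scanning the block until `reg` is read or written by
    an unrelated instruction. SP is the exception: looking past the next
    instruction could invalidate stack slots that are still in use, so any
    non-increment instruction ends the search for it.
    Returns (index, offset) on success, (None, 0) otherwise.
    """
    i = start + 1
    while i < len(instrs):
        ins = instrs[i]
        if ins.opcode == "dbg":       # skip debug instructions
            i += 1
            continue
        if ins.inc_of == reg:
            return i, ins.offset       # found a mergeable increment
        if reg == SP or reg in ins.uses or reg in ins.defs:
            return None, 0             # unrelated use/def blocks the merge
        i += 1
    return None, 0
```

For a general-purpose base register, an intervening instruction that does not touch the base no longer blocks the merge — which is exactly the pattern in the regressed sphinx3 loop kernels, where scheduling had moved the `adds` away from the load/store it feeds.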

Results regressed to (for first_bad == a838a4f69f500fc8e39fb4c9a1476f162ccf8423)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--with-mode=arm --set gcc_override_configure=--disable-libsanitizer:
-8
# build_abe linux:
-7
# build_abe glibc:
-6
# build_abe stage2 -- --set gcc_override_configure=--with-mode=arm --set gcc_override_configure=--disable-libsanitizer:
-5
# build_llvm true:
-3
# true:
0
# benchmark -- -O3_marm artifacts/build-a838a4f69f500fc8e39fb4c9a1476f162ccf8423/results_id:
1
# 482.sphinx3,sphinx_livepretend_base.default                   regressed by 104
# 482.sphinx3,[.] vector_gautbl_eval_logs3                      regressed by 115

from (for last_good == 20e3a6cb6270b68139f74529ab8efdfad1263533)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--with-mode=arm --set gcc_override_configure=--disable-libsanitizer:
-8
# build_abe linux:
-7
# build_abe glibc:
-6
# build_abe stage2 -- --set gcc_override_configure=--with-mode=arm --set gcc_override_configure=--disable-libsanitizer:
-5
# build_llvm true:
-3
# true:
0
# benchmark -- -O3_marm artifacts/build-20e3a6cb6270b68139f74529ab8efdfad1263533/results_id:
1

Artifacts of last_good build: 
https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release-arm-spec2k6-O3/10/artifact/artifacts/build-20e3a6cb6270b68139f74529ab8efdfad1263533/
Results ID of last_good: 
tk1_32/tcwg_bmk_llvm_tk1/bisect-llvm-release-arm-spec2k6-O3/4025
Artifacts of first_bad build: 
https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release-arm-spec2k6-O3/10/artifact/artifacts/build-a838a4f69f500fc8e39fb4c9a1476f162ccf8423/
Results ID of first_bad: 
tk1_32/tcwg_bmk_llvm_tk1/bisect-llvm-release-arm-spec2k6-O3/4022
Build top page/logs: 
https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release-arm-spec2k6-O3/10/

Configuration details:


Reproduce builds:
<cut>
mkdir investigate-llvm-a838a4f69f500fc8e39fb4c9a1476f162ccf8423
cd investigate-llvm-a838a4f69f500fc8e39fb4c9a1476f162ccf8423

git clone https://git.linaro.org/toolchain/jenkins-scripts

mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release-arm-spec2k6-O3/10/artifact/artifacts/manifests/build-baseline.sh --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release-arm-spec2k6-O3/10/artifact/artifacts/manifests/build-parameters.sh --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release-arm-spec2k6-O3/10/artifact/artifacts/test.sh --fail
chmod +x artifacts/test.sh

# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh

# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/

cd llvm

# Reproduce first_bad build
git checkout --detach a838a4f69f500fc8e39fb4c9a1476f162ccf8423
../artifacts/test.sh

# Reproduce last_good build
git checkout --detach 20e3a6cb6270b68139f74529ab8efdfad1263533
../artifacts/test.sh

cd ..
</cut>

History of pending regressions and results: 
https://git.linaro.org/toolchain/ci/base-artifacts.git/log/?h=linaro-local/ci/tcwg_bmk_llvm_tk1/llvm-release-arm-spec2k6-O3

Artifacts: 
https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release-arm-spec2k6-O3/10/artifact/artifacts/
Build log: 
https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release-arm-spec2k6-O3/10/consoleText

Full commit (up to 1000 lines):
<cut>
commit a838a4f69f500fc8e39fb4c9a1476f162ccf8423
Author: David Green <david.gr...@arm.com>
Date:   Mon Feb 15 13:17:21 2021 +0000

    [ARM] Extend search for increment in load/store optimizer
    
    Currently the findIncDecAfter will only look at the next instruction for
    post-inc candidates in the load/store optimizer. This extends that to a
    search through the current BB, until an instruction that modifies or
    uses the increment reg is found. This allows more post-inc load/stores
    and ldm/stm's to be created, especially in cases where a schedule might
    move instructions further apart.
    
    We make sure not to look any further for an SP, as that might invalidate
    stack slots that are still in use.
    
    Differential Revision: https://reviews.llvm.org/D95881
---
 llvm/lib/Target/ARM/ARMLoadStoreOptimizer.cpp      | 40 +++++++++++++------
 llvm/test/CodeGen/ARM/indexed-mem.ll               |  6 +--
 .../Thumb2/LowOverheadLoops/fast-fp-loops.ll       |  9 ++---
 .../Thumb2/LowOverheadLoops/mve-float-loops.ll     | 45 ++++++++--------------
 llvm/test/CodeGen/Thumb2/mve-float32regloops.ll    |  6 +--
 llvm/test/CodeGen/Thumb2/mve-postinc-distribute.ll |  9 ++---
 llvm/test/CodeGen/Thumb2/mve-postinc-lsr.ll        |  9 ++---
 llvm/test/CodeGen/Thumb2/mve-satmul-loops.ll       | 18 +++------
 llvm/test/CodeGen/Thumb2/mve-vecreduce-loops.ll    |  6 +--
 llvm/test/CodeGen/Thumb2/mve-vldshuffle.ll         |  3 +-
 10 files changed, 66 insertions(+), 85 deletions(-)

diff --git a/llvm/lib/Target/ARM/ARMLoadStoreOptimizer.cpp b/llvm/lib/Target/ARM/ARMLoadStoreOptimizer.cpp
index aa1fe4e4ffda..5fe61809f31b 100644
--- a/llvm/lib/Target/ARM/ARMLoadStoreOptimizer.cpp
+++ b/llvm/lib/Target/ARM/ARMLoadStoreOptimizer.cpp
@@ -1238,19 +1238,37 @@ findIncDecBefore(MachineBasicBlock::iterator MBBI, Register Reg,
 /// Searches for a increment or decrement of \p Reg after \p MBBI.
 static MachineBasicBlock::iterator
 findIncDecAfter(MachineBasicBlock::iterator MBBI, Register Reg,
-                ARMCC::CondCodes Pred, Register PredReg, int &Offset) {
+                ARMCC::CondCodes Pred, Register PredReg, int &Offset,
+                const TargetRegisterInfo *TRI) {
   Offset = 0;
   MachineBasicBlock &MBB = *MBBI->getParent();
   MachineBasicBlock::iterator EndMBBI = MBB.end();
   MachineBasicBlock::iterator NextMBBI = std::next(MBBI);
-  // Skip debug values.
-  while (NextMBBI != EndMBBI && NextMBBI->isDebugInstr())
-    ++NextMBBI;
-  if (NextMBBI == EndMBBI)
-    return EndMBBI;
+  while (NextMBBI != EndMBBI) {
+    // Skip debug values.
+    while (NextMBBI != EndMBBI && NextMBBI->isDebugInstr())
+      ++NextMBBI;
+    if (NextMBBI == EndMBBI)
+      return EndMBBI;
+
+    unsigned Off = isIncrementOrDecrement(*NextMBBI, Reg, Pred, PredReg);
+    if (Off) {
+      Offset = Off;
+      return NextMBBI;
+    }
 
-  Offset = isIncrementOrDecrement(*NextMBBI, Reg, Pred, PredReg);
-  return Offset == 0 ? EndMBBI : NextMBBI;
+    // SP can only be combined if it is the next instruction after the original
+    // MBBI, otherwise we may be incrementing the stack pointer (invalidating
+    // anything below the new pointer) when its frame elements are still in
+    // use. Other registers can attempt to look further, until a different use
+    // or def of the register is found.
+    if (Reg == ARM::SP || NextMBBI->readsRegister(Reg, TRI) ||
+        NextMBBI->definesRegister(Reg, TRI))
+      return EndMBBI;
+
+    ++NextMBBI;
+  }
+  return EndMBBI;
 }
 
 /// Fold proceeding/trailing inc/dec of base register into the
@@ -1296,7 +1314,7 @@ bool ARMLoadStoreOpt::MergeBaseUpdateLSMultiple(MachineInstr *MI) {
   } else if (Mode == ARM_AM::ib && Offset == -Bytes) {
     Mode = ARM_AM::da;
   } else {
-    MergeInstr = findIncDecAfter(MBBI, Base, Pred, PredReg, Offset);
+    MergeInstr = findIncDecAfter(MBBI, Base, Pred, PredReg, Offset, TRI);
     if (((Mode != ARM_AM::ia && Mode != ARM_AM::ib) || Offset != Bytes) &&
         ((Mode != ARM_AM::da && Mode != ARM_AM::db) || Offset != -Bytes)) {
 
@@ -1483,7 +1501,7 @@ bool ARMLoadStoreOpt::MergeBaseUpdateLoadStore(MachineInstr *MI) {
   } else if (Offset == -Bytes) {
     NewOpc = getPreIndexedLoadStoreOpcode(Opcode, ARM_AM::sub);
   } else {
-    MergeInstr = findIncDecAfter(MBBI, Base, Pred, PredReg, Offset);
+    MergeInstr = findIncDecAfter(MBBI, Base, Pred, PredReg, Offset, TRI);
     if (Offset == Bytes) {
       NewOpc = getPostIndexedLoadStoreOpcode(Opcode, ARM_AM::add);
     } else if (!isAM5 && Offset == -Bytes) {
@@ -1614,7 +1632,7 @@ bool ARMLoadStoreOpt::MergeBaseUpdateLSDouble(MachineInstr &MI) const {
   if (Offset == 8 || Offset == -8) {
     NewOpc = Opcode == ARM::t2LDRDi8 ? ARM::t2LDRD_PRE : ARM::t2STRD_PRE;
   } else {
-    MergeInstr = findIncDecAfter(MBBI, Base, Pred, PredReg, Offset);
+    MergeInstr = findIncDecAfter(MBBI, Base, Pred, PredReg, Offset, TRI);
     if (Offset == 8 || Offset == -8) {
       NewOpc = Opcode == ARM::t2LDRDi8 ? ARM::t2LDRD_POST : ARM::t2STRD_POST;
     } else
diff --git a/llvm/test/CodeGen/ARM/indexed-mem.ll b/llvm/test/CodeGen/ARM/indexed-mem.ll
index a5f8409a50a2..295bb377d732 100644
--- a/llvm/test/CodeGen/ARM/indexed-mem.ll
+++ b/llvm/test/CodeGen/ARM/indexed-mem.ll
@@ -220,16 +220,14 @@ define i32* @pre_dec_ldrd(i32* %base) {
 define i32* @post_inc_ldrd(i32* %base, i32* %addr.3) {
 ; CHECK-V8M-LABEL: post_inc_ldrd:
 ; CHECK-V8M:       @ %bb.0:
-; CHECK-V8M-NEXT:    ldrd r2, r3, [r0]
-; CHECK-V8M-NEXT:    adds r0, #8
+; CHECK-V8M-NEXT:    ldrd r2, r3, [r0], #8
 ; CHECK-V8M-NEXT:    add r2, r3
 ; CHECK-V8M-NEXT:    str r2, [r1]
 ; CHECK-V8M-NEXT:    bx lr
 ;
 ; CHECK-V8A-LABEL: post_inc_ldrd:
 ; CHECK-V8A:       @ %bb.0:
-; CHECK-V8A-NEXT:    ldm r0, {r2, r3}
-; CHECK-V8A-NEXT:    add r0, r0, #8
+; CHECK-V8A-NEXT:    ldm r0!, {r2, r3}
 ; CHECK-V8A-NEXT:    add r2, r2, r3
 ; CHECK-V8A-NEXT:    str r2, [r1]
 ; CHECK-V8A-NEXT:    bx lr
diff --git a/llvm/test/CodeGen/Thumb2/LowOverheadLoops/fast-fp-loops.ll b/llvm/test/CodeGen/Thumb2/LowOverheadLoops/fast-fp-loops.ll
index f8fb8476c322..8b27a9348418 100644
--- a/llvm/test/CodeGen/Thumb2/LowOverheadLoops/fast-fp-loops.ll
+++ b/llvm/test/CodeGen/Thumb2/LowOverheadLoops/fast-fp-loops.ll
@@ -82,13 +82,10 @@ define arm_aapcs_vfpcc void @fast_float_mul(float* nocapture %a, float* nocaptur
 ; CHECK-NEXT:    add.w r0, r0, r3, lsl #2
 ; CHECK-NEXT:  .LBB0_10: @ %for.body.epil
 ; CHECK-NEXT:    @ =>This Inner Loop Header: Depth=1
-; CHECK-NEXT:    vldr s0, [r1]
-; CHECK-NEXT:    adds r1, #4
-; CHECK-NEXT:    vldr s2, [r2]
-; CHECK-NEXT:    adds r2, #4
+; CHECK-NEXT:    vldmia r1!, {s0}
+; CHECK-NEXT:    vldmia r2!, {s2}
 ; CHECK-NEXT:    vmul.f32 s0, s2, s0
-; CHECK-NEXT:    vstr s0, [r0]
-; CHECK-NEXT:    adds r0, #4
+; CHECK-NEXT:    vstmia r0!, {s0}
 ; CHECK-NEXT:    le lr, .LBB0_10
 ; CHECK-NEXT:  .LBB0_11: @ %for.cond.cleanup
 ; CHECK-NEXT:    pop {r4, r5, r6, r7, pc}
diff --git a/llvm/test/CodeGen/Thumb2/LowOverheadLoops/mve-float-loops.ll b/llvm/test/CodeGen/Thumb2/LowOverheadLoops/mve-float-loops.ll
index f962458ddb11..d143976927b2 100644
--- a/llvm/test/CodeGen/Thumb2/LowOverheadLoops/mve-float-loops.ll
+++ b/llvm/test/CodeGen/Thumb2/LowOverheadLoops/mve-float-loops.ll
@@ -43,14 +43,11 @@ define arm_aapcs_vfpcc void @float_float_mul(float* nocapture readonly %a, float
 ; CHECK-NEXT:    add.w r7, r2, r12, lsl #2
 ; CHECK-NEXT:  .LBB0_6: @ %for.body.prol
 ; CHECK-NEXT:    @ =>This Inner Loop Header: Depth=1
-; CHECK-NEXT:    vldr s0, [r6]
-; CHECK-NEXT:    adds r6, #4
-; CHECK-NEXT:    vldr s2, [r5]
-; CHECK-NEXT:    adds r5, #4
+; CHECK-NEXT:    vldmia r6!, {s0}
 ; CHECK-NEXT:    add.w r12, r12, #1
+; CHECK-NEXT:    vldmia r5!, {s2}
 ; CHECK-NEXT:    vmul.f32 s0, s2, s0
-; CHECK-NEXT:    vstr s0, [r7]
-; CHECK-NEXT:    adds r7, #4
+; CHECK-NEXT:    vstmia r7!, {s0}
 ; CHECK-NEXT:    le lr, .LBB0_6
 ; CHECK-NEXT:  .LBB0_7: @ %for.body.prol.loopexit
 ; CHECK-NEXT:    cmp r4, #3
@@ -261,14 +258,11 @@ define arm_aapcs_vfpcc void @float_float_add(float* nocapture readonly %a, float
 ; CHECK-NEXT:    add.w r7, r2, r12, lsl #2
 ; CHECK-NEXT:  .LBB1_6: @ %for.body.prol
 ; CHECK-NEXT:    @ =>This Inner Loop Header: Depth=1
-; CHECK-NEXT:    vldr s0, [r6]
-; CHECK-NEXT:    adds r6, #4
-; CHECK-NEXT:    vldr s2, [r5]
-; CHECK-NEXT:    adds r5, #4
+; CHECK-NEXT:    vldmia r6!, {s0}
 ; CHECK-NEXT:    add.w r12, r12, #1
+; CHECK-NEXT:    vldmia r5!, {s2}
 ; CHECK-NEXT:    vadd.f32 s0, s2, s0
-; CHECK-NEXT:    vstr s0, [r7]
-; CHECK-NEXT:    adds r7, #4
+; CHECK-NEXT:    vstmia r7!, {s0}
 ; CHECK-NEXT:    le lr, .LBB1_6
 ; CHECK-NEXT:  .LBB1_7: @ %for.body.prol.loopexit
 ; CHECK-NEXT:    cmp r4, #3
@@ -479,14 +473,11 @@ define arm_aapcs_vfpcc void @float_float_sub(float* nocapture readonly %a, float
 ; CHECK-NEXT:    add.w r7, r2, r12, lsl #2
 ; CHECK-NEXT:  .LBB2_6: @ %for.body.prol
 ; CHECK-NEXT:    @ =>This Inner Loop Header: Depth=1
-; CHECK-NEXT:    vldr s0, [r6]
-; CHECK-NEXT:    adds r6, #4
-; CHECK-NEXT:    vldr s2, [r5]
-; CHECK-NEXT:    adds r5, #4
+; CHECK-NEXT:    vldmia r6!, {s0}
 ; CHECK-NEXT:    add.w r12, r12, #1
+; CHECK-NEXT:    vldmia r5!, {s2}
 ; CHECK-NEXT:    vsub.f32 s0, s2, s0
-; CHECK-NEXT:    vstr s0, [r7]
-; CHECK-NEXT:    adds r7, #4
+; CHECK-NEXT:    vstmia r7!, {s0}
 ; CHECK-NEXT:    le lr, .LBB2_6
 ; CHECK-NEXT:  .LBB2_7: @ %for.body.prol.loopexit
 ; CHECK-NEXT:    cmp r4, #3
@@ -706,13 +697,11 @@ define arm_aapcs_vfpcc void @float_int_mul(float* nocapture readonly %a, i32* no
 ; CHECK-NEXT:    @ =>This Inner Loop Header: Depth=1
 ; CHECK-NEXT:    ldr r4, [r6], #4
 ; CHECK-NEXT:    add.w r12, r12, #1
-; CHECK-NEXT:    vldr s2, [r5]
-; CHECK-NEXT:    adds r5, #4
+; CHECK-NEXT:    vldmia r5!, {s2}
 ; CHECK-NEXT:    vmov s0, r4
 ; CHECK-NEXT:    vcvt.f32.s32 s0, s0
 ; CHECK-NEXT:    vmul.f32 s0, s2, s0
-; CHECK-NEXT:    vstr s0, [r7]
-; CHECK-NEXT:    adds r7, #4
+; CHECK-NEXT:    vstmia r7!, {s0}
 ; CHECK-NEXT:    le lr, .LBB3_9
 ; CHECK-NEXT:  .LBB3_10: @ %for.body.prol.loopexit
 ; CHECK-NEXT:    cmp.w r8, #3
@@ -1025,8 +1014,7 @@ define arm_aapcs_vfpcc void @half_half_mul(half* nocapture readonly %a, half* no
 ; CHECK-NEXT:    adds r1, #2
 ; CHECK-NEXT:    vmul.f16 s0, s2, s0
 ; CHECK-NEXT:    vcvtb.f32.f16 s0, s0
-; CHECK-NEXT:    vstr s0, [r2]
-; CHECK-NEXT:    adds r2, #4
+; CHECK-NEXT:    vstmia r2!, {s0}
 ; CHECK-NEXT:    le lr, .LBB5_7
 ; CHECK-NEXT:  .LBB5_8: @ %for.cond.cleanup
 ; CHECK-NEXT:    pop.w {r4, r5, r6, r7, r8, r9, r10, pc}
@@ -1140,8 +1128,7 @@ define arm_aapcs_vfpcc void @half_half_add(half* nocapture readonly %a, half* no
 ; CHECK-NEXT:    adds r1, #2
 ; CHECK-NEXT:    vadd.f16 s0, s2, s0
 ; CHECK-NEXT:    vcvtb.f32.f16 s0, s0
-; CHECK-NEXT:    vstr s0, [r2]
-; CHECK-NEXT:    adds r2, #4
+; CHECK-NEXT:    vstmia r2!, {s0}
 ; CHECK-NEXT:    le lr, .LBB6_7
 ; CHECK-NEXT:  .LBB6_8: @ %for.cond.cleanup
 ; CHECK-NEXT:    pop.w {r4, r5, r6, r7, r8, r9, r10, pc}
@@ -1255,8 +1242,7 @@ define arm_aapcs_vfpcc void @half_half_sub(half* nocapture readonly %a, half* no
 ; CHECK-NEXT:    adds r1, #2
 ; CHECK-NEXT:    vsub.f16 s0, s2, s0
 ; CHECK-NEXT:    vcvtb.f32.f16 s0, s0
-; CHECK-NEXT:    vstr s0, [r2]
-; CHECK-NEXT:    adds r2, #4
+; CHECK-NEXT:    vstmia r2!, {s0}
 ; CHECK-NEXT:    le lr, .LBB7_7
 ; CHECK-NEXT:  .LBB7_8: @ %for.cond.cleanup
 ; CHECK-NEXT:    pop.w {r4, r5, r6, r7, r8, r9, r10, pc}
@@ -1376,8 +1362,7 @@ define arm_aapcs_vfpcc void @half_short_mul(half* nocapture readonly %a, i16* no
 ; CHECK-NEXT:    vcvt.f16.s32 s2, s2
 ; CHECK-NEXT:    vmul.f16 s0, s0, s2
 ; CHECK-NEXT:    vcvtb.f32.f16 s0, s0
-; CHECK-NEXT:    vstr s0, [r2]
-; CHECK-NEXT:    adds r2, #4
+; CHECK-NEXT:    vstmia r2!, {s0}
 ; CHECK-NEXT:    le lr, .LBB8_7
 ; CHECK-NEXT:  .LBB8_8: @ %for.cond.cleanup
 ; CHECK-NEXT:    pop.w {r4, r5, r6, r7, r8, r9, pc}
diff --git a/llvm/test/CodeGen/Thumb2/mve-float32regloops.ll b/llvm/test/CodeGen/Thumb2/mve-float32regloops.ll
index 0156cfe25f8e..7e4603e4b4c6 100644
--- a/llvm/test/CodeGen/Thumb2/mve-float32regloops.ll
+++ b/llvm/test/CodeGen/Thumb2/mve-float32regloops.ll
@@ -1442,8 +1442,7 @@ define arm_aapcs_vfpcc void @arm_biquad_cascade_stereo_df2T_f32(%struct.arm_biqu
 ; CHECK-NEXT:    adds r1, #8
 ; CHECK-NEXT:    vfma.f32 q5, q4, r5
 ; CHECK-NEXT:    vfma.f32 q3, q5, q2
-; CHECK-NEXT:    vstmia r7, {s20, s21}
-; CHECK-NEXT:    adds r7, #8
+; CHECK-NEXT:    vstmia r7!, {s20, s21}
 ; CHECK-NEXT:    vfma.f32 q3, q4, q1
 ; CHECK-NEXT:    vstrw.32 q3, [r4]
 ; CHECK-NEXT:    le lr, .LBB17_3
@@ -2069,8 +2068,7 @@ define void @arm_biquad_cascade_df2T_f32(%struct.arm_biquad_cascade_df2T_instanc
 ; CHECK-NEXT:  .LBB20_5: @ %while.body
 ; CHECK-NEXT:    @ Parent Loop BB20_3 Depth=1
 ; CHECK-NEXT:    @ => This Inner Loop Header: Depth=2
-; CHECK-NEXT:    ldrd r7, r4, [r1]
-; CHECK-NEXT:    adds r1, #8
+; CHECK-NEXT:    ldrd r7, r4, [r1], #8
 ; CHECK-NEXT:    vfma.f32 q6, q3, r7
 ; CHECK-NEXT:    vmov r7, s24
 ; CHECK-NEXT:    vmov q1, q6
diff --git a/llvm/test/CodeGen/Thumb2/mve-postinc-distribute.ll b/llvm/test/CodeGen/Thumb2/mve-postinc-distribute.ll
index b34896e32859..0eb1226f60db 100644
--- a/llvm/test/CodeGen/Thumb2/mve-postinc-distribute.ll
+++ b/llvm/test/CodeGen/Thumb2/mve-postinc-distribute.ll
@@ -309,14 +309,11 @@ define void @fma8(float* noalias nocapture readonly %A, float* noalias nocapture
 ; CHECK-NEXT:    add.w r2, r2, r12, lsl #2
 ; CHECK-NEXT:  .LBB2_7: @ %for.body
 ; CHECK-NEXT:    @ =>This Inner Loop Header: Depth=1
-; CHECK-NEXT:    vldr s0, [r0]
-; CHECK-NEXT:    adds r0, #4
-; CHECK-NEXT:    vldr s2, [r1]
-; CHECK-NEXT:    adds r1, #4
+; CHECK-NEXT:    vldmia r0!, {s0}
+; CHECK-NEXT:    vldmia r1!, {s2}
 ; CHECK-NEXT:    vldr s4, [r2]
 ; CHECK-NEXT:    vfma.f32 s4, s2, s0
-; CHECK-NEXT:    vstr s4, [r2]
-; CHECK-NEXT:    adds r2, #4
+; CHECK-NEXT:    vstmia r2!, {s4}
 ; CHECK-NEXT:    le lr, .LBB2_7
 ; CHECK-NEXT:  .LBB2_8: @ %for.cond.cleanup
 ; CHECK-NEXT:    pop {r4, r5, r6, pc}
diff --git a/llvm/test/CodeGen/Thumb2/mve-postinc-lsr.ll b/llvm/test/CodeGen/Thumb2/mve-postinc-lsr.ll
index 070d9b744836..1b6cdfc517be 100644
--- a/llvm/test/CodeGen/Thumb2/mve-postinc-lsr.ll
+++ b/llvm/test/CodeGen/Thumb2/mve-postinc-lsr.ll
@@ -44,14 +44,11 @@ define void @fma(float* noalias nocapture readonly %A, float* noalias nocapture
 ; CHECK-NEXT:    add.w r2, r2, r12, lsl #2
 ; CHECK-NEXT:  .LBB0_7: @ %for.body
 ; CHECK-NEXT:    @ =>This Inner Loop Header: Depth=1
-; CHECK-NEXT:    vldr s0, [r0]
-; CHECK-NEXT:    adds r0, #4
-; CHECK-NEXT:    vldr s2, [r1]
-; CHECK-NEXT:    adds r1, #4
+; CHECK-NEXT:    vldmia r0!, {s0}
+; CHECK-NEXT:    vldmia r1!, {s2}
 ; CHECK-NEXT:    vldr s4, [r2]
 ; CHECK-NEXT:    vfma.f32 s4, s2, s0
-; CHECK-NEXT:    vstr s4, [r2]
-; CHECK-NEXT:    adds r2, #4
+; CHECK-NEXT:    vstmia r2!, {s4}
 ; CHECK-NEXT:    le lr, .LBB0_7
 ; CHECK-NEXT:  .LBB0_8: @ %for.cond.cleanup
 ; CHECK-NEXT:    pop {r4, r5, r6, pc}
diff --git a/llvm/test/CodeGen/Thumb2/mve-satmul-loops.ll b/llvm/test/CodeGen/Thumb2/mve-satmul-loops.ll
index 93a1535a42fe..f69eeb773a9f 100644
--- a/llvm/test/CodeGen/Thumb2/mve-satmul-loops.ll
+++ b/llvm/test/CodeGen/Thumb2/mve-satmul-loops.ll
@@ -38,12 +38,10 @@ define arm_aapcs_vfpcc void @ssatmul_s_q31(i32* nocapture readonly %pSrcA, i32*
 ; CHECK-NEXT:    vmvn.i32 q1, #0x80000000
 ; CHECK-NEXT:  .LBB0_4: @ %vector.body
 ; CHECK-NEXT:    @ =>This Inner Loop Header: Depth=1
-; CHECK-NEXT:    ldrd r5, r4, [r0]
+; CHECK-NEXT:    ldrd r5, r4, [r0], #8
 ; CHECK-NEXT:    mov.w r3, #-1
-; CHECK-NEXT:    ldrd r8, r7, [r1]
-; CHECK-NEXT:    adds r0, #8
+; CHECK-NEXT:    ldrd r8, r7, [r1], #8
 ; CHECK-NEXT:    smull r4, r7, r7, r4
-; CHECK-NEXT:    adds r1, #8
 ; CHECK-NEXT:    asrl r4, r7, #31
 ; CHECK-NEXT:    smull r6, r5, r8, r5
 ; CHECK-NEXT:    rsbs.w r9, r4, #-2147483648
@@ -95,8 +93,7 @@ define arm_aapcs_vfpcc void @ssatmul_s_q31(i32* nocapture readonly %pSrcA, i32*
 ; CHECK-NEXT:    vorr q2, q2, q4
 ; CHECK-NEXT:    vmov r3, s10
 ; CHECK-NEXT:    vmov r4, s8
-; CHECK-NEXT:    strd r4, r3, [r2]
-; CHECK-NEXT:    adds r2, #8
+; CHECK-NEXT:    strd r4, r3, [r2], #8
 ; CHECK-NEXT:    le lr, .LBB0_4
 ; CHECK-NEXT:  @ %bb.5: @ %middle.block
 ; CHECK-NEXT:    ldrd r7, r3, [sp] @ 8-byte Folded Reload
@@ -744,10 +741,8 @@ define arm_aapcs_vfpcc void @usatmul_2_q31(i32* nocapture readonly %pSrcA, i32*
 ; CHECK-NEXT:    add.w r12, r0, r5, lsl #2
 ; CHECK-NEXT:  .LBB3_4: @ %vector.body
 ; CHECK-NEXT:    @ =>This Inner Loop Header: Depth=1
-; CHECK-NEXT:    ldrd r4, r7, [r0]
-; CHECK-NEXT:    adds r0, #8
-; CHECK-NEXT:    ldrd r5, r10, [r1]
-; CHECK-NEXT:    adds r1, #8
+; CHECK-NEXT:    ldrd r4, r7, [r0], #8
+; CHECK-NEXT:    ldrd r5, r10, [r1], #8
 ; CHECK-NEXT:    umull r4, r5, r5, r4
 ; CHECK-NEXT:    lsrl r4, r5, #31
 ; CHECK-NEXT:    subs.w r6, r4, #-1
@@ -773,8 +768,7 @@ define arm_aapcs_vfpcc void @usatmul_2_q31(i32* nocapture readonly %pSrcA, i32*
 ; CHECK-NEXT:    vorn q0, q1, q0
 ; CHECK-NEXT:    vmov r4, s2
 ; CHECK-NEXT:    vmov r5, s0
-; CHECK-NEXT:    strd r5, r4, [r2]
-; CHECK-NEXT:    adds r2, #8
+; CHECK-NEXT:    strd r5, r4, [r2], #8
 ; CHECK-NEXT:    le lr, .LBB3_4
 ; CHECK-NEXT:  @ %bb.5: @ %middle.block
 ; CHECK-NEXT:    ldr r7, [sp] @ 4-byte Reload
diff --git a/llvm/test/CodeGen/Thumb2/mve-vecreduce-loops.ll b/llvm/test/CodeGen/Thumb2/mve-vecreduce-loops.ll
index 803f20571672..4393e4646bab 100644
--- a/llvm/test/CodeGen/Thumb2/mve-vecreduce-loops.ll
+++ b/llvm/test/CodeGen/Thumb2/mve-vecreduce-loops.ll
@@ -521,8 +521,7 @@ define float @fadd_f32(float* nocapture readonly %x, i32 %n) {
 ; CHECK-NEXT:    add.w r0, r0, r2, lsl #2
 ; CHECK-NEXT:  .LBB5_8: @ %for.body
 ; CHECK-NEXT:    @ =>This Inner Loop Header: Depth=1
-; CHECK-NEXT:    vldr s2, [r0]
-; CHECK-NEXT:    adds r0, #4
+; CHECK-NEXT:    vldmia r0!, {s2}
 ; CHECK-NEXT:    vadd.f32 s0, s2, s0
 ; CHECK-NEXT:    le lr, .LBB5_8
 ; CHECK-NEXT:  .LBB5_9: @ %for.cond.cleanup
@@ -620,8 +619,7 @@ define float @fmul_f32(float* nocapture readonly %x, i32 %n) {
 ; CHECK-NEXT:    add.w r0, r0, r2, lsl #2
 ; CHECK-NEXT:  .LBB6_8: @ %for.body
 ; CHECK-NEXT:    @ =>This Inner Loop Header: Depth=1
-; CHECK-NEXT:    vldr s2, [r0]
-; CHECK-NEXT:    adds r0, #4
+; CHECK-NEXT:    vldmia r0!, {s2}
 ; CHECK-NEXT:    vmul.f32 s0, s2, s0
 ; CHECK-NEXT:    le lr, .LBB6_8
 ; CHECK-NEXT:  .LBB6_9: @ %for.cond.cleanup
diff --git a/llvm/test/CodeGen/Thumb2/mve-vldshuffle.ll b/llvm/test/CodeGen/Thumb2/mve-vldshuffle.ll
index 80e65f1ee855..d26757fc99e8 100644
--- a/llvm/test/CodeGen/Thumb2/mve-vldshuffle.ll
+++ b/llvm/test/CodeGen/Thumb2/mve-vldshuffle.ll
@@ -176,8 +176,7 @@ define void @arm_cmplx_mag_squared_f32(float* nocapture readonly %pSrc, float* n
 ; CHECK-NEXT:    adds r3, #8
 ; CHECK-NEXT:    vmul.f32 s0, s0, s0
 ; CHECK-NEXT:    vfma.f32 s0, s2, s2
-; CHECK-NEXT:    vstr s0, [r12]
-; CHECK-NEXT:    add.w r12, r12, #4
+; CHECK-NEXT:    vstmia r12!, {s0}
 ; CHECK-NEXT:    le lr, .LBB1_7
 ; CHECK-NEXT:  .LBB1_8: @ %while.end
 ; CHECK-NEXT:    pop {r4, r5, r7, pc}
</cut>
_______________________________________________
linaro-toolchain mailing list
linaro-toolchain@lists.linaro.org
https://lists.linaro.org/mailman/listinfo/linaro-toolchain