[TCWG CI] 453.povray slowed down by 3% after llvm: [llvm] Use range-based for loops (NFC)

2021-12-04 Thread ci_notify
After llvm commit f240e528cea25fd2a9ae01b1e1fe77f507ed7a2c
Author: Kazu Hirata 

[llvm] Use range-based for loops (NFC)

the following benchmarks slowed down by more than 2%:
- 453.povray slowed down by 3% from 4906 to 5047 perf samples

Below reproducer instructions can be used to re-build both "first_bad" and 
"last_good" cross-toolchains used in this bisection.  Naturally, the scripts 
will fail when triggerring benchmarking jobs if you don't have access to Linaro 
TCWG CI.

For your convenience, we have uploaded tarballs with pre-processed source and 
assembly files at:
- First_bad save-temps: 
https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-aarch64-spec2k6-O2_LTO/33/artifact/artifacts/build-f240e528cea25fd2a9ae01b1e1fe77f507ed7a2c/save-temps/
- Last_good save-temps: 
https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-aarch64-spec2k6-O2_LTO/33/artifact/artifacts/build-c572eb1ad9d8a528bcaff0160888aff31b1f4b5f/save-temps/
- Baseline save-temps: 
https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-aarch64-spec2k6-O2_LTO/33/artifact/artifacts/build-baseline/save-temps/

Configuration:
- Benchmark: SPEC CPU2006
- Toolchain: Clang + Glibc + LLVM Linker
- Version: all components were built from their tip of trunk
- Target: aarch64-linux-gnu
- Compiler flags: -O2 -flto
- Hardware: NVidia TX1 4x Cortex-A57

This benchmarking CI is work-in-progress, and we welcome feedback and 
suggestions at linaro-toolchain@lists.linaro.org .  In our improvement plans is 
to add support for SPEC CPU2017 benchmarks and provide "perf report/annotate" 
data behind these reports.

THIS IS THE END OF INTERESTING STUFF.  BELOW ARE LINKS TO BUILDS, REPRODUCTION 
INSTRUCTIONS, AND THE RAW COMMIT.

This commit has regressed these CI configurations:
 - tcwg_bmk_llvm_tx1/llvm-master-aarch64-spec2k6-O2_LTO

First_bad build: 
https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-aarch64-spec2k6-O2_LTO/33/artifact/artifacts/build-f240e528cea25fd2a9ae01b1e1fe77f507ed7a2c/
Last_good build: 
https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-aarch64-spec2k6-O2_LTO/33/artifact/artifacts/build-c572eb1ad9d8a528bcaff0160888aff31b1f4b5f/
Baseline build: 
https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-aarch64-spec2k6-O2_LTO/33/artifact/artifacts/build-baseline/
Even more details: 
https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-aarch64-spec2k6-O2_LTO/33/artifact/artifacts/

Reproduce builds:

mkdir investigate-llvm-f240e528cea25fd2a9ae01b1e1fe77f507ed7a2c
cd investigate-llvm-f240e528cea25fd2a9ae01b1e1fe77f507ed7a2c

# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts

# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh 
https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-aarch64-spec2k6-O2_LTO/33/artifact/artifacts/manifests/build-baseline.sh
 --fail
curl -o artifacts/manifests/build-parameters.sh 
https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-aarch64-spec2k6-O2_LTO/33/artifact/artifacts/manifests/build-parameters.sh
 --fail
curl -o artifacts/test.sh 
https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-aarch64-spec2k6-O2_LTO/33/artifact/artifacts/test.sh
 --fail
chmod +x artifacts/test.sh

# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh

# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ 
--exclude /llvm/ ./ ./bisect/baseline/

cd llvm

# Reproduce first_bad build
git checkout --detach f240e528cea25fd2a9ae01b1e1fe77f507ed7a2c
../artifacts/test.sh

# Reproduce last_good build
git checkout --detach c572eb1ad9d8a528bcaff0160888aff31b1f4b5f
../artifacts/test.sh

cd ..


Full commit (up to 1000 lines):

commit f240e528cea25fd2a9ae01b1e1fe77f507ed7a2c
Author: Kazu Hirata 
Date:   Mon Nov 29 09:04:44 2021 -0800

[llvm] Use range-based for loops (NFC)
---
 llvm/lib/CodeGen/MachinePipeliner.cpp  |  7 ++---
 .../CodeGen/SelectionDAG/ResourcePriorityQueue.cpp |  4 +--
 llvm/lib/Object/ELFObjectFile.cpp  |  4 +--
 llvm/lib/ObjectYAML/COFFEmitter.cpp| 32 ++
 llvm/lib/Passes/StandardInstrumentations.cpp   |  4 +--
 llvm/lib/ProfileData/InstrProf.cpp | 11 
 llvm/lib/Target/Hexagon/HexagonCommonGEP.cpp   | 18 ++--
 llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp  | 32 ++
 8 files changed, 50 insertions(+), 62 deletions(-)

diff --git a/llvm/lib/CodeGen/MachinePipeliner.cpp 
b/llvm/lib/CodeGen/MachinePipeliner.cpp
index 21be6718b7d9..8d6459a627fa 100644
--- a/llvm/lib/CodeGen/MachinePipeliner.cpp
+++ b/llvm/lib

[TCWG CI] 453.povray failed to build after llvm: [SLP]Fix reused extracts cost.

2021-12-04 Thread ci_notify
After llvm commit ba74bb3a226e1b4660537f274627285b1bf41ee1
Author: Alexey Bataev 

[SLP]Fix reused extracts cost.

the following benchmarks slowed down by more than 2%:
- 453.povray failed to build

Below reproducer instructions can be used to re-build both "first_bad" and 
"last_good" cross-toolchains used in this bisection.  Naturally, the scripts 
will fail when triggerring benchmarking jobs if you don't have access to Linaro 
TCWG CI.

For your convenience, we have uploaded tarballs with pre-processed source and 
assembly files at:
- First_bad save-temps: 
https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-aarch64-spec2k6-O3_LTO/39/artifact/artifacts/build-ba74bb3a226e1b4660537f274627285b1bf41ee1/save-temps/
- Last_good save-temps: 
https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-aarch64-spec2k6-O3_LTO/39/artifact/artifacts/build-78cc133c63173a4b5b7a43750cc507d4cff683cf/save-temps/
- Baseline save-temps: 
https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-aarch64-spec2k6-O3_LTO/39/artifact/artifacts/build-baseline/save-temps/

Configuration:
- Benchmark: SPEC CPU2006
- Toolchain: Clang + Glibc + LLVM Linker
- Version: all components were built from their tip of trunk
- Target: aarch64-linux-gnu
- Compiler flags: -O3 -flto
- Hardware: NVidia TX1 4x Cortex-A57

This benchmarking CI is work-in-progress, and we welcome feedback and 
suggestions at linaro-toolchain@lists.linaro.org .  In our improvement plans is 
to add support for SPEC CPU2017 benchmarks and provide "perf report/annotate" 
data behind these reports.

THIS IS THE END OF INTERESTING STUFF.  BELOW ARE LINKS TO BUILDS, REPRODUCTION 
INSTRUCTIONS, AND THE RAW COMMIT.

This commit has regressed these CI configurations:
 - tcwg_bmk_llvm_tx1/llvm-master-aarch64-spec2k6-O3_LTO

First_bad build: 
https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-aarch64-spec2k6-O3_LTO/39/artifact/artifacts/build-ba74bb3a226e1b4660537f274627285b1bf41ee1/
Last_good build: 
https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-aarch64-spec2k6-O3_LTO/39/artifact/artifacts/build-78cc133c63173a4b5b7a43750cc507d4cff683cf/
Baseline build: 
https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-aarch64-spec2k6-O3_LTO/39/artifact/artifacts/build-baseline/
Even more details: 
https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-aarch64-spec2k6-O3_LTO/39/artifact/artifacts/

Reproduce builds:

mkdir investigate-llvm-ba74bb3a226e1b4660537f274627285b1bf41ee1
cd investigate-llvm-ba74bb3a226e1b4660537f274627285b1bf41ee1

# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts

# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh 
https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-aarch64-spec2k6-O3_LTO/39/artifact/artifacts/manifests/build-baseline.sh
 --fail
curl -o artifacts/manifests/build-parameters.sh 
https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-aarch64-spec2k6-O3_LTO/39/artifact/artifacts/manifests/build-parameters.sh
 --fail
curl -o artifacts/test.sh 
https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-aarch64-spec2k6-O3_LTO/39/artifact/artifacts/test.sh
 --fail
chmod +x artifacts/test.sh

# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh

# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ 
--exclude /llvm/ ./ ./bisect/baseline/

cd llvm

# Reproduce first_bad build
git checkout --detach ba74bb3a226e1b4660537f274627285b1bf41ee1
../artifacts/test.sh

# Reproduce last_good build
git checkout --detach 78cc133c63173a4b5b7a43750cc507d4cff683cf
../artifacts/test.sh

cd ..


Full commit (up to 1000 lines):

commit ba74bb3a226e1b4660537f274627285b1bf41ee1
Author: Alexey Bataev 
Date:   Thu Dec 2 04:22:55 2021 -0800

[SLP]Fix reused extracts cost.

If the extractelement instruction is used multiple times in the
different tree entries (either vectorized, or gathered), need to
compensate the scalar cost of such instructions. They are completely
removed if all users are part of the tree but we need to compensate the
cost only once for each instruction.

Differential Revision: https://reviews.llvm.org/D114958
---
 llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp| 29 +-
 .../X86/extractelement-multiple-uses.ll| 23 +
 2 files changed, 29 insertions(+), 23 deletions(-)

diff --git a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp 
b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
index 95061e9053fa..335ad6c85387 100644
--- a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
+++ b/llvm/lib/Trans