After llvmorg-16-init-16383-g9b5f62685ab4 commit
9b5f62685ab447ba9d3ea8ac2616e0c76a44d21b
Author: Alexey Bataev <[email protected]>
[SLP]Fix cost of the broadcast buildvector/gather.
the following benchmarks slowed down by more than 3%:
- 445.gobmk slowed down by 6% from 10321 to 10904 perf samples
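The 6% figure can be checked from the two sample counts. The snippet below is a minimal sketch, not part of the TCWG tooling: the function names are made up for illustration, and it assumes the report rounds the relative change to a whole percent.

```cpp
#include <cassert>
#include <cmath>

// Relative change in perf samples, in percent (positive means slower).
double slowdownPercent(long before, long after) {
  assert(before > 0 && "baseline sample count must be positive");
  return 100.0 * static_cast<double>(after - before) /
         static_cast<double>(before);
}

// Same value rounded to the nearest whole percent (assumed report format).
long roundedSlowdownPercent(long before, long after) {
  return std::lround(slowdownPercent(before, after));
}

// True if the change exceeds the reporting threshold (3% in this report).
bool exceedsThreshold(long before, long after, double thresholdPercent = 3.0) {
  return slowdownPercent(before, after) > thresholdPercent;
}
```

With the numbers above, slowdownPercent(10321, 10904) is roughly 5.6, which rounds to the reported 6% and is above the 3% threshold.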
The reproducer instructions below can be used to rebuild both the "first_bad"
and "last_good" cross-toolchains used in this bisection. Naturally, the scripts
will fail when triggering benchmarking jobs if you don't have access to
Linaro TCWG CI.
Configuration:
- Benchmark: SPEC CPU2006
- Toolchain: Clang + Glibc + LLVM Linker
- Version: all components were built from their tip of trunk
- Target: arm-linux-gnueabihf
- Compiler flags: -O3 -flto -marm
- Hardware:
This benchmarking CI is work-in-progress, and we welcome feedback and
suggestions at [email protected] . Our improvement plans include
adding support for SPEC CPU2017 benchmarks and providing "perf report/annotate"
data behind these reports.
THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS, REPRODUCTION
INSTRUCTIONS, AND THE RAW COMMIT.
For the latest status, see the comments in https://linaro.atlassian.net/browse/GNU-692 .
Status of llvmorg-16-init-16383-g9b5f62685ab4 commit for
tcwg_bmk-code_speed-spec2k6:
commit 9b5f62685ab447ba9d3ea8ac2616e0c76a44d21b
Author: Alexey Bataev <[email protected]>
Date: Wed Dec 21 13:38:38 2022 -0800
[SLP]Fix cost of the broadcast buildvector/gather.
Need to include the cost of the initial insertelement to the cost of the
broadcasts. Also, need to adjust the cost of the gather/buildvector if
the element is inserted into poison/undef vector.
Differential Revision: https://reviews.llvm.org/D140498
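As a reading aid for the second half of the commit message, the sketch below models the index-0 insertelement branch that the X86 hunk of the full commit adds to X86TTIImpl::getVectorInstrCost. It is a standalone simplification, not LLVM code: ScalarKind, InsertAtZero and newIndexZeroInsertCost are hypothetical names, the subtarget check IsCheapPInsrPExtrInsertPS() is reduced to a boolean, and the model assumes a non-floating-point element, since floating-point scalars return earlier in the real function.

```cpp
#include <optional>

// Hypothetical stand-in for what the patch learns about Op1 (the inserted
// scalar) via isa_and_nonnull<LoadInst> / isa_and_nonnull<Constant>.
enum class ScalarKind { Load, IntConstant, Other };

// Hypothetical description of an insertelement at index 0.
struct InsertAtZero {
  bool intoUndefOrPoison; // Op0 is an undef/poison vector
  ScalarKind scalar;      // classification of Op1
  bool cheapPInsr;        // models the result of IsCheapPInsrPExtrInsertPS()
};

// Returns the cost chosen by the new branch, or std::nullopt when the query
// falls through to the logic that already existed before the patch.
std::optional<int> newIndexZeroInsertCost(const InsertAtZero &q,
                                          int registerFileMoveCost) {
  if (!q.intoUndefOrPoison)
    return std::nullopt;
  // A loaded scalar is assumed to land in the vector register cheaply.
  if (q.scalar == ScalarKind::Load)
    return registerFileMoveCost;
  if (!q.cheapPInsr) {
    // mov constant-to-GPR + movd/movq GPR -> XMM.
    if (q.scalar == ScalarKind::IntConstant)
      return 2 + registerFileMoveCost;
    // movd/movq GPR -> XMM only.
    return 1 + registerFileMoveCost;
  }
  return std::nullopt;
}
```

In this model, inserting a loaded value into a poison vector at index 0 costs only the register-file move, while an integer constant on a subtarget without cheap pinsr costs 2 more than that.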
* llvm-arm-master-O3_LTO
** After llvmorg-16-init-16383-g9b5f62685ab4 commit
9b5f62685ab447ba9d3ea8ac2616e0c76a44d21b
** Author: Alexey Bataev <[email protected]>
**
** [SLP]Fix cost of the broadcast buildvector/gather.
**
** the following benchmarks slowed down by more than 3%:
** - 445.gobmk slowed down by 6% from 10321 to 10904 perf samples
**
https://ci.linaro.org/job/tcwg_bmk-code_speed-spec2k6-llvm-arm-master-O3_LTO-build/9/
Bad build:
https://ci.linaro.org/job/tcwg_bmk-code_speed-spec2k6-llvm-arm-master-O3_LTO-build/9/artifact/artifacts
Good build:
https://ci.linaro.org/job/tcwg_bmk-code_speed-spec2k6-llvm-arm-master-O3_LTO-build/8/artifact/artifacts
Reproduce current build:
<cut>
mkdir -p investigate-llvm-9b5f62685ab447ba9d3ea8ac2616e0c76a44d21b
cd investigate-llvm-9b5f62685ab447ba9d3ea8ac2616e0c76a44d21b
# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts
# Fetch manifests for bad and good builds
mkdir -p bad/artifacts good/artifacts
curl -o bad/artifacts/manifest.sh https://ci.linaro.org/job/tcwg_bmk-code_speed-spec2k6-llvm-arm-master-O3_LTO-build/9/artifact/artifacts/manifest.sh --fail
curl -o good/artifacts/manifest.sh https://ci.linaro.org/job/tcwg_bmk-code_speed-spec2k6-llvm-arm-master-O3_LTO-build/8/artifact/artifacts/manifest.sh --fail
# Reproduce bad build
(cd bad; ../jenkins-scripts/tcwg_bmk-build.sh ^^ true %%rr[top_artifacts] artifacts)
# Reproduce good build
(cd good; ../jenkins-scripts/tcwg_bmk-build.sh ^^ true %%rr[top_artifacts] artifacts)
</cut>
Full commit (up to 1000 lines):
<cut>
commit 9b5f62685ab447ba9d3ea8ac2616e0c76a44d21b
Author: Alexey Bataev <[email protected]>
Date: Wed Dec 21 13:38:38 2022 -0800
[SLP]Fix cost of the broadcast buildvector/gather.
Need to include the cost of the initial insertelement to the cost of the
broadcasts. Also, need to adjust the cost of the gather/buildvector if
the element is inserted into poison/undef vector.
Differential Revision: https://reviews.llvm.org/D140498
---
llvm/include/llvm/Analysis/TargetTransformInfo.h | 12 +-
.../llvm/Analysis/TargetTransformInfoImpl.h | 4 +-
llvm/include/llvm/CodeGen/BasicTTIImpl.h | 54 +--
llvm/lib/Analysis/TargetTransformInfo.cpp | 8 +-
.../Target/AArch64/AArch64TargetTransformInfo.cpp | 7 +-
.../Target/AArch64/AArch64TargetTransformInfo.h | 4 +-
.../Target/AMDGPU/AMDGPUTargetTransformInfo.cpp | 7 +-
llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.h | 2 +-
llvm/lib/Target/AMDGPU/R600TargetTransformInfo.cpp | 7 +-
llvm/lib/Target/AMDGPU/R600TargetTransformInfo.h | 2 +-
llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp | 7 +-
llvm/lib/Target/ARM/ARMTargetTransformInfo.h | 4 +-
.../Target/Hexagon/HexagonTargetTransformInfo.cpp | 6 +-
.../Target/Hexagon/HexagonTargetTransformInfo.h | 4 +-
llvm/lib/Target/PowerPC/PPCTargetTransformInfo.cpp | 9 +-
llvm/lib/Target/PowerPC/PPCTargetTransformInfo.h | 4 +-
llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp | 7 +-
llvm/lib/Target/RISCV/RISCVTargetTransformInfo.h | 4 +-
.../Target/SystemZ/SystemZTargetTransformInfo.cpp | 5 +-
.../Target/SystemZ/SystemZTargetTransformInfo.h | 4 +-
.../WebAssembly/WebAssemblyTargetTransformInfo.cpp | 5 +-
.../WebAssembly/WebAssemblyTargetTransformInfo.h | 4 +-
llvm/lib/Target/X86/X86TargetTransformInfo.cpp | 42 ++-
llvm/lib/Target/X86/X86TargetTransformInfo.h | 4 +-
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp | 21 +-
.../Analysis/CostModel/X86/loop_v2-inseltpoison.ll | 2 +-
llvm/test/Analysis/CostModel/X86/loop_v2.ll | 2 +-
.../X86/masked-intrinsic-cost-inseltpoison.ll | 4 +-
.../CostModel/X86/masked-intrinsic-cost.ll | 4 +-
.../CostModel/X86/vector-insert-inseltpoison.ll | 120 +++----
llvm/test/Analysis/CostModel/X86/vector-insert.ll | 120 +++----
.../Analysis/CostModel/X86/vshift-ashr-codesize.ll | 50 +--
.../CostModel/X86/vshift-ashr-cost-inseltpoison.ll | 102 ++----
.../Analysis/CostModel/X86/vshift-ashr-cost.ll | 102 ++----
.../Analysis/CostModel/X86/vshift-ashr-latency.ll | 18 +-
.../CostModel/X86/vshift-ashr-sizelatency.ll | 50 +--
.../Analysis/CostModel/X86/vshift-lshr-codesize.ll | 82 +----
.../CostModel/X86/vshift-lshr-cost-inseltpoison.ll | 102 ++----
.../Analysis/CostModel/X86/vshift-lshr-cost.ll | 102 ++----
.../Analysis/CostModel/X86/vshift-lshr-latency.ll | 102 ++----
.../CostModel/X86/vshift-lshr-sizelatency.ll | 82 +----
.../Analysis/CostModel/X86/vshift-shl-codesize.ll | 82 +----
.../CostModel/X86/vshift-shl-cost-inseltpoison.ll | 138 ++-----
.../test/Analysis/CostModel/X86/vshift-shl-cost.ll | 138 ++-----
.../Analysis/CostModel/X86/vshift-shl-latency.ll | 102 ++----
.../CostModel/X86/vshift-shl-sizelatency.ll | 174 ++-------
llvm/test/Transforms/SLPVectorizer/X86/cse.ll | 7 +-
.../Transforms/SLPVectorizer/X86/malformed_phis.ll | 140 ++++----
.../X86/remark_gather-load-redux-cost.ll | 2 +-
.../SLPVectorizer/X86/used-reduced-op.ll | 399 +++++++++++----------
50 files changed, 941 insertions(+), 1522 deletions(-)
diff --git a/llvm/include/llvm/Analysis/TargetTransformInfo.h b/llvm/include/llvm/Analysis/TargetTransformInfo.h
index 6200af73842c..a9cb8717ffa8 100644
--- a/llvm/include/llvm/Analysis/TargetTransformInfo.h
+++ b/llvm/include/llvm/Analysis/TargetTransformInfo.h
@@ -1193,7 +1193,8 @@ public:
/// case is to provision the cost of vectorization/scalarization in
/// vectorizer passes.
InstructionCost getVectorInstrCost(unsigned Opcode, Type *Val,
- unsigned Index = -1) const;
+ unsigned Index = -1, Value *Op0 = nullptr,
+ Value *Op1 = nullptr) const;
/// \return The expected cost of vector Insert and Extract.
/// This is used when instruction is available, and implementation
@@ -1786,7 +1787,8 @@ public:
TTI::TargetCostKind CostKind,
const Instruction *I) = 0;
virtual InstructionCost getVectorInstrCost(unsigned Opcode, Type *Val,
- unsigned Index) = 0;
+ unsigned Index, Value *Op0,
+ Value *Op1) = 0;
virtual InstructionCost getVectorInstrCost(const Instruction &I, Type *Val,
unsigned Index) = 0;
@@ -2358,9 +2360,9 @@ public:
const Instruction *I) override {
return Impl.getCmpSelInstrCost(Opcode, ValTy, CondTy, VecPred, CostKind,
I);
}
- InstructionCost getVectorInstrCost(unsigned Opcode, Type *Val,
- unsigned Index) override {
- return Impl.getVectorInstrCost(Opcode, Val, Index);
+ InstructionCost getVectorInstrCost(unsigned Opcode, Type *Val, unsigned Index,
+ Value *Op0, Value *Op1) override {
+ return Impl.getVectorInstrCost(Opcode, Val, Index, Op0, Op1);
}
InstructionCost getVectorInstrCost(const Instruction &I, Type *Val,
unsigned Index) override {
diff --git a/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h b/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
index e81e430f6624..262b42a05d99 100644
--- a/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
+++ b/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
@@ -585,8 +585,8 @@ public:
return 1;
}
- InstructionCost getVectorInstrCost(unsigned Opcode, Type *Val,
- unsigned Index) const {
+ InstructionCost getVectorInstrCost(unsigned Opcode, Type *Val, unsigned Index,
+ Value *Op0, Value *Op1) const {
return 1;
}
diff --git a/llvm/include/llvm/CodeGen/BasicTTIImpl.h b/llvm/include/llvm/CodeGen/BasicTTIImpl.h
index aabb94d82c4b..f27c6899d757 100644
--- a/llvm/include/llvm/CodeGen/BasicTTIImpl.h
+++ b/llvm/include/llvm/CodeGen/BasicTTIImpl.h
@@ -90,10 +90,12 @@ private:
InstructionCost Cost = 0;
// Broadcast cost is equal to the cost of extracting the zero'th element
// plus the cost of inserting it into every element of the result vector.
- Cost += thisT()->getVectorInstrCost(Instruction::ExtractElement, VTy, 0);
+ Cost += thisT()->getVectorInstrCost(Instruction::ExtractElement, VTy, 0,
+ nullptr, nullptr);
for (int i = 0, e = VTy->getNumElements(); i < e; ++i) {
- Cost += thisT()->getVectorInstrCost(Instruction::InsertElement, VTy, i);
+ Cost += thisT()->getVectorInstrCost(Instruction::InsertElement, VTy, i,
+ nullptr, nullptr);
}
return Cost;
}
@@ -110,8 +112,10 @@ private:
// vector and finally index 3 of second vector and insert them at index
// <0,1,2,3> of result vector.
for (int i = 0, e = VTy->getNumElements(); i < e; ++i) {
- Cost += thisT()->getVectorInstrCost(Instruction::InsertElement, VTy, i);
- Cost += thisT()->getVectorInstrCost(Instruction::ExtractElement, VTy, i);
+ Cost += thisT()->getVectorInstrCost(Instruction::InsertElement, VTy, i,
+ nullptr, nullptr);
+ Cost += thisT()->getVectorInstrCost(Instruction::ExtractElement, VTy, i,
+ nullptr, nullptr);
}
return Cost;
}
@@ -134,9 +138,9 @@ private:
// type.
for (int i = 0; i != NumSubElts; ++i) {
Cost += thisT()->getVectorInstrCost(Instruction::ExtractElement, VTy,
- i + Index);
- Cost +=
- thisT()->getVectorInstrCost(Instruction::InsertElement, SubVTy, i);
+ i + Index, nullptr, nullptr);
+ Cost += thisT()->getVectorInstrCost(Instruction::InsertElement, SubVTy, i,
+ nullptr, nullptr);
}
return Cost;
}
@@ -158,10 +162,10 @@ private:
// the source type plus the cost of inserting them into the result vector
// type.
for (int i = 0; i != NumSubElts; ++i) {
- Cost +=
- thisT()->getVectorInstrCost(Instruction::ExtractElement, SubVTy, i);
+ Cost += thisT()->getVectorInstrCost(Instruction::ExtractElement, SubVTy,
+ i, nullptr, nullptr);
Cost += thisT()->getVectorInstrCost(Instruction::InsertElement, VTy,
- i + Index);
+ i + Index, nullptr, nullptr);
}
return Cost;
}
@@ -212,7 +216,7 @@ private:
FixedVectorType::get(
PointerType::get(VT->getElementType(), 0),
VT->getNumElements()),
- -1)
+ -1, nullptr, nullptr)
: 0;
InstructionCost LoadCost =
VT->getNumElements() *
@@ -237,7 +241,7 @@ private:
Instruction::ExtractElement,
FixedVectorType::get(Type::getInt1Ty(DataTy->getContext()),
VT->getNumElements()),
- -1) +
+ -1, nullptr, nullptr) +
getCFInstrCost(Instruction::Br, CostKind) +
getCFInstrCost(Instruction::PHI, CostKind));
}
@@ -722,9 +726,11 @@ public:
if (!DemandedElts[i])
continue;
if (Insert)
- Cost += thisT()->getVectorInstrCost(Instruction::InsertElement, Ty, i);
+ Cost += thisT()->getVectorInstrCost(Instruction::InsertElement, Ty, i,
+ nullptr, nullptr);
if (Extract)
- Cost += thisT()->getVectorInstrCost(Instruction::ExtractElement, Ty, i);
+ Cost += thisT()->getVectorInstrCost(Instruction::ExtractElement, Ty, i,
+ nullptr, nullptr);
}
return Cost;
@@ -1123,7 +1129,7 @@ public:
InstructionCost getExtractWithExtendCost(unsigned Opcode, Type *Dst,
VectorType *VecTy, unsigned Index) {
return thisT()->getVectorInstrCost(Instruction::ExtractElement, VecTy,
- Index) +
+ Index, nullptr, nullptr) +
thisT()->getCastInstrCost(Opcode, Dst, VecTy->getElementType(),
TTI::CastContextHint::None,
TTI::TCK_RecipThroughput);
@@ -1184,14 +1190,20 @@ public:
return 1;
}
- InstructionCost getVectorInstrCost(unsigned Opcode, Type *Val,
- unsigned Index) {
+ InstructionCost getVectorInstrCost(unsigned Opcode, Type *Val, unsigned Index,
+ Value *Op0, Value *Op1) {
return getRegUsageForType(Val->getScalarType());
}
InstructionCost getVectorInstrCost(const Instruction &I, Type *Val,
unsigned Index) {
- return thisT()->getVectorInstrCost(I.getOpcode(), Val, Index);
+ Value *Op0 = nullptr;
+ Value *Op1 = nullptr;
+ if (auto *IE = dyn_cast<InsertElementInst>(&I)) {
+ Op0 = IE->getOperand(0);
+ Op1 = IE->getOperand(1);
+ }
+ return thisT()->getVectorInstrCost(I.getOpcode(), Val, Index, Op0, Op1);
}
InstructionCost getReplicationShuffleCost(Type *EltTy, int ReplicationFactor,
@@ -2246,7 +2258,8 @@ public:
ArithCost +=
NumReduxLevels * thisT()->getArithmeticInstrCost(Opcode, Ty, CostKind);
return ShuffleCost + ArithCost +
- thisT()->getVectorInstrCost(Instruction::ExtractElement, Ty, 0);
+ thisT()->getVectorInstrCost(Instruction::ExtractElement, Ty, 0,
+ nullptr, nullptr);
}
/// Try to calculate the cost of performing strict (in-order) reductions,
@@ -2353,7 +2366,8 @@ public:
// The last min/max should be in vector registers and we counted it above.
// So just need a single extractelement.
return ShuffleCost + MinMaxCost +
- thisT()->getVectorInstrCost(Instruction::ExtractElement, Ty, 0);
+ thisT()->getVectorInstrCost(Instruction::ExtractElement, Ty, 0,
+ nullptr, nullptr);
}
InstructionCost getExtendedReductionCost(unsigned Opcode, bool IsUnsigned,
diff --git a/llvm/lib/Analysis/TargetTransformInfo.cpp b/llvm/lib/Analysis/TargetTransformInfo.cpp
index 7459ce18c3cf..d03a8cf14172 100644
--- a/llvm/lib/Analysis/TargetTransformInfo.cpp
+++ b/llvm/lib/Analysis/TargetTransformInfo.cpp
@@ -897,13 +897,13 @@ InstructionCost TargetTransformInfo::getCmpSelInstrCost(
return Cost;
}
-InstructionCost TargetTransformInfo::getVectorInstrCost(unsigned Opcode,
- Type *Val,
- unsigned Index) const {
+InstructionCost TargetTransformInfo::getVectorInstrCost(
+ unsigned Opcode, Type *Val, unsigned Index, Value *Op0, Value *Op1) const {
// FIXME: Assert that Opcode is either InsertElement or ExtractElement.
// This is mentioned in the interface description and respected by all
// callers, but never asserted upon.
- InstructionCost Cost = TTIImpl->getVectorInstrCost(Opcode, Val, Index);
+ InstructionCost Cost =
+ TTIImpl->getVectorInstrCost(Opcode, Val, Index, Op0, Op1);
assert(Cost >= 0 && "TTI should not produce negative costs!");
return Cost;
}
diff --git a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
index ae12ae951d75..f5f6c07f766a 100644
--- a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
+++ b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
@@ -2034,8 +2034,8 @@ InstructionCost AArch64TTIImpl::getExtractWithExtendCost(unsigned Opcode,
// Get the cost for the extract. We compute the cost (if any) for the extend
// below.
- InstructionCost Cost =
- getVectorInstrCost(Instruction::ExtractElement, VecTy, Index);
+ InstructionCost Cost = getVectorInstrCost(Instruction::ExtractElement, VecTy,
+ Index, nullptr, nullptr);
// Legalize the types.
auto VecLT = getTypeLegalizationCost(VecTy);
@@ -2128,7 +2128,8 @@ InstructionCost AArch64TTIImpl::getVectorInstrCostHelper(Type *Val,
}
InstructionCost AArch64TTIImpl::getVectorInstrCost(unsigned Opcode, Type *Val,
- unsigned Index) {
+ unsigned Index, Value *Op0,
+ Value *Op1) {
return getVectorInstrCostHelper(Val, Index, false /* HasRealUse */);
}
diff --git a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h
index e309117a885b..6eaff9566b8c 100644
--- a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h
+++ b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h
@@ -169,8 +169,8 @@ public:
InstructionCost getCFInstrCost(unsigned Opcode, TTI::TargetCostKind CostKind,
const Instruction *I = nullptr);
- InstructionCost getVectorInstrCost(unsigned Opcode, Type *Val,
- unsigned Index);
+ InstructionCost getVectorInstrCost(unsigned Opcode, Type *Val, unsigned Index,
+ Value *Op0, Value *Op1);
InstructionCost getVectorInstrCost(const Instruction &I, Type *Val,
unsigned Index);
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp b/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
index af72ba2daa2d..00e6970291bf 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
@@ -790,7 +790,8 @@ GCNTTIImpl::getMinMaxReductionCost(VectorType *Ty, VectorType *CondTy,
}
InstructionCost GCNTTIImpl::getVectorInstrCost(unsigned Opcode, Type *ValTy,
- unsigned Index) {
+ unsigned Index, Value *Op0,
+ Value *Op1) {
switch (Opcode) {
case Instruction::ExtractElement:
case Instruction::InsertElement: {
@@ -799,7 +800,7 @@ InstructionCost GCNTTIImpl::getVectorInstrCost(unsigned Opcode, Type *ValTy,
if (EltSize < 32) {
if (EltSize == 16 && Index == 0 && ST->has16BitInsts())
return 0;
- return BaseT::getVectorInstrCost(Opcode, ValTy, Index);
+ return BaseT::getVectorInstrCost(Opcode, ValTy, Index, Op0, Op1);
}
// Extracts are just reads of a subregister, so are free. Inserts are
@@ -810,7 +811,7 @@ InstructionCost GCNTTIImpl::getVectorInstrCost(unsigned Opcode, Type *ValTy,
return Index == ~0u ? 2 : 0;
}
default:
- return BaseT::getVectorInstrCost(Opcode, ValTy, Index);
+ return BaseT::getVectorInstrCost(Opcode, ValTy, Index, Op0, Op1);
}
}
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.h b/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.h
index 347ce87acd26..4a1137dcf2e2 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.h
+++ b/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.h
@@ -162,7 +162,7 @@ public:
using BaseT::getVectorInstrCost;
InstructionCost getVectorInstrCost(unsigned Opcode, Type *ValTy,
- unsigned Index);
+ unsigned Index, Value *Op0, Value *Op1);
bool isReadRegisterSourceOfDivergence(const IntrinsicInst *ReadReg) const;
bool isSourceOfDivergence(const Value *V) const;
diff --git a/llvm/lib/Target/AMDGPU/R600TargetTransformInfo.cpp b/llvm/lib/Target/AMDGPU/R600TargetTransformInfo.cpp
index 365c005b2503..c3dd321a7b9c 100644
--- a/llvm/lib/Target/AMDGPU/R600TargetTransformInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/R600TargetTransformInfo.cpp
@@ -108,14 +108,15 @@ InstructionCost R600TTIImpl::getCFInstrCost(unsigned Opcode,
}
InstructionCost R600TTIImpl::getVectorInstrCost(unsigned Opcode, Type *ValTy,
- unsigned Index) {
+ unsigned Index, Value *Op0,
+ Value *Op1) {
switch (Opcode) {
case Instruction::ExtractElement:
case Instruction::InsertElement: {
unsigned EltSize =
DL.getTypeSizeInBits(cast<VectorType>(ValTy)->getElementType());
if (EltSize < 32) {
- return BaseT::getVectorInstrCost(Opcode, ValTy, Index);
+ return BaseT::getVectorInstrCost(Opcode, ValTy, Index, Op0, Op1);
}
// Extracts are just reads of a subregister, so are free. Inserts are
@@ -126,7 +127,7 @@ InstructionCost R600TTIImpl::getVectorInstrCost(unsigned Opcode, Type *ValTy,
return Index == ~0u ? 2 : 0;
}
default:
- return BaseT::getVectorInstrCost(Opcode, ValTy, Index);
+ return BaseT::getVectorInstrCost(Opcode, ValTy, Index, Op0, Op1);
}
}
diff --git a/llvm/lib/Target/AMDGPU/R600TargetTransformInfo.h b/llvm/lib/Target/AMDGPU/R600TargetTransformInfo.h
index f1a198fd14e4..9045cc773189 100644
--- a/llvm/lib/Target/AMDGPU/R600TargetTransformInfo.h
+++ b/llvm/lib/Target/AMDGPU/R600TargetTransformInfo.h
@@ -62,7 +62,7 @@ public:
const Instruction *I = nullptr);
using BaseT::getVectorInstrCost;
InstructionCost getVectorInstrCost(unsigned Opcode, Type *ValTy,
- unsigned Index);
+ unsigned Index, Value *Op0, Value *Op1);
};
} // end namespace llvm
diff --git a/llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp b/llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp
index 8eec432a4a66..07786ea82738 100644
--- a/llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp
+++ b/llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp
@@ -874,7 +874,8 @@ InstructionCost ARMTTIImpl::getCastInstrCost(unsigned Opcode, Type *Dst,
}
InstructionCost ARMTTIImpl::getVectorInstrCost(unsigned Opcode, Type *ValTy,
- unsigned Index) {
+ unsigned Index, Value *Op0,
+ Value *Op1) {
// Penalize inserting into an D-subregister. We end up with a three times
// lower estimated throughput on swift.
if (ST->hasSlowLoadDSubregister() && Opcode == Instruction::InsertElement &&
@@ -893,7 +894,7 @@ InstructionCost ARMTTIImpl::getVectorInstrCost(unsigned Opcode, Type *ValTy,
if (ValTy->isVectorTy() &&
ValTy->getScalarSizeInBits() <= 32)
return std::max<InstructionCost>(
- BaseT::getVectorInstrCost(Opcode, ValTy, Index), 2U);
+ BaseT::getVectorInstrCost(Opcode, ValTy, Index, Op0, Op1), 2U);
}
if (ST->hasMVEIntegerOps() && (Opcode == Instruction::InsertElement ||
@@ -906,7 +907,7 @@ InstructionCost ARMTTIImpl::getVectorInstrCost(unsigned Opcode, Type *ValTy,
return LT.first * (ValTy->getScalarType()->isIntegerTy() ? 4 : 1);
}
- return BaseT::getVectorInstrCost(Opcode, ValTy, Index);
+ return BaseT::getVectorInstrCost(Opcode, ValTy, Index, Op0, Op1);
}
InstructionCost ARMTTIImpl::getCmpSelInstrCost(unsigned Opcode, Type *ValTy,
diff --git a/llvm/lib/Target/ARM/ARMTargetTransformInfo.h b/llvm/lib/Target/ARM/ARMTargetTransformInfo.h
index db96c3da54cf..6b1e6444c516 100644
--- a/llvm/lib/Target/ARM/ARMTargetTransformInfo.h
+++ b/llvm/lib/Target/ARM/ARMTargetTransformInfo.h
@@ -240,8 +240,8 @@ public:
const Instruction *I = nullptr);
using BaseT::getVectorInstrCost;
- InstructionCost getVectorInstrCost(unsigned Opcode, Type *Val,
- unsigned Index);
+ InstructionCost getVectorInstrCost(unsigned Opcode, Type *Val, unsigned Index,
+ Value *Op0, Value *Op1);
InstructionCost getAddressComputationCost(Type *Val, ScalarEvolution *SE,
const SCEV *Ptr);
diff --git a/llvm/lib/Target/Hexagon/HexagonTargetTransformInfo.cpp b/llvm/lib/Target/Hexagon/HexagonTargetTransformInfo.cpp
index 779577816fb9..6089c865cedf 100644
--- a/llvm/lib/Target/Hexagon/HexagonTargetTransformInfo.cpp
+++ b/llvm/lib/Target/Hexagon/HexagonTargetTransformInfo.cpp
@@ -329,7 +329,8 @@ InstructionCost HexagonTTIImpl::getCastInstrCost(unsigned Opcode, Type *DstTy,
}
InstructionCost HexagonTTIImpl::getVectorInstrCost(unsigned Opcode, Type *Val,
- unsigned Index) {
+ unsigned Index, Value *Op0,
+ Value *Op1) {
Type *ElemTy = Val->isVectorTy() ? cast<VectorType>(Val)->getElementType()
: Val;
if (Opcode == Instruction::InsertElement) {
@@ -338,7 +339,8 @@ InstructionCost HexagonTTIImpl::getVectorInstrCost(unsigned Opcode, Type *Val,
if (ElemTy->isIntegerTy(32))
return Cost;
// If it's not a 32-bit value, there will need to be an extract.
- return Cost + getVectorInstrCost(Instruction::ExtractElement, Val, Index);
+ return Cost + getVectorInstrCost(Instruction::ExtractElement, Val, Index,
+ Op0, Op1);
}
if (Opcode == Instruction::ExtractElement)
diff --git a/llvm/lib/Target/Hexagon/HexagonTargetTransformInfo.h b/llvm/lib/Target/Hexagon/HexagonTargetTransformInfo.h
index 49d9520b8323..d41299ff6413 100644
--- a/llvm/lib/Target/Hexagon/HexagonTargetTransformInfo.h
+++ b/llvm/lib/Target/Hexagon/HexagonTargetTransformInfo.h
@@ -154,8 +154,8 @@ public:
TTI::TargetCostKind CostKind,
const Instruction *I = nullptr);
using BaseT::getVectorInstrCost;
- InstructionCost getVectorInstrCost(unsigned Opcode, Type *Val,
- unsigned Index);
+ InstructionCost getVectorInstrCost(unsigned Opcode, Type *Val, unsigned Index,
+ Value *Op0, Value *Op1);
InstructionCost getCFInstrCost(unsigned Opcode, TTI::TargetCostKind CostKind,
const Instruction *I = nullptr) {
diff --git a/llvm/lib/Target/PowerPC/PPCTargetTransformInfo.cpp b/llvm/lib/Target/PowerPC/PPCTargetTransformInfo.cpp
index 3b952f11be34..328a70ec43f6 100644
--- a/llvm/lib/Target/PowerPC/PPCTargetTransformInfo.cpp
+++ b/llvm/lib/Target/PowerPC/PPCTargetTransformInfo.cpp
@@ -675,7 +675,8 @@ InstructionCost PPCTTIImpl::getCmpSelInstrCost(unsigned Opcode, Type *ValTy,
}
InstructionCost PPCTTIImpl::getVectorInstrCost(unsigned Opcode, Type *Val,
- unsigned Index) {
+ unsigned Index, Value *Op0,
+ Value *Op1) {
assert(Val->isVectorTy() && "This must be a vector type");
int ISD = TLI->InstructionOpcodeToISD(Opcode);
@@ -685,7 +686,8 @@ InstructionCost PPCTTIImpl::getVectorInstrCost(unsigned Opcode, Type *Val,
if (!CostFactor.isValid())
return InstructionCost::getMax();
- InstructionCost Cost = BaseT::getVectorInstrCost(Opcode, Val, Index);
+ InstructionCost Cost =
+ BaseT::getVectorInstrCost(Opcode, Val, Index, Op0, Op1);
Cost *= CostFactor;
if (ST->hasVSX() && Val->getScalarType()->isDoubleTy()) {
@@ -827,7 +829,8 @@ InstructionCost PPCTTIImpl::getMemoryOpCost(unsigned Opcode, Type *Src,
if (Src->isVectorTy() && Opcode == Instruction::Store)
for (int i = 0, e = cast<FixedVectorType>(Src)->getNumElements(); i < e;
++i)
- Cost += getVectorInstrCost(Instruction::ExtractElement, Src, i);
+ Cost += getVectorInstrCost(Instruction::ExtractElement, Src, i, nullptr,
+ nullptr);
return Cost;
}
diff --git a/llvm/lib/Target/PowerPC/PPCTargetTransformInfo.h b/llvm/lib/Target/PowerPC/PPCTargetTransformInfo.h
index 9db903baf407..810a7d0d62ef 100644
--- a/llvm/lib/Target/PowerPC/PPCTargetTransformInfo.h
+++ b/llvm/lib/Target/PowerPC/PPCTargetTransformInfo.h
@@ -126,8 +126,8 @@ public:
TTI::TargetCostKind CostKind,
const Instruction *I = nullptr);
using BaseT::getVectorInstrCost;
- InstructionCost getVectorInstrCost(unsigned Opcode, Type *Val,
- unsigned Index);
+ InstructionCost getVectorInstrCost(unsigned Opcode, Type *Val, unsigned Index,
+ Value *Op0, Value *Op1);
InstructionCost
getMemoryOpCost(unsigned Opcode, Type *Src, MaybeAlign Alignment,
unsigned AddressSpace, TTI::TargetCostKind CostKind,
diff --git a/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp b/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
index 02ce1b135f7f..ed8af25998b0 100644
--- a/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
+++ b/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
@@ -1216,12 +1216,13 @@ InstructionCost RISCVTTIImpl::getCmpSelInstrCost(unsigned Opcode, Type *ValTy,
}
InstructionCost RISCVTTIImpl::getVectorInstrCost(unsigned Opcode, Type *Val,
- unsigned Index) {
+ unsigned Index, Value *Op0,
+ Value *Op1) {
assert(Val->isVectorTy() && "This must be a vector type");
if (Opcode != Instruction::ExtractElement &&
Opcode != Instruction::InsertElement)
- return BaseT::getVectorInstrCost(Opcode, Val, Index);
+ return BaseT::getVectorInstrCost(Opcode, Val, Index, Op0, Op1);
// Legalize the type.
std::pair<InstructionCost, MVT> LT = getTypeLegalizationCost(Val);
@@ -1235,7 +1236,7 @@ InstructionCost RISCVTTIImpl::getVectorInstrCost(unsigned Opcode, Type *Val,
return LT.first;
if (!isTypeLegal(Val))
- return BaseT::getVectorInstrCost(Opcode, Val, Index);
+ return BaseT::getVectorInstrCost(Opcode, Val, Index, Op0, Op1);
// In RVV, we could use vslidedown + vmv.x.s to extract element from vector
// and vslideup + vmv.s.x to insert element to vector.
diff --git a/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.h b/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.h
index 80c7ca3564d7..5df266ba35b5 100644
--- a/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.h
+++ b/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.h
@@ -157,8 +157,8 @@ public:
const Instruction *I = nullptr);
using BaseT::getVectorInstrCost;
- InstructionCost getVectorInstrCost(unsigned Opcode, Type *Val,
- unsigned Index);
+ InstructionCost getVectorInstrCost(unsigned Opcode, Type *Val, unsigned Index,
+ Value *Op0, Value *Op1);
InstructionCost getArithmeticInstrCost(
unsigned Opcode, Type *Ty, TTI::TargetCostKind CostKind,
diff --git a/llvm/lib/Target/SystemZ/SystemZTargetTransformInfo.cpp b/llvm/lib/Target/SystemZ/SystemZTargetTransformInfo.cpp
index 5d00e56ae347..d6736319a404 100644
--- a/llvm/lib/Target/SystemZ/SystemZTargetTransformInfo.cpp
+++ b/llvm/lib/Target/SystemZ/SystemZTargetTransformInfo.cpp
@@ -996,7 +996,8 @@ InstructionCost SystemZTTIImpl::getCmpSelInstrCost(unsigned Opcode, Type *ValTy,
}
InstructionCost SystemZTTIImpl::getVectorInstrCost(unsigned Opcode, Type *Val,
- unsigned Index) {
+ unsigned Index, Value *Op0,
+ Value *Op1) {
// vlvgp will insert two grs into a vector register, so only count half the
// number of instructions.
if (Opcode == Instruction::InsertElement && Val->isIntOrIntVectorTy(64))
@@ -1012,7 +1013,7 @@ InstructionCost SystemZTTIImpl::getVectorInstrCost(unsigned Opcode, Type *Val,
return Cost;
}
- return BaseT::getVectorInstrCost(Opcode, Val, Index);
+ return BaseT::getVectorInstrCost(Opcode, Val, Index, Op0, Op1);
}
// Check if a load may be folded as a memory operand in its user.
diff --git a/llvm/lib/Target/SystemZ/SystemZTargetTransformInfo.h b/llvm/lib/Target/SystemZ/SystemZTargetTransformInfo.h
index 5ac3d8149a1d..33c3778d572c 100644
--- a/llvm/lib/Target/SystemZ/SystemZTargetTransformInfo.h
+++ b/llvm/lib/Target/SystemZ/SystemZTargetTransformInfo.h
@@ -107,8 +107,8 @@ public:
TTI::TargetCostKind CostKind,
const Instruction *I = nullptr);
using BaseT::getVectorInstrCost;
- InstructionCost getVectorInstrCost(unsigned Opcode, Type *Val,
- unsigned Index);
+ InstructionCost getVectorInstrCost(unsigned Opcode, Type *Val, unsigned Index,
+ Value *Op0, Value *Op1);
bool isFoldableLoad(const LoadInst *Ld, const Instruction *&FoldedValue);
InstructionCost
getMemoryOpCost(unsigned Opcode, Type *Src, MaybeAlign Alignment,
diff --git a/llvm/lib/Target/WebAssembly/WebAssemblyTargetTransformInfo.cpp b/llvm/lib/Target/WebAssembly/WebAssemblyTargetTransformInfo.cpp
index 38464627e742..b94dcd63ad8b 100644
--- a/llvm/lib/Target/WebAssembly/WebAssemblyTargetTransformInfo.cpp
+++ b/llvm/lib/Target/WebAssembly/WebAssemblyTargetTransformInfo.cpp
@@ -82,9 +82,10 @@ InstructionCost WebAssemblyTTIImpl::getArithmeticInstrCost(
InstructionCost WebAssemblyTTIImpl::getVectorInstrCost(unsigned Opcode,
Type *Val,
- unsigned Index) {
+ unsigned Index,
+ Value *Op0, Value *Op1) {
InstructionCost Cost =
- BasicTTIImplBase::getVectorInstrCost(Opcode, Val, Index);
+ BasicTTIImplBase::getVectorInstrCost(Opcode, Val, Index, Op0, Op1);
// SIMD128's insert/extract currently only take constant indices.
if (Index == -1u)
diff --git a/llvm/lib/Target/WebAssembly/WebAssemblyTargetTransformInfo.h b/llvm/lib/Target/WebAssembly/WebAssemblyTargetTransformInfo.h
index 7eed7ef44af7..4f54a762042f 100644
--- a/llvm/lib/Target/WebAssembly/WebAssemblyTargetTransformInfo.h
+++ b/llvm/lib/Target/WebAssembly/WebAssemblyTargetTransformInfo.h
@@ -66,8 +66,8 @@ public:
ArrayRef<const Value *> Args = ArrayRef<const Value *>(),
const Instruction *CxtI = nullptr);
using BaseT::getVectorInstrCost;
- InstructionCost getVectorInstrCost(unsigned Opcode, Type *Val,
- unsigned Index);
+ InstructionCost getVectorInstrCost(unsigned Opcode, Type *Val, unsigned Index,
+ Value *Op0, Value *Op1);
/// @}
diff --git a/llvm/lib/Target/X86/X86TargetTransformInfo.cpp b/llvm/lib/Target/X86/X86TargetTransformInfo.cpp
index 7d08a1654be7..5b6c7d86cebe 100644
--- a/llvm/lib/Target/X86/X86TargetTransformInfo.cpp
+++ b/llvm/lib/Target/X86/X86TargetTransformInfo.cpp
@@ -4257,7 +4257,8 @@ X86TTIImpl::getIntrinsicInstrCost(const IntrinsicCostAttributes &ICA,
}
InstructionCost X86TTIImpl::getVectorInstrCost(unsigned Opcode, Type *Val,
- unsigned Index) {
+ unsigned Index, Value *Op0,
+ Value *Op1) {
static const CostTblEntry SLMCostTbl[] = {
{ ISD::EXTRACT_VECTOR_ELT, MVT::i8, 4 },
{ ISD::EXTRACT_VECTOR_ELT, MVT::i16, 4 },
@@ -4330,6 +4331,14 @@ InstructionCost X86TTIImpl::getVectorInstrCost(unsigned Opcode, Type *Val,
}
}
+ MVT MScalarTy = LT.second.getScalarType();
+ auto IsCheapPInsrPExtrInsertPS = [&]() {
+ return (MScalarTy == MVT::i16 && ST->hasSSE2()) ||
+ (MScalarTy.isInteger() && ST->hasSSE41()) ||
+ (MScalarTy == MVT::f32 && ST->hasSSE41() &&
+ Opcode == Instruction::InsertElement);
+ };
+
if (Index == 0) {
// Floating point scalars are already located in index #0.
// Many insertions to #0 can fold away for scalar fp-ops, so let's assume
@@ -4337,6 +4346,20 @@ InstructionCost X86TTIImpl::getVectorInstrCost(unsigned Opcode, Type *Val,
if (ScalarType->isFloatingPointTy())
return RegisterFileMoveCost;
+ if (Opcode == Instruction::InsertElement &&
+ isa_and_nonnull<UndefValue>(Op0)) {
+ // Consider the gather cost to be cheap.
+ if (isa_and_nonnull<LoadInst>(Op1))
+ return RegisterFileMoveCost;
+ if (!IsCheapPInsrPExtrInsertPS()) {
+ // mov constant-to-GPR + movd/movq GPR -> XMM.
+ if (isa_and_nonnull<Constant>(Op1) && Op1->getType()->isIntegerTy())
+ return 2 + RegisterFileMoveCost;
+ // Assume movd/movq GPR -> XMM is relatively cheap on all targets.
+ return 1 + RegisterFileMoveCost;
+ }
+ }
+
// Assume movd/movq XMM -> GPR is relatively cheap on all targets.
if (ScalarType->isIntegerTy() && Opcode == Instruction::ExtractElement)
return 1 + RegisterFileMoveCost;
@@ -4344,19 +4367,13 @@ InstructionCost X86TTIImpl::getVectorInstrCost(unsigned Opcode, Type *Val,
int ISD = TLI->InstructionOpcodeToISD(Opcode);
assert(ISD && "Unexpected vector opcode");
- MVT MScalarTy = LT.second.getScalarType();
if (ST->useSLMArithCosts())
if (auto *Entry = CostTableLookup(SLMCostTbl, ISD, MScalarTy))
return Entry->Cost + RegisterFileMoveCost;
// Assume pinsr/pextr XMM <-> GPR is relatively cheap on all targets.
- if ((MScalarTy == MVT::i16 && ST->hasSSE2()) ||
- (MScalarTy.isInteger() && ST->hasSSE41()))
- return 1 + RegisterFileMoveCost;
-
// Assume insertps is relatively cheap on all targets.
- if (MScalarTy == MVT::f32 && ST->hasSSE41() &&
- Opcode == Instruction::InsertElement)
+ if (IsCheapPInsrPExtrInsertPS())
return 1 + RegisterFileMoveCost;
// For extractions we just need to shuffle the element to index 0, which
@@ -4383,7 +4400,8 @@ InstructionCost X86TTIImpl::getVectorInstrCost(unsigned Opcode, Type *Val,
if (Opcode == Instruction::ExtractElement && ScalarType->isPointerTy())
RegisterFileMoveCost += 1;
- return BaseT::getVectorInstrCost(Opcode, Val, Index) + RegisterFileMoveCost;
+ return BaseT::getVectorInstrCost(Opcode, Val, Index, Op0, Op1) +
+ RegisterFileMoveCost;
}
InstructionCost X86TTIImpl::getScalarizationOverhead(VectorType *Ty,
@@ -5155,7 +5173,8 @@ X86TTIImpl::getArithmeticReductionCost(unsigned Opcode, VectorType *ValTy,
}
// Add the final extract element to the cost.
- return ReductionCost + getVectorInstrCost(Instruction::ExtractElement, Ty, 0);
+ return ReductionCost + getVectorInstrCost(Instruction::ExtractElement, Ty, 0,
+ nullptr, nullptr);
}
InstructionCost X86TTIImpl::getMinMaxCost(Type *Ty, Type *CondTy,
@@ -5455,7 +5474,8 @@ X86TTIImpl::getMinMaxReductionCost(VectorType *ValTy, VectorType *CondTy,
}
// Add the final extract element to the cost.
- return MinMaxCost + getVectorInstrCost(Instruction::ExtractElement, Ty, 0);
+ return MinMaxCost + getVectorInstrCost(Instruction::ExtractElement, Ty, 0,
+ nullptr, nullptr);
}
/// Calculate the cost of materializing a 64-bit value. This helper
diff --git a/llvm/lib/Target/X86/X86TargetTransformInfo.h b/llvm/lib/Target/X86/X86TargetTransformInfo.h
index 666789e160dc..c189e503f4e8 100644
--- a/llvm/lib/Target/X86/X86TargetTransformInfo.h
+++ b/llvm/lib/Target/X86/X86TargetTransformInfo.h
@@ -147,8 +147,8 @@ public:
TTI::TargetCostKind CostKind,
const Instruction *I = nullptr);
using BaseT::getVectorInstrCost;
- InstructionCost getVectorInstrCost(unsigned Opcode, Type *Val,
- unsigned Index);
+ InstructionCost getVectorInstrCost(unsigned Opcode, Type *Val, unsigned Index,
+ Value *Op0, Value *Op1);
InstructionCost getScalarizationOverhead(VectorType *Ty,
const APInt &DemandedElts,
bool Insert, bool Extract);
diff --git a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
index e1b52aa2f80e..8ca422cfab9f 100644
--- a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
+++ b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
@@ -6745,9 +6745,24 @@ InstructionCost BoUpSLP::getEntryCost(const TreeEntry *E,
// broadcast.
assert(VecTy == FinalVecTy &&
"No reused scalars expected for broadcast.");
- return TTI->getShuffleCost(TargetTransformInfo::SK_Broadcast, VecTy,
- /*Mask=*/std::nullopt, CostKind, /*Index=*/0,
- /*SubTp=*/nullptr, /*Args=*/VL[0]);
+ const auto *It =
+ find_if(VL, [](Value *V) { return !isa<UndefValue>(V); });
+ // If all values are undefs - consider cost free.
+ if (It == VL.end())
+ return TTI::TCC_Free;
+ // Add broadcast for non-identity shuffle only.
+ bool NeedShuffle =
+ VL.front() != *It || !all_of(VL.drop_front(), UndefValue::classof);
+ InstructionCost InsertCost =
+ TTI->getVectorInstrCost(Instruction::InsertElement, VecTy,
+ /*Index=*/0, PoisonValue::get(VecTy), *It);
+ return InsertCost + (NeedShuffle
+ ? TTI->getShuffleCost(
+ TargetTransformInfo::SK_Broadcast, VecTy,
+ /*Mask=*/std::nullopt, CostKind,
+ /*Index=*/0,
+ /*SubTp=*/nullptr, /*Args=*/VL[0])
+ : TTI::TCC_Free);
}
InstructionCost ReuseShuffleCost = 0;
if (NeedToShuffleReuses)
diff --git a/llvm/test/Analysis/CostModel/X86/loop_v2-inseltpoison.ll b/llvm/test/Analysis/CostModel/X86/loop_v2-inseltpoison.ll
index 3e0f4c11aadf..1e96f97f16e9 100644
--- a/llvm/test/Analysis/CostModel/X86/loop_v2-inseltpoison.ll
+++ b/llvm/test/Analysis/CostModel/X86/loop_v2-inseltpoison.ll
@@ -20,7 +20,7 @@ vector.body: ; preds = %vector.body, %vecto
%5 = extractelement <2 x i64> %2, i32 1
%6 = getelementptr inbounds i32, ptr %A, i64 %5
%7 = load i32, ptr %4, align 4
- ;CHECK: cost of 1 {{.*}} insert
+ ;CHECK: cost of 0 {{.*}} insert
%8 = insertelement <2 x i32> poison, i32 %7, i32 0
%9 = load i32, ptr %6, align 4
;CHECK: cost of 1 {{.*}} insert
diff --git a/llvm/test/Analysis/CostModel/X86/loop_v2.ll b/llvm/test/Analysis/CostModel/X86/loop_v2.ll
index a9cbaaf2fd63..8f67b365ca9b 100644
--- a/llvm/test/Analysis/CostModel/X86/loop_v2.ll
+++ b/llvm/test/Analysis/CostModel/X86/loop_v2.ll
@@ -20,7 +20,7 @@ vector.body: ; preds = %vector.body, %vecto
%5 = extractelement <2 x i64> %2, i32 1
%6 = getelementptr inbounds i32, ptr %A, i64 %5
%7 = load i32, ptr %4, align 4
- ;CHECK: cost of 1 {{.*}} insert
+ ;CHECK: cost of 0 {{.*}} insert
%8 = insertelement <2 x i32> undef, i32 %7, i32 0
%9 = load i32, ptr %6, align 4
;CHECK: cost of 1 {{.*}} insert
diff --git a/llvm/test/Analysis/CostModel/X86/masked-intrinsic-cost-inseltpoison.ll b/llvm/test/Analysis/CostModel/X86/masked-intrinsic-cost-inseltpoison.ll
index 381e5b630812..897344d622d0 100644
--- a/llvm/test/Analysis/CostModel/X86/masked-intrinsic-cost-inseltpoison.ll
+++ b/llvm/test/Analysis/CostModel/X86/masked-intrinsic-cost-inseltpoison.ll
@@ -1907,7 +1907,7 @@ define <16 x float> @test_gather_16f32_ra_var_mask(<16 x ptr> %ptrs, <16 x i32>
define <16 x float> @test_gather_16f32_const_mask2(ptr %base, <16 x i32> %ind) {
; SSE2-LABEL: 'test_gather_16f32_const_mask2'
-; SSE2-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %broadcast.splatinsert = insertelement <16 x ptr> poison, ptr %base, i32 0
+; SSE2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %broadcast.splatinsert = insertelement <16 x ptr> poison, ptr %base, i32 0
; SSE2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %broadcast.splat = shufflevector <16 x ptr> %broadcast.splatinsert, <16 x ptr> poison, <16 x i32> zeroinitializer
; SSE2-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %sext_ind = sext <16 x i32> %ind to <16 x i64>
; SSE2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %gep.random = getelementptr float, <16 x ptr> %broadcast.splat, <16 x i64> %sext_ind
@@ -1966,7 +1966,7 @@ define <16 x float> @test_gather_16f32_const_mask2(ptr %base, <16 x i32> %ind) {
define void @test_scatter_16i32(ptr %base, <16 x i32> %ind, i16 %mask, <16 x i32>%val) {
; SSE2-LABEL: 'test_scatter_16i32'
-; SSE2-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %broadcast.splatinsert = insertelement <16 x ptr> poison, ptr %base, i32 0
+; SSE2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %broadcast.splatinsert = insertelement <16 x ptr> poison, ptr %base, i32 0
; SSE2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %broadcast.splat = shufflevector <16 x ptr> %broadcast.splatinsert, <16 x ptr> poison, <16 x i32> zeroinitializer
; SSE2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %gep.random = getelementptr i32, <16 x ptr> %broadcast.splat, <16 x i32> %ind
; SSE2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %imask = bitcast i16 %mask to <16 x i1>
diff --git a/llvm/test/Analysis/CostModel/X86/masked-intrinsic-cost.ll b/llvm/test/Analysis/CostModel/X86/masked-intrinsic-cost.ll
index 2fa41968e807..5f22b2e39f94 100644
--- a/llvm/test/Analysis/CostModel/X86/masked-intrinsic-cost.ll
+++ b/llvm/test/Analysis/CostModel/X86/masked-intrinsic-cost.ll
@@ -1907,7 +1907,7 @@ define <16 x float> @test_gather_16f32_ra_var_mask(<16 x ptr> %ptrs, <16 x i32>
define <16 x float> @test_gather_16f32_const_mask2(ptr %base, <16 x i32> %ind) {
; SSE2-LABEL: 'test_gather_16f32_const_mask2'
-; SSE2-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %broadcast.splatinsert = insertelement <16 x ptr> undef, ptr %base, i32 0
+; SSE2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %broadcast.splatinsert = insertelement <16 x ptr> undef, ptr %base, i32 0
; SSE2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %broadcast.splat = shufflevector <16 x ptr> %broadcast.splatinsert, <16 x ptr> undef, <16 x i32> zeroinitializer
; SSE2-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %sext_ind = sext <16 x i32> %ind to <16 x i64>
; SSE2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %gep.random = getelementptr float, <16 x ptr> %broadcast.splat, <16 x i64> %sext_ind
@@ -1966,7 +1966,7 @@ define <16 x float> @test_gather_16f32_const_mask2(ptr %base, <16 x i32> %ind) {
define void @test_scatter_16i32(ptr %base, <16 x i32> %ind, i16 %mask, <16 x i32>%val) {
; SSE2-LABEL: 'test_scatter_16i32'
-; SSE2-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %broadcast.splatinsert = insertelement <16 x ptr> undef, ptr %base, i32 0
+; SSE2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %broadcast.splatinsert = insertelement <16 x ptr> undef, ptr %base, i32 0
; SSE2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %broadcast.splat = shufflevector <16 x ptr> %broadcast.splatinsert, <16 x ptr> undef, <16 x i32> zeroinitializer
; SSE2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %gep.random = getelementptr i32, <16 x ptr> %broadcast.splat, <16 x i32> %ind
; SSE2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %imask = bitcast i16 %mask to <16 x i1>
diff --git a/llvm/test/Analysis/CostModel/X86/vector-insert-inseltpoison.ll b/llvm/test/Analysis/CostModel/X86/vector-insert-inseltpoison.ll
index 2296b3d5b0c4..e6a4de688186 100644
--- a/llvm/test/Analysis/CostModel/X86/vector-insert-inseltpoison.ll
+++ b/llvm/test/Analysis/CostModel/X86/vector-insert-inseltpoison.ll
@@ -382,58 +382,58 @@ define i32 @insert_i64(i32 %arg) {
define i32 @insert_i32(i32 %arg) {
; SSE2-LABEL: 'insert_i32'
; SSE2-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v2i32_a = insertelement <2 x i32> poison, i32 undef, i32 %arg
-; SSE2-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v2i32_0 = insertelement <2 x i32> poison, i32 undef, i32 0
+; SSE2-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v2i32_0 = insertelement <2 x i32> poison, i32 undef, i32 0
; SSE2-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v2i32_1 = insertelement <2 x i32> poison, i32 undef, i32 1
; SSE2-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v4i32_a = insertelement <4 x i32> poison, i32 undef, i32 %arg
-; SSE2-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v4i32_0 = insertelement <4 x i32> poison, i32 undef, i32 0
+; SSE2-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v4i32_0 = insertelement <4 x i32> poison, i32 undef, i32 0
; SSE2-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v4i32_3 = insertelement <4 x i32> poison, i32 undef, i32 3
; SSE2-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %v8i32_a = insertelement <8 x i32> poison, i32 undef, i32 %arg
-; SSE2-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v8i32_0 = insertelement <8 x i32> poison, i32 undef, i32 0
+; SSE2-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v8i32_0 = insertelement <8 x i32> poison, i32 undef, i32 0
; SSE2-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v8i32_3 = insertelement <8 x i32> poison, i32 undef, i32 3
-; SSE2-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v8i32_4 = insertelement <8 x i32> poison, i32 undef, i32 4
+; SSE2-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v8i32_4 = insertelement <8 x i32> poison, i32 undef, i32 4
; SSE2-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v8i32_7 = insertelement <8 x i32> poison, i32 undef, i32 7
; SSE2-NEXT: Cost Model: Found an estimated cost of 9 for instruction: %v16i32_a = insertelement <16 x i32> poison, i32 undef, i32 %arg
-; SSE2-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v16i32_0 = insertelement <16 x i32> poison, i32 undef, i32 0
+; SSE2-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v16i32_0 = insertelement <16 x i32> poison, i32 undef, i32 0
; SSE2-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v16i32_3 = insertelement <16 x i32> poison, i32 undef, i32 3
-; SSE2-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v16i32_8 = insertelement <16 x i32> poison, i32 undef, i32 8
+; SSE2-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v16i32_8 = insertelement <16 x i32> poison, i32 undef, i32 8
; SSE2-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v16i32_15 = insertelement <16 x i32> poison, i32 undef, i32 15
; SSE2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
;
; SSE3-LABEL: 'insert_i32'
; SSE3-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v2i32_a = insertelement <2 x i32> poison, i32 undef, i32 %arg
-; SSE3-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v2i32_0 = insertelement <2 x i32> poison, i32 undef, i32 0
+; SSE3-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v2i32_0 = insertelement <2 x i32> poison, i32 undef, i32 0
; SSE3-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v2i32_1 = insertelement <2 x i32> poison, i32 undef, i32 1
; SSE3-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v4i32_a = insertelement <4 x i32> poison, i32 undef, i32 %arg
-; SSE3-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v4i32_0 = insertelement <4 x i32> poison, i32 undef, i32 0
+; SSE3-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v4i32_0 = insertelement <4 x i32> poison, i32 undef, i32 0
; SSE3-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v4i32_3 = insertelement <4 x i32> poison, i32 undef, i32 3
; SSE3-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %v8i32_a = insertelement <8 x i32> poison, i32 undef, i32 %arg
-; SSE3-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v8i32_0 = insertelement <8 x i32> poison, i32 undef, i32 0
+; SSE3-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v8i32_0 = insertelement <8 x i32> poison, i32 undef, i32 0
; SSE3-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v8i32_3 = insertelement <8 x i32> poison, i32 undef, i32 3
-; SSE3-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v8i32_4 = insertelement <8 x i32> poison, i32 undef, i32 4
+; SSE3-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v8i32_4 = insertelement <8 x i32> poison, i32 undef, i32 4
; SSE3-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v8i32_7 = insertelement <8 x i32> poison, i32 undef, i32 7
; SSE3-NEXT: Cost Model: Found an estimated cost of 9 for instruction: %v16i32_a = insertelement <16 x i32> poison, i32 undef, i32 %arg
-; SSE3-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v16i32_0 = insertelement <16 x i32> poison, i32 undef, i32 0
+; SSE3-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v16i32_0 = insertelement <16 x i32> poison, i32 undef, i32 0
; SSE3-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v16i32_3 = insertelement <16 x i32> poison, i32 undef, i32 3
-; SSE3-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v16i32_8 = insertelement <16 x i32> poison, i32 undef, i32 8
+; SSE3-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v16i32_8 = insertelement <16 x i32> poison, i32 undef, i32 8
; SSE3-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v16i32_15 = insertelement <16 x i32> poison, i32 undef, i32 15
; SSE3-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
;
; SSSE3-LABEL: 'insert_i32'
; SSSE3-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v2i32_a = insertelement <2 x i32> poison, i32 undef, i32 %arg
-; SSSE3-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v2i32_0 = insertelement <2 x i32> poison, i32 undef, i32 0
+; SSSE3-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v2i32_0 = insertelement <2 x i32> poison, i32 undef, i32 0
; SSSE3-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v2i32_1 = insertelement <2 x i32> poison, i32 undef, i32 1
; SSSE3-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v4i32_a = insertelement <4 x i32> poison, i32 undef, i32 %arg
-; SSSE3-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v4i32_0 = insertelement <4 x i32> poison, i32 undef, i32 0
+; SSSE3-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v4i32_0 = insertelement <4 x i32> poison, i32 undef, i32 0
; SSSE3-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v4i32_3 = insertelement <4 x i32> poison, i32 undef, i32 3
; SSSE3-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %v8i32_a = insertelement <8 x i32> poison, i32 undef, i32 %arg
-; SSSE3-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v8i32_0 = insertelement <8 x i32> poison, i32 undef, i32 0
+; SSSE3-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v8i32_0 = insertelement <8 x i32> poison, i32 undef, i32 0
; SSSE3-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v8i32_3 = insertelement <8 x i32> poison, i32 undef, i32 3
-; SSSE3-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v8i32_4 = insertelement <8 x i32> poison, i32 undef, i32 4
+; SSSE3-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v8i32_4 = insertelement <8 x i32> poison, i32 undef, i32 4
; SSSE3-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v8i32_7 = insertelement <8 x i32> poison, i32 undef, i32 7
; SSSE3-NEXT: Cost Model: Found an estimated cost of 9 for instruction: %v16i32_a = insertelement <16 x i32> poison, i32 undef, i32 %arg
-; SSSE3-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v16i32_0 = insertelement <16 x i32> poison, i32 undef, i32 0
+; SSSE3-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v16i32_0 = insertelement <16 x i32> poison, i32 undef, i32 0
; SSSE3-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v16i32_3 = insertelement <16 x i32> poison, i32 undef, i32 3
-; SSSE3-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v16i32_8 = insertelement <16 x i32> poison, i32 undef, i32 8
+; SSSE3-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v16i32_8 = insertelement <16 x i32> poison, i32 undef, i32 8
; SSSE3-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v16i32_15 = insertelement <16 x i32> poison, i32 undef, i32 15
; SSSE3-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
;
@@ -664,100 +664,100 @@ define i32 @insert_i16(i32 %arg) {
define i32 @insert_i8(i32 %arg) {
; SSE2-LABEL: 'insert_i8'
; SSE2-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v2i8_a = insertelement <2 x i8> poison, i8 undef, i32 %arg
-; SSE2-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v2i8_0 = insertelement <2 x i8> poison, i8 undef, i32 0
+; SSE2-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v2i8_0 = insertelement <2 x i8> poison, i8 undef, i32 0
; SSE2-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v2i8_3 = insertelement <2 x i8> poison, i8 undef, i32 1
; SSE2-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v4i8_a = insertelement <4 x i8> poison, i8 undef, i32 %arg
-; SSE2-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %v4i8_0 = insertelement <4 x i8> poison, i8 undef, i32 0
+; SSE2-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v4i8_0 = insertelement <4 x i8> poison, i8 undef, i32 0
; SSE2-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %v4i8_3 = insertelement <4 x i8> poison, i8 undef, i32 3
; SSE2-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v8i8_a = insertelement <8 x i8> poison, i8 undef, i32 %arg
-; SSE2-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %v8i8_0 = insertelement <8 x i8> poison, i8 undef, i32 0
+; SSE2-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v8i8_0 = insertelement <8 x i8> poison, i8 undef, i32 0
; SSE2-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %v8i8_7 = insertelement <8 x i8> poison, i8 undef, i32 7
; SSE2-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v16i8_a = insertelement <16 x i8> poison, i8 undef, i32 %arg
-; SSE2-NEXT: Cost Model: Found an estimated cost of 14 for instruction: %v16i8_0 = insertelement <16 x i8> poison, i8 undef, i32 0
+; SSE2-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v16i8_0 = insertelement <16 x i8> poison, i8 undef, i32 0
; SSE2-NEXT: Cost Model: Found an estimated cost of 14 for instruction: %v16i8_8 = insertelement <16 x i8> poison, i8 undef, i32 8
; SSE2-NEXT: Cost Model: Found an estimated cost of 14 for instruction: %v16i8_15 = insertelement <16 x i8> poison, i8 undef, i32 15
; SSE2-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %v32i8_a = insertelement <32 x i8> poison, i8 undef, i32 %arg
-; SSE2-NEXT: Cost Model: Found an estimated cost of 14 for instruction: %v32i8_0 = insertelement <32 x i8> poison, i8 undef, i32 0
+; SSE2-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v32i8_0 = insertelement <32 x i8> poison, i8 undef, i32 0
; SSE2-NEXT: Cost Model: Found an estimated cost of 14 for instruction: %v32i8_7 = insertelement <32 x i8> poison, i8 undef, i32 7
; SSE2-NEXT: Cost Model: Found an estimated cost of 14 for instruction: %v32i8_8 = insertelement <32 x i8> poison, i8 undef, i32 8
; SSE2-NEXT: Cost Model: Found an estimated cost of 14 for instruction: %v32i8_15 = insertelement <32 x i8> poison, i8 undef, i32 15
; SSE2-NEXT: Cost Model: Found an estimated cost of 14 for instruction: %v32i8_24 = insertelement <32 x i8> poison, i8 undef, i32 24
; SSE2-NEXT: Cost Model: Found an estimated cost of 14 for instruction: %v32i8_31 = insertelement <32 x i8> poison, i8 undef, i32 31
; SSE2-NEXT: Cost Model: Found an estimated cost of 9 for instruction: %v64i8_a = insertelement <64 x i8> poison, i8 undef, i32 %arg
-; SSE2-NEXT: Cost Model: Found an estimated cost of 14 for instruction: %v64i8_0 = insertelement <64 x i8> poison, i8 undef, i32 0
+; SSE2-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v64i8_0 = insertelement <64 x i8> poison, i8 undef, i32 0
; SSE2-NEXT: Cost Model: Found an estimated cost of 14 for instruction: %v64i8_7 = insertelement <64 x i8> poison, i8 undef, i32 7
; SSE2-NEXT: Cost Model: Found an estimated cost of 14 for instruction: %v64i8_8 = insertelement <64 x i8> poison, i8 undef, i32 8
; SSE2-NEXT: Cost Model: Found an estimated cost of 14 for instruction: %v64i8_15 = insertelement <64 x i8> poison, i8 undef, i32 15
; SSE2-NEXT: Cost Model: Found an estimated cost of 14 for instruction: %v64i8_24 = insertelement <64 x i8> poison, i8 undef, i32 24
; SSE2-NEXT: Cost Model: Found an estimated cost of 14 for instruction: %v64i8_31 = insertelement <64 x i8> poison, i8 undef, i32 31
-; SSE2-NEXT: Cost Model: Found an estimated cost of 14 for instruction: %v64i8_32 = insertelement <64 x i8> poison, i8 undef, i32 32
-; SSE2-NEXT: Cost Model: Found an estimated cost of 14 for instruction: %v64i8_48 = insertelement <64 x i8> poison, i8 undef, i32 48
+; SSE2-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v64i8_32 = insertelement <64 x i8> poison, i8 undef, i32 32
+; SSE2-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v64i8_48 = insertelement <64 x i8> poison, i8 undef, i32 48
; SSE2-NEXT: Cost Model: Found an estimated cost of 14 for instruction: %v64i8_63 = insertelement <64 x i8> poison, i8 undef, i32 63
</cut>
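
For readers who want the gist of the SLPVectorizer.cpp hunk without the LLVM plumbing, here is a minimal standalone sketch of the new broadcast cost rule. It is not LLVM code: ToyCosts, Lane and broadcastGatherCost are hypothetical names, the two cost numbers are supplied by the caller rather than queried from TTI, and every defined lane is assumed to hold the same scalar (the splat case the hunk handles).

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical stand-in for the two TTI queries made by the patch.
struct ToyCosts {
  int InsertIntoPoisonLane0; // insertelement <N x T> poison, x, 0
  int Broadcast;             // SK_Broadcast-style shuffle
};

// A lane is either a defined scalar (an arbitrary id) or undef (nullopt).
using Lane = std::optional<int>;

// Mirrors the shape of the patched broadcast branch:
//   all lanes undef          -> free
//   otherwise                -> initial insert, plus a broadcast shuffle
//                               unless the result is already an identity,
//                               i.e. lane 0 is defined and lanes 1.. are undef.
int broadcastGatherCost(const std::vector<Lane> &VL, const ToyCosts &C) {
  assert(!VL.empty() && "expected at least one lane");

  // Locate the first defined lane.
  std::size_t First = 0;
  while (First < VL.size() && !VL[First].has_value())
    ++First;
  if (First == VL.size())
    return 0; // nothing to materialize

  // A shuffle is needed if the scalar is not already in lane 0,
  // or if any lane after lane 0 is defined.
  bool NeedShuffle = First != 0;
  for (std::size_t I = 1; I < VL.size() && !NeedShuffle; ++I)
    NeedShuffle = VL[I].has_value();

  return C.InsertIntoPoisonLane0 + (NeedShuffle ? C.Broadcast : 0);
}
```

With illustrative costs of 1 for the insert and 2 for the shuffle, a full four-lane splat is charged 3, while a vector whose only defined lane is lane 0 is charged just the insert. Before the patch only the shuffle was counted for this case.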