[llvm-branch-commits] [clang] Ofast deprecation clarifications (PR #101005)

2024-07-29 Thread Sjoerd Meijer via llvm-branch-commits

https://github.com/sjoerdmeijer created 
https://github.com/llvm/llvm-project/pull/101005

Following up on the RFC discussion, this clarifies that the main purpose 
and effect of the -Ofast deprecation is to discourage its use, and that 
everything else is more or less open for discussion, e.g. there is no 
timeline yet for removal.

>From 7357ef4d5b346d0c317ff09c6700fa944f6ae770 Mon Sep 17 00:00:00 2001
From: Sjoerd Meijer 
Date: Mon, 29 Jul 2024 17:04:48 +0530
Subject: [PATCH] Ofast deprecation clarifications

Following up on the RFC discussion, this clarifies that the main purpose
and effect of the -Ofast deprecation is to discourage its use, and that
everything else is more or less open for discussion, e.g. there is no
timeline yet for removal.
---
 clang/docs/CommandGuide/clang.rst |  7 +--
 clang/docs/ReleaseNotes.rst   | 10 ++
 2 files changed, 11 insertions(+), 6 deletions(-)

diff --git a/clang/docs/CommandGuide/clang.rst 
b/clang/docs/CommandGuide/clang.rst
index 663aca1f6ddcb..6ce340b20c252 100644
--- a/clang/docs/CommandGuide/clang.rst
+++ b/clang/docs/CommandGuide/clang.rst
@@ -429,8 +429,11 @@ Code Generation Options
 
 :option:`-Ofast` Enables all the optimizations from :option:`-O3` along
 with other aggressive optimizations that may violate strict compliance with
-language standards. This is deprecated in favor of :option:`-O3`
-in combination with :option:`-ffast-math`.
+language standards. This option is deprecated since Clang 19: a warning is
+emitted suggesting :option:`-O3` in combination with :option:`-ffast-math`
+instead if non-standard math behavior is intended. There is no timeline yet
+for removal; the aim is to discourage its use because of the
+compliance-violating optimizations it enables.
 
 :option:`-Os` Like :option:`-O2` with extra optimizations to reduce code
 size.
diff --git a/clang/docs/ReleaseNotes.rst b/clang/docs/ReleaseNotes.rst
index 71d615553c613..430fa77218954 100644
--- a/clang/docs/ReleaseNotes.rst
+++ b/clang/docs/ReleaseNotes.rst
@@ -495,10 +495,12 @@ New Compiler Flags
 Deprecated Compiler Flags
 -
 
-- The ``-Ofast`` command-line option has been deprecated. This option both
-  enables the ``-O3`` optimization-level, as well as enabling non-standard
-  ``-ffast-math`` behaviors. As such, it is somewhat misleading as an
-  "optimization level". Users are advised to switch to ``-O3 -ffast-math`` if
+- The ``-Ofast`` command-line option has been deprecated, but there is no
+  timeline for removal yet; the main effect of the deprecation warning is to
+  discourage its use. The problem with ``-Ofast`` is that it enables both the
+  ``-O3`` optimization level and non-standard ``-ffast-math`` behaviors,
+  which makes it misleading as an optimization level.
+  Users are advised to switch to ``-O3 -ffast-math`` if
   the use of non-standard math behavior is intended, and ``-O3`` otherwise.
   See `RFC `_ for 
details.
 

___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] Ofast deprecation clarifications (PR #101005)

2024-07-29 Thread Sjoerd Meijer via llvm-branch-commits

sjoerdmeijer wrote:

> I think we need these changes to be against main and not 19.x, aside from the 
> release notes (otherwise we're losing the documentation in 20.x). 

Oh right, yes, missed that, makes sense. Will split this up, let's do the 
documentation update here first as suggested. 

https://github.com/llvm/llvm-project/pull/101005


[llvm-branch-commits] [clang] release/19.x: Ofast deprecation clarifications (#101005) (PR #101663)

2024-08-02 Thread Sjoerd Meijer via llvm-branch-commits

https://github.com/sjoerdmeijer commented:

Excellent, thanks, LGTM

https://github.com/llvm/llvm-project/pull/101663


[llvm-branch-commits] [llvm] 815dd4b - [AArch64] Add Cortex CPU subtarget features for instruction fusion.

2021-01-25 Thread Sjoerd Meijer via llvm-branch-commits

Author: Sjoerd Meijer
Date: 2021-01-25T09:11:29Z
New Revision: 815dd4b2920887741f905c5922e5bbf935348cce

URL: 
https://github.com/llvm/llvm-project/commit/815dd4b2920887741f905c5922e5bbf935348cce
DIFF: 
https://github.com/llvm/llvm-project/commit/815dd4b2920887741f905c5922e5bbf935348cce.diff

LOG: [AArch64] Add Cortex CPU subtarget features for instruction fusion.

This adds subtarget features for AES, literal, and compare and branch
instruction fusion for different Cortex CPUs.

Patch by: Cassie Jones.

Differential Revision: https://reviews.llvm.org/D94457

Added: 


Modified: 
llvm/lib/Target/AArch64/AArch64.td
llvm/lib/Target/AArch64/AArch64MacroFusion.cpp
llvm/lib/Target/AArch64/AArch64Subtarget.h
llvm/test/CodeGen/AArch64/misched-fusion-addr.ll
llvm/test/CodeGen/AArch64/misched-fusion-aes.ll
llvm/test/CodeGen/AArch64/misched-fusion-lit.ll

Removed: 




diff  --git a/llvm/lib/Target/AArch64/AArch64.td 
b/llvm/lib/Target/AArch64/AArch64.td
index 15c7130b24f3..762855207d2b 100644
--- a/llvm/lib/Target/AArch64/AArch64.td
+++ b/llvm/lib/Target/AArch64/AArch64.td
@@ -218,6 +218,10 @@ def FeatureArithmeticCbzFusion : SubtargetFeature<
 "arith-cbz-fusion", "HasArithmeticCbzFusion", "true",
 "CPU fuses arithmetic + cbz/cbnz operations">;
 
+def FeatureCmpBccFusion : SubtargetFeature<
+"cmp-bcc-fusion", "HasCmpBccFusion", "true",
+"CPU fuses cmp+bcc operations">;
+
 def FeatureFuseAddress : SubtargetFeature<
 "fuse-address", "HasFuseAddress", "true",
 "CPU fuses address generation and memory operations">;
@@ -615,6 +619,9 @@ def ProcA65 : SubtargetFeature<"a65", "ARMProcFamily", 
"CortexA65",
FeatureDotProd,
FeatureFPARMv8,
FeatureFullFP16,
+   FeatureFuseAddress,
+   FeatureFuseAES,
+   FeatureFuseLiterals,
FeatureNEON,
FeatureRAS,
FeatureRCPC,
@@ -627,6 +634,7 @@ def ProcA72 : SubtargetFeature<"a72", "ARMProcFamily", 
"CortexA72",
FeatureCrypto,
FeatureFPARMv8,
FeatureFuseAES,
+   FeatureFuseLiterals,
FeatureNEON,
FeaturePerfMon
]>;
@@ -658,6 +666,7 @@ def ProcA76 : SubtargetFeature<"a76", "ARMProcFamily", 
"CortexA76",
"Cortex-A76 ARM processors", [
 HasV8_2aOps,
 FeatureFPARMv8,
+FeatureFuseAES,
 FeatureNEON,
 FeatureRCPC,
 FeatureCrypto,
@@ -669,7 +678,9 @@ def ProcA76 : SubtargetFeature<"a76", "ARMProcFamily", 
"CortexA76",
 def ProcA77 : SubtargetFeature<"a77", "ARMProcFamily", "CortexA77",
"Cortex-A77 ARM processors", [
 HasV8_2aOps,
+FeatureCmpBccFusion,
 FeatureFPARMv8,
+FeatureFuseAES,
 FeatureNEON, FeatureRCPC,
 FeatureCrypto,
 FeatureFullFP16,
@@ -680,6 +691,7 @@ def ProcA78 : SubtargetFeature<"cortex-a78", 
"ARMProcFamily",
"CortexA78",
"Cortex-A78 ARM processors", [
HasV8_2aOps,
+   FeatureCmpBccFusion,
FeatureCrypto,
FeatureFPARMv8,
FeatureFuseAES,
@@ -696,6 +708,7 @@ def ProcA78C : SubtargetFeature<"cortex-a78c", 
"ARMProcFamily",
 "CortexA78C",
 "Cortex-A78C ARM processors", [
 HasV8_2aOps,
+FeatureCmpBccFusion,
 FeatureCrypto,
 FeatureDotProd,
 FeatureFlagM,
@@ -723,6 +736,7 @@ def ProcR82 : SubtargetFeature<"cortex-r82", 
"ARMProcFamily",
 def ProcX1 : SubtargetFeature<"cortex-x1", "ARMProcFamily", "CortexX1",
   "Cortex-X1 ARM processors", [
   HasV8_2aOps,
+  FeatureCmpBccFusion,
   FeatureCryp

[llvm-branch-commits] [llvm] 8af859d - [MachineLoop] New helper isLoopInvariant()

2021-01-08 Thread Sjoerd Meijer via llvm-branch-commits

Author: Sjoerd Meijer
Date: 2021-01-08T09:04:56Z
New Revision: 8af859d514fa0ef4a75b3c3dfb1ee8f42ac9bd04

URL: 
https://github.com/llvm/llvm-project/commit/8af859d514fa0ef4a75b3c3dfb1ee8f42ac9bd04
DIFF: 
https://github.com/llvm/llvm-project/commit/8af859d514fa0ef4a75b3c3dfb1ee8f42ac9bd04.diff

LOG: [MachineLoop] New helper isLoopInvariant()

This factors out the code from MachineLICM that determines whether an
instruction is loop-invariant into a generally useful helper, so that it
can be used elsewhere too.

Differential Revision: https://reviews.llvm.org/D94082

Added: 


Modified: 
llvm/include/llvm/CodeGen/MachineLoopInfo.h
llvm/lib/CodeGen/MachineLICM.cpp
llvm/lib/CodeGen/MachineLoopInfo.cpp

Removed: 




diff  --git a/llvm/include/llvm/CodeGen/MachineLoopInfo.h 
b/llvm/include/llvm/CodeGen/MachineLoopInfo.h
index 8a93f91ae54d..c7491d4191de 100644
--- a/llvm/include/llvm/CodeGen/MachineLoopInfo.h
+++ b/llvm/include/llvm/CodeGen/MachineLoopInfo.h
@@ -67,6 +67,12 @@ class MachineLoop : public LoopBase {
   /// it returns an unknown location.
   DebugLoc getStartLoc() const;
 
+  /// Returns true if the instruction is loop invariant.
+  /// I.e., all virtual register operands are defined outside of the loop,
+  /// physical registers aren't accessed explicitly, and there are no side
+  /// effects that aren't captured by the operands or other flags.
+  bool isLoopInvariant(MachineInstr &I) const;
+
   void dump() const;
 
 private:

diff  --git a/llvm/lib/CodeGen/MachineLICM.cpp 
b/llvm/lib/CodeGen/MachineLICM.cpp
index 7c356cf0e15b..c06bc39b4940 100644
--- a/llvm/lib/CodeGen/MachineLICM.cpp
+++ b/llvm/lib/CodeGen/MachineLICM.cpp
@@ -1079,60 +1079,12 @@ bool MachineLICMBase::IsLICMCandidate(MachineInstr &I) {
 }
 
 /// Returns true if the instruction is loop invariant.
-/// I.e., all virtual register operands are defined outside of the loop,
-/// physical registers aren't accessed explicitly, and there are no side
-/// effects that aren't captured by the operands or other flags.
 bool MachineLICMBase::IsLoopInvariantInst(MachineInstr &I) {
   if (!IsLICMCandidate(I)) {
 LLVM_DEBUG(dbgs() << "LICM: Instruction not a LICM candidate\n");
 return false;
   }
-
-  // The instruction is loop invariant if all of its operands are.
-  for (const MachineOperand &MO : I.operands()) {
-if (!MO.isReg())
-  continue;
-
-Register Reg = MO.getReg();
-if (Reg == 0) continue;
-
-// Don't hoist an instruction that uses or defines a physical register.
-if (Register::isPhysicalRegister(Reg)) {
-  if (MO.isUse()) {
-// If the physreg has no defs anywhere, it's just an ambient register
-// and we can freely move its uses. Alternatively, if it's allocatable,
-// it could get allocated to something with a def during allocation.
-// However, if the physreg is known to always be caller saved/restored
-// then this use is safe to hoist.
-if (!MRI->isConstantPhysReg(Reg) &&
-!(TRI->isCallerPreservedPhysReg(Reg.asMCReg(), *I.getMF(
-  return false;
-// Otherwise it's safe to move.
-continue;
-  } else if (!MO.isDead()) {
-// A def that isn't dead. We can't move it.
-return false;
-  } else if (CurLoop->getHeader()->isLiveIn(Reg)) {
-// If the reg is live into the loop, we can't hoist an instruction
-// which would clobber it.
-return false;
-  }
-}
-
-if (!MO.isUse())
-  continue;
-
-assert(MRI->getVRegDef(Reg) &&
-   "Machine instr not mapped for this vreg?!");
-
-// If the loop contains the definition of an operand, then the instruction
-// isn't loop invariant.
-if (CurLoop->contains(MRI->getVRegDef(Reg)))
-  return false;
-  }
-
-  // If we got this far, the instruction is loop invariant!
-  return true;
+  return CurLoop->isLoopInvariant(I);
 }
 
 /// Return true if the specified instruction is used by a phi node and hoisting

diff  --git a/llvm/lib/CodeGen/MachineLoopInfo.cpp 
b/llvm/lib/CodeGen/MachineLoopInfo.cpp
index 0c1439da9b29..78480d0e1488 100644
--- a/llvm/lib/CodeGen/MachineLoopInfo.cpp
+++ b/llvm/lib/CodeGen/MachineLoopInfo.cpp
@@ -16,11 +16,14 @@
 #include "llvm/CodeGen/MachineLoopInfo.h"
 #include "llvm/Analysis/LoopInfoImpl.h"
 #include "llvm/CodeGen/MachineDominators.h"
+#include "llvm/CodeGen/MachineRegisterInfo.h"
 #include "llvm/CodeGen/Passes.h"
+#include "llvm/CodeGen/TargetSubtargetInfo.h"
 #include "llvm/Config/llvm-config.h"
 #include "llvm/InitializePasses.h"
 #include "llvm/Support/Debug.h"
 #include "llvm/Support/raw_ostream.h"
+
 using namespace llvm;
 
 // Explicitly instantiate methods in LoopInfoImpl.h for MI-level Loops.
@@ -146,6 +149,59 @@ MachineLoopInfo::findLoopPreheader(MachineLoop *L,
   return Preheader;
 }
 
+bool MachineLoop::isLoopInvariant(MachineInstr &I) co

[llvm-branch-commits] [llvm] 9a6de74 - [MachineLICM] Add llvm debug messages to SinkIntoLoop. NFC.

2020-12-22 Thread Sjoerd Meijer via llvm-branch-commits

Author: Sjoerd Meijer
Date: 2020-12-22T09:19:43Z
New Revision: 9a6de74d5a9e11a7865ce4873ff3297b7efbb673

URL: 
https://github.com/llvm/llvm-project/commit/9a6de74d5a9e11a7865ce4873ff3297b7efbb673
DIFF: 
https://github.com/llvm/llvm-project/commit/9a6de74d5a9e11a7865ce4873ff3297b7efbb673.diff

LOG: [MachineLICM] Add llvm debug messages to SinkIntoLoop. NFC.

I am investigating sinking instructions back into the loop under high
register pressure. This is just a first NFC step to add some debug
messages that allow tracing of the decision making.

Added: 


Modified: 
llvm/lib/CodeGen/MachineLICM.cpp

Removed: 




diff  --git a/llvm/lib/CodeGen/MachineLICM.cpp 
b/llvm/lib/CodeGen/MachineLICM.cpp
index bc7bb66a82fb6..7c356cf0e15b0 100644
--- a/llvm/lib/CodeGen/MachineLICM.cpp
+++ b/llvm/lib/CodeGen/MachineLICM.cpp
@@ -800,8 +800,13 @@ void MachineLICMBase::SinkIntoLoop() {
I != Preheader->instr_end(); ++I) {
 // We need to ensure that we can safely move this instruction into the 
loop.
 // As such, it must not have side-effects, e.g. such as a call has.
-if (IsLoopInvariantInst(*I) && !HasLoopPHIUse(&*I))
+LLVM_DEBUG(dbgs() << "LICM: Analysing sink candidate: " << *I);
+if (IsLoopInvariantInst(*I) && !HasLoopPHIUse(&*I)) {
+  LLVM_DEBUG(dbgs() << "LICM: Added as sink candidate.\n");
   Candidates.push_back(&*I);
+  continue;
+}
+LLVM_DEBUG(dbgs() << "LICM: Not added as sink candidate.\n");
   }
 
   for (MachineInstr *I : Candidates) {
@@ -811,8 +816,11 @@ void MachineLICMBase::SinkIntoLoop() {
 if (!MRI->hasOneDef(MO.getReg()))
   continue;
 bool CanSink = true;
-MachineBasicBlock *B = nullptr;
+MachineBasicBlock *SinkBlock = nullptr;
+LLVM_DEBUG(dbgs() << "LICM: Try sinking: " << *I);
+
 for (MachineInstr &MI : MRI->use_instructions(MO.getReg())) {
+  LLVM_DEBUG(dbgs() << "LICM:Analysing use: "; MI.dump());
   // FIXME: Come up with a proper cost model that estimates whether sinking
   // the instruction (and thus possibly executing it on every loop
   // iteration) is more expensive than a register.
@@ -821,24 +829,40 @@ void MachineLICMBase::SinkIntoLoop() {
 CanSink = false;
 break;
   }
-  if (!B) {
-B = MI.getParent();
+  if (!SinkBlock) {
+SinkBlock = MI.getParent();
+LLVM_DEBUG(dbgs() << "LICM:   Setting sink block to: "
+  << printMBBReference(*SinkBlock) << "\n");
 continue;
   }
-  B = DT->findNearestCommonDominator(B, MI.getParent());
-  if (!B) {
+  SinkBlock = DT->findNearestCommonDominator(SinkBlock, MI.getParent());
+  if (!SinkBlock) {
+LLVM_DEBUG(dbgs() << "LICM:   Can't find nearest dominator\n");
 CanSink = false;
 break;
   }
+  LLVM_DEBUG(dbgs() << "LICM:   Setting nearest common dom block: " <<
+ printMBBReference(*SinkBlock) << "\n");
+}
+if (!CanSink) {
+  LLVM_DEBUG(dbgs() << "LICM: Can't sink instruction.\n");
+  continue;
 }
-if (!CanSink || !B || B == Preheader)
+if (!SinkBlock) {
+  LLVM_DEBUG(dbgs() << "LICM: Not sinking, can't find sink block.\n");
   continue;
+}
+if (SinkBlock == Preheader) {
+  LLVM_DEBUG(dbgs() << "LICM: Not sinking, sink block is the preheader\n");
+  continue;
+}
 
-LLVM_DEBUG(dbgs() << "Sinking to " << printMBBReference(*B) << " from "
-  << printMBBReference(*I->getParent()) << ": " << *I);
-B->splice(B->getFirstNonPHI(), Preheader, I);
+LLVM_DEBUG(dbgs() << "LICM: Sinking to " << printMBBReference(*SinkBlock)
+  << " from " << printMBBReference(*I->getParent())
+  << ": " << *I);
+SinkBlock->splice(SinkBlock->getFirstNonPHI(), Preheader, I);
 
-// The instruction is is moved from its basic block, so do not retain the
+// The instruction is moved from its basic block, so do not retain the
 // debug information.
 assert(!I->isDebugInstr() && "Should not sink debug inst");
 I->setDebugLoc(DebugLoc());
@@ -1028,6 +1052,7 @@ bool MachineLICMBase::IsLICMCandidate(MachineInstr &I) {
   bool DontMoveAcrossStore = true;
   if ((!I.isSafeToMove(AA, DontMoveAcrossStore)) &&
   !(HoistConstStores && isInvariantStore(I, TRI, MRI))) {
+LLVM_DEBUG(dbgs() << "LICM: Instruction not safe to move.\n");
 return false;
   }
 
@@ -1038,8 +1063,10 @@ bool MachineLICMBase::IsLICMCandidate(MachineInstr &I) {
   // indexed load from a jump table.
   // Stores and side effects are already checked by isSafeToMove.
   if (I.mayLoad() && !mayLoadFromGOTOrConstantPool(I) &&
-  !IsGuaranteedToExecute(I.getParent()))
+  !IsGuaranteedToExecute(I.getParent())) {
+LLVM_DEBUG(dbgs() << "LICM: Load not guaranteed to execute.\n");
 return false;
+  }
 
   // Convergent attribute

[llvm-branch-commits] [llvm] b9b62c2 - [AArch64] Add a test for MachineLICM SinkIntoLoop. NFC.

2020-12-22 Thread Sjoerd Meijer via llvm-branch-commits

Author: Sjoerd Meijer
Date: 2020-12-22T12:22:24Z
New Revision: b9b62c28677d2c812604e29bab27c1e2a2144e4b

URL: 
https://github.com/llvm/llvm-project/commit/b9b62c28677d2c812604e29bab27c1e2a2144e4b
DIFF: 
https://github.com/llvm/llvm-project/commit/b9b62c28677d2c812604e29bab27c1e2a2144e4b.diff

LOG: [AArch64] Add a test for MachineLICM SinkIntoLoop. NFC.

Added: 
llvm/test/CodeGen/AArch64/machine-licm-sink-instr.ll

Modified: 


Removed: 




diff  --git a/llvm/test/CodeGen/AArch64/machine-licm-sink-instr.ll 
b/llvm/test/CodeGen/AArch64/machine-licm-sink-instr.ll
new file mode 100644
index ..f8d53a574dd2
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/machine-licm-sink-instr.ll
@@ -0,0 +1,176 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc < %s -mtriple=aarch64 -sink-insts-to-avoid-spills | FileCheck %s
+
+target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"
+
+@A = external dso_local global [100 x i32], align 4
+
+define i32 @sink_load_and_copy(i32 %n) {
+; CHECK-LABEL: sink_load_and_copy:
+; CHECK:   // %bb.0: // %entry
+; CHECK-NEXT:stp x30, x21, [sp, #-32]! // 16-byte Folded Spill
+; CHECK-NEXT:stp x20, x19, [sp, #16] // 16-byte Folded Spill
+; CHECK-NEXT:.cfi_def_cfa_offset 32
+; CHECK-NEXT:.cfi_offset w19, -8
+; CHECK-NEXT:.cfi_offset w20, -16
+; CHECK-NEXT:.cfi_offset w21, -24
+; CHECK-NEXT:.cfi_offset w30, -32
+; CHECK-NEXT:mov w19, w0
+; CHECK-NEXT:cmp w0, #1 // =1
+; CHECK-NEXT:b.lt .LBB0_3
+; CHECK-NEXT:  // %bb.1: // %for.body.preheader
+; CHECK-NEXT:adrp x8, A
+; CHECK-NEXT:ldr w21, [x8, :lo12:A]
+; CHECK-NEXT:mov w20, w19
+; CHECK-NEXT:  .LBB0_2: // %for.body
+; CHECK-NEXT:// =>This Inner Loop Header: Depth=1
+; CHECK-NEXT:mov w0, w21
+; CHECK-NEXT:bl _Z3usei
+; CHECK-NEXT:subs w19, w19, #1 // =1
+; CHECK-NEXT:sdiv w20, w20, w0
+; CHECK-NEXT:b.ne .LBB0_2
+; CHECK-NEXT:b .LBB0_4
+; CHECK-NEXT:  .LBB0_3:
+; CHECK-NEXT:mov w20, w19
+; CHECK-NEXT:  .LBB0_4: // %for.cond.cleanup
+; CHECK-NEXT:mov w0, w20
+; CHECK-NEXT:ldp x20, x19, [sp, #16] // 16-byte Folded Reload
+; CHECK-NEXT:ldp x30, x21, [sp], #32 // 16-byte Folded Reload
+; CHECK-NEXT:ret
+entry:
+  %cmp63 = icmp sgt i32 %n, 0
+  br i1 %cmp63, label %for.body.preheader, label %for.cond.cleanup
+
+for.body.preheader:
+  %0 = load i32, i32* getelementptr inbounds ([100 x i32], [100 x i32]* @A, 
i64 0, i64 0), align 4
+  br label %for.body
+
+for.cond.cleanup:
+  %sum.0.lcssa = phi i32 [ %n, %entry ], [ %div, %for.body ]
+  ret i32 %sum.0.lcssa
+
+for.body:
+  %lsr.iv = phi i32 [ %n, %for.body.preheader ], [ %lsr.iv.next, %for.body ]
+  %sum.065 = phi i32 [ %div, %for.body ], [ %n, %for.body.preheader ]
+  %call = tail call i32 @_Z3usei(i32 %0)
+  %div = sdiv i32 %sum.065, %call
+  %lsr.iv.next = add i32 %lsr.iv, -1
+  %exitcond.not = icmp eq i32 %lsr.iv.next, 0
+  br i1 %exitcond.not, label %for.cond.cleanup, label %for.body
+}
+
+define i32 @cant_sink_successive_call(i32 %n) {
+; CHECK-LABEL: cant_sink_successive_call:
+; CHECK:   // %bb.0: // %entry
+; CHECK-NEXT:stp x30, x21, [sp, #-32]! // 16-byte Folded Spill
+; CHECK-NEXT:stp x20, x19, [sp, #16] // 16-byte Folded Spill
+; CHECK-NEXT:.cfi_def_cfa_offset 32
+; CHECK-NEXT:.cfi_offset w19, -8
+; CHECK-NEXT:.cfi_offset w20, -16
+; CHECK-NEXT:.cfi_offset w21, -24
+; CHECK-NEXT:.cfi_offset w30, -32
+; CHECK-NEXT:mov w19, w0
+; CHECK-NEXT:cmp w0, #1 // =1
+; CHECK-NEXT:b.lt .LBB1_3
+; CHECK-NEXT:  // %bb.1: // %for.body.preheader
+; CHECK-NEXT:adrp x8, A
+; CHECK-NEXT:ldr w20, [x8, :lo12:A]
+; CHECK-NEXT:mov w0, w19
+; CHECK-NEXT:bl _Z3usei
+; CHECK-NEXT:mov w21, w19
+; CHECK-NEXT:  .LBB1_2: // %for.body
+; CHECK-NEXT:// =>This Inner Loop Header: Depth=1
+; CHECK-NEXT:mov w0, w20
+; CHECK-NEXT:bl _Z3usei
+; CHECK-NEXT:subs w19, w19, #1 // =1
+; CHECK-NEXT:sdiv w21, w21, w0
+; CHECK-NEXT:b.ne .LBB1_2
+; CHECK-NEXT:b .LBB1_4
+; CHECK-NEXT:  .LBB1_3:
+; CHECK-NEXT:mov w21, w19
+; CHECK-NEXT:  .LBB1_4: // %for.cond.cleanup
+; CHECK-NEXT:ldp x20, x19, [sp, #16] // 16-byte Folded Reload
+; CHECK-NEXT:mov w0, w21
+; CHECK-NEXT:ldp x30, x21, [sp], #32 // 16-byte Folded Reload
+; CHECK-NEXT:ret
+entry:
+  %cmp63 = icmp sgt i32 %n, 0
+  br i1 %cmp63, label %for.body.preheader, label %for.cond.cleanup
+
+for.body.preheader:
+  %0 = load i32, i32* getelementptr inbounds ([100 x i32], [100 x i32]* @A, 
i64 0, i64 0), align 4
+  %call0 = tail call i32 @_Z3usei(i32 %n)
+  br label %for.body
+
+for.cond.cleanup:
+  %sum.0.lcssa = phi i32 [ %n, %entry ], [ %div, %for.body ]
+  ret i32 %sum.0.lcssa
+
+for.body:
+  %lsr.iv = phi i32 [ %n, %for.body.preheader ], [ %lsr.iv.next, %for.body ]
+  %sum.065 = phi i32 [ %div, %for.body ], [ %n,

[llvm-branch-commits] [llvm] 33b2c88 - [LoopFlatten] Widen IV, support ZExt.

2020-11-23 Thread Sjoerd Meijer via llvm-branch-commits

Author: Sjoerd Meijer
Date: 2020-11-23T08:57:19Z
New Revision: 33b2c88fa8223dbf15846ce18cc957e33e0d67fc

URL: 
https://github.com/llvm/llvm-project/commit/33b2c88fa8223dbf15846ce18cc957e33e0d67fc
DIFF: 
https://github.com/llvm/llvm-project/commit/33b2c88fa8223dbf15846ce18cc957e33e0d67fc.diff

LOG: [LoopFlatten] Widen IV, support ZExt.

I disabled the widening in fa5cb4b because it ran into an assert related to
replacing values with different types. I forgot that an extend could also be
a zero-extend, which I have now handled. The approach now is to create and
insert a trunc of the outer loop's IV for each user, and use that to replace
the IV values.

Differential Revision: https://reviews.llvm.org/D91690

Added: 


Modified: 
llvm/lib/Transforms/Scalar/LoopFlatten.cpp
llvm/test/Transforms/LoopFlatten/widen-iv.ll

Removed: 




diff  --git a/llvm/lib/Transforms/Scalar/LoopFlatten.cpp 
b/llvm/lib/Transforms/Scalar/LoopFlatten.cpp
index 3d9617d43aea..aaff68436c13 100644
--- a/llvm/lib/Transforms/Scalar/LoopFlatten.cpp
+++ b/llvm/lib/Transforms/Scalar/LoopFlatten.cpp
@@ -35,6 +35,7 @@
 #include "llvm/Analysis/ValueTracking.h"
 #include "llvm/IR/Dominators.h"
 #include "llvm/IR/Function.h"
+#include "llvm/IR/IRBuilder.h"
 #include "llvm/IR/Module.h"
 #include "llvm/IR/PatternMatch.h"
 #include "llvm/IR/Verifier.h"
@@ -66,7 +67,7 @@ static cl::opt
 
 static cl::opt
 WidenIV("loop-flatten-widen-iv", cl::Hidden,
-cl::init(false),
+cl::init(true),
 cl::desc("Widen the loop induction variables, if possible, so "
  "overflow checks won't reject flattening"));
 
@@ -84,6 +85,9 @@ struct FlattenInfo {
   SmallPtrSet LinearIVUses;
   SmallPtrSet InnerPHIsToTransform;
 
+  // Whether this holds the flatten info before or after widening.
+  bool Widened = false;
+
   FlattenInfo(Loop *OL, Loop *IL) : OuterLoop(OL), InnerLoop(IL) {};
 };
 
@@ -335,8 +339,9 @@ static bool checkIVUsers(struct FlattenInfo &FI) {
   // transformation wouldn't be profitable.
 
   Value *InnerLimit = FI.InnerLimit;
-  if (auto *I = dyn_cast(InnerLimit))
-InnerLimit = I->getOperand(0);
+  if (FI.Widened &&
+  (isa(InnerLimit) || isa(InnerLimit)))
+InnerLimit = cast(InnerLimit)->getOperand(0);
 
   // Check that all uses of the inner loop's induction variable match the
   // expected pattern, recording the uses of the outer IV.
@@ -347,7 +352,7 @@ static bool checkIVUsers(struct FlattenInfo &FI) {
 
 // After widening the IVs, a trunc instruction might have been introduced, 
so
 // look through truncs.
-if (dyn_cast(U) ) {
+if (isa(U)) {
   if (!U->hasOneUse())
 return false;
   U = *U->user_begin();
@@ -544,20 +549,18 @@ static bool DoFlattenLoopPair(struct FlattenInfo &FI, 
DominatorTree *DT,
   BranchInst::Create(InnerExitBlock, InnerExitingBlock);
   DT->deleteEdge(InnerExitingBlock, FI.InnerLoop->getHeader());
 
-  auto HasSExtUser = [] (Value *V) -> Value * {
-for (User *U : V->users() )
-  if (dyn_cast(U))
-return U;
-return nullptr;
-  };
-
   // Replace all uses of the polynomial calculated from the two induction
   // variables with the one new one.
+  IRBuilder<> Builder(FI.OuterInductionPHI->getParent()->getTerminator());
   for (Value *V : FI.LinearIVUses) {
-// If the induction variable has been widened, look through the SExt.
-if (Value *U = HasSExtUser(V))
-  V = U;
-V->replaceAllUsesWith(FI.OuterInductionPHI);
+Value *OuterValue = FI.OuterInductionPHI;
+if (FI.Widened)
+  OuterValue = Builder.CreateTrunc(FI.OuterInductionPHI, V->getType(),
+   "flatten.trunciv");
+
+LLVM_DEBUG(dbgs() << "Replacing: "; V->dump();
+   dbgs() << "with:  "; OuterValue->dump());
+V->replaceAllUsesWith(OuterValue);
   }
 
   // Tell LoopInfo, SCEV and the pass manager that the inner loop has been
@@ -613,6 +616,8 @@ static bool CanWidenIV(struct FlattenInfo &FI, 
DominatorTree *DT,
 RecursivelyDeleteDeadPHINode(WideIVs[i].NarrowIV);
   }
   // After widening, rediscover all the loop components.
+  assert(Widened && "Widenend IV expected");
+  FI.Widened = true;
   return CanFlattenLoopPair(FI, DT, LI, SE, AC, TTI);
 }
 

diff  --git a/llvm/test/Transforms/LoopFlatten/widen-iv.ll 
b/llvm/test/Transforms/LoopFlatten/widen-iv.ll
index 579061833bf4..9ac9215a8d95 100644
--- a/llvm/test/Transforms/LoopFlatten/widen-iv.ll
+++ b/llvm/test/Transforms/LoopFlatten/widen-iv.ll
@@ -4,6 +4,9 @@
 
 target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"
 
+; DONTWIDEN-NOT:   %flatten.tripcount
+; DONTWIDEN-NOT:   %flatten.trunciv
+
 ; Function Attrs: nounwind
 define void @foo(i32* %A, i32 %N, i32 %M) {
 ; CHECK-LABEL: @foo(
@@ -22,13 +25,14 @@ define void @foo(i32* %A, i32 %N, i32 %M) {
 ; CHECK-NEXT:[[INDVAR1:%.*]] = p

[llvm-branch-commits] [llvm] a3b1fcb - [AArch64][CostModel] Precommit some vector mul tests. NFC.

2020-11-26 Thread Sjoerd Meijer via llvm-branch-commits

Author: Sjoerd Meijer
Date: 2020-11-26T13:23:11Z
New Revision: a3b1fcbc0cf5b70015d0f8aa983263d1ca84a8c8

URL: 
https://github.com/llvm/llvm-project/commit/a3b1fcbc0cf5b70015d0f8aa983263d1ca84a8c8
DIFF: 
https://github.com/llvm/llvm-project/commit/a3b1fcbc0cf5b70015d0f8aa983263d1ca84a8c8.diff

LOG: [AArch64][CostModel] Precommit some vector mul tests. NFC.

The cost model does not get the cost right for a mul with <2 x i64>
operands, i.e. there is no MUL.2d instruction, so this precommits some
tests before adjusting it.

Added: 
llvm/test/Analysis/CostModel/AArch64/mul.ll

Modified: 


Removed: 




diff  --git a/llvm/test/Analysis/CostModel/AArch64/mul.ll 
b/llvm/test/Analysis/CostModel/AArch64/mul.ll
new file mode 100644
index ..6a29c6d772d4
--- /dev/null
+++ b/llvm/test/Analysis/CostModel/AArch64/mul.ll
@@ -0,0 +1,211 @@
+; NOTE: Assertions have been autogenerated by 
utils/update_analyze_test_checks.py
+; RUN: opt < %s -mtriple=aarch64-unknown-linux-gnu -cost-model 
-cost-kind=throughput -analyze | FileCheck %s --check-prefix=THROUGHPUT
+
+; Verify the cost of (vector) multiply instructions.
+
+define <2 x i8> @t1(<2 x i8> %a, <2 x i8> %b)  {
+; THROUGHPUT-LABEL: 't1'
+; THROUGHPUT-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: 
%1 = mul <2 x i8> %a, %b
+; THROUGHPUT-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: 
ret <2 x i8> %1
+;
+  %1 = mul <2 x i8> %a, %b
+  ret <2 x i8> %1
+}
+
+define <4 x i8> @t2(<4 x i8> %a, <4 x i8> %b)  {
+; THROUGHPUT-LABEL: 't2'
+; THROUGHPUT-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: 
%1 = mul <4 x i8> %a, %b
+; THROUGHPUT-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: 
ret <4 x i8> %1
+;
+  %1 = mul <4 x i8> %a, %b
+  ret <4 x i8> %1
+}
+
+define <8 x i8> @t3(<8 x i8> %a, <8 x i8> %b)  {
+; THROUGHPUT-LABEL: 't3'
+; THROUGHPUT-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: 
%1 = mul <8 x i8> %a, %b
+; THROUGHPUT-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: 
ret <8 x i8> %1
+;
+  %1 = mul <8 x i8> %a, %b
+  ret <8 x i8> %1
+}
+
+define <16 x i8> @t4(<16 x i8> %a, <16 x i8> %b)  {
+; THROUGHPUT-LABEL: 't4'
+; THROUGHPUT-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: 
%1 = mul <16 x i8> %a, %b
+; THROUGHPUT-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: 
ret <16 x i8> %1
+;
+  %1 = mul <16 x i8> %a, %b
+  ret <16 x i8> %1
+}
+
+define <32 x i8> @t5(<32 x i8> %a, <32 x i8> %b)  {
+; THROUGHPUT-LABEL: 't5'
+; THROUGHPUT-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: 
%1 = mul <32 x i8> %a, %b
+; THROUGHPUT-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: 
ret <32 x i8> %1
+;
+  %1 = mul <32 x i8> %a, %b
+  ret <32 x i8> %1
+}
+
+define <2 x i16> @t6(<2 x i16> %a, <2 x i16> %b)  {
+; THROUGHPUT-LABEL: 't6'
+; THROUGHPUT-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: 
%1 = mul <2 x i16> %a, %b
+; THROUGHPUT-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: 
ret <2 x i16> %1
+;
+  %1 = mul <2 x i16> %a, %b
+  ret <2 x i16> %1
+}
+
+define <4 x i16> @t7(<4 x i16> %a, <4 x i16> %b)  {
+; THROUGHPUT-LABEL: 't7'
+; THROUGHPUT-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: 
%1 = mul <4 x i16> %a, %b
+; THROUGHPUT-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: 
ret <4 x i16> %1
+;
+  %1 = mul <4 x i16> %a, %b
+  ret <4 x i16> %1
+}
+
+define <8 x i16> @t8(<8 x i16> %a, <8 x i16> %b)  {
+; THROUGHPUT-LABEL: 't8'
+; THROUGHPUT-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: 
%1 = mul <8 x i16> %a, %b
+; THROUGHPUT-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: 
ret <8 x i16> %1
+;
+  %1 = mul <8 x i16> %a, %b
+  ret <8 x i16> %1
+}
+
+define <16 x i16> @t9(<16 x i16> %a, <16 x i16> %b)  {
+; THROUGHPUT-LABEL: 't9'
+; THROUGHPUT-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: 
%1 = mul <16 x i16> %a, %b
+; THROUGHPUT-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: 
ret <16 x i16> %1
+;
+  %1 = mul <16 x i16> %a, %b
+  ret <16 x i16> %1
+}
+
+define <2 x i32> @t10(<2 x i32> %a, <2 x i32> %b)  {
+; THROUGHPUT-LABEL: 't10'
+; THROUGHPUT-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: 
%1 = mul <2 x i32> %a, %b
+; THROUGHPUT-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: 
ret <2 x i32> %1
+;
+  %1 = mul <2 x i32> %a, %b
+  ret <2 x i32> %1
+}
+
+define <4 x i32> @t11(<4 x i32> %a, <4 x i32> %b)  {
+; THROUGHPUT-LABEL: 't11'
+; THROUGHPUT-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: 
%1 = mul <4 x i32> %a, %b
+; THROUGHPUT-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: 
ret <4 x i32> %1
+;
+  %1 = mul <4 x i32> %a, %b
+  ret <4 x i32> %1
+}
+
+define <8 x i32> @t12(<8 x i32> %a, <8 x i32> %b)  {
+; T

[llvm-branch-commits] [llvm] 10ad64a - [SLP] Dump Tree costs. NFC.

2020-11-27 Thread Sjoerd Meijer via llvm-branch-commits

Author: Sjoerd Meijer
Date: 2020-11-27T11:37:33Z
New Revision: 10ad64aa3bd912e638cd2c9721a6577a7f6b5ccb

URL: 
https://github.com/llvm/llvm-project/commit/10ad64aa3bd912e638cd2c9721a6577a7f6b5ccb
DIFF: 
https://github.com/llvm/llvm-project/commit/10ad64aa3bd912e638cd2c9721a6577a7f6b5ccb.diff

LOG: [SLP] Dump Tree costs. NFC.

This adds LLVM_DEBUG messages that dump the (intermediate) tree cost
calculations, which is useful for tracing how the final cost is
calculated.

Added: 


Modified: 
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp

Removed: 




diff  --git a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp 
b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
index 19c088b6ac9b..a68c8c10e8f3 100644
--- a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
+++ b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
@@ -1744,6 +1744,19 @@ class BoUpSLP {
 #endif
   };
 
+#ifndef NDEBUG
+  void dumpTreeCosts(TreeEntry *E, int ReuseShuffleCost, int VecCost,
+ int ScalarCost) const {
+dbgs() << "SLP: Calculated costs for Tree:\n"; E->dump();
+dbgs() << "SLP: Costs:\n";
+dbgs() << "SLP: ReuseShuffleCost = " << ReuseShuffleCost << "\n";
+dbgs() << "SLP: VectorCost = " << VecCost << "\n";
+dbgs() << "SLP: ScalarCost = " << ScalarCost << "\n";
+dbgs() << "SLP: ReuseShuffleCost + VecCost - ScalarCost = " <<
+   ReuseShuffleCost + VecCost - ScalarCost << "\n";
+  }
+#endif
+
   /// Create a new VectorizableTree entry.
   TreeEntry *newTreeEntry(ArrayRef VL, Optional 
Bundle,
   const InstructionsState &S,
@@ -3562,6 +3575,7 @@ int BoUpSLP::getEntryCost(TreeEntry *E) {
 TTI->getCastInstrCost(E->getOpcode(), VecTy, SrcVecTy,
   TTI::getCastContextHint(VL0), CostKind, VL0);
   }
+  LLVM_DEBUG(dumpTreeCosts(E, ReuseShuffleCost, VecCost, ScalarCost));
   return VecCost - ScalarCost;
 }
 case Instruction::FCmp:
@@ -3612,6 +3626,7 @@ int BoUpSLP::getEntryCost(TreeEntry *E) {
   CmpInst::BAD_ICMP_PREDICATE, CostKind);
 VecCost = std::min(VecCost, IntrinsicCost);
   }
+  LLVM_DEBUG(dumpTreeCosts(E, ReuseShuffleCost, VecCost, ScalarCost));
   return ReuseShuffleCost + VecCost - ScalarCost;
 }
 case Instruction::FNeg:
@@ -3681,6 +3696,7 @@ int BoUpSLP::getEntryCost(TreeEntry *E) {
   int VecCost = TTI->getArithmeticInstrCost(
   E->getOpcode(), VecTy, CostKind, Op1VK, Op2VK, Op1VP, Op2VP,
   Operands, VL0);
+  LLVM_DEBUG(dumpTreeCosts(E, ReuseShuffleCost, VecCost, ScalarCost));
   return ReuseShuffleCost + VecCost - ScalarCost;
 }
 case Instruction::GetElementPtr: {
@@ -3699,6 +3715,7 @@ int BoUpSLP::getEntryCost(TreeEntry *E) {
   int VecCost =
   TTI->getArithmeticInstrCost(Instruction::Add, VecTy, CostKind,
   Op1VK, Op2VK);
+  LLVM_DEBUG(dumpTreeCosts(E, ReuseShuffleCost, VecCost, ScalarCost));
   return ReuseShuffleCost + VecCost - ScalarCost;
 }
 case Instruction::Load: {
@@ -3726,6 +3743,7 @@ int BoUpSLP::getEntryCost(TreeEntry *E) {
 VecLdCost += TTI->getShuffleCost(
 TargetTransformInfo::SK_PermuteSingleSrc, VecTy);
   }
+  LLVM_DEBUG(dumpTreeCosts(E, ReuseShuffleCost, VecLdCost, ScalarLdCost));
   return ReuseShuffleCost + VecLdCost - ScalarLdCost;
 }
 case Instruction::Store: {
@@ -3747,6 +3765,7 @@ int BoUpSLP::getEntryCost(TreeEntry *E) {
 VecStCost += TTI->getShuffleCost(
 TargetTransformInfo::SK_PermuteSingleSrc, VecTy);
   }
+  LLVM_DEBUG(dumpTreeCosts(E, ReuseShuffleCost, VecStCost, ScalarStCost));
   return ReuseShuffleCost + VecStCost - ScalarStCost;
 }
 case Instruction::Call: {
@@ -3811,6 +3830,7 @@ int BoUpSLP::getEntryCost(TreeEntry *E) {
  TTI::CastContextHint::None, CostKind);
   }
   VecCost += TTI->getShuffleCost(TargetTransformInfo::SK_Select, VecTy, 0);
+  LLVM_DEBUG(dumpTreeCosts(E, ReuseShuffleCost, VecCost, ScalarCost));
   return ReuseShuffleCost + VecCost - ScalarCost;
 }
 default:
@@ -4034,10 +4054,11 @@ int BoUpSLP::getTreeCost() {
   continue;
 
 int C = getEntryCost(&TE);
+Cost += C;
 LLVM_DEBUG(dbgs() << "SLP: Adding cost " << C
   << " for bundle that starts with " << *TE.Scalars[0]
-  << ".\n");
-Cost += C;
+  << ".\n"
+  << "SLP: Current total cost = " << Cost << "\n");
   }
 
   SmallPtrSet ExtractCostCalculated;
@@ -5941,9 +5962,9 @@ bool 
SLPVectorizerPass::vectorizeStoreChain(ArrayRef Chain, BoUpSLP &R,
 
   int Cost = R.getTreeCost();
 
-  LLVM_DEBUG(dbgs() << "SLP: Found cost=" << Cost << " for VF=" << VF << "\n");
+  LLVM_DEBUG(dbgs() 

[llvm-branch-commits] [llvm] a2016dc - [AArch64][SLP] Precommit tests which would be better not to SLP vectorize. NFC.

2020-11-27 Thread Sjoerd Meijer via llvm-branch-commits

Author: Sjoerd Meijer
Date: 2020-11-27T13:43:16Z
New Revision: a2016dc887c5fce33f5a41eefadf0b15a02b08b6

URL: 
https://github.com/llvm/llvm-project/commit/a2016dc887c5fce33f5a41eefadf0b15a02b08b6
DIFF: 
https://github.com/llvm/llvm-project/commit/a2016dc887c5fce33f5a41eefadf0b15a02b08b6.diff

LOG: [AArch64][SLP] Precommit tests which would be better not to SLP vectorize. 
NFC.

Added: 
llvm/test/Transforms/SLPVectorizer/AArch64/mul.ll

Modified: 


Removed: 




diff  --git a/llvm/test/Transforms/SLPVectorizer/AArch64/mul.ll 
b/llvm/test/Transforms/SLPVectorizer/AArch64/mul.ll
new file mode 100644
index ..228a4d773f0c
--- /dev/null
+++ b/llvm/test/Transforms/SLPVectorizer/AArch64/mul.ll
@@ -0,0 +1,108 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
+; RUN: opt < %s -basic-aa -slp-vectorizer -S | FileCheck %s
+
+target datalayout = "e-m:e-i64:64-i128:128-n32:64-S128"
+target triple = "aarch64--linux-gnu"
+
+; These examples correspond to input code like:
+;
+;   void t(long * __restrict a, long * __restrict b) {
+; a[0] *= b[0];
+; a[1] *= b[1];
+;   }
+;
+; If we SLP vectorise this then we end up with something like this because we
+; don't have a mul.2d:
+;
+;ldr q0, [x1]
+;ldr q1, [x0]
+;fmovx8, d0
+;mov x10, v0.d[1]
+;fmovx9, d1
+;mov x11, v1.d[1]
+;mul x8, x9, x8
+;mul x9, x11, x10
+;fmovd0, x8
+;mov v0.d[1], x9
+;str q0, [x0]
+;ret
+;
+; but if we don't SLP vectorise these examples we get this which is smaller
+; and faster:
+;
+;ldp x8, x9, [x1]
+;ldp x10, x11, [x0]
+;mul x9, x11, x9
+;mul x8, x10, x8
+;stp x8, x9, [x0]
+;ret
+;
+; FIXME: don't SLP vectorise this.
+
+define void @mul(i64* noalias nocapture %a, i64* noalias nocapture readonly 
%b) {
+; CHECK-LABEL: @mul(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:[[ARRAYIDX2:%.*]] = getelementptr inbounds i64, i64* 
[[B:%.*]], i64 1
+; CHECK-NEXT:[[TMP0:%.*]] = bitcast i64* [[B]] to <2 x i64>*
+; CHECK-NEXT:[[TMP1:%.*]] = load <2 x i64>, <2 x i64>* [[TMP0]], align 8
+; CHECK-NEXT:[[ARRAYIDX3:%.*]] = getelementptr inbounds i64, i64* 
[[A:%.*]], i64 1
+; CHECK-NEXT:[[TMP2:%.*]] = bitcast i64* [[A]] to <2 x i64>*
+; CHECK-NEXT:[[TMP3:%.*]] = load <2 x i64>, <2 x i64>* [[TMP2]], align 8
+; CHECK-NEXT:[[TMP4:%.*]] = mul nsw <2 x i64> [[TMP3]], [[TMP1]]
+; CHECK-NEXT:[[TMP5:%.*]] = bitcast i64* [[A]] to <2 x i64>*
+; CHECK-NEXT:store <2 x i64> [[TMP4]], <2 x i64>* [[TMP5]], align 8
+; CHECK-NEXT:ret void
+;
+entry:
+  %0 = load i64, i64* %b, align 8
+  %1 = load i64, i64* %a, align 8
+  %mul = mul nsw i64 %1, %0
+  store i64 %mul, i64* %a, align 8
+  %arrayidx2 = getelementptr inbounds i64, i64* %b, i64 1
+  %2 = load i64, i64* %arrayidx2, align 8
+  %arrayidx3 = getelementptr inbounds i64, i64* %a, i64 1
+  %3 = load i64, i64* %arrayidx3, align 8
+  %mul4 = mul nsw i64 %3, %2
+  store i64 %mul4, i64* %arrayidx3, align 8
+  ret void
+}
+
+; Similar example, but now a multiply-accumulate:
+;
+;  void x (long * __restrict a, long * __restrict b) {
+;a[0] *= b[0];
+;a[1] *= b[1];
+;a[0] += b[0];
+;a[1] += b[1];
+;  }
+;
+define void @mac(i64* noalias nocapture %a, i64* noalias nocapture readonly 
%b) {
+; CHECK-LABEL: @mac(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:[[ARRAYIDX2:%.*]] = getelementptr inbounds i64, i64* 
[[B:%.*]], i64 1
+; CHECK-NEXT:[[TMP0:%.*]] = bitcast i64* [[B]] to <2 x i64>*
+; CHECK-NEXT:[[TMP1:%.*]] = load <2 x i64>, <2 x i64>* [[TMP0]], align 8
+; CHECK-NEXT:[[ARRAYIDX3:%.*]] = getelementptr inbounds i64, i64* 
[[A:%.*]], i64 1
+; CHECK-NEXT:[[TMP2:%.*]] = bitcast i64* [[A]] to <2 x i64>*
+; CHECK-NEXT:[[TMP3:%.*]] = load <2 x i64>, <2 x i64>* [[TMP2]], align 8
+; CHECK-NEXT:[[TMP4:%.*]] = mul nsw <2 x i64> [[TMP3]], [[TMP1]]
+; CHECK-NEXT:[[TMP5:%.*]] = add nsw <2 x i64> [[TMP4]], [[TMP1]]
+; CHECK-NEXT:[[TMP6:%.*]] = bitcast i64* [[A]] to <2 x i64>*
+; CHECK-NEXT:store <2 x i64> [[TMP5]], <2 x i64>* [[TMP6]], align 8
+; CHECK-NEXT:ret void
+;
+entry:
+  %0 = load i64, i64* %b, align 8
+  %1 = load i64, i64* %a, align 8
+  %mul = mul nsw i64 %1, %0
+  %arrayidx2 = getelementptr inbounds i64, i64* %b, i64 1
+  %2 = load i64, i64* %arrayidx2, align 8
+  %arrayidx3 = getelementptr inbounds i64, i64* %a, i64 1
+  %3 = load i64, i64* %arrayidx3, align 8
+  %mul4 = mul nsw i64 %3, %2
+  %add = add nsw i64 %mul, %0
+  store i64 %add, i64* %a, align 8
+  %add9 = add nsw i64 %mul4, %2
+  store i64 %add9, i64* %arrayidx3, align 8
+  ret void
+}



___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listin

[llvm-branch-commits] [llvm] 5110ff0 - [AArch64][CostModel] Fix cost for mul <2 x i64>

2020-11-30 Thread Sjoerd Meijer via llvm-branch-commits

Author: Sjoerd Meijer
Date: 2020-11-30T11:36:55Z
New Revision: 5110ff08176f29eefd7638e328d65dfd1c1ad042

URL: 
https://github.com/llvm/llvm-project/commit/5110ff08176f29eefd7638e328d65dfd1c1ad042
DIFF: 
https://github.com/llvm/llvm-project/commit/5110ff08176f29eefd7638e328d65dfd1c1ad042.diff

LOG: [AArch64][CostModel] Fix cost for mul <2 x i64>

This was modeled to have a cost of 1, but since we do not have a MUL.2d
instruction, it is scalarized into vector inserts/extracts and scalar muls.

Motivating precommitted test is test/Transforms/SLPVectorizer/AArch64/mul.ll,
which we don't want to SLP vectorize.

Test Transforms/LoopVectorize/AArch64/extractvalue-no-scalarization-required.ll
unfortunately needed changing, but the reason is documented in
LoopVectorize.cpp:6855:

  // The cost of executing VF copies of the scalar instruction. This opcode
  // is unknown. Assume that it is the same as 'mul'.

which I will address next as a follow-up to this.

Differential Revision: https://reviews.llvm.org/D92208

Added: 


Modified: 
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
llvm/test/Analysis/CostModel/AArch64/mul.ll

llvm/test/Transforms/LoopVectorize/AArch64/extractvalue-no-scalarization-required.ll
llvm/test/Transforms/SLPVectorizer/AArch64/mul.ll

Removed: 




diff  --git a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp 
b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
index 37a34023b8d0..d97570755291 100644
--- a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
+++ b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
@@ -644,8 +644,20 @@ int AArch64TTIImpl::getArithmeticInstrCost(
 }
 return Cost;
 
-  case ISD::ADD:
   case ISD::MUL:
+if (LT.second != MVT::v2i64)
+  return (Cost + 1) * LT.first;
+// Since we do not have a MUL.2d instruction, a mul <2 x i64> is expensive
+// as elements are extracted from the vectors and the muls scalarized.
+// As getScalarizationOverhead is a bit too pessimistic, we estimate the
+// cost for a i64 vector directly here, which is:
+// - four i64 extracts,
+// - two i64 inserts, and
+// - two muls.
+// So, for a v2i64 with LT.First = 1 the cost is 8, and for a v4i64 with
+// LT.first = 2 the cost is 16.
+return LT.first * 8;
+  case ISD::ADD:
   case ISD::XOR:
   case ISD::OR:
   case ISD::AND:

diff  --git a/llvm/test/Analysis/CostModel/AArch64/mul.ll 
b/llvm/test/Analysis/CostModel/AArch64/mul.ll
index 6a29c6d772d4..e98463a9fcf4 100644
--- a/llvm/test/Analysis/CostModel/AArch64/mul.ll
+++ b/llvm/test/Analysis/CostModel/AArch64/mul.ll
@@ -113,7 +113,7 @@ define <8 x i32> @t12(<8 x i32> %a, <8 x i32> %b)  {
 
 define <2 x i64> @t13(<2 x i64> %a, <2 x i64> %b)  {
 ; THROUGHPUT-LABEL: 't13'
-; THROUGHPUT-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: 
%1 = mul nsw <2 x i64> %a, %b
+; THROUGHPUT-NEXT:  Cost Model: Found an estimated cost of 8 for instruction: 
%1 = mul nsw <2 x i64> %a, %b
 ; THROUGHPUT-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: 
ret <2 x i64> %1
 ;
   %1 = mul nsw <2 x i64> %a, %b
@@ -122,7 +122,7 @@ define <2 x i64> @t13(<2 x i64> %a, <2 x i64> %b)  {
 
 define <4 x i64> @t14(<4 x i64> %a, <4 x i64> %b)  {
 ; THROUGHPUT-LABEL: 't14'
-; THROUGHPUT-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: 
%1 = mul nsw <4 x i64> %a, %b
+; THROUGHPUT-NEXT:  Cost Model: Found an estimated cost of 16 for instruction: 
%1 = mul nsw <4 x i64> %a, %b
 ; THROUGHPUT-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: 
ret <4 x i64> %1
 ;
   %1 = mul nsw <4 x i64> %a, %b

diff  --git 
a/llvm/test/Transforms/LoopVectorize/AArch64/extractvalue-no-scalarization-required.ll
 
b/llvm/test/Transforms/LoopVectorize/AArch64/extractvalue-no-scalarization-required.ll
index 80d2e282176a..37c1f4eec32a 100644
--- 
a/llvm/test/Transforms/LoopVectorize/AArch64/extractvalue-no-scalarization-required.ll
+++ 
b/llvm/test/Transforms/LoopVectorize/AArch64/extractvalue-no-scalarization-required.ll
@@ -9,8 +9,8 @@
 ; leaving cost 3 for scalarizing the result + 2 for executing the op with VF 2.
 
 ; CM: LV: Scalar loop costs: 7.
-; CM: LV: Found an estimated cost of 5 for VF 2 For instruction:   %a = 
extractvalue { i64, i64 } %sv, 0
-; CM-NEXT: LV: Found an estimated cost of 5 for VF 2 For instruction:   %b = 
extractvalue { i64, i64 } %sv, 1
+; CM: LV: Found an estimated cost of 19 for VF 2 For instruction:   %a = 
extractvalue { i64, i64 } %sv, 0
+; CM-NEXT: LV: Found an estimated cost of 19 for VF 2 For instruction:   %b = 
extractvalue { i64, i64 } %sv, 1
 
 ; Check that the extractvalue operands are actually free in vector code.
 

diff  --git a/llvm/test/Transforms/SLPVectorizer/AArch64/mul.ll 
b/llvm/test/Transforms/SLPVectorizer/AArch64/mul.ll
index 228a4d773f0c..7e941adc8cd5 100644
--- a/llvm/test/Transforms/SLPVectorizer/AArch64/mul.ll
++

[llvm-branch-commits] [llvm] f44ba25 - ExtractValue instruction costs

2020-12-01 Thread Sjoerd Meijer via llvm-branch-commits

Author: Sjoerd Meijer
Date: 2020-12-01T10:42:23Z
New Revision: f44ba251354f98d771cd0a4268db94db4315945b

URL: 
https://github.com/llvm/llvm-project/commit/f44ba251354f98d771cd0a4268db94db4315945b
DIFF: 
https://github.com/llvm/llvm-project/commit/f44ba251354f98d771cd0a4268db94db4315945b.diff

LOG: ExtractValue instruction costs

Instruction ExtractValue wasn't handled in
LoopVectorizationCostModel::getInstructionCost(). As a result, it was modeled
as a mul, which is not really accurate. Since it is free most of the time,
it now gets a cost of 0 via getInstructionCost().

This is a follow-up to D92208, which required changing this regression test.
In a follow-up I will look at InsertValue, which also isn't handled yet.

Differential Revision: https://reviews.llvm.org/D92317

Added: 


Modified: 
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

llvm/test/Transforms/LoopVectorize/AArch64/extractvalue-no-scalarization-required.ll

Removed: 




diff  --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp 
b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index ce4f57fb3945..d389e03e9c04 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -6853,6 +6853,8 @@ unsigned 
LoopVectorizationCostModel::getInstructionCost(Instruction *I,
   return std::min(CallCost, getVectorIntrinsicCost(CI, VF));
 return CallCost;
   }
+  case Instruction::ExtractValue:
+return TTI.getInstructionCost(I, TTI::TCK_RecipThroughput);
   default:
 // The cost of executing VF copies of the scalar instruction. This opcode
 // is unknown. Assume that it is the same as 'mul'.

diff  --git 
a/llvm/test/Transforms/LoopVectorize/AArch64/extractvalue-no-scalarization-required.ll
 
b/llvm/test/Transforms/LoopVectorize/AArch64/extractvalue-no-scalarization-required.ll
index 37c1f4eec32a..35c2ddbf3b65 100644
--- 
a/llvm/test/Transforms/LoopVectorize/AArch64/extractvalue-no-scalarization-required.ll
+++ 
b/llvm/test/Transforms/LoopVectorize/AArch64/extractvalue-no-scalarization-required.ll
@@ -8,9 +8,9 @@
 ; Check scalar cost for extractvalue. The constant and loop invariant operands 
are free,
 ; leaving cost 3 for scalarizing the result + 2 for executing the op with VF 2.
 
-; CM: LV: Scalar loop costs: 7.
-; CM: LV: Found an estimated cost of 19 for VF 2 For instruction:   %a = 
extractvalue { i64, i64 } %sv, 0
-; CM-NEXT: LV: Found an estimated cost of 19 for VF 2 For instruction:   %b = 
extractvalue { i64, i64 } %sv, 1
+; CM: LV: Scalar loop costs: 5.
+; CM: LV: Found an estimated cost of 0 for VF 2 For instruction:   %a = 
extractvalue { i64, i64 } %sv, 0
+; CM-NEXT: LV: Found an estimated cost of 0 for VF 2 For instruction:   %b = 
extractvalue { i64, i64 } %sv, 1
 
 ; Check that the extractvalue operands are actually free in vector code.
 
@@ -57,9 +57,9 @@ exit:
 ; Similar to the test case above, but checks getVectorCallCost as well.
 declare float @pow(float, float) readnone nounwind
 
-; CM: LV: Scalar loop costs: 16.
-; CM: LV: Found an estimated cost of 5 for VF 2 For instruction:   %a = 
extractvalue { float, float } %sv, 0
-; CM-NEXT: LV: Found an estimated cost of 5 for VF 2 For instruction:   %b = 
extractvalue { float, float } %sv, 1
+; CM: LV: Scalar loop costs: 14.
+; CM: LV: Found an estimated cost of 0 for VF 2 For instruction:   %a = 
extractvalue { float, float } %sv, 0
+; CM-NEXT: LV: Found an estimated cost of 0 for VF 2 For instruction:   %b = 
extractvalue { float, float } %sv, 1
 
 ; FORCED-LABEL: define void @test_getVectorCallCost
 
@@ -101,6 +101,3 @@ loop.body:
 exit:
   ret void
 }
-
-
-





[llvm-branch-commits] [llvm] 1e260f9 - [LICM][docs] Document that LICM is also a canonicalization transform. NFC.

2020-12-08 Thread Sjoerd Meijer via llvm-branch-commits

Author: Sjoerd Meijer
Date: 2020-12-08T11:56:35Z
New Revision: 1e260f955d3123351fc68de8c2dde02b1be6d14f

URL: 
https://github.com/llvm/llvm-project/commit/1e260f955d3123351fc68de8c2dde02b1be6d14f
DIFF: 
https://github.com/llvm/llvm-project/commit/1e260f955d3123351fc68de8c2dde02b1be6d14f.diff

LOG: [LICM][docs] Document that LICM is also a canonicalization transform. NFC.

This documents that LICM is a canonicalization transform, which we discussed
recently in:

http://lists.llvm.org/pipermail/llvm-dev/2020-December/147184.html

but which was also discussed earlier, e.g. in:

http://lists.llvm.org/pipermail/llvm-dev/2019-September/135058.html

Added: 


Modified: 
llvm/docs/Passes.rst
llvm/lib/Transforms/Scalar/LICM.cpp

Removed: 




diff  --git a/llvm/docs/Passes.rst b/llvm/docs/Passes.rst
index 202e3ab223d6..d146ce745282 100644
--- a/llvm/docs/Passes.rst
+++ b/llvm/docs/Passes.rst
@@ -720,6 +720,12 @@ into the preheader block, or by sinking code to the exit 
blocks if it is safe.
 This pass also promotes must-aliased memory locations in the loop to live in
 registers, thus hoisting and sinking "invariant" loads and stores.
 
+Hoisting operations out of loops is a canonicalization transform. It enables
+and simplifies subsequent optimizations in the middle-end. Rematerialization
+of hoisted instructions to reduce register pressure is the responsibility of
+the back-end, which has more accurate information about register pressure and
+also handles other optimizations than LICM that increase live-ranges.
+
 This pass uses alias analysis for two purposes:
 
 #. Moving loop invariant loads and calls out of loops.  If we can determine

diff  --git a/llvm/lib/Transforms/Scalar/LICM.cpp 
b/llvm/lib/Transforms/Scalar/LICM.cpp
index 9d90986c54ad..1b14bc972a9e 100644
--- a/llvm/lib/Transforms/Scalar/LICM.cpp
+++ b/llvm/lib/Transforms/Scalar/LICM.cpp
@@ -12,6 +12,13 @@
 // safe.  This pass also promotes must-aliased memory locations in the loop to
 // live in registers, thus hoisting and sinking "invariant" loads and stores.
 //
+// Hoisting operations out of loops is a canonicalization transform.  It
+// enables and simplifies subsequent optimizations in the middle-end.
+// Rematerialization of hoisted instructions to reduce register pressure is the
+// responsibility of the back-end, which has more accurate information about
+// register pressure and also handles other optimizations than LICM that
+// increase live-ranges.
+//
 // This pass uses alias analysis for two purposes:
 //
 //  1. Moving loop invariant loads and calls out of loops.  If we can determine





[llvm-branch-commits] [llvm] 99ad078 - [AArch64] Cortex-R82: remove crypto

2020-12-10 Thread Sjoerd Meijer via llvm-branch-commits

Author: Sjoerd Meijer
Date: 2020-12-10T12:54:51Z
New Revision: 99ad078b91ed601cd19c75a44106a4f86bfa1a41

URL: 
https://github.com/llvm/llvm-project/commit/99ad078b91ed601cd19c75a44106a4f86bfa1a41
DIFF: 
https://github.com/llvm/llvm-project/commit/99ad078b91ed601cd19c75a44106a4f86bfa1a41.diff

LOG: [AArch64] Cortex-R82: remove crypto

Remove the crypto target feature from Cortex-R82, because it doesn't have
crypto, and add LSE, which was missing, while we are at it.
This also removes crypto from the v8-R architecture description: that aligns
better with GCC, and since none of the R-cores have implemented crypto so far,
it is probably a more sensible default.

Differential Revision: https://reviews.llvm.org/D91994

Added: 


Modified: 
clang/lib/Driver/ToolChains/Arch/AArch64.cpp
clang/test/Preprocessor/aarch64-target-features.c
llvm/include/llvm/Support/AArch64TargetParser.def
llvm/unittests/Support/TargetParserTest.cpp

Removed: 




diff  --git a/clang/lib/Driver/ToolChains/Arch/AArch64.cpp 
b/clang/lib/Driver/ToolChains/Arch/AArch64.cpp
index fca6d95d361b..13e4cac292d0 100644
--- a/clang/lib/Driver/ToolChains/Arch/AArch64.cpp
+++ b/clang/lib/Driver/ToolChains/Arch/AArch64.cpp
@@ -317,8 +317,7 @@ void aarch64::getAArch64TargetFeatures(const Driver &D,
   NoCrypto = true;
   }
 
-  if (std::find(ItBegin, ItEnd, "+v8.4a") != ItEnd ||
-  std::find(ItBegin, ItEnd, "+v8r") != ItEnd) {
+  if (std::find(ItBegin, ItEnd, "+v8.4a") != ItEnd) {
 if (HasCrypto && !NoCrypto) {
   // Check if we have NOT disabled an algorithm with something like:
   //   +crypto, -algorithm

diff  --git a/clang/test/Preprocessor/aarch64-target-features.c 
b/clang/test/Preprocessor/aarch64-target-features.c
index f0b01f519a85..178098197d53 100644
--- a/clang/test/Preprocessor/aarch64-target-features.c
+++ b/clang/test/Preprocessor/aarch64-target-features.c
@@ -240,7 +240,7 @@
 // CHECK-MCPU-A57: "-cc1"{{.*}} "-triple" "aarch64{{.*}}" "-target-feature" 
"+neon" "-target-feature" "+crc" "-target-feature" "+crypto"
 // CHECK-MCPU-A72: "-cc1"{{.*}} "-triple" "aarch64{{.*}}" "-target-feature" 
"+neon" "-target-feature" "+crc" "-target-feature" "+crypto"
 // CHECK-MCPU-CORTEX-A73: "-cc1"{{.*}} "-triple" "aarch64{{.*}}" 
"-target-feature" "+neon" "-target-feature" "+crc" "-target-feature" "+crypto"
-// CHECK-MCPU-CORTEX-R82: "-cc1"{{.*}} "-triple" "aarch64{{.*}}" 
"-target-feature" "+v8r" "-target-feature" "+fp-armv8" "-target-feature" 
"+neon" "-target-feature" "+crc" "-target-feature" "+crypto" "-target-feature" 
"+dotprod" "-target-feature" "+fp16fml" "-target-feature" "+ras" 
"-target-feature" "+rdm" "-target-feature" "+rcpc" "-target-feature" 
"+fullfp16" "-target-feature" "+sm4" "-target-feature" "+sha3" 
"-target-feature" "+sha2" "-target-feature" "+aes"
+// CHECK-MCPU-CORTEX-R82: "-cc1"{{.*}} "-triple" "aarch64{{.*}}"  
"-target-feature" "+v8r" "-target-feature" "+fp-armv8" "-target-feature" 
"+neon" "-target-feature" "+crc" "-target-feature" "+dotprod" "-target-feature" 
"+fp16fml" "-target-feature" "+ras" "-target-feature" "+lse" "-target-feature" 
"+rdm" "-target-feature" "+rcpc" "-target-feature" "+fullfp16"
 // CHECK-MCPU-M1: "-cc1"{{.*}} "-triple" "aarch64{{.*}}" "-target-feature" 
"+neon" "-target-feature" "+crc" "-target-feature" "+crypto"
 // CHECK-MCPU-M4: "-cc1"{{.*}} "-triple" "aarch64{{.*}}" "-target-feature" 
"+neon" "-target-feature" "+crc" "-target-feature" "+crypto" "-target-feature" 
"+dotprod" "-target-feature" "+fullfp16"
 // CHECK-MCPU-KRYO: "-cc1"{{.*}} "-triple" "aarch64{{.*}}" "-target-feature" 
"+neon" "-target-feature" "+crc" "-target-feature" "+crypto"

diff  --git a/llvm/include/llvm/Support/AArch64TargetParser.def 
b/llvm/include/llvm/Support/AArch64TargetParser.def
index 7625f5a6f6ab..34b6f72d4621 100644
--- a/llvm/include/llvm/Support/AArch64TargetParser.def
+++ b/llvm/include/llvm/Support/AArch64TargetParser.def
@@ -51,14 +51,14 @@ AARCH64_ARCH("armv8.6-a", ARMV8_6A, "8.6-A", "v8.6a",
   AArch64::AEK_RDM  | AArch64::AEK_RCPC | AArch64::AEK_DOTPROD |
   AArch64::AEK_SM4  | AArch64::AEK_SHA3 | AArch64::AEK_BF16|
   AArch64::AEK_SHA2 | AArch64::AEK_AES  | AArch64::AEK_I8MM))
+// For v8-R, we do not enable crypto and align with GCC that enables a more
+// minimal set of optional architecture extensions.
 AARCH64_ARCH("armv8-r", ARMV8R, "8-R", "v8r",
  ARMBuildAttrs::CPUArch::v8_R, FK_CRYPTO_NEON_FP_ARMV8,
- (AArch64::AEK_CRC | AArch64::AEK_RDM  | AArch64::AEK_SSBS|
-  AArch64::AEK_CRYPTO  | AArch64::AEK_SM4  | AArch64::AEK_SHA3|
-  AArch64::AEK_SHA2| AArch64::AEK_AES  | AArch64::AEK_DOTPROD |
-  AArch64::AEK_FP  | AArch64::AEK_SIMD | AArch64::AEK_FP16|
-  AArch64::AEK_FP16FML | AArch64::AEK_RAS  | AArch64::AEK_RCPC|
-  AArch64::AEK_SB))
+   

[llvm-branch-commits] [llvm] [LoopInterchange] Improve profitability check for vectorization (PR #133672)

2025-04-04 Thread Sjoerd Meijer via llvm-branch-commits


@@ -80,6 +80,21 @@ enum class RuleTy {
   ForVectorization,
 };
 
+/// Store the information about if corresponding direction vector was negated

sjoerdmeijer wrote:

Before I keep reading the rest of this patch, I just wanted to share the first
question that I had. I was initially a bit confused and wondered why we need 2
booleans and 4 states if a direction vector's negated status can only be true
or false. But I now guess that the complication here is the unique entries in
the dependency matrix, is that right? If so, I am wondering whether it isn't
easier to keep all the entries and not make them unique. Making them unique
was a little optimisation that I added recently because I thought it would
help, but if it is now complicating things and we need to do all sorts of
gymnastics, we might as well keep all entries.

https://github.com/llvm/llvm-project/pull/133672


[llvm-branch-commits] [llvm] [LoopInterchange] Improve profitability check for vectorization (PR #133672)

2025-04-02 Thread Sjoerd Meijer via llvm-branch-commits


@@ -80,6 +80,21 @@ enum class RuleTy {
   ForVectorization,
 };
 
+/// Store the information about if corresponding direction vector was negated

sjoerdmeijer wrote:

I think duplicated direction vectors are always allowed. They don't add new or
different information, so they shouldn't affect the interpretation of the
dependence analysis in any way. The only thing they affect is processing the
same information again and again, so the only benefit of making them unique is
to avoid that. But if keeping all entries makes the logic easier, that is a
good reason not to make them unique. I think adding all the state here
complicates things, and if a simple map of original to negated helps, you've
certainly got my vote to simplify this.

https://github.com/llvm/llvm-project/pull/133672