[clang] [llvm] AMDGPU: Rename intrinsics and remove f16/bf16 versions for load transpose (PR #86313)

2024-03-22 Thread Sirish Pande via cfe-commits

srpande wrote:

There is no issue with changing the names in principle. Out of curiosity,
what is the rationale for using the more demangled names?

https://github.com/llvm/llvm-project/pull/86313
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] AMDGPU: Rename intrinsics and remove f16/bf16 versions for load transpose (PR #86313)

2024-03-25 Thread Sirish Pande via cfe-commits


@@ -18533,51 +18533,35 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID,
   }
   case AMDGPU::BI__builtin_amdgcn_global_load_tr_b64_i32:
   case AMDGPU::BI__builtin_amdgcn_global_load_tr_b64_v2i32:
-  case AMDGPU::BI__builtin_amdgcn_global_load_tr_b128_v4bf16:
-  case AMDGPU::BI__builtin_amdgcn_global_load_tr_b128_v4f16:
   case AMDGPU::BI__builtin_amdgcn_global_load_tr_b128_v4i16:
-  case AMDGPU::BI__builtin_amdgcn_global_load_tr_b128_v8bf16:
-  case AMDGPU::BI__builtin_amdgcn_global_load_tr_b128_v8f16:
   case AMDGPU::BI__builtin_amdgcn_global_load_tr_b128_v8i16: {
 

srpande wrote:

Unnecessary blank line here.

https://github.com/llvm/llvm-project/pull/86313


[clang] [llvm] AMDGPU: Rename intrinsics and remove f16/bf16 versions for load transpose (PR #86313)

2024-03-25 Thread Sirish Pande via cfe-commits

https://github.com/srpande approved this pull request.

LGTM

https://github.com/llvm/llvm-project/pull/86313


[clang] AMDGPU: Simplify EmitAMDGPUBuiltinExpr for load transposes, NFC (PR #86707)

2024-03-26 Thread Sirish Pande via cfe-commits

https://github.com/srpande approved this pull request.

It's a good change.

https://github.com/llvm/llvm-project/pull/86707


[clang-tools-extra] [SelectionDAG] Flags are dropped when creating a new FMUL (PR #66701)

2023-09-20 Thread Sirish Pande via cfe-commits

https://github.com/srpande updated 
https://github.com/llvm/llvm-project/pull/66701

>From a9fe01d82743879c41982aa170fef517dee99256 Mon Sep 17 00:00:00 2001
From: Sirish Pande 
Date: Fri, 15 Sep 2023 13:01:09 -0500
Subject: [PATCH] [SelectionDAG] Flags are dropped when creating a new FMUL

While simplifying some vector operators in DAG combine, we may
need to create new nodes for the simplified vectors. When we do,
we need to make sure that all the flags of the old node are
copied over to the new one.

Here's an example where the "contract" flag is dropped when a new
FMUL is created:

Replacing.2 t42: v2f32 = fmul contract t41, t38
With: t48: v2f32 = fmul t38, t38
---
 .../CodeGen/SelectionDAG/TargetLowering.cpp   |  5 ++--
 llvm/test/CodeGen/AMDGPU/fma.ll   | 23 +++
 2 files changed, 26 insertions(+), 2 deletions(-)

diff --git a/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp b/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
index 23c1486f711d727..608bd9427b2a40f 100644
--- a/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
@@ -2990,8 +2990,9 @@ bool TargetLowering::SimplifyDemandedVectorElts(
     SDValue NewOp1 = SimplifyMultipleUseDemandedVectorElts(Op1, DemandedElts,
                                                            TLO.DAG, Depth + 1);
     if (NewOp0 || NewOp1) {
-      SDValue NewOp = TLO.DAG.getNode(
-          Opcode, SDLoc(Op), VT, NewOp0 ? NewOp0 : Op0, NewOp1 ? NewOp1 : Op1);
+      SDValue NewOp =
+          TLO.DAG.getNode(Opcode, SDLoc(Op), VT, NewOp0 ? NewOp0 : Op0,
+                          NewOp1 ? NewOp1 : Op1, Op->getFlags());
       return TLO.CombineTo(Op, NewOp);
     }
     return false;
diff --git a/llvm/test/CodeGen/AMDGPU/fma.ll b/llvm/test/CodeGen/AMDGPU/fma.ll
index b1db04a7fd8863a..db221aedf754cba 100644
--- a/llvm/test/CodeGen/AMDGPU/fma.ll
+++ b/llvm/test/CodeGen/AMDGPU/fma.ll
@@ -154,3 +154,26 @@ define float @fold_fmul_distributive(float %x, float %y) {
   %fmul = fmul contract float %fadd, %x
   ret float %fmul
 }
+
+; test to make sure contract is not dropped such that we can generate fma from the following sequence.
+define amdgpu_kernel void @vec_mul_scalar_add_fma(<2 x float> %a, <2 x float> %b, float %c1, ptr addrspace(1) %inptr) {
+; GFX906-LABEL: vec_mul_scalar_add_fma:
+; GFX906:       ; %bb.0:
+; GFX906-NEXT:    s_load_dword s8, s[0:1], 0x34
+; GFX906-NEXT:    s_load_dwordx4 s[4:7], s[0:1], 0x24
+; GFX906-NEXT:    s_load_dwordx2 s[2:3], s[0:1], 0x3c
+; GFX906-NEXT:    v_mov_b32_e32 v0, 0
+; GFX906-NEXT:    s_waitcnt lgkmcnt(0)
+; GFX906-NEXT:    v_mov_b32_e32 v1, s8
+; GFX906-NEXT:    v_mov_b32_e32 v2, s6
+; GFX906-NEXT:    v_fmac_f32_e32 v1, s4, v2
+; GFX906-NEXT:    global_store_dword v0, v1, s[2:3] offset:4
+; GFX906-NEXT:    s_endpgm
+  %gep = getelementptr float, ptr addrspace(1) %inptr, i32 1
+  %c = shufflevector <2 x float> %a, <2 x float> poison, <2 x i32> zeroinitializer
+  %mul = fmul contract <2 x float> %c, %b
+  %elv = extractelement <2 x float> %mul, i64 0
+  %add = fadd contract float %elv, %c1
+  store float %add, ptr addrspace(1) %gep, align 4
+  ret void
+}




[clang-tools-extra] [SelectionDAG] Flags are dropped when creating a new FMUL (PR #66701)

2023-09-21 Thread Sirish Pande via cfe-commits

https://github.com/srpande closed 
https://github.com/llvm/llvm-project/pull/66701



[clang] [AMDGPU] Use 32-bit SGPR during save/restore of SCC (PR #68367)

2023-10-10 Thread Sirish Pande via cfe-commits

https://github.com/srpande updated 
https://github.com/llvm/llvm-project/pull/68367

>From a76a360c1d7fa0860944b6bfcb65ab3405c7b4c6 Mon Sep 17 00:00:00 2001
From: Sirish Pande 
Date: Thu, 28 Sep 2023 11:39:32 -0500
Subject: [PATCH 1/2] [AMDGPU] Save/Restore SCC bit across waterfall loop.

The waterfall loop overwrites the SCC bit of the status register.
Make sure the SCC bit is saved and restored. We need to save/restore
only in cases where SCC is live across the waterfall loop.

Change-Id: I3a37a62a948cfb0c879f06b490f934f7d5be7d96
---
 llvm/lib/Target/AMDGPU/SIInstrInfo.cpp| 51 ++-
 llvm/lib/Target/AMDGPU/SIInstrInfo.h  | 13 +++
 .../CodeGen/AMDGPU/waterfall_kills_scc.ll | 87 +++
 3 files changed, 150 insertions(+), 1 deletion(-)
 create mode 100644 llvm/test/CodeGen/AMDGPU/waterfall_kills_scc.ll

diff --git a/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp b/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
index 792f4695d288b5f..cd9ad4fcb21c5d4 100644
--- a/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
@@ -5094,6 +5094,39 @@ unsigned SIInstrInfo::getVALUOp(const MachineInstr &MI) const {
   "Unexpected scalar opcode without corresponding vector one!");
 }
 
+bool SIInstrInfo::isSCCDefinedBefore(MachineBasicBlock &MBB,
+                                     MachineBasicBlock::iterator Before) const {
+
+  for (MachineBasicBlock::iterator I = Before, B = MBB.begin(); I != B; --I) {
+    MachineInstr &MI = *I;
+    if (!MI.hasImplicitDef())
+      continue;
+    for (MachineOperand &Op : MI.implicit_operands()) {
+      if (Op.getReg() == AMDGPU::SCC && Op.isDef() && !Op.isDead())
+        return true;
+    }
+  }
+  return false;
+}
+
+bool SIInstrInfo::isSCCUsedAfter(MachineBasicBlock &MBB,
+                                 MachineBasicBlock::iterator After) const {
+  for (MachineBasicBlock::iterator I = After, E = MBB.end(); I != E; ++I) {
+    MachineInstr &MI = *I;
+    if (MI.hasRegisterImplicitUseOperand(AMDGPU::SCC))
+      return true;
+  }
+  return false;
+}
+
+bool SIInstrInfo::isSCCDefinedAndUsed(MachineBasicBlock &MBB,
+                                      MachineBasicBlock::iterator Before,
+                                      MachineBasicBlock::iterator After) const {
+  if (isSCCDefinedBefore(MBB, Before) && isSCCUsedAfter(MBB, After))
+    return true;
+  return false;
+}
+
 void SIInstrInfo::insertScratchExecCopy(MachineFunction &MF,
 MachineBasicBlock &MBB,
 MachineBasicBlock::iterator MBBI,
@@ -6014,6 +6047,16 @@ loadMBUFScalarOperandsFromVGPR(const SIInstrInfo &TII, MachineInstr &MI,
   unsigned MovExecOpc = ST.isWave32() ? AMDGPU::S_MOV_B32 : AMDGPU::S_MOV_B64;
   const auto *BoolXExecRC = TRI->getRegClass(AMDGPU::SReg_1_XEXECRegClassID);
 
+  // Save SCC. Waterfall Loop may overwrite SCC.
+  Register SaveSCCReg;
+  bool SCCDefined = false;
+  if ((SCCDefined = TII.isSCCDefinedAndUsed(MBB, Begin, End))) {
+    SaveSCCReg = MRI.createVirtualRegister(
+        TRI->getRegClass(AMDGPU::SReg_1_XEXECRegClassID));
+    BuildMI(MBB, Begin, DL, TII.get(AMDGPU::COPY), SaveSCCReg)
+        .addReg(AMDGPU::SCC);
+  }
+
   Register SaveExec = MRI.createVirtualRegister(BoolXExecRC);
 
   // Save the EXEC mask
@@ -6069,8 +6112,14 @@ loadMBUFScalarOperandsFromVGPR(const SIInstrInfo &TII, MachineInstr &MI,
 
   emitLoadScalarOpsFromVGPRLoop(TII, MRI, MBB, *LoopBB, *BodyBB, DL, ScalarOps);
 
-  // Restore the EXEC mask
   MachineBasicBlock::iterator First = RemainderBB->begin();
+  // Restore SCC
+  if (SCCDefined) {
+    BuildMI(*RemainderBB, First, DL, TII.get(AMDGPU::COPY), AMDGPU::SCC)
+        .addReg(SaveSCCReg);
+  }
+
+  // Restore the EXEC mask
   BuildMI(*RemainderBB, First, DL, TII.get(MovExecOpc), Exec).addReg(SaveExec);
   return BodyBB;
 }
diff --git a/llvm/lib/Target/AMDGPU/SIInstrInfo.h b/llvm/lib/Target/AMDGPU/SIInstrInfo.h
index a4f59fc3513d646..3a347017f1c85a1 100644
--- a/llvm/lib/Target/AMDGPU/SIInstrInfo.h
+++ b/llvm/lib/Target/AMDGPU/SIInstrInfo.h
@@ -960,6 +960,19 @@ class SIInstrInfo final : public AMDGPUGenInstrInfo {
 
   unsigned getVALUOp(const MachineInstr &MI) const;
 
+  /// Return true if SCC is defined and not dead
+  /// from Before to the beginning of MBB
+  bool isSCCDefinedBefore(MachineBasicBlock &MBB,
+  MachineBasicBlock::iterator Before) const;
+
+  /// Return true if SCC is used from After to end of MBB
+  bool isSCCUsedAfter(MachineBasicBlock &MBB,
+  MachineBasicBlock::iterator After) const;
+
+  bool isSCCDefinedAndUsed(MachineBasicBlock &MBB,
+   MachineBasicBlock::iterator Before,
+   MachineBasicBlock::iterator After) const;
+
   void insertScratchExecCopy(MachineFunction &MF, MachineBasicBlock &MBB,
  MachineBasicBlock::iterator MBBI,
  const DebugLoc &DL, Register Reg, 


[clang] [AMDGPU] Use 32-bit SGPR o save/restore of SCC (PR #68367)

2023-10-10 Thread Sirish Pande via cfe-commits

https://github.com/srpande edited 
https://github.com/llvm/llvm-project/pull/68367



[clang] [AMDGPU] Use 32-bit SGPR to save/restore of SCC (PR #68367)

2023-10-10 Thread Sirish Pande via cfe-commits

https://github.com/srpande edited 
https://github.com/llvm/llvm-project/pull/68367



[clang] [llvm] AMDGPU: Add v_mfma_f32_16x16x32_bf16 for gfx950 (PR #117053)

2024-11-21 Thread Sirish Pande via cfe-commits

https://github.com/srpande approved this pull request.

lgtm

https://github.com/llvm/llvm-project/pull/117053