[Lldb-commits] [compiler-rt] [libcxxabi] [libcxx] [lld] [flang] [llvm] [clang-tools-extra] [lldb] [clang] [AMDGPU] GFX12: select @llvm.prefetch intrinsic (PR #74576)

2023-12-07 Thread Stanislav Mekhanoshin via lldb-commits


@@ -959,6 +967,32 @@ def : GCNPat <
 }
 } // let OtherPredicates = [HasShaderCyclesRegister]
 
+def SIMM24bitPtr : ImmLeaf (Imm);}]
+>;
+
+multiclass SMPrefetchPat {
+  def : GCNPat <
+(smrd_prefetch (SMRDImm i64:$sbase, i32:$offset), timm, timm, (i32 cache_type)),
+(!cast("S_PREFETCH_"#type) $sbase, $offset, (i32 SGPR_NULL), (i8 0))
+  >;
+
+  def : GCNPat <
+(smrd_prefetch (i64 SReg_64:$sbase), timm, timm, (i32 cache_type)),
+(!cast("S_PREFETCH_"#type) $sbase, 0, (i32 SGPR_NULL), (i8 0))
+  >;
+
+  def : GCNPat <
+(prefetch SIMM24bitPtr:$offset, timm, timm, (i32 cache_type)),
+(!cast("S_PREFETCH_"#type#"_PC_REL") (as_i32timm $offset), (i32 SGPR_NULL), (i8 0))
+  > {
+let AddedComplexity = 10;
+  }

rampitec wrote:

Prefetch on an absolute address is practically useless.

https://github.com/llvm/llvm-project/pull/74576
___
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


[Lldb-commits] [clang-tools-extra] [llvm] [libcxx] [lldb] [flang] [clang] [compiler-rt] [libcxxabi] [lld] [AMDGPU] GFX12: select @llvm.prefetch intrinsic (PR #74576)

2023-12-07 Thread Stanislav Mekhanoshin via lldb-commits



rampitec wrote:

So you want a target intrinsic?

https://github.com/llvm/llvm-project/pull/74576


[Lldb-commits] [compiler-rt] [libcxx] [lldb] [libcxxabi] [clang-tools-extra] [lld] [llvm] [clang] [flang] [AMDGPU] GFX12: select @llvm.prefetch intrinsic (PR #74576)

2023-12-08 Thread Stanislav Mekhanoshin via lldb-commits



rampitec wrote:

I do not think we need to use the PC_REL form to prefetch a function's
address. The instruction can take a full 64-bit address, so one can just use
that address. My understanding is that the PC_REL form can be useful if you
expect something like a huge loop or a local branch and want to prefetch
something like 1K from the PC. I am not sure, though, how useful this can be
at the level of a high-level language or even in IR.
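
To make the PC_REL discussion concrete, here is a minimal standalone sketch of
the range check that a PC-relative prefetch offset would have to pass. It
assumes, based only on the pattern name SIMM24bitPtr in the quoted hunk, that
the offset is a signed 24-bit immediate. The helper names fitsSigned and
pcRelPrefetchOffset are hypothetical and are not part of the patch.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical helper: true if Value fits in a signed N-bit field.
// This is a standalone stand-in so the sketch does not depend on LLVM headers.
template <unsigned N> constexpr bool fitsSigned(int64_t Value) {
  static_assert(N > 0 && N < 64, "bit width out of range");
  return Value >= -(int64_t{1} << (N - 1)) &&
         Value < (int64_t{1} << (N - 1));
}

// Hypothetical helper: returns the byte distance from PC to Target if it can
// be encoded as a signed 24-bit immediate (an assumption inferred from the
// name SIMM24bitPtr), or std::nullopt if the target is too far away.
std::optional<int32_t> pcRelPrefetchOffset(uint64_t PC, uint64_t Target) {
  // Unsigned subtraction wraps; reinterpreting as signed gives the delta.
  int64_t Delta = static_cast<int64_t>(Target - PC);
  if (!fitsSigned<24>(Delta))
    return std::nullopt;
  return static_cast<int32_t>(Delta);
}
```

Under this assumption, a prefetch 1K ahead of the PC is trivially encodable,
while an arbitrary 64-bit function address generally is not, which matches the
point that the full-address form is the natural choice for a function's
address.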

https://github.com/llvm/llvm-project/pull/74576


[Lldb-commits] [lldb] [lld] [llvm] [libc] [libcxx] [flang] [compiler-rt] [clang-tools-extra] [clang] [GlobalISel] Add G_PREFETCH (PR #74863)

2023-12-11 Thread Stanislav Mekhanoshin via lldb-commits


@@ -1209,6 +1209,15 @@ def G_FENCE : GenericInstruction {
   let hasSideEffects = true;
 }
 
+// Generic opcode equivalent to the llvm.prefetch intrinsic.
+def G_PREFETCH : GenericInstruction {
+  let OutOperandList = (outs);
+  let InOperandList = (ins ptype0:$address, i32imm:$rw, i32imm:$locality, i32imm:$cachetype);
+  let hasSideEffects = true;
+  let mayLoad = true;
+  let mayStore = true;

rampitec wrote:

> should probably just be hasSideEffects. mayLoad/mayStore imply it needs a 
> memory operand and is an ordered memory reference when it doesn't have one

I could argue this is not a memory operation at all, as it should have no
visible effects other than access speed, although in practice it does have
ordering constraints. You certainly do not want a prefetch to be moved past
the loads which it was supposed to prefetch. That is, in my view the use of
both mayLoad and mayStore is justified, although we need to make sure it is
not treated as an aliasing store or load from the AA point of view.
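
The ordering argument above can be illustrated with a toy model. This is a
deliberately simplified sketch and not LLVM's actual scheduling or
code-motion logic. The struct InstrFlags and the function mayReorder are
hypothetical names introduced only for this example. The model assumes that
two loads may be swapped freely, that anything involving a store may not be
swapped with another memory access when no alias information is available, and
that side-effecting instructions never move.

```cpp
// Toy, hypothetical model of per-instruction flags (not LLVM's MCInstrDesc).
struct InstrFlags {
  bool MayLoad;
  bool MayStore;
  bool HasSideEffects;
};

// Conservative reordering rule used by this sketch:
//  - an instruction with side effects is never reordered;
//  - a non-memory instruction can be swapped with anything else;
//  - two memory accesses can be swapped only if neither may store.
bool mayReorder(const InstrFlags &A, const InstrFlags &B) {
  if (A.HasSideEffects || B.HasSideEffects)
    return false;
  const bool AIsMem = A.MayLoad || A.MayStore;
  const bool BIsMem = B.MayLoad || B.MayStore;
  if (!AIsMem || !BIsMem)
    return true;
  return !A.MayStore && !B.MayStore;
}
```

In this model a prefetch flagged only as mayLoad could be moved past the load
it is meant to serve, whereas a prefetch flagged as both mayLoad and mayStore
stays in place relative to that load. That is the effect the comment argues
for, under the stated simplifications.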

https://github.com/llvm/llvm-project/pull/74863


[Lldb-commits] [libcxx] [flang] [clang] [compiler-rt] [lld] [clang-tools-extra] [llvm] [lldb] [AMDGPU] Use alias info to relax waitcounts for LDS DMA (PR #74537)

2023-12-12 Thread Stanislav Mekhanoshin via lldb-commits

https://github.com/rampitec updated 
https://github.com/llvm/llvm-project/pull/74537

>From 7e382620cdc5999c645ed0746f242595f0294c58 Mon Sep 17 00:00:00 2001
From: Stanislav Mekhanoshin 
Date: Mon, 4 Dec 2023 16:11:53 -0800
Subject: [PATCH 1/7] [AMDGPU] Use alias info to relax waitcounts for LDS DMA

LDS DMA loads increase VMCNT, and a load from the LDS location they
store to must wait on this counter so that it only reads memory after
it is written. The wait count insertion pass does not track memory
dependencies; it tracks register dependencies. To model the LDS
dependency, a pseudo register is used in the scoreboard, acting as if
LDS DMA writes it and an LDS load reads it.

This patch adds 8 more pseudo registers to use for independent LDS
locations if we can prove they are disjoint using alias analysis.
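
The idea described above can be sketched as a small standalone scoreboard.
This is an illustration under stated assumptions and not the code in the
patch: it substitutes explicit byte ranges for alias analysis, and the names
LdsRange, overlaps, LdsDmaScoreboard, recordDmaWrite and scoreToWaitFor are
all hypothetical. Slot 0 plays the role of the original generic pseudo
register, and slots 1 to 8 stand for the 8 additional ones.

```cpp
#include <algorithm>
#include <array>
#include <cstdint>

// Hypothetical half-open byte range [Begin, End) in LDS.
struct LdsRange {
  uint32_t Begin;
  uint32_t End;
};

inline bool overlaps(LdsRange A, LdsRange B) {
  return A.Begin < B.End && B.Begin < A.End;
}

// Toy scoreboard: slot 0 is the generic LDS slot, slots 1..8 track LDS DMA
// writes to ranges that are provably disjoint from each other.
class LdsDmaScoreboard {
public:
  static constexpr unsigned NumExtraSlots = 8;

  // Records an LDS DMA write with an increasing event score.
  // Returns the slot used; 0 means only the generic slot could be used.
  unsigned recordDmaWrite(LdsRange R, unsigned Score) {
    Scores[0] = std::max(Scores[0], Score); // generic slot sees every write
    for (unsigned I = 0; I < NumTracked; ++I) {
      if (Tracked[I].Begin == R.Begin && Tracked[I].End == R.End) {
        Scores[I + 1] = std::max(Scores[I + 1], Score);
        return I + 1;
      }
    }
    bool Disjoint = true;
    for (unsigned I = 0; I < NumTracked; ++I)
      Disjoint = Disjoint && !overlaps(Tracked[I], R);
    if (Disjoint && NumTracked < NumExtraSlots) {
      Tracked[NumTracked] = R;
      Scores[NumTracked + 1] = Score;
      return ++NumTracked;
    }
    HasUntrackedWrite = true; // fall back to conservative behaviour
    return 0;
  }

  // Returns the score an LDS load of range R must wait for (0 = no wait).
  unsigned scoreToWaitFor(LdsRange R) const {
    if (HasUntrackedWrite)
      return Scores[0];
    unsigned Result = 0;
    for (unsigned I = 0; I < NumTracked; ++I)
      if (overlaps(Tracked[I], R))
        Result = std::max(Result, Scores[I + 1]);
    return Result;
  }

private:
  std::array<LdsRange, NumExtraSlots> Tracked{};
  std::array<unsigned, NumExtraSlots + 1> Scores{};
  unsigned NumTracked = 0;
  bool HasUntrackedWrite = false;
};
```

In this sketch a load that touches only the first tracked range waits for the
first DMA write alone, and the scoreboard falls back to the generic slot as
soon as a write cannot be proven disjoint. The real pass would have to rely on
alias information rather than known ranges.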

Fixes: SWDEV-433427
---
 llvm/lib/Target/AMDGPU/SIISelLowering.cpp   |  16 +-
 llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp |  73 +-
 llvm/lib/Target/AMDGPU/SIInstrInfo.cpp  |   4 +-
 llvm/lib/Target/AMDGPU/SIInstrInfo.h|   8 +
 llvm/lib/Target/AMDGPU/lds-dma-waits.ll | 154 
 llvm/test/CodeGen/AMDGPU/llc-pipeline.ll|   2 +
 6 files changed, 241 insertions(+), 16 deletions(-)
 create mode 100644 llvm/lib/Target/AMDGPU/lds-dma-waits.ll

diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
index a7f4d63229b7ef..2e079404b087fa 100644
--- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
@@ -1128,11 +1128,10 @@ bool SITargetLowering::getTgtMemIntrinsic(IntrinsicInfo &Info,
 MachineMemOperand::MOStore |
 MachineMemOperand::MODereferenceable;
 
-  // XXX - Should this be volatile without known ordering?
-  Info.flags |= MachineMemOperand::MOVolatile;
-
   switch (IntrID) {
   default:
+// XXX - Should this be volatile without known ordering?
+Info.flags |= MachineMemOperand::MOVolatile;
 break;
   case Intrinsic::amdgcn_raw_buffer_load_lds:
   case Intrinsic::amdgcn_raw_ptr_buffer_load_lds:
@@ -1140,6 +1139,7 @@ bool SITargetLowering::getTgtMemIntrinsic(IntrinsicInfo &Info,
   case Intrinsic::amdgcn_struct_ptr_buffer_load_lds: {
 unsigned Width = cast<ConstantInt>(CI.getArgOperand(2))->getZExtValue();
 Info.memVT = EVT::getIntegerVT(CI.getContext(), Width * 8);
+Info.ptrVal = CI.getArgOperand(1);
 return true;
   }
   }
@@ -1268,8 +1268,8 @@ bool SITargetLowering::getTgtMemIntrinsic(IntrinsicInfo &Info,
 Info.opc = ISD::INTRINSIC_VOID;
 unsigned Width = cast<ConstantInt>(CI.getArgOperand(2))->getZExtValue();
 Info.memVT = EVT::getIntegerVT(CI.getContext(), Width * 8);
-Info.flags |= MachineMemOperand::MOLoad | MachineMemOperand::MOStore |
-  MachineMemOperand::MOVolatile;
+Info.ptrVal = CI.getArgOperand(1);
+Info.flags |= MachineMemOperand::MOLoad | MachineMemOperand::MOStore;
 return true;
   }
   case Intrinsic::amdgcn_ds_bvh_stack_rtn: {
@@ -9084,7 +9084,9 @@ SDValue SITargetLowering::LowerINTRINSIC_VOID(SDValue Op,
 MachinePointerInfo LoadPtrI = LoadMMO->getPointerInfo();
 
 MachinePointerInfo StorePtrI = LoadPtrI;
-StorePtrI.V = nullptr;
+LoadPtrI.V = UndefValue::get(
+PointerType::get(*DAG.getContext(), AMDGPUAS::GLOBAL_ADDRESS));
+LoadPtrI.AddrSpace = AMDGPUAS::GLOBAL_ADDRESS;
 StorePtrI.AddrSpace = AMDGPUAS::LOCAL_ADDRESS;
 
 auto F = LoadMMO->getFlags() &
@@ -9162,6 +9164,8 @@ SDValue SITargetLowering::LowerINTRINSIC_VOID(SDValue Op,
 MachinePointerInfo LoadPtrI = LoadMMO->getPointerInfo();
 LoadPtrI.Offset = Op->getConstantOperandVal(5);
 MachinePointerInfo StorePtrI = LoadPtrI;
+LoadPtrI.V = UndefValue::get(
+PointerType::get(*DAG.getContext(), AMDGPUAS::GLOBAL_ADDRESS));
 LoadPtrI.AddrSpace = AMDGPUAS::GLOBAL_ADDRESS;
 StorePtrI.AddrSpace = AMDGPUAS::LOCAL_ADDRESS;
 auto F = LoadMMO->getFlags() &
diff --git a/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp b/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
index ede4841b8a5fd7..50ad22130e939e 100644
--- a/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+++ b/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
@@ -31,6 +31,7 @@
 #include "llvm/ADT/MapVector.h"
 #include "llvm/ADT/PostOrderIterator.h"
 #include "llvm/ADT/Sequence.h"
+#include "llvm/Analysis/AliasAnalysis.h"
 #include "llvm/CodeGen/MachineLoopInfo.h"
 #include "llvm/CodeGen/MachinePostDominators.h"
 #include "llvm/InitializePasses.h"
@@ -121,8 +122,13 @@ enum RegisterMapping {
   SQ_MAX_PGM_VGPRS = 512, // Maximum programmable VGPRs across all targets.
   AGPR_OFFSET = 256,  // Maximum programmable ArchVGPRs across all targets.
   SQ_MAX_PGM_SGPRS = 256, // Maximum programmable SGPRs across all targets.
-  NUM_EXTRA_VGPRS = 1,// A reserved slot for DS.
-  EXTRA_VGPR_LDS = 0, // An artificial register to track LDS writes.
+  NUM_EXTRA_VGPRS = 9,// Reserved slots f

[Lldb-commits] [libcxx] [flang] [clang] [compiler-rt] [lld] [clang-tools-extra] [llvm] [lldb] [AMDGPU] Use alias info to relax waitcounts for LDS DMA (PR #74537)

2023-12-12 Thread Stanislav Mekhanoshin via lldb-commits

rampitec wrote:

Ping

https://github.com/llvm/llvm-project/pull/74537


[Lldb-commits] [flang] [llvm] [libcxx] [compiler-rt] [lld] [clang-tools-extra] [libc] [clang] [lldb] [AMDGPU] Use alias info to relax waitcounts for LDS DMA (PR #74537)

2023-12-12 Thread Stanislav Mekhanoshin via lldb-commits

https://github.com/rampitec updated 
https://github.com/llvm/llvm-project/pull/74537


[Lldb-commits] [lld] [compiler-rt] [clang] [flang] [lldb] [libc] [libcxx] [clang-tools-extra] [llvm] [AMDGPU] Use alias info to relax waitcounts for LDS DMA (PR #74537)

2023-12-12 Thread Stanislav Mekhanoshin via lldb-commits

rampitec wrote:

To make it easier I am splitting the patch. I have pre-committed the test, and 
there is a part which fixes the lack of a wait on GFX10: 
https://github.com/llvm/llvm-project/pull/75245

https://github.com/llvm/llvm-project/pull/74537


[Lldb-commits] [lld] [compiler-rt] [clang] [flang] [lldb] [libc] [libcxx] [clang-tools-extra] [llvm] [AMDGPU] Use alias info to relax waitcounts for LDS DMA (PR #74537)

2023-12-12 Thread Stanislav Mekhanoshin via lldb-commits

rampitec wrote:

Another part is improving memoperand info: 
https://github.com/llvm/llvm-project/pull/75247. This is NFCI just by itself.

https://github.com/llvm/llvm-project/pull/74537


[Lldb-commits] [lld] [compiler-rt] [clang] [flang] [lldb] [libc] [libcxx] [clang-tools-extra] [llvm] [AMDGPU] Use alias info to relax waitcounts for LDS DMA (PR #74537)

2023-12-12 Thread Stanislav Mekhanoshin via lldb-commits

rampitec wrote:

Yet another part to fix disjoint memory checks with LDS DMA: 
https://github.com/llvm/llvm-project/pull/75249

https://github.com/llvm/llvm-project/pull/74537


[Lldb-commits] [clang] [lld] [clang-tools-extra] [compiler-rt] [lldb] [flang] [llvm] [libcxx] [libc] [AMDGPU] Use alias info to relax waitcounts for LDS DMA (PR #74537)

2023-12-13 Thread Stanislav Mekhanoshin via lldb-commits

https://github.com/rampitec updated 
https://github.com/llvm/llvm-project/pull/74537


[Lldb-commits] [libc] [flang] [clang-tools-extra] [libcxx] [compiler-rt] [lld] [lldb] [clang] [llvm] [AMDGPU] Use alias info to relax waitcounts for LDS DMA (PR #74537)

2023-12-13 Thread Stanislav Mekhanoshin via lldb-commits

https://github.com/rampitec updated 
https://github.com/llvm/llvm-project/pull/74537

>From 7e382620cdc5999c645ed0746f242595f0294c58 Mon Sep 17 00:00:00 2001
From: Stanislav Mekhanoshin 
Date: Mon, 4 Dec 2023 16:11:53 -0800
Subject: [PATCH 1/8] [AMDGPU] Use alias info to relax waitcounts for LDS DMA


[Lldb-commits] [clang-tools-extra] [lldb] [llvm] [libc] [flang] [lld] [compiler-rt] [libcxx] [clang] [AMDGPU] Use alias info to relax waitcounts for LDS DMA (PR #74537)

2023-12-13 Thread Stanislav Mekhanoshin via lldb-commits

https://github.com/rampitec updated 
https://github.com/llvm/llvm-project/pull/74537

>From 7e382620cdc5999c645ed0746f242595f0294c58 Mon Sep 17 00:00:00 2001
From: Stanislav Mekhanoshin 
Date: Mon, 4 Dec 2023 16:11:53 -0800
Subject: [PATCH 1/9] [AMDGPU] Use alias info to relax waitcounts for LDS DMA


[Lldb-commits] [clang] [llvm] [libcxx] [lldb] [clang-tools-extra] [libc] [compiler-rt] [flang] [lld] [AMDGPU] Use alias info to relax waitcounts for LDS DMA (PR #74537)

2023-12-18 Thread Stanislav Mekhanoshin via lldb-commits

https://github.com/rampitec updated 
https://github.com/llvm/llvm-project/pull/74537

>From 7e382620cdc5999c645ed0746f242595f0294c58 Mon Sep 17 00:00:00 2001
From: Stanislav Mekhanoshin 
Date: Mon, 4 Dec 2023 16:11:53 -0800
Subject: [PATCH 1/9] [AMDGPU] Use alias info to relax waitcounts for LDS DMA

LDS DMA loads increment VMCNT, and a load from the LDS location they
store to must wait on this counter so that it only reads memory after
it is written. The wait count insertion pass does not track memory
dependencies, it tracks register dependencies. To model the LDS
dependency a pseudo register is used in the scoreboard, acting as if
LDS DMA writes it and an LDS load reads it.

This patch adds 8 more pseudo registers to use for independent LDS
locations if we can prove they are disjoint using alias analysis.

Fixes: SWDEV-433427
---
 llvm/lib/Target/AMDGPU/SIISelLowering.cpp   |  16 +-
 llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp |  73 +-
 llvm/lib/Target/AMDGPU/SIInstrInfo.cpp  |   4 +-
 llvm/lib/Target/AMDGPU/SIInstrInfo.h|   8 +
 llvm/lib/Target/AMDGPU/lds-dma-waits.ll | 154 
 llvm/test/CodeGen/AMDGPU/llc-pipeline.ll|   2 +
 6 files changed, 241 insertions(+), 16 deletions(-)
 create mode 100644 llvm/lib/Target/AMDGPU/lds-dma-waits.ll

diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp 
b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
index a7f4d63229b7ef..2e079404b087fa 100644
--- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
@@ -1128,11 +1128,10 @@ bool SITargetLowering::getTgtMemIntrinsic(IntrinsicInfo 
&Info,
 MachineMemOperand::MOStore |
 MachineMemOperand::MODereferenceable;
 
-  // XXX - Should this be volatile without known ordering?
-  Info.flags |= MachineMemOperand::MOVolatile;
-
   switch (IntrID) {
   default:
+// XXX - Should this be volatile without known ordering?
+Info.flags |= MachineMemOperand::MOVolatile;
 break;
   case Intrinsic::amdgcn_raw_buffer_load_lds:
   case Intrinsic::amdgcn_raw_ptr_buffer_load_lds:
@@ -1140,6 +1139,7 @@ bool SITargetLowering::getTgtMemIntrinsic(IntrinsicInfo 
&Info,
   case Intrinsic::amdgcn_struct_ptr_buffer_load_lds: {
 unsigned Width = cast<ConstantInt>(CI.getArgOperand(2))->getZExtValue();
 Info.memVT = EVT::getIntegerVT(CI.getContext(), Width * 8);
+Info.ptrVal = CI.getArgOperand(1);
 return true;
   }
   }
@@ -1268,8 +1268,8 @@ bool SITargetLowering::getTgtMemIntrinsic(IntrinsicInfo 
&Info,
 Info.opc = ISD::INTRINSIC_VOID;
 unsigned Width = cast<ConstantInt>(CI.getArgOperand(2))->getZExtValue();
 Info.memVT = EVT::getIntegerVT(CI.getContext(), Width * 8);
-Info.flags |= MachineMemOperand::MOLoad | MachineMemOperand::MOStore |
-  MachineMemOperand::MOVolatile;
+Info.ptrVal = CI.getArgOperand(1);
+Info.flags |= MachineMemOperand::MOLoad | MachineMemOperand::MOStore;
 return true;
   }
   case Intrinsic::amdgcn_ds_bvh_stack_rtn: {
@@ -9084,7 +9084,9 @@ SDValue SITargetLowering::LowerINTRINSIC_VOID(SDValue Op,
 MachinePointerInfo LoadPtrI = LoadMMO->getPointerInfo();
 
 MachinePointerInfo StorePtrI = LoadPtrI;
-StorePtrI.V = nullptr;
+LoadPtrI.V = UndefValue::get(
+PointerType::get(*DAG.getContext(), AMDGPUAS::GLOBAL_ADDRESS));
+LoadPtrI.AddrSpace = AMDGPUAS::GLOBAL_ADDRESS;
 StorePtrI.AddrSpace = AMDGPUAS::LOCAL_ADDRESS;
 
 auto F = LoadMMO->getFlags() &
@@ -9162,6 +9164,8 @@ SDValue SITargetLowering::LowerINTRINSIC_VOID(SDValue Op,
 MachinePointerInfo LoadPtrI = LoadMMO->getPointerInfo();
 LoadPtrI.Offset = Op->getConstantOperandVal(5);
 MachinePointerInfo StorePtrI = LoadPtrI;
+LoadPtrI.V = UndefValue::get(
+PointerType::get(*DAG.getContext(), AMDGPUAS::GLOBAL_ADDRESS));
 LoadPtrI.AddrSpace = AMDGPUAS::GLOBAL_ADDRESS;
 StorePtrI.AddrSpace = AMDGPUAS::LOCAL_ADDRESS;
 auto F = LoadMMO->getFlags() &
diff --git a/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp 
b/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
index ede4841b8a5fd7..50ad22130e939e 100644
--- a/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+++ b/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
@@ -31,6 +31,7 @@
 #include "llvm/ADT/MapVector.h"
 #include "llvm/ADT/PostOrderIterator.h"
 #include "llvm/ADT/Sequence.h"
+#include "llvm/Analysis/AliasAnalysis.h"
 #include "llvm/CodeGen/MachineLoopInfo.h"
 #include "llvm/CodeGen/MachinePostDominators.h"
 #include "llvm/InitializePasses.h"
@@ -121,8 +122,13 @@ enum RegisterMapping {
   SQ_MAX_PGM_VGPRS = 512, // Maximum programmable VGPRs across all targets.
   AGPR_OFFSET = 256,  // Maximum programmable ArchVGPRs across all targets.
   SQ_MAX_PGM_SGPRS = 256, // Maximum programmable SGPRs across all targets.
-  NUM_EXTRA_VGPRS = 1,// A reserved slot for DS.
-  EXTRA_VGPR_LDS = 0, // An artificial register to track LDS writes.
+  NUM_EXTRA_VGPRS = 9,// Reserved slots f

[Lldb-commits] [clang] [libcxx] [compiler-rt] [lldb] [libc] [llvm] [lld] [flang] [clang-tools-extra] [AMDGPU] Use alias info to relax waitcounts for LDS DMA (PR #74537)

2023-12-18 Thread Stanislav Mekhanoshin via lldb-commits

rampitec wrote:

All split off parts were merged and this patch is merged with main. Only 
waitcount insertion pass changes remained here.

https://github.com/llvm/llvm-project/pull/74537


[Lldb-commits] [compiler-rt] [libcxx] [flang] [libc] [lldb] [lld] [clang] [clang-tools-extra] [llvm] [AMDGPU] Use alias info to relax waitcounts for LDS DMA (PR #74537)

2023-12-19 Thread Stanislav Mekhanoshin via lldb-commits

rampitec wrote:

> How does this work in a case like this?
> 
> ```
> call void @llvm.amdgcn.raw.buffer.load.lds(<4 x i32> %rsrc, ptr addrspace(3) 
> @lds.3, i32 4, i32 0, i32 0, i32 0, i32 0)
> call void @llvm.amdgcn.raw.buffer.load.lds(<4 x i32> %rsrc, ptr addrspace(3) 
> %ptr, i32 4, i32 0, i32 0, i32 0, i32 0)
> %val.3 = load float, ptr addrspace(3) @lds.3, align 4
> ```
> 
> i.e.
> 
> * store to known lds address `@lds.3` (this will use slot 0 and another 
> slot e.g. slot 3?)
> 
> * store to unknown lds address (this will use slot 0?)
> 
> * load from known lds address `@lds.3` (this will use slot 3?)

It does not know the pointer, so it uses the default slot 0 and waits for 
vmcnt(0). I have to tell anyone interested here: before I wrote this code, the 
pass did not know of the dependency and did not wait for anything at all. 
Everyone was happy.

https://github.com/llvm/llvm-project/pull/74537


[Lldb-commits] [clang] [clang-tools-extra] [compiler-rt] [llvm] [libcxx] [lldb] [lld] [libc] [flang] [AMDGPU] Use alias info to relax waitcounts for LDS DMA (PR #74537)

2023-12-19 Thread Stanislav Mekhanoshin via lldb-commits

rampitec wrote:

> Test case:
> 
> ```
> @lds.0 = internal addrspace(3) global [64 x float] poison, align 16
> @lds.1 = internal addrspace(3) global [64 x float] poison, align 16
> 
> declare void @llvm.amdgcn.raw.buffer.load.lds(<4 x i32> %rsrc, ptr 
> addrspace(3) nocapture, i32 %size, i32 %voffset, i32 %soffset, i32 %offset, 
> i32 %aux)
> 
> define amdgpu_kernel void @f(<4 x i32> %rsrc, i32 %i1, i32 %i2, ptr 
> addrspace(1) %out, ptr addrspace(3) %ptr) {
> main_body:
>   call void @llvm.amdgcn.raw.buffer.load.lds(<4 x i32> %rsrc, ptr 
> addrspace(3) @lds.0, i32 4, i32 0, i32 0, i32 0, i32 0)
>   call void @llvm.amdgcn.raw.buffer.load.lds(<4 x i32> %rsrc, ptr 
> addrspace(3) %ptr, i32 4, i32 0, i32 0, i32 0, i32 0)
>   %gep.0 = getelementptr float, ptr addrspace(3) @lds.0, i32 %i1
>   %gep.1 = getelementptr float, ptr addrspace(3) @lds.1, i32 %i2
>   %val.0 = load volatile float, ptr addrspace(3) %gep.0, align 4
>   %val.1 = load volatile float, ptr addrspace(3) %gep.1, align 4
>   %out.gep.1 = getelementptr float, ptr addrspace(1) %out, i32 1
>   store float %val.0, ptr addrspace(1) %out
>   store float %val.1, ptr addrspace(1) %out.gep.1
>   ret void
> }
> ```
> 
> Generates:
> 
> ```
>   s_load_dwordx8 s[4:11], s[0:1], 0x24
>   s_load_dword s2, s[0:1], 0x44
>   s_mov_b32 m0, 0
>   v_mov_b32_e32 v2, 0
>   s_waitcnt lgkmcnt(0)
>   buffer_load_dword off, s[4:7], 0 lds
>   s_mov_b32 m0, s2
>   s_lshl_b32 s0, s8, 2
>   buffer_load_dword off, s[4:7], 0 lds
>   s_lshl_b32 s1, s9, 2
>   v_mov_b32_e32 v0, s0
>   v_mov_b32_e32 v1, s1
>   s_waitcnt vmcnt(1)
>   ds_read_b32 v0, v0
>   s_waitcnt vmcnt(0)
>   ds_read_b32 v1, v1 offset:256
>   s_waitcnt lgkmcnt(0)
>   global_store_dwordx2 v2, v[0:1], s[10:11]
>   s_endpgm
> ```
> 
> The `s_waitcnt vmcnt(1)` seems incorrect, because the second 
> buffer-load-to-lds might clobber `@lds.0`.

This is still correct: a pointer argument cannot alias a module global. A 
pointer argument to a kernel is an external LDS requested by the host side, and 
the host cannot see module LDS.
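
The two wait values in the listing above can be reproduced with a small model. It is only a sketch: it assumes the counted operations complete in issue order, so that `s_waitcnt vmcnt(N)` returns once at most N of them are still outstanding. The function name `vmcntToWaitFor` and the 1-based issue indices are hypothetical and do not appear in the patch.

```cpp
#include <cassert>

// Toy model, not LLVM code. Operations are numbered 1, 2, ... in issue
// order and are assumed to complete in that same order.
// Returns the largest N for which "at most N operations outstanding"
// still guarantees that operation `Producer` has completed.
unsigned vmcntToWaitFor(unsigned Issued, unsigned Producer) {
  assert(Producer >= 1 && Producer <= Issued);
  return Issued - Producer;
}
```

With two `buffer_load_dword ... lds` instructions in flight, a load that depends only on the first one needs `vmcntToWaitFor(2, 1)`, which is 1, while a load that may depend on the second needs `vmcntToWaitFor(2, 2)`, which is 0.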

https://github.com/llvm/llvm-project/pull/74537


[Lldb-commits] [clang] [clang-tools-extra] [compiler-rt] [llvm] [libcxx] [lldb] [lld] [libc] [flang] [AMDGPU] Use alias info to relax waitcounts for LDS DMA (PR #74537)

2023-12-19 Thread Stanislav Mekhanoshin via lldb-commits

rampitec wrote:

> This is still correct, pointer argument cannot alias module global. A pointer 
> argument to a kernel is an LDS external requested by the host side, and host 
> cannot see module LDS.

I.e. that is really the point of the patch: if we are able to definitively 
identify an LDS object targeted by both load and store we only wait on that 
store or stores. And the only way to definitively identify the object at this 
stage is via alias.scope info which we are generating ourselves during module 
LDS lowering.
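
A rough way to picture "wait only on that store" is a per-slot scoreboard. The sketch below is a toy model under stated assumptions, not the code in SIInsertWaitcnts.cpp: slot 0 is taken to be the generic slot and slots 1 to 8 the specific ones (matching `NUM_EXTRA_VGPRS = 9` in the patch), every LDS DMA store is assumed to update slot 0 as well as its own slot, and only LDS DMA operations are counted. The type `LdsDmaScoreboard` and its members are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <optional>

// Toy model, not LLVM code.
struct LdsDmaScoreboard {
  static constexpr unsigned NumSlots = 9; // slot 0 generic, 1..8 specific
  // Issue index of the last DMA store that wrote each slot, 0 = none yet.
  std::array<unsigned, NumSlots> LastWriter{};
  unsigned Issued = 0;

  // Record an LDS DMA store. `Slot` is the specific slot when the target
  // LDS object was identified, or std::nullopt when it is unknown.
  void recordDmaStore(std::optional<unsigned> Slot) {
    assert(!Slot || (*Slot >= 1 && *Slot < NumSlots));
    ++Issued;
    LastWriter[0] = Issued; // assumption: the generic slot sees every store
    if (Slot)
      LastWriter[*Slot] = Issued;
  }

  // vmcnt value to wait for before an LDS load, or std::nullopt when no
  // recorded store can feed the load. Unknown loads use the generic slot.
  std::optional<unsigned> vmcntBeforeLoad(std::optional<unsigned> Slot) const {
    unsigned Writer = LastWriter[Slot.value_or(0u)];
    if (Writer == 0)
      return std::nullopt;
    return Issued - Writer;
  }
};
```

In this model, a store to an identified object followed by a store through an unknown pointer gives `vmcntBeforeLoad(1u)` equal to 1 for a load from the identified object, and `vmcntBeforeLoad(std::nullopt)` equal to 0 for a load that cannot be identified.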

https://github.com/llvm/llvm-project/pull/74537


[Lldb-commits] [clang] [libc] [lldb] [lld] [llvm] [compiler-rt] [libcxx] [flang] [clang-tools-extra] [AMDGPU] Use alias info to relax waitcounts for LDS DMA (PR #74537)

2023-12-19 Thread Stanislav Mekhanoshin via lldb-commits

https://github.com/rampitec updated 
https://github.com/llvm/llvm-project/pull/74537

>From 7e382620cdc5999c645ed0746f242595f0294c58 Mon Sep 17 00:00:00 2001
From: Stanislav Mekhanoshin 
Date: Mon, 4 Dec 2023 16:11:53 -0800
Subject: [PATCH 01/10] [AMDGPU] Use alias info to relax waitcounts for LDS DMA

LDS DMA loads increment VMCNT, and a load from the LDS location they
store to must wait on this counter so that it only reads memory after
it is written. The wait count insertion pass does not track memory
dependencies, it tracks register dependencies. To model the LDS
dependency a pseudo register is used in the scoreboard, acting as if
LDS DMA writes it and an LDS load reads it.

This patch adds 8 more pseudo registers to use for independent LDS
locations if we can prove they are disjoint using alias analysis.

Fixes: SWDEV-433427
---
 llvm/lib/Target/AMDGPU/SIISelLowering.cpp   |  16 +-
 llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp |  73 +-
 llvm/lib/Target/AMDGPU/SIInstrInfo.cpp  |   4 +-
 llvm/lib/Target/AMDGPU/SIInstrInfo.h|   8 +
 llvm/lib/Target/AMDGPU/lds-dma-waits.ll | 154 
 llvm/test/CodeGen/AMDGPU/llc-pipeline.ll|   2 +
 6 files changed, 241 insertions(+), 16 deletions(-)
 create mode 100644 llvm/lib/Target/AMDGPU/lds-dma-waits.ll

diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp 
b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
index a7f4d63229b7ef..2e079404b087fa 100644
--- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
@@ -1128,11 +1128,10 @@ bool SITargetLowering::getTgtMemIntrinsic(IntrinsicInfo 
&Info,
 MachineMemOperand::MOStore |
 MachineMemOperand::MODereferenceable;
 
-  // XXX - Should this be volatile without known ordering?
-  Info.flags |= MachineMemOperand::MOVolatile;
-
   switch (IntrID) {
   default:
+// XXX - Should this be volatile without known ordering?
+Info.flags |= MachineMemOperand::MOVolatile;
 break;
   case Intrinsic::amdgcn_raw_buffer_load_lds:
   case Intrinsic::amdgcn_raw_ptr_buffer_load_lds:
@@ -1140,6 +1139,7 @@ bool SITargetLowering::getTgtMemIntrinsic(IntrinsicInfo 
&Info,
   case Intrinsic::amdgcn_struct_ptr_buffer_load_lds: {
 unsigned Width = cast<ConstantInt>(CI.getArgOperand(2))->getZExtValue();
 Info.memVT = EVT::getIntegerVT(CI.getContext(), Width * 8);
+Info.ptrVal = CI.getArgOperand(1);
 return true;
   }
   }
@@ -1268,8 +1268,8 @@ bool SITargetLowering::getTgtMemIntrinsic(IntrinsicInfo 
&Info,
 Info.opc = ISD::INTRINSIC_VOID;
 unsigned Width = cast<ConstantInt>(CI.getArgOperand(2))->getZExtValue();
 Info.memVT = EVT::getIntegerVT(CI.getContext(), Width * 8);
-Info.flags |= MachineMemOperand::MOLoad | MachineMemOperand::MOStore |
-  MachineMemOperand::MOVolatile;
+Info.ptrVal = CI.getArgOperand(1);
+Info.flags |= MachineMemOperand::MOLoad | MachineMemOperand::MOStore;
 return true;
   }
   case Intrinsic::amdgcn_ds_bvh_stack_rtn: {
@@ -9084,7 +9084,9 @@ SDValue SITargetLowering::LowerINTRINSIC_VOID(SDValue Op,
 MachinePointerInfo LoadPtrI = LoadMMO->getPointerInfo();
 
 MachinePointerInfo StorePtrI = LoadPtrI;
-StorePtrI.V = nullptr;
+LoadPtrI.V = UndefValue::get(
+PointerType::get(*DAG.getContext(), AMDGPUAS::GLOBAL_ADDRESS));
+LoadPtrI.AddrSpace = AMDGPUAS::GLOBAL_ADDRESS;
 StorePtrI.AddrSpace = AMDGPUAS::LOCAL_ADDRESS;
 
 auto F = LoadMMO->getFlags() &
@@ -9162,6 +9164,8 @@ SDValue SITargetLowering::LowerINTRINSIC_VOID(SDValue Op,
 MachinePointerInfo LoadPtrI = LoadMMO->getPointerInfo();
 LoadPtrI.Offset = Op->getConstantOperandVal(5);
 MachinePointerInfo StorePtrI = LoadPtrI;
+LoadPtrI.V = UndefValue::get(
+PointerType::get(*DAG.getContext(), AMDGPUAS::GLOBAL_ADDRESS));
 LoadPtrI.AddrSpace = AMDGPUAS::GLOBAL_ADDRESS;
 StorePtrI.AddrSpace = AMDGPUAS::LOCAL_ADDRESS;
 auto F = LoadMMO->getFlags() &
diff --git a/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp 
b/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
index ede4841b8a5fd7..50ad22130e939e 100644
--- a/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+++ b/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
@@ -31,6 +31,7 @@
 #include "llvm/ADT/MapVector.h"
 #include "llvm/ADT/PostOrderIterator.h"
 #include "llvm/ADT/Sequence.h"
+#include "llvm/Analysis/AliasAnalysis.h"
 #include "llvm/CodeGen/MachineLoopInfo.h"
 #include "llvm/CodeGen/MachinePostDominators.h"
 #include "llvm/InitializePasses.h"
@@ -121,8 +122,13 @@ enum RegisterMapping {
   SQ_MAX_PGM_VGPRS = 512, // Maximum programmable VGPRs across all targets.
   AGPR_OFFSET = 256,  // Maximum programmable ArchVGPRs across all targets.
   SQ_MAX_PGM_SGPRS = 256, // Maximum programmable SGPRs across all targets.
-  NUM_EXTRA_VGPRS = 1,// A reserved slot for DS.
-  EXTRA_VGPR_LDS = 0, // An artificial register to track LDS writes.
+  NUM_EXTRA_VGPRS = 9,// Reserved slots

[Lldb-commits] [compiler-rt] [llvm] [libc] [libcxx] [lldb] [clang] [lld] [clang-tools-extra] [flang] [AMDGPU] Use alias info to relax waitcounts for LDS DMA (PR #74537)

2023-12-19 Thread Stanislav Mekhanoshin via lldb-commits

rampitec wrote:

> > This is still correct, pointer argument cannot alias module global. A 
> > pointer argument to a kernel is an LDS external requested by the host side, 
> > and host cannot see module LDS.
> 
> I.e. that is really the point of the patch: if we are able to definitively 
> identify an LDS object targeted by both load and store we only wait on that 
> store or stores. And the only way to definitively identify the object at this 
> stage is via alias.scope info which we are generating ourselves during module 
> LDS lowering.

I have added a check for the presence of alias scope info just in case we get a 
rogue AA. The testcase with a pointer argument still produces correct code with 
vmcnt(1).

https://github.com/llvm/llvm-project/pull/74537


[Lldb-commits] [lld] [compiler-rt] [flang] [libc] [libcxx] [llvm] [clang] [lldb] [clang-tools-extra] [AMDGPU] Use alias scope to relax waitcounts for LDS DMA (PR #75974)

2023-12-19 Thread Stanislav Mekhanoshin via lldb-commits

https://github.com/rampitec created 
https://github.com/llvm/llvm-project/pull/75974

LDS DMA loads increment VMCNT, and a load from the LDS location they
store to must wait on this counter so that it only reads memory after
it is written. The wait count insertion pass does not track memory
dependencies, it tracks register dependencies. To model the LDS
dependency a pseudo register is used in the scoreboard, acting as if
LDS DMA writes it and an LDS load reads it.

This patch adds 8 more pseudo registers to use for independent LDS
locations if we can prove they are disjoint using alias scope info.

Fixes: SWDEV-433427

>From 7e382620cdc5999c645ed0746f242595f0294c58 Mon Sep 17 00:00:00 2001
From: Stanislav Mekhanoshin 
Date: Mon, 4 Dec 2023 16:11:53 -0800
Subject: [PATCH 01/11] [AMDGPU] Use alias info to relax waitcounts for LDS DMA

LDS DMA loads increment VMCNT, and a load from the LDS location they
store to must wait on this counter so that it only reads memory after
it is written. The wait count insertion pass does not track memory
dependencies, it tracks register dependencies. To model the LDS
dependency a pseudo register is used in the scoreboard, acting as if
LDS DMA writes it and an LDS load reads it.

This patch adds 8 more pseudo registers to use for independent LDS
locations if we can prove they are disjoint using alias analysis.

Fixes: SWDEV-433427
---
 llvm/lib/Target/AMDGPU/SIISelLowering.cpp   |  16 +-
 llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp |  73 +-
 llvm/lib/Target/AMDGPU/SIInstrInfo.cpp  |   4 +-
 llvm/lib/Target/AMDGPU/SIInstrInfo.h|   8 +
 llvm/lib/Target/AMDGPU/lds-dma-waits.ll | 154 
 llvm/test/CodeGen/AMDGPU/llc-pipeline.ll|   2 +
 6 files changed, 241 insertions(+), 16 deletions(-)
 create mode 100644 llvm/lib/Target/AMDGPU/lds-dma-waits.ll

diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp 
b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
index a7f4d63229b7ef..2e079404b087fa 100644
--- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
@@ -1128,11 +1128,10 @@ bool SITargetLowering::getTgtMemIntrinsic(IntrinsicInfo 
&Info,
 MachineMemOperand::MOStore |
 MachineMemOperand::MODereferenceable;
 
-  // XXX - Should this be volatile without known ordering?
-  Info.flags |= MachineMemOperand::MOVolatile;
-
   switch (IntrID) {
   default:
+// XXX - Should this be volatile without known ordering?
+Info.flags |= MachineMemOperand::MOVolatile;
 break;
   case Intrinsic::amdgcn_raw_buffer_load_lds:
   case Intrinsic::amdgcn_raw_ptr_buffer_load_lds:
@@ -1140,6 +1139,7 @@ bool SITargetLowering::getTgtMemIntrinsic(IntrinsicInfo 
&Info,
   case Intrinsic::amdgcn_struct_ptr_buffer_load_lds: {
 unsigned Width = cast<ConstantInt>(CI.getArgOperand(2))->getZExtValue();
 Info.memVT = EVT::getIntegerVT(CI.getContext(), Width * 8);
+Info.ptrVal = CI.getArgOperand(1);
 return true;
   }
   }
@@ -1268,8 +1268,8 @@ bool SITargetLowering::getTgtMemIntrinsic(IntrinsicInfo 
&Info,
 Info.opc = ISD::INTRINSIC_VOID;
 unsigned Width = cast<ConstantInt>(CI.getArgOperand(2))->getZExtValue();
 Info.memVT = EVT::getIntegerVT(CI.getContext(), Width * 8);
-Info.flags |= MachineMemOperand::MOLoad | MachineMemOperand::MOStore |
-  MachineMemOperand::MOVolatile;
+Info.ptrVal = CI.getArgOperand(1);
+Info.flags |= MachineMemOperand::MOLoad | MachineMemOperand::MOStore;
 return true;
   }
   case Intrinsic::amdgcn_ds_bvh_stack_rtn: {
@@ -9084,7 +9084,9 @@ SDValue SITargetLowering::LowerINTRINSIC_VOID(SDValue Op,
 MachinePointerInfo LoadPtrI = LoadMMO->getPointerInfo();
 
 MachinePointerInfo StorePtrI = LoadPtrI;
-StorePtrI.V = nullptr;
+LoadPtrI.V = UndefValue::get(
+PointerType::get(*DAG.getContext(), AMDGPUAS::GLOBAL_ADDRESS));
+LoadPtrI.AddrSpace = AMDGPUAS::GLOBAL_ADDRESS;
 StorePtrI.AddrSpace = AMDGPUAS::LOCAL_ADDRESS;
 
 auto F = LoadMMO->getFlags() &
@@ -9162,6 +9164,8 @@ SDValue SITargetLowering::LowerINTRINSIC_VOID(SDValue Op,
 MachinePointerInfo LoadPtrI = LoadMMO->getPointerInfo();
 LoadPtrI.Offset = Op->getConstantOperandVal(5);
 MachinePointerInfo StorePtrI = LoadPtrI;
+LoadPtrI.V = UndefValue::get(
+PointerType::get(*DAG.getContext(), AMDGPUAS::GLOBAL_ADDRESS));
 LoadPtrI.AddrSpace = AMDGPUAS::GLOBAL_ADDRESS;
 StorePtrI.AddrSpace = AMDGPUAS::LOCAL_ADDRESS;
 auto F = LoadMMO->getFlags() &
diff --git a/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp 
b/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
index ede4841b8a5fd7..50ad22130e939e 100644
--- a/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+++ b/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
@@ -31,6 +31,7 @@
 #include "llvm/ADT/MapVector.h"
 #include "llvm/ADT/PostOrderIterator.h"
 #include "llvm/ADT/Sequence.h"
+#include "llvm/Analysis/AliasAnalysis.h"
 #include "llvm/CodeGen/MachineLoopInfo.h"
 #include "llvm/CodeGen/Mac

[Lldb-commits] [lld] [compiler-rt] [flang] [libc] [libcxx] [llvm] [clang] [lldb] [clang-tools-extra] [AMDGPU] Use alias info to relax waitcounts for LDS DMA (PR #74537)

2023-12-19 Thread Stanislav Mekhanoshin via lldb-commits

rampitec wrote:

Actually, since I am only using alias scope, I can avoid alias analysis 
altogether and only compare alias scopes. This does not need an analysis pass 
or calls to mayAlias, and the code is simpler in general. If you like that 
approach more, see the alternative PR: 
https://github.com/llvm/llvm-project/pull/75974
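
To illustrate what "only compare alias scope" could look like, here is a minimal sketch that assigns slots by scope identity alone, with no alias query. It is an assumption about the general shape, not the code from either PR: a scope is modelled as an opaque pointer, and `SlotTable`, `slotForStore` and `slotForLoad` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model, not LLVM code: an alias scope is an opaque pointer, and two
// accesses refer to the same LDS object only if their scopes are equal.
using ScopeId = const void *;

class SlotTable {
  static constexpr std::size_t MaxSpecificSlots = 8;
  std::vector<ScopeId> Scopes; // Scopes[i] owns slot i + 1

public:
  // Slot to wait on for an LDS load. Returns 0, the generic slot, when the
  // load has no scope or no recorded store carries the same scope.
  unsigned slotForLoad(ScopeId Scope) const {
    if (!Scope)
      return 0;
    for (std::size_t I = 0; I < Scopes.size(); ++I)
      if (Scopes[I] == Scope)
        return static_cast<unsigned>(I + 1);
    return 0;
  }

  // Slot to update for an LDS DMA store. Returns 0 when the store has no
  // scope or all specific slots are already taken.
  unsigned slotForStore(ScopeId Scope) {
    if (!Scope)
      return 0;
    if (unsigned Existing = slotForLoad(Scope))
      return Existing;
    if (Scopes.size() == MaxSpecificSlots)
      return 0;
    Scopes.push_back(Scope);
    return static_cast<unsigned>(Scopes.size());
  }
};
```

The lookup is a plain equality test, which is why this variant would need neither an analysis pass nor a `mayAlias` call. Anything it cannot match falls back to slot 0.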

https://github.com/llvm/llvm-project/pull/74537


[Lldb-commits] [lld] [compiler-rt] [flang] [libc] [libcxx] [llvm] [clang] [lldb] [clang-tools-extra] [AMDGPU] Use alias scope to relax waitcounts for LDS DMA (PR #75974)

2023-12-19 Thread Stanislav Mekhanoshin via lldb-commits

https://github.com/rampitec updated 
https://github.com/llvm/llvm-project/pull/75974

>From 7e382620cdc5999c645ed0746f242595f0294c58 Mon Sep 17 00:00:00 2001
From: Stanislav Mekhanoshin 
Date: Mon, 4 Dec 2023 16:11:53 -0800
Subject: [PATCH 01/12] [AMDGPU] Use alias info to relax waitcounts for LDS DMA

LDS DMA loads increment VMCNT, and a load from the LDS location they
store to must wait on this counter so that it only reads memory after
it is written. The wait count insertion pass does not track memory
dependencies, it tracks register dependencies. To model the LDS
dependency a pseudo register is used in the scoreboard, acting as if
LDS DMA writes it and an LDS load reads it.

This patch adds 8 more pseudo registers to use for independent LDS
locations if we can prove they are disjoint using alias analysis.

Fixes: SWDEV-433427
---
 llvm/lib/Target/AMDGPU/SIISelLowering.cpp   |  16 +-
 llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp |  73 +-
 llvm/lib/Target/AMDGPU/SIInstrInfo.cpp  |   4 +-
 llvm/lib/Target/AMDGPU/SIInstrInfo.h|   8 +
 llvm/lib/Target/AMDGPU/lds-dma-waits.ll | 154 
 llvm/test/CodeGen/AMDGPU/llc-pipeline.ll|   2 +
 6 files changed, 241 insertions(+), 16 deletions(-)
 create mode 100644 llvm/lib/Target/AMDGPU/lds-dma-waits.ll

diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp 
b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
index a7f4d63229b7ef..2e079404b087fa 100644
--- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
@@ -1128,11 +1128,10 @@ bool SITargetLowering::getTgtMemIntrinsic(IntrinsicInfo 
&Info,
 MachineMemOperand::MOStore |
 MachineMemOperand::MODereferenceable;
 
-  // XXX - Should this be volatile without known ordering?
-  Info.flags |= MachineMemOperand::MOVolatile;
-
   switch (IntrID) {
   default:
+// XXX - Should this be volatile without known ordering?
+Info.flags |= MachineMemOperand::MOVolatile;
 break;
   case Intrinsic::amdgcn_raw_buffer_load_lds:
   case Intrinsic::amdgcn_raw_ptr_buffer_load_lds:
@@ -1140,6 +1139,7 @@ bool SITargetLowering::getTgtMemIntrinsic(IntrinsicInfo 
&Info,
   case Intrinsic::amdgcn_struct_ptr_buffer_load_lds: {
 unsigned Width = cast<ConstantInt>(CI.getArgOperand(2))->getZExtValue();
 Info.memVT = EVT::getIntegerVT(CI.getContext(), Width * 8);
+Info.ptrVal = CI.getArgOperand(1);
 return true;
   }
   }
@@ -1268,8 +1268,8 @@ bool SITargetLowering::getTgtMemIntrinsic(IntrinsicInfo 
&Info,
 Info.opc = ISD::INTRINSIC_VOID;
 unsigned Width = cast<ConstantInt>(CI.getArgOperand(2))->getZExtValue();
 Info.memVT = EVT::getIntegerVT(CI.getContext(), Width * 8);
-Info.flags |= MachineMemOperand::MOLoad | MachineMemOperand::MOStore |
-  MachineMemOperand::MOVolatile;
+Info.ptrVal = CI.getArgOperand(1);
+Info.flags |= MachineMemOperand::MOLoad | MachineMemOperand::MOStore;
 return true;
   }
   case Intrinsic::amdgcn_ds_bvh_stack_rtn: {
@@ -9084,7 +9084,9 @@ SDValue SITargetLowering::LowerINTRINSIC_VOID(SDValue Op,
 MachinePointerInfo LoadPtrI = LoadMMO->getPointerInfo();
 
 MachinePointerInfo StorePtrI = LoadPtrI;
-StorePtrI.V = nullptr;
+LoadPtrI.V = UndefValue::get(
+PointerType::get(*DAG.getContext(), AMDGPUAS::GLOBAL_ADDRESS));
+LoadPtrI.AddrSpace = AMDGPUAS::GLOBAL_ADDRESS;
 StorePtrI.AddrSpace = AMDGPUAS::LOCAL_ADDRESS;
 
 auto F = LoadMMO->getFlags() &
@@ -9162,6 +9164,8 @@ SDValue SITargetLowering::LowerINTRINSIC_VOID(SDValue Op,
 MachinePointerInfo LoadPtrI = LoadMMO->getPointerInfo();
 LoadPtrI.Offset = Op->getConstantOperandVal(5);
 MachinePointerInfo StorePtrI = LoadPtrI;
+LoadPtrI.V = UndefValue::get(
+PointerType::get(*DAG.getContext(), AMDGPUAS::GLOBAL_ADDRESS));
 LoadPtrI.AddrSpace = AMDGPUAS::GLOBAL_ADDRESS;
 StorePtrI.AddrSpace = AMDGPUAS::LOCAL_ADDRESS;
 auto F = LoadMMO->getFlags() &
diff --git a/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp 
b/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
index ede4841b8a5fd7..50ad22130e939e 100644
--- a/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+++ b/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
@@ -31,6 +31,7 @@
 #include "llvm/ADT/MapVector.h"
 #include "llvm/ADT/PostOrderIterator.h"
 #include "llvm/ADT/Sequence.h"
+#include "llvm/Analysis/AliasAnalysis.h"
 #include "llvm/CodeGen/MachineLoopInfo.h"
 #include "llvm/CodeGen/MachinePostDominators.h"
 #include "llvm/InitializePasses.h"
@@ -121,8 +122,13 @@ enum RegisterMapping {
   SQ_MAX_PGM_VGPRS = 512, // Maximum programmable VGPRs across all targets.
   AGPR_OFFSET = 256,  // Maximum programmable ArchVGPRs across all targets.
   SQ_MAX_PGM_SGPRS = 256, // Maximum programmable SGPRs across all targets.
-  NUM_EXTRA_VGPRS = 1,// A reserved slot for DS.
-  EXTRA_VGPR_LDS = 0, // An artificial register to track LDS writes.
+  NUM_EXTRA_VGPRS = 9,// Reserved slots

[Lldb-commits] [clang] [lldb] [flang] [llvm] [libc] [libcxx] [lld] [clang-tools-extra] [compiler-rt] [AMDGPU] Use alias scope to relax waitcounts for LDS DMA (PR #75974)

2023-12-19 Thread Stanislav Mekhanoshin via lldb-commits

rampitec wrote:

One thing to note: I create this alias.scope myself in the module LDS 
lowering, so I know exactly what to expect. And because of this module LDS 
lowering, even if any alias scope were created earlier (which never happens, 
much less for an intrinsic call), it would already be lost. It is lost along 
with the memory objects deleted by the lowering. That is the whole point of 
creating alias.scope metadata during the lowering: we are putting all module 
LDS into a single structure, so no AA will ever disambiguate it without alias 
scope info. In this situation I am the sole creator of the metadata, of the 
instructions carrying it, and of the memory object accessed, and I am also the 
consumer of this metadata.

At -O0 there will be no LDS lowering, but there will be no AA either. I do not 
see how to exploit it in practice.

One other thing to note here: there is also !noalias metadata generated in the 
very same place. I do not care about it because I am really searching for a 
store into this memory, which is a scope.

When I was writing the code to generate this metadata, I had exactly a scenario 
like this one in mind.

https://github.com/llvm/llvm-project/pull/75974


[Lldb-commits] [clang] [lldb] [lld] [flang] [clang-tools-extra] [libcxx] [llvm] [libc] [compiler-rt] [AMDGPU] Use alias scope to relax waitcounts for LDS DMA (PR #75974)

2023-12-19 Thread Stanislav Mekhanoshin via lldb-commits

rampitec wrote:

This is the place I am creating it: https://reviews.llvm.org/D108315

https://github.com/llvm/llvm-project/pull/75974


[Lldb-commits] [lld] [clang] [flang] [clang-tools-extra] [llvm] [lldb] [libc] [compiler-rt] [libcxx] [AMDGPU] Use alias info to relax waitcounts for LDS DMA (PR #74537)

2024-01-02 Thread Stanislav Mekhanoshin via lldb-commits

rampitec wrote:

Ping

https://github.com/llvm/llvm-project/pull/74537


[Lldb-commits] [libcxx] [flang] [libc] [clang-tools-extra] [lldb] [lld] [compiler-rt] [clang] [llvm] [AMDGPU] Use alias scope to relax waitcounts for LDS DMA (PR #75974)

2024-01-02 Thread Stanislav Mekhanoshin via lldb-commits

rampitec wrote:

Ping

https://github.com/llvm/llvm-project/pull/75974


[Lldb-commits] [llvm] [clang-tools-extra] [libcxx] [compiler-rt] [lld] [clang] [libc] [flang] [lldb] [AMDGPU] Use alias info to relax waitcounts for LDS DMA (PR #74537)

2024-01-03 Thread Stanislav Mekhanoshin via lldb-commits


@@ -703,8 +713,37 @@ void WaitcntBrackets::updateByEvent(const SIInstrInfo *TII,
 setRegScore(RegNo, T, CurrScore);
   }
 }
-if (Inst.mayStore() && (TII->isDS(Inst) || mayWriteLDSThroughDMA(Inst))) {
-  setRegScore(SQ_MAX_PGM_VGPRS + EXTRA_VGPR_LDS, T, CurrScore);
+if (Inst.mayStore() &&
+(TII->isDS(Inst) || TII->mayWriteLDSThroughDMA(Inst))) {
+  // MUBUF and FLAT LDS DMA operations need a wait on vmcnt before LDS
+  // written can be accessed. A load from LDS to VMEM does not need a wait.
+  unsigned Slot = 0;
+  for (const auto *MemOp : Inst.memoperands()) {
+if (!MemOp->isStore() ||
+MemOp->getAddrSpace() != AMDGPUAS::LOCAL_ADDRESS)
+  continue;
+// Comparing just AA info does not guarantee memoperands are equal

rampitec wrote:

> PseudoSourceValue::mayAlias is supposed to report aliasing to possible IR 
> values. It looks like it's layered weirdly, and expects you to go through 
> MachineInstr::mayAlias. MachineInstr::mayAlias ought to be using the AA tags, 
> it shouldn't be a fundamental limitation

This is all PSV::mayAlias() does:
```
bool PseudoSourceValue::mayAlias(const MachineFrameInfo *) const {
  return !(isGOT() || isConstantPool() || isJumpTable());
}
```
Not very useful. And even to get to the AA tags check, MachineInstr::mayAlias() 
must first go through all the IR value checks.
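
The ordering concern can be shown with a toy model. This is not LLVM's implementation; it only mirrors the order of checks as described in this message, and the names `ToyMemOperand`, `Decision` and `decidingStage` are hypothetical. The assumption is that value-based checks run first and that a query with a missing IR value is answered conservatively before any tag is consulted.

```cpp
#include <cassert>

// Toy model of the check ordering described above, not LLVM code.
struct ToyMemOperand {
  const void *Value; // underlying IR value, or nullptr when unknown
};

enum class Decision {
  ByOffsets,            // same value: offsets and sizes decide
  ConservativeMayAlias, // a value is missing: assume the accesses alias
  ByTags                // both values known and distinct: tags are consulted
};

// Reports which stage ends up answering the query.
Decision decidingStage(const ToyMemOperand &A, const ToyMemOperand &B) {
  if (A.Value && A.Value == B.Value)
    return Decision::ByOffsets;
  if (!A.Value || !B.Value)
    return Decision::ConservativeMayAlias;
  return Decision::ByTags;
}
```

In this model the tag stage is reached only when both operands carry distinct, non-null values. An operand without a value never gets there, however precise its tags are.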

https://github.com/llvm/llvm-project/pull/74537


[Lldb-commits] [llvm] [clang-tools-extra] [libcxx] [compiler-rt] [lld] [clang] [libc] [flang] [lldb] [AMDGPU] Use alias info to relax waitcounts for LDS DMA (PR #74537)

2024-01-03 Thread Stanislav Mekhanoshin via lldb-commits


@@ -703,8 +713,37 @@ void WaitcntBrackets::updateByEvent(const SIInstrInfo *TII,
 setRegScore(RegNo, T, CurrScore);
   }
 }
-if (Inst.mayStore() && (TII->isDS(Inst) || mayWriteLDSThroughDMA(Inst))) {
-  setRegScore(SQ_MAX_PGM_VGPRS + EXTRA_VGPR_LDS, T, CurrScore);
+if (Inst.mayStore() &&
+(TII->isDS(Inst) || TII->mayWriteLDSThroughDMA(Inst))) {
+  // MUBUF and FLAT LDS DMA operations need a wait on vmcnt before LDS
+  // written can be accessed. A load from LDS to VMEM does not need a wait.
+  unsigned Slot = 0;
+  for (const auto *MemOp : Inst.memoperands()) {
+if (!MemOp->isStore() ||
+MemOp->getAddrSpace() != AMDGPUAS::LOCAL_ADDRESS)
+  continue;
+// Comparing just AA info does not guarantee memoperands are equal

rampitec wrote:

> It looks to me like it does use it if you pass UseTBAA=true. Not sure why 
> this would be a parameter in the first place

I am passing it, but to get to that check it must first go through all the Value 
and offset checks. Using AA is the last thing it does: 
https://llvm.org/doxygen/MachineInstr_8cpp_source.html#l01285

https://github.com/llvm/llvm-project/pull/74537


[Lldb-commits] [llvm] [clang-tools-extra] [libcxx] [compiler-rt] [lld] [clang] [libc] [flang] [lldb] [AMDGPU] Use alias info to relax waitcounts for LDS DMA (PR #74537)

2024-01-03 Thread Stanislav Mekhanoshin via lldb-commits


@@ -703,8 +713,37 @@ void WaitcntBrackets::updateByEvent(const SIInstrInfo *TII,
 setRegScore(RegNo, T, CurrScore);
   }
 }
-if (Inst.mayStore() && (TII->isDS(Inst) || mayWriteLDSThroughDMA(Inst))) {
-  setRegScore(SQ_MAX_PGM_VGPRS + EXTRA_VGPR_LDS, T, CurrScore);
+if (Inst.mayStore() &&
+(TII->isDS(Inst) || TII->mayWriteLDSThroughDMA(Inst))) {
+  // MUBUF and FLAT LDS DMA operations need a wait on vmcnt before LDS
+  // written can be accessed. A load from LDS to VMEM does not need a wait.
+  unsigned Slot = 0;
+  for (const auto *MemOp : Inst.memoperands()) {
+if (!MemOp->isStore() ||
+MemOp->getAddrSpace() != AMDGPUAS::LOCAL_ADDRESS)
+  continue;
+// Comparing just AA info does not guarantee memoperands are equal

rampitec wrote:

> The values don't need to be identical, that's the point of the AA query. 
> BasicAA will parse through the offsets

I also think that the values don't need to be identical. But comparing them is 
what MI::mayAlias() does *before* it checks AA: 
https://llvm.org/doxygen/MachineInstr_8cpp_source.html#l01285

https://github.com/llvm/llvm-project/pull/74537


[Lldb-commits] [llvm] [flang] [clang] [clang-tools-extra] [compiler-rt] [libc] [lldb] [lld] [libcxx] [AMDGPU] Use alias info to relax waitcounts for LDS DMA (PR #74537)

2024-01-11 Thread Stanislav Mekhanoshin via lldb-commits

rampitec wrote:

Ping

https://github.com/llvm/llvm-project/pull/74537


[Lldb-commits] [flang] [libcxx] [compiler-rt] [llvm] [libc] [lldb] [lld] [clang-tools-extra] [clang] [AMDGPU] Use alias info to relax waitcounts for LDS DMA (PR #74537)

2024-01-12 Thread Stanislav Mekhanoshin via lldb-commits

https://github.com/rampitec updated 
https://github.com/llvm/llvm-project/pull/74537

>From 7e382620cdc5999c645ed0746f242595f0294c58 Mon Sep 17 00:00:00 2001
From: Stanislav Mekhanoshin 
Date: Mon, 4 Dec 2023 16:11:53 -0800
Subject: [PATCH 01/10] [AMDGPU] Use alias info to relax waitcounts for LDS DMA

LDS DMA loads increase VMCNT, and a load from the LDS location stored to must
wait on this counter so that it reads the memory only after it is written.
The wait count insertion pass does not track memory dependencies; it
tracks register dependencies. To model the LDS dependency, a
pseudo register is used in the scoreboard, acting as if the LDS DMA
writes it and the LDS load reads it.

This patch adds 8 more pseudo registers to use for independent LDS
locations if we can prove they are disjoint using alias analysis.

Fixes: SWDEV-433427
---
 llvm/lib/Target/AMDGPU/SIISelLowering.cpp   |  16 +-
 llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp |  73 +-
 llvm/lib/Target/AMDGPU/SIInstrInfo.cpp  |   4 +-
 llvm/lib/Target/AMDGPU/SIInstrInfo.h|   8 +
 llvm/lib/Target/AMDGPU/lds-dma-waits.ll | 154 
 llvm/test/CodeGen/AMDGPU/llc-pipeline.ll|   2 +
 6 files changed, 241 insertions(+), 16 deletions(-)
 create mode 100644 llvm/lib/Target/AMDGPU/lds-dma-waits.ll

diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp 
b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
index a7f4d63229b7ef..2e079404b087fa 100644
--- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
@@ -1128,11 +1128,10 @@ bool SITargetLowering::getTgtMemIntrinsic(IntrinsicInfo 
&Info,
 MachineMemOperand::MOStore |
 MachineMemOperand::MODereferenceable;
 
-  // XXX - Should this be volatile without known ordering?
-  Info.flags |= MachineMemOperand::MOVolatile;
-
   switch (IntrID) {
   default:
+// XXX - Should this be volatile without known ordering?
+Info.flags |= MachineMemOperand::MOVolatile;
 break;
   case Intrinsic::amdgcn_raw_buffer_load_lds:
   case Intrinsic::amdgcn_raw_ptr_buffer_load_lds:
@@ -1140,6 +1139,7 @@ bool SITargetLowering::getTgtMemIntrinsic(IntrinsicInfo 
&Info,
   case Intrinsic::amdgcn_struct_ptr_buffer_load_lds: {
 unsigned Width = cast<ConstantInt>(CI.getArgOperand(2))->getZExtValue();
 Info.memVT = EVT::getIntegerVT(CI.getContext(), Width * 8);
+Info.ptrVal = CI.getArgOperand(1);
 return true;
   }
   }
@@ -1268,8 +1268,8 @@ bool SITargetLowering::getTgtMemIntrinsic(IntrinsicInfo 
&Info,
 Info.opc = ISD::INTRINSIC_VOID;
 unsigned Width = cast<ConstantInt>(CI.getArgOperand(2))->getZExtValue();
 Info.memVT = EVT::getIntegerVT(CI.getContext(), Width * 8);
-Info.flags |= MachineMemOperand::MOLoad | MachineMemOperand::MOStore |
-  MachineMemOperand::MOVolatile;
+Info.ptrVal = CI.getArgOperand(1);
+Info.flags |= MachineMemOperand::MOLoad | MachineMemOperand::MOStore;
 return true;
   }
   case Intrinsic::amdgcn_ds_bvh_stack_rtn: {
@@ -9084,7 +9084,9 @@ SDValue SITargetLowering::LowerINTRINSIC_VOID(SDValue Op,
 MachinePointerInfo LoadPtrI = LoadMMO->getPointerInfo();
 
 MachinePointerInfo StorePtrI = LoadPtrI;
-StorePtrI.V = nullptr;
+LoadPtrI.V = UndefValue::get(
+PointerType::get(*DAG.getContext(), AMDGPUAS::GLOBAL_ADDRESS));
+LoadPtrI.AddrSpace = AMDGPUAS::GLOBAL_ADDRESS;
 StorePtrI.AddrSpace = AMDGPUAS::LOCAL_ADDRESS;
 
 auto F = LoadMMO->getFlags() &
@@ -9162,6 +9164,8 @@ SDValue SITargetLowering::LowerINTRINSIC_VOID(SDValue Op,
 MachinePointerInfo LoadPtrI = LoadMMO->getPointerInfo();
 LoadPtrI.Offset = Op->getConstantOperandVal(5);
 MachinePointerInfo StorePtrI = LoadPtrI;
+LoadPtrI.V = UndefValue::get(
+PointerType::get(*DAG.getContext(), AMDGPUAS::GLOBAL_ADDRESS));
 LoadPtrI.AddrSpace = AMDGPUAS::GLOBAL_ADDRESS;
 StorePtrI.AddrSpace = AMDGPUAS::LOCAL_ADDRESS;
 auto F = LoadMMO->getFlags() &
diff --git a/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp 
b/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
index ede4841b8a5fd7..50ad22130e939e 100644
--- a/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+++ b/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
@@ -31,6 +31,7 @@
 #include "llvm/ADT/MapVector.h"
 #include "llvm/ADT/PostOrderIterator.h"
 #include "llvm/ADT/Sequence.h"
+#include "llvm/Analysis/AliasAnalysis.h"
 #include "llvm/CodeGen/MachineLoopInfo.h"
 #include "llvm/CodeGen/MachinePostDominators.h"
 #include "llvm/InitializePasses.h"
@@ -121,8 +122,13 @@ enum RegisterMapping {
   SQ_MAX_PGM_VGPRS = 512, // Maximum programmable VGPRs across all targets.
   AGPR_OFFSET = 256,  // Maximum programmable ArchVGPRs across all targets.
   SQ_MAX_PGM_SGPRS = 256, // Maximum programmable SGPRs across all targets.
-  NUM_EXTRA_VGPRS = 1,// A reserved slot for DS.
-  EXTRA_VGPR_LDS = 0, // An artificial register to track LDS writes.
+  NUM_EXTRA_VGPRS = 9,// Reserved slots

[Lldb-commits] [flang] [libcxx] [compiler-rt] [llvm] [libc] [lldb] [lld] [clang-tools-extra] [clang] [AMDGPU] Use alias info to relax waitcounts for LDS DMA (PR #74537)

2024-01-12 Thread Stanislav Mekhanoshin via lldb-commits


@@ -130,6 +130,8 @@
 ; GCN-O0-NEXT:MachineDominator Tree Construction
 ; GCN-O0-NEXT:Machine Natural Loop Construction
 ; GCN-O0-NEXT:MachinePostDominator Tree Construction
+; GCN-O0-NEXT:Basic Alias Analysis (stateless AA impl)
+; GCN-O0-NEXT:Function Alias Analysis Results

rampitec wrote:

If I just skip the getAnalysis call, it does not help, since the analysis is 
requested in getAnalysisUsage. If I do not request it, it is not available at 
any opt level. This is the benefit of the alternative 
https://github.com/llvm/llvm-project/pull/75974: it does not request the full 
analysis.

https://github.com/llvm/llvm-project/pull/74537


[Lldb-commits] [flang] [libcxx] [compiler-rt] [llvm] [libc] [lldb] [lld] [clang-tools-extra] [clang] [AMDGPU] Use alias info to relax waitcounts for LDS DMA (PR #74537)

2024-01-12 Thread Stanislav Mekhanoshin via lldb-commits


@@ -1183,9 +1228,21 @@ bool 
SIInsertWaitcnts::generateWaitcntInstBefore(MachineInstr &MI,
 // No need to wait before load from VMEM to LDS.
 if (TII->mayWriteLDSThroughDMA(MI))
   continue;
-unsigned RegNo = SQ_MAX_PGM_VGPRS + EXTRA_VGPR_LDS;
+
 // VM_CNT is only relevant to vgpr or LDS.
-ScoreBrackets.determineWait(VM_CNT, RegNo, Wait);
+unsigned RegNo = SQ_MAX_PGM_VGPRS + EXTRA_VGPR_LDS;
+bool FoundAliasingStore = false;
+if (Ptr && Memop->getAAInfo() && Memop->getAAInfo().Scope) {

rampitec wrote:

I have reserved just 8 pseudo registers to track it, and I do not want to fill 
them with unrelated stuff. The only way AA will be able to handle this very 
specific situation is if there is scope info; otherwise there is no reason to 
waste a slot and compile time. If I do not enter this 'if', the pass will just 
do the conservatively correct thing and wait for this memory regardless of 
aliasing or lack of it.
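
The slot policy described here can be sketched as a small standalone model. This is only an illustration under stated assumptions: `ToyLdsTracker`, `recordStore`, and the integer scope ids are hypothetical and are not the pass's real data structures. The one number taken from the patch is the slot count of 9 (one generic slot plus 8 specific ones).

```cpp
#include <array>
#include <cassert>

// One generic slot (index 0) plus 8 specific slots, as in NUM_EXTRA_VGPRS = 9.
constexpr unsigned NumExtraSlots = 9;

// Hypothetical model of the LDS DMA tracking slots.
struct ToyLdsTracker {
  std::array<unsigned, NumExtraSlots> Score{}; // 0 means "nothing pending"
  std::array<int, NumExtraSlots> ScopeOf{};    // toy scope id per slot, 0 = free

  // Record an LDS DMA store. Scope == 0 models "no alias scope info".
  // Returns the specific slot used, or 0 if only the generic slot was scored.
  unsigned recordStore(int Scope, unsigned CurrScore) {
    unsigned Slot = 0;
    if (Scope != 0) {
      // Slots are filled in order and never freed, so the first slot that is
      // either free or already owned by this scope is the right one.
      for (unsigned I = 1; I < NumExtraSlots; ++I) {
        if (ScopeOf[I] == Scope || ScopeOf[I] == 0) {
          ScopeOf[I] = Scope;
          Slot = I;
          break;
        }
      }
    }
    if (Slot != 0)
      Score[Slot] = CurrScore;
    // The generic slot always gets a score, so a conservative wait is
    // always available to later loads.
    Score[0] = CurrScore;
    return Slot;
  }
};
```

A store without scope info never consumes one of the 8 specific slots in this model, and a ninth distinct scope falls back to the generic slot only.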

https://github.com/llvm/llvm-project/pull/74537


[Lldb-commits] [clang-tools-extra] [flang] [libc] [lldb] [compiler-rt] [lld] [llvm] [libcxx] [clang] [AMDGPU] Use alias info to relax waitcounts for LDS DMA (PR #74537)

2024-01-12 Thread Stanislav Mekhanoshin via lldb-commits


@@ -703,8 +713,37 @@ void WaitcntBrackets::updateByEvent(const SIInstrInfo *TII,
 setRegScore(RegNo, T, CurrScore);
   }
 }
-if (Inst.mayStore() && (TII->isDS(Inst) || mayWriteLDSThroughDMA(Inst))) {
-  setRegScore(SQ_MAX_PGM_VGPRS + EXTRA_VGPR_LDS, T, CurrScore);
+if (Inst.mayStore() &&
+(TII->isDS(Inst) || TII->mayWriteLDSThroughDMA(Inst))) {
+  // MUBUF and FLAT LDS DMA operations need a wait on vmcnt before LDS
+  // written can be accessed. A load from LDS to VMEM does not need a wait.
+  unsigned Slot = 0;
+  for (const auto *MemOp : Inst.memoperands()) {
+if (!MemOp->isStore() ||
+MemOp->getAddrSpace() != AMDGPUAS::LOCAL_ADDRESS)
+  continue;
+// Comparing just AA info does not guarantee memoperands are equal

rampitec wrote:

Right, there is no PSV. I mentioned PSV because you suggested using it 
earlier. As for the real IR value: it is not helpful to compare it. The IR value 
is a GEP, and this GEP is always different, i.e. these values never compare 
equal. The rest of the IR is already gone and unavailable for the analysis. 
Even if it were available, this GEP would address the kernel module LDS 
variable, a single huge LDS array, and would be useless again: it would tell 
you that any LDS operation aliases any other. Now, during the module LDS 
lowering, I am creating alias scope info specifically to disambiguate aliasing 
after the pass has squashed all LDS variables.

https://github.com/llvm/llvm-project/pull/74537


[Lldb-commits] [flang] [lld] [clang-tools-extra] [llvm] [compiler-rt] [lldb] [clang] [libc] [libcxx] [AMDGPU] Use alias info to relax waitcounts for LDS DMA (PR #74537)

2024-01-15 Thread Stanislav Mekhanoshin via lldb-commits

https://github.com/rampitec updated 
https://github.com/llvm/llvm-project/pull/74537


[Lldb-commits] [libcxx] [flang] [llvm] [libc] [compiler-rt] [clang-tools-extra] [clang] [lld] [lldb] [AMDGPU] Use alias info to relax waitcounts for LDS DMA (PR #74537)

2024-01-15 Thread Stanislav Mekhanoshin via lldb-commits


@@ -1183,9 +1228,21 @@ bool 
SIInsertWaitcnts::generateWaitcntInstBefore(MachineInstr &MI,
 // No need to wait before load from VMEM to LDS.
 if (TII->mayWriteLDSThroughDMA(MI))
   continue;
-unsigned RegNo = SQ_MAX_PGM_VGPRS + EXTRA_VGPR_LDS;
+
 // VM_CNT is only relevant to vgpr or LDS.
-ScoreBrackets.determineWait(VM_CNT, RegNo, Wait);
+unsigned RegNo = SQ_MAX_PGM_VGPRS + EXTRA_VGPR_LDS;
+bool FoundAliasingStore = false;
+if (Ptr && Memop->getAAInfo() && Memop->getAAInfo().Scope) {

rampitec wrote:

I have added more comments to explain this. The place that fills the LDS DMA 
slot bails if there is no scope info, so as not to waste limited tracking 
slots. In that case the generic first slot is still used for the operation (it 
is always used, regardless of whether we can be more specific about the 
underlying object). Here AA will be unable to disambiguate aliasing if there is 
no scope info, so this condition is simply a shortcut to avoid an expensive 
loop and AA query. I can remove this part of the condition and nothing will 
change except that it will run slower. Note that not entering this 'if' 
statement will always produce a conservatively correct wait using the first, 
generic tracking slot, which always gets a score regardless of our ability to 
track a specific object. The condition guards the relaxation code so that it 
can avoid the generic, conservative 'wait for everything' part below.
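
The query side of this can be sketched the same way. The model below is an assumption-laden illustration, not the pass itself: `ToyTrackedStore` and `toyScoreToWaitFor` are hypothetical names, and scope equality stands in for a real alias analysis query.

```cpp
#include <cassert>
#include <vector>

// Hypothetical record of one tracked LDS DMA store; not an LLVM type.
struct ToyTrackedStore {
  int Scope;      // toy alias-scope id of the stored-to object
  unsigned Score; // scoreboard score assigned when the store was issued
};

// Returns the score a later LDS load has to wait for.
// GenericScore is the score of the always-updated generic slot.
// LoadScope == 0 models a load without alias scope info.
unsigned toyScoreToWaitFor(int LoadScope, unsigned GenericScore,
                           const std::vector<ToyTrackedStore> &Tracked) {
  // Shortcut: without scope info the loop below could never relax anything.
  if (LoadScope != 0) {
    bool FoundAliasingStore = false;
    unsigned Score = 0;
    for (const ToyTrackedStore &S : Tracked) {
      if (S.Scope == LoadScope) { // toy stand-in for a "may alias" answer
        FoundAliasingStore = true;
        if (S.Score > Score)
          Score = S.Score;
      }
    }
    if (FoundAliasingStore)
      return Score; // relaxed wait on the specific slot only
  }
  // Conservative path: wait for every LDS DMA store via the generic slot.
  return GenericScore;
}
```

Dropping the `LoadScope != 0` test would not change any result in this model; it would only run the loop in cases where no match is possible.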

https://github.com/llvm/llvm-project/pull/74537


[Lldb-commits] [libcxx] [compiler-rt] [clang] [clang-tools-extra] [libc] [flang] [lldb] [lld] [llvm] [AMDGPU] Use alias info to relax waitcounts for LDS DMA (PR #74537)

2024-01-17 Thread Stanislav Mekhanoshin via lldb-commits

https://github.com/rampitec updated 
https://github.com/llvm/llvm-project/pull/74537


[Lldb-commits] [clang-tools-extra] [lldb] [libc] [libcxx] [clang] [compiler-rt] [lld] [flang] [llvm] [AMDGPU] Use alias info to relax waitcounts for LDS DMA (PR #74537)

2024-01-17 Thread Stanislav Mekhanoshin via lldb-commits


@@ -707,7 +723,40 @@ void WaitcntBrackets::updateByEvent(const SIInstrInfo *TII,
 (TII->isDS(Inst) || TII->mayWriteLDSThroughDMA(Inst))) {
   // MUBUF and FLAT LDS DMA operations need a wait on vmcnt before LDS
   // written can be accessed. A load from LDS to VMEM does not need a wait.
-  setRegScore(SQ_MAX_PGM_VGPRS + EXTRA_VGPR_LDS, T, CurrScore);
+  unsigned Slot = 0;
+  for (const auto *MemOp : Inst.memoperands()) {
+if (!MemOp->isStore() ||
+MemOp->getAddrSpace() != AMDGPUAS::LOCAL_ADDRESS)
+  continue;
+// Comparing just AA info does not guarantee memoperands are equal
+// in general, but this is so for LDS DMA in practice.
+auto AAI = MemOp->getAAInfo();
+// Alias scope information gives a way to definitely identify an
+// original memory object and practically produced in the module LDS
+// lowering pass. If there is no scope available we will not be able
+// to disambiguate LDS aliasing as after the module lowering all LDS
+// is squashed into a single big object. Do not attemt to use one of

rampitec wrote:

Done

https://github.com/llvm/llvm-project/pull/74537


[Lldb-commits] [clang-tools-extra] [lldb] [libc] [libcxx] [clang] [compiler-rt] [lld] [flang] [llvm] [AMDGPU] Use alias info to relax waitcounts for LDS DMA (PR #74537)

2024-01-17 Thread Stanislav Mekhanoshin via lldb-commits

rampitec wrote:

> lgtm, but can still fix the -O0 thing

But where do I get TM in the getAnalysisUsage?

https://github.com/llvm/llvm-project/pull/74537


[Lldb-commits] [libcxx] [llvm] [lld] [compiler-rt] [clang-tools-extra] [clang] [libc] [lldb] [flang] [AMDGPU] Use alias info to relax waitcounts for LDS DMA (PR #74537)

2024-01-17 Thread Stanislav Mekhanoshin via lldb-commits

https://github.com/rampitec updated 
https://github.com/llvm/llvm-project/pull/74537


[Lldb-commits] [libcxx] [llvm] [lld] [compiler-rt] [clang-tools-extra] [clang] [libc] [lldb] [flang] [AMDGPU] Use alias info to relax waitcounts for LDS DMA (PR #74537)

2024-01-17 Thread Stanislav Mekhanoshin via lldb-commits

rampitec wrote:

> > lgtm, but can still fix the -O0 thing
> 
> But where do I get TM in the getAnalysisUsage?

Found addUsedIfAvailable() which does the trick.

https://github.com/llvm/llvm-project/pull/74537


[Lldb-commits] [clang-tools-extra] [flang] [lld] [libc] [clang] [llvm] [mlir] [compiler-rt] [lldb] [AMDGPU] Reapply 'Sign extend simm16 in setreg intrinsic' (PR #78492)

2024-01-17 Thread Stanislav Mekhanoshin via lldb-commits

https://github.com/rampitec updated 
https://github.com/llvm/llvm-project/pull/78492

>From 01af6c9d8e80b810bbdec35dee38b1cf5d73cfe0 Mon Sep 17 00:00:00 2001
From: Stanislav Mekhanoshin 
Date: Fri, 12 Jan 2024 15:07:53 -0800
Subject: [PATCH 1/3] [AMDGPU] Sign extend simm16 in setreg intrinsic

We currently force users to use a negative constant in the
intrinsic call. Changing it to zext would break existing programs,
so just sign extend the argument.
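
As a worked illustration of the arithmetic (a standalone sketch, with the hypothetical helper names `signExtend16` and `low16`): the value 63489 used in the new test is 0xF801, which reads as -2047 when interpreted as a signed 16-bit immediate, and both spellings share the same low 16 bits.

```cpp
#include <cassert>
#include <cstdint>

// Sign-extend the low 16 bits of a 32-bit immediate. Truncating to uint16_t
// keeps the bit pattern; converting that to int16_t reinterprets it as signed
// (well defined since C++20, implementation-defined but conventional before).
constexpr int32_t signExtend16(uint32_t Imm) {
  return static_cast<int16_t>(static_cast<uint16_t>(Imm));
}

// The low 16 bits of an immediate, i.e. what fits in a 16-bit field.
constexpr uint16_t low16(int32_t Imm) { return static_cast<uint16_t>(Imm); }
```

Because `low16(63489)` and `low16(-2047)` are equal, accepting the positive spelling and sign extending it keeps programs that already pass the negative constant working.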
---
 llvm/lib/Target/AMDGPU/SOPInstructions.td | 11 ++--
 .../CodeGen/AMDGPU/llvm.amdgcn.s.setreg.ll| 66 +++
 2 files changed, 72 insertions(+), 5 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/SOPInstructions.td 
b/llvm/lib/Target/AMDGPU/SOPInstructions.td
index 46fa3d57a21cb2..5b35d4dcac2e4f 100644
--- a/llvm/lib/Target/AMDGPU/SOPInstructions.td
+++ b/llvm/lib/Target/AMDGPU/SOPInstructions.td
@@ -1117,14 +1117,12 @@ def S_GETREG_B32 : SOPK_Pseudo <
 let Defs = [MODE], Uses = [MODE] in {
 
 // FIXME: Need to truncate immediate to 16-bits.
-class S_SETREG_B32_Pseudo <list<dag> pattern=[]> : SOPK_Pseudo <
+class S_SETREG_B32_Pseudo : SOPK_Pseudo <
   "s_setreg_b32",
   (outs), (ins SReg_32:$sdst, hwreg:$simm16),
-  "$simm16, $sdst",
-  pattern>;
+  "$simm16, $sdst">;
 
-def S_SETREG_B32 : S_SETREG_B32_Pseudo <
-  [(int_amdgcn_s_setreg (i32 timm:$simm16), i32:$sdst)]> {
+def S_SETREG_B32 : S_SETREG_B32_Pseudo {
   // Use custom inserter to optimize some cases to
   // S_DENORM_MODE/S_ROUND_MODE/S_SETREG_B32_mode.
   let usesCustomInserter = 1;
@@ -1160,6 +1158,9 @@ def S_SETREG_IMM32_B32_mode : S_SETREG_IMM32_B32_Pseudo {
 
 } // End Defs = [MODE], Uses = [MODE]
 
+def : GCNPat<(int_amdgcn_s_setreg (i32 timm:$simm16), i32:$sdst),
+ (S_SETREG_B32 $sdst, (as_i16timm $simm16))>;
+
 class SOPK_WAITCNT<string opName, list<dag> pat=[]> :
 SOPK_Pseudo<
 opName,
diff --git a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.setreg.ll b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.setreg.ll
index d2c14f2401fc35..99d80b5dd14b33 100644
--- a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.setreg.ll
+++ b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.setreg.ll
@@ -1433,6 +1433,72 @@ define amdgpu_kernel void @test_setreg_set_4_bits_straddles_round_and_denorm() {
   ret void
 }
 
+define amdgpu_ps void @test_63489(i32 inreg %var.mode) {
+; GFX6-LABEL: test_63489:
+; GFX6:   ; %bb.0:
+; GFX6-NEXT:    s_setreg_b32 hwreg(HW_REG_MODE), s0 ; encoding: [0x01,0xf8,0x80,0xb9]
+; GFX6-NEXT:    ;;#ASMSTART
+; GFX6-NEXT:    ;;#ASMEND
+; GFX6-NEXT:    s_endpgm ; encoding: [0x00,0x00,0x81,0xbf]
+;
+; GFX789-LABEL: test_63489:
+; GFX789:   ; %bb.0:
+; GFX789-NEXT:    s_setreg_b32 hwreg(HW_REG_MODE), s0 ; encoding: [0x01,0xf8,0x00,0xb9]
+; GFX789-NEXT:    ;;#ASMSTART
+; GFX789-NEXT:    ;;#ASMEND
+; GFX789-NEXT:    s_endpgm ; encoding: [0x00,0x00,0x81,0xbf]
+;
+; GFX10-LABEL: test_63489:
+; GFX10:   ; %bb.0:
+; GFX10-NEXT:    s_setreg_b32 hwreg(HW_REG_MODE), s0 ; encoding: [0x01,0xf8,0x80,0xb9]
+; GFX10-NEXT:    ;;#ASMSTART
+; GFX10-NEXT:    ;;#ASMEND
+; GFX10-NEXT:    s_endpgm ; encoding: [0x00,0x00,0x81,0xbf]
+;
+; GFX11-LABEL: test_63489:
+; GFX11:   ; %bb.0:
+; GFX11-NEXT:    s_setreg_b32 hwreg(HW_REG_MODE), s0 ; encoding: [0x01,0xf8,0x00,0xb9]
+; GFX11-NEXT:    ;;#ASMSTART
+; GFX11-NEXT:    ;;#ASMEND
+; GFX11-NEXT:    s_endpgm ; encoding: [0x00,0x00,0xb0,0xbf]
+  call void @llvm.amdgcn.s.setreg(i32 63489, i32 %var.mode)
+  call void asm sideeffect "", ""()
+  ret void
+}
+
+define amdgpu_ps void @test_minus_2047(i32 inreg %var.mode) {
+; GFX6-LABEL: test_minus_2047:
+; GFX6:   ; %bb.0:
+; GFX6-NEXT:    s_setreg_b32 hwreg(HW_REG_MODE), s0 ; encoding: [0x01,0xf8,0x80,0xb9]
+; GFX6-NEXT:    ;;#ASMSTART
+; GFX6-NEXT:    ;;#ASMEND
+; GFX6-NEXT:    s_endpgm ; encoding: [0x00,0x00,0x81,0xbf]
+;
+; GFX789-LABEL: test_minus_2047:
+; GFX789:   ; %bb.0:
+; GFX789-NEXT:    s_setreg_b32 hwreg(HW_REG_MODE), s0 ; encoding: [0x01,0xf8,0x00,0xb9]
+; GFX789-NEXT:    ;;#ASMSTART
+; GFX789-NEXT:    ;;#ASMEND
+; GFX789-NEXT:    s_endpgm ; encoding: [0x00,0x00,0x81,0xbf]
+;
+; GFX10-LABEL: test_minus_2047:
+; GFX10:   ; %bb.0:
+; GFX10-NEXT:    s_setreg_b32 hwreg(HW_REG_MODE), s0 ; encoding: [0x01,0xf8,0x80,0xb9]
+; GFX10-NEXT:    ;;#ASMSTART
+; GFX10-NEXT:    ;;#ASMEND
+; GFX10-NEXT:    s_endpgm ; encoding: [0x00,0x00,0x81,0xbf]
+;
+; GFX11-LABEL: test_minus_2047:
+; GFX11:   ; %bb.0:
+; GFX11-NEXT:    s_setreg_b32 hwreg(HW_REG_MODE), s0 ; encoding: [0x01,0xf8,0x00,0xb9]
+; GFX11-NEXT:    ;;#ASMSTART
+; GFX11-NEXT:    ;;#ASMEND
+; GFX11-NEXT:    s_endpgm ; encoding: [0x00,0x00,0xb0,0xbf]
+  call void @llvm.amdgcn.s.setreg(i32 -2047, i32 %var.mode)
+  call void asm sideeffect "", ""()
+  ret void
+}
+
 ; FIXME: Broken for DAG
 ; define void @test_setreg_roundingmode_var_vgpr(i32 %var.mode) {
 ;   call void @llvm.amdgcn.s.setreg(i32 4097, i32 %var.mode)

>From daeef9d3780bcfc9f48a2bf4fff313f3e5575f6b Mon Sep 17 00:00:00 2001
From: Stanislav Mekhanoshin 
Date: Mon, 15 Jan 2024 11:21:05 -0

[Lldb-commits] [llvm] [lldb] [compiler-rt] [clang] [lld] [mlir] [libc] [clang-tools-extra] [flang] [AMDGPU] Reapply 'Sign extend simm16 in setreg intrinsic' (PR #78492)

2024-01-17 Thread Stanislav Mekhanoshin via lldb-commits

https://github.com/rampitec closed 
https://github.com/llvm/llvm-project/pull/78492
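
For reference, the sign extension described in the commit message of PR #78492 can be sketched in plain C++. This is only an illustration and not the TableGen change itself: `signExtendSImm16` is a hypothetical helper name, and the two inputs are the constants used by the `test_63489` and `test_minus_2047` tests in the patch.

```cpp
#include <cstdint>

// Hypothetical helper (not part of LLVM): keep the low 16 bits of the
// intrinsic argument and reinterpret them as a signed 16-bit immediate.
constexpr int32_t signExtendSImm16(int32_t Arg) {
  // Converting to unsigned is modular, so this is well defined for negatives.
  const uint32_t Low = static_cast<uint32_t>(Arg) & 0xFFFFu;
  // Flip the sign bit, then subtract its weight to sign extend.
  return static_cast<int32_t>(Low ^ 0x8000u) - 0x8000;
}

// 63489 is 0xF801, which reads as -2047 when treated as a signed 16-bit value.
static_assert(signExtendSImm16(63489) == -2047, "positive spelling");
static_assert(signExtendSImm16(-2047) == -2047, "negative spelling");
```

Both spellings of the constant map to the same 16-bit field, which is consistent with the two tests expecting identical `s_setreg_b32` encodings.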


[Lldb-commits] [libc] [clang] [clang-tools-extra] [libcxx] [compiler-rt] [lld] [llvm] [lldb] [flang] [AMDGPU] Use alias info to relax waitcounts for LDS DMA (PR #74537)

2024-01-17 Thread Stanislav Mekhanoshin via lldb-commits

rampitec wrote:

> > > lgtm, but can still fix the -O0 thing
> > 
> > 
> > But where do I get TM in the getAnalysisUsage?
> 
> MF.getTarget() (or maybe a pass parameter is necessary?)

There is no MF there of course.

https://github.com/llvm/llvm-project/pull/74537


[Lldb-commits] [libc] [clang] [clang-tools-extra] [libcxx] [compiler-rt] [lld] [llvm] [lldb] [flang] [AMDGPU] Use alias info to relax waitcounts for LDS DMA (PR #74537)

2024-01-17 Thread Stanislav Mekhanoshin via lldb-commits

https://github.com/rampitec closed 
https://github.com/llvm/llvm-project/pull/74537


[Lldb-commits] [libc] [clang] [clang-tools-extra] [libcxx] [compiler-rt] [lld] [llvm] [lldb] [flang] [AMDGPU] Use alias scope to relax waitcounts for LDS DMA (PR #75974)

2024-01-17 Thread Stanislav Mekhanoshin via lldb-commits

https://github.com/rampitec closed 
https://github.com/llvm/llvm-project/pull/75974


[Lldb-commits] [libcxx] [flang] [mlir] [llvm] [compiler-rt] [clang-tools-extra] [openmp] [libc] [lldb] [lld] [clang] AMDGPU: Add SourceOfDivergence for int_amdgcn_global_load_tr (PR #79218)

2024-01-23 Thread Stanislav Mekhanoshin via lldb-commits

https://github.com/rampitec approved this pull request.

LGTM

https://github.com/llvm/llvm-project/pull/79218


[Lldb-commits] [lldb] [AMDGPU] Add another SIFoldOperands instance after shrink (PR #67878)

2023-10-03 Thread Stanislav Mekhanoshin via lldb-commits

rampitec wrote:

> > I've just tested this on 1 graphics shaders and it seems to make no 
> > difference at all. I tried gfx900 and gfx1100. Can anyone else from the 
> > graphics team confirm this?
> 
> I can confirm no difference on gfx1102

gfx11 is the same as gfx10; it just bails because of the VOP3 literal support. 
This is strange for gfx9. Do these shaders use -O2 or -O3?

https://github.com/llvm/llvm-project/pull/67878


[Lldb-commits] [lldb] [AMDGPU] Add another SIFoldOperands instance after shrink (PR #67878)

2023-10-03 Thread Stanislav Mekhanoshin via lldb-commits

rampitec wrote:

> I've just tested this on 1 graphics shaders and it seems to make no 
> difference at all. I tried gfx900 and gfx1100. Can anyone else from the 
> graphics team confirm this?

It seems most of the impact is on the pre-gfx9 targets, very similar to 
https://github.com/llvm/llvm-project/pull/68028 and for the same reason: those 
targets have no no-carry add/sub. The rest of the impact is when an add/sub is 
created late in the pipeline.

https://github.com/llvm/llvm-project/pull/67878


[Lldb-commits] [lldb] [AMDGPU] Add another SIFoldOperands instance after shrink (PR #67878)

2023-10-03 Thread Stanislav Mekhanoshin via lldb-commits

rampitec wrote:

I have measured the compile-time impact by timing 
check-llvm-codegen-amdgpu on a release build:

```
before the patch:           11.06s
add folding:                11.09s +0.2%
remove folding from shrink: 11.02s -0.4%
```
In general the impact is smaller than the run-to-run variance; the numbers are 
the median time of 5 runs.
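
As an aside, taking the median of a handful of timing runs can be sketched as below. This is a generic illustration, not the script used for the measurement above: `median` is a hypothetical helper.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper: median of a non-empty set of timing samples (seconds).
double median(std::vector<double> Samples) {
  assert(!Samples.empty() && "need at least one sample");
  std::sort(Samples.begin(), Samples.end());
  const std::size_t Mid = Samples.size() / 2;
  // Odd count: the middle element. Even count: mean of the two middle ones.
  return Samples.size() % 2 ? Samples[Mid]
                            : (Samples[Mid - 1] + Samples[Mid]) / 2.0;
}
```

The median is less sensitive than the mean to a single slow outlier run, which is presumably why it suits noisy wall-clock timings like these.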

https://github.com/llvm/llvm-project/pull/67878


[Lldb-commits] [lldb] [AMDGPU] Add another SIFoldOperands instance after shrink (PR #67878)

2023-10-05 Thread Stanislav Mekhanoshin via lldb-commits

rampitec wrote:

> I've taken another look at this. The patch does not show any benefit from 
> running another `SIFoldOperands` pass _after_ `SIShrinkInstructions` per se; 
> you get exactly the same results (modulo a couple of add instructions that 
> have their operands commuted differently) if you put the second 
> `SIFoldOperands` run _before_ `SIShrinkInstructions` instead.
> 
> In other words `SIFoldOperands` is not idempotent, and the reason for 
> that seems to be:
> 
> > And the reason it only happens for some SUBREV instructions is even more 
> > convoluted. It's because SIFoldOperands will sometimes shrink 
> > V_SUB_CO_U32_e64 to V_SUBREV_CO_U32_e32 even if it does not manage to fold 
> > anything into it. This does seem wrong and is probably worth a closer look.
> 
> This goes back to https://reviews.llvm.org/D51345. Notice how the code that 
> was added to `updateOperand` does the shrinking but does not actually do any 
> folding; it returns before we get to 
> `Old.ChangeToImmediate`/`Old.substVirtReg`. A second run of `SIFoldOperands` 
> will see the shrunk instruction and fold into it.

Yes, this mostly affects old targets without no-carry add/sub, and the impact is 
on these 2 instructions, which need to be shrunk before folding. That said, fold 
operands' shrinking capabilities are really limited compared to the shrink pass.

https://github.com/llvm/llvm-project/pull/67878


[Lldb-commits] [lldb] [AMDGPU] Change the representation of double literals in operands (PR #68740)

2023-10-12 Thread Stanislav Mekhanoshin via lldb-commits

https://github.com/rampitec updated 
https://github.com/llvm/llvm-project/pull/68740

>From cc9e065a9218eb36750a2c2a4a4d08fae3f329fa Mon Sep 17 00:00:00 2001
From: Stanislav Mekhanoshin 
Date: Wed, 4 Oct 2023 13:36:25 -0700
Subject: [PATCH 1/6] [AMDGPU] Change the representation of double literals in
 operands

A 64-bit literal can be used as a 32-bit zero or sign extended
operand. In the case of a double, zeroes are added to the low 32 bits.
Currently the asm parser stores only the high 32 bits of a double into
an operand. To support codegen as requested in
https://github.com/llvm/llvm-project/issues/67781 we need to
change the representation to store a full 64-bit value so that
codegen can simply add immediates to an instruction.

There is some code to support compatibility with existing tests
and asm kernels. We allow short hex strings that represent
only the high 32 bits of a double value as valid literals.
---
 .../AMDGPU/AsmParser/AMDGPUAsmParser.cpp  | 21 --
 .../Disassembler/AMDGPUDisassembler.cpp   | 28 ++-
 .../AMDGPU/Disassembler/AMDGPUDisassembler.h  |  9 --
 .../AMDGPU/MCTargetDesc/AMDGPUInstPrinter.cpp | 12 +---
 .../AMDGPU/MCTargetDesc/AMDGPUInstPrinter.h   |  2 +-
 .../MCTargetDesc/AMDGPUMCCodeEmitter.cpp  |  3 ++
 llvm/lib/Target/AMDGPU/SIRegisterInfo.td  |  4 ++-
 .../Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp|  7 +
 llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h |  3 ++
 9 files changed, 70 insertions(+), 19 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp b/llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp
index 35656bcaea1af7f..0553d3f20b21c56 100644
--- a/llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp
+++ b/llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp
@@ -2140,9 +2140,10 @@ void AMDGPUOperand::addLiteralImmOperand(MCInst &Inst, int64_t Val, bool ApplyMo
   const_cast<AMDGPUAsmParser *>(AsmParser)->Warning(Inst.getLoc(),
   "Can't encode literal as exact 64-bit floating-point operand. "
   "Low 32-bits will be set to zero");
+  Val &= 0xffffffff00000000u;
 }
 
-Inst.addOperand(MCOperand::createImm(Literal.lshr(32).getZExtValue()));
+Inst.addOperand(MCOperand::createImm(Val));
 setImmKindLiteral();
 return;
   }
@@ -2241,7 +2242,10 @@ void AMDGPUOperand::addLiteralImmOperand(MCInst &Inst, int64_t Val, bool ApplyMo
   return;
 }
 
-Inst.addOperand(MCOperand::createImm(Lo_32(Val)));
+if (isInt<32>(Val) || isUInt<32>(Val))
+  Val = AMDGPU::isSISrcFPOperand(InstDesc, OpNum) ? Val << 32 : Lo_32(Val);
+
+Inst.addOperand(MCOperand::createImm(Val));
 setImmKindLiteral();
 return;
 
@@ -4297,7 +4301,18 @@ bool AMDGPUAsmParser::validateVOPLiteral(const MCInst &Inst,
   continue;
 
 if (MO.isImm() && !isInlineConstant(Inst, OpIdx)) {
-  uint32_t Value = static_cast<uint32_t>(MO.getImm());
+  uint64_t Value = static_cast<uint64_t>(MO.getImm());
+  bool IsFP = AMDGPU::isSISrcFPOperand(Desc, OpIdx);
+  bool IsValid32Op = AMDGPU::isValid32BitLiteral(Value, IsFP);
+
+  if (!IsValid32Op && !isInt<32>(Value) && !isUInt<32>(Value)) {
+Error(getLitLoc(Operands), "invalid operand for instruction");
+return false;
+  }
+
+  if (IsFP && IsValid32Op)
+Value = Hi_32(Value);
+
   if (NumLiterals == 0 || LiteralValue != Value) {
 LiteralValue = Value;
 ++NumLiterals;
diff --git a/llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp b/llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp
index 439762bc6caf786..8c49c9a9c87772e 100644
--- a/llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp
+++ b/llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp
@@ -378,6 +378,15 @@ static DecodeStatus decodeOperand_AVLdSt_Any(MCInst &Inst, unsigned Imm,
   return addOperand(Inst, DAsm->decodeSrcOp(Opw, Imm | 256));
 }
 
+static DecodeStatus
+decodeOperand_VSrc_f64(MCInst &Inst, unsigned Imm, uint64_t Addr,
+   const MCDisassembler *Decoder) {
+  assert(Imm < (1 << 9) && "9-bit encoding");
+  auto DAsm = static_cast<const AMDGPUDisassembler *>(Decoder);
+  return addOperand(Inst, DAsm->decodeSrcOp(AMDGPUDisassembler::OPW64, Imm,
+false, 64, true));
+}
+
 static DecodeStatus
 DecodeAVLdSt_32RegisterClass(MCInst &Inst, unsigned Imm, uint64_t Addr,
  const MCDisassembler *Decoder) {
@@ -1218,7 +1227,7 @@ AMDGPUDisassembler::decodeMandatoryLiteralConstant(unsigned Val) const {
   return MCOperand::createImm(Literal);
 }
 
-MCOperand AMDGPUDisassembler::decodeLiteralConstant() const {
+MCOperand AMDGPUDisassembler::decodeLiteralConstant(bool ExtendFP64) const {
   // For now all literal constants are supposed to be unsigned integer
   // ToDo: deal with signed/unsigned 64-bit integer constants
   // ToDo: deal with float/double constants
@@ -1228,9 +1237,11 @@ MCOperand AMDGPUDisassembler::decodeLite

[Lldb-commits] [lldb] [AMDGPU] Change the representation of double literals in operands (PR #68740)

2023-10-12 Thread Stanislav Mekhanoshin via lldb-commits

https://github.com/rampitec closed 
https://github.com/llvm/llvm-project/pull/68740
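
The representation change in PR #68740 can be illustrated with a small sketch. Per the commit message, a 32-bit literal used for a double operand supplies the high 32 bits, the low 32 bits are zero, and the operand now stores the full 64-bit value. The helper names below are hypothetical and only mirror that description; they are not the LLVM functions touched by the patch.

```cpp
#include <cstdint>

// Hypothetical: expand the 32-bit encoded literal of an f64 operand into the
// full 64-bit value by placing it in the high half (the low half is zero).
constexpr uint64_t expandFP64Literal(uint32_t Encoded) {
  return static_cast<uint64_t>(Encoded) << 32;
}

// Hypothetical: a 64-bit double bit pattern can be carried exactly by a
// 32-bit literal only when its low 32 bits are all zero.
constexpr bool fitsFP64Literal(uint64_t Bits) {
  return (Bits & 0xffffffffu) == 0;
}

// 1.0 is 0x3ff0000000000000 in IEEE-754 binary64, so 0x3ff00000 encodes it.
static_assert(expandFP64Literal(0x3ff00000u) == 0x3ff0000000000000ull, "");
static_assert(fitsFP64Literal(0x3ff0000000000000ull), "");
// 0.1 is 0x3fb999999999999a; its low bits are non-zero, so it does not fit.
static_assert(!fitsFP64Literal(0x3fb999999999999aull), "");
```

Storing the expanded value means a consumer does not have to know whether the operand came from a short literal or a full 64-bit one.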


[Lldb-commits] [lldb] [AMDGPU] Change the representation of double literals in operands (PR #68740)

2023-10-12 Thread Stanislav Mekhanoshin via lldb-commits

rampitec wrote:

> I suppose left shift of negative values is undefined because if you shift out 
> the sign bit you can overflow and get a positive value.

Sounds like BS. It is defined. Unexpected maybe.
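
For context on this exchange: before C++20 the C++ standard left the result of left-shifting a negative signed value undefined, while C++20 defines it with wrap-around semantics. One common way to get the same result under either language version is to shift in the unsigned domain. The sketch below assumes that approach; `shiftLeft32` is a hypothetical name, and this is not necessarily what the follow-up PR does.

```cpp
#include <cstdint>

// Hypothetical helper: perform the shift on the unsigned bit pattern, where
// overflow simply discards high bits, then convert back to signed.
constexpr int64_t shiftLeft32(int64_t Val) {
  return static_cast<int64_t>(static_cast<uint64_t>(Val) << 32);
}

static_assert(shiftLeft32(1) == 4294967296LL, "1 << 32");
static_assert(shiftLeft32(-1) == -4294967296LL, "bit pattern 0xffffffff00000000");
```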

https://github.com/llvm/llvm-project/pull/68740


[Lldb-commits] [lldb] [AMDGPU] Change the representation of double literals in operands (PR #68740)

2023-10-13 Thread Stanislav Mekhanoshin via lldb-commits

rampitec wrote:

> I suppose left shift of negative values is undefined because if you shift out 
> the sign bit you can overflow and get a positive value.

https://github.com/llvm/llvm-project/pull/68959

https://github.com/llvm/llvm-project/pull/68740


[Lldb-commits] [lldb] [AMDGPU] Select 64-bit imm moves if can be encoded as 32 bit operand (PR #70395)

2023-10-30 Thread Stanislav Mekhanoshin via lldb-commits

https://github.com/rampitec closed 
https://github.com/llvm/llvm-project/pull/70395


[Lldb-commits] [clang-tools-extra] [mlir] [libcxx] [llvm] [libc] [flang] [openmp] [clang] [lldb] GlobalISel: Guide return in llvm::getIConstantSplatVal (PR #71989)

2023-11-10 Thread Stanislav Mekhanoshin via lldb-commits

rampitec wrote:

Any tests?

https://github.com/llvm/llvm-project/pull/71989


[Lldb-commits] [llvm] [libc] [flang] [openmp] [clang] [mlir] [libcxx] [lldb] [clang-tools-extra] GlobalISel: Guard return in llvm::getIConstantSplatVal (PR #71989)

2023-11-14 Thread Stanislav Mekhanoshin via lldb-commits

https://github.com/rampitec approved this pull request.


https://github.com/llvm/llvm-project/pull/71989