[llvm] [clang] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #68932)

2023-11-06 Thread Jun Wang via cfe-commits
https://github.com/jwanggit86 updated https://github.com/llvm/llvm-project/pull/68932 >From 07b3f94e49df221406cf7b83a05c8704e1af1c75 Mon Sep 17 00:00:00 2001 From: Jun Wang Date: Thu, 12 Oct 2023 16:45:59 -0500 Subject: [PATCH 1/2] [AMDGPU] Emit a waitcnt instruction after each memory instruct

[llvm] [clang] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #68932)

2023-11-06 Thread Jun Wang via cfe-commits
https://github.com/jwanggit86 updated https://github.com/llvm/llvm-project/pull/68932 >From 07b3f94e49df221406cf7b83a05c8704e1af1c75 Mon Sep 17 00:00:00 2001 From: Jun Wang Date: Thu, 12 Oct 2023 16:45:59 -0500 Subject: [PATCH 1/3] [AMDGPU] Emit a waitcnt instruction after each memory instruct

[clang] [llvm] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #68932)

2023-11-06 Thread Jun Wang via cfe-commits
https://github.com/jwanggit86 updated https://github.com/llvm/llvm-project/pull/68932 >From 07b3f94e49df221406cf7b83a05c8704e1af1c75 Mon Sep 17 00:00:00 2001 From: Jun Wang Date: Thu, 12 Oct 2023 16:45:59 -0500 Subject: [PATCH 1/4] [AMDGPU] Emit a waitcnt instruction after each memory instruct

[clang] [llvm] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #68932)

2023-11-06 Thread Jun Wang via cfe-commits
https://github.com/jwanggit86 updated https://github.com/llvm/llvm-project/pull/68932 >From 07b3f94e49df221406cf7b83a05c8704e1af1c75 Mon Sep 17 00:00:00 2001 From: Jun Wang Date: Thu, 12 Oct 2023 16:45:59 -0500 Subject: [PATCH 1/5] [AMDGPU] Emit a waitcnt instruction after each memory instruct

[llvm] [clang] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #68932)

2023-11-06 Thread Jun Wang via cfe-commits
https://github.com/jwanggit86 updated https://github.com/llvm/llvm-project/pull/68932 >From a87ba1892375ef67edb5d6f3bd537869203273a6 Mon Sep 17 00:00:00 2001 From: Jun Wang Date: Thu, 12 Oct 2023 16:45:59 -0500 Subject: [PATCH 1/5] [AMDGPU] Emit a waitcnt instruction after each memory instruct

[clang] [llvm] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #68932)

2023-11-06 Thread Jun Wang via cfe-commits
https://github.com/jwanggit86 updated https://github.com/llvm/llvm-project/pull/68932 >From a87ba1892375ef67edb5d6f3bd537869203273a6 Mon Sep 17 00:00:00 2001 From: Jun Wang Date: Thu, 12 Oct 2023 16:45:59 -0500 Subject: [PATCH 1/6] [AMDGPU] Emit a waitcnt instruction after each memory instruct

[llvm] [clang] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #68932)

2023-11-08 Thread Jun Wang via cfe-commits
@@ -1809,6 +1816,23 @@ bool SIInsertWaitcnts::shouldFlushVmCnt(MachineLoop *ML, return HasVMemLoad && UsesVgprLoadedOutside; } +bool SIInsertWaitcnts::insertWaitcntAfterMemOp(MachineFunction &MF) { + bool Modified = false; + + for (auto &MBB : MF) { jwangg

[llvm] [clang] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #68932)

2023-11-08 Thread Jun Wang via cfe-commits
@@ -52,6 +52,11 @@ static cl::opt ForceEmitZeroFlag( cl::desc("Force all waitcnt instrs to be emitted as s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)"), cl::init(false), cl::Hidden); +static cl::opt +PreciseMemOpFlag("amdgpu-precise-memory-op", + cl::de

[llvm] [clang] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #68932)

2023-11-09 Thread Jun Wang via cfe-commits
@@ -52,6 +52,11 @@ static cl::opt ForceEmitZeroFlag( cl::desc("Force all waitcnt instrs to be emitted as s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)"), cl::init(false), cl::Hidden); +static cl::opt +PreciseMemOpFlag("amdgpu-precise-memory-op", + cl::de

[llvm] [clang] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #68932)

2023-11-09 Thread Jun Wang via cfe-commits
@@ -1809,6 +1816,23 @@ bool SIInsertWaitcnts::shouldFlushVmCnt(MachineLoop *ML, return HasVMemLoad && UsesVgprLoadedOutside; } +bool SIInsertWaitcnts::insertWaitcntAfterMemOp(MachineFunction &MF) { + bool Modified = false; + + for (auto &MBB : MF) { jwangg

[llvm] [clang] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #68932)

2023-11-09 Thread Jun Wang via cfe-commits
https://github.com/jwanggit86 updated https://github.com/llvm/llvm-project/pull/68932 >From a87ba1892375ef67edb5d6f3bd537869203273a6 Mon Sep 17 00:00:00 2001 From: Jun Wang Date: Thu, 12 Oct 2023 16:45:59 -0500 Subject: [PATCH 1/8] [AMDGPU] Emit a waitcnt instruction after each memory instruct

[llvm] [clang] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #68932)

2023-11-09 Thread Jun Wang via cfe-commits
@@ -1809,6 +1816,23 @@ bool SIInsertWaitcnts::shouldFlushVmCnt(MachineLoop *ML, return HasVMemLoad && UsesVgprLoadedOutside; } +bool SIInsertWaitcnts::insertWaitcntAfterMemOp(MachineFunction &MF) { + bool Modified = false; + + for (auto &MBB : MF) { jwangg

[clang] [llvm] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #68932)

2023-11-15 Thread Jun Wang via cfe-commits
jwanggit86 wrote: > > So, while it's possible to create a combined option, using a separate > > option also makes sense. Do we generally try to avoid creating new > > command-line options? > > Looking again, I see they are different and unrelated. I don't really > understand why we have amdgp

[llvm] [clang] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #68932)

2023-11-20 Thread Jun Wang via cfe-commits
jwanggit86 wrote: @arsenm Pls let me know if you have any further comments. I'd like to have the code committed and the PR closed. Thanks! https://github.com/llvm/llvm-project/pull/68932 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://

[clang] [llvm] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #68932)

2023-11-21 Thread Jun Wang via cfe-commits
https://github.com/jwanggit86 updated https://github.com/llvm/llvm-project/pull/68932 >From e393477607cb94b45a3b9a5db2aea98fb8af2a86 Mon Sep 17 00:00:00 2001 From: Jun Wang Date: Thu, 12 Oct 2023 16:45:59 -0500 Subject: [PATCH 1/9] [AMDGPU] Emit a waitcnt instruction after each memory instruct

[clang] [llvm] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #68932)

2023-11-21 Thread Jun Wang via cfe-commits
@@ -0,0 +1,222 @@ +; Testing the -amdgpu-precise-memory-op option +; COM: llc -mtriple=amdgcn -mcpu=hawaii -mattr=+amdgpu-precise-memory-op -verify-machineinstrs < %s | FileCheck %s -check-prefixes=GFX7 jwanggit86 wrote: Comment. Some testcases in this file won'

[clang] [llvm] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #68932)

2023-11-21 Thread Jun Wang via cfe-commits
@@ -1708,6 +1710,13 @@ bool SIInsertWaitcnts::insertWaitcntInBlock(MachineFunction &MF, } ++Iter; +if (ST->isPreciseMemoryEnabled() && Inst.mayLoadOrStore()) { + auto builder = + BuildMI(Block, Iter, DebugLoc(), TII->get(AMDGPU::S_WAITCNT)) +

[clang] [llvm] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #68932)

2023-11-21 Thread Jun Wang via cfe-commits
@@ -1708,6 +1710,13 @@ bool SIInsertWaitcnts::insertWaitcntInBlock(MachineFunction &MF, } ++Iter; +if (ST->isPreciseMemoryEnabled() && Inst.mayLoadOrStore()) { + auto builder = jwanggit86 wrote: Done. https://github.com/llvm/llvm-project/pu

[clang] [llvm] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #68932)

2023-11-21 Thread Jun Wang via cfe-commits
@@ -388,6 +388,8 @@ class SIInsertWaitcnts : public MachineFunctionPass { // message. DenseSet ReleaseVGPRInsts; + // bool insertWaitcntAfterMemOp(MachineFunction &MF); jwanggit86 wrote: Done. https://github.com/llvm/llvm-project/pull/68932

[clang] [llvm] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #68932)

2023-11-21 Thread Jun Wang via cfe-commits
https://github.com/jwanggit86 updated https://github.com/llvm/llvm-project/pull/68932 >From e393477607cb94b45a3b9a5db2aea98fb8af2a86 Mon Sep 17 00:00:00 2001 From: Jun Wang Date: Thu, 12 Oct 2023 16:45:59 -0500 Subject: [PATCH 01/10] [AMDGPU] Emit a waitcnt instruction after each memory instru

[clang] [llvm] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #68932)

2023-11-22 Thread Jun Wang via cfe-commits
@@ -1708,6 +1710,19 @@ bool SIInsertWaitcnts::insertWaitcntInBlock(MachineFunction &MF, } ++Iter; +if (ST->isPreciseMemoryEnabled() && Inst.mayLoadOrStore()) { + auto Builder = + BuildMI(Block, Iter, DebugLoc(), TII->get(AMDGPU::S_WAITCNT)) +

[clang] [llvm] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #68932)

2023-11-22 Thread Jun Wang via cfe-commits
https://github.com/jwanggit86 updated https://github.com/llvm/llvm-project/pull/68932 >From e393477607cb94b45a3b9a5db2aea98fb8af2a86 Mon Sep 17 00:00:00 2001 From: Jun Wang Date: Thu, 12 Oct 2023 16:45:59 -0500 Subject: [PATCH 01/11] [AMDGPU] Emit a waitcnt instruction after each memory instru

[clang] [llvm] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #68932)

2023-11-22 Thread Jun Wang via cfe-commits
@@ -1847,6 +1862,7 @@ bool SIInsertWaitcnts::runOnMachineFunction(MachineFunction &MF) { TrackedWaitcntSet.clear(); BlockInfos.clear(); + jwanggit86 wrote: Done. https://github.com/llvm/llvm-project/pull/68932 ___

[clang] [llvm] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #68932)

2023-11-22 Thread Jun Wang via cfe-commits
@@ -1708,6 +1710,19 @@ bool SIInsertWaitcnts::insertWaitcntInBlock(MachineFunction &MF, } ++Iter; +if (ST->isPreciseMemoryEnabled() && Inst.mayLoadOrStore()) { + auto Builder = + BuildMI(Block, Iter, DebugLoc(), TII->get(AMDGPU::S_WAITCNT)) +

[llvm] [clang] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #68932)

2023-11-22 Thread Jun Wang via cfe-commits
@@ -1708,6 +1710,19 @@ bool SIInsertWaitcnts::insertWaitcntInBlock(MachineFunction &MF, } ++Iter; +if (ST->isPreciseMemoryEnabled() && Inst.mayLoadOrStore()) { + auto Builder = + BuildMI(Block, Iter, DebugLoc(), TII->get(AMDGPU::S_WAITCNT)) +

[clang] [llvm] [AMDGPU] Adding the amdgpu-num-work-groups function attribute (PR #75647)

2023-12-15 Thread Jun Wang via cfe-commits
https://github.com/jwanggit86 created https://github.com/llvm/llvm-project/pull/75647 A new function attribute named amdgpu-num-work-groups is added. This attribute allows programmers to let the compiler know the number of workgroups to be launched and do optimizations based on that informatio

[llvm] [clang] [AMDGPU] Adding the amdgpu-num-work-groups function attribute (PR #75647)

2023-12-15 Thread Jun Wang via cfe-commits
https://github.com/jwanggit86 updated https://github.com/llvm/llvm-project/pull/75647 >From bb15eebae9645e5383f26066093c0734ea76442d Mon Sep 17 00:00:00 2001 From: Jun Wang Date: Fri, 15 Dec 2023 13:53:54 -0600 Subject: [PATCH 1/2] [AMDGPU] Adding the amdgpu-num-work-groups function attribute

[clang] [llvm] [AMDGPU] Adding the amdgpu-num-work-groups function attribute (PR #75647)

2023-12-19 Thread Jun Wang via cfe-commits
jwanggit86 wrote: Two possible optimizations mentioned by the requester are, "1. This'll let the backend know the maximum size of the workgroup ID, and so we can do things like infer nsw or the ability to use a 16-bit add or so on 2. This could be used to optimize global sync stuff in the futu

[clang] [llvm] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #79236)

2024-03-26 Thread Jun Wang via cfe-commits
@@ -0,0 +1,577 @@ +; RUN: llc -mtriple=amdgcn -mcpu=gfx900 -mattr=+precise-memory < %s | FileCheck %s -check-prefixes=GFX9 +; RUN: llc -mtriple=amdgcn -mcpu=gfx90a -mattr=+precise-memory < %s | FileCheck %s -check-prefixes=GFX90A +; RUN: llc -mtriple=amdgcn -mcpu=gfx1010 -mattr=

[clang] [llvm] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #79236)

2024-03-26 Thread Jun Wang via cfe-commits
@@ -0,0 +1,577 @@ +; RUN: llc -mtriple=amdgcn -mcpu=gfx900 -mattr=+precise-memory < %s | FileCheck %s -check-prefixes=GFX9 +; RUN: llc -mtriple=amdgcn -mcpu=gfx90a -mattr=+precise-memory < %s | FileCheck %s -check-prefixes=GFX90A +; RUN: llc -mtriple=amdgcn -mcpu=gfx1010 -mattr=

[clang] [llvm] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #79236)

2024-03-26 Thread Jun Wang via cfe-commits
@@ -0,0 +1,577 @@ +; RUN: llc -mtriple=amdgcn -mcpu=gfx900 -mattr=+precise-memory < %s | FileCheck %s -check-prefixes=GFX9 +; RUN: llc -mtriple=amdgcn -mcpu=gfx90a -mattr=+precise-memory < %s | FileCheck %s -check-prefixes=GFX90A +; RUN: llc -mtriple=amdgcn -mcpu=gfx1010 -mattr=

[clang] [llvm] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #79236)

2024-03-26 Thread Jun Wang via cfe-commits
@@ -2326,6 +2326,20 @@ bool SIInsertWaitcnts::insertWaitcntInBlock(MachineFunction &MF, } #endif +if (ST->isPreciseMemoryEnabled()) { + AMDGPU::Waitcnt Wait; + if (WCG == &WCGPreGFX12) +Wait = AMDGPU::Waitcnt(0, 0, 0, 0); jwanggit86

[clang] [llvm] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #79236)

2024-03-26 Thread Jun Wang via cfe-commits
@@ -2326,6 +2326,20 @@ bool SIInsertWaitcnts::insertWaitcntInBlock(MachineFunction &MF, } #endif +if (ST->isPreciseMemoryEnabled() && Inst.mayLoadOrStore()) { + AMDGPU::Waitcnt Wait; + if (ST->hasExtendedWaitCounts()) +Wait = AMDGPU::Waitcnt(0, 0, 0,

[clang] [llvm] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #79236)

2024-03-26 Thread Jun Wang via cfe-commits
https://github.com/jwanggit86 edited https://github.com/llvm/llvm-project/pull/79236 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #79236)

2024-03-29 Thread Jun Wang via cfe-commits
@@ -0,0 +1,618 @@ +; RUN: llc -mtriple=amdgcn -mcpu=gfx900 -mattr=+precise-memory < %s | FileCheck %s -check-prefixes=GFX9 +; RUN: llc -mtriple=amdgcn -mcpu=gfx90a -mattr=+precise-memory < %s | FileCheck %s -check-prefixes=GFX90A +; RUN: llc -mtriple=amdgcn -mcpu=gfx1010 -mattr=

[clang] [llvm] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #79236)

2024-03-29 Thread Jun Wang via cfe-commits
@@ -2326,6 +2326,20 @@ bool SIInsertWaitcnts::insertWaitcntInBlock(MachineFunction &MF, } #endif +if (ST->isPreciseMemoryEnabled() && Inst.mayLoadOrStore()) { + AMDGPU::Waitcnt Wait; + if (ST->hasExtendedWaitCounts()) +Wait = AMDGPU::Waitcnt(0, 0, 0,

[clang] [llvm] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #79236)

2024-03-31 Thread Jun Wang via cfe-commits
@@ -0,0 +1,1413 @@ +; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 4 +; RUN: llc -mtriple=amdgcn -mcpu=gfx900 -mattr=+precise-memory < %s | FileCheck %s -check-prefixes=GFX9 +; RUN: llc -mtriple=amdgcn -mcpu=gfx90a -mattr=+preci

[clang] [llvm] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #79236)

2024-03-31 Thread Jun Wang via cfe-commits
@@ -0,0 +1,1413 @@ +; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 4 +; RUN: llc -mtriple=amdgcn -mcpu=gfx900 -mattr=+precise-memory < %s | FileCheck %s -check-prefixes=GFX9 +; RUN: llc -mtriple=amdgcn -mcpu=gfx90a -mattr=+preci

[clang] [llvm] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #79236)

2024-04-01 Thread Jun Wang via cfe-commits
jwanggit86 wrote: @jayfoad @arsenm Any other comments? https://github.com/llvm/llvm-project/pull/79236 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[llvm] [clang] [AMDGPU] Adding the amdgpu-num-work-groups function attribute (PR #79035)

2024-02-06 Thread Jun Wang via cfe-commits
jwanggit86 wrote: I thought about having one attribute with 6 numbers. Then you have to provide 6 numbers when using it. In the current design, either the min or the max attribute can be omitted. https://github.com/llvm/llvm-project/pull/79035 ___ cf

[clang] [llvm] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #79236)

2024-02-06 Thread Jun Wang via cfe-commits
@@ -167,6 +167,10 @@ def FeatureCuMode : SubtargetFeature<"cumode", "Enable CU wavefront execution mode" >; +def FeaturePreciseMemory jwanggit86 wrote: The name was the result of some discussions last year. I've forwarded you the email. https://github.com

[llvm] [clang] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #79236)

2024-02-06 Thread Jun Wang via cfe-commits
@@ -605,12 +606,197 @@ class SIGfx12CacheControl : public SIGfx11CacheControl { bool IsNonTemporal) const override; }; +class SIPreciseMemorySupport { jwanggit86 wrote: I don't have strong objection to merging with CacheC

[llvm] [clang] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #79236)

2024-02-06 Thread Jun Wang via cfe-commits
@@ -605,12 +606,197 @@ class SIGfx12CacheControl : public SIGfx11CacheControl { bool IsNonTemporal) const override; }; +class SIPreciseMemorySupport { +protected: + const GCNSubtarget &ST; + const SIInstrInfo *TII = nullptr; + + IsaVers

[clang] [llvm] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #79236)

2024-02-07 Thread Jun Wang via cfe-commits
@@ -605,12 +606,197 @@ class SIGfx12CacheControl : public SIGfx11CacheControl { bool IsNonTemporal) const override; }; +class SIPreciseMemorySupport { jwanggit86 wrote: Ok, will merge with CacheControl. https://github.co

[clang] [llvm] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #79236)

2024-02-07 Thread Jun Wang via cfe-commits
@@ -605,12 +606,197 @@ class SIGfx12CacheControl : public SIGfx11CacheControl { bool IsNonTemporal) const override; }; +class SIPreciseMemorySupport { +protected: + const GCNSubtarget &ST; + const SIInstrInfo *TII = nullptr; + + IsaVers

[clang] [llvm] [AMDGPU] Adding the amdgpu-num-work-groups function attribute (PR #79035)

2024-02-09 Thread Jun Wang via cfe-commits
https://github.com/jwanggit86 updated https://github.com/llvm/llvm-project/pull/79035 >From 5c088a59bd36df40bae9a3a712f3994feded359d Mon Sep 17 00:00:00 2001 From: Jun Wang Date: Mon, 22 Jan 2024 12:43:27 -0600 Subject: [PATCH 1/3] [AMDGPU] Adding the amdgpu-num-work-groups function attribute

[clang] [llvm] [AMDGPU] Adding the amdgpu-num-work-groups function attribute (PR #79035)

2024-02-09 Thread Jun Wang via cfe-commits
https://github.com/jwanggit86 updated https://github.com/llvm/llvm-project/pull/79035 >From 5c088a59bd36df40bae9a3a712f3994feded359d Mon Sep 17 00:00:00 2001 From: Jun Wang Date: Mon, 22 Jan 2024 12:43:27 -0600 Subject: [PATCH 1/4] [AMDGPU] Adding the amdgpu-num-work-groups function attribute

[clang] [llvm] [AMDGPU] Adding the amdgpu-num-work-groups function attribute (PR #79035)

2024-02-12 Thread Jun Wang via cfe-commits
@@ -356,6 +356,19 @@ void AMDGPUTargetCodeGenInfo::setFunctionDeclAttributes( if (NumVGPR != 0) F->addFnAttr("amdgpu-num-vgpr", llvm::utostr(NumVGPR)); } + + if (const auto *Attr = FD->getAttr()) { +uint32_t X = Attr->getNumWorkGroupsX(); +uint32_t Y = Attr

[clang] [llvm] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #79236)

2024-02-12 Thread Jun Wang via cfe-commits
https://github.com/jwanggit86 updated https://github.com/llvm/llvm-project/pull/79236 >From 9c40b1151b0673430ff53eb121784724a5b090e5 Mon Sep 17 00:00:00 2001 From: Jun Wang Date: Tue, 23 Jan 2024 19:19:00 -0600 Subject: [PATCH 1/3] [AMDGPU] Emit a waitcnt instruction after each memory instruct

[clang] [llvm] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #79236)

2024-02-12 Thread Jun Wang via cfe-commits
@@ -605,12 +606,197 @@ class SIGfx12CacheControl : public SIGfx11CacheControl { bool IsNonTemporal) const override; }; +class SIPreciseMemorySupport { jwanggit86 wrote: Merged with SICacheControl. https://github.com/llvm

[clang] [llvm] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #79236)

2024-02-12 Thread Jun Wang via cfe-commits
@@ -167,6 +167,10 @@ def FeatureCuMode : SubtargetFeature<"cumode", "Enable CU wavefront execution mode" >; +def FeaturePreciseMemory jwanggit86 wrote: As it is, we have a clang command-line option "-mamdgpu-precise-memory-op". When specified, "+amdgpu-pre

[clang] [llvm] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #79236)

2024-02-12 Thread Jun Wang via cfe-commits
@@ -0,0 +1,362 @@ +; RUN: llc -mtriple=amdgcn -mcpu=gfx900 -mattr=+amdgpu-precise-memory-op < %s | FileCheck %s -check-prefixes=GFX9 +; RUN: llc -mtriple=amdgcn -mcpu=gfx90a -mattr=+amdgpu-precise-memory-op < %s | FileCheck %s -check-prefixes=GFX90A +; RUN: llc -mtriple=amdgcn -

[clang] [llvm] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #79236)

2024-02-12 Thread Jun Wang via cfe-commits
@@ -605,12 +606,197 @@ class SIGfx12CacheControl : public SIGfx11CacheControl { bool IsNonTemporal) const override; }; +class SIPreciseMemorySupport { +protected: + const GCNSubtarget &ST; + const SIInstrInfo *TII = nullptr; + + IsaVers

[clang] [llvm] [AMDGPU] Adding the amdgpu-num-work-groups function attribute (PR #79035)

2024-02-13 Thread Jun Wang via cfe-commits
@@ -2705,6 +2705,30 @@ An error will be given if: }]; } +def AMDGPUNumWorkGroupsDocs : Documentation { + let Category = DocCatAMDGPUAttributes; + let Content = [{ +The number of work groups specifies the number of work groups when the kernel +is dispatched. + +Clang suppor

[clang] [llvm] [AMDGPU] Adding the amdgpu-num-work-groups function attribute (PR #79035)

2024-03-05 Thread Jun Wang via cfe-commits
@@ -194,3 +204,105 @@ __global__ void non_cexpr_waves_per_eu_2() {} // expected-error@+1{{'amdgpu_waves_per_eu' attribute requires parameter 1 to be an integer constant}} __attribute__((amdgpu_waves_per_eu(2, ipow2(2 __global__ void non_cexpr_waves_per_eu_2_4() {} + +// ex

[clang] [llvm] [AMDGPU] Adding the amdgpu-num-work-groups function attribute (PR #79035)

2024-03-05 Thread Jun Wang via cfe-commits
@@ -139,6 +139,36 @@ kernel void reqd_work_group_size_32_2_1_flat_work_group_size_16_128() { // CHECK: define{{.*}} amdgpu_kernel void @reqd_work_group_size_32_2_1_flat_work_group_size_16_128() [[FLAT_WORK_GROUP_SIZE_16_128:#[0-9]+]] } +__attribute__((amdgpu_max_num_work_gr

[clang] [llvm] [AMDGPU] Adding the amdgpu-num-work-groups function attribute (PR #79035)

2024-03-05 Thread Jun Wang via cfe-commits
@@ -0,0 +1,77 @@ +; RUN: not llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 < %s 2>&1 | FileCheck --check-prefix=ERROR %s + +; ERROR: error: can't parse integer attribute -1 in amdgpu-max-num-work-groups +define amdgpu_kernel void @empty_max_num_work_groups_neg_num1() #21 { +entry:

[clang] [llvm] [AMDGPU] Adding the amdgpu-num-work-groups function attribute (PR #79035)

2024-03-06 Thread Jun Wang via cfe-commits
@@ -137,6 +137,12 @@ Removed Compiler Flags Attribute Changes in Clang -- +- Introduced a new function attribute ``__attribute__((amdgpu_max_num_work_groups(x, y, z)))`` or jwanggit86 wrote: There are existing attributes that have wor

[clang] [llvm] [AMDGPU] Adding the amdgpu-num-work-groups function attribute (PR #79035)

2024-03-06 Thread Jun Wang via cfe-commits
@@ -137,6 +137,12 @@ Removed Compiler Flags Attribute Changes in Clang -- +- Introduced a new function attribute ``__attribute__((amdgpu_max_num_work_groups(x, y, z)))`` or jwanggit86 wrote: In the case of flat workgroup size, the LLV

[clang] [llvm] [AMDGPU] Adding the amdgpu-num-work-groups function attribute (PR #79035)

2024-03-06 Thread Jun Wang via cfe-commits
https://github.com/jwanggit86 edited https://github.com/llvm/llvm-project/pull/79035 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [AMDGPU] Adding the amdgpu-num-work-groups function attribute (PR #79035)

2024-03-12 Thread Jun Wang via cfe-commits
https://github.com/jwanggit86 closed https://github.com/llvm/llvm-project/pull/79035 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [AMDGPU] Adding the amdgpu-num-work-groups function attribute (PR #79035)

2024-02-23 Thread Jun Wang via cfe-commits
@@ -194,3 +204,87 @@ __global__ void non_cexpr_waves_per_eu_2() {} // expected-error@+1{{'amdgpu_waves_per_eu' attribute requires parameter 1 to be an integer constant}} __attribute__((amdgpu_waves_per_eu(2, ipow2(2 __global__ void non_cexpr_waves_per_eu_2_4() {} + +// exp

[clang] [llvm] [AMDGPU] Adding the amdgpu-num-work-groups function attribute (PR #79035)

2024-02-26 Thread Jun Wang via cfe-commits
@@ -8069,6 +8069,67 @@ static void handleAMDGPUNumVGPRAttr(Sema &S, Decl *D, const ParsedAttr &AL) { D->addAttr(::new (S.Context) AMDGPUNumVGPRAttr(S.Context, AL, NumVGPR)); } +// Returns true if error jwanggit86 wrote: (1) Updated `ReleaseNotes.rst` (2) A

[clang] [llvm] [AMDGPU] Adding the amdgpu-num-work-groups function attribute (PR #79035)

2024-02-26 Thread Jun Wang via cfe-commits
@@ -8069,6 +8069,67 @@ static void handleAMDGPUNumVGPRAttr(Sema &S, Decl *D, const ParsedAttr &AL) { D->addAttr(::new (S.Context) AMDGPUNumVGPRAttr(S.Context, AL, NumVGPR)); } +// Returns true if error +static bool +checkAMDGPUMaxNumWorkGroupsArguments(Sema &S, Expr *XExpr,

[clang] [llvm] [AMDGPU] Adding the amdgpu-num-work-groups function attribute (PR #79035)

2024-02-26 Thread Jun Wang via cfe-commits
@@ -607,6 +607,29 @@ static void instantiateDependentAMDGPUWavesPerEUAttr( S.addAMDGPUWavesPerEUAttr(New, Attr, MinExpr, MaxExpr); } +static void instantiateDependentAMDGPUMaxNumWorkGroupsAttr( +Sema &S, const MultiLevelTemplateArgumentList &TemplateArgs, +const AMDG

[clang] [llvm] [AMDGPU] Adding the amdgpu-num-work-groups function attribute (PR #79035)

2024-02-26 Thread Jun Wang via cfe-commits
jwanggit86 wrote: @arsenm Any comments on the LLVM side? https://github.com/llvm/llvm-project/pull/79035 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #79236)

2024-02-26 Thread Jun Wang via cfe-commits
jwanggit86 wrote: @Pierre-vh Any further comments? https://github.com/llvm/llvm-project/pull/79236 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [AMDGPU] Adding the amdgpu-num-work-groups function attribute (PR #79035)

2024-02-27 Thread Jun Wang via cfe-commits
jwanggit86 wrote: @Pierre-vh Could you pls help review the backend part of this patch? Thanks! https://github.com/llvm/llvm-project/pull/79035 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-c

[clang] [llvm] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #79236)

2024-02-29 Thread Jun Wang via cfe-commits
jwanggit86 wrote: @jayfoad After trying the patch you provided above, it appears that this feature can indeed be done in SIInsertWaitcnt instead of SIMemoryLegalizer. Code has been updated accordingly. Pls take a look. https://github.com/llvm/llvm-project/pull/79236 ___

[clang] [llvm] [AMDGPU] Adding the amdgpu-num-work-groups function attribute (PR #79035)

2024-03-01 Thread Jun Wang via cfe-commits
@@ -137,6 +137,11 @@ Removed Compiler Flags Attribute Changes in Clang -- +- Introduced a new function attribute ``__attribute__((amdgpu_max_num_work_groups(x, y, z)))`` or + ``[[clang::amdgpu_max_num_work_groups(x, y, z)]]`` for the AMDGPU target. T

[clang] [llvm] [AMDGPU] Adding the amdgpu-num-work-groups function attribute (PR #79035)

2024-03-01 Thread Jun Wang via cfe-commits
@@ -356,6 +356,24 @@ void AMDGPUTargetCodeGenInfo::setFunctionDeclAttributes( if (NumVGPR != 0) F->addFnAttr("amdgpu-num-vgpr", llvm::utostr(NumVGPR)); } + + if (const auto *Attr = FD->getAttr()) { +uint32_t X = Attr->getMaxNumWorkGroupsX() +

[clang] [llvm] [AMDGPU] Adding the amdgpu-num-work-groups function attribute (PR #79035)

2024-03-01 Thread Jun Wang via cfe-commits
@@ -139,6 +139,36 @@ kernel void reqd_work_group_size_32_2_1_flat_work_group_size_16_128() { // CHECK: define{{.*}} amdgpu_kernel void @reqd_work_group_size_32_2_1_flat_work_group_size_16_128() [[FLAT_WORK_GROUP_SIZE_16_128:#[0-9]+]] } +__attribute__((amdgpu_max_num_work_gr

[clang] [llvm] [AMDGPU] Adding the amdgpu-num-work-groups function attribute (PR #79035)

2024-03-01 Thread Jun Wang via cfe-commits
@@ -194,3 +204,105 @@ __global__ void non_cexpr_waves_per_eu_2() {} // expected-error@+1{{'amdgpu_waves_per_eu' attribute requires parameter 1 to be an integer constant}} __attribute__((amdgpu_waves_per_eu(2, ipow2(2 __global__ void non_cexpr_waves_per_eu_2_4() {} + +// ex

[clang] [llvm] [AMDGPU] Adding the amdgpu-num-work-groups function attribute (PR #79035)

2024-03-01 Thread Jun Wang via cfe-commits
@@ -814,6 +814,15 @@ bool shouldEmitConstantsToTextSection(const Triple &TT); /// to integer. int getIntegerAttribute(const Function &F, StringRef Name, int Default); +/// \returns Unsigned Integer value requested using \p F's \p Name attribute. +/// +/// \returns \p Default i

[clang] [llvm] [AMDGPU] Adding the amdgpu-num-work-groups function attribute (PR #79035)

2024-03-01 Thread Jun Wang via cfe-commits
@@ -0,0 +1,84 @@ +; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 < %s | FileCheck %s + jwanggit86 wrote: Created a new test file to test various errors. https://github.com/llvm/llvm-project/pull/79035 ___ cfe-commi

[clang] [llvm] [AMDGPU] Adding the amdgpu-num-work-groups function attribute (PR #79035)

2024-03-01 Thread Jun Wang via cfe-commits
@@ -0,0 +1,84 @@ +; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 < %s | FileCheck %s + +; Attribute not specified. +; CHECK-LABEL: {{^}}empty_no_attribute: +define amdgpu_kernel void @empty_no_attribute() { +entry: + ret void +} + +; Ignore if number of work groups for x dime

[llvm] [clang] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #79236)

2024-01-30 Thread Jun Wang via cfe-commits
@@ -641,6 +644,9 @@ class SIMemoryLegalizer final : public MachineFunctionPass { bool expandAtomicCmpxchgOrRmw(const SIMemOpInfo &MOI, MachineBasicBlock::iterator &MI); + bool GFX9InsertWaitcntForPreciseMem(MachineFunction &MF); -

[llvm] [clang] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #79236)

2024-01-30 Thread Jun Wang via cfe-commits
@@ -167,6 +167,10 @@ def FeatureCuMode : SubtargetFeature<"cumode", "Enable CU wavefront execution mode" >; +def FeaturePreciseMemory jwanggit86 wrote: End user needs this, so a clang command-line option has been added. There was a lot discussion on this.

[llvm] [clang] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #79236)

2024-01-30 Thread Jun Wang via cfe-commits
@@ -17,13 +17,16 @@ #include "AMDGPUMachineModuleInfo.h" #include "GCNSubtarget.h" #include "MCTargetDesc/AMDGPUMCTargetDesc.h" +#include "Utils/AMDGPUBaseInfo.h" #include "llvm/ADT/BitmaskEnum.h" #include "llvm/CodeGen/MachineBasicBlock.h" #include "llvm/CodeGen/MachineFunc

[clang] [llvm] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #79236)

2024-01-30 Thread Jun Wang via cfe-commits
@@ -0,0 +1,199 @@ +; Testing the -amdgpu-precise-memory-op option +; RUN: llc -mtriple=amdgcn -mcpu=gfx900 -mattr=+amdgpu-precise-memory-op -verify-machineinstrs < %s | FileCheck %s -check-prefixes=GFX9 +; RUN: llc -mtriple=amdgcn -mcpu=gfx90a -mattr=+amdgpu-precise-memory-op -v

[llvm] [clang] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #79236)

2024-01-30 Thread Jun Wang via cfe-commits
@@ -2561,6 +2567,70 @@ bool SIMemoryLegalizer::expandAtomicCmpxchgOrRmw(const SIMemOpInfo &MOI, return Changed; } +bool SIMemoryLegalizer::GFX9InsertWaitcntForPreciseMem(MachineFunction &MF) { jwanggit86 wrote: Yes. Code has been rewritten. https://github

[llvm] [clang] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #79236)

2024-01-30 Thread Jun Wang via cfe-commits
@@ -2561,6 +2567,70 @@ bool SIMemoryLegalizer::expandAtomicCmpxchgOrRmw(const SIMemOpInfo &MOI, return Changed; } +bool SIMemoryLegalizer::GFX9InsertWaitcntForPreciseMem(MachineFunction &MF) { + const GCNSubtarget &ST = MF.getSubtarget(); + const SIInstrInfo *TII = ST.get

[llvm] [clang] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #79236)

2024-01-30 Thread Jun Wang via cfe-commits
https://github.com/jwanggit86 updated https://github.com/llvm/llvm-project/pull/79236 >From 9c40b1151b0673430ff53eb121784724a5b090e5 Mon Sep 17 00:00:00 2001 From: Jun Wang Date: Tue, 23 Jan 2024 19:19:00 -0600 Subject: [PATCH 1/2] [AMDGPU] Emit a waitcnt instruction after each memory instruct

[llvm] [clang] [AMDGPU] Adding the amdgpu-num-work-groups function attribute (PR #79035)

2024-02-04 Thread Jun Wang via cfe-commits
https://github.com/jwanggit86 updated https://github.com/llvm/llvm-project/pull/79035 >From 5c088a59bd36df40bae9a3a712f3994feded359d Mon Sep 17 00:00:00 2001 From: Jun Wang Date: Mon, 22 Jan 2024 12:43:27 -0600 Subject: [PATCH 1/2] [AMDGPU] Adding the amdgpu-num-work-groups function attribute

[llvm] [clang] [AMDGPU] Adding the amdgpu-num-work-groups function attribute (PR #79035)

2024-02-05 Thread Jun Wang via cfe-commits
jwanggit86 wrote: @krzysz00 Code has been updated. Pls take a look when convenient. Pls note the following: (1) Two attributes are now supported, one for min and one for max num of workgroups. (2) It is allowed to only specify one of the two attributes. (3) An attribute is ignored if any one of

[clang] [llvm] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #79236)

2024-02-05 Thread Jun Wang via cfe-commits
jwanggit86 wrote: @t-tye Code has been updated based on your feedback. Pls take a look. https://github.com/llvm/llvm-project/pull/79236 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #79236)

2024-01-23 Thread Jun Wang via cfe-commits
https://github.com/jwanggit86 created https://github.com/llvm/llvm-project/pull/79236 This patch introduces a new command-line option for clang, namely, amdgpu-precise-mem-op. When this option is specified, a waitcnt instruction is generated after each memory load/store instruction. The counte

[llvm] [clang] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #68932)

2024-01-23 Thread Jun Wang via cfe-commits
https://github.com/jwanggit86 closed https://github.com/llvm/llvm-project/pull/68932 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #68932)

2024-01-23 Thread Jun Wang via cfe-commits
jwanggit86 wrote: Implementation is moved to SIMemoryLegalizer pass. See pull req [79236](https://github.com/llvm/llvm-project/pull/79236). https://github.com/llvm/llvm-project/pull/68932 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https:/

[llvm] [clang] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #79236)

2024-01-24 Thread Jun Wang via cfe-commits
@@ -2561,6 +2567,70 @@ bool SIMemoryLegalizer::expandAtomicCmpxchgOrRmw(const SIMemOpInfo &MOI, return Changed; } +bool SIMemoryLegalizer::GFX9InsertWaitcntForPreciseMem(MachineFunction &MF) { + const GCNSubtarget &ST = MF.getSubtarget(); + const SIInstrInfo *TII = ST.get

[clang] [llvm] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #79236)

2024-01-25 Thread Jun Wang via cfe-commits
@@ -2561,6 +2567,70 @@ bool SIMemoryLegalizer::expandAtomicCmpxchgOrRmw(const SIMemOpInfo &MOI, return Changed; } +bool SIMemoryLegalizer::GFX9InsertWaitcntForPreciseMem(MachineFunction &MF) { + const GCNSubtarget &ST = MF.getSubtarget(); + const SIInstrInfo *TII = ST.get

[clang] [llvm] [AMDGPU] Adding the amdgpu-num-work-groups function attribute (PR #79035)

2024-01-29 Thread Jun Wang via cfe-commits
jwanggit86 wrote: @arsenm @krzysz00 Any comments? https://github.com/llvm/llvm-project/pull/79035 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[llvm] [clang] [AMDGPU] Adding the amdgpu-num-work-groups function attribute (PR #79035)

2024-01-29 Thread Jun Wang via cfe-commits
jwanggit86 wrote: @krzysz00 Are you asking for something like the following: ``` "amdgpu-min-num-work-groups"="1,2,3", "amdgpu-max-num-work-groups"="4,5,6" ``` When both are given, min must be <= max. https://github.com/llvm/llvm-project/pull/79035 __

[clang] [llvm] [AMDGPU] Adding the amdgpu-num-work-groups function attribute (PR #79035)

2024-01-29 Thread Jun Wang via cfe-commits
jwanggit86 wrote: @krzysz00 Let me make sure I understand the requirements correctly. Based on my understanding, the following are the requirements. Pls let me know if there are any mistakes. 1. Create a new function attribute for the number of workgroups (maybe 2 attributes, one for max, and

[clang] [llvm] [AMDGPU] Adding the amdgpu-num-work-groups function attribute (PR #75647)

2024-01-15 Thread Jun Wang via cfe-commits
jwanggit86 wrote: @krzysz00 So how do you want to proceed? https://github.com/llvm/llvm-project/pull/75647 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[llvm] [clang] [AMDGPU] Adding the amdgpu-num-work-groups function attribute (PR #75647)

2024-01-16 Thread Jun Wang via cfe-commits
jwanggit86 wrote: @krzysz00 So, instead of 1 number as in the current implementation, you want 3 numbers, i.e., 3 lines like the following in the metadata section? ``` .num_workgroups_x: 1 .num_workgroups_y: 2 .num_workgroups_z: 3 ``` https://github.com/llvm/llvm-project/pull/75647 ___

[clang] [llvm] [AMDGPU] Adding the amdgpu-num-work-groups function attribute (PR #79035)

2024-01-22 Thread Jun Wang via cfe-commits
https://github.com/jwanggit86 created https://github.com/llvm/llvm-project/pull/79035 A new function attribute named amdgpu-num-work-groups is added. This attribute, which consists of three integers, allows programmers to let the compiler know the number of workgroups to be launched in each of

[llvm] [clang] [AMDGPU] Adding the amdgpu-num-work-groups function attribute (PR #75647)

2024-01-22 Thread Jun Wang via cfe-commits
jwanggit86 wrote: Reimplemented in [79035](https://github.com/llvm/llvm-project/pull/79035). https://github.com/llvm/llvm-project/pull/75647 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-com

[clang] [llvm] [AMDGPU] Adding the amdgpu-num-work-groups function attribute (PR #75647)

2024-01-22 Thread Jun Wang via cfe-commits
https://github.com/jwanggit86 closed https://github.com/llvm/llvm-project/pull/75647 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #79236)

2024-02-14 Thread Jun Wang via cfe-commits
https://github.com/jwanggit86 updated https://github.com/llvm/llvm-project/pull/79236 >From 9c40b1151b0673430ff53eb121784724a5b090e5 Mon Sep 17 00:00:00 2001 From: Jun Wang Date: Tue, 23 Jan 2024 19:19:00 -0600 Subject: [PATCH 1/4] [AMDGPU] Emit a waitcnt instruction after each memory instruct

  1   2   >