https://github.com/jwanggit86 updated
https://github.com/llvm/llvm-project/pull/68932
>From 07b3f94e49df221406cf7b83a05c8704e1af1c75 Mon Sep 17 00:00:00 2001
From: Jun Wang
Date: Thu, 12 Oct 2023 16:45:59 -0500
Subject: [PATCH 1/2] [AMDGPU] Emit a waitcnt instruction after each memory
instruct
https://github.com/jwanggit86 updated
https://github.com/llvm/llvm-project/pull/68932
>From 07b3f94e49df221406cf7b83a05c8704e1af1c75 Mon Sep 17 00:00:00 2001
From: Jun Wang
Date: Thu, 12 Oct 2023 16:45:59 -0500
Subject: [PATCH 1/3] [AMDGPU] Emit a waitcnt instruction after each memory
instruct
https://github.com/jwanggit86 updated
https://github.com/llvm/llvm-project/pull/68932
>From 07b3f94e49df221406cf7b83a05c8704e1af1c75 Mon Sep 17 00:00:00 2001
From: Jun Wang
Date: Thu, 12 Oct 2023 16:45:59 -0500
Subject: [PATCH 1/4] [AMDGPU] Emit a waitcnt instruction after each memory
instruct
https://github.com/jwanggit86 updated
https://github.com/llvm/llvm-project/pull/68932
>From 07b3f94e49df221406cf7b83a05c8704e1af1c75 Mon Sep 17 00:00:00 2001
From: Jun Wang
Date: Thu, 12 Oct 2023 16:45:59 -0500
Subject: [PATCH 1/5] [AMDGPU] Emit a waitcnt instruction after each memory
instruct
https://github.com/jwanggit86 updated
https://github.com/llvm/llvm-project/pull/68932
>From a87ba1892375ef67edb5d6f3bd537869203273a6 Mon Sep 17 00:00:00 2001
From: Jun Wang
Date: Thu, 12 Oct 2023 16:45:59 -0500
Subject: [PATCH 1/5] [AMDGPU] Emit a waitcnt instruction after each memory
instruct
https://github.com/jwanggit86 updated
https://github.com/llvm/llvm-project/pull/68932
>From a87ba1892375ef67edb5d6f3bd537869203273a6 Mon Sep 17 00:00:00 2001
From: Jun Wang
Date: Thu, 12 Oct 2023 16:45:59 -0500
Subject: [PATCH 1/6] [AMDGPU] Emit a waitcnt instruction after each memory
instruct
@@ -1809,6 +1816,23 @@ bool SIInsertWaitcnts::shouldFlushVmCnt(MachineLoop *ML,
return HasVMemLoad && UsesVgprLoadedOutside;
}
+bool SIInsertWaitcnts::insertWaitcntAfterMemOp(MachineFunction &MF) {
+ bool Modified = false;
+
+ for (auto &MBB : MF) {
jwangg
@@ -52,6 +52,11 @@ static cl::opt ForceEmitZeroFlag(
cl::desc("Force all waitcnt instrs to be emitted as s_waitcnt vmcnt(0)
expcnt(0) lgkmcnt(0)"),
cl::init(false), cl::Hidden);
+static cl::opt
+PreciseMemOpFlag("amdgpu-precise-memory-op",
+ cl::de
@@ -52,6 +52,11 @@ static cl::opt ForceEmitZeroFlag(
cl::desc("Force all waitcnt instrs to be emitted as s_waitcnt vmcnt(0)
expcnt(0) lgkmcnt(0)"),
cl::init(false), cl::Hidden);
+static cl::opt
+PreciseMemOpFlag("amdgpu-precise-memory-op",
+ cl::de
@@ -1809,6 +1816,23 @@ bool SIInsertWaitcnts::shouldFlushVmCnt(MachineLoop *ML,
return HasVMemLoad && UsesVgprLoadedOutside;
}
+bool SIInsertWaitcnts::insertWaitcntAfterMemOp(MachineFunction &MF) {
+ bool Modified = false;
+
+ for (auto &MBB : MF) {
jwangg
https://github.com/jwanggit86 updated
https://github.com/llvm/llvm-project/pull/68932
>From a87ba1892375ef67edb5d6f3bd537869203273a6 Mon Sep 17 00:00:00 2001
From: Jun Wang
Date: Thu, 12 Oct 2023 16:45:59 -0500
Subject: [PATCH 1/8] [AMDGPU] Emit a waitcnt instruction after each memory
instruct
@@ -1809,6 +1816,23 @@ bool SIInsertWaitcnts::shouldFlushVmCnt(MachineLoop *ML,
return HasVMemLoad && UsesVgprLoadedOutside;
}
+bool SIInsertWaitcnts::insertWaitcntAfterMemOp(MachineFunction &MF) {
+ bool Modified = false;
+
+ for (auto &MBB : MF) {
jwangg
jwanggit86 wrote:
> > So, while it's possible to create a combined option, using a separate
> > option also makes sense. Do we generally try to avoid creating new
> > command-line options?
>
> Looking again, I see they are different and unrelated. I don't really
> understand why we have amdgp
jwanggit86 wrote:
@arsenm Pls let me know if you have any further comments. I'd like to have the
code committed and the PR closed. Thanks!
https://github.com/llvm/llvm-project/pull/68932
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://
https://github.com/jwanggit86 updated
https://github.com/llvm/llvm-project/pull/68932
>From e393477607cb94b45a3b9a5db2aea98fb8af2a86 Mon Sep 17 00:00:00 2001
From: Jun Wang
Date: Thu, 12 Oct 2023 16:45:59 -0500
Subject: [PATCH 1/9] [AMDGPU] Emit a waitcnt instruction after each memory
instruct
@@ -0,0 +1,222 @@
+; Testing the -amdgpu-precise-memory-op option
+; COM: llc -mtriple=amdgcn -mcpu=hawaii -mattr=+amdgpu-precise-memory-op
-verify-machineinstrs < %s | FileCheck %s -check-prefixes=GFX7
jwanggit86 wrote:
Comment. Some testcases in this file won'
@@ -1708,6 +1710,13 @@ bool
SIInsertWaitcnts::insertWaitcntInBlock(MachineFunction &MF,
}
++Iter;
+if (ST->isPreciseMemoryEnabled() && Inst.mayLoadOrStore()) {
+ auto builder =
+ BuildMI(Block, Iter, DebugLoc(), TII->get(AMDGPU::S_WAITCNT))
+
@@ -1708,6 +1710,13 @@ bool
SIInsertWaitcnts::insertWaitcntInBlock(MachineFunction &MF,
}
++Iter;
+if (ST->isPreciseMemoryEnabled() && Inst.mayLoadOrStore()) {
+ auto builder =
jwanggit86 wrote:
Done.
https://github.com/llvm/llvm-project/pu
@@ -388,6 +388,8 @@ class SIInsertWaitcnts : public MachineFunctionPass {
// message.
DenseSet ReleaseVGPRInsts;
+ // bool insertWaitcntAfterMemOp(MachineFunction &MF);
jwanggit86 wrote:
Done.
https://github.com/llvm/llvm-project/pull/68932
https://github.com/jwanggit86 updated
https://github.com/llvm/llvm-project/pull/68932
>From e393477607cb94b45a3b9a5db2aea98fb8af2a86 Mon Sep 17 00:00:00 2001
From: Jun Wang
Date: Thu, 12 Oct 2023 16:45:59 -0500
Subject: [PATCH 01/10] [AMDGPU] Emit a waitcnt instruction after each memory
instru
@@ -1708,6 +1710,19 @@ bool
SIInsertWaitcnts::insertWaitcntInBlock(MachineFunction &MF,
}
++Iter;
+if (ST->isPreciseMemoryEnabled() && Inst.mayLoadOrStore()) {
+ auto Builder =
+ BuildMI(Block, Iter, DebugLoc(), TII->get(AMDGPU::S_WAITCNT))
+
https://github.com/jwanggit86 updated
https://github.com/llvm/llvm-project/pull/68932
>From e393477607cb94b45a3b9a5db2aea98fb8af2a86 Mon Sep 17 00:00:00 2001
From: Jun Wang
Date: Thu, 12 Oct 2023 16:45:59 -0500
Subject: [PATCH 01/11] [AMDGPU] Emit a waitcnt instruction after each memory
instru
@@ -1847,6 +1862,7 @@ bool
SIInsertWaitcnts::runOnMachineFunction(MachineFunction &MF) {
TrackedWaitcntSet.clear();
BlockInfos.clear();
+
jwanggit86 wrote:
Done.
https://github.com/llvm/llvm-project/pull/68932
___
@@ -1708,6 +1710,19 @@ bool
SIInsertWaitcnts::insertWaitcntInBlock(MachineFunction &MF,
}
++Iter;
+if (ST->isPreciseMemoryEnabled() && Inst.mayLoadOrStore()) {
+ auto Builder =
+ BuildMI(Block, Iter, DebugLoc(), TII->get(AMDGPU::S_WAITCNT))
+
@@ -1708,6 +1710,19 @@ bool
SIInsertWaitcnts::insertWaitcntInBlock(MachineFunction &MF,
}
++Iter;
+if (ST->isPreciseMemoryEnabled() && Inst.mayLoadOrStore()) {
+ auto Builder =
+ BuildMI(Block, Iter, DebugLoc(), TII->get(AMDGPU::S_WAITCNT))
+
https://github.com/jwanggit86 created
https://github.com/llvm/llvm-project/pull/75647
A new function attribute named amdgpu-num-work-groups is added. This attribute
allows programmers to let the compiler know the number of workgroups to be
launched and do optimizations based on that informatio
https://github.com/jwanggit86 updated
https://github.com/llvm/llvm-project/pull/75647
>From bb15eebae9645e5383f26066093c0734ea76442d Mon Sep 17 00:00:00 2001
From: Jun Wang
Date: Fri, 15 Dec 2023 13:53:54 -0600
Subject: [PATCH 1/2] [AMDGPU] Adding the amdgpu-num-work-groups function
attribute
jwanggit86 wrote:
Two possible optimizations mentioned by the requester are,
"1. This'll let the backend know the maximum size of the workgroup ID, and so
we can do things like infer nsw or the ability to use a 16-bit add or so on
2. This could be used to optimize global sync stuff in the futu
@@ -0,0 +1,577 @@
+; RUN: llc -mtriple=amdgcn -mcpu=gfx900 -mattr=+precise-memory < %s |
FileCheck %s -check-prefixes=GFX9
+; RUN: llc -mtriple=amdgcn -mcpu=gfx90a -mattr=+precise-memory < %s |
FileCheck %s -check-prefixes=GFX90A
+; RUN: llc -mtriple=amdgcn -mcpu=gfx1010 -mattr=
@@ -0,0 +1,577 @@
+; RUN: llc -mtriple=amdgcn -mcpu=gfx900 -mattr=+precise-memory < %s |
FileCheck %s -check-prefixes=GFX9
+; RUN: llc -mtriple=amdgcn -mcpu=gfx90a -mattr=+precise-memory < %s |
FileCheck %s -check-prefixes=GFX90A
+; RUN: llc -mtriple=amdgcn -mcpu=gfx1010 -mattr=
@@ -0,0 +1,577 @@
+; RUN: llc -mtriple=amdgcn -mcpu=gfx900 -mattr=+precise-memory < %s |
FileCheck %s -check-prefixes=GFX9
+; RUN: llc -mtriple=amdgcn -mcpu=gfx90a -mattr=+precise-memory < %s |
FileCheck %s -check-prefixes=GFX90A
+; RUN: llc -mtriple=amdgcn -mcpu=gfx1010 -mattr=
@@ -2326,6 +2326,20 @@ bool
SIInsertWaitcnts::insertWaitcntInBlock(MachineFunction &MF,
}
#endif
+if (ST->isPreciseMemoryEnabled()) {
+ AMDGPU::Waitcnt Wait;
+ if (WCG == &WCGPreGFX12)
+Wait = AMDGPU::Waitcnt(0, 0, 0, 0);
jwanggit86
@@ -2326,6 +2326,20 @@ bool
SIInsertWaitcnts::insertWaitcntInBlock(MachineFunction &MF,
}
#endif
+if (ST->isPreciseMemoryEnabled() && Inst.mayLoadOrStore()) {
+ AMDGPU::Waitcnt Wait;
+ if (ST->hasExtendedWaitCounts())
+Wait = AMDGPU::Waitcnt(0, 0, 0,
https://github.com/jwanggit86 edited
https://github.com/llvm/llvm-project/pull/79236
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
@@ -0,0 +1,618 @@
+; RUN: llc -mtriple=amdgcn -mcpu=gfx900 -mattr=+precise-memory < %s |
FileCheck %s -check-prefixes=GFX9
+; RUN: llc -mtriple=amdgcn -mcpu=gfx90a -mattr=+precise-memory < %s |
FileCheck %s -check-prefixes=GFX90A
+; RUN: llc -mtriple=amdgcn -mcpu=gfx1010 -mattr=
@@ -2326,6 +2326,20 @@ bool
SIInsertWaitcnts::insertWaitcntInBlock(MachineFunction &MF,
}
#endif
+if (ST->isPreciseMemoryEnabled() && Inst.mayLoadOrStore()) {
+ AMDGPU::Waitcnt Wait;
+ if (ST->hasExtendedWaitCounts())
+Wait = AMDGPU::Waitcnt(0, 0, 0,
@@ -0,0 +1,1413 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
UTC_ARGS: --version 4
+; RUN: llc -mtriple=amdgcn -mcpu=gfx900 -mattr=+precise-memory < %s |
FileCheck %s -check-prefixes=GFX9
+; RUN: llc -mtriple=amdgcn -mcpu=gfx90a -mattr=+preci
@@ -0,0 +1,1413 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
UTC_ARGS: --version 4
+; RUN: llc -mtriple=amdgcn -mcpu=gfx900 -mattr=+precise-memory < %s |
FileCheck %s -check-prefixes=GFX9
+; RUN: llc -mtriple=amdgcn -mcpu=gfx90a -mattr=+preci
jwanggit86 wrote:
@jayfoad @arsenm Any other comments?
https://github.com/llvm/llvm-project/pull/79236
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
jwanggit86 wrote:
I thought about having one attribute with 6 numbers. Then you have to provide 6
numbers when using it. In the current design, either the min or the max
attribute can be omitted.
https://github.com/llvm/llvm-project/pull/79035
___
cf
@@ -167,6 +167,10 @@ def FeatureCuMode : SubtargetFeature<"cumode",
"Enable CU wavefront execution mode"
>;
+def FeaturePreciseMemory
jwanggit86 wrote:
The name was the result of some discussions last year. I've forwarded you the
email.
https://github.com
@@ -605,12 +606,197 @@ class SIGfx12CacheControl : public SIGfx11CacheControl {
bool IsNonTemporal) const override;
};
+class SIPreciseMemorySupport {
jwanggit86 wrote:
I don't have strong objection to merging with CacheC
@@ -605,12 +606,197 @@ class SIGfx12CacheControl : public SIGfx11CacheControl {
bool IsNonTemporal) const override;
};
+class SIPreciseMemorySupport {
+protected:
+ const GCNSubtarget &ST;
+ const SIInstrInfo *TII = nullptr;
+
+ IsaVers
@@ -605,12 +606,197 @@ class SIGfx12CacheControl : public SIGfx11CacheControl {
bool IsNonTemporal) const override;
};
+class SIPreciseMemorySupport {
jwanggit86 wrote:
Ok, will merge with CacheControl.
https://github.co
@@ -605,12 +606,197 @@ class SIGfx12CacheControl : public SIGfx11CacheControl {
bool IsNonTemporal) const override;
};
+class SIPreciseMemorySupport {
+protected:
+ const GCNSubtarget &ST;
+ const SIInstrInfo *TII = nullptr;
+
+ IsaVers
https://github.com/jwanggit86 updated
https://github.com/llvm/llvm-project/pull/79035
>From 5c088a59bd36df40bae9a3a712f3994feded359d Mon Sep 17 00:00:00 2001
From: Jun Wang
Date: Mon, 22 Jan 2024 12:43:27 -0600
Subject: [PATCH 1/3] [AMDGPU] Adding the amdgpu-num-work-groups function
attribute
https://github.com/jwanggit86 updated
https://github.com/llvm/llvm-project/pull/79035
>From 5c088a59bd36df40bae9a3a712f3994feded359d Mon Sep 17 00:00:00 2001
From: Jun Wang
Date: Mon, 22 Jan 2024 12:43:27 -0600
Subject: [PATCH 1/4] [AMDGPU] Adding the amdgpu-num-work-groups function
attribute
@@ -356,6 +356,19 @@ void AMDGPUTargetCodeGenInfo::setFunctionDeclAttributes(
if (NumVGPR != 0)
F->addFnAttr("amdgpu-num-vgpr", llvm::utostr(NumVGPR));
}
+
+ if (const auto *Attr = FD->getAttr()) {
+uint32_t X = Attr->getNumWorkGroupsX();
+uint32_t Y = Attr
https://github.com/jwanggit86 updated
https://github.com/llvm/llvm-project/pull/79236
>From 9c40b1151b0673430ff53eb121784724a5b090e5 Mon Sep 17 00:00:00 2001
From: Jun Wang
Date: Tue, 23 Jan 2024 19:19:00 -0600
Subject: [PATCH 1/3] [AMDGPU] Emit a waitcnt instruction after each memory
instruct
@@ -605,12 +606,197 @@ class SIGfx12CacheControl : public SIGfx11CacheControl {
bool IsNonTemporal) const override;
};
+class SIPreciseMemorySupport {
jwanggit86 wrote:
Merged with SICacheControl.
https://github.com/llvm
@@ -167,6 +167,10 @@ def FeatureCuMode : SubtargetFeature<"cumode",
"Enable CU wavefront execution mode"
>;
+def FeaturePreciseMemory
jwanggit86 wrote:
As it is, we have a clang command-line option "-mamdgpu-precise-memory-op".
When specified, "+amdgpu-pre
@@ -0,0 +1,362 @@
+; RUN: llc -mtriple=amdgcn -mcpu=gfx900 -mattr=+amdgpu-precise-memory-op < %s
| FileCheck %s -check-prefixes=GFX9
+; RUN: llc -mtriple=amdgcn -mcpu=gfx90a -mattr=+amdgpu-precise-memory-op < %s
| FileCheck %s -check-prefixes=GFX90A
+; RUN: llc -mtriple=amdgcn -
@@ -605,12 +606,197 @@ class SIGfx12CacheControl : public SIGfx11CacheControl {
bool IsNonTemporal) const override;
};
+class SIPreciseMemorySupport {
+protected:
+ const GCNSubtarget &ST;
+ const SIInstrInfo *TII = nullptr;
+
+ IsaVers
@@ -2705,6 +2705,30 @@ An error will be given if:
}];
}
+def AMDGPUNumWorkGroupsDocs : Documentation {
+ let Category = DocCatAMDGPUAttributes;
+ let Content = [{
+The number of work groups specifies the number of work groups when the kernel
+is dispatched.
+
+Clang suppor
@@ -194,3 +204,105 @@ __global__ void non_cexpr_waves_per_eu_2() {}
// expected-error@+1{{'amdgpu_waves_per_eu' attribute requires parameter 1 to
be an integer constant}}
__attribute__((amdgpu_waves_per_eu(2, ipow2(2
__global__ void non_cexpr_waves_per_eu_2_4() {}
+
+// ex
@@ -139,6 +139,36 @@ kernel void
reqd_work_group_size_32_2_1_flat_work_group_size_16_128() {
// CHECK: define{{.*}} amdgpu_kernel void
@reqd_work_group_size_32_2_1_flat_work_group_size_16_128()
[[FLAT_WORK_GROUP_SIZE_16_128:#[0-9]+]]
}
+__attribute__((amdgpu_max_num_work_gr
@@ -0,0 +1,77 @@
+; RUN: not llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 < %s 2>&1 | FileCheck
--check-prefix=ERROR %s
+
+; ERROR: error: can't parse integer attribute -1 in amdgpu-max-num-work-groups
+define amdgpu_kernel void @empty_max_num_work_groups_neg_num1() #21 {
+entry:
@@ -137,6 +137,12 @@ Removed Compiler Flags
Attribute Changes in Clang
--
+- Introduced a new function attribute
``__attribute__((amdgpu_max_num_work_groups(x, y, z)))`` or
jwanggit86 wrote:
There are existing attributes that have wor
@@ -137,6 +137,12 @@ Removed Compiler Flags
Attribute Changes in Clang
--
+- Introduced a new function attribute
``__attribute__((amdgpu_max_num_work_groups(x, y, z)))`` or
jwanggit86 wrote:
In the case of flat workgroup size, the LLV
https://github.com/jwanggit86 edited
https://github.com/llvm/llvm-project/pull/79035
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
https://github.com/jwanggit86 closed
https://github.com/llvm/llvm-project/pull/79035
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
@@ -194,3 +204,87 @@ __global__ void non_cexpr_waves_per_eu_2() {}
// expected-error@+1{{'amdgpu_waves_per_eu' attribute requires parameter 1 to
be an integer constant}}
__attribute__((amdgpu_waves_per_eu(2, ipow2(2
__global__ void non_cexpr_waves_per_eu_2_4() {}
+
+// exp
@@ -8069,6 +8069,67 @@ static void handleAMDGPUNumVGPRAttr(Sema &S, Decl *D,
const ParsedAttr &AL) {
D->addAttr(::new (S.Context) AMDGPUNumVGPRAttr(S.Context, AL, NumVGPR));
}
+// Returns true if error
jwanggit86 wrote:
(1) Updated `ReleaseNotes.rst` (2) A
@@ -8069,6 +8069,67 @@ static void handleAMDGPUNumVGPRAttr(Sema &S, Decl *D,
const ParsedAttr &AL) {
D->addAttr(::new (S.Context) AMDGPUNumVGPRAttr(S.Context, AL, NumVGPR));
}
+// Returns true if error
+static bool
+checkAMDGPUMaxNumWorkGroupsArguments(Sema &S, Expr *XExpr,
@@ -607,6 +607,29 @@ static void instantiateDependentAMDGPUWavesPerEUAttr(
S.addAMDGPUWavesPerEUAttr(New, Attr, MinExpr, MaxExpr);
}
+static void instantiateDependentAMDGPUMaxNumWorkGroupsAttr(
+Sema &S, const MultiLevelTemplateArgumentList &TemplateArgs,
+const AMDG
jwanggit86 wrote:
@arsenm Any comments on the LLVM side?
https://github.com/llvm/llvm-project/pull/79035
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
jwanggit86 wrote:
@Pierre-vh Any further comments?
https://github.com/llvm/llvm-project/pull/79236
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
jwanggit86 wrote:
@Pierre-vh Could you pls help review the backend part of this patch? Thanks!
https://github.com/llvm/llvm-project/pull/79035
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-c
jwanggit86 wrote:
@jayfoad After trying the patch you provided above, it appears that this
feature can indeed be done in SIInsertWaitcnt instead of SIMemoryLegalizer.
Code has been updated accordingly. Pls take a look.
https://github.com/llvm/llvm-project/pull/79236
___
@@ -137,6 +137,11 @@ Removed Compiler Flags
Attribute Changes in Clang
--
+- Introduced a new function attribute
``__attribute__((amdgpu_max_num_work_groups(x, y, z)))`` or
+ ``[[clang::amdgpu_max_num_work_groups(x, y, z)]]`` for the AMDGPU target.
T
@@ -356,6 +356,24 @@ void AMDGPUTargetCodeGenInfo::setFunctionDeclAttributes(
if (NumVGPR != 0)
F->addFnAttr("amdgpu-num-vgpr", llvm::utostr(NumVGPR));
}
+
+ if (const auto *Attr = FD->getAttr()) {
+uint32_t X = Attr->getMaxNumWorkGroupsX()
+
@@ -139,6 +139,36 @@ kernel void
reqd_work_group_size_32_2_1_flat_work_group_size_16_128() {
// CHECK: define{{.*}} amdgpu_kernel void
@reqd_work_group_size_32_2_1_flat_work_group_size_16_128()
[[FLAT_WORK_GROUP_SIZE_16_128:#[0-9]+]]
}
+__attribute__((amdgpu_max_num_work_gr
@@ -194,3 +204,105 @@ __global__ void non_cexpr_waves_per_eu_2() {}
// expected-error@+1{{'amdgpu_waves_per_eu' attribute requires parameter 1 to
be an integer constant}}
__attribute__((amdgpu_waves_per_eu(2, ipow2(2
__global__ void non_cexpr_waves_per_eu_2_4() {}
+
+// ex
@@ -814,6 +814,15 @@ bool shouldEmitConstantsToTextSection(const Triple &TT);
/// to integer.
int getIntegerAttribute(const Function &F, StringRef Name, int Default);
+/// \returns Unsigned Integer value requested using \p F's \p Name attribute.
+///
+/// \returns \p Default i
@@ -0,0 +1,84 @@
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 < %s | FileCheck %s
+
jwanggit86 wrote:
Created a new test file to test various errors.
https://github.com/llvm/llvm-project/pull/79035
___
cfe-commi
@@ -0,0 +1,84 @@
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 < %s | FileCheck %s
+
+; Attribute not specified.
+; CHECK-LABEL: {{^}}empty_no_attribute:
+define amdgpu_kernel void @empty_no_attribute() {
+entry:
+ ret void
+}
+
+; Ignore if number of work groups for x dime
@@ -641,6 +644,9 @@ class SIMemoryLegalizer final : public MachineFunctionPass {
bool expandAtomicCmpxchgOrRmw(const SIMemOpInfo &MOI,
MachineBasicBlock::iterator &MI);
+ bool GFX9InsertWaitcntForPreciseMem(MachineFunction &MF);
-
@@ -167,6 +167,10 @@ def FeatureCuMode : SubtargetFeature<"cumode",
"Enable CU wavefront execution mode"
>;
+def FeaturePreciseMemory
jwanggit86 wrote:
End user needs this, so a clang command-line option has been added. There was a
lot discussion on this.
@@ -17,13 +17,16 @@
#include "AMDGPUMachineModuleInfo.h"
#include "GCNSubtarget.h"
#include "MCTargetDesc/AMDGPUMCTargetDesc.h"
+#include "Utils/AMDGPUBaseInfo.h"
#include "llvm/ADT/BitmaskEnum.h"
#include "llvm/CodeGen/MachineBasicBlock.h"
#include "llvm/CodeGen/MachineFunc
@@ -0,0 +1,199 @@
+; Testing the -amdgpu-precise-memory-op option
+; RUN: llc -mtriple=amdgcn -mcpu=gfx900 -mattr=+amdgpu-precise-memory-op
-verify-machineinstrs < %s | FileCheck %s -check-prefixes=GFX9
+; RUN: llc -mtriple=amdgcn -mcpu=gfx90a -mattr=+amdgpu-precise-memory-op
-v
@@ -2561,6 +2567,70 @@ bool SIMemoryLegalizer::expandAtomicCmpxchgOrRmw(const
SIMemOpInfo &MOI,
return Changed;
}
+bool SIMemoryLegalizer::GFX9InsertWaitcntForPreciseMem(MachineFunction &MF) {
jwanggit86 wrote:
Yes. Code has been rewritten.
https://github
@@ -2561,6 +2567,70 @@ bool SIMemoryLegalizer::expandAtomicCmpxchgOrRmw(const
SIMemOpInfo &MOI,
return Changed;
}
+bool SIMemoryLegalizer::GFX9InsertWaitcntForPreciseMem(MachineFunction &MF) {
+ const GCNSubtarget &ST = MF.getSubtarget();
+ const SIInstrInfo *TII = ST.get
https://github.com/jwanggit86 updated
https://github.com/llvm/llvm-project/pull/79236
>From 9c40b1151b0673430ff53eb121784724a5b090e5 Mon Sep 17 00:00:00 2001
From: Jun Wang
Date: Tue, 23 Jan 2024 19:19:00 -0600
Subject: [PATCH 1/2] [AMDGPU] Emit a waitcnt instruction after each memory
instruct
https://github.com/jwanggit86 updated
https://github.com/llvm/llvm-project/pull/79035
>From 5c088a59bd36df40bae9a3a712f3994feded359d Mon Sep 17 00:00:00 2001
From: Jun Wang
Date: Mon, 22 Jan 2024 12:43:27 -0600
Subject: [PATCH 1/2] [AMDGPU] Adding the amdgpu-num-work-groups function
attribute
jwanggit86 wrote:
@krzysz00 Code has been updated. Pls take a look when convenient. Pls note the
following:
(1) Two attributes are now supported, one for min and one for max num of
workgroups.
(2) It is allowed to only specify one of the two attributes.
(3) An attribute is ignored if any one of
jwanggit86 wrote:
@t-tye Code has been updated based on your feedback. Pls take a look.
https://github.com/llvm/llvm-project/pull/79236
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
https://github.com/jwanggit86 created
https://github.com/llvm/llvm-project/pull/79236
This patch introduces a new command-line option for clang, namely,
amdgpu-precise-mem-op. When this option is specified, a waitcnt instruction is
generated after each memory load/store instruction. The counte
https://github.com/jwanggit86 closed
https://github.com/llvm/llvm-project/pull/68932
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
jwanggit86 wrote:
Implementation is moved to SIMemoryLegalizer pass. See pull req
[79236](https://github.com/llvm/llvm-project/pull/79236).
https://github.com/llvm/llvm-project/pull/68932
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https:/
@@ -2561,6 +2567,70 @@ bool SIMemoryLegalizer::expandAtomicCmpxchgOrRmw(const
SIMemOpInfo &MOI,
return Changed;
}
+bool SIMemoryLegalizer::GFX9InsertWaitcntForPreciseMem(MachineFunction &MF) {
+ const GCNSubtarget &ST = MF.getSubtarget();
+ const SIInstrInfo *TII = ST.get
@@ -2561,6 +2567,70 @@ bool SIMemoryLegalizer::expandAtomicCmpxchgOrRmw(const
SIMemOpInfo &MOI,
return Changed;
}
+bool SIMemoryLegalizer::GFX9InsertWaitcntForPreciseMem(MachineFunction &MF) {
+ const GCNSubtarget &ST = MF.getSubtarget();
+ const SIInstrInfo *TII = ST.get
jwanggit86 wrote:
@arsenm @krzysz00 Any comments?
https://github.com/llvm/llvm-project/pull/79035
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
jwanggit86 wrote:
@krzysz00 Are you asking for something like the following:
```
"amdgpu-min-num-work-groups"="1,2,3", "amdgpu-max-num-work-groups"="4,5,6"
```
When both are given, min must be <= max.
https://github.com/llvm/llvm-project/pull/79035
__
jwanggit86 wrote:
@krzysz00 Let me make sure I understand the requirements correctly. Based on my
understanding, the following are the requirements. Pls let me know if there are
any mistakes.
1. Create a new function attribute for the number of workgroups (maybe 2
attributes, one for max, and
jwanggit86 wrote:
@krzysz00 So how do you want to proceed?
https://github.com/llvm/llvm-project/pull/75647
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
jwanggit86 wrote:
@krzysz00 So, instead of 1 number as in the current implementation, you want 3
numbers, i.e., 3 lines like the following in the metadata section?
```
.num_workgroups_x: 1
.num_workgroups_y: 2
.num_workgroups_z: 3
```
https://github.com/llvm/llvm-project/pull/75647
___
https://github.com/jwanggit86 created
https://github.com/llvm/llvm-project/pull/79035
A new function attribute named amdgpu-num-work-groups is added. This attribute,
which consists of three integers, allows programmers to let the compiler know
the number of workgroups to be launched in each of
jwanggit86 wrote:
Reimplemented in [79035](https://github.com/llvm/llvm-project/pull/79035).
https://github.com/llvm/llvm-project/pull/75647
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-com
https://github.com/jwanggit86 closed
https://github.com/llvm/llvm-project/pull/75647
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
https://github.com/jwanggit86 updated
https://github.com/llvm/llvm-project/pull/79236
>From 9c40b1151b0673430ff53eb121784724a5b090e5 Mon Sep 17 00:00:00 2001
From: Jun Wang
Date: Tue, 23 Jan 2024 19:19:00 -0600
Subject: [PATCH 1/4] [AMDGPU] Emit a waitcnt instruction after each memory
instruct
1 - 100 of 161 matches
Mail list logo