krzysz00 wrote:
> (that means addrspacecast 7-> 8 is not invertible by 8-> 7, right? it would
> discard some bits, in invisible breakage sort of way? is there an RFC for
> that design?)
I'm not aware of anything requiring addrspacecast to be invertible? (Specifically,
cast 7 -> 8 isn't a thin
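To make the invertibility question concrete, here is a minimal C++ model. It is a sketch, not LLVM code: it assumes a `ptr addrspace(7)` is a 128-bit resource plus a 32-bit offset, and that casting a resource back to a fat pointer yields offset zero. All type and function names are hypothetical.

```cpp
#include <array>
#include <cstdint>

// Hypothetical model types; these are not LLVM data structures.
struct BufferResource {              // models ptr addrspace(8): 128-bit descriptor
  std::array<std::uint32_t, 4> words;
};
struct BufferFatPointer {            // models ptr addrspace(7): descriptor + offset
  BufferResource rsrc;
  std::uint32_t offset;
};

// Models addrspacecast 7 -> 8: keeps the descriptor, drops the offset.
inline BufferResource castFatToResource(const BufferFatPointer &p) {
  return p.rsrc;
}

// Models addrspacecast 8 -> 7 (assumption: the new offset is zero).
inline BufferFatPointer castResourceToFat(const BufferResource &r) {
  return {r, 0};
}

// True only if casting 7 -> 8 -> 7 gives back the same fat pointer.
inline bool roundTripPreserves(const BufferFatPointer &p) {
  BufferFatPointer back = castResourceToFat(castFatToResource(p));
  return back.rsrc.words == p.rsrc.words && back.offset == p.offset;
}
```

Under these assumptions the round trip only preserves fat pointers whose offset is already zero, which is the "discarded bits" concern raised in the quoted question.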
krzysz00 wrote:
(You'll note that in
https://github.com/llvm/llvm-project/pull/137425/files#diff-f904f8cd236733212015dd1988ffefcc9f79f7484ee46e3e3833d2d75fa69542R2243
, this intrinsic gets lowered to `raw_ptr_buffer_load_lds` by "pulling apart"
the ptr addrspace(7) - that `raw_ptr_buffer_load_
krzysz00 wrote:
@JonChesterfield This builtin, semantically, cannot accommodate the v4i32 usage.
When you have a v4i32, you need to also specify, as an additional argument, the
`voffset` that gets used to index into that v4i32. This builtin doesn't have
room for that, because it takes either a
krzysz00 wrote:
> I don't think we need to worry about compatibility with an intrinsic that's
> been committed for a day
`global.load.lds` and `buffer[.ptr].load.lds` have been around for quite a
while though, and this is just an abstraction over them
https://github.com/llvm/llvm-project/pull
krzysz00 wrote:
@arsenm You're right that it might be better to emit the offset, but all the
existing intrinsics that I'm abstracting over _do_ have such a field.
If you want to add new intrinsics that don't have the offset and that
pattern-match instead, I'd be more than happy to review that
https://github.com/krzysz00 closed
https://github.com/llvm/llvm-project/pull/137425
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
krzysz00 wrote:
Ping
https://github.com/llvm/llvm-project/pull/137425
@@ -2641,6 +2641,28 @@ def int_amdgcn_perm :
// GFX9 Intrinsics
//===--===//
+/// This is a general-purpose intrinsic for all operations that take a pointer
+/// a base location in LDS, and a data size and us
krzysz00 wrote:
Re discussion on the other PR about "why is this even an intrinsic" - since
this probably shouldn't just be in @jayfoad's DMs:
The reason I disagree with "just pattern-match it" is that you can't get the
scheduling you want without a guarantee of the intrinsic
Namely, while
@@ -564,6 +564,11 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned
BuiltinID,
llvm::Function *F = CGM.getIntrinsic(IID, {LoadTy});
return Builder.CreateCall(F, {Addr});
}
+ case AMDGPU::BI__builtin_amdgcn_load_to_lds: {
+// Should this have asan instrum
https://github.com/krzysz00 updated
https://github.com/llvm/llvm-project/pull/137425
>From bcb72e3d8cb2dcdb97199d32797306c5807c8442 Mon Sep 17 00:00:00 2001
From: Krzysztof Drewniak
Date: Sat, 26 Apr 2025 00:20:22 +
Subject: [PATCH 1/4] [AMDGPU] Add a new amdgcn.load.to.lds intrinsic
This
https://github.com/krzysz00 updated
https://github.com/llvm/llvm-project/pull/137425
krzysz00 wrote:
Closing in favor of #137425
https://github.com/llvm/llvm-project/pull/134911
https://github.com/krzysz00 closed
https://github.com/llvm/llvm-project/pull/134911
krzysz00 wrote:
Side note, if we're cleaning up the API, we really should add
s.buffer.load.real that actually models the memory effects correctly
https://github.com/llvm/llvm-project/pull/137678
krzysz00 wrote:
> It does, we should have a consistent set of buffer builtins
To specify what I _think_ the proposed rewrite is, it's auto-upgrading or
otherwise transforming
```llvm
%r = call T llvm.amdgcn.{raw,struct}.buffer.*(<4 x i32> %rsrc, ...)
```
into
```llvm
%rsrc.int = bitcast <4 x i3
@@ -0,0 +1,60 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py
+// RUN: %clang_cc1 -cl-std=CL2.0 -O0 -triple amdgcn-unknown-unknown
-target-cpu gfx900 -emit-llvm -o - %s | FileCheck %s
+// RUN: %clang_cc1 -cl-std=CL2.0 -O0 -triple amdgcn-unknown-u
@@ -444,17 +444,40 @@ def ROCDL_ds_read_tr6_b96 :
ROCDL_LDS_Read_Tr_IntrOp<"ds.read.tr6.b96">;
def ROCDL_ds_read_tr16_b64 : ROCDL_LDS_Read_Tr_IntrOp<"ds.read.tr16.b64">;
//===-===//
-// Global load to LDS int
@@ -163,7 +163,10 @@ BUILTIN(__builtin_amdgcn_raw_buffer_load_b64,
"V2UiQbiiIi", "n")
BUILTIN(__builtin_amdgcn_raw_buffer_load_b96, "V3UiQbiiIi", "n")
BUILTIN(__builtin_amdgcn_raw_buffer_load_b128, "V4UiQbiiIi", "n")
+TARGET_BUILTIN(__builtin_amdgcn_raw_buffer_load_lds, "vV4U
krzysz00 wrote:
Well, if y'all want to go add a pattern for this and eventually deprecate the
intrinsics I'm all ears, but we're trying to use these instructions now
https://github.com/llvm/llvm-project/pull/137425
krzysz00 wrote:
Semantically, seems fine, but I can't review meaningfully on the clang side
https://github.com/llvm/llvm-project/pull/137678
@@ -257,6 +257,7 @@ TARGET_BUILTIN(__builtin_amdgcn_flat_atomic_fadd_v2bf16,
"V2sV2s*0V2s", "t", "at
TARGET_BUILTIN(__builtin_amdgcn_global_atomic_fadd_v2bf16, "V2sV2s*1V2s", "t",
"atomic-global-pk-add-bf16-inst")
TARGET_BUILTIN(__builtin_amdgcn_ds_atomic_fadd_v2bf16, "V2sV2s*
https://github.com/krzysz00 updated
https://github.com/llvm/llvm-project/pull/137425
>From 96e94b5662c613fd80f712080751076254a73524 Mon Sep 17 00:00:00 2001
From: Krzysztof Drewniak
Date: Sat, 26 Apr 2025 00:20:22 +
Subject: [PATCH 1/2] [AMDGPU] Add a new amdgcn.load.to.lds intrinsic
This
https://github.com/krzysz00 edited
https://github.com/llvm/llvm-project/pull/137425
krzysz00 wrote:
@jayfoad I still think we need an intrinsic here because a load + an addtid
store can be scheduled very differently from the asynchronous "gather to LDS" -
and because we don't want this load/store to not be optimized
https://github.com/llvm/llvm-project/pull/137425
krzysz00 wrote:
@jayfoad
> High level question: I don't understand why you call this a "gather"
> operation. What do you mean by that? Isn't it semantically just a memcpy, or
> a (global/buffer) load followed by a (LDS) store?
The semantics of this operation (at least in the pre-gfx950 cases)
https://github.com/krzysz00 created
https://github.com/llvm/llvm-project/pull/137425
This PR adds an amdgcn_load_to_lds intrinsic that abstracts over loads to LDS
from global (address space 1) pointers and buffer fat pointers (address space
7), since they use the same API and "gather from a poi
krzysz00 wrote:
(I'll take name suggestions on the ptr addrspace(7) intrinsic)
https://github.com/llvm/llvm-project/pull/134911
krzysz00 wrote:
That is to say, because the offset you're gathering from is on the buffer fat
pointer, the addrspace(7) version of this load has the same function signature
as `global.load.lds`, not `raw.ptr.buffer.load.lds`
https://github.com/llvm/llvm-project/pull/134911
krzysz00 wrote:
`void raw.ptr.buffer.load.lds(p8 rsrc, p3 lds, i32 immarg size, i32 voffset,
i32 soffset, i32 immarg immOff, i32 immarg aux)`
However
`void raw.buffer.fat.ptr.load.lds(p7 fatPtr, p3 lds, i32 immarg size, i32
immarg immoff, i32 immarg aux)`
Please note that the buffer.ptr vers
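The relationship between the two signatures can be sketched in C++ as an argument mapping. This is an illustration only: the struct and function names are hypothetical, the LDS pointer is modelled as a plain 32-bit value, and the mapping assumes the fat pointer's offset becomes `voffset` while `soffset` is set to zero.

```cpp
#include <array>
#include <cstdint>

// Hypothetical model types; these are not LLVM data structures.
struct BufferResource { std::array<std::uint32_t, 4> words; };       // p8
struct BufferFatPointer { BufferResource rsrc; std::uint32_t offset; }; // p7

// Arguments of the fat-pointer form: (p7 fatPtr, p3 lds, size, immOff, aux).
struct FatPtrLoadLdsArgs {
  BufferFatPointer fatPtr;
  std::uint32_t lds, size, immOff, aux;
};

// Arguments of the resource form:
// (p8 rsrc, p3 lds, size, voffset, soffset, immOff, aux).
struct RawPtrLoadLdsArgs {
  BufferResource rsrc;
  std::uint32_t lds, size, voffset, soffset, immOff, aux;
};

// "Pulls apart" the fat pointer: the descriptor becomes rsrc, the offset
// becomes voffset, and soffset is zero (assumption stated in the lead-in).
inline RawPtrLoadLdsArgs pullApart(const FatPtrLoadLdsArgs &in) {
  return RawPtrLoadLdsArgs{in.fatPtr.rsrc, in.lds,    in.size, in.fatPtr.offset,
                           0u,             in.immOff, in.aux};
}
```

The point of the sketch is that the fat-pointer form needs no separate `voffset` argument, because the offset already travels inside the pointer.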
krzysz00 wrote:
@arsenm
Ok, so
1. Forcing the global intrinsic into covering buffers is out per 1:1 rules but
2. Per your comments on the previous PR, a new intrinsic for p7 is also out -
or were you just objecting to the naming?
3. We can't just reuse the intrinsic on p8 (buffer resources) -
https://github.com/krzysz00 edited
https://github.com/llvm/llvm-project/pull/134911
https://github.com/krzysz00 updated
https://github.com/llvm/llvm-project/pull/134911
>From 976aa3b4528b93bbf9e8deb433be143d45cfbad6 Mon Sep 17 00:00:00 2001
From: Krzysztof Drewniak
Date: Tue, 8 Apr 2025 19:10:41 +
Subject: [PATCH 1/2] [AMDGPU] Generalize global.load.lds to buffer fat
poin
krzysz00 wrote:
@shiltian Could you update MLIR infrastructure for the new default as well?
`mlir/lib/Target/LLVM/ROCDL/Target.cpp` and
`mlir/lib/Dialect/LLVMIR/IR/ROCDLDialect.cpp`, which both keep an ear on the
ABI version, partly for linking in device libraries
https://github.com/llvm/llvm
https://github.com/krzysz00 approved this pull request.
https://github.com/llvm/llvm-project/pull/132142
https://github.com/krzysz00 commented:
I can't speak to the Clang side of this change, but I don't see any issues here
https://github.com/llvm/llvm-project/pull/132048
https://github.com/krzysz00 closed
https://github.com/llvm/llvm-project/pull/126828
https://github.com/krzysz00 updated
https://github.com/llvm/llvm-project/pull/126828
>From f125444bb53e1e10b40b352e9cf7fd3ad052bfbf Mon Sep 17 00:00:00 2001
From: Krzysztof Drewniak
Date: Tue, 11 Feb 2025 23:55:36 +
Subject: [PATCH 1/2] [AMDGPU] Generalize amdgcn.make.buffer.rsrc to fat
po
@@ -1072,6 +1073,14 @@ static bool upgradeIntrinsicFunction1(Function *F,
Function *&NewFn,
{F->getReturnType(), F->getArg(1)->getType()});
return true;
}
+ // Old-style make.buffer.rsrc was only variadic in the input pointer
+ if (Name.cons
https://github.com/krzysz00 updated
https://github.com/llvm/llvm-project/pull/126828
>From 457350589fc4c4295a212025873ab4b90124e02f Mon Sep 17 00:00:00 2001
From: Krzysztof Drewniak
Date: Tue, 11 Feb 2025 23:55:36 +
Subject: [PATCH 1/4] [AMDGPU] Generalize amdgcn.make.buffer.rsrc to fat
po
krzysz00 wrote:
@arsenm On further investigation, I misdiagnosed the issue and have updated the
commit message accordingly. The real problem is the
`addrspacecast(addrspacecast(x)) => addrspacecast(x)` fold that was getting rid
of the fat pointer intermediate, and then infer-address-spaces did
https://github.com/krzysz00 edited
https://github.com/llvm/llvm-project/pull/126828
https://github.com/krzysz00 updated
https://github.com/llvm/llvm-project/pull/126828
>From 457350589fc4c4295a212025873ab4b90124e02f Mon Sep 17 00:00:00 2001
From: Krzysztof Drewniak
Date: Tue, 11 Feb 2025 23:55:36 +
Subject: [PATCH 1/3] [AMDGPU] Generalize amdgcn.make.buffer.rsrc to fat
po
krzysz00 wrote:
Oh, yeah, agreed that known-bits data is marginal ... but a big pile of
marginal improvements stacks up.
https://github.com/llvm/llvm-project/pull/79035
krzysz00 wrote:
The main case I had in mind when adding the annotation was `range()`-like
information: that is, the ability to infer `nsw` and friends on workgroup IDs
and dimensions
https://github.com/llvm/llvm-project/pull/79035
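As a rough illustration of how `range()`-like bounds could justify `nsw`, the sketch below checks whether a typical global-index computation, `groupId * groupSize + localId`, can overflow a signed 32-bit integer given an upper bound on the workgroup ID. The function name and the choice of expression are assumptions made for the example; this is not the analysis LLVM performs.

```cpp
#include <cstdint>
#include <limits>

// Returns true if groupId * groupSize + localId cannot overflow int32_t for
// any groupId in [0, maxGroupId] and any localId in [0, groupSize - 1].
// Hypothetical helper, written only to illustrate the reasoning.
inline bool globalIdIsNoSignedWrap(std::uint32_t maxGroupId,
                                   std::uint32_t groupSize) {
  if (groupSize == 0)
    return false; // no valid local IDs, so nothing can be concluded
  // The largest value the expression can take, computed in 64 bits so the
  // check itself cannot overflow.
  std::uint64_t largest =
      std::uint64_t(maxGroupId) * groupSize + (groupSize - 1);
  return largest <=
         std::uint64_t(std::numeric_limits<std::int32_t>::max());
}
```

With a known maximum workgroup ID, a check like this either proves the flag is safe or shows that the bound is too loose to help.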
@@ -0,0 +1,9 @@
+
+// REQUIRES: amdgpu-registered-target
+// RUN: %clang_cc1 -triple amdgcn -emit-llvm -o - %s -debug-info-kind=limited
2>&1 | FileCheck %s
+
+// CHECK: name: "__amdgcn_buffer_rsrc_t",{{.*}}baseType: ![[BT:[0-9]+]]
+// CHECK: [[BT]] = !DICompositeType(tag: DW_TAG_
krzysz00 wrote:
Just a note - and maybe this was already discussed above - is there a good reason
not to explicitly make this type a 128-bit scalar? The LLVM data layout
already does this
https://github.com/llvm/llvm-project/pull/94830
@@ -0,0 +1,95 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py
+// REQUIRES: amdgpu-registered-target
+// RUN: %clang_cc1 -triple amdgcn-unknown-unknown -cl-std=CL2.0 -target-cpu
verde -emit-llvm -o - %s | FileCheck %s
+// RUN: %clang_cc1 -triple
@@ -2201,6 +2207,9 @@ TypeInfo ASTContext::getTypeInfoImpl(const Type *T) const
{
Align = 8;
\
break;
#include "clang/Basic/WebAssemblyReferenceTypes.def"
+case BuiltinType::AMDGPUBufferRsrc:
+ W
krzysz00 wrote:
@ThomasRaoux No, I just left a nitpick. I'm happy with the state of this.
https://github.com/llvm/llvm-project/pull/94735
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commit
krzysz00 wrote:
(The ugly version of the arbitrary types code lives around
https://github.com/GPUOpen-Drivers/llpc/blob/6c770c7d276d2c2504aed2a0278aab1610993ecf/lgc/patch/PatchBufferOp.cpp#L1559
and really should be an isel legalization instead)
https://github.com/llvm/llvm-project/pull/94576
krzysz00 wrote:
(My guess for how I might use soffset is if I've got multiple identical
buffers concatenated and I need to pick between them without messing with the
extent field)
https://github.com/llvm/llvm-project/pull/94576
krzysz00 wrote:
The thing is, in all the use cases I've seen, `soffset == 0`, and so you can
legalize on `voffset` (voffset is also what the constant offsets on an
instruction get added to)
https://github.com/llvm/llvm-project/pull/94576
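A minimal sketch of what "legalizing on `voffset`" could look like: a wide access is split into pieces, each at an increasing `voffset`, with `soffset` left at zero. The set of piece sizes (16, 8, 4, 2, 1 bytes) and the names `Piece` and `splitAccess` are assumptions for illustration, and alignment is ignored.

```cpp
#include <cstdint>
#include <vector>

// One legal access: where it starts (as a voffset) and how many bytes it moves.
struct Piece {
  std::uint32_t voffset;
  std::uint32_t bytes;
};

// Greedily splits an access of totalBytes starting at voffset into pieces of
// the assumed legal sizes, largest first.
inline std::vector<Piece> splitAccess(std::uint32_t voffset,
                                      std::uint32_t totalBytes) {
  static constexpr std::uint32_t kSizes[] = {16, 8, 4, 2, 1};
  std::vector<Piece> pieces;
  std::uint32_t done = 0;
  for (std::uint32_t size : kSizes) {
    while (totalBytes - done >= size) {
      pieces.push_back({voffset + done, size});
      done += size;
    }
  }
  return pieces;
}
```

For example, a 32-byte access (such as an `i256`) at `voffset` 64 becomes two 16-byte pieces at offsets 64 and 80.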
krzysz00 wrote:
`raw.ptr.buffer.load` (and `.store`) are loads and stores and should be able to
deal with any type you could send through a normal pointer (especially since a
partially-OOB read is already hardware-level UB, so extending that through the
intrinsics is reasonable)
`struct.ptr.*
krzysz00 wrote:
`voffset` and `soffset` are "offset that goes in VGPRs" and "offset that goes
in SGPRs", with the latter having some different bounds-checking semantics on
... at least some of the gfx9's, IIRC.
The address space 7 lowering just uses voffset.
Re arbitrary aggregates: LLPC has
krzysz00 wrote:
1. For the swizzled case, that's `struct.ptr.buffer.*`, and yeah, those will
always need builtins because LLVM can't deal in 2D addressing schemes
2. What I mean is that "types that work" isn't the right framing: any type can
be legalized to one or more types that work. That is,
krzysz00 wrote:
Actually, even ignoring address space 7, it feels like these builtins would be better if you
could `raw.ptr.buffer.store` any type you liked, and then they could be
type-varying in Clang?
https://github.com/llvm/llvm-project/pull/94576
https://github.com/krzysz00 edited
https://github.com/llvm/llvm-project/pull/94735
@@ -139,6 +143,10 @@ static constexpr fltSemantics semFloat8E4M3FNUZ = {
static constexpr fltSemantics semFloat8E4M3B11FNUZ = {
4, -10, 4, 8, fltNonfiniteBehavior::NanOnly, fltNanEncoding::NegativeZero};
static constexpr fltSemantics semFloatTF32 = {127, -126, 11, 19};
+sta
https://github.com/krzysz00 commented:
I have no issues with the code as written, but I'm rather confused by how it
will be used.
What's the motivation for this PR? Will anyone be trying to constant-fold these
things?
(If it's for MLIR support, I'd like to have a discussion there, since I don't
krzysz00 wrote:
Re addrspace 7, there's one major piece of work missing: arbitrary-typed inputs.
That is, we can't currently handle, for example, `load <16 x i8>, ptr
addrspace(7) %p` (or, worse, `load i256, ptr addrspace(7) %p`).
That's been a followup ticket I never have time to do.
If we do w
krzysz00 wrote:
@arsenm Are you suggesting that these should instead be a range of
minimum/maximum number of workitems globally?
https://github.com/llvm/llvm-project/pull/79035
krzysz00 wrote:
@piotrAMD Thanks for the thorough testing! I found the issue (stale pointer)
and your code also gave me an unrelated crash, namely that I wasn't correctly
handling unreachable intrinsics.
https://github.com/llvm/llvm-project/pull/77952
@@ -0,0 +1,1983 @@
+//===-- AMDGPULowerBufferFatPointers.cpp ---=//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM
Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0
krzysz00 wrote:
Yeah, that's my proposal for metadata that's useful to record, especially since
`min == max` gives the present case
https://github.com/llvm/llvm-project/pull/79035
krzysz00 wrote:
I'm suggesting that this might be a more general design and that there might be
more uses for it.
https://github.com/llvm/llvm-project/pull/79035
krzysz00 wrote:
Do we want to also get `min-num-work-groups` and `max-num-work-groups` versions?
https://github.com/llvm/llvm-project/pull/79035
https://github.com/krzysz00 closed
https://github.com/llvm/llvm-project/pull/76997
https://github.com/krzysz00 edited
https://github.com/llvm/llvm-project/pull/76997
https://github.com/krzysz00 updated
https://github.com/llvm/llvm-project/pull/76997
>From 5cc46862df42e7d01a2d45ccc18f221744af0b93 Mon Sep 17 00:00:00 2001
From: Krzysztof Drewniak
Date: Thu, 4 Jan 2024 20:20:54 +
Subject: [PATCH 1/2] [SeperateConstOffsetFromGEP] Handle `or disjoint` flags
@@ -253,22 +253,22 @@ def ROCDL_mfma_f32_32x32x16_fp8_fp8 :
ROCDL_Mfma_IntrOp<"mfma.f32.32x32x16.fp8.f
//===-===//
// WMMA intrinsics
-class ROCDL_Wmma_IntrOp traits = []> :
+class ROCDL_Wmma_IntrOp overloade
krzysz00 wrote:
"dispatch size"?
https://github.com/llvm/llvm-project/pull/75647
krzysz00 wrote:
Good to know that other targets have that sort of "how many work groups will be
launched" information. Having that be a min/max (either per dimension or in
total or both) may be the right approach here, and this could be a good excuse
for the unification being talked about.
(T
krzysz00 wrote:
I'd go with Matt's point: close this, and then add metadata for required launch
grid sizes. Then you can update `AMDGPULowerKernelAttributes` to use said
metadata.
https://github.com/llvm/llvm-project/pull/75647
krzysz00 wrote:
As a somewhat naive question, what would it take to turn off requiring codegen
to be in SCC order? We seem to be the only target doing that. The comments on
that line say something about function calls and noinline
https://github.com/llvm/llvm-project/pull/72129
krzysz00 wrote:
@arsenm It's entirely possible that max dispatch size per dimension is the
right feature instead, now that you mention it (I keep forgetting we have a
grid).
Currently I was thinking this'll be useful for `KnownBits`-type info, so ...
yeah, per-dimension
https://github.com/ll
@@ -864,6 +865,16 @@ supported for the ``amdgcn`` target.
(bits `127:96`). The specific interpretation of these fields varies by the
target architecture and is detailed in the ISA descriptions.
+**Buffer Strided Pointer**
+ The buffer index pointer is an experimental addr
https://github.com/krzysz00 edited
https://github.com/llvm/llvm-project/pull/74471
https://github.com/krzysz00 approved this pull request.
Looks good to me, aside from a documentation nit.
https://github.com/llvm/llvm-project/pull/74471
krzysz00 wrote:
I'm going to ask the annoying questions:
1. Isn't a strided buffer one where the field that's named something like
`stride` (bits 61:48 or 63:48) is non-zero
2. And therefore it uses structured buffers and the
`llvm.struct[.ptr].buffer.*` intrinsics?
3. So, with LLVM's gep, how
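The first question above, whether the stride field is non-zero, can be sketched as a bit-field read on a 128-bit descriptor held as four 32-bit words. The helper names are hypothetical, and the field position is taken from the bit ranges quoted in the message (61:48, or 63:48 on some targets), so the range is left as a parameter rather than fixed.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Reads bits [lo, hi] (inclusive) of a 128-bit value stored as four 32-bit
// words, least-significant word first. The field must be at most 32 bits wide.
inline std::uint32_t extractBits(const std::array<std::uint32_t, 4> &words,
                                 unsigned lo, unsigned hi) {
  assert(lo <= hi && hi < 128 && hi - lo < 32);
  std::uint32_t result = 0;
  for (unsigned bit = lo; bit <= hi; ++bit) {
    std::uint32_t b = (words[bit / 32] >> (bit % 32)) & 1u;
    result |= b << (bit - lo);
  }
  return result;
}

// A descriptor counts as "strided" in this sketch if the stride field, read
// from the given bit range, is non-zero.
inline bool hasNonZeroStride(const std::array<std::uint32_t, 4> &words,
                             unsigned lo = 48, unsigned hi = 61) {
  return extractBits(words, lo, hi) != 0;
}
```

Passing `hi = 63` covers the wider variant of the field mentioned in the message.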
krzysz00 wrote:
I put up a PR to fix SerializeToHsaco and unblock this
https://github.com/llvm/llvm-project/pull/71439
krzysz00 wrote:
I don't know if I'll be able to get to the SerializeToHsaco fix today, but
passing in `SmallVectorImpl&` would be my preferred solution ...
Or really that should be `MemoryBuffer &` or some other such structure if
feasible.
https://github.com/llvm/llvm-project/pull/71439
https://github.com/krzysz00 updated
https://github.com/llvm/llvm-project/pull/69126
>From 357a21c38c1036a012affc85026fcba376ab7128 Mon Sep 17 00:00:00 2001
From: Arseny Kapoulkine
Date: Sun, 15 Oct 2023 13:20:31 -0700
Subject: [PATCH 1/2] [Sema] -Wzero-as-null-pointer-constant: don't warn for
Author: Krzysztof Drewniak
Date: 2023-02-09T23:17:55Z
New Revision: 5d8da5a208e6501baff7a8fd8de76ea143e49646
URL:
https://github.com/llvm/llvm-project/commit/5d8da5a208e6501baff7a8fd8de76ea143e49646
DIFF:
https://github.com/llvm/llvm-project/commit/5d8da5a208e6501baff7a8fd8de76ea143e49646.diff
Author: Krzysztof Drewniak
Date: 2022-07-12T15:21:22Z
New Revision: d6ef3d20b4e3768dc30fb229dfa938d8059fffef
URL:
https://github.com/llvm/llvm-project/commit/d6ef3d20b4e3768dc30fb229dfa938d8059fffef
DIFF:
https://github.com/llvm/llvm-project/commit/d6ef3d20b4e3768dc30fb229dfa938d8059fffef.diff