[clang] [llvm] [mlir] [AMDGPU] Add a new amdgcn.load.to.lds intrinsic (PR #137425)

2025-05-22 Thread Krzysztof Drewniak via cfe-commits
krzysz00 wrote: > (that means addrspacecast 7-> 8 is not invertible by 8-> 7, right? it would > discard some bits, in invisible breakage sort of way? is there an RFC for > that design?) I'm not aware of anything requiring addrspacecast to be invertible? (In specific, cast 7 -> 8 isn't a thin

[clang] [llvm] [mlir] [AMDGPU] Add a new amdgcn.load.to.lds intrinsic (PR #137425)

2025-05-22 Thread Krzysztof Drewniak via cfe-commits
krzysz00 wrote: (You'll note that in https://github.com/llvm/llvm-project/pull/137425/files#diff-f904f8cd236733212015dd1988ffefcc9f79f7484ee46e3e3833d2d75fa69542R2243 , this intrinsic gets lowered to `raw_ptr_buffer_load_lds` by "pulling apart" the ptr addrspace(7) - that `raw_ptr_buffer_load_

[clang] [llvm] [mlir] [AMDGPU] Add a new amdgcn.load.to.lds intrinsic (PR #137425)

2025-05-22 Thread Krzysztof Drewniak via cfe-commits
krzysz00 wrote: @JonChesterfield This builtin, semantically, cannot accommodate the v4i32 usage. When you have a v4i32, you need to also specify, as an additional argument, the `voffset` that gets used to index into that v4i32. This builtin doesn't have room for that, because it takes either a

[clang] [llvm] [mlir] [AMDGPU] Add a new amdgcn.load.to.lds intrinsic (PR #137425)

2025-05-20 Thread Krzysztof Drewniak via cfe-commits
krzysz00 wrote: > I don't think we need to worry about compatibility with an intrinsic that's > been committed for a day `global.load.lds` and `buffer[.ptr].load.lds` have been around for quite a while though, and this is just an abstraction over them https://github.com/llvm/llvm-project/pull

[clang] [llvm] [mlir] [AMDGPU] Add a new amdgcn.load.to.lds intrinsic (PR #137425)

2025-05-19 Thread Krzysztof Drewniak via cfe-commits
krzysz00 wrote: @arsenm You're right that it might be better to omit the offset, but all the existing intrinsics that I'm abstracting over _do_ have such a field. If you want to add new intrinsics that don't have the offset and that pattern-match instead, I'd be more than happy to review that

[clang] [llvm] [mlir] [AMDGPU] Add a new amdgcn.load.to.lds intrinsic (PR #137425)

2025-05-19 Thread Krzysztof Drewniak via cfe-commits
https://github.com/krzysz00 closed https://github.com/llvm/llvm-project/pull/137425 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [mlir] [AMDGPU] Add a new amdgcn.load.to.lds intrinsic (PR #137425)

2025-05-13 Thread Krzysztof Drewniak via cfe-commits
krzysz00 wrote: Ping https://github.com/llvm/llvm-project/pull/137425

[clang] [llvm] [mlir] [AMDGPU] Add a new amdgcn.load.to.lds intrinsic (PR #137425)

2025-05-09 Thread Krzysztof Drewniak via cfe-commits
krzysz00 wrote: Ping https://github.com/llvm/llvm-project/pull/137425

[clang] [llvm] [mlir] [AMDGPU] Add a new amdgcn.load.to.lds intrinsic (PR #137425)

2025-05-02 Thread Krzysztof Drewniak via cfe-commits
@@ -2641,6 +2641,28 @@ def int_amdgcn_perm : // GFX9 Intrinsics //===--===// +/// This is a general-purpose intrinsic for all operations that take a pointer +/// a base location in LDS, and a data size and us

[clang] [llvm] [mlir] [AMDGPU] Add a new amdgcn.load.to.lds intrinsic (PR #137425)

2025-05-02 Thread Krzysztof Drewniak via cfe-commits
krzysz00 wrote: Re discussion on the other PR about "why is this even an intrinsic" - since this probably shouldn't just be in @jayfoad's DMs: The reason I disagree with "just pattern-match it" is that you can't get the scheduling you want without a guarantee of the intrinsic. Namely, while

[clang] [llvm] [mlir] [AMDGPU] Add a new amdgcn.load.to.lds intrinsic (PR #137425)

2025-05-02 Thread Krzysztof Drewniak via cfe-commits
@@ -564,6 +564,11 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID, llvm::Function *F = CGM.getIntrinsic(IID, {LoadTy}); return Builder.CreateCall(F, {Addr}); } + case AMDGPU::BI__builtin_amdgcn_load_to_lds: { +// Should this have asan instrum

[clang] [llvm] [mlir] [AMDGPU] Add a new amdgcn.load.to.lds intrinsic (PR #137425)

2025-05-02 Thread Krzysztof Drewniak via cfe-commits
@@ -2641,6 +2641,28 @@ def int_amdgcn_perm : // GFX9 Intrinsics //===--===// +/// This is a general-purpose intrinsic for all operations that take a pointer +/// a base location in LDS, and a data size and us

[clang] [llvm] [mlir] [AMDGPU] Add a new amdgcn.load.to.lds intrinsic (PR #137425)

2025-05-02 Thread Krzysztof Drewniak via cfe-commits
@@ -2641,6 +2641,28 @@ def int_amdgcn_perm : // GFX9 Intrinsics //===--===// +/// This is a general-purpose intrinsic for all operations that take a pointer +/// a base location in LDS, and a data size and us

[clang] [llvm] [mlir] [AMDGPU] Add a new amdgcn.load.to.lds intrinsic (PR #137425)

2025-05-02 Thread Krzysztof Drewniak via cfe-commits
@@ -2641,6 +2641,28 @@ def int_amdgcn_perm : // GFX9 Intrinsics //===--===// +/// This is a general-purpose intrinsic for all operations that take a pointer +/// a base location in LDS, and a data size and us

[clang] [llvm] [mlir] [AMDGPU] Add a new amdgcn.load.to.lds intrinsic (PR #137425)

2025-05-02 Thread Krzysztof Drewniak via cfe-commits
@@ -2641,6 +2641,28 @@ def int_amdgcn_perm : // GFX9 Intrinsics //===--===// +/// This is a general-purpose intrinsic for all operations that take a pointer +/// a base location in LDS, and a data size and us

[clang] [llvm] [mlir] [AMDGPU] Add a new amdgcn.load.to.lds intrinsic (PR #137425)

2025-05-02 Thread Krzysztof Drewniak via cfe-commits
https://github.com/krzysz00 updated https://github.com/llvm/llvm-project/pull/137425 >From bcb72e3d8cb2dcdb97199d32797306c5807c8442 Mon Sep 17 00:00:00 2001 From: Krzysztof Drewniak Date: Sat, 26 Apr 2025 00:20:22 + Subject: [PATCH 1/4] [AMDGPU] Add a new amdgcn.load.to.lds intrinsic This

[clang] [llvm] [mlir] [AMDGPU] Add a new amdgcn.load.to.lds intrinsic (PR #137425)

2025-05-02 Thread Krzysztof Drewniak via cfe-commits
@@ -2641,6 +2641,28 @@ def int_amdgcn_perm : // GFX9 Intrinsics //===--===// +/// This is a general-purpose intrinsic for all operations that take a pointer +/// a base location in LDS, and a data size and us

[clang] [llvm] [mlir] [AMDGPU] Add a new amdgcn.load.to.lds intrinsic (PR #137425)

2025-05-01 Thread Krzysztof Drewniak via cfe-commits
https://github.com/krzysz00 updated https://github.com/llvm/llvm-project/pull/137425

[clang] [llvm] [mlir] [AMDGPU] Generalize global.load.lds to buffer fat pointers (PR #134911)

2025-05-01 Thread Krzysztof Drewniak via cfe-commits
krzysz00 wrote: Closing in favor of #137425 https://github.com/llvm/llvm-project/pull/134911

[clang] [llvm] [mlir] [AMDGPU] Generalize global.load.lds to buffer fat pointers (PR #134911)

2025-05-01 Thread Krzysztof Drewniak via cfe-commits
https://github.com/krzysz00 closed https://github.com/llvm/llvm-project/pull/134911

[clang] [llvm] [clang][amdgpu] Add builtins for raw/struct buffer lds load (PR #137678)

2025-05-01 Thread Krzysztof Drewniak via cfe-commits
krzysz00 wrote: Side note, if we're cleaning up the API, we really should add s.buffer.load.real that actually models the memory effects correctly https://github.com/llvm/llvm-project/pull/137678

[clang] [llvm] [clang][amdgpu] Add builtins for raw/struct buffer lds load (PR #137678)

2025-05-01 Thread Krzysztof Drewniak via cfe-commits
krzysz00 wrote: > It does, we should have a consistent set of buffer builtins To specify what I _think_ the proposed rewrite is, it's auto-upgrading or otherwise transforming ```llvm %r = call T llvm.amdgcn.{raw,struct}.buffer.*(<4 x i32> %rsrc, ...) ``` into ```llvm %rsrc.int = bitcast <4 x i3

[clang] [llvm] [mlir] [Sema] Fix bug in builtin AS override (PR #138141)

2025-05-01 Thread Krzysztof Drewniak via cfe-commits
@@ -0,0 +1,60 @@ +// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py +// RUN: %clang_cc1 -cl-std=CL2.0 -O0 -triple amdgcn-unknown-unknown -target-cpu gfx900 -emit-llvm -o - %s | FileCheck %s +// RUN: %clang_cc1 -cl-std=CL2.0 -O0 -triple amdgcn-unknown-u

[clang] [llvm] [mlir] [Sema] Fix bug in builtin AS override (PR #138141)

2025-05-01 Thread Krzysztof Drewniak via cfe-commits
@@ -564,6 +564,11 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID, llvm::Function *F = CGM.getIntrinsic(IID, {LoadTy}); return Builder.CreateCall(F, {Addr}); } + case AMDGPU::BI__builtin_amdgcn_load_to_lds: { +// Should this have asan instrum

[clang] [llvm] [mlir] [Sema] Fix bug in builtin AS override (PR #138141)

2025-05-01 Thread Krzysztof Drewniak via cfe-commits
@@ -564,6 +564,11 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID, llvm::Function *F = CGM.getIntrinsic(IID, {LoadTy}); return Builder.CreateCall(F, {Addr}); } + case AMDGPU::BI__builtin_amdgcn_load_to_lds: { +// Should this have asan instrum

[clang] [llvm] [mlir] [AMDGPU] Add a new amdgcn.load.to.lds intrinsic (PR #137425)

2025-04-30 Thread Krzysztof Drewniak via cfe-commits
@@ -444,17 +444,40 @@ def ROCDL_ds_read_tr6_b96 : ROCDL_LDS_Read_Tr_IntrOp<"ds.read.tr6.b96">; def ROCDL_ds_read_tr16_b64 : ROCDL_LDS_Read_Tr_IntrOp<"ds.read.tr16.b64">; //===-===// -// Global load to LDS int

[clang] [llvm] [clang][amdgpu] Add builtins for raw/struct buffer lds load (PR #137678)

2025-04-29 Thread Krzysztof Drewniak via cfe-commits
@@ -163,7 +163,10 @@ BUILTIN(__builtin_amdgcn_raw_buffer_load_b64, "V2UiQbiiIi", "n") BUILTIN(__builtin_amdgcn_raw_buffer_load_b96, "V3UiQbiiIi", "n") BUILTIN(__builtin_amdgcn_raw_buffer_load_b128, "V4UiQbiiIi", "n") +TARGET_BUILTIN(__builtin_amdgcn_raw_buffer_load_lds, "vV4U

[clang] [llvm] [clang][amdgpu] Add builtins for raw/struct buffer lds load (PR #137678)

2025-04-29 Thread Krzysztof Drewniak via cfe-commits
@@ -163,7 +163,10 @@ BUILTIN(__builtin_amdgcn_raw_buffer_load_b64, "V2UiQbiiIi", "n") BUILTIN(__builtin_amdgcn_raw_buffer_load_b96, "V3UiQbiiIi", "n") BUILTIN(__builtin_amdgcn_raw_buffer_load_b128, "V4UiQbiiIi", "n") +TARGET_BUILTIN(__builtin_amdgcn_raw_buffer_load_lds, "vV4U

[clang] [llvm] [mlir] [AMDGPU] Add a new amdgcn.load.to.lds intrinsic (PR #137425)

2025-04-29 Thread Krzysztof Drewniak via cfe-commits
krzysz00 wrote: Well, if y'all want to go add a pattern for this and eventually deprecate the intrinsics I'm all ears, but we're trying to use these instructions now https://github.com/llvm/llvm-project/pull/137425

[clang] [llvm] [clang][amdgpu] Add builtins for raw/struct buffer lds load (PR #137678)

2025-04-28 Thread Krzysztof Drewniak via cfe-commits
krzysz00 wrote: Semantically, seems fine, but I can't review meaningfully on the clang side https://github.com/llvm/llvm-project/pull/137678

[clang] [llvm] [mlir] [AMDGPU] Add a new amdgcn.load.to.lds intrinsic (PR #137425)

2025-04-28 Thread Krzysztof Drewniak via cfe-commits
@@ -257,6 +257,7 @@ TARGET_BUILTIN(__builtin_amdgcn_flat_atomic_fadd_v2bf16, "V2sV2s*0V2s", "t", "at TARGET_BUILTIN(__builtin_amdgcn_global_atomic_fadd_v2bf16, "V2sV2s*1V2s", "t", "atomic-global-pk-add-bf16-inst") TARGET_BUILTIN(__builtin_amdgcn_ds_atomic_fadd_v2bf16, "V2sV2s*

[clang] [llvm] [mlir] [AMDGPU] Add a new amdgcn.load.to.lds intrinsic (PR #137425)

2025-04-28 Thread Krzysztof Drewniak via cfe-commits
https://github.com/krzysz00 updated https://github.com/llvm/llvm-project/pull/137425 >From 96e94b5662c613fd80f712080751076254a73524 Mon Sep 17 00:00:00 2001 From: Krzysztof Drewniak Date: Sat, 26 Apr 2025 00:20:22 + Subject: [PATCH 1/2] [AMDGPU] Add a new amdgcn.load.to.lds intrinsic This

[clang] [llvm] [mlir] [AMDGPU] Add a new amdgcn.load.to.lds intrinsic (PR #137425)

2025-04-28 Thread Krzysztof Drewniak via cfe-commits
https://github.com/krzysz00 edited https://github.com/llvm/llvm-project/pull/137425

[clang] [llvm] [mlir] [AMDGPU] Add a new amdgcn.load.to.lds intrinsic (PR #137425)

2025-04-28 Thread Krzysztof Drewniak via cfe-commits
krzysz00 wrote: @jayfoad I still think we need an intrinsic here because a load + an addtid store can be scheduled much differently from the asynchronous "gather to LDS" - and because we don't want this load/store to not be optimized https://github.com/llvm/llvm-project/pull/137425

[clang] [llvm] [mlir] [AMDGPU] Add a new amdgcn.load.to.lds intrinsic (PR #137425)

2025-04-28 Thread Krzysztof Drewniak via cfe-commits
krzysz00 wrote: @jayfoad > High level question: I don't understand why you call this a "gather" > operation. What do you mean by that? Isn't it semantically just a memcpy, or > a (global/buffer) load followed by a (LDS) store? The semantics of this operation (at least in the pre-gfx950 cases)

[clang] [llvm] [mlir] [AMDGPU] Add a new amdgcn.load.to.lds intrinsic (PR #137425)

2025-04-25 Thread Krzysztof Drewniak via cfe-commits
https://github.com/krzysz00 created https://github.com/llvm/llvm-project/pull/137425 This PR adds an amdgcn_load_to_lds intrinsic that abstracts over loads to LDS from global (address space 1) pointers and buffer fat pointers (address space 7), since they use the same API and "gather from a poi

[clang] [llvm] [mlir] [AMDGPU] Generalize global.load.lds to buffer fat pointers (PR #134911)

2025-04-11 Thread Krzysztof Drewniak via cfe-commits
krzysz00 wrote: (I'll take name suggestions on the ptr addrspace(7) intrinsic) https://github.com/llvm/llvm-project/pull/134911

[clang] [llvm] [mlir] [AMDGPU] Generalize global.load.lds to buffer fat pointers (PR #134911)

2025-04-11 Thread Krzysztof Drewniak via cfe-commits
krzysz00 wrote: That is to say, because the offset you're gathering from is on the buffer fat pointer, the addrspace(7) version of this load has the same function signature as `global.load.lds`, not `buffer.raw.ptr.load.lds` https://github.com/llvm/llvm-project/pull/134911

[clang] [llvm] [mlir] [AMDGPU] Generalize global.load.lds to buffer fat pointers (PR #134911)

2025-04-11 Thread Krzysztof Drewniak via cfe-commits
krzysz00 wrote: `void raw.buffer.ptr.load.lds(p8 rsrc, p3 lds, i32 immarg size, i32 voffset, i32 soffset, i32 immarg immOff, i32 immarg aux)` However `void raw.buffer.fat.ptr.load.lds(p7 fatPtr, p3 lds, i32 immarg size, i32 immarg immoff, i32 immarg aux)` Please note that the buffer.ptr vers

[clang] [llvm] [mlir] [AMDGPU] Generalize global.load.lds to buffer fat pointers (PR #134911)

2025-04-11 Thread Krzysztof Drewniak via cfe-commits
krzysz00 wrote: @arsenm Ok, so 1. Forcing the global intrinsic into covering buffers is out per 1:1 rules but 2. Per your comments on the previous PR, a new intrinsic for p7 is also out - or were you just objecting to the naming? 3. We can't just reuse the intrinsic on p8 (buffer resources) -

[clang] [llvm] [mlir] [AMDGPU] Generalize global.load.lds to buffer fat pointers (PR #134911)

2025-04-09 Thread Krzysztof Drewniak via cfe-commits
https://github.com/krzysz00 edited https://github.com/llvm/llvm-project/pull/134911

[clang] [llvm] [mlir] [AMDGPU] Generalize global.load.lds to buffer fat pointers (PR #134911)

2025-04-09 Thread Krzysztof Drewniak via cfe-commits
https://github.com/krzysz00 updated https://github.com/llvm/llvm-project/pull/134911 >From 976aa3b4528b93bbf9e8deb433be143d45cfbad6 Mon Sep 17 00:00:00 2001 From: Krzysztof Drewniak Date: Tue, 8 Apr 2025 19:10:41 + Subject: [PATCH 1/2] [AMDGPU] Generalize global.load.lds to buffer fat poin

[clang] [libc] [llvm] [AMDGPU] Use COV6 by default (PR #118515)

2025-03-31 Thread Krzysztof Drewniak via cfe-commits
krzysz00 wrote: @shiltian Could you update MLIR infrastructure for the new default as well? `mlir/lib/Target/LLVM/ROCDL/Target.cpp` and `mlir/lib/Dialect/LLVMIR/IR/ROCDLDialect.cpp`, which both keep an ear on the ABI version, partly for linking in device libraries https://github.com/llvm/llvm

[clang] [TableGen] Avoid repeated hash lookups (NFC) (PR #132142)

2025-03-19 Thread Krzysztof Drewniak via cfe-commits
https://github.com/krzysz00 approved this pull request. https://github.com/llvm/llvm-project/pull/132142

[clang] [llvm] [Clang][AMDGPU] Expose buffer load lds as a clang builtin (PR #132048)

2025-03-19 Thread Krzysztof Drewniak via cfe-commits
https://github.com/krzysz00 commented: I can't speak to the Clang side of this change, but I don't see any issues here https://github.com/llvm/llvm-project/pull/132048

[clang] [llvm] [mlir] [AMDGPU] Generalize amdgcn.make.buffer.rsrc to fat pointers (PR #126828)

2025-02-18 Thread Krzysztof Drewniak via cfe-commits
https://github.com/krzysz00 closed https://github.com/llvm/llvm-project/pull/126828

[clang] [llvm] [mlir] [AMDGPU] Generalize amdgcn.make.buffer.rsrc to fat pointers (PR #126828)

2025-02-18 Thread Krzysztof Drewniak via cfe-commits
https://github.com/krzysz00 updated https://github.com/llvm/llvm-project/pull/126828 >From f125444bb53e1e10b40b352e9cf7fd3ad052bfbf Mon Sep 17 00:00:00 2001 From: Krzysztof Drewniak Date: Tue, 11 Feb 2025 23:55:36 + Subject: [PATCH 1/2] [AMDGPU] Generalize amdgcn.make.buffer.rsrc to fat po

[clang] [llvm] [mlir] [AMDGPU] Generalize amdgcn.make.buffer.rsrc to fat pointers (PR #126828)

2025-02-12 Thread Krzysztof Drewniak via cfe-commits
@@ -1072,6 +1073,14 @@ static bool upgradeIntrinsicFunction1(Function *F, Function *&NewFn, {F->getReturnType(), F->getArg(1)->getType()}); return true; } + // Old-style make.buffer.rsrc was only variadic in the input pointer + if (Name.cons

[clang] [llvm] [mlir] [AMDGPU] Generalize amdgcn.make.buffer.rsrc to fat pointers (PR #126828)

2025-02-12 Thread Krzysztof Drewniak via cfe-commits
https://github.com/krzysz00 updated https://github.com/llvm/llvm-project/pull/126828 >From 457350589fc4c4295a212025873ab4b90124e02f Mon Sep 17 00:00:00 2001 From: Krzysztof Drewniak Date: Tue, 11 Feb 2025 23:55:36 + Subject: [PATCH 1/4] [AMDGPU] Generalize amdgcn.make.buffer.rsrc to fat po

[clang] [llvm] [mlir] [AMDGPU] Generalize amdgcn.make.buffer.rsrc to fat pointers (PR #126828)

2025-02-12 Thread Krzysztof Drewniak via cfe-commits
krzysz00 wrote: @arsenm On further investigation, I misdiagnosed the issue and have updated the commit message accordingly. The real problem is the `addrspacecast(addrspacecast(x)) => addrspacecast(x)` fold that was getting rid of the fat pointer intermediate, and then infer-address-spaces did

[clang] [llvm] [mlir] [AMDGPU] Generalize amdgcn.make.buffer.rsrc to fat pointers (PR #126828)

2025-02-12 Thread Krzysztof Drewniak via cfe-commits
https://github.com/krzysz00 edited https://github.com/llvm/llvm-project/pull/126828

[clang] [llvm] [mlir] [AMDGPU] Generalize amdgcn.make.buffer.rsrc to fat pointers (PR #126828)

2025-02-12 Thread Krzysztof Drewniak via cfe-commits
https://github.com/krzysz00 updated https://github.com/llvm/llvm-project/pull/126828 >From 457350589fc4c4295a212025873ab4b90124e02f Mon Sep 17 00:00:00 2001 From: Krzysztof Drewniak Date: Tue, 11 Feb 2025 23:55:36 + Subject: [PATCH 1/3] [AMDGPU] Generalize amdgcn.make.buffer.rsrc to fat po

[clang] [llvm] [AMDGPU] Adding the amdgpu-num-work-groups function attribute (PR #79035)

2024-12-11 Thread Krzysztof Drewniak via cfe-commits
krzysz00 wrote: Oh, yeah, agreed that known-bits data is marginal ... but a big pile of marginal improvements stacks up. https://github.com/llvm/llvm-project/pull/79035

[clang] [llvm] [AMDGPU] Adding the amdgpu-num-work-groups function attribute (PR #79035)

2024-12-11 Thread Krzysztof Drewniak via cfe-commits
krzysz00 wrote: The main case I had in mind when adding the annotation was `range()`-like information: that is, the ability to infer `nsw` and friends on workgroup IDs and dimensions https://github.com/llvm/llvm-project/pull/79035
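
To illustrate the kind of inference mentioned above: if an upper bound on the number of workgroups and the workgroup size are both known, a compiler can prove that `workgroup_id * workgroup_size + local_id` never wraps in signed 32-bit arithmetic, which is what would justify an `nsw` flag. The following is a minimal sketch of that check only; `canInferNSW` and its parameters are hypothetical names for illustration and are not part of LLVM.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper (not an LLVM API): returns true if
// id * workGroupSize + lane cannot exceed INT32_MAX for any
// id in [0, maxNumWorkGroups) and lane in [0, workGroupSize).
inline bool canInferNSW(uint32_t maxNumWorkGroups, uint32_t workGroupSize) {
  if (maxNumWorkGroups == 0 || workGroupSize == 0)
    return true; // no work items exist, so nothing can overflow
  // Largest global index; 64-bit math cannot wrap for 32-bit inputs.
  uint64_t maxIndex = uint64_t{maxNumWorkGroups} * workGroupSize - 1;
  return maxIndex <=
         static_cast<uint64_t>(std::numeric_limits<int32_t>::max());
}
```

With a bound of 1024 workgroups of 256 items each, the largest index is far below `INT32_MAX`, so the flag would be safe under this model; without any bound, no such conclusion can be drawn.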

[clang] [Clang][AMDGPU] Add a new builtin type for buffer rsrc (PR #94830)

2024-06-13 Thread Krzysztof Drewniak via cfe-commits
@@ -0,0 +1,9 @@ + +// REQUIRES: amdgpu-registered-target +// RUN: %clang_cc1 -triple amdgcn -emit-llvm -o - %s -debug-info-kind=limited 2>&1 | FileCheck %s + +// CHECK: name: "__amdgcn_buffer_rsrc_t",{{.*}}baseType: ![[BT:[0-9]+]] +// CHECK: [[BT]] = !DICompositeType(tag: DW_TAG_

[clang] [Clang][AMDGPU] Add a new builtin type for buffer rsrc (PR #94830)

2024-06-12 Thread Krzysztof Drewniak via cfe-commits
krzysz00 wrote: Just a note - and maybe this was already discussed above - is there a good reason not to explicitly make this type a 128-bit scalar? The LLVM data layout already does this https://github.com/llvm/llvm-project/pull/94830

[clang] [Clang][AMDGPU] Add a builtin for llvm.amdgcn.make.buffer.rsrc intrinsic (PR #95276)

2024-06-12 Thread Krzysztof Drewniak via cfe-commits
@@ -0,0 +1,95 @@ +// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py +// REQUIRES: amdgpu-registered-target +// RUN: %clang_cc1 -triple amdgcn-unknown-unknown -cl-std=CL2.0 -target-cpu verde -emit-llvm -o - %s | FileCheck %s +// RUN: %clang_cc1 -triple

[clang] [Clang][AMDGPU] Add a new builtin type for buffer rsrc (PR #94830)

2024-06-10 Thread Krzysztof Drewniak via cfe-commits
@@ -2201,6 +2207,9 @@ TypeInfo ASTContext::getTypeInfoImpl(const Type *T) const { Align = 8; \ break; #include "clang/Basic/WebAssemblyReferenceTypes.def" +case BuiltinType::AMDGPUBufferRsrc: + W

[clang] [llvm] [APFloat] Add APFloat support for FP6 data types (PR #94735)

2024-06-07 Thread Krzysztof Drewniak via cfe-commits
krzysz00 wrote: @ThomasRaoux No, I just left a nitpick. I'm happy with the state of this. https://github.com/llvm/llvm-project/pull/94735

[clang] [Clang][AMDGPU] Add builtins for instrinsic `llvm.amdgcn.raw.buffer.store` (PR #94576)

2024-06-07 Thread Krzysztof Drewniak via cfe-commits
krzysz00 wrote: (The ugly version of the arbitrary types code lives around https://github.com/GPUOpen-Drivers/llpc/blob/6c770c7d276d2c2504aed2a0278aab1610993ecf/lgc/patch/PatchBufferOp.cpp#L1559 and really should be an isel legalization instead) https://github.com/llvm/llvm-project/pull/94576

[clang] [Clang][AMDGPU] Add builtins for instrinsic `llvm.amdgcn.raw.buffer.store` (PR #94576)

2024-06-07 Thread Krzysztof Drewniak via cfe-commits
krzysz00 wrote: (My guess for how I might use soffset is if I've got multiple identical buffers concatenated and I need to pick between them without messing with the extent field) https://github.com/llvm/llvm-project/pull/94576

[clang] [Clang][AMDGPU] Add builtins for instrinsic `llvm.amdgcn.raw.buffer.store` (PR #94576)

2024-06-07 Thread Krzysztof Drewniak via cfe-commits
krzysz00 wrote: The thing is, in all the use cases I've seen, `soffset == 0`, and so you can legalize on `voffset` (voffset is also what the constant offsets on an instruction get added to) https://github.com/llvm/llvm-project/pull/94576

[clang] [Clang][AMDGPU] Add builtins for instrinsic `llvm.amdgcn.raw.buffer.store` (PR #94576)

2024-06-07 Thread Krzysztof Drewniak via cfe-commits
krzysz00 wrote: `raw.ptr.buffer.load` (and `.store`) are loads and stores and should be able to deal with any type you could send through a normal pointer (especially since a partially-OOB read is already hardware-level UB, so extending that through the intrinsics is reasonable) `struct.ptr.*

[clang] [Clang][AMDGPU] Add builtins for instrinsic `llvm.amdgcn.raw.buffer.store` (PR #94576)

2024-06-07 Thread Krzysztof Drewniak via cfe-commits
krzysz00 wrote: `voffset` and `soffset` are "offset that goes in VGPRs" and "offset that goes in SGPRs", with the latter having some different bounds-checking semantics on ... at least some of the gfx9's, IIRC. The address space 7 lowering just uses voffset. Re arbitrary aggregates: LLPC has
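
As a rough illustration of how the two offsets combine, the sketch below models a raw buffer access as a byte offset formed from `voffset`, `soffset`, and an immediate offset, checked against the buffer's extent. This is a simplified assumption for illustration, not the hardware's exact rule: `RawBufferAccess`, `byteOffset`, and `inBounds` are hypothetical names, and the per-generation differences in how `soffset` takes part in bounds checking are deliberately not modeled.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of a raw buffer access; not an LLVM or hardware API.
struct RawBufferAccess {
  uint32_t voffset;   // per-lane offset, held in a VGPR
  uint32_t soffset;   // wave-uniform offset, held in an SGPR
  uint32_t immOffset; // constant offset encoded in the instruction
};

// Total byte offset from the buffer base, computed in 64 bits so the
// sum of three 32-bit values cannot wrap.
inline uint64_t byteOffset(const RawBufferAccess &a) {
  return uint64_t{a.voffset} + a.soffset + a.immOffset;
}

// Simplified bounds check (an assumption of this sketch): the whole
// access of sizeBytes must fit within extentBytes.
inline bool inBounds(const RawBufferAccess &a, uint32_t sizeBytes,
                     uint32_t extentBytes) {
  return byteOffset(a) + sizeBytes <= extentBytes;
}
```

Under this model, a lowering that only ever uses `voffset` simply leaves `soffset` at zero.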

[clang] [Clang][AMDGPU] Add builtins for instrinsic `llvm.amdgcn.raw.buffer.store` (PR #94576)

2024-06-07 Thread Krzysztof Drewniak via cfe-commits
krzysz00 wrote: 1. For the swizzled case, that's `struct.ptr.buffer.*`, and yeah, those will always need builtins because LLVM can't deal in 2D addressing schemes 2. What I mean is that "types that work" isn't the right framing: any type can be legalized to one or more types that work. That is,

[clang] [Clang][AMDGPU] Add builtins for instrinsic `llvm.amdgcn.raw.buffer.store` (PR #94576)

2024-06-07 Thread Krzysztof Drewniak via cfe-commits
krzysz00 wrote: Actually, even ignoring address space 7, it feels like these builtins if you could `raw.ptr.buffer.store` any type you liked, and then they could be type-varying in Clang? https://github.com/llvm/llvm-project/pull/94576

[clang] [llvm] [APFloat] Add APFloat support for FP6 data types (PR #94735)

2024-06-07 Thread Krzysztof Drewniak via cfe-commits
https://github.com/krzysz00 edited https://github.com/llvm/llvm-project/pull/94735

[clang] [llvm] [APFloat] Add APFloat support for FP6 data types (PR #94735)

2024-06-07 Thread Krzysztof Drewniak via cfe-commits
@@ -139,6 +143,10 @@ static constexpr fltSemantics semFloat8E4M3FNUZ = { static constexpr fltSemantics semFloat8E4M3B11FNUZ = { 4, -10, 4, 8, fltNonfiniteBehavior::NanOnly, fltNanEncoding::NegativeZero}; static constexpr fltSemantics semFloatTF32 = {127, -126, 11, 19}; +sta

[clang] [llvm] [APFloat] Add APFloat support for FP6 data types (PR #94735)

2024-06-07 Thread Krzysztof Drewniak via cfe-commits
https://github.com/krzysz00 commented: I have no issues with the code as written but I'm rather confused by how it will be used. What's the motivation for this PR? Will anyone be trying to constant-fold these things? (If it's for MLIR support, I'd like to have a discussion there, since I don't

[clang] [Clang][AMDGPU] Add builtins for instrinsic `llvm.amdgcn.raw.buffer.store` (PR #94576)

2024-06-06 Thread Krzysztof Drewniak via cfe-commits
krzysz00 wrote: Re addrspace 7, there's one major piece of work missing: arbitrary-typed inputs. That is, we can't currently handle, for example, `load <16 x i8>, ptr addrspace(7) %p` (or, worse, `load i256, ptr addrspace(7) %p`). That's been a followup ticket I never have time to do. If we do w

[llvm] [clang] [AMDGPU] Adding the amdgpu-num-work-groups function attribute (PR #79035)

2024-02-06 Thread Krzysztof Drewniak via cfe-commits
krzysz00 wrote: @arsenm Are you suggesting that these should instead be a range of minimum/maximum number of workitems globally? https://github.com/llvm/llvm-project/pull/79035

[llvm] [lldb] [lld] [libc] [clang-tools-extra] [clang] [libcxx] [flang] [AMDGPU] Add IR-level pass to rewrite away address space 7 (PR #77952)

2024-02-02 Thread Krzysztof Drewniak via cfe-commits
krzysz00 wrote: @piotrAMD Thanks for the thorough testing! I found the issue (stale pointer) and your code also gave me an unrelated crash, namely that I wasn't correctly handling unreachable intrinsics. https://github.com/llvm/llvm-project/pull/77952

[clang] [clang-tools-extra] [llvm] [AMDGPU] Add IR-level pass to rewrite away address space 7 (PR #77952)

2024-02-01 Thread Krzysztof Drewniak via cfe-commits
@@ -0,0 +1,1983 @@ +//===-- AMDGPULowerBufferFatPointers.cpp ---=// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache-2.0

[clang] [llvm] [AMDGPU] Adding the amdgpu-num-work-groups function attribute (PR #79035)

2024-01-30 Thread Krzysztof Drewniak via cfe-commits
krzysz00 wrote: Yeah, that's my proposal for metadata that's useful to record, especially since `min == max` gives the present case https://github.com/llvm/llvm-project/pull/79035

[llvm] [clang] [AMDGPU] Adding the amdgpu-num-work-groups function attribute (PR #79035)

2024-01-29 Thread Krzysztof Drewniak via cfe-commits
krzysz00 wrote: I'm suggesting that this might be a more general design and that there might be more uses for it. https://github.com/llvm/llvm-project/pull/79035

[clang] [llvm] [AMDGPU] Adding the amdgpu-num-work-groups function attribute (PR #79035)

2024-01-29 Thread Krzysztof Drewniak via cfe-commits
krzysz00 wrote: Do we want to also get `min-num-work-groups` and `max-num-work-groups` versions? https://github.com/llvm/llvm-project/pull/79035

[llvm] [clang-tools-extra] [clang] [SeperateConstOffsetFromGEP] Handle `or disjoint` flags (PR #76997)

2024-01-26 Thread Krzysztof Drewniak via cfe-commits
https://github.com/krzysz00 closed https://github.com/llvm/llvm-project/pull/76997

[clang] [clang-tools-extra] [llvm] [SeperateConstOffsetFromGEP] Handle `or disjoint` flags (PR #76997)

2024-01-26 Thread Krzysztof Drewniak via cfe-commits
https://github.com/krzysz00 edited https://github.com/llvm/llvm-project/pull/76997

[clang] [clang-tools-extra] [llvm] [SeperateConstOffsetFromGEP] Handle `or disjoint` flags (PR #76997)

2024-01-25 Thread Krzysztof Drewniak via cfe-commits
https://github.com/krzysz00 updated https://github.com/llvm/llvm-project/pull/76997 >From 5cc46862df42e7d01a2d45ccc18f221744af0b93 Mon Sep 17 00:00:00 2001 From: Krzysztof Drewniak Date: Thu, 4 Jan 2024 20:20:54 + Subject: [PATCH 1/2] [SeperateConstOffsetFromGEP] Handle `or disjoint` flags

[llvm] [clang] [mlir] [AMDGPU] Add GFX12 WMMA and SWMMAC instructions (PR #77795)

2024-01-22 Thread Krzysztof Drewniak via cfe-commits
@@ -253,22 +253,22 @@ def ROCDL_mfma_f32_32x32x16_fp8_fp8 : ROCDL_Mfma_IntrOp<"mfma.f32.32x32x16.fp8.f //===-===// // WMMA intrinsics -class ROCDL_Wmma_IntrOp traits = []> : +class ROCDL_Wmma_IntrOp overloade

[clang] [llvm] [mlir] [AMDGPU] Add GFX12 WMMA and SWMMAC instructions (PR #77795)

2024-01-22 Thread Krzysztof Drewniak via cfe-commits
@@ -253,22 +253,22 @@ def ROCDL_mfma_f32_32x32x16_fp8_fp8 : ROCDL_Mfma_IntrOp<"mfma.f32.32x32x16.fp8.f //===-===// // WMMA intrinsics -class ROCDL_Wmma_IntrOp traits = []> : +class ROCDL_Wmma_IntrOp overloade

[clang] [mlir] [llvm] [AMDGPU] Add GFX12 WMMA and SWMMAC instructions (PR #77795)

2024-01-22 Thread Krzysztof Drewniak via cfe-commits
@@ -253,22 +253,22 @@ def ROCDL_mfma_f32_32x32x16_fp8_fp8 : ROCDL_Mfma_IntrOp<"mfma.f32.32x32x16.fp8.f //===-===// // WMMA intrinsics -class ROCDL_Wmma_IntrOp traits = []> : +class ROCDL_Wmma_IntrOp overloade

[llvm] [clang] [AMDGPU] Adding the amdgpu-num-work-groups function attribute (PR #75647)

2024-01-17 Thread Krzysztof Drewniak via cfe-commits
krzysz00 wrote: "dispatch size"? https://github.com/llvm/llvm-project/pull/75647

[clang] [llvm] [AMDGPU] Adding the amdgpu-num-work-groups function attribute (PR #75647)

2024-01-16 Thread Krzysztof Drewniak via cfe-commits
krzysz00 wrote: Good to know that other targets have that sort of "how many work groups will be launched" information. Having that be a min/max (either per dimension or in total or both) may be the right approach here, and this could be a good excuse for the unification being talked about. (T

[clang] [llvm] [AMDGPU] Adding the amdgpu-num-work-groups function attribute (PR #75647)

2024-01-15 Thread Krzysztof Drewniak via cfe-commits
krzysz00 wrote: I'd go with Matt's point: close this, and then add metadata for required launch grid sizes. Then you can update `AMDGPULowerKernelAttributes` to use said metadata. https://github.com/llvm/llvm-project/pull/75647

[llvm] [clang] [compiler-rt] [clang-tools-extra] [AMDGPU] Avoid hitting AMDGPUAsmPrinter related asserts for local functions at O0 (PR #72129)

2024-01-12 Thread Krzysztof Drewniak via cfe-commits
krzysz00 wrote: As a somewhat naive question, what would it take to turn off requiring codegen to be in SCC order? We seem to be the only target doing that. The comments on that line say something about function calls and noinline https://github.com/llvm/llvm-project/pull/72129

[llvm] [clang] [AMDGPU] Adding the amdgpu-num-work-groups function attribute (PR #75647)

2024-01-10 Thread Krzysztof Drewniak via cfe-commits
krzysz00 wrote: @arsenm It's entirely possible that max dispatch size per dimension is the right feature instead, now that you mention it (I keep forgetting we have a grid). Currently I was thinking this'll be useful for `KnownBits`-type info, so ... yeah, per-dimension https://github.com/ll
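The `KnownBits` idea in the message above can be illustrated with a minimal sketch. It assumes a per-dimension upper bound on the number of workgroups is available; the helper name `knownZeroWorkGroupIdBits` and its parameter are hypothetical and are not part of LLVM or of the PR. If a dimension launches at most N workgroups, the workgroup ID in that dimension lies in [0, N-1], so every bit above the highest set bit of N-1 must be zero:

```cpp
#include <cstdint>

// Hypothetical helper (not an LLVM API): given an upper bound on the number
// of workgroups launched in one grid dimension, return a mask whose set bits
// are the workgroup-ID bits that are guaranteed to be zero.
// A bound of 0 is treated here as "no bound known" (an assumption of this
// sketch), so nothing is reported as known.
constexpr std::uint32_t knownZeroWorkGroupIdBits(std::uint32_t maxNumWorkGroups) {
  if (maxNumWorkGroups == 0)
    return 0;
  // Workgroup IDs range over [0, maxNumWorkGroups - 1].
  std::uint32_t maxId = maxNumWorkGroups - 1;
  // Count how many low bits are needed to represent maxId.
  unsigned liveBits = 0;
  while (liveBits < 32 && (maxId >> liveBits) != 0)
    ++liveBits;
  // Every bit at or above position liveBits is known to be zero.
  return liveBits == 32 ? 0 : ~((std::uint32_t{1} << liveBits) - 1);
}
```

With a bound of 256 workgroups in a dimension, the mask is `0xFFFFFF00`: only the low 8 bits of the ID can be set. A bound on the total number of workgroups, rather than a per-dimension one, would not give this per-dimension information directly, which is consistent with the preference for per-dimension bounds expressed in the message.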

[mlir] [clang] [llvm] [AMDGPU] - Add address space for strided buffers (PR #74471)

2023-12-12 Thread Krzysztof Drewniak via cfe-commits
@@ -864,6 +865,16 @@ supported for the ``amdgcn`` target. (bits `127:96`). The specific interpretation of these fields varies by the target architecture and is detailed in the ISA descriptions. +**Buffer Strided Pointer** + The buffer index pointer is an experimental addr

[llvm] [mlir] [clang] [AMDGPU] - Add address space for strided buffers (PR #74471)

2023-12-12 Thread Krzysztof Drewniak via cfe-commits
https://github.com/krzysz00 edited https://github.com/llvm/llvm-project/pull/74471

[llvm] [clang] [mlir] [AMDGPU] - Add address space for strided buffers (PR #74471)

2023-12-12 Thread Krzysztof Drewniak via cfe-commits
https://github.com/krzysz00 approved this pull request. Looks good to me, aside from a documentation nit. https://github.com/llvm/llvm-project/pull/74471

[llvm] [mlir] [clang] [AMDGPU] - Add address space for strided buffers (PR #74471)

2023-12-07 Thread Krzysztof Drewniak via cfe-commits
krzysz00 wrote: I'm going to ask the annoying questions: 1. Isn't a strided buffer one where the field that's named something like `stride` (bits 61:48 or 63:48) is non-zero 2. And therefore it uses structured buffers and the `llvm.struct[.ptr].buffer.*` intrinsics? 3. So, with LLVM's gep, how

[mlir] [flang] [clang-tools-extra] [compiler-rt] [clang] [libcxx] [llvm] [libc] Make SmallVectorImpl destructor protected (PR #71439)

2023-11-08 Thread Krzysztof Drewniak via cfe-commits
krzysz00 wrote: I put up a PR to fix SerializeToHsaco and unblock this https://github.com/llvm/llvm-project/pull/71439

[libc] [clang-tools-extra] [libcxx] [clang] [llvm] [flang] [compiler-rt] [mlir] Make SmallVectorImpl destructor protected (PR #71439)

2023-11-07 Thread Krzysztof Drewniak via cfe-commits
krzysz00 wrote: I don't know if I'll be able to get to the SerializeToHsaco fix today, but passing in `SmallVectorImpl&` would be my preferred solution ... Or really that should be `MemoryBuffer &` or some other such structure if feasible. https://github.com/llvm/llvm-project/pull/71439

[clang] [Sema] -Wzero-as-null-pointer-constant: don't warn for __null (PR #69126)

2023-10-24 Thread Krzysztof Drewniak via cfe-commits
https://github.com/krzysz00 updated https://github.com/llvm/llvm-project/pull/69126 From 357a21c38c1036a012affc85026fcba376ab7128 Mon Sep 17 00:00:00 2001 From: Arseny Kapoulkine Date: Sun, 15 Oct 2023 13:20:31 -0700 Subject: [PATCH 1/2] [Sema] -Wzero-as-null-pointer-constant: don't warn for

[clang] 5d8da5a - Add missing cases to clang switch after D141863

2023-02-09 Thread Krzysztof Drewniak via cfe-commits
Author: Krzysztof Drewniak Date: 2023-02-09T23:17:55Z New Revision: 5d8da5a208e6501baff7a8fd8de76ea143e49646 URL: https://github.com/llvm/llvm-project/commit/5d8da5a208e6501baff7a8fd8de76ea143e49646 DIFF: https://github.com/llvm/llvm-project/commit/5d8da5a208e6501baff7a8fd8de76ea143e49646.diff

[clang] d6ef3d2 - [mlir] Remove VectorToROCDL

2022-07-12 Thread Krzysztof Drewniak via cfe-commits
Author: Krzysztof Drewniak Date: 2022-07-12T15:21:22Z New Revision: d6ef3d20b4e3768dc30fb229dfa938d8059fffef URL: https://github.com/llvm/llvm-project/commit/d6ef3d20b4e3768dc30fb229dfa938d8059fffef DIFF: https://github.com/llvm/llvm-project/commit/d6ef3d20b4e3768dc30fb229dfa938d8059fffef.diff