[clang] [llvm] Add f8E4M3 IEEE 754 type to llvm (PR #97179)

2024-06-30 Thread Matt Arsenault via cfe-commits
arsenm wrote: Title and description needs rewording. This isn't adding the type "to llvm" which would imply adding the IR type, but only to APFloat https://github.com/llvm/llvm-project/pull/97179 ___ cfe-commits mailing list cfe-commits@lists.llvm.or

[clang] [libc] [llvm] AMDGPU: Add a subtarget feature for fine-grained remote memory support (PR #96442)

2024-07-01 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm edited https://github.com/llvm/llvm-project/pull/96442 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [libc] [llvm] AMDGPU: Add a subtarget feature for fine-grained remote memory support (PR #96442)

2024-07-01 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm commented: ping https://github.com/llvm/llvm-project/pull/96442 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [libc] [llvm] [libc] Implement (v|f)printf on the GPU (PR #96369)

2024-07-01 Thread Matt Arsenault via cfe-commits
@@ -942,6 +942,36 @@ struct Amdgpu final : public VariadicABIInfo { } }; +struct NVPTX final : public VariadicABIInfo { + + bool enableForTarget() override { return true; } + + bool vaListPassedInSSARegister() override { return true; } + + Type *vaListType(LLVMContext &Ct

[clang] [libc] [llvm] [libc] Implement (v|f)printf on the GPU (PR #96369)

2024-07-01 Thread Matt Arsenault via cfe-commits
arsenm wrote: > You could theoretically break this if you didn't go through the C ABI and > ignored type promotion, but I'm not concerned with that kind of misuse since > it's against the ABI in the first place. The IR has its own ABI that may or may not match whatever the platform "C ABI' is

[clang] [libc] [llvm] [libc] Implement (v|f)printf on the GPU (PR #96369)

2024-07-01 Thread Matt Arsenault via cfe-commits
@@ -54,7 +54,34 @@ class MockArgList { } template LIBC_INLINE T next_var() { -++arg_counter; +arg_counter++; +return T(arg_counter); + } + + size_t read_count() const { return arg_counter; } +}; + +// Used by the GPU implementation to parse how many bytes ne

[clang] [Clang] [WIP] Added builtin_alloca support for OpenCL1.2 and below (PR #95750)

2024-07-02 Thread Matt Arsenault via cfe-commits
@@ -0,0 +1,255 @@ +// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --version 5 +// RUN: %clang_cc1 %s -O0 -triple amdgcn-amd-amdhsa -cl-std=CL1.2 \ +// RUN: -emit-llvm -o - | FileCheck --check-prefix=OPENCL12 %s +// RUN: %clang_cc1 %s -

[clang] [Clang] [WIP] Added builtin_alloca support for OpenCL1.2 and below (PR #95750)

2024-07-02 Thread Matt Arsenault via cfe-commits
@@ -0,0 +1,255 @@ +// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --version 5 +// RUN: %clang_cc1 %s -O0 -triple amdgcn-amd-amdhsa -cl-std=CL1.2 \ +// RUN: -emit-llvm -o - | FileCheck --check-prefix=OPENCL12 %s +// RUN: %clang_cc1 %s -

[clang] [clang][CodeGen] Add query for a target's flat address space (PR #95728)

2024-07-02 Thread Matt Arsenault via cfe-commits
arsenm wrote: > LangAS::Default is not just determined by target. It also depends on > language. For OpenCL 1.2 it is private. I would have hoped this would be implemented by default assuming a hidden private addrspace qualifier https://github.com/llvm/llvm-project/pull/95728 __

[clang] [libc] [llvm] AMDGPU: Add a subtarget feature for fine-grained remote memory support (PR #96442)

2024-07-02 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm updated https://github.com/llvm/llvm-project/pull/96442 >From cd95b668b34f5d0834b16f441ab131003988266b Mon Sep 17 00:00:00 2001 From: martinboehme Date: Wed, 26 Jun 2024 15:01:57 +0200 Subject: [PATCH 01/14] [clang][dataflow] Teach `AnalysisASTVisitor` that `typeid()`

[clang] [Clang] [WIP] Added builtin_alloca right Address Space for OpenCL (PR #95750)

2024-07-03 Thread Matt Arsenault via cfe-commits
@@ -0,0 +1,255 @@ +// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --version 5 +// RUN: %clang_cc1 %s -O0 -triple amdgcn-amd-amdhsa -cl-std=CL1.2 \ +// RUN: -emit-llvm -o - | FileCheck --check-prefix=OPENCL12 %s +// RUN: %clang_cc1 %s -

[clang] [libc] [llvm] AMDGPU: Add a subtarget feature for fine-grained remote memory support (PR #96442)

2024-07-03 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm updated https://github.com/llvm/llvm-project/pull/96442 >From f2654a3ebb73c25fe4565fdb8c7b4b0ca6b6bf06 Mon Sep 17 00:00:00 2001 From: martinboehme Date: Wed, 26 Jun 2024 15:01:57 +0200 Subject: [PATCH 01/14] [clang][dataflow] Teach `AnalysisASTVisitor` that `typeid()`

[clang] [libc] [llvm] AMDGPU: Add a subtarget feature for fine-grained remote memory support (PR #96442)

2024-07-03 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm updated https://github.com/llvm/llvm-project/pull/96442 >From 803aa8823e1d2a9c4ebf2d0ab4ad7c2e0ad9d7e9 Mon Sep 17 00:00:00 2001 From: martinboehme Date: Wed, 26 Jun 2024 15:01:57 +0200 Subject: [PATCH 01/14] [clang][dataflow] Teach `AnalysisASTVisitor` that `typeid()`

[clang] [libc] [llvm] AMDGPU: Add a subtarget feature for fine-grained remote memory support (PR #96442)

2024-07-04 Thread Matt Arsenault via cfe-commits
@@ -14,13 +14,14 @@ #define LLVM_CODEGEN_MACHINEBRANCHPROBABILITYINFO_H #include "llvm/CodeGen/MachineBasicBlock.h" -#include "llvm/CodeGen/MachinePassManager.h" #include "llvm/Pass.h" #include "llvm/Support/BranchProbability.h" namespace llvm { -class MachineBranchProb

[clang] [libc] [llvm] AMDGPU: Add a subtarget feature for fine-grained remote memory support (PR #96442)

2024-07-04 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm updated https://github.com/llvm/llvm-project/pull/96442 >From a70da4e0b569d3e83d405a0248e9c71635f29f96 Mon Sep 17 00:00:00 2001 From: martinboehme Date: Wed, 26 Jun 2024 15:01:57 +0200 Subject: [PATCH 01/14] [clang][dataflow] Teach `AnalysisASTVisitor` that `typeid()`

[clang] [libc] [llvm] AMDGPU: Add a subtarget feature for fine-grained remote memory support (PR #96442)

2024-07-04 Thread Matt Arsenault via cfe-commits
@@ -14,13 +14,14 @@ #define LLVM_CODEGEN_MACHINEBRANCHPROBABILITYINFO_H #include "llvm/CodeGen/MachineBasicBlock.h" -#include "llvm/CodeGen/MachinePassManager.h" #include "llvm/Pass.h" #include "llvm/Support/BranchProbability.h" namespace llvm { -class MachineBranchProb

[clang] [llvm] [AMDGPU] Add target intrinsic for s_prefetch_data (PR #107133)

2024-09-03 Thread Matt Arsenault via cfe-commits
@@ -2689,6 +2689,12 @@ def int_amdgcn_global_load_tr_b128 : AMDGPULoadIntrinsic; def int_amdgcn_wave_id : DefaultAttrsIntrinsic<[llvm_i32_ty], [], [NoUndef, IntrNoMem, IntrSpeculatable]>; +def int_amdgcn_s_prefetch_data : + Intrinsic<[], [llvm_anyptr_ty, llvm_i32_ty], ---

[clang] [Clang][CodeGen] Fix type for atomic float incdec operators (PR #107075)

2024-09-03 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm approved this pull request. https://github.com/llvm/llvm-project/pull/107075 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [Clang] Remove 3-element vector load and store special handling (PR #104661)

2024-09-03 Thread Matt Arsenault via cfe-commits
@@ -45,7 +45,7 @@ void test3(packedfloat3 *p) { *p = (packedfloat3) { 3.2f, 2.3f, 0.1f }; } // CHECK: @test3( -// CHECK: store <4 x float> {{.*}}, align 4 +// CHECK: store <3 x float> {{.*}}, align 4 arsenm wrote: The ideal control would be more specific tha

[clang] [clang] fix half && bfloat16 convert node expr codegen (PR #89051)

2024-09-03 Thread Matt Arsenault via cfe-commits
arsenm wrote: > ok, you mean, i remove the vector testcase for this patch. and just save the > scalar testcase? No, keep the tests. Only keep the scalar behavior change. The previous revision was essentially correct and minimal https://github.com/llvm/llvm-project/pull/89051

[clang] [clang] fix half && bfloat16 convert node expr codegen (PR #89051)

2024-09-04 Thread Matt Arsenault via cfe-commits
@@ -1431,9 +1431,13 @@ Value *ScalarExprEmitter::EmitScalarCast(Value *Src, QualType SrcType, return Builder.CreateFPToUI(Src, DstTy, "conv"); } - if (DstElementTy->getTypeID() < SrcElementTy->getTypeID()) + if ((DstElementTy->is16bitFPTy() && SrcElementTy->is16bitFPT

[clang] [clang] fix half && bfloat16 convert node expr codegen (PR #89051)

2024-09-04 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm commented: The vector tests should still be added https://github.com/llvm/llvm-project/pull/89051 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [AMDGPU] Add target intrinsic for s_buffer_prefetch_data (PR #107293)

2024-09-05 Thread Matt Arsenault via cfe-commits
@@ -9934,6 +9934,12 @@ SDValue SITargetLowering::LowerINTRINSIC_VOID(SDValue Op, auto NewMI = DAG.getMachineNode(Opc, DL, Op->getVTList(), Ops); return SDValue(NewMI, 0); } + case Intrinsic::amdgcn_s_prefetch_data: { +// For non-global address space preserve the

[clang] [llvm] [AMDGPU] Add target intrinsic for s_buffer_prefetch_data (PR #107293)

2024-09-06 Thread Matt Arsenault via cfe-commits
@@ -9934,6 +9934,12 @@ SDValue SITargetLowering::LowerINTRINSIC_VOID(SDValue Op, auto NewMI = DAG.getMachineNode(Opc, DL, Op->getVTList(), Ops); return SDValue(NewMI, 0); } + case Intrinsic::amdgcn_s_prefetch_data: { +// For non-global address space preserve the

[clang] [llvm] [AMDGPU] Add target intrinsic for s_buffer_prefetch_data (PR #107293)

2024-09-06 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm approved this pull request. I think the parent needs some revision for global/flat/infer handling https://github.com/llvm/llvm-project/pull/107293 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org

[clang] [llvm] [AMDGPU] Add target intrinsic for s_buffer_prefetch_data (PR #107293)

2024-09-06 Thread Matt Arsenault via cfe-commits
@@ -0,0 +1,36 @@ +; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5 +; RUN: llc -global-isel=0 -march=amdgcn -mcpu=gfx1200 < %s | FileCheck --check-prefix=GCN %s +; RUN: llc -global-isel=1 -march=amdgcn -mcpu=gfx1200 < %s | FileC

[clang] [clang] fix half && bfloat16 convert node expr codegen (PR #89051)

2024-09-06 Thread Matt Arsenault via cfe-commits
arsenm wrote: > > The vector tests should still be added > > sorry. if i remove the change of the vector. i have to remove the testcase. > because, for the current code convert between vector type of half and > bfloat16, it has a bug. And it will be Assert "Invalid cast!"" > OK, LGTM with th

[clang] a291fe5 - clang/AMDGPU: Update test message order

2024-09-06 Thread Matt Arsenault via cfe-commits
Author: Matt Arsenault Date: 2024-09-06T21:18:41+04:00 New Revision: a291fe5ed44fa37493d038c78ff4d73135fd85a9 URL: https://github.com/llvm/llvm-project/commit/a291fe5ed44fa37493d038c78ff4d73135fd85a9 DIFF: https://github.com/llvm/llvm-project/commit/a291fe5ed44fa37493d038c78ff4d73135fd85a9.diff

[clang] [Clang] Remove 3-element vector load and store special handling (PR #104661)

2024-09-06 Thread Matt Arsenault via cfe-commits
@@ -45,7 +45,7 @@ void test3(packedfloat3 *p) { *p = (packedfloat3) { 3.2f, 2.3f, 0.1f }; } // CHECK: @test3( -// CHECK: store <4 x float> {{.*}}, align 4 +// CHECK: store <3 x float> {{.*}}, align 4 arsenm wrote: I'd expect this to be in terms of type, not

[clang] [TBAA] Emit "omnipotent char" for intrinsics with type cast (PR #107793)

2024-09-08 Thread Matt Arsenault via cfe-commits
arsenm wrote: I don't understand this. The code is a strict aliasing violation, so why should clang work around it? https://github.com/llvm/llvm-project/pull/107793 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bi

[clang] [llvm] [TableGen] Change SetTheory set/vec to use const Record * (PR #107692)

2024-09-08 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm approved this pull request. lgtm assuming the const_cast goes away in a subsequent change https://github.com/llvm/llvm-project/pull/107692 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin

[clang] [clang] fix half && bfloat16 convert node expr codegen (PR #89051)

2024-09-08 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm approved this pull request. https://github.com/llvm/llvm-project/pull/89051 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [TBAA] Emit "omnipotent char" for intrinsics with type cast (PR #107793)

2024-09-11 Thread Matt Arsenault via cfe-commits
arsenm wrote: > Hi, @paulwalker-arm, ACLE allows users to do instruction-level development, > but mixing intrinsic and regular C code may break some of the rules set by > the compiler. The rules are still there. You can always use a union or copy to avoid violating the rules. I don't think i

[clang] [llvm] [SPIRV][RFC] Rework / extend support for memory scopes (PR #106429)

2024-09-11 Thread Matt Arsenault via cfe-commits
@@ -766,8 +766,19 @@ static void EmitAtomicOp(CodeGenFunction &CGF, AtomicExpr *Expr, Address Dest, // LLVM atomic instructions always have synch scope. If clang atomic // expression has no scope operand, use default LLVM synch scope. if (!ScopeModel) { +llvm::SyncSc

[clang] [llvm] [SPIRV][RFC] Rework / extend support for memory scopes (PR #106429)

2024-09-11 Thread Matt Arsenault via cfe-commits
@@ -58,7 +58,35 @@ class SPIRVTargetCodeGenInfo : public CommonSPIRTargetCodeGenInfo { SPIRVTargetCodeGenInfo(CodeGen::CodeGenTypes &CGT) : CommonSPIRTargetCodeGenInfo(std::make_unique(CGT)) {} void setCUDAKernelCallingConvention(const FunctionType *&FT) const overri

[clang] [llvm] [SPIRV][RFC] Rework / extend support for memory scopes (PR #106429)

2024-09-11 Thread Matt Arsenault via cfe-commits
@@ -251,6 +251,24 @@ SPIRV::MemorySemantics::MemorySemantics getMemSemantics(AtomicOrdering Ord) { llvm_unreachable(nullptr); } +SPIRV::Scope::Scope getMemScope(const LLVMContext &Ctx, SyncScope::ID ID) { + SmallVector SSNs; + Ctx.getSyncScopeNames(SSNs); + + StringRef M

[clang] Don't emit int TBAA metadata on more complex FP math libcalls. (PR #107598)

2024-09-11 Thread Matt Arsenault via cfe-commits
@@ -699,9 +699,20 @@ static RValue emitLibraryCall(CodeGenFunction &CGF, const FunctionDecl *FD, bool ConstWithoutErrnoAndExceptions = Context.BuiltinInfo.isConstWithoutErrnoAndExceptions(BuiltinID); // Restrict to target with errno, for example, MacOS doesn't

[clang] Don't emit int TBAA metadata on more complex FP math libcalls. (PR #107598)

2024-09-11 Thread Matt Arsenault via cfe-commits
@@ -699,9 +699,20 @@ static RValue emitLibraryCall(CodeGenFunction &CGF, const FunctionDecl *FD, bool ConstWithoutErrnoAndExceptions = Context.BuiltinInfo.isConstWithoutErrnoAndExceptions(BuiltinID); // Restrict to target with errno, for example, MacOS doesn't

[clang] [RFC] Add clang atomic control options and pragmas (PR #102569)

2024-08-09 Thread Matt Arsenault via cfe-commits
@@ -1034,6 +1038,169 @@ inline void FPOptions::applyChanges(FPOptionsOverride FPO) { *this = FPO.applyOverrides(*this); } +/// Atomic control options +class AtomicOptionsOverride; +class AtomicOptions { +public: + using storage_type = uint16_t; + + static constexpr unsign

[clang] [RFC] Add clang atomic control options and pragmas (PR #102569)

2024-08-09 Thread Matt Arsenault via cfe-commits
@@ -0,0 +1,19 @@ +//===--- AtomicOptions.def - Atomic Options database -*- C++ -*-===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Ap

[clang] [RFC] Add clang atomic control options and pragmas (PR #102569)

2024-08-09 Thread Matt Arsenault via cfe-commits
@@ -238,3 +238,55 @@ LLVM_DUMP_METHOD void FPOptionsOverride::dump() { #include "clang/Basic/FPOptions.def" llvm::errs() << "\n"; } + +AtomicOptions +AtomicOptions::defaultWithoutTrailingStorage(const LangOptions &LO) { + AtomicOptions result(LO); + return result; +} + +Ato

[clang] [RFC] Add clang atomic control options and pragmas (PR #102569)

2024-08-09 Thread Matt Arsenault via cfe-commits
@@ -1,50 +1,48 @@ // RUN: %clang_cc1 -x hip %s -emit-llvm -o - -triple=amdgcn-amd-amdhsa \ // RUN: -fcuda-is-device -target-cpu gfx906 -fnative-half-type \ -// RUN: -fnative-half-arguments-and-returns | FileCheck -check-prefixes=CHECK,SAFEIR %s +// RUN: -fnative-half-argu

[clang] [RFC] Add clang atomic control options and pragmas (PR #102569)

2024-08-09 Thread Matt Arsenault via cfe-commits
@@ -2,315 +2,195 @@ // RUN: %clang_cc1 -fnative-half-arguments-and-returns -triple amdgcn-amd-amdhsa-gnu -target-cpu gfx900 -emit-llvm -o - %s | FileCheck -check-prefixes=CHECK,SAFE %s // RUN: %clang_cc1 -fnative-half-arguments-and-returns -triple amdgcn-amd-amdhsa-gnu -targe

[clang] [RFC] Add clang atomic control options and pragmas (PR #102569)

2024-08-09 Thread Matt Arsenault via cfe-commits
@@ -61,30 +59,28 @@ __global__ void ffp1(float *p) { } __global__ void ffp2(double *p) { - // CHECK-LABEL: @_Z4ffp2Pd - // SAFEIR: atomicrmw fadd ptr {{.*}} monotonic, align 8{{$}} - // SAFEIR: atomicrmw fsub ptr {{.*}} monotonic, align 8{{$}} - // SAFEIR: atomicrmw fmax p

[clang] [RFC] Add clang atomic control options and pragmas (PR #102569)

2024-08-09 Thread Matt Arsenault via cfe-commits
@@ -0,0 +1,80 @@ +// RUN: %clang_cc1 -ast-dump %s | FileCheck %s +// RUN: %clang_cc1 -ast-dump -fcuda-is-device %s | FileCheck %s +// RUN: %clang_cc1 -ast-dump -fcuda-is-device %s \ +// RUN: -fatomic=no_fine_grained_memory:off,no_remote_memory:on,ignore_denormal_mode:on \ +//

[clang] [RFC] Add clang atomic control options and pragmas (PR #102569)

2024-08-09 Thread Matt Arsenault via cfe-commits
@@ -5881,6 +5881,32 @@ void Clang::ConstructJob(Compilation &C, const JobAction &JA, RenderFloatingPointOptions(TC, D, OFastEnabled, Args, CmdArgs, JA); + if (Arg *AtomicArg = Args.getLastArg(options::OPT_fatomic_EQ)) { +if (!AtomicArg->getNumValues()) { + D.Diag

[clang] [RFC] Add clang atomic control options and pragmas (PR #102569)

2024-08-09 Thread Matt Arsenault via cfe-commits
@@ -0,0 +1,28 @@ +// RUN: %clang_cc1 -fsyntax-only -verify %s +// RUN: %clang_cc1 -fsyntax-only -verify -fcuda-is-device %s +// RUN: %clang_cc1 -fsyntax-only -verify -fcuda-is-device %s \ +// RUN: -fatomic=no_fine_grained_memory:off,no_remote_memory:on,ignore_denormal_mode:on +

[clang] [RFC] Add clang atomic control options and pragmas (PR #102569)

2024-08-09 Thread Matt Arsenault via cfe-commits
@@ -238,3 +238,55 @@ LLVM_DUMP_METHOD void FPOptionsOverride::dump() { #include "clang/Basic/FPOptions.def" llvm::errs() << "\n"; } + +AtomicOptions +AtomicOptions::defaultWithoutTrailingStorage(const LangOptions &LO) { + AtomicOptions result(LO); + return result; +} + +Ato

[clang] [llvm] [Coroutines] Change `llvm.coro.noop` to accept `llvm_anyptr_ty` instead (PR #102096)

2024-08-12 Thread Matt Arsenault via cfe-commits
@@ -123,6 +123,17 @@ bool coro::declaresIntrinsics(const Module &M, return false; } +// Verifies if a module has any intrinsics. +bool coro::declaresIntrinsics(const Module &M, + const DenseSet &Identifiers) { + for (const Function &F : M.functi

[clang] [llvm] [Coroutines] Change `llvm.coro.noop` to accept `llvm_anyptr_ty` instead (PR #102096)

2024-08-12 Thread Matt Arsenault via cfe-commits
@@ -123,6 +123,17 @@ bool coro::declaresIntrinsics(const Module &M, return false; } +// Verifies if a module has any intrinsics. +bool coro::declaresIntrinsics(const Module &M, + const DenseSet &Identifiers) { + for (const Function &F : M.functi

[clang] [clang][CodeGen][SPIR-V][AMDGPU] Tweak AMDGCNSPIRV ABI to allow for the correct handling of aggregates passed to kernels / functions. (PR #102776)

2024-08-12 Thread Matt Arsenault via cfe-commits
arsenm wrote: > but empirically robust and guaranteed to work as the AMDGPU BE retains > handling of direct passing for legacy reasons. I would like to get rid of that someday... https://github.com/llvm/llvm-project/pull/102776 ___ cfe-commits mailin

[clang] [Clang] Add `__CLANG_GPU_DISABLE_MATH_WRAPPERS` macro for offloading math (PR #98234)

2024-08-13 Thread Matt Arsenault via cfe-commits
@@ -12,6 +12,10 @@ #error "This file is for CUDA compilation only." #endif +// The __CLANG_GPU_DISABLE_MATH_WRAPPERS macro provides a way to let standard +// libcalls reach the link step instead of being eagerly replaced. +#ifndef __CLANG_GPU_DISABLE_MATH_WRAPPERS

[clang] [llvm] target ABI: improve call parameters extensions handling (PR #100757)

2024-08-15 Thread Matt Arsenault via cfe-commits
@@ -1185,6 +1189,9 @@ Currently, only the following parameter attributes are defined: value should be sign-extended to the extent required by the target's ABI (which is usually 32-bits) by the caller (for a parameter) or the callee (for a return value). +``noext``

[clang] [llvm] [AMDGPU] Convert AMDGPUResourceUsageAnalysis pass from Module to MF pass (PR #102913)

2024-08-15 Thread Matt Arsenault via cfe-commits
@@ -0,0 +1,94 @@ +//===- AMDGPUMCResourceInfo.h - MC Resource Info --*- C++ -*-=// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apa

[clang] [llvm] [AMDGPU] Convert AMDGPUResourceUsageAnalysis pass from Module to MF pass (PR #102913)

2024-08-15 Thread Matt Arsenault via cfe-commits
@@ -84,6 +88,11 @@ class AMDGPUAsmPrinter final : public AsmPrinter { SmallString<128> getMCExprStr(const MCExpr *Value); + /// Attempts to replace the validation that is missed in getSIProgramInfo due + /// to MCExpr being unknown. Invoked during doFinalization such that

[clang] [llvm] [AMDGPU] Convert AMDGPUResourceUsageAnalysis pass from Module to MF pass (PR #102913)

2024-08-15 Thread Matt Arsenault via cfe-commits
@@ -0,0 +1,94 @@ +//===- AMDGPUMCResourceInfo.h - MC Resource Info --*- C++ -*-=// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apa

[clang] [llvm] [AMDGPU] Convert AMDGPUResourceUsageAnalysis pass from Module to MF pass (PR #102913)

2024-08-15 Thread Matt Arsenault via cfe-commits
@@ -40,12 +41,20 @@ class AMDGPUAsmPrinter final : public AsmPrinter { AMDGPUResourceUsageAnalysis *ResourceUsage; + std::unique_ptr RI; arsenm wrote: Why does this need unique_ptr instead of just a plain member? https://github.com/llvm/llvm-project/pul

[clang] [llvm] [AMDGPU] Convert AMDGPUResourceUsageAnalysis pass from Module to MF pass (PR #102913)

2024-08-15 Thread Matt Arsenault via cfe-commits
@@ -0,0 +1,220 @@ +//===- AMDGPUMCResourceInfo.cpp --- MC Resource Info --===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Ap

[clang] [llvm] [AMDGPU] Convert AMDGPUResourceUsageAnalysis pass from Module to MF pass (PR #102913)

2024-08-15 Thread Matt Arsenault via cfe-commits
@@ -771,18 +885,38 @@ void AMDGPUAsmPrinter::getSIProgramInfo(SIProgramInfo &ProgInfo, return false; }; - ProgInfo.NumArchVGPR = CreateExpr(Info.NumVGPR); - ProgInfo.NumAccVGPR = CreateExpr(Info.NumAGPR); - ProgInfo.NumVGPR = CreateExpr(Info.getTotalNumVGPRs(STM)); -

[clang] [llvm] [AMDGPU] Convert AMDGPUResourceUsageAnalysis pass from Module to MF pass (PR #102913)

2024-08-15 Thread Matt Arsenault via cfe-commits
@@ -3025,8 +3025,8 @@ define amdgpu_kernel void @dyn_extract_v5f64_s_s(ptr addrspace(1) %out, i32 %sel ; GPRIDX-NEXT: amd_machine_version_stepping = 0 ; GPRIDX-NEXT: kernel_code_entry_byte_offset = 256 ; GPRIDX-NEXT: kernel_code_prefetch_byte_size = 0 -; GPRIDX-NEX

[clang] [llvm] [AMDGPU] Convert AMDGPUResourceUsageAnalysis pass from Module to MF pass (PR #102913)

2024-08-15 Thread Matt Arsenault via cfe-commits
@@ -0,0 +1,220 @@ +//===- AMDGPUMCResourceInfo.cpp --- MC Resource Info --===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Ap

[clang] [llvm] [AMDGPU] Convert AMDGPUResourceUsageAnalysis pass from Module to MF pass (PR #102913)

2024-08-15 Thread Matt Arsenault via cfe-commits
@@ -75,10 +75,10 @@ bb.2: store volatile i32 0, ptr addrspace(1) undef ret void } -; DEFAULTSIZE: .amdhsa_private_segment_fixed_size 16 +; DEFAULTSIZE: .amdhsa_private_segment_fixed_size kernel_non_entry_block_static_alloca_uniformly_reached_align4.private_seg_size ; DEFA

[clang] [llvm] [AMDGPU] Convert AMDGPUResourceUsageAnalysis pass from Module to MF pass (PR #102913)

2024-08-15 Thread Matt Arsenault via cfe-commits
@@ -40,12 +41,20 @@ class AMDGPUAsmPrinter final : public AsmPrinter { AMDGPUResourceUsageAnalysis *ResourceUsage; + std::unique_ptr RI; + SIProgramInfo CurrentProgramInfo; std::unique_ptr HSAMetadataStream; MCCodeEmitter *DumpCodeInstEmitter = nullptr; + /

[clang] [llvm] [AMDGPU] Convert AMDGPUResourceUsageAnalysis pass from Module to MF pass (PR #102913)

2024-08-15 Thread Matt Arsenault via cfe-commits
@@ -0,0 +1,220 @@ +//===- AMDGPUMCResourceInfo.cpp --- MC Resource Info --===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Ap

[clang] [Clang] Fix sema checks thinking kernels aren't kernels (PR #104460)

2024-08-15 Thread Matt Arsenault via cfe-commits
@@ -7147,7 +7147,9 @@ void Sema::ProcessDeclAttributeList( // good to have a way to specify "these attributes must appear as a group", // for these. Additionally, it would be good to have a way to specify "these // attribute must never appear as a group" for attributes li

[clang] clang/AMDGPU: Emit atomicrmw for __builtin_amdgcn_global_atomic_fadd_{f32|f64} (PR #96872)

2024-08-15 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm updated https://github.com/llvm/llvm-project/pull/96872 >From 099aec4c343868155104109c66c99d97ae669c4c Mon Sep 17 00:00:00 2001 From: Matt Arsenault Date: Tue, 11 Jun 2024 10:58:44 +0200 Subject: [PATCH 1/2] clang/AMDGPU: Emit atomicrmw for __builtin_amdgcn_global_ato

[clang] clang/AMDGPU: Emit atomicrmw for __builtin_amdgcn_global_atomic_fadd_{f32|f64} (PR #96872)

2024-08-15 Thread Matt Arsenault via cfe-commits
arsenm wrote: ### Merge activity * **Aug 15, 2:58 PM EDT**: @arsenm started a stack merge that includes this pull request via [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/96872). https://github.com/llvm/llvm-project/pull/96872 __

[clang] clang/AMDGPU: Emit atomicrmw for __builtin_amdgcn_global_atomic_fadd_{f32|f64} (PR #96872)

2024-08-15 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm closed https://github.com/llvm/llvm-project/pull/96872 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] clang/AMDGPU: Emit atomicrmw from {global|flat}_atomic_fadd_v2f16 builtins (PR #96873)

2024-08-15 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm edited https://github.com/llvm/llvm-project/pull/96873 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] clang/AMDGPU: Emit atomicrmw from {global|flat}_atomic_fadd_v2f16 builtins (PR #96873)

2024-08-15 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm updated https://github.com/llvm/llvm-project/pull/96873 >From f03c978a942c9169b987b817bbb5ae68126dc766 Mon Sep 17 00:00:00 2001 From: Matt Arsenault Date: Wed, 26 Jun 2024 19:12:59 +0200 Subject: [PATCH] clang/AMDGPU: Emit atomicrmw from {global|flat}_atomic_fadd_v2f1

[clang] clang/AMDGPU: Emit atomicrmw from {global|flat}_atomic_fadd_v2f16 builtins (PR #96873)

2024-08-16 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm updated https://github.com/llvm/llvm-project/pull/96873 >From 5d0d09c00b837186a4d40a38c474f32461bac034 Mon Sep 17 00:00:00 2001 From: Matt Arsenault Date: Wed, 26 Jun 2024 19:12:59 +0200 Subject: [PATCH] clang/AMDGPU: Emit atomicrmw from {global|flat}_atomic_fadd_v2f1

[clang] clang/AMDGPU: Emit atomicrmw from {global|flat}_atomic_fadd_v2f16 builtins (PR #96873)

2024-08-16 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm edited https://github.com/llvm/llvm-project/pull/96873 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] clang/AMDGPU: Emit atomicrmw from {global|flat}_atomic_fadd_v2f16 builtins (PR #96873)

2024-08-16 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm commented: ping https://github.com/llvm/llvm-project/pull/96873 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [Coroutines] Change `llvm.coro.noop` to accept `llvm_anyptr_ty` instead (PR #102096)

2024-08-16 Thread Matt Arsenault via cfe-commits
@@ -123,6 +123,17 @@ bool coro::declaresIntrinsics(const Module &M, return false; } +// Verifies if a module has any intrinsics. +bool coro::declaresIntrinsics(const Module &M, + const DenseSet &Identifiers) { + for (const Function &F : M.functi

[clang] [llvm] [AMDGPU] Convert AMDGPUResourceUsageAnalysis pass from Module to MF pass (PR #102913)

2024-08-16 Thread Matt Arsenault via cfe-commits
@@ -40,12 +41,20 @@ class AMDGPUAsmPrinter final : public AsmPrinter { AMDGPUResourceUsageAnalysis *ResourceUsage; + std::unique_ptr RI; arsenm wrote: But OutContext is a reference in the parent already, so you can use it? https://github.com/llvm/llvm-p

[clang] [llvm] [AMDGPU] Convert AMDGPUResourceUsageAnalysis pass from Module to MF pass (PR #102913)

2024-08-16 Thread Matt Arsenault via cfe-commits
@@ -3025,8 +3025,8 @@ define amdgpu_kernel void @dyn_extract_v5f64_s_s(ptr addrspace(1) %out, i32 %sel ; GPRIDX-NEXT: amd_machine_version_stepping = 0 ; GPRIDX-NEXT: kernel_code_entry_byte_offset = 256 ; GPRIDX-NEXT: kernel_code_prefetch_byte_size = 0 -; GPRIDX-NEX

[clang] [llvm] [OpenMP] Add amdgpu-num-work-groups attribute to OpenMP kernels (PR #87695)

2024-04-06 Thread Matt Arsenault via cfe-commits
arsenm wrote: Note this attribute doesn't actually do anything yet. @jwanggit86 are you working on implementing the propagation and optimizations with this? https://github.com/llvm/llvm-project/pull/87695 ___ cfe-commits mailing list cfe-commits@lists

[clang] [Offload] Do not pass `-fcf-protection=` for offloading (PR #88402)

2024-04-12 Thread Matt Arsenault via cfe-commits
@@ -6867,8 +6867,14 @@ void Clang::ConstructJob(Compilation &C, const JobAction &JA, CmdArgs.push_back("-nogpulib"); if (Arg *A = Args.getLastArg(options::OPT_fcf_protection_EQ)) { -CmdArgs.push_back( -Args.MakeArgString(Twine("-fcf-protection=") + A->getVal

[clang] [Offload] Do not pass `-fcf-protection=` for offloading (PR #88402)

2024-04-12 Thread Matt Arsenault via cfe-commits
@@ -0,0 +1,39 @@ +// Check that -fcf-protection does not get passed to the device-side +// compilation. + +// RUN: %clang -### -x cuda --target=x86_64-unknown-linux-gnu -nogpulib \ +// RUN: -nogpuinc --offload-arch=sm_52 -fcf-protection=full -c %s 2>&1 \ +// RUN: | FileCheck %s

[clang] [Offload] Do not pass `-fcf-protection=` for offloading (PR #88402)

2024-04-12 Thread Matt Arsenault via cfe-commits
@@ -0,0 +1,39 @@ +// Check that -fcf-protection does not get passed to the device-side +// compilation. + +// RUN: %clang -### -x cuda --target=x86_64-unknown-linux-gnu -nogpulib \ +// RUN: -nogpuinc --offload-arch=sm_52 -fcf-protection=full -c %s 2>&1 \ +// RUN: | FileCheck %s

[clang] [Offload] Do not pass `-fcf-protection=` for offloading (PR #88402)

2024-04-12 Thread Matt Arsenault via cfe-commits
@@ -6867,8 +6867,14 @@ void Clang::ConstructJob(Compilation &C, const JobAction &JA, CmdArgs.push_back("-nogpulib"); if (Arg *A = Args.getLastArg(options::OPT_fcf_protection_EQ)) { -CmdArgs.push_back( -Args.MakeArgString(Twine("-fcf-protection=") + A->getVal

[clang] [AMDGPU] add macro `__AMDGCN_CDNA_VERSION__` (PR #88293)

2024-04-12 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm closed https://github.com/llvm/llvm-project/pull/88293 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [AMDGPU] add macro `__AMDGCN_CDNA_VERSION__` (PR #88293)

2024-04-12 Thread Matt Arsenault via cfe-commits
arsenm wrote: "CDNA version" isn't even a defined, technical concept, much less a useful one. We should be not be expanding the set of device macros, and strongly discouraging further use. https://github.com/llvm/llvm-project/pull/88293 ___ cfe-comm

[clang] [clang][CodeGen][OpenMP] Fix casting of atomic update of ptr types (PR #88215)

2024-04-12 Thread Matt Arsenault via cfe-commits
@@ -851,6 +852,16 @@ int main(void) { // CHECK: call{{.*}} @__kmpc_flush( #pragma omp atomic seq_cst rix = dv / rix; + +// CHECK: [[LD_CPX:%.+]] = load atomic ptr, ptr @cpx monotonic +// CHECK: br label %[[CONT:.+]] +// CHECK: [[CONT]] +// CHECK: [[PHI:%.+]] = phi ptr +// CHE

[clang] [clang][CodeGen][OpenMP] Fix casting of atomic update of ptr types (PR #88215)

2024-04-12 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm approved this pull request. We should really fix using cmpxchg here. Can you open an IR issue for it? https://github.com/llvm/llvm-project/pull/88215 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.o

[clang] [Clang][AMDGPU] Add builtins for instrinsic `llvm.amdgcn.raw.buffer.store` (PR #94576)

2024-06-06 Thread Matt Arsenault via cfe-commits
arsenm wrote: > Is there really a good use case for this? Can you use regular stores to > addrspace(7) instead? @krzysz00 I see these regularly used via inline asm in various ML code. We need to expose these in some way to stop people from doing that > > Also, do you really need a separate

[clang] [Clang][AMDGPU] Add builtins for instrinsic `llvm.amdgcn.raw.buffer.store` (PR #94576)

2024-06-06 Thread Matt Arsenault via cfe-commits
@@ -0,0 +1,264 @@ +// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py +// REQUIRES: amdgpu-registered-target +// RUN: %clang_cc1 -triple amdgcn-unknown-unknown -target-cpu verde -emit-llvm -o - %s | FileCheck %s --check-prefixes=VERDE +// RUN: %clang_cc

[clang] [libc] [llvm] [AMDGPU] Implement variadic functions by IR lowering (PR #93362)

2024-06-06 Thread Matt Arsenault via cfe-commits
@@ -0,0 +1,1037 @@ +//===-- ExpandVariadicsPass.cpp *- C++ -*-=// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apach

[clang] [libc] [llvm] [AMDGPU] Implement variadic functions by IR lowering (PR #93362)

2024-06-06 Thread Matt Arsenault via cfe-commits
@@ -0,0 +1,1037 @@ +//===-- ExpandVariadicsPass.cpp *- C++ -*-=// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apach

[clang] [amdgpu] Pass variadic arguments without splitting (PR #94083)

2024-06-06 Thread Matt Arsenault via cfe-commits
arsenm wrote: > @arsenm You're right about passing larger things indirectly. I'm intending to > land this as-is, with the types inlined, as that unblocks #93362. I'm nervous > that the extra pointer indirection will hit the same memory error that > tweaking codegen in that patch hits (it's a s

[clang] [amdgpu] Pass variadic arguments without splitting (PR #94083)

2024-06-06 Thread Matt Arsenault via cfe-commits
@@ -0,0 +1,293 @@ +// REQUIRES: amdgpu-registered-target +// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --function-signature +// RUN: %clang_cc1 -cc1 -std=c23 -triple amdgcn-amd-amdhsa -emit-llvm -O1 %s -o - | FileCheck %s + +void sink_0

[clang] [Clang][AMDGPU] Add builtins for instrinsic `llvm.amdgcn.raw.buffer.store` (PR #94576)

2024-06-06 Thread Matt Arsenault via cfe-commits
@@ -0,0 +1,264 @@ +// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py +// REQUIRES: amdgpu-registered-target +// RUN: %clang_cc1 -triple amdgcn-unknown-unknown -target-cpu verde -emit-llvm -o - %s | FileCheck %s --check-prefixes=VERDE +// RUN: %clang_cc

[clang] [Clang][AMDGPU] Use `I` to decorate imm argument for `__builtin_amdgcn_global_load_lds` (PR #94376)

2024-06-06 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm approved this pull request. https://github.com/llvm/llvm-project/pull/94376 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-06-06 Thread Matt Arsenault via cfe-commits
@@ -0,0 +1,19 @@ +; RUN: not --crash llc -stop-after=amdgpu-isel -mtriple=amdgcn-- -mcpu=gfx900 -verify-machineinstrs -o - %s 2>&1 | FileCheck %s arsenm wrote: This is not an IR verifier test, it is a codegen test that fails the machine verifier. A machine veri

[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-06-06 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm edited https://github.com/llvm/llvm-project/pull/89217 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-06-06 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm requested changes to this pull request. @jayfoad's testcase fails and the same test should be repeated for all 3 intrinsics https://github.com/llvm/llvm-project/pull/89217 ___ cfe-commits mailing list cfe-commits@lists.llvm.

[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-06-06 Thread Matt Arsenault via cfe-commits
@@ -0,0 +1,19 @@ +; RUN: not --crash llc -stop-after=amdgpu-isel -mtriple=amdgcn-- -mcpu=gfx900 -verify-machineinstrs -o - %s 2>&1 | FileCheck %s arsenm wrote: This should also be repeated for all 3 intrinsics https://github.com/llvm/llvm-project/pull/89217 __

[clang] [Clang][AMDGPU] Add builtins for instrinsic `llvm.amdgcn.raw.buffer.store` (PR #94576)

2024-06-06 Thread Matt Arsenault via cfe-commits
arsenm wrote: > If we do want addrspace(7), we'll need to expose `make.buffer.rsrc` and give > it a `p7` variant probably. Yes. We probably should expose some kind of custom type instead of directly using a C address_space(7) attribute https://github.com/llvm/llvm-project/pull/94576 ___

[clang] [llvm] [clang][CodeGen] `used` globals && the payloads for global ctors & dtors are globals (PR #93601)

2024-06-06 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm edited https://github.com/llvm/llvm-project/pull/93601 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [clang][CodeGen] `used` globals && the payloads for global ctors & dtors are globals (PR #93601)

2024-06-06 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm approved this pull request. lgtm with nit https://github.com/llvm/llvm-project/pull/93601 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

<    4   5   6   7   8   9   10   11   12   13   >