[clang] 39dac1f - [clang] Add clang builtins support for gfx90a
Author: Anshil Gandhi
Date: 2021-08-05T02:08:06-06:00
New Revision: 39dac1f7f65691487dbdc969e343108db5b0f765

URL: https://github.com/llvm/llvm-project/commit/39dac1f7f65691487dbdc969e343108db5b0f765
DIFF: https://github.com/llvm/llvm-project/commit/39dac1f7f65691487dbdc969e343108db5b0f765.diff

LOG: [clang] Add clang builtins support for gfx90a

Implement target builtins for gfx90a including fadd64, fadd32, add2h,
max and min on various global, flat and ds address spaces for which
intrinsics are implemented.

Differential Revision: https://reviews.llvm.org/D106909

Added:
    clang/test/CodeGenOpenCL/builtins-amdgcn-fp-atomics-gfx7-err.cl
    clang/test/CodeGenOpenCL/builtins-amdgcn-fp-atomics-gfx908-err.cl
    clang/test/CodeGenOpenCL/builtins-fp-atomics-gfx1030.cl
    clang/test/CodeGenOpenCL/builtins-fp-atomics-gfx8.cl
    clang/test/CodeGenOpenCL/builtins-fp-atomics-gfx90a.cl

Modified:
    clang/include/clang/Basic/BuiltinsAMDGPU.def
    clang/lib/CodeGen/CGBuiltin.cpp

Removed:


diff --git a/clang/include/clang/Basic/BuiltinsAMDGPU.def b/clang/include/clang/Basic/BuiltinsAMDGPU.def
index 3570431d952cb..2e1d3c7ccbff9 100644
--- a/clang/include/clang/Basic/BuiltinsAMDGPU.def
+++ b/clang/include/clang/Basic/BuiltinsAMDGPU.def
@@ -196,6 +196,19 @@ TARGET_BUILTIN(__builtin_amdgcn_perm, "UiUiUiUi", "nc", "gfx8-insts")
 
 TARGET_BUILTIN(__builtin_amdgcn_fmed3h, "", "nc", "gfx9-insts")
 
+TARGET_BUILTIN(__builtin_amdgcn_global_atomic_fadd_f64, "dd*1d", "t", "gfx90a-insts")
+TARGET_BUILTIN(__builtin_amdgcn_global_atomic_fadd_f32, "ff*1f", "t", "gfx90a-insts")
+TARGET_BUILTIN(__builtin_amdgcn_global_atomic_fadd_v2f16, "V2hV2h*1V2h", "t", "gfx90a-insts")
+TARGET_BUILTIN(__builtin_amdgcn_global_atomic_fmin_f64, "dd*1d", "t", "gfx90a-insts")
+TARGET_BUILTIN(__builtin_amdgcn_global_atomic_fmax_f64, "dd*1d", "t", "gfx90a-insts")
+
+TARGET_BUILTIN(__builtin_amdgcn_flat_atomic_fadd_f64, "dd*0d", "t", "gfx90a-insts")
+TARGET_BUILTIN(__builtin_amdgcn_flat_atomic_fmin_f64, "dd*0d", "t", "gfx90a-insts")
+TARGET_BUILTIN(__builtin_amdgcn_flat_atomic_fmax_f64, "dd*0d", "t", "gfx90a-insts")
+
+TARGET_BUILTIN(__builtin_amdgcn_ds_atomic_fadd_f64, "dd*3d", "t", "gfx90a-insts")
+TARGET_BUILTIN(__builtin_amdgcn_ds_atomic_fadd_f32, "ff*3f", "t", "gfx8-insts")
+
 //===--===//
 // Deep learning builtins.
 //===--===//

diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp
index b316a865f2fc7..606689385199a 100644
--- a/clang/lib/CodeGen/CGBuiltin.cpp
+++ b/clang/lib/CodeGen/CGBuiltin.cpp
@@ -16197,6 +16197,74 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID,
     Src0 = Builder.CreatePointerBitCastOrAddrSpaceCast(Src0, PTy);
     return Builder.CreateCall(F, { Src0, Src1, Src2, Src3, Src4 });
   }
+  case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_f64:
+  case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_f32:
+  case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_v2f16:
+  case AMDGPU::BI__builtin_amdgcn_global_atomic_fmin_f64:
+  case AMDGPU::BI__builtin_amdgcn_global_atomic_fmax_f64:
+  case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_f64:
+  case AMDGPU::BI__builtin_amdgcn_flat_atomic_fmin_f64:
+  case AMDGPU::BI__builtin_amdgcn_flat_atomic_fmax_f64: {
+    Intrinsic::ID IID;
+    llvm::Type *ArgTy = llvm::Type::getDoubleTy(getLLVMContext());
+    switch (BuiltinID) {
+    case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_f32:
+      ArgTy = llvm::Type::getFloatTy(getLLVMContext());
+      IID = Intrinsic::amdgcn_global_atomic_fadd;
+      break;
+    case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_v2f16:
+      ArgTy = llvm::FixedVectorType::get(
+          llvm::Type::getHalfTy(getLLVMContext()), 2);
+      IID = Intrinsic::amdgcn_global_atomic_fadd;
+      break;
+    case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_f64:
+      IID = Intrinsic::amdgcn_global_atomic_fadd;
+      break;
+    case AMDGPU::BI__builtin_amdgcn_global_atomic_fmin_f64:
+      IID = Intrinsic::amdgcn_global_atomic_fmin;
+      break;
+    case AMDGPU::BI__builtin_amdgcn_global_atomic_fmax_f64:
+      IID = Intrinsic::amdgcn_global_atomic_fmax;
+      break;
+    case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_f64:
+      IID = Intrinsic::amdgcn_flat_atomic_fadd;
+      break;
+    case AMDGPU::BI__builtin_amdgcn_flat_atomic_fmin_f64:
+      IID = Intrinsic::amdgcn_flat_atomic_fmin;
+      break;
+    case AMDGPU::BI__builtin_amdgcn_flat_atomic_fmax_f64:
+      IID = Intrinsic::amdgcn_flat_atomic_fmax;
+      break;
+    }
+    llvm::Value *Addr = EmitScalarExpr(E->getArg(0));
+    llvm::Value *Val = EmitScalarExpr(E->getArg(1));
+    llvm::Function *F =
+        CGM.getIntrinsic(IID, {ArgTy, Addr->getType(), Val->getType()});
+    return Builder.CreateCa
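
The prototype strings in the TARGET_BUILTIN entries above follow Clang's builtin type encoding: the first type is the result, `*` marks a pointer, and an optional number after `*` names the target address space. As a reading aid only, the sketch below decodes just the subset that appears in this patch (`d`, `f`, `h`, `V<N>`, `*<AS>`). The function `decodePrototype` and the way it spells its output are hypothetical and not part of Clang.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not a Clang API): decodes the small subset of builtin
// prototype strings used in this patch. Element 0 of the result is the
// return type; the remaining elements are the parameter types.
std::vector<std::string> decodePrototype(const std::string &proto) {
  std::vector<std::string> types;
  std::size_t i = 0;
  while (i < proto.size()) {
    // Optional vector prefix: 'V', then the element count, then the base type.
    std::string lanes;
    if (proto[i] == 'V') {
      ++i;
      while (i < proto.size() &&
             std::isdigit(static_cast<unsigned char>(proto[i])))
        lanes += proto[i++];
    }
    if (i >= proto.size())
      break;

    // Base type letter.
    std::string type;
    switch (proto[i++]) {
    case 'd': type = "double"; break;
    case 'f': type = "float"; break;
    case 'h': type = "half"; break;
    default:  type = "?"; break; // anything outside the handled subset
    }
    if (!lanes.empty())
      type = "<" + lanes + " x " + type + ">";

    // Optional pointer suffix: '*' followed by an address-space number.
    if (i < proto.size() && proto[i] == '*') {
      ++i;
      std::string space;
      while (i < proto.size() &&
             std::isdigit(static_cast<unsigned char>(proto[i])))
        space += proto[i++];
      type += space.empty() ? std::string("*")
                            : " addrspace(" + space + ")*";
    }
    types.push_back(type);
  }
  return types;
}
```

Read this way, `"dd*1d"` describes a builtin that returns a double and takes a pointer to double in address space 1 plus a double operand, which matches the global-memory variants; the `*0` and `*3` forms correspond to the flat and ds variants.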
[clang] a350089 - [HIP] Allow target addr space in target builtins
Author: Anshil Gandhi
Date: 2021-08-09T16:38:04-06:00
New Revision: a35008955fa606487f79a050f5cc80fc7ee84dda

URL: https://github.com/llvm/llvm-project/commit/a35008955fa606487f79a050f5cc80fc7ee84dda
DIFF: https://github.com/llvm/llvm-project/commit/a35008955fa606487f79a050f5cc80fc7ee84dda.diff

LOG: [HIP] Allow target addr space in target builtins

This patch allows target specific addr space in target builtins for HIP.
It inserts implicit addr space cast for non-generic pointer to generic
pointer in general, and inserts implicit addr space cast for generic to
non-generic for target builtin arguments only.

It is NFC for non-HIP languages.

Differential Revision: https://reviews.llvm.org/D102405

Added:


Modified:
    clang/include/clang/AST/Type.h
    clang/lib/Basic/Targets/AMDGPU.h
    clang/lib/Sema/SemaExpr.cpp
    clang/test/CodeGenCUDA/builtins-amdgcn.cu

Removed:


diff --git a/clang/include/clang/AST/Type.h b/clang/include/clang/AST/Type.h
index 9f46d53378976..4238667b8b076 100644
--- a/clang/include/clang/AST/Type.h
+++ b/clang/include/clang/AST/Type.h
@@ -495,7 +495,12 @@ class Qualifiers {
            (A == LangAS::Default &&
             (B == LangAS::sycl_private || B == LangAS::sycl_local ||
              B == LangAS::sycl_global || B == LangAS::sycl_global_device ||
-             B == LangAS::sycl_global_host));
+             B == LangAS::sycl_global_host)) ||
+           // In HIP device compilation, any cuda address space is allowed
+           // to implicitly cast into the default address space.
+           (A == LangAS::Default &&
+            (B == LangAS::cuda_constant || B == LangAS::cuda_device ||
+             B == LangAS::cuda_shared));
   }
 
   /// Returns true if the address space in these qualifiers is equal to or

diff --git a/clang/lib/Basic/Targets/AMDGPU.h b/clang/lib/Basic/Targets/AMDGPU.h
index 2e580ecf24259..f8772cbe244f0 100644
--- a/clang/lib/Basic/Targets/AMDGPU.h
+++ b/clang/lib/Basic/Targets/AMDGPU.h
@@ -352,7 +352,16 @@ class LLVM_LIBRARY_VISIBILITY AMDGPUTargetInfo final : public TargetInfo {
   }
 
   LangAS getCUDABuiltinAddressSpace(unsigned AS) const override {
-    return LangAS::Default;
+    switch (AS) {
+    case 1:
+      return LangAS::cuda_device;
+    case 3:
+      return LangAS::cuda_shared;
+    case 4:
+      return LangAS::cuda_constant;
+    default:
+      return getLangASFromTargetAS(AS);
+    }
   }
 
   llvm::Optional getConstantAddressSpace() const override {

diff --git a/clang/lib/Sema/SemaExpr.cpp b/clang/lib/Sema/SemaExpr.cpp
index 8ef4a9d96320b..5bde87d02877e 100644
--- a/clang/lib/Sema/SemaExpr.cpp
+++ b/clang/lib/Sema/SemaExpr.cpp
@@ -6572,6 +6572,53 @@ ExprResult Sema::BuildCallExpr(Scope *Scope, Expr *Fn, SourceLocation LParenLoc,
       return ExprError();
 
     checkDirectCallValidity(*this, Fn, FD, ArgExprs);
+
+    // If this expression is a call to a builtin function in HIP device
+    // compilation, allow a pointer-type argument to default address space to be
+    // passed as a pointer-type parameter to a non-default address space.
+    // If Arg is declared in the default address space and Param is declared
+    // in a non-default address space, perform an implicit address space cast to
+    // the parameter type.
+    if (getLangOpts().HIP && getLangOpts().CUDAIsDevice && FD &&
+        FD->getBuiltinID()) {
+      for (unsigned Idx = 0; Idx < FD->param_size(); ++Idx) {
+        ParmVarDecl *Param = FD->getParamDecl(Idx);
+        if (!ArgExprs[Idx] || !Param || !Param->getType()->isPointerType() ||
+            !ArgExprs[Idx]->getType()->isPointerType())
+          continue;
+
+        auto ParamAS = Param->getType()->getPointeeType().getAddressSpace();
+        auto ArgTy = ArgExprs[Idx]->getType();
+        auto ArgPtTy = ArgTy->getPointeeType();
+        auto ArgAS = ArgPtTy.getAddressSpace();
+
+        // Only allow implicit casting from a non-default address space pointee
+        // type to a default address space pointee type
+        if (ArgAS != LangAS::Default || ParamAS == LangAS::Default)
+          continue;
+
+        // First, ensure that the Arg is an RValue.
+        if (ArgExprs[Idx]->isGLValue()) {
+          ArgExprs[Idx] = ImplicitCastExpr::Create(
+              Context, ArgExprs[Idx]->getType(), CK_NoOp, ArgExprs[Idx],
+              nullptr, VK_PRValue, FPOptionsOverride());
+        }
+
+        // Construct a new arg type with address space of Param
+        Qualifiers ArgPtQuals = ArgPtTy.getQualifiers();
+        ArgPtQuals.setAddressSpace(ParamAS);
+        auto NewArgPtTy =
+            Context.getQualifiedType(ArgPtTy.getUnqualifiedType(), ArgPtQuals);
+        auto NewArgTy =
+            Context.getQualifiedType(Context.getPointerType(NewArgPtTy),
+                                     ArgTy.getQualifiers());
+
+        // Finally perform an implicit address space cast
+
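
To make the two decisions in the patch above easier to follow, here is a minimal standalone model of them, assuming nothing beyond what the hunks show. The `AddrSpace` enum and both function names are hypothetical stand-ins for `clang::LangAS` and the real Sema/TargetInfo code; in particular, `OtherTarget` only marks the place where the patch defers to `getLangASFromTargetAS`.

```cpp
// Hypothetical stand-ins for the clang::LangAS values named in the patch.
enum class AddrSpace { Default, CudaDevice, CudaShared, CudaConstant, OtherTarget };

// Models the switch added to getCUDABuiltinAddressSpace: target address
// spaces 1, 3 and 4 map to the CUDA device, shared and constant spaces.
inline AddrSpace builtinAddressSpace(unsigned targetAS) {
  switch (targetAS) {
  case 1:
    return AddrSpace::CudaDevice;
  case 3:
    return AddrSpace::CudaShared;
  case 4:
    return AddrSpace::CudaConstant;
  default:
    // The patch calls getLangASFromTargetAS(AS) here; this model just
    // reports "some other target address space".
    return AddrSpace::OtherTarget;
  }
}

// Models the guard in the BuildCallExpr hunk: the loop skips an argument
// unless the argument pointee is in the default address space and the
// builtin parameter pointee is not.
inline bool needsImplicitCast(AddrSpace argAS, AddrSpace paramAS) {
  return argAS == AddrSpace::Default && paramAS != AddrSpace::Default;
}
```

Under this model, a generic pointer passed to a builtin whose prototype names address space 1 gets an implicit cast, while a pointer that is already in a non-default space is left alone.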
[clang] c4e5425 - [Remarks] Emit optimization remarks for atomics generating CAS loop
Author: Anshil Gandhi Date: 2021-08-13T22:44:08-06:00 New Revision: c4e5425aa579d21530ef1766d7144b38a347f247 URL: https://github.com/llvm/llvm-project/commit/c4e5425aa579d21530ef1766d7144b38a347f247 DIFF: https://github.com/llvm/llvm-project/commit/c4e5425aa579d21530ef1766d7144b38a347f247.diff LOG: [Remarks] Emit optimization remarks for atomics generating CAS loop Implements ORE in AtomicExpandPass to report atomics generating a compare and swap loop. Differential Revision: https://reviews.llvm.org/D106891 Added: clang/test/CodeGenCUDA/atomics-remarks-gfx90a.cu clang/test/CodeGenOpenCL/atomics-remarks-gfx90a.cl llvm/test/CodeGen/AMDGPU/atomics-remarks-gfx90a.ll Modified: llvm/lib/CodeGen/AtomicExpandPass.cpp llvm/test/CodeGen/AMDGPU/llc-pipeline.ll llvm/test/CodeGen/X86/O0-pipeline.ll llvm/test/CodeGen/X86/opt-pipeline.ll Removed: diff --git a/clang/test/CodeGenCUDA/atomics-remarks-gfx90a.cu b/clang/test/CodeGenCUDA/atomics-remarks-gfx90a.cu new file mode 100644 index 0..96892286fd75e --- /dev/null +++ b/clang/test/CodeGenCUDA/atomics-remarks-gfx90a.cu @@ -0,0 +1,16 @@ +// RUN: %clang_cc1 %s -triple=amdgcn-amd-amdhsa -fcuda-is-device \ +// RUN: -target-cpu gfx90a -Rpass=atomic-expand -S -o - 2>&1 | \ +// RUN: FileCheck %s --check-prefix=GFX90A-CAS + +// REQUIRES: amdgpu-registered-target + +#include "Inputs/cuda.h" +#include + +// GFX90A-CAS: A compare and swap loop was generated for an atomic fadd operation at system memory scope +// GFX90A-CAS-LABEL: _Z14atomic_add_casPf +// GFX90A-CAS: flat_atomic_cmpswap v0, v[2:3], v[4:5] glc +// GFX90A-CAS: s_cbranch_execnz +__device__ float atomic_add_cas(float *p) { + return __atomic_fetch_add(p, 1.0f, memory_order_relaxed); +} diff --git a/clang/test/CodeGenOpenCL/atomics-remarks-gfx90a.cl b/clang/test/CodeGenOpenCL/atomics-remarks-gfx90a.cl new file mode 100644 index 0..2d8b68f83b9d6 --- /dev/null +++ b/clang/test/CodeGenOpenCL/atomics-remarks-gfx90a.cl @@ -0,0 +1,46 @@ +// RUN: %clang_cc1 %s -cl-std=CL2.0 -O0 
-triple=amdgcn-amd-amdhsa -target-cpu gfx90a \ +// RUN: -Rpass=atomic-expand -S -o - 2>&1 | \ +// RUN: FileCheck %s --check-prefix=REMARK + +// RUN: %clang_cc1 %s -cl-std=CL2.0 -O0 -triple=amdgcn-amd-amdhsa -target-cpu gfx90a \ +// RUN: -Rpass=atomic-expand -S -emit-llvm -o - 2>&1 | \ +// RUN: FileCheck %s --check-prefix=GFX90A-CAS + +// REQUIRES: amdgpu-registered-target + +typedef enum memory_order { + memory_order_relaxed = __ATOMIC_RELAXED, + memory_order_acquire = __ATOMIC_ACQUIRE, + memory_order_release = __ATOMIC_RELEASE, + memory_order_acq_rel = __ATOMIC_ACQ_REL, + memory_order_seq_cst = __ATOMIC_SEQ_CST +} memory_order; + +typedef enum memory_scope { + memory_scope_work_item = __OPENCL_MEMORY_SCOPE_WORK_ITEM, + memory_scope_work_group = __OPENCL_MEMORY_SCOPE_WORK_GROUP, + memory_scope_device = __OPENCL_MEMORY_SCOPE_DEVICE, + memory_scope_all_svm_devices = __OPENCL_MEMORY_SCOPE_ALL_SVM_DEVICES, +#if defined(cl_intel_subgroups) || defined(cl_khr_subgroups) + memory_scope_sub_group = __OPENCL_MEMORY_SCOPE_SUB_GROUP +#endif +} memory_scope; + +// REMARK: remark: A compare and swap loop was generated for an atomic fadd operation at workgroup-one-as memory scope [-Rpass=atomic-expand] +// REMARK: remark: A compare and swap loop was generated for an atomic fadd operation at agent-one-as memory scope [-Rpass=atomic-expand] +// REMARK: remark: A compare and swap loop was generated for an atomic fadd operation at one-as memory scope [-Rpass=atomic-expand] +// REMARK: remark: A compare and swap loop was generated for an atomic fadd operation at wavefront-one-as memory scope [-Rpass=atomic-expand] +// GFX90A-CAS-LABEL: @atomic_cas +// GFX90A-CAS: atomicrmw fadd float addrspace(1)* {{.*}} syncscope("workgroup-one-as") monotonic +// GFX90A-CAS: atomicrmw fadd float addrspace(1)* {{.*}} syncscope("agent-one-as") monotonic +// GFX90A-CAS: atomicrmw fadd float addrspace(1)* {{.*}} syncscope("one-as") monotonic +// GFX90A-CAS: atomicrmw fadd float addrspace(1)* {{.*}} 
syncscope("wavefront-one-as") monotonic +float atomic_cas(__global atomic_float *d, float a) { + float ret1 = __opencl_atomic_fetch_add(d, a, memory_order_relaxed, memory_scope_work_group); + float ret2 = __opencl_atomic_fetch_add(d, a, memory_order_relaxed, memory_scope_device); + float ret3 = __opencl_atomic_fetch_add(d, a, memory_order_relaxed, memory_scope_all_svm_devices); + float ret4 = __opencl_atomic_fetch_add(d, a, memory_order_relaxed, memory_scope_sub_group); +} + + + diff --git a/llvm/lib/CodeGen/AtomicExpandPass.cpp b/llvm/lib/CodeGen/AtomicExpandPass.cpp index 125a3be585cb5..5b5458e1058e8 100644 --- a/llvm/lib/CodeGen/AtomicExpandPass.cpp +++ b/llvm/lib/CodeGen/AtomicExpandPass.cpp @@ -17,6 +17,7 @@ #include "llvm/ADT/ArrayRef.h" #include "llvm/ADT/STLExtras.h" #include "llv
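
For readers unfamiliar with the term used in the remark text, a compare-and-swap loop for a floating-point add can be sketched on the host with `std::atomic<float>`. This is only an analogue of the idea, not the IR or ISA that AtomicExpandPass produces for AMDGPU; the function name `fetchAddViaCas` is hypothetical.

```cpp
#include <atomic>

// Hypothetical host-side analogue: emulates an atomic fetch-add on a float
// with a compare-and-swap retry loop and returns the previous value.
float fetchAddViaCas(std::atomic<float> &target, float operand) {
  float expected = target.load(std::memory_order_relaxed);
  // On failure compare_exchange_weak stores the current value into
  // `expected`, so each retry recomputes the sum from fresh data.
  while (!target.compare_exchange_weak(expected, expected + operand,
                                       std::memory_order_relaxed,
                                       std::memory_order_relaxed)) {
  }
  return expected;
}
```

The retry loop is what makes this form slower under contention than a single hardware atomic add, which is presumably why a remark pointing it out is useful.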
[clang] 29e11a1 - Revert "[Remarks] Emit optimization remarks for atomics generating CAS loop"
Author: Anshil Gandhi Date: 2021-08-13T23:58:04-06:00 New Revision: 29e11a1aa303cf81b81fdbab74fad4f31e5018d3 URL: https://github.com/llvm/llvm-project/commit/29e11a1aa303cf81b81fdbab74fad4f31e5018d3 DIFF: https://github.com/llvm/llvm-project/commit/29e11a1aa303cf81b81fdbab74fad4f31e5018d3.diff LOG: Revert "[Remarks] Emit optimization remarks for atomics generating CAS loop" This reverts commit c4e5425aa579d21530ef1766d7144b38a347f247. Added: Modified: llvm/lib/CodeGen/AtomicExpandPass.cpp llvm/test/CodeGen/AMDGPU/llc-pipeline.ll llvm/test/CodeGen/X86/O0-pipeline.ll llvm/test/CodeGen/X86/opt-pipeline.ll Removed: clang/test/CodeGenCUDA/atomics-remarks-gfx90a.cu clang/test/CodeGenOpenCL/atomics-remarks-gfx90a.cl llvm/test/CodeGen/AMDGPU/atomics-remarks-gfx90a.ll diff --git a/clang/test/CodeGenCUDA/atomics-remarks-gfx90a.cu b/clang/test/CodeGenCUDA/atomics-remarks-gfx90a.cu deleted file mode 100644 index 96892286fd75e..0 --- a/clang/test/CodeGenCUDA/atomics-remarks-gfx90a.cu +++ /dev/null @@ -1,16 +0,0 @@ -// RUN: %clang_cc1 %s -triple=amdgcn-amd-amdhsa -fcuda-is-device \ -// RUN: -target-cpu gfx90a -Rpass=atomic-expand -S -o - 2>&1 | \ -// RUN: FileCheck %s --check-prefix=GFX90A-CAS - -// REQUIRES: amdgpu-registered-target - -#include "Inputs/cuda.h" -#include - -// GFX90A-CAS: A compare and swap loop was generated for an atomic fadd operation at system memory scope -// GFX90A-CAS-LABEL: _Z14atomic_add_casPf -// GFX90A-CAS: flat_atomic_cmpswap v0, v[2:3], v[4:5] glc -// GFX90A-CAS: s_cbranch_execnz -__device__ float atomic_add_cas(float *p) { - return __atomic_fetch_add(p, 1.0f, memory_order_relaxed); -} diff --git a/clang/test/CodeGenOpenCL/atomics-remarks-gfx90a.cl b/clang/test/CodeGenOpenCL/atomics-remarks-gfx90a.cl deleted file mode 100644 index 2d8b68f83b9d6..0 --- a/clang/test/CodeGenOpenCL/atomics-remarks-gfx90a.cl +++ /dev/null @@ -1,46 +0,0 @@ -// RUN: %clang_cc1 %s -cl-std=CL2.0 -O0 -triple=amdgcn-amd-amdhsa -target-cpu gfx90a \ -// RUN: -Rpass=atomic-expand 
-S -o - 2>&1 | \ -// RUN: FileCheck %s --check-prefix=REMARK - -// RUN: %clang_cc1 %s -cl-std=CL2.0 -O0 -triple=amdgcn-amd-amdhsa -target-cpu gfx90a \ -// RUN: -Rpass=atomic-expand -S -emit-llvm -o - 2>&1 | \ -// RUN: FileCheck %s --check-prefix=GFX90A-CAS - -// REQUIRES: amdgpu-registered-target - -typedef enum memory_order { - memory_order_relaxed = __ATOMIC_RELAXED, - memory_order_acquire = __ATOMIC_ACQUIRE, - memory_order_release = __ATOMIC_RELEASE, - memory_order_acq_rel = __ATOMIC_ACQ_REL, - memory_order_seq_cst = __ATOMIC_SEQ_CST -} memory_order; - -typedef enum memory_scope { - memory_scope_work_item = __OPENCL_MEMORY_SCOPE_WORK_ITEM, - memory_scope_work_group = __OPENCL_MEMORY_SCOPE_WORK_GROUP, - memory_scope_device = __OPENCL_MEMORY_SCOPE_DEVICE, - memory_scope_all_svm_devices = __OPENCL_MEMORY_SCOPE_ALL_SVM_DEVICES, -#if defined(cl_intel_subgroups) || defined(cl_khr_subgroups) - memory_scope_sub_group = __OPENCL_MEMORY_SCOPE_SUB_GROUP -#endif -} memory_scope; - -// REMARK: remark: A compare and swap loop was generated for an atomic fadd operation at workgroup-one-as memory scope [-Rpass=atomic-expand] -// REMARK: remark: A compare and swap loop was generated for an atomic fadd operation at agent-one-as memory scope [-Rpass=atomic-expand] -// REMARK: remark: A compare and swap loop was generated for an atomic fadd operation at one-as memory scope [-Rpass=atomic-expand] -// REMARK: remark: A compare and swap loop was generated for an atomic fadd operation at wavefront-one-as memory scope [-Rpass=atomic-expand] -// GFX90A-CAS-LABEL: @atomic_cas -// GFX90A-CAS: atomicrmw fadd float addrspace(1)* {{.*}} syncscope("workgroup-one-as") monotonic -// GFX90A-CAS: atomicrmw fadd float addrspace(1)* {{.*}} syncscope("agent-one-as") monotonic -// GFX90A-CAS: atomicrmw fadd float addrspace(1)* {{.*}} syncscope("one-as") monotonic -// GFX90A-CAS: atomicrmw fadd float addrspace(1)* {{.*}} syncscope("wavefront-one-as") monotonic -float atomic_cas(__global atomic_float 
*d, float a) { - float ret1 = __opencl_atomic_fetch_add(d, a, memory_order_relaxed, memory_scope_work_group); - float ret2 = __opencl_atomic_fetch_add(d, a, memory_order_relaxed, memory_scope_device); - float ret3 = __opencl_atomic_fetch_add(d, a, memory_order_relaxed, memory_scope_all_svm_devices); - float ret4 = __opencl_atomic_fetch_add(d, a, memory_order_relaxed, memory_scope_sub_group); -} - - - diff --git a/llvm/lib/CodeGen/AtomicExpandPass.cpp b/llvm/lib/CodeGen/AtomicExpandPass.cpp index 5b5458e1058e8..125a3be585cb5 100644 --- a/llvm/lib/CodeGen/AtomicExpandPass.cpp +++ b/llvm/lib/CodeGen/AtomicExpandPass.cpp @@ -17,7 +17,6 @@ #include "llvm/ADT/ArrayRef.h" #include "llvm/ADT/STLExtras.h" #include "llvm/ADT/SmallVector.h" -#include "llvm/Analysis/OptimizationRemarkEm
[clang] 4357852 - [Remarks] Emit optimization remarks for atomics generating CAS loop
Author: Anshil Gandhi Date: 2021-08-14T23:37:23-06:00 New Revision: 435785214f73ff0c92e97f2ade6356e3ba3bf661 URL: https://github.com/llvm/llvm-project/commit/435785214f73ff0c92e97f2ade6356e3ba3bf661 DIFF: https://github.com/llvm/llvm-project/commit/435785214f73ff0c92e97f2ade6356e3ba3bf661.diff LOG: [Remarks] Emit optimization remarks for atomics generating CAS loop Implements ORE in AtomicExpand pass to report atomics generating a compare and swap loop. Differential Revision: https://reviews.llvm.org/D106891 Added: clang/test/CodeGenCUDA/atomics-remarks-gfx90a.cu clang/test/CodeGenOpenCL/atomics-remarks-gfx90a.cl llvm/test/CodeGen/AMDGPU/atomics-remarks-gfx90a.ll Modified: llvm/lib/CodeGen/AtomicExpandPass.cpp llvm/test/CodeGen/AArch64/O0-pipeline.ll llvm/test/CodeGen/AArch64/O3-pipeline.ll llvm/test/CodeGen/AMDGPU/llc-pipeline.ll llvm/test/CodeGen/ARM/O3-pipeline.ll llvm/test/CodeGen/PowerPC/O3-pipeline.ll llvm/test/CodeGen/X86/O0-pipeline.ll llvm/test/CodeGen/X86/opt-pipeline.ll Removed: diff --git a/clang/test/CodeGenCUDA/atomics-remarks-gfx90a.cu b/clang/test/CodeGenCUDA/atomics-remarks-gfx90a.cu new file mode 100644 index 0..96892286fd75e --- /dev/null +++ b/clang/test/CodeGenCUDA/atomics-remarks-gfx90a.cu @@ -0,0 +1,16 @@ +// RUN: %clang_cc1 %s -triple=amdgcn-amd-amdhsa -fcuda-is-device \ +// RUN: -target-cpu gfx90a -Rpass=atomic-expand -S -o - 2>&1 | \ +// RUN: FileCheck %s --check-prefix=GFX90A-CAS + +// REQUIRES: amdgpu-registered-target + +#include "Inputs/cuda.h" +#include + +// GFX90A-CAS: A compare and swap loop was generated for an atomic fadd operation at system memory scope +// GFX90A-CAS-LABEL: _Z14atomic_add_casPf +// GFX90A-CAS: flat_atomic_cmpswap v0, v[2:3], v[4:5] glc +// GFX90A-CAS: s_cbranch_execnz +__device__ float atomic_add_cas(float *p) { + return __atomic_fetch_add(p, 1.0f, memory_order_relaxed); +} diff --git a/clang/test/CodeGenOpenCL/atomics-remarks-gfx90a.cl b/clang/test/CodeGenOpenCL/atomics-remarks-gfx90a.cl new file mode 100644 
index 0..2d8b68f83b9d6 --- /dev/null +++ b/clang/test/CodeGenOpenCL/atomics-remarks-gfx90a.cl @@ -0,0 +1,46 @@ +// RUN: %clang_cc1 %s -cl-std=CL2.0 -O0 -triple=amdgcn-amd-amdhsa -target-cpu gfx90a \ +// RUN: -Rpass=atomic-expand -S -o - 2>&1 | \ +// RUN: FileCheck %s --check-prefix=REMARK + +// RUN: %clang_cc1 %s -cl-std=CL2.0 -O0 -triple=amdgcn-amd-amdhsa -target-cpu gfx90a \ +// RUN: -Rpass=atomic-expand -S -emit-llvm -o - 2>&1 | \ +// RUN: FileCheck %s --check-prefix=GFX90A-CAS + +// REQUIRES: amdgpu-registered-target + +typedef enum memory_order { + memory_order_relaxed = __ATOMIC_RELAXED, + memory_order_acquire = __ATOMIC_ACQUIRE, + memory_order_release = __ATOMIC_RELEASE, + memory_order_acq_rel = __ATOMIC_ACQ_REL, + memory_order_seq_cst = __ATOMIC_SEQ_CST +} memory_order; + +typedef enum memory_scope { + memory_scope_work_item = __OPENCL_MEMORY_SCOPE_WORK_ITEM, + memory_scope_work_group = __OPENCL_MEMORY_SCOPE_WORK_GROUP, + memory_scope_device = __OPENCL_MEMORY_SCOPE_DEVICE, + memory_scope_all_svm_devices = __OPENCL_MEMORY_SCOPE_ALL_SVM_DEVICES, +#if defined(cl_intel_subgroups) || defined(cl_khr_subgroups) + memory_scope_sub_group = __OPENCL_MEMORY_SCOPE_SUB_GROUP +#endif +} memory_scope; + +// REMARK: remark: A compare and swap loop was generated for an atomic fadd operation at workgroup-one-as memory scope [-Rpass=atomic-expand] +// REMARK: remark: A compare and swap loop was generated for an atomic fadd operation at agent-one-as memory scope [-Rpass=atomic-expand] +// REMARK: remark: A compare and swap loop was generated for an atomic fadd operation at one-as memory scope [-Rpass=atomic-expand] +// REMARK: remark: A compare and swap loop was generated for an atomic fadd operation at wavefront-one-as memory scope [-Rpass=atomic-expand] +// GFX90A-CAS-LABEL: @atomic_cas +// GFX90A-CAS: atomicrmw fadd float addrspace(1)* {{.*}} syncscope("workgroup-one-as") monotonic +// GFX90A-CAS: atomicrmw fadd float addrspace(1)* {{.*}} syncscope("agent-one-as") monotonic 
+// GFX90A-CAS: atomicrmw fadd float addrspace(1)* {{.*}} syncscope("one-as") monotonic +// GFX90A-CAS: atomicrmw fadd float addrspace(1)* {{.*}} syncscope("wavefront-one-as") monotonic +float atomic_cas(__global atomic_float *d, float a) { + float ret1 = __opencl_atomic_fetch_add(d, a, memory_order_relaxed, memory_scope_work_group); + float ret2 = __opencl_atomic_fetch_add(d, a, memory_order_relaxed, memory_scope_device); + float ret3 = __opencl_atomic_fetch_add(d, a, memory_order_relaxed, memory_scope_all_svm_devices); + float ret4 = __opencl_atomic_fetch_add(d, a, memory_order_relaxed, memory_scope_sub_group); +} + + + diff --git a/llvm/lib/CodeGen/AtomicExpandPass.cpp b/llvm/lib/CodeGen/AtomicExpandPass.cpp index 125a3be585cb5..5b5458e1058e8 100644 --- a/ll
[clang] f22ba51 - [Remarks] Emit optimization remarks for atomics generating CAS loop
Author: Anshil Gandhi Date: 2021-08-16T14:56:01-06:00 New Revision: f22ba51873509b93732015176b778465f40c6db5 URL: https://github.com/llvm/llvm-project/commit/f22ba51873509b93732015176b778465f40c6db5 DIFF: https://github.com/llvm/llvm-project/commit/f22ba51873509b93732015176b778465f40c6db5.diff LOG: [Remarks] Emit optimization remarks for atomics generating CAS loop Implements ORE in AtomicExpand pass to report atomics generating a compare and swap loop. Differential Revision: https://reviews.llvm.org/D106891 Added: clang/test/CodeGenCUDA/atomics-remarks-gfx90a.cu clang/test/CodeGenOpenCL/atomics-remarks-gfx90a.cl llvm/test/CodeGen/AMDGPU/atomics-remarks-gfx90a.ll Modified: llvm/lib/CodeGen/AtomicExpandPass.cpp Removed: diff --git a/clang/test/CodeGenCUDA/atomics-remarks-gfx90a.cu b/clang/test/CodeGenCUDA/atomics-remarks-gfx90a.cu new file mode 100644 index 0..96892286fd75e --- /dev/null +++ b/clang/test/CodeGenCUDA/atomics-remarks-gfx90a.cu @@ -0,0 +1,16 @@ +// RUN: %clang_cc1 %s -triple=amdgcn-amd-amdhsa -fcuda-is-device \ +// RUN: -target-cpu gfx90a -Rpass=atomic-expand -S -o - 2>&1 | \ +// RUN: FileCheck %s --check-prefix=GFX90A-CAS + +// REQUIRES: amdgpu-registered-target + +#include "Inputs/cuda.h" +#include + +// GFX90A-CAS: A compare and swap loop was generated for an atomic fadd operation at system memory scope +// GFX90A-CAS-LABEL: _Z14atomic_add_casPf +// GFX90A-CAS: flat_atomic_cmpswap v0, v[2:3], v[4:5] glc +// GFX90A-CAS: s_cbranch_execnz +__device__ float atomic_add_cas(float *p) { + return __atomic_fetch_add(p, 1.0f, memory_order_relaxed); +} diff --git a/clang/test/CodeGenOpenCL/atomics-remarks-gfx90a.cl b/clang/test/CodeGenOpenCL/atomics-remarks-gfx90a.cl new file mode 100644 index 0..127866e84e051 --- /dev/null +++ b/clang/test/CodeGenOpenCL/atomics-remarks-gfx90a.cl @@ -0,0 +1,43 @@ +// RUN: %clang_cc1 %s -cl-std=CL2.0 -O0 -triple=amdgcn-amd-amdhsa -target-cpu gfx90a \ +// RUN: -Rpass=atomic-expand -S -o - 2>&1 | \ +// RUN: FileCheck %s 
--check-prefix=REMARK + +// RUN: %clang_cc1 %s -cl-std=CL2.0 -O0 -triple=amdgcn-amd-amdhsa -target-cpu gfx90a \ +// RUN: -Rpass=atomic-expand -S -emit-llvm -o - 2>&1 | \ +// RUN: FileCheck %s --check-prefix=GFX90A-CAS + +// REQUIRES: amdgpu-registered-target + +typedef enum memory_order { + memory_order_relaxed = __ATOMIC_RELAXED, + memory_order_acquire = __ATOMIC_ACQUIRE, + memory_order_release = __ATOMIC_RELEASE, + memory_order_acq_rel = __ATOMIC_ACQ_REL, + memory_order_seq_cst = __ATOMIC_SEQ_CST +} memory_order; + +typedef enum memory_scope { + memory_scope_work_item = __OPENCL_MEMORY_SCOPE_WORK_ITEM, + memory_scope_work_group = __OPENCL_MEMORY_SCOPE_WORK_GROUP, + memory_scope_device = __OPENCL_MEMORY_SCOPE_DEVICE, + memory_scope_all_svm_devices = __OPENCL_MEMORY_SCOPE_ALL_SVM_DEVICES, +#if defined(cl_intel_subgroups) || defined(cl_khr_subgroups) + memory_scope_sub_group = __OPENCL_MEMORY_SCOPE_SUB_GROUP +#endif +} memory_scope; + +// REMARK: remark: A compare and swap loop was generated for an atomic fadd operation at workgroup-one-as memory scope [-Rpass=atomic-expand] +// REMARK: remark: A compare and swap loop was generated for an atomic fadd operation at agent-one-as memory scope [-Rpass=atomic-expand] +// REMARK: remark: A compare and swap loop was generated for an atomic fadd operation at one-as memory scope [-Rpass=atomic-expand] +// REMARK: remark: A compare and swap loop was generated for an atomic fadd operation at wavefront-one-as memory scope [-Rpass=atomic-expand] +// GFX90A-CAS-LABEL: @atomic_cas +// GFX90A-CAS: atomicrmw fadd float addrspace(1)* {{.*}} syncscope("workgroup-one-as") monotonic +// GFX90A-CAS: atomicrmw fadd float addrspace(1)* {{.*}} syncscope("agent-one-as") monotonic +// GFX90A-CAS: atomicrmw fadd float addrspace(1)* {{.*}} syncscope("one-as") monotonic +// GFX90A-CAS: atomicrmw fadd float addrspace(1)* {{.*}} syncscope("wavefront-one-as") monotonic +float atomic_cas(__global atomic_float *d, float a) { + float ret1 = 
__opencl_atomic_fetch_add(d, a, memory_order_relaxed, memory_scope_work_group); + float ret2 = __opencl_atomic_fetch_add(d, a, memory_order_relaxed, memory_scope_device); + float ret3 = __opencl_atomic_fetch_add(d, a, memory_order_relaxed, memory_scope_all_svm_devices); + float ret4 = __opencl_atomic_fetch_add(d, a, memory_order_relaxed, memory_scope_sub_group); +} diff --git a/llvm/lib/CodeGen/AtomicExpandPass.cpp b/llvm/lib/CodeGen/AtomicExpandPass.cpp index 125a3be585cb5..a27d43e43a855 100644 --- a/llvm/lib/CodeGen/AtomicExpandPass.cpp +++ b/llvm/lib/CodeGen/AtomicExpandPass.cpp @@ -17,6 +17,7 @@ #include "llvm/ADT/ArrayRef.h" #include "llvm/ADT/STLExtras.h" #include "llvm/ADT/SmallVector.h" +#include "llvm/Analysis/OptimizationRemarkEmitter.h" #include "llvm/CodeGen/AtomicExpandUtils.h" #include "ll
[clang] f5d5f17 - Revert "[HIP] Allow target addr space in target builtins"
Author: Anshil Gandhi Date: 2021-08-18T21:38:42-06:00 New Revision: f5d5f17d3ad455de2fbb9448acea66cbc09561c5 URL: https://github.com/llvm/llvm-project/commit/f5d5f17d3ad455de2fbb9448acea66cbc09561c5 DIFF: https://github.com/llvm/llvm-project/commit/f5d5f17d3ad455de2fbb9448acea66cbc09561c5.diff LOG: Revert "[HIP] Allow target addr space in target builtins" This reverts commit a35008955fa606487f79a050f5cc80fc7ee84dda. Added: Modified: clang/include/clang/AST/Type.h clang/lib/Basic/Targets/AMDGPU.h clang/lib/Sema/SemaExpr.cpp clang/test/CodeGenCUDA/builtins-amdgcn.cu Removed: diff --git a/clang/include/clang/AST/Type.h b/clang/include/clang/AST/Type.h index fc83c895afa2e..09e9705bd86b8 100644 --- a/clang/include/clang/AST/Type.h +++ b/clang/include/clang/AST/Type.h @@ -495,12 +495,7 @@ class Qualifiers { (A == LangAS::Default && (B == LangAS::sycl_private || B == LangAS::sycl_local || B == LangAS::sycl_global || B == LangAS::sycl_global_device || - B == LangAS::sycl_global_host)) || - // In HIP device compilation, any cuda address space is allowed - // to implicitly cast into the default address space. 
- (A == LangAS::Default && -(B == LangAS::cuda_constant || B == LangAS::cuda_device || - B == LangAS::cuda_shared)); + B == LangAS::sycl_global_host)); } /// Returns true if the address space in these qualifiers is equal to or diff --git a/clang/lib/Basic/Targets/AMDGPU.h b/clang/lib/Basic/Targets/AMDGPU.h index f8772cbe244f0..2e580ecf24259 100644 --- a/clang/lib/Basic/Targets/AMDGPU.h +++ b/clang/lib/Basic/Targets/AMDGPU.h @@ -352,16 +352,7 @@ class LLVM_LIBRARY_VISIBILITY AMDGPUTargetInfo final : public TargetInfo { } LangAS getCUDABuiltinAddressSpace(unsigned AS) const override { -switch (AS) { -case 1: - return LangAS::cuda_device; -case 3: - return LangAS::cuda_shared; -case 4: - return LangAS::cuda_constant; -default: - return getLangASFromTargetAS(AS); -} +return LangAS::Default; } llvm::Optional getConstantAddressSpace() const override { diff --git a/clang/lib/Sema/SemaExpr.cpp b/clang/lib/Sema/SemaExpr.cpp index 5bde87d02877e..8ef4a9d96320b 100644 --- a/clang/lib/Sema/SemaExpr.cpp +++ b/clang/lib/Sema/SemaExpr.cpp @@ -6572,53 +6572,6 @@ ExprResult Sema::BuildCallExpr(Scope *Scope, Expr *Fn, SourceLocation LParenLoc, return ExprError(); checkDirectCallValidity(*this, Fn, FD, ArgExprs); - -// If this expression is a call to a builtin function in HIP device -// compilation, allow a pointer-type argument to default address space to be -// passed as a pointer-type parameter to a non-default address space. -// If Arg is declared in the default address space and Param is declared -// in a non-default address space, perform an implicit address space cast to -// the parameter type. 
-if (getLangOpts().HIP && getLangOpts().CUDAIsDevice && FD && -FD->getBuiltinID()) { - for (unsigned Idx = 0; Idx < FD->param_size(); ++Idx) { -ParmVarDecl *Param = FD->getParamDecl(Idx); -if (!ArgExprs[Idx] || !Param || !Param->getType()->isPointerType() || -!ArgExprs[Idx]->getType()->isPointerType()) - continue; - -auto ParamAS = Param->getType()->getPointeeType().getAddressSpace(); -auto ArgTy = ArgExprs[Idx]->getType(); -auto ArgPtTy = ArgTy->getPointeeType(); -auto ArgAS = ArgPtTy.getAddressSpace(); - -// Only allow implicit casting from a non-default address space pointee -// type to a default address space pointee type -if (ArgAS != LangAS::Default || ParamAS == LangAS::Default) - continue; - -// First, ensure that the Arg is an RValue. -if (ArgExprs[Idx]->isGLValue()) { - ArgExprs[Idx] = ImplicitCastExpr::Create( - Context, ArgExprs[Idx]->getType(), CK_NoOp, ArgExprs[Idx], - nullptr, VK_PRValue, FPOptionsOverride()); -} - -// Construct a new arg type with address space of Param -Qualifiers ArgPtQuals = ArgPtTy.getQualifiers(); -ArgPtQuals.setAddressSpace(ParamAS); -auto NewArgPtTy = -Context.getQualifiedType(ArgPtTy.getUnqualifiedType(), ArgPtQuals); -auto NewArgTy = -Context.getQualifiedType(Context.getPointerType(NewArgPtTy), - ArgTy.getQualifiers()); - -// Finally perform an implicit address space cast -ArgExprs[Idx] = ImpCastExprToType(ArgExprs[Idx], NewArgTy, - CK_AddressSpaceConversion) -.get(); - } -} } if (Context.isDependenceAllowed() && diff --git a/clang/test/CodeGenCUDA/builtins-amdgcn.cu
[clang] df0560c - [HIP] Add atomic load, atomic store and atomic cmpxchng_weak builtin support in HIP-clang
Author: Anshil Gandhi Date: 2021-11-29T12:07:13-07:00 New Revision: df0560ca00182364e0a786d35adb294c3c98dbd0 URL: https://github.com/llvm/llvm-project/commit/df0560ca00182364e0a786d35adb294c3c98dbd0 DIFF: https://github.com/llvm/llvm-project/commit/df0560ca00182364e0a786d35adb294c3c98dbd0.diff LOG: [HIP] Add atomic load, atomic store and atomic cmpxchng_weak builtin support in HIP-clang Introduce `__hip_atomic_load`, `__hip_atomic_store` and `__hip_atomic_compare_exchange_weak` builtins in HIP. Reviewed By: yaxunl Differential Revision: https://reviews.llvm.org/D114553 Added: clang/test/SemaCUDA/atomic-ops.cu Modified: clang/include/clang/AST/Expr.h clang/include/clang/Basic/Builtins.def clang/lib/AST/Expr.cpp clang/lib/AST/StmtPrinter.cpp clang/lib/CodeGen/CGAtomic.cpp clang/lib/Sema/SemaChecking.cpp clang/test/CodeGenCUDA/atomic-ops.cu Removed: diff --git a/clang/include/clang/AST/Expr.h b/clang/include/clang/AST/Expr.h index 246585e1205fa..2c63406fba18d 100644 --- a/clang/include/clang/AST/Expr.h +++ b/clang/include/clang/AST/Expr.h @@ -6308,6 +6308,7 @@ class AtomicExpr : public Expr { getOp() == AO__hip_atomic_compare_exchange_strong || getOp() == AO__opencl_atomic_compare_exchange_strong || getOp() == AO__opencl_atomic_compare_exchange_weak || + getOp() == AO__hip_atomic_compare_exchange_weak || getOp() == AO__atomic_compare_exchange || getOp() == AO__atomic_compare_exchange_n; } @@ -6342,10 +6343,9 @@ class AtomicExpr : public Expr { auto Kind = (Op >= AO__opencl_atomic_load && Op <= AO__opencl_atomic_fetch_max) ? AtomicScopeModelKind::OpenCL -: (Op >= AO__hip_atomic_compare_exchange_strong && - Op <= AO__hip_atomic_fetch_max) - ? AtomicScopeModelKind::HIP - : AtomicScopeModelKind::None; +: (Op >= AO__hip_atomic_load && Op <= AO__hip_atomic_fetch_max) +? 
AtomicScopeModelKind::HIP +: AtomicScopeModelKind::None; return AtomicScopeModel::create(Kind); } diff --git a/clang/include/clang/Basic/Builtins.def b/clang/include/clang/Basic/Builtins.def index 1f7680e0d923c..ad8b66aa490be 100644 --- a/clang/include/clang/Basic/Builtins.def +++ b/clang/include/clang/Basic/Builtins.def @@ -855,8 +855,9 @@ ATOMIC_BUILTIN(__atomic_fetch_min, "v.", "t") ATOMIC_BUILTIN(__atomic_fetch_max, "v.", "t") // HIP atomic builtins. -// FIXME: Is `__hip_atomic_compare_exchange_n` or -// `__hip_atomic_compare_exchange_weak` needed? +ATOMIC_BUILTIN(__hip_atomic_load, "v.", "t") +ATOMIC_BUILTIN(__hip_atomic_store, "v.", "t") +ATOMIC_BUILTIN(__hip_atomic_compare_exchange_weak, "v.", "t") ATOMIC_BUILTIN(__hip_atomic_compare_exchange_strong, "v.", "t") ATOMIC_BUILTIN(__hip_atomic_exchange, "v.", "t") ATOMIC_BUILTIN(__hip_atomic_fetch_add, "v.", "t") diff --git a/clang/lib/AST/Expr.cpp b/clang/lib/AST/Expr.cpp index ce6e30697f856..d3cb2ff3734cb 100644 --- a/clang/lib/AST/Expr.cpp +++ b/clang/lib/AST/Expr.cpp @@ -4681,6 +4681,7 @@ unsigned AtomicExpr::getNumSubExprs(AtomicOp Op) { return 2; case AO__opencl_atomic_load: + case AO__hip_atomic_load: case AO__c11_atomic_store: case AO__c11_atomic_exchange: case AO__atomic_load: @@ -4721,6 +4722,7 @@ unsigned AtomicExpr::getNumSubExprs(AtomicOp Op) { case AO__hip_atomic_fetch_min: case AO__hip_atomic_fetch_max: case AO__opencl_atomic_store: + case AO__hip_atomic_store: case AO__opencl_atomic_exchange: case AO__opencl_atomic_fetch_add: case AO__opencl_atomic_fetch_sub: @@ -4738,6 +4740,7 @@ unsigned AtomicExpr::getNumSubExprs(AtomicOp Op) { case AO__hip_atomic_compare_exchange_strong: case AO__opencl_atomic_compare_exchange_strong: case AO__opencl_atomic_compare_exchange_weak: + case AO__hip_atomic_compare_exchange_weak: case AO__atomic_compare_exchange: case AO__atomic_compare_exchange_n: return 6; diff --git a/clang/lib/AST/StmtPrinter.cpp b/clang/lib/AST/StmtPrinter.cpp index fc267d7006a1b..b65a38d1e5665 
100644 --- a/clang/lib/AST/StmtPrinter.cpp +++ b/clang/lib/AST/StmtPrinter.cpp @@ -1691,7 +1691,8 @@ void StmtPrinter::VisitAtomicExpr(AtomicExpr *Node) { PrintExpr(Node->getPtr()); if (Node->getOp() != AtomicExpr::AO__c11_atomic_load && Node->getOp() != AtomicExpr::AO__atomic_load_n && - Node->getOp() != AtomicExpr::AO__opencl_atomic_load) { + Node->getOp() != AtomicExpr::AO__opencl_atomic_load && + Node->getOp() != AtomicExpr::AO__hip_atomic_load) { OS << ", "; PrintExpr(Node->getVal1()); } diff --git a/clang/lib/CodeGen/CGAtomic.cpp b/clang/lib/CodeGen/CGAtomic.cpp index 9b507b87213a3..b68e6328acdfd 100644 --- a/clang/lib/CodeGen/CGAtomic.cpp +++ b/clang/lib/CodeGen/CGAtomic.cpp @@ -531,6 +531,7 @@ static void EmitAtomicOp(CodeGenFunction
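Note (illustrative sketch, not part of the commit): the Expr.h hunk in this commit widens the range check that selects the HIP scope model, moving its lower bound from the compare_exchange_strong op to the new load op. The model below is a simplified guess at why that matters. The enum is a hypothetical subset with shortened names, and its ordering (OpenCL ops, then HIP ops starting with load, store, compare_exchange_weak) is an assumption based on the order of the ATOMIC_BUILTIN entries shown in the Builtins.def hunk.

```cpp
// Hypothetical, shortened subset of the atomic-op enum. The ordering is an
// assumption taken from the order of entries in the Builtins.def hunk.
enum AtomicOp {
  C11AtomicLoad,
  OpenCLAtomicLoad,
  OpenCLAtomicFetchMax,
  HipAtomicLoad,
  HipAtomicStore,
  HipAtomicCompareExchangeWeak,
  HipAtomicCompareExchangeStrong,
  HipAtomicFetchMax
};

enum class ScopeModelKind { None, OpenCL, HIP };

// Range check as it stood before the commit: the HIP range starts at
// compare_exchange_strong, so ops declared ahead of it are missed.
constexpr ScopeModelKind scopeModelBefore(AtomicOp Op) {
  return (Op >= OpenCLAtomicLoad && Op <= OpenCLAtomicFetchMax)
             ? ScopeModelKind::OpenCL
         : (Op >= HipAtomicCompareExchangeStrong && Op <= HipAtomicFetchMax)
             ? ScopeModelKind::HIP
             : ScopeModelKind::None;
}

// Range check after the commit: the HIP range starts at the load op.
constexpr ScopeModelKind scopeModelAfter(AtomicOp Op) {
  return (Op >= OpenCLAtomicLoad && Op <= OpenCLAtomicFetchMax)
             ? ScopeModelKind::OpenCL
         : (Op >= HipAtomicLoad && Op <= HipAtomicFetchMax)
             ? ScopeModelKind::HIP
             : ScopeModelKind::None;
}
```

Under these assumptions the three new ops would fall outside the old range and be treated as having no scope model, which is what the widened bound avoids.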
Getting started
Hi everyone,

My name is Anshil Gandhi and I am currently in the third year of a BSc, double majoring in Computing Science and Mathematics. I am interested in developing the clang frontend, in particular the implementation of C++ 1x features. I have cloned the LLVM git repository and explored various features of clang, but I am not sure how to familiarize myself with the project's organization. I would appreciate any pointers on how to get started.

Thanks in advance!

Kind regards,
Anshil Gandhi
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] a955a31 - [AMDGPU] Replace target feature for global fadd32
Author: Anshil Gandhi Date: 2023-03-28T15:58:30-06:00 New Revision: a955a31896370b67c6490251eca0095295d55f1f URL: https://github.com/llvm/llvm-project/commit/a955a31896370b67c6490251eca0095295d55f1f DIFF: https://github.com/llvm/llvm-project/commit/a955a31896370b67c6490251eca0095295d55f1f.diff LOG: [AMDGPU] Replace target feature for global fadd32 Change target feature of __builtin_amdgcn_global_atomic_fadd_f32 to atomic-fadd-rtn-insts. Enable atomic-fadd-rtn-insts for gfx90a, gfx940 and gfx1100 as they all support the return variant of `global_atomic_add_f32`. Fixes https://github.com/llvm/llvm-project/issues/61331. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D146840 Added: Modified: clang/include/clang/Basic/BuiltinsAMDGPU.def clang/lib/Basic/Targets/AMDGPU.cpp clang/test/CodeGenOpenCL/amdgpu-features.cl clang/test/CodeGenOpenCL/builtins-amdgcn-fp-atomics-gfx908-err.cl clang/test/CodeGenOpenCL/builtins-amdgcn-gfx11.cl clang/test/CodeGenOpenCL/builtins-fp-atomics-gfx90a.cl clang/test/CodeGenOpenCL/builtins-fp-atomics-gfx940.cl Removed: diff --git a/clang/include/clang/Basic/BuiltinsAMDGPU.def b/clang/include/clang/Basic/BuiltinsAMDGPU.def index 965bd97a97d79..0196100cccac5 100644 --- a/clang/include/clang/Basic/BuiltinsAMDGPU.def +++ b/clang/include/clang/Basic/BuiltinsAMDGPU.def @@ -214,7 +214,7 @@ TARGET_BUILTIN(__builtin_amdgcn_perm, "UiUiUiUi", "nc", "gfx8-insts") TARGET_BUILTIN(__builtin_amdgcn_fmed3h, "", "nc", "gfx9-insts") TARGET_BUILTIN(__builtin_amdgcn_global_atomic_fadd_f64, "dd*1d", "t", "gfx90a-insts") -TARGET_BUILTIN(__builtin_amdgcn_global_atomic_fadd_f32, "ff*1f", "t", "gfx90a-insts") +TARGET_BUILTIN(__builtin_amdgcn_global_atomic_fadd_f32, "ff*1f", "t", "atomic-fadd-rtn-insts") TARGET_BUILTIN(__builtin_amdgcn_global_atomic_fadd_v2f16, "V2hV2h*1V2h", "t", "atomic-buffer-global-pk-add-f16-insts") TARGET_BUILTIN(__builtin_amdgcn_global_atomic_fmin_f64, "dd*1d", "t", "gfx90a-insts") 
TARGET_BUILTIN(__builtin_amdgcn_global_atomic_fmax_f64, "dd*1d", "t", "gfx90a-insts") diff --git a/clang/lib/Basic/Targets/AMDGPU.cpp b/clang/lib/Basic/Targets/AMDGPU.cpp index 72dfb07804dff..9b3a0b0f40edb 100644 --- a/clang/lib/Basic/Targets/AMDGPU.cpp +++ b/clang/lib/Basic/Targets/AMDGPU.cpp @@ -206,6 +206,7 @@ bool AMDGPUTargetInfo::initFeatureMap( Features["gfx10-insts"] = true; Features["gfx10-3-insts"] = true; Features["gfx11-insts"] = true; + Features["atomic-fadd-rtn-insts"] = true; break; case GK_GFX1036: case GK_GFX1035: @@ -264,6 +265,7 @@ bool AMDGPUTargetInfo::initFeatureMap( case GK_GFX90A: Features["gfx90a-insts"] = true; Features["atomic-buffer-global-pk-add-f16-insts"] = true; + Features["atomic-fadd-rtn-insts"] = true; [[fallthrough]]; case GK_GFX908: Features["dot3-insts"] = true; diff --git a/clang/test/CodeGenOpenCL/amdgpu-features.cl b/clang/test/CodeGenOpenCL/amdgpu-features.cl index 4a4da6b270b9a..e000239cd03fe 100644 --- a/clang/test/CodeGenOpenCL/amdgpu-features.cl +++ b/clang/test/CodeGenOpenCL/amdgpu-features.cl @@ -72,9 +72,9 @@ // GFX906: "target-features"="+16-bit-insts,+ci-insts,+dl-insts,+dot1-insts,+dot10-insts,+dot2-insts,+dot7-insts,+dpp,+gfx8-insts,+gfx9-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize64" // GFX908: "target-features"="+16-bit-insts,+ci-insts,+dl-insts,+dot1-insts,+dot10-insts,+dot2-insts,+dot3-insts,+dot4-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+gfx8-insts,+gfx9-insts,+mai-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize64" // GFX909: "target-features"="+16-bit-insts,+ci-insts,+dpp,+gfx8-insts,+gfx9-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize64" -// GFX90A: "target-features"="+16-bit-insts,+atomic-buffer-global-pk-add-f16-insts,+ci-insts,+dl-insts,+dot1-insts,+dot10-insts,+dot2-insts,+dot3-insts,+dot4-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+gfx8-insts,+gfx9-insts,+gfx90a-insts,+mai-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize64" +// GFX90A: 
"target-features"="+16-bit-insts,+atomic-buffer-global-pk-add-f16-insts,+atomic-fadd-rtn-insts,+ci-insts,+dl-insts,+dot1-insts,+dot10-insts,+dot2-insts,+dot3-insts,+dot4-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+gfx8-insts,+gfx9-insts,+gfx90a-insts,+mai-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize64" // GFX90C: "target-features"="+16-bit-insts,+ci-insts,+dpp,+gfx8-insts,+gfx9-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize64" -// GFX940: "target-features"="+16-bit-insts,+atomic-buffer-global-pk-add-f16-insts,+atomic-ds-pk-add-16-insts,+atomic-flat-pk-add-16-insts,+atomic-global-pk-add-bf16-inst,+ci-insts,+dl-insts,+dot1-insts,+dot10-insts,+dot2-insts,+dot3-insts,+dot4-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+fp8-insts,+gfx8-insts,+gfx9-insts,+gfx90a-insts,+gfx940-insts,+mai-insts,+s-mem
[clang] 03375a3 - [HIP] [AlwaysInliner] Disable AlwaysInliner to eliminate undefined symbols
Author: Anshil Gandhi
Date: 2021-10-15T11:39:15-06:00
New Revision: 03375a3fb33b11e1249d9c88070b7f33cb97802a

URL: https://github.com/llvm/llvm-project/commit/03375a3fb33b11e1249d9c88070b7f33cb97802a
DIFF: https://github.com/llvm/llvm-project/commit/03375a3fb33b11e1249d9c88070b7f33cb97802a.diff

LOG: [HIP] [AlwaysInliner] Disable AlwaysInliner to eliminate undefined symbols

By default clang emits complete constructors as alias of base constructors if they are the same. The backend is supposed to emit symbols for the alias, otherwise it causes undefined symbols. @yaxunl observed that this issue is related to the llvm options `-amdgpu-early-inline-all=true` and `-amdgpu-function-calls=false`. This issue is resolved by only inlining global values with internal linkage. The `getCalleeFunction()` in AMDGPUResourceUsageAnalysis also had to be extended to support aliases to functions. inline-calls.ll was corrected appropriately.

Reviewed By: yaxunl, #amdgpu

Differential Revision: https://reviews.llvm.org/D109707

Added:
    clang/test/CodeGenCUDA/amdgpu-alias-undef-symbols.cu

Modified:
    clang/lib/Driver/ToolChains/Clang.cpp
    llvm/lib/Target/AMDGPU/AMDGPUAlwaysInlinePass.cpp
    llvm/lib/Target/AMDGPU/AMDGPUResourceUsageAnalysis.cpp
    llvm/test/CodeGen/AMDGPU/inline-calls.ll

Removed:


diff --git a/clang/lib/Driver/ToolChains/Clang.cpp b/clang/lib/Driver/ToolChains/Clang.cpp
index 83afbc3952d84..316c6026adf5c 100644
--- a/clang/lib/Driver/ToolChains/Clang.cpp
+++ b/clang/lib/Driver/ToolChains/Clang.cpp
@@ -5089,9 +5089,9 @@ void Clang::ConstructJob(Compilation &C, const JobAction &JA,
   }

   // Enable -mconstructor-aliases except on darwin, where we have to work around
-  // a linker bug (see ), and CUDA/AMDGPU device code,
-  // where aliases aren't supported.
+  // a linker bug (see ), and CUDA device code, where
+  // aliases aren't supported.
+ if (!RawTriple.isOSDarwin() && !RawTriple.isNVPTX()) CmdArgs.push_back("-mconstructor-aliases"); // Darwin's kernel doesn't support guard variables; just die if we diff --git a/clang/test/CodeGenCUDA/amdgpu-alias-undef-symbols.cu b/clang/test/CodeGenCUDA/amdgpu-alias-undef-symbols.cu new file mode 100644 index 0..f75088f8e1415 --- /dev/null +++ b/clang/test/CodeGenCUDA/amdgpu-alias-undef-symbols.cu @@ -0,0 +1,17 @@ +// REQUIRES: amdgpu-registered-target, clang-driver + +// RUN: %clang --offload-arch=gfx906 --cuda-device-only -nogpulib -nogpuinc -x hip -emit-llvm -S -o - %s \ +// RUN: -fgpu-rdc -O3 -mllvm -amdgpu-early-inline-all=true -mllvm -amdgpu-function-calls=false | \ +// RUN: FileCheck %s + +#include "Inputs/cuda.h" + +// CHECK: %struct.B = type { i8 } +struct B { + + // CHECK: @_ZN1BC1Ei = hidden unnamed_addr alias void (%struct.B*, i32), void (%struct.B*, i32)* @_ZN1BC2Ei + __device__ B(int x); +}; + +__device__ B::B(int x) { +} diff --git a/llvm/lib/Target/AMDGPU/AMDGPUAlwaysInlinePass.cpp b/llvm/lib/Target/AMDGPU/AMDGPUAlwaysInlinePass.cpp index 7ff24d1e9c62b..2e24e9f929d2a 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPUAlwaysInlinePass.cpp +++ b/llvm/lib/Target/AMDGPU/AMDGPUAlwaysInlinePass.cpp @@ -15,6 +15,7 @@ #include "AMDGPU.h" #include "AMDGPUTargetMachine.h" #include "Utils/AMDGPUBaseInfo.h" +#include "llvm/CodeGen/CommandFlags.h" #include "llvm/IR/Module.h" #include "llvm/Pass.h" #include "llvm/Support/CommandLine.h" @@ -90,9 +91,13 @@ static bool alwaysInlineImpl(Module &M, bool GlobalOpt) { SmallPtrSet FuncsToAlwaysInline; SmallPtrSet FuncsToNoInline; + Triple TT(M.getTargetTriple()); for (GlobalAlias &A : M.aliases()) { if (Function* F = dyn_cast(A.getAliasee())) { + if (TT.getArch() == Triple::amdgcn && + A.getLinkage() != GlobalValue::InternalLinkage) +continue; A.replaceAllUsesWith(F); AliasesToRemove.push_back(&A); } diff --git a/llvm/lib/Target/AMDGPU/AMDGPUResourceUsageAnalysis.cpp b/llvm/lib/Target/AMDGPU/AMDGPUResourceUsageAnalysis.cpp 
index e841e939ef34b..3c5cb6e190850 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPUResourceUsageAnalysis.cpp +++ b/llvm/lib/Target/AMDGPU/AMDGPUResourceUsageAnalysis.cpp @@ -29,6 +29,8 @@ #include "SIMachineFunctionInfo.h" #include "llvm/Analysis/CallGraph.h" #include "llvm/CodeGen/TargetPassConfig.h" +#include "llvm/IR/GlobalAlias.h" +#include "llvm/IR/GlobalValue.h" #include "llvm/Target/TargetMachine.h" using namespace llvm; @@ -61,7 +63,8 @@ static const Function *getCalleeFunction(const MachineOperand &Op) { assert(Op.getImm() == 0); return nullptr; } - + if (auto *GA = dyn_cast(Op.getGlobal())) +return cast(GA->getOperand(0)); return cast(Op.getGlobal()); } diff --git a/llvm/test/CodeGen/AMDGPU/inline-calls.ll b/llvm/test/CodeGen/AMDGPU/inline-calls.ll index 233485a202057..134cd301
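Note (illustrative sketch, not part of the commit): the AMDGPUAlwaysInlinePass.cpp hunk in this commit adds an early `continue` so that, on amdgcn, only aliases with internal linkage are replaced by their aliasee and queued for removal. The standalone predicate below restates that condition. The `Arch` and `Linkage` enums and the name `shouldReplaceAlias` are hypothetical stand-ins for `Triple::getArch()`, `GlobalValue` linkage and the loop body, and the check that the aliasee is a function is left out.

```cpp
// Hypothetical stand-ins for the LLVM triple architecture and linkage kinds.
enum class Arch { amdgcn, r600 };
enum class Linkage { External, Internal, LinkOnceODR };

// Mirrors the added guard: on amdgcn an alias is skipped (kept in the module)
// unless it has internal linkage; on other architectures it is still replaced.
constexpr bool shouldReplaceAlias(Arch A, Linkage L) {
  return !(A == Arch::amdgcn && L != Linkage::Internal);
}
```

Under this reading, an externally visible alias such as the `@_ZN1BC1Ei` constructor alias checked in the new test keeps its symbol, which is the behaviour the log message describes.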
[clang] 3b48e11 - [HIP] Relax conditions for address space cast in builtin args
Author: Anshil Gandhi
Date: 2021-10-15T14:06:47-06:00
New Revision: 3b48e1170dc623a95ff13a1e34c839cc094bf321

URL: https://github.com/llvm/llvm-project/commit/3b48e1170dc623a95ff13a1e34c839cc094bf321
DIFF: https://github.com/llvm/llvm-project/commit/3b48e1170dc623a95ff13a1e34c839cc094bf321.diff

LOG: [HIP] Relax conditions for address space cast in builtin args

Allow (implicit) address space casting between LLVM-equivalent target address spaces.

Reviewed By: yaxunl

Differential Revision: https://reviews.llvm.org/D111734

Added:
    clang/test/CodeGenCUDA/builtins-unsafe-atomics-gfx90a.cu
    clang/test/SemaCUDA/builtins-unsafe-atomics-gfx90a.cu

Modified:
    clang/lib/Sema/SemaExpr.cpp

Removed:


diff --git a/clang/lib/Sema/SemaExpr.cpp b/clang/lib/Sema/SemaExpr.cpp
index 94b44714b530d..b3211db8df2dc 100644
--- a/clang/lib/Sema/SemaExpr.cpp
+++ b/clang/lib/Sema/SemaExpr.cpp
@@ -6545,9 +6545,11 @@ ExprResult Sema::BuildCallExpr(Scope *Scope, Expr *Fn, SourceLocation LParenLoc,
       auto ArgPtTy = ArgTy->getPointeeType();
       auto ArgAS = ArgPtTy.getAddressSpace();

-      // Only allow implicit casting from a non-default address space pointee
-      // type to a default address space pointee type
-      if (ArgAS != LangAS::Default || ParamAS == LangAS::Default)
+      // Add address space cast if target address spaces are different
+      if ((ArgAS != LangAS::Default &&
+           getASTContext().getTargetAddressSpace(ArgAS) !=
+               getASTContext().getTargetAddressSpace(ParamAS)) ||
+          ParamAS == LangAS::Default)
         continue;

       // First, ensure that the Arg is an RValue.
diff --git a/clang/test/CodeGenCUDA/builtins-unsafe-atomics-gfx90a.cu b/clang/test/CodeGenCUDA/builtins-unsafe-atomics-gfx90a.cu new file mode 100644 index 0..d15953b3cacaa --- /dev/null +++ b/clang/test/CodeGenCUDA/builtins-unsafe-atomics-gfx90a.cu @@ -0,0 +1,20 @@ +// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -target-cpu gfx90a -x hip \ +// RUN: -aux-triple x86_64-unknown-linux-gnu -fcuda-is-device -emit-llvm %s \ +// RUN: -o - | FileCheck %s + +#define __device__ __attribute__((device)) +typedef __attribute__((address_space(3))) float *LP; + +// CHECK-LABEL: test_ds_atomic_add_f32 +// CHECK: %[[ADDR_ADDR:.*]] = alloca float*, align 8, addrspace(5) +// CHECK: %[[ADDR_ADDR_ASCAST_PTR:.*]] = addrspacecast float* addrspace(5)* %[[ADDR_ADDR]] to float** +// CHECK: store float* %addr, float** %[[ADDR_ADDR_ASCAST_PTR]], align 8 +// CHECK: %[[ADDR_ADDR_ASCAST:.*]] = load float*, float** %[[ADDR_ADDR_ASCAST_PTR]], align 8 +// CHECK: %[[AS_CAST:.*]] = addrspacecast float* %[[ADDR_ADDR_ASCAST]] to float addrspace(3)* +// CHECK: %3 = call contract float @llvm.amdgcn.ds.fadd.f32(float addrspace(3)* %[[AS_CAST]] +// CHECK: %4 = load float*, float** %rtn.ascast, align 8 +// CHECK: store float %3, float* %4, align 4 +__device__ void test_ds_atomic_add_f32(float *addr, float val) { + float *rtn; + *rtn = __builtin_amdgcn_ds_faddf((LP)addr, val, 0, 0, 0); +} diff --git a/clang/test/SemaCUDA/builtins-unsafe-atomics-gfx90a.cu b/clang/test/SemaCUDA/builtins-unsafe-atomics-gfx90a.cu new file mode 100644 index 0..6f1484c68ec71 --- /dev/null +++ b/clang/test/SemaCUDA/builtins-unsafe-atomics-gfx90a.cu @@ -0,0 +1,12 @@ +// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -target-cpu gfx90a -x hip \ +// RUN: -aux-triple x86_64-unknown-linux-gnu -fcuda-is-device %s \ +// RUN: -fsyntax-only -verify +// expected-no-diagnostics + +#define __device__ __attribute__((device)) +typedef __attribute__((address_space(3))) float *LP; + +__device__ void test_ds_atomic_add_f32(float *addr, float val) { + 
float *rtn; + *rtn = __builtin_amdgcn_ds_faddf((LP)addr, val, 0, 0, 0); +}
[clang] 53fc510 - Revert "[HIP] Relax conditions for address space cast in builtin args"
Author: Anshil Gandhi
Date: 2021-10-15T14:42:28-06:00
New Revision: 53fc5100e07ac078782ffb4e8e2b627c3cc8d046

URL: https://github.com/llvm/llvm-project/commit/53fc5100e07ac078782ffb4e8e2b627c3cc8d046
DIFF: https://github.com/llvm/llvm-project/commit/53fc5100e07ac078782ffb4e8e2b627c3cc8d046.diff

LOG: Revert "[HIP] Relax conditions for address space cast in builtin args"

This reverts commit 3b48e1170dc623a95ff13a1e34c839cc094bf321.

Added:


Modified:
    clang/lib/Sema/SemaExpr.cpp

Removed:
    clang/test/CodeGenCUDA/builtins-unsafe-atomics-gfx90a.cu
    clang/test/SemaCUDA/builtins-unsafe-atomics-gfx90a.cu


diff --git a/clang/lib/Sema/SemaExpr.cpp b/clang/lib/Sema/SemaExpr.cpp
index b3211db8df2dc..94b44714b530d 100644
--- a/clang/lib/Sema/SemaExpr.cpp
+++ b/clang/lib/Sema/SemaExpr.cpp
@@ -6545,11 +6545,9 @@ ExprResult Sema::BuildCallExpr(Scope *Scope, Expr *Fn, SourceLocation LParenLoc,
       auto ArgPtTy = ArgTy->getPointeeType();
       auto ArgAS = ArgPtTy.getAddressSpace();

-      // Add address space cast if target address spaces are different
-      if ((ArgAS != LangAS::Default &&
-           getASTContext().getTargetAddressSpace(ArgAS) !=
-               getASTContext().getTargetAddressSpace(ParamAS)) ||
-          ParamAS == LangAS::Default)
+      // Only allow implicit casting from a non-default address space pointee
+      // type to a default address space pointee type
+      if (ArgAS != LangAS::Default || ParamAS == LangAS::Default)
         continue;

       // First, ensure that the Arg is an RValue.
diff --git a/clang/test/CodeGenCUDA/builtins-unsafe-atomics-gfx90a.cu b/clang/test/CodeGenCUDA/builtins-unsafe-atomics-gfx90a.cu deleted file mode 100644 index d15953b3cacaa..0 --- a/clang/test/CodeGenCUDA/builtins-unsafe-atomics-gfx90a.cu +++ /dev/null @@ -1,20 +0,0 @@ -// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -target-cpu gfx90a -x hip \ -// RUN: -aux-triple x86_64-unknown-linux-gnu -fcuda-is-device -emit-llvm %s \ -// RUN: -o - | FileCheck %s - -#define __device__ __attribute__((device)) -typedef __attribute__((address_space(3))) float *LP; - -// CHECK-LABEL: test_ds_atomic_add_f32 -// CHECK: %[[ADDR_ADDR:.*]] = alloca float*, align 8, addrspace(5) -// CHECK: %[[ADDR_ADDR_ASCAST_PTR:.*]] = addrspacecast float* addrspace(5)* %[[ADDR_ADDR]] to float** -// CHECK: store float* %addr, float** %[[ADDR_ADDR_ASCAST_PTR]], align 8 -// CHECK: %[[ADDR_ADDR_ASCAST:.*]] = load float*, float** %[[ADDR_ADDR_ASCAST_PTR]], align 8 -// CHECK: %[[AS_CAST:.*]] = addrspacecast float* %[[ADDR_ADDR_ASCAST]] to float addrspace(3)* -// CHECK: %3 = call contract float @llvm.amdgcn.ds.fadd.f32(float addrspace(3)* %[[AS_CAST]] -// CHECK: %4 = load float*, float** %rtn.ascast, align 8 -// CHECK: store float %3, float* %4, align 4 -__device__ void test_ds_atomic_add_f32(float *addr, float val) { - float *rtn; - *rtn = __builtin_amdgcn_ds_faddf((LP)addr, val, 0, 0, 0); -} diff --git a/clang/test/SemaCUDA/builtins-unsafe-atomics-gfx90a.cu b/clang/test/SemaCUDA/builtins-unsafe-atomics-gfx90a.cu deleted file mode 100644 index 6f1484c68ec71..0 --- a/clang/test/SemaCUDA/builtins-unsafe-atomics-gfx90a.cu +++ /dev/null @@ -1,12 +0,0 @@ -// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -target-cpu gfx90a -x hip \ -// RUN: -aux-triple x86_64-unknown-linux-gnu -fcuda-is-device %s \ -// RUN: -fsyntax-only -verify -// expected-no-diagnostics - -#define __device__ __attribute__((device)) -typedef __attribute__((address_space(3))) float *LP; - -__device__ void test_ds_atomic_add_f32(float *addr, float val) 
{ - float *rtn; - *rtn = __builtin_amdgcn_ds_faddf((LP)addr, val, 0, 0, 0); -}
[clang] f92db6d - [HIP] Relax conditions for address space cast in builtin args
Author: Anshil Gandhi
Date: 2021-10-15T15:35:52-06:00
New Revision: f92db6d3fff13bdacdf9b24660eb3f3158c58a17

URL: https://github.com/llvm/llvm-project/commit/f92db6d3fff13bdacdf9b24660eb3f3158c58a17
DIFF: https://github.com/llvm/llvm-project/commit/f92db6d3fff13bdacdf9b24660eb3f3158c58a17.diff

LOG: [HIP] Relax conditions for address space cast in builtin args

Allow (implicit) address space casting between LLVM-equivalent target address spaces.

Reviewed By: yaxunl, tra

Differential Revision: https://reviews.llvm.org/D111734

Added:
    clang/test/CodeGenCUDA/builtins-unsafe-atomics-gfx90a.cu
    clang/test/SemaCUDA/builtins-unsafe-atomics-gfx90a.cu

Modified:
    clang/lib/Sema/SemaExpr.cpp

Removed:


diff --git a/clang/lib/Sema/SemaExpr.cpp b/clang/lib/Sema/SemaExpr.cpp
index 94b44714b530d..472b15b9ea06b 100644
--- a/clang/lib/Sema/SemaExpr.cpp
+++ b/clang/lib/Sema/SemaExpr.cpp
@@ -6545,9 +6545,13 @@ ExprResult Sema::BuildCallExpr(Scope *Scope, Expr *Fn, SourceLocation LParenLoc,
       auto ArgPtTy = ArgTy->getPointeeType();
       auto ArgAS = ArgPtTy.getAddressSpace();

-      // Only allow implicit casting from a non-default address space pointee
-      // type to a default address space pointee type
-      if (ArgAS != LangAS::Default || ParamAS == LangAS::Default)
+      // Add address space cast if target address spaces are different
+      bool NeedImplicitASC =
+          ParamAS != LangAS::Default && // Pointer params in generic AS don't need special handling.
+          ( ArgAS == LangAS::Default || // We do allow implicit conversion from generic AS
+                                        // or from specific AS which has target AS matching that of Param.
+          getASTContext().getTargetAddressSpace(ArgAS) == getASTContext().getTargetAddressSpace(ParamAS));
+      if (!NeedImplicitASC)
         continue;

       // First, ensure that the Arg is an RValue.
diff --git a/clang/test/CodeGenCUDA/builtins-unsafe-atomics-gfx90a.cu b/clang/test/CodeGenCUDA/builtins-unsafe-atomics-gfx90a.cu new file mode 100644 index 0..d15953b3cacaa --- /dev/null +++ b/clang/test/CodeGenCUDA/builtins-unsafe-atomics-gfx90a.cu @@ -0,0 +1,20 @@ +// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -target-cpu gfx90a -x hip \ +// RUN: -aux-triple x86_64-unknown-linux-gnu -fcuda-is-device -emit-llvm %s \ +// RUN: -o - | FileCheck %s + +#define __device__ __attribute__((device)) +typedef __attribute__((address_space(3))) float *LP; + +// CHECK-LABEL: test_ds_atomic_add_f32 +// CHECK: %[[ADDR_ADDR:.*]] = alloca float*, align 8, addrspace(5) +// CHECK: %[[ADDR_ADDR_ASCAST_PTR:.*]] = addrspacecast float* addrspace(5)* %[[ADDR_ADDR]] to float** +// CHECK: store float* %addr, float** %[[ADDR_ADDR_ASCAST_PTR]], align 8 +// CHECK: %[[ADDR_ADDR_ASCAST:.*]] = load float*, float** %[[ADDR_ADDR_ASCAST_PTR]], align 8 +// CHECK: %[[AS_CAST:.*]] = addrspacecast float* %[[ADDR_ADDR_ASCAST]] to float addrspace(3)* +// CHECK: %3 = call contract float @llvm.amdgcn.ds.fadd.f32(float addrspace(3)* %[[AS_CAST]] +// CHECK: %4 = load float*, float** %rtn.ascast, align 8 +// CHECK: store float %3, float* %4, align 4 +__device__ void test_ds_atomic_add_f32(float *addr, float val) { + float *rtn; + *rtn = __builtin_amdgcn_ds_faddf((LP)addr, val, 0, 0, 0); +} diff --git a/clang/test/SemaCUDA/builtins-unsafe-atomics-gfx90a.cu b/clang/test/SemaCUDA/builtins-unsafe-atomics-gfx90a.cu new file mode 100644 index 0..6f1484c68ec71 --- /dev/null +++ b/clang/test/SemaCUDA/builtins-unsafe-atomics-gfx90a.cu @@ -0,0 +1,12 @@ +// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -target-cpu gfx90a -x hip \ +// RUN: -aux-triple x86_64-unknown-linux-gnu -fcuda-is-device %s \ +// RUN: -fsyntax-only -verify +// expected-no-diagnostics + +#define __device__ __attribute__((device)) +typedef __attribute__((address_space(3))) float *LP; + +__device__ void test_ds_atomic_add_f32(float *addr, float val) { + 
float *rtn; + *rtn = __builtin_amdgcn_ds_faddf((LP)addr, val, 0, 0, 0); +}
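Note (illustrative sketch, not part of the commit): the NeedImplicitASC condition added to SemaExpr.cpp in this commit can be read as a small predicate over the argument and parameter address spaces. The model below restates it outside of clang. The `LangAS` enum is a hypothetical subset, and the `targetAddressSpace` mapping (global spaces to 1, local/shared spaces to 3, default to 0) is an assumption standing in for `ASTContext::getTargetAddressSpace` on an AMDGPU target.

```cpp
// Hypothetical subset of clang's LangAS, for illustration only.
enum class LangAS { Default, opencl_global, opencl_local, cuda_device, cuda_shared };

// Assumed language-to-target address space mapping for an AMDGPU target;
// stands in for ASTContext::getTargetAddressSpace.
constexpr unsigned targetAddressSpace(LangAS AS) {
  switch (AS) {
  case LangAS::opencl_global:
  case LangAS::cuda_device:
    return 1;
  case LangAS::opencl_local:
  case LangAS::cuda_shared:
    return 3;
  case LangAS::Default:
    return 0;
  }
  return 0;
}

// Restates the condition from the diff: a cast is only considered when the
// parameter is in a non-default address space, and the argument is either in
// the default address space or in one with the same target address space.
constexpr bool needImplicitASC(LangAS ArgAS, LangAS ParamAS) {
  return ParamAS != LangAS::Default &&
         (ArgAS == LangAS::Default ||
          targetAddressSpace(ArgAS) == targetAddressSpace(ParamAS));
}
```

Compared with the earlier condition, which only accepted default-address-space arguments, this form also accepts an argument whose language address space differs from the parameter's but maps to the same target address space.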
[clang] 1830ec9 - Revert "[HIP] [AlwaysInliner] Disable AlwaysInliner to eliminate undefined symbols"
Author: Anshil Gandhi Date: 2021-10-15T16:16:18-06:00 New Revision: 1830ec94ac022ae0b6d6876fc2251e6b91e5931e URL: https://github.com/llvm/llvm-project/commit/1830ec94ac022ae0b6d6876fc2251e6b91e5931e DIFF: https://github.com/llvm/llvm-project/commit/1830ec94ac022ae0b6d6876fc2251e6b91e5931e.diff LOG: Revert "[HIP] [AlwaysInliner] Disable AlwaysInliner to eliminate undefined symbols" This reverts commit 03375a3fb33b11e1249d9c88070b7f33cb97802a. Added: Modified: clang/lib/Driver/ToolChains/Clang.cpp llvm/lib/Target/AMDGPU/AMDGPUAlwaysInlinePass.cpp llvm/lib/Target/AMDGPU/AMDGPUResourceUsageAnalysis.cpp llvm/test/CodeGen/AMDGPU/inline-calls.ll Removed: clang/test/CodeGenCUDA/amdgpu-alias-undef-symbols.cu diff --git a/clang/lib/Driver/ToolChains/Clang.cpp b/clang/lib/Driver/ToolChains/Clang.cpp index 316c6026adf5c..83afbc3952d84 100644 --- a/clang/lib/Driver/ToolChains/Clang.cpp +++ b/clang/lib/Driver/ToolChains/Clang.cpp @@ -5089,9 +5089,9 @@ void Clang::ConstructJob(Compilation &C, const JobAction &JA, } // Enable -mconstructor-aliases except on darwin, where we have to work around - // a linker bug (see ), and CUDA device code, where - // aliases aren't supported. - if (!RawTriple.isOSDarwin() && !RawTriple.isNVPTX()) + // a linker bug (see ), and CUDA/AMDGPU device code, + // where aliases aren't supported. 
+ if (!RawTriple.isOSDarwin() && !RawTriple.isNVPTX() && !RawTriple.isAMDGPU()) CmdArgs.push_back("-mconstructor-aliases"); // Darwin's kernel doesn't support guard variables; just die if we diff --git a/clang/test/CodeGenCUDA/amdgpu-alias-undef-symbols.cu b/clang/test/CodeGenCUDA/amdgpu-alias-undef-symbols.cu deleted file mode 100644 index f75088f8e1415..0 --- a/clang/test/CodeGenCUDA/amdgpu-alias-undef-symbols.cu +++ /dev/null @@ -1,17 +0,0 @@ -// REQUIRES: amdgpu-registered-target, clang-driver - -// RUN: %clang --offload-arch=gfx906 --cuda-device-only -nogpulib -nogpuinc -x hip -emit-llvm -S -o - %s \ -// RUN: -fgpu-rdc -O3 -mllvm -amdgpu-early-inline-all=true -mllvm -amdgpu-function-calls=false | \ -// RUN: FileCheck %s - -#include "Inputs/cuda.h" - -// CHECK: %struct.B = type { i8 } -struct B { - - // CHECK: @_ZN1BC1Ei = hidden unnamed_addr alias void (%struct.B*, i32), void (%struct.B*, i32)* @_ZN1BC2Ei - __device__ B(int x); -}; - -__device__ B::B(int x) { -} diff --git a/llvm/lib/Target/AMDGPU/AMDGPUAlwaysInlinePass.cpp b/llvm/lib/Target/AMDGPU/AMDGPUAlwaysInlinePass.cpp index 2e24e9f929d2a..7ff24d1e9c62b 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPUAlwaysInlinePass.cpp +++ b/llvm/lib/Target/AMDGPU/AMDGPUAlwaysInlinePass.cpp @@ -15,7 +15,6 @@ #include "AMDGPU.h" #include "AMDGPUTargetMachine.h" #include "Utils/AMDGPUBaseInfo.h" -#include "llvm/CodeGen/CommandFlags.h" #include "llvm/IR/Module.h" #include "llvm/Pass.h" #include "llvm/Support/CommandLine.h" @@ -91,13 +90,9 @@ static bool alwaysInlineImpl(Module &M, bool GlobalOpt) { SmallPtrSet FuncsToAlwaysInline; SmallPtrSet FuncsToNoInline; - Triple TT(M.getTargetTriple()); for (GlobalAlias &A : M.aliases()) { if (Function* F = dyn_cast(A.getAliasee())) { - if (TT.getArch() == Triple::amdgcn && - A.getLinkage() != GlobalValue::InternalLinkage) -continue; A.replaceAllUsesWith(F); AliasesToRemove.push_back(&A); } diff --git a/llvm/lib/Target/AMDGPU/AMDGPUResourceUsageAnalysis.cpp 
b/llvm/lib/Target/AMDGPU/AMDGPUResourceUsageAnalysis.cpp index 3c5cb6e190850..e841e939ef34b 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPUResourceUsageAnalysis.cpp +++ b/llvm/lib/Target/AMDGPU/AMDGPUResourceUsageAnalysis.cpp @@ -29,8 +29,6 @@ #include "SIMachineFunctionInfo.h" #include "llvm/Analysis/CallGraph.h" #include "llvm/CodeGen/TargetPassConfig.h" -#include "llvm/IR/GlobalAlias.h" -#include "llvm/IR/GlobalValue.h" #include "llvm/Target/TargetMachine.h" using namespace llvm; @@ -63,8 +61,7 @@ static const Function *getCalleeFunction(const MachineOperand &Op) { assert(Op.getImm() == 0); return nullptr; } - if (auto *GA = dyn_cast(Op.getGlobal())) -return cast(GA->getOperand(0)); + return cast(Op.getGlobal()); } diff --git a/llvm/test/CodeGen/AMDGPU/inline-calls.ll b/llvm/test/CodeGen/AMDGPU/inline-calls.ll index 134cd301b9743..233485a202057 100644 --- a/llvm/test/CodeGen/AMDGPU/inline-calls.ll +++ b/llvm/test/CodeGen/AMDGPU/inline-calls.ll @@ -1,6 +1,6 @@ -; RUN: llc -mtriple amdgcn-unknown-linux-gnu -mcpu=tahiti -verify-machineinstrs < %s | FileCheck %s -; RUN: llc -mtriple amdgcn-unknown-linux-gnu -mcpu=tonga -verify-machineinstrs < %s | FileCheck %s -; RUN: llc -mtriple r600-unknown-linux-gnu -mcpu=redwood -verify-machineinstrs < %s | FileCheck %s --check-prefix=R600 +; RUN: llc -march=amdgcn -mcpu=tahiti -verify-machineinstrs < %s | FileCheck %s +; RUN: llc -march=amdgcn -mcpu=tonga -veri
[clang] 0567f03 - [HIP] [AlwaysInliner] Disable AlwaysInliner to eliminate undefined symbols
Author: Anshil Gandhi
Date: 2021-10-18T16:53:15-06:00
New Revision: 0567f0333176e476e15b7f32b463f58f7475ff22

URL: https://github.com/llvm/llvm-project/commit/0567f0333176e476e15b7f32b463f58f7475ff22
DIFF: https://github.com/llvm/llvm-project/commit/0567f0333176e476e15b7f32b463f58f7475ff22.diff

LOG: [HIP] [AlwaysInliner] Disable AlwaysInliner to eliminate undefined symbols

By default clang emits complete constructors as alias of base constructors if they are the same. The backend is supposed to emit symbols for the alias, otherwise it causes undefined symbols. @yaxunl observed that this issue is related to the llvm options `-amdgpu-early-inline-all=true` and `-amdgpu-function-calls=false`. This issue is resolved by only inlining global values with internal linkage. The `getCalleeFunction()` in AMDGPUResourceUsageAnalysis also had to be extended to support aliases to functions. inline-calls.ll was corrected appropriately.

Reviewed By: yaxunl, #amdgpu

Differential Revision: https://reviews.llvm.org/D109707

Added:
    clang/test/CodeGenCUDA/amdgpu-alias-undef-symbols.cu

Modified:
    clang/lib/Driver/ToolChains/Clang.cpp
    llvm/lib/Target/AMDGPU/AMDGPUAlwaysInlinePass.cpp
    llvm/lib/Target/AMDGPU/AMDGPUResourceUsageAnalysis.cpp
    llvm/test/CodeGen/AMDGPU/inline-calls.ll

Removed:


diff --git a/clang/lib/Driver/ToolChains/Clang.cpp b/clang/lib/Driver/ToolChains/Clang.cpp
index 83afbc3952d84..316c6026adf5c 100644
--- a/clang/lib/Driver/ToolChains/Clang.cpp
+++ b/clang/lib/Driver/ToolChains/Clang.cpp
@@ -5089,9 +5089,9 @@ void Clang::ConstructJob(Compilation &C, const JobAction &JA,
   }

   // Enable -mconstructor-aliases except on darwin, where we have to work around
-  // a linker bug (see ), and CUDA/AMDGPU device code,
-  // where aliases aren't supported.
+  // a linker bug (see ), and CUDA device code, where
+  // aliases aren't supported.
+ if (!RawTriple.isOSDarwin() && !RawTriple.isNVPTX()) CmdArgs.push_back("-mconstructor-aliases"); // Darwin's kernel doesn't support guard variables; just die if we diff --git a/clang/test/CodeGenCUDA/amdgpu-alias-undef-symbols.cu b/clang/test/CodeGenCUDA/amdgpu-alias-undef-symbols.cu new file mode 100644 index 0..ec7b7c3b7ff4c --- /dev/null +++ b/clang/test/CodeGenCUDA/amdgpu-alias-undef-symbols.cu @@ -0,0 +1,17 @@ +// REQUIRES: amdgpu-registered-target, clang-driver + +// RUN: %clang -target x86_64-unknown-linux-gnu --offload-arch=gfx906 --cuda-device-only -nogpulib -nogpuinc -x hip -emit-llvm -S -o - %s \ +// RUN: -fgpu-rdc -O3 -mllvm -amdgpu-early-inline-all=true -mllvm -amdgpu-function-calls=false | \ +// RUN: FileCheck %s + +#include "Inputs/cuda.h" + +// CHECK: %struct.B = type { i8 } +struct B { + + // CHECK: @_ZN1BC1Ei = hidden unnamed_addr alias void (%struct.B*, i32), void (%struct.B*, i32)* @_ZN1BC2Ei + __device__ B(int x); +}; + +__device__ B::B(int x) { +} diff --git a/llvm/lib/Target/AMDGPU/AMDGPUAlwaysInlinePass.cpp b/llvm/lib/Target/AMDGPU/AMDGPUAlwaysInlinePass.cpp index 7ff24d1e9c62b..2e24e9f929d2a 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPUAlwaysInlinePass.cpp +++ b/llvm/lib/Target/AMDGPU/AMDGPUAlwaysInlinePass.cpp @@ -15,6 +15,7 @@ #include "AMDGPU.h" #include "AMDGPUTargetMachine.h" #include "Utils/AMDGPUBaseInfo.h" +#include "llvm/CodeGen/CommandFlags.h" #include "llvm/IR/Module.h" #include "llvm/Pass.h" #include "llvm/Support/CommandLine.h" @@ -90,9 +91,13 @@ static bool alwaysInlineImpl(Module &M, bool GlobalOpt) { SmallPtrSet FuncsToAlwaysInline; SmallPtrSet FuncsToNoInline; + Triple TT(M.getTargetTriple()); for (GlobalAlias &A : M.aliases()) { if (Function* F = dyn_cast(A.getAliasee())) { + if (TT.getArch() == Triple::amdgcn && + A.getLinkage() != GlobalValue::InternalLinkage) +continue; A.replaceAllUsesWith(F); AliasesToRemove.push_back(&A); } diff --git a/llvm/lib/Target/AMDGPU/AMDGPUResourceUsageAnalysis.cpp 
b/llvm/lib/Target/AMDGPU/AMDGPUResourceUsageAnalysis.cpp index e841e939ef34b..3c5cb6e190850 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPUResourceUsageAnalysis.cpp +++ b/llvm/lib/Target/AMDGPU/AMDGPUResourceUsageAnalysis.cpp @@ -29,6 +29,8 @@ #include "SIMachineFunctionInfo.h" #include "llvm/Analysis/CallGraph.h" #include "llvm/CodeGen/TargetPassConfig.h" +#include "llvm/IR/GlobalAlias.h" +#include "llvm/IR/GlobalValue.h" #include "llvm/Target/TargetMachine.h" using namespace llvm; @@ -61,7 +63,8 @@ static const Function *getCalleeFunction(const MachineOperand &Op) { assert(Op.getImm() == 0); return nullptr; } - + if (auto *GA = dyn_cast(Op.getGlobal())) +return cast(GA->getOperand(0)); return cast(Op.getGlobal()); } diff --git a/llvm/test/CodeGen/AMDGPU/inline-calls.ll b/llvm/test/CodeGen/AMDGPU/inline-calls
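The alias filter described in the log can be modelled outside LLVM. The following is a minimal sketch, not the pass itself: `Alias`, `Linkage` and `aliasesToFold` are hypothetical names invented for illustration. The only behaviour taken from the diff is that, on amdgcn, an alias whose linkage is not internal is skipped instead of being replaced by its aliasee.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-ins for LLVM's linkage kinds and GlobalAlias (hypothetical types;
// they only carry the fields that the added check reads).
enum class Linkage { Internal, External };

struct Alias {
  std::string Name;    // e.g. the complete-object constructor _ZN1BC1Ei
  std::string Aliasee; // e.g. the base-object constructor _ZN1BC2Ei
  Linkage L;
};

// Returns the names of the aliases that would be folded into their aliasee.
// Mirrors the condition added to alwaysInlineImpl(): on amdgcn, aliases that
// are not internal are left in place so that their symbols are still emitted.
std::vector<std::string> aliasesToFold(const std::vector<Alias> &Aliases,
                                       bool IsAMDGCN) {
  std::vector<std::string> Folded;
  for (const Alias &A : Aliases) {
    if (IsAMDGCN && A.L != Linkage::Internal)
      continue; // keep the alias; another translation unit may reference it
    Folded.push_back(A.Name);
  }
  return Folded;
}
```

In this model the constructor alias from the new test (`_ZN1BC1Ei`, which is not internal) survives on amdgcn, while an internal alias is still folded.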
[clang] 508b066 - [Remarks] [AMDGPU] Emit optimization remarks for atomics generating hardware instructions
Author: Anshil Gandhi Date: 2021-08-19T20:51:19-06:00 New Revision: 508b06699a396cc6f2f2602dab350860cb69f087 URL: https://github.com/llvm/llvm-project/commit/508b06699a396cc6f2f2602dab350860cb69f087 DIFF: https://github.com/llvm/llvm-project/commit/508b06699a396cc6f2f2602dab350860cb69f087.diff LOG: [Remarks] [AMDGPU] Emit optimization remarks for atomics generating hardware instructions Produce remarks when atomic instructions are expanded into hardware instructions in SIISelLowering.cpp. Currently, these remarks are only emitted for atomic fadd instructions. Differential Revision: https://reviews.llvm.org/D108150 Added: clang/test/CodeGenOpenCL/atomics-cas-remarks-gfx90a.cl clang/test/CodeGenOpenCL/atomics-unsafe-hw-remarks-gfx90a.cl llvm/test/CodeGen/AMDGPU/atomics-cas-remarks-gfx90a.ll llvm/test/CodeGen/AMDGPU/atomics-hw-remarks-gfx90a.ll Modified: llvm/lib/CodeGen/AtomicExpandPass.cpp llvm/lib/Target/AMDGPU/SIISelLowering.cpp Removed: clang/test/CodeGenOpenCL/atomics-remarks-gfx90a.cl llvm/test/CodeGen/AMDGPU/atomics-remarks-gfx90a.ll diff --git a/clang/test/CodeGenOpenCL/atomics-remarks-gfx90a.cl b/clang/test/CodeGenOpenCL/atomics-cas-remarks-gfx90a.cl similarity index 100% rename from clang/test/CodeGenOpenCL/atomics-remarks-gfx90a.cl rename to clang/test/CodeGenOpenCL/atomics-cas-remarks-gfx90a.cl diff --git a/clang/test/CodeGenOpenCL/atomics-unsafe-hw-remarks-gfx90a.cl b/clang/test/CodeGenOpenCL/atomics-unsafe-hw-remarks-gfx90a.cl new file mode 100644 index 0..ea3324126c209 --- /dev/null +++ b/clang/test/CodeGenOpenCL/atomics-unsafe-hw-remarks-gfx90a.cl @@ -0,0 +1,44 @@ +// RUN: %clang_cc1 -cl-std=CL2.0 -O0 -triple=amdgcn-amd-amdhsa -target-cpu gfx90a \ +// RUN: -Rpass=si-lower -munsafe-fp-atomics %s -S -emit-llvm -o - 2>&1 | \ +// RUN: FileCheck %s --check-prefix=GFX90A-HW + +// RUN: %clang_cc1 -cl-std=CL2.0 -O0 -triple=amdgcn-amd-amdhsa -target-cpu gfx90a \ +// RUN: -Rpass=si-lower -munsafe-fp-atomics %s -S -o - 2>&1 | \ +// RUN: FileCheck %s 
--check-prefix=GFX90A-HW-REMARK + + +// REQUIRES: amdgpu-registered-target + +typedef enum memory_order { + memory_order_relaxed = __ATOMIC_RELAXED, + memory_order_acquire = __ATOMIC_ACQUIRE, + memory_order_release = __ATOMIC_RELEASE, + memory_order_acq_rel = __ATOMIC_ACQ_REL, + memory_order_seq_cst = __ATOMIC_SEQ_CST +} memory_order; + +typedef enum memory_scope { + memory_scope_work_item = __OPENCL_MEMORY_SCOPE_WORK_ITEM, + memory_scope_work_group = __OPENCL_MEMORY_SCOPE_WORK_GROUP, + memory_scope_device = __OPENCL_MEMORY_SCOPE_DEVICE, + memory_scope_all_svm_devices = __OPENCL_MEMORY_SCOPE_ALL_SVM_DEVICES, +#if defined(cl_intel_subgroups) || defined(cl_khr_subgroups) + memory_scope_sub_group = __OPENCL_MEMORY_SCOPE_SUB_GROUP +#endif +} memory_scope; + +// GFX90A-HW-REMARK: Hardware instruction generated for atomic fadd operation at memory scope workgroup-one-as due to an unsafe request. [-Rpass=si-lower] +// GFX90A-HW-REMARK: Hardware instruction generated for atomic fadd operation at memory scope agent-one-as due to an unsafe request. [-Rpass=si-lower] +// GFX90A-HW-REMARK: Hardware instruction generated for atomic fadd operation at memory scope wavefront-one-as due to an unsafe request. 
[-Rpass=si-lower] +// GFX90A-HW-REMARK: global_atomic_add_f32 v0, v[0:1], v2, off glc +// GFX90A-HW-REMARK: global_atomic_add_f32 v0, v[0:1], v2, off glc +// GFX90A-HW-REMARK: global_atomic_add_f32 v0, v[0:1], v2, off glc +// GFX90A-HW-LABEL: @atomic_unsafe_hw +// GFX90A-HW: atomicrmw fadd float addrspace(1)* %{{.*}}, float %{{.*}} syncscope("workgroup-one-as") monotonic, align 4 +// GFX90A-HW: atomicrmw fadd float addrspace(1)* %{{.*}}, float %{{.*}} syncscope("agent-one-as") monotonic, align 4 +// GFX90A-HW: atomicrmw fadd float addrspace(1)* %{{.*}}, float %{{.*}} syncscope("wavefront-one-as") monotonic, align 4 +void atomic_unsafe_hw(__global atomic_float *d, float a) { + float ret1 = __opencl_atomic_fetch_add(d, a, memory_order_relaxed, memory_scope_work_group); + float ret2 = __opencl_atomic_fetch_add(d, a, memory_order_relaxed, memory_scope_device); + float ret3 = __opencl_atomic_fetch_add(d, a, memory_order_relaxed, memory_scope_sub_group); +} diff --git a/llvm/lib/CodeGen/AtomicExpandPass.cpp b/llvm/lib/CodeGen/AtomicExpandPass.cpp index 47cdd222702f2..1297f99698d8b 100644 --- a/llvm/lib/CodeGen/AtomicExpandPass.cpp +++ b/llvm/lib/CodeGen/AtomicExpandPass.cpp @@ -610,7 +610,7 @@ bool AtomicExpand::tryExpandAtomicRMW(AtomicRMWInst *AI) { : SSNs[AI->getSyncScopeID()]; OptimizationRemarkEmitter ORE(AI->getFunction()); ORE.emit([&]() { -return OptimizationRemark(DEBUG_TYPE, "Passed", AI->getFunction()) +return OptimizationRemark(DEBUG_TYPE, "Passed", AI) << "A compare and swap loop was generated for an a
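The relationship between the OpenCL memory scopes used in `atomic_unsafe_hw` and the remark text expected by the `GFX90A-HW-REMARK` lines can be sketched as below. This is an illustrative model only: `MemoryScope`, `syncScopeFor` and `hwAtomicRemark` are hypothetical helpers that do not exist in LLVM, and the scope mapping covers just the three cases exercised by the test.

```cpp
#include <cassert>
#include <string>

// The three OpenCL scopes that the test passes to __opencl_atomic_fetch_add.
enum class MemoryScope { WorkGroup, Device, SubGroup };

// Sync scope names as they appear in the GFX90A-HW check lines of the test.
std::string syncScopeFor(MemoryScope S) {
  switch (S) {
  case MemoryScope::WorkGroup:
    return "workgroup-one-as";
  case MemoryScope::Device:
    return "agent-one-as";
  case MemoryScope::SubGroup:
    return "wavefront-one-as";
  }
  return "";
}

// Builds the remark text that the GFX90A-HW-REMARK lines expect.
std::string hwAtomicRemark(const std::string &Op, MemoryScope S) {
  return "Hardware instruction generated for atomic " + Op +
         " operation at memory scope " + syncScopeFor(S) +
         " due to an unsafe request.";
}
```

For example, `hwAtomicRemark("fadd", MemoryScope::Device)` yields the second remark line checked by the test.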
[clang] 7063ac1 - [HIP] Allow target addr space in target builtins
Author: Anshil Gandhi Date: 2021-08-19T23:51:58-06:00 New Revision: 7063ac1afa656bdbb851c8ef120ff699c2e98483 URL: https://github.com/llvm/llvm-project/commit/7063ac1afa656bdbb851c8ef120ff699c2e98483 DIFF: https://github.com/llvm/llvm-project/commit/7063ac1afa656bdbb851c8ef120ff699c2e98483.diff LOG: [HIP] Allow target addr space in target builtins This patch allows target specific addr space in target builtins for HIP. It inserts implicit addr space cast for non-generic pointer to generic pointer in general, and inserts implicit addr space cast for generic to non-generic for target builtin arguments only. It is NFC for non-HIP languages. Differential Revision: https://reviews.llvm.org/D102405 Added: Modified: clang/include/clang/AST/Type.h clang/lib/Basic/Targets/AMDGPU.h clang/lib/Sema/SemaExpr.cpp clang/test/CodeGenCUDA/builtins-amdgcn.cu Removed: diff --git a/clang/include/clang/AST/Type.h b/clang/include/clang/AST/Type.h index 09e9705bd86b8..fc83c895afa2e 100644 --- a/clang/include/clang/AST/Type.h +++ b/clang/include/clang/AST/Type.h @@ -495,7 +495,12 @@ class Qualifiers { (A == LangAS::Default && (B == LangAS::sycl_private || B == LangAS::sycl_local || B == LangAS::sycl_global || B == LangAS::sycl_global_device || - B == LangAS::sycl_global_host)); + B == LangAS::sycl_global_host)) || + // In HIP device compilation, any cuda address space is allowed + // to implicitly cast into the default address space. 
+ (A == LangAS::Default && +(B == LangAS::cuda_constant || B == LangAS::cuda_device || + B == LangAS::cuda_shared)); } /// Returns true if the address space in these qualifiers is equal to or diff --git a/clang/lib/Basic/Targets/AMDGPU.h b/clang/lib/Basic/Targets/AMDGPU.h index 2e580ecf24259..77c2c5fd50145 100644 --- a/clang/lib/Basic/Targets/AMDGPU.h +++ b/clang/lib/Basic/Targets/AMDGPU.h @@ -352,7 +352,18 @@ class LLVM_LIBRARY_VISIBILITY AMDGPUTargetInfo final : public TargetInfo { } LangAS getCUDABuiltinAddressSpace(unsigned AS) const override { -return LangAS::Default; +switch (AS) { +case 0: + return LangAS::Default; +case 1: + return LangAS::cuda_device; +case 3: + return LangAS::cuda_shared; +case 4: + return LangAS::cuda_constant; +default: + return getLangASFromTargetAS(AS); +} } llvm::Optional getConstantAddressSpace() const override { diff --git a/clang/lib/Sema/SemaExpr.cpp b/clang/lib/Sema/SemaExpr.cpp index 8ef4a9d96320b..5bde87d02877e 100644 --- a/clang/lib/Sema/SemaExpr.cpp +++ b/clang/lib/Sema/SemaExpr.cpp @@ -6572,6 +6572,53 @@ ExprResult Sema::BuildCallExpr(Scope *Scope, Expr *Fn, SourceLocation LParenLoc, return ExprError(); checkDirectCallValidity(*this, Fn, FD, ArgExprs); + +// If this expression is a call to a builtin function in HIP device +// compilation, allow a pointer-type argument to default address space to be +// passed as a pointer-type parameter to a non-default address space. +// If Arg is declared in the default address space and Param is declared +// in a non-default address space, perform an implicit address space cast to +// the parameter type. 
+if (getLangOpts().HIP && getLangOpts().CUDAIsDevice && FD && +FD->getBuiltinID()) { + for (unsigned Idx = 0; Idx < FD->param_size(); ++Idx) { +ParmVarDecl *Param = FD->getParamDecl(Idx); +if (!ArgExprs[Idx] || !Param || !Param->getType()->isPointerType() || +!ArgExprs[Idx]->getType()->isPointerType()) + continue; + +auto ParamAS = Param->getType()->getPointeeType().getAddressSpace(); +auto ArgTy = ArgExprs[Idx]->getType(); +auto ArgPtTy = ArgTy->getPointeeType(); +auto ArgAS = ArgPtTy.getAddressSpace(); + +// Only allow implicit casting from a non-default address space pointee +// type to a default address space pointee type +if (ArgAS != LangAS::Default || ParamAS == LangAS::Default) + continue; + +// First, ensure that the Arg is an RValue. +if (ArgExprs[Idx]->isGLValue()) { + ArgExprs[Idx] = ImplicitCastExpr::Create( + Context, ArgExprs[Idx]->getType(), CK_NoOp, ArgExprs[Idx], + nullptr, VK_PRValue, FPOptionsOverride()); +} + +// Construct a new arg type with address space of Param +Qualifiers ArgPtQuals = ArgPtTy.getQualifiers(); +ArgPtQuals.setAddressSpace(ParamAS); +auto NewArgPtTy = +Context.getQualifiedType(ArgPtTy.getUnqualifiedType(), ArgPtQuals); +auto NewArgTy = +Context.getQualifiedType(Context.getPointerType(NewArgPtTy), + ArgTy.getQualifiers()); + +// Finally perf
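The two rules this patch introduces, the target-to-language address space mapping in `getCUDABuiltinAddressSpace` and the condition under which `BuildCallExpr` inserts an implicit cast, can be modelled in a few lines. This is a simplified sketch under stated assumptions: the `LangAS` enum below is a reduced, hypothetical stand-in for clang's, `Other` replaces the real fallback `getLangASFromTargetAS(AS)`, and `needsImplicitCast` only reflects the `continue` condition visible in the diff.

```cpp
#include <cassert>

// Reduced stand-in for clang's LangAS (hypothetical; the real enum is larger).
enum class LangAS { Default, cuda_device, cuda_shared, cuda_constant, Other };

// Mirrors the switch added to AMDGPUTargetInfo::getCUDABuiltinAddressSpace.
LangAS builtinAddressSpace(unsigned TargetAS) {
  switch (TargetAS) {
  case 0:
    return LangAS::Default;
  case 1:
    return LangAS::cuda_device;
  case 3:
    return LangAS::cuda_shared;
  case 4:
    return LangAS::cuda_constant;
  default:
    return LangAS::Other; // the patch calls getLangASFromTargetAS(AS) here
  }
}

// Mirrors the check in the patch: the cast is skipped when
// `ArgAS != LangAS::Default || ParamAS == LangAS::Default`, so it is inserted
// only for a default-address-space argument bound to a non-default parameter.
bool needsImplicitCast(LangAS ArgAS, LangAS ParamAS) {
  return ArgAS == LangAS::Default && ParamAS != LangAS::Default;
}
```

Under this model, a generic pointer passed to a builtin whose parameter is declared as `*1` (global) maps to `cuda_device` and receives a cast, while an argument that already carries a non-default address space is left alone.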
[clang] [llvm] [AMDGPU][LTO] Assume closed world after linking (PR #105845)
https://github.com/gandhi56 created https://github.com/llvm/llvm-project/pull/105845

Change-Id: I7d8fa4251c80a6f815f55a0998677d18ade25b72

>From 8830b6f390039c9a952a86ea52e8fe9559900448 Mon Sep 17 00:00:00 2001
From: Anshil Gandhi
Date: Thu, 22 Aug 2024 18:57:33 +
Subject: [PATCH] [AMDGPU][LTO] Assume closed world after linking

Change-Id: I7d8fa4251c80a6f815f55a0998677d18ade25b72
---
 clang/test/CodeGenCUDA/gpu-rdc-amdgpu-attrs.cu | 12
 llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp    | 3 +++
 llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp | 2 +-
 3 files changed, 16 insertions(+), 1 deletion(-)
 create mode 100644 clang/test/CodeGenCUDA/gpu-rdc-amdgpu-attrs.cu

diff --git a/clang/test/CodeGenCUDA/gpu-rdc-amdgpu-attrs.cu b/clang/test/CodeGenCUDA/gpu-rdc-amdgpu-attrs.cu
new file mode 100644
index 00..614917aecc0d60
--- /dev/null
+++ b/clang/test/CodeGenCUDA/gpu-rdc-amdgpu-attrs.cu
@@ -0,0 +1,12 @@
+// RUN: clang -x hip -O3 -fgpu-rdc %s -mllvm -debug-only=amdgpu-attributor -o - | FileCheck %s
+
+// CHECK: Module {{.*}} is not assumed to be a closed world
+// CHECK: Module ld-temp.o is assumed to be a closed world
+
+__attribute__((device)) int foo() {
+    return 1;
+}
+
+int main() {
+    return 0;
+}

diff --git a/llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp b/llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
index d65e0ae92308e6..c78fc66e41ec58 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
@@ -1066,6 +1066,9 @@ static bool runImpl(Module &M, AnalysisGetter &AG, TargetMachine &TM,
 
   Attributor A(Functions, InfoCache, AC);
 
+  LLVM_DEBUG(dbgs() << "Module " << M.getName() << " is " << (AC.IsClosedWorldModule ? "" : "not ")
+                    << "assumed to be a closed world\n");
+
   for (Function &F : M) {
     if (F.isIntrinsic())
       continue;

diff --git a/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp b/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
index 7ac7b3315bb972..a4898366a21ee1 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
@@ -761,7 +761,7 @@ void AMDGPUTargetMachine::registerPassBuilderCallbacks(PassBuilder &PB) {
         if (EnableLowerModuleLDS)
           PM.addPass(AMDGPULowerModuleLDSPass(*this));
         if (EnableAMDGPUAttributor && Level != OptimizationLevel::O0)
-          PM.addPass(AMDGPUAttributorPass(*this));
+          PM.addPass(AMDGPUAttributorPass(*this, AMDGPUAttributorOptions{true} ));
       });
 
   PB.registerRegClassFilterParsingCallback(

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU][LTO] Assume closed world after linking (PR #105845)
https://github.com/gandhi56 edited https://github.com/llvm/llvm-project/pull/105845
[clang] [llvm] [AMDGPU][LTO] Assume closed world after linking (PR #105845)
https://github.com/gandhi56 updated https://github.com/llvm/llvm-project/pull/105845

>From d4b8e5b213b4ea9b5b615354d264b71ed76508d5 Mon Sep 17 00:00:00 2001
From: Anshil Gandhi
Date: Thu, 22 Aug 2024 18:57:33 +
Subject: [PATCH] [AMDGPU][LTO] Assume closed world after linking

Change-Id: I7d8fa4251c80a6f815f55a0998677d18ade25b72
---
 clang/test/CodeGenCUDA/gpu-rdc-amdgpu-attrs.cu | 12
 llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp    | 4
 llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp | 3 ++-
 3 files changed, 18 insertions(+), 1 deletion(-)
 create mode 100644 clang/test/CodeGenCUDA/gpu-rdc-amdgpu-attrs.cu

diff --git a/clang/test/CodeGenCUDA/gpu-rdc-amdgpu-attrs.cu b/clang/test/CodeGenCUDA/gpu-rdc-amdgpu-attrs.cu
new file mode 100644
index 00..614917aecc0d60
--- /dev/null
+++ b/clang/test/CodeGenCUDA/gpu-rdc-amdgpu-attrs.cu
@@ -0,0 +1,12 @@
+// RUN: clang -x hip -O3 -fgpu-rdc %s -mllvm -debug-only=amdgpu-attributor -o - | FileCheck %s
+
+// CHECK: Module {{.*}} is not assumed to be a closed world
+// CHECK: Module ld-temp.o is assumed to be a closed world
+
+__attribute__((device)) int foo() {
+    return 1;
+}
+
+int main() {
+    return 0;
+}

diff --git a/llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp b/llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
index d65e0ae92308e6..53ee3e42eef4c8 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
@@ -1066,6 +1066,10 @@ static bool runImpl(Module &M, AnalysisGetter &AG, TargetMachine &TM,
 
   Attributor A(Functions, InfoCache, AC);
 
+  LLVM_DEBUG(dbgs() << "Module " << M.getName() << " is "
+                    << (AC.IsClosedWorldModule ? "" : "not ")
+                    << "assumed to be a closed world\n");
+
   for (Function &F : M) {
     if (F.isIntrinsic())
       continue;

diff --git a/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp b/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
index 7ac7b3315bb972..869afdcc62dbf6 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
@@ -761,7 +761,8 @@ void AMDGPUTargetMachine::registerPassBuilderCallbacks(PassBuilder &PB) {
         if (EnableLowerModuleLDS)
           PM.addPass(AMDGPULowerModuleLDSPass(*this));
         if (EnableAMDGPUAttributor && Level != OptimizationLevel::O0)
-          PM.addPass(AMDGPUAttributorPass(*this));
+          PM.addPass(
+              AMDGPUAttributorPass(*this, AMDGPUAttributorOptions{true}));
       });
 
   PB.registerRegClassFilterParsingCallback(
[clang] [NFC][Clang] Precommit test for VTable codegen (PR #124983)
https://github.com/gandhi56 updated https://github.com/llvm/llvm-project/pull/124983 >From 0e2317ae0ef1377bc461e7e461bf3b699d75014d Mon Sep 17 00:00:00 2001 From: Anshil Gandhi Date: Tue, 28 Jan 2025 18:04:44 -0600 Subject: [PATCH 1/3] [CUDA] Precommit test for VTable codegen --- .../CodeGenCUDA/increment-index-for-thunks.cu | 33 +++ 1 file changed, 33 insertions(+) create mode 100644 clang/test/CodeGenCUDA/increment-index-for-thunks.cu diff --git a/clang/test/CodeGenCUDA/increment-index-for-thunks.cu b/clang/test/CodeGenCUDA/increment-index-for-thunks.cu new file mode 100644 index 000..65c929ad4ce7065 --- /dev/null +++ b/clang/test/CodeGenCUDA/increment-index-for-thunks.cu @@ -0,0 +1,33 @@ +// RUN: %clang_cc1 -fcuda-is-device -triple amdgcn-amd-amdhsa -target-cpu gfx942 \ +// RUN: -emit-llvm -xhip %s -o - | FileCheck %s + +// CHECK: @_ZTV1C = linkonce_odr unnamed_addr addrspace(1) constant { [5 x ptr addrspace(1)], [4 x ptr addrspace(1)] } { [5 x ptr addrspace(1)] [ptr addrspace(1) null, ptr addrspace(1) null, ptr addrspace(1) addrspacecast (ptr @_ZN1B2f2Ev to ptr addrspace(1)), ptr addrspace(1) null, ptr addrspace(1) addrspacecast (ptr @_ZN1C2f1Ev to ptr addrspace(1))], [4 x ptr addrspace(1)] [ptr addrspace(1) inttoptr (i64 -8 to ptr addrspace(1)), ptr addrspace(1) null, ptr addrspace(1) null, ptr addrspace(1) addrspacecast (ptr @_ZN1C2f1Ev to ptr addrspace(1))] }, comdat, align 8 +// CHECK: @_ZTV1B = linkonce_odr unnamed_addr addrspace(1) constant { [3 x ptr addrspace(1)] } { [3 x ptr addrspace(1)] [ptr addrspace(1) null, ptr addrspace(1) null, ptr addrspace(1) addrspacecast (ptr @_ZN1B2f2Ev to ptr addrspace(1))] }, comdat, align 8 +// CHECK: @_ZTV1A = linkonce_odr unnamed_addr addrspace(1) constant { [4 x ptr addrspace(1)] } { [4 x ptr addrspace(1)] [ptr addrspace(1) null, ptr addrspace(1) null, ptr addrspace(1) null, ptr addrspace(1) addrspacecast (ptr @__cxa_pure_virtual to ptr addrspace(1))] }, comdat, align 8 +// CHECK: @__hip_cuid_ = addrspace(1) global i8 
0 +// CHECK: @llvm.compiler.used = appending addrspace(1) global [1 x ptr] [ptr addrspacecast (ptr addrspace(1) @__hip_cuid_ to ptr)], section "llvm.metadata" +// CHECK: @__oclc_ABI_version = weak_odr hidden local_unnamed_addr addrspace(4) constant i32 500 + +struct A { + __attribute__((device)) A() { } + virtual void neither_device_nor_host_f() = 0 ; + __attribute__((device)) virtual void f1() = 0; + +}; + +struct B { + __attribute__((device)) B() { } + __attribute__((device)) virtual void f2() { }; +}; + +struct C : public B, public A { + __attribute__((device)) C() : B(), A() { } + + virtual void neither_device_nor_host_f() override { } + __attribute__((device)) virtual void f1() override { } + +}; + +__attribute__((device)) void test() { + C obj; +} >From 96111e075df61e6e137f0f0c9c6b2aaefb5beca2 Mon Sep 17 00:00:00 2001 From: Anshil Gandhi Date: Wed, 29 Jan 2025 15:06:45 -0600 Subject: [PATCH 2/3] - Test for SPIRV codegen as well --- .../CodeGenCUDA/increment-index-for-thunks.cu | 22 +-- 1 file changed, 15 insertions(+), 7 deletions(-) diff --git a/clang/test/CodeGenCUDA/increment-index-for-thunks.cu b/clang/test/CodeGenCUDA/increment-index-for-thunks.cu index 65c929ad4ce7065..c4ad064f4d938ce 100644 --- a/clang/test/CodeGenCUDA/increment-index-for-thunks.cu +++ b/clang/test/CodeGenCUDA/increment-index-for-thunks.cu @@ -1,12 +1,20 @@ // RUN: %clang_cc1 -fcuda-is-device -triple amdgcn-amd-amdhsa -target-cpu gfx942 \ -// RUN: -emit-llvm -xhip %s -o - | FileCheck %s +// RUN: -emit-llvm -xhip %s -o - | FileCheck %s --check-prefix=GCN +// RUN: %clang_cc1 -fcuda-is-device -triple spirv64-amd-amdhsa \ +// RUN: -emit-llvm -xhip %s -o - | FileCheck %s --check-prefix=SPIRV -// CHECK: @_ZTV1C = linkonce_odr unnamed_addr addrspace(1) constant { [5 x ptr addrspace(1)], [4 x ptr addrspace(1)] } { [5 x ptr addrspace(1)] [ptr addrspace(1) null, ptr addrspace(1) null, ptr addrspace(1) addrspacecast (ptr @_ZN1B2f2Ev to ptr addrspace(1)), ptr addrspace(1) null, ptr addrspace(1) 
addrspacecast (ptr @_ZN1C2f1Ev to ptr addrspace(1))], [4 x ptr addrspace(1)] [ptr addrspace(1) inttoptr (i64 -8 to ptr addrspace(1)), ptr addrspace(1) null, ptr addrspace(1) null, ptr addrspace(1) addrspacecast (ptr @_ZN1C2f1Ev to ptr addrspace(1))] }, comdat, align 8 -// CHECK: @_ZTV1B = linkonce_odr unnamed_addr addrspace(1) constant { [3 x ptr addrspace(1)] } { [3 x ptr addrspace(1)] [ptr addrspace(1) null, ptr addrspace(1) null, ptr addrspace(1) addrspacecast (ptr @_ZN1B2f2Ev to ptr addrspace(1))] }, comdat, align 8 -// CHECK: @_ZTV1A = linkonce_odr unnamed_addr addrspace(1) constant { [4 x ptr addrspace(1)] } { [4 x ptr addrspace(1)] [ptr addrspace(1) null, ptr addrspace(1) null, ptr addrspace(1) null, ptr addrspace(1) addrspacecast (ptr @__cxa_pure_virtual to ptr addrspace(1))] }, comdat, align 8 -// CHECK: @__hip_cuid_ = addrspace(1) global i8 0 -// CHECK: @llvm.co
[clang] [CUDA] Increment VTable index for device thunks (PR #124989)
@@ -0,0 +1,41 @@
+// RUN: %clang_cc1 -fcuda-is-device -triple amdgcn-amd-amdhsa -target-cpu gfx942 \
+// RUN:   -emit-llvm -xhip %s -o - | FileCheck %s --check-prefix=GCN
+// RUN: %clang_cc1 -fcuda-is-device -triple spirv64-amd-amdhsa \
+// RUN:   -emit-llvm -xhip %s -o - | FileCheck %s --check-prefix=SPIRV
+
+// GCN: @_ZTV1C = linkonce_odr unnamed_addr addrspace(1) constant { [5 x ptr addrspace(1)], [4 x ptr addrspace(1)] } { [5 x ptr addrspace(1)] [ptr addrspace(1) null, ptr addrspace(1) null, ptr addrspace(1) addrspacecast (ptr @_ZN1B2f2Ev to ptr addrspace(1)), ptr addrspace(1) null, ptr addrspace(1) addrspacecast (ptr @_ZN1C2f1Ev to ptr addrspace(1))], [4 x ptr addrspace(1)] [ptr addrspace(1) inttoptr (i64 -8 to ptr addrspace(1)), ptr addrspace(1) null, ptr addrspace(1) null, ptr addrspace(1) addrspacecast (ptr @_ZThn8_N1C2f1Ev to ptr addrspace(1))] }, comdat, align 8
+// GCN: @_ZTV1B = linkonce_odr unnamed_addr addrspace(1) constant { [3 x ptr addrspace(1)] } { [3 x ptr addrspace(1)] [ptr addrspace(1) null, ptr addrspace(1) null, ptr addrspace(1) addrspacecast (ptr @_ZN1B2f2Ev to ptr addrspace(1))] }, comdat, align 8
+// GCN: @_ZTV1A = linkonce_odr unnamed_addr addrspace(1) constant { [4 x ptr addrspace(1)] } { [4 x ptr addrspace(1)] [ptr addrspace(1) null, ptr addrspace(1) null, ptr addrspace(1) null, ptr addrspace(1) addrspacecast (ptr @__cxa_pure_virtual to ptr addrspace(1))] }, comdat, align 8
+// GCN: @__hip_cuid_ = addrspace(1) global i8 0

gandhi56 wrote:

I removed them in https://github.com/llvm/llvm-project/pull/124983. I will rebase this PR as soon as https://github.com/llvm/llvm-project/pull/124983 is merged in.

https://github.com/llvm/llvm-project/pull/124989
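The secondary vtable in the `@_ZTV1C` check line above ends with `_ZThn8_N1C2f1Ev`, a non-virtual thunk to `C::f1` that adjusts `this` by -8 before calling the real function. The adjustment exists because the `A` subobject does not sit at the start of a `C` object. The host-only C++ sketch below shows this with the same class shapes as the test (minus the device attributes and the pure virtual host function); `offsetOfA` and `callThroughA` are illustrative helpers, and the exact offset is an ABI detail, assumed here to be 8 only on 64-bit Itanium-style targets.

```cpp
#include <cassert>
#include <cstddef>

struct A {
  virtual int f1() = 0;
};

struct B {
  virtual int f2() { return 2; }
};

// B is the primary base, so the A subobject lives at a non-zero offset.
struct C : public B, public A {
  int f1() override { return 1; }
};

// Byte distance from the start of a C object to its A subobject.
std::ptrdiff_t offsetOfA(C &Obj) {
  return reinterpret_cast<char *>(static_cast<A *>(&Obj)) -
         reinterpret_cast<char *>(&Obj);
}

// A call through A& dispatches via the secondary vtable; the thunk there
// moves `this` back to the start of the C object before running C::f1.
int callThroughA(A &Ref) { return Ref.f1(); }
```

If the secondary vtable pointed at `_ZN1C2f1Ev` directly, as in the precommit check lines, `C::f1` would run with `this` still pointing at the `A` subobject, which appears to be the mismatch the vtable index fix addresses.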
[clang] [CUDA] Precommit test for VTable codegen (PR #124983)
https://github.com/gandhi56 updated https://github.com/llvm/llvm-project/pull/124983 >From 0e2317ae0ef1377bc461e7e461bf3b699d75014d Mon Sep 17 00:00:00 2001 From: Anshil Gandhi Date: Tue, 28 Jan 2025 18:04:44 -0600 Subject: [PATCH 1/2] [CUDA] Precommit test for VTable codegen --- .../CodeGenCUDA/increment-index-for-thunks.cu | 33 +++ 1 file changed, 33 insertions(+) create mode 100644 clang/test/CodeGenCUDA/increment-index-for-thunks.cu diff --git a/clang/test/CodeGenCUDA/increment-index-for-thunks.cu b/clang/test/CodeGenCUDA/increment-index-for-thunks.cu new file mode 100644 index 00..65c929ad4ce706 --- /dev/null +++ b/clang/test/CodeGenCUDA/increment-index-for-thunks.cu @@ -0,0 +1,33 @@ +// RUN: %clang_cc1 -fcuda-is-device -triple amdgcn-amd-amdhsa -target-cpu gfx942 \ +// RUN: -emit-llvm -xhip %s -o - | FileCheck %s + +// CHECK: @_ZTV1C = linkonce_odr unnamed_addr addrspace(1) constant { [5 x ptr addrspace(1)], [4 x ptr addrspace(1)] } { [5 x ptr addrspace(1)] [ptr addrspace(1) null, ptr addrspace(1) null, ptr addrspace(1) addrspacecast (ptr @_ZN1B2f2Ev to ptr addrspace(1)), ptr addrspace(1) null, ptr addrspace(1) addrspacecast (ptr @_ZN1C2f1Ev to ptr addrspace(1))], [4 x ptr addrspace(1)] [ptr addrspace(1) inttoptr (i64 -8 to ptr addrspace(1)), ptr addrspace(1) null, ptr addrspace(1) null, ptr addrspace(1) addrspacecast (ptr @_ZN1C2f1Ev to ptr addrspace(1))] }, comdat, align 8 +// CHECK: @_ZTV1B = linkonce_odr unnamed_addr addrspace(1) constant { [3 x ptr addrspace(1)] } { [3 x ptr addrspace(1)] [ptr addrspace(1) null, ptr addrspace(1) null, ptr addrspace(1) addrspacecast (ptr @_ZN1B2f2Ev to ptr addrspace(1))] }, comdat, align 8 +// CHECK: @_ZTV1A = linkonce_odr unnamed_addr addrspace(1) constant { [4 x ptr addrspace(1)] } { [4 x ptr addrspace(1)] [ptr addrspace(1) null, ptr addrspace(1) null, ptr addrspace(1) null, ptr addrspace(1) addrspacecast (ptr @__cxa_pure_virtual to ptr addrspace(1))] }, comdat, align 8 +// CHECK: @__hip_cuid_ = addrspace(1) global i8 0 
+// CHECK: @llvm.compiler.used = appending addrspace(1) global [1 x ptr] [ptr addrspacecast (ptr addrspace(1) @__hip_cuid_ to ptr)], section "llvm.metadata" +// CHECK: @__oclc_ABI_version = weak_odr hidden local_unnamed_addr addrspace(4) constant i32 500 + +struct A { + __attribute__((device)) A() { } + virtual void neither_device_nor_host_f() = 0 ; + __attribute__((device)) virtual void f1() = 0; + +}; + +struct B { + __attribute__((device)) B() { } + __attribute__((device)) virtual void f2() { }; +}; + +struct C : public B, public A { + __attribute__((device)) C() : B(), A() { } + + virtual void neither_device_nor_host_f() override { } + __attribute__((device)) virtual void f1() override { } + +}; + +__attribute__((device)) void test() { + C obj; +} >From 96111e075df61e6e137f0f0c9c6b2aaefb5beca2 Mon Sep 17 00:00:00 2001 From: Anshil Gandhi Date: Wed, 29 Jan 2025 15:06:45 -0600 Subject: [PATCH 2/2] - Test for SPIRV codegen as well --- .../CodeGenCUDA/increment-index-for-thunks.cu | 22 +-- 1 file changed, 15 insertions(+), 7 deletions(-) diff --git a/clang/test/CodeGenCUDA/increment-index-for-thunks.cu b/clang/test/CodeGenCUDA/increment-index-for-thunks.cu index 65c929ad4ce706..c4ad064f4d938c 100644 --- a/clang/test/CodeGenCUDA/increment-index-for-thunks.cu +++ b/clang/test/CodeGenCUDA/increment-index-for-thunks.cu @@ -1,12 +1,20 @@ // RUN: %clang_cc1 -fcuda-is-device -triple amdgcn-amd-amdhsa -target-cpu gfx942 \ -// RUN: -emit-llvm -xhip %s -o - | FileCheck %s +// RUN: -emit-llvm -xhip %s -o - | FileCheck %s --check-prefix=GCN +// RUN: %clang_cc1 -fcuda-is-device -triple spirv64-amd-amdhsa \ +// RUN: -emit-llvm -xhip %s -o - | FileCheck %s --check-prefix=SPIRV -// CHECK: @_ZTV1C = linkonce_odr unnamed_addr addrspace(1) constant { [5 x ptr addrspace(1)], [4 x ptr addrspace(1)] } { [5 x ptr addrspace(1)] [ptr addrspace(1) null, ptr addrspace(1) null, ptr addrspace(1) addrspacecast (ptr @_ZN1B2f2Ev to ptr addrspace(1)), ptr addrspace(1) null, ptr addrspace(1) 
addrspacecast (ptr @_ZN1C2f1Ev to ptr addrspace(1))], [4 x ptr addrspace(1)] [ptr addrspace(1) inttoptr (i64 -8 to ptr addrspace(1)), ptr addrspace(1) null, ptr addrspace(1) null, ptr addrspace(1) addrspacecast (ptr @_ZN1C2f1Ev to ptr addrspace(1))] }, comdat, align 8 -// CHECK: @_ZTV1B = linkonce_odr unnamed_addr addrspace(1) constant { [3 x ptr addrspace(1)] } { [3 x ptr addrspace(1)] [ptr addrspace(1) null, ptr addrspace(1) null, ptr addrspace(1) addrspacecast (ptr @_ZN1B2f2Ev to ptr addrspace(1))] }, comdat, align 8 -// CHECK: @_ZTV1A = linkonce_odr unnamed_addr addrspace(1) constant { [4 x ptr addrspace(1)] } { [4 x ptr addrspace(1)] [ptr addrspace(1) null, ptr addrspace(1) null, ptr addrspace(1) null, ptr addrspace(1) addrspacecast (ptr @__cxa_pure_virtual to ptr addrspace(1))] }, comdat, align 8 -// CHECK: @__hip_cuid_ = addrspace(1) global i8 0 -// CHECK: @llvm.compil
[clang] [CUDA] Precommit test for VTable codegen (PR #124983)
https://github.com/gandhi56 edited https://github.com/llvm/llvm-project/pull/124983
[clang] [CUDA] Increment VTable index for device thunks (PR #124989)
https://github.com/gandhi56 ready_for_review https://github.com/llvm/llvm-project/pull/124989
[clang] [NFC][Clang] Precommit test for VTable codegen (PR #124983)
https://github.com/gandhi56 edited https://github.com/llvm/llvm-project/pull/124983
[clang] [CUDA] Increment VTable index for device thunks (PR #124989)
https://github.com/gandhi56 created https://github.com/llvm/llvm-project/pull/124989

None

>From 0e2317ae0ef1377bc461e7e461bf3b699d75014d Mon Sep 17 00:00:00 2001
From: Anshil Gandhi
Date: Tue, 28 Jan 2025 18:04:44 -0600
Subject: [PATCH 1/3] [CUDA] Precommit test for VTable codegen

---
 .../CodeGenCUDA/increment-index-for-thunks.cu | 33 +++
 1 file changed, 33 insertions(+)
 create mode 100644 clang/test/CodeGenCUDA/increment-index-for-thunks.cu

diff --git a/clang/test/CodeGenCUDA/increment-index-for-thunks.cu b/clang/test/CodeGenCUDA/increment-index-for-thunks.cu
new file mode 100644
index 00..65c929ad4ce706
--- /dev/null
+++ b/clang/test/CodeGenCUDA/increment-index-for-thunks.cu
@@ -0,0 +1,33 @@
+// RUN: %clang_cc1 -fcuda-is-device -triple amdgcn-amd-amdhsa -target-cpu gfx942 \
+// RUN: -emit-llvm -xhip %s -o - | FileCheck %s
+
+// CHECK: @_ZTV1C = linkonce_odr unnamed_addr addrspace(1) constant { [5 x ptr addrspace(1)], [4 x ptr addrspace(1)] } { [5 x ptr addrspace(1)] [ptr addrspace(1) null, ptr addrspace(1) null, ptr addrspace(1) addrspacecast (ptr @_ZN1B2f2Ev to ptr addrspace(1)), ptr addrspace(1) null, ptr addrspace(1) addrspacecast (ptr @_ZN1C2f1Ev to ptr addrspace(1))], [4 x ptr addrspace(1)] [ptr addrspace(1) inttoptr (i64 -8 to ptr addrspace(1)), ptr addrspace(1) null, ptr addrspace(1) null, ptr addrspace(1) addrspacecast (ptr @_ZN1C2f1Ev to ptr addrspace(1))] }, comdat, align 8
+// CHECK: @_ZTV1B = linkonce_odr unnamed_addr addrspace(1) constant { [3 x ptr addrspace(1)] } { [3 x ptr addrspace(1)] [ptr addrspace(1) null, ptr addrspace(1) null, ptr addrspace(1) addrspacecast (ptr @_ZN1B2f2Ev to ptr addrspace(1))] }, comdat, align 8
+// CHECK: @_ZTV1A = linkonce_odr unnamed_addr addrspace(1) constant { [4 x ptr addrspace(1)] } { [4 x ptr addrspace(1)] [ptr addrspace(1) null, ptr addrspace(1) null, ptr addrspace(1) null, ptr addrspace(1) addrspacecast (ptr @__cxa_pure_virtual to ptr addrspace(1))] }, comdat, align 8
+// CHECK: @__hip_cuid_ = addrspace(1) global i8 0
+// CHECK: @llvm.compiler.used = appending addrspace(1) global [1 x ptr] [ptr addrspacecast (ptr addrspace(1) @__hip_cuid_ to ptr)], section "llvm.metadata"
+// CHECK: @__oclc_ABI_version = weak_odr hidden local_unnamed_addr addrspace(4) constant i32 500
+
+struct A {
+  __attribute__((device)) A() { }
+  virtual void neither_device_nor_host_f() = 0 ;
+  __attribute__((device)) virtual void f1() = 0;
+
+};
+
+struct B {
+  __attribute__((device)) B() { }
+  __attribute__((device)) virtual void f2() { };
+};
+
+struct C : public B, public A {
+  __attribute__((device)) C() : B(), A() { }
+
+  virtual void neither_device_nor_host_f() override { }
+  __attribute__((device)) virtual void f1() override { }
+
+};
+
+__attribute__((device)) void test() {
+  C obj;
+}

>From 96111e075df61e6e137f0f0c9c6b2aaefb5beca2 Mon Sep 17 00:00:00 2001
From: Anshil Gandhi
Date: Wed, 29 Jan 2025 15:06:45 -0600
Subject: [PATCH 2/3] - Test for SPIRV codegen as well

---
 .../CodeGenCUDA/increment-index-for-thunks.cu | 22 +--
 1 file changed, 15 insertions(+), 7 deletions(-)

diff --git a/clang/test/CodeGenCUDA/increment-index-for-thunks.cu b/clang/test/CodeGenCUDA/increment-index-for-thunks.cu
index 65c929ad4ce706..c4ad064f4d938c 100644
--- a/clang/test/CodeGenCUDA/increment-index-for-thunks.cu
+++ b/clang/test/CodeGenCUDA/increment-index-for-thunks.cu
@@ -1,12 +1,20 @@
 // RUN: %clang_cc1 -fcuda-is-device -triple amdgcn-amd-amdhsa -target-cpu gfx942 \
-// RUN: -emit-llvm -xhip %s -o - | FileCheck %s
+// RUN: -emit-llvm -xhip %s -o - | FileCheck %s --check-prefix=GCN
+// RUN: %clang_cc1 -fcuda-is-device -triple spirv64-amd-amdhsa \
+// RUN: -emit-llvm -xhip %s -o - | FileCheck %s --check-prefix=SPIRV

-// CHECK: @_ZTV1C = linkonce_odr unnamed_addr addrspace(1) constant { [5 x ptr addrspace(1)], [4 x ptr addrspace(1)] } { [5 x ptr addrspace(1)] [ptr addrspace(1) null, ptr addrspace(1) null, ptr addrspace(1) addrspacecast (ptr @_ZN1B2f2Ev to ptr addrspace(1)), ptr addrspace(1) null, ptr addrspace(1) addrspacecast (ptr @_ZN1C2f1Ev to ptr addrspace(1))], [4 x ptr addrspace(1)] [ptr addrspace(1) inttoptr (i64 -8 to ptr addrspace(1)), ptr addrspace(1) null, ptr addrspace(1) null, ptr addrspace(1) addrspacecast (ptr @_ZN1C2f1Ev to ptr addrspace(1))] }, comdat, align 8
-// CHECK: @_ZTV1B = linkonce_odr unnamed_addr addrspace(1) constant { [3 x ptr addrspace(1)] } { [3 x ptr addrspace(1)] [ptr addrspace(1) null, ptr addrspace(1) null, ptr addrspace(1) addrspacecast (ptr @_ZN1B2f2Ev to ptr addrspace(1))] }, comdat, align 8
-// CHECK: @_ZTV1A = linkonce_odr unnamed_addr addrspace(1) constant { [4 x ptr addrspace(1)] } { [4 x ptr addrspace(1)] [ptr addrspace(1) null, ptr addrspace(1) null, ptr addrspace(1) null, ptr addrspace(1) addrspacecast (ptr @__cxa_pure_virtual to ptr addrspace(1))] }, comdat, align 8
-// CHECK: @__hip_cuid_ = addrspace(1) global i8 0
-// CHECK: @llvm.
[clang] [CUDA] Precommit test for VTable codegen (PR #124983)
https://github.com/gandhi56 created https://github.com/llvm/llvm-project/pull/124983

None

>From 0e2317ae0ef1377bc461e7e461bf3b699d75014d Mon Sep 17 00:00:00 2001
From: Anshil Gandhi
Date: Tue, 28 Jan 2025 18:04:44 -0600
Subject: [PATCH] [CUDA] Precommit test for VTable codegen

---
 .../CodeGenCUDA/increment-index-for-thunks.cu | 33 +++
 1 file changed, 33 insertions(+)
 create mode 100644 clang/test/CodeGenCUDA/increment-index-for-thunks.cu

diff --git a/clang/test/CodeGenCUDA/increment-index-for-thunks.cu b/clang/test/CodeGenCUDA/increment-index-for-thunks.cu
new file mode 100644
index 00..65c929ad4ce706
--- /dev/null
+++ b/clang/test/CodeGenCUDA/increment-index-for-thunks.cu
@@ -0,0 +1,33 @@
+// RUN: %clang_cc1 -fcuda-is-device -triple amdgcn-amd-amdhsa -target-cpu gfx942 \
+// RUN: -emit-llvm -xhip %s -o - | FileCheck %s
+
+// CHECK: @_ZTV1C = linkonce_odr unnamed_addr addrspace(1) constant { [5 x ptr addrspace(1)], [4 x ptr addrspace(1)] } { [5 x ptr addrspace(1)] [ptr addrspace(1) null, ptr addrspace(1) null, ptr addrspace(1) addrspacecast (ptr @_ZN1B2f2Ev to ptr addrspace(1)), ptr addrspace(1) null, ptr addrspace(1) addrspacecast (ptr @_ZN1C2f1Ev to ptr addrspace(1))], [4 x ptr addrspace(1)] [ptr addrspace(1) inttoptr (i64 -8 to ptr addrspace(1)), ptr addrspace(1) null, ptr addrspace(1) null, ptr addrspace(1) addrspacecast (ptr @_ZN1C2f1Ev to ptr addrspace(1))] }, comdat, align 8
+// CHECK: @_ZTV1B = linkonce_odr unnamed_addr addrspace(1) constant { [3 x ptr addrspace(1)] } { [3 x ptr addrspace(1)] [ptr addrspace(1) null, ptr addrspace(1) null, ptr addrspace(1) addrspacecast (ptr @_ZN1B2f2Ev to ptr addrspace(1))] }, comdat, align 8
+// CHECK: @_ZTV1A = linkonce_odr unnamed_addr addrspace(1) constant { [4 x ptr addrspace(1)] } { [4 x ptr addrspace(1)] [ptr addrspace(1) null, ptr addrspace(1) null, ptr addrspace(1) null, ptr addrspace(1) addrspacecast (ptr @__cxa_pure_virtual to ptr addrspace(1))] }, comdat, align 8
+// CHECK: @__hip_cuid_ = addrspace(1) global i8 0
+// CHECK: @llvm.compiler.used = appending addrspace(1) global [1 x ptr] [ptr addrspacecast (ptr addrspace(1) @__hip_cuid_ to ptr)], section "llvm.metadata"
+// CHECK: @__oclc_ABI_version = weak_odr hidden local_unnamed_addr addrspace(4) constant i32 500
+
+struct A {
+  __attribute__((device)) A() { }
+  virtual void neither_device_nor_host_f() = 0 ;
+  __attribute__((device)) virtual void f1() = 0;
+
+};
+
+struct B {
+  __attribute__((device)) B() { }
+  __attribute__((device)) virtual void f2() { };
+};
+
+struct C : public B, public A {
+  __attribute__((device)) C() : B(), A() { }
+
+  virtual void neither_device_nor_host_f() override { }
+  __attribute__((device)) virtual void f1() override { }
+
+};
+
+__attribute__((device)) void test() {
+  C obj;
+}

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [CUDA] Emit NULL in VTable in the last resort (PR #124687)
https://github.com/gandhi56 closed https://github.com/llvm/llvm-project/pull/124687
[clang] [CUDA] Increment VTable index for device thunks (PR #124989)
gandhi56 wrote: Internal tests have passed. https://github.com/llvm/llvm-project/pull/124989
[clang] [CUDA] Increment VTable index for device thunks (PR #124989)
https://github.com/gandhi56 closed https://github.com/llvm/llvm-project/pull/124989
[clang] [CUDA] Increment VTable index for device thunks (PR #124989)
gandhi56 wrote: Appreciate the review, @yxsamliu https://github.com/llvm/llvm-project/pull/124989
[clang] [NFC][Clang] Precommit test for VTable codegen (PR #124983)
gandhi56 wrote: @yxsamliu will you approve this PR since you have approved #124989? Thanks. https://github.com/llvm/llvm-project/pull/124983
[clang] [NFC][Clang] Precommit test for VTable codegen (PR #124983)
https://github.com/gandhi56 closed https://github.com/llvm/llvm-project/pull/124983
[clang] [CUDA] Increment VTable index for device thunks (PR #124989)
gandhi56 wrote: ping https://github.com/llvm/llvm-project/pull/124989