https://github.com/jhuber6 created https://github.com/llvm/llvm-project/pull/98209
Summary:
Currently, the GPU gets its math by using wrapper headers that eagerly
replace libcalls with calls to the vendor's math library, e.g.

```
// __clang_cuda_math.h
[[gnu::always_inline]] double sin(double __x) { return __nv_sin(__x); }
```

However, we want to be able to move away from including these headers.
When these headers are not included, the lack of `errno` on the GPU
targets allows these calls to be transformed into intrinsic calls. These
intrinsic calls may then not be supported by the backend, see
https://godbolt.org/z/oKvTevaE1.

Even when these functions are supported, we still want to emit regular
libcalls for now so that LTO linking can replace the calls before they
reach the backend. This patch simply changes the logic to stop emitting
intrinsics for the standard math library functions. This means that `sin`
will not be lowered to an intrinsic, but `__builtin_sin` will. A better
long-term solution would be a pass that custom-lowers all of these before
LTO linking, if possible.

>From d6927d897b990c018bd5bac868de5aa406d878ab Mon Sep 17 00:00:00 2001
From: Joseph Huber <hube...@outlook.com>
Date: Tue, 9 Jul 2024 14:43:55 -0500
Subject: [PATCH] [Clang] Do not emit intrinsic math functions on GPU targets

Summary:
Currently, the GPU gets its math by using wrapper headers that eagerly
replace libcalls with calls to the vendor's math library, e.g.

```
// __clang_cuda_math.h
[[gnu::always_inline]] double sin(double __x) { return __nv_sin(__x); }
```

However, we want to be able to move away from including these headers.
When these headers are not included, the lack of `errno` on the GPU
targets allows these calls to be transformed into intrinsic calls. These
intrinsic calls may then not be supported by the backend, see
https://godbolt.org/z/oKvTevaE1.
Even when these functions are supported, we still want to emit regular
libcalls for now so that LTO linking can replace the calls before they
reach the backend. This patch simply changes the logic to stop emitting
intrinsics for the standard math library functions. This means that `sin`
will not be lowered to an intrinsic, but `__builtin_sin` will. A better
long-term solution would be a pass that custom-lowers all of these before
LTO linking, if possible.
---
 clang/lib/CodeGen/CGBuiltin.cpp        |  6 +++
 clang/test/CodeGen/gpu-math-libcalls.c | 51 ++++++++++++++++++++++++++
 2 files changed, 57 insertions(+)
 create mode 100644 clang/test/CodeGen/gpu-math-libcalls.c

diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp
index 6cc0d9485720c..89c27147a2bd9 100644
--- a/clang/lib/CodeGen/CGBuiltin.cpp
+++ b/clang/lib/CodeGen/CGBuiltin.cpp
@@ -2637,6 +2637,12 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
     GenerateIntrinsics =
         ConstWithoutErrnoOrExceptions && ErrnoOverridenToFalseWithOpt;
   }
+  // The GPU targets do not want math intrinsics to reach the backend.
+  // TODO: We should add a custom pass to lower these early enough for LTO.
+  if (getTarget().getTriple().isNVPTX() || getTarget().getTriple().isAMDGPU())
+    GenerateIntrinsics = !getContext().BuiltinInfo.isPredefinedLibFunction(
+        BuiltinIDIfNoAsmLabel);
+
   if (GenerateIntrinsics) {
     switch (BuiltinIDIfNoAsmLabel) {
     case Builtin::BIceil:
diff --git a/clang/test/CodeGen/gpu-math-libcalls.c b/clang/test/CodeGen/gpu-math-libcalls.c
new file mode 100644
index 0000000000000..436ad0384ee2d
--- /dev/null
+++ b/clang/test/CodeGen/gpu-math-libcalls.c
@@ -0,0 +1,51 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --version 5
+// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa %s -emit-llvm -o - | FileCheck %s --check-prefix AMDGPU
+// RUN: %clang_cc1 -triple nvptx64-nvidia-cuda %s -emit-llvm -o - | FileCheck %s --check-prefix NVPTX
+
+double sin(double);
+double cos(double);
+double sqrt(double);
+
+// AMDGPU-LABEL: define dso_local void @libcalls(
+// AMDGPU-SAME: ) #[[ATTR0:[0-9]+]] {
+// AMDGPU-NEXT:  [[ENTRY:.*:]]
+// AMDGPU-NEXT:    [[CALL:%.*]] = call double @sin(double noundef 0.000000e+00) #[[ATTR3:[0-9]+]]
+// AMDGPU-NEXT:    [[CALL1:%.*]] = call double @cos(double noundef 0.000000e+00) #[[ATTR3]]
+// AMDGPU-NEXT:    [[CALL2:%.*]] = call double @sqrt(double noundef 0.000000e+00) #[[ATTR3]]
+// AMDGPU-NEXT:    ret void
+//
+// NVPTX-LABEL: define dso_local void @libcalls(
+// NVPTX-SAME: ) #[[ATTR0:[0-9]+]] {
+// NVPTX-NEXT:  [[ENTRY:.*:]]
+// NVPTX-NEXT:    [[CALL:%.*]] = call double @sin(double noundef 0.000000e+00) #[[ATTR3:[0-9]+]]
+// NVPTX-NEXT:    [[CALL1:%.*]] = call double @cos(double noundef 0.000000e+00) #[[ATTR3]]
+// NVPTX-NEXT:    [[CALL2:%.*]] = call double @sqrt(double noundef 0.000000e+00) #[[ATTR3]]
+// NVPTX-NEXT:    ret void
+//
+void libcalls() {
+  (void)sin(0.);
+  (void)cos(0.);
+  (void)sqrt(0.);
+}
+
+// AMDGPU-LABEL: define dso_local void @builtins(
+// AMDGPU-SAME: ) #[[ATTR0]] {
+// AMDGPU-NEXT:  [[ENTRY:.*:]]
+// AMDGPU-NEXT:    [[TMP0:%.*]] = call double @llvm.sin.f64(double 0.000000e+00)
+// AMDGPU-NEXT:    [[TMP1:%.*]] = call double @llvm.cos.f64(double 0.000000e+00)
+// AMDGPU-NEXT:    [[TMP2:%.*]] = call double @llvm.sqrt.f64(double 0.000000e+00)
+// AMDGPU-NEXT:    ret void
+//
+// NVPTX-LABEL: define dso_local void @builtins(
+// NVPTX-SAME: ) #[[ATTR0]] {
+// NVPTX-NEXT:  [[ENTRY:.*:]]
+// NVPTX-NEXT:    [[TMP0:%.*]] = call double @llvm.sin.f64(double 0.000000e+00)
+// NVPTX-NEXT:    [[TMP1:%.*]] = call double @llvm.cos.f64(double 0.000000e+00)
+// NVPTX-NEXT:    [[TMP2:%.*]] = call double @llvm.sqrt.f64(double 0.000000e+00)
+// NVPTX-NEXT:    ret void
+//
+void builtins() {
+  (void)__builtin_sin(0.);
+  (void)__builtin_cos(0.);
+  (void)__builtin_sqrt(0.);
+}

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits