llvmbot wrote:
<!--LLVM PR SUMMARY COMMENT-->

@llvm/pr-subscribers-clang

Author: Lewis Crawford (LewisCrawford)

<details>
<summary>Changes</summary>

CUDA semantics specify that clz(0) = bitwidth, so clang should emit clz / ctz intrinsics for NVPTX with zero-is-poison = false.

---
Full diff: https://github.com/llvm/llvm-project/pull/185630.diff

2 Files Affected:

- (modified) clang/lib/Basic/Targets/NVPTX.h (+2)
- (added) clang/test/CodeGenCUDA/builtin-count-zeros-nvptx.cu (+12)

``````````diff
diff --git a/clang/lib/Basic/Targets/NVPTX.h b/clang/lib/Basic/Targets/NVPTX.h
index 6f8df323f379c..7921a042e9e9b 100644
--- a/clang/lib/Basic/Targets/NVPTX.h
+++ b/clang/lib/Basic/Targets/NVPTX.h
@@ -83,6 +83,8 @@ class LLVM_LIBRARY_VISIBILITY NVPTXTargetInfo : public TargetInfo {
 
   bool useFP16ConversionIntrinsics() const override { return false; }
 
+  bool isCLZForZeroUndef() const override { return false; }
+
   bool initFeatureMap(llvm::StringMap<bool> &Features,
                       DiagnosticsEngine &Diags,
                       StringRef CPU,
diff --git a/clang/test/CodeGenCUDA/builtin-count-zeros-nvptx.cu b/clang/test/CodeGenCUDA/builtin-count-zeros-nvptx.cu
new file mode 100644
index 0000000000000..f003b32ca73b2
--- /dev/null
+++ b/clang/test/CodeGenCUDA/builtin-count-zeros-nvptx.cu
@@ -0,0 +1,12 @@
+// REQUIRES: nvptx-registered-target
+// RUN: %clang_cc1 -x cuda -triple nvptx64-unknown-unknown -fcuda-is-device -emit-llvm %s -o - | FileCheck %s
+//
+// Ensure NVPTX uses isCLZForZeroUndef() = false (CUDA semantics: CLZ(i32 0) == 32).
+
+#include "Inputs/cuda.h"
+
+__device__ int f(int x) {
+  return __builtin_ctz(x) + __builtin_clz(x);
+}
+// CHECK: call i32 @llvm.cttz.i32({{.*}}, i1 false)
+// CHECK: call i32 @llvm.ctlz.i32({{.*}}, i1 false)
``````````

</details>

https://github.com/llvm/llvm-project/pull/185630
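
For readers unfamiliar with the zero-is-poison flag, the following host-side C++ sketch illustrates the semantics the patch asks for on NVPTX: a zero input yields the bit width instead of an undefined result. The helper names `clz32_or_width` and `ctz32_or_width` are hypothetical and are not part of the patch. The explicit zero check is only needed here because generic host targets may still treat `__builtin_clz(0)` and `__builtin_ctz(0)` as undefined.

```cpp
#include <cstdint>

// Hypothetical helpers (not from the patch): count leading / trailing zeros
// of a 32-bit value, returning the bit width (32) when the input is zero.
// This mirrors the CUDA semantics quoted in the PR description.
constexpr int clz32_or_width(std::uint32_t x) {
  // Guard the zero case so the builtin is never called with 0.
  return x == 0 ? 32 : __builtin_clz(x);
}

constexpr int ctz32_or_width(std::uint32_t x) {
  // Same guard for the trailing-zero count.
  return x == 0 ? 32 : __builtin_ctz(x);
}

// Compile-time checks of the intended behaviour.
static_assert(clz32_or_width(0u) == 32, "clz(0) is the bit width");
static_assert(ctz32_or_width(0u) == 32, "ctz(0) is the bit width");
static_assert(clz32_or_width(1u) == 31, "only the lowest bit is set");
static_assert(ctz32_or_width(8u) == 3, "bit 3 is the lowest set bit");
```

With `isCLZForZeroUndef()` returning `false`, the guard would be unnecessary in NVPTX device code, since the intrinsics are emitted with `i1 false` and a zero input is well defined.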
