[Lldb-commits] [flang] [clang] [lldb] [libc] [compiler-rt] [clang-tools-extra] [lld] [llvm] [libcxx] [openmp] Gcc 75 libomptarget type convert (PR #75562)
@@ -47,7 +47,9 @@ PluginAdaptorTy::create(const std::string &Name) {
       new PluginAdaptorTy(Name, std::move(LibraryHandler)));
   if (auto Err = PluginAdaptor->init())
     return Err;
-  return PluginAdaptor;

jhuber6 wrote: Does putting `std::move` here not work?

https://github.com/llvm/llvm-project/pull/75562 ___ lldb-commits mailing list lldb-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits
[Lldb-commits] [lld] [libc] [clang-tools-extra] [compiler-rt] [lldb] [llvm] [flang] [libcxx] [openmp] [clang] [PGO][OpenMP] Instrumentation for GPU devices (PR #76587)
@@ -163,3 +163,87 @@ Error GenericGlobalHandlerTy::readGlobalFromImage(GenericDeviceTy &Device,
   return Plugin::success();
 }
+
+bool GenericGlobalHandlerTy::hasProfilingGlobals(GenericDeviceTy &Device,
+                                                 DeviceImageTy &Image) {
+  GlobalTy global(getInstrProfNamesVarName().str(), 0);
+  if (auto Err = getGlobalMetadataFromImage(Device, Image, global)) {
+    consumeError(std::move(Err));
+    return false;
+  }
+  return true;
+}
+
+Expected<GPUProfGlobals>
+GenericGlobalHandlerTy::readProfilingGlobals(GenericDeviceTy &Device,
+                                             DeviceImageTy &Image) {
+  GPUProfGlobals profdata;
+  auto ELFObj = getELFObjectFile(Image);
+  if (!ELFObj)
+    return ELFObj.takeError();
+  profdata.targetTriple = ELFObj->makeTriple();

jhuber6 wrote: Made a patch in https://github.com/llvm/llvm-project/pull/76992 and https://github.com/llvm/llvm-project/pull/76970 to make this actually work.

https://github.com/llvm/llvm-project/pull/76587
[Lldb-commits] [libc] [clang] [lld] [clang-tools-extra] [compiler-rt] [flang] [lldb] [libcxx] [llvm] [openmp] [PGO][OpenMP] Instrumentation for GPU devices (PR #76587)
@@ -163,3 +163,87 @@ Error GenericGlobalHandlerTy::readGlobalFromImage(GenericDeviceTy &Device,
   return Plugin::success();
 }
+
+bool GenericGlobalHandlerTy::hasProfilingGlobals(GenericDeviceTy &Device,
+                                                 DeviceImageTy &Image) {
+  GlobalTy global(getInstrProfNamesVarName().str(), 0);
+  if (auto Err = getGlobalMetadataFromImage(Device, Image, global)) {
+    consumeError(std::move(Err));
+    return false;
+  }
+  return true;
+}
+
+Expected<GPUProfGlobals>
+GenericGlobalHandlerTy::readProfilingGlobals(GenericDeviceTy &Device,
+                                             DeviceImageTy &Image) {
+  GPUProfGlobals profdata;
+  auto ELFObj = getELFObjectFile(Image);
+  if (!ELFObj)
+    return ELFObj.takeError();
+  profdata.targetTriple = ELFObj->makeTriple();
+  // Iterate through elf symbols
+  for (auto &sym : ELFObj->symbols()) {
+    if (auto name = sym.getName()) {
+      // Check if given current global is a profiling global based
+      // on name
+      if (name->equals(getInstrProfNamesVarName())) {
+        // Read in profiled function names
+        std::vector<char> chars(sym.getSize() / sizeof(char), ' ');

jhuber6 wrote: Why are we turning this into a vector of chars? Also isn't `sizeof(char)` pretty much always going to be `1`?

https://github.com/llvm/llvm-project/pull/76587
[Lldb-commits] [compiler-rt] [lldb] [openmp] [llvm] [clang-tools-extra] [lld] [flang] [clang] [libcxx] [libc] [PGO][OpenMP] Instrumentation for GPU devices (PR #76587)
@@ -163,3 +163,87 @@ Error GenericGlobalHandlerTy::readGlobalFromImage(GenericDeviceTy &Device,
   return Plugin::success();
 }
+
+bool GenericGlobalHandlerTy::hasProfilingGlobals(GenericDeviceTy &Device,
+                                                 DeviceImageTy &Image) {
+  GlobalTy global(getInstrProfNamesVarName().str(), 0);
+  if (auto Err = getGlobalMetadataFromImage(Device, Image, global)) {
+    consumeError(std::move(Err));
+    return false;
+  }
+  return true;
+}
+
+Expected<GPUProfGlobals>
+GenericGlobalHandlerTy::readProfilingGlobals(GenericDeviceTy &Device,
+                                             DeviceImageTy &Image) {
+  GPUProfGlobals profdata;
+  auto ELFObj = getELFObjectFile(Image);
+  if (!ELFObj)
+    return ELFObj.takeError();
+  profdata.targetTriple = ELFObj->makeTriple();
+  // Iterate through elf symbols
+  for (auto &sym : ELFObj->symbols()) {
+    if (auto name = sym.getName()) {
+      // Check if given current global is a profiling global based
+      // on name
+      if (name->equals(getInstrProfNamesVarName())) {
+        // Read in profiled function names
+        std::vector<char> chars(sym.getSize() / sizeof(char), ' ');
+        GlobalTy NamesGlobal(name->str(), sym.getSize(), chars.data());
+        if (auto Err = readGlobalFromDevice(Device, Image, NamesGlobal))
+          return Err;
+        std::string names(chars.begin(), chars.end());
+        profdata.names = std::move(names);
+      } else if (name->starts_with(getInstrProfCountersVarPrefix())) {
+        // Read global variable profiling counts
+        std::vector<int64_t> counts(sym.getSize() / sizeof(int64_t), 0);
+        GlobalTy CountGlobal(name->str(), sym.getSize(), counts.data());
+        if (auto Err = readGlobalFromDevice(Device, Image, CountGlobal))
+          return Err;
+        profdata.counts.push_back(std::move(counts));
+      } else if (name->starts_with(getInstrProfDataVarPrefix())) {
+        // Read profiling data for this global variable
+        __llvm_profile_data data{};
+        GlobalTy DataGlobal(name->str(), sym.getSize(), &data);
+        if (auto Err = readGlobalFromDevice(Device, Image, DataGlobal))
+          return Err;
+        profdata.data.push_back(std::move(data));
+      }
+    }
+  }
+  return profdata;
+}

jhuber6 wrote: LLVM style for everything here.

https://github.com/llvm/llvm-project/pull/76587
[Lldb-commits] [compiler-rt] [clang] [clang-tools-extra] [flang] [llvm] [libcxx] [lld] [lldb] [libc] [openmp] [PGO][OpenMP] Instrumentation for GPU devices (PR #76587)
@@ -58,6 +60,22 @@ class GlobalTy {
   void setPtr(void *P) { Ptr = P; }
 };
 
+typedef void *IntPtrT;

jhuber6 wrote: What's the utility of this?

https://github.com/llvm/llvm-project/pull/76587
[Lldb-commits] [lld] [clang-tools-extra] [openmp] [flang] [libc] [libcxx] [llvm] [lldb] [compiler-rt] [clang] [PGO][OpenMP] Instrumentation for GPU devices (PR #76587)
@@ -58,6 +60,22 @@ class GlobalTy {
   void setPtr(void *P) { Ptr = P; }
 };
 
+typedef void *IntPtrT;
+struct __llvm_profile_data {
+#define INSTR_PROF_DATA(Type, LLVMType, Name, Initializer) Type Name;
+#include "llvm/ProfileData/InstrProfData.inc"
+};
+
+/// PGO profiling data extracted from a GPU device
+struct GPUProfGlobals {
+  std::string names;
+  std::vector<std::vector<int64_t>> counts;
+  std::vector<__llvm_profile_data> data;
+  Triple targetTriple;
+

jhuber6 wrote: These should probably use LLVM structs. E.g. `StringRef` if the name is a constant string with stable storage, and `SmallVector`. I'd really appreciate some descriptions of how this is supposed to look and how it interacts with the existing profile data.

https://github.com/llvm/llvm-project/pull/76587
[Lldb-commits] [lld] [libcxx] [clang-tools-extra] [compiler-rt] [clang] [flang] [llvm] [libc] [openmp] [lldb] [PGO][OpenMP] Instrumentation for GPU devices (PR #76587)
@@ -163,3 +163,87 @@ Error GenericGlobalHandlerTy::readGlobalFromImage(GenericDeviceTy &Device,
   return Plugin::success();
 }
+
+bool GenericGlobalHandlerTy::hasProfilingGlobals(GenericDeviceTy &Device,
+                                                 DeviceImageTy &Image) {
+  GlobalTy global(getInstrProfNamesVarName().str(), 0);
+  if (auto Err = getGlobalMetadataFromImage(Device, Image, global)) {
+    consumeError(std::move(Err));
+    return false;
+  }
+  return true;
+}
+
+Expected<GPUProfGlobals>
+GenericGlobalHandlerTy::readProfilingGlobals(GenericDeviceTy &Device,
+                                             DeviceImageTy &Image) {
+  GPUProfGlobals profdata;

jhuber6 wrote:
```suggestion
  GPUProfGlobals ProfData;
```
LLVM style. Also not a fan of the name. Maybe `DeviceProfileData` or something.

https://github.com/llvm/llvm-project/pull/76587
[Lldb-commits] [libc] [openmp] [compiler-rt] [libcxx] [clang-tools-extra] [lld] [llvm] [clang] [flang] [lldb] [PGO][OpenMP] Instrumentation for GPU devices (PR #76587)
@@ -163,3 +163,87 @@ Error GenericGlobalHandlerTy::readGlobalFromImage(GenericDeviceTy &Device,
   return Plugin::success();
 }
+
+bool GenericGlobalHandlerTy::hasProfilingGlobals(GenericDeviceTy &Device,
+                                                 DeviceImageTy &Image) {
+  GlobalTy global(getInstrProfNamesVarName().str(), 0);
+  if (auto Err = getGlobalMetadataFromImage(Device, Image, global)) {
+    consumeError(std::move(Err));
+    return false;
+  }
+  return true;
+}
+
+Expected<GPUProfGlobals>
+GenericGlobalHandlerTy::readProfilingGlobals(GenericDeviceTy &Device,
+                                             DeviceImageTy &Image) {
+  GPUProfGlobals profdata;
+  auto ELFObj = getELFObjectFile(Image);
+  if (!ELFObj)
+    return ELFObj.takeError();
+  profdata.targetTriple = ELFObj->makeTriple();
+  // Iterate through elf symbols
+  for (auto &sym : ELFObj->symbols()) {
+    if (auto name = sym.getName()) {
+      // Check if given current global is a profiling global based
+      // on name
+      if (name->equals(getInstrProfNamesVarName())) {
+        // Read in profiled function names
+        std::vector<char> chars(sym.getSize() / sizeof(char), ' ');
+        GlobalTy NamesGlobal(name->str(), sym.getSize(), chars.data());
+        if (auto Err = readGlobalFromDevice(Device, Image, NamesGlobal))
+          return Err;
+        std::string names(chars.begin(), chars.end());
+        profdata.names = std::move(names);
+      } else if (name->starts_with(getInstrProfCountersVarPrefix())) {

jhuber6 wrote: Is the `getInstrProfCountersVarPrefix` function preexisting? I don't see it defined in this patch set.

https://github.com/llvm/llvm-project/pull/76587
[Lldb-commits] [lld] [lldb] [clang-tools-extra] [compiler-rt] [flang] [llvm] [clang] [libcxx] [openmp] [libc] [PGO][OpenMP] Instrumentation for GPU devices (PR #76587)
@@ -163,3 +163,87 @@ Error GenericGlobalHandlerTy::readGlobalFromImage(GenericDeviceTy &Device,
   return Plugin::success();
 }
+
+bool GenericGlobalHandlerTy::hasProfilingGlobals(GenericDeviceTy &Device,
+                                                 DeviceImageTy &Image) {
+  GlobalTy global(getInstrProfNamesVarName().str(), 0);
+  if (auto Err = getGlobalMetadataFromImage(Device, Image, global)) {
+    consumeError(std::move(Err));
+    return false;
+  }
+  return true;
+}
+
+Expected<GPUProfGlobals>
+GenericGlobalHandlerTy::readProfilingGlobals(GenericDeviceTy &Device,
+                                             DeviceImageTy &Image) {
+  GPUProfGlobals profdata;
+  auto ELFObj = getELFObjectFile(Image);
+  if (!ELFObj)
+    return ELFObj.takeError();
+  profdata.targetTriple = ELFObj->makeTriple();
+  // Iterate through elf symbols
+  for (auto &sym : ELFObj->symbols()) {
+    if (auto name = sym.getName()) {
+      // Check if given current global is a profiling global based
+      // on name
+      if (name->equals(getInstrProfNamesVarName())) {
+        // Read in profiled function names
+        std::vector<char> chars(sym.getSize() / sizeof(char), ' ');
+        GlobalTy NamesGlobal(name->str(), sym.getSize(), chars.data());

jhuber6 wrote: Okay, we're reading a string back from the device? What's the purpose of that? Also, just so you know, the ELF will only contain the correct size if it's emitted as an array. E.g.
```
const char a[] = "a"; // strlen("a") + 1 in ELF
const char *b = "b";  // sizeof(char *) in ELF
```

https://github.com/llvm/llvm-project/pull/76587
[Lldb-commits] [openmp] [lld] [clang-tools-extra] [libcxx] [llvm] [flang] [libc] [clang] [lldb] [compiler-rt] [PGO][OpenMP] Instrumentation for GPU devices (PR #76587)
@@ -163,3 +163,87 @@ Error GenericGlobalHandlerTy::readGlobalFromImage(GenericDeviceTy &Device,
   return Plugin::success();
 }
+
+bool GenericGlobalHandlerTy::hasProfilingGlobals(GenericDeviceTy &Device,
+                                                 DeviceImageTy &Image) {
+  GlobalTy global(getInstrProfNamesVarName().str(), 0);
+  if (auto Err = getGlobalMetadataFromImage(Device, Image, global)) {
+    consumeError(std::move(Err));
+    return false;
+  }
+  return true;
+}
+
+Expected<GPUProfGlobals>
+GenericGlobalHandlerTy::readProfilingGlobals(GenericDeviceTy &Device,
+                                             DeviceImageTy &Image) {
+  GPUProfGlobals profdata;
+  auto ELFObj = getELFObjectFile(Image);
+  if (!ELFObj)
+    return ELFObj.takeError();
+  profdata.targetTriple = ELFObj->makeTriple();
+  // Iterate through elf symbols
+  for (auto &sym : ELFObj->symbols()) {
+    if (auto name = sym.getName()) {

jhuber6 wrote: This is incorrect. If this returns an error, it will exit the `if`, call the destructor, and then crash the program because the error was not handled.

https://github.com/llvm/llvm-project/pull/76587
[Lldb-commits] [clang-tools-extra] [libc] [lldb] [openmp] [clang] [llvm] [flang] [compiler-rt] [libcxx] [lld] [PGO][OpenMP] Instrumentation for GPU devices (PR #76587)
@@ -58,6 +60,22 @@ class GlobalTy {
   void setPtr(void *P) { Ptr = P; }
 };
 
+typedef void *IntPtrT;
+struct __llvm_profile_data {
+#define INSTR_PROF_DATA(Type, LLVMType, Name, Initializer) Type Name;
+#include "llvm/ProfileData/InstrProfData.inc"
+};
+
+/// PGO profiling data extracted from a GPU device
+struct GPUProfGlobals {
+  std::string names;
+  std::vector<std::vector<int64_t>> counts;
+  std::vector<__llvm_profile_data> data;
+  Triple targetTriple;
+

jhuber6 wrote: All of them. `SmallVector` is basically a `std::vector` with small-size optimizations, like `std::string`.

https://github.com/llvm/llvm-project/pull/76587
[Lldb-commits] [clang-tools-extra] [libc] [lldb] [openmp] [clang] [llvm] [flang] [compiler-rt] [libcxx] [lld] [PGO][OpenMP] Instrumentation for GPU devices (PR #76587)
@@ -58,6 +60,22 @@ class GlobalTy {
   void setPtr(void *P) { Ptr = P; }
 };
 
+typedef void *IntPtrT;

jhuber6 wrote: Okay, you should use the C++ `using` keyword instead of C's `typedef`.

https://github.com/llvm/llvm-project/pull/76587
[Lldb-commits] [clang] [libc] [lld] [lldb] [clang-tools-extra] [llvm] [compiler-rt] [flang] [libcxx] [openmp] [PGO][OpenMP] Instrumentation for GPU devices (PR #76587)
https://github.com/jhuber6 edited https://github.com/llvm/llvm-project/pull/76587
[Lldb-commits] [clang] [libc] [lld] [lldb] [clang-tools-extra] [llvm] [compiler-rt] [flang] [libcxx] [openmp] [PGO][OpenMP] Instrumentation for GPU devices (PR #76587)
@@ -163,3 +163,87 @@ Error GenericGlobalHandlerTy::readGlobalFromImage(GenericDeviceTy &Device,
   return Plugin::success();
 }
+
+bool GenericGlobalHandlerTy::hasProfilingGlobals(GenericDeviceTy &Device,
+                                                 DeviceImageTy &Image) {
+  GlobalTy global(getInstrProfNamesVarName().str(), 0);
+  if (auto Err = getGlobalMetadataFromImage(Device, Image, global)) {
+    consumeError(std::move(Err));
+    return false;
+  }
+  return true;
+}
+
+Expected<GPUProfGlobals>
+GenericGlobalHandlerTy::readProfilingGlobals(GenericDeviceTy &Device,
+                                             DeviceImageTy &Image) {
+  GPUProfGlobals profdata;
+  auto ELFObj = getELFObjectFile(Image);
+  if (!ELFObj)
+    return ELFObj.takeError();
+  profdata.targetTriple = ELFObj->makeTriple();
+  // Iterate through elf symbols
+  for (auto &sym : ELFObj->symbols()) {
+    if (auto name = sym.getName()) {
+      // Check if given current global is a profiling global based
+      // on name
+      if (name->equals(getInstrProfNamesVarName())) {
+        // Read in profiled function names
+        std::vector<char> chars(sym.getSize() / sizeof(char), ' ');
+        GlobalTy NamesGlobal(name->str(), sym.getSize(), chars.data());

jhuber6 wrote: Okay, this should use `SmallVector` as well. Don't bother dividing by the size, because the one reported from the ELF is absolute. Then just make the data inside `uint8_t`.

https://github.com/llvm/llvm-project/pull/76587
[Lldb-commits] [clang-tools-extra] [flang] [libcxx] [lld] [compiler-rt] [lldb] [clang] [llvm] [libc] [openmp] [PGO][OpenMP] Instrumentation for GPU devices (PR #76587)
@@ -58,6 +60,22 @@ class GlobalTy {
   void setPtr(void *P) { Ptr = P; }
 };
 
+typedef void *IntPtrT;
+struct __llvm_profile_data {
+#define INSTR_PROF_DATA(Type, LLVMType, Name, Initializer) Type Name;
+#include "llvm/ProfileData/InstrProfData.inc"
+};
+
+/// PGO profiling data extracted from a GPU device
+struct GPUProfGlobals {
+  std::string names;
+  std::vector<std::vector<int64_t>> counts;
+  std::vector<__llvm_profile_data> data;
+  Triple targetTriple;
+

jhuber6 wrote: That's confusing; how would using a `std::vector` not have that problem as well? I'll need to look into that.

https://github.com/llvm/llvm-project/pull/76587
[Lldb-commits] [clang] [libc] [lldb] [llvm] [mlir] [compiler-rt] [lld] [libcxx] [Driver] Test ignored target-specific options for AMDGPU/NVPTX (PR #79222)
@@ -0,0 +1,7 @@
+/// Some target-specific options are ignored for GPU, so %clang exits with code 0.
+// DEFINE: %{gpu_opts} = --cuda-gpu-arch=sm_60 --cuda-path=%S/Inputs/CUDA/usr/local/cuda --no-cuda-version-check
+// DEFINE: %{check} = %clang -### -c %{gpu_opts} -mcmodel=medium %s
+// RUN: %{check} -fbasic-block-sections=all

jhuber6 wrote: Offloading compilation for these single-source languages pretty much just combines one "host" compilation job with N "device" compilation jobs. Doing `--offload-device-only` and `--offload-host-only` simply does one part of that. There are probably some flags that behave differently depending on which end you're compiling on, so maybe it would be useful for separating that behavior if needed.

https://github.com/llvm/llvm-project/pull/79222
[Lldb-commits] [llvm] [lldb] [lld] [compiler-rt] [clang] [mlir] [libc] [libcxx] [Driver] Test ignored target-specific options for AMDGPU/NVPTX (PR #79222)
@@ -0,0 +1,7 @@
+/// Some target-specific options are ignored for GPU, so %clang exits with code 0.
+// DEFINE: %{gpu_opts} = --cuda-gpu-arch=sm_60 --cuda-path=%S/Inputs/CUDA/usr/local/cuda --no-cuda-version-check
+// DEFINE: %{check} = %clang -### -c %{gpu_opts} -mcmodel=medium %s
+// RUN: %{check} -fbasic-block-sections=all
+
+// REDEFINE: %{gpu_opts} = -x hip --rocm-path=%S/Inputs/rocm -nogpulib

jhuber6 wrote: Should probably include `-nogpuinc` as well. That's the best way to avoid spurious failures due to the lack of a local CUDA / ROCm installation. Maybe in the future LLVM-based offloading won't depend on so much external stuff.

https://github.com/llvm/llvm-project/pull/79222
[Lldb-commits] [libc] [clang] [openmp] [lld] [clang-tools-extra] [lldb] [libcxx] [compiler-rt] [mlir] [llvm] [pstl] [Driver] Test ignored target-specific options for AMDGPU/NVPTX (PR #79222)
https://github.com/jhuber6 approved this pull request. https://github.com/llvm/llvm-project/pull/79222
[Lldb-commits] [lldb] [pstl] [llvm] [mlir] [libc] [compiler-rt] [libcxx] [openmp] [clang-tools-extra] [clang] [lld] [Driver] Test ignored target-specific options for AMDGPU/NVPTX (PR #79222)
@@ -0,0 +1,5 @@
+/// Some target-specific options are ignored for GPU, so %clang exits with code 0.
+// DEFINE: %{check} = %clang -### -c -mcmodel=medium

jhuber6 wrote: Probably depends on the option we're testing. We could do both.

https://github.com/llvm/llvm-project/pull/79222
[Lldb-commits] [lldb] [clang] [openmp] [compiler-rt] [lld] [llvm] [libc] [libcxx] [clang-tools-extra] [mlir] [pstl] [Driver] Test ignored target-specific options for AMDGPU/NVPTX (PR #79222)
jhuber6 wrote: Maybe need to specify `--target=x86_64-unknown-linux-gnu` in the test?

https://github.com/llvm/llvm-project/pull/79222
[Lldb-commits] [compiler-rt] [flang] [libcxx] [clang] [llvm] [clang-tools-extra] [lldb] [lld] [libc] [NVPTX] Add support for -march=native in standalone NVPTX (PR #79373)
https://github.com/jhuber6 updated https://github.com/llvm/llvm-project/pull/79373

>From 145b7bc932ce3ffa46545cd7af29b1c93981429c Mon Sep 17 00:00:00 2001
From: Joseph Huber
Date: Wed, 24 Jan 2024 15:34:00 -0600
Subject: [PATCH 1/3] [NVPTX] Add support for -march=native in standalone NVPTX

Summary:
We support `--target=nvptx64-nvidia-cuda` as a way to target the NVPTX
architecture from standard CPU. This patch simply uses the existing support
for handling `--offload-arch=native` to also apply to the standalone
toolchain.

---
 clang/lib/Driver/ToolChains/Cuda.cpp       | 61 +-
 clang/lib/Driver/ToolChains/Cuda.h         | 10 ++--
 clang/test/Driver/nvptx-cuda-system-arch.c |  5 ++
 3 files changed, 45 insertions(+), 31 deletions(-)

diff --git a/clang/lib/Driver/ToolChains/Cuda.cpp b/clang/lib/Driver/ToolChains/Cuda.cpp
index 1462576ca870e6f..6215c43b5fc96bd 100644
--- a/clang/lib/Driver/ToolChains/Cuda.cpp
+++ b/clang/lib/Driver/ToolChains/Cuda.cpp
@@ -738,9 +738,18 @@ NVPTXToolChain::TranslateArgs(const llvm::opt::DerivedArgList &Args,
     if (!llvm::is_contained(*DAL, A))
       DAL->append(A);
 
-  if (!DAL->hasArg(options::OPT_march_EQ))
+  if (!DAL->hasArg(options::OPT_march_EQ)) {
     DAL->AddJoinedArg(nullptr, Opts.getOption(options::OPT_march_EQ),
                       CudaArchToString(CudaArch::CudaDefault));
+  } else if (DAL->getLastArgValue(options::OPT_march_EQ) == "native") {
+    auto GPUsOrErr = getSystemGPUArchs(Args);
+    if (!GPUsOrErr)
+      getDriver().Diag(diag::err_drv_undetermined_gpu_arch)
+          << getArchName() << llvm::toString(GPUsOrErr.takeError()) << "-march";
+    else
+      DAL->AddJoinedArg(nullptr, Opts.getOption(options::OPT_march_EQ),
+                        Args.MakeArgString(GPUsOrErr->front()));
+  }
 
   return DAL;
 }
@@ -783,6 +792,31 @@ void NVPTXToolChain::adjustDebugInfoKind(
   }
 }
 
+Expected<SmallVector<std::string>>
+NVPTXToolChain::getSystemGPUArchs(const ArgList &Args) const {
+  // Detect NVIDIA GPUs availible on the system.
+  std::string Program;
+  if (Arg *A = Args.getLastArg(options::OPT_nvptx_arch_tool_EQ))
+    Program = A->getValue();
+  else
+    Program = GetProgramPath("nvptx-arch");
+
+  auto StdoutOrErr = executeToolChainProgram(Program);
+  if (!StdoutOrErr)
+    return StdoutOrErr.takeError();
+
+  SmallVector<std::string> GPUArchs;
+  for (StringRef Arch : llvm::split((*StdoutOrErr)->getBuffer(), "\n"))
+    if (!Arch.empty())
+      GPUArchs.push_back(Arch.str());
+
+  if (GPUArchs.empty())
+    return llvm::createStringError(std::error_code(),
+                                   "No NVIDIA GPU detected in the system");
+
+  return std::move(GPUArchs);
+}
+
 /// CUDA toolchain. Our assembler is ptxas, and our "linker" is fatbinary,
 /// which isn't properly a linker but nonetheless performs the step of stitching
 /// together object files from the assembler into a single blob.
@@ -948,31 +982,6 @@ CudaToolChain::TranslateArgs(const llvm::opt::DerivedArgList &Args,
   return DAL;
 }
 
-Expected<SmallVector<std::string>>
-CudaToolChain::getSystemGPUArchs(const ArgList &Args) const {
-  // Detect NVIDIA GPUs availible on the system.
-  std::string Program;
-  if (Arg *A = Args.getLastArg(options::OPT_nvptx_arch_tool_EQ))
-    Program = A->getValue();
-  else
-    Program = GetProgramPath("nvptx-arch");
-
-  auto StdoutOrErr = executeToolChainProgram(Program);
-  if (!StdoutOrErr)
-    return StdoutOrErr.takeError();
-
-  SmallVector<std::string> GPUArchs;
-  for (StringRef Arch : llvm::split((*StdoutOrErr)->getBuffer(), "\n"))
-    if (!Arch.empty())
-      GPUArchs.push_back(Arch.str());
-
-  if (GPUArchs.empty())
-    return llvm::createStringError(std::error_code(),
-                                   "No NVIDIA GPU detected in the system");
-
-  return std::move(GPUArchs);
-}
-
 Tool *NVPTXToolChain::buildAssembler() const {
   return new tools::NVPTX::Assembler(*this);
 }
diff --git a/clang/lib/Driver/ToolChains/Cuda.h b/clang/lib/Driver/ToolChains/Cuda.h
index 8a053f3393e1206..43c17ba7c0ba03d 100644
--- a/clang/lib/Driver/ToolChains/Cuda.h
+++ b/clang/lib/Driver/ToolChains/Cuda.h
@@ -168,6 +168,11 @@ class LLVM_LIBRARY_VISIBILITY NVPTXToolChain : public ToolChain {
   unsigned GetDefaultDwarfVersion() const override { return 2; }
   unsigned getMaxDwarfVersion() const override { return 2; }
 
+  /// Uses nvptx-arch tool to get arch of the system GPU. Will return error
+  /// if unable to find one.
+  virtual Expected<SmallVector<std::string>>
+  getSystemGPUArchs(const llvm::opt::ArgList &Args) const override;
+
   CudaInstallationDetector CudaInstallation;
 
 protected:
@@ -223,11 +228,6 @@ class LLVM_LIBRARY_VISIBILITY CudaToolChain : public NVPTXToolChain {
 
   const ToolChain &HostTC;
 
-  /// Uses nvptx-arch tool to get arch of the system GPU. Will return error
-  /// if unable to find one.
-  virtual Expected<SmallVector<std::string>>
-  getSystemGPUArchs(const llvm::opt::ArgList &Args) const override;
-
 protected:
   Tool *buildAssembler() const override; // ptxas
   Tool *buildLinker() const override;
[Lldb-commits] [lld] [lldb] [llvm] [compiler-rt] [clang-tools-extra] [libc] [clang] [flang] [libcxx] [NVPTX] Add support for -march=native in standalone NVPTX (PR #79373)
jhuber6 wrote:

> On the other hand, I'd be OK with providing --offload-arch=native translating
> into "compile for all present GPU variants", with a possibility to further
> adjust the selected set with the usual --no-offload-arch-foo, if the user
> needs to. This will at least produce code that will run on the machine where
> it's built, be somewhat consistent and is still adjustable by the user when
> the default choice will inevitably be wrong.

This is what we already do, but this is somewhat tangential. I've updated this patch to present the warning in the case of multiple GPUs being detected, so I don't think there's a concern here with the user being confused. If they have two GPUs, the warning will tell them which one it's using with the correct `sm_` value to specify it manually if they so wish. If there is only one GPU on the system, it should be obvious that it's going to be targeted.

https://github.com/llvm/llvm-project/pull/79373
[Lldb-commits] [clang] [lld] [libcxx] [flang] [compiler-rt] [libc] [clang-tools-extra] [llvm] [lldb] [NVPTX] Add support for -march=native in standalone NVPTX (PR #79373)
jhuber6 wrote:

> User confusion is only part of the issue here. With any single GPU choice we
> would still potentially produce a nonworking binary, if our GPU choice does
> not match what the user wants.
>
> "all GPUs" has the advantage of always producing the binary that's guaranteed
> to work. Granted, in the case of multiple GPUs it comes with the compilation
> time overhead, but I think it's a better trade-off than compiling faster, but
> not working. If the overhead is unacceptable, then we can tweak the build,
> but in that case, the user may as well just specify the desired architectures
> explicitly.

I think the semantics of `native` on other architectures are clear enough here. This combined with the fact that using `-march=native` will error out in the case of no GPUs available, or give a warning if more than one GPU is available, should be sufficiently clear what it's doing. This obviously falls apart if you compile with `-march=native` and then move it off of the system you compiled it for, but the same applies for standard x64 binaries I feel.

Realistically, very, very few casual users are going to be using direct NVPTX targeting. The current use-case is for building tests directly for the GPU without needing to handle calling `amdgpu-arch` and `nvptx-arch` manually in CMake. If I had this in, then I could simplify a lot of CMake code in my `libc` project by just letting the compiler handle the autodetection. Then one less random program dependency is removed from the build process. AMDGPU already has `-mcpu=native` so I'd like NVPTX to match if possible.

https://github.com/llvm/llvm-project/pull/79373
[Lldb-commits] [clang] [clang-tools-extra] [lldb] [libc] [libcxx] [lld] [llvm] [flang] [compiler-rt] [NVPTX] Add support for -march=native in standalone NVPTX (PR #79373)
jhuber6 wrote:

> > I think the semantics of native on other architectures are clear enough
> > here.
>
> I don't think we have the same idea about that. Let's spell it out, so
> there's no confusion.
>
> [GCC manual](https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html#index-march-16) says:
>
> > Using -march=native enables all instruction subsets supported by the local
> > machine (hence the result might not run on different machines)
>
> The way I read it "all instruction subsets supported by the local machine"
> would be what all-GPUs strategy would do. The binary is expected to run on
> all GPU architecture variants available on the machine.
>
> Granted, gcc was not written with GPUs in mind, but it's a good baseline for
> establishing existing conventions for the meaning of `-march=native`.

This more or less depends on what your definition of "local machine" is when it comes to a system augmented with GPUs. The verbiage of "**the** local machine" implies an assumption that there is only one, which I personally find consistent with just selecting the first GPU found on the system. There is ambiguity in how we should treat this in the case of multiple GPUs, but that's what the warning message is for. It informs the user that the "native" architecture is somewhat ambiguous and that the first one was selected. Further, our current default makes sense, because it corresponds to Device ID zero in CUDA, which means that unless you change the environment via `CUDA_VISIBLE_DEVICES` or something, it will work on the default device.

So, in the case there is one device, the behavior is consistent with `-march=native`. In the case where there are two, we make an implicit decision to target the first GPU and inform the user. This method of compilation is not like CUDA, so we can't target all the GPUs at the same time. This will be useful in cases where we want to write code that simply targets a GPU that will "work". We have CMake code around LLVM already to do this, so it would be nice to get rid of that.

https://github.com/llvm/llvm-project/pull/79373
[Lldb-commits] [flang] [clang] [clang-tools-extra] [llvm] [compiler-rt] [libcxx] [libc] [lldb] [lld] [NVPTX] Add support for -march=native in standalone NVPTX (PR #79373)
jhuber6 wrote:

> > This method of compilation is not like CUDA, so we can't target all the
> > GPUs at the same time.
>
> I think this is the key fact I was missing. If the patch is only for a
> standalone compilation which does not do multi-GPU compilation in principle,
> then your approach makes sense.
>
> I was arguing from the normal offloading which does have ability to target
> multiple GPUs.

Yes, this is more similar to OpenCL or just regular CPU compilation, where we have a single job that creates a simple executable, terminal application style. So given a single target, the desire is to "pick me the one that will work on the default CUDA device without me needing to check" type thing.

https://github.com/llvm/llvm-project/pull/79373
[Lldb-commits] [lld] [lldb] [libcxx] [compiler-rt] [clang-tools-extra] [llvm] [libc] [clang] [flang] [NVPTX] Add support for -march=native in standalone NVPTX (PR #79373)
jhuber6 wrote:

> > This method of compilation is not like CUDA, so we can't target all the GPUs at the same time.
>
> Can you clarify for me -- what are you compiling where it's impossible to target multiple GPUs in the binary? I'm confused because Art is understanding that it's not CUDA, but we're modifying the CUDA driver here?

The idea is to simply compile C / C++ code directly targeting NVPTX rather than going through offloading languages like CUDA or OpenMP. This is more or less what cross-compiling is. We specify `--target=nvptx64-nvidia-cuda`, which instructs the compiler to cross-compile the C / C++ targeting NVPTX. This results in a workflow that is very close to compiling a standard executable, by design. This is mostly related to my work on the LLVM C library for GPUs, [which I did a talk on that goes into more detail](https://www.youtube.com/watch?v=_LLGc48GYHc).

Right now, with the LLVM `libc` infrastructure, I can do the following on my AMD GPU:

```c
#include <stdio.h>

int main() { puts("Hello World!"); }
```

And compile and run it, more or less:

```
$ clang hello.c --target=amdgcn-amd-amdhsa -mcpu=native -flto -lc crt1.o
$ amdhsa_loader a.out
Hello World!
```

This works with AMD currently, and I want it to work for NVPTX so I can remove some ugly, annoying code in the `libc` project. This is how I'm running unit tests targeting the GPU in that project, which need to run on the user's GPU. I'd rather just use `-march=native` than detect it manually in CMake. https://github.com/llvm/llvm-project/pull/79373
[Lldb-commits] [lld] [lldb] [libcxx] [compiler-rt] [clang-tools-extra] [llvm] [libc] [clang] [flang] [NVPTX] Add support for -march=native in standalone NVPTX (PR #79373)
jhuber6 wrote:

> I...think I understand.
>
> Is the output of this compilation step a cubin, then?

Yes, it will spit out a simple `cubin` instead of a fatbinary. The NVIDIA toolchain is much worse about this stuff than the AMD one, but in general it works. You can check with `-###` or whatever, like in https://godbolt.org/z/zWf5jezYP. https://github.com/llvm/llvm-project/pull/79373
[Lldb-commits] [flang] [clang] [libc] [compiler-rt] [clang-tools-extra] [llvm] [lld] [lldb] [libcxx] [NVPTX] Add support for -march=native in standalone NVPTX (PR #79373)
jhuber6 wrote:

> Got it, okay, thanks.
>
> Since this change only applies to `--target=nvptx64-nvidia-cuda`, fine by me. Thanks for putting up with our scrutiny. :)

No problem, I probably should've been clearer in my commit messages. https://github.com/llvm/llvm-project/pull/79373
[Lldb-commits] [clang-tools-extra] [llvm] [libc] [clang] [libcxx] [lldb] [lld] [flang] [compiler-rt] [NVPTX] Add support for -march=native in standalone NVPTX (PR #79373)
https://github.com/jhuber6 closed https://github.com/llvm/llvm-project/pull/79373
[Lldb-commits] [lldb] [libc][libm][GPU] Add missing vendor entrypoints to the GPU version of `libm` (PR #66034)
@@ -0,0 +1,17 @@
+//===-- Implementation of the GPU lroundf function

jhuber6 wrote: This and some other headers need to be reformatted. https://github.com/llvm/llvm-project/pull/66034
[Lldb-commits] [lldb] [libc][libm][GPU] Add missing vendor entrypoints to the GPU version of `libm` (PR #66034)
@@ -0,0 +1,21 @@
+//===-- Implementation of the GPU atan2f function

jhuber6 wrote: Fix header https://github.com/llvm/llvm-project/pull/66034
[Lldb-commits] [lldb] [libc][libm][GPU] Add missing vendor entrypoints to the GPU version of `libm` (PR #66034)
https://github.com/jhuber6 approved this pull request. LG after fixing the header formatting. https://github.com/llvm/llvm-project/pull/66034
[Lldb-commits] [lldb] [libc][libm][GPU] Add missing vendor entrypoints to the GPU version of `libm` (PR #66034)
@@ -0,0 +1,19 @@
+//===-- Implementation of the GPU logbf function

jhuber6 wrote: Fix header https://github.com/llvm/llvm-project/pull/66034
[Lldb-commits] [lldb] [libc][libm][GPU] Add missing vendor entrypoints to the GPU version of `libm` (PR #66034)
https://github.com/jhuber6 edited https://github.com/llvm/llvm-project/pull/66034
[Lldb-commits] [clang] [clang-tools-extra] [flang] [lldb] [llvm] [mlir] [polly] Add static to command line option (cl::opt) (PR #126243)
jhuber6 wrote:

> So I should, like, open 5 branches and make a PR for each?
>
> As for the cases mentioned, when I tried to compile all projects (compile with -DLLVM_ENABLE_PROJECTS=all), I did encounter such cases when compiling and I've fixed them all. Now all projects can be successfully built, so I think it should be fine?

Yes, one PR for each project, please. Also refer to the failing CI for what's broken, e.g.

```
/var/lib/buildkite-agent/builds/linux-56-59b8f5d88-96x6s-1/llvm-project/github-pull-requests/flang/lib/Lower/OpenMP/Utils.cpp:30:28: error: static declaration of 'treatIndexAsSection' follows non-static declaration
static llvm::cl::opt treatIndexAsSection(
^
/var/lib/buildkite-agent/builds/linux-56-59b8f5d88-96x6s-1/llvm-project/github-pull-requests/flang/lib/Lower/OpenMP/Utils.h:19:28: note: previous declaration is here
extern llvm::cl::opt treatIndexAsSection;
^
```

https://github.com/llvm/llvm-project/pull/126243
[Lldb-commits] [clang] [clang-tools-extra] [flang] [lldb] [llvm] [mlir] [polly] Add static to command line option (cl::opt) (PR #126243)
https://github.com/jhuber6 edited https://github.com/llvm/llvm-project/pull/126243
[Lldb-commits] [clang] [clang-tools-extra] [flang] [lldb] [llvm] [mlir] [polly] Add static to command line option (cl::opt) (PR #126243)
https://github.com/jhuber6 commented: This should definitely be split up. Also, some options are referenced in multiple places, i.e.

```
// foo.h
extern cl::opt <...>

// foo.cpp
cl::opt <...>
use opt

// bar.cpp
use opt
```

https://github.com/llvm/llvm-project/pull/126243