[PATCH] D47849: [OpenMP][Clang][NVPTX] Enable math functions called in an OpenMP NVPTX target device region to be resolved as device-native function calls
gregrodgers added a comment.

I like the idea of using an automatic include as a cc1 option (-include). However, I would prefer a more general automatic include for OpenMP, not just for math functions (__clang_cuda_device_functions.h). Clang CUDA automatically includes __clang_cuda_runtime_wrapper.h, which in turn includes other files as needed, such as __clang_cuda_device_functions.h. Let's hypothetically call my proposed automatic include for OpenMP __clang_openmp_runtime_wrapper.h.

Just because clang CUDA defines functions in __clang_cuda_device_functions.h and automatically includes them does not make it right for OpenMP. In general, function definitions in headers should be avoided. The current function definitions in __clang_cuda_device_functions.h only work for hostile nv GPUs :).

Here is how we can avoid function definitions in the headers. In a new openmp build process, we can build libm-nvptx.bc by compiling __clang_cuda_device_functions.h as a device-only compile. Assuming current naming conventions, these files would be installed in the same directory as libomptarget.so (.../lib).

How do we tell clang cc1 to use this bc library? Use -mlink-builtin-bitcode. AddMathDeviceFunctions would then look something like this:

  if (/* this is for a device cc1 */) {
    CC1Args.push_back("-mlink-builtin-bitcode");
    if (getTriple().isNVPTX())
      CC1Args.push_back(DriverArgs.MakeArgString("libm-nvptx.bc"));
    if (getTriple().getArch() == llvm::Triple::amdgcn)
      CC1Args.push_back(DriverArgs.MakeArgString("libm-amdgcn.bc"));
  }

You can think of a libm-<arch>.bc file as the device-library equivalent of the host libm.so or libm.a. This concept of "host-consistent" library definitions can go beyond math libraries. In fact, I believe we should co-opt the -l (--library) option: the driver toolchain should look for device bc libraries for any -lX command line option (see the sketch at the end of this comment). This gives us a strategy for adding user-defined device libraries.

The above code hints at the idea of architecture-specific bc files (nvptx vs amdgcn). The nvptx version would call into the cuda libdevice. For radeon processors, we may want processor-optimized versions of the libraries, just as there are sub-architecture-optimized versions of the cuda libdevice. If we build --cuda-gpu-arch-optimized versions of the math bc libs, then the above code gets a bit more complex, depending on the naming convention of the bc lib and the value of --cuda-gpu-arch (which should have an alias, --offload-arch).

Using a bc lib significantly reduces the complexity of __clang_openmp_runtime_wrapper.h. We do not need to see math device function definitions or the nv headers that they need. However, the wrapper does need to correct the behaviour of rogue system headers that define host-optimized functions. We can fix this by adding the following to __clang_openmp_runtime_wrapper.h so that host passes still get host-optimized functions:

  #if defined(__AMDGCN__) || defined(__NVPTX__)
  #define __NO_INLINE__ 1
  #endif

There is a tradeoff to using pre-compiled bc libs: compile-time macro logic becomes hard to implement. For example, we can't do this:

  #if defined(__CLANG_CUDA_APPROX_TRANSCENDENTALS__)
  #define __FAST_OR_SLOW(fast, slow) fast
  #else
  #define __FAST_OR_SLOW(fast, slow) slow
  #endif

The openmp build process would either need to build alternative bc libraries for each option, or a supplemental bc library to address these types of options. If some option is turned on, then an alternative lib or a particular ordering of libs would be used to build the clang cc1 command.
For example, the above code for AddMathDeviceFunctions would grow to something like this:

  if (getTriple().isNVPTX()) {
    if (LangOpts.CUDADeviceApproxTranscendentals || LangOpts.FastMath) {
      CC1Args.push_back("-mlink-builtin-bitcode");
      CC1Args.push_back(DriverArgs.MakeArgString("libm-fast-nvptx.bc"));
    }
    CC1Args.push_back("-mlink-builtin-bitcode");
    CC1Args.push_back(DriverArgs.MakeArgString("libm-nvptx.bc"));
  }

I personally believe that pre-built bc libraries with some consistency with their host-equivalent libraries are a saner approach for device libraries than complex header logic that is customized for each architecture.
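Going back to the idea of co-opting -l for device libraries, here is a rough sketch of how the toolchain could probe for a device bc companion of each -lX option. This is illustrative only, not existing driver code; the LibraryPath variable and the libX-nvptx.bc naming convention are assumptions:

  // For every -lX on the command line, look for a matching device bitcode
  // library in the device library directory and, if present, link it in
  // with -mlink-builtin-bitcode.
  for (const llvm::opt::Arg *A : DriverArgs.filtered(options::OPT_l)) {
    llvm::SmallString<128> BCLib(LibraryPath); // e.g. <install>/lib
    llvm::sys::path::append(
        BCLib, "lib" + llvm::Twine(A->getValue()) + "-nvptx.bc");
    if (llvm::sys::fs::exists(BCLib)) {
      CC1Args.push_back("-mlink-builtin-bitcode");
      CC1Args.push_back(DriverArgs.MakeArgString(BCLib));
    }
  }

A user-supplied -lfoo would then transparently pick up a libfoo-nvptx.bc sitting next to libomptarget.so, with no new command line options.

Repository: rC Clang

https://reviews.llvm.org/D47849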
[PATCH] D50845: [CUDA/OpenMP] Define only some host macros during device compilation
gregrodgers added a comment.

I have a longer comment on header files, but let me first understand this patch. IIUC, the concept of this patch is to fake the host macros during the device pass, so that headers think they are seeing the host:

  if ((LangOpts.CUDA || LangOpts.OpenMPIsDevice) && PP.getAuxTargetInfo())
    InitializePredefinedAuxMacros(*PP.getAuxTargetInfo(), Builder);

That would be counterproductive, because well-behaved headers that only provide optimized asm definitions wrap that asm like this:

  #ifdef __x86_64__
  // some x86 asm function definition
  #else
  // just provide the function declaration, to be implemented somewhere else
  #endif

What am I missing?
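A minimal, compilable sketch of the header pattern I mean (the function name and body are made up for illustration):

  /* Well-behaved system header: host-only fast path, plain declaration
     everywhere else. Defining __x86_64__ during a device pass would pull
     the host definition into device code. */
  #ifdef __x86_64__
  static inline double fast_rsqrt(double x) {
    /* imagine an x86-specific inline-asm implementation here */
    return 1.0 / __builtin_sqrt(x);
  }
  #else
  double fast_rsqrt(double x); /* declaration only; defined in a library */
  #endif

Repository: rC Clang

https://reviews.llvm.org/D50845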
[PATCH] D42800: Let CUDA toolchain support amdgpu target
gregrodgers added a comment.

Here are my replies to the inline comments. Everything should be fixed in the next revision.

Comment at: include/clang/Basic/Cuda.h:79

    COMPUTE_72,
  + COMPUTE_GCN,
  };

t-tye wrote:
> Suggest using amdgcn which matches the architecture name in https://llvm.org/docs/AMDGPUUsage.html#processors .

Yes, I will add them in the update.

Comment at: include/clang/Basic/Cuda.h:79

gregrodgers wrote:
> t-tye wrote:
> > Suggest using amdgcn which matches the architecture name in https://llvm.org/docs/AMDGPUUsage.html#processors .
> Yes, I will add them in the update.

Done in next update.

Comment at: lib/Basic/Targets/AMDGPU.cpp:437

  + case CudaArch::UNKNOWN:
  +   assert(false && "No GPU arch when compiling CUDA device code.");
  +   return "";

arsenm wrote:
> llvm_unreachable

Fixed in next update.

Comment at: lib/Basic/Targets/AMDGPU.h:85

    return TT.getEnvironmentName() == "amdgiz" ||
  +        TT.getOS() == llvm::Triple::CUDA ||
           TT.getEnvironmentName() == "amdgizcl";

yaxunl wrote:
> t-tye wrote:
> > We still want to use the amdhsa OS for amdgpu which currently supports the different environments. So can cuda simply support the same environments? Is the plan to eliminate the environments and simply always use the default address space for generic so this code is no longer needed?
> Currently we already use amdgiz by default. This is no longer needed.

Removed in next update.

Comment at: lib/Driver/ToolChains/Cuda.cpp:359-361

  + addBCLib(C, Args, CmdArgs, LibraryPaths, "opencl.amdgcn.bc");
  + addBCLib(C, Args, CmdArgs, LibraryPaths, "ockl.amdgcn.bc");
  + addBCLib(C, Args, CmdArgs, LibraryPaths, "irif.amdgcn.bc");

arsenm wrote:
> Why is this done under an NVPTX:: class

Because we are not creating a new toolchain for AMDGCN. We modify the logic in the tool constructor as needed for AMDGCN, keeping the ability to provide a set of mixed targets.

Comment at: lib/Driver/ToolChains/Cuda.cpp:403-415

  + // If Verbose, save input for llc in /tmp and print all symbols
  + if (Args.hasArg(options::OPT_v)) {
  +   ArgStringList nmArgs;
  +   nmArgs.push_back(ResultingBitcodeF);
  +   nmArgs.push_back("-debug-syms");
  +   const char *nmExec = Args.MakeArgString(C.getDriver().Dir + "/llvm-nm");
  +   C.addCommand(llvm::make_unique<Command>(JA, *this, nmExec, nmArgs, Inputs));

Hahnfeld wrote:
> This never gets cleaned up!

OK, deleted in revision.

Comment at: lib/Driver/ToolChains/Cuda.cpp:531-534

  + SmallString<256> OutputFileName(Output.getFilename());
  + if (JA.isOffloading(Action::OFK_OpenMP))
  +   llvm::sys::path::replace_extension(OutputFileName, "cubin");
  + CmdArgs.push_back(Args.MakeArgString(OutputFileName));

Hahnfeld wrote:
> That is already done in `TC.getInputFilename(Output)` (since rC318763), the same function call that you are removing here...

Fixed in next update.

Comment at: lib/Driver/ToolChains/Cuda.cpp:639-640

  +   CmdArgs2.push_back(Args.MakeArgString(Output.getFilename()));
  +   const char *Exec2 =
  +       Args.MakeArgString(C.getDriver().Dir + "/clang-fixup-fatbin");
  +   C.addCommand(

Hahnfeld wrote:
> `clang-fixup-fatbin` is not upstreamed and won't work. Sounds like a horrible name btw...

Major cleanup here in the next update. It is not a bad name when you see the update and the comments in the update.
Comment at: lib/Driver/ToolChains/Cuda.cpp:788-793

  + // Do not add -link-cuda-bitcode or ptx42 features if gfx
  + for (Arg *A : DriverArgs)
  +   if (A->getOption().matches(options::OPT_cuda_gpu_arch_EQ) &&
  +       StringRef(A->getValue()).startswith("gfx"))
  +     return;

Hahnfeld wrote:
> You should use `GpuArch` which comes from `DriverArgs.getLastArgValue`: The last `-march` overrides previous arguments.

Nice catch. I will fix this in the next update.
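Something like the following is what I have in mind for the update (a sketch of the suggested fix, not the final code):

  // Only the last --cuda-gpu-arch/-march value matters; no need to scan
  // every occurrence of the option.
  StringRef GpuArch =
      DriverArgs.getLastArgValue(options::OPT_cuda_gpu_arch_EQ);
  if (GpuArch.startswith("gfx"))
    return; // gfx target: skip -link-cuda-bitcode and the ptx42 features

https://reviews.llvm.org/D42800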
[PATCH] D48455: Remove hip.amdgcn.bc hc.amdgcn.bc from HIP Toolchains
gregrodgers added a comment.

Why not provide a specific list of --hip-device-lib= for VDI builds? I am not sure about defining functions inside headers instead of using a hip bc lib.

Repository: rC Clang

https://reviews.llvm.org/D48455
[PATCH] D84743: [Clang][AMDGCN] Universal device offloading macros header
gregrodgers added a comment.

This is all excellent feedback. Thank you. I don't understand what I see on the godbolt link. So far, we have only tested with clang. We will test with gcc to understand the failure.

I will make the change to use numeric values for _DEVICE_ARCH and change "UNKNOWN" to some integer (maybe -1). The value _DEVICE_GPU is intended to be generational within a specific _DEVICE_ARCH.

To be clear, this is primarily for users or library writers to implement device-specific or host-only code. This is not something that should be automatic. Users or library authors will opt in with their own include, so maybe it does not belong in clang/lib/Headers. As noted in the header comment, I expect the community to help keep this up to date as offloading technology evolves.
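The intended opt-in usage looks roughly like this. The macro names come from this patch; the include name, the function, and the assumption that _DEVICE_ARCH is undefined on host-only passes are mine, for illustration:

  /* Hypothetical illustration: a library writer opts in by including the
     proposed header and keys device-specific vs host-only code off the
     macros. Exact names and values are defined by the patch, not here. */
  #include "offload_macros.h" /* hypothetical opt-in include */

  #ifdef _DEVICE_ARCH
  void tuned_copy(double *dst, const double *src, int n) {
    /* device-specific implementation, possibly keyed off _DEVICE_GPU */
    for (int i = 0; i < n; ++i) dst[i] = src[i];
  }
  #else
  void tuned_copy(double *dst, const double *src, int n) {
    /* host-only implementation */
    for (int i = 0; i < n; ++i) dst[i] = src[i];
  }
  #endif

Repository: rG LLVM Github Monorepo

https://reviews.llvm.org/D84743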
[PATCH] D99235: [HIP] Change to code object v4
gregrodgers added inline comments.

Comment at: clang/lib/Driver/ToolChains/HIP.cpp:116

  + if (getOrCheckAMDGPUCodeObjectVersion(C.getDriver(), Args) >= 4)
  +   OffloadKind = OffloadKind + "v4";
    for (const auto &II : Inputs) {

yaxunl wrote:
> tra wrote:
> > We do not do it for v2/v3. Could you elaborate on what makes v4 special that it needs its own offload kind?
> >
> > Will you need to target different object versions simultaneously? If yes, how? AFAICT, the version specified is currently global and applies to all sub-compilations. If not, then do we really need to encode the version in the offload target name?
> Introducing hipv4 is to differentiate with code object versions 2 and 3, which are used by HIP applications compiled by older versions of clang. The ROCm platform is required to keep binary backward compatibility, i.e., old HIP applications built by ROCm 4.0 should run on ROCm 4.1. The bundle ID has a different interpretation depending on whether it is version 2/3 or version 4, e.g. 'gfx906' implies xnack and sramecc off with code object v2/3 but implies xnack and sramecc ANY with v4. Since code object version 2/3 uses 'hip', code object version 4 needs to be different, therefore it uses 'hipv4'.

We need to start thinking in terms of the offload requirements of a compiled image vs the capabilities of a particular active runtime on a particular GPU. This concept can eliminate the need for a new offload kind. For AMD, we would add the requirement of code object v4 (cov4) if built for code object v4 or greater. This means the image can only run on a system with that capability. This concept works well with the requirements xnack+, xnack-, sramecc+ and sramecc-. The bundle entry ID is the offload-kind, the triple, and the list of image requirements. The gpu type (offload-arch) is really an image requirement.

In this model, there is no requirement for xnack-any. The lack of an xnack+ or xnack- requirement implies "any", which means the image can run on any capable machine. This is a general model that is extensible. To make this work, a runtime must be able to detect the capabilities for any requirement that could be tagged on an image. In fact, every requirement of an embedded image must have its capability detected by the runtime for that offload image to be usable. However, a system's runtime could have more capabilities than the requirements of an image. So in the case of xnack, the lack of xnack- or xnack+ will be acceptable no matter what the xnack capability of the runtime is.

If the compiler driver puts the requirement cov4 in the bundle entry ID requirements field, the runtime will not run that image unless the GPU loader supports v4 or greater. The clang driver can create the requirement xnack- for code object < 4 on those GPUs that support either xnack mode. This will ensure the image will gracefully fail, or an alternative image will be used, if the runtime capability is xnack+. But the cov4 requirement is mostly unrelated to xnack; it is about the capability of the GPU loader. If the code object version is >= 4, then the image will be tagged with the cov4 requirement. This would prevent an old system that does not have a newer software stack from running an image with a cov4 requirement.

This general notion of image requirements and runtime capabilities is extensible to other offload architectures. Suppose CUDA version 12 compilation REQUIRES a CUDA version 12 runtime. Old runtimes would never advertise a cuv12 capability and would fail to run any image created with the requirement cuv12.
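In code, the selection rule itself is tiny; all the hard work is in capability detection. A sketch of the matching logic (illustrative only; requirement spellings like "cov4" and "xnack+" follow the examples above):

  // An embedded image is usable only if every requirement tagged on it is
  // among the capabilities the runtime detected on this system. An absent
  // requirement (e.g. neither xnack+ nor xnack-) constrains nothing: "any".
  #include <set>
  #include <string>

  bool imageIsUsable(const std::set<std::string> &ImageRequirements,
                     const std::set<std::string> &RuntimeCapabilities) {
    for (const std::string &Req : ImageRequirements)
      if (!RuntimeCapabilities.count(Req))
        return false; // e.g. image requires "cov4" but the loader lacks it
    return true;
  }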
Repository: rG LLVM Github Monorepo

https://reviews.llvm.org/D99235
[PATCH] D114890: [OpenMP] Make the new device runtime the default
gregrodgers added a comment.

We want amdgcn to remain on the old deviceRTL till we have verified it. I made inline comments on how this could be done.

Comment at: clang/lib/Driver/ToolChains/Clang.cpp:5905

    // runtime.
    if (Args.hasFlag(options::OPT_fopenmp_target_new_runtime,
                     options::OPT_fno_openmp_target_new_runtime,

Add a check for amdgcn, something like the sketch below.

Comment at: clang/lib/Driver/ToolChains/Clang.cpp:5907

                     options::OPT_fno_openmp_target_new_runtime,
  -                  /*Default=*/false))
  +                  /*Default=*/true))
      CmdArgs.push_back("-fopenmp-target-new-runtime");

This will keep AMDGCN on the old runtime till we are ready to switch.
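What I have in mind is roughly this (a sketch, not tested against this patch; TC stands for the current toolchain):

  // Default the new runtime on everywhere except amdgcn, which stays on
  // the old deviceRTL until it has been verified there.
  bool NewRuntimeDefault = !TC.getTriple().isAMDGCN();
  if (Args.hasFlag(options::OPT_fopenmp_target_new_runtime,
                   options::OPT_fno_openmp_target_new_runtime,
                   /*Default=*/NewRuntimeDefault))
    CmdArgs.push_back("-fopenmp-target-new-runtime");

Repository: rG LLVM Github Monorepo

https://reviews.llvm.org/D114890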
[PATCH] D114890: [OpenMP] Make the new device runtime the default
gregrodgers requested changes to this revision.
gregrodgers added a comment.
This revision now requires changes to proceed.

I forgot to add the "request changes" action.

Repository: rG LLVM Github Monorepo

https://reviews.llvm.org/D114890
[PATCH] D114890: [OpenMP] Make the new device runtime the default
gregrodgers accepted this revision.
gregrodgers added a comment.
This revision is now accepted and ready to land.

This is OK as is.

Repository: rG LLVM Github Monorepo

https://reviews.llvm.org/D114890
[PATCH] D76987: Rename options --cuda-gpu-arch and --no-cuda-gpu-arch
gregrodgers added a comment.

This was discussed on llvm-dev three years ago. Here is the thread: http://lists.llvm.org/pipermail/llvm-dev/2017-February/109930.html

The last name discussed was "--offload-arch". I don't believe we need a list option anymore, so ignore the very old request for --offload-archs. I am ok with the patch the way it is.

In the future, we should consider renaming the CudaArch class to OffloadArch. Also, GpuArchList is currently only initialized in CudaActionBuilder; eventually this will have to be done for HIPActionBuilder and OpenMPActionBuilder as well. Could you consider creating a function InitializeGpuArchList?

https://reviews.llvm.org/D76987
[PATCH] D146075: [flang][driver][openmp] Write MLIR for -save-temps
gregrodgers accepted this revision.
gregrodgers added a comment.

This looks good. Please merge. I found it very useful, especially in the context of the other temp files generated after LLVM linking and optimization in the offload driver. For example, just listing the temp files with ls -ltr provides a somewhat visual ordering of the compilation flow.

Repository: rG LLVM Github Monorepo

https://reviews.llvm.org/D146075
[PATCH] D147666: [OPENMP] Adds /lib to rpath to avoid need to set LD_LIBRARY_PATH to find plugins.
gregrodgers created this revision.
Herald added subscribers: sunshaoce, guansong, yaxunl.
Herald added a project: All.
gregrodgers requested review of this revision.
Herald added subscribers: cfe-commits, jplehr, sstefan1, MaskRay.
Herald added a reviewer: jdoerfert.
Herald added a project: clang.

Currently, the user of an OpenMP application needs to set LD_LIBRARY_PATH to the directory that contains the offload plugins. This change adds the rpath for OpenMP applications.

Repository: rG LLVM Github Monorepo

https://reviews.llvm.org/D147666

Files:
  clang/lib/Driver/ToolChains/CommonArgs.cpp

Index: clang/lib/Driver/ToolChains/CommonArgs.cpp
===================================================================
--- clang/lib/Driver/ToolChains/CommonArgs.cpp
+++ clang/lib/Driver/ToolChains/CommonArgs.cpp
@@ -813,7 +813,9 @@
                           ArgStringList &CmdArgs) {
   // Enable -frtlib-add-rpath by default for the case of VE.
   const bool IsVE = TC.getTriple().isVE();
-  bool DefaultValue = IsVE;
+  const bool IsOpenMP = (TC.getDriver().getOpenMPRuntime(Args) !=
+                         Driver::OMPRT_Unknown);
+  bool DefaultValue = IsVE || IsOpenMP;
   if (!Args.hasFlag(options::OPT_frtlib_add_rpath,
                     options::OPT_fno_rtlib_add_rpath, DefaultValue))
     return;
@@ -822,6 +824,16 @@
   if (TC.getVFS().exists(CandidateRPath)) {
     CmdArgs.push_back("-rpath");
     CmdArgs.push_back(Args.MakeArgString(CandidateRPath));
+  } else {
+    if (IsOpenMP) {
+      SmallString<256> TopLibPath =
+          llvm::sys::path::parent_path(TC.getDriver().Dir);
+      llvm::sys::path::append(TopLibPath, "lib");
+      if (TC.getVFS().exists(TopLibPath)) {
+        CmdArgs.push_back("-rpath");
+        CmdArgs.push_back(Args.MakeArgString(TopLibPath));
+      }
+    }
   }
 }
[PATCH] D147666: [OPENMP] Adds /lib to rpath to avoid need to set LD_LIBRARY_PATH to find plugins.
gregrodgers abandoned this revision.
gregrodgers added a comment.

It's ugly, but to avoid the requirement to set LD_LIBRARY_PATH for end-users who may need LD_LIBRARY_PATH for their own application, we will modify the compiler installation with these bash commands, where _INSTALL_DIR is the installation directory:

  echo "-Wl,-rpath=${_INSTALL_DIR}/lib" >  ${_INSTALL_DIR}/bin/rpath.cfg
  echo "-L${_INSTALL_DIR}/lib"          >> ${_INSTALL_DIR}/bin/rpath.cfg
  ln -sf rpath.cfg ${_INSTALL_DIR}/bin/clang++.cfg
  ln -sf rpath.cfg ${_INSTALL_DIR}/bin/clang.cfg
  ln -sf rpath.cfg ${_INSTALL_DIR}/bin/flang.cfg
  ln -sf rpath.cfg ${_INSTALL_DIR}/bin/flang-new.cfg

My apologies to linux distro packaging teams.

Repository: rG LLVM Github Monorepo

https://reviews.llvm.org/D147666
[PATCH] D42800: Let CUDA toolchain support amdgpu target
gregrodgers requested changes to this revision.
gregrodgers added a comment.
This revision now requires changes to proceed.

Thanks to everyone for the reviews. I hope I replied to all inline comments. Since I sent this to Sam to post, we discovered a major shortcoming. As tra points out, there are a lot of cuda headers in the cuda sdk that are processed. We are able to override asm() expansions with #undef and redefine each as an equivalent amdgpu component, so the compiler never sees the asm(). I am sure we will need to add more redefines as we broaden our testing. But that is not the big problem.

We would like to be able to run cudaclang for AMD GPUs without an install of cuda. Of course you must always install cuda if any of your targeted GPUs are NVidia GPUs. To run cudaclang without cuda when only non-NVidia gpus are specified, we need an open set of headers and we must replace the fatbin tools used in the toolchain. The latter can be addressed by using the libomptarget methods for embedding multiple target GPU objects. The former is going to take a lot of work.

I am going to be sending an updated patch that has the stubs for the open headers noted in __clang_cuda_runtime_wrapper.h. They will be included with the CC1 flag -D__USE_OPEN_HEADERS__. This will be generated by the cuda driver when it finds no cuda installation and all target GPUs are not NVidia.
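The wrapper change amounts to something like this (a sketch of the plan; only the __USE_OPEN_HEADERS__ flag name comes from the text above, and the stub header name is a placeholder):

  // In __clang_cuda_runtime_wrapper.h:
  #ifdef __USE_OPEN_HEADERS__
  // Stub headers shipped with clang; no CUDA SDK install required.
  #include "__clang_cuda_open_stubs.h" /* placeholder name */
  #else
  // Normal path: headers from the CUDA SDK installation.
  #include "cuda_runtime.h"
  #endif

https://reviews.llvm.org/D42800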
[PATCH] D42800: Let CUDA toolchain support amdgpu target
gregrodgers added a comment.

Sorry, all my great inline comments got lost somehow. I am a newbie to Phabricator. I will try to reconstruct my comments.

https://reviews.llvm.org/D42800
[PATCH] D99949: [AMDGPU][OpenMP] Add amdgpu-arch tool to list AMD GPUs installed
gregrodgers requested changes to this revision.
gregrodgers added a comment.
This revision now requires changes to proceed.

I have two serious concerns with this tool:

1. It does not provide the infrastructure to identify runtime capabilities to satisfy requirements of a compiled image.
2. It is not architecturally neutral or extensible to other architectures.

I have a different proposed tool called offload-arch. Here is its help text:

  $ ./offload-arch -h
  offload-arch: Print offload architecture(s) for the current active system,
  or look up information about offload architectures.

  Usage: offload-arch [ Options ] [ Optional lookup-value ]

  With no options, offload-arch prints the value for the first offload
  architecture found in the current system. This value can be used by
  various clang frontends. For example, to compile for openmp offloading
  on the current system, one could invoke clang with the following command:

    clang -fopenmp -openmp-targets=`offload-arch` foo.c

  If an optional lookup-value is specified, offload-arch will check if the
  value is either a valid offload-arch or a codename, and look up the
  requested additional information. For example, this provides all
  information for offload-arch gfx906:

    offload-arch gfx906 -v

  Options:
    -h  Print this help message
    -a  Print values for all devices. Don't stop at first device found.
    -c  Print codename
    -n  Print numeric pci-id
    -t  Print recommended offload triple.
    -v  Verbose = -a -c -n -t
    -l  Print long codename found in pci.ids file
    -r  Print capabilities of current system to satisfy runtime requirements
        of compiled offload images. This option is used by the runtime to
        choose the correct image when multiple compiled images are available.

  The alias amdgpu-arch returns 1 if no amdgcn GPU is found. The alias
  nvidia-arch returns 1 if no cuda GPU is found. These aliases are useful
  to determine if architecture-specific tests should be run, or to
  conditionally load architecture-specific software.

  Copyright (c) 2021 ADVANCED MICRO DEVICES, INC.

Repository: rG LLVM Github Monorepo

https://reviews.llvm.org/D99949
[PATCH] D99949: [AMDGPU][OpenMP] Add amdgpu-arch tool to list AMD GPUs installed
gregrodgers added a comment.

Dependence on hsa is not necessary. The amdgpu and nvidia drivers both make their PCI codes available in /sys. We should use architecture-independent methods as much as possible.

Repository: rG LLVM Github Monorepo

https://reviews.llvm.org/D99949
[PATCH] D99949: [AMDGPU][OpenMP] Add amdgpu-arch tool to list AMD GPUs installed
gregrodgers added inline comments.

Comment at: clang/tools/amdgpu-arch/CMakeLists.txt:9

  +find_package(hsa-runtime64 QUIET 1.2.0 HINTS ${CMAKE_INSTALL_PREFIX} PATHS /opt/rocm)
  +if (NOT ${hsa-runtime64_FOUND})

What happens when /opt/rocm is not available? Again, we need a cross-architecture mechanism to identify the offload-arch.

Repository: rG LLVM Github Monorepo

https://reviews.llvm.org/D99949
[PATCH] D99949: [AMDGPU][OpenMP] Add amdgpu-arch tool to list AMD GPUs installed
gregrodgers accepted this revision.
gregrodgers added a comment.
This revision is now accepted and ready to land.

I am removing my objection, with the understanding that we will either replace or enhance amdgpu-arch with the cross-architecture tool offload-arch as described in my comments above. Also, this patch changes the amd toolchains to use amdgpu-arch, so it will not interfere with how I expect the cross-architecture tool to be used to greatly simplify openmp command line arguments.

Repository: rG LLVM Github Monorepo

https://reviews.llvm.org/D99949