[openmp] [llvm] [clang] [OpenMP] Rework handling of global ctor/dtors in OpenMP (PR #71739)
https://github.com/jplehr edited https://github.com/llvm/llvm-project/pull/71739 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[openmp] [clang] [llvm] [OpenMP] Rework handling of global ctor/dtors in OpenMP (PR #71739)
https://github.com/jplehr commented: I have only briefly looked at the NVPTX implementation. https://github.com/llvm/llvm-project/pull/71739 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[openmp] [llvm] [clang] [OpenMP] Rework handling of global ctor/dtors in OpenMP (PR #71739)
@@ -2627,6 +2637,48 @@ struct AMDGPUDeviceTy : public GenericDeviceTy, AMDGenericDeviceTy { using AMDGPUEventRef = AMDGPUResourceRef; using AMDGPUEventManagerTy = GenericDeviceResourceManagerTy; + /// Common method to invoke a single threaded constructor or destructor + /// kernel by name. + Error callGlobalCtorDtorCommon(GenericPluginTy &Plugin, DeviceImageTy &Image, + const char *Name) { +// Perform a quick check for the named kernel in the image. The kernel +// should be created by the 'amdgpu-lower-ctor-dtor' pass. +GenericGlobalHandlerTy &Handler = Plugin.getGlobalHandler(); +GlobalTy Global(Name, sizeof(void *)); +if (auto Err = Handler.getGlobalMetadataFromImage(*this, Image, Global)) { + consumeError(std::move(Err)); + return Error::success(); jplehr wrote: Is there a specific reason we do not return the error here, but instead consume and return success? Also, I think this should be `Plugin::success()` to not deviate from what is used in the plugin. https://github.com/llvm/llvm-project/pull/71739 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[llvm] [clang] [openmp] [OpenMP] Rework handling of global ctor/dtors in OpenMP (PR #71739)
@@ -2627,6 +2637,48 @@ struct AMDGPUDeviceTy : public GenericDeviceTy, AMDGenericDeviceTy { using AMDGPUEventRef = AMDGPUResourceRef; using AMDGPUEventManagerTy = GenericDeviceResourceManagerTy; + /// Common method to invoke a single threaded constructor or destructor + /// kernel by name. + Error callGlobalCtorDtorCommon(GenericPluginTy &Plugin, DeviceImageTy &Image, + const char *Name) { +// Perform a quick check for the named kernel in the image. The kernel +// should be created by the 'amdgpu-lower-ctor-dtor' pass. +GenericGlobalHandlerTy &Handler = Plugin.getGlobalHandler(); +GlobalTy Global(Name, sizeof(void *)); +if (auto Err = Handler.getGlobalMetadataFromImage(*this, Image, Global)) { + consumeError(std::move(Err)); + return Error::success(); +} + +// Allocate and construct the AMDGPU kernel. +GenericKernelTy *AMDGPUKernel = Plugin.allocate(); +if (!AMDGPUKernel) + return Plugin::error("Failed to allocate memory for AMDGPU kernel"); + +new (AMDGPUKernel) AMDGPUKernelTy(Name); +if (auto Err = AMDGPUKernel->initImpl(*this, Image)) + return std::move(Err); + +auto *AsyncInfoPtr = Plugin.allocate<__tgt_async_info>(); +AsyncInfoWrapperTy AsyncInfoWrapper(*this, AsyncInfoPtr); + +if (auto Err = initAsyncInfoImpl(AsyncInfoWrapper)) + return std::move(Err); + +KernelArgsTy KernelArgs = {}; +if (auto Err = AMDGPUKernel->launchImpl(*this, /*NumThread=*/1u, +/*NumBlocks=*/1ul, KernelArgs, +/*Args=*/nullptr, AsyncInfoWrapper)) + return std::move(Err); + +if (auto Err = synchronize(AsyncInfoPtr)) + return std::move(Err); +Error Err = Error::success(); jplehr wrote: Should this be `Plugin::success()` instead here as well? https://github.com/llvm/llvm-project/pull/71739 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[openmp] [llvm] [clang] [OpenMP] Rework handling of global ctor/dtors in OpenMP (PR #71739)
https://github.com/jplehr edited https://github.com/llvm/llvm-project/pull/71739 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[llvm] [openmp] [clang] [OpenMP] Rework handling of global ctor/dtors in OpenMP (PR #71739)
https://github.com/jplehr commented: Thanks Joseph. Another two nits. https://github.com/llvm/llvm-project/pull/71739 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [openmp] [OpenMP] Rework handling of global ctor/dtors in OpenMP (PR #71739)
@@ -671,6 +671,20 @@ struct GenericDeviceTy : public DeviceAllocatorTy { Error synchronize(__tgt_async_info *AsyncInfo); virtual Error synchronizeImpl(__tgt_async_info &AsyncInfo) = 0; + /// Invokes any global constructors on the device if present and is required + /// by the target. + virtual Error callGlobalConstructors(GenericPluginTy &Plugin, + DeviceImageTy &Image) { +return Error::success(); + } + + /// Invokes any global destructors on the device if present and is required + /// by the target. + virtual Error callGlobalDestructors(GenericPluginTy &Plugin, + DeviceImageTy &Image) { +return Error::success(); jplehr wrote: Plugin::success() https://github.com/llvm/llvm-project/pull/71739 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [openmp] [OpenMP] Rework handling of global ctor/dtors in OpenMP (PR #71739)
@@ -671,6 +671,20 @@ struct GenericDeviceTy : public DeviceAllocatorTy { Error synchronize(__tgt_async_info *AsyncInfo); virtual Error synchronizeImpl(__tgt_async_info &AsyncInfo) = 0; + /// Invokes any global constructors on the device if present and is required + /// by the target. + virtual Error callGlobalConstructors(GenericPluginTy &Plugin, + DeviceImageTy &Image) { +return Error::success(); jplehr wrote: Plugin::success() https://github.com/llvm/llvm-project/pull/71739 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [openmp] [OpenMP] Rework handling of global ctor/dtors in OpenMP (PR #71739)
@@ -2627,6 +2637,48 @@ struct AMDGPUDeviceTy : public GenericDeviceTy, AMDGenericDeviceTy { using AMDGPUEventRef = AMDGPUResourceRef; using AMDGPUEventManagerTy = GenericDeviceResourceManagerTy; + /// Common method to invoke a single threaded constructor or destructor + /// kernel by name. + Error callGlobalCtorDtorCommon(GenericPluginTy &Plugin, DeviceImageTy &Image, + const char *Name) { +// Perform a quick check for the named kernel in the image. The kernel +// should be created by the 'amdgpu-lower-ctor-dtor' pass. +GenericGlobalHandlerTy &Handler = Plugin.getGlobalHandler(); +GlobalTy Global(Name, sizeof(void *)); +if (auto Err = Handler.getGlobalMetadataFromImage(*this, Image, Global)) { + consumeError(std::move(Err)); + return Error::success(); jplehr wrote: That would certainly make it more obvious. https://github.com/llvm/llvm-project/pull/71739 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[openmp] [clang] [OpenMP] Fix runtime problem due to wrong map size. (PR #74692)
jplehr wrote: It appears that this breaks the AMDGPU OpenMP Offload buildbot: https://lab.llvm.org/buildbot/#/builders/193/builds/43297 https://github.com/llvm/llvm-project/pull/74692 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [openmp] [OpenMP] Fix runtime problem due to wrong map size. (PR #74692)
jplehr wrote: I'm looking into it locally https://github.com/llvm/llvm-project/pull/74692 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [openmp] [OpenMP] Fix runtime problem due to wrong map size. (PR #74692)
jplehr wrote: The issue comes from the bot building without setting CMake option `-DLIBOMPTARGET_ENABLE_DEBUG=ON`. This makes the environment variable `LIBOMPTARGET_DEBUG=1` have no effect in the test, i.e., no output for FileCheck to test. https://github.com/llvm/llvm-project/pull/74692 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [OpenMP][USM] Adds test for -fopenmp-force-usm flag (PR #75467)
https://github.com/jplehr created https://github.com/llvm/llvm-project/pull/75467 This adds a basic test to check the correct generation of double indirect access to declare target globals in USM mode vs non-USM mode. I am a bit unhappy with the way this test is set up, but could not find a better way to do it. Happy to improve that and add more tests then. Marked as XFAIL to first land test and then enable in subsequent patch. >From ea2a9191122c5659aac380803b381f763c816e07 Mon Sep 17 00:00:00 2001 From: JP Lehr Date: Wed, 12 Jul 2023 05:04:41 -0400 Subject: [PATCH] [OpenMP][USM] Adds test for -fopenmp-force-usm flag This adds a basic test to check the correct generation of double indirect access to declare target globals in USM mode vs non-USM mode. Marked as XFAIL to first land test and then enable in subsequent patch. --- clang/test/OpenMP/force-usm.c | 73 +++ 1 file changed, 73 insertions(+) create mode 100644 clang/test/OpenMP/force-usm.c diff --git a/clang/test/OpenMP/force-usm.c b/clang/test/OpenMP/force-usm.c new file mode 100644 index 00..222705322b8976 --- /dev/null +++ b/clang/test/OpenMP/force-usm.c @@ -0,0 +1,73 @@ +// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --include-generated-funcs --replace-value-regex "__omp_offloading_[0-9a-z]+_[0-9a-z]+" "pl_cond[.].+[.|,]" --prefix-filecheck-ir-name _ --version 3 +// XFAIL: amdgpu-registered-target + +// RUN: %clang_cc1 -fopenmp -x c++ -std=c++11 -triple x86_64-unknown-unknown -fopenmp-targets=amdgcn-amd-amdhsa -include %S/../../lib/Headers/openmp_wrappers/usm/force_usm.h -emit-llvm-bc %s -o %t-ppc-host.bc +// RUN: %clang_cc1 -fopenmp -x c++ -std=c++11 -triple amdgcn-amd-amdhsa -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -include %S/../../lib/Headers/openmp_wrappers/usm/force_usm.h -fopenmp-is-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o - | FileCheck -check-prefix=CHECK-USM %s + +// RUN: %clang_cc1 -fopenmp -x c++ -std=c++11 -triple x86_64-unknown-unknown -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm-bc %s -o %t-ppc-host.bc +// RUN: %clang_cc1 -fopenmp -x c++ -std=c++11 -triple amdgcn-amd-amdhsa -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o - | FileCheck -check-prefix=CHECK-DEFAULT %s +// expected-no-diagnostics + +extern "C" void *malloc(unsigned int b); + +int GI; +#pragma omp declare target +int *pGI; +#pragma omp end declare target + +int main(void) { + + GI = 0; + + pGI = (int *) malloc(sizeof(int)); + *pGI = 42; + +#pragma omp target map(pGI[:1], GI) + { +GI = 1; +*pGI = 2; + } + + return 0; +} + +// CHECK-USM-LABEL: define weak_odr protected amdgpu_kernel void @{{__omp_offloading_[0-9a-z]+_[0-9a-z]+}}_main_l25 +// CHECK-USM-SAME: (ptr noundef nonnull align 4 dereferenceable(4) [[GI:%.*]]) #[[ATTR0:[0-9]+]] { +// CHECK-USM-NEXT: entry: +// CHECK-USM-NEXT:[[GI_ADDR:%.*]] = alloca ptr, align 8, addrspace(5) +// CHECK-USM-NEXT:[[GI_ADDR_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[GI_ADDR]] to ptr +// CHECK-USM-NEXT:store ptr [[GI]], ptr [[GI_ADDR_ASCAST]], align 8 +// CHECK-USM-NEXT:[[TMP0:%.*]] = load ptr, ptr [[GI_ADDR_ASCAST]], align 8 +// CHECK-USM-NEXT:[[TMP1:%.*]] = call i32 @__kmpc_target_init(ptr addrspacecast (ptr addrspace(1) @[[GLOB1:[0-9]+]] to ptr), i8 1, i1 true) +// CHECK-USM-NEXT:[[EXEC_USER_CODE:%.*]] = icmp eq i32 [[TMP1]], -1 +// CHECK-USM-NEXT:br i1 [[EXEC_USER_CODE]], label [[USER_CODE_ENTRY:%.*]], label [[WORKER_EXIT:%.*]] +// CHECK-USM: user_code.entry: +// CHECK-USM-NEXT:store i32 1, ptr [[TMP0]], align 4 +// CHECK-USM-NEXT:[[TMP2:%.*]] = load ptr, ptr @pGI_decl_tgt_ref_ptr, align 8 +// CHECK-USM-NEXT:[[TMP3:%.*]] = load ptr, ptr [[TMP2]], align 8 +// CHECK-USM-NEXT:store i32 2, ptr [[TMP3]], align 4 +// CHECK-USM-NEXT:call void @__kmpc_target_deinit(ptr addrspacecast (ptr addrspace(1) @[[GLOB1]] to ptr), i8 1) +// CHECK-USM-NEXT:ret void +// CHECK-USM: worker.exit: +// CHECK-USM-NEXT:ret void +// +// +// CHECK-DEFAULT-LABEL: define weak_odr protected amdgpu_kernel void @{{__omp_offloading_[0-9a-z]+_[0-9a-z]+}}_main_l25 +// CHECK-DEFAULT-SAME: (ptr noundef nonnull align 4 dereferenceable(4) [[GI:%.*]]) #[[ATTR0:[0-9]+]] { +// CHECK-DEFAULT-NEXT: entry: +// CHECK-DEFAULT-NEXT:[[GI_ADDR:%.*]] = alloca ptr, align 8, addrspace(5) +// CHECK-DEFAULT-NEXT:[[GI_ADDR_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[GI_ADDR]] to ptr +// CHECK-DEFAULT-NEXT:store ptr [[GI]], ptr [[GI_ADDR_ASCAST]], align 8 +// CHECK-DEFAULT-NEXT:[[TMP0:%.*]] = load ptr, ptr [[GI_ADDR_ASCAST]], align 8 +// CHECK-DEFAULT-NEXT:[[TMP1:%.*]] = call i32 @__kmpc_target_init(ptr addrspacecast (ptr addrspace(1) @[[GLOB1:[0-9]+]] to ptr), i8 1, i1 true) +// CHECK-DEFAULT-NEXT:[[EXEC_USER_CODE:%.*]] = icmp eq i32 [[TMP1]], -1 +// CHECK-DEFAULT-NEXT:
[clang] [OpenMP] Introduce -fopenmp-force-usm flag (PR #75468)
https://github.com/jplehr created https://github.com/llvm/llvm-project/pull/75468 The new flag implements logic to include `#pragma omp requires unified_shared_memory` in every translation unit. This enables a straightforward way to enable USM for an application without the need to modify sources. This is the flag mentioned in https://github.com/llvm/llvm-project/pull/75467 Once the test landed, I'll rebase and enable the test with this patch. >From bc912bf0a63e6d10b60655d26846731d961021f3 Mon Sep 17 00:00:00 2001 From: JP Lehr Date: Thu, 6 Jul 2023 16:47:21 -0400 Subject: [PATCH] [OpenMP] Introduce -fopenmp-force-usm flag The new flag implements logic to include #pragma omp requires unified_shared_memory in every translation unit. This enables a straightforward way to enable USM for an application without the need to modify sources. --- clang/include/clang/Driver/Options.td | 2 ++ clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp | 14 ++ clang/lib/Headers/CMakeLists.txt | 1 + clang/lib/Headers/openmp_wrappers/usm/force_usm.h | 6 ++ 4 files changed, 23 insertions(+) create mode 100644 clang/lib/Headers/openmp_wrappers/usm/force_usm.h diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td index 1b02087425b751..b9cd3043a13a9a 100644 --- a/clang/include/clang/Driver/Options.td +++ b/clang/include/clang/Driver/Options.td @@ -3381,6 +3381,8 @@ def fopenmp_cuda_blocks_per_sm_EQ : Joined<["-"], "fopenmp-cuda-blocks-per-sm="> Flags<[NoArgumentUnused, HelpHidden]>, Visibility<[ClangOption, CC1Option]>; def fopenmp_cuda_teams_reduction_recs_num_EQ : Joined<["-"], "fopenmp-cuda-teams-reduction-recs-num=">, Group, Flags<[NoArgumentUnused, HelpHidden]>, Visibility<[ClangOption, CC1Option]>; +def fopenmp_force_usm : Flag<["-"], "fopenmp-force-usm">, Group, + Flags<[NoArgumentUnused, HelpHidden]>, Visibility<[CC1Option]>; //===--===// // Shared cc1 + fc1 OpenMP Target Options diff --git a/clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp b/clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp index b012b7cb729378..2484a59085c276 100644 --- a/clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp +++ b/clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp @@ -129,6 +129,20 @@ AMDGPUOpenMPToolChain::GetCXXStdlibType(const ArgList &Args) const { void AMDGPUOpenMPToolChain::AddClangSystemIncludeArgs( const ArgList &DriverArgs, ArgStringList &CC1Args) const { HostTC.AddClangSystemIncludeArgs(DriverArgs, CC1Args); + + CC1Args.push_back("-internal-isystem"); + SmallString<128> P(HostTC.getDriver().ResourceDir); + llvm::sys::path::append(P, "include/cuda_wrappers"); + CC1Args.push_back(DriverArgs.MakeArgString(P)); + + // Force APU mode will focefully include #pragma omp requires + // unified_shared_memory via the force_usm header + if (DriverArgs.hasArg(options::OPT_fopenmp_force_usm)) { +CC1Args.push_back("-include"); +CC1Args.push_back( +DriverArgs.MakeArgString(HostTC.getDriver().ResourceDir + + "/include/openmp_wrappers/force_usm.h")); + } } void AMDGPUOpenMPToolChain::AddIAMCUIncludeArgs(const ArgList &Args, diff --git a/clang/lib/Headers/CMakeLists.txt b/clang/lib/Headers/CMakeLists.txt index f8fdd402777e48..aac232fa8b4405 100644 --- a/clang/lib/Headers/CMakeLists.txt +++ b/clang/lib/Headers/CMakeLists.txt @@ -319,6 +319,7 @@ set(openmp_wrapper_files openmp_wrappers/__clang_openmp_device_functions.h openmp_wrappers/complex_cmath.h openmp_wrappers/new + openmp_wrappers/usm/force_usm.h ) set(llvm_libc_wrapper_files diff --git a/clang/lib/Headers/openmp_wrappers/usm/force_usm.h b/clang/lib/Headers/openmp_wrappers/usm/force_usm.h new file mode 100644 index 00..15c394e27ce9c2 --- /dev/null +++ b/clang/lib/Headers/openmp_wrappers/usm/force_usm.h @@ -0,0 +1,6 @@ +#ifndef __CLANG_FORCE_OPENMP_USM +#define __CLANG_FORCE_OPENMP_USM + +#pragma omp requires unified_shared_memory + +#endif ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [OpenMP] Introduce -fopenmp-force-usm flag (PR #75468)
https://github.com/jplehr updated https://github.com/llvm/llvm-project/pull/75468 >From 9809ba1ec31cb1a4a066f709ae8bd3e965e1 Mon Sep 17 00:00:00 2001 From: JP Lehr Date: Thu, 6 Jul 2023 16:47:21 -0400 Subject: [PATCH] [OpenMP] Introduce -fopenmp-force-usm flag The new flag implements logic to include #pragma omp requires unified_shared_memory in every translation unit. This enables a straightforward way to enable USM for an application without the need to modify sources. --- clang/include/clang/Driver/Options.td| 2 ++ clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp | 16 clang/lib/Headers/CMakeLists.txt | 1 + .../lib/Headers/openmp_wrappers/usm/force_usm.h | 6 ++ 4 files changed, 25 insertions(+) create mode 100644 clang/lib/Headers/openmp_wrappers/usm/force_usm.h diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td index 1b02087425b751..b9cd3043a13a9a 100644 --- a/clang/include/clang/Driver/Options.td +++ b/clang/include/clang/Driver/Options.td @@ -3381,6 +3381,8 @@ def fopenmp_cuda_blocks_per_sm_EQ : Joined<["-"], "fopenmp-cuda-blocks-per-sm="> Flags<[NoArgumentUnused, HelpHidden]>, Visibility<[ClangOption, CC1Option]>; def fopenmp_cuda_teams_reduction_recs_num_EQ : Joined<["-"], "fopenmp-cuda-teams-reduction-recs-num=">, Group, Flags<[NoArgumentUnused, HelpHidden]>, Visibility<[ClangOption, CC1Option]>; +def fopenmp_force_usm : Flag<["-"], "fopenmp-force-usm">, Group, + Flags<[NoArgumentUnused, HelpHidden]>, Visibility<[CC1Option]>; //===--===// // Shared cc1 + fc1 OpenMP Target Options diff --git a/clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp b/clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp index b012b7cb729378..a077f2f06d7728 100644 --- a/clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp +++ b/clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp @@ -129,6 +129,22 @@ AMDGPUOpenMPToolChain::GetCXXStdlibType(const ArgList &Args) const { void AMDGPUOpenMPToolChain::AddClangSystemIncludeArgs( const ArgList &DriverArgs, ArgStringList &CC1Args) const { HostTC.AddClangSystemIncludeArgs(DriverArgs, CC1Args); + + CC1Args.push_back("-internal-isystem"); + SmallString<128> P(HostTC.getDriver().ResourceDir); + llvm::sys::path::append(P, "include/cuda_wrappers"); + CC1Args.push_back(DriverArgs.MakeArgString(P)); + + // Force USM mode will forcefully include #pragma omp requires + // unified_shared_memory via the force_usm header + // XXX This may result in a compilation error if the source + // file already includes that pragma. + if (DriverArgs.hasArg(options::OPT_fopenmp_force_usm)) { +CC1Args.push_back("-include"); +CC1Args.push_back( +DriverArgs.MakeArgString(HostTC.getDriver().ResourceDir + + "/include/openmp_wrappers/force_usm.h")); + } } void AMDGPUOpenMPToolChain::AddIAMCUIncludeArgs(const ArgList &Args, diff --git a/clang/lib/Headers/CMakeLists.txt b/clang/lib/Headers/CMakeLists.txt index f8fdd402777e48..aac232fa8b4405 100644 --- a/clang/lib/Headers/CMakeLists.txt +++ b/clang/lib/Headers/CMakeLists.txt @@ -319,6 +319,7 @@ set(openmp_wrapper_files openmp_wrappers/__clang_openmp_device_functions.h openmp_wrappers/complex_cmath.h openmp_wrappers/new + openmp_wrappers/usm/force_usm.h ) set(llvm_libc_wrapper_files diff --git a/clang/lib/Headers/openmp_wrappers/usm/force_usm.h b/clang/lib/Headers/openmp_wrappers/usm/force_usm.h new file mode 100644 index 00..15c394e27ce9c2 --- /dev/null +++ b/clang/lib/Headers/openmp_wrappers/usm/force_usm.h @@ -0,0 +1,6 @@ +#ifndef __CLANG_FORCE_OPENMP_USM +#define __CLANG_FORCE_OPENMP_USM + +#pragma omp requires unified_shared_memory + +#endif ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [OpenMP] Introduce -fopenmp-force-usm flag (PR #75468)
@@ -129,6 +129,22 @@ AMDGPUOpenMPToolChain::GetCXXStdlibType(const ArgList &Args) const { void AMDGPUOpenMPToolChain::AddClangSystemIncludeArgs( const ArgList &DriverArgs, ArgStringList &CC1Args) const { HostTC.AddClangSystemIncludeArgs(DriverArgs, CC1Args); + + CC1Args.push_back("-internal-isystem"); + SmallString<128> P(HostTC.getDriver().ResourceDir); + llvm::sys::path::append(P, "include/cuda_wrappers"); + CC1Args.push_back(DriverArgs.MakeArgString(P)); + + // Force USM mode will forcefully include #pragma omp requires + // unified_shared_memory via the force_usm header + // XXX This may result in a compilation error if the source + // file already includes that pragma. + if (DriverArgs.hasArg(options::OPT_fopenmp_force_usm)) { +CC1Args.push_back("-include"); +CC1Args.push_back( +DriverArgs.MakeArgString(HostTC.getDriver().ResourceDir + + "/include/openmp_wrappers/force_usm.h")); jplehr wrote: I'm happy to change that to something more reasonable, if you can point out where to look for inspiration on how to do it properly. https://github.com/llvm/llvm-project/pull/75468 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [OpenMP][USM] Adds test for -fopenmp-force-usm flag (PR #75467)
https://github.com/jplehr updated https://github.com/llvm/llvm-project/pull/75467 >From d3d073d7f57f2a5d06cd8c1de8c1503034af3b6b Mon Sep 17 00:00:00 2001 From: JP Lehr Date: Wed, 12 Jul 2023 05:04:41 -0400 Subject: [PATCH] [OpenMP][USM] Adds test for -fopenmp-force-usm flag This adds a basic test to check the correct generation of double indirect access to declare target globals in USM mode vs non-USM mode. Marked as XFAIL to first land test and then enable in subsequent patch. --- clang/test/OpenMP/force-usm.c | 74 +++ 1 file changed, 74 insertions(+) create mode 100644 clang/test/OpenMP/force-usm.c diff --git a/clang/test/OpenMP/force-usm.c b/clang/test/OpenMP/force-usm.c new file mode 100644 index 00..f04d499e5f71c2 --- /dev/null +++ b/clang/test/OpenMP/force-usm.c @@ -0,0 +1,74 @@ +// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --include-generated-funcs --replace-value-regex "__omp_offloading_[0-9a-z]+_[0-9a-z]+" "pl_cond[.].+[.|,]" --prefix-filecheck-ir-name _ --version 3 +// REQUIRES: amdgpu-registered-target +// XFAIL: amdgpu-registered-target + +// RUN: %clang_cc1 -fopenmp -x c++ -std=c++11 -triple x86_64-unknown-unknown -fopenmp-targets=amdgcn-amd-amdhsa -include %S/../../lib/Headers/openmp_wrappers/usm/force_usm.h -emit-llvm-bc %s -o %t-ppc-host.bc +// RUN: %clang_cc1 -fopenmp -x c++ -std=c++11 -triple amdgcn-amd-amdhsa -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -include %S/../../lib/Headers/openmp_wrappers/usm/force_usm.h -fopenmp-is-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o - | FileCheck -check-prefix=CHECK-USM %s + +// RUN: %clang_cc1 -fopenmp -x c++ -std=c++11 -triple x86_64-unknown-unknown -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm-bc %s -o %t-ppc-host.bc +// RUN: %clang_cc1 -fopenmp -x c++ -std=c++11 -triple amdgcn-amd-amdhsa -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o - | FileCheck -check-prefix=CHECK-DEFAULT %s +// expected-no-diagnostics + +extern "C" void *malloc(unsigned int b); + +int GI; +#pragma omp declare target +int *pGI; +#pragma omp end declare target + +int main(void) { + + GI = 0; + + pGI = (int *) malloc(sizeof(int)); + *pGI = 42; + +#pragma omp target map(pGI[:1], GI) + { +GI = 1; +*pGI = 2; + } + + return 0; +} + +// CHECK-USM-LABEL: define weak_odr protected amdgpu_kernel void @{{__omp_offloading_[0-9a-z]+_[0-9a-z]+}}_main_l25 +// CHECK-USM-SAME: (ptr noundef nonnull align 4 dereferenceable(4) [[GI:%.*]]) #[[ATTR0:[0-9]+]] { +// CHECK-USM-NEXT: entry: +// CHECK-USM-NEXT:[[GI_ADDR:%.*]] = alloca ptr, align 8, addrspace(5) +// CHECK-USM-NEXT:[[GI_ADDR_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[GI_ADDR]] to ptr +// CHECK-USM-NEXT:store ptr [[GI]], ptr [[GI_ADDR_ASCAST]], align 8 +// CHECK-USM-NEXT:[[TMP0:%.*]] = load ptr, ptr [[GI_ADDR_ASCAST]], align 8 +// CHECK-USM-NEXT:[[TMP1:%.*]] = call i32 @__kmpc_target_init(ptr addrspacecast (ptr addrspace(1) @[[GLOB1:[0-9]+]] to ptr), i8 1, i1 true) +// CHECK-USM-NEXT:[[EXEC_USER_CODE:%.*]] = icmp eq i32 [[TMP1]], -1 +// CHECK-USM-NEXT:br i1 [[EXEC_USER_CODE]], label [[USER_CODE_ENTRY:%.*]], label [[WORKER_EXIT:%.*]] +// CHECK-USM: user_code.entry: +// CHECK-USM-NEXT:store i32 1, ptr [[TMP0]], align 4 +// CHECK-USM-NEXT:[[TMP2:%.*]] = load ptr, ptr @pGI_decl_tgt_ref_ptr, align 8 +// CHECK-USM-NEXT:[[TMP3:%.*]] = load ptr, ptr [[TMP2]], align 8 +// CHECK-USM-NEXT:store i32 2, ptr [[TMP3]], align 4 +// CHECK-USM-NEXT:call void @__kmpc_target_deinit(ptr addrspacecast (ptr addrspace(1) @[[GLOB1]] to ptr), i8 1) +// CHECK-USM-NEXT:ret void +// CHECK-USM: worker.exit: +// CHECK-USM-NEXT:ret void +// +// +// CHECK-DEFAULT-LABEL: define weak_odr protected amdgpu_kernel void @{{__omp_offloading_[0-9a-z]+_[0-9a-z]+}}_main_l25 +// CHECK-DEFAULT-SAME: (ptr noundef nonnull align 4 dereferenceable(4) [[GI:%.*]]) #[[ATTR0:[0-9]+]] { +// CHECK-DEFAULT-NEXT: entry: +// CHECK-DEFAULT-NEXT:[[GI_ADDR:%.*]] = alloca ptr, align 8, addrspace(5) +// CHECK-DEFAULT-NEXT:[[GI_ADDR_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[GI_ADDR]] to ptr +// CHECK-DEFAULT-NEXT:store ptr [[GI]], ptr [[GI_ADDR_ASCAST]], align 8 +// CHECK-DEFAULT-NEXT:[[TMP0:%.*]] = load ptr, ptr [[GI_ADDR_ASCAST]], align 8 +// CHECK-DEFAULT-NEXT:[[TMP1:%.*]] = call i32 @__kmpc_target_init(ptr addrspacecast (ptr addrspace(1) @[[GLOB1:[0-9]+]] to ptr), i8 1, i1 true) +// CHECK-DEFAULT-NEXT:[[EXEC_USER_CODE:%.*]] = icmp eq i32 [[TMP1]], -1 +// CHECK-DEFAULT-NEXT:br i1 [[EXEC_USER_CODE]], label [[USER_CODE_ENTRY:%.*]], label [[WORKER_EXIT:%.*]] +// CHECK-DEFAULT: user_code.entry: +// CHECK-DEFAULT-NEXT:store i32 1, ptr [[TMP0]], align 4 +// CHECK-DEFAULT-NEXT:[[TMP2:%.*]] = load ptr, ptr addrspacecast (ptr addrspace(1) @pGI to ptr), align 8 +// CHECK-DEFAU
[clang] [OpenMP] Introduce -fopenmp-force-usm flag (PR #75468)
@@ -3381,6 +3381,8 @@ def fopenmp_cuda_blocks_per_sm_EQ : Joined<["-"], "fopenmp-cuda-blocks-per-sm="> Flags<[NoArgumentUnused, HelpHidden]>, Visibility<[ClangOption, CC1Option]>; def fopenmp_cuda_teams_reduction_recs_num_EQ : Joined<["-"], "fopenmp-cuda-teams-reduction-recs-num=">, Group, Flags<[NoArgumentUnused, HelpHidden]>, Visibility<[ClangOption, CC1Option]>; +def fopenmp_force_usm : Flag<["-"], "fopenmp-force-usm">, Group, + Flags<[NoArgumentUnused, HelpHidden]>, Visibility<[CC1Option]>; jplehr wrote: With the intent to remove the USM behavior from a codebase that has the requires pragma, by basically just ignoring it? https://github.com/llvm/llvm-project/pull/75468 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [OpenMP] Introduce -fopenmp-force-usm flag (PR #75468)
https://github.com/jplehr updated https://github.com/llvm/llvm-project/pull/75468 >From 4ecd07d786a5a994b33b9177d4e21d839bfe3fc9 Mon Sep 17 00:00:00 2001 From: JP Lehr Date: Thu, 6 Jul 2023 16:47:21 -0400 Subject: [PATCH] [OpenMP] Introduce -fopenmp-force-usm flag The new flag implements logic to include #pragma omp requires unified_shared_memory in every translation unit. This enables a straightforward way to enable USM for an application without the need to modify sources. --- clang/include/clang/Driver/Options.td| 2 ++ clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp | 16 clang/lib/Headers/CMakeLists.txt | 1 + .../lib/Headers/openmp_wrappers/usm/force_usm.h | 6 ++ 4 files changed, 25 insertions(+) create mode 100644 clang/lib/Headers/openmp_wrappers/usm/force_usm.h diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td index 1b02087425b751..73325d5620cc10 100644 --- a/clang/include/clang/Driver/Options.td +++ b/clang/include/clang/Driver/Options.td @@ -3381,6 +3381,8 @@ def fopenmp_cuda_blocks_per_sm_EQ : Joined<["-"], "fopenmp-cuda-blocks-per-sm="> Flags<[NoArgumentUnused, HelpHidden]>, Visibility<[ClangOption, CC1Option]>; def fopenmp_cuda_teams_reduction_recs_num_EQ : Joined<["-"], "fopenmp-cuda-teams-reduction-recs-num=">, Group, Flags<[NoArgumentUnused, HelpHidden]>, Visibility<[ClangOption, CC1Option]>; +def fopenmp_force_usm : Flag<["-"], "fopenmp-force-usm">, Group, + Flags<[NoArgumentUnused, HelpHidden]>, Visibility<[ClangOption, CC1Option]>; //===--===// // Shared cc1 + fc1 OpenMP Target Options diff --git a/clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp b/clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp index b012b7cb729378..a077f2f06d7728 100644 --- a/clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp +++ b/clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp @@ -129,6 +129,22 @@ AMDGPUOpenMPToolChain::GetCXXStdlibType(const ArgList &Args) const { void AMDGPUOpenMPToolChain::AddClangSystemIncludeArgs( const ArgList &DriverArgs, ArgStringList &CC1Args) const { HostTC.AddClangSystemIncludeArgs(DriverArgs, CC1Args); + + CC1Args.push_back("-internal-isystem"); + SmallString<128> P(HostTC.getDriver().ResourceDir); + llvm::sys::path::append(P, "include/cuda_wrappers"); + CC1Args.push_back(DriverArgs.MakeArgString(P)); + + // Force USM mode will forcefully include #pragma omp requires + // unified_shared_memory via the force_usm header + // XXX This may result in a compilation error if the source + // file already includes that pragma. + if (DriverArgs.hasArg(options::OPT_fopenmp_force_usm)) { +CC1Args.push_back("-include"); +CC1Args.push_back( +DriverArgs.MakeArgString(HostTC.getDriver().ResourceDir + + "/include/openmp_wrappers/force_usm.h")); + } } void AMDGPUOpenMPToolChain::AddIAMCUIncludeArgs(const ArgList &Args, diff --git a/clang/lib/Headers/CMakeLists.txt b/clang/lib/Headers/CMakeLists.txt index f8fdd402777e48..aac232fa8b4405 100644 --- a/clang/lib/Headers/CMakeLists.txt +++ b/clang/lib/Headers/CMakeLists.txt @@ -319,6 +319,7 @@ set(openmp_wrapper_files openmp_wrappers/__clang_openmp_device_functions.h openmp_wrappers/complex_cmath.h openmp_wrappers/new + openmp_wrappers/usm/force_usm.h ) set(llvm_libc_wrapper_files diff --git a/clang/lib/Headers/openmp_wrappers/usm/force_usm.h b/clang/lib/Headers/openmp_wrappers/usm/force_usm.h new file mode 100644 index 00..15c394e27ce9c2 --- /dev/null +++ b/clang/lib/Headers/openmp_wrappers/usm/force_usm.h @@ -0,0 +1,6 @@ +#ifndef __CLANG_FORCE_OPENMP_USM +#define __CLANG_FORCE_OPENMP_USM + +#pragma omp requires unified_shared_memory + +#endif ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[openmp] [clang] [Clang][OpenMP] Fix mapping of structs to device (PR #75642)
jplehr wrote: It appears that this patch made the buildbot unhappy (https://lab.llvm.org/buildbot/#/builders/193/builds/43948). Let me know if you need help with this. https://github.com/llvm/llvm-project/pull/75642 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[llvm] [openmp] [mlir] [clang] [OpenMP] Introduce the KernelLaunchEnvironment as implicit argument (PR #70401)
jplehr wrote: This was brought up and discussed in the weekly meeting. https://github.com/llvm/llvm-project/pull/70401 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [OpenMP] Introduce -fopenmp-force-usm flag (PR #75468)
https://github.com/jplehr updated https://github.com/llvm/llvm-project/pull/75468 >From 8f381c760fca8a4abd7550c492ff22fa8972933a Mon Sep 17 00:00:00 2001 From: JP Lehr Date: Thu, 6 Jul 2023 16:47:21 -0400 Subject: [PATCH 1/3] [OpenMP] Introduce -fopenmp-force-usm flag The new flag implements logic to include #pragma omp requires unified_shared_memory in every translation unit. This enables a straightforward way to enable USM for an application without the need to modify sources. --- clang/include/clang/Driver/Options.td| 2 ++ clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp | 16 clang/lib/Headers/CMakeLists.txt | 1 + .../lib/Headers/openmp_wrappers/usm/force_usm.h | 6 ++ 4 files changed, 25 insertions(+) create mode 100644 clang/lib/Headers/openmp_wrappers/usm/force_usm.h diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td index 2b93ddf033499c..e33bc7d1b10d71 100644 --- a/clang/include/clang/Driver/Options.td +++ b/clang/include/clang/Driver/Options.td @@ -3381,6 +3381,8 @@ def fopenmp_cuda_blocks_per_sm_EQ : Joined<["-"], "fopenmp-cuda-blocks-per-sm="> Flags<[NoArgumentUnused, HelpHidden]>, Visibility<[ClangOption, CC1Option]>; def fopenmp_cuda_teams_reduction_recs_num_EQ : Joined<["-"], "fopenmp-cuda-teams-reduction-recs-num=">, Group, Flags<[NoArgumentUnused, HelpHidden]>, Visibility<[ClangOption, CC1Option]>; +def fopenmp_force_usm : Flag<["-"], "fopenmp-force-usm">, Group, + Flags<[NoArgumentUnused, HelpHidden]>, Visibility<[ClangOption, CC1Option]>; //===--===// // Shared cc1 + fc1 OpenMP Target Options diff --git a/clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp b/clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp index b012b7cb729378..a077f2f06d7728 100644 --- a/clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp +++ b/clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp @@ -129,6 +129,22 @@ AMDGPUOpenMPToolChain::GetCXXStdlibType(const ArgList &Args) const { void AMDGPUOpenMPToolChain::AddClangSystemIncludeArgs( const ArgList &DriverArgs, ArgStringList &CC1Args) const { HostTC.AddClangSystemIncludeArgs(DriverArgs, CC1Args); + + CC1Args.push_back("-internal-isystem"); + SmallString<128> P(HostTC.getDriver().ResourceDir); + llvm::sys::path::append(P, "include/cuda_wrappers"); + CC1Args.push_back(DriverArgs.MakeArgString(P)); + + // Force USM mode will forcefully include #pragma omp requires + // unified_shared_memory via the force_usm header + // XXX This may result in a compilation error if the source + // file already includes that pragma. + if (DriverArgs.hasArg(options::OPT_fopenmp_force_usm)) { +CC1Args.push_back("-include"); +CC1Args.push_back( +DriverArgs.MakeArgString(HostTC.getDriver().ResourceDir + + "/include/openmp_wrappers/force_usm.h")); + } } void AMDGPUOpenMPToolChain::AddIAMCUIncludeArgs(const ArgList &Args, diff --git a/clang/lib/Headers/CMakeLists.txt b/clang/lib/Headers/CMakeLists.txt index 735e4e4e3be89b..ed491779abcd00 100644 --- a/clang/lib/Headers/CMakeLists.txt +++ b/clang/lib/Headers/CMakeLists.txt @@ -320,6 +320,7 @@ set(openmp_wrapper_files openmp_wrappers/__clang_openmp_device_functions.h openmp_wrappers/complex_cmath.h openmp_wrappers/new + openmp_wrappers/usm/force_usm.h ) set(llvm_libc_wrapper_files diff --git a/clang/lib/Headers/openmp_wrappers/usm/force_usm.h b/clang/lib/Headers/openmp_wrappers/usm/force_usm.h new file mode 100644 index 00..15c394e27ce9c2 --- /dev/null +++ b/clang/lib/Headers/openmp_wrappers/usm/force_usm.h @@ -0,0 +1,6 @@ +#ifndef __CLANG_FORCE_OPENMP_USM +#define __CLANG_FORCE_OPENMP_USM + +#pragma omp requires unified_shared_memory + +#endif >From 4d5a1f670b3bdd5b183515e347610414cb12cb90 Mon Sep 17 00:00:00 2001 From: JP Lehr Date: Fri, 29 Dec 2023 04:33:19 -0500 Subject: [PATCH 2/3] Revert "[OpenMP] Introduce -fopenmp-force-usm flag" This reverts commit 4ecd07d786a5a994b33b9177d4e21d839bfe3fc9. To test the other solution. --- clang/include/clang/Driver/Options.td| 2 -- clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp | 16 clang/lib/Headers/CMakeLists.txt | 1 - .../lib/Headers/openmp_wrappers/usm/force_usm.h | 6 -- 4 files changed, 25 deletions(-) delete mode 100644 clang/lib/Headers/openmp_wrappers/usm/force_usm.h diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td index e33bc7d1b10d71..2b93ddf033499c 100644 --- a/clang/include/clang/Driver/Options.td +++ b/clang/include/clang/Driver/Options.td @@ -3381,8 +3381,6 @@ def fopenmp_cuda_blocks_per_sm_EQ : Joined<["-"], "fopenmp-cuda-blocks-per-sm="> Flags<[NoArgumentUnused, HelpHidden]>, Visibility<[ClangOption, CC1Option]>; def fopenmp_cuda_teams_reduction_recs_num_EQ : Joined<["-"], "fopenmp-cuda-teams-reduction-recs-num=">, Group
[clang] [OpenMP][USM] Adds test for -fopenmp-force-usm flag (PR #75467)
jplehr wrote: I updated the feature PR (#75468) with a different solution. Will update the test after feedback if the route I took in the other PR is seen as OK. https://github.com/llvm/llvm-project/pull/75467 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [OpenMP] Introduce -fopenmp-force-usm flag (PR #75468)
jplehr wrote: Hmm.. I guess I screwed something up with git and the history. https://github.com/llvm/llvm-project/pull/75468 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [OpenMP][USM] Introduces -fopenmp-force-usm flag (PR #76571)
https://github.com/jplehr created https://github.com/llvm/llvm-project/pull/76571 This flag forces the compiler to generate code for OpenMP target regions as if the user specified the #pragma omp requires unified_shared_memory in each source file. The option does not have a -fno-* friend since OpenMP requires the unified_shared_memory clause to be present in all source files. Since this flag does no harm if the clause is present, it can be used in conjunction. My understanding is that USM should not be turned off selectively, hence, no -fno- version. In favor of https://github.com/llvm/llvm-project/pull/75468, sorry for the noise and confusion. >From bf25a538e7c020efde557b595eba64b804cbb817 Mon Sep 17 00:00:00 2001 From: JP Lehr Date: Fri, 29 Dec 2023 04:32:24 -0500 Subject: [PATCH] [OpenMP][USM] Introduces -fopenmp-force-usm flag This flag forces the compiler to generate code for OpenMP target regions as if the user specified the #pragma omp requires unified_shared_memory in each source file. The option does not have a -fno-* friend since OpenMP requires the unified_shared_memory clause to be present in all source files. Since this flag does no harm if the clause is present, it can be used in conjunction. My understanding is that USM should not be turned off selectively, hence, no -fno- version. --- clang/include/clang/Basic/LangOptions.def | 1 + clang/include/clang/Driver/Options.td | 4 clang/lib/CodeGen/CGOpenMPRuntime.cpp | 7 +++ clang/lib/Driver/ToolChains/Clang.cpp | 2 ++ 4 files changed, 14 insertions(+) diff --git a/clang/include/clang/Basic/LangOptions.def b/clang/include/clang/Basic/LangOptions.def index 21abc346cf17ac..81cf2ad9498a7f 100644 --- a/clang/include/clang/Basic/LangOptions.def +++ b/clang/include/clang/Basic/LangOptions.def @@ -260,6 +260,7 @@ LANGOPT(OpenMPTeamSubscription , 1, 0, "Assume distributed loops do not have mo LANGOPT(OpenMPNoThreadState , 1, 0, "Assume that no thread in a parallel region will modify an ICV.") LANGOPT(OpenMPNoNestedParallelism , 1, 0, "Assume that no thread in a parallel region will encounter a parallel region") LANGOPT(OpenMPOffloadMandatory , 1, 0, "Assert that offloading is mandatory and do not create a host fallback.") +LANGOPT(OpenMPForceUSM , 1, 0, "Enable OpenMP unified shared memory mode via compiler.") LANGOPT(NoGPULib , 1, 0, "Indicate a build without the standard GPU libraries.") LANGOPT(RenderScript , 1, 0, "RenderScript") diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td index 2b93ddf033499c..28290da438c62d 100644 --- a/clang/include/clang/Driver/Options.td +++ b/clang/include/clang/Driver/Options.td @@ -3451,6 +3451,10 @@ def fopenmp_offload_mandatory : Flag<["-"], "fopenmp-offload-mandatory">, Group< Flags<[NoArgumentUnused]>, Visibility<[ClangOption, CC1Option]>, HelpText<"Do not create a host fallback if offloading to the device fails.">, MarshallingInfoFlag>; +def fopenmp_force_usm : Flag<["-"], "fopenmp-force-usm">, Group, + Flags<[NoArgumentUnused]>, Visibility<[ClangOption, CC1Option]>, + HelpText<"Force behvaior as if the user specified pragma omp requires unified_shared_memory.">, + MarshallingInfoFlag>; def fopenmp_target_jit : Flag<["-"], "fopenmp-target-jit">, Group, Flags<[NoArgumentUnused]>, Visibility<[ClangOption, CLOption]>, HelpText<"Emit code that can be JIT compiled for OpenMP offloading. Implies -foffload-lto=full">; diff --git a/clang/lib/CodeGen/CGOpenMPRuntime.cpp b/clang/lib/CodeGen/CGOpenMPRuntime.cpp index ea6645a39e8321..4855e7410a015a 100644 --- a/clang/lib/CodeGen/CGOpenMPRuntime.cpp +++ b/clang/lib/CodeGen/CGOpenMPRuntime.cpp @@ -1044,6 +1044,13 @@ CGOpenMPRuntime::CGOpenMPRuntime(CodeGenModule &CGM) ? CGM.getLangOpts().OMPHostIRFile : StringRef{}); OMPBuilder.setConfig(Config); + + // The user forces the compiler to behave as if omp requires + // unified_shared_memory was given. + if (CGM.getLangOpts().OpenMPForceUSM) { +HasRequiresUnifiedSharedMemory = true; +OMPBuilder.Config.setHasRequiresUnifiedSharedMemory(true); + } } void CGOpenMPRuntime::clear() { diff --git a/clang/lib/Driver/ToolChains/Clang.cpp b/clang/lib/Driver/ToolChains/Clang.cpp index acfa119805068d..ffc24201ab2e0b 100644 --- a/clang/lib/Driver/ToolChains/Clang.cpp +++ b/clang/lib/Driver/ToolChains/Clang.cpp @@ -6382,6 +6382,8 @@ void Clang::ConstructJob(Compilation &C, const JobAction &JA, CmdArgs.push_back("-fopenmp-assume-no-nested-parallelism"); if (Args.hasArg(options::OPT_fopenmp_offload_mandatory)) CmdArgs.push_back("-fopenmp-offload-mandatory"); + if (Args.hasArg(options::OPT_fopenmp_force_usm)) +CmdArgs.push_back("-fopenmp-force-usm"); break; default: // By default, if Clang doesn't know how to generate useful OpenMP code _
[clang] [OpenMP] Introduce -fopenmp-force-usm flag (PR #75468)
https://github.com/jplehr closed https://github.com/llvm/llvm-project/pull/75468 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [OpenMP][USM] Introduces -fopenmp-force-usm flag (PR #76571)
jplehr wrote: Is the approach taken in this approach acceptable as opposed to the header solution I put up earlier? https://github.com/llvm/llvm-project/pull/76571 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [OpenMP][USM] Introduces -fopenmp-force-usm flag (PR #76571)
jplehr wrote: > > Is the approach taken in this approach acceptable as opposed to the header > > solution I put up earlier? > > Yes, it's pretty much exactly what I had in mind from my suggestion in the > last PR. Thanks. Perfect. I'll go ahead and add lit and runtime tests. https://github.com/llvm/llvm-project/pull/76571 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [OpenMP][USM] Adds test for -fopenmp-force-usm flag (PR #75467)
jplehr wrote: The IR is impacted for the global that is in that test case. Lines ~46-50 (first IR section) vs line ~68 (second IR section). The remaining code is indeed the same. The way that this test is executed is out of date however, given that I have reimplemented the flag. My plan is to keep this test and add a few runtime tests as well, since we should be able to validate that we do not see data transfers. https://github.com/llvm/llvm-project/pull/75467 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[flang] [mlir] [clang] [Flang][OpenMP][MLIR] Add support for -nogpulib option (PR #71045)
jplehr wrote: We have the buildbot now up in staging: https://lab.llvm.org/staging/#/builders/188 https://github.com/llvm/llvm-project/pull/71045 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang-repl] Factor out CreateJITBuilder() and allow specialization in derived classes (PR #84461)
Stefan =?utf-8?q?Gränitz?= Message-ID: In-Reply-To: jplehr wrote: Hi, I think this one broke one of our buildbots: https://lab.llvm.org/buildbot/#/builders/259/builds/1769 I'm happy to help looking into it. https://github.com/llvm/llvm-project/pull/84461 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang-repl] Factor out CreateJITBuilder() and allow specialization in derived classes (PR #84461)
Stefan =?utf-8?q?Gränitz?= Message-ID: In-Reply-To: jplehr wrote: Thanks! I am quite unfamiliar with that part of the code base and wonder if the symbol needs to just exist somewhere. The other thing used there (`InitializeNativeTargetAsmPrinter`) is declared in `TargetSelect.h`. So, does it need to be a two parts fix: one to declare the symbol and then the magic to do the right thing at runtime. If what you linked does both, even better. :) https://github.com/llvm/llvm-project/pull/84461 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang-repl] Factor out CreateJITBuilder() and allow specialization in derived classes (PR #84461)
Stefan =?utf-8?q?Gr=C3=A4nitz?= Message-ID: In-Reply-To: jplehr wrote: I see. Thanks for the explanation and the fix! Bot is back to green. https://github.com/llvm/llvm-project/pull/84461 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang-tools-extra] [clang] [llvm] [XCOFF] Add compiler version to an auxiliary symbol table entry (PR #80162)
jplehr wrote: This introduced a spelling mistake that broke some builds. https://lab.llvm.org/buildbot/#/builders/193/builds/46220 https://github.com/llvm/llvm-project/pull/80162 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang] Append target search paths for direct offloading compilation (PR #82699)
https://github.com/jplehr approved this pull request. LG https://github.com/llvm/llvm-project/pull/82699 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang][CMake] Support perf, LBR, and Instrument CLANG_BOLT options (PR #69133)
jplehr wrote: Hi @aaupov I think this did break the AMD Hip build bot (another annotated builder). Are you looking into the potential issue? https://github.com/llvm/llvm-project/pull/69133 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [openmp] [OpenMP][USM] Introduces -fopenmp-force-usm flag (PR #76571)
https://github.com/jplehr updated https://github.com/llvm/llvm-project/pull/76571 >From bf25a538e7c020efde557b595eba64b804cbb817 Mon Sep 17 00:00:00 2001 From: JP Lehr Date: Fri, 29 Dec 2023 04:32:24 -0500 Subject: [PATCH 1/3] [OpenMP][USM] Introduces -fopenmp-force-usm flag This flag forces the compiler to generate code for OpenMP target regions as if the user specified the #pragma omp requires unified_shared_memory in each source file. The option does not have a -fno-* friend since OpenMP requires the unified_shared_memory clause to be present in all source files. Since this flag does no harm if the clause is present, it can be used in conjunction. My understanding is that USM should not be turned off selectively, hence, no -fno- version. --- clang/include/clang/Basic/LangOptions.def | 1 + clang/include/clang/Driver/Options.td | 4 clang/lib/CodeGen/CGOpenMPRuntime.cpp | 7 +++ clang/lib/Driver/ToolChains/Clang.cpp | 2 ++ 4 files changed, 14 insertions(+) diff --git a/clang/include/clang/Basic/LangOptions.def b/clang/include/clang/Basic/LangOptions.def index 21abc346cf17ac..81cf2ad9498a7f 100644 --- a/clang/include/clang/Basic/LangOptions.def +++ b/clang/include/clang/Basic/LangOptions.def @@ -260,6 +260,7 @@ LANGOPT(OpenMPTeamSubscription , 1, 0, "Assume distributed loops do not have mo LANGOPT(OpenMPNoThreadState , 1, 0, "Assume that no thread in a parallel region will modify an ICV.") LANGOPT(OpenMPNoNestedParallelism , 1, 0, "Assume that no thread in a parallel region will encounter a parallel region") LANGOPT(OpenMPOffloadMandatory , 1, 0, "Assert that offloading is mandatory and do not create a host fallback.") +LANGOPT(OpenMPForceUSM , 1, 0, "Enable OpenMP unified shared memory mode via compiler.") LANGOPT(NoGPULib , 1, 0, "Indicate a build without the standard GPU libraries.") LANGOPT(RenderScript , 1, 0, "RenderScript") diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td index 2b93ddf033499c..28290da438c62d 100644 --- a/clang/include/clang/Driver/Options.td +++ b/clang/include/clang/Driver/Options.td @@ -3451,6 +3451,10 @@ def fopenmp_offload_mandatory : Flag<["-"], "fopenmp-offload-mandatory">, Group< Flags<[NoArgumentUnused]>, Visibility<[ClangOption, CC1Option]>, HelpText<"Do not create a host fallback if offloading to the device fails.">, MarshallingInfoFlag>; +def fopenmp_force_usm : Flag<["-"], "fopenmp-force-usm">, Group, + Flags<[NoArgumentUnused]>, Visibility<[ClangOption, CC1Option]>, + HelpText<"Force behvaior as if the user specified pragma omp requires unified_shared_memory.">, + MarshallingInfoFlag>; def fopenmp_target_jit : Flag<["-"], "fopenmp-target-jit">, Group, Flags<[NoArgumentUnused]>, Visibility<[ClangOption, CLOption]>, HelpText<"Emit code that can be JIT compiled for OpenMP offloading. Implies -foffload-lto=full">; diff --git a/clang/lib/CodeGen/CGOpenMPRuntime.cpp b/clang/lib/CodeGen/CGOpenMPRuntime.cpp index ea6645a39e8321..4855e7410a015a 100644 --- a/clang/lib/CodeGen/CGOpenMPRuntime.cpp +++ b/clang/lib/CodeGen/CGOpenMPRuntime.cpp @@ -1044,6 +1044,13 @@ CGOpenMPRuntime::CGOpenMPRuntime(CodeGenModule &CGM) ? CGM.getLangOpts().OMPHostIRFile : StringRef{}); OMPBuilder.setConfig(Config); + + // The user forces the compiler to behave as if omp requires + // unified_shared_memory was given. + if (CGM.getLangOpts().OpenMPForceUSM) { +HasRequiresUnifiedSharedMemory = true; +OMPBuilder.Config.setHasRequiresUnifiedSharedMemory(true); + } } void CGOpenMPRuntime::clear() { diff --git a/clang/lib/Driver/ToolChains/Clang.cpp b/clang/lib/Driver/ToolChains/Clang.cpp index acfa119805068d..ffc24201ab2e0b 100644 --- a/clang/lib/Driver/ToolChains/Clang.cpp +++ b/clang/lib/Driver/ToolChains/Clang.cpp @@ -6382,6 +6382,8 @@ void Clang::ConstructJob(Compilation &C, const JobAction &JA, CmdArgs.push_back("-fopenmp-assume-no-nested-parallelism"); if (Args.hasArg(options::OPT_fopenmp_offload_mandatory)) CmdArgs.push_back("-fopenmp-offload-mandatory"); + if (Args.hasArg(options::OPT_fopenmp_force_usm)) +CmdArgs.push_back("-fopenmp-force-usm"); break; default: // By default, if Clang doesn't know how to generate useful OpenMP code >From 11ad5633889870d897bfc4e77bc41b569e5ce539 Mon Sep 17 00:00:00 2001 From: JP Lehr Date: Wed, 12 Jul 2023 05:04:41 -0400 Subject: [PATCH 2/3] [OpenMP][USM] Adds test for -fopenmp-force-usm flag This adds a basic test to check the correct generation of double indirect access to declare target globals in USM mode vs non-USM mode. Marked as XFAIL to first land test and then enable in subsequent patch. --- clang/test/OpenMP/force-usm.c | 74 +++ 1 file changed, 74 insertions(+) create mode 100644 clang/test/OpenMP/force-usm.c diff --git a/clang/test/OpenM
[clang] [openmp] [OpenMP][USM] Introduces -fopenmp-force-usm flag (PR #76571)
jplehr wrote: While I add some documentation, I'd appreciate feedback especially on the lit side of things. I would very much like to rename the pretty happy tripple-X workaround for substitution debugging into something sane. https://github.com/llvm/llvm-project/pull/76571 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [OpenMP][USM] Adds test for -fopenmp-force-usm flag (PR #75467)
jplehr wrote: Closing this. Test is now part of feature-PR. https://github.com/llvm/llvm-project/pull/75467 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [OpenMP][USM] Adds test for -fopenmp-force-usm flag (PR #75467)
https://github.com/jplehr closed https://github.com/llvm/llvm-project/pull/75467 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [openmp] [OpenMP][USM] Introduces -fopenmp-force-usm flag (PR #76571)
https://github.com/jplehr updated https://github.com/llvm/llvm-project/pull/76571 >From bf25a538e7c020efde557b595eba64b804cbb817 Mon Sep 17 00:00:00 2001 From: JP Lehr Date: Fri, 29 Dec 2023 04:32:24 -0500 Subject: [PATCH 1/4] [OpenMP][USM] Introduces -fopenmp-force-usm flag This flag forces the compiler to generate code for OpenMP target regions as if the user specified the #pragma omp requires unified_shared_memory in each source file. The option does not have a -fno-* friend since OpenMP requires the unified_shared_memory clause to be present in all source files. Since this flag does no harm if the clause is present, it can be used in conjunction. My understanding is that USM should not be turned off selectively, hence, no -fno- version. --- clang/include/clang/Basic/LangOptions.def | 1 + clang/include/clang/Driver/Options.td | 4 clang/lib/CodeGen/CGOpenMPRuntime.cpp | 7 +++ clang/lib/Driver/ToolChains/Clang.cpp | 2 ++ 4 files changed, 14 insertions(+) diff --git a/clang/include/clang/Basic/LangOptions.def b/clang/include/clang/Basic/LangOptions.def index 21abc346cf17ac3..81cf2ad9498a7f9 100644 --- a/clang/include/clang/Basic/LangOptions.def +++ b/clang/include/clang/Basic/LangOptions.def @@ -260,6 +260,7 @@ LANGOPT(OpenMPTeamSubscription , 1, 0, "Assume distributed loops do not have mo LANGOPT(OpenMPNoThreadState , 1, 0, "Assume that no thread in a parallel region will modify an ICV.") LANGOPT(OpenMPNoNestedParallelism , 1, 0, "Assume that no thread in a parallel region will encounter a parallel region") LANGOPT(OpenMPOffloadMandatory , 1, 0, "Assert that offloading is mandatory and do not create a host fallback.") +LANGOPT(OpenMPForceUSM , 1, 0, "Enable OpenMP unified shared memory mode via compiler.") LANGOPT(NoGPULib , 1, 0, "Indicate a build without the standard GPU libraries.") LANGOPT(RenderScript , 1, 0, "RenderScript") diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td index 2b93ddf033499cc..28290da438c62db 100644 --- a/clang/include/clang/Driver/Options.td +++ b/clang/include/clang/Driver/Options.td @@ -3451,6 +3451,10 @@ def fopenmp_offload_mandatory : Flag<["-"], "fopenmp-offload-mandatory">, Group< Flags<[NoArgumentUnused]>, Visibility<[ClangOption, CC1Option]>, HelpText<"Do not create a host fallback if offloading to the device fails.">, MarshallingInfoFlag>; +def fopenmp_force_usm : Flag<["-"], "fopenmp-force-usm">, Group, + Flags<[NoArgumentUnused]>, Visibility<[ClangOption, CC1Option]>, + HelpText<"Force behvaior as if the user specified pragma omp requires unified_shared_memory.">, + MarshallingInfoFlag>; def fopenmp_target_jit : Flag<["-"], "fopenmp-target-jit">, Group, Flags<[NoArgumentUnused]>, Visibility<[ClangOption, CLOption]>, HelpText<"Emit code that can be JIT compiled for OpenMP offloading. Implies -foffload-lto=full">; diff --git a/clang/lib/CodeGen/CGOpenMPRuntime.cpp b/clang/lib/CodeGen/CGOpenMPRuntime.cpp index ea6645a39e83218..4855e7410a015aa 100644 --- a/clang/lib/CodeGen/CGOpenMPRuntime.cpp +++ b/clang/lib/CodeGen/CGOpenMPRuntime.cpp @@ -1044,6 +1044,13 @@ CGOpenMPRuntime::CGOpenMPRuntime(CodeGenModule &CGM) ? CGM.getLangOpts().OMPHostIRFile : StringRef{}); OMPBuilder.setConfig(Config); + + // The user forces the compiler to behave as if omp requires + // unified_shared_memory was given. + if (CGM.getLangOpts().OpenMPForceUSM) { +HasRequiresUnifiedSharedMemory = true; +OMPBuilder.Config.setHasRequiresUnifiedSharedMemory(true); + } } void CGOpenMPRuntime::clear() { diff --git a/clang/lib/Driver/ToolChains/Clang.cpp b/clang/lib/Driver/ToolChains/Clang.cpp index acfa119805068d2..ffc24201ab2e0b5 100644 --- a/clang/lib/Driver/ToolChains/Clang.cpp +++ b/clang/lib/Driver/ToolChains/Clang.cpp @@ -6382,6 +6382,8 @@ void Clang::ConstructJob(Compilation &C, const JobAction &JA, CmdArgs.push_back("-fopenmp-assume-no-nested-parallelism"); if (Args.hasArg(options::OPT_fopenmp_offload_mandatory)) CmdArgs.push_back("-fopenmp-offload-mandatory"); + if (Args.hasArg(options::OPT_fopenmp_force_usm)) +CmdArgs.push_back("-fopenmp-force-usm"); break; default: // By default, if Clang doesn't know how to generate useful OpenMP code >From 11ad5633889870d897bfc4e77bc41b569e5ce539 Mon Sep 17 00:00:00 2001 From: JP Lehr Date: Wed, 12 Jul 2023 05:04:41 -0400 Subject: [PATCH 2/4] [OpenMP][USM] Adds test for -fopenmp-force-usm flag This adds a basic test to check the correct generation of double indirect access to declare target globals in USM mode vs non-USM mode. Marked as XFAIL to first land test and then enable in subsequent patch. --- clang/test/OpenMP/force-usm.c | 74 +++ 1 file changed, 74 insertions(+) create mode 100644 clang/test/OpenMP/force-usm.c diff --git a/clang/te
[clang] [openmp] [OpenMP][USM] Introduces -fopenmp-force-usm flag (PR #76571)
https://github.com/jplehr updated https://github.com/llvm/llvm-project/pull/76571 >From bf25a538e7c020efde557b595eba64b804cbb817 Mon Sep 17 00:00:00 2001 From: JP Lehr Date: Fri, 29 Dec 2023 04:32:24 -0500 Subject: [PATCH 1/5] [OpenMP][USM] Introduces -fopenmp-force-usm flag This flag forces the compiler to generate code for OpenMP target regions as if the user specified the #pragma omp requires unified_shared_memory in each source file. The option does not have a -fno-* friend since OpenMP requires the unified_shared_memory clause to be present in all source files. Since this flag does no harm if the clause is present, it can be used in conjunction. My understanding is that USM should not be turned off selectively, hence, no -fno- version. --- clang/include/clang/Basic/LangOptions.def | 1 + clang/include/clang/Driver/Options.td | 4 clang/lib/CodeGen/CGOpenMPRuntime.cpp | 7 +++ clang/lib/Driver/ToolChains/Clang.cpp | 2 ++ 4 files changed, 14 insertions(+) diff --git a/clang/include/clang/Basic/LangOptions.def b/clang/include/clang/Basic/LangOptions.def index 21abc346cf17ac..81cf2ad9498a7f 100644 --- a/clang/include/clang/Basic/LangOptions.def +++ b/clang/include/clang/Basic/LangOptions.def @@ -260,6 +260,7 @@ LANGOPT(OpenMPTeamSubscription , 1, 0, "Assume distributed loops do not have mo LANGOPT(OpenMPNoThreadState , 1, 0, "Assume that no thread in a parallel region will modify an ICV.") LANGOPT(OpenMPNoNestedParallelism , 1, 0, "Assume that no thread in a parallel region will encounter a parallel region") LANGOPT(OpenMPOffloadMandatory , 1, 0, "Assert that offloading is mandatory and do not create a host fallback.") +LANGOPT(OpenMPForceUSM , 1, 0, "Enable OpenMP unified shared memory mode via compiler.") LANGOPT(NoGPULib , 1, 0, "Indicate a build without the standard GPU libraries.") LANGOPT(RenderScript , 1, 0, "RenderScript") diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td index 2b93ddf033499c..28290da438c62d 100644 --- a/clang/include/clang/Driver/Options.td +++ b/clang/include/clang/Driver/Options.td @@ -3451,6 +3451,10 @@ def fopenmp_offload_mandatory : Flag<["-"], "fopenmp-offload-mandatory">, Group< Flags<[NoArgumentUnused]>, Visibility<[ClangOption, CC1Option]>, HelpText<"Do not create a host fallback if offloading to the device fails.">, MarshallingInfoFlag>; +def fopenmp_force_usm : Flag<["-"], "fopenmp-force-usm">, Group, + Flags<[NoArgumentUnused]>, Visibility<[ClangOption, CC1Option]>, + HelpText<"Force behvaior as if the user specified pragma omp requires unified_shared_memory.">, + MarshallingInfoFlag>; def fopenmp_target_jit : Flag<["-"], "fopenmp-target-jit">, Group, Flags<[NoArgumentUnused]>, Visibility<[ClangOption, CLOption]>, HelpText<"Emit code that can be JIT compiled for OpenMP offloading. Implies -foffload-lto=full">; diff --git a/clang/lib/CodeGen/CGOpenMPRuntime.cpp b/clang/lib/CodeGen/CGOpenMPRuntime.cpp index ea6645a39e8321..4855e7410a015a 100644 --- a/clang/lib/CodeGen/CGOpenMPRuntime.cpp +++ b/clang/lib/CodeGen/CGOpenMPRuntime.cpp @@ -1044,6 +1044,13 @@ CGOpenMPRuntime::CGOpenMPRuntime(CodeGenModule &CGM) ? CGM.getLangOpts().OMPHostIRFile : StringRef{}); OMPBuilder.setConfig(Config); + + // The user forces the compiler to behave as if omp requires + // unified_shared_memory was given. + if (CGM.getLangOpts().OpenMPForceUSM) { +HasRequiresUnifiedSharedMemory = true; +OMPBuilder.Config.setHasRequiresUnifiedSharedMemory(true); + } } void CGOpenMPRuntime::clear() { diff --git a/clang/lib/Driver/ToolChains/Clang.cpp b/clang/lib/Driver/ToolChains/Clang.cpp index acfa119805068d..ffc24201ab2e0b 100644 --- a/clang/lib/Driver/ToolChains/Clang.cpp +++ b/clang/lib/Driver/ToolChains/Clang.cpp @@ -6382,6 +6382,8 @@ void Clang::ConstructJob(Compilation &C, const JobAction &JA, CmdArgs.push_back("-fopenmp-assume-no-nested-parallelism"); if (Args.hasArg(options::OPT_fopenmp_offload_mandatory)) CmdArgs.push_back("-fopenmp-offload-mandatory"); + if (Args.hasArg(options::OPT_fopenmp_force_usm)) +CmdArgs.push_back("-fopenmp-force-usm"); break; default: // By default, if Clang doesn't know how to generate useful OpenMP code >From 11ad5633889870d897bfc4e77bc41b569e5ce539 Mon Sep 17 00:00:00 2001 From: JP Lehr Date: Wed, 12 Jul 2023 05:04:41 -0400 Subject: [PATCH 2/5] [OpenMP][USM] Adds test for -fopenmp-force-usm flag This adds a basic test to check the correct generation of double indirect access to declare target globals in USM mode vs non-USM mode. Marked as XFAIL to first land test and then enable in subsequent patch. --- clang/test/OpenMP/force-usm.c | 74 +++ 1 file changed, 74 insertions(+) create mode 100644 clang/test/OpenMP/force-usm.c diff --git a/clang/test/OpenM
[openmp] [clang] [OpenMP][USM] Introduces -fopenmp-force-usm flag (PR #76571)
jplehr wrote: @carlobertolli can you have another look at the runtime test I added to see if that addresses your feedback? https://github.com/llvm/llvm-project/pull/76571 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [openmp] [OpenMP][USM] Introduces -fopenmp-force-usm flag (PR #76571)
jplehr wrote: > Automatic zero-copy doesn't work on some of the bbot's. I will have to land > this once the lit test harness extension in #77851 re-lands. Having your work landed would be very helpful indeed. https://github.com/llvm/llvm-project/pull/76571 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [openmp] [OpenMP][USM] Introduces -fopenmp-force-usm flag (PR #76571)
https://github.com/jplehr updated https://github.com/llvm/llvm-project/pull/76571 >From 41b227e2c84b3c7eeedb6a9ebf559bec2c34aec3 Mon Sep 17 00:00:00 2001 From: JP Lehr Date: Fri, 29 Dec 2023 04:32:24 -0500 Subject: [PATCH] [OpenMP][USM] Introduces -fopenmp-force-usm flag This flag forces the compiler to generate code for OpenMP target regions as if the user specified the #pragma omp requires unified_shared_memory in each source file. The option does not have a -fno-* friend since OpenMP requires the unified_shared_memory clause to be present in all source files. Since this flag does no harm if the clause is present, it can be used in conjunction. My understanding is that USM should not be turned off selectively, hence, no -fno- version. This adds a basic test to check the correct generation of double indirect access to declare target globals in USM mode vs non-USM mode. Which I think is the only difference observable in code generation. This runtime test checks for the (non-)occurence of data movement between host and device. It does one run without the flag and one with the flag to also see that both versions behave as expected. In the case w/o the new flag data movement between host and device is expected. In the case with the flag such data movement should not be present / reported. --- clang/include/clang/Basic/LangOptions.def | 1 + clang/include/clang/Driver/Options.td | 4 + clang/lib/CodeGen/CGOpenMPRuntime.cpp | 7 ++ clang/lib/Driver/ToolChains/Clang.cpp | 2 + clang/test/OpenMP/force-usm.c | 74 +++ openmp/libomptarget/test/lit.cfg | 8 ++ .../test/offloading/force-usm.cpp | 59 +++ 7 files changed, 155 insertions(+) create mode 100644 clang/test/OpenMP/force-usm.c create mode 100644 openmp/libomptarget/test/offloading/force-usm.cpp diff --git a/clang/include/clang/Basic/LangOptions.def b/clang/include/clang/Basic/LangOptions.def index 21abc346cf17ac..81cf2ad9498a7f 100644 --- a/clang/include/clang/Basic/LangOptions.def +++ b/clang/include/clang/Basic/LangOptions.def @@ -260,6 +260,7 @@ LANGOPT(OpenMPTeamSubscription , 1, 0, "Assume distributed loops do not have mo LANGOPT(OpenMPNoThreadState , 1, 0, "Assume that no thread in a parallel region will modify an ICV.") LANGOPT(OpenMPNoNestedParallelism , 1, 0, "Assume that no thread in a parallel region will encounter a parallel region") LANGOPT(OpenMPOffloadMandatory , 1, 0, "Assert that offloading is mandatory and do not create a host fallback.") +LANGOPT(OpenMPForceUSM , 1, 0, "Enable OpenMP unified shared memory mode via compiler.") LANGOPT(NoGPULib , 1, 0, "Indicate a build without the standard GPU libraries.") LANGOPT(RenderScript , 1, 0, "RenderScript") diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td index 2b93ddf033499c..28290da438c62d 100644 --- a/clang/include/clang/Driver/Options.td +++ b/clang/include/clang/Driver/Options.td @@ -3451,6 +3451,10 @@ def fopenmp_offload_mandatory : Flag<["-"], "fopenmp-offload-mandatory">, Group< Flags<[NoArgumentUnused]>, Visibility<[ClangOption, CC1Option]>, HelpText<"Do not create a host fallback if offloading to the device fails.">, MarshallingInfoFlag>; +def fopenmp_force_usm : Flag<["-"], "fopenmp-force-usm">, Group, + Flags<[NoArgumentUnused]>, Visibility<[ClangOption, CC1Option]>, + HelpText<"Force behvaior as if the user specified pragma omp requires unified_shared_memory.">, + MarshallingInfoFlag>; def fopenmp_target_jit : Flag<["-"], "fopenmp-target-jit">, Group, Flags<[NoArgumentUnused]>, Visibility<[ClangOption, CLOption]>, HelpText<"Emit code that can be JIT compiled for OpenMP offloading. Implies -foffload-lto=full">; diff --git a/clang/lib/CodeGen/CGOpenMPRuntime.cpp b/clang/lib/CodeGen/CGOpenMPRuntime.cpp index ea6645a39e8321..4855e7410a015a 100644 --- a/clang/lib/CodeGen/CGOpenMPRuntime.cpp +++ b/clang/lib/CodeGen/CGOpenMPRuntime.cpp @@ -1044,6 +1044,13 @@ CGOpenMPRuntime::CGOpenMPRuntime(CodeGenModule &CGM) ? CGM.getLangOpts().OMPHostIRFile : StringRef{}); OMPBuilder.setConfig(Config); + + // The user forces the compiler to behave as if omp requires + // unified_shared_memory was given. + if (CGM.getLangOpts().OpenMPForceUSM) { +HasRequiresUnifiedSharedMemory = true; +OMPBuilder.Config.setHasRequiresUnifiedSharedMemory(true); + } } void CGOpenMPRuntime::clear() { diff --git a/clang/lib/Driver/ToolChains/Clang.cpp b/clang/lib/Driver/ToolChains/Clang.cpp index acfa119805068d..ffc24201ab2e0b 100644 --- a/clang/lib/Driver/ToolChains/Clang.cpp +++ b/clang/lib/Driver/ToolChains/Clang.cpp @@ -6382,6 +6382,8 @@ void Clang::ConstructJob(Compilation &C, const JobAction &JA, CmdArgs.push_back("-fopenmp-assume-no-nested-parallelism"); if (Args.hasArg(options::OPT_fopenmp_offlo
[clang] [openmp] [OpenMP][USM] Introduces -fopenmp-force-usm flag (PR #76571)
jplehr wrote: I just realized that I need to update the clang lit tests, so this is *not ready to land*, but I don't see a button to indicate that. https://github.com/llvm/llvm-project/pull/76571 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[openmp] [clang] [OpenMP][USM] Introduces -fopenmp-force-usm flag (PR #76571)
https://github.com/jplehr updated https://github.com/llvm/llvm-project/pull/76571 >From a6c437a52674613b90c451c2ed4105265f420a32 Mon Sep 17 00:00:00 2001 From: JP Lehr Date: Fri, 29 Dec 2023 04:32:24 -0500 Subject: [PATCH] [OpenMP][USM] Introduces -fopenmp-force-usm flag This flag forces the compiler to generate code for OpenMP target regions as if the user specified the #pragma omp requires unified_shared_memory in each source file. The option does not have a -fno-* friend since OpenMP requires the unified_shared_memory clause to be present in all source files. Since this flag does no harm if the clause is present, it can be used in conjunction. My understanding is that USM should not be turned off selectively, hence, no -fno- version. This adds a basic test to check the correct generation of double indirect access to declare target globals in USM mode vs non-USM mode. Which I think is the only difference observable in code generation. This runtime test checks for the (non-)occurence of data movement between host and device. It does one run without the flag and one with the flag to also see that both versions behave as expected. In the case w/o the new flag data movement between host and device is expected. In the case with the flag such data movement should not be present / reported. --- clang/include/clang/Basic/LangOptions.def | 1 + clang/include/clang/Driver/Options.td | 4 + clang/lib/CodeGen/CGOpenMPRuntime.cpp | 7 ++ clang/lib/Driver/ToolChains/Clang.cpp | 2 + clang/test/OpenMP/force-usm.c | 79 +++ openmp/libomptarget/test/lit.cfg | 8 ++ .../test/offloading/force-usm.cpp | 59 ++ 7 files changed, 160 insertions(+) create mode 100644 clang/test/OpenMP/force-usm.c create mode 100644 openmp/libomptarget/test/offloading/force-usm.cpp diff --git a/clang/include/clang/Basic/LangOptions.def b/clang/include/clang/Basic/LangOptions.def index 21abc346cf17ac..81cf2ad9498a7f 100644 --- a/clang/include/clang/Basic/LangOptions.def +++ b/clang/include/clang/Basic/LangOptions.def @@ -260,6 +260,7 @@ LANGOPT(OpenMPTeamSubscription , 1, 0, "Assume distributed loops do not have mo LANGOPT(OpenMPNoThreadState , 1, 0, "Assume that no thread in a parallel region will modify an ICV.") LANGOPT(OpenMPNoNestedParallelism , 1, 0, "Assume that no thread in a parallel region will encounter a parallel region") LANGOPT(OpenMPOffloadMandatory , 1, 0, "Assert that offloading is mandatory and do not create a host fallback.") +LANGOPT(OpenMPForceUSM , 1, 0, "Enable OpenMP unified shared memory mode via compiler.") LANGOPT(NoGPULib , 1, 0, "Indicate a build without the standard GPU libraries.") LANGOPT(RenderScript , 1, 0, "RenderScript") diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td index 2b93ddf033499c..28290da438c62d 100644 --- a/clang/include/clang/Driver/Options.td +++ b/clang/include/clang/Driver/Options.td @@ -3451,6 +3451,10 @@ def fopenmp_offload_mandatory : Flag<["-"], "fopenmp-offload-mandatory">, Group< Flags<[NoArgumentUnused]>, Visibility<[ClangOption, CC1Option]>, HelpText<"Do not create a host fallback if offloading to the device fails.">, MarshallingInfoFlag>; +def fopenmp_force_usm : Flag<["-"], "fopenmp-force-usm">, Group, + Flags<[NoArgumentUnused]>, Visibility<[ClangOption, CC1Option]>, + HelpText<"Force behvaior as if the user specified pragma omp requires unified_shared_memory.">, + MarshallingInfoFlag>; def fopenmp_target_jit : Flag<["-"], "fopenmp-target-jit">, Group, Flags<[NoArgumentUnused]>, Visibility<[ClangOption, CLOption]>, HelpText<"Emit code that can be JIT compiled for OpenMP offloading. Implies -foffload-lto=full">; diff --git a/clang/lib/CodeGen/CGOpenMPRuntime.cpp b/clang/lib/CodeGen/CGOpenMPRuntime.cpp index ea6645a39e8321..4855e7410a015a 100644 --- a/clang/lib/CodeGen/CGOpenMPRuntime.cpp +++ b/clang/lib/CodeGen/CGOpenMPRuntime.cpp @@ -1044,6 +1044,13 @@ CGOpenMPRuntime::CGOpenMPRuntime(CodeGenModule &CGM) ? CGM.getLangOpts().OMPHostIRFile : StringRef{}); OMPBuilder.setConfig(Config); + + // The user forces the compiler to behave as if omp requires + // unified_shared_memory was given. + if (CGM.getLangOpts().OpenMPForceUSM) { +HasRequiresUnifiedSharedMemory = true; +OMPBuilder.Config.setHasRequiresUnifiedSharedMemory(true); + } } void CGOpenMPRuntime::clear() { diff --git a/clang/lib/Driver/ToolChains/Clang.cpp b/clang/lib/Driver/ToolChains/Clang.cpp index acfa119805068d..ffc24201ab2e0b 100644 --- a/clang/lib/Driver/ToolChains/Clang.cpp +++ b/clang/lib/Driver/ToolChains/Clang.cpp @@ -6382,6 +6382,8 @@ void Clang::ConstructJob(Compilation &C, const JobAction &JA, CmdArgs.push_back("-fopenmp-assume-no-nested-parallelism"); if (Args.hasArg(options::OPT_fopenmp_offloa
[clang] [openmp] [OpenMP][USM] Introduces -fopenmp-force-usm flag (PR #76571)
https://github.com/jplehr closed https://github.com/llvm/llvm-project/pull/76571 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] AMDGPU: Duplicate instead of COPY constants from VGPR to SGPR (PR #66882)
jplehr wrote: It appears that this change made the AMDGPU OpenMP buildbot unhappy https://lab.llvm.org/buildbot/#/builders/193/builds/39050 https://github.com/llvm/llvm-project/pull/66882 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang-tools-extra] AMDGPU: Duplicate instead of COPY constants from VGPR to SGPR (PR #66882)
jplehr wrote: Sure, I'll look into it later today and get back to you. https://github.com/llvm/llvm-project/pull/66882 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [OpenMP] Introduce the initial support for OpenMP kernel language (PR #66844)
jplehr wrote: It seems that this broke the AMDGPU OpenMP buildbot https://lab.llvm.org/buildbot/#/builders/193/builds/39393 I saw that you have since pushed up one patch regarding pointer compares. Are you looking at the remaining test fails as well? https://github.com/llvm/llvm-project/pull/66844 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang][OpenMP] Clang adding the addrSpace according to DataLayout fix (PR #65483)
jplehr wrote: I believe this broke one of the AMDGPU OpenMP buildbots https://lab.llvm.org/staging/#/builders/247/builds/6351 https://github.com/llvm/llvm-project/pull/65483 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [libc] [llvm] [openmp] [libc] Rework the GPU build to be a regular target (PR #81921)
@@ -102,6 +80,121 @@ function(add_entrypoint_library target_name) list(APPEND all_deps ${entrypoint_target}) endforeach(dep) list(REMOVE_DUPLICATES all_deps) + set(${result} ${all_deps} PARENT_SCOPE) +endfunction() + +# A rule to build a library from a collection of entrypoint objects and bundle +# it into a GPU fatbinary. Usage is the same as 'add_entrypoint_library'. +# Usage: +# add_gpu_entrypoint_library( +# DEPENDS +# ) +function(add_gpu_entrypoint_library target_name) + cmake_parse_arguments( +"ENTRYPOINT_LIBRARY" +"" # No optional arguments +"" # No single value arguments +"DEPENDS" # Multi-value arguments +${ARGN} + ) + if(NOT ENTRYPOINT_LIBRARY_DEPENDS) +message(FATAL_ERROR "'add_entrypoint_library' target requires a DEPENDS list " +"of 'add_entrypoint_object' targets.") + endif() + + get_fq_deps_list(fq_deps_list ${ENTRYPOINT_LIBRARY_DEPENDS}) + get_all_object_file_deps(all_deps "${fq_deps_list}") + + # The GPU 'libc' needs to be exported in a format that can be linked with + # offloading langauges like OpenMP or CUDA. This wraps every GPU object into a + # fat binary and adds them to a static library. + set(objects "") + foreach(dep IN LISTS all_deps) +set(object $<$,${dep}>:$>) +string(FIND ${dep} "." last_dot_loc REVERSE) +math(EXPR name_loc "${last_dot_loc} + 1") +string(SUBSTRING ${dep} ${name_loc} -1 name) +if(LIBC_TARGET_ARCHITECTURE_IS_NVPTX) + set(prefix --image=arch=generic,triple=nvptx64-nvidia-cuda,feature=+ptx63) +else() jplehr wrote: Other places do `elseif(LIBC_TARGET_ARCHITECTURE_IS_AMDGPU)`. Maybe here as well for consistency? https://github.com/llvm/llvm-project/pull/81921 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [libc] [llvm] [openmp] [libc] Rework the GPU build to be a regular target (PR #81921)
https://github.com/jplehr edited https://github.com/llvm/llvm-project/pull/81921 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [libc] [llvm] [openmp] [libc] Rework the GPU build to be a regular target (PR #81921)
https://github.com/jplehr commented: I looked at the changes and from the little I understand CMake they seem ok. I added one nit. Maybe @saiislam can have a look as well. https://github.com/llvm/llvm-project/pull/81921 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [Offload] Provide a kernel library useable by the offload runtime (PR #104168)
https://github.com/jplehr edited https://github.com/llvm/llvm-project/pull/104168 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [Offload] Provide a kernel library useable by the offload runtime (PR #104168)
https://github.com/jplehr commented: I tried a bot-config build and it fails with a compiler error, see my comment. https://github.com/llvm/llvm-project/pull/104168 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [Offload] Provide a kernel library useable by the offload runtime (PR #104168)
@@ -1533,6 +1565,67 @@ Error GenericDeviceTy::printInfo() { return Plugin::success(); } +Expected GenericDeviceTy::getKernel(llvm::StringRef Name, + DeviceImageTy *ImagePtr, + bool Optional) { + bool KernelFound = false; + GenericKernelTy *&KernelPtr = KernelMap[Name]; + if (!KernelPtr) { +GenericGlobalHandlerTy &GHandler = Plugin.getGlobalHandler(); + +auto CheckImage = [&](DeviceImageTy &Image) -> GenericKernelTy * { + if (!GHandler.isSymbolInImage(*this, Image, Name)) +return nullptr; + KernelFound = true; + + auto KernelOrErr = constructKernelImpl(Name); + if (Error Err = KernelOrErr.takeError()) { +[[maybe_unused]] std::string ErrStr = toString(std::move(Err)); +DP("Failed to construct kernel ('%s'): %s", Name.data(), + ErrStr.c_str()); +return nullptr; + } + + GenericKernelTy &Kernel = *KernelOrErr; + if (auto Err = Kernel.init(*this, Image)) { +[[maybe_unused]] std::string ErrStr = toString(std::move(Err)); +DP("Failed to initialize kernel ('%s'): %s", Name.data(), + ErrStr.c_str()); +return nullptr; + } + + return &Kernel; +}; + +if (ImagePtr) { + KernelPtr = CheckImage(*ImagePtr); +} else { + for (DeviceImageTy *Image : LoadedImages) { +KernelPtr = CheckImage(*Image); +if (KernelPtr) + break; + } +} + } + + // If we didn't find the kernel and it was optional, we do not emit an error. + if (!KernelPtr && !KernelFound && Optional) +return nullptr; + // If we didn't find the kernel and it was not optional, we will emit an + // error. + if (!KernelPtr && !KernelFound) +return Plugin::error( +"Kernel '%s' not found%s", Name.data(), +ImagePtr +? "" +: ", searched " + std::to_string(LoadedImages.size()) + " images"); jplehr wrote: I believe this misses a `.data()` or something. https://github.com/llvm/llvm-project/pull/104168 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [openmp] [Libomp] Place generated OpenMP headers into build resource directory (PR #88007)
https://github.com/jplehr edited https://github.com/llvm/llvm-project/pull/88007 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [openmp] [Libomp] Place generated OpenMP headers into build resource directory (PR #88007)
https://github.com/jplehr commented: The changes seem reasonable to me. https://github.com/llvm/llvm-project/pull/88007 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [openmp] [Libomp] Place generated OpenMP headers into build resource directory (PR #88007)
@@ -16,4 +16,12 @@ typedef unsigned __INTPTR_TYPE__ uintptr_t; #error Every target should have __INTPTR_TYPE__ #endif +#ifdef __INTPTR_MAX__ +#define INTPTR_MAX__INTPTR_MAX__ +#endif + +#ifdef __UINTPTR_MAX__ +#define UINTPTR_MAX __UINTPTR_MAX__ +#endif + jplehr wrote: Are these changes required to make this work? https://github.com/llvm/llvm-project/pull/88007 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [Offload][NFC] Remove `omp_` prefix from offloading entries (PR #88071)
https://github.com/jplehr approved this pull request. LGTM https://github.com/llvm/llvm-project/pull/88071 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [Offload][CUDA] Allow CUDA kernels to use LLVM/Offload (PR #94549)
@@ -1199,7 +1244,9 @@ llvm::Function *CGNVCUDARuntime::finalizeModule() { } return nullptr; } - if (CGM.getLangOpts().OffloadingNewDriver && RelocatableDeviceCode) + if (CGM.getLangOpts().OffloadViaLLVM) +createOffloadingEntries(); + else if (CGM.getLangOpts().OffloadingNewDriver && RelocatableDeviceCode) jplehr wrote: Is this calling the same target in both cases? Why is it distinguished? https://github.com/llvm/llvm-project/pull/94549 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [Offload][CUDA] Allow CUDA kernels to use LLVM/Offload (PR #94549)
jplehr wrote: Should the NFCI changes (like initializing struct fields) be put into a separate PR? https://github.com/llvm/llvm-project/pull/94549 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [ConstantFold] Drop gep of gep fold entirely (PR #95126)
jplehr wrote: I believe this broke our flang+openmp+offload bot: https://lab.llvm.org/staging/#/builders/140/builds/10168 Happy to help looking into it. https://github.com/llvm/llvm-project/pull/95126 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [ConstantFold] Drop gep of gep fold entirely (PR #95126)
jplehr wrote: Thank you @nikic. I'll see to reproduce locally and narrow down as much as possible to provide small reproducer. https://github.com/llvm/llvm-project/pull/95126 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [LinkerWrapper] Make `-Xoffload-linker` match `-Xlinker` semantics (PR #101032)
https://github.com/jplehr commented: Do we have some sort of documentation where this change of behavior needs to be communicated? https://github.com/llvm/llvm-project/pull/101032 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [libc] [llvm] [OpenMP][libc] Remove special handling for OpenMP printf (PR #98940)
jplehr wrote: I was mostly curious if we have some coverage that would ideally break if the implementation breaks. https://github.com/llvm/llvm-project/pull/98940 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] AMDGPU: Move attributor into optimization pipeline (PR #83131)
jplehr wrote: Hey @arsenm this broke all AMDGPU OpenMP Offload buildbots (e.g., https://lab.llvm.org/buildbot/#/builders/30). Any chance you can fix these issues? https://github.com/llvm/llvm-project/pull/83131 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [XRay] Add support for instrumentation of DSOs on x86_64 (PR #90959)
https://github.com/jplehr edited https://github.com/llvm/llvm-project/pull/90959 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [XRay] Add support for instrumentation of DSOs on x86_64 (PR #90959)
@@ -111,6 +156,71 @@ void __xray_init() XRAY_NEVER_INSTRUMENT { #endif } +// Default visibility is hidden, so we have to explicitly make it visible to +// DSO. +SANITIZER_INTERFACE_ATTRIBUTE int32_t __xray_register_dso( +const XRaySledEntry *SledsBegin, const XRaySledEntry *SledsEnd, +const XRayFunctionSledIndex *FnIndexBegin, +const XRayFunctionSledIndex *FnIndexEnd, +XRayTrampolines Trampolines) XRAY_NEVER_INSTRUMENT { + // Make sure XRay has been initialized in the main executable. + __xray_init(); + + if (__xray_num_objects() == 0) { +if (Verbosity()) + Report("No XRay instrumentation map in main executable. Not initializing " + "XRay for DSO.\n"); +return -1; + } + + // Register sleds in global map. + int ObjId = __xray_register_sleds(SledsBegin, SledsEnd, FnIndexBegin, + FnIndexEnd, true, Trampolines); + +#ifndef XRAY_NO_PREINIT + if (ObjId >= 0 && flags()->patch_premain) +__xray_patch_object(ObjId); +#endif + + return ObjId; +} + +SANITIZER_INTERFACE_ATTRIBUTE bool +__xray_deregister_dso(int32_t ObjId) XRAY_NEVER_INSTRUMENT { + // Make sure XRay has been initialized in the main executable. + __xray_init(); + + if (ObjId <= 0 || ObjId >= __xray_num_objects()) { +if (Verbosity()) + Report("Can't deregister object with ID %d: ID is invalid.\n", ObjId); +return false; + } + + { +SpinMutexLock Guard(&XRayInstrMapMutex); +auto &Entry = XRayInstrMaps[ObjId]; +if (!Entry.FromDSO) { + if (Verbosity()) +Report("Can't deregister object with ID %d: object does not correspond " + "to a shared library.\n", + ObjId); + return false; +} +if (!Entry.Loaded) { + if (Verbosity()) +Report("Can't deregister object with ID %d: object is not loaded.\n", + ObjId); +} +// This is all we have to do here. jplehr wrote: I don't think this comment is helpful. Maybe better: explain why. https://github.com/llvm/llvm-project/pull/90959 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [XRay] Add support for instrumentation of DSOs on x86_64 (PR #90959)
https://github.com/jplehr commented: Thank you for contributing this patch. I am by no means an expert in this area, but would like to see xray get support for shared libraries. I did an initial pass to simply look for easy things and left a few comments. https://github.com/llvm/llvm-project/pull/90959 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [XRay] Add support for instrumentation of DSOs on x86_64 (PR #90959)
@@ -50,14 +52,72 @@ atomic_uint8_t XRayInitialized{0}; // This should always be updated before XRayInitialized is updated. SpinMutex XRayInstrMapMutex; -XRaySledMap XRayInstrMap; +// XRaySledMap XRayInstrMap; +// Contains maps for the main executable as well as DSOs. +// std::vector XRayInstrMaps; +XRaySledMap *XRayInstrMaps; +atomic_uint32_t XRayNumObjects; // Global flag to determine whether the flags have been initialized. atomic_uint8_t XRayFlagsInitialized{0}; // A mutex to allow only one thread to initialize the XRay data structures. SpinMutex XRayInitMutex; +int32_t +__xray_register_sleds(const XRaySledEntry *SledsBegin, + const XRaySledEntry *SledsEnd, + const XRayFunctionSledIndex *FnIndexBegin, + const XRayFunctionSledIndex *FnIndexEnd, bool FromDSO, + XRayTrampolines Trampolines) XRAY_NEVER_INSTRUMENT { + if (!SledsBegin || !SledsEnd) { +return -1; + } + XRaySledMap SledMap; + SledMap.FromDSO = FromDSO; + SledMap.Loaded = true; + SledMap.Trampolines = Trampolines; + SledMap.Sleds = SledsBegin; + SledMap.Entries = SledsEnd - SledsBegin; + if (FnIndexBegin != nullptr) { +SledMap.SledsIndex = FnIndexBegin; +SledMap.Functions = FnIndexEnd - FnIndexBegin; + } else { +size_t CountFunctions = 0; +uint64_t LastFnAddr = 0; + +for (std::size_t I = 0; I < SledMap.Entries; I++) { + const auto &Sled = SledMap.Sleds[I]; + const auto Function = Sled.function(); + if (Function != LastFnAddr) { +CountFunctions++; +LastFnAddr = Function; + } +} + +SledMap.Functions = CountFunctions; + } + if (SledMap.Functions >= XRayMaxFunctions) { +Report("Too many functions! Maximum is %ld\n", XRayMaxFunctions); +return -1; + } + + if (Verbosity()) { jplehr wrote: No curly here https://github.com/llvm/llvm-project/pull/90959 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [XRay] Add support for instrumentation of DSOs on x86_64 (PR #90959)
@@ -50,14 +52,72 @@ atomic_uint8_t XRayInitialized{0}; // This should always be updated before XRayInitialized is updated. SpinMutex XRayInstrMapMutex; -XRaySledMap XRayInstrMap; +// XRaySledMap XRayInstrMap; +// Contains maps for the main executable as well as DSOs. +// std::vector XRayInstrMaps; +XRaySledMap *XRayInstrMaps; +atomic_uint32_t XRayNumObjects; // Global flag to determine whether the flags have been initialized. atomic_uint8_t XRayFlagsInitialized{0}; // A mutex to allow only one thread to initialize the XRay data structures. SpinMutex XRayInitMutex; +int32_t +__xray_register_sleds(const XRaySledEntry *SledsBegin, + const XRaySledEntry *SledsEnd, + const XRayFunctionSledIndex *FnIndexBegin, + const XRayFunctionSledIndex *FnIndexEnd, bool FromDSO, + XRayTrampolines Trampolines) XRAY_NEVER_INSTRUMENT { + if (!SledsBegin || !SledsEnd) { jplehr wrote: No curly here https://github.com/llvm/llvm-project/pull/90959 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [XRay] Add support for instrumentation of DSOs on x86_64 (PR #90959)
@@ -80,29 +140,14 @@ void __xray_init() XRAY_NEVER_INSTRUMENT { return; } - { -SpinMutexLock Guard(&XRayInstrMapMutex); -XRayInstrMap.Sleds = __start_xray_instr_map; -XRayInstrMap.Entries = __stop_xray_instr_map - __start_xray_instr_map; -if (__start_xray_fn_idx != nullptr) { - XRayInstrMap.SledsIndex = __start_xray_fn_idx; - XRayInstrMap.Functions = __stop_xray_fn_idx - __start_xray_fn_idx; -} else { - size_t CountFunctions = 0; - uint64_t LastFnAddr = 0; - - for (std::size_t I = 0; I < XRayInstrMap.Entries; I++) { -const auto &Sled = XRayInstrMap.Sleds[I]; -const auto Function = Sled.function(); -if (Function != LastFnAddr) { - CountFunctions++; - LastFnAddr = Function; -} - } + atomic_store(&XRayNumObjects, 0, memory_order_release); + + // Pre-allocation takes up approx. 5kB for XRayMaxObjects=64. + XRayInstrMaps = allocateBuffer(XRayMaxObjects); + + __xray_register_sleds(__start_xray_instr_map, __stop_xray_instr_map, jplehr wrote: Why is the return value not checked? https://github.com/llvm/llvm-project/pull/90959 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [XRay] Add support for instrumentation of DSOs on x86_64 (PR #90959)
@@ -111,6 +156,71 @@ void __xray_init() XRAY_NEVER_INSTRUMENT { #endif } +// Default visibility is hidden, so we have to explicitly make it visible to +// DSO. +SANITIZER_INTERFACE_ATTRIBUTE int32_t __xray_register_dso( +const XRaySledEntry *SledsBegin, const XRaySledEntry *SledsEnd, +const XRayFunctionSledIndex *FnIndexBegin, +const XRayFunctionSledIndex *FnIndexEnd, +XRayTrampolines Trampolines) XRAY_NEVER_INSTRUMENT { + // Make sure XRay has been initialized in the main executable. + __xray_init(); + + if (__xray_num_objects() == 0) { +if (Verbosity()) + Report("No XRay instrumentation map in main executable. Not initializing " + "XRay for DSO.\n"); +return -1; + } + + // Register sleds in global map. + int ObjId = __xray_register_sleds(SledsBegin, SledsEnd, FnIndexBegin, + FnIndexEnd, true, Trampolines); + +#ifndef XRAY_NO_PREINIT + if (ObjId >= 0 && flags()->patch_premain) +__xray_patch_object(ObjId); +#endif + + return ObjId; +} + +SANITIZER_INTERFACE_ATTRIBUTE bool +__xray_deregister_dso(int32_t ObjId) XRAY_NEVER_INSTRUMENT { + // Make sure XRay has been initialized in the main executable. + __xray_init(); + + if (ObjId <= 0 || ObjId >= __xray_num_objects()) { +if (Verbosity()) + Report("Can't deregister object with ID %d: ID is invalid.\n", ObjId); +return false; + } + + { +SpinMutexLock Guard(&XRayInstrMapMutex); +auto &Entry = XRayInstrMaps[ObjId]; +if (!Entry.FromDSO) { + if (Verbosity()) +Report("Can't deregister object with ID %d: object does not correspond " + "to a shared library.\n", + ObjId); + return false; +} +if (!Entry.Loaded) { jplehr wrote: No curly here https://github.com/llvm/llvm-project/pull/90959 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [OpenMP] Map `omp_default_mem_alloc` to global memory (PR #104790)
jplehr wrote: For some reason this broke the bots, e.g., https://lab.llvm.org/buildbot/#/builders/30/builds/4417 https://lab.llvm.org/staging/#/builders/97/builds/2453 https://github.com/llvm/llvm-project/pull/104790 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [PGO][OpenMP] Instrumentation for GPU devices (Revision of #76587) (PR #102691)
jplehr wrote: I did a few local run of this with our buildbot config and that appeared clean for the time being. https://github.com/llvm/llvm-project/pull/102691 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [NFC][clang][TableGen] Remove redundant llvm:: namespace qualifier (PR #108627)
jplehr wrote: Hi, we see the same errors on our buildbots, two as examples: https://lab.llvm.org/buildbot/#/builders/30/builds/6250 https://lab.llvm.org/buildbot/#/builders/140/builds/6648 https://github.com/llvm/llvm-project/pull/108627 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [XRay] Add support for instrumentation of DSOs on x86_64 (PR #90959)
https://github.com/jplehr closed https://github.com/llvm/llvm-project/pull/90959 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [Clang] Put offloading globals in the `.llvm.rodata.offloading` section (PR #111890)
jplehr wrote: Would there be any reason to put entries for different offloading languages into distinctly named “sub entries”? https://github.com/llvm/llvm-project/pull/111890 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [OpenMP][Clang] Migrate OpenMP UserDefinedMapper from Clang to OMPIRBuilder (PR #110001)
https://github.com/jplehr edited https://github.com/llvm/llvm-project/pull/110001 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [OpenMP][Clang] Migrate OpenMP UserDefinedMapper from Clang to OMPIRBuilder (PR #110001)
@@ -9058,257 +9058,65 @@ void CGOpenMPRuntime::emitUserDefinedMapper(const OMPDeclareMapperDecl *D, return; ASTContext &C = CGM.getContext(); QualType Ty = D->getType(); - QualType PtrTy = C.getPointerType(Ty).withRestrict(); - QualType Int64Ty = C.getIntTypeForBitwidth(/*DestWidth=*/64, /*Signed=*/true); auto *MapperVarDecl = cast(cast(D->getMapperVarRef())->getDecl()); - SourceLocation Loc = D->getLocation(); CharUnits ElementSize = C.getTypeSizeInChars(Ty); llvm::Type *ElemTy = CGM.getTypes().ConvertTypeForMem(Ty); - // Prepare mapper function arguments and attributes. - ImplicitParamDecl HandleArg(C, /*DC=*/nullptr, Loc, /*Id=*/nullptr, - C.VoidPtrTy, ImplicitParamKind::Other); - ImplicitParamDecl BaseArg(C, /*DC=*/nullptr, Loc, /*Id=*/nullptr, C.VoidPtrTy, -ImplicitParamKind::Other); - ImplicitParamDecl BeginArg(C, /*DC=*/nullptr, Loc, /*Id=*/nullptr, - C.VoidPtrTy, ImplicitParamKind::Other); - ImplicitParamDecl SizeArg(C, /*DC=*/nullptr, Loc, /*Id=*/nullptr, Int64Ty, -ImplicitParamKind::Other); - ImplicitParamDecl TypeArg(C, /*DC=*/nullptr, Loc, /*Id=*/nullptr, Int64Ty, -ImplicitParamKind::Other); - ImplicitParamDecl NameArg(C, /*DC=*/nullptr, Loc, /*Id=*/nullptr, C.VoidPtrTy, -ImplicitParamKind::Other); - FunctionArgList Args; - Args.push_back(&HandleArg); - Args.push_back(&BaseArg); - Args.push_back(&BeginArg); - Args.push_back(&SizeArg); - Args.push_back(&TypeArg); - Args.push_back(&NameArg); - const CGFunctionInfo &FnInfo = - CGM.getTypes().arrangeBuiltinFunctionDeclaration(C.VoidTy, Args); - llvm::FunctionType *FnTy = CGM.getTypes().GetFunctionType(FnInfo); - SmallString<64> TyStr; - llvm::raw_svector_ostream Out(TyStr); - CGM.getCXXABI().getMangleContext().mangleCanonicalTypeName(Ty, Out); - std::string Name = getName({"omp_mapper", TyStr, D->getName()}); - auto *Fn = llvm::Function::Create(FnTy, llvm::GlobalValue::InternalLinkage, -Name, &CGM.getModule()); - CGM.SetInternalFunctionAttributes(GlobalDecl(), Fn, FnInfo); - Fn->removeFnAttr(llvm::Attribute::OptimizeNone); - // Start the mapper function code generation. CodeGenFunction MapperCGF(CGM); - MapperCGF.StartFunction(GlobalDecl(), C.VoidTy, Fn, FnInfo, Args, Loc, Loc); - // Compute the starting and end addresses of array elements. - llvm::Value *Size = MapperCGF.EmitLoadOfScalar( - MapperCGF.GetAddrOfLocalVar(&SizeArg), /*Volatile=*/false, - C.getPointerType(Int64Ty), Loc); - // Prepare common arguments for array initiation and deletion. - llvm::Value *Handle = MapperCGF.EmitLoadOfScalar( - MapperCGF.GetAddrOfLocalVar(&HandleArg), - /*Volatile=*/false, C.getPointerType(C.VoidPtrTy), Loc); - llvm::Value *BaseIn = MapperCGF.EmitLoadOfScalar( - MapperCGF.GetAddrOfLocalVar(&BaseArg), - /*Volatile=*/false, C.getPointerType(C.VoidPtrTy), Loc); - llvm::Value *BeginIn = MapperCGF.EmitLoadOfScalar( - MapperCGF.GetAddrOfLocalVar(&BeginArg), - /*Volatile=*/false, C.getPointerType(C.VoidPtrTy), Loc); - // Convert the size in bytes into the number of array elements. - Size = MapperCGF.Builder.CreateExactUDiv( - Size, MapperCGF.Builder.getInt64(ElementSize.getQuantity())); - llvm::Value *PtrBegin = MapperCGF.Builder.CreateBitCast( - BeginIn, CGM.getTypes().ConvertTypeForMem(PtrTy)); - llvm::Value *PtrEnd = MapperCGF.Builder.CreateGEP(ElemTy, PtrBegin, Size); - llvm::Value *MapType = MapperCGF.EmitLoadOfScalar( - MapperCGF.GetAddrOfLocalVar(&TypeArg), /*Volatile=*/false, - C.getPointerType(Int64Ty), Loc); - llvm::Value *MapName = MapperCGF.EmitLoadOfScalar( - MapperCGF.GetAddrOfLocalVar(&NameArg), - /*Volatile=*/false, C.getPointerType(C.VoidPtrTy), Loc); - - // Emit array initiation if this is an array section and \p MapType indicates - // that memory allocation is required. - llvm::BasicBlock *HeadBB = MapperCGF.createBasicBlock("omp.arraymap.head"); - emitUDMapperArrayInitOrDel(MapperCGF, Handle, BaseIn, BeginIn, Size, MapType, - MapName, ElementSize, HeadBB, /*IsInit=*/true); - - // Emit a for loop to iterate through SizeArg of elements and map all of them. - - // Emit the loop header block. - MapperCGF.EmitBlock(HeadBB); - llvm::BasicBlock *BodyBB = MapperCGF.createBasicBlock("omp.arraymap.body"); - llvm::BasicBlock *DoneBB = MapperCGF.createBasicBlock("omp.done"); - // Evaluate whether the initial condition is satisfied. - llvm::Value *IsEmpty = - MapperCGF.Builder.CreateICmpEQ(PtrBegin, PtrEnd, "omp.arraymap.isempty"); - MapperCGF.Builder.CreateCondBr(IsEmpty, DoneBB, BodyBB); - llvm::BasicBlock *EntryBB = MapperCGF.Builder.GetInsertBlock(); + MappableExprsHandler::MapCombinedInfoTy CombinedInfo; + auto PrivatizeAndGenMapInfoCB = +
[clang] [llvm] [OpenMP][Clang] Migrate OpenMP UserDefinedMapper from Clang to OMPIRBuilder (PR #110001)
https://github.com/jplehr commented: There is a nit, but I'm also not very familiar with this piece of code. https://github.com/llvm/llvm-project/pull/110001 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [XRay] Fix LLVM include in xray_interface.cpp (PR #111978)
https://github.com/jplehr approved this pull request. LGTM next time please separate PRs for fix and formatting. https://github.com/llvm/llvm-project/pull/111978 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [XRay] Fix LLVM include in xray_interface.cpp (PR #111978)
https://github.com/jplehr closed https://github.com/llvm/llvm-project/pull/111978 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] Reapply " [XRay] Add support for instrumentation of DSOs on x86_64 (#90959)" (PR #113548)
https://github.com/jplehr approved this pull request. LGTM https://github.com/llvm/llvm-project/pull/113548 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] Reapply " [XRay] Add support for instrumentation of DSOs on x86_64 (#90959)" (PR #113548)
https://github.com/jplehr closed https://github.com/llvm/llvm-project/pull/113548 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [flang] [clang] Add option to specify opt pipeline during offload lto (PR #114401)
jplehr wrote: Just out of curiosity: Are all these things documented reasonably well somewhere? https://github.com/llvm/llvm-project/pull/114401 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang] Automatically link the `compiler-rt` for GPUs if present (PR #109152)
jplehr wrote: I guess that makes sense. https://github.com/llvm/llvm-project/pull/109152 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [OpenMP][Clang] Migrate OpenMP UserDefinedMapper from Clang to OMPIRBuilder (PR #110001)
jplehr wrote: Thank you. Will take a closer look next week. So far, I ran this through one of our buildbot configs and did not see an issue there. https://github.com/llvm/llvm-project/pull/110001 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang] Automatically link the `compiler-rt` for GPUs if present (PR #109152)
jplehr wrote: I ran this through a buildbot config and found no errors. https://github.com/llvm/llvm-project/pull/109152 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang] Automatically link the `compiler-rt` for GPUs if present (PR #109152)
https://github.com/jplehr approved this pull request. I think this is a reasonable change. https://github.com/llvm/llvm-project/pull/109152 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] Reapply " [XRay] Add support for instrumentation of DSOs on x86_64 (#90959)" (PR #112930)
@@ -150,21 +151,30 @@ class MProtectHelper { namespace { -bool patchSled(const XRaySledEntry &Sled, bool Enable, - int32_t FuncId) XRAY_NEVER_INSTRUMENT { +bool isObjectLoaded(int32_t ObjId) { + SpinMutexLock Guard(&XRayInstrMapMutex); + if (ObjId < 0 || static_cast(ObjId) >= + atomic_load(&XRayNumObjects, memory_order_acquire)) { +return false; + } + return XRayInstrMaps[ObjId].Loaded; +} + +bool patchSled(const XRaySledEntry &Sled, bool Enable, int32_t FuncId, + const XRayTrampolines &Trampolines) XRAY_NEVER_INSTRUMENT { bool Success = false; switch (Sled.Kind) { case XRayEntryType::ENTRY: -Success = patchFunctionEntry(Enable, FuncId, Sled, __xray_FunctionEntry); +Success = patchFunctionEntry(Enable, FuncId, Sled, Trampolines, false); jplehr wrote: ```suggestion Success = patchFunctionEntry(Enable, FuncId, Sled, Trampolines, /* LogArgs=*/false); ``` There is some syntax like that, that is understood by clang-format and that helps to know what this is actually doing. https://github.com/llvm/llvm-project/pull/112930 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] Reapply " [XRay] Add support for instrumentation of DSOs on x86_64 (#90959)" (PR #112930)
https://github.com/jplehr approved this pull request. Only a nit. Let's see what happens. https://github.com/llvm/llvm-project/pull/112930 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] Reapply " [XRay] Add support for instrumentation of DSOs on x86_64 (#90959)" (PR #112930)
https://github.com/jplehr edited https://github.com/llvm/llvm-project/pull/112930 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU] Add a type for the named barrier (PR #113614)
jplehr wrote: Hi, This broke our OpenMP / Offload bots. It would be great if that can be easily fixed or reverted if more time is required to inspect the issue. Thanks! https://github.com/llvm/llvm-project/pull/113614 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU] Add a type for the named barrier (PR #113614)
jplehr wrote: > I just pushed a fix in > [75252e2](https://github.com/llvm/llvm-project/commit/75252e29ea6a0959f3c1670e641a03fc18fc65fa). I see another error in one of our bots for this patch: https://lab.llvm.org/staging/#/builders/130/builds/7112 https://github.com/llvm/llvm-project/pull/113614 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang][Test] Update test after #115159 (PR #115172)
https://github.com/jplehr approved this pull request. LGTM to temporarily fix the bots. https://github.com/llvm/llvm-project/pull/115172 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang] Move warning about memset/memcpy to NonTriviallyCopyable type… (PR #117387)
jplehr wrote: > LLVM Buildbot has detected a new failure on builder > `openmp-offload-amdgpu-runtime` running on `omp-vega20-0` while building > `clang` at step 7 "Add check check-offload". > > Full details are available at: > https://lab.llvm.org/buildbot/#/builders/30/builds/11206 > > Here is the relevant piece of the build log for the reference I think you can ignore this failure. Apologies. https://github.com/llvm/llvm-project/pull/117387 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits