[clang] 3bbbe4c - [OpenMP] Add Additional Function Attribute Information to OMPKinds.def
Author: Joseph Huber Date: 2020-07-18T12:55:50-04:00 New Revision: 3bbbe4c4b6c8e20538a388df164da6f8d935e0cc URL: https://github.com/llvm/llvm-project/commit/3bbbe4c4b6c8e20538a388df164da6f8d935e0cc DIFF: https://github.com/llvm/llvm-project/commit/3bbbe4c4b6c8e20538a388df164da6f8d935e0cc.diff LOG: [OpenMP] Add Additional Function Attribute Information to OMPKinds.def Summary: This patch adds more function attribute information to the runtime function definitions in OMPKinds.def. The goal is to provide sufficient information about OpenMP runtime functions to perform more optimizations on OpenMP code. Reviewers: jdoerfert Subscribers: aaron.ballman cfe-commits yaxunl guansong sstefan1 llvm-commits Tags: #OpenMP #clang #LLVM Differential Revision: https://reviews.llvm.org/D81031 Added: Modified: clang/test/OpenMP/barrier_codegen.cpp llvm/include/llvm/Frontend/OpenMP/OMPKinds.def llvm/test/Transforms/OpenMP/add_attributes.ll llvm/test/Transforms/OpenMP/parallel_deletion.ll Removed: diff --git a/clang/test/OpenMP/barrier_codegen.cpp b/clang/test/OpenMP/barrier_codegen.cpp index f84a26380df9..35b2ed721276 100644 --- a/clang/test/OpenMP/barrier_codegen.cpp +++ b/clang/test/OpenMP/barrier_codegen.cpp @@ -46,7 +46,7 @@ int main(int argc, char **argv) { // IRBUILDER: ; Function Attrs: nounwind // IRBUILDER-NEXT: declare i32 @__kmpc_global_thread_num(%struct.ident_t*) # // IRBUILDER_OPT: ; Function Attrs: inaccessiblememonly nofree nosync nounwind readonly -// IRBUILDER_OPT-NEXT: declare i32 @__kmpc_global_thread_num(%struct.ident_t*) # +// IRBUILDER_OPT-NEXT: declare i32 @__kmpc_global_thread_num(%struct.ident_t* nocapture nofree readonly) # // CHECK: define {{.+}} [[TMAIN_INT]]( // CHECK: [[GTID:%.+]] = call i32 @__kmpc_global_thread_num([[IDENT_T]]* [[LOC]]) diff --git a/llvm/include/llvm/Frontend/OpenMP/OMPKinds.def b/llvm/include/llvm/Frontend/OpenMP/OMPKinds.def index 0dc2b34f2e4d..4f2fcb8af5d1 100644 --- a/llvm/include/llvm/Frontend/OpenMP/OMPKinds.def +++ b/llvm/include/llvm/Frontend/OpenMP/OMPKinds.def @@ -383,7 +383,8 @@ __OMP_RTL(__kmpc_push_proc_bind, false, Void, IdentPtr, Int32, /* Int */ Int32) __OMP_RTL(__kmpc_serialized_parallel, false, Void, IdentPtr, Int32) __OMP_RTL(__kmpc_end_serialized_parallel, false, Void, IdentPtr, Int32) __OMP_RTL(__kmpc_omp_reg_task_with_affinity, false, Int32, IdentPtr, Int32, - Int8Ptr, Int32, Int8Ptr) + /* kmp_task_t */ VoidPtr, Int32, + /* kmp_task_affinity_info_t */ VoidPtr) __OMP_RTL(omp_get_thread_num, false, Int32, ) __OMP_RTL(omp_get_num_threads, false, Int32, ) @@ -430,8 +431,7 @@ __OMP_RTL(__kmpc_reduce, false, Int32, IdentPtr, Int32, Int32, SizeTy, VoidPtr, ReduceFunctionPtr, KmpCriticalNamePtrTy) __OMP_RTL(__kmpc_reduce_nowait, false, Int32, IdentPtr, Int32, Int32, SizeTy, VoidPtr, ReduceFunctionPtr, KmpCriticalNamePtrTy) -__OMP_RTL(__kmpc_end_reduce, false, Void, IdentPtr, Int32, - KmpCriticalNamePtrTy) +__OMP_RTL(__kmpc_end_reduce, false, Void, IdentPtr, Int32, KmpCriticalNamePtrTy) __OMP_RTL(__kmpc_end_reduce_nowait, false, Void, IdentPtr, Int32, KmpCriticalNamePtrTy) @@ -514,10 +514,10 @@ __OMP_RTL(__kmpc_taskloop, false, Void, IdentPtr, /* Int */ Int32, VoidPtr, /* Int */ Int32, Int64, VoidPtr) __OMP_RTL(__kmpc_omp_target_task_alloc, false, /* kmp_task_t */ VoidPtr, IdentPtr, Int32, Int32, SizeTy, SizeTy, TaskRoutineEntryPtr, Int64) -__OMP_RTL(__kmpc_taskred_modifier_init, false, VoidPtr, IdentPtr, - /* Int */ Int32, /* Int */ Int32, /* Int */ Int32, VoidPtr) -__OMP_RTL(__kmpc_taskred_init, false, VoidPtr, /* Int */ Int32, - /* Int */ 
Int32, VoidPtr) +__OMP_RTL(__kmpc_taskred_modifier_init, false, /* kmp_taskgroup */ VoidPtr, + IdentPtr, /* Int */ Int32, /* Int */ Int32, /* Int */ Int32, VoidPtr) +__OMP_RTL(__kmpc_taskred_init, false, /* kmp_taskgroup */ VoidPtr, + /* Int */ Int32, /* Int */ Int32, VoidPtr) __OMP_RTL(__kmpc_task_reduction_modifier_fini, false, Void, IdentPtr, /* Int */ Int32, /* Int */ Int32) __OMP_RTL(__kmpc_task_reduction_get_th_data, false, VoidPtr, Int32, VoidPtr, @@ -594,7 +594,9 @@ __OMP_RTL(__last, false, Void, ) #undef __OMP_RTL #undef OMP_RTL +#define ParamAttrs(...) ArrayRef({__VA_ARGS__}) #define EnumAttr(Kind) Attribute::get(Ctx, Attribute::AttrKind::Kind) +#define EnumAttrInt(Kind, N) Attribute::get(Ctx, Attribute::AttrKind::Kind, N) #define AttributeSet(...) \ AttributeSet::get(Ctx, ArrayRef({__VA_ARGS__})) @@ -607,19 +609,94 @@ __OMP_RTL(__last, false, Void, ) __OMP_ATTRS_SET(GetterAttrs, OptimisticAttributes ? AttributeSet(EnumAttr(NoUnwind), EnumAttr(ReadOnly), - EnumAttr(NoS
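The practical effect of these richer attribute tables is that passes can reason about OpenMP runtime calls the same way they reason about annotated user code. As a hedged illustration (the function and attribute choices below are assumptions for the example, not a transcription of the patch), an optimization could query the declarations produced from OMPKinds.def like this:

```cpp
// Minimal sketch: once __kmpc_global_thread_num is declared readonly/nounwind/
// nosync (as in the IRBUILDER_OPT check line above), a pass can prove that two
// calls with the same ident_t* argument return the same value and deduplicate
// them. The helper below only checks attributes; it is illustrative, not part
// of the patch.
#include "llvm/IR/Function.h"

using namespace llvm;

static bool isDuplicableRuntimeCall(const Function &RuntimeFn) {
  // These are exactly the facts the extended OMPKinds.def tables encode.
  return RuntimeFn.onlyReadsMemory() && RuntimeFn.doesNotThrow() &&
         RuntimeFn.hasFnAttribute(Attribute::NoSync);
}
```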
[clang] 5dbc7cf - [Object] Refactor code for extracting offload binaries
Author: Joseph Huber Date: 2022-09-06T08:55:16-05:00 New Revision: 5dbc7cf7cac4428e0876a94a4fca10fe60af7328 URL: https://github.com/llvm/llvm-project/commit/5dbc7cf7cac4428e0876a94a4fca10fe60af7328 DIFF: https://github.com/llvm/llvm-project/commit/5dbc7cf7cac4428e0876a94a4fca10fe60af7328.diff LOG: [Object] Refactor code for extracting offload binaries We currently extract offload binaries inside of the linker wrapper. Other tools may wish to do the same extraction operation. This patch simply factors out this handling into the `OffloadBinary.h` interface. Reviewed By: yaxunl Differential Revision: https://reviews.llvm.org/D132689 Added: Modified: clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp llvm/include/llvm/Object/OffloadBinary.h llvm/lib/Object/CMakeLists.txt llvm/lib/Object/OffloadBinary.cpp Removed: diff --git a/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp b/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp index f9d2c7710c77d..d29c4f93d60f7 100644 --- a/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp +++ b/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp @@ -86,22 +86,6 @@ static std::atomic LTOError; using OffloadingImage = OffloadBinary::OffloadingImage; -/// A class to contain the binary information for a single OffloadBinary. -class OffloadFile : public OwningBinary { -public: - using TargetID = std::pair; - - OffloadFile(std::unique_ptr Binary, - std::unique_ptr Buffer) - : OwningBinary(std::move(Binary), std::move(Buffer)) {} - - /// We use the Triple and Architecture pair to group linker inputs together. - /// This conversion function lets us use these files in a hash-map. - operator TargetID() const { -return std::make_pair(getBinary()->getTriple(), getBinary()->getArch()); - } -}; - namespace llvm { // Provide DenseMapInfo so that OffloadKind can be used in a DenseMap. template <> struct DenseMapInfo { @@ -162,9 +146,6 @@ const OptTable &getOptTable() { return *Table; } -Error extractFromBuffer(std::unique_ptr Buffer, -SmallVectorImpl &DeviceFiles); - void printCommands(ArrayRef CmdArgs) { if (CmdArgs.empty()) return; @@ -284,150 +265,6 @@ void printVersion(raw_ostream &OS) { OS << clang::getClangToolFullVersion("clang-linker-wrapper") << '\n'; } -/// Attempts to extract all the embedded device images contained inside the -/// buffer \p Contents. The buffer is expected to contain a valid offloading -/// binary format. -Error extractOffloadFiles(MemoryBufferRef Contents, - SmallVectorImpl &DeviceFiles) { - uint64_t Offset = 0; - // There could be multiple offloading binaries stored at this section. - while (Offset < Contents.getBuffer().size()) { -std::unique_ptr Buffer = -MemoryBuffer::getMemBuffer(Contents.getBuffer().drop_front(Offset), "", - /*RequiresNullTerminator*/ false); -auto BinaryOrErr = OffloadBinary::create(*Buffer); -if (!BinaryOrErr) - return BinaryOrErr.takeError(); -OffloadBinary &Binary = **BinaryOrErr; - -// Create a new owned binary with a copy of the original memory. -std::unique_ptr BufferCopy = MemoryBuffer::getMemBufferCopy( -Binary.getData().take_front(Binary.getSize()), -Contents.getBufferIdentifier()); -auto NewBinaryOrErr = OffloadBinary::create(*BufferCopy); -if (!NewBinaryOrErr) - return NewBinaryOrErr.takeError(); -DeviceFiles.emplace_back(std::move(*NewBinaryOrErr), std::move(BufferCopy)); - -Offset += Binary.getSize(); - } - - return Error::success(); -} - -// Extract offloading binaries from an Object file \p Obj. 
-Error extractFromBinary(const ObjectFile &Obj, -SmallVectorImpl &DeviceFiles) { - for (ELFSectionRef Sec : Obj.sections()) { -if (Sec.getType() != ELF::SHT_LLVM_OFFLOADING) - continue; - -Expected Buffer = Sec.getContents(); -if (!Buffer) - return Buffer.takeError(); - -MemoryBufferRef Contents(*Buffer, Obj.getFileName()); -if (Error Err = extractOffloadFiles(Contents, DeviceFiles)) - return Err; - } - - return Error::success(); -} - -Error extractFromBitcode(std::unique_ptr Buffer, - SmallVectorImpl &DeviceFiles) { - LLVMContext Context; - SMDiagnostic Err; - std::unique_ptr M = getLazyIRModule(std::move(Buffer), Err, Context); - if (!M) -return createStringError(inconvertibleErrorCode(), - "Failed to create module"); - - // Extract offloading data from globals referenced by the - // `llvm.embedded.object` metadata with the `.llvm.offloading` section. - auto *MD = M->getNamedMetadata("llvm.embedded.objects"); - if (!MD) -return Error::success(); - - for (const MDNode *Op : MD->operands()) { -if (Op->getNumOperands() < 2) - contin
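With the extraction logic hoisted out of the linker wrapper, any tool that is handed a buffer of offloading binaries can reuse it. A minimal sketch of such a consumer follows; the entry point `extractOffloadBinaries` and its exact signature are assumptions here, so consult `OffloadBinary.h` for the authoritative interface.

```cpp
#include "llvm/Object/OffloadBinary.h"
#include "llvm/Support/MemoryBuffer.h"
#include "llvm/Support/raw_ostream.h"

using namespace llvm;
using namespace llvm::object;

// List the target triple and architecture of every image embedded in `Input`.
static Error listEmbeddedImages(MemoryBufferRef Input) {
  SmallVector<OffloadFile> Binaries;
  if (Error Err = extractOffloadBinaries(Input, Binaries))
    return Err;
  for (OffloadFile &File : Binaries)
    outs() << File.getBinary()->getTriple() << " : "
           << File.getBinary()->getArch() << "\n";
  return Error::success();
}
```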
[clang] a69404c - [OffloadPackager] Add ability to extract images from other file types
Author: Joseph Huber Date: 2022-09-06T08:55:17-05:00 New Revision: a69404c0a294ce65432ce67d5f3e7dce28106496 URL: https://github.com/llvm/llvm-project/commit/a69404c0a294ce65432ce67d5f3e7dce28106496 DIFF: https://github.com/llvm/llvm-project/commit/a69404c0a294ce65432ce67d5f3e7dce28106496.diff LOG: [OffloadPackager] Add ability to extract images from other file types A previous patch added support for extracting images from offloading binaries. Users may wish to extract these files from the file types they are most commonly embedded in, such as an ELF or bitcode. This can be difficult for the user to do manually, as these images could potentially be stored under different section names. This patch adds support for extracting these file types. Reviewed By: saiislam Differential Revision: https://reviews.llvm.org/D132607 Added: Modified: clang/test/Driver/offload-packager.c clang/tools/clang-offload-packager/ClangOffloadPackager.cpp Removed: diff --git a/clang/test/Driver/offload-packager.c b/clang/test/Driver/offload-packager.c index c4617d06e93d3..8d6ee50f2a190 100644 --- a/clang/test/Driver/offload-packager.c +++ b/clang/test/Driver/offload-packager.c @@ -29,3 +29,25 @@ // RUN: diff *-amdgcn-amd-amdhsa-gfx908.2.o %S/Inputs/dummy-elf.o; rm *-amdgcn-amd-amdhsa-gfx908.2.o // RUN: diff *-amdgcn-amd-amdhsa-gfx90a.3.o %S/Inputs/dummy-elf.o; rm *-amdgcn-amd-amdhsa-gfx90a.3.o // RUN: not diff *-amdgcn-amd-amdhsa-gfx90c.4.o %S/Inputs/dummy-elf.o + +// Check that we can extract from an ELF object file +// RUN: clang-offload-packager -o %t.out \ +// RUN: --image=file=%S/Inputs/dummy-elf.o,kind=openmp,triple=amdgcn-amd-amdhsa,arch=gfx908 \ +// RUN: --image=file=%S/Inputs/dummy-elf.o,kind=openmp,triple=nvptx64-nvidia-cuda,arch=sm_70 +// RUN: %clang -cc1 %s -triple x86_64-unknown-linux-gnu -emit-obj -o %t.o -fembed-offload-object=%t.out +// RUN: clang-offload-packager %t.out \ +// RUN: --image=file=%t-sm_70.o,kind=openmp,triple=nvptx64-nvidia-cuda,arch=sm_70 \ +// RUN: --image=file=%t-gfx908.o,kind=openmp,triple=amdgcn-amd-amdhsa,arch=gfx908 +// RUN: diff %t-sm_70.o %S/Inputs/dummy-elf.o +// RUN: diff %t-gfx908.o %S/Inputs/dummy-elf.o + +// Check that we can extract from a bitcode file +// RUN: clang-offload-packager -o %t.out \ +// RUN: --image=file=%S/Inputs/dummy-elf.o,kind=openmp,triple=amdgcn-amd-amdhsa,arch=gfx908 \ +// RUN: --image=file=%S/Inputs/dummy-elf.o,kind=openmp,triple=nvptx64-nvidia-cuda,arch=sm_70 +// RUN: %clang -cc1 %s -triple x86_64-unknown-linux-gnu -emit-llvm -o %t.bc -fembed-offload-object=%t.out +// RUN: clang-offload-packager %t.out \ +// RUN: --image=file=%t-sm_70.o,kind=openmp,triple=nvptx64-nvidia-cuda,arch=sm_70 \ +// RUN: --image=file=%t-gfx908.o,kind=openmp,triple=amdgcn-amd-amdhsa,arch=gfx908 +// RUN: diff %t-sm_70.o %S/Inputs/dummy-elf.o +// RUN: diff %t-gfx908.o %S/Inputs/dummy-elf.o diff --git a/clang/tools/clang-offload-packager/ClangOffloadPackager.cpp b/clang/tools/clang-offload-packager/ClangOffloadPackager.cpp index c9c722e0a5b5c..47ef155ef2783 100644 --- a/clang/tools/clang-offload-packager/ClangOffloadPackager.cpp +++ b/clang/tools/clang-offload-packager/ClangOffloadPackager.cpp @@ -14,8 +14,7 @@ #include "clang/Basic/Version.h" -#include "llvm/Object/Binary.h" -#include "llvm/Object/ObjectFile.h" +#include "llvm/BinaryFormat/Magic.h" #include "llvm/Object/OffloadBinary.h" #include "llvm/Support/CommandLine.h" #include "llvm/Support/FileOutputBuffer.h" @@ -123,29 +122,6 @@ static Error bundleImages() { return Error::success(); } -static Expected>> 
-extractOffloadFiles(MemoryBufferRef Contents) { - if (identify_magic(Contents.getBuffer()) != file_magic::offload_binary) -return createStringError(inconvertibleErrorCode(), - "Input buffer not an offloading binary"); - SmallVector> Binaries; - uint64_t Offset = 0; - // There could be multiple offloading binaries stored at this section. - while (Offset < Contents.getBuffer().size()) { -std::unique_ptr Buffer = -MemoryBuffer::getMemBuffer(Contents.getBuffer().drop_front(Offset), "", - /*RequiresNullTerminator*/ false); -auto BinaryOrErr = OffloadBinary::create(*Buffer); -if (!BinaryOrErr) - return BinaryOrErr.takeError(); - -Offset += (*BinaryOrErr)->getSize(); -Binaries.emplace_back(std::move(*BinaryOrErr)); - } - - return std::move(Binaries); -} - static Error unbundleImages() { ErrorOr> BufferOrErr = MemoryBuffer::getFileOrSTDIN(InputFile); @@ -159,9 +135,9 @@ static Error unbundleImages() { Buffer = MemoryBuffer::getMemBufferCopy(Buffer->getBuffer(), Buffer->getBufferIdentifier()); - auto BinariesOrErr = extractOffloadFiles(*Buffer); - if (!BinariesOrErr) -return BinariesOrErr.takeError();
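Conceptually, the packager now keys its behavior off the container's file magic instead of assuming a raw offloading binary. A hedged sketch of that dispatch is below; the helper name and the exact set of supported magics are assumptions for this illustration.

```cpp
#include "llvm/BinaryFormat/Magic.h"
#include "llvm/Object/OffloadBinary.h"
#include "llvm/Support/Error.h"

using namespace llvm;
using namespace llvm::object;

static Error extractFromContainer(MemoryBufferRef Buffer,
                                  SmallVectorImpl<OffloadFile> &Images) {
  switch (identify_magic(Buffer.getBuffer())) {
  case file_magic::bitcode:         // images referenced by !llvm.embedded.objects
  case file_magic::elf_relocatable: // images held in SHT_LLVM_OFFLOADING sections
  case file_magic::offload_binary:  // one or more concatenated offload binaries
    return extractOffloadBinaries(Buffer, Images);
  default:
    return createStringError(inconvertibleErrorCode(),
                             "unsupported offloading container");
  }
}
```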
[clang] 57ef29f - [OpenMP] Remove use of removed '-f[no-]openmp-new-driver' flag
Author: Joseph Huber Date: 2022-09-06T13:40:05-05:00 New Revision: 57ef29f2835eb594bc2ad4793df05188be4c2ef6 URL: https://github.com/llvm/llvm-project/commit/57ef29f2835eb594bc2ad4793df05188be4c2ef6 DIFF: https://github.com/llvm/llvm-project/commit/57ef29f2835eb594bc2ad4793df05188be4c2ef6.diff LOG: [OpenMP] Remove use of removed '-f[no-]openmp-new-driver' flag The changes in D130020 removed all support for the old method of compiling OpenMP offloading programs. This means that `-fopenmp-new-driver` has no effect and `-fno-openmp-new-driver` does not work. This patch removes the use and documentation of this flag. Note that the `--offload-new-driver` flag still exists for using the new driver optionally with CUDA and HIP. Reviewed By: tra Differential Revision: https://reviews.llvm.org/D133367 Added: Modified: clang/docs/ClangCommandLineReference.rst clang/lib/Driver/Driver.cpp clang/lib/Driver/ToolChains/Clang.cpp clang/test/Driver/amdgpu-openmp-toolchain.c Removed: diff --git a/clang/docs/ClangCommandLineReference.rst b/clang/docs/ClangCommandLineReference.rst index 141c1464638a5..a7dc0634e97c0 100644 --- a/clang/docs/ClangCommandLineReference.rst +++ b/clang/docs/ClangCommandLineReference.rst @@ -2181,10 +2181,6 @@ Enable all Clang extensions for OpenMP directives and clauses Set rpath on OpenMP executables -.. option:: -fopenmp-new-driver - -Use the new driver for OpenMP offloading. - .. option:: -fopenmp-offload-mandatory Do not create a host fallback if offloading to the device fails. diff --git a/clang/lib/Driver/Driver.cpp b/clang/lib/Driver/Driver.cpp index 3743515d3d43f..36fba5d91eaf4 100644 --- a/clang/lib/Driver/Driver.cpp +++ b/clang/lib/Driver/Driver.cpp @@ -3902,9 +3902,7 @@ void Driver::BuildActions(Compilation &C, DerivedArgList &Args, OffloadingActionBuilder OffloadBuilder(C, Args, Inputs); bool UseNewOffloadingDriver = - (C.isOffloadingHostKind(Action::OFK_OpenMP) && - Args.hasFlag(options::OPT_fopenmp_new_driver, -options::OPT_no_offload_new_driver, true)) || + C.isOffloadingHostKind(Action::OFK_OpenMP) || Args.hasFlag(options::OPT_offload_new_driver, options::OPT_no_offload_new_driver, false); diff --git a/clang/lib/Driver/ToolChains/Clang.cpp b/clang/lib/Driver/ToolChains/Clang.cpp index 99a8642cfd85b..d39f8715c7a19 100644 --- a/clang/lib/Driver/ToolChains/Clang.cpp +++ b/clang/lib/Driver/ToolChains/Clang.cpp @@ -4459,9 +4459,7 @@ void Clang::ConstructJob(Compilation &C, const JobAction &JA, bool IsDeviceOffloadAction = !(JA.isDeviceOffloading(Action::OFK_None) || JA.isDeviceOffloading(Action::OFK_Host)); bool IsHostOffloadingAction = - (JA.isHostOffloading(Action::OFK_OpenMP) && - Args.hasFlag(options::OPT_fopenmp_new_driver, -options::OPT_no_offload_new_driver, true)) || + JA.isHostOffloading(Action::OFK_OpenMP) || (JA.isHostOffloading(C.getActiveOffloadKinds()) && Args.hasFlag(options::OPT_offload_new_driver, options::OPT_no_offload_new_driver, false)); @@ -4762,9 +4760,7 @@ void Clang::ConstructJob(Compilation &C, const JobAction &JA, if (IsUsingLTO) { // Only AMDGPU supports device-side LTO. 
- if (IsDeviceOffloadAction && - !Args.hasFlag(options::OPT_fopenmp_new_driver, -options::OPT_no_offload_new_driver, true) && + if (IsDeviceOffloadAction && !JA.isHostOffloading(Action::OFK_OpenMP) && !Args.hasFlag(options::OPT_offload_new_driver, options::OPT_no_offload_new_driver, false) && !Triple.isAMDGPU()) { diff --git a/clang/test/Driver/amdgpu-openmp-toolchain.c b/clang/test/Driver/amdgpu-openmp-toolchain.c index 1551917ea50f0..50ce8e5d1b1fe 100644 --- a/clang/test/Driver/amdgpu-openmp-toolchain.c +++ b/clang/test/Driver/amdgpu-openmp-toolchain.c @@ -49,5 +49,5 @@ // RUN: %clang -### --target=x86_64-unknown-linux-gnu -emit-llvm -S -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa -Xopenmp-target=amdgcn-amd-amdhsa -march=gfx803 -nogpulib %s 2>&1 | FileCheck %s --check-prefix=CHECK-EMIT-LLVM-IR // CHECK-EMIT-LLVM-IR: "-cc1" "-triple" "amdgcn-amd-amdhsa"{{.*}}"-emit-llvm" -// RUN: %clang -### -target x86_64-pc-linux-gnu -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa -Xopenmp-target=amdgcn-amd-amdhsa -march=gfx803 -lm --rocm-device-lib-path=%S/Inputs/rocm/amdgcn/bitcode -fopenmp-new-driver %s 2>&1 | FileCheck %s --check-prefix=CHECK-LIB-DEVICE-NEW +// RUN: %clang -### -target x86_64-pc-linux-gnu -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa -Xopenmp-target=amdgcn-amd-amdhsa -march=gfx803 -lm --rocm-device-lib-path=%S/Inputs/rocm/amdgcn/bitcode %s 2>&1 | FileCheck %s --check-prefix=CHECK-LIB-DEVICE-NEW // CHECK-LIB-DEVICE-NEW: {{.*}}clang-linker-wrapper{{.*}}-
[clang] 3a62399 - [OpenMP] Fix logic error when building offloading applications
Author: Joseph Huber Date: 2022-09-06T13:56:24-05:00 New Revision: 3a623999f3ff96843f97ee300e0c94b8cbc88a9f URL: https://github.com/llvm/llvm-project/commit/3a623999f3ff96843f97ee300e0c94b8cbc88a9f DIFF: https://github.com/llvm/llvm-project/commit/3a623999f3ff96843f97ee300e0c94b8cbc88a9f.diff LOG: [OpenMP] Fix logic error when building offloading applications Summary: A previous patch removed support for the `-fopenmp-new-driver` flag and accidentally used the `isHostOffloading` flag instead of `isDeviceOffloading`, which led to some build errors when compiling for the offloading device. This patch addresses that. Added: Modified: clang/lib/Driver/ToolChains/Clang.cpp Removed: diff --git a/clang/lib/Driver/ToolChains/Clang.cpp b/clang/lib/Driver/ToolChains/Clang.cpp index d39f8715c7a1..d3b5f82cb5c2 100644 --- a/clang/lib/Driver/ToolChains/Clang.cpp +++ b/clang/lib/Driver/ToolChains/Clang.cpp @@ -4760,7 +4760,7 @@ void Clang::ConstructJob(Compilation &C, const JobAction &JA, if (IsUsingLTO) { // Only AMDGPU supports device-side LTO. - if (IsDeviceOffloadAction && !JA.isHostOffloading(Action::OFK_OpenMP) && + if (IsDeviceOffloadAction && !JA.isDeviceOffloading(Action::OFK_OpenMP) && !Args.hasFlag(options::OPT_offload_new_driver, options::OPT_no_offload_new_driver, false) && !Triple.isAMDGPU()) { ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] 2753eaf - [Clang] Fix the new driver crashing when using '-fsyntax-only'
Author: Joseph Huber Date: 2022-09-06T19:49:47-05:00 New Revision: 2753eafe5a7f003776b12f425c5b0a475e8fb6b7 URL: https://github.com/llvm/llvm-project/commit/2753eafe5a7f003776b12f425c5b0a475e8fb6b7 DIFF: https://github.com/llvm/llvm-project/commit/2753eafe5a7f003776b12f425c5b0a475e8fb6b7.diff LOG: [Clang] Fix the new driver crashing when using '-fsyntax-only' The new driver currently crashes when attempting to use the '-fsyntax-only' option. This is because the option causes all output to be given the `TY_Nothing` type, which should signal the end of the pipeline. The new driver was not handling this correctly and attempted to use the empty input. This patch fixes the handling so we do not attempt to continue when the input is nothing. One concession is that we must now check, when generating the arguments for Clang, whether the input is of 'TY_Nothing'. This is because the new driver will only create code if the device code is a dependency on the host; creating the output without the dependency would require a complete rewrite of the logic, as we do not maintain any state between calls to 'BuildOffloadingActions', so I believe this is the most straightforward method. Reviewed By: tra Differential Revision: https://reviews.llvm.org/D133161 Added: Modified: clang/lib/Driver/Driver.cpp clang/lib/Driver/ToolChains/Clang.cpp clang/test/Driver/cuda-bindings.cu clang/test/Driver/hip-binding.hip clang/test/Driver/openmp-offload.c Removed: diff --git a/clang/lib/Driver/Driver.cpp b/clang/lib/Driver/Driver.cpp index 36fba5d91eaf4..9517331ade26b 100644 --- a/clang/lib/Driver/Driver.cpp +++ b/clang/lib/Driver/Driver.cpp @@ -4309,10 +4309,14 @@ Action *Driver::BuildOffloadingActions(Compilation &C, auto TCAndArch = TCAndArchs.begin(); for (Action *&A : DeviceActions) { +if (A->getType() == types::TY_Nothing) + continue; + A = ConstructPhaseAction(C, Args, Phase, A, Kind); if (isa(A) && isa(HostAction) && -Kind == Action::OFK_OpenMP) { +Kind == Action::OFK_OpenMP && +HostAction->getType() != types::TY_Nothing) { // OpenMP offloading has a dependency on the host compile action to // identify which declarations need to be emitted. This shouldn't be // collapsed with any other actions so we can use it in the device. @@ -4380,11 +4384,15 @@ Action *Driver::BuildOffloadingActions(Compilation &C, nullptr, Action::OFK_None); } + // If we are unable to embed a single device output into the host, we need to + // add each device output as a host dependency to ensure they are still built. + bool SingleDeviceOutput = !llvm::any_of(OffloadActions, [](Action *A) { +return A->getType() == types::TY_Nothing; + }) && isa(HostAction); OffloadAction::HostDependence HDep( *HostAction, *C.getSingleOffloadToolChain(), /*BoundArch=*/nullptr, isa(HostAction) ? DDep : DDeps); - return C.MakeAction( - HDep, isa(HostAction) ? DDep : DDeps); + return C.MakeAction(HDep, SingleDeviceOutput ? DDep : DDeps); } Action *Driver::ConstructPhaseAction( diff --git a/clang/lib/Driver/ToolChains/Clang.cpp b/clang/lib/Driver/ToolChains/Clang.cpp index d3b5f82cb5c20..837486971d112 100644 --- a/clang/lib/Driver/ToolChains/Clang.cpp +++ b/clang/lib/Driver/ToolChains/Clang.cpp @@ -4496,8 +4496,8 @@ void Clang::ConstructJob(Compilation &C, const JobAction &JA, const InputInfo *CudaDeviceInput = nullptr; const InputInfo *OpenMPDeviceInput = nullptr; for (const InputInfo &I : Inputs) { -if (&I == &Input) { - // This is the primary input. +if (&I == &Input || I.getType() == types::TY_Nothing) { + // This is the primary input or contains nothing. 
} else if (IsHeaderModulePrecompile && types::getPrecompiledType(I.getType()) == types::TY_PCH) { types::ID Expected = HeaderModuleInput.getType(); diff --git a/clang/test/Driver/cuda-bindings.cu b/clang/test/Driver/cuda-bindings.cu index 6c4398b706973..3cc65b8cf98bc 100644 --- a/clang/test/Driver/cuda-bindings.cu +++ b/clang/test/Driver/cuda-bindings.cu @@ -216,3 +216,14 @@ // RUN:--offload-arch=sm_70 --offload-arch=sm_52 --offload-device-only -c -o %t %s 2>&1 \ // RUN: | FileCheck -check-prefix=MULTI-D-ONLY-O %s // MULTI-D-ONLY-O: error: cannot specify -o when generating multiple output files + +// +// Check to ensure that we can use '-fsyntax-only' for CUDA output with the new +// driver. +// +// RUN: %clang -### -target powerpc64le-ibm-linux-gnu --offload-new-driver \ +// RUN:-fsyntax-only --offload-arch=sm_70 --offload-arch=sm_52 -c %s 2>&1 \ +// RUN: | FileCheck -check-prefix=SYNTAX-ONLY %s +// SYNTAX-ONLY: "-cc1" "-triple" "nvptx64-nvidia-cuda"{{.*}}"-fsyntax-only" +// SYNTAX-ONLY: "-cc1" "-triple" "nvptx64-nvidia-cuda"{{.*}}"-fsyntax-only" +// S
[clang] a6bb7c2 - [CUDA] Fix test failing when using the new driver
Author: Joseph Huber Date: 2022-09-06T20:14:20-05:00 New Revision: a6bb7c22fc288686010076ac253a12b4b1cd2ee5 URL: https://github.com/llvm/llvm-project/commit/a6bb7c22fc288686010076ac253a12b4b1cd2ee5 DIFF: https://github.com/llvm/llvm-project/commit/a6bb7c22fc288686010076ac253a12b4b1cd2ee5.diff LOG: [CUDA] Fix test failing when using the new driver Summary: Previously the new driver crashed when using `-fsyntax-only` which required a work-around in one of the test files. This was not properly updated when it was fixed for the new driver. This patch fixes the test and also adjusts a missing boolean check. Added: Modified: clang/lib/Driver/Driver.cpp clang/test/Driver/cuda-bindings.cu Removed: diff --git a/clang/lib/Driver/Driver.cpp b/clang/lib/Driver/Driver.cpp index 9517331ade26..ca8e0e5240e1 100644 --- a/clang/lib/Driver/Driver.cpp +++ b/clang/lib/Driver/Driver.cpp @@ -4391,7 +4391,7 @@ Action *Driver::BuildOffloadingActions(Compilation &C, }) && isa(HostAction); OffloadAction::HostDependence HDep( *HostAction, *C.getSingleOffloadToolChain(), - /*BoundArch=*/nullptr, isa(HostAction) ? DDep : DDeps); + /*BoundArch=*/nullptr, SingleDeviceOutput ? DDep : DDeps); return C.MakeAction(HDep, SingleDeviceOutput ? DDep : DDeps); } diff --git a/clang/test/Driver/cuda-bindings.cu b/clang/test/Driver/cuda-bindings.cu index 3cc65b8cf98b..ce4b423064bc 100644 --- a/clang/test/Driver/cuda-bindings.cu +++ b/clang/test/Driver/cuda-bindings.cu @@ -102,8 +102,6 @@ // NDSYN-NOT: inputs: // NDSYN: # "nvptx64-nvidia-cuda" - "clang", inputs: [{{.*}}], output: (nothing) // NDSYN-NEXT: # "nvptx64-nvidia-cuda" - "clang", inputs: [{{.*}}], output: (nothing) -// ! FIXME: new driver erroneously attempts to run linker phase w/ no inputs. -// Remove these checks once the issue is solved. // NDSYN-NEXT: "nvptx64-nvidia-cuda" - "NVPTX::Linker", inputs: [(nothing), (nothing)], output: "{{.*}}" // NDSYN-NEXT: # "powerpc64le-ibm-linux-gnu" - "clang", inputs: [{{.*}}], output: (nothing) // NDSYN-NOT: inputs: ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] 7354a73 - [CUDA] Actually fix the test correctly this time
Author: Joseph Huber Date: 2022-09-06T20:31:27-05:00 New Revision: 7354a73945f1c123d66b01f51374ecbdba18fab3 URL: https://github.com/llvm/llvm-project/commit/7354a73945f1c123d66b01f51374ecbdba18fab3 DIFF: https://github.com/llvm/llvm-project/commit/7354a73945f1c123d66b01f51374ecbdba18fab3.diff LOG: [CUDA] Actually fix the test correctly this time Added: Modified: clang/test/Driver/cuda-bindings.cu Removed: diff --git a/clang/test/Driver/cuda-bindings.cu b/clang/test/Driver/cuda-bindings.cu index ce4b423064bc..f95d2de80f4a 100644 --- a/clang/test/Driver/cuda-bindings.cu +++ b/clang/test/Driver/cuda-bindings.cu @@ -102,7 +102,6 @@ // NDSYN-NOT: inputs: // NDSYN: # "nvptx64-nvidia-cuda" - "clang", inputs: [{{.*}}], output: (nothing) // NDSYN-NEXT: # "nvptx64-nvidia-cuda" - "clang", inputs: [{{.*}}], output: (nothing) -// NDSYN-NEXT: "nvptx64-nvidia-cuda" - "NVPTX::Linker", inputs: [(nothing), (nothing)], output: "{{.*}}" // NDSYN-NEXT: # "powerpc64le-ibm-linux-gnu" - "clang", inputs: [{{.*}}], output: (nothing) // NDSYN-NOT: inputs: ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] 194ec84 - [OpenMP][AMDGPU] Link bitcode ROCm device libraries per-TU
Author: Joseph Huber Date: 2022-09-14T09:42:06-05:00 New Revision: 194ec844f5c67306f505a3418038c5e75859bad8 URL: https://github.com/llvm/llvm-project/commit/194ec844f5c67306f505a3418038c5e75859bad8 DIFF: https://github.com/llvm/llvm-project/commit/194ec844f5c67306f505a3418038c5e75859bad8.diff LOG: [OpenMP][AMDGPU] Link bitcode ROCm device libraries per-TU Previously, we linked in the ROCm device libraries, which provide math and other utility functions, late. This is not strictly correct, as these libraries contain several flags that are only set per-TU, such as fast math or denormalization. This patch changes this to pass the bitcode libraries per-TU using the same method we use for the CUDA libraries. This has the advantage that we correctly propagate attributes, making this implementation more correct. Additionally, many annoying unused functions were not being fully removed during LTO. This led to erroneous warning messages and remarks on unused functions. I am not sure if not finding these libraries should be a hard error. Let me know if it should be demoted to a warning saying that some device utilities will not work without them. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D133726 Added: Modified: clang/include/clang/Driver/ToolChain.h clang/lib/Driver/ToolChain.cpp clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp clang/lib/Driver/ToolChains/AMDGPUOpenMP.h clang/lib/Driver/ToolChains/Clang.cpp clang/lib/Driver/ToolChains/HIPAMD.cpp clang/lib/Driver/ToolChains/HIPAMD.h clang/lib/Driver/ToolChains/HIPSPV.cpp clang/lib/Driver/ToolChains/HIPSPV.h clang/test/Driver/amdgpu-openmp-toolchain.c Removed: diff --git a/clang/include/clang/Driver/ToolChain.h b/clang/include/clang/Driver/ToolChain.h index 59d8dafc079f..28137e36e2af 100644 --- a/clang/include/clang/Driver/ToolChain.h +++ b/clang/include/clang/Driver/ToolChain.h @@ -714,9 +714,9 @@ class ToolChain { virtual VersionTuple computeMSVCVersion(const Driver *D, const llvm::opt::ArgList &Args) const; - /// Get paths of HIP device libraries. + /// Get paths for device libraries. virtual llvm::SmallVector - getHIPDeviceLibs(const llvm::opt::ArgList &Args) const; + getDeviceLibs(const llvm::opt::ArgList &Args) const; /// Add the system specific linker arguments to use /// for the given HIP runtime library type. diff --git a/clang/lib/Driver/ToolChain.cpp b/clang/lib/Driver/ToolChain.cpp index 7f16469155bd..26c5087b4ac2 100644 --- a/clang/lib/Driver/ToolChain.cpp +++ b/clang/lib/Driver/ToolChain.cpp @@ -1099,7 +1099,7 @@ void ToolChain::AddHIPIncludeArgs(const ArgList &DriverArgs, ArgStringList &CC1Args) const {} llvm::SmallVector -ToolChain::getHIPDeviceLibs(const ArgList &DriverArgs) const { +ToolChain::getDeviceLibs(const ArgList &DriverArgs) const { return {}; } diff --git a/clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp b/clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp index 4866982a8dfa..8ab79e1af532 100644 --- a/clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp +++ b/clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp @@ -75,6 +75,12 @@ void AMDGPUOpenMPToolChain::addClangTargetOptions( if (DriverArgs.hasArg(options::OPT_nogpulib)) return; + for (auto BCFile : getDeviceLibs(DriverArgs)) { +CC1Args.push_back(BCFile.ShouldInternalize ? "-mlink-builtin-bitcode" : "-mlink-bitcode-file"); +CC1Args.push_back(DriverArgs.MakeArgString(BCFile.Path)); + } + // Link the bitcode library late if we're using device LTO. 
if (getDriver().isUsingLTO(/* IsOffload */ true)) return; @@ -158,3 +164,24 @@ AMDGPUOpenMPToolChain::computeMSVCVersion(const Driver *D, const ArgList &Args) const { return HostTC.computeMSVCVersion(D, Args); } + +llvm::SmallVector +AMDGPUOpenMPToolChain::getDeviceLibs(const llvm::opt::ArgList &Args) const { + if (Args.hasArg(options::OPT_nogpulib)) +return {}; + + if (!RocmInstallation.hasDeviceLibrary()) { +getDriver().Diag(diag::err_drv_no_rocm_device_lib) << 0; +return {}; + } + + StringRef GpuArch = getProcessorFromTargetID( + getTriple(), Args.getLastArgValue(options::OPT_march_EQ)); + + SmallVector BCLibs; + for (auto BCLib : getCommonDeviceLibNames(Args, GpuArch.str(), +/*IsOpenMP=*/true)) +BCLibs.emplace_back(BCLib); + + return BCLibs; +} diff --git a/clang/lib/Driver/ToolChains/AMDGPUOpenMP.h b/clang/lib/Driver/ToolChains/AMDGPUOpenMP.h index 51a1c4696754..2be444a42c55 100644 --- a/clang/lib/Driver/ToolChains/AMDGPUOpenMP.h +++ b/clang/lib/Driver/ToolChains/AMDGPUOpenMP.h @@ -54,6 +54,9 @@ class LLVM_LIBRARY_VISIBILITY AMDGPUOpenMPToolChain final computeMSVCVersion(const Dri
[clang] bae1a2c - [OpenMP] Remove unused function after removing simplified interface
Author: Joseph Huber Date: 2022-09-14T10:14:43-05:00 New Revision: bae1a2cf3cce529b0d03df8bac962d13b407e117 URL: https://github.com/llvm/llvm-project/commit/bae1a2cf3cce529b0d03df8bac962d13b407e117 DIFF: https://github.com/llvm/llvm-project/commit/bae1a2cf3cce529b0d03df8bac962d13b407e117.diff LOG: [OpenMP] Remove unused function after removing simplified interface Summary: A previous patch removed the user of this function but did not remove the function causing unused function warnings. Remove it. Added: Modified: clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp Removed: diff --git a/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp b/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp index 1587d52846b1..0ab988968908 100644 --- a/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp +++ b/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp @@ -905,92 +905,6 @@ static bool hasNestedLightweightDirective(ASTContext &Ctx, return false; } -/// Checks if the construct supports lightweight runtime. It must be SPMD -/// construct + inner loop-based construct with static scheduling. -static bool supportsLightweightRuntime(ASTContext &Ctx, - const OMPExecutableDirective &D) { - if (!supportsSPMDExecutionMode(Ctx, D)) -return false; - OpenMPDirectiveKind DirectiveKind = D.getDirectiveKind(); - switch (DirectiveKind) { - case OMPD_target: - case OMPD_target_teams: - case OMPD_target_parallel: -return hasNestedLightweightDirective(Ctx, D); - case OMPD_target_parallel_for: - case OMPD_target_parallel_for_simd: - case OMPD_target_teams_distribute_parallel_for: - case OMPD_target_teams_distribute_parallel_for_simd: -// (Last|First)-privates must be shared in parallel region. -return hasStaticScheduling(D); - case OMPD_target_simd: - case OMPD_target_teams_distribute_simd: -return true; - case OMPD_target_teams_distribute: -return false; - case OMPD_parallel: - case OMPD_for: - case OMPD_parallel_for: - case OMPD_parallel_master: - case OMPD_parallel_sections: - case OMPD_for_simd: - case OMPD_parallel_for_simd: - case OMPD_cancel: - case OMPD_cancellation_point: - case OMPD_ordered: - case OMPD_threadprivate: - case OMPD_allocate: - case OMPD_task: - case OMPD_simd: - case OMPD_sections: - case OMPD_section: - case OMPD_single: - case OMPD_master: - case OMPD_critical: - case OMPD_taskyield: - case OMPD_barrier: - case OMPD_taskwait: - case OMPD_taskgroup: - case OMPD_atomic: - case OMPD_flush: - case OMPD_depobj: - case OMPD_scan: - case OMPD_teams: - case OMPD_target_data: - case OMPD_target_exit_data: - case OMPD_target_enter_data: - case OMPD_distribute: - case OMPD_distribute_simd: - case OMPD_distribute_parallel_for: - case OMPD_distribute_parallel_for_simd: - case OMPD_teams_distribute: - case OMPD_teams_distribute_simd: - case OMPD_teams_distribute_parallel_for: - case OMPD_teams_distribute_parallel_for_simd: - case OMPD_target_update: - case OMPD_declare_simd: - case OMPD_declare_variant: - case OMPD_begin_declare_variant: - case OMPD_end_declare_variant: - case OMPD_declare_target: - case OMPD_end_declare_target: - case OMPD_declare_reduction: - case OMPD_declare_mapper: - case OMPD_taskloop: - case OMPD_taskloop_simd: - case OMPD_master_taskloop: - case OMPD_master_taskloop_simd: - case OMPD_parallel_master_taskloop: - case OMPD_parallel_master_taskloop_simd: - case OMPD_requires: - case OMPD_unknown: - default: -break; - } - llvm_unreachable( - "Unknown programming model for OpenMP directive on NVPTX target."); -} - void CGOpenMPRuntimeGPU::emitNonSPMDKernel(const OMPExecutableDirective &D, StringRef ParentName, llvm::Function *&OutlinedFn, 
[openmp] [clang] [OpenMP] Cleanup and fixes for ABI agnostic DeviceRTL (PR #71234)
https://github.com/jhuber6 approved this pull request. https://github.com/llvm/llvm-project/pull/71234 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] Correctly link and optimize device libraries with -mlink-builtin-bitcode (PR #69371)
https://github.com/jhuber6 edited https://github.com/llvm/llvm-project/pull/69371 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] Correctly link and optimize device libraries with -mlink-builtin-bitcode (PR #69371)
@@ -1035,6 +1043,13 @@ void EmitAssemblyHelper::RunOptimizationPipeline( } } + // Re-link against any bitcodes supplied via the -mlink-builtin-bitcode option + // Some optimizations may generate new function calls that would not have + // been linked pre-optimization (i.e. fused sincos calls generated by + // AMDGPULibCalls::fold_sincos.) + if (ClRelinkBuiltinBitcodePostop) jhuber6 wrote: So, what I had in mind is that we could make a new `clang` option similar to `-mlink-builtin-bitcode`. This would then be used by the HIP toolchain or similar when constructing the list of files to pass via `-mlink-builtin-bitcode`. We would then simply register those with this secondary pass. This approach seems much simpler, being a boolean option that just relinks everything, but I somewhat like the idea of `-mlink-builtin-bitcode` being a pre-link operation and having another one for post-linking. That being said, it may not be worth the extra work because this is a huge hack around this ecosystem already. https://github.com/llvm/llvm-project/pull/69371 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] Correctly link and optimize device libraries with -mlink-builtin-bitcode (PR #69371)
https://github.com/jhuber6 commented: Some comments. I remember there was a reason we couldn't use the existing linking support and needed the new pass, what was that again? https://github.com/llvm/llvm-project/pull/69371 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] Correctly link and optimize device libraries with -mlink-builtin-bitcode (PR #69371)
@@ -1035,6 +1043,13 @@ void EmitAssemblyHelper::RunOptimizationPipeline( } } + // Re-link against any bitcodes supplied via the -mlink-builtin-bitcode option + // Some optimizations may generate new function calls that would not have + // been linked pre-optimization (i.e. fused sincos calls generated by + // AMDGPULibCalls::fold_sincos.) + if (ClRelinkBuiltinBitcodePostop) jhuber6 wrote: It's definitely the easier option. This problem is pretty specific, but I can see it being easier to just remove this class of bugs entirely. It's an ugly solution for an ugly problem overall. https://github.com/llvm/llvm-project/pull/69371 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] Correctly link and optimize device libraries with -mlink-builtin-bitcode (PR #69371)
@@ -45,7 +46,8 @@ namespace clang { const TargetOptions &TOpts, const LangOptions &LOpts, StringRef TDesc, llvm::Module *M, BackendAction Action, llvm::IntrusiveRefCntPtr VFS, - std::unique_ptr OS); + std::unique_ptr OS, + BackendConsumer *BC = NULL); jhuber6 wrote: Use `nullptr` in C++, it's type safe. https://github.com/llvm/llvm-project/pull/69371 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] Correctly link and optimize device libraries with -mlink-builtin-bitcode (PR #69371)
@@ -48,428 +49,369 @@ #include "llvm/Support/ToolOutputFile.h" #include "llvm/Support/YAMLTraits.h" #include "llvm/Transforms/IPO/Internalize.h" +#include "llvm/Transforms/Utils/Cloning.h" -#include #include using namespace clang; using namespace llvm; #define DEBUG_TYPE "codegenaction" namespace clang { - class BackendConsumer; - class ClangDiagnosticHandler final : public DiagnosticHandler { - public: -ClangDiagnosticHandler(const CodeGenOptions &CGOpts, BackendConsumer *BCon) -: CodeGenOpts(CGOpts), BackendCon(BCon) {} +class BackendConsumer; +class ClangDiagnosticHandler final : public DiagnosticHandler { +public: + ClangDiagnosticHandler(const CodeGenOptions &CGOpts, BackendConsumer *BCon) + : CodeGenOpts(CGOpts), BackendCon(BCon) {} -bool handleDiagnostics(const DiagnosticInfo &DI) override; + bool handleDiagnostics(const DiagnosticInfo &DI) override; -bool isAnalysisRemarkEnabled(StringRef PassName) const override { - return CodeGenOpts.OptimizationRemarkAnalysis.patternMatches(PassName); -} -bool isMissedOptRemarkEnabled(StringRef PassName) const override { - return CodeGenOpts.OptimizationRemarkMissed.patternMatches(PassName); -} -bool isPassedOptRemarkEnabled(StringRef PassName) const override { - return CodeGenOpts.OptimizationRemark.patternMatches(PassName); -} + bool isAnalysisRemarkEnabled(StringRef PassName) const override { +return CodeGenOpts.OptimizationRemarkAnalysis.patternMatches(PassName); + } + bool isMissedOptRemarkEnabled(StringRef PassName) const override { +return CodeGenOpts.OptimizationRemarkMissed.patternMatches(PassName); + } + bool isPassedOptRemarkEnabled(StringRef PassName) const override { +return CodeGenOpts.OptimizationRemark.patternMatches(PassName); + } -bool isAnyRemarkEnabled() const override { - return CodeGenOpts.OptimizationRemarkAnalysis.hasValidPattern() || - CodeGenOpts.OptimizationRemarkMissed.hasValidPattern() || - CodeGenOpts.OptimizationRemark.hasValidPattern(); -} + bool isAnyRemarkEnabled() const override { +return CodeGenOpts.OptimizationRemarkAnalysis.hasValidPattern() || + CodeGenOpts.OptimizationRemarkMissed.hasValidPattern() || + CodeGenOpts.OptimizationRemark.hasValidPattern(); + } - private: -const CodeGenOptions &CodeGenOpts; -BackendConsumer *BackendCon; - }; +private: + const CodeGenOptions &CodeGenOpts; + BackendConsumer *BackendCon; +}; + +static void reportOptRecordError(Error E, DiagnosticsEngine &Diags, + const CodeGenOptions &CodeGenOpts) { + handleAllErrors( + std::move(E), +[&](const LLVMRemarkSetupFileError &E) { +Diags.Report(diag::err_cannot_open_file) +<< CodeGenOpts.OptRecordFile << E.message(); + }, +[&](const LLVMRemarkSetupPatternError &E) { +Diags.Report(diag::err_drv_optimization_remark_pattern) +<< E.message() << CodeGenOpts.OptRecordPasses; + }, +[&](const LLVMRemarkSetupFormatError &E) { +Diags.Report(diag::err_drv_optimization_remark_format) +<< CodeGenOpts.OptRecordFormat; + }); +} - static void reportOptRecordError(Error E, DiagnosticsEngine &Diags, - const CodeGenOptions &CodeGenOpts) { -handleAllErrors( -std::move(E), - [&](const LLVMRemarkSetupFileError &E) { - Diags.Report(diag::err_cannot_open_file) - << CodeGenOpts.OptRecordFile << E.message(); -}, - [&](const LLVMRemarkSetupPatternError &E) { - Diags.Report(diag::err_drv_optimization_remark_pattern) - << E.message() << CodeGenOpts.OptRecordPasses; -}, - [&](const LLVMRemarkSetupFormatError &E) { - Diags.Report(diag::err_drv_optimization_remark_format) - << CodeGenOpts.OptRecordFormat; -}); -} +BackendConsumer::BackendConsumer(BackendAction Action, 
DiagnosticsEngine &Diags, + IntrusiveRefCntPtr VFS, + const HeaderSearchOptions &HeaderSearchOpts, + const PreprocessorOptions &PPOpts, + const CodeGenOptions &CodeGenOpts, + const TargetOptions &TargetOpts, + const LangOptions &LangOpts, + const std::string &InFile, + SmallVector LinkModules, + std::unique_ptr OS, + LLVMContext &C, + CoverageSourceInfo *CoverageInfo) + : Diags(Diags), Action(Action), HeaderSearchOpts(HeaderSearchOpts), + CodeGenOpts(CodeGenOpts), TargetOpts(TargetOpts), LangOpts(LangOpts), + AsmOutStream(std::move(OS)), Context(nullptr), FS(VFS), + LLVMIRGeneration("irgen", "LLVM
[clang] Correctly link and optimize device libraries with -mlink-builtin-bitcode (PR #69371)
@@ -155,10 +162,10 @@ class EmitAssemblyHelper { return F; } - void - RunOptimizationPipeline(BackendAction Action, + void RunOptimizationPipeline(BackendAction Action, std::unique_ptr &OS, - std::unique_ptr &ThinLinkOS); + std::unique_ptr &ThinLinkOS, + BackendConsumer *BC); jhuber6 wrote: Is this properly formatted? https://github.com/llvm/llvm-project/pull/69371 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] Correctly link and optimize device libraries with -mlink-builtin-bitcode (PR #69371)
@@ -0,0 +1,29 @@ +//===-- LinkInModulesPass.cpp - Module Linking pass --- C++ -*-===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===--===// +/// \file +/// +/// LinkInModulesPass implementation. +/// +//===--===// + +#include "LinkInModulesPass.h" +#include "BackendConsumer.h" + +using namespace llvm; + +LinkInModulesPass::LinkInModulesPass(clang::BackendConsumer *BC, + bool ShouldLinkFiles) : BC(BC), + ShouldLinkFiles(ShouldLinkFiles) {} + +PreservedAnalyses LinkInModulesPass::run(Module &M, ModuleAnalysisManager &AM) { + + if (BC != NULL && BC->LinkInModules(&M, ShouldLinkFiles)) jhuber6 wrote: ```suggestion if (BC && BC->LinkInModules(&M, ShouldLinkFiles)) ``` Here as well. https://github.com/llvm/llvm-project/pull/69371 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] Correctly link and optimize device libraries with -mlink-builtin-bitcode (PR #69371)
https://github.com/jhuber6 approved this pull request. I'm not entirely happy with the existence of this hack, but it's an ugly solution to an ugly self-inflicted problem, so I can live with it for now. https://github.com/llvm/llvm-project/pull/69371 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] Correctly link and optimize device libraries with -mlink-builtin-bitcode (PR #69371)
jhuber6 wrote: The code formatter says that it's not happy. Can you try `git clang-format HEAD~1` in your branch? https://github.com/llvm/llvm-project/pull/69371 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] Correctly link and optimize device libraries with -mlink-builtin-bitcode (PR #69371)
jhuber6 wrote: > I am getting this from the formatter: > > ``` > - void RunOptimizationPipeline(BackendAction Action, > - std::unique_ptr &OS, > - std::unique_ptr &ThinLinkOS, > - BackendConsumer *BC); > + void RunOptimizationPipeline( > + BackendAction Action, std::unique_ptr &OS, > + std::unique_ptr &ThinLinkOS, BackendConsumer > *BC); > ``` > > But in this case I am just following the existing style. I did notice a > couple of other improvements from the formatter though, and I've added those > changes. Just do what the formatter says, not every file is 100% clang-formatted so there's bits of old code that haven't been properly cleaned yet. This was the same line that I thought looked wrong so it should probably be fixed. Using `git clang-format HEAD~1` only formats what you've changed, so you don't need to worry about spurious edits. https://github.com/llvm/llvm-project/pull/69371 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] Correctly link and optimize device libraries with -mlink-builtin-bitcode (PR #69371)
jhuber6 wrote: > > Just do what the formatter says, not every file is 100% clang-formatted so > > there's bits of old code that haven't been properly cleaned yet. This was > > the same line that I thought looked wrong so it should probably be fixed. > > Using `git clang-format HEAD~1` only formats what you've changed, so you > > don't need to worry about spurious edits. > > Isn't the standard to follow the existing style, not re-format small sections > of code during a commit to a different style? > > [Always follow the golden rule: > > If you are extending, enhancing, or bug fixing already implemented code, use > the style that is already being used so that the source is uniform and easy > to follow.](https://llvm.org/docs/CodingStandards.html) Yes, but this doesn't really apply since you changed the function signature so it needs to be reformatted. That rule primarily applies to sections that have been manually formatted but don't exactly match the `clang-format` rules. Another reason we don't do bulk `clang-format` everywhere is because it confuses `git blame`. However, in this case there's really no reason not to. https://github.com/llvm/llvm-project/pull/69371 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] Correctly link and optimize device libraries with -mlink-builtin-bitcode (PR #69371)
jhuber6 wrote: > That said, I definitely don't want this to be a barrier to getting this patch > in, so if you still feel like we should go with the clang-format > recommendation, I'll change it and also update the EmitAssembly and > EmitBackendOutput signatures which were flagged by clang-format for the same > reasons. You should generally just go with what `clang-format` says unless there's a compelling reason not to. There's a reason the CI complains if `git clang-format HEAD~1` doesn't come back clean. https://github.com/llvm/llvm-project/pull/69371 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[llvm] [openmp] [clang] ReworkCtorDtor (PR #71739)
https://github.com/jhuber6 created https://github.com/llvm/llvm-project/pull/71739 - [NVPTX] Allow the ctor/dtor lowering pass to emit kernels - [OpenMP] Rework handling of global ctor/dtors in OpenMP From c1505a29d542bebd5c5e81d231e633c518b08caf Mon Sep 17 00:00:00 2001 From: Joseph Huber Date: Tue, 7 Nov 2023 09:19:51 -0600 Subject: [PATCH 1/2] [NVPTX] Allow the ctor/dtor lowering pass to emit kernels Summary: This pass emits the new "nvptx$device$init" and "nvptx$device$fini" kernels that are callable by the device. This intends to mimic the method of lowering for AMDGPU where we emit `amdgcn.device.init` and `amdgcn.device.fini` respectively. These kernels simply iterate the symbols `__init_array_start/stop` and `__fini_array_start/stop`. Normally, the linker provides these symbols automatically. In the AMDGPU case we only need to call the kernel, which calls the ctors / dtors. However, for NVPTX we require that the user initialize these variables to the associated globals that we already emit as a part of this pass. The motivation behind this change is to move away from OpenMP's handling of ctors / dtors. I would much prefer that the backend / runtime handles this. That allows us to handle ctors / dtors in a language-agnostic way. This approach requires that the runtime initialize the associated globals. They are marked `weak` so we can emit this per-TU. The kernel itself is `weak_odr` as it is copied exactly. One downside is that any module containing these kernels elicits the "stack size cannot be statically determined" warning from `nvlink` every time, which is annoying but inconsequential for functionality. It would be nice if there were a way to silence this warning, however. --- .../Target/NVPTX/NVPTXCtorDtorLowering.cpp| 162 +- llvm/test/CodeGen/NVPTX/lower-ctor-dtor.ll| 58 +++ 2 files changed, 213 insertions(+), 7 deletions(-) diff --git a/llvm/lib/Target/NVPTX/NVPTXCtorDtorLowering.cpp b/llvm/lib/Target/NVPTX/NVPTXCtorDtorLowering.cpp index ed7839cafe3a4ac..48221c210de1e3a 100644 --- a/llvm/lib/Target/NVPTX/NVPTXCtorDtorLowering.cpp +++ b/llvm/lib/Target/NVPTX/NVPTXCtorDtorLowering.cpp @@ -11,6 +11,7 @@ //===--===// #include "NVPTXCtorDtorLowering.h" +#include "MCTargetDesc/NVPTXBaseInfo.h" #include "NVPTX.h" #include "llvm/ADT/StringExtras.h" #include "llvm/IR/Constants.h" @@ -32,6 +33,11 @@ static cl::opt cl::desc("Override unique ID of ctor/dtor globals."), cl::init(""), cl::Hidden); +static cl::opt +CreateKernels("nvptx-lower-global-ctor-dtor-kernel", + cl::desc("Do not emit the init/fini kernels."), + cl::init(true), cl::Hidden); + namespace { static std::string getHash(StringRef Str) { @@ -42,11 +48,132 @@ static std::string getHash(StringRef Str) { return llvm::utohexstr(Hash.low(), /*LowerCase=*/true); } -static bool createInitOrFiniGlobls(Module &M, StringRef GlobalName, - bool IsCtor) { - GlobalVariable *GV = M.getGlobalVariable(GlobalName); - if (!GV || !GV->hasInitializer()) -return false; +static void addKernelMetadata(Module &M, GlobalValue *GV) { + llvm::LLVMContext &Ctx = M.getContext(); + + // Get "nvvm.annotations" metadata node. + llvm::NamedMDNode *MD = M.getOrInsertNamedMetadata("nvvm.annotations"); + + llvm::Metadata *KernelMDVals[] = { + llvm::ConstantAsMetadata::get(GV), llvm::MDString::get(Ctx, "kernel"), + llvm::ConstantAsMetadata::get( + llvm::ConstantInt::get(llvm::Type::getInt32Ty(Ctx), 1))}; + + // This kernel is only to be called single-threaded. 
+ llvm::Metadata *ThreadXMDVals[] = { + llvm::ConstantAsMetadata::get(GV), llvm::MDString::get(Ctx, "maxntidx"), + llvm::ConstantAsMetadata::get( + llvm::ConstantInt::get(llvm::Type::getInt32Ty(Ctx), 1))}; + llvm::Metadata *ThreadYMDVals[] = { + llvm::ConstantAsMetadata::get(GV), llvm::MDString::get(Ctx, "maxntidy"), + llvm::ConstantAsMetadata::get( + llvm::ConstantInt::get(llvm::Type::getInt32Ty(Ctx), 1))}; + llvm::Metadata *ThreadZMDVals[] = { + llvm::ConstantAsMetadata::get(GV), llvm::MDString::get(Ctx, "maxntidz"), + llvm::ConstantAsMetadata::get( + llvm::ConstantInt::get(llvm::Type::getInt32Ty(Ctx), 1))}; + + llvm::Metadata *BlockMDVals[] = { + llvm::ConstantAsMetadata::get(GV), + llvm::MDString::get(Ctx, "maxclusterrank"), + llvm::ConstantAsMetadata::get( + llvm::ConstantInt::get(llvm::Type::getInt32Ty(Ctx), 1))}; + + // Append metadata to nvvm.annotations. + MD->addOperand(llvm::MDNode::get(Ctx, KernelMDVals)); + MD->addOperand(llvm::MDNode::get(Ctx, ThreadXMDVals)); + MD->addOperand(llvm::MDNode::get(Ctx, ThreadYMDVals)); + MD->addOperand(llvm::MDNode::get(Ctx, ThreadZMDVals)); + MD->addOperand(llvm::MDNode::get(Ctx, BlockMDVals)); +} + +static Function *createInitOr
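For reference, the following is a conceptual C++ analogue of what the emitted "nvptx$device$init" kernel does; the actual output is LLVM IR/PTX, the function name below is only a stand-in for the real kernel symbol, and the wiring of the begin/end symbols is exactly the part the runtime is expected to provide.

```cpp
using CtorDtorFn = void (*)();

// Assumed to be pointed at the ctor array emitted by this pass; on NVPTX the
// runtime (not the linker) must initialize these two globals.
extern "C" CtorDtorFn __init_array_start[];
extern "C" CtorDtorFn __init_array_stop[];

// Stand-in for the "nvptx$device$init" kernel, which is launched with a
// single thread and simply walks the array. The fini kernel is symmetric.
extern "C" void nvptx_device_init() {
  for (CtorDtorFn *Fn = __init_array_start; Fn != __init_array_stop; ++Fn)
    (*Fn)();
}
```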
[llvm] [clang] [openmp] ReworkCtorDtor (PR #71739)
https://github.com/jhuber6 updated https://github.com/llvm/llvm-project/pull/71739 From 6be02dce45d672dd358bc277c97815cb201c4d0b Mon Sep 17 00:00:00 2001 From: Joseph Huber Date: Tue, 7 Nov 2023 17:12:31 -0600 Subject: [PATCH] [OpenMP] Rework handling of global ctor/dtors in OpenMP Summary: This patch reworks how we handle global constructors in OpenMP. Previously, we emitted individual kernels that were all registered and called individually. In order to provide more generic support, this patch moves all handling of this to the target backend and the runtime plugin. This has the benefit of supporting the GNU extensions for constructors and destructors, removing a class of failures related to shared library destruction order, and allows targets other than OpenMP to use the same support without needing to change the frontend. This is primarily done by calling kernels that the backend emits to iterate a list of ctor / dtor functions. For x64, this is automatic and we get it for free with the standard `dlopen` handling. For AMDGPU, we emit `amdgcn.device.init` and `amdgcn.device.fini` functions which handle everything automatically and simply need to be called. For NVPTX, a patch https://github.com/llvm/llvm-project/pull/71549 provides the kernels to call, but the runtime needs to set up the array manually by pulling out all the known constructor / destructor functions. One concession that this patch requires is the change that for GPU targets in OpenMP offloading we will use `llvm.global_dtors` instead of using `atexit`. This is because `atexit` is a separate runtime function that does not mesh well with the handling we're trying to do here. This should be equivalent in all cases except for cases where we would need to destruct manually such as: ``` struct S { ~S() { foo(); } }; void foo() { static S s; } ``` However this is broken in many other ways on the GPU, so it is not regressing any support, simply increasing the scope of what we can handle. This changes the handling of ctors / dtors. This patch now outputs an information message regarding the deprecation if the old format is used. This will be completely removed in a later release. 
Depends on: https://github.com/llvm/llvm-project/pull/71549 --- clang/lib/CodeGen/CGDeclCXX.cpp | 14 +- clang/lib/CodeGen/CGOpenMPRuntime.cpp | 130 -- clang/lib/CodeGen/CGOpenMPRuntime.h | 8 -- clang/lib/CodeGen/CodeGenFunction.h | 5 + clang/lib/CodeGen/CodeGenModule.h | 14 +- clang/lib/CodeGen/ItaniumCXXABI.cpp | 7 + .../amdgcn_openmp_device_math_constexpr.cpp | 48 +-- .../amdgcn_target_global_constructor.cpp | 45 -- clang/test/OpenMP/declare_target_codegen.cpp | 1 - ...x_declare_target_var_ctor_dtor_codegen.cpp | 35 + .../llvm/Frontend/OpenMP/OMPIRBuilder.h | 4 - llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp | 2 +- .../plugins-nextgen/amdgpu/src/rtl.cpp| 52 +++ .../common/PluginInterface/GlobalHandler.cpp | 22 +++ .../common/PluginInterface/GlobalHandler.h| 4 + .../PluginInterface/PluginInterface.cpp | 7 + .../common/PluginInterface/PluginInterface.h | 14 ++ .../plugins-nextgen/cuda/src/rtl.cpp | 109 +++ openmp/libomptarget/src/rtl.cpp | 9 +- 19 files changed, 319 insertions(+), 211 deletions(-) diff --git a/clang/lib/CodeGen/CGDeclCXX.cpp b/clang/lib/CodeGen/CGDeclCXX.cpp index 3fa28b343663f61..d816aa8554df8bb 100644 --- a/clang/lib/CodeGen/CGDeclCXX.cpp +++ b/clang/lib/CodeGen/CGDeclCXX.cpp @@ -22,6 +22,7 @@ #include "llvm/IR/Intrinsics.h" #include "llvm/IR/MDBuilder.h" #include "llvm/Support/Path.h" +#include "llvm/Transforms/Utils/ModuleUtils.h" using namespace clang; using namespace CodeGen; @@ -327,6 +328,15 @@ void CodeGenFunction::registerGlobalDtorWithAtExit(const VarDecl &VD, registerGlobalDtorWithAtExit(dtorStub); } +/// Register a global destructor using the C atexit runtime function. +void CodeGenFunction::registerGlobalDtorWithLLVM(const VarDecl &VD, + llvm::FunctionCallee Dtor, + llvm::Constant *Addr) { + // Create a function which calls the destructor. + llvm::Function *dtorStub = createAtExitStub(VD, Dtor, Addr); + CGM.AddGlobalDtor(dtorStub); +} + void CodeGenFunction::registerGlobalDtorWithAtExit(llvm::Constant *dtorStub) { // extern "C" int atexit(void (*f)(void)); assert(dtorStub->getType() == @@ -519,10 +529,6 @@ CodeGenModule::EmitCXXGlobalVarDeclInitFunc(const VarDecl *D, D->hasAttr())) return; - if (getLangOpts().OpenMP && - getOpenMPRuntime().emitDeclareTargetVarDefinition(D, Addr, PerformInit)) -return; - // Check if we've already initialized this decl. auto I = DelayedCXXInitPosition.find(D); if (I != DelayedCXXInitPosition.end() && I->second == ~0U) diff --git a/clang/li
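The `registerGlobalDtorWithLLVM` hook in the hunk above routes device-side destructors into `llvm.global_dtors` rather than `atexit`. LLVM already ships an IR-level utility for appending to that array, which is all the hook ultimately needs; here is a minimal sketch of the mechanism in isolation (the wrapper name is invented, and 65535 is just the conventional default priority).

```
#include "llvm/IR/Function.h"
#include "llvm/IR/Module.h"
#include "llvm/Transforms/Utils/ModuleUtils.h"

using namespace llvm;

// Append a destructor stub to the module's llvm.global_dtors array. The
// offloading machinery later runs these entries itself: via normal dlopen
// semantics on the host, or via the backend-emitted fini kernels on GPUs.
static void registerDtorViaGlobalArray(Module &M, Function *DtorStub) {
  appendToGlobalDtors(M, DtorStub, /*Priority=*/65535);
}
```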
[llvm] [openmp] [clang] ReworkCtorDtor (PR #71739)
https://github.com/jhuber6 edited https://github.com/llvm/llvm-project/pull/71739
[clang] [llvm] [openmp] ReworkCtorDtor (PR #71739)
https://github.com/jhuber6 updated https://github.com/llvm/llvm-project/pull/71739 >From a9f8285ecef2d43c6ccd87a1be9f795d566ed9e8 Mon Sep 17 00:00:00 2001 From: Joseph Huber Date: Tue, 7 Nov 2023 17:12:31 -0600 Subject: [PATCH] [OpenMP] Rework handling of global ctor/dtors in OpenMP Summary: This patch reworks how we handle global constructors in OpenMP. Previously, we emitted individual kernels that were all registered and called individually. In order to provide more generic support, this patch moves all handling of this to the target backend and the runtime plugin. This has the benefit of supporting the GNU extensions for constructors an destructors, removing a class of failures related to shared library destruction order, and allows targets other than OpenMP to use the same support without needing to change the frontend. This is primarily done by calling kernels that the backend emits to iterate a list of ctor / dtor functions. For x64, this is automatic and we get it for free with the standard `dlopen` handling. For AMDGPU, we emit `amdgcn.device.init` and `amdgcn.device.fini` functions which handle everything atuomatically and simply need to be called. For NVPTX, a patch https://github.com/llvm/llvm-project/pull/71549 provides the kernels to call, but the runtime needs to set up the array manually by pulling out all the known constructor / destructor functions. One concession that this patch requires is the change that for GPU targets in OpenMP offloading we will use `llvm.global_dtors` instead of using `atexit`. This is because `atexit` is a separate runtime function that does not mesh well with the handling we're trying to do here. This should be equivalent in all cases except for cases where we would need to destruct manually such as: ``` struct S { ~S() { foo(); } }; void foo() { static S s; } ``` However this is broken in many other ways on the GPU, so it is not regressing any support, simply increasing the scope of what we can handle. This changes the handling of ctors / dtors. This patch now outputs a information message regarding the deprecation if the old format is used. This will be completely removed in a later release. Depends on: https://github.com/llvm/llvm-project/pull/71549 --- clang/lib/CodeGen/CGDeclCXX.cpp | 13 +- clang/lib/CodeGen/CGOpenMPRuntime.cpp | 130 -- clang/lib/CodeGen/CGOpenMPRuntime.h | 8 -- clang/lib/CodeGen/CodeGenFunction.h | 5 + clang/lib/CodeGen/CodeGenModule.h | 14 +- clang/lib/CodeGen/ItaniumCXXABI.cpp | 7 + .../amdgcn_openmp_device_math_constexpr.cpp | 48 +-- .../amdgcn_target_global_constructor.cpp | 30 ++-- clang/test/OpenMP/declare_target_codegen.cpp | 1 - ...x_declare_target_var_ctor_dtor_codegen.cpp | 35 + .../llvm/Frontend/OpenMP/OMPIRBuilder.h | 4 - llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp | 2 +- .../plugins-nextgen/amdgpu/src/rtl.cpp| 52 +++ .../common/PluginInterface/GlobalHandler.h| 10 +- .../PluginInterface/PluginInterface.cpp | 7 + .../common/PluginInterface/PluginInterface.h | 14 ++ .../plugins-nextgen/cuda/src/rtl.cpp | 115 openmp/libomptarget/src/rtl.cpp | 6 + 18 files changed, 286 insertions(+), 215 deletions(-) diff --git a/clang/lib/CodeGen/CGDeclCXX.cpp b/clang/lib/CodeGen/CGDeclCXX.cpp index 3fa28b343663f61..bc77d4ed0851d4c 100644 --- a/clang/lib/CodeGen/CGDeclCXX.cpp +++ b/clang/lib/CodeGen/CGDeclCXX.cpp @@ -327,6 +327,15 @@ void CodeGenFunction::registerGlobalDtorWithAtExit(const VarDecl &VD, registerGlobalDtorWithAtExit(dtorStub); } +/// Register a global destructor using the C atexit runtime function. 
+void CodeGenFunction::registerGlobalDtorWithLLVM(const VarDecl &VD, + llvm::FunctionCallee Dtor, + llvm::Constant *Addr) { + // Create a function which calls the destructor. + llvm::Function *dtorStub = createAtExitStub(VD, Dtor, Addr); + CGM.AddGlobalDtor(dtorStub); +} + void CodeGenFunction::registerGlobalDtorWithAtExit(llvm::Constant *dtorStub) { // extern "C" int atexit(void (*f)(void)); assert(dtorStub->getType() == @@ -519,10 +528,6 @@ CodeGenModule::EmitCXXGlobalVarDeclInitFunc(const VarDecl *D, D->hasAttr())) return; - if (getLangOpts().OpenMP && - getOpenMPRuntime().emitDeclareTargetVarDefinition(D, Addr, PerformInit)) -return; - // Check if we've already initialized this decl. auto I = DelayedCXXInitPosition.find(D); if (I != DelayedCXXInitPosition.end() && I->second == ~0U) diff --git a/clang/lib/CodeGen/CGOpenMPRuntime.cpp b/clang/lib/CodeGen/CGOpenMPRuntime.cpp index a8e1150e44566b8..d2be8141a3a4b31 100644 --- a/clang/lib/CodeGen/CGOpenMPRuntime.cpp +++ b/clang/lib/CodeGen/CGOpenMPRuntime.cpp @@ -1747,136 +1747,6 @@ llvm::Function *CGOpenMPRuntime::emitThread
[clang] [llvm] [openmp] ReworkCtorDtor (PR #71739)
@@ -95,7 +95,7 @@ using namespace llvm; static cl::opt<bool> LowerCtorDtor("nvptx-lower-global-ctor-dtor", cl::desc("Lower GPU ctor / dtors to globals on the device."), - cl::init(false), cl::Hidden); + cl::init(true), cl::Hidden); jhuber6 wrote: This was the easiest way to get the desired effect. Passing `--nvptx-lower-global-ctor-dtor` is subtly broken because I think it will fail if the user didn't build with the NVPTX target. The OpenMP runtime is supposed to be buildable for NVPTX even without backend support, so I was worried it would degrade that. Do you think I could check for `openmp` module data and set it based on that? OpenMP always emits an `openmp` module flag. https://github.com/llvm/llvm-project/pull/71739
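If the module-flag approach works out, the check itself is tiny. A sketch under the assumption that the device module carries the "openmp" module flag clang records for OpenMP compilations (the helper name is hypothetical):

```
#include "llvm/IR/Module.h"

// True when the module was built for OpenMP, judged by the "openmp" module
// flag; the pass could derive its default from this instead of a cl::opt.
static bool isOpenMPModule(const llvm::Module &M) {
  return M.getModuleFlag("openmp") != nullptr;
}
```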
[openmp] [llvm] [clang] ReworkCtorDtor (PR #71739)
https://github.com/jhuber6 updated https://github.com/llvm/llvm-project/pull/71739 >From 159031c4c880e552ea90ec8ab6f6ed328c09ff10 Mon Sep 17 00:00:00 2001 From: Joseph Huber Date: Tue, 7 Nov 2023 17:12:31 -0600 Subject: [PATCH] [OpenMP] Rework handling of global ctor/dtors in OpenMP Summary: This patch reworks how we handle global constructors in OpenMP. Previously, we emitted individual kernels that were all registered and called individually. In order to provide more generic support, this patch moves all handling of this to the target backend and the runtime plugin. This has the benefit of supporting the GNU extensions for constructors an destructors, removing a class of failures related to shared library destruction order, and allows targets other than OpenMP to use the same support without needing to change the frontend. This is primarily done by calling kernels that the backend emits to iterate a list of ctor / dtor functions. For x64, this is automatic and we get it for free with the standard `dlopen` handling. For AMDGPU, we emit `amdgcn.device.init` and `amdgcn.device.fini` functions which handle everything atuomatically and simply need to be called. For NVPTX, a patch https://github.com/llvm/llvm-project/pull/71549 provides the kernels to call, but the runtime needs to set up the array manually by pulling out all the known constructor / destructor functions. One concession that this patch requires is the change that for GPU targets in OpenMP offloading we will use `llvm.global_dtors` instead of using `atexit`. This is because `atexit` is a separate runtime function that does not mesh well with the handling we're trying to do here. This should be equivalent in all cases except for cases where we would need to destruct manually such as: ``` struct S { ~S() { foo(); } }; void foo() { static S s; } ``` However this is broken in many other ways on the GPU, so it is not regressing any support, simply increasing the scope of what we can handle. This changes the handling of ctors / dtors. This patch now outputs a information message regarding the deprecation if the old format is used. This will be completely removed in a later release. Depends on: https://github.com/llvm/llvm-project/pull/71549 --- clang/lib/CodeGen/CGDeclCXX.cpp | 13 +- clang/lib/CodeGen/CGOpenMPRuntime.cpp | 130 -- clang/lib/CodeGen/CGOpenMPRuntime.h | 8 -- clang/lib/CodeGen/CodeGenFunction.h | 5 + clang/lib/CodeGen/CodeGenModule.h | 14 +- clang/lib/CodeGen/ItaniumCXXABI.cpp | 7 + .../amdgcn_openmp_device_math_constexpr.cpp | 48 +-- .../amdgcn_target_global_constructor.cpp | 30 ++-- clang/test/OpenMP/declare_target_codegen.cpp | 1 - ...x_declare_target_var_ctor_dtor_codegen.cpp | 35 + .../llvm/Frontend/OpenMP/OMPIRBuilder.h | 4 - llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp | 7 +- .../plugins-nextgen/amdgpu/src/rtl.cpp| 52 +++ .../common/PluginInterface/GlobalHandler.h| 10 +- .../PluginInterface/PluginInterface.cpp | 7 + .../common/PluginInterface/PluginInterface.h | 14 ++ .../plugins-nextgen/cuda/src/rtl.cpp | 115 openmp/libomptarget/src/rtl.cpp | 6 + 18 files changed, 290 insertions(+), 216 deletions(-) diff --git a/clang/lib/CodeGen/CGDeclCXX.cpp b/clang/lib/CodeGen/CGDeclCXX.cpp index 3fa28b343663f61..bc77d4ed0851d4c 100644 --- a/clang/lib/CodeGen/CGDeclCXX.cpp +++ b/clang/lib/CodeGen/CGDeclCXX.cpp @@ -327,6 +327,15 @@ void CodeGenFunction::registerGlobalDtorWithAtExit(const VarDecl &VD, registerGlobalDtorWithAtExit(dtorStub); } +/// Register a global destructor using the C atexit runtime function. 
+void CodeGenFunction::registerGlobalDtorWithLLVM(const VarDecl &VD, + llvm::FunctionCallee Dtor, + llvm::Constant *Addr) { + // Create a function which calls the destructor. + llvm::Function *dtorStub = createAtExitStub(VD, Dtor, Addr); + CGM.AddGlobalDtor(dtorStub); +} + void CodeGenFunction::registerGlobalDtorWithAtExit(llvm::Constant *dtorStub) { // extern "C" int atexit(void (*f)(void)); assert(dtorStub->getType() == @@ -519,10 +528,6 @@ CodeGenModule::EmitCXXGlobalVarDeclInitFunc(const VarDecl *D, D->hasAttr())) return; - if (getLangOpts().OpenMP && - getOpenMPRuntime().emitDeclareTargetVarDefinition(D, Addr, PerformInit)) -return; - // Check if we've already initialized this decl. auto I = DelayedCXXInitPosition.find(D); if (I != DelayedCXXInitPosition.end() && I->second == ~0U) diff --git a/clang/lib/CodeGen/CGOpenMPRuntime.cpp b/clang/lib/CodeGen/CGOpenMPRuntime.cpp index a8e1150e44566b8..d2be8141a3a4b31 100644 --- a/clang/lib/CodeGen/CGOpenMPRuntime.cpp +++ b/clang/lib/CodeGen/CGOpenMPRuntime.cpp @@ -1747,136 +1747,6 @@ llvm::Function *CGOpenMPRuntime::emitThread
[openmp] [llvm] [clang] [OpenMP] Rework handling of global ctor/dtors in OpenMP (PR #71739)
https://github.com/jhuber6 edited https://github.com/llvm/llvm-project/pull/71739
[openmp] [llvm] [clang] [OpenMP] Rework handling of global ctor/dtors in OpenMP (PR #71739)
@@ -95,7 +95,7 @@ using namespace llvm; static cl::opt<bool> LowerCtorDtor("nvptx-lower-global-ctor-dtor", cl::desc("Lower GPU ctor / dtors to globals on the device."), - cl::init(false), cl::Hidden); + cl::init(true), cl::Hidden); jhuber6 wrote: Done https://github.com/llvm/llvm-project/pull/71739
[clang] [llvm] [openmp] [OpenMP] Rework handling of global ctor/dtors in OpenMP (PR #71739)
https://github.com/jhuber6 updated https://github.com/llvm/llvm-project/pull/71739 >From 07a74b4561f2eb4f8debd40c7c2313da7b7fb0eb Mon Sep 17 00:00:00 2001 From: Joseph Huber Date: Tue, 7 Nov 2023 17:12:31 -0600 Subject: [PATCH] [OpenMP] Rework handling of global ctor/dtors in OpenMP Summary: This patch reworks how we handle global constructors in OpenMP. Previously, we emitted individual kernels that were all registered and called individually. In order to provide more generic support, this patch moves all handling of this to the target backend and the runtime plugin. This has the benefit of supporting the GNU extensions for constructors an destructors, removing a class of failures related to shared library destruction order, and allows targets other than OpenMP to use the same support without needing to change the frontend. This is primarily done by calling kernels that the backend emits to iterate a list of ctor / dtor functions. For x64, this is automatic and we get it for free with the standard `dlopen` handling. For AMDGPU, we emit `amdgcn.device.init` and `amdgcn.device.fini` functions which handle everything atuomatically and simply need to be called. For NVPTX, a patch https://github.com/llvm/llvm-project/pull/71549 provides the kernels to call, but the runtime needs to set up the array manually by pulling out all the known constructor / destructor functions. One concession that this patch requires is the change that for GPU targets in OpenMP offloading we will use `llvm.global_dtors` instead of using `atexit`. This is because `atexit` is a separate runtime function that does not mesh well with the handling we're trying to do here. This should be equivalent in all cases except for cases where we would need to destruct manually such as: ``` struct S { ~S() { foo(); } }; void foo() { static S s; } ``` However this is broken in many other ways on the GPU, so it is not regressing any support, simply increasing the scope of what we can handle. This changes the handling of ctors / dtors. This patch now outputs a information message regarding the deprecation if the old format is used. This will be completely removed in a later release. Depends on: https://github.com/llvm/llvm-project/pull/71549 --- clang/lib/CodeGen/CGDeclCXX.cpp | 13 +- clang/lib/CodeGen/CGOpenMPRuntime.cpp | 130 -- clang/lib/CodeGen/CGOpenMPRuntime.h | 8 -- clang/lib/CodeGen/CodeGenFunction.h | 5 + clang/lib/CodeGen/CodeGenModule.h | 14 +- clang/lib/CodeGen/ItaniumCXXABI.cpp | 7 + .../amdgcn_openmp_device_math_constexpr.cpp | 48 +-- .../amdgcn_target_global_constructor.cpp | 30 ++-- clang/test/OpenMP/declare_target_codegen.cpp | 1 - ...x_declare_target_var_ctor_dtor_codegen.cpp | 35 + .../llvm/Frontend/OpenMP/OMPIRBuilder.h | 4 - llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp | 7 +- .../plugins-nextgen/amdgpu/src/rtl.cpp| 52 +++ .../common/PluginInterface/GlobalHandler.h| 10 +- .../PluginInterface/PluginInterface.cpp | 7 + .../common/PluginInterface/PluginInterface.h | 14 ++ .../plugins-nextgen/cuda/src/rtl.cpp | 115 openmp/libomptarget/src/rtl.cpp | 6 + 18 files changed, 290 insertions(+), 216 deletions(-) diff --git a/clang/lib/CodeGen/CGDeclCXX.cpp b/clang/lib/CodeGen/CGDeclCXX.cpp index 3fa28b343663f61..e08a1e5f42df20c 100644 --- a/clang/lib/CodeGen/CGDeclCXX.cpp +++ b/clang/lib/CodeGen/CGDeclCXX.cpp @@ -327,6 +327,15 @@ void CodeGenFunction::registerGlobalDtorWithAtExit(const VarDecl &VD, registerGlobalDtorWithAtExit(dtorStub); } +/// Register a global destructor using the LLVM 'llvm.global_dtors' global. 
+void CodeGenFunction::registerGlobalDtorWithLLVM(const VarDecl &VD, + llvm::FunctionCallee Dtor, + llvm::Constant *Addr) { + // Create a function which calls the destructor. + llvm::Function *dtorStub = createAtExitStub(VD, Dtor, Addr); + CGM.AddGlobalDtor(dtorStub); +} + void CodeGenFunction::registerGlobalDtorWithAtExit(llvm::Constant *dtorStub) { // extern "C" int atexit(void (*f)(void)); assert(dtorStub->getType() == @@ -519,10 +528,6 @@ CodeGenModule::EmitCXXGlobalVarDeclInitFunc(const VarDecl *D, D->hasAttr())) return; - if (getLangOpts().OpenMP && - getOpenMPRuntime().emitDeclareTargetVarDefinition(D, Addr, PerformInit)) -return; - // Check if we've already initialized this decl. auto I = DelayedCXXInitPosition.find(D); if (I != DelayedCXXInitPosition.end() && I->second == ~0U) diff --git a/clang/lib/CodeGen/CGOpenMPRuntime.cpp b/clang/lib/CodeGen/CGOpenMPRuntime.cpp index a8e1150e44566b8..d2be8141a3a4b31 100644 --- a/clang/lib/CodeGen/CGOpenMPRuntime.cpp +++ b/clang/lib/CodeGen/CGOpenMPRuntime.cpp @@ -1747,136 +1747,6 @@ llvm::Function *CGOpenMPRuntime::emit
[openmp] [clang] [llvm] [OpenMP] Rework handling of global ctor/dtors in OpenMP (PR #71739)
https://github.com/jhuber6 updated https://github.com/llvm/llvm-project/pull/71739 >From 5e378ae3efdebedb044528167131c8cae4571a59 Mon Sep 17 00:00:00 2001 From: Joseph Huber Date: Tue, 7 Nov 2023 17:12:31 -0600 Subject: [PATCH] [OpenMP] Rework handling of global ctor/dtors in OpenMP Summary: This patch reworks how we handle global constructors in OpenMP. Previously, we emitted individual kernels that were all registered and called individually. In order to provide more generic support, this patch moves all handling of this to the target backend and the runtime plugin. This has the benefit of supporting the GNU extensions for constructors an destructors, removing a class of failures related to shared library destruction order, and allows targets other than OpenMP to use the same support without needing to change the frontend. This is primarily done by calling kernels that the backend emits to iterate a list of ctor / dtor functions. For x64, this is automatic and we get it for free with the standard `dlopen` handling. For AMDGPU, we emit `amdgcn.device.init` and `amdgcn.device.fini` functions which handle everything atuomatically and simply need to be called. For NVPTX, a patch https://github.com/llvm/llvm-project/pull/71549 provides the kernels to call, but the runtime needs to set up the array manually by pulling out all the known constructor / destructor functions. One concession that this patch requires is the change that for GPU targets in OpenMP offloading we will use `llvm.global_dtors` instead of using `atexit`. This is because `atexit` is a separate runtime function that does not mesh well with the handling we're trying to do here. This should be equivalent in all cases except for cases where we would need to destruct manually such as: ``` struct S { ~S() { foo(); } }; void foo() { static S s; } ``` However this is broken in many other ways on the GPU, so it is not regressing any support, simply increasing the scope of what we can handle. This changes the handling of ctors / dtors. This patch now outputs a information message regarding the deprecation if the old format is used. This will be completely removed in a later release. Depends on: https://github.com/llvm/llvm-project/pull/71549 --- clang/lib/CodeGen/CGDeclCXX.cpp | 13 +- clang/lib/CodeGen/CGOpenMPRuntime.cpp | 130 -- clang/lib/CodeGen/CGOpenMPRuntime.h | 8 -- clang/lib/CodeGen/CodeGenFunction.h | 5 + clang/lib/CodeGen/CodeGenModule.h | 14 +- clang/lib/CodeGen/ItaniumCXXABI.cpp | 8 ++ .../amdgcn_openmp_device_math_constexpr.cpp | 48 +-- .../amdgcn_target_global_constructor.cpp | 30 ++-- clang/test/OpenMP/declare_target_codegen.cpp | 1 - ...x_declare_target_var_ctor_dtor_codegen.cpp | 35 + .../llvm/Frontend/OpenMP/OMPIRBuilder.h | 4 - llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp | 7 +- .../plugins-nextgen/amdgpu/src/rtl.cpp| 52 +++ .../common/PluginInterface/GlobalHandler.h| 10 +- .../PluginInterface/PluginInterface.cpp | 7 + .../common/PluginInterface/PluginInterface.h | 14 ++ .../plugins-nextgen/cuda/src/rtl.cpp | 115 openmp/libomptarget/src/rtl.cpp | 6 + 18 files changed, 291 insertions(+), 216 deletions(-) diff --git a/clang/lib/CodeGen/CGDeclCXX.cpp b/clang/lib/CodeGen/CGDeclCXX.cpp index 3fa28b343663f61..e08a1e5f42df20c 100644 --- a/clang/lib/CodeGen/CGDeclCXX.cpp +++ b/clang/lib/CodeGen/CGDeclCXX.cpp @@ -327,6 +327,15 @@ void CodeGenFunction::registerGlobalDtorWithAtExit(const VarDecl &VD, registerGlobalDtorWithAtExit(dtorStub); } +/// Register a global destructor using the LLVM 'llvm.global_dtors' global. 
+void CodeGenFunction::registerGlobalDtorWithLLVM(const VarDecl &VD, + llvm::FunctionCallee Dtor, + llvm::Constant *Addr) { + // Create a function which calls the destructor. + llvm::Function *dtorStub = createAtExitStub(VD, Dtor, Addr); + CGM.AddGlobalDtor(dtorStub); +} + void CodeGenFunction::registerGlobalDtorWithAtExit(llvm::Constant *dtorStub) { // extern "C" int atexit(void (*f)(void)); assert(dtorStub->getType() == @@ -519,10 +528,6 @@ CodeGenModule::EmitCXXGlobalVarDeclInitFunc(const VarDecl *D, D->hasAttr())) return; - if (getLangOpts().OpenMP && - getOpenMPRuntime().emitDeclareTargetVarDefinition(D, Addr, PerformInit)) -return; - // Check if we've already initialized this decl. auto I = DelayedCXXInitPosition.find(D); if (I != DelayedCXXInitPosition.end() && I->second == ~0U) diff --git a/clang/lib/CodeGen/CGOpenMPRuntime.cpp b/clang/lib/CodeGen/CGOpenMPRuntime.cpp index a8e1150e44566b8..d2be8141a3a4b31 100644 --- a/clang/lib/CodeGen/CGOpenMPRuntime.cpp +++ b/clang/lib/CodeGen/CGOpenMPRuntime.cpp @@ -1747,136 +1747,6 @@ llvm::Function *CGOpenMPRuntime::emi
[llvm] [openmp] [clang] [OpenMP] Rework handling of global ctor/dtors in OpenMP (PR #71739)
@@ -2627,6 +2637,48 @@ struct AMDGPUDeviceTy : public GenericDeviceTy, AMDGenericDeviceTy { using AMDGPUEventRef = AMDGPUResourceRef; using AMDGPUEventManagerTy = GenericDeviceResourceManagerTy; + /// Common method to invoke a single threaded constructor or destructor + /// kernel by name. + Error callGlobalCtorDtorCommon(GenericPluginTy &Plugin, DeviceImageTy &Image, + const char *Name) { +// Perform a quick check for the named kernel in the image. The kernel +// should be created by the 'amdgpu-lower-ctor-dtor' pass. +GenericGlobalHandlerTy &Handler = Plugin.getGlobalHandler(); +GlobalTy Global(Name, sizeof(void *)); +if (auto Err = Handler.getGlobalMetadataFromImage(*this, Image, Global)) { + consumeError(std::move(Err)); + return Error::success(); jhuber6 wrote: If there were any global ctors / dtors the backend will emit a kernel. This is simply encoding "Does this symbol exist? If not continue on". We check the ELF symbol table directly as it's more efficient than going through the device API. We probably need to encode the logic better, since `consumeError` is a bit of a code smell. Maybe a helper function like `Handler.hasGlobal` or something. https://github.com/llvm/llvm-project/pull/71739 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
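One possible shape for the suggested helper, sketched against the types visible in this hunk; the name and exact signature are assumptions, and the point is only to hide the `consumeError` dance behind a boolean query.

```
#include "llvm/Support/Error.h"

// Answer "does the image define this symbol?" without surfacing an Error to
// every caller, since a missing ctor/dtor kernel is an expected case.
static bool imageHasSymbol(GenericGlobalHandlerTy &Handler,
                           GenericDeviceTy &Device, DeviceImageTy &Image,
                           const char *Name) {
  GlobalTy Global(Name, sizeof(void *));
  if (auto Err = Handler.getGlobalMetadataFromImage(Device, Image, Global)) {
    llvm::consumeError(std::move(Err));
    return false;
  }
  return true;
}
```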
[llvm] [openmp] [clang] [OpenMP] Rework handling of global ctor/dtors in OpenMP (PR #71739)
@@ -2627,6 +2637,48 @@ struct AMDGPUDeviceTy : public GenericDeviceTy, AMDGenericDeviceTy { using AMDGPUEventRef = AMDGPUResourceRef; using AMDGPUEventManagerTy = GenericDeviceResourceManagerTy; + /// Common method to invoke a single threaded constructor or destructor + /// kernel by name. + Error callGlobalCtorDtorCommon(GenericPluginTy &Plugin, DeviceImageTy &Image, + const char *Name) { +// Perform a quick check for the named kernel in the image. The kernel +// should be created by the 'amdgpu-lower-ctor-dtor' pass. +GenericGlobalHandlerTy &Handler = Plugin.getGlobalHandler(); +GlobalTy Global(Name, sizeof(void *)); +if (auto Err = Handler.getGlobalMetadataFromImage(*this, Image, Global)) { + consumeError(std::move(Err)); + return Error::success(); +} + +// Allocate and construct the AMDGPU kernel. +GenericKernelTy *AMDGPUKernel = Plugin.allocate(); +if (!AMDGPUKernel) + return Plugin::error("Failed to allocate memory for AMDGPU kernel"); + +new (AMDGPUKernel) AMDGPUKernelTy(Name); +if (auto Err = AMDGPUKernel->initImpl(*this, Image)) + return std::move(Err); + +auto *AsyncInfoPtr = Plugin.allocate<__tgt_async_info>(); +AsyncInfoWrapperTy AsyncInfoWrapper(*this, AsyncInfoPtr); + +if (auto Err = initAsyncInfoImpl(AsyncInfoWrapper)) + return std::move(Err); + +KernelArgsTy KernelArgs = {}; +if (auto Err = AMDGPUKernel->launchImpl(*this, /*NumThread=*/1u, +/*NumBlocks=*/1ul, KernelArgs, +/*Args=*/nullptr, AsyncInfoWrapper)) + return std::move(Err); + +if (auto Err = synchronize(AsyncInfoPtr)) + return std::move(Err); +Error Err = Error::success(); jhuber6 wrote: Yes https://github.com/llvm/llvm-project/pull/71739 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[llvm] [openmp] [clang] [OpenMP] Rework handling of global ctor/dtors in OpenMP (PR #71739)
https://github.com/jhuber6 updated https://github.com/llvm/llvm-project/pull/71739 >From 0a1f4b5d514a5e1525e3178a80f6e8f5638bfb69 Mon Sep 17 00:00:00 2001 From: Joseph Huber Date: Tue, 7 Nov 2023 17:12:31 -0600 Subject: [PATCH] [OpenMP] Rework handling of global ctor/dtors in OpenMP Summary: This patch reworks how we handle global constructors in OpenMP. Previously, we emitted individual kernels that were all registered and called individually. In order to provide more generic support, this patch moves all handling of this to the target backend and the runtime plugin. This has the benefit of supporting the GNU extensions for constructors an destructors, removing a class of failures related to shared library destruction order, and allows targets other than OpenMP to use the same support without needing to change the frontend. This is primarily done by calling kernels that the backend emits to iterate a list of ctor / dtor functions. For x64, this is automatic and we get it for free with the standard `dlopen` handling. For AMDGPU, we emit `amdgcn.device.init` and `amdgcn.device.fini` functions which handle everything atuomatically and simply need to be called. For NVPTX, a patch https://github.com/llvm/llvm-project/pull/71549 provides the kernels to call, but the runtime needs to set up the array manually by pulling out all the known constructor / destructor functions. One concession that this patch requires is the change that for GPU targets in OpenMP offloading we will use `llvm.global_dtors` instead of using `atexit`. This is because `atexit` is a separate runtime function that does not mesh well with the handling we're trying to do here. This should be equivalent in all cases except for cases where we would need to destruct manually such as: ``` struct S { ~S() { foo(); } }; void foo() { static S s; } ``` However this is broken in many other ways on the GPU, so it is not regressing any support, simply increasing the scope of what we can handle. This changes the handling of ctors / dtors. This patch now outputs a information message regarding the deprecation if the old format is used. This will be completely removed in a later release. Depends on: https://github.com/llvm/llvm-project/pull/71549 --- clang/lib/CodeGen/CGDeclCXX.cpp | 13 +- clang/lib/CodeGen/CGOpenMPRuntime.cpp | 130 -- clang/lib/CodeGen/CGOpenMPRuntime.h | 8 -- clang/lib/CodeGen/CodeGenFunction.h | 5 + clang/lib/CodeGen/CodeGenModule.h | 14 +- clang/lib/CodeGen/ItaniumCXXABI.cpp | 8 ++ .../amdgcn_openmp_device_math_constexpr.cpp | 48 +-- .../amdgcn_target_global_constructor.cpp | 30 ++-- clang/test/OpenMP/declare_target_codegen.cpp | 1 - ...x_declare_target_var_ctor_dtor_codegen.cpp | 35 + .../llvm/Frontend/OpenMP/OMPIRBuilder.h | 4 - llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp | 7 +- .../plugins-nextgen/amdgpu/src/rtl.cpp| 52 +++ .../common/PluginInterface/GlobalHandler.h| 10 +- .../PluginInterface/PluginInterface.cpp | 7 + .../common/PluginInterface/PluginInterface.h | 14 ++ .../plugins-nextgen/cuda/src/rtl.cpp | 115 openmp/libomptarget/src/rtl.cpp | 6 + 18 files changed, 291 insertions(+), 216 deletions(-) diff --git a/clang/lib/CodeGen/CGDeclCXX.cpp b/clang/lib/CodeGen/CGDeclCXX.cpp index 3fa28b343663f61..e08a1e5f42df20c 100644 --- a/clang/lib/CodeGen/CGDeclCXX.cpp +++ b/clang/lib/CodeGen/CGDeclCXX.cpp @@ -327,6 +327,15 @@ void CodeGenFunction::registerGlobalDtorWithAtExit(const VarDecl &VD, registerGlobalDtorWithAtExit(dtorStub); } +/// Register a global destructor using the LLVM 'llvm.global_dtors' global. 
+void CodeGenFunction::registerGlobalDtorWithLLVM(const VarDecl &VD, + llvm::FunctionCallee Dtor, + llvm::Constant *Addr) { + // Create a function which calls the destructor. + llvm::Function *dtorStub = createAtExitStub(VD, Dtor, Addr); + CGM.AddGlobalDtor(dtorStub); +} + void CodeGenFunction::registerGlobalDtorWithAtExit(llvm::Constant *dtorStub) { // extern "C" int atexit(void (*f)(void)); assert(dtorStub->getType() == @@ -519,10 +528,6 @@ CodeGenModule::EmitCXXGlobalVarDeclInitFunc(const VarDecl *D, D->hasAttr())) return; - if (getLangOpts().OpenMP && - getOpenMPRuntime().emitDeclareTargetVarDefinition(D, Addr, PerformInit)) -return; - // Check if we've already initialized this decl. auto I = DelayedCXXInitPosition.find(D); if (I != DelayedCXXInitPosition.end() && I->second == ~0U) diff --git a/clang/lib/CodeGen/CGOpenMPRuntime.cpp b/clang/lib/CodeGen/CGOpenMPRuntime.cpp index a8e1150e44566b8..d2be8141a3a4b31 100644 --- a/clang/lib/CodeGen/CGOpenMPRuntime.cpp +++ b/clang/lib/CodeGen/CGOpenMPRuntime.cpp @@ -1747,136 +1747,6 @@ llvm::Function *CGOpenMPRuntime::emi
[clang] [llvm] [openmp] [OpenMP] Rework handling of global ctor/dtors in OpenMP (PR #71739)
@@ -671,6 +671,20 @@ struct GenericDeviceTy : public DeviceAllocatorTy { Error synchronize(__tgt_async_info *AsyncInfo); virtual Error synchronizeImpl(__tgt_async_info &AsyncInfo) = 0; + /// Invokes any global constructors on the device if present and is required + /// by the target. + virtual Error callGlobalConstructors(GenericPluginTy &Plugin, + DeviceImageTy &Image) { +return Error::success(); jhuber6 wrote: This code is in the header above the definition of the `Plugin` class, so we can't use that without a complete reordering. https://github.com/llvm/llvm-project/pull/71739
[clang] [llvm] [openmp] [OpenMP] Rework handling of global ctor/dtors in OpenMP (PR #71739)
@@ -2794,6 +2794,14 @@ void ItaniumCXXABI::registerGlobalDtor(CodeGenFunction &CGF, const VarDecl &D, if (D.isNoDestroy(CGM.getContext())) return; + // OpenMP offloading supports C++ constructors and destructors but we do not + // always have 'atexit' available. Instead lower these to use the LLVM global + // destructors which we can handle directly in the runtime. + if (CGM.getLangOpts().OpenMP && CGM.getLangOpts().OpenMPIsTargetDevice && + !D.isStaticLocal() && + (CGM.getTriple().isAMDGPU() || CGM.getTriple().isNVPTX())) jhuber6 wrote: Yeah, these types of things are problematic especially if we consider getting SPIR-V support eventually. The logic basically goes like this. OpenMP supports global destructors but does not always support the `atexit` function. The old logic used to replace everything. This now at least lets CPU based targets use regular handling. I could make this unconditional for OpenMP, but I figured it'd be better to allow the CPU based targets to use the regular handling. More or less this is just a concession to prevent regressions from this patch. The old logic looked like this, which did this unconditionally. Like I said, could remove the AMD and PTX checks and just do this on the CPU as well if it would be better. ```c++ if (CGM.getLangOpts().OMPTargetTriples.empty() && !CGM.getLangOpts().OpenMPIsTargetDevice) return false; ``` https://github.com/llvm/llvm-project/pull/71739 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[llvm] [openmp] [clang] [OpenMP] Rework handling of global ctor/dtors in OpenMP (PR #71739)
https://github.com/jhuber6 updated https://github.com/llvm/llvm-project/pull/71739 >From 5283c5e08877b11a0eece51ca3877c9f5f8c7b82 Mon Sep 17 00:00:00 2001 From: Joseph Huber Date: Tue, 7 Nov 2023 17:12:31 -0600 Subject: [PATCH] [OpenMP] Rework handling of global ctor/dtors in OpenMP Summary: This patch reworks how we handle global constructors in OpenMP. Previously, we emitted individual kernels that were all registered and called individually. In order to provide more generic support, this patch moves all handling of this to the target backend and the runtime plugin. This has the benefit of supporting the GNU extensions for constructors an destructors, removing a class of failures related to shared library destruction order, and allows targets other than OpenMP to use the same support without needing to change the frontend. This is primarily done by calling kernels that the backend emits to iterate a list of ctor / dtor functions. For x64, this is automatic and we get it for free with the standard `dlopen` handling. For AMDGPU, we emit `amdgcn.device.init` and `amdgcn.device.fini` functions which handle everything atuomatically and simply need to be called. For NVPTX, a patch https://github.com/llvm/llvm-project/pull/71549 provides the kernels to call, but the runtime needs to set up the array manually by pulling out all the known constructor / destructor functions. One concession that this patch requires is the change that for GPU targets in OpenMP offloading we will use `llvm.global_dtors` instead of using `atexit`. This is because `atexit` is a separate runtime function that does not mesh well with the handling we're trying to do here. This should be equivalent in all cases except for cases where we would need to destruct manually such as: ``` struct S { ~S() { foo(); } }; void foo() { static S s; } ``` However this is broken in many other ways on the GPU, so it is not regressing any support, simply increasing the scope of what we can handle. This changes the handling of ctors / dtors. This patch now outputs a information message regarding the deprecation if the old format is used. This will be completely removed in a later release. Depends on: https://github.com/llvm/llvm-project/pull/71549 --- clang/lib/CodeGen/CGDeclCXX.cpp | 13 +- clang/lib/CodeGen/CGOpenMPRuntime.cpp | 130 -- clang/lib/CodeGen/CGOpenMPRuntime.h | 8 -- clang/lib/CodeGen/CodeGenFunction.h | 5 + clang/lib/CodeGen/CodeGenModule.h | 14 +- clang/lib/CodeGen/ItaniumCXXABI.cpp | 7 + .../amdgcn_openmp_device_math_constexpr.cpp | 48 +-- .../amdgcn_target_global_constructor.cpp | 30 ++-- clang/test/OpenMP/declare_target_codegen.cpp | 1 - ...x_declare_target_var_ctor_dtor_codegen.cpp | 35 + .../llvm/Frontend/OpenMP/OMPIRBuilder.h | 4 - llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp | 7 +- .../plugins-nextgen/amdgpu/src/rtl.cpp| 52 +++ .../common/PluginInterface/GlobalHandler.h| 10 +- .../PluginInterface/PluginInterface.cpp | 7 + .../common/PluginInterface/PluginInterface.h | 14 ++ .../plugins-nextgen/cuda/src/rtl.cpp | 115 openmp/libomptarget/src/rtl.cpp | 6 + 18 files changed, 290 insertions(+), 216 deletions(-) diff --git a/clang/lib/CodeGen/CGDeclCXX.cpp b/clang/lib/CodeGen/CGDeclCXX.cpp index 3fa28b343663f61..e08a1e5f42df20c 100644 --- a/clang/lib/CodeGen/CGDeclCXX.cpp +++ b/clang/lib/CodeGen/CGDeclCXX.cpp @@ -327,6 +327,15 @@ void CodeGenFunction::registerGlobalDtorWithAtExit(const VarDecl &VD, registerGlobalDtorWithAtExit(dtorStub); } +/// Register a global destructor using the LLVM 'llvm.global_dtors' global. 
+void CodeGenFunction::registerGlobalDtorWithLLVM(const VarDecl &VD, + llvm::FunctionCallee Dtor, + llvm::Constant *Addr) { + // Create a function which calls the destructor. + llvm::Function *dtorStub = createAtExitStub(VD, Dtor, Addr); + CGM.AddGlobalDtor(dtorStub); +} + void CodeGenFunction::registerGlobalDtorWithAtExit(llvm::Constant *dtorStub) { // extern "C" int atexit(void (*f)(void)); assert(dtorStub->getType() == @@ -519,10 +528,6 @@ CodeGenModule::EmitCXXGlobalVarDeclInitFunc(const VarDecl *D, D->hasAttr())) return; - if (getLangOpts().OpenMP && - getOpenMPRuntime().emitDeclareTargetVarDefinition(D, Addr, PerformInit)) -return; - // Check if we've already initialized this decl. auto I = DelayedCXXInitPosition.find(D); if (I != DelayedCXXInitPosition.end() && I->second == ~0U) diff --git a/clang/lib/CodeGen/CGOpenMPRuntime.cpp b/clang/lib/CodeGen/CGOpenMPRuntime.cpp index a8e1150e44566b8..d2be8141a3a4b31 100644 --- a/clang/lib/CodeGen/CGOpenMPRuntime.cpp +++ b/clang/lib/CodeGen/CGOpenMPRuntime.cpp @@ -1747,136 +1747,6 @@ llvm::Function *CGOpenMPRuntime::emit
[openmp] [llvm] [clang] [OpenMP] Rework handling of global ctor/dtors in OpenMP (PR #71739)
@@ -2794,6 +2794,14 @@ void ItaniumCXXABI::registerGlobalDtor(CodeGenFunction &CGF, const VarDecl &D, if (D.isNoDestroy(CGM.getContext())) return; + // OpenMP offloading supports C++ constructors and destructors but we do not + // always have 'atexit' available. Instead lower these to use the LLVM global + // destructors which we can handle directly in the runtime. + if (CGM.getLangOpts().OpenMP && CGM.getLangOpts().OpenMPIsTargetDevice && + !D.isStaticLocal() && + (CGM.getTriple().isAMDGPU() || CGM.getTriple().isNVPTX())) jhuber6 wrote: Just make this apply to all triples. I don't want to remove the dependency on the OpenMP language because this is somewhat of a hack. We can revisit this later if needed. https://github.com/llvm/llvm-project/pull/71739 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [CodeGen] Implement post-opt linking option for builtin bitocdes (PR #69371)
@@ -98,6 +100,11 @@ extern cl::opt PrintPipelinePasses; static cl::opt ClSanitizeOnOptimizerEarlyEP( "sanitizer-early-opt-ep", cl::Optional, cl::desc("Insert sanitizers on OptimizerEarlyEP."), cl::init(false)); + +// Re-link builtin bitcodes after optimization +static cl::opt ClRelinkBuiltinBitcodePostop( +"relink-builtin-bitcode-postop", cl::Optional, +cl::desc("Re-link builtin bitcodes after optimization."), cl::init(false)); jhuber6 wrote: That's a clang flag, this is presumably more of an LLVM one because this added a new pass that lives in Clang. I still think the solution to this was to just stop the backend from doing this optimization if it will obviously break it, but supposedly that caused performance regressions. https://github.com/llvm/llvm-project/pull/69371 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[openmp] [llvm] [clang] [OpenMP] Rework handling of global ctor/dtors in OpenMP (PR #71739)
jhuber6 wrote: Just noticed I'm actually calling the destructors backwards in AMDGPU. Will fix that. https://github.com/llvm/llvm-project/pull/71739
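For context on why the order matters: destructors are expected to run in the reverse of construction order, so the plugin should walk its sorted array of destructor stubs from the back. Purely illustrative:

```
using FiniFuncTy = void (*)();

// Run the collected destructor stubs last-to-first so destruction mirrors the
// order in which the corresponding constructors ran.
static void runDtorsInReverse(FiniFuncTy *Begin, FiniFuncTy *End) {
  while (End != Begin)
    (*--End)();
}
```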
[clang] [llvm] [openmp] [OpenMP] Rework handling of global ctor/dtors in OpenMP (PR #71739)
@@ -2794,6 +2794,14 @@ void ItaniumCXXABI::registerGlobalDtor(CodeGenFunction &CGF, const VarDecl &D, if (D.isNoDestroy(CGM.getContext())) return; + // OpenMP offloading supports C++ constructors and destructors but we do not + // always have 'atexit' available. Instead lower these to use the LLVM global + // destructors which we can handle directly in the runtime. + if (CGM.getLangOpts().OpenMP && CGM.getLangOpts().OpenMPIsTargetDevice && + !D.isStaticLocal() && + (CGM.getTriple().isAMDGPU() || CGM.getTriple().isNVPTX())) jhuber6 wrote: So just some random helper function like "Does target support X?" https://github.com/llvm/llvm-project/pull/71739 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[openmp] [llvm] [clang] [OpenMP] Rework handling of global ctor/dtors in OpenMP (PR #71739)
https://github.com/jhuber6 updated https://github.com/llvm/llvm-project/pull/71739 >From c3df637dd2cb9a5210cb90a3bb69a63c31236039 Mon Sep 17 00:00:00 2001 From: Joseph Huber Date: Tue, 7 Nov 2023 17:12:31 -0600 Subject: [PATCH] [OpenMP] Rework handling of global ctor/dtors in OpenMP Summary: This patch reworks how we handle global constructors in OpenMP. Previously, we emitted individual kernels that were all registered and called individually. In order to provide more generic support, this patch moves all handling of this to the target backend and the runtime plugin. This has the benefit of supporting the GNU extensions for constructors an destructors, removing a class of failures related to shared library destruction order, and allows targets other than OpenMP to use the same support without needing to change the frontend. This is primarily done by calling kernels that the backend emits to iterate a list of ctor / dtor functions. For x64, this is automatic and we get it for free with the standard `dlopen` handling. For AMDGPU, we emit `amdgcn.device.init` and `amdgcn.device.fini` functions which handle everything atuomatically and simply need to be called. For NVPTX, a patch https://github.com/llvm/llvm-project/pull/71549 provides the kernels to call, but the runtime needs to set up the array manually by pulling out all the known constructor / destructor functions. One concession that this patch requires is the change that for GPU targets in OpenMP offloading we will use `llvm.global_dtors` instead of using `atexit`. This is because `atexit` is a separate runtime function that does not mesh well with the handling we're trying to do here. This should be equivalent in all cases except for cases where we would need to destruct manually such as: ``` struct S { ~S() { foo(); } }; void foo() { static S s; } ``` However this is broken in many other ways on the GPU, so it is not regressing any support, simply increasing the scope of what we can handle. This changes the handling of ctors / dtors. This patch now outputs a information message regarding the deprecation if the old format is used. This will be completely removed in a later release. 
Depends on: https://github.com/llvm/llvm-project/pull/71549 --- clang/lib/CodeGen/CGDeclCXX.cpp | 13 +- clang/lib/CodeGen/CGOpenMPRuntime.cpp | 130 -- clang/lib/CodeGen/CGOpenMPRuntime.h | 8 -- clang/lib/CodeGen/CodeGenFunction.h | 5 + clang/lib/CodeGen/CodeGenModule.h | 14 +- clang/lib/CodeGen/ItaniumCXXABI.cpp | 7 + .../amdgcn_openmp_device_math_constexpr.cpp | 48 +-- .../amdgcn_target_global_constructor.cpp | 30 ++-- clang/test/OpenMP/declare_target_codegen.cpp | 1 - ...x_declare_target_var_ctor_dtor_codegen.cpp | 35 + .../llvm/Frontend/OpenMP/OMPIRBuilder.h | 4 - llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp | 7 +- .../plugins-nextgen/amdgpu/src/rtl.cpp| 52 +++ .../common/PluginInterface/GlobalHandler.h| 10 +- .../PluginInterface/PluginInterface.cpp | 7 + .../common/PluginInterface/PluginInterface.h | 14 ++ .../plugins-nextgen/cuda/src/rtl.cpp | 115 openmp/libomptarget/src/rtl.cpp | 6 + .../test/libc/global_ctor_dtor.cpp| 37 + 19 files changed, 327 insertions(+), 216 deletions(-) create mode 100644 openmp/libomptarget/test/libc/global_ctor_dtor.cpp diff --git a/clang/lib/CodeGen/CGDeclCXX.cpp b/clang/lib/CodeGen/CGDeclCXX.cpp index 3fa28b343663f61..e08a1e5f42df20c 100644 --- a/clang/lib/CodeGen/CGDeclCXX.cpp +++ b/clang/lib/CodeGen/CGDeclCXX.cpp @@ -327,6 +327,15 @@ void CodeGenFunction::registerGlobalDtorWithAtExit(const VarDecl &VD, registerGlobalDtorWithAtExit(dtorStub); } +/// Register a global destructor using the LLVM 'llvm.global_dtors' global. +void CodeGenFunction::registerGlobalDtorWithLLVM(const VarDecl &VD, + llvm::FunctionCallee Dtor, + llvm::Constant *Addr) { + // Create a function which calls the destructor. + llvm::Function *dtorStub = createAtExitStub(VD, Dtor, Addr); + CGM.AddGlobalDtor(dtorStub); +} + void CodeGenFunction::registerGlobalDtorWithAtExit(llvm::Constant *dtorStub) { // extern "C" int atexit(void (*f)(void)); assert(dtorStub->getType() == @@ -519,10 +528,6 @@ CodeGenModule::EmitCXXGlobalVarDeclInitFunc(const VarDecl *D, D->hasAttr())) return; - if (getLangOpts().OpenMP && - getOpenMPRuntime().emitDeclareTargetVarDefinition(D, Addr, PerformInit)) -return; - // Check if we've already initialized this decl. auto I = DelayedCXXInitPosition.find(D); if (I != DelayedCXXInitPosition.end() && I->second == ~0U) diff --git a/clang/lib/CodeGen/CGOpenMPRuntime.cpp b/clang/lib/CodeGen/CGOpenMPRuntime.cpp index a8e1150e44566b8..d2be8141a3a4b31 100644 --- a/clang/lib/Code
[clang] [openmp] [llvm] [OpenMP] Rework handling of global ctor/dtors in OpenMP (PR #71739)
https://github.com/jhuber6 updated https://github.com/llvm/llvm-project/pull/71739 >From 45a645c4e65d3b1f98dee23c2eba1cf6db99bff0 Mon Sep 17 00:00:00 2001 From: Joseph Huber Date: Tue, 7 Nov 2023 17:12:31 -0600 Subject: [PATCH] [OpenMP] Rework handling of global ctor/dtors in OpenMP Summary: This patch reworks how we handle global constructors in OpenMP. Previously, we emitted individual kernels that were all registered and called individually. In order to provide more generic support, this patch moves all handling of this to the target backend and the runtime plugin. This has the benefit of supporting the GNU extensions for constructors an destructors, removing a class of failures related to shared library destruction order, and allows targets other than OpenMP to use the same support without needing to change the frontend. This is primarily done by calling kernels that the backend emits to iterate a list of ctor / dtor functions. For x64, this is automatic and we get it for free with the standard `dlopen` handling. For AMDGPU, we emit `amdgcn.device.init` and `amdgcn.device.fini` functions which handle everything atuomatically and simply need to be called. For NVPTX, a patch https://github.com/llvm/llvm-project/pull/71549 provides the kernels to call, but the runtime needs to set up the array manually by pulling out all the known constructor / destructor functions. One concession that this patch requires is the change that for GPU targets in OpenMP offloading we will use `llvm.global_dtors` instead of using `atexit`. This is because `atexit` is a separate runtime function that does not mesh well with the handling we're trying to do here. This should be equivalent in all cases except for cases where we would need to destruct manually such as: ``` struct S { ~S() { foo(); } }; void foo() { static S s; } ``` However this is broken in many other ways on the GPU, so it is not regressing any support, simply increasing the scope of what we can handle. This changes the handling of ctors / dtors. This patch now outputs a information message regarding the deprecation if the old format is used. This will be completely removed in a later release. 
Depends on: https://github.com/llvm/llvm-project/pull/71549 --- clang/lib/CodeGen/CGDeclCXX.cpp | 13 +- clang/lib/CodeGen/CGOpenMPRuntime.cpp | 130 -- clang/lib/CodeGen/CGOpenMPRuntime.h | 8 -- clang/lib/CodeGen/CodeGenFunction.h | 5 + clang/lib/CodeGen/CodeGenModule.h | 14 +- clang/lib/CodeGen/ItaniumCXXABI.cpp | 7 + .../amdgcn_openmp_device_math_constexpr.cpp | 48 +-- .../amdgcn_target_global_constructor.cpp | 30 ++-- clang/test/OpenMP/declare_target_codegen.cpp | 1 - ...x_declare_target_var_ctor_dtor_codegen.cpp | 35 + .../llvm/Frontend/OpenMP/OMPIRBuilder.h | 4 - llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp | 7 +- .../plugins-nextgen/amdgpu/src/rtl.cpp| 52 +++ .../common/PluginInterface/GlobalHandler.h| 10 +- .../PluginInterface/PluginInterface.cpp | 7 + .../common/PluginInterface/PluginInterface.h | 14 ++ .../plugins-nextgen/cuda/src/rtl.cpp | 113 +++ openmp/libomptarget/src/rtl.cpp | 6 + .../test/libc/global_ctor_dtor.cpp| 37 + 19 files changed, 325 insertions(+), 216 deletions(-) create mode 100644 openmp/libomptarget/test/libc/global_ctor_dtor.cpp diff --git a/clang/lib/CodeGen/CGDeclCXX.cpp b/clang/lib/CodeGen/CGDeclCXX.cpp index 3fa28b343663f61..e08a1e5f42df20c 100644 --- a/clang/lib/CodeGen/CGDeclCXX.cpp +++ b/clang/lib/CodeGen/CGDeclCXX.cpp @@ -327,6 +327,15 @@ void CodeGenFunction::registerGlobalDtorWithAtExit(const VarDecl &VD, registerGlobalDtorWithAtExit(dtorStub); } +/// Register a global destructor using the LLVM 'llvm.global_dtors' global. +void CodeGenFunction::registerGlobalDtorWithLLVM(const VarDecl &VD, + llvm::FunctionCallee Dtor, + llvm::Constant *Addr) { + // Create a function which calls the destructor. + llvm::Function *dtorStub = createAtExitStub(VD, Dtor, Addr); + CGM.AddGlobalDtor(dtorStub); +} + void CodeGenFunction::registerGlobalDtorWithAtExit(llvm::Constant *dtorStub) { // extern "C" int atexit(void (*f)(void)); assert(dtorStub->getType() == @@ -519,10 +528,6 @@ CodeGenModule::EmitCXXGlobalVarDeclInitFunc(const VarDecl *D, D->hasAttr())) return; - if (getLangOpts().OpenMP && - getOpenMPRuntime().emitDeclareTargetVarDefinition(D, Addr, PerformInit)) -return; - // Check if we've already initialized this decl. auto I = DelayedCXXInitPosition.find(D); if (I != DelayedCXXInitPosition.end() && I->second == ~0U) diff --git a/clang/lib/CodeGen/CGOpenMPRuntime.cpp b/clang/lib/CodeGen/CGOpenMPRuntime.cpp index a8e1150e44566b8..d2be8141a3a4b31 100644 --- a/clang/lib/CodeG
[llvm] [openmp] [clang] [OpenMP] Rework handling of global ctor/dtors in OpenMP (PR #71739)
@@ -2794,6 +2794,14 @@ void ItaniumCXXABI::registerGlobalDtor(CodeGenFunction &CGF, const VarDecl &D, if (D.isNoDestroy(CGM.getContext())) return; + // OpenMP offloading supports C++ constructors and destructors but we do not + // always have 'atexit' available. Instead lower these to use the LLVM global + // destructors which we can handle directly in the runtime. + if (CGM.getLangOpts().OpenMP && CGM.getLangOpts().OpenMPIsTargetDevice && + !D.isStaticLocal() && + (CGM.getTriple().isAMDGPU() || CGM.getTriple().isNVPTX())) jhuber6 wrote: I could put something in `LangOptions` that just returns the same thing. Wasn't sure if it's worth forcing a recompile of everything though. https://github.com/llvm/llvm-project/pull/71739
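Either way, the target- and language-level checks factor out cleanly (the per-declaration `isStaticLocal` test would stay at the call site). A hedged sketch with an invented name, wrapping the condition from the hunk above:

```
// Single place to answer "should this compilation register dtors through
// llvm.global_dtors instead of atexit?" for OpenMP device code.
static bool useLLVMGlobalDtorsForOpenMPDevice(const CodeGenModule &CGM) {
  const LangOptions &LO = CGM.getLangOpts();
  return LO.OpenMP && LO.OpenMPIsTargetDevice &&
         (CGM.getTriple().isAMDGPU() || CGM.getTriple().isNVPTX());
}
```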
[openmp] [llvm] [clang] [OpenMP] Rework handling of global ctor/dtors in OpenMP (PR #71739)
https://github.com/jhuber6 updated https://github.com/llvm/llvm-project/pull/71739 >From 45a645c4e65d3b1f98dee23c2eba1cf6db99bff0 Mon Sep 17 00:00:00 2001 From: Joseph Huber Date: Tue, 7 Nov 2023 17:12:31 -0600 Subject: [PATCH 1/2] [OpenMP] Rework handling of global ctor/dtors in OpenMP Summary: This patch reworks how we handle global constructors in OpenMP. Previously, we emitted individual kernels that were all registered and called individually. In order to provide more generic support, this patch moves all handling of this to the target backend and the runtime plugin. This has the benefit of supporting the GNU extensions for constructors an destructors, removing a class of failures related to shared library destruction order, and allows targets other than OpenMP to use the same support without needing to change the frontend. This is primarily done by calling kernels that the backend emits to iterate a list of ctor / dtor functions. For x64, this is automatic and we get it for free with the standard `dlopen` handling. For AMDGPU, we emit `amdgcn.device.init` and `amdgcn.device.fini` functions which handle everything atuomatically and simply need to be called. For NVPTX, a patch https://github.com/llvm/llvm-project/pull/71549 provides the kernels to call, but the runtime needs to set up the array manually by pulling out all the known constructor / destructor functions. One concession that this patch requires is the change that for GPU targets in OpenMP offloading we will use `llvm.global_dtors` instead of using `atexit`. This is because `atexit` is a separate runtime function that does not mesh well with the handling we're trying to do here. This should be equivalent in all cases except for cases where we would need to destruct manually such as: ``` struct S { ~S() { foo(); } }; void foo() { static S s; } ``` However this is broken in many other ways on the GPU, so it is not regressing any support, simply increasing the scope of what we can handle. This changes the handling of ctors / dtors. This patch now outputs a information message regarding the deprecation if the old format is used. This will be completely removed in a later release. 
Depends on: https://github.com/llvm/llvm-project/pull/71549 --- clang/lib/CodeGen/CGDeclCXX.cpp | 13 +- clang/lib/CodeGen/CGOpenMPRuntime.cpp | 130 -- clang/lib/CodeGen/CGOpenMPRuntime.h | 8 -- clang/lib/CodeGen/CodeGenFunction.h | 5 + clang/lib/CodeGen/CodeGenModule.h | 14 +- clang/lib/CodeGen/ItaniumCXXABI.cpp | 7 + .../amdgcn_openmp_device_math_constexpr.cpp | 48 +-- .../amdgcn_target_global_constructor.cpp | 30 ++-- clang/test/OpenMP/declare_target_codegen.cpp | 1 - ...x_declare_target_var_ctor_dtor_codegen.cpp | 35 + .../llvm/Frontend/OpenMP/OMPIRBuilder.h | 4 - llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp | 7 +- .../plugins-nextgen/amdgpu/src/rtl.cpp| 52 +++ .../common/PluginInterface/GlobalHandler.h| 10 +- .../PluginInterface/PluginInterface.cpp | 7 + .../common/PluginInterface/PluginInterface.h | 14 ++ .../plugins-nextgen/cuda/src/rtl.cpp | 113 +++ openmp/libomptarget/src/rtl.cpp | 6 + .../test/libc/global_ctor_dtor.cpp| 37 + 19 files changed, 325 insertions(+), 216 deletions(-) create mode 100644 openmp/libomptarget/test/libc/global_ctor_dtor.cpp diff --git a/clang/lib/CodeGen/CGDeclCXX.cpp b/clang/lib/CodeGen/CGDeclCXX.cpp index 3fa28b343663f61..e08a1e5f42df20c 100644 --- a/clang/lib/CodeGen/CGDeclCXX.cpp +++ b/clang/lib/CodeGen/CGDeclCXX.cpp @@ -327,6 +327,15 @@ void CodeGenFunction::registerGlobalDtorWithAtExit(const VarDecl &VD, registerGlobalDtorWithAtExit(dtorStub); } +/// Register a global destructor using the LLVM 'llvm.global_dtors' global. +void CodeGenFunction::registerGlobalDtorWithLLVM(const VarDecl &VD, + llvm::FunctionCallee Dtor, + llvm::Constant *Addr) { + // Create a function which calls the destructor. + llvm::Function *dtorStub = createAtExitStub(VD, Dtor, Addr); + CGM.AddGlobalDtor(dtorStub); +} + void CodeGenFunction::registerGlobalDtorWithAtExit(llvm::Constant *dtorStub) { // extern "C" int atexit(void (*f)(void)); assert(dtorStub->getType() == @@ -519,10 +528,6 @@ CodeGenModule::EmitCXXGlobalVarDeclInitFunc(const VarDecl *D, D->hasAttr())) return; - if (getLangOpts().OpenMP && - getOpenMPRuntime().emitDeclareTargetVarDefinition(D, Addr, PerformInit)) -return; - // Check if we've already initialized this decl. auto I = DelayedCXXInitPosition.find(D); if (I != DelayedCXXInitPosition.end() && I->second == ~0U) diff --git a/clang/lib/CodeGen/CGOpenMPRuntime.cpp b/clang/lib/CodeGen/CGOpenMPRuntime.cpp index a8e1150e44566b8..d2be8141a3a4b31 100644 --- a/clang/lib/C
[openmp] [llvm] [clang] [OpenMP] Rework handling of global ctor/dtors in OpenMP (PR #71739)
https://github.com/jhuber6 updated https://github.com/llvm/llvm-project/pull/71739 >From 45a645c4e65d3b1f98dee23c2eba1cf6db99bff0 Mon Sep 17 00:00:00 2001 From: Joseph Huber Date: Tue, 7 Nov 2023 17:12:31 -0600 Subject: [PATCH 1/2] [OpenMP] Rework handling of global ctor/dtors in OpenMP Summary: This patch reworks how we handle global constructors in OpenMP. Previously, we emitted individual kernels that were all registered and called individually. In order to provide more generic support, this patch moves all handling of this to the target backend and the runtime plugin. This has the benefit of supporting the GNU extensions for constructors an destructors, removing a class of failures related to shared library destruction order, and allows targets other than OpenMP to use the same support without needing to change the frontend. This is primarily done by calling kernels that the backend emits to iterate a list of ctor / dtor functions. For x64, this is automatic and we get it for free with the standard `dlopen` handling. For AMDGPU, we emit `amdgcn.device.init` and `amdgcn.device.fini` functions which handle everything atuomatically and simply need to be called. For NVPTX, a patch https://github.com/llvm/llvm-project/pull/71549 provides the kernels to call, but the runtime needs to set up the array manually by pulling out all the known constructor / destructor functions. One concession that this patch requires is the change that for GPU targets in OpenMP offloading we will use `llvm.global_dtors` instead of using `atexit`. This is because `atexit` is a separate runtime function that does not mesh well with the handling we're trying to do here. This should be equivalent in all cases except for cases where we would need to destruct manually such as: ``` struct S { ~S() { foo(); } }; void foo() { static S s; } ``` However this is broken in many other ways on the GPU, so it is not regressing any support, simply increasing the scope of what we can handle. This changes the handling of ctors / dtors. This patch now outputs a information message regarding the deprecation if the old format is used. This will be completely removed in a later release. 
Depends on: https://github.com/llvm/llvm-project/pull/71549 --- clang/lib/CodeGen/CGDeclCXX.cpp | 13 +- clang/lib/CodeGen/CGOpenMPRuntime.cpp | 130 -- clang/lib/CodeGen/CGOpenMPRuntime.h | 8 -- clang/lib/CodeGen/CodeGenFunction.h | 5 + clang/lib/CodeGen/CodeGenModule.h | 14 +- clang/lib/CodeGen/ItaniumCXXABI.cpp | 7 + .../amdgcn_openmp_device_math_constexpr.cpp | 48 +-- .../amdgcn_target_global_constructor.cpp | 30 ++-- clang/test/OpenMP/declare_target_codegen.cpp | 1 - ...x_declare_target_var_ctor_dtor_codegen.cpp | 35 + .../llvm/Frontend/OpenMP/OMPIRBuilder.h | 4 - llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp | 7 +- .../plugins-nextgen/amdgpu/src/rtl.cpp| 52 +++ .../common/PluginInterface/GlobalHandler.h| 10 +- .../PluginInterface/PluginInterface.cpp | 7 + .../common/PluginInterface/PluginInterface.h | 14 ++ .../plugins-nextgen/cuda/src/rtl.cpp | 113 +++ openmp/libomptarget/src/rtl.cpp | 6 + .../test/libc/global_ctor_dtor.cpp| 37 + 19 files changed, 325 insertions(+), 216 deletions(-) create mode 100644 openmp/libomptarget/test/libc/global_ctor_dtor.cpp diff --git a/clang/lib/CodeGen/CGDeclCXX.cpp b/clang/lib/CodeGen/CGDeclCXX.cpp index 3fa28b343663f61..e08a1e5f42df20c 100644 --- a/clang/lib/CodeGen/CGDeclCXX.cpp +++ b/clang/lib/CodeGen/CGDeclCXX.cpp @@ -327,6 +327,15 @@ void CodeGenFunction::registerGlobalDtorWithAtExit(const VarDecl &VD, registerGlobalDtorWithAtExit(dtorStub); } +/// Register a global destructor using the LLVM 'llvm.global_dtors' global. +void CodeGenFunction::registerGlobalDtorWithLLVM(const VarDecl &VD, + llvm::FunctionCallee Dtor, + llvm::Constant *Addr) { + // Create a function which calls the destructor. + llvm::Function *dtorStub = createAtExitStub(VD, Dtor, Addr); + CGM.AddGlobalDtor(dtorStub); +} + void CodeGenFunction::registerGlobalDtorWithAtExit(llvm::Constant *dtorStub) { // extern "C" int atexit(void (*f)(void)); assert(dtorStub->getType() == @@ -519,10 +528,6 @@ CodeGenModule::EmitCXXGlobalVarDeclInitFunc(const VarDecl *D, D->hasAttr())) return; - if (getLangOpts().OpenMP && - getOpenMPRuntime().emitDeclareTargetVarDefinition(D, Addr, PerformInit)) -return; - // Check if we've already initialized this decl. auto I = DelayedCXXInitPosition.find(D); if (I != DelayedCXXInitPosition.end() && I->second == ~0U) diff --git a/clang/lib/CodeGen/CGOpenMPRuntime.cpp b/clang/lib/CodeGen/CGOpenMPRuntime.cpp index a8e1150e44566b8..d2be8141a3a4b31 100644 --- a/clang/lib/C
[openmp] [clang] [llvm] [OpenMP] Rework handling of global ctor/dtors in OpenMP (PR #71739)
@@ -1038,6 +1048,109 @@ struct CUDADeviceTy : public GenericDeviceTy { using CUDAStreamManagerTy = GenericDeviceResourceManagerTy; using CUDAEventManagerTy = GenericDeviceResourceManagerTy; + Error callGlobalCtorDtorCommon(GenericPluginTy &Plugin, DeviceImageTy &Image, + bool IsCtor) { +// Perform a quick check for the named kernel in the image. The kernel +// should be created by the 'nvptx-lower-ctor-dtor' pass. +GenericGlobalHandlerTy &Handler = Plugin.getGlobalHandler(); +GlobalTy Global(IsCtor ? "nvptx$device$init" : "nvptx$device$fini", +sizeof(void *)); +if (auto Err = Handler.getGlobalMetadataFromImage(*this, Image, Global)) { + consumeError(std::move(Err)); + return Plugin::success(); +} + +// The Nvidia backend cannot handle creating the ctor / dtor array +// automatically so we must create it ourselves. The backend will emit +// several globals that contain function pointers we can call. These are +// prefixed with a known name due to Nvidia's lack of section support. +const ELF64LEObjectFile *ELFObj = +Handler.getOrCreateELFObjectFile(*this, Image); +if (!ELFObj) + return Plugin::error("Unable to create ELF object for image %p", + Image.getStart()); + +// Search for all symbols that contain a constructor or destructor. +SmallVector> Funcs; +for (ELFSymbolRef Sym : ELFObj->symbols()) { + auto NameOrErr = Sym.getName(); + if (!NameOrErr) +return NameOrErr.takeError(); + + if (!NameOrErr->starts_with(IsCtor ? "__init_array_object_" + : "__fini_array_object_")) +continue; + + uint16_t priority; + if (NameOrErr->rsplit('_').second.getAsInteger(10, priority)) +return Plugin::error("Invalid priority for constructor or destructor"); + + Funcs.emplace_back(*NameOrErr, priority); +} + +// Sort the created array to be in priority order. +llvm::sort(Funcs, [=](auto x, auto y) { return x.second < y.second; }); + +// Allocate a buffer to store all of the known constructor / destructor +// functions in so we can iterate them on the device. +void *Buffer = +allocate(Funcs.size() * sizeof(void *), nullptr, TARGET_ALLOC_SHARED); jhuber6 wrote: It's much more convenient than copying over the buffer. `SHARED` in CUDA context would be "migratable" memory without async access AFAIK. So this will most likely just invoke a migration once it's accessed. Unsure if that's slower or faster than waiting on an explicit memcpy. https://github.com/llvm/llvm-project/pull/71739 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
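For readers following the plugin code quoted above, the device-side half is conceptually just a loop over the array the plugin assembles; a plain C++ rendering (an illustration, not the actual kernel the 'nvptx-lower-ctor-dtor' pass emits) looks like this:

```cpp
// Illustration only: the backend-emitted init/fini kernel walks the
// priority-sorted array of constructor/destructor function pointers that the
// plugin copies into device-visible memory and calls each one in order.
using CtorDtorFn = void (*)();

void runCtorsOrDtors(CtorDtorFn *Begin, CtorDtorFn *End) {
  for (CtorDtorFn *Fn = Begin; Fn != End; ++Fn)
    (*Fn)(); // already sorted by priority on the host side
}
```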
[llvm] [openmp] [clang] [OpenMP] Rework handling of global ctor/dtors in OpenMP (PR #71739)
https://github.com/jhuber6 updated https://github.com/llvm/llvm-project/pull/71739

>From 5366317448060c928ec415f7e243a402ef181cb5 Mon Sep 17 00:00:00 2001
From: Joseph Huber
Date: Tue, 7 Nov 2023 17:12:31 -0600
Subject: [PATCH] [OpenMP] Rework handling of global ctor/dtors in OpenMP

Summary:
This patch reworks how we handle global constructors in OpenMP. Previously, we emitted individual kernels that were all registered and called individually. In order to provide more generic support, this patch moves all handling of this to the target backend and the runtime plugin. This has the benefit of supporting the GNU extensions for constructors and destructors, removing a class of failures related to shared library destruction order, and allowing targets other than OpenMP to use the same support without needing to change the frontend.

This is primarily done by calling kernels that the backend emits to iterate a list of ctor / dtor functions. For x64, this is automatic and we get it for free with the standard `dlopen` handling. For AMDGPU, we emit `amdgcn.device.init` and `amdgcn.device.fini` functions which handle everything automatically and simply need to be called. For NVPTX, a patch https://github.com/llvm/llvm-project/pull/71549 provides the kernels to call, but the runtime needs to set up the array manually by pulling out all the known constructor / destructor functions.

One concession that this patch requires is that for GPU targets in OpenMP offloading we will use `llvm.global_dtors` instead of `atexit`. This is because `atexit` is a separate runtime function that does not mesh well with the handling we're trying to do here. This should be equivalent in all cases except those where we would need to destruct manually, such as:

```
struct S { ~S() { foo(); } };
void foo() { static S s; }
```

However, this is broken in many other ways on the GPU, so it is not regressing any support, simply increasing the scope of what we can handle.

Since this changes the handling of ctors / dtors, this patch now outputs an informational message regarding the deprecation if the old format is used. This will be completely removed in a later release.

Depends on: https://github.com/llvm/llvm-project/pull/71549

Add LangOption for atexit usage

Summary:
This method isn't 1-to-1 but it's more functional than not having it.
--- clang/include/clang/Basic/LangOptions.h | 3 + clang/lib/CodeGen/CGDeclCXX.cpp | 13 +- clang/lib/CodeGen/CGOpenMPRuntime.cpp | 130 -- clang/lib/CodeGen/CGOpenMPRuntime.h | 8 -- clang/lib/CodeGen/CodeGenFunction.h | 5 + clang/lib/CodeGen/CodeGenModule.h | 14 +- clang/lib/CodeGen/ItaniumCXXABI.cpp | 8 ++ .../amdgcn_openmp_device_math_constexpr.cpp | 48 +-- .../amdgcn_target_global_constructor.cpp | 30 ++-- clang/test/OpenMP/declare_target_codegen.cpp | 1 - ...x_declare_target_var_ctor_dtor_codegen.cpp | 35 + .../llvm/Frontend/OpenMP/OMPIRBuilder.h | 4 - llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp | 7 +- .../plugins-nextgen/amdgpu/src/rtl.cpp| 42 ++ .../common/PluginInterface/GlobalHandler.h| 10 +- .../PluginInterface/PluginInterface.cpp | 7 + .../common/PluginInterface/PluginInterface.h | 14 ++ .../plugins-nextgen/cuda/src/rtl.cpp | 104 ++ openmp/libomptarget/src/rtl.cpp | 9 +- .../test/libc/global_ctor_dtor.cpp| 37 + 20 files changed, 312 insertions(+), 217 deletions(-) create mode 100644 openmp/libomptarget/test/libc/global_ctor_dtor.cpp diff --git a/clang/include/clang/Basic/LangOptions.h b/clang/include/clang/Basic/LangOptions.h index 20a8ada60e0fe51..ae99357eeea7f41 100644 --- a/clang/include/clang/Basic/LangOptions.h +++ b/clang/include/clang/Basic/LangOptions.h @@ -597,6 +597,9 @@ class LangOptions : public LangOptionsBase { return !requiresStrictPrototypes() && !OpenCL; } + /// Returns true if the language supports calling the 'atexit' function. + bool hasAtExit() const { return !(OpenMP && OpenMPIsTargetDevice); } + /// Returns true if implicit int is part of the language requirements. bool isImplicitIntRequired() const { return !CPlusPlus && !C99; } diff --git a/clang/lib/CodeGen/CGDeclCXX.cpp b/clang/lib/CodeGen/CGDeclCXX.cpp index 3fa28b343663f61..e08a1e5f42df20c 100644 --- a/clang/lib/CodeGen/CGDeclCXX.cpp +++ b/clang/lib/CodeGen/CGDeclCXX.cpp @@ -327,6 +327,15 @@ void CodeGenFunction::registerGlobalDtorWithAtExit(const VarDecl &VD, registerGlobalDtorWithAtExit(dtorStub); } +/// Register a global destructor using the LLVM 'llvm.global_dtors' global. +void CodeGenFunction::registerGlobalDtorWithLLVM(const VarDecl &VD, + llvm::FunctionCallee Dtor, + llvm::Constant *Addr) { + // Create a function which calls the destructor. + l
[llvm] [openmp] [clang] [OpenMP] Rework handling of global ctor/dtors in OpenMP (PR #71739)
https://github.com/jhuber6 updated https://github.com/llvm/llvm-project/pull/71739

>From e0281fc280385286c3d5da7de619e793bd3b6bea Mon Sep 17 00:00:00 2001
From: Joseph Huber
Date: Tue, 7 Nov 2023 17:12:31 -0600
Subject: [PATCH] [OpenMP] Rework handling of global ctor/dtors in OpenMP

Summary:
This patch reworks how we handle global constructors in OpenMP. Previously, we emitted individual kernels that were all registered and called individually. In order to provide more generic support, this patch moves all handling of this to the target backend and the runtime plugin. This has the benefit of supporting the GNU extensions for constructors and destructors, removing a class of failures related to shared library destruction order, and allowing targets other than OpenMP to use the same support without needing to change the frontend.

This is primarily done by calling kernels that the backend emits to iterate a list of ctor / dtor functions. For x64, this is automatic and we get it for free with the standard `dlopen` handling. For AMDGPU, we emit `amdgcn.device.init` and `amdgcn.device.fini` functions which handle everything automatically and simply need to be called. For NVPTX, a patch https://github.com/llvm/llvm-project/pull/71549 provides the kernels to call, but the runtime needs to set up the array manually by pulling out all the known constructor / destructor functions.

One concession that this patch requires is that for GPU targets in OpenMP offloading we will use `llvm.global_dtors` instead of `atexit`. This is because `atexit` is a separate runtime function that does not mesh well with the handling we're trying to do here. This should be equivalent in all cases except those where we would need to destruct manually, such as:

```
struct S { ~S() { foo(); } };
void foo() { static S s; }
```

However, this is broken in many other ways on the GPU, so it is not regressing any support, simply increasing the scope of what we can handle.

Since this changes the handling of ctors / dtors, this patch now outputs an informational message regarding the deprecation if the old format is used. This will be completely removed in a later release.

Depends on: https://github.com/llvm/llvm-project/pull/71549

Add LangOption for atexit usage

Summary:
This method isn't 1-to-1 but it's more functional than not having it.
--- clang/include/clang/Basic/LangOptions.h | 3 + clang/lib/CodeGen/CGDeclCXX.cpp | 13 +- clang/lib/CodeGen/CGOpenMPRuntime.cpp | 130 -- clang/lib/CodeGen/CGOpenMPRuntime.h | 8 -- clang/lib/CodeGen/CodeGenFunction.h | 5 + clang/lib/CodeGen/CodeGenModule.h | 14 +- clang/lib/CodeGen/ItaniumCXXABI.cpp | 8 ++ .../amdgcn_openmp_device_math_constexpr.cpp | 48 +-- .../amdgcn_target_global_constructor.cpp | 30 ++-- clang/test/OpenMP/declare_target_codegen.cpp | 1 - ...x_declare_target_var_ctor_dtor_codegen.cpp | 35 + .../llvm/Frontend/OpenMP/OMPIRBuilder.h | 4 - llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp | 7 +- .../plugins-nextgen/amdgpu/src/rtl.cpp| 42 ++ .../common/PluginInterface/GlobalHandler.h| 10 +- .../PluginInterface/PluginInterface.cpp | 7 + .../common/PluginInterface/PluginInterface.h | 14 ++ .../plugins-nextgen/cuda/src/rtl.cpp | 110 +++ openmp/libomptarget/src/rtl.cpp | 9 +- .../test/libc/global_ctor_dtor.cpp| 37 + 20 files changed, 318 insertions(+), 217 deletions(-) create mode 100644 openmp/libomptarget/test/libc/global_ctor_dtor.cpp diff --git a/clang/include/clang/Basic/LangOptions.h b/clang/include/clang/Basic/LangOptions.h index 20a8ada60e0fe51..ae99357eeea7f41 100644 --- a/clang/include/clang/Basic/LangOptions.h +++ b/clang/include/clang/Basic/LangOptions.h @@ -597,6 +597,9 @@ class LangOptions : public LangOptionsBase { return !requiresStrictPrototypes() && !OpenCL; } + /// Returns true if the language supports calling the 'atexit' function. + bool hasAtExit() const { return !(OpenMP && OpenMPIsTargetDevice); } + /// Returns true if implicit int is part of the language requirements. bool isImplicitIntRequired() const { return !CPlusPlus && !C99; } diff --git a/clang/lib/CodeGen/CGDeclCXX.cpp b/clang/lib/CodeGen/CGDeclCXX.cpp index 3fa28b343663f61..e08a1e5f42df20c 100644 --- a/clang/lib/CodeGen/CGDeclCXX.cpp +++ b/clang/lib/CodeGen/CGDeclCXX.cpp @@ -327,6 +327,15 @@ void CodeGenFunction::registerGlobalDtorWithAtExit(const VarDecl &VD, registerGlobalDtorWithAtExit(dtorStub); } +/// Register a global destructor using the LLVM 'llvm.global_dtors' global. +void CodeGenFunction::registerGlobalDtorWithLLVM(const VarDecl &VD, + llvm::FunctionCallee Dtor, + llvm::Constant *Addr) { + // Create a function which calls the destructor. +
[llvm] [clang] [openmp] [OpenMP] Rework handling of global ctor/dtors in OpenMP (PR #71739)
https://github.com/jhuber6 closed https://github.com/llvm/llvm-project/pull/71739 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [llvm] [HIP] support 128 bit int division (PR #71978)
jhuber6 wrote:

> Would it be feasible to consider switching to the new offloading driver mode and really link with the library instead? It may be a conveniently isolated use case with little/no existing users that would disrupt.

I've thought a reasonable amount about a `compiler-rt` for GPUs. Right now it's a little difficult because of the issue of compatibility. We could do the traditional "Build the library N times for N architectures", but I'd like to think of something more intelligent in the future. The use of `-mlink-builtin-bitcode` handles this by more-or-less forcing correct attributes.

What this patch does is a little interesting though; having the clang driver pick apart archives has always seemed a little weird. We did it in the past for AMD's old handling of static libraries. There's still a lot of that code left over that I want to delete. I really need to sit down and allow HIP to work with the new driver.

I've been messing around with generic IR a bit, and I think what might work is LLVM-IR that intentionally leaves off target-specific attributes, plus a pass that adds them in if they are missing before other optimizations are run. Then we may be able to investigate the use of i-funcs to resolve target-specific branches once the architecture is known (once it is linked in). I think @JonChesterfield was thinking about something to that effect as well.

https://github.com/llvm/llvm-project/pull/71978 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
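To make the "generic IR plus a pass that fills in missing attributes" idea concrete, a rough new-pass-manager sketch might look like the following (an assumption about how such a pass could be written, not an existing LLVM pass; the pass name and attribute choices are invented for the example):

```cpp
#include "llvm/IR/Function.h"
#include "llvm/IR/Module.h"
#include "llvm/IR/PassManager.h"

using namespace llvm;

// Hypothetical pass: stamp a concrete target onto functions that were built
// as "generic" IR without target-specific attributes, before any optimization
// starts specializing on them.
struct InferTargetAttrsPass : PassInfoMixin<InferTargetAttrsPass> {
  StringRef CPU;
  StringRef Features;
  InferTargetAttrsPass(StringRef CPU, StringRef Features)
      : CPU(CPU), Features(Features) {}

  PreservedAnalyses run(Module &M, ModuleAnalysisManager &) {
    for (Function &F : M) {
      if (!F.hasFnAttribute("target-cpu"))
        F.addFnAttr("target-cpu", CPU);
      if (!F.hasFnAttribute("target-features"))
        F.addFnAttr("target-features", Features);
    }
    return PreservedAnalyses::none();
  }
};
```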
[clang] [Clang] Introduce scoped variants of GNU atomic functions (PR #72280)
https://github.com/jhuber6 created https://github.com/llvm/llvm-project/pull/72280

Summary:
The standard GNU atomic operations are a very common way to target hardware atomics on the device. With more heterogeneous devices being introduced, the concept of memory scopes has been in the LLVM language for a while via the `syncscope` modifier. For targets such as the GPU, this can change code generation depending on whether we only need to be consistent with the memory ordering of the entire system, the single GPU device, or something lower.

Previously these scopes were only exported via the `opencl` and `hip` variants of these functions. However, this made it difficult to use outside of those languages and the semantics were different from the standard GNU versions. This patch introduces a `__scoped_atomic` variant for the common functions. There was some discussion over whether or not these should be overloads of the existing ones, or simply new variants. I leant towards new variants to be less disruptive.

The scope here can be one of the following:

```
__MEMORY_SCOPE_SYSTEM // All devices and systems
__MEMORY_SCOPE_DEVICE // Just this device
__MEMORY_SCOPE_WRKGRP // A 'work-group' AKA CUDA block
__MEMORY_SCOPE_WVFRNT // A 'wavefront' AKA CUDA warp
__MEMORY_SCOPE_SINGLE // A single thread.
```

Naming consistency was attempted, but it is difficult to capture the full spectrum with so many names. Suggestions appreciated.

>From e08deb90a8d99226fd1e18e5dbc37014a9f88d1d Mon Sep 17 00:00:00 2001
From: Joseph Huber
Date: Mon, 6 Nov 2023 07:08:18 -0600
Subject: [PATCH] [Clang] Introduce scoped variants of GNU atomic functions

Summary:
The standard GNU atomic operations are a very common way to target hardware atomics on the device. With more heterogeneous devices being introduced, the concept of memory scopes has been in the LLVM language for a while via the `syncscope` modifier. For targets such as the GPU, this can change code generation depending on whether we only need to be consistent with the memory ordering of the entire system, the single GPU device, or something lower.

Previously these scopes were only exported via the `opencl` and `hip` variants of these functions. However, this made it difficult to use outside of those languages and the semantics were different from the standard GNU versions. This patch introduces a `__scoped_atomic` variant for the common functions. There was some discussion over whether or not these should be overloads of the existing ones, or simply new variants. I leant towards new variants to be less disruptive.

The scope here can be one of the following:

```
__MEMORY_SCOPE_SYSTEM // All devices and systems
__MEMORY_SCOPE_DEVICE // Just this device
__MEMORY_SCOPE_WRKGRP // A 'work-group' AKA CUDA block
__MEMORY_SCOPE_WVFRNT // A 'wavefront' AKA CUDA warp
__MEMORY_SCOPE_SINGLE // A single thread.
```

Naming consistency was attempted, but it is difficult to capture the full spectrum with so many names. Suggestions appreciated.
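A quick usage sketch of what this looks like for a user, assuming the builtins mirror the GNU `__atomic_*` names and argument order with the scope appended as the last argument (as in the patch below):

```cpp
// Hypothetical usage of the proposed builtins: same shape as the GNU
// __atomic_* functions, with the memory scope as a trailing argument.
int fetch_add_device_scope(int *Counter, int Value) {
  // Only threads on the same GPU device need to observe this RMW.
  return __scoped_atomic_fetch_add(Counter, Value, __ATOMIC_SEQ_CST,
                                   __MEMORY_SCOPE_DEVICE);
}

bool cas_single_thread(int *Flag, int Expected, int Desired) {
  // No other thread participates, so the narrowest scope suffices.
  return __scoped_atomic_compare_exchange_n(Flag, &Expected, Desired,
                                            /*weak=*/false, __ATOMIC_SEQ_CST,
                                            __ATOMIC_SEQ_CST,
                                            __MEMORY_SCOPE_SINGLE);
}
```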
--- clang/include/clang/AST/Expr.h | 20 +- clang/include/clang/Basic/Builtins.def | 26 ++ clang/include/clang/Basic/SyncScope.h| 69 - clang/lib/AST/Expr.cpp | 26 ++ clang/lib/AST/StmtPrinter.cpp| 1 + clang/lib/CodeGen/CGAtomic.cpp | 125 - clang/lib/CodeGen/Targets/AMDGPU.cpp | 5 + clang/lib/Frontend/InitPreprocessor.cpp | 7 + clang/lib/Sema/SemaChecking.cpp | 39 ++- clang/test/CodeGen/scoped-atomic-ops.c | 331 +++ clang/test/Preprocessor/init-aarch64.c | 5 + clang/test/Preprocessor/init-loongarch.c | 10 + clang/test/Preprocessor/init.c | 20 ++ clang/test/Sema/scoped-atomic-ops.c | 101 +++ 14 files changed, 764 insertions(+), 21 deletions(-) create mode 100644 clang/test/CodeGen/scoped-atomic-ops.c create mode 100644 clang/test/Sema/scoped-atomic-ops.c diff --git a/clang/include/clang/AST/Expr.h b/clang/include/clang/AST/Expr.h index a9c4c67a60e8e8e..a41f2d66b37b69d 100644 --- a/clang/include/clang/AST/Expr.h +++ b/clang/include/clang/AST/Expr.h @@ -6498,7 +6498,7 @@ class AtomicExpr : public Expr { return cast(SubExprs[ORDER_FAIL]); } Expr *getVal2() const { -if (Op == AO__atomic_exchange) +if (Op == AO__atomic_exchange || Op == AO__scoped_atomic_exchange) return cast(SubExprs[ORDER_FAIL]); assert(NumSubExprs > VAL2); return cast(SubExprs[VAL2]); @@ -6539,7 +6539,9 @@ class AtomicExpr : public Expr { getOp() == AO__opencl_atomic_compare_exchange_weak || getOp() == AO__hip_atomic_compare_exchange_weak || getOp() == AO__atomic_compare_exchange || - getOp() == AO__atomic_compare_exchange_n; + getOp() == AO__atomic_compare_exchange_n || + getOp() == AO__scoped_atomic_compare_exchange || + getOp() == AO__scoped_atomic_compare_exchange_n; } bool isOpenCL() const { @@ -6569,13 +6571,13 @@ class
[clang] Fix tests clang-offload-bundler-zlib/zstd.c (PR #74504)
https://github.com/jhuber6 approved this pull request. Thanks, I noticed that spurious failure as well but didn't know what caused it. https://github.com/llvm/llvm-project/pull/74504 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[llvm] [clang] [CUDA][HIP] Improve variable registration with the new driver (PR #73177)
jhuber6 wrote: Ping https://github.com/llvm/llvm-project/pull/73177 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang] Introduce scoped variants of GNU atomic functions (PR #72280)
https://github.com/jhuber6 closed https://github.com/llvm/llvm-project/pull/72280 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[llvm] [clang] [CUDA][HIP] Improve variable registration with the new driver (PR #73177)
https://github.com/jhuber6 closed https://github.com/llvm/llvm-project/pull/73177 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] Fix tests clang-offload-bundler-zlib/zstd.c (PR #74504)
jhuber6 wrote: I got this fail just now after doing a pull. ``` FAIL: Clang :: Driver/hip-offload-compress-zstd.hip (477 of 1078) TEST 'Clang :: Driver/hip-offload-compress-zstd.hip' FAILED Exit Code: 1 Command Output (stderr): -- RUN: at line 7: rm -rf /home/jhuber/Documents/llvm/llvm-project/build/tools/clang/test/Driver/Output/a.bc + rm -rf /home/jhuber/Documents/llvm/llvm-project/build/tools/clang/test/Driver/Output/a.bc RUN: at line 8: /home/jhuber/Documents/llvm/llvm-project/build/bin/clang -c -v --target=x86_64-linux-gnu-x hip --offload-arch=gfx1100 --offload-arch=gfx1101-fgpu-rdc -nogpuinc -nogpulib /home/jhuber/Documents/llvm/llvm-project/clang/test/Driver/Inputs/hip_multiple_inputs/a.cu --offload-compress --offload-device-only --gpu-bundle-output-o /home/jhuber/Documents/llvm/llvm-project/build/tools/clang/test/Driver/Output/a.bc 2>&1 | /home/jhuber/Documents/llvm/llvm-project/build/bin/FileCheck /home/jhuber/Documents/llvm/llvm-project/clang/test/Driver/hip-offload-compress-zstd.hip + /home/jhuber/Documents/llvm/llvm-project/build/bin/clang -c -v --target=x86_64-linux-gnu -x hip --offload-arch=gfx1100 --offload-arch=gfx1101 -fgpu-rdc -nogpuinc -nogpulib /home/jhuber/Documents/llvm/llvm-project/clang/test/Driver/Inputs/hip_multiple_inputs/a.cu --offload-compress --offload-device-only --gpu-bundle-output -o /home/jhuber/Documents/llvm/llvm-project/build/tools/clang/test/Driver/Output/a.bc + /home/jhuber/Documents/llvm/llvm-project/build/bin/FileCheck /home/jhuber/Documents/llvm/llvm-project/clang/test/Driver/hip-offload-compress-zstd.hip RUN: at line 23: /home/jhuber/Documents/llvm/llvm-project/build/bin/clang --hip-link -### -v --target=x86_64-linux-gnu--offload-arch=gfx1100 --offload-arch=gfx1101-fgpu-rdc -nogpulib /home/jhuber/Documents/llvm/llvm-project/build/tools/clang/test/Driver/Output/a.bc --offload-device-only 2>&1 | /home/jhuber/Documents/llvm/llvm-project/build/bin/FileCheck -check-prefix=UNBUNDLE /home/jhuber/Documents/llvm/llvm-project/clang/test/Driver/hip-offload-compress-zstd.hip + /home/jhuber/Documents/llvm/llvm-project/build/bin/clang --hip-link -### -v --target=x86_64-linux-gnu --offload-arch=gfx1100 --offload-arch=gfx1101 -fgpu-rdc -nogpulib /home/jhuber/Documents/llvm/llvm-project/build/tools/clang/test/Driver/Output/a.bc --offload-device-only + /home/jhuber/Documents/llvm/llvm-project/build/bin/FileCheck -check-prefix=UNBUNDLE /home/jhuber/Documents/llvm/llvm-project/clang/test/Driver/hip-offload-compress-zstd.hip /home/jhuber/Documents/llvm/llvm-project/clang/test/Driver/hip-offload-compress-zstd.hip:29:14: error: UNBUNDLE: expected string not found in input // UNBUNDLE: clang-offload-bundler{{.*}} "-type=bc" ^ :1:1: note: scanning from here clang version 18.0.0git ^ :17:96: note: possible intended match here clang: error: no such file or directory: '/home/jhuber/Documents/llvm/llvm-project/build/tools/clang/test/Driver/Output/a.bc' ^ Input file: Check file: /home/jhuber/Documents/llvm/llvm-project/clang/test/Driver/hip-offload-compress-zstd.hip -dump-input=help explains the following input dump. Input was: << 1: clang version 18.0.0git check:29'0 X~~~ error: no match found 2: Target: x86_64-unknown-linux-gnu check:29'0 ~ 3: Thread model: posix check:29'0 4: InstalledDir: /home/jhuber/Documents/llvm/llvm-project/build/bin check:29'0 ~ 5: Found candidate GCC installation: /usr/lib/gcc/x86_64-pc-linux-gnu/11.3.0 check:29'0 ~~ 6: Found candidate GCC installation: /usr/lib/gcc/x86_64-pc-linux-gnu/12.3.0 check:29'0 ~~ . . . 
12: Candidate multilib: .;@m64 check:29'0 ~~~ 13: Candidate multilib: 32;@m32 check:29'0 14: Selected multilib: .;@m64 check:29'0 ~~ 15: Found CUDA installation: /opt/cuda, version check:29'0 ~ 16: Found HIP installation: /opt/rocm, version 5.6.31062 check:29'0 ~ 17: clang: error: no such file or directory: '/home/jhuber/Documents/llvm/llvm-project/build/tools/clang/test/Driver/Output/a.bc' check:29'0 ~~ check:29'1
[clang] bfd41c3 - [LinkerWrapper][Obvious] Fix missing use of texture data type
Author: Joseph Huber Date: 2023-12-07T16:55:14-06:00 New Revision: bfd41c3f8cc70bd65461a6d767f55c14d72150d9 URL: https://github.com/llvm/llvm-project/commit/bfd41c3f8cc70bd65461a6d767f55c14d72150d9 DIFF: https://github.com/llvm/llvm-project/commit/bfd41c3f8cc70bd65461a6d767f55c14d72150d9.diff LOG: [LinkerWrapper][Obvious] Fix missing use of texture data type Summary: This was accidentally linked to the wrong pointer, causing unused variable warnings and registering the wrong thing. Added: Modified: clang/test/Driver/linker-wrapper-image.c clang/tools/clang-linker-wrapper/OffloadWrapper.cpp Removed: diff --git a/clang/test/Driver/linker-wrapper-image.c b/clang/test/Driver/linker-wrapper-image.c index 4a17a8324b462..a2a1996f66430 100644 --- a/clang/test/Driver/linker-wrapper-image.c +++ b/clang/test/Driver/linker-wrapper-image.c @@ -90,7 +90,7 @@ // CUDA-NEXT: %4 = getelementptr inbounds %struct.__tgt_offload_entry, ptr %entry1, i64 0, i32 3 // CUDA-NEXT: %flags = load i32, ptr %4, align 4 // CUDA-NEXT: %5 = getelementptr inbounds %struct.__tgt_offload_entry, ptr %entry1, i64 0, i32 4 -// CUDA-NEXT: %textype = load i32, ptr %4, align 4 +// CUDA-NEXT: %textype = load i32, ptr %5, align 4 // CUDA-NEXT: %type = and i32 %flags, 7 // CUDA-NEXT: %6 = and i32 %flags, 8 // CUDA-NEXT: %extern = lshr i32 %6, 3 @@ -189,7 +189,7 @@ // HIP-NEXT: %4 = getelementptr inbounds %struct.__tgt_offload_entry, ptr %entry1, i64 0, i32 3 // HIP-NEXT: %flags = load i32, ptr %4, align 4 // HIP-NEXT: %5 = getelementptr inbounds %struct.__tgt_offload_entry, ptr %entry1, i64 0, i32 4 -// HIP-NEXT: %textype = load i32, ptr %4, align 4 +// HIP-NEXT: %textype = load i32, ptr %5, align 4 // HIP-NEXT: %type = and i32 %flags, 7 // HIP-NEXT: %6 = and i32 %flags, 8 // HIP-NEXT: %extern = lshr i32 %6, 3 diff --git a/clang/tools/clang-linker-wrapper/OffloadWrapper.cpp b/clang/tools/clang-linker-wrapper/OffloadWrapper.cpp index 58d9e1e85ceff..f4f500b173572 100644 --- a/clang/tools/clang-linker-wrapper/OffloadWrapper.cpp +++ b/clang/tools/clang-linker-wrapper/OffloadWrapper.cpp @@ -385,7 +385,7 @@ Function *createRegisterGlobalsFunction(Module &M, bool IsHIP) { Builder.CreateInBoundsGEP(offloading::getEntryTy(M), Entry, {ConstantInt::get(getSizeTTy(M), 0), ConstantInt::get(Type::getInt32Ty(C), 4)}); - auto *Data = Builder.CreateLoad(Type::getInt32Ty(C), FlagsPtr, "textype"); + auto *Data = Builder.CreateLoad(Type::getInt32Ty(C), DataPtr, "textype"); auto *Kind = Builder.CreateAnd( Flags, ConstantInt::get(Type::getInt32Ty(C), 0x7), "type"); ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] Fix tests hip-offload-compress-zlib/zstd.hip (PR #74783)
https://github.com/jhuber6 approved this pull request. https://github.com/llvm/llvm-project/pull/74783 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [CUDA] Add support for CUDA-12.3 and sm_90a (PR #74895)
@@ -80,8 +85,10 @@ class NVPTXSubtarget : public NVPTXGenSubtargetInfo { bool allowFP16Math() const; bool hasMaskOperator() const { return PTXVersion >= 71; } bool hasNoReturn() const { return SmVersion >= 30 && PTXVersion >= 64; } - unsigned int getSmVersion() const { return SmVersion; } + unsigned int getSmVersion() const { return FullSmVersion / 10; } + unsigned int getFullSmVersion() const { return FullSmVersion; } std::string getTargetName() const { return TargetName; } + bool isSm90a() const { return getFullSmVersion() == 901; } jhuber6 wrote: Could we expose this more like `getSmVersion` and `getSmFeature`? Has CUDA even documented how they intend to further build on this? https://github.com/llvm/llvm-project/pull/74895 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[llvm] [clang] [CUDA] Add support for CUDA-12.3 and sm_90a (PR #74895)
@@ -80,8 +85,10 @@ class NVPTXSubtarget : public NVPTXGenSubtargetInfo { bool allowFP16Math() const; bool hasMaskOperator() const { return PTXVersion >= 71; } bool hasNoReturn() const { return SmVersion >= 30 && PTXVersion >= 64; } - unsigned int getSmVersion() const { return SmVersion; } + unsigned int getSmVersion() const { return FullSmVersion / 10; } + unsigned int getFullSmVersion() const { return FullSmVersion; } std::string getTargetName() const { return TargetName; } + bool isSm90a() const { return getFullSmVersion() == 901; } jhuber6 wrote: Yeah, I was thinking that the internal representation would just be what "FullSMVersion" is now, but `getSMVersion` would return `/ 10` and `getFeatures` or something would be `% 10`. https://github.com/llvm/llvm-project/pull/74895 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
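For concreteness, the encoding being discussed works out as follows (my reading of the quoted diff; the `% 10` split is an assumption, not an official scheme):

```cpp
// sm_90 -> 900, sm_90a -> 901: the last decimal digit of FullSmVersion
// carries the architecture-specific ('a') feature variant.
unsigned FullSmVersion = 901;                   // sm_90a
unsigned SmVersion = FullSmVersion / 10;        // 90
unsigned SmFeatureVariant = FullSmVersion % 10; // 1 => the 'a' variant
bool IsSm90a = (SmVersion == 90 && SmFeatureVariant == 1);
```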
[clang] [llvm] [CUDA] Add support for CUDA-12.3 and sm_90a (PR #74895)
https://github.com/jhuber6 approved this pull request. https://github.com/llvm/llvm-project/pull/74895 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] ef23bba - [Linkerwrapper] Make -Xoffload-linker pass directly to `clang`
Author: Joseph Huber Date: 2023-12-11T07:56:19-06:00 New Revision: ef23bba6e5aecbc6008e8a9ff8740fc4b04fe814 URL: https://github.com/llvm/llvm-project/commit/ef23bba6e5aecbc6008e8a9ff8740fc4b04fe814 DIFF: https://github.com/llvm/llvm-project/commit/ef23bba6e5aecbc6008e8a9ff8740fc4b04fe814.diff LOG: [Linkerwrapper] Make -Xoffload-linker pass directly to `clang` Summary: We provide `-Xoffload-linker` to pass arguments directly to the link step. Currently this uses `-Wl,` implicitly which prevents us from using clang options that we otherwise could make use of. This patch removes that implicit behavior as users can just as easily pass `-Xoffload-linker -Wl,-foo` if needed. Added: Modified: clang/test/Driver/linker-wrapper.c clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp Removed: diff --git a/clang/test/Driver/linker-wrapper.c b/clang/test/Driver/linker-wrapper.c index e82febd618231..b763a003452ba 100644 --- a/clang/test/Driver/linker-wrapper.c +++ b/clang/test/Driver/linker-wrapper.c @@ -123,8 +123,8 @@ // RUN: --linker-path=/usr/bin/ld --device-linker=a --device-linker=nvptx64-nvidia-cuda=b -- \ // RUN: %t.o -o a.out 2>&1 | FileCheck %s --check-prefix=LINKER-ARGS -// LINKER-ARGS: clang{{.*}}--target=amdgcn-amd-amdhsa{{.*}}-Wl,a -// LINKER-ARGS: clang{{.*}}--target=nvptx64-nvidia-cuda{{.*}}-Wl,a -Wl,b +// LINKER-ARGS: clang{{.*}}--target=amdgcn-amd-amdhsa{{.*}}a +// LINKER-ARGS: clang{{.*}}--target=nvptx64-nvidia-cuda{{.*}}a b // RUN: not clang-linker-wrapper --dry-run --host-triple=x86_64-unknown-linux-gnu -ldummy \ // RUN: --linker-path=/usr/bin/ld --device-linker=a --device-linker=nvptx64-nvidia-cuda=b -- \ diff --git a/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp b/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp index db0ce3e2a1901..5d2fe98fe5601 100644 --- a/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp +++ b/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp @@ -428,7 +428,7 @@ Expected clang(ArrayRef InputFiles, const ArgList &Args) { std::back_inserter(CmdArgs)); for (StringRef Arg : Args.getAllArgValues(OPT_linker_arg_EQ)) -CmdArgs.push_back(Args.MakeArgString("-Wl," + Arg)); +CmdArgs.push_back(Args.MakeArgString(Arg)); for (StringRef Arg : Args.getAllArgValues(OPT_builtin_bitcode_EQ)) { if (llvm::Triple(Arg.split('=').first) == Triple) ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] f3d5758 - [libc] Fix the wrapper headers for 'toupper' and 'tolower'
Author: Joseph Huber Date: 2023-11-14T11:52:43-06:00 New Revision: f3d57583b4942056a930b6f1e4101063637e9e98 URL: https://github.com/llvm/llvm-project/commit/f3d57583b4942056a930b6f1e4101063637e9e98 DIFF: https://github.com/llvm/llvm-project/commit/f3d57583b4942056a930b6f1e4101063637e9e98.diff LOG: [libc] Fix the wrapper headers for 'toupper' and 'tolower' Summary: The GNU headers like to reassign this function to a new function which the optimizer will pick up unless compiling with `O0`. This uses an external LUT which we don't have and fails to link. This patch makes sure that the GPU portion does not include these extra definitions and we only use the ones we support. It's hacky, but it's the only way to disable it. Added: Modified: clang/lib/Headers/llvm_libc_wrappers/ctype.h Removed: diff --git a/clang/lib/Headers/llvm_libc_wrappers/ctype.h b/clang/lib/Headers/llvm_libc_wrappers/ctype.h index 084c5a97765a360..49c2af93471b0e7 100644 --- a/clang/lib/Headers/llvm_libc_wrappers/ctype.h +++ b/clang/lib/Headers/llvm_libc_wrappers/ctype.h @@ -13,8 +13,19 @@ #error "This file is for GPU offloading compilation only" #endif +// The GNU headers like to define 'toupper' and 'tolower' redundantly. This is +// necessary to prevent it from doing that and remapping our implementation. +#if (defined(__NVPTX__) || defined(__AMDGPU__)) && defined(__GLIBC__) +#pragma push_macro("__USE_EXTERN_INLINES") +#undef __USE_EXTERN_INLINES +#endif + #include_next +#if (defined(__NVPTX__) || defined(__AMDGPU__)) && defined(__GLIBC__) +#pragma pop_macro("__USE_EXTERN_INLINES") +#endif + #if __has_include() #if defined(__HIP__) || defined(__CUDA__) ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang] Introduce scoped variants of GNU atomic functions (PR #72280)
jhuber6 wrote: > Just a FYI, that recent NVIDIA GPUs have introduced a concept of [thread > block > cluster](https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#thread-block-clusters). > We may need another level of granularity between the block and device. Should be easy enough, though the numbers would no longer be incremental if we put it in between. It's somewhat difficult to decide what these things should be called. Also I was somewhat tempted to keep the names all the same length like the `__ATOMIC` ones are, but that might not be worth the effort. That being said, as far as I'm aware the Nvidia backend doesn't handle scoped atomics at all yet; we simply emit `volatile` versions even when scopes exist in PTX. https://github.com/llvm/llvm-project/pull/72280 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang] Introduce scoped variants of GNU atomic functions (PR #72280)
jhuber6 wrote: > Is there any actual difference now between these and the HIP/OpenCL flavors > other than dropping the language from the name? Yes, these directly copy the GNU functions and names. The OpenCL / HIP ones use a different format. https://github.com/llvm/llvm-project/pull/72280 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang] Introduce scoped variants of GNU atomic functions (PR #72280)
@@ -798,6 +798,13 @@ static void InitializePredefinedMacros(const TargetInfo &TI, Builder.defineMacro("__ATOMIC_ACQ_REL", "4"); Builder.defineMacro("__ATOMIC_SEQ_CST", "5"); + // Define macros for the clang atomic scopes. + Builder.defineMacro("__MEMORY_SCOPE_SYSTEM", "0"); + Builder.defineMacro("__MEMORY_SCOPE_DEVICE", "1"); + Builder.defineMacro("__MEMORY_SCOPE_WRKGRP", "2"); + Builder.defineMacro("__MEMORY_SCOPE_WVFRNT", "3"); + Builder.defineMacro("__MEMORY_SCOPE_SINGLE", "4"); + jhuber6 wrote: We could, though I might need to think of some better names. It's difficult to cover all the cases people might need. I think that cleanup would best be done in a follow-up patch. https://github.com/llvm/llvm-project/pull/72280 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang] Introduce scoped variants of GNU atomic functions (PR #72280)
@@ -205,6 +220,56 @@ class AtomicScopeHIPModel : public AtomicScopeModel { } }; +/// Defines the generic atomic scope model. +class AtomicScopeGenericModel : public AtomicScopeModel { +public: + /// The enum values match predefined built-in macros __ATOMIC_SCOPE_*. + enum ID { +System = 0, +Device = 1, +Workgroup = 2, +Wavefront = 3, +Single = 4, +Last = Single + }; + + AtomicScopeGenericModel() = default; + + SyncScope map(unsigned S) const override { +switch (static_cast(S)) { +case Device: + return SyncScope::DeviceScope; +case System: + return SyncScope::SystemScope; +case Workgroup: + return SyncScope::WorkgroupScope; +case Wavefront: + return SyncScope::WavefrontScope; +case Single: + return SyncScope::SingleScope; +} +llvm_unreachable("Invalid language sync scope value"); jhuber6 wrote: Mostly just copying the existing code for this, but we have semantic checks to ensure that the value is valid. So there's no chance that a user will actually get to specify anything different from the macros provided. https://github.com/llvm/llvm-project/pull/72280 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang] Introduce scoped variants of GNU atomic functions (PR #72280)
@@ -54,6 +59,16 @@ enum class SyncScope { inline llvm::StringRef getAsString(SyncScope S) { jhuber6 wrote: I think it's because this is for AST printing purposes, while the backend strings vary per target. https://github.com/llvm/llvm-project/pull/72280 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang] Introduce scoped variants of GNU atomic functions (PR #72280)
@@ -904,6 +904,32 @@ BUILTIN(__atomic_signal_fence, "vi", "n") BUILTIN(__atomic_always_lock_free, "bzvCD*", "nE") BUILTIN(__atomic_is_lock_free, "bzvCD*", "nE") +// GNU atomic builtins with atomic scopes. +ATOMIC_BUILTIN(__scoped_atomic_load, "v.", "t") jhuber6 wrote: Naming things is hard, we could do ``` __atomic_scoped_load __scoped_atomic_load __atomic_load_scoped ``` Unsure which is the best. https://github.com/llvm/llvm-project/pull/72280 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang] Introduce scoped variants of GNU atomic functions (PR #72280)
jhuber6 wrote: > Overall I think it is the right way to go. Memory scope has been used by > different offloading languages and the atomic clang builtins are essentially > the same. Adding a generic clang atomic builtins with memory scope allows > code sharing among offloading languages. I agree, I'm hoping to hear something from people more familiar with C/C++ or GNU stuff to see if they agree with this direction. Also it might help to decide on some better names for the memory scopes. https://github.com/llvm/llvm-project/pull/72280 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [openmp] [Clang][OpenMP] Fix ordering of processing of map clauses when mapping a struct. (PR #72410)
https://github.com/jhuber6 commented: This being in clang instead seems like a good change. Are there no CodeGen tests changed? We should add one if so. Probably just take your `libomptarget` test and run `update_cc_test_checks` on it with the arguments found in other test files. https://github.com/llvm/llvm-project/pull/72410 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[llvm] [clang] [LinkerWrapper] Support device binaries in multiple link jobs (PR #72442)
https://github.com/jhuber6 created https://github.com/llvm/llvm-project/pull/72442

Summary:
Currently the linker wrapper strictly assigns a single input binary to a single link job based on its input architecture. This is not sufficient to implement the AMDGPU target ID correctly as this could have many compatible architectures participating in multiple links. This patch introduces the ability to have a single binary input be linked multiple times. For example, given the following, we will now link in the static library where previously we would not.

clang foo.c -fopenmp --offload-arch=gfx90a
llvm-ar rcs libfoo.a foo.o
clang foo.c -fopenmp --offload-arch=gfx90a:xnack+ libfoo.a

This also means that given the following we will link the basic input twice, but that's on the user for providing two versions.

clang foo.c -fopenmp --offload-arch=gfx90a,gfx90a:xnack+

This should allow us to also support a "generic" target in the future for IR without a specific architecture.

This was revived from https://reviews.llvm.org/D152882. The previous issue was that the Windows build failed for unknown reasons. Investigating if that is still the case.

>From ad003f95734af878b14d24d81091618ec58901b5 Mon Sep 17 00:00:00 2001
From: Joseph Huber
Date: Wed, 15 Nov 2023 15:41:32 -0600
Subject: [PATCH] [LinkerWrapper] Support device binaries in multiple link jobs

Summary:
Currently the linker wrapper strictly assigns a single input binary to a single link job based on its input architecture. This is not sufficient to implement the AMDGPU target ID correctly as this could have many compatible architectures participating in multiple links. This patch introduces the ability to have a single binary input be linked multiple times. For example, given the following, we will now link in the static library where previously we would not.

clang foo.c -fopenmp --offload-arch=gfx90a
llvm-ar rcs libfoo.a foo.o
clang foo.c -fopenmp --offload-arch=gfx90a:xnack+ libfoo.a

This also means that given the following we will link the basic input twice, but that's on the user for providing two versions.

clang foo.c -fopenmp --offload-arch=gfx90a,gfx90a:xnack+

This should allow us to also support a "generic" target in the future for IR without a specific architecture.

This was revived from https://reviews.llvm.org/D152882. The previous issue was that the Windows build failed for unknown reasons. Investigating if that is still the case.

--- clang/lib/Driver/ToolChains/Clang.cpp | 4 +- clang/test/Driver/amdgpu-openmp-toolchain.c | 2 +- clang/test/Driver/linker-wrapper.c| 15 +++ .../ClangLinkerWrapper.cpp| 93 +++ .../clang-linker-wrapper/LinkerWrapperOpts.td | 3 + llvm/include/llvm/Object/OffloadBinary.h | 33 +++ 6 files changed, 109 insertions(+), 41 deletions(-) diff --git a/clang/lib/Driver/ToolChains/Clang.cpp b/clang/lib/Driver/ToolChains/Clang.cpp index b462f5a44057d94..b845feb0ef2d9db 100644 --- a/clang/lib/Driver/ToolChains/Clang.cpp +++ b/clang/lib/Driver/ToolChains/Clang.cpp @@ -8692,12 +8692,10 @@ void OffloadPackager::ConstructJob(Compilation &C, const JobAction &JA, } } -// TODO: We need to pass in the full target-id and handle it properly in the -// linker wrapper.
SmallVector Parts{ "file=" + File.str(), "triple=" + TC->getTripleString(), -"arch=" + getProcessorFromTargetID(TC->getTriple(), Arch).str(), +"arch=" + Arch.str(), "kind=" + Kind.str(), }; diff --git a/clang/test/Driver/amdgpu-openmp-toolchain.c b/clang/test/Driver/amdgpu-openmp-toolchain.c index f38486ad073..daa41b216089b2b 100644 --- a/clang/test/Driver/amdgpu-openmp-toolchain.c +++ b/clang/test/Driver/amdgpu-openmp-toolchain.c @@ -65,7 +65,7 @@ // RUN: %clang -### -target x86_64-pc-linux-gnu -fopenmp --offload-arch=gfx90a:sramecc-:xnack+ \ // RUN: -nogpulib %s 2>&1 | FileCheck %s --check-prefix=CHECK-TARGET-ID -// CHECK-TARGET-ID: clang-offload-packager{{.*}}arch=gfx90a,kind=openmp,feature=-sramecc,feature=+xnack +// CHECK-TARGET-ID: clang-offload-packager{{.*}}arch=gfx90a:sramecc-:xnack+,kind=openmp,feature=-sramecc,feature=+xnack // RUN: not %clang -### -target x86_64-pc-linux-gnu -fopenmp --offload-arch=gfx90a,gfx90a:xnack+ \ // RUN: -nogpulib %s 2>&1 | FileCheck %s --check-prefix=CHECK-TARGET-ID-ERROR diff --git a/clang/test/Driver/linker-wrapper.c b/clang/test/Driver/linker-wrapper.c index da7bdc22153ceae..538520be9ac464a 100644 --- a/clang/test/Driver/linker-wrapper.c +++ b/clang/test/Driver/linker-wrapper.c @@ -2,6 +2,9 @@ // REQUIRES: nvptx-registered-target // REQUIRES: amdgpu-registered-target +// An externally visible variable so static libraries extract. +__attribute__((visibility("protected"), used)) int x; + // RUN: %clang -cc1 %s -triple x86_64-unknown-linux-gnu -emit-obj -o %t.elf.o // RUN: %clang -cc1 %s -triple nvptx64-nvidia-cuda -emit-llvm-bc -o %t.nvptx.bc // RU
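To spell out what "compatible" means for the link-job assignment described in this PR, here is a simplified model (illustrative only; this is not the linker wrapper's actual implementation, and the helper below is invented for the example):

```cpp
#include <map>
#include <string>

// Simplified model of AMDGPU target-ID compatibility: the base processor must
// match exactly, and an explicitly set feature (xnack, sramecc) only conflicts
// with the opposite explicit setting; leaving it unset ("any") is compatible
// with either.
using FeatureMap = std::map<std::string, bool>; // e.g. {{"xnack", true}}

static bool isCompatibleTargetID(const std::string &ProcA,
                                 const FeatureMap &FeatsA,
                                 const std::string &ProcB,
                                 const FeatureMap &FeatsB) {
  if (ProcA != ProcB)
    return false;
  for (const auto &[Name, Enabled] : FeatsA) {
    auto It = FeatsB.find(Name);
    if (It != FeatsB.end() && It->second != Enabled)
      return false; // both sides set the feature explicitly and disagree
  }
  return true;
}

// gfx90a (xnack unset) may participate in a gfx90a:xnack+ link:
//   isCompatibleTargetID("gfx90a", {}, "gfx90a", {{"xnack", true}})  -> true
// gfx90a:xnack- may not:
//   isCompatibleTargetID("gfx90a", {{"xnack", false}},
//                        "gfx90a", {{"xnack", true}})                -> false
```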
[llvm] [clang] [LinkerWrapper] Support device binaries in multiple link jobs (PR #72442)
jhuber6 wrote: The Windows builder gives the following error which I don't relieve on Linux. Does anyone have any clue what this `invalid argument` error could be caused by? ``` # note: command had no output on stdout or stderr # error: command failed with exit status: 1 # executed command: 'c:\ws\src\build\bin\filecheck.exe' 'C:\ws\src\clang\test\Driver\linker-wrapper.c' --check-prefix=AMDGPU-LINK-ID # .---command stderr # | C:\ws\src\clang\test\Driver\linker-wrapper.c:52:20: error: AMDGPU-LINK-ID: expected string not found in input # | // AMDGPU-LINK-ID: clang{{.*}} -o {{.*}}.img --target=amdgcn-amd-amdhsa -mcpu=gfx90a -O2 -Wl,--no-undefined {{.*}}.o {{.*}}.o # |^ # | :1:1: note: scanning from here # | c:\ws\src\build\bin\clang-linker-wrapper.exe: error: invalid argument # | ^ # | # | Input file: # | Check file: C:\ws\src\clang\test\Driver\linker-wrapper.c # | # | -dump-input=help explains the following input dump. # | # | Input was: # | << # | 1: c:\ws\src\build\bin\clang-linker-wrapper.exe: error: invalid argument # | check:52 X~ error: no match found # | >> # `- # error: command failed with exit status: 1 ``` https://github.com/llvm/llvm-project/pull/72442 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[llvm] [clang] [LinkerWrapper] Support device binaries in multiple link jobs (PR #72442)
jhuber6 wrote: > the error msg is not generated by offload wrapper itself, right? is it from > some program called by the offload wrapper? It may be caused by the `clang` invocation. Though I'm unsure why this change causes that test to fail. https://github.com/llvm/llvm-project/pull/72442 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[llvm] [clang] [Offloading][NFC] Refactor handling of offloading entries (PR #72544)
https://github.com/jhuber6 created https://github.com/llvm/llvm-project/pull/72544 Summary: This patch is a simple refactoring of code out of the linker wrapper into a common location. The main motivation behind this change is to make it easier to change the handling in the future to accept a triple to be used to emit entries that function on that target. >From 0047be2207b775e6de6dda24751daa933bd66ce5 Mon Sep 17 00:00:00 2001 From: Joseph Huber Date: Thu, 16 Nov 2023 12:05:09 -0600 Subject: [PATCH] [Offloading][NFC] Refactor handling of offloading entries Summary: This patch is a simple refactoring of code out of the linker wrapper into a common location. The main motivation behind this change is to make it easier to change the handling in the future to accept a triple to be used to emit entries that function on that target. --- clang/test/Driver/linker-wrapper-image.c | 38 +++--- .../tools/clang-linker-wrapper/CMakeLists.txt | 1 + .../clang-linker-wrapper/OffloadWrapper.cpp | 116 -- .../llvm/Frontend/Offloading/Utility.h| 11 +- llvm/lib/Frontend/Offloading/Utility.cpp | 31 - 5 files changed, 82 insertions(+), 115 deletions(-) diff --git a/clang/test/Driver/linker-wrapper-image.c b/clang/test/Driver/linker-wrapper-image.c index 83e7db6a49a6bb3..bb641a08bc023d5 100644 --- a/clang/test/Driver/linker-wrapper-image.c +++ b/clang/test/Driver/linker-wrapper-image.c @@ -10,9 +10,9 @@ // RUN: clang-linker-wrapper --print-wrapped-module --dry-run --host-triple=x86_64-unknown-linux-gnu \ // RUN: --linker-path=/usr/bin/ld -- %t.o -o a.out 2>&1 | FileCheck %s --check-prefix=OPENMP -// OPENMP: @__start_omp_offloading_entries = external hidden constant %__tgt_offload_entry -// OPENMP-NEXT: @__stop_omp_offloading_entries = external hidden constant %__tgt_offload_entry -// OPENMP-NEXT: @__dummy.omp_offloading.entry = hidden constant [0 x %__tgt_offload_entry] zeroinitializer, section "omp_offloading_entries" +// OPENMP: @__start_omp_offloading_entries = external hidden constant [0 x %struct.__tgt_offload_entry] +// OPENMP-NEXT: @__stop_omp_offloading_entries = external hidden constant [0 x %struct.__tgt_offload_entry] +// OPENMP-NEXT: @__dummy.omp_offloading_entries = hidden constant [0 x %struct.__tgt_offload_entry] zeroinitializer, section "omp_offloading_entries" // OPENMP-NEXT: @.omp_offloading.device_image = internal unnamed_addr constant [[[SIZE:[0-9]+]] x i8] c"\10\FF\10\AD{{.*}}" // OPENMP-NEXT: @.omp_offloading.device_images = internal unnamed_addr constant [1 x %__tgt_device_image] [%__tgt_device_image { ptr @.omp_offloading.device_image, ptr getelementptr inbounds ([[[SIZE]] x i8], ptr @.omp_offloading.device_image, i64 1, i64 0), ptr @__start_omp_offloading_entries, ptr @__stop_omp_offloading_entries }] // OPENMP-NEXT: @.omp_offloading.descriptor = internal constant %__tgt_bin_desc { i32 1, ptr @.omp_offloading.device_images, ptr @__start_omp_offloading_entries, ptr @__stop_omp_offloading_entries } @@ -39,10 +39,10 @@ // CUDA: @.fatbin_image = internal constant [0 x i8] zeroinitializer, section ".nv_fatbin" // CUDA-NEXT: @.fatbin_wrapper = internal constant %fatbin_wrapper { i32 1180844977, i32 1, ptr @.fatbin_image, ptr null }, section ".nvFatBinSegment", align 8 -// CUDA-NEXT: @__dummy.cuda_offloading.entry = hidden constant [0 x %__tgt_offload_entry] zeroinitializer, section "cuda_offloading_entries" // CUDA-NEXT: @.cuda.binary_handle = internal global ptr null -// CUDA-NEXT: @__start_cuda_offloading_entries = external hidden constant [0 x %__tgt_offload_entry] -// CUDA-NEXT: 
@__stop_cuda_offloading_entries = external hidden constant [0 x %__tgt_offload_entry] +// CUDA-NEXT: @__start_cuda_offloading_entries = external hidden constant [0 x %struct.__tgt_offload_entry] +// CUDA-NEXT: @__stop_cuda_offloading_entries = external hidden constant [0 x %struct.__tgt_offload_entry] +// CUDA-NEXT: @__dummy.cuda_offloading_entries = hidden constant [0 x %struct.__tgt_offload_entry] zeroinitializer, section "cuda_offloading_entries" // CUDA-NEXT: @llvm.global_ctors = appending global [1 x { i32, ptr, ptr }] [{ i32, ptr, ptr } { i32 1, ptr @.cuda.fatbin_reg, ptr null }] // CUDA: define internal void @.cuda.fatbin_reg() section ".text.startup" { @@ -68,13 +68,13 @@ // CUDA: while.entry: // CUDA-NEXT: %entry1 = phi ptr [ @__start_cuda_offloading_entries, %entry ], [ %7, %if.end ] -// CUDA-NEXT: %1 = getelementptr inbounds %__tgt_offload_entry, ptr %entry1, i64 0, i32 0 +// CUDA-NEXT: %1 = getelementptr inbounds %struct.__tgt_offload_entry, ptr %entry1, i64 0, i32 0 // CUDA-NEXT: %addr = load ptr, ptr %1, align 8 -// CUDA-NEXT: %2 = getelementptr inbounds %__tgt_offload_entry, ptr %entry1, i64 0, i32 1 +// CUDA-NEXT: %2 = getelementptr inbounds %struct.__tgt_offload_entry, ptr %entry1, i64 0, i32 1 // CUDA-NEXT: %name = load ptr, ptr %2, align 8 -// CUDA-NEXT: %3 = getelementpt
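For orientation when reading the `%struct.__tgt_offload_entry` checks above, the host-side entry layout is roughly the following (as I recall it; the offloading runtime's headers are the authoritative definition). The registration code walks an array of these between the `__start_`/`__stop_` section symbols:

```cpp
#include <cstddef>
#include <cstdint>

// Sketch of the entry layout assumed by the wrapper code quoted above.
struct __tgt_offload_entry {
  void *addr;    // host address of the kernel stub or device global
  char *name;    // symbol name used to look the entity up in the device image
  size_t size;   // size in bytes for globals, 0 for kernels/functions
  int32_t flags; // entry kind (e.g. ctor/dtor) and extra bits
  int32_t data;  // auxiliary data (e.g. the texture type in the CUDA path)
};
```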
[clang] [AMDGPU] Treat printf as builtin for OpenCL (PR #72554)
https://github.com/jhuber6 edited https://github.com/llvm/llvm-project/pull/72554 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [AMDGPU] Treat printf as builtin for OpenCL (PR #72554)
@@ -2458,6 +2458,11 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID, &getTarget().getLongDoubleFormat() == &llvm::APFloat::IEEEquad()) BuiltinID = mutateLongDoubleBuiltin(BuiltinID); + // Mutate the printf builtin ID so that we use the same CodeGen path for + // HIP and OpenCL with AMDGPU targets. + if (getTarget().getTriple().isAMDGCN() && BuiltinID == AMDGPU::BIprintf) + BuiltinID = Builtin::BIprintf; jhuber6 wrote: I'm very close to landing 'real' printf support in the GPU libc where `printf` is just a regular function call. Will this change the handling for that in any way? I've already had to make the backend pass respect `-fno-builtins` and remove `ockl` from OpenMP to make that possible so I'm hoping we don't end up with a lot more special casing for `printf`. https://github.com/llvm/llvm-project/pull/72554 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [AMDGPU] Treat printf as builtin for OpenCL (PR #72554)
https://github.com/jhuber6 commented: Any tests? Can you explain why it's not sufficient to do this lowering in the AMDGPU pass? https://github.com/llvm/llvm-project/pull/72554 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [AMDGPU] Treat printf as builtin for OpenCL (PR #72554)
@@ -2458,6 +2458,11 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID, &getTarget().getLongDoubleFormat() == &llvm::APFloat::IEEEquad()) BuiltinID = mutateLongDoubleBuiltin(BuiltinID); + // Mutate the printf builtin ID so that we use the same CodeGen path for + // HIP and OpenCL with AMDGPU targets. + if (getTarget().getTriple().isAMDGCN() && BuiltinID == AMDGPU::BIprintf) + BuiltinID = Builtin::BIprintf; jhuber6 wrote: If we do the eager replacement of `printf` that HIP and OpenCL uses currently then it won't be linked in. So users should still be able to link in stuff like `strcmp` or whatever without it interfering. This would require the new driver however, and if they attempted to use something like `fputs` it would segfault because no one initialized the buffer, which isn't a terrible failure mode all things considered. https://github.com/llvm/llvm-project/pull/72554 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang] Introduce scoped variants of GNU atomic functions (PR #72280)
jhuber6 wrote: > I'm a little wary of adding these without actually going through any sort of > standardization process; if other vendors don't support the same interface, > we just have more variations. (See also > https://clang.llvm.org/get_involved.html#criteria ) > > How consistent are the various scopes across targets? How likely is it that > some target will need additional scopes that you haven't specified? I figured we can just treat these as `clang` extensions for the time being. We already have two variants that are more or less redundant for specific use-cases, (OpenCL and HIP), which should be able to be removed after this. Predicting all kinds of scopes is hard. The easy solution is to just number this or something since it's hierarchical. But @Artem-B has already pointed out that Nvidia has a scope between "device" and "blocks". Pretty much every system is going to have a conception of "device" and "system" and "single threaded" however. https://github.com/llvm/llvm-project/pull/72280 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang] Introduce scoped variants of GNU atomic functions (PR #72280)
jhuber6 wrote: > > I figured we can just treat these as clang extensions for the time being. > > We already have two variants that are more or less redundant for specific > > use cases (OpenCL and HIP), which should be removable after this. > > I'm not sure what you mean here. If you mean that users are expected to use > the OpenCL/HIP/etc. standard APIs, and you only expect to use these as part > of language runtimes, then maybe we don't care so much if it's clang-only. They should be available to users, but this level of programming is highly compiler-dependent already, so I don't see it as much different. > It might be worth considering using string literals instead of numbers for > the different scopes. It removes any question of whether the list of scopes > is complete and the order of numbers on the list. And it doesn't require > defining a bunch of new preprocessor macros. The underlying implementation is a string literal in the LLVM `syncscope` argument, but the problem is that this isn't standardized at all and varies between backends potentially. I suppose we could think of this more literally as "target the LLVM `syncscope` argument". I'd like something that's "reasonably" consistent between targets, since a lot of this can be shared as a simple hierarchy. It would be really annoying if each target had to define separate strings for something that's mostly common in concept. https://github.com/llvm/llvm-project/pull/72280 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
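For readers unfamiliar with the `syncscope` argument being referenced: it is just a string registered with the `LLVMContext` and attached to the atomic instruction. A minimal sketch, not part of this patch and assuming AMDGPU's "agent" scope as a stand-in for a "device"-level scope:

```cpp
// Sketch: emit an atomic add restricted to a target-specific sync scope.
// The scope string ("agent") is an assumption for illustration; each backend
// defines and interprets its own set of scope names.
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/Module.h"

using namespace llvm;

Value *emitDeviceScopedAdd(Module &M, IRBuilder<> &Builder, Value *Ptr,
                           Value *Val) {
  SyncScope::ID Agent = M.getContext().getOrInsertSyncScopeID("agent");
  return Builder.CreateAtomicRMW(AtomicRMWInst::Add, Ptr, Val, MaybeAlign(),
                                 AtomicOrdering::SequentiallyConsistent, Agent);
}
```

Because the scope is only a string at the IR level, any frontend-visible naming scheme ultimately has to be translated per backend, which is the consistency problem being discussed.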
[clang] [Clang] Introduce scoped variants of GNU atomic functions (PR #72280)
jhuber6 wrote: > > The underlying implementation is a string literal in the LLVM syncscope > > argument, but the problem is that this isn't standardized at all and varies > > between backends potentially > > We don't have to use the same set of strings as syncscope if that doesn't > make sense. I don't think there's much of a point to making them strings if it's not directly invoking the syncscope name for the backend. Realistically, as long as we give them descriptive names, we can just ignore the ones that don't apply on a given target. Right now, for example, you can use these scoped variants in x64 code but they have no effect. Either that, or we could use logic to fall back to the next hierarchy level that makes sense. As always, naming stuff is hard. https://github.com/llvm/llvm-project/pull/72280 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
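To make the fallback idea concrete, here is a hypothetical sketch; the abstract scope names and the AMDGPU mapping are assumptions for illustration, not something this patch defines. Each target maps the abstract scopes to its own syncscope strings, and a target with no scope support simply widens everything to the system scope.

```cpp
#include <string_view>

// Hypothetical abstract scopes, ordered from narrowest to widest.
enum class MemScope { SingleThread, Wavefront, Workgroup, Device, System };

// Assumed AMDGPU mapping; a scope the target does not distinguish would fall
// back to the next wider level.
std::string_view toAMDGPUSyncScope(MemScope S) {
  switch (S) {
  case MemScope::SingleThread: return "singlethread";
  case MemScope::Wavefront:    return "wavefront";
  case MemScope::Workgroup:    return "workgroup";
  case MemScope::Device:       return "agent";
  case MemScope::System:       return "";        // default (system) scope
  }
  return "";
}

// A target without scope support ignores the request entirely, matching the
// "no effect on x64" behavior described above.
std::string_view toX86SyncScope(MemScope) { return ""; }
```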
[llvm] [clang] [Offloading][NFC] Refactor handling of offloading entries (PR #72544)
https://github.com/jhuber6 closed https://github.com/llvm/llvm-project/pull/72544 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[llvm] [clang] [Offload] Initial support for registering offloading entries on COFF targets (PR #72697)
https://github.com/jhuber6 created https://github.com/llvm/llvm-project/pull/72697 Summary: This patch provides the initial support to allow handling the new driver's offloading entries. Normally, the ELF target can emit varibles at C-identifier named sections and the linker will provide a pointer to the section. For COFF target, instead the linker merges sections containing a `$` in alphabetical order. We thus can emit these variables at sections and then emit two variables that are guaranteed to be sorted before and after the others to traverse it. Previous patches consolidated the handling of offloading entries so that this patch more easily can handle mapping them to the appropriate section. Ideally, the only remaining step to allow the new driver to run on Windows targets is to accurately map the following `ld.lld` arguments to their `llvm-link` equivalents. These are used inside the linker-wrapper, so we should simply need to remap the arguments to the same functionality if possible. ``` -o, -output -l, --library -L, --library-path -v, --version -rpath -whole-archive, -no-whole-archive ``` I have not tested this at runtime as I do not have access to a windows machine. This patch was adapted from some initial efforts in https://reviews.llvm.org/D137470. >From 123a4a069166f3ba84dda479ca590fc4597b7074 Mon Sep 17 00:00:00 2001 From: Joseph Huber Date: Fri, 17 Nov 2023 14:09:59 -0600 Subject: [PATCH] [Offload] Initial support for registering offloading entries on COFF targets Summary: This patch provides the initial support to allow handling the new driver's offloading entries. Normally, the ELF target can emit varibles at C-identifier named sections and the linker will provide a pointer to the section. For COFF target, instead the linker merges sections containing a `$` in alphabetical order. We thus can emit these variables at sections and then emit two variables that are guaranteed to be sorted before and after the others to traverse it. Previous patches consolidated the handling of offloading entries so that this patch more easily can handle mapping them to the appropriate section. Ideally, the only remaining step to allow the new driver to run on Windows targets is to accurately map the following `ld.lld` arguments to their `llvm-link` equivalents. These are used inside the linker-wrapper, so we should simply need to remap the arguments to the same functionality if possible. ``` -o, -output -l, --library -L, --library-path -v, --version -rpath -whole-archive, -no-whole-archive ``` I have not tested this at runtime as I do not have access to a windows machine. This patch was adapted from some initial efforts in https://reviews.llvm.org/D137470. 
--- clang/test/CodeGenCUDA/offloading-entries.cu | 48 +++ clang/test/Driver/linker-wrapper-image.c | 50 ++- .../OpenMP/declare_target_link_codegen.cpp| 8 ++- llvm/lib/Frontend/Offloading/CMakeLists.txt | 1 + llvm/lib/Frontend/Offloading/Utility.cpp | 61 --- 5 files changed, 128 insertions(+), 40 deletions(-) diff --git a/clang/test/CodeGenCUDA/offloading-entries.cu b/clang/test/CodeGenCUDA/offloading-entries.cu index c4f8d2edad0a98e..46235051f1e4f12 100644 --- a/clang/test/CodeGenCUDA/offloading-entries.cu +++ b/clang/test/CodeGenCUDA/offloading-entries.cu @@ -5,6 +5,12 @@ // RUN: %clang_cc1 -std=c++11 -triple x86_64-unknown-linux-gnu -fgpu-rdc \ // RUN: --offload-new-driver -emit-llvm -o - -x hip %s | FileCheck \ // RUN: --check-prefix=HIP %s +// RUN: %clang_cc1 -std=c++11 -triple x86_64-unknown-windows-gnu -fgpu-rdc \ +// RUN: --offload-new-driver -emit-llvm -o - -x cuda %s | FileCheck \ +// RUN: --check-prefix=CUDA-COFF %s +// RUN: %clang_cc1 -std=c++11 -triple x86_64-unknown-windows-gnu -fgpu-rdc \ +// RUN: --offload-new-driver -emit-llvm -o - -x hip %s | FileCheck \ +// RUN: --check-prefix=HIP-COFF %s #include "Inputs/cuda.h" @@ -23,6 +29,20 @@ // HIP: @.omp_offloading.entry_name.2 = internal unnamed_addr constant [2 x i8] c"x\00" // HIP: @.omp_offloading.entry.x = weak constant %struct.__tgt_offload_entry { ptr @x, ptr @.omp_offloading.entry_name.2, i64 4, i32 0, i32 0 }, section "hip_offloading_entries", align 1 //. +// CUDA-COFF: @.omp_offloading.entry_name = internal unnamed_addr constant [8 x i8] c"_Z3foov\00" +// CUDA-COFF: @.omp_offloading.entry._Z3foov = weak constant %struct.__tgt_offload_entry { ptr @_Z18__device_stub__foov, ptr @.omp_offloading.entry_name, i64 0, i32 0, i32 0 }, section "cuda_offloading_entries$OE", align 1 +// CUDA-COFF: @.omp_offloading.entry_name.1 = internal unnamed_addr constant [8 x i8] c"_Z3barv\00" +// CUDA-COFF: @.omp_offloading.entry._Z3barv = weak constant %struct.__tgt_offload_entry { ptr @_Z18__device_stub__barv, ptr @.omp_offloading.entry_name.1, i64 0, i32 0, i32 0 }, section "cuda_offloading_entries$OE", align 1 +// CUDA-COFF: @.omp_offloading.entry_name.2 = internal unnamed_addr constant [2 x i8
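A minimal sketch of the section trick described above; the suffixes, symbol names, and linkage here are illustrative assumptions rather than exactly what the patch emits. Entries go into `<section>$OE`, and two zero-length arrays placed in `<section>$OA` and `<section>$OZ` are sorted before and after them by the COFF linker, giving the runtime a begin/end pair to walk.

```cpp
#include "llvm/ADT/Twine.h"
#include "llvm/IR/Constants.h"
#include "llvm/IR/DerivedTypes.h"
#include "llvm/IR/GlobalVariable.h"
#include "llvm/IR/Module.h"
#include <utility>

using namespace llvm;

// Emit begin/end sentinels that bracket the "$OE" entries on COFF targets.
std::pair<GlobalVariable *, GlobalVariable *>
emitCOFFEntrySentinels(Module &M, StructType *EntryTy, StringRef Section) {
  auto *ArrTy = ArrayType::get(EntryTy, 0);
  auto *ZeroInit = ConstantAggregateZero::get(ArrTy);

  auto *Begin = new GlobalVariable(M, ArrTy, /*isConstant=*/true,
                                   GlobalValue::WeakAnyLinkage, ZeroInit,
                                   "__start_" + Section);
  Begin->setSection((Section + "$OA").str()); // sorts before "$OE"

  auto *End = new GlobalVariable(M, ArrTy, /*isConstant=*/true,
                                 GlobalValue::WeakAnyLinkage, ZeroInit,
                                 "__stop_" + Section);
  End->setSection((Section + "$OZ").str());   // sorts after "$OE"

  return {Begin, End};
}
```

On ELF the equivalent begin/end pointers come for free from the linker-defined `__start_<section>`/`__stop_<section>` symbols, which is why only the COFF path needs explicit sentinels.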
[clang] [LinkerWrapper] Accept some needed COFF linker arguments (PR #72889)
https://github.com/jhuber6 created https://github.com/llvm/llvm-project/pull/72889 Summary: The linker wrapper is a utility used to create offloading programs from single-source offloading languages such as OpenMP or CUDA. This is done by embedding device code into the host object, then feeding it into the linker wrapper which extracts the accelerator object files, links them, then wraps them in registration code for the target runtime. This previously has only worked in Linux / ELF platforms. This patch attempts to hand Windows / COFF inputs by also accepting COFF forms of certain linker arguments we use internally. The important arguments are library search paths, so we can identify libraries which may contain device code, libraries themselves, and the output name used for intermediate output. I am not intimately familiar with the semantics here for the semantics in how a `lib` file is earched. I am simply treating `foo.lib` as the GNU equivalent `-l:foo.lib` in the search logic. Similarly, I am assuming that static libraries will be llvm-ar style libraries. I will need to investigate the actual deficiencies later, but this should be a good starting point along with https://github.com/llvm/llvm-project/pull/72697 >From d06171561581d9d15c14f756c8999b478e1d769e Mon Sep 17 00:00:00 2001 From: Joseph Huber Date: Mon, 20 Nov 2023 10:12:04 -0600 Subject: [PATCH] [LinkerWrapper] Accenp some neede COFF linker argument Summary: The linker wrapper is a utility used to create offloading programs from single-source offloading languages such as OpenMP or CUDA. This is done by embedding device code into the host object, then feeding it into the linker wrapper which extracts the accelerator object files, links them, then wraps them in registration code for the target runtime. This previously has only worked in Linux / ELF platforms. This patch attempts to hand Windows / COFF inputs by also accepting COFF forms of certain linker arguments we use internally. The important arguments are library search paths, so we can identify libraries which may contain device code, libraries themselves, and the output name used for intermediate output. I am not intimately familiar with the semantics here for the semantics in how a `lib` file is earched. I am simply treating `foo.lib` as the GNU equivalent `-l:foo.lib` in the search logic. Similarly, I am assuming that static libraries will be llvm-ar style libraries. 
I will need to investigate the actual deficiencies later, but this should be a good starting point along with https://github.com/llvm/llvm-project/pull/72697 --- clang/test/Driver/linker-wrapper.c | 8 .../ClangLinkerWrapper.cpp | 18 +- .../clang-linker-wrapper/LinkerWrapperOpts.td | 5 + 3 files changed, 26 insertions(+), 5 deletions(-) diff --git a/clang/test/Driver/linker-wrapper.c b/clang/test/Driver/linker-wrapper.c index da7bdc22153ceae..e82febd61823102 100644 --- a/clang/test/Driver/linker-wrapper.c +++ b/clang/test/Driver/linker-wrapper.c @@ -140,3 +140,11 @@ // RUN: --linker-path=/usr/bin/ld -- %t.o -o a.out 2>&1 | FileCheck %s --check-prefix=CLANG-BACKEND // CLANG-BACKEND: clang{{.*}} -o {{.*}}.img --target=amdgcn-amd-amdhsa -mcpu=gfx908 -O2 -Wl,--no-undefined {{.*}}.bc + +// RUN: clang-offload-packager -o %t.out \ +// RUN: --image=file=%t.elf.o,kind=openmp,triple=nvptx64-nvidia-cuda,arch=sm_70 +// RUN: %clang -cc1 %s -triple x86_64-unknown-windows-msvc -emit-obj -o %t.o -fembed-offload-object=%t.out +// RUN: clang-linker-wrapper --host-triple=x86_64-unknown-windows-msvc --dry-run \ +// RUN: --linker-path=/usr/bin/lld-link -- %t.o -libpath:./ -out:a.exe 2>&1 | FileCheck %s --check-prefix=COFF + +// COFF: "/usr/bin/lld-link" {{.*}}.o -libpath:./ -out:a.exe {{.*}}openmp.image.wrapper{{.*}} diff --git a/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp b/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp index bafe8ace60d1cea..db0ce3e2a190192 100644 --- a/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp +++ b/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp @@ -254,7 +254,7 @@ Error runLinker(ArrayRef Files, const ArgList &Args) { continue; Arg->render(Args, NewLinkerArgs); -if (Arg->getOption().matches(OPT_o)) +if (Arg->getOption().matches(OPT_o) || Arg->getOption().matches(OPT_out)) llvm::transform(Files, std::back_inserter(NewLinkerArgs), [&](StringRef Arg) { return Args.MakeArgString(Arg); }); } @@ -1188,7 +1188,7 @@ searchLibraryBaseName(StringRef Name, StringRef Root, /// `-lfoo` or `-l:libfoo.a`. std::optional searchLibrary(StringRef Input, StringRef Root, ArrayRef SearchPaths) { - if (Input.startswith(":")) + if (Input.startswith(":") || Input.ends_with(".lib")) return findFromSearchPaths(Input.drop_front(), Root, SearchPaths); return searchLibraryBaseName(Input, Root, SearchPath
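To illustrate the search rule mentioned in the summary, here is a sketch under stated assumptions rather than the patch's actual helper: a bare `foo.lib` input is treated like the GNU-style `-l:foo.lib`, i.e. looked up by its exact file name in each library search path.

```cpp
#include <filesystem>
#include <optional>
#include <string>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Resolve a library input: "-l:name" (with the leading ':' stripped) and
// "name.lib" both mean "find this exact file name in a search directory".
std::optional<fs::path> findLibrary(std::string Input,
                                    const std::vector<fs::path> &SearchPaths) {
  if (!Input.empty() && Input.front() == ':')
    Input.erase(Input.begin());
  for (const fs::path &Dir : SearchPaths) {
    fs::path Candidate = Dir / Input;
    std::error_code EC;
    if (fs::exists(Candidate, EC))
      return Candidate;
  }
  return std::nullopt;
}
```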
[clang] 88b672b - [libc] Adjust headers for some implementations of 'stdio.h'
Author: Joseph Huber Date: 2023-11-20T11:22:59-06:00 New Revision: 88b672b0a79e9f68253abf7edcfa5a42d1321cae URL: https://github.com/llvm/llvm-project/commit/88b672b0a79e9f68253abf7edcfa5a42d1321cae DIFF: https://github.com/llvm/llvm-project/commit/88b672b0a79e9f68253abf7edcfa5a42d1321cae.diff LOG: [libc] Adjust headers for some implementations of 'stdio.h' Summary: This is sometimes a macro, undefine it so we can declare it as the GPU needs. Added: Modified: clang/lib/Headers/llvm_libc_wrappers/stdio.h Removed: diff --git a/clang/lib/Headers/llvm_libc_wrappers/stdio.h b/clang/lib/Headers/llvm_libc_wrappers/stdio.h index 51b0f0e3307772c..0870f3e741ec135 100644 --- a/clang/lib/Headers/llvm_libc_wrappers/stdio.h +++ b/clang/lib/Headers/llvm_libc_wrappers/stdio.h @@ -21,6 +21,17 @@ #define __LIBC_ATTRS __attribute__((device)) #endif +// Some headers provide these as macros. Temporarily undefine them so they do +// not conflict with any definitions for the GPU. + +#pragma push_macro("stdout") +#pragma push_macro("stdin") +#pragma push_macro("stderr") + +#undef stdout +#undef stderr +#undef stdin + #pragma omp begin declare target #include @@ -29,6 +40,13 @@ #undef __LIBC_ATTRS +// Restore the original macros when compiling on the host. +#if !defined(__NVPTX__) && !defined(__AMDGPU__) +#pragma pop_macro("stdout") +#pragma pop_macro("stderr") +#pragma pop_macro("stdin") +#endif + #endif #endif // __CLANG_LLVM_LIBC_WRAPPERS_STDIO_H__ ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [LinkerWrapper] Accept some needed COFF linker arguments (PR #72889)
https://github.com/jhuber6 edited https://github.com/llvm/llvm-project/pull/72889 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[llvm] [clang] [Offload] Initial support for registering offloading entries on COFF targets (PR #72697)
@@ -62,35 +63,51 @@ void offloading::emitOffloadingEntry(Module &M, Constant *Addr, StringRef Name, M.getDataLayout().getDefaultGlobalsAddressSpace()); // The entry has to be created in the section the linker expects it to be. - Entry->setSection(SectionName); + if (Triple.isOSBinFormatCOFF()) +Entry->setSection((SectionName + "$OE").str()); + else +Entry->setSection(SectionName); Entry->setAlignment(Align(1)); } std::pair offloading::getOffloadEntryArray(Module &M, StringRef SectionName) { - auto *EntriesB = - new GlobalVariable(M, ArrayType::get(getEntryTy(M), 0), - /*isConstant=*/true, GlobalValue::ExternalLinkage, - /*Initializer=*/nullptr, "__start_" + SectionName); + llvm::Triple Triple(M.getTargetTriple()); + + auto *ZeroInitilaizer = + ConstantAggregateZero::get(ArrayType::get(getEntryTy(M), 0u)); + auto *EntryInit = Triple.isOSBinFormatCOFF() ? ZeroInitilaizer : nullptr; + auto *EntryType = Triple.isOSBinFormatCOFF() +? ZeroInitilaizer->getType() +: ArrayType::get(getEntryTy(M), 0); jhuber6 wrote: One is a `EntryTy` the other is a `EntryTy[]` https://github.com/llvm/llvm-project/pull/72697 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[llvm] [clang] [Offload] Initial support for registering offloading entries on COFF targets (PR #72697)
https://github.com/jhuber6 updated https://github.com/llvm/llvm-project/pull/72697 >From ef4e04961a1f553a9f1dced26e69e927060d4dd7 Mon Sep 17 00:00:00 2001 From: Joseph Huber Date: Fri, 17 Nov 2023 14:09:59 -0600 Subject: [PATCH] [Offload] Initial support for registering offloading entries on COFF targets Summary: This patch provides the initial support to allow handling the new driver's offloading entries. Normally, the ELF target can emit varibles at C-identifier named sections and the linker will provide a pointer to the section. For COFF target, instead the linker merges sections containing a `$` in alphabetical order. We thus can emit these variables at sections and then emit two variables that are guaranteed to be sorted before and after the others to traverse it. Previous patches consolidated the handling of offloading entries so that this patch more easily can handle mapping them to the appropriate section. Ideally, the only remaining step to allow the new driver to run on Windows targets is to accurately map the following `ld.lld` arguments to their `llvm-link` equivalents. These are used inside the linker-wrapper, so we should simply need to remap the arguments to the same functionality if possible. ``` -o, -output -l, --library -L, --library-path -v, --version -rpath -whole-archive, -no-whole-archive ``` I have not tested this at runtime as I do not have access to a windows machine. This patch was adapted from some initial efforts in https://reviews.llvm.org/D137470. --- clang/test/CodeGenCUDA/offloading-entries.cu | 48 + clang/test/Driver/linker-wrapper-image.c | 50 + .../OpenMP/declare_target_link_codegen.cpp| 8 ++- llvm/lib/Frontend/Offloading/CMakeLists.txt | 1 + llvm/lib/Frontend/Offloading/Utility.cpp | 70 --- 5 files changed, 132 insertions(+), 45 deletions(-) diff --git a/clang/test/CodeGenCUDA/offloading-entries.cu b/clang/test/CodeGenCUDA/offloading-entries.cu index c4f8d2edad0a98e..46235051f1e4f12 100644 --- a/clang/test/CodeGenCUDA/offloading-entries.cu +++ b/clang/test/CodeGenCUDA/offloading-entries.cu @@ -5,6 +5,12 @@ // RUN: %clang_cc1 -std=c++11 -triple x86_64-unknown-linux-gnu -fgpu-rdc \ // RUN: --offload-new-driver -emit-llvm -o - -x hip %s | FileCheck \ // RUN: --check-prefix=HIP %s +// RUN: %clang_cc1 -std=c++11 -triple x86_64-unknown-windows-gnu -fgpu-rdc \ +// RUN: --offload-new-driver -emit-llvm -o - -x cuda %s | FileCheck \ +// RUN: --check-prefix=CUDA-COFF %s +// RUN: %clang_cc1 -std=c++11 -triple x86_64-unknown-windows-gnu -fgpu-rdc \ +// RUN: --offload-new-driver -emit-llvm -o - -x hip %s | FileCheck \ +// RUN: --check-prefix=HIP-COFF %s #include "Inputs/cuda.h" @@ -23,6 +29,20 @@ // HIP: @.omp_offloading.entry_name.2 = internal unnamed_addr constant [2 x i8] c"x\00" // HIP: @.omp_offloading.entry.x = weak constant %struct.__tgt_offload_entry { ptr @x, ptr @.omp_offloading.entry_name.2, i64 4, i32 0, i32 0 }, section "hip_offloading_entries", align 1 //. 
+// CUDA-COFF: @.omp_offloading.entry_name = internal unnamed_addr constant [8 x i8] c"_Z3foov\00" +// CUDA-COFF: @.omp_offloading.entry._Z3foov = weak constant %struct.__tgt_offload_entry { ptr @_Z18__device_stub__foov, ptr @.omp_offloading.entry_name, i64 0, i32 0, i32 0 }, section "cuda_offloading_entries$OE", align 1 +// CUDA-COFF: @.omp_offloading.entry_name.1 = internal unnamed_addr constant [8 x i8] c"_Z3barv\00" +// CUDA-COFF: @.omp_offloading.entry._Z3barv = weak constant %struct.__tgt_offload_entry { ptr @_Z18__device_stub__barv, ptr @.omp_offloading.entry_name.1, i64 0, i32 0, i32 0 }, section "cuda_offloading_entries$OE", align 1 +// CUDA-COFF: @.omp_offloading.entry_name.2 = internal unnamed_addr constant [2 x i8] c"x\00" +// CUDA-COFF: @.omp_offloading.entry.x = weak constant %struct.__tgt_offload_entry { ptr @x, ptr @.omp_offloading.entry_name.2, i64 4, i32 0, i32 0 }, section "cuda_offloading_entries$OE", align 1 +//. +// HIP-COFF: @.omp_offloading.entry_name = internal unnamed_addr constant [8 x i8] c"_Z3foov\00" +// HIP-COFF: @.omp_offloading.entry._Z3foov = weak constant %struct.__tgt_offload_entry { ptr @_Z3foov, ptr @.omp_offloading.entry_name, i64 0, i32 0, i32 0 }, section "hip_offloading_entries$OE", align 1 +// HIP-COFF: @.omp_offloading.entry_name.1 = internal unnamed_addr constant [8 x i8] c"_Z3barv\00" +// HIP-COFF: @.omp_offloading.entry._Z3barv = weak constant %struct.__tgt_offload_entry { ptr @_Z3barv, ptr @.omp_offloading.entry_name.1, i64 0, i32 0, i32 0 }, section "hip_offloading_entries$OE", align 1 +// HIP-COFF: @.omp_offloading.entry_name.2 = internal unnamed_addr constant [2 x i8] c"x\00" +// HIP-COFF: @.omp_offloading.entry.x = weak constant %struct.__tgt_offload_entry { ptr @x, ptr @.omp_offloading.entry_name.2, i64 4, i32 0, i32 0 }, section "hip_offloading_entries$OE", align 1 +//. // CUDA-LABEL: @_Z18__device_stub__foov( // CUDA-NEXT: entry: // CUDA-
[llvm] [clang] [Offload] Initial support for registering offloading entries on COFF targets (PR #72697)
@@ -62,35 +63,51 @@ void offloading::emitOffloadingEntry(Module &M, Constant *Addr, StringRef Name, M.getDataLayout().getDefaultGlobalsAddressSpace()); // The entry has to be created in the section the linker expects it to be. - Entry->setSection(SectionName); + if (Triple.isOSBinFormatCOFF()) +Entry->setSection((SectionName + "$OE").str()); + else +Entry->setSection(SectionName); Entry->setAlignment(Align(1)); } std::pair offloading::getOffloadEntryArray(Module &M, StringRef SectionName) { - auto *EntriesB = - new GlobalVariable(M, ArrayType::get(getEntryTy(M), 0), - /*isConstant=*/true, GlobalValue::ExternalLinkage, - /*Initializer=*/nullptr, "__start_" + SectionName); + llvm::Triple Triple(M.getTargetTriple()); + + auto *ZeroInitilaizer = + ConstantAggregateZero::get(ArrayType::get(getEntryTy(M), 0u)); + auto *EntryInit = Triple.isOSBinFormatCOFF() ? ZeroInitilaizer : nullptr; + auto *EntryType = Triple.isOSBinFormatCOFF() +? ZeroInitilaizer->getType() +: ArrayType::get(getEntryTy(M), 0); jhuber6 wrote: Actually you're right, forgot the initializer used the array type as well. https://github.com/llvm/llvm-project/pull/72697 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
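For anyone following that exchange, a tiny standalone illustration (not patch code): a `ConstantAggregateZero` built from the zero-length array type carries the array type itself, so it can serve as both the initializer and the type of the sentinel globals.

```cpp
#include "llvm/IR/Constants.h"
#include "llvm/IR/DerivedTypes.h"

using namespace llvm;

// The zeroinitializer's type is [0 x EntryTy], not EntryTy itself.
bool initializerHasArrayType(StructType *EntryTy) {
  ArrayType *ArrTy = ArrayType::get(EntryTy, 0);
  Constant *Zero = ConstantAggregateZero::get(ArrTy);
  return Zero->getType() == ArrTy; // always true
}
```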
[llvm] [clang] [Offload] Initial support for registering offloading entries on COFF targets (PR #72697)
https://github.com/jhuber6 updated https://github.com/llvm/llvm-project/pull/72697 >From e3b6ab18f390e0ee4938095717aa9e4b21690aa7 Mon Sep 17 00:00:00 2001 From: Joseph Huber Date: Fri, 17 Nov 2023 14:09:59 -0600 Subject: [PATCH] [Offload] Initial support for registering offloading entries on COFF targets Summary: This patch provides the initial support to allow handling the new driver's offloading entries. Normally, the ELF target can emit varibles at C-identifier named sections and the linker will provide a pointer to the section. For COFF target, instead the linker merges sections containing a `$` in alphabetical order. We thus can emit these variables at sections and then emit two variables that are guaranteed to be sorted before and after the others to traverse it. Previous patches consolidated the handling of offloading entries so that this patch more easily can handle mapping them to the appropriate section. Ideally, the only remaining step to allow the new driver to run on Windows targets is to accurately map the following `ld.lld` arguments to their `llvm-link` equivalents. These are used inside the linker-wrapper, so we should simply need to remap the arguments to the same functionality if possible. ``` -o, -output -l, --library -L, --library-path -v, --version -rpath -whole-archive, -no-whole-archive ``` I have not tested this at runtime as I do not have access to a windows machine. This patch was adapted from some initial efforts in https://reviews.llvm.org/D137470. --- clang/test/CodeGenCUDA/offloading-entries.cu | 48 + clang/test/Driver/linker-wrapper-image.c | 50 ++ .../OpenMP/declare_target_link_codegen.cpp| 8 ++- llvm/lib/Frontend/Offloading/CMakeLists.txt | 1 + llvm/lib/Frontend/Offloading/Utility.cpp | 68 +++ 5 files changed, 130 insertions(+), 45 deletions(-) diff --git a/clang/test/CodeGenCUDA/offloading-entries.cu b/clang/test/CodeGenCUDA/offloading-entries.cu index c4f8d2edad0a98e..46235051f1e4f12 100644 --- a/clang/test/CodeGenCUDA/offloading-entries.cu +++ b/clang/test/CodeGenCUDA/offloading-entries.cu @@ -5,6 +5,12 @@ // RUN: %clang_cc1 -std=c++11 -triple x86_64-unknown-linux-gnu -fgpu-rdc \ // RUN: --offload-new-driver -emit-llvm -o - -x hip %s | FileCheck \ // RUN: --check-prefix=HIP %s +// RUN: %clang_cc1 -std=c++11 -triple x86_64-unknown-windows-gnu -fgpu-rdc \ +// RUN: --offload-new-driver -emit-llvm -o - -x cuda %s | FileCheck \ +// RUN: --check-prefix=CUDA-COFF %s +// RUN: %clang_cc1 -std=c++11 -triple x86_64-unknown-windows-gnu -fgpu-rdc \ +// RUN: --offload-new-driver -emit-llvm -o - -x hip %s | FileCheck \ +// RUN: --check-prefix=HIP-COFF %s #include "Inputs/cuda.h" @@ -23,6 +29,20 @@ // HIP: @.omp_offloading.entry_name.2 = internal unnamed_addr constant [2 x i8] c"x\00" // HIP: @.omp_offloading.entry.x = weak constant %struct.__tgt_offload_entry { ptr @x, ptr @.omp_offloading.entry_name.2, i64 4, i32 0, i32 0 }, section "hip_offloading_entries", align 1 //. 
+// CUDA-COFF: @.omp_offloading.entry_name = internal unnamed_addr constant [8 x i8] c"_Z3foov\00" +// CUDA-COFF: @.omp_offloading.entry._Z3foov = weak constant %struct.__tgt_offload_entry { ptr @_Z18__device_stub__foov, ptr @.omp_offloading.entry_name, i64 0, i32 0, i32 0 }, section "cuda_offloading_entries$OE", align 1 +// CUDA-COFF: @.omp_offloading.entry_name.1 = internal unnamed_addr constant [8 x i8] c"_Z3barv\00" +// CUDA-COFF: @.omp_offloading.entry._Z3barv = weak constant %struct.__tgt_offload_entry { ptr @_Z18__device_stub__barv, ptr @.omp_offloading.entry_name.1, i64 0, i32 0, i32 0 }, section "cuda_offloading_entries$OE", align 1 +// CUDA-COFF: @.omp_offloading.entry_name.2 = internal unnamed_addr constant [2 x i8] c"x\00" +// CUDA-COFF: @.omp_offloading.entry.x = weak constant %struct.__tgt_offload_entry { ptr @x, ptr @.omp_offloading.entry_name.2, i64 4, i32 0, i32 0 }, section "cuda_offloading_entries$OE", align 1 +//. +// HIP-COFF: @.omp_offloading.entry_name = internal unnamed_addr constant [8 x i8] c"_Z3foov\00" +// HIP-COFF: @.omp_offloading.entry._Z3foov = weak constant %struct.__tgt_offload_entry { ptr @_Z3foov, ptr @.omp_offloading.entry_name, i64 0, i32 0, i32 0 }, section "hip_offloading_entries$OE", align 1 +// HIP-COFF: @.omp_offloading.entry_name.1 = internal unnamed_addr constant [8 x i8] c"_Z3barv\00" +// HIP-COFF: @.omp_offloading.entry._Z3barv = weak constant %struct.__tgt_offload_entry { ptr @_Z3barv, ptr @.omp_offloading.entry_name.1, i64 0, i32 0, i32 0 }, section "hip_offloading_entries$OE", align 1 +// HIP-COFF: @.omp_offloading.entry_name.2 = internal unnamed_addr constant [2 x i8] c"x\00" +// HIP-COFF: @.omp_offloading.entry.x = weak constant %struct.__tgt_offload_entry { ptr @x, ptr @.omp_offloading.entry_name.2, i64 4, i32 0, i32 0 }, section "hip_offloading_entries$OE", align 1 +//. // CUDA-LABEL: @_Z18__device_stub__foov( // CUDA-NEXT: entry: // CUDA
[clang] [llvm] [Offload] Initial support for registering offloading entries on COFF targets (PR #72697)
https://github.com/jhuber6 updated https://github.com/llvm/llvm-project/pull/72697 >From 4627ea74d753eb6742051127e0a5b0c64a620f20 Mon Sep 17 00:00:00 2001 From: Joseph Huber Date: Fri, 17 Nov 2023 14:09:59 -0600 Subject: [PATCH] [Offload] Initial support for registering offloading entries on COFF targets Summary: This patch provides the initial support to allow handling the new driver's offloading entries. Normally, the ELF target can emit varibles at C-identifier named sections and the linker will provide a pointer to the section. For COFF target, instead the linker merges sections containing a `$` in alphabetical order. We thus can emit these variables at sections and then emit two variables that are guaranteed to be sorted before and after the others to traverse it. Previous patches consolidated the handling of offloading entries so that this patch more easily can handle mapping them to the appropriate section. Ideally, the only remaining step to allow the new driver to run on Windows targets is to accurately map the following `ld.lld` arguments to their `llvm-link` equivalents. These are used inside the linker-wrapper, so we should simply need to remap the arguments to the same functionality if possible. ``` -o, -output -l, --library -L, --library-path -v, --version -rpath -whole-archive, -no-whole-archive ``` I have not tested this at runtime as I do not have access to a windows machine. This patch was adapted from some initial efforts in https://reviews.llvm.org/D137470. --- clang/test/CodeGenCUDA/offloading-entries.cu | 48 + clang/test/Driver/linker-wrapper-image.c | 50 ++ .../OpenMP/declare_target_link_codegen.cpp| 8 ++- llvm/lib/Frontend/Offloading/CMakeLists.txt | 1 + llvm/lib/Frontend/Offloading/Utility.cpp | 68 +++ 5 files changed, 130 insertions(+), 45 deletions(-) diff --git a/clang/test/CodeGenCUDA/offloading-entries.cu b/clang/test/CodeGenCUDA/offloading-entries.cu index c4f8d2edad0a98e..46235051f1e4f12 100644 --- a/clang/test/CodeGenCUDA/offloading-entries.cu +++ b/clang/test/CodeGenCUDA/offloading-entries.cu @@ -5,6 +5,12 @@ // RUN: %clang_cc1 -std=c++11 -triple x86_64-unknown-linux-gnu -fgpu-rdc \ // RUN: --offload-new-driver -emit-llvm -o - -x hip %s | FileCheck \ // RUN: --check-prefix=HIP %s +// RUN: %clang_cc1 -std=c++11 -triple x86_64-unknown-windows-gnu -fgpu-rdc \ +// RUN: --offload-new-driver -emit-llvm -o - -x cuda %s | FileCheck \ +// RUN: --check-prefix=CUDA-COFF %s +// RUN: %clang_cc1 -std=c++11 -triple x86_64-unknown-windows-gnu -fgpu-rdc \ +// RUN: --offload-new-driver -emit-llvm -o - -x hip %s | FileCheck \ +// RUN: --check-prefix=HIP-COFF %s #include "Inputs/cuda.h" @@ -23,6 +29,20 @@ // HIP: @.omp_offloading.entry_name.2 = internal unnamed_addr constant [2 x i8] c"x\00" // HIP: @.omp_offloading.entry.x = weak constant %struct.__tgt_offload_entry { ptr @x, ptr @.omp_offloading.entry_name.2, i64 4, i32 0, i32 0 }, section "hip_offloading_entries", align 1 //. 
+// CUDA-COFF: @.omp_offloading.entry_name = internal unnamed_addr constant [8 x i8] c"_Z3foov\00" +// CUDA-COFF: @.omp_offloading.entry._Z3foov = weak constant %struct.__tgt_offload_entry { ptr @_Z18__device_stub__foov, ptr @.omp_offloading.entry_name, i64 0, i32 0, i32 0 }, section "cuda_offloading_entries$OE", align 1 +// CUDA-COFF: @.omp_offloading.entry_name.1 = internal unnamed_addr constant [8 x i8] c"_Z3barv\00" +// CUDA-COFF: @.omp_offloading.entry._Z3barv = weak constant %struct.__tgt_offload_entry { ptr @_Z18__device_stub__barv, ptr @.omp_offloading.entry_name.1, i64 0, i32 0, i32 0 }, section "cuda_offloading_entries$OE", align 1 +// CUDA-COFF: @.omp_offloading.entry_name.2 = internal unnamed_addr constant [2 x i8] c"x\00" +// CUDA-COFF: @.omp_offloading.entry.x = weak constant %struct.__tgt_offload_entry { ptr @x, ptr @.omp_offloading.entry_name.2, i64 4, i32 0, i32 0 }, section "cuda_offloading_entries$OE", align 1 +//. +// HIP-COFF: @.omp_offloading.entry_name = internal unnamed_addr constant [8 x i8] c"_Z3foov\00" +// HIP-COFF: @.omp_offloading.entry._Z3foov = weak constant %struct.__tgt_offload_entry { ptr @_Z3foov, ptr @.omp_offloading.entry_name, i64 0, i32 0, i32 0 }, section "hip_offloading_entries$OE", align 1 +// HIP-COFF: @.omp_offloading.entry_name.1 = internal unnamed_addr constant [8 x i8] c"_Z3barv\00" +// HIP-COFF: @.omp_offloading.entry._Z3barv = weak constant %struct.__tgt_offload_entry { ptr @_Z3barv, ptr @.omp_offloading.entry_name.1, i64 0, i32 0, i32 0 }, section "hip_offloading_entries$OE", align 1 +// HIP-COFF: @.omp_offloading.entry_name.2 = internal unnamed_addr constant [2 x i8] c"x\00" +// HIP-COFF: @.omp_offloading.entry.x = weak constant %struct.__tgt_offload_entry { ptr @x, ptr @.omp_offloading.entry_name.2, i64 4, i32 0, i32 0 }, section "hip_offloading_entries$OE", align 1 +//. // CUDA-LABEL: @_Z18__device_stub__foov( // CUDA-NEXT: entry: // CUDA
[llvm] [clang] [Offload] Initial support for registering offloading entries on COFF targets (PR #72697)
@@ -62,35 +63,51 @@ void offloading::emitOffloadingEntry(Module &M, Constant *Addr, StringRef Name, M.getDataLayout().getDefaultGlobalsAddressSpace()); // The entry has to be created in the section the linker expects it to be. - Entry->setSection(SectionName); + if (Triple.isOSBinFormatCOFF()) +Entry->setSection((SectionName + "$OE").str()); + else +Entry->setSection(SectionName); Entry->setAlignment(Align(1)); } std::pair offloading::getOffloadEntryArray(Module &M, StringRef SectionName) { - auto *EntriesB = - new GlobalVariable(M, ArrayType::get(getEntryTy(M), 0), - /*isConstant=*/true, GlobalValue::ExternalLinkage, - /*Initializer=*/nullptr, "__start_" + SectionName); + llvm::Triple Triple(M.getTargetTriple()); jhuber6 wrote: I fixed the other occurrences, these should stay to separate the type as I prefer `llvm::Triple Triple` over `Triple TheTriple` or similar. https://github.com/llvm/llvm-project/pull/72697 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [LinkerWrapper] Accept some needed COFF linker arguments (PR #72889)
@@ -126,3 +126,8 @@ def version : Flag<["--", "-"], "version">, Flags<[HelpHidden]>, Alias; def whole_archive : Flag<["--", "-"], "whole-archive">, Flags<[HelpHidden]>; def no_whole_archive : Flag<["--", "-"], "no-whole-archive">, Flags<[HelpHidden]>; + +// COFF-style linker options. +def out : Joined<["/", "-", "/?", "-?"], "out:">, Flags<[HelpHidden]>; jhuber6 wrote: I copied this from the COFF implementation of `lld-link` and assumed that it knew the flags better than I did. https://github.com/llvm/llvm-project/pull/72889 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [LinkerWrapper] Accept some needed lld-link linker arguments for COFF targets (PR #72889)
https://github.com/jhuber6 edited https://github.com/llvm/llvm-project/pull/72889 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [LinkerWrapper] Accept some needed lld-link linker arguments for COFF targets (PR #72889)
jhuber6 wrote: > The command-line argument handling is not related to > [PE](https://en.wikipedia.org/wiki/Portable_Executable)/COFF, but to > Microsoft's `link.exe` command line interface, for instance > [`/libpath:`](https://learn.microsoft.com/en-us/cpp/build/reference/libpath-additional-libpath?view=msvc-170). > `/usr/bin/lld-link` is a `link.exe`-compatible interface for lld with an > appropriate default triple, like `clang-cl` is for `clang`. IIRC, `lld` > chooses its command-line interface based on the `argv[0]` name, so should > clang-linker-wrapper when passed `--linker-path=/usr/bin/lld-link` instead of > `--linker-path=/usr/bin/lld`, but both should be able to generate PE files. > > That is, this patch is not necessarily wrong, but the commit message and "// > COFF-style linker options." should refer to the command line interface > instead. I changed the title. Realistically, I could probably try to separate these more logically, but I think it's easier to just handle them both independently but identically in the logic. https://github.com/llvm/llvm-project/pull/72889 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits