[clang] 3bbbe4c - [OpenMP] Add Additional Function Attribute Information to OMPKinds.def
Author: Joseph Huber Date: 2020-07-18T12:55:50-04:00 New Revision: 3bbbe4c4b6c8e20538a388df164da6f8d935e0cc URL: https://github.com/llvm/llvm-project/commit/3bbbe4c4b6c8e20538a388df164da6f8d935e0cc DIFF: https://github.com/llvm/llvm-project/commit/3bbbe4c4b6c8e20538a388df164da6f8d935e0cc.diff LOG: [OpenMP] Add Additional Function Attribute Information to OMPKinds.def Summary: This patch adds more function attribute information to the runtime function definitions in OMPKinds.def. The goal is to provide sufficient information about OpenMP runtime functions to perform more optimizations on OpenMP code. Reviewers: jdoerfert Subscribers: aaron.ballman cfe-commits yaxunl guansong sstefan1 llvm-commits Tags: #OpenMP #clang #LLVM Differential Revision: https://reviews.llvm.org/D81031 Added: Modified: clang/test/OpenMP/barrier_codegen.cpp llvm/include/llvm/Frontend/OpenMP/OMPKinds.def llvm/test/Transforms/OpenMP/add_attributes.ll llvm/test/Transforms/OpenMP/parallel_deletion.ll Removed: diff --git a/clang/test/OpenMP/barrier_codegen.cpp b/clang/test/OpenMP/barrier_codegen.cpp index f84a26380df9..35b2ed721276 100644 --- a/clang/test/OpenMP/barrier_codegen.cpp +++ b/clang/test/OpenMP/barrier_codegen.cpp @@ -46,7 +46,7 @@ int main(int argc, char **argv) { // IRBUILDER: ; Function Attrs: nounwind // IRBUILDER-NEXT: declare i32 @__kmpc_global_thread_num(%struct.ident_t*) # // IRBUILDER_OPT: ; Function Attrs: inaccessiblememonly nofree nosync nounwind readonly -// IRBUILDER_OPT-NEXT: declare i32 @__kmpc_global_thread_num(%struct.ident_t*) # +// IRBUILDER_OPT-NEXT: declare i32 @__kmpc_global_thread_num(%struct.ident_t* nocapture nofree readonly) # // CHECK: define {{.+}} [[TMAIN_INT]]( // CHECK: [[GTID:%.+]] = call i32 @__kmpc_global_thread_num([[IDENT_T]]* [[LOC]]) diff --git a/llvm/include/llvm/Frontend/OpenMP/OMPKinds.def b/llvm/include/llvm/Frontend/OpenMP/OMPKinds.def index 0dc2b34f2e4d..4f2fcb8af5d1 100644 --- a/llvm/include/llvm/Frontend/OpenMP/OMPKinds.def +++ b/llvm/include/llvm/Frontend/OpenMP/OMPKinds.def @@ -383,7 +383,8 @@ __OMP_RTL(__kmpc_push_proc_bind, false, Void, IdentPtr, Int32, /* Int */ Int32) __OMP_RTL(__kmpc_serialized_parallel, false, Void, IdentPtr, Int32) __OMP_RTL(__kmpc_end_serialized_parallel, false, Void, IdentPtr, Int32) __OMP_RTL(__kmpc_omp_reg_task_with_affinity, false, Int32, IdentPtr, Int32, - Int8Ptr, Int32, Int8Ptr) + /* kmp_task_t */ VoidPtr, Int32, + /* kmp_task_affinity_info_t */ VoidPtr) __OMP_RTL(omp_get_thread_num, false, Int32, ) __OMP_RTL(omp_get_num_threads, false, Int32, ) @@ -430,8 +431,7 @@ __OMP_RTL(__kmpc_reduce, false, Int32, IdentPtr, Int32, Int32, SizeTy, VoidPtr, ReduceFunctionPtr, KmpCriticalNamePtrTy) __OMP_RTL(__kmpc_reduce_nowait, false, Int32, IdentPtr, Int32, Int32, SizeTy, VoidPtr, ReduceFunctionPtr, KmpCriticalNamePtrTy) -__OMP_RTL(__kmpc_end_reduce, false, Void, IdentPtr, Int32, - KmpCriticalNamePtrTy) +__OMP_RTL(__kmpc_end_reduce, false, Void, IdentPtr, Int32, KmpCriticalNamePtrTy) __OMP_RTL(__kmpc_end_reduce_nowait, false, Void, IdentPtr, Int32, KmpCriticalNamePtrTy) @@ -514,10 +514,10 @@ __OMP_RTL(__kmpc_taskloop, false, Void, IdentPtr, /* Int */ Int32, VoidPtr, /* Int */ Int32, Int64, VoidPtr) __OMP_RTL(__kmpc_omp_target_task_alloc, false, /* kmp_task_t */ VoidPtr, IdentPtr, Int32, Int32, SizeTy, SizeTy, TaskRoutineEntryPtr, Int64) -__OMP_RTL(__kmpc_taskred_modifier_init, false, VoidPtr, IdentPtr, - /* Int */ Int32, /* Int */ Int32, /* Int */ Int32, VoidPtr) -__OMP_RTL(__kmpc_taskred_init, false, VoidPtr, /* Int */ Int32, - /* Int */ 
Int32, VoidPtr) +__OMP_RTL(__kmpc_taskred_modifier_init, false, /* kmp_taskgroup */ VoidPtr, + IdentPtr, /* Int */ Int32, /* Int */ Int32, /* Int */ Int32, VoidPtr) +__OMP_RTL(__kmpc_taskred_init, false, /* kmp_taskgroup */ VoidPtr, + /* Int */ Int32, /* Int */ Int32, VoidPtr) __OMP_RTL(__kmpc_task_reduction_modifier_fini, false, Void, IdentPtr, /* Int */ Int32, /* Int */ Int32) __OMP_RTL(__kmpc_task_reduction_get_th_data, false, VoidPtr, Int32, VoidPtr, @@ -594,7 +594,9 @@ __OMP_RTL(__last, false, Void, ) #undef __OMP_RTL #undef OMP_RTL +#define ParamAttrs(...) ArrayRef({__VA_ARGS__}) #define EnumAttr(Kind) Attribute::get(Ctx, Attribute::AttrKind::Kind) +#define EnumAttrInt(Kind, N) Attribute::get(Ctx, Attribute::AttrKind::Kind, N) #define AttributeSet(...) \ AttributeSet::get(Ctx, ArrayRef({__VA_ARGS__})) @@ -607,19 +609,94 @@ __OMP_RTL(__last, false, Void, ) __OMP_ATTRS_SET(GetterAttrs, OptimisticAttributes ? AttributeSet(EnumAttr(NoUnwind), EnumAttr(ReadOnly), - EnumAttr(NoS
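The practical effect of these richer attribute tables is that passes can reason about OpenMP runtime calls the same way they reason about annotated user code. As a hedged illustration (the function and attribute choices below are assumptions for the example, not a transcription of the patch), an optimization could query the declarations produced from OMPKinds.def like this:

```cpp
// Minimal sketch: once __kmpc_global_thread_num is declared readonly/nounwind/
// nosync (as in the IRBUILDER_OPT check line above), a pass can prove that two
// calls with the same ident_t* argument return the same value and deduplicate
// them. The helper below only checks attributes; it is illustrative, not part
// of the patch.
#include "llvm/IR/Function.h"

using namespace llvm;

static bool isDuplicableRuntimeCall(const Function &RuntimeFn) {
  // These are exactly the facts the extended OMPKinds.def tables encode.
  return RuntimeFn.onlyReadsMemory() && RuntimeFn.doesNotThrow() &&
         RuntimeFn.hasFnAttribute(Attribute::NoSync);
}
```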
[clang] 5dbc7cf - [Object] Refactor code for extracting offload binaries
Author: Joseph Huber Date: 2022-09-06T08:55:16-05:00 New Revision: 5dbc7cf7cac4428e0876a94a4fca10fe60af7328 URL: https://github.com/llvm/llvm-project/commit/5dbc7cf7cac4428e0876a94a4fca10fe60af7328 DIFF: https://github.com/llvm/llvm-project/commit/5dbc7cf7cac4428e0876a94a4fca10fe60af7328.diff LOG: [Object] Refactor code for extracting offload binaries We currently extract offload binaries inside of the linker wrapper. Other tools may wish to do the same extraction operation. This patch simply factors out this handling into the `OffloadBinary.h` interface. Reviewed By: yaxunl Differential Revision: https://reviews.llvm.org/D132689 Added: Modified: clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp llvm/include/llvm/Object/OffloadBinary.h llvm/lib/Object/CMakeLists.txt llvm/lib/Object/OffloadBinary.cpp Removed: diff --git a/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp b/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp index f9d2c7710c77d..d29c4f93d60f7 100644 --- a/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp +++ b/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp @@ -86,22 +86,6 @@ static std::atomic LTOError; using OffloadingImage = OffloadBinary::OffloadingImage; -/// A class to contain the binary information for a single OffloadBinary. -class OffloadFile : public OwningBinary { -public: - using TargetID = std::pair; - - OffloadFile(std::unique_ptr Binary, - std::unique_ptr Buffer) - : OwningBinary(std::move(Binary), std::move(Buffer)) {} - - /// We use the Triple and Architecture pair to group linker inputs together. - /// This conversion function lets us use these files in a hash-map. - operator TargetID() const { -return std::make_pair(getBinary()->getTriple(), getBinary()->getArch()); - } -}; - namespace llvm { // Provide DenseMapInfo so that OffloadKind can be used in a DenseMap. template <> struct DenseMapInfo { @@ -162,9 +146,6 @@ const OptTable &getOptTable() { return *Table; } -Error extractFromBuffer(std::unique_ptr Buffer, -SmallVectorImpl &DeviceFiles); - void printCommands(ArrayRef CmdArgs) { if (CmdArgs.empty()) return; @@ -284,150 +265,6 @@ void printVersion(raw_ostream &OS) { OS << clang::getClangToolFullVersion("clang-linker-wrapper") << '\n'; } -/// Attempts to extract all the embedded device images contained inside the -/// buffer \p Contents. The buffer is expected to contain a valid offloading -/// binary format. -Error extractOffloadFiles(MemoryBufferRef Contents, - SmallVectorImpl &DeviceFiles) { - uint64_t Offset = 0; - // There could be multiple offloading binaries stored at this section. - while (Offset < Contents.getBuffer().size()) { -std::unique_ptr Buffer = -MemoryBuffer::getMemBuffer(Contents.getBuffer().drop_front(Offset), "", - /*RequiresNullTerminator*/ false); -auto BinaryOrErr = OffloadBinary::create(*Buffer); -if (!BinaryOrErr) - return BinaryOrErr.takeError(); -OffloadBinary &Binary = **BinaryOrErr; - -// Create a new owned binary with a copy of the original memory. -std::unique_ptr BufferCopy = MemoryBuffer::getMemBufferCopy( -Binary.getData().take_front(Binary.getSize()), -Contents.getBufferIdentifier()); -auto NewBinaryOrErr = OffloadBinary::create(*BufferCopy); -if (!NewBinaryOrErr) - return NewBinaryOrErr.takeError(); -DeviceFiles.emplace_back(std::move(*NewBinaryOrErr), std::move(BufferCopy)); - -Offset += Binary.getSize(); - } - - return Error::success(); -} - -// Extract offloading binaries from an Object file \p Obj. 
-Error extractFromBinary(const ObjectFile &Obj, -SmallVectorImpl &DeviceFiles) { - for (ELFSectionRef Sec : Obj.sections()) { -if (Sec.getType() != ELF::SHT_LLVM_OFFLOADING) - continue; - -Expected Buffer = Sec.getContents(); -if (!Buffer) - return Buffer.takeError(); - -MemoryBufferRef Contents(*Buffer, Obj.getFileName()); -if (Error Err = extractOffloadFiles(Contents, DeviceFiles)) - return Err; - } - - return Error::success(); -} - -Error extractFromBitcode(std::unique_ptr Buffer, - SmallVectorImpl &DeviceFiles) { - LLVMContext Context; - SMDiagnostic Err; - std::unique_ptr M = getLazyIRModule(std::move(Buffer), Err, Context); - if (!M) -return createStringError(inconvertibleErrorCode(), - "Failed to create module"); - - // Extract offloading data from globals referenced by the - // `llvm.embedded.object` metadata with the `.llvm.offloading` section. - auto *MD = M->getNamedMetadata("llvm.embedded.objects"); - if (!MD) -return Error::success(); - - for (const MDNode *Op : MD->operands()) { -if (Op->getNumOperands() < 2) - contin
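With the extraction logic hoisted out of the linker wrapper, any tool that is handed a buffer of offloading binaries can reuse it. A minimal sketch of such a consumer follows; the entry point `extractOffloadBinaries` and its exact signature are assumptions here, so consult `OffloadBinary.h` for the authoritative interface.

```cpp
#include "llvm/Object/OffloadBinary.h"
#include "llvm/Support/MemoryBuffer.h"
#include "llvm/Support/raw_ostream.h"

using namespace llvm;
using namespace llvm::object;

// List the target triple and architecture of every image embedded in `Input`.
static Error listEmbeddedImages(MemoryBufferRef Input) {
  SmallVector<OffloadFile> Binaries;
  if (Error Err = extractOffloadBinaries(Input, Binaries))
    return Err;
  for (OffloadFile &File : Binaries)
    outs() << File.getBinary()->getTriple() << " : "
           << File.getBinary()->getArch() << "\n";
  return Error::success();
}
```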
[clang] a69404c - [OffloadPackager] Add ability to extract images from other file types
Author: Joseph Huber Date: 2022-09-06T08:55:17-05:00 New Revision: a69404c0a294ce65432ce67d5f3e7dce28106496 URL: https://github.com/llvm/llvm-project/commit/a69404c0a294ce65432ce67d5f3e7dce28106496 DIFF: https://github.com/llvm/llvm-project/commit/a69404c0a294ce65432ce67d5f3e7dce28106496.diff LOG: [OffloadPackager] Add ability to extract images from other file types A previous patch added support for extracting images from offloading binaries. Users may wish to extract these files from the file types they are most commonly embedded in, such as an ELF or bitcode. This can be difficult for the user to do manually, as these images could potentially be stored under different section names. This patch adds support for extracting these file types. Reviewed By: saiislam Differential Revision: https://reviews.llvm.org/D132607 Added: Modified: clang/test/Driver/offload-packager.c clang/tools/clang-offload-packager/ClangOffloadPackager.cpp Removed: diff --git a/clang/test/Driver/offload-packager.c b/clang/test/Driver/offload-packager.c index c4617d06e93d3..8d6ee50f2a190 100644 --- a/clang/test/Driver/offload-packager.c +++ b/clang/test/Driver/offload-packager.c @@ -29,3 +29,25 @@ // RUN: diff *-amdgcn-amd-amdhsa-gfx908.2.o %S/Inputs/dummy-elf.o; rm *-amdgcn-amd-amdhsa-gfx908.2.o // RUN: diff *-amdgcn-amd-amdhsa-gfx90a.3.o %S/Inputs/dummy-elf.o; rm *-amdgcn-amd-amdhsa-gfx90a.3.o // RUN: not diff *-amdgcn-amd-amdhsa-gfx90c.4.o %S/Inputs/dummy-elf.o + +// Check that we can extract from an ELF object file +// RUN: clang-offload-packager -o %t.out \ +// RUN: --image=file=%S/Inputs/dummy-elf.o,kind=openmp,triple=amdgcn-amd-amdhsa,arch=gfx908 \ +// RUN: --image=file=%S/Inputs/dummy-elf.o,kind=openmp,triple=nvptx64-nvidia-cuda,arch=sm_70 +// RUN: %clang -cc1 %s -triple x86_64-unknown-linux-gnu -emit-obj -o %t.o -fembed-offload-object=%t.out +// RUN: clang-offload-packager %t.out \ +// RUN: --image=file=%t-sm_70.o,kind=openmp,triple=nvptx64-nvidia-cuda,arch=sm_70 \ +// RUN: --image=file=%t-gfx908.o,kind=openmp,triple=amdgcn-amd-amdhsa,arch=gfx908 +// RUN: diff %t-sm_70.o %S/Inputs/dummy-elf.o +// RUN: diff %t-gfx908.o %S/Inputs/dummy-elf.o + +// Check that we can extract from a bitcode file +// RUN: clang-offload-packager -o %t.out \ +// RUN: --image=file=%S/Inputs/dummy-elf.o,kind=openmp,triple=amdgcn-amd-amdhsa,arch=gfx908 \ +// RUN: --image=file=%S/Inputs/dummy-elf.o,kind=openmp,triple=nvptx64-nvidia-cuda,arch=sm_70 +// RUN: %clang -cc1 %s -triple x86_64-unknown-linux-gnu -emit-llvm -o %t.bc -fembed-offload-object=%t.out +// RUN: clang-offload-packager %t.out \ +// RUN: --image=file=%t-sm_70.o,kind=openmp,triple=nvptx64-nvidia-cuda,arch=sm_70 \ +// RUN: --image=file=%t-gfx908.o,kind=openmp,triple=amdgcn-amd-amdhsa,arch=gfx908 +// RUN: diff %t-sm_70.o %S/Inputs/dummy-elf.o +// RUN: diff %t-gfx908.o %S/Inputs/dummy-elf.o diff --git a/clang/tools/clang-offload-packager/ClangOffloadPackager.cpp b/clang/tools/clang-offload-packager/ClangOffloadPackager.cpp index c9c722e0a5b5c..47ef155ef2783 100644 --- a/clang/tools/clang-offload-packager/ClangOffloadPackager.cpp +++ b/clang/tools/clang-offload-packager/ClangOffloadPackager.cpp @@ -14,8 +14,7 @@ #include "clang/Basic/Version.h" -#include "llvm/Object/Binary.h" -#include "llvm/Object/ObjectFile.h" +#include "llvm/BinaryFormat/Magic.h" #include "llvm/Object/OffloadBinary.h" #include "llvm/Support/CommandLine.h" #include "llvm/Support/FileOutputBuffer.h" @@ -123,29 +122,6 @@ static Error bundleImages() { return Error::success(); } -static Expected>> 
-extractOffloadFiles(MemoryBufferRef Contents) { - if (identify_magic(Contents.getBuffer()) != file_magic::offload_binary) -return createStringError(inconvertibleErrorCode(), - "Input buffer not an offloading binary"); - SmallVector> Binaries; - uint64_t Offset = 0; - // There could be multiple offloading binaries stored at this section. - while (Offset < Contents.getBuffer().size()) { -std::unique_ptr Buffer = -MemoryBuffer::getMemBuffer(Contents.getBuffer().drop_front(Offset), "", - /*RequiresNullTerminator*/ false); -auto BinaryOrErr = OffloadBinary::create(*Buffer); -if (!BinaryOrErr) - return BinaryOrErr.takeError(); - -Offset += (*BinaryOrErr)->getSize(); -Binaries.emplace_back(std::move(*BinaryOrErr)); - } - - return std::move(Binaries); -} - static Error unbundleImages() { ErrorOr> BufferOrErr = MemoryBuffer::getFileOrSTDIN(InputFile); @@ -159,9 +135,9 @@ static Error unbundleImages() { Buffer = MemoryBuffer::getMemBufferCopy(Buffer->getBuffer(), Buffer->getBufferIdentifier()); - auto BinariesOrErr = extractOffloadFiles(*Buffer); - if (!BinariesOrErr) -return BinariesOrErr.takeError();
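Conceptually, the packager now keys its behavior off the container's file magic instead of assuming a raw offloading binary. A hedged sketch of that dispatch is below; the helper name and the exact set of supported magics are assumptions for this illustration.

```cpp
#include "llvm/BinaryFormat/Magic.h"
#include "llvm/Object/OffloadBinary.h"
#include "llvm/Support/Error.h"

using namespace llvm;
using namespace llvm::object;

static Error extractFromContainer(MemoryBufferRef Buffer,
                                  SmallVectorImpl<OffloadFile> &Images) {
  switch (identify_magic(Buffer.getBuffer())) {
  case file_magic::bitcode:         // images referenced by !llvm.embedded.objects
  case file_magic::elf_relocatable: // images held in SHT_LLVM_OFFLOADING sections
  case file_magic::offload_binary:  // one or more concatenated offload binaries
    return extractOffloadBinaries(Buffer, Images);
  default:
    return createStringError(inconvertibleErrorCode(),
                             "unsupported offloading container");
  }
}
```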
[clang] 57ef29f - [OpenMP] Remove use of removed '-f[no-]openmp-new-driver' flag
Author: Joseph Huber Date: 2022-09-06T13:40:05-05:00 New Revision: 57ef29f2835eb594bc2ad4793df05188be4c2ef6 URL: https://github.com/llvm/llvm-project/commit/57ef29f2835eb594bc2ad4793df05188be4c2ef6 DIFF: https://github.com/llvm/llvm-project/commit/57ef29f2835eb594bc2ad4793df05188be4c2ef6.diff LOG: [OpenMP] Remove use of removed '-f[no-]openmp-new-driver' flag The changes in D130020 removed all support for the old method of compiling OpenMP offloading programs. This means that `-fopenmp-new-driver` has no effect and `-fno-openmp-new-driver` does not work. This patch removes the use and documentation of this flag. Note that the `--offload-new-driver` flag still exists for using the new driver optionally with CUDA and HIP. Reviewed By: tra Differential Revision: https://reviews.llvm.org/D133367 Added: Modified: clang/docs/ClangCommandLineReference.rst clang/lib/Driver/Driver.cpp clang/lib/Driver/ToolChains/Clang.cpp clang/test/Driver/amdgpu-openmp-toolchain.c Removed: diff --git a/clang/docs/ClangCommandLineReference.rst b/clang/docs/ClangCommandLineReference.rst index 141c1464638a5..a7dc0634e97c0 100644 --- a/clang/docs/ClangCommandLineReference.rst +++ b/clang/docs/ClangCommandLineReference.rst @@ -2181,10 +2181,6 @@ Enable all Clang extensions for OpenMP directives and clauses Set rpath on OpenMP executables -.. option:: -fopenmp-new-driver - -Use the new driver for OpenMP offloading. - .. option:: -fopenmp-offload-mandatory Do not create a host fallback if offloading to the device fails. diff --git a/clang/lib/Driver/Driver.cpp b/clang/lib/Driver/Driver.cpp index 3743515d3d43f..36fba5d91eaf4 100644 --- a/clang/lib/Driver/Driver.cpp +++ b/clang/lib/Driver/Driver.cpp @@ -3902,9 +3902,7 @@ void Driver::BuildActions(Compilation &C, DerivedArgList &Args, OffloadingActionBuilder OffloadBuilder(C, Args, Inputs); bool UseNewOffloadingDriver = - (C.isOffloadingHostKind(Action::OFK_OpenMP) && - Args.hasFlag(options::OPT_fopenmp_new_driver, -options::OPT_no_offload_new_driver, true)) || + C.isOffloadingHostKind(Action::OFK_OpenMP) || Args.hasFlag(options::OPT_offload_new_driver, options::OPT_no_offload_new_driver, false); diff --git a/clang/lib/Driver/ToolChains/Clang.cpp b/clang/lib/Driver/ToolChains/Clang.cpp index 99a8642cfd85b..d39f8715c7a19 100644 --- a/clang/lib/Driver/ToolChains/Clang.cpp +++ b/clang/lib/Driver/ToolChains/Clang.cpp @@ -4459,9 +4459,7 @@ void Clang::ConstructJob(Compilation &C, const JobAction &JA, bool IsDeviceOffloadAction = !(JA.isDeviceOffloading(Action::OFK_None) || JA.isDeviceOffloading(Action::OFK_Host)); bool IsHostOffloadingAction = - (JA.isHostOffloading(Action::OFK_OpenMP) && - Args.hasFlag(options::OPT_fopenmp_new_driver, -options::OPT_no_offload_new_driver, true)) || + JA.isHostOffloading(Action::OFK_OpenMP) || (JA.isHostOffloading(C.getActiveOffloadKinds()) && Args.hasFlag(options::OPT_offload_new_driver, options::OPT_no_offload_new_driver, false)); @@ -4762,9 +4760,7 @@ void Clang::ConstructJob(Compilation &C, const JobAction &JA, if (IsUsingLTO) { // Only AMDGPU supports device-side LTO. 
- if (IsDeviceOffloadAction && - !Args.hasFlag(options::OPT_fopenmp_new_driver, -options::OPT_no_offload_new_driver, true) && + if (IsDeviceOffloadAction && !JA.isHostOffloading(Action::OFK_OpenMP) && !Args.hasFlag(options::OPT_offload_new_driver, options::OPT_no_offload_new_driver, false) && !Triple.isAMDGPU()) { diff --git a/clang/test/Driver/amdgpu-openmp-toolchain.c b/clang/test/Driver/amdgpu-openmp-toolchain.c index 1551917ea50f0..50ce8e5d1b1fe 100644 --- a/clang/test/Driver/amdgpu-openmp-toolchain.c +++ b/clang/test/Driver/amdgpu-openmp-toolchain.c @@ -49,5 +49,5 @@ // RUN: %clang -### --target=x86_64-unknown-linux-gnu -emit-llvm -S -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa -Xopenmp-target=amdgcn-amd-amdhsa -march=gfx803 -nogpulib %s 2>&1 | FileCheck %s --check-prefix=CHECK-EMIT-LLVM-IR // CHECK-EMIT-LLVM-IR: "-cc1" "-triple" "amdgcn-amd-amdhsa"{{.*}}"-emit-llvm" -// RUN: %clang -### -target x86_64-pc-linux-gnu -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa -Xopenmp-target=amdgcn-amd-amdhsa -march=gfx803 -lm --rocm-device-lib-path=%S/Inputs/rocm/amdgcn/bitcode -fopenmp-new-driver %s 2>&1 | FileCheck %s --check-prefix=CHECK-LIB-DEVICE-NEW +// RUN: %clang -### -target x86_64-pc-linux-gnu -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa -Xopenmp-target=amdgcn-amd-amdhsa -march=gfx803 -lm --rocm-device-lib-path=%S/Inputs/rocm/amdgcn/bitcode %s 2>&1 | FileCheck %s --check-prefix=CHECK-LIB-DEVICE-NEW // CHECK-LIB-DEVICE-NEW: {{.*}}clang-linker-wrapper{{.*}}-
[clang] 3a62399 - [OpenMP] Fix logic error when building offloading applications
Author: Joseph Huber Date: 2022-09-06T13:56:24-05:00 New Revision: 3a623999f3ff96843f97ee300e0c94b8cbc88a9f URL: https://github.com/llvm/llvm-project/commit/3a623999f3ff96843f97ee300e0c94b8cbc88a9f DIFF: https://github.com/llvm/llvm-project/commit/3a623999f3ff96843f97ee300e0c94b8cbc88a9f.diff LOG: [OpenMP] Fix logic error when building offloading applications Summary: A previous patch removed support for the `-fopenmp-new-driver` flag and accidentally used the `isHostOffloading` flag instead of `isDeviceOffloading`, which led to some build errors when compiling for the offloading device. This patch addresses that. Added: Modified: clang/lib/Driver/ToolChains/Clang.cpp Removed: diff --git a/clang/lib/Driver/ToolChains/Clang.cpp b/clang/lib/Driver/ToolChains/Clang.cpp index d39f8715c7a1..d3b5f82cb5c2 100644 --- a/clang/lib/Driver/ToolChains/Clang.cpp +++ b/clang/lib/Driver/ToolChains/Clang.cpp @@ -4760,7 +4760,7 @@ void Clang::ConstructJob(Compilation &C, const JobAction &JA, if (IsUsingLTO) { // Only AMDGPU supports device-side LTO. - if (IsDeviceOffloadAction && !JA.isHostOffloading(Action::OFK_OpenMP) && + if (IsDeviceOffloadAction && !JA.isDeviceOffloading(Action::OFK_OpenMP) && !Args.hasFlag(options::OPT_offload_new_driver, options::OPT_no_offload_new_driver, false) && !Triple.isAMDGPU()) { ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] 2753eaf - [Clang] Fix the new driver crashing when using '-fsyntax-only'
Author: Joseph Huber Date: 2022-09-06T19:49:47-05:00 New Revision: 2753eafe5a7f003776b12f425c5b0a475e8fb6b7 URL: https://github.com/llvm/llvm-project/commit/2753eafe5a7f003776b12f425c5b0a475e8fb6b7 DIFF: https://github.com/llvm/llvm-project/commit/2753eafe5a7f003776b12f425c5b0a475e8fb6b7.diff LOG: [Clang] Fix the new driver crashing when using '-fsyntax-only' The new driver currently crashes when attempting to use the '-fsyntax-only' option. This is because the option causes all output to be given the `TY_Nothing` type, which should signal the end of the pipeline. The new driver was not handling this correctly and attempted to use the empty input. This patch fixes the handling so we do not attempt to continue when the input is nothing. One concession is that we must now check, when generating the arguments for Clang, whether the input is of 'TY_Nothing'. This is because the new driver will only create code if the device code is a dependency on the host; creating the output without the dependency would require a complete rewrite of the logic, as we do not maintain any state between calls to 'BuildOffloadingActions', so I believe this is the most straightforward method. Reviewed By: tra Differential Revision: https://reviews.llvm.org/D133161 Added: Modified: clang/lib/Driver/Driver.cpp clang/lib/Driver/ToolChains/Clang.cpp clang/test/Driver/cuda-bindings.cu clang/test/Driver/hip-binding.hip clang/test/Driver/openmp-offload.c Removed: diff --git a/clang/lib/Driver/Driver.cpp b/clang/lib/Driver/Driver.cpp index 36fba5d91eaf4..9517331ade26b 100644 --- a/clang/lib/Driver/Driver.cpp +++ b/clang/lib/Driver/Driver.cpp @@ -4309,10 +4309,14 @@ Action *Driver::BuildOffloadingActions(Compilation &C, auto TCAndArch = TCAndArchs.begin(); for (Action *&A : DeviceActions) { +if (A->getType() == types::TY_Nothing) + continue; + A = ConstructPhaseAction(C, Args, Phase, A, Kind); if (isa(A) && isa(HostAction) && -Kind == Action::OFK_OpenMP) { +Kind == Action::OFK_OpenMP && +HostAction->getType() != types::TY_Nothing) { // OpenMP offloading has a dependency on the host compile action to // identify which declarations need to be emitted. This shouldn't be // collapsed with any other actions so we can use it in the device. @@ -4380,11 +4384,15 @@ Action *Driver::BuildOffloadingActions(Compilation &C, nullptr, Action::OFK_None); } + // If we are unable to embed a single device output into the host, we need to + // add each device output as a host dependency to ensure they are still built. + bool SingleDeviceOutput = !llvm::any_of(OffloadActions, [](Action *A) { +return A->getType() == types::TY_Nothing; + }) && isa(HostAction); OffloadAction::HostDependence HDep( *HostAction, *C.getSingleOffloadToolChain(), /*BoundArch=*/nullptr, isa(HostAction) ? DDep : DDeps); - return C.MakeAction( - HDep, isa(HostAction) ? DDep : DDeps); + return C.MakeAction(HDep, SingleDeviceOutput ? DDep : DDeps); } Action *Driver::ConstructPhaseAction( diff --git a/clang/lib/Driver/ToolChains/Clang.cpp b/clang/lib/Driver/ToolChains/Clang.cpp index d3b5f82cb5c20..837486971d112 100644 --- a/clang/lib/Driver/ToolChains/Clang.cpp +++ b/clang/lib/Driver/ToolChains/Clang.cpp @@ -4496,8 +4496,8 @@ void Clang::ConstructJob(Compilation &C, const JobAction &JA, const InputInfo *CudaDeviceInput = nullptr; const InputInfo *OpenMPDeviceInput = nullptr; for (const InputInfo &I : Inputs) { -if (&I == &Input) { - // This is the primary input. +if (&I == &Input || I.getType() == types::TY_Nothing) { + // This is the primary input or contains nothing. 
} else if (IsHeaderModulePrecompile && types::getPrecompiledType(I.getType()) == types::TY_PCH) { types::ID Expected = HeaderModuleInput.getType(); diff --git a/clang/test/Driver/cuda-bindings.cu b/clang/test/Driver/cuda-bindings.cu index 6c4398b706973..3cc65b8cf98bc 100644 --- a/clang/test/Driver/cuda-bindings.cu +++ b/clang/test/Driver/cuda-bindings.cu @@ -216,3 +216,14 @@ // RUN:--offload-arch=sm_70 --offload-arch=sm_52 --offload-device-only -c -o %t %s 2>&1 \ // RUN: | FileCheck -check-prefix=MULTI-D-ONLY-O %s // MULTI-D-ONLY-O: error: cannot specify -o when generating multiple output files + +// +// Check to ensure that we can use '-fsyntax-only' for CUDA output with the new +// driver. +// +// RUN: %clang -### -target powerpc64le-ibm-linux-gnu --offload-new-driver \ +// RUN:-fsyntax-only --offload-arch=sm_70 --offload-arch=sm_52 -c %s 2>&1 \ +// RUN: | FileCheck -check-prefix=SYNTAX-ONLY %s +// SYNTAX-ONLY: "-cc1" "-triple" "nvptx64-nvidia-cuda"{{.*}}"-fsyntax-only" +// SYNTAX-ONLY: "-cc1" "-triple" "nvptx64-nvidia-cuda"{{.*}}"-fsyntax-only" +// S
[clang] a6bb7c2 - [CUDA] Fix test failing when using the new driver
Author: Joseph Huber Date: 2022-09-06T20:14:20-05:00 New Revision: a6bb7c22fc288686010076ac253a12b4b1cd2ee5 URL: https://github.com/llvm/llvm-project/commit/a6bb7c22fc288686010076ac253a12b4b1cd2ee5 DIFF: https://github.com/llvm/llvm-project/commit/a6bb7c22fc288686010076ac253a12b4b1cd2ee5.diff LOG: [CUDA] Fix test failing when using the new driver Summary: Previously the new driver crashed when using `-fsyntax-only` which required a work-around in one of the test files. This was not properly updated when it was fixed for the new driver. This patch fixes the test and also adjusts a missing boolean check. Added: Modified: clang/lib/Driver/Driver.cpp clang/test/Driver/cuda-bindings.cu Removed: diff --git a/clang/lib/Driver/Driver.cpp b/clang/lib/Driver/Driver.cpp index 9517331ade26..ca8e0e5240e1 100644 --- a/clang/lib/Driver/Driver.cpp +++ b/clang/lib/Driver/Driver.cpp @@ -4391,7 +4391,7 @@ Action *Driver::BuildOffloadingActions(Compilation &C, }) && isa(HostAction); OffloadAction::HostDependence HDep( *HostAction, *C.getSingleOffloadToolChain(), - /*BoundArch=*/nullptr, isa(HostAction) ? DDep : DDeps); + /*BoundArch=*/nullptr, SingleDeviceOutput ? DDep : DDeps); return C.MakeAction(HDep, SingleDeviceOutput ? DDep : DDeps); } diff --git a/clang/test/Driver/cuda-bindings.cu b/clang/test/Driver/cuda-bindings.cu index 3cc65b8cf98b..ce4b423064bc 100644 --- a/clang/test/Driver/cuda-bindings.cu +++ b/clang/test/Driver/cuda-bindings.cu @@ -102,8 +102,6 @@ // NDSYN-NOT: inputs: // NDSYN: # "nvptx64-nvidia-cuda" - "clang", inputs: [{{.*}}], output: (nothing) // NDSYN-NEXT: # "nvptx64-nvidia-cuda" - "clang", inputs: [{{.*}}], output: (nothing) -// ! FIXME: new driver erroneously attempts to run linker phase w/ no inputs. -// Remove these checks once the issue is solved. // NDSYN-NEXT: "nvptx64-nvidia-cuda" - "NVPTX::Linker", inputs: [(nothing), (nothing)], output: "{{.*}}" // NDSYN-NEXT: # "powerpc64le-ibm-linux-gnu" - "clang", inputs: [{{.*}}], output: (nothing) // NDSYN-NOT: inputs: ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] 7354a73 - [CUDA] Actually fix the test correctly this time
Author: Joseph Huber Date: 2022-09-06T20:31:27-05:00 New Revision: 7354a73945f1c123d66b01f51374ecbdba18fab3 URL: https://github.com/llvm/llvm-project/commit/7354a73945f1c123d66b01f51374ecbdba18fab3 DIFF: https://github.com/llvm/llvm-project/commit/7354a73945f1c123d66b01f51374ecbdba18fab3.diff LOG: [CUDA] Actually fix the test correctly this time Added: Modified: clang/test/Driver/cuda-bindings.cu Removed: diff --git a/clang/test/Driver/cuda-bindings.cu b/clang/test/Driver/cuda-bindings.cu index ce4b423064bc..f95d2de80f4a 100644 --- a/clang/test/Driver/cuda-bindings.cu +++ b/clang/test/Driver/cuda-bindings.cu @@ -102,7 +102,6 @@ // NDSYN-NOT: inputs: // NDSYN: # "nvptx64-nvidia-cuda" - "clang", inputs: [{{.*}}], output: (nothing) // NDSYN-NEXT: # "nvptx64-nvidia-cuda" - "clang", inputs: [{{.*}}], output: (nothing) -// NDSYN-NEXT: "nvptx64-nvidia-cuda" - "NVPTX::Linker", inputs: [(nothing), (nothing)], output: "{{.*}}" // NDSYN-NEXT: # "powerpc64le-ibm-linux-gnu" - "clang", inputs: [{{.*}}], output: (nothing) // NDSYN-NOT: inputs: ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] 194ec84 - [OpenMP][AMDGPU] Link bitcode ROCm device libraries per-TU
Author: Joseph Huber Date: 2022-09-14T09:42:06-05:00 New Revision: 194ec844f5c67306f505a3418038c5e75859bad8 URL: https://github.com/llvm/llvm-project/commit/194ec844f5c67306f505a3418038c5e75859bad8 DIFF: https://github.com/llvm/llvm-project/commit/194ec844f5c67306f505a3418038c5e75859bad8.diff LOG: [OpenMP][AMDGPU] Link bitcode ROCm device libraries per-TU Previously, we linked in the ROCm device libraries, which provide math and other utility functions, late. This is not strictly correct, as these libraries contain several flags that are only set per-TU, such as fast math or denormalization. This patch changes this to pass the bitcode libraries per-TU using the same method we use for the CUDA libraries. This has the advantage that we correctly propagate attributes, making this implementation more correct. Additionally, many annoying unused functions were not being fully removed during LTO. This led to erroneous warning messages and remarks on unused functions. I am not sure if not finding these libraries should be a hard error. Let me know if it should be demoted to a warning saying that some device utilities will not work without them. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D133726 Added: Modified: clang/include/clang/Driver/ToolChain.h clang/lib/Driver/ToolChain.cpp clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp clang/lib/Driver/ToolChains/AMDGPUOpenMP.h clang/lib/Driver/ToolChains/Clang.cpp clang/lib/Driver/ToolChains/HIPAMD.cpp clang/lib/Driver/ToolChains/HIPAMD.h clang/lib/Driver/ToolChains/HIPSPV.cpp clang/lib/Driver/ToolChains/HIPSPV.h clang/test/Driver/amdgpu-openmp-toolchain.c Removed: diff --git a/clang/include/clang/Driver/ToolChain.h b/clang/include/clang/Driver/ToolChain.h index 59d8dafc079f..28137e36e2af 100644 --- a/clang/include/clang/Driver/ToolChain.h +++ b/clang/include/clang/Driver/ToolChain.h @@ -714,9 +714,9 @@ class ToolChain { virtual VersionTuple computeMSVCVersion(const Driver *D, const llvm::opt::ArgList &Args) const; - /// Get paths of HIP device libraries. + /// Get paths for device libraries. virtual llvm::SmallVector - getHIPDeviceLibs(const llvm::opt::ArgList &Args) const; + getDeviceLibs(const llvm::opt::ArgList &Args) const; /// Add the system specific linker arguments to use /// for the given HIP runtime library type. diff --git a/clang/lib/Driver/ToolChain.cpp b/clang/lib/Driver/ToolChain.cpp index 7f16469155bd..26c5087b4ac2 100644 --- a/clang/lib/Driver/ToolChain.cpp +++ b/clang/lib/Driver/ToolChain.cpp @@ -1099,7 +1099,7 @@ void ToolChain::AddHIPIncludeArgs(const ArgList &DriverArgs, ArgStringList &CC1Args) const {} llvm::SmallVector -ToolChain::getHIPDeviceLibs(const ArgList &DriverArgs) const { +ToolChain::getDeviceLibs(const ArgList &DriverArgs) const { return {}; } diff --git a/clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp b/clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp index 4866982a8dfa..8ab79e1af532 100644 --- a/clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp +++ b/clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp @@ -75,6 +75,12 @@ void AMDGPUOpenMPToolChain::addClangTargetOptions( if (DriverArgs.hasArg(options::OPT_nogpulib)) return; + for (auto BCFile : getDeviceLibs(DriverArgs)) { +CC1Args.push_back(BCFile.ShouldInternalize ? "-mlink-builtin-bitcode" : "-mlink-bitcode-file"); +CC1Args.push_back(DriverArgs.MakeArgString(BCFile.Path)); + } + // Link the bitcode library late if we're using device LTO. 
if (getDriver().isUsingLTO(/* IsOffload */ true)) return; @@ -158,3 +164,24 @@ AMDGPUOpenMPToolChain::computeMSVCVersion(const Driver *D, const ArgList &Args) const { return HostTC.computeMSVCVersion(D, Args); } + +llvm::SmallVector +AMDGPUOpenMPToolChain::getDeviceLibs(const llvm::opt::ArgList &Args) const { + if (Args.hasArg(options::OPT_nogpulib)) +return {}; + + if (!RocmInstallation.hasDeviceLibrary()) { +getDriver().Diag(diag::err_drv_no_rocm_device_lib) << 0; +return {}; + } + + StringRef GpuArch = getProcessorFromTargetID( + getTriple(), Args.getLastArgValue(options::OPT_march_EQ)); + + SmallVector BCLibs; + for (auto BCLib : getCommonDeviceLibNames(Args, GpuArch.str(), +/*IsOpenMP=*/true)) +BCLibs.emplace_back(BCLib); + + return BCLibs; +} diff --git a/clang/lib/Driver/ToolChains/AMDGPUOpenMP.h b/clang/lib/Driver/ToolChains/AMDGPUOpenMP.h index 51a1c4696754..2be444a42c55 100644 --- a/clang/lib/Driver/ToolChains/AMDGPUOpenMP.h +++ b/clang/lib/Driver/ToolChains/AMDGPUOpenMP.h @@ -54,6 +54,9 @@ class LLVM_LIBRARY_VISIBILITY AMDGPUOpenMPToolChain final computeMSVCVersion(const Dri
[clang] bae1a2c - [OpenMP] Remove unused function after removing simplified interface
Author: Joseph Huber Date: 2022-09-14T10:14:43-05:00 New Revision: bae1a2cf3cce529b0d03df8bac962d13b407e117 URL: https://github.com/llvm/llvm-project/commit/bae1a2cf3cce529b0d03df8bac962d13b407e117 DIFF: https://github.com/llvm/llvm-project/commit/bae1a2cf3cce529b0d03df8bac962d13b407e117.diff LOG: [OpenMP] Remove unused function after removing simplified interface Summary: A previous patch removed the user of this function but did not remove the function causing unused function warnings. Remove it. Added: Modified: clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp Removed: diff --git a/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp b/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp index 1587d52846b1..0ab988968908 100644 --- a/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp +++ b/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp @@ -905,92 +905,6 @@ static bool hasNestedLightweightDirective(ASTContext &Ctx, return false; } -/// Checks if the construct supports lightweight runtime. It must be SPMD -/// construct + inner loop-based construct with static scheduling. -static bool supportsLightweightRuntime(ASTContext &Ctx, - const OMPExecutableDirective &D) { - if (!supportsSPMDExecutionMode(Ctx, D)) -return false; - OpenMPDirectiveKind DirectiveKind = D.getDirectiveKind(); - switch (DirectiveKind) { - case OMPD_target: - case OMPD_target_teams: - case OMPD_target_parallel: -return hasNestedLightweightDirective(Ctx, D); - case OMPD_target_parallel_for: - case OMPD_target_parallel_for_simd: - case OMPD_target_teams_distribute_parallel_for: - case OMPD_target_teams_distribute_parallel_for_simd: -// (Last|First)-privates must be shared in parallel region. -return hasStaticScheduling(D); - case OMPD_target_simd: - case OMPD_target_teams_distribute_simd: -return true; - case OMPD_target_teams_distribute: -return false; - case OMPD_parallel: - case OMPD_for: - case OMPD_parallel_for: - case OMPD_parallel_master: - case OMPD_parallel_sections: - case OMPD_for_simd: - case OMPD_parallel_for_simd: - case OMPD_cancel: - case OMPD_cancellation_point: - case OMPD_ordered: - case OMPD_threadprivate: - case OMPD_allocate: - case OMPD_task: - case OMPD_simd: - case OMPD_sections: - case OMPD_section: - case OMPD_single: - case OMPD_master: - case OMPD_critical: - case OMPD_taskyield: - case OMPD_barrier: - case OMPD_taskwait: - case OMPD_taskgroup: - case OMPD_atomic: - case OMPD_flush: - case OMPD_depobj: - case OMPD_scan: - case OMPD_teams: - case OMPD_target_data: - case OMPD_target_exit_data: - case OMPD_target_enter_data: - case OMPD_distribute: - case OMPD_distribute_simd: - case OMPD_distribute_parallel_for: - case OMPD_distribute_parallel_for_simd: - case OMPD_teams_distribute: - case OMPD_teams_distribute_simd: - case OMPD_teams_distribute_parallel_for: - case OMPD_teams_distribute_parallel_for_simd: - case OMPD_target_update: - case OMPD_declare_simd: - case OMPD_declare_variant: - case OMPD_begin_declare_variant: - case OMPD_end_declare_variant: - case OMPD_declare_target: - case OMPD_end_declare_target: - case OMPD_declare_reduction: - case OMPD_declare_mapper: - case OMPD_taskloop: - case OMPD_taskloop_simd: - case OMPD_master_taskloop: - case OMPD_master_taskloop_simd: - case OMPD_parallel_master_taskloop: - case OMPD_parallel_master_taskloop_simd: - case OMPD_requires: - case OMPD_unknown: - default: -break; - } - llvm_unreachable( - "Unknown programming model for OpenMP directive on NVPTX target."); -} - void CGOpenMPRuntimeGPU::emitNonSPMDKernel(const OMPExecutableDirective &D, StringRef ParentName, llvm::Function *&OutlinedFn, 
[openmp] [clang] [OpenMP] Cleanup and fixes for ABI agnostic DeviceRTL (PR #71234)
https://github.com/jhuber6 approved this pull request. https://github.com/llvm/llvm-project/pull/71234 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] Correctly link and optimize device libraries with -mlink-builtin-bitcode (PR #69371)
https://github.com/jhuber6 edited https://github.com/llvm/llvm-project/pull/69371 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] Correctly link and optimize device libraries with -mlink-builtin-bitcode (PR #69371)
@@ -1035,6 +1043,13 @@ void EmitAssemblyHelper::RunOptimizationPipeline( } } + // Re-link against any bitcodes supplied via the -mlink-builtin-bitcode option + // Some optimizations may generate new function calls that would not have + // been linked pre-optimization (i.e. fused sincos calls generated by + // AMDGPULibCalls::fold_sincos.) + if (ClRelinkBuiltinBitcodePostop) jhuber6 wrote: So, what I had in mind is that we could make a new `clang` option similar to `-mlink-builtin-bitcode`. This would then be used by the HIP toolchain or similar when constructing the list of files to pass via `-mlink-builtin-bitcode`. We would then simply register those with this secondary pass. This approach seems much simpler, being a boolean option that just relinks everything, but I somewhat like the idea of `-mlink-builtin-bitcode` being a pre-link operation and having another one for post-linking. That being said, it may not be worth the extra work because this is a huge hack around this ecosystem already. https://github.com/llvm/llvm-project/pull/69371 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] Correctly link and optimize device libraries with -mlink-builtin-bitcode (PR #69371)
https://github.com/jhuber6 commented: Some comments. I remember there was a reason we couldn't use the existing linking support and needed the new pass, what was that again? https://github.com/llvm/llvm-project/pull/69371 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] Correctly link and optimize device libraries with -mlink-builtin-bitcode (PR #69371)
@@ -1035,6 +1043,13 @@ void EmitAssemblyHelper::RunOptimizationPipeline( } } + // Re-link against any bitcodes supplied via the -mlink-builtin-bitcode option + // Some optimizations may generate new function calls that would not have + // been linked pre-optimization (i.e. fused sincos calls generated by + // AMDGPULibCalls::fold_sincos.) + if (ClRelinkBuiltinBitcodePostop) jhuber6 wrote: It's definitely the easier option. This problem is pretty specific, but I can see it being easier to just remove this class of bugs entirely. It's an ugly solution for an ugly problem overall. https://github.com/llvm/llvm-project/pull/69371 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] Correctly link and optimize device libraries with -mlink-builtin-bitcode (PR #69371)
@@ -45,7 +46,8 @@ namespace clang { const TargetOptions &TOpts, const LangOptions &LOpts, StringRef TDesc, llvm::Module *M, BackendAction Action, llvm::IntrusiveRefCntPtr VFS, - std::unique_ptr OS); + std::unique_ptr OS, + BackendConsumer *BC = NULL); jhuber6 wrote: Use `nullptr` in C++, it's type safe. https://github.com/llvm/llvm-project/pull/69371 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] Correctly link and optimize device libraries with -mlink-builtin-bitcode (PR #69371)
@@ -48,428 +49,369 @@ #include "llvm/Support/ToolOutputFile.h" #include "llvm/Support/YAMLTraits.h" #include "llvm/Transforms/IPO/Internalize.h" +#include "llvm/Transforms/Utils/Cloning.h" -#include #include using namespace clang; using namespace llvm; #define DEBUG_TYPE "codegenaction" namespace clang { - class BackendConsumer; - class ClangDiagnosticHandler final : public DiagnosticHandler { - public: -ClangDiagnosticHandler(const CodeGenOptions &CGOpts, BackendConsumer *BCon) -: CodeGenOpts(CGOpts), BackendCon(BCon) {} +class BackendConsumer; +class ClangDiagnosticHandler final : public DiagnosticHandler { +public: + ClangDiagnosticHandler(const CodeGenOptions &CGOpts, BackendConsumer *BCon) + : CodeGenOpts(CGOpts), BackendCon(BCon) {} -bool handleDiagnostics(const DiagnosticInfo &DI) override; + bool handleDiagnostics(const DiagnosticInfo &DI) override; -bool isAnalysisRemarkEnabled(StringRef PassName) const override { - return CodeGenOpts.OptimizationRemarkAnalysis.patternMatches(PassName); -} -bool isMissedOptRemarkEnabled(StringRef PassName) const override { - return CodeGenOpts.OptimizationRemarkMissed.patternMatches(PassName); -} -bool isPassedOptRemarkEnabled(StringRef PassName) const override { - return CodeGenOpts.OptimizationRemark.patternMatches(PassName); -} + bool isAnalysisRemarkEnabled(StringRef PassName) const override { +return CodeGenOpts.OptimizationRemarkAnalysis.patternMatches(PassName); + } + bool isMissedOptRemarkEnabled(StringRef PassName) const override { +return CodeGenOpts.OptimizationRemarkMissed.patternMatches(PassName); + } + bool isPassedOptRemarkEnabled(StringRef PassName) const override { +return CodeGenOpts.OptimizationRemark.patternMatches(PassName); + } -bool isAnyRemarkEnabled() const override { - return CodeGenOpts.OptimizationRemarkAnalysis.hasValidPattern() || - CodeGenOpts.OptimizationRemarkMissed.hasValidPattern() || - CodeGenOpts.OptimizationRemark.hasValidPattern(); -} + bool isAnyRemarkEnabled() const override { +return CodeGenOpts.OptimizationRemarkAnalysis.hasValidPattern() || + CodeGenOpts.OptimizationRemarkMissed.hasValidPattern() || + CodeGenOpts.OptimizationRemark.hasValidPattern(); + } - private: -const CodeGenOptions &CodeGenOpts; -BackendConsumer *BackendCon; - }; +private: + const CodeGenOptions &CodeGenOpts; + BackendConsumer *BackendCon; +}; + +static void reportOptRecordError(Error E, DiagnosticsEngine &Diags, + const CodeGenOptions &CodeGenOpts) { + handleAllErrors( + std::move(E), +[&](const LLVMRemarkSetupFileError &E) { +Diags.Report(diag::err_cannot_open_file) +<< CodeGenOpts.OptRecordFile << E.message(); + }, +[&](const LLVMRemarkSetupPatternError &E) { +Diags.Report(diag::err_drv_optimization_remark_pattern) +<< E.message() << CodeGenOpts.OptRecordPasses; + }, +[&](const LLVMRemarkSetupFormatError &E) { +Diags.Report(diag::err_drv_optimization_remark_format) +<< CodeGenOpts.OptRecordFormat; + }); +} - static void reportOptRecordError(Error E, DiagnosticsEngine &Diags, - const CodeGenOptions &CodeGenOpts) { -handleAllErrors( -std::move(E), - [&](const LLVMRemarkSetupFileError &E) { - Diags.Report(diag::err_cannot_open_file) - << CodeGenOpts.OptRecordFile << E.message(); -}, - [&](const LLVMRemarkSetupPatternError &E) { - Diags.Report(diag::err_drv_optimization_remark_pattern) - << E.message() << CodeGenOpts.OptRecordPasses; -}, - [&](const LLVMRemarkSetupFormatError &E) { - Diags.Report(diag::err_drv_optimization_remark_format) - << CodeGenOpts.OptRecordFormat; -}); -} +BackendConsumer::BackendConsumer(BackendAction Action, 
DiagnosticsEngine &Diags, + IntrusiveRefCntPtr VFS, + const HeaderSearchOptions &HeaderSearchOpts, + const PreprocessorOptions &PPOpts, + const CodeGenOptions &CodeGenOpts, + const TargetOptions &TargetOpts, + const LangOptions &LangOpts, + const std::string &InFile, + SmallVector LinkModules, + std::unique_ptr OS, + LLVMContext &C, + CoverageSourceInfo *CoverageInfo) + : Diags(Diags), Action(Action), HeaderSearchOpts(HeaderSearchOpts), + CodeGenOpts(CodeGenOpts), TargetOpts(TargetOpts), LangOpts(LangOpts), + AsmOutStream(std::move(OS)), Context(nullptr), FS(VFS), + LLVMIRGeneration("irgen", "LLVM
[clang] Correctly link and optimize device libraries with -mlink-builtin-bitcode (PR #69371)
@@ -155,10 +162,10 @@ class EmitAssemblyHelper { return F; } - void - RunOptimizationPipeline(BackendAction Action, + void RunOptimizationPipeline(BackendAction Action, std::unique_ptr &OS, - std::unique_ptr &ThinLinkOS); + std::unique_ptr &ThinLinkOS, + BackendConsumer *BC); jhuber6 wrote: Is this properly formatted? https://github.com/llvm/llvm-project/pull/69371 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] Correctly link and optimize device libraries with -mlink-builtin-bitcode (PR #69371)
@@ -0,0 +1,29 @@ +//===-- LinkInModulesPass.cpp - Module Linking pass --- C++ -*-===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===--===// +/// \file +/// +/// LinkInModulesPass implementation. +/// +//===--===// + +#include "LinkInModulesPass.h" +#include "BackendConsumer.h" + +using namespace llvm; + +LinkInModulesPass::LinkInModulesPass(clang::BackendConsumer *BC, + bool ShouldLinkFiles) : BC(BC), + ShouldLinkFiles(ShouldLinkFiles) {} + +PreservedAnalyses LinkInModulesPass::run(Module &M, ModuleAnalysisManager &AM) { + + if (BC != NULL && BC->LinkInModules(&M, ShouldLinkFiles)) jhuber6 wrote: ```suggestion if (BC && BC->LinkInModules(&M, ShouldLinkFiles)) ``` Here as well. https://github.com/llvm/llvm-project/pull/69371 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] Correctly link and optimize device libraries with -mlink-builtin-bitcode (PR #69371)
https://github.com/jhuber6 approved this pull request. I'm not entirely happy with the existence of this hack, but it's an ugly solution to an ugly self-inflicted problem, so I can live with it for now. https://github.com/llvm/llvm-project/pull/69371 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] Correctly link and optimize device libraries with -mlink-builtin-bitcode (PR #69371)
jhuber6 wrote: The code formatter says that it's not happy. Can you try `git clang-format HEAD~1` in your branch? https://github.com/llvm/llvm-project/pull/69371 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] Correctly link and optimize device libraries with -mlink-builtin-bitcode (PR #69371)
jhuber6 wrote: > I am getting this from the formatter: > > ``` > - void RunOptimizationPipeline(BackendAction Action, > - std::unique_ptr &OS, > - std::unique_ptr &ThinLinkOS, > - BackendConsumer *BC); > + void RunOptimizationPipeline( > + BackendAction Action, std::unique_ptr &OS, > + std::unique_ptr &ThinLinkOS, BackendConsumer > *BC); > ``` > > But in this case I am just following the existing style. I did notice a > couple of other improvements from the formatter though, and I've added those > changes. Just do what the formatter says, not every file is 100% clang-formatted so there's bits of old code that haven't been properly cleaned yet. This was the same line that I thought looked wrong so it should probably be fixed. Using `git clang-format HEAD~1` only formats what you've changed, so you don't need to worry about spurious edits. https://github.com/llvm/llvm-project/pull/69371 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] Correctly link and optimize device libraries with -mlink-builtin-bitcode (PR #69371)
jhuber6 wrote: > > Just do what the formatter says, not every file is 100% clang-formatted so > > there's bits of old code that haven't been properly cleaned yet. This was > > the same line that I thought looked wrong so it should probably be fixed. > > Using `git clang-format HEAD~1` only formats what you've changed, so you > > don't need to worry about spurious edits. > > Isn't the standard to follow the existing style, not re-format small sections > of code during a commit to a different style? > > [Always follow the golden rule: > > If you are extending, enhancing, or bug fixing already implemented code, use > the style that is already being used so that the source is uniform and easy > to follow.](https://llvm.org/docs/CodingStandards.html) Yes, but this doesn't really apply since you changed the function signature so it needs to be reformatted. That rule primarily applies to sections that have been manually formatted but don't exactly match the `clang-format` rules. Another reason we don't do bulk `clang-format` everywhere is because it confuses `git blame`. However, in this case there's really no reason not to. https://github.com/llvm/llvm-project/pull/69371 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] Correctly link and optimize device libraries with -mlink-builtin-bitcode (PR #69371)
jhuber6 wrote: > That said, I definitely don't want this to be a barrier to getting this patch > in, so if you still feel like we should go with the clang-format > recommendation, I'll change it and also update the EmitAssembly and > EmitBackendOutput signatures which were flagged by clang-format for the same > reasons. You should generally just go with what `clang-format` says unless there's a compelling reason not to. There's a reason the CI complains if `git clang-format HEAD~1` doesn't come back clean. https://github.com/llvm/llvm-project/pull/69371 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[llvm] [openmp] [clang] ReworkCtorDtor (PR #71739)
https://github.com/jhuber6 created https://github.com/llvm/llvm-project/pull/71739 - [NVPTX] Allow the ctor/dtor lowering pass to emit kernels - [OpenMP] Rework handling of global ctor/dtors in OpenMP From c1505a29d542bebd5c5e81d231e633c518b08caf Mon Sep 17 00:00:00 2001 From: Joseph Huber Date: Tue, 7 Nov 2023 09:19:51 -0600 Subject: [PATCH 1/2] [NVPTX] Allow the ctor/dtor lowering pass to emit kernels Summary: This pass emits the new "nvptx$device$init" and "nvptx$device$fini" kernels that are callable by the device. This intends to mimic the method of lowering for AMDGPU where we emit `amdgcn.device.init` and `amdgcn.device.fini` respectively. These kernels simply iterate the symbols `__init_array_start/stop` and `__fini_array_start/stop`. Normally, the linker provides these symbols automatically. In the AMDGPU case we only need to call the kernel, which calls the ctors / dtors. However, for NVPTX we require that the user initialize these variables to the associated globals that we already emit as a part of this pass. The motivation behind this change is to move away from OpenMP's handling of ctors / dtors. I would much prefer that the backend / runtime handles this. That allows us to handle ctors / dtors in a language-agnostic way. This approach requires that the runtime initialize the associated globals. They are marked `weak` so we can emit this per-TU. The kernel itself is `weak_odr` as it is copied exactly. One downside is that any module containing these kernels elicits the "stack size cannot be statically determined" warning from `nvlink` every time, which is annoying but inconsequential for functionality. It would be nice if there were a way to silence this warning, however. --- .../Target/NVPTX/NVPTXCtorDtorLowering.cpp| 162 +- llvm/test/CodeGen/NVPTX/lower-ctor-dtor.ll| 58 +++ 2 files changed, 213 insertions(+), 7 deletions(-) diff --git a/llvm/lib/Target/NVPTX/NVPTXCtorDtorLowering.cpp b/llvm/lib/Target/NVPTX/NVPTXCtorDtorLowering.cpp index ed7839cafe3a4ac..48221c210de1e3a 100644 --- a/llvm/lib/Target/NVPTX/NVPTXCtorDtorLowering.cpp +++ b/llvm/lib/Target/NVPTX/NVPTXCtorDtorLowering.cpp @@ -11,6 +11,7 @@ //===--===// #include "NVPTXCtorDtorLowering.h" +#include "MCTargetDesc/NVPTXBaseInfo.h" #include "NVPTX.h" #include "llvm/ADT/StringExtras.h" #include "llvm/IR/Constants.h" @@ -32,6 +33,11 @@ static cl::opt cl::desc("Override unique ID of ctor/dtor globals."), cl::init(""), cl::Hidden); +static cl::opt +CreateKernels("nvptx-lower-global-ctor-dtor-kernel", + cl::desc("Do not emit the init/fini kernels."), + cl::init(true), cl::Hidden); + namespace { static std::string getHash(StringRef Str) { @@ -42,11 +48,132 @@ static std::string getHash(StringRef Str) { return llvm::utohexstr(Hash.low(), /*LowerCase=*/true); } -static bool createInitOrFiniGlobls(Module &M, StringRef GlobalName, - bool IsCtor) { - GlobalVariable *GV = M.getGlobalVariable(GlobalName); - if (!GV || !GV->hasInitializer()) -return false; +static void addKernelMetadata(Module &M, GlobalValue *GV) { + llvm::LLVMContext &Ctx = M.getContext(); + + // Get "nvvm.annotations" metadata node. + llvm::NamedMDNode *MD = M.getOrInsertNamedMetadata("nvvm.annotations"); + + llvm::Metadata *KernelMDVals[] = { + llvm::ConstantAsMetadata::get(GV), llvm::MDString::get(Ctx, "kernel"), + llvm::ConstantAsMetadata::get( + llvm::ConstantInt::get(llvm::Type::getInt32Ty(Ctx), 1))}; + + // This kernel is only to be called single-threaded. 
+ llvm::Metadata *ThreadXMDVals[] = { + llvm::ConstantAsMetadata::get(GV), llvm::MDString::get(Ctx, "maxntidx"), + llvm::ConstantAsMetadata::get( + llvm::ConstantInt::get(llvm::Type::getInt32Ty(Ctx), 1))}; + llvm::Metadata *ThreadYMDVals[] = { + llvm::ConstantAsMetadata::get(GV), llvm::MDString::get(Ctx, "maxntidy"), + llvm::ConstantAsMetadata::get( + llvm::ConstantInt::get(llvm::Type::getInt32Ty(Ctx), 1))}; + llvm::Metadata *ThreadZMDVals[] = { + llvm::ConstantAsMetadata::get(GV), llvm::MDString::get(Ctx, "maxntidz"), + llvm::ConstantAsMetadata::get( + llvm::ConstantInt::get(llvm::Type::getInt32Ty(Ctx), 1))}; + + llvm::Metadata *BlockMDVals[] = { + llvm::ConstantAsMetadata::get(GV), + llvm::MDString::get(Ctx, "maxclusterrank"), + llvm::ConstantAsMetadata::get( + llvm::ConstantInt::get(llvm::Type::getInt32Ty(Ctx), 1))}; + + // Append metadata to nvvm.annotations. + MD->addOperand(llvm::MDNode::get(Ctx, KernelMDVals)); + MD->addOperand(llvm::MDNode::get(Ctx, ThreadXMDVals)); + MD->addOperand(llvm::MDNode::get(Ctx, ThreadYMDVals)); + MD->addOperand(llvm::MDNode::get(Ctx, ThreadZMDVals)); + MD->addOperand(llvm::MDNode::get(Ctx, BlockMDVals)); +} + +static Function *createInitOr
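For reference, the following is a conceptual C++ analogue of what the emitted "nvptx$device$init" kernel does; the actual output is LLVM IR/PTX, the function name below is only a stand-in for the real kernel symbol, and the wiring of the begin/end symbols is exactly the part the runtime is expected to provide.

```cpp
using CtorDtorFn = void (*)();

// Assumed to be pointed at the ctor array emitted by this pass; on NVPTX the
// runtime (not the linker) must initialize these two globals.
extern "C" CtorDtorFn __init_array_start[];
extern "C" CtorDtorFn __init_array_stop[];

// Stand-in for the "nvptx$device$init" kernel, which is launched with a
// single thread and simply walks the array. The fini kernel is symmetric.
extern "C" void nvptx_device_init() {
  for (CtorDtorFn *Fn = __init_array_start; Fn != __init_array_stop; ++Fn)
    (*Fn)();
}
```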
[llvm] [clang] [openmp] ReworkCtorDtor (PR #71739)
https://github.com/jhuber6 updated https://github.com/llvm/llvm-project/pull/71739 From 6be02dce45d672dd358bc277c97815cb201c4d0b Mon Sep 17 00:00:00 2001 From: Joseph Huber Date: Tue, 7 Nov 2023 17:12:31 -0600 Subject: [PATCH] [OpenMP] Rework handling of global ctor/dtors in OpenMP Summary: This patch reworks how we handle global constructors in OpenMP. Previously, we emitted individual kernels that were all registered and called individually. In order to provide more generic support, this patch moves all handling of this to the target backend and the runtime plugin. This has the benefit of supporting the GNU extensions for constructors and destructors, removing a class of failures related to shared library destruction order, and allows targets other than OpenMP to use the same support without needing to change the frontend. This is primarily done by calling kernels that the backend emits to iterate a list of ctor / dtor functions. For x64, this is automatic and we get it for free with the standard `dlopen` handling. For AMDGPU, we emit `amdgcn.device.init` and `amdgcn.device.fini` functions which handle everything automatically and simply need to be called. For NVPTX, a patch https://github.com/llvm/llvm-project/pull/71549 provides the kernels to call, but the runtime needs to set up the array manually by pulling out all the known constructor / destructor functions. One concession that this patch requires is the change that for GPU targets in OpenMP offloading we will use `llvm.global_dtors` instead of using `atexit`. This is because `atexit` is a separate runtime function that does not mesh well with the handling we're trying to do here. This should be equivalent in all cases except for cases where we would need to destruct manually such as: ``` struct S { ~S() { foo(); } }; void foo() { static S s; } ``` However this is broken in many other ways on the GPU, so it is not regressing any support, simply increasing the scope of what we can handle. This changes the handling of ctors / dtors. This patch now outputs an information message regarding the deprecation if the old format is used. This will be completely removed in a later release. 
Depends on: https://github.com/llvm/llvm-project/pull/71549 --- clang/lib/CodeGen/CGDeclCXX.cpp | 14 +- clang/lib/CodeGen/CGOpenMPRuntime.cpp | 130 -- clang/lib/CodeGen/CGOpenMPRuntime.h | 8 -- clang/lib/CodeGen/CodeGenFunction.h | 5 + clang/lib/CodeGen/CodeGenModule.h | 14 +- clang/lib/CodeGen/ItaniumCXXABI.cpp | 7 + .../amdgcn_openmp_device_math_constexpr.cpp | 48 +-- .../amdgcn_target_global_constructor.cpp | 45 -- clang/test/OpenMP/declare_target_codegen.cpp | 1 - ...x_declare_target_var_ctor_dtor_codegen.cpp | 35 + .../llvm/Frontend/OpenMP/OMPIRBuilder.h | 4 - llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp | 2 +- .../plugins-nextgen/amdgpu/src/rtl.cpp| 52 +++ .../common/PluginInterface/GlobalHandler.cpp | 22 +++ .../common/PluginInterface/GlobalHandler.h| 4 + .../PluginInterface/PluginInterface.cpp | 7 + .../common/PluginInterface/PluginInterface.h | 14 ++ .../plugins-nextgen/cuda/src/rtl.cpp | 109 +++ openmp/libomptarget/src/rtl.cpp | 9 +- 19 files changed, 319 insertions(+), 211 deletions(-) diff --git a/clang/lib/CodeGen/CGDeclCXX.cpp b/clang/lib/CodeGen/CGDeclCXX.cpp index 3fa28b343663f61..d816aa8554df8bb 100644 --- a/clang/lib/CodeGen/CGDeclCXX.cpp +++ b/clang/lib/CodeGen/CGDeclCXX.cpp @@ -22,6 +22,7 @@ #include "llvm/IR/Intrinsics.h" #include "llvm/IR/MDBuilder.h" #include "llvm/Support/Path.h" +#include "llvm/Transforms/Utils/ModuleUtils.h" using namespace clang; using namespace CodeGen; @@ -327,6 +328,15 @@ void CodeGenFunction::registerGlobalDtorWithAtExit(const VarDecl &VD, registerGlobalDtorWithAtExit(dtorStub); } +/// Register a global destructor using the C atexit runtime function. +void CodeGenFunction::registerGlobalDtorWithLLVM(const VarDecl &VD, + llvm::FunctionCallee Dtor, + llvm::Constant *Addr) { + // Create a function which calls the destructor. + llvm::Function *dtorStub = createAtExitStub(VD, Dtor, Addr); + CGM.AddGlobalDtor(dtorStub); +} + void CodeGenFunction::registerGlobalDtorWithAtExit(llvm::Constant *dtorStub) { // extern "C" int atexit(void (*f)(void)); assert(dtorStub->getType() == @@ -519,10 +529,6 @@ CodeGenModule::EmitCXXGlobalVarDeclInitFunc(const VarDecl *D, D->hasAttr())) return; - if (getLangOpts().OpenMP && - getOpenMPRuntime().emitDeclareTargetVarDefinition(D, Addr, PerformInit)) -return; - // Check if we've already initialized this decl. auto I = DelayedCXXInitPosition.find(D); if (I != DelayedCXXInitPosition.end() && I->second == ~0U) diff --git a/clang/li
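The `registerGlobalDtorWithLLVM` hook in the hunk above routes device-side destructors into `llvm.global_dtors` rather than `atexit`. LLVM already ships an IR-level utility for appending to that array, which is all the hook ultimately needs; here is a minimal sketch of the mechanism in isolation (the wrapper name is invented, and 65535 is just the conventional default priority).

```
#include "llvm/IR/Function.h"
#include "llvm/IR/Module.h"
#include "llvm/Transforms/Utils/ModuleUtils.h"

using namespace llvm;

// Append a destructor stub to the module's llvm.global_dtors array. The
// offloading machinery later runs these entries itself: via normal dlopen
// semantics on the host, or via the backend-emitted fini kernels on GPUs.
static void registerDtorViaGlobalArray(Module &M, Function *DtorStub) {
  appendToGlobalDtors(M, DtorStub, /*Priority=*/65535);
}
```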
[llvm] [openmp] [clang] ReworkCtorDtor (PR #71739)
https://github.com/jhuber6 edited https://github.com/llvm/llvm-project/pull/71739
[clang] [llvm] [openmp] ReworkCtorDtor (PR #71739)
https://github.com/jhuber6 updated https://github.com/llvm/llvm-project/pull/71739 >From a9f8285ecef2d43c6ccd87a1be9f795d566ed9e8 Mon Sep 17 00:00:00 2001 From: Joseph Huber Date: Tue, 7 Nov 2023 17:12:31 -0600 Subject: [PATCH] [OpenMP] Rework handling of global ctor/dtors in OpenMP Summary: This patch reworks how we handle global constructors in OpenMP. Previously, we emitted individual kernels that were all registered and called individually. In order to provide more generic support, this patch moves all handling of this to the target backend and the runtime plugin. This has the benefit of supporting the GNU extensions for constructors an destructors, removing a class of failures related to shared library destruction order, and allows targets other than OpenMP to use the same support without needing to change the frontend. This is primarily done by calling kernels that the backend emits to iterate a list of ctor / dtor functions. For x64, this is automatic and we get it for free with the standard `dlopen` handling. For AMDGPU, we emit `amdgcn.device.init` and `amdgcn.device.fini` functions which handle everything atuomatically and simply need to be called. For NVPTX, a patch https://github.com/llvm/llvm-project/pull/71549 provides the kernels to call, but the runtime needs to set up the array manually by pulling out all the known constructor / destructor functions. One concession that this patch requires is the change that for GPU targets in OpenMP offloading we will use `llvm.global_dtors` instead of using `atexit`. This is because `atexit` is a separate runtime function that does not mesh well with the handling we're trying to do here. This should be equivalent in all cases except for cases where we would need to destruct manually such as: ``` struct S { ~S() { foo(); } }; void foo() { static S s; } ``` However this is broken in many other ways on the GPU, so it is not regressing any support, simply increasing the scope of what we can handle. This changes the handling of ctors / dtors. This patch now outputs a information message regarding the deprecation if the old format is used. This will be completely removed in a later release. Depends on: https://github.com/llvm/llvm-project/pull/71549 --- clang/lib/CodeGen/CGDeclCXX.cpp | 13 +- clang/lib/CodeGen/CGOpenMPRuntime.cpp | 130 -- clang/lib/CodeGen/CGOpenMPRuntime.h | 8 -- clang/lib/CodeGen/CodeGenFunction.h | 5 + clang/lib/CodeGen/CodeGenModule.h | 14 +- clang/lib/CodeGen/ItaniumCXXABI.cpp | 7 + .../amdgcn_openmp_device_math_constexpr.cpp | 48 +-- .../amdgcn_target_global_constructor.cpp | 30 ++-- clang/test/OpenMP/declare_target_codegen.cpp | 1 - ...x_declare_target_var_ctor_dtor_codegen.cpp | 35 + .../llvm/Frontend/OpenMP/OMPIRBuilder.h | 4 - llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp | 2 +- .../plugins-nextgen/amdgpu/src/rtl.cpp| 52 +++ .../common/PluginInterface/GlobalHandler.h| 10 +- .../PluginInterface/PluginInterface.cpp | 7 + .../common/PluginInterface/PluginInterface.h | 14 ++ .../plugins-nextgen/cuda/src/rtl.cpp | 115 openmp/libomptarget/src/rtl.cpp | 6 + 18 files changed, 286 insertions(+), 215 deletions(-) diff --git a/clang/lib/CodeGen/CGDeclCXX.cpp b/clang/lib/CodeGen/CGDeclCXX.cpp index 3fa28b343663f61..bc77d4ed0851d4c 100644 --- a/clang/lib/CodeGen/CGDeclCXX.cpp +++ b/clang/lib/CodeGen/CGDeclCXX.cpp @@ -327,6 +327,15 @@ void CodeGenFunction::registerGlobalDtorWithAtExit(const VarDecl &VD, registerGlobalDtorWithAtExit(dtorStub); } +/// Register a global destructor using the C atexit runtime function. 
+void CodeGenFunction::registerGlobalDtorWithLLVM(const VarDecl &VD, + llvm::FunctionCallee Dtor, + llvm::Constant *Addr) { + // Create a function which calls the destructor. + llvm::Function *dtorStub = createAtExitStub(VD, Dtor, Addr); + CGM.AddGlobalDtor(dtorStub); +} + void CodeGenFunction::registerGlobalDtorWithAtExit(llvm::Constant *dtorStub) { // extern "C" int atexit(void (*f)(void)); assert(dtorStub->getType() == @@ -519,10 +528,6 @@ CodeGenModule::EmitCXXGlobalVarDeclInitFunc(const VarDecl *D, D->hasAttr())) return; - if (getLangOpts().OpenMP && - getOpenMPRuntime().emitDeclareTargetVarDefinition(D, Addr, PerformInit)) -return; - // Check if we've already initialized this decl. auto I = DelayedCXXInitPosition.find(D); if (I != DelayedCXXInitPosition.end() && I->second == ~0U) diff --git a/clang/lib/CodeGen/CGOpenMPRuntime.cpp b/clang/lib/CodeGen/CGOpenMPRuntime.cpp index a8e1150e44566b8..d2be8141a3a4b31 100644 --- a/clang/lib/CodeGen/CGOpenMPRuntime.cpp +++ b/clang/lib/CodeGen/CGOpenMPRuntime.cpp @@ -1747,136 +1747,6 @@ llvm::Function *CGOpenMPRuntime::emitThread
[clang] [llvm] [openmp] ReworkCtorDtor (PR #71739)
@@ -95,7 +95,7 @@ using namespace llvm; static cl::opt<bool> LowerCtorDtor("nvptx-lower-global-ctor-dtor", cl::desc("Lower GPU ctor / dtors to globals on the device."), - cl::init(false), cl::Hidden); + cl::init(true), cl::Hidden); jhuber6 wrote: This was the easiest way to get the desired effect. Passing `--nvptx-lower-global-ctor-dtor` is subtly broken because I think it will fail if the user didn't build with the NVPTX target. The OpenMP runtime is supposed to be buildable for NVPTX even without backend support, so I was worried it would degrade that. Do you think I could check for `openmp` module data and set it based on that? OpenMP always emits an `openmp` module flag. https://github.com/llvm/llvm-project/pull/71739
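If the module-flag approach works out, the check itself is tiny. A sketch under the assumption that the device module carries the "openmp" module flag clang records for OpenMP compilations (the helper name is hypothetical):

```
#include "llvm/IR/Module.h"

// True when the module was built for OpenMP, judged by the "openmp" module
// flag; the pass could derive its default from this instead of a cl::opt.
static bool isOpenMPModule(const llvm::Module &M) {
  return M.getModuleFlag("openmp") != nullptr;
}
```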
[openmp] [llvm] [clang] ReworkCtorDtor (PR #71739)
https://github.com/jhuber6 updated https://github.com/llvm/llvm-project/pull/71739 >From 159031c4c880e552ea90ec8ab6f6ed328c09ff10 Mon Sep 17 00:00:00 2001 From: Joseph Huber Date: Tue, 7 Nov 2023 17:12:31 -0600 Subject: [PATCH] [OpenMP] Rework handling of global ctor/dtors in OpenMP Summary: This patch reworks how we handle global constructors in OpenMP. Previously, we emitted individual kernels that were all registered and called individually. In order to provide more generic support, this patch moves all handling of this to the target backend and the runtime plugin. This has the benefit of supporting the GNU extensions for constructors an destructors, removing a class of failures related to shared library destruction order, and allows targets other than OpenMP to use the same support without needing to change the frontend. This is primarily done by calling kernels that the backend emits to iterate a list of ctor / dtor functions. For x64, this is automatic and we get it for free with the standard `dlopen` handling. For AMDGPU, we emit `amdgcn.device.init` and `amdgcn.device.fini` functions which handle everything atuomatically and simply need to be called. For NVPTX, a patch https://github.com/llvm/llvm-project/pull/71549 provides the kernels to call, but the runtime needs to set up the array manually by pulling out all the known constructor / destructor functions. One concession that this patch requires is the change that for GPU targets in OpenMP offloading we will use `llvm.global_dtors` instead of using `atexit`. This is because `atexit` is a separate runtime function that does not mesh well with the handling we're trying to do here. This should be equivalent in all cases except for cases where we would need to destruct manually such as: ``` struct S { ~S() { foo(); } }; void foo() { static S s; } ``` However this is broken in many other ways on the GPU, so it is not regressing any support, simply increasing the scope of what we can handle. This changes the handling of ctors / dtors. This patch now outputs a information message regarding the deprecation if the old format is used. This will be completely removed in a later release. Depends on: https://github.com/llvm/llvm-project/pull/71549 --- clang/lib/CodeGen/CGDeclCXX.cpp | 13 +- clang/lib/CodeGen/CGOpenMPRuntime.cpp | 130 -- clang/lib/CodeGen/CGOpenMPRuntime.h | 8 -- clang/lib/CodeGen/CodeGenFunction.h | 5 + clang/lib/CodeGen/CodeGenModule.h | 14 +- clang/lib/CodeGen/ItaniumCXXABI.cpp | 7 + .../amdgcn_openmp_device_math_constexpr.cpp | 48 +-- .../amdgcn_target_global_constructor.cpp | 30 ++-- clang/test/OpenMP/declare_target_codegen.cpp | 1 - ...x_declare_target_var_ctor_dtor_codegen.cpp | 35 + .../llvm/Frontend/OpenMP/OMPIRBuilder.h | 4 - llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp | 7 +- .../plugins-nextgen/amdgpu/src/rtl.cpp| 52 +++ .../common/PluginInterface/GlobalHandler.h| 10 +- .../PluginInterface/PluginInterface.cpp | 7 + .../common/PluginInterface/PluginInterface.h | 14 ++ .../plugins-nextgen/cuda/src/rtl.cpp | 115 openmp/libomptarget/src/rtl.cpp | 6 + 18 files changed, 290 insertions(+), 216 deletions(-) diff --git a/clang/lib/CodeGen/CGDeclCXX.cpp b/clang/lib/CodeGen/CGDeclCXX.cpp index 3fa28b343663f61..bc77d4ed0851d4c 100644 --- a/clang/lib/CodeGen/CGDeclCXX.cpp +++ b/clang/lib/CodeGen/CGDeclCXX.cpp @@ -327,6 +327,15 @@ void CodeGenFunction::registerGlobalDtorWithAtExit(const VarDecl &VD, registerGlobalDtorWithAtExit(dtorStub); } +/// Register a global destructor using the C atexit runtime function. 
+void CodeGenFunction::registerGlobalDtorWithLLVM(const VarDecl &VD, + llvm::FunctionCallee Dtor, + llvm::Constant *Addr) { + // Create a function which calls the destructor. + llvm::Function *dtorStub = createAtExitStub(VD, Dtor, Addr); + CGM.AddGlobalDtor(dtorStub); +} + void CodeGenFunction::registerGlobalDtorWithAtExit(llvm::Constant *dtorStub) { // extern "C" int atexit(void (*f)(void)); assert(dtorStub->getType() == @@ -519,10 +528,6 @@ CodeGenModule::EmitCXXGlobalVarDeclInitFunc(const VarDecl *D, D->hasAttr())) return; - if (getLangOpts().OpenMP && - getOpenMPRuntime().emitDeclareTargetVarDefinition(D, Addr, PerformInit)) -return; - // Check if we've already initialized this decl. auto I = DelayedCXXInitPosition.find(D); if (I != DelayedCXXInitPosition.end() && I->second == ~0U) diff --git a/clang/lib/CodeGen/CGOpenMPRuntime.cpp b/clang/lib/CodeGen/CGOpenMPRuntime.cpp index a8e1150e44566b8..d2be8141a3a4b31 100644 --- a/clang/lib/CodeGen/CGOpenMPRuntime.cpp +++ b/clang/lib/CodeGen/CGOpenMPRuntime.cpp @@ -1747,136 +1747,6 @@ llvm::Function *CGOpenMPRuntime::emitThread
[openmp] [llvm] [clang] [OpenMP] Rework handling of global ctor/dtors in OpenMP (PR #71739)
https://github.com/jhuber6 edited https://github.com/llvm/llvm-project/pull/71739
[openmp] [llvm] [clang] [OpenMP] Rework handling of global ctor/dtors in OpenMP (PR #71739)
@@ -95,7 +95,7 @@ using namespace llvm; static cl::opt<bool> LowerCtorDtor("nvptx-lower-global-ctor-dtor", cl::desc("Lower GPU ctor / dtors to globals on the device."), - cl::init(false), cl::Hidden); + cl::init(true), cl::Hidden); jhuber6 wrote: Done https://github.com/llvm/llvm-project/pull/71739
[clang] [llvm] [openmp] [OpenMP] Rework handling of global ctor/dtors in OpenMP (PR #71739)
https://github.com/jhuber6 updated https://github.com/llvm/llvm-project/pull/71739 >From 07a74b4561f2eb4f8debd40c7c2313da7b7fb0eb Mon Sep 17 00:00:00 2001 From: Joseph Huber Date: Tue, 7 Nov 2023 17:12:31 -0600 Subject: [PATCH] [OpenMP] Rework handling of global ctor/dtors in OpenMP Summary: This patch reworks how we handle global constructors in OpenMP. Previously, we emitted individual kernels that were all registered and called individually. In order to provide more generic support, this patch moves all handling of this to the target backend and the runtime plugin. This has the benefit of supporting the GNU extensions for constructors an destructors, removing a class of failures related to shared library destruction order, and allows targets other than OpenMP to use the same support without needing to change the frontend. This is primarily done by calling kernels that the backend emits to iterate a list of ctor / dtor functions. For x64, this is automatic and we get it for free with the standard `dlopen` handling. For AMDGPU, we emit `amdgcn.device.init` and `amdgcn.device.fini` functions which handle everything atuomatically and simply need to be called. For NVPTX, a patch https://github.com/llvm/llvm-project/pull/71549 provides the kernels to call, but the runtime needs to set up the array manually by pulling out all the known constructor / destructor functions. One concession that this patch requires is the change that for GPU targets in OpenMP offloading we will use `llvm.global_dtors` instead of using `atexit`. This is because `atexit` is a separate runtime function that does not mesh well with the handling we're trying to do here. This should be equivalent in all cases except for cases where we would need to destruct manually such as: ``` struct S { ~S() { foo(); } }; void foo() { static S s; } ``` However this is broken in many other ways on the GPU, so it is not regressing any support, simply increasing the scope of what we can handle. This changes the handling of ctors / dtors. This patch now outputs a information message regarding the deprecation if the old format is used. This will be completely removed in a later release. Depends on: https://github.com/llvm/llvm-project/pull/71549 --- clang/lib/CodeGen/CGDeclCXX.cpp | 13 +- clang/lib/CodeGen/CGOpenMPRuntime.cpp | 130 -- clang/lib/CodeGen/CGOpenMPRuntime.h | 8 -- clang/lib/CodeGen/CodeGenFunction.h | 5 + clang/lib/CodeGen/CodeGenModule.h | 14 +- clang/lib/CodeGen/ItaniumCXXABI.cpp | 7 + .../amdgcn_openmp_device_math_constexpr.cpp | 48 +-- .../amdgcn_target_global_constructor.cpp | 30 ++-- clang/test/OpenMP/declare_target_codegen.cpp | 1 - ...x_declare_target_var_ctor_dtor_codegen.cpp | 35 + .../llvm/Frontend/OpenMP/OMPIRBuilder.h | 4 - llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp | 7 +- .../plugins-nextgen/amdgpu/src/rtl.cpp| 52 +++ .../common/PluginInterface/GlobalHandler.h| 10 +- .../PluginInterface/PluginInterface.cpp | 7 + .../common/PluginInterface/PluginInterface.h | 14 ++ .../plugins-nextgen/cuda/src/rtl.cpp | 115 openmp/libomptarget/src/rtl.cpp | 6 + 18 files changed, 290 insertions(+), 216 deletions(-) diff --git a/clang/lib/CodeGen/CGDeclCXX.cpp b/clang/lib/CodeGen/CGDeclCXX.cpp index 3fa28b343663f61..e08a1e5f42df20c 100644 --- a/clang/lib/CodeGen/CGDeclCXX.cpp +++ b/clang/lib/CodeGen/CGDeclCXX.cpp @@ -327,6 +327,15 @@ void CodeGenFunction::registerGlobalDtorWithAtExit(const VarDecl &VD, registerGlobalDtorWithAtExit(dtorStub); } +/// Register a global destructor using the LLVM 'llvm.global_dtors' global. 
+void CodeGenFunction::registerGlobalDtorWithLLVM(const VarDecl &VD, + llvm::FunctionCallee Dtor, + llvm::Constant *Addr) { + // Create a function which calls the destructor. + llvm::Function *dtorStub = createAtExitStub(VD, Dtor, Addr); + CGM.AddGlobalDtor(dtorStub); +} + void CodeGenFunction::registerGlobalDtorWithAtExit(llvm::Constant *dtorStub) { // extern "C" int atexit(void (*f)(void)); assert(dtorStub->getType() == @@ -519,10 +528,6 @@ CodeGenModule::EmitCXXGlobalVarDeclInitFunc(const VarDecl *D, D->hasAttr())) return; - if (getLangOpts().OpenMP && - getOpenMPRuntime().emitDeclareTargetVarDefinition(D, Addr, PerformInit)) -return; - // Check if we've already initialized this decl. auto I = DelayedCXXInitPosition.find(D); if (I != DelayedCXXInitPosition.end() && I->second == ~0U) diff --git a/clang/lib/CodeGen/CGOpenMPRuntime.cpp b/clang/lib/CodeGen/CGOpenMPRuntime.cpp index a8e1150e44566b8..d2be8141a3a4b31 100644 --- a/clang/lib/CodeGen/CGOpenMPRuntime.cpp +++ b/clang/lib/CodeGen/CGOpenMPRuntime.cpp @@ -1747,136 +1747,6 @@ llvm::Function *CGOpenMPRuntime::emit
[openmp] [clang] [llvm] [OpenMP] Rework handling of global ctor/dtors in OpenMP (PR #71739)
https://github.com/jhuber6 updated https://github.com/llvm/llvm-project/pull/71739 >From 5e378ae3efdebedb044528167131c8cae4571a59 Mon Sep 17 00:00:00 2001 From: Joseph Huber Date: Tue, 7 Nov 2023 17:12:31 -0600 Subject: [PATCH] [OpenMP] Rework handling of global ctor/dtors in OpenMP Summary: This patch reworks how we handle global constructors in OpenMP. Previously, we emitted individual kernels that were all registered and called individually. In order to provide more generic support, this patch moves all handling of this to the target backend and the runtime plugin. This has the benefit of supporting the GNU extensions for constructors an destructors, removing a class of failures related to shared library destruction order, and allows targets other than OpenMP to use the same support without needing to change the frontend. This is primarily done by calling kernels that the backend emits to iterate a list of ctor / dtor functions. For x64, this is automatic and we get it for free with the standard `dlopen` handling. For AMDGPU, we emit `amdgcn.device.init` and `amdgcn.device.fini` functions which handle everything atuomatically and simply need to be called. For NVPTX, a patch https://github.com/llvm/llvm-project/pull/71549 provides the kernels to call, but the runtime needs to set up the array manually by pulling out all the known constructor / destructor functions. One concession that this patch requires is the change that for GPU targets in OpenMP offloading we will use `llvm.global_dtors` instead of using `atexit`. This is because `atexit` is a separate runtime function that does not mesh well with the handling we're trying to do here. This should be equivalent in all cases except for cases where we would need to destruct manually such as: ``` struct S { ~S() { foo(); } }; void foo() { static S s; } ``` However this is broken in many other ways on the GPU, so it is not regressing any support, simply increasing the scope of what we can handle. This changes the handling of ctors / dtors. This patch now outputs a information message regarding the deprecation if the old format is used. This will be completely removed in a later release. Depends on: https://github.com/llvm/llvm-project/pull/71549 --- clang/lib/CodeGen/CGDeclCXX.cpp | 13 +- clang/lib/CodeGen/CGOpenMPRuntime.cpp | 130 -- clang/lib/CodeGen/CGOpenMPRuntime.h | 8 -- clang/lib/CodeGen/CodeGenFunction.h | 5 + clang/lib/CodeGen/CodeGenModule.h | 14 +- clang/lib/CodeGen/ItaniumCXXABI.cpp | 8 ++ .../amdgcn_openmp_device_math_constexpr.cpp | 48 +-- .../amdgcn_target_global_constructor.cpp | 30 ++-- clang/test/OpenMP/declare_target_codegen.cpp | 1 - ...x_declare_target_var_ctor_dtor_codegen.cpp | 35 + .../llvm/Frontend/OpenMP/OMPIRBuilder.h | 4 - llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp | 7 +- .../plugins-nextgen/amdgpu/src/rtl.cpp| 52 +++ .../common/PluginInterface/GlobalHandler.h| 10 +- .../PluginInterface/PluginInterface.cpp | 7 + .../common/PluginInterface/PluginInterface.h | 14 ++ .../plugins-nextgen/cuda/src/rtl.cpp | 115 openmp/libomptarget/src/rtl.cpp | 6 + 18 files changed, 291 insertions(+), 216 deletions(-) diff --git a/clang/lib/CodeGen/CGDeclCXX.cpp b/clang/lib/CodeGen/CGDeclCXX.cpp index 3fa28b343663f61..e08a1e5f42df20c 100644 --- a/clang/lib/CodeGen/CGDeclCXX.cpp +++ b/clang/lib/CodeGen/CGDeclCXX.cpp @@ -327,6 +327,15 @@ void CodeGenFunction::registerGlobalDtorWithAtExit(const VarDecl &VD, registerGlobalDtorWithAtExit(dtorStub); } +/// Register a global destructor using the LLVM 'llvm.global_dtors' global. 
+void CodeGenFunction::registerGlobalDtorWithLLVM(const VarDecl &VD, + llvm::FunctionCallee Dtor, + llvm::Constant *Addr) { + // Create a function which calls the destructor. + llvm::Function *dtorStub = createAtExitStub(VD, Dtor, Addr); + CGM.AddGlobalDtor(dtorStub); +} + void CodeGenFunction::registerGlobalDtorWithAtExit(llvm::Constant *dtorStub) { // extern "C" int atexit(void (*f)(void)); assert(dtorStub->getType() == @@ -519,10 +528,6 @@ CodeGenModule::EmitCXXGlobalVarDeclInitFunc(const VarDecl *D, D->hasAttr())) return; - if (getLangOpts().OpenMP && - getOpenMPRuntime().emitDeclareTargetVarDefinition(D, Addr, PerformInit)) -return; - // Check if we've already initialized this decl. auto I = DelayedCXXInitPosition.find(D); if (I != DelayedCXXInitPosition.end() && I->second == ~0U) diff --git a/clang/lib/CodeGen/CGOpenMPRuntime.cpp b/clang/lib/CodeGen/CGOpenMPRuntime.cpp index a8e1150e44566b8..d2be8141a3a4b31 100644 --- a/clang/lib/CodeGen/CGOpenMPRuntime.cpp +++ b/clang/lib/CodeGen/CGOpenMPRuntime.cpp @@ -1747,136 +1747,6 @@ llvm::Function *CGOpenMPRuntime::emi
[llvm] [openmp] [clang] [OpenMP] Rework handling of global ctor/dtors in OpenMP (PR #71739)
@@ -2627,6 +2637,48 @@ struct AMDGPUDeviceTy : public GenericDeviceTy, AMDGenericDeviceTy { using AMDGPUEventRef = AMDGPUResourceRef; using AMDGPUEventManagerTy = GenericDeviceResourceManagerTy; + /// Common method to invoke a single threaded constructor or destructor + /// kernel by name. + Error callGlobalCtorDtorCommon(GenericPluginTy &Plugin, DeviceImageTy &Image, + const char *Name) { +// Perform a quick check for the named kernel in the image. The kernel +// should be created by the 'amdgpu-lower-ctor-dtor' pass. +GenericGlobalHandlerTy &Handler = Plugin.getGlobalHandler(); +GlobalTy Global(Name, sizeof(void *)); +if (auto Err = Handler.getGlobalMetadataFromImage(*this, Image, Global)) { + consumeError(std::move(Err)); + return Error::success(); jhuber6 wrote: If there were any global ctors / dtors the backend will emit a kernel. This is simply encoding "Does this symbol exist? If not continue on". We check the ELF symbol table directly as it's more efficient than going through the device API. We probably need to encode the logic better, since `consumeError` is a bit of a code smell. Maybe a helper function like `Handler.hasGlobal` or something. https://github.com/llvm/llvm-project/pull/71739 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
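One possible shape for the suggested helper, sketched against the types visible in this hunk; the name and exact signature are assumptions, and the point is only to hide the `consumeError` dance behind a boolean query.

```
#include "llvm/Support/Error.h"

// Answer "does the image define this symbol?" without surfacing an Error to
// every caller, since a missing ctor/dtor kernel is an expected case.
static bool imageHasSymbol(GenericGlobalHandlerTy &Handler,
                           GenericDeviceTy &Device, DeviceImageTy &Image,
                           const char *Name) {
  GlobalTy Global(Name, sizeof(void *));
  if (auto Err = Handler.getGlobalMetadataFromImage(Device, Image, Global)) {
    llvm::consumeError(std::move(Err));
    return false;
  }
  return true;
}
```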
[llvm] [openmp] [clang] [OpenMP] Rework handling of global ctor/dtors in OpenMP (PR #71739)
@@ -2627,6 +2637,48 @@ struct AMDGPUDeviceTy : public GenericDeviceTy, AMDGenericDeviceTy { using AMDGPUEventRef = AMDGPUResourceRef; using AMDGPUEventManagerTy = GenericDeviceResourceManagerTy; + /// Common method to invoke a single threaded constructor or destructor + /// kernel by name. + Error callGlobalCtorDtorCommon(GenericPluginTy &Plugin, DeviceImageTy &Image, + const char *Name) { +// Perform a quick check for the named kernel in the image. The kernel +// should be created by the 'amdgpu-lower-ctor-dtor' pass. +GenericGlobalHandlerTy &Handler = Plugin.getGlobalHandler(); +GlobalTy Global(Name, sizeof(void *)); +if (auto Err = Handler.getGlobalMetadataFromImage(*this, Image, Global)) { + consumeError(std::move(Err)); + return Error::success(); +} + +// Allocate and construct the AMDGPU kernel. +GenericKernelTy *AMDGPUKernel = Plugin.allocate(); +if (!AMDGPUKernel) + return Plugin::error("Failed to allocate memory for AMDGPU kernel"); + +new (AMDGPUKernel) AMDGPUKernelTy(Name); +if (auto Err = AMDGPUKernel->initImpl(*this, Image)) + return std::move(Err); + +auto *AsyncInfoPtr = Plugin.allocate<__tgt_async_info>(); +AsyncInfoWrapperTy AsyncInfoWrapper(*this, AsyncInfoPtr); + +if (auto Err = initAsyncInfoImpl(AsyncInfoWrapper)) + return std::move(Err); + +KernelArgsTy KernelArgs = {}; +if (auto Err = AMDGPUKernel->launchImpl(*this, /*NumThread=*/1u, +/*NumBlocks=*/1ul, KernelArgs, +/*Args=*/nullptr, AsyncInfoWrapper)) + return std::move(Err); + +if (auto Err = synchronize(AsyncInfoPtr)) + return std::move(Err); +Error Err = Error::success(); jhuber6 wrote: Yes https://github.com/llvm/llvm-project/pull/71739 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[llvm] [openmp] [clang] [OpenMP] Rework handling of global ctor/dtors in OpenMP (PR #71739)
https://github.com/jhuber6 updated https://github.com/llvm/llvm-project/pull/71739 >From 0a1f4b5d514a5e1525e3178a80f6e8f5638bfb69 Mon Sep 17 00:00:00 2001 From: Joseph Huber Date: Tue, 7 Nov 2023 17:12:31 -0600 Subject: [PATCH] [OpenMP] Rework handling of global ctor/dtors in OpenMP Summary: This patch reworks how we handle global constructors in OpenMP. Previously, we emitted individual kernels that were all registered and called individually. In order to provide more generic support, this patch moves all handling of this to the target backend and the runtime plugin. This has the benefit of supporting the GNU extensions for constructors an destructors, removing a class of failures related to shared library destruction order, and allows targets other than OpenMP to use the same support without needing to change the frontend. This is primarily done by calling kernels that the backend emits to iterate a list of ctor / dtor functions. For x64, this is automatic and we get it for free with the standard `dlopen` handling. For AMDGPU, we emit `amdgcn.device.init` and `amdgcn.device.fini` functions which handle everything atuomatically and simply need to be called. For NVPTX, a patch https://github.com/llvm/llvm-project/pull/71549 provides the kernels to call, but the runtime needs to set up the array manually by pulling out all the known constructor / destructor functions. One concession that this patch requires is the change that for GPU targets in OpenMP offloading we will use `llvm.global_dtors` instead of using `atexit`. This is because `atexit` is a separate runtime function that does not mesh well with the handling we're trying to do here. This should be equivalent in all cases except for cases where we would need to destruct manually such as: ``` struct S { ~S() { foo(); } }; void foo() { static S s; } ``` However this is broken in many other ways on the GPU, so it is not regressing any support, simply increasing the scope of what we can handle. This changes the handling of ctors / dtors. This patch now outputs a information message regarding the deprecation if the old format is used. This will be completely removed in a later release. Depends on: https://github.com/llvm/llvm-project/pull/71549 --- clang/lib/CodeGen/CGDeclCXX.cpp | 13 +- clang/lib/CodeGen/CGOpenMPRuntime.cpp | 130 -- clang/lib/CodeGen/CGOpenMPRuntime.h | 8 -- clang/lib/CodeGen/CodeGenFunction.h | 5 + clang/lib/CodeGen/CodeGenModule.h | 14 +- clang/lib/CodeGen/ItaniumCXXABI.cpp | 8 ++ .../amdgcn_openmp_device_math_constexpr.cpp | 48 +-- .../amdgcn_target_global_constructor.cpp | 30 ++-- clang/test/OpenMP/declare_target_codegen.cpp | 1 - ...x_declare_target_var_ctor_dtor_codegen.cpp | 35 + .../llvm/Frontend/OpenMP/OMPIRBuilder.h | 4 - llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp | 7 +- .../plugins-nextgen/amdgpu/src/rtl.cpp| 52 +++ .../common/PluginInterface/GlobalHandler.h| 10 +- .../PluginInterface/PluginInterface.cpp | 7 + .../common/PluginInterface/PluginInterface.h | 14 ++ .../plugins-nextgen/cuda/src/rtl.cpp | 115 openmp/libomptarget/src/rtl.cpp | 6 + 18 files changed, 291 insertions(+), 216 deletions(-) diff --git a/clang/lib/CodeGen/CGDeclCXX.cpp b/clang/lib/CodeGen/CGDeclCXX.cpp index 3fa28b343663f61..e08a1e5f42df20c 100644 --- a/clang/lib/CodeGen/CGDeclCXX.cpp +++ b/clang/lib/CodeGen/CGDeclCXX.cpp @@ -327,6 +327,15 @@ void CodeGenFunction::registerGlobalDtorWithAtExit(const VarDecl &VD, registerGlobalDtorWithAtExit(dtorStub); } +/// Register a global destructor using the LLVM 'llvm.global_dtors' global. 
+void CodeGenFunction::registerGlobalDtorWithLLVM(const VarDecl &VD, + llvm::FunctionCallee Dtor, + llvm::Constant *Addr) { + // Create a function which calls the destructor. + llvm::Function *dtorStub = createAtExitStub(VD, Dtor, Addr); + CGM.AddGlobalDtor(dtorStub); +} + void CodeGenFunction::registerGlobalDtorWithAtExit(llvm::Constant *dtorStub) { // extern "C" int atexit(void (*f)(void)); assert(dtorStub->getType() == @@ -519,10 +528,6 @@ CodeGenModule::EmitCXXGlobalVarDeclInitFunc(const VarDecl *D, D->hasAttr())) return; - if (getLangOpts().OpenMP && - getOpenMPRuntime().emitDeclareTargetVarDefinition(D, Addr, PerformInit)) -return; - // Check if we've already initialized this decl. auto I = DelayedCXXInitPosition.find(D); if (I != DelayedCXXInitPosition.end() && I->second == ~0U) diff --git a/clang/lib/CodeGen/CGOpenMPRuntime.cpp b/clang/lib/CodeGen/CGOpenMPRuntime.cpp index a8e1150e44566b8..d2be8141a3a4b31 100644 --- a/clang/lib/CodeGen/CGOpenMPRuntime.cpp +++ b/clang/lib/CodeGen/CGOpenMPRuntime.cpp @@ -1747,136 +1747,6 @@ llvm::Function *CGOpenMPRuntime::emi
[clang] [llvm] [openmp] [OpenMP] Rework handling of global ctor/dtors in OpenMP (PR #71739)
@@ -671,6 +671,20 @@ struct GenericDeviceTy : public DeviceAllocatorTy { Error synchronize(__tgt_async_info *AsyncInfo); virtual Error synchronizeImpl(__tgt_async_info &AsyncInfo) = 0; + /// Invokes any global constructors on the device if present and is required + /// by the target. + virtual Error callGlobalConstructors(GenericPluginTy &Plugin, + DeviceImageTy &Image) { +return Error::success(); jhuber6 wrote: This code is in the header above the definition of the `Plugin` class, so we can't use that without a complete reordering. https://github.com/llvm/llvm-project/pull/71739
[clang] [llvm] [openmp] [OpenMP] Rework handling of global ctor/dtors in OpenMP (PR #71739)
@@ -2794,6 +2794,14 @@ void ItaniumCXXABI::registerGlobalDtor(CodeGenFunction &CGF, const VarDecl &D, if (D.isNoDestroy(CGM.getContext())) return; + // OpenMP offloading supports C++ constructors and destructors but we do not + // always have 'atexit' available. Instead lower these to use the LLVM global + // destructors which we can handle directly in the runtime. + if (CGM.getLangOpts().OpenMP && CGM.getLangOpts().OpenMPIsTargetDevice && + !D.isStaticLocal() && + (CGM.getTriple().isAMDGPU() || CGM.getTriple().isNVPTX())) jhuber6 wrote: Yeah, these types of things are problematic especially if we consider getting SPIR-V support eventually. The logic basically goes like this. OpenMP supports global destructors but does not always support the `atexit` function. The old logic used to replace everything. This now at least lets CPU based targets use regular handling. I could make this unconditional for OpenMP, but I figured it'd be better to allow the CPU based targets to use the regular handling. More or less this is just a concession to prevent regressions from this patch. The old logic looked like this, which did this unconditionally. Like I said, could remove the AMD and PTX checks and just do this on the CPU as well if it would be better. ```c++ if (CGM.getLangOpts().OMPTargetTriples.empty() && !CGM.getLangOpts().OpenMPIsTargetDevice) return false; ``` https://github.com/llvm/llvm-project/pull/71739 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[llvm] [openmp] [clang] [OpenMP] Rework handling of global ctor/dtors in OpenMP (PR #71739)
https://github.com/jhuber6 updated https://github.com/llvm/llvm-project/pull/71739 >From 5283c5e08877b11a0eece51ca3877c9f5f8c7b82 Mon Sep 17 00:00:00 2001 From: Joseph Huber Date: Tue, 7 Nov 2023 17:12:31 -0600 Subject: [PATCH] [OpenMP] Rework handling of global ctor/dtors in OpenMP Summary: This patch reworks how we handle global constructors in OpenMP. Previously, we emitted individual kernels that were all registered and called individually. In order to provide more generic support, this patch moves all handling of this to the target backend and the runtime plugin. This has the benefit of supporting the GNU extensions for constructors an destructors, removing a class of failures related to shared library destruction order, and allows targets other than OpenMP to use the same support without needing to change the frontend. This is primarily done by calling kernels that the backend emits to iterate a list of ctor / dtor functions. For x64, this is automatic and we get it for free with the standard `dlopen` handling. For AMDGPU, we emit `amdgcn.device.init` and `amdgcn.device.fini` functions which handle everything atuomatically and simply need to be called. For NVPTX, a patch https://github.com/llvm/llvm-project/pull/71549 provides the kernels to call, but the runtime needs to set up the array manually by pulling out all the known constructor / destructor functions. One concession that this patch requires is the change that for GPU targets in OpenMP offloading we will use `llvm.global_dtors` instead of using `atexit`. This is because `atexit` is a separate runtime function that does not mesh well with the handling we're trying to do here. This should be equivalent in all cases except for cases where we would need to destruct manually such as: ``` struct S { ~S() { foo(); } }; void foo() { static S s; } ``` However this is broken in many other ways on the GPU, so it is not regressing any support, simply increasing the scope of what we can handle. This changes the handling of ctors / dtors. This patch now outputs a information message regarding the deprecation if the old format is used. This will be completely removed in a later release. Depends on: https://github.com/llvm/llvm-project/pull/71549 --- clang/lib/CodeGen/CGDeclCXX.cpp | 13 +- clang/lib/CodeGen/CGOpenMPRuntime.cpp | 130 -- clang/lib/CodeGen/CGOpenMPRuntime.h | 8 -- clang/lib/CodeGen/CodeGenFunction.h | 5 + clang/lib/CodeGen/CodeGenModule.h | 14 +- clang/lib/CodeGen/ItaniumCXXABI.cpp | 7 + .../amdgcn_openmp_device_math_constexpr.cpp | 48 +-- .../amdgcn_target_global_constructor.cpp | 30 ++-- clang/test/OpenMP/declare_target_codegen.cpp | 1 - ...x_declare_target_var_ctor_dtor_codegen.cpp | 35 + .../llvm/Frontend/OpenMP/OMPIRBuilder.h | 4 - llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp | 7 +- .../plugins-nextgen/amdgpu/src/rtl.cpp| 52 +++ .../common/PluginInterface/GlobalHandler.h| 10 +- .../PluginInterface/PluginInterface.cpp | 7 + .../common/PluginInterface/PluginInterface.h | 14 ++ .../plugins-nextgen/cuda/src/rtl.cpp | 115 openmp/libomptarget/src/rtl.cpp | 6 + 18 files changed, 290 insertions(+), 216 deletions(-) diff --git a/clang/lib/CodeGen/CGDeclCXX.cpp b/clang/lib/CodeGen/CGDeclCXX.cpp index 3fa28b343663f61..e08a1e5f42df20c 100644 --- a/clang/lib/CodeGen/CGDeclCXX.cpp +++ b/clang/lib/CodeGen/CGDeclCXX.cpp @@ -327,6 +327,15 @@ void CodeGenFunction::registerGlobalDtorWithAtExit(const VarDecl &VD, registerGlobalDtorWithAtExit(dtorStub); } +/// Register a global destructor using the LLVM 'llvm.global_dtors' global. 
+void CodeGenFunction::registerGlobalDtorWithLLVM(const VarDecl &VD, + llvm::FunctionCallee Dtor, + llvm::Constant *Addr) { + // Create a function which calls the destructor. + llvm::Function *dtorStub = createAtExitStub(VD, Dtor, Addr); + CGM.AddGlobalDtor(dtorStub); +} + void CodeGenFunction::registerGlobalDtorWithAtExit(llvm::Constant *dtorStub) { // extern "C" int atexit(void (*f)(void)); assert(dtorStub->getType() == @@ -519,10 +528,6 @@ CodeGenModule::EmitCXXGlobalVarDeclInitFunc(const VarDecl *D, D->hasAttr())) return; - if (getLangOpts().OpenMP && - getOpenMPRuntime().emitDeclareTargetVarDefinition(D, Addr, PerformInit)) -return; - // Check if we've already initialized this decl. auto I = DelayedCXXInitPosition.find(D); if (I != DelayedCXXInitPosition.end() && I->second == ~0U) diff --git a/clang/lib/CodeGen/CGOpenMPRuntime.cpp b/clang/lib/CodeGen/CGOpenMPRuntime.cpp index a8e1150e44566b8..d2be8141a3a4b31 100644 --- a/clang/lib/CodeGen/CGOpenMPRuntime.cpp +++ b/clang/lib/CodeGen/CGOpenMPRuntime.cpp @@ -1747,136 +1747,6 @@ llvm::Function *CGOpenMPRuntime::emit
[openmp] [llvm] [clang] [OpenMP] Rework handling of global ctor/dtors in OpenMP (PR #71739)
@@ -2794,6 +2794,14 @@ void ItaniumCXXABI::registerGlobalDtor(CodeGenFunction &CGF, const VarDecl &D, if (D.isNoDestroy(CGM.getContext())) return; + // OpenMP offloading supports C++ constructors and destructors but we do not + // always have 'atexit' available. Instead lower these to use the LLVM global + // destructors which we can handle directly in the runtime. + if (CGM.getLangOpts().OpenMP && CGM.getLangOpts().OpenMPIsTargetDevice && + !D.isStaticLocal() && + (CGM.getTriple().isAMDGPU() || CGM.getTriple().isNVPTX())) jhuber6 wrote: Just make this apply to all triples. I don't want to remove the dependency on the OpenMP language because this is somewhat of a hack. We can revisit this later if needed. https://github.com/llvm/llvm-project/pull/71739 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [CodeGen] Implement post-opt linking option for builtin bitocdes (PR #69371)
@@ -98,6 +100,11 @@ extern cl::opt PrintPipelinePasses; static cl::opt ClSanitizeOnOptimizerEarlyEP( "sanitizer-early-opt-ep", cl::Optional, cl::desc("Insert sanitizers on OptimizerEarlyEP."), cl::init(false)); + +// Re-link builtin bitcodes after optimization +static cl::opt ClRelinkBuiltinBitcodePostop( +"relink-builtin-bitcode-postop", cl::Optional, +cl::desc("Re-link builtin bitcodes after optimization."), cl::init(false)); jhuber6 wrote: That's a clang flag, this is presumably more of an LLVM one because this added a new pass that lives in Clang. I still think the solution to this was to just stop the backend from doing this optimization if it will obviously break it, but supposedly that caused performance regressions. https://github.com/llvm/llvm-project/pull/69371 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[openmp] [llvm] [clang] [OpenMP] Rework handling of global ctor/dtors in OpenMP (PR #71739)
jhuber6 wrote: Just noticed I'm actually calling the destructors backwards in AMDGPU. Will fix that. https://github.com/llvm/llvm-project/pull/71739
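For context on why the order matters: destructors are expected to run in the reverse of construction order, so the plugin should walk its sorted array of destructor stubs from the back. Purely illustrative:

```
using FiniFuncTy = void (*)();

// Run the collected destructor stubs last-to-first so destruction mirrors the
// order in which the corresponding constructors ran.
static void runDtorsInReverse(FiniFuncTy *Begin, FiniFuncTy *End) {
  while (End != Begin)
    (*--End)();
}
```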
[clang] [llvm] [openmp] [OpenMP] Rework handling of global ctor/dtors in OpenMP (PR #71739)
@@ -2794,6 +2794,14 @@ void ItaniumCXXABI::registerGlobalDtor(CodeGenFunction &CGF, const VarDecl &D, if (D.isNoDestroy(CGM.getContext())) return; + // OpenMP offloading supports C++ constructors and destructors but we do not + // always have 'atexit' available. Instead lower these to use the LLVM global + // destructors which we can handle directly in the runtime. + if (CGM.getLangOpts().OpenMP && CGM.getLangOpts().OpenMPIsTargetDevice && + !D.isStaticLocal() && + (CGM.getTriple().isAMDGPU() || CGM.getTriple().isNVPTX())) jhuber6 wrote: So just some random helper function like "Does target support X?" https://github.com/llvm/llvm-project/pull/71739 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[openmp] [llvm] [clang] [OpenMP] Rework handling of global ctor/dtors in OpenMP (PR #71739)
https://github.com/jhuber6 updated https://github.com/llvm/llvm-project/pull/71739 >From c3df637dd2cb9a5210cb90a3bb69a63c31236039 Mon Sep 17 00:00:00 2001 From: Joseph Huber Date: Tue, 7 Nov 2023 17:12:31 -0600 Subject: [PATCH] [OpenMP] Rework handling of global ctor/dtors in OpenMP Summary: This patch reworks how we handle global constructors in OpenMP. Previously, we emitted individual kernels that were all registered and called individually. In order to provide more generic support, this patch moves all handling of this to the target backend and the runtime plugin. This has the benefit of supporting the GNU extensions for constructors an destructors, removing a class of failures related to shared library destruction order, and allows targets other than OpenMP to use the same support without needing to change the frontend. This is primarily done by calling kernels that the backend emits to iterate a list of ctor / dtor functions. For x64, this is automatic and we get it for free with the standard `dlopen` handling. For AMDGPU, we emit `amdgcn.device.init` and `amdgcn.device.fini` functions which handle everything atuomatically and simply need to be called. For NVPTX, a patch https://github.com/llvm/llvm-project/pull/71549 provides the kernels to call, but the runtime needs to set up the array manually by pulling out all the known constructor / destructor functions. One concession that this patch requires is the change that for GPU targets in OpenMP offloading we will use `llvm.global_dtors` instead of using `atexit`. This is because `atexit` is a separate runtime function that does not mesh well with the handling we're trying to do here. This should be equivalent in all cases except for cases where we would need to destruct manually such as: ``` struct S { ~S() { foo(); } }; void foo() { static S s; } ``` However this is broken in many other ways on the GPU, so it is not regressing any support, simply increasing the scope of what we can handle. This changes the handling of ctors / dtors. This patch now outputs a information message regarding the deprecation if the old format is used. This will be completely removed in a later release. 
Depends on: https://github.com/llvm/llvm-project/pull/71549 --- clang/lib/CodeGen/CGDeclCXX.cpp | 13 +- clang/lib/CodeGen/CGOpenMPRuntime.cpp | 130 -- clang/lib/CodeGen/CGOpenMPRuntime.h | 8 -- clang/lib/CodeGen/CodeGenFunction.h | 5 + clang/lib/CodeGen/CodeGenModule.h | 14 +- clang/lib/CodeGen/ItaniumCXXABI.cpp | 7 + .../amdgcn_openmp_device_math_constexpr.cpp | 48 +-- .../amdgcn_target_global_constructor.cpp | 30 ++-- clang/test/OpenMP/declare_target_codegen.cpp | 1 - ...x_declare_target_var_ctor_dtor_codegen.cpp | 35 + .../llvm/Frontend/OpenMP/OMPIRBuilder.h | 4 - llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp | 7 +- .../plugins-nextgen/amdgpu/src/rtl.cpp| 52 +++ .../common/PluginInterface/GlobalHandler.h| 10 +- .../PluginInterface/PluginInterface.cpp | 7 + .../common/PluginInterface/PluginInterface.h | 14 ++ .../plugins-nextgen/cuda/src/rtl.cpp | 115 openmp/libomptarget/src/rtl.cpp | 6 + .../test/libc/global_ctor_dtor.cpp| 37 + 19 files changed, 327 insertions(+), 216 deletions(-) create mode 100644 openmp/libomptarget/test/libc/global_ctor_dtor.cpp diff --git a/clang/lib/CodeGen/CGDeclCXX.cpp b/clang/lib/CodeGen/CGDeclCXX.cpp index 3fa28b343663f61..e08a1e5f42df20c 100644 --- a/clang/lib/CodeGen/CGDeclCXX.cpp +++ b/clang/lib/CodeGen/CGDeclCXX.cpp @@ -327,6 +327,15 @@ void CodeGenFunction::registerGlobalDtorWithAtExit(const VarDecl &VD, registerGlobalDtorWithAtExit(dtorStub); } +/// Register a global destructor using the LLVM 'llvm.global_dtors' global. +void CodeGenFunction::registerGlobalDtorWithLLVM(const VarDecl &VD, + llvm::FunctionCallee Dtor, + llvm::Constant *Addr) { + // Create a function which calls the destructor. + llvm::Function *dtorStub = createAtExitStub(VD, Dtor, Addr); + CGM.AddGlobalDtor(dtorStub); +} + void CodeGenFunction::registerGlobalDtorWithAtExit(llvm::Constant *dtorStub) { // extern "C" int atexit(void (*f)(void)); assert(dtorStub->getType() == @@ -519,10 +528,6 @@ CodeGenModule::EmitCXXGlobalVarDeclInitFunc(const VarDecl *D, D->hasAttr())) return; - if (getLangOpts().OpenMP && - getOpenMPRuntime().emitDeclareTargetVarDefinition(D, Addr, PerformInit)) -return; - // Check if we've already initialized this decl. auto I = DelayedCXXInitPosition.find(D); if (I != DelayedCXXInitPosition.end() && I->second == ~0U) diff --git a/clang/lib/CodeGen/CGOpenMPRuntime.cpp b/clang/lib/CodeGen/CGOpenMPRuntime.cpp index a8e1150e44566b8..d2be8141a3a4b31 100644 --- a/clang/lib/Code
[clang] [openmp] [llvm] [OpenMP] Rework handling of global ctor/dtors in OpenMP (PR #71739)
https://github.com/jhuber6 updated https://github.com/llvm/llvm-project/pull/71739 >From 45a645c4e65d3b1f98dee23c2eba1cf6db99bff0 Mon Sep 17 00:00:00 2001 From: Joseph Huber Date: Tue, 7 Nov 2023 17:12:31 -0600 Subject: [PATCH] [OpenMP] Rework handling of global ctor/dtors in OpenMP Summary: This patch reworks how we handle global constructors in OpenMP. Previously, we emitted individual kernels that were all registered and called individually. In order to provide more generic support, this patch moves all handling of this to the target backend and the runtime plugin. This has the benefit of supporting the GNU extensions for constructors an destructors, removing a class of failures related to shared library destruction order, and allows targets other than OpenMP to use the same support without needing to change the frontend. This is primarily done by calling kernels that the backend emits to iterate a list of ctor / dtor functions. For x64, this is automatic and we get it for free with the standard `dlopen` handling. For AMDGPU, we emit `amdgcn.device.init` and `amdgcn.device.fini` functions which handle everything atuomatically and simply need to be called. For NVPTX, a patch https://github.com/llvm/llvm-project/pull/71549 provides the kernels to call, but the runtime needs to set up the array manually by pulling out all the known constructor / destructor functions. One concession that this patch requires is the change that for GPU targets in OpenMP offloading we will use `llvm.global_dtors` instead of using `atexit`. This is because `atexit` is a separate runtime function that does not mesh well with the handling we're trying to do here. This should be equivalent in all cases except for cases where we would need to destruct manually such as: ``` struct S { ~S() { foo(); } }; void foo() { static S s; } ``` However this is broken in many other ways on the GPU, so it is not regressing any support, simply increasing the scope of what we can handle. This changes the handling of ctors / dtors. This patch now outputs a information message regarding the deprecation if the old format is used. This will be completely removed in a later release. 
Depends on: https://github.com/llvm/llvm-project/pull/71549 --- clang/lib/CodeGen/CGDeclCXX.cpp | 13 +- clang/lib/CodeGen/CGOpenMPRuntime.cpp | 130 -- clang/lib/CodeGen/CGOpenMPRuntime.h | 8 -- clang/lib/CodeGen/CodeGenFunction.h | 5 + clang/lib/CodeGen/CodeGenModule.h | 14 +- clang/lib/CodeGen/ItaniumCXXABI.cpp | 7 + .../amdgcn_openmp_device_math_constexpr.cpp | 48 +-- .../amdgcn_target_global_constructor.cpp | 30 ++-- clang/test/OpenMP/declare_target_codegen.cpp | 1 - ...x_declare_target_var_ctor_dtor_codegen.cpp | 35 + .../llvm/Frontend/OpenMP/OMPIRBuilder.h | 4 - llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp | 7 +- .../plugins-nextgen/amdgpu/src/rtl.cpp| 52 +++ .../common/PluginInterface/GlobalHandler.h| 10 +- .../PluginInterface/PluginInterface.cpp | 7 + .../common/PluginInterface/PluginInterface.h | 14 ++ .../plugins-nextgen/cuda/src/rtl.cpp | 113 +++ openmp/libomptarget/src/rtl.cpp | 6 + .../test/libc/global_ctor_dtor.cpp| 37 + 19 files changed, 325 insertions(+), 216 deletions(-) create mode 100644 openmp/libomptarget/test/libc/global_ctor_dtor.cpp diff --git a/clang/lib/CodeGen/CGDeclCXX.cpp b/clang/lib/CodeGen/CGDeclCXX.cpp index 3fa28b343663f61..e08a1e5f42df20c 100644 --- a/clang/lib/CodeGen/CGDeclCXX.cpp +++ b/clang/lib/CodeGen/CGDeclCXX.cpp @@ -327,6 +327,15 @@ void CodeGenFunction::registerGlobalDtorWithAtExit(const VarDecl &VD, registerGlobalDtorWithAtExit(dtorStub); } +/// Register a global destructor using the LLVM 'llvm.global_dtors' global. +void CodeGenFunction::registerGlobalDtorWithLLVM(const VarDecl &VD, + llvm::FunctionCallee Dtor, + llvm::Constant *Addr) { + // Create a function which calls the destructor. + llvm::Function *dtorStub = createAtExitStub(VD, Dtor, Addr); + CGM.AddGlobalDtor(dtorStub); +} + void CodeGenFunction::registerGlobalDtorWithAtExit(llvm::Constant *dtorStub) { // extern "C" int atexit(void (*f)(void)); assert(dtorStub->getType() == @@ -519,10 +528,6 @@ CodeGenModule::EmitCXXGlobalVarDeclInitFunc(const VarDecl *D, D->hasAttr())) return; - if (getLangOpts().OpenMP && - getOpenMPRuntime().emitDeclareTargetVarDefinition(D, Addr, PerformInit)) -return; - // Check if we've already initialized this decl. auto I = DelayedCXXInitPosition.find(D); if (I != DelayedCXXInitPosition.end() && I->second == ~0U) diff --git a/clang/lib/CodeGen/CGOpenMPRuntime.cpp b/clang/lib/CodeGen/CGOpenMPRuntime.cpp index a8e1150e44566b8..d2be8141a3a4b31 100644 --- a/clang/lib/CodeG
[llvm] [openmp] [clang] [OpenMP] Rework handling of global ctor/dtors in OpenMP (PR #71739)
@@ -2794,6 +2794,14 @@ void ItaniumCXXABI::registerGlobalDtor(CodeGenFunction &CGF, const VarDecl &D, if (D.isNoDestroy(CGM.getContext())) return; + // OpenMP offloading supports C++ constructors and destructors but we do not + // always have 'atexit' available. Instead lower these to use the LLVM global + // destructors which we can handle directly in the runtime. + if (CGM.getLangOpts().OpenMP && CGM.getLangOpts().OpenMPIsTargetDevice && + !D.isStaticLocal() && + (CGM.getTriple().isAMDGPU() || CGM.getTriple().isNVPTX())) jhuber6 wrote: I could put something in `LangOptions` that just returns the same thing. Wasn't sure if it's worth forcing a recompile of everything though. https://github.com/llvm/llvm-project/pull/71739
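Either way, the target- and language-level checks factor out cleanly (the per-declaration `isStaticLocal` test would stay at the call site). A hedged sketch with an invented name, wrapping the condition from the hunk above:

```
// Single place to answer "should this compilation register dtors through
// llvm.global_dtors instead of atexit?" for OpenMP device code.
static bool useLLVMGlobalDtorsForOpenMPDevice(const CodeGenModule &CGM) {
  const LangOptions &LO = CGM.getLangOpts();
  return LO.OpenMP && LO.OpenMPIsTargetDevice &&
         (CGM.getTriple().isAMDGPU() || CGM.getTriple().isNVPTX());
}
```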
[openmp] [llvm] [clang] [OpenMP] Rework handling of global ctor/dtors in OpenMP (PR #71739)
https://github.com/jhuber6 updated https://github.com/llvm/llvm-project/pull/71739 >From 45a645c4e65d3b1f98dee23c2eba1cf6db99bff0 Mon Sep 17 00:00:00 2001 From: Joseph Huber Date: Tue, 7 Nov 2023 17:12:31 -0600 Subject: [PATCH 1/2] [OpenMP] Rework handling of global ctor/dtors in OpenMP Summary: This patch reworks how we handle global constructors in OpenMP. Previously, we emitted individual kernels that were all registered and called individually. In order to provide more generic support, this patch moves all handling of this to the target backend and the runtime plugin. This has the benefit of supporting the GNU extensions for constructors an destructors, removing a class of failures related to shared library destruction order, and allows targets other than OpenMP to use the same support without needing to change the frontend. This is primarily done by calling kernels that the backend emits to iterate a list of ctor / dtor functions. For x64, this is automatic and we get it for free with the standard `dlopen` handling. For AMDGPU, we emit `amdgcn.device.init` and `amdgcn.device.fini` functions which handle everything atuomatically and simply need to be called. For NVPTX, a patch https://github.com/llvm/llvm-project/pull/71549 provides the kernels to call, but the runtime needs to set up the array manually by pulling out all the known constructor / destructor functions. One concession that this patch requires is the change that for GPU targets in OpenMP offloading we will use `llvm.global_dtors` instead of using `atexit`. This is because `atexit` is a separate runtime function that does not mesh well with the handling we're trying to do here. This should be equivalent in all cases except for cases where we would need to destruct manually such as: ``` struct S { ~S() { foo(); } }; void foo() { static S s; } ``` However this is broken in many other ways on the GPU, so it is not regressing any support, simply increasing the scope of what we can handle. This changes the handling of ctors / dtors. This patch now outputs a information message regarding the deprecation if the old format is used. This will be completely removed in a later release. 
Depends on: https://github.com/llvm/llvm-project/pull/71549 --- clang/lib/CodeGen/CGDeclCXX.cpp | 13 +- clang/lib/CodeGen/CGOpenMPRuntime.cpp | 130 -- clang/lib/CodeGen/CGOpenMPRuntime.h | 8 -- clang/lib/CodeGen/CodeGenFunction.h | 5 + clang/lib/CodeGen/CodeGenModule.h | 14 +- clang/lib/CodeGen/ItaniumCXXABI.cpp | 7 + .../amdgcn_openmp_device_math_constexpr.cpp | 48 +-- .../amdgcn_target_global_constructor.cpp | 30 ++-- clang/test/OpenMP/declare_target_codegen.cpp | 1 - ...x_declare_target_var_ctor_dtor_codegen.cpp | 35 + .../llvm/Frontend/OpenMP/OMPIRBuilder.h | 4 - llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp | 7 +- .../plugins-nextgen/amdgpu/src/rtl.cpp| 52 +++ .../common/PluginInterface/GlobalHandler.h| 10 +- .../PluginInterface/PluginInterface.cpp | 7 + .../common/PluginInterface/PluginInterface.h | 14 ++ .../plugins-nextgen/cuda/src/rtl.cpp | 113 +++ openmp/libomptarget/src/rtl.cpp | 6 + .../test/libc/global_ctor_dtor.cpp| 37 + 19 files changed, 325 insertions(+), 216 deletions(-) create mode 100644 openmp/libomptarget/test/libc/global_ctor_dtor.cpp diff --git a/clang/lib/CodeGen/CGDeclCXX.cpp b/clang/lib/CodeGen/CGDeclCXX.cpp index 3fa28b343663f61..e08a1e5f42df20c 100644 --- a/clang/lib/CodeGen/CGDeclCXX.cpp +++ b/clang/lib/CodeGen/CGDeclCXX.cpp @@ -327,6 +327,15 @@ void CodeGenFunction::registerGlobalDtorWithAtExit(const VarDecl &VD, registerGlobalDtorWithAtExit(dtorStub); } +/// Register a global destructor using the LLVM 'llvm.global_dtors' global. +void CodeGenFunction::registerGlobalDtorWithLLVM(const VarDecl &VD, + llvm::FunctionCallee Dtor, + llvm::Constant *Addr) { + // Create a function which calls the destructor. + llvm::Function *dtorStub = createAtExitStub(VD, Dtor, Addr); + CGM.AddGlobalDtor(dtorStub); +} + void CodeGenFunction::registerGlobalDtorWithAtExit(llvm::Constant *dtorStub) { // extern "C" int atexit(void (*f)(void)); assert(dtorStub->getType() == @@ -519,10 +528,6 @@ CodeGenModule::EmitCXXGlobalVarDeclInitFunc(const VarDecl *D, D->hasAttr())) return; - if (getLangOpts().OpenMP && - getOpenMPRuntime().emitDeclareTargetVarDefinition(D, Addr, PerformInit)) -return; - // Check if we've already initialized this decl. auto I = DelayedCXXInitPosition.find(D); if (I != DelayedCXXInitPosition.end() && I->second == ~0U) diff --git a/clang/lib/CodeGen/CGOpenMPRuntime.cpp b/clang/lib/CodeGen/CGOpenMPRuntime.cpp index a8e1150e44566b8..d2be8141a3a4b31 100644 --- a/clang/lib/C
[openmp] [llvm] [clang] [OpenMP] Rework handling of global ctor/dtors in OpenMP (PR #71739)
https://github.com/jhuber6 updated https://github.com/llvm/llvm-project/pull/71739 >From 45a645c4e65d3b1f98dee23c2eba1cf6db99bff0 Mon Sep 17 00:00:00 2001 From: Joseph Huber Date: Tue, 7 Nov 2023 17:12:31 -0600 Subject: [PATCH 1/2] [OpenMP] Rework handling of global ctor/dtors in OpenMP Summary: This patch reworks how we handle global constructors in OpenMP. Previously, we emitted individual kernels that were all registered and called individually. In order to provide more generic support, this patch moves all handling of this to the target backend and the runtime plugin. This has the benefit of supporting the GNU extensions for constructors an destructors, removing a class of failures related to shared library destruction order, and allows targets other than OpenMP to use the same support without needing to change the frontend. This is primarily done by calling kernels that the backend emits to iterate a list of ctor / dtor functions. For x64, this is automatic and we get it for free with the standard `dlopen` handling. For AMDGPU, we emit `amdgcn.device.init` and `amdgcn.device.fini` functions which handle everything atuomatically and simply need to be called. For NVPTX, a patch https://github.com/llvm/llvm-project/pull/71549 provides the kernels to call, but the runtime needs to set up the array manually by pulling out all the known constructor / destructor functions. One concession that this patch requires is the change that for GPU targets in OpenMP offloading we will use `llvm.global_dtors` instead of using `atexit`. This is because `atexit` is a separate runtime function that does not mesh well with the handling we're trying to do here. This should be equivalent in all cases except for cases where we would need to destruct manually such as: ``` struct S { ~S() { foo(); } }; void foo() { static S s; } ``` However this is broken in many other ways on the GPU, so it is not regressing any support, simply increasing the scope of what we can handle. This changes the handling of ctors / dtors. This patch now outputs a information message regarding the deprecation if the old format is used. This will be completely removed in a later release. 
Depends on: https://github.com/llvm/llvm-project/pull/71549 --- clang/lib/CodeGen/CGDeclCXX.cpp | 13 +- clang/lib/CodeGen/CGOpenMPRuntime.cpp | 130 -- clang/lib/CodeGen/CGOpenMPRuntime.h | 8 -- clang/lib/CodeGen/CodeGenFunction.h | 5 + clang/lib/CodeGen/CodeGenModule.h | 14 +- clang/lib/CodeGen/ItaniumCXXABI.cpp | 7 + .../amdgcn_openmp_device_math_constexpr.cpp | 48 +-- .../amdgcn_target_global_constructor.cpp | 30 ++-- clang/test/OpenMP/declare_target_codegen.cpp | 1 - ...x_declare_target_var_ctor_dtor_codegen.cpp | 35 + .../llvm/Frontend/OpenMP/OMPIRBuilder.h | 4 - llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp | 7 +- .../plugins-nextgen/amdgpu/src/rtl.cpp| 52 +++ .../common/PluginInterface/GlobalHandler.h| 10 +- .../PluginInterface/PluginInterface.cpp | 7 + .../common/PluginInterface/PluginInterface.h | 14 ++ .../plugins-nextgen/cuda/src/rtl.cpp | 113 +++ openmp/libomptarget/src/rtl.cpp | 6 + .../test/libc/global_ctor_dtor.cpp| 37 + 19 files changed, 325 insertions(+), 216 deletions(-) create mode 100644 openmp/libomptarget/test/libc/global_ctor_dtor.cpp diff --git a/clang/lib/CodeGen/CGDeclCXX.cpp b/clang/lib/CodeGen/CGDeclCXX.cpp index 3fa28b343663f61..e08a1e5f42df20c 100644 --- a/clang/lib/CodeGen/CGDeclCXX.cpp +++ b/clang/lib/CodeGen/CGDeclCXX.cpp @@ -327,6 +327,15 @@ void CodeGenFunction::registerGlobalDtorWithAtExit(const VarDecl &VD, registerGlobalDtorWithAtExit(dtorStub); } +/// Register a global destructor using the LLVM 'llvm.global_dtors' global. +void CodeGenFunction::registerGlobalDtorWithLLVM(const VarDecl &VD, + llvm::FunctionCallee Dtor, + llvm::Constant *Addr) { + // Create a function which calls the destructor. + llvm::Function *dtorStub = createAtExitStub(VD, Dtor, Addr); + CGM.AddGlobalDtor(dtorStub); +} + void CodeGenFunction::registerGlobalDtorWithAtExit(llvm::Constant *dtorStub) { // extern "C" int atexit(void (*f)(void)); assert(dtorStub->getType() == @@ -519,10 +528,6 @@ CodeGenModule::EmitCXXGlobalVarDeclInitFunc(const VarDecl *D, D->hasAttr())) return; - if (getLangOpts().OpenMP && - getOpenMPRuntime().emitDeclareTargetVarDefinition(D, Addr, PerformInit)) -return; - // Check if we've already initialized this decl. auto I = DelayedCXXInitPosition.find(D); if (I != DelayedCXXInitPosition.end() && I->second == ~0U) diff --git a/clang/lib/CodeGen/CGOpenMPRuntime.cpp b/clang/lib/CodeGen/CGOpenMPRuntime.cpp index a8e1150e44566b8..d2be8141a3a4b31 100644 --- a/clang/lib/C
[openmp] [clang] [llvm] [OpenMP] Rework handling of global ctor/dtors in OpenMP (PR #71739)
@@ -1038,6 +1048,109 @@ struct CUDADeviceTy : public GenericDeviceTy { using CUDAStreamManagerTy = GenericDeviceResourceManagerTy; using CUDAEventManagerTy = GenericDeviceResourceManagerTy; + Error callGlobalCtorDtorCommon(GenericPluginTy &Plugin, DeviceImageTy &Image, + bool IsCtor) { +// Perform a quick check for the named kernel in the image. The kernel +// should be created by the 'nvptx-lower-ctor-dtor' pass. +GenericGlobalHandlerTy &Handler = Plugin.getGlobalHandler(); +GlobalTy Global(IsCtor ? "nvptx$device$init" : "nvptx$device$fini", +sizeof(void *)); +if (auto Err = Handler.getGlobalMetadataFromImage(*this, Image, Global)) { + consumeError(std::move(Err)); + return Plugin::success(); +} + +// The Nvidia backend cannot handle creating the ctor / dtor array +// automatically so we must create it ourselves. The backend will emit +// several globals that contain function pointers we can call. These are +// prefixed with a known name due to Nvidia's lack of section support. +const ELF64LEObjectFile *ELFObj = +Handler.getOrCreateELFObjectFile(*this, Image); +if (!ELFObj) + return Plugin::error("Unable to create ELF object for image %p", + Image.getStart()); + +// Search for all symbols that contain a constructor or destructor. +SmallVector> Funcs; +for (ELFSymbolRef Sym : ELFObj->symbols()) { + auto NameOrErr = Sym.getName(); + if (!NameOrErr) +return NameOrErr.takeError(); + + if (!NameOrErr->starts_with(IsCtor ? "__init_array_object_" + : "__fini_array_object_")) +continue; + + uint16_t priority; + if (NameOrErr->rsplit('_').second.getAsInteger(10, priority)) +return Plugin::error("Invalid priority for constructor or destructor"); + + Funcs.emplace_back(*NameOrErr, priority); +} + +// Sort the created array to be in priority order. +llvm::sort(Funcs, [=](auto x, auto y) { return x.second < y.second; }); + +// Allocate a buffer to store all of the known constructor / destructor +// functions in so we can iterate them on the device. +void *Buffer = +allocate(Funcs.size() * sizeof(void *), nullptr, TARGET_ALLOC_SHARED); jhuber6 wrote: It's much more convenient than copying over the buffer. `SHARED` in CUDA context would be "migratable" memory without async access AFAIK. So this will most likely just invoke a migration once it's accessed. Unsure if that's slower or faster than waiting on an explicit memcpy. https://github.com/llvm/llvm-project/pull/71739 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
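For readers following the plugin code quoted above, the device-side half is conceptually just a loop over the array the plugin assembles; a plain C++ rendering (an illustration, not the actual kernel the 'nvptx-lower-ctor-dtor' pass emits) looks like this:

```cpp
// Illustration only: the backend-emitted init/fini kernel walks the
// priority-sorted array of constructor/destructor function pointers that the
// plugin copies into device-visible memory and calls each one in order.
using CtorDtorFn = void (*)();

void runCtorsOrDtors(CtorDtorFn *Begin, CtorDtorFn *End) {
  for (CtorDtorFn *Fn = Begin; Fn != End; ++Fn)
    (*Fn)(); // already sorted by priority on the host side
}
```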
[llvm] [openmp] [clang] [OpenMP] Rework handling of global ctor/dtors in OpenMP (PR #71739)
https://github.com/jhuber6 updated https://github.com/llvm/llvm-project/pull/71739

>From 5366317448060c928ec415f7e243a402ef181cb5 Mon Sep 17 00:00:00 2001
From: Joseph Huber
Date: Tue, 7 Nov 2023 17:12:31 -0600
Subject: [PATCH] [OpenMP] Rework handling of global ctor/dtors in OpenMP

Summary:
This patch reworks how we handle global constructors in OpenMP. Previously, we emitted individual kernels that were all registered and called individually. In order to provide more generic support, this patch moves all handling of this to the target backend and the runtime plugin. This has the benefit of supporting the GNU extensions for constructors and destructors, removing a class of failures related to shared library destruction order, and allowing targets other than OpenMP to use the same support without needing to change the frontend.

This is primarily done by calling kernels that the backend emits to iterate a list of ctor / dtor functions. For x64, this is automatic and we get it for free with the standard `dlopen` handling. For AMDGPU, we emit `amdgcn.device.init` and `amdgcn.device.fini` functions which handle everything automatically and simply need to be called. For NVPTX, a patch https://github.com/llvm/llvm-project/pull/71549 provides the kernels to call, but the runtime needs to set up the array manually by pulling out all the known constructor / destructor functions.

One concession that this patch requires is that for GPU targets in OpenMP offloading we will use `llvm.global_dtors` instead of `atexit`. This is because `atexit` is a separate runtime function that does not mesh well with the handling we're trying to do here. This should be equivalent in all cases except those where we would need to destruct manually, such as:

```
struct S { ~S() { foo(); } };
void foo() { static S s; }
```

However, this is broken in many other ways on the GPU, so it is not regressing any support, simply increasing the scope of what we can handle.

Since this changes the handling of ctors / dtors, this patch now outputs an informational message regarding the deprecation if the old format is used. This will be completely removed in a later release.

Depends on: https://github.com/llvm/llvm-project/pull/71549

Add LangOption for atexit usage

Summary:
This method isn't 1-to-1 but it's more functional than not having it.
--- clang/include/clang/Basic/LangOptions.h | 3 + clang/lib/CodeGen/CGDeclCXX.cpp | 13 +- clang/lib/CodeGen/CGOpenMPRuntime.cpp | 130 -- clang/lib/CodeGen/CGOpenMPRuntime.h | 8 -- clang/lib/CodeGen/CodeGenFunction.h | 5 + clang/lib/CodeGen/CodeGenModule.h | 14 +- clang/lib/CodeGen/ItaniumCXXABI.cpp | 8 ++ .../amdgcn_openmp_device_math_constexpr.cpp | 48 +-- .../amdgcn_target_global_constructor.cpp | 30 ++-- clang/test/OpenMP/declare_target_codegen.cpp | 1 - ...x_declare_target_var_ctor_dtor_codegen.cpp | 35 + .../llvm/Frontend/OpenMP/OMPIRBuilder.h | 4 - llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp | 7 +- .../plugins-nextgen/amdgpu/src/rtl.cpp| 42 ++ .../common/PluginInterface/GlobalHandler.h| 10 +- .../PluginInterface/PluginInterface.cpp | 7 + .../common/PluginInterface/PluginInterface.h | 14 ++ .../plugins-nextgen/cuda/src/rtl.cpp | 104 ++ openmp/libomptarget/src/rtl.cpp | 9 +- .../test/libc/global_ctor_dtor.cpp| 37 + 20 files changed, 312 insertions(+), 217 deletions(-) create mode 100644 openmp/libomptarget/test/libc/global_ctor_dtor.cpp diff --git a/clang/include/clang/Basic/LangOptions.h b/clang/include/clang/Basic/LangOptions.h index 20a8ada60e0fe51..ae99357eeea7f41 100644 --- a/clang/include/clang/Basic/LangOptions.h +++ b/clang/include/clang/Basic/LangOptions.h @@ -597,6 +597,9 @@ class LangOptions : public LangOptionsBase { return !requiresStrictPrototypes() && !OpenCL; } + /// Returns true if the language supports calling the 'atexit' function. + bool hasAtExit() const { return !(OpenMP && OpenMPIsTargetDevice); } + /// Returns true if implicit int is part of the language requirements. bool isImplicitIntRequired() const { return !CPlusPlus && !C99; } diff --git a/clang/lib/CodeGen/CGDeclCXX.cpp b/clang/lib/CodeGen/CGDeclCXX.cpp index 3fa28b343663f61..e08a1e5f42df20c 100644 --- a/clang/lib/CodeGen/CGDeclCXX.cpp +++ b/clang/lib/CodeGen/CGDeclCXX.cpp @@ -327,6 +327,15 @@ void CodeGenFunction::registerGlobalDtorWithAtExit(const VarDecl &VD, registerGlobalDtorWithAtExit(dtorStub); } +/// Register a global destructor using the LLVM 'llvm.global_dtors' global. +void CodeGenFunction::registerGlobalDtorWithLLVM(const VarDecl &VD, + llvm::FunctionCallee Dtor, + llvm::Constant *Addr) { + // Create a function which calls the destructor. + l
[llvm] [openmp] [clang] [OpenMP] Rework handling of global ctor/dtors in OpenMP (PR #71739)
https://github.com/jhuber6 updated https://github.com/llvm/llvm-project/pull/71739

>From e0281fc280385286c3d5da7de619e793bd3b6bea Mon Sep 17 00:00:00 2001
From: Joseph Huber
Date: Tue, 7 Nov 2023 17:12:31 -0600
Subject: [PATCH] [OpenMP] Rework handling of global ctor/dtors in OpenMP

Summary:
This patch reworks how we handle global constructors in OpenMP. Previously, we emitted individual kernels that were all registered and called individually. In order to provide more generic support, this patch moves all handling of this to the target backend and the runtime plugin. This has the benefit of supporting the GNU extensions for constructors and destructors, removing a class of failures related to shared library destruction order, and allowing targets other than OpenMP to use the same support without needing to change the frontend.

This is primarily done by calling kernels that the backend emits to iterate a list of ctor / dtor functions. For x64, this is automatic and we get it for free with the standard `dlopen` handling. For AMDGPU, we emit `amdgcn.device.init` and `amdgcn.device.fini` functions which handle everything automatically and simply need to be called. For NVPTX, a patch https://github.com/llvm/llvm-project/pull/71549 provides the kernels to call, but the runtime needs to set up the array manually by pulling out all the known constructor / destructor functions.

One concession that this patch requires is that for GPU targets in OpenMP offloading we will use `llvm.global_dtors` instead of `atexit`. This is because `atexit` is a separate runtime function that does not mesh well with the handling we're trying to do here. This should be equivalent in all cases except those where we would need to destruct manually, such as:

```
struct S { ~S() { foo(); } };
void foo() { static S s; }
```

However, this is broken in many other ways on the GPU, so it is not regressing any support, simply increasing the scope of what we can handle.

Since this changes the handling of ctors / dtors, this patch now outputs an informational message regarding the deprecation if the old format is used. This will be completely removed in a later release.

Depends on: https://github.com/llvm/llvm-project/pull/71549

Add LangOption for atexit usage

Summary:
This method isn't 1-to-1 but it's more functional than not having it.
--- clang/include/clang/Basic/LangOptions.h | 3 + clang/lib/CodeGen/CGDeclCXX.cpp | 13 +- clang/lib/CodeGen/CGOpenMPRuntime.cpp | 130 -- clang/lib/CodeGen/CGOpenMPRuntime.h | 8 -- clang/lib/CodeGen/CodeGenFunction.h | 5 + clang/lib/CodeGen/CodeGenModule.h | 14 +- clang/lib/CodeGen/ItaniumCXXABI.cpp | 8 ++ .../amdgcn_openmp_device_math_constexpr.cpp | 48 +-- .../amdgcn_target_global_constructor.cpp | 30 ++-- clang/test/OpenMP/declare_target_codegen.cpp | 1 - ...x_declare_target_var_ctor_dtor_codegen.cpp | 35 + .../llvm/Frontend/OpenMP/OMPIRBuilder.h | 4 - llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp | 7 +- .../plugins-nextgen/amdgpu/src/rtl.cpp| 42 ++ .../common/PluginInterface/GlobalHandler.h| 10 +- .../PluginInterface/PluginInterface.cpp | 7 + .../common/PluginInterface/PluginInterface.h | 14 ++ .../plugins-nextgen/cuda/src/rtl.cpp | 110 +++ openmp/libomptarget/src/rtl.cpp | 9 +- .../test/libc/global_ctor_dtor.cpp| 37 + 20 files changed, 318 insertions(+), 217 deletions(-) create mode 100644 openmp/libomptarget/test/libc/global_ctor_dtor.cpp diff --git a/clang/include/clang/Basic/LangOptions.h b/clang/include/clang/Basic/LangOptions.h index 20a8ada60e0fe51..ae99357eeea7f41 100644 --- a/clang/include/clang/Basic/LangOptions.h +++ b/clang/include/clang/Basic/LangOptions.h @@ -597,6 +597,9 @@ class LangOptions : public LangOptionsBase { return !requiresStrictPrototypes() && !OpenCL; } + /// Returns true if the language supports calling the 'atexit' function. + bool hasAtExit() const { return !(OpenMP && OpenMPIsTargetDevice); } + /// Returns true if implicit int is part of the language requirements. bool isImplicitIntRequired() const { return !CPlusPlus && !C99; } diff --git a/clang/lib/CodeGen/CGDeclCXX.cpp b/clang/lib/CodeGen/CGDeclCXX.cpp index 3fa28b343663f61..e08a1e5f42df20c 100644 --- a/clang/lib/CodeGen/CGDeclCXX.cpp +++ b/clang/lib/CodeGen/CGDeclCXX.cpp @@ -327,6 +327,15 @@ void CodeGenFunction::registerGlobalDtorWithAtExit(const VarDecl &VD, registerGlobalDtorWithAtExit(dtorStub); } +/// Register a global destructor using the LLVM 'llvm.global_dtors' global. +void CodeGenFunction::registerGlobalDtorWithLLVM(const VarDecl &VD, + llvm::FunctionCallee Dtor, + llvm::Constant *Addr) { + // Create a function which calls the destructor. +
[llvm] [clang] [openmp] [OpenMP] Rework handling of global ctor/dtors in OpenMP (PR #71739)
https://github.com/jhuber6 closed https://github.com/llvm/llvm-project/pull/71739 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [llvm] [HIP] support 128 bit int division (PR #71978)
jhuber6 wrote:

> Would it be feasible to consider switching to the new offloading driver mode and really link with the library instead? It may be a conveniently isolated use case with little/no existing users that would disrupt.

I've thought a reasonable amount about a `compiler-rt` for GPUs. Right now it's a little difficult because of the issue of compatibility. We could do the traditional "Build the library N times for N architectures", but I'd like to think of something more intelligent in the future. The use of `-mlink-builtin-bitcode` handles this by more-or-less forcing correct attributes.

What this patch does is a little interesting though; having the clang driver pick apart archives has always seemed a little weird. We did it in the past for AMD's old handling of static libraries. There's still a lot of that code left over that I want to delete. I really need to sit down and allow HIP to work with the new driver.

I've been messing around with generic IR a bit, and I think what might work is LLVM-IR that intentionally leaves off target-specific attributes, plus a pass that adds them in if they are missing before other optimizations are run. Then we may be able to investigate the use of i-funcs to resolve target-specific branches once the architecture is known (once it is linked in). I think @JonChesterfield was thinking about something to that effect as well.

https://github.com/llvm/llvm-project/pull/71978 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
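To make the "generic IR plus a pass that fills in missing attributes" idea concrete, a rough new-pass-manager sketch might look like the following (an assumption about how such a pass could be written, not an existing LLVM pass; the pass name and attribute choices are invented for the example):

```cpp
#include "llvm/IR/Function.h"
#include "llvm/IR/Module.h"
#include "llvm/IR/PassManager.h"

using namespace llvm;

// Hypothetical pass: stamp a concrete target onto functions that were built
// as "generic" IR without target-specific attributes, before any optimization
// starts specializing on them.
struct InferTargetAttrsPass : PassInfoMixin<InferTargetAttrsPass> {
  StringRef CPU;
  StringRef Features;
  InferTargetAttrsPass(StringRef CPU, StringRef Features)
      : CPU(CPU), Features(Features) {}

  PreservedAnalyses run(Module &M, ModuleAnalysisManager &) {
    for (Function &F : M) {
      if (!F.hasFnAttribute("target-cpu"))
        F.addFnAttr("target-cpu", CPU);
      if (!F.hasFnAttribute("target-features"))
        F.addFnAttr("target-features", Features);
    }
    return PreservedAnalyses::none();
  }
};
```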
[clang] [Clang] Introduce scoped variants of GNU atomic functions (PR #72280)
https://github.com/jhuber6 created https://github.com/llvm/llvm-project/pull/72280

Summary:
The standard GNU atomic operations are a very common way to target hardware atomics on the device. With more heterogeneous devices being introduced, the concept of memory scopes has been in the LLVM language for a while via the `syncscope` modifier. For targets such as the GPU, this can change code generation depending on whether we only need to be consistent with the memory ordering of the entire system, the single GPU device, or something lower.

Previously these scopes were only exported via the `opencl` and `hip` variants of these functions. However, this made it difficult to use outside of those languages and the semantics were different from the standard GNU versions. This patch introduces a `__scoped_atomic` variant for the common functions. There was some discussion over whether or not these should be overloads of the existing ones, or simply new variants. I leant towards new variants to be less disruptive.

The scope here can be one of the following:

```
__MEMORY_SCOPE_SYSTEM // All devices and systems
__MEMORY_SCOPE_DEVICE // Just this device
__MEMORY_SCOPE_WRKGRP // A 'work-group' AKA CUDA block
__MEMORY_SCOPE_WVFRNT // A 'wavefront' AKA CUDA warp
__MEMORY_SCOPE_SINGLE // A single thread.
```

Naming consistency was attempted, but it is difficult to capture the full spectrum with so many names. Suggestions appreciated.

>From e08deb90a8d99226fd1e18e5dbc37014a9f88d1d Mon Sep 17 00:00:00 2001
From: Joseph Huber
Date: Mon, 6 Nov 2023 07:08:18 -0600
Subject: [PATCH] [Clang] Introduce scoped variants of GNU atomic functions

Summary:
The standard GNU atomic operations are a very common way to target hardware atomics on the device. With more heterogeneous devices being introduced, the concept of memory scopes has been in the LLVM language for a while via the `syncscope` modifier. For targets such as the GPU, this can change code generation depending on whether we only need to be consistent with the memory ordering of the entire system, the single GPU device, or something lower.

Previously these scopes were only exported via the `opencl` and `hip` variants of these functions. However, this made it difficult to use outside of those languages and the semantics were different from the standard GNU versions. This patch introduces a `__scoped_atomic` variant for the common functions. There was some discussion over whether or not these should be overloads of the existing ones, or simply new variants. I leant towards new variants to be less disruptive.

The scope here can be one of the following:

```
__MEMORY_SCOPE_SYSTEM // All devices and systems
__MEMORY_SCOPE_DEVICE // Just this device
__MEMORY_SCOPE_WRKGRP // A 'work-group' AKA CUDA block
__MEMORY_SCOPE_WVFRNT // A 'wavefront' AKA CUDA warp
__MEMORY_SCOPE_SINGLE // A single thread.
```

Naming consistency was attempted, but it is difficult to capture the full spectrum with so many names. Suggestions appreciated.
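A quick usage sketch of what this looks like for a user, assuming the builtins mirror the GNU `__atomic_*` names and argument order with the scope appended as the last argument (as in the patch below):

```cpp
// Hypothetical usage of the proposed builtins: same shape as the GNU
// __atomic_* functions, with the memory scope as a trailing argument.
int fetch_add_device_scope(int *Counter, int Value) {
  // Only threads on the same GPU device need to observe this RMW.
  return __scoped_atomic_fetch_add(Counter, Value, __ATOMIC_SEQ_CST,
                                   __MEMORY_SCOPE_DEVICE);
}

bool cas_single_thread(int *Flag, int Expected, int Desired) {
  // No other thread participates, so the narrowest scope suffices.
  return __scoped_atomic_compare_exchange_n(Flag, &Expected, Desired,
                                            /*weak=*/false, __ATOMIC_SEQ_CST,
                                            __ATOMIC_SEQ_CST,
                                            __MEMORY_SCOPE_SINGLE);
}
```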
--- clang/include/clang/AST/Expr.h | 20 +- clang/include/clang/Basic/Builtins.def | 26 ++ clang/include/clang/Basic/SyncScope.h| 69 - clang/lib/AST/Expr.cpp | 26 ++ clang/lib/AST/StmtPrinter.cpp| 1 + clang/lib/CodeGen/CGAtomic.cpp | 125 - clang/lib/CodeGen/Targets/AMDGPU.cpp | 5 + clang/lib/Frontend/InitPreprocessor.cpp | 7 + clang/lib/Sema/SemaChecking.cpp | 39 ++- clang/test/CodeGen/scoped-atomic-ops.c | 331 +++ clang/test/Preprocessor/init-aarch64.c | 5 + clang/test/Preprocessor/init-loongarch.c | 10 + clang/test/Preprocessor/init.c | 20 ++ clang/test/Sema/scoped-atomic-ops.c | 101 +++ 14 files changed, 764 insertions(+), 21 deletions(-) create mode 100644 clang/test/CodeGen/scoped-atomic-ops.c create mode 100644 clang/test/Sema/scoped-atomic-ops.c diff --git a/clang/include/clang/AST/Expr.h b/clang/include/clang/AST/Expr.h index a9c4c67a60e8e8e..a41f2d66b37b69d 100644 --- a/clang/include/clang/AST/Expr.h +++ b/clang/include/clang/AST/Expr.h @@ -6498,7 +6498,7 @@ class AtomicExpr : public Expr { return cast(SubExprs[ORDER_FAIL]); } Expr *getVal2() const { -if (Op == AO__atomic_exchange) +if (Op == AO__atomic_exchange || Op == AO__scoped_atomic_exchange) return cast(SubExprs[ORDER_FAIL]); assert(NumSubExprs > VAL2); return cast(SubExprs[VAL2]); @@ -6539,7 +6539,9 @@ class AtomicExpr : public Expr { getOp() == AO__opencl_atomic_compare_exchange_weak || getOp() == AO__hip_atomic_compare_exchange_weak || getOp() == AO__atomic_compare_exchange || - getOp() == AO__atomic_compare_exchange_n; + getOp() == AO__atomic_compare_exchange_n || + getOp() == AO__scoped_atomic_compare_exchange || + getOp() == AO__scoped_atomic_compare_exchange_n; } bool isOpenCL() const { @@ -6569,13 +6571,13 @@ class
[clang] Fix tests clang-offload-bundler-zlib/zstd.c (PR #74504)
https://github.com/jhuber6 approved this pull request. Thanks, I noticed that spurious failure as well but didn't know what caused it. https://github.com/llvm/llvm-project/pull/74504 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[llvm] [clang] [CUDA][HIP] Improve variable registration with the new driver (PR #73177)
jhuber6 wrote: Ping https://github.com/llvm/llvm-project/pull/73177 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang] Introduce scoped variants of GNU atomic functions (PR #72280)
https://github.com/jhuber6 closed https://github.com/llvm/llvm-project/pull/72280 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[llvm] [clang] [CUDA][HIP] Improve variable registration with the new driver (PR #73177)
https://github.com/jhuber6 closed https://github.com/llvm/llvm-project/pull/73177 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] Fix tests clang-offload-bundler-zlib/zstd.c (PR #74504)
jhuber6 wrote: I got this fail just now after doing a pull. ``` FAIL: Clang :: Driver/hip-offload-compress-zstd.hip (477 of 1078) TEST 'Clang :: Driver/hip-offload-compress-zstd.hip' FAILED Exit Code: 1 Command Output (stderr): -- RUN: at line 7: rm -rf /home/jhuber/Documents/llvm/llvm-project/build/tools/clang/test/Driver/Output/a.bc + rm -rf /home/jhuber/Documents/llvm/llvm-project/build/tools/clang/test/Driver/Output/a.bc RUN: at line 8: /home/jhuber/Documents/llvm/llvm-project/build/bin/clang -c -v --target=x86_64-linux-gnu-x hip --offload-arch=gfx1100 --offload-arch=gfx1101-fgpu-rdc -nogpuinc -nogpulib /home/jhuber/Documents/llvm/llvm-project/clang/test/Driver/Inputs/hip_multiple_inputs/a.cu --offload-compress --offload-device-only --gpu-bundle-output-o /home/jhuber/Documents/llvm/llvm-project/build/tools/clang/test/Driver/Output/a.bc 2>&1 | /home/jhuber/Documents/llvm/llvm-project/build/bin/FileCheck /home/jhuber/Documents/llvm/llvm-project/clang/test/Driver/hip-offload-compress-zstd.hip + /home/jhuber/Documents/llvm/llvm-project/build/bin/clang -c -v --target=x86_64-linux-gnu -x hip --offload-arch=gfx1100 --offload-arch=gfx1101 -fgpu-rdc -nogpuinc -nogpulib /home/jhuber/Documents/llvm/llvm-project/clang/test/Driver/Inputs/hip_multiple_inputs/a.cu --offload-compress --offload-device-only --gpu-bundle-output -o /home/jhuber/Documents/llvm/llvm-project/build/tools/clang/test/Driver/Output/a.bc + /home/jhuber/Documents/llvm/llvm-project/build/bin/FileCheck /home/jhuber/Documents/llvm/llvm-project/clang/test/Driver/hip-offload-compress-zstd.hip RUN: at line 23: /home/jhuber/Documents/llvm/llvm-project/build/bin/clang --hip-link -### -v --target=x86_64-linux-gnu--offload-arch=gfx1100 --offload-arch=gfx1101-fgpu-rdc -nogpulib /home/jhuber/Documents/llvm/llvm-project/build/tools/clang/test/Driver/Output/a.bc --offload-device-only 2>&1 | /home/jhuber/Documents/llvm/llvm-project/build/bin/FileCheck -check-prefix=UNBUNDLE /home/jhuber/Documents/llvm/llvm-project/clang/test/Driver/hip-offload-compress-zstd.hip + /home/jhuber/Documents/llvm/llvm-project/build/bin/clang --hip-link -### -v --target=x86_64-linux-gnu --offload-arch=gfx1100 --offload-arch=gfx1101 -fgpu-rdc -nogpulib /home/jhuber/Documents/llvm/llvm-project/build/tools/clang/test/Driver/Output/a.bc --offload-device-only + /home/jhuber/Documents/llvm/llvm-project/build/bin/FileCheck -check-prefix=UNBUNDLE /home/jhuber/Documents/llvm/llvm-project/clang/test/Driver/hip-offload-compress-zstd.hip /home/jhuber/Documents/llvm/llvm-project/clang/test/Driver/hip-offload-compress-zstd.hip:29:14: error: UNBUNDLE: expected string not found in input // UNBUNDLE: clang-offload-bundler{{.*}} "-type=bc" ^ :1:1: note: scanning from here clang version 18.0.0git ^ :17:96: note: possible intended match here clang: error: no such file or directory: '/home/jhuber/Documents/llvm/llvm-project/build/tools/clang/test/Driver/Output/a.bc' ^ Input file: Check file: /home/jhuber/Documents/llvm/llvm-project/clang/test/Driver/hip-offload-compress-zstd.hip -dump-input=help explains the following input dump. Input was: << 1: clang version 18.0.0git check:29'0 X~~~ error: no match found 2: Target: x86_64-unknown-linux-gnu check:29'0 ~ 3: Thread model: posix check:29'0 4: InstalledDir: /home/jhuber/Documents/llvm/llvm-project/build/bin check:29'0 ~ 5: Found candidate GCC installation: /usr/lib/gcc/x86_64-pc-linux-gnu/11.3.0 check:29'0 ~~ 6: Found candidate GCC installation: /usr/lib/gcc/x86_64-pc-linux-gnu/12.3.0 check:29'0 ~~ . . . 
12: Candidate multilib: .;@m64 check:29'0 ~~~ 13: Candidate multilib: 32;@m32 check:29'0 14: Selected multilib: .;@m64 check:29'0 ~~ 15: Found CUDA installation: /opt/cuda, version check:29'0 ~ 16: Found HIP installation: /opt/rocm, version 5.6.31062 check:29'0 ~ 17: clang: error: no such file or directory: '/home/jhuber/Documents/llvm/llvm-project/build/tools/clang/test/Driver/Output/a.bc' check:29'0 ~~ check:29'1
[clang] bfd41c3 - [LinkerWrapper][Obvious] Fix missing use of texture data type
Author: Joseph Huber Date: 2023-12-07T16:55:14-06:00 New Revision: bfd41c3f8cc70bd65461a6d767f55c14d72150d9 URL: https://github.com/llvm/llvm-project/commit/bfd41c3f8cc70bd65461a6d767f55c14d72150d9 DIFF: https://github.com/llvm/llvm-project/commit/bfd41c3f8cc70bd65461a6d767f55c14d72150d9.diff LOG: [LinkerWrapper][Obvious] Fix missing use of texture data type Summary: This was accidentally linked to the wrong pointer, causing unused variable warnings and registering the wrong thing. Added: Modified: clang/test/Driver/linker-wrapper-image.c clang/tools/clang-linker-wrapper/OffloadWrapper.cpp Removed: diff --git a/clang/test/Driver/linker-wrapper-image.c b/clang/test/Driver/linker-wrapper-image.c index 4a17a8324b462..a2a1996f66430 100644 --- a/clang/test/Driver/linker-wrapper-image.c +++ b/clang/test/Driver/linker-wrapper-image.c @@ -90,7 +90,7 @@ // CUDA-NEXT: %4 = getelementptr inbounds %struct.__tgt_offload_entry, ptr %entry1, i64 0, i32 3 // CUDA-NEXT: %flags = load i32, ptr %4, align 4 // CUDA-NEXT: %5 = getelementptr inbounds %struct.__tgt_offload_entry, ptr %entry1, i64 0, i32 4 -// CUDA-NEXT: %textype = load i32, ptr %4, align 4 +// CUDA-NEXT: %textype = load i32, ptr %5, align 4 // CUDA-NEXT: %type = and i32 %flags, 7 // CUDA-NEXT: %6 = and i32 %flags, 8 // CUDA-NEXT: %extern = lshr i32 %6, 3 @@ -189,7 +189,7 @@ // HIP-NEXT: %4 = getelementptr inbounds %struct.__tgt_offload_entry, ptr %entry1, i64 0, i32 3 // HIP-NEXT: %flags = load i32, ptr %4, align 4 // HIP-NEXT: %5 = getelementptr inbounds %struct.__tgt_offload_entry, ptr %entry1, i64 0, i32 4 -// HIP-NEXT: %textype = load i32, ptr %4, align 4 +// HIP-NEXT: %textype = load i32, ptr %5, align 4 // HIP-NEXT: %type = and i32 %flags, 7 // HIP-NEXT: %6 = and i32 %flags, 8 // HIP-NEXT: %extern = lshr i32 %6, 3 diff --git a/clang/tools/clang-linker-wrapper/OffloadWrapper.cpp b/clang/tools/clang-linker-wrapper/OffloadWrapper.cpp index 58d9e1e85ceff..f4f500b173572 100644 --- a/clang/tools/clang-linker-wrapper/OffloadWrapper.cpp +++ b/clang/tools/clang-linker-wrapper/OffloadWrapper.cpp @@ -385,7 +385,7 @@ Function *createRegisterGlobalsFunction(Module &M, bool IsHIP) { Builder.CreateInBoundsGEP(offloading::getEntryTy(M), Entry, {ConstantInt::get(getSizeTTy(M), 0), ConstantInt::get(Type::getInt32Ty(C), 4)}); - auto *Data = Builder.CreateLoad(Type::getInt32Ty(C), FlagsPtr, "textype"); + auto *Data = Builder.CreateLoad(Type::getInt32Ty(C), DataPtr, "textype"); auto *Kind = Builder.CreateAnd( Flags, ConstantInt::get(Type::getInt32Ty(C), 0x7), "type"); ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] Fix tests hip-offload-compress-zlib/zstd.hip (PR #74783)
https://github.com/jhuber6 approved this pull request. https://github.com/llvm/llvm-project/pull/74783 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [CUDA] Add support for CUDA-12.3 and sm_90a (PR #74895)
@@ -80,8 +85,10 @@ class NVPTXSubtarget : public NVPTXGenSubtargetInfo { bool allowFP16Math() const; bool hasMaskOperator() const { return PTXVersion >= 71; } bool hasNoReturn() const { return SmVersion >= 30 && PTXVersion >= 64; } - unsigned int getSmVersion() const { return SmVersion; } + unsigned int getSmVersion() const { return FullSmVersion / 10; } + unsigned int getFullSmVersion() const { return FullSmVersion; } std::string getTargetName() const { return TargetName; } + bool isSm90a() const { return getFullSmVersion() == 901; } jhuber6 wrote: Could we expose this more like `getSmVersion` and `getSmFeature`? Has CUDA even documented how they intend to further build on this? https://github.com/llvm/llvm-project/pull/74895 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[llvm] [clang] [CUDA] Add support for CUDA-12.3 and sm_90a (PR #74895)
@@ -80,8 +85,10 @@ class NVPTXSubtarget : public NVPTXGenSubtargetInfo { bool allowFP16Math() const; bool hasMaskOperator() const { return PTXVersion >= 71; } bool hasNoReturn() const { return SmVersion >= 30 && PTXVersion >= 64; } - unsigned int getSmVersion() const { return SmVersion; } + unsigned int getSmVersion() const { return FullSmVersion / 10; } + unsigned int getFullSmVersion() const { return FullSmVersion; } std::string getTargetName() const { return TargetName; } + bool isSm90a() const { return getFullSmVersion() == 901; } jhuber6 wrote: Yeah, I was thinking that the internal representation would just be what "FullSMVersion" is now, but `getSMVersion` would return `/ 10` and `getFeatures` or something would be `% 10`. https://github.com/llvm/llvm-project/pull/74895 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
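For concreteness, the encoding being discussed works out as follows (my reading of the quoted diff; the `% 10` split is an assumption, not an official scheme):

```cpp
// sm_90 -> 900, sm_90a -> 901: the last decimal digit of FullSmVersion
// carries the architecture-specific ('a') feature variant.
unsigned FullSmVersion = 901;                   // sm_90a
unsigned SmVersion = FullSmVersion / 10;        // 90
unsigned SmFeatureVariant = FullSmVersion % 10; // 1 => the 'a' variant
bool IsSm90a = (SmVersion == 90 && SmFeatureVariant == 1);
```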
[clang] [llvm] [CUDA] Add support for CUDA-12.3 and sm_90a (PR #74895)
https://github.com/jhuber6 approved this pull request. https://github.com/llvm/llvm-project/pull/74895 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] ef23bba - [Linkerwrapper] Make -Xoffload-linker pass directly to `clang`
Author: Joseph Huber Date: 2023-12-11T07:56:19-06:00 New Revision: ef23bba6e5aecbc6008e8a9ff8740fc4b04fe814 URL: https://github.com/llvm/llvm-project/commit/ef23bba6e5aecbc6008e8a9ff8740fc4b04fe814 DIFF: https://github.com/llvm/llvm-project/commit/ef23bba6e5aecbc6008e8a9ff8740fc4b04fe814.diff LOG: [Linkerwrapper] Make -Xoffload-linker pass directly to `clang` Summary: We provide `-Xoffload-linker` to pass arguments directly to the link step. Currently this uses `-Wl,` implicitly which prevents us from using clang options that we otherwise could make use of. This patch removes that implicit behavior as users can just as easily pass `-Xoffload-linker -Wl,-foo` if needed. Added: Modified: clang/test/Driver/linker-wrapper.c clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp Removed: diff --git a/clang/test/Driver/linker-wrapper.c b/clang/test/Driver/linker-wrapper.c index e82febd618231..b763a003452ba 100644 --- a/clang/test/Driver/linker-wrapper.c +++ b/clang/test/Driver/linker-wrapper.c @@ -123,8 +123,8 @@ // RUN: --linker-path=/usr/bin/ld --device-linker=a --device-linker=nvptx64-nvidia-cuda=b -- \ // RUN: %t.o -o a.out 2>&1 | FileCheck %s --check-prefix=LINKER-ARGS -// LINKER-ARGS: clang{{.*}}--target=amdgcn-amd-amdhsa{{.*}}-Wl,a -// LINKER-ARGS: clang{{.*}}--target=nvptx64-nvidia-cuda{{.*}}-Wl,a -Wl,b +// LINKER-ARGS: clang{{.*}}--target=amdgcn-amd-amdhsa{{.*}}a +// LINKER-ARGS: clang{{.*}}--target=nvptx64-nvidia-cuda{{.*}}a b // RUN: not clang-linker-wrapper --dry-run --host-triple=x86_64-unknown-linux-gnu -ldummy \ // RUN: --linker-path=/usr/bin/ld --device-linker=a --device-linker=nvptx64-nvidia-cuda=b -- \ diff --git a/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp b/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp index db0ce3e2a1901..5d2fe98fe5601 100644 --- a/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp +++ b/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp @@ -428,7 +428,7 @@ Expected clang(ArrayRef InputFiles, const ArgList &Args) { std::back_inserter(CmdArgs)); for (StringRef Arg : Args.getAllArgValues(OPT_linker_arg_EQ)) -CmdArgs.push_back(Args.MakeArgString("-Wl," + Arg)); +CmdArgs.push_back(Args.MakeArgString(Arg)); for (StringRef Arg : Args.getAllArgValues(OPT_builtin_bitcode_EQ)) { if (llvm::Triple(Arg.split('=').first) == Triple) ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] f3d5758 - [libc] Fix the wrapper headers for 'toupper' and 'tolower'
Author: Joseph Huber Date: 2023-11-14T11:52:43-06:00 New Revision: f3d57583b4942056a930b6f1e4101063637e9e98 URL: https://github.com/llvm/llvm-project/commit/f3d57583b4942056a930b6f1e4101063637e9e98 DIFF: https://github.com/llvm/llvm-project/commit/f3d57583b4942056a930b6f1e4101063637e9e98.diff LOG: [libc] Fix the wrapper headers for 'toupper' and 'tolower' Summary: The GNU headers like to reassign this function to a new function which the optimizer will pick up unless compiling with `O0`. This uses an external LUT which we don't have and fails to link. This patch makes sure that the GPU portion does not include these extra definitions and we only use the ones we support. It's hacky, but it's the only way to disable it. Added: Modified: clang/lib/Headers/llvm_libc_wrappers/ctype.h Removed: diff --git a/clang/lib/Headers/llvm_libc_wrappers/ctype.h b/clang/lib/Headers/llvm_libc_wrappers/ctype.h index 084c5a97765a360..49c2af93471b0e7 100644 --- a/clang/lib/Headers/llvm_libc_wrappers/ctype.h +++ b/clang/lib/Headers/llvm_libc_wrappers/ctype.h @@ -13,8 +13,19 @@ #error "This file is for GPU offloading compilation only" #endif +// The GNU headers like to define 'toupper' and 'tolower' redundantly. This is +// necessary to prevent it from doing that and remapping our implementation. +#if (defined(__NVPTX__) || defined(__AMDGPU__)) && defined(__GLIBC__) +#pragma push_macro("__USE_EXTERN_INLINES") +#undef __USE_EXTERN_INLINES +#endif + #include_next +#if (defined(__NVPTX__) || defined(__AMDGPU__)) && defined(__GLIBC__) +#pragma pop_macro("__USE_EXTERN_INLINES") +#endif + #if __has_include() #if defined(__HIP__) || defined(__CUDA__) ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang] Introduce scoped variants of GNU atomic functions (PR #72280)
jhuber6 wrote: > Just a FYI, that recent NVIDIA GPUs have introduced a concept of [thread > block > cluster](https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#thread-block-clusters). > We may need another level of granularity between the block and device. Should be easy enough, though the numbers would no longer be incremental if we put it in between. It's somewhat difficult to decide what these things should be called. Also I was somewhat tempted to keep the names all the same length like the `__ATOMIC` ones are, but that might not be worth the effort. That being said, as far as I'm aware the Nvidia backend doesn't handle scoped atomics at all yet; we simply emit `volatile` versions even when scopes exist in PTX. https://github.com/llvm/llvm-project/pull/72280 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang] Introduce scoped variants of GNU atomic functions (PR #72280)
jhuber6 wrote: > Is there any actual difference now between these and the HIP/OpenCL flavors > other than dropping the language from the name? Yes, these directly copy the GNU functions and names. The OpenCL / HIP ones use a different format. https://github.com/llvm/llvm-project/pull/72280 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang] Introduce scoped variants of GNU atomic functions (PR #72280)
@@ -798,6 +798,13 @@ static void InitializePredefinedMacros(const TargetInfo &TI, Builder.defineMacro("__ATOMIC_ACQ_REL", "4"); Builder.defineMacro("__ATOMIC_SEQ_CST", "5"); + // Define macros for the clang atomic scopes. + Builder.defineMacro("__MEMORY_SCOPE_SYSTEM", "0"); + Builder.defineMacro("__MEMORY_SCOPE_DEVICE", "1"); + Builder.defineMacro("__MEMORY_SCOPE_WRKGRP", "2"); + Builder.defineMacro("__MEMORY_SCOPE_WVFRNT", "3"); + Builder.defineMacro("__MEMORY_SCOPE_SINGLE", "4"); + jhuber6 wrote: We could, though I might need to think of some better names. It's difficult to cover all the cases people might need. I think that cleanup would best be done in a follow-up patch. https://github.com/llvm/llvm-project/pull/72280 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang] Introduce scoped variants of GNU atomic functions (PR #72280)
@@ -205,6 +220,56 @@ class AtomicScopeHIPModel : public AtomicScopeModel { } }; +/// Defines the generic atomic scope model. +class AtomicScopeGenericModel : public AtomicScopeModel { +public: + /// The enum values match predefined built-in macros __ATOMIC_SCOPE_*. + enum ID { +System = 0, +Device = 1, +Workgroup = 2, +Wavefront = 3, +Single = 4, +Last = Single + }; + + AtomicScopeGenericModel() = default; + + SyncScope map(unsigned S) const override { +switch (static_cast(S)) { +case Device: + return SyncScope::DeviceScope; +case System: + return SyncScope::SystemScope; +case Workgroup: + return SyncScope::WorkgroupScope; +case Wavefront: + return SyncScope::WavefrontScope; +case Single: + return SyncScope::SingleScope; +} +llvm_unreachable("Invalid language sync scope value"); jhuber6 wrote: Mostly just copying the existing code for this, but we have semantic checks to ensure that the value is valid. So there's no chance that a user will actually get to specify anything different from the macros provided. https://github.com/llvm/llvm-project/pull/72280 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang] Introduce scoped variants of GNU atomic functions (PR #72280)
@@ -54,6 +59,16 @@ enum class SyncScope { inline llvm::StringRef getAsString(SyncScope S) { jhuber6 wrote: I think it's because this is for AST printing purposes, while the backend strings vary per target. https://github.com/llvm/llvm-project/pull/72280 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang] Introduce scoped variants of GNU atomic functions (PR #72280)
@@ -904,6 +904,32 @@ BUILTIN(__atomic_signal_fence, "vi", "n") BUILTIN(__atomic_always_lock_free, "bzvCD*", "nE") BUILTIN(__atomic_is_lock_free, "bzvCD*", "nE") +// GNU atomic builtins with atomic scopes. +ATOMIC_BUILTIN(__scoped_atomic_load, "v.", "t") jhuber6 wrote: Naming things is hard, we could do ``` __atomic_scoped_load __scoped_atomic_load __atomic_load_scoped ``` Unsure which is the best. https://github.com/llvm/llvm-project/pull/72280 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang] Introduce scoped variants of GNU atomic functions (PR #72280)
jhuber6 wrote: > Overall I think it is the right way to go. Memory scope has been used by > different offloading languages and the atomic clang builtins are essentially > the same. Adding a generic clang atomic builtins with memory scope allows > code sharing among offloading languages. I agree, I'm hoping to hear something from people more familiar with C/C++ or GNU stuff to see if they agree with this direction. Also it might help to decide on some better names for the memory scopes. https://github.com/llvm/llvm-project/pull/72280 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [openmp] [Clang][OpenMP] Fix ordering of processing of map clauses when mapping a struct. (PR #72410)
https://github.com/jhuber6 commented: This being in clang instead seems like a good change. Are there no CodeGen tests changed? We should add one if so. Probably just take your `libomptarget` test and run `update_cc_test_checks` on it with the arguments found in other test files. https://github.com/llvm/llvm-project/pull/72410 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[llvm] [clang] [LinkerWrapper] Support device binaries in multiple link jobs (PR #72442)
https://github.com/jhuber6 created https://github.com/llvm/llvm-project/pull/72442

Summary:
Currently the linker wrapper strictly assigns a single input binary to a single link job based on its input architecture. This is not sufficient to implement the AMDGPU target ID correctly as this could have many compatible architectures participating in multiple links. This patch introduces the ability to have a single binary input be linked multiple times. For example, given the following, we will now link in the static library where previously we would not.

clang foo.c -fopenmp --offload-arch=gfx90a
llvm-ar rcs libfoo.a foo.o
clang foo.c -fopenmp --offload-arch=gfx90a:xnack+ libfoo.a

This also means that given the following we will link the basic input twice, but that's on the user for providing two versions.

clang foo.c -fopenmp --offload-arch=gfx90a,gfx90a:xnack+

This should allow us to also support a "generic" target in the future for IR without a specific architecture.

This was revived from https://reviews.llvm.org/D152882. The previous issue was that the Windows build failed for unknown reasons. Investigating if that is still the case.

>From ad003f95734af878b14d24d81091618ec58901b5 Mon Sep 17 00:00:00 2001
From: Joseph Huber
Date: Wed, 15 Nov 2023 15:41:32 -0600
Subject: [PATCH] [LinkerWrapper] Support device binaries in multiple link jobs

Summary:
Currently the linker wrapper strictly assigns a single input binary to a single link job based on its input architecture. This is not sufficient to implement the AMDGPU target ID correctly as this could have many compatible architectures participating in multiple links. This patch introduces the ability to have a single binary input be linked multiple times. For example, given the following, we will now link in the static library where previously we would not.

clang foo.c -fopenmp --offload-arch=gfx90a
llvm-ar rcs libfoo.a foo.o
clang foo.c -fopenmp --offload-arch=gfx90a:xnack+ libfoo.a

This also means that given the following we will link the basic input twice, but that's on the user for providing two versions.

clang foo.c -fopenmp --offload-arch=gfx90a,gfx90a:xnack+

This should allow us to also support a "generic" target in the future for IR without a specific architecture.

This was revived from https://reviews.llvm.org/D152882. The previous issue was that the Windows build failed for unknown reasons. Investigating if that is still the case.

--- clang/lib/Driver/ToolChains/Clang.cpp | 4 +- clang/test/Driver/amdgpu-openmp-toolchain.c | 2 +- clang/test/Driver/linker-wrapper.c| 15 +++ .../ClangLinkerWrapper.cpp| 93 +++ .../clang-linker-wrapper/LinkerWrapperOpts.td | 3 + llvm/include/llvm/Object/OffloadBinary.h | 33 +++ 6 files changed, 109 insertions(+), 41 deletions(-) diff --git a/clang/lib/Driver/ToolChains/Clang.cpp b/clang/lib/Driver/ToolChains/Clang.cpp index b462f5a44057d94..b845feb0ef2d9db 100644 --- a/clang/lib/Driver/ToolChains/Clang.cpp +++ b/clang/lib/Driver/ToolChains/Clang.cpp @@ -8692,12 +8692,10 @@ void OffloadPackager::ConstructJob(Compilation &C, const JobAction &JA, } } -// TODO: We need to pass in the full target-id and handle it properly in the -// linker wrapper.
SmallVector Parts{ "file=" + File.str(), "triple=" + TC->getTripleString(), -"arch=" + getProcessorFromTargetID(TC->getTriple(), Arch).str(), +"arch=" + Arch.str(), "kind=" + Kind.str(), }; diff --git a/clang/test/Driver/amdgpu-openmp-toolchain.c b/clang/test/Driver/amdgpu-openmp-toolchain.c index f38486ad073..daa41b216089b2b 100644 --- a/clang/test/Driver/amdgpu-openmp-toolchain.c +++ b/clang/test/Driver/amdgpu-openmp-toolchain.c @@ -65,7 +65,7 @@ // RUN: %clang -### -target x86_64-pc-linux-gnu -fopenmp --offload-arch=gfx90a:sramecc-:xnack+ \ // RUN: -nogpulib %s 2>&1 | FileCheck %s --check-prefix=CHECK-TARGET-ID -// CHECK-TARGET-ID: clang-offload-packager{{.*}}arch=gfx90a,kind=openmp,feature=-sramecc,feature=+xnack +// CHECK-TARGET-ID: clang-offload-packager{{.*}}arch=gfx90a:sramecc-:xnack+,kind=openmp,feature=-sramecc,feature=+xnack // RUN: not %clang -### -target x86_64-pc-linux-gnu -fopenmp --offload-arch=gfx90a,gfx90a:xnack+ \ // RUN: -nogpulib %s 2>&1 | FileCheck %s --check-prefix=CHECK-TARGET-ID-ERROR diff --git a/clang/test/Driver/linker-wrapper.c b/clang/test/Driver/linker-wrapper.c index da7bdc22153ceae..538520be9ac464a 100644 --- a/clang/test/Driver/linker-wrapper.c +++ b/clang/test/Driver/linker-wrapper.c @@ -2,6 +2,9 @@ // REQUIRES: nvptx-registered-target // REQUIRES: amdgpu-registered-target +// An externally visible variable so static libraries extract. +__attribute__((visibility("protected"), used)) int x; + // RUN: %clang -cc1 %s -triple x86_64-unknown-linux-gnu -emit-obj -o %t.elf.o // RUN: %clang -cc1 %s -triple nvptx64-nvidia-cuda -emit-llvm-bc -o %t.nvptx.bc // RU
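To spell out what "compatible" means for the link-job assignment described in this PR, here is a simplified model (illustrative only; this is not the linker wrapper's actual implementation, and the helper below is invented for the example):

```cpp
#include <map>
#include <string>

// Simplified model of AMDGPU target-ID compatibility: the base processor must
// match exactly, and an explicitly set feature (xnack, sramecc) only conflicts
// with the opposite explicit setting; leaving it unset ("any") is compatible
// with either.
using FeatureMap = std::map<std::string, bool>; // e.g. {{"xnack", true}}

static bool isCompatibleTargetID(const std::string &ProcA,
                                 const FeatureMap &FeatsA,
                                 const std::string &ProcB,
                                 const FeatureMap &FeatsB) {
  if (ProcA != ProcB)
    return false;
  for (const auto &[Name, Enabled] : FeatsA) {
    auto It = FeatsB.find(Name);
    if (It != FeatsB.end() && It->second != Enabled)
      return false; // both sides set the feature explicitly and disagree
  }
  return true;
}

// gfx90a (xnack unset) may participate in a gfx90a:xnack+ link:
//   isCompatibleTargetID("gfx90a", {}, "gfx90a", {{"xnack", true}})  -> true
// gfx90a:xnack- may not:
//   isCompatibleTargetID("gfx90a", {{"xnack", false}},
//                        "gfx90a", {{"xnack", true}})                -> false
```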
[llvm] [clang] [LinkerWrapper] Support device binaries in multiple link jobs (PR #72442)
jhuber6 wrote: The Windows builder gives the following error which I don't relieve on Linux. Does anyone have any clue what this `invalid argument` error could be caused by? ``` # note: command had no output on stdout or stderr # error: command failed with exit status: 1 # executed command: 'c:\ws\src\build\bin\filecheck.exe' 'C:\ws\src\clang\test\Driver\linker-wrapper.c' --check-prefix=AMDGPU-LINK-ID # .---command stderr # | C:\ws\src\clang\test\Driver\linker-wrapper.c:52:20: error: AMDGPU-LINK-ID: expected string not found in input # | // AMDGPU-LINK-ID: clang{{.*}} -o {{.*}}.img --target=amdgcn-amd-amdhsa -mcpu=gfx90a -O2 -Wl,--no-undefined {{.*}}.o {{.*}}.o # |^ # | :1:1: note: scanning from here # | c:\ws\src\build\bin\clang-linker-wrapper.exe: error: invalid argument # | ^ # | # | Input file: # | Check file: C:\ws\src\clang\test\Driver\linker-wrapper.c # | # | -dump-input=help explains the following input dump. # | # | Input was: # | << # | 1: c:\ws\src\build\bin\clang-linker-wrapper.exe: error: invalid argument # | check:52 X~ error: no match found # | >> # `- # error: command failed with exit status: 1 ``` https://github.com/llvm/llvm-project/pull/72442 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[llvm] [clang] [LinkerWrapper] Support device binaries in multiple link jobs (PR #72442)
jhuber6 wrote: > the error msg is not generated by offload wrapper itself, right? is it from > some program called by the offload wrapper? It may be caused by the `clang` invocation. Though I'm unsure why this change causes that test to fail. https://github.com/llvm/llvm-project/pull/72442 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[llvm] [clang] [Offloading][NFC] Refactor handling of offloading entries (PR #72544)
https://github.com/jhuber6 created https://github.com/llvm/llvm-project/pull/72544 Summary: This patch is a simple refactoring of code out of the linker wrapper into a common location. The main motivation behind this change is to make it easier to change the handling in the future to accept a triple to be used to emit entries that function on that target. >From 0047be2207b775e6de6dda24751daa933bd66ce5 Mon Sep 17 00:00:00 2001 From: Joseph Huber Date: Thu, 16 Nov 2023 12:05:09 -0600 Subject: [PATCH] [Offloading][NFC] Refactor handling of offloading entries Summary: This patch is a simple refactoring of code out of the linker wrapper into a common location. The main motivation behind this change is to make it easier to change the handling in the future to accept a triple to be used to emit entries that function on that target. --- clang/test/Driver/linker-wrapper-image.c | 38 +++--- .../tools/clang-linker-wrapper/CMakeLists.txt | 1 + .../clang-linker-wrapper/OffloadWrapper.cpp | 116 -- .../llvm/Frontend/Offloading/Utility.h| 11 +- llvm/lib/Frontend/Offloading/Utility.cpp | 31 - 5 files changed, 82 insertions(+), 115 deletions(-) diff --git a/clang/test/Driver/linker-wrapper-image.c b/clang/test/Driver/linker-wrapper-image.c index 83e7db6a49a6bb3..bb641a08bc023d5 100644 --- a/clang/test/Driver/linker-wrapper-image.c +++ b/clang/test/Driver/linker-wrapper-image.c @@ -10,9 +10,9 @@ // RUN: clang-linker-wrapper --print-wrapped-module --dry-run --host-triple=x86_64-unknown-linux-gnu \ // RUN: --linker-path=/usr/bin/ld -- %t.o -o a.out 2>&1 | FileCheck %s --check-prefix=OPENMP -// OPENMP: @__start_omp_offloading_entries = external hidden constant %__tgt_offload_entry -// OPENMP-NEXT: @__stop_omp_offloading_entries = external hidden constant %__tgt_offload_entry -// OPENMP-NEXT: @__dummy.omp_offloading.entry = hidden constant [0 x %__tgt_offload_entry] zeroinitializer, section "omp_offloading_entries" +// OPENMP: @__start_omp_offloading_entries = external hidden constant [0 x %struct.__tgt_offload_entry] +// OPENMP-NEXT: @__stop_omp_offloading_entries = external hidden constant [0 x %struct.__tgt_offload_entry] +// OPENMP-NEXT: @__dummy.omp_offloading_entries = hidden constant [0 x %struct.__tgt_offload_entry] zeroinitializer, section "omp_offloading_entries" // OPENMP-NEXT: @.omp_offloading.device_image = internal unnamed_addr constant [[[SIZE:[0-9]+]] x i8] c"\10\FF\10\AD{{.*}}" // OPENMP-NEXT: @.omp_offloading.device_images = internal unnamed_addr constant [1 x %__tgt_device_image] [%__tgt_device_image { ptr @.omp_offloading.device_image, ptr getelementptr inbounds ([[[SIZE]] x i8], ptr @.omp_offloading.device_image, i64 1, i64 0), ptr @__start_omp_offloading_entries, ptr @__stop_omp_offloading_entries }] // OPENMP-NEXT: @.omp_offloading.descriptor = internal constant %__tgt_bin_desc { i32 1, ptr @.omp_offloading.device_images, ptr @__start_omp_offloading_entries, ptr @__stop_omp_offloading_entries } @@ -39,10 +39,10 @@ // CUDA: @.fatbin_image = internal constant [0 x i8] zeroinitializer, section ".nv_fatbin" // CUDA-NEXT: @.fatbin_wrapper = internal constant %fatbin_wrapper { i32 1180844977, i32 1, ptr @.fatbin_image, ptr null }, section ".nvFatBinSegment", align 8 -// CUDA-NEXT: @__dummy.cuda_offloading.entry = hidden constant [0 x %__tgt_offload_entry] zeroinitializer, section "cuda_offloading_entries" // CUDA-NEXT: @.cuda.binary_handle = internal global ptr null -// CUDA-NEXT: @__start_cuda_offloading_entries = external hidden constant [0 x %__tgt_offload_entry] -// CUDA-NEXT: 
@__stop_cuda_offloading_entries = external hidden constant [0 x %__tgt_offload_entry] +// CUDA-NEXT: @__start_cuda_offloading_entries = external hidden constant [0 x %struct.__tgt_offload_entry] +// CUDA-NEXT: @__stop_cuda_offloading_entries = external hidden constant [0 x %struct.__tgt_offload_entry] +// CUDA-NEXT: @__dummy.cuda_offloading_entries = hidden constant [0 x %struct.__tgt_offload_entry] zeroinitializer, section "cuda_offloading_entries" // CUDA-NEXT: @llvm.global_ctors = appending global [1 x { i32, ptr, ptr }] [{ i32, ptr, ptr } { i32 1, ptr @.cuda.fatbin_reg, ptr null }] // CUDA: define internal void @.cuda.fatbin_reg() section ".text.startup" { @@ -68,13 +68,13 @@ // CUDA: while.entry: // CUDA-NEXT: %entry1 = phi ptr [ @__start_cuda_offloading_entries, %entry ], [ %7, %if.end ] -// CUDA-NEXT: %1 = getelementptr inbounds %__tgt_offload_entry, ptr %entry1, i64 0, i32 0 +// CUDA-NEXT: %1 = getelementptr inbounds %struct.__tgt_offload_entry, ptr %entry1, i64 0, i32 0 // CUDA-NEXT: %addr = load ptr, ptr %1, align 8 -// CUDA-NEXT: %2 = getelementptr inbounds %__tgt_offload_entry, ptr %entry1, i64 0, i32 1 +// CUDA-NEXT: %2 = getelementptr inbounds %struct.__tgt_offload_entry, ptr %entry1, i64 0, i32 1 // CUDA-NEXT: %name = load ptr, ptr %2, align 8 -// CUDA-NEXT: %3 = getelementpt
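For orientation when reading the `%struct.__tgt_offload_entry` checks above, the host-side entry layout is roughly the following (as I recall it; the offloading runtime's headers are the authoritative definition). The registration code walks an array of these between the `__start_`/`__stop_` section symbols:

```cpp
#include <cstddef>
#include <cstdint>

// Sketch of the entry layout assumed by the wrapper code quoted above.
struct __tgt_offload_entry {
  void *addr;    // host address of the kernel stub or device global
  char *name;    // symbol name used to look the entity up in the device image
  size_t size;   // size in bytes for globals, 0 for kernels/functions
  int32_t flags; // entry kind (e.g. ctor/dtor) and extra bits
  int32_t data;  // auxiliary data (e.g. the texture type in the CUDA path)
};
```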
[clang] [AMDGPU] Treat printf as builtin for OpenCL (PR #72554)
https://github.com/jhuber6 edited https://github.com/llvm/llvm-project/pull/72554 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [AMDGPU] Treat printf as builtin for OpenCL (PR #72554)
@@ -2458,6 +2458,11 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID, &getTarget().getLongDoubleFormat() == &llvm::APFloat::IEEEquad()) BuiltinID = mutateLongDoubleBuiltin(BuiltinID); + // Mutate the printf builtin ID so that we use the same CodeGen path for + // HIP and OpenCL with AMDGPU targets. + if (getTarget().getTriple().isAMDGCN() && BuiltinID == AMDGPU::BIprintf) + BuiltinID = Builtin::BIprintf; jhuber6 wrote: I'm very close to landing 'real' printf support in the GPU libc where `printf` is just a regular function call. Will this change the handling for that in any way? I've already had to make the backend pass respect `-fno-builtins` and remove `ockl` from OpenMP to make that possible so I'm hoping we don't end up with a lot more special casing for `printf`. https://github.com/llvm/llvm-project/pull/72554 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [AMDGPU] Treat printf as builtin for OpenCL (PR #72554)
https://github.com/jhuber6 commented: Any tests? Can you explain why it's not sufficient to do this lowering in the AMDGPU pass? https://github.com/llvm/llvm-project/pull/72554 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [AMDGPU] Treat printf as builtin for OpenCL (PR #72554)
@@ -2458,6 +2458,11 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID, &getTarget().getLongDoubleFormat() == &llvm::APFloat::IEEEquad()) BuiltinID = mutateLongDoubleBuiltin(BuiltinID); + // Mutate the printf builtin ID so that we use the same CodeGen path for + // HIP and OpenCL with AMDGPU targets. + if (getTarget().getTriple().isAMDGCN() && BuiltinID == AMDGPU::BIprintf) + BuiltinID = Builtin::BIprintf; jhuber6 wrote: If we do the eager replacement of `printf` that HIP and OpenCL uses currently then it won't be linked in. So users should still be able to link in stuff like `strcmp` or whatever without it interfering. This would require the new driver however, and if they attempted to use something like `fputs` it would segfault because no one initialized the buffer, which isn't a terrible failure mode all things considered. https://github.com/llvm/llvm-project/pull/72554 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang] Introduce scoped variants of GNU atomic functions (PR #72280)
jhuber6 wrote: > I'm a little wary of adding these without actually going through any sort of > standardization process; if other vendors don't support the same interface, > we just have more variations. (See also > https://clang.llvm.org/get_involved.html#criteria ) > > How consistent are the various scopes across targets? How likely is it that > some target will need additional scopes that you haven't specified? I figured we can just treat these as `clang` extensions for the time being. We already have two variants that are more or less redundant for specific use-cases, (OpenCL and HIP), which should be able to be removed after this. Predicting all kinds of scopes is hard. The easy solution is to just number this or something since it's hierarchical. But @Artem-B has already pointed out that Nvidia has a scope between "device" and "blocks". Pretty much every system is going to have a conception of "device" and "system" and "single threaded" however. https://github.com/llvm/llvm-project/pull/72280 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang] Introduce scoped variants of GNU atomic functions (PR #72280)
jhuber6 wrote: > > I figured we can just treat these as clang extensions for the time being. > > We already have two variants that are more or less redundant for specific > > use cases (OpenCL and HIP), which should be removable after this. > > I'm not sure what you mean here. If you mean that users are expected to use > the OpenCL/HIP/etc. standard APIs, and you only expect to use these as part > of language runtimes, then maybe we don't care so much if it's clang-only. They should be available to users, but this level of programming is highly compiler-dependent already, so I don't see it as much different. > It might be worth considering using string literals instead of numbers for > the different scopes. It removes any question of whether the list of scopes > is complete and the order of numbers on the list. And it doesn't require > defining a bunch of new preprocessor macros. The underlying implementation is a string literal in the LLVM `syncscope` argument, but the problem is that this isn't standardized at all and varies between backends potentially. I suppose we could think of this more literally as "target the LLVM `syncscope` argument". I'd like something that's "reasonably" consistent between targets, since a lot of this can be shared as a simple hierarchy. It would be really annoying if each target had to define separate strings for something that's mostly common in concept. https://github.com/llvm/llvm-project/pull/72280 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
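For readers unfamiliar with the `syncscope` argument being referenced: it is just a string registered with the `LLVMContext` and attached to the atomic instruction. A minimal sketch, not part of this patch and assuming AMDGPU's "agent" scope as a stand-in for a "device"-level scope:

```cpp
// Sketch: emit an atomic add restricted to a target-specific sync scope.
// The scope string ("agent") is an assumption for illustration; each backend
// defines and interprets its own set of scope names.
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/Module.h"

using namespace llvm;

Value *emitDeviceScopedAdd(Module &M, IRBuilder<> &Builder, Value *Ptr,
                           Value *Val) {
  SyncScope::ID Agent = M.getContext().getOrInsertSyncScopeID("agent");
  return Builder.CreateAtomicRMW(AtomicRMWInst::Add, Ptr, Val, MaybeAlign(),
                                 AtomicOrdering::SequentiallyConsistent, Agent);
}
```

Because the scope is only a string at the IR level, any frontend-visible naming scheme ultimately has to be translated per backend, which is the consistency problem being discussed.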
[clang] [Clang] Introduce scoped variants of GNU atomic functions (PR #72280)
jhuber6 wrote: > > The underlying implementation is a string literal in the LLVM syncscope > > argument, but the problem is that this isn't standardized at all and varies > > between backends potentially > > We don't have to use the same set of strings as syncscope if that doesn't > make sense. I don't think there's much of a point to making them strings if it's not directly invoking the syncscope name for the backend. Realistically, as long as we give them descriptive names, we can just ignore the ones that don't apply on a given target. Right now, for example, you can use these scoped variants in x64 code but they have no effect. Either that, or we could use logic to fall back to the next hierarchy level that makes sense. As always, naming stuff is hard. https://github.com/llvm/llvm-project/pull/72280 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
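To make the fallback idea concrete, here is a hypothetical sketch; the abstract scope names and the AMDGPU mapping are assumptions for illustration, not something this patch defines. Each target maps the abstract scopes to its own syncscope strings, and a target with no scope support simply widens everything to the system scope.

```cpp
#include <string_view>

// Hypothetical abstract scopes, ordered from narrowest to widest.
enum class MemScope { SingleThread, Wavefront, Workgroup, Device, System };

// Assumed AMDGPU mapping; a scope the target does not distinguish would fall
// back to the next wider level.
std::string_view toAMDGPUSyncScope(MemScope S) {
  switch (S) {
  case MemScope::SingleThread: return "singlethread";
  case MemScope::Wavefront:    return "wavefront";
  case MemScope::Workgroup:    return "workgroup";
  case MemScope::Device:       return "agent";
  case MemScope::System:       return "";        // default (system) scope
  }
  return "";
}

// A target without scope support ignores the request entirely, matching the
// "no effect on x64" behavior described above.
std::string_view toX86SyncScope(MemScope) { return ""; }
```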
[llvm] [clang] [Offloading][NFC] Refactor handling of offloading entries (PR #72544)
https://github.com/jhuber6 closed https://github.com/llvm/llvm-project/pull/72544 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[llvm] [clang] [Offload] Initial support for registering offloading entries on COFF targets (PR #72697)
https://github.com/jhuber6 created https://github.com/llvm/llvm-project/pull/72697 Summary: This patch provides the initial support to allow handling the new driver's offloading entries. Normally, the ELF target can emit varibles at C-identifier named sections and the linker will provide a pointer to the section. For COFF target, instead the linker merges sections containing a `$` in alphabetical order. We thus can emit these variables at sections and then emit two variables that are guaranteed to be sorted before and after the others to traverse it. Previous patches consolidated the handling of offloading entries so that this patch more easily can handle mapping them to the appropriate section. Ideally, the only remaining step to allow the new driver to run on Windows targets is to accurately map the following `ld.lld` arguments to their `llvm-link` equivalents. These are used inside the linker-wrapper, so we should simply need to remap the arguments to the same functionality if possible. ``` -o, -output -l, --library -L, --library-path -v, --version -rpath -whole-archive, -no-whole-archive ``` I have not tested this at runtime as I do not have access to a windows machine. This patch was adapted from some initial efforts in https://reviews.llvm.org/D137470. >From 123a4a069166f3ba84dda479ca590fc4597b7074 Mon Sep 17 00:00:00 2001 From: Joseph Huber Date: Fri, 17 Nov 2023 14:09:59 -0600 Subject: [PATCH] [Offload] Initial support for registering offloading entries on COFF targets Summary: This patch provides the initial support to allow handling the new driver's offloading entries. Normally, the ELF target can emit varibles at C-identifier named sections and the linker will provide a pointer to the section. For COFF target, instead the linker merges sections containing a `$` in alphabetical order. We thus can emit these variables at sections and then emit two variables that are guaranteed to be sorted before and after the others to traverse it. Previous patches consolidated the handling of offloading entries so that this patch more easily can handle mapping them to the appropriate section. Ideally, the only remaining step to allow the new driver to run on Windows targets is to accurately map the following `ld.lld` arguments to their `llvm-link` equivalents. These are used inside the linker-wrapper, so we should simply need to remap the arguments to the same functionality if possible. ``` -o, -output -l, --library -L, --library-path -v, --version -rpath -whole-archive, -no-whole-archive ``` I have not tested this at runtime as I do not have access to a windows machine. This patch was adapted from some initial efforts in https://reviews.llvm.org/D137470. 
--- clang/test/CodeGenCUDA/offloading-entries.cu | 48 +++ clang/test/Driver/linker-wrapper-image.c | 50 ++- .../OpenMP/declare_target_link_codegen.cpp| 8 ++- llvm/lib/Frontend/Offloading/CMakeLists.txt | 1 + llvm/lib/Frontend/Offloading/Utility.cpp | 61 --- 5 files changed, 128 insertions(+), 40 deletions(-) diff --git a/clang/test/CodeGenCUDA/offloading-entries.cu b/clang/test/CodeGenCUDA/offloading-entries.cu index c4f8d2edad0a98e..46235051f1e4f12 100644 --- a/clang/test/CodeGenCUDA/offloading-entries.cu +++ b/clang/test/CodeGenCUDA/offloading-entries.cu @@ -5,6 +5,12 @@ // RUN: %clang_cc1 -std=c++11 -triple x86_64-unknown-linux-gnu -fgpu-rdc \ // RUN: --offload-new-driver -emit-llvm -o - -x hip %s | FileCheck \ // RUN: --check-prefix=HIP %s +// RUN: %clang_cc1 -std=c++11 -triple x86_64-unknown-windows-gnu -fgpu-rdc \ +// RUN: --offload-new-driver -emit-llvm -o - -x cuda %s | FileCheck \ +// RUN: --check-prefix=CUDA-COFF %s +// RUN: %clang_cc1 -std=c++11 -triple x86_64-unknown-windows-gnu -fgpu-rdc \ +// RUN: --offload-new-driver -emit-llvm -o - -x hip %s | FileCheck \ +// RUN: --check-prefix=HIP-COFF %s #include "Inputs/cuda.h" @@ -23,6 +29,20 @@ // HIP: @.omp_offloading.entry_name.2 = internal unnamed_addr constant [2 x i8] c"x\00" // HIP: @.omp_offloading.entry.x = weak constant %struct.__tgt_offload_entry { ptr @x, ptr @.omp_offloading.entry_name.2, i64 4, i32 0, i32 0 }, section "hip_offloading_entries", align 1 //. +// CUDA-COFF: @.omp_offloading.entry_name = internal unnamed_addr constant [8 x i8] c"_Z3foov\00" +// CUDA-COFF: @.omp_offloading.entry._Z3foov = weak constant %struct.__tgt_offload_entry { ptr @_Z18__device_stub__foov, ptr @.omp_offloading.entry_name, i64 0, i32 0, i32 0 }, section "cuda_offloading_entries$OE", align 1 +// CUDA-COFF: @.omp_offloading.entry_name.1 = internal unnamed_addr constant [8 x i8] c"_Z3barv\00" +// CUDA-COFF: @.omp_offloading.entry._Z3barv = weak constant %struct.__tgt_offload_entry { ptr @_Z18__device_stub__barv, ptr @.omp_offloading.entry_name.1, i64 0, i32 0, i32 0 }, section "cuda_offloading_entries$OE", align 1 +// CUDA-COFF: @.omp_offloading.entry_name.2 = internal unnamed_addr constant [2 x i8
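A minimal sketch of the section trick described above; the suffixes, symbol names, and linkage here are illustrative assumptions rather than exactly what the patch emits. Entries go into `<section>$OE`, and two zero-length arrays placed in `<section>$OA` and `<section>$OZ` are sorted before and after them by the COFF linker, giving the runtime a begin/end pair to walk.

```cpp
#include "llvm/ADT/Twine.h"
#include "llvm/IR/Constants.h"
#include "llvm/IR/DerivedTypes.h"
#include "llvm/IR/GlobalVariable.h"
#include "llvm/IR/Module.h"
#include <utility>

using namespace llvm;

// Emit begin/end sentinels that bracket the "$OE" entries on COFF targets.
std::pair<GlobalVariable *, GlobalVariable *>
emitCOFFEntrySentinels(Module &M, StructType *EntryTy, StringRef Section) {
  auto *ArrTy = ArrayType::get(EntryTy, 0);
  auto *ZeroInit = ConstantAggregateZero::get(ArrTy);

  auto *Begin = new GlobalVariable(M, ArrTy, /*isConstant=*/true,
                                   GlobalValue::WeakAnyLinkage, ZeroInit,
                                   "__start_" + Section);
  Begin->setSection((Section + "$OA").str()); // sorts before "$OE"

  auto *End = new GlobalVariable(M, ArrTy, /*isConstant=*/true,
                                 GlobalValue::WeakAnyLinkage, ZeroInit,
                                 "__stop_" + Section);
  End->setSection((Section + "$OZ").str());   // sorts after "$OE"

  return {Begin, End};
}
```

On ELF the equivalent begin/end pointers come for free from the linker-defined `__start_<section>`/`__stop_<section>` symbols, which is why only the COFF path needs explicit sentinels.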
[clang] [LinkerWrapper] Accept some needed COFF linker arguments (PR #72889)
https://github.com/jhuber6 created https://github.com/llvm/llvm-project/pull/72889 Summary: The linker wrapper is a utility used to create offloading programs from single-source offloading languages such as OpenMP or CUDA. This is done by embedding device code into the host object, then feeding it into the linker wrapper which extracts the accelerator object files, links them, then wraps them in registration code for the target runtime. This previously has only worked in Linux / ELF platforms. This patch attempts to hand Windows / COFF inputs by also accepting COFF forms of certain linker arguments we use internally. The important arguments are library search paths, so we can identify libraries which may contain device code, libraries themselves, and the output name used for intermediate output. I am not intimately familiar with the semantics here for the semantics in how a `lib` file is earched. I am simply treating `foo.lib` as the GNU equivalent `-l:foo.lib` in the search logic. Similarly, I am assuming that static libraries will be llvm-ar style libraries. I will need to investigate the actual deficiencies later, but this should be a good starting point along with https://github.com/llvm/llvm-project/pull/72697 >From d06171561581d9d15c14f756c8999b478e1d769e Mon Sep 17 00:00:00 2001 From: Joseph Huber Date: Mon, 20 Nov 2023 10:12:04 -0600 Subject: [PATCH] [LinkerWrapper] Accenp some neede COFF linker argument Summary: The linker wrapper is a utility used to create offloading programs from single-source offloading languages such as OpenMP or CUDA. This is done by embedding device code into the host object, then feeding it into the linker wrapper which extracts the accelerator object files, links them, then wraps them in registration code for the target runtime. This previously has only worked in Linux / ELF platforms. This patch attempts to hand Windows / COFF inputs by also accepting COFF forms of certain linker arguments we use internally. The important arguments are library search paths, so we can identify libraries which may contain device code, libraries themselves, and the output name used for intermediate output. I am not intimately familiar with the semantics here for the semantics in how a `lib` file is earched. I am simply treating `foo.lib` as the GNU equivalent `-l:foo.lib` in the search logic. Similarly, I am assuming that static libraries will be llvm-ar style libraries. 
I will need to investigate the actual deficiencies later, but this should be a good starting point along with https://github.com/llvm/llvm-project/pull/72697 --- clang/test/Driver/linker-wrapper.c | 8 .../ClangLinkerWrapper.cpp | 18 +- .../clang-linker-wrapper/LinkerWrapperOpts.td | 5 + 3 files changed, 26 insertions(+), 5 deletions(-) diff --git a/clang/test/Driver/linker-wrapper.c b/clang/test/Driver/linker-wrapper.c index da7bdc22153ceae..e82febd61823102 100644 --- a/clang/test/Driver/linker-wrapper.c +++ b/clang/test/Driver/linker-wrapper.c @@ -140,3 +140,11 @@ // RUN: --linker-path=/usr/bin/ld -- %t.o -o a.out 2>&1 | FileCheck %s --check-prefix=CLANG-BACKEND // CLANG-BACKEND: clang{{.*}} -o {{.*}}.img --target=amdgcn-amd-amdhsa -mcpu=gfx908 -O2 -Wl,--no-undefined {{.*}}.bc + +// RUN: clang-offload-packager -o %t.out \ +// RUN: --image=file=%t.elf.o,kind=openmp,triple=nvptx64-nvidia-cuda,arch=sm_70 +// RUN: %clang -cc1 %s -triple x86_64-unknown-windows-msvc -emit-obj -o %t.o -fembed-offload-object=%t.out +// RUN: clang-linker-wrapper --host-triple=x86_64-unknown-windows-msvc --dry-run \ +// RUN: --linker-path=/usr/bin/lld-link -- %t.o -libpath:./ -out:a.exe 2>&1 | FileCheck %s --check-prefix=COFF + +// COFF: "/usr/bin/lld-link" {{.*}}.o -libpath:./ -out:a.exe {{.*}}openmp.image.wrapper{{.*}} diff --git a/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp b/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp index bafe8ace60d1cea..db0ce3e2a190192 100644 --- a/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp +++ b/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp @@ -254,7 +254,7 @@ Error runLinker(ArrayRef Files, const ArgList &Args) { continue; Arg->render(Args, NewLinkerArgs); -if (Arg->getOption().matches(OPT_o)) +if (Arg->getOption().matches(OPT_o) || Arg->getOption().matches(OPT_out)) llvm::transform(Files, std::back_inserter(NewLinkerArgs), [&](StringRef Arg) { return Args.MakeArgString(Arg); }); } @@ -1188,7 +1188,7 @@ searchLibraryBaseName(StringRef Name, StringRef Root, /// `-lfoo` or `-l:libfoo.a`. std::optional searchLibrary(StringRef Input, StringRef Root, ArrayRef SearchPaths) { - if (Input.startswith(":")) + if (Input.startswith(":") || Input.ends_with(".lib")) return findFromSearchPaths(Input.drop_front(), Root, SearchPaths); return searchLibraryBaseName(Input, Root, SearchPath
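To illustrate the search rule mentioned in the summary, here is a sketch under stated assumptions rather than the patch's actual helper: a bare `foo.lib` input is treated like the GNU-style `-l:foo.lib`, i.e. looked up by its exact file name in each library search path.

```cpp
#include <filesystem>
#include <optional>
#include <string>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Resolve a library input: "-l:name" (with the leading ':' stripped) and
// "name.lib" both mean "find this exact file name in a search directory".
std::optional<fs::path> findLibrary(std::string Input,
                                    const std::vector<fs::path> &SearchPaths) {
  if (!Input.empty() && Input.front() == ':')
    Input.erase(Input.begin());
  for (const fs::path &Dir : SearchPaths) {
    fs::path Candidate = Dir / Input;
    std::error_code EC;
    if (fs::exists(Candidate, EC))
      return Candidate;
  }
  return std::nullopt;
}
```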
[clang] 88b672b - [libc] Adjust headers for some implementations of 'stdio.h'
Author: Joseph Huber Date: 2023-11-20T11:22:59-06:00 New Revision: 88b672b0a79e9f68253abf7edcfa5a42d1321cae URL: https://github.com/llvm/llvm-project/commit/88b672b0a79e9f68253abf7edcfa5a42d1321cae DIFF: https://github.com/llvm/llvm-project/commit/88b672b0a79e9f68253abf7edcfa5a42d1321cae.diff LOG: [libc] Adjust headers for some implementations of 'stdio.h' Summary: This is sometimes a macro, undefine it so we can declare it as the GPU needs. Added: Modified: clang/lib/Headers/llvm_libc_wrappers/stdio.h Removed: diff --git a/clang/lib/Headers/llvm_libc_wrappers/stdio.h b/clang/lib/Headers/llvm_libc_wrappers/stdio.h index 51b0f0e3307772c..0870f3e741ec135 100644 --- a/clang/lib/Headers/llvm_libc_wrappers/stdio.h +++ b/clang/lib/Headers/llvm_libc_wrappers/stdio.h @@ -21,6 +21,17 @@ #define __LIBC_ATTRS __attribute__((device)) #endif +// Some headers provide these as macros. Temporarily undefine them so they do +// not conflict with any definitions for the GPU. + +#pragma push_macro("stdout") +#pragma push_macro("stdin") +#pragma push_macro("stderr") + +#undef stdout +#undef stderr +#undef stdin + #pragma omp begin declare target #include @@ -29,6 +40,13 @@ #undef __LIBC_ATTRS +// Restore the original macros when compiling on the host. +#if !defined(__NVPTX__) && !defined(__AMDGPU__) +#pragma pop_macro("stdout") +#pragma pop_macro("stderr") +#pragma pop_macro("stdin") +#endif + #endif #endif // __CLANG_LLVM_LIBC_WRAPPERS_STDIO_H__ ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [LinkerWrapper] Accept some needed COFF linker arguments (PR #72889)
https://github.com/jhuber6 edited https://github.com/llvm/llvm-project/pull/72889 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[llvm] [clang] [Offload] Initial support for registering offloading entries on COFF targets (PR #72697)
@@ -62,35 +63,51 @@ void offloading::emitOffloadingEntry(Module &M, Constant *Addr, StringRef Name, M.getDataLayout().getDefaultGlobalsAddressSpace()); // The entry has to be created in the section the linker expects it to be. - Entry->setSection(SectionName); + if (Triple.isOSBinFormatCOFF()) +Entry->setSection((SectionName + "$OE").str()); + else +Entry->setSection(SectionName); Entry->setAlignment(Align(1)); } std::pair offloading::getOffloadEntryArray(Module &M, StringRef SectionName) { - auto *EntriesB = - new GlobalVariable(M, ArrayType::get(getEntryTy(M), 0), - /*isConstant=*/true, GlobalValue::ExternalLinkage, - /*Initializer=*/nullptr, "__start_" + SectionName); + llvm::Triple Triple(M.getTargetTriple()); + + auto *ZeroInitilaizer = + ConstantAggregateZero::get(ArrayType::get(getEntryTy(M), 0u)); + auto *EntryInit = Triple.isOSBinFormatCOFF() ? ZeroInitilaizer : nullptr; + auto *EntryType = Triple.isOSBinFormatCOFF() +? ZeroInitilaizer->getType() +: ArrayType::get(getEntryTy(M), 0); jhuber6 wrote: One is a `EntryTy` the other is a `EntryTy[]` https://github.com/llvm/llvm-project/pull/72697 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[llvm] [clang] [Offload] Initial support for registering offloading entries on COFF targets (PR #72697)
https://github.com/jhuber6 updated https://github.com/llvm/llvm-project/pull/72697 >From ef4e04961a1f553a9f1dced26e69e927060d4dd7 Mon Sep 17 00:00:00 2001 From: Joseph Huber Date: Fri, 17 Nov 2023 14:09:59 -0600 Subject: [PATCH] [Offload] Initial support for registering offloading entries on COFF targets Summary: This patch provides the initial support to allow handling the new driver's offloading entries. Normally, the ELF target can emit varibles at C-identifier named sections and the linker will provide a pointer to the section. For COFF target, instead the linker merges sections containing a `$` in alphabetical order. We thus can emit these variables at sections and then emit two variables that are guaranteed to be sorted before and after the others to traverse it. Previous patches consolidated the handling of offloading entries so that this patch more easily can handle mapping them to the appropriate section. Ideally, the only remaining step to allow the new driver to run on Windows targets is to accurately map the following `ld.lld` arguments to their `llvm-link` equivalents. These are used inside the linker-wrapper, so we should simply need to remap the arguments to the same functionality if possible. ``` -o, -output -l, --library -L, --library-path -v, --version -rpath -whole-archive, -no-whole-archive ``` I have not tested this at runtime as I do not have access to a windows machine. This patch was adapted from some initial efforts in https://reviews.llvm.org/D137470. --- clang/test/CodeGenCUDA/offloading-entries.cu | 48 + clang/test/Driver/linker-wrapper-image.c | 50 + .../OpenMP/declare_target_link_codegen.cpp| 8 ++- llvm/lib/Frontend/Offloading/CMakeLists.txt | 1 + llvm/lib/Frontend/Offloading/Utility.cpp | 70 --- 5 files changed, 132 insertions(+), 45 deletions(-) diff --git a/clang/test/CodeGenCUDA/offloading-entries.cu b/clang/test/CodeGenCUDA/offloading-entries.cu index c4f8d2edad0a98e..46235051f1e4f12 100644 --- a/clang/test/CodeGenCUDA/offloading-entries.cu +++ b/clang/test/CodeGenCUDA/offloading-entries.cu @@ -5,6 +5,12 @@ // RUN: %clang_cc1 -std=c++11 -triple x86_64-unknown-linux-gnu -fgpu-rdc \ // RUN: --offload-new-driver -emit-llvm -o - -x hip %s | FileCheck \ // RUN: --check-prefix=HIP %s +// RUN: %clang_cc1 -std=c++11 -triple x86_64-unknown-windows-gnu -fgpu-rdc \ +// RUN: --offload-new-driver -emit-llvm -o - -x cuda %s | FileCheck \ +// RUN: --check-prefix=CUDA-COFF %s +// RUN: %clang_cc1 -std=c++11 -triple x86_64-unknown-windows-gnu -fgpu-rdc \ +// RUN: --offload-new-driver -emit-llvm -o - -x hip %s | FileCheck \ +// RUN: --check-prefix=HIP-COFF %s #include "Inputs/cuda.h" @@ -23,6 +29,20 @@ // HIP: @.omp_offloading.entry_name.2 = internal unnamed_addr constant [2 x i8] c"x\00" // HIP: @.omp_offloading.entry.x = weak constant %struct.__tgt_offload_entry { ptr @x, ptr @.omp_offloading.entry_name.2, i64 4, i32 0, i32 0 }, section "hip_offloading_entries", align 1 //. 
+// CUDA-COFF: @.omp_offloading.entry_name = internal unnamed_addr constant [8 x i8] c"_Z3foov\00" +// CUDA-COFF: @.omp_offloading.entry._Z3foov = weak constant %struct.__tgt_offload_entry { ptr @_Z18__device_stub__foov, ptr @.omp_offloading.entry_name, i64 0, i32 0, i32 0 }, section "cuda_offloading_entries$OE", align 1 +// CUDA-COFF: @.omp_offloading.entry_name.1 = internal unnamed_addr constant [8 x i8] c"_Z3barv\00" +// CUDA-COFF: @.omp_offloading.entry._Z3barv = weak constant %struct.__tgt_offload_entry { ptr @_Z18__device_stub__barv, ptr @.omp_offloading.entry_name.1, i64 0, i32 0, i32 0 }, section "cuda_offloading_entries$OE", align 1 +// CUDA-COFF: @.omp_offloading.entry_name.2 = internal unnamed_addr constant [2 x i8] c"x\00" +// CUDA-COFF: @.omp_offloading.entry.x = weak constant %struct.__tgt_offload_entry { ptr @x, ptr @.omp_offloading.entry_name.2, i64 4, i32 0, i32 0 }, section "cuda_offloading_entries$OE", align 1 +//. +// HIP-COFF: @.omp_offloading.entry_name = internal unnamed_addr constant [8 x i8] c"_Z3foov\00" +// HIP-COFF: @.omp_offloading.entry._Z3foov = weak constant %struct.__tgt_offload_entry { ptr @_Z3foov, ptr @.omp_offloading.entry_name, i64 0, i32 0, i32 0 }, section "hip_offloading_entries$OE", align 1 +// HIP-COFF: @.omp_offloading.entry_name.1 = internal unnamed_addr constant [8 x i8] c"_Z3barv\00" +// HIP-COFF: @.omp_offloading.entry._Z3barv = weak constant %struct.__tgt_offload_entry { ptr @_Z3barv, ptr @.omp_offloading.entry_name.1, i64 0, i32 0, i32 0 }, section "hip_offloading_entries$OE", align 1 +// HIP-COFF: @.omp_offloading.entry_name.2 = internal unnamed_addr constant [2 x i8] c"x\00" +// HIP-COFF: @.omp_offloading.entry.x = weak constant %struct.__tgt_offload_entry { ptr @x, ptr @.omp_offloading.entry_name.2, i64 4, i32 0, i32 0 }, section "hip_offloading_entries$OE", align 1 +//. // CUDA-LABEL: @_Z18__device_stub__foov( // CUDA-NEXT: entry: // CUDA-
[llvm] [clang] [Offload] Initial support for registering offloading entries on COFF targets (PR #72697)
@@ -62,35 +63,51 @@ void offloading::emitOffloadingEntry(Module &M, Constant *Addr, StringRef Name, M.getDataLayout().getDefaultGlobalsAddressSpace()); // The entry has to be created in the section the linker expects it to be. - Entry->setSection(SectionName); + if (Triple.isOSBinFormatCOFF()) +Entry->setSection((SectionName + "$OE").str()); + else +Entry->setSection(SectionName); Entry->setAlignment(Align(1)); } std::pair offloading::getOffloadEntryArray(Module &M, StringRef SectionName) { - auto *EntriesB = - new GlobalVariable(M, ArrayType::get(getEntryTy(M), 0), - /*isConstant=*/true, GlobalValue::ExternalLinkage, - /*Initializer=*/nullptr, "__start_" + SectionName); + llvm::Triple Triple(M.getTargetTriple()); + + auto *ZeroInitilaizer = + ConstantAggregateZero::get(ArrayType::get(getEntryTy(M), 0u)); + auto *EntryInit = Triple.isOSBinFormatCOFF() ? ZeroInitilaizer : nullptr; + auto *EntryType = Triple.isOSBinFormatCOFF() +? ZeroInitilaizer->getType() +: ArrayType::get(getEntryTy(M), 0); jhuber6 wrote: Actually you're right, forgot the initializer used the array type as well. https://github.com/llvm/llvm-project/pull/72697 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
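For anyone following that exchange, a tiny standalone illustration (not patch code): a `ConstantAggregateZero` built from the zero-length array type carries the array type itself, so it can serve as both the initializer and the type of the sentinel globals.

```cpp
#include "llvm/IR/Constants.h"
#include "llvm/IR/DerivedTypes.h"

using namespace llvm;

// The zeroinitializer's type is [0 x EntryTy], not EntryTy itself.
bool initializerHasArrayType(StructType *EntryTy) {
  ArrayType *ArrTy = ArrayType::get(EntryTy, 0);
  Constant *Zero = ConstantAggregateZero::get(ArrTy);
  return Zero->getType() == ArrTy; // always true
}
```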
[llvm] [clang] [Offload] Initial support for registering offloading entries on COFF targets (PR #72697)
https://github.com/jhuber6 updated https://github.com/llvm/llvm-project/pull/72697 >From e3b6ab18f390e0ee4938095717aa9e4b21690aa7 Mon Sep 17 00:00:00 2001 From: Joseph Huber Date: Fri, 17 Nov 2023 14:09:59 -0600 Subject: [PATCH] [Offload] Initial support for registering offloading entries on COFF targets Summary: This patch provides the initial support to allow handling the new driver's offloading entries. Normally, the ELF target can emit varibles at C-identifier named sections and the linker will provide a pointer to the section. For COFF target, instead the linker merges sections containing a `$` in alphabetical order. We thus can emit these variables at sections and then emit two variables that are guaranteed to be sorted before and after the others to traverse it. Previous patches consolidated the handling of offloading entries so that this patch more easily can handle mapping them to the appropriate section. Ideally, the only remaining step to allow the new driver to run on Windows targets is to accurately map the following `ld.lld` arguments to their `llvm-link` equivalents. These are used inside the linker-wrapper, so we should simply need to remap the arguments to the same functionality if possible. ``` -o, -output -l, --library -L, --library-path -v, --version -rpath -whole-archive, -no-whole-archive ``` I have not tested this at runtime as I do not have access to a windows machine. This patch was adapted from some initial efforts in https://reviews.llvm.org/D137470. --- clang/test/CodeGenCUDA/offloading-entries.cu | 48 + clang/test/Driver/linker-wrapper-image.c | 50 ++ .../OpenMP/declare_target_link_codegen.cpp| 8 ++- llvm/lib/Frontend/Offloading/CMakeLists.txt | 1 + llvm/lib/Frontend/Offloading/Utility.cpp | 68 +++ 5 files changed, 130 insertions(+), 45 deletions(-) diff --git a/clang/test/CodeGenCUDA/offloading-entries.cu b/clang/test/CodeGenCUDA/offloading-entries.cu index c4f8d2edad0a98e..46235051f1e4f12 100644 --- a/clang/test/CodeGenCUDA/offloading-entries.cu +++ b/clang/test/CodeGenCUDA/offloading-entries.cu @@ -5,6 +5,12 @@ // RUN: %clang_cc1 -std=c++11 -triple x86_64-unknown-linux-gnu -fgpu-rdc \ // RUN: --offload-new-driver -emit-llvm -o - -x hip %s | FileCheck \ // RUN: --check-prefix=HIP %s +// RUN: %clang_cc1 -std=c++11 -triple x86_64-unknown-windows-gnu -fgpu-rdc \ +// RUN: --offload-new-driver -emit-llvm -o - -x cuda %s | FileCheck \ +// RUN: --check-prefix=CUDA-COFF %s +// RUN: %clang_cc1 -std=c++11 -triple x86_64-unknown-windows-gnu -fgpu-rdc \ +// RUN: --offload-new-driver -emit-llvm -o - -x hip %s | FileCheck \ +// RUN: --check-prefix=HIP-COFF %s #include "Inputs/cuda.h" @@ -23,6 +29,20 @@ // HIP: @.omp_offloading.entry_name.2 = internal unnamed_addr constant [2 x i8] c"x\00" // HIP: @.omp_offloading.entry.x = weak constant %struct.__tgt_offload_entry { ptr @x, ptr @.omp_offloading.entry_name.2, i64 4, i32 0, i32 0 }, section "hip_offloading_entries", align 1 //. 
+// CUDA-COFF: @.omp_offloading.entry_name = internal unnamed_addr constant [8 x i8] c"_Z3foov\00" +// CUDA-COFF: @.omp_offloading.entry._Z3foov = weak constant %struct.__tgt_offload_entry { ptr @_Z18__device_stub__foov, ptr @.omp_offloading.entry_name, i64 0, i32 0, i32 0 }, section "cuda_offloading_entries$OE", align 1 +// CUDA-COFF: @.omp_offloading.entry_name.1 = internal unnamed_addr constant [8 x i8] c"_Z3barv\00" +// CUDA-COFF: @.omp_offloading.entry._Z3barv = weak constant %struct.__tgt_offload_entry { ptr @_Z18__device_stub__barv, ptr @.omp_offloading.entry_name.1, i64 0, i32 0, i32 0 }, section "cuda_offloading_entries$OE", align 1 +// CUDA-COFF: @.omp_offloading.entry_name.2 = internal unnamed_addr constant [2 x i8] c"x\00" +// CUDA-COFF: @.omp_offloading.entry.x = weak constant %struct.__tgt_offload_entry { ptr @x, ptr @.omp_offloading.entry_name.2, i64 4, i32 0, i32 0 }, section "cuda_offloading_entries$OE", align 1 +//. +// HIP-COFF: @.omp_offloading.entry_name = internal unnamed_addr constant [8 x i8] c"_Z3foov\00" +// HIP-COFF: @.omp_offloading.entry._Z3foov = weak constant %struct.__tgt_offload_entry { ptr @_Z3foov, ptr @.omp_offloading.entry_name, i64 0, i32 0, i32 0 }, section "hip_offloading_entries$OE", align 1 +// HIP-COFF: @.omp_offloading.entry_name.1 = internal unnamed_addr constant [8 x i8] c"_Z3barv\00" +// HIP-COFF: @.omp_offloading.entry._Z3barv = weak constant %struct.__tgt_offload_entry { ptr @_Z3barv, ptr @.omp_offloading.entry_name.1, i64 0, i32 0, i32 0 }, section "hip_offloading_entries$OE", align 1 +// HIP-COFF: @.omp_offloading.entry_name.2 = internal unnamed_addr constant [2 x i8] c"x\00" +// HIP-COFF: @.omp_offloading.entry.x = weak constant %struct.__tgt_offload_entry { ptr @x, ptr @.omp_offloading.entry_name.2, i64 4, i32 0, i32 0 }, section "hip_offloading_entries$OE", align 1 +//. // CUDA-LABEL: @_Z18__device_stub__foov( // CUDA-NEXT: entry: // CUDA
[clang] [llvm] [Offload] Initial support for registering offloading entries on COFF targets (PR #72697)
https://github.com/jhuber6 updated https://github.com/llvm/llvm-project/pull/72697 >From 4627ea74d753eb6742051127e0a5b0c64a620f20 Mon Sep 17 00:00:00 2001 From: Joseph Huber Date: Fri, 17 Nov 2023 14:09:59 -0600 Subject: [PATCH] [Offload] Initial support for registering offloading entries on COFF targets Summary: This patch provides the initial support to allow handling the new driver's offloading entries. Normally, the ELF target can emit varibles at C-identifier named sections and the linker will provide a pointer to the section. For COFF target, instead the linker merges sections containing a `$` in alphabetical order. We thus can emit these variables at sections and then emit two variables that are guaranteed to be sorted before and after the others to traverse it. Previous patches consolidated the handling of offloading entries so that this patch more easily can handle mapping them to the appropriate section. Ideally, the only remaining step to allow the new driver to run on Windows targets is to accurately map the following `ld.lld` arguments to their `llvm-link` equivalents. These are used inside the linker-wrapper, so we should simply need to remap the arguments to the same functionality if possible. ``` -o, -output -l, --library -L, --library-path -v, --version -rpath -whole-archive, -no-whole-archive ``` I have not tested this at runtime as I do not have access to a windows machine. This patch was adapted from some initial efforts in https://reviews.llvm.org/D137470. --- clang/test/CodeGenCUDA/offloading-entries.cu | 48 + clang/test/Driver/linker-wrapper-image.c | 50 ++ .../OpenMP/declare_target_link_codegen.cpp| 8 ++- llvm/lib/Frontend/Offloading/CMakeLists.txt | 1 + llvm/lib/Frontend/Offloading/Utility.cpp | 68 +++ 5 files changed, 130 insertions(+), 45 deletions(-) diff --git a/clang/test/CodeGenCUDA/offloading-entries.cu b/clang/test/CodeGenCUDA/offloading-entries.cu index c4f8d2edad0a98e..46235051f1e4f12 100644 --- a/clang/test/CodeGenCUDA/offloading-entries.cu +++ b/clang/test/CodeGenCUDA/offloading-entries.cu @@ -5,6 +5,12 @@ // RUN: %clang_cc1 -std=c++11 -triple x86_64-unknown-linux-gnu -fgpu-rdc \ // RUN: --offload-new-driver -emit-llvm -o - -x hip %s | FileCheck \ // RUN: --check-prefix=HIP %s +// RUN: %clang_cc1 -std=c++11 -triple x86_64-unknown-windows-gnu -fgpu-rdc \ +// RUN: --offload-new-driver -emit-llvm -o - -x cuda %s | FileCheck \ +// RUN: --check-prefix=CUDA-COFF %s +// RUN: %clang_cc1 -std=c++11 -triple x86_64-unknown-windows-gnu -fgpu-rdc \ +// RUN: --offload-new-driver -emit-llvm -o - -x hip %s | FileCheck \ +// RUN: --check-prefix=HIP-COFF %s #include "Inputs/cuda.h" @@ -23,6 +29,20 @@ // HIP: @.omp_offloading.entry_name.2 = internal unnamed_addr constant [2 x i8] c"x\00" // HIP: @.omp_offloading.entry.x = weak constant %struct.__tgt_offload_entry { ptr @x, ptr @.omp_offloading.entry_name.2, i64 4, i32 0, i32 0 }, section "hip_offloading_entries", align 1 //. 
+// CUDA-COFF: @.omp_offloading.entry_name = internal unnamed_addr constant [8 x i8] c"_Z3foov\00" +// CUDA-COFF: @.omp_offloading.entry._Z3foov = weak constant %struct.__tgt_offload_entry { ptr @_Z18__device_stub__foov, ptr @.omp_offloading.entry_name, i64 0, i32 0, i32 0 }, section "cuda_offloading_entries$OE", align 1 +// CUDA-COFF: @.omp_offloading.entry_name.1 = internal unnamed_addr constant [8 x i8] c"_Z3barv\00" +// CUDA-COFF: @.omp_offloading.entry._Z3barv = weak constant %struct.__tgt_offload_entry { ptr @_Z18__device_stub__barv, ptr @.omp_offloading.entry_name.1, i64 0, i32 0, i32 0 }, section "cuda_offloading_entries$OE", align 1 +// CUDA-COFF: @.omp_offloading.entry_name.2 = internal unnamed_addr constant [2 x i8] c"x\00" +// CUDA-COFF: @.omp_offloading.entry.x = weak constant %struct.__tgt_offload_entry { ptr @x, ptr @.omp_offloading.entry_name.2, i64 4, i32 0, i32 0 }, section "cuda_offloading_entries$OE", align 1 +//. +// HIP-COFF: @.omp_offloading.entry_name = internal unnamed_addr constant [8 x i8] c"_Z3foov\00" +// HIP-COFF: @.omp_offloading.entry._Z3foov = weak constant %struct.__tgt_offload_entry { ptr @_Z3foov, ptr @.omp_offloading.entry_name, i64 0, i32 0, i32 0 }, section "hip_offloading_entries$OE", align 1 +// HIP-COFF: @.omp_offloading.entry_name.1 = internal unnamed_addr constant [8 x i8] c"_Z3barv\00" +// HIP-COFF: @.omp_offloading.entry._Z3barv = weak constant %struct.__tgt_offload_entry { ptr @_Z3barv, ptr @.omp_offloading.entry_name.1, i64 0, i32 0, i32 0 }, section "hip_offloading_entries$OE", align 1 +// HIP-COFF: @.omp_offloading.entry_name.2 = internal unnamed_addr constant [2 x i8] c"x\00" +// HIP-COFF: @.omp_offloading.entry.x = weak constant %struct.__tgt_offload_entry { ptr @x, ptr @.omp_offloading.entry_name.2, i64 4, i32 0, i32 0 }, section "hip_offloading_entries$OE", align 1 +//. // CUDA-LABEL: @_Z18__device_stub__foov( // CUDA-NEXT: entry: // CUDA
[llvm] [clang] [Offload] Initial support for registering offloading entries on COFF targets (PR #72697)
@@ -62,35 +63,51 @@ void offloading::emitOffloadingEntry(Module &M, Constant *Addr, StringRef Name, M.getDataLayout().getDefaultGlobalsAddressSpace()); // The entry has to be created in the section the linker expects it to be. - Entry->setSection(SectionName); + if (Triple.isOSBinFormatCOFF()) +Entry->setSection((SectionName + "$OE").str()); + else +Entry->setSection(SectionName); Entry->setAlignment(Align(1)); } std::pair offloading::getOffloadEntryArray(Module &M, StringRef SectionName) { - auto *EntriesB = - new GlobalVariable(M, ArrayType::get(getEntryTy(M), 0), - /*isConstant=*/true, GlobalValue::ExternalLinkage, - /*Initializer=*/nullptr, "__start_" + SectionName); + llvm::Triple Triple(M.getTargetTriple()); jhuber6 wrote: I fixed the other occurrences, these should stay to separate the type as I prefer `llvm::Triple Triple` over `Triple TheTriple` or similar. https://github.com/llvm/llvm-project/pull/72697 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [LinkerWrapper] Accept some needed COFF linker arguments (PR #72889)
@@ -126,3 +126,8 @@ def version : Flag<["--", "-"], "version">, Flags<[HelpHidden]>, Alias; def whole_archive : Flag<["--", "-"], "whole-archive">, Flags<[HelpHidden]>; def no_whole_archive : Flag<["--", "-"], "no-whole-archive">, Flags<[HelpHidden]>; + +// COFF-style linker options. +def out : Joined<["/", "-", "/?", "-?"], "out:">, Flags<[HelpHidden]>; jhuber6 wrote: I copied this from the COFF implementation of `lld-link` and assumed that it knew the flags better than I did. https://github.com/llvm/llvm-project/pull/72889 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [LinkerWrapper] Accept some needed lld-link linker arguments for COFF targets (PR #72889)
https://github.com/jhuber6 edited https://github.com/llvm/llvm-project/pull/72889 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [LinkerWrapper] Accept some needed lld-link linker arguments for COFF targets (PR #72889)
jhuber6 wrote: > The command-line argument handling is not related to > [PE](https://en.wikipedia.org/wiki/Portable_Executable)/COFF, but to > Microsoft's `link.exe` command line interface, for instance > [`/libpath:`](https://learn.microsoft.com/en-us/cpp/build/reference/libpath-additional-libpath?view=msvc-170). > `/usr/bin/lld-link` is a `link.exe`-compatible interface for lld with an > appropriate default triple, like `clang-cl` is for `clang`. IIRC, `lld` > chooses its command-line interface based on the `argv[0]` name, so should > clang-linker-wrapper when passed `--linker-path=/usr/bin/lld-link` instead of > `--linker-path=/usr/bin/lld`, but both should be able to generate PE files. > > That is, this patch is not necessarily wrong, but the commit message and "// > COFF-style linker options." should refer to the command line interface > instead. I changed the title. Realistically, I could probably try to separate these more logically, but I think it's easier to just handle them both independently but identically in the logic. https://github.com/llvm/llvm-project/pull/72889 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits