[PATCH] D39739: [HCC] Add flag to Import Weak Functions in Function Importer

2017-12-04 Thread Alex Voicu via Phabricator via cfe-commits
AlexVlx added inline comments.



Comment at: lib/Transforms/IPO/FunctionImport.cpp:107
+static cl::opt<bool>
+ForceImportWeak("force-import-weak", cl::Hidden,
+cl::desc("Allow weak functions to be imported"),

yaxunl wrote:
> Is it possible not to expose this option through extern? Generally these 
> options should be kept static.
This seems to suggest that it is not utterly unacceptable: 
http://llvm.org/docs/CommandLine.html#internal-vs-external-storage, and the use 
case described there maps pretty closely to this, since the flag itself must be 
accessible from two different TUs. Am I missing something?
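
For reference, a minimal sketch of the pattern the linked CommandLine guide describes, using the names from this patch; the second TU and the cl::init default below are illustrative assumptions, not quoted from the change:

  // FunctionImport.cpp -- define the flag with external linkage instead of static
  #include "llvm/Support/CommandLine.h"
  llvm::cl::opt<bool> ForceImportWeak(
      "force-import-weak", llvm::cl::init(false), // default value assumed here
      llvm::cl::Hidden,
      llvm::cl::desc("Allow weak functions to be imported"));

  // SomeOtherFile.cpp -- hypothetical second user of the flag, declaration only
  extern llvm::cl::opt<bool> ForceImportWeak;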


Repository:
  rL LLVM

https://reviews.llvm.org/D39739





[PATCH] D155775: [HIP][Clang][Driver][RFC] Add driver support for C++ Parallel Algorithm Offload

2023-09-19 Thread Alex Voicu via Phabricator via cfe-commits
AlexVlx added a comment.

Gentle ping.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155775/new/

https://reviews.llvm.org/D155775



[PATCH] D155769: [HIP][Clang][docs][RFC] Add documentation for C++ Parallel Algorithm Offload

2023-09-19 Thread Alex Voicu via Phabricator via cfe-commits
AlexVlx updated this revision to Diff 557057.
AlexVlx marked an inline comment as done.
AlexVlx added a comment.

Rebase.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155769/new/

https://reviews.llvm.org/D155769

Files:
  clang/docs/HIPSupport.rst
  clang/docs/ReleaseNotes.rst

Index: clang/docs/ReleaseNotes.rst
===
--- clang/docs/ReleaseNotes.rst
+++ clang/docs/ReleaseNotes.rst
@@ -390,6 +390,11 @@
 CUDA/HIP Language Changes
 ^
 
+- [HIP Only] add experimental support for offloading select C++ Algorithms,
+  which can be toggled via the ``-hipstdpar`` flag. This is available only for
+  the AMDGPU target at the moment. See the dedicated entry in the `HIP Language
+  support document for details `_
+
 CUDA Support
 
 
Index: clang/docs/HIPSupport.rst
===
--- clang/docs/HIPSupport.rst
+++ clang/docs/HIPSupport.rst
@@ -176,3 +176,392 @@
* - ``HIP_API_PER_THREAD_DEFAULT_STREAM``
  - Alias to ``__HIP_API_PER_THREAD_DEFAULT_STREAM__``. Deprecated.
 
+C++ Standard Parallelism Offload Support: Compiler And Runtime
+==============================================================
+
+Introduction
+
+
+This section describes the implementation of support for offloading the
+execution of standard C++ algorithms to accelerators that can be targeted via
+HIP. Furthermore, it enumerates restrictions on user defined code, as well as
+the interactions with runtimes.
+
+Algorithm Offload: What, Why, Where
+===================================
+
+C++17 introduced overloads
+`for most algorithms in the standard library `_
+which allow the user to specify a desired
+`execution policy `_.
+The `parallel_unsequenced_policy `_
+maps relatively well to the execution model of AMD GPUs. This, coupled with
+the availability and maturity of GPU accelerated algorithm libraries that
+implement most / all corresponding algorithms in the standard library
+(e.g. `rocThrust `_), makes
+it feasible to provide seamless accelerator offload for supported algorithms,
+when an accelerated version exists. Thus, it becomes possible to easily access
+the computational resources of an AMD accelerator, via a well specified,
+familiar, algorithmic interface, without having to delve into low-level hardware
+specific details. Putting it all together:
+
+- **What**: standard library algorithms, when invoked with the
+  ``parallel_unsequenced_policy``
+- **Why**: democratise AMDGPU accelerator programming, without loss of user
+  familiarity
+- **Where**: only AMDGPU accelerators targeted by Clang/LLVM via HIP
+
+Small Example
+=============
+
+Given the following C++ code, which assumes the ``std`` namespace is included:
+
+.. code-block:: C++
+
+   bool has_the_answer(const vector<int>& v) {
+ return find(execution::par_unseq, cbegin(v), cend(v), 42) != cend(v);
+   }
+
+if Clang is invoked with the ``-hipstdpar --offload-arch=foo`` flags, the call
+to ``find`` will be offloaded to an accelerator that is part of the ``foo``
+target family. If either ``foo`` or its runtime environment do not support
+transparent on-demand paging (such as e.g. that provided in Linux via
+`HMM `_), it is necessary to also include
+the ``--hipstdpar-interpose-alloc`` flag. If the accelerator specific algorithm
+library ``foo`` uses doesn't have an implementation of a particular algorithm,
+execution seamlessly falls back to the host CPU. It is legal to specify multiple
+``--offload-arch``s. All the flags we introduce, as well as a thorough view of
+various restrictions and their implications will be provided below.
+
+Implementation - General View
+=============================
+
+We built Algorithm Offload support atop the pre-existing HIP
+infrastructure. More specifically, when one requests offload via ``-hipstdpar``,
+compilation is switched to HIP compilation, as if ``-x hip`` was specified.
+Similarly, linking is also switched to HIP linking, as if ``--hip-link`` was
+specified. Note that these are implicit, and one should not assume that any
+interop with HIP specific language constructs is available e.g. ``__device__``
+annotations are neither necessary nor guaranteed to work.
+
+Since there are no language restriction mechanisms in place, it is necessary to
+relax HIP language specific semantic checks performed by the FE; they would
+identify otherwise valid, offloadable code, as invalid HIP code. Given that we
+know that the user intended only for certain algorithms to be offloaded, and
+encoded this by specifying the ``parallel_u

[PATCH] D155775: [HIP][Clang][Driver][RFC] Add driver support for C++ Parallel Algorithm Offload

2023-09-19 Thread Alex Voicu via Phabricator via cfe-commits
AlexVlx updated this revision to Diff 557071.
AlexVlx added a comment.

Use correct flag naming.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155775/new/

https://reviews.llvm.org/D155775

Files:
  clang/include/clang/Basic/DiagnosticDriverKinds.td
  clang/include/clang/Basic/LangOptions.def
  clang/include/clang/Driver/Options.td
  clang/lib/Driver/Driver.cpp
  clang/lib/Driver/ToolChains/AMDGPU.cpp
  clang/lib/Driver/ToolChains/Clang.cpp
  clang/lib/Driver/ToolChains/HIPAMD.cpp
  clang/lib/Driver/ToolChains/ROCm.h
  clang/test/Driver/Inputs/hipstdpar/hipstdpar_lib.hpp
  clang/test/Driver/hipstdpar.c

Index: clang/test/Driver/hipstdpar.c
===
--- /dev/null
+++ clang/test/Driver/hipstdpar.c
@@ -0,0 +1,18 @@
+// RUN: not %clang -### -hipstdpar -nogpulib -nogpuinc --compile %s 2>&1 | \
+// RUN:   FileCheck --check-prefix=HIPSTDPAR-MISSING-LIB %s
+// RUN: %clang -### --hipstdpar --hipstdpar-path=%S/Inputs/hipstdpar \
+// RUN:   --hipstdpar-thrust-path=%S/Inputs/hipstdpar/thrust \
+// RUN:   --hipstdpar-prim-path=%S/Inputs/hipstdpar/rocprim \
+// RUN:   -nogpulib -nogpuinc --compile %s 2>&1 | \
+// RUN:   FileCheck --check-prefix=HIPSTDPAR-COMPILE %s
+// RUN: touch %t.o
+// RUN: %clang -### -hipstdpar %t.o 2>&1 | FileCheck --check-prefix=HIPSTDPAR-LINK %s
+
+// HIPSTDPAR-MISSING-LIB: error: cannot find HIP Standard Parallelism Acceleration library; provide it via '--hipstdpar-path'
+// HIPSTDPAR-COMPILE: "-x" "hip"
+// HIPSTDPAR-COMPILE: "-idirafter" "{{.*/thrust}}"
+// HIPSTDPAR-COMPILE: "-idirafter" "{{.*/rocprim}}"
+// HIPSTDPAR-COMPILE: "-idirafter" "{{.*/Inputs/hipstdpar}}"
+// HIPSTDPAR-COMPILE: "-include" "hipstdpar_lib.hpp"
+// HIPSTDPAR-LINK: "-rpath"
+// HIPSTDPAR-LINK: "-l{{.*hip.*}}"
Index: clang/lib/Driver/ToolChains/ROCm.h
===
--- clang/lib/Driver/ToolChains/ROCm.h
+++ clang/lib/Driver/ToolChains/ROCm.h
@@ -77,6 +77,9 @@
   const Driver &D;
   bool HasHIPRuntime = false;
   bool HasDeviceLibrary = false;
+  bool HasHIPStdParLibrary = false;
+  bool HasRocThrustLibrary = false;
+  bool HasRocPrimLibrary = false;
 
   // Default version if not detected or specified.
   const unsigned DefaultVersionMajor = 3;
@@ -96,6 +99,13 @@
   std::vector RocmDeviceLibPathArg;
   // HIP runtime path specified by --hip-path.
   StringRef HIPPathArg;
+  // HIP Standard Parallel Algorithm acceleration library specified by
+  // --hipstdpar-path
+  StringRef HIPStdParPathArg;
+  // rocThrust algorithm library specified by --hipstdpar-thrust-path
+  StringRef HIPRocThrustPathArg;
+  // rocPrim algorithm library specified by --hipstdpar-prim-path
+  StringRef HIPRocPrimPathArg;
   // HIP version specified by --hip-version.
   StringRef HIPVersionArg;
   // Wheter -nogpulib is specified.
@@ -180,6 +190,9 @@
   /// Check whether we detected a valid ROCm device library.
   bool hasDeviceLibrary() const { return HasDeviceLibrary; }
 
+  /// Check whether we detected a valid HIP STDPAR Acceleration library.
+  bool hasHIPStdParLibrary() const { return HasHIPStdParLibrary; }
+
   /// Print information about the detected ROCm installation.
   void print(raw_ostream &OS) const;
 
Index: clang/lib/Driver/ToolChains/HIPAMD.cpp
===
--- clang/lib/Driver/ToolChains/HIPAMD.cpp
+++ clang/lib/Driver/ToolChains/HIPAMD.cpp
@@ -115,6 +115,8 @@
 "--no-undefined",
 "-shared",
 "-plugin-opt=-amdgpu-internalize-symbols"};
+  if (Args.hasArg(options::OPT_hipstdpar))
+LldArgs.push_back("-plugin-opt=-amdgpu-enable-hipstdpar");
 
   auto &TC = getToolChain();
   auto &D = TC.getDriver();
@@ -246,6 +248,8 @@
   if (!DriverArgs.hasFlag(options::OPT_fgpu_rdc, options::OPT_fno_gpu_rdc,
   false))
 CC1Args.append({"-mllvm", "-amdgpu-internalize-symbols"});
+  if (DriverArgs.hasArgNoClaim(options::OPT_hipstdpar))
+CC1Args.append({"-mllvm", "-amdgpu-enable-hipstdpar"});
 
   StringRef MaxThreadsPerBlock =
   DriverArgs.getLastArgValue(options::OPT_gpu_max_threads_per_block_EQ);
Index: clang/lib/Driver/ToolChains/Clang.cpp
===
--- clang/lib/Driver/ToolChains/Clang.cpp
+++ clang/lib/Driver/ToolChains/Clang.cpp
@@ -6563,6 +6563,12 @@
 if (Args.hasFlag(options::OPT_fgpu_allow_device_init,
  options::OPT_fno_gpu_allow_device_init, false))
   CmdArgs.push_back("-fgpu-allow-device-init");
+if (Args.hasArg(options::OPT_hipstdpar)) {
+  CmdArgs.push_back("-hipstdpar");
+
+  if (Args.hasArg(options::OPT_hipstdpar_interpose_alloc))
+CmdArgs.push_back("-hipstdpar-interpose-alloc");
+}
 Args.addOptInFlag(CmdArgs, options::OPT_fhip_kernel_arg_name,
   options::OPT_fno_hip_kernel_arg_name);
   }
Index: clang/lib/Driver/T

[PATCH] D155769: [HIP][Clang][docs][RFC] Add documentation for C++ Parallel Algorithm Offload

2023-09-25 Thread Alex Voicu via Phabricator via cfe-commits
AlexVlx added inline comments.



Comment at: clang/docs/HIPSupport.rst:266
+
- ``-hipstdpar`` / ``--hipstdpar`` enables algorithm offload, which depending on
  phase, has the following effects:

MaskRay wrote:
> Newer long options that don't use the common prefix like `-f` are preferred 
> to only support `--foo`, not `-foo`.
> Many `--hip*` options don't support `-hip*`. Why add `-hipstdpar`?
Thanks for the review - to answer the question, it was only done for (apparent) 
symmetry, there's no strong incentive to have the single dash flavour; I will 
update both this and the driver patch to retain only the double dash flavours.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155769/new/

https://reviews.llvm.org/D155769



[PATCH] D155775: [HIP][Clang][Driver][RFC] Add driver support for C++ Parallel Algorithm Offload

2023-09-25 Thread Alex Voicu via Phabricator via cfe-commits
AlexVlx updated this revision to Diff 557328.
AlexVlx added a comment.

Use double dash compiler flags only.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155775/new/

https://reviews.llvm.org/D155775

Files:
  clang/include/clang/Basic/DiagnosticDriverKinds.td
  clang/include/clang/Basic/LangOptions.def
  clang/include/clang/Driver/Options.td
  clang/lib/Driver/Driver.cpp
  clang/lib/Driver/ToolChains/AMDGPU.cpp
  clang/lib/Driver/ToolChains/Clang.cpp
  clang/lib/Driver/ToolChains/HIPAMD.cpp
  clang/lib/Driver/ToolChains/ROCm.h
  clang/test/Driver/Inputs/hipstdpar/hipstdpar_lib.hpp
  clang/test/Driver/hipstdpar.c

Index: clang/test/Driver/hipstdpar.c
===
--- /dev/null
+++ clang/test/Driver/hipstdpar.c
@@ -0,0 +1,18 @@
+// RUN: not %clang -### --hipstdpar -nogpulib -nogpuinc --compile %s 2>&1 | \
+// RUN:   FileCheck --check-prefix=HIPSTDPAR-MISSING-LIB %s
+// RUN: %clang -### --hipstdpar --hipstdpar-path=%S/Inputs/hipstdpar \
+// RUN:   --hipstdpar-thrust-path=%S/Inputs/hipstdpar/thrust \
+// RUN:   --hipstdpar-prim-path=%S/Inputs/hipstdpar/rocprim \
+// RUN:   -nogpulib -nogpuinc --compile %s 2>&1 | \
+// RUN:   FileCheck --check-prefix=HIPSTDPAR-COMPILE %s
+// RUN: touch %t.o
+// RUN: %clang -### --hipstdpar %t.o 2>&1 | FileCheck --check-prefix=HIPSTDPAR-LINK %s
+
+// HIPSTDPAR-MISSING-LIB: error: cannot find HIP Standard Parallelism Acceleration library; provide it via '--hipstdpar-path'
+// HIPSTDPAR-COMPILE: "-x" "hip"
+// HIPSTDPAR-COMPILE: "-idirafter" "{{.*/thrust}}"
+// HIPSTDPAR-COMPILE: "-idirafter" "{{.*/rocprim}}"
+// HIPSTDPAR-COMPILE: "-idirafter" "{{.*/Inputs/hipstdpar}}"
+// HIPSTDPAR-COMPILE: "-include" "hipstdpar_lib.hpp"
+// HIPSTDPAR-LINK: "-rpath"
+// HIPSTDPAR-LINK: "-l{{.*hip.*}}"
Index: clang/lib/Driver/ToolChains/ROCm.h
===
--- clang/lib/Driver/ToolChains/ROCm.h
+++ clang/lib/Driver/ToolChains/ROCm.h
@@ -77,6 +77,9 @@
   const Driver &D;
   bool HasHIPRuntime = false;
   bool HasDeviceLibrary = false;
+  bool HasHIPStdParLibrary = false;
+  bool HasRocThrustLibrary = false;
+  bool HasRocPrimLibrary = false;
 
   // Default version if not detected or specified.
   const unsigned DefaultVersionMajor = 3;
@@ -96,6 +99,13 @@
   std::vector RocmDeviceLibPathArg;
   // HIP runtime path specified by --hip-path.
   StringRef HIPPathArg;
+  // HIP Standard Parallel Algorithm acceleration library specified by
+  // --hipstdpar-path
+  StringRef HIPStdParPathArg;
+  // rocThrust algorithm library specified by --hipstdpar-thrust-path
+  StringRef HIPRocThrustPathArg;
+  // rocPrim algorithm library specified by --hipstdpar-prim-path
+  StringRef HIPRocPrimPathArg;
   // HIP version specified by --hip-version.
   StringRef HIPVersionArg;
   // Wheter -nogpulib is specified.
@@ -180,6 +190,9 @@
   /// Check whether we detected a valid ROCm device library.
   bool hasDeviceLibrary() const { return HasDeviceLibrary; }
 
+  /// Check whether we detected a valid HIP STDPAR Acceleration library.
+  bool hasHIPStdParLibrary() const { return HasHIPStdParLibrary; }
+
   /// Print information about the detected ROCm installation.
   void print(raw_ostream &OS) const;
 
Index: clang/lib/Driver/ToolChains/HIPAMD.cpp
===
--- clang/lib/Driver/ToolChains/HIPAMD.cpp
+++ clang/lib/Driver/ToolChains/HIPAMD.cpp
@@ -115,6 +115,8 @@
 "--no-undefined",
 "-shared",
 "-plugin-opt=-amdgpu-internalize-symbols"};
+  if (Args.hasArg(options::OPT_hipstdpar))
+LldArgs.push_back("-plugin-opt=-amdgpu-enable-hipstdpar");
 
   auto &TC = getToolChain();
   auto &D = TC.getDriver();
@@ -246,6 +248,8 @@
   if (!DriverArgs.hasFlag(options::OPT_fgpu_rdc, options::OPT_fno_gpu_rdc,
   false))
 CC1Args.append({"-mllvm", "-amdgpu-internalize-symbols"});
+  if (DriverArgs.hasArgNoClaim(options::OPT_hipstdpar))
+CC1Args.append({"-mllvm", "-amdgpu-enable-hipstdpar"});
 
   StringRef MaxThreadsPerBlock =
   DriverArgs.getLastArgValue(options::OPT_gpu_max_threads_per_block_EQ);
Index: clang/lib/Driver/ToolChains/Clang.cpp
===
--- clang/lib/Driver/ToolChains/Clang.cpp
+++ clang/lib/Driver/ToolChains/Clang.cpp
@@ -6563,6 +6563,11 @@
   CmdArgs.push_back("-fhip-new-launch-api");
 Args.addOptInFlag(CmdArgs, options::OPT_fgpu_allow_device_init,
   options::OPT_fno_gpu_allow_device_init);
+if (Args.hasFlag(options::OPT_fgpu_allow_device_init,
+ options::OPT_fno_gpu_allow_device_init, false))
+  CmdArgs.push_back("-fgpu-allow-device-init");
+Args.AddLastArg(CmdArgs, options::OPT_hipstdpar);
+Args.AddLastArg(CmdArgs, options::OPT_hipstdpar_interpose_alloc);
 Args.addOptInFlag(CmdArgs, options::OPT_fhip_kerne

[PATCH] D155769: [HIP][Clang][docs][RFC] Add documentation for C++ Parallel Algorithm Offload

2023-09-26 Thread Alex Voicu via Phabricator via cfe-commits
AlexVlx updated this revision to Diff 557368.
AlexVlx added a comment.

Use double dash flags exclusively.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155769/new/

https://reviews.llvm.org/D155769

Files:
  clang/docs/HIPSupport.rst
  clang/docs/ReleaseNotes.rst

Index: clang/docs/ReleaseNotes.rst
===
--- clang/docs/ReleaseNotes.rst
+++ clang/docs/ReleaseNotes.rst
@@ -390,6 +390,11 @@
 CUDA/HIP Language Changes
 ^
 
+- [HIP Only] add experimental support for offloading select C++ Algorithms,
+  which can be toggled via the ``--hipstdpar`` flag. This is available only for
+  the AMDGPU target at the moment. See the dedicated entry in the `HIP Language
+  support document for details `_
+
 CUDA Support
 
 
Index: clang/docs/HIPSupport.rst
===
--- clang/docs/HIPSupport.rst
+++ clang/docs/HIPSupport.rst
@@ -176,3 +176,392 @@
* - ``HIP_API_PER_THREAD_DEFAULT_STREAM``
  - Alias to ``__HIP_API_PER_THREAD_DEFAULT_STREAM__``. Deprecated.
 
+C++ Standard Parallelism Offload Support: Compiler And Runtime
+==============================================================
+
+Introduction
+
+
+This section describes the implementation of support for offloading the
+execution of standard C++ algorithms to accelerators that can be targeted via
+HIP. Furthermore, it enumerates restrictions on user defined code, as well as
+the interactions with runtimes.
+
+Algorithm Offload: What, Why, Where
+===================================
+
+C++17 introduced overloads
+`for most algorithms in the standard library `_
+which allow the user to specify a desired
+`execution policy `_.
+The `parallel_unsequenced_policy `_
+maps relatively well to the execution model of AMD GPUs. This, coupled with
+the availability and maturity of GPU accelerated algorithm libraries that
+implement most / all corresponding algorithms in the standard library
+(e.g. `rocThrust `_), makes
+it feasible to provide seamless accelerator offload for supported algorithms,
+when an accelerated version exists. Thus, it becomes possible to easily access
+the computational resources of an AMD accelerator, via a well specified,
+familiar, algorithmic interface, without having to delve into low-level hardware
+specific details. Putting it all together:
+
+- **What**: standard library algorithms, when invoked with the
+  ``parallel_unsequenced_policy``
+- **Why**: democratise AMDGPU accelerator programming, without loss of user
+  familiarity
+- **Where**: only AMDGPU accelerators targeted by Clang/LLVM via HIP
+
+Small Example
+=============
+
+Given the following C++ code:
+
+.. code-block:: C++
+
+   bool has_the_answer(const std::vector<int>& v) {
+ return std::find(std::execution::par_unseq, std::cbegin(v), std::cend(v), 42) != std::cend(v);
+   }
+
+if Clang is invoked with the ``--hipstdpar --offload-arch=foo`` flags, the call
+to ``find`` will be offloaded to an accelerator that is part of the ``foo``
+target family. If either ``foo`` or its runtime environment do not support
+transparent on-demand paging (such as e.g. that provided in Linux via
+`HMM `_), it is necessary to also include
+the ``--hipstdpar-interpose-alloc`` flag. If the accelerator specific algorithm
+library ``foo`` uses doesn't have an implementation of a particular algorithm,
+execution seamlessly falls back to the host CPU. It is legal to specify multiple
+``--offload-arch``s. All the flags we introduce, as well as a thorough view of
+various restrictions and their implications will be provided below.
+
+Implementation - General View
+=============================
+
+We built Algorithm Offload support atop the pre-existing HIP
+infrastructure. More specifically, when one requests offload via ``--hipstdpar``,
+compilation is switched to HIP compilation, as if ``-x hip`` was specified.
+Similarly, linking is also switched to HIP linking, as if ``--hip-link`` was
+specified. Note that these are implicit, and one should not assume that any
+interop with HIP specific language constructs is available e.g. ``__device__``
+annotations are neither necessary nor guaranteed to work.
+
+Since there are no language restriction mechanisms in place, it is necessary to
+relax HIP language specific semantic checks performed by the FE; they would
+identify otherwise valid, offloadable code, as invalid HIP code. Given that we
+know that the user intended only for certain algorithms to be offloaded, and
+encoded this by specifying the ``parallel_unsequenced_policy``, we rely on

[PATCH] D155775: [HIP][Clang][Driver][RFC] Add driver support for C++ Parallel Algorithm Offload

2023-09-28 Thread Alex Voicu via Phabricator via cfe-commits
AlexVlx added a comment.

@MaskRay do you think there's anything else that needs adjusting? Thanks.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155775/new/

https://reviews.llvm.org/D155775



[PATCH] D155775: [HIP][Clang][Driver][RFC] Add driver support for C++ Parallel Algorithm Offload

2023-09-29 Thread Alex Voicu via Phabricator via cfe-commits
AlexVlx updated this revision to Diff 557492.
AlexVlx marked 2 inline comments as done.
AlexVlx added a comment.

Remove noise, add missing stub folders needed by the test.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155775/new/

https://reviews.llvm.org/D155775

Files:
  clang/include/clang/Basic/DiagnosticDriverKinds.td
  clang/include/clang/Basic/LangOptions.def
  clang/include/clang/Driver/Options.td
  clang/lib/Driver/Driver.cpp
  clang/lib/Driver/ToolChains/AMDGPU.cpp
  clang/lib/Driver/ToolChains/Clang.cpp
  clang/lib/Driver/ToolChains/HIPAMD.cpp
  clang/lib/Driver/ToolChains/ROCm.h
  clang/test/Driver/Inputs/hipstdpar/hipstdpar_lib.hpp
  clang/test/Driver/Inputs/hipstdpar/rocprim/.keep
  clang/test/Driver/Inputs/hipstdpar/thrust/.keep
  clang/test/Driver/hipstdpar.c

Index: clang/test/Driver/hipstdpar.c
===
--- /dev/null
+++ clang/test/Driver/hipstdpar.c
@@ -0,0 +1,18 @@
+// RUN: not %clang -### --hipstdpar -nogpulib -nogpuinc --compile %s 2>&1 | \
+// RUN:   FileCheck --check-prefix=HIPSTDPAR-MISSING-LIB %s
+// RUN: %clang -### --hipstdpar --hipstdpar-path=%S/Inputs/hipstdpar \
+// RUN:   --hipstdpar-thrust-path=%S/Inputs/hipstdpar/thrust \
+// RUN:   --hipstdpar-prim-path=%S/Inputs/hipstdpar/rocprim \
+// RUN:   -nogpulib -nogpuinc --compile %s 2>&1 | \
+// RUN:   FileCheck --check-prefix=HIPSTDPAR-COMPILE %s
+// RUN: touch %t.o
+// RUN: %clang -### --hipstdpar %t.o 2>&1 | FileCheck --check-prefix=HIPSTDPAR-LINK %s
+
+// HIPSTDPAR-MISSING-LIB: error: cannot find HIP Standard Parallelism Acceleration library; provide it via '--hipstdpar-path'
+// HIPSTDPAR-COMPILE: "-x" "hip"
+// HIPSTDPAR-COMPILE: "-idirafter" "{{.*/thrust}}"
+// HIPSTDPAR-COMPILE: "-idirafter" "{{.*/rocprim}}"
+// HIPSTDPAR-COMPILE: "-idirafter" "{{.*/Inputs/hipstdpar}}"
+// HIPSTDPAR-COMPILE: "-include" "hipstdpar_lib.hpp"
+// HIPSTDPAR-LINK: "-rpath"
+// HIPSTDPAR-LINK: "-l{{.*hip.*}}"
Index: clang/lib/Driver/ToolChains/ROCm.h
===
--- clang/lib/Driver/ToolChains/ROCm.h
+++ clang/lib/Driver/ToolChains/ROCm.h
@@ -77,6 +77,9 @@
   const Driver &D;
   bool HasHIPRuntime = false;
   bool HasDeviceLibrary = false;
+  bool HasHIPStdParLibrary = false;
+  bool HasRocThrustLibrary = false;
+  bool HasRocPrimLibrary = false;
 
   // Default version if not detected or specified.
   const unsigned DefaultVersionMajor = 3;
@@ -96,6 +99,13 @@
   std::vector RocmDeviceLibPathArg;
   // HIP runtime path specified by --hip-path.
   StringRef HIPPathArg;
+  // HIP Standard Parallel Algorithm acceleration library specified by
+  // --hipstdpar-path
+  StringRef HIPStdParPathArg;
+  // rocThrust algorithm library specified by --hipstdpar-thrust-path
+  StringRef HIPRocThrustPathArg;
+  // rocPrim algorithm library specified by --hipstdpar-prim-path
+  StringRef HIPRocPrimPathArg;
   // HIP version specified by --hip-version.
   StringRef HIPVersionArg;
   // Wheter -nogpulib is specified.
@@ -180,6 +190,9 @@
   /// Check whether we detected a valid ROCm device library.
   bool hasDeviceLibrary() const { return HasDeviceLibrary; }
 
+  /// Check whether we detected a valid HIP STDPAR Acceleration library.
+  bool hasHIPStdParLibrary() const { return HasHIPStdParLibrary; }
+
   /// Print information about the detected ROCm installation.
   void print(raw_ostream &OS) const;
 
Index: clang/lib/Driver/ToolChains/HIPAMD.cpp
===
--- clang/lib/Driver/ToolChains/HIPAMD.cpp
+++ clang/lib/Driver/ToolChains/HIPAMD.cpp
@@ -113,6 +113,8 @@
 "--no-undefined",
 "-shared",
 "-plugin-opt=-amdgpu-internalize-symbols"};
+  if (Args.hasArg(options::OPT_hipstdpar))
+LldArgs.push_back("-plugin-opt=-amdgpu-enable-hipstdpar");
 
   auto &TC = getToolChain();
   auto &D = TC.getDriver();
@@ -242,6 +244,8 @@
   if (!DriverArgs.hasFlag(options::OPT_fgpu_rdc, options::OPT_fno_gpu_rdc,
   false))
 CC1Args.append({"-mllvm", "-amdgpu-internalize-symbols"});
+  if (DriverArgs.hasArgNoClaim(options::OPT_hipstdpar))
+CC1Args.append({"-mllvm", "-amdgpu-enable-hipstdpar"});
 
   StringRef MaxThreadsPerBlock =
   DriverArgs.getLastArgValue(options::OPT_gpu_max_threads_per_block_EQ);
Index: clang/lib/Driver/ToolChains/Clang.cpp
===
--- clang/lib/Driver/ToolChains/Clang.cpp
+++ clang/lib/Driver/ToolChains/Clang.cpp
@@ -6580,6 +6580,8 @@
   CmdArgs.push_back("-fhip-new-launch-api");
 Args.addOptInFlag(CmdArgs, options::OPT_fgpu_allow_device_init,
   options::OPT_fno_gpu_allow_device_init);
+Args.AddLastArg(CmdArgs, options::OPT_hipstdpar);
+Args.AddLastArg(CmdArgs, options::OPT_hipstdpar_interpose_alloc);
 Args.addOptInFlag(CmdArgs, options::OPT_fhip_kernel_arg_name,
  

[PATCH] D155775: [HIP][Clang][Driver][RFC] Add driver support for C++ Parallel Algorithm Offload

2023-09-29 Thread Alex Voicu via Phabricator via cfe-commits
AlexVlx added inline comments.



Comment at: clang/lib/Driver/ToolChains/AMDGPU.cpp:340
+  HasRocThrustLibrary = !HIPRocThrustPathArg.empty() &&
+D.getVFS().exists(HIPRocThrustPathArg + "/thrust");
+  HIPRocPrimPathArg =

MaskRay wrote:
> Prefer `concat` so that if `HIPRocThrustPathArg` ends with `/`, we will not 
> get `...//thrust`.
If you don't mind, I'll do this in a subsequent patch since it's just annoying 
and not an outright error. The issue being that RocmInstallationDetector is not 
a ToolChain and isn't `friend`ly to one either, so `concat` is inaccessible. 
Addressing that would add a bit of unrelated noise (looking through the file we 
probably want concat in more places).
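
For what it's worth, a minimal sketch of the deferred alternative; the helper name is made up for illustration, and llvm::sys::path::append normalises the separator so a root that already ends in '/' does not yield a doubled one:

  #include "llvm/ADT/SmallString.h"
  #include "llvm/ADT/StringRef.h"
  #include "llvm/Support/Path.h"

  // Hypothetical helper, not part of the patch: join a user-supplied root with
  // a subdirectory without ever emitting "//".
  static std::string joinPath(llvm::StringRef Root, llvm::StringRef Sub) {
    llvm::SmallString<256> P(Root);
    llvm::sys::path::append(P, Sub); // copes with a trailing '/' on Root
    return P.str().str();            // SmallString -> StringRef -> std::string
  }

  // joinPath("/opt/rocm/thrust/", "thrust") gives ".../thrust/thrust", whereas
  // Root + "/thrust" would give ".../thrust//thrust".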


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155775/new/

https://reviews.llvm.org/D155775



[PATCH] D155775: [HIP][Clang][Driver][RFC] Add driver support for C++ Parallel Algorithm Offload

2023-10-03 Thread Alex Voicu via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes.
Closed by commit rG9a408588d1b8: [HIP][Clang][Driver] Add Driver support for 
`hipstdpar` (authored by AlexVlx).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155775/new/

https://reviews.llvm.org/D155775

Files:
  clang/include/clang/Basic/DiagnosticDriverKinds.td
  clang/include/clang/Basic/LangOptions.def
  clang/include/clang/Driver/Options.td
  clang/lib/Driver/Driver.cpp
  clang/lib/Driver/ToolChains/AMDGPU.cpp
  clang/lib/Driver/ToolChains/Clang.cpp
  clang/lib/Driver/ToolChains/HIPAMD.cpp
  clang/lib/Driver/ToolChains/ROCm.h
  clang/test/Driver/Inputs/hipstdpar/hipstdpar_lib.hpp
  clang/test/Driver/Inputs/hipstdpar/rocprim/.keep
  clang/test/Driver/Inputs/hipstdpar/thrust/.keep
  clang/test/Driver/hipstdpar.c

Index: clang/test/Driver/hipstdpar.c
===
--- /dev/null
+++ clang/test/Driver/hipstdpar.c
@@ -0,0 +1,18 @@
+// RUN: not %clang -### --hipstdpar -nogpulib -nogpuinc --compile %s 2>&1 | \
+// RUN:   FileCheck --check-prefix=HIPSTDPAR-MISSING-LIB %s
+// RUN: %clang -### --hipstdpar --hipstdpar-path=%S/Inputs/hipstdpar \
+// RUN:   --hipstdpar-thrust-path=%S/Inputs/hipstdpar/thrust \
+// RUN:   --hipstdpar-prim-path=%S/Inputs/hipstdpar/rocprim \
+// RUN:   -nogpulib -nogpuinc --compile %s 2>&1 | \
+// RUN:   FileCheck --check-prefix=HIPSTDPAR-COMPILE %s
+// RUN: touch %t.o
+// RUN: %clang -### --hipstdpar %t.o 2>&1 | FileCheck --check-prefix=HIPSTDPAR-LINK %s
+
+// HIPSTDPAR-MISSING-LIB: error: cannot find HIP Standard Parallelism Acceleration library; provide it via '--hipstdpar-path'
+// HIPSTDPAR-COMPILE: "-x" "hip"
+// HIPSTDPAR-COMPILE: "-idirafter" "{{.*/thrust}}"
+// HIPSTDPAR-COMPILE: "-idirafter" "{{.*/rocprim}}"
+// HIPSTDPAR-COMPILE: "-idirafter" "{{.*/Inputs/hipstdpar}}"
+// HIPSTDPAR-COMPILE: "-include" "hipstdpar_lib.hpp"
+// HIPSTDPAR-LINK: "-rpath"
+// HIPSTDPAR-LINK: "-l{{.*hip.*}}"
Index: clang/lib/Driver/ToolChains/ROCm.h
===
--- clang/lib/Driver/ToolChains/ROCm.h
+++ clang/lib/Driver/ToolChains/ROCm.h
@@ -77,6 +77,9 @@
   const Driver &D;
   bool HasHIPRuntime = false;
   bool HasDeviceLibrary = false;
+  bool HasHIPStdParLibrary = false;
+  bool HasRocThrustLibrary = false;
+  bool HasRocPrimLibrary = false;
 
   // Default version if not detected or specified.
   const unsigned DefaultVersionMajor = 3;
@@ -96,6 +99,13 @@
   std::vector RocmDeviceLibPathArg;
   // HIP runtime path specified by --hip-path.
   StringRef HIPPathArg;
+  // HIP Standard Parallel Algorithm acceleration library specified by
+  // --hipstdpar-path
+  StringRef HIPStdParPathArg;
+  // rocThrust algorithm library specified by --hipstdpar-thrust-path
+  StringRef HIPRocThrustPathArg;
+  // rocPrim algorithm library specified by --hipstdpar-prim-path
+  StringRef HIPRocPrimPathArg;
   // HIP version specified by --hip-version.
   StringRef HIPVersionArg;
   // Wheter -nogpulib is specified.
@@ -180,6 +190,9 @@
   /// Check whether we detected a valid ROCm device library.
   bool hasDeviceLibrary() const { return HasDeviceLibrary; }
 
+  /// Check whether we detected a valid HIP STDPAR Acceleration library.
+  bool hasHIPStdParLibrary() const { return HasHIPStdParLibrary; }
+
   /// Print information about the detected ROCm installation.
   void print(raw_ostream &OS) const;
 
Index: clang/lib/Driver/ToolChains/HIPAMD.cpp
===
--- clang/lib/Driver/ToolChains/HIPAMD.cpp
+++ clang/lib/Driver/ToolChains/HIPAMD.cpp
@@ -113,6 +113,8 @@
 "--no-undefined",
 "-shared",
 "-plugin-opt=-amdgpu-internalize-symbols"};
+  if (Args.hasArg(options::OPT_hipstdpar))
+LldArgs.push_back("-plugin-opt=-amdgpu-enable-hipstdpar");
 
   auto &TC = getToolChain();
   auto &D = TC.getDriver();
@@ -242,6 +244,8 @@
   if (!DriverArgs.hasFlag(options::OPT_fgpu_rdc, options::OPT_fno_gpu_rdc,
   false))
 CC1Args.append({"-mllvm", "-amdgpu-internalize-symbols"});
+  if (DriverArgs.hasArgNoClaim(options::OPT_hipstdpar))
+CC1Args.append({"-mllvm", "-amdgpu-enable-hipstdpar"});
 
   StringRef MaxThreadsPerBlock =
   DriverArgs.getLastArgValue(options::OPT_gpu_max_threads_per_block_EQ);
Index: clang/lib/Driver/ToolChains/Clang.cpp
===
--- clang/lib/Driver/ToolChains/Clang.cpp
+++ clang/lib/Driver/ToolChains/Clang.cpp
@@ -6580,6 +6580,8 @@
   CmdArgs.push_back("-fhip-new-launch-api");
 Args.addOptInFlag(CmdArgs, options::OPT_fgpu_allow_device_init,
   options::OPT_fno_gpu_allow_device_init);
+Args.AddLastArg(CmdArgs, options::OPT_hipstdpar);
+Args.AddLastArg(CmdArgs, options::OPT_hipstdpar_interpose_alloc);
 Args.addOptInFla

[PATCH] D155826: [HIP][Clang][Preprocessor][RFC] Add preprocessor support for C++ Parallel Algorithm Offload

2023-10-03 Thread Alex Voicu via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.
Closed by commit rGc0f8748d448b: [HIP][Clang][Preprocessor] Add Preprocessor 
support for `hipstdpar` (authored by AlexVlx).

Changed prior to commit:
  https://reviews.llvm.org/D155826?vs=552952&id=557558#toc

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155826/new/

https://reviews.llvm.org/D155826

Files:
  clang/lib/Frontend/InitPreprocessor.cpp
  clang/test/Preprocessor/predefined-macros.c


Index: clang/test/Preprocessor/predefined-macros.c
===
--- clang/test/Preprocessor/predefined-macros.c
+++ clang/test/Preprocessor/predefined-macros.c
@@ -290,3 +290,20 @@
 // RUN:   -fcuda-is-device -fgpu-default-stream=per-thread \
 // RUN:   | FileCheck -match-full-lines %s --check-prefix=CHECK-PTH
 // CHECK-PTH: #define HIP_API_PER_THREAD_DEFAULT_STREAM 1
+
+// RUN: %clang_cc1 %s -E -dM -o - -x hip --hipstdpar -triple x86_64-unknown-linux-gnu \
+// RUN:   | FileCheck -match-full-lines %s --check-prefix=CHECK-HIPSTDPAR
+// CHECK-HIPSTDPAR: #define __HIPSTDPAR__ 1
+// CHECK-HIPSTDPAR-NOT: #define __HIPSTDPAR_INTERPOSE_ALLOC__ 1
+
+// RUN: %clang_cc1 %s -E -dM -o - -x hip --hipstdpar --hipstdpar-interpose-alloc \
+// RUN:  -triple x86_64-unknown-linux-gnu | FileCheck -match-full-lines %s \
+// RUN:  --check-prefix=CHECK-HIPSTDPAR-INTERPOSE
+// CHECK-HIPSTDPAR-INTERPOSE: #define __HIPSTDPAR_INTERPOSE_ALLOC__ 1
+// CHECK-HIPSTDPAR-INTERPOSE: #define __HIPSTDPAR__ 1
+
+// RUN: %clang_cc1 %s -E -dM -o - -x hip --hipstdpar --hipstdpar-interpose-alloc \
+// RUN:  -triple amdgcn-amd-amdhsa -fcuda-is-device | FileCheck -match-full-lines \
+// RUN:  %s --check-prefix=CHECK-HIPSTDPAR-INTERPOSE-DEV-NEG
+// CHECK-HIPSTDPAR-INTERPOSE-DEV-NEG: #define __HIPSTDPAR__ 1
+// CHECK-HIPSTDPAR-INTERPOSE-DEV-NEG-NOT: #define __HIPSTDPAR_INTERPOSE_ALLOC__ 1
\ No newline at end of file
Index: clang/lib/Frontend/InitPreprocessor.cpp
===
--- clang/lib/Frontend/InitPreprocessor.cpp
+++ clang/lib/Frontend/InitPreprocessor.cpp
@@ -585,6 +585,11 @@
 Builder.defineMacro("__HIP_MEMORY_SCOPE_WORKGROUP", "3");
 Builder.defineMacro("__HIP_MEMORY_SCOPE_AGENT", "4");
 Builder.defineMacro("__HIP_MEMORY_SCOPE_SYSTEM", "5");
+if (LangOpts.HIPStdPar) {
+  Builder.defineMacro("__HIPSTDPAR__");
+  if (LangOpts.HIPStdParInterposeAlloc)
+Builder.defineMacro("__HIPSTDPAR_INTERPOSE_ALLOC__");
+}
 if (LangOpts.CUDAIsDevice) {
   Builder.defineMacro("__HIP_DEVICE_COMPILE__");
   if (!TI.hasHIPImageSupport()) {
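
As a quick illustration (not taken from the patch), code compiled under this scheme can key off the two macros defined above; a sketch in plain C++:

  // Sketch only: surface the hipstdpar compilation modes as ordinary constants
  // that library or user code can branch on.
  #if defined(__HIPSTDPAR__)
  inline constexpr bool hipstdpar_enabled = true;           // built with --hipstdpar
  #else
  inline constexpr bool hipstdpar_enabled = false;
  #endif

  #if defined(__HIPSTDPAR_INTERPOSE_ALLOC__)
  inline constexpr bool hipstdpar_interposed_alloc = true;  // --hipstdpar-interpose-alloc
  #else
  inline constexpr bool hipstdpar_interposed_alloc = false;
  #endif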

[PATCH] D155833: [HIP][Clang][Sema][RFC] Add Sema support for C++ Parallel Algorithm Offload

2023-10-03 Thread Alex Voicu via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.
Closed by commit rG4d680f56475c: [HIP][Clang][Sema] Add Sema support for 
`hipstdpar` (authored by AlexVlx).

Changed prior to commit:
  https://reviews.llvm.org/D155833?vs=552568&id=557559#toc

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155833/new/

https://reviews.llvm.org/D155833

Files:
  clang/lib/Sema/SemaCUDA.cpp
  clang/lib/Sema/SemaExpr.cpp
  clang/lib/Sema/SemaStmtAsm.cpp
  clang/test/SemaHipStdPar/device-can-call-host.cpp

Index: clang/test/SemaHipStdPar/device-can-call-host.cpp
===
--- /dev/null
+++ clang/test/SemaHipStdPar/device-can-call-host.cpp
@@ -0,0 +1,93 @@
+// RUN: %clang_cc1 -x hip %s --hipstdpar -triple amdgcn-amd-amdhsa --std=c++17 \
+// RUN:   -fcuda-is-device -emit-llvm -o /dev/null -verify
+
+// Note: These would happen implicitly, within the implementation of the
+//   accelerator specific algorithm library, and not from user code.
+
+// Calls from the accelerator side to implicitly host (i.e. unannotated)
+// functions are fine.
+
+// expected-no-diagnostics
+
+#define __device__ __attribute__((device))
+#define __global__ __attribute__((global))
+
+extern "C" void host_fn() {}
+
+struct Dummy {};
+
+struct S {
+  S() {}
+  ~S() { host_fn(); }
+
+  int x;
+};
+
+struct T {
+  __device__ void hd() { host_fn(); }
+
+  __device__ void hd3();
+
+  void h() {}
+
+  void operator+();
+  void operator-(const T&) {}
+
+  operator Dummy() { return Dummy(); }
+};
+
+__device__ void T::hd3() { host_fn(); }
+
+template <typename T> __device__ void hd2() { host_fn(); }
+
+__global__ void kernel() { hd2<int>(); }
+
+__device__ void hd() { host_fn(); }
+
+template <typename T> __device__ void hd3() { host_fn(); }
+__device__ void device_fn() { hd3<int>(); }
+
+__device__ void local_var() {
+  S s;
+}
+
+__device__ void explicit_destructor(S *s) {
+  s->~S();
+}
+
+__device__ void hd_member_fn() {
+  T t;
+
+  t.hd();
+}
+
+__device__ void h_member_fn() {
+  T t;
+  t.h();
+}
+
+__device__ void unaryOp() {
+  T t;
+  (void) +t;
+}
+
+__device__ void binaryOp() {
+  T t;
+  (void) (t - t);
+}
+
+__device__ void implicitConversion() {
+  T t;
+  Dummy d = t;
+}
+
+template <typename T>
+struct TmplStruct {
+  template <typename U> __device__ void fn() {}
+};
+
+template <>
+template <>
+__device__ void TmplStruct<int>::fn<int>() { host_fn(); }
+
+__device__ void double_specialization() { TmplStruct<int>().fn<int>(); }
Index: clang/lib/Sema/SemaStmtAsm.cpp
===
--- clang/lib/Sema/SemaStmtAsm.cpp
+++ clang/lib/Sema/SemaStmtAsm.cpp
@@ -271,7 +271,8 @@
   OutputName = Names[i]->getName();
 
 TargetInfo::ConstraintInfo Info(Literal->getString(), OutputName);
-if (!Context.getTargetInfo().validateOutputConstraint(Info)) {
+if (!Context.getTargetInfo().validateOutputConstraint(Info) &&
+!(LangOpts.HIPStdPar && LangOpts.CUDAIsDevice)) {
   targetDiag(Literal->getBeginLoc(),
  diag::err_asm_invalid_output_constraint)
   << Info.getConstraintStr();
Index: clang/lib/Sema/SemaExpr.cpp
===
--- clang/lib/Sema/SemaExpr.cpp
+++ clang/lib/Sema/SemaExpr.cpp
@@ -19157,7 +19157,7 @@
   // Diagnose ODR-use of host global variables in device functions.
   // Reference of device global variables in host functions is allowed
   // through shadow variables therefore it is not diagnosed.
-  if (SemaRef.LangOpts.CUDAIsDevice) {
+  if (SemaRef.LangOpts.CUDAIsDevice && !SemaRef.LangOpts.HIPStdPar) {
 SemaRef.targetDiag(Loc, diag::err_ref_bad_target)
 << /*host*/ 2 << /*variable*/ 1 << Var << UserTarget;
 SemaRef.targetDiag(Var->getLocation(),
Index: clang/lib/Sema/SemaCUDA.cpp
===
--- clang/lib/Sema/SemaCUDA.cpp
+++ clang/lib/Sema/SemaCUDA.cpp
@@ -249,6 +249,15 @@
   (CallerTarget == CFT_Global && CalleeTarget == CFT_Device))
 return CFP_Native;
 
+  // HipStdPar mode is special, in that assessing whether a device side call to
+  // a host target is deferred to a subsequent pass, and cannot unambiguously be
+  // adjudicated in the AST, hence we optimistically allow them to pass here.
+  if (getLangOpts().HIPStdPar &&
+  (CallerTarget == CFT_Global || CallerTarget == CFT_Device ||
+   CallerTarget == CFT_HostDevice) &&
+  CalleeTarget == CFT_Host)
+return CFP_HostDevice;
+
   // (d) HostDevice behavior depends on compilation mode.
   if (CallerTarget == CFT_HostDevice) {
 // It's OK to call a compilation-mode matching function from an HD one.
@@ -895,7 +904,7 @@
   if (!ShouldCheck || !Capture.isReferenceCapture())
 return;
   auto DiagKind = SemaDiagnosticBuilder::K_Deferred;
-  if (Capture.isVariableCapture()) {
+  if (Capture.isVariableC

[PATCH] D155775: [HIP][Clang][Driver][RFC] Add driver support for C++ Parallel Algorithm Offload

2023-10-03 Thread Alex Voicu via Phabricator via cfe-commits
AlexVlx added a comment.

In D155775#4652683 , @thakis wrote:

> This seems to break tests on Mac and windows, see eg 
> http://45.33.8.238/macm1/70415/step_7.txt
>
> Please take a look and revert for now if it takes a while to fix.

Whoops, thank you for flagging this, my bad. At the moment (and for the 
foreseeable future) this is Linux only, so I think I'd rather XFAIL it on non 
Linux targets, if that's OK.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155775/new/

https://reviews.llvm.org/D155775



[PATCH] D155775: [HIP][Clang][Driver][RFC] Add driver support for C++ Parallel Algorithm Offload

2023-10-03 Thread Alex Voicu via Phabricator via cfe-commits
AlexVlx added a comment.

https://reviews.llvm.org/rG0701ee69f7ac should fix it, once again apologies for 
the noise.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155775/new/

https://reviews.llvm.org/D155775



[PATCH] D155775: [HIP][Clang][Driver][RFC] Add driver support for C++ Parallel Algorithm Offload

2023-10-03 Thread Alex Voicu via Phabricator via cfe-commits
AlexVlx added a comment.

In D155775#4652767 , @dyung wrote:

> Hi @AlexVlx, the test you added is failing on an linux bot, can you take a 
> look and revert if you need time to investigate? It appears it has been 
> failing for over 6 hours.
>
> https://lab.llvm.org/buildbot/#/builders/139/builds/50940

This is another unsupported target (as is hexagon which is also failing), that 
I'll temporarily disable, but I admit to being actually confused as to why it's 
failing there, because the sequence it cannot find appears to be there. Thank 
you for flagging it.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155775/new/

https://reviews.llvm.org/D155775



[PATCH] D155775: [HIP][Clang][Driver][RFC] Add driver support for C++ Parallel Algorithm Offload

2023-10-03 Thread Alex Voicu via Phabricator via cfe-commits
AlexVlx added a comment.

In D155775#4652780 , @dyung wrote:

> In D155775#4652773 , @AlexVlx wrote:
>
>> In D155775#4652767 , @dyung wrote:
>>
>>> Hi @AlexVlx, the test you added is failing on an linux bot, can you take a 
>>> look and revert if you need time to investigate? It appears it has been 
>>> failing for over 6 hours.
>>>
>>> https://lab.llvm.org/buildbot/#/builders/139/builds/50940
>>
>> This is another unsupported target (as is hexagon which is also failing), 
>> that I'll temporarily disable, but I admit to being actually confused as to 
>> why it's failing there, because the sequence it cannot find appears to be 
>> there. Thank you for flagging it.
>
> Note that this linux bot builds with the ps4 target as the default target.
>
> At this point, would it be easier to add a REQUIRES line for the target the 
> test should support rather than just whack-a-mole for the targets it does not?

Oh, it definitely would be MUCH easier to add a `REQUIRES` line, however I'd 
have had to have thought about that, which I did not, so thank you for the wake 
up call. I'll update this in a few minutes.
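
For reference, a possible shape for such a gate at the top of clang/test/Driver/hipstdpar.c; the exact lit feature names below are illustrative assumptions, not quoted from the eventual commit:

  // Gate the test on the backends it actually exercises (names assumed):
  // REQUIRES: x86-registered-target, amdgpu-registered-target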


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155775/new/

https://reviews.llvm.org/D155775



[PATCH] D155775: [HIP][Clang][Driver][RFC] Add driver support for C++ Parallel Algorithm Offload

2023-10-04 Thread Alex Voicu via Phabricator via cfe-commits
AlexVlx added a comment.

In D155775#4652851 , @ro wrote:

> In D155775#4652785 , @AlexVlx wrote:
>
>> In D155775#4652780 , @dyung wrote:
>>
>>> 
>
>
>
>>> At this point, would it be easier to add a REQUIRES line for the target the 
>>> test should support rather than just whack-a-mole for the targets it does 
>>> not?
>>
>> Oh, it definitely would be MUCH easier to add a `REQUIRES` line, however I'd 
>> have had to have thought about that, which I did not, so thank you for the 
>> wake up call. I'll update this in a few minutes.
>
> Unfortunately, this is still not done after almost a day, leaving quite a 
> number of buildbots broken.  Please fix or revert.

Apologies, do you have an indication of which buildbots are still broken due to 
this change, after the most recent commit? The ones I was initially aware of, 
as well as those brought up here appear to pass at the moment, and glancing 
through the currently failing builds didn’t point to this as being the culprit 
(I could’ve simply missed it). Thank you!


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155775/new/

https://reviews.llvm.org/D155775



[PATCH] D153092: [Clang][CodeGen]`vtable`, `typeinfo` et al. are globals

2023-07-19 Thread Alex Voicu via Phabricator via cfe-commits
AlexVlx updated this revision to Diff 542086.
AlexVlx added a comment.

Rebase.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153092/new/

https://reviews.llvm.org/D153092

Files:
  clang/lib/CodeGen/CGVTT.cpp
  clang/lib/CodeGen/CGVTables.cpp
  clang/lib/CodeGen/CodeGenModule.cpp
  clang/lib/CodeGen/ItaniumCXXABI.cpp
  clang/test/CodeGenCXX/vtable-align-address-space.cpp
  clang/test/CodeGenCXX/vtable-assume-load-address-space.cpp
  clang/test/CodeGenCXX/vtable-consteval-address-space.cpp
  clang/test/CodeGenCXX/vtable-constexpr-address-space.cpp
  clang/test/CodeGenCXX/vtable-key-function-address-space.cpp
  clang/test/CodeGenCXX/vtable-layout-extreme-address-space.cpp
  clang/test/CodeGenCXX/vtable-linkage-address-space.cpp
  clang/test/CodeGenCXX/vtable-pointer-initialization-address-space.cpp
  clang/test/CodeGenCXX/vtt-address-space.cpp
  clang/test/CodeGenCXX/vtt-layout-address-space.cpp
  clang/test/Headers/hip-header.hip

Index: clang/test/Headers/hip-header.hip
===
--- clang/test/Headers/hip-header.hip
+++ clang/test/Headers/hip-header.hip
@@ -43,6 +43,22 @@
 
 // expected-no-diagnostics
 
+// Check handling of overridden, implicitly __host__ dtor (should emit as a
+// nullptr to global)
+
+struct vbase {
+virtual ~vbase();
+};
+
+template <typename T>
+struct vderived : public vbase {
+~vderived();
+};
+
+template struct vderived<void>;
+
+// CHECK: @_ZTV8vderivedIvE = weak_odr unnamed_addr addrspace(1) constant { [4 x ptr addrspace(1)] } zeroinitializer, comdat, align 8
+
 // Check support for pure and deleted virtual functions
 struct base {
   __host__
@@ -60,9 +76,8 @@
 __device__ void test_vf() {
 derived d;
 }
-// CHECK: @_ZTV7derived = linkonce_odr unnamed_addr addrspace(1) constant { [4 x ptr] } { [4 x ptr] [ptr null, ptr null, ptr @_ZN7derived2pvEv, ptr @__cxa_deleted_virtual] }, comdat, align 8
-// CHECK: @_ZTV4base = linkonce_odr unnamed_addr addrspace(1) constant { [4 x ptr] } { [4 x ptr] [ptr null, ptr null, ptr @__cxa_pure_virtual, ptr @__cxa_deleted_virtual] }, comdat, align 8
-
+// CHECK: @_ZTV7derived = linkonce_odr unnamed_addr addrspace(1) constant { [4 x ptr addrspace(1)] } { [4 x ptr addrspace(1)] [ptr addrspace(1) null, ptr addrspace(1) null, ptr addrspace(1) addrspacecast (ptr @_ZN7derived2pvEv to ptr addrspace(1)), ptr addrspace(1) addrspacecast (ptr @__cxa_deleted_virtual to ptr addrspace(1))] }, comdat, align 8
+// CHECK: @_ZTV4base = linkonce_odr unnamed_addr addrspace(1) constant { [4 x ptr addrspace(1)] } { [4 x ptr addrspace(1)] [ptr addrspace(1) null, ptr addrspace(1) null, ptr addrspace(1) addrspacecast (ptr @__cxa_pure_virtual to ptr addrspace(1)), ptr addrspace(1) addrspacecast (ptr @__cxa_deleted_virtual to ptr addrspace(1))] }, comdat, align 8
 // CHECK: define{{.*}}void @__cxa_pure_virtual()
 // CHECK: define{{.*}}void @__cxa_deleted_virtual()
 
Index: clang/test/CodeGenCXX/vtt-layout-address-space.cpp
===
--- /dev/null
+++ clang/test/CodeGenCXX/vtt-layout-address-space.cpp
@@ -0,0 +1,89 @@
+// RUN: %clang_cc1 %s -triple=amdgcn-amd-amdhsa -std=c++11 -emit-llvm -o - | FileCheck %s
+
+// Test1::B should just have a single entry in its VTT, which points to the vtable.
+namespace Test1 {
+struct A { };
+
+struct B : virtual A {
+  virtual void f();
+};
+
+void B::f() { }
+}
+
+// Check that we don't add a secondary virtual pointer for Test2::A, since Test2::A doesn't have any virtual member functions or bases.
+namespace Test2 {
+  struct A { };
+
+  struct B : A { virtual void f(); };
+  struct C : virtual B { };
+
+  C c;
+}
+
+// This is the sample from the C++ Itanium ABI, p2.6.2.
+namespace Test3 {
+  class A1 { int i; };
+  class A2 { int i; virtual void f(); };
+  class V1 : public A1, public A2 { int i; };
+  class B1 { int i; };
+  class B2 { int i; };
+  class V2 : public B1, public B2, public virtual V1 { int i; };
+  class V3 {virtual void g(); };
+  class C1 : public virtual V1 { int i; };
+  class C2 : public virtual V3, virtual V2 { int i; };
+  class X1 { int i; };
+  class C3 : public X1 { int i; };
+  class D : public C1, public C2, public C3 { int i;  };
+
+  D d;
+}
+
+// This is the sample from the C++ Itanium ABI, p2.6.2, with the change suggested
+// (making A2 a virtual base of V1)
+namespace Test4 {
+  class A1 { int i; };
+  class A2 { int i; virtual void f(); };
+  class V1 : public A1, public virtual A2 { int i; };
+  class B1 { int i; };
+  class B2 { int i; };
+  class V2 : public B1, public B2, public virtual V1 { int i; };
+  class V3 {virtual void g(); };
+  class C1 : public virtual V1 { int i; };
+  class C2 : public virtual V3, virtual V2 { int i; };
+  class X1 { int i; };
+  class C3 : public X1 { int i; };
+  class D : public C1, public C2, public C3 { int i;  };
+
+  D d;
+}
+
+namespace Test5 {
+  struct A {
+virtual void f() = 0;
+virtual void anchor();
+  };
+
+  void A::anchor() {
+  

[PATCH] D153092: [Clang][CodeGen]`vtable`, `typeinfo` et al. are globals

2023-07-19 Thread Alex Voicu via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.
Closed by commit rG8acdcf401687: [Clang][CodeGen]`vtable`, `typeinfo` et al. 
are globals (authored by AlexVlx).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153092/new/

https://reviews.llvm.org/D153092

Files:
  clang/lib/CodeGen/CGVTT.cpp
  clang/lib/CodeGen/CGVTables.cpp
  clang/lib/CodeGen/CodeGenModule.cpp
  clang/lib/CodeGen/ItaniumCXXABI.cpp
  clang/test/CodeGenCXX/vtable-align-address-space.cpp
  clang/test/CodeGenCXX/vtable-assume-load-address-space.cpp
  clang/test/CodeGenCXX/vtable-consteval-address-space.cpp
  clang/test/CodeGenCXX/vtable-constexpr-address-space.cpp
  clang/test/CodeGenCXX/vtable-key-function-address-space.cpp
  clang/test/CodeGenCXX/vtable-layout-extreme-address-space.cpp
  clang/test/CodeGenCXX/vtable-linkage-address-space.cpp
  clang/test/CodeGenCXX/vtable-pointer-initialization-address-space.cpp
  clang/test/CodeGenCXX/vtt-address-space.cpp
  clang/test/CodeGenCXX/vtt-layout-address-space.cpp
  clang/test/Headers/hip-header.hip

Index: clang/test/Headers/hip-header.hip
===
--- clang/test/Headers/hip-header.hip
+++ clang/test/Headers/hip-header.hip
@@ -43,6 +43,22 @@
 
 // expected-no-diagnostics
 
+// Check handling of overridden, implicitly __host__ dtor (should emit as a
+// nullptr to global)
+
+struct vbase {
+virtual ~vbase();
+};
+
+template <typename T>
+struct vderived : public vbase {
+~vderived();
+};
+
+template struct vderived<void>;
+
+// CHECK: @_ZTV8vderivedIvE = weak_odr unnamed_addr addrspace(1) constant { [4 x ptr addrspace(1)] } zeroinitializer, comdat, align 8
+
 // Check support for pure and deleted virtual functions
 struct base {
   __host__
@@ -60,9 +76,8 @@
 __device__ void test_vf() {
 derived d;
 }
-// CHECK: @_ZTV7derived = linkonce_odr unnamed_addr addrspace(1) constant { [4 x ptr] } { [4 x ptr] [ptr null, ptr null, ptr @_ZN7derived2pvEv, ptr @__cxa_deleted_virtual] }, comdat, align 8
-// CHECK: @_ZTV4base = linkonce_odr unnamed_addr addrspace(1) constant { [4 x ptr] } { [4 x ptr] [ptr null, ptr null, ptr @__cxa_pure_virtual, ptr @__cxa_deleted_virtual] }, comdat, align 8
-
+// CHECK: @_ZTV7derived = linkonce_odr unnamed_addr addrspace(1) constant { [4 x ptr addrspace(1)] } { [4 x ptr addrspace(1)] [ptr addrspace(1) null, ptr addrspace(1) null, ptr addrspace(1) addrspacecast (ptr @_ZN7derived2pvEv to ptr addrspace(1)), ptr addrspace(1) addrspacecast (ptr @__cxa_deleted_virtual to ptr addrspace(1))] }, comdat, align 8
+// CHECK: @_ZTV4base = linkonce_odr unnamed_addr addrspace(1) constant { [4 x ptr addrspace(1)] } { [4 x ptr addrspace(1)] [ptr addrspace(1) null, ptr addrspace(1) null, ptr addrspace(1) addrspacecast (ptr @__cxa_pure_virtual to ptr addrspace(1)), ptr addrspace(1) addrspacecast (ptr @__cxa_deleted_virtual to ptr addrspace(1))] }, comdat, align 8
 // CHECK: define{{.*}}void @__cxa_pure_virtual()
 // CHECK: define{{.*}}void @__cxa_deleted_virtual()
 
Index: clang/test/CodeGenCXX/vtt-layout-address-space.cpp
===
--- /dev/null
+++ clang/test/CodeGenCXX/vtt-layout-address-space.cpp
@@ -0,0 +1,89 @@
+// RUN: %clang_cc1 %s -triple=amdgcn-amd-amdhsa -std=c++11 -emit-llvm -o - | FileCheck %s
+
+// Test1::B should just have a single entry in its VTT, which points to the vtable.
+namespace Test1 {
+struct A { };
+
+struct B : virtual A {
+  virtual void f();
+};
+
+void B::f() { }
+}
+
+// Check that we don't add a secondary virtual pointer for Test2::A, since Test2::A doesn't have any virtual member functions or bases.
+namespace Test2 {
+  struct A { };
+
+  struct B : A { virtual void f(); };
+  struct C : virtual B { };
+
+  C c;
+}
+
+// This is the sample from the C++ Itanium ABI, p2.6.2.
+namespace Test3 {
+  class A1 { int i; };
+  class A2 { int i; virtual void f(); };
+  class V1 : public A1, public A2 { int i; };
+  class B1 { int i; };
+  class B2 { int i; };
+  class V2 : public B1, public B2, public virtual V1 { int i; };
+  class V3 {virtual void g(); };
+  class C1 : public virtual V1 { int i; };
+  class C2 : public virtual V3, virtual V2 { int i; };
+  class X1 { int i; };
+  class C3 : public X1 { int i; };
+  class D : public C1, public C2, public C3 { int i;  };
+
+  D d;
+}
+
+// This is the sample from the C++ Itanium ABI, p2.6.2, with the change suggested
+// (making A2 a virtual base of V1)
+namespace Test4 {
+  class A1 { int i; };
+  class A2 { int i; virtual void f(); };
+  class V1 : public A1, public virtual A2 { int i; };
+  class B1 { int i; };
+  class B2 { int i; };
+  class V2 : public B1, public B2, public virtual V1 { int i; };
+  class V3 {virtual void g(); };
+  class C1 : public virtual V1 { int i; };
+  class C2 : public virtual V3, virtual V2 { int i; };
+  class X1 { int i; };
+  class C3 : public X1 { int

[PATCH] D153092: [Clang][CodeGen]`vtable`, `typeinfo` et al. are globals

2023-07-19 Thread Alex Voicu via Phabricator via cfe-commits
AlexVlx added inline comments.



Comment at: clang/lib/CodeGen/CGVTables.cpp:836
+fnPtr =
+llvm::ConstantExpr::getAddrSpaceCast(fnPtr, CGM.GlobalsInt8PtrTy);
+  return builder.add(fnPtr);

efriedma wrote:
> If I follow correctly, the fnPtr is guaranteed to be in the global 
> address-space, but GetAddrOfFunction returns a generic pointer.  So the 
> vtable entries are in the global address-space for efficiency? Seems 
> reasonable.
> 
> There isn't really much point to explicitly checking `FnAS != GVAS` before 
> the call to getAddrSpaceCast; getAddrSpaceCast does the same check internally.
Right, we know that these are going to be in global memory, and there is
overhead when dealing with flat/generic pointers. I would have liked to remove
the check, but I think that's infeasible with how getAddrSpaceCast is currently
implemented: it `assert`s on `castIsValid`, and an addrspacecast from an AS to
that same AS is considered invalid, so the `assert` fires on targets where
FnAS == GVAS (e.g. x86). This is not particularly ergonomic IMHO, since a
same-AS cast could simply be a no-op returning the source, but that is the
change that would be required before the check here could be deleted, I believe.
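
For illustration, this is the shape of the guard being discussed; a minimal
sketch rather than the exact patch, and the helper name is made up:

```cpp
#include "llvm/IR/Constants.h"
#include "llvm/IR/DerivedTypes.h"

// Only emit an addrspacecast when the source and destination address spaces
// actually differ; ConstantExpr::getAddrSpaceCast asserts (via castIsValid)
// on a same-AS cast, e.g. on targets such as x86 where FnAS == GVAS.
static llvm::Constant *castToGlobalsASIfNeeded(llvm::Constant *FnPtr,
                                               llvm::PointerType *GlobalsPtrTy) {
  if (FnPtr->getType() == GlobalsPtrTy)
    return FnPtr; // casting here would trip the assert
  return llvm::ConstantExpr::getAddrSpaceCast(FnPtr, GlobalsPtrTy);
}
```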


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153092/new/

https://reviews.llvm.org/D153092



[PATCH] D155759: [Clang][CodeGen] Follow-up for `vtable`, `typeinfo` et al. are globals

2023-07-19 Thread Alex Voicu via Phabricator via cfe-commits
AlexVlx created this revision.
AlexVlx added reviewers: yaxunl, efriedma, rjmccall.
Herald added a project: All.
AlexVlx requested review of this revision.
Herald added a project: clang.
Herald added a subscriber: cfe-commits.

Unfortunately, 
https://reviews.llvm.org/rG8acdcf4016876d122733991561be706b64026e73 didn't 
include handling for the fact that `throw`'s implementation takes a pointer to 
a type's `typeinfo` struct, which implies that its signature needs to change as 
well. This patch corrects that and adds a test.


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D155759

Files:
  clang/lib/CodeGen/ItaniumCXXABI.cpp
  clang/test/CodeGenCXX/throw-expression-typeinfo-in-address-space.cpp


Index: clang/test/CodeGenCXX/throw-expression-typeinfo-in-address-space.cpp
===
--- /dev/null
+++ clang/test/CodeGenCXX/throw-expression-typeinfo-in-address-space.cpp
@@ -0,0 +1,17 @@
+// RUN: %clang_cc1 %s -triple amdgcn-amd-amdhsa -emit-llvm -fcxx-exceptions 
-fexceptions -std=c++11 -o - | FileCheck %s
+
+struct X {
+  ~X();
+};
+
+struct Error {
+  Error(const X&) noexcept;
+};
+
+void f() {
+  try {
+throw Error(X());
+  } catch (...) { }
+}
+
+// CHECK: declare void @__cxa_throw(ptr, ptr addrspace(1), ptr)
Index: clang/lib/CodeGen/ItaniumCXXABI.cpp
===
--- clang/lib/CodeGen/ItaniumCXXABI.cpp
+++ clang/lib/CodeGen/ItaniumCXXABI.cpp
@@ -1252,7 +1252,7 @@
   // void __cxa_throw(void *thrown_exception, std::type_info *tinfo,
   //  void (*dest) (void *));
 
-  llvm::Type *Args[3] = { CGM.Int8PtrTy, CGM.Int8PtrTy, CGM.Int8PtrTy };
+  llvm::Type *Args[3] = { CGM.Int8PtrTy, CGM.GlobalsInt8PtrTy, CGM.Int8PtrTy };
   llvm::FunctionType *FTy =
 llvm::FunctionType::get(CGM.VoidTy, Args, /*isVarArg=*/false);
 


Index: clang/test/CodeGenCXX/throw-expression-typeinfo-in-address-space.cpp
===
--- /dev/null
+++ clang/test/CodeGenCXX/throw-expression-typeinfo-in-address-space.cpp
@@ -0,0 +1,17 @@
+// RUN: %clang_cc1 %s -triple amdgcn-amd-amdhsa -emit-llvm -fcxx-exceptions -fexceptions -std=c++11 -o - | FileCheck %s
+
+struct X {
+  ~X();
+};
+
+struct Error {
+  Error(const X&) noexcept;
+};
+
+void f() {
+  try {
+throw Error(X());
+  } catch (...) { }
+}
+
+// CHECK: declare void @__cxa_throw(ptr, ptr addrspace(1), ptr)
Index: clang/lib/CodeGen/ItaniumCXXABI.cpp
===
--- clang/lib/CodeGen/ItaniumCXXABI.cpp
+++ clang/lib/CodeGen/ItaniumCXXABI.cpp
@@ -1252,7 +1252,7 @@
   // void __cxa_throw(void *thrown_exception, std::type_info *tinfo,
   //  void (*dest) (void *));
 
-  llvm::Type *Args[3] = { CGM.Int8PtrTy, CGM.Int8PtrTy, CGM.Int8PtrTy };
+  llvm::Type *Args[3] = { CGM.Int8PtrTy, CGM.GlobalsInt8PtrTy, CGM.Int8PtrTy };
   llvm::FunctionType *FTy =
 llvm::FunctionType::get(CGM.VoidTy, Args, /*isVarArg=*/false);
 


[PATCH] D155759: [Clang][CodeGen] Follow-up for `vtable`, `typeinfo` et al. are globals

2023-07-19 Thread Alex Voicu via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.
Closed by commit rGf385abf131e0: [Clang][CodeGen] Follow-up for `vtable`, 
`typeinfo` et al. are globals (authored by AlexVlx).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155759/new/

https://reviews.llvm.org/D155759

Files:
  clang/lib/CodeGen/ItaniumCXXABI.cpp
  clang/test/CodeGenCXX/throw-expression-typeinfo-in-address-space.cpp


Index: clang/test/CodeGenCXX/throw-expression-typeinfo-in-address-space.cpp
===
--- /dev/null
+++ clang/test/CodeGenCXX/throw-expression-typeinfo-in-address-space.cpp
@@ -0,0 +1,17 @@
+// RUN: %clang_cc1 %s -triple amdgcn-amd-amdhsa -emit-llvm -fcxx-exceptions 
-fexceptions -std=c++11 -o - | FileCheck %s
+
+struct X {
+  ~X();
+};
+
+struct Error {
+  Error(const X&) noexcept;
+};
+
+void f() {
+  try {
+throw Error(X());
+  } catch (...) { }
+}
+
+// CHECK: declare void @__cxa_throw(ptr, ptr addrspace(1), ptr)
Index: clang/lib/CodeGen/ItaniumCXXABI.cpp
===
--- clang/lib/CodeGen/ItaniumCXXABI.cpp
+++ clang/lib/CodeGen/ItaniumCXXABI.cpp
@@ -1252,7 +1252,7 @@
   // void __cxa_throw(void *thrown_exception, std::type_info *tinfo,
   //  void (*dest) (void *));
 
-  llvm::Type *Args[3] = { CGM.Int8PtrTy, CGM.Int8PtrTy, CGM.Int8PtrTy };
+  llvm::Type *Args[3] = { CGM.Int8PtrTy, CGM.GlobalsInt8PtrTy, CGM.Int8PtrTy };
   llvm::FunctionType *FTy =
 llvm::FunctionType::get(CGM.VoidTy, Args, /*isVarArg=*/false);
 


Index: clang/test/CodeGenCXX/throw-expression-typeinfo-in-address-space.cpp
===
--- /dev/null
+++ clang/test/CodeGenCXX/throw-expression-typeinfo-in-address-space.cpp
@@ -0,0 +1,17 @@
+// RUN: %clang_cc1 %s -triple amdgcn-amd-amdhsa -emit-llvm -fcxx-exceptions -fexceptions -std=c++11 -o - | FileCheck %s
+
+struct X {
+  ~X();
+};
+
+struct Error {
+  Error(const X&) noexcept;
+};
+
+void f() {
+  try {
+throw Error(X());
+  } catch (...) { }
+}
+
+// CHECK: declare void @__cxa_throw(ptr, ptr addrspace(1), ptr)
Index: clang/lib/CodeGen/ItaniumCXXABI.cpp
===
--- clang/lib/CodeGen/ItaniumCXXABI.cpp
+++ clang/lib/CodeGen/ItaniumCXXABI.cpp
@@ -1252,7 +1252,7 @@
   // void __cxa_throw(void *thrown_exception, std::type_info *tinfo,
   //  void (*dest) (void *));
 
-  llvm::Type *Args[3] = { CGM.Int8PtrTy, CGM.Int8PtrTy, CGM.Int8PtrTy };
+  llvm::Type *Args[3] = { CGM.Int8PtrTy, CGM.GlobalsInt8PtrTy, CGM.Int8PtrTy };
   llvm::FunctionType *FTy =
 llvm::FunctionType::get(CGM.VoidTy, Args, /*isVarArg=*/false);
 


[PATCH] D155769: [Clang][docs][RFC] Add documentation for C++ Parallel Algorithm Offload

2023-07-19 Thread Alex Voicu via Phabricator via cfe-commits
AlexVlx created this revision.
AlexVlx added reviewers: rjmccall, yaxunl, aaron.ballman.
AlexVlx added a project: clang.
Herald added a subscriber: arphaman.
Herald added a project: All.
AlexVlx requested review of this revision.
Herald added a subscriber: cfe-commits.

This patch adds the documentation for the standard algorithm offload feature 
being proposed here: 
https://discourse.llvm.org/t/rfc-adding-c-parallel-algorithm-offload-support-to-clang-llvm/72159/1.
 It is the parent of a series of patches that make up the implementation.


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D155769

Files:
  clang/docs/StdParSupport.rst
  clang/docs/index.rst

Index: clang/docs/index.rst
===
--- clang/docs/index.rst
+++ clang/docs/index.rst
@@ -47,6 +47,7 @@
OpenCLSupport
OpenMPSupport
SYCLSupport
+   StdParSupport
HLSL/HLSLDocs
ThinLTO
APINotes
Index: clang/docs/StdParSupport.rst
===
--- /dev/null
+++ clang/docs/StdParSupport.rst
@@ -0,0 +1,381 @@
+==
+C++ Standard Parallelism Offload Support: Compiler And Runtime
+==
+
+.. contents::
+   :local:
+
+Introduction
+
+
+This document describes the implementation of support for offloading the
+execution of standard C++ algorithms to accelerators that can be targeted via
+HIP. Furthermore, it enumerates restrictions on user defined code, as well as
+the interactions with runtimes.
+
+Algorithm Offload: What, Why, Where
+===
+
+C++17 introduced overloads
+`for most algorithms in the standard library `_
+which allow the user to specify a desired
+`execution policy `_.
+The `parallel_unsequenced_policy `_
+maps relatively well to the execution model of many accelerators, such as GPUs.
+This, coupled with the ubiquity of GPU accelerated algorithm libraries that
+implement most / all corresponding libraries in the standard library
+(e.g. `rocThrust `_), makes
+it feasible to provide seamless accelerator offload for supported algorithms,
+when an accelerated version exists. Thus, it becomes possible to easily access
+the computational resources of an accelerator, via a well specified, familiar,
+algorithmic interface, without having to delve into low-level hardware specific
+details. Putting it all together:
+
+- **What**: standard library algorithms, when invoked with the
+  ``parallel_unsequenced_policy``
+- **Why**: democratise accelerator programming, without loss of user familiarity
+- **Where**: any and all accelerators that can be targeted by Clang/LLVM via HIP
+
+Small Example
+=
+
+Given the following C++ code, which assumes the ``std`` namespace is included:
+
+.. code-block:: C++
+
+   bool has_the_answer(const vector& v) {
+ return find(execution::par_unseq, cbegin(v), cend(v), 42) != cend(v);
+   }
+
+if Clang is invoked with the ``-stdpar --offload-target=foo`` flags, the call to
+``find`` will be offloaded to an accelerator that is part of the ``foo`` target
+family. If either ``foo`` or its runtime environment do not support transparent
+on-demand paging (such as e.g. that provided in Linux via
+`HMM `_), it is necessary to also include
+the ``--stdpar-interpose-alloc`` flag. If the accelerator specific algorithm
+library ``foo`` uses doesn't have an implementation of a particular algorithm,
+execution seamlessly falls back to the host CPU. It is legal to specify multiple
+``--offload-target``s. All the flags we introduce, as well as a thorough view of
+various restrictions and their implications will be provided below.
+
+Implementation - General View
+=
+
+We built support for Algorithm Offload support atop the pre-existing HIP
+infrastructure. More specifically, when one requests offload via ``-stdpar``,
+compilation is switched to HIP compilation, as if ``-x hip`` was specified.
+Similarly, linking is also switched to HIP linking, as if ``--hip-link`` was
+specified. Note that these are implicit, and one should not assume that any
+interop with HIP specific language constructs is available e.g. ``__device__``
+annotations are neither necessary nor guaranteed to work.
+
+Since there are no language restriction mechanisms in place, it is necessary to
+relax HIP language specific semantic checks performed by the FE; they would
+identify otherwise valid, offloadable code, as invalid HIP code. Given that we
+know that the user intended only for certain algorithms to be offloaded, and
+encoded this by

[PATCH] D155775: [Clang][Driver][RFC] Add driver support for C++ Parallel Algorithm Offload

2023-07-19 Thread Alex Voicu via Phabricator via cfe-commits
AlexVlx created this revision.
AlexVlx added reviewers: jansvoboda11, arsenm, yaxunl, MaskRay.
AlexVlx added a project: clang.
Herald added subscribers: cmtice, kerbowa, ormris, hiraditya, tpr, jvesely.
Herald added a reviewer: jhenderson.
Herald added a project: All.
AlexVlx requested review of this revision.
Herald added subscribers: llvm-commits, cfe-commits, wdng.
Herald added a project: LLVM.

This patch adds the Driver changes needed by the standard algorithm offload 
feature being proposed here: 
https://discourse.llvm.org/t/rfc-adding-c-parallel-algorithm-offload-support-to-clang-llvm/72159/1.
 The verbose documentation is included in its parent patch. What this change 
does can be summed up as follows:

1. add two flags, one for enabling `stdpar` compilation, the second enabling 
the optional allocation interposition mode;
2. the flags correspond to new LangOpt members;
3. if we are compiling or linking with `-stdpar`, we enable HIP; in the 
compilation case C and C++ inputs are treated as HIP inputs;
4. the ROCm / AMDGPU driver is augmented to look for and include an 
implementation detail forwarding header, which is provided here 
https://github.com/ROCmSoftwarePlatform/roc-stdpar/blob/main/include/stdpar_lib.hpp;
 we error out if the user requested `stdpar` but the header or its dependencies 
cannot be found (it is plausible that in the future we'll move the check for 
the dependencies to the header itself in order to reduce the compiler 
footprint).

Tests for the behaviour described above are also added.


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D155775

Files:
  clang/include/clang/Basic/DiagnosticDriverKinds.td
  clang/include/clang/Basic/LangOptions.def
  clang/include/clang/Driver/Options.td
  clang/lib/Driver/Driver.cpp
  clang/lib/Driver/ToolChains/AMDGPU.cpp
  clang/lib/Driver/ToolChains/Clang.cpp
  clang/lib/Driver/ToolChains/ROCm.h
  clang/test/Driver/Inputs/stdpar/stdpar_lib.hpp
  clang/test/Driver/stdpar.c
  llvm/include/llvm/CodeGen/AccelTable.h
  llvm/lib/CodeGen/AsmPrinter/AccelTable.cpp
  llvm/test/DebugInfo/Generic/accel-table-hash-collisions.ll
  llvm/test/DebugInfo/Generic/apple-names-hash-collisions.ll
  llvm/test/DebugInfo/Generic/debug-names-hash-collisions.ll
  llvm/tools/llvm-dwarfdump/llvm-dwarfdump.cpp
  llvm/utils/gn/secondary/clang/lib/Headers/BUILD.gn

Index: llvm/utils/gn/secondary/clang/lib/Headers/BUILD.gn
===
--- llvm/utils/gn/secondary/clang/lib/Headers/BUILD.gn
+++ llvm/utils/gn/secondary/clang/lib/Headers/BUILD.gn
@@ -238,7 +238,6 @@
 "sha512intrin.h",
 "shaintrin.h",
 "sifive_vector.h",
-"sm3intrin.h",
 "smmintrin.h",
 "stdalign.h",
 "stdarg.h",
Index: llvm/tools/llvm-dwarfdump/llvm-dwarfdump.cpp
===
--- llvm/tools/llvm-dwarfdump/llvm-dwarfdump.cpp
+++ llvm/tools/llvm-dwarfdump/llvm-dwarfdump.cpp
@@ -11,7 +11,6 @@
 //===--===//
 
 #include "llvm-dwarfdump.h"
-#include "llvm/ADT/MapVector.h"
 #include "llvm/ADT/STLExtras.h"
 #include "llvm/ADT/SmallSet.h"
 #include "llvm/ADT/StringSet.h"
@@ -464,7 +463,7 @@
 static void findAllApple(
 DWARFContext &DICtx, raw_ostream &OS,
 std::function GetNameForDWARFReg) {
-  MapVector> NameToDies;
+  StringMap> NameToDies;
 
   auto PushDIEs = [&](const AppleAcceleratorTable &Accel) {
 for (const auto &Entry : Accel.entries()) {
Index: llvm/test/DebugInfo/Generic/debug-names-hash-collisions.ll
===
--- llvm/test/DebugInfo/Generic/debug-names-hash-collisions.ll
+++ llvm/test/DebugInfo/Generic/debug-names-hash-collisions.ll
@@ -29,21 +29,21 @@
 ; Check that all the names are present in the output
 ; CHECK: Bucket 0
 ; CHECK: Hash: 0xF8CF70D
-; CHECK-NEXT:String: 0x{{[0-9a-f]*}} "_ZN4lldb7SBBlockaSERKS0_"
-; CHECK: Hash: 0xF8CF70D
 ; CHECK-NEXT:String: 0x{{[0-9a-f]*}} "_ZN4lldb7SBBlockC1ERKS0_"
-; CHECK: Hash: 0x135A482C
-; CHECK-NEXT:String: 0x{{[0-9a-f]*}} "_ZN4lldb7SBErroraSERKS0_"
+; CHECK: Hash: 0xF8CF70D
+; CHECK-NEXT:String: 0x{{[0-9a-f]*}} "_ZN4lldb7SBBlockaSERKS0_"
 ; CHECK: Hash: 0x135A482C
 ; CHECK-NEXT:String: 0x{{[0-9a-f]*}} "_ZN4lldb7SBErrorC1ERKS0_"
+; CHECK: Hash: 0x135A482C
+; CHECK-NEXT:String: 0x{{[0-9a-f]*}} "_ZN4lldb7SBErroraSERKS0_"
 ; CHECK-NOT: String:
 ; CHECK: Bucket 1
 ; CHECK-NEXT: EMPTY
 ; CHECK: Bucket 2
 ; CHECK: Hash: 0x2841B989
-; CHECK-NEXT:String: 0x{{[0-9a-f]*}} "_ZL11numCommutes"
-; CHECK: Hash: 0x2841B989
 ; CHECK-NEXT:String: 0x{{[0-9a-f]*}} "_ZL11NumCommutes"
+; CHECK: Hash: 0x2841B989
+; CHECK-NEXT:String: 0x{{[0-9a-f]*}} "_ZL11numCommutes"
 ; CHECK: Hash: 0x3E190F5F
 ; CHECK-NEXT:String: 0x{{[0-9a-f]*}} "_ZL9NumRemats"
 ; CHECK: Hash: 0x3E190F5F
Index: llvm/test/DebugInfo/Generic/apple-names-hash-collisions.ll
=

[PATCH] D155775: [Clang][Driver][RFC] Add driver support for C++ Parallel Algorithm Offload

2023-07-19 Thread Alex Voicu via Phabricator via cfe-commits
AlexVlx updated this revision to Diff 542287.
AlexVlx added a comment.

Removed some accidental noise.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155775/new/

https://reviews.llvm.org/D155775

Files:
  clang/include/clang/Basic/DiagnosticDriverKinds.td
  clang/include/clang/Basic/LangOptions.def
  clang/include/clang/Driver/Options.td
  clang/lib/Driver/Driver.cpp
  clang/lib/Driver/ToolChains/AMDGPU.cpp
  clang/lib/Driver/ToolChains/Clang.cpp
  clang/lib/Driver/ToolChains/ROCm.h
  clang/test/Driver/Inputs/stdpar/stdpar_lib.hpp
  clang/test/Driver/stdpar.c

Index: clang/test/Driver/stdpar.c
===
--- /dev/null
+++ clang/test/Driver/stdpar.c
@@ -0,0 +1,18 @@
+// RUN: %clang -### -stdpar --compile %s 2>&1 | \
+// RUN:   FileCheck --check-prefix=STDPAR-MISSING-LIB %s
+// STDPAR-MISSING-LIB: error: cannot find HIP Standard Parallelism Acceleration library; provide it via '--stdpar-path'
+
+// RUN: %clang -### --stdpar --stdpar-path=%S/Inputs/stdpar \
+// RUN:   --stdpar-thrust-path=%S/Inputs/stdpar/thrust \
+// RUN:   --stdpar-prim-path=%S/Inputs/stdpar/prim --compile %s 2>&1 | \
+// RUN:   FileCheck --check-prefix=STDPAR-COMPILE %s
+// STDPAR-COMPILE: "-x" "hip"
+// STDPAR-COMPILE: "-idirafter" "{{.*/Inputs/stdpar/thrust}}"
+// STDPAR-COMPILE: "-idirafter" "{{.*/Inputs/stdpar/prim}}"
+// STDPAR-COMPILE: "-idirafter" "{{.*/Inputs/stdpar}}"
+// STDPAR-COMPILE: "-include" "stdpar_lib.hpp"
+
+// RUN: touch %t.o
+// RUN: %clang -### -stdpar %t.o 2>&1 | FileCheck --check-prefix=STDPAR-LINK %s
+// STDPAR-LINK: "-rpath"
+// STDPAR-LINK: "-l{{.*hip.*}}"
Index: clang/lib/Driver/ToolChains/ROCm.h
===
--- clang/lib/Driver/ToolChains/ROCm.h
+++ clang/lib/Driver/ToolChains/ROCm.h
@@ -77,6 +77,9 @@
   const Driver &D;
   bool HasHIPRuntime = false;
   bool HasDeviceLibrary = false;
+  bool HasHIPStdParLibrary = false;
+  bool HasRocThrustLibrary = false;
+  bool HasRocPrimLibrary = false;
 
   // Default version if not detected or specified.
   const unsigned DefaultVersionMajor = 3;
@@ -96,6 +99,13 @@
   std::vector RocmDeviceLibPathArg;
   // HIP runtime path specified by --hip-path.
   StringRef HIPPathArg;
+  // HIP Standard Parallel Algorithm acceleration library specified by
+  // --stdpar-path
+  StringRef HIPStdParPathArg;
+  // rocThrust algorithm library specified by --stdpar-thrust-path
+  StringRef HIPRocThrustPathArg;
+  // rocPrim algorithm library specified by --stdpar-prim-path
+  StringRef HIPRocPrimPathArg;
   // HIP version specified by --hip-version.
   StringRef HIPVersionArg;
   // Wheter -nogpulib is specified.
@@ -180,6 +190,9 @@
   /// Check whether we detected a valid ROCm device library.
   bool hasDeviceLibrary() const { return HasDeviceLibrary; }
 
+  /// Check whether we detected a valid HIP STDPAR Acceleration library.
+  bool hasHIPStdParLibrary() const { return HasHIPStdParLibrary; }
+
   /// Print information about the detected ROCm installation.
   void print(raw_ostream &OS) const;
 
Index: clang/lib/Driver/ToolChains/Clang.cpp
===
--- clang/lib/Driver/ToolChains/Clang.cpp
+++ clang/lib/Driver/ToolChains/Clang.cpp
@@ -6527,6 +6527,12 @@
 if (Args.hasFlag(options::OPT_fgpu_allow_device_init,
  options::OPT_fno_gpu_allow_device_init, false))
   CmdArgs.push_back("-fgpu-allow-device-init");
+if (Args.hasArg(options::OPT_stdpar)) {
+  CmdArgs.push_back("-stdpar");
+
+  if (Args.hasArg(options::OPT_stdpar_interpose_alloc))
+CmdArgs.push_back("-stdpar-interpose-alloc");
+}
 Args.addOptInFlag(CmdArgs, options::OPT_fhip_kernel_arg_name,
   options::OPT_fno_hip_kernel_arg_name);
   }
Index: clang/lib/Driver/ToolChains/AMDGPU.cpp
===
--- clang/lib/Driver/ToolChains/AMDGPU.cpp
+++ clang/lib/Driver/ToolChains/AMDGPU.cpp
@@ -329,6 +329,19 @@
   RocmDeviceLibPathArg =
   Args.getAllArgValues(clang::driver::options::OPT_rocm_device_lib_path_EQ);
   HIPPathArg = Args.getLastArgValue(clang::driver::options::OPT_hip_path_EQ);
+  HIPStdParPathArg =
+Args.getLastArgValue(clang::driver::options::OPT_stdpar_path_EQ);
+  HasHIPStdParLibrary = !HIPStdParPathArg.empty() &&
+D.getVFS().exists(HIPStdParPathArg + "/stdpar_lib.hpp");
+  HIPRocThrustPathArg =
+Args.getLastArgValue(clang::driver::options::OPT_stdpar_thrust_path_EQ);
+  HasRocThrustLibrary = !HIPRocThrustPathArg.empty() &&
+D.getVFS().exists(HIPRocThrustPathArg + "/thrust");
+  HIPRocPrimPathArg =
+Args.getLastArgValue(clang::driver::options::OPT_stdpar_prim_path_EQ);
+  HasRocPrimLibrary = !HIPRocPrimPathArg.empty() &&
+  D.getVFS().exists(HIPRocPrimPathArg + "/rocprim");
+
   if (auto *A = Args.getLastArg(clang::driver::options::OPT

[PATCH] D155826: [Clang][Preprocessor][RFC] Add preprocessor support for C++ Parallel Algorithm Offload

2023-07-20 Thread Alex Voicu via Phabricator via cfe-commits
AlexVlx created this revision.
AlexVlx added reviewers: yaxunl, arsenm.
AlexVlx added a project: clang.
Herald added a project: All.
AlexVlx requested review of this revision.
Herald added subscribers: cfe-commits, wdng.

This patch adds the preprocessor changes needed by the standard algorithm offload
feature being proposed here: 
https://discourse.llvm.org/t/rfc-adding-c-parallel-algorithm-offload-support-to-clang-llvm/72159/1.
 The verbose documentation is included in the head of the patch series. This 
change merely adds two macros to inform user space if we are compiling in 
`stdpar` mode and, respectively, if the optional allocation interposition mode 
has been requested, as well as associated minimal tests. The macros can be used 
by the runtime implementation of offload to drive conditional compilation, and 
are only defined if the HIP language has been enabled.
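
For a sense of how the macros are meant to be used, here is a sketch of the
kind of conditional compilation they enable in the runtime / forwarding header;
the definitions below are illustrative only, not the actual implementation:

```cpp
#include <cstddef>
#include <new>
#include <hip/hip_runtime.h>

// __STDPAR_INTERPOSE_ALLOC__ is only defined on the host pass, so this block
// never participates in device compilation.
#if defined(__STDPAR__) && defined(__STDPAR_INTERPOSE_ALLOC__)
// Route canonical allocation / deallocation through an accelerator-visible
// allocator.
void *operator new(std::size_t n) {
  void *p = nullptr;
  if (hipMallocManaged(&p, n) != hipSuccess)
    throw std::bad_alloc{};
  return p;
}

void operator delete(void *p) noexcept { hipFree(p); }
#endif
```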


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D155826

Files:
  clang/lib/Frontend/InitPreprocessor.cpp
  clang/test/Preprocessor/predefined-macros.c


Index: clang/test/Preprocessor/predefined-macros.c
===
--- clang/test/Preprocessor/predefined-macros.c
+++ clang/test/Preprocessor/predefined-macros.c
@@ -290,3 +290,19 @@
 // RUN:   -fcuda-is-device -fgpu-default-stream=per-thread \
 // RUN:   | FileCheck -match-full-lines %s --check-prefix=CHECK-PTH
 // CHECK-PTH: #define HIP_API_PER_THREAD_DEFAULT_STREAM 1
+
+// RUN: %clang_cc1 %s -E -dM -o - -x hip -stdpar -triple 
x86_64-unknown-linux-gnu \
+// RUN:   | FileCheck -match-full-lines %s --check-prefix=CHECK-STDPAR
+// CHECK-STDPAR: #define __STDPAR__ 1
+
+// RUN: %clang_cc1 %s -E -dM -o - -x hip -stdpar -stdpar-interpose-alloc \
+// RUN:  -triple x86_64-unknown-linux-gnu | FileCheck -match-full-lines %s \
+// RUN:  --check-prefix=CHECK-STDPAR-INTERPOSE
+// CHECK-STDPAR-INTERPOSE: #define __STDPAR_INTERPOSE_ALLOC__ 1
+// CHECK-STDPAR-INTERPOSE: #define __STDPAR__ 1
+
+// RUN: %clang_cc1 %s -E -dM -o - -x hip -stdpar -stdpar-interpose-alloc \
+// RUN:  -triple amdgcn-amd-amdhsa -fcuda-is-device | FileCheck 
-match-full-lines \
+// RUN:  %s --check-prefix=CHECK-STDPAR-INTERPOSE-DEV-NEG
+// CHECK-STDPAR-INTERPOSE-DEV-NEG: #define __STDPAR__ 1
+// CHECK-STDPAR-INTERPOSE-DEV-NEG-NOT: #define __STDPAR_INTERPOSE_ALLOC__ 1
\ No newline at end of file
Index: clang/lib/Frontend/InitPreprocessor.cpp
===
--- clang/lib/Frontend/InitPreprocessor.cpp
+++ clang/lib/Frontend/InitPreprocessor.cpp
@@ -586,6 +586,11 @@
 Builder.defineMacro("__HIP_MEMORY_SCOPE_WORKGROUP", "3");
 Builder.defineMacro("__HIP_MEMORY_SCOPE_AGENT", "4");
 Builder.defineMacro("__HIP_MEMORY_SCOPE_SYSTEM", "5");
+if (LangOpts.HIPStdPar) {
+  Builder.defineMacro("__STDPAR__");
+  if (!LangOpts.CUDAIsDevice)
+Builder.defineMacro("__STDPAR_INTERPOSE_ALLOC__");
+}
 if (LangOpts.CUDAIsDevice) {
   Builder.defineMacro("__HIP_DEVICE_COMPILE__");
   if (!TI.hasHIPImageSupport()) {


Index: clang/test/Preprocessor/predefined-macros.c
===
--- clang/test/Preprocessor/predefined-macros.c
+++ clang/test/Preprocessor/predefined-macros.c
@@ -290,3 +290,19 @@
 // RUN:   -fcuda-is-device -fgpu-default-stream=per-thread \
 // RUN:   | FileCheck -match-full-lines %s --check-prefix=CHECK-PTH
 // CHECK-PTH: #define HIP_API_PER_THREAD_DEFAULT_STREAM 1
+
+// RUN: %clang_cc1 %s -E -dM -o - -x hip -stdpar -triple x86_64-unknown-linux-gnu \
+// RUN:   | FileCheck -match-full-lines %s --check-prefix=CHECK-STDPAR
+// CHECK-STDPAR: #define __STDPAR__ 1
+
+// RUN: %clang_cc1 %s -E -dM -o - -x hip -stdpar -stdpar-interpose-alloc \
+// RUN:  -triple x86_64-unknown-linux-gnu | FileCheck -match-full-lines %s \
+// RUN:  --check-prefix=CHECK-STDPAR-INTERPOSE
+// CHECK-STDPAR-INTERPOSE: #define __STDPAR_INTERPOSE_ALLOC__ 1
+// CHECK-STDPAR-INTERPOSE: #define __STDPAR__ 1
+
+// RUN: %clang_cc1 %s -E -dM -o - -x hip -stdpar -stdpar-interpose-alloc \
+// RUN:  -triple amdgcn-amd-amdhsa -fcuda-is-device | FileCheck -match-full-lines \
+// RUN:  %s --check-prefix=CHECK-STDPAR-INTERPOSE-DEV-NEG
+// CHECK-STDPAR-INTERPOSE-DEV-NEG: #define __STDPAR__ 1
+// CHECK-STDPAR-INTERPOSE-DEV-NEG-NOT: #define __STDPAR_INTERPOSE_ALLOC__ 1
\ No newline at end of file
Index: clang/lib/Frontend/InitPreprocessor.cpp
===
--- clang/lib/Frontend/InitPreprocessor.cpp
+++ clang/lib/Frontend/InitPreprocessor.cpp
@@ -586,6 +586,11 @@
 Builder.defineMacro("__HIP_MEMORY_SCOPE_WORKGROUP", "3");
 Builder.defineMacro("__HIP_MEMORY_SCOPE_AGENT", "4");
 Builder.defineMacro("__HIP_MEMORY_SCOPE_SYSTEM", "5");
+if (LangOpts.HIPStdPar) {
+  Builder.defineMacro("__STDPAR__");
+  if (!LangOpts.CUDAIsDevice)
+Builder.defineMacro("__STDPAR_INTERPOSE_ALLOC__");
+}

[PATCH] D155833: [Clang][Sema][RFC] Add Sema support for C++ Parallel Algorithm Offload

2023-07-20 Thread Alex Voicu via Phabricator via cfe-commits
AlexVlx created this revision.
AlexVlx added reviewers: yaxunl, jlebar, tra.
AlexVlx added a project: clang.
Herald added subscribers: ChuanqiXu, s.egerton, simoncook, asb, dschuff.
Herald added a project: All.
AlexVlx requested review of this revision.
Herald added subscribers: cfe-commits, wangpc, jplehr, sstefan1, aheejin.
Herald added a reviewer: jdoerfert.

This patch adds the Sema changes needed by the standard algorithm offload 
feature being proposed here: 
https://discourse.llvm.org/t/rfc-adding-c-parallel-algorithm-offload-support-to-clang-llvm/72159/1.
 The verbose documentation is included in the head of the patch series. This 
change impacts the CUDA / HIP language specific checks, and only manifests if 
compiling in `stdpar` mode. In this case, we essentially do three things:

1. Allow device side callers to call host side callees - since the user visible 
HLL would be standard C++, with no annotations / restriction mechanisms, we 
cannot unambiguously establish that such a call is an error, so we 
conservatively allow all such calls, deferring actual cleanup to a subsequent 
pass over IR;
2. Allow host formed lambdas to capture by reference;
3. Allow device functions to use host global variables.

Please note that host and device here are used to match existing nomenclature;
they would not be present in user code.
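
For context, this is a sketch of the sort of plain C++ user code these
relaxations are meant to admit (all names below are made up); without them, the
implicitly device-side lambda body would be rejected for calling a host
function, capturing by reference, and touching a host global:

```cpp
#include <algorithm>
#include <execution>
#include <vector>

int host_only_scale(int x);   // an ordinary, unannotated (host) function
extern int host_global_bias;  // an ordinary host global variable

void scale(std::vector<int> &v, int factor) {
  std::for_each(std::execution::par_unseq, v.begin(), v.end(),
                [&](int &x) { // host-formed lambda capturing by reference
                  x = host_only_scale(x) * factor + host_global_bias;
                });
}
```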


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D155833

Files:
  clang/lib/Sema/SemaCUDA.cpp
  clang/lib/Sema/SemaExpr.cpp
  clang/test/SemaStdPar/Inputs/stdpar_lib.hpp
  clang/test/SemaStdPar/device-can-call-host.cpp
  stdpar_sema.patch



[PATCH] D155850: [Clang][CodeGen][RFC] Add codegen support for C++ Parallel Algorithm Offload

2023-07-20 Thread Alex Voicu via Phabricator via cfe-commits
AlexVlx created this revision.
AlexVlx added reviewers: yaxunl, rjmccall, eli.friedman, arsenm, tra, jlebar.
AlexVlx added a project: clang.
Herald added a subscriber: ormris.
Herald added a project: All.
AlexVlx requested review of this revision.
Herald added subscribers: cfe-commits, wdng.

This patch adds the CodeGen changes needed by the standard algorithm offload 
feature being proposed here: 
https://discourse.llvm.org/t/rfc-adding-c-parallel-algorithm-offload-support-to-clang-llvm/72159/1.
 The verbose documentation is included in the head of the patch series. This 
change concludes the set of additions needed in Clang, and essentially relaxes 
restrictions on what gets emitted on the device path, when compiling in 
`stdpar` mode (after the previous patch relaxed restrictions on what is 
semantically correct):

1. Unless a function is explicitly marked `__host__`, it will get emitted, 
whereas before only `__device__` and `__global__` functions would be emitted;
  - At the moment we special case `thread_local` handling and still do not emit 
them, as they will require more scaffolding that will be proposed at some point 
in the future.
2. Unsupported builtins are ignored as opposed to being marked as an error, as 
the decision on their validity is deferred to the `stdpar` specific code 
selection pass we are adding, which will be the topic of the final patch in 
this series;
3. We add the `stdpar` specific passes to the `opt` pipeline, independent of 
optimisation level:
  - When compiling for the accelerator / offload device, we add a code 
selection pass;
  - When compiling for the host, iff the user requested it via the 
`--stdpar-interpose-alloc` flag, we add a pass which replaces canonical 
allocation / deallocation functions with accelerator aware equivalents.

A test to validate that unannotated functions get correctly emitted is added as 
well. Please note that `__device__`, `__global__` and `__host__` are used to
match existing nomenclature; they would not be present in user code.


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D155850

Files:
  clang/lib/CodeGen/BackendUtil.cpp
  clang/lib/CodeGen/CGBuiltin.cpp
  clang/lib/CodeGen/CodeGenModule.cpp
  clang/test/CodeGenStdPar/unannotated-functions-get-emitted.cpp


Index: clang/test/CodeGenStdPar/unannotated-functions-get-emitted.cpp
===
--- /dev/null
+++ clang/test/CodeGenStdPar/unannotated-functions-get-emitted.cpp
@@ -0,0 +1,19 @@
+// RUN: %clang_cc1 -x hip -emit-llvm -fcuda-is-device \
+// RUN:   -o - %s | FileCheck --check-prefix=NO-STDPAR-DEV %s
+
+// RUN: %clang_cc1 --stdpar -emit-llvm -fcuda-is-device \
+// RUN:   -o - %s | FileCheck --check-prefix=STDPAR-DEV %s
+
+#define __device__ __attribute__((device))
+
+// NO-STDPAR-DEV-NOT: define {{.*}} void @_Z3fooPff({{.*}})
+// STDPAR-DEV: define {{.*}} void @_Z3fooPff({{.*}})
+void foo(float *a, float b) {
+  *a = b;
+}
+
+// NO-STDPAR-DEV: define {{.*}} void @_Z3barPff({{.*}})
+// STDPAR-DEV: define {{.*}} void @_Z3barPff({{.*}})
+__device__ void bar(float *a, float b) {
+  *a = b;
+}
\ No newline at end of file
Index: clang/lib/CodeGen/CodeGenModule.cpp
===
--- clang/lib/CodeGen/CodeGenModule.cpp
+++ clang/lib/CodeGen/CodeGenModule.cpp
@@ -3545,7 +3545,12 @@
   !Global->hasAttr() &&
   !Global->hasAttr() &&
   !Global->getType()->isCUDADeviceBuiltinSurfaceType() &&
-  !Global->getType()->isCUDADeviceBuiltinTextureType())
+  !Global->getType()->isCUDADeviceBuiltinTextureType() &&
+  !(LangOpts.HIPStdPar &&
+isa(Global) &&
+!cast(Global)->getBuiltinID() &&
+!Global->hasAttr() &&
+!cast(Global)->isVariadic()))
 return;
 } else {
   // We need to emit host-side 'shadows' for all global
@@ -5310,7 +5315,9 @@
 
   setNonAliasAttributes(D, GV);
 
-  if (D->getTLSKind() && !GV->isThreadLocal()) {
+  if (D->getTLSKind() &&
+  !GV->isThreadLocal() &&
+  !(getLangOpts().HIPStdPar && getLangOpts().CUDAIsDevice)) {
 if (D->getTLSKind() == VarDecl::TLS_Dynamic)
   CXXThreadLocals.push_back(D);
 setTLSMode(GV, *D);
Index: clang/lib/CodeGen/CGBuiltin.cpp
===
--- clang/lib/CodeGen/CGBuiltin.cpp
+++ clang/lib/CodeGen/CGBuiltin.cpp
@@ -5538,7 +5538,8 @@
 llvm_unreachable("Bad evaluation kind in EmitBuiltinExpr");
   }
 
-  ErrorUnsupported(E, "builtin function");
+  if (!getLangOpts().HIPStdPar)
+ErrorUnsupported(E, "builtin function");
 
   // Unknown builtin, for now just dump it out and return undef.
   return GetUndefRValue(E->getType());
Index: clang/lib/CodeGen/BackendUtil.cpp
===
--- clang/lib/CodeGen/BackendUtil.cpp
+++ clang/lib/CodeGen/BackendUtil.cpp
@@ -77,6 +77,7 @@
 #include "llvm/T

[PATCH] D155870: [Clang][CodeGen] Another follow-up for `vtable`, `typeinfo` et al. are globals

2023-07-20 Thread Alex Voicu via Phabricator via cfe-commits
AlexVlx created this revision.
AlexVlx added reviewers: rjmccall, efriedma, yaxunl.
AlexVlx added a project: clang.
Herald added a subscriber: arichardson.
Herald added a project: All.
AlexVlx requested review of this revision.
Herald added a subscriber: cfe-commits.

Turns out that in
https://reviews.llvm.org/rG8acdcf4016876d122733991561be706b64026e73 I missed
another use case: `dynamic_cast` relies on `typeinfo`, and the signature of
`__dynamic_cast` assumed the `typeinfo` arguments to be in the generic address
space. This patch corrects the oversight and adds an associated test.

P.S.: this patch, together with @bjope's inquiry on
https://reviews.llvm.org/D153092, makes me wonder whether we want to consider a
more centralised approach to all of the polymorphism related bits, so that
future changes can be localised to one or a few spots and then propagate from
there. I admit I've not thought deeply about it, though, and it would probably
be non-trivial.


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D155870

Files:
  clang/lib/CodeGen/ItaniumCXXABI.cpp
  clang/test/CodeGenCXX/dynamic-cast-address-space.cpp


Index: clang/test/CodeGenCXX/dynamic-cast-address-space.cpp
===
--- /dev/null
+++ clang/test/CodeGenCXX/dynamic-cast-address-space.cpp
@@ -0,0 +1,24 @@
+// RUN: %clang_cc1 -I%S %s -triple amdgcn-amd-amdhsa -emit-llvm 
-fcxx-exceptions -fexceptions -o - | FileCheck %s
+struct A { virtual void f(); };
+struct B : A { };
+
+// CHECK: {{define.*@_Z1fP1A}}
+// CHECK-SAME:  personality ptr @__gxx_personality_v0
+B fail;
+const B& f(A *a) {
+  try {
+// CHECK: call ptr @__dynamic_cast
+// CHECK: br i1
+// CHECK: invoke void @__cxa_bad_cast() [[NR:#[0-9]+]]
+dynamic_cast(*a);
+  } catch (...) {
+// CHECK:  landingpad { ptr, i32 }
+// CHECK-NEXT:   catch ptr null
+  }
+  return fail;
+}
+
+// CHECK: declare ptr @__dynamic_cast(ptr, ptr addrspace(1), ptr addrspace(1), 
i64) [[NUW_RO:#[0-9]+]]
+
+// CHECK: attributes [[NUW_RO]] = { nounwind memory(read) }
+// CHECK: attributes [[NR]] = { noreturn }
Index: clang/lib/CodeGen/ItaniumCXXABI.cpp
===
--- clang/lib/CodeGen/ItaniumCXXABI.cpp
+++ clang/lib/CodeGen/ItaniumCXXABI.cpp
@@ -1296,15 +1296,16 @@
 
 static llvm::FunctionCallee getItaniumDynamicCastFn(CodeGenFunction &CGF) {
   // void *__dynamic_cast(const void *sub,
-  //  const abi::__class_type_info *src,
-  //  const abi::__class_type_info *dst,
+  //  global_as const abi::__class_type_info *src,
+  //  global_as const abi::__class_type_info *dst,
   //  std::ptrdiff_t src2dst_offset);
 
   llvm::Type *Int8PtrTy = CGF.Int8PtrTy;
+  llvm::Type *GlobInt8PtrTy = CGF.GlobalsInt8PtrTy;
   llvm::Type *PtrDiffTy =
 CGF.ConvertType(CGF.getContext().getPointerDiffType());
 
-  llvm::Type *Args[4] = { Int8PtrTy, Int8PtrTy, Int8PtrTy, PtrDiffTy };
+  llvm::Type *Args[4] = { Int8PtrTy, GlobInt8PtrTy, GlobInt8PtrTy, PtrDiffTy };
 
   llvm::FunctionType *FTy = llvm::FunctionType::get(Int8PtrTy, Args, false);
 


Index: clang/test/CodeGenCXX/dynamic-cast-address-space.cpp
===
--- /dev/null
+++ clang/test/CodeGenCXX/dynamic-cast-address-space.cpp
@@ -0,0 +1,24 @@
+// RUN: %clang_cc1 -I%S %s -triple amdgcn-amd-amdhsa -emit-llvm -fcxx-exceptions -fexceptions -o - | FileCheck %s
+struct A { virtual void f(); };
+struct B : A { };
+
+// CHECK: {{define.*@_Z1fP1A}}
+// CHECK-SAME:  personality ptr @__gxx_personality_v0
+B fail;
+const B& f(A *a) {
+  try {
+// CHECK: call ptr @__dynamic_cast
+// CHECK: br i1
+// CHECK: invoke void @__cxa_bad_cast() [[NR:#[0-9]+]]
+dynamic_cast(*a);
+  } catch (...) {
+// CHECK:  landingpad { ptr, i32 }
+// CHECK-NEXT:   catch ptr null
+  }
+  return fail;
+}
+
+// CHECK: declare ptr @__dynamic_cast(ptr, ptr addrspace(1), ptr addrspace(1), i64) [[NUW_RO:#[0-9]+]]
+
+// CHECK: attributes [[NUW_RO]] = { nounwind memory(read) }
+// CHECK: attributes [[NR]] = { noreturn }
Index: clang/lib/CodeGen/ItaniumCXXABI.cpp
===
--- clang/lib/CodeGen/ItaniumCXXABI.cpp
+++ clang/lib/CodeGen/ItaniumCXXABI.cpp
@@ -1296,15 +1296,16 @@
 
 static llvm::FunctionCallee getItaniumDynamicCastFn(CodeGenFunction &CGF) {
   // void *__dynamic_cast(const void *sub,
-  //  const abi::__class_type_info *src,
-  //  const abi::__class_type_info *dst,
+  //  global_as const abi::__class_type_info *src,
+  //  global_as const abi::__class_type_info *dst,
   //  std::ptrdiff_t src2dst_offset);
 
   llvm::Type *Int8PtrTy = CGF.Int8PtrTy;
+  llvm::Type *GlobInt8PtrTy = CGF.GlobalsInt8PtrTy;
   llvm::Type *PtrDiffTy =
 CGF.ConvertType(CGF.getContext()

[PATCH] D155850: [Clang][CodeGen][RFC] Add codegen support for C++ Parallel Algorithm Offload

2023-07-20 Thread Alex Voicu via Phabricator via cfe-commits
AlexVlx added inline comments.



Comment at: clang/lib/CodeGen/CGBuiltin.cpp:5542
+  if (!getLangOpts().HIPStdPar)
+ErrorUnsupported(E, "builtin function");
 

efriedma wrote:
> This doesn't make sense; we can't just ignore bits of the source code.  I 
> guess this is related to "the decision on their validity is deferred", but I 
> don't see how you expect this to work.
This is one of the weirder parts, so let's consider the following example:

```cpp
void foo() { __builtin_ia32_pause(); }
void bar() { __builtin_trap(); }

void baz(const vector& v) {
return for_each(par_unseq, cbegin(v), cend(v), [](auto&& x) { if (x == 42) 
bar(); });
}
```

In the case above, what we'd offload to the accelerator, and ask the target BE
to lower, is the implementation of `for_each`, and `bar`, since `bar` is
reachable from `for_each`. `foo` is not reachable by any execution path on the
accelerator side; however, it includes a builtin that is unsupported by the
accelerator (unless said accelerator is x86, which is not impossible, but not
something we're dealing with at the moment). If we were to actually error out
early, in the FE, in these cases, there would be almost no appeal to what is
being proposed, because standard headers, as well as other libraries, are
littered with target-specific builtins that are not going to be supported. This
all builds on the core invariant of this model / extension / thingamabob, which
is that the algorithms, and only the algorithms, are targets for offload. It
thus follows that as long as code that is reachable from an algorithm's
implementation is safe, all is fine, but we cannot know this in the FE / on an
AST level, because we need the actual CFG. This part is handled in LLVM in the
`SelectAcceleratorCodePass` that's in the last patch in this series.

Now, you will quite correctly observe that there's nothing preventing a user
from calling `foo` in the callable they pass to an algorithm; they might read
the docs and appreciate that this won't work, but even then they are not safe,
because via some opaque call chain they might still end up touching an
unsupported builtin. My intuition here, which is reflected above in letting
builtins just flow through, is that such cases are better served by a
compile-time error, which is what we will get once the target BE chokes trying
to lower an unsupported builtin. It's not going to be a beautiful error, and we
could probably prettify it somewhat if we were to check after we've done the
accelerator code selection pass, but it will happen at compile time. Another
solution would be to emit these as traps (poison + trap for value-returning
ones), but I am concerned that it would lead to really fascinating debug
journeys.

Having said this, if there's a better way to deal with these scenarios, it 
would be rather nice. Similarly, if the above doesn't make sense, please let me 
know.



Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155850/new/

https://reviews.llvm.org/D155850



[PATCH] D155850: [Clang][CodeGen][RFC] Add codegen support for C++ Parallel Algorithm Offload

2023-07-21 Thread Alex Voicu via Phabricator via cfe-commits
AlexVlx added inline comments.



Comment at: clang/lib/CodeGen/CGBuiltin.cpp:5542
+  if (!getLangOpts().HIPStdPar)
+ErrorUnsupported(E, "builtin function");
 

efriedma wrote:
> AlexVlx wrote:
> > efriedma wrote:
> > > This doesn't make sense; we can't just ignore bits of the source code.  I 
> > > guess this is related to "the decision on their validity is deferred", 
> > > but I don't see how you expect this to work.
> > This is one of the weirder parts, so let's consider the following example:
> > 
> > ```cpp
> > void foo() { __builtin_ia32_pause(); }
> > void bar() { __builtin_trap(); }
> > 
> > void baz(const vector& v) {
> > return for_each(par_unseq, cbegin(v), cend(v), [](auto&& x) { if (x == 
> > 42) bar(); });
> > }
> > ```
> > 
> > In the case above, what we'd offload to the accelerator, and ask the target 
> > BE to lower, is the implementation of `for_each`, and `bar`, because it is 
> > reachable from the latter. `foo` is not reachable by any execution path on 
> > the accelerator side, however it includes a builtin that is unsupported by 
> > the accelerator (unless said accelerator is x86, which is not impossible, 
> > but not something we're dealing with at the moment). If we were to actually 
> > error out early, in the FE, in these cases, there's almost no appeal to 
> > what is being proposed, because standard headers, as well as other 
> > libraries, are littered with various target specific builtins that are not 
> > going to be supported. This all builds on the core invariant of this model 
> > / extension / thingamabob, which is that the algorithms, and only the 
> > algorithms, are targets for offload. It thus follows that as long as code 
> > that is reachable from an algorithm's implementation is safe, all is fine, 
> > but we cannot know this in the FE / on an AST level, because we need the 
> > actual CFG. This part is handled in LLVM in the `SelectAcceleratorCodePass` 
> > that's in the last patch in this series.
> > 
> > Now, you will quite correctly observe that there's nothing preventing an 
> > user from calling `foo` in the callable they pass to an algorithm; they 
> > might read the docs / appreciate that this won't work, but even there they 
> > are not safe, because there via some opaque call chain they might end up 
> > touching some unsupported builtin. My intuition here, which is reflected 
> > above in letting builtins just flow through, is that such cases are better 
> > served with a compile time error, which is what will obtain once the target 
> > BE chokes trying to lower an unsupported builtin. It's not going to be a 
> > beautiful error, and we could probably prettify it somewhat if we were to 
> > check after we've done the accelerator code selection pass, but it will 
> > happen at compile time. Another solution would be to emit these as traps 
> > (poison + trap for value returning ones), but I am concerned that it would 
> > lead to really fascinating debug journeys.
> > 
> > Having said this, if there's a better way to deal with these scenarios, it 
> > would be rather nice. Similarly, if the above doesn't make sense, please 
> > let me know.
> > 
> Oh, I see; you "optimistically" compile everything assuming it might run on 
> the accelerator, then run LLVM IR optimizations, then determine late which 
> bits of code will actually run on the accelerator, which then prunes the code 
> which shouldn't run.
> 
> I'm not sure I really like this... would it be possible to infer which 
> functions need to be run on the accelerator based on the AST?  I mean, if 
> your API takes a lambda expression that runs on the accelerator, you can mark 
> the lambda's body as "must be emitted for GPU", then recursively mark all the 
> functions referred to by the lambda.
> 
> Emiting errors lazily from the backend means you get different diagnostics 
> depending on the optimization level.
> 
> If you do go with this codegen-based approach, it's not clear to me how you 
> detect that a forbidden builtin was called; if you skip the error handling, 
> you just get a literal "undef".
`I'm not sure I really like this...` - actually, I am not a big fan either,
but I think it's about the best one can do, given the constraints (consume
standard C++, no annotations on the user side etc.). Having tried a few times
in the past (and at least once in a different compiler), I don't quite think
this can be done on an AST level. It would add some fairly awkward checking
during template instantiation (there is no way to know earlier that a
`CallableFoo` was passed to an offloadable algorithm), and it's a bit unwieldy
to essentially compute the CFG on the AST and mark reachable callees at that
point. Beyond that, the main reason we cannot do this is that the interface is
not constrained to only take lambdas, but callables in general, and that
includes pointers to functions as well. We don't deal with those today, but
plan to, and
ther

[PATCH] D155850: [Clang][CodeGen][RFC] Add codegen support for C++ Parallel Algorithm Offload

2023-07-21 Thread Alex Voicu via Phabricator via cfe-commits
AlexVlx added a comment.

@yaxunl interesting point - are you worried about cases where, due to missing
inlining / const prop, an indirect call site that could be replaced with a
direct one would remain indirect? I think the problem in that case would
actually be different, in that possibly reachable functions would not be
identified as such and would be erroneously removed. I'm not sure there's any
case where we'd fail to remove a function that is meant to be unreachable. We
can definitely go with the `__clang_unsupported` approach, but I think I'd
prefer these to be compile-time errors rather than remarks plus a runtime
`printf`, not least because `printf` adds some overhead. A way to ensure we
don't "miss a spot" might be to check for any remaining unsupported builtins
after removal, instead of doing it during reachability computation (this is
coupled with the special naming from the prior post).
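
As a rough sketch of that post-removal check (assuming a recognisable suffix on
the stubs, e.g. `__stdpar_unsupported`; the names are illustrative and this is
not the actual pass):

```cpp
#include "llvm/IR/DiagnosticInfo.h"
#include "llvm/IR/InstrTypes.h"
#include "llvm/IR/Module.h"

// After unreachable functions have been removed, any call that still refers to
// an "unsupported builtin" stub is genuinely reachable from offloaded code, so
// report it as a compile-time error.
static void diagnoseLeftoverUnsupportedBuiltins(llvm::Module &M) {
  for (llvm::Function &F : M.functions()) {
    if (!F.getName().contains("__stdpar_unsupported"))
      continue;
    for (llvm::User *U : F.users())
      if (auto *CB = llvm::dyn_cast<llvm::CallBase>(U))
        M.getContext().diagnose(llvm::DiagnosticInfoUnsupported(
            *CB->getFunction(), "builtin not supported on the accelerator"));
  }
}
```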


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155850/new/

https://reviews.llvm.org/D155850



[PATCH] D155870: [Clang][CodeGen] Another follow-up for `vtable`, `typeinfo` et al. are globals

2023-07-26 Thread Alex Voicu via Phabricator via cfe-commits
AlexVlx updated this revision to Diff 544324.
AlexVlx added a comment.

Rebase.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155870/new/

https://reviews.llvm.org/D155870

Files:
  clang/lib/CodeGen/ItaniumCXXABI.cpp
  clang/test/CodeGenCXX/dynamic-cast-address-space.cpp


Index: clang/test/CodeGenCXX/dynamic-cast-address-space.cpp
===
--- /dev/null
+++ clang/test/CodeGenCXX/dynamic-cast-address-space.cpp
@@ -0,0 +1,24 @@
+// RUN: %clang_cc1 -I%S %s -triple amdgcn-amd-amdhsa -emit-llvm 
-fcxx-exceptions -fexceptions -o - | FileCheck %s
+struct A { virtual void f(); };
+struct B : A { };
+
+// CHECK: {{define.*@_Z1fP1A}}
+// CHECK-SAME:  personality ptr @__gxx_personality_v0
+B fail;
+const B& f(A *a) {
+  try {
+// CHECK: call ptr @__dynamic_cast
+// CHECK: br i1
+// CHECK: invoke void @__cxa_bad_cast() [[NR:#[0-9]+]]
+dynamic_cast(*a);
+  } catch (...) {
+// CHECK:  landingpad { ptr, i32 }
+// CHECK-NEXT:   catch ptr null
+  }
+  return fail;
+}
+
+// CHECK: declare ptr @__dynamic_cast(ptr, ptr addrspace(1), ptr addrspace(1), 
i64) [[NUW_RO:#[0-9]+]]
+
+// CHECK: attributes [[NUW_RO]] = { nounwind memory(read) }
+// CHECK: attributes [[NR]] = { noreturn }
Index: clang/lib/CodeGen/ItaniumCXXABI.cpp
===
--- clang/lib/CodeGen/ItaniumCXXABI.cpp
+++ clang/lib/CodeGen/ItaniumCXXABI.cpp
@@ -1344,15 +1344,16 @@
 
 static llvm::FunctionCallee getItaniumDynamicCastFn(CodeGenFunction &CGF) {
   // void *__dynamic_cast(const void *sub,
-  //  const abi::__class_type_info *src,
-  //  const abi::__class_type_info *dst,
+  //  global_as const abi::__class_type_info *src,
+  //  global_as const abi::__class_type_info *dst,
   //  std::ptrdiff_t src2dst_offset);
 
   llvm::Type *Int8PtrTy = CGF.Int8PtrTy;
+  llvm::Type *GlobInt8PtrTy = CGF.GlobalsInt8PtrTy;
   llvm::Type *PtrDiffTy =
 CGF.ConvertType(CGF.getContext().getPointerDiffType());
 
-  llvm::Type *Args[4] = { Int8PtrTy, Int8PtrTy, Int8PtrTy, PtrDiffTy };
+  llvm::Type *Args[4] = { Int8PtrTy, GlobInt8PtrTy, GlobInt8PtrTy, PtrDiffTy };
 
   llvm::FunctionType *FTy = llvm::FunctionType::get(Int8PtrTy, Args, false);
 


Index: clang/test/CodeGenCXX/dynamic-cast-address-space.cpp
===
--- /dev/null
+++ clang/test/CodeGenCXX/dynamic-cast-address-space.cpp
@@ -0,0 +1,24 @@
+// RUN: %clang_cc1 -I%S %s -triple amdgcn-amd-amdhsa -emit-llvm -fcxx-exceptions -fexceptions -o - | FileCheck %s
+struct A { virtual void f(); };
+struct B : A { };
+
+// CHECK: {{define.*@_Z1fP1A}}
+// CHECK-SAME:  personality ptr @__gxx_personality_v0
+B fail;
+const B& f(A *a) {
+  try {
+// CHECK: call ptr @__dynamic_cast
+// CHECK: br i1
+// CHECK: invoke void @__cxa_bad_cast() [[NR:#[0-9]+]]
+dynamic_cast(*a);
+  } catch (...) {
+// CHECK:  landingpad { ptr, i32 }
+// CHECK-NEXT:   catch ptr null
+  }
+  return fail;
+}
+
+// CHECK: declare ptr @__dynamic_cast(ptr, ptr addrspace(1), ptr addrspace(1), i64) [[NUW_RO:#[0-9]+]]
+
+// CHECK: attributes [[NUW_RO]] = { nounwind memory(read) }
+// CHECK: attributes [[NR]] = { noreturn }
Index: clang/lib/CodeGen/ItaniumCXXABI.cpp
===
--- clang/lib/CodeGen/ItaniumCXXABI.cpp
+++ clang/lib/CodeGen/ItaniumCXXABI.cpp
@@ -1344,15 +1344,16 @@
 
 static llvm::FunctionCallee getItaniumDynamicCastFn(CodeGenFunction &CGF) {
   // void *__dynamic_cast(const void *sub,
-  //  const abi::__class_type_info *src,
-  //  const abi::__class_type_info *dst,
+  //  global_as const abi::__class_type_info *src,
+  //  global_as const abi::__class_type_info *dst,
   //  std::ptrdiff_t src2dst_offset);
 
   llvm::Type *Int8PtrTy = CGF.Int8PtrTy;
+  llvm::Type *GlobInt8PtrTy = CGF.GlobalsInt8PtrTy;
   llvm::Type *PtrDiffTy =
 CGF.ConvertType(CGF.getContext().getPointerDiffType());
 
-  llvm::Type *Args[4] = { Int8PtrTy, Int8PtrTy, Int8PtrTy, PtrDiffTy };
+  llvm::Type *Args[4] = { Int8PtrTy, GlobInt8PtrTy, GlobInt8PtrTy, PtrDiffTy };
 
   llvm::FunctionType *FTy = llvm::FunctionType::get(Int8PtrTy, Args, false);
 


[PATCH] D155870: [Clang][CodeGen] Another follow-up for `vtable`, `typeinfo` et al. are globals

2023-07-26 Thread Alex Voicu via Phabricator via cfe-commits
AlexVlx added a comment.

Gentle ping.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155870/new/

https://reviews.llvm.org/D155870



[PATCH] D155870: [Clang][CodeGen] Another follow-up for `vtable`, `typeinfo` et al. are globals

2023-07-26 Thread Alex Voicu via Phabricator via cfe-commits
AlexVlx added a comment.

In D155870#4535425 , @yaxunl wrote:

> `__dynamic_cast` is part of standard C++ library. If we ever implement it for 
> GPU, chances are we will use libc++abi with the same signature as other 
> targets, i.e., the 2nd and 3rd arguments are generic pointers.
>
> I feel it is safer to do an address space cast when calling the function 
> instead of changing its signature.

That's a fair point, and one I admit to not having considered (I'm not sure
this would harm linking in practice though, since `__dynamic_cast` is extern
"C" and thus unmangled; it might lead to broken IR depending on how we codegen
the body). The concern I have with doing the cast at the call site is that
there might be many call sites, and it's going to make things a bit noisier
than they need to be (IMHO). Also, this is an internal implementation detail of
the standard library, not an interface that users are expected to touch; I
wonder if, given the target-specific nature of these bits, it wouldn't be
natural for our future implementation to carry the AS in the signature.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155870/new/

https://reviews.llvm.org/D155870



[PATCH] D155870: [Clang][CodeGen] Another follow-up for `vtable`, `typeinfo` et al. are globals

2023-07-27 Thread Alex Voicu via Phabricator via cfe-commits
AlexVlx added a comment.

Ping @rjmccall or @efriedma for a non AMD perspective, if possible.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155870/new/

https://reviews.llvm.org/D155870



[PATCH] D155850: [Clang][CodeGen][RFC] Add codegen support for C++ Parallel Algorithm Offload

2023-07-27 Thread Alex Voicu via Phabricator via cfe-commits
AlexVlx updated this revision to Diff 544954.
AlexVlx removed a reviewer: eli.friedman.
AlexVlx added a comment.

This adds more ecumenical handling of unsupported builtins, as per the review
discussion (a stub with an equivalent signature and a recognisable suffix is
emitted instead); it's paired with an associated change in the accelerator code
selection pass, where the actual check for these stubs occurs. I've also
adjusted where that pass gets added to the `opt` pipeline for the AMDGCN
target; there it's better, for the moment, to run it later, because we
essentially do LTCG and can therefore unambiguously determine reachability by
operating on the full module.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155850/new/

https://reviews.llvm.org/D155850

Files:
  clang/lib/CodeGen/BackendUtil.cpp
  clang/lib/CodeGen/CGBuiltin.cpp
  clang/lib/CodeGen/CodeGenModule.cpp
  clang/test/CodeGenStdPar/unannotated-functions-get-emitted.cpp
  clang/test/CodeGenStdPar/unsupported-builtins.cpp

Index: clang/test/CodeGenStdPar/unsupported-builtins.cpp
===
--- /dev/null
+++ clang/test/CodeGenStdPar/unsupported-builtins.cpp
@@ -0,0 +1,8 @@
+// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -aux-triple x86_64-unknown-linux-gnu \
+// RUN:   --stdpar -x hip -emit-llvm -fcuda-is-device -o - %s | FileCheck %s
+
+#define __global__ __attribute__((global))
+
+__global__ void foo() { return __builtin_ia32_pause(); }
+
+// CHECK: declare void @__builtin_ia32_pause__stdpar_unsupported()
Index: clang/test/CodeGenStdPar/unannotated-functions-get-emitted.cpp
===
--- /dev/null
+++ clang/test/CodeGenStdPar/unannotated-functions-get-emitted.cpp
@@ -0,0 +1,19 @@
+// RUN: %clang_cc1 -x hip -emit-llvm -fcuda-is-device \
+// RUN:   -o - %s | FileCheck --check-prefix=NO-STDPAR-DEV %s
+
+// RUN: %clang_cc1 --stdpar -emit-llvm -fcuda-is-device \
+// RUN:   -o - %s | FileCheck --check-prefix=STDPAR-DEV %s
+
+#define __device__ __attribute__((device))
+
+// NO-STDPAR-DEV-NOT: define {{.*}} void @_Z3fooPff({{.*}})
+// STDPAR-DEV: define {{.*}} void @_Z3fooPff({{.*}})
+void foo(float *a, float b) {
+  *a = b;
+}
+
+// NO-STDPAR-DEV: define {{.*}} void @_Z3barPff({{.*}})
+// STDPAR-DEV: define {{.*}} void @_Z3barPff({{.*}})
+__device__ void bar(float *a, float b) {
+  *a = b;
+}
Index: clang/lib/CodeGen/CodeGenModule.cpp
===
--- clang/lib/CodeGen/CodeGenModule.cpp
+++ clang/lib/CodeGen/CodeGenModule.cpp
@@ -3545,7 +3545,10 @@
   !Global->hasAttr() &&
   !Global->hasAttr() &&
   !Global->getType()->isCUDADeviceBuiltinSurfaceType() &&
-  !Global->getType()->isCUDADeviceBuiltinTextureType())
+  !Global->getType()->isCUDADeviceBuiltinTextureType() &&
+  !(LangOpts.HIPStdPar &&
+isa(Global) &&
+!Global->hasAttr()))
 return;
 } else {
   // We need to emit host-side 'shadows' for all global
@@ -5307,7 +5310,9 @@
 
   setNonAliasAttributes(D, GV);
 
-  if (D->getTLSKind() && !GV->isThreadLocal()) {
+  if (D->getTLSKind() &&
+  !GV->isThreadLocal() &&
+  !(getLangOpts().HIPStdPar && getLangOpts().CUDAIsDevice)) {
 if (D->getTLSKind() == VarDecl::TLS_Dynamic)
   CXXThreadLocals.push_back(D);
 setTLSMode(GV, *D);
Index: clang/lib/CodeGen/CGBuiltin.cpp
===
--- clang/lib/CodeGen/CGBuiltin.cpp
+++ clang/lib/CodeGen/CGBuiltin.cpp
@@ -2251,6 +2251,19 @@
   return nullptr;
 }
 
+static RValue EmitStdParUnsupportedBuiltin(CodeGenFunction *CGF,
+   const FunctionDecl *FD) {
+  auto Name = FD->getNameAsString() + "__stdpar_unsupported";
+  auto FnTy = CGF->CGM.getTypes().GetFunctionType(FD);
+  auto UBF = CGF->CGM.getModule().getOrInsertFunction(Name, FnTy);
+
+  SmallVector Args;
+  for (auto &&FormalTy : FnTy->params())
+Args.push_back(llvm::PoisonValue::get(FormalTy));
+
+  return RValue::get(CGF->Builder.CreateCall(UBF, Args));
+}
+
 RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
 const CallExpr *E,
 ReturnValueSlot ReturnValue) {
@@ -5541,7 +5554,10 @@
 llvm_unreachable("Bad evaluation kind in EmitBuiltinExpr");
   }
 
-  ErrorUnsupported(E, "builtin function");
+  if (getLangOpts().HIPStdPar && getLangOpts().CUDAIsDevice)
+return EmitStdParUnsupportedBuiltin(this, FD);
+  else
+ErrorUnsupported(E, "builtin function");
 
   // Unknown builtin, for now just dump it out and return undef.
   return GetUndefRValue(E->getType());
Index: clang/lib/CodeGen/BackendUtil.cpp
===
--- clang/lib/CodeGen/BackendUtil.cpp
+++ clang/lib/CodeGen/BackendUtil.cpp
@@ -77,6 +77,7 @@
 #include "llvm/Transfor

[PATCH] D155775: [Clang][Driver][RFC] Add driver support for C++ Parallel Algorithm Offload

2023-07-27 Thread Alex Voicu via Phabricator via cfe-commits
AlexVlx updated this revision to Diff 544974.
AlexVlx added a comment.

Exploit the fact that ROCm/AMDGPU does LTCG at the moment, and will for the
foreseeable future, by moving the accelerator code selection pass later in the
pipeline.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155775/new/

https://reviews.llvm.org/D155775

Files:
  clang/include/clang/Basic/DiagnosticDriverKinds.td
  clang/include/clang/Basic/LangOptions.def
  clang/include/clang/Driver/Options.td
  clang/lib/Driver/Driver.cpp
  clang/lib/Driver/ToolChains/AMDGPU.cpp
  clang/lib/Driver/ToolChains/Clang.cpp
  clang/lib/Driver/ToolChains/HIPAMD.cpp
  clang/lib/Driver/ToolChains/ROCm.h
  clang/test/Driver/Inputs/stdpar/stdpar_lib.hpp
  clang/test/Driver/stdpar.c

Index: clang/test/Driver/stdpar.c
===
--- /dev/null
+++ clang/test/Driver/stdpar.c
@@ -0,0 +1,18 @@
+// RUN: %clang -### -stdpar --compile %s 2>&1 | \
+// RUN:   FileCheck --check-prefix=STDPAR-MISSING-LIB %s
+// STDPAR-MISSING-LIB: error: cannot find HIP Standard Parallelism Acceleration library; provide it via '--stdpar-path'
+
+// RUN: %clang -### --stdpar --stdpar-path=%S/Inputs/stdpar \
+// RUN:   --stdpar-thrust-path=%S/Inputs/stdpar/thrust \
+// RUN:   --stdpar-prim-path=%S/Inputs/stdpar/prim --compile %s 2>&1 | \
+// RUN:   FileCheck --check-prefix=STDPAR-COMPILE %s
+// STDPAR-COMPILE: "-x" "hip"
+// STDPAR-COMPILE: "-idirafter" "{{.*/thrust}}"
+// STDPAR-COMPILE: "-idirafter" "{{.*/prim}}"
+// STDPAR-COMPILE: "-idirafter" "{{.*/Inputs/stdpar}}"
+// STDPAR-COMPILE: "-include" "stdpar_lib.hpp"
+
+// RUN: touch %t.o
+// RUN: %clang -### -stdpar %t.o 2>&1 | FileCheck --check-prefix=STDPAR-LINK %s
+// STDPAR-LINK: "-rpath"
+// STDPAR-LINK: "-l{{.*hip.*}}"
Index: clang/lib/Driver/ToolChains/ROCm.h
===
--- clang/lib/Driver/ToolChains/ROCm.h
+++ clang/lib/Driver/ToolChains/ROCm.h
@@ -77,6 +77,9 @@
   const Driver &D;
   bool HasHIPRuntime = false;
   bool HasDeviceLibrary = false;
+  bool HasHIPStdParLibrary = false;
+  bool HasRocThrustLibrary = false;
+  bool HasRocPrimLibrary = false;
 
   // Default version if not detected or specified.
   const unsigned DefaultVersionMajor = 3;
@@ -96,6 +99,13 @@
   std::vector RocmDeviceLibPathArg;
   // HIP runtime path specified by --hip-path.
   StringRef HIPPathArg;
+  // HIP Standard Parallel Algorithm acceleration library specified by
+  // --stdpar-path
+  StringRef HIPStdParPathArg;
+  // rocThrust algorithm library specified by --stdpar-thrust-path
+  StringRef HIPRocThrustPathArg;
+  // rocPrim algorithm library specified by --stdpar-prim-path
+  StringRef HIPRocPrimPathArg;
   // HIP version specified by --hip-version.
   StringRef HIPVersionArg;
   // Whether -nogpulib is specified.
@@ -180,6 +190,9 @@
   /// Check whether we detected a valid ROCm device library.
   bool hasDeviceLibrary() const { return HasDeviceLibrary; }
 
+  /// Check whether we detected a valid HIP STDPAR Acceleration library.
+  bool hasHIPStdParLibrary() const { return HasHIPStdParLibrary; }
+
   /// Print information about the detected ROCm installation.
   void print(raw_ostream &OS) const;
 
Index: clang/lib/Driver/ToolChains/HIPAMD.cpp
===
--- clang/lib/Driver/ToolChains/HIPAMD.cpp
+++ clang/lib/Driver/ToolChains/HIPAMD.cpp
@@ -115,6 +115,8 @@
 "--no-undefined",
 "-shared",
 "-plugin-opt=-amdgpu-internalize-symbols"};
+  if (Args.hasArg(options::OPT_stdpar))
+LldArgs.push_back("-plugin-opt=-amdgpu-enable-stdpar");
 
   auto &TC = getToolChain();
   auto &D = TC.getDriver();
@@ -246,6 +248,8 @@
   if (!DriverArgs.hasFlag(options::OPT_fgpu_rdc, options::OPT_fno_gpu_rdc,
   false))
 CC1Args.append({"-mllvm", "-amdgpu-internalize-symbols"});
+  if (DriverArgs.hasArgNoClaim(options::OPT_stdpar))
+CC1Args.append({"-mllvm", "-amdgpu-enable-stdpar"});
 
   StringRef MaxThreadsPerBlock =
   DriverArgs.getLastArgValue(options::OPT_gpu_max_threads_per_block_EQ);
Index: clang/lib/Driver/ToolChains/Clang.cpp
===
--- clang/lib/Driver/ToolChains/Clang.cpp
+++ clang/lib/Driver/ToolChains/Clang.cpp
@@ -6543,6 +6543,12 @@
 if (Args.hasFlag(options::OPT_fgpu_allow_device_init,
  options::OPT_fno_gpu_allow_device_init, false))
   CmdArgs.push_back("-fgpu-allow-device-init");
+if (Args.hasArg(options::OPT_stdpar)) {
+  CmdArgs.push_back("-stdpar");
+
+  if (Args.hasArg(options::OPT_stdpar_interpose_alloc))
+CmdArgs.push_back("-stdpar-interpose-alloc");
+}
 Args.addOptInFlag(CmdArgs, options::OPT_fhip_kernel_arg_name,
   options::OPT_fno_hip_kernel_arg_name);
   }
Index: clang/lib/Driver/ToolChains/AMDGPU.cpp

[PATCH] D155850: [Clang][CodeGen][RFC] Add codegen support for C++ Parallel Algorithm Offload

2023-08-10 Thread Alex Voicu via Phabricator via cfe-commits
AlexVlx updated this revision to Diff 549097.
AlexVlx added a comment.

Add support for handling certain cases of unambiguously accelerator-unsupported
ASM, i.e. cases where constraints are clearly mismatched. When that happens, we
instead emit an `ASM__stdpar_unsupported` stub which takes as its single 
argument the `constexpr` string value of the ASM block. Later, in the 
AcceleratorCodeSelection pass, if such a stub is reachable from an accelerator 
callable, we error out and print the offending ASM alongside the location.
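
For reference, a minimal sketch of the intended behaviour, mirroring the test
added in this revision (the `ASM__stdpar_unsupported` name is just what this
patch currently emits):

  __global__ void foo(int i) {
    // The host-specific constraints ("=q", "+g") clearly cannot be satisfied
    // by the accelerator, so instead of rejecting this up front, codegen
    // emits a call to an ASM__stdpar_unsupported stub carrying the ASM
    // string; AcceleratorCodeSelection only errors out if that stub is
    // actually reachable from an accelerator callable.
    asm ("addl %2, %1; seto %b0" : "=q" (i), "+g" (i) : "r" (i));
  }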


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155850/new/

https://reviews.llvm.org/D155850

Files:
  clang/lib/CodeGen/BackendUtil.cpp
  clang/lib/CodeGen/CGBuiltin.cpp
  clang/lib/CodeGen/CGStmt.cpp
  clang/lib/CodeGen/CodeGenFunction.cpp
  clang/lib/CodeGen/CodeGenModule.cpp
  clang/test/CodeGenStdPar/unannotated-functions-get-emitted.cpp
  clang/test/CodeGenStdPar/unsupported-ASM.cpp
  clang/test/CodeGenStdPar/unsupported-builtins.cpp

Index: clang/test/CodeGenStdPar/unsupported-builtins.cpp
===
--- /dev/null
+++ clang/test/CodeGenStdPar/unsupported-builtins.cpp
@@ -0,0 +1,8 @@
+// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -aux-triple x86_64-unknown-linux-gnu \
+// RUN:   --stdpar -x hip -emit-llvm -fcuda-is-device -o - %s | FileCheck %s
+
+#define __global__ __attribute__((global))
+
+__global__ void foo() { return __builtin_ia32_pause(); }
+
+// CHECK: declare void @__builtin_ia32_pause__stdpar_unsupported()
Index: clang/test/CodeGenStdPar/unsupported-ASM.cpp
===
--- /dev/null
+++ clang/test/CodeGenStdPar/unsupported-ASM.cpp
@@ -0,0 +1,10 @@
+// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -aux-triple x86_64-unknown-linux-gnu \
+// RUN:   --stdpar -x hip -emit-llvm -fcuda-is-device -o - %s | FileCheck %s
+
+#define __global__ __attribute__((global))
+
+__global__ void foo(int i) {
+asm ("addl %2, %1; seto %b0" : "=q" (i), "+g" (i) : "r" (i));
+}
+
+// CHECK: declare void @ASM__stdpar_unsupported([{{.*}}])
Index: clang/test/CodeGenStdPar/unannotated-functions-get-emitted.cpp
===
--- /dev/null
+++ clang/test/CodeGenStdPar/unannotated-functions-get-emitted.cpp
@@ -0,0 +1,19 @@
+// RUN: %clang_cc1 -x hip -emit-llvm -fcuda-is-device \
+// RUN:   -o - %s | FileCheck --check-prefix=NO-STDPAR-DEV %s
+
+// RUN: %clang_cc1 --stdpar -emit-llvm -fcuda-is-device \
+// RUN:   -o - %s | FileCheck --check-prefix=STDPAR-DEV %s
+
+#define __device__ __attribute__((device))
+
+// NO-STDPAR-DEV-NOT: define {{.*}} void @_Z3fooPff({{.*}})
+// STDPAR-DEV: define {{.*}} void @_Z3fooPff({{.*}})
+void foo(float *a, float b) {
+  *a = b;
+}
+
+// NO-STDPAR-DEV: define {{.*}} void @_Z3barPff({{.*}})
+// STDPAR-DEV: define {{.*}} void @_Z3barPff({{.*}})
+__device__ void bar(float *a, float b) {
+  *a = b;
+}
Index: clang/lib/CodeGen/CodeGenModule.cpp
===
--- clang/lib/CodeGen/CodeGenModule.cpp
+++ clang/lib/CodeGen/CodeGenModule.cpp
@@ -3558,7 +3558,10 @@
   !Global->hasAttr() &&
   !Global->hasAttr() &&
   !Global->getType()->isCUDADeviceBuiltinSurfaceType() &&
-  !Global->getType()->isCUDADeviceBuiltinTextureType())
+  !Global->getType()->isCUDADeviceBuiltinTextureType() &&
+  !(LangOpts.HIPStdPar &&
+isa(Global) &&
+!Global->hasAttr()))
 return;
 } else {
   // We need to emit host-side 'shadows' for all global
Index: clang/lib/CodeGen/CodeGenFunction.cpp
===
--- clang/lib/CodeGen/CodeGenFunction.cpp
+++ clang/lib/CodeGen/CodeGenFunction.cpp
@@ -2594,10 +2594,15 @@
   std::string MissingFeature;
   llvm::StringMap CallerFeatureMap;
   CGM.getContext().getFunctionFeatureMap(CallerFeatureMap, FD);
+  // When compiling in StdPar mode we have to be conservative in rejecting
+  // target specific features in the FE, and defer the possible error to the
+  // AcceleratorCodeSelection pass, wherein iff an unsupported target builtin is
+  // referenced by an accelerator executable function, we emit an error.
+  bool IsStdPar = getLangOpts().HIPStdPar && getLangOpts().CUDAIsDevice;
   if (BuiltinID) {
 StringRef FeatureList(CGM.getContext().BuiltinInfo.getRequiredFeatures(BuiltinID));
 if (!Builtin::evaluateRequiredTargetFeatures(
-FeatureList, CallerFeatureMap)) {
+FeatureList, CallerFeatureMap) && !IsStdPar) {
   CGM.getDiags().Report(Loc, diag::err_builtin_needs_feature)
   << TargetDecl->getDeclName()
   << FeatureList;
@@ -2630,7 +2635,7 @@
 return false;
   }
   return true;
-}))
+}) && !IsStdPar)
   CGM.getDiags().Report(Loc, diag::err_function_needs_feature)
   << FD->getDeclName() << TargetDecl->getDeclName() << Mis

[PATCH] D155850: [Clang][CodeGen][RFC] Add codegen support for C++ Parallel Algorithm Offload

2023-08-10 Thread Alex Voicu via Phabricator via cfe-commits
AlexVlx updated this revision to Diff 549101.
AlexVlx added a comment.

Fix typo.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155850/new/

https://reviews.llvm.org/D155850

Files:
  clang/lib/CodeGen/BackendUtil.cpp
  clang/lib/CodeGen/CGBuiltin.cpp
  clang/lib/CodeGen/CGStmt.cpp
  clang/lib/CodeGen/CodeGenFunction.cpp
  clang/lib/CodeGen/CodeGenModule.cpp
  clang/test/CodeGenStdPar/unannotated-functions-get-emitted.cpp
  clang/test/CodeGenStdPar/unsupported-ASM.cpp
  clang/test/CodeGenStdPar/unsupported-builtins.cpp

Index: clang/test/CodeGenStdPar/unsupported-builtins.cpp
===
--- /dev/null
+++ clang/test/CodeGenStdPar/unsupported-builtins.cpp
@@ -0,0 +1,8 @@
+// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -aux-triple x86_64-unknown-linux-gnu \
+// RUN:   --stdpar -x hip -emit-llvm -fcuda-is-device -o - %s | FileCheck %s
+
+#define __global__ __attribute__((global))
+
+__global__ void foo() { return __builtin_ia32_pause(); }
+
+// CHECK: declare void @__builtin_ia32_pause__stdpar_unsupported()
Index: clang/test/CodeGenStdPar/unsupported-ASM.cpp
===
--- /dev/null
+++ clang/test/CodeGenStdPar/unsupported-ASM.cpp
@@ -0,0 +1,10 @@
+// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -aux-triple x86_64-unknown-linux-gnu \
+// RUN:   --stdpar -x hip -emit-llvm -fcuda-is-device -o - %s | FileCheck %s
+
+#define __global__ __attribute__((global))
+
+__global__ void foo(int i) {
+asm ("addl %2, %1; seto %b0" : "=q" (i), "+g" (i) : "r" (i));
+}
+
+// CHECK: declare void @ASM__stdpar_unsupported([{{.*}}])
Index: clang/test/CodeGenStdPar/unannotated-functions-get-emitted.cpp
===
--- /dev/null
+++ clang/test/CodeGenStdPar/unannotated-functions-get-emitted.cpp
@@ -0,0 +1,19 @@
+// RUN: %clang_cc1 -x hip -emit-llvm -fcuda-is-device \
+// RUN:   -o - %s | FileCheck --check-prefix=NO-STDPAR-DEV %s
+
+// RUN: %clang_cc1 --stdpar -emit-llvm -fcuda-is-device \
+// RUN:   -o - %s | FileCheck --check-prefix=STDPAR-DEV %s
+
+#define __device__ __attribute__((device))
+
+// NO-STDPAR-DEV-NOT: define {{.*}} void @_Z3fooPff({{.*}})
+// STDPAR-DEV: define {{.*}} void @_Z3fooPff({{.*}})
+void foo(float *a, float b) {
+  *a = b;
+}
+
+// NO-STDPAR-DEV: define {{.*}} void @_Z3barPff({{.*}})
+// STDPAR-DEV: define {{.*}} void @_Z3barPff({{.*}})
+__device__ void bar(float *a, float b) {
+  *a = b;
+}
Index: clang/lib/CodeGen/CodeGenModule.cpp
===
--- clang/lib/CodeGen/CodeGenModule.cpp
+++ clang/lib/CodeGen/CodeGenModule.cpp
@@ -3558,7 +3558,10 @@
   !Global->hasAttr() &&
   !Global->hasAttr() &&
   !Global->getType()->isCUDADeviceBuiltinSurfaceType() &&
-  !Global->getType()->isCUDADeviceBuiltinTextureType())
+  !Global->getType()->isCUDADeviceBuiltinTextureType() &&
+  !(LangOpts.HIPStdPar &&
+isa(Global) &&
+!Global->hasAttr()))
 return;
 } else {
   // We need to emit host-side 'shadows' for all global
Index: clang/lib/CodeGen/CodeGenFunction.cpp
===
--- clang/lib/CodeGen/CodeGenFunction.cpp
+++ clang/lib/CodeGen/CodeGenFunction.cpp
@@ -2594,10 +2594,15 @@
   std::string MissingFeature;
   llvm::StringMap CallerFeatureMap;
   CGM.getContext().getFunctionFeatureMap(CallerFeatureMap, FD);
+  // When compiling in StdPar mode we have to be conservative in rejecting
+  // target specific features in the FE, and defer the possible error to the
+  // AcceleratorCodeSelection pass, wherein iff an unsupported target builtin is
+  // referenced by an accelerator executable function, we emit an error.
+  bool IsStdPar = getLangOpts().HIPStdPar && getLangOpts().CUDAIsDevice;
   if (BuiltinID) {
 StringRef FeatureList(CGM.getContext().BuiltinInfo.getRequiredFeatures(BuiltinID));
 if (!Builtin::evaluateRequiredTargetFeatures(
-FeatureList, CallerFeatureMap)) {
+FeatureList, CallerFeatureMap) && !IsStdPar) {
   CGM.getDiags().Report(Loc, diag::err_builtin_needs_feature)
   << TargetDecl->getDeclName()
   << FeatureList;
@@ -2630,7 +2635,7 @@
 return false;
   }
   return true;
-}))
+}) && !IsStdPar)
   CGM.getDiags().Report(Loc, diag::err_function_needs_feature)
   << FD->getDeclName() << TargetDecl->getDeclName() << MissingFeature;
   } else if (!FD->isMultiVersion() && FD->hasAttr()) {
@@ -2639,7 +2644,8 @@
 
 for (const auto &F : CalleeFeatureMap) {
   if (F.getValue() && (!CallerFeatureMap.lookup(F.getKey()) ||
-   !CallerFeatureMap.find(F.getKey())->getValue()))
+   !CallerFeatureMap.find(F.getKey())->getValue()) &&
+  !IsStdPar)
 CGM.getDiags().Report(Loc, diag::err_function_needs_featu

[PATCH] D155833: [Clang][Sema][RFC] Add Sema support for C++ Parallel Algorithm Offload

2023-08-10 Thread Alex Voicu via Phabricator via cfe-commits
AlexVlx updated this revision to Diff 549113.
AlexVlx removed a reviewer: jdoerfert.
AlexVlx added a comment.

Remove noise / unintended file. Add support for dealing with unsupported ASM.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155833/new/

https://reviews.llvm.org/D155833

Files:
  clang/lib/Sema/SemaCUDA.cpp
  clang/lib/Sema/SemaExpr.cpp
  clang/lib/Sema/SemaStmtAsm.cpp
  clang/test/SemaStdPar/Inputs/stdpar_lib.hpp
  clang/test/SemaStdPar/device-can-call-host.cpp

Index: clang/test/SemaStdPar/device-can-call-host.cpp
===
--- /dev/null
+++ clang/test/SemaStdPar/device-can-call-host.cpp
@@ -0,0 +1,91 @@
+// RUN: %clang %s --stdpar --stdpar-path=%S/Inputs \
+// RUN:   --stdpar-thrust-path=%S/Inputs --stdpar-prim-path=%S/Inputs \
+// RUN:   --offload-device-only -emit-llvm -o /dev/null -Xclang -verify
+
+// Note: These would happen implicitly, within the implementation of the
+//   accelerator specific algorithm library, and not from user code.
+
+// Calls from the accelerator side to implicitly host (i.e. unannotated)
+// functions are fine.
+
+// expected-no-diagnostics
+
+extern "C" void host_fn() {}
+
+struct Dummy {};
+
+struct S {
+  S() {}
+  ~S() { host_fn(); }
+
+  int x;
+};
+
+struct T {
+  __device__ void hd() { host_fn(); }
+
+  __device__ void hd3();
+
+  void h() {}
+
+  void operator+();
+  void operator-(const T&) {}
+
+  operator Dummy() { return Dummy(); }
+};
+
+__device__ void T::hd3() { host_fn(); }
+
+template  __device__ void hd2() { host_fn(); }
+
+__global__ void kernel() { hd2(); }
+
+__device__ void hd() { host_fn(); }
+
+template  __device__ void hd3() { host_fn(); }
+__device__ void device_fn() { hd3(); }
+
+__device__ void local_var() {
+  S s;
+}
+
+__device__ void explicit_destructor(S *s) {
+  s->~S();
+}
+
+__device__ void hd_member_fn() {
+  T t;
+
+  t.hd();
+}
+
+__device__ void h_member_fn() {
+  T t;
+  t.h();
+}
+
+__device__ void unaryOp() {
+  T t;
+  (void) +t;
+}
+
+__device__ void binaryOp() {
+  T t;
+  (void) (t - t);
+}
+
+__device__ void implicitConversion() {
+  T t;
+  Dummy d = t;
+}
+
+template 
+struct TmplStruct {
+  template  __device__ void fn() {}
+};
+
+template <>
+template <>
+__device__ void TmplStruct::fn() { host_fn(); }
+
+__device__ void double_specialization() { TmplStruct().fn(); }
Index: clang/lib/Sema/SemaStmtAsm.cpp
===
--- clang/lib/Sema/SemaStmtAsm.cpp
+++ clang/lib/Sema/SemaStmtAsm.cpp
@@ -271,7 +271,8 @@
   OutputName = Names[i]->getName();
 
 TargetInfo::ConstraintInfo Info(Literal->getString(), OutputName);
-if (!Context.getTargetInfo().validateOutputConstraint(Info)) {
+if (!Context.getTargetInfo().validateOutputConstraint(Info) &&
+!(LangOpts.HIPStdPar && LangOpts.CUDAIsDevice)) {
   targetDiag(Literal->getBeginLoc(),
  diag::err_asm_invalid_output_constraint)
   << Info.getConstraintStr();
Index: clang/lib/Sema/SemaExpr.cpp
===
--- clang/lib/Sema/SemaExpr.cpp
+++ clang/lib/Sema/SemaExpr.cpp
@@ -19106,7 +19106,7 @@
   // Diagnose ODR-use of host global variables in device functions.
   // Reference of device global variables in host functions is allowed
   // through shadow variables therefore it is not diagnosed.
-  if (SemaRef.LangOpts.CUDAIsDevice) {
+  if (SemaRef.LangOpts.CUDAIsDevice && !SemaRef.LangOpts.HIPStdPar) {
 SemaRef.targetDiag(Loc, diag::err_ref_bad_target)
 << /*host*/ 2 << /*variable*/ 1 << Var << UserTarget;
 SemaRef.targetDiag(Var->getLocation(),
Index: clang/lib/Sema/SemaCUDA.cpp
===
--- clang/lib/Sema/SemaCUDA.cpp
+++ clang/lib/Sema/SemaCUDA.cpp
@@ -231,6 +231,15 @@
   (CallerTarget == CFT_Global && CalleeTarget == CFT_Device))
 return CFP_Native;
 
+  // StdPar mode is special, in that assessing whether a device side call to a
+  // host target is allowed is deferred to a subsequent pass, as it cannot be
+  // unambiguously adjudicated in the AST; hence we optimistically allow it.
+  if (getLangOpts().HIPStdPar &&
+  (CallerTarget == CFT_Global || CallerTarget == CFT_Device ||
+   CallerTarget == CFT_HostDevice) &&
+  CalleeTarget == CFT_Host)
+return CFP_HostDevice;
+
   // (d) HostDevice behavior depends on compilation mode.
   if (CallerTarget == CFT_HostDevice) {
 // It's OK to call a compilation-mode matching function from an HD one.
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D155833: [Clang][Sema][RFC] Add Sema support for C++ Parallel Algorithm Offload

2023-08-10 Thread Alex Voicu via Phabricator via cfe-commits
AlexVlx updated this revision to Diff 549117.
AlexVlx added a comment.

Fix typo.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155833/new/

https://reviews.llvm.org/D155833

Files:
  clang/lib/Sema/SemaCUDA.cpp
  clang/lib/Sema/SemaExpr.cpp
  clang/lib/Sema/SemaStmtAsm.cpp
  clang/test/SemaStdPar/Inputs/stdpar_lib.hpp
  clang/test/SemaStdPar/device-can-call-host.cpp

Index: clang/test/SemaStdPar/device-can-call-host.cpp
===
--- /dev/null
+++ clang/test/SemaStdPar/device-can-call-host.cpp
@@ -0,0 +1,91 @@
+// RUN: %clang %s --stdpar --stdpar-path=%S/Inputs \
+// RUN:   --stdpar-thrust-path=%S/Inputs --stdpar-prim-path=%S/Inputs \
+// RUN:   --offload-device-only -emit-llvm -o /dev/null -Xclang -verify
+
+// Note: These would happen implicitly, within the implementation of the
+//   accelerator specific algorithm library, and not from user code.
+
+// Calls from the accelerator side to implicitly host (i.e. unannotated)
+// functions are fine.
+
+// expected-no-diagnostics
+
+extern "C" void host_fn() {}
+
+struct Dummy {};
+
+struct S {
+  S() {}
+  ~S() { host_fn(); }
+
+  int x;
+};
+
+struct T {
+  __device__ void hd() { host_fn(); }
+
+  __device__ void hd3();
+
+  void h() {}
+
+  void operator+();
+  void operator-(const T&) {}
+
+  operator Dummy() { return Dummy(); }
+};
+
+__device__ void T::hd3() { host_fn(); }
+
+template  __device__ void hd2() { host_fn(); }
+
+__global__ void kernel() { hd2(); }
+
+__device__ void hd() { host_fn(); }
+
+template  __device__ void hd3() { host_fn(); }
+__device__ void device_fn() { hd3(); }
+
+__device__ void local_var() {
+  S s;
+}
+
+__device__ void explicit_destructor(S *s) {
+  s->~S();
+}
+
+__device__ void hd_member_fn() {
+  T t;
+
+  t.hd();
+}
+
+__device__ void h_member_fn() {
+  T t;
+  t.h();
+}
+
+__device__ void unaryOp() {
+  T t;
+  (void) +t;
+}
+
+__device__ void binaryOp() {
+  T t;
+  (void) (t - t);
+}
+
+__device__ void implicitConversion() {
+  T t;
+  Dummy d = t;
+}
+
+template 
+struct TmplStruct {
+  template  __device__ void fn() {}
+};
+
+template <>
+template <>
+__device__ void TmplStruct::fn() { host_fn(); }
+
+__device__ void double_specialization() { TmplStruct().fn(); }
Index: clang/lib/Sema/SemaStmtAsm.cpp
===
--- clang/lib/Sema/SemaStmtAsm.cpp
+++ clang/lib/Sema/SemaStmtAsm.cpp
@@ -271,7 +271,8 @@
   OutputName = Names[i]->getName();
 
 TargetInfo::ConstraintInfo Info(Literal->getString(), OutputName);
-if (!Context.getTargetInfo().validateOutputConstraint(Info)) {
+if (!Context.getTargetInfo().validateOutputConstraint(Info) &&
+!(LangOpts.HIPStdPar && LangOpts.CUDAIsDevice)) {
   targetDiag(Literal->getBeginLoc(),
  diag::err_asm_invalid_output_constraint)
   << Info.getConstraintStr();
Index: clang/lib/Sema/SemaExpr.cpp
===
--- clang/lib/Sema/SemaExpr.cpp
+++ clang/lib/Sema/SemaExpr.cpp
@@ -19106,7 +19106,7 @@
   // Diagnose ODR-use of host global variables in device functions.
   // Reference of device global variables in host functions is allowed
   // through shadow variables therefore it is not diagnosed.
-  if (SemaRef.LangOpts.CUDAIsDevice) {
+  if (SemaRef.LangOpts.CUDAIsDevice && !SemaRef.LangOpts.HIPStdPar) {
 SemaRef.targetDiag(Loc, diag::err_ref_bad_target)
 << /*host*/ 2 << /*variable*/ 1 << Var << UserTarget;
 SemaRef.targetDiag(Var->getLocation(),
Index: clang/lib/Sema/SemaCUDA.cpp
===
--- clang/lib/Sema/SemaCUDA.cpp
+++ clang/lib/Sema/SemaCUDA.cpp
@@ -231,6 +231,15 @@
   (CallerTarget == CFT_Global && CalleeTarget == CFT_Device))
 return CFP_Native;
 
+  // StdPar mode is special, in that assessing whether a device side call to a
+  // host target is allowed is deferred to a subsequent pass, as it cannot be
+  // unambiguously adjudicated in the AST; hence we optimistically allow it.
+  if (getLangOpts().HIPStdPar &&
+  (CallerTarget == CFT_Global || CallerTarget == CFT_Device ||
+   CallerTarget == CFT_HostDevice) &&
+  CalleeTarget == CFT_Host)
+return CFP_HostDevice;
+
   // (d) HostDevice behavior depends on compilation mode.
   if (CallerTarget == CFT_HostDevice) {
 // It's OK to call a compilation-mode matching function from an HD one.
@@ -877,7 +886,7 @@
   if (!ShouldCheck || !Capture.isReferenceCapture())
 return;
   auto DiagKind = SemaDiagnosticBuilder::K_Deferred;
-  if (Capture.isVariableCapture()) {
+  if (Capture.isVariableCapture() && !getLangOpts().HIPStdPar) {
 SemaDiagnosticBuilder(DiagKind, Capture.getLocation(),
   diag::err_capture_bad_target, Callee, *this)
 << Capture.getVariable();
___
cfe-commits mailing lis

[PATCH] D155850: [Clang][CodeGen][RFC] Add codegen support for C++ Parallel Algorithm Offload

2023-08-10 Thread Alex Voicu via Phabricator via cfe-commits
AlexVlx updated this revision to Diff 549159.
AlexVlx added a comment.

Switch to `__ASM` prefix.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155850/new/

https://reviews.llvm.org/D155850

Files:
  clang/lib/CodeGen/BackendUtil.cpp
  clang/lib/CodeGen/CGBuiltin.cpp
  clang/lib/CodeGen/CGStmt.cpp
  clang/lib/CodeGen/CodeGenFunction.cpp
  clang/lib/CodeGen/CodeGenModule.cpp
  clang/test/CodeGenStdPar/unannotated-functions-get-emitted.cpp
  clang/test/CodeGenStdPar/unsupported-ASM.cpp
  clang/test/CodeGenStdPar/unsupported-builtins.cpp

Index: clang/test/CodeGenStdPar/unsupported-builtins.cpp
===
--- /dev/null
+++ clang/test/CodeGenStdPar/unsupported-builtins.cpp
@@ -0,0 +1,8 @@
+// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -aux-triple x86_64-unknown-linux-gnu \
+// RUN:   --stdpar -x hip -emit-llvm -fcuda-is-device -o - %s | FileCheck %s
+
+#define __global__ __attribute__((global))
+
+__global__ void foo() { return __builtin_ia32_pause(); }
+
+// CHECK: declare void @__builtin_ia32_pause__stdpar_unsupported()
Index: clang/test/CodeGenStdPar/unsupported-ASM.cpp
===
--- /dev/null
+++ clang/test/CodeGenStdPar/unsupported-ASM.cpp
@@ -0,0 +1,10 @@
+// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -aux-triple x86_64-unknown-linux-gnu \
+// RUN:   --stdpar -x hip -emit-llvm -fcuda-is-device -o - %s | FileCheck %s
+
+#define __global__ __attribute__((global))
+
+__global__ void foo(int i) {
+asm ("addl %2, %1; seto %b0" : "=q" (i), "+g" (i) : "r" (i));
+}
+
+// CHECK: declare void @__ASM__stdpar_unsupported([{{.*}}])
Index: clang/test/CodeGenStdPar/unannotated-functions-get-emitted.cpp
===
--- /dev/null
+++ clang/test/CodeGenStdPar/unannotated-functions-get-emitted.cpp
@@ -0,0 +1,19 @@
+// RUN: %clang_cc1 -x hip -emit-llvm -fcuda-is-device \
+// RUN:   -o - %s | FileCheck --check-prefix=NO-STDPAR-DEV %s
+
+// RUN: %clang_cc1 --stdpar -emit-llvm -fcuda-is-device \
+// RUN:   -o - %s | FileCheck --check-prefix=STDPAR-DEV %s
+
+#define __device__ __attribute__((device))
+
+// NO-STDPAR-DEV-NOT: define {{.*}} void @_Z3fooPff({{.*}})
+// STDPAR-DEV: define {{.*}} void @_Z3fooPff({{.*}})
+void foo(float *a, float b) {
+  *a = b;
+}
+
+// NO-STDPAR-DEV: define {{.*}} void @_Z3barPff({{.*}})
+// STDPAR-DEV: define {{.*}} void @_Z3barPff({{.*}})
+__device__ void bar(float *a, float b) {
+  *a = b;
+}
Index: clang/lib/CodeGen/CodeGenModule.cpp
===
--- clang/lib/CodeGen/CodeGenModule.cpp
+++ clang/lib/CodeGen/CodeGenModule.cpp
@@ -3558,7 +3558,10 @@
   !Global->hasAttr() &&
   !Global->hasAttr() &&
   !Global->getType()->isCUDADeviceBuiltinSurfaceType() &&
-  !Global->getType()->isCUDADeviceBuiltinTextureType())
+  !Global->getType()->isCUDADeviceBuiltinTextureType() &&
+  !(LangOpts.HIPStdPar &&
+isa(Global) &&
+!Global->hasAttr()))
 return;
 } else {
   // We need to emit host-side 'shadows' for all global
Index: clang/lib/CodeGen/CodeGenFunction.cpp
===
--- clang/lib/CodeGen/CodeGenFunction.cpp
+++ clang/lib/CodeGen/CodeGenFunction.cpp
@@ -2594,10 +2594,15 @@
   std::string MissingFeature;
   llvm::StringMap CallerFeatureMap;
   CGM.getContext().getFunctionFeatureMap(CallerFeatureMap, FD);
+  // When compiling in StdPar mode we have to be conservative in rejecting
+  // target specific features in the FE, and defer the possible error to the
+  // AcceleratorCodeSelection pass, wherein iff an unsupported target builtin is
+  // referenced by an accelerator executable function, we emit an error.
+  bool IsStdPar = getLangOpts().HIPStdPar && getLangOpts().CUDAIsDevice;
   if (BuiltinID) {
 StringRef FeatureList(CGM.getContext().BuiltinInfo.getRequiredFeatures(BuiltinID));
 if (!Builtin::evaluateRequiredTargetFeatures(
-FeatureList, CallerFeatureMap)) {
+FeatureList, CallerFeatureMap) && !IsStdPar) {
   CGM.getDiags().Report(Loc, diag::err_builtin_needs_feature)
   << TargetDecl->getDeclName()
   << FeatureList;
@@ -2630,7 +2635,7 @@
 return false;
   }
   return true;
-}))
+}) && !IsStdPar)
   CGM.getDiags().Report(Loc, diag::err_function_needs_feature)
   << FD->getDeclName() << TargetDecl->getDeclName() << MissingFeature;
   } else if (!FD->isMultiVersion() && FD->hasAttr()) {
@@ -2639,7 +2644,8 @@
 
 for (const auto &F : CalleeFeatureMap) {
   if (F.getValue() && (!CallerFeatureMap.lookup(F.getKey()) ||
-   !CallerFeatureMap.find(F.getKey())->getValue()))
+   !CallerFeatureMap.find(F.getKey())->getValue()) &&
+  !IsStdPar)
 CGM.getDiags().Report(Loc, diag::err_fu

[PATCH] D157452: [RFC][Clang][Codegen] `std::type_info` needs special care with explicit address spaces

2023-08-15 Thread Alex Voicu via Phabricator via cfe-commits
AlexVlx updated this revision to Diff 550311.
AlexVlx added a comment.

Rework the patch as the proposed approach was unsound; keep `typeid` generic.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D157452/new/

https://reviews.llvm.org/D157452

Files:
  clang/lib/CodeGen/CGExprCXX.cpp
  clang/lib/CodeGen/ItaniumCXXABI.cpp
  clang/test/CodeGenCXX/typeid-cxx11-with-address-space.cpp
  clang/test/CodeGenCXX/typeid-with-address-space.cpp
  clang/test/CodeGenCXX/typeinfo
  clang/test/CodeGenCXX/typeinfo-with-address-space.cpp

Index: clang/test/CodeGenCXX/typeinfo-with-address-space.cpp
===
--- /dev/null
+++ clang/test/CodeGenCXX/typeinfo-with-address-space.cpp
@@ -0,0 +1,48 @@
+// RUN: %clang_cc1 -I%S %s -triple amdgcn-amd-amdhsa -emit-llvm -o - | FileCheck %s -check-prefix=AS
+// RUN: %clang_cc1 -I%S %s -triple x86_64-linux-gnu -emit-llvm -o - | FileCheck %s -check-prefix=NO-AS
+#include 
+
+class A {
+virtual void f() = 0;
+};
+
+class B : A {
+void f() override;
+};
+
+// AS: @_ZTISt9type_info = external addrspace(1) constant ptr addrspace(1)
+// NO-AS: @_ZTISt9type_info = external constant ptr
+// AS: @_ZTIi = external addrspace(1) constant ptr addrspace(1)
+// NO-AS: @_ZTIi = external constant ptr
+// AS: @_ZTVN10__cxxabiv117__class_type_infoE = external addrspace(1) global ptr addrspace(1)
+// NO-AS: @_ZTVN10__cxxabiv117__class_type_infoE = external global ptr
+// AS: @_ZTS1A = linkonce_odr addrspace(1) constant [3 x i8] c"1A\00", comdat, align 1
+// NO-AS: @_ZTS1A = linkonce_odr constant [3 x i8] c"1A\00", comdat, align 1
+// AS: @_ZTI1A = linkonce_odr addrspace(1) constant { ptr addrspace(1), ptr addrspace(1) } { ptr addrspace(1) getelementptr inbounds (ptr addrspace(1), ptr addrspace(1) @_ZTVN10__cxxabiv117__class_type_infoE, i64 2), ptr addrspace(1) @_ZTS1A }, comdat, align 8
+// NO-AS: @_ZTI1A = linkonce_odr constant { ptr, ptr } { ptr getelementptr inbounds (ptr, ptr @_ZTVN10__cxxabiv117__class_type_infoE, i64 2), ptr @_ZTS1A }, comdat, align 8
+// AS: @_ZTIf = external addrspace(1) constant ptr addrspace(1)
+// NO-AS: @_ZTIf = external constant ptr
+
+unsigned long Fn(B& b) {
+// AS: %call = call noundef zeroext i1 @_ZNKSt9type_infoeqERKS_(ptr {{.*}} addrspacecast (ptr addrspace(1) @_ZTISt9type_info to ptr), ptr {{.*}} %2)
+// NO-AS: %call = call noundef zeroext i1 @_ZNKSt9type_infoeqERKS_(ptr {{.*}} @_ZTISt9type_info, ptr {{.*}} %2)
+if (typeid(std::type_info) == typeid(b))
+return 42;
+// AS: %call2 = call noundef zeroext i1 @_ZNKSt9type_infoneERKS_(ptr {{.*}} addrspacecast (ptr addrspace(1) @_ZTIi to ptr), ptr {{.*}} %5)
+// NO-AS: %call2 = call noundef zeroext i1 @_ZNKSt9type_infoneERKS_(ptr {{.*}} @_ZTIi, ptr {{.*}} %5)
+if (typeid(int) != typeid(b))
+return 1712;
+// AS: %call5 = call noundef ptr @_ZNKSt9type_info4nameEv(ptr {{.*}} addrspacecast (ptr addrspace(1) @_ZTI1A to ptr))
+// NO-AS: %call5 = call noundef ptr @_ZNKSt9type_info4nameEv(ptr {{.*}} @_ZTI1A)
+// AS: %call7 = call noundef ptr @_ZNKSt9type_info4nameEv(ptr {{.*}} %8)
+// NO-AS: %call7 = call noundef ptr @_ZNKSt9type_info4nameEv(ptr {{.*}} %8)
+if (typeid(A).name() == typeid(b).name())
+return 0;
+// AS: %call11 = call noundef zeroext i1 @_ZNKSt9type_info6beforeERKS_(ptr {{.*}} %11, ptr {{.*}} addrspacecast (ptr addrspace(1) @_ZTIf to ptr))
+// NO-AS:   %call11 = call noundef zeroext i1 @_ZNKSt9type_info6beforeERKS_(ptr {{.*}} %11, ptr {{.*}} @_ZTIf)
+if (typeid(b).before(typeid(float)))
+return 1;
+// AS: %call15 = call noundef i64 @_ZNKSt9type_info9hash_codeEv(ptr {{.*}} %14)
+// NO-AS: %call15 = call noundef i64 @_ZNKSt9type_info9hash_codeEv(ptr {{.*}} %14)
+return typeid(b).hash_code();
+}
Index: clang/test/CodeGenCXX/typeinfo
===
--- clang/test/CodeGenCXX/typeinfo
+++ clang/test/CodeGenCXX/typeinfo
@@ -10,6 +10,14 @@
 bool operator!=(const type_info& __arg) const {
   return !operator==(__arg);
 }
+
+bool before(const type_info& __arg) const {
+  return __name < __arg.__name;
+}
+
+unsigned long hash_code() const {
+  return reinterpret_cast(__name);
+}
   protected:
 const char *__name;
   };
Index: clang/test/CodeGenCXX/typeid-with-address-space.cpp
===
--- /dev/null
+++ clang/test/CodeGenCXX/typeid-with-address-space.cpp
@@ -0,0 +1,50 @@
+// RUN: %clang_cc1 -I%S %s -triple amdgcn-amd-amdhsa -emit-llvm -fcxx-exceptions -fexceptions -o - | FileCheck %s
+#include 
+
+namespace Test1 {
+
+// PR7400
+struct A { virtual void f(); };
+
+// CHECK: @_ZN5Test16int_tiE ={{.*}} constant ptr addrspacecast (ptr addrspace(1) @_ZTIi to ptr), align 8
+const std::type_info &int_ti = typeid(int);
+
+// CHECK: @_ZN5Test14A_tiE ={{.*}} constant ptr addrspacecast (ptr addrspace(1) @_ZTIN5Test11AE to ptr), align 8
+const std::type_info &A_ti = typ

[PATCH] D157452: [RFC][Clang][Codegen] `std::type_info` needs special care with explicit address spaces

2023-08-15 Thread Alex Voicu via Phabricator via cfe-commits
AlexVlx added a comment.

In D157452#4586554 , @rjmccall wrote:

> The path of least resistance here is that IRGen should just insert 
> conversions from the global AS to the default AS as part of evaluating 
> `typeid`.  I haven't looked at it closely, but that seems to be what this 
> patch is doing.
>
> However, `std::type_info` is an interesting special case in that we actually 
> know statically that all objects of that type will be allocated in the global 
> AS, so there's really no reason to pass around pointers in the default AS; 
> `std::type_info *` should just default to being in the global AS.  It'd be a 
> non-trivial feature in support of a somewhat uncommonly-used C++ feature, and 
> I can't tell how best to spend your time, *but*... if you're so inclined, I 
> think it would make a somewhat compelling feature to be able to declare a 
> default AS for a class type, which your target could then adopt in the 
> headers for `std::type_info`.

I've reworked things along these lines, as both you and @yaxunl suggested; thank
you. It turns out that what I was doing was unsound / would not catch all cases,
whereas this does. As for the feature suggestion, that actually seems as if it 
would be very useful, beyond this application; I will add it to my TODO list, 
although I cannot promise to get to investigating it right away.
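
For context, a minimal sketch of the pattern in question, assuming a target
such as amdgcn-amd-amdhsa where the RTTI objects live in the global address
space (this mirrors the new tests rather than prescribing the final codegen):

  #include <typeinfo>

  struct A { virtual void f(); };

  // With RTTI emitted into addrspace(1), the initializer becomes roughly
  //   addrspacecast (ptr addrspace(1) @_ZTIi to ptr)
  // i.e. the object stays in the global AS and is bridged to the default AS
  // at the point of use, instead of forcing generic pointers throughout.
  const std::type_info &int_ti = typeid(int);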


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D157452/new/

https://reviews.llvm.org/D157452

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D157452: [RFC][Clang][Codegen] `std::type_info` needs special care with explicit address spaces

2023-08-15 Thread Alex Voicu via Phabricator via cfe-commits
AlexVlx added inline comments.



Comment at: clang/lib/CodeGen/CGCall.cpp:5237-5238
+
+  if (VTy->isPointerTy() &&
+  VTy->getPointerAddressSpace() != IRTy->getPointerAddressSpace()) 
{
+// In the case of targets that use a non-default address space for

arsenm wrote:
> you can also just unconditionally call CreateAddrSpaceCast and let it no-op 
> if the types match
I would've if I could've:) Sadly, CastIsValid returns false for address space 
casts between the same AS: 
https://github.com/llvm/llvm-project/blob/ca68a7f956f24aa3882134c5d8e72659355292dc/llvm/lib/IR/Instructions.cpp#L3895,
 and this makes `assert`s flare. I'm not sure if that is vestigial or just 
overly cautious behaviour.
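
In other words, the bridging has to stay conditional; a sketch of the shape of
the code under discussion (not the exact patch, and the helper name is purely
illustrative):

  #include "llvm/IR/IRBuilder.h"

  // Bridge a value from the RTTI (global) address space to the expected
  // default-AS pointer type. CreateAddrSpaceCast asserts via
  // CastInst::castIsValid when source and destination address spaces match,
  // hence the explicit guard instead of an unconditional cast.
  static llvm::Value *bridgeAS(llvm::IRBuilderBase &B, llvm::Value *V,
                               llvm::Type *IRTy) {
    if (V->getType()->isPointerTy() && IRTy->isPointerTy() &&
        V->getType()->getPointerAddressSpace() !=
            IRTy->getPointerAddressSpace())
      return B.CreateAddrSpaceCast(V, IRTy);
    return V;
  }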


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D157452/new/

https://reviews.llvm.org/D157452

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D157917: clang/HIP: Use abs builtins instead of implementing them

2023-08-15 Thread Alex Voicu via Phabricator via cfe-commits
AlexVlx accepted this revision.
AlexVlx added a comment.
This revision is now accepted and ready to land.

LGTM, thanks.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D157917/new/

https://reviews.llvm.org/D157917

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D155833: [Clang][Sema][RFC] Add Sema support for C++ Parallel Algorithm Offload

2023-08-15 Thread Alex Voicu via Phabricator via cfe-commits
AlexVlx updated this revision to Diff 550391.
AlexVlx added a comment.

Update test.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155833/new/

https://reviews.llvm.org/D155833

Files:
  clang/lib/Sema/SemaCUDA.cpp
  clang/lib/Sema/SemaExpr.cpp
  clang/lib/Sema/SemaStmtAsm.cpp
  clang/test/SemaStdPar/Inputs/stdpar_lib.hpp
  clang/test/SemaStdPar/device-can-call-host.cpp

Index: clang/test/SemaStdPar/device-can-call-host.cpp
===
--- /dev/null
+++ clang/test/SemaStdPar/device-can-call-host.cpp
@@ -0,0 +1,93 @@
+// RUN: %clang_cc1 -x hip %s -stdpar -triple amdgcn-amd-amdhsa --std=c++17 \
+// RUN:   -fcuda-is-device -emit-llvm -o /dev/null -verify
+
+// Note: These would happen implicitly, within the implementation of the
+//   accelerator specific algorithm library, and not from user code.
+
+// Calls from the accelerator side to implicitly host (i.e. unannotated)
+// functions are fine.
+
+// expected-no-diagnostics
+
+#define __device__ __attribute__((device))
+#define __global__ __attribute__((global))
+
+extern "C" void host_fn() {}
+
+struct Dummy {};
+
+struct S {
+  S() {}
+  ~S() { host_fn(); }
+
+  int x;
+};
+
+struct T {
+  __device__ void hd() { host_fn(); }
+
+  __device__ void hd3();
+
+  void h() {}
+
+  void operator+();
+  void operator-(const T&) {}
+
+  operator Dummy() { return Dummy(); }
+};
+
+__device__ void T::hd3() { host_fn(); }
+
+template  __device__ void hd2() { host_fn(); }
+
+__global__ void kernel() { hd2(); }
+
+__device__ void hd() { host_fn(); }
+
+template  __device__ void hd3() { host_fn(); }
+__device__ void device_fn() { hd3(); }
+
+__device__ void local_var() {
+  S s;
+}
+
+__device__ void explicit_destructor(S *s) {
+  s->~S();
+}
+
+__device__ void hd_member_fn() {
+  T t;
+
+  t.hd();
+}
+
+__device__ void h_member_fn() {
+  T t;
+  t.h();
+}
+
+__device__ void unaryOp() {
+  T t;
+  (void) +t;
+}
+
+__device__ void binaryOp() {
+  T t;
+  (void) (t - t);
+}
+
+__device__ void implicitConversion() {
+  T t;
+  Dummy d = t;
+}
+
+template 
+struct TmplStruct {
+  template  __device__ void fn() {}
+};
+
+template <>
+template <>
+__device__ void TmplStruct::fn() { host_fn(); }
+
+__device__ void double_specialization() { TmplStruct().fn(); }
Index: clang/lib/Sema/SemaStmtAsm.cpp
===
--- clang/lib/Sema/SemaStmtAsm.cpp
+++ clang/lib/Sema/SemaStmtAsm.cpp
@@ -271,7 +271,8 @@
   OutputName = Names[i]->getName();
 
 TargetInfo::ConstraintInfo Info(Literal->getString(), OutputName);
-if (!Context.getTargetInfo().validateOutputConstraint(Info)) {
+if (!Context.getTargetInfo().validateOutputConstraint(Info) &&
+!(LangOpts.HIPStdPar && LangOpts.CUDAIsDevice)) {
   targetDiag(Literal->getBeginLoc(),
  diag::err_asm_invalid_output_constraint)
   << Info.getConstraintStr();
Index: clang/lib/Sema/SemaExpr.cpp
===
--- clang/lib/Sema/SemaExpr.cpp
+++ clang/lib/Sema/SemaExpr.cpp
@@ -19110,7 +19110,7 @@
   // Diagnose ODR-use of host global variables in device functions.
   // Reference of device global variables in host functions is allowed
   // through shadow variables therefore it is not diagnosed.
-  if (SemaRef.LangOpts.CUDAIsDevice) {
+  if (SemaRef.LangOpts.CUDAIsDevice && !SemaRef.LangOpts.HIPStdPar) {
 SemaRef.targetDiag(Loc, diag::err_ref_bad_target)
 << /*host*/ 2 << /*variable*/ 1 << Var << UserTarget;
 SemaRef.targetDiag(Var->getLocation(),
Index: clang/lib/Sema/SemaCUDA.cpp
===
--- clang/lib/Sema/SemaCUDA.cpp
+++ clang/lib/Sema/SemaCUDA.cpp
@@ -231,6 +231,15 @@
   (CallerTarget == CFT_Global && CalleeTarget == CFT_Device))
 return CFP_Native;
 
+  // StdPar mode is special, in that assessing whether a device side call to a
+  // host target is allowed is deferred to a subsequent pass, as it cannot be
+  // unambiguously adjudicated in the AST; hence we optimistically allow it.
+  if (getLangOpts().HIPStdPar &&
+  (CallerTarget == CFT_Global || CallerTarget == CFT_Device ||
+   CallerTarget == CFT_HostDevice) &&
+  CalleeTarget == CFT_Host)
+return CFP_HostDevice;
+
   // (d) HostDevice behavior depends on compilation mode.
   if (CallerTarget == CFT_HostDevice) {
 // It's OK to call a compilation-mode matching function from an HD one.
@@ -877,7 +886,7 @@
   if (!ShouldCheck || !Capture.isReferenceCapture())
 return;
   auto DiagKind = SemaDiagnosticBuilder::K_Deferred;
-  if (Capture.isVariableCapture()) {
+  if (Capture.isVariableCapture() && !getLangOpts().HIPStdPar) {
 SemaDiagnosticBuilder(DiagKind, Capture.getLocation(),
   diag::err_capture_bad_target, Callee, *this)
 << Capture.getVariable();

[PATCH] D157452: [RFC][Clang][Codegen] `std::type_info` needs special care with explicit address spaces

2023-08-15 Thread Alex Voicu via Phabricator via cfe-commits
AlexVlx updated this revision to Diff 550516.
AlexVlx added a comment.

Remove unneeded cast; the dynamic case already emitted a generic pointer to
`typeinfo`.
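
A small illustration of the distinction, based on the tests already in the
patch (sketch only, not additional coverage):

  #include <typeinfo>

  struct A { virtual void f(); };

  unsigned long g(A &a) {
    // Static operand: the operand is the global-AS RTTI object itself, so the
    // use site needs an explicit addrspacecast to the default AS.
    const std::type_info &s = typeid(int);
    // Polymorphic operand: the pointer is loaded from the vtable at run time
    // and is already produced as a generic pointer, so no extra cast is
    // required; that is the cast this update removes.
    const std::type_info &d = typeid(a);
    return s == d ? 1ul : 0ul;
  }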


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D157452/new/

https://reviews.llvm.org/D157452

Files:
  clang/lib/CodeGen/CGExprCXX.cpp
  clang/lib/CodeGen/ItaniumCXXABI.cpp
  clang/test/CodeGenCXX/typeid-cxx11-with-address-space.cpp
  clang/test/CodeGenCXX/typeid-with-address-space.cpp
  clang/test/CodeGenCXX/typeinfo
  clang/test/CodeGenCXX/typeinfo-with-address-space.cpp

Index: clang/test/CodeGenCXX/typeinfo-with-address-space.cpp
===
--- /dev/null
+++ clang/test/CodeGenCXX/typeinfo-with-address-space.cpp
@@ -0,0 +1,48 @@
+// RUN: %clang_cc1 -I%S %s -triple amdgcn-amd-amdhsa -emit-llvm -o - | FileCheck %s -check-prefix=AS
+// RUN: %clang_cc1 -I%S %s -triple x86_64-linux-gnu -emit-llvm -o - | FileCheck %s -check-prefix=NO-AS
+#include 
+
+class A {
+virtual void f() = 0;
+};
+
+class B : A {
+void f() override;
+};
+
+// AS: @_ZTISt9type_info = external addrspace(1) constant ptr addrspace(1)
+// NO-AS: @_ZTISt9type_info = external constant ptr
+// AS: @_ZTIi = external addrspace(1) constant ptr addrspace(1)
+// NO-AS: @_ZTIi = external constant ptr
+// AS: @_ZTVN10__cxxabiv117__class_type_infoE = external addrspace(1) global ptr addrspace(1)
+// NO-AS: @_ZTVN10__cxxabiv117__class_type_infoE = external global ptr
+// AS: @_ZTS1A = linkonce_odr addrspace(1) constant [3 x i8] c"1A\00", comdat, align 1
+// NO-AS: @_ZTS1A = linkonce_odr constant [3 x i8] c"1A\00", comdat, align 1
+// AS: @_ZTI1A = linkonce_odr addrspace(1) constant { ptr addrspace(1), ptr addrspace(1) } { ptr addrspace(1) getelementptr inbounds (ptr addrspace(1), ptr addrspace(1) @_ZTVN10__cxxabiv117__class_type_infoE, i64 2), ptr addrspace(1) @_ZTS1A }, comdat, align 8
+// NO-AS: @_ZTI1A = linkonce_odr constant { ptr, ptr } { ptr getelementptr inbounds (ptr, ptr @_ZTVN10__cxxabiv117__class_type_infoE, i64 2), ptr @_ZTS1A }, comdat, align 8
+// AS: @_ZTIf = external addrspace(1) constant ptr addrspace(1)
+// NO-AS: @_ZTIf = external constant ptr
+
+unsigned long Fn(B& b) {
+// AS: %call = call noundef zeroext i1 @_ZNKSt9type_infoeqERKS_(ptr {{.*}} addrspacecast (ptr addrspace(1) @_ZTISt9type_info to ptr), ptr {{.*}} %2)
+// NO-AS: %call = call noundef zeroext i1 @_ZNKSt9type_infoeqERKS_(ptr {{.*}} @_ZTISt9type_info, ptr {{.*}} %2)
+if (typeid(std::type_info) == typeid(b))
+return 42;
+// AS: %call2 = call noundef zeroext i1 @_ZNKSt9type_infoneERKS_(ptr {{.*}} addrspacecast (ptr addrspace(1) @_ZTIi to ptr), ptr {{.*}} %5)
+// NO-AS: %call2 = call noundef zeroext i1 @_ZNKSt9type_infoneERKS_(ptr {{.*}} @_ZTIi, ptr {{.*}} %5)
+if (typeid(int) != typeid(b))
+return 1712;
+// AS: %call5 = call noundef ptr @_ZNKSt9type_info4nameEv(ptr {{.*}} addrspacecast (ptr addrspace(1) @_ZTI1A to ptr))
+// NO-AS: %call5 = call noundef ptr @_ZNKSt9type_info4nameEv(ptr {{.*}} @_ZTI1A)
+// AS: %call7 = call noundef ptr @_ZNKSt9type_info4nameEv(ptr {{.*}} %8)
+// NO-AS: %call7 = call noundef ptr @_ZNKSt9type_info4nameEv(ptr {{.*}} %8)
+if (typeid(A).name() == typeid(b).name())
+return 0;
+// AS: %call11 = call noundef zeroext i1 @_ZNKSt9type_info6beforeERKS_(ptr {{.*}} %11, ptr {{.*}} addrspacecast (ptr addrspace(1) @_ZTIf to ptr))
+// NO-AS:   %call11 = call noundef zeroext i1 @_ZNKSt9type_info6beforeERKS_(ptr {{.*}} %11, ptr {{.*}} @_ZTIf)
+if (typeid(b).before(typeid(float)))
+return 1;
+// AS: %call15 = call noundef i64 @_ZNKSt9type_info9hash_codeEv(ptr {{.*}} %14)
+// NO-AS: %call15 = call noundef i64 @_ZNKSt9type_info9hash_codeEv(ptr {{.*}} %14)
+return typeid(b).hash_code();
+}
Index: clang/test/CodeGenCXX/typeinfo
===
--- clang/test/CodeGenCXX/typeinfo
+++ clang/test/CodeGenCXX/typeinfo
@@ -10,6 +10,14 @@
 bool operator!=(const type_info& __arg) const {
   return !operator==(__arg);
 }
+
+bool before(const type_info& __arg) const {
+  return __name < __arg.__name;
+}
+
+unsigned long hash_code() const {
+  return reinterpret_cast(__name);
+}
   protected:
 const char *__name;
   };
Index: clang/test/CodeGenCXX/typeid-with-address-space.cpp
===
--- /dev/null
+++ clang/test/CodeGenCXX/typeid-with-address-space.cpp
@@ -0,0 +1,50 @@
+// RUN: %clang_cc1 -I%S %s -triple amdgcn-amd-amdhsa -emit-llvm -fcxx-exceptions -fexceptions -o - | FileCheck %s
+#include 
+
+namespace Test1 {
+
+// PR7400
+struct A { virtual void f(); };
+
+// CHECK: @_ZN5Test16int_tiE ={{.*}} constant ptr addrspacecast (ptr addrspace(1) @_ZTIi to ptr), align 8
+const std::type_info &int_ti = typeid(int);
+
+// CHECK: @_ZN5Test14A_tiE ={{.*}} constant ptr addrspacecast (ptr addrspace(1) @_ZTIN5Test11AE to ptr), align 8
+const std::type_info &

[PATCH] D155769: [HIP][Clang][docs][RFC] Add documentation for C++ Parallel Algorithm Offload

2023-08-22 Thread Alex Voicu via Phabricator via cfe-commits
AlexVlx updated this revision to Diff 552550.
AlexVlx marked an inline comment as done.
AlexVlx retitled this revision from "[Clang][docs][RFC] Add documentation for 
C++ Parallel Algorithm Offload" to "[HIP][Clang][docs][RFC] Add documentation 
for C++ Parallel Algorithm Offload".
AlexVlx removed reviewers: bader, Anastasia, erichkeane.
AlexVlx added a comment.
Herald added a reviewer: jdoerfert.
Herald added subscribers: jplehr, sstefan1.

Updating this to reflect the outcome of the RFC, which is that this shall be a 
HIP only extension. As such, documentation lives within the HIP Support master 
document; the other patches in the series will be updated accordingly.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155769/new/

https://reviews.llvm.org/D155769

Files:
  clang/docs/HIPSupport.rst

Index: clang/docs/HIPSupport.rst
===
--- clang/docs/HIPSupport.rst
+++ clang/docs/HIPSupport.rst
@@ -176,3 +176,387 @@
* - ``HIP_API_PER_THREAD_DEFAULT_STREAM``
  - Alias to ``__HIP_API_PER_THREAD_DEFAULT_STREAM__``. Deprecated.
 
+C++ Standard Parallelism Offload Support: Compiler And Runtime
+==
+
+Introduction
+
+
+This document describes the implementation of support for offloading the
+execution of standard C++ algorithms to accelerators that can be targeted via
+HIP. Furthermore, it enumerates restrictions on user defined code, as well as
+the interactions with runtimes.
+
+Algorithm Offload: What, Why, Where
+===
+
+C++17 introduced overloads
+`for most algorithms in the standard library `_
+which allow the user to specify a desired
+`execution policy `_.
+The `parallel_unsequenced_policy `_
+maps relatively well to the execution model of many accelerators, such as GPUs.
+This, coupled with the ubiquity of GPU accelerated algorithm libraries that
+implement most / all corresponding algorithms in the standard library
+(e.g. `rocThrust `_), makes
+it feasible to provide seamless accelerator offload for supported algorithms,
+when an accelerated version exists. Thus, it becomes possible to easily access
+the computational resources of an accelerator, via a well specified, familiar,
+algorithmic interface, without having to delve into low-level hardware specific
+details. Putting it all together:
+
+- **What**: standard library algorithms, when invoked with the
+  ``parallel_unsequenced_policy``
+- **Why**: democratise accelerator programming, without loss of user familiarity
+- **Where**: any and all accelerators that can be targeted by Clang/LLVM via HIP
+
+Small Example
+=
+
+Given the following C++ code, which assumes the ``std`` namespace is included:
+
+.. code-block:: C++
+
+   bool has_the_answer(const vector& v) {
+ return find(execution::par_unseq, cbegin(v), cend(v), 42) != cend(v);
+   }
+
+if Clang is invoked with the ``-hipstdpar --offload-arch=foo`` flags, the call
+to ``find`` will be offloaded to an accelerator that is part of the ``foo``
+target family. If either ``foo`` or its runtime environment does not support
+transparent on-demand paging (such as that provided in Linux via
+`HMM `_), it is necessary to also include
+the ``--hipstdpar-interpose-alloc`` flag. If the accelerator specific algorithm
+library ``foo`` uses doesn't have an implementation of a particular algorithm,
+execution seamlessly falls back to the host CPU. It is legal to specify multiple
+``--offload-arch``s. All the flags we introduce, as well as a thorough view of
+various restrictions and their implications, will be provided below.
+
+Implementation - General View
+=
+
+We built Algorithm Offload support atop the pre-existing HIP
+infrastructure. More specifically, when one requests offload via ``-hipstdpar``,
+compilation is switched to HIP compilation, as if ``-x hip`` was specified.
+Similarly, linking is also switched to HIP linking, as if ``--hip-link`` was
+specified. Note that these are implicit, and one should not assume that any
+interop with HIP specific language constructs is available e.g. ``__device__``
+annotations are neither necessary nor guaranteed to work.
+
+Since there are no language restriction mechanisms in place, it is necessary to
+relax HIP language specific semantic checks performed by the FE; they would
+identify otherwise valid, offloadable code as invalid HIP code. Given that we
+know that the user intended only for certain algorithms to be offloaded, and
+encoded this by specifying the ``parallel_unsequenced_policy``, we rely on a
+pass over IR to clean up any and a

[PATCH] D155775: [HIP][Clang][Driver][RFC] Add driver support for C++ Parallel Algorithm Offload

2023-08-22 Thread Alex Voicu via Phabricator via cfe-commits
AlexVlx updated this revision to Diff 552556.
AlexVlx retitled this revision from "[Clang][Driver][RFC] Add driver support 
for C++ Parallel Algorithm Offload" to "[HIP][Clang][Driver][RFC] Add driver 
support for C++ Parallel Algorithm Offload".
AlexVlx added a comment.

Updating this to reflect the outcome of the RFC, which is that this will be 
added as a HIP extension exclusively.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155775/new/

https://reviews.llvm.org/D155775

Files:
  clang/include/clang/Basic/DiagnosticDriverKinds.td
  clang/include/clang/Basic/LangOptions.def
  clang/include/clang/Driver/Options.td
  clang/lib/Driver/Driver.cpp
  clang/lib/Driver/ToolChains/AMDGPU.cpp
  clang/lib/Driver/ToolChains/Clang.cpp
  clang/lib/Driver/ToolChains/HIPAMD.cpp
  clang/lib/Driver/ToolChains/ROCm.h
  clang/test/Driver/Inputs/hipstdpar/hipstdpar_lib.hpp
  clang/test/Driver/hipstdpar.c

Index: clang/test/Driver/hipstdpar.c
===
--- /dev/null
+++ clang/test/Driver/hipstdpar.c
@@ -0,0 +1,17 @@
+// RUN: not %clang -### -hipstdpar --compile %s 2>&1 | \
+// RUN:   FileCheck --check-prefix=HIPSTDPAR-MISSING-LIB %s
+// RUN: %clang -### --hipstdpar --hipstdpar-path=%S/Inputs/hipstdpar \
+// RUN:   --hipstdpar-thrust-path=%S/Inputs/hipstdpar/thrust \
+// RUN:   --hipstdpar-prim-path=%S/Inputs/hipstdpar/rocprim --compile %s 2>&1 | \
+// RUN:   FileCheck --check-prefix=HIPSTDPAR-COMPILE %s
+// RUN: touch %t.o
+// RUN: %clang -### -hipstdpar %t.o 2>&1 | FileCheck --check-prefix=HIPSTDPAR-LINK %s
+
+// HIPSTDPAR-MISSING-LIB: error: cannot find HIP Standard Parallelism Acceleration library; provide it via '--hipstdpar-path'
+// HIPSTDPAR-COMPILE: "-x" "hip"
+// HIPSTDPAR-COMPILE: "-idirafter" "{{.*/thrust}}"
+// HIPSTDPAR-COMPILE: "-idirafter" "{{.*/rocprim}}"
+// HIPSTDPAR-COMPILE: "-idirafter" "{{.*/Inputs/hipstdpar}}"
+// HIPSTDPAR-COMPILE: "-include" "hipstdpar_lib.hpp"
+// HIPSTDPAR-LINK: "-rpath"
+// HIPSTDPAR-LINK: "-l{{.*hip.*}}"
Index: clang/lib/Driver/ToolChains/ROCm.h
===
--- clang/lib/Driver/ToolChains/ROCm.h
+++ clang/lib/Driver/ToolChains/ROCm.h
@@ -77,6 +77,9 @@
   const Driver &D;
   bool HasHIPRuntime = false;
   bool HasDeviceLibrary = false;
+  bool HasHIPStdParLibrary = false;
+  bool HasRocThrustLibrary = false;
+  bool HasRocPrimLibrary = false;
 
   // Default version if not detected or specified.
   const unsigned DefaultVersionMajor = 3;
@@ -96,6 +99,13 @@
   std::vector RocmDeviceLibPathArg;
   // HIP runtime path specified by --hip-path.
   StringRef HIPPathArg;
+  // HIP Standard Parallel Algorithm acceleration library specified by
+  // --hipstdpar-path
+  StringRef HIPStdParPathArg;
+  // rocThrust algorithm library specified by --hipstdpar-thrust-path
+  StringRef HIPRocThrustPathArg;
+  // rocPrim algorithm library specified by --hipstdpar-prim-path
+  StringRef HIPRocPrimPathArg;
   // HIP version specified by --hip-version.
   StringRef HIPVersionArg;
   // Whether -nogpulib is specified.
@@ -180,6 +190,9 @@
   /// Check whether we detected a valid ROCm device library.
   bool hasDeviceLibrary() const { return HasDeviceLibrary; }
 
+  /// Check whether we detected a valid HIP STDPAR Acceleration library.
+  bool hasHIPStdParLibrary() const { return HasHIPStdParLibrary; }
+
   /// Print information about the detected ROCm installation.
   void print(raw_ostream &OS) const;
 
Index: clang/lib/Driver/ToolChains/HIPAMD.cpp
===
--- clang/lib/Driver/ToolChains/HIPAMD.cpp
+++ clang/lib/Driver/ToolChains/HIPAMD.cpp
@@ -115,6 +115,8 @@
 "--no-undefined",
 "-shared",
 "-plugin-opt=-amdgpu-internalize-symbols"};
+  if (Args.hasArg(options::OPT_hipstdpar))
+LldArgs.push_back("-plugin-opt=-amdgpu-enable-hipstdpar");
 
   auto &TC = getToolChain();
   auto &D = TC.getDriver();
@@ -246,6 +248,8 @@
   if (!DriverArgs.hasFlag(options::OPT_fgpu_rdc, options::OPT_fno_gpu_rdc,
   false))
 CC1Args.append({"-mllvm", "-amdgpu-internalize-symbols"});
+  if (DriverArgs.hasArgNoClaim(options::OPT_hipstdpar))
+CC1Args.append({"-mllvm", "-amdgpu-enable-hipstdpar"});
 
   StringRef MaxThreadsPerBlock =
   DriverArgs.getLastArgValue(options::OPT_gpu_max_threads_per_block_EQ);
Index: clang/lib/Driver/ToolChains/Clang.cpp
===
--- clang/lib/Driver/ToolChains/Clang.cpp
+++ clang/lib/Driver/ToolChains/Clang.cpp
@@ -6552,6 +6552,12 @@
 if (Args.hasFlag(options::OPT_fgpu_allow_device_init,
  options::OPT_fno_gpu_allow_device_init, false))
   CmdArgs.push_back("-fgpu-allow-device-init");
+if (Args.hasArg(options::OPT_hipstdpar)) {
+  CmdArgs.push_back("-hipstdpar");
+
+  if (Args.hasArg(options::OPT_hipstdpar_inte

[PATCH] D155826: [HIP][Clang][Preprocessor][RFC] Add preprocessor support for C++ Parallel Algorithm Offload

2023-08-22 Thread Alex Voicu via Phabricator via cfe-commits
AlexVlx updated this revision to Diff 552561.
AlexVlx retitled this revision from "[Clang][Preprocessor][RFC] Add 
preprocessor support for C++ Parallel Algorithm Offload" to 
"[HIP][Clang][Preprocessor][RFC] Add preprocessor support for C++ Parallel 
Algorithm Offload".
AlexVlx added a comment.

Updating this to reflect the outcome of the RFC, which is that this will be 
added as a HIP extension exclusively.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155826/new/

https://reviews.llvm.org/D155826

Files:
  clang/lib/Frontend/InitPreprocessor.cpp
  clang/test/Preprocessor/predefined-macros.c


Index: clang/test/Preprocessor/predefined-macros.c
===
--- clang/test/Preprocessor/predefined-macros.c
+++ clang/test/Preprocessor/predefined-macros.c
@@ -290,3 +290,19 @@
 // RUN:   -fcuda-is-device -fgpu-default-stream=per-thread \
 // RUN:   | FileCheck -match-full-lines %s --check-prefix=CHECK-PTH
 // CHECK-PTH: #define HIP_API_PER_THREAD_DEFAULT_STREAM 1
+
+// RUN: %clang_cc1 %s -E -dM -o - -x hip -hipstdpar -triple x86_64-unknown-linux-gnu \
+// RUN:   | FileCheck -match-full-lines %s --check-prefix=CHECK-HIPSTDPAR
+// CHECK-HIPSTDPAR: #define __HIPSTDPAR__ 1
+
+// RUN: %clang_cc1 %s -E -dM -o - -x hip -hipstdpar -hipstdpar-interpose-alloc \
+// RUN:  -triple x86_64-unknown-linux-gnu | FileCheck -match-full-lines %s \
+// RUN:  --check-prefix=CHECK-HIPSTDPAR-INTERPOSE
+// CHECK-HIPSTDPAR-INTERPOSE: #define __HIPSTDPAR_INTERPOSE_ALLOC__ 1
+// CHECK-HIPSTDPAR-INTERPOSE: #define __HIPSTDPAR__ 1
+
+// RUN: %clang_cc1 %s -E -dM -o - -x hip -hipstdpar -hipstdpar-interpose-alloc \
+// RUN:  -triple amdgcn-amd-amdhsa -fcuda-is-device | FileCheck -match-full-lines \
+// RUN:  %s --check-prefix=CHECK-HIPSTDPAR-INTERPOSE-DEV-NEG
+// CHECK-HIPSTDPAR-INTERPOSE-DEV-NEG: #define __HIPSTDPAR__ 1
+// CHECK-HIPSTDPAR-INTERPOSE-DEV-NEG-NOT: #define __HIPSTDPAR_INTERPOSE_ALLOC__ 1
\ No newline at end of file
Index: clang/lib/Frontend/InitPreprocessor.cpp
===
--- clang/lib/Frontend/InitPreprocessor.cpp
+++ clang/lib/Frontend/InitPreprocessor.cpp
@@ -585,6 +585,11 @@
 Builder.defineMacro("__HIP_MEMORY_SCOPE_WORKGROUP", "3");
 Builder.defineMacro("__HIP_MEMORY_SCOPE_AGENT", "4");
 Builder.defineMacro("__HIP_MEMORY_SCOPE_SYSTEM", "5");
+if (LangOpts.HIPStdPar) {
+  Builder.defineMacro("__HIPSTDPAR__");
+  if (!LangOpts.CUDAIsDevice)
+Builder.defineMacro("__HIPSTDPAR_INTERPOSE_ALLOC__");
+}
 if (LangOpts.CUDAIsDevice) {
   Builder.defineMacro("__HIP_DEVICE_COMPILE__");
   if (!TI.hasHIPImageSupport()) {


Index: clang/test/Preprocessor/predefined-macros.c
===
--- clang/test/Preprocessor/predefined-macros.c
+++ clang/test/Preprocessor/predefined-macros.c
@@ -290,3 +290,19 @@
 // RUN:   -fcuda-is-device -fgpu-default-stream=per-thread \
 // RUN:   | FileCheck -match-full-lines %s --check-prefix=CHECK-PTH
 // CHECK-PTH: #define HIP_API_PER_THREAD_DEFAULT_STREAM 1
+
+// RUN: %clang_cc1 %s -E -dM -o - -x hip -hipstdpar -triple x86_64-unknown-linux-gnu \
+// RUN:   | FileCheck -match-full-lines %s --check-prefix=CHECK-HIPSTDPAR
+// CHECK-HIPSTDPAR: #define __HIPSTDPAR__ 1
+
+// RUN: %clang_cc1 %s -E -dM -o - -x hip -hipstdpar -hipstdpar-interpose-alloc \
+// RUN:  -triple x86_64-unknown-linux-gnu | FileCheck -match-full-lines %s \
+// RUN:  --check-prefix=CHECK-HIPSTDPAR-INTERPOSE
+// CHECK-HIPSTDPAR-INTERPOSE: #define __HIPSTDPAR_INTERPOSE_ALLOC__ 1
+// CHECK-HIPSTDPAR-INTERPOSE: #define __HIPSTDPAR__ 1
+
+// RUN: %clang_cc1 %s -E -dM -o - -x hip -hipstdpar -hipstdpar-interpose-alloc \
+// RUN:  -triple amdgcn-amd-amdhsa -fcuda-is-device | FileCheck -match-full-lines \
+// RUN:  %s --check-prefix=CHECK-HIPSTDPAR-INTERPOSE-DEV-NEG
+// CHECK-HIPSTDPAR-INTERPOSE-DEV-NEG: #define __HIPSTDPAR__ 1
+// CHECK-HIPSTDPAR-INTERPOSE-DEV-NEG-NOT: #define __HIPSTDPAR_INTERPOSE_ALLOC__ 1
\ No newline at end of file
Index: clang/lib/Frontend/InitPreprocessor.cpp
===
--- clang/lib/Frontend/InitPreprocessor.cpp
+++ clang/lib/Frontend/InitPreprocessor.cpp
@@ -585,6 +585,11 @@
 Builder.defineMacro("__HIP_MEMORY_SCOPE_WORKGROUP", "3");
 Builder.defineMacro("__HIP_MEMORY_SCOPE_AGENT", "4");
 Builder.defineMacro("__HIP_MEMORY_SCOPE_SYSTEM", "5");
+if (LangOpts.HIPStdPar) {
+  Builder.defineMacro("__HIPSTDPAR__");
+  if (!LangOpts.CUDAIsDevice)
+Builder.defineMacro("__HIPSTDPAR_INTERPOSE_ALLOC__");
+}
 if (LangOpts.CUDAIsDevice) {
   Builder.defineMacro("__HIP_DEVICE_COMPILE__");
   if (!TI.hasHIPImageSupport()) {
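
(As a usage sketch -- illustrative only, not part of the patch, and with the guards
being my own assumption -- client code can key off the macros added above:

  #if defined(__HIPSTDPAR__)
    // Compiled in -hipstdpar mode; algorithm offload is enabled.
  #  if defined(__HIPSTDPAR_INTERPOSE_ALLOC__)
    // Host-side compilation with allocation interposition in effect.
  #  endif
  #endif

per the test above, the interposition macro is only defined for the host-side
compilation.)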
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D155833: [HIP][Clang][Sema][RFC] Add Sema support for C++ Parallel Algorithm Offload

2023-08-22 Thread Alex Voicu via Phabricator via cfe-commits
AlexVlx updated this revision to Diff 552568.
AlexVlx retitled this revision from "[Clang][Sema][RFC] Add Sema support for 
C++ Parallel Algorithm Offload" to "[HIP][Clang][Sema][RFC] Add Sema support 
for C++ Parallel Algorithm Offload".
AlexVlx added a comment.

Updating this to reflect the outcome of the RFC, which is that it will be added 
as a HIP extension exclusively.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155833/new/

https://reviews.llvm.org/D155833

Files:
  clang/lib/Sema/SemaCUDA.cpp
  clang/lib/Sema/SemaExpr.cpp
  clang/lib/Sema/SemaStmtAsm.cpp
  clang/test/SemaHipStdPar/Inputs/hipstdpar_lib.hpp
  clang/test/SemaHipStdPar/device-can-call-host.cpp

Index: clang/test/SemaHipStdPar/device-can-call-host.cpp
===
--- /dev/null
+++ clang/test/SemaHipStdPar/device-can-call-host.cpp
@@ -0,0 +1,93 @@
+// RUN: %clang_cc1 -x hip %s -hipstdpar -triple amdgcn-amd-amdhsa --std=c++17 \
+// RUN:   -fcuda-is-device -emit-llvm -o /dev/null -verify
+
+// Note: These would happen implicitly, within the implementation of the
+//   accelerator specific algorithm library, and not from user code.
+
+// Calls from the accelerator side to implicitly host (i.e. unannotated)
+// functions are fine.
+
+// expected-no-diagnostics
+
+#define __device__ __attribute__((device))
+#define __global__ __attribute__((global))
+
+extern "C" void host_fn() {}
+
+struct Dummy {};
+
+struct S {
+  S() {}
+  ~S() { host_fn(); }
+
+  int x;
+};
+
+struct T {
+  __device__ void hd() { host_fn(); }
+
+  __device__ void hd3();
+
+  void h() {}
+
+  void operator+();
+  void operator-(const T&) {}
+
+  operator Dummy() { return Dummy(); }
+};
+
+__device__ void T::hd3() { host_fn(); }
+
+template <typename T> __device__ void hd2() { host_fn(); }
+
+__global__ void kernel() { hd2<int>(); }
+
+__device__ void hd() { host_fn(); }
+
+template <typename T> __device__ void hd3() { host_fn(); }
+__device__ void device_fn() { hd3<int>(); }
+
+__device__ void local_var() {
+  S s;
+}
+
+__device__ void explicit_destructor(S *s) {
+  s->~S();
+}
+
+__device__ void hd_member_fn() {
+  T t;
+
+  t.hd();
+}
+
+__device__ void h_member_fn() {
+  T t;
+  t.h();
+}
+
+__device__ void unaryOp() {
+  T t;
+  (void) +t;
+}
+
+__device__ void binaryOp() {
+  T t;
+  (void) (t - t);
+}
+
+__device__ void implicitConversion() {
+  T t;
+  Dummy d = t;
+}
+
+template <typename T>
+struct TmplStruct {
+  template <typename U> __device__ void fn() {}
+};
+
+template <>
+template <>
+__device__ void TmplStruct<int>::fn<int>() { host_fn(); }
+
+__device__ void double_specialization() { TmplStruct<int>().fn<int>(); }
Index: clang/lib/Sema/SemaStmtAsm.cpp
===
--- clang/lib/Sema/SemaStmtAsm.cpp
+++ clang/lib/Sema/SemaStmtAsm.cpp
@@ -271,7 +271,8 @@
   OutputName = Names[i]->getName();
 
 TargetInfo::ConstraintInfo Info(Literal->getString(), OutputName);
-if (!Context.getTargetInfo().validateOutputConstraint(Info)) {
+if (!Context.getTargetInfo().validateOutputConstraint(Info) &&
+!(LangOpts.HIPStdPar && LangOpts.CUDAIsDevice)) {
   targetDiag(Literal->getBeginLoc(),
  diag::err_asm_invalid_output_constraint)
   << Info.getConstraintStr();
Index: clang/lib/Sema/SemaExpr.cpp
===
--- clang/lib/Sema/SemaExpr.cpp
+++ clang/lib/Sema/SemaExpr.cpp
@@ -19109,7 +19109,7 @@
   // Diagnose ODR-use of host global variables in device functions.
   // Reference of device global variables in host functions is allowed
   // through shadow variables therefore it is not diagnosed.
-  if (SemaRef.LangOpts.CUDAIsDevice) {
+  if (SemaRef.LangOpts.CUDAIsDevice && !SemaRef.LangOpts.HIPStdPar) {
 SemaRef.targetDiag(Loc, diag::err_ref_bad_target)
 << /*host*/ 2 << /*variable*/ 1 << Var << UserTarget;
 SemaRef.targetDiag(Var->getLocation(),
Index: clang/lib/Sema/SemaCUDA.cpp
===
--- clang/lib/Sema/SemaCUDA.cpp
+++ clang/lib/Sema/SemaCUDA.cpp
@@ -231,6 +231,15 @@
   (CallerTarget == CFT_Global && CalleeTarget == CFT_Device))
 return CFP_Native;
 
+  // HipStdPar mode is special, in that assessing whether a device side call to
+  // a host target is deferred to a subsequent pass, and cannot unambiguously be
+  // adjudicated in the AST, hence we optimistically allow them to pass here.
+  if (getLangOpts().HIPStdPar &&
+  (CallerTarget == CFT_Global || CallerTarget == CFT_Device ||
+   CallerTarget == CFT_HostDevice) &&
+  CalleeTarget == CFT_Host)
+return CFP_HostDevice;
+
   // (d) HostDevice behavior depends on compilation mode.
   if (CallerTarget == CFT_HostDevice) {
 // It's OK to call a compilation-mode matching function from an HD one.
@@ -877,7 +886,7 @@
   if (!ShouldCheck || !Capture.isReferenceCapture())
 return;
   auto DiagKind = SemaDiagnosticBuilder::K_Deferre

[PATCH] D155850: [Clang][CodeGen][RFC] Add codegen support for C++ Parallel Algorithm Offload

2023-08-22 Thread Alex Voicu via Phabricator via cfe-commits
AlexVlx updated this revision to Diff 552575.
AlexVlx edited the summary of this revision.
AlexVlx added a comment.

Updating to reflect the outcome of the RFC, which is that this will be added as 
a HIP extension exclusively.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155850/new/

https://reviews.llvm.org/D155850

Files:
  clang/lib/CodeGen/BackendUtil.cpp
  clang/lib/CodeGen/CGBuiltin.cpp
  clang/lib/CodeGen/CGStmt.cpp
  clang/lib/CodeGen/CodeGenFunction.cpp
  clang/lib/CodeGen/CodeGenModule.cpp
  clang/test/CodeGenHipStdPar/unannotated-functions-get-emitted.cpp
  clang/test/CodeGenHipStdPar/unsupported-ASM.cpp
  clang/test/CodeGenHipStdPar/unsupported-builtins.cpp

Index: clang/test/CodeGenHipStdPar/unsupported-builtins.cpp
===
--- /dev/null
+++ clang/test/CodeGenHipStdPar/unsupported-builtins.cpp
@@ -0,0 +1,8 @@
+// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -aux-triple x86_64-unknown-linux-gnu \
+// RUN:   --hipstdpar -x hip -emit-llvm -fcuda-is-device -o - %s | FileCheck %s
+
+#define __global__ __attribute__((global))
+
+__global__ void foo() { return __builtin_ia32_pause(); }
+
+// CHECK: declare void @__builtin_ia32_pause__hipstdpar_unsupported()
Index: clang/test/CodeGenHipStdPar/unsupported-ASM.cpp
===
--- /dev/null
+++ clang/test/CodeGenHipStdPar/unsupported-ASM.cpp
@@ -0,0 +1,10 @@
+// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -aux-triple x86_64-unknown-linux-gnu \
+// RUN:   --hipstdpar -x hip -emit-llvm -fcuda-is-device -o - %s | FileCheck %s
+
+#define __global__ __attribute__((global))
+
+__global__ void foo(int i) {
+asm ("addl %2, %1; seto %b0" : "=q" (i), "+g" (i) : "r" (i));
+}
+
+// CHECK: declare void @__ASM__hipstdpar_unsupported([{{.*}}])
Index: clang/test/CodeGenHipStdPar/unannotated-functions-get-emitted.cpp
===
--- /dev/null
+++ clang/test/CodeGenHipStdPar/unannotated-functions-get-emitted.cpp
@@ -0,0 +1,19 @@
+// RUN: %clang_cc1 -x hip -emit-llvm -fcuda-is-device \
+// RUN:   -o - %s | FileCheck --check-prefix=NO-HIPSTDPAR-DEV %s
+
+// RUN: %clang_cc1 --hipstdpar -emit-llvm -fcuda-is-device \
+// RUN:   -o - %s | FileCheck --check-prefix=HIPSTDPAR-DEV %s
+
+#define __device__ __attribute__((device))
+
+// NO-HIPSTDPAR-DEV-NOT: define {{.*}} void @_Z3fooPff({{.*}})
+// HIPSTDPAR-DEV: define {{.*}} void @_Z3fooPff({{.*}})
+void foo(float *a, float b) {
+  *a = b;
+}
+
+// NO-HIPSTDPAR-DEV: define {{.*}} void @_Z3barPff({{.*}})
+// HIPSTDPAR-DEV: define {{.*}} void @_Z3barPff({{.*}})
+__device__ void bar(float *a, float b) {
+  *a = b;
+}
Index: clang/lib/CodeGen/CodeGenModule.cpp
===
--- clang/lib/CodeGen/CodeGenModule.cpp
+++ clang/lib/CodeGen/CodeGenModule.cpp
@@ -3558,7 +3558,10 @@
   !Global->hasAttr<CUDAConstantAttr>() &&
   !Global->hasAttr<CUDASharedAttr>() &&
   !Global->getType()->isCUDADeviceBuiltinSurfaceType() &&
-  !Global->getType()->isCUDADeviceBuiltinTextureType())
+  !Global->getType()->isCUDADeviceBuiltinTextureType() &&
+  !(LangOpts.HIPStdPar &&
+isa<FunctionDecl>(Global) &&
+!Global->hasAttr<CUDAHostAttr>()))
 return;
 } else {
   // We need to emit host-side 'shadows' for all global
Index: clang/lib/CodeGen/CodeGenFunction.cpp
===
--- clang/lib/CodeGen/CodeGenFunction.cpp
+++ clang/lib/CodeGen/CodeGenFunction.cpp
@@ -2594,10 +2594,15 @@
   std::string MissingFeature;
   llvm::StringMap<bool> CallerFeatureMap;
   CGM.getContext().getFunctionFeatureMap(CallerFeatureMap, FD);
+  // When compiling in HipStdPar mode we have to be conservative in rejecting
+  // target specific features in the FE, and defer the possible error to the
+  // AcceleratorCodeSelection pass, wherein iff an unsupported target builtin is
+  // referenced by an accelerator executable function, we emit an error.
+  bool IsHipStdPar = getLangOpts().HIPStdPar && getLangOpts().CUDAIsDevice;
   if (BuiltinID) {
 StringRef FeatureList(CGM.getContext().BuiltinInfo.getRequiredFeatures(BuiltinID));
 if (!Builtin::evaluateRequiredTargetFeatures(
-FeatureList, CallerFeatureMap)) {
+FeatureList, CallerFeatureMap) && !IsHipStdPar) {
   CGM.getDiags().Report(Loc, diag::err_builtin_needs_feature)
   << TargetDecl->getDeclName()
   << FeatureList;
@@ -2630,7 +2635,7 @@
 return false;
   }
   return true;
-}))
+}) && !IsHipStdPar)
   CGM.getDiags().Report(Loc, diag::err_function_needs_feature)
   << FD->getDeclName() << TargetDecl->getDeclName() << MissingFeature;
   } else if (!FD->isMultiVersion() && FD->hasAttr<TargetAttr>()) {
@@ -2639,7 +2644,8 @@
 
 for (const auto &F : CalleeFeatureMap) {
   if (F.getValue() && (!CallerFeatureMap.lookup(F.getKey()) ||
-

[PATCH] D155775: [HIP][Clang][Driver][RFC] Add driver support for C++ Parallel Algorithm Offload

2023-08-23 Thread Alex Voicu via Phabricator via cfe-commits
AlexVlx updated this revision to Diff 552738.
AlexVlx added a comment.

Use `--hipstdpar` instead of `-hipstdpar` in the test.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155775/new/

https://reviews.llvm.org/D155775

Files:
  clang/include/clang/Basic/DiagnosticDriverKinds.td
  clang/include/clang/Basic/LangOptions.def
  clang/include/clang/Driver/Options.td
  clang/lib/Driver/Driver.cpp
  clang/lib/Driver/ToolChains/AMDGPU.cpp
  clang/lib/Driver/ToolChains/Clang.cpp
  clang/lib/Driver/ToolChains/HIPAMD.cpp
  clang/lib/Driver/ToolChains/ROCm.h
  clang/test/Driver/Inputs/hipstdpar/hipstdpar_lib.hpp
  clang/test/Driver/hipstdpar.c

Index: clang/test/Driver/hipstdpar.c
===
--- /dev/null
+++ clang/test/Driver/hipstdpar.c
@@ -0,0 +1,17 @@
+// RUN: not %clang -### --hipstdpar --compile %s 2>&1 | \
+// RUN:   FileCheck --check-prefix=HIPSTDPAR-MISSING-LIB %s
+// RUN: %clang -### --hipstdpar --hipstdpar-path=%S/Inputs/hipstdpar \
+// RUN:   --hipstdpar-thrust-path=%S/Inputs/hipstdpar/thrust \
+// RUN:   --hipstdpar-prim-path=%S/Inputs/hipstdpar/rocprim --compile %s 2>&1 | \
+// RUN:   FileCheck --check-prefix=HIPSTDPAR-COMPILE %s
+// RUN: touch %t.o
+// RUN: %clang -### --hipstdpar %t.o 2>&1 | FileCheck --check-prefix=HIPSTDPAR-LINK %s
+
+// HIPSTDPAR-MISSING-LIB: error: cannot find HIP Standard Parallelism Acceleration library; provide it via '--hipstdpar-path'
+// HIPSTDPAR-COMPILE: "-x" "hip"
+// HIPSTDPAR-COMPILE: "-idirafter" "{{.*/thrust}}"
+// HIPSTDPAR-COMPILE: "-idirafter" "{{.*/rocprim}}"
+// HIPSTDPAR-COMPILE: "-idirafter" "{{.*/Inputs/hipstdpar}}"
+// HIPSTDPAR-COMPILE: "-include" "hipstdpar_lib.hpp"
+// HIPSTDPAR-LINK: "-rpath"
+// HIPSTDPAR-LINK: "-l{{.*hip.*}}"
Index: clang/lib/Driver/ToolChains/ROCm.h
===
--- clang/lib/Driver/ToolChains/ROCm.h
+++ clang/lib/Driver/ToolChains/ROCm.h
@@ -77,6 +77,9 @@
   const Driver &D;
   bool HasHIPRuntime = false;
   bool HasDeviceLibrary = false;
+  bool HasHIPStdParLibrary = false;
+  bool HasRocThrustLibrary = false;
+  bool HasRocPrimLibrary = false;
 
   // Default version if not detected or specified.
   const unsigned DefaultVersionMajor = 3;
@@ -96,6 +99,13 @@
   std::vector<std::string> RocmDeviceLibPathArg;
   // HIP runtime path specified by --hip-path.
   StringRef HIPPathArg;
+  // HIP Standard Parallel Algorithm acceleration library specified by
+  // --hipstdpar-path
+  StringRef HIPStdParPathArg;
+  // rocThrust algorithm library specified by --hipstdpar-thrust-path
+  StringRef HIPRocThrustPathArg;
+  // rocPrim algorithm library specified by --hipstdpar-prim-path
+  StringRef HIPRocPrimPathArg;
   // HIP version specified by --hip-version.
   StringRef HIPVersionArg;
   // Wheter -nogpulib is specified.
@@ -180,6 +190,9 @@
   /// Check whether we detected a valid ROCm device library.
   bool hasDeviceLibrary() const { return HasDeviceLibrary; }
 
+  /// Check whether we detected a valid HIP STDPAR Acceleration library.
+  bool hasHIPStdParLibrary() const { return HasHIPStdParLibrary; }
+
   /// Print information about the detected ROCm installation.
   void print(raw_ostream &OS) const;
 
Index: clang/lib/Driver/ToolChains/HIPAMD.cpp
===
--- clang/lib/Driver/ToolChains/HIPAMD.cpp
+++ clang/lib/Driver/ToolChains/HIPAMD.cpp
@@ -115,6 +115,8 @@
 "--no-undefined",
 "-shared",
 "-plugin-opt=-amdgpu-internalize-symbols"};
+  if (Args.hasArg(options::OPT_hipstdpar))
+LldArgs.push_back("-plugin-opt=-amdgpu-enable-hipstdpar");
 
   auto &TC = getToolChain();
   auto &D = TC.getDriver();
@@ -246,6 +248,8 @@
   if (!DriverArgs.hasFlag(options::OPT_fgpu_rdc, options::OPT_fno_gpu_rdc,
   false))
 CC1Args.append({"-mllvm", "-amdgpu-internalize-symbols"});
+  if (DriverArgs.hasArgNoClaim(options::OPT_hipstdpar))
+CC1Args.append({"-mllvm", "-amdgpu-enable-hipstdpar"});
 
   StringRef MaxThreadsPerBlock =
   DriverArgs.getLastArgValue(options::OPT_gpu_max_threads_per_block_EQ);
Index: clang/lib/Driver/ToolChains/Clang.cpp
===
--- clang/lib/Driver/ToolChains/Clang.cpp
+++ clang/lib/Driver/ToolChains/Clang.cpp
@@ -6552,6 +6552,12 @@
 if (Args.hasFlag(options::OPT_fgpu_allow_device_init,
  options::OPT_fno_gpu_allow_device_init, false))
   CmdArgs.push_back("-fgpu-allow-device-init");
+if (Args.hasArg(options::OPT_hipstdpar)) {
+  CmdArgs.push_back("-hipstdpar");
+
+  if (Args.hasArg(options::OPT_hipstdpar_interpose_alloc))
+CmdArgs.push_back("-hipstdpar-interpose-alloc");
+}
 Args.addOptInFlag(CmdArgs, options::OPT_fhip_kernel_arg_name,
   options::OPT_fno_hip_kernel_arg_name);
   }
Index: clang/lib/Driver/ToolChains/AMDGPU.cpp

[PATCH] D155769: [HIP][Clang][docs][RFC] Add documentation for C++ Parallel Algorithm Offload

2023-08-23 Thread Alex Voicu via Phabricator via cfe-commits
AlexVlx updated this revision to Diff 552754.
AlexVlx added a comment.
Herald added subscribers: llvm-commits, hiraditya.
Herald added a reviewer: kiranchandramohan.
Herald added a project: LLVM.

Update per review comments.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155769/new/

https://reviews.llvm.org/D155769

Files:
  clang/docs/HIPSupport.rst
  flang/lib/Semantics/resolve-directives.cpp
  lld/tools/lld/lld.cpp
  llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp
  llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp
  llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp
  llvm/test/CodeGen/AArch64/dbg-declare-swift-async.ll
  llvm/test/DebugInfo/AArch64/dbg-entry-value-swiftasync.mir

Index: llvm/test/DebugInfo/AArch64/dbg-entry-value-swiftasync.mir
===
--- llvm/test/DebugInfo/AArch64/dbg-entry-value-swiftasync.mir
+++ llvm/test/DebugInfo/AArch64/dbg-entry-value-swiftasync.mir
@@ -36,7 +36,7 @@
   - { id: 0, name: '', type: spill-slot, offset: -16, size: 8, alignment: 16,
   stack-id: default, callee-saved-register: '$lr', callee-saved-restored: true }
 entry_values:
-  - { entry-value-register: '$x22', debug-info-variable: '!10', debug-info-expression: '!DIExpression(DW_OP_LLVM_entry_value, 1, DW_OP_deref)',
+  - { entry-value-register: '$x22', debug-info-variable: '!10', debug-info-expression: '!DIExpression(DW_OP_LLVM_entry_value, 1)',
   debug-info-location: '!12' }
 body: |
   bb.0 (%ir-block.0):
Index: llvm/test/CodeGen/AArch64/dbg-declare-swift-async.ll
===
--- llvm/test/CodeGen/AArch64/dbg-declare-swift-async.ll
+++ llvm/test/CodeGen/AArch64/dbg-declare-swift-async.ll
@@ -3,11 +3,11 @@
 ; RUN: llc -O0 -fast-isel=false -global-isel=false -stop-after=finalize-isel %s -o - | FileCheck %s
 
 ; CHECK: void @foo
-; CHECK-NEXT: dbg.declare(metadata {{.*}}, metadata ![[VAR:.*]], metadata !DIExpression([[EXPR:.*]])), !dbg ![[LOC:.*]]
+; CHECK-NEXT: dbg.declare(metadata {{.*}}, metadata ![[VAR:.*]], metadata ![[EXPR:.*]]), !dbg ![[LOC:.*]]
 ; CHECK: entry_values:
-; CHECK-NEXT: entry-value-register: '$x22', debug-info-variable: '![[VAR]]', debug-info-expression: '!DIExpression([[EXPR]], DW_OP_deref)',
+; CHECK-NEXT: entry-value-register: '$x22', debug-info-variable: '![[VAR]]', debug-info-expression: '![[EXPR]]',
 ; CHECK-NEXT:   debug-info-location: '![[LOC]]
-; CHECK-NEXT: entry-value-register: '$x22', debug-info-variable: '![[VAR]]', debug-info-expression: '!DIExpression([[EXPR]], DW_OP_deref)'
+; CHECK-NEXT: entry-value-register: '$x22', debug-info-variable: '![[VAR]]', debug-info-expression: '![[EXPR]]'
 ; CHECK-NEXT:   debug-info-location: '![[LOC]]
 
 ; CHECK-NOT: DBG_VALUE
Index: llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp
===
--- llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp
+++ llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp
@@ -1357,8 +1357,6 @@
   // Find the corresponding livein physical register to this argument.
   for (auto [PhysReg, VirtReg] : FuncInfo.RegInfo->liveins())
 if (VirtReg == ArgVReg) {
-  // Append an op deref to account for the fact that this is a dbg_declare.
-  Expr = DIExpression::append(Expr, dwarf::DW_OP_deref);
   FuncInfo.MF->setVariableDbgInfo(Var, Expr, PhysReg, DbgLoc);
   LLVM_DEBUG(dbgs() << "processDbgDeclare: setVariableDbgInfo Var=" << *Var
 << ", Expr=" << *Expr << ",  MCRegister=" << PhysReg
Index: llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp
===
--- llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp
+++ llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp
@@ -1943,8 +1943,6 @@
   if (!PhysReg)
 return false;
 
-  // Append an op deref to account for the fact that this is a dbg_declare.
-  Expr = DIExpression::append(Expr, dwarf::DW_OP_deref);
   MF->setVariableDbgInfo(DebugInst.getVariable(), Expr, *PhysReg,
  DebugInst.getDebugLoc());
   return true;
Index: llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp
===
--- llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp
+++ llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp
@@ -1565,7 +1565,7 @@
 if (VI.inStackSlot())
   RegVar->initializeMMI(VI.Expr, VI.getStackSlot());
 else {
-  MachineLocation MLoc(VI.getEntryValueRegister(), /*IsIndirect*/ false);
+  MachineLocation MLoc(VI.getEntryValueRegister(), /*IsIndirect*/ true);
   auto LocEntry = DbgValueLocEntry(MLoc);
   RegVar->initializeDbgValue(DbgValueLoc(VI.Expr, LocEntry));
 }
Index: lld/tools/lld/lld.cpp
===
--- lld/tools/lld/lld.cpp
+++ lld/tools/lld/lld.cpp
@@ -30,6 +30,7 @@
 #include "lld/Common/Memory.h"
 #include "llvm/ADT/STLExtras.h"
 #include "llvm/ADT/SmallVector.h"
+#inclu

[PATCH] D155769: [HIP][Clang][docs][RFC] Add documentation for C++ Parallel Algorithm Offload

2023-08-23 Thread Alex Voicu via Phabricator via cfe-commits
AlexVlx updated this revision to Diff 552763.
AlexVlx added a comment.

Remove noise.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155769/new/

https://reviews.llvm.org/D155769

Files:
  clang/docs/HIPSupport.rst

Index: clang/docs/HIPSupport.rst
===
--- clang/docs/HIPSupport.rst
+++ clang/docs/HIPSupport.rst
@@ -176,3 +176,391 @@
* - ``HIP_API_PER_THREAD_DEFAULT_STREAM``
  - Alias to ``__HIP_API_PER_THREAD_DEFAULT_STREAM__``. Deprecated.
 
+C++ Standard Parallelism Offload Support: Compiler And Runtime
+==
+
+Introduction
+
+
+This document describes the implementation of support for offloading the
+execution of standard C++ algorithms to accelerators that can be targeted via
+HIP. Furthermore, it enumerates restrictions on user defined code, as well as
+the interactions with runtimes.
+
+Algorithm Offload: What, Why, Where
+===
+
+C++17 introduced overloads
+`for most algorithms in the standard library `_
+which allow the user to specify a desired
+`execution policy `_.
+The `parallel_unsequenced_policy `_
+maps relatively well to the execution model of many accelerators, such as GPUs.
+This, coupled with the ubiquity of GPU accelerated algorithm libraries that
+implement most / all corresponding algorithms in the standard library
+(e.g. `rocThrust `_), makes
+it feasible to provide seamless accelerator offload for supported algorithms,
+when an accelerated version exists. Thus, it becomes possible to easily access
+the computational resources of an accelerator, via a well specified, familiar,
+algorithmic interface, without having to delve into low-level hardware specific
+details. Putting it all together:
+
+- **What**: standard library algorithms, when invoked with the
+  ``parallel_unsequenced_policy``
+- **Why**: democratise accelerator programming, without loss of user familiarity
+- **Where**: any and all accelerators that can be targeted by Clang/LLVM via HIP
+
+Small Example
+=
+
+Given the following C++ code, which assumes the ``std`` namespace is included:
+
+.. code-block:: C++
+
+   bool has_the_answer(const vector<int>& v) {
+ return find(execution::par_unseq, cbegin(v), cend(v), 42) != cend(v);
+   }
+
+if Clang is invoked with the ``-hipstdpar --offload-arch=foo`` flags, the call
+to ``find`` will be offloaded to an accelerator that is part of the ``foo``
+target family. If either ``foo`` or its runtime environment do not support
+transparent on-demand paging (such as e.g. that provided in Linux via
+`HMM `_), it is necessary to also include
+the ``--hipstdpar-interpose-alloc`` flag. If the accelerator specific algorithm
+library ``foo`` uses doesn't have an implementation of a particular algorithm,
+execution seamlessly falls back to the host CPU. It is legal to specify multiple
+``--offload-arch``s. All the flags we introduce, as well as a thorough view of
+various restrictions and their implications will be provided below.
+
+Implementation - General View
+=
+
+We built support for Algorithm Offload support atop the pre-existing HIP
+infrastructure. More specifically, when one requests offload via ``-hipstdpar``,
+compilation is switched to HIP compilation, as if ``-x hip`` was specified.
+Similarly, linking is also switched to HIP linking, as if ``--hip-link`` was
+specified. Note that these are implicit, and one should not assume that any
+interop with HIP specific language constructs is available e.g. ``__device__``
+annotations are neither necessary nor guaranteed to work.
+
+Since there are no language restriction mechanisms in place, it is necessary to
+relax HIP language specific semantic checks performed by the FE; they would
+identify otherwise valid, offloadable code, as invalid HIP code. Given that we
+know that the user intended only for certain algorithms to be offloaded, and
+encoded this by specifying the ``parallel_unsequenced_policy``, we rely on a
+pass over IR to clean up any and all code that was not "meant" for offload. If
+requested, allocation interposition is also handled via a separate pass over IR.
+
+To interface with the client HIP runtime, and to forward offloaded algorithm
+invocations to the corresponding accelerator specific library implementation, an
+implementation detail forwarding header is implicitly included by the driver,
+when compiling with ``-hipstdpar``. In what follows, we will delve into each
+component that contributes to implementing Algorithm Offload support.
+
+Implementation - Driver
+===
+
+We augment the ``cl

[PATCH] D157452: [RFC][Clang][Codegen] `std::type_info` needs special care with explicit address spaces

2023-08-23 Thread Alex Voicu via Phabricator via cfe-commits
AlexVlx added a comment.

Gentle ping.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D157452/new/

https://reviews.llvm.org/D157452

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D155850: [HIP][Clang][CodeGen][RFC] Add codegen support for C++ Parallel Algorithm Offload

2023-10-10 Thread Alex Voicu via Phabricator via cfe-commits
AlexVlx updated this revision to Diff 557672.
AlexVlx removed reviewers: tra, jlebar.
AlexVlx added a comment.

Rebase.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155850/new/

https://reviews.llvm.org/D155850

Files:
  clang/lib/CodeGen/BackendUtil.cpp
  clang/lib/CodeGen/CGBuiltin.cpp
  clang/lib/CodeGen/CGStmt.cpp
  clang/lib/CodeGen/CMakeLists.txt
  clang/lib/CodeGen/CodeGenFunction.cpp
  clang/lib/CodeGen/CodeGenModule.cpp
  clang/test/CodeGenHipStdPar/unannotated-functions-get-emitted.cpp
  clang/test/CodeGenHipStdPar/unsupported-ASM.cpp
  clang/test/CodeGenHipStdPar/unsupported-builtins.cpp

Index: clang/test/CodeGenHipStdPar/unsupported-builtins.cpp
===
--- /dev/null
+++ clang/test/CodeGenHipStdPar/unsupported-builtins.cpp
@@ -0,0 +1,8 @@
+// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -aux-triple x86_64-unknown-linux-gnu \
+// RUN:   --hipstdpar -x hip -emit-llvm -fcuda-is-device -o - %s | FileCheck %s
+
+#define __global__ __attribute__((global))
+
+__global__ void foo() { return __builtin_ia32_pause(); }
+
+// CHECK: declare void @__builtin_ia32_pause__hipstdpar_unsupported()
Index: clang/test/CodeGenHipStdPar/unsupported-ASM.cpp
===
--- /dev/null
+++ clang/test/CodeGenHipStdPar/unsupported-ASM.cpp
@@ -0,0 +1,10 @@
+// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -aux-triple x86_64-unknown-linux-gnu \
+// RUN:   --hipstdpar -x hip -emit-llvm -fcuda-is-device -o - %s | FileCheck %s
+
+#define __global__ __attribute__((global))
+
+__global__ void foo(int i) {
+asm ("addl %2, %1; seto %b0" : "=q" (i), "+g" (i) : "r" (i));
+}
+
+// CHECK: declare void @__ASM__hipstdpar_unsupported([{{.*}}])
Index: clang/test/CodeGenHipStdPar/unannotated-functions-get-emitted.cpp
===
--- /dev/null
+++ clang/test/CodeGenHipStdPar/unannotated-functions-get-emitted.cpp
@@ -0,0 +1,19 @@
+// RUN: %clang_cc1 -x hip -emit-llvm -fcuda-is-device \
+// RUN:   -o - %s | FileCheck --check-prefix=NO-HIPSTDPAR-DEV %s
+
+// RUN: %clang_cc1 --hipstdpar -emit-llvm -fcuda-is-device \
+// RUN:   -o - %s | FileCheck --check-prefix=HIPSTDPAR-DEV %s
+
+#define __device__ __attribute__((device))
+
+// NO-HIPSTDPAR-DEV-NOT: define {{.*}} void @_Z3fooPff({{.*}})
+// HIPSTDPAR-DEV: define {{.*}} void @_Z3fooPff({{.*}})
+void foo(float *a, float b) {
+  *a = b;
+}
+
+// NO-HIPSTDPAR-DEV: define {{.*}} void @_Z3barPff({{.*}})
+// HIPSTDPAR-DEV: define {{.*}} void @_Z3barPff({{.*}})
+__device__ void bar(float *a, float b) {
+  *a = b;
+}
Index: clang/lib/CodeGen/CodeGenModule.cpp
===
--- clang/lib/CodeGen/CodeGenModule.cpp
+++ clang/lib/CodeGen/CodeGenModule.cpp
@@ -3585,7 +3585,10 @@
   !Global->hasAttr<CUDAConstantAttr>() &&
   !Global->hasAttr<CUDASharedAttr>() &&
   !Global->getType()->isCUDADeviceBuiltinSurfaceType() &&
-  !Global->getType()->isCUDADeviceBuiltinTextureType())
+  !Global->getType()->isCUDADeviceBuiltinTextureType() &&
+  !(LangOpts.HIPStdPar &&
+isa<FunctionDecl>(Global) &&
+!Global->hasAttr<CUDAHostAttr>()))
 return;
 } else {
   // We need to emit host-side 'shadows' for all global
Index: clang/lib/CodeGen/CodeGenFunction.cpp
===
--- clang/lib/CodeGen/CodeGenFunction.cpp
+++ clang/lib/CodeGen/CodeGenFunction.cpp
@@ -2595,10 +2595,15 @@
   std::string MissingFeature;
   llvm::StringMap<bool> CallerFeatureMap;
   CGM.getContext().getFunctionFeatureMap(CallerFeatureMap, FD);
+  // When compiling in HipStdPar mode we have to be conservative in rejecting
+  // target specific features in the FE, and defer the possible error to the
+  // AcceleratorCodeSelection pass, wherein iff an unsupported target builtin is
+  // referenced by an accelerator executable function, we emit an error.
+  bool IsHipStdPar = getLangOpts().HIPStdPar && getLangOpts().CUDAIsDevice;
   if (BuiltinID) {
 StringRef FeatureList(CGM.getContext().BuiltinInfo.getRequiredFeatures(BuiltinID));
 if (!Builtin::evaluateRequiredTargetFeatures(
-FeatureList, CallerFeatureMap)) {
+FeatureList, CallerFeatureMap) && !IsHipStdPar) {
   CGM.getDiags().Report(Loc, diag::err_builtin_needs_feature)
   << TargetDecl->getDeclName()
   << FeatureList;
@@ -2631,7 +2636,7 @@
 return false;
   }
   return true;
-}))
+}) && !IsHipStdPar)
   CGM.getDiags().Report(Loc, diag::err_function_needs_feature)
   << FD->getDeclName() << TargetDecl->getDeclName() << MissingFeature;
   } else if (!FD->isMultiVersion() && FD->hasAttr<TargetAttr>()) {
@@ -2640,7 +2645,8 @@
 
 for (const auto &F : CalleeFeatureMap) {
   if (F.getValue() && (!CallerFeatureMap.lookup(F.getKey()) ||
-   !CallerFeatureMap.find(F.getKey())->getValue()))
+  

[PATCH] D155850: [HIP][Clang][CodeGen][RFC] Add codegen support for C++ Parallel Algorithm Offload

2023-10-10 Thread Alex Voicu via Phabricator via cfe-commits
AlexVlx updated this revision to Diff 557673.
AlexVlx added a comment.

Use unmangled names in test.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155850/new/

https://reviews.llvm.org/D155850

Files:
  clang/lib/CodeGen/BackendUtil.cpp
  clang/lib/CodeGen/CGBuiltin.cpp
  clang/lib/CodeGen/CGStmt.cpp
  clang/lib/CodeGen/CMakeLists.txt
  clang/lib/CodeGen/CodeGenFunction.cpp
  clang/lib/CodeGen/CodeGenModule.cpp
  clang/test/CodeGenHipStdPar/unannotated-functions-get-emitted.cpp
  clang/test/CodeGenHipStdPar/unsupported-ASM.cpp
  clang/test/CodeGenHipStdPar/unsupported-builtins.cpp

Index: clang/test/CodeGenHipStdPar/unsupported-builtins.cpp
===
--- /dev/null
+++ clang/test/CodeGenHipStdPar/unsupported-builtins.cpp
@@ -0,0 +1,8 @@
+// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -aux-triple x86_64-unknown-linux-gnu \
+// RUN:   --hipstdpar -x hip -emit-llvm -fcuda-is-device -o - %s | FileCheck %s
+
+#define __global__ __attribute__((global))
+
+__global__ void foo() { return __builtin_ia32_pause(); }
+
+// CHECK: declare void @__builtin_ia32_pause__hipstdpar_unsupported()
Index: clang/test/CodeGenHipStdPar/unsupported-ASM.cpp
===
--- /dev/null
+++ clang/test/CodeGenHipStdPar/unsupported-ASM.cpp
@@ -0,0 +1,10 @@
+// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -aux-triple x86_64-unknown-linux-gnu \
+// RUN:   --hipstdpar -x hip -emit-llvm -fcuda-is-device -o - %s | FileCheck %s
+
+#define __global__ __attribute__((global))
+
+__global__ void foo(int i) {
+asm ("addl %2, %1; seto %b0" : "=q" (i), "+g" (i) : "r" (i));
+}
+
+// CHECK: declare void @__ASM__hipstdpar_unsupported([{{.*}}])
Index: clang/test/CodeGenHipStdPar/unannotated-functions-get-emitted.cpp
===
--- /dev/null
+++ clang/test/CodeGenHipStdPar/unannotated-functions-get-emitted.cpp
@@ -0,0 +1,19 @@
+// RUN: %clang_cc1 -x hip -emit-llvm -fcuda-is-device \
+// RUN:   -o - %s | FileCheck --check-prefix=NO-HIPSTDPAR-DEV %s
+
+// RUN: %clang_cc1 --hipstdpar -emit-llvm -fcuda-is-device \
+// RUN:   -o - %s | FileCheck --check-prefix=HIPSTDPAR-DEV %s
+
+#define __device__ __attribute__((device))
+
+// NO-HIPSTDPAR-DEV-NOT: define {{.*}} void @foo({{.*}})
+// HIPSTDPAR-DEV: define {{.*}} void @foo({{.*}})
+extern "C" void foo(float *a, float b) {
+  *a = b;
+}
+
+// NO-HIPSTDPAR-DEV: define {{.*}} void @bar({{.*}})
+// HIPSTDPAR-DEV: define {{.*}} void @bar({{.*}})
+extern "C" __device__ void bar(float *a, float b) {
+  *a = b;
+}
Index: clang/lib/CodeGen/CodeGenModule.cpp
===
--- clang/lib/CodeGen/CodeGenModule.cpp
+++ clang/lib/CodeGen/CodeGenModule.cpp
@@ -3585,7 +3585,10 @@
   !Global->hasAttr<CUDAConstantAttr>() &&
   !Global->hasAttr<CUDASharedAttr>() &&
   !Global->getType()->isCUDADeviceBuiltinSurfaceType() &&
-  !Global->getType()->isCUDADeviceBuiltinTextureType())
+  !Global->getType()->isCUDADeviceBuiltinTextureType() &&
+  !(LangOpts.HIPStdPar &&
+isa<FunctionDecl>(Global) &&
+!Global->hasAttr<CUDAHostAttr>()))
 return;
 } else {
   // We need to emit host-side 'shadows' for all global
Index: clang/lib/CodeGen/CodeGenFunction.cpp
===
--- clang/lib/CodeGen/CodeGenFunction.cpp
+++ clang/lib/CodeGen/CodeGenFunction.cpp
@@ -2595,10 +2595,15 @@
   std::string MissingFeature;
   llvm::StringMap<bool> CallerFeatureMap;
   CGM.getContext().getFunctionFeatureMap(CallerFeatureMap, FD);
+  // When compiling in HipStdPar mode we have to be conservative in rejecting
+  // target specific features in the FE, and defer the possible error to the
+  // AcceleratorCodeSelection pass, wherein iff an unsupported target builtin is
+  // referenced by an accelerator executable function, we emit an error.
+  bool IsHipStdPar = getLangOpts().HIPStdPar && getLangOpts().CUDAIsDevice;
   if (BuiltinID) {
 StringRef FeatureList(CGM.getContext().BuiltinInfo.getRequiredFeatures(BuiltinID));
 if (!Builtin::evaluateRequiredTargetFeatures(
-FeatureList, CallerFeatureMap)) {
+FeatureList, CallerFeatureMap) && !IsHipStdPar) {
   CGM.getDiags().Report(Loc, diag::err_builtin_needs_feature)
   << TargetDecl->getDeclName()
   << FeatureList;
@@ -2631,7 +2636,7 @@
 return false;
   }
   return true;
-}))
+}) && !IsHipStdPar)
   CGM.getDiags().Report(Loc, diag::err_function_needs_feature)
   << FD->getDeclName() << TargetDecl->getDeclName() << MissingFeature;
   } else if (!FD->isMultiVersion() && FD->hasAttr<TargetAttr>()) {
@@ -2640,7 +2645,8 @@
 
 for (const auto &F : CalleeFeatureMap) {
   if (F.getValue() && (!CallerFeatureMap.lookup(F.getKey()) ||
-   !CallerFeatureMap.find(F.getKey())->getValue()))
+   !CallerFeatu

[PATCH] D155850: [HIP][Clang][CodeGen][RFC] Add codegen support for C++ Parallel Algorithm Offload

2023-10-15 Thread Alex Voicu via Phabricator via cfe-commits
AlexVlx updated this revision to Diff 557712.
AlexVlx edited the summary of this revision.
AlexVlx added a comment.

Rebase.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155850/new/

https://reviews.llvm.org/D155850

Files:
  clang/lib/CodeGen/BackendUtil.cpp
  clang/lib/CodeGen/CGBuiltin.cpp
  clang/lib/CodeGen/CGStmt.cpp
  clang/lib/CodeGen/CMakeLists.txt
  clang/lib/CodeGen/CodeGenFunction.cpp
  clang/lib/CodeGen/CodeGenModule.cpp
  clang/test/CodeGenHipStdPar/unannotated-functions-get-emitted.cpp
  clang/test/CodeGenHipStdPar/unsupported-ASM.cpp
  clang/test/CodeGenHipStdPar/unsupported-builtins.cpp

Index: clang/test/CodeGenHipStdPar/unsupported-builtins.cpp
===
--- /dev/null
+++ clang/test/CodeGenHipStdPar/unsupported-builtins.cpp
@@ -0,0 +1,8 @@
+// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -aux-triple x86_64-unknown-linux-gnu \
+// RUN:   --hipstdpar -x hip -emit-llvm -fcuda-is-device -o - %s | FileCheck %s
+
+#define __global__ __attribute__((global))
+
+__global__ void foo() { return __builtin_ia32_pause(); }
+
+// CHECK: declare void @__builtin_ia32_pause__hipstdpar_unsupported()
Index: clang/test/CodeGenHipStdPar/unsupported-ASM.cpp
===
--- /dev/null
+++ clang/test/CodeGenHipStdPar/unsupported-ASM.cpp
@@ -0,0 +1,10 @@
+// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -aux-triple x86_64-unknown-linux-gnu \
+// RUN:   --hipstdpar -x hip -emit-llvm -fcuda-is-device -o - %s | FileCheck %s
+
+#define __global__ __attribute__((global))
+
+__global__ void foo(int i) {
+asm ("addl %2, %1; seto %b0" : "=q" (i), "+g" (i) : "r" (i));
+}
+
+// CHECK: declare void @__ASM__hipstdpar_unsupported([{{.*}}])
Index: clang/test/CodeGenHipStdPar/unannotated-functions-get-emitted.cpp
===
--- /dev/null
+++ clang/test/CodeGenHipStdPar/unannotated-functions-get-emitted.cpp
@@ -0,0 +1,19 @@
+// RUN: %clang_cc1 -x hip -emit-llvm -fcuda-is-device \
+// RUN:   -o - %s | FileCheck --check-prefix=NO-HIPSTDPAR-DEV %s
+
+// RUN: %clang_cc1 --hipstdpar -emit-llvm -fcuda-is-device \
+// RUN:   -o - %s | FileCheck --check-prefix=HIPSTDPAR-DEV %s
+
+#define __device__ __attribute__((device))
+
+// NO-HIPSTDPAR-DEV-NOT: define {{.*}} void @foo({{.*}})
+// HIPSTDPAR-DEV: define {{.*}} void @foo({{.*}})
+extern "C" void foo(float *a, float b) {
+  *a = b;
+}
+
+// NO-HIPSTDPAR-DEV: define {{.*}} void @bar({{.*}})
+// HIPSTDPAR-DEV: define {{.*}} void @bar({{.*}})
+extern "C" __device__ void bar(float *a, float b) {
+  *a = b;
+}
Index: clang/lib/CodeGen/CodeGenModule.cpp
===
--- clang/lib/CodeGen/CodeGenModule.cpp
+++ clang/lib/CodeGen/CodeGenModule.cpp
@@ -3526,7 +3526,7 @@
 GV->setComdat(TheModule.getOrInsertComdat(GV->getName()));
   Emitter.finalize(GV);
 
-  return ConstantAddress(GV, GV->getValueType(), Alignment);
+return ConstantAddress(GV, GV->getValueType(), Alignment);
 }
 
 ConstantAddress CodeGenModule::GetWeakRefReference(const ValueDecl *VD) {
@@ -3585,7 +3585,10 @@
   !Global->hasAttr<CUDAConstantAttr>() &&
   !Global->hasAttr<CUDASharedAttr>() &&
   !Global->getType()->isCUDADeviceBuiltinSurfaceType() &&
-  !Global->getType()->isCUDADeviceBuiltinTextureType())
+  !Global->getType()->isCUDADeviceBuiltinTextureType() &&
+  !(LangOpts.HIPStdPar &&
+isa<FunctionDecl>(Global) &&
+!Global->hasAttr<CUDAHostAttr>()))
 return;
 } else {
   // We need to emit host-side 'shadows' for all global
Index: clang/lib/CodeGen/CodeGenFunction.cpp
===
--- clang/lib/CodeGen/CodeGenFunction.cpp
+++ clang/lib/CodeGen/CodeGenFunction.cpp
@@ -2594,10 +2594,15 @@
   std::string MissingFeature;
   llvm::StringMap<bool> CallerFeatureMap;
   CGM.getContext().getFunctionFeatureMap(CallerFeatureMap, FD);
+  // When compiling in HipStdPar mode we have to be conservative in rejecting
+  // target specific features in the FE, and defer the possible error to the
+  // AcceleratorCodeSelection pass, wherein iff an unsupported target builtin is
+  // referenced by an accelerator executable function, we emit an error.
+  bool IsHipStdPar = getLangOpts().HIPStdPar && getLangOpts().CUDAIsDevice;
   if (BuiltinID) {
 StringRef FeatureList(CGM.getContext().BuiltinInfo.getRequiredFeatures(BuiltinID));
 if (!Builtin::evaluateRequiredTargetFeatures(
-FeatureList, CallerFeatureMap)) {
+FeatureList, CallerFeatureMap) && !IsHipStdPar) {
   CGM.getDiags().Report(Loc, diag::err_builtin_needs_feature)
   << TargetDecl->getDeclName()
   << FeatureList;
@@ -2630,7 +2635,7 @@
 return false;
   }
   return true;
-}))
+}) && !IsHipStdPar)
   CGM.getDiags().Report(Loc, diag::err_function_needs_feature)
   << FD->getDeclName() << TargetDecl->g

[PATCH] D155850: [HIP][Clang][CodeGen][RFC] Add codegen support for C++ Parallel Algorithm Offload

2023-10-17 Thread Alex Voicu via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes.
Closed by commit rGdd5d65adb641: [HIP][Clang][CodeGen] Add CodeGen support for 
`hipstdpar` (authored by AlexVlx).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155850/new/

https://reviews.llvm.org/D155850

Files:
  clang/lib/CodeGen/BackendUtil.cpp
  clang/lib/CodeGen/CGBuiltin.cpp
  clang/lib/CodeGen/CGStmt.cpp
  clang/lib/CodeGen/CMakeLists.txt
  clang/lib/CodeGen/CodeGenFunction.cpp
  clang/lib/CodeGen/CodeGenModule.cpp
  clang/test/CodeGenHipStdPar/unannotated-functions-get-emitted.cpp
  clang/test/CodeGenHipStdPar/unsupported-ASM.cpp
  clang/test/CodeGenHipStdPar/unsupported-builtins.cpp

Index: clang/test/CodeGenHipStdPar/unsupported-builtins.cpp
===
--- /dev/null
+++ clang/test/CodeGenHipStdPar/unsupported-builtins.cpp
@@ -0,0 +1,8 @@
+// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -aux-triple x86_64-unknown-linux-gnu \
+// RUN:   --hipstdpar -x hip -emit-llvm -fcuda-is-device -o - %s | FileCheck %s
+
+#define __global__ __attribute__((global))
+
+__global__ void foo() { return __builtin_ia32_pause(); }
+
+// CHECK: declare void @__builtin_ia32_pause__hipstdpar_unsupported()
Index: clang/test/CodeGenHipStdPar/unsupported-ASM.cpp
===
--- /dev/null
+++ clang/test/CodeGenHipStdPar/unsupported-ASM.cpp
@@ -0,0 +1,10 @@
+// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -aux-triple x86_64-unknown-linux-gnu \
+// RUN:   --hipstdpar -x hip -emit-llvm -fcuda-is-device -o - %s | FileCheck %s
+
+#define __global__ __attribute__((global))
+
+__global__ void foo(int i) {
+asm ("addl %2, %1; seto %b0" : "=q" (i), "+g" (i) : "r" (i));
+}
+
+// CHECK: declare void @__ASM__hipstdpar_unsupported([{{.*}}])
Index: clang/test/CodeGenHipStdPar/unannotated-functions-get-emitted.cpp
===
--- /dev/null
+++ clang/test/CodeGenHipStdPar/unannotated-functions-get-emitted.cpp
@@ -0,0 +1,19 @@
+// RUN: %clang_cc1 -x hip -emit-llvm -fcuda-is-device \
+// RUN:   -o - %s | FileCheck --check-prefix=NO-HIPSTDPAR-DEV %s
+
+// RUN: %clang_cc1 --hipstdpar -emit-llvm -fcuda-is-device \
+// RUN:   -o - %s | FileCheck --check-prefix=HIPSTDPAR-DEV %s
+
+#define __device__ __attribute__((device))
+
+// NO-HIPSTDPAR-DEV-NOT: define {{.*}} void @foo({{.*}})
+// HIPSTDPAR-DEV: define {{.*}} void @foo({{.*}})
+extern "C" void foo(float *a, float b) {
+  *a = b;
+}
+
+// NO-HIPSTDPAR-DEV: define {{.*}} void @bar({{.*}})
+// HIPSTDPAR-DEV: define {{.*}} void @bar({{.*}})
+extern "C" __device__ void bar(float *a, float b) {
+  *a = b;
+}
Index: clang/lib/CodeGen/CodeGenModule.cpp
===
--- clang/lib/CodeGen/CodeGenModule.cpp
+++ clang/lib/CodeGen/CodeGenModule.cpp
@@ -3526,7 +3526,7 @@
 GV->setComdat(TheModule.getOrInsertComdat(GV->getName()));
   Emitter.finalize(GV);
 
-  return ConstantAddress(GV, GV->getValueType(), Alignment);
+return ConstantAddress(GV, GV->getValueType(), Alignment);
 }
 
 ConstantAddress CodeGenModule::GetWeakRefReference(const ValueDecl *VD) {
@@ -3585,7 +3585,10 @@
   !Global->hasAttr<CUDAConstantAttr>() &&
   !Global->hasAttr<CUDASharedAttr>() &&
   !Global->getType()->isCUDADeviceBuiltinSurfaceType() &&
-  !Global->getType()->isCUDADeviceBuiltinTextureType())
+  !Global->getType()->isCUDADeviceBuiltinTextureType() &&
+  !(LangOpts.HIPStdPar &&
+isa<FunctionDecl>(Global) &&
+!Global->hasAttr<CUDAHostAttr>()))
 return;
 } else {
   // We need to emit host-side 'shadows' for all global
Index: clang/lib/CodeGen/CodeGenFunction.cpp
===
--- clang/lib/CodeGen/CodeGenFunction.cpp
+++ clang/lib/CodeGen/CodeGenFunction.cpp
@@ -2594,10 +2594,15 @@
   std::string MissingFeature;
   llvm::StringMap<bool> CallerFeatureMap;
   CGM.getContext().getFunctionFeatureMap(CallerFeatureMap, FD);
+  // When compiling in HipStdPar mode we have to be conservative in rejecting
+  // target specific features in the FE, and defer the possible error to the
+  // AcceleratorCodeSelection pass, wherein iff an unsupported target builtin is
+  // referenced by an accelerator executable function, we emit an error.
+  bool IsHipStdPar = getLangOpts().HIPStdPar && getLangOpts().CUDAIsDevice;
   if (BuiltinID) {
 StringRef FeatureList(CGM.getContext().BuiltinInfo.getRequiredFeatures(BuiltinID));
 if (!Builtin::evaluateRequiredTargetFeatures(
-FeatureList, CallerFeatureMap)) {
+FeatureList, CallerFeatureMap) && !IsHipStdPar) {
   CGM.getDiags().Report(Loc, diag::err_builtin_needs_feature)
   << TargetDecl->getDeclName()
   << FeatureList;
@@ -2630,7 +2635,7 @@
 return false;
   }
   return true;
-}))
+}) && !IsHipStdPar)
   CGM.get

[PATCH] D155850: [HIP][Clang][CodeGen][RFC] Add codegen support for C++ Parallel Algorithm Offload

2023-10-17 Thread Alex Voicu via Phabricator via cfe-commits
AlexVlx added a comment.

In D155850#4654146 , @hnrklssn wrote:

> `clang/test/CodeGenHipStdPar/unannotated-functions-get-emitted.cpp` is 
> failing: 
> https://green.lab.llvm.org/green/job/clang-stage1-cmake-RA-incremental/38041/testReport/junit/Clang/CodeGenHipStdPar/unannotated_functions_get_emitted_cpp/
>
>   
> project/clang/test/CodeGenHipStdPar/unannotated-functions-get-emitted.cpp:15:22:
>  error: NO-HIPSTDPAR-DEV: expected string not found in input
>   // NO-HIPSTDPAR-DEV: define {{.*}} void @bar({{.*}})
>^
>   :1:1: note: scanning from here
>   ; ModuleID = 
> '/Users/buildslave/jenkins/workspace/clang-stage1-cmake-RA-incremental/llvm-project/clang/test/CodeGenHipStdPar/unannotated-functions-get-emitted.cpp'
>   ^
>   :7:1: note: possible intended match here
>   define void @bar(ptr noundef %a, float noundef %b) #0 {
>   ^
>
> It looks like it may be due to the matcher having whitespace on both sides of 
> `{{.*}}`, while the output only has a single space between `define` and 
> `void`, but I'm not too well versed in FileCheck edge cases to know for sure.

Thank you for the ping... this is pretty confusing since it's not tripping any 
of the buildbots, or flaring locally; let me look into it.
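
(For reference, the relaxed matcher that the follow-up diff below switches to has
the form

  // HIPSTDPAR-DEV: {{.*}}void @foo({{.*}})

i.e. without literal spaces around the leading `{{.*}}`, so it matches whether or
not anything appears between `define` and `void`.)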


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155850/new/

https://reviews.llvm.org/D155850

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D155850: [HIP][Clang][CodeGen][RFC] Add codegen support for C++ Parallel Algorithm Offload

2023-10-17 Thread Alex Voicu via Phabricator via cfe-commits
AlexVlx updated this revision to Diff 557733.
AlexVlx added a comment.

Simplify test.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155850/new/

https://reviews.llvm.org/D155850

Files:
  clang/lib/CodeGen/BackendUtil.cpp
  clang/lib/CodeGen/CGBuiltin.cpp
  clang/lib/CodeGen/CGStmt.cpp
  clang/lib/CodeGen/CMakeLists.txt
  clang/lib/CodeGen/CodeGenFunction.cpp
  clang/lib/CodeGen/CodeGenModule.cpp
  clang/test/CodeGenHipStdPar/unannotated-functions-get-emitted.cpp
  clang/test/CodeGenHipStdPar/unsupported-ASM.cpp
  clang/test/CodeGenHipStdPar/unsupported-builtins.cpp

Index: clang/test/CodeGenHipStdPar/unsupported-builtins.cpp
===
--- /dev/null
+++ clang/test/CodeGenHipStdPar/unsupported-builtins.cpp
@@ -0,0 +1,8 @@
+// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -aux-triple x86_64-unknown-linux-gnu \
+// RUN:   --hipstdpar -x hip -emit-llvm -fcuda-is-device -o - %s | FileCheck %s
+
+#define __global__ __attribute__((global))
+
+__global__ void foo() { return __builtin_ia32_pause(); }
+
+// CHECK: declare void @__builtin_ia32_pause__hipstdpar_unsupported()
Index: clang/test/CodeGenHipStdPar/unsupported-ASM.cpp
===
--- /dev/null
+++ clang/test/CodeGenHipStdPar/unsupported-ASM.cpp
@@ -0,0 +1,10 @@
+// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -aux-triple x86_64-unknown-linux-gnu \
+// RUN:   --hipstdpar -x hip -emit-llvm -fcuda-is-device -o - %s | FileCheck %s
+
+#define __global__ __attribute__((global))
+
+__global__ void foo(int i) {
+asm ("addl %2, %1; seto %b0" : "=q" (i), "+g" (i) : "r" (i));
+}
+
+// CHECK: declare void @__ASM__hipstdpar_unsupported([{{.*}}])
Index: clang/test/CodeGenHipStdPar/unannotated-functions-get-emitted.cpp
===
--- /dev/null
+++ clang/test/CodeGenHipStdPar/unannotated-functions-get-emitted.cpp
@@ -0,0 +1,19 @@
+// RUN: %clang_cc1 -x hip -emit-llvm -fcuda-is-device \
+// RUN:   -o - %s | FileCheck --check-prefix=NO-HIPSTDPAR-DEV %s
+
+// RUN: %clang_cc1 --hipstdpar -emit-llvm -fcuda-is-device \
+// RUN:   -o - %s | FileCheck --check-prefix=HIPSTDPAR-DEV %s
+
+#define __device__ __attribute__((device))
+
+// NO-HIPSTDPAR-DEV-NOT: {{.*}}void @foo({{.*}})
+// HIPSTDPAR-DEV: {{.*}}void @foo({{.*}})
+extern "C" void foo(float *a, float b) {
+  *a = b;
+}
+
+// NO-HIPSTDPAR-DEV: {{.*}}void @bar({{.*}})
+// HIPSTDPAR-DEV: {{.*}}void @bar({{.*}})
+extern "C" __device__ void bar(float *a, float b) {
+  *a = b;
+}
Index: clang/lib/CodeGen/CodeGenModule.cpp
===
--- clang/lib/CodeGen/CodeGenModule.cpp
+++ clang/lib/CodeGen/CodeGenModule.cpp
@@ -3526,7 +3526,7 @@
 GV->setComdat(TheModule.getOrInsertComdat(GV->getName()));
   Emitter.finalize(GV);
 
-  return ConstantAddress(GV, GV->getValueType(), Alignment);
+return ConstantAddress(GV, GV->getValueType(), Alignment);
 }
 
 ConstantAddress CodeGenModule::GetWeakRefReference(const ValueDecl *VD) {
@@ -3585,7 +3585,10 @@
   !Global->hasAttr<CUDAConstantAttr>() &&
   !Global->hasAttr<CUDASharedAttr>() &&
   !Global->getType()->isCUDADeviceBuiltinSurfaceType() &&
-  !Global->getType()->isCUDADeviceBuiltinTextureType())
+  !Global->getType()->isCUDADeviceBuiltinTextureType() &&
+  !(LangOpts.HIPStdPar &&
+isa<FunctionDecl>(Global) &&
+!Global->hasAttr<CUDAHostAttr>()))
 return;
 } else {
   // We need to emit host-side 'shadows' for all global
Index: clang/lib/CodeGen/CodeGenFunction.cpp
===
--- clang/lib/CodeGen/CodeGenFunction.cpp
+++ clang/lib/CodeGen/CodeGenFunction.cpp
@@ -2594,10 +2594,15 @@
   std::string MissingFeature;
   llvm::StringMap<bool> CallerFeatureMap;
   CGM.getContext().getFunctionFeatureMap(CallerFeatureMap, FD);
+  // When compiling in HipStdPar mode we have to be conservative in rejecting
+  // target specific features in the FE, and defer the possible error to the
+  // AcceleratorCodeSelection pass, wherein iff an unsupported target builtin is
+  // referenced by an accelerator executable function, we emit an error.
+  bool IsHipStdPar = getLangOpts().HIPStdPar && getLangOpts().CUDAIsDevice;
   if (BuiltinID) {
 StringRef FeatureList(CGM.getContext().BuiltinInfo.getRequiredFeatures(BuiltinID));
 if (!Builtin::evaluateRequiredTargetFeatures(
-FeatureList, CallerFeatureMap)) {
+FeatureList, CallerFeatureMap) && !IsHipStdPar) {
   CGM.getDiags().Report(Loc, diag::err_builtin_needs_feature)
   << TargetDecl->getDeclName()
   << FeatureList;
@@ -2630,7 +2635,7 @@
 return false;
   }
   return true;
-}))
+}) && !IsHipStdPar)
   CGM.getDiags().Report(Loc, diag::err_function_needs_feature)
   << FD->getDeclName() << TargetDecl->getDeclName() << MissingFeature;
   } else if (!FD->isMultiVersion() &&

[PATCH] D156539: [Clang][CodeGen] `__builtin_alloca`s should care about address spaces too

2023-07-28 Thread Alex Voicu via Phabricator via cfe-commits
AlexVlx created this revision.
AlexVlx added reviewers: rjmccall, yaxunl.
AlexVlx added a project: clang.
Herald added a subscriber: arichardson.
Herald added a project: All.
AlexVlx requested review of this revision.
Herald added a subscriber: cfe-commits.

`alloca` instructions always return pointers to the stack / alloca address 
space. This composes poorly with most HLLs, which are address space agnostic and 
thus have all pointers point to generic/flat. Static `alloca`s were already 
handled at the AST level, in the implementation of the `CreateTempAlloca` 
family of functions; dynamic `alloca`s, however, were not, which would make the 
following valid C++ code:

  const char* p = static_cast<const char*>(__builtin_alloca(42));

lead to subtly incorrect IR / an assert flaring. This patch addresses that by 
inserting an address space cast iff the alloca address space is different from 
generic.
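
For illustration, on a target where allocas live in a non-generic address space
(the attached test uses amdgcn, where that is addrspace(5)), the builtin now
lowers to roughly the following (a sketch mirroring the CHECK lines in the test):

  %1 = alloca i8, i64 %0, align 8, addrspace(5)
  %2 = addrspacecast ptr addrspace(5) %1 to ptr

so address space agnostic code receives the generic (flat) pointer it expects.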


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D156539

Files:
  clang/lib/CodeGen/CGBuiltin.cpp
  clang/test/CodeGen/dynamic-alloca-with-address-space.c


Index: clang/test/CodeGen/dynamic-alloca-with-address-space.c
===
--- /dev/null
+++ clang/test/CodeGen/dynamic-alloca-with-address-space.c
@@ -0,0 +1,28 @@
+// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -emit-llvm %s -o - | FileCheck %s
+
+void allocas(unsigned long n) {
+void *a = __builtin_alloca(n);
+void *uninitialized_a = __builtin_alloca_uninitialized(n);
+void *aligned_a = __builtin_alloca_with_align(n, 8);
+void *aligned_uninitialized_a = __builtin_alloca_with_align_uninitialized(n, 8);
+}
+
+// CHECK: @allocas(
+// CHECK: store i64 %n, ptr %n.addr.ascast, align 8
+// CHECK: %0 = load i64, ptr %n.addr.ascast, align 8
+// CHECK: %1 = alloca i8, i64 %0, align 8, addrspace(5)
+// CHECK: %2 = addrspacecast ptr addrspace(5) %1 to ptr
+// CHECK: store ptr %2, ptr %a.ascast, align 8
+// CHECK: %3 = load i64, ptr %n.addr.ascast, align 8
+// CHECK: %4 = alloca i8, i64 %3, align 8, addrspace(5)
+// CHECK: %5 = addrspacecast ptr addrspace(5) %4 to ptr
+// CHECK: store ptr %5, ptr %uninitialized_a.ascast, align 8
+// CHECK: %6 = load i64, ptr %n.addr.ascast, align 8
+// CHECK: %7 = alloca i8, i64 %6, align 1, addrspace(5)
+// CHECK: %8 = addrspacecast ptr addrspace(5) %7 to ptr
+// CHECK: store ptr %8, ptr %aligned_a.ascast, align 8
+// CHECK: %9 = load i64, ptr %n.addr.ascast, align 8
+// CHECK: %10 = alloca i8, i64 %9, align 1, addrspace(5)
+// CHECK: %11 = addrspacecast ptr addrspace(5) %10 to ptr
+// CHECK: store ptr %11, ptr %aligned_uninitialized_a.ascast, align 8
+// CHECK: ret void
Index: clang/lib/CodeGen/CGBuiltin.cpp
===
--- clang/lib/CodeGen/CGBuiltin.cpp
+++ clang/lib/CodeGen/CGBuiltin.cpp
@@ -3514,6 +3514,12 @@
 return RValue::get(Result);
   }
 
+  // An alloca will always return a pointer to the alloca (stack) address
+  // space. This address space need not be the same as the AST / Language
+  // default (e.g. in C / C++ auto vars are in the generic address space). At
+  // the AST level this is handled within CreateTempAlloca et al., but for the
+  // builtin / dynamic alloca we have to handle it here. We use an explicit cast
+  // instead of passing an AS to CreateAlloca so as to not inhibit optimisation.
   case Builtin::BIalloca:
   case Builtin::BI_alloca:
   case Builtin::BI__builtin_alloca_uninitialized:
@@ -3529,7 +3535,10 @@
 AI->setAlignment(SuitableAlignmentInBytes);
 if (BuiltinID != Builtin::BI__builtin_alloca_uninitialized)
   initializeAlloca(*this, AI, Size, SuitableAlignmentInBytes);
-return RValue::get(AI);
+if (getASTAllocaAddressSpace() != LangAS::Default)
+  return RValue::get(Builder.CreateAddrSpaceCast(AI, CGM.Int8PtrTy));
+else
+  return RValue::get(AI);
   }
 
   case Builtin::BI__builtin_alloca_with_align_uninitialized:
@@ -3544,7 +3553,10 @@
 AI->setAlignment(AlignmentInBytes);
 if (BuiltinID != Builtin::BI__builtin_alloca_with_align_uninitialized)
   initializeAlloca(*this, AI, Size, AlignmentInBytes);
-return RValue::get(AI);
+if (getASTAllocaAddressSpace() != LangAS::Default)
+  return RValue::get(Builder.CreateAddrSpaceCast(AI, CGM.Int8PtrTy));
+else
+  return RValue::get(AI);
   }
 
   case Builtin::BIbzero:


Index: clang/test/CodeGen/dynamic-alloca-with-address-space.c
===
--- /dev/null
+++ clang/test/CodeGen/dynamic-alloca-with-address-space.c
@@ -0,0 +1,28 @@
+// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -emit-llvm %s -o - | FileCheck %s
+
+void allocas(unsigned long n) {
+void *a = __builtin_alloca(n);
+void *uninitialized_a = __builtin_alloca_uninitialized(n);
+void *aligned_a = __builtin_alloca_with_align(n, 8);
+void *aligned_uninitialized_a = __builtin_alloca_with_align_uninitialized(n, 8);
+}
+
+// CHECK: @allocas(

[PATCH] D156539: [Clang][CodeGen] `__builtin_alloca`s should care about address spaces too

2023-07-31 Thread Alex Voicu via Phabricator via cfe-commits
AlexVlx added a comment.

In D156539#4542836 , @rjmccall wrote:

> We should probably write this code to work properly in case we add a target 
> that makes `__builtin_alloca` return a pointer in the private address space.  
> Could you recover the target AS from the type of the expression instead of 
> assuming `LangAS::Default`?

I believe it should work as written (it is possible that I misunderstand the 
objection, in which case I do apologise). The issue is precisely that for some 
targets (amdgcn being one) `alloca` returns a pointer to private, which doesn't 
compose with the default AS being unspecified / nonexistent for some languages 
(e.g. HIP, C/C++, etc.). For the OCL case @arsenm mentions, I believe that we 
make a choice based on `LangOpts.OpenCLGenericAddressSpace`, and the code above 
would only be valid from the OCL language perspective iff that is true. I've 
put together a short Godbolt that captures these cases (and the bug the patch 
should fix; see the bitcasts between ASes in the non-OCL case): 
https://gcc.godbolt.org/z/sYGK76zqv.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D156539/new/

https://reviews.llvm.org/D156539

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D156539: [Clang][CodeGen] `__builtin_alloca`s should care about address spaces too

2023-07-31 Thread Alex Voicu via Phabricator via cfe-commits
AlexVlx updated this revision to Diff 545679.
AlexVlx added a reviewer: arsenm.
AlexVlx added a comment.
Herald added a subscriber: wdng.

Incorporated review feedback. Updated test.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D156539/new/

https://reviews.llvm.org/D156539

Files:
  clang/lib/CodeGen/CGBuiltin.cpp
  clang/test/CodeGen/dynamic-alloca-with-address-space.c


Index: clang/test/CodeGen/dynamic-alloca-with-address-space.c
===
--- /dev/null
+++ clang/test/CodeGen/dynamic-alloca-with-address-space.c
@@ -0,0 +1,28 @@
+// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -emit-llvm %s -o - | FileCheck %s
+
+void allocas(unsigned long n) {
+char *a = (char *)__builtin_alloca(n);
+char *uninitialized_a = (char *)__builtin_alloca_uninitialized(n);
+char *aligned_a = (char *)__builtin_alloca_with_align(n, 8);
+char *aligned_uninitialized_a = (char *)__builtin_alloca_with_align_uninitialized(n, 8);
+}
+
+// CHECK: @allocas(
+// CHECK: store i64 %n, ptr %n.addr.ascast, align 8
+// CHECK: %0 = load i64, ptr %n.addr.ascast, align 8
+// CHECK: %1 = alloca i8, i64 %0, align 8, addrspace(5)
+// CHECK: %2 = addrspacecast ptr addrspace(5) %1 to ptr
+// CHECK: store ptr %2, ptr %a.ascast, align 8
+// CHECK: %3 = load i64, ptr %n.addr.ascast, align 8
+// CHECK: %4 = alloca i8, i64 %3, align 8, addrspace(5)
+// CHECK: %5 = addrspacecast ptr addrspace(5) %4 to ptr
+// CHECK: store ptr %5, ptr %uninitialized_a.ascast, align 8
+// CHECK: %6 = load i64, ptr %n.addr.ascast, align 8
+// CHECK: %7 = alloca i8, i64 %6, align 1, addrspace(5)
+// CHECK: %8 = addrspacecast ptr addrspace(5) %7 to ptr
+// CHECK: store ptr %8, ptr %aligned_a.ascast, align 8
+// CHECK: %9 = load i64, ptr %n.addr.ascast, align 8
+// CHECK: %10 = alloca i8, i64 %9, align 1, addrspace(5)
+// CHECK: %11 = addrspacecast ptr addrspace(5) %10 to ptr
+// CHECK: store ptr %11, ptr %aligned_uninitialized_a.ascast, align 8
+// CHECK: ret void
Index: clang/lib/CodeGen/CGBuiltin.cpp
===
--- clang/lib/CodeGen/CGBuiltin.cpp
+++ clang/lib/CodeGen/CGBuiltin.cpp
@@ -3514,6 +3514,12 @@
 return RValue::get(Result);
   }
 
+  // An alloca will always return a pointer to the alloca (stack) address
+  // space. This address space need not be the same as the AST / Language
+  // default (e.g. in C / C++ auto vars are in the generic address space). At
+  // the AST level this is handled within CreateTempAlloca et al., but for the
+  // builtin / dynamic alloca we have to handle it here. We use an explicit cast
+  // instead of passing an AS to CreateAlloca so as to not inhibit optimisation.
   case Builtin::BIalloca:
   case Builtin::BI_alloca:
   case Builtin::BI__builtin_alloca_uninitialized:
@@ -3529,6 +3535,8 @@
 AI->setAlignment(SuitableAlignmentInBytes);
 if (BuiltinID != Builtin::BI__builtin_alloca_uninitialized)
   initializeAlloca(*this, AI, Size, SuitableAlignmentInBytes);
+if (getASTAllocaAddressSpace() != LangAS::Default)
+  return RValue::get(Builder.CreateAddrSpaceCast(AI, CGM.Int8PtrTy));
 return RValue::get(AI);
   }
 
@@ -3544,6 +3552,8 @@
 AI->setAlignment(AlignmentInBytes);
 if (BuiltinID != Builtin::BI__builtin_alloca_with_align_uninitialized)
   initializeAlloca(*this, AI, Size, AlignmentInBytes);
+if (getASTAllocaAddressSpace() != LangAS::Default)
+  return RValue::get(Builder.CreateAddrSpaceCast(AI, CGM.Int8PtrTy));
 return RValue::get(AI);
   }
 


Index: clang/test/CodeGen/dynamic-alloca-with-address-space.c
===
--- /dev/null
+++ clang/test/CodeGen/dynamic-alloca-with-address-space.c
@@ -0,0 +1,28 @@
+// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -emit-llvm %s -o - | FileCheck %s
+
+void allocas(unsigned long n) {
+char *a = (char *)__builtin_alloca(n);
+char *uninitialized_a = (char *)__builtin_alloca_uninitialized(n);
+char *aligned_a = (char *)__builtin_alloca_with_align(n, 8);
+char *aligned_uninitialized_a = (char *)__builtin_alloca_with_align_uninitialized(n, 8);
+}
+
+// CHECK: @allocas(
+// CHECK: store i64 %n, ptr %n.addr.ascast, align 8
+// CHECK: %0 = load i64, ptr %n.addr.ascast, align 8
+// CHECK: %1 = alloca i8, i64 %0, align 8, addrspace(5)
+// CHECK: %2 = addrspacecast ptr addrspace(5) %1 to ptr
+// CHECK: store ptr %2, ptr %a.ascast, align 8
+// CHECK: %3 = load i64, ptr %n.addr.ascast, align 8
+// CHECK: %4 = alloca i8, i64 %3, align 8, addrspace(5)
+// CHECK: %5 = addrspacecast ptr addrspace(5) %4 to ptr
+// CHECK: store ptr %5, ptr %uninitialized_a.ascast, align 8
+// CHECK: %6 = load i64, ptr %n.addr.ascast, align 8
+// CHECK: %7 = alloca i8, i64 %6, align 1, addrspace(5)
+// CHECK: %8 = addrspacecast ptr addrspace(5) %7 to ptr
+// CHECK: store ptr %8, ptr %aligned_a.ascast, align 8
+// CHECK: %9 = load i64, ptr %n.addr.ascast, align 8
+// CHECK: %10

[PATCH] D156539: [Clang][CodeGen] `__builtin_alloca`s should care about address spaces too

2023-07-31 Thread Alex Voicu via Phabricator via cfe-commits
AlexVlx added inline comments.



Comment at: clang/test/CodeGen/dynamic-alloca-with-address-space.c:1
+// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -emit-llvm %s -o - | FileCheck %s
+

arsenm wrote:
> Can you add an opencl 1.2 and 2.0 run line too
This is not valid OpenCL 1.2 code; for 2.0, sure.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D156539/new/

https://reviews.llvm.org/D156539

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D156539: [Clang][CodeGen] `__builtin_alloca`s should care about address spaces too

2023-07-31 Thread Alex Voicu via Phabricator via cfe-commits
AlexVlx added a comment.

In D156539#4547814 , @rjmccall wrote:

> In D156539#4546834 , @AlexVlx wrote:
>
>> In D156539#4542836 , @rjmccall 
>> wrote:
>>
>>> We should probably write this code to work properly in case we add a target 
>>> that makes `__builtin_alloca` return a pointer in the private address 
>>> space.  Could you recover the target AS from the type of the expression 
>>> instead of assuming `LangAS::Default`?
>>
>> I believe it should work as written (it is possible that I misunderstand the 
>> objection, case in which I do apologise). The issue is precisely that for 
>> some targets (amdgcn being one) `alloca` returns a pointer to private, which 
>> doesn't compose with the default AS being unspecified / nonexistent, for 
>> some languages (e.g. HIP,  C/C++ etc.). For the OCL case @arsenm mentions, I 
>> believe that we make a choice based on `LangOpts.OpenCLGenericAddressSpace`, 
>> and the code above would only be valid from the OCL language perspective iff 
>> that is true. I've put together a short Godbolt that captures these (and the 
>> bug the patch should fix, please see the bitcasts between ASes in the 
>> non-OCL case): https://gcc.godbolt.org/z/sYGK76zqv.
>
> An address space conversion is required in IR if there is a difference 
> between the address space of the pointer type formally returned by the call 
> to `__builtin_alloca` and the address space produced by the `alloca` 
> operation in IR.  If Sema has set the type of `__builtin_alloca` to formally 
> return something in the stack address space, no conversion is required.  What 
> I'm saying is that I'd like you to not directly refer to `LangAS::Default` in 
> this code, because you're assuming that `__builtin_alloca` is always 
> returning a pointer that's formally in `LangAS::Default`.  Please recover the 
> target address space from the type of the expression.
>
> Additionally, in IRGen we allow the target to hook address-space conversions; 
> please call `performAddrSpaceCast` instead of directly emitting an 
> `addrspacecast` instruction.

Hmm, I think I see where you are coming from; thank you for clarifying, and 
apologies for not seeing it from the get-go. Before delving into it, I will 
note that this has been handled similarly, but not identically, for static 
allocas for a while, e.g.: 
https://github.com/llvm/llvm-project/blob/e0b3f45d87a6677efb59d8a4f0e6deac9c346c76/clang/lib/CodeGen/CGExpr.cpp#L83.
 I guess the concern here is somewhat forward-looking, because today the builtin 
is defined as `BUILTIN(__builtin_alloca, "v*z"   , "Fn")`, which means that it 
won't carry an AS on its return value in the AST, and will only ever return a 
pointer to generic / default (which is why the query around 
ASTAllocaAddressSpace exists, I assume). AFAICT, we actually expect languages 
that are not OCL to have alloca return a pointer to LangAS::Default; please see: 
https://github.com/llvm/llvm-project/blob/5fbee1c6e300eee9ce9d18275bf8a6de0a22ba59/clang/lib/CodeGen/CGDecl.cpp#L1443.
 My sense is that, currently, at the AST level, the stack AS must be the default 
AS or, iff OpenCL, opencl_private (also the `alloca` AS in OCL), but the alloca 
AS needn't be the stack AS. That could, I guess, be changed in the future, but 
it seems like it'd take some work. Granted, that's maybe not a reason to 
constrain the builtin, but then static and dynamic allocas would hypothetically 
behave differently.

Noted on the target-hook, will do, thank you.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D156539/new/

https://reviews.llvm.org/D156539

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D156539: [Clang][CodeGen] `__builtin_alloca`s should care about address spaces too

2023-07-31 Thread Alex Voicu via Phabricator via cfe-commits
AlexVlx updated this revision to Diff 545860.
AlexVlx added a comment.

Address review comments.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D156539/new/

https://reviews.llvm.org/D156539

Files:
  clang/lib/CodeGen/CGBuiltin.cpp
  clang/test/CodeGen/dynamic-alloca-with-address-space.c


Index: clang/test/CodeGen/dynamic-alloca-with-address-space.c
===
--- /dev/null
+++ clang/test/CodeGen/dynamic-alloca-with-address-space.c
@@ -0,0 +1,41 @@
+// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -emit-llvm %s -o - \
+// RUN:   | FileCheck %s --check-prefix=CHECK
+// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -DOCL12 -x cl -std=cl1.2 \
+// RUN:   -emit-llvm %s -o - | FileCheck %s --check-prefix=CHECK-CL12
+// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -x cl -std=cl2.0 \
+// RUN:   -emit-llvm %s -o - | FileCheck %s --check-prefix=CHECK-CL20
+
+#if defined(OCL12)
+#define CAST (char *)(unsigned long)
+#else
+#define CAST (char *)
+#endif
+
+void allocas(unsigned long n) {
+char *a = CAST __builtin_alloca(n);
+char *uninitialized_a = CAST __builtin_alloca_uninitialized(n);
+char *aligned_a = CAST __builtin_alloca_with_align(n, 8);
+char *aligned_uninitialized_a = CAST __builtin_alloca_with_align_uninitialized(n, 8);
+}
+
+// CHECK: @allocas(
+// CHECK: store i64 %n, ptr %n.addr.ascast, align 8
+// CHECK: %0 = load i64, ptr %n.addr.ascast, align 8
+// CHECK: %1 = alloca i8, i64 %0, align 8, addrspace(5)
+// CHECK: %2 = addrspacecast ptr addrspace(5) %1 to ptr
+// CHECK: store ptr %2, ptr %a.ascast, align 8
+// CHECK: %3 = load i64, ptr %n.addr.ascast, align 8
+// CHECK: %4 = alloca i8, i64 %3, align 8, addrspace(5)
+// CHECK: %5 = addrspacecast ptr addrspace(5) %4 to ptr
+// CHECK: store ptr %5, ptr %uninitialized_a.ascast, align 8
+// CHECK: %6 = load i64, ptr %n.addr.ascast, align 8
+// CHECK: %7 = alloca i8, i64 %6, align 1, addrspace(5)
+// CHECK: %8 = addrspacecast ptr addrspace(5) %7 to ptr
+// CHECK: store ptr %8, ptr %aligned_a.ascast, align 8
+// CHECK: %9 = load i64, ptr %n.addr.ascast, align 8
+// CHECK: %10 = alloca i8, i64 %9, align 1, addrspace(5)
+// CHECK: %11 = addrspacecast ptr addrspace(5) %10 to ptr
+// CHECK: store ptr %11, ptr %aligned_uninitialized_a.ascast, align 8
+// CHECK: ret void
+// CHECK-CL12-NOT: addrspacecast
+// CHECK-CL20-NOT: addrspacecast
Index: clang/lib/CodeGen/CGBuiltin.cpp
===
--- clang/lib/CodeGen/CGBuiltin.cpp
+++ clang/lib/CodeGen/CGBuiltin.cpp
@@ -3517,6 +3517,12 @@
 return RValue::get(Result);
   }
 
+  // An alloca will always return a pointer to the alloca (stack) address
+  // space. This address space need not be the same as the AST / Language
+  // default (e.g. in C / C++ auto vars are in the generic address space). At
+  // the AST level this is handled within CreateTempAlloca et al., but for the
+  // builtin / dynamic alloca we have to handle it here. We use an explicit cast
+  // instead of passing an AS to CreateAlloca so as to not inhibit optimisation.
   case Builtin::BIalloca:
   case Builtin::BI_alloca:
   case Builtin::BI__builtin_alloca_uninitialized:
@@ -3532,6 +3538,13 @@
 AI->setAlignment(SuitableAlignmentInBytes);
 if (BuiltinID != Builtin::BI__builtin_alloca_uninitialized)
   initializeAlloca(*this, AI, Size, SuitableAlignmentInBytes);
+LangAS AAS = getASTAllocaAddressSpace();
+LangAS EAS = E->getType().getAddressSpace();
+if (AAS != EAS) {
+  llvm::Type *Ty = CGM.getTypes().ConvertType(E->getType());
+  return RValue::get(getTargetHooks().performAddrSpaceCast(*this, AI, AAS,
+   EAS, Ty));
+}
 return RValue::get(AI);
   }
 
@@ -3547,6 +3560,13 @@
 AI->setAlignment(AlignmentInBytes);
 if (BuiltinID != Builtin::BI__builtin_alloca_with_align_uninitialized)
   initializeAlloca(*this, AI, Size, AlignmentInBytes);
+LangAS AAS = getASTAllocaAddressSpace();
+LangAS EAS = E->getType().getAddressSpace();
+if (AAS != EAS) {
+  llvm::Type *Ty = CGM.getTypes().ConvertType(E->getType());
+  return RValue::get(getTargetHooks().performAddrSpaceCast(*this, AI, AAS,
+   EAS, Ty));
+}
 return RValue::get(AI);
   }
 


Index: clang/test/CodeGen/dynamic-alloca-with-address-space.c
===
--- /dev/null
+++ clang/test/CodeGen/dynamic-alloca-with-address-space.c
@@ -0,0 +1,41 @@
+// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -emit-llvm %s -o - \
+// RUN:   | FileCheck %s --check-prefix=CHECK
+// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -DOCL12 -x cl -std=cl1.2 \
+// RUN:   -emit-llvm %s -o - | FileCheck %s --check-prefix=CHECK-CL12
+// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -x cl -std=cl2.0 \
+// RUN:   -emit-llvm %s -o - | FileCheck %s --check-prefix=C

[PATCH] D156539: [Clang][CodeGen] `__builtin_alloca`s should care about address spaces too

2023-07-31 Thread Alex Voicu via Phabricator via cfe-commits
AlexVlx updated this revision to Diff 545882.
AlexVlx added a comment.

Use pointee's address space for the check.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D156539/new/

https://reviews.llvm.org/D156539

Files:
  clang/lib/CodeGen/CGBuiltin.cpp
  clang/test/CodeGen/dynamic-alloca-with-address-space.c


Index: clang/test/CodeGen/dynamic-alloca-with-address-space.c
===
--- /dev/null
+++ clang/test/CodeGen/dynamic-alloca-with-address-space.c
@@ -0,0 +1,41 @@
+// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -emit-llvm %s -o - \
+// RUN:   | FileCheck %s --check-prefix=CHECK
+// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -DOCL12 -x cl -std=cl1.2 \
+// RUN:   -emit-llvm %s -o - | FileCheck %s --check-prefix=CHECK-CL12
+// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -x cl -std=cl2.0 \
+// RUN:   -emit-llvm %s -o - | FileCheck %s --check-prefix=CHECK-CL20
+
+#if defined(OCL12)
+#define CAST (char *)(unsigned long)
+#else
+#define CAST (char *)
+#endif
+
+void allocas(unsigned long n) {
+char *a = CAST __builtin_alloca(n);
+char *uninitialized_a = CAST __builtin_alloca_uninitialized(n);
+char *aligned_a = CAST __builtin_alloca_with_align(n, 8);
+char *aligned_uninitialized_a = CAST __builtin_alloca_with_align_uninitialized(n, 8);
+}
+
+// CHECK: @allocas(
+// CHECK: store i64 %n, ptr %n.addr.ascast, align 8
+// CHECK: %0 = load i64, ptr %n.addr.ascast, align 8
+// CHECK: %1 = alloca i8, i64 %0, align 8, addrspace(5)
+// CHECK: %2 = addrspacecast ptr addrspace(5) %1 to ptr
+// CHECK: store ptr %2, ptr %a.ascast, align 8
+// CHECK: %3 = load i64, ptr %n.addr.ascast, align 8
+// CHECK: %4 = alloca i8, i64 %3, align 8, addrspace(5)
+// CHECK: %5 = addrspacecast ptr addrspace(5) %4 to ptr
+// CHECK: store ptr %5, ptr %uninitialized_a.ascast, align 8
+// CHECK: %6 = load i64, ptr %n.addr.ascast, align 8
+// CHECK: %7 = alloca i8, i64 %6, align 1, addrspace(5)
+// CHECK: %8 = addrspacecast ptr addrspace(5) %7 to ptr
+// CHECK: store ptr %8, ptr %aligned_a.ascast, align 8
+// CHECK: %9 = load i64, ptr %n.addr.ascast, align 8
+// CHECK: %10 = alloca i8, i64 %9, align 1, addrspace(5)
+// CHECK: %11 = addrspacecast ptr addrspace(5) %10 to ptr
+// CHECK: store ptr %11, ptr %aligned_uninitialized_a.ascast, align 8
+// CHECK: ret void
+// CHECK-CL12-NOT: addrspacecast
+// CHECK-CL20-NOT: addrspacecast
Index: clang/lib/CodeGen/CGBuiltin.cpp
===
--- clang/lib/CodeGen/CGBuiltin.cpp
+++ clang/lib/CodeGen/CGBuiltin.cpp
@@ -3517,6 +3517,12 @@
 return RValue::get(Result);
   }
 
+  // An alloca will always return a pointer to the alloca (stack) address
+  // space. This address space need not be the same as the AST / Language
+  // default (e.g. in C / C++ auto vars are in the generic address space). At
+  // the AST level this is handled within CreateTempAlloca et al., but for the
+  // builtin / dynamic alloca we have to handle it here. We use an explicit cast
+  // instead of passing an AS to CreateAlloca so as to not inhibit optimisation.
   case Builtin::BIalloca:
   case Builtin::BI_alloca:
   case Builtin::BI__builtin_alloca_uninitialized:
@@ -3532,6 +3538,13 @@
 AI->setAlignment(SuitableAlignmentInBytes);
 if (BuiltinID != Builtin::BI__builtin_alloca_uninitialized)
   initializeAlloca(*this, AI, Size, SuitableAlignmentInBytes);
+LangAS AAS = getASTAllocaAddressSpace();
+LangAS EAS = E->getType()->getPointeeType().getAddressSpace();
+if (AAS != EAS) {
+  llvm::Type *Ty = CGM.getTypes().ConvertType(E->getType());
+  return RValue::get(getTargetHooks().performAddrSpaceCast(*this, AI, AAS,
+   EAS, Ty));
+}
 return RValue::get(AI);
   }
 
@@ -3547,6 +3560,13 @@
 AI->setAlignment(AlignmentInBytes);
 if (BuiltinID != Builtin::BI__builtin_alloca_with_align_uninitialized)
   initializeAlloca(*this, AI, Size, AlignmentInBytes);
+LangAS AAS = getASTAllocaAddressSpace();
+LangAS EAS = E->getType()->getPointeeType().getAddressSpace();
+if (AAS != EAS) {
+  llvm::Type *Ty = CGM.getTypes().ConvertType(E->getType());
+  return RValue::get(getTargetHooks().performAddrSpaceCast(*this, AI, AAS,
+   EAS, Ty));
+}
 return RValue::get(AI);
   }
 


Index: clang/test/CodeGen/dynamic-alloca-with-address-space.c
===
--- /dev/null
+++ clang/test/CodeGen/dynamic-alloca-with-address-space.c
@@ -0,0 +1,41 @@
+// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -emit-llvm %s -o - \
+// RUN:   | FileCheck %s --check-prefix=CHECK
+// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -DOCL12 -x cl -std=cl1.2 \
+// RUN:   -emit-llvm %s -o - | FileCheck %s --check-prefix=CHECK-CL12
+// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -x cl -std=cl2.0 \
+// RUN

[PATCH] D156539: [Clang][CodeGen] `__builtin_alloca`s should care about address spaces too

2023-07-31 Thread Alex Voicu via Phabricator via cfe-commits
AlexVlx marked an inline comment as done.
AlexVlx added inline comments.



Comment at: clang/lib/CodeGen/CGBuiltin.cpp:3542
+LangAS AAS = getASTAllocaAddressSpace();
+LangAS EAS = E->getType().getAddressSpace();
+if (AAS != EAS) {

rjmccall wrote:
> `E->getType().getAddressSpace()` is actually the top-level qualification of 
> the type; you need `E->getType()->getPointeeType().getAddressSpace()`.
D'oh! I will probably never avoid getting tripped by this, sorry for the noise.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D156539/new/

https://reviews.llvm.org/D156539

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D156539: [Clang][CodeGen] `__builtin_alloca`s should care about address spaces too

2023-08-01 Thread Alex Voicu via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.
AlexVlx marked an inline comment as done.
Closed by commit rG51a014cb2d9c: [Clang][CodeGen] `__builtin_alloca`s should 
care about address spaces (authored by AlexVlx).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D156539/new/

https://reviews.llvm.org/D156539

Files:
  clang/lib/CodeGen/CGBuiltin.cpp
  clang/test/CodeGen/dynamic-alloca-with-address-space.c


Index: clang/test/CodeGen/dynamic-alloca-with-address-space.c
===
--- /dev/null
+++ clang/test/CodeGen/dynamic-alloca-with-address-space.c
@@ -0,0 +1,41 @@
+// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -emit-llvm %s -o - \
+// RUN:   | FileCheck %s --check-prefix=CHECK
+// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -DOCL12 -x cl -std=cl1.2 \
+// RUN:   -emit-llvm %s -o - | FileCheck %s --check-prefix=CHECK-CL12
+// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -x cl -std=cl2.0 \
+// RUN:   -emit-llvm %s -o - | FileCheck %s --check-prefix=CHECK-CL20
+
+#if defined(OCL12)
+#define CAST (char *)(unsigned long)
+#else
+#define CAST (char *)
+#endif
+
+void allocas(unsigned long n) {
+char *a = CAST __builtin_alloca(n);
+char *uninitialized_a = CAST __builtin_alloca_uninitialized(n);
+char *aligned_a = CAST __builtin_alloca_with_align(n, 8);
+char *aligned_uninitialized_a = CAST __builtin_alloca_with_align_uninitialized(n, 8);
+}
+
+// CHECK: @allocas(
+// CHECK: store i64 %n, ptr %n.addr.ascast, align 8
+// CHECK: %0 = load i64, ptr %n.addr.ascast, align 8
+// CHECK: %1 = alloca i8, i64 %0, align 8, addrspace(5)
+// CHECK: %2 = addrspacecast ptr addrspace(5) %1 to ptr
+// CHECK: store ptr %2, ptr %a.ascast, align 8
+// CHECK: %3 = load i64, ptr %n.addr.ascast, align 8
+// CHECK: %4 = alloca i8, i64 %3, align 8, addrspace(5)
+// CHECK: %5 = addrspacecast ptr addrspace(5) %4 to ptr
+// CHECK: store ptr %5, ptr %uninitialized_a.ascast, align 8
+// CHECK: %6 = load i64, ptr %n.addr.ascast, align 8
+// CHECK: %7 = alloca i8, i64 %6, align 1, addrspace(5)
+// CHECK: %8 = addrspacecast ptr addrspace(5) %7 to ptr
+// CHECK: store ptr %8, ptr %aligned_a.ascast, align 8
+// CHECK: %9 = load i64, ptr %n.addr.ascast, align 8
+// CHECK: %10 = alloca i8, i64 %9, align 1, addrspace(5)
+// CHECK: %11 = addrspacecast ptr addrspace(5) %10 to ptr
+// CHECK: store ptr %11, ptr %aligned_uninitialized_a.ascast, align 8
+// CHECK: ret void
+// CHECK-CL12-NOT: addrspacecast
+// CHECK-CL20-NOT: addrspacecast
Index: clang/lib/CodeGen/CGBuiltin.cpp
===
--- clang/lib/CodeGen/CGBuiltin.cpp
+++ clang/lib/CodeGen/CGBuiltin.cpp
@@ -3517,6 +3517,12 @@
 return RValue::get(Result);
   }
 
+  // An alloca will always return a pointer to the alloca (stack) address
+  // space. This address space need not be the same as the AST / Language
+  // default (e.g. in C / C++ auto vars are in the generic address space). At
+  // the AST level this is handled within CreateTempAlloca et al., but for the
+  // builtin / dynamic alloca we have to handle it here. We use an explicit cast
+  // instead of passing an AS to CreateAlloca so as to not inhibit optimisation.
   case Builtin::BIalloca:
   case Builtin::BI_alloca:
   case Builtin::BI__builtin_alloca_uninitialized:
@@ -3532,6 +3538,13 @@
 AI->setAlignment(SuitableAlignmentInBytes);
 if (BuiltinID != Builtin::BI__builtin_alloca_uninitialized)
   initializeAlloca(*this, AI, Size, SuitableAlignmentInBytes);
+LangAS AAS = getASTAllocaAddressSpace();
+LangAS EAS = E->getType()->getPointeeType().getAddressSpace();
+if (AAS != EAS) {
+  llvm::Type *Ty = CGM.getTypes().ConvertType(E->getType());
+  return RValue::get(getTargetHooks().performAddrSpaceCast(*this, AI, AAS,
+   EAS, Ty));
+}
 return RValue::get(AI);
   }
 
@@ -3547,6 +3560,13 @@
 AI->setAlignment(AlignmentInBytes);
 if (BuiltinID != Builtin::BI__builtin_alloca_with_align_uninitialized)
   initializeAlloca(*this, AI, Size, AlignmentInBytes);
+LangAS AAS = getASTAllocaAddressSpace();
+LangAS EAS = E->getType()->getPointeeType().getAddressSpace();
+if (AAS != EAS) {
+  llvm::Type *Ty = CGM.getTypes().ConvertType(E->getType());
+  return RValue::get(getTargetHooks().performAddrSpaceCast(*this, AI, AAS,
+   EAS, Ty));
+}
 return RValue::get(AI);
   }
 


Index: clang/test/CodeGen/dynamic-alloca-with-address-space.c
===
--- /dev/null
+++ clang/test/CodeGen/dynamic-alloca-with-address-space.c
@@ -0,0 +1,41 @@
+// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -emit-llvm %s -o - \
+// RUN:   | FileCheck %s --check-prefix=

[PATCH] D155870: [Clang][CodeGen] Another follow-up for `vtable`, `typeinfo` et al. are globals

2023-08-01 Thread Alex Voicu via Phabricator via cfe-commits
AlexVlx added a comment.

Ping?


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155870/new/

https://reviews.llvm.org/D155870

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D155850: [Clang][CodeGen][RFC] Add codegen support for C++ Parallel Algorithm Offload

2023-08-02 Thread Alex Voicu via Phabricator via cfe-commits
AlexVlx added inline comments.



Comment at: clang/lib/CodeGen/CodeGenModule.cpp:5315
+  !GV->isThreadLocal() &&
+  !(getLangOpts().HIPStdPar && getLangOpts().CUDAIsDevice)) {
 if (D->getTLSKind() == VarDecl::TLS_Dynamic)

efriedma wrote:
> You can't just pretend a thread-local variable isn't thread-local.  If the 
> intent here is that thread-local variables are illegal in device code, you 
> need to figure out some way to produce a diagnostic.  (Maybe by generating a 
> call to __stdpar_unsupported_threadlocal or something like that if code tries 
> to refer to such a variable.)
Oh, this is actually an error that slipped through; it appears I botched the 
diff. I'll correct it; apologies.
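
For reference, a minimal sketch of the case being discussed (HIP-flavoured
C++; the `__stdpar_unsupported_threadlocal` stub name is the reviewer's
suggestion above, not something the current diff emits):

  #define __global__ __attribute__((global))

  thread_local int tl_counter;

  // Referencing a thread-local from device code is the problematic case: it
  // must not be silently treated as an ordinary global. One option, per the
  // suggestion above, is to lower such a reference to a call to a stub like
  // __stdpar_unsupported_threadlocal, so the misuse is diagnosed later rather
  // than silently miscompiled.
  __global__ void k(int *out) { *out = tl_counter; }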


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155850/new/

https://reviews.llvm.org/D155850

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D155870: [Clang][CodeGen] Another follow-up for `vtable`, `typeinfo` et al. are globals

2023-08-03 Thread Alex Voicu via Phabricator via cfe-commits
AlexVlx updated this revision to Diff 546911.
AlexVlx added a comment.

Rebase.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155870/new/

https://reviews.llvm.org/D155870

Files:
  clang/lib/CodeGen/ItaniumCXXABI.cpp
  clang/test/CodeGenCXX/dynamic-cast-address-space.cpp


Index: clang/test/CodeGenCXX/dynamic-cast-address-space.cpp
===
--- /dev/null
+++ clang/test/CodeGenCXX/dynamic-cast-address-space.cpp
@@ -0,0 +1,24 @@
+// RUN: %clang_cc1 -I%S %s -triple amdgcn-amd-amdhsa -emit-llvm -fcxx-exceptions -fexceptions -o - | FileCheck %s
+struct A { virtual void f(); };
+struct B : A { };
+
+// CHECK: {{define.*@_Z1fP1A}}
+// CHECK-SAME:  personality ptr @__gxx_personality_v0
+B fail;
+const B& f(A *a) {
+  try {
+// CHECK: call ptr @__dynamic_cast
+// CHECK: br i1
+// CHECK: invoke void @__cxa_bad_cast() [[NR:#[0-9]+]]
+dynamic_cast<const B&>(*a);
+  } catch (...) {
+// CHECK:  landingpad { ptr, i32 }
+// CHECK-NEXT:   catch ptr null
+  }
+  return fail;
+}
+
+// CHECK: declare ptr @__dynamic_cast(ptr, ptr addrspace(1), ptr addrspace(1), i64) [[NUW_RO:#[0-9]+]]
+
+// CHECK: attributes [[NUW_RO]] = { nounwind memory(read) }
+// CHECK: attributes [[NR]] = { noreturn }
Index: clang/lib/CodeGen/ItaniumCXXABI.cpp
===
--- clang/lib/CodeGen/ItaniumCXXABI.cpp
+++ clang/lib/CodeGen/ItaniumCXXABI.cpp
@@ -1344,15 +1344,16 @@
 
 static llvm::FunctionCallee getItaniumDynamicCastFn(CodeGenFunction &CGF) {
   // void *__dynamic_cast(const void *sub,
-  //  const abi::__class_type_info *src,
-  //  const abi::__class_type_info *dst,
+  //  global_as const abi::__class_type_info *src,
+  //  global_as const abi::__class_type_info *dst,
   //  std::ptrdiff_t src2dst_offset);
 
   llvm::Type *Int8PtrTy = CGF.Int8PtrTy;
+  llvm::Type *GlobInt8PtrTy = CGF.GlobalsInt8PtrTy;
   llvm::Type *PtrDiffTy =
 CGF.ConvertType(CGF.getContext().getPointerDiffType());
 
-  llvm::Type *Args[4] = { Int8PtrTy, Int8PtrTy, Int8PtrTy, PtrDiffTy };
+  llvm::Type *Args[4] = { Int8PtrTy, GlobInt8PtrTy, GlobInt8PtrTy, PtrDiffTy };
 
   llvm::FunctionType *FTy = llvm::FunctionType::get(Int8PtrTy, Args, false);
 


Index: clang/test/CodeGenCXX/dynamic-cast-address-space.cpp
===
--- /dev/null
+++ clang/test/CodeGenCXX/dynamic-cast-address-space.cpp
@@ -0,0 +1,24 @@
+// RUN: %clang_cc1 -I%S %s -triple amdgcn-amd-amdhsa -emit-llvm -fcxx-exceptions -fexceptions -o - | FileCheck %s
+struct A { virtual void f(); };
+struct B : A { };
+
+// CHECK: {{define.*@_Z1fP1A}}
+// CHECK-SAME:  personality ptr @__gxx_personality_v0
+B fail;
+const B& f(A *a) {
+  try {
+// CHECK: call ptr @__dynamic_cast
+// CHECK: br i1
+// CHECK: invoke void @__cxa_bad_cast() [[NR:#[0-9]+]]
+dynamic_cast<const B&>(*a);
+  } catch (...) {
+// CHECK:  landingpad { ptr, i32 }
+// CHECK-NEXT:   catch ptr null
+  }
+  return fail;
+}
+
+// CHECK: declare ptr @__dynamic_cast(ptr, ptr addrspace(1), ptr addrspace(1), i64) [[NUW_RO:#[0-9]+]]
+
+// CHECK: attributes [[NUW_RO]] = { nounwind memory(read) }
+// CHECK: attributes [[NR]] = { noreturn }
Index: clang/lib/CodeGen/ItaniumCXXABI.cpp
===
--- clang/lib/CodeGen/ItaniumCXXABI.cpp
+++ clang/lib/CodeGen/ItaniumCXXABI.cpp
@@ -1344,15 +1344,16 @@
 
 static llvm::FunctionCallee getItaniumDynamicCastFn(CodeGenFunction &CGF) {
   // void *__dynamic_cast(const void *sub,
-  //  const abi::__class_type_info *src,
-  //  const abi::__class_type_info *dst,
+  //  global_as const abi::__class_type_info *src,
+  //  global_as const abi::__class_type_info *dst,
   //  std::ptrdiff_t src2dst_offset);
 
   llvm::Type *Int8PtrTy = CGF.Int8PtrTy;
+  llvm::Type *GlobInt8PtrTy = CGF.GlobalsInt8PtrTy;
   llvm::Type *PtrDiffTy =
 CGF.ConvertType(CGF.getContext().getPointerDiffType());
 
-  llvm::Type *Args[4] = { Int8PtrTy, Int8PtrTy, Int8PtrTy, PtrDiffTy };
+  llvm::Type *Args[4] = { Int8PtrTy, GlobInt8PtrTy, GlobInt8PtrTy, PtrDiffTy };
 
   llvm::FunctionType *FTy = llvm::FunctionType::get(Int8PtrTy, Args, false);
 
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D155870: [Clang][CodeGen] Another follow-up for `vtable`, `typeinfo` et al. are globals

2023-08-03 Thread Alex Voicu via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.
Closed by commit rG7240008c0afa: [Clang][CodeGen] `__dynamic_cast` should care 
about `type_info`'s address space (authored by AlexVlx).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155870/new/

https://reviews.llvm.org/D155870

Files:
  clang/lib/CodeGen/ItaniumCXXABI.cpp
  clang/test/CodeGenCXX/dynamic-cast-address-space.cpp


Index: clang/test/CodeGenCXX/dynamic-cast-address-space.cpp
===
--- /dev/null
+++ clang/test/CodeGenCXX/dynamic-cast-address-space.cpp
@@ -0,0 +1,24 @@
+// RUN: %clang_cc1 -I%S %s -triple amdgcn-amd-amdhsa -emit-llvm -fcxx-exceptions -fexceptions -o - | FileCheck %s
+struct A { virtual void f(); };
+struct B : A { };
+
+// CHECK: {{define.*@_Z1fP1A}}
+// CHECK-SAME:  personality ptr @__gxx_personality_v0
+B fail;
+const B& f(A *a) {
+  try {
+// CHECK: call ptr @__dynamic_cast
+// CHECK: br i1
+// CHECK: invoke void @__cxa_bad_cast() [[NR:#[0-9]+]]
+dynamic_cast<const B&>(*a);
+  } catch (...) {
+// CHECK:  landingpad { ptr, i32 }
+// CHECK-NEXT:   catch ptr null
+  }
+  return fail;
+}
+
+// CHECK: declare ptr @__dynamic_cast(ptr, ptr addrspace(1), ptr addrspace(1), i64) [[NUW_RO:#[0-9]+]]
+
+// CHECK: attributes [[NUW_RO]] = { nounwind memory(read) }
+// CHECK: attributes [[NR]] = { noreturn }
Index: clang/lib/CodeGen/ItaniumCXXABI.cpp
===
--- clang/lib/CodeGen/ItaniumCXXABI.cpp
+++ clang/lib/CodeGen/ItaniumCXXABI.cpp
@@ -1338,15 +1338,16 @@
 
 static llvm::FunctionCallee getItaniumDynamicCastFn(CodeGenFunction &CGF) {
   // void *__dynamic_cast(const void *sub,
-  //  const abi::__class_type_info *src,
-  //  const abi::__class_type_info *dst,
+  //  global_as const abi::__class_type_info *src,
+  //  global_as const abi::__class_type_info *dst,
   //  std::ptrdiff_t src2dst_offset);
 
   llvm::Type *Int8PtrTy = CGF.Int8PtrTy;
+  llvm::Type *GlobInt8PtrTy = CGF.GlobalsInt8PtrTy;
   llvm::Type *PtrDiffTy =
 CGF.ConvertType(CGF.getContext().getPointerDiffType());
 
-  llvm::Type *Args[4] = { Int8PtrTy, Int8PtrTy, Int8PtrTy, PtrDiffTy };
+  llvm::Type *Args[4] = { Int8PtrTy, GlobInt8PtrTy, GlobInt8PtrTy, PtrDiffTy };
 
   llvm::FunctionType *FTy = llvm::FunctionType::get(Int8PtrTy, Args, false);
 


Index: clang/test/CodeGenCXX/dynamic-cast-address-space.cpp
===
--- /dev/null
+++ clang/test/CodeGenCXX/dynamic-cast-address-space.cpp
@@ -0,0 +1,24 @@
+// RUN: %clang_cc1 -I%S %s -triple amdgcn-amd-amdhsa -emit-llvm -fcxx-exceptions -fexceptions -o - | FileCheck %s
+struct A { virtual void f(); };
+struct B : A { };
+
+// CHECK: {{define.*@_Z1fP1A}}
+// CHECK-SAME:  personality ptr @__gxx_personality_v0
+B fail;
+const B& f(A *a) {
+  try {
+// CHECK: call ptr @__dynamic_cast
+// CHECK: br i1
+// CHECK: invoke void @__cxa_bad_cast() [[NR:#[0-9]+]]
+dynamic_cast<const B&>(*a);
+  } catch (...) {
+// CHECK:  landingpad { ptr, i32 }
+// CHECK-NEXT:   catch ptr null
+  }
+  return fail;
+}
+
+// CHECK: declare ptr @__dynamic_cast(ptr, ptr addrspace(1), ptr addrspace(1), i64) [[NUW_RO:#[0-9]+]]
+
+// CHECK: attributes [[NUW_RO]] = { nounwind memory(read) }
+// CHECK: attributes [[NR]] = { noreturn }
Index: clang/lib/CodeGen/ItaniumCXXABI.cpp
===
--- clang/lib/CodeGen/ItaniumCXXABI.cpp
+++ clang/lib/CodeGen/ItaniumCXXABI.cpp
@@ -1338,15 +1338,16 @@
 
 static llvm::FunctionCallee getItaniumDynamicCastFn(CodeGenFunction &CGF) {
   // void *__dynamic_cast(const void *sub,
-  //  const abi::__class_type_info *src,
-  //  const abi::__class_type_info *dst,
+  //  global_as const abi::__class_type_info *src,
+  //  global_as const abi::__class_type_info *dst,
   //  std::ptrdiff_t src2dst_offset);
 
   llvm::Type *Int8PtrTy = CGF.Int8PtrTy;
+  llvm::Type *GlobInt8PtrTy = CGF.GlobalsInt8PtrTy;
   llvm::Type *PtrDiffTy =
 CGF.ConvertType(CGF.getContext().getPointerDiffType());
 
-  llvm::Type *Args[4] = { Int8PtrTy, Int8PtrTy, Int8PtrTy, PtrDiffTy };
+  llvm::Type *Args[4] = { Int8PtrTy, GlobInt8PtrTy, GlobInt8PtrTy, PtrDiffTy };
 
   llvm::FunctionType *FTy = llvm::FunctionType::get(Int8PtrTy, Args, false);
 
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D155850: [Clang][CodeGen][RFC] Add codegen support for C++ Parallel Algorithm Offload

2023-08-03 Thread Alex Voicu via Phabricator via cfe-commits
AlexVlx updated this revision to Diff 547024.
AlexVlx added a comment.

Remove noise, correct style.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155850/new/

https://reviews.llvm.org/D155850

Files:
  clang/lib/CodeGen/BackendUtil.cpp
  clang/lib/CodeGen/CGBuiltin.cpp
  clang/lib/CodeGen/CodeGenModule.cpp
  clang/test/CodeGenStdPar/unannotated-functions-get-emitted.cpp
  clang/test/CodeGenStdPar/unsupported-builtins.cpp

Index: clang/test/CodeGenStdPar/unsupported-builtins.cpp
===
--- /dev/null
+++ clang/test/CodeGenStdPar/unsupported-builtins.cpp
@@ -0,0 +1,8 @@
+// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -aux-triple x86_64-unknown-linux-gnu \
+// RUN:   --stdpar -x hip -emit-llvm -fcuda-is-device -o - %s | FileCheck %s
+
+#define __global__ __attribute__((global))
+
+__global__ void foo() { return __builtin_ia32_pause(); }
+
+// CHECK: declare void @__builtin_ia32_pause__stdpar_unsupported()
Index: clang/test/CodeGenStdPar/unannotated-functions-get-emitted.cpp
===
--- /dev/null
+++ clang/test/CodeGenStdPar/unannotated-functions-get-emitted.cpp
@@ -0,0 +1,19 @@
+// RUN: %clang_cc1 -x hip -emit-llvm -fcuda-is-device \
+// RUN:   -o - %s | FileCheck --check-prefix=NO-STDPAR-DEV %s
+
+// RUN: %clang_cc1 --stdpar -emit-llvm -fcuda-is-device \
+// RUN:   -o - %s | FileCheck --check-prefix=STDPAR-DEV %s
+
+#define __device__ __attribute__((device))
+
+// NO-STDPAR-DEV-NOT: define {{.*}} void @_Z3fooPff({{.*}})
+// STDPAR-DEV: define {{.*}} void @_Z3fooPff({{.*}})
+void foo(float *a, float b) {
+  *a = b;
+}
+
+// NO-STDPAR-DEV: define {{.*}} void @_Z3barPff({{.*}})
+// STDPAR-DEV: define {{.*}} void @_Z3barPff({{.*}})
+__device__ void bar(float *a, float b) {
+  *a = b;
+}
Index: clang/lib/CodeGen/CodeGenModule.cpp
===
--- clang/lib/CodeGen/CodeGenModule.cpp
+++ clang/lib/CodeGen/CodeGenModule.cpp
@@ -3550,7 +3550,10 @@
   !Global->hasAttr() &&
   !Global->hasAttr() &&
   !Global->getType()->isCUDADeviceBuiltinSurfaceType() &&
-  !Global->getType()->isCUDADeviceBuiltinTextureType())
+  !Global->getType()->isCUDADeviceBuiltinTextureType() &&
+  !(LangOpts.HIPStdPar &&
+isa<FunctionDecl>(Global) &&
+!Global->hasAttr()))
 return;
 } else {
   // We need to emit host-side 'shadows' for all global
Index: clang/lib/CodeGen/CGBuiltin.cpp
===
--- clang/lib/CodeGen/CGBuiltin.cpp
+++ clang/lib/CodeGen/CGBuiltin.cpp
@@ -2238,6 +2238,19 @@
   return nullptr;
 }
 
+static RValue EmitStdParUnsupportedBuiltin(CodeGenFunction *CGF,
+   const FunctionDecl *FD) {
+  auto Name = FD->getNameAsString() + "__stdpar_unsupported";
+  auto FnTy = CGF->CGM.getTypes().GetFunctionType(FD);
+  auto UBF = CGF->CGM.getModule().getOrInsertFunction(Name, FnTy);
+
+  SmallVector Args;
+  for (auto &&FormalTy : FnTy->params())
+Args.push_back(llvm::PoisonValue::get(FormalTy));
+
+  return RValue::get(CGF->Builder.CreateCall(UBF, Args));
+}
+
 RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
 const CallExpr *E,
 ReturnValueSlot ReturnValue) {
@@ -5545,6 +5558,9 @@
 llvm_unreachable("Bad evaluation kind in EmitBuiltinExpr");
   }
 
+  if (getLangOpts().HIPStdPar && getLangOpts().CUDAIsDevice)
+return EmitStdParUnsupportedBuiltin(this, FD);
+
   ErrorUnsupported(E, "builtin function");
 
   // Unknown builtin, for now just dump it out and return undef.
Index: clang/lib/CodeGen/BackendUtil.cpp
===
--- clang/lib/CodeGen/BackendUtil.cpp
+++ clang/lib/CodeGen/BackendUtil.cpp
@@ -77,6 +77,7 @@
 #include "llvm/Transforms/Scalar/EarlyCSE.h"
 #include "llvm/Transforms/Scalar/GVN.h"
 #include "llvm/Transforms/Scalar/JumpThreading.h"
+#include "llvm/Transforms/StdPar/StdPar.h"
 #include "llvm/Transforms/Utils/Debugify.h"
 #include "llvm/Transforms/Utils/EntryExitInstrumenter.h"
 #include "llvm/Transforms/Utils/ModuleUtils.h"
@@ -1093,6 +1094,16 @@
   TheModule->addModuleFlag(Module::Error, "UnifiedLTO", uint32_t(1));
   }
 
+  if (LangOpts.HIPStdPar) {
+if (LangOpts.CUDAIsDevice) {
+  if (!TargetTriple.isAMDGCN())
+MPM.addPass(StdParAcceleratorCodeSelectionPass());
+}
+else if (LangOpts.HIPStdParInterposeAlloc) {
+  MPM.addPass(StdParAllocationInterpositionPass());
+}
+  }
+
   // Now that we have all of the passes ready, run them.
   {
 PrettyStackTraceString CrashInfo("Optimizer");
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D155775: [Clang][Driver][RFC] Add driver support for C++ Parallel Algorithm Offload

2023-08-06 Thread Alex Voicu via Phabricator via cfe-commits
AlexVlx updated this revision to Diff 547570.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155775/new/

https://reviews.llvm.org/D155775

Files:
  clang/include/clang/Basic/DiagnosticDriverKinds.td
  clang/include/clang/Basic/LangOptions.def
  clang/include/clang/Driver/Options.td
  clang/lib/Driver/Driver.cpp
  clang/lib/Driver/ToolChains/AMDGPU.cpp
  clang/lib/Driver/ToolChains/Clang.cpp
  clang/lib/Driver/ToolChains/HIPAMD.cpp
  clang/lib/Driver/ToolChains/ROCm.h
  clang/test/Driver/Inputs/stdpar/stdpar_lib.hpp
  clang/test/Driver/stdpar.c

Index: clang/test/Driver/stdpar.c
===
--- /dev/null
+++ clang/test/Driver/stdpar.c
@@ -0,0 +1,18 @@
+// RUN: %clang -### -stdpar --compile %s 2>&1 | \
+// RUN:   FileCheck --check-prefix=STDPAR-MISSING-LIB %s
+// STDPAR-MISSING-LIB: error: cannot find HIP Standard Parallelism Acceleration library; provide it via '--stdpar-path'
+
+// RUN: %clang -### --stdpar --stdpar-path=%S/Inputs/stdpar \
+// RUN:   --stdpar-thrust-path=%S/Inputs/stdpar/thrust \
+// RUN:   --stdpar-prim-path=%S/Inputs/stdpar/prim --compile %s 2>&1 | \
+// RUN:   FileCheck --check-prefix=STDPAR-COMPILE %s
+// STDPAR-COMPILE: "-x" "hip"
+// STDPAR-COMPILE: "-idirafter" "{{.*/thrust}}"
+// STDPAR-COMPILE: "-idirafter" "{{.*/prim}}"
+// STDPAR-COMPILE: "-idirafter" "{{.*/Inputs/stdpar}}"
+// STDPAR-COMPILE: "-include" "stdpar_lib.hpp"
+
+// RUN: touch %t.o
+// RUN: %clang -### -stdpar %t.o 2>&1 | FileCheck --check-prefix=STDPAR-LINK %s
+// STDPAR-LINK: "-rpath"
+// STDPAR-LINK: "-l{{.*hip.*}}"
Index: clang/lib/Driver/ToolChains/ROCm.h
===
--- clang/lib/Driver/ToolChains/ROCm.h
+++ clang/lib/Driver/ToolChains/ROCm.h
@@ -77,6 +77,9 @@
   const Driver &D;
   bool HasHIPRuntime = false;
   bool HasDeviceLibrary = false;
+  bool HasHIPStdParLibrary = false;
+  bool HasRocThrustLibrary = false;
+  bool HasRocPrimLibrary = false;
 
   // Default version if not detected or specified.
   const unsigned DefaultVersionMajor = 3;
@@ -96,6 +99,13 @@
   std::vector RocmDeviceLibPathArg;
   // HIP runtime path specified by --hip-path.
   StringRef HIPPathArg;
+  // HIP Standard Parallel Algorithm acceleration library specified by
+  // --stdpar-path
+  StringRef HIPStdParPathArg;
+  // rocThrust algorithm library specified by --stdpar-thrust-path
+  StringRef HIPRocThrustPathArg;
+  // rocPrim algorithm library specified by --stdpar-prim-path
+  StringRef HIPRocPrimPathArg;
   // HIP version specified by --hip-version.
   StringRef HIPVersionArg;
   // Whether -nogpulib is specified.
@@ -180,6 +190,9 @@
   /// Check whether we detected a valid ROCm device library.
   bool hasDeviceLibrary() const { return HasDeviceLibrary; }
 
+  /// Check whether we detected a valid HIP STDPAR Acceleration library.
+  bool hasHIPStdParLibrary() const { return HasHIPStdParLibrary; }
+
   /// Print information about the detected ROCm installation.
   void print(raw_ostream &OS) const;
 
Index: clang/lib/Driver/ToolChains/HIPAMD.cpp
===
--- clang/lib/Driver/ToolChains/HIPAMD.cpp
+++ clang/lib/Driver/ToolChains/HIPAMD.cpp
@@ -115,6 +115,8 @@
 "--no-undefined",
 "-shared",
 "-plugin-opt=-amdgpu-internalize-symbols"};
+  if (Args.hasArg(options::OPT_stdpar))
+LldArgs.push_back("-plugin-opt=-amdgpu-enable-stdpar");
 
   auto &TC = getToolChain();
   auto &D = TC.getDriver();
@@ -246,6 +248,8 @@
   if (!DriverArgs.hasFlag(options::OPT_fgpu_rdc, options::OPT_fno_gpu_rdc,
   false))
 CC1Args.append({"-mllvm", "-amdgpu-internalize-symbols"});
+  if (DriverArgs.hasArgNoClaim(options::OPT_stdpar))
+CC1Args.append({"-mllvm", "-amdgpu-enable-stdpar"});
 
   StringRef MaxThreadsPerBlock =
   DriverArgs.getLastArgValue(options::OPT_gpu_max_threads_per_block_EQ);
Index: clang/lib/Driver/ToolChains/Clang.cpp
===
--- clang/lib/Driver/ToolChains/Clang.cpp
+++ clang/lib/Driver/ToolChains/Clang.cpp
@@ -6533,6 +6533,12 @@
 if (Args.hasFlag(options::OPT_fgpu_allow_device_init,
  options::OPT_fno_gpu_allow_device_init, false))
   CmdArgs.push_back("-fgpu-allow-device-init");
+if (Args.hasArg(options::OPT_stdpar)) {
+  CmdArgs.push_back("-stdpar");
+
+  if (Args.hasArg(options::OPT_stdpar_interpose_alloc))
+CmdArgs.push_back("-stdpar-interpose-alloc");
+}
 Args.addOptInFlag(CmdArgs, options::OPT_fhip_kernel_arg_name,
   options::OPT_fno_hip_kernel_arg_name);
   }
Index: clang/lib/Driver/ToolChains/AMDGPU.cpp
===
--- clang/lib/Driver/ToolChains/AMDGPU.cpp
+++ clang/lib/Driver/ToolChains/AMDGPU.cpp
@@ -329,6 +329,19 @@
   RocmDeviceLibPathArg =
   A

[PATCH] D155769: [Clang][docs][RFC] Add documentation for C++ Parallel Algorithm Offload

2023-08-06 Thread Alex Voicu via Phabricator via cfe-commits
AlexVlx updated this revision to Diff 547572.
AlexVlx added a comment.

Remove confusing guidance around mGPU.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155769/new/

https://reviews.llvm.org/D155769

Files:
  clang/docs/StdParSupport.rst
  clang/docs/index.rst

Index: clang/docs/index.rst
===
--- clang/docs/index.rst
+++ clang/docs/index.rst
@@ -47,6 +47,7 @@
OpenCLSupport
OpenMPSupport
SYCLSupport
+   StdParSupport
HIPSupport
HLSL/HLSLDocs
ThinLTO
Index: clang/docs/StdParSupport.rst
===
--- /dev/null
+++ clang/docs/StdParSupport.rst
@@ -0,0 +1,350 @@
+==
+C++ Standard Parallelism Offload Support: Compiler And Runtime
+==
+
+.. contents::
+   :local:
+
+Introduction
+
+
+This document describes the implementation of support for offloading the
+execution of standard C++ algorithms to accelerators that can be targeted via
+HIP. Furthermore, it enumerates restrictions on user defined code, as well as
+the interactions with runtimes.
+
+Algorithm Offload: What, Why, Where
+===
+
+C++17 introduced overloads
+`for most algorithms in the standard library `_
+which allow the user to specify a desired
+`execution policy `_.
+The `parallel_unsequenced_policy `_
+maps relatively well to the execution model of many accelerators, such as GPUs.
+This, coupled with the ubiquity of GPU accelerated algorithm libraries that
+implement most / all corresponding algorithms in the standard library
+(e.g. `rocThrust `_), makes
+it feasible to provide seamless accelerator offload for supported algorithms,
+when an accelerated version exists. Thus, it becomes possible to easily access
+the computational resources of an accelerator, via a well specified, familiar,
+algorithmic interface, without having to delve into low-level hardware specific
+details. Putting it all together:
+
+- **What**: standard library algorithms, when invoked with the
+  ``parallel_unsequenced_policy``
+- **Why**: democratise accelerator programming, without loss of user familiarity
+- **Where**: any and all accelerators that can be targeted by Clang/LLVM via HIP
+
+Small Example
+=
+
+Given the following C++ code, which assumes the ``std`` namespace is included:
+
+.. code-block:: C++
+
+   bool has_the_answer(const vector<int>& v) {
+ return find(execution::par_unseq, cbegin(v), cend(v), 42) != cend(v);
+   }
+
+if Clang is invoked with the ``-stdpar --offload-target=foo`` flags, the call to
+``find`` will be offloaded to an accelerator that is part of the ``foo`` target
+family. If either ``foo`` or its runtime environment do not support transparent
+on-demand paging (such as e.g. that provided in Linux via
+`HMM `_), it is necessary to also include
+the ``--stdpar-interpose-alloc`` flag. If the accelerator specific algorithm
+library ``foo`` uses doesn't have an implementation of a particular algorithm,
+execution seamlessly falls back to the host CPU. It is legal to specify multiple
+``--offload-target``s. All the flags we introduce, as well as a thorough view of
+various restrictions and their implications will be provided below.
+
+Implementation - General View
+=
+
+We built support for Algorithm Offload atop the pre-existing HIP
+infrastructure. More specifically, when one requests offload via ``-stdpar``,
+compilation is switched to HIP compilation, as if ``-x hip`` was specified.
+Similarly, linking is also switched to HIP linking, as if ``--hip-link`` was
+specified. Note that these are implicit, and one should not assume that any
+interop with HIP specific language constructs is available e.g. ``__device__``
+annotations are neither necessary nor guaranteed to work.
+
+Since there are no language restriction mechanisms in place, it is necessary to
+relax HIP language specific semantic checks performed by the FE; they would
+identify otherwise valid, offloadable code, as invalid HIP code. Given that we
+know that the user intended only for certain algorithms to be offloaded, and
+encoded this by specifying the ``parallel_unsequenced_policy``, we rely on a
+pass over IR to clean up any and all code that was not "meant" for offload. If
+requested, allocation interposition is also handled via a separate pass over IR.
+
+To interface with the client HIP runtime, and to forward offloaded algorithm
+invocations to the corresponding accelerator specific library implementation, an
+implementation de

[PATCH] D155769: [Clang][docs][RFC] Add documentation for C++ Parallel Algorithm Offload

2023-08-06 Thread Alex Voicu via Phabricator via cfe-commits
AlexVlx marked an inline comment as done.
AlexVlx added inline comments.



Comment at: clang/docs/StdParSupport.rst:366-367
+
+   Note that this is a temporary, unsafe workaround for a deficiency in the C++
+   Standard.
+

keryell wrote:
> Another way could be to hide somehow a way to select the device in the policy 
> like in https://github.com/KhronosGroup/SyclParallelSTL, which might be 
> something included in your point "4." of "Open Questions / Future 
> Developments".
> Perhaps better than opening the TLS Pandora box?
In hindsight, this was needlessly confusing and relied on an implementation 
detail; the reference has therefore been removed. Thank you for pointing that out.
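
For illustration only, the alternative raised above could look roughly like
the following (hypothetical API, not something this work provides; loosely
modelled on SyclParallelSTL's named policies):

  // Hypothetical: a policy value that carries the device selection, instead
  // of relying on ambient / thread-local state to pick the accelerator.
  struct par_unseq_on_device {
    int device_id;
  };

  // std::for_each(par_unseq_on_device{1}, v.begin(), v.end(), f);
  //   ^ would dispatch to device 1, if such an overload were provided.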


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155769/new/

https://reviews.llvm.org/D155769

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D155850: [Clang][CodeGen][RFC] Add codegen support for C++ Parallel Algorithm Offload

2023-08-07 Thread Alex Voicu via Phabricator via cfe-commits
AlexVlx updated this revision to Diff 548022.
AlexVlx added a comment.

Extend handling of unsupported builtins to include dealing with the `target` 
attribute.
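
For context, a hedged sketch of the `target`-attribute scenario this update is
aimed at (illustrative only; assumes a -stdpar device compilation with an x86
aux target, in the spirit of the builtin test below):

  // Mimics how x86 intrinsic wrappers are declared (always_inline plus a
  // target feature the accelerator cannot satisfy); host-only code like this
  // gets swept into the device compilation under -stdpar.
  __attribute__((always_inline, target("avx2")))
  static inline int host_helper(int x) { return x + 1; }

  // Without the relaxation, emitting this caller for the device would raise
  // err_function_needs_feature in the FE; with -stdpar the check is deferred,
  // so it only becomes an error if the call actually survives accelerator
  // code selection.
  int calls_host_helper(int x) { return host_helper(x); }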


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155850/new/

https://reviews.llvm.org/D155850

Files:
  clang/lib/CodeGen/BackendUtil.cpp
  clang/lib/CodeGen/CGBuiltin.cpp
  clang/lib/CodeGen/CodeGenFunction.cpp
  clang/lib/CodeGen/CodeGenModule.cpp
  clang/test/CodeGenStdPar/unannotated-functions-get-emitted.cpp
  clang/test/CodeGenStdPar/unsupported-builtins.cpp

Index: clang/test/CodeGenStdPar/unsupported-builtins.cpp
===
--- /dev/null
+++ clang/test/CodeGenStdPar/unsupported-builtins.cpp
@@ -0,0 +1,8 @@
+// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -aux-triple x86_64-unknown-linux-gnu \
+// RUN:   --stdpar -x hip -emit-llvm -fcuda-is-device -o - %s | FileCheck %s
+
+#define __global__ __attribute__((global))
+
+__global__ void foo() { return __builtin_ia32_pause(); }
+
+// CHECK: declare void @__builtin_ia32_pause__stdpar_unsupported()
Index: clang/test/CodeGenStdPar/unannotated-functions-get-emitted.cpp
===
--- /dev/null
+++ clang/test/CodeGenStdPar/unannotated-functions-get-emitted.cpp
@@ -0,0 +1,19 @@
+// RUN: %clang_cc1 -x hip -emit-llvm -fcuda-is-device \
+// RUN:   -o - %s | FileCheck --check-prefix=NO-STDPAR-DEV %s
+
+// RUN: %clang_cc1 --stdpar -emit-llvm -fcuda-is-device \
+// RUN:   -o - %s | FileCheck --check-prefix=STDPAR-DEV %s
+
+#define __device__ __attribute__((device))
+
+// NO-STDPAR-DEV-NOT: define {{.*}} void @_Z3fooPff({{.*}})
+// STDPAR-DEV: define {{.*}} void @_Z3fooPff({{.*}})
+void foo(float *a, float b) {
+  *a = b;
+}
+
+// NO-STDPAR-DEV: define {{.*}} void @_Z3barPff({{.*}})
+// STDPAR-DEV: define {{.*}} void @_Z3barPff({{.*}})
+__device__ void bar(float *a, float b) {
+  *a = b;
+}
Index: clang/lib/CodeGen/CodeGenModule.cpp
===
--- clang/lib/CodeGen/CodeGenModule.cpp
+++ clang/lib/CodeGen/CodeGenModule.cpp
@@ -3550,7 +3550,10 @@
   !Global->hasAttr() &&
   !Global->hasAttr() &&
   !Global->getType()->isCUDADeviceBuiltinSurfaceType() &&
-  !Global->getType()->isCUDADeviceBuiltinTextureType())
+  !Global->getType()->isCUDADeviceBuiltinTextureType() &&
+  !(LangOpts.HIPStdPar &&
+isa<FunctionDecl>(Global) &&
+!Global->hasAttr()))
 return;
 } else {
   // We need to emit host-side 'shadows' for all global
Index: clang/lib/CodeGen/CodeGenFunction.cpp
===
--- clang/lib/CodeGen/CodeGenFunction.cpp
+++ clang/lib/CodeGen/CodeGenFunction.cpp
@@ -2594,10 +2594,15 @@
   std::string MissingFeature;
   llvm::StringMap<bool> CallerFeatureMap;
   CGM.getContext().getFunctionFeatureMap(CallerFeatureMap, FD);
+  // When compiling in StdPar mode we have to be conservative in rejecting
+  // target specific features in the FE, and defer the possible error to the
+  // AcceleratorCodeSelection pass, wherein iff an unsupported target builtin is
+  // referenced by an accelerator executable function, we emit an error.
+  bool IsStdPar = getLangOpts().HIPStdPar && getLangOpts().CUDAIsDevice;
   if (BuiltinID) {
 StringRef FeatureList(CGM.getContext().BuiltinInfo.getRequiredFeatures(BuiltinID));
 if (!Builtin::evaluateRequiredTargetFeatures(
-FeatureList, CallerFeatureMap)) {
+FeatureList, CallerFeatureMap) && !IsStdPar) {
   CGM.getDiags().Report(Loc, diag::err_builtin_needs_feature)
   << TargetDecl->getDeclName()
   << FeatureList;
@@ -2630,7 +2635,7 @@
 return false;
   }
   return true;
-}))
+}) && !IsStdPar)
   CGM.getDiags().Report(Loc, diag::err_function_needs_feature)
   << FD->getDeclName() << TargetDecl->getDeclName() << MissingFeature;
   } else if (!FD->isMultiVersion() && FD->hasAttr<TargetAttr>()) {
@@ -2639,7 +2644,8 @@
 
 for (const auto &F : CalleeFeatureMap) {
   if (F.getValue() && (!CallerFeatureMap.lookup(F.getKey()) ||
-   !CallerFeatureMap.find(F.getKey())->getValue()))
+   !CallerFeatureMap.find(F.getKey())->getValue()) &&
+  !IsStdPar)
 CGM.getDiags().Report(Loc, diag::err_function_needs_feature)
 << FD->getDeclName() << TargetDecl->getDeclName() << F.getKey();
 }
Index: clang/lib/CodeGen/CGBuiltin.cpp
===
--- clang/lib/CodeGen/CGBuiltin.cpp
+++ clang/lib/CodeGen/CGBuiltin.cpp
@@ -2238,6 +2238,19 @@
   return nullptr;
 }
 
+static RValue EmitStdParUnsupportedBuiltin(CodeGenFunction *CGF,
+   const FunctionDecl *FD) {
+  auto Name = FD->getNameAsString() + "__stdpar_unsupported";
+  auto FnTy = CGF->CGM.getTypes().GetFunctionType(FD);
+

[PATCH] D155850: [Clang][CodeGen][RFC] Add codegen support for C++ Parallel Algorithm Offload

2023-08-08 Thread Alex Voicu via Phabricator via cfe-commits
AlexVlx added a comment.

In D155850#4570336 , @efriedma wrote:

> LGTM (but please don't merge until we reach consensus on the overall feature)

Of course, and thank you for the review. Please, do stick around if you don't 
mind, because this'll still get at least one update.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155850/new/

https://reviews.llvm.org/D155850

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D157452: [RFC][Clang][Codegen] `std::type_info` needs special care with explicit address spaces

2023-08-08 Thread Alex Voicu via Phabricator via cfe-commits
AlexVlx created this revision.
AlexVlx added reviewers: rjmccall, efriedma, yaxunl, arsenm.
AlexVlx added a project: clang.
Herald added a project: All.
AlexVlx requested review of this revision.
Herald added subscribers: cfe-commits, wdng.

After https://reviews.llvm.org/D153092, for targets that use a non-default AS 
for globals, an "interesting" situation arises around `typeid` and its paired 
type, `type_info`:

- on the AST level, the `type_info` interface is defined with default / generic 
addresses, be it for function arguments, or for `this`;
- in IR, `type_info` values are globals, and thus pointers to `type_info`
values are pointers to globals.

This leads to a mismatch between the function signature / formal type of the
argument and its actual type. Currently we try to handle such mismatches via
bitcast, but that is wrong in this case, since an addrspacecast is required.
**However**, I'm not convinced this is the right way to address it; I've just
not found a better / less noisy one (yet?). The other alternative would be to
special-case codegen for typeinfo itself, and adjust the IR function signatures
there. That's less centralised and doesn't actually seem correct, since
`type_info` has a C++ (mangled) interface, and we'd basically be lying about the
type. Hopefully I'm missing some obvious and significantly tidier solution -
hence the RFC.
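
To make the mismatch concrete, here is a rough sketch (illustrative only, not
verbatim compiler output; the function name is hypothetical) of what happens for
a target whose globals live in addrspace(1):

  // C++ source
  #include <typeinfo>
  const std::type_info& f(int x) { return typeid(x); }

  ; IR: the RTTI object is emitted as a global in addrspace(1)
  ;   @_ZTIi = external addrspace(1) constant ptr addrspace(1)
  ; while the AST-level type of the typeid expression is a generic
  ; (addrspace(0)) 'const std::type_info&', so forming the result needs
  ;   addrspacecast (ptr addrspace(1) @_ZTIi to ptr)
  ; rather than the bitcast we currently try to emit.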


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D157452

Files:
  clang/lib/CodeGen/CGCall.cpp
  clang/lib/CodeGen/CGExprCXX.cpp
  clang/lib/CodeGen/ItaniumCXXABI.cpp
  clang/test/CodeGenCXX/typeid-cxx11-with-address-space.cpp
  clang/test/CodeGenCXX/typeid-with-address-space.cpp
  clang/test/CodeGenCXX/typeinfo
  clang/test/CodeGenCXX/typeinfo-with-address-space.cpp

Index: clang/test/CodeGenCXX/typeinfo-with-address-space.cpp
===
--- /dev/null
+++ clang/test/CodeGenCXX/typeinfo-with-address-space.cpp
@@ -0,0 +1,51 @@
+// RUN: %clang_cc1 -I%S %s -triple amdgcn-amd-amdhsa -emit-llvm -o - | FileCheck %s -check-prefix=AS
+// RUN: %clang_cc1 -I%S %s -triple x86_64-linux-gnu -emit-llvm -o - | FileCheck %s -check-prefix=NO-AS
+#include 
+
+class A {
+virtual void f() = 0;
+};
+
+class B : A {
+void f() override;
+};
+
+// AS: @_ZTISt9type_info = external addrspace(1) constant ptr addrspace(1)
+// NO-AS: @_ZTISt9type_info = external constant ptr
+// AS: @_ZTIi = external addrspace(1) constant ptr addrspace(1)
+// NO-AS: @_ZTIi = external constant ptr
+// AS: @_ZTVN10__cxxabiv117__class_type_infoE = external addrspace(1) global ptr addrspace(1)
+// NO-AS: @_ZTVN10__cxxabiv117__class_type_infoE = external global ptr
+// AS: @_ZTS1A = linkonce_odr addrspace(1) constant [3 x i8] c"1A\00", comdat, align 1
+// NO-AS: @_ZTS1A = linkonce_odr constant [3 x i8] c"1A\00", comdat, align 1
+// AS: @_ZTI1A = linkonce_odr addrspace(1) constant { ptr addrspace(1), ptr addrspace(1) } { ptr addrspace(1) getelementptr inbounds (ptr addrspace(1), ptr addrspace(1) @_ZTVN10__cxxabiv117__class_type_infoE, i64 2), ptr addrspace(1) @_ZTS1A }, comdat, align 8
+// NO-AS: @_ZTI1A = linkonce_odr constant { ptr, ptr } { ptr getelementptr inbounds (ptr, ptr @_ZTVN10__cxxabiv117__class_type_infoE, i64 2), ptr @_ZTS1A }, comdat, align 8
+// AS: @_ZTIf = external addrspace(1) constant ptr addrspace(1)
+// NO-AS: @_ZTIf = external constant ptr
+
+unsigned long Fn(B& b) {
+// AS: %3 = addrspacecast ptr addrspace(1) %2 to ptr
+// AS-NEXT: %call = call noundef zeroext i1 @_ZNKSt9type_infoeqERKS_(ptr {{.*}} addrspacecast (ptr addrspace(1) @_ZTISt9type_info to ptr), ptr {{.*}} %3)
+// NO-AS: %call = call noundef zeroext i1 @_ZNKSt9type_infoeqERKS_(ptr {{.*}} @_ZTISt9type_info, ptr {{.*}} %2)
+if (typeid(std::type_info) == typeid(b))
+return 42;
+// AS: %7 = addrspacecast ptr addrspace(1) %6 to ptr
+// AS-NEXT: %call2 = call noundef zeroext i1 @_ZNKSt9type_infoneERKS_(ptr {{.*}} addrspacecast (ptr addrspace(1) @_ZTIi to ptr), ptr {{.*}} %7)
+// NO-AS: %call2 = call noundef zeroext i1 @_ZNKSt9type_infoneERKS_(ptr {{.*}} @_ZTIi, ptr {{.*}} %5)
+if (typeid(int) != typeid(b))
+return 1712;
+// AS: %11 = addrspacecast ptr addrspace(1) %10 to ptr
+// AS-NEXT: %call7 = call noundef ptr @_ZNKSt9type_info4nameEv(ptr {{.*}} %11)
+// NO-AS: %call7 = call noundef ptr @_ZNKSt9type_info4nameEv(ptr {{.*}} %8)
+if (typeid(A).name() == typeid(b).name())
+return 0;
+// AS: %15 = addrspacecast ptr addrspace(1) %14 to ptr
+// AS-NEXT: %call11 = call noundef zeroext i1 @_ZNKSt9type_info6beforeERKS_(ptr {{.*}} %15, ptr {{.*}} addrspacecast (ptr addrspace(1) @_ZTIf to ptr))
+// NO-AS:   %call11 = call noundef zeroext i1 @_ZNKSt9type_info6beforeERKS_(ptr {{.*}} %11, ptr {{.*}} @_ZTIf)
+if (typeid(b).before(typeid(float)))
+return 1;
+// AS: %19 = addrspacecast ptr addrspace(1) %18 to ptr
+// AS-NEXT: %call15 = call noundef i64 @_ZNKSt9type_info9hash_codeEv(ptr {{.*}} %19)
+// NO-AS: %cal

[PATCH] D157452: [RFC][Clang][Codegen] `std::type_info` needs special care with explicit address spaces

2023-08-09 Thread Alex Voicu via Phabricator via cfe-commits
AlexVlx added a comment.

In D157452#4573076 , @yaxunl wrote:

> It is a little concerning how far the global address will spread further. 
> Compared to handling user-defined global variables, we keep the global 
> address to its definition in the IR and any use of it will use the generic 
> pointer addrcasted from its definition. This simplifies things a lot since 
> the AST is not aware of the global address. Should we reconsider the handling 
> of type id and type info here with a similar approach to how the user-defined 
> global variables are handled? Or we are confident that the effect of global 
> address can be confined.

This is a really good observation / question. I'm fairly confident this should 
be the last piece of noise, until (if) we decide to do something about 
functions. My reservation about address casting uses is that it might lead to a 
proliferation of casts, and it also appeared (when I tried) that it would be 
fairly spread out. `type_info` is special and a bit obnoxious because it is 
actually defined in the standard and has a mangled interface, so "lying" about 
the signature of its member functions in IR seems risky too. FWIW, this was 
silently incorrect in some cases today as well (without `assert`s we just do 
the bitcast, which happens to work on our target).
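
To make the trade-off concrete, a rough sketch of the two options (illustrative
IR only, not actual compiler output):

  ; (a) cast-at-use, as currently done for user-defined globals: the definition
  ;     keeps the global address space and each use goes through a generic ptr
  @g = addrspace(1) global i32 0
  %v = load i32, ptr addrspacecast (ptr addrspace(1) @g to ptr)

  ; (b) alternatively, propagating ptr addrspace(1) into the std::type_info
  ;     member function signatures would avoid the casts at the uses, but then
  ;     the IR signatures no longer match the mangled, standard-specified C++
  ;     interface of type_info.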


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D157452/new/

https://reviews.llvm.org/D157452

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D155769: [HIP][Clang][docs][RFC] Add documentation for C++ Parallel Algorithm Offload

2023-09-10 Thread Alex Voicu via Phabricator via cfe-commits
AlexVlx updated this revision to Diff 556369.
AlexVlx added a comment.

Fix some wording to further clarify this is an HIP + AMDGPU exclusive extension.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155769/new/

https://reviews.llvm.org/D155769

Files:
  clang/docs/HIPSupport.rst
  clang/docs/ReleaseNotes.rst

Index: clang/docs/ReleaseNotes.rst
===
--- clang/docs/ReleaseNotes.rst
+++ clang/docs/ReleaseNotes.rst
@@ -340,6 +340,11 @@
 CUDA/HIP Language Changes
 ^
 
+- [HIP Only] add experimental support for offloading select C++ Algorithms,
+  which can be toggled via the ``-hipstdpar`` flag. This is available only for
+  the AMDGPU target at the moment. See the dedicated entry in the `HIP Language
+  support document for details `_
+
 CUDA Support
 
 
Index: clang/docs/HIPSupport.rst
===
--- clang/docs/HIPSupport.rst
+++ clang/docs/HIPSupport.rst
@@ -176,3 +176,392 @@
* - ``HIP_API_PER_THREAD_DEFAULT_STREAM``
  - Alias to ``__HIP_API_PER_THREAD_DEFAULT_STREAM__``. Deprecated.
 
+C++ Standard Parallelism Offload Support: Compiler And Runtime
+==
+
+Introduction
+
+
+This section describes the implementation of support for offloading the
+execution of standard C++ algorithms to accelerators that can be targeted via
+HIP. Furthermore, it enumerates restrictions on user defined code, as well as
+the interactions with runtimes.
+
+Algorithm Offload: What, Why, Where
+===
+
+C++17 introduced overloads
+`for most algorithms in the standard library `_
+which allow the user to specify a desired
+`execution policy `_.
+The `parallel_unsequenced_policy `_
+maps relatively well to the execution model of AMD GPUs. This, coupled with
+the availability and maturity of GPU accelerated algorithm libraries that
+implement most / all corresponding algorithms in the standard library
+(e.g. `rocThrust `_), makes
+it feasible to provide seamless accelerator offload for supported algorithms,
+when an accelerated version exists. Thus, it becomes possible to easily access
+the computational resources of an AMD accelerator, via a well specified,
+familiar, algorithmic interface, without having to delve into low-level hardware
+specific details. Putting it all together:
+
+- **What**: standard library algorithms, when invoked with the
+  ``parallel_unsequenced_policy``
+- **Why**: democratise AMDGPU accelerator programming, without loss of user
+  familiarity
+- **Where**: only AMDGPU accelerators targeted by Clang/LLVM via HIP
+
+Small Example
+=
+
+Given the following C++ code, which assumes the ``std`` namespace is included:
+
+.. code-block:: C++
+
+   bool has_the_answer(const vector& v) {
+ return find(execution::par_unseq, cbegin(v), cend(v), 42) != cend(v);
+   }
+
+if Clang is invoked with the ``-hipstdpar --offload-arch=foo`` flags, the call
+to ``find`` will be offloaded to an accelerator that is part of the ``foo``
+target family. If either ``foo`` or its runtime environment do not support
+transparent on-demand paging (such as e.g. that provided in Linux via
+`HMM `_), it is necessary to also include
+the ``--hipstdpar-interpose-alloc`` flag. If the accelerator specific algorithm
+library ``foo`` uses doesn't have an implementation of a particular algorithm,
+execution seamlessly falls back to the host CPU. It is legal to specify multiple
+``--offload-arch``s. All the flags we introduce, as well as a thorough view of
+various restrictions and their implications will be provided below.
+
+Implementation - General View
+=
+
+We built support for Algorithm Offload atop the pre-existing HIP
+infrastructure. More specifically, when one requests offload via ``-hipstdpar``,
+compilation is switched to HIP compilation, as if ``-x hip`` was specified.
+Similarly, linking is also switched to HIP linking, as if ``--hip-link`` was
+specified. Note that these are implicit, and one should not assume that any
+interop with HIP specific language constructs is available e.g. ``__device__``
+annotations are neither necessary nor guaranteed to work.
+
+Since there are no language restriction mechanisms in place, it is necessary to
+relax HIP language specific semantic checks performed by the FE; they would
+identify otherwise valid, offloadable code, as invalid HIP code. Given that we
+know that the user intended only for certain algorithms to be offloaded, and
+encoded this

[PATCH] D155769: [HIP][Clang][docs][RFC] Add documentation for C++ Parallel Algorithm Offload

2023-09-12 Thread Alex Voicu via Phabricator via cfe-commits
AlexVlx updated this revision to Diff 556618.
AlexVlx added a comment.

Address review feedback around explicit use of `std::`.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155769/new/

https://reviews.llvm.org/D155769

Files:
  clang/docs/HIPSupport.rst
  clang/docs/ReleaseNotes.rst

Index: clang/docs/ReleaseNotes.rst
===
--- clang/docs/ReleaseNotes.rst
+++ clang/docs/ReleaseNotes.rst
@@ -340,6 +340,11 @@
 CUDA/HIP Language Changes
 ^

+- [HIP Only] add experimental support for offloading select C++ Algorithms,
+  which can be toggled via the ``-hipstdpar`` flag. This is available only for
+  the AMDGPU target at the moment. See the dedicated entry in the `HIP Language
+  support document for details `_
+
 CUDA Support
 

Index: clang/docs/HIPSupport.rst
===
--- clang/docs/HIPSupport.rst
+++ clang/docs/HIPSupport.rst
@@ -176,3 +176,392 @@
* - ``HIP_API_PER_THREAD_DEFAULT_STREAM``
  - Alias to ``__HIP_API_PER_THREAD_DEFAULT_STREAM__``. Deprecated.

+C++ Standard Parallelism Offload Support: Compiler And Runtime
+==
+
+Introduction
+
+
+This section describes the implementation of support for offloading the
+execution of standard C++ algorithms to accelerators that can be targeted via
+HIP. Furthermore, it enumerates restrictions on user defined code, as well as
+the interactions with runtimes.
+
+Algorithm Offload: What, Why, Where
+===
+
+C++17 introduced overloads
+`for most algorithms in the standard library `_
+which allow the user to specify a desired
+`execution policy `_.
+The `parallel_unsequenced_policy `_
+maps relatively well to the execution model of AMD GPUs. This, coupled with
+the availability and maturity of GPU accelerated algorithm libraries that
+implement most / all corresponding algorithms in the standard library
+(e.g. `rocThrust `_), makes
+it feasible to provide seamless accelerator offload for supported algorithms,
+when an accelerated version exists. Thus, it becomes possible to easily access
+the computational resources of an AMD accelerator, via a well specified,
+familiar, algorithmic interface, without having to delve into low-level hardware
+specific details. Putting it all together:
+
+- **What**: standard library algorithms, when invoked with the
+  ``parallel_unsequenced_policy``
+- **Why**: democratise AMDGPU accelerator programming, without loss of user
+  familiarity
+- **Where**: only AMDGPU accelerators targeted by Clang/LLVM via HIP
+
+Small Example
+=
+
+Given the following C++ code:
+
+.. code-block:: C++
+
+   bool has_the_answer(const std::vector& v) {
+ return std::find(std::execution::par_unseq, std::cbegin(v), std::cend(v), 42) != cend(v);
+   }
+
+if Clang is invoked with the ``-hipstdpar --offload-arch=foo`` flags, the call
+to ``find`` will be offloaded to an accelerator that is part of the ``foo``
+target family. If either ``foo`` or its runtime environment do not support
+transparent on-demand paging (such as e.g. that provided in Linux via
+`HMM `_), it is necessary to also include
+the ``--hipstdpar-interpose-alloc`` flag. If the accelerator specific algorithm
+library ``foo`` uses doesn't have an implementation of a particular algorithm,
+execution seamlessly falls back to the host CPU. It is legal to specify multiple
+``--offload-arch``s. All the flags we introduce, as well as a thorough view of
+various restrictions and their implications will be provided below.
+
+Implementation - General View
+=
+
+We built support for Algorithm Offload atop the pre-existing HIP
+infrastructure. More specifically, when one requests offload via ``-hipstdpar``,
+compilation is switched to HIP compilation, as if ``-x hip`` was specified.
+Similarly, linking is also switched to HIP linking, as if ``--hip-link`` was
+specified. Note that these are implicit, and one should not assume that any
+interop with HIP specific language constructs is available e.g. ``__device__``
+annotations are neither necessary nor guaranteed to work.
+
+Since there are no language restriction mechanisms in place, it is necessary to
+relax HIP language specific semantic checks performed by the FE; they would
+identify otherwise valid, offloadable code, as invalid HIP code. Given that we
+know that the user intended only for certain algorithms to be offloaded, and
+encoded this by specifying the ``parallel_unsequenced_policy``, 

[PATCH] D155769: [HIP][Clang][docs][RFC] Add documentation for C++ Parallel Algorithm Offload

2023-09-12 Thread Alex Voicu via Phabricator via cfe-commits
AlexVlx marked an inline comment as done.
AlexVlx added inline comments.



Comment at: clang/docs/HIPSupport.rst:216
+
+Given the following C++ code, which assumes the ``std`` namespace is included:
+

keryell wrote:
> Since this does not sounds like an official wording and this is not a 
> recommended practice 
> https://isocpp.org/wiki/faq/coding-standards#using-namespace-std, I suggest 
> just adding `std::` everywhere since this is an end-user document.
> Further more it makes clear that your extension can work with the standard 
> library instead of something that would be declared in a namespace from your 
> extension.
> 
Addressed, thanks!


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155769/new/

https://reviews.llvm.org/D155769

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D153092: [Clang][CodeGen]`vtable`, `typeinfo` et al. are globals

2023-07-09 Thread Alex Voicu via Phabricator via cfe-commits
AlexVlx updated this revision to Diff 538437.
AlexVlx added a comment.
Herald added a subscriber: wangpc.

Rebase.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153092/new/

https://reviews.llvm.org/D153092

Files:
  clang/lib/CodeGen/CGVTT.cpp
  clang/lib/CodeGen/CGVTables.cpp
  clang/lib/CodeGen/CodeGenModule.cpp
  clang/lib/CodeGen/ItaniumCXXABI.cpp
  clang/test/CodeGenCXX/vtable-align-address-space.cpp
  clang/test/CodeGenCXX/vtable-assume-load-address-space.cpp
  clang/test/CodeGenCXX/vtable-consteval-address-space.cpp
  clang/test/CodeGenCXX/vtable-constexpr-address-space.cpp
  clang/test/CodeGenCXX/vtable-key-function-address-space.cpp
  clang/test/CodeGenCXX/vtable-layout-extreme-address-space.cpp
  clang/test/CodeGenCXX/vtable-linkage-address-space.cpp
  clang/test/CodeGenCXX/vtable-pointer-initialization-address-space.cpp
  clang/test/CodeGenCXX/vtt-address-space.cpp
  clang/test/CodeGenCXX/vtt-layout-address-space.cpp
  clang/test/Headers/hip-header.hip

Index: clang/test/Headers/hip-header.hip
===
--- clang/test/Headers/hip-header.hip
+++ clang/test/Headers/hip-header.hip
@@ -43,6 +43,22 @@
 
 // expected-no-diagnostics
 
+// Check handling of overridden, implicitly __host__ dtor (should emit as a
+// nullptr to global)
+
+struct vbase {
+virtual ~vbase();
+};
+
+template
+struct vderived : public vbase {
+~vderived();
+};
+
+template struct vderived;
+
+// CHECK: @_ZTV8vderivedIvE = weak_odr unnamed_addr addrspace(1) constant { [4 x ptr addrspace(1)] } zeroinitializer, comdat, align 8
+
 // Check support for pure and deleted virtual functions
 struct base {
   __host__
@@ -60,9 +76,8 @@
 __device__ void test_vf() {
 derived d;
 }
-// CHECK: @_ZTV7derived = linkonce_odr unnamed_addr addrspace(1) constant { [4 x ptr] } { [4 x ptr] [ptr null, ptr null, ptr @_ZN7derived2pvEv, ptr @__cxa_deleted_virtual] }, comdat, align 8
-// CHECK: @_ZTV4base = linkonce_odr unnamed_addr addrspace(1) constant { [4 x ptr] } { [4 x ptr] [ptr null, ptr null, ptr @__cxa_pure_virtual, ptr @__cxa_deleted_virtual] }, comdat, align 8
-
+// CHECK: @_ZTV7derived = linkonce_odr unnamed_addr addrspace(1) constant { [4 x ptr addrspace(1)] } { [4 x ptr addrspace(1)] [ptr addrspace(1) null, ptr addrspace(1) null, ptr addrspace(1) addrspacecast (ptr @_ZN7derived2pvEv to ptr addrspace(1)), ptr addrspace(1) addrspacecast (ptr @__cxa_deleted_virtual to ptr addrspace(1))] }, comdat, align 8
+// CHECK: @_ZTV4base = linkonce_odr unnamed_addr addrspace(1) constant { [4 x ptr addrspace(1)] } { [4 x ptr addrspace(1)] [ptr addrspace(1) null, ptr addrspace(1) null, ptr addrspace(1) addrspacecast (ptr @__cxa_pure_virtual to ptr addrspace(1)), ptr addrspace(1) addrspacecast (ptr @__cxa_deleted_virtual to ptr addrspace(1))] }, comdat, align 8
 // CHECK: define{{.*}}void @__cxa_pure_virtual()
 // CHECK: define{{.*}}void @__cxa_deleted_virtual()
 
Index: clang/test/CodeGenCXX/vtt-layout-address-space.cpp
===
--- /dev/null
+++ clang/test/CodeGenCXX/vtt-layout-address-space.cpp
@@ -0,0 +1,89 @@
+// RUN: %clang_cc1 %s -triple=amdgcn-amd-amdhsa -std=c++11 -emit-llvm -o - | FileCheck %s
+
+// Test1::B should just have a single entry in its VTT, which points to the vtable.
+namespace Test1 {
+struct A { };
+
+struct B : virtual A {
+  virtual void f();
+};
+
+void B::f() { }
+}
+
+// Check that we don't add a secondary virtual pointer for Test2::A, since Test2::A doesn't have any virtual member functions or bases.
+namespace Test2 {
+  struct A { };
+
+  struct B : A { virtual void f(); };
+  struct C : virtual B { };
+
+  C c;
+}
+
+// This is the sample from the C++ Itanium ABI, p2.6.2.
+namespace Test3 {
+  class A1 { int i; };
+  class A2 { int i; virtual void f(); };
+  class V1 : public A1, public A2 { int i; };
+  class B1 { int i; };
+  class B2 { int i; };
+  class V2 : public B1, public B2, public virtual V1 { int i; };
+  class V3 {virtual void g(); };
+  class C1 : public virtual V1 { int i; };
+  class C2 : public virtual V3, virtual V2 { int i; };
+  class X1 { int i; };
+  class C3 : public X1 { int i; };
+  class D : public C1, public C2, public C3 { int i;  };
+
+  D d;
+}
+
+// This is the sample from the C++ Itanium ABI, p2.6.2, with the change suggested
+// (making A2 a virtual base of V1)
+namespace Test4 {
+  class A1 { int i; };
+  class A2 { int i; virtual void f(); };
+  class V1 : public A1, public virtual A2 { int i; };
+  class B1 { int i; };
+  class B2 { int i; };
+  class V2 : public B1, public B2, public virtual V1 { int i; };
+  class V3 {virtual void g(); };
+  class C1 : public virtual V1 { int i; };
+  class C2 : public virtual V3, virtual V2 { int i; };
+  class X1 { int i; };
+  class C3 : public X1 { int i; };
+  class D : public C1, public C2, public C3 { int i;  };
+
+  D d;
+}
+
+namespace Test5 {
+  struct A {
+virtual void f() = 0;
+virtual void anchor()

[PATCH] D153092: [Clang][CodeGen]`vtable`, `typeinfo` et al. are globals

2023-07-13 Thread Alex Voicu via Phabricator via cfe-commits
AlexVlx added a comment.

Ping.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153092/new/

https://reviews.llvm.org/D153092

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D155826: [HIP][Clang][Preprocessor][RFC] Add preprocessor support for C++ Parallel Algorithm Offload

2023-08-23 Thread Alex Voicu via Phabricator via cfe-commits
AlexVlx updated this revision to Diff 552952.
AlexVlx added a comment.

Define the interpose macro iff in interpose mode.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155826/new/

https://reviews.llvm.org/D155826

Files:
  clang/lib/Frontend/InitPreprocessor.cpp
  clang/test/Preprocessor/predefined-macros.c


Index: clang/test/Preprocessor/predefined-macros.c
===
--- clang/test/Preprocessor/predefined-macros.c
+++ clang/test/Preprocessor/predefined-macros.c
@@ -290,3 +290,20 @@
 // RUN:   -fcuda-is-device -fgpu-default-stream=per-thread \
 // RUN:   | FileCheck -match-full-lines %s --check-prefix=CHECK-PTH
 // CHECK-PTH: #define HIP_API_PER_THREAD_DEFAULT_STREAM 1
+
+// RUN: %clang_cc1 %s -E -dM -o - -x hip -hipstdpar -triple x86_64-unknown-linux-gnu \
+// RUN:   | FileCheck -match-full-lines %s --check-prefix=CHECK-HIPSTDPAR
+// CHECK-HIPSTDPAR: #define __HIPSTDPAR__ 1
+// CHECK-HIPSTDPAR-NOT: #define __HIPSTDPAR_INTERPOSE_ALLOC__ 1
+
+// RUN: %clang_cc1 %s -E -dM -o - -x hip -hipstdpar -hipstdpar-interpose-alloc \
+// RUN:  -triple x86_64-unknown-linux-gnu | FileCheck -match-full-lines %s \
+// RUN:  --check-prefix=CHECK-HIPSTDPAR-INTERPOSE
+// CHECK-HIPSTDPAR-INTERPOSE: #define __HIPSTDPAR_INTERPOSE_ALLOC__ 1
+// CHECK-HIPSTDPAR-INTERPOSE: #define __HIPSTDPAR__ 1
+
+// RUN: %clang_cc1 %s -E -dM -o - -x hip -hipstdpar -hipstdpar-interpose-alloc \
+// RUN:  -triple amdgcn-amd-amdhsa -fcuda-is-device | FileCheck -match-full-lines \
+// RUN:  %s --check-prefix=CHECK-HIPSTDPAR-INTERPOSE-DEV-NEG
+// CHECK-HIPSTDPAR-INTERPOSE-DEV-NEG: #define __HIPSTDPAR__ 1
+// CHECK-HIPSTDPAR-INTERPOSE-DEV-NEG-NOT: #define __HIPSTDPAR_INTERPOSE_ALLOC__ 1
\ No newline at end of file
Index: clang/lib/Frontend/InitPreprocessor.cpp
===
--- clang/lib/Frontend/InitPreprocessor.cpp
+++ clang/lib/Frontend/InitPreprocessor.cpp
@@ -585,6 +585,11 @@
 Builder.defineMacro("__HIP_MEMORY_SCOPE_WORKGROUP", "3");
 Builder.defineMacro("__HIP_MEMORY_SCOPE_AGENT", "4");
 Builder.defineMacro("__HIP_MEMORY_SCOPE_SYSTEM", "5");
+if (LangOpts.HIPStdPar) {
+  Builder.defineMacro("__HIPSTDPAR__");
+  if (LangOpts.HIPStdParInterposeAlloc)
+Builder.defineMacro("__HIPSTDPAR_INTERPOSE_ALLOC__");
+}
 if (LangOpts.CUDAIsDevice) {
   Builder.defineMacro("__HIP_DEVICE_COMPILE__");
   if (!TI.hasHIPImageSupport()) {
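
For context, a sketch of how user-side code might key off these macros (purely
hypothetical usage, not part of this patch):

  #if defined(__HIPSTDPAR__)
// Translation unit is being compiled in C++ parallel algorithm offload mode.
  #  if defined(__HIPSTDPAR_INTERPOSE_ALLOC__)
// Allocation interposition is active, so a forwarding allocator shim could
// be selected here instead of the default operator new / operator delete.
  #  endif
  #endif
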
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D155769: [HIP][Clang][docs][RFC] Add documentation for C++ Parallel Algorithm Offload

2023-08-27 Thread Alex Voicu via Phabricator via cfe-commits
AlexVlx updated this revision to Diff 553799.
AlexVlx edited reviewers, added: jhuber6; removed: tra, jdoerfert.
AlexVlx added a comment.
Herald added a reviewer: jdoerfert.

Fix typo, add release notes reflecting the HIP-only nature.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155769/new/

https://reviews.llvm.org/D155769

Files:
  clang/docs/HIPSupport.rst
  clang/docs/ReleaseNotes.rst

Index: clang/docs/ReleaseNotes.rst
===
--- clang/docs/ReleaseNotes.rst
+++ clang/docs/ReleaseNotes.rst
@@ -279,6 +279,11 @@
 CUDA/HIP Language Changes
 ^
 
+- [HIP Only] add experimental support for offloading select C++ Algorithms,
+  which can be toggled via the ``-hipstdpar`` flag. This is available only for
+  the AMDGPU target at the moment. See the dedicated entry in the `HIP Language
+  support document for details `_
+
 CUDA Support
 
 
Index: clang/docs/HIPSupport.rst
===
--- clang/docs/HIPSupport.rst
+++ clang/docs/HIPSupport.rst
@@ -176,3 +176,387 @@
* - ``HIP_API_PER_THREAD_DEFAULT_STREAM``
  - Alias to ``__HIP_API_PER_THREAD_DEFAULT_STREAM__``. Deprecated.
 
+C++ Standard Parallelism Offload Support: Compiler And Runtime
+==
+
+Introduction
+
+
+This document describes the implementation of support for offloading the
+execution of standard C++ algorithms to accelerators that can be targeted via
+HIP. Furthermore, it enumerates restrictions on user defined code, as well as
+the interactions with runtimes.
+
+Algorithm Offload: What, Why, Where
+===
+
+C++17 introduced overloads
+`for most algorithms in the standard library `_
+which allow the user to specify a desired
+`execution policy `_.
+The `parallel_unsequenced_policy `_
+maps relatively well to the execution model of many accelerators, such as GPUs.
+This, coupled with the ubiquity of GPU accelerated algorithm libraries that
+implement most / all corresponding algorithms in the standard library
+(e.g. `rocThrust `_), makes
+it feasible to provide seamless accelerator offload for supported algorithms,
+when an accelerated version exists. Thus, it becomes possible to easily access
+the computational resources of an accelerator, via a well specified, familiar,
+algorithmic interface, without having to delve into low-level hardware specific
+details. Putting it all together:
+
+- **What**: standard library algorithms, when invoked with the
+  ``parallel_unsequenced_policy``
+- **Why**: democratise accelerator programming, without loss of user familiarity
+- **Where**: any and all accelerators that can be targeted by Clang/LLVM via HIP
+
+Small Example
+=
+
+Given the following C++ code, which assumes the ``std`` namespace is included:
+
+.. code-block:: C++
+
+   bool has_the_answer(const vector& v) {
+ return find(execution::par_unseq, cbegin(v), cend(v), 42) != cend(v);
+   }
+
+if Clang is invoked with the ``-hipstdpar --offload-arch=foo`` flags, the call
+to ``find`` will be offloaded to an accelerator that is part of the ``foo``
+target family. If either ``foo`` or its runtime environment do not support
+transparent on-demand paging (such as e.g. that provided in Linux via
+`HMM `_), it is necessary to also include
+the ``--hipstdpar-interpose-alloc`` flag. If the accelerator specific algorithm
+library ``foo`` uses doesn't have an implementation of a particular algorithm,
+execution seamlessly falls back to the host CPU. It is legal to specify multiple
+``--offload-arch``s. All the flags we introduce, as well as a thorough view of
+various restrictions and their implications will be provided below.
+
+Implementation - General View
+=
+
+We built support for Algorithm Offload atop the pre-existing HIP
+infrastructure. More specifically, when one requests offload via ``-hipstdpar``,
+compilation is switched to HIP compilation, as if ``-x hip`` was specified.
+Similarly, linking is also switched to HIP linking, as if ``--hip-link`` was
+specified. Note that these are implicit, and one should not assume that any
+interop with HIP specific language constructs is available e.g. ``__device__``
+annotations are neither necessary nor guaranteed to work.
+
+Since there are no language restriction mechanisms in place, it is necessary to
+relax HIP language specific semantic checks performed by the FE; they would
+identify otherwise valid, offloadable code, as invalid HIP code. Given that we
+know tha

[PATCH] D155769: [HIP][Clang][docs][RFC] Add documentation for C++ Parallel Algorithm Offload

2023-08-27 Thread Alex Voicu via Phabricator via cfe-commits
AlexVlx updated this revision to Diff 553800.
AlexVlx added a comment.

Add back clarification about RDC.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155769/new/

https://reviews.llvm.org/D155769

Files:
  clang/docs/HIPSupport.rst
  clang/docs/ReleaseNotes.rst

Index: clang/docs/ReleaseNotes.rst
===
--- clang/docs/ReleaseNotes.rst
+++ clang/docs/ReleaseNotes.rst
@@ -279,6 +279,11 @@
 CUDA/HIP Language Changes
 ^
 
+- [HIP Only] add experimental support for offloading select C++ Algorithms,
+  which can be toggled via the ``-hipstdpar`` flag. This is available only for
+  the AMDGPU target at the moment. See the dedicated entry in the `HIP Language
+  support document for details `_
+
 CUDA Support
 
 
Index: clang/docs/HIPSupport.rst
===
--- clang/docs/HIPSupport.rst
+++ clang/docs/HIPSupport.rst
@@ -176,3 +176,391 @@
* - ``HIP_API_PER_THREAD_DEFAULT_STREAM``
  - Alias to ``__HIP_API_PER_THREAD_DEFAULT_STREAM__``. Deprecated.
 
+C++ Standard Parallelism Offload Support: Compiler And Runtime
+==
+
+Introduction
+
+
+This document describes the implementation of support for offloading the
+execution of standard C++ algorithms to accelerators that can be targeted via
+HIP. Furthermore, it enumerates restrictions on user defined code, as well as
+the interactions with runtimes.
+
+Algorithm Offload: What, Why, Where
+===
+
+C++17 introduced overloads
+`for most algorithms in the standard library `_
+which allow the user to specify a desired
+`execution policy `_.
+The `parallel_unsequenced_policy `_
+maps relatively well to the execution model of many accelerators, such as GPUs.
+This, coupled with the ubiquity of GPU accelerated algorithm libraries that
+implement most / all corresponding algorithms in the standard library
+(e.g. `rocThrust `_), makes
+it feasible to provide seamless accelerator offload for supported algorithms,
+when an accelerated version exists. Thus, it becomes possible to easily access
+the computational resources of an accelerator, via a well specified, familiar,
+algorithmic interface, without having to delve into low-level hardware specific
+details. Putting it all together:
+
+- **What**: standard library algorithms, when invoked with the
+  ``parallel_unsequenced_policy``
+- **Why**: democratise accelerator programming, without loss of user familiarity
+- **Where**: any and all accelerators that can be targeted by Clang/LLVM via HIP
+
+Small Example
+=
+
+Given the following C++ code, which assumes the ``std`` namespace is included:
+
+.. code-block:: C++
+
+   bool has_the_answer(const vector& v) {
+ return find(execution::par_unseq, cbegin(v), cend(v), 42) != cend(v);
+   }
+
+if Clang is invoked with the ``-hipstdpar --offload-arch=foo`` flags, the call
+to ``find`` will be offloaded to an accelerator that is part of the ``foo``
+target family. If either ``foo`` or its runtime environment do not support
+transparent on-demand paging (such as e.g. that provided in Linux via
+`HMM `_), it is necessary to also include
+the ``--hipstdpar-interpose-alloc`` flag. If the accelerator specific algorithm
+library ``foo`` uses doesn't have an implementation of a particular algorithm,
+execution seamlessly falls back to the host CPU. It is legal to specify multiple
+``--offload-arch``s. All the flags we introduce, as well as a thorough view of
+various restrictions and their implications will be provided below.
+
+Implementation - General View
+=
+
+We built support for Algorithm Offload atop the pre-existing HIP
+infrastructure. More specifically, when one requests offload via ``-hipstdpar``,
+compilation is switched to HIP compilation, as if ``-x hip`` was specified.
+Similarly, linking is also switched to HIP linking, as if ``--hip-link`` was
+specified. Note that these are implicit, and one should not assume that any
+interop with HIP specific language constructs is available e.g. ``__device__``
+annotations are neither necessary nor guaranteed to work.
+
+Since there are no language restriction mechanisms in place, it is necessary to
+relax HIP language specific semantic checks performed by the FE; they would
+identify otherwise valid, offloadable code, as invalid HIP code. Given that we
+know that the user intended only for certain algorithms to be offloaded, and
+encoded this by specifying the ``parallel_unsequenced_polic

[PATCH] D157452: [RFC][Clang][Codegen] `std::type_info` needs special care with explicit address spaces

2023-08-28 Thread Alex Voicu via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.
Closed by commit rG9c760ca8ecfd: [Clang][CodeGen] `typeid` needs special care 
when `type_info` is not in the… (authored by AlexVlx).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D157452/new/

https://reviews.llvm.org/D157452

Files:
  clang/lib/CodeGen/CGExprCXX.cpp
  clang/lib/CodeGen/ItaniumCXXABI.cpp
  clang/test/CodeGenCXX/typeid-cxx11-with-address-space.cpp
  clang/test/CodeGenCXX/typeid-with-address-space.cpp
  clang/test/CodeGenCXX/typeinfo
  clang/test/CodeGenCXX/typeinfo-with-address-space.cpp

Index: clang/test/CodeGenCXX/typeinfo-with-address-space.cpp
===
--- /dev/null
+++ clang/test/CodeGenCXX/typeinfo-with-address-space.cpp
@@ -0,0 +1,48 @@
+// RUN: %clang_cc1 -I%S %s -triple amdgcn-amd-amdhsa -emit-llvm -o - | FileCheck %s -check-prefix=AS
+// RUN: %clang_cc1 -I%S %s -triple x86_64-linux-gnu -emit-llvm -o - | FileCheck %s -check-prefix=NO-AS
+#include 
+
+class A {
+virtual void f() = 0;
+};
+
+class B : A {
+void f() override;
+};
+
+// AS: @_ZTISt9type_info = external addrspace(1) constant ptr addrspace(1)
+// NO-AS: @_ZTISt9type_info = external constant ptr
+// AS: @_ZTIi = external addrspace(1) constant ptr addrspace(1)
+// NO-AS: @_ZTIi = external constant ptr
+// AS: @_ZTVN10__cxxabiv117__class_type_infoE = external addrspace(1) global ptr addrspace(1)
+// NO-AS: @_ZTVN10__cxxabiv117__class_type_infoE = external global ptr
+// AS: @_ZTS1A = linkonce_odr addrspace(1) constant [3 x i8] c"1A\00", comdat, align 1
+// NO-AS: @_ZTS1A = linkonce_odr constant [3 x i8] c"1A\00", comdat, align 1
+// AS: @_ZTI1A = linkonce_odr addrspace(1) constant { ptr addrspace(1), ptr addrspace(1) } { ptr addrspace(1) getelementptr inbounds (ptr addrspace(1), ptr addrspace(1) @_ZTVN10__cxxabiv117__class_type_infoE, i64 2), ptr addrspace(1) @_ZTS1A }, comdat, align 8
+// NO-AS: @_ZTI1A = linkonce_odr constant { ptr, ptr } { ptr getelementptr inbounds (ptr, ptr @_ZTVN10__cxxabiv117__class_type_infoE, i64 2), ptr @_ZTS1A }, comdat, align 8
+// AS: @_ZTIf = external addrspace(1) constant ptr addrspace(1)
+// NO-AS: @_ZTIf = external constant ptr
+
+unsigned long Fn(B& b) {
+// AS: %call = call noundef zeroext i1 @_ZNKSt9type_infoeqERKS_(ptr {{.*}} addrspacecast (ptr addrspace(1) @_ZTISt9type_info to ptr), ptr {{.*}} %2)
+// NO-AS: %call = call noundef zeroext i1 @_ZNKSt9type_infoeqERKS_(ptr {{.*}} @_ZTISt9type_info, ptr {{.*}} %2)
+if (typeid(std::type_info) == typeid(b))
+return 42;
+// AS: %call2 = call noundef zeroext i1 @_ZNKSt9type_infoneERKS_(ptr {{.*}} addrspacecast (ptr addrspace(1) @_ZTIi to ptr), ptr {{.*}} %5)
+// NO-AS: %call2 = call noundef zeroext i1 @_ZNKSt9type_infoneERKS_(ptr {{.*}} @_ZTIi, ptr {{.*}} %5)
+if (typeid(int) != typeid(b))
+return 1712;
+// AS: %call5 = call noundef ptr @_ZNKSt9type_info4nameEv(ptr {{.*}} addrspacecast (ptr addrspace(1) @_ZTI1A to ptr))
+// NO-AS: %call5 = call noundef ptr @_ZNKSt9type_info4nameEv(ptr {{.*}} @_ZTI1A)
+// AS: %call7 = call noundef ptr @_ZNKSt9type_info4nameEv(ptr {{.*}} %8)
+// NO-AS: %call7 = call noundef ptr @_ZNKSt9type_info4nameEv(ptr {{.*}} %8)
+if (typeid(A).name() == typeid(b).name())
+return 0;
+// AS: %call11 = call noundef zeroext i1 @_ZNKSt9type_info6beforeERKS_(ptr {{.*}} %11, ptr {{.*}} addrspacecast (ptr addrspace(1) @_ZTIf to ptr))
+// NO-AS:   %call11 = call noundef zeroext i1 @_ZNKSt9type_info6beforeERKS_(ptr {{.*}} %11, ptr {{.*}} @_ZTIf)
+if (typeid(b).before(typeid(float)))
+return 1;
+// AS: %call15 = call noundef i64 @_ZNKSt9type_info9hash_codeEv(ptr {{.*}} %14)
+// NO-AS: %call15 = call noundef i64 @_ZNKSt9type_info9hash_codeEv(ptr {{.*}} %14)
+return typeid(b).hash_code();
+}
Index: clang/test/CodeGenCXX/typeinfo
===
--- clang/test/CodeGenCXX/typeinfo
+++ clang/test/CodeGenCXX/typeinfo
@@ -10,6 +10,14 @@
 bool operator!=(const type_info& __arg) const {
   return !operator==(__arg);
 }
+
+bool before(const type_info& __arg) const {
+  return __name < __arg.__name;
+}
+
+unsigned long hash_code() const {
+  return reinterpret_cast(__name);
+}
   protected:
 const char *__name;
   };
Index: clang/test/CodeGenCXX/typeid-with-address-space.cpp
===
--- /dev/null
+++ clang/test/CodeGenCXX/typeid-with-address-space.cpp
@@ -0,0 +1,50 @@
+// RUN: %clang_cc1 -I%S %s -triple amdgcn-amd-amdhsa -emit-llvm -fcxx-exceptions -fexceptions -o - | FileCheck %s
+#include 
+
+namespace Test1 {
+
+// PR7400
+struct A { virtual void f(); };
+
+// CHECK: @_ZN5Test16int_tiE ={{.*}} constant ptr addrspacecast (ptr addrspace(1) @_ZTIi to ptr), align 8
+const std::type_info &int_ti = typeid(int

[PATCH] D155769: [HIP][Clang][docs][RFC] Add documentation for C++ Parallel Algorithm Offload

2023-09-01 Thread Alex Voicu via Phabricator via cfe-commits
AlexVlx updated this revision to Diff 555450.
AlexVlx added a reviewer: JonChesterfield.
AlexVlx added a comment.

Correctly reflect AMDGPU-only nature of the extension.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155769/new/

https://reviews.llvm.org/D155769

Files:
  clang/docs/HIPSupport.rst
  clang/docs/ReleaseNotes.rst

Index: clang/docs/ReleaseNotes.rst
===
--- clang/docs/ReleaseNotes.rst
+++ clang/docs/ReleaseNotes.rst
@@ -311,6 +311,11 @@
 CUDA/HIP Language Changes
 ^
 
+- [HIP Only] add experimental support for offloading select C++ Algorithms,
+  which can be toggled via the ``-hipstdpar`` flag. This is available only for
+  the AMDGPU target at the moment. See the dedicated entry in the `HIP Language
+  support document for details `_
+
 CUDA Support
 
 
Index: clang/docs/HIPSupport.rst
===
--- clang/docs/HIPSupport.rst
+++ clang/docs/HIPSupport.rst
@@ -176,3 +176,391 @@
* - ``HIP_API_PER_THREAD_DEFAULT_STREAM``
  - Alias to ``__HIP_API_PER_THREAD_DEFAULT_STREAM__``. Deprecated.
 
+C++ Standard Parallelism Offload Support: Compiler And Runtime
+==
+
+Introduction
+
+
+This document describes the implementation of support for offloading the
+execution of standard C++ algorithms to accelerators that can be targeted via
+HIP. Furthermore, it enumerates restrictions on user defined code, as well as
+the interactions with runtimes.
+
+Algorithm Offload: What, Why, Where
+===
+
+C++17 introduced overloads
+`for most algorithms in the standard library `_
+which allow the user to specify a desired
+`execution policy `_.
+The `parallel_unsequenced_policy `_
+maps relatively well to the execution model of many accelerators, such as GPUs.
+This, coupled with the ubiquity of GPU accelerated algorithm libraries that
+implement most / all corresponding algorithms in the standard library
+(e.g. `rocThrust `_), makes
+it feasible to provide seamless accelerator offload for supported algorithms,
+when an accelerated version exists. Thus, it becomes possible to easily access
+the computational resources of an accelerator, via a well specified, familiar,
+algorithmic interface, without having to delve into low-level hardware specific
+details. Putting it all together:
+
+- **What**: standard library algorithms, when invoked with the
+  ``parallel_unsequenced_policy``
+- **Why**: democratise accelerator programming, without loss of user familiarity
+- **Where**: only AMDGPU accelerators targeted by Clang/LLVM via HIP
+
+Small Example
+=
+
+Given the following C++ code, which assumes the ``std`` namespace is included:
+
+.. code-block:: C++
+
+   bool has_the_answer(const vector& v) {
+ return find(execution::par_unseq, cbegin(v), cend(v), 42) != cend(v);
+   }
+
+if Clang is invoked with the ``-hipstdpar --offload-arch=foo`` flags, the call
+to ``find`` will be offloaded to an accelerator that is part of the ``foo``
+target family. If either ``foo`` or its runtime environment do not support
+transparent on-demand paging (such as e.g. that provided in Linux via
+`HMM `_), it is necessary to also include
+the ``--hipstdpar-interpose-alloc`` flag. If the accelerator specific algorithm
+library ``foo`` uses doesn't have an implementation of a particular algorithm,
+execution seamlessly falls back to the host CPU. It is legal to specify multiple
+``--offload-arch``s. All the flags we introduce, as well as a thorough view of
+various restrictions and their implications will be provided below.
+
+Implementation - General View
+=
+
+We built support for Algorithm Offload atop the pre-existing HIP
+infrastructure. More specifically, when one requests offload via ``-hipstdpar``,
+compilation is switched to HIP compilation, as if ``-x hip`` was specified.
+Similarly, linking is also switched to HIP linking, as if ``--hip-link`` was
+specified. Note that these are implicit, and one should not assume that any
+interop with HIP specific language constructs is available e.g. ``__device__``
+annotations are neither necessary nor guaranteed to work.
+
+Since there are no language restriction mechanisms in place, it is necessary to
+relax HIP language specific semantic checks performed by the FE; they would
+identify otherwise valid, offloadable code, as invalid HIP code. Given that we
+know that the user intended only for certain algorithms to be offloaded, and
+encoded

[PATCH] D39857: [AMDGPU] Late parsed / dependent arguments for AMDGPU kernel attributes

2017-11-09 Thread Alex Voicu via Phabricator via cfe-commits
AlexVlx updated this revision to Diff 122318.

https://reviews.llvm.org/D39857

Files:
  include/clang/Basic/Attr.td
  lib/CodeGen/TargetInfo.cpp
  lib/Sema/SemaDeclAttr.cpp

Index: lib/Sema/SemaDeclAttr.cpp
===
--- lib/Sema/SemaDeclAttr.cpp
+++ lib/Sema/SemaDeclAttr.cpp
@@ -5469,16 +5469,36 @@
   }
 }
 
+namespace
+{
+  bool checkAllAreIntegral(const AttributeList &Attr, Sema &S) {
+for (auto i = 0u; i != Attr.getNumArgs(); ++i) {
+  auto e = Attr.getArgAsExpr(i);
+  if (e && !e->getType()->isIntegralOrEnumerationType()) {
+S.Diag(getAttrLoc(Attr), diag::err_attribute_argument_n_type)
+  << getAttrName(Attr) << i << AANT_ArgumentIntegerConstant
+  << e->getSourceRange();
+
+return false;
+  }
+}
+
+return true;
+  }
+}
+
 static void handleAMDGPUFlatWorkGroupSizeAttr(Sema &S, Decl *D,
   const AttributeList &Attr) {
   uint32_t Min = 0;
   Expr *MinExpr = Attr.getArgAsExpr(0);
-  if (!checkUInt32Argument(S, Attr, MinExpr, Min))
+  if (MinExpr->isEvaluatable(S.Context) &&
+  !checkUInt32Argument(S, Attr, MinExpr, Min))
 return;
 
   uint32_t Max = 0;
   Expr *MaxExpr = Attr.getArgAsExpr(1);
-  if (!checkUInt32Argument(S, Attr, MaxExpr, Max))
+  if (MaxExpr->isEvaluatable(S.Context) &&
+  !checkUInt32Argument(S, Attr, MaxExpr, Max))
 return;
 
   if (Min == 0 && Max != 0) {
@@ -5493,21 +5513,28 @@
   }
 
   D->addAttr(::new (S.Context)
- AMDGPUFlatWorkGroupSizeAttr(Attr.getLoc(), S.Context, Min, Max,
- Attr.getAttributeSpellingListIndex()));
+ AMDGPUFlatWorkGroupSizeAttr(
+   Attr.getLoc(), S.Context, MinExpr, MaxExpr,
+   Attr.getAttributeSpellingListIndex()));
 }
 
 static void handleAMDGPUWavesPerEUAttr(Sema &S, Decl *D,
const AttributeList &Attr) {
+  if (!checkAllAreIntegral(Attr, S))
+return;
+
   uint32_t Min = 0;
   Expr *MinExpr = Attr.getArgAsExpr(0);
-  if (!checkUInt32Argument(S, Attr, MinExpr, Min))
+  if (MinExpr->isEvaluatable(S.Context) &&
+  !checkUInt32Argument(S, Attr, MinExpr, Min))
 return;
 
   uint32_t Max = 0;
+  Expr *MaxExpr = MinExpr;
   if (Attr.getNumArgs() == 2) {
-Expr *MaxExpr = Attr.getArgAsExpr(1);
-if (!checkUInt32Argument(S, Attr, MaxExpr, Max))
+MaxExpr = Attr.getArgAsExpr(1);
+if (MaxExpr->isEvaluatable(S.Context) &&
+!checkUInt32Argument(S, Attr, MaxExpr, Max))
   return;
   }
 
@@ -5523,31 +5550,39 @@
   }
 
   D->addAttr(::new (S.Context)
- AMDGPUWavesPerEUAttr(Attr.getLoc(), S.Context, Min, Max,
+ AMDGPUWavesPerEUAttr(Attr.getLoc(), S.Context, MinExpr, MaxExpr,
   Attr.getAttributeSpellingListIndex()));
 }
 
 static void handleAMDGPUNumSGPRAttr(Sema &S, Decl *D,
 const AttributeList &Attr) {
+  if (!checkAllAreIntegral(Attr, S))
+return;
+
   uint32_t NumSGPR = 0;
   Expr *NumSGPRExpr = Attr.getArgAsExpr(0);
-  if (!checkUInt32Argument(S, Attr, NumSGPRExpr, NumSGPR))
+  if (NumSGPRExpr->isEvaluatable(S.Context) &&
+  !checkUInt32Argument(S, Attr, NumSGPRExpr, NumSGPR))
 return;
 
   D->addAttr(::new (S.Context)
- AMDGPUNumSGPRAttr(Attr.getLoc(), S.Context, NumSGPR,
+ AMDGPUNumSGPRAttr(Attr.getLoc(), S.Context, NumSGPRExpr,
Attr.getAttributeSpellingListIndex()));
 }
 
 static void handleAMDGPUNumVGPRAttr(Sema &S, Decl *D,
 const AttributeList &Attr) {
+  if (!checkAllAreIntegral(Attr, S))
+return;
+
   uint32_t NumVGPR = 0;
   Expr *NumVGPRExpr = Attr.getArgAsExpr(0);
-  if (!checkUInt32Argument(S, Attr, NumVGPRExpr, NumVGPR))
+  if (NumVGPRExpr->isEvaluatable(S.Context) &&
+  !checkUInt32Argument(S, Attr, NumVGPRExpr, NumVGPR))
 return;
 
   D->addAttr(::new (S.Context)
- AMDGPUNumVGPRAttr(Attr.getLoc(), S.Context, NumVGPR,
+ AMDGPUNumVGPRAttr(Attr.getLoc(), S.Context, NumVGPRExpr,
Attr.getAttributeSpellingListIndex()));
 }
 
Index: lib/CodeGen/TargetInfo.cpp
===
--- lib/CodeGen/TargetInfo.cpp
+++ lib/CodeGen/TargetInfo.cpp
@@ -7661,6 +7661,18 @@
 };
 }
 
+namespace
+{
+  inline
+  llvm::APSInt getConstexprInt(const Expr *E, const ASTContext& Ctx)
+  {
+llvm::APSInt r{32, 0};
+if (E) E->EvaluateAsInt(r, Ctx);
+
+return r;
+  }
+}
+
 void AMDGPUTargetCodeGenInfo::setTargetAttributes(
 const Decl *D, llvm::GlobalValue *GV, CodeGen::CodeGenModule &M,
 ForDefinition_t IsForDefinition) const {
@@ -7676,8 +7688,11 @@
 FD->getAttr() : nullptr;
   const auto *FlatWGS = FD->getAttr();
   if (ReqdWGS || FlatWGS) {
-unsigned Min = FlatWGS ? FlatWGS->getMin() : 0;
-unsigned Max = Fla

[PATCH] D39857: [AMDGPU] Late parsed / dependent arguments for AMDGPU kernel attributes

2017-11-09 Thread Alex Voicu via Phabricator via cfe-commits
AlexVlx updated this revision to Diff 122323.
AlexVlx marked 7 inline comments as done.

https://reviews.llvm.org/D39857

Files:
  include/clang/Basic/Attr.td
  lib/CodeGen/TargetInfo.cpp
  lib/Sema/SemaDeclAttr.cpp

Index: lib/Sema/SemaDeclAttr.cpp
===
--- lib/Sema/SemaDeclAttr.cpp
+++ lib/Sema/SemaDeclAttr.cpp
@@ -5469,16 +5469,33 @@
   }
 }
 
+static bool checkAllAreIntegral(Sema &S, const AttributeList &Attr) {
+  for (auto i = 0u; i != Attr.getNumArgs(); ++i) {
+Expr* E = Attr.getArgAsExpr(i);
+if (E && !E->getType()->isIntegralOrEnumerationType()) {
+  S.Diag(getAttrLoc(Attr), diag::err_attribute_argument_n_type)
+<< getAttrName(Attr) << i << AANT_ArgumentIntegerConstant
+<< E->getSourceRange();
+
+  return false;
+}
+  }
+
+  return true;
+}
+
 static void handleAMDGPUFlatWorkGroupSizeAttr(Sema &S, Decl *D,
   const AttributeList &Attr) {
   uint32_t Min = 0;
   Expr *MinExpr = Attr.getArgAsExpr(0);
-  if (!checkUInt32Argument(S, Attr, MinExpr, Min))
+  if (MinExpr->isEvaluatable(S.Context) &&
+  !checkUInt32Argument(S, Attr, MinExpr, Min))
 return;
 
   uint32_t Max = 0;
   Expr *MaxExpr = Attr.getArgAsExpr(1);
-  if (!checkUInt32Argument(S, Attr, MaxExpr, Max))
+  if (MaxExpr->isEvaluatable(S.Context) &&
+  !checkUInt32Argument(S, Attr, MaxExpr, Max))
 return;
 
   if (Min == 0 && Max != 0) {
@@ -5493,21 +5510,28 @@
   }
 
   D->addAttr(::new (S.Context)
- AMDGPUFlatWorkGroupSizeAttr(Attr.getLoc(), S.Context, Min, Max,
- Attr.getAttributeSpellingListIndex()));
+ AMDGPUFlatWorkGroupSizeAttr(
+   Attr.getLoc(), S.Context, MinExpr, MaxExpr,
+   Attr.getAttributeSpellingListIndex()));
 }
 
 static void handleAMDGPUWavesPerEUAttr(Sema &S, Decl *D,
const AttributeList &Attr) {
+  if (!checkAllAreIntegral(S, Attr))
+return;
+
   uint32_t Min = 0;
   Expr *MinExpr = Attr.getArgAsExpr(0);
-  if (!checkUInt32Argument(S, Attr, MinExpr, Min))
+  if (MinExpr->isEvaluatable(S.Context) &&
+  !checkUInt32Argument(S, Attr, MinExpr, Min))
 return;
 
   uint32_t Max = 0;
+  Expr *MaxExpr = MinExpr;
   if (Attr.getNumArgs() == 2) {
-Expr *MaxExpr = Attr.getArgAsExpr(1);
-if (!checkUInt32Argument(S, Attr, MaxExpr, Max))
+MaxExpr = Attr.getArgAsExpr(1);
+if (MaxExpr->isEvaluatable(S.Context) &&
+!checkUInt32Argument(S, Attr, MaxExpr, Max))
   return;
   }
 
@@ -5523,31 +5547,39 @@
   }
 
   D->addAttr(::new (S.Context)
- AMDGPUWavesPerEUAttr(Attr.getLoc(), S.Context, Min, Max,
+ AMDGPUWavesPerEUAttr(Attr.getLoc(), S.Context, MinExpr, MaxExpr,
   Attr.getAttributeSpellingListIndex()));
 }
 
 static void handleAMDGPUNumSGPRAttr(Sema &S, Decl *D,
 const AttributeList &Attr) {
+  if (!checkAllAreIntegral(S, Attr))
+return;
+
   uint32_t NumSGPR = 0;
   Expr *NumSGPRExpr = Attr.getArgAsExpr(0);
-  if (!checkUInt32Argument(S, Attr, NumSGPRExpr, NumSGPR))
+  if (NumSGPRExpr->isEvaluatable(S.Context) &&
+  !checkUInt32Argument(S, Attr, NumSGPRExpr, NumSGPR))
 return;
 
   D->addAttr(::new (S.Context)
- AMDGPUNumSGPRAttr(Attr.getLoc(), S.Context, NumSGPR,
+ AMDGPUNumSGPRAttr(Attr.getLoc(), S.Context, NumSGPRExpr,
Attr.getAttributeSpellingListIndex()));
 }
 
 static void handleAMDGPUNumVGPRAttr(Sema &S, Decl *D,
 const AttributeList &Attr) {
+  if (!checkAllAreIntegral(S, Attr))
+return;
+
   uint32_t NumVGPR = 0;
   Expr *NumVGPRExpr = Attr.getArgAsExpr(0);
-  if (!checkUInt32Argument(S, Attr, NumVGPRExpr, NumVGPR))
+  if (NumVGPRExpr->isEvaluatable(S.Context) &&
+  !checkUInt32Argument(S, Attr, NumVGPRExpr, NumVGPR))
 return;
 
   D->addAttr(::new (S.Context)
- AMDGPUNumVGPRAttr(Attr.getLoc(), S.Context, NumVGPR,
+ AMDGPUNumVGPRAttr(Attr.getLoc(), S.Context, NumVGPRExpr,
Attr.getAttributeSpellingListIndex()));
 }
 
Index: lib/CodeGen/TargetInfo.cpp
===
--- lib/CodeGen/TargetInfo.cpp
+++ lib/CodeGen/TargetInfo.cpp
@@ -7661,6 +7661,16 @@
 };
 }
 
+static llvm::APSInt getConstexprInt(const Expr *E, const ASTContext& Ctx)
+{
+  assert(E);
+
+  llvm::APSInt Tmp{32, 0};
+  E->EvaluateAsInt(Tmp, Ctx);
+
+  return Tmp;
+}
+
 void AMDGPUTargetCodeGenInfo::setTargetAttributes(
 const Decl *D, llvm::GlobalValue *GV, CodeGen::CodeGenModule &M,
 ForDefinition_t IsForDefinition) const {
@@ -7676,8 +7686,11 @@
 FD->getAttr() : nullptr;
   const auto *FlatWGS = FD->getAttr();
   if (ReqdWGS || FlatWGS) {
-unsigned Min = FlatWGS ? FlatWGS->getMin() : 0;
-unsigned Max = FlatWGS ? F

[PATCH] D39857: [AMDGPU] Late parsed / dependent arguments for AMDGPU kernel attributes

2017-11-09 Thread Alex Voicu via Phabricator via cfe-commits
AlexVlx marked 2 inline comments as done.
AlexVlx added inline comments.



Comment at: include/clang/Basic/Attr.td:1328
+  ExprArgument<"Max", 1>,
   StringArgument<"ISA", 1>];
   let Documentation = [AMDGPUFlatWorkGroupSizeDocs];

kzhuravl wrote:
> Not in trunk.
Done (hopefully). Apologies for that.



Comment at: include/clang/Basic/Attr.td:1339
+  ExprArgument<"Max", 1>,
   StringArgument<"ISA", 1>];
   let Documentation = [AMDGPUWavesPerEUDocs];

kzhuravl wrote:
> Not in trunk.
Done (hopefully). Apologies for that.



Comment at: include/clang/Basic/Attr.td:1361-1370
+def AMDGPUMaxWorkGroupDim : InheritableAttr {
   let Spellings = [CXX11<"","hc_max_workgroup_dim", 201511>];
-  let Args = [IntArgument<"X">,
-  IntArgument<"Y", 1>,
-  IntArgument<"Z", 1>,
+  let Args = [ExprArgument<"X">,
+  ExprArgument<"Y">,
+  ExprArgument<"Z">,
   StringArgument<"ISA", 1>];
   let Subjects = SubjectList<[Function], ErrorDiag>;

kzhuravl wrote:
> Not in trunk.
Done (hopefully). Apologies for that.



Comment at: lib/CodeGen/TargetInfo.cpp:7665-7666
 
+namespace
+{
+  inline

aaron.ballman wrote:
> This should be a static function rather than in an inline namespace.
Done.



Comment at: lib/CodeGen/TargetInfo.cpp:7671
+llvm::APSInt r{32, 0};
+if (E) E->EvaluateAsInt(r, Ctx);
+

aaron.ballman wrote:
> If there is no expression given, why should this return an `APSInt` for 0? 
> This seems more like something you would assert.
Hello Aaron, thank you for your comment. The initial premise was that it could
be possible to pass null, for cases where a user would not have specified an
optional argument (e.g. for MaxWavesPerEU). This would've kept the actual check
further away from the attribute handling logic, making the latter somewhat
tidier. However, thinking about it some more, given the specification of the
attributes where this would have applied, this can be done upstream in such a
way as to ensure that all expressions describing the arguments passed to an
attribute are non-null, so I've switched it over to follow your suggestion.
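
To spell out the approach (a sketch pulling the relevant pieces of the updated
diff together):

  // Sema: guarantee a non-null expression for the optional second argument by
  // falling back to the mandatory first one when the user omitted it.
  Expr *MaxExpr = MinExpr;
  if (Attr.getNumArgs() == 2)
    MaxExpr = Attr.getArgAsExpr(1);

  // CodeGen: the helper can then assert instead of special-casing null.
  static llvm::APSInt getConstexprInt(const Expr *E, const ASTContext &Ctx) {
    assert(E && "attribute argument expression must be non-null");
    llvm::APSInt Tmp{32, 0};
    E->EvaluateAsInt(Tmp, Ctx);
    return Tmp;
  }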


https://reviews.llvm.org/D39857



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D39857: [AMDGPU] Late parsed / dependent arguments for AMDGPU kernel attributes

2017-11-09 Thread Alex Voicu via Phabricator via cfe-commits
AlexVlx marked an inline comment as done.
AlexVlx added a comment.

In https://reviews.llvm.org/D39857#920814, @kzhuravl wrote:

> Hi Alex, can you rebase on top of trunk (I think you brought in some extra 
> changes from hcc branch) and upload a full diff?


Hello Konstantine - apologies about that, hopefully it's correct now. Thank you.


https://reviews.llvm.org/D39857



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

