[PATCH] D133802: [OpenMP] Remove simplified device runtime handling

2022-09-13 Thread Jose Manuel Monsalve Diaz via Phabricator via cfe-commits
josemonsalve2 added a comment.

This is a good idea. Thanks Joseph.

Other than the two comments I made, I think this should be accepted.

Jose




Comment at: clang/include/clang/Driver/Options.td:2565-2566
   Flags<[NoArgumentUnused, HelpHidden]>;
-def fopenmp_cuda_force_full_runtime : Flag<["-"], "fopenmp-cuda-force-full-runtime">, Group<f_Group>,
-  Flags<[CC1Option, NoArgumentUnused, HelpHidden]>;
-def fno_openmp_cuda_force_full_runtime : Flag<["-"], "fno-openmp-cuda-force-full-runtime">, Group<f_Group>,
-  Flags<[NoArgumentUnused, HelpHidden]>;
+def fopenmp_cuda_force_full_runtime : Flag<["-"], "fopenmp-cuda-force-full-runtime">, Flags<[HelpHidden]>;
+def fno_openmp_cuda_force_full_runtime : Flag<["-"], "fno-openmp-cuda-force-full-runtime">, Flags<[HelpHidden]>;
 def fopenmp_cuda_number_of_sm_EQ : Joined<["-"], "fopenmp-cuda-number-of-sm=">, Group<f_Group>,

Why not remove these? Are they used somewhere else? 



Comment at: clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp:80-90
   ExecutionRuntimeModesRAII(CGOpenMPRuntimeGPU::ExecutionMode &ExecMode)
   : ExecMode(ExecMode) {
 SavedExecMode = ExecMode;
 ExecMode = CGOpenMPRuntimeGPU::EM_NonSPMD;
   }
   /// Constructor for SPMD mode.
+  ExecutionRuntimeModesRAII(CGOpenMPRuntimeGPU::ExecutionMode &ExecMode, bool)

What if we combine these two and just leave one that receives the two modes? I 
am really confused by this code. Is there something I am missing here? 
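
To make the suggestion concrete, a minimal sketch of the combined variant 
(hypothetical names, mirroring the snippet above):

  // Hypothetical sketch: one RAII constructor that receives the mode to
  // switch to, instead of separate non-SPMD/SPMD overloads.
  struct ExecutionRuntimeModesRAII {
    CGOpenMPRuntimeGPU::ExecutionMode &ExecMode;
    CGOpenMPRuntimeGPU::ExecutionMode SavedExecMode;

    ExecutionRuntimeModesRAII(CGOpenMPRuntimeGPU::ExecutionMode &ExecMode,
                              CGOpenMPRuntimeGPU::ExecutionMode EntryMode)
        : ExecMode(ExecMode), SavedExecMode(ExecMode) {
      ExecMode = EntryMode; // EM_NonSPMD or EM_SPMD, chosen by the caller
    }
    ~ExecutionRuntimeModesRAII() { ExecMode = SavedExecMode; }
  };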


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D133802/new/

https://reviews.llvm.org/D133802



[PATCH] D133802: [OpenMP] Remove simplified device runtime handling

2022-09-13 Thread Jose Manuel Monsalve Diaz via Phabricator via cfe-commits
josemonsalve2 added inline comments.



Comment at: clang/include/clang/Driver/Options.td:2565-2566
   Flags<[NoArgumentUnused, HelpHidden]>;
-def fopenmp_cuda_force_full_runtime : Flag<["-"], "fopenmp-cuda-force-full-runtime">, Group<f_Group>,
-  Flags<[CC1Option, NoArgumentUnused, HelpHidden]>;
-def fno_openmp_cuda_force_full_runtime : Flag<["-"], "fno-openmp-cuda-force-full-runtime">, Group<f_Group>,
-  Flags<[NoArgumentUnused, HelpHidden]>;
+def fopenmp_cuda_force_full_runtime : Flag<["-"], "fopenmp-cuda-force-full-runtime">, Flags<[HelpHidden]>;
+def fno_openmp_cuda_force_full_runtime : Flag<["-"], "fno-openmp-cuda-force-full-runtime">, Flags<[HelpHidden]>;
 def fopenmp_cuda_number_of_sm_EQ : Joined<["-"], "fopenmp-cuda-number-of-sm=">, Group<f_Group>,

jhuber6 wrote:
> josemonsalve2 wrote:
> > Why not remove these? Are they used somewhere else? 
> We usually don't remove driver arguments between releases as this could cause 
> existing applications to stop compiling. Leaving them here will cause Clang 
> to continue compiling but emit an unused flag warning.
Will that generate a warning saying this flag has no use? 


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D133802/new/

https://reviews.llvm.org/D133802



[PATCH] D106298: [OpenMP] Creating the `NumTeams` and `ThreadLimit` attributes to outlined functions

2021-07-19 Thread Jose Manuel Monsalve Diaz via Phabricator via cfe-commits
josemonsalve2 created this revision.
josemonsalve2 added reviewers: ABataev, jdoerfert, JonChesterfield, 
ggeorgakoudis, jhuber6, baziotis, sstefan1, uenoku, tianshilei1992.
Herald added subscribers: jfb, guansong, yaxunl.
josemonsalve2 requested review of this revision.
Herald added a project: clang.
Herald added a subscriber: cfe-commits.

The device runtime contains several calls to
__kmpc_get_hardware_num_threads_in_block and
__kmpc_get_hardware_num_blocks. If the thread_limit and the num_teams are
constant, these calls can be folded to the constant value.

In commit D106033 we have the optimization phase. This commit adds the
attributes for the grid size to the outlined function. The two attributes are
`NumTeams` and `ThreadLimit`. These values are added as long as they are
constant.

Two functions are created, `getNumThreadsExprForTargetDirective` and
`getNumTeamsExprForTargetDirective`. The original functions
`emitNumTeamsForTargetDirective` and `emitNumThreadsForTargetDirective`
identify the expression and emit the code. However, for the device version of
the outlined function, we cannot emit anything. Therefore, this is a first
attempt to separate the emission of code from the deduction of the values.
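
As a hypothetical illustration (not taken from the patch), a region like the
following has both values as integer constant expressions, so the outlined
function can carry the two attributes and the middle end can fold the
corresponding runtime queries:

  // Hypothetical example: num_teams and thread_limit are compile-time
  // constants, so the outlined target function gets `NumTeams`=4 and
  // `ThreadLimit`=64 as string attributes.
  void foo(int *A, int N) {
  #pragma omp target teams distribute num_teams(4) thread_limit(64) \
      map(tofrom : A[0 : N])
    for (int i = 0; i < N; ++i)
      A[i] = i;
  }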


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D106298

Files:
  clang/lib/CodeGen/CGOpenMPRuntime.cpp
  clang/lib/CodeGen/CGOpenMPRuntime.h

Index: clang/lib/CodeGen/CGOpenMPRuntime.h
===
--- clang/lib/CodeGen/CGOpenMPRuntime.h
+++ clang/lib/CodeGen/CGOpenMPRuntime.h
@@ -340,6 +340,35 @@
   llvm::Value *emitUpdateLocation(CodeGenFunction &CGF, SourceLocation Loc,
   unsigned Flags = 0);
 
+  /// Emit the number of teams for a target directive.  Inspect the num_teams
+  /// clause associated with a teams construct combined or closely nested
+  /// with the target directive.
+  ///
+  /// Emit a team of size one for directives such as 'target parallel' that
+  /// have no associated teams construct.
+  ///
+  /// Otherwise, return nullptr.
+  const Expr *getNumTeamsExprForTargetDirective(CodeGenFunction &CGF,
+const OMPExecutableDirective &D,
+int32_t &DefaultVal);
+  llvm::Value *emitNumTeamsForTargetDirective(CodeGenFunction &CGF,
+  const OMPExecutableDirective &D);
+  /// Emit the number of threads for a target directive.  Inspect the
+  /// thread_limit clause associated with a teams construct combined or closely
+  /// nested with the target directive.
+  ///
+  /// Emit the num_threads clause for directives such as 'target parallel' that
+  /// have no associated teams construct.
+  ///
+  /// Otherwise, return nullptr.
+  const Expr *
+  getNumThreadsExprForTargetDirective(CodeGenFunction &CGF,
+  const OMPExecutableDirective &D,
+  int32_t &DefaultVal);
+  llvm::Value *
+  emitNumThreadsForTargetDirective(CodeGenFunction &CGF,
+   const OMPExecutableDirective &D);
+
   /// Returns pointer to ident_t type.
   llvm::Type *getIdentTyPointerTy();
 
Index: clang/lib/CodeGen/CGOpenMPRuntime.cpp
===
--- clang/lib/CodeGen/CGOpenMPRuntime.cpp
+++ clang/lib/CodeGen/CGOpenMPRuntime.cpp
@@ -6551,6 +6551,20 @@
   OffloadEntriesInfoManager.registerTargetRegionEntryInfo(
   DeviceID, FileID, ParentName, Line, OutlinedFn, OutlinedFnID,
   OffloadEntriesInfoManagerTy::OMPTargetRegionEntryTargetRegion);
+
+  // Add NumTeams and ThreadLimit attributes to the outlined GPU function
+  int32_t DefaultValTeams = -1;
+  getNumTeamsExprForTargetDirective(CGF, D, DefaultValTeams);
+  if (DefaultValTeams > 0) {
+OutlinedFn->addFnAttr(llvm::StringRef("NumTeams"),
+  std::to_string(DefaultValTeams));
+  }
+  int32_t DefaultValThreads = -1;
+  getNumThreadsExprForTargetDirective(CGF, D, DefaultValThreads);
+  if (DefaultValThreads > 0) {
+OutlinedFn->addFnAttr(llvm::StringRef("ThreadLimit"),
+  std::to_string(DefaultValThreads));
+  }
 }
 
 /// Checks if the expression is constant or does not have non-trivial function
@@ -6605,24 +6619,13 @@
   return Child;
 }
 
-/// Emit the number of teams for a target directive.  Inspect the num_teams
-/// clause associated with a teams construct combined or closely nested
-/// with the target directive.
-///
-/// Emit a team of size one for directives such as 'target parallel' that
-/// have no associated teams construct.
-///
-/// Otherwise, return nullptr.
-static llvm::Value *
-emitNumTeamsForTargetDirective(CodeGenFunction &CGF,
-   const OMPExecutableDirective &D) {
-  assert(!CGF.getLangOpts().OpenMPIsDevice &&
- "Clauses associated 

[PATCH] D106298: [OpenMP] Creating the `NumTeams` and `ThreadLimit` attributes to outlined functions

2021-07-19 Thread Jose Manuel Monsalve Diaz via Phabricator via cfe-commits
josemonsalve2 added a comment.

In D106298#2888234, @jdoerfert wrote:

> Tests?

If you are referring to new tests, I am working on them. If you are referring 
to the tests that are failing, it is because the new attributes broke them: 
those tests expect many functions to share the same attribute group, and 
adding the attributes creates a new group.
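
For reference, a hypothetical sketch in the style of the existing OpenMP
codegen tests (the CHECK lines are illustrative, not copied from any failing
test):

  // Hypothetical test sketch: the outlined kernel now lands in its own
  // attribute group carrying the new string attribute, so CHECK lines that
  // assumed a shared group number have to be regenerated.
  // CHECK: define {{.*}}@__omp_offloading_{{.*}} #[[ATTR:[0-9]+]]
  // CHECK: attributes #[[ATTR]] = { {{.*}}"NumTeams"="4"{{.*}} }
  int main() {
  #pragma omp target teams num_teams(4)
    { }
    return 0;
  }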


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106298/new/

https://reviews.llvm.org/D106298



[PATCH] D106298: [OpenMP] Creating the `NumTeams` and `ThreadLimit` attributes to outlined functions

2021-07-19 Thread Jose Manuel Monsalve Diaz via Phabricator via cfe-commits
josemonsalve2 updated this revision to Diff 359930.
josemonsalve2 added a comment.

Changing the attribute names to those suggested by @jdoerfert.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106298/new/

https://reviews.llvm.org/D106298

Files:
  clang/lib/CodeGen/CGOpenMPRuntime.cpp
  clang/lib/CodeGen/CGOpenMPRuntime.h

Index: clang/lib/CodeGen/CGOpenMPRuntime.h
===
--- clang/lib/CodeGen/CGOpenMPRuntime.h
+++ clang/lib/CodeGen/CGOpenMPRuntime.h
@@ -340,6 +340,35 @@
   llvm::Value *emitUpdateLocation(CodeGenFunction &CGF, SourceLocation Loc,
   unsigned Flags = 0);
 
+  /// Emit the number of teams for a target directive.  Inspect the num_teams
+  /// clause associated with a teams construct combined or closely nested
+  /// with the target directive.
+  ///
+  /// Emit a team of size one for directives such as 'target parallel' that
+  /// have no associated teams construct.
+  ///
+  /// Otherwise, return nullptr.
+  const Expr *getNumTeamsExprForTargetDirective(CodeGenFunction &CGF,
+const OMPExecutableDirective &D,
+int32_t &DefaultVal);
+  llvm::Value *emitNumTeamsForTargetDirective(CodeGenFunction &CGF,
+  const OMPExecutableDirective &D);
+  /// Emit the number of threads for a target directive.  Inspect the
+  /// thread_limit clause associated with a teams construct combined or closely
+  /// nested with the target directive.
+  ///
+  /// Emit the num_threads clause for directives such as 'target parallel' that
+  /// have no associated teams construct.
+  ///
+  /// Otherwise, return nullptr.
+  const Expr *
+  getNumThreadsExprForTargetDirective(CodeGenFunction &CGF,
+  const OMPExecutableDirective &D,
+  int32_t &DefaultVal);
+  llvm::Value *
+  emitNumThreadsForTargetDirective(CodeGenFunction &CGF,
+   const OMPExecutableDirective &D);
+
   /// Returns pointer to ident_t type.
   llvm::Type *getIdentTyPointerTy();
 
Index: clang/lib/CodeGen/CGOpenMPRuntime.cpp
===
--- clang/lib/CodeGen/CGOpenMPRuntime.cpp
+++ clang/lib/CodeGen/CGOpenMPRuntime.cpp
@@ -6551,6 +6551,20 @@
   OffloadEntriesInfoManager.registerTargetRegionEntryInfo(
   DeviceID, FileID, ParentName, Line, OutlinedFn, OutlinedFnID,
   OffloadEntriesInfoManagerTy::OMPTargetRegionEntryTargetRegion);
+
+  // Add NumTeams and ThreadLimit attributes to the outlined GPU function
+  int32_t DefaultValTeams = -1;
+  getNumTeamsExprForTargetDirective(CGF, D, DefaultValTeams);
+  if (DefaultValTeams > 0) {
+OutlinedFn->addFnAttr("omp_target_num_teams",
+  std::to_string(DefaultValTeams));
+  }
+  int32_t DefaultValThreads = -1;
+  getNumThreadsExprForTargetDirective(CGF, D, DefaultValThreads);
+  if (DefaultValThreads > 0) {
+OutlinedFn->addFnAttr("omp_target_thread_limit",
+  std::to_string(DefaultValThreads));
+  }
 }
 
 /// Checks if the expression is constant or does not have non-trivial function
@@ -6605,24 +6619,13 @@
   return Child;
 }
 
-/// Emit the number of teams for a target directive.  Inspect the num_teams
-/// clause associated with a teams construct combined or closely nested
-/// with the target directive.
-///
-/// Emit a team of size one for directives such as 'target parallel' that
-/// have no associated teams construct.
-///
-/// Otherwise, return nullptr.
-static llvm::Value *
-emitNumTeamsForTargetDirective(CodeGenFunction &CGF,
-   const OMPExecutableDirective &D) {
-  assert(!CGF.getLangOpts().OpenMPIsDevice &&
- "Clauses associated with the teams directive expected to be emitted "
- "only for the host!");
+const Expr *CGOpenMPRuntime::getNumTeamsExprForTargetDirective(
+CodeGenFunction &CGF, const OMPExecutableDirective &D,
+int32_t &DefaultVal) {
+
   OpenMPDirectiveKind DirectiveKind = D.getDirectiveKind();
   assert(isOpenMPTargetExecutionDirective(DirectiveKind) &&
  "Expected target-based executable directive.");
-  CGBuilderTy &Bld = CGF.Builder;
   switch (DirectiveKind) {
   case OMPD_target: {
 const auto *CS = D.getInnermostCapturedStmt();
@@ -6638,18 +6641,20 @@
   CodeGenFunction::CGCapturedStmtRAII CapInfoRAII(CGF, &CGInfo);
   const Expr *NumTeams =
    NestedDir->getSingleClause<OMPNumTeamsClause>()->getNumTeams();
-  llvm::Value *NumTeamsVal =
-  CGF.EmitScalarExpr(NumTeams,
- /*IgnoreResultAssign*/ true);
-  return Bld.CreateIntCast(NumTeamsVal, CGF.Int32Ty,
-   /*isSigned=*/true);
+  if (NumTeams->isIntegerConstantExpr(C

[PATCH] D106298: [OpenMP] Creating the `NumTeams` and `ThreadLimit` attributes to outlined functions

2021-07-19 Thread Jose Manuel Monsalve Diaz via Phabricator via cfe-commits
josemonsalve2 marked 2 inline comments as done.
josemonsalve2 added inline comments.



Comment at: clang/lib/CodeGen/CGOpenMPRuntime.cpp:6659
 }
 return nullptr;
   }

Should I default here to 1, since this is an `omp target`? 


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106298/new/

https://reviews.llvm.org/D106298



[PATCH] D106298: [OpenMP] Creating the `NumTeams` and `ThreadLimit` attributes to outlined functions

2021-07-19 Thread Jose Manuel Monsalve Diaz via Phabricator via cfe-commits
josemonsalve2 updated this revision to Diff 359946.
josemonsalve2 added a comment.

Making the default num teams for `omp target` be 1. Also fixing a clang-tidy 
error and a missing initialization.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106298/new/

https://reviews.llvm.org/D106298

Files:
  clang/lib/CodeGen/CGOpenMPRuntime.cpp
  clang/lib/CodeGen/CGOpenMPRuntime.h

Index: clang/lib/CodeGen/CGOpenMPRuntime.h
===
--- clang/lib/CodeGen/CGOpenMPRuntime.h
+++ clang/lib/CodeGen/CGOpenMPRuntime.h
@@ -340,6 +340,35 @@
   llvm::Value *emitUpdateLocation(CodeGenFunction &CGF, SourceLocation Loc,
   unsigned Flags = 0);
 
+  /// Emit the number of teams for a target directive.  Inspect the num_teams
+  /// clause associated with a teams construct combined or closely nested
+  /// with the target directive.
+  ///
+  /// Emit a team of size one for directives such as 'target parallel' that
+  /// have no associated teams construct.
+  ///
+  /// Otherwise, return nullptr.
+  const Expr *getNumTeamsExprForTargetDirective(CodeGenFunction &CGF,
+const OMPExecutableDirective &D,
+int32_t &DefaultVal);
+  llvm::Value *emitNumTeamsForTargetDirective(CodeGenFunction &CGF,
+  const OMPExecutableDirective &D);
+  /// Emit the number of threads for a target directive.  Inspect the
+  /// thread_limit clause associated with a teams construct combined or closely
+  /// nested with the target directive.
+  ///
+  /// Emit the num_threads clause for directives such as 'target parallel' that
+  /// have no associated teams construct.
+  ///
+  /// Otherwise, return nullptr.
+  const Expr *
+  getNumThreadsExprForTargetDirective(CodeGenFunction &CGF,
+  const OMPExecutableDirective &D,
+  int32_t &DefaultVal);
+  llvm::Value *
+  emitNumThreadsForTargetDirective(CodeGenFunction &CGF,
+   const OMPExecutableDirective &D);
+
   /// Returns pointer to ident_t type.
   llvm::Type *getIdentTyPointerTy();
 
Index: clang/lib/CodeGen/CGOpenMPRuntime.cpp
===
--- clang/lib/CodeGen/CGOpenMPRuntime.cpp
+++ clang/lib/CodeGen/CGOpenMPRuntime.cpp
@@ -6551,6 +6551,20 @@
   OffloadEntriesInfoManager.registerTargetRegionEntryInfo(
   DeviceID, FileID, ParentName, Line, OutlinedFn, OutlinedFnID,
   OffloadEntriesInfoManagerTy::OMPTargetRegionEntryTargetRegion);
+
+  // Add NumTeams and ThreadLimit attributes to the outlined GPU function
+  int32_t DefaultValTeams = -1;
+  getNumTeamsExprForTargetDirective(CGF, D, DefaultValTeams);
+  if (DefaultValTeams > 0) {
+OutlinedFn->addFnAttr("omp_target_num_teams",
+  std::to_string(DefaultValTeams));
+  }
+  int32_t DefaultValThreads = -1;
+  getNumThreadsExprForTargetDirective(CGF, D, DefaultValThreads);
+  if (DefaultValThreads > 0) {
+OutlinedFn->addFnAttr("omp_target_thread_limit",
+  std::to_string(DefaultValThreads));
+  }
 }
 
 /// Checks if the expression is constant or does not have non-trivial function
@@ -6605,24 +6619,13 @@
   return Child;
 }
 
-/// Emit the number of teams for a target directive.  Inspect the num_teams
-/// clause associated with a teams construct combined or closely nested
-/// with the target directive.
-///
-/// Emit a team of size one for directives such as 'target parallel' that
-/// have no associated teams construct.
-///
-/// Otherwise, return nullptr.
-static llvm::Value *
-emitNumTeamsForTargetDirective(CodeGenFunction &CGF,
-   const OMPExecutableDirective &D) {
-  assert(!CGF.getLangOpts().OpenMPIsDevice &&
- "Clauses associated with the teams directive expected to be emitted "
- "only for the host!");
+const Expr *CGOpenMPRuntime::getNumTeamsExprForTargetDirective(
+CodeGenFunction &CGF, const OMPExecutableDirective &D,
+int32_t &DefaultVal) {
+
   OpenMPDirectiveKind DirectiveKind = D.getDirectiveKind();
   assert(isOpenMPTargetExecutionDirective(DirectiveKind) &&
  "Expected target-based executable directive.");
-  CGBuilderTy &Bld = CGF.Builder;
   switch (DirectiveKind) {
   case OMPD_target: {
 const auto *CS = D.getInnermostCapturedStmt();
@@ -6638,19 +6641,22 @@
   CodeGenFunction::CGCapturedStmtRAII CapInfoRAII(CGF, &CGInfo);
   const Expr *NumTeams =
    NestedDir->getSingleClause<OMPNumTeamsClause>()->getNumTeams();
-  llvm::Value *NumTeamsVal =
-  CGF.EmitScalarExpr(NumTeams,
- /*IgnoreResultAssign*/ true);
-  return Bld.CreateIntCast(NumTeamsVal, CGF.Int32Ty,
-   /*isSigned=*/true);

[PATCH] D106298: [OpenMP] Creating the `NumTeams` and `ThreadLimit` attributes to outlined functions

2021-07-19 Thread Jose Manuel Monsalve Diaz via Phabricator via cfe-commits
josemonsalve2 updated this revision to Diff 359969.
josemonsalve2 added a comment.

Adding test file 
`/clang/test/OpenMP/target_num_teams_num_threads_attributes.cpp`


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106298/new/

https://reviews.llvm.org/D106298

Files:
  clang/lib/CodeGen/CGOpenMPRuntime.cpp
  clang/lib/CodeGen/CGOpenMPRuntime.h
  clang/test/OpenMP/target_num_teams_num_threads_attributes.cpp

Index: clang/test/OpenMP/target_num_teams_num_threads_attributes.cpp
===
--- /dev/null
+++ clang/test/OpenMP/target_num_teams_num_threads_attributes.cpp
@@ -0,0 +1,191 @@
+// RUN: %clang_cc1 -verify -fopenmp -fopenmp-targets=powerpc64le-ibm-linux-gnu -x c++ -triple powerpc64le-unknown-unknown -emit-llvm %s -o - | FileCheck -allow-deprecated-dag-overlap  %s -check-prefix=CHECK1
+// RUN: %clang_cc1 -fopenmp -fopenmp-targets=powerpc64le-ibm-linux-gnu -x c++ -std=c++11 -triple powerpc64le-unknown-unknown -emit-pch -o %t %s
+// RUN: %clang_cc1 -fopenmp -fopenmp-targets=powerpc64le-ibm-linux-gnu -x c++ -triple powerpc64le-unknown-unknown -std=c++11 -include-pch %t -verify %s -emit-llvm -o - | FileCheck -allow-deprecated-dag-overlap  %s -check-prefix=CHECK1
+// RUN: %clang_cc1 -verify -fopenmp -fopenmp-targets=i386-pc-linux-gnu -x c++ -triple i386-unknown-unknown -emit-llvm %s -o - | FileCheck -allow-deprecated-dag-overlap  %s  -check-prefix=CHECK1
+// RUN: %clang_cc1 -fopenmp -fopenmp-targets=i386-pc-linux-gnu -x c++ -std=c++11 -triple i386-unknown-unknown -emit-pch -o %t %s
+// RUN: %clang_cc1 -fopenmp -fopenmp-targets=i386-pc-linux-gnu -x c++ -triple i386-unknown-unknown -std=c++11 -include-pch %t -verify %s -emit-llvm -o - | FileCheck -allow-deprecated-dag-overlap  %s  -check-prefix=CHECK1
+
+// RUN: %clang_cc1 -verify -fopenmp -fopenmp-targets=powerpc64le-ibm-linux-gnu -x c++ -triple powerpc64le-unknown-unknown -emit-llvm %s -o - | FileCheck -allow-deprecated-dag-overlap  %s -check-prefix=CHECK2
+// RUN: %clang_cc1 -fopenmp -fopenmp-targets=powerpc64le-ibm-linux-gnu -x c++ -std=c++11 -triple powerpc64le-unknown-unknown -emit-pch -o %t %s
+// RUN: %clang_cc1 -fopenmp -fopenmp-targets=powerpc64le-ibm-linux-gnu -x c++ -triple powerpc64le-unknown-unknown -std=c++11 -include-pch %t -verify %s -emit-llvm -o - | FileCheck -allow-deprecated-dag-overlap  %s -check-prefix=CHECK2
+// RUN: %clang_cc1 -verify -fopenmp -fopenmp-targets=i386-pc-linux-gnu -x c++ -triple i386-unknown-unknown -emit-llvm %s -o - | FileCheck -allow-deprecated-dag-overlap  %s  -check-prefix=CHECK2
+// RUN: %clang_cc1 -fopenmp -fopenmp-targets=i386-pc-linux-gnu -x c++ -std=c++11 -triple i386-unknown-unknown -emit-pch -o %t %s
+// RUN: %clang_cc1 -fopenmp -fopenmp-targets=i386-pc-linux-gnu -x c++ -triple i386-unknown-unknown -std=c++11 -include-pch %t -verify %s -emit-llvm -o - | FileCheck -allow-deprecated-dag-overlap  %s  -check-prefix=CHECK2
+
+// RUN: %clang_cc1 -verify -fopenmp -fopenmp-targets=powerpc64le-ibm-linux-gnu -x c++ -triple powerpc64le-unknown-unknown -emit-llvm %s -o - | FileCheck -allow-deprecated-dag-overlap  %s -check-prefix=CHECK3
+// RUN: %clang_cc1 -fopenmp -fopenmp-targets=powerpc64le-ibm-linux-gnu -x c++ -std=c++11 -triple powerpc64le-unknown-unknown -emit-pch -o %t %s
+// RUN: %clang_cc1 -fopenmp -fopenmp-targets=powerpc64le-ibm-linux-gnu -x c++ -triple powerpc64le-unknown-unknown -std=c++11 -include-pch %t -verify %s -emit-llvm -o - | FileCheck -allow-deprecated-dag-overlap  %s -check-prefix=CHECK3
+// RUN: %clang_cc1 -verify -fopenmp -fopenmp-targets=i386-pc-linux-gnu -x c++ -triple i386-unknown-unknown -emit-llvm %s -o - | FileCheck -allow-deprecated-dag-overlap  %s  -check-prefix=CHECK3
+// RUN: %clang_cc1 -fopenmp -fopenmp-targets=i386-pc-linux-gnu -x c++ -std=c++11 -triple i386-unknown-unknown -emit-pch -o %t %s
+// RUN: %clang_cc1 -fopenmp -fopenmp-targets=i386-pc-linux-gnu -x c++ -triple i386-unknown-unknown -std=c++11 -include-pch %t -verify %s -emit-llvm -o - | FileCheck -allow-deprecated-dag-overlap  %s  -check-prefix=CHECK3
+
+// RUN: %clang_cc1 -verify -fopenmp -fopenmp-targets=powerpc64le-ibm-linux-gnu -x c++ -triple powerpc64le-unknown-unknown -emit-llvm %s -o - | FileCheck -allow-deprecated-dag-overlap  %s -check-prefix=CHECK4
+// RUN: %clang_cc1 -fopenmp -fopenmp-targets=powerpc64le-ibm-linux-gnu -x c++ -std=c++11 -triple powerpc64le-unknown-unknown -emit-pch -o %t %s
+// RUN: %clang_cc1 -fopenmp -fopenmp-targets=powerpc64le-ibm-linux-gnu -x c++ -triple powerpc64le-unknown-unknown -std=c++11 -include-pch %t -verify %s -emit-llvm -o - | FileCheck -allow-deprecated-dag-overlap  %s -check-prefix=CHECK4
+// RUN: %clang_cc1 -verify -fopenmp -fopenmp-targets=i386-pc-linux-gnu -x c++ -triple i386-unknown-unknown -emit-llvm %s -o - | FileCheck -allow-deprecated-dag-overlap  %s  -check-prefix=CHECK4
+// RUN: %clang_cc1 -fopenmp -fopenmp-targets=i386-pc-linux-gnu -x c++ -std=c++11 -triple i386-unknown-

[PATCH] D106298: [OpenMP] Creating the `NumTeams` and `ThreadLimit` attributes to outlined functions

2021-07-20 Thread Jose Manuel Monsalve Diaz via Phabricator via cfe-commits
josemonsalve2 updated this revision to Diff 360190.
josemonsalve2 added a comment.

I've fixed all the clang tests. It is also not possible to provide a default
for `omp target`, because the code relies on a nullptr being returned in order
to generate the right runtime call. Therefore I reverted that change and use
-1 to flag this case. I've also moved some elements that were in the
getNumTeamsExpr function into the emit function.
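
Condensed from the diffs above, the contract the caller now follows is roughly
(sketch, using the names from this revision):

  // A returned nullptr together with DefaultValTeams still at -1 means "no
  // constant value and no default"; the emit function then falls back to
  // generating the appropriate runtime call.
  int32_t DefaultValTeams = -1;
  const Expr *TeamsExpr =
      getNumTeamsExprForTargetDirective(CGF, D, DefaultValTeams);
  if (DefaultValTeams > 0)
    OutlinedFn->addFnAttr("omp_target_num_teams",
                          std::to_string(DefaultValTeams));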


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106298/new/

https://reviews.llvm.org/D106298

Files:
  clang/lib/CodeGen/CGOpenMPRuntime.cpp
  clang/lib/CodeGen/CGOpenMPRuntime.h
  clang/test/OpenMP/declare_target_codegen_globalization.cpp
  clang/test/OpenMP/nvptx_lambda_capturing.cpp
  clang/test/OpenMP/nvptx_multi_target_parallel_codegen.cpp
  clang/test/OpenMP/nvptx_target_codegen.cpp
  clang/test/OpenMP/nvptx_target_parallel_codegen.cpp
  clang/test/OpenMP/nvptx_target_parallel_num_threads_codegen.cpp
  clang/test/OpenMP/nvptx_target_teams_distribute_parallel_for_codegen.cpp
  clang/test/OpenMP/nvptx_target_teams_distribute_parallel_for_simd_codegen.cpp
  clang/test/OpenMP/target_map_codegen_03.cpp
  clang/test/OpenMP/target_num_teams_num_threads_attributes.cpp
  clang/test/OpenMP/target_parallel_codegen.cpp
  clang/test/OpenMP/target_parallel_debug_codegen.cpp
  clang/test/OpenMP/target_parallel_for_codegen.cpp
  clang/test/OpenMP/target_parallel_for_debug_codegen.cpp
  clang/test/OpenMP/target_parallel_for_simd_codegen.cpp
  clang/test/OpenMP/target_parallel_if_codegen.cpp
  clang/test/OpenMP/target_parallel_num_threads_codegen.cpp
  clang/test/OpenMP/target_teams_distribute_simd_codegen.cpp
  clang/test/OpenMP/target_teams_distribute_simd_collapse_codegen.cpp
  clang/test/OpenMP/target_teams_distribute_simd_dist_schedule_codegen.cpp
  clang/test/OpenMP/target_teams_distribute_simd_firstprivate_codegen.cpp
  clang/test/OpenMP/target_teams_distribute_simd_lastprivate_codegen.cpp
  clang/test/OpenMP/target_teams_distribute_simd_private_codegen.cpp
  clang/test/OpenMP/target_teams_distribute_simd_reduction_codegen.cpp
  clang/test/OpenMP/target_teams_num_teams_codegen.cpp
  clang/test/OpenMP/target_teams_thread_limit_codegen.cpp
  clang/test/OpenMP/teams_codegen.cpp



[PATCH] D106298: [OpenMP] Creating the `omp_target_num_teams` and `omp_target_thread_limit` attributes to outlined functions

2021-07-20 Thread Jose Manuel Monsalve Diaz via Phabricator via cfe-commits
josemonsalve2 updated this revision to Diff 360196.
josemonsalve2 added a comment.

Forgot to run `git-clang-format`


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106298/new/

https://reviews.llvm.org/D106298

Files:
  clang/lib/CodeGen/CGOpenMPRuntime.cpp
  clang/lib/CodeGen/CGOpenMPRuntime.h
  clang/test/OpenMP/declare_target_codegen_globalization.cpp
  clang/test/OpenMP/nvptx_lambda_capturing.cpp
  clang/test/OpenMP/nvptx_multi_target_parallel_codegen.cpp
  clang/test/OpenMP/nvptx_target_codegen.cpp
  clang/test/OpenMP/nvptx_target_parallel_codegen.cpp
  clang/test/OpenMP/nvptx_target_parallel_num_threads_codegen.cpp
  clang/test/OpenMP/nvptx_target_teams_distribute_parallel_for_codegen.cpp
  clang/test/OpenMP/nvptx_target_teams_distribute_parallel_for_simd_codegen.cpp
  clang/test/OpenMP/target_map_codegen_03.cpp
  clang/test/OpenMP/target_num_teams_num_threads_attributes.cpp
  clang/test/OpenMP/target_parallel_codegen.cpp
  clang/test/OpenMP/target_parallel_debug_codegen.cpp
  clang/test/OpenMP/target_parallel_for_codegen.cpp
  clang/test/OpenMP/target_parallel_for_debug_codegen.cpp
  clang/test/OpenMP/target_parallel_for_simd_codegen.cpp
  clang/test/OpenMP/target_parallel_if_codegen.cpp
  clang/test/OpenMP/target_parallel_num_threads_codegen.cpp
  clang/test/OpenMP/target_teams_distribute_simd_codegen.cpp
  clang/test/OpenMP/target_teams_distribute_simd_collapse_codegen.cpp
  clang/test/OpenMP/target_teams_distribute_simd_dist_schedule_codegen.cpp
  clang/test/OpenMP/target_teams_distribute_simd_firstprivate_codegen.cpp
  clang/test/OpenMP/target_teams_distribute_simd_lastprivate_codegen.cpp
  clang/test/OpenMP/target_teams_distribute_simd_private_codegen.cpp
  clang/test/OpenMP/target_teams_distribute_simd_reduction_codegen.cpp
  clang/test/OpenMP/target_teams_num_teams_codegen.cpp
  clang/test/OpenMP/target_teams_thread_limit_codegen.cpp
  clang/test/OpenMP/teams_codegen.cpp



[PATCH] D106298: [OpenMP] Creating the `omp_target_num_teams` and `omp_target_thread_limit` attributes to outlined functions

2021-07-26 Thread Jose Manuel Monsalve Diaz via Phabricator via cfe-commits
josemonsalve2 updated this revision to Diff 361777.
josemonsalve2 added a comment.

Rebasing to main


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106298/new/

https://reviews.llvm.org/D106298

Files:
  clang/lib/CodeGen/CGOpenMPRuntime.cpp
  clang/lib/CodeGen/CGOpenMPRuntime.h
  clang/test/OpenMP/declare_target_codegen_globalization.cpp
  clang/test/OpenMP/nvptx_lambda_capturing.cpp
  clang/test/OpenMP/nvptx_multi_target_parallel_codegen.cpp
  clang/test/OpenMP/nvptx_target_codegen.cpp
  clang/test/OpenMP/nvptx_target_parallel_codegen.cpp
  clang/test/OpenMP/nvptx_target_parallel_num_threads_codegen.cpp
  clang/test/OpenMP/nvptx_target_teams_distribute_parallel_for_codegen.cpp
  clang/test/OpenMP/nvptx_target_teams_distribute_parallel_for_simd_codegen.cpp
  clang/test/OpenMP/target_map_codegen_03.cpp
  clang/test/OpenMP/target_num_teams_num_threads_attributes.cpp
  clang/test/OpenMP/target_parallel_codegen.cpp
  clang/test/OpenMP/target_parallel_debug_codegen.cpp
  clang/test/OpenMP/target_parallel_for_codegen.cpp
  clang/test/OpenMP/target_parallel_for_debug_codegen.cpp
  clang/test/OpenMP/target_parallel_for_simd_codegen.cpp
  clang/test/OpenMP/target_parallel_if_codegen.cpp
  clang/test/OpenMP/target_parallel_num_threads_codegen.cpp
  clang/test/OpenMP/target_teams_distribute_simd_codegen.cpp
  clang/test/OpenMP/target_teams_distribute_simd_collapse_codegen.cpp
  clang/test/OpenMP/target_teams_distribute_simd_dist_schedule_codegen.cpp
  clang/test/OpenMP/target_teams_distribute_simd_firstprivate_codegen.cpp
  clang/test/OpenMP/target_teams_distribute_simd_lastprivate_codegen.cpp
  clang/test/OpenMP/target_teams_distribute_simd_private_codegen.cpp
  clang/test/OpenMP/target_teams_distribute_simd_reduction_codegen.cpp
  clang/test/OpenMP/target_teams_num_teams_codegen.cpp
  clang/test/OpenMP/target_teams_thread_limit_codegen.cpp
  clang/test/OpenMP/teams_codegen.cpp



[PATCH] D90670: Simplifying memory globalization from the front end to move optimizations to the middle end.

2021-07-26 Thread Jose Manuel Monsalve Diaz via Phabricator via cfe-commits
josemonsalve2 abandoned this revision.
josemonsalve2 added a subscriber: jhuber6.
josemonsalve2 added a comment.

This has been completed by @jhuber6 in D97680.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D90670/new/

https://reviews.llvm.org/D90670



[PATCH] D106298: [OpenMP] Creating the `omp_target_num_teams` and `omp_target_thread_limit` attributes to outlined functions

2021-07-27 Thread Jose Manuel Monsalve Diaz via Phabricator via cfe-commits
josemonsalve2 updated this revision to Diff 362074.
josemonsalve2 added a comment.

Fixing tests


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106298/new/

https://reviews.llvm.org/D106298

Files:
  clang/lib/CodeGen/CGOpenMPRuntime.cpp
  clang/lib/CodeGen/CGOpenMPRuntime.h
  clang/test/OpenMP/declare_target_codegen_globalization.cpp
  clang/test/OpenMP/nvptx_lambda_capturing.cpp
  clang/test/OpenMP/nvptx_multi_target_parallel_codegen.cpp
  clang/test/OpenMP/nvptx_target_codegen.cpp
  clang/test/OpenMP/nvptx_target_parallel_codegen.cpp
  clang/test/OpenMP/nvptx_target_parallel_num_threads_codegen.cpp
  clang/test/OpenMP/nvptx_target_teams_distribute_parallel_for_codegen.cpp
  clang/test/OpenMP/nvptx_target_teams_distribute_parallel_for_simd_codegen.cpp
  clang/test/OpenMP/target_map_codegen_03.cpp
  clang/test/OpenMP/target_num_teams_num_threads_attributes.cpp
  clang/test/OpenMP/target_parallel_codegen.cpp
  clang/test/OpenMP/target_parallel_debug_codegen.cpp
  clang/test/OpenMP/target_parallel_for_codegen.cpp
  clang/test/OpenMP/target_parallel_for_debug_codegen.cpp
  clang/test/OpenMP/target_parallel_for_simd_codegen.cpp
  clang/test/OpenMP/target_parallel_if_codegen.cpp
  clang/test/OpenMP/target_parallel_num_threads_codegen.cpp
  clang/test/OpenMP/target_teams_distribute_simd_codegen.cpp
  clang/test/OpenMP/target_teams_distribute_simd_collapse_codegen.cpp
  clang/test/OpenMP/target_teams_distribute_simd_dist_schedule_codegen.cpp
  clang/test/OpenMP/target_teams_distribute_simd_firstprivate_codegen.cpp
  clang/test/OpenMP/target_teams_distribute_simd_lastprivate_codegen.cpp
  clang/test/OpenMP/target_teams_distribute_simd_private_codegen.cpp
  clang/test/OpenMP/target_teams_distribute_simd_reduction_codegen.cpp
  clang/test/OpenMP/target_teams_num_teams_codegen.cpp
  clang/test/OpenMP/target_teams_thread_limit_codegen.cpp
  clang/test/OpenMP/teams_codegen.cpp



[PATCH] D106033: [OpenMP] Folding threadLimit and numThreads when single value in kernels

2021-07-27 Thread Jose Manuel Monsalve Diaz via Phabricator via cfe-commits
josemonsalve2 updated this revision to Diff 362226.
josemonsalve2 added a comment.
Herald added a project: clang.
Herald added a subscriber: cfe-commits.

Sync to main


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106033/new/

https://reviews.llvm.org/D106033

Files:
  clang/test/OpenMP/declare_target_codegen_globalization.cpp
  clang/test/OpenMP/nvptx_multi_target_parallel_codegen.cpp
  clang/test/OpenMP/nvptx_target_codegen.cpp
  clang/test/OpenMP/nvptx_target_parallel_codegen.cpp
  clang/test/OpenMP/nvptx_target_parallel_num_threads_codegen.cpp
  clang/test/OpenMP/nvptx_target_teams_distribute_parallel_for_codegen.cpp
  clang/test/OpenMP/nvptx_target_teams_distribute_parallel_for_simd_codegen.cpp
  clang/test/OpenMP/target_map_codegen_03.cpp
  clang/test/OpenMP/target_parallel_codegen.cpp
  clang/test/OpenMP/target_parallel_for_codegen.cpp
  clang/test/OpenMP/target_parallel_for_simd_codegen.cpp
  clang/test/OpenMP/target_parallel_if_codegen.cpp
  clang/test/OpenMP/target_parallel_num_threads_codegen.cpp
  clang/test/OpenMP/target_teams_distribute_simd_codegen.cpp
  clang/test/OpenMP/target_teams_distribute_simd_collapse_codegen.cpp
  clang/test/OpenMP/target_teams_distribute_simd_dist_schedule_codegen.cpp
  clang/test/OpenMP/target_teams_distribute_simd_firstprivate_codegen.cpp
  clang/test/OpenMP/target_teams_distribute_simd_lastprivate_codegen.cpp
  clang/test/OpenMP/target_teams_distribute_simd_private_codegen.cpp
  clang/test/OpenMP/target_teams_distribute_simd_reduction_codegen.cpp
  llvm/include/llvm/Frontend/OpenMP/OMPKinds.def
  llvm/lib/Transforms/IPO/OpenMPOpt.cpp
  llvm/test/Transforms/OpenMP/get_hardware_num_threads_in_block_fold.ll
  openmp/libomptarget/deviceRTLs/target_interface.h



[PATCH] D106033: [OpenMP] Folding threadLimit and numThreads when single value in kernels

2021-07-27 Thread Jose Manuel Monsalve Diaz via Phabricator via cfe-commits
josemonsalve2 updated this revision to Diff 362234.
josemonsalve2 added a comment.

Resync again


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106033/new/

https://reviews.llvm.org/D106033

Files:
  clang/test/OpenMP/declare_target_codegen_globalization.cpp
  clang/test/OpenMP/nvptx_multi_target_parallel_codegen.cpp
  clang/test/OpenMP/nvptx_target_codegen.cpp
  clang/test/OpenMP/nvptx_target_parallel_codegen.cpp
  clang/test/OpenMP/nvptx_target_parallel_num_threads_codegen.cpp
  clang/test/OpenMP/nvptx_target_teams_distribute_parallel_for_codegen.cpp
  clang/test/OpenMP/nvptx_target_teams_distribute_parallel_for_simd_codegen.cpp
  clang/test/OpenMP/target_map_codegen_03.cpp
  clang/test/OpenMP/target_parallel_codegen.cpp
  clang/test/OpenMP/target_parallel_for_codegen.cpp
  clang/test/OpenMP/target_parallel_for_simd_codegen.cpp
  clang/test/OpenMP/target_parallel_if_codegen.cpp
  clang/test/OpenMP/target_parallel_num_threads_codegen.cpp
  clang/test/OpenMP/target_teams_distribute_simd_codegen.cpp
  clang/test/OpenMP/target_teams_distribute_simd_collapse_codegen.cpp
  clang/test/OpenMP/target_teams_distribute_simd_dist_schedule_codegen.cpp
  clang/test/OpenMP/target_teams_distribute_simd_firstprivate_codegen.cpp
  clang/test/OpenMP/target_teams_distribute_simd_lastprivate_codegen.cpp
  clang/test/OpenMP/target_teams_distribute_simd_private_codegen.cpp
  clang/test/OpenMP/target_teams_distribute_simd_reduction_codegen.cpp
  llvm/include/llvm/Frontend/OpenMP/OMPKinds.def
  llvm/lib/Transforms/IPO/OpenMPOpt.cpp
  llvm/test/Transforms/OpenMP/get_hardware_num_threads_in_block_fold.ll
  openmp/libomptarget/deviceRTLs/target_interface.h



[PATCH] D106033: [OpenMP] Folding threadLimit and numThreads when single value in kernels

2021-07-27 Thread Jose Manuel Monsalve Diaz via Phabricator via cfe-commits
josemonsalve2 updated this revision to Diff 362248.
josemonsalve2 added a comment.

Rebasing to main this time for real


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106033/new/

https://reviews.llvm.org/D106033

Files:
  llvm/include/llvm/Frontend/OpenMP/OMPKinds.def
  llvm/lib/Transforms/IPO/OpenMPOpt.cpp
  llvm/test/Transforms/OpenMP/get_hardware_num_threads_in_block_fold.ll
  openmp/libomptarget/deviceRTLs/target_interface.h

Index: openmp/libomptarget/deviceRTLs/target_interface.h
===
--- openmp/libomptarget/deviceRTLs/target_interface.h
+++ openmp/libomptarget/deviceRTLs/target_interface.h
@@ -18,8 +18,8 @@
 // Calls to the NVPTX layer (assuming 1D layout)
 EXTERN int __kmpc_get_hardware_thread_id_in_block();
 EXTERN int GetBlockIdInKernel();
-EXTERN int __kmpc_get_hardware_num_blocks();
-EXTERN int __kmpc_get_hardware_num_threads_in_block();
+EXTERN NOINLINE int __kmpc_get_hardware_num_blocks();
+EXTERN NOINLINE int __kmpc_get_hardware_num_threads_in_block();
 EXTERN unsigned GetWarpId();
 EXTERN unsigned GetWarpSize();
 EXTERN unsigned GetLaneId();
Index: llvm/test/Transforms/OpenMP/get_hardware_num_threads_in_block_fold.ll
===
--- /dev/null
+++ llvm/test/Transforms/OpenMP/get_hardware_num_threads_in_block_fold.ll
@@ -0,0 +1,128 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --function-signature --check-globals
+; RUN: opt -S -passes=openmp-opt < %s | FileCheck %s
+target triple = "nvptx64"
+
+%struct.ident_t = type { i32, i32, i32, i32, i8* }
+
+@kernel0_exec_mode = weak constant i8 1
+
+@G = external global i32
+;.
+; CHECK: @[[G:[a-zA-Z0-9_$"\\.-]+]] = external global i32
+;.
+define weak void @kernel0() #0 {
+; CHECK-LABEL: define {{[^@]+}}@kernel0()
+; CHECK: #[[ATTR0:[0-9]+]] {
+; CHECK-NEXT:[[I:%.*]] = call i32 @__kmpc_target_init(%struct.ident_t* null, i1 true, i1 false, i1 false)
+; CHECK-NEXT:call void @helper0()
+; CHECK-NEXT:call void @helper1()
+; CHECK-NEXT:call void @helper2()
+; CHECK-NEXT:call void @__kmpc_target_deinit(%struct.ident_t* null, i1 true, i1 false)
+; CHECK-NEXT:ret void
+;
+  %i = call i32 @__kmpc_target_init(%struct.ident_t* null, i1 true, i1 false, i1 false)
+  call void @helper0()
+  call void @helper1()
+  call void @helper2()
+  call void @__kmpc_target_deinit(%struct.ident_t* null, i1 true, i1 false)
+  ret void
+}
+
+@kernel1_exec_mode = weak constant i8 1
+
+define weak void @kernel1() #0 {
+; CHECK-LABEL: define {{[^@]+}}@kernel1()
+; CHECK: #[[ATTR0]] {
+; CHECK-NEXT:[[I:%.*]] = call i32 @__kmpc_target_init(%struct.ident_t* null, i1 true, i1 false, i1 false)
+; CHECK-NEXT:call void @helper1()
+; CHECK-NEXT:call void @__kmpc_target_deinit(%struct.ident_t* null, i1 false, i1 false)
+; CHECK-NEXT:ret void
+;
+  %i = call i32 @__kmpc_target_init(%struct.ident_t* null, i1 true, i1 false, i1 false)
+  call void @helper1()
+  call void @__kmpc_target_deinit(%struct.ident_t* null, i1 false, i1 false)
+  ret void
+}
+
+@kernel2_exec_mode = weak constant i8 1
+
+define weak void @kernel2() #0 {
+; CHECK-LABEL: define {{[^@]+}}@kernel2()
+; CHECK: #[[ATTR0]] {
+; CHECK-NEXT:[[I:%.*]] = call i32 @__kmpc_target_init(%struct.ident_t* null, i1 false, i1 false, i1 false)
+; CHECK-NEXT:call void @helper0()
+; CHECK-NEXT:call void @helper1()
+; CHECK-NEXT:call void @helper2()
+; CHECK-NEXT:call void @__kmpc_target_deinit(%struct.ident_t* null, i1 false, i1 false)
+; CHECK-NEXT:ret void
+;
+  %i = call i32 @__kmpc_target_init(%struct.ident_t* null, i1 false, i1 false, i1 false)
+  call void @helper0()
+  call void @helper1()
+  call void @helper2()
+  call void @__kmpc_target_deinit(%struct.ident_t* null, i1 false, i1 false)
+  ret void
+}
+
+define internal void @helper0() {
+; CHECK-LABEL: define {{[^@]+}}@helper0() {{#[0-9]+}} {
+; CHECK-NEXT:store i32 666, i32* @G, align 4
+; CHECK-NEXT:ret void
+;
+  %threadLimit = call i32 @__kmpc_get_hardware_num_threads_in_block()
+  store i32 %threadLimit, i32* @G
+  ret void
+}
+
+define internal void @helper1() {
+; CHECK-LABEL: define {{[^@]+}}@helper1() {{#[0-9]+}} {
+; CHECK-NEXT:br label [[F:%.*]]
+; CHECK:   t:
+; CHECK-NEXT:unreachable
+; CHECK:   f:
+; CHECK-NEXT:ret void
+;
+  %threadLimit = call i32 @__kmpc_get_hardware_num_threads_in_block()
+  %c = icmp eq i32 %threadLimit, 666
+  br i1 %c, label %f, label %t
+t:
+  call void @helper0()
+  ret void
+f:
+  ret void
+}
+
+define internal void @helper2() {
+; CHECK-LABEL: define {{[^@]+}}@helper2() {{#[0-9]+}} {
+; CHECK-NEXT:store i32 666, i32* @G
+; CHECK-NEXT:ret void
+;
+  %threadLimit = call i32 @__kmpc_get_hardware_num_threads_in_block()
+  store i32 %threadLimit, i32* @G
+  ret void
+}
+
+declare i32 @__kmpc_get_hardware_num_threads_in_block()
+declare i32

[PATCH] D92853: Simplifying memory globalization from the front end to move optimizations to the middle end.

2020-12-08 Thread Jose Manuel Monsalve Diaz via Phabricator via cfe-commits
josemonsalve2 created this revision.
josemonsalve2 requested review of this revision.
Herald added a reviewer: jdoerfert.
Herald added subscribers: llvm-commits, openmp-commits, cfe-commits, sstefan1.
Herald added projects: clang, OpenMP, LLVM.

Memory globalization was fully implemented in the front end. There are three
runtime functions in Libomptarget:

- __kmpc_data_sharing_push_stack
- __kmpc_data_sharing_coalesced_push_stack
- __kmpc_data_sharing_pop_stack

The front end performed an escape analysis and created a record declaration
with all the stack variables. Then, based on the context (isTTD and other
parameters) it would create a push for the size of the record, or for that
size multiplied by the WARP size (to globalize for the whole WARP).

This PR removes the record creation, and it simplifies the front end to emit a
simple runtime call that will later be optimized in the middle end. The middle
end will be able to determine the stack variables that do escape, and those
that do not, as well as the appropriate merging of different globalized
variables.

Differential Revision: https://reviews.llvm.org/D90670


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D92853

Files:
  clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
  clang/lib/CodeGen/CGOpenMPRuntimeGPU.h
  clang/test/OpenMP/nvptx_data_sharing.cpp
  llvm/include/llvm/Frontend/OpenMP/OMPKinds.def
  openmp/libomptarget/deviceRTLs/common/src/data_sharing.cu
  openmp/libomptarget/deviceRTLs/interface.h

Index: openmp/libomptarget/deviceRTLs/interface.h
===
--- openmp/libomptarget/deviceRTLs/interface.h
+++ openmp/libomptarget/deviceRTLs/interface.h
@@ -432,7 +432,7 @@
 EXTERN void __kmpc_data_sharing_init_stack_spmd();
 EXTERN void *__kmpc_data_sharing_coalesced_push_stack(size_t size,
 int16_t UseSharedMemory);
-EXTERN void *__kmpc_data_sharing_push_stack(size_t size, int16_t UseSharedMemory);
+EXTERN void *__kmpc_data_sharing_push_stack(size_t size);
 EXTERN void __kmpc_data_sharing_pop_stack(void *a);
 EXTERN void __kmpc_begin_sharing_variables(void ***GlobalArgs, size_t nArgs);
 EXTERN void __kmpc_end_sharing_variables();
Index: openmp/libomptarget/deviceRTLs/common/src/data_sharing.cu
===
--- openmp/libomptarget/deviceRTLs/common/src/data_sharing.cu
+++ openmp/libomptarget/deviceRTLs/common/src/data_sharing.cu
@@ -144,11 +144,7 @@
 // the list of references to shared variables and to pre-allocate global storage
 // for holding the globalized variables.
 //
-// By default the globalized variables are stored in global memory. If the
-// UseSharedMemory is set to true, the runtime will attempt to use shared memory
-// as long as the size requested fits the pre-allocated size.
-EXTERN void *__kmpc_data_sharing_push_stack(size_t DataSize,
-int16_t UseSharedMemory) {
+EXTERN void *__kmpc_data_sharing_push_stack(size_t DataSize) {
   // Compute the total memory footprint of the requested data.
   // The master thread requires a stack only for itself. A worker
   // thread (which at this point is a warp master) will require
Index: llvm/include/llvm/Frontend/OpenMP/OMPKinds.def
===
--- llvm/include/llvm/Frontend/OpenMP/OMPKinds.def
+++ llvm/include/llvm/Frontend/OpenMP/OMPKinds.def
@@ -543,7 +543,7 @@
 __OMP_RTL(__kmpc_data_sharing_init_stack_spmd, false, Void, )
 
 __OMP_RTL(__kmpc_data_sharing_coalesced_push_stack, false, VoidPtr, SizeTy, Int16)
-__OMP_RTL(__kmpc_data_sharing_push_stack, false, VoidPtr, SizeTy, Int16)
+__OMP_RTL(__kmpc_data_sharing_push_stack, false, VoidPtr, SizeTy)
 __OMP_RTL(__kmpc_data_sharing_pop_stack, false, Void, VoidPtr)
 __OMP_RTL(__kmpc_begin_sharing_variables, false, Void, VoidPtrPtrPtr, SizeTy)
 __OMP_RTL(__kmpc_end_sharing_variables, false, Void, )
Index: clang/test/OpenMP/nvptx_data_sharing.cpp
===
--- clang/test/OpenMP/nvptx_data_sharing.cpp
+++ clang/test/OpenMP/nvptx_data_sharing.cpp
@@ -2,8 +2,7 @@
 ///==///
 
 // RUN: %clang_cc1 -verify -fopenmp -x c++ -triple powerpc64le-unknown-unknown -fopenmp-targets=nvptx64-nvidia-cuda -emit-llvm-bc %s -o %t-ppc-host.bc
-// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple nvptx64-unknown-unknown -fopenmp-targets=nvptx64-nvidia-cuda -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o - -disable-llvm-optzns | FileCheck %s --check-prefix CK1 --check-prefix SEQ
-// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple nvptx64-unknown-unknown -fopenmp-targets=nvptx64-nvidia-cuda -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o - -disable-llvm-optzns -fopenmp-cuda-parallel-target-regions | FileCheck %s --check-prefix CK1 --check-prefix PAR
+// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple nvptx64-unknown-unknown -fopenmp-targets=nvptx64-nvidia-cuda -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-

[PATCH] D90670: Simplifying memory globalization from the front end to move optimizations to the middle end.

2020-12-08 Thread Jose Manuel Monsalve Diaz via Phabricator via cfe-commits
josemonsalve2 updated this revision to Diff 310242.
josemonsalve2 added a comment.

Removing globalized record for parallel regions

When globalization occurs in parallel regions, a record was created that is 
not necessary anymore. This is expected to be done in the back end.
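
For comparison, a hedged usage sketch of the simplified interface
(declarations as in the interface.h diff below):

  // The front end now emits a single call sized for the escaping data
  // itself; no per-region record type and no WARP-wide multiplication.
  void *Buf = __kmpc_data_sharing_push_stack(sizeof(int)); // one escaping int
  // ... the parallel region reads and writes through Buf ...
  __kmpc_data_sharing_pop_stack(Buf);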


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D90670/new/

https://reviews.llvm.org/D90670

Files:
  clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
  clang/lib/CodeGen/CGOpenMPRuntimeGPU.h
  clang/test/OpenMP/nvptx_data_sharing.cpp
  llvm/include/llvm/Frontend/OpenMP/OMPKinds.def
  openmp/libomptarget/deviceRTLs/common/src/data_sharing.cu
  openmp/libomptarget/deviceRTLs/interface.h

Index: openmp/libomptarget/deviceRTLs/interface.h
===
--- openmp/libomptarget/deviceRTLs/interface.h
+++ openmp/libomptarget/deviceRTLs/interface.h
@@ -432,7 +432,7 @@
 EXTERN void __kmpc_data_sharing_init_stack_spmd();
 EXTERN void *__kmpc_data_sharing_coalesced_push_stack(size_t size,
 int16_t UseSharedMemory);
-EXTERN void *__kmpc_data_sharing_push_stack(size_t size, int16_t UseSharedMemory);
+EXTERN void *__kmpc_data_sharing_push_stack(size_t size);
 EXTERN void __kmpc_data_sharing_pop_stack(void *a);
 EXTERN void __kmpc_begin_sharing_variables(void ***GlobalArgs, size_t nArgs);
 EXTERN void __kmpc_end_sharing_variables();
Index: openmp/libomptarget/deviceRTLs/common/src/data_sharing.cu
===
--- openmp/libomptarget/deviceRTLs/common/src/data_sharing.cu
+++ openmp/libomptarget/deviceRTLs/common/src/data_sharing.cu
@@ -144,11 +144,7 @@
 // the list of references to shared variables and to pre-allocate global storage
 // for holding the globalized variables.
 //
-// By default the globalized variables are stored in global memory. If the
-// UseSharedMemory is set to true, the runtime will attempt to use shared memory
-// as long as the size requested fits the pre-allocated size.
-EXTERN void *__kmpc_data_sharing_push_stack(size_t DataSize,
-int16_t UseSharedMemory) {
+EXTERN void *__kmpc_data_sharing_push_stack(size_t DataSize) {
   // Compute the total memory footprint of the requested data.
   // The master thread requires a stack only for itself. A worker
   // thread (which at this point is a warp master) will require
Index: llvm/include/llvm/Frontend/OpenMP/OMPKinds.def
===
--- llvm/include/llvm/Frontend/OpenMP/OMPKinds.def
+++ llvm/include/llvm/Frontend/OpenMP/OMPKinds.def
@@ -543,7 +543,7 @@
 __OMP_RTL(__kmpc_data_sharing_init_stack_spmd, false, Void, )
 
 __OMP_RTL(__kmpc_data_sharing_coalesced_push_stack, false, VoidPtr, SizeTy, Int16)
-__OMP_RTL(__kmpc_data_sharing_push_stack, false, VoidPtr, SizeTy, Int16)
+__OMP_RTL(__kmpc_data_sharing_push_stack, false, VoidPtr, SizeTy)
 __OMP_RTL(__kmpc_data_sharing_pop_stack, false, Void, VoidPtr)
 __OMP_RTL(__kmpc_begin_sharing_variables, false, Void, VoidPtrPtrPtr, SizeTy)
 __OMP_RTL(__kmpc_end_sharing_variables, false, Void, )
Index: clang/test/OpenMP/nvptx_data_sharing.cpp
===
--- clang/test/OpenMP/nvptx_data_sharing.cpp
+++ clang/test/OpenMP/nvptx_data_sharing.cpp
@@ -2,8 +2,7 @@
 ///==///
 
 // RUN: %clang_cc1 -verify -fopenmp -x c++ -triple powerpc64le-unknown-unknown -fopenmp-targets=nvptx64-nvidia-cuda -emit-llvm-bc %s -o %t-ppc-host.bc
-// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple nvptx64-unknown-unknown -fopenmp-targets=nvptx64-nvidia-cuda -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o - -disable-llvm-optzns | FileCheck %s --check-prefix CK1 --check-prefix SEQ
-// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple nvptx64-unknown-unknown -fopenmp-targets=nvptx64-nvidia-cuda -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o - -disable-llvm-optzns -fopenmp-cuda-parallel-target-regions | FileCheck %s --check-prefix CK1 --check-prefix PAR
+// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple nvptx64-unknown-unknown -fopenmp-targets=nvptx64-nvidia-cuda -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o - -disable-llvm-optzns | FileCheck %s --check-prefix CK1
 
 // expected-no-diagnostics
 
@@ -27,11 +26,6 @@
 }
   }
 }
-// SEQ: [[MEM_TY:%.+]] = type { [128 x i8] }
-// SEQ-DAG: [[SHARED_GLOBAL_RD:@.+]] = common addrspace(3) global [[MEM_TY]] zeroinitializer
-// SEQ-DAG: [[KERNEL_PTR:@.+]] = internal addrspace(3) global i8* null
-// SEQ-DAG: [[KERNEL_SIZE:@.+]] = internal unnamed_addr constant i64 8
-// SEQ-DAG: [[KERNEL_SHARED:@.+]] = internal unnamed_addr constant i16 1
 
 /// = In the worker function = ///
 // CK1: {{.*}}define internal void @__omp_offloading{{.*}}test_ds{{.*}}_worke

[PATCH] D90670: Simplifying memory globalization from the front end to move optimizations to the middle end.

2020-12-08 Thread Jose Manuel Monsalve Diaz via Phabricator via cfe-commits
josemonsalve2 added a comment.

I'm working on the other tests right now.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D90670/new/

https://reviews.llvm.org/D90670



[PATCH] D90670: Simplifying memory globalization from the front end to move optimizations to the middle end.

2020-12-22 Thread Jose Manuel Monsalve Diaz via Phabricator via cfe-commits
josemonsalve2 updated this revision to Diff 313352.
josemonsalve2 added a comment.

Modifying 3 more tests to reflect changes in this patch


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D90670/new/

https://reviews.llvm.org/D90670

Files:
  clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
  clang/lib/CodeGen/CGOpenMPRuntimeGPU.h
  clang/test/OpenMP/declare_target_codegen_globalization.cpp
  clang/test/OpenMP/nvptx_data_sharing.cpp
  clang/test/OpenMP/nvptx_distribute_parallel_generic_mode_codegen.cpp
  clang/test/OpenMP/nvptx_parallel_codegen.cpp
  llvm/include/llvm/Frontend/OpenMP/OMPKinds.def
  openmp/libomptarget/deviceRTLs/common/src/data_sharing.cu
  openmp/libomptarget/deviceRTLs/interface.h

Index: openmp/libomptarget/deviceRTLs/interface.h
===
--- openmp/libomptarget/deviceRTLs/interface.h
+++ openmp/libomptarget/deviceRTLs/interface.h
@@ -432,7 +432,7 @@
 EXTERN void __kmpc_data_sharing_init_stack_spmd();
 EXTERN void *__kmpc_data_sharing_coalesced_push_stack(size_t size,
 int16_t UseSharedMemory);
-EXTERN void *__kmpc_data_sharing_push_stack(size_t size, int16_t UseSharedMemory);
+EXTERN void *__kmpc_data_sharing_push_stack(size_t size);
 EXTERN void __kmpc_data_sharing_pop_stack(void *a);
 EXTERN void __kmpc_begin_sharing_variables(void ***GlobalArgs, size_t nArgs);
 EXTERN void __kmpc_end_sharing_variables();
Index: openmp/libomptarget/deviceRTLs/common/src/data_sharing.cu
===
--- openmp/libomptarget/deviceRTLs/common/src/data_sharing.cu
+++ openmp/libomptarget/deviceRTLs/common/src/data_sharing.cu
@@ -144,11 +144,7 @@
 // the list of references to shared variables and to pre-allocate global storage
 // for holding the globalized variables.
 //
-// By default the globalized variables are stored in global memory. If the
-// UseSharedMemory is set to true, the runtime will attempt to use shared memory
-// as long as the size requested fits the pre-allocated size.
-EXTERN void *__kmpc_data_sharing_push_stack(size_t DataSize,
-int16_t UseSharedMemory) {
+EXTERN void *__kmpc_data_sharing_push_stack(size_t DataSize) {
   // Compute the total memory footprint of the requested data.
   // The master thread requires a stack only for itself. A worker
   // thread (which at this point is a warp master) will require
Index: llvm/include/llvm/Frontend/OpenMP/OMPKinds.def
===
--- llvm/include/llvm/Frontend/OpenMP/OMPKinds.def
+++ llvm/include/llvm/Frontend/OpenMP/OMPKinds.def
@@ -543,7 +543,7 @@
 __OMP_RTL(__kmpc_data_sharing_init_stack_spmd, false, Void, )
 
 __OMP_RTL(__kmpc_data_sharing_coalesced_push_stack, false, VoidPtr, SizeTy, Int16)
-__OMP_RTL(__kmpc_data_sharing_push_stack, false, VoidPtr, SizeTy, Int16)
+__OMP_RTL(__kmpc_data_sharing_push_stack, false, VoidPtr, SizeTy)
 __OMP_RTL(__kmpc_data_sharing_pop_stack, false, Void, VoidPtr)
 __OMP_RTL(__kmpc_begin_sharing_variables, false, Void, VoidPtrPtrPtr, SizeTy)
 __OMP_RTL(__kmpc_end_sharing_variables, false, Void, )
Index: clang/test/OpenMP/nvptx_parallel_codegen.cpp
===
--- clang/test/OpenMP/nvptx_parallel_codegen.cpp
+++ clang/test/OpenMP/nvptx_parallel_codegen.cpp
@@ -1,16 +1,16 @@
 // Test target codegen - host bc file has to be created first.
 // RUN: %clang_cc1 -verify -fopenmp -x c++ -triple powerpc64le-unknown-unknown -fopenmp-targets=nvptx64-nvidia-cuda -emit-llvm-bc %s -o %t-ppc-host.bc
-// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple nvptx64-unknown-unknown -fopenmp-targets=nvptx64-nvidia-cuda -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o - -disable-llvm-optzns | FileCheck %s --check-prefix CHECK --check-prefix CHECK-64 --check-prefix SEQ
 // RUN: %clang_cc1 -verify -fopenmp -x c++ -triple nvptx64-unknown-unknown -fopenmp-targets=nvptx64-nvidia-cuda -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o - -disable-llvm-optzns -fopenmp-cuda-parallel-target-regions | FileCheck %s --check-prefix CHECK --check-prefix CHECK-64 --check-prefix PAR
 // RUN: %clang_cc1 -verify -fopenmp -x c++ -triple i386-unknown-unknown -fopenmp-targets=nvptx-nvidia-cuda -emit-llvm-bc %s -o %t-x86-host.bc
-// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple nvptx-unknown-unknown -fopenmp-targets=nvptx-nvidia-cuda -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-x86-host.bc -o - -disable-llvm-optzns | FileCheck %s --check-prefix CHECK --check-prefix CHECK-32 --check-prefix SEQ
-// RUN: %clang_cc1 -verify -fopenmp -fexceptions -fcxx-exceptions -x c++ -triple nvptx-unknown-unknown -fopenmp-targets=nvptx-nvidia-cuda -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-x86-host.bc -o - -disable-llvm-optzns | FileCheck %s --check-prefix CHECK --check-pr

[PATCH] D90670: Simplifying memory globalization from the front end to move optimizations to the middle end.

2020-11-02 Thread Jose Manuel Monsalve Diaz via Phabricator via cfe-commits
josemonsalve2 created this revision.
josemonsalve2 added a reviewer: jdoerfert.
Herald added projects: clang, OpenMP, LLVM.
Herald added subscribers: llvm-commits, openmp-commits, cfe-commits.
josemonsalve2 requested review of this revision.
Herald added a subscriber: sstefan1.

Memory globalization was fully implemented in the front end. There are three
runtime functions in Libomptarget:

- __kmpc_data_sharing_push_stack
- __kmpc_data_sharing_coalesced_push_stack
- __kmpc_data_sharing_pop_stack

The front end performed an escape analysis and created a record declaration
containing all the escaping stack variables. Then, based on the context (isTTD
and other parameters), it would emit a push for the size of the record, or for
that size multiplied by the warp size (to globalize for the whole warp).

This patch removes the record creation and simplifies the front end to emit a
plain runtime call that is optimized later in the middle end. The middle end
can then determine which stack variables escape and which do not, as well as
perform the appropriate merging of the different globalized variables.
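
To make the new front-end output concrete, the emitted code is morally
equivalent to the following C++ sketch. Only the two __kmpc_data_sharing_*
declarations come from the runtime interface in this patch; the struct and
function names are hypothetical.

#include <stddef.h>

// Device runtime entry points, using the simplified post-patch signature.
extern "C" void *__kmpc_data_sharing_push_stack(size_t Size);
extern "C" void __kmpc_data_sharing_pop_stack(void *Ptr);

// Hypothetical local variable that escapes into a parallel region.
struct EscapedLocal { double X; };

void target_region_sketch() {
  // One plain push per globalized variable: no record of all escapees, no
  // UseSharedMemory flag, and no warp-size multiplication in the front end.
  void *Buf = __kmpc_data_sharing_push_stack(sizeof(EscapedLocal));
  EscapedLocal *L = static_cast<EscapedLocal *>(Buf);
  L->X = 0.0;
  // ... the parallel region reads and writes L->X here ...
  __kmpc_data_sharing_pop_stack(Buf);
}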


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D90670

Files:
  clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
  clang/lib/CodeGen/CGOpenMPRuntimeGPU.h
  llvm/include/llvm/Frontend/OpenMP/OMPKinds.def
  openmp/libomptarget/deviceRTLs/common/src/data_sharing.cu
  openmp/libomptarget/deviceRTLs/interface.h

Index: openmp/libomptarget/deviceRTLs/interface.h
===
--- openmp/libomptarget/deviceRTLs/interface.h
+++ openmp/libomptarget/deviceRTLs/interface.h
@@ -432,7 +432,7 @@
 EXTERN void __kmpc_data_sharing_init_stack_spmd();
 EXTERN void *__kmpc_data_sharing_coalesced_push_stack(size_t size,
 int16_t UseSharedMemory);
-EXTERN void *__kmpc_data_sharing_push_stack(size_t size, int16_t UseSharedMemory);
+EXTERN void *__kmpc_data_sharing_push_stack(size_t size);
 EXTERN void __kmpc_data_sharing_pop_stack(void *a);
 EXTERN void __kmpc_begin_sharing_variables(void ***GlobalArgs, size_t nArgs);
 EXTERN void __kmpc_end_sharing_variables();
Index: openmp/libomptarget/deviceRTLs/common/src/data_sharing.cu
===
--- openmp/libomptarget/deviceRTLs/common/src/data_sharing.cu
+++ openmp/libomptarget/deviceRTLs/common/src/data_sharing.cu
@@ -144,11 +144,7 @@
 // the list of references to shared variables and to pre-allocate global storage
 // for holding the globalized variables.
 //
-// By default the globalized variables are stored in global memory. If the
-// UseSharedMemory is set to true, the runtime will attempt to use shared memory
-// as long as the size requested fits the pre-allocated size.
-EXTERN void *__kmpc_data_sharing_push_stack(size_t DataSize,
-int16_t UseSharedMemory) {
+EXTERN void *__kmpc_data_sharing_push_stack(size_t DataSize) {
   // Compute the total memory footprint of the requested data.
   // The master thread requires a stack only for itself. A worker
   // thread (which at this point is a warp master) will require
Index: llvm/include/llvm/Frontend/OpenMP/OMPKinds.def
===
--- llvm/include/llvm/Frontend/OpenMP/OMPKinds.def
+++ llvm/include/llvm/Frontend/OpenMP/OMPKinds.def
@@ -543,7 +543,7 @@
 __OMP_RTL(__kmpc_data_sharing_init_stack_spmd, false, Void, )
 
 __OMP_RTL(__kmpc_data_sharing_coalesced_push_stack, false, VoidPtr, SizeTy, Int16)
-__OMP_RTL(__kmpc_data_sharing_push_stack, false, VoidPtr, SizeTy, Int16)
+__OMP_RTL(__kmpc_data_sharing_push_stack, false, VoidPtr, SizeTy)
 __OMP_RTL(__kmpc_data_sharing_pop_stack, false, Void, VoidPtr)
 __OMP_RTL(__kmpc_begin_sharing_variables, false, Void, VoidPtrPtrPtr, SizeTy)
 __OMP_RTL(__kmpc_end_sharing_variables, false, Void, )
Index: clang/lib/CodeGen/CGOpenMPRuntimeGPU.h
===
--- clang/lib/CodeGen/CGOpenMPRuntimeGPU.h
+++ clang/lib/CodeGen/CGOpenMPRuntimeGPU.h
@@ -439,16 +439,14 @@
 
   /// The data for the single globalized variable.
   struct MappedVarData {
-/// Corresponding field in the global record.
-const FieldDecl *FD = nullptr;
 /// Corresponding address.
 Address PrivateAddr = Address::invalid();
+llvm::Value *globalizedVal;
 /// true, if only one element is required (for latprivates in SPMD mode),
 /// false, if need to create based on the warp-size.
 bool IsOnePerTeam = false;
 MappedVarData() = delete;
-MappedVarData(const FieldDecl *FD, bool IsOnePerTeam = false)
-: FD(FD), IsOnePerTeam(IsOnePerTeam) {}
+MappedVarData(bool IsOnePerTeam = false) : IsOnePerTeam(IsOnePerTeam) {}
   };
   /// The map of local variables to their addresses in the global memory.
   using DeclToAddrMapTy = llvm::MapVector;
@@ -456,13 +454,9 @@
   using EscapedParamsTy = llvm::SmallPtrSet;
   struct FunctionData {

[PATCH] D102107: [OpenMP] Codegen aggregate for outlined function captures

2021-07-09 Thread Jose Manuel Monsalve Diaz via Phabricator via cfe-commits
josemonsalve2 added a comment.

In D102107#2867417 , @ABataev wrote:

> In D102107#2867382 , @jdoerfert 
> wrote:
>
>> In D102107#2832740 , @ABataev 
>> wrote:
>>
>>> In D102107#2832286 , @jdoerfert 
>>> wrote:
>>>
 In D102107#2824581 , @ABataev 
 wrote:

> In D102107#2823706 , @jdoerfert 
> wrote:
>
>> In D102107#2821976 , @ABataev 
>> wrote:
>>
>>> We used this kind of codegen initially but later found out that it 
>>> causes a large overhead when gathering pointers into a record. What 
>>> about hybrid scheme where the first args are passed as arguments and 
>>> others (if any) are gathered into a record?
>>
>> I'm confused, maybe I misunderstand the problem. The parallel function 
>> arguments need to go from the main thread to the workers somehow, I 
>> don't see how this is done w/o a record. This patch makes it explicit 
>> though.
>
> Pass it in a record for workers only? And use a hybrid scheme for all 
> other parallel regions.

 I still do not follow. What does it mean for workers only? What is a 
 hybrid scheme? And, probably most importantly, how would we not eventually 
 put everything into a record anyway?
>>>
>>> On the host you don’t need to put everything into a record, especially for 
>>> small parallel regions. Pass some first args in registers and only the 
>>> remaining args gather into the record. For workers just pass all args in 
>>> the record.
>>
>> Could you please respond to my question so we make progress here. We 
>> *always* have to pass things in a record, do you agree?
>
> On the GPU device, yes. And I'm absolutely fine with packing args for the GPU 
> device. But the patch packs the args not only for the GPU devices but also 
> for the host and other devices which may not require packing/unpacking. For 
> such devices/host better to avoid packing/unpacking as it introduces overhead 
> in many cases.

Hi Alexey,

Wouldn't you always need to pack the arguments in order to pass them to the
outlined function? What is the benefit of avoiding the packing in the runtime
call if you then have to pack the arguments for the outlined function anyway?

I would really appreciate an example, since I am still getting up to speed on
OpenMP in LLVM.
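
In the meantime, here is how I currently picture the two schemes, as a rough
C++ sketch (all names are made up; please correct me if I am misreading the
proposal):

#include <cstdint>

// Scheme in this patch: every capture goes into one aggregate, and the
// outlined function receives a single pointer.
struct Captures {
  int32_t N;
  double *Data;
};

void outlined_packed(Captures *C) {
  // ... body uses C->N and C->Data ...
  (void)C;
}

// Hybrid scheme, as I understand it: the first few captures travel as plain
// arguments (registers on the host), and only the overflow, if any, is
// gathered into a record.
void outlined_hybrid(int32_t N, double *Data, void *RestOfCaptures) {
  // ... body uses N and Data directly; RestOfCaptures only when needed ...
  (void)N; (void)Data; (void)RestOfCaptures;
}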

Thanks!

>> If we pack the things eventually to pass it to the workers, why would we not 
>> pack it right away and avoid complexity? Passing varargs, then packing them 
>> later (with the same thread) into a record to give it to the workers is 
>> arguably introducing cost. What is the benefit of a hybrid approach given 
>> that it is (theoretically) more costly and arguably more complex?




Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D102107/new/

https://reviews.llvm.org/D102107

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D136103: OpenMP asynchronous memory copy support

2022-10-17 Thread Jose Manuel Monsalve Diaz via Phabricator via cfe-commits
josemonsalve2 added a comment.

Thanks for implementing this. I have added an inline comment.




Comment at: openmp/libomptarget/src/private.h:113
+
+typedef struct kmp_tasking_flags { /* Total struct must be exactly 32 bits */
+  /* Compiler flags */ /* Total compiler flags must be 16 bits */

This would be the third place where this struct is defined: interop.h, kmp.h,
and now this file. Would it make sense to move it into a single common header
instead?
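
Something like the sketch below, for illustration only: the file name and
include guard are made up, and the bit-fields shown are placeholders; the
authoritative layout would be copied verbatim from kmp.h.

// OMPTaskingFlags.h -- hypothetical common home for the struct, to be
// included from interop.h, private.h, and eventually kmp.h itself.
#ifndef OMPTARGET_OMP_TASKING_FLAGS_H
#define OMPTARGET_OMP_TASKING_FLAGS_H

typedef struct kmp_tasking_flags { /* Total struct must be exactly 32 bits */
  /* Compiler flags */ /* Total compiler flags must be 16 bits */
  unsigned tiedness : 1; // placeholder bit-fields only; the real
  unsigned final : 1;    // definition in kmp.h has the full layout
  unsigned reserved : 30;
} kmp_tasking_flags_t;

#endif // OMPTARGET_OMP_TASKING_FLAGS_H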


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D136103/new/

https://reviews.llvm.org/D136103

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D127065: [docs] Update supported language standards list for C++

2022-06-09 Thread Jose Manuel Monsalve Diaz via Phabricator via cfe-commits
josemonsalve2 added a comment.

The changes look fine to me. I am not familiar with the current state of C++20
and C++23 support, but I trust your judgement here.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D127065/new/

https://reviews.llvm.org/D127065

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D143527: [OpenMP][5.1] Fix parallel masked is ignored #59939

2023-04-03 Thread Jose Manuel Monsalve Diaz via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes.
Closed by commit rG64549f0903e2: [OpenMP][5.1] Fix parallel masked is ignored 
#59939 (authored by randreshg, committed by josemonsalve2).
Herald added projects: clang, OpenMP.
Herald added subscribers: openmp-commits, cfe-commits.
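
For context, a minimal host-side example of the OpenMP 5.1 construct this
patch wires up on the device (semantics per the specification, illustrative
only; requires a compiler with OpenMP 5.1 support):

#include <cstdio>
#include <omp.h>

int main() {
  const int tid = 1;
  // Only the thread whose thread number matches the filter expression runs
  // the block; without a filter clause, the primary thread (0) runs it.
#pragma omp parallel masked filter(tid)
  {
    std::printf("executed by thread %d\n", omp_get_thread_num());
  }
  return 0;
}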

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D143527/new/

https://reviews.llvm.org/D143527

Files:
  clang/lib/CodeGen/CGStmt.cpp
  clang/lib/CodeGen/CGStmtOpenMP.cpp
  clang/lib/CodeGen/CodeGenFunction.h
  clang/lib/Parse/ParseOpenMP.cpp
  clang/test/OpenMP/parallel_masked.cpp
  clang/test/OpenMP/parallel_masked_target.cpp
  openmp/libomptarget/DeviceRTL/include/Interface.h
  openmp/libomptarget/DeviceRTL/src/Synchronization.cpp

Index: openmp/libomptarget/DeviceRTL/src/Synchronization.cpp
===
--- openmp/libomptarget/DeviceRTL/src/Synchronization.cpp
+++ openmp/libomptarget/DeviceRTL/src/Synchronization.cpp
@@ -520,6 +520,13 @@
 
 void __kmpc_end_master(IdentTy *Loc, int32_t TId) { FunctionTracingRAII(); }
 
+int32_t __kmpc_masked(IdentTy *Loc, int32_t TId, int32_t Filter) {
+  FunctionTracingRAII();
+  return omp_get_thread_num() == Filter;
+}
+
+void __kmpc_end_masked(IdentTy *Loc, int32_t TId) { FunctionTracingRAII(); }
+
 int32_t __kmpc_single(IdentTy *Loc, int32_t TId) {
   FunctionTracingRAII();
   return __kmpc_master(Loc, TId);
Index: openmp/libomptarget/DeviceRTL/include/Interface.h
===
--- openmp/libomptarget/DeviceRTL/include/Interface.h
+++ openmp/libomptarget/DeviceRTL/include/Interface.h
@@ -260,6 +260,10 @@
 
 void __kmpc_end_master(IdentTy *Loc, int32_t TId);
 
+int32_t __kmpc_masked(IdentTy *Loc, int32_t TId, int32_t Filter);
+
+void __kmpc_end_masked(IdentTy *Loc, int32_t TId);
+
 int32_t __kmpc_single(IdentTy *Loc, int32_t TId);
 
 void __kmpc_end_single(IdentTy *Loc, int32_t TId);
Index: clang/test/OpenMP/parallel_masked_target.cpp
===
--- /dev/null
+++ clang/test/OpenMP/parallel_masked_target.cpp
@@ -0,0 +1,112 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --function-signature --include-generated-funcs --prefix-filecheck-ir-name _
+// RUN: %clang_cc1 -triple x86_64-unknown-unknown -fopenmp -fopenmp-version=52 -fopenmp-targets=nvptx64 -offload-device-only -x c -emit-llvm %s -o - | FileCheck %s
+// expected-no-diagnostics
+
+void foo();
+
+void masked() {
+#pragma target
+#pragma omp parallel masked
+{
+foo();
+}
+}
+
+void maskedFilter() {
+const int tid = 1;
+#pragma target
+#pragma omp parallel masked filter(tid)
+{
+foo();
+}
+}
+
+void master() {
+#pragma target
+#pragma omp parallel master
+{
+foo();
+}
+}
+// CHECK-LABEL: define {{[^@]+}}@masked
+// CHECK-SAME: () #[[ATTR0:[0-9]+]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:call void (ptr, i32, ptr, ...) @__kmpc_fork_call(ptr @[[GLOB1:[0-9]+]], i32 0, ptr @.omp_outlined.)
+// CHECK-NEXT:ret void
+//
+//
+// CHECK-LABEL: define {{[^@]+}}@.omp_outlined.
+// CHECK-SAME: (ptr noalias noundef [[DOTGLOBAL_TID_:%.*]], ptr noalias noundef [[DOTBOUND_TID_:%.*]]) #[[ATTR1:[0-9]+]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:[[DOTGLOBAL_TID__ADDR:%.*]] = alloca ptr, align 8
+// CHECK-NEXT:[[DOTBOUND_TID__ADDR:%.*]] = alloca ptr, align 8
+// CHECK-NEXT:store ptr [[DOTGLOBAL_TID_]], ptr [[DOTGLOBAL_TID__ADDR]], align 8
+// CHECK-NEXT:store ptr [[DOTBOUND_TID_]], ptr [[DOTBOUND_TID__ADDR]], align 8
+// CHECK-NEXT:[[TMP0:%.*]] = load ptr, ptr [[DOTGLOBAL_TID__ADDR]], align 8
+// CHECK-NEXT:[[TMP1:%.*]] = load i32, ptr [[TMP0]], align 4
+// CHECK-NEXT:[[TMP2:%.*]] = call i32 @__kmpc_masked(ptr @[[GLOB1]], i32 [[TMP1]], i32 0)
+// CHECK-NEXT:[[TMP3:%.*]] = icmp ne i32 [[TMP2]], 0
+// CHECK-NEXT:br i1 [[TMP3]], label [[OMP_IF_THEN:%.*]], label [[OMP_IF_END:%.*]]
+// CHECK:   omp_if.then:
+// CHECK-NEXT:call void (...) @foo()
+// CHECK-NEXT:call void @__kmpc_end_masked(ptr @[[GLOB1]], i32 [[TMP1]])
+// CHECK-NEXT:br label [[OMP_IF_END]]
+// CHECK:   omp_if.end:
+// CHECK-NEXT:ret void
+//
+//
+// CHECK-LABEL: define {{[^@]+}}@maskedFilter
+// CHECK-SAME: () #[[ATTR0]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:[[TID:%.*]] = alloca i32, align 4
+// CHECK-NEXT:store i32 1, ptr [[TID]], align 4
+// CHECK-NEXT:call void (ptr, i32, ptr, ...) @__kmpc_fork_call(ptr @[[GLOB1]], i32 0, ptr @.omp_outlined..1)
+// CHECK-NEXT:ret void
+//
+//
+// CHECK-LABEL: define {{[^@]+}}@.omp_outlined..1
+// CHECK-SAME: (ptr noalias noundef [[DOTGLOBAL_TID_:%.*]], ptr noalias noundef [[DOTBOUND_TID_:%.*]]) #[[ATTR1]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:[[DOTGLOBAL_TID__ADDR:%.*]] = alloca ptr, align 8
+// CHECK-NEXT:[[DOTBOUND_

[PATCH] D97680: [OpenMP] Simplify GPU memory globalization

2021-03-19 Thread Jose Manuel Monsalve Diaz via Phabricator via cfe-commits
josemonsalve2 added inline comments.



Comment at: clang/test/OpenMP/nvptx_parallel_codegen.cpp:4
-// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple nvptx64-unknown-unknown 
-fopenmp-targets=nvptx64-nvidia-cuda -emit-llvm %s -fopenmp-is-device 
-fopenmp-host-ir-file-path %t-ppc-host.bc -o - -disable-llvm-optzns | FileCheck 
%s --check-prefix CHECK --check-prefix CHECK-64 --check-prefix SEQ
-// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple nvptx64-unknown-unknown 
-fopenmp-targets=nvptx64-nvidia-cuda -emit-llvm %s -fopenmp-is-device 
-fopenmp-host-ir-file-path %t-ppc-host.bc -o - -disable-llvm-optzns 
-fopenmp-cuda-parallel-target-regions | FileCheck %s --check-prefix CHECK 
--check-prefix CHECK-64 --check-prefix PAR
 // RUN: %clang_cc1 -verify -fopenmp -x c++ -triple i386-unknown-unknown 
-fopenmp-targets=nvptx-nvidia-cuda -emit-llvm-bc %s -o %t-x86-host.bc

Is the flag `-fopenmp-cuda-parallel-target-regions` still useful after this
change? I know it used to drive a decision in globalization, and I believe
that use is now gone. Is it still consulted anywhere else, or could it be
removed as well?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D97680/new/

https://reviews.llvm.org/D97680

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits