r351642 - [NFC] Generalize expected output for callback test

2019-01-19 Thread Johannes Doerfert via cfe-commits
Author: jdoerfert
Date: Sat Jan 19 01:40:08 2019
New Revision: 351642

URL: http://llvm.org/viewvc/llvm-project?rev=351642&view=rev
Log:
[NFC] Generalize expected output for callback test

Modified:
cfe/trunk/test/CodeGen/callback_pthread_create.c

Modified: cfe/trunk/test/CodeGen/callback_pthread_create.c
URL: 
http://llvm.org/viewvc/llvm-project/cfe/trunk/test/CodeGen/callback_pthread_create.c?rev=351642&r1=351641&r2=351642&view=diff
==
--- cfe/trunk/test/CodeGen/callback_pthread_create.c (original)
+++ cfe/trunk/test/CodeGen/callback_pthread_create.c Sat Jan 19 01:40:08 2019
@@ -1,7 +1,7 @@
 // RUN: %clang -O1 %s -S -c -emit-llvm -o - | FileCheck %s
 // RUN: %clang -O1 %s -S -c -emit-llvm -o - | opt -ipconstprop -S | FileCheck 
--check-prefix=IPCP %s
 
-// CHECK: declare !callback ![[cid:[0-9]+]] dso_local i32 @pthread_create
+// CHECK: declare !callback ![[cid:[0-9]+]] {{.*}}i32 @pthread_create
 // CHECK: ![[cid]] = !{![[cidb:[0-9]+]]}
 // CHECK: ![[cidb]] = !{i64 2, i64 3, i1 false}
 


___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


r351643 - [FIX] Restrict callback pthreads_create test to linux only

2019-01-19 Thread Johannes Doerfert via cfe-commits
Author: jdoerfert
Date: Sat Jan 19 01:40:10 2019
New Revision: 351643

URL: http://llvm.org/viewvc/llvm-project?rev=351643&view=rev
Log:
[FIX] Restrict callback pthreads_create test to linux only

Modified:
cfe/trunk/test/CodeGen/callback_pthread_create.c

Modified: cfe/trunk/test/CodeGen/callback_pthread_create.c
URL: 
http://llvm.org/viewvc/llvm-project/cfe/trunk/test/CodeGen/callback_pthread_create.c?rev=351643&r1=351642&r2=351643&view=diff
==
--- cfe/trunk/test/CodeGen/callback_pthread_create.c (original)
+++ cfe/trunk/test/CodeGen/callback_pthread_create.c Sat Jan 19 01:40:10 2019
@@ -1,6 +1,9 @@
 // RUN: %clang -O1 %s -S -c -emit-llvm -o - | FileCheck %s
 // RUN: %clang -O1 %s -S -c -emit-llvm -o - | opt -ipconstprop -S | FileCheck 
--check-prefix=IPCP %s
 
+// This is a linux only test for now due to the include.
+// UNSUPPORTED: !linux
+
 // CHECK: declare !callback ![[cid:[0-9]+]] {{.*}}i32 @pthread_create
 // CHECK: ![[cid]] = !{![[cidb:[0-9]+]]}
 // CHECK: ![[cidb]] = !{i64 2, i64 3, i1 false}


___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


r351629 - Emit !callback metadata and introduce the callback attribute

2019-01-19 Thread Johannes Doerfert via cfe-commits
Author: jdoerfert
Date: Fri Jan 18 21:36:54 2019
New Revision: 351629

URL: http://llvm.org/viewvc/llvm-project?rev=351629&view=rev
Log:
Emit !callback metadata and introduce the callback attribute

  With commit r351627, LLVM gained the ability to apply (existing) IPO
  optimizations on indirections through callbacks, or transitive calls.
  The general idea is that we use an abstraction to hide the middle man
  and represent the callback call in the context of the initial caller.
  It is described in more detail in the commit message of the LLVM patch
  r351627, the llvm::AbstractCallSite class description, and the
  language reference section on callback-metadata.

  This commit enables clang to emit !callback metadata that is
  understood by LLVM. It does so in three different cases:
1) For known broker functions declarations that are directly
   generated, e.g., __kmpc_fork_call for the OpenMP pragma parallel.
2) For known broker functions that are identified by their name and
   source location through the builtin detection, e.g.,
   pthread_create from the POSIX thread API.
3) For user annotated functions that carry the "callback(callee, ...)"
   attribute. The attribute has to include the name, or index, of
   the callback callee and how the passed arguments can be
   identified (as many as the callback callee has). See the callback
   attribute documentation for detailed information.

Differential Revision: https://reviews.llvm.org/D55483

Added:
cfe/trunk/test/CodeGen/attr-callback.c
cfe/trunk/test/CodeGen/callback_annotated.c
cfe/trunk/test/CodeGen/callback_openmp.c
cfe/trunk/test/CodeGen/callback_pthread_create.c
cfe/trunk/test/CodeGenCXX/attr-callback.cpp
cfe/trunk/test/Sema/attr-callback-broken.c
cfe/trunk/test/Sema/attr-callback.c
cfe/trunk/test/SemaCXX/attr-callback-broken.cpp
cfe/trunk/test/SemaCXX/attr-callback.cpp
Modified:
cfe/trunk/include/clang/AST/ASTContext.h
cfe/trunk/include/clang/Basic/Attr.td
cfe/trunk/include/clang/Basic/AttrDocs.td
cfe/trunk/include/clang/Basic/Builtins.def
cfe/trunk/include/clang/Basic/Builtins.h
cfe/trunk/include/clang/Basic/DiagnosticSemaKinds.td
cfe/trunk/lib/AST/ASTContext.cpp
cfe/trunk/lib/Basic/Builtins.cpp
cfe/trunk/lib/CodeGen/CGOpenMPRuntime.cpp
cfe/trunk/lib/CodeGen/CodeGenModule.cpp
cfe/trunk/lib/Parse/ParseDecl.cpp
cfe/trunk/lib/Sema/SemaDecl.cpp
cfe/trunk/lib/Sema/SemaDeclAttr.cpp
cfe/trunk/test/Analysis/retain-release.m
cfe/trunk/test/Misc/pragma-attribute-supported-attributes-list.test
cfe/trunk/test/OpenMP/parallel_codegen.cpp
cfe/trunk/utils/TableGen/ClangAttrEmitter.cpp

Modified: cfe/trunk/include/clang/AST/ASTContext.h
URL: 
http://llvm.org/viewvc/llvm-project/cfe/trunk/include/clang/AST/ASTContext.h?rev=351629&r1=351628&r2=351629&view=diff
==
--- cfe/trunk/include/clang/AST/ASTContext.h (original)
+++ cfe/trunk/include/clang/AST/ASTContext.h Fri Jan 18 21:36:54 2019
@@ -2003,6 +2003,9 @@ public:
 /// No error
 GE_None,
 
+/// Missing a type
+GE_Missing_type,
+
 /// Missing a type from 
 GE_Missing_stdio,
 

Modified: cfe/trunk/include/clang/Basic/Attr.td
URL: 
http://llvm.org/viewvc/llvm-project/cfe/trunk/include/clang/Basic/Attr.td?rev=351629&r1=351628&r2=351629&view=diff
==
--- cfe/trunk/include/clang/Basic/Attr.td (original)
+++ cfe/trunk/include/clang/Basic/Attr.td Fri Jan 18 21:36:54 2019
@@ -190,6 +190,9 @@ class VariadicIdentifierArgument : Argument;
 
+// A list of identifiers matching parameters or ParamIdx indices.
+class VariadicParamOrParamIdxArgument : Argument;
+
 // Like VariadicParamIdxArgument but for a single function parameter index.
 class ParamIdxArgument : Argument;
 
@@ -1210,6 +1213,13 @@ def FormatArg : InheritableAttr {
   let Documentation = [Undocumented];
 }
 
+def Callback : InheritableAttr {
+  let Spellings = [Clang<"callback">];
+  let Args = [VariadicParamOrParamIdxArgument<"Encoding">];
+  let Subjects = SubjectList<[Function]>;
+  let Documentation = [CallbackDocs];
+}
+
 def GNUInline : InheritableAttr {
   let Spellings = [GCC<"gnu_inline">];
   let Subjects = SubjectList<[Function]>;

Modified: cfe/trunk/include/clang/Basic/AttrDocs.td
URL: 
http://llvm.org/viewvc/llvm-project/cfe/trunk/include/clang/Basic/AttrDocs.td?rev=351629&r1=351628&r2=351629&view=diff
==
--- cfe/trunk/include/clang/Basic/AttrDocs.td (original)
+++ cfe/trunk/include/clang/Basic/AttrDocs.td Fri Jan 18 21:36:54 2019
@@ -3781,6 +3781,55 @@ it rather documents the programmer's int
   }];
 }
 
+def CallbackDocs : Documentation {
+  let Category = DocCatVariable;
+  let Content = [{
+The ``callback`` attribute specifies that the annotated functio

r351665 - [FIX] Generalize the expected results for callback clang tests

2019-01-19 Thread Johannes Doerfert via cfe-commits
Author: jdoerfert
Date: Sat Jan 19 12:46:10 2019
New Revision: 351665

URL: http://llvm.org/viewvc/llvm-project?rev=351665&view=rev
Log:
[FIX] Generalize the expected results for callback clang tests

Modified:
cfe/trunk/test/CodeGen/callback_annotated.c
cfe/trunk/test/CodeGen/callback_pthread_create.c

Modified: cfe/trunk/test/CodeGen/callback_annotated.c
URL: 
http://llvm.org/viewvc/llvm-project/cfe/trunk/test/CodeGen/callback_annotated.c?rev=351665&r1=351664&r2=351665&view=diff
==
--- cfe/trunk/test/CodeGen/callback_annotated.c (original)
+++ cfe/trunk/test/CodeGen/callback_annotated.c Sat Jan 19 12:46:10 2019
@@ -30,22 +30,20 @@ __attribute__((callback(4, d, 5, 2))) vo
 
 static void *VoidPtr2VoidPtr(void *payload) {
   // RUN2: ret i8* %payload
-  // IPCP:  ret i8* null
+  // IPCP: ret i8* null
   return payload;
 }
 
 static int ThreeInt2Int(int a, int b, int c) {
-  // RUN2:  define internal i32 @ThreeInt2Int(i32 %a, i32 %b, i32 %c)
-  // RUN2-NEXT: entry:
-  // RUN2-NEXT: %mul = mul nsw i32 %b, %a
-  // RUN2-NEXT: %add = add nsw i32 %mul, %c
-  // RUN2-NEXT: ret i32 %add
+  // RUN2:   define internal i32 @ThreeInt2Int(i32 %a, i32 %b, i32 %c)
+  // RUN2: %mul = mul nsw i32 %b, %a
+  // RUN2: %add = add nsw i32 %mul, %c
+  // RUN2: ret i32 %add
 
-  // IPCP:   define internal i32 @ThreeInt2Int(i32 %a, i32 %b, i32 %c)
-  // IPCP-NEXT:  entry:
-  // IPCP-NEXT:  %mul = mul nsw i32 4, %a
-  // IPCP-NEXT:  %add = add nsw i32 %mul, %c
-  // IPCP-NEXT:  ret i32 %add
+  // IPCP:   define internal i32 @ThreeInt2Int(i32 %a, i32 %b, i32 %c)
+  // IPCP: %mul = mul nsw i32 4, %a
+  // IPCP: %add = add nsw i32 %mul, %c
+  // IPCP: ret i32 %add
 
   return a * b + c;
 }

Modified: cfe/trunk/test/CodeGen/callback_pthread_create.c
URL: 
http://llvm.org/viewvc/llvm-project/cfe/trunk/test/CodeGen/callback_pthread_create.c?rev=351665&r1=351664&r2=351665&view=diff
==
--- cfe/trunk/test/CodeGen/callback_pthread_create.c (original)
+++ cfe/trunk/test/CodeGen/callback_pthread_create.c Sat Jan 19 12:46:10 2019
@@ -14,15 +14,13 @@ const int GlobalVar = 0;
 
 static void *callee0(void *payload) {
 // IPCP:  define internal i8* @callee0
-// IPCP-NEXT:   entry:
-// IPCP-NEXT: ret i8* null
+// IPCP:ret i8* null
   return payload;
 }
 
 static void *callee1(void *payload) {
 // IPCP:  define internal i8* @callee1
-// IPCP-NEXT:   entry:
-// IPCP-NEXT: ret i8* bitcast (i32* @GlobalVar to i8*)
+// IPCP:ret i8* bitcast (i32* @GlobalVar to i8*)
   return payload;
 }
 


___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


r351744 - [NFC] Fix comparison warning issues by MSVC

2019-01-21 Thread Johannes Doerfert via cfe-commits
Author: jdoerfert
Date: Mon Jan 21 06:23:46 2019
New Revision: 351744

URL: http://llvm.org/viewvc/llvm-project?rev=351744&view=rev
Log:
[NFC] Fix comparison warning issues by MSVC

Modified:
cfe/trunk/lib/Sema/SemaDeclAttr.cpp

Modified: cfe/trunk/lib/Sema/SemaDeclAttr.cpp
URL: 
http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/Sema/SemaDeclAttr.cpp?rev=351744&r1=351743&r2=351744&view=diff
==
--- cfe/trunk/lib/Sema/SemaDeclAttr.cpp (original)
+++ cfe/trunk/lib/Sema/SemaDeclAttr.cpp Mon Jan 21 06:23:46 2019
@@ -3560,7 +3560,9 @@ static void handleCallbackAttr(Sema &S,
 
   int CalleeIdx = EncodingIndices.front();
   // Check if the callee index is proper, thus not "this" and not "unknown".
-  if (CalleeIdx < HasImplicitThisParam) {
+  // This means the "CalleeIdx" has to be non-negative if 
"HasImplicitThisParam"
+  // is false and positive if "HasImplicitThisParam" is true.
+  if (CalleeIdx < (int)HasImplicitThisParam) {
 S.Diag(AL.getLoc(), diag::err_callback_attribute_invalid_callee)
 << AL.getRange();
 return;


___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


r352299 - [FIX] Adjust CXX microsoft abi dynamic cast test to r352293

2019-01-26 Thread Johannes Doerfert via cfe-commits
Author: jdoerfert
Date: Sat Jan 26 16:22:10 2019
New Revision: 352299

URL: http://llvm.org/viewvc/llvm-project?rev=352299&view=rev
Log:
[FIX] Adjust CXX microsoft abi dynamic cast test to r352293

Modified:
cfe/trunk/test/CodeGenCXX/microsoft-abi-dynamic-cast.cpp
cfe/trunk/test/CodeGenCXX/microsoft-abi-typeid.cpp

Modified: cfe/trunk/test/CodeGenCXX/microsoft-abi-dynamic-cast.cpp
URL: 
http://llvm.org/viewvc/llvm-project/cfe/trunk/test/CodeGenCXX/microsoft-abi-dynamic-cast.cpp?rev=352299&r1=352298&r2=352299&view=diff
==
--- cfe/trunk/test/CodeGenCXX/microsoft-abi-dynamic-cast.cpp (original)
+++ cfe/trunk/test/CodeGenCXX/microsoft-abi-dynamic-cast.cpp Sat Jan 26 
16:22:10 2019
@@ -60,7 +60,7 @@ T* test5(A* x) { return dynamic_cast
 // CHECK-NEXT:   [[VBOFFP:%.*]] = getelementptr inbounds i32, i32* [[VBTBL]], 
i32 1
 // CHECK-NEXT:   [[VBOFFS:%.*]] = load i32, i32* [[VBOFFP]], align 4
 // CHECK-NEXT:   [[ADJ:%.*]] = getelementptr inbounds i8, i8* [[VOIDP]], i32 
[[VBOFFS]]
-// CHECK-NEXT:   [[CALL:%.*]] = tail call i8* @__RTDynamicCast(i8* [[ADJ]], 
i32 [[VBOFFS]], i8* {{.*}}bitcast (%rtti.TypeDescriptor7* @"??_R0?AUA@@@8" to 
i8*), i8* {{.*}}bitcast (%rtti.TypeDescriptor7* @"??_R0?AUT@@@8" to i8*), i32 0)
+// CHECK-NEXT:   [[CALL:%.*]] = tail call i8* @__RTDynamicCast(i8* nonnull 
[[ADJ]], i32 [[VBOFFS]], i8* {{.*}}bitcast (%rtti.TypeDescriptor7* 
@"??_R0?AUA@@@8" to i8*), i8* {{.*}}bitcast (%rtti.TypeDescriptor7* 
@"??_R0?AUT@@@8" to i8*), i32 0)
 // CHECK-NEXT:   [[RES:%.*]] = bitcast i8* [[CALL]] to %struct.T*
 // CHECK-NEXT:   br label
 // CHECK:[[RET:%.*]] = phi %struct.T*
@@ -100,7 +100,7 @@ void* test8(A* x) { return dynamic_cast<
 // CHECK-NEXT:   [[VBOFFP:%.*]] = getelementptr inbounds i32, i32* [[VBTBL]], 
i32 1
 // CHECK-NEXT:   [[VBOFFS:%.*]] = load i32, i32* [[VBOFFP]], align 4
 // CHECK-NEXT:   [[ADJ:%.*]] = getelementptr inbounds i8, i8* [[VOIDP]], i32 
[[VBOFFS]]
-// CHECK-NEXT:   [[RES:%.*]] = tail call i8* @__RTCastToVoid(i8* [[ADJ]])
+// CHECK-NEXT:   [[RES:%.*]] = tail call i8* @__RTCastToVoid(i8* nonnull 
[[ADJ]])
 // CHECK-NEXT:   br label
 // CHECK:[[RET:%.*]] = phi i8*
 // CHECK-NEXT:   ret i8* [[RET]]

Modified: cfe/trunk/test/CodeGenCXX/microsoft-abi-typeid.cpp
URL: 
http://llvm.org/viewvc/llvm-project/cfe/trunk/test/CodeGenCXX/microsoft-abi-typeid.cpp?rev=352299&r1=352298&r2=352299&view=diff
==
--- cfe/trunk/test/CodeGenCXX/microsoft-abi-typeid.cpp (original)
+++ cfe/trunk/test/CodeGenCXX/microsoft-abi-typeid.cpp Sat Jan 26 16:22:10 2019
@@ -36,7 +36,7 @@ const std::type_info* test3_typeid() { r
 // CHECK-NEXT:   [[VBSLOT:%.*]] = getelementptr inbounds i32, i32* [[VBTBL]], 
i32 1
 // CHECK-NEXT:   [[VBASE_OFFS:%.*]] = load i32, i32* [[VBSLOT]], align 4
 // CHECK-NEXT:   [[ADJ:%.*]] = getelementptr inbounds i8, i8* [[THIS]], i32 
[[VBASE_OFFS]]
-// CHECK-NEXT:   [[RT:%.*]] = tail call i8* @__RTtypeid(i8* [[ADJ]])
+// CHECK-NEXT:   [[RT:%.*]] = tail call i8* @__RTtypeid(i8* nonnull [[ADJ]])
 // CHECK-NEXT:   [[RET:%.*]] = bitcast i8* [[RT]] to %struct.type_info*
 // CHECK-NEXT:   ret %struct.type_info* [[RET]]
 


___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


r353088 - Generalize pthread callback test case

2019-02-04 Thread Johannes Doerfert via cfe-commits
Author: jdoerfert
Date: Mon Feb  4 12:42:38 2019
New Revision: 353088

URL: http://llvm.org/viewvc/llvm-project?rev=353088&view=rev
Log:
Generalize pthread callback test case

Changes suggested by Eli Friedman 

Modified:
cfe/trunk/test/CodeGen/callback_pthread_create.c

Modified: cfe/trunk/test/CodeGen/callback_pthread_create.c
URL: 
http://llvm.org/viewvc/llvm-project/cfe/trunk/test/CodeGen/callback_pthread_create.c?rev=353088&r1=353087&r2=353088&view=diff
==
--- cfe/trunk/test/CodeGen/callback_pthread_create.c (original)
+++ cfe/trunk/test/CodeGen/callback_pthread_create.c Mon Feb  4 12:42:38 2019
@@ -1,14 +1,22 @@
-// RUN: %clang -O1 %s -S -c -emit-llvm -o - | FileCheck %s
-// RUN: %clang -O1 %s -S -c -emit-llvm -o - | opt -ipconstprop -S | FileCheck 
--check-prefix=IPCP %s
-
-// This is a linux only test for now due to the include.
-// UNSUPPORTED: !linux
+// RUN: %clang_cc1 -O1 %s -S -emit-llvm -o - | FileCheck %s
+// RUN: %clang_cc1 -O1 %s -S -emit-llvm -o - | opt -ipconstprop -S | FileCheck 
--check-prefix=IPCP %s
 
 // CHECK: declare !callback ![[cid:[0-9]+]] {{.*}}i32 @pthread_create
 // CHECK: ![[cid]] = !{![[cidb:[0-9]+]]}
 // CHECK: ![[cidb]] = !{i64 2, i64 3, i1 false}
 
-#include 
+// Taken from test/Analysis/retain-release.m
+//{
+struct _opaque_pthread_t {};
+struct _opaque_pthread_attr_t {};
+typedef struct _opaque_pthread_t *__darwin_pthread_t;
+typedef struct _opaque_pthread_attr_t __darwin_pthread_attr_t;
+typedef __darwin_pthread_t pthread_t;
+typedef __darwin_pthread_attr_t pthread_attr_t;
+
+int pthread_create(pthread_t *, const pthread_attr_t *,
+   void *(*)(void *), void *);
+//}
 
 const int GlobalVar = 0;
 
@@ -26,8 +34,8 @@ static void *callee1(void *payload) {
 
 void foo() {
   pthread_t MyFirstThread;
-  pthread_create(&MyFirstThread, NULL, callee0, NULL);
+  pthread_create(&MyFirstThread, 0, callee0, 0);
 
   pthread_t MySecondThread;
-  pthread_create(&MySecondThread, NULL, callee1, (void *)&GlobalVar);
+  pthread_create(&MySecondThread, 0, callee1, (void *)&GlobalVar);
 }


___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


r362545 - Introduce Value::stripPointerCastsSameRepresentation

2019-06-04 Thread Johannes Doerfert via cfe-commits
Author: jdoerfert
Date: Tue Jun  4 13:21:46 2019
New Revision: 362545

URL: http://llvm.org/viewvc/llvm-project?rev=362545&view=rev
Log:
Introduce Value::stripPointerCastsSameRepresentation

This patch allows current users of Value::stripPointerCasts() to force
the result of the function to have the same representation as the value
it was called on. This is useful in various cases, e.g., (non-)null
checks.

In this patch only a single call site was adjusted to fix an existing
misuse that would cause nonnull where they may be wrong. Uses in
attribute deduction and other areas, e.g., D60047, are to be expected.

For a discussion on this topic, please see [0].

[0] http://lists.llvm.org/pipermail/llvm-dev/2018-December/128423.html

Reviewers: hfinkel, arsenm, reames

Subscribers: wdng, hiraditya, bollu, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61607

Modified:
cfe/trunk/test/CodeGenOpenCLCXX/addrspace-references.cl

Modified: cfe/trunk/test/CodeGenOpenCLCXX/addrspace-references.cl
URL: 
http://llvm.org/viewvc/llvm-project/cfe/trunk/test/CodeGenOpenCLCXX/addrspace-references.cl?rev=362545&r1=362544&r2=362545&view=diff
==
--- cfe/trunk/test/CodeGenOpenCLCXX/addrspace-references.cl (original)
+++ cfe/trunk/test/CodeGenOpenCLCXX/addrspace-references.cl Tue Jun  4 13:21:46 
2019
@@ -9,6 +9,6 @@ void foo() {
   // CHECK: [[REF:%.*]] = alloca i32
   // CHECK: store i32 1, i32* [[REF]]
   // CHECK: [[REG:%[0-9]+]] = addrspacecast i32* [[REF]] to i32 addrspace(4)*
-  // CHECK: call spir_func i32 @_Z3barRU3AS4Kj(i32 addrspace(4)* nonnull 
dereferenceable(4) [[REG]])
+  // CHECK: call spir_func i32 @_Z3barRU3AS4Kj(i32 addrspace(4)* 
dereferenceable(4) [[REG]])
   bar(1);
 }


___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


r367387 - [Fix] Customize warnings for missing built-in types

2019-07-30 Thread Johannes Doerfert via cfe-commits
Author: jdoerfert
Date: Tue Jul 30 22:16:38 2019
New Revision: 367387

URL: http://llvm.org/viewvc/llvm-project?rev=367387&view=rev
Log:
[Fix] Customize warnings for missing built-in types

If we detect a built-in declaration for which we cannot derive a type
matching the pattern in the Builtins.def file, we currently emit a
warning that the respective header is needed. However, this is not
necessarily the behavior we want as it has no connection to the location
of the declaration (which can actually be in the header in question).
Instead, this warning is generated
  - if we could not build the type for the pattern on file (for some
reason). Here we should make the reason explicit. The actual problem
is otherwise circumvented as the warning is misleading, see [0] for
an example.
  - if we could not build the type for the pattern because we do not
have a type on record, possible since D55483, we should not emit any
warning. See [1] for a legitimate problem.

This patch address both cases. For the "setjmp" family a new warning is
introduced and for built-ins without type on record, so far
"pthread_create", we do not emit the warning anymore.

Also see: PR40692

[0] https://lkml.org/lkml/2019/1/11/718
[1] https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=235583

Differential Revision: https://reviews.llvm.org/D58091

Added:
cfe/trunk/test/Sema/builtin-setjmp.c
Modified:
cfe/trunk/include/clang/Basic/DiagnosticSemaKinds.td
cfe/trunk/lib/Sema/SemaDecl.cpp
cfe/trunk/test/Analysis/retain-release.m
cfe/trunk/test/Sema/implicit-builtin-decl.c

Modified: cfe/trunk/include/clang/Basic/DiagnosticSemaKinds.td
URL: 
http://llvm.org/viewvc/llvm-project/cfe/trunk/include/clang/Basic/DiagnosticSemaKinds.td?rev=367387&r1=367386&r2=367387&view=diff
==
--- cfe/trunk/include/clang/Basic/DiagnosticSemaKinds.td (original)
+++ cfe/trunk/include/clang/Basic/DiagnosticSemaKinds.td Tue Jul 30 22:16:38 
2019
@@ -598,6 +598,10 @@ def ext_implicit_lib_function_decl : Ext
 def note_include_header_or_declare : Note<
   "include the header <%0> or explicitly provide a declaration for '%1'">;
 def note_previous_builtin_declaration : Note<"%0 is a builtin with type %1">;
+def warn_implicit_decl_no_jmp_buf
+: Warning<"declaration of built-in function '%0' requires the declaration"
+" of the 'jmp_buf' type, commonly provided in the header .">,
+  InGroup>;
 def warn_implicit_decl_requires_sysheader : Warning<
   "declaration of built-in function '%1' requires inclusion of the header 
<%0>">,
   InGroup;

Modified: cfe/trunk/lib/Sema/SemaDecl.cpp
URL: 
http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/Sema/SemaDecl.cpp?rev=367387&r1=367386&r2=367387&view=diff
==
--- cfe/trunk/lib/Sema/SemaDecl.cpp (original)
+++ cfe/trunk/lib/Sema/SemaDecl.cpp Tue Jul 30 22:16:38 2019
@@ -1983,10 +1983,27 @@ NamedDecl *Sema::LazilyCreateBuiltin(Ide
   ASTContext::GetBuiltinTypeError Error;
   QualType R = Context.GetBuiltinType(ID, Error);
   if (Error) {
-if (ForRedeclaration)
-  Diag(Loc, diag::warn_implicit_decl_requires_sysheader)
-  << getHeaderName(Context.BuiltinInfo, ID, Error)
+if (!ForRedeclaration)
+  return nullptr;
+
+// If we have a builtin without an associated type we should not emit a
+// warning when we were not able to find a type for it.
+if (Error == ASTContext::GE_Missing_type)
+  return nullptr;
+
+// If we could not find a type for setjmp it is because the jmp_buf type 
was
+// not defined prior to the setjmp declaration.
+if (Error == ASTContext::GE_Missing_setjmp) {
+  Diag(Loc, diag::warn_implicit_decl_no_jmp_buf)
   << Context.BuiltinInfo.getName(ID);
+  return nullptr;
+}
+
+// Generally, we emit a warning that the declaration requires the
+// appropriate header.
+Diag(Loc, diag::warn_implicit_decl_requires_sysheader)
+<< getHeaderName(Context.BuiltinInfo, ID, Error)
+<< Context.BuiltinInfo.getName(ID);
 return nullptr;
   }
 

Modified: cfe/trunk/test/Analysis/retain-release.m
URL: 
http://llvm.org/viewvc/llvm-project/cfe/trunk/test/Analysis/retain-release.m?rev=367387&r1=367386&r2=367387&view=diff
==
--- cfe/trunk/test/Analysis/retain-release.m (original)
+++ cfe/trunk/test/Analysis/retain-release.m Tue Jul 30 22:16:38 2019
@@ -2,7 +2,7 @@
 // RUN: %clang_analyze_cc1 -triple x86_64-apple-darwin10\
 // RUN: -analyzer-checker=core,osx.coreFoundation.CFRetainRelease\
 // RUN: -analyzer-checker=osx.cocoa.ClassRelease,osx.cocoa.RetainCount\
-// RUN: -analyzer-checker=debug.ExprInspection -fblocks -verify=expected,C 
%s\
+// RUN: -analyzer-checker=debug.ExprInspection -fblocks -verify %s\
 // RUN: -Wno-objc-root-class -analyzer-output=plis

[clang] 7db017b - [OpenMP][Docs] Update Clang Support docs after D75591

2020-07-29 Thread Johannes Doerfert via cfe-commits

Author: Johannes Doerfert
Date: 2020-07-29T10:21:05-05:00
New Revision: 7db017bf3405c7fa43786fe27380d88702e19584

URL: 
https://github.com/llvm/llvm-project/commit/7db017bf3405c7fa43786fe27380d88702e19584
DIFF: 
https://github.com/llvm/llvm-project/commit/7db017bf3405c7fa43786fe27380d88702e19584.diff

LOG: [OpenMP][Docs] Update Clang Support docs after D75591

Added: 


Modified: 
clang/docs/OpenMPSupport.rst

Removed: 




diff  --git a/clang/docs/OpenMPSupport.rst b/clang/docs/OpenMPSupport.rst
index a1d1b120bcec..bda52e934c26 100644
--- a/clang/docs/OpenMPSupport.rst
+++ b/clang/docs/OpenMPSupport.rst
@@ -264,7 +264,7 @@ want to help with the implementation.
 
+==+==+==+===+
 | misc extension   | user-defined function variants with #ifdef 
protection| :part:`worked on`| D71179   
 |
 
+--+--+--+---+
-| misc extension   | default(firstprivate) & default(private)  
   | :part:`worked on`| 
  |
+| misc extension   | default(firstprivate) & default(private)  
   | :part:`partial`  | firstprivate done: D75591   
  |
 
+--+--+--+---+
 | loop extension   | Loop tiling transformation
   | :part:`claimed`  | 
  |
 
+--+--+--+---+



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] ee05167 - [OpenMP] Allow traits for the OpenMP context selector `isa`

2020-07-29 Thread Johannes Doerfert via cfe-commits

Author: Johannes Doerfert
Date: 2020-07-29T10:22:27-05:00
New Revision: ee05167cc42b95f70bc2ff1bd4402969f356f53b

URL: 
https://github.com/llvm/llvm-project/commit/ee05167cc42b95f70bc2ff1bd4402969f356f53b
DIFF: 
https://github.com/llvm/llvm-project/commit/ee05167cc42b95f70bc2ff1bd4402969f356f53b.diff

LOG: [OpenMP] Allow traits for the OpenMP context selector `isa`

It was unclear what `isa` was supposed to mean so we did not provide any
traits for this context selector. With this patch we will allow *any*
string or identifier. We use the target attribute and target info to
determine if the trait matches. In other words, we will check if the
provided value is a target feature that is available (at the call site).

Fixes PR46338

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D83281

Added: 
clang/test/OpenMP/declare_variant_device_isa_codegen_1.c

Modified: 
clang/include/clang/AST/OpenMPClause.h
clang/include/clang/Basic/DiagnosticParseKinds.td
clang/include/clang/Basic/DiagnosticSemaKinds.td
clang/lib/AST/OpenMPClause.cpp
clang/lib/Parse/ParseOpenMP.cpp
clang/lib/Sema/SemaOpenMP.cpp
clang/test/OpenMP/declare_variant_messages.c
llvm/include/llvm/Frontend/OpenMP/OMPContext.h
llvm/include/llvm/Frontend/OpenMP/OMPKinds.def
llvm/lib/Frontend/OpenMP/OMPContext.cpp
llvm/unittests/Frontend/OpenMPContextTest.cpp

Removed: 




diff  --git a/clang/include/clang/AST/OpenMPClause.h 
b/clang/include/clang/AST/OpenMPClause.h
index c649502f765b..4f94aa7074ee 100644
--- a/clang/include/clang/AST/OpenMPClause.h
+++ b/clang/include/clang/AST/OpenMPClause.h
@@ -7635,6 +7635,10 @@ class OMPClausePrinter final : public 
OMPClauseVisitor {
 
 struct OMPTraitProperty {
   llvm::omp::TraitProperty Kind = llvm::omp::TraitProperty::invalid;
+
+  /// The raw string as we parsed it. This is needed for the `isa` trait set
+  /// (which accepts anything) and (later) extensions.
+  StringRef RawString;
 };
 struct OMPTraitSelector {
   Expr *ScoreOrCondition = nullptr;
@@ -7692,6 +7696,23 @@ class OMPTraitInfo {
 llvm::raw_ostream &operator<<(llvm::raw_ostream &OS, const OMPTraitInfo &TI);
 llvm::raw_ostream &operator<<(llvm::raw_ostream &OS, const OMPTraitInfo *TI);
 
+/// Clang specific specialization of the OMPContext to lookup target features.
+struct TargetOMPContext final : public llvm::omp::OMPContext {
+
+  TargetOMPContext(ASTContext &ASTCtx,
+   std::function &&DiagUnknownTrait,
+   const FunctionDecl *CurrentFunctionDecl);
+  virtual ~TargetOMPContext() = default;
+
+  /// See llvm::omp::OMPContext::matchesISATrait
+  bool matchesISATrait(StringRef RawString) const override;
+
+private:
+  std::function FeatureValidityCheck;
+  std::function DiagUnknownTrait;
+  llvm::StringMap FeatureMap;
+};
+
 } // namespace clang
 
 #endif // LLVM_CLANG_AST_OPENMPCLAUSE_H

diff  --git a/clang/include/clang/Basic/DiagnosticParseKinds.td 
b/clang/include/clang/Basic/DiagnosticParseKinds.td
index 6138b27fb87f..08b91de31993 100644
--- a/clang/include/clang/Basic/DiagnosticParseKinds.td
+++ b/clang/include/clang/Basic/DiagnosticParseKinds.td
@@ -1278,6 +1278,11 @@ def warn_omp_declare_variant_string_literal_or_identifier
   "%select{set|selector|property}0; "
   "%select{set|selector|property}0 skipped">,
   InGroup;
+def warn_unknown_begin_declare_variant_isa_trait
+: Warning<"isa trait '%0' is not known to the current target; verify the "
+  "spelling or consider restricting the context selector with the "
+  "'arch' selector further">,
+  InGroup;
 def note_omp_declare_variant_ctx_options
 : Note<"context %select{set|selector|property}0 options are: %1">;
 def warn_omp_declare_variant_expected

diff  --git a/clang/include/clang/Basic/DiagnosticSemaKinds.td 
b/clang/include/clang/Basic/DiagnosticSemaKinds.td
index 8093e7ed3fbe..ae693a08108c 100644
--- a/clang/include/clang/Basic/DiagnosticSemaKinds.td
+++ b/clang/include/clang/Basic/DiagnosticSemaKinds.td
@@ -10320,6 +10320,11 @@ def warn_nested_declare_variant
 : Warning<"nesting `omp begin/end declare variant` is not supported yet; "
   "nested context ignored">,
   InGroup;
+def warn_unknown_declare_variant_isa_trait
+: Warning<"isa trait '%0' is not known to the current target; verify the "
+  "spelling or consider restricting the context selector with the "
+  "'arch' selector further">,
+  InGroup;
 def err_omp_non_pointer_type_array_shaping_base : Error<
   "expected expression with a pointer to a complete type as a base of an array 
"
   "shaping operation">;

diff  --git a/clang/lib/AST/OpenMPClause.cpp b/clang/lib/AST/OpenMPClause.cpp
index 6933c5742552..9caa691188fd 100644
--- a/clang/lib/AST/OpenMPClause.cpp
+++ b/clang/lib/AST/OpenMPClause.cpp
@@ -17,6 +17,7 @@
 #include "clang/

[clang] 8723280 - [OpenMP] Fix D83281 issue on windows by allowing `dso_local` in CHECK

2020-07-29 Thread Johannes Doerfert via cfe-commits

Author: Johannes Doerfert
Date: 2020-07-29T15:18:20-05:00
New Revision: 8723280b68b1e5ed97a699466720b36a32a9e406

URL: 
https://github.com/llvm/llvm-project/commit/8723280b68b1e5ed97a699466720b36a32a9e406
DIFF: 
https://github.com/llvm/llvm-project/commit/8723280b68b1e5ed97a699466720b36a32a9e406.diff

LOG: [OpenMP] Fix D83281 issue on windows by allowing `dso_local` in CHECK

Added: 


Modified: 
clang/test/OpenMP/declare_variant_device_isa_codegen_1.c

Removed: 




diff  --git a/clang/test/OpenMP/declare_variant_device_isa_codegen_1.c 
b/clang/test/OpenMP/declare_variant_device_isa_codegen_1.c
index baa5eb8f8830..25fc3941dd50 100644
--- a/clang/test/OpenMP/declare_variant_device_isa_codegen_1.c
+++ b/clang/test/OpenMP/declare_variant_device_isa_codegen_1.c
@@ -31,18 +31,18 @@ void avx512_saxpy(int n, float s, float *x, float *y) {
 }
 
 void caller(int n, float s, float *x, float *y) {
-  // GENERIC: define void @{{.*}}caller
-  // GENERIC:  call void @{{.*}}base_saxpy
-  // WITHFEATURE: define void @{{.*}}caller
-  // WITHFEATURE:  call void @{{.*}}avx512_saxpy
+  // GENERIC: define void {{.*}}caller
+  // GENERIC:  call void {{.*}}base_saxpy
+  // WITHFEATURE: define void {{.*}}caller
+  // WITHFEATURE:  call void {{.*}}avx512_saxpy
   base_saxpy(n, s, x, y);
 }
 
 __attribute__((target("avx512f"))) void variant_caller(int n, float s, float 
*x, float *y) {
-  // GENERIC: define void @{{.*}}variant_caller
-  // GENERIC:  call void @{{.*}}avx512_saxpy
-  // WITHFEATURE: define void @{{.*}}variant_caller
-  // WITHFEATURE:  call void @{{.*}}avx512_saxpy
+  // GENERIC: define void {{.*}}variant_caller
+  // GENERIC:  call void {{.*}}avx512_saxpy
+  // WITHFEATURE: define void {{.*}}variant_caller
+  // WITHFEATURE:  call void {{.*}}avx512_saxpy
   base_saxpy(n, s, x, y);
 }
 



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] b08abf4 - [OpenMP] Fix D83281 issue on windows by allowing `dso_local` in CHECK [2/1]

2020-07-29 Thread Johannes Doerfert via cfe-commits

Author: Johannes Doerfert
Date: 2020-07-29T15:47:45-05:00
New Revision: b08abf4c808e98718b8806dafcae1626328676d4

URL: 
https://github.com/llvm/llvm-project/commit/b08abf4c808e98718b8806dafcae1626328676d4
DIFF: 
https://github.com/llvm/llvm-project/commit/b08abf4c808e98718b8806dafcae1626328676d4.diff

LOG: [OpenMP] Fix D83281 issue on windows by allowing `dso_local` in CHECK [2/1]

The problem with 8723280b68b1e5ed97a699466720b36a32a9e406 was that the
`dso_local` is *before* the void not after. Hope this works.

Added: 


Modified: 
clang/test/OpenMP/declare_variant_device_isa_codegen_1.c

Removed: 




diff  --git a/clang/test/OpenMP/declare_variant_device_isa_codegen_1.c 
b/clang/test/OpenMP/declare_variant_device_isa_codegen_1.c
index 25fc3941dd50..76a3eedeae30 100644
--- a/clang/test/OpenMP/declare_variant_device_isa_codegen_1.c
+++ b/clang/test/OpenMP/declare_variant_device_isa_codegen_1.c
@@ -31,18 +31,18 @@ void avx512_saxpy(int n, float s, float *x, float *y) {
 }
 
 void caller(int n, float s, float *x, float *y) {
-  // GENERIC: define void {{.*}}caller
-  // GENERIC:  call void {{.*}}base_saxpy
-  // WITHFEATURE: define void {{.*}}caller
-  // WITHFEATURE:  call void {{.*}}avx512_saxpy
+  // GENERIC: define {{.*}}void @{{.*}}caller
+  // GENERIC:  call void @{{.*}}base_saxpy
+  // WITHFEATURE: define {{.*}}void @{{.*}}caller
+  // WITHFEATURE:  call void @{{.*}}avx512_saxpy
   base_saxpy(n, s, x, y);
 }
 
 __attribute__((target("avx512f"))) void variant_caller(int n, float s, float 
*x, float *y) {
-  // GENERIC: define void {{.*}}variant_caller
-  // GENERIC:  call void {{.*}}avx512_saxpy
-  // WITHFEATURE: define void {{.*}}variant_caller
-  // WITHFEATURE:  call void {{.*}}avx512_saxpy
+  // GENERIC: define {{.*}}void @{{.*}}variant_caller
+  // GENERIC:  call void @{{.*}}avx512_saxpy
+  // WITHFEATURE: define {{.*}}void @{{.*}}variant_caller
+  // WITHFEATURE:  call void @{{.*}}avx512_saxpy
   base_saxpy(n, s, x, y);
 }
 



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] 19756ef - [OpenMP][IRBuilder] Support allocas in nested parallel regions

2020-07-30 Thread Johannes Doerfert via cfe-commits

Author: Johannes Doerfert
Date: 2020-07-30T10:19:39-05:00
New Revision: 19756ef53a498b7aa1fbac9e3a7cd3aa8e110fad

URL: 
https://github.com/llvm/llvm-project/commit/19756ef53a498b7aa1fbac9e3a7cd3aa8e110fad
DIFF: 
https://github.com/llvm/llvm-project/commit/19756ef53a498b7aa1fbac9e3a7cd3aa8e110fad.diff

LOG: [OpenMP][IRBuilder] Support allocas in nested parallel regions

We need to keep track of the alloca insertion point (which we already
communicate via the callback to the user) as we place allocas as well.

Reviewed By: fghanim, SouraVX

Differential Revision: https://reviews.llvm.org/D82470

Added: 


Modified: 
clang/lib/CodeGen/CGStmtOpenMP.cpp
llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h
llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
llvm/unittests/Frontend/OpenMPIRBuilderTest.cpp

Removed: 




diff  --git a/clang/lib/CodeGen/CGStmtOpenMP.cpp 
b/clang/lib/CodeGen/CGStmtOpenMP.cpp
index 0ee1133ebaa1..df1cc1666de4 100644
--- a/clang/lib/CodeGen/CGStmtOpenMP.cpp
+++ b/clang/lib/CodeGen/CGStmtOpenMP.cpp
@@ -1707,9 +1707,11 @@ void CodeGenFunction::EmitOMPParallelDirective(const 
OMPParallelDirective &S) {
 
 CGCapturedStmtInfo CGSI(*CS, CR_OpenMP);
 CodeGenFunction::CGCapturedStmtRAII CapInfoRAII(*this, &CGSI);
-Builder.restoreIP(OMPBuilder.CreateParallel(Builder, BodyGenCB, PrivCB,
-FiniCB, IfCond, NumThreads,
-ProcBind, S.hasCancel()));
+llvm::OpenMPIRBuilder::InsertPointTy AllocaIP(
+AllocaInsertPt->getParent(), AllocaInsertPt->getIterator());
+Builder.restoreIP(
+OMPBuilder.CreateParallel(Builder, AllocaIP, BodyGenCB, PrivCB, FiniCB,
+  IfCond, NumThreads, ProcBind, 
S.hasCancel()));
 return;
   }
 

diff  --git a/llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h 
b/llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h
index 95eed59f1b3d..f813a730342e 100644
--- a/llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h
+++ b/llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h
@@ -156,6 +156,7 @@ class OpenMPIRBuilder {
   /// Generator for '#omp parallel'
   ///
   /// \param Loc The insert and source location description.
+  /// \param AllocaIP The insertion points to be used for alloca instructions.
   /// \param BodyGenCB Callback that will generate the region code.
   /// \param PrivCB Callback to copy a given variable (think copy constructor).
   /// \param FiniCB Callback to finalize variable copies.
@@ -166,10 +167,11 @@ class OpenMPIRBuilder {
   ///
   /// \returns The insertion position *after* the parallel.
   IRBuilder<>::InsertPoint
-  CreateParallel(const LocationDescription &Loc, BodyGenCallbackTy BodyGenCB,
- PrivatizeCallbackTy PrivCB, FinalizeCallbackTy FiniCB,
- Value *IfCondition, Value *NumThreads,
- omp::ProcBindKind ProcBind, bool IsCancellable);
+  CreateParallel(const LocationDescription &Loc, InsertPointTy AllocaIP,
+ BodyGenCallbackTy BodyGenCB, PrivatizeCallbackTy PrivCB,
+ FinalizeCallbackTy FiniCB, Value *IfCondition,
+ Value *NumThreads, omp::ProcBindKind ProcBind,
+ bool IsCancellable);
 
   /// Generator for '#omp flush'
   ///

diff  --git a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp 
b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
index 9468a3aa3c8d..a5fe4ec87c46 100644
--- a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
+++ b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
@@ -394,9 +394,10 @@ void OpenMPIRBuilder::emitCancelationCheckImpl(
 }
 
 IRBuilder<>::InsertPoint OpenMPIRBuilder::CreateParallel(
-const LocationDescription &Loc, BodyGenCallbackTy BodyGenCB,
-PrivatizeCallbackTy PrivCB, FinalizeCallbackTy FiniCB, Value *IfCondition,
-Value *NumThreads, omp::ProcBindKind ProcBind, bool IsCancellable) {
+const LocationDescription &Loc, InsertPointTy OuterAllocaIP,
+BodyGenCallbackTy BodyGenCB, PrivatizeCallbackTy PrivCB,
+FinalizeCallbackTy FiniCB, Value *IfCondition, Value *NumThreads,
+omp::ProcBindKind ProcBind, bool IsCancellable) {
   if (!updateToLocation(Loc))
 return Loc.IP;
 
@@ -429,7 +430,9 @@ IRBuilder<>::InsertPoint OpenMPIRBuilder::CreateParallel(
   // we want to delete at the end.
   SmallVector ToBeDeleted;
 
-  Builder.SetInsertPoint(OuterFn->getEntryBlock().getFirstNonPHI());
+  // Change the location to the outer alloca insertion point to create and
+  // initialize the allocas we pass into the parallel region.
+  Builder.restoreIP(OuterAllocaIP);
   AllocaInst *TIDAddr = Builder.CreateAlloca(Int32, nullptr, "tid.addr");
   AllocaInst *ZeroAddr = Builder.CreateAlloca(Int32, nullptr, "zero.addr");
 
@@ -481,9 +484,9 @@ IRBuilder<>::InsertPoint OpenMPIRBuilder::CreateParallel(
 
   // Generate the privatization allocas in the block that will become the entry
   // o

[clang] ebad64d - [OpenMP][FIX] Consistently use OpenMPIRBuilder if requested

2020-07-30 Thread Johannes Doerfert via cfe-commits

Author: Johannes Doerfert
Date: 2020-07-30T10:19:40-05:00
New Revision: ebad64dfe133e64d1df6b82e6ef2fb031d635b08

URL: 
https://github.com/llvm/llvm-project/commit/ebad64dfe133e64d1df6b82e6ef2fb031d635b08
DIFF: 
https://github.com/llvm/llvm-project/commit/ebad64dfe133e64d1df6b82e6ef2fb031d635b08.diff

LOG: [OpenMP][FIX] Consistently use OpenMPIRBuilder if requested

When we use the OpenMPIRBuilder for the parallel region we need to also
use it to get the thread ID (among other things) in the body. This is
because CGOpenMPRuntime::getThreadID() and
CGOpenMPRuntime::emitUpdateLocation implicitly assumes that if they are
called from within a parallel region there is a certain structure to the
code and certain members of the OMPRegionInfo are initialized. It might
make sense to initialize them even if we use the OpenMPIRBuilder but we
would preferably get rid of such state instead.

Bug reported by Anchu Rajendran Sudhakumari.

Depends on D82470.

Reviewed By: anchu-rajendran

Differential Revision: https://reviews.llvm.org/D82822

Added: 
clang/test/OpenMP/irbuilder_nested_parallel_for.c

Modified: 
clang/lib/CodeGen/CGOpenMPRuntime.cpp
clang/test/OpenMP/cancel_codegen.cpp
clang/test/OpenMP/task_codegen.cpp

Removed: 




diff  --git a/clang/lib/CodeGen/CGOpenMPRuntime.cpp 
b/clang/lib/CodeGen/CGOpenMPRuntime.cpp
index dc12286c72be..60c7081b135b 100644
--- a/clang/lib/CodeGen/CGOpenMPRuntime.cpp
+++ b/clang/lib/CodeGen/CGOpenMPRuntime.cpp
@@ -1455,6 +1455,19 @@ void 
CGOpenMPRuntime::clearLocThreadIdInsertPt(CodeGenFunction &CGF) {
   }
 }
 
+static StringRef getIdentStringFromSourceLocation(CodeGenFunction &CGF,
+  SourceLocation Loc,
+  SmallString<128> &Buffer) {
+  llvm::raw_svector_ostream OS(Buffer);
+  // Build debug location
+  PresumedLoc PLoc = CGF.getContext().getSourceManager().getPresumedLoc(Loc);
+  OS << ";" << PLoc.getFilename() << ";";
+  if (const auto *FD = dyn_cast_or_null(CGF.CurFuncDecl))
+OS << FD->getQualifiedNameAsString();
+  OS << ";" << PLoc.getLine() << ";" << PLoc.getColumn() << ";;";
+  return OS.str();
+}
+
 llvm::Value *CGOpenMPRuntime::emitUpdateLocation(CodeGenFunction &CGF,
  SourceLocation Loc,
  unsigned Flags) {
@@ -1464,6 +1477,16 @@ llvm::Value 
*CGOpenMPRuntime::emitUpdateLocation(CodeGenFunction &CGF,
   Loc.isInvalid())
 return getOrCreateDefaultLocation(Flags).getPointer();
 
+  // If the OpenMPIRBuilder is used we need to use it for all location handling
+  // as the clang invariants used below might be broken.
+  if (CGM.getLangOpts().OpenMPIRBuilder) {
+SmallString<128> Buffer;
+OMPBuilder.updateToLocation(CGF.Builder.saveIP());
+auto *SrcLocStr = OMPBuilder.getOrCreateSrcLocStr(
+getIdentStringFromSourceLocation(CGF, Loc, Buffer));
+return OMPBuilder.getOrCreateIdent(SrcLocStr, IdentFlag(Flags));
+  }
+
   assert(CGF.CurFn && "No function in current CodeGenFunction.");
 
   CharUnits Align = CGM.getContext().getTypeAlignInChars(IdentQTy);
@@ -1497,15 +1520,9 @@ llvm::Value 
*CGOpenMPRuntime::emitUpdateLocation(CodeGenFunction &CGF,
 
   llvm::Value *OMPDebugLoc = OpenMPDebugLocMap.lookup(Loc.getRawEncoding());
   if (OMPDebugLoc == nullptr) {
-SmallString<128> Buffer2;
-llvm::raw_svector_ostream OS2(Buffer2);
-// Build debug location
-PresumedLoc PLoc = CGF.getContext().getSourceManager().getPresumedLoc(Loc);
-OS2 << ";" << PLoc.getFilename() << ";";
-if (const auto *FD = dyn_cast_or_null(CGF.CurFuncDecl))
-  OS2 << FD->getQualifiedNameAsString();
-OS2 << ";" << PLoc.getLine() << ";" << PLoc.getColumn() << ";;";
-OMPDebugLoc = CGF.Builder.CreateGlobalStringPtr(OS2.str());
+SmallString<128> Buffer;
+OMPDebugLoc = CGF.Builder.CreateGlobalStringPtr(
+getIdentStringFromSourceLocation(CGF, Loc, Buffer));
 OpenMPDebugLocMap[Loc.getRawEncoding()] = OMPDebugLoc;
   }
   // *psource = ";;";
@@ -1519,6 +1536,16 @@ llvm::Value 
*CGOpenMPRuntime::emitUpdateLocation(CodeGenFunction &CGF,
 llvm::Value *CGOpenMPRuntime::getThreadID(CodeGenFunction &CGF,
   SourceLocation Loc) {
   assert(CGF.CurFn && "No function in current CodeGenFunction.");
+  // If the OpenMPIRBuilder is used we need to use it for all thread id calls 
as
+  // the clang invariants used below might be broken.
+  if (CGM.getLangOpts().OpenMPIRBuilder) {
+SmallString<128> Buffer;
+OMPBuilder.updateToLocation(CGF.Builder.saveIP());
+auto *SrcLocStr = OMPBuilder.getOrCreateSrcLocStr(
+getIdentStringFromSourceLocation(CGF, Loc, Buffer));
+return OMPBuilder.getOrCreateThreadID(
+OMPBuilder.getOrCreateIdent(SrcLocStr));
+  }
 
   llvm::Value *ThreadID = nullptr

[clang] ceed44a - [OpenMP][NFC] Remove unnecessary argument

2020-04-04 Thread Johannes Doerfert via cfe-commits

Author: Johannes Doerfert
Date: 2020-04-04T11:34:58-05:00
New Revision: ceed44adfd1ae9d714eaa4f0e7fa5a1a149b4dc5

URL: 
https://github.com/llvm/llvm-project/commit/ceed44adfd1ae9d714eaa4f0e7fa5a1a149b4dc5
DIFF: 
https://github.com/llvm/llvm-project/commit/ceed44adfd1ae9d714eaa4f0e7fa5a1a149b4dc5.diff

LOG: [OpenMP][NFC] Remove unnecessary argument

Added: 


Modified: 
clang/include/clang/Sema/Sema.h
clang/lib/Sema/SemaExpr.cpp
clang/lib/Sema/SemaOpenMP.cpp

Removed: 




diff  --git a/clang/include/clang/Sema/Sema.h b/clang/include/clang/Sema/Sema.h
index 7c689c2a13e8..10e2d69f3d9e 100644
--- a/clang/include/clang/Sema/Sema.h
+++ b/clang/include/clang/Sema/Sema.h
@@ -9892,7 +9892,7 @@ class Sema final {
   /// specialization via the OpenMP declare variant mechanism available. If
   /// there is, return the specialized call expression, otherwise return the
   /// original \p Call.
-  ExprResult ActOnOpenMPCall(Sema &S, ExprResult Call, Scope *Scope,
+  ExprResult ActOnOpenMPCall(ExprResult Call, Scope *Scope,
  SourceLocation LParenLoc, MultiExprArg ArgExprs,
  SourceLocation RParenLoc, Expr *ExecConfig);
 

diff  --git a/clang/lib/Sema/SemaExpr.cpp b/clang/lib/Sema/SemaExpr.cpp
index 8d0e97c85771..b311aad84816 100644
--- a/clang/lib/Sema/SemaExpr.cpp
+++ b/clang/lib/Sema/SemaExpr.cpp
@@ -5997,7 +5997,7 @@ ExprResult Sema::ActOnCallExpr(Scope *Scope, Expr *Fn, 
SourceLocation LParenLoc,
   }
 
   if (LangOpts.OpenMP)
-Call = ActOnOpenMPCall(*this, Call, Scope, LParenLoc, ArgExprs, RParenLoc,
+Call = ActOnOpenMPCall(Call, Scope, LParenLoc, ArgExprs, RParenLoc,
ExecConfig);
 
   return Call;

diff  --git a/clang/lib/Sema/SemaOpenMP.cpp b/clang/lib/Sema/SemaOpenMP.cpp
index f663b1d43659..cfaf981983c1 100644
--- a/clang/lib/Sema/SemaOpenMP.cpp
+++ b/clang/lib/Sema/SemaOpenMP.cpp
@@ -5584,7 +5584,7 @@ void 
Sema::ActOnFinishedFunctionDefinitionInOpenMPDeclareVariantScope(
   BaseFD->setImplicit(true);
 }
 
-ExprResult Sema::ActOnOpenMPCall(Sema &S, ExprResult Call, Scope *Scope,
+ExprResult Sema::ActOnOpenMPCall(ExprResult Call, Scope *Scope,
  SourceLocation LParenLoc,
  MultiExprArg ArgExprs,
  SourceLocation RParenLoc, Expr *ExecConfig) {
@@ -5601,8 +5601,8 @@ ExprResult Sema::ActOnOpenMPCall(Sema &S, ExprResult 
Call, Scope *Scope,
   if (!CalleeFnDecl->hasAttr())
 return Call;
 
-  ASTContext &Context = S.getASTContext();
-  OMPContext OMPCtx(S.getLangOpts().OpenMPIsDevice,
+  ASTContext &Context = getASTContext();
+  OMPContext OMPCtx(getLangOpts().OpenMPIsDevice,
 Context.getTargetInfo().getTriple());
 
   SmallVector Exprs;
@@ -5650,12 +5650,12 @@ ExprResult Sema::ActOnOpenMPCall(Sema &S, ExprResult 
Call, Scope *Scope,
   if (auto *SpecializedMethod = dyn_cast(BestDecl)) {
 auto *MemberCall = dyn_cast(CE);
 BestExpr = MemberExpr::CreateImplicit(
-S.Context, MemberCall->getImplicitObjectArgument(),
-/* IsArrow */ false, SpecializedMethod, S.Context.BoundMemberTy,
+Context, MemberCall->getImplicitObjectArgument(),
+/* IsArrow */ false, SpecializedMethod, Context.BoundMemberTy,
 MemberCall->getValueKind(), MemberCall->getObjectKind());
   }
-  NewCall = S.BuildCallExpr(Scope, BestExpr, LParenLoc, ArgExprs, 
RParenLoc,
-ExecConfig);
+  NewCall = BuildCallExpr(Scope, BestExpr, LParenLoc, ArgExprs, RParenLoc,
+  ExecConfig);
   if (NewCall.isUsable())
 break;
 }
@@ -5666,7 +5666,6 @@ ExprResult Sema::ActOnOpenMPCall(Sema &S, ExprResult 
Call, Scope *Scope,
 
   if (!NewCall.isUsable())
 return Call;
-
   return PseudoObjectExpr::Create(Context, CE, {NewCall.get()}, 0);
 }
 



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang-tools-extra] 8ea07f6 - [OpenMP] Add extra qualification to OpenMP clause id

2020-04-05 Thread Johannes Doerfert via cfe-commits

Author: Johannes Doerfert
Date: 2020-04-05T23:10:58-05:00
New Revision: 8ea07f62a6f06bdb7da981425227995423867a4d

URL: 
https://github.com/llvm/llvm-project/commit/8ea07f62a6f06bdb7da981425227995423867a4d
DIFF: 
https://github.com/llvm/llvm-project/commit/8ea07f62a6f06bdb7da981425227995423867a4d.diff

LOG: [OpenMP] Add extra qualification to OpenMP clause id

Forgot to adjust this use in 419a559c5a73f13578d891feb1299cada08d581e.

Added: 


Modified: 
clang-tools-extra/clang-tidy/openmp/UseDefaultNoneCheck.cpp

Removed: 




diff  --git a/clang-tools-extra/clang-tidy/openmp/UseDefaultNoneCheck.cpp 
b/clang-tools-extra/clang-tidy/openmp/UseDefaultNoneCheck.cpp
index efd70e778c6f..724e9b9b9cbc 100644
--- a/clang-tools-extra/clang-tidy/openmp/UseDefaultNoneCheck.cpp
+++ b/clang-tools-extra/clang-tidy/openmp/UseDefaultNoneCheck.cpp
@@ -24,7 +24,7 @@ namespace openmp {
 void UseDefaultNoneCheck::registerMatchers(MatchFinder *Finder) {
   Finder->addMatcher(
   ompExecutableDirective(
-  allOf(isAllowedToContainClauseKind(OMPC_default),
+  allOf(isAllowedToContainClauseKind(llvm::omp::OMPC_default),
 anyOf(unless(hasAnyClause(ompDefaultClause())),
   hasAnyClause(ompDefaultClause(unless(isNoneKind()))
.bind("clause")



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] 931c0cd - [OpenMP][NFC] Move and simplify directive -> allowed clause mapping

2020-04-05 Thread Johannes Doerfert via cfe-commits

Author: Johannes Doerfert
Date: 2020-04-06T00:04:08-05:00
New Revision: 931c0cd713ee9b082389727bed1b518c6a44344f

URL: 
https://github.com/llvm/llvm-project/commit/931c0cd713ee9b082389727bed1b518c6a44344f
DIFF: 
https://github.com/llvm/llvm-project/commit/931c0cd713ee9b082389727bed1b518c6a44344f.diff

LOG: [OpenMP][NFC] Move and simplify directive -> allowed clause mapping

Move the listing of allowed clauses per OpenMP directive to the new
macro file in `llvm/Frontend/OpenMP`. Also, use a single generic macro
that specifies the directive and one allowed clause explicitly instead
of a dedicated macro per directive.

We save 800 loc and boilerplate for all new directives/clauses with no
functional change. We also need to include the macro file only once and
not once per directive.

Depends on D77112.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D77113

Added: 


Modified: 
clang/include/clang/ASTMatchers/ASTMatchers.h
clang/include/clang/Basic/OpenMPKinds.def
clang/include/clang/Basic/OpenMPKinds.h
clang/lib/ASTMatchers/Dynamic/CMakeLists.txt
clang/lib/Basic/OpenMPKinds.cpp
clang/lib/Tooling/CMakeLists.txt
clang/lib/Tooling/Transformer/CMakeLists.txt
clang/unittests/AST/CMakeLists.txt
clang/unittests/ASTMatchers/CMakeLists.txt
clang/unittests/ASTMatchers/Dynamic/CMakeLists.txt
clang/unittests/Analysis/CMakeLists.txt
clang/unittests/Rename/CMakeLists.txt
clang/unittests/Sema/CMakeLists.txt
clang/unittests/StaticAnalyzer/CMakeLists.txt
clang/unittests/Tooling/CMakeLists.txt
llvm/include/llvm/Frontend/OpenMP/OMPConstants.h
llvm/include/llvm/Frontend/OpenMP/OMPKinds.def
llvm/lib/Frontend/OpenMP/OMPConstants.cpp

Removed: 




diff  --git a/clang/include/clang/ASTMatchers/ASTMatchers.h 
b/clang/include/clang/ASTMatchers/ASTMatchers.h
index 8d97c32a0d36..9d7b4dcaacfd 100644
--- a/clang/include/clang/ASTMatchers/ASTMatchers.h
+++ b/clang/include/clang/ASTMatchers/ASTMatchers.h
@@ -7119,7 +7119,7 @@ AST_MATCHER(OMPDefaultClause, isSharedKind) {
 /// ``isAllowedToContainClauseKind("OMPC_default").``
 AST_MATCHER_P(OMPExecutableDirective, isAllowedToContainClauseKind,
   OpenMPClauseKind, CKind) {
-  return isAllowedClauseForDirective(
+  return llvm::omp::isAllowedClauseForDirective(
   Node.getDirectiveKind(), CKind,
   Finder->getASTContext().getLangOpts().OpenMP);
 }

diff  --git a/clang/include/clang/Basic/OpenMPKinds.def 
b/clang/include/clang/Basic/OpenMPKinds.def
index 4a4e6c6cb4c3..0ae0bc844e36 100644
--- a/clang/include/clang/Basic/OpenMPKinds.def
+++ b/clang/include/clang/Basic/OpenMPKinds.def
@@ -11,102 +11,6 @@
 ///
 
//===--===//
 
-#ifndef OPENMP_CLAUSE
-#  define OPENMP_CLAUSE(Name, Class)
-#endif
-#ifndef OPENMP_PARALLEL_CLAUSE
-#  define OPENMP_PARALLEL_CLAUSE(Name)
-#endif
-#ifndef OPENMP_SIMD_CLAUSE
-#  define OPENMP_SIMD_CLAUSE(Name)
-#endif
-#ifndef OPENMP_FOR_CLAUSE
-#  define OPENMP_FOR_CLAUSE(Name)
-#endif
-#ifndef OPENMP_FOR_SIMD_CLAUSE
-#  define OPENMP_FOR_SIMD_CLAUSE(Name)
-#endif
-#ifndef OPENMP_SECTIONS_CLAUSE
-#  define OPENMP_SECTIONS_CLAUSE(Name)
-#endif
-#ifndef OPENMP_SINGLE_CLAUSE
-#  define OPENMP_SINGLE_CLAUSE(Name)
-#endif
-#ifndef OPENMP_PARALLEL_FOR_CLAUSE
-#  define OPENMP_PARALLEL_FOR_CLAUSE(Name)
-#endif
-#ifndef OPENMP_PARALLEL_FOR_SIMD_CLAUSE
-#  define OPENMP_PARALLEL_FOR_SIMD_CLAUSE(Name)
-#endif
-#ifndef OPENMP_PARALLEL_MASTER_CLAUSE
-#  define OPENMP_PARALLEL_MASTER_CLAUSE(Name)
-#endif
-#ifndef OPENMP_PARALLEL_SECTIONS_CLAUSE
-#  define OPENMP_PARALLEL_SECTIONS_CLAUSE(Name)
-#endif
-#ifndef OPENMP_TASK_CLAUSE
-#  define OPENMP_TASK_CLAUSE(Name)
-#endif
-#ifndef OPENMP_ATOMIC_CLAUSE
-#  define OPENMP_ATOMIC_CLAUSE(Name)
-#endif
-#ifndef OPENMP_TARGET_CLAUSE
-#  define OPENMP_TARGET_CLAUSE(Name)
-#endif
-#ifndef OPENMP_REQUIRES_CLAUSE
-# define OPENMP_REQUIRES_CLAUSE(Name)
-#endif
-#ifndef OPENMP_TARGET_DATA_CLAUSE
-#  define OPENMP_TARGET_DATA_CLAUSE(Name)
-#endif
-#ifndef OPENMP_TARGET_ENTER_DATA_CLAUSE
-#define OPENMP_TARGET_ENTER_DATA_CLAUSE(Name)
-#endif
-#ifndef OPENMP_TARGET_EXIT_DATA_CLAUSE
-#define OPENMP_TARGET_EXIT_DATA_CLAUSE(Name)
-#endif
-#ifndef OPENMP_TARGET_PARALLEL_CLAUSE
-#  define OPENMP_TARGET_PARALLEL_CLAUSE(Name)
-#endif
-#ifndef OPENMP_TARGET_PARALLEL_FOR_CLAUSE
-#  define OPENMP_TARGET_PARALLEL_FOR_CLAUSE(Name)
-#endif
-#ifndef OPENMP_TARGET_UPDATE_CLAUSE
-#  define OPENMP_TARGET_UPDATE_CLAUSE(Name)
-#endif
-#ifndef OPENMP_TEAMS_CLAUSE
-#  define OPENMP_TEAMS_CLAUSE(Name)
-#endif
-#ifndef OPENMP_CANCEL_CLAUSE
-#  define OPENMP_CANCEL_CLAUSE(Name)
-#endif
-#ifndef OPENMP_ORDERED_CLAUSE
-#  define OPENMP_ORDERED_CLAUSE(Name)
-#endif
-#ifndef OPENMP_TASKLOOP_CLAUSE
-#  define OPENMP_TASKLOOP_CLAUSE(Name)
-#endif
-#ifndef OPENMP_TASKLOOP_SIMD_CLAUSE
-#  define OPENMP_TA

[clang-tools-extra] 9e1af17 - [OpenMP][FIX] Add missing cmake dependence needed after 931c0cd713ee

2020-04-06 Thread Johannes Doerfert via cfe-commits

Author: Johannes Doerfert
Date: 2020-04-06T09:01:43-05:00
New Revision: 9e1af172eec9a06bffac337057a2452b88466288

URL: 
https://github.com/llvm/llvm-project/commit/9e1af172eec9a06bffac337057a2452b88466288
DIFF: 
https://github.com/llvm/llvm-project/commit/9e1af172eec9a06bffac337057a2452b88466288.diff

LOG: [OpenMP][FIX] Add missing cmake dependence needed after 931c0cd713ee

Added: 


Modified: 
clang-tools-extra/clang-reorder-fields/CMakeLists.txt

Removed: 




diff  --git a/clang-tools-extra/clang-reorder-fields/CMakeLists.txt 
b/clang-tools-extra/clang-reorder-fields/CMakeLists.txt
index 9c75d785cc9a..c357d0a3cfbf 100644
--- a/clang-tools-extra/clang-reorder-fields/CMakeLists.txt
+++ b/clang-tools-extra/clang-reorder-fields/CMakeLists.txt
@@ -1,4 +1,7 @@
-set(LLVM_LINK_COMPONENTS support)
+set(LLVM_LINK_COMPONENTS
+  FrontendOpenMP
+  support
+)
 
 add_clang_library(clangReorderFields
   ReorderFieldsAction.cpp



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


Re: [clang-tools-extra] 9e1af17 - [OpenMP][FIX] Add missing cmake dependence needed after 931c0cd713ee

2020-04-06 Thread Johannes Doerfert via cfe-commits

On 4/6/20 9:06 AM, Roman Lebedev wrote:

This seems suspicious.


Agreed, especially since this is also not the only place.

I was hoping to unblock the builders with this.



Does clang-reorder-fields actually explicitly needs something from
FrontendOpenMP?
If not, it looks like there dependency is missing elsewhere, or
there's wrong layering.


The root cause is the use of `isAllowedClauseForDirective` in ASTMatchers.h.



On Mon, Apr 6, 2020 at 5:03 PM Johannes Doerfert via cfe-commits
 wrote:


Author: Johannes Doerfert
Date: 2020-04-06T09:01:43-05:00
New Revision: 9e1af172eec9a06bffac337057a2452b88466288

URL: 
https://github.com/llvm/llvm-project/commit/9e1af172eec9a06bffac337057a2452b88466288
DIFF: 
https://github.com/llvm/llvm-project/commit/9e1af172eec9a06bffac337057a2452b88466288.diff

LOG: [OpenMP][FIX] Add missing cmake dependence needed after 931c0cd713ee

Added:


Modified:
 clang-tools-extra/clang-reorder-fields/CMakeLists.txt

Removed:




diff  --git a/clang-tools-extra/clang-reorder-fields/CMakeLists.txt 
b/clang-tools-extra/clang-reorder-fields/CMakeLists.txt
index 9c75d785cc9a..c357d0a3cfbf 100644
--- a/clang-tools-extra/clang-reorder-fields/CMakeLists.txt
+++ b/clang-tools-extra/clang-reorder-fields/CMakeLists.txt
@@ -1,4 +1,7 @@
-set(LLVM_LINK_COMPONENTS support)
+set(LLVM_LINK_COMPONENTS
+  FrontendOpenMP
+  support
+)

  add_clang_library(clangReorderFields
ReorderFieldsAction.cpp



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang-tools-extra] 97aa593 - [OpenMP] Fix layering problem with FrontendOpenMP

2020-04-06 Thread Johannes Doerfert via cfe-commits

Author: Johannes Doerfert
Date: 2020-04-06T13:04:26-05:00
New Revision: 97aa593a8387586095b7eac12974ba2fdd08f4c3

URL: 
https://github.com/llvm/llvm-project/commit/97aa593a8387586095b7eac12974ba2fdd08f4c3
DIFF: 
https://github.com/llvm/llvm-project/commit/97aa593a8387586095b7eac12974ba2fdd08f4c3.diff

LOG: [OpenMP] Fix layering problem with FrontendOpenMP

Summary:
ASTMatchers is used in various places and it now exposes the
LLVMFrontendOpenMP library to its users without them needing to depend
on it explicitly.

Reviewers: lebedev.ri

Subscribers: mgorny, yaxunl, bollu, guansong, martong, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D77574

Added: 


Modified: 
clang-tools-extra/clang-reorder-fields/CMakeLists.txt
clang-tools-extra/clang-tidy/openmp/CMakeLists.txt
clang/lib/ASTMatchers/CMakeLists.txt
clang/lib/StaticAnalyzer/Checkers/CMakeLists.txt
clang/lib/StaticAnalyzer/Core/CMakeLists.txt
clang/lib/Tooling/CMakeLists.txt
clang/lib/Tooling/Transformer/CMakeLists.txt
clang/unittests/AST/CMakeLists.txt
clang/unittests/ASTMatchers/CMakeLists.txt
clang/unittests/ASTMatchers/Dynamic/CMakeLists.txt
clang/unittests/Analysis/CMakeLists.txt
clang/unittests/Rename/CMakeLists.txt
clang/unittests/Sema/CMakeLists.txt
clang/unittests/StaticAnalyzer/CMakeLists.txt
clang/unittests/Tooling/CMakeLists.txt

Removed: 




diff  --git a/clang-tools-extra/clang-reorder-fields/CMakeLists.txt 
b/clang-tools-extra/clang-reorder-fields/CMakeLists.txt
index c357d0a3cfbf..153271ce58e4 100644
--- a/clang-tools-extra/clang-reorder-fields/CMakeLists.txt
+++ b/clang-tools-extra/clang-reorder-fields/CMakeLists.txt
@@ -1,5 +1,4 @@
 set(LLVM_LINK_COMPONENTS
-  FrontendOpenMP
   support
 )
 

diff  --git a/clang-tools-extra/clang-tidy/openmp/CMakeLists.txt 
b/clang-tools-extra/clang-tidy/openmp/CMakeLists.txt
index af95704fd445..ad1f591a6338 100644
--- a/clang-tools-extra/clang-tidy/openmp/CMakeLists.txt
+++ b/clang-tools-extra/clang-tidy/openmp/CMakeLists.txt
@@ -1,5 +1,4 @@
 set(LLVM_LINK_COMPONENTS
-  FrontendOpenMP
   Support)
 
 add_clang_library(clangTidyOpenMPModule

diff  --git a/clang/lib/ASTMatchers/CMakeLists.txt 
b/clang/lib/ASTMatchers/CMakeLists.txt
index cde871cd31ca..1b78d95e4fac 100644
--- a/clang/lib/ASTMatchers/CMakeLists.txt
+++ b/clang/lib/ASTMatchers/CMakeLists.txt
@@ -1,7 +1,6 @@
 add_subdirectory(Dynamic)
 
 set(LLVM_LINK_COMPONENTS
-  FrontendOpenMP
   Support
 )
 
@@ -15,3 +14,5 @@ add_clang_library(clangASTMatchers
   clangBasic
   clangLex
   )
+
+target_link_libraries(clangASTMatchers PUBLIC LLVMFrontendOpenMP)

diff  --git a/clang/lib/StaticAnalyzer/Checkers/CMakeLists.txt 
b/clang/lib/StaticAnalyzer/Checkers/CMakeLists.txt
index bcf2dfdb8326..b7fb0d90c980 100644
--- a/clang/lib/StaticAnalyzer/Checkers/CMakeLists.txt
+++ b/clang/lib/StaticAnalyzer/Checkers/CMakeLists.txt
@@ -1,5 +1,4 @@
 set(LLVM_LINK_COMPONENTS
-  FrontendOpenMP
   Support
   )
 

diff  --git a/clang/lib/StaticAnalyzer/Core/CMakeLists.txt 
b/clang/lib/StaticAnalyzer/Core/CMakeLists.txt
index 057cdd4bb18a..c7c9aa2ff1f4 100644
--- a/clang/lib/StaticAnalyzer/Core/CMakeLists.txt
+++ b/clang/lib/StaticAnalyzer/Core/CMakeLists.txt
@@ -1,5 +1,4 @@
 set(LLVM_LINK_COMPONENTS
-  FrontendOpenMP
   Support
 )
 

diff  --git a/clang/lib/Tooling/CMakeLists.txt 
b/clang/lib/Tooling/CMakeLists.txt
index 71b6cc55e504..59c990daaa29 100644
--- a/clang/lib/Tooling/CMakeLists.txt
+++ b/clang/lib/Tooling/CMakeLists.txt
@@ -1,6 +1,5 @@
 set(LLVM_LINK_COMPONENTS
   Option
-  FrontendOpenMP
   Support
   )
 

diff  --git a/clang/lib/Tooling/Transformer/CMakeLists.txt 
b/clang/lib/Tooling/Transformer/CMakeLists.txt
index 281af1007a65..3d82bb98ff69 100644
--- a/clang/lib/Tooling/Transformer/CMakeLists.txt
+++ b/clang/lib/Tooling/Transformer/CMakeLists.txt
@@ -1,5 +1,4 @@
 set(LLVM_LINK_COMPONENTS
-  FrontendOpenMP
   Support
 )
 

diff  --git a/clang/unittests/AST/CMakeLists.txt 
b/clang/unittests/AST/CMakeLists.txt
index 868635b6eea5..b738ce08d06d 100644
--- a/clang/unittests/AST/CMakeLists.txt
+++ b/clang/unittests/AST/CMakeLists.txt
@@ -1,5 +1,4 @@
 set(LLVM_LINK_COMPONENTS
-  FrontendOpenMP
   Support
   )
 

diff  --git a/clang/unittests/ASTMatchers/CMakeLists.txt 
b/clang/unittests/ASTMatchers/CMakeLists.txt
index e128cfe695a6..aa5438c947f4 100644
--- a/clang/unittests/ASTMatchers/CMakeLists.txt
+++ b/clang/unittests/ASTMatchers/CMakeLists.txt
@@ -1,5 +1,4 @@
 set(LLVM_LINK_COMPONENTS
-  FrontendOpenMP
   Support
   )
 

diff  --git a/clang/unittests/ASTMatchers/Dynamic/CMakeLists.txt 
b/clang/unittests/ASTMatchers/Dynamic/CMakeLists.txt
index 85556b01cae1..c40964dfaf0b 100644
--- a/clang/unittests/ASTMatchers/Dynamic/CMakeLists.txt
+++ b/clang/unittests/ASTMatchers/Dynamic/CMakeLists.txt
@@ -1,5 +1,4 @@
 set(LLVM_LINK_COMPONENTS
-  FrontendOpenMP
   Support
   )
 

diff  --git a/clang

[clang-tools-extra] f9d558c - [OpenMP] "UnFix" layering problem with FrontendOpenMP

2020-04-07 Thread Johannes Doerfert via cfe-commits

Author: Johannes Doerfert
Date: 2020-04-07T14:41:18-05:00
New Revision: f9d558c871337699d2815dbf116bae94025f5d90

URL: 
https://github.com/llvm/llvm-project/commit/f9d558c871337699d2815dbf116bae94025f5d90
DIFF: 
https://github.com/llvm/llvm-project/commit/f9d558c871337699d2815dbf116bae94025f5d90.diff

LOG: [OpenMP] "UnFix" layering problem with FrontendOpenMP

This reverts commit 97aa593a8387586095b7eac12974ba2fdd08f4c3 as it
causes problems (PR45453) https://reviews.llvm.org/D77574#1966321.

This additionally adds an explicit reference to FrontendOpenMP to
clang-tidy where ASTMatchers is used.

This is hopefully just a temporary solution. The dependence on
`FrontendOpenMP` from `ASTMatchers` should be handled by CMake
implicitly, not us explicitly.

Reviewed By: aheejin

Differential Revision: https://reviews.llvm.org/D77666

Added: 


Modified: 
clang-tools-extra/clang-change-namespace/CMakeLists.txt
clang-tools-extra/clang-change-namespace/tool/CMakeLists.txt
clang-tools-extra/clang-doc/CMakeLists.txt
clang-tools-extra/clang-include-fixer/find-all-symbols/CMakeLists.txt
clang-tools-extra/clang-move/CMakeLists.txt
clang-tools-extra/clang-query/CMakeLists.txt
clang-tools-extra/clang-reorder-fields/CMakeLists.txt
clang-tools-extra/clang-tidy/CMakeLists.txt
clang-tools-extra/clang-tidy/abseil/CMakeLists.txt
clang-tools-extra/clang-tidy/android/CMakeLists.txt
clang-tools-extra/clang-tidy/boost/CMakeLists.txt
clang-tools-extra/clang-tidy/bugprone/CMakeLists.txt
clang-tools-extra/clang-tidy/cert/CMakeLists.txt
clang-tools-extra/clang-tidy/cppcoreguidelines/CMakeLists.txt
clang-tools-extra/clang-tidy/darwin/CMakeLists.txt
clang-tools-extra/clang-tidy/fuchsia/CMakeLists.txt
clang-tools-extra/clang-tidy/google/CMakeLists.txt
clang-tools-extra/clang-tidy/hicpp/CMakeLists.txt
clang-tools-extra/clang-tidy/linuxkernel/CMakeLists.txt
clang-tools-extra/clang-tidy/llvm/CMakeLists.txt
clang-tools-extra/clang-tidy/llvmlibc/CMakeLists.txt
clang-tools-extra/clang-tidy/misc/CMakeLists.txt
clang-tools-extra/clang-tidy/modernize/CMakeLists.txt
clang-tools-extra/clang-tidy/mpi/CMakeLists.txt
clang-tools-extra/clang-tidy/objc/CMakeLists.txt
clang-tools-extra/clang-tidy/openmp/CMakeLists.txt
clang-tools-extra/clang-tidy/performance/CMakeLists.txt
clang-tools-extra/clang-tidy/portability/CMakeLists.txt
clang-tools-extra/clang-tidy/readability/CMakeLists.txt
clang-tools-extra/clang-tidy/utils/CMakeLists.txt
clang-tools-extra/clang-tidy/zircon/CMakeLists.txt
clang-tools-extra/clangd/CMakeLists.txt
clang-tools-extra/clangd/unittests/CMakeLists.txt
clang-tools-extra/tool-template/CMakeLists.txt
clang-tools-extra/unittests/clang-change-namespace/CMakeLists.txt
clang-tools-extra/unittests/clang-doc/CMakeLists.txt

clang-tools-extra/unittests/clang-include-fixer/find-all-symbols/CMakeLists.txt
clang-tools-extra/unittests/clang-move/CMakeLists.txt
clang-tools-extra/unittests/clang-query/CMakeLists.txt
clang-tools-extra/unittests/clang-tidy/CMakeLists.txt
clang/lib/ASTMatchers/CMakeLists.txt
clang/lib/StaticAnalyzer/Checkers/CMakeLists.txt
clang/lib/StaticAnalyzer/Core/CMakeLists.txt
clang/lib/Tooling/CMakeLists.txt
clang/lib/Tooling/Transformer/CMakeLists.txt
clang/unittests/AST/CMakeLists.txt
clang/unittests/ASTMatchers/CMakeLists.txt
clang/unittests/ASTMatchers/Dynamic/CMakeLists.txt
clang/unittests/Analysis/CMakeLists.txt
clang/unittests/Rename/CMakeLists.txt
clang/unittests/Sema/CMakeLists.txt
clang/unittests/StaticAnalyzer/CMakeLists.txt
clang/unittests/Tooling/CMakeLists.txt

Removed: 




diff  --git a/clang-tools-extra/clang-change-namespace/CMakeLists.txt 
b/clang-tools-extra/clang-change-namespace/CMakeLists.txt
index 178306423eb7..7c0363cd00d0 100644
--- a/clang-tools-extra/clang-change-namespace/CMakeLists.txt
+++ b/clang-tools-extra/clang-change-namespace/CMakeLists.txt
@@ -1,5 +1,6 @@
 set(LLVM_LINK_COMPONENTS
-  support
+  FrontendOpenMP
+  Support
   )
 
 add_clang_library(clangChangeNamespace

diff  --git a/clang-tools-extra/clang-change-namespace/tool/CMakeLists.txt 
b/clang-tools-extra/clang-change-namespace/tool/CMakeLists.txt
index ae48a5e0f798..c168bb4d5794 100644
--- a/clang-tools-extra/clang-change-namespace/tool/CMakeLists.txt
+++ b/clang-tools-extra/clang-change-namespace/tool/CMakeLists.txt
@@ -1,6 +1,7 @@
 include_directories(${CMAKE_CURRENT_SOURCE_DIR}/..)
 
 set(LLVM_LINK_COMPONENTS
+  FrontendOpenMP
   Support
   )
 

diff  --git a/clang-tools-extra/clang-doc/CMakeLists.txt 
b/clang-tools-extra/clang-doc/CMakeLists.txt
index c301ad5aface..8df7d3ef9098 100644
--- a/clang-tools-extra/clang-doc/CMakeLists.txt
+++ b/clang-tools-extra/clang-doc/CMakeLists.txt
@@ -1,6 +1,7 @@
 set(LLVM_LINK_COMPONENTS
   support
   Bit

[clang-tools-extra] 5303770 - [OpenMP] "UnFix" last layering problem with FrontendOpenMP

2020-04-07 Thread Johannes Doerfert via cfe-commits

Author: Johannes Doerfert
Date: 2020-04-07T22:47:41-05:00
New Revision: 530377018f624eadb8c07650511bbb9ca63608de

URL: 
https://github.com/llvm/llvm-project/commit/530377018f624eadb8c07650511bbb9ca63608de
DIFF: 
https://github.com/llvm/llvm-project/commit/530377018f624eadb8c07650511bbb9ca63608de.diff

LOG: [OpenMP] "UnFix" last layering problem with FrontendOpenMP

It seems one target was missed in D77666 which kept some bots red [0].

[0] 
http://lab.llvm.org:8011/builders/clang-ppc64le-linux-multistage/builds/12079/steps/build%20stage%201/logs/stdio

Added: 


Modified: 
clang-tools-extra/clang-tidy/tool/CMakeLists.txt

Removed: 




diff  --git a/clang-tools-extra/clang-tidy/tool/CMakeLists.txt 
b/clang-tools-extra/clang-tidy/tool/CMakeLists.txt
index 0cd15ddb4653..ff9104b661d0 100644
--- a/clang-tools-extra/clang-tidy/tool/CMakeLists.txt
+++ b/clang-tools-extra/clang-tidy/tool/CMakeLists.txt
@@ -2,6 +2,7 @@ set(LLVM_LINK_COMPONENTS
   AllTargetsAsmParsers
   AllTargetsDescs
   AllTargetsInfos
+  FrontendOpenMP
   support
   )
 



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] a19eb1d - [OpenMP] Add match_{all,any,none} declare variant selector extensions.

2020-04-07 Thread Johannes Doerfert via cfe-commits

Author: Johannes Doerfert
Date: 2020-04-07T23:33:24-05:00
New Revision: a19eb1de726c1ccbf60dca6a1fbcd49b3157282f

URL: 
https://github.com/llvm/llvm-project/commit/a19eb1de726c1ccbf60dca6a1fbcd49b3157282f
DIFF: 
https://github.com/llvm/llvm-project/commit/a19eb1de726c1ccbf60dca6a1fbcd49b3157282f.diff

LOG: [OpenMP] Add match_{all,any,none} declare variant selector extensions.

By default, all traits in the OpenMP context selector have to match for
it to be acceptable. Though, we sometimes want a single property out of
multiple to match (=any) or no match at all (=none). We offer these
choices as extensions via
  `implementation={extension(match_{all,any,none})}`
to the user. The choice will affect the entire context selector not only
the traits following the match property.

The first user will be D75788. There we can replace
```
  #pragma omp begin declare variant match(device={arch(nvptx64)})
  #define __CUDA__

  #include <__clang_cuda_cmath.h>

  // TODO: Hack until we support an extension to the match clause that allows 
"or".
  #undef __CLANG_CUDA_CMATH_H__

  #undef __CUDA__
  #pragma omp end declare variant

  #pragma omp begin declare variant match(device={arch(nvptx)})
  #define __CUDA__

  #include <__clang_cuda_cmath.h>

  #undef __CUDA__
  #pragma omp end declare variant
```
with the much simpler
```
  #pragma omp begin declare variant match(device={arch(nvptx, nvptx64)}, 
implementation={extension(match_any)})
  #define __CUDA__

  #include <__clang_cuda_cmath.h>

  #undef __CUDA__
  #pragma omp end declare variant
```

Reviewed By: mikerice

Differential Revision: https://reviews.llvm.org/D77414

Added: 
clang/test/AST/ast-dump-openmp-declare-variant-extensions-messages.c
clang/test/AST/ast-dump-openmp-declare-variant-extensions.c

Modified: 
clang/include/clang/AST/OpenMPClause.h
clang/include/clang/Basic/AttrDocs.td
clang/include/clang/Basic/DiagnosticParseKinds.td
clang/lib/AST/OpenMPClause.cpp
clang/lib/Parse/ParseOpenMP.cpp
clang/lib/Sema/SemaOpenMP.cpp
clang/test/OpenMP/declare_variant_ast_print.c
clang/test/OpenMP/declare_variant_messages.c
llvm/include/llvm/Frontend/OpenMP/OMPContext.h
llvm/include/llvm/Frontend/OpenMP/OMPKinds.def
llvm/lib/Frontend/OpenMP/OMPContext.cpp

Removed: 




diff  --git a/clang/include/clang/AST/OpenMPClause.h 
b/clang/include/clang/AST/OpenMPClause.h
index 9f5ff5a85182..f276611e3d0c 100644
--- a/clang/include/clang/AST/OpenMPClause.h
+++ b/clang/include/clang/AST/OpenMPClause.h
@@ -7229,11 +7229,9 @@ class OMPTraitInfo {
   /// former is a flat representation the actual main 
diff erence is that the
   /// latter uses clang::Expr to store the score/condition while the former is
   /// independent of clang. Thus, expressions and conditions are evaluated in
-  /// this method. If \p DeviceSetOnly is true, only the device selector set, 
if
-  /// present, is put in \p VMI, otherwise all selector sets are put in \p VMI.
+  /// this method.
   void getAsVariantMatchInfo(ASTContext &ASTCtx,
- llvm::omp::VariantMatchInfo &VMI,
- bool DeviceSetOnly) const;
+ llvm::omp::VariantMatchInfo &VMI) const;
 
   /// Return a string representation identifying this context selector.
   std::string getMangledName() const;

diff  --git a/clang/include/clang/Basic/AttrDocs.td 
b/clang/include/clang/Basic/AttrDocs.td
index 36561c04d395..e3308cd74874 100644
--- a/clang/include/clang/Basic/AttrDocs.td
+++ b/clang/include/clang/Basic/AttrDocs.td
@@ -3394,6 +3394,20 @@ where clause is one of the following:
 and where `variant-func-id` is the name of a function variant that is either a
 base language identifier or, for C++, a template-id.
 
+Clang provides the following context selector extensions, used via 
`implementation={extension(EXTENSION)}`:
+
+  .. code-block:: none
+
+match_all
+match_any
+match_none
+
+The match extensions change when the *entire* context selector is considered a
+match for an OpenMP context. The default is `all`, with `none` no trait in the
+selector is allowed to be in the OpenMP context, with `any` a single trait in
+both the selector and OpenMP context is sufficient. Only a single match
+extension trait is allowed per context selector.
+
   }];
 }
 

diff  --git a/clang/include/clang/Basic/DiagnosticParseKinds.td 
b/clang/include/clang/Basic/DiagnosticParseKinds.td
index f3741531c8b5..a1311d7776c6 100644
--- a/clang/include/clang/Basic/DiagnosticParseKinds.td
+++ b/clang/include/clang/Basic/DiagnosticParseKinds.td
@@ -1313,6 +1313,8 @@ def warn_omp_ctx_incompatible_score_for_property
 def warn_omp_more_one_device_type_clause
 : Warning<"more than one 'device_type' clause is specified">,
   InGroup;
+def err_omp_variant_ctx_second_match_extension : Error<
+  "only a single match extension allowed per OpenMP context s

[clang] eb5a16e - [OpenMP] Specialize OpenMP calls after template instantiation

2020-04-07 Thread Johannes Doerfert via cfe-commits

Author: Johannes Doerfert
Date: 2020-04-07T23:33:24-05:00
New Revision: eb5a16efbf59150af31bd4e3d37b8ea5976d780b

URL: 
https://github.com/llvm/llvm-project/commit/eb5a16efbf59150af31bd4e3d37b8ea5976d780b
DIFF: 
https://github.com/llvm/llvm-project/commit/eb5a16efbf59150af31bd4e3d37b8ea5976d780b.diff

LOG: [OpenMP] Specialize OpenMP calls after template instantiation

As with regular calls, we want to specialize a call that went through
template instantiation if it has an applicable OpenMP declare variant.

Reviewed By: erichkeane, mikerice

Differential Revision: https://reviews.llvm.org/D77290

Added: 
clang/test/AST/ast-dump-openmp-begin-declare-variant_template_1.cpp

Modified: 
clang/lib/Sema/TreeTransform.h

Removed: 




diff  --git a/clang/lib/Sema/TreeTransform.h b/clang/lib/Sema/TreeTransform.h
index e9f4b11ca7bb..a3103205f2bd 100644
--- a/clang/lib/Sema/TreeTransform.h
+++ b/clang/lib/Sema/TreeTransform.h
@@ -2411,8 +2411,8 @@ class TreeTransform {
MultiExprArg Args,
SourceLocation RParenLoc,
Expr *ExecConfig = nullptr) {
-return getSema().BuildCallExpr(/*Scope=*/nullptr, Callee, LParenLoc, Args,
-   RParenLoc, ExecConfig);
+return getSema().ActOnCallExpr(
+/*Scope=*/nullptr, Callee, LParenLoc, Args, RParenLoc, ExecConfig);
   }
 
   /// Build a new member access expression.

diff  --git 
a/clang/test/AST/ast-dump-openmp-begin-declare-variant_template_1.cpp 
b/clang/test/AST/ast-dump-openmp-begin-declare-variant_template_1.cpp
new file mode 100644
index ..6a663d5d75d9
--- /dev/null
+++ b/clang/test/AST/ast-dump-openmp-begin-declare-variant_template_1.cpp
@@ -0,0 +1,170 @@
+// RUN: %clang_cc1 -triple x86_64-unknown-unknown -fopenmp -verify -ast-dump 
%s   | FileCheck %s
+// RUN: %clang_cc1 -triple x86_64-unknown-unknown -fopenmp -verify -ast-dump 
%s -x c++| FileCheck %s
+// expected-no-diagnostics
+
+int also_before() {
+  return 1;
+}
+
+#pragma omp begin declare variant 
match(implementation={vendor(score(100):llvm)})
+int also_after(void) {
+  return 2;
+}
+int also_after(int) {
+  return 3;
+}
+int also_after(double) {
+  return 0;
+}
+#pragma omp end declare variant
+#pragma omp begin declare variant match(implementation={vendor(score(0):llvm)})
+int also_before() {
+  return 0;
+}
+#pragma omp end declare variant
+
+int also_after(void) {
+  return 4;
+}
+int also_after(int) {
+  return 5;
+}
+int also_after(double) {
+  return 6;
+}
+
+template
+int test1() {
+  // Should return 0.
+  return also_after(T(0));
+}
+
+typedef int(*Ty)();
+
+template
+int test2() {
+  // Should return 0.
+  return fn();
+}
+
+int test() {
+  // Should return 0.
+  return test1() + test2();
+}
+
+// CHECK:  |-FunctionDecl [[ADDR_0:0x[a-z0-9]*]] <{{.*}}, line:7:1> 
line:5:5 used also_before 'int ({{.*}})'
+// CHECK-NEXT: | |-CompoundStmt [[ADDR_1:0x[a-z0-9]*]] 
+// CHECK-NEXT: | | `-ReturnStmt [[ADDR_2:0x[a-z0-9]*]] 
+// CHECK-NEXT: | |   `-IntegerLiteral [[ADDR_3:0x[a-z0-9]*]]  'int' 1
+// CHECK-NEXT: | `-OMPDeclareVariantAttr [[ADDR_4:0x[a-z0-9]*]] <> Implicit implementation={vendor(score(0): llvm)}
+// CHECK-NEXT: |   `-DeclRefExpr [[ADDR_5:0x[a-z0-9]*]]  'int 
({{.*}})' Function [[ADDR_6:0x[a-z0-9]*]] 
'also_before[implementation={vendor(llvm)}]' 'int ({{.*}})'
+// CHECK-NEXT: |-FunctionDecl [[ADDR_7:0x[a-z0-9]*]]  col:5 
implicit also_after 'int ({{.*}})'
+// CHECK-NEXT: | `-OMPDeclareVariantAttr [[ADDR_8:0x[a-z0-9]*]] <> Implicit implementation={vendor(score(100): llvm)}
+// CHECK-NEXT: |   `-DeclRefExpr [[ADDR_9:0x[a-z0-9]*]]  'int ({{.*}})' 
Function [[ADDR_10:0x[a-z0-9]*]] 'also_after[implementation={vendor(llvm)}]' 
'int ({{.*}})'
+// CHECK-NEXT: |-FunctionDecl [[ADDR_10]]  line:10:1 
also_after[implementation={vendor(llvm)}] 'int ({{.*}})'
+// CHECK-NEXT: | `-CompoundStmt [[ADDR_11:0x[a-z0-9]*]] 
+// CHECK-NEXT: |   `-ReturnStmt [[ADDR_12:0x[a-z0-9]*]] 
+// CHECK-NEXT: | `-IntegerLiteral [[ADDR_13:0x[a-z0-9]*]]  'int' 2
+// CHECK-NEXT: |-FunctionDecl [[ADDR_14:0x[a-z0-9]*]]  
col:5 implicit also_after 'int (int)'
+// CHECK-NEXT: | |-ParmVarDecl [[ADDR_15:0x[a-z0-9]*]]  col:19 'int'
+// CHECK-NEXT: | `-OMPDeclareVariantAttr [[ADDR_16:0x[a-z0-9]*]] <> Implicit implementation={vendor(score(100): llvm)}
+// CHECK-NEXT: |   `-DeclRefExpr [[ADDR_17:0x[a-z0-9]*]]  'int (int)' 
Function [[ADDR_18:0x[a-z0-9]*]] 'also_after[implementation={vendor(llvm)}]' 
'int (int)'
+// CHECK-NEXT: |-FunctionDecl [[ADDR_18]]  line:13:1 
also_after[implementation={vendor(llvm)}] 'int (int)'
+// CHECK-NEXT: | |-ParmVarDecl [[ADDR_15]]  col:19 'int'
+// CHECK-NEXT: | `-CompoundStmt [[ADDR_19:0x[a-z0-9]*]] 
+// CHECK-NEXT: |   `-ReturnStmt [[ADDR_20:0x[a-z0-9]*]] 
+// CHECK-NEXT: | `-IntegerLiteral [[ADDR_21:0x[a-z0-9]*]]  'int' 3
+// CHECK-NEXT: |-Function

[clang] f85ae05 - [OpenMP] Provide math functions in OpenMP device code via OpenMP variants

2020-04-07 Thread Johannes Doerfert via cfe-commits

Author: Johannes Doerfert
Date: 2020-04-07T23:33:24-05:00
New Revision: f85ae058f580e9d74c4a8f2f0de168c18da6150f

URL: 
https://github.com/llvm/llvm-project/commit/f85ae058f580e9d74c4a8f2f0de168c18da6150f
DIFF: 
https://github.com/llvm/llvm-project/commit/f85ae058f580e9d74c4a8f2f0de168c18da6150f.diff

LOG: [OpenMP] Provide math functions in OpenMP device code via OpenMP variants

For OpenMP target regions to piggy back on the CUDA/AMDGPU/... implementation 
of math functions,
we include the appropriate definitions inside of an `omp begin/end declare 
variant match(device={arch(nvptx)})` scope.
This way, the vendor specific math functions will become specialized versions 
of the system math functions.
When a system math function is called and specialized version is available the 
selection logic introduced in D75779
instead call the specialized version. In contrast to the code path we used so 
far, the system header is actually included.
This means functions without specialized versions are available and so are 
macro definitions.

This should address PR42061, PR42798, and PR42799.

Reviewed By: ye-luo

Differential Revision: https://reviews.llvm.org/D75788

Added: 
clang/lib/Headers/openmp_wrappers/__clang_openmp_device_functions.h
clang/lib/Headers/openmp_wrappers/time.h
clang/test/Headers/Inputs/include/climits
clang/test/Headers/nvptx_device_math_complex.c
clang/test/Headers/nvptx_device_math_macro.cpp
clang/test/Headers/nvptx_device_math_modf.cpp
clang/test/Headers/nvptx_device_math_sin.c
clang/test/Headers/nvptx_device_math_sin.cpp
clang/test/Headers/nvptx_device_math_sin_cos.cpp
clang/test/Headers/nvptx_device_math_sincos.cpp

Modified: 
clang/lib/Driver/ToolChains/Clang.cpp
clang/lib/Headers/CMakeLists.txt
clang/lib/Headers/__clang_cuda_cmath.h
clang/lib/Headers/__clang_cuda_device_functions.h
clang/lib/Headers/__clang_cuda_math.h
clang/lib/Headers/__clang_cuda_math_forward_declares.h
clang/lib/Headers/openmp_wrappers/cmath
clang/lib/Headers/openmp_wrappers/math.h
clang/test/Headers/Inputs/include/cmath
clang/test/Headers/Inputs/include/cstdlib
clang/test/Headers/Inputs/include/math.h
clang/test/Headers/Inputs/include/stdlib.h
clang/test/Headers/nvptx_device_cmath_functions.c
clang/test/Headers/nvptx_device_cmath_functions.cpp
clang/test/Headers/nvptx_device_cmath_functions_cxx17.cpp
clang/test/Headers/nvptx_device_math_functions.c
clang/test/Headers/nvptx_device_math_functions.cpp
clang/test/Headers/nvptx_device_math_functions_cxx17.cpp

Removed: 
clang/lib/Headers/openmp_wrappers/__clang_openmp_math.h
clang/lib/Headers/openmp_wrappers/__clang_openmp_math_declares.h



diff  --git a/clang/lib/Driver/ToolChains/Clang.cpp 
b/clang/lib/Driver/ToolChains/Clang.cpp
index 4d825301be41..2b368131f5cc 100644
--- a/clang/lib/Driver/ToolChains/Clang.cpp
+++ b/clang/lib/Driver/ToolChains/Clang.cpp
@@ -1216,7 +1216,7 @@ void Clang::AddPreprocessingOptions(Compilation &C, const 
JobAction &JA,
 }
 
 CmdArgs.push_back("-include");
-CmdArgs.push_back("__clang_openmp_math_declares.h");
+CmdArgs.push_back("__clang_openmp_device_functions.h");
   }
 
   // Add -i* options, and automatically translate to

diff  --git a/clang/lib/Headers/CMakeLists.txt 
b/clang/lib/Headers/CMakeLists.txt
index 6851957600e0..d6c8ed5e1fc6 100644
--- a/clang/lib/Headers/CMakeLists.txt
+++ b/clang/lib/Headers/CMakeLists.txt
@@ -145,8 +145,7 @@ set(ppc_wrapper_files
 set(openmp_wrapper_files
   openmp_wrappers/math.h
   openmp_wrappers/cmath
-  openmp_wrappers/__clang_openmp_math.h
-  openmp_wrappers/__clang_openmp_math_declares.h
+  openmp_wrappers/__clang_openmp_device_functions.h
   openmp_wrappers/new
 )
 

diff  --git a/clang/lib/Headers/__clang_cuda_cmath.h 
b/clang/lib/Headers/__clang_cuda_cmath.h
index 834a2e3fd134..f406112164e5 100644
--- a/clang/lib/Headers/__clang_cuda_cmath.h
+++ b/clang/lib/Headers/__clang_cuda_cmath.h
@@ -12,7 +12,9 @@
 #error "This file is for CUDA compilation only."
 #endif
 
+#ifndef _OPENMP
 #include 
+#endif
 
 // CUDA lets us use various std math functions on the device side.  This file
 // works in concert with __clang_cuda_math_forward_declares.h to make this 
work.
@@ -31,31 +33,15 @@
 // std covers all of the known knowns.
 
 #ifdef _OPENMP
-#define __DEVICE__ static __attribute__((always_inline))
+#define __DEVICE__ static constexpr __attribute__((always_inline, nothrow))
 #else
 #define __DEVICE__ static __device__ __inline__ __attribute__((always_inline))
 #endif
 
-// For C++ 17 we need to include noexcept attribute to be compatible
-// with the header-defined version. This may be removed once
-// variant is supported.
-#if defined(_OPENMP) && defined(__cplusplus) && __cplusplus >= 201703L
-#define __NOEXCEPT noexcept
-#else
-#define __NOEXCEPT
-#endif
-
-#if !(defined(_OPENMP) && de

[clang] 17d8334 - [OpenMP] Allow to go first in C++-mode in target regions

2020-04-09 Thread Johannes Doerfert via cfe-commits

Author: Johannes Doerfert
Date: 2020-04-09T22:10:31-05:00
New Revision: 17d83342235f01d4b110dc5d4664fe96f6597f11

URL: 
https://github.com/llvm/llvm-project/commit/17d83342235f01d4b110dc5d4664fe96f6597f11
DIFF: 
https://github.com/llvm/llvm-project/commit/17d83342235f01d4b110dc5d4664fe96f6597f11.diff

LOG: [OpenMP] Allow  to go first in C++-mode in target regions

If we are in C++ mode and include  (not ) first, we still
need to make sure  is read first. The problem otherwise is that
we haven't seen the declarations of the math.h functions when the system
math.h includes our cmath overlay. However, our cmath overlay, or better
the underlying overlay, e.g. CUDA, uses the math.h functions. Since we
haven't declared them yet we get errors. CUDA avoids this by eagerly
declaring all math functions (in the __device__ space) but we cannot do
this. Instead we break the dependence by forcing cmath to go first.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D4

Added: 


Modified: 
clang/lib/Headers/openmp_wrappers/math.h
clang/test/Headers/Inputs/include/math.h
clang/test/Headers/nvptx_device_math_sincos.cpp

Removed: 




diff  --git a/clang/lib/Headers/openmp_wrappers/math.h 
b/clang/lib/Headers/openmp_wrappers/math.h
index 1ce22e065c27..e917a149b5c9 100644
--- a/clang/lib/Headers/openmp_wrappers/math.h
+++ b/clang/lib/Headers/openmp_wrappers/math.h
@@ -7,6 +7,19 @@
  *===---===
  */
 
+// If we are in C++ mode and include  (not ) first, we still 
need
+// to make sure  is read first. The problem otherwise is that we haven't
+// seen the declarations of the math.h functions when the system math.h 
includes
+// our cmath overlay. However, our cmath overlay, or better the underlying
+// overlay, e.g. CUDA, uses the math.h functions. Since we haven't declared 
them
+// yet we get errors. CUDA avoids this by eagerly declaring all math functions
+// (in the __device__ space) but we cannot do this. Instead we break the
+// dependence by forcing cmath to go first. While our cmath will in turn 
include
+// this file, the cmath guards will prevent recursion.
+#ifdef __cplusplus
+#include 
+#endif
+
 #ifndef __CLANG_OPENMP_MATH_H__
 #define __CLANG_OPENMP_MATH_H__
 

diff  --git a/clang/test/Headers/Inputs/include/math.h 
b/clang/test/Headers/Inputs/include/math.h
index a60ad45b4d71..b13b14f2b124 100644
--- a/clang/test/Headers/Inputs/include/math.h
+++ b/clang/test/Headers/Inputs/include/math.h
@@ -197,3 +197,7 @@ float ynf(int __a, float __b);
  * math functions.
  */
 #define HUGE_VAL (__builtin_huge_val())
+
+#ifdef __cplusplus
+#include 
+#endif

diff  --git a/clang/test/Headers/nvptx_device_math_sincos.cpp 
b/clang/test/Headers/nvptx_device_math_sincos.cpp
index 5419ee2c3513..cf9b67903bf6 100644
--- a/clang/test/Headers/nvptx_device_math_sincos.cpp
+++ b/clang/test/Headers/nvptx_device_math_sincos.cpp
@@ -1,8 +1,13 @@
 // REQUIRES: nvptx-registered-target
 // RUN: %clang_cc1 -internal-isystem %S/Inputs/include -fopenmp -triple 
powerpc64le-unknown-unknown -fopenmp-targets=nvptx64-nvidia-cuda -emit-llvm-bc 
%s -o %t-ppc-host.bc
-// RUN: %clang_cc1 -internal-isystem %S/../../lib/Headers/openmp_wrappers 
-include __clang_openmp_device_functions.h -internal-isystem %S/Inputs/include 
-fopenmp -triple nvptx64-nvidia-cuda -aux-triple powerpc64le-unknown-unknown 
-fopenmp-targets=nvptx64-nvidia-cuda -emit-llvm %s -fopenmp-is-device 
-fopenmp-host-ir-file-path %t-ppc-host.bc -o - | FileCheck %s
+// RUN: %clang_cc1 -internal-isystem %S/../../lib/Headers/openmp_wrappers  
   -include __clang_openmp_device_functions.h -internal-isystem 
%S/Inputs/include -fopenmp -triple nvptx64-nvidia-cuda -aux-triple 
powerpc64le-unknown-unknown -fopenmp-targets=nvptx64-nvidia-cuda -emit-llvm %s 
-fopenmp-is-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o - | FileCheck %s
+// RUN: %clang_cc1 -internal-isystem %S/../../lib/Headers/openmp_wrappers 
-DCMATH -include __clang_openmp_device_functions.h -internal-isystem 
%S/Inputs/include -fopenmp -triple nvptx64-nvidia-cuda -aux-triple 
powerpc64le-unknown-unknown -fopenmp-targets=nvptx64-nvidia-cuda -emit-llvm %s 
-fopenmp-is-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o - | FileCheck %s
 
+#ifdef CMATH
 #include 
+#else
+#include 
+#endif
 
 // 4 calls to sincos(f), all translated to __nv_sincos calls:
 



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] d999cbc - [OpenMP] Initial support for std::complex in target regions

2020-07-08 Thread Johannes Doerfert via cfe-commits

Author: Johannes Doerfert
Date: 2020-07-08T17:33:59-05:00
New Revision: d999cbc98832154e15e786b98281211d5c1b9f5d

URL: 
https://github.com/llvm/llvm-project/commit/d999cbc98832154e15e786b98281211d5c1b9f5d
DIFF: 
https://github.com/llvm/llvm-project/commit/d999cbc98832154e15e786b98281211d5c1b9f5d.diff

LOG: [OpenMP] Initial support for std::complex in target regions

This simply follows the scheme we have for other wrappers. It resolves
the current link problem, e.g., `__muldc3 not found`, when std::complex
operations are used on a device.

This will not allow complex make math function calls to work properly,
e.g., sin, but that is more complex (pan intended) anyway.

Reviewed By: tra, JonChesterfield

Differential Revision: https://reviews.llvm.org/D80897

Added: 
clang/lib/Headers/openmp_wrappers/complex
clang/lib/Headers/openmp_wrappers/complex.h
clang/test/Headers/Inputs/include/complex
clang/test/Headers/nvptx_device_math_complex.cpp

Modified: 
clang/lib/Headers/CMakeLists.txt
clang/lib/Headers/__clang_cuda_complex_builtins.h
clang/lib/Headers/__clang_cuda_math.h
clang/test/Headers/Inputs/include/cmath
clang/test/Headers/Inputs/include/cstdlib
clang/test/Headers/nvptx_device_math_complex.c

Removed: 




diff  --git a/clang/lib/Headers/CMakeLists.txt 
b/clang/lib/Headers/CMakeLists.txt
index e7bee192d918..0692fe75a441 100644
--- a/clang/lib/Headers/CMakeLists.txt
+++ b/clang/lib/Headers/CMakeLists.txt
@@ -151,6 +151,8 @@ set(ppc_wrapper_files
 set(openmp_wrapper_files
   openmp_wrappers/math.h
   openmp_wrappers/cmath
+  openmp_wrappers/complex.h
+  openmp_wrappers/complex
   openmp_wrappers/__clang_openmp_device_functions.h
   openmp_wrappers/new
 )

diff  --git a/clang/lib/Headers/__clang_cuda_complex_builtins.h 
b/clang/lib/Headers/__clang_cuda_complex_builtins.h
index 576a958b16bb..d698be71d011 100644
--- a/clang/lib/Headers/__clang_cuda_complex_builtins.h
+++ b/clang/lib/Headers/__clang_cuda_complex_builtins.h
@@ -13,10 +13,61 @@
 // This header defines __muldc3, __mulsc3, __divdc3, and __divsc3.  These are
 // libgcc functions that clang assumes are available when compiling c99 complex
 // operations.  (These implementations come from libc++, and have been modified
-// to work with CUDA.)
+// to work with CUDA and OpenMP target offloading [in C and C++ mode].)
 
-extern "C" inline __device__ double _Complex __muldc3(double __a, double __b,
-  double __c, double __d) {
+#pragma push_macro("__DEVICE__")
+#ifdef _OPENMP
+#pragma omp declare target
+#define __DEVICE__ __attribute__((noinline, nothrow, cold))
+#else
+#define __DEVICE__ __device__ inline
+#endif
+
+// Make the algorithms available for C and C++ by selecting the right 
functions.
+#if defined(__cplusplus)
+// TODO: In OpenMP mode we cannot overload isinf/isnan/isfinite the way we
+// overload all other math functions because old math system headers and not
+// always conformant and return an integer instead of a boolean. Until that has
+// been addressed we need to work around it. For now, we substituate with the
+// calls we would have used to implement those three functions. Note that we
+// could use the C alternatives as well.
+#define _ISNANd ::__isnan
+#define _ISNANf ::__isnanf
+#define _ISINFd ::__isinf
+#define _ISINFf ::__isinff
+#define _ISFINITEd ::__isfinited
+#define _ISFINITEf ::__finitef
+#define _COPYSIGNd std::copysign
+#define _COPYSIGNf std::copysign
+#define _SCALBNd std::scalbn
+#define _SCALBNf std::scalbn
+#define _ABSd std::abs
+#define _ABSf std::abs
+#define _LOGBd std::logb
+#define _LOGBf std::logb
+#else
+#define _ISNANd isnan
+#define _ISNANf isnanf
+#define _ISINFd isinf
+#define _ISINFf isinff
+#define _ISFINITEd isfinite
+#define _ISFINITEf isfinitef
+#define _COPYSIGNd copysign
+#define _COPYSIGNf copysignf
+#define _SCALBNd scalbn
+#define _SCALBNf scalbnf
+#define _ABSd abs
+#define _ABSf absf
+#define _LOGBd logb
+#define _LOGBf logbf
+#endif
+
+#if defined(__cplusplus)
+extern "C" {
+#endif
+
+__DEVICE__ double _Complex __muldc3(double __a, double __b, double __c,
+double __d) {
   double __ac = __a * __c;
   double __bd = __b * __d;
   double __ad = __a * __d;
@@ -24,50 +75,49 @@ extern "C" inline __device__ double _Complex 
__muldc3(double __a, double __b,
   double _Complex z;
   __real__(z) = __ac - __bd;
   __imag__(z) = __ad + __bc;
-  if (std::isnan(__real__(z)) && std::isnan(__imag__(z))) {
+  if (_ISNANd(__real__(z)) && _ISNANd(__imag__(z))) {
 int __recalc = 0;
-if (std::isinf(__a) || std::isinf(__b)) {
-  __a = std::copysign(std::isinf(__a) ? 1 : 0, __a);
-  __b = std::copysign(std::isinf(__b) ? 1 : 0, __b);
-  if (std::isnan(__c))
-__c = std::copysign(0, __c);
-  if (std::isnan(__d))
-__d = std::copysign(0, __d);
+if (_ISINFd(__a) |

[clang] e3e47e8 - [OpenMP] Make complex soft-float functions on the GPU weak definitions

2020-07-08 Thread Johannes Doerfert via cfe-commits

Author: Johannes Doerfert
Date: 2020-07-09T01:06:55-05:00
New Revision: e3e47e80355422df2e730cf97a0c80bb6de3915e

URL: 
https://github.com/llvm/llvm-project/commit/e3e47e80355422df2e730cf97a0c80bb6de3915e
DIFF: 
https://github.com/llvm/llvm-project/commit/e3e47e80355422df2e730cf97a0c80bb6de3915e.diff

LOG: [OpenMP] Make complex soft-float functions on the GPU weak definitions

To avoid linkage errors we have to ensure the linkage allows multiple
definitions of these compiler inserted functions. Since they are on the
cold path of complex computations, we want to avoid `inline`. Instead,
we opt for `weak` and `noinline` for now.

Added: 


Modified: 
clang/lib/Headers/__clang_cuda_complex_builtins.h
clang/test/Headers/nvptx_device_math_complex.c
clang/test/Headers/nvptx_device_math_complex.cpp

Removed: 




diff  --git a/clang/lib/Headers/__clang_cuda_complex_builtins.h 
b/clang/lib/Headers/__clang_cuda_complex_builtins.h
index d698be71d011..c48c754ed1a4 100644
--- a/clang/lib/Headers/__clang_cuda_complex_builtins.h
+++ b/clang/lib/Headers/__clang_cuda_complex_builtins.h
@@ -18,7 +18,7 @@
 #pragma push_macro("__DEVICE__")
 #ifdef _OPENMP
 #pragma omp declare target
-#define __DEVICE__ __attribute__((noinline, nothrow, cold))
+#define __DEVICE__ __attribute__((noinline, nothrow, cold, weak))
 #else
 #define __DEVICE__ __device__ inline
 #endif

diff  --git a/clang/test/Headers/nvptx_device_math_complex.c 
b/clang/test/Headers/nvptx_device_math_complex.c
index 9b96b5dd8c22..0e212592dd2b 100644
--- a/clang/test/Headers/nvptx_device_math_complex.c
+++ b/clang/test/Headers/nvptx_device_math_complex.c
@@ -11,10 +11,10 @@
 #include 
 #endif
 
-// CHECK-DAG: define {{.*}} @__mulsc3
-// CHECK-DAG: define {{.*}} @__muldc3
-// CHECK-DAG: define {{.*}} @__divsc3
-// CHECK-DAG: define {{.*}} @__divdc3
+// CHECK-DAG: define weak {{.*}} @__mulsc3
+// CHECK-DAG: define weak {{.*}} @__muldc3
+// CHECK-DAG: define weak {{.*}} @__divsc3
+// CHECK-DAG: define weak {{.*}} @__divdc3
 
 // CHECK-DAG: call float @__nv_scalbnf(
 void test_scmplx(float _Complex a) {

diff  --git a/clang/test/Headers/nvptx_device_math_complex.cpp 
b/clang/test/Headers/nvptx_device_math_complex.cpp
index 15434d907605..58ed24b74b0e 100644
--- a/clang/test/Headers/nvptx_device_math_complex.cpp
+++ b/clang/test/Headers/nvptx_device_math_complex.cpp
@@ -5,10 +5,10 @@
 
 #include 
 
-// CHECK-DAG: define {{.*}} @__mulsc3
-// CHECK-DAG: define {{.*}} @__muldc3
-// CHECK-DAG: define {{.*}} @__divsc3
-// CHECK-DAG: define {{.*}} @__divdc3
+// CHECK-DAG: define weak {{.*}} @__mulsc3
+// CHECK-DAG: define weak {{.*}} @__muldc3
+// CHECK-DAG: define weak {{.*}} @__divsc3
+// CHECK-DAG: define weak {{.*}} @__divdc3
 
 // CHECK-DAG: call float @__nv_scalbnf(
 void test_scmplx(std::complex a) {



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] 7f1e6fc - [OpenMP] Use __OPENMP_NVPTX__ instead of _OPENMP in wrapper headers

2020-07-10 Thread Johannes Doerfert via cfe-commits

Author: Johannes Doerfert
Date: 2020-07-10T18:53:34-05:00
New Revision: 7f1e6fcff9427adfa8efa3bfeeeac801da788b87

URL: 
https://github.com/llvm/llvm-project/commit/7f1e6fcff9427adfa8efa3bfeeeac801da788b87
DIFF: 
https://github.com/llvm/llvm-project/commit/7f1e6fcff9427adfa8efa3bfeeeac801da788b87.diff

LOG: [OpenMP] Use __OPENMP_NVPTX__ instead of _OPENMP in wrapper headers

Due to recent changes we cannot use OpenMP in CUDA files anymore
(PR45533) as the math handling of CUDA is different when _OPENMP is
defined. We actually want this different behavior only if we are
offloading with OpenMP to NVIDIA, thus generating NVPTX. With this patch
we do not interfere with the CUDA math handling except if we are in
NVPTX offloading mode, as indicated by the presence of __OPENMP_NVPTX__.

Reviewed By: tra

Differential Revision: https://reviews.llvm.org/D78155

Added: 


Modified: 
clang/lib/Headers/__clang_cuda_cmath.h
clang/lib/Headers/__clang_cuda_device_functions.h
clang/lib/Headers/__clang_cuda_libdevice_declares.h
clang/lib/Headers/__clang_cuda_math.h
clang/lib/Headers/openmp_wrappers/__clang_openmp_device_functions.h
clang/lib/Headers/openmp_wrappers/cmath
clang/lib/Headers/openmp_wrappers/math.h

Removed: 




diff  --git a/clang/lib/Headers/__clang_cuda_cmath.h 
b/clang/lib/Headers/__clang_cuda_cmath.h
index f406112164e5..8ba182689a4f 100644
--- a/clang/lib/Headers/__clang_cuda_cmath.h
+++ b/clang/lib/Headers/__clang_cuda_cmath.h
@@ -12,7 +12,7 @@
 #error "This file is for CUDA compilation only."
 #endif
 
-#ifndef _OPENMP
+#ifndef __OPENMP_NVPTX__
 #include 
 #endif
 
@@ -32,7 +32,7 @@
 // implementation.  Declaring in the global namespace and pulling into 
namespace
 // std covers all of the known knowns.
 
-#ifdef _OPENMP
+#ifdef __OPENMP_NVPTX__
 #define __DEVICE__ static constexpr __attribute__((always_inline, nothrow))
 #else
 #define __DEVICE__ static __device__ __inline__ __attribute__((always_inline))
@@ -69,7 +69,7 @@ __DEVICE__ float frexp(float __arg, int *__exp) {
 // Windows. For OpenMP we omit these as some old system headers have
 // non-conforming `isinf(float)` and `isnan(float)` implementations that return
 // an `int`. The system versions of these functions should be fine anyway.
-#if !defined(_MSC_VER) && !defined(_OPENMP)
+#if !defined(_MSC_VER) && !defined(__OPENMP_NVPTX__)
 __DEVICE__ bool isinf(float __x) { return ::__isinff(__x); }
 __DEVICE__ bool isinf(double __x) { return ::__isinf(__x); }
 __DEVICE__ bool isfinite(float __x) { return ::__finitef(__x); }
@@ -146,7 +146,7 @@ __DEVICE__ float tanh(float __x) { return ::tanhf(__x); }
 // libdevice doesn't provide an implementation, and we don't want to be in the
 // business of implementing tricky libm functions in this header.
 
-#ifndef _OPENMP
+#ifndef __OPENMP_NVPTX__
 
 // Now we've defined everything we promised we'd define in
 // __clang_cuda_math_forward_declares.h.  We need to do two additional things 
to
@@ -463,7 +463,7 @@ _GLIBCXX_END_NAMESPACE_VERSION
 } // namespace std
 #endif
 
-#endif // _OPENMP
+#endif // __OPENMP_NVPTX__
 
 #undef __DEVICE__
 

diff  --git a/clang/lib/Headers/__clang_cuda_device_functions.h 
b/clang/lib/Headers/__clang_cuda_device_functions.h
index 76c588997f18..f801e5426aa4 100644
--- a/clang/lib/Headers/__clang_cuda_device_functions.h
+++ b/clang/lib/Headers/__clang_cuda_device_functions.h
@@ -10,7 +10,7 @@
 #ifndef __CLANG_CUDA_DEVICE_FUNCTIONS_H__
 #define __CLANG_CUDA_DEVICE_FUNCTIONS_H__
 
-#ifndef _OPENMP
+#ifndef __OPENMP_NVPTX__
 #if CUDA_VERSION < 9000
 #error This file is intended to be used with CUDA-9+ only.
 #endif
@@ -20,7 +20,7 @@
 // we implement in this file. We need static in order to avoid emitting unused
 // functions and __forceinline__ helps inlining these wrappers at -O1.
 #pragma push_macro("__DEVICE__")
-#ifdef _OPENMP
+#ifdef __OPENMP_NVPTX__
 #define __DEVICE__ static __attribute__((always_inline, nothrow))
 #else
 #define __DEVICE__ static __device__ __forceinline__
@@ -1466,14 +1466,14 @@ __DEVICE__ unsigned int __vsubus4(unsigned int __a, 
unsigned int __b) {
 
 // For OpenMP we require the user to include  as we need to know what
 // clock_t is on the system.
-#ifndef _OPENMP
+#ifndef __OPENMP_NVPTX__
 __DEVICE__ /* clock_t= */ int clock() { return __nvvm_read_ptx_sreg_clock(); }
 #endif
 __DEVICE__ long long clock64() { return __nvvm_read_ptx_sreg_clock64(); }
 
 // These functions shouldn't be declared when including this header
 // for math function resolution purposes.
-#ifndef _OPENMP
+#ifndef __OPENMP_NVPTX__
 __DEVICE__ void *memcpy(void *__a, const void *__b, size_t __c) {
   return __builtin_memcpy(__a, __b, __c);
 }

diff  --git a/clang/lib/Headers/__clang_cuda_libdevice_declares.h 
b/clang/lib/Headers/__clang_cuda_libdevice_declares.h
index 4d70353394c8..6173b589e3ef 100644
--- a/clang/lib/Headers/__clang_cuda_libdevice_declares.h
+++ b/cl

[clang] cd0ea03 - [OpenMP][NFC] Remove unused and untested code from the device runtime

2020-07-10 Thread Johannes Doerfert via cfe-commits

Author: Johannes Doerfert
Date: 2020-07-10T19:09:41-05:00
New Revision: cd0ea03e6f157e8fb477cd8368b29e1448eeb265

URL: 
https://github.com/llvm/llvm-project/commit/cd0ea03e6f157e8fb477cd8368b29e1448eeb265
DIFF: 
https://github.com/llvm/llvm-project/commit/cd0ea03e6f157e8fb477cd8368b29e1448eeb265.diff

LOG: [OpenMP][NFC] Remove unused and untested code from the device runtime

Summary:
We carried a lot of unused and untested code in the device runtime.
Among other reasons, we are planning major rewrites for which reduced
size is going to help a lot.

The number of code lines reduced by 14%!

Before:
---
Language files  blankcomment   code
---
CUDA13489841   2454
C/C++ Header14322493   1377
C   12117124559
CMake4 64 64262
C++  1  6  6 39
---
SUM:44998   1528   4691
---

After:
---
Language files  blankcomment   code
---
CUDA13366733   1879
C/C++ Header14317484   1293
C   12117124559
CMake4 64 64262
C++  1  6  6 39
---
SUM:44870   1411   4032
---

Reviewers: hfinkel, jhuber6, fghanim, JonChesterfield, grokos, AndreyChurbanov, 
ye-luo, tianshilei1992, ggeorgakoudis, Hahnfeld, ABataev, hbae, ronlieb, 
gregrodgers

Subscribers: jvesely, yaxunl, bollu, guansong, jfb, sstefan1, aaron.ballman, 
openmp-commits, cfe-commits

Tags: #clang, #openmp

Differential Revision: https://reviews.llvm.org/D83349

Added: 


Modified: 
clang/test/OpenMP/nvptx_target_simd_codegen.cpp
openmp/libomptarget/deviceRTLs/amdgcn/src/target_impl.h
openmp/libomptarget/deviceRTLs/common/omptarget.h
openmp/libomptarget/deviceRTLs/common/src/data_sharing.cu
openmp/libomptarget/deviceRTLs/common/src/libcall.cu
openmp/libomptarget/deviceRTLs/common/src/loop.cu
openmp/libomptarget/deviceRTLs/common/src/omptarget.cu
openmp/libomptarget/deviceRTLs/common/src/parallel.cu
openmp/libomptarget/deviceRTLs/common/src/reduction.cu
openmp/libomptarget/deviceRTLs/common/src/support.cu
openmp/libomptarget/deviceRTLs/common/src/sync.cu
openmp/libomptarget/deviceRTLs/common/support.h
openmp/libomptarget/deviceRTLs/interface.h
openmp/libomptarget/deviceRTLs/nvptx/src/target_impl.h

Removed: 




diff  --git a/clang/test/OpenMP/nvptx_target_simd_codegen.cpp 
b/clang/test/OpenMP/nvptx_target_simd_codegen.cpp
index 073d6fa2f14e..7a1f01c1f1ad 100644
--- a/clang/test/OpenMP/nvptx_target_simd_codegen.cpp
+++ b/clang/test/OpenMP/nvptx_target_simd_codegen.cpp
@@ -78,7 +78,6 @@ int bar(int n){
 // CHECK: call void @__kmpc_spmd_kernel_init(i32 %{{.+}}, i16 0, i16 0)
 // CHECK-NOT: call void @__kmpc_for_static_init
 // CHECK-NOT: call void @__kmpc_for_static_fini
-// CHECK-NOT: call i32 @__kmpc_nvptx_simd_reduce_nowait(
 // CHECK-NOT: call void @__kmpc_nvptx_end_reduce_nowait(
 // CHECK: call void @__kmpc_spmd_kernel_deinit_v2(i16 0)
 // CHECK: ret void

diff  --git a/openmp/libomptarget/deviceRTLs/amdgcn/src/target_impl.h 
b/openmp/libomptarget/deviceRTLs/amdgcn/src/target_impl.h
index 77a0ffb54f95..3c90b39282c9 100644
--- a/openmp/libomptarget/deviceRTLs/amdgcn/src/target_impl.h
+++ b/openmp/libomptarget/deviceRTLs/amdgcn/src/target_impl.h
@@ -140,8 +140,6 @@ DEVICE int GetNumberOfThreadsInBlock();
 DEVICE unsigned GetWarpId();
 DEVICE unsigned GetLaneId();
 
-DEVICE bool __kmpc_impl_is_first_active_thread();
-
 // Locks
 DEVICE void __kmpc_impl_init_lock(omp_lock_t *lock);
 DEVICE void __kmpc_impl_destroy_lock(omp_lock_t *lock);

diff  --git a/openmp/libomptarget/deviceRTLs/common/omptarget.h 
b/openmp/libomptarget/deviceRTLs/common/ompta

[clang] b5667d0 - [OpenMP][CUDA] Fix std::complex in GPU regions

2020-07-10 Thread Johannes Doerfert via cfe-commits

Author: Johannes Doerfert
Date: 2020-07-11T00:40:05-05:00
New Revision: b5667d00e0447747419a783697b84a37f59ce055

URL: 
https://github.com/llvm/llvm-project/commit/b5667d00e0447747419a783697b84a37f59ce055
DIFF: 
https://github.com/llvm/llvm-project/commit/b5667d00e0447747419a783697b84a37f59ce055.diff

LOG: [OpenMP][CUDA] Fix std::complex in GPU regions

The old way worked to some degree for C++-mode but in C mode we actually
tried to introduce variants of macros (e.g., isinf). To make both modes
work reliably we get rid of those extra variants and directly use NVIDIA
intrinsics in the complex implementation. While this has to be revisited
as we add other GPU targets which want to reuse the code, it should be
fine for now.

Reviewed By: tra, JonChesterfield, yaxunl

Differential Revision: https://reviews.llvm.org/D83591

Added: 


Modified: 
clang/lib/Headers/__clang_cuda_complex_builtins.h
clang/lib/Headers/__clang_cuda_math.h
clang/test/Headers/nvptx_device_math_complex.c
clang/test/Headers/nvptx_device_math_complex.cpp

Removed: 




diff  --git a/clang/lib/Headers/__clang_cuda_complex_builtins.h 
b/clang/lib/Headers/__clang_cuda_complex_builtins.h
index c48c754ed1a4..8c10ff6b461f 100644
--- a/clang/lib/Headers/__clang_cuda_complex_builtins.h
+++ b/clang/lib/Headers/__clang_cuda_complex_builtins.h
@@ -23,20 +23,16 @@
 #define __DEVICE__ __device__ inline
 #endif
 
-// Make the algorithms available for C and C++ by selecting the right 
functions.
-#if defined(__cplusplus)
-// TODO: In OpenMP mode we cannot overload isinf/isnan/isfinite the way we
-// overload all other math functions because old math system headers and not
-// always conformant and return an integer instead of a boolean. Until that has
-// been addressed we need to work around it. For now, we substituate with the
-// calls we would have used to implement those three functions. Note that we
-// could use the C alternatives as well.
-#define _ISNANd ::__isnan
-#define _ISNANf ::__isnanf
-#define _ISINFd ::__isinf
-#define _ISINFf ::__isinff
-#define _ISFINITEd ::__isfinited
-#define _ISFINITEf ::__finitef
+// To make the algorithms available for C and C++ in CUDA and OpenMP we select
+// 
diff erent but equivalent function versions. TODO: For OpenMP we currently
+// select the native builtins as the overload support for templates is lacking.
+#if !defined(_OPENMP)
+#define _ISNANd std::isnan
+#define _ISNANf std::isnan
+#define _ISINFd std::isinf
+#define _ISINFf std::isinf
+#define _ISFINITEd std::isfinite
+#define _ISFINITEf std::isfinite
 #define _COPYSIGNd std::copysign
 #define _COPYSIGNf std::copysign
 #define _SCALBNd std::scalbn
@@ -46,20 +42,20 @@
 #define _LOGBd std::logb
 #define _LOGBf std::logb
 #else
-#define _ISNANd isnan
-#define _ISNANf isnanf
-#define _ISINFd isinf
-#define _ISINFf isinff
-#define _ISFINITEd isfinite
-#define _ISFINITEf isfinitef
-#define _COPYSIGNd copysign
-#define _COPYSIGNf copysignf
-#define _SCALBNd scalbn
-#define _SCALBNf scalbnf
-#define _ABSd abs
-#define _ABSf absf
-#define _LOGBd logb
-#define _LOGBf logbf
+#define _ISNANd __nv_isnand
+#define _ISNANf __nv_isnanf
+#define _ISINFd __nv_isinfd
+#define _ISINFf __nv_isinff
+#define _ISFINITEd __nv_isfinited
+#define _ISFINITEf __nv_finitef
+#define _COPYSIGNd __nv_copysign
+#define _COPYSIGNf __nv_copysignf
+#define _SCALBNd __nv_scalbn
+#define _SCALBNf __nv_scalbnf
+#define _ABSd __nv_fabs
+#define _ABSf __nv_fabsf
+#define _LOGBd __nv_logb
+#define _LOGBf __nv_logbf
 #endif
 
 #if defined(__cplusplus)

diff  --git a/clang/lib/Headers/__clang_cuda_math.h 
b/clang/lib/Headers/__clang_cuda_math.h
index 2e8e6ae71d9c..332e616702ac 100644
--- a/clang/lib/Headers/__clang_cuda_math.h
+++ b/clang/lib/Headers/__clang_cuda_math.h
@@ -340,16 +340,6 @@ __DEVICE__ float y1f(float __a) { return __nv_y1f(__a); }
 __DEVICE__ double yn(int __a, double __b) { return __nv_yn(__a, __b); }
 __DEVICE__ float ynf(int __a, float __b) { return __nv_ynf(__a, __b); }
 
-// In C++ mode OpenMP takes the system versions of these because some math
-// headers provide the wrong return type. This cannot happen in C and we can 
and
-// want to use the specialized versions right away.
-#if defined(_OPENMP) && !defined(__cplusplus)
-__DEVICE__ int isinff(float __x) { return __nv_isinff(__x); }
-__DEVICE__ int isinf(double __x) { return __nv_isinfd(__x); }
-__DEVICE__ int isnanf(float __x) { return __nv_isnanf(__x); }
-__DEVICE__ int isnan(double __x) { return __nv_isnand(__x); }
-#endif
-
 #pragma pop_macro("__DEVICE__")
 #pragma pop_macro("__DEVICE_VOID__")
 #pragma pop_macro("__FAST_OR_SLOW")

diff  --git a/clang/test/Headers/nvptx_device_math_complex.c 
b/clang/test/Headers/nvptx_device_math_complex.c
index 0e212592dd2b..6e3e8bffbd24 100644
--- a/clang/test/Headers/nvptx_device_math_complex.c
+++ b/clang/test/Headers/nvptx_device_math_complex.c
@@ -11,12 +11,34 @@
 #include 

[clang] c986995 - [OpenMP][NFC] Remove unused (always fixed) arguments

2020-07-10 Thread Johannes Doerfert via cfe-commits

Author: Johannes Doerfert
Date: 2020-07-11T00:51:51-05:00
New Revision: c98699582a6333bbe76ff7853b4cd6beb45754cf

URL: 
https://github.com/llvm/llvm-project/commit/c98699582a6333bbe76ff7853b4cd6beb45754cf
DIFF: 
https://github.com/llvm/llvm-project/commit/c98699582a6333bbe76ff7853b4cd6beb45754cf.diff

LOG: [OpenMP][NFC] Remove unused (always fixed) arguments

There are various runtime calls in the device runtime with unused, or
always fixed, arguments. This is bad for all sorts of reasons. Clean up
two before as we match them in OpenMPOpt now.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D83268

Added: 


Modified: 
clang/lib/CodeGen/CGOpenMPRuntimeNVPTX.cpp
clang/test/OpenMP/nvptx_data_sharing.cpp
clang/test/OpenMP/nvptx_parallel_codegen.cpp
clang/test/OpenMP/nvptx_target_codegen.cpp
clang/test/OpenMP/nvptx_target_teams_codegen.cpp
clang/test/OpenMP/nvptx_target_teams_distribute_codegen.cpp
llvm/include/llvm/Frontend/OpenMP/OMPKinds.def
openmp/libomptarget/deviceRTLs/common/src/parallel.cu
openmp/libomptarget/deviceRTLs/interface.h

Removed: 




diff  --git a/clang/lib/CodeGen/CGOpenMPRuntimeNVPTX.cpp 
b/clang/lib/CodeGen/CGOpenMPRuntimeNVPTX.cpp
index cabd06bd76e8..cbd443134e7a 100644
--- a/clang/lib/CodeGen/CGOpenMPRuntimeNVPTX.cpp
+++ b/clang/lib/CodeGen/CGOpenMPRuntimeNVPTX.cpp
@@ -38,11 +38,9 @@ enum OpenMPRTLFunctionNVPTX {
   /// Call to void __kmpc_spmd_kernel_deinit_v2(int16_t RequiresOMPRuntime);
   OMPRTL_NVPTX__kmpc_spmd_kernel_deinit_v2,
   /// Call to void __kmpc_kernel_prepare_parallel(void
-  /// *outlined_function, int16_t
-  /// IsOMPRuntimeInitialized);
+  /// *outlined_function);
   OMPRTL_NVPTX__kmpc_kernel_prepare_parallel,
-  /// Call to bool __kmpc_kernel_parallel(void **outlined_function,
-  /// int16_t IsOMPRuntimeInitialized);
+  /// Call to bool __kmpc_kernel_parallel(void **outlined_function);
   OMPRTL_NVPTX__kmpc_kernel_parallel,
   /// Call to void __kmpc_kernel_end_parallel();
   OMPRTL_NVPTX__kmpc_kernel_end_parallel,
@@ -1466,8 +1464,7 @@ void CGOpenMPRuntimeNVPTX::emitWorkerLoop(CodeGenFunction 
&CGF,
   CGF.InitTempAlloca(WorkFn, llvm::Constant::getNullValue(CGF.Int8PtrTy));
 
   // TODO: Optimize runtime initialization and pass in correct value.
-  llvm::Value *Args[] = {WorkFn.getPointer(),
- /*RequiresOMPRuntime=*/Bld.getInt16(1)};
+  llvm::Value *Args[] = {WorkFn.getPointer()};
   llvm::Value *Ret = CGF.EmitRuntimeCall(
   createNVPTXRuntimeFunction(OMPRTL_NVPTX__kmpc_kernel_parallel), Args);
   Bld.CreateStore(Bld.CreateZExt(Ret, CGF.Int8Ty), ExecStatus);
@@ -1595,17 +1592,16 @@ 
CGOpenMPRuntimeNVPTX::createNVPTXRuntimeFunction(unsigned Function) {
   }
   case OMPRTL_NVPTX__kmpc_kernel_prepare_parallel: {
 /// Build void __kmpc_kernel_prepare_parallel(
-/// void *outlined_function, int16_t IsOMPRuntimeInitialized);
-llvm::Type *TypeParams[] = {CGM.Int8PtrTy, CGM.Int16Ty};
+/// void *outlined_function);
+llvm::Type *TypeParams[] = {CGM.Int8PtrTy};
 auto *FnTy =
 llvm::FunctionType::get(CGM.VoidTy, TypeParams, /*isVarArg*/ false);
 RTLFn = CGM.CreateRuntimeFunction(FnTy, "__kmpc_kernel_prepare_parallel");
 break;
   }
   case OMPRTL_NVPTX__kmpc_kernel_parallel: {
-/// Build bool __kmpc_kernel_parallel(void **outlined_function,
-/// int16_t IsOMPRuntimeInitialized);
-llvm::Type *TypeParams[] = {CGM.Int8PtrPtrTy, CGM.Int16Ty};
+/// Build bool __kmpc_kernel_parallel(void **outlined_function);
+llvm::Type *TypeParams[] = {CGM.Int8PtrPtrTy};
 llvm::Type *RetTy = CGM.getTypes().ConvertType(CGM.getContext().BoolTy);
 auto *FnTy =
 llvm::FunctionType::get(RetTy, TypeParams, /*isVarArg*/ false);
@@ -2569,7 +2565,7 @@ void CGOpenMPRuntimeNVPTX::emitNonSPMDParallelCall(
 llvm::Value *ID = Bld.CreateBitOrPointerCast(WFn, CGM.Int8PtrTy);
 
 // Prepare for parallel region. Indicate the outlined function.
-llvm::Value *Args[] = {ID, /*RequiresOMPRuntime=*/Bld.getInt16(1)};
+llvm::Value *Args[] = {ID};
 CGF.EmitRuntimeCall(
 createNVPTXRuntimeFunction(OMPRTL_NVPTX__kmpc_kernel_prepare_parallel),
 Args);

diff  --git a/clang/test/OpenMP/nvptx_data_sharing.cpp 
b/clang/test/OpenMP/nvptx_data_sharing.cpp
index 2ee6bd2b4701..1372246c7fc8 100644
--- a/clang/test/OpenMP/nvptx_data_sharing.cpp
+++ b/clang/test/OpenMP/nvptx_data_sharing.cpp
@@ -55,7 +55,7 @@ void test_ds(){
 // CK1: [[A:%.+]] = getelementptr inbounds %struct._globalized_locals_ty, 
%struct._globalized_locals_ty* [[GLOBALSTACK2]], i32 0, i32 0
 // CK1: [[B:%.+]] = getelementptr inbounds %struct._globalized_locals_ty, 
%struct._globalized_locals_ty* [[GLOBALSTACK2]], i32 0, i32 1
 // CK1: store i32 10, i32* [[A]]
-// CK1: call void @__kmpc_kernel_prepare_parallel({{.*}}, i16 1)
+// CK1: call void @__kmpc_kernel_prepare_parallel({

[clang] fec1f21 - [OpenMP] Emit remarks during GPU state machine optimization

2020-07-14 Thread Johannes Doerfert via cfe-commits

Author: Johannes Doerfert
Date: 2020-07-14T22:33:57-05:00
New Revision: fec1f2109f33c9a1a7650272b3bfb8f0f81f6a2b

URL: 
https://github.com/llvm/llvm-project/commit/fec1f2109f33c9a1a7650272b3bfb8f0f81f6a2b
DIFF: 
https://github.com/llvm/llvm-project/commit/fec1f2109f33c9a1a7650272b3bfb8f0f81f6a2b.diff

LOG: [OpenMP] Emit remarks during GPU state machine optimization

Since D83271 we can optimize the GPU state machine to avoid spurious
call edges that increase the register usage of kernels. With this patch
we inform the user why and if this optimization is happening and when it
is not.

Reviewed By: ye-luo

Differential Revision: https://reviews.llvm.org/D83707

Added: 
clang/test/OpenMP/remarks_parallel_in_multiple_target_state_machines.c
clang/test/OpenMP/remarks_parallel_in_target_state_machine.c

Modified: 
llvm/lib/Transforms/IPO/OpenMPOpt.cpp

Removed: 




diff  --git 
a/clang/test/OpenMP/remarks_parallel_in_multiple_target_state_machines.c 
b/clang/test/OpenMP/remarks_parallel_in_multiple_target_state_machines.c
new file mode 100644
index ..c5152d401c8b
--- /dev/null
+++ b/clang/test/OpenMP/remarks_parallel_in_multiple_target_state_machines.c
@@ -0,0 +1,102 @@
+// RUN: %clang_cc1 -verify=host
  -Rpass=openmp -fopenmp -x c++ 
-triple powerpc64le-unknown-unknown -fopenmp-targets=nvptx64-nvidia-cuda 
-emit-llvm-bc %s -o %t-ppc-host.bc
+// RUN: %clang_cc1 -verify=all,safe
  -Rpass=openmp -fopenmp -O2 -x c++ 
-triple nvptx64-unknown-unknown -fopenmp-targets=nvptx64-nvidia-cuda -emit-llvm 
%s -fopenmp-is-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o %t.out
+// RUN: %clang_cc1 -fexperimental-new-pass-manager -verify=all,safe
  -Rpass=openmp -fopenmp -O2 -x c++ 
-triple nvptx64-unknown-unknown -fopenmp-targets=nvptx64-nvidia-cuda -emit-llvm 
%s -fopenmp-is-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o %t.out
+
+// host-no-diagnostics
+
+void bar1(void) {
+#pragma omp parallel // #0
+ // all-remark@#0 {{Found a parallel region that is called 
in a target region but not part of a combined target construct nor nesed inside 
a target construct without intermediate code. This can lead to excessive 
register usage for unrelated target regions in the same translation unit due to 
spurious call edges assumed by ptxas.}}
+ // safe-remark@#0 {{Parallel region is not known to be 
called from a unique single target region, maybe the surrounding function has 
external linkage?; will not attempt to rewrite the state machine use.}}
+ // force-remark@#0 {{[UNSAFE] Parallel region is not 
known to be called from a unique single target region, maybe the surrounding 
function has external linkage?; will rewrite the state machine use due to 
command line flag, this can lead to undefined behavior if the parallel region 
is called from a target region outside this translation unit.}}
+ // force-remark@#0 {{Specialize parallel region that is 
only reached from a single target region to avoid spurious call edges and 
excessive register usage in other target regions. (parallel region ID: 
__omp_outlined__2_wrapper, kernel ID: }}
+  {
+  }
+}
+void bar2(void) {
+#pragma omp parallel // #1
+ // all-remark@#1 {{Found a parallel region that is called 
in a target region but not part of a combined target construct nor nesed inside 
a target construct without intermediate code. This can lead to excessive 
register usage for unrelated target regions in the same translation unit due to 
spurious call edges assumed by ptxas.}}
+ // safe-remark@#1 {{Parallel region is not known to be 
called from a unique single target region, maybe the surrounding function has 
external linkage?; will not attempt to rewrite the state machine use.}}
+ // force-remark@#1 {{[UNSAFE] Parallel region is not 
known to be called from a unique single target region, maybe the surrounding 
function has external linkage?; will rewrite the state machine use due to 
command line flag, this can lead to undefined behavior if the parallel region 
is called from a target region outside this translation unit.}}
+ // force-remark@#1 {{Specialize parallel region that is 
only reached from a single target region to avoid spurious call edges and 
excessive register usage in other target regions. (parallel region ID: 
__omp_outlined__6_wrapper, kernel ID: }}
+  {
+  }
+}
+
+void foo1(void) {
+#pragma omp target teams // #2
+ // all-remark@#2 {{Target region containing the 
parallel region that is specialized. (parallel region ID: 
__omp_outlined__1_w

[clang] 7af287d - [OpenMP][IRBuilder] Support nested parallel regions

2020-07-14 Thread Johannes Doerfert via cfe-commits

Author: Johannes Doerfert
Date: 2020-07-14T22:39:06-05:00
New Revision: 7af287d0d921471f18b5c3054ce42381c0f973ed

URL: 
https://github.com/llvm/llvm-project/commit/7af287d0d921471f18b5c3054ce42381c0f973ed
DIFF: 
https://github.com/llvm/llvm-project/commit/7af287d0d921471f18b5c3054ce42381c0f973ed.diff

LOG: [OpenMP][IRBuilder] Support nested parallel regions

During code generation we might change/add basic blocks so keeping a
list of them is fairly easy to break. Nested parallel regions were
enough. The new scheme does recompute the list of blocks to be outlined
once it is needed.

Reviewed By: anchu-rajendran

Differential Revision: https://reviews.llvm.org/D82722

Added: 
clang/test/OpenMP/irbuilder_nested_openmp_parallel_empty.c

Modified: 
clang/test/OpenMP/cancel_codegen.cpp
llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h
llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp

Removed: 




diff  --git a/clang/test/OpenMP/cancel_codegen.cpp 
b/clang/test/OpenMP/cancel_codegen.cpp
index b7d1cea56721..a21a9db1e39a 100644
--- a/clang/test/OpenMP/cancel_codegen.cpp
+++ b/clang/test/OpenMP/cancel_codegen.cpp
@@ -175,7 +175,7 @@ for (int i = 0; i < argc; ++i) {
 
 // IRBUILDER: define internal void @main
 
-// IRBUILDER: [[RETURN:omp.par.exit[^:]*]]
+// IRBUILDER: [[RETURN:omp.par.outlined.exit[^:]*]]
 // IRBUILDER-NEXT: ret void
 // IRBUILDER: [[FLAG:%.+]] = load float, float* @{{.+}},
 
@@ -192,10 +192,8 @@ for (int i = 0; i < argc; ++i) {
 // IRBUILDER: [[CMP:%.+]] = icmp eq i32 [[RES]], 0
 // IRBUILDER: br i1 [[CMP]], label %[[CONTINUE:[^,].+]], label %[[EXIT:.+]]
 // IRBUILDER: [[EXIT]]
-// IRBUILDER: br label %[[EXIT2:.+]]
-// IRBUILDER: [[CONTINUE]]
-// IRBUILDER: br label %[[ELSE:.+]]
-// IRBUILDER: [[EXIT2]]
 // IRBUILDER: br label %[[RETURN]]
+// IRBUILDER: [[CONTINUE]]
+// IRBUILDER: br label %[[ELSE2:.+]]
 
 #endif

diff  --git a/clang/test/OpenMP/irbuilder_nested_openmp_parallel_empty.c 
b/clang/test/OpenMP/irbuilder_nested_openmp_parallel_empty.c
new file mode 100644
index ..552455eb9779
--- /dev/null
+++ b/clang/test/OpenMP/irbuilder_nested_openmp_parallel_empty.c
@@ -0,0 +1,110 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py
+// RUN: %clang_cc1 -verify -fopenmp -fopenmp-enable-irbuilder -x c++ 
-emit-llvm %s -triple x86_64-unknown-unknown -fexceptions -fcxx-exceptions -o - 
| FileCheck %s --check-prefixes=ALL,IRBUILDER
+//  %clang_cc1 -fopenmp -fopenmp-enable-irbuilder -x c++ -std=c++11 -triple 
x86_64-unknown-unknown -fexceptions -fcxx-exceptions -emit-pch -o /tmp/t1 %s
+//  %clang_cc1 -fopenmp -fopenmp-enable-irbuilder -x c++ -triple 
x86_64-unknown-unknown -fexceptions -fcxx-exceptions -debug-info-kind=limited 
-std=c++11 -include-pch /tmp/t1 -verify %s -emit-llvm -o - | FileCheck 
--check-prefixes=ALL-DEBUG,IRBUILDER-DEBUG %s
+
+// expected-no-diagnostics
+
+// TODO: Teach the update script to check new functions too.
+
+#ifndef HEADER
+#define HEADER
+
+// ALL-LABEL: @_Z17nested_parallel_0v(
+// ALL-NEXT:  entry:
+// ALL-NEXT:[[OMP_GLOBAL_THREAD_NUM:%.*]] = call i32 
@__kmpc_global_thread_num(%struct.ident_t* @1)
+// ALL-NEXT:br label [[OMP_PARALLEL:%.*]]
+// ALL:   omp_parallel:
+// ALL-NEXT:call void (%struct.ident_t*, i32, void (i32*, i32*, ...)*, 
...) @__kmpc_fork_call(%struct.ident_t* @1, i32 0, void (i32*, i32*, ...)* 
bitcast (void (i32*, i32*)* @_Z17nested_parallel_0v..omp_par.1 to void (i32*, 
i32*, ...)*))
+// ALL-NEXT:br label [[OMP_PAR_OUTLINED_EXIT12:%.*]]
+// ALL:   omp.par.outlined.exit12:
+// ALL-NEXT:br label [[OMP_PAR_EXIT_SPLIT:%.*]]
+// ALL:   omp.par.exit.split:
+// ALL-NEXT:ret void
+//
+void nested_parallel_0(void) {
+#pragma omp parallel
+  {
+#pragma omp parallel
+{
+}
+  }
+}
+
+// ALL-LABEL: @_Z17nested_parallel_1Pfid(
+// ALL-NEXT:  entry:
+// ALL-NEXT:[[R_ADDR:%.*]] = alloca float*, align 8
+// ALL-NEXT:[[A_ADDR:%.*]] = alloca i32, align 4
+// ALL-NEXT:[[B_ADDR:%.*]] = alloca double, align 8
+// ALL-NEXT:store float* [[R:%.*]], float** [[R_ADDR]], align 8
+// ALL-NEXT:store i32 [[A:%.*]], i32* [[A_ADDR]], align 4
+// ALL-NEXT:store double [[B:%.*]], double* [[B_ADDR]], align 8
+// ALL-NEXT:[[OMP_GLOBAL_THREAD_NUM:%.*]] = call i32 
@__kmpc_global_thread_num(%struct.ident_t* @1)
+// ALL-NEXT:br label [[OMP_PARALLEL:%.*]]
+// ALL:   omp_parallel:
+// ALL-NEXT:call void (%struct.ident_t*, i32, void (i32*, i32*, ...)*, 
...) @__kmpc_fork_call(%struct.ident_t* @1, i32 3, void (i32*, i32*, ...)* 
bitcast (void (i32*, i32*, i32*, double*, float**)* 
@_Z17nested_parallel_1Pfid..omp_par.2 to void (i32*, i32*, ...)*), i32* 
[[A_ADDR]], double* [[B_ADDR]], float** [[R_ADDR]])
+// ALL-NEXT:br label [[OMP_PAR_OUTLINED_EXIT13:%.*]]
+// ALL:   omp.par.outlined.exit13:
+// ALL-NEXT:br label [[OMP_PAR_EXIT_SPLIT:%.*]]
+// ALL:   omp.par.exit.sp

[clang] d87c92e - [OpenMP][FIX] Check only for deterministic part of a generated function name

2020-07-14 Thread Johannes Doerfert via cfe-commits

Author: Johannes Doerfert
Date: 2020-07-14T22:48:22-05:00
New Revision: d87c92e5a2eca620903ce53592ccbe4f8807abe1

URL: 
https://github.com/llvm/llvm-project/commit/d87c92e5a2eca620903ce53592ccbe4f8807abe1
DIFF: 
https://github.com/llvm/llvm-project/commit/d87c92e5a2eca620903ce53592ccbe4f8807abe1.diff

LOG: [OpenMP][FIX] Check only for deterministic part of a generated function 
name

Added: 


Modified: 
clang/test/OpenMP/remarks_parallel_in_multiple_target_state_machines.c
clang/test/OpenMP/remarks_parallel_in_target_state_machine.c

Removed: 




diff  --git 
a/clang/test/OpenMP/remarks_parallel_in_multiple_target_state_machines.c 
b/clang/test/OpenMP/remarks_parallel_in_multiple_target_state_machines.c
index c5152d401c8b..163f0b92468a 100644
--- a/clang/test/OpenMP/remarks_parallel_in_multiple_target_state_machines.c
+++ b/clang/test/OpenMP/remarks_parallel_in_multiple_target_state_machines.c
@@ -25,18 +25,18 @@ void bar2(void) {
 
 void foo1(void) {
 #pragma omp target teams // #2
- // all-remark@#2 {{Target region containing the 
parallel region that is specialized. (parallel region ID: 
__omp_outlined__1_wrapper, kernel ID: __omp_offloading_22}}
- // all-remark@#2 {{Target region containing the 
parallel region that is specialized. (parallel region ID: 
__omp_outlined__3_wrapper, kernel ID: __omp_offloading_22}}
+ // all-remark@#2 {{Target region containing the 
parallel region that is specialized. (parallel region ID: 
__omp_outlined__1_wrapper, kernel ID: __omp_offloading}}
+ // all-remark@#2 {{Target region containing the 
parallel region that is specialized. (parallel region ID: 
__omp_outlined__3_wrapper, kernel ID: __omp_offloading}}
   {
 #pragma omp parallel // #3
  // all-remark@#3 {{Found a parallel region that is called 
in a target region but not part of a combined target construct nor nesed inside 
a target construct without intermediate code. This can lead to excessive 
register usage for unrelated target regions in the same translation unit due to 
spurious call edges assumed by ptxas.}}
- // all-remark@#3 {{Specialize parallel region that is 
only reached from a single target region to avoid spurious call edges and 
excessive register usage in other target regions. (parallel region ID: 
__omp_outlined__1_wrapper, kernel ID: __omp_offloading_22}}
+ // all-remark@#3 {{Specialize parallel region that is 
only reached from a single target region to avoid spurious call edges and 
excessive register usage in other target regions. (parallel region ID: 
__omp_outlined__1_wrapper, kernel ID: __omp_offloading}}
 {
 }
 bar1();
 #pragma omp parallel // #4
  // all-remark@#4 {{Found a parallel region that is called 
in a target region but not part of a combined target construct nor nesed inside 
a target construct without intermediate code. This can lead to excessive 
register usage for unrelated target regions in the same translation unit due to 
spurious call edges assumed by ptxas.}}
- // all-remark@#4 {{Specialize parallel region that is 
only reached from a single target region to avoid spurious call edges and 
excessive register usage in other target regions. (parallel region ID: 
__omp_outlined__3_wrapper, kernel ID: __omp_offloading_22}}
+ // all-remark@#4 {{Specialize parallel region that is 
only reached from a single target region to avoid spurious call edges and 
excessive register usage in other target regions. (parallel region ID: 
__omp_outlined__3_wrapper, kernel ID: __omp_offloading}}
 {
 }
   }
@@ -44,19 +44,19 @@ void foo1(void) {
 
 void foo2(void) {
 #pragma omp target teams // #5
- // all-remark@#5 {{Target region containing the 
parallel region that is specialized. (parallel region ID: 
__omp_outlined__5_wrapper, kernel ID: __omp_offloading_22}}
- // all-remark@#5 {{Target region containing the 
parallel region that is specialized. (parallel region ID: 
__omp_outlined__7_wrapper, kernel ID: __omp_offloading_22}}
+ // all-remark@#5 {{Target region containing the 
parallel region that is specialized. (parallel region ID: 
__omp_outlined__5_wrapper, kernel ID: __omp_offloading}}
+ // all-remark@#5 {{Target region containing the 
parallel region that is specialized. (parallel region ID: 
__omp_outlined__7_wrapper, kernel ID: __omp_offloading}}
   {
 #pragma omp parallel // #6
  // all-remark@#6 {{Found a parallel region that is called 
in a target region but not part of a combined target construct nor nesed inside 
a target construct without intermediate code. This can lead to excessive 
register usage for unrelated target regions in the 

[clang] 97ce7fd - [UpdateTestChecks] Match unnamed values like "@[0-9]+" and "![0-9]+"

2020-08-11 Thread Johannes Doerfert via cfe-commits

Author: Johannes Doerfert
Date: 2020-08-12T01:04:16-05:00
New Revision: 97ce7fd89fcc92d84c1938108388f735d55d372c

URL: 
https://github.com/llvm/llvm-project/commit/97ce7fd89fcc92d84c1938108388f735d55d372c
DIFF: 
https://github.com/llvm/llvm-project/commit/97ce7fd89fcc92d84c1938108388f735d55d372c.diff

LOG: [UpdateTestChecks] Match unnamed values like "@[0-9]+" and "![0-9]+"

With this patch we will match most *uses* of "temporary" named things in
the IR via regular expressions, not their name at creation time. The new
"values" we match are:
  - "unnamed" globals: `@[0-9]+`
  - debug metadata: `!dbg ![0-9]+`
  - loop metadata: `!loop ![0-9]+`
  - tbaa metadata: `!tbaa ![0-9]+`
  - range metadata: `!range ![0-9]+`
  - generic metadata: `metadata ![0-9]+`
  - attributes groups: `#[0-9]`

We still don't match the declarations but that can be done later. This
patch can introduce churn when existing check lines contain the old
hardcoded versions of the above "values". We can add a flag to opt-out,
or opt-in, if necessary.

Reviewed By: arichardson, MaskRay

Differential Revision: https://reviews.llvm.org/D85099

Added: 

llvm/test/tools/UpdateTestChecks/update_test_checks/Inputs/various_ir_values.ll

llvm/test/tools/UpdateTestChecks/update_test_checks/Inputs/various_ir_values.ll.expected

llvm/test/tools/UpdateTestChecks/update_test_checks/Inputs/various_ir_values.ll.funcsig.expected
llvm/test/tools/UpdateTestChecks/update_test_checks/various_ir_values.test

Modified: 
clang/test/utils/update_cc_test_checks/Inputs/basic-cplusplus.cpp.expected

clang/test/utils/update_cc_test_checks/Inputs/mangled_names.c.funcsig.expected

llvm/test/tools/UpdateTestChecks/update_test_checks/Inputs/check_attrs.ll.funcattrs.expected

llvm/test/tools/UpdateTestChecks/update_test_checks/Inputs/check_attrs.ll.plain.expected

llvm/test/tools/UpdateTestChecks/update_test_checks/Inputs/scrub_attrs.ll.plain.expected
llvm/utils/UpdateTestChecks/asm.py
llvm/utils/UpdateTestChecks/common.py
llvm/utils/update_cc_test_checks.py
llvm/utils/update_test_checks.py

Removed: 




diff  --git 
a/clang/test/utils/update_cc_test_checks/Inputs/basic-cplusplus.cpp.expected 
b/clang/test/utils/update_cc_test_checks/Inputs/basic-cplusplus.cpp.expected
index 48ee67a7165a..8095b10d7877 100644
--- a/clang/test/utils/update_cc_test_checks/Inputs/basic-cplusplus.cpp.expected
+++ b/clang/test/utils/update_cc_test_checks/Inputs/basic-cplusplus.cpp.expected
@@ -44,7 +44,7 @@ Foo::Foo(int x) : x(x) {}
 // CHECK-NEXT:[[THIS_ADDR:%.*]] = alloca %class.Foo*, align 8
 // CHECK-NEXT:store %class.Foo* [[THIS:%.*]], %class.Foo** [[THIS_ADDR]], 
align 8
 // CHECK-NEXT:[[THIS1:%.*]] = load %class.Foo*, %class.Foo** 
[[THIS_ADDR]], align 8
-// CHECK-NEXT:call void @_ZN3FooD2Ev(%class.Foo* [[THIS1]]) #2
+// CHECK-NEXT:call void @_ZN3FooD2Ev(%class.Foo* [[THIS1]]) [[ATTR2:#.*]]
 // CHECK-NEXT:ret void
 //
 Foo::~Foo() {}
@@ -70,7 +70,7 @@ int Foo::function_defined_out_of_line(int arg) const { return 
x - arg; }
 // CHECK-NEXT:call void @_ZN3FooC1Ei(%class.Foo* [[F]], i32 1)
 // CHECK-NEXT:[[CALL:%.*]] = call i32 
@_ZNK3Foo23function_defined_inlineEi(%class.Foo* [[F]], i32 2)
 // CHECK-NEXT:[[CALL1:%.*]] = call i32 
@_ZNK3Foo28function_defined_out_of_lineEi(%class.Foo* [[F]], i32 3)
-// CHECK-NEXT:call void @_ZN3FooD1Ev(%class.Foo* [[F]]) #2
+// CHECK-NEXT:call void @_ZN3FooD1Ev(%class.Foo* [[F]]) [[ATTR2]]
 // CHECK-NEXT:ret i32 0
 //
 int main() {

diff  --git 
a/clang/test/utils/update_cc_test_checks/Inputs/mangled_names.c.funcsig.expected
 
b/clang/test/utils/update_cc_test_checks/Inputs/mangled_names.c.funcsig.expected
index e76cf074bdb7..313bd5bcef7c 100644
--- 
a/clang/test/utils/update_cc_test_checks/Inputs/mangled_names.c.funcsig.expected
+++ 
b/clang/test/utils/update_cc_test_checks/Inputs/mangled_names.c.funcsig.expected
@@ -3,7 +3,7 @@
 // RUN: %clang_cc1 -triple=x86_64-unknown-linux-gnu -emit-llvm -o - %s | 
FileCheck %s
 
 // CHECK-LABEL: define {{[^@]+}}@test
-// CHECK-SAME: (i64 [[A:%.*]], i32 [[B:%.*]]) #0
+// CHECK-SAME: (i64 [[A:%.*]], i32 [[B:%.*]]) [[ATTR0:#.*]]
 // CHECK-NEXT:  entry:
 // CHECK-NEXT:[[A_ADDR:%.*]] = alloca i64, align 8
 // CHECK-NEXT:[[B_ADDR:%.*]] = alloca i32, align 4
@@ -21,7 +21,7 @@ long test(long a, int b) {
 
 // A function with a mangled name
 // CHECK-LABEL: define {{[^@]+}}@_Z4testlii
-// CHECK-SAME: (i64 [[A:%.*]], i32 [[B:%.*]], i32 [[C:%.*]]) #0
+// CHECK-SAME: (i64 [[A:%.*]], i32 [[B:%.*]], i32 [[C:%.*]]) [[ATTR0]]
 // CHECK-NEXT:  entry:
 // CHECK-NEXT:[[A_ADDR:%.*]] = alloca i64, align 8
 // CHECK-NEXT:[[B_ADDR:%.*]] = alloca i32, align 4

diff  --git 
a/llvm/test/tools/UpdateTestChecks/update_test_checks/Inputs/check_attrs.ll.funcattrs.expected
 
b/llvm/test/tools/UpdateTestChecks/update_test_checks/Inputs/check_attrs.ll.fun

[clang] 07c3348 - [OpenMP][NFC] Update test check lines with new script version

2020-08-14 Thread Johannes Doerfert via cfe-commits

Author: Johannes Doerfert
Date: 2020-08-14T08:59:25-05:00
New Revision: 07c33487faff3067953d61e5e968b6c3d1b845d6

URL: 
https://github.com/llvm/llvm-project/commit/07c33487faff3067953d61e5e968b6c3d1b845d6
DIFF: 
https://github.com/llvm/llvm-project/commit/07c33487faff3067953d61e5e968b6c3d1b845d6.diff

LOG: [OpenMP][NFC] Update test check lines with new script version

Added: 


Modified: 
clang/test/OpenMP/irbuilder_nested_parallel_for.c

Removed: 




diff  --git a/clang/test/OpenMP/irbuilder_nested_parallel_for.c 
b/clang/test/OpenMP/irbuilder_nested_parallel_for.c
index 929a92827689..2ca6fe711e28 100644
--- a/clang/test/OpenMP/irbuilder_nested_parallel_for.c
+++ b/clang/test/OpenMP/irbuilder_nested_parallel_for.c
@@ -11,10 +11,10 @@
 
 // CHECK-LABEL: @_Z14parallel_for_0v(
 // CHECK-NEXT:  entry:
-// CHECK-NEXT:[[OMP_GLOBAL_THREAD_NUM:%.*]] = call i32 
@__kmpc_global_thread_num(%struct.ident_t* @1)
+// CHECK-NEXT:[[OMP_GLOBAL_THREAD_NUM:%.*]] = call i32 
@__kmpc_global_thread_num(%struct.ident_t* [[GLOB1:@.*]])
 // CHECK-NEXT:br label [[OMP_PARALLEL:%.*]]
 // CHECK:   omp_parallel:
-// CHECK-NEXT:call void (%struct.ident_t*, i32, void (i32*, i32*, ...)*, 
...) @__kmpc_fork_call(%struct.ident_t* @1, i32 0, void (i32*, i32*, ...)* 
bitcast (void (i32*, i32*)* @_Z14parallel_for_0v..omp_par to void (i32*, i32*, 
...)*))
+// CHECK-NEXT:call void (%struct.ident_t*, i32, void (i32*, i32*, ...)*, 
...) @__kmpc_fork_call(%struct.ident_t* [[GLOB1]], i32 0, void (i32*, i32*, 
...)* bitcast (void (i32*, i32*)* @_Z14parallel_for_0v..omp_par to void (i32*, 
i32*, ...)*))
 // CHECK-NEXT:br label [[OMP_PAR_OUTLINED_EXIT:%.*]]
 // CHECK:   omp.par.outlined.exit:
 // CHECK-NEXT:br label [[OMP_PAR_EXIT_SPLIT:%.*]]
@@ -23,15 +23,15 @@
 //
 // CHECK-DEBUG-LABEL: @_Z14parallel_for_0v(
 // CHECK-DEBUG-NEXT:  entry:
-// CHECK-DEBUG-NEXT:[[OMP_GLOBAL_THREAD_NUM:%.*]] = call i32 
@__kmpc_global_thread_num(%struct.ident_t* @1), !dbg !{{[0-9]*}}
+// CHECK-DEBUG-NEXT:[[OMP_GLOBAL_THREAD_NUM:%.*]] = call i32 
@__kmpc_global_thread_num(%struct.ident_t* [[GLOB1:@.*]]), [[DBG10:!dbg !.*]]
 // CHECK-DEBUG-NEXT:br label [[OMP_PARALLEL:%.*]]
 // CHECK-DEBUG:   omp_parallel:
-// CHECK-DEBUG-NEXT:call void (%struct.ident_t*, i32, void (i32*, i32*, 
...)*, ...) @__kmpc_fork_call(%struct.ident_t* @1, i32 0, void (i32*, i32*, 
...)* bitcast (void (i32*, i32*)* @_Z14parallel_for_0v..omp_par to void (i32*, 
i32*, ...)*)), !dbg !{{[0-9]*}}
+// CHECK-DEBUG-NEXT:call void (%struct.ident_t*, i32, void (i32*, i32*, 
...)*, ...) @__kmpc_fork_call(%struct.ident_t* [[GLOB1]], i32 0, void (i32*, 
i32*, ...)* bitcast (void (i32*, i32*)* @_Z14parallel_for_0v..omp_par to void 
(i32*, i32*, ...)*)), [[DBG11:!dbg !.*]]
 // CHECK-DEBUG-NEXT:br label [[OMP_PAR_OUTLINED_EXIT:%.*]]
 // CHECK-DEBUG:   omp.par.outlined.exit:
 // CHECK-DEBUG-NEXT:br label [[OMP_PAR_EXIT_SPLIT:%.*]]
 // CHECK-DEBUG:   omp.par.exit.split:
-// CHECK-DEBUG-NEXT:ret void, !dbg !{{[0-9]*}}
+// CHECK-DEBUG-NEXT:ret void, [[DBG14:!dbg !.*]]
 //
 void parallel_for_0(void) {
 #pragma omp parallel
@@ -50,10 +50,10 @@ void parallel_for_0(void) {
 // CHECK-NEXT:store float* [[R:%.*]], float** [[R_ADDR]], align 8
 // CHECK-NEXT:store i32 [[A:%.*]], i32* [[A_ADDR]], align 4
 // CHECK-NEXT:store double [[B:%.*]], double* [[B_ADDR]], align 8
-// CHECK-NEXT:[[OMP_GLOBAL_THREAD_NUM:%.*]] = call i32 
@__kmpc_global_thread_num(%struct.ident_t* @1)
+// CHECK-NEXT:[[OMP_GLOBAL_THREAD_NUM:%.*]] = call i32 
@__kmpc_global_thread_num(%struct.ident_t* [[GLOB1]])
 // CHECK-NEXT:br label [[OMP_PARALLEL:%.*]]
 // CHECK:   omp_parallel:
-// CHECK-NEXT:call void (%struct.ident_t*, i32, void (i32*, i32*, ...)*, 
...) @__kmpc_fork_call(%struct.ident_t* @1, i32 3, void (i32*, i32*, ...)* 
bitcast (void (i32*, i32*, i32*, double*, float**)* 
@_Z14parallel_for_1Pfid..omp_par.1 to void (i32*, i32*, ...)*), i32* 
[[A_ADDR]], double* [[B_ADDR]], float** [[R_ADDR]])
+// CHECK-NEXT:call void (%struct.ident_t*, i32, void (i32*, i32*, ...)*, 
...) @__kmpc_fork_call(%struct.ident_t* [[GLOB1]], i32 3, void (i32*, i32*, 
...)* bitcast (void (i32*, i32*, i32*, double*, float**)* 
@_Z14parallel_for_1Pfid..omp_par.1 to void (i32*, i32*, ...)*), i32* 
[[A_ADDR]], double* [[B_ADDR]], float** [[R_ADDR]])
 // CHECK-NEXT:br label [[OMP_PAR_OUTLINED_EXIT19:%.*]]
 // CHECK:   omp.par.outlined.exit19:
 // CHECK-NEXT:br label [[OMP_PAR_EXIT_SPLIT:%.*]]
@@ -66,20 +66,20 @@ void parallel_for_0(void) {
 // CHECK-DEBUG-NEXT:[[A_ADDR:%.*]] = alloca i32, align 4
 // CHECK-DEBUG-NEXT:[[B_ADDR:%.*]] = alloca double, align 8
 // CHECK-DEBUG-NEXT:store float* [[R:%.*]], float** [[R_ADDR]], align 8
-// CHECK-DEBUG-NEXT:call void @llvm.dbg.declare(metadata float** 
[[R_ADDR]], metadata !{{[0-9]*}}, metadata !DIExpre

[clang] 95a25e4 - [OpenMP][FIX] Do not use TBAA in type punning reduction GPU code PR46156

2020-08-16 Thread Johannes Doerfert via cfe-commits

Author: Johannes Doerfert
Date: 2020-08-16T14:38:31-05:00
New Revision: 95a25e4c3203f35e9f57f9fac620b4a21bffd6e1

URL: 
https://github.com/llvm/llvm-project/commit/95a25e4c3203f35e9f57f9fac620b4a21bffd6e1
DIFF: 
https://github.com/llvm/llvm-project/commit/95a25e4c3203f35e9f57f9fac620b4a21bffd6e1.diff

LOG: [OpenMP][FIX] Do not use TBAA in type punning reduction GPU code PR46156

When we implement OpenMP GPU reductions we use type punning a lot during
the shuffle and reduce operations. This is not always compatible with
language rules on aliasing. So far we generated TBAA which later allowed
to remove some of the reduce code as accesses and initialization were
"known to not alias". With this patch we avoid TBAA in this step,
hopefully for all accesses that we need to.

Verified on the reproducer of PR46156 and QMCPack.

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D86037

Added: 
clang/test/OpenMP/nvptx_target_parallel_reduction_codegen_tbaa_PR46146.cpp

Modified: 
clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp

Removed: 




diff  --git a/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp 
b/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
index 1b2608e9854a..d9ef6c2a1078 100644
--- a/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
+++ b/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
@@ -2845,8 +2845,12 @@ static llvm::Value *castValueToType(CodeGenFunction 
&CGF, llvm::Value *Val,
   Address CastItem = CGF.CreateMemTemp(CastTy);
   Address ValCastItem = CGF.Builder.CreatePointerBitCastOrAddrSpaceCast(
   CastItem, Val->getType()->getPointerTo(CastItem.getAddressSpace()));
-  CGF.EmitStoreOfScalar(Val, ValCastItem, /*Volatile=*/false, ValTy);
-  return CGF.EmitLoadOfScalar(CastItem, /*Volatile=*/false, CastTy, Loc);
+  CGF.EmitStoreOfScalar(Val, ValCastItem, /*Volatile=*/false, ValTy,
+LValueBaseInfo(AlignmentSource::Type),
+TBAAAccessInfo());
+  return CGF.EmitLoadOfScalar(CastItem, /*Volatile=*/false, CastTy, Loc,
+  LValueBaseInfo(AlignmentSource::Type),
+  TBAAAccessInfo());
 }
 
 /// This function creates calls to one of two shuffle functions to copy
@@ -2933,9 +2937,14 @@ static void shuffleAndStore(CodeGenFunction &CGF, 
Address SrcAddr,
ThenBB, ExitBB);
   CGF.EmitBlock(ThenBB);
   llvm::Value *Res = createRuntimeShuffleFunction(
-  CGF, CGF.EmitLoadOfScalar(Ptr, /*Volatile=*/false, IntType, Loc),
+  CGF,
+  CGF.EmitLoadOfScalar(Ptr, /*Volatile=*/false, IntType, Loc,
+   LValueBaseInfo(AlignmentSource::Type),
+   TBAAAccessInfo()),
   IntType, Offset, Loc);
-  CGF.EmitStoreOfScalar(Res, ElemPtr, /*Volatile=*/false, IntType);
+  CGF.EmitStoreOfScalar(Res, ElemPtr, /*Volatile=*/false, IntType,
+LValueBaseInfo(AlignmentSource::Type),
+TBAAAccessInfo());
   Address LocalPtr = Bld.CreateConstGEP(Ptr, 1);
   Address LocalElemPtr = Bld.CreateConstGEP(ElemPtr, 1);
   PhiSrc->addIncoming(LocalPtr.getPointer(), ThenBB);
@@ -2944,9 +2953,14 @@ static void shuffleAndStore(CodeGenFunction &CGF, 
Address SrcAddr,
   CGF.EmitBlock(ExitBB);
 } else {
   llvm::Value *Res = createRuntimeShuffleFunction(
-  CGF, CGF.EmitLoadOfScalar(Ptr, /*Volatile=*/false, IntType, Loc),
+  CGF,
+  CGF.EmitLoadOfScalar(Ptr, /*Volatile=*/false, IntType, Loc,
+   LValueBaseInfo(AlignmentSource::Type),
+   TBAAAccessInfo()),
   IntType, Offset, Loc);
-  CGF.EmitStoreOfScalar(Res, ElemPtr, /*Volatile=*/false, IntType);
+  CGF.EmitStoreOfScalar(Res, ElemPtr, /*Volatile=*/false, IntType,
+LValueBaseInfo(AlignmentSource::Type),
+TBAAAccessInfo());
   Ptr = Bld.CreateConstGEP(Ptr, 1);
   ElemPtr = Bld.CreateConstGEP(ElemPtr, 1);
 }
@@ -3100,12 +3114,14 @@ static void emitReductionListCopy(
 } else {
   switch (CGF.getEvaluationKind(Private->getType())) {
   case TEK_Scalar: {
-llvm::Value *Elem =
-CGF.EmitLoadOfScalar(SrcElementAddr, /*Volatile=*/false,
- Private->getType(), Private->getExprLoc());
+llvm::Value *Elem = CGF.EmitLoadOfScalar(
+SrcElementAddr, /*Volatile=*/false, Private->getType(),
+Private->getExprLoc(), LValueBaseInfo(AlignmentSource::Type),
+TBAAAccessInfo());
 // Store the source element value to the dest element address.
-CGF.EmitStoreOfScalar(Elem, DestElementAddr, /*Volatile=*/false,
-  Private->getType());
+CGF.EmitStoreOfScalar(
+Elem, DestElementAddr, /*Volatile=*/false, Private->getType(),
+ 

[clang] abe71b7 - [OpenMP][NFC] Delete dead code

2023-11-06 Thread Johannes Doerfert via cfe-commits

Author: Johannes Doerfert
Date: 2023-11-06T11:50:41-08:00
New Revision: abe71b77f9ac2fc689e899fa6aa2486120c53912

URL: 
https://github.com/llvm/llvm-project/commit/abe71b77f9ac2fc689e899fa6aa2486120c53912
DIFF: 
https://github.com/llvm/llvm-project/commit/abe71b77f9ac2fc689e899fa6aa2486120c53912.diff

LOG: [OpenMP][NFC] Delete dead code

Added: 


Modified: 
clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp

Removed: 




diff  --git a/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp 
b/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
index 335ccec6455fc46..229668c8ba5db20 100644
--- a/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
+++ b/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
@@ -1574,11 +1574,6 @@ enum CopyAction : unsigned {
   RemoteLaneToThread,
   // ThreadCopy: Make a copy of a Reduce list on the thread's stack.
   ThreadCopy,
-  // ThreadToScratchpad: Copy a team-reduced array to the scratchpad.
-  ThreadToScratchpad,
-  // ScratchpadToThread: Copy from a scratchpad array in global memory
-  // containing team-reduced data to a thread's stack.
-  ScratchpadToThread,
 };
 } // namespace
 
@@ -1600,13 +1595,10 @@ static void emitReductionListCopy(
   CGBuilderTy &Bld = CGF.Builder;
 
   llvm::Value *RemoteLaneOffset = CopyOptions.RemoteLaneOffset;
-  llvm::Value *ScratchpadIndex = CopyOptions.ScratchpadIndex;
-  llvm::Value *ScratchpadWidth = CopyOptions.ScratchpadWidth;
 
   // Iterates, element-by-element, through the source Reduce list and
   // make a copy.
   unsigned Idx = 0;
-  unsigned Size = Privates.size();
   for (const Expr *Private : Privates) {
 Address SrcElementAddr = Address::invalid();
 Address DestElementAddr = Address::invalid();
@@ -1616,10 +1608,6 @@ static void emitReductionListCopy(
 // Set to true to update the pointer in the dest Reduce list to a
 // newly created element.
 bool UpdateDestListPtr = false;
-// Increment the src or dest pointer to the scratchpad, for each
-// new element.
-bool IncrScratchpadSrc = false;
-bool IncrScratchpadDest = false;
 QualType PrivatePtrType = C.getPointerType(Private->getType());
 llvm::Type *PrivateLlvmPtrType = CGF.ConvertType(PrivatePtrType);
 
@@ -1655,49 +1643,6 @@ static void emitReductionListCopy(
   PrivatePtrType->castAs());
   break;
 }
-case ThreadToScratchpad: {
-  // Step 1.1: Get the address for the src element in the Reduce list.
-  Address SrcElementPtrAddr = Bld.CreateConstArrayGEP(SrcBase, Idx);
-  SrcElementAddr = CGF.EmitLoadOfPointer(
-  SrcElementPtrAddr.withElementType(PrivateLlvmPtrType),
-  PrivatePtrType->castAs());
-
-  // Step 1.2: Get the address for dest element:
-  // address = base + index * ElementSizeInChars.
-  llvm::Value *ElementSizeInChars = CGF.getTypeSize(Private->getType());
-  llvm::Value *CurrentOffset =
-  Bld.CreateNUWMul(ElementSizeInChars, ScratchpadIndex);
-  llvm::Value *ScratchPadElemAbsolutePtrVal =
-  Bld.CreateNUWAdd(DestBase.getPointer(), CurrentOffset);
-  ScratchPadElemAbsolutePtrVal =
-  Bld.CreateIntToPtr(ScratchPadElemAbsolutePtrVal, CGF.VoidPtrTy);
-  DestElementAddr = Address(ScratchPadElemAbsolutePtrVal, CGF.Int8Ty,
-C.getTypeAlignInChars(Private->getType()));
-  IncrScratchpadDest = true;
-  break;
-}
-case ScratchpadToThread: {
-  // Step 1.1: Get the address for the src element in the scratchpad.
-  // address = base + index * ElementSizeInChars.
-  llvm::Value *ElementSizeInChars = CGF.getTypeSize(Private->getType());
-  llvm::Value *CurrentOffset =
-  Bld.CreateNUWMul(ElementSizeInChars, ScratchpadIndex);
-  llvm::Value *ScratchPadElemAbsolutePtrVal =
-  Bld.CreateNUWAdd(SrcBase.getPointer(), CurrentOffset);
-  ScratchPadElemAbsolutePtrVal =
-  Bld.CreateIntToPtr(ScratchPadElemAbsolutePtrVal, CGF.VoidPtrTy);
-  SrcElementAddr = Address(ScratchPadElemAbsolutePtrVal, CGF.Int8Ty,
-   C.getTypeAlignInChars(Private->getType()));
-  IncrScratchpadSrc = true;
-
-  // Step 1.2: Create a temporary to store the element in the destination
-  // Reduce list.
-  DestElementPtrAddr = Bld.CreateConstArrayGEP(DestBase, Idx);
-  DestElementAddr =
-  CGF.CreateMemTemp(Private->getType(), ".omp.reduction.element");
-  UpdateDestListPtr = true;
-  break;
-}
 }
 
 // Regardless of src and dest of copy, we emit the load of src
@@ -1755,39 +1700,6 @@ static void emitReductionListCopy(
 C.VoidPtrTy);
 }
 
-// Step 4.1: Increment SrcBase/DestBase so that it points to the starting
-// address of the next element in scratchpad memory, unless we're currently
-// processing the last one.  Memory alignment is also taken care of here.
-if ((IncrScratchpadDest || IncrScratchpadSrc)

[clang] 921bd29 - [OpenMP] Remove alignment for global <-> local reduction functions

2023-11-06 Thread Johannes Doerfert via cfe-commits

Author: Johannes Doerfert
Date: 2023-11-06T11:50:41-08:00
New Revision: 921bd299134fbe17c676b2486af269e18281def4

URL: 
https://github.com/llvm/llvm-project/commit/921bd299134fbe17c676b2486af269e18281def4
DIFF: 
https://github.com/llvm/llvm-project/commit/921bd299134fbe17c676b2486af269e18281def4.diff

LOG: [OpenMP] Remove alignment for global <-> local reduction functions

The alignment did likely not help much but increases the memory
requirement. Note that half of the affected accesses are all performed
by a single thread in each block. The reads are by consecutive threads
in a single block.

Added: 


Modified: 
clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
clang/test/OpenMP/nvptx_teams_reduction_codegen.cpp
clang/test/OpenMP/reduction_implicit_map.cpp
clang/test/OpenMP/target_teams_generic_loop_codegen.cpp

Removed: 




diff  --git a/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp 
b/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
index 229668c8ba5db20..b4e067ff497a085 100644
--- a/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
+++ b/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
@@ -85,18 +85,6 @@ class ExecutionRuntimeModesRAII {
   ~ExecutionRuntimeModesRAII() { ExecMode = SavedExecMode; }
 };
 
-/// GPU Configuration:  This information can be derived from cuda registers,
-/// however, providing compile time constants helps generate more efficient
-/// code.  For all practical purposes this is fine because the configuration
-/// is the same for all known NVPTX architectures.
-enum MachineConfiguration : unsigned {
-  /// See "llvm/Frontend/OpenMP/OMPGridValues.h" for various related target
-  /// specific Grid Values like GV_Warp_Size, GV_Slot_Size
-
-  /// Global memory alignment for performance.
-  GlobalMemoryAlignment = 128,
-};
-
 static const ValueDecl *getPrivateItem(const Expr *RefExpr) {
   RefExpr = RefExpr->IgnoreParens();
   if (const auto *ASE = dyn_cast(RefExpr)) {
@@ -119,31 +107,23 @@ static const ValueDecl *getPrivateItem(const Expr 
*RefExpr) {
   return cast(ME->getMemberDecl()->getCanonicalDecl());
 }
 
-
 static RecordDecl *buildRecordForGlobalizedVars(
 ASTContext &C, ArrayRef EscapedDecls,
 ArrayRef EscapedDeclsForTeams,
 llvm::SmallDenseMap
-&MappedDeclsFields, int BufSize) {
+&MappedDeclsFields,
+int BufSize) {
   using VarsDataTy = std::pair;
   if (EscapedDecls.empty() && EscapedDeclsForTeams.empty())
 return nullptr;
   SmallVector GlobalizedVars;
   for (const ValueDecl *D : EscapedDecls)
-GlobalizedVars.emplace_back(
-CharUnits::fromQuantity(std::max(
-C.getDeclAlign(D).getQuantity(),
-static_cast(GlobalMemoryAlignment))),
-D);
+GlobalizedVars.emplace_back(C.getDeclAlign(D), D);
   for (const ValueDecl *D : EscapedDeclsForTeams)
 GlobalizedVars.emplace_back(C.getDeclAlign(D), D);
-  llvm::stable_sort(GlobalizedVars, [](VarsDataTy L, VarsDataTy R) {
-return L.first > R.first;
-  });
 
   // Build struct _globalized_locals_ty {
-  // /*  globalized vars  */[WarSize] align (max(decl_align,
-  // GlobalMemoryAlignment))
+  // /*  globalized vars  */[WarSize] align (decl_align)
   // /*  globalized vars  */ for EscapedDeclsForTeams
   //   };
   RecordDecl *GlobalizedRD = C.buildImplicitRecord("_globalized_locals_ty");
@@ -182,9 +162,7 @@ static RecordDecl *buildRecordForGlobalizedVars(
   /*BW=*/nullptr, /*Mutable=*/false,
   /*InitStyle=*/ICIS_NoInit);
   Field->setAccess(AS_public);
-  llvm::APInt Align(32, std::max(C.getDeclAlign(VD).getQuantity(),
- static_cast(
- GlobalMemoryAlignment)));
+  llvm::APInt Align(32, Pair.first.getQuantity());
   Field->addAttr(AlignedAttr::CreateImplicit(
   C, /*IsAlignmentExpr=*/true,
   IntegerLiteral::Create(C, Align,

diff  --git a/clang/test/OpenMP/nvptx_teams_reduction_codegen.cpp 
b/clang/test/OpenMP/nvptx_teams_reduction_codegen.cpp
index 32b67762a1e1e6b..27af206098c10b1 100644
--- a/clang/test/OpenMP/nvptx_teams_reduction_codegen.cpp
+++ b/clang/test/OpenMP/nvptx_teams_reduction_codegen.cpp
@@ -253,7 +253,7 @@ int bar(int n){
 // CHECK1-NEXT:[[E:%.*]] = getelementptr inbounds 
[[STRUCT__GLOBALIZED_LOCALS_TY:%.*]], ptr [[TMP4]], i32 0, i32 0
 // CHECK1-NEXT:[[TMP8:%.*]] = getelementptr inbounds [1024 x double], ptr 
[[E]], i32 0, i32 [[TMP5]]
 // CHECK1-NEXT:[[TMP9:%.*]] = load double, ptr [[TMP7]], align 8
-// CHECK1-NEXT:store double [[TMP9]], ptr [[TMP8]], align 128
+// CHECK1-NEXT:store double [[TMP9]], ptr [[TMP8]], align 8
 // CHECK1-NEXT:ret void
 //
 //
@@ -294,7 +294,7 @@ int bar(int n){
 // CHECK1-NEXT:[[TMP7:%.*]] = load ptr, ptr [[TMP6]], align 8
 // CHECK1-NEXT:[[E:%.*]] = getelementptr inbounds 
[[STRUCT__GLOBALIZED_LOCALS_TY:%.*]], ptr [[TMP4]], i32 0, i32 0
 // 

[clang] [llvm] [openmp] [OpenMP] Rework handling of global ctor/dtors in OpenMP (PR #71739)

2023-11-10 Thread Johannes Doerfert via cfe-commits


@@ -2627,6 +2637,48 @@ struct AMDGPUDeviceTy : public GenericDeviceTy, 
AMDGenericDeviceTy {
   using AMDGPUEventRef = AMDGPUResourceRef;
   using AMDGPUEventManagerTy = GenericDeviceResourceManagerTy;
 
+  /// Common method to invoke a single threaded constructor or destructor
+  /// kernel by name.
+  Error callGlobalCtorDtorCommon(GenericPluginTy &Plugin, DeviceImageTy &Image,
+ const char *Name) {
+// Perform a quick check for the named kernel in the image. The kernel
+// should be created by the 'amdgpu-lower-ctor-dtor' pass.
+GenericGlobalHandlerTy &Handler = Plugin.getGlobalHandler();
+GlobalTy Global(Name, sizeof(void *));
+if (auto Err = Handler.getGlobalMetadataFromImage(*this, Image, Global)) {
+  consumeError(std::move(Err));
+  return Plugin::success();
+}
+
+// Allocate and construct the AMDGPU kernel.
+GenericKernelTy *AMDGPUKernel = Plugin.allocate();
+if (!AMDGPUKernel)
+  return Plugin::error("Failed to allocate memory for AMDGPU kernel");
+
+new (AMDGPUKernel) AMDGPUKernelTy(Name);
+if (auto Err = AMDGPUKernel->initImpl(*this, Image))
+  return std::move(Err);
+
+auto *AsyncInfoPtr = Plugin.allocate<__tgt_async_info>();

jdoerfert wrote:

Here and above you don't need plugin allocate. That's only for things that 
outlive the function, neither the Kernel nor the AsyncInfo will. They should be 
stack objects. That said, you should not need an aysync_info ptr anyway. 
AsyncInfoWrapperTy should work standalone and it has all the functions we need.

https://github.com/llvm/llvm-project/pull/71739
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[llvm] [openmp] [clang] [OpenMP] Rework handling of global ctor/dtors in OpenMP (PR #71739)

2023-11-10 Thread Johannes Doerfert via cfe-commits


@@ -1038,6 +1048,109 @@ struct CUDADeviceTy : public GenericDeviceTy {
   using CUDAStreamManagerTy = GenericDeviceResourceManagerTy;
   using CUDAEventManagerTy = GenericDeviceResourceManagerTy;
 
+  Error callGlobalCtorDtorCommon(GenericPluginTy &Plugin, DeviceImageTy &Image,
+ bool IsCtor) {
+// Perform a quick check for the named kernel in the image. The kernel
+// should be created by the 'nvptx-lower-ctor-dtor' pass.
+GenericGlobalHandlerTy &Handler = Plugin.getGlobalHandler();
+GlobalTy Global(IsCtor ? "nvptx$device$init" : "nvptx$device$fini",
+sizeof(void *));
+if (auto Err = Handler.getGlobalMetadataFromImage(*this, Image, Global)) {
+  consumeError(std::move(Err));
+  return Plugin::success();
+}
+
+// The Nvidia backend cannot handle creating the ctor / dtor array
+// automatically so we must create it ourselves. The backend will emit
+// several globals that contain function pointers we can call. These are
+// prefixed with a known name due to Nvidia's lack of section support.
+const ELF64LEObjectFile *ELFObj =
+Handler.getOrCreateELFObjectFile(*this, Image);
+if (!ELFObj)
+  return Plugin::error("Unable to create ELF object for image %p",
+   Image.getStart());
+
+// Search for all symbols that contain a constructor or destructor.
+SmallVector> Funcs;
+for (ELFSymbolRef Sym : ELFObj->symbols()) {
+  auto NameOrErr = Sym.getName();
+  if (!NameOrErr)
+return NameOrErr.takeError();
+
+  if (!NameOrErr->starts_with(IsCtor ? "__init_array_object_"
+ : "__fini_array_object_"))
+continue;
+
+  uint16_t priority;
+  if (NameOrErr->rsplit('_').second.getAsInteger(10, priority))
+return Plugin::error("Invalid priority for constructor or destructor");
+
+  Funcs.emplace_back(*NameOrErr, priority);
+}
+
+// Sort the created array to be in priority order.
+llvm::sort(Funcs, [=](auto x, auto y) { return x.second < y.second; });
+
+// Allocate a buffer to store all of the known constructor / destructor
+// functions in so we can iterate them on the device.
+void *Buffer =
+allocate(Funcs.size() * sizeof(void *), nullptr, TARGET_ALLOC_SHARED);

jdoerfert wrote:

Do we really need to used shared/managed memory here?

https://github.com/llvm/llvm-project/pull/71739
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[openmp] [llvm] [clang] [OpenMP] Rework handling of global ctor/dtors in OpenMP (PR #71739)

2023-11-10 Thread Johannes Doerfert via cfe-commits


@@ -1038,6 +1048,109 @@ struct CUDADeviceTy : public GenericDeviceTy {
   using CUDAStreamManagerTy = GenericDeviceResourceManagerTy;
   using CUDAEventManagerTy = GenericDeviceResourceManagerTy;
 
+  Error callGlobalCtorDtorCommon(GenericPluginTy &Plugin, DeviceImageTy &Image,
+ bool IsCtor) {
+// Perform a quick check for the named kernel in the image. The kernel
+// should be created by the 'nvptx-lower-ctor-dtor' pass.
+GenericGlobalHandlerTy &Handler = Plugin.getGlobalHandler();
+GlobalTy Global(IsCtor ? "nvptx$device$init" : "nvptx$device$fini",
+sizeof(void *));
+if (auto Err = Handler.getGlobalMetadataFromImage(*this, Image, Global)) {
+  consumeError(std::move(Err));
+  return Plugin::success();
+}
+
+// The Nvidia backend cannot handle creating the ctor / dtor array
+// automatically so we must create it ourselves. The backend will emit
+// several globals that contain function pointers we can call. These are
+// prefixed with a known name due to Nvidia's lack of section support.
+const ELF64LEObjectFile *ELFObj =
+Handler.getOrCreateELFObjectFile(*this, Image);
+if (!ELFObj)
+  return Plugin::error("Unable to create ELF object for image %p",
+   Image.getStart());
+
+// Search for all symbols that contain a constructor or destructor.
+SmallVector> Funcs;
+for (ELFSymbolRef Sym : ELFObj->symbols()) {
+  auto NameOrErr = Sym.getName();
+  if (!NameOrErr)
+return NameOrErr.takeError();
+
+  if (!NameOrErr->starts_with(IsCtor ? "__init_array_object_"
+ : "__fini_array_object_"))
+continue;
+
+  uint16_t priority;
+  if (NameOrErr->rsplit('_').second.getAsInteger(10, priority))
+return Plugin::error("Invalid priority for constructor or destructor");
+
+  Funcs.emplace_back(*NameOrErr, priority);
+}
+
+// Sort the created array to be in priority order.
+llvm::sort(Funcs, [=](auto x, auto y) { return x.second < y.second; });
+
+// Allocate a buffer to store all of the known constructor / destructor
+// functions in so we can iterate them on the device.
+void *Buffer =
+allocate(Funcs.size() * sizeof(void *), nullptr, TARGET_ALLOC_SHARED);
+if (!Buffer)
+  return Plugin::error("Failed to allocate memory for global buffer");
+
+auto *GlobalPtrStart = reinterpret_cast(Buffer);
+auto *GlobalPtrStop = reinterpret_cast(Buffer) + Funcs.size();
+
+std::size_t Idx = 0;
+for (auto [Name, Priority] : Funcs) {
+  GlobalTy FunctionAddr(Name.str(), sizeof(void *), 
&GlobalPtrStart[Idx++]);
+  if (auto Err = Handler.readGlobalFromDevice(*this, Image, FunctionAddr))
+return std::move(Err);
+}
+
+// Copy the created buffer to the appropriate symbols so the kernel can
+// iterate through them.
+GlobalTy StartGlobal(IsCtor ? "__init_array_start" : "__fini_array_start",
+ sizeof(void *), &GlobalPtrStart);
+if (auto Err = Handler.writeGlobalToDevice(*this, Image, StartGlobal))
+  return std::move(Err);
+
+GlobalTy StopGlobal(IsCtor ? "__init_array_end" : "__fini_array_end",
+sizeof(void *), &GlobalPtrStop);
+if (auto Err = Handler.writeGlobalToDevice(*this, Image, StopGlobal))
+  return std::move(Err);
+
+// Launch the kernel to execute the functions in the buffer.
+GenericKernelTy *CUDAKernel = Plugin.allocate();

jdoerfert wrote:

Same as with AMD.

https://github.com/llvm/llvm-project/pull/71739
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[llvm] [clang] [openmp] [OpenMP] Rework handling of global ctor/dtors in OpenMP (PR #71739)

2023-11-10 Thread Johannes Doerfert via cfe-commits


@@ -1038,6 +1048,109 @@ struct CUDADeviceTy : public GenericDeviceTy {
   using CUDAStreamManagerTy = GenericDeviceResourceManagerTy;
   using CUDAEventManagerTy = GenericDeviceResourceManagerTy;
 
+  Error callGlobalCtorDtorCommon(GenericPluginTy &Plugin, DeviceImageTy &Image,
+ bool IsCtor) {
+// Perform a quick check for the named kernel in the image. The kernel
+// should be created by the 'nvptx-lower-ctor-dtor' pass.
+GenericGlobalHandlerTy &Handler = Plugin.getGlobalHandler();
+GlobalTy Global(IsCtor ? "nvptx$device$init" : "nvptx$device$fini",
+sizeof(void *));
+if (auto Err = Handler.getGlobalMetadataFromImage(*this, Image, Global)) {
+  consumeError(std::move(Err));
+  return Plugin::success();
+}
+
+// The Nvidia backend cannot handle creating the ctor / dtor array
+// automatically so we must create it ourselves. The backend will emit
+// several globals that contain function pointers we can call. These are
+// prefixed with a known name due to Nvidia's lack of section support.
+const ELF64LEObjectFile *ELFObj =
+Handler.getOrCreateELFObjectFile(*this, Image);
+if (!ELFObj)
+  return Plugin::error("Unable to create ELF object for image %p",
+   Image.getStart());
+
+// Search for all symbols that contain a constructor or destructor.
+SmallVector> Funcs;
+for (ELFSymbolRef Sym : ELFObj->symbols()) {
+  auto NameOrErr = Sym.getName();
+  if (!NameOrErr)
+return NameOrErr.takeError();
+
+  if (!NameOrErr->starts_with(IsCtor ? "__init_array_object_"
+ : "__fini_array_object_"))
+continue;
+
+  uint16_t priority;
+  if (NameOrErr->rsplit('_').second.getAsInteger(10, priority))
+return Plugin::error("Invalid priority for constructor or destructor");
+
+  Funcs.emplace_back(*NameOrErr, priority);
+}
+
+// Sort the created array to be in priority order.
+llvm::sort(Funcs, [=](auto x, auto y) { return x.second < y.second; });
+
+// Allocate a buffer to store all of the known constructor / destructor
+// functions in so we can iterate them on the device.
+void *Buffer =
+allocate(Funcs.size() * sizeof(void *), nullptr, TARGET_ALLOC_SHARED);
+if (!Buffer)
+  return Plugin::error("Failed to allocate memory for global buffer");
+
+auto *GlobalPtrStart = reinterpret_cast(Buffer);
+auto *GlobalPtrStop = reinterpret_cast(Buffer) + Funcs.size();
+
+std::size_t Idx = 0;
+for (auto [Name, Priority] : Funcs) {
+  GlobalTy FunctionAddr(Name.str(), sizeof(void *), 
&GlobalPtrStart[Idx++]);
+  if (auto Err = Handler.readGlobalFromDevice(*this, Image, FunctionAddr))
+return std::move(Err);
+}
+
+// Copy the created buffer to the appropriate symbols so the kernel can
+// iterate through them.
+GlobalTy StartGlobal(IsCtor ? "__init_array_start" : "__fini_array_start",
+ sizeof(void *), &GlobalPtrStart);
+if (auto Err = Handler.writeGlobalToDevice(*this, Image, StartGlobal))
+  return std::move(Err);
+
+GlobalTy StopGlobal(IsCtor ? "__init_array_end" : "__fini_array_end",
+sizeof(void *), &GlobalPtrStop);
+if (auto Err = Handler.writeGlobalToDevice(*this, Image, StopGlobal))
+  return std::move(Err);
+
+// Launch the kernel to execute the functions in the buffer.
+GenericKernelTy *CUDAKernel = Plugin.allocate();
+if (!CUDAKernel)
+  return Plugin::error("Failed to allocate memory for CUDA kernel");
+
+new (CUDAKernel)
+CUDAKernelTy(IsCtor ? "nvptx$device$init" : "nvptx$device$fini");

jdoerfert wrote:

> IsCtor ? "nvptx$device$init" : "nvptx$device$fini"

Do this once, other such ternaries as well.

https://github.com/llvm/llvm-project/pull/71739
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[openmp] [clang] [llvm] [OpenMP] Rework handling of global ctor/dtors in OpenMP (PR #71739)

2023-11-10 Thread Johannes Doerfert via cfe-commits


@@ -313,12 +313,18 @@ static void 
registerGlobalCtorsDtorsForImage(__tgt_bin_desc *Desc,
 DP("Adding ctor " DPxMOD " to the pending list.\n",
DPxPTR(Entry->addr));
 Device.PendingCtorsDtors[Desc].PendingCtors.push_back(Entry->addr);
+MESSAGE("Calling deprecated constructor for entry %s will be removed "

jdoerfert wrote:

Add "WARNING: ", or similar.

https://github.com/llvm/llvm-project/pull/71739
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[llvm] [clang] [openmp] [OpenMP] Rework handling of global ctor/dtors in OpenMP (PR #71739)

2023-11-10 Thread Johannes Doerfert via cfe-commits


@@ -1038,6 +1048,109 @@ struct CUDADeviceTy : public GenericDeviceTy {
   using CUDAStreamManagerTy = GenericDeviceResourceManagerTy;
   using CUDAEventManagerTy = GenericDeviceResourceManagerTy;
 
+  Error callGlobalCtorDtorCommon(GenericPluginTy &Plugin, DeviceImageTy &Image,
+ bool IsCtor) {
+// Perform a quick check for the named kernel in the image. The kernel
+// should be created by the 'nvptx-lower-ctor-dtor' pass.
+GenericGlobalHandlerTy &Handler = Plugin.getGlobalHandler();
+GlobalTy Global(IsCtor ? "nvptx$device$init" : "nvptx$device$fini",
+sizeof(void *));
+if (auto Err = Handler.getGlobalMetadataFromImage(*this, Image, Global)) {
+  consumeError(std::move(Err));
+  return Plugin::success();
+}
+
+// The Nvidia backend cannot handle creating the ctor / dtor array
+// automatically so we must create it ourselves. The backend will emit
+// several globals that contain function pointers we can call. These are
+// prefixed with a known name due to Nvidia's lack of section support.
+const ELF64LEObjectFile *ELFObj =
+Handler.getOrCreateELFObjectFile(*this, Image);
+if (!ELFObj)
+  return Plugin::error("Unable to create ELF object for image %p",
+   Image.getStart());
+
+// Search for all symbols that contain a constructor or destructor.
+SmallVector> Funcs;
+for (ELFSymbolRef Sym : ELFObj->symbols()) {
+  auto NameOrErr = Sym.getName();
+  if (!NameOrErr)
+return NameOrErr.takeError();
+
+  if (!NameOrErr->starts_with(IsCtor ? "__init_array_object_"
+ : "__fini_array_object_"))
+continue;
+
+  uint16_t priority;
+  if (NameOrErr->rsplit('_').second.getAsInteger(10, priority))
+return Plugin::error("Invalid priority for constructor or destructor");
+
+  Funcs.emplace_back(*NameOrErr, priority);
+}
+
+// Sort the created array to be in priority order.
+llvm::sort(Funcs, [=](auto x, auto y) { return x.second < y.second; });
+
+// Allocate a buffer to store all of the known constructor / destructor
+// functions in so we can iterate them on the device.
+void *Buffer =
+allocate(Funcs.size() * sizeof(void *), nullptr, TARGET_ALLOC_SHARED);
+if (!Buffer)
+  return Plugin::error("Failed to allocate memory for global buffer");
+
+auto *GlobalPtrStart = reinterpret_cast(Buffer);
+auto *GlobalPtrStop = reinterpret_cast(Buffer) + Funcs.size();
+
+std::size_t Idx = 0;
+for (auto [Name, Priority] : Funcs) {
+  GlobalTy FunctionAddr(Name.str(), sizeof(void *), 
&GlobalPtrStart[Idx++]);
+  if (auto Err = Handler.readGlobalFromDevice(*this, Image, FunctionAddr))
+return std::move(Err);
+}
+
+// Copy the created buffer to the appropriate symbols so the kernel can
+// iterate through them.
+GlobalTy StartGlobal(IsCtor ? "__init_array_start" : "__fini_array_start",
+ sizeof(void *), &GlobalPtrStart);
+if (auto Err = Handler.writeGlobalToDevice(*this, Image, StartGlobal))
+  return std::move(Err);
+
+GlobalTy StopGlobal(IsCtor ? "__init_array_end" : "__fini_array_end",
+sizeof(void *), &GlobalPtrStop);
+if (auto Err = Handler.writeGlobalToDevice(*this, Image, StopGlobal))
+  return std::move(Err);
+
+// Launch the kernel to execute the functions in the buffer.
+GenericKernelTy *CUDAKernel = Plugin.allocate();
+if (!CUDAKernel)
+  return Plugin::error("Failed to allocate memory for CUDA kernel");
+
+new (CUDAKernel)
+CUDAKernelTy(IsCtor ? "nvptx$device$init" : "nvptx$device$fini");
+
+if (auto Err = CUDAKernel->init(*this, Image))
+  return std::move(Err);
+
+AsyncInfoWrapperTy AsyncInfoWrapper(*this, nullptr);
+
+if (auto Err = initAsyncInfoImpl(AsyncInfoWrapper))

jdoerfert wrote:

You shouldn't need this.

https://github.com/llvm/llvm-project/pull/71739
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [openmp] [OpenMP] Rework handling of global ctor/dtors in OpenMP (PR #71739)

2023-11-10 Thread Johannes Doerfert via cfe-commits

https://github.com/jdoerfert approved this pull request.

LG, check my comments.

https://github.com/llvm/llvm-project/pull/71739
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [openmp] [OpenMP] Rework handling of global ctor/dtors in OpenMP (PR #71739)

2023-11-10 Thread Johannes Doerfert via cfe-commits

https://github.com/jdoerfert edited 
https://github.com/llvm/llvm-project/pull/71739
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [openmp] [llvm] [OpenMP] Rework handling of global ctor/dtors in OpenMP (PR #71739)

2023-11-10 Thread Johannes Doerfert via cfe-commits


@@ -2627,6 +2637,38 @@ struct AMDGPUDeviceTy : public GenericDeviceTy, 
AMDGenericDeviceTy {
   using AMDGPUEventRef = AMDGPUResourceRef;
   using AMDGPUEventManagerTy = GenericDeviceResourceManagerTy;
 
+  /// Common method to invoke a single threaded constructor or destructor
+  /// kernel by name.
+  Error callGlobalCtorDtorCommon(GenericPluginTy &Plugin, DeviceImageTy &Image,
+ const char *Name) {
+// Perform a quick check for the named kernel in the image. The kernel
+// should be created by the 'amdgpu-lower-ctor-dtor' pass.
+GenericGlobalHandlerTy &Handler = Plugin.getGlobalHandler();
+GlobalTy Global(Name, sizeof(void *));
+if (auto Err = Handler.getGlobalMetadataFromImage(*this, Image, Global)) {
+  consumeError(std::move(Err));
+  return Plugin::success();
+}
+
+// Allocate and construct the AMDGPU kernel.
+AMDGPUKernelTy AMDGPUKernel(Name);
+if (auto Err = AMDGPUKernel.initImpl(*this, Image))

jdoerfert wrote:

Generally, we should always call the generic entry points, so, init, not 
initImpl.
Assuming you have no specific reason not to. Also below for launch.

https://github.com/llvm/llvm-project/pull/71739
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[openmp] [llvm] [clang] [OpenMP] Rework handling of global ctor/dtors in OpenMP (PR #71739)

2023-11-10 Thread Johannes Doerfert via cfe-commits


@@ -0,0 +1,37 @@
+// RUN: %libomptarget-compilexx-run-and-check-generic
+
+// REQUIRES: libc
+
+#include 
+
+#pragma omp begin declare target device_type(nohost)
+
+// CHECK: void ctor1()
+// CHECK: void ctor2()
+// CHECK: void ctor3()
+[[gnu::constructor(101)]] void ctor1() { puts(__PRETTY_FUNCTION__); }
+[[gnu::constructor(102)]] void ctor2() { puts(__PRETTY_FUNCTION__); }
+[[gnu::constructor(103)]] void ctor3() { puts(__PRETTY_FUNCTION__); }

jdoerfert wrote:

put the 103 priority between 101 and 102 to actually test sorting.

https://github.com/llvm/llvm-project/pull/71739
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[llvm] [openmp] [clang] [OpenMP] Rework handling of global ctor/dtors in OpenMP (PR #71739)

2023-11-10 Thread Johannes Doerfert via cfe-commits


@@ -1038,6 +1048,100 @@ struct CUDADeviceTy : public GenericDeviceTy {
   using CUDAStreamManagerTy = GenericDeviceResourceManagerTy;
   using CUDAEventManagerTy = GenericDeviceResourceManagerTy;
 
+  Error callGlobalCtorDtorCommon(GenericPluginTy &Plugin, DeviceImageTy &Image,
+ bool IsCtor) {
+const char *KernelName = IsCtor ? "nvptx$device$init" : 
"nvptx$device$fini";
+// Perform a quick check for the named kernel in the image. The kernel
+// should be created by the 'nvptx-lower-ctor-dtor' pass.
+GenericGlobalHandlerTy &Handler = Plugin.getGlobalHandler();
+GlobalTy Global(KernelName, sizeof(void *));
+if (auto Err = Handler.getGlobalMetadataFromImage(*this, Image, Global)) {
+  consumeError(std::move(Err));
+  return Plugin::success();
+}
+
+// The Nvidia backend cannot handle creating the ctor / dtor array
+// automatically so we must create it ourselves. The backend will emit
+// several globals that contain function pointers we can call. These are
+// prefixed with a known name due to Nvidia's lack of section support.
+const ELF64LEObjectFile *ELFObj =
+Handler.getOrCreateELFObjectFile(*this, Image);
+if (!ELFObj)
+  return Plugin::error("Unable to create ELF object for image %p",
+   Image.getStart());
+
+// Search for all symbols that contain a constructor or destructor.
+SmallVector> Funcs;
+for (ELFSymbolRef Sym : ELFObj->symbols()) {
+  auto NameOrErr = Sym.getName();
+  if (!NameOrErr)
+return NameOrErr.takeError();
+
+  if (!NameOrErr->starts_with(IsCtor ? "__init_array_object_"
+ : "__fini_array_object_"))
+continue;
+
+  uint16_t priority;

jdoerfert wrote:

s/p/P/

https://github.com/llvm/llvm-project/pull/71739
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[llvm] [clang] [openmp] [OpenMP] Rework handling of global ctor/dtors in OpenMP (PR #71739)

2023-11-10 Thread Johannes Doerfert via cfe-commits


@@ -1038,6 +1048,109 @@ struct CUDADeviceTy : public GenericDeviceTy {
   using CUDAStreamManagerTy = GenericDeviceResourceManagerTy;
   using CUDAEventManagerTy = GenericDeviceResourceManagerTy;
 
+  Error callGlobalCtorDtorCommon(GenericPluginTy &Plugin, DeviceImageTy &Image,
+ bool IsCtor) {
+// Perform a quick check for the named kernel in the image. The kernel
+// should be created by the 'nvptx-lower-ctor-dtor' pass.
+GenericGlobalHandlerTy &Handler = Plugin.getGlobalHandler();
+GlobalTy Global(IsCtor ? "nvptx$device$init" : "nvptx$device$fini",
+sizeof(void *));
+if (auto Err = Handler.getGlobalMetadataFromImage(*this, Image, Global)) {
+  consumeError(std::move(Err));
+  return Plugin::success();
+}
+
+// The Nvidia backend cannot handle creating the ctor / dtor array
+// automatically so we must create it ourselves. The backend will emit
+// several globals that contain function pointers we can call. These are
+// prefixed with a known name due to Nvidia's lack of section support.
+const ELF64LEObjectFile *ELFObj =
+Handler.getOrCreateELFObjectFile(*this, Image);
+if (!ELFObj)
+  return Plugin::error("Unable to create ELF object for image %p",
+   Image.getStart());
+
+// Search for all symbols that contain a constructor or destructor.
+SmallVector> Funcs;
+for (ELFSymbolRef Sym : ELFObj->symbols()) {
+  auto NameOrErr = Sym.getName();
+  if (!NameOrErr)
+return NameOrErr.takeError();
+
+  if (!NameOrErr->starts_with(IsCtor ? "__init_array_object_"
+ : "__fini_array_object_"))
+continue;
+
+  uint16_t priority;
+  if (NameOrErr->rsplit('_').second.getAsInteger(10, priority))
+return Plugin::error("Invalid priority for constructor or destructor");
+
+  Funcs.emplace_back(*NameOrErr, priority);
+}
+
+// Sort the created array to be in priority order.
+llvm::sort(Funcs, [=](auto x, auto y) { return x.second < y.second; });
+
+// Allocate a buffer to store all of the known constructor / destructor
+// functions in so we can iterate them on the device.
+void *Buffer =
+allocate(Funcs.size() * sizeof(void *), nullptr, TARGET_ALLOC_SHARED);

jdoerfert wrote:

I'm more worried about systems that do not have support than about the time.
If you think it's always supported, we can keep it for now.

https://github.com/llvm/llvm-project/pull/71739
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] 7318fe6 - [OpenMP][FIX] Ensure device reduction geps work for multi-var reductions

2023-11-10 Thread Johannes Doerfert via cfe-commits

Author: Johannes Doerfert
Date: 2023-11-10T14:34:46-08:00
New Revision: 7318fe633487f9b9187e18c7ebd4c6516ded9a22

URL: 
https://github.com/llvm/llvm-project/commit/7318fe633487f9b9187e18c7ebd4c6516ded9a22
DIFF: 
https://github.com/llvm/llvm-project/commit/7318fe633487f9b9187e18c7ebd4c6516ded9a22.diff

LOG: [OpenMP][FIX] Ensure device reduction geps work for multi-var reductions

If we have more than one reduction variable we need to be consistent
wrt. indexing. In 3de645efe30b83ba1b6d7e500486c4f441a17a61 we broke this
as the buffer type was reduced to a singleton but the index computation
was not adjusted to account for that offset. This fixes it by
interleaving the reduction variables properly in a array-of-struct
style. We can revert it back to struct-of-array in a follow up if turns
out to be a problem. I doubt it since half the accesses should benefit
from the locallity this layout offers and only the other half were
consecutive before.

Added: 
openmp/libomptarget/test/offloading/multiple_reductions_simple.c

Modified: 
clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
clang/test/OpenMP/nvptx_teams_reduction_codegen.cpp
clang/test/OpenMP/target_teams_generic_loop_codegen.cpp
openmp/libomptarget/DeviceRTL/src/Reduction.cpp

Removed: 




diff  --git a/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp 
b/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
index a13d74743c3bd3f..abecf5250f4cf96 100644
--- a/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
+++ b/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
@@ -153,9 +153,11 @@ static RecordDecl *buildRecordForGlobalizedVars(
   Field->addAttr(*I);
   }
 } else {
-  llvm::APInt ArraySize(32, BufSize);
-  Type = C.getConstantArrayType(Type, ArraySize, nullptr,
-ArraySizeModifier::Normal, 0);
+  if (BufSize > 1) {
+llvm::APInt ArraySize(32, BufSize);
+Type = C.getConstantArrayType(Type, ArraySize, nullptr,
+  ArraySizeModifier::Normal, 0);
+  }
   Field = FieldDecl::Create(
   C, GlobalizedRD, Loc, Loc, VD->getIdentifier(), Type,
   C.getTrivialTypeSourceInfo(Type, SourceLocation()),
@@ -2205,8 +2207,7 @@ static llvm::Value *emitListToGlobalCopyFunction(
   llvm::Value *BufferArrPtr = Bld.CreatePointerBitCastOrAddrSpaceCast(
   CGF.EmitLoadOfScalar(AddrBufferArg, /*Volatile=*/false, C.VoidPtrTy, 
Loc),
   LLVMReductionsBufferTy->getPointerTo());
-  llvm::Value *Idxs[] = {llvm::ConstantInt::getNullValue(CGF.Int32Ty),
- CGF.EmitLoadOfScalar(CGF.GetAddrOfLocalVar(&IdxArg),
+  llvm::Value *Idxs[] = {CGF.EmitLoadOfScalar(CGF.GetAddrOfLocalVar(&IdxArg),
   /*Volatile=*/false, C.IntTy,
   Loc)};
   unsigned Idx = 0;
@@ -2224,12 +2225,12 @@ static llvm::Value *emitListToGlobalCopyFunction(
 const ValueDecl *VD = cast(Private)->getDecl();
 // Global = Buffer.VD[Idx];
 const FieldDecl *FD = VarFieldMap.lookup(VD);
+llvm::Value *BufferPtr =
+Bld.CreateInBoundsGEP(LLVMReductionsBufferTy, BufferArrPtr, Idxs);
 LValue GlobLVal = CGF.EmitLValueForField(
-CGF.MakeNaturalAlignAddrLValue(BufferArrPtr, StaticTy), FD);
+CGF.MakeNaturalAlignAddrLValue(BufferPtr, StaticTy), FD);
 Address GlobAddr = GlobLVal.getAddress(CGF);
-llvm::Value *BufferPtr = Bld.CreateInBoundsGEP(GlobAddr.getElementType(),
-   GlobAddr.getPointer(), 
Idxs);
-GlobLVal.setAddress(Address(BufferPtr,
+GlobLVal.setAddress(Address(GlobAddr.getPointer(),
 CGF.ConvertTypeForMem(Private->getType()),
 GlobAddr.getAlignment()));
 switch (CGF.getEvaluationKind(Private->getType())) {
@@ -2316,8 +2317,7 @@ static llvm::Value *emitListToGlobalReduceFunction(
   Address ReductionList =
   CGF.CreateMemTemp(ReductionArrayTy, ".omp.reduction.red_list");
   auto IPriv = Privates.begin();
-  llvm::Value *Idxs[] = {llvm::ConstantInt::getNullValue(CGF.Int32Ty),
- CGF.EmitLoadOfScalar(CGF.GetAddrOfLocalVar(&IdxArg),
+  llvm::Value *Idxs[] = {CGF.EmitLoadOfScalar(CGF.GetAddrOfLocalVar(&IdxArg),
   /*Volatile=*/false, C.IntTy,
   Loc)};
   unsigned Idx = 0;
@@ -2326,12 +2326,13 @@ static llvm::Value *emitListToGlobalReduceFunction(
 // Global = Buffer.VD[Idx];
 const ValueDecl *VD = cast(*IPriv)->getDecl();
 const FieldDecl *FD = VarFieldMap.lookup(VD);
+llvm::Value *BufferPtr =
+Bld.CreateInBoundsGEP(LLVMReductionsBufferTy, BufferArrPtr, Idxs);
 LValue GlobLVal = CGF.EmitLValueForField(
-CGF.MakeNaturalAlignAddrLValue(BufferArrPtr, StaticTy), FD);
+CGF.MakeNaturalAlignAddrLValue(Bu

[clang-tools-extra] [llvm] [openmp] [clang] [OpenMP] Add extra flags to libomptarget and plugin builds (PR #74520)

2023-12-11 Thread Johannes Doerfert via cfe-commits

https://github.com/jdoerfert updated 
https://github.com/llvm/llvm-project/pull/74520

>From f505868953d07125f67bcbb79be426a6deee1a13 Mon Sep 17 00:00:00 2001
From: Johannes Doerfert 
Date: Tue, 5 Dec 2023 12:35:04 -0800
Subject: [PATCH 1/2] [OpenMP] Add extra flags to libomptarget and plugin
 builds

---
 openmp/libomptarget/CMakeLists.txt| 19 +++
 .../plugins-nextgen/common/CMakeLists.txt |  3 +++
 openmp/libomptarget/src/CMakeLists.txt|  3 +++
 3 files changed, 25 insertions(+)

diff --git a/openmp/libomptarget/CMakeLists.txt 
b/openmp/libomptarget/CMakeLists.txt
index 972b887c7c952..fe895d5bc3254 100644
--- a/openmp/libomptarget/CMakeLists.txt
+++ b/openmp/libomptarget/CMakeLists.txt
@@ -75,6 +75,25 @@ if(LIBOMPTARGET_ENABLE_DEBUG)
   add_definitions(-DOMPTARGET_DEBUG)
 endif()
 
+# No exceptions and no RTTI, except if requested.
+set(offload_compile_flags -fno-exceptions)
+if(NOT LLVM_ENABLE_RTTI)
+   set(offload_compile_flags ${offload_compile_flags} -fno-rtti)
+endif()
+
+# If LTO is not explicitly disabled we check if we can enable it and do so.
+set(LIBOMPTARGET_USE_LTO TRUE CACHE BOOL "Use LTO for the offload runtimes if 
available")
+if (LIBOMPTARGET_USE_LTO)
+   include(CheckIPOSupported)
+   check_ipo_supported(RESULT use_lto OUTPUT output)
+   if(use_lto)
+   set(offload_compile_flags ${offload_compile_flags} -flto)
+   set(offload_link_flags ${offload_link_flags} -flto)
+   else()
+  message(WARNING "LTO is not supported: ${output}")
+   endif()
+endif()
+
 # OMPT support for libomptarget
 # Follow host OMPT support and check if host support has been requested.
 # LIBOMP_HAVE_OMPT_SUPPORT indicates whether host OMPT support has been 
implemented.
diff --git a/openmp/libomptarget/plugins-nextgen/common/CMakeLists.txt 
b/openmp/libomptarget/plugins-nextgen/common/CMakeLists.txt
index 5b332ed3d2f41..8ae3ff2a6d291 100644
--- a/openmp/libomptarget/plugins-nextgen/common/CMakeLists.txt
+++ b/openmp/libomptarget/plugins-nextgen/common/CMakeLists.txt
@@ -88,6 +88,9 @@ target_compile_definitions(PluginCommon PRIVATE
   DEBUG_PREFIX="PluginInterface"
 )
 
+target_compile_options(PluginCommon PUBLIC ${offload_compile_flags})
+target_link_options(PluginCommon PUBLIC ${offload_link_flags})
+
 target_include_directories(PluginCommon
   PRIVATE
   ${LIBOMPTARGET_INCLUDE_DIR}
diff --git a/openmp/libomptarget/src/CMakeLists.txt 
b/openmp/libomptarget/src/CMakeLists.txt
index 7c311f738ac8e..253e9f0aa176f 100644
--- a/openmp/libomptarget/src/CMakeLists.txt
+++ b/openmp/libomptarget/src/CMakeLists.txt
@@ -55,6 +55,9 @@ target_compile_definitions(omptarget PRIVATE
   DEBUG_PREFIX="omptarget"
 )
 
+target_compile_options(omptarget PUBLIC ${offload_compile_flags})
+target_link_options(omptarget PUBLIC ${offload_link_flags})
+
 # libomptarget.so needs to be aware of where the plugins live as they
 # are now separated in the build directory.
 set_target_properties(omptarget PROPERTIES

>From f96dc0a43075dcf9700b7813550ed687136b0d0a Mon Sep 17 00:00:00 2001
From: Johannes Doerfert 
Date: Mon, 11 Dec 2023 10:34:28 -0800
Subject: [PATCH 2/2] Update CMakeLists.txt

---
 openmp/libomptarget/CMakeLists.txt | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/openmp/libomptarget/CMakeLists.txt 
b/openmp/libomptarget/CMakeLists.txt
index fe895d5bc3254..21ecb6ddba3dc 100644
--- a/openmp/libomptarget/CMakeLists.txt
+++ b/openmp/libomptarget/CMakeLists.txt
@@ -78,20 +78,20 @@ endif()
 # No exceptions and no RTTI, except if requested.
 set(offload_compile_flags -fno-exceptions)
 if(NOT LLVM_ENABLE_RTTI)
-   set(offload_compile_flags ${offload_compile_flags} -fno-rtti)
+  set(offload_compile_flags ${offload_compile_flags} -fno-rtti)
 endif()
 
 # If LTO is not explicitly disabled we check if we can enable it and do so.
 set(LIBOMPTARGET_USE_LTO TRUE CACHE BOOL "Use LTO for the offload runtimes if 
available")
 if (LIBOMPTARGET_USE_LTO)
-   include(CheckIPOSupported)
-   check_ipo_supported(RESULT use_lto OUTPUT output)
-   if(use_lto)
-   set(offload_compile_flags ${offload_compile_flags} -flto)
-   set(offload_link_flags ${offload_link_flags} -flto)
-   else()
-  message(WARNING "LTO is not supported: ${output}")
-   endif()
+  include(CheckIPOSupported)
+  check_ipo_supported(RESULT use_lto OUTPUT output)
+  if(use_lto)
+set(offload_compile_flags ${offload_compile_flags} -flto)
+set(offload_link_flags ${offload_link_flags} -flto)
+  else()
+message(WARNING "LTO is not supported: ${output}")
+  endif()
 endif()
 
 # OMPT support for libomptarget

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang-tools-extra] [llvm] [openmp] [clang] [OpenMP] Add extra flags to libomptarget and plugin builds (PR #74520)

2023-12-11 Thread Johannes Doerfert via cfe-commits

https://github.com/jdoerfert closed 
https://github.com/llvm/llvm-project/pull/74520
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[llvm] [clang] [Offload] Initial support for registering offloading entries on COFF targets (PR #72697)

2023-11-20 Thread Johannes Doerfert via cfe-commits

https://github.com/jdoerfert edited 
https://github.com/llvm/llvm-project/pull/72697
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[llvm] [clang] [Offload] Initial support for registering offloading entries on COFF targets (PR #72697)

2023-11-20 Thread Johannes Doerfert via cfe-commits


@@ -62,35 +63,51 @@ void offloading::emitOffloadingEntry(Module &M, Constant 
*Addr, StringRef Name,
   M.getDataLayout().getDefaultGlobalsAddressSpace());
 
   // The entry has to be created in the section the linker expects it to be.
-  Entry->setSection(SectionName);
+  if (Triple.isOSBinFormatCOFF())
+Entry->setSection((SectionName + "$OE").str());
+  else
+Entry->setSection(SectionName);
   Entry->setAlignment(Align(1));
 }
 
 std::pair
 offloading::getOffloadEntryArray(Module &M, StringRef SectionName) {
-  auto *EntriesB =
-  new GlobalVariable(M, ArrayType::get(getEntryTy(M), 0),
- /*isConstant=*/true, GlobalValue::ExternalLinkage,
- /*Initializer=*/nullptr, "__start_" + SectionName);
+  llvm::Triple Triple(M.getTargetTriple());
+
+  auto *ZeroInitilaizer =
+  ConstantAggregateZero::get(ArrayType::get(getEntryTy(M), 0u));
+  auto *EntryInit = Triple.isOSBinFormatCOFF() ? ZeroInitilaizer : nullptr;
+  auto *EntryType = Triple.isOSBinFormatCOFF()
+? ZeroInitilaizer->getType()
+: ArrayType::get(getEntryTy(M), 0);

jdoerfert wrote:

I don't see why we need the ternary here, aren't both options the same?

https://github.com/llvm/llvm-project/pull/72697
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[llvm] [clang] [Offload] Initial support for registering offloading entries on COFF targets (PR #72697)

2023-11-20 Thread Johannes Doerfert via cfe-commits


@@ -62,35 +63,51 @@ void offloading::emitOffloadingEntry(Module &M, Constant 
*Addr, StringRef Name,
   M.getDataLayout().getDefaultGlobalsAddressSpace());
 
   // The entry has to be created in the section the linker expects it to be.
-  Entry->setSection(SectionName);
+  if (Triple.isOSBinFormatCOFF())
+Entry->setSection((SectionName + "$OE").str());
+  else
+Entry->setSection(SectionName);
   Entry->setAlignment(Align(1));
 }
 
 std::pair
 offloading::getOffloadEntryArray(Module &M, StringRef SectionName) {
-  auto *EntriesB =
-  new GlobalVariable(M, ArrayType::get(getEntryTy(M), 0),
- /*isConstant=*/true, GlobalValue::ExternalLinkage,
- /*Initializer=*/nullptr, "__start_" + SectionName);
+  llvm::Triple Triple(M.getTargetTriple());

jdoerfert wrote:

This should be in the llvm namespace.

https://github.com/llvm/llvm-project/pull/72697
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [Offload] Initial support for registering offloading entries on COFF targets (PR #72697)

2023-11-20 Thread Johannes Doerfert via cfe-commits

https://github.com/jdoerfert approved this pull request.

LG, two nits.

https://github.com/llvm/llvm-project/pull/72697
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [openmp] [Clang][OpenMP] Fix ordering of processing of map clauses when mapping a struct. (PR #72410)

2023-11-20 Thread Johannes Doerfert via cfe-commits


@@ -7742,15 +7744,42 @@ class MappableExprsHandler {
   else if (C->getMapType() == OMPC_MAP_alloc)
 Kind = Allocs;
   const auto *EI = C->getVarRefs().begin();
-  for (const auto L : C->component_lists()) {
-const Expr *E = (C->getMapLoc().isValid()) ? *EI : nullptr;
-InfoGen(std::get<0>(L), Kind, std::get<1>(L), C->getMapType(),
-C->getMapTypeModifiers(), std::nullopt,
-/*ReturnDevicePointer=*/false, C->isImplicit(), std::get<2>(L),
-E);
-++EI;
+  if (*EI && !isa(*EI)) {
+for (const auto L : C->component_lists()) {
+  const Expr *E = (C->getMapLoc().isValid()) ? *EI : nullptr;
+  InfoGen(std::get<0>(L), Kind, std::get<1>(L), C->getMapType(),
+  C->getMapTypeModifiers(), std::nullopt,
+  /*ReturnDevicePointer=*/false, C->isImplicit(),
+  std::get<2>(L), E);
+  ++EI;
+}
+  }
+}
+
+// Process the maps with sections.
+for (const auto *Cl : Clauses) {
+  const auto *C = dyn_cast(Cl);
+  if (!C)
+continue;
+  MapKind Kind = Other;
+  if (llvm::is_contained(C->getMapTypeModifiers(),
+ OMPC_MAP_MODIFIER_present))
+Kind = Present;
+  else if (C->getMapType() == OMPC_MAP_alloc)
+Kind = Allocs;
+  const auto *EI = C->getVarRefs().begin();
+  if (*EI && isa(*EI)) {
+for (const auto L : C->component_lists()) {
+  const Expr *E = (C->getMapLoc().isValid()) ? *EI : nullptr;
+  InfoGen(std::get<0>(L), Kind, std::get<1>(L), C->getMapType(),
+  C->getMapTypeModifiers(), std::nullopt,
+  /*ReturnDevicePointer=*/false, C->isImplicit(),
+  std::get<2>(L), E);
+  ++EI;
+}

jdoerfert wrote:

This duplicates the loop nest, which is very unfortunate. Why not actually sort 
the clause list? That will also make it easier to add/change things in the 
future, e.g., we simply modify the comparator.

https://github.com/llvm/llvm-project/pull/72410
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Robustify openmp test (PR #69739)

2023-10-28 Thread Johannes Doerfert via cfe-commits

https://github.com/jdoerfert approved this pull request.

LG, thx

https://github.com/llvm/llvm-project/pull/69739
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [OpenMP] Unify the min/max thread/teams pathways (PR #70273)

2023-10-28 Thread Johannes Doerfert via cfe-commits


@@ -1,68 +1,20 @@
-// RUN: %clang_cc1 -verify -fopenmp -fopenmp-targets=powerpc64le-ibm-linux-gnu 
-x c++ -triple powerpc64le-unknown-unknown -emit-llvm %s -o - | FileCheck 
-allow-deprecated-dag-overlap  %s -check-prefix=CHECK1
-// RUN: %clang_cc1 -fopenmp -fopenmp-targets=powerpc64le-ibm-linux-gnu -x c++ 
-std=c++11 -triple powerpc64le-unknown-unknown -emit-pch -o %t %s
-// RUN: %clang_cc1 -fopenmp -fopenmp-targets=powerpc64le-ibm-linux-gnu -x c++ 
-triple powerpc64le-unknown-unknown -std=c++11 -include-pch %t -verify %s 
-emit-llvm -o - | FileCheck -allow-deprecated-dag-overlap  %s 
-check-prefix=CHECK1
-// RUN: %clang_cc1 -verify -fopenmp -fopenmp-targets=i386-pc-linux-gnu -x c++ 
-triple i386-unknown-unknown -emit-llvm %s -o - | FileCheck 
-allow-deprecated-dag-overlap  %s  -check-prefix=CHECK1
-// RUN: %clang_cc1 -fopenmp -fopenmp-targets=i386-pc-linux-gnu -x c++ 
-std=c++11 -triple i386-unknown-unknown -emit-pch -o %t %s
-// RUN: %clang_cc1 -fopenmp -fopenmp-targets=i386-pc-linux-gnu -x c++ -triple 
i386-unknown-unknown -std=c++11 -include-pch %t -verify %s -emit-llvm -o - | 
FileCheck -allow-deprecated-dag-overlap  %s  -check-prefix=CHECK1
-
-// RUN: %clang_cc1 -verify -fopenmp -fopenmp-targets=powerpc64le-ibm-linux-gnu 
-x c++ -triple powerpc64le-unknown-unknown -emit-llvm %s -o - | FileCheck 
-allow-deprecated-dag-overlap  %s -check-prefix=CHECK2
-// RUN: %clang_cc1 -fopenmp -fopenmp-targets=powerpc64le-ibm-linux-gnu -x c++ 
-std=c++11 -triple powerpc64le-unknown-unknown -emit-pch -o %t %s
-// RUN: %clang_cc1 -fopenmp -fopenmp-targets=powerpc64le-ibm-linux-gnu -x c++ 
-triple powerpc64le-unknown-unknown -std=c++11 -include-pch %t -verify %s 
-emit-llvm -o - | FileCheck -allow-deprecated-dag-overlap  %s 
-check-prefix=CHECK2
-// RUN: %clang_cc1 -verify -fopenmp -fopenmp-targets=i386-pc-linux-gnu -x c++ 
-triple i386-unknown-unknown -emit-llvm %s -o - | FileCheck 
-allow-deprecated-dag-overlap  %s  -check-prefix=CHECK2
-// RUN: %clang_cc1 -fopenmp -fopenmp-targets=i386-pc-linux-gnu -x c++ 
-std=c++11 -triple i386-unknown-unknown -emit-pch -o %t %s
-// RUN: %clang_cc1 -fopenmp -fopenmp-targets=i386-pc-linux-gnu -x c++ -triple 
i386-unknown-unknown -std=c++11 -include-pch %t -verify %s -emit-llvm -o - | 
FileCheck -allow-deprecated-dag-overlap  %s  -check-prefix=CHECK2
-
-// RUN: %clang_cc1 -verify -fopenmp -fopenmp-targets=powerpc64le-ibm-linux-gnu 
-x c++ -triple powerpc64le-unknown-unknown -emit-llvm %s -o - | FileCheck 
-allow-deprecated-dag-overlap  %s -check-prefix=CHECK3
-// RUN: %clang_cc1 -fopenmp -fopenmp-targets=powerpc64le-ibm-linux-gnu -x c++ 
-std=c++11 -triple powerpc64le-unknown-unknown -emit-pch -o %t %s
-// RUN: %clang_cc1 -fopenmp -fopenmp-targets=powerpc64le-ibm-linux-gnu -x c++ 
-triple powerpc64le-unknown-unknown -std=c++11 -include-pch %t -verify %s 
-emit-llvm -o - | FileCheck -allow-deprecated-dag-overlap  %s 
-check-prefix=CHECK3
-// RUN: %clang_cc1 -verify -fopenmp -fopenmp-targets=i386-pc-linux-gnu -x c++ 
-triple i386-unknown-unknown -emit-llvm %s -o - | FileCheck 
-allow-deprecated-dag-overlap  %s  -check-prefix=CHECK3
-// RUN: %clang_cc1 -fopenmp -fopenmp-targets=i386-pc-linux-gnu -x c++ 
-std=c++11 -triple i386-unknown-unknown -emit-pch -o %t %s
-// RUN: %clang_cc1 -fopenmp -fopenmp-targets=i386-pc-linux-gnu -x c++ -triple 
i386-unknown-unknown -std=c++11 -include-pch %t -verify %s -emit-llvm -o - | 
FileCheck -allow-deprecated-dag-overlap  %s  -check-prefix=CHECK3
-
-// RUN: %clang_cc1 -verify -fopenmp -fopenmp-targets=powerpc64le-ibm-linux-gnu 
-x c++ -triple powerpc64le-unknown-unknown -emit-llvm %s -o - | FileCheck 
-allow-deprecated-dag-overlap  %s -check-prefix=CHECK4
-// RUN: %clang_cc1 -fopenmp -fopenmp-targets=powerpc64le-ibm-linux-gnu -x c++ 
-std=c++11 -triple powerpc64le-unknown-unknown -emit-pch -o %t %s
-// RUN: %clang_cc1 -fopenmp -fopenmp-targets=powerpc64le-ibm-linux-gnu -x c++ 
-triple powerpc64le-unknown-unknown -std=c++11 -include-pch %t -verify %s 
-emit-llvm -o - | FileCheck -allow-deprecated-dag-overlap  %s 
-check-prefix=CHECK4
-// RUN: %clang_cc1 -verify -fopenmp -fopenmp-targets=i386-pc-linux-gnu -x c++ 
-triple i386-unknown-unknown -emit-llvm %s -o - | FileCheck 
-allow-deprecated-dag-overlap  %s  -check-prefix=CHECK4
-// RUN: %clang_cc1 -fopenmp -fopenmp-targets=i386-pc-linux-gnu -x c++ 
-std=c++11 -triple i386-unknown-unknown -emit-pch -o %t %s
-// RUN: %clang_cc1 -fopenmp -fopenmp-targets=i386-pc-linux-gnu -x c++ -triple 
i386-unknown-unknown -std=c++11 -include-pch %t -verify %s -emit-llvm -o - | 
FileCheck -allow-deprecated-dag-overlap  %s  -check-prefix=CHECK4
-
-// RUN: %clang_cc1 -verify -fopenmp -fopenmp-targets=powerpc64le-ibm-linux-gnu 
-x c++ -triple powerpc64le-unknown-unknown -emit-llvm %s -o - | FileCheck 
-allow-deprecated-dag-overlap  %s -check-prefix=CHECK5
-// RUN: %clang_cc1 -fopenmp -fopenmp-targets=powerpc64le-ibm-linux-gnu -x c++ 
-std=c++11 -triple powerpc64le-unknown-unknown -emit-pch -o %t %s
-// RU

[clang] [OpenMP] Associate the KernelEnvironment with the GenericKernelTy (PR #70383)

2023-10-29 Thread Johannes Doerfert via cfe-commits

https://github.com/jdoerfert updated 
https://github.com/llvm/llvm-project/pull/70383

>From fa6d6d9cf6398915f911e06eecc78c7ba83d3623 Mon Sep 17 00:00:00 2001
From: Johannes Doerfert 
Date: Wed, 25 Oct 2023 16:46:01 -0700
Subject: [PATCH] [OpenMP] Associate the KernelEnvironment with the
 GenericKernelTy

By associating the kernel environment with the generic kernel we can
access middle-end information easily, including the launch bounds ranges
that are acceptable. By constraining the number of threads accordingly,
we now obey the user provided bounds that were passed via attributes.
---
 clang/test/OpenMP/bug57757.cpp| 15 ++--
 llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp |  4 +-
 .../plugins-nextgen/amdgpu/src/rtl.cpp|  8 +-
 .../PluginInterface/PluginInterface.cpp   | 74 +++
 .../common/PluginInterface/PluginInterface.h  | 39 +-
 .../plugins-nextgen/cuda/src/rtl.cpp  |  8 +-
 .../generic-elf-64bit/src/rtl.cpp | 20 ++---
 .../test/offloading/default_thread_limit.c|  3 +-
 .../test/offloading/thread_state_1.c  |  4 +-
 .../test/offloading/thread_state_2.c  |  4 +-
 10 files changed, 74 insertions(+), 105 deletions(-)

diff --git a/clang/test/OpenMP/bug57757.cpp b/clang/test/OpenMP/bug57757.cpp
index 7894796ac46284c..7acfe134ddd0baf 100644
--- a/clang/test/OpenMP/bug57757.cpp
+++ b/clang/test/OpenMP/bug57757.cpp
@@ -32,24 +32,23 @@ void foo() {
 // CHECK-NEXT:  entry:
 // CHECK-NEXT:[[TMP2:%.*]] = getelementptr inbounds 
[[STRUCT_KMP_TASK_T:%.*]], ptr [[TMP1]], i64 0, i32 2
 // CHECK-NEXT:tail call void 
@llvm.experimental.noalias.scope.decl(metadata [[META13:![0-9]+]])
-// CHECK-NEXT:tail call void 
@llvm.experimental.noalias.scope.decl(metadata [[META16:![0-9]+]])
-// CHECK-NEXT:[[TMP3:%.*]] = load i32, ptr [[TMP2]], align 4, !tbaa 
[[TBAA18:![0-9]+]], !alias.scope !13, !noalias !16
+// CHECK-NEXT:[[TMP3:%.*]] = load i32, ptr [[TMP2]], align 4, !tbaa 
[[TBAA16:![0-9]+]], !alias.scope !13, !noalias !17
 // CHECK-NEXT:switch i32 [[TMP3]], label [[DOTOMP_OUTLINED__EXIT:%.*]] [
 // CHECK-NEXT:i32 0, label [[DOTUNTIED_JMP__I:%.*]]
 // CHECK-NEXT:i32 1, label [[DOTUNTIED_NEXT__I:%.*]]
 // CHECK-NEXT:]
 // CHECK:   .untied.jmp..i:
-// CHECK-NEXT:store i32 1, ptr [[TMP2]], align 4, !tbaa [[TBAA18]], 
!alias.scope !13, !noalias !16
-// CHECK-NEXT:[[TMP4:%.*]] = tail call i32 @__kmpc_omp_task(ptr nonnull 
@[[GLOB1]], i32 [[TMP0]], ptr [[TMP1]]), !noalias !19
+// CHECK-NEXT:store i32 1, ptr [[TMP2]], align 4, !tbaa [[TBAA16]], 
!alias.scope !13, !noalias !17
+// CHECK-NEXT:[[TMP4:%.*]] = tail call i32 @__kmpc_omp_task(ptr nonnull 
@[[GLOB1]], i32 [[TMP0]], ptr [[TMP1]]), !noalias !13
 // CHECK-NEXT:br label [[DOTOMP_OUTLINED__EXIT]]
 // CHECK:   .untied.next..i:
 // CHECK-NEXT:[[TMP5:%.*]] = getelementptr inbounds 
[[STRUCT_KMP_TASK_T_WITH_PRIVATES:%.*]], ptr [[TMP1]], i64 0, i32 1
 // CHECK-NEXT:[[TMP6:%.*]] = getelementptr inbounds 
[[STRUCT_KMP_TASK_T_WITH_PRIVATES]], ptr [[TMP1]], i64 0, i32 1, i32 2
 // CHECK-NEXT:[[TMP7:%.*]] = getelementptr inbounds 
[[STRUCT_KMP_TASK_T_WITH_PRIVATES]], ptr [[TMP1]], i64 0, i32 1, i32 1
-// CHECK-NEXT:[[TMP8:%.*]] = load ptr, ptr [[TMP5]], align 8, !tbaa 
[[TBAA20:![0-9]+]], !alias.scope !16, !noalias !13
-// CHECK-NEXT:[[TMP9:%.*]] = load i32, ptr [[TMP7]], align 4, !tbaa 
[[TBAA18]], !alias.scope !16, !noalias !13
-// CHECK-NEXT:[[TMP10:%.*]] = load float, ptr [[TMP6]], align 4, !tbaa 
[[TBAA21:![0-9]+]], !alias.scope !16, !noalias !13
-// CHECK-NEXT:tail call void [[TMP8]](i32 noundef [[TMP9]], float noundef 
[[TMP10]]) #[[ATTR2:[0-9]+]], !noalias !19
+// CHECK-NEXT:[[TMP8:%.*]] = load ptr, ptr [[TMP5]], align 8, !tbaa 
[[TBAA19:![0-9]+]], !noalias !13
+// CHECK-NEXT:[[TMP9:%.*]] = load i32, ptr [[TMP7]], align 4, !tbaa 
[[TBAA16]], !noalias !13
+// CHECK-NEXT:[[TMP10:%.*]] = load float, ptr [[TMP6]], align 4, !tbaa 
[[TBAA20:![0-9]+]], !noalias !13
+// CHECK-NEXT:tail call void [[TMP8]](i32 noundef [[TMP9]], float noundef 
[[TMP10]]) #[[ATTR2:[0-9]+]], !noalias !13
 // CHECK-NEXT:br label [[DOTOMP_OUTLINED__EXIT]]
 // CHECK:   .omp_outlined..exit:
 // CHECK-NEXT:ret i32 0
diff --git a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp 
b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
index 3e4e030f44c7fe0..b320d77652e1cba 100644
--- a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
+++ b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
@@ -4093,8 +4093,8 @@ OpenMPIRBuilder::createTargetInit(const 
LocationDescription &Loc, bool IsSPMD,
 
   Function *Kernel = Builder.GetInsertBlock()->getParent();
 
-  /// Manifest the launch configuration in the metadata matching the kernel
-  /// environment.
+  // Manifest the launch configuration in the metadata matching the kernel
+  // environment.
   if (MinTeamsVal > 1 || MaxTeamsVal > 0)
 writeTeamsForKernel(T, *Kernel, MinTeamsVal, MaxTeams

[clang] [OpenMP] Associate the KernelEnvironment with the GenericKernelTy (PR #70383)

2023-10-29 Thread Johannes Doerfert via cfe-commits

https://github.com/jdoerfert closed 
https://github.com/llvm/llvm-project/pull/70383
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[openmp] [llvm] [mlir] [clang] [OpenMP] Introduce the KernelLaunchEnvironment as implicit argument (PR #70401)

2023-10-31 Thread Johannes Doerfert via cfe-commits

https://github.com/jdoerfert edited 
https://github.com/llvm/llvm-project/pull/70401
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[openmp] [llvm] [mlir] [clang] [OpenMP] Introduce the KernelLaunchEnvironment as implicit argument (PR #70401)

2023-10-31 Thread Johannes Doerfert via cfe-commits

https://github.com/jdoerfert closed 
https://github.com/llvm/llvm-project/pull/70401
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [openmp] [OpenMP] Non racy team reductions (PR #70752)

2023-10-31 Thread Johannes Doerfert via cfe-commits

https://github.com/jdoerfert updated 
https://github.com/llvm/llvm-project/pull/70752

>From 04aafdce6f259e31304ed47118a56042b155bd77 Mon Sep 17 00:00:00 2001
From: Johannes Doerfert 
Date: Mon, 30 Oct 2023 16:39:00 -0700
Subject: [PATCH] [OpenMP][FIX] Allocate per launch memory for GPU team
 reductions

We used to perform team reduction on global memory allocated in the
runtime and by clang. This was racy as multiple instances of a kernel,
or different kernels with team reductions, would use the same locations.
Since we now have the kernel launch environment, we can allocate dynamic
memory per-launch, allowing us to move all the state into a non-racy
place.

Fixes: https://github.com/llvm/llvm-project/issues/70249
---
 clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp  |  75 ++
 clang/lib/CodeGen/CGOpenMPRuntimeGPU.h|   2 -
 .../OpenMP/nvptx_teams_reduction_codegen.cpp  | 240 +-
 .../target_teams_generic_loop_codegen.cpp |  20 +-
 .../DeviceRTL/include/Interface.h |   2 +
 .../libomptarget/DeviceRTL/src/Reduction.cpp  |  10 +-
 openmp/libomptarget/include/Environment.h |   7 +-
 .../PluginInterface/PluginInterface.cpp   |  11 +
 .../common/PluginInterface/PluginInterface.h  |   2 +-
 .../parallel_target_teams_reduction.cpp   |  36 +++
 10 files changed, 221 insertions(+), 184 deletions(-)
 create mode 100644 
openmp/libomptarget/test/offloading/parallel_target_teams_reduction.cpp

diff --git a/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp 
b/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
index bd9329b8e2d4113..0ed665e0dfb9722 100644
--- a/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
+++ b/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
@@ -803,8 +803,30 @@ void CGOpenMPRuntimeGPU::emitKernelDeinit(CodeGenFunction 
&CGF,
   if (!IsSPMD)
 emitGenericVarsEpilog(CGF);
 
+  // This is temporary until we remove the fixed sized buffer.
+  ASTContext &C = CGM.getContext();
+  RecordDecl *StaticRD = C.buildImplicitRecord(
+  "_openmp_teams_reduction_type_$_", RecordDecl::TagKind::TTK_Union);
+  StaticRD->startDefinition();
+  for (const RecordDecl *TeamReductionRec : TeamsReductions) {
+QualType RecTy = C.getRecordType(TeamReductionRec);
+auto *Field = FieldDecl::Create(
+C, StaticRD, SourceLocation(), SourceLocation(), nullptr, RecTy,
+C.getTrivialTypeSourceInfo(RecTy, SourceLocation()),
+/*BW=*/nullptr, /*Mutable=*/false,
+/*InitStyle=*/ICIS_NoInit);
+Field->setAccess(AS_public);
+StaticRD->addDecl(Field);
+  }
+  StaticRD->completeDefinition();
+  QualType StaticTy = C.getRecordType(StaticRD);
+  llvm::Type *LLVMReductionsBufferTy =
+  CGM.getTypes().ConvertTypeForMem(StaticTy);
+  const auto &DL = CGM.getModule().getDataLayout();
+  uint64_t BufferSize =
+  DL.getTypeAllocSize(LLVMReductionsBufferTy).getFixedValue();
   CGBuilderTy &Bld = CGF.Builder;
-  OMPBuilder.createTargetDeinit(Bld);
+  OMPBuilder.createTargetDeinit(Bld, BufferSize);
 }
 
 void CGOpenMPRuntimeGPU::emitSPMDKernel(const OMPExecutableDirective &D,
@@ -2998,15 +3020,10 @@ void CGOpenMPRuntimeGPU::emitReduction(
 CGM.getContext(), PrivatesReductions, std::nullopt, VarFieldMap,
 C.getLangOpts().OpenMPCUDAReductionBufNum);
 TeamsReductions.push_back(TeamReductionRec);
-if (!KernelTeamsReductionPtr) {
-  KernelTeamsReductionPtr = new llvm::GlobalVariable(
-  CGM.getModule(), CGM.VoidPtrTy, /*isConstant=*/true,
-  llvm::GlobalValue::InternalLinkage, nullptr,
-  "_openmp_teams_reductions_buffer_$_$ptr");
-}
-llvm::Value *GlobalBufferPtr = CGF.EmitLoadOfScalar(
-Address(KernelTeamsReductionPtr, CGF.VoidPtrTy, CGM.getPointerAlign()),
-/*Volatile=*/false, C.getPointerType(C.VoidPtrTy), Loc);
+auto *KernelTeamsReductionPtr = CGF.EmitRuntimeCall(
+OMPBuilder.getOrCreateRuntimeFunction(
+CGM.getModule(), OMPRTL___kmpc_reduction_get_fixed_buffer),
+{}, "_openmp_teams_reductions_buffer_$_$ptr");
 llvm::Value *GlobalToBufferCpyFn = ::emitListToGlobalCopyFunction(
 CGM, Privates, ReductionArrayTy, Loc, TeamReductionRec, VarFieldMap);
 llvm::Value *GlobalToBufferRedFn = ::emitListToGlobalReduceFunction(
@@ -3021,7 +3038,7 @@ void CGOpenMPRuntimeGPU::emitReduction(
 llvm::Value *Args[] = {
 RTLoc,
 ThreadId,
-GlobalBufferPtr,
+KernelTeamsReductionPtr,
 CGF.Builder.getInt32(C.getLangOpts().OpenMPCUDAReductionBufNum),
 RL,
 ShuffleAndReduceFn,
@@ -3654,42 +3671,6 @@ void CGOpenMPRuntimeGPU::processRequiresDirective(
   CGOpenMPRuntime::processRequiresDirective(D);
 }
 
-void CGOpenMPRuntimeGPU::clear() {
-
-  if (!TeamsReductions.empty()) {
-ASTContext &C = CGM.getContext();
-RecordDecl *StaticRD = C.buildImplicitRecord(
-"_openmp_teams_reduction_type_$_", RecordDecl::TagKind::TTK_Union);
-StaticRD->startDefinition();
-for (const RecordDecl *TeamReductionRec : Team

[clang] [openmp] [OpenMP][FIX] Allocate per launch memory for GPU team reductions (PR #70752)

2023-10-31 Thread Johannes Doerfert via cfe-commits

https://github.com/jdoerfert edited 
https://github.com/llvm/llvm-project/pull/70752
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[openmp] [clang] [OpenMP][FIX] Allocate per launch memory for GPU team reductions (PR #70752)

2023-10-31 Thread Johannes Doerfert via cfe-commits

https://github.com/jdoerfert edited 
https://github.com/llvm/llvm-project/pull/70752
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [openmp] [OpenMP][FIX] Allocate per launch memory for GPU team reductions (PR #70752)

2023-10-31 Thread Johannes Doerfert via cfe-commits

https://github.com/jdoerfert edited 
https://github.com/llvm/llvm-project/pull/70752
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[openmp] [clang] [OpenMP] Team reduction work specialization (PR #70766)

2023-10-31 Thread Johannes Doerfert via cfe-commits

https://github.com/jdoerfert updated 
https://github.com/llvm/llvm-project/pull/70766

>From 04aafdce6f259e31304ed47118a56042b155bd77 Mon Sep 17 00:00:00 2001
From: Johannes Doerfert 
Date: Mon, 30 Oct 2023 16:39:00 -0700
Subject: [PATCH 1/2] [OpenMP][FIX] Allocate per launch memory for GPU team
 reductions

We used to perform team reduction on global memory allocated in the
runtime and by clang. This was racy as multiple instances of a kernel,
or different kernels with team reductions, would use the same locations.
Since we now have the kernel launch environment, we can allocate dynamic
memory per-launch, allowing us to move all the state into a non-racy
place.

Fixes: https://github.com/llvm/llvm-project/issues/70249
---
 clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp  |  75 ++
 clang/lib/CodeGen/CGOpenMPRuntimeGPU.h|   2 -
 .../OpenMP/nvptx_teams_reduction_codegen.cpp  | 240 +-
 .../target_teams_generic_loop_codegen.cpp |  20 +-
 .../DeviceRTL/include/Interface.h |   2 +
 .../libomptarget/DeviceRTL/src/Reduction.cpp  |  10 +-
 openmp/libomptarget/include/Environment.h |   7 +-
 .../PluginInterface/PluginInterface.cpp   |  11 +
 .../common/PluginInterface/PluginInterface.h  |   2 +-
 .../parallel_target_teams_reduction.cpp   |  36 +++
 10 files changed, 221 insertions(+), 184 deletions(-)
 create mode 100644 
openmp/libomptarget/test/offloading/parallel_target_teams_reduction.cpp

diff --git a/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp 
b/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
index bd9329b8e2d4113..0ed665e0dfb9722 100644
--- a/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
+++ b/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
@@ -803,8 +803,30 @@ void CGOpenMPRuntimeGPU::emitKernelDeinit(CodeGenFunction 
&CGF,
   if (!IsSPMD)
 emitGenericVarsEpilog(CGF);
 
+  // This is temporary until we remove the fixed sized buffer.
+  ASTContext &C = CGM.getContext();
+  RecordDecl *StaticRD = C.buildImplicitRecord(
+  "_openmp_teams_reduction_type_$_", RecordDecl::TagKind::TTK_Union);
+  StaticRD->startDefinition();
+  for (const RecordDecl *TeamReductionRec : TeamsReductions) {
+QualType RecTy = C.getRecordType(TeamReductionRec);
+auto *Field = FieldDecl::Create(
+C, StaticRD, SourceLocation(), SourceLocation(), nullptr, RecTy,
+C.getTrivialTypeSourceInfo(RecTy, SourceLocation()),
+/*BW=*/nullptr, /*Mutable=*/false,
+/*InitStyle=*/ICIS_NoInit);
+Field->setAccess(AS_public);
+StaticRD->addDecl(Field);
+  }
+  StaticRD->completeDefinition();
+  QualType StaticTy = C.getRecordType(StaticRD);
+  llvm::Type *LLVMReductionsBufferTy =
+  CGM.getTypes().ConvertTypeForMem(StaticTy);
+  const auto &DL = CGM.getModule().getDataLayout();
+  uint64_t BufferSize =
+  DL.getTypeAllocSize(LLVMReductionsBufferTy).getFixedValue();
   CGBuilderTy &Bld = CGF.Builder;
-  OMPBuilder.createTargetDeinit(Bld);
+  OMPBuilder.createTargetDeinit(Bld, BufferSize);
 }
 
 void CGOpenMPRuntimeGPU::emitSPMDKernel(const OMPExecutableDirective &D,
@@ -2998,15 +3020,10 @@ void CGOpenMPRuntimeGPU::emitReduction(
 CGM.getContext(), PrivatesReductions, std::nullopt, VarFieldMap,
 C.getLangOpts().OpenMPCUDAReductionBufNum);
 TeamsReductions.push_back(TeamReductionRec);
-if (!KernelTeamsReductionPtr) {
-  KernelTeamsReductionPtr = new llvm::GlobalVariable(
-  CGM.getModule(), CGM.VoidPtrTy, /*isConstant=*/true,
-  llvm::GlobalValue::InternalLinkage, nullptr,
-  "_openmp_teams_reductions_buffer_$_$ptr");
-}
-llvm::Value *GlobalBufferPtr = CGF.EmitLoadOfScalar(
-Address(KernelTeamsReductionPtr, CGF.VoidPtrTy, CGM.getPointerAlign()),
-/*Volatile=*/false, C.getPointerType(C.VoidPtrTy), Loc);
+auto *KernelTeamsReductionPtr = CGF.EmitRuntimeCall(
+OMPBuilder.getOrCreateRuntimeFunction(
+CGM.getModule(), OMPRTL___kmpc_reduction_get_fixed_buffer),
+{}, "_openmp_teams_reductions_buffer_$_$ptr");
 llvm::Value *GlobalToBufferCpyFn = ::emitListToGlobalCopyFunction(
 CGM, Privates, ReductionArrayTy, Loc, TeamReductionRec, VarFieldMap);
 llvm::Value *GlobalToBufferRedFn = ::emitListToGlobalReduceFunction(
@@ -3021,7 +3038,7 @@ void CGOpenMPRuntimeGPU::emitReduction(
 llvm::Value *Args[] = {
 RTLoc,
 ThreadId,
-GlobalBufferPtr,
+KernelTeamsReductionPtr,
 CGF.Builder.getInt32(C.getLangOpts().OpenMPCUDAReductionBufNum),
 RL,
 ShuffleAndReduceFn,
@@ -3654,42 +3671,6 @@ void CGOpenMPRuntimeGPU::processRequiresDirective(
   CGOpenMPRuntime::processRequiresDirective(D);
 }
 
-void CGOpenMPRuntimeGPU::clear() {
-
-  if (!TeamsReductions.empty()) {
-ASTContext &C = CGM.getContext();
-RecordDecl *StaticRD = C.buildImplicitRecord(
-"_openmp_teams_reduction_type_$_", RecordDecl::TagKind::TTK_Union);
-StaticRD->startDefinition();
-for (const RecordDecl *TeamReductionRec : 

[openmp] [clang] [OpenMP][FIX] Allocate per launch memory for GPU team reductions (PR #70752)

2023-11-01 Thread Johannes Doerfert via cfe-commits


@@ -194,6 +191,9 @@ int32_t __kmpc_nvptx_teams_reduce_nowait_v2(
 ThreadId = 0;
   }
 
+  uint32_t &IterCnt = state::getKernelLaunchEnvironment().ReductionIterCnt;
+  uint32_t &Cnt = state::getKernelLaunchEnvironment().ReductionCnt;

jdoerfert wrote:

They are, I replaced the globals with them.

https://github.com/llvm/llvm-project/pull/70752
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[openmp] [clang] [OpenMP][FIX] Allocate per launch memory for GPU team reductions (PR #70752)

2023-11-01 Thread Johannes Doerfert via cfe-commits

https://github.com/jdoerfert updated 
https://github.com/llvm/llvm-project/pull/70752

>From 1859bd43bc2c0bb32fc028f0daf73525f039c5d2 Mon Sep 17 00:00:00 2001
From: Johannes Doerfert 
Date: Mon, 30 Oct 2023 16:39:00 -0700
Subject: [PATCH] [OpenMP][FIX] Allocate per launch memory for GPU team
 reductions

We used to perform team reduction on global memory allocated in the
runtime and by clang. This was racy as multiple instances of a kernel,
or different kernels with team reductions, would use the same locations.
Since we now have the kernel launch environment, we can allocate dynamic
memory per-launch, allowing us to move all the state into a non-racy
place.

Fixes: https://github.com/llvm/llvm-project/issues/70249
---
 clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp  |  75 ++
 clang/lib/CodeGen/CGOpenMPRuntimeGPU.h|   2 -
 .../OpenMP/nvptx_teams_reduction_codegen.cpp  | 240 +-
 .../target_teams_generic_loop_codegen.cpp |  20 +-
 .../DeviceRTL/include/Interface.h |   2 +
 .../libomptarget/DeviceRTL/src/Reduction.cpp  |  10 +-
 openmp/libomptarget/include/Environment.h |  25 +-
 .../PluginInterface/PluginInterface.cpp   |  16 +-
 .../parallel_target_teams_reduction.cpp   |  36 +++
 9 files changed, 231 insertions(+), 195 deletions(-)
 create mode 100644 
openmp/libomptarget/test/offloading/parallel_target_teams_reduction.cpp

diff --git a/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp 
b/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
index bd9329b8e2d4113..0ed665e0dfb9722 100644
--- a/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
+++ b/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
@@ -803,8 +803,30 @@ void CGOpenMPRuntimeGPU::emitKernelDeinit(CodeGenFunction 
&CGF,
   if (!IsSPMD)
 emitGenericVarsEpilog(CGF);
 
+  // This is temporary until we remove the fixed sized buffer.
+  ASTContext &C = CGM.getContext();
+  RecordDecl *StaticRD = C.buildImplicitRecord(
+  "_openmp_teams_reduction_type_$_", RecordDecl::TagKind::TTK_Union);
+  StaticRD->startDefinition();
+  for (const RecordDecl *TeamReductionRec : TeamsReductions) {
+QualType RecTy = C.getRecordType(TeamReductionRec);
+auto *Field = FieldDecl::Create(
+C, StaticRD, SourceLocation(), SourceLocation(), nullptr, RecTy,
+C.getTrivialTypeSourceInfo(RecTy, SourceLocation()),
+/*BW=*/nullptr, /*Mutable=*/false,
+/*InitStyle=*/ICIS_NoInit);
+Field->setAccess(AS_public);
+StaticRD->addDecl(Field);
+  }
+  StaticRD->completeDefinition();
+  QualType StaticTy = C.getRecordType(StaticRD);
+  llvm::Type *LLVMReductionsBufferTy =
+  CGM.getTypes().ConvertTypeForMem(StaticTy);
+  const auto &DL = CGM.getModule().getDataLayout();
+  uint64_t BufferSize =
+  DL.getTypeAllocSize(LLVMReductionsBufferTy).getFixedValue();
   CGBuilderTy &Bld = CGF.Builder;
-  OMPBuilder.createTargetDeinit(Bld);
+  OMPBuilder.createTargetDeinit(Bld, BufferSize);
 }
 
 void CGOpenMPRuntimeGPU::emitSPMDKernel(const OMPExecutableDirective &D,
@@ -2998,15 +3020,10 @@ void CGOpenMPRuntimeGPU::emitReduction(
 CGM.getContext(), PrivatesReductions, std::nullopt, VarFieldMap,
 C.getLangOpts().OpenMPCUDAReductionBufNum);
 TeamsReductions.push_back(TeamReductionRec);
-if (!KernelTeamsReductionPtr) {
-  KernelTeamsReductionPtr = new llvm::GlobalVariable(
-  CGM.getModule(), CGM.VoidPtrTy, /*isConstant=*/true,
-  llvm::GlobalValue::InternalLinkage, nullptr,
-  "_openmp_teams_reductions_buffer_$_$ptr");
-}
-llvm::Value *GlobalBufferPtr = CGF.EmitLoadOfScalar(
-Address(KernelTeamsReductionPtr, CGF.VoidPtrTy, CGM.getPointerAlign()),
-/*Volatile=*/false, C.getPointerType(C.VoidPtrTy), Loc);
+auto *KernelTeamsReductionPtr = CGF.EmitRuntimeCall(
+OMPBuilder.getOrCreateRuntimeFunction(
+CGM.getModule(), OMPRTL___kmpc_reduction_get_fixed_buffer),
+{}, "_openmp_teams_reductions_buffer_$_$ptr");
 llvm::Value *GlobalToBufferCpyFn = ::emitListToGlobalCopyFunction(
 CGM, Privates, ReductionArrayTy, Loc, TeamReductionRec, VarFieldMap);
 llvm::Value *GlobalToBufferRedFn = ::emitListToGlobalReduceFunction(
@@ -3021,7 +3038,7 @@ void CGOpenMPRuntimeGPU::emitReduction(
 llvm::Value *Args[] = {
 RTLoc,
 ThreadId,
-GlobalBufferPtr,
+KernelTeamsReductionPtr,
 CGF.Builder.getInt32(C.getLangOpts().OpenMPCUDAReductionBufNum),
 RL,
 ShuffleAndReduceFn,
@@ -3654,42 +3671,6 @@ void CGOpenMPRuntimeGPU::processRequiresDirective(
   CGOpenMPRuntime::processRequiresDirective(D);
 }
 
-void CGOpenMPRuntimeGPU::clear() {
-
-  if (!TeamsReductions.empty()) {
-ASTContext &C = CGM.getContext();
-RecordDecl *StaticRD = C.buildImplicitRecord(
-"_openmp_teams_reduction_type_$_", RecordDecl::TagKind::TTK_Union);
-StaticRD->startDefinition();
-for (const RecordDecl *TeamReductionRec : TeamsReductions) {
-  QualType RecTy = C.getRecordType(T

[openmp] [clang] [OpenMP][FIX] Allocate per launch memory for GPU team reductions (PR #70752)

2023-11-01 Thread Johannes Doerfert via cfe-commits

https://github.com/jdoerfert closed 
https://github.com/llvm/llvm-project/pull/70752
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] 0e06ddf - Revert "[APINotes] Upstream APINotesOptions"

2023-11-01 Thread Johannes Doerfert via cfe-commits

Author: Johannes Doerfert
Date: 2023-11-01T11:33:25-07:00
New Revision: 0e06ddf0f6896cfd817a1b97a43b78331e0b1d66

URL: 
https://github.com/llvm/llvm-project/commit/0e06ddf0f6896cfd817a1b97a43b78331e0b1d66
DIFF: 
https://github.com/llvm/llvm-project/commit/0e06ddf0f6896cfd817a1b97a43b78331e0b1d66.diff

LOG: Revert "[APINotes] Upstream APINotesOptions"

This reverts commit c0a1857928c557400af0ed53d198cc9f3f185f9a.

A shared_ptr assertion always triggers causes all bots to fail.

Added: 


Modified: 
clang/include/clang/Driver/Options.td
clang/include/clang/Frontend/CompilerInvocation.h
clang/lib/Frontend/CompilerInvocation.cpp

Removed: 
clang/include/clang/APINotes/APINotesOptions.h



diff  --git a/clang/include/clang/APINotes/APINotesOptions.h 
b/clang/include/clang/APINotes/APINotesOptions.h
deleted file mode 100644
index e8b8a9ed2261fa1..000
--- a/clang/include/clang/APINotes/APINotesOptions.h
+++ /dev/null
@@ -1,34 +0,0 @@
-//===--- APINotesOptions.h --*- C++ 
-*-===//
-//
-// Part of the LLVM Project, under the Apache License v2.0 with LLVM 
Exceptions.
-// See https://llvm.org/LICENSE.txt for license information.
-// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
-//
-//===--===//
-
-#ifndef LLVM_CLANG_APINOTES_APINOTESOPTIONS_H
-#define LLVM_CLANG_APINOTES_APINOTESOPTIONS_H
-
-#include "llvm/Support/VersionTuple.h"
-#include 
-#include 
-
-namespace clang {
-
-/// Tracks various options which control how API notes are found and handled.
-class APINotesOptions {
-public:
-  /// The Swift version which should be used for API notes.
-  llvm::VersionTuple SwiftVersion;
-
-  /// The set of search paths where we API notes can be found for particular
-  /// modules.
-  ///
-  /// The API notes in this directory are stored as .apinotes, and
-  /// are only applied when building the module .
-  std::vector ModuleSearchPaths;
-};
-
-} // namespace clang
-
-#endif // LLVM_CLANG_APINOTES_APINOTESOPTIONS_H

diff  --git a/clang/include/clang/Driver/Options.td 
b/clang/include/clang/Driver/Options.td
index fcf6a4b2ccb2369..b1229b2f4562379 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -1733,10 +1733,6 @@ def fswift_async_fp_EQ : Joined<["-"], 
"fswift-async-fp=">,
 NormalizedValuesScope<"CodeGenOptions::SwiftAsyncFramePointerKind">,
 NormalizedValues<["Auto", "Always", "Never"]>,
 MarshallingInfoEnum, "Always">;
-def fapinotes_swift_version : Joined<["-"], "fapinotes-swift-version=">,
-  Group, Visibility<[ClangOption, CC1Option]>,
-  MetaVarName<"">,
-  HelpText<"Specify the Swift version to use when filtering API notes">;
 
 defm addrsig : BoolFOption<"addrsig",
   CodeGenOpts<"Addrsig">, DefaultFalse,
@@ -4133,9 +4129,6 @@ def ibuiltininc : Flag<["-"], "ibuiltininc">, 
Group,
 def index_header_map : Flag<["-"], "index-header-map">,
   Visibility<[ClangOption, CC1Option]>,
   HelpText<"Make the next included directory (-I or -F) an indexer header 
map">;
-def iapinotes_modules : JoinedOrSeparate<["-"], "iapinotes-modules">, 
Group,
-  Visibility<[ClangOption, CC1Option]>,
-  HelpText<"Add directory to the API notes search path referenced by module 
name">, MetaVarName<"">;
 def idirafter : JoinedOrSeparate<["-"], "idirafter">, Group,
   Visibility<[ClangOption, CC1Option]>,
   HelpText<"Add directory to AFTER include search path">;

diff  --git a/clang/include/clang/Frontend/CompilerInvocation.h 
b/clang/include/clang/Frontend/CompilerInvocation.h
index d9c757a8a156861..45e263e7bc76822 100644
--- a/clang/include/clang/Frontend/CompilerInvocation.h
+++ b/clang/include/clang/Frontend/CompilerInvocation.h
@@ -9,7 +9,6 @@
 #ifndef LLVM_CLANG_FRONTEND_COMPILERINVOCATION_H
 #define LLVM_CLANG_FRONTEND_COMPILERINVOCATION_H
 
-#include "clang/APINotes/APINotesOptions.h"
 #include "clang/Basic/CodeGenOptions.h"
 #include "clang/Basic/DiagnosticOptions.h"
 #include "clang/Basic/FileSystemOptions.h"
@@ -93,9 +92,6 @@ class CompilerInvocationBase {
 
   std::shared_ptr MigratorOpts;
 
-  /// Options controlling API notes.
-  std::shared_ptr APINotesOpts;
-
   /// Options controlling IRgen and the backend.
   std::shared_ptr CodeGenOpts;
 
@@ -135,7 +131,6 @@ class CompilerInvocationBase {
   const PreprocessorOptions &getPreprocessorOpts() const { return *PPOpts; }
   const AnalyzerOptions &getAnalyzerOpts() const { return *AnalyzerOpts; }
   const MigratorOptions &getMigratorOpts() const { return *MigratorOpts; }
-  const APINotesOptions &getAPINotesOpts() const { return *APINotesOpts; }
   const CodeGenOptions &getCodeGenOpts() const { return *CodeGenOpts; }
   const FileSystemOptions &getFileSystemOpts() const { return *FSOpts; }
   const FrontendOptions &getFrontendOpts() const { return *FrontendOpts; }
@@ -247,7 +242,6 @@ class Compile

[clang] [OpenMP] Make team reductions less bad (PR #70981)

2023-11-02 Thread Johannes Doerfert via cfe-commits

https://github.com/jdoerfert closed 
https://github.com/llvm/llvm-project/pull/70981
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[llvm] [clang] [OpenMP][OMPIRBuilder] Add support to omp target parallel (PR #67000)

2023-11-02 Thread Johannes Doerfert via cfe-commits

https://github.com/jdoerfert edited 
https://github.com/llvm/llvm-project/pull/67000
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[llvm] [clang] [OpenMP][OMPIRBuilder] Add support to omp target parallel (PR #67000)

2023-11-02 Thread Johannes Doerfert via cfe-commits

https://github.com/jdoerfert approved this pull request.

LG, commit the two commits separately though

https://github.com/llvm/llvm-project/pull/67000
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [OpenMP][OMPIRBuilder] Add support to omp target parallel (PR #67000)

2023-11-02 Thread Johannes Doerfert via cfe-commits


@@ -1026,25 +1026,25 @@ for (int i = 0; i < argc; ++i) {
 // CHECK3-NEXT:call void @llvm.experimental.noalias.scope.decl(metadata 
[[META8:![0-9]+]])
 // CHECK3-NEXT:call void @llvm.experimental.noalias.scope.decl(metadata 
[[META10:![0-9]+]])
 // CHECK3-NEXT:call void @llvm.experimental.noalias.scope.decl(metadata 
[[META12:![0-9]+]])
-// CHECK3-NEXT:store i32 [[TMP2]], ptr [[DOTGLOBAL_TID__ADDR_I]], align 4, 
!noalias !14
-// CHECK3-NEXT:store ptr [[TMP5]], ptr [[DOTPART_ID__ADDR_I]], align 8, 
!noalias !14
-// CHECK3-NEXT:store ptr null, ptr [[DOTPRIVATES__ADDR_I]], align 8, 
!noalias !14
-// CHECK3-NEXT:store ptr null, ptr [[DOTCOPY_FN__ADDR_I]], align 8, 
!noalias !14
-// CHECK3-NEXT:store ptr [[TMP3]], ptr [[DOTTASK_T__ADDR_I]], align 8, 
!noalias !14
-// CHECK3-NEXT:store ptr [[TMP7]], ptr [[__CONTEXT_ADDR_I]], align 8, 
!noalias !14
-// CHECK3-NEXT:[[TMP8:%.*]] = load ptr, ptr [[__CONTEXT_ADDR_I]], align 8, 
!noalias !14
+// CHECK3-NEXT:store i32 [[TMP2]], ptr [[DOTGLOBAL_TID__ADDR_I]], align 4, 
!noalias ![[NOALIAS0:[0-9]+]]
+// CHECK3-NEXT:store ptr [[TMP5]], ptr [[DOTPART_ID__ADDR_I]], align 8, 
!noalias ![[NOALIAS0]]
+// CHECK3-NEXT:store ptr null, ptr [[DOTPRIVATES__ADDR_I]], align 8, 
!noalias ![[NOALIAS0]]
+// CHECK3-NEXT:store ptr null, ptr [[DOTCOPY_FN__ADDR_I]], align 8, 
!noalias ![[NOALIAS0]]
+// CHECK3-NEXT:store ptr [[TMP3]], ptr [[DOTTASK_T__ADDR_I]], align 8, 
!noalias ![[NOALIAS0]]
+// CHECK3-NEXT:store ptr [[TMP7]], ptr [[__CONTEXT_ADDR_I]], align 8, 
!noalias ![[NOALIAS0]]
+// CHECK3-NEXT:[[TMP8:%.*]] = load ptr, ptr [[__CONTEXT_ADDR_I]], align 8, 
!noalias ![[NOALIAS0]]
 // CHECK3-NEXT:[[OMP_GLOBAL_THREAD_NUM_I:%.*]] = call i32 
@__kmpc_global_thread_num(ptr @[[GLOB12:[0-9]+]])
 // CHECK3-NEXT:[[TMP9:%.*]] = call i32 @__kmpc_cancel(ptr @[[GLOB1]], i32 
[[OMP_GLOBAL_THREAD_NUM_I]], i32 4)
 // CHECK3-NEXT:[[TMP10:%.*]] = icmp ne i32 [[TMP9]], 0
 // CHECK3-NEXT:br i1 [[TMP10]], label [[DOTCANCEL_EXIT_I:%.*]], label 
[[DOTCANCEL_CONTINUE_I:%.*]]
 // CHECK3:   .cancel.exit.i:
-// CHECK3-NEXT:store i32 1, ptr [[CLEANUP_DEST_SLOT_I]], align 4, !noalias 
!14
+// CHECK3-NEXT:store i32 1, ptr [[CLEANUP_DEST_SLOT_I]], align 4, !noalias 
![[NOALIAS1:[0-9]+]]
 // CHECK3-NEXT:br label [[DOTOMP_OUTLINED__EXIT:%.*]]
 // CHECK3:   .cancel.continue.i:
-// CHECK3-NEXT:store i32 0, ptr [[CLEANUP_DEST_SLOT_I]], align 4, !noalias 
!14
+// CHECK3-NEXT:store i32 0, ptr [[CLEANUP_DEST_SLOT_I]], align 4, !noalias 
![[NOALIAS1]]
 // CHECK3-NEXT:br label [[DOTOMP_OUTLINED__EXIT]]
 // CHECK3:   .omp_outlined..exit:
-// CHECK3-NEXT:[[CLEANUP_DEST_I:%.*]] = load i32, ptr 
[[CLEANUP_DEST_SLOT_I]], align 4, !noalias !14
+// CHECK3-NEXT:[[CLEANUP_DEST_I:%.*]] = load i32, ptr 
[[CLEANUP_DEST_SLOT_I]], align 4, !noalias ![[NOALIAS1]]

jdoerfert wrote:

Commit that right away as NFC. If you leave it with the PR and you use the web 
interface it'll smash them right back together.

https://github.com/llvm/llvm-project/pull/67000
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[llvm] [clang] Recommit changes to global checks (PR #71171)

2023-11-03 Thread Johannes Doerfert via cfe-commits

https://github.com/jdoerfert commented:

I think if the issues with the original commit are resolved, this is good to go.
Did you verify we can properly auto-generate files, e.g., in 
llvm/test/Transforms/Attributor and clang/test/OpenMP?


https://github.com/llvm/llvm-project/pull/71171
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] d3e7a48 - [OpenMP][NFC] Remove a no-op function

2023-11-03 Thread Johannes Doerfert via cfe-commits

Author: Johannes Doerfert
Date: 2023-11-03T10:28:36-07:00
New Revision: d3e7a48cbde060a6dbc1edcb00f375fb2f9405dc

URL: 
https://github.com/llvm/llvm-project/commit/d3e7a48cbde060a6dbc1edcb00f375fb2f9405dc
DIFF: 
https://github.com/llvm/llvm-project/commit/d3e7a48cbde060a6dbc1edcb00f375fb2f9405dc.diff

LOG: [OpenMP][NFC] Remove a no-op function

Added: 


Modified: 
clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
clang/test/OpenMP/nvptx_target_parallel_reduction_codegen.cpp
clang/test/OpenMP/nvptx_target_parallel_reduction_codegen_tbaa_PR46146.cpp
clang/test/OpenMP/nvptx_teams_reduction_codegen.cpp
clang/test/OpenMP/reduction_implicit_map.cpp
clang/test/OpenMP/target_teams_generic_loop_codegen.cpp
llvm/include/llvm/Frontend/OpenMP/OMPKinds.def
llvm/lib/Transforms/IPO/OpenMPOpt.cpp
llvm/test/Transforms/OpenMP/add_attributes.ll
openmp/libomptarget/DeviceRTL/include/Interface.h
openmp/libomptarget/DeviceRTL/src/Reduction.cpp

Removed: 




diff  --git a/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp 
b/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
index 0ed665e0dfb9722..009b3f0a85a3785 100644
--- a/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
+++ b/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
@@ -3081,14 +3081,7 @@ void CGOpenMPRuntimeGPU::emitReduction(
   ++IRHS;
 }
   };
-  llvm::Value *EndArgs[] = {ThreadId};
   RegionCodeGenTy RCG(CodeGen);
-  NVPTXActionTy Action(
-  nullptr, std::nullopt,
-  OMPBuilder.getOrCreateRuntimeFunction(
-  CGM.getModule(), OMPRTL___kmpc_nvptx_end_reduce_nowait),
-  EndArgs);
-  RCG.setAction(Action);
   RCG(CGF);
   // There is no need to emit line number for unconditional branch.
   (void)ApplyDebugLocation::CreateEmpty(CGF);

diff  --git a/clang/test/OpenMP/nvptx_target_parallel_reduction_codegen.cpp 
b/clang/test/OpenMP/nvptx_target_parallel_reduction_codegen.cpp
index 094c5ae3522f96d..c2a958dfdd2453e 100644
--- a/clang/test/OpenMP/nvptx_target_parallel_reduction_codegen.cpp
+++ b/clang/test/OpenMP/nvptx_target_parallel_reduction_codegen.cpp
@@ -148,7 +148,6 @@ int bar(int n){
 // CHECK-64-NEXT:[[TMP8:%.*]] = load double, ptr [[E1]], align 8
 // CHECK-64-NEXT:[[ADD2:%.*]] = fadd double [[TMP7]], [[TMP8]]
 // CHECK-64-NEXT:store double [[ADD2]], ptr [[TMP0]], align 8
-// CHECK-64-NEXT:call void @__kmpc_nvptx_end_reduce_nowait(i32 [[TMP3]])
 // CHECK-64-NEXT:br label [[DOTOMP_REDUCTION_DONE]]
 // CHECK-64:   .omp.reduction.done:
 // CHECK-64-NEXT:ret void
@@ -353,7 +352,6 @@ int bar(int n){
 // CHECK-64-NEXT:[[TMP13:%.*]] = load float, ptr [[D2]], align 4
 // CHECK-64-NEXT:[[MUL8:%.*]] = fmul float [[TMP12]], [[TMP13]]
 // CHECK-64-NEXT:store float [[MUL8]], ptr [[TMP1]], align 4
-// CHECK-64-NEXT:call void @__kmpc_nvptx_end_reduce_nowait(i32 [[TMP5]])
 // CHECK-64-NEXT:br label [[DOTOMP_REDUCTION_DONE]]
 // CHECK-64:   .omp.reduction.done:
 // CHECK-64-NEXT:ret void
@@ -609,7 +607,6 @@ int bar(int n){
 // CHECK-64:   cond.end11:
 // CHECK-64-NEXT:[[COND12:%.*]] = phi i16 [ [[TMP15]], [[COND_TRUE9]] ], [ 
[[TMP16]], [[COND_FALSE10]] ]
 // CHECK-64-NEXT:store i16 [[COND12]], ptr [[TMP1]], align 2
-// CHECK-64-NEXT:call void @__kmpc_nvptx_end_reduce_nowait(i32 [[TMP6]])
 // CHECK-64-NEXT:br label [[DOTOMP_REDUCTION_DONE]]
 // CHECK-64:   .omp.reduction.done:
 // CHECK-64-NEXT:ret void
@@ -824,7 +821,6 @@ int bar(int n){
 // CHECK-32-NEXT:[[TMP8:%.*]] = load double, ptr [[E1]], align 8
 // CHECK-32-NEXT:[[ADD2:%.*]] = fadd double [[TMP7]], [[TMP8]]
 // CHECK-32-NEXT:store double [[ADD2]], ptr [[TMP0]], align 8
-// CHECK-32-NEXT:call void @__kmpc_nvptx_end_reduce_nowait(i32 [[TMP3]])
 // CHECK-32-NEXT:br label [[DOTOMP_REDUCTION_DONE]]
 // CHECK-32:   .omp.reduction.done:
 // CHECK-32-NEXT:ret void
@@ -1029,7 +1025,6 @@ int bar(int n){
 // CHECK-32-NEXT:[[TMP13:%.*]] = load float, ptr [[D2]], align 4
 // CHECK-32-NEXT:[[MUL8:%.*]] = fmul float [[TMP12]], [[TMP13]]
 // CHECK-32-NEXT:store float [[MUL8]], ptr [[TMP1]], align 4
-// CHECK-32-NEXT:call void @__kmpc_nvptx_end_reduce_nowait(i32 [[TMP5]])
 // CHECK-32-NEXT:br label [[DOTOMP_REDUCTION_DONE]]
 // CHECK-32:   .omp.reduction.done:
 // CHECK-32-NEXT:ret void
@@ -1285,7 +1280,6 @@ int bar(int n){
 // CHECK-32:   cond.end11:
 // CHECK-32-NEXT:[[COND12:%.*]] = phi i16 [ [[TMP15]], [[COND_TRUE9]] ], [ 
[[TMP16]], [[COND_FALSE10]] ]
 // CHECK-32-NEXT:store i16 [[COND12]], ptr [[TMP1]], align 2
-// CHECK-32-NEXT:call void @__kmpc_nvptx_end_reduce_nowait(i32 [[TMP6]])
 // CHECK-32-NEXT:br label [[DOTOMP_REDUCTION_DONE]]
 // CHECK-32:   .omp.reduction.done:
 // CHECK-32-NEXT:ret void
@@ -1500,7 +1494,6 @@ int bar(int n){
 // CHECK-32-EX-NEXT:[[TMP8:%.*]] = load double, ptr [[E1]], align 8
 // CHECK-32-EX-NEXT:[[ADD2:%.*]] = fadd

[clang] [llvm] Recommit changes to global checks (PR #71171)

2023-11-03 Thread Johannes Doerfert via cfe-commits

jdoerfert wrote:

> > I think if the issues with the original commit are resolved, this is good 
> > to go.
> > Did you verify we can properly auto-generate files, e.g., in 
> > llvm/test/Transforms/Attributor and clang/test/OpenMP?
> 
> Ah no I did not, I'll do that on Monday.

I'd run `./llvm/utils/update_any_test_checks.py` once and see if the tests pass 
afterwards.
Then do it again to ensure the nasty ordering and duplication issues are gone 
for good.


https://github.com/llvm/llvm-project/pull/71171
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [clang] report inlining decisions with -Wattribute-{warning|error} (PR #73552)

2023-11-27 Thread Johannes Doerfert via cfe-commits

https://github.com/jdoerfert updated 
https://github.com/llvm/llvm-project/pull/73552

>From cea177222b421c67dabbe9e267f8b9a4ead4d51e Mon Sep 17 00:00:00 2001
From: Nick Desaulniers 
Date: Tue, 10 Jan 2023 17:42:18 -0800
Subject: [PATCH 01/15] [clang] report inlining decisions with
 -Wattribute-{warning|error}

Due to inlining, descovering which specific call site to a function with
the attribute "warning" or "error" is painful.

In the IR record inlining decisions in metadata when inlining a callee
that itself contains a call to a dontcall-error or dontcall-warn fn.

Print this info so that it's clearer which call site is problematic.

There's still some limitations with this approach; macro expansion is
not recorded.

Fixes: https://github.com/ClangBuiltLinux/linux/issues/1571

Differential Revision: https://reviews.llvm.org/D141451
---
 .../clang/Basic/DiagnosticFrontendKinds.td|  2 +
 clang/lib/CodeGen/CodeGenAction.cpp   | 10 +++
 ...backend-attribute-error-warning-optimize.c | 21 +
 .../backend-attribute-error-warning.c |  6 ++
 .../backend-attribute-error-warning.cpp   | 12 +++
 llvm/docs/LangRef.rst |  8 +-
 llvm/include/llvm/IR/DiagnosticInfo.h | 14 +++-
 llvm/lib/IR/DiagnosticInfo.cpp| 23 -
 llvm/lib/Transforms/Utils/InlineFunction.cpp  | 13 +++
 .../Transforms/Inline/dontcall-attributes.ll  | 84 +++
 10 files changed, 185 insertions(+), 8 deletions(-)
 create mode 100644 llvm/test/Transforms/Inline/dontcall-attributes.ll

diff --git a/clang/include/clang/Basic/DiagnosticFrontendKinds.td 
b/clang/include/clang/Basic/DiagnosticFrontendKinds.td
index 715e0c0dc8fa84e..0909b1f59175be9 100644
--- a/clang/include/clang/Basic/DiagnosticFrontendKinds.td
+++ b/clang/include/clang/Basic/DiagnosticFrontendKinds.td
@@ -93,6 +93,8 @@ def err_fe_backend_error_attr :
 def warn_fe_backend_warning_attr :
   Warning<"call to '%0' declared with 'warning' attribute: %1">, BackendInfo,
   InGroup;
+def note_fe_backend_in : Note<"called by function '%0'">;
+def note_fe_backend_inlined : Note<"inlined by function '%0'">;
 
 def err_fe_invalid_code_complete_file : Error<
 "cannot locate code-completion file %0">, DefaultFatal;
diff --git a/clang/lib/CodeGen/CodeGenAction.cpp 
b/clang/lib/CodeGen/CodeGenAction.cpp
index a31a271ed77d1ca..66e040741e2718d 100644
--- a/clang/lib/CodeGen/CodeGenAction.cpp
+++ b/clang/lib/CodeGen/CodeGenAction.cpp
@@ -52,6 +52,8 @@
 #include "llvm/Transforms/Utils/Cloning.h"
 
 #include 
+#include 
+
 using namespace clang;
 using namespace llvm;
 
@@ -794,6 +796,14 @@ void BackendConsumer::DontCallDiagHandler(const 
DiagnosticInfoDontCall &D) {
   ? diag::err_fe_backend_error_attr
   : diag::warn_fe_backend_warning_attr)
   << llvm::demangle(D.getFunctionName()) << D.getNote();
+
+  SmallVector InliningDecisions;
+  D.getInliningDecisions(InliningDecisions);
+  InliningDecisions.push_back(D.getCaller().str());
+  for (auto Dec : llvm::enumerate(InliningDecisions))
+Diags.Report(Dec.index() ? diag::note_fe_backend_inlined
+ : diag::note_fe_backend_in)
+<< llvm::demangle(Dec.value());
 }
 
 void BackendConsumer::MisExpectDiagHandler(
diff --git a/clang/test/Frontend/backend-attribute-error-warning-optimize.c 
b/clang/test/Frontend/backend-attribute-error-warning-optimize.c
index d3951e3b6b1f57d..0bfc50ff8985c39 100644
--- a/clang/test/Frontend/backend-attribute-error-warning-optimize.c
+++ b/clang/test/Frontend/backend-attribute-error-warning-optimize.c
@@ -9,6 +9,7 @@ int x(void) {
 }
 void baz(void) {
   foo(); // expected-error {{call to 'foo' declared with 'error' attribute: oh 
no foo}}
+ // expected-note@* {{called by function 'baz'}}
   if (x())
 bar();
 }
@@ -20,3 +21,23 @@ void indirect(void) {
   quux = foo;
   quux();
 }
+
+static inline void a(int x) {
+if (x == 10)
+foo(); // expected-error {{call to 'foo' declared with 'error' 
attribute: oh no foo}}
+   // expected-note@* {{called by function 'a'}}
+   // expected-note@* {{inlined by function 'b'}}
+   // expected-note@* {{inlined by function 'd'}}
+}
+
+static inline void b() {
+a(10);
+}
+
+void c() {
+a(9);
+}
+
+void d() {
+  b();
+}
diff --git a/clang/test/Frontend/backend-attribute-error-warning.c 
b/clang/test/Frontend/backend-attribute-error-warning.c
index c3c7803479aac96..c87a47053e5c0f8 100644
--- a/clang/test/Frontend/backend-attribute-error-warning.c
+++ b/clang/test/Frontend/backend-attribute-error-warning.c
@@ -23,11 +23,17 @@ duplicate_warnings(void);
 
 void baz(void) {
   foo(); // expected-error {{call to 'foo' declared with 'error' attribute: oh 
no foo}}
+ // expected-note@* {{called by function 'baz'}}
   if (x())
 bar(); // expected-error {{call to 'bar' declared with 'error' attribute: 
oh no bar}}
+   // expected-note@* {

[clang] [Clang] CWG2789 Overload resolution with implicit and explicit object… (PR #73493)

2023-11-27 Thread Johannes Doerfert via cfe-commits

https://github.com/jdoerfert updated 
https://github.com/llvm/llvm-project/pull/73493

>From 3758290904571237c13ba23f2e3f65e58b6598aa Mon Sep 17 00:00:00 2001
From: Corentin Jabot 
Date: Mon, 27 Nov 2023 10:48:13 +0100
Subject: [PATCH 1/2] [Clang] CWG2789 Overload resolution with implicit and
 explicit object member functions

Implement the resolution to CWG2789 from
https://wiki.edg.com/pub/Wg21kona2023/StrawPolls/p3046r0.html

The DR page is not updated because the issue has not made
it to a published list yet.
---
 clang/include/clang/Sema/Sema.h |  6 +++
 clang/lib/Sema/SemaOverload.cpp | 69 ++---
 clang/test/CXX/drs/dr27xx.cpp   | 31 +++
 3 files changed, 92 insertions(+), 14 deletions(-)
 create mode 100644 clang/test/CXX/drs/dr27xx.cpp

diff --git a/clang/include/clang/Sema/Sema.h b/clang/include/clang/Sema/Sema.h
index f7c9d0e2e6412b7..7579a3256bc37aa 100644
--- a/clang/include/clang/Sema/Sema.h
+++ b/clang/include/clang/Sema/Sema.h
@@ -3849,6 +3849,12 @@ class Sema final {
   const FunctionProtoType *NewType,
   unsigned *ArgPos = nullptr,
   bool Reversed = false);
+
+  bool FunctionNonObjectParamTypesAreEqual(const FunctionDecl *OldFunction,
+   const FunctionDecl *NewFunction,
+   unsigned *ArgPos = nullptr,
+   bool Reversed = false);
+
   void HandleFunctionTypeMismatch(PartialDiagnostic &PDiag,
   QualType FromType, QualType ToType);
 
diff --git a/clang/lib/Sema/SemaOverload.cpp b/clang/lib/Sema/SemaOverload.cpp
index 9800d7f1c9cfee9..cc69cd1f2862aae 100644
--- a/clang/lib/Sema/SemaOverload.cpp
+++ b/clang/lib/Sema/SemaOverload.cpp
@@ -3239,6 +3239,28 @@ bool Sema::FunctionParamTypesAreEqual(const 
FunctionProtoType *OldType,
 NewType->param_types(), ArgPos, Reversed);
 }
 
+bool Sema::FunctionNonObjectParamTypesAreEqual(const FunctionDecl *OldFunction,
+   const FunctionDecl *NewFunction,
+   unsigned *ArgPos,
+   bool Reversed) {
+
+  if (OldFunction->getNumNonObjectParams() !=
+  NewFunction->getNumNonObjectParams())
+return false;
+
+  unsigned OldIgnore =
+  unsigned(OldFunction->hasCXXExplicitFunctionObjectParameter());
+  unsigned NewIgnore =
+  unsigned(NewFunction->hasCXXExplicitFunctionObjectParameter());
+
+  auto *OldPT = cast(OldFunction->getFunctionType());
+  auto *NewPT = cast(NewFunction->getFunctionType());
+
+  return FunctionParamTypesAreEqual(OldPT->param_types().slice(OldIgnore),
+NewPT->param_types().slice(NewIgnore),
+ArgPos, Reversed);
+}
+
 /// CheckPointerConversion - Check the pointer conversion from the
 /// expression From to the type ToType. This routine checks for
 /// ambiguous or inaccessible derived-to-base pointer
@@ -10121,22 +10143,41 @@ static bool haveSameParameterTypes(ASTContext 
&Context, const FunctionDecl *F1,
 
 /// We're allowed to use constraints partial ordering only if the candidates
 /// have the same parameter types:
-/// [over.match.best]p2.6
-/// F1 and F2 are non-template functions with the same parameter-type-lists,
-/// and F1 is more constrained than F2 [...]
+/// [over.match.best.general]p2.6
+/// F1 and F2 are non-template functions with the same
+/// non-object-parameter-type-lists, and F1 is more constrained than F2 [...]
 static bool sameFunctionParameterTypeLists(Sema &S,
-  const OverloadCandidate &Cand1,
-  const OverloadCandidate &Cand2) {
-  if (Cand1.Function && Cand2.Function) {
-auto *PT1 = cast(Cand1.Function->getFunctionType());
-auto *PT2 = cast(Cand2.Function->getFunctionType());
-if (PT1->getNumParams() == PT2->getNumParams() &&
-PT1->isVariadic() == PT2->isVariadic() &&
-S.FunctionParamTypesAreEqual(PT1, PT2, nullptr,
- Cand1.isReversed() ^ Cand2.isReversed()))
-  return true;
+   const OverloadCandidate &Cand1,
+   const OverloadCandidate &Cand2) {
+  if (!Cand1.Function || !Cand2.Function)
+return false;
+
+  auto *Fn1 = Cand1.Function;
+  auto *Fn2 = Cand2.Function;
+
+  if (Fn1->isVariadic() != Fn1->isVariadic())
+return false;
+
+  if (!S.FunctionNonObjectParamTypesAreEqual(
+  Fn1, Fn2, nullptr, Cand1.isReversed() ^ Cand2.isReversed()))
+return false;
+
+  auto *Mem1 = dyn_cast(Fn1);
+  auto *Mem2 = dyn_cast(Fn2);
+  if (Mem1 && Mem2) {
+// if they are member functions, both are direct members of the same class,
+// and
+  

[openmp] [clang] [OpenMP] Directly use user's grid and block size in kernel language mode (PR #70612)

2023-11-28 Thread Johannes Doerfert via cfe-commits

https://github.com/jdoerfert approved this pull request.

LG, see the nit.

Also, add a runtime test with bare and verify we stick with values we otherwise 
would not, e.g., 1 teams and 1024 threads.

https://github.com/llvm/llvm-project/pull/70612
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [openmp] [OpenMP] Directly use user's grid and block size in kernel language mode (PR #70612)

2023-11-28 Thread Johannes Doerfert via cfe-commits


@@ -14633,6 +14633,26 @@ StmtResult 
Sema::ActOnOpenMPTargetTeamsDirective(ArrayRef Clauses,
   }
   setFunctionHasBranchProtectedScope();
 
+  bool HasBareClause = false;
+  bool HasThreadLimitClause = false;
+  bool HasNumTeamsClause = false;
+  OMPClause *BareClause = nullptr;

jdoerfert wrote:

No need for HasBareClause since you got the pointer.

https://github.com/llvm/llvm-project/pull/70612
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [openmp] [OpenMP] Directly use user's grid and block size in kernel language mode (PR #70612)

2023-11-28 Thread Johannes Doerfert via cfe-commits

https://github.com/jdoerfert edited 
https://github.com/llvm/llvm-project/pull/70612
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[llvm] [libcxx] [clang] [openmp] [flang] [OpenMP][NFC] Separate OpenMP/OpenACC specific mapping code (PR #73817)

2023-11-29 Thread Johannes Doerfert via cfe-commits

https://github.com/jdoerfert updated 
https://github.com/llvm/llvm-project/pull/73817

>From ed1513641d575c4a2881613864c892aff7855a78 Mon Sep 17 00:00:00 2001
From: Johannes Doerfert 
Date: Tue, 28 Nov 2023 18:51:23 -0800
Subject: [PATCH] [OpenMP][NFC] Separate OpenMP/OpenACC specific mapping code

While this does not really encapsulate the mapping code, it at least
moves most of the declarations out of the way.
---
 openmp/libomptarget/include/OpenMP/Mapping.h | 427 +++
 openmp/libomptarget/include/device.h | 350 +--
 openmp/libomptarget/src/CMakeLists.txt   |   2 +
 openmp/libomptarget/src/OpenMP/Mapping.cpp   |  40 ++
 openmp/libomptarget/src/omptarget.cpp|   1 +
 openmp/libomptarget/src/private.h|  82 
 6 files changed, 473 insertions(+), 429 deletions(-)
 create mode 100644 openmp/libomptarget/include/OpenMP/Mapping.h
 create mode 100644 openmp/libomptarget/src/OpenMP/Mapping.cpp

diff --git a/openmp/libomptarget/include/OpenMP/Mapping.h 
b/openmp/libomptarget/include/OpenMP/Mapping.h
new file mode 100644
index 000..b01831c61f6c823
--- /dev/null
+++ b/openmp/libomptarget/include/OpenMP/Mapping.h
@@ -0,0 +1,427 @@
+//===-- OpenMP/Mapping.h - OpenMP/OpenACC pointer mapping ---*- C++ 
-*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM 
Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+//
+// Declarations for managing host-to-device pointer mappings.
+//
+//===--===//
+
+#ifndef OMPTARGET_OPENMP_MAPPING_H
+#define OMPTARGET_OPENMP_MAPPING_H
+
+#include "omptarget.h"
+
+#include 
+#include 
+#include 
+
+#include "llvm/ADT/SmallSet.h"
+
+struct DeviceTy;
+class AsyncInfoTy;
+
+using map_var_info_t = void *;
+
+/// Information about shadow pointers.
+struct ShadowPtrInfoTy {
+  void **HstPtrAddr = nullptr;
+  void *HstPtrVal = nullptr;
+  void **TgtPtrAddr = nullptr;
+  void *TgtPtrVal = nullptr;
+
+  bool operator==(const ShadowPtrInfoTy &Other) const {
+return HstPtrAddr == Other.HstPtrAddr;
+  }
+};
+
+inline bool operator<(const ShadowPtrInfoTy &lhs, const ShadowPtrInfoTy &rhs) {
+  return lhs.HstPtrAddr < rhs.HstPtrAddr;
+}
+
+/// Map between host data and target data.
+struct HostDataToTargetTy {
+  const uintptr_t HstPtrBase; // host info.
+  const uintptr_t HstPtrBegin;
+  const uintptr_t HstPtrEnd;   // non-inclusive.
+  const map_var_info_t HstPtrName; // Optional source name of mapped variable.
+
+  const uintptr_t TgtAllocBegin; // allocated target memory
+  const uintptr_t TgtPtrBegin; // mapped target memory = TgtAllocBegin + 
padding
+
+private:
+  static const uint64_t INFRefCount = ~(uint64_t)0;
+  static std::string refCountToStr(uint64_t RefCount) {
+return RefCount == INFRefCount ? "INF" : std::to_string(RefCount);
+  }
+
+  struct StatesTy {
+StatesTy(uint64_t DRC, uint64_t HRC)
+: DynRefCount(DRC), HoldRefCount(HRC) {}
+/// The dynamic reference count is the standard reference count as of 
OpenMP
+/// 4.5.  The hold reference count is an OpenMP extension for the sake of
+/// OpenACC support.
+///
+/// The 'ompx_hold' map type modifier is permitted only on "omp target" and
+/// "omp target data", and "delete" is permitted only on "omp target exit
+/// data" and associated runtime library routines.  As a result, we really
+/// need to implement "reset" functionality only for the dynamic reference
+/// counter.  Likewise, only the dynamic reference count can be infinite
+/// because, for example, omp_target_associate_ptr and "omp declare target
+/// link" operate only on it.  Nevertheless, it's actually easier to follow
+/// the code (and requires less assertions for special cases) when we just
+/// implement these features generally across both reference counters here.
+/// Thus, it's the users of this class that impose those restrictions.
+///
+uint64_t DynRefCount;
+uint64_t HoldRefCount;
+
+/// A map of shadow pointers associated with this entry, the keys are host
+/// pointer addresses to identify stale entries.
+llvm::SmallSet ShadowPtrInfos;
+
+/// Pointer to the event corresponding to the data update of this map.
+/// Note: At present this event is created when the first data transfer 
from
+/// host to device is issued, and only being used for H2D. It is not used
+/// for data transfer in another direction (device to host). It is still
+/// unclear whether we need it for D2H. If in the future we need similar
+/// mechanism for D2H, and if the event cannot be shared between them, 
Event
+/// should be written as void *Event[2].
+void *Event = nullptr;
+
+/// Number of threads currently holding a re

[llvm] [libcxx] [clang] [openmp] [flang] [OpenMP][NFC] Separate OpenMP/OpenACC specific mapping code (PR #73817)

2023-11-29 Thread Johannes Doerfert via cfe-commits

https://github.com/jdoerfert closed 
https://github.com/llvm/llvm-project/pull/73817
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[openmp] [clang] [OpenMP] Avoid initializing the KernelLaunchEnvironment if possible (PR #73864)

2023-11-29 Thread Johannes Doerfert via cfe-commits

https://github.com/jdoerfert closed 
https://github.com/llvm/llvm-project/pull/73864
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang-tools-extra] [llvm] [clang-tidy] Add bugprone-move-shared-pointer-contents check. (PR #67467)

2023-12-01 Thread Johannes Doerfert via cfe-commits

https://github.com/jdoerfert updated 
https://github.com/llvm/llvm-project/pull/67467

>From 6d5d35e1273f595e8a0382053d5183cbce7a9d8a Mon Sep 17 00:00:00 2001
From: David Pizzuto 
Date: Tue, 26 Sep 2023 10:45:42 -0700
Subject: [PATCH] [clang-tidy] Add bugprone-move-shared-pointer-contents check.

This check detects moves of the contents of a shared pointer rather than the
pointer itself. Other code with a reference to the shared pointer is probably
not expecting the move.

The set of shared pointer classes is configurable via options to allow 
individual
projects to cover additional types.
---
 .../bugprone/BugproneTidyModule.cpp   |  3 +
 .../clang-tidy/bugprone/CMakeLists.txt|  2 +
 .../MoveSharedPointerContentsCheck.cpp| 60 
 .../bugprone/MoveSharedPointerContentsCheck.h | 37 ++
 clang-tools-extra/docs/ReleaseNotes.rst   |  6 ++
 .../bugprone/move-shared-pointer-contents.rst | 17 +
 .../docs/clang-tidy/checks/list.rst   |  2 +
 .../bugprone/move-shared-pointer-contents.cpp | 68 +++
 8 files changed, 195 insertions(+)
 create mode 100644 
clang-tools-extra/clang-tidy/bugprone/MoveSharedPointerContentsCheck.cpp
 create mode 100644 
clang-tools-extra/clang-tidy/bugprone/MoveSharedPointerContentsCheck.h
 create mode 100644 
clang-tools-extra/docs/clang-tidy/checks/bugprone/move-shared-pointer-contents.rst
 create mode 100644 
clang-tools-extra/test/clang-tidy/checkers/bugprone/move-shared-pointer-contents.cpp

diff --git a/clang-tools-extra/clang-tidy/bugprone/BugproneTidyModule.cpp 
b/clang-tools-extra/clang-tidy/bugprone/BugproneTidyModule.cpp
index a67a91eedd10482..7f4a504f9930f17 100644
--- a/clang-tools-extra/clang-tidy/bugprone/BugproneTidyModule.cpp
+++ b/clang-tools-extra/clang-tidy/bugprone/BugproneTidyModule.cpp
@@ -39,6 +39,7 @@
 #include "MisplacedPointerArithmeticInAllocCheck.h"
 #include "MisplacedWideningCastCheck.h"
 #include "MoveForwardingReferenceCheck.h"
+#include "MoveSharedPointerContentsCheck.h"
 #include "MultiLevelImplicitPointerConversionCheck.h"
 #include "MultipleNewInOneExpressionCheck.h"
 #include "MultipleStatementMacroCheck.h"
@@ -125,6 +126,8 @@ class BugproneModule : public ClangTidyModule {
 "bugprone-inaccurate-erase");
 CheckFactories.registerCheck(
 "bugprone-incorrect-enable-if");
+CheckFactories.registerCheck(
+"bugprone-move-shared-pointer-contents");
 CheckFactories.registerCheck(
 "bugprone-switch-missing-default-case");
 CheckFactories.registerCheck(
diff --git a/clang-tools-extra/clang-tidy/bugprone/CMakeLists.txt 
b/clang-tools-extra/clang-tidy/bugprone/CMakeLists.txt
index 3c768021feb1502..c017f0c0cc52021 100644
--- a/clang-tools-extra/clang-tidy/bugprone/CMakeLists.txt
+++ b/clang-tools-extra/clang-tidy/bugprone/CMakeLists.txt
@@ -23,6 +23,7 @@ add_clang_library(clangTidyBugproneModule
   ImplicitWideningOfMultiplicationResultCheck.cpp
   InaccurateEraseCheck.cpp
   IncorrectEnableIfCheck.cpp
+  MoveSharedPointerContentsCheck.cpp
   SwitchMissingDefaultCaseCheck.cpp
   IncDecInConditionsCheck.cpp
   IncorrectRoundingsCheck.cpp
@@ -35,6 +36,7 @@ add_clang_library(clangTidyBugproneModule
   MisplacedPointerArithmeticInAllocCheck.cpp
   MisplacedWideningCastCheck.cpp
   MoveForwardingReferenceCheck.cpp
+  MoveSharedPointerContentsCheck.cpp
   MultiLevelImplicitPointerConversionCheck.cpp
   MultipleNewInOneExpressionCheck.cpp
   MultipleStatementMacroCheck.cpp
diff --git 
a/clang-tools-extra/clang-tidy/bugprone/MoveSharedPointerContentsCheck.cpp 
b/clang-tools-extra/clang-tidy/bugprone/MoveSharedPointerContentsCheck.cpp
new file mode 100644
index 000..b4a393b7f2f2000
--- /dev/null
+++ b/clang-tools-extra/clang-tidy/bugprone/MoveSharedPointerContentsCheck.cpp
@@ -0,0 +1,60 @@
+//===--- MoveSharedPointerContentsCheck.cpp - clang-tidy 
--===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM 
Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+
+#include "MoveSharedPointerContentsCheck.h"
+#include "../ClangTidyCheck.h"
+#include "../utils/Matchers.h"
+#include "../utils/OptionsUtils.h"
+#include "clang/AST/ASTContext.h"
+#include "clang/ASTMatchers/ASTMatchFinder.h"
+
+using namespace clang::ast_matchers;
+
+namespace clang::tidy::bugprone {
+
+MoveSharedPointerContentsCheck::MoveSharedPointerContentsCheck(
+StringRef Name, ClangTidyContext *Context)
+: ClangTidyCheck(Name, Context),
+  SharedPointerClasses(utils::options::parseStringList(
+  Options.get("SharedPointerClasses", "std::shared_ptr"))) {}
+
+MoveSharedPointerContentsCheck::~MoveSharedPointerContentsCheck() = default;
+
+bool MoveSharedPointerContentsCheck::isLanguageVersionSupported(
+const LangOptions &LangOptions) const {
+  return L

[lldb] [libc] [llvm] [compiler-rt] [clang] [flang] [libcxx] [clang-tools-extra] [openmp] [lld] [mlir] [libunwind] [OpenMP] Improve omp offload profiler (PR #68016)

2023-12-21 Thread Johannes Doerfert via cfe-commits

https://github.com/jdoerfert approved this pull request.

LG. Please rebase and merge.

https://github.com/llvm/llvm-project/pull/68016
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [openmp] [Clang][OpenMP] Fix mapping of structs to device (PR #75642)

2023-12-21 Thread Johannes Doerfert via cfe-commits

jdoerfert wrote:

This fails for me on the host and the AMD GPU:
GPU:
# | :217:1: note: possible intended match here
# | dat.datum[dat.arr[0][0]] = 5
X86:
# | :134:1: note: possible intended match here
# | dat.datum[dat.arr[0][0]] = 5461

The location that is printed (datum[1]) is uninitialized.


https://github.com/llvm/llvm-project/pull/75642
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [OpenMP][USM] Introduces -fopenmp-force-usm flag (PR #76571)

2024-01-03 Thread Johannes Doerfert via cfe-commits

jdoerfert wrote:

Documentation missing as well.

https://github.com/llvm/llvm-project/pull/76571
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[llvm] [libc] [lldb] [mlir] [clang] [NFC][ObjectSizeOffset] Use classes instead of std::pair (PR #76882)

2024-01-03 Thread Johannes Doerfert via cfe-commits

https://github.com/jdoerfert commented:

Generally, getting rid of the pair is great.
I am unsure I understand why we do the base template rather than inheritance. 
Where is the base template used? If we do inheritance we could avoid 
duplicating of members and provide default impls that work with many things, 
e.g., all operator== are the same.


https://github.com/llvm/llvm-project/pull/76882
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[mlir] [libc] [clang] [lldb] [llvm] [NFC][ObjectSizeOffset] Use classes instead of std::pair (PR #76882)

2024-01-03 Thread Johannes Doerfert via cfe-commits

https://github.com/jdoerfert edited 
https://github.com/llvm/llvm-project/pull/76882
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[libc] [clang] [lldb] [llvm] [mlir] [NFC][ObjectSizeOffset] Use classes instead of std::pair (PR #76882)

2024-01-03 Thread Johannes Doerfert via cfe-commits


@@ -187,80 +187,147 @@ Value *lowerObjectSizeCall(
 const TargetLibraryInfo *TLI, AAResults *AA, bool MustSucceed,
 SmallVectorImpl *InsertedInstructions = nullptr);
 
-using SizeOffsetType = std::pair;
+/// SizeOffsetType - A base template class for the object size visitors. Used
+/// here as a self-documenting way to handle the values rather than using a
+/// \p std::pair.
+template  struct SizeOffsetType {
+  T Size;
+  T Offset;
+
+  bool knownSize() const;
+  bool knownOffset() const;
+  bool anyKnown() const;
+  bool bothKnown() const;
+};
+
+/// SizeOffsetType - Used by \p ObjectSizeOffsetVisitor, which works
+/// with \p APInts.
+template <> struct SizeOffsetType {
+  APInt Size;
+  APInt Offset;
+
+  SizeOffsetType() = default;
+  SizeOffsetType(APInt Size, APInt Offset) : Size(Size), Offset(Offset) {}
+
+  bool knownSize() const { return Size.getBitWidth() > 1; }
+  bool knownOffset() const { return Offset.getBitWidth() > 1; }
+  bool anyKnown() const { return knownSize() || knownOffset(); }
+  bool bothKnown() const { return knownSize() && knownOffset(); }
+
+  bool operator==(const SizeOffsetType &RHS) {
+return Size == RHS.Size && Offset == RHS.Offset;
+  }
+  bool operator!=(const SizeOffsetType &RHS) { return !(*this == RHS); }
+};
+using SizeOffsetAPInt = SizeOffsetType;
 
 /// Evaluate the size and offset of an object pointed to by a Value*
 /// statically. Fails if size or offset are not known at compile time.
 class ObjectSizeOffsetVisitor
-  : public InstVisitor {
+: public InstVisitor {
   const DataLayout &DL;
   const TargetLibraryInfo *TLI;
   ObjectSizeOpts Options;
   unsigned IntTyBits;
   APInt Zero;
-  SmallDenseMap SeenInsts;
+  SmallDenseMap SeenInsts;
   unsigned InstructionsVisited;
 
   APInt align(APInt Size, MaybeAlign Align);
 
-  SizeOffsetType unknown() {
-return std::make_pair(APInt(), APInt());
-  }
+  static SizeOffsetAPInt unknown;
 
 public:
   ObjectSizeOffsetVisitor(const DataLayout &DL, const TargetLibraryInfo *TLI,
   LLVMContext &Context, ObjectSizeOpts Options = {});
 
-  SizeOffsetType compute(Value *V);
-
-  static bool knownSize(const SizeOffsetType &SizeOffset) {
-return SizeOffset.first.getBitWidth() > 1;
-  }
-
-  static bool knownOffset(const SizeOffsetType &SizeOffset) {
-return SizeOffset.second.getBitWidth() > 1;
-  }
-
-  static bool bothKnown(const SizeOffsetType &SizeOffset) {
-return knownSize(SizeOffset) && knownOffset(SizeOffset);
-  }
+  SizeOffsetAPInt compute(Value *V);
 
   // These are "private", except they can't actually be made private. Only
   // compute() should be used by external users.
-  SizeOffsetType visitAllocaInst(AllocaInst &I);
-  SizeOffsetType visitArgument(Argument &A);
-  SizeOffsetType visitCallBase(CallBase &CB);
-  SizeOffsetType visitConstantPointerNull(ConstantPointerNull&);
-  SizeOffsetType visitExtractElementInst(ExtractElementInst &I);
-  SizeOffsetType visitExtractValueInst(ExtractValueInst &I);
-  SizeOffsetType visitGlobalAlias(GlobalAlias &GA);
-  SizeOffsetType visitGlobalVariable(GlobalVariable &GV);
-  SizeOffsetType visitIntToPtrInst(IntToPtrInst&);
-  SizeOffsetType visitLoadInst(LoadInst &I);
-  SizeOffsetType visitPHINode(PHINode&);
-  SizeOffsetType visitSelectInst(SelectInst &I);
-  SizeOffsetType visitUndefValue(UndefValue&);
-  SizeOffsetType visitInstruction(Instruction &I);
+  SizeOffsetAPInt visitAllocaInst(AllocaInst &I);
+  SizeOffsetAPInt visitArgument(Argument &A);
+  SizeOffsetAPInt visitCallBase(CallBase &CB);
+  SizeOffsetAPInt visitConstantPointerNull(ConstantPointerNull &);
+  SizeOffsetAPInt visitExtractElementInst(ExtractElementInst &I);
+  SizeOffsetAPInt visitExtractValueInst(ExtractValueInst &I);
+  SizeOffsetAPInt visitGlobalAlias(GlobalAlias &GA);
+  SizeOffsetAPInt visitGlobalVariable(GlobalVariable &GV);
+  SizeOffsetAPInt visitIntToPtrInst(IntToPtrInst &);
+  SizeOffsetAPInt visitLoadInst(LoadInst &I);
+  SizeOffsetAPInt visitPHINode(PHINode &);
+  SizeOffsetAPInt visitSelectInst(SelectInst &I);
+  SizeOffsetAPInt visitUndefValue(UndefValue &);
+  SizeOffsetAPInt visitInstruction(Instruction &I);
 
 private:
-  SizeOffsetType findLoadSizeOffset(
+  SizeOffsetAPInt findLoadSizeOffset(
   LoadInst &LoadFrom, BasicBlock &BB, BasicBlock::iterator From,
-  SmallDenseMap &VisitedBlocks,
+  SmallDenseMap &VisitedBlocks,
   unsigned &ScannedInstCount);
-  SizeOffsetType combineSizeOffset(SizeOffsetType LHS, SizeOffsetType RHS);
-  SizeOffsetType computeImpl(Value *V);
-  SizeOffsetType computeValue(Value *V);
+  SizeOffsetAPInt combineSizeOffset(SizeOffsetAPInt LHS, SizeOffsetAPInt RHS);
+  SizeOffsetAPInt computeImpl(Value *V);
+  SizeOffsetAPInt computeValue(Value *V);
   bool CheckedZextOrTrunc(APInt &I);
 };
 
-using SizeOffsetEvalType = std::pair;
+template <> struct SizeOffsetType;
+
+/// SizeOffsetType - Used by \p ObjectSizeOffsetEvaluator, which works
+/// with \p Values.
+template <

[openmp] [clang-tools-extra] [flang] [libcxx] [libc] [compiler-rt] [clang] [lldb] [lld] [llvm] [PGO][OpenMP] Instrumentation for GPU devices (PR #76587)

2024-01-03 Thread Johannes Doerfert via cfe-commits


@@ -428,13 +428,22 @@ std::string getPGOFuncNameVarName(StringRef FuncName,
   return VarName;
 }
 
+bool isGPUProfTarget(const Module &M) {
+  const auto &triple = M.getTargetTriple();
+  return triple.rfind("nvptx", 0) == 0 || triple.rfind("amdgcn", 0) == 0 ||
+ triple.rfind("r600", 0) == 0;
+}
+

jdoerfert wrote:

Use the suggesting above. This is what we use elsewhere rn.

https://github.com/llvm/llvm-project/pull/76587
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[libcxx] [llvm] [lld] [libc] [compiler-rt] [openmp] [lldb] [flang] [clang-tools-extra] [clang] [PGO][OpenMP] Instrumentation for GPU devices (PR #76587)

2024-01-03 Thread Johannes Doerfert via cfe-commits

https://github.com/jdoerfert commented:

Can we have tests for this? You can just check for the dump.

https://github.com/llvm/llvm-project/pull/76587
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


  1   2   3   4   5   >