r351642 - [NFC] Generalize expected output for callback test
Author: jdoerfert Date: Sat Jan 19 01:40:08 2019 New Revision: 351642 URL: http://llvm.org/viewvc/llvm-project?rev=351642&view=rev Log: [NFC] Generalize expected output for callback test Modified: cfe/trunk/test/CodeGen/callback_pthread_create.c Modified: cfe/trunk/test/CodeGen/callback_pthread_create.c URL: http://llvm.org/viewvc/llvm-project/cfe/trunk/test/CodeGen/callback_pthread_create.c?rev=351642&r1=351641&r2=351642&view=diff == --- cfe/trunk/test/CodeGen/callback_pthread_create.c (original) +++ cfe/trunk/test/CodeGen/callback_pthread_create.c Sat Jan 19 01:40:08 2019 @@ -1,7 +1,7 @@ // RUN: %clang -O1 %s -S -c -emit-llvm -o - | FileCheck %s // RUN: %clang -O1 %s -S -c -emit-llvm -o - | opt -ipconstprop -S | FileCheck --check-prefix=IPCP %s -// CHECK: declare !callback ![[cid:[0-9]+]] dso_local i32 @pthread_create +// CHECK: declare !callback ![[cid:[0-9]+]] {{.*}}i32 @pthread_create // CHECK: ![[cid]] = !{![[cidb:[0-9]+]]} // CHECK: ![[cidb]] = !{i64 2, i64 3, i1 false} ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
r351643 - [FIX] Restrict callback pthreads_create test to linux only
Author: jdoerfert Date: Sat Jan 19 01:40:10 2019 New Revision: 351643 URL: http://llvm.org/viewvc/llvm-project?rev=351643&view=rev Log: [FIX] Restrict callback pthreads_create test to linux only Modified: cfe/trunk/test/CodeGen/callback_pthread_create.c Modified: cfe/trunk/test/CodeGen/callback_pthread_create.c URL: http://llvm.org/viewvc/llvm-project/cfe/trunk/test/CodeGen/callback_pthread_create.c?rev=351643&r1=351642&r2=351643&view=diff == --- cfe/trunk/test/CodeGen/callback_pthread_create.c (original) +++ cfe/trunk/test/CodeGen/callback_pthread_create.c Sat Jan 19 01:40:10 2019 @@ -1,6 +1,9 @@ // RUN: %clang -O1 %s -S -c -emit-llvm -o - | FileCheck %s // RUN: %clang -O1 %s -S -c -emit-llvm -o - | opt -ipconstprop -S | FileCheck --check-prefix=IPCP %s +// This is a linux only test for now due to the include. +// UNSUPPORTED: !linux + // CHECK: declare !callback ![[cid:[0-9]+]] {{.*}}i32 @pthread_create // CHECK: ![[cid]] = !{![[cidb:[0-9]+]]} // CHECK: ![[cidb]] = !{i64 2, i64 3, i1 false} ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
r351629 - Emit !callback metadata and introduce the callback attribute
Author: jdoerfert Date: Fri Jan 18 21:36:54 2019 New Revision: 351629 URL: http://llvm.org/viewvc/llvm-project?rev=351629&view=rev Log: Emit !callback metadata and introduce the callback attribute With commit r351627, LLVM gained the ability to apply (existing) IPO optimizations on indirections through callbacks, or transitive calls. The general idea is that we use an abstraction to hide the middle man and represent the callback call in the context of the initial caller. It is described in more detail in the commit message of the LLVM patch r351627, the llvm::AbstractCallSite class description, and the language reference section on callback-metadata. This commit enables clang to emit !callback metadata that is understood by LLVM. It does so in three different cases: 1) For known broker functions declarations that are directly generated, e.g., __kmpc_fork_call for the OpenMP pragma parallel. 2) For known broker functions that are identified by their name and source location through the builtin detection, e.g., pthread_create from the POSIX thread API. 3) For user annotated functions that carry the "callback(callee, ...)" attribute. The attribute has to include the name, or index, of the callback callee and how the passed arguments can be identified (as many as the callback callee has). See the callback attribute documentation for detailed information. Differential Revision: https://reviews.llvm.org/D55483 Added: cfe/trunk/test/CodeGen/attr-callback.c cfe/trunk/test/CodeGen/callback_annotated.c cfe/trunk/test/CodeGen/callback_openmp.c cfe/trunk/test/CodeGen/callback_pthread_create.c cfe/trunk/test/CodeGenCXX/attr-callback.cpp cfe/trunk/test/Sema/attr-callback-broken.c cfe/trunk/test/Sema/attr-callback.c cfe/trunk/test/SemaCXX/attr-callback-broken.cpp cfe/trunk/test/SemaCXX/attr-callback.cpp Modified: cfe/trunk/include/clang/AST/ASTContext.h cfe/trunk/include/clang/Basic/Attr.td cfe/trunk/include/clang/Basic/AttrDocs.td cfe/trunk/include/clang/Basic/Builtins.def cfe/trunk/include/clang/Basic/Builtins.h cfe/trunk/include/clang/Basic/DiagnosticSemaKinds.td cfe/trunk/lib/AST/ASTContext.cpp cfe/trunk/lib/Basic/Builtins.cpp cfe/trunk/lib/CodeGen/CGOpenMPRuntime.cpp cfe/trunk/lib/CodeGen/CodeGenModule.cpp cfe/trunk/lib/Parse/ParseDecl.cpp cfe/trunk/lib/Sema/SemaDecl.cpp cfe/trunk/lib/Sema/SemaDeclAttr.cpp cfe/trunk/test/Analysis/retain-release.m cfe/trunk/test/Misc/pragma-attribute-supported-attributes-list.test cfe/trunk/test/OpenMP/parallel_codegen.cpp cfe/trunk/utils/TableGen/ClangAttrEmitter.cpp Modified: cfe/trunk/include/clang/AST/ASTContext.h URL: http://llvm.org/viewvc/llvm-project/cfe/trunk/include/clang/AST/ASTContext.h?rev=351629&r1=351628&r2=351629&view=diff == --- cfe/trunk/include/clang/AST/ASTContext.h (original) +++ cfe/trunk/include/clang/AST/ASTContext.h Fri Jan 18 21:36:54 2019 @@ -2003,6 +2003,9 @@ public: /// No error GE_None, +/// Missing a type +GE_Missing_type, + /// Missing a type from GE_Missing_stdio, Modified: cfe/trunk/include/clang/Basic/Attr.td URL: http://llvm.org/viewvc/llvm-project/cfe/trunk/include/clang/Basic/Attr.td?rev=351629&r1=351628&r2=351629&view=diff == --- cfe/trunk/include/clang/Basic/Attr.td (original) +++ cfe/trunk/include/clang/Basic/Attr.td Fri Jan 18 21:36:54 2019 @@ -190,6 +190,9 @@ class VariadicIdentifierArgument : Argument; +// A list of identifiers matching parameters or ParamIdx indices. +class VariadicParamOrParamIdxArgument : Argument; + // Like VariadicParamIdxArgument but for a single function parameter index. 
class ParamIdxArgument : Argument; @@ -1210,6 +1213,13 @@ def FormatArg : InheritableAttr { let Documentation = [Undocumented]; } +def Callback : InheritableAttr { + let Spellings = [Clang<"callback">]; + let Args = [VariadicParamOrParamIdxArgument<"Encoding">]; + let Subjects = SubjectList<[Function]>; + let Documentation = [CallbackDocs]; +} + def GNUInline : InheritableAttr { let Spellings = [GCC<"gnu_inline">]; let Subjects = SubjectList<[Function]>; Modified: cfe/trunk/include/clang/Basic/AttrDocs.td URL: http://llvm.org/viewvc/llvm-project/cfe/trunk/include/clang/Basic/AttrDocs.td?rev=351629&r1=351628&r2=351629&view=diff == --- cfe/trunk/include/clang/Basic/AttrDocs.td (original) +++ cfe/trunk/include/clang/Basic/AttrDocs.td Fri Jan 18 21:36:54 2019 @@ -3781,6 +3781,55 @@ it rather documents the programmer's int }]; } +def CallbackDocs : Documentation { + let Category = DocCatVariable; + let Content = [{ +The ``callback`` attribute specifies that the annotated functio
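To make case 3) concrete, a minimal sketch of the user-facing annotation (the broker/callee names are made up for illustration, not taken from the patch): the first attribute argument names, or gives the index of, the parameter holding the callback callee, and the remaining arguments describe what the broker forwards to it.

  /* Hypothetical example of the new attribute: "cb" is the callback callee,
     "data" is the argument the broker forwards to it. */
  __attribute__((callback(cb, data)))
  void broker(void (*cb)(void *), void *data);

  static void run(void *data) { /* ... */ }

  void user(void) {
    /* With the annotation, clang emits !callback metadata for @broker, so
       LLVM's IPO passes can reason about the transitive call cb(data). */
    broker(run, 0);
  }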
r351665 - [FIX] Generalize the expected results for callback clang tests
Author: jdoerfert Date: Sat Jan 19 12:46:10 2019 New Revision: 351665 URL: http://llvm.org/viewvc/llvm-project?rev=351665&view=rev Log: [FIX] Generalize the expected results for callback clang tests Modified: cfe/trunk/test/CodeGen/callback_annotated.c cfe/trunk/test/CodeGen/callback_pthread_create.c Modified: cfe/trunk/test/CodeGen/callback_annotated.c URL: http://llvm.org/viewvc/llvm-project/cfe/trunk/test/CodeGen/callback_annotated.c?rev=351665&r1=351664&r2=351665&view=diff == --- cfe/trunk/test/CodeGen/callback_annotated.c (original) +++ cfe/trunk/test/CodeGen/callback_annotated.c Sat Jan 19 12:46:10 2019 @@ -30,22 +30,20 @@ __attribute__((callback(4, d, 5, 2))) vo static void *VoidPtr2VoidPtr(void *payload) { // RUN2: ret i8* %payload - // IPCP: ret i8* null + // IPCP: ret i8* null return payload; } static int ThreeInt2Int(int a, int b, int c) { - // RUN2: define internal i32 @ThreeInt2Int(i32 %a, i32 %b, i32 %c) - // RUN2-NEXT: entry: - // RUN2-NEXT: %mul = mul nsw i32 %b, %a - // RUN2-NEXT: %add = add nsw i32 %mul, %c - // RUN2-NEXT: ret i32 %add + // RUN2: define internal i32 @ThreeInt2Int(i32 %a, i32 %b, i32 %c) + // RUN2: %mul = mul nsw i32 %b, %a + // RUN2: %add = add nsw i32 %mul, %c + // RUN2: ret i32 %add - // IPCP: define internal i32 @ThreeInt2Int(i32 %a, i32 %b, i32 %c) - // IPCP-NEXT: entry: - // IPCP-NEXT: %mul = mul nsw i32 4, %a - // IPCP-NEXT: %add = add nsw i32 %mul, %c - // IPCP-NEXT: ret i32 %add + // IPCP: define internal i32 @ThreeInt2Int(i32 %a, i32 %b, i32 %c) + // IPCP: %mul = mul nsw i32 4, %a + // IPCP: %add = add nsw i32 %mul, %c + // IPCP: ret i32 %add return a * b + c; } Modified: cfe/trunk/test/CodeGen/callback_pthread_create.c URL: http://llvm.org/viewvc/llvm-project/cfe/trunk/test/CodeGen/callback_pthread_create.c?rev=351665&r1=351664&r2=351665&view=diff == --- cfe/trunk/test/CodeGen/callback_pthread_create.c (original) +++ cfe/trunk/test/CodeGen/callback_pthread_create.c Sat Jan 19 12:46:10 2019 @@ -14,15 +14,13 @@ const int GlobalVar = 0; static void *callee0(void *payload) { // IPCP: define internal i8* @callee0 -// IPCP-NEXT: entry: -// IPCP-NEXT: ret i8* null +// IPCP:ret i8* null return payload; } static void *callee1(void *payload) { // IPCP: define internal i8* @callee1 -// IPCP-NEXT: entry: -// IPCP-NEXT: ret i8* bitcast (i32* @GlobalVar to i8*) +// IPCP:ret i8* bitcast (i32* @GlobalVar to i8*) return payload; } ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
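The motivation, which the log does not spell out: FileCheck's CHECK-NEXT (and IPCP-NEXT) directives only match the line immediately following the previous match, so pinning the exact instruction sequence breaks the test whenever the pipeline emits an extra label or instruction. Dropping the -NEXT suffix keeps the functional check while tolerating intervening lines, e.g. (taken from the hunk above):

  // IPCP: define internal i8* @callee0
  // IPCP:   ret i8* null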
r351744 - [NFC] Fix comparison warning issues by MSVC
Author: jdoerfert Date: Mon Jan 21 06:23:46 2019 New Revision: 351744 URL: http://llvm.org/viewvc/llvm-project?rev=351744&view=rev Log: [NFC] Fix comparison warning issues by MSVC Modified: cfe/trunk/lib/Sema/SemaDeclAttr.cpp Modified: cfe/trunk/lib/Sema/SemaDeclAttr.cpp URL: http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/Sema/SemaDeclAttr.cpp?rev=351744&r1=351743&r2=351744&view=diff == --- cfe/trunk/lib/Sema/SemaDeclAttr.cpp (original) +++ cfe/trunk/lib/Sema/SemaDeclAttr.cpp Mon Jan 21 06:23:46 2019 @@ -3560,7 +3560,9 @@ static void handleCallbackAttr(Sema &S, int CalleeIdx = EncodingIndices.front(); // Check if the callee index is proper, thus not "this" and not "unknown". - if (CalleeIdx < HasImplicitThisParam) { + // This means the "CalleeIdx" has to be non-negative if "HasImplicitThisParam" + // is false and positive if "HasImplicitThisParam" is true. + if (CalleeIdx < (int)HasImplicitThisParam) { S.Diag(AL.getLoc(), diag::err_callback_attribute_invalid_callee) << AL.getRange(); return; ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
r352299 - [FIX] Adjust CXX microsoft abi dynamic cast test to r352293
Author: jdoerfert Date: Sat Jan 26 16:22:10 2019 New Revision: 352299 URL: http://llvm.org/viewvc/llvm-project?rev=352299&view=rev Log: [FIX] Adjust CXX microsoft abi dynamic cast test to r352293 Modified: cfe/trunk/test/CodeGenCXX/microsoft-abi-dynamic-cast.cpp cfe/trunk/test/CodeGenCXX/microsoft-abi-typeid.cpp Modified: cfe/trunk/test/CodeGenCXX/microsoft-abi-dynamic-cast.cpp URL: http://llvm.org/viewvc/llvm-project/cfe/trunk/test/CodeGenCXX/microsoft-abi-dynamic-cast.cpp?rev=352299&r1=352298&r2=352299&view=diff == --- cfe/trunk/test/CodeGenCXX/microsoft-abi-dynamic-cast.cpp (original) +++ cfe/trunk/test/CodeGenCXX/microsoft-abi-dynamic-cast.cpp Sat Jan 26 16:22:10 2019 @@ -60,7 +60,7 @@ T* test5(A* x) { return dynamic_cast // CHECK-NEXT: [[VBOFFP:%.*]] = getelementptr inbounds i32, i32* [[VBTBL]], i32 1 // CHECK-NEXT: [[VBOFFS:%.*]] = load i32, i32* [[VBOFFP]], align 4 // CHECK-NEXT: [[ADJ:%.*]] = getelementptr inbounds i8, i8* [[VOIDP]], i32 [[VBOFFS]] -// CHECK-NEXT: [[CALL:%.*]] = tail call i8* @__RTDynamicCast(i8* [[ADJ]], i32 [[VBOFFS]], i8* {{.*}}bitcast (%rtti.TypeDescriptor7* @"??_R0?AUA@@@8" to i8*), i8* {{.*}}bitcast (%rtti.TypeDescriptor7* @"??_R0?AUT@@@8" to i8*), i32 0) +// CHECK-NEXT: [[CALL:%.*]] = tail call i8* @__RTDynamicCast(i8* nonnull [[ADJ]], i32 [[VBOFFS]], i8* {{.*}}bitcast (%rtti.TypeDescriptor7* @"??_R0?AUA@@@8" to i8*), i8* {{.*}}bitcast (%rtti.TypeDescriptor7* @"??_R0?AUT@@@8" to i8*), i32 0) // CHECK-NEXT: [[RES:%.*]] = bitcast i8* [[CALL]] to %struct.T* // CHECK-NEXT: br label // CHECK:[[RET:%.*]] = phi %struct.T* @@ -100,7 +100,7 @@ void* test8(A* x) { return dynamic_cast< // CHECK-NEXT: [[VBOFFP:%.*]] = getelementptr inbounds i32, i32* [[VBTBL]], i32 1 // CHECK-NEXT: [[VBOFFS:%.*]] = load i32, i32* [[VBOFFP]], align 4 // CHECK-NEXT: [[ADJ:%.*]] = getelementptr inbounds i8, i8* [[VOIDP]], i32 [[VBOFFS]] -// CHECK-NEXT: [[RES:%.*]] = tail call i8* @__RTCastToVoid(i8* [[ADJ]]) +// CHECK-NEXT: [[RES:%.*]] = tail call i8* @__RTCastToVoid(i8* nonnull [[ADJ]]) // CHECK-NEXT: br label // CHECK:[[RET:%.*]] = phi i8* // CHECK-NEXT: ret i8* [[RET]] Modified: cfe/trunk/test/CodeGenCXX/microsoft-abi-typeid.cpp URL: http://llvm.org/viewvc/llvm-project/cfe/trunk/test/CodeGenCXX/microsoft-abi-typeid.cpp?rev=352299&r1=352298&r2=352299&view=diff == --- cfe/trunk/test/CodeGenCXX/microsoft-abi-typeid.cpp (original) +++ cfe/trunk/test/CodeGenCXX/microsoft-abi-typeid.cpp Sat Jan 26 16:22:10 2019 @@ -36,7 +36,7 @@ const std::type_info* test3_typeid() { r // CHECK-NEXT: [[VBSLOT:%.*]] = getelementptr inbounds i32, i32* [[VBTBL]], i32 1 // CHECK-NEXT: [[VBASE_OFFS:%.*]] = load i32, i32* [[VBSLOT]], align 4 // CHECK-NEXT: [[ADJ:%.*]] = getelementptr inbounds i8, i8* [[THIS]], i32 [[VBASE_OFFS]] -// CHECK-NEXT: [[RT:%.*]] = tail call i8* @__RTtypeid(i8* [[ADJ]]) +// CHECK-NEXT: [[RT:%.*]] = tail call i8* @__RTtypeid(i8* nonnull [[ADJ]]) // CHECK-NEXT: [[RET:%.*]] = bitcast i8* [[RT]] to %struct.type_info* // CHECK-NEXT: ret %struct.type_info* [[RET]] ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
r353088 - Generalize pthread callback test case
Author: jdoerfert Date: Mon Feb 4 12:42:38 2019 New Revision: 353088 URL: http://llvm.org/viewvc/llvm-project?rev=353088&view=rev Log: Generalize pthread callback test case Changes suggested by Eli Friedman Modified: cfe/trunk/test/CodeGen/callback_pthread_create.c Modified: cfe/trunk/test/CodeGen/callback_pthread_create.c URL: http://llvm.org/viewvc/llvm-project/cfe/trunk/test/CodeGen/callback_pthread_create.c?rev=353088&r1=353087&r2=353088&view=diff == --- cfe/trunk/test/CodeGen/callback_pthread_create.c (original) +++ cfe/trunk/test/CodeGen/callback_pthread_create.c Mon Feb 4 12:42:38 2019 @@ -1,14 +1,22 @@ -// RUN: %clang -O1 %s -S -c -emit-llvm -o - | FileCheck %s -// RUN: %clang -O1 %s -S -c -emit-llvm -o - | opt -ipconstprop -S | FileCheck --check-prefix=IPCP %s - -// This is a linux only test for now due to the include. -// UNSUPPORTED: !linux +// RUN: %clang_cc1 -O1 %s -S -emit-llvm -o - | FileCheck %s +// RUN: %clang_cc1 -O1 %s -S -emit-llvm -o - | opt -ipconstprop -S | FileCheck --check-prefix=IPCP %s // CHECK: declare !callback ![[cid:[0-9]+]] {{.*}}i32 @pthread_create // CHECK: ![[cid]] = !{![[cidb:[0-9]+]]} // CHECK: ![[cidb]] = !{i64 2, i64 3, i1 false} -#include +// Taken from test/Analysis/retain-release.m +//{ +struct _opaque_pthread_t {}; +struct _opaque_pthread_attr_t {}; +typedef struct _opaque_pthread_t *__darwin_pthread_t; +typedef struct _opaque_pthread_attr_t __darwin_pthread_attr_t; +typedef __darwin_pthread_t pthread_t; +typedef __darwin_pthread_attr_t pthread_attr_t; + +int pthread_create(pthread_t *, const pthread_attr_t *, + void *(*)(void *), void *); +//} const int GlobalVar = 0; @@ -26,8 +34,8 @@ static void *callee1(void *payload) { void foo() { pthread_t MyFirstThread; - pthread_create(&MyFirstThread, NULL, callee0, NULL); + pthread_create(&MyFirstThread, 0, callee0, 0); pthread_t MySecondThread; - pthread_create(&MySecondThread, NULL, callee1, (void *)&GlobalVar); + pthread_create(&MySecondThread, 0, callee1, (void *)&GlobalVar); } ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
r362545 - Introduce Value::stripPointerCastsSameRepresentation
Author: jdoerfert Date: Tue Jun 4 13:21:46 2019 New Revision: 362545 URL: http://llvm.org/viewvc/llvm-project?rev=362545&view=rev Log: Introduce Value::stripPointerCastsSameRepresentation This patch allows current users of Value::stripPointerCasts() to force the result of the function to have the same representation as the value it was called on. This is useful in various cases, e.g., (non-)null checks. In this patch only a single call site was adjusted to fix an existing misuse that would cause nonnull where they may be wrong. Uses in attribute deduction and other areas, e.g., D60047, are to be expected. For a discussion on this topic, please see [0]. [0] http://lists.llvm.org/pipermail/llvm-dev/2018-December/128423.html Reviewers: hfinkel, arsenm, reames Subscribers: wdng, hiraditya, bollu, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61607 Modified: cfe/trunk/test/CodeGenOpenCLCXX/addrspace-references.cl Modified: cfe/trunk/test/CodeGenOpenCLCXX/addrspace-references.cl URL: http://llvm.org/viewvc/llvm-project/cfe/trunk/test/CodeGenOpenCLCXX/addrspace-references.cl?rev=362545&r1=362544&r2=362545&view=diff == --- cfe/trunk/test/CodeGenOpenCLCXX/addrspace-references.cl (original) +++ cfe/trunk/test/CodeGenOpenCLCXX/addrspace-references.cl Tue Jun 4 13:21:46 2019 @@ -9,6 +9,6 @@ void foo() { // CHECK: [[REF:%.*]] = alloca i32 // CHECK: store i32 1, i32* [[REF]] // CHECK: [[REG:%[0-9]+]] = addrspacecast i32* [[REF]] to i32 addrspace(4)* - // CHECK: call spir_func i32 @_Z3barRU3AS4Kj(i32 addrspace(4)* nonnull dereferenceable(4) [[REG]]) + // CHECK: call spir_func i32 @_Z3barRU3AS4Kj(i32 addrspace(4)* dereferenceable(4) [[REG]]) bar(1); } ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
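A rough usage sketch of the LLVM-side API this test update reacts to (the function itself is added by the LLVM half of r362545; the snippet below is illustrative, not from the patch): an addrspacecast may change the pointer representation, so the new variant refuses to look through it, and clang consequently stops deriving nonnull for the addrspacecast'ed reference argument checked above.

  #include "llvm/IR/Value.h"

  // Illustrative only (not from the patch).
  const llvm::Value *underlyingForNullReasoning(const llvm::Value *V) {
    // stripPointerCasts() may look through an addrspacecast, so facts proven
    // about its result (e.g. nonnull) need not hold for V's representation.
    // The new variant stops at casts that change the representation:
    return V->stripPointerCastsSameRepresentation();
  }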
r367387 - [Fix] Customize warnings for missing built-in types
Author: jdoerfert Date: Tue Jul 30 22:16:38 2019 New Revision: 367387 URL: http://llvm.org/viewvc/llvm-project?rev=367387&view=rev Log: [Fix] Customize warnings for missing built-in types If we detect a built-in declaration for which we cannot derive a type matching the pattern in the Builtins.def file, we currently emit a warning that the respective header is needed. However, this is not necessarily the behavior we want as it has no connection to the location of the declaration (which can actually be in the header in question). Instead, this warning is generated - if we could not build the type for the pattern on file (for some reason). Here we should make the reason explicit. The actual problem is otherwise circumvented as the warning is misleading, see [0] for an example. - if we could not build the type for the pattern because we do not have a type on record, possible since D55483, we should not emit any warning. See [1] for a legitimate problem. This patch address both cases. For the "setjmp" family a new warning is introduced and for built-ins without type on record, so far "pthread_create", we do not emit the warning anymore. Also see: PR40692 [0] https://lkml.org/lkml/2019/1/11/718 [1] https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=235583 Differential Revision: https://reviews.llvm.org/D58091 Added: cfe/trunk/test/Sema/builtin-setjmp.c Modified: cfe/trunk/include/clang/Basic/DiagnosticSemaKinds.td cfe/trunk/lib/Sema/SemaDecl.cpp cfe/trunk/test/Analysis/retain-release.m cfe/trunk/test/Sema/implicit-builtin-decl.c Modified: cfe/trunk/include/clang/Basic/DiagnosticSemaKinds.td URL: http://llvm.org/viewvc/llvm-project/cfe/trunk/include/clang/Basic/DiagnosticSemaKinds.td?rev=367387&r1=367386&r2=367387&view=diff == --- cfe/trunk/include/clang/Basic/DiagnosticSemaKinds.td (original) +++ cfe/trunk/include/clang/Basic/DiagnosticSemaKinds.td Tue Jul 30 22:16:38 2019 @@ -598,6 +598,10 @@ def ext_implicit_lib_function_decl : Ext def note_include_header_or_declare : Note< "include the header <%0> or explicitly provide a declaration for '%1'">; def note_previous_builtin_declaration : Note<"%0 is a builtin with type %1">; +def warn_implicit_decl_no_jmp_buf +: Warning<"declaration of built-in function '%0' requires the declaration" +" of the 'jmp_buf' type, commonly provided in the header .">, + InGroup>; def warn_implicit_decl_requires_sysheader : Warning< "declaration of built-in function '%1' requires inclusion of the header <%0>">, InGroup; Modified: cfe/trunk/lib/Sema/SemaDecl.cpp URL: http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/Sema/SemaDecl.cpp?rev=367387&r1=367386&r2=367387&view=diff == --- cfe/trunk/lib/Sema/SemaDecl.cpp (original) +++ cfe/trunk/lib/Sema/SemaDecl.cpp Tue Jul 30 22:16:38 2019 @@ -1983,10 +1983,27 @@ NamedDecl *Sema::LazilyCreateBuiltin(Ide ASTContext::GetBuiltinTypeError Error; QualType R = Context.GetBuiltinType(ID, Error); if (Error) { -if (ForRedeclaration) - Diag(Loc, diag::warn_implicit_decl_requires_sysheader) - << getHeaderName(Context.BuiltinInfo, ID, Error) +if (!ForRedeclaration) + return nullptr; + +// If we have a builtin without an associated type we should not emit a +// warning when we were not able to find a type for it. +if (Error == ASTContext::GE_Missing_type) + return nullptr; + +// If we could not find a type for setjmp it is because the jmp_buf type was +// not defined prior to the setjmp declaration. 
+if (Error == ASTContext::GE_Missing_setjmp) { + Diag(Loc, diag::warn_implicit_decl_no_jmp_buf) << Context.BuiltinInfo.getName(ID); + return nullptr; +} + +// Generally, we emit a warning that the declaration requires the +// appropriate header. +Diag(Loc, diag::warn_implicit_decl_requires_sysheader) +<< getHeaderName(Context.BuiltinInfo, ID, Error) +<< Context.BuiltinInfo.getName(ID); return nullptr; } Modified: cfe/trunk/test/Analysis/retain-release.m URL: http://llvm.org/viewvc/llvm-project/cfe/trunk/test/Analysis/retain-release.m?rev=367387&r1=367386&r2=367387&view=diff == --- cfe/trunk/test/Analysis/retain-release.m (original) +++ cfe/trunk/test/Analysis/retain-release.m Tue Jul 30 22:16:38 2019 @@ -2,7 +2,7 @@ // RUN: %clang_analyze_cc1 -triple x86_64-apple-darwin10\ // RUN: -analyzer-checker=core,osx.coreFoundation.CFRetainRelease\ // RUN: -analyzer-checker=osx.cocoa.ClassRelease,osx.cocoa.RetainCount\ -// RUN: -analyzer-checker=debug.ExprInspection -fblocks -verify=expected,C %s\ +// RUN: -analyzer-checker=debug.ExprInspection -fblocks -verify %s\ // RUN: -Wno-objc-root-class -analyzer-output=plis
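A hypothetical reproducer for the behavior change (this is not the added test/Sema/builtin-setjmp.c, just the scenario the log describes): declaring the setjmp builtin while no jmp_buf type is in scope now gets the dedicated, accurate warning instead of the misleading "include the header" hint, while builtins without a type on record, such as pthread_create, no longer warn at all.

  /* No #include <setjmp.h>, so jmp_buf is never declared. */
  extern int setjmp(void *env);
  /* new warning: declaration of built-in function 'setjmp' requires the
     declaration of the 'jmp_buf' type, commonly provided in the header
     <setjmp.h>. */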
[clang] 7db017b - [OpenMP][Docs] Update Clang Support docs after D75591
Author: Johannes Doerfert Date: 2020-07-29T10:21:05-05:00 New Revision: 7db017bf3405c7fa43786fe27380d88702e19584 URL: https://github.com/llvm/llvm-project/commit/7db017bf3405c7fa43786fe27380d88702e19584 DIFF: https://github.com/llvm/llvm-project/commit/7db017bf3405c7fa43786fe27380d88702e19584.diff LOG: [OpenMP][Docs] Update Clang Support docs after D75591 Added: Modified: clang/docs/OpenMPSupport.rst Removed: diff --git a/clang/docs/OpenMPSupport.rst b/clang/docs/OpenMPSupport.rst index a1d1b120bcec..bda52e934c26 100644 --- a/clang/docs/OpenMPSupport.rst +++ b/clang/docs/OpenMPSupport.rst @@ -264,7 +264,7 @@ want to help with the implementation. +==+==+==+===+ | misc extension | user-defined function variants with #ifdef protection| :part:`worked on`| D71179 | +--+--+--+---+ -| misc extension | default(firstprivate) & default(private) | :part:`worked on`| | +| misc extension | default(firstprivate) & default(private) | :part:`partial` | firstprivate done: D75591 | +--+--+--+---+ | loop extension | Loop tiling transformation | :part:`claimed` | | +--+--+--+---+ ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] ee05167 - [OpenMP] Allow traits for the OpenMP context selector `isa`
Author: Johannes Doerfert Date: 2020-07-29T10:22:27-05:00 New Revision: ee05167cc42b95f70bc2ff1bd4402969f356f53b URL: https://github.com/llvm/llvm-project/commit/ee05167cc42b95f70bc2ff1bd4402969f356f53b DIFF: https://github.com/llvm/llvm-project/commit/ee05167cc42b95f70bc2ff1bd4402969f356f53b.diff LOG: [OpenMP] Allow traits for the OpenMP context selector `isa` It was unclear what `isa` was supposed to mean so we did not provide any traits for this context selector. With this patch we will allow *any* string or identifier. We use the target attribute and target info to determine if the trait matches. In other words, we will check if the provided value is a target feature that is available (at the call site). Fixes PR46338 Reviewed By: ABataev Differential Revision: https://reviews.llvm.org/D83281 Added: clang/test/OpenMP/declare_variant_device_isa_codegen_1.c Modified: clang/include/clang/AST/OpenMPClause.h clang/include/clang/Basic/DiagnosticParseKinds.td clang/include/clang/Basic/DiagnosticSemaKinds.td clang/lib/AST/OpenMPClause.cpp clang/lib/Parse/ParseOpenMP.cpp clang/lib/Sema/SemaOpenMP.cpp clang/test/OpenMP/declare_variant_messages.c llvm/include/llvm/Frontend/OpenMP/OMPContext.h llvm/include/llvm/Frontend/OpenMP/OMPKinds.def llvm/lib/Frontend/OpenMP/OMPContext.cpp llvm/unittests/Frontend/OpenMPContextTest.cpp Removed: diff --git a/clang/include/clang/AST/OpenMPClause.h b/clang/include/clang/AST/OpenMPClause.h index c649502f765b..4f94aa7074ee 100644 --- a/clang/include/clang/AST/OpenMPClause.h +++ b/clang/include/clang/AST/OpenMPClause.h @@ -7635,6 +7635,10 @@ class OMPClausePrinter final : public OMPClauseVisitor { struct OMPTraitProperty { llvm::omp::TraitProperty Kind = llvm::omp::TraitProperty::invalid; + + /// The raw string as we parsed it. This is needed for the `isa` trait set + /// (which accepts anything) and (later) extensions. + StringRef RawString; }; struct OMPTraitSelector { Expr *ScoreOrCondition = nullptr; @@ -7692,6 +7696,23 @@ class OMPTraitInfo { llvm::raw_ostream &operator<<(llvm::raw_ostream &OS, const OMPTraitInfo &TI); llvm::raw_ostream &operator<<(llvm::raw_ostream &OS, const OMPTraitInfo *TI); +/// Clang specific specialization of the OMPContext to lookup target features. 
+struct TargetOMPContext final : public llvm::omp::OMPContext { + + TargetOMPContext(ASTContext &ASTCtx, + std::function &&DiagUnknownTrait, + const FunctionDecl *CurrentFunctionDecl); + virtual ~TargetOMPContext() = default; + + /// See llvm::omp::OMPContext::matchesISATrait + bool matchesISATrait(StringRef RawString) const override; + +private: + std::function FeatureValidityCheck; + std::function DiagUnknownTrait; + llvm::StringMap FeatureMap; +}; + } // namespace clang #endif // LLVM_CLANG_AST_OPENMPCLAUSE_H diff --git a/clang/include/clang/Basic/DiagnosticParseKinds.td b/clang/include/clang/Basic/DiagnosticParseKinds.td index 6138b27fb87f..08b91de31993 100644 --- a/clang/include/clang/Basic/DiagnosticParseKinds.td +++ b/clang/include/clang/Basic/DiagnosticParseKinds.td @@ -1278,6 +1278,11 @@ def warn_omp_declare_variant_string_literal_or_identifier "%select{set|selector|property}0; " "%select{set|selector|property}0 skipped">, InGroup; +def warn_unknown_begin_declare_variant_isa_trait +: Warning<"isa trait '%0' is not known to the current target; verify the " + "spelling or consider restricting the context selector with the " + "'arch' selector further">, + InGroup; def note_omp_declare_variant_ctx_options : Note<"context %select{set|selector|property}0 options are: %1">; def warn_omp_declare_variant_expected diff --git a/clang/include/clang/Basic/DiagnosticSemaKinds.td b/clang/include/clang/Basic/DiagnosticSemaKinds.td index 8093e7ed3fbe..ae693a08108c 100644 --- a/clang/include/clang/Basic/DiagnosticSemaKinds.td +++ b/clang/include/clang/Basic/DiagnosticSemaKinds.td @@ -10320,6 +10320,11 @@ def warn_nested_declare_variant : Warning<"nesting `omp begin/end declare variant` is not supported yet; " "nested context ignored">, InGroup; +def warn_unknown_declare_variant_isa_trait +: Warning<"isa trait '%0' is not known to the current target; verify the " + "spelling or consider restricting the context selector with the " + "'arch' selector further">, + InGroup; def err_omp_non_pointer_type_array_shaping_base : Error< "expected expression with a pointer to a complete type as a base of an array " "shaping operation">; diff --git a/clang/lib/AST/OpenMPClause.cpp b/clang/lib/AST/OpenMPClause.cpp index 6933c5742552..9caa691188fd 100644 --- a/clang/lib/AST/OpenMPClause.cpp +++ b/clang/lib/AST/OpenMPClause.cpp @@ -17,6 +17,7 @@ #include "clang/
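Sketched from the added test (declare_variant_device_isa_codegen_1.c) for illustration: any string or identifier is now accepted as an isa trait, and it matches if it names a target feature that is active at the call site, for example via the target attribute.

  void avx512_saxpy(int n, float s, float *x, float *y);

  #pragma omp declare variant(avx512_saxpy) match(device = {isa("avx512f")})
  void base_saxpy(int n, float s, float *x, float *y);

  __attribute__((target("avx512f")))
  void variant_caller(int n, float s, float *x, float *y) {
    base_saxpy(n, s, x, y); /* dispatches to avx512_saxpy: avx512f is active here */
  }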
[clang] 8723280 - [OpenMP] Fix D83281 issue on windows by allowing `dso_local` in CHECK
Author: Johannes Doerfert Date: 2020-07-29T15:18:20-05:00 New Revision: 8723280b68b1e5ed97a699466720b36a32a9e406 URL: https://github.com/llvm/llvm-project/commit/8723280b68b1e5ed97a699466720b36a32a9e406 DIFF: https://github.com/llvm/llvm-project/commit/8723280b68b1e5ed97a699466720b36a32a9e406.diff LOG: [OpenMP] Fix D83281 issue on windows by allowing `dso_local` in CHECK Added: Modified: clang/test/OpenMP/declare_variant_device_isa_codegen_1.c Removed: diff --git a/clang/test/OpenMP/declare_variant_device_isa_codegen_1.c b/clang/test/OpenMP/declare_variant_device_isa_codegen_1.c index baa5eb8f8830..25fc3941dd50 100644 --- a/clang/test/OpenMP/declare_variant_device_isa_codegen_1.c +++ b/clang/test/OpenMP/declare_variant_device_isa_codegen_1.c @@ -31,18 +31,18 @@ void avx512_saxpy(int n, float s, float *x, float *y) { } void caller(int n, float s, float *x, float *y) { - // GENERIC: define void @{{.*}}caller - // GENERIC: call void @{{.*}}base_saxpy - // WITHFEATURE: define void @{{.*}}caller - // WITHFEATURE: call void @{{.*}}avx512_saxpy + // GENERIC: define void {{.*}}caller + // GENERIC: call void {{.*}}base_saxpy + // WITHFEATURE: define void {{.*}}caller + // WITHFEATURE: call void {{.*}}avx512_saxpy base_saxpy(n, s, x, y); } __attribute__((target("avx512f"))) void variant_caller(int n, float s, float *x, float *y) { - // GENERIC: define void @{{.*}}variant_caller - // GENERIC: call void @{{.*}}avx512_saxpy - // WITHFEATURE: define void @{{.*}}variant_caller - // WITHFEATURE: call void @{{.*}}avx512_saxpy + // GENERIC: define void {{.*}}variant_caller + // GENERIC: call void {{.*}}avx512_saxpy + // WITHFEATURE: define void {{.*}}variant_caller + // WITHFEATURE: call void {{.*}}avx512_saxpy base_saxpy(n, s, x, y); } ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] b08abf4 - [OpenMP] Fix D83281 issue on windows by allowing `dso_local` in CHECK [2/1]
Author: Johannes Doerfert Date: 2020-07-29T15:47:45-05:00 New Revision: b08abf4c808e98718b8806dafcae1626328676d4 URL: https://github.com/llvm/llvm-project/commit/b08abf4c808e98718b8806dafcae1626328676d4 DIFF: https://github.com/llvm/llvm-project/commit/b08abf4c808e98718b8806dafcae1626328676d4.diff LOG: [OpenMP] Fix D83281 issue on windows by allowing `dso_local` in CHECK [2/1] The problem with 8723280b68b1e5ed97a699466720b36a32a9e406 was that the `dso_local` is *before* the void not after. Hope this works. Added: Modified: clang/test/OpenMP/declare_variant_device_isa_codegen_1.c Removed: diff --git a/clang/test/OpenMP/declare_variant_device_isa_codegen_1.c b/clang/test/OpenMP/declare_variant_device_isa_codegen_1.c index 25fc3941dd50..76a3eedeae30 100644 --- a/clang/test/OpenMP/declare_variant_device_isa_codegen_1.c +++ b/clang/test/OpenMP/declare_variant_device_isa_codegen_1.c @@ -31,18 +31,18 @@ void avx512_saxpy(int n, float s, float *x, float *y) { } void caller(int n, float s, float *x, float *y) { - // GENERIC: define void {{.*}}caller - // GENERIC: call void {{.*}}base_saxpy - // WITHFEATURE: define void {{.*}}caller - // WITHFEATURE: call void {{.*}}avx512_saxpy + // GENERIC: define {{.*}}void @{{.*}}caller + // GENERIC: call void @{{.*}}base_saxpy + // WITHFEATURE: define {{.*}}void @{{.*}}caller + // WITHFEATURE: call void @{{.*}}avx512_saxpy base_saxpy(n, s, x, y); } __attribute__((target("avx512f"))) void variant_caller(int n, float s, float *x, float *y) { - // GENERIC: define void {{.*}}variant_caller - // GENERIC: call void {{.*}}avx512_saxpy - // WITHFEATURE: define void {{.*}}variant_caller - // WITHFEATURE: call void {{.*}}avx512_saxpy + // GENERIC: define {{.*}}void @{{.*}}variant_caller + // GENERIC: call void @{{.*}}avx512_saxpy + // WITHFEATURE: define {{.*}}void @{{.*}}variant_caller + // WITHFEATURE: call void @{{.*}}avx512_saxpy base_saxpy(n, s, x, y); } ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
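For reference, the shape of the Windows-emitted IR the first attempt did not account for (the exact signature is illustrative): dso_local sits between define and the return type, so the wildcard has to come before void, as the corrected patterns above do.

  // On Windows clang emits something like:
  //   define dso_local void @caller(...)
  // which `define void {{.*}}caller` cannot match, while
  //   define {{.*}}void @{{.*}}caller
  // matches both the Windows and the Linux output.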
[clang] 19756ef - [OpenMP][IRBuilder] Support allocas in nested parallel regions
Author: Johannes Doerfert Date: 2020-07-30T10:19:39-05:00 New Revision: 19756ef53a498b7aa1fbac9e3a7cd3aa8e110fad URL: https://github.com/llvm/llvm-project/commit/19756ef53a498b7aa1fbac9e3a7cd3aa8e110fad DIFF: https://github.com/llvm/llvm-project/commit/19756ef53a498b7aa1fbac9e3a7cd3aa8e110fad.diff LOG: [OpenMP][IRBuilder] Support allocas in nested parallel regions We need to keep track of the alloca insertion point (which we already communicate via the callback to the user) as we place allocas as well. Reviewed By: fghanim, SouraVX Differential Revision: https://reviews.llvm.org/D82470 Added: Modified: clang/lib/CodeGen/CGStmtOpenMP.cpp llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp llvm/unittests/Frontend/OpenMPIRBuilderTest.cpp Removed: diff --git a/clang/lib/CodeGen/CGStmtOpenMP.cpp b/clang/lib/CodeGen/CGStmtOpenMP.cpp index 0ee1133ebaa1..df1cc1666de4 100644 --- a/clang/lib/CodeGen/CGStmtOpenMP.cpp +++ b/clang/lib/CodeGen/CGStmtOpenMP.cpp @@ -1707,9 +1707,11 @@ void CodeGenFunction::EmitOMPParallelDirective(const OMPParallelDirective &S) { CGCapturedStmtInfo CGSI(*CS, CR_OpenMP); CodeGenFunction::CGCapturedStmtRAII CapInfoRAII(*this, &CGSI); -Builder.restoreIP(OMPBuilder.CreateParallel(Builder, BodyGenCB, PrivCB, -FiniCB, IfCond, NumThreads, -ProcBind, S.hasCancel())); +llvm::OpenMPIRBuilder::InsertPointTy AllocaIP( +AllocaInsertPt->getParent(), AllocaInsertPt->getIterator()); +Builder.restoreIP( +OMPBuilder.CreateParallel(Builder, AllocaIP, BodyGenCB, PrivCB, FiniCB, + IfCond, NumThreads, ProcBind, S.hasCancel())); return; } diff --git a/llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h b/llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h index 95eed59f1b3d..f813a730342e 100644 --- a/llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h +++ b/llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h @@ -156,6 +156,7 @@ class OpenMPIRBuilder { /// Generator for '#omp parallel' /// /// \param Loc The insert and source location description. + /// \param AllocaIP The insertion points to be used for alloca instructions. /// \param BodyGenCB Callback that will generate the region code. /// \param PrivCB Callback to copy a given variable (think copy constructor). /// \param FiniCB Callback to finalize variable copies. @@ -166,10 +167,11 @@ class OpenMPIRBuilder { /// /// \returns The insertion position *after* the parallel. 
IRBuilder<>::InsertPoint - CreateParallel(const LocationDescription &Loc, BodyGenCallbackTy BodyGenCB, - PrivatizeCallbackTy PrivCB, FinalizeCallbackTy FiniCB, - Value *IfCondition, Value *NumThreads, - omp::ProcBindKind ProcBind, bool IsCancellable); + CreateParallel(const LocationDescription &Loc, InsertPointTy AllocaIP, + BodyGenCallbackTy BodyGenCB, PrivatizeCallbackTy PrivCB, + FinalizeCallbackTy FiniCB, Value *IfCondition, + Value *NumThreads, omp::ProcBindKind ProcBind, + bool IsCancellable); /// Generator for '#omp flush' /// diff --git a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp index 9468a3aa3c8d..a5fe4ec87c46 100644 --- a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp +++ b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp @@ -394,9 +394,10 @@ void OpenMPIRBuilder::emitCancelationCheckImpl( } IRBuilder<>::InsertPoint OpenMPIRBuilder::CreateParallel( -const LocationDescription &Loc, BodyGenCallbackTy BodyGenCB, -PrivatizeCallbackTy PrivCB, FinalizeCallbackTy FiniCB, Value *IfCondition, -Value *NumThreads, omp::ProcBindKind ProcBind, bool IsCancellable) { +const LocationDescription &Loc, InsertPointTy OuterAllocaIP, +BodyGenCallbackTy BodyGenCB, PrivatizeCallbackTy PrivCB, +FinalizeCallbackTy FiniCB, Value *IfCondition, Value *NumThreads, +omp::ProcBindKind ProcBind, bool IsCancellable) { if (!updateToLocation(Loc)) return Loc.IP; @@ -429,7 +430,9 @@ IRBuilder<>::InsertPoint OpenMPIRBuilder::CreateParallel( // we want to delete at the end. SmallVector ToBeDeleted; - Builder.SetInsertPoint(OuterFn->getEntryBlock().getFirstNonPHI()); + // Change the location to the outer alloca insertion point to create and + // initialize the allocas we pass into the parallel region. + Builder.restoreIP(OuterAllocaIP); AllocaInst *TIDAddr = Builder.CreateAlloca(Int32, nullptr, "tid.addr"); AllocaInst *ZeroAddr = Builder.CreateAlloca(Int32, nullptr, "zero.addr"); @@ -481,9 +484,9 @@ IRBuilder<>::InsertPoint OpenMPIRBuilder::CreateParallel( // Generate the privatization allocas in the block that will become the entry // o
[clang] ebad64d - [OpenMP][FIX] Consistently use OpenMPIRBuilder if requested
Author: Johannes Doerfert Date: 2020-07-30T10:19:40-05:00 New Revision: ebad64dfe133e64d1df6b82e6ef2fb031d635b08 URL: https://github.com/llvm/llvm-project/commit/ebad64dfe133e64d1df6b82e6ef2fb031d635b08 DIFF: https://github.com/llvm/llvm-project/commit/ebad64dfe133e64d1df6b82e6ef2fb031d635b08.diff LOG: [OpenMP][FIX] Consistently use OpenMPIRBuilder if requested When we use the OpenMPIRBuilder for the parallel region we need to also use it to get the thread ID (among other things) in the body. This is because CGOpenMPRuntime::getThreadID() and CGOpenMPRuntime::emitUpdateLocation implicitly assumes that if they are called from within a parallel region there is a certain structure to the code and certain members of the OMPRegionInfo are initialized. It might make sense to initialize them even if we use the OpenMPIRBuilder but we would preferably get rid of such state instead. Bug reported by Anchu Rajendran Sudhakumari. Depends on D82470. Reviewed By: anchu-rajendran Differential Revision: https://reviews.llvm.org/D82822 Added: clang/test/OpenMP/irbuilder_nested_parallel_for.c Modified: clang/lib/CodeGen/CGOpenMPRuntime.cpp clang/test/OpenMP/cancel_codegen.cpp clang/test/OpenMP/task_codegen.cpp Removed: diff --git a/clang/lib/CodeGen/CGOpenMPRuntime.cpp b/clang/lib/CodeGen/CGOpenMPRuntime.cpp index dc12286c72be..60c7081b135b 100644 --- a/clang/lib/CodeGen/CGOpenMPRuntime.cpp +++ b/clang/lib/CodeGen/CGOpenMPRuntime.cpp @@ -1455,6 +1455,19 @@ void CGOpenMPRuntime::clearLocThreadIdInsertPt(CodeGenFunction &CGF) { } } +static StringRef getIdentStringFromSourceLocation(CodeGenFunction &CGF, + SourceLocation Loc, + SmallString<128> &Buffer) { + llvm::raw_svector_ostream OS(Buffer); + // Build debug location + PresumedLoc PLoc = CGF.getContext().getSourceManager().getPresumedLoc(Loc); + OS << ";" << PLoc.getFilename() << ";"; + if (const auto *FD = dyn_cast_or_null(CGF.CurFuncDecl)) +OS << FD->getQualifiedNameAsString(); + OS << ";" << PLoc.getLine() << ";" << PLoc.getColumn() << ";;"; + return OS.str(); +} + llvm::Value *CGOpenMPRuntime::emitUpdateLocation(CodeGenFunction &CGF, SourceLocation Loc, unsigned Flags) { @@ -1464,6 +1477,16 @@ llvm::Value *CGOpenMPRuntime::emitUpdateLocation(CodeGenFunction &CGF, Loc.isInvalid()) return getOrCreateDefaultLocation(Flags).getPointer(); + // If the OpenMPIRBuilder is used we need to use it for all location handling + // as the clang invariants used below might be broken. 
+ if (CGM.getLangOpts().OpenMPIRBuilder) { +SmallString<128> Buffer; +OMPBuilder.updateToLocation(CGF.Builder.saveIP()); +auto *SrcLocStr = OMPBuilder.getOrCreateSrcLocStr( +getIdentStringFromSourceLocation(CGF, Loc, Buffer)); +return OMPBuilder.getOrCreateIdent(SrcLocStr, IdentFlag(Flags)); + } + assert(CGF.CurFn && "No function in current CodeGenFunction."); CharUnits Align = CGM.getContext().getTypeAlignInChars(IdentQTy); @@ -1497,15 +1520,9 @@ llvm::Value *CGOpenMPRuntime::emitUpdateLocation(CodeGenFunction &CGF, llvm::Value *OMPDebugLoc = OpenMPDebugLocMap.lookup(Loc.getRawEncoding()); if (OMPDebugLoc == nullptr) { -SmallString<128> Buffer2; -llvm::raw_svector_ostream OS2(Buffer2); -// Build debug location -PresumedLoc PLoc = CGF.getContext().getSourceManager().getPresumedLoc(Loc); -OS2 << ";" << PLoc.getFilename() << ";"; -if (const auto *FD = dyn_cast_or_null(CGF.CurFuncDecl)) - OS2 << FD->getQualifiedNameAsString(); -OS2 << ";" << PLoc.getLine() << ";" << PLoc.getColumn() << ";;"; -OMPDebugLoc = CGF.Builder.CreateGlobalStringPtr(OS2.str()); +SmallString<128> Buffer; +OMPDebugLoc = CGF.Builder.CreateGlobalStringPtr( +getIdentStringFromSourceLocation(CGF, Loc, Buffer)); OpenMPDebugLocMap[Loc.getRawEncoding()] = OMPDebugLoc; } // *psource = ";;"; @@ -1519,6 +1536,16 @@ llvm::Value *CGOpenMPRuntime::emitUpdateLocation(CodeGenFunction &CGF, llvm::Value *CGOpenMPRuntime::getThreadID(CodeGenFunction &CGF, SourceLocation Loc) { assert(CGF.CurFn && "No function in current CodeGenFunction."); + // If the OpenMPIRBuilder is used we need to use it for all thread id calls as + // the clang invariants used below might be broken. + if (CGM.getLangOpts().OpenMPIRBuilder) { +SmallString<128> Buffer; +OMPBuilder.updateToLocation(CGF.Builder.saveIP()); +auto *SrcLocStr = OMPBuilder.getOrCreateSrcLocStr( +getIdentStringFromSourceLocation(CGF, Loc, Buffer)); +return OMPBuilder.getOrCreateThreadID( +OMPBuilder.getOrCreateIdent(SrcLocStr)); + } llvm::Value *ThreadID = nullptr
[clang] ceed44a - [OpenMP][NFC] Remove unnecessary argument
Author: Johannes Doerfert Date: 2020-04-04T11:34:58-05:00 New Revision: ceed44adfd1ae9d714eaa4f0e7fa5a1a149b4dc5 URL: https://github.com/llvm/llvm-project/commit/ceed44adfd1ae9d714eaa4f0e7fa5a1a149b4dc5 DIFF: https://github.com/llvm/llvm-project/commit/ceed44adfd1ae9d714eaa4f0e7fa5a1a149b4dc5.diff LOG: [OpenMP][NFC] Remove unnecessary argument Added: Modified: clang/include/clang/Sema/Sema.h clang/lib/Sema/SemaExpr.cpp clang/lib/Sema/SemaOpenMP.cpp Removed: diff --git a/clang/include/clang/Sema/Sema.h b/clang/include/clang/Sema/Sema.h index 7c689c2a13e8..10e2d69f3d9e 100644 --- a/clang/include/clang/Sema/Sema.h +++ b/clang/include/clang/Sema/Sema.h @@ -9892,7 +9892,7 @@ class Sema final { /// specialization via the OpenMP declare variant mechanism available. If /// there is, return the specialized call expression, otherwise return the /// original \p Call. - ExprResult ActOnOpenMPCall(Sema &S, ExprResult Call, Scope *Scope, + ExprResult ActOnOpenMPCall(ExprResult Call, Scope *Scope, SourceLocation LParenLoc, MultiExprArg ArgExprs, SourceLocation RParenLoc, Expr *ExecConfig); diff --git a/clang/lib/Sema/SemaExpr.cpp b/clang/lib/Sema/SemaExpr.cpp index 8d0e97c85771..b311aad84816 100644 --- a/clang/lib/Sema/SemaExpr.cpp +++ b/clang/lib/Sema/SemaExpr.cpp @@ -5997,7 +5997,7 @@ ExprResult Sema::ActOnCallExpr(Scope *Scope, Expr *Fn, SourceLocation LParenLoc, } if (LangOpts.OpenMP) -Call = ActOnOpenMPCall(*this, Call, Scope, LParenLoc, ArgExprs, RParenLoc, +Call = ActOnOpenMPCall(Call, Scope, LParenLoc, ArgExprs, RParenLoc, ExecConfig); return Call; diff --git a/clang/lib/Sema/SemaOpenMP.cpp b/clang/lib/Sema/SemaOpenMP.cpp index f663b1d43659..cfaf981983c1 100644 --- a/clang/lib/Sema/SemaOpenMP.cpp +++ b/clang/lib/Sema/SemaOpenMP.cpp @@ -5584,7 +5584,7 @@ void Sema::ActOnFinishedFunctionDefinitionInOpenMPDeclareVariantScope( BaseFD->setImplicit(true); } -ExprResult Sema::ActOnOpenMPCall(Sema &S, ExprResult Call, Scope *Scope, +ExprResult Sema::ActOnOpenMPCall(ExprResult Call, Scope *Scope, SourceLocation LParenLoc, MultiExprArg ArgExprs, SourceLocation RParenLoc, Expr *ExecConfig) { @@ -5601,8 +5601,8 @@ ExprResult Sema::ActOnOpenMPCall(Sema &S, ExprResult Call, Scope *Scope, if (!CalleeFnDecl->hasAttr()) return Call; - ASTContext &Context = S.getASTContext(); - OMPContext OMPCtx(S.getLangOpts().OpenMPIsDevice, + ASTContext &Context = getASTContext(); + OMPContext OMPCtx(getLangOpts().OpenMPIsDevice, Context.getTargetInfo().getTriple()); SmallVector Exprs; @@ -5650,12 +5650,12 @@ ExprResult Sema::ActOnOpenMPCall(Sema &S, ExprResult Call, Scope *Scope, if (auto *SpecializedMethod = dyn_cast(BestDecl)) { auto *MemberCall = dyn_cast(CE); BestExpr = MemberExpr::CreateImplicit( -S.Context, MemberCall->getImplicitObjectArgument(), -/* IsArrow */ false, SpecializedMethod, S.Context.BoundMemberTy, +Context, MemberCall->getImplicitObjectArgument(), +/* IsArrow */ false, SpecializedMethod, Context.BoundMemberTy, MemberCall->getValueKind(), MemberCall->getObjectKind()); } - NewCall = S.BuildCallExpr(Scope, BestExpr, LParenLoc, ArgExprs, RParenLoc, -ExecConfig); + NewCall = BuildCallExpr(Scope, BestExpr, LParenLoc, ArgExprs, RParenLoc, + ExecConfig); if (NewCall.isUsable()) break; } @@ -5666,7 +5666,6 @@ ExprResult Sema::ActOnOpenMPCall(Sema &S, ExprResult Call, Scope *Scope, if (!NewCall.isUsable()) return Call; - return PseudoObjectExpr::Create(Context, CE, {NewCall.get()}, 0); } ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang-tools-extra] 8ea07f6 - [OpenMP] Add extra qualification to OpenMP clause id
Author: Johannes Doerfert Date: 2020-04-05T23:10:58-05:00 New Revision: 8ea07f62a6f06bdb7da981425227995423867a4d URL: https://github.com/llvm/llvm-project/commit/8ea07f62a6f06bdb7da981425227995423867a4d DIFF: https://github.com/llvm/llvm-project/commit/8ea07f62a6f06bdb7da981425227995423867a4d.diff LOG: [OpenMP] Add extra qualification to OpenMP clause id Forgot to adjust this use in 419a559c5a73f13578d891feb1299cada08d581e. Added: Modified: clang-tools-extra/clang-tidy/openmp/UseDefaultNoneCheck.cpp Removed: diff --git a/clang-tools-extra/clang-tidy/openmp/UseDefaultNoneCheck.cpp b/clang-tools-extra/clang-tidy/openmp/UseDefaultNoneCheck.cpp index efd70e778c6f..724e9b9b9cbc 100644 --- a/clang-tools-extra/clang-tidy/openmp/UseDefaultNoneCheck.cpp +++ b/clang-tools-extra/clang-tidy/openmp/UseDefaultNoneCheck.cpp @@ -24,7 +24,7 @@ namespace openmp { void UseDefaultNoneCheck::registerMatchers(MatchFinder *Finder) { Finder->addMatcher( ompExecutableDirective( - allOf(isAllowedToContainClauseKind(OMPC_default), + allOf(isAllowedToContainClauseKind(llvm::omp::OMPC_default), anyOf(unless(hasAnyClause(ompDefaultClause())), hasAnyClause(ompDefaultClause(unless(isNoneKind())) .bind("clause") ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] 931c0cd - [OpenMP][NFC] Move and simplify directive -> allowed clause mapping
Author: Johannes Doerfert Date: 2020-04-06T00:04:08-05:00 New Revision: 931c0cd713ee9b082389727bed1b518c6a44344f URL: https://github.com/llvm/llvm-project/commit/931c0cd713ee9b082389727bed1b518c6a44344f DIFF: https://github.com/llvm/llvm-project/commit/931c0cd713ee9b082389727bed1b518c6a44344f.diff LOG: [OpenMP][NFC] Move and simplify directive -> allowed clause mapping Move the listing of allowed clauses per OpenMP directive to the new macro file in `llvm/Frontend/OpenMP`. Also, use a single generic macro that specifies the directive and one allowed clause explicitly instead of a dedicated macro per directive. We save 800 loc and boilerplate for all new directives/clauses with no functional change. We also need to include the macro file only once and not once per directive. Depends on D77112. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D77113 Added: Modified: clang/include/clang/ASTMatchers/ASTMatchers.h clang/include/clang/Basic/OpenMPKinds.def clang/include/clang/Basic/OpenMPKinds.h clang/lib/ASTMatchers/Dynamic/CMakeLists.txt clang/lib/Basic/OpenMPKinds.cpp clang/lib/Tooling/CMakeLists.txt clang/lib/Tooling/Transformer/CMakeLists.txt clang/unittests/AST/CMakeLists.txt clang/unittests/ASTMatchers/CMakeLists.txt clang/unittests/ASTMatchers/Dynamic/CMakeLists.txt clang/unittests/Analysis/CMakeLists.txt clang/unittests/Rename/CMakeLists.txt clang/unittests/Sema/CMakeLists.txt clang/unittests/StaticAnalyzer/CMakeLists.txt clang/unittests/Tooling/CMakeLists.txt llvm/include/llvm/Frontend/OpenMP/OMPConstants.h llvm/include/llvm/Frontend/OpenMP/OMPKinds.def llvm/lib/Frontend/OpenMP/OMPConstants.cpp Removed: diff --git a/clang/include/clang/ASTMatchers/ASTMatchers.h b/clang/include/clang/ASTMatchers/ASTMatchers.h index 8d97c32a0d36..9d7b4dcaacfd 100644 --- a/clang/include/clang/ASTMatchers/ASTMatchers.h +++ b/clang/include/clang/ASTMatchers/ASTMatchers.h @@ -7119,7 +7119,7 @@ AST_MATCHER(OMPDefaultClause, isSharedKind) { /// ``isAllowedToContainClauseKind("OMPC_default").`` AST_MATCHER_P(OMPExecutableDirective, isAllowedToContainClauseKind, OpenMPClauseKind, CKind) { - return isAllowedClauseForDirective( + return llvm::omp::isAllowedClauseForDirective( Node.getDirectiveKind(), CKind, Finder->getASTContext().getLangOpts().OpenMP); } diff --git a/clang/include/clang/Basic/OpenMPKinds.def b/clang/include/clang/Basic/OpenMPKinds.def index 4a4e6c6cb4c3..0ae0bc844e36 100644 --- a/clang/include/clang/Basic/OpenMPKinds.def +++ b/clang/include/clang/Basic/OpenMPKinds.def @@ -11,102 +11,6 @@ /// //===--===// -#ifndef OPENMP_CLAUSE -# define OPENMP_CLAUSE(Name, Class) -#endif -#ifndef OPENMP_PARALLEL_CLAUSE -# define OPENMP_PARALLEL_CLAUSE(Name) -#endif -#ifndef OPENMP_SIMD_CLAUSE -# define OPENMP_SIMD_CLAUSE(Name) -#endif -#ifndef OPENMP_FOR_CLAUSE -# define OPENMP_FOR_CLAUSE(Name) -#endif -#ifndef OPENMP_FOR_SIMD_CLAUSE -# define OPENMP_FOR_SIMD_CLAUSE(Name) -#endif -#ifndef OPENMP_SECTIONS_CLAUSE -# define OPENMP_SECTIONS_CLAUSE(Name) -#endif -#ifndef OPENMP_SINGLE_CLAUSE -# define OPENMP_SINGLE_CLAUSE(Name) -#endif -#ifndef OPENMP_PARALLEL_FOR_CLAUSE -# define OPENMP_PARALLEL_FOR_CLAUSE(Name) -#endif -#ifndef OPENMP_PARALLEL_FOR_SIMD_CLAUSE -# define OPENMP_PARALLEL_FOR_SIMD_CLAUSE(Name) -#endif -#ifndef OPENMP_PARALLEL_MASTER_CLAUSE -# define OPENMP_PARALLEL_MASTER_CLAUSE(Name) -#endif -#ifndef OPENMP_PARALLEL_SECTIONS_CLAUSE -# define OPENMP_PARALLEL_SECTIONS_CLAUSE(Name) -#endif -#ifndef OPENMP_TASK_CLAUSE -# define OPENMP_TASK_CLAUSE(Name) -#endif -#ifndef 
OPENMP_ATOMIC_CLAUSE -# define OPENMP_ATOMIC_CLAUSE(Name) -#endif -#ifndef OPENMP_TARGET_CLAUSE -# define OPENMP_TARGET_CLAUSE(Name) -#endif -#ifndef OPENMP_REQUIRES_CLAUSE -# define OPENMP_REQUIRES_CLAUSE(Name) -#endif -#ifndef OPENMP_TARGET_DATA_CLAUSE -# define OPENMP_TARGET_DATA_CLAUSE(Name) -#endif -#ifndef OPENMP_TARGET_ENTER_DATA_CLAUSE -#define OPENMP_TARGET_ENTER_DATA_CLAUSE(Name) -#endif -#ifndef OPENMP_TARGET_EXIT_DATA_CLAUSE -#define OPENMP_TARGET_EXIT_DATA_CLAUSE(Name) -#endif -#ifndef OPENMP_TARGET_PARALLEL_CLAUSE -# define OPENMP_TARGET_PARALLEL_CLAUSE(Name) -#endif -#ifndef OPENMP_TARGET_PARALLEL_FOR_CLAUSE -# define OPENMP_TARGET_PARALLEL_FOR_CLAUSE(Name) -#endif -#ifndef OPENMP_TARGET_UPDATE_CLAUSE -# define OPENMP_TARGET_UPDATE_CLAUSE(Name) -#endif -#ifndef OPENMP_TEAMS_CLAUSE -# define OPENMP_TEAMS_CLAUSE(Name) -#endif -#ifndef OPENMP_CANCEL_CLAUSE -# define OPENMP_CANCEL_CLAUSE(Name) -#endif -#ifndef OPENMP_ORDERED_CLAUSE -# define OPENMP_ORDERED_CLAUSE(Name) -#endif -#ifndef OPENMP_TASKLOOP_CLAUSE -# define OPENMP_TASKLOOP_CLAUSE(Name) -#endif -#ifndef OPENMP_TASKLOOP_SIMD_CLAUSE -# define OPENMP_TA
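The query side after the move, as used in the ASTMatchers hunk above (the wrapper and variable names below are placeholders for illustration): clang no longer keeps per-directive clause tables; it asks the LLVM-side helper, which expands one generic macro entry per (directive, allowed clause) pair in llvm/Frontend/OpenMP/OMPKinds.def.

  #include "clang/Basic/OpenMPKinds.h"
  #include "llvm/Frontend/OpenMP/OMPConstants.h"

  // Placeholder wrapper showing the call shape.
  static bool isClauseAllowed(clang::OpenMPDirectiveKind DKind,
                              clang::OpenMPClauseKind CKind,
                              unsigned OpenMPVersion) {
    return llvm::omp::isAllowedClauseForDirective(DKind, CKind, OpenMPVersion);
  }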
[clang-tools-extra] 9e1af17 - [OpenMP][FIX] Add missing cmake dependence needed after 931c0cd713ee
Author: Johannes Doerfert Date: 2020-04-06T09:01:43-05:00 New Revision: 9e1af172eec9a06bffac337057a2452b88466288 URL: https://github.com/llvm/llvm-project/commit/9e1af172eec9a06bffac337057a2452b88466288 DIFF: https://github.com/llvm/llvm-project/commit/9e1af172eec9a06bffac337057a2452b88466288.diff LOG: [OpenMP][FIX] Add missing cmake dependence needed after 931c0cd713ee Added: Modified: clang-tools-extra/clang-reorder-fields/CMakeLists.txt Removed: diff --git a/clang-tools-extra/clang-reorder-fields/CMakeLists.txt b/clang-tools-extra/clang-reorder-fields/CMakeLists.txt index 9c75d785cc9a..c357d0a3cfbf 100644 --- a/clang-tools-extra/clang-reorder-fields/CMakeLists.txt +++ b/clang-tools-extra/clang-reorder-fields/CMakeLists.txt @@ -1,4 +1,7 @@ -set(LLVM_LINK_COMPONENTS support) +set(LLVM_LINK_COMPONENTS + FrontendOpenMP + support +) add_clang_library(clangReorderFields ReorderFieldsAction.cpp ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
Re: [clang-tools-extra] 9e1af17 - [OpenMP][FIX] Add missing cmake dependence needed after 931c0cd713ee
On 4/6/20 9:06 AM, Roman Lebedev wrote:
> This seems suspicious.

Agreed, especially since this is also not the only place. I was hoping to unblock the builders with this.

> Does clang-reorder-fields actually explicitly need something from FrontendOpenMP?
> If not, it looks like the dependency is missing elsewhere, or there's wrong layering.

The root cause is the use of `isAllowedClauseForDirective` in ASTMatchers.h.

On Mon, Apr 6, 2020 at 5:03 PM Johannes Doerfert via cfe-commits wrote:
> Author: Johannes Doerfert Date: 2020-04-06T09:01:43-05:00 New Revision: 9e1af172eec9a06bffac337057a2452b88466288 URL: https://github.com/llvm/llvm-project/commit/9e1af172eec9a06bffac337057a2452b88466288 DIFF: https://github.com/llvm/llvm-project/commit/9e1af172eec9a06bffac337057a2452b88466288.diff LOG: [OpenMP][FIX] Add missing cmake dependence needed after 931c0cd713ee
> [snip - the quoted patch is identical to the 9e1af17 commit above]
___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang-tools-extra] 97aa593 - [OpenMP] Fix layering problem with FrontendOpenMP
Author: Johannes Doerfert Date: 2020-04-06T13:04:26-05:00 New Revision: 97aa593a8387586095b7eac12974ba2fdd08f4c3 URL: https://github.com/llvm/llvm-project/commit/97aa593a8387586095b7eac12974ba2fdd08f4c3 DIFF: https://github.com/llvm/llvm-project/commit/97aa593a8387586095b7eac12974ba2fdd08f4c3.diff LOG: [OpenMP] Fix layering problem with FrontendOpenMP Summary: ASTMatchers is used in various places and it now exposes the LLVMFrontendOpenMP library to its users without them needing to depend on it explicitly. Reviewers: lebedev.ri Subscribers: mgorny, yaxunl, bollu, guansong, martong, cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D77574 Added: Modified: clang-tools-extra/clang-reorder-fields/CMakeLists.txt clang-tools-extra/clang-tidy/openmp/CMakeLists.txt clang/lib/ASTMatchers/CMakeLists.txt clang/lib/StaticAnalyzer/Checkers/CMakeLists.txt clang/lib/StaticAnalyzer/Core/CMakeLists.txt clang/lib/Tooling/CMakeLists.txt clang/lib/Tooling/Transformer/CMakeLists.txt clang/unittests/AST/CMakeLists.txt clang/unittests/ASTMatchers/CMakeLists.txt clang/unittests/ASTMatchers/Dynamic/CMakeLists.txt clang/unittests/Analysis/CMakeLists.txt clang/unittests/Rename/CMakeLists.txt clang/unittests/Sema/CMakeLists.txt clang/unittests/StaticAnalyzer/CMakeLists.txt clang/unittests/Tooling/CMakeLists.txt Removed: diff --git a/clang-tools-extra/clang-reorder-fields/CMakeLists.txt b/clang-tools-extra/clang-reorder-fields/CMakeLists.txt index c357d0a3cfbf..153271ce58e4 100644 --- a/clang-tools-extra/clang-reorder-fields/CMakeLists.txt +++ b/clang-tools-extra/clang-reorder-fields/CMakeLists.txt @@ -1,5 +1,4 @@ set(LLVM_LINK_COMPONENTS - FrontendOpenMP support ) diff --git a/clang-tools-extra/clang-tidy/openmp/CMakeLists.txt b/clang-tools-extra/clang-tidy/openmp/CMakeLists.txt index af95704fd445..ad1f591a6338 100644 --- a/clang-tools-extra/clang-tidy/openmp/CMakeLists.txt +++ b/clang-tools-extra/clang-tidy/openmp/CMakeLists.txt @@ -1,5 +1,4 @@ set(LLVM_LINK_COMPONENTS - FrontendOpenMP Support) add_clang_library(clangTidyOpenMPModule diff --git a/clang/lib/ASTMatchers/CMakeLists.txt b/clang/lib/ASTMatchers/CMakeLists.txt index cde871cd31ca..1b78d95e4fac 100644 --- a/clang/lib/ASTMatchers/CMakeLists.txt +++ b/clang/lib/ASTMatchers/CMakeLists.txt @@ -1,7 +1,6 @@ add_subdirectory(Dynamic) set(LLVM_LINK_COMPONENTS - FrontendOpenMP Support ) @@ -15,3 +14,5 @@ add_clang_library(clangASTMatchers clangBasic clangLex ) + +target_link_libraries(clangASTMatchers PUBLIC LLVMFrontendOpenMP) diff --git a/clang/lib/StaticAnalyzer/Checkers/CMakeLists.txt b/clang/lib/StaticAnalyzer/Checkers/CMakeLists.txt index bcf2dfdb8326..b7fb0d90c980 100644 --- a/clang/lib/StaticAnalyzer/Checkers/CMakeLists.txt +++ b/clang/lib/StaticAnalyzer/Checkers/CMakeLists.txt @@ -1,5 +1,4 @@ set(LLVM_LINK_COMPONENTS - FrontendOpenMP Support ) diff --git a/clang/lib/StaticAnalyzer/Core/CMakeLists.txt b/clang/lib/StaticAnalyzer/Core/CMakeLists.txt index 057cdd4bb18a..c7c9aa2ff1f4 100644 --- a/clang/lib/StaticAnalyzer/Core/CMakeLists.txt +++ b/clang/lib/StaticAnalyzer/Core/CMakeLists.txt @@ -1,5 +1,4 @@ set(LLVM_LINK_COMPONENTS - FrontendOpenMP Support ) diff --git a/clang/lib/Tooling/CMakeLists.txt b/clang/lib/Tooling/CMakeLists.txt index 71b6cc55e504..59c990daaa29 100644 --- a/clang/lib/Tooling/CMakeLists.txt +++ b/clang/lib/Tooling/CMakeLists.txt @@ -1,6 +1,5 @@ set(LLVM_LINK_COMPONENTS Option - FrontendOpenMP Support ) diff --git a/clang/lib/Tooling/Transformer/CMakeLists.txt b/clang/lib/Tooling/Transformer/CMakeLists.txt index 
281af1007a65..3d82bb98ff69 100644 --- a/clang/lib/Tooling/Transformer/CMakeLists.txt +++ b/clang/lib/Tooling/Transformer/CMakeLists.txt @@ -1,5 +1,4 @@ set(LLVM_LINK_COMPONENTS - FrontendOpenMP Support ) diff --git a/clang/unittests/AST/CMakeLists.txt b/clang/unittests/AST/CMakeLists.txt index 868635b6eea5..b738ce08d06d 100644 --- a/clang/unittests/AST/CMakeLists.txt +++ b/clang/unittests/AST/CMakeLists.txt @@ -1,5 +1,4 @@ set(LLVM_LINK_COMPONENTS - FrontendOpenMP Support ) diff --git a/clang/unittests/ASTMatchers/CMakeLists.txt b/clang/unittests/ASTMatchers/CMakeLists.txt index e128cfe695a6..aa5438c947f4 100644 --- a/clang/unittests/ASTMatchers/CMakeLists.txt +++ b/clang/unittests/ASTMatchers/CMakeLists.txt @@ -1,5 +1,4 @@ set(LLVM_LINK_COMPONENTS - FrontendOpenMP Support ) diff --git a/clang/unittests/ASTMatchers/Dynamic/CMakeLists.txt b/clang/unittests/ASTMatchers/Dynamic/CMakeLists.txt index 85556b01cae1..c40964dfaf0b 100644 --- a/clang/unittests/ASTMatchers/Dynamic/CMakeLists.txt +++ b/clang/unittests/ASTMatchers/Dynamic/CMakeLists.txt @@ -1,5 +1,4 @@ set(LLVM_LINK_COMPONENTS - FrontendOpenMP Support ) diff --git a/clang
[clang-tools-extra] f9d558c - [OpenMP] "UnFix" layering problem with FrontendOpenMP
Author: Johannes Doerfert Date: 2020-04-07T14:41:18-05:00 New Revision: f9d558c871337699d2815dbf116bae94025f5d90 URL: https://github.com/llvm/llvm-project/commit/f9d558c871337699d2815dbf116bae94025f5d90 DIFF: https://github.com/llvm/llvm-project/commit/f9d558c871337699d2815dbf116bae94025f5d90.diff LOG: [OpenMP] "UnFix" layering problem with FrontendOpenMP This reverts commit 97aa593a8387586095b7eac12974ba2fdd08f4c3 as it causes problems (PR45453) https://reviews.llvm.org/D77574#1966321. This additionally adds an explicit reference to FrontendOpenMP to clang-tidy where ASTMatchers is used. This is hopefully just a temporary solution. The dependence on `FrontendOpenMP` from `ASTMatchers` should be handled by CMake implicitly, not us explicitly. Reviewed By: aheejin Differential Revision: https://reviews.llvm.org/D77666 Added: Modified: clang-tools-extra/clang-change-namespace/CMakeLists.txt clang-tools-extra/clang-change-namespace/tool/CMakeLists.txt clang-tools-extra/clang-doc/CMakeLists.txt clang-tools-extra/clang-include-fixer/find-all-symbols/CMakeLists.txt clang-tools-extra/clang-move/CMakeLists.txt clang-tools-extra/clang-query/CMakeLists.txt clang-tools-extra/clang-reorder-fields/CMakeLists.txt clang-tools-extra/clang-tidy/CMakeLists.txt clang-tools-extra/clang-tidy/abseil/CMakeLists.txt clang-tools-extra/clang-tidy/android/CMakeLists.txt clang-tools-extra/clang-tidy/boost/CMakeLists.txt clang-tools-extra/clang-tidy/bugprone/CMakeLists.txt clang-tools-extra/clang-tidy/cert/CMakeLists.txt clang-tools-extra/clang-tidy/cppcoreguidelines/CMakeLists.txt clang-tools-extra/clang-tidy/darwin/CMakeLists.txt clang-tools-extra/clang-tidy/fuchsia/CMakeLists.txt clang-tools-extra/clang-tidy/google/CMakeLists.txt clang-tools-extra/clang-tidy/hicpp/CMakeLists.txt clang-tools-extra/clang-tidy/linuxkernel/CMakeLists.txt clang-tools-extra/clang-tidy/llvm/CMakeLists.txt clang-tools-extra/clang-tidy/llvmlibc/CMakeLists.txt clang-tools-extra/clang-tidy/misc/CMakeLists.txt clang-tools-extra/clang-tidy/modernize/CMakeLists.txt clang-tools-extra/clang-tidy/mpi/CMakeLists.txt clang-tools-extra/clang-tidy/objc/CMakeLists.txt clang-tools-extra/clang-tidy/openmp/CMakeLists.txt clang-tools-extra/clang-tidy/performance/CMakeLists.txt clang-tools-extra/clang-tidy/portability/CMakeLists.txt clang-tools-extra/clang-tidy/readability/CMakeLists.txt clang-tools-extra/clang-tidy/utils/CMakeLists.txt clang-tools-extra/clang-tidy/zircon/CMakeLists.txt clang-tools-extra/clangd/CMakeLists.txt clang-tools-extra/clangd/unittests/CMakeLists.txt clang-tools-extra/tool-template/CMakeLists.txt clang-tools-extra/unittests/clang-change-namespace/CMakeLists.txt clang-tools-extra/unittests/clang-doc/CMakeLists.txt clang-tools-extra/unittests/clang-include-fixer/find-all-symbols/CMakeLists.txt clang-tools-extra/unittests/clang-move/CMakeLists.txt clang-tools-extra/unittests/clang-query/CMakeLists.txt clang-tools-extra/unittests/clang-tidy/CMakeLists.txt clang/lib/ASTMatchers/CMakeLists.txt clang/lib/StaticAnalyzer/Checkers/CMakeLists.txt clang/lib/StaticAnalyzer/Core/CMakeLists.txt clang/lib/Tooling/CMakeLists.txt clang/lib/Tooling/Transformer/CMakeLists.txt clang/unittests/AST/CMakeLists.txt clang/unittests/ASTMatchers/CMakeLists.txt clang/unittests/ASTMatchers/Dynamic/CMakeLists.txt clang/unittests/Analysis/CMakeLists.txt clang/unittests/Rename/CMakeLists.txt clang/unittests/Sema/CMakeLists.txt clang/unittests/StaticAnalyzer/CMakeLists.txt clang/unittests/Tooling/CMakeLists.txt Removed: diff --git 
a/clang-tools-extra/clang-change-namespace/CMakeLists.txt b/clang-tools-extra/clang-change-namespace/CMakeLists.txt index 178306423eb7..7c0363cd00d0 100644 --- a/clang-tools-extra/clang-change-namespace/CMakeLists.txt +++ b/clang-tools-extra/clang-change-namespace/CMakeLists.txt @@ -1,5 +1,6 @@ set(LLVM_LINK_COMPONENTS - support + FrontendOpenMP + Support ) add_clang_library(clangChangeNamespace diff --git a/clang-tools-extra/clang-change-namespace/tool/CMakeLists.txt b/clang-tools-extra/clang-change-namespace/tool/CMakeLists.txt index ae48a5e0f798..c168bb4d5794 100644 --- a/clang-tools-extra/clang-change-namespace/tool/CMakeLists.txt +++ b/clang-tools-extra/clang-change-namespace/tool/CMakeLists.txt @@ -1,6 +1,7 @@ include_directories(${CMAKE_CURRENT_SOURCE_DIR}/..) set(LLVM_LINK_COMPONENTS + FrontendOpenMP Support ) diff --git a/clang-tools-extra/clang-doc/CMakeLists.txt b/clang-tools-extra/clang-doc/CMakeLists.txt index c301ad5aface..8df7d3ef9098 100644 --- a/clang-tools-extra/clang-doc/CMakeLists.txt +++ b/clang-tools-extra/clang-doc/CMakeLists.txt @@ -1,6 +1,7 @@ set(LLVM_LINK_COMPONENTS support Bit
[clang-tools-extra] 5303770 - [OpenMP] "UnFix" last layering problem with FrontendOpenMP
Author: Johannes Doerfert Date: 2020-04-07T22:47:41-05:00 New Revision: 530377018f624eadb8c07650511bbb9ca63608de URL: https://github.com/llvm/llvm-project/commit/530377018f624eadb8c07650511bbb9ca63608de DIFF: https://github.com/llvm/llvm-project/commit/530377018f624eadb8c07650511bbb9ca63608de.diff LOG: [OpenMP] "UnFix" last layering problem with FrontendOpenMP It seems one target was missed in D77666 which kept some bots red [0]. [0] http://lab.llvm.org:8011/builders/clang-ppc64le-linux-multistage/builds/12079/steps/build%20stage%201/logs/stdio Added: Modified: clang-tools-extra/clang-tidy/tool/CMakeLists.txt Removed: diff --git a/clang-tools-extra/clang-tidy/tool/CMakeLists.txt b/clang-tools-extra/clang-tidy/tool/CMakeLists.txt index 0cd15ddb4653..ff9104b661d0 100644 --- a/clang-tools-extra/clang-tidy/tool/CMakeLists.txt +++ b/clang-tools-extra/clang-tidy/tool/CMakeLists.txt @@ -2,6 +2,7 @@ set(LLVM_LINK_COMPONENTS AllTargetsAsmParsers AllTargetsDescs AllTargetsInfos + FrontendOpenMP support ) ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] a19eb1d - [OpenMP] Add match_{all,any,none} declare variant selector extensions.
Author: Johannes Doerfert Date: 2020-04-07T23:33:24-05:00 New Revision: a19eb1de726c1ccbf60dca6a1fbcd49b3157282f URL: https://github.com/llvm/llvm-project/commit/a19eb1de726c1ccbf60dca6a1fbcd49b3157282f DIFF: https://github.com/llvm/llvm-project/commit/a19eb1de726c1ccbf60dca6a1fbcd49b3157282f.diff LOG: [OpenMP] Add match_{all,any,none} declare variant selector extensions. By default, all traits in the OpenMP context selector have to match for it to be acceptable. Though, we sometimes want a single property out of multiple to match (=any) or no match at all (=none). We offer these choices as extensions via `implementation={extension(match_{all,any,none})}` to the user. The choice will affect the entire context selector not only the traits following the match property. The first user will be D75788. There we can replace ``` #pragma omp begin declare variant match(device={arch(nvptx64)}) #define __CUDA__ #include <__clang_cuda_cmath.h> // TODO: Hack until we support an extension to the match clause that allows "or". #undef __CLANG_CUDA_CMATH_H__ #undef __CUDA__ #pragma omp end declare variant #pragma omp begin declare variant match(device={arch(nvptx)}) #define __CUDA__ #include <__clang_cuda_cmath.h> #undef __CUDA__ #pragma omp end declare variant ``` with the much simpler ``` #pragma omp begin declare variant match(device={arch(nvptx, nvptx64)}, implementation={extension(match_any)}) #define __CUDA__ #include <__clang_cuda_cmath.h> #undef __CUDA__ #pragma omp end declare variant ``` Reviewed By: mikerice Differential Revision: https://reviews.llvm.org/D77414 Added: clang/test/AST/ast-dump-openmp-declare-variant-extensions-messages.c clang/test/AST/ast-dump-openmp-declare-variant-extensions.c Modified: clang/include/clang/AST/OpenMPClause.h clang/include/clang/Basic/AttrDocs.td clang/include/clang/Basic/DiagnosticParseKinds.td clang/lib/AST/OpenMPClause.cpp clang/lib/Parse/ParseOpenMP.cpp clang/lib/Sema/SemaOpenMP.cpp clang/test/OpenMP/declare_variant_ast_print.c clang/test/OpenMP/declare_variant_messages.c llvm/include/llvm/Frontend/OpenMP/OMPContext.h llvm/include/llvm/Frontend/OpenMP/OMPKinds.def llvm/lib/Frontend/OpenMP/OMPContext.cpp Removed: diff --git a/clang/include/clang/AST/OpenMPClause.h b/clang/include/clang/AST/OpenMPClause.h index 9f5ff5a85182..f276611e3d0c 100644 --- a/clang/include/clang/AST/OpenMPClause.h +++ b/clang/include/clang/AST/OpenMPClause.h @@ -7229,11 +7229,9 @@ class OMPTraitInfo { /// former is a flat representation the actual main diff erence is that the /// latter uses clang::Expr to store the score/condition while the former is /// independent of clang. Thus, expressions and conditions are evaluated in - /// this method. If \p DeviceSetOnly is true, only the device selector set, if - /// present, is put in \p VMI, otherwise all selector sets are put in \p VMI. + /// this method. void getAsVariantMatchInfo(ASTContext &ASTCtx, - llvm::omp::VariantMatchInfo &VMI, - bool DeviceSetOnly) const; + llvm::omp::VariantMatchInfo &VMI) const; /// Return a string representation identifying this context selector. std::string getMangledName() const; diff --git a/clang/include/clang/Basic/AttrDocs.td b/clang/include/clang/Basic/AttrDocs.td index 36561c04d395..e3308cd74874 100644 --- a/clang/include/clang/Basic/AttrDocs.td +++ b/clang/include/clang/Basic/AttrDocs.td @@ -3394,6 +3394,20 @@ where clause is one of the following: and where `variant-func-id` is the name of a function variant that is either a base language identifier or, for C++, a template-id. 
+Clang provides the following context selector extensions, used via `implementation={extension(EXTENSION)}`: + + .. code-block:: none + +match_all +match_any +match_none + +The match extensions change when the *entire* context selector is considered a +match for an OpenMP context. The default is `all`, with `none` no trait in the +selector is allowed to be in the OpenMP context, with `any` a single trait in +both the selector and OpenMP context is sufficient. Only a single match +extension trait is allowed per context selector. + }]; } diff --git a/clang/include/clang/Basic/DiagnosticParseKinds.td b/clang/include/clang/Basic/DiagnosticParseKinds.td index f3741531c8b5..a1311d7776c6 100644 --- a/clang/include/clang/Basic/DiagnosticParseKinds.td +++ b/clang/include/clang/Basic/DiagnosticParseKinds.td @@ -1313,6 +1313,8 @@ def warn_omp_ctx_incompatible_score_for_property def warn_omp_more_one_device_type_clause : Warning<"more than one 'device_type' clause is specified">, InGroup; +def err_omp_variant_ctx_second_match_extension : Error< + "only a single match extension allowed per OpenMP context s
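For illustration, a minimal use of the new extension (hypothetical file, not part of the commit; the architecture list and function names are made up):

```c
// Base version used everywhere else.
int work(void) { return 0; }

// With match_any, matching either nvptx or nvptx64 in the device set is
// enough for the whole selector to apply.
#pragma omp begin declare variant match(device = {arch(nvptx, nvptx64)},      \
                                        implementation = {extension(match_any)})
int work(void) { return 1; } // GPU-specific version
#pragma omp end declare variant

int test(void) {
  int r;
#pragma omp target map(from : r)
  r = work(); // resolves to the variant when offloading to nvptx/nvptx64
  return r;
}
```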
[clang] eb5a16e - [OpenMP] Specialize OpenMP calls after template instantiation
Author: Johannes Doerfert Date: 2020-04-07T23:33:24-05:00 New Revision: eb5a16efbf59150af31bd4e3d37b8ea5976d780b URL: https://github.com/llvm/llvm-project/commit/eb5a16efbf59150af31bd4e3d37b8ea5976d780b DIFF: https://github.com/llvm/llvm-project/commit/eb5a16efbf59150af31bd4e3d37b8ea5976d780b.diff LOG: [OpenMP] Specialize OpenMP calls after template instantiation As with regular calls, we want to specialize a call that went through template instantiation if it has an applicable OpenMP declare variant. Reviewed By: erichkeane, mikerice Differential Revision: https://reviews.llvm.org/D77290 Added: clang/test/AST/ast-dump-openmp-begin-declare-variant_template_1.cpp Modified: clang/lib/Sema/TreeTransform.h Removed: diff --git a/clang/lib/Sema/TreeTransform.h b/clang/lib/Sema/TreeTransform.h index e9f4b11ca7bb..a3103205f2bd 100644 --- a/clang/lib/Sema/TreeTransform.h +++ b/clang/lib/Sema/TreeTransform.h @@ -2411,8 +2411,8 @@ class TreeTransform { MultiExprArg Args, SourceLocation RParenLoc, Expr *ExecConfig = nullptr) { -return getSema().BuildCallExpr(/*Scope=*/nullptr, Callee, LParenLoc, Args, - RParenLoc, ExecConfig); +return getSema().ActOnCallExpr( +/*Scope=*/nullptr, Callee, LParenLoc, Args, RParenLoc, ExecConfig); } /// Build a new member access expression. diff --git a/clang/test/AST/ast-dump-openmp-begin-declare-variant_template_1.cpp b/clang/test/AST/ast-dump-openmp-begin-declare-variant_template_1.cpp new file mode 100644 index ..6a663d5d75d9 --- /dev/null +++ b/clang/test/AST/ast-dump-openmp-begin-declare-variant_template_1.cpp @@ -0,0 +1,170 @@ +// RUN: %clang_cc1 -triple x86_64-unknown-unknown -fopenmp -verify -ast-dump %s | FileCheck %s +// RUN: %clang_cc1 -triple x86_64-unknown-unknown -fopenmp -verify -ast-dump %s -x c++| FileCheck %s +// expected-no-diagnostics + +int also_before() { + return 1; +} + +#pragma omp begin declare variant match(implementation={vendor(score(100):llvm)}) +int also_after(void) { + return 2; +} +int also_after(int) { + return 3; +} +int also_after(double) { + return 0; +} +#pragma omp end declare variant +#pragma omp begin declare variant match(implementation={vendor(score(0):llvm)}) +int also_before() { + return 0; +} +#pragma omp end declare variant + +int also_after(void) { + return 4; +} +int also_after(int) { + return 5; +} +int also_after(double) { + return 6; +} + +template +int test1() { + // Should return 0. + return also_after(T(0)); +} + +typedef int(*Ty)(); + +template +int test2() { + // Should return 0. + return fn(); +} + +int test() { + // Should return 0. 
+ return test1() + test2(); +} + +// CHECK: |-FunctionDecl [[ADDR_0:0x[a-z0-9]*]] <{{.*}}, line:7:1> line:5:5 used also_before 'int ({{.*}})' +// CHECK-NEXT: | |-CompoundStmt [[ADDR_1:0x[a-z0-9]*]] +// CHECK-NEXT: | | `-ReturnStmt [[ADDR_2:0x[a-z0-9]*]] +// CHECK-NEXT: | | `-IntegerLiteral [[ADDR_3:0x[a-z0-9]*]] 'int' 1 +// CHECK-NEXT: | `-OMPDeclareVariantAttr [[ADDR_4:0x[a-z0-9]*]] <> Implicit implementation={vendor(score(0): llvm)} +// CHECK-NEXT: | `-DeclRefExpr [[ADDR_5:0x[a-z0-9]*]] 'int ({{.*}})' Function [[ADDR_6:0x[a-z0-9]*]] 'also_before[implementation={vendor(llvm)}]' 'int ({{.*}})' +// CHECK-NEXT: |-FunctionDecl [[ADDR_7:0x[a-z0-9]*]] col:5 implicit also_after 'int ({{.*}})' +// CHECK-NEXT: | `-OMPDeclareVariantAttr [[ADDR_8:0x[a-z0-9]*]] <> Implicit implementation={vendor(score(100): llvm)} +// CHECK-NEXT: | `-DeclRefExpr [[ADDR_9:0x[a-z0-9]*]] 'int ({{.*}})' Function [[ADDR_10:0x[a-z0-9]*]] 'also_after[implementation={vendor(llvm)}]' 'int ({{.*}})' +// CHECK-NEXT: |-FunctionDecl [[ADDR_10]] line:10:1 also_after[implementation={vendor(llvm)}] 'int ({{.*}})' +// CHECK-NEXT: | `-CompoundStmt [[ADDR_11:0x[a-z0-9]*]] +// CHECK-NEXT: | `-ReturnStmt [[ADDR_12:0x[a-z0-9]*]] +// CHECK-NEXT: | `-IntegerLiteral [[ADDR_13:0x[a-z0-9]*]] 'int' 2 +// CHECK-NEXT: |-FunctionDecl [[ADDR_14:0x[a-z0-9]*]] col:5 implicit also_after 'int (int)' +// CHECK-NEXT: | |-ParmVarDecl [[ADDR_15:0x[a-z0-9]*]] col:19 'int' +// CHECK-NEXT: | `-OMPDeclareVariantAttr [[ADDR_16:0x[a-z0-9]*]] <> Implicit implementation={vendor(score(100): llvm)} +// CHECK-NEXT: | `-DeclRefExpr [[ADDR_17:0x[a-z0-9]*]] 'int (int)' Function [[ADDR_18:0x[a-z0-9]*]] 'also_after[implementation={vendor(llvm)}]' 'int (int)' +// CHECK-NEXT: |-FunctionDecl [[ADDR_18]] line:13:1 also_after[implementation={vendor(llvm)}] 'int (int)' +// CHECK-NEXT: | |-ParmVarDecl [[ADDR_15]] col:19 'int' +// CHECK-NEXT: | `-CompoundStmt [[ADDR_19:0x[a-z0-9]*]] +// CHECK-NEXT: | `-ReturnStmt [[ADDR_20:0x[a-z0-9]*]] +// CHECK-NEXT: | `-IntegerLiteral [[ADDR_21:0x[a-z0-9]*]] 'int' 3 +// CHECK-NEXT: |-Function
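A condensed sketch of what the new test exercises (hypothetical names; the vendor(llvm) selector is used only so plain clang selects the variant):

```cpp
int base(int) { return 1; }

#pragma omp begin declare variant match(implementation = {vendor(llvm)})
int base(int) { return 0; }
#pragma omp end declare variant

template <typename T> int call_base() {
  // The callee is only resolved when the template is instantiated; with this
  // patch the rebuilt call is specialized to the llvm-vendor variant as well.
  return base(T(0));
}

int test() { return call_base<int>(); } // expected to evaluate the variant (0)
```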
[clang] f85ae05 - [OpenMP] Provide math functions in OpenMP device code via OpenMP variants
Author: Johannes Doerfert Date: 2020-04-07T23:33:24-05:00 New Revision: f85ae058f580e9d74c4a8f2f0de168c18da6150f URL: https://github.com/llvm/llvm-project/commit/f85ae058f580e9d74c4a8f2f0de168c18da6150f DIFF: https://github.com/llvm/llvm-project/commit/f85ae058f580e9d74c4a8f2f0de168c18da6150f.diff LOG: [OpenMP] Provide math functions in OpenMP device code via OpenMP variants For OpenMP target regions to piggy back on the CUDA/AMDGPU/... implementation of math functions, we include the appropriate definitions inside of an `omp begin/end declare variant match(device={arch(nvptx)})` scope. This way, the vendor specific math functions will become specialized versions of the system math functions. When a system math function is called and specialized version is available the selection logic introduced in D75779 instead call the specialized version. In contrast to the code path we used so far, the system header is actually included. This means functions without specialized versions are available and so are macro definitions. This should address PR42061, PR42798, and PR42799. Reviewed By: ye-luo Differential Revision: https://reviews.llvm.org/D75788 Added: clang/lib/Headers/openmp_wrappers/__clang_openmp_device_functions.h clang/lib/Headers/openmp_wrappers/time.h clang/test/Headers/Inputs/include/climits clang/test/Headers/nvptx_device_math_complex.c clang/test/Headers/nvptx_device_math_macro.cpp clang/test/Headers/nvptx_device_math_modf.cpp clang/test/Headers/nvptx_device_math_sin.c clang/test/Headers/nvptx_device_math_sin.cpp clang/test/Headers/nvptx_device_math_sin_cos.cpp clang/test/Headers/nvptx_device_math_sincos.cpp Modified: clang/lib/Driver/ToolChains/Clang.cpp clang/lib/Headers/CMakeLists.txt clang/lib/Headers/__clang_cuda_cmath.h clang/lib/Headers/__clang_cuda_device_functions.h clang/lib/Headers/__clang_cuda_math.h clang/lib/Headers/__clang_cuda_math_forward_declares.h clang/lib/Headers/openmp_wrappers/cmath clang/lib/Headers/openmp_wrappers/math.h clang/test/Headers/Inputs/include/cmath clang/test/Headers/Inputs/include/cstdlib clang/test/Headers/Inputs/include/math.h clang/test/Headers/Inputs/include/stdlib.h clang/test/Headers/nvptx_device_cmath_functions.c clang/test/Headers/nvptx_device_cmath_functions.cpp clang/test/Headers/nvptx_device_cmath_functions_cxx17.cpp clang/test/Headers/nvptx_device_math_functions.c clang/test/Headers/nvptx_device_math_functions.cpp clang/test/Headers/nvptx_device_math_functions_cxx17.cpp Removed: clang/lib/Headers/openmp_wrappers/__clang_openmp_math.h clang/lib/Headers/openmp_wrappers/__clang_openmp_math_declares.h diff --git a/clang/lib/Driver/ToolChains/Clang.cpp b/clang/lib/Driver/ToolChains/Clang.cpp index 4d825301be41..2b368131f5cc 100644 --- a/clang/lib/Driver/ToolChains/Clang.cpp +++ b/clang/lib/Driver/ToolChains/Clang.cpp @@ -1216,7 +1216,7 @@ void Clang::AddPreprocessingOptions(Compilation &C, const JobAction &JA, } CmdArgs.push_back("-include"); -CmdArgs.push_back("__clang_openmp_math_declares.h"); +CmdArgs.push_back("__clang_openmp_device_functions.h"); } // Add -i* options, and automatically translate to diff --git a/clang/lib/Headers/CMakeLists.txt b/clang/lib/Headers/CMakeLists.txt index 6851957600e0..d6c8ed5e1fc6 100644 --- a/clang/lib/Headers/CMakeLists.txt +++ b/clang/lib/Headers/CMakeLists.txt @@ -145,8 +145,7 @@ set(ppc_wrapper_files set(openmp_wrapper_files openmp_wrappers/math.h openmp_wrappers/cmath - openmp_wrappers/__clang_openmp_math.h - openmp_wrappers/__clang_openmp_math_declares.h + 
openmp_wrappers/__clang_openmp_device_functions.h openmp_wrappers/new ) diff --git a/clang/lib/Headers/__clang_cuda_cmath.h b/clang/lib/Headers/__clang_cuda_cmath.h index 834a2e3fd134..f406112164e5 100644 --- a/clang/lib/Headers/__clang_cuda_cmath.h +++ b/clang/lib/Headers/__clang_cuda_cmath.h @@ -12,7 +12,9 @@ #error "This file is for CUDA compilation only." #endif +#ifndef _OPENMP #include +#endif // CUDA lets us use various std math functions on the device side. This file // works in concert with __clang_cuda_math_forward_declares.h to make this work. @@ -31,31 +33,15 @@ // std covers all of the known knowns. #ifdef _OPENMP -#define __DEVICE__ static __attribute__((always_inline)) +#define __DEVICE__ static constexpr __attribute__((always_inline, nothrow)) #else #define __DEVICE__ static __device__ __inline__ __attribute__((always_inline)) #endif -// For C++ 17 we need to include noexcept attribute to be compatible -// with the header-defined version. This may be removed once -// variant is supported. -#if defined(_OPENMP) && defined(__cplusplus) && __cplusplus >= 201703L -#define __NOEXCEPT noexcept -#else -#define __NOEXCEPT -#endif - -#if !(defined(_OPENMP) && de
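From a user's perspective the effect is roughly the following (hedged sketch, assuming nvptx64 offloading with `-fopenmp -fopenmp-targets=nvptx64-nvidia-cuda`):

```c
#include <math.h>

double device_sin(double x) {
  double r;
#pragma omp target map(from : r)
  r = sin(x); // with the new wrappers this resolves to the CUDA/libdevice
              // implementation on the device instead of failing to link
  return r;
}
```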
[clang] 17d8334 - [OpenMP] Allow <math.h> to go first in C++-mode in target regions
Author: Johannes Doerfert Date: 2020-04-09T22:10:31-05:00 New Revision: 17d83342235f01d4b110dc5d4664fe96f6597f11 URL: https://github.com/llvm/llvm-project/commit/17d83342235f01d4b110dc5d4664fe96f6597f11 DIFF: https://github.com/llvm/llvm-project/commit/17d83342235f01d4b110dc5d4664fe96f6597f11.diff LOG: [OpenMP] Allow to go first in C++-mode in target regions If we are in C++ mode and include (not ) first, we still need to make sure is read first. The problem otherwise is that we haven't seen the declarations of the math.h functions when the system math.h includes our cmath overlay. However, our cmath overlay, or better the underlying overlay, e.g. CUDA, uses the math.h functions. Since we haven't declared them yet we get errors. CUDA avoids this by eagerly declaring all math functions (in the __device__ space) but we cannot do this. Instead we break the dependence by forcing cmath to go first. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D4 Added: Modified: clang/lib/Headers/openmp_wrappers/math.h clang/test/Headers/Inputs/include/math.h clang/test/Headers/nvptx_device_math_sincos.cpp Removed: diff --git a/clang/lib/Headers/openmp_wrappers/math.h b/clang/lib/Headers/openmp_wrappers/math.h index 1ce22e065c27..e917a149b5c9 100644 --- a/clang/lib/Headers/openmp_wrappers/math.h +++ b/clang/lib/Headers/openmp_wrappers/math.h @@ -7,6 +7,19 @@ *===---=== */ +// If we are in C++ mode and include (not ) first, we still need +// to make sure is read first. The problem otherwise is that we haven't +// seen the declarations of the math.h functions when the system math.h includes +// our cmath overlay. However, our cmath overlay, or better the underlying +// overlay, e.g. CUDA, uses the math.h functions. Since we haven't declared them +// yet we get errors. CUDA avoids this by eagerly declaring all math functions +// (in the __device__ space) but we cannot do this. Instead we break the +// dependence by forcing cmath to go first. While our cmath will in turn include +// this file, the cmath guards will prevent recursion. +#ifdef __cplusplus +#include +#endif + #ifndef __CLANG_OPENMP_MATH_H__ #define __CLANG_OPENMP_MATH_H__ diff --git a/clang/test/Headers/Inputs/include/math.h b/clang/test/Headers/Inputs/include/math.h index a60ad45b4d71..b13b14f2b124 100644 --- a/clang/test/Headers/Inputs/include/math.h +++ b/clang/test/Headers/Inputs/include/math.h @@ -197,3 +197,7 @@ float ynf(int __a, float __b); * math functions. 
*/ #define HUGE_VAL (__builtin_huge_val()) + +#ifdef __cplusplus +#include +#endif diff --git a/clang/test/Headers/nvptx_device_math_sincos.cpp b/clang/test/Headers/nvptx_device_math_sincos.cpp index 5419ee2c3513..cf9b67903bf6 100644 --- a/clang/test/Headers/nvptx_device_math_sincos.cpp +++ b/clang/test/Headers/nvptx_device_math_sincos.cpp @@ -1,8 +1,13 @@ // REQUIRES: nvptx-registered-target // RUN: %clang_cc1 -internal-isystem %S/Inputs/include -fopenmp -triple powerpc64le-unknown-unknown -fopenmp-targets=nvptx64-nvidia-cuda -emit-llvm-bc %s -o %t-ppc-host.bc -// RUN: %clang_cc1 -internal-isystem %S/../../lib/Headers/openmp_wrappers -include __clang_openmp_device_functions.h -internal-isystem %S/Inputs/include -fopenmp -triple nvptx64-nvidia-cuda -aux-triple powerpc64le-unknown-unknown -fopenmp-targets=nvptx64-nvidia-cuda -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o - | FileCheck %s +// RUN: %clang_cc1 -internal-isystem %S/../../lib/Headers/openmp_wrappers -include __clang_openmp_device_functions.h -internal-isystem %S/Inputs/include -fopenmp -triple nvptx64-nvidia-cuda -aux-triple powerpc64le-unknown-unknown -fopenmp-targets=nvptx64-nvidia-cuda -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o - | FileCheck %s +// RUN: %clang_cc1 -internal-isystem %S/../../lib/Headers/openmp_wrappers -DCMATH -include __clang_openmp_device_functions.h -internal-isystem %S/Inputs/include -fopenmp -triple nvptx64-nvidia-cuda -aux-triple powerpc64le-unknown-unknown -fopenmp-targets=nvptx64-nvidia-cuda -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o - | FileCheck %s +#ifdef CMATH #include +#else +#include +#endif // 4 calls to sincos(f), all translated to __nv_sincos calls: ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
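A hedged reproducer shape for the ordering problem this addresses (hypothetical translation unit, C++ offloading mode):

```cpp
#include <math.h> // included before <cmath>; the wrapper now pulls <cmath>
                  // in first so the overlay sees the libm declarations
#include <cmath>

double f(double x) {
  double r;
#pragma omp target map(from : r)
  r = ::sin(x) + std::cos(x);
  return r;
}
```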
[clang] d999cbc - [OpenMP] Initial support for std::complex in target regions
Author: Johannes Doerfert Date: 2020-07-08T17:33:59-05:00 New Revision: d999cbc98832154e15e786b98281211d5c1b9f5d URL: https://github.com/llvm/llvm-project/commit/d999cbc98832154e15e786b98281211d5c1b9f5d DIFF: https://github.com/llvm/llvm-project/commit/d999cbc98832154e15e786b98281211d5c1b9f5d.diff LOG: [OpenMP] Initial support for std::complex in target regions This simply follows the scheme we have for other wrappers. It resolves the current link problem, e.g., `__muldc3 not found`, when std::complex operations are used on a device. This will not allow complex make math function calls to work properly, e.g., sin, but that is more complex (pan intended) anyway. Reviewed By: tra, JonChesterfield Differential Revision: https://reviews.llvm.org/D80897 Added: clang/lib/Headers/openmp_wrappers/complex clang/lib/Headers/openmp_wrappers/complex.h clang/test/Headers/Inputs/include/complex clang/test/Headers/nvptx_device_math_complex.cpp Modified: clang/lib/Headers/CMakeLists.txt clang/lib/Headers/__clang_cuda_complex_builtins.h clang/lib/Headers/__clang_cuda_math.h clang/test/Headers/Inputs/include/cmath clang/test/Headers/Inputs/include/cstdlib clang/test/Headers/nvptx_device_math_complex.c Removed: diff --git a/clang/lib/Headers/CMakeLists.txt b/clang/lib/Headers/CMakeLists.txt index e7bee192d918..0692fe75a441 100644 --- a/clang/lib/Headers/CMakeLists.txt +++ b/clang/lib/Headers/CMakeLists.txt @@ -151,6 +151,8 @@ set(ppc_wrapper_files set(openmp_wrapper_files openmp_wrappers/math.h openmp_wrappers/cmath + openmp_wrappers/complex.h + openmp_wrappers/complex openmp_wrappers/__clang_openmp_device_functions.h openmp_wrappers/new ) diff --git a/clang/lib/Headers/__clang_cuda_complex_builtins.h b/clang/lib/Headers/__clang_cuda_complex_builtins.h index 576a958b16bb..d698be71d011 100644 --- a/clang/lib/Headers/__clang_cuda_complex_builtins.h +++ b/clang/lib/Headers/__clang_cuda_complex_builtins.h @@ -13,10 +13,61 @@ // This header defines __muldc3, __mulsc3, __divdc3, and __divsc3. These are // libgcc functions that clang assumes are available when compiling c99 complex // operations. (These implementations come from libc++, and have been modified -// to work with CUDA.) +// to work with CUDA and OpenMP target offloading [in C and C++ mode].) -extern "C" inline __device__ double _Complex __muldc3(double __a, double __b, - double __c, double __d) { +#pragma push_macro("__DEVICE__") +#ifdef _OPENMP +#pragma omp declare target +#define __DEVICE__ __attribute__((noinline, nothrow, cold)) +#else +#define __DEVICE__ __device__ inline +#endif + +// Make the algorithms available for C and C++ by selecting the right functions. +#if defined(__cplusplus) +// TODO: In OpenMP mode we cannot overload isinf/isnan/isfinite the way we +// overload all other math functions because old math system headers and not +// always conformant and return an integer instead of a boolean. Until that has +// been addressed we need to work around it. For now, we substituate with the +// calls we would have used to implement those three functions. Note that we +// could use the C alternatives as well. 
+#define _ISNANd ::__isnan +#define _ISNANf ::__isnanf +#define _ISINFd ::__isinf +#define _ISINFf ::__isinff +#define _ISFINITEd ::__isfinited +#define _ISFINITEf ::__finitef +#define _COPYSIGNd std::copysign +#define _COPYSIGNf std::copysign +#define _SCALBNd std::scalbn +#define _SCALBNf std::scalbn +#define _ABSd std::abs +#define _ABSf std::abs +#define _LOGBd std::logb +#define _LOGBf std::logb +#else +#define _ISNANd isnan +#define _ISNANf isnanf +#define _ISINFd isinf +#define _ISINFf isinff +#define _ISFINITEd isfinite +#define _ISFINITEf isfinitef +#define _COPYSIGNd copysign +#define _COPYSIGNf copysignf +#define _SCALBNd scalbn +#define _SCALBNf scalbnf +#define _ABSd abs +#define _ABSf absf +#define _LOGBd logb +#define _LOGBf logbf +#endif + +#if defined(__cplusplus) +extern "C" { +#endif + +__DEVICE__ double _Complex __muldc3(double __a, double __b, double __c, +double __d) { double __ac = __a * __c; double __bd = __b * __d; double __ad = __a * __d; @@ -24,50 +75,49 @@ extern "C" inline __device__ double _Complex __muldc3(double __a, double __b, double _Complex z; __real__(z) = __ac - __bd; __imag__(z) = __ad + __bc; - if (std::isnan(__real__(z)) && std::isnan(__imag__(z))) { + if (_ISNANd(__real__(z)) && _ISNANd(__imag__(z))) { int __recalc = 0; -if (std::isinf(__a) || std::isinf(__b)) { - __a = std::copysign(std::isinf(__a) ? 1 : 0, __a); - __b = std::copysign(std::isinf(__b) ? 1 : 0, __b); - if (std::isnan(__c)) -__c = std::copysign(0, __c); - if (std::isnan(__d)) -__d = std::copysign(0, __d); +if (_ISINFd(__a) |
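A minimal usage sketch (hypothetical; assumes nvptx offloading): complex arithmetic in a target region now links because the `__muldc3`-style helpers are defined for the device by the new wrappers.

```cpp
#include <complex>

std::complex<double> cmul(std::complex<double> a, std::complex<double> b) {
  std::complex<double> r;
#pragma omp target map(to : a, b) map(from : r)
  r = a * b; // emits a call to __muldc3, now available on the device
  return r;
}
```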
[clang] e3e47e8 - [OpenMP] Make complex soft-float functions on the GPU weak definitions
Author: Johannes Doerfert Date: 2020-07-09T01:06:55-05:00 New Revision: e3e47e80355422df2e730cf97a0c80bb6de3915e URL: https://github.com/llvm/llvm-project/commit/e3e47e80355422df2e730cf97a0c80bb6de3915e DIFF: https://github.com/llvm/llvm-project/commit/e3e47e80355422df2e730cf97a0c80bb6de3915e.diff LOG: [OpenMP] Make complex soft-float functions on the GPU weak definitions To avoid linkage errors we have to ensure the linkage allows multiple definitions of these compiler inserted functions. Since they are on the cold path of complex computations, we want to avoid `inline`. Instead, we opt for `weak` and `noinline` for now. Added: Modified: clang/lib/Headers/__clang_cuda_complex_builtins.h clang/test/Headers/nvptx_device_math_complex.c clang/test/Headers/nvptx_device_math_complex.cpp Removed: diff --git a/clang/lib/Headers/__clang_cuda_complex_builtins.h b/clang/lib/Headers/__clang_cuda_complex_builtins.h index d698be71d011..c48c754ed1a4 100644 --- a/clang/lib/Headers/__clang_cuda_complex_builtins.h +++ b/clang/lib/Headers/__clang_cuda_complex_builtins.h @@ -18,7 +18,7 @@ #pragma push_macro("__DEVICE__") #ifdef _OPENMP #pragma omp declare target -#define __DEVICE__ __attribute__((noinline, nothrow, cold)) +#define __DEVICE__ __attribute__((noinline, nothrow, cold, weak)) #else #define __DEVICE__ __device__ inline #endif diff --git a/clang/test/Headers/nvptx_device_math_complex.c b/clang/test/Headers/nvptx_device_math_complex.c index 9b96b5dd8c22..0e212592dd2b 100644 --- a/clang/test/Headers/nvptx_device_math_complex.c +++ b/clang/test/Headers/nvptx_device_math_complex.c @@ -11,10 +11,10 @@ #include #endif -// CHECK-DAG: define {{.*}} @__mulsc3 -// CHECK-DAG: define {{.*}} @__muldc3 -// CHECK-DAG: define {{.*}} @__divsc3 -// CHECK-DAG: define {{.*}} @__divdc3 +// CHECK-DAG: define weak {{.*}} @__mulsc3 +// CHECK-DAG: define weak {{.*}} @__muldc3 +// CHECK-DAG: define weak {{.*}} @__divsc3 +// CHECK-DAG: define weak {{.*}} @__divdc3 // CHECK-DAG: call float @__nv_scalbnf( void test_scmplx(float _Complex a) { diff --git a/clang/test/Headers/nvptx_device_math_complex.cpp b/clang/test/Headers/nvptx_device_math_complex.cpp index 15434d907605..58ed24b74b0e 100644 --- a/clang/test/Headers/nvptx_device_math_complex.cpp +++ b/clang/test/Headers/nvptx_device_math_complex.cpp @@ -5,10 +5,10 @@ #include -// CHECK-DAG: define {{.*}} @__mulsc3 -// CHECK-DAG: define {{.*}} @__muldc3 -// CHECK-DAG: define {{.*}} @__divsc3 -// CHECK-DAG: define {{.*}} @__divdc3 +// CHECK-DAG: define weak {{.*}} @__mulsc3 +// CHECK-DAG: define weak {{.*}} @__muldc3 +// CHECK-DAG: define weak {{.*}} @__divsc3 +// CHECK-DAG: define weak {{.*}} @__divdc3 // CHECK-DAG: call float @__nv_scalbnf( void test_scmplx(std::complex a) { ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
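The pattern in isolation (illustrative only; mirrors the `__DEVICE__` change above): `weak` lets every translation unit carry its own definition of the helper without multiple-definition link errors, while `noinline` keeps the cold soft-float code out of callers.

```c
#define __DEVICE__ __attribute__((noinline, nothrow, cold, weak))

// Every TU that includes the header gets a definition; the linker keeps one.
__DEVICE__ float helper(float a, float b) { return a + b; }
```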
[clang] 7f1e6fc - [OpenMP] Use __OPENMP_NVPTX__ instead of _OPENMP in wrapper headers
Author: Johannes Doerfert Date: 2020-07-10T18:53:34-05:00 New Revision: 7f1e6fcff9427adfa8efa3bfeeeac801da788b87 URL: https://github.com/llvm/llvm-project/commit/7f1e6fcff9427adfa8efa3bfeeeac801da788b87 DIFF: https://github.com/llvm/llvm-project/commit/7f1e6fcff9427adfa8efa3bfeeeac801da788b87.diff LOG: [OpenMP] Use __OPENMP_NVPTX__ instead of _OPENMP in wrapper headers Due to recent changes we cannot use OpenMP in CUDA files anymore (PR45533) as the math handling of CUDA is different when _OPENMP is defined. We actually want this different behavior only if we are offloading with OpenMP to NVIDIA, thus generating NVPTX. With this patch we do not interfere with the CUDA math handling except if we are in NVPTX offloading mode, as indicated by the presence of __OPENMP_NVPTX__. Reviewed By: tra Differential Revision: https://reviews.llvm.org/D78155 Added: Modified: clang/lib/Headers/__clang_cuda_cmath.h clang/lib/Headers/__clang_cuda_device_functions.h clang/lib/Headers/__clang_cuda_libdevice_declares.h clang/lib/Headers/__clang_cuda_math.h clang/lib/Headers/openmp_wrappers/__clang_openmp_device_functions.h clang/lib/Headers/openmp_wrappers/cmath clang/lib/Headers/openmp_wrappers/math.h Removed: diff --git a/clang/lib/Headers/__clang_cuda_cmath.h b/clang/lib/Headers/__clang_cuda_cmath.h index f406112164e5..8ba182689a4f 100644 --- a/clang/lib/Headers/__clang_cuda_cmath.h +++ b/clang/lib/Headers/__clang_cuda_cmath.h @@ -12,7 +12,7 @@ #error "This file is for CUDA compilation only." #endif -#ifndef _OPENMP +#ifndef __OPENMP_NVPTX__ #include #endif @@ -32,7 +32,7 @@ // implementation. Declaring in the global namespace and pulling into namespace // std covers all of the known knowns. -#ifdef _OPENMP +#ifdef __OPENMP_NVPTX__ #define __DEVICE__ static constexpr __attribute__((always_inline, nothrow)) #else #define __DEVICE__ static __device__ __inline__ __attribute__((always_inline)) @@ -69,7 +69,7 @@ __DEVICE__ float frexp(float __arg, int *__exp) { // Windows. For OpenMP we omit these as some old system headers have // non-conforming `isinf(float)` and `isnan(float)` implementations that return // an `int`. The system versions of these functions should be fine anyway. -#if !defined(_MSC_VER) && !defined(_OPENMP) +#if !defined(_MSC_VER) && !defined(__OPENMP_NVPTX__) __DEVICE__ bool isinf(float __x) { return ::__isinff(__x); } __DEVICE__ bool isinf(double __x) { return ::__isinf(__x); } __DEVICE__ bool isfinite(float __x) { return ::__finitef(__x); } @@ -146,7 +146,7 @@ __DEVICE__ float tanh(float __x) { return ::tanhf(__x); } // libdevice doesn't provide an implementation, and we don't want to be in the // business of implementing tricky libm functions in this header. -#ifndef _OPENMP +#ifndef __OPENMP_NVPTX__ // Now we've defined everything we promised we'd define in // __clang_cuda_math_forward_declares.h. We need to do two additional things to @@ -463,7 +463,7 @@ _GLIBCXX_END_NAMESPACE_VERSION } // namespace std #endif -#endif // _OPENMP +#endif // __OPENMP_NVPTX__ #undef __DEVICE__ diff --git a/clang/lib/Headers/__clang_cuda_device_functions.h b/clang/lib/Headers/__clang_cuda_device_functions.h index 76c588997f18..f801e5426aa4 100644 --- a/clang/lib/Headers/__clang_cuda_device_functions.h +++ b/clang/lib/Headers/__clang_cuda_device_functions.h @@ -10,7 +10,7 @@ #ifndef __CLANG_CUDA_DEVICE_FUNCTIONS_H__ #define __CLANG_CUDA_DEVICE_FUNCTIONS_H__ -#ifndef _OPENMP +#ifndef __OPENMP_NVPTX__ #if CUDA_VERSION < 9000 #error This file is intended to be used with CUDA-9+ only. 
#endif @@ -20,7 +20,7 @@ // we implement in this file. We need static in order to avoid emitting unused // functions and __forceinline__ helps inlining these wrappers at -O1. #pragma push_macro("__DEVICE__") -#ifdef _OPENMP +#ifdef __OPENMP_NVPTX__ #define __DEVICE__ static __attribute__((always_inline, nothrow)) #else #define __DEVICE__ static __device__ __forceinline__ @@ -1466,14 +1466,14 @@ __DEVICE__ unsigned int __vsubus4(unsigned int __a, unsigned int __b) { // For OpenMP we require the user to include as we need to know what // clock_t is on the system. -#ifndef _OPENMP +#ifndef __OPENMP_NVPTX__ __DEVICE__ /* clock_t= */ int clock() { return __nvvm_read_ptx_sreg_clock(); } #endif __DEVICE__ long long clock64() { return __nvvm_read_ptx_sreg_clock64(); } // These functions shouldn't be declared when including this header // for math function resolution purposes. -#ifndef _OPENMP +#ifndef __OPENMP_NVPTX__ __DEVICE__ void *memcpy(void *__a, const void *__b, size_t __c) { return __builtin_memcpy(__a, __b, __c); } diff --git a/clang/lib/Headers/__clang_cuda_libdevice_declares.h b/clang/lib/Headers/__clang_cuda_libdevice_declares.h index 4d70353394c8..6173b589e3ef 100644 --- a/clang/lib/Headers/__clang_cuda_libdevice_declares.h +++ b/cl
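The guard pattern boils down to the following (sketch; where `__OPENMP_NVPTX__` gets defined is an implementation detail of the OpenMP wrapper headers and is assumed here):

```c
#ifdef __OPENMP_NVPTX__
// OpenMP offloading to NVPTX: no CUDA keywords available.
#define __DEVICE__ static __attribute__((always_inline, nothrow))
#else
// Plain CUDA (possibly compiled with -fopenmp): keep the CUDA qualifiers.
#define __DEVICE__ static __device__ __forceinline__
#endif
```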
[clang] cd0ea03 - [OpenMP][NFC] Remove unused and untested code from the device runtime
Author: Johannes Doerfert Date: 2020-07-10T19:09:41-05:00 New Revision: cd0ea03e6f157e8fb477cd8368b29e1448eeb265 URL: https://github.com/llvm/llvm-project/commit/cd0ea03e6f157e8fb477cd8368b29e1448eeb265 DIFF: https://github.com/llvm/llvm-project/commit/cd0ea03e6f157e8fb477cd8368b29e1448eeb265.diff LOG: [OpenMP][NFC] Remove unused and untested code from the device runtime Summary: We carried a lot of unused and untested code in the device runtime. Among other reasons, we are planning major rewrites for which reduced size is going to help a lot. The number of code lines reduced by 14%! Before: --- Language files blankcomment code --- CUDA13489841 2454 C/C++ Header14322493 1377 C 12117124559 CMake4 64 64262 C++ 1 6 6 39 --- SUM:44998 1528 4691 --- After: --- Language files blankcomment code --- CUDA13366733 1879 C/C++ Header14317484 1293 C 12117124559 CMake4 64 64262 C++ 1 6 6 39 --- SUM:44870 1411 4032 --- Reviewers: hfinkel, jhuber6, fghanim, JonChesterfield, grokos, AndreyChurbanov, ye-luo, tianshilei1992, ggeorgakoudis, Hahnfeld, ABataev, hbae, ronlieb, gregrodgers Subscribers: jvesely, yaxunl, bollu, guansong, jfb, sstefan1, aaron.ballman, openmp-commits, cfe-commits Tags: #clang, #openmp Differential Revision: https://reviews.llvm.org/D83349 Added: Modified: clang/test/OpenMP/nvptx_target_simd_codegen.cpp openmp/libomptarget/deviceRTLs/amdgcn/src/target_impl.h openmp/libomptarget/deviceRTLs/common/omptarget.h openmp/libomptarget/deviceRTLs/common/src/data_sharing.cu openmp/libomptarget/deviceRTLs/common/src/libcall.cu openmp/libomptarget/deviceRTLs/common/src/loop.cu openmp/libomptarget/deviceRTLs/common/src/omptarget.cu openmp/libomptarget/deviceRTLs/common/src/parallel.cu openmp/libomptarget/deviceRTLs/common/src/reduction.cu openmp/libomptarget/deviceRTLs/common/src/support.cu openmp/libomptarget/deviceRTLs/common/src/sync.cu openmp/libomptarget/deviceRTLs/common/support.h openmp/libomptarget/deviceRTLs/interface.h openmp/libomptarget/deviceRTLs/nvptx/src/target_impl.h Removed: diff --git a/clang/test/OpenMP/nvptx_target_simd_codegen.cpp b/clang/test/OpenMP/nvptx_target_simd_codegen.cpp index 073d6fa2f14e..7a1f01c1f1ad 100644 --- a/clang/test/OpenMP/nvptx_target_simd_codegen.cpp +++ b/clang/test/OpenMP/nvptx_target_simd_codegen.cpp @@ -78,7 +78,6 @@ int bar(int n){ // CHECK: call void @__kmpc_spmd_kernel_init(i32 %{{.+}}, i16 0, i16 0) // CHECK-NOT: call void @__kmpc_for_static_init // CHECK-NOT: call void @__kmpc_for_static_fini -// CHECK-NOT: call i32 @__kmpc_nvptx_simd_reduce_nowait( // CHECK-NOT: call void @__kmpc_nvptx_end_reduce_nowait( // CHECK: call void @__kmpc_spmd_kernel_deinit_v2(i16 0) // CHECK: ret void diff --git a/openmp/libomptarget/deviceRTLs/amdgcn/src/target_impl.h b/openmp/libomptarget/deviceRTLs/amdgcn/src/target_impl.h index 77a0ffb54f95..3c90b39282c9 100644 --- a/openmp/libomptarget/deviceRTLs/amdgcn/src/target_impl.h +++ b/openmp/libomptarget/deviceRTLs/amdgcn/src/target_impl.h @@ -140,8 +140,6 @@ DEVICE int GetNumberOfThreadsInBlock(); DEVICE unsigned GetWarpId(); DEVICE unsigned GetLaneId(); -DEVICE bool __kmpc_impl_is_first_active_thread(); - // Locks DEVICE void __kmpc_impl_init_lock(omp_lock_t *lock); DEVICE void __kmpc_impl_destroy_lock(omp_lock_t *lock); diff --git a/openmp/libomptarget/deviceRTLs/common/omptarget.h b/openmp/libomptarget/deviceRTLs/common/ompta
[clang] b5667d0 - [OpenMP][CUDA] Fix std::complex in GPU regions
Author: Johannes Doerfert Date: 2020-07-11T00:40:05-05:00 New Revision: b5667d00e0447747419a783697b84a37f59ce055 URL: https://github.com/llvm/llvm-project/commit/b5667d00e0447747419a783697b84a37f59ce055 DIFF: https://github.com/llvm/llvm-project/commit/b5667d00e0447747419a783697b84a37f59ce055.diff LOG: [OpenMP][CUDA] Fix std::complex in GPU regions The old way worked to some degree for C++-mode but in C mode we actually tried to introduce variants of macros (e.g., isinf). To make both modes work reliably we get rid of those extra variants and directly use NVIDIA intrinsics in the complex implementation. While this has to be revisited as we add other GPU targets which want to reuse the code, it should be fine for now. Reviewed By: tra, JonChesterfield, yaxunl Differential Revision: https://reviews.llvm.org/D83591 Added: Modified: clang/lib/Headers/__clang_cuda_complex_builtins.h clang/lib/Headers/__clang_cuda_math.h clang/test/Headers/nvptx_device_math_complex.c clang/test/Headers/nvptx_device_math_complex.cpp Removed: diff --git a/clang/lib/Headers/__clang_cuda_complex_builtins.h b/clang/lib/Headers/__clang_cuda_complex_builtins.h index c48c754ed1a4..8c10ff6b461f 100644 --- a/clang/lib/Headers/__clang_cuda_complex_builtins.h +++ b/clang/lib/Headers/__clang_cuda_complex_builtins.h @@ -23,20 +23,16 @@ #define __DEVICE__ __device__ inline #endif -// Make the algorithms available for C and C++ by selecting the right functions. -#if defined(__cplusplus) -// TODO: In OpenMP mode we cannot overload isinf/isnan/isfinite the way we -// overload all other math functions because old math system headers and not -// always conformant and return an integer instead of a boolean. Until that has -// been addressed we need to work around it. For now, we substituate with the -// calls we would have used to implement those three functions. Note that we -// could use the C alternatives as well. -#define _ISNANd ::__isnan -#define _ISNANf ::__isnanf -#define _ISINFd ::__isinf -#define _ISINFf ::__isinff -#define _ISFINITEd ::__isfinited -#define _ISFINITEf ::__finitef +// To make the algorithms available for C and C++ in CUDA and OpenMP we select +// diff erent but equivalent function versions. TODO: For OpenMP we currently +// select the native builtins as the overload support for templates is lacking. 
+#if !defined(_OPENMP) +#define _ISNANd std::isnan +#define _ISNANf std::isnan +#define _ISINFd std::isinf +#define _ISINFf std::isinf +#define _ISFINITEd std::isfinite +#define _ISFINITEf std::isfinite #define _COPYSIGNd std::copysign #define _COPYSIGNf std::copysign #define _SCALBNd std::scalbn @@ -46,20 +42,20 @@ #define _LOGBd std::logb #define _LOGBf std::logb #else -#define _ISNANd isnan -#define _ISNANf isnanf -#define _ISINFd isinf -#define _ISINFf isinff -#define _ISFINITEd isfinite -#define _ISFINITEf isfinitef -#define _COPYSIGNd copysign -#define _COPYSIGNf copysignf -#define _SCALBNd scalbn -#define _SCALBNf scalbnf -#define _ABSd abs -#define _ABSf absf -#define _LOGBd logb -#define _LOGBf logbf +#define _ISNANd __nv_isnand +#define _ISNANf __nv_isnanf +#define _ISINFd __nv_isinfd +#define _ISINFf __nv_isinff +#define _ISFINITEd __nv_isfinited +#define _ISFINITEf __nv_finitef +#define _COPYSIGNd __nv_copysign +#define _COPYSIGNf __nv_copysignf +#define _SCALBNd __nv_scalbn +#define _SCALBNf __nv_scalbnf +#define _ABSd __nv_fabs +#define _ABSf __nv_fabsf +#define _LOGBd __nv_logb +#define _LOGBf __nv_logbf #endif #if defined(__cplusplus) diff --git a/clang/lib/Headers/__clang_cuda_math.h b/clang/lib/Headers/__clang_cuda_math.h index 2e8e6ae71d9c..332e616702ac 100644 --- a/clang/lib/Headers/__clang_cuda_math.h +++ b/clang/lib/Headers/__clang_cuda_math.h @@ -340,16 +340,6 @@ __DEVICE__ float y1f(float __a) { return __nv_y1f(__a); } __DEVICE__ double yn(int __a, double __b) { return __nv_yn(__a, __b); } __DEVICE__ float ynf(int __a, float __b) { return __nv_ynf(__a, __b); } -// In C++ mode OpenMP takes the system versions of these because some math -// headers provide the wrong return type. This cannot happen in C and we can and -// want to use the specialized versions right away. -#if defined(_OPENMP) && !defined(__cplusplus) -__DEVICE__ int isinff(float __x) { return __nv_isinff(__x); } -__DEVICE__ int isinf(double __x) { return __nv_isinfd(__x); } -__DEVICE__ int isnanf(float __x) { return __nv_isnanf(__x); } -__DEVICE__ int isnan(double __x) { return __nv_isnand(__x); } -#endif - #pragma pop_macro("__DEVICE__") #pragma pop_macro("__DEVICE_VOID__") #pragma pop_macro("__FAST_OR_SLOW") diff --git a/clang/test/Headers/nvptx_device_math_complex.c b/clang/test/Headers/nvptx_device_math_complex.c index 0e212592dd2b..6e3e8bffbd24 100644 --- a/clang/test/Headers/nvptx_device_math_complex.c +++ b/clang/test/Headers/nvptx_device_math_complex.c @@ -11,12 +11,34 @@ #include
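The C-mode shape that now works is roughly the following (hedged; close to what the updated nvptx_device_math_complex.c test checks):

```c
void test_scmplx(float _Complex a) {
#pragma omp target
  {
    (void)(a * a); // lowered through __mulsc3, which now uses __nv_* builtins
  }
}
```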
[clang] c986995 - [OpenMP][NFC] Remove unused (always fixed) arguments
Author: Johannes Doerfert Date: 2020-07-11T00:51:51-05:00 New Revision: c98699582a6333bbe76ff7853b4cd6beb45754cf URL: https://github.com/llvm/llvm-project/commit/c98699582a6333bbe76ff7853b4cd6beb45754cf DIFF: https://github.com/llvm/llvm-project/commit/c98699582a6333bbe76ff7853b4cd6beb45754cf.diff LOG: [OpenMP][NFC] Remove unused (always fixed) arguments There are various runtime calls in the device runtime with unused, or always fixed, arguments. This is bad for all sorts of reasons. Clean up two before as we match them in OpenMPOpt now. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D83268 Added: Modified: clang/lib/CodeGen/CGOpenMPRuntimeNVPTX.cpp clang/test/OpenMP/nvptx_data_sharing.cpp clang/test/OpenMP/nvptx_parallel_codegen.cpp clang/test/OpenMP/nvptx_target_codegen.cpp clang/test/OpenMP/nvptx_target_teams_codegen.cpp clang/test/OpenMP/nvptx_target_teams_distribute_codegen.cpp llvm/include/llvm/Frontend/OpenMP/OMPKinds.def openmp/libomptarget/deviceRTLs/common/src/parallel.cu openmp/libomptarget/deviceRTLs/interface.h Removed: diff --git a/clang/lib/CodeGen/CGOpenMPRuntimeNVPTX.cpp b/clang/lib/CodeGen/CGOpenMPRuntimeNVPTX.cpp index cabd06bd76e8..cbd443134e7a 100644 --- a/clang/lib/CodeGen/CGOpenMPRuntimeNVPTX.cpp +++ b/clang/lib/CodeGen/CGOpenMPRuntimeNVPTX.cpp @@ -38,11 +38,9 @@ enum OpenMPRTLFunctionNVPTX { /// Call to void __kmpc_spmd_kernel_deinit_v2(int16_t RequiresOMPRuntime); OMPRTL_NVPTX__kmpc_spmd_kernel_deinit_v2, /// Call to void __kmpc_kernel_prepare_parallel(void - /// *outlined_function, int16_t - /// IsOMPRuntimeInitialized); + /// *outlined_function); OMPRTL_NVPTX__kmpc_kernel_prepare_parallel, - /// Call to bool __kmpc_kernel_parallel(void **outlined_function, - /// int16_t IsOMPRuntimeInitialized); + /// Call to bool __kmpc_kernel_parallel(void **outlined_function); OMPRTL_NVPTX__kmpc_kernel_parallel, /// Call to void __kmpc_kernel_end_parallel(); OMPRTL_NVPTX__kmpc_kernel_end_parallel, @@ -1466,8 +1464,7 @@ void CGOpenMPRuntimeNVPTX::emitWorkerLoop(CodeGenFunction &CGF, CGF.InitTempAlloca(WorkFn, llvm::Constant::getNullValue(CGF.Int8PtrTy)); // TODO: Optimize runtime initialization and pass in correct value. 
- llvm::Value *Args[] = {WorkFn.getPointer(), - /*RequiresOMPRuntime=*/Bld.getInt16(1)}; + llvm::Value *Args[] = {WorkFn.getPointer()}; llvm::Value *Ret = CGF.EmitRuntimeCall( createNVPTXRuntimeFunction(OMPRTL_NVPTX__kmpc_kernel_parallel), Args); Bld.CreateStore(Bld.CreateZExt(Ret, CGF.Int8Ty), ExecStatus); @@ -1595,17 +1592,16 @@ CGOpenMPRuntimeNVPTX::createNVPTXRuntimeFunction(unsigned Function) { } case OMPRTL_NVPTX__kmpc_kernel_prepare_parallel: { /// Build void __kmpc_kernel_prepare_parallel( -/// void *outlined_function, int16_t IsOMPRuntimeInitialized); -llvm::Type *TypeParams[] = {CGM.Int8PtrTy, CGM.Int16Ty}; +/// void *outlined_function); +llvm::Type *TypeParams[] = {CGM.Int8PtrTy}; auto *FnTy = llvm::FunctionType::get(CGM.VoidTy, TypeParams, /*isVarArg*/ false); RTLFn = CGM.CreateRuntimeFunction(FnTy, "__kmpc_kernel_prepare_parallel"); break; } case OMPRTL_NVPTX__kmpc_kernel_parallel: { -/// Build bool __kmpc_kernel_parallel(void **outlined_function, -/// int16_t IsOMPRuntimeInitialized); -llvm::Type *TypeParams[] = {CGM.Int8PtrPtrTy, CGM.Int16Ty}; +/// Build bool __kmpc_kernel_parallel(void **outlined_function); +llvm::Type *TypeParams[] = {CGM.Int8PtrPtrTy}; llvm::Type *RetTy = CGM.getTypes().ConvertType(CGM.getContext().BoolTy); auto *FnTy = llvm::FunctionType::get(RetTy, TypeParams, /*isVarArg*/ false); @@ -2569,7 +2565,7 @@ void CGOpenMPRuntimeNVPTX::emitNonSPMDParallelCall( llvm::Value *ID = Bld.CreateBitOrPointerCast(WFn, CGM.Int8PtrTy); // Prepare for parallel region. Indicate the outlined function. -llvm::Value *Args[] = {ID, /*RequiresOMPRuntime=*/Bld.getInt16(1)}; +llvm::Value *Args[] = {ID}; CGF.EmitRuntimeCall( createNVPTXRuntimeFunction(OMPRTL_NVPTX__kmpc_kernel_prepare_parallel), Args); diff --git a/clang/test/OpenMP/nvptx_data_sharing.cpp b/clang/test/OpenMP/nvptx_data_sharing.cpp index 2ee6bd2b4701..1372246c7fc8 100644 --- a/clang/test/OpenMP/nvptx_data_sharing.cpp +++ b/clang/test/OpenMP/nvptx_data_sharing.cpp @@ -55,7 +55,7 @@ void test_ds(){ // CK1: [[A:%.+]] = getelementptr inbounds %struct._globalized_locals_ty, %struct._globalized_locals_ty* [[GLOBALSTACK2]], i32 0, i32 0 // CK1: [[B:%.+]] = getelementptr inbounds %struct._globalized_locals_ty, %struct._globalized_locals_ty* [[GLOBALSTACK2]], i32 0, i32 1 // CK1: store i32 10, i32* [[A]] -// CK1: call void @__kmpc_kernel_prepare_parallel({{.*}}, i16 1) +// CK1: call void @__kmpc_kernel_prepare_parallel({
[clang] fec1f21 - [OpenMP] Emit remarks during GPU state machine optimization
Author: Johannes Doerfert Date: 2020-07-14T22:33:57-05:00 New Revision: fec1f2109f33c9a1a7650272b3bfb8f0f81f6a2b URL: https://github.com/llvm/llvm-project/commit/fec1f2109f33c9a1a7650272b3bfb8f0f81f6a2b DIFF: https://github.com/llvm/llvm-project/commit/fec1f2109f33c9a1a7650272b3bfb8f0f81f6a2b.diff LOG: [OpenMP] Emit remarks during GPU state machine optimization Since D83271 we can optimize the GPU state machine to avoid spurious call edges that increase the register usage of kernels. With this patch we inform the user why and if this optimization is happening and when it is not. Reviewed By: ye-luo Differential Revision: https://reviews.llvm.org/D83707 Added: clang/test/OpenMP/remarks_parallel_in_multiple_target_state_machines.c clang/test/OpenMP/remarks_parallel_in_target_state_machine.c Modified: llvm/lib/Transforms/IPO/OpenMPOpt.cpp Removed: diff --git a/clang/test/OpenMP/remarks_parallel_in_multiple_target_state_machines.c b/clang/test/OpenMP/remarks_parallel_in_multiple_target_state_machines.c new file mode 100644 index ..c5152d401c8b --- /dev/null +++ b/clang/test/OpenMP/remarks_parallel_in_multiple_target_state_machines.c @@ -0,0 +1,102 @@ +// RUN: %clang_cc1 -verify=host -Rpass=openmp -fopenmp -x c++ -triple powerpc64le-unknown-unknown -fopenmp-targets=nvptx64-nvidia-cuda -emit-llvm-bc %s -o %t-ppc-host.bc +// RUN: %clang_cc1 -verify=all,safe -Rpass=openmp -fopenmp -O2 -x c++ -triple nvptx64-unknown-unknown -fopenmp-targets=nvptx64-nvidia-cuda -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o %t.out +// RUN: %clang_cc1 -fexperimental-new-pass-manager -verify=all,safe -Rpass=openmp -fopenmp -O2 -x c++ -triple nvptx64-unknown-unknown -fopenmp-targets=nvptx64-nvidia-cuda -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o %t.out + +// host-no-diagnostics + +void bar1(void) { +#pragma omp parallel // #0 + // all-remark@#0 {{Found a parallel region that is called in a target region but not part of a combined target construct nor nesed inside a target construct without intermediate code. This can lead to excessive register usage for unrelated target regions in the same translation unit due to spurious call edges assumed by ptxas.}} + // safe-remark@#0 {{Parallel region is not known to be called from a unique single target region, maybe the surrounding function has external linkage?; will not attempt to rewrite the state machine use.}} + // force-remark@#0 {{[UNSAFE] Parallel region is not known to be called from a unique single target region, maybe the surrounding function has external linkage?; will rewrite the state machine use due to command line flag, this can lead to undefined behavior if the parallel region is called from a target region outside this translation unit.}} + // force-remark@#0 {{Specialize parallel region that is only reached from a single target region to avoid spurious call edges and excessive register usage in other target regions. (parallel region ID: __omp_outlined__2_wrapper, kernel ID: }} + { + } +} +void bar2(void) { +#pragma omp parallel // #1 + // all-remark@#1 {{Found a parallel region that is called in a target region but not part of a combined target construct nor nesed inside a target construct without intermediate code. 
This can lead to excessive register usage for unrelated target regions in the same translation unit due to spurious call edges assumed by ptxas.}} + // safe-remark@#1 {{Parallel region is not known to be called from a unique single target region, maybe the surrounding function has external linkage?; will not attempt to rewrite the state machine use.}} + // force-remark@#1 {{[UNSAFE] Parallel region is not known to be called from a unique single target region, maybe the surrounding function has external linkage?; will rewrite the state machine use due to command line flag, this can lead to undefined behavior if the parallel region is called from a target region outside this translation unit.}} + // force-remark@#1 {{Specialize parallel region that is only reached from a single target region to avoid spurious call edges and excessive register usage in other target regions. (parallel region ID: __omp_outlined__6_wrapper, kernel ID: }} + { + } +} + +void foo1(void) { +#pragma omp target teams // #2 + // all-remark@#2 {{Target region containing the parallel region that is specialized. (parallel region ID: __omp_outlined__1_w
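To see these remarks, something along these lines should work (hedged; the flags mirror the RUN lines above, and the exact remark wording may change between versions):

```c
// clang -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda -O2 -Rpass=openmp foo.c
void foo(void) {
#pragma omp target teams
  {
#pragma omp parallel // remark: region is specialized, or why it cannot be
    {
    }
  }
}
```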
[clang] 7af287d - [OpenMP][IRBuilder] Support nested parallel regions
Author: Johannes Doerfert Date: 2020-07-14T22:39:06-05:00 New Revision: 7af287d0d921471f18b5c3054ce42381c0f973ed URL: https://github.com/llvm/llvm-project/commit/7af287d0d921471f18b5c3054ce42381c0f973ed DIFF: https://github.com/llvm/llvm-project/commit/7af287d0d921471f18b5c3054ce42381c0f973ed.diff LOG: [OpenMP][IRBuilder] Support nested parallel regions During code generation we might change/add basic blocks so keeping a list of them is fairly easy to break. Nested parallel regions were enough. The new scheme does recompute the list of blocks to be outlined once it is needed. Reviewed By: anchu-rajendran Differential Revision: https://reviews.llvm.org/D82722 Added: clang/test/OpenMP/irbuilder_nested_openmp_parallel_empty.c Modified: clang/test/OpenMP/cancel_codegen.cpp llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp Removed: diff --git a/clang/test/OpenMP/cancel_codegen.cpp b/clang/test/OpenMP/cancel_codegen.cpp index b7d1cea56721..a21a9db1e39a 100644 --- a/clang/test/OpenMP/cancel_codegen.cpp +++ b/clang/test/OpenMP/cancel_codegen.cpp @@ -175,7 +175,7 @@ for (int i = 0; i < argc; ++i) { // IRBUILDER: define internal void @main -// IRBUILDER: [[RETURN:omp.par.exit[^:]*]] +// IRBUILDER: [[RETURN:omp.par.outlined.exit[^:]*]] // IRBUILDER-NEXT: ret void // IRBUILDER: [[FLAG:%.+]] = load float, float* @{{.+}}, @@ -192,10 +192,8 @@ for (int i = 0; i < argc; ++i) { // IRBUILDER: [[CMP:%.+]] = icmp eq i32 [[RES]], 0 // IRBUILDER: br i1 [[CMP]], label %[[CONTINUE:[^,].+]], label %[[EXIT:.+]] // IRBUILDER: [[EXIT]] -// IRBUILDER: br label %[[EXIT2:.+]] -// IRBUILDER: [[CONTINUE]] -// IRBUILDER: br label %[[ELSE:.+]] -// IRBUILDER: [[EXIT2]] // IRBUILDER: br label %[[RETURN]] +// IRBUILDER: [[CONTINUE]] +// IRBUILDER: br label %[[ELSE2:.+]] #endif diff --git a/clang/test/OpenMP/irbuilder_nested_openmp_parallel_empty.c b/clang/test/OpenMP/irbuilder_nested_openmp_parallel_empty.c new file mode 100644 index ..552455eb9779 --- /dev/null +++ b/clang/test/OpenMP/irbuilder_nested_openmp_parallel_empty.c @@ -0,0 +1,110 @@ +// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py +// RUN: %clang_cc1 -verify -fopenmp -fopenmp-enable-irbuilder -x c++ -emit-llvm %s -triple x86_64-unknown-unknown -fexceptions -fcxx-exceptions -o - | FileCheck %s --check-prefixes=ALL,IRBUILDER +// %clang_cc1 -fopenmp -fopenmp-enable-irbuilder -x c++ -std=c++11 -triple x86_64-unknown-unknown -fexceptions -fcxx-exceptions -emit-pch -o /tmp/t1 %s +// %clang_cc1 -fopenmp -fopenmp-enable-irbuilder -x c++ -triple x86_64-unknown-unknown -fexceptions -fcxx-exceptions -debug-info-kind=limited -std=c++11 -include-pch /tmp/t1 -verify %s -emit-llvm -o - | FileCheck --check-prefixes=ALL-DEBUG,IRBUILDER-DEBUG %s + +// expected-no-diagnostics + +// TODO: Teach the update script to check new functions too. + +#ifndef HEADER +#define HEADER + +// ALL-LABEL: @_Z17nested_parallel_0v( +// ALL-NEXT: entry: +// ALL-NEXT:[[OMP_GLOBAL_THREAD_NUM:%.*]] = call i32 @__kmpc_global_thread_num(%struct.ident_t* @1) +// ALL-NEXT:br label [[OMP_PARALLEL:%.*]] +// ALL: omp_parallel: +// ALL-NEXT:call void (%struct.ident_t*, i32, void (i32*, i32*, ...)*, ...) 
@__kmpc_fork_call(%struct.ident_t* @1, i32 0, void (i32*, i32*, ...)* bitcast (void (i32*, i32*)* @_Z17nested_parallel_0v..omp_par.1 to void (i32*, i32*, ...)*)) +// ALL-NEXT:br label [[OMP_PAR_OUTLINED_EXIT12:%.*]] +// ALL: omp.par.outlined.exit12: +// ALL-NEXT:br label [[OMP_PAR_EXIT_SPLIT:%.*]] +// ALL: omp.par.exit.split: +// ALL-NEXT:ret void +// +void nested_parallel_0(void) { +#pragma omp parallel + { +#pragma omp parallel +{ +} + } +} + +// ALL-LABEL: @_Z17nested_parallel_1Pfid( +// ALL-NEXT: entry: +// ALL-NEXT:[[R_ADDR:%.*]] = alloca float*, align 8 +// ALL-NEXT:[[A_ADDR:%.*]] = alloca i32, align 4 +// ALL-NEXT:[[B_ADDR:%.*]] = alloca double, align 8 +// ALL-NEXT:store float* [[R:%.*]], float** [[R_ADDR]], align 8 +// ALL-NEXT:store i32 [[A:%.*]], i32* [[A_ADDR]], align 4 +// ALL-NEXT:store double [[B:%.*]], double* [[B_ADDR]], align 8 +// ALL-NEXT:[[OMP_GLOBAL_THREAD_NUM:%.*]] = call i32 @__kmpc_global_thread_num(%struct.ident_t* @1) +// ALL-NEXT:br label [[OMP_PARALLEL:%.*]] +// ALL: omp_parallel: +// ALL-NEXT:call void (%struct.ident_t*, i32, void (i32*, i32*, ...)*, ...) @__kmpc_fork_call(%struct.ident_t* @1, i32 3, void (i32*, i32*, ...)* bitcast (void (i32*, i32*, i32*, double*, float**)* @_Z17nested_parallel_1Pfid..omp_par.2 to void (i32*, i32*, ...)*), i32* [[A_ADDR]], double* [[B_ADDR]], float** [[R_ADDR]]) +// ALL-NEXT:br label [[OMP_PAR_OUTLINED_EXIT13:%.*]] +// ALL: omp.par.outlined.exit13: +// ALL-NEXT:br label [[OMP_PAR_EXIT_SPLIT:%.*]] +// ALL: omp.par.exit.sp
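As a rough additional illustration of the pattern that used to break the kept block list (a hypothetical function, not from the patch): an inner parallel region whose outlining adds and splits basic blocks while the outer region is still being constructed, here with values captured from the enclosing scope:

void nested_parallel_capture(float *r, int a, double b) {
#pragma omp parallel
  {
#pragma omp parallel
    {
      *r = a + b; // inner region outlined while the outer region is still being built
    }
  }
}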
[clang] d87c92e - [OpenMP][FIX] Check only for deterministic part of a generated function name
Author: Johannes Doerfert Date: 2020-07-14T22:48:22-05:00 New Revision: d87c92e5a2eca620903ce53592ccbe4f8807abe1 URL: https://github.com/llvm/llvm-project/commit/d87c92e5a2eca620903ce53592ccbe4f8807abe1 DIFF: https://github.com/llvm/llvm-project/commit/d87c92e5a2eca620903ce53592ccbe4f8807abe1.diff LOG: [OpenMP][FIX] Check only for deterministic part of a generated function name Added: Modified: clang/test/OpenMP/remarks_parallel_in_multiple_target_state_machines.c clang/test/OpenMP/remarks_parallel_in_target_state_machine.c Removed: diff --git a/clang/test/OpenMP/remarks_parallel_in_multiple_target_state_machines.c b/clang/test/OpenMP/remarks_parallel_in_multiple_target_state_machines.c index c5152d401c8b..163f0b92468a 100644 --- a/clang/test/OpenMP/remarks_parallel_in_multiple_target_state_machines.c +++ b/clang/test/OpenMP/remarks_parallel_in_multiple_target_state_machines.c @@ -25,18 +25,18 @@ void bar2(void) { void foo1(void) { #pragma omp target teams // #2 - // all-remark@#2 {{Target region containing the parallel region that is specialized. (parallel region ID: __omp_outlined__1_wrapper, kernel ID: __omp_offloading_22}} - // all-remark@#2 {{Target region containing the parallel region that is specialized. (parallel region ID: __omp_outlined__3_wrapper, kernel ID: __omp_offloading_22}} + // all-remark@#2 {{Target region containing the parallel region that is specialized. (parallel region ID: __omp_outlined__1_wrapper, kernel ID: __omp_offloading}} + // all-remark@#2 {{Target region containing the parallel region that is specialized. (parallel region ID: __omp_outlined__3_wrapper, kernel ID: __omp_offloading}} { #pragma omp parallel // #3 // all-remark@#3 {{Found a parallel region that is called in a target region but not part of a combined target construct nor nesed inside a target construct without intermediate code. This can lead to excessive register usage for unrelated target regions in the same translation unit due to spurious call edges assumed by ptxas.}} - // all-remark@#3 {{Specialize parallel region that is only reached from a single target region to avoid spurious call edges and excessive register usage in other target regions. (parallel region ID: __omp_outlined__1_wrapper, kernel ID: __omp_offloading_22}} + // all-remark@#3 {{Specialize parallel region that is only reached from a single target region to avoid spurious call edges and excessive register usage in other target regions. (parallel region ID: __omp_outlined__1_wrapper, kernel ID: __omp_offloading}} { } bar1(); #pragma omp parallel // #4 // all-remark@#4 {{Found a parallel region that is called in a target region but not part of a combined target construct nor nesed inside a target construct without intermediate code. This can lead to excessive register usage for unrelated target regions in the same translation unit due to spurious call edges assumed by ptxas.}} - // all-remark@#4 {{Specialize parallel region that is only reached from a single target region to avoid spurious call edges and excessive register usage in other target regions. (parallel region ID: __omp_outlined__3_wrapper, kernel ID: __omp_offloading_22}} + // all-remark@#4 {{Specialize parallel region that is only reached from a single target region to avoid spurious call edges and excessive register usage in other target regions. 
(parallel region ID: __omp_outlined__3_wrapper, kernel ID: __omp_offloading}} { } } @@ -44,19 +44,19 @@ void foo1(void) { void foo2(void) { #pragma omp target teams // #5 - // all-remark@#5 {{Target region containing the parallel region that is specialized. (parallel region ID: __omp_outlined__5_wrapper, kernel ID: __omp_offloading_22}} - // all-remark@#5 {{Target region containing the parallel region that is specialized. (parallel region ID: __omp_outlined__7_wrapper, kernel ID: __omp_offloading_22}} + // all-remark@#5 {{Target region containing the parallel region that is specialized. (parallel region ID: __omp_outlined__5_wrapper, kernel ID: __omp_offloading}} + // all-remark@#5 {{Target region containing the parallel region that is specialized. (parallel region ID: __omp_outlined__7_wrapper, kernel ID: __omp_offloading}} { #pragma omp parallel // #6 // all-remark@#6 {{Found a parallel region that is called in a target region but not part of a combined target construct nor nesed inside a target construct without intermediate code. This can lead to excessive register usage for unrelated target regions in the
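The shortened check strings above rely on only the leading part of a generated kernel name being deterministic; the trailing components encode per-file and per-compilation identifiers. Roughly (the concrete values below are hypothetical):

// __omp_offloading_<device-file-id>_<file-id>_<parent-function>_l<line>
// e.g. __omp_offloading_22_a1b2c3d4_foo1_l27
// Only the "__omp_offloading" prefix (and wrapper names such as
// __omp_outlined__1_wrapper) are stable across machines and builds, so the
// remark checks stop at that prefix.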
[clang] 97ce7fd - [UpdateTestChecks] Match unnamed values like "@[0-9]+" and "![0-9]+"
Author: Johannes Doerfert Date: 2020-08-12T01:04:16-05:00 New Revision: 97ce7fd89fcc92d84c1938108388f735d55d372c URL: https://github.com/llvm/llvm-project/commit/97ce7fd89fcc92d84c1938108388f735d55d372c DIFF: https://github.com/llvm/llvm-project/commit/97ce7fd89fcc92d84c1938108388f735d55d372c.diff LOG: [UpdateTestChecks] Match unnamed values like "@[0-9]+" and "![0-9]+" With this patch we will match most *uses* of "temporary" named things in the IR via regular expressions, not their name at creation time. The new "values" we match are: - "unnamed" globals: `@[0-9]+` - debug metadata: `!dbg ![0-9]+` - loop metadata: `!loop ![0-9]+` - tbaa metadata: `!tbaa ![0-9]+` - range metadata: `!range ![0-9]+` - generic metadata: `metadata ![0-9]+` - attributes groups: `#[0-9]` We still don't match the declarations but that can be done later. This patch can introduce churn when existing check lines contain the old hardcoded versions of the above "values". We can add a flag to opt-out, or opt-in, if necessary. Reviewed By: arichardson, MaskRay Differential Revision: https://reviews.llvm.org/D85099 Added: llvm/test/tools/UpdateTestChecks/update_test_checks/Inputs/various_ir_values.ll llvm/test/tools/UpdateTestChecks/update_test_checks/Inputs/various_ir_values.ll.expected llvm/test/tools/UpdateTestChecks/update_test_checks/Inputs/various_ir_values.ll.funcsig.expected llvm/test/tools/UpdateTestChecks/update_test_checks/various_ir_values.test Modified: clang/test/utils/update_cc_test_checks/Inputs/basic-cplusplus.cpp.expected clang/test/utils/update_cc_test_checks/Inputs/mangled_names.c.funcsig.expected llvm/test/tools/UpdateTestChecks/update_test_checks/Inputs/check_attrs.ll.funcattrs.expected llvm/test/tools/UpdateTestChecks/update_test_checks/Inputs/check_attrs.ll.plain.expected llvm/test/tools/UpdateTestChecks/update_test_checks/Inputs/scrub_attrs.ll.plain.expected llvm/utils/UpdateTestChecks/asm.py llvm/utils/UpdateTestChecks/common.py llvm/utils/update_cc_test_checks.py llvm/utils/update_test_checks.py Removed: diff --git a/clang/test/utils/update_cc_test_checks/Inputs/basic-cplusplus.cpp.expected b/clang/test/utils/update_cc_test_checks/Inputs/basic-cplusplus.cpp.expected index 48ee67a7165a..8095b10d7877 100644 --- a/clang/test/utils/update_cc_test_checks/Inputs/basic-cplusplus.cpp.expected +++ b/clang/test/utils/update_cc_test_checks/Inputs/basic-cplusplus.cpp.expected @@ -44,7 +44,7 @@ Foo::Foo(int x) : x(x) {} // CHECK-NEXT:[[THIS_ADDR:%.*]] = alloca %class.Foo*, align 8 // CHECK-NEXT:store %class.Foo* [[THIS:%.*]], %class.Foo** [[THIS_ADDR]], align 8 // CHECK-NEXT:[[THIS1:%.*]] = load %class.Foo*, %class.Foo** [[THIS_ADDR]], align 8 -// CHECK-NEXT:call void @_ZN3FooD2Ev(%class.Foo* [[THIS1]]) #2 +// CHECK-NEXT:call void @_ZN3FooD2Ev(%class.Foo* [[THIS1]]) [[ATTR2:#.*]] // CHECK-NEXT:ret void // Foo::~Foo() {} @@ -70,7 +70,7 @@ int Foo::function_defined_out_of_line(int arg) const { return x - arg; } // CHECK-NEXT:call void @_ZN3FooC1Ei(%class.Foo* [[F]], i32 1) // CHECK-NEXT:[[CALL:%.*]] = call i32 @_ZNK3Foo23function_defined_inlineEi(%class.Foo* [[F]], i32 2) // CHECK-NEXT:[[CALL1:%.*]] = call i32 @_ZNK3Foo28function_defined_out_of_lineEi(%class.Foo* [[F]], i32 3) -// CHECK-NEXT:call void @_ZN3FooD1Ev(%class.Foo* [[F]]) #2 +// CHECK-NEXT:call void @_ZN3FooD1Ev(%class.Foo* [[F]]) [[ATTR2]] // CHECK-NEXT:ret i32 0 // int main() { diff --git a/clang/test/utils/update_cc_test_checks/Inputs/mangled_names.c.funcsig.expected 
b/clang/test/utils/update_cc_test_checks/Inputs/mangled_names.c.funcsig.expected index e76cf074bdb7..313bd5bcef7c 100644 --- a/clang/test/utils/update_cc_test_checks/Inputs/mangled_names.c.funcsig.expected +++ b/clang/test/utils/update_cc_test_checks/Inputs/mangled_names.c.funcsig.expected @@ -3,7 +3,7 @@ // RUN: %clang_cc1 -triple=x86_64-unknown-linux-gnu -emit-llvm -o - %s | FileCheck %s // CHECK-LABEL: define {{[^@]+}}@test -// CHECK-SAME: (i64 [[A:%.*]], i32 [[B:%.*]]) #0 +// CHECK-SAME: (i64 [[A:%.*]], i32 [[B:%.*]]) [[ATTR0:#.*]] // CHECK-NEXT: entry: // CHECK-NEXT:[[A_ADDR:%.*]] = alloca i64, align 8 // CHECK-NEXT:[[B_ADDR:%.*]] = alloca i32, align 4 @@ -21,7 +21,7 @@ long test(long a, int b) { // A function with a mangled name // CHECK-LABEL: define {{[^@]+}}@_Z4testlii -// CHECK-SAME: (i64 [[A:%.*]], i32 [[B:%.*]], i32 [[C:%.*]]) #0 +// CHECK-SAME: (i64 [[A:%.*]], i32 [[B:%.*]], i32 [[C:%.*]]) [[ATTR0]] // CHECK-NEXT: entry: // CHECK-NEXT:[[A_ADDR:%.*]] = alloca i64, align 8 // CHECK-NEXT:[[B_ADDR:%.*]] = alloca i32, align 4 diff --git a/llvm/test/tools/UpdateTestChecks/update_test_checks/Inputs/check_attrs.ll.funcattrs.expected b/llvm/test/tools/UpdateTestChecks/update_test_checks/Inputs/check_attrs.ll.fun
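In regenerated tests this means values such as attribute groups are captured with a FileCheck pattern on first use and matched by reference afterwards. A hand-written sketch of the resulting style (functions and names are made up, not produced by the script):

// CHECK-LABEL: define {{[^@]+}}@square
// CHECK-SAME: (i32 [[X:%.*]]) [[ATTR0:#.*]]
int square(int x) { return x * x; }

// CHECK-LABEL: define {{[^@]+}}@cube
// CHECK-SAME: (i32 [[Y:%.*]]) [[ATTR0]]
int cube(int y) { return y * y * y; }

Because the group is matched by pattern, renumbering of #0/#1/... in later regenerations no longer invalidates the test.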
[clang] 07c3348 - [OpenMP][NFC] Update test check lines with new script version
Author: Johannes Doerfert Date: 2020-08-14T08:59:25-05:00 New Revision: 07c33487faff3067953d61e5e968b6c3d1b845d6 URL: https://github.com/llvm/llvm-project/commit/07c33487faff3067953d61e5e968b6c3d1b845d6 DIFF: https://github.com/llvm/llvm-project/commit/07c33487faff3067953d61e5e968b6c3d1b845d6.diff LOG: [OpenMP][NFC] Update test check lines with new script version Added: Modified: clang/test/OpenMP/irbuilder_nested_parallel_for.c Removed: diff --git a/clang/test/OpenMP/irbuilder_nested_parallel_for.c b/clang/test/OpenMP/irbuilder_nested_parallel_for.c index 929a92827689..2ca6fe711e28 100644 --- a/clang/test/OpenMP/irbuilder_nested_parallel_for.c +++ b/clang/test/OpenMP/irbuilder_nested_parallel_for.c @@ -11,10 +11,10 @@ // CHECK-LABEL: @_Z14parallel_for_0v( // CHECK-NEXT: entry: -// CHECK-NEXT:[[OMP_GLOBAL_THREAD_NUM:%.*]] = call i32 @__kmpc_global_thread_num(%struct.ident_t* @1) +// CHECK-NEXT:[[OMP_GLOBAL_THREAD_NUM:%.*]] = call i32 @__kmpc_global_thread_num(%struct.ident_t* [[GLOB1:@.*]]) // CHECK-NEXT:br label [[OMP_PARALLEL:%.*]] // CHECK: omp_parallel: -// CHECK-NEXT:call void (%struct.ident_t*, i32, void (i32*, i32*, ...)*, ...) @__kmpc_fork_call(%struct.ident_t* @1, i32 0, void (i32*, i32*, ...)* bitcast (void (i32*, i32*)* @_Z14parallel_for_0v..omp_par to void (i32*, i32*, ...)*)) +// CHECK-NEXT:call void (%struct.ident_t*, i32, void (i32*, i32*, ...)*, ...) @__kmpc_fork_call(%struct.ident_t* [[GLOB1]], i32 0, void (i32*, i32*, ...)* bitcast (void (i32*, i32*)* @_Z14parallel_for_0v..omp_par to void (i32*, i32*, ...)*)) // CHECK-NEXT:br label [[OMP_PAR_OUTLINED_EXIT:%.*]] // CHECK: omp.par.outlined.exit: // CHECK-NEXT:br label [[OMP_PAR_EXIT_SPLIT:%.*]] @@ -23,15 +23,15 @@ // // CHECK-DEBUG-LABEL: @_Z14parallel_for_0v( // CHECK-DEBUG-NEXT: entry: -// CHECK-DEBUG-NEXT:[[OMP_GLOBAL_THREAD_NUM:%.*]] = call i32 @__kmpc_global_thread_num(%struct.ident_t* @1), !dbg !{{[0-9]*}} +// CHECK-DEBUG-NEXT:[[OMP_GLOBAL_THREAD_NUM:%.*]] = call i32 @__kmpc_global_thread_num(%struct.ident_t* [[GLOB1:@.*]]), [[DBG10:!dbg !.*]] // CHECK-DEBUG-NEXT:br label [[OMP_PARALLEL:%.*]] // CHECK-DEBUG: omp_parallel: -// CHECK-DEBUG-NEXT:call void (%struct.ident_t*, i32, void (i32*, i32*, ...)*, ...) @__kmpc_fork_call(%struct.ident_t* @1, i32 0, void (i32*, i32*, ...)* bitcast (void (i32*, i32*)* @_Z14parallel_for_0v..omp_par to void (i32*, i32*, ...)*)), !dbg !{{[0-9]*}} +// CHECK-DEBUG-NEXT:call void (%struct.ident_t*, i32, void (i32*, i32*, ...)*, ...) 
@__kmpc_fork_call(%struct.ident_t* [[GLOB1]], i32 0, void (i32*, i32*, ...)* bitcast (void (i32*, i32*)* @_Z14parallel_for_0v..omp_par to void (i32*, i32*, ...)*)), [[DBG11:!dbg !.*]] // CHECK-DEBUG-NEXT:br label [[OMP_PAR_OUTLINED_EXIT:%.*]] // CHECK-DEBUG: omp.par.outlined.exit: // CHECK-DEBUG-NEXT:br label [[OMP_PAR_EXIT_SPLIT:%.*]] // CHECK-DEBUG: omp.par.exit.split: -// CHECK-DEBUG-NEXT:ret void, !dbg !{{[0-9]*}} +// CHECK-DEBUG-NEXT:ret void, [[DBG14:!dbg !.*]] // void parallel_for_0(void) { #pragma omp parallel @@ -50,10 +50,10 @@ void parallel_for_0(void) { // CHECK-NEXT:store float* [[R:%.*]], float** [[R_ADDR]], align 8 // CHECK-NEXT:store i32 [[A:%.*]], i32* [[A_ADDR]], align 4 // CHECK-NEXT:store double [[B:%.*]], double* [[B_ADDR]], align 8 -// CHECK-NEXT:[[OMP_GLOBAL_THREAD_NUM:%.*]] = call i32 @__kmpc_global_thread_num(%struct.ident_t* @1) +// CHECK-NEXT:[[OMP_GLOBAL_THREAD_NUM:%.*]] = call i32 @__kmpc_global_thread_num(%struct.ident_t* [[GLOB1]]) // CHECK-NEXT:br label [[OMP_PARALLEL:%.*]] // CHECK: omp_parallel: -// CHECK-NEXT:call void (%struct.ident_t*, i32, void (i32*, i32*, ...)*, ...) @__kmpc_fork_call(%struct.ident_t* @1, i32 3, void (i32*, i32*, ...)* bitcast (void (i32*, i32*, i32*, double*, float**)* @_Z14parallel_for_1Pfid..omp_par.1 to void (i32*, i32*, ...)*), i32* [[A_ADDR]], double* [[B_ADDR]], float** [[R_ADDR]]) +// CHECK-NEXT:call void (%struct.ident_t*, i32, void (i32*, i32*, ...)*, ...) @__kmpc_fork_call(%struct.ident_t* [[GLOB1]], i32 3, void (i32*, i32*, ...)* bitcast (void (i32*, i32*, i32*, double*, float**)* @_Z14parallel_for_1Pfid..omp_par.1 to void (i32*, i32*, ...)*), i32* [[A_ADDR]], double* [[B_ADDR]], float** [[R_ADDR]]) // CHECK-NEXT:br label [[OMP_PAR_OUTLINED_EXIT19:%.*]] // CHECK: omp.par.outlined.exit19: // CHECK-NEXT:br label [[OMP_PAR_EXIT_SPLIT:%.*]] @@ -66,20 +66,20 @@ void parallel_for_0(void) { // CHECK-DEBUG-NEXT:[[A_ADDR:%.*]] = alloca i32, align 4 // CHECK-DEBUG-NEXT:[[B_ADDR:%.*]] = alloca double, align 8 // CHECK-DEBUG-NEXT:store float* [[R:%.*]], float** [[R_ADDR]], align 8 -// CHECK-DEBUG-NEXT:call void @llvm.dbg.declare(metadata float** [[R_ADDR]], metadata !{{[0-9]*}}, metadata !DIExpre
[clang] 95a25e4 - [OpenMP][FIX] Do not use TBAA in type punning reduction GPU code PR46156
Author: Johannes Doerfert Date: 2020-08-16T14:38:31-05:00 New Revision: 95a25e4c3203f35e9f57f9fac620b4a21bffd6e1 URL: https://github.com/llvm/llvm-project/commit/95a25e4c3203f35e9f57f9fac620b4a21bffd6e1 DIFF: https://github.com/llvm/llvm-project/commit/95a25e4c3203f35e9f57f9fac620b4a21bffd6e1.diff LOG: [OpenMP][FIX] Do not use TBAA in type punning reduction GPU code PR46156 When we implement OpenMP GPU reductions we use type punning a lot during the shuffle and reduce operations. This is not always compatible with language rules on aliasing. So far we generated TBAA which later allowed to remove some of the reduce code as accesses and initialization were "known to not alias". With this patch we avoid TBAA in this step, hopefully for all accesses that we need to. Verified on the reproducer of PR46156 and QMCPack. Reviewed By: ABataev Differential Revision: https://reviews.llvm.org/D86037 Added: clang/test/OpenMP/nvptx_target_parallel_reduction_codegen_tbaa_PR46146.cpp Modified: clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp Removed: diff --git a/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp b/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp index 1b2608e9854a..d9ef6c2a1078 100644 --- a/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp +++ b/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp @@ -2845,8 +2845,12 @@ static llvm::Value *castValueToType(CodeGenFunction &CGF, llvm::Value *Val, Address CastItem = CGF.CreateMemTemp(CastTy); Address ValCastItem = CGF.Builder.CreatePointerBitCastOrAddrSpaceCast( CastItem, Val->getType()->getPointerTo(CastItem.getAddressSpace())); - CGF.EmitStoreOfScalar(Val, ValCastItem, /*Volatile=*/false, ValTy); - return CGF.EmitLoadOfScalar(CastItem, /*Volatile=*/false, CastTy, Loc); + CGF.EmitStoreOfScalar(Val, ValCastItem, /*Volatile=*/false, ValTy, +LValueBaseInfo(AlignmentSource::Type), +TBAAAccessInfo()); + return CGF.EmitLoadOfScalar(CastItem, /*Volatile=*/false, CastTy, Loc, + LValueBaseInfo(AlignmentSource::Type), + TBAAAccessInfo()); } /// This function creates calls to one of two shuffle functions to copy @@ -2933,9 +2937,14 @@ static void shuffleAndStore(CodeGenFunction &CGF, Address SrcAddr, ThenBB, ExitBB); CGF.EmitBlock(ThenBB); llvm::Value *Res = createRuntimeShuffleFunction( - CGF, CGF.EmitLoadOfScalar(Ptr, /*Volatile=*/false, IntType, Loc), + CGF, + CGF.EmitLoadOfScalar(Ptr, /*Volatile=*/false, IntType, Loc, + LValueBaseInfo(AlignmentSource::Type), + TBAAAccessInfo()), IntType, Offset, Loc); - CGF.EmitStoreOfScalar(Res, ElemPtr, /*Volatile=*/false, IntType); + CGF.EmitStoreOfScalar(Res, ElemPtr, /*Volatile=*/false, IntType, +LValueBaseInfo(AlignmentSource::Type), +TBAAAccessInfo()); Address LocalPtr = Bld.CreateConstGEP(Ptr, 1); Address LocalElemPtr = Bld.CreateConstGEP(ElemPtr, 1); PhiSrc->addIncoming(LocalPtr.getPointer(), ThenBB); @@ -2944,9 +2953,14 @@ static void shuffleAndStore(CodeGenFunction &CGF, Address SrcAddr, CGF.EmitBlock(ExitBB); } else { llvm::Value *Res = createRuntimeShuffleFunction( - CGF, CGF.EmitLoadOfScalar(Ptr, /*Volatile=*/false, IntType, Loc), + CGF, + CGF.EmitLoadOfScalar(Ptr, /*Volatile=*/false, IntType, Loc, + LValueBaseInfo(AlignmentSource::Type), + TBAAAccessInfo()), IntType, Offset, Loc); - CGF.EmitStoreOfScalar(Res, ElemPtr, /*Volatile=*/false, IntType); + CGF.EmitStoreOfScalar(Res, ElemPtr, /*Volatile=*/false, IntType, +LValueBaseInfo(AlignmentSource::Type), +TBAAAccessInfo()); Ptr = Bld.CreateConstGEP(Ptr, 1); ElemPtr = Bld.CreateConstGEP(ElemPtr, 1); } @@ -3100,12 +3114,14 @@ static void emitReductionListCopy( } else { switch 
(CGF.getEvaluationKind(Private->getType())) { case TEK_Scalar: { -llvm::Value *Elem = -CGF.EmitLoadOfScalar(SrcElementAddr, /*Volatile=*/false, - Private->getType(), Private->getExprLoc()); +llvm::Value *Elem = CGF.EmitLoadOfScalar( +SrcElementAddr, /*Volatile=*/false, Private->getType(), +Private->getExprLoc(), LValueBaseInfo(AlignmentSource::Type), +TBAAAccessInfo()); // Store the source element value to the dest element address. -CGF.EmitStoreOfScalar(Elem, DestElementAddr, /*Volatile=*/false, - Private->getType()); +CGF.EmitStoreOfScalar( +Elem, DestElementAddr, /*Volatile=*/false, Private->getType(), +
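At the source level the problem is classic type punning: the shuffle/reduce helpers store an element through one type and reload it through another. A rough stand-alone illustration (not from the patch) of why TBAA is unsafe here:

double reload_bits(float in) {
  char buf[sizeof(double)] = {0};
  *(float *)buf = in;    // store through a 'float' lvalue
  return *(double *)buf; // reload through a 'double' lvalue
}

With TBAA attached to both accesses they are "known" not to alias, so later passes may drop the store or fold the load, which is the miscompile seen in PR46156; emitting the accesses with an empty TBAAAccessInfo() keeps them conservatively may-alias.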
[clang] abe71b7 - [OpenMP][NFC] Delete dead code
Author: Johannes Doerfert Date: 2023-11-06T11:50:41-08:00 New Revision: abe71b77f9ac2fc689e899fa6aa2486120c53912 URL: https://github.com/llvm/llvm-project/commit/abe71b77f9ac2fc689e899fa6aa2486120c53912 DIFF: https://github.com/llvm/llvm-project/commit/abe71b77f9ac2fc689e899fa6aa2486120c53912.diff LOG: [OpenMP][NFC] Delete dead code Added: Modified: clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp Removed: diff --git a/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp b/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp index 335ccec6455fc46..229668c8ba5db20 100644 --- a/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp +++ b/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp @@ -1574,11 +1574,6 @@ enum CopyAction : unsigned { RemoteLaneToThread, // ThreadCopy: Make a copy of a Reduce list on the thread's stack. ThreadCopy, - // ThreadToScratchpad: Copy a team-reduced array to the scratchpad. - ThreadToScratchpad, - // ScratchpadToThread: Copy from a scratchpad array in global memory - // containing team-reduced data to a thread's stack. - ScratchpadToThread, }; } // namespace @@ -1600,13 +1595,10 @@ static void emitReductionListCopy( CGBuilderTy &Bld = CGF.Builder; llvm::Value *RemoteLaneOffset = CopyOptions.RemoteLaneOffset; - llvm::Value *ScratchpadIndex = CopyOptions.ScratchpadIndex; - llvm::Value *ScratchpadWidth = CopyOptions.ScratchpadWidth; // Iterates, element-by-element, through the source Reduce list and // make a copy. unsigned Idx = 0; - unsigned Size = Privates.size(); for (const Expr *Private : Privates) { Address SrcElementAddr = Address::invalid(); Address DestElementAddr = Address::invalid(); @@ -1616,10 +1608,6 @@ static void emitReductionListCopy( // Set to true to update the pointer in the dest Reduce list to a // newly created element. bool UpdateDestListPtr = false; -// Increment the src or dest pointer to the scratchpad, for each -// new element. -bool IncrScratchpadSrc = false; -bool IncrScratchpadDest = false; QualType PrivatePtrType = C.getPointerType(Private->getType()); llvm::Type *PrivateLlvmPtrType = CGF.ConvertType(PrivatePtrType); @@ -1655,49 +1643,6 @@ static void emitReductionListCopy( PrivatePtrType->castAs()); break; } -case ThreadToScratchpad: { - // Step 1.1: Get the address for the src element in the Reduce list. - Address SrcElementPtrAddr = Bld.CreateConstArrayGEP(SrcBase, Idx); - SrcElementAddr = CGF.EmitLoadOfPointer( - SrcElementPtrAddr.withElementType(PrivateLlvmPtrType), - PrivatePtrType->castAs()); - - // Step 1.2: Get the address for dest element: - // address = base + index * ElementSizeInChars. - llvm::Value *ElementSizeInChars = CGF.getTypeSize(Private->getType()); - llvm::Value *CurrentOffset = - Bld.CreateNUWMul(ElementSizeInChars, ScratchpadIndex); - llvm::Value *ScratchPadElemAbsolutePtrVal = - Bld.CreateNUWAdd(DestBase.getPointer(), CurrentOffset); - ScratchPadElemAbsolutePtrVal = - Bld.CreateIntToPtr(ScratchPadElemAbsolutePtrVal, CGF.VoidPtrTy); - DestElementAddr = Address(ScratchPadElemAbsolutePtrVal, CGF.Int8Ty, -C.getTypeAlignInChars(Private->getType())); - IncrScratchpadDest = true; - break; -} -case ScratchpadToThread: { - // Step 1.1: Get the address for the src element in the scratchpad. - // address = base + index * ElementSizeInChars. 
- llvm::Value *ElementSizeInChars = CGF.getTypeSize(Private->getType()); - llvm::Value *CurrentOffset = - Bld.CreateNUWMul(ElementSizeInChars, ScratchpadIndex); - llvm::Value *ScratchPadElemAbsolutePtrVal = - Bld.CreateNUWAdd(SrcBase.getPointer(), CurrentOffset); - ScratchPadElemAbsolutePtrVal = - Bld.CreateIntToPtr(ScratchPadElemAbsolutePtrVal, CGF.VoidPtrTy); - SrcElementAddr = Address(ScratchPadElemAbsolutePtrVal, CGF.Int8Ty, - C.getTypeAlignInChars(Private->getType())); - IncrScratchpadSrc = true; - - // Step 1.2: Create a temporary to store the element in the destination - // Reduce list. - DestElementPtrAddr = Bld.CreateConstArrayGEP(DestBase, Idx); - DestElementAddr = - CGF.CreateMemTemp(Private->getType(), ".omp.reduction.element"); - UpdateDestListPtr = true; - break; -} } // Regardless of src and dest of copy, we emit the load of src @@ -1755,39 +1700,6 @@ static void emitReductionListCopy( C.VoidPtrTy); } -// Step 4.1: Increment SrcBase/DestBase so that it points to the starting -// address of the next element in scratchpad memory, unless we're currently -// processing the last one. Memory alignment is also taken care of here. -if ((IncrScratchpadDest || IncrScratchpadSrc)
[clang] 921bd29 - [OpenMP] Remove alignment for global <-> local reduction functions
Author: Johannes Doerfert Date: 2023-11-06T11:50:41-08:00 New Revision: 921bd299134fbe17c676b2486af269e18281def4 URL: https://github.com/llvm/llvm-project/commit/921bd299134fbe17c676b2486af269e18281def4 DIFF: https://github.com/llvm/llvm-project/commit/921bd299134fbe17c676b2486af269e18281def4.diff LOG: [OpenMP] Remove alignment for global <-> local reduction functions The alignment did likely not help much but increases the memory requirement. Note that half of the affected accesses are all performed by a single thread in each block. The reads are by consecutive threads in a single block. Added: Modified: clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp clang/test/OpenMP/nvptx_teams_reduction_codegen.cpp clang/test/OpenMP/reduction_implicit_map.cpp clang/test/OpenMP/target_teams_generic_loop_codegen.cpp Removed: diff --git a/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp b/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp index 229668c8ba5db20..b4e067ff497a085 100644 --- a/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp +++ b/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp @@ -85,18 +85,6 @@ class ExecutionRuntimeModesRAII { ~ExecutionRuntimeModesRAII() { ExecMode = SavedExecMode; } }; -/// GPU Configuration: This information can be derived from cuda registers, -/// however, providing compile time constants helps generate more efficient -/// code. For all practical purposes this is fine because the configuration -/// is the same for all known NVPTX architectures. -enum MachineConfiguration : unsigned { - /// See "llvm/Frontend/OpenMP/OMPGridValues.h" for various related target - /// specific Grid Values like GV_Warp_Size, GV_Slot_Size - - /// Global memory alignment for performance. - GlobalMemoryAlignment = 128, -}; - static const ValueDecl *getPrivateItem(const Expr *RefExpr) { RefExpr = RefExpr->IgnoreParens(); if (const auto *ASE = dyn_cast(RefExpr)) { @@ -119,31 +107,23 @@ static const ValueDecl *getPrivateItem(const Expr *RefExpr) { return cast(ME->getMemberDecl()->getCanonicalDecl()); } - static RecordDecl *buildRecordForGlobalizedVars( ASTContext &C, ArrayRef EscapedDecls, ArrayRef EscapedDeclsForTeams, llvm::SmallDenseMap -&MappedDeclsFields, int BufSize) { +&MappedDeclsFields, +int BufSize) { using VarsDataTy = std::pair; if (EscapedDecls.empty() && EscapedDeclsForTeams.empty()) return nullptr; SmallVector GlobalizedVars; for (const ValueDecl *D : EscapedDecls) -GlobalizedVars.emplace_back( -CharUnits::fromQuantity(std::max( -C.getDeclAlign(D).getQuantity(), -static_cast(GlobalMemoryAlignment))), -D); +GlobalizedVars.emplace_back(C.getDeclAlign(D), D); for (const ValueDecl *D : EscapedDeclsForTeams) GlobalizedVars.emplace_back(C.getDeclAlign(D), D); - llvm::stable_sort(GlobalizedVars, [](VarsDataTy L, VarsDataTy R) { -return L.first > R.first; - }); // Build struct _globalized_locals_ty { - // /* globalized vars */[WarSize] align (max(decl_align, - // GlobalMemoryAlignment)) + // /* globalized vars */[WarSize] align (decl_align) // /* globalized vars */ for EscapedDeclsForTeams // }; RecordDecl *GlobalizedRD = C.buildImplicitRecord("_globalized_locals_ty"); @@ -182,9 +162,7 @@ static RecordDecl *buildRecordForGlobalizedVars( /*BW=*/nullptr, /*Mutable=*/false, /*InitStyle=*/ICIS_NoInit); Field->setAccess(AS_public); - llvm::APInt Align(32, std::max(C.getDeclAlign(VD).getQuantity(), - static_cast( - GlobalMemoryAlignment))); + llvm::APInt Align(32, Pair.first.getQuantity()); Field->addAttr(AlignedAttr::CreateImplicit( C, /*IsAlignmentExpr=*/true, IntegerLiteral::Create(C, Align, diff --git 
a/clang/test/OpenMP/nvptx_teams_reduction_codegen.cpp b/clang/test/OpenMP/nvptx_teams_reduction_codegen.cpp index 32b67762a1e1e6b..27af206098c10b1 100644 --- a/clang/test/OpenMP/nvptx_teams_reduction_codegen.cpp +++ b/clang/test/OpenMP/nvptx_teams_reduction_codegen.cpp @@ -253,7 +253,7 @@ int bar(int n){ // CHECK1-NEXT:[[E:%.*]] = getelementptr inbounds [[STRUCT__GLOBALIZED_LOCALS_TY:%.*]], ptr [[TMP4]], i32 0, i32 0 // CHECK1-NEXT:[[TMP8:%.*]] = getelementptr inbounds [1024 x double], ptr [[E]], i32 0, i32 [[TMP5]] // CHECK1-NEXT:[[TMP9:%.*]] = load double, ptr [[TMP7]], align 8 -// CHECK1-NEXT:store double [[TMP9]], ptr [[TMP8]], align 128 +// CHECK1-NEXT:store double [[TMP9]], ptr [[TMP8]], align 8 // CHECK1-NEXT:ret void // // @@ -294,7 +294,7 @@ int bar(int n){ // CHECK1-NEXT:[[TMP7:%.*]] = load ptr, ptr [[TMP6]], align 8 // CHECK1-NEXT:[[E:%.*]] = getelementptr inbounds [[STRUCT__GLOBALIZED_LOCALS_TY:%.*]], ptr [[TMP4]], i32 0, i32 0 //
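The test churn above is the visible effect: fields of the globalized record keep their natural alignment instead of a forced 128-byte one. A condensed sketch (hypothetical struct; 1024 matches the buffer size seen in the checks):

// before: each globalized field aligned up to GlobalMemoryAlignment
struct _globalized_locals_before { alignas(128) double e[1024]; };
// after: plain declaration alignment, hence 'store double ..., align 8'
struct _globalized_locals_after { double e[1024]; };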
[clang] [llvm] [openmp] [OpenMP] Rework handling of global ctor/dtors in OpenMP (PR #71739)
@@ -2627,6 +2637,48 @@ struct AMDGPUDeviceTy : public GenericDeviceTy, AMDGenericDeviceTy { using AMDGPUEventRef = AMDGPUResourceRef; using AMDGPUEventManagerTy = GenericDeviceResourceManagerTy; + /// Common method to invoke a single threaded constructor or destructor + /// kernel by name. + Error callGlobalCtorDtorCommon(GenericPluginTy &Plugin, DeviceImageTy &Image, + const char *Name) { +// Perform a quick check for the named kernel in the image. The kernel +// should be created by the 'amdgpu-lower-ctor-dtor' pass. +GenericGlobalHandlerTy &Handler = Plugin.getGlobalHandler(); +GlobalTy Global(Name, sizeof(void *)); +if (auto Err = Handler.getGlobalMetadataFromImage(*this, Image, Global)) { + consumeError(std::move(Err)); + return Plugin::success(); +} + +// Allocate and construct the AMDGPU kernel. +GenericKernelTy *AMDGPUKernel = Plugin.allocate(); +if (!AMDGPUKernel) + return Plugin::error("Failed to allocate memory for AMDGPU kernel"); + +new (AMDGPUKernel) AMDGPUKernelTy(Name); +if (auto Err = AMDGPUKernel->initImpl(*this, Image)) + return std::move(Err); + +auto *AsyncInfoPtr = Plugin.allocate<__tgt_async_info>(); jdoerfert wrote: Here and above you don't need plugin allocate. That's only for things that outlive the function, neither the Kernel nor the AsyncInfo will. They should be stack objects. That said, you should not need an aysync_info ptr anyway. AsyncInfoWrapperTy should work standalone and it has all the functions we need. https://github.com/llvm/llvm-project/pull/71739 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
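A condensed sketch of the suggested shape (simplified, launch details elided): the kernel lives on the stack and the async info is handled by AsyncInfoWrapperTy instead of a separately allocated __tgt_async_info:

  AMDGPUKernelTy AMDGPUKernel(Name); // stack object, no Plugin.allocate()
  if (auto Err = AMDGPUKernel.init(*this, Image))
    return std::move(Err);

  // AsyncInfoWrapperTy works standalone; no explicit __tgt_async_info needed.
  AsyncInfoWrapperTy AsyncInfoWrapper(*this, /*AsyncInfoPtr=*/nullptr);
  // ... launch the kernel through AsyncInfoWrapper, then finalize it ...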
[llvm] [openmp] [clang] [OpenMP] Rework handling of global ctor/dtors in OpenMP (PR #71739)
@@ -1038,6 +1048,109 @@ struct CUDADeviceTy : public GenericDeviceTy { using CUDAStreamManagerTy = GenericDeviceResourceManagerTy; using CUDAEventManagerTy = GenericDeviceResourceManagerTy; + Error callGlobalCtorDtorCommon(GenericPluginTy &Plugin, DeviceImageTy &Image, + bool IsCtor) { +// Perform a quick check for the named kernel in the image. The kernel +// should be created by the 'nvptx-lower-ctor-dtor' pass. +GenericGlobalHandlerTy &Handler = Plugin.getGlobalHandler(); +GlobalTy Global(IsCtor ? "nvptx$device$init" : "nvptx$device$fini", +sizeof(void *)); +if (auto Err = Handler.getGlobalMetadataFromImage(*this, Image, Global)) { + consumeError(std::move(Err)); + return Plugin::success(); +} + +// The Nvidia backend cannot handle creating the ctor / dtor array +// automatically so we must create it ourselves. The backend will emit +// several globals that contain function pointers we can call. These are +// prefixed with a known name due to Nvidia's lack of section support. +const ELF64LEObjectFile *ELFObj = +Handler.getOrCreateELFObjectFile(*this, Image); +if (!ELFObj) + return Plugin::error("Unable to create ELF object for image %p", + Image.getStart()); + +// Search for all symbols that contain a constructor or destructor. +SmallVector> Funcs; +for (ELFSymbolRef Sym : ELFObj->symbols()) { + auto NameOrErr = Sym.getName(); + if (!NameOrErr) +return NameOrErr.takeError(); + + if (!NameOrErr->starts_with(IsCtor ? "__init_array_object_" + : "__fini_array_object_")) +continue; + + uint16_t priority; + if (NameOrErr->rsplit('_').second.getAsInteger(10, priority)) +return Plugin::error("Invalid priority for constructor or destructor"); + + Funcs.emplace_back(*NameOrErr, priority); +} + +// Sort the created array to be in priority order. +llvm::sort(Funcs, [=](auto x, auto y) { return x.second < y.second; }); + +// Allocate a buffer to store all of the known constructor / destructor +// functions in so we can iterate them on the device. +void *Buffer = +allocate(Funcs.size() * sizeof(void *), nullptr, TARGET_ALLOC_SHARED); jdoerfert wrote: Do we really need to used shared/managed memory here? https://github.com/llvm/llvm-project/pull/71739 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[openmp] [llvm] [clang] [OpenMP] Rework handling of global ctor/dtors in OpenMP (PR #71739)
@@ -1038,6 +1048,109 @@ struct CUDADeviceTy : public GenericDeviceTy { using CUDAStreamManagerTy = GenericDeviceResourceManagerTy; using CUDAEventManagerTy = GenericDeviceResourceManagerTy; + Error callGlobalCtorDtorCommon(GenericPluginTy &Plugin, DeviceImageTy &Image, + bool IsCtor) { +// Perform a quick check for the named kernel in the image. The kernel +// should be created by the 'nvptx-lower-ctor-dtor' pass. +GenericGlobalHandlerTy &Handler = Plugin.getGlobalHandler(); +GlobalTy Global(IsCtor ? "nvptx$device$init" : "nvptx$device$fini", +sizeof(void *)); +if (auto Err = Handler.getGlobalMetadataFromImage(*this, Image, Global)) { + consumeError(std::move(Err)); + return Plugin::success(); +} + +// The Nvidia backend cannot handle creating the ctor / dtor array +// automatically so we must create it ourselves. The backend will emit +// several globals that contain function pointers we can call. These are +// prefixed with a known name due to Nvidia's lack of section support. +const ELF64LEObjectFile *ELFObj = +Handler.getOrCreateELFObjectFile(*this, Image); +if (!ELFObj) + return Plugin::error("Unable to create ELF object for image %p", + Image.getStart()); + +// Search for all symbols that contain a constructor or destructor. +SmallVector> Funcs; +for (ELFSymbolRef Sym : ELFObj->symbols()) { + auto NameOrErr = Sym.getName(); + if (!NameOrErr) +return NameOrErr.takeError(); + + if (!NameOrErr->starts_with(IsCtor ? "__init_array_object_" + : "__fini_array_object_")) +continue; + + uint16_t priority; + if (NameOrErr->rsplit('_').second.getAsInteger(10, priority)) +return Plugin::error("Invalid priority for constructor or destructor"); + + Funcs.emplace_back(*NameOrErr, priority); +} + +// Sort the created array to be in priority order. +llvm::sort(Funcs, [=](auto x, auto y) { return x.second < y.second; }); + +// Allocate a buffer to store all of the known constructor / destructor +// functions in so we can iterate them on the device. +void *Buffer = +allocate(Funcs.size() * sizeof(void *), nullptr, TARGET_ALLOC_SHARED); +if (!Buffer) + return Plugin::error("Failed to allocate memory for global buffer"); + +auto *GlobalPtrStart = reinterpret_cast(Buffer); +auto *GlobalPtrStop = reinterpret_cast(Buffer) + Funcs.size(); + +std::size_t Idx = 0; +for (auto [Name, Priority] : Funcs) { + GlobalTy FunctionAddr(Name.str(), sizeof(void *), &GlobalPtrStart[Idx++]); + if (auto Err = Handler.readGlobalFromDevice(*this, Image, FunctionAddr)) +return std::move(Err); +} + +// Copy the created buffer to the appropriate symbols so the kernel can +// iterate through them. +GlobalTy StartGlobal(IsCtor ? "__init_array_start" : "__fini_array_start", + sizeof(void *), &GlobalPtrStart); +if (auto Err = Handler.writeGlobalToDevice(*this, Image, StartGlobal)) + return std::move(Err); + +GlobalTy StopGlobal(IsCtor ? "__init_array_end" : "__fini_array_end", +sizeof(void *), &GlobalPtrStop); +if (auto Err = Handler.writeGlobalToDevice(*this, Image, StopGlobal)) + return std::move(Err); + +// Launch the kernel to execute the functions in the buffer. +GenericKernelTy *CUDAKernel = Plugin.allocate(); jdoerfert wrote: Same as with AMD. https://github.com/llvm/llvm-project/pull/71739 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[llvm] [clang] [openmp] [OpenMP] Rework handling of global ctor/dtors in OpenMP (PR #71739)
@@ -1038,6 +1048,109 @@ struct CUDADeviceTy : public GenericDeviceTy { using CUDAStreamManagerTy = GenericDeviceResourceManagerTy; using CUDAEventManagerTy = GenericDeviceResourceManagerTy; + Error callGlobalCtorDtorCommon(GenericPluginTy &Plugin, DeviceImageTy &Image, + bool IsCtor) { +// Perform a quick check for the named kernel in the image. The kernel +// should be created by the 'nvptx-lower-ctor-dtor' pass. +GenericGlobalHandlerTy &Handler = Plugin.getGlobalHandler(); +GlobalTy Global(IsCtor ? "nvptx$device$init" : "nvptx$device$fini", +sizeof(void *)); +if (auto Err = Handler.getGlobalMetadataFromImage(*this, Image, Global)) { + consumeError(std::move(Err)); + return Plugin::success(); +} + +// The Nvidia backend cannot handle creating the ctor / dtor array +// automatically so we must create it ourselves. The backend will emit +// several globals that contain function pointers we can call. These are +// prefixed with a known name due to Nvidia's lack of section support. +const ELF64LEObjectFile *ELFObj = +Handler.getOrCreateELFObjectFile(*this, Image); +if (!ELFObj) + return Plugin::error("Unable to create ELF object for image %p", + Image.getStart()); + +// Search for all symbols that contain a constructor or destructor. +SmallVector> Funcs; +for (ELFSymbolRef Sym : ELFObj->symbols()) { + auto NameOrErr = Sym.getName(); + if (!NameOrErr) +return NameOrErr.takeError(); + + if (!NameOrErr->starts_with(IsCtor ? "__init_array_object_" + : "__fini_array_object_")) +continue; + + uint16_t priority; + if (NameOrErr->rsplit('_').second.getAsInteger(10, priority)) +return Plugin::error("Invalid priority for constructor or destructor"); + + Funcs.emplace_back(*NameOrErr, priority); +} + +// Sort the created array to be in priority order. +llvm::sort(Funcs, [=](auto x, auto y) { return x.second < y.second; }); + +// Allocate a buffer to store all of the known constructor / destructor +// functions in so we can iterate them on the device. +void *Buffer = +allocate(Funcs.size() * sizeof(void *), nullptr, TARGET_ALLOC_SHARED); +if (!Buffer) + return Plugin::error("Failed to allocate memory for global buffer"); + +auto *GlobalPtrStart = reinterpret_cast(Buffer); +auto *GlobalPtrStop = reinterpret_cast(Buffer) + Funcs.size(); + +std::size_t Idx = 0; +for (auto [Name, Priority] : Funcs) { + GlobalTy FunctionAddr(Name.str(), sizeof(void *), &GlobalPtrStart[Idx++]); + if (auto Err = Handler.readGlobalFromDevice(*this, Image, FunctionAddr)) +return std::move(Err); +} + +// Copy the created buffer to the appropriate symbols so the kernel can +// iterate through them. +GlobalTy StartGlobal(IsCtor ? "__init_array_start" : "__fini_array_start", + sizeof(void *), &GlobalPtrStart); +if (auto Err = Handler.writeGlobalToDevice(*this, Image, StartGlobal)) + return std::move(Err); + +GlobalTy StopGlobal(IsCtor ? "__init_array_end" : "__fini_array_end", +sizeof(void *), &GlobalPtrStop); +if (auto Err = Handler.writeGlobalToDevice(*this, Image, StopGlobal)) + return std::move(Err); + +// Launch the kernel to execute the functions in the buffer. +GenericKernelTy *CUDAKernel = Plugin.allocate(); +if (!CUDAKernel) + return Plugin::error("Failed to allocate memory for CUDA kernel"); + +new (CUDAKernel) +CUDAKernelTy(IsCtor ? "nvptx$device$init" : "nvptx$device$fini"); jdoerfert wrote: > IsCtor ? "nvptx$device$init" : "nvptx$device$fini" Do this once, other such ternaries as well. 
https://github.com/llvm/llvm-project/pull/71739 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
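The suggestion amounts to computing the name a single time and reusing it, e.g. (sketch of the quoted code with the ternary hoisted):

  const char *KernelName = IsCtor ? "nvptx$device$init" : "nvptx$device$fini";
  GlobalTy Global(KernelName, sizeof(void *));
  // ...
  new (CUDAKernel) CUDAKernelTy(KernelName);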
[openmp] [clang] [llvm] [OpenMP] Rework handling of global ctor/dtors in OpenMP (PR #71739)
@@ -313,12 +313,18 @@ static void registerGlobalCtorsDtorsForImage(__tgt_bin_desc *Desc, DP("Adding ctor " DPxMOD " to the pending list.\n", DPxPTR(Entry->addr)); Device.PendingCtorsDtors[Desc].PendingCtors.push_back(Entry->addr); +MESSAGE("Calling deprecated constructor for entry %s will be removed " jdoerfert wrote: Add "WARNING: ", or similar. https://github.com/llvm/llvm-project/pull/71739 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[llvm] [clang] [openmp] [OpenMP] Rework handling of global ctor/dtors in OpenMP (PR #71739)
@@ -1038,6 +1048,109 @@ struct CUDADeviceTy : public GenericDeviceTy { using CUDAStreamManagerTy = GenericDeviceResourceManagerTy; using CUDAEventManagerTy = GenericDeviceResourceManagerTy; + Error callGlobalCtorDtorCommon(GenericPluginTy &Plugin, DeviceImageTy &Image, + bool IsCtor) { +// Perform a quick check for the named kernel in the image. The kernel +// should be created by the 'nvptx-lower-ctor-dtor' pass. +GenericGlobalHandlerTy &Handler = Plugin.getGlobalHandler(); +GlobalTy Global(IsCtor ? "nvptx$device$init" : "nvptx$device$fini", +sizeof(void *)); +if (auto Err = Handler.getGlobalMetadataFromImage(*this, Image, Global)) { + consumeError(std::move(Err)); + return Plugin::success(); +} + +// The Nvidia backend cannot handle creating the ctor / dtor array +// automatically so we must create it ourselves. The backend will emit +// several globals that contain function pointers we can call. These are +// prefixed with a known name due to Nvidia's lack of section support. +const ELF64LEObjectFile *ELFObj = +Handler.getOrCreateELFObjectFile(*this, Image); +if (!ELFObj) + return Plugin::error("Unable to create ELF object for image %p", + Image.getStart()); + +// Search for all symbols that contain a constructor or destructor. +SmallVector> Funcs; +for (ELFSymbolRef Sym : ELFObj->symbols()) { + auto NameOrErr = Sym.getName(); + if (!NameOrErr) +return NameOrErr.takeError(); + + if (!NameOrErr->starts_with(IsCtor ? "__init_array_object_" + : "__fini_array_object_")) +continue; + + uint16_t priority; + if (NameOrErr->rsplit('_').second.getAsInteger(10, priority)) +return Plugin::error("Invalid priority for constructor or destructor"); + + Funcs.emplace_back(*NameOrErr, priority); +} + +// Sort the created array to be in priority order. +llvm::sort(Funcs, [=](auto x, auto y) { return x.second < y.second; }); + +// Allocate a buffer to store all of the known constructor / destructor +// functions in so we can iterate them on the device. +void *Buffer = +allocate(Funcs.size() * sizeof(void *), nullptr, TARGET_ALLOC_SHARED); +if (!Buffer) + return Plugin::error("Failed to allocate memory for global buffer"); + +auto *GlobalPtrStart = reinterpret_cast(Buffer); +auto *GlobalPtrStop = reinterpret_cast(Buffer) + Funcs.size(); + +std::size_t Idx = 0; +for (auto [Name, Priority] : Funcs) { + GlobalTy FunctionAddr(Name.str(), sizeof(void *), &GlobalPtrStart[Idx++]); + if (auto Err = Handler.readGlobalFromDevice(*this, Image, FunctionAddr)) +return std::move(Err); +} + +// Copy the created buffer to the appropriate symbols so the kernel can +// iterate through them. +GlobalTy StartGlobal(IsCtor ? "__init_array_start" : "__fini_array_start", + sizeof(void *), &GlobalPtrStart); +if (auto Err = Handler.writeGlobalToDevice(*this, Image, StartGlobal)) + return std::move(Err); + +GlobalTy StopGlobal(IsCtor ? "__init_array_end" : "__fini_array_end", +sizeof(void *), &GlobalPtrStop); +if (auto Err = Handler.writeGlobalToDevice(*this, Image, StopGlobal)) + return std::move(Err); + +// Launch the kernel to execute the functions in the buffer. +GenericKernelTy *CUDAKernel = Plugin.allocate(); +if (!CUDAKernel) + return Plugin::error("Failed to allocate memory for CUDA kernel"); + +new (CUDAKernel) +CUDAKernelTy(IsCtor ? "nvptx$device$init" : "nvptx$device$fini"); + +if (auto Err = CUDAKernel->init(*this, Image)) + return std::move(Err); + +AsyncInfoWrapperTy AsyncInfoWrapper(*this, nullptr); + +if (auto Err = initAsyncInfoImpl(AsyncInfoWrapper)) jdoerfert wrote: You shouldn't need this. 
https://github.com/llvm/llvm-project/pull/71739 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [openmp] [OpenMP] Rework handling of global ctor/dtors in OpenMP (PR #71739)
https://github.com/jdoerfert approved this pull request. LG, check my comments. https://github.com/llvm/llvm-project/pull/71739 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [openmp] [OpenMP] Rework handling of global ctor/dtors in OpenMP (PR #71739)
https://github.com/jdoerfert edited https://github.com/llvm/llvm-project/pull/71739 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [openmp] [llvm] [OpenMP] Rework handling of global ctor/dtors in OpenMP (PR #71739)
@@ -2627,6 +2637,38 @@ struct AMDGPUDeviceTy : public GenericDeviceTy, AMDGenericDeviceTy { using AMDGPUEventRef = AMDGPUResourceRef; using AMDGPUEventManagerTy = GenericDeviceResourceManagerTy; + /// Common method to invoke a single threaded constructor or destructor + /// kernel by name. + Error callGlobalCtorDtorCommon(GenericPluginTy &Plugin, DeviceImageTy &Image, + const char *Name) { +// Perform a quick check for the named kernel in the image. The kernel +// should be created by the 'amdgpu-lower-ctor-dtor' pass. +GenericGlobalHandlerTy &Handler = Plugin.getGlobalHandler(); +GlobalTy Global(Name, sizeof(void *)); +if (auto Err = Handler.getGlobalMetadataFromImage(*this, Image, Global)) { + consumeError(std::move(Err)); + return Plugin::success(); +} + +// Allocate and construct the AMDGPU kernel. +AMDGPUKernelTy AMDGPUKernel(Name); +if (auto Err = AMDGPUKernel.initImpl(*this, Image)) jdoerfert wrote: Generally, we should always call the generic entry points, so, init, not initImpl. Assuming you have no specific reason not to. Also below for launch. https://github.com/llvm/llvm-project/pull/71739 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[openmp] [llvm] [clang] [OpenMP] Rework handling of global ctor/dtors in OpenMP (PR #71739)
@@ -0,0 +1,37 @@ +// RUN: %libomptarget-compilexx-run-and-check-generic + +// REQUIRES: libc + +#include + +#pragma omp begin declare target device_type(nohost) + +// CHECK: void ctor1() +// CHECK: void ctor2() +// CHECK: void ctor3() +[[gnu::constructor(101)]] void ctor1() { puts(__PRETTY_FUNCTION__); } +[[gnu::constructor(102)]] void ctor2() { puts(__PRETTY_FUNCTION__); } +[[gnu::constructor(103)]] void ctor3() { puts(__PRETTY_FUNCTION__); } jdoerfert wrote: put the 103 priority between 101 and 102 to actually test sorting. https://github.com/llvm/llvm-project/pull/71739 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
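Concretely, declaring the priorities out of source order makes the runtime sort observable; a sketch of the suggested change:

[[gnu::constructor(101)]] void ctor1() { puts(__PRETTY_FUNCTION__); }
[[gnu::constructor(103)]] void ctor3() { puts(__PRETTY_FUNCTION__); }
[[gnu::constructor(102)]] void ctor2() { puts(__PRETTY_FUNCTION__); }

The CHECK lines stay ctor1, ctor2, ctor3: output follows priority order rather than declaration order, which is what actually exercises the sorting code.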
[llvm] [openmp] [clang] [OpenMP] Rework handling of global ctor/dtors in OpenMP (PR #71739)
@@ -1038,6 +1048,100 @@ struct CUDADeviceTy : public GenericDeviceTy { using CUDAStreamManagerTy = GenericDeviceResourceManagerTy; using CUDAEventManagerTy = GenericDeviceResourceManagerTy; + Error callGlobalCtorDtorCommon(GenericPluginTy &Plugin, DeviceImageTy &Image, + bool IsCtor) { +const char *KernelName = IsCtor ? "nvptx$device$init" : "nvptx$device$fini"; +// Perform a quick check for the named kernel in the image. The kernel +// should be created by the 'nvptx-lower-ctor-dtor' pass. +GenericGlobalHandlerTy &Handler = Plugin.getGlobalHandler(); +GlobalTy Global(KernelName, sizeof(void *)); +if (auto Err = Handler.getGlobalMetadataFromImage(*this, Image, Global)) { + consumeError(std::move(Err)); + return Plugin::success(); +} + +// The Nvidia backend cannot handle creating the ctor / dtor array +// automatically so we must create it ourselves. The backend will emit +// several globals that contain function pointers we can call. These are +// prefixed with a known name due to Nvidia's lack of section support. +const ELF64LEObjectFile *ELFObj = +Handler.getOrCreateELFObjectFile(*this, Image); +if (!ELFObj) + return Plugin::error("Unable to create ELF object for image %p", + Image.getStart()); + +// Search for all symbols that contain a constructor or destructor. +SmallVector> Funcs; +for (ELFSymbolRef Sym : ELFObj->symbols()) { + auto NameOrErr = Sym.getName(); + if (!NameOrErr) +return NameOrErr.takeError(); + + if (!NameOrErr->starts_with(IsCtor ? "__init_array_object_" + : "__fini_array_object_")) +continue; + + uint16_t priority; jdoerfert wrote: s/p/P/ https://github.com/llvm/llvm-project/pull/71739 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[llvm] [clang] [openmp] [OpenMP] Rework handling of global ctor/dtors in OpenMP (PR #71739)
@@ -1038,6 +1048,109 @@ struct CUDADeviceTy : public GenericDeviceTy { using CUDAStreamManagerTy = GenericDeviceResourceManagerTy; using CUDAEventManagerTy = GenericDeviceResourceManagerTy; + Error callGlobalCtorDtorCommon(GenericPluginTy &Plugin, DeviceImageTy &Image, + bool IsCtor) { +// Perform a quick check for the named kernel in the image. The kernel +// should be created by the 'nvptx-lower-ctor-dtor' pass. +GenericGlobalHandlerTy &Handler = Plugin.getGlobalHandler(); +GlobalTy Global(IsCtor ? "nvptx$device$init" : "nvptx$device$fini", +sizeof(void *)); +if (auto Err = Handler.getGlobalMetadataFromImage(*this, Image, Global)) { + consumeError(std::move(Err)); + return Plugin::success(); +} + +// The Nvidia backend cannot handle creating the ctor / dtor array +// automatically so we must create it ourselves. The backend will emit +// several globals that contain function pointers we can call. These are +// prefixed with a known name due to Nvidia's lack of section support. +const ELF64LEObjectFile *ELFObj = +Handler.getOrCreateELFObjectFile(*this, Image); +if (!ELFObj) + return Plugin::error("Unable to create ELF object for image %p", + Image.getStart()); + +// Search for all symbols that contain a constructor or destructor. +SmallVector> Funcs; +for (ELFSymbolRef Sym : ELFObj->symbols()) { + auto NameOrErr = Sym.getName(); + if (!NameOrErr) +return NameOrErr.takeError(); + + if (!NameOrErr->starts_with(IsCtor ? "__init_array_object_" + : "__fini_array_object_")) +continue; + + uint16_t priority; + if (NameOrErr->rsplit('_').second.getAsInteger(10, priority)) +return Plugin::error("Invalid priority for constructor or destructor"); + + Funcs.emplace_back(*NameOrErr, priority); +} + +// Sort the created array to be in priority order. +llvm::sort(Funcs, [=](auto x, auto y) { return x.second < y.second; }); + +// Allocate a buffer to store all of the known constructor / destructor +// functions in so we can iterate them on the device. +void *Buffer = +allocate(Funcs.size() * sizeof(void *), nullptr, TARGET_ALLOC_SHARED); jdoerfert wrote: I'm more worried about systems that do not have support than about the time. If you think it's always supported, we can keep it for now. https://github.com/llvm/llvm-project/pull/71739 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] 7318fe6 - [OpenMP][FIX] Ensure device reduction geps work for multi-var reductions
Author: Johannes Doerfert Date: 2023-11-10T14:34:46-08:00 New Revision: 7318fe633487f9b9187e18c7ebd4c6516ded9a22 URL: https://github.com/llvm/llvm-project/commit/7318fe633487f9b9187e18c7ebd4c6516ded9a22 DIFF: https://github.com/llvm/llvm-project/commit/7318fe633487f9b9187e18c7ebd4c6516ded9a22.diff LOG: [OpenMP][FIX] Ensure device reduction geps work for multi-var reductions If we have more than one reduction variable we need to be consistent wrt. indexing. In 3de645efe30b83ba1b6d7e500486c4f441a17a61 we broke this as the buffer type was reduced to a singleton but the index computation was not adjusted to account for that offset. This fixes it by interleaving the reduction variables properly in a array-of-struct style. We can revert it back to struct-of-array in a follow up if turns out to be a problem. I doubt it since half the accesses should benefit from the locallity this layout offers and only the other half were consecutive before. Added: openmp/libomptarget/test/offloading/multiple_reductions_simple.c Modified: clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp clang/test/OpenMP/nvptx_teams_reduction_codegen.cpp clang/test/OpenMP/target_teams_generic_loop_codegen.cpp openmp/libomptarget/DeviceRTL/src/Reduction.cpp Removed: diff --git a/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp b/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp index a13d74743c3bd3f..abecf5250f4cf96 100644 --- a/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp +++ b/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp @@ -153,9 +153,11 @@ static RecordDecl *buildRecordForGlobalizedVars( Field->addAttr(*I); } } else { - llvm::APInt ArraySize(32, BufSize); - Type = C.getConstantArrayType(Type, ArraySize, nullptr, -ArraySizeModifier::Normal, 0); + if (BufSize > 1) { +llvm::APInt ArraySize(32, BufSize); +Type = C.getConstantArrayType(Type, ArraySize, nullptr, + ArraySizeModifier::Normal, 0); + } Field = FieldDecl::Create( C, GlobalizedRD, Loc, Loc, VD->getIdentifier(), Type, C.getTrivialTypeSourceInfo(Type, SourceLocation()), @@ -2205,8 +2207,7 @@ static llvm::Value *emitListToGlobalCopyFunction( llvm::Value *BufferArrPtr = Bld.CreatePointerBitCastOrAddrSpaceCast( CGF.EmitLoadOfScalar(AddrBufferArg, /*Volatile=*/false, C.VoidPtrTy, Loc), LLVMReductionsBufferTy->getPointerTo()); - llvm::Value *Idxs[] = {llvm::ConstantInt::getNullValue(CGF.Int32Ty), - CGF.EmitLoadOfScalar(CGF.GetAddrOfLocalVar(&IdxArg), + llvm::Value *Idxs[] = {CGF.EmitLoadOfScalar(CGF.GetAddrOfLocalVar(&IdxArg), /*Volatile=*/false, C.IntTy, Loc)}; unsigned Idx = 0; @@ -2224,12 +2225,12 @@ static llvm::Value *emitListToGlobalCopyFunction( const ValueDecl *VD = cast(Private)->getDecl(); // Global = Buffer.VD[Idx]; const FieldDecl *FD = VarFieldMap.lookup(VD); +llvm::Value *BufferPtr = +Bld.CreateInBoundsGEP(LLVMReductionsBufferTy, BufferArrPtr, Idxs); LValue GlobLVal = CGF.EmitLValueForField( -CGF.MakeNaturalAlignAddrLValue(BufferArrPtr, StaticTy), FD); +CGF.MakeNaturalAlignAddrLValue(BufferPtr, StaticTy), FD); Address GlobAddr = GlobLVal.getAddress(CGF); -llvm::Value *BufferPtr = Bld.CreateInBoundsGEP(GlobAddr.getElementType(), - GlobAddr.getPointer(), Idxs); -GlobLVal.setAddress(Address(BufferPtr, +GlobLVal.setAddress(Address(GlobAddr.getPointer(), CGF.ConvertTypeForMem(Private->getType()), GlobAddr.getAlignment())); switch (CGF.getEvaluationKind(Private->getType())) { @@ -2316,8 +2317,7 @@ static llvm::Value *emitListToGlobalReduceFunction( Address ReductionList = CGF.CreateMemTemp(ReductionArrayTy, ".omp.reduction.red_list"); auto IPriv = Privates.begin(); - llvm::Value *Idxs[] = 
{llvm::ConstantInt::getNullValue(CGF.Int32Ty), - CGF.EmitLoadOfScalar(CGF.GetAddrOfLocalVar(&IdxArg), + llvm::Value *Idxs[] = {CGF.EmitLoadOfScalar(CGF.GetAddrOfLocalVar(&IdxArg), /*Volatile=*/false, C.IntTy, Loc)}; unsigned Idx = 0; @@ -2326,12 +2326,13 @@ static llvm::Value *emitListToGlobalReduceFunction( // Global = Buffer.VD[Idx]; const ValueDecl *VD = cast(*IPriv)->getDecl(); const FieldDecl *FD = VarFieldMap.lookup(VD); +llvm::Value *BufferPtr = +Bld.CreateInBoundsGEP(LLVMReductionsBufferTy, BufferArrPtr, Idxs); LValue GlobLVal = CGF.EmitLValueForField( -CGF.MakeNaturalAlignAddrLValue(BufferArrPtr, StaticTy), FD); +CGF.MakeNaturalAlignAddrLValue(Bu
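For two reduction variables the layouts named in the log differ roughly as follows (made-up names; NumTeams stands for the team-reduction buffer size):

constexpr unsigned NumTeams = 1024;

// struct-of-array: one array per variable; variable j of team i needs a
// per-variable base address plus the team index.
struct BufferSoA { double a[NumTeams]; int b[NumTeams]; };

// array-of-struct (what the fix emits): team i's copies are interleaved, so
// a single GEP with the team index reaches all of that team's variables.
struct Slot { double a; int b; };
struct BufferAoS { Slot slots[NumTeams]; };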
[clang-tools-extra] [llvm] [openmp] [clang] [OpenMP] Add extra flags to libomptarget and plugin builds (PR #74520)
https://github.com/jdoerfert updated https://github.com/llvm/llvm-project/pull/74520 >From f505868953d07125f67bcbb79be426a6deee1a13 Mon Sep 17 00:00:00 2001 From: Johannes Doerfert Date: Tue, 5 Dec 2023 12:35:04 -0800 Subject: [PATCH 1/2] [OpenMP] Add extra flags to libomptarget and plugin builds --- openmp/libomptarget/CMakeLists.txt| 19 +++ .../plugins-nextgen/common/CMakeLists.txt | 3 +++ openmp/libomptarget/src/CMakeLists.txt| 3 +++ 3 files changed, 25 insertions(+) diff --git a/openmp/libomptarget/CMakeLists.txt b/openmp/libomptarget/CMakeLists.txt index 972b887c7c952..fe895d5bc3254 100644 --- a/openmp/libomptarget/CMakeLists.txt +++ b/openmp/libomptarget/CMakeLists.txt @@ -75,6 +75,25 @@ if(LIBOMPTARGET_ENABLE_DEBUG) add_definitions(-DOMPTARGET_DEBUG) endif() +# No exceptions and no RTTI, except if requested. +set(offload_compile_flags -fno-exceptions) +if(NOT LLVM_ENABLE_RTTI) + set(offload_compile_flags ${offload_compile_flags} -fno-rtti) +endif() + +# If LTO is not explicitly disabled we check if we can enable it and do so. +set(LIBOMPTARGET_USE_LTO TRUE CACHE BOOL "Use LTO for the offload runtimes if available") +if (LIBOMPTARGET_USE_LTO) + include(CheckIPOSupported) + check_ipo_supported(RESULT use_lto OUTPUT output) + if(use_lto) + set(offload_compile_flags ${offload_compile_flags} -flto) + set(offload_link_flags ${offload_link_flags} -flto) + else() + message(WARNING "LTO is not supported: ${output}") + endif() +endif() + # OMPT support for libomptarget # Follow host OMPT support and check if host support has been requested. # LIBOMP_HAVE_OMPT_SUPPORT indicates whether host OMPT support has been implemented. diff --git a/openmp/libomptarget/plugins-nextgen/common/CMakeLists.txt b/openmp/libomptarget/plugins-nextgen/common/CMakeLists.txt index 5b332ed3d2f41..8ae3ff2a6d291 100644 --- a/openmp/libomptarget/plugins-nextgen/common/CMakeLists.txt +++ b/openmp/libomptarget/plugins-nextgen/common/CMakeLists.txt @@ -88,6 +88,9 @@ target_compile_definitions(PluginCommon PRIVATE DEBUG_PREFIX="PluginInterface" ) +target_compile_options(PluginCommon PUBLIC ${offload_compile_flags}) +target_link_options(PluginCommon PUBLIC ${offload_link_flags}) + target_include_directories(PluginCommon PRIVATE ${LIBOMPTARGET_INCLUDE_DIR} diff --git a/openmp/libomptarget/src/CMakeLists.txt b/openmp/libomptarget/src/CMakeLists.txt index 7c311f738ac8e..253e9f0aa176f 100644 --- a/openmp/libomptarget/src/CMakeLists.txt +++ b/openmp/libomptarget/src/CMakeLists.txt @@ -55,6 +55,9 @@ target_compile_definitions(omptarget PRIVATE DEBUG_PREFIX="omptarget" ) +target_compile_options(omptarget PUBLIC ${offload_compile_flags}) +target_link_options(omptarget PUBLIC ${offload_link_flags}) + # libomptarget.so needs to be aware of where the plugins live as they # are now separated in the build directory. set_target_properties(omptarget PROPERTIES >From f96dc0a43075dcf9700b7813550ed687136b0d0a Mon Sep 17 00:00:00 2001 From: Johannes Doerfert Date: Mon, 11 Dec 2023 10:34:28 -0800 Subject: [PATCH 2/2] Update CMakeLists.txt --- openmp/libomptarget/CMakeLists.txt | 18 +- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/openmp/libomptarget/CMakeLists.txt b/openmp/libomptarget/CMakeLists.txt index fe895d5bc3254..21ecb6ddba3dc 100644 --- a/openmp/libomptarget/CMakeLists.txt +++ b/openmp/libomptarget/CMakeLists.txt @@ -78,20 +78,20 @@ endif() # No exceptions and no RTTI, except if requested. 
set(offload_compile_flags -fno-exceptions) if(NOT LLVM_ENABLE_RTTI) - set(offload_compile_flags ${offload_compile_flags} -fno-rtti) + set(offload_compile_flags ${offload_compile_flags} -fno-rtti) endif() # If LTO is not explicitly disabled we check if we can enable it and do so. set(LIBOMPTARGET_USE_LTO TRUE CACHE BOOL "Use LTO for the offload runtimes if available") if (LIBOMPTARGET_USE_LTO) - include(CheckIPOSupported) - check_ipo_supported(RESULT use_lto OUTPUT output) - if(use_lto) - set(offload_compile_flags ${offload_compile_flags} -flto) - set(offload_link_flags ${offload_link_flags} -flto) - else() - message(WARNING "LTO is not supported: ${output}") - endif() + include(CheckIPOSupported) + check_ipo_supported(RESULT use_lto OUTPUT output) + if(use_lto) +set(offload_compile_flags ${offload_compile_flags} -flto) +set(offload_link_flags ${offload_link_flags} -flto) + else() +message(WARNING "LTO is not supported: ${output}") + endif() endif() # OMPT support for libomptarget ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang-tools-extra] [llvm] [openmp] [clang] [OpenMP] Add extra flags to libomptarget and plugin builds (PR #74520)
https://github.com/jdoerfert closed https://github.com/llvm/llvm-project/pull/74520 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[llvm] [clang] [Offload] Initial support for registering offloading entries on COFF targets (PR #72697)
https://github.com/jdoerfert edited https://github.com/llvm/llvm-project/pull/72697 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[llvm] [clang] [Offload] Initial support for registering offloading entries on COFF targets (PR #72697)
@@ -62,35 +63,51 @@ void offloading::emitOffloadingEntry(Module &M, Constant *Addr, StringRef Name, M.getDataLayout().getDefaultGlobalsAddressSpace()); // The entry has to be created in the section the linker expects it to be. - Entry->setSection(SectionName); + if (Triple.isOSBinFormatCOFF()) +Entry->setSection((SectionName + "$OE").str()); + else +Entry->setSection(SectionName); Entry->setAlignment(Align(1)); } std::pair offloading::getOffloadEntryArray(Module &M, StringRef SectionName) { - auto *EntriesB = - new GlobalVariable(M, ArrayType::get(getEntryTy(M), 0), - /*isConstant=*/true, GlobalValue::ExternalLinkage, - /*Initializer=*/nullptr, "__start_" + SectionName); + llvm::Triple Triple(M.getTargetTriple()); + + auto *ZeroInitilaizer = + ConstantAggregateZero::get(ArrayType::get(getEntryTy(M), 0u)); + auto *EntryInit = Triple.isOSBinFormatCOFF() ? ZeroInitilaizer : nullptr; + auto *EntryType = Triple.isOSBinFormatCOFF() +? ZeroInitilaizer->getType() +: ArrayType::get(getEntryTy(M), 0); jdoerfert wrote: I don't see why we need the ternary here, aren't both options the same? https://github.com/llvm/llvm-project/pull/72697 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
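A minimal sketch of the simplification suggested above, assuming both branches of the type ternary really are equivalent (illustrative only, not part of the patch):

  // The quoted ZeroInitilaizer is built from ArrayType::get(getEntryTy(M), 0u),
  // so its type is the same array type the else-branch computes; the ternary
  // for the type can be dropped and only the initializer kept conditional.
  auto *EntryArrayTy = ArrayType::get(getEntryTy(M), 0);
  auto *ZeroInitializer = ConstantAggregateZero::get(EntryArrayTy);
  auto *EntryInit = Triple.isOSBinFormatCOFF() ? ZeroInitializer : nullptr;
  auto *EntryType = EntryArrayTy;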
[llvm] [clang] [Offload] Initial support for registering offloading entries on COFF targets (PR #72697)
@@ -62,35 +63,51 @@ void offloading::emitOffloadingEntry(Module &M, Constant *Addr, StringRef Name, M.getDataLayout().getDefaultGlobalsAddressSpace()); // The entry has to be created in the section the linker expects it to be. - Entry->setSection(SectionName); + if (Triple.isOSBinFormatCOFF()) +Entry->setSection((SectionName + "$OE").str()); + else +Entry->setSection(SectionName); Entry->setAlignment(Align(1)); } std::pair offloading::getOffloadEntryArray(Module &M, StringRef SectionName) { - auto *EntriesB = - new GlobalVariable(M, ArrayType::get(getEntryTy(M), 0), - /*isConstant=*/true, GlobalValue::ExternalLinkage, - /*Initializer=*/nullptr, "__start_" + SectionName); + llvm::Triple Triple(M.getTargetTriple()); jdoerfert wrote: This should be in the llvm namespace. https://github.com/llvm/llvm-project/pull/72697 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [Offload] Initial support for registering offloading entries on COFF targets (PR #72697)
https://github.com/jdoerfert approved this pull request. LG, two nits. https://github.com/llvm/llvm-project/pull/72697 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [openmp] [Clang][OpenMP] Fix ordering of processing of map clauses when mapping a struct. (PR #72410)
@@ -7742,15 +7744,42 @@ class MappableExprsHandler { else if (C->getMapType() == OMPC_MAP_alloc) Kind = Allocs; const auto *EI = C->getVarRefs().begin(); - for (const auto L : C->component_lists()) { -const Expr *E = (C->getMapLoc().isValid()) ? *EI : nullptr; -InfoGen(std::get<0>(L), Kind, std::get<1>(L), C->getMapType(), -C->getMapTypeModifiers(), std::nullopt, -/*ReturnDevicePointer=*/false, C->isImplicit(), std::get<2>(L), -E); -++EI; + if (*EI && !isa(*EI)) { +for (const auto L : C->component_lists()) { + const Expr *E = (C->getMapLoc().isValid()) ? *EI : nullptr; + InfoGen(std::get<0>(L), Kind, std::get<1>(L), C->getMapType(), + C->getMapTypeModifiers(), std::nullopt, + /*ReturnDevicePointer=*/false, C->isImplicit(), + std::get<2>(L), E); + ++EI; +} + } +} + +// Process the maps with sections. +for (const auto *Cl : Clauses) { + const auto *C = dyn_cast(Cl); + if (!C) +continue; + MapKind Kind = Other; + if (llvm::is_contained(C->getMapTypeModifiers(), + OMPC_MAP_MODIFIER_present)) +Kind = Present; + else if (C->getMapType() == OMPC_MAP_alloc) +Kind = Allocs; + const auto *EI = C->getVarRefs().begin(); + if (*EI && isa(*EI)) { +for (const auto L : C->component_lists()) { + const Expr *E = (C->getMapLoc().isValid()) ? *EI : nullptr; + InfoGen(std::get<0>(L), Kind, std::get<1>(L), C->getMapType(), + C->getMapTypeModifiers(), std::nullopt, + /*ReturnDevicePointer=*/false, C->isImplicit(), + std::get<2>(L), E); + ++EI; +} jdoerfert wrote: This duplicates the loop nest, which is very unfortunate. Why not actually sort the clause list? That will also make it easier to add/change things in the future, e.g., we simply modify the comparator. https://github.com/llvm/llvm-project/pull/72410 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
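A rough sketch of the sorting alternative suggested above; the clause and expression class names are reconstructed from the quoted diff (the digest stripped the template arguments), so treat them as assumptions rather than the final patch:

  // Gather the map clauses once and order them so that clauses whose first
  // variable reference is an array-section expression are handled last, then
  // run a single InfoGen loop instead of duplicating the loop nest.
  SmallVector<const OMPMapClause *, 16> MapClauses;
  for (const auto *Cl : Clauses)
    if (const auto *C = dyn_cast<OMPMapClause>(Cl))
      MapClauses.push_back(C);

  auto HasSectionRef = [](const OMPMapClause *C) {
    const Expr *E = *C->getVarRefs().begin();
    return E && isa<OMPArraySectionExpr>(E);
  };
  llvm::stable_sort(MapClauses,
                    [&](const OMPMapClause *A, const OMPMapClause *B) {
                      return !HasSectionRef(A) && HasSectionRef(B);
                    });
  // ... a single loop over MapClauses then calls InfoGen as before ...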
[clang] [clang] Robustify openmp test (PR #69739)
https://github.com/jdoerfert approved this pull request. LG, thx https://github.com/llvm/llvm-project/pull/69739 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [OpenMP] Unify the min/max thread/teams pathways (PR #70273)
@@ -1,68 +1,20 @@ -// RUN: %clang_cc1 -verify -fopenmp -fopenmp-targets=powerpc64le-ibm-linux-gnu -x c++ -triple powerpc64le-unknown-unknown -emit-llvm %s -o - | FileCheck -allow-deprecated-dag-overlap %s -check-prefix=CHECK1 -// RUN: %clang_cc1 -fopenmp -fopenmp-targets=powerpc64le-ibm-linux-gnu -x c++ -std=c++11 -triple powerpc64le-unknown-unknown -emit-pch -o %t %s -// RUN: %clang_cc1 -fopenmp -fopenmp-targets=powerpc64le-ibm-linux-gnu -x c++ -triple powerpc64le-unknown-unknown -std=c++11 -include-pch %t -verify %s -emit-llvm -o - | FileCheck -allow-deprecated-dag-overlap %s -check-prefix=CHECK1 -// RUN: %clang_cc1 -verify -fopenmp -fopenmp-targets=i386-pc-linux-gnu -x c++ -triple i386-unknown-unknown -emit-llvm %s -o - | FileCheck -allow-deprecated-dag-overlap %s -check-prefix=CHECK1 -// RUN: %clang_cc1 -fopenmp -fopenmp-targets=i386-pc-linux-gnu -x c++ -std=c++11 -triple i386-unknown-unknown -emit-pch -o %t %s -// RUN: %clang_cc1 -fopenmp -fopenmp-targets=i386-pc-linux-gnu -x c++ -triple i386-unknown-unknown -std=c++11 -include-pch %t -verify %s -emit-llvm -o - | FileCheck -allow-deprecated-dag-overlap %s -check-prefix=CHECK1 - -// RUN: %clang_cc1 -verify -fopenmp -fopenmp-targets=powerpc64le-ibm-linux-gnu -x c++ -triple powerpc64le-unknown-unknown -emit-llvm %s -o - | FileCheck -allow-deprecated-dag-overlap %s -check-prefix=CHECK2 -// RUN: %clang_cc1 -fopenmp -fopenmp-targets=powerpc64le-ibm-linux-gnu -x c++ -std=c++11 -triple powerpc64le-unknown-unknown -emit-pch -o %t %s -// RUN: %clang_cc1 -fopenmp -fopenmp-targets=powerpc64le-ibm-linux-gnu -x c++ -triple powerpc64le-unknown-unknown -std=c++11 -include-pch %t -verify %s -emit-llvm -o - | FileCheck -allow-deprecated-dag-overlap %s -check-prefix=CHECK2 -// RUN: %clang_cc1 -verify -fopenmp -fopenmp-targets=i386-pc-linux-gnu -x c++ -triple i386-unknown-unknown -emit-llvm %s -o - | FileCheck -allow-deprecated-dag-overlap %s -check-prefix=CHECK2 -// RUN: %clang_cc1 -fopenmp -fopenmp-targets=i386-pc-linux-gnu -x c++ -std=c++11 -triple i386-unknown-unknown -emit-pch -o %t %s -// RUN: %clang_cc1 -fopenmp -fopenmp-targets=i386-pc-linux-gnu -x c++ -triple i386-unknown-unknown -std=c++11 -include-pch %t -verify %s -emit-llvm -o - | FileCheck -allow-deprecated-dag-overlap %s -check-prefix=CHECK2 - -// RUN: %clang_cc1 -verify -fopenmp -fopenmp-targets=powerpc64le-ibm-linux-gnu -x c++ -triple powerpc64le-unknown-unknown -emit-llvm %s -o - | FileCheck -allow-deprecated-dag-overlap %s -check-prefix=CHECK3 -// RUN: %clang_cc1 -fopenmp -fopenmp-targets=powerpc64le-ibm-linux-gnu -x c++ -std=c++11 -triple powerpc64le-unknown-unknown -emit-pch -o %t %s -// RUN: %clang_cc1 -fopenmp -fopenmp-targets=powerpc64le-ibm-linux-gnu -x c++ -triple powerpc64le-unknown-unknown -std=c++11 -include-pch %t -verify %s -emit-llvm -o - | FileCheck -allow-deprecated-dag-overlap %s -check-prefix=CHECK3 -// RUN: %clang_cc1 -verify -fopenmp -fopenmp-targets=i386-pc-linux-gnu -x c++ -triple i386-unknown-unknown -emit-llvm %s -o - | FileCheck -allow-deprecated-dag-overlap %s -check-prefix=CHECK3 -// RUN: %clang_cc1 -fopenmp -fopenmp-targets=i386-pc-linux-gnu -x c++ -std=c++11 -triple i386-unknown-unknown -emit-pch -o %t %s -// RUN: %clang_cc1 -fopenmp -fopenmp-targets=i386-pc-linux-gnu -x c++ -triple i386-unknown-unknown -std=c++11 -include-pch %t -verify %s -emit-llvm -o - | FileCheck -allow-deprecated-dag-overlap %s -check-prefix=CHECK3 - -// RUN: %clang_cc1 -verify -fopenmp -fopenmp-targets=powerpc64le-ibm-linux-gnu -x c++ -triple powerpc64le-unknown-unknown -emit-llvm %s 
-o - | FileCheck -allow-deprecated-dag-overlap %s -check-prefix=CHECK4 -// RUN: %clang_cc1 -fopenmp -fopenmp-targets=powerpc64le-ibm-linux-gnu -x c++ -std=c++11 -triple powerpc64le-unknown-unknown -emit-pch -o %t %s -// RUN: %clang_cc1 -fopenmp -fopenmp-targets=powerpc64le-ibm-linux-gnu -x c++ -triple powerpc64le-unknown-unknown -std=c++11 -include-pch %t -verify %s -emit-llvm -o - | FileCheck -allow-deprecated-dag-overlap %s -check-prefix=CHECK4 -// RUN: %clang_cc1 -verify -fopenmp -fopenmp-targets=i386-pc-linux-gnu -x c++ -triple i386-unknown-unknown -emit-llvm %s -o - | FileCheck -allow-deprecated-dag-overlap %s -check-prefix=CHECK4 -// RUN: %clang_cc1 -fopenmp -fopenmp-targets=i386-pc-linux-gnu -x c++ -std=c++11 -triple i386-unknown-unknown -emit-pch -o %t %s -// RUN: %clang_cc1 -fopenmp -fopenmp-targets=i386-pc-linux-gnu -x c++ -triple i386-unknown-unknown -std=c++11 -include-pch %t -verify %s -emit-llvm -o - | FileCheck -allow-deprecated-dag-overlap %s -check-prefix=CHECK4 - -// RUN: %clang_cc1 -verify -fopenmp -fopenmp-targets=powerpc64le-ibm-linux-gnu -x c++ -triple powerpc64le-unknown-unknown -emit-llvm %s -o - | FileCheck -allow-deprecated-dag-overlap %s -check-prefix=CHECK5 -// RUN: %clang_cc1 -fopenmp -fopenmp-targets=powerpc64le-ibm-linux-gnu -x c++ -std=c++11 -triple powerpc64le-unknown-unknown -emit-pch -o %t %s -// RU
[clang] [OpenMP] Associate the KernelEnvironment with the GenericKernelTy (PR #70383)
https://github.com/jdoerfert updated https://github.com/llvm/llvm-project/pull/70383 >From fa6d6d9cf6398915f911e06eecc78c7ba83d3623 Mon Sep 17 00:00:00 2001 From: Johannes Doerfert Date: Wed, 25 Oct 2023 16:46:01 -0700 Subject: [PATCH] [OpenMP] Associate the KernelEnvironment with the GenericKernelTy By associating the kernel environment with the generic kernel we can access middle-end information easily, including the launch bounds ranges that are acceptable. By constraining the number of threads accordingly, we now obey the user provided bounds that were passed via attributes. --- clang/test/OpenMP/bug57757.cpp| 15 ++-- llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp | 4 +- .../plugins-nextgen/amdgpu/src/rtl.cpp| 8 +- .../PluginInterface/PluginInterface.cpp | 74 +++ .../common/PluginInterface/PluginInterface.h | 39 +- .../plugins-nextgen/cuda/src/rtl.cpp | 8 +- .../generic-elf-64bit/src/rtl.cpp | 20 ++--- .../test/offloading/default_thread_limit.c| 3 +- .../test/offloading/thread_state_1.c | 4 +- .../test/offloading/thread_state_2.c | 4 +- 10 files changed, 74 insertions(+), 105 deletions(-) diff --git a/clang/test/OpenMP/bug57757.cpp b/clang/test/OpenMP/bug57757.cpp index 7894796ac46284c..7acfe134ddd0baf 100644 --- a/clang/test/OpenMP/bug57757.cpp +++ b/clang/test/OpenMP/bug57757.cpp @@ -32,24 +32,23 @@ void foo() { // CHECK-NEXT: entry: // CHECK-NEXT:[[TMP2:%.*]] = getelementptr inbounds [[STRUCT_KMP_TASK_T:%.*]], ptr [[TMP1]], i64 0, i32 2 // CHECK-NEXT:tail call void @llvm.experimental.noalias.scope.decl(metadata [[META13:![0-9]+]]) -// CHECK-NEXT:tail call void @llvm.experimental.noalias.scope.decl(metadata [[META16:![0-9]+]]) -// CHECK-NEXT:[[TMP3:%.*]] = load i32, ptr [[TMP2]], align 4, !tbaa [[TBAA18:![0-9]+]], !alias.scope !13, !noalias !16 +// CHECK-NEXT:[[TMP3:%.*]] = load i32, ptr [[TMP2]], align 4, !tbaa [[TBAA16:![0-9]+]], !alias.scope !13, !noalias !17 // CHECK-NEXT:switch i32 [[TMP3]], label [[DOTOMP_OUTLINED__EXIT:%.*]] [ // CHECK-NEXT:i32 0, label [[DOTUNTIED_JMP__I:%.*]] // CHECK-NEXT:i32 1, label [[DOTUNTIED_NEXT__I:%.*]] // CHECK-NEXT:] // CHECK: .untied.jmp..i: -// CHECK-NEXT:store i32 1, ptr [[TMP2]], align 4, !tbaa [[TBAA18]], !alias.scope !13, !noalias !16 -// CHECK-NEXT:[[TMP4:%.*]] = tail call i32 @__kmpc_omp_task(ptr nonnull @[[GLOB1]], i32 [[TMP0]], ptr [[TMP1]]), !noalias !19 +// CHECK-NEXT:store i32 1, ptr [[TMP2]], align 4, !tbaa [[TBAA16]], !alias.scope !13, !noalias !17 +// CHECK-NEXT:[[TMP4:%.*]] = tail call i32 @__kmpc_omp_task(ptr nonnull @[[GLOB1]], i32 [[TMP0]], ptr [[TMP1]]), !noalias !13 // CHECK-NEXT:br label [[DOTOMP_OUTLINED__EXIT]] // CHECK: .untied.next..i: // CHECK-NEXT:[[TMP5:%.*]] = getelementptr inbounds [[STRUCT_KMP_TASK_T_WITH_PRIVATES:%.*]], ptr [[TMP1]], i64 0, i32 1 // CHECK-NEXT:[[TMP6:%.*]] = getelementptr inbounds [[STRUCT_KMP_TASK_T_WITH_PRIVATES]], ptr [[TMP1]], i64 0, i32 1, i32 2 // CHECK-NEXT:[[TMP7:%.*]] = getelementptr inbounds [[STRUCT_KMP_TASK_T_WITH_PRIVATES]], ptr [[TMP1]], i64 0, i32 1, i32 1 -// CHECK-NEXT:[[TMP8:%.*]] = load ptr, ptr [[TMP5]], align 8, !tbaa [[TBAA20:![0-9]+]], !alias.scope !16, !noalias !13 -// CHECK-NEXT:[[TMP9:%.*]] = load i32, ptr [[TMP7]], align 4, !tbaa [[TBAA18]], !alias.scope !16, !noalias !13 -// CHECK-NEXT:[[TMP10:%.*]] = load float, ptr [[TMP6]], align 4, !tbaa [[TBAA21:![0-9]+]], !alias.scope !16, !noalias !13 -// CHECK-NEXT:tail call void [[TMP8]](i32 noundef [[TMP9]], float noundef [[TMP10]]) #[[ATTR2:[0-9]+]], !noalias !19 +// CHECK-NEXT:[[TMP8:%.*]] = load ptr, ptr [[TMP5]], align 8, !tbaa 
[[TBAA19:![0-9]+]], !noalias !13 +// CHECK-NEXT:[[TMP9:%.*]] = load i32, ptr [[TMP7]], align 4, !tbaa [[TBAA16]], !noalias !13 +// CHECK-NEXT:[[TMP10:%.*]] = load float, ptr [[TMP6]], align 4, !tbaa [[TBAA20:![0-9]+]], !noalias !13 +// CHECK-NEXT:tail call void [[TMP8]](i32 noundef [[TMP9]], float noundef [[TMP10]]) #[[ATTR2:[0-9]+]], !noalias !13 // CHECK-NEXT:br label [[DOTOMP_OUTLINED__EXIT]] // CHECK: .omp_outlined..exit: // CHECK-NEXT:ret i32 0 diff --git a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp index 3e4e030f44c7fe0..b320d77652e1cba 100644 --- a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp +++ b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp @@ -4093,8 +4093,8 @@ OpenMPIRBuilder::createTargetInit(const LocationDescription &Loc, bool IsSPMD, Function *Kernel = Builder.GetInsertBlock()->getParent(); - /// Manifest the launch configuration in the metadata matching the kernel - /// environment. + // Manifest the launch configuration in the metadata matching the kernel + // environment. if (MinTeamsVal > 1 || MaxTeamsVal > 0) writeTeamsForKernel(T, *Kernel, MinTeamsVal, MaxTeams
[clang] [OpenMP] Associate the KernelEnvironment with the GenericKernelTy (PR #70383)
https://github.com/jdoerfert closed https://github.com/llvm/llvm-project/pull/70383 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[openmp] [llvm] [mlir] [clang] [OpenMP] Introduce the KernelLaunchEnvironment as implicit argument (PR #70401)
https://github.com/jdoerfert edited https://github.com/llvm/llvm-project/pull/70401 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[openmp] [llvm] [mlir] [clang] [OpenMP] Introduce the KernelLaunchEnvironment as implicit argument (PR #70401)
https://github.com/jdoerfert closed https://github.com/llvm/llvm-project/pull/70401 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [openmp] [OpenMP] Non racy team reductions (PR #70752)
https://github.com/jdoerfert updated https://github.com/llvm/llvm-project/pull/70752 >From 04aafdce6f259e31304ed47118a56042b155bd77 Mon Sep 17 00:00:00 2001 From: Johannes Doerfert Date: Mon, 30 Oct 2023 16:39:00 -0700 Subject: [PATCH] [OpenMP][FIX] Allocate per launch memory for GPU team reductions We used to perform team reduction on global memory allocated in the runtime and by clang. This was racy as multiple instances of a kernel, or different kernels with team reductions, would use the same locations. Since we now have the kernel launch environment, we can allocate dynamic memory per-launch, allowing us to move all the state into a non-racy place. Fixes: https://github.com/llvm/llvm-project/issues/70249 --- clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp | 75 ++ clang/lib/CodeGen/CGOpenMPRuntimeGPU.h| 2 - .../OpenMP/nvptx_teams_reduction_codegen.cpp | 240 +- .../target_teams_generic_loop_codegen.cpp | 20 +- .../DeviceRTL/include/Interface.h | 2 + .../libomptarget/DeviceRTL/src/Reduction.cpp | 10 +- openmp/libomptarget/include/Environment.h | 7 +- .../PluginInterface/PluginInterface.cpp | 11 + .../common/PluginInterface/PluginInterface.h | 2 +- .../parallel_target_teams_reduction.cpp | 36 +++ 10 files changed, 221 insertions(+), 184 deletions(-) create mode 100644 openmp/libomptarget/test/offloading/parallel_target_teams_reduction.cpp diff --git a/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp b/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp index bd9329b8e2d4113..0ed665e0dfb9722 100644 --- a/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp +++ b/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp @@ -803,8 +803,30 @@ void CGOpenMPRuntimeGPU::emitKernelDeinit(CodeGenFunction &CGF, if (!IsSPMD) emitGenericVarsEpilog(CGF); + // This is temporary until we remove the fixed sized buffer. 
+ ASTContext &C = CGM.getContext(); + RecordDecl *StaticRD = C.buildImplicitRecord( + "_openmp_teams_reduction_type_$_", RecordDecl::TagKind::TTK_Union); + StaticRD->startDefinition(); + for (const RecordDecl *TeamReductionRec : TeamsReductions) { +QualType RecTy = C.getRecordType(TeamReductionRec); +auto *Field = FieldDecl::Create( +C, StaticRD, SourceLocation(), SourceLocation(), nullptr, RecTy, +C.getTrivialTypeSourceInfo(RecTy, SourceLocation()), +/*BW=*/nullptr, /*Mutable=*/false, +/*InitStyle=*/ICIS_NoInit); +Field->setAccess(AS_public); +StaticRD->addDecl(Field); + } + StaticRD->completeDefinition(); + QualType StaticTy = C.getRecordType(StaticRD); + llvm::Type *LLVMReductionsBufferTy = + CGM.getTypes().ConvertTypeForMem(StaticTy); + const auto &DL = CGM.getModule().getDataLayout(); + uint64_t BufferSize = + DL.getTypeAllocSize(LLVMReductionsBufferTy).getFixedValue(); CGBuilderTy &Bld = CGF.Builder; - OMPBuilder.createTargetDeinit(Bld); + OMPBuilder.createTargetDeinit(Bld, BufferSize); } void CGOpenMPRuntimeGPU::emitSPMDKernel(const OMPExecutableDirective &D, @@ -2998,15 +3020,10 @@ void CGOpenMPRuntimeGPU::emitReduction( CGM.getContext(), PrivatesReductions, std::nullopt, VarFieldMap, C.getLangOpts().OpenMPCUDAReductionBufNum); TeamsReductions.push_back(TeamReductionRec); -if (!KernelTeamsReductionPtr) { - KernelTeamsReductionPtr = new llvm::GlobalVariable( - CGM.getModule(), CGM.VoidPtrTy, /*isConstant=*/true, - llvm::GlobalValue::InternalLinkage, nullptr, - "_openmp_teams_reductions_buffer_$_$ptr"); -} -llvm::Value *GlobalBufferPtr = CGF.EmitLoadOfScalar( -Address(KernelTeamsReductionPtr, CGF.VoidPtrTy, CGM.getPointerAlign()), -/*Volatile=*/false, C.getPointerType(C.VoidPtrTy), Loc); +auto *KernelTeamsReductionPtr = CGF.EmitRuntimeCall( +OMPBuilder.getOrCreateRuntimeFunction( +CGM.getModule(), OMPRTL___kmpc_reduction_get_fixed_buffer), +{}, "_openmp_teams_reductions_buffer_$_$ptr"); llvm::Value *GlobalToBufferCpyFn = ::emitListToGlobalCopyFunction( CGM, Privates, ReductionArrayTy, Loc, TeamReductionRec, VarFieldMap); llvm::Value *GlobalToBufferRedFn = ::emitListToGlobalReduceFunction( @@ -3021,7 +3038,7 @@ void CGOpenMPRuntimeGPU::emitReduction( llvm::Value *Args[] = { RTLoc, ThreadId, -GlobalBufferPtr, +KernelTeamsReductionPtr, CGF.Builder.getInt32(C.getLangOpts().OpenMPCUDAReductionBufNum), RL, ShuffleAndReduceFn, @@ -3654,42 +3671,6 @@ void CGOpenMPRuntimeGPU::processRequiresDirective( CGOpenMPRuntime::processRequiresDirective(D); } -void CGOpenMPRuntimeGPU::clear() { - - if (!TeamsReductions.empty()) { -ASTContext &C = CGM.getContext(); -RecordDecl *StaticRD = C.buildImplicitRecord( -"_openmp_teams_reduction_type_$_", RecordDecl::TagKind::TTK_Union); -StaticRD->startDefinition(); -for (const RecordDecl *TeamReductionRec : Team
[clang] [openmp] [OpenMP][FIX] Allocate per launch memory for GPU team reductions (PR #70752)
https://github.com/jdoerfert edited https://github.com/llvm/llvm-project/pull/70752 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[openmp] [clang] [OpenMP] Team reduction work specialization (PR #70766)
https://github.com/jdoerfert updated https://github.com/llvm/llvm-project/pull/70766 >From 04aafdce6f259e31304ed47118a56042b155bd77 Mon Sep 17 00:00:00 2001 From: Johannes Doerfert Date: Mon, 30 Oct 2023 16:39:00 -0700 Subject: [PATCH 1/2] [OpenMP][FIX] Allocate per launch memory for GPU team reductions We used to perform team reduction on global memory allocated in the runtime and by clang. This was racy as multiple instances of a kernel, or different kernels with team reductions, would use the same locations. Since we now have the kernel launch environment, we can allocate dynamic memory per-launch, allowing us to move all the state into a non-racy place. Fixes: https://github.com/llvm/llvm-project/issues/70249 --- clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp | 75 ++ clang/lib/CodeGen/CGOpenMPRuntimeGPU.h| 2 - .../OpenMP/nvptx_teams_reduction_codegen.cpp | 240 +- .../target_teams_generic_loop_codegen.cpp | 20 +- .../DeviceRTL/include/Interface.h | 2 + .../libomptarget/DeviceRTL/src/Reduction.cpp | 10 +- openmp/libomptarget/include/Environment.h | 7 +- .../PluginInterface/PluginInterface.cpp | 11 + .../common/PluginInterface/PluginInterface.h | 2 +- .../parallel_target_teams_reduction.cpp | 36 +++ 10 files changed, 221 insertions(+), 184 deletions(-) create mode 100644 openmp/libomptarget/test/offloading/parallel_target_teams_reduction.cpp diff --git a/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp b/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp index bd9329b8e2d4113..0ed665e0dfb9722 100644 --- a/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp +++ b/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp @@ -803,8 +803,30 @@ void CGOpenMPRuntimeGPU::emitKernelDeinit(CodeGenFunction &CGF, if (!IsSPMD) emitGenericVarsEpilog(CGF); + // This is temporary until we remove the fixed sized buffer. 
+ ASTContext &C = CGM.getContext(); + RecordDecl *StaticRD = C.buildImplicitRecord( + "_openmp_teams_reduction_type_$_", RecordDecl::TagKind::TTK_Union); + StaticRD->startDefinition(); + for (const RecordDecl *TeamReductionRec : TeamsReductions) { +QualType RecTy = C.getRecordType(TeamReductionRec); +auto *Field = FieldDecl::Create( +C, StaticRD, SourceLocation(), SourceLocation(), nullptr, RecTy, +C.getTrivialTypeSourceInfo(RecTy, SourceLocation()), +/*BW=*/nullptr, /*Mutable=*/false, +/*InitStyle=*/ICIS_NoInit); +Field->setAccess(AS_public); +StaticRD->addDecl(Field); + } + StaticRD->completeDefinition(); + QualType StaticTy = C.getRecordType(StaticRD); + llvm::Type *LLVMReductionsBufferTy = + CGM.getTypes().ConvertTypeForMem(StaticTy); + const auto &DL = CGM.getModule().getDataLayout(); + uint64_t BufferSize = + DL.getTypeAllocSize(LLVMReductionsBufferTy).getFixedValue(); CGBuilderTy &Bld = CGF.Builder; - OMPBuilder.createTargetDeinit(Bld); + OMPBuilder.createTargetDeinit(Bld, BufferSize); } void CGOpenMPRuntimeGPU::emitSPMDKernel(const OMPExecutableDirective &D, @@ -2998,15 +3020,10 @@ void CGOpenMPRuntimeGPU::emitReduction( CGM.getContext(), PrivatesReductions, std::nullopt, VarFieldMap, C.getLangOpts().OpenMPCUDAReductionBufNum); TeamsReductions.push_back(TeamReductionRec); -if (!KernelTeamsReductionPtr) { - KernelTeamsReductionPtr = new llvm::GlobalVariable( - CGM.getModule(), CGM.VoidPtrTy, /*isConstant=*/true, - llvm::GlobalValue::InternalLinkage, nullptr, - "_openmp_teams_reductions_buffer_$_$ptr"); -} -llvm::Value *GlobalBufferPtr = CGF.EmitLoadOfScalar( -Address(KernelTeamsReductionPtr, CGF.VoidPtrTy, CGM.getPointerAlign()), -/*Volatile=*/false, C.getPointerType(C.VoidPtrTy), Loc); +auto *KernelTeamsReductionPtr = CGF.EmitRuntimeCall( +OMPBuilder.getOrCreateRuntimeFunction( +CGM.getModule(), OMPRTL___kmpc_reduction_get_fixed_buffer), +{}, "_openmp_teams_reductions_buffer_$_$ptr"); llvm::Value *GlobalToBufferCpyFn = ::emitListToGlobalCopyFunction( CGM, Privates, ReductionArrayTy, Loc, TeamReductionRec, VarFieldMap); llvm::Value *GlobalToBufferRedFn = ::emitListToGlobalReduceFunction( @@ -3021,7 +3038,7 @@ void CGOpenMPRuntimeGPU::emitReduction( llvm::Value *Args[] = { RTLoc, ThreadId, -GlobalBufferPtr, +KernelTeamsReductionPtr, CGF.Builder.getInt32(C.getLangOpts().OpenMPCUDAReductionBufNum), RL, ShuffleAndReduceFn, @@ -3654,42 +3671,6 @@ void CGOpenMPRuntimeGPU::processRequiresDirective( CGOpenMPRuntime::processRequiresDirective(D); } -void CGOpenMPRuntimeGPU::clear() { - - if (!TeamsReductions.empty()) { -ASTContext &C = CGM.getContext(); -RecordDecl *StaticRD = C.buildImplicitRecord( -"_openmp_teams_reduction_type_$_", RecordDecl::TagKind::TTK_Union); -StaticRD->startDefinition(); -for (const RecordDecl *TeamReductionRec :
[openmp] [clang] [OpenMP][FIX] Allocate per launch memory for GPU team reductions (PR #70752)
@@ -194,6 +191,9 @@ int32_t __kmpc_nvptx_teams_reduce_nowait_v2( ThreadId = 0; } + uint32_t &IterCnt = state::getKernelLaunchEnvironment().ReductionIterCnt; + uint32_t &Cnt = state::getKernelLaunchEnvironment().ReductionCnt; jdoerfert wrote: They are, I replaced the globals with them. https://github.com/llvm/llvm-project/pull/70752 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
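For context, a minimal sketch of where those counters now live; the two field names appear in the quoted diff, while the type name and surrounding layout are assumptions:

  // Per-launch state handed to the kernel instead of device globals, so
  // concurrent launches no longer race on the team-reduction counters.
  struct KernelLaunchEnvironmentTy {
    uint32_t ReductionCnt = 0;
    uint32_t ReductionIterCnt = 0;
  };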
[openmp] [clang] [OpenMP][FIX] Allocate per launch memory for GPU team reductions (PR #70752)
https://github.com/jdoerfert updated https://github.com/llvm/llvm-project/pull/70752 >From 1859bd43bc2c0bb32fc028f0daf73525f039c5d2 Mon Sep 17 00:00:00 2001 From: Johannes Doerfert Date: Mon, 30 Oct 2023 16:39:00 -0700 Subject: [PATCH] [OpenMP][FIX] Allocate per launch memory for GPU team reductions We used to perform team reduction on global memory allocated in the runtime and by clang. This was racy as multiple instances of a kernel, or different kernels with team reductions, would use the same locations. Since we now have the kernel launch environment, we can allocate dynamic memory per-launch, allowing us to move all the state into a non-racy place. Fixes: https://github.com/llvm/llvm-project/issues/70249 --- clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp | 75 ++ clang/lib/CodeGen/CGOpenMPRuntimeGPU.h| 2 - .../OpenMP/nvptx_teams_reduction_codegen.cpp | 240 +- .../target_teams_generic_loop_codegen.cpp | 20 +- .../DeviceRTL/include/Interface.h | 2 + .../libomptarget/DeviceRTL/src/Reduction.cpp | 10 +- openmp/libomptarget/include/Environment.h | 25 +- .../PluginInterface/PluginInterface.cpp | 16 +- .../parallel_target_teams_reduction.cpp | 36 +++ 9 files changed, 231 insertions(+), 195 deletions(-) create mode 100644 openmp/libomptarget/test/offloading/parallel_target_teams_reduction.cpp diff --git a/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp b/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp index bd9329b8e2d4113..0ed665e0dfb9722 100644 --- a/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp +++ b/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp @@ -803,8 +803,30 @@ void CGOpenMPRuntimeGPU::emitKernelDeinit(CodeGenFunction &CGF, if (!IsSPMD) emitGenericVarsEpilog(CGF); + // This is temporary until we remove the fixed sized buffer. + ASTContext &C = CGM.getContext(); + RecordDecl *StaticRD = C.buildImplicitRecord( + "_openmp_teams_reduction_type_$_", RecordDecl::TagKind::TTK_Union); + StaticRD->startDefinition(); + for (const RecordDecl *TeamReductionRec : TeamsReductions) { +QualType RecTy = C.getRecordType(TeamReductionRec); +auto *Field = FieldDecl::Create( +C, StaticRD, SourceLocation(), SourceLocation(), nullptr, RecTy, +C.getTrivialTypeSourceInfo(RecTy, SourceLocation()), +/*BW=*/nullptr, /*Mutable=*/false, +/*InitStyle=*/ICIS_NoInit); +Field->setAccess(AS_public); +StaticRD->addDecl(Field); + } + StaticRD->completeDefinition(); + QualType StaticTy = C.getRecordType(StaticRD); + llvm::Type *LLVMReductionsBufferTy = + CGM.getTypes().ConvertTypeForMem(StaticTy); + const auto &DL = CGM.getModule().getDataLayout(); + uint64_t BufferSize = + DL.getTypeAllocSize(LLVMReductionsBufferTy).getFixedValue(); CGBuilderTy &Bld = CGF.Builder; - OMPBuilder.createTargetDeinit(Bld); + OMPBuilder.createTargetDeinit(Bld, BufferSize); } void CGOpenMPRuntimeGPU::emitSPMDKernel(const OMPExecutableDirective &D, @@ -2998,15 +3020,10 @@ void CGOpenMPRuntimeGPU::emitReduction( CGM.getContext(), PrivatesReductions, std::nullopt, VarFieldMap, C.getLangOpts().OpenMPCUDAReductionBufNum); TeamsReductions.push_back(TeamReductionRec); -if (!KernelTeamsReductionPtr) { - KernelTeamsReductionPtr = new llvm::GlobalVariable( - CGM.getModule(), CGM.VoidPtrTy, /*isConstant=*/true, - llvm::GlobalValue::InternalLinkage, nullptr, - "_openmp_teams_reductions_buffer_$_$ptr"); -} -llvm::Value *GlobalBufferPtr = CGF.EmitLoadOfScalar( -Address(KernelTeamsReductionPtr, CGF.VoidPtrTy, CGM.getPointerAlign()), -/*Volatile=*/false, C.getPointerType(C.VoidPtrTy), Loc); +auto *KernelTeamsReductionPtr = CGF.EmitRuntimeCall( +OMPBuilder.getOrCreateRuntimeFunction( 
+CGM.getModule(), OMPRTL___kmpc_reduction_get_fixed_buffer), +{}, "_openmp_teams_reductions_buffer_$_$ptr"); llvm::Value *GlobalToBufferCpyFn = ::emitListToGlobalCopyFunction( CGM, Privates, ReductionArrayTy, Loc, TeamReductionRec, VarFieldMap); llvm::Value *GlobalToBufferRedFn = ::emitListToGlobalReduceFunction( @@ -3021,7 +3038,7 @@ void CGOpenMPRuntimeGPU::emitReduction( llvm::Value *Args[] = { RTLoc, ThreadId, -GlobalBufferPtr, +KernelTeamsReductionPtr, CGF.Builder.getInt32(C.getLangOpts().OpenMPCUDAReductionBufNum), RL, ShuffleAndReduceFn, @@ -3654,42 +3671,6 @@ void CGOpenMPRuntimeGPU::processRequiresDirective( CGOpenMPRuntime::processRequiresDirective(D); } -void CGOpenMPRuntimeGPU::clear() { - - if (!TeamsReductions.empty()) { -ASTContext &C = CGM.getContext(); -RecordDecl *StaticRD = C.buildImplicitRecord( -"_openmp_teams_reduction_type_$_", RecordDecl::TagKind::TTK_Union); -StaticRD->startDefinition(); -for (const RecordDecl *TeamReductionRec : TeamsReductions) { - QualType RecTy = C.getRecordType(T
[openmp] [clang] [OpenMP][FIX] Allocate per launch memory for GPU team reductions (PR #70752)
https://github.com/jdoerfert closed https://github.com/llvm/llvm-project/pull/70752 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] 0e06ddf - Revert "[APINotes] Upstream APINotesOptions"
Author: Johannes Doerfert Date: 2023-11-01T11:33:25-07:00 New Revision: 0e06ddf0f6896cfd817a1b97a43b78331e0b1d66 URL: https://github.com/llvm/llvm-project/commit/0e06ddf0f6896cfd817a1b97a43b78331e0b1d66 DIFF: https://github.com/llvm/llvm-project/commit/0e06ddf0f6896cfd817a1b97a43b78331e0b1d66.diff LOG: Revert "[APINotes] Upstream APINotesOptions" This reverts commit c0a1857928c557400af0ed53d198cc9f3f185f9a. A shared_ptr assertion always triggers causes all bots to fail. Added: Modified: clang/include/clang/Driver/Options.td clang/include/clang/Frontend/CompilerInvocation.h clang/lib/Frontend/CompilerInvocation.cpp Removed: clang/include/clang/APINotes/APINotesOptions.h diff --git a/clang/include/clang/APINotes/APINotesOptions.h b/clang/include/clang/APINotes/APINotesOptions.h deleted file mode 100644 index e8b8a9ed2261fa1..000 --- a/clang/include/clang/APINotes/APINotesOptions.h +++ /dev/null @@ -1,34 +0,0 @@ -//===--- APINotesOptions.h --*- C++ -*-===// -// -// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. -// See https://llvm.org/LICENSE.txt for license information. -// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception -// -//===--===// - -#ifndef LLVM_CLANG_APINOTES_APINOTESOPTIONS_H -#define LLVM_CLANG_APINOTES_APINOTESOPTIONS_H - -#include "llvm/Support/VersionTuple.h" -#include -#include - -namespace clang { - -/// Tracks various options which control how API notes are found and handled. -class APINotesOptions { -public: - /// The Swift version which should be used for API notes. - llvm::VersionTuple SwiftVersion; - - /// The set of search paths where we API notes can be found for particular - /// modules. - /// - /// The API notes in this directory are stored as .apinotes, and - /// are only applied when building the module . 
- std::vector ModuleSearchPaths; -}; - -} // namespace clang - -#endif // LLVM_CLANG_APINOTES_APINOTESOPTIONS_H diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td index fcf6a4b2ccb2369..b1229b2f4562379 100644 --- a/clang/include/clang/Driver/Options.td +++ b/clang/include/clang/Driver/Options.td @@ -1733,10 +1733,6 @@ def fswift_async_fp_EQ : Joined<["-"], "fswift-async-fp=">, NormalizedValuesScope<"CodeGenOptions::SwiftAsyncFramePointerKind">, NormalizedValues<["Auto", "Always", "Never"]>, MarshallingInfoEnum, "Always">; -def fapinotes_swift_version : Joined<["-"], "fapinotes-swift-version=">, - Group, Visibility<[ClangOption, CC1Option]>, - MetaVarName<"">, - HelpText<"Specify the Swift version to use when filtering API notes">; defm addrsig : BoolFOption<"addrsig", CodeGenOpts<"Addrsig">, DefaultFalse, @@ -4133,9 +4129,6 @@ def ibuiltininc : Flag<["-"], "ibuiltininc">, Group, def index_header_map : Flag<["-"], "index-header-map">, Visibility<[ClangOption, CC1Option]>, HelpText<"Make the next included directory (-I or -F) an indexer header map">; -def iapinotes_modules : JoinedOrSeparate<["-"], "iapinotes-modules">, Group, - Visibility<[ClangOption, CC1Option]>, - HelpText<"Add directory to the API notes search path referenced by module name">, MetaVarName<"">; def idirafter : JoinedOrSeparate<["-"], "idirafter">, Group, Visibility<[ClangOption, CC1Option]>, HelpText<"Add directory to AFTER include search path">; diff --git a/clang/include/clang/Frontend/CompilerInvocation.h b/clang/include/clang/Frontend/CompilerInvocation.h index d9c757a8a156861..45e263e7bc76822 100644 --- a/clang/include/clang/Frontend/CompilerInvocation.h +++ b/clang/include/clang/Frontend/CompilerInvocation.h @@ -9,7 +9,6 @@ #ifndef LLVM_CLANG_FRONTEND_COMPILERINVOCATION_H #define LLVM_CLANG_FRONTEND_COMPILERINVOCATION_H -#include "clang/APINotes/APINotesOptions.h" #include "clang/Basic/CodeGenOptions.h" #include "clang/Basic/DiagnosticOptions.h" #include "clang/Basic/FileSystemOptions.h" @@ -93,9 +92,6 @@ class CompilerInvocationBase { std::shared_ptr MigratorOpts; - /// Options controlling API notes. - std::shared_ptr APINotesOpts; - /// Options controlling IRgen and the backend. std::shared_ptr CodeGenOpts; @@ -135,7 +131,6 @@ class CompilerInvocationBase { const PreprocessorOptions &getPreprocessorOpts() const { return *PPOpts; } const AnalyzerOptions &getAnalyzerOpts() const { return *AnalyzerOpts; } const MigratorOptions &getMigratorOpts() const { return *MigratorOpts; } - const APINotesOptions &getAPINotesOpts() const { return *APINotesOpts; } const CodeGenOptions &getCodeGenOpts() const { return *CodeGenOpts; } const FileSystemOptions &getFileSystemOpts() const { return *FSOpts; } const FrontendOptions &getFrontendOpts() const { return *FrontendOpts; } @@ -247,7 +242,6 @@ class Compile
[clang] [OpenMP] Make team reductions less bad (PR #70981)
https://github.com/jdoerfert closed https://github.com/llvm/llvm-project/pull/70981 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[llvm] [clang] [OpenMP][OMPIRBuilder] Add support to omp target parallel (PR #67000)
https://github.com/jdoerfert edited https://github.com/llvm/llvm-project/pull/67000 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[llvm] [clang] [OpenMP][OMPIRBuilder] Add support to omp target parallel (PR #67000)
https://github.com/jdoerfert approved this pull request. LG; please commit the two commits separately, though. https://github.com/llvm/llvm-project/pull/67000 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [OpenMP][OMPIRBuilder] Add support to omp target parallel (PR #67000)
@@ -1026,25 +1026,25 @@ for (int i = 0; i < argc; ++i) { // CHECK3-NEXT:call void @llvm.experimental.noalias.scope.decl(metadata [[META8:![0-9]+]]) // CHECK3-NEXT:call void @llvm.experimental.noalias.scope.decl(metadata [[META10:![0-9]+]]) // CHECK3-NEXT:call void @llvm.experimental.noalias.scope.decl(metadata [[META12:![0-9]+]]) -// CHECK3-NEXT:store i32 [[TMP2]], ptr [[DOTGLOBAL_TID__ADDR_I]], align 4, !noalias !14 -// CHECK3-NEXT:store ptr [[TMP5]], ptr [[DOTPART_ID__ADDR_I]], align 8, !noalias !14 -// CHECK3-NEXT:store ptr null, ptr [[DOTPRIVATES__ADDR_I]], align 8, !noalias !14 -// CHECK3-NEXT:store ptr null, ptr [[DOTCOPY_FN__ADDR_I]], align 8, !noalias !14 -// CHECK3-NEXT:store ptr [[TMP3]], ptr [[DOTTASK_T__ADDR_I]], align 8, !noalias !14 -// CHECK3-NEXT:store ptr [[TMP7]], ptr [[__CONTEXT_ADDR_I]], align 8, !noalias !14 -// CHECK3-NEXT:[[TMP8:%.*]] = load ptr, ptr [[__CONTEXT_ADDR_I]], align 8, !noalias !14 +// CHECK3-NEXT:store i32 [[TMP2]], ptr [[DOTGLOBAL_TID__ADDR_I]], align 4, !noalias ![[NOALIAS0:[0-9]+]] +// CHECK3-NEXT:store ptr [[TMP5]], ptr [[DOTPART_ID__ADDR_I]], align 8, !noalias ![[NOALIAS0]] +// CHECK3-NEXT:store ptr null, ptr [[DOTPRIVATES__ADDR_I]], align 8, !noalias ![[NOALIAS0]] +// CHECK3-NEXT:store ptr null, ptr [[DOTCOPY_FN__ADDR_I]], align 8, !noalias ![[NOALIAS0]] +// CHECK3-NEXT:store ptr [[TMP3]], ptr [[DOTTASK_T__ADDR_I]], align 8, !noalias ![[NOALIAS0]] +// CHECK3-NEXT:store ptr [[TMP7]], ptr [[__CONTEXT_ADDR_I]], align 8, !noalias ![[NOALIAS0]] +// CHECK3-NEXT:[[TMP8:%.*]] = load ptr, ptr [[__CONTEXT_ADDR_I]], align 8, !noalias ![[NOALIAS0]] // CHECK3-NEXT:[[OMP_GLOBAL_THREAD_NUM_I:%.*]] = call i32 @__kmpc_global_thread_num(ptr @[[GLOB12:[0-9]+]]) // CHECK3-NEXT:[[TMP9:%.*]] = call i32 @__kmpc_cancel(ptr @[[GLOB1]], i32 [[OMP_GLOBAL_THREAD_NUM_I]], i32 4) // CHECK3-NEXT:[[TMP10:%.*]] = icmp ne i32 [[TMP9]], 0 // CHECK3-NEXT:br i1 [[TMP10]], label [[DOTCANCEL_EXIT_I:%.*]], label [[DOTCANCEL_CONTINUE_I:%.*]] // CHECK3: .cancel.exit.i: -// CHECK3-NEXT:store i32 1, ptr [[CLEANUP_DEST_SLOT_I]], align 4, !noalias !14 +// CHECK3-NEXT:store i32 1, ptr [[CLEANUP_DEST_SLOT_I]], align 4, !noalias ![[NOALIAS1:[0-9]+]] // CHECK3-NEXT:br label [[DOTOMP_OUTLINED__EXIT:%.*]] // CHECK3: .cancel.continue.i: -// CHECK3-NEXT:store i32 0, ptr [[CLEANUP_DEST_SLOT_I]], align 4, !noalias !14 +// CHECK3-NEXT:store i32 0, ptr [[CLEANUP_DEST_SLOT_I]], align 4, !noalias ![[NOALIAS1]] // CHECK3-NEXT:br label [[DOTOMP_OUTLINED__EXIT]] // CHECK3: .omp_outlined..exit: -// CHECK3-NEXT:[[CLEANUP_DEST_I:%.*]] = load i32, ptr [[CLEANUP_DEST_SLOT_I]], align 4, !noalias !14 +// CHECK3-NEXT:[[CLEANUP_DEST_I:%.*]] = load i32, ptr [[CLEANUP_DEST_SLOT_I]], align 4, !noalias ![[NOALIAS1]] jdoerfert wrote: Commit that right away as NFC. If you leave it with the PR and you use the web interface it'll smash them right back together. https://github.com/llvm/llvm-project/pull/67000 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[llvm] [clang] Recommit changes to global checks (PR #71171)
https://github.com/jdoerfert commented: I think if the issues with the original commit are resolved, this is good to go. Did you verify we can properly auto-generate files, e.g., in llvm/test/Transforms/Attributor and clang/test/OpenMP? https://github.com/llvm/llvm-project/pull/71171 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] d3e7a48 - [OpenMP][NFC] Remove a no-op function
Author: Johannes Doerfert Date: 2023-11-03T10:28:36-07:00 New Revision: d3e7a48cbde060a6dbc1edcb00f375fb2f9405dc URL: https://github.com/llvm/llvm-project/commit/d3e7a48cbde060a6dbc1edcb00f375fb2f9405dc DIFF: https://github.com/llvm/llvm-project/commit/d3e7a48cbde060a6dbc1edcb00f375fb2f9405dc.diff LOG: [OpenMP][NFC] Remove a no-op function Added: Modified: clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp clang/test/OpenMP/nvptx_target_parallel_reduction_codegen.cpp clang/test/OpenMP/nvptx_target_parallel_reduction_codegen_tbaa_PR46146.cpp clang/test/OpenMP/nvptx_teams_reduction_codegen.cpp clang/test/OpenMP/reduction_implicit_map.cpp clang/test/OpenMP/target_teams_generic_loop_codegen.cpp llvm/include/llvm/Frontend/OpenMP/OMPKinds.def llvm/lib/Transforms/IPO/OpenMPOpt.cpp llvm/test/Transforms/OpenMP/add_attributes.ll openmp/libomptarget/DeviceRTL/include/Interface.h openmp/libomptarget/DeviceRTL/src/Reduction.cpp Removed: diff --git a/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp b/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp index 0ed665e0dfb9722..009b3f0a85a3785 100644 --- a/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp +++ b/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp @@ -3081,14 +3081,7 @@ void CGOpenMPRuntimeGPU::emitReduction( ++IRHS; } }; - llvm::Value *EndArgs[] = {ThreadId}; RegionCodeGenTy RCG(CodeGen); - NVPTXActionTy Action( - nullptr, std::nullopt, - OMPBuilder.getOrCreateRuntimeFunction( - CGM.getModule(), OMPRTL___kmpc_nvptx_end_reduce_nowait), - EndArgs); - RCG.setAction(Action); RCG(CGF); // There is no need to emit line number for unconditional branch. (void)ApplyDebugLocation::CreateEmpty(CGF); diff --git a/clang/test/OpenMP/nvptx_target_parallel_reduction_codegen.cpp b/clang/test/OpenMP/nvptx_target_parallel_reduction_codegen.cpp index 094c5ae3522f96d..c2a958dfdd2453e 100644 --- a/clang/test/OpenMP/nvptx_target_parallel_reduction_codegen.cpp +++ b/clang/test/OpenMP/nvptx_target_parallel_reduction_codegen.cpp @@ -148,7 +148,6 @@ int bar(int n){ // CHECK-64-NEXT:[[TMP8:%.*]] = load double, ptr [[E1]], align 8 // CHECK-64-NEXT:[[ADD2:%.*]] = fadd double [[TMP7]], [[TMP8]] // CHECK-64-NEXT:store double [[ADD2]], ptr [[TMP0]], align 8 -// CHECK-64-NEXT:call void @__kmpc_nvptx_end_reduce_nowait(i32 [[TMP3]]) // CHECK-64-NEXT:br label [[DOTOMP_REDUCTION_DONE]] // CHECK-64: .omp.reduction.done: // CHECK-64-NEXT:ret void @@ -353,7 +352,6 @@ int bar(int n){ // CHECK-64-NEXT:[[TMP13:%.*]] = load float, ptr [[D2]], align 4 // CHECK-64-NEXT:[[MUL8:%.*]] = fmul float [[TMP12]], [[TMP13]] // CHECK-64-NEXT:store float [[MUL8]], ptr [[TMP1]], align 4 -// CHECK-64-NEXT:call void @__kmpc_nvptx_end_reduce_nowait(i32 [[TMP5]]) // CHECK-64-NEXT:br label [[DOTOMP_REDUCTION_DONE]] // CHECK-64: .omp.reduction.done: // CHECK-64-NEXT:ret void @@ -609,7 +607,6 @@ int bar(int n){ // CHECK-64: cond.end11: // CHECK-64-NEXT:[[COND12:%.*]] = phi i16 [ [[TMP15]], [[COND_TRUE9]] ], [ [[TMP16]], [[COND_FALSE10]] ] // CHECK-64-NEXT:store i16 [[COND12]], ptr [[TMP1]], align 2 -// CHECK-64-NEXT:call void @__kmpc_nvptx_end_reduce_nowait(i32 [[TMP6]]) // CHECK-64-NEXT:br label [[DOTOMP_REDUCTION_DONE]] // CHECK-64: .omp.reduction.done: // CHECK-64-NEXT:ret void @@ -824,7 +821,6 @@ int bar(int n){ // CHECK-32-NEXT:[[TMP8:%.*]] = load double, ptr [[E1]], align 8 // CHECK-32-NEXT:[[ADD2:%.*]] = fadd double [[TMP7]], [[TMP8]] // CHECK-32-NEXT:store double [[ADD2]], ptr [[TMP0]], align 8 -// CHECK-32-NEXT:call void @__kmpc_nvptx_end_reduce_nowait(i32 [[TMP3]]) // CHECK-32-NEXT:br label [[DOTOMP_REDUCTION_DONE]] // CHECK-32: 
.omp.reduction.done: // CHECK-32-NEXT:ret void @@ -1029,7 +1025,6 @@ int bar(int n){ // CHECK-32-NEXT:[[TMP13:%.*]] = load float, ptr [[D2]], align 4 // CHECK-32-NEXT:[[MUL8:%.*]] = fmul float [[TMP12]], [[TMP13]] // CHECK-32-NEXT:store float [[MUL8]], ptr [[TMP1]], align 4 -// CHECK-32-NEXT:call void @__kmpc_nvptx_end_reduce_nowait(i32 [[TMP5]]) // CHECK-32-NEXT:br label [[DOTOMP_REDUCTION_DONE]] // CHECK-32: .omp.reduction.done: // CHECK-32-NEXT:ret void @@ -1285,7 +1280,6 @@ int bar(int n){ // CHECK-32: cond.end11: // CHECK-32-NEXT:[[COND12:%.*]] = phi i16 [ [[TMP15]], [[COND_TRUE9]] ], [ [[TMP16]], [[COND_FALSE10]] ] // CHECK-32-NEXT:store i16 [[COND12]], ptr [[TMP1]], align 2 -// CHECK-32-NEXT:call void @__kmpc_nvptx_end_reduce_nowait(i32 [[TMP6]]) // CHECK-32-NEXT:br label [[DOTOMP_REDUCTION_DONE]] // CHECK-32: .omp.reduction.done: // CHECK-32-NEXT:ret void @@ -1500,7 +1494,6 @@ int bar(int n){ // CHECK-32-EX-NEXT:[[TMP8:%.*]] = load double, ptr [[E1]], align 8 // CHECK-32-EX-NEXT:[[ADD2:%.*]] = fadd
[clang] [llvm] Recommit changes to global checks (PR #71171)
jdoerfert wrote: > > I think if the issues with the original commit are resolved, this is good > > to go. > > Did you verify we can properly auto-generate files, e.g., in > > llvm/test/Transforms/Attributor and clang/test/OpenMP? > > Ah no I did not, I'll do that on Monday. I'd run `./llvm/utils/update_any_test_checks.py` once and see if the tests pass afterwards. Then do it again to ensure the nasty ordering and duplication issues are gone for good. https://github.com/llvm/llvm-project/pull/71171 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [clang] report inlining decisions with -Wattribute-{warning|error} (PR #73552)
https://github.com/jdoerfert updated https://github.com/llvm/llvm-project/pull/73552 >From cea177222b421c67dabbe9e267f8b9a4ead4d51e Mon Sep 17 00:00:00 2001 From: Nick Desaulniers Date: Tue, 10 Jan 2023 17:42:18 -0800 Subject: [PATCH 01/15] [clang] report inlining decisions with -Wattribute-{warning|error} Due to inlining, descovering which specific call site to a function with the attribute "warning" or "error" is painful. In the IR record inlining decisions in metadata when inlining a callee that itself contains a call to a dontcall-error or dontcall-warn fn. Print this info so that it's clearer which call site is problematic. There's still some limitations with this approach; macro expansion is not recorded. Fixes: https://github.com/ClangBuiltLinux/linux/issues/1571 Differential Revision: https://reviews.llvm.org/D141451 --- .../clang/Basic/DiagnosticFrontendKinds.td| 2 + clang/lib/CodeGen/CodeGenAction.cpp | 10 +++ ...backend-attribute-error-warning-optimize.c | 21 + .../backend-attribute-error-warning.c | 6 ++ .../backend-attribute-error-warning.cpp | 12 +++ llvm/docs/LangRef.rst | 8 +- llvm/include/llvm/IR/DiagnosticInfo.h | 14 +++- llvm/lib/IR/DiagnosticInfo.cpp| 23 - llvm/lib/Transforms/Utils/InlineFunction.cpp | 13 +++ .../Transforms/Inline/dontcall-attributes.ll | 84 +++ 10 files changed, 185 insertions(+), 8 deletions(-) create mode 100644 llvm/test/Transforms/Inline/dontcall-attributes.ll diff --git a/clang/include/clang/Basic/DiagnosticFrontendKinds.td b/clang/include/clang/Basic/DiagnosticFrontendKinds.td index 715e0c0dc8fa84e..0909b1f59175be9 100644 --- a/clang/include/clang/Basic/DiagnosticFrontendKinds.td +++ b/clang/include/clang/Basic/DiagnosticFrontendKinds.td @@ -93,6 +93,8 @@ def err_fe_backend_error_attr : def warn_fe_backend_warning_attr : Warning<"call to '%0' declared with 'warning' attribute: %1">, BackendInfo, InGroup; +def note_fe_backend_in : Note<"called by function '%0'">; +def note_fe_backend_inlined : Note<"inlined by function '%0'">; def err_fe_invalid_code_complete_file : Error< "cannot locate code-completion file %0">, DefaultFatal; diff --git a/clang/lib/CodeGen/CodeGenAction.cpp b/clang/lib/CodeGen/CodeGenAction.cpp index a31a271ed77d1ca..66e040741e2718d 100644 --- a/clang/lib/CodeGen/CodeGenAction.cpp +++ b/clang/lib/CodeGen/CodeGenAction.cpp @@ -52,6 +52,8 @@ #include "llvm/Transforms/Utils/Cloning.h" #include +#include + using namespace clang; using namespace llvm; @@ -794,6 +796,14 @@ void BackendConsumer::DontCallDiagHandler(const DiagnosticInfoDontCall &D) { ? diag::err_fe_backend_error_attr : diag::warn_fe_backend_warning_attr) << llvm::demangle(D.getFunctionName()) << D.getNote(); + + SmallVector InliningDecisions; + D.getInliningDecisions(InliningDecisions); + InliningDecisions.push_back(D.getCaller().str()); + for (auto Dec : llvm::enumerate(InliningDecisions)) +Diags.Report(Dec.index() ? 
diag::note_fe_backend_inlined + : diag::note_fe_backend_in) +<< llvm::demangle(Dec.value()); } void BackendConsumer::MisExpectDiagHandler( diff --git a/clang/test/Frontend/backend-attribute-error-warning-optimize.c b/clang/test/Frontend/backend-attribute-error-warning-optimize.c index d3951e3b6b1f57d..0bfc50ff8985c39 100644 --- a/clang/test/Frontend/backend-attribute-error-warning-optimize.c +++ b/clang/test/Frontend/backend-attribute-error-warning-optimize.c @@ -9,6 +9,7 @@ int x(void) { } void baz(void) { foo(); // expected-error {{call to 'foo' declared with 'error' attribute: oh no foo}} + // expected-note@* {{called by function 'baz'}} if (x()) bar(); } @@ -20,3 +21,23 @@ void indirect(void) { quux = foo; quux(); } + +static inline void a(int x) { +if (x == 10) +foo(); // expected-error {{call to 'foo' declared with 'error' attribute: oh no foo}} + // expected-note@* {{called by function 'a'}} + // expected-note@* {{inlined by function 'b'}} + // expected-note@* {{inlined by function 'd'}} +} + +static inline void b() { +a(10); +} + +void c() { +a(9); +} + +void d() { + b(); +} diff --git a/clang/test/Frontend/backend-attribute-error-warning.c b/clang/test/Frontend/backend-attribute-error-warning.c index c3c7803479aac96..c87a47053e5c0f8 100644 --- a/clang/test/Frontend/backend-attribute-error-warning.c +++ b/clang/test/Frontend/backend-attribute-error-warning.c @@ -23,11 +23,17 @@ duplicate_warnings(void); void baz(void) { foo(); // expected-error {{call to 'foo' declared with 'error' attribute: oh no foo}} + // expected-note@* {{called by function 'baz'}} if (x()) bar(); // expected-error {{call to 'bar' declared with 'error' attribute: oh no bar}} + // expected-note@* {
[clang] [Clang] CWG2789 Overload resolution with implicit and explicit object… (PR #73493)
https://github.com/jdoerfert updated https://github.com/llvm/llvm-project/pull/73493 >From 3758290904571237c13ba23f2e3f65e58b6598aa Mon Sep 17 00:00:00 2001 From: Corentin Jabot Date: Mon, 27 Nov 2023 10:48:13 +0100 Subject: [PATCH 1/2] [Clang] CWG2789 Overload resolution with implicit and explicit object member functions Implement the resolution to CWG2789 from https://wiki.edg.com/pub/Wg21kona2023/StrawPolls/p3046r0.html The DR page is not updated because the issue has not made it to a published list yet. --- clang/include/clang/Sema/Sema.h | 6 +++ clang/lib/Sema/SemaOverload.cpp | 69 ++--- clang/test/CXX/drs/dr27xx.cpp | 31 +++ 3 files changed, 92 insertions(+), 14 deletions(-) create mode 100644 clang/test/CXX/drs/dr27xx.cpp diff --git a/clang/include/clang/Sema/Sema.h b/clang/include/clang/Sema/Sema.h index f7c9d0e2e6412b7..7579a3256bc37aa 100644 --- a/clang/include/clang/Sema/Sema.h +++ b/clang/include/clang/Sema/Sema.h @@ -3849,6 +3849,12 @@ class Sema final { const FunctionProtoType *NewType, unsigned *ArgPos = nullptr, bool Reversed = false); + + bool FunctionNonObjectParamTypesAreEqual(const FunctionDecl *OldFunction, + const FunctionDecl *NewFunction, + unsigned *ArgPos = nullptr, + bool Reversed = false); + void HandleFunctionTypeMismatch(PartialDiagnostic &PDiag, QualType FromType, QualType ToType); diff --git a/clang/lib/Sema/SemaOverload.cpp b/clang/lib/Sema/SemaOverload.cpp index 9800d7f1c9cfee9..cc69cd1f2862aae 100644 --- a/clang/lib/Sema/SemaOverload.cpp +++ b/clang/lib/Sema/SemaOverload.cpp @@ -3239,6 +3239,28 @@ bool Sema::FunctionParamTypesAreEqual(const FunctionProtoType *OldType, NewType->param_types(), ArgPos, Reversed); } +bool Sema::FunctionNonObjectParamTypesAreEqual(const FunctionDecl *OldFunction, + const FunctionDecl *NewFunction, + unsigned *ArgPos, + bool Reversed) { + + if (OldFunction->getNumNonObjectParams() != + NewFunction->getNumNonObjectParams()) +return false; + + unsigned OldIgnore = + unsigned(OldFunction->hasCXXExplicitFunctionObjectParameter()); + unsigned NewIgnore = + unsigned(NewFunction->hasCXXExplicitFunctionObjectParameter()); + + auto *OldPT = cast(OldFunction->getFunctionType()); + auto *NewPT = cast(NewFunction->getFunctionType()); + + return FunctionParamTypesAreEqual(OldPT->param_types().slice(OldIgnore), +NewPT->param_types().slice(NewIgnore), +ArgPos, Reversed); +} + /// CheckPointerConversion - Check the pointer conversion from the /// expression From to the type ToType. This routine checks for /// ambiguous or inaccessible derived-to-base pointer @@ -10121,22 +10143,41 @@ static bool haveSameParameterTypes(ASTContext &Context, const FunctionDecl *F1, /// We're allowed to use constraints partial ordering only if the candidates /// have the same parameter types: -/// [over.match.best]p2.6 -/// F1 and F2 are non-template functions with the same parameter-type-lists, -/// and F1 is more constrained than F2 [...] +/// [over.match.best.general]p2.6 +/// F1 and F2 are non-template functions with the same +/// non-object-parameter-type-lists, and F1 is more constrained than F2 [...] 
static bool sameFunctionParameterTypeLists(Sema &S, - const OverloadCandidate &Cand1, - const OverloadCandidate &Cand2) { - if (Cand1.Function && Cand2.Function) { -auto *PT1 = cast(Cand1.Function->getFunctionType()); -auto *PT2 = cast(Cand2.Function->getFunctionType()); -if (PT1->getNumParams() == PT2->getNumParams() && -PT1->isVariadic() == PT2->isVariadic() && -S.FunctionParamTypesAreEqual(PT1, PT2, nullptr, - Cand1.isReversed() ^ Cand2.isReversed())) - return true; + const OverloadCandidate &Cand1, + const OverloadCandidate &Cand2) { + if (!Cand1.Function || !Cand2.Function) +return false; + + auto *Fn1 = Cand1.Function; + auto *Fn2 = Cand2.Function; + + if (Fn1->isVariadic() != Fn1->isVariadic()) +return false; + + if (!S.FunctionNonObjectParamTypesAreEqual( + Fn1, Fn2, nullptr, Cand1.isReversed() ^ Cand2.isReversed())) +return false; + + auto *Mem1 = dyn_cast(Fn1); + auto *Mem2 = dyn_cast(Fn2); + if (Mem1 && Mem2) { +// if they are member functions, both are direct members of the same class, +// and +
[openmp] [clang] [OpenMP] Directly use user's grid and block size in kernel language mode (PR #70612)
https://github.com/jdoerfert approved this pull request. LG, see the nit. Also, add a runtime test with the bare clause and verify we stick with values we otherwise would not, e.g., 1 team and 1024 threads. https://github.com/llvm/llvm-project/pull/70612 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
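A rough sketch of the kind of runtime test requested above, assuming the kernel language mode is selected with the ompx_bare clause and that the team/thread queries report the user-provided launch configuration (both are assumptions, not taken from the patch):

  #include <omp.h>
  #include <stdio.h>

  int main() {
    int NumTeams = 0, NumThreads = 0;
    // Without bare mode the runtime would normally pick other values; here we
    // expect the user-provided 1 team and 1024 threads to be used as given.
  #pragma omp target teams ompx_bare num_teams(1) thread_limit(1024) map(tofrom : NumTeams, NumThreads)
    {
      if (omp_get_team_num() == 0 && omp_get_thread_num() == 0) {
        NumTeams = omp_get_num_teams();
        NumThreads = omp_get_num_threads();
      }
    }
    printf("%d %d\n", NumTeams, NumThreads);
    return 0;
  }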
[clang] [openmp] [OpenMP] Directly use user's grid and block size in kernel language mode (PR #70612)
@@ -14633,6 +14633,26 @@ StmtResult Sema::ActOnOpenMPTargetTeamsDirective(ArrayRef<OMPClause *> Clauses,
   }

   setFunctionHasBranchProtectedScope();

+  bool HasBareClause = false;
+  bool HasThreadLimitClause = false;
+  bool HasNumTeamsClause = false;
+  OMPClause *BareClause = nullptr;

jdoerfert wrote:

No need for HasBareClause since you got the pointer.

https://github.com/llvm/llvm-project/pull/70612 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
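The nit is about redundant state: once the clause pointer is recorded, its null-ness already answers the presence question. A self-contained sketch with simplified stand-in types (not the real Clang AST classes) is:

```cpp
#include <vector>

enum ClauseKind { OMPC_ompx_bare, OMPC_num_teams, OMPC_thread_limit };
struct OMPClause { ClauseKind Kind; };

// Returns true when a bare region also carries explicit grid/block sizes.
static bool bareHasExplicitDims(const std::vector<OMPClause *> &Clauses) {
  const OMPClause *BareClause = nullptr; // replaces "bool HasBareClause"
  bool HasNumTeams = false, HasThreadLimit = false;
  for (const OMPClause *C : Clauses) {
    if (C->Kind == OMPC_ompx_bare)
      BareClause = C; // presence == (BareClause != nullptr)
    else if (C->Kind == OMPC_num_teams)
      HasNumTeams = true;
    else if (C->Kind == OMPC_thread_limit)
      HasThreadLimit = true;
  }
  return !BareClause || (HasNumTeams && HasThreadLimit);
}

int main() {
  OMPClause Bare{OMPC_ompx_bare}, Teams{OMPC_num_teams}, Limit{OMPC_thread_limit};
  std::vector<OMPClause *> Good{&Bare, &Teams, &Limit}, Bad{&Bare};
  return bareHasExplicitDims(Good) && !bareHasExplicitDims(Bad) ? 0 : 1;
}
```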
[clang] [openmp] [OpenMP] Directly use user's grid and block size in kernel language mode (PR #70612)
https://github.com/jdoerfert edited https://github.com/llvm/llvm-project/pull/70612 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[llvm] [libcxx] [clang] [openmp] [flang] [OpenMP][NFC] Separate OpenMP/OpenACC specific mapping code (PR #73817)
https://github.com/jdoerfert updated https://github.com/llvm/llvm-project/pull/73817 >From ed1513641d575c4a2881613864c892aff7855a78 Mon Sep 17 00:00:00 2001 From: Johannes Doerfert Date: Tue, 28 Nov 2023 18:51:23 -0800 Subject: [PATCH] [OpenMP][NFC] Separate OpenMP/OpenACC specific mapping code While this does not really encapsulate the mapping code, it at least moves most of the declarations out of the way. --- openmp/libomptarget/include/OpenMP/Mapping.h | 427 +++ openmp/libomptarget/include/device.h | 350 +-- openmp/libomptarget/src/CMakeLists.txt | 2 + openmp/libomptarget/src/OpenMP/Mapping.cpp | 40 ++ openmp/libomptarget/src/omptarget.cpp| 1 + openmp/libomptarget/src/private.h| 82 6 files changed, 473 insertions(+), 429 deletions(-) create mode 100644 openmp/libomptarget/include/OpenMP/Mapping.h create mode 100644 openmp/libomptarget/src/OpenMP/Mapping.cpp diff --git a/openmp/libomptarget/include/OpenMP/Mapping.h b/openmp/libomptarget/include/OpenMP/Mapping.h new file mode 100644 index 000..b01831c61f6c823 --- /dev/null +++ b/openmp/libomptarget/include/OpenMP/Mapping.h @@ -0,0 +1,427 @@ +//===-- OpenMP/Mapping.h - OpenMP/OpenACC pointer mapping ---*- C++ -*-===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===--===// +// +// Declarations for managing host-to-device pointer mappings. +// +//===--===// + +#ifndef OMPTARGET_OPENMP_MAPPING_H +#define OMPTARGET_OPENMP_MAPPING_H + +#include "omptarget.h" + +#include +#include +#include + +#include "llvm/ADT/SmallSet.h" + +struct DeviceTy; +class AsyncInfoTy; + +using map_var_info_t = void *; + +/// Information about shadow pointers. +struct ShadowPtrInfoTy { + void **HstPtrAddr = nullptr; + void *HstPtrVal = nullptr; + void **TgtPtrAddr = nullptr; + void *TgtPtrVal = nullptr; + + bool operator==(const ShadowPtrInfoTy &Other) const { +return HstPtrAddr == Other.HstPtrAddr; + } +}; + +inline bool operator<(const ShadowPtrInfoTy &lhs, const ShadowPtrInfoTy &rhs) { + return lhs.HstPtrAddr < rhs.HstPtrAddr; +} + +/// Map between host data and target data. +struct HostDataToTargetTy { + const uintptr_t HstPtrBase; // host info. + const uintptr_t HstPtrBegin; + const uintptr_t HstPtrEnd; // non-inclusive. + const map_var_info_t HstPtrName; // Optional source name of mapped variable. + + const uintptr_t TgtAllocBegin; // allocated target memory + const uintptr_t TgtPtrBegin; // mapped target memory = TgtAllocBegin + padding + +private: + static const uint64_t INFRefCount = ~(uint64_t)0; + static std::string refCountToStr(uint64_t RefCount) { +return RefCount == INFRefCount ? "INF" : std::to_string(RefCount); + } + + struct StatesTy { +StatesTy(uint64_t DRC, uint64_t HRC) +: DynRefCount(DRC), HoldRefCount(HRC) {} +/// The dynamic reference count is the standard reference count as of OpenMP +/// 4.5. The hold reference count is an OpenMP extension for the sake of +/// OpenACC support. +/// +/// The 'ompx_hold' map type modifier is permitted only on "omp target" and +/// "omp target data", and "delete" is permitted only on "omp target exit +/// data" and associated runtime library routines. As a result, we really +/// need to implement "reset" functionality only for the dynamic reference +/// counter. Likewise, only the dynamic reference count can be infinite +/// because, for example, omp_target_associate_ptr and "omp declare target +/// link" operate only on it. 
Nevertheless, it's actually easier to follow +/// the code (and requires less assertions for special cases) when we just +/// implement these features generally across both reference counters here. +/// Thus, it's the users of this class that impose those restrictions. +/// +uint64_t DynRefCount; +uint64_t HoldRefCount; + +/// A map of shadow pointers associated with this entry, the keys are host +/// pointer addresses to identify stale entries. +llvm::SmallSet ShadowPtrInfos; + +/// Pointer to the event corresponding to the data update of this map. +/// Note: At present this event is created when the first data transfer from +/// host to device is issued, and only being used for H2D. It is not used +/// for data transfer in another direction (device to host). It is still +/// unclear whether we need it for D2H. If in the future we need similar +/// mechanism for D2H, and if the event cannot be shared between them, Event +/// should be written as void *Event[2]. +void *Event = nullptr; + +/// Number of threads currently holding a re
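The dynamic/hold split described above is easiest to see from the user side. Below is a hedged usage sketch (not one of the project's tests); it assumes the documented ompx_hold semantics, where storage is only released once both reference counts reach zero.

```cpp
// The 'ompx_hold' modifier bumps the hold reference count, so a later
// 'target exit data map(delete: ...)' can only reset the dynamic count;
// the mapping survives until the structured region ends.
#include <cstdio>
#include <omp.h>

int main() {
  int A[1024] = {0};
#pragma omp target data map(ompx_hold, tofrom : A) // hold count -> 1
  {
#pragma omp target map(tofrom : A) // dynamic count +1 for this region
    A[0] = 42;

#pragma omp target exit data map(delete : A) // resets only the dynamic count
    std::printf("still present: %d\n",
                omp_target_is_present(A, omp_get_default_device()));
  }
  std::printf("A[0] on the host: %d\n", A[0]);
}
```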
[llvm] [libcxx] [clang] [openmp] [flang] [OpenMP][NFC] Separate OpenMP/OpenACC specific mapping code (PR #73817)
https://github.com/jdoerfert closed https://github.com/llvm/llvm-project/pull/73817 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[openmp] [clang] [OpenMP] Avoid initializing the KernelLaunchEnvironment if possible (PR #73864)
https://github.com/jdoerfert closed https://github.com/llvm/llvm-project/pull/73864 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang-tools-extra] [llvm] [clang-tidy] Add bugprone-move-shared-pointer-contents check. (PR #67467)
https://github.com/jdoerfert updated https://github.com/llvm/llvm-project/pull/67467 >From 6d5d35e1273f595e8a0382053d5183cbce7a9d8a Mon Sep 17 00:00:00 2001 From: David Pizzuto Date: Tue, 26 Sep 2023 10:45:42 -0700 Subject: [PATCH] [clang-tidy] Add bugprone-move-shared-pointer-contents check. This check detects moves of the contents of a shared pointer rather than the pointer itself. Other code with a reference to the shared pointer is probably not expecting the move. The set of shared pointer classes is configurable via options to allow individual projects to cover additional types. --- .../bugprone/BugproneTidyModule.cpp | 3 + .../clang-tidy/bugprone/CMakeLists.txt| 2 + .../MoveSharedPointerContentsCheck.cpp| 60 .../bugprone/MoveSharedPointerContentsCheck.h | 37 ++ clang-tools-extra/docs/ReleaseNotes.rst | 6 ++ .../bugprone/move-shared-pointer-contents.rst | 17 + .../docs/clang-tidy/checks/list.rst | 2 + .../bugprone/move-shared-pointer-contents.cpp | 68 +++ 8 files changed, 195 insertions(+) create mode 100644 clang-tools-extra/clang-tidy/bugprone/MoveSharedPointerContentsCheck.cpp create mode 100644 clang-tools-extra/clang-tidy/bugprone/MoveSharedPointerContentsCheck.h create mode 100644 clang-tools-extra/docs/clang-tidy/checks/bugprone/move-shared-pointer-contents.rst create mode 100644 clang-tools-extra/test/clang-tidy/checkers/bugprone/move-shared-pointer-contents.cpp diff --git a/clang-tools-extra/clang-tidy/bugprone/BugproneTidyModule.cpp b/clang-tools-extra/clang-tidy/bugprone/BugproneTidyModule.cpp index a67a91eedd10482..7f4a504f9930f17 100644 --- a/clang-tools-extra/clang-tidy/bugprone/BugproneTidyModule.cpp +++ b/clang-tools-extra/clang-tidy/bugprone/BugproneTidyModule.cpp @@ -39,6 +39,7 @@ #include "MisplacedPointerArithmeticInAllocCheck.h" #include "MisplacedWideningCastCheck.h" #include "MoveForwardingReferenceCheck.h" +#include "MoveSharedPointerContentsCheck.h" #include "MultiLevelImplicitPointerConversionCheck.h" #include "MultipleNewInOneExpressionCheck.h" #include "MultipleStatementMacroCheck.h" @@ -125,6 +126,8 @@ class BugproneModule : public ClangTidyModule { "bugprone-inaccurate-erase"); CheckFactories.registerCheck( "bugprone-incorrect-enable-if"); +CheckFactories.registerCheck( +"bugprone-move-shared-pointer-contents"); CheckFactories.registerCheck( "bugprone-switch-missing-default-case"); CheckFactories.registerCheck( diff --git a/clang-tools-extra/clang-tidy/bugprone/CMakeLists.txt b/clang-tools-extra/clang-tidy/bugprone/CMakeLists.txt index 3c768021feb1502..c017f0c0cc52021 100644 --- a/clang-tools-extra/clang-tidy/bugprone/CMakeLists.txt +++ b/clang-tools-extra/clang-tidy/bugprone/CMakeLists.txt @@ -23,6 +23,7 @@ add_clang_library(clangTidyBugproneModule ImplicitWideningOfMultiplicationResultCheck.cpp InaccurateEraseCheck.cpp IncorrectEnableIfCheck.cpp + MoveSharedPointerContentsCheck.cpp SwitchMissingDefaultCaseCheck.cpp IncDecInConditionsCheck.cpp IncorrectRoundingsCheck.cpp @@ -35,6 +36,7 @@ add_clang_library(clangTidyBugproneModule MisplacedPointerArithmeticInAllocCheck.cpp MisplacedWideningCastCheck.cpp MoveForwardingReferenceCheck.cpp + MoveSharedPointerContentsCheck.cpp MultiLevelImplicitPointerConversionCheck.cpp MultipleNewInOneExpressionCheck.cpp MultipleStatementMacroCheck.cpp diff --git a/clang-tools-extra/clang-tidy/bugprone/MoveSharedPointerContentsCheck.cpp b/clang-tools-extra/clang-tidy/bugprone/MoveSharedPointerContentsCheck.cpp new file mode 100644 index 000..b4a393b7f2f2000 --- /dev/null +++ 
b/clang-tools-extra/clang-tidy/bugprone/MoveSharedPointerContentsCheck.cpp @@ -0,0 +1,60 @@ +//===--- MoveSharedPointerContentsCheck.cpp - clang-tidy --===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===--===// + +#include "MoveSharedPointerContentsCheck.h" +#include "../ClangTidyCheck.h" +#include "../utils/Matchers.h" +#include "../utils/OptionsUtils.h" +#include "clang/AST/ASTContext.h" +#include "clang/ASTMatchers/ASTMatchFinder.h" + +using namespace clang::ast_matchers; + +namespace clang::tidy::bugprone { + +MoveSharedPointerContentsCheck::MoveSharedPointerContentsCheck( +StringRef Name, ClangTidyContext *Context) +: ClangTidyCheck(Name, Context), + SharedPointerClasses(utils::options::parseStringList( + Options.get("SharedPointerClasses", "std::shared_ptr"))) {} + +MoveSharedPointerContentsCheck::~MoveSharedPointerContentsCheck() = default; + +bool MoveSharedPointerContentsCheck::isLanguageVersionSupported( +const LangOptions &LangOptions) const { + return L
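To make the check's intent concrete, here is a small example of the pattern it is meant to flag; the snippet is illustrative and not taken from the patch's test file.

```cpp
// bugprone-move-shared-pointer-contents targets moves of the *pointee* of a
// shared_ptr: other owners of the same control block keep seeing the object,
// but now in a moved-from state.
#include <iostream>
#include <memory>
#include <string>
#include <utility>

static void consume(std::string S) { std::cout << S << '\n'; }

int main() {
  auto P = std::make_shared<std::string>("hello");
  auto Q = P; // second owner

  consume(std::move(*P)); // would be flagged: moves the contents out of *P
  std::cout << "other owner now sees: \"" << *Q << "\"\n"; // moved-from value

  auto R = std::move(P); // fine: transfers ownership of the pointer itself
  return R != nullptr ? 0 : 1;
}
```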
[lldb] [libc] [llvm] [compiler-rt] [clang] [flang] [libcxx] [clang-tools-extra] [openmp] [lld] [mlir] [libunwind] [OpenMP] Improve omp offload profiler (PR #68016)
https://github.com/jdoerfert approved this pull request. LG. Please rebase and merge. https://github.com/llvm/llvm-project/pull/68016 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [openmp] [Clang][OpenMP] Fix mapping of structs to device (PR #75642)
jdoerfert wrote:

This fails for me on the host and the AMD GPU:

GPU:
# | <stdin>:217:1: note: possible intended match here
# | dat.datum[dat.arr[0][0]] = 5

X86:
# | <stdin>:134:1: note: possible intended match here
# | dat.datum[dat.arr[0][0]] = 5461

The location that is printed (datum[1]) is uninitialized.

https://github.com/llvm/llvm-project/pull/75642 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
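For context, a hedged reconstruction of the pattern such a test exercises is below; only the names dat, datum, and arr come from the quoted output, everything else is guessed. The point of the complaint is that the element being printed has to be initialized, otherwise the comparison is against garbage.

```cpp
// Guessed shape of the test: a struct with a pointer member that must be
// mapped and attached alongside the struct itself, and whose pointed-to
// storage must be initialized before the device write is checked.
#include <cstdio>

struct Data {
  int *datum;    // needs its own array-section map plus pointer attachment
  int arr[2][2]; // plain member, mapped with the struct
};

int main() {
  int Storage[8] = {0};
  Data Dat{Storage, {{1, 0}, {0, 0}}};
#pragma omp target map(tofrom : Dat, Dat.datum[0:8])
  { Dat.datum[Dat.arr[0][0]] = 5; } // writes Storage[1] on the device
  std::printf("dat.datum[dat.arr[0][0]] = %d\n", Dat.datum[Dat.arr[0][0]]);
}
```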
[clang] [OpenMP][USM] Introduces -fopenmp-force-usm flag (PR #76571)
jdoerfert wrote: Documentation missing as well. https://github.com/llvm/llvm-project/pull/76571 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[llvm] [libc] [lldb] [mlir] [clang] [NFC][ObjectSizeOffset] Use classes instead of std::pair (PR #76882)
https://github.com/jdoerfert commented: Generally, getting rid of the pair is great. I am not sure I understand why we use the base template rather than inheritance. Where is the base template used? If we used inheritance, we could avoid duplicating members and provide default implementations that work in many cases, e.g., all the operator== definitions are the same. https://github.com/llvm/llvm-project/pull/76882 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
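A self-contained sketch of the inheritance-style alternative raised here, using toy types rather than the patch's APInt/Value specializations, might look like this:

```cpp
#include <cassert>

// The base holds the two fields and everything that does not depend on what
// "known" means, so operator== and the *Known() helpers are written once.
template <typename T, typename Derived> struct SizeOffsetBase {
  T Size{};
  T Offset{};
  bool knownSize() const { return static_cast<const Derived *>(this)->known(Size); }
  bool knownOffset() const { return static_cast<const Derived *>(this)->known(Offset); }
  bool anyKnown() const { return knownSize() || knownOffset(); }
  bool bothKnown() const { return knownSize() && knownOffset(); }
  bool operator==(const SizeOffsetBase &RHS) const {
    return Size == RHS.Size && Offset == RHS.Offset;
  }
  bool operator!=(const SizeOffsetBase &RHS) const { return !(*this == RHS); }
};

// Toy stand-in for the APInt flavour, where "known" means a width above 1.
struct SizeOffsetInt : SizeOffsetBase<int, SizeOffsetInt> {
  bool known(int V) const { return V > 1; }
};

int main() {
  SizeOffsetInt A, B;
  A.Size = 8;
  B.Size = 8;
  assert(A == B && A.knownSize() && !A.knownOffset());
  return 0;
}
```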
[mlir] [libc] [clang] [lldb] [llvm] [NFC][ObjectSizeOffset] Use classes instead of std::pair (PR #76882)
https://github.com/jdoerfert edited https://github.com/llvm/llvm-project/pull/76882 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[libc] [clang] [lldb] [llvm] [mlir] [NFC][ObjectSizeOffset] Use classes instead of std::pair (PR #76882)
@@ -187,80 +187,147 @@ Value *lowerObjectSizeCall( const TargetLibraryInfo *TLI, AAResults *AA, bool MustSucceed, SmallVectorImpl *InsertedInstructions = nullptr); -using SizeOffsetType = std::pair; +/// SizeOffsetType - A base template class for the object size visitors. Used +/// here as a self-documenting way to handle the values rather than using a +/// \p std::pair. +template struct SizeOffsetType { + T Size; + T Offset; + + bool knownSize() const; + bool knownOffset() const; + bool anyKnown() const; + bool bothKnown() const; +}; + +/// SizeOffsetType - Used by \p ObjectSizeOffsetVisitor, which works +/// with \p APInts. +template <> struct SizeOffsetType { + APInt Size; + APInt Offset; + + SizeOffsetType() = default; + SizeOffsetType(APInt Size, APInt Offset) : Size(Size), Offset(Offset) {} + + bool knownSize() const { return Size.getBitWidth() > 1; } + bool knownOffset() const { return Offset.getBitWidth() > 1; } + bool anyKnown() const { return knownSize() || knownOffset(); } + bool bothKnown() const { return knownSize() && knownOffset(); } + + bool operator==(const SizeOffsetType &RHS) { +return Size == RHS.Size && Offset == RHS.Offset; + } + bool operator!=(const SizeOffsetType &RHS) { return !(*this == RHS); } +}; +using SizeOffsetAPInt = SizeOffsetType; /// Evaluate the size and offset of an object pointed to by a Value* /// statically. Fails if size or offset are not known at compile time. class ObjectSizeOffsetVisitor - : public InstVisitor { +: public InstVisitor { const DataLayout &DL; const TargetLibraryInfo *TLI; ObjectSizeOpts Options; unsigned IntTyBits; APInt Zero; - SmallDenseMap SeenInsts; + SmallDenseMap SeenInsts; unsigned InstructionsVisited; APInt align(APInt Size, MaybeAlign Align); - SizeOffsetType unknown() { -return std::make_pair(APInt(), APInt()); - } + static SizeOffsetAPInt unknown; public: ObjectSizeOffsetVisitor(const DataLayout &DL, const TargetLibraryInfo *TLI, LLVMContext &Context, ObjectSizeOpts Options = {}); - SizeOffsetType compute(Value *V); - - static bool knownSize(const SizeOffsetType &SizeOffset) { -return SizeOffset.first.getBitWidth() > 1; - } - - static bool knownOffset(const SizeOffsetType &SizeOffset) { -return SizeOffset.second.getBitWidth() > 1; - } - - static bool bothKnown(const SizeOffsetType &SizeOffset) { -return knownSize(SizeOffset) && knownOffset(SizeOffset); - } + SizeOffsetAPInt compute(Value *V); // These are "private", except they can't actually be made private. Only // compute() should be used by external users. 
- SizeOffsetType visitAllocaInst(AllocaInst &I); - SizeOffsetType visitArgument(Argument &A); - SizeOffsetType visitCallBase(CallBase &CB); - SizeOffsetType visitConstantPointerNull(ConstantPointerNull&); - SizeOffsetType visitExtractElementInst(ExtractElementInst &I); - SizeOffsetType visitExtractValueInst(ExtractValueInst &I); - SizeOffsetType visitGlobalAlias(GlobalAlias &GA); - SizeOffsetType visitGlobalVariable(GlobalVariable &GV); - SizeOffsetType visitIntToPtrInst(IntToPtrInst&); - SizeOffsetType visitLoadInst(LoadInst &I); - SizeOffsetType visitPHINode(PHINode&); - SizeOffsetType visitSelectInst(SelectInst &I); - SizeOffsetType visitUndefValue(UndefValue&); - SizeOffsetType visitInstruction(Instruction &I); + SizeOffsetAPInt visitAllocaInst(AllocaInst &I); + SizeOffsetAPInt visitArgument(Argument &A); + SizeOffsetAPInt visitCallBase(CallBase &CB); + SizeOffsetAPInt visitConstantPointerNull(ConstantPointerNull &); + SizeOffsetAPInt visitExtractElementInst(ExtractElementInst &I); + SizeOffsetAPInt visitExtractValueInst(ExtractValueInst &I); + SizeOffsetAPInt visitGlobalAlias(GlobalAlias &GA); + SizeOffsetAPInt visitGlobalVariable(GlobalVariable &GV); + SizeOffsetAPInt visitIntToPtrInst(IntToPtrInst &); + SizeOffsetAPInt visitLoadInst(LoadInst &I); + SizeOffsetAPInt visitPHINode(PHINode &); + SizeOffsetAPInt visitSelectInst(SelectInst &I); + SizeOffsetAPInt visitUndefValue(UndefValue &); + SizeOffsetAPInt visitInstruction(Instruction &I); private: - SizeOffsetType findLoadSizeOffset( + SizeOffsetAPInt findLoadSizeOffset( LoadInst &LoadFrom, BasicBlock &BB, BasicBlock::iterator From, - SmallDenseMap &VisitedBlocks, + SmallDenseMap &VisitedBlocks, unsigned &ScannedInstCount); - SizeOffsetType combineSizeOffset(SizeOffsetType LHS, SizeOffsetType RHS); - SizeOffsetType computeImpl(Value *V); - SizeOffsetType computeValue(Value *V); + SizeOffsetAPInt combineSizeOffset(SizeOffsetAPInt LHS, SizeOffsetAPInt RHS); + SizeOffsetAPInt computeImpl(Value *V); + SizeOffsetAPInt computeValue(Value *V); bool CheckedZextOrTrunc(APInt &I); }; -using SizeOffsetEvalType = std::pair; +template <> struct SizeOffsetType; + +/// SizeOffsetType - Used by \p ObjectSizeOffsetEvaluator, which works +/// with \p Values. +template <
[openmp] [clang-tools-extra] [flang] [libcxx] [libc] [compiler-rt] [clang] [lldb] [lld] [llvm] [PGO][OpenMP] Instrumentation for GPU devices (PR #76587)
@@ -428,13 +428,22 @@ std::string getPGOFuncNameVarName(StringRef FuncName,
   return VarName;
 }

+bool isGPUProfTarget(const Module &M) {
+  const auto &triple = M.getTargetTriple();
+  return triple.rfind("nvptx", 0) == 0 || triple.rfind("amdgcn", 0) == 0 ||
+         triple.rfind("r600", 0) == 0;
+}
+

jdoerfert wrote:

Use the suggestion above. This is what we use elsewhere right now.

https://github.com/llvm/llvm-project/pull/76587 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
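The earlier suggestion is not quoted in this message, so, as an assumption, it presumably refers to the llvm::Triple predicates used elsewhere instead of string-prefix matching; a hedged sketch:

```cpp
// Same query via llvm::Triple; isAMDGPU() covers both amdgcn and r600, so
// the three rfind() calls collapse into two predicate calls.
#include "llvm/IR/Module.h"
#include "llvm/TargetParser/Triple.h"

static bool isGPUProfTarget(const llvm::Module &M) {
  const llvm::Triple T(M.getTargetTriple());
  return T.isAMDGPU() || T.isNVPTX();
}
```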
[libcxx] [llvm] [lld] [libc] [compiler-rt] [openmp] [lldb] [flang] [clang-tools-extra] [clang] [PGO][OpenMP] Instrumentation for GPU devices (PR #76587)
https://github.com/jdoerfert commented: Can we have tests for this? You can just check for the dump. https://github.com/llvm/llvm-project/pull/76587 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits