[llvm-branch-commits] [llvm] release/20.x: [SLP] Check for PHI nodes (potentially cycles!) when checking dependencies (PR #127294)
https://github.com/nikic commented: Looks like there is a test failure in Transforms/SLPVectorizer/X86/perfect-matched-reused-bv.ll. https://github.com/llvm/llvm-project/pull/127294 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] release/20.x: [RISCV] [MachineOutliner] Analyze all candidates (#127659) (PR #128146)
https://github.com/wangpc-pp approved this pull request. LGTM. https://github.com/llvm/llvm-project/pull/128146
[llvm-branch-commits] [clang] [clang] Fix preprocessor output from #embed (#126742) (PR #127222)
https://github.com/Fznamznon updated https://github.com/llvm/llvm-project/pull/127222 >From 95cf7310c15324f25e9e5276772278fa58ba6926 Mon Sep 17 00:00:00 2001 From: Mariya Podchishchaeva Date: Thu, 13 Feb 2025 10:59:21 +0100 Subject: [PATCH] [clang] Fix preprocessor output from #embed (#126742) When bytes with negative signed char values appear in the data, make sure to use raw bytes from the data string when preprocessing, not char values. Fixes https://github.com/llvm/llvm-project/issues/102798 --- clang/docs/ReleaseNotes.rst| 2 ++ clang/lib/Frontend/PrintPreprocessedOutput.cpp | 5 ++--- clang/test/Preprocessor/embed_preprocess_to_file.c | 8 3 files changed, 12 insertions(+), 3 deletions(-) diff --git a/clang/docs/ReleaseNotes.rst b/clang/docs/ReleaseNotes.rst index ad1a5e7ae282e..08f8491e2928d 100644 --- a/clang/docs/ReleaseNotes.rst +++ b/clang/docs/ReleaseNotes.rst @@ -897,6 +897,8 @@ Bug Fixes in This Version - No longer return ``false`` for ``noexcept`` expressions involving a ``delete`` which resolves to a destroying delete but the type of the object being deleted has a potentially throwing destructor (#GH118660). +- Clang now outputs correct values when #embed data contains bytes with negative + signed char values (#GH102798). Bug Fixes to Compiler Builtins ^^ diff --git a/clang/lib/Frontend/PrintPreprocessedOutput.cpp b/clang/lib/Frontend/PrintPreprocessedOutput.cpp index 1005825441b3e..2ae355fb33885 100644 --- a/clang/lib/Frontend/PrintPreprocessedOutput.cpp +++ b/clang/lib/Frontend/PrintPreprocessedOutput.cpp @@ -974,11 +974,10 @@ static void PrintPreprocessedTokens(Preprocessor &PP, Token &Tok, // Loop over the contents and print them as a comma-delimited list of // values. 
bool PrintComma = false; - for (auto Iter = Data->BinaryData.begin(), End = Data->BinaryData.end(); - Iter != End; ++Iter) { + for (unsigned char Byte : Data->BinaryData.bytes()) { if (PrintComma) *Callbacks->OS << ", "; -*Callbacks->OS << static_cast(*Iter); +*Callbacks->OS << static_cast(Byte); PrintComma = true; } } else if (Tok.isAnnotation()) { diff --git a/clang/test/Preprocessor/embed_preprocess_to_file.c b/clang/test/Preprocessor/embed_preprocess_to_file.c index 9895d958cf96d..b3c99d36f784a 100644 --- a/clang/test/Preprocessor/embed_preprocess_to_file.c +++ b/clang/test/Preprocessor/embed_preprocess_to_file.c @@ -37,3 +37,11 @@ const char even_more[] = { // DIRECTIVE-NEXT: #embed prefix(4, 5,) suffix(, 6, 7) /* clang -E -dE */ // DIRECTIVE-NEXT: , 8, 9, 10 // DIRECTIVE-NEXT: }; + +constexpr char big_one[] = { +#embed +}; + +// EXPANDED: constexpr char big_one[] = {255 +// DIRECTIVE: constexpr char big_one[] = { +// DIRECTIVE-NEXT: #embed ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
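The patch above fixes the printed values by iterating the embedded data as raw unsigned bytes rather than (possibly signed) chars. A minimal standalone sketch of the idea, using a plain std::string as a hypothetical stand-in for Data->BinaryData (this is illustrative, not the actual clang code):

```cpp
#include <sstream>
#include <string>

// Print a comma-delimited list of byte values, as the preprocessed
// #embed output does. Iterating as unsigned char guarantees that bytes
// >= 0x80 print as 128..255, never as negative signed char values.
std::string printBytes(const std::string &Data) {
  std::ostringstream OS;
  bool PrintComma = false;
  for (unsigned char Byte : Data) {
    if (PrintComma)
      OS << ", ";
    OS << static_cast<int>(Byte);
    PrintComma = true;
  }
  return OS.str();
}
```

With the pre-fix signed iteration, a 0xFF byte in the embedded data would have printed as -1 instead of 255, which is exactly the bug in #GH102798.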
[llvm-branch-commits] [clang] [clang] Fix preprocessor output from #embed (#126742) (PR #127222)
https://github.com/AaronBallman approved this pull request. LGTM! https://github.com/llvm/llvm-project/pull/127222
[llvm-branch-commits] [clang] [libc] release/20.x: [Clang] Fix cross-lane scan when given divergent lanes (#127703) (PR #128085)
https://github.com/shiltian approved this pull request. https://github.com/llvm/llvm-project/pull/128085
[llvm-branch-commits] [llvm] release/20.x: AMDGPU: Widen f16 minimum/maximum to v2f16 on gfx950 (#128121) (PR #128132)
https://github.com/shiltian approved this pull request. https://github.com/llvm/llvm-project/pull/128132
[llvm-branch-commits] [clang] release/20.x: [clang][CodeGen] `sret` args should always point to the `alloca` AS, so use that (#114062) (PR #127552)
tstellar wrote: @arsenm What do you think? https://github.com/llvm/llvm-project/pull/127552
[llvm-branch-commits] [clang] release/20.x: [CUDA] Add support for sm101 and sm120 target architectures (#127187) (PR #127918)
jodelek wrote: @Artem-B, just curious - is there anything additional that needs to happen before you can approve this? https://github.com/llvm/llvm-project/pull/127918
[llvm-branch-commits] [llvm] release/20.x: [RISCV] [MachineOutliner] Analyze all candidates (#127659) (PR #128146)
github-actions[bot] wrote: @svs-quic (or anyone else): If you would like to add a note about this fix in the release notes (completely optional), please reply to this comment with a one- or two-sentence description of the fix. When you are done, please add the release:note label to this PR. https://github.com/llvm/llvm-project/pull/128146
[llvm-branch-commits] [clang] release/20.x: [CUDA] Add support for sm101 and sm120 target architectures (#127187) (PR #127918)
https://github.com/Artem-B approved this pull request. I was the one proposing to merge this change, so I assumed that it's the release maintainers who'd need to stamp it. I am all for merging it. https://github.com/llvm/llvm-project/pull/127918
[llvm-branch-commits] [clang] release/20.x: [CSKY] Default to unsigned char (PR #126436)
https://github.com/llvmbot updated https://github.com/llvm/llvm-project/pull/126436 >From 77195a5edb332947a991a1f0c4e915f5f1d9411f Mon Sep 17 00:00:00 2001 From: Alexander Richardson Date: Sun, 9 Feb 2025 12:18:52 -0800 Subject: [PATCH] [CSKY] Default to unsigned char This matches the ABI document found at https://github.com/c-sky/csky-doc/blob/master/C-SKY_V2_CPU_Applications_Binary_Interface_Standards_Manual.pdf Partially addresses https://github.com/llvm/llvm-project/issues/115957 Reviewed By: zixuan-wu Pull Request: https://github.com/llvm/llvm-project/pull/115961 (cherry picked from commit d2047242e6d0f0deb7634ff22ab164354c520c79) --- clang/lib/Driver/ToolChains/Clang.cpp | 1 + clang/test/Driver/csky-toolchain.c| 1 + 2 files changed, 2 insertions(+) diff --git a/clang/lib/Driver/ToolChains/Clang.cpp b/clang/lib/Driver/ToolChains/Clang.cpp index ec5ee29ece434..57b7d2bd46989 100644 --- a/clang/lib/Driver/ToolChains/Clang.cpp +++ b/clang/lib/Driver/ToolChains/Clang.cpp @@ -1358,6 +1358,7 @@ static bool isSignedCharDefault(const llvm::Triple &Triple) { return true; return false; + case llvm::Triple::csky: case llvm::Triple::hexagon: case llvm::Triple::msp430: case llvm::Triple::ppcle: diff --git a/clang/test/Driver/csky-toolchain.c b/clang/test/Driver/csky-toolchain.c index 66485464652ac..638ce64ec98cd 100644 --- a/clang/test/Driver/csky-toolchain.c +++ b/clang/test/Driver/csky-toolchain.c @@ -3,6 +3,7 @@ // RUN: %clang -### %s --target=csky 2>&1 | FileCheck -check-prefix=CC1 %s // CC1: "-cc1" "-triple" "csky" +// CC1: "-fno-signed-char" // In the below tests, --rtlib=platform is used so that the driver ignores // the configure-time CLANG_DEFAULT_RTLIB option when choosing the runtime lib ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
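The behavioral difference this default controls can be shown without a CSKY toolchain by spelling the signedness out explicitly. A small target-independent sketch (illustrative only, not CSKY-specific code): the same bit pattern reads differently through signed char and unsigned char, and CSKY's ABI wants the unsigned reading, hence the new -fno-signed-char default.

```cpp
#include <cstdint>

// The bit pattern 0x80 is -128 when plain char is signed and 128 when
// it is unsigned. A target's default for plain char decides which of
// these two readings ordinary `char` arithmetic and comparisons get.
int asSignedChar(uint8_t Bits) { return static_cast<signed char>(Bits); }
int asUnsignedChar(uint8_t Bits) { return static_cast<unsigned char>(Bits); }
```

Code that compares plain `char` values against 0, or uses them as array indices, can therefore change behavior when a target flips this default, which is why it is an ABI-document matter rather than a style choice.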
[llvm-branch-commits] [llvm] 1504fc5 - AMDGPU: Stop emitting an error on illegal addrspacecasts (#127487) (#127751)
Author: Matt Arsenault Date: 2025-02-21T09:35:52-08:00 New Revision: 1504fc57d88d5d700d5f8053ebc46b33e8bb12bf URL: https://github.com/llvm/llvm-project/commit/1504fc57d88d5d700d5f8053ebc46b33e8bb12bf DIFF: https://github.com/llvm/llvm-project/commit/1504fc57d88d5d700d5f8053ebc46b33e8bb12bf.diff LOG: AMDGPU: Stop emitting an error on illegal addrspacecasts (#127487) (#127751) These cannot be static compile errors, and should be treated as poison. Invalid casts may be introduced which are dynamically dead. For example: ``` void foo(volatile generic int* x) { __builtin_assume(is_shared(x)); *x = 4; } void bar() { private int y; foo(&y); // violation, wrong address space } ``` This could produce a compile time backend error or not depending on the optimization level. Similarly, the new test demonstrates a failure on a lowered atomicrmw which required inserting runtime address space checks. The invalid cases are dynamically dead, we should not error, and the AtomicExpand pass shouldn't have to consider the details of the incoming pointer to produce valid IR. This should go to the release branch. This fixes broken -O0 compiles with 64-bit atomics which would have started failing in 1d03708. 
(cherry picked from commit 18ea6c9) Added: Modified: llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp llvm/lib/Target/AMDGPU/SIISelLowering.cpp llvm/test/CodeGen/AMDGPU/atomicrmw-expand.ll llvm/test/CodeGen/AMDGPU/invalid-addrspacecast.ll Removed: diff --git a/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp b/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp index e9e47eaadd557..e84f0f5fa615a 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp +++ b/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp @@ -2426,11 +2426,8 @@ bool AMDGPULegalizerInfo::legalizeAddrSpaceCast( return true; } - DiagnosticInfoUnsupported InvalidAddrSpaceCast( - MF.getFunction(), "invalid addrspacecast", B.getDebugLoc()); - - LLVMContext &Ctx = MF.getFunction().getContext(); - Ctx.diagnose(InvalidAddrSpaceCast); + // Invalid casts are poison. + // TODO: Should return poison B.buildUndef(Dst); MI.eraseFromParent(); return true; diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp index b632c50dae0e3..e09df53995d61 100644 --- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp +++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp @@ -7340,11 +7340,8 @@ SDValue SITargetLowering::lowerADDRSPACECAST(SDValue Op, // global <-> flat are no-ops and never emitted. - const MachineFunction &MF = DAG.getMachineFunction(); - DiagnosticInfoUnsupported InvalidAddrSpaceCast( - MF.getFunction(), "invalid addrspacecast", SL.getDebugLoc()); - DAG.getContext()->diagnose(InvalidAddrSpaceCast); - + // Invalid casts are poison. 
+ // TODO: Should return poison return DAG.getUNDEF(Op->getValueType(0)); } diff --git a/llvm/test/CodeGen/AMDGPU/atomicrmw-expand.ll b/llvm/test/CodeGen/AMDGPU/atomicrmw-expand.ll index f5c9b1a79b476..9b446896db590 100644 --- a/llvm/test/CodeGen/AMDGPU/atomicrmw-expand.ll +++ b/llvm/test/CodeGen/AMDGPU/atomicrmw-expand.ll @@ -444,6 +444,761 @@ define float @no_unsafe(ptr %addr, float %val) { ret float %res } +@global = hidden addrspace(1) global i64 0, align 8 + +; Make sure there is no error on an invalid addrspacecast without optimizations +define i64 @optnone_atomicrmw_add_i64_expand(i64 %val) #1 { +; GFX908-LABEL: optnone_atomicrmw_add_i64_expand: +; GFX908: ; %bb.0: +; GFX908-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) +; GFX908-NEXT:s_mov_b64 s[4:5], src_private_base +; GFX908-NEXT:s_mov_b32 s6, 32 +; GFX908-NEXT:s_lshr_b64 s[4:5], s[4:5], s6 +; GFX908-NEXT:s_getpc_b64 s[6:7] +; GFX908-NEXT:s_add_u32 s6, s6, global@rel32@lo+4 +; GFX908-NEXT:s_addc_u32 s7, s7, global@rel32@hi+12 +; GFX908-NEXT:s_cmp_eq_u32 s7, s4 +; GFX908-NEXT:s_cselect_b64 s[4:5], -1, 0 +; GFX908-NEXT:v_cndmask_b32_e64 v2, 0, 1, s[4:5] +; GFX908-NEXT:s_mov_b64 s[4:5], -1 +; GFX908-NEXT:s_mov_b32 s6, 1 +; GFX908-NEXT:v_cmp_ne_u32_e64 s[6:7], v2, s6 +; GFX908-NEXT:s_and_b64 vcc, exec, s[6:7] +; GFX908-NEXT:; implicit-def: $vgpr3_vgpr4 +; GFX908-NEXT:s_cbranch_vccnz .LBB4_3 +; GFX908-NEXT: .LBB4_1: ; %Flow +; GFX908-NEXT:v_cndmask_b32_e64 v2, 0, 1, s[4:5] +; GFX908-NEXT:s_mov_b32 s4, 1 +; GFX908-NEXT:v_cmp_ne_u32_e64 s[4:5], v2, s4 +; GFX908-NEXT:s_and_b64 vcc, exec, s[4:5] +; GFX908-NEXT:s_cbranch_vccnz .LBB4_4 +; GFX908-NEXT: ; %bb.2: ; %atomicrmw.private +; GFX908-NEXT:s_waitcnt lgkmcnt(0) +; GFX908-NEXT:buffer_load_dword v3, v0, s[0:3], 0 offen +; GFX908-NEXT:s_waitcnt vmcnt(0) +; GFX908-NEXT:v_mov_b32_e32 v4, v3 +; GFX908-NEXT:v_add_co_u32_e64 v0, s[4:5], v3, v0 +; GFX908-NEXT:v_addc_co_u32_e64 v1, s[4:5], v4, v1, s[4
[llvm-branch-commits] [clang] release/20.x: [CSKY] Default to unsigned char (PR #126436)
arichardson wrote: > @arichardson Would you be able to create a follow-up PR with a release note entry? Sure, will do this. https://github.com/llvm/llvm-project/pull/126436
[llvm-branch-commits] [clang] deb63e7 - [clang] Track function template instantiation from definition (#125266) (#127777)
Author: Matheus Izvekov Date: 2025-02-21T10:49:10-08:00 New Revision: deb63e72d6c9ed98a2fbf4f8249ca6911bd189b8 URL: https://github.com/llvm/llvm-project/commit/deb63e72d6c9ed98a2fbf4f8249ca6911bd189b8 DIFF: https://github.com/llvm/llvm-project/commit/deb63e72d6c9ed98a2fbf4f8249ca6911bd189b8.diff LOG: [clang] Track function template instantiation from definition (#125266) (#12) This fixes instantiation of definition for friend function templates, when the declaration found and the one containing the definition have different template contexts. In these cases, the function declaration corresponding to the definition is not available; it may not even be instantiated at all. So this patch adds a bit which tracks which function template declaration was instantiated from the member template. It's used to find which primary template serves as a context for the purpose of obtaining the template arguments needed to instantiate the definition. Fixes #55509 Added: clang/test/SemaTemplate/GH55509.cpp Modified: clang/docs/ReleaseNotes.rst clang/include/clang/AST/Decl.h clang/include/clang/AST/DeclBase.h clang/include/clang/AST/DeclTemplate.h clang/lib/AST/Decl.cpp clang/lib/Sema/SemaTemplateDeduction.cpp clang/lib/Sema/SemaTemplateInstantiate.cpp clang/lib/Sema/SemaTemplateInstantiateDecl.cpp clang/lib/Serialization/ASTReaderDecl.cpp clang/lib/Serialization/ASTWriterDecl.cpp Removed: diff --git a/clang/docs/ReleaseNotes.rst b/clang/docs/ReleaseNotes.rst index e716efa46a1f0..a2518042cb5b0 100644 --- a/clang/docs/ReleaseNotes.rst +++ b/clang/docs/ReleaseNotes.rst @@ -1057,6 +1057,7 @@ Bug Fixes to C++ Support - Fix that some dependent immediate expressions did not cause immediate escalation (#GH119046) - Fixed a substitution bug in transforming CTAD aliases when the type alias contains a non-pack template argument corresponding to a pack parameter (#GH124715) +- Clang is now better at keeping track of friend function template instance contexts.
(#GH55509) Bug Fixes to AST Handling ^ diff --git a/clang/include/clang/AST/Decl.h b/clang/include/clang/AST/Decl.h index 9593bab576412..362a2741a0cdd 100644 --- a/clang/include/clang/AST/Decl.h +++ b/clang/include/clang/AST/Decl.h @@ -2298,6 +2298,13 @@ class FunctionDecl : public DeclaratorDecl, FunctionDeclBits.IsLateTemplateParsed = ILT; } + bool isInstantiatedFromMemberTemplate() const { +return FunctionDeclBits.IsInstantiatedFromMemberTemplate; + } + void setInstantiatedFromMemberTemplate(bool Val = true) { +FunctionDeclBits.IsInstantiatedFromMemberTemplate = Val; + } + /// Whether this function is "trivial" in some specialized C++ senses. /// Can only be true for default constructors, copy constructors, /// copy assignment operators, and destructors. Not meaningful until diff --git a/clang/include/clang/AST/DeclBase.h b/clang/include/clang/AST/DeclBase.h index 3bb82c1572ef9..648dae2838e03 100644 --- a/clang/include/clang/AST/DeclBase.h +++ b/clang/include/clang/AST/DeclBase.h @@ -1780,6 +1780,8 @@ class DeclContext { uint64_t HasImplicitReturnZero : 1; LLVM_PREFERRED_TYPE(bool) uint64_t IsLateTemplateParsed : 1; +LLVM_PREFERRED_TYPE(bool) +uint64_t IsInstantiatedFromMemberTemplate : 1; /// Kind of contexpr specifier as defined by ConstexprSpecKind. LLVM_PREFERRED_TYPE(ConstexprSpecKind) @@ -1830,7 +1832,7 @@ class DeclContext { }; /// Number of inherited and non-inherited bits in FunctionDeclBitfields. - enum { NumFunctionDeclBits = NumDeclContextBits + 31 }; + enum { NumFunctionDeclBits = NumDeclContextBits + 32 }; /// Stores the bits used by CXXConstructorDecl. If modified /// NumCXXConstructorDeclBits and the accessor @@ -1841,12 +1843,12 @@ class DeclContext { LLVM_PREFERRED_TYPE(FunctionDeclBitfields) uint64_t : NumFunctionDeclBits; -/// 20 bits to fit in the remaining available space. +/// 19 bits to fit in the remaining available space. 
/// Note that this makes CXXConstructorDeclBitfields take /// exactly 64 bits and thus the width of NumCtorInitializers /// will need to be shrunk if some bit is added to NumDeclContextBitfields, /// NumFunctionDeclBitfields or CXXConstructorDeclBitfields. -uint64_t NumCtorInitializers : 17; +uint64_t NumCtorInitializers : 16; LLVM_PREFERRED_TYPE(bool) uint64_t IsInheritingConstructor : 1; @@ -1860,7 +1862,7 @@ class DeclContext { }; /// Number of inherited and non-inherited bits in CXXConstructorDeclBitfields. - enum { NumCXXConstructorDeclBits = NumFunctionDeclBits + 20 }; + enum { NumCXXConstructorDeclBits = NumFunctionDeclBits + 19 }; /// Stores the bits used by ObjCMethodDecl. /// If modified NumObjCMethodDeclBits and the accessor diff --git a/clang/include/clang/AST/Dec
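The NumCtorInitializers change in the hunk above is a bit-budget trade: the constructor bitfield word must stay exactly 64 bits, so the new one-bit IsInstantiatedFromMemberTemplate flag added to FunctionDeclBitfields is paid for by shrinking the counter from 17 to 16 bits. A simplified sketch of the constraint (field names and widths are illustrative stand-ins, not the real clang layout):

```cpp
#include <cstdint>

// 45 stand-in inherited bits + a 16-bit counter + 3 one-bit flags fill
// the 64-bit word exactly. Widening any field, or adding another flag,
// would force NumCtorInitializers to shrink further (16 -> 15), which
// is the trade-off the patch comment describes.
struct ConstructorBits {
  uint64_t InheritedBits : 45;       // stand-in for inherited DeclContext/FunctionDecl bits
  uint64_t NumCtorInitializers : 16; // was 17 before the extra flag upstream
  uint64_t Flag0 : 1;
  uint64_t Flag1 : 1;
  uint64_t Flag2 : 1;
};
```

The practical cost of the patch is therefore that a constructor can now record at most 65535 initializers instead of 131071, which is still far beyond anything real code reaches.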
[llvm-branch-commits] [clang] Backport: [clang] Track function template instantiation from definition (#125266) (PR #127777)
https://github.com/tstellar closed https://github.com/llvm/llvm-project/pull/12
[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: Temporal divergence lowering (non i1) (PR #124298)
nhaehnle wrote: How about this comment from earlier: > Every Inst may potentially appear with many UseInsts in the temporal > divergence list. The current code will create multiple new registers and > multiple COPY instructions, which seems wasteful even if downstream passes > can often clean it up. > > I would suggest capturing the created register in a DenseMap<MachineInstr *, Register> for re-use. > > Also, how about inserting the COPY at the end of Inst->getParent()? That way, > the live range of the VGPR is reduced. ? https://github.com/llvm/llvm-project/pull/124298
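The reuse the reviewer asks for is a standard memoization pattern. A minimal sketch with simplified stand-in types (int in place of MachineInstr * and llvm::Register, std::unordered_map in place of llvm::DenseMap; none of this is the actual pass code):

```cpp
#include <unordered_map>

using Inst = int;     // stand-in for MachineInstr *
using Register = int; // stand-in for llvm::Register

struct CopyBuilder {
  std::unordered_map<Inst, Register> Cache; // DenseMap in real code
  Register NextReg = 100;
  int CopiesEmitted = 0;

  // Create the vreg + COPY for an Inst at most once; every later
  // (Inst, UseInst) pair from the temporal divergence list reuses it.
  Register getOrCreateCopy(Inst I) {
    auto [It, Inserted] = Cache.try_emplace(I, Register(0));
    if (Inserted) {
      It->second = NextReg++; // would also build the COPY instruction here
      ++CopiesEmitted;
    }
    return It->second;
  }
};
```

The point of the suggestion is visible in the counters: however many UseInsts reference the same Inst, only one copy is materialized per Inst.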
[llvm-branch-commits] [clang] release/20.x: [CMake][Release] Statically link clang with stage1 runtimes (#127268) (PR #127949)
https://github.com/llvmbot updated https://github.com/llvm/llvm-project/pull/127949 >From dc1bd6a8fa6a5f4fc38f7c3ce77c0ffcfcaa66e9 Mon Sep 17 00:00:00 2001 From: Tom Stellard Date: Wed, 19 Feb 2025 17:46:29 -0800 Subject: [PATCH] [CMake][Release] Statically link clang with stage1 runtimes (#127268) This change will cause clang and the other tools to statically link against the runtimes built in stage1. This will make the built binaries more portable by eliminating dependencies on system libraries like libgcc and libstdc++. (cherry picked from commit f5b311e47de044160aeb25221095898c35c4847f) --- clang/cmake/caches/Release.cmake | 27 +++ 1 file changed, 23 insertions(+), 4 deletions(-) diff --git a/clang/cmake/caches/Release.cmake b/clang/cmake/caches/Release.cmake index 23e99493087ff..a1c68fc51dbd0 100644 --- a/clang/cmake/caches/Release.cmake +++ b/clang/cmake/caches/Release.cmake @@ -48,10 +48,8 @@ set(CLANG_ENABLE_BOOTSTRAP ON CACHE BOOL "") set(STAGE1_PROJECTS "clang") -# Building Flang on Windows requires compiler-rt, so we need to build it in -# stage1. compiler-rt is also required for building the Flang tests on -# macOS. -set(STAGE1_RUNTIMES "compiler-rt") +# Build all runtimes so we can statically link them into the stage2 compiler. +set(STAGE1_RUNTIMES "compiler-rt;libcxx;libcxxabi;libunwind") if (LLVM_RELEASE_ENABLE_PGO) list(APPEND STAGE1_PROJECTS "lld") @@ -90,9 +88,20 @@ else() set(CLANG_BOOTSTRAP_TARGETS ${LLVM_RELEASE_FINAL_STAGE_TARGETS} CACHE STRING "") endif() +if (LLVM_RELEASE_ENABLE_LTO) + # Enable LTO for the runtimes. We need to configure stage1 clang to default + # to using lld as the linker because the stage1 toolchain will be used to + # build and link the runtimes. + # FIXME: We can't use LLVM_ENABLE_LTO=Thin here, because it causes the CMake + # step for the libcxx build to fail. CMAKE_INTERPROCEDURAL_OPTIMIZATION does + # enable ThinLTO, though. 
+ set(RUNTIMES_CMAKE_ARGS "-DCMAKE_INTERPROCEDURAL_OPTIMIZATION=ON -DLLVM_ENABLE_LLD=ON" CACHE STRING "") +endif() + # Stage 1 Common Config set(LLVM_ENABLE_RUNTIMES ${STAGE1_RUNTIMES} CACHE STRING "") set(LLVM_ENABLE_PROJECTS ${STAGE1_PROJECTS} CACHE STRING "") +set(LIBCXX_STATICALLY_LINK_ABI_IN_STATIC_LIBRARY ON CACHE STRING "") # stage2-instrumented and Final Stage Config: # Options that need to be set in both the instrumented stage (if we are doing @@ -102,6 +111,16 @@ set_instrument_and_final_stage_var(LLVM_ENABLE_LTO "${LLVM_RELEASE_ENABLE_LTO}" if (LLVM_RELEASE_ENABLE_LTO) set_instrument_and_final_stage_var(LLVM_ENABLE_LLD "ON" BOOL) endif() +set_instrument_and_final_stage_var(LLVM_ENABLE_LIBCXX "ON" BOOL) +set_instrument_and_final_stage_var(LLVM_STATIC_LINK_CXX_STDLIB "ON" BOOL) +set(RELEASE_LINKER_FLAGS "-rtlib=compiler-rt --unwindlib=libunwind") +if(NOT ${CMAKE_HOST_SYSTEM_NAME} MATCHES "Darwin") + set(RELEASE_LINKER_FLAGS "${RELEASE_LINKER_FLAGS} -static-libgcc") +endif() + +set_instrument_and_final_stage_var(CMAKE_EXE_LINKER_FLAGS ${RELEASE_LINKER_FLAGS} STRING) +set_instrument_and_final_stage_var(CMAKE_SHARED_LINKER_FLAGS ${RELEASE_LINKER_FLAGS} STRING) +set_instrument_and_final_stage_var(CMAKE_MODULE_LINKER_FLAGS ${RELEASE_LINKER_FLAGS} STRING) # Final Stage Config (stage2) set_final_stage_var(LLVM_ENABLE_RUNTIMES "${LLVM_RELEASE_ENABLE_RUNTIMES}" STRING) ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] 3076a68 - [RISCV] [MachineOutliner] Analyze all candidates (#127659)
Author: Sudharsan Veeravalli Date: 2025-02-21T10:56:29-08:00 New Revision: 3076a68f69aac3f87195eec12f38908a499263cb URL: https://github.com/llvm/llvm-project/commit/3076a68f69aac3f87195eec12f38908a499263cb DIFF: https://github.com/llvm/llvm-project/commit/3076a68f69aac3f87195eec12f38908a499263cb.diff LOG: [RISCV] [MachineOutliner] Analyze all candidates (#127659) #117700 made a change from analyzing all the candidates to analyzing just the first candidate before deciding to either delete or keep all of them. Even though the candidates all have the same instructions, the basic blocks in which they are present are different and we will need to check each of them before deciding whether to keep or erase them. Particularly, `isAvailableAcrossAndOutOfSeq` checks to see if the register (x5 in this case) is available from the end of the MBB to the beginning of the candidate and not checking this for each candidate led to incorrect candidates being outlined resulting in correctness issues in a few downstream benchmarks. Similarly, deleting all the candidates if the first one is not viable will result in missed outlining opportunities. (cherry picked from commit 6757cf4e6f1c7767d605e579930a24758c0778dc) Added: llvm/test/CodeGen/RISCV/machine-outliner-call-x5-liveout.mir Modified: llvm/lib/Target/RISCV/RISCVInstrInfo.cpp Removed: diff --git a/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp b/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp index 12a7af0750813..87f1f35835cbe 100644 --- a/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp +++ b/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp @@ -3010,30 +3010,25 @@ static bool cannotInsertTailCall(const MachineBasicBlock &MBB) { return false; } -static std::optional -analyzeCandidate(outliner::Candidate &C) { +static bool analyzeCandidate(outliner::Candidate &C) { // If last instruction is return then we can rely on // the verification already performed in the getOutliningTypeImpl. 
if (C.back().isReturn()) { assert(!cannotInsertTailCall(*C.getMBB()) && "The candidate who uses return instruction must be outlined " "using tail call"); -return MachineOutlinerTailCall; +return false; } - auto CandidateUsesX5 = [](outliner::Candidate &C) { -const TargetRegisterInfo *TRI = C.getMF()->getSubtarget().getRegisterInfo(); -if (std::any_of(C.begin(), C.end(), [TRI](const MachineInstr &MI) { - return isMIModifiesReg(MI, TRI, RISCV::X5); -})) - return true; -return !C.isAvailableAcrossAndOutOfSeq(RISCV::X5, *TRI); - }; - - if (!CandidateUsesX5(C)) -return MachineOutlinerDefault; + // Filter out candidates where the X5 register (t0) can't be used to setup + // the function call. + const TargetRegisterInfo *TRI = C.getMF()->getSubtarget().getRegisterInfo(); + if (std::any_of(C.begin(), C.end(), [TRI](const MachineInstr &MI) { +return isMIModifiesReg(MI, TRI, RISCV::X5); + })) +return true; - return std::nullopt; + return !C.isAvailableAcrossAndOutOfSeq(RISCV::X5, *TRI); } std::optional> @@ -3042,35 +3037,32 @@ RISCVInstrInfo::getOutliningCandidateInfo( std::vector &RepeatedSequenceLocs, unsigned MinRepeats) const { - // Each RepeatedSequenceLoc is identical. - outliner::Candidate &Candidate = RepeatedSequenceLocs[0]; - auto CandidateInfo = analyzeCandidate(Candidate); - if (!CandidateInfo) -RepeatedSequenceLocs.clear(); + // Analyze each candidate and erase the ones that are not viable. + llvm::erase_if(RepeatedSequenceLocs, analyzeCandidate); // If the sequence doesn't have enough candidates left, then we're done. if (RepeatedSequenceLocs.size() < MinRepeats) return std::nullopt; + // Each RepeatedSequenceLoc is identical. + outliner::Candidate &Candidate = RepeatedSequenceLocs[0]; unsigned InstrSizeCExt = Candidate.getMF()->getSubtarget().hasStdExtCOrZca() ? 2 : 4; unsigned CallOverhead = 0, FrameOverhead = 0; - MachineOutlinerConstructionID MOCI = CandidateInfo.value(); - switch (MOCI) { - case MachineOutlinerDefault: -// call t0, function = 8 bytes. 
-CallOverhead = 8; -// jr t0 = 4 bytes, 2 bytes if compressed instructions are enabled. -FrameOverhead = InstrSizeCExt; -break; - case MachineOutlinerTailCall: + MachineOutlinerConstructionID MOCI = MachineOutlinerDefault; + if (Candidate.back().isReturn()) { +MOCI = MachineOutlinerTailCall; // tail call = auipc + jalr in the worst case without linker relaxation. CallOverhead = 4 + InstrSizeCExt; // Using tail call we move ret instruction from caller to callee. FrameOverhead = 0; -break; + } else { +// call t0, function = 8 bytes. +CallOverhead = 8; +// jr t0 = 4 bytes, 2 bytes if compressed instructions are enabled. +FrameOverhead
[llvm-branch-commits] [llvm] release/20.x: [RISCV] [MachineOutliner] Analyze all candidates (#127659) (PR #128146)
https://github.com/tstellar closed https://github.com/llvm/llvm-project/pull/128146
[llvm-branch-commits] [llvm] release/20.x: [RISCV] [MachineOutliner] Analyze all candidates (#127659) (PR #128146)
https://github.com/llvmbot updated https://github.com/llvm/llvm-project/pull/128146 >From 3076a68f69aac3f87195eec12f38908a499263cb Mon Sep 17 00:00:00 2001 From: Sudharsan Veeravalli Date: Fri, 21 Feb 2025 12:53:13 +0530 Subject: [PATCH] [RISCV] [MachineOutliner] Analyze all candidates (#127659) #117700 made a change from analyzing all the candidates to analyzing just the first candidate before deciding to either delete or keep all of them. Even though the candidates all have the same instructions, the basic blocks in which they are present are different and we will need to check each of them before deciding whether to keep or erase them. Particularly, `isAvailableAcrossAndOutOfSeq` checks to see if the register (x5 in this case) is available from the end of the MBB to the beginning of the candidate and not checking this for each candidate led to incorrect candidates being outlined resulting in correctness issues in a few downstream benchmarks. Similarly, deleting all the candidates if the first one is not viable will result in missed outlining opportunities. (cherry picked from commit 6757cf4e6f1c7767d605e579930a24758c0778dc) --- llvm/lib/Target/RISCV/RISCVInstrInfo.cpp | 52 +++ .../machine-outliner-call-x5-liveout.mir | 136 ++ 2 files changed, 158 insertions(+), 30 deletions(-) create mode 100644 llvm/test/CodeGen/RISCV/machine-outliner-call-x5-liveout.mir diff --git a/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp b/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp index 12a7af0750813..87f1f35835cbe 100644 --- a/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp +++ b/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp @@ -3010,30 +3010,25 @@ static bool cannotInsertTailCall(const MachineBasicBlock &MBB) { return false; } -static std::optional -analyzeCandidate(outliner::Candidate &C) { +static bool analyzeCandidate(outliner::Candidate &C) { // If last instruction is return then we can rely on // the verification already performed in the getOutliningTypeImpl. 
if (C.back().isReturn()) { assert(!cannotInsertTailCall(*C.getMBB()) && "The candidate who uses return instruction must be outlined " "using tail call"); -return MachineOutlinerTailCall; +return false; } - auto CandidateUsesX5 = [](outliner::Candidate &C) { -const TargetRegisterInfo *TRI = C.getMF()->getSubtarget().getRegisterInfo(); -if (std::any_of(C.begin(), C.end(), [TRI](const MachineInstr &MI) { - return isMIModifiesReg(MI, TRI, RISCV::X5); -})) - return true; -return !C.isAvailableAcrossAndOutOfSeq(RISCV::X5, *TRI); - }; - - if (!CandidateUsesX5(C)) -return MachineOutlinerDefault; + // Filter out candidates where the X5 register (t0) can't be used to setup + // the function call. + const TargetRegisterInfo *TRI = C.getMF()->getSubtarget().getRegisterInfo(); + if (std::any_of(C.begin(), C.end(), [TRI](const MachineInstr &MI) { +return isMIModifiesReg(MI, TRI, RISCV::X5); + })) +return true; - return std::nullopt; + return !C.isAvailableAcrossAndOutOfSeq(RISCV::X5, *TRI); } std::optional> @@ -3042,35 +3037,32 @@ RISCVInstrInfo::getOutliningCandidateInfo( std::vector &RepeatedSequenceLocs, unsigned MinRepeats) const { - // Each RepeatedSequenceLoc is identical. - outliner::Candidate &Candidate = RepeatedSequenceLocs[0]; - auto CandidateInfo = analyzeCandidate(Candidate); - if (!CandidateInfo) -RepeatedSequenceLocs.clear(); + // Analyze each candidate and erase the ones that are not viable. + llvm::erase_if(RepeatedSequenceLocs, analyzeCandidate); // If the sequence doesn't have enough candidates left, then we're done. if (RepeatedSequenceLocs.size() < MinRepeats) return std::nullopt; + // Each RepeatedSequenceLoc is identical. + outliner::Candidate &Candidate = RepeatedSequenceLocs[0]; unsigned InstrSizeCExt = Candidate.getMF()->getSubtarget().hasStdExtCOrZca() ? 2 : 4; unsigned CallOverhead = 0, FrameOverhead = 0; - MachineOutlinerConstructionID MOCI = CandidateInfo.value(); - switch (MOCI) { - case MachineOutlinerDefault: -// call t0, function = 8 bytes. 
-CallOverhead = 8; -// jr t0 = 4 bytes, 2 bytes if compressed instructions are enabled. -FrameOverhead = InstrSizeCExt; -break; - case MachineOutlinerTailCall: + MachineOutlinerConstructionID MOCI = MachineOutlinerDefault; + if (Candidate.back().isReturn()) { +MOCI = MachineOutlinerTailCall; // tail call = auipc + jalr in the worst case without linker relaxation. CallOverhead = 4 + InstrSizeCExt; // Using tail call we move ret instruction from caller to callee. FrameOverhead = 0; -break; + } else { +// call t0, function = 8 bytes. +CallOverhead = 8; +// jr t0 = 4 bytes, 2 bytes if compressed instructions are enabled. +FrameOverhead = InstrSizeCExt; } fo
[llvm-branch-commits] [clang] release/20.x: [CSKY] Default to unsigned char (PR #126436)
tstellar wrote: @arichardson Would you be able to create a follow-up PR with a release note entry? https://github.com/llvm/llvm-project/pull/126436 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] release/20.x: AMDGPU: Stop emitting an error on illegal addrspacecasts (PR #127751)
github-actions[bot] wrote: @arsenm (or anyone else). If you would like to add a note about this fix in the release notes (completely optional). Please reply to this comment with a one or two sentence description of the fix. When you are done, please add the release:note label to this PR. https://github.com/llvm/llvm-project/pull/127751
[llvm-branch-commits] [llvm] [AMDGPU][Attributor] Rework update of `AAAMDWavesPerEU` (PR #123995)
shiltian wrote: ping @arsenm @jdoerfert https://github.com/llvm/llvm-project/pull/123995
[llvm-branch-commits] [llvm] release/20.x: AMDGPU: Stop emitting an error on illegal addrspacecasts (PR #127751)
https://github.com/tstellar closed https://github.com/llvm/llvm-project/pull/127751
[llvm-branch-commits] [llvm] d51f233 - AMDGPU: Add some release 20 notes (#128136)
Author: Matt Arsenault Date: 2025-02-21T11:23:06-08:00 New Revision: d51f23377a77eace4ef006e0e6b23460ed05576c URL: https://github.com/llvm/llvm-project/commit/d51f23377a77eace4ef006e0e6b23460ed05576c DIFF: https://github.com/llvm/llvm-project/commit/d51f23377a77eace4ef006e0e6b23460ed05576c.diff LOG: AMDGPU: Add some release 20 notes (#128136) Added: Modified: llvm/docs/ReleaseNotes.md Removed: diff --git a/llvm/docs/ReleaseNotes.md b/llvm/docs/ReleaseNotes.md index c80aecfdea084..e654509792652 100644 --- a/llvm/docs/ReleaseNotes.md +++ b/llvm/docs/ReleaseNotes.md @@ -159,6 +159,17 @@ Changes to the AArch64 Backend Changes to the AMDGPU Backend - +* Initial support for gfx950 + +* Improved ``llvm.memcpy``, ``llvm.memmove`` and ``llvm.memset`` lowering + +* Fixed expansion of 64-bit flat address space ``atomicrmw`` and + ``cmpxchg`` operations which may access private + memory. `noalias.addrspace` metadat may be used to avoid the + expansion if the target address is known to not be on the stack. + +* Fix compile failures when emitting unreachable functions. + * Removed `llvm.amdgcn.flat.atomic.fadd` and `llvm.amdgcn.global.atomic.fadd` intrinsics. Users should use the {ref}`atomicrmw ` instruction with `fadd` and ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU: Add some release 20 notes (PR #128136)
https://github.com/tstellar closed https://github.com/llvm/llvm-project/pull/128136
[llvm-branch-commits] [llvm] AMDGPU: Add some release 20 notes (PR #128136)
github-actions[bot] wrote: @arsenm (or anyone else). If you would like to add a note about this fix in the release notes (completely optional). Please reply to this comment with a one or two sentence description of the fix. When you are done, please add the release:note label to this PR. https://github.com/llvm/llvm-project/pull/128136
[llvm-branch-commits] [clang] release/20.x: [CUDA] Add support for sm101 and sm120 target architectures (#127187) (PR #127918)
jodelek wrote: Do you know who is the person I should bother? https://github.com/llvm/llvm-project/pull/127918
[llvm-branch-commits] [clang] release/20.x: [CUDA] Add support for sm101 and sm120 target architectures (#127187) (PR #127918)
efriedma-quic wrote: The process is that the patch is first reviewed by someone familiar with the code. They approve the patch, and describe how the fix meets the release branch patch requirements (https://llvm.org/docs/HowToReleaseLLVM.html#release-patch-rules). Once it's approved, the release manager will look at the patch, and either merge or request changes. You don't need to specifically ping the release manager; they track all the pending pull requests. https://github.com/llvm/llvm-project/pull/127918
[llvm-branch-commits] [clang] release/20.x: [CUDA] Add support for sm101 and sm120 target architectures (#127187) (PR #127918)
Artem-B wrote: > patch is first reviewed by someone familiar with the code. That would be me, as I am the maintainer of CUDA code and had reviewed the original PR. > They approve the patch, and describe how the fix meets the release branch > patch requirements > (https://llvm.org/docs/HowToReleaseLLVM.html#release-patch-rules). This patch fits item #3 on the rule list "or completion of features that were started before the branch was created. " These changes allow clang users to compile CUDA code with just-released cuda-12.8 which adds these new GPU variants. https://github.com/llvm/llvm-project/pull/127918
[llvm-branch-commits] [llvm] release/20.x: Add Wasm, RISC-V, BPF, and NVPTX targets back to Windows release packaging (#127794) (PR #127982)
https://github.com/llvmbot updated https://github.com/llvm/llvm-project/pull/127982 >From b727a13fecc4e29b6f8499afd95626795c9f6a8e Mon Sep 17 00:00:00 2001 From: Hans Wennborg Date: Thu, 20 Feb 2025 11:02:33 +0100 Subject: [PATCH] Add Wasm, RISC-V, BPF, and NVPTX targets back to Windows release packaging (#127794) In #106059 we reduced the targets to those supported by Windows (X86 and ARM) to avoid running into size limitations of the NSIS compiler. Since then, people complained about the lack of Wasm [1], RISC-V [2], BPF [3], and NVPTX [4]. These do seem to fit in the installer (at least for 20.1.0-rc2), so let's add them back. [1] https://discourse.llvm.org/t/llvm-19-x-release-third-party-binaries/80374/26 [2] https://discourse.llvm.org/t/llvm-19-x-release-third-party-binaries/80374/53 [3] https://github.com/llvm/llvm-project/issues/127120 [4] https://github.com/llvm/llvm-project/pull/127794#issuecomment-2668677203 (cherry picked from commit 6e047a5ab42698165a4746ef681396fab1698327) --- llvm/utils/release/build_llvm_release.bat | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/llvm/utils/release/build_llvm_release.bat b/llvm/utils/release/build_llvm_release.bat index dd041d7d384ec..1c30673cf88bd 100755 --- a/llvm/utils/release/build_llvm_release.bat +++ b/llvm/utils/release/build_llvm_release.bat @@ -150,7 +150,7 @@ set common_cmake_flags=^ -DCMAKE_BUILD_TYPE=Release ^ -DLLVM_ENABLE_ASSERTIONS=OFF ^ -DLLVM_INSTALL_TOOLCHAIN_ONLY=ON ^ - -DLLVM_TARGETS_TO_BUILD="AArch64;ARM;X86" ^ + -DLLVM_TARGETS_TO_BUILD="AArch64;ARM;X86;BPF;WebAssembly;RISCV;NVPTX" ^ -DLLVM_BUILD_LLVM_C_DYLIB=ON ^ -DCMAKE_INSTALL_UCRT_LIBRARIES=ON ^ -DPython3_FIND_REGISTRY=NEVER ^ ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] release/20.x: Add Wasm, RISC-V, BPF, and NVPTX targets back to Windows release packaging (#127794) (PR #127982)
github-actions[bot] wrote: @zmodem (or anyone else). If you would like to add a note about this fix in the release notes (completely optional). Please reply to this comment with a one or two sentence description of the fix. When you are done, please add the release:note label to this PR. https://github.com/llvm/llvm-project/pull/127982
[llvm-branch-commits] [clang] release/20.x: [CSKY] Default to unsigned char (PR #126436)
arichardson wrote: > @arichardson (or anyone else). If you would like to add a note about this fix > in the release notes (completely optional). Please reply to this comment with > a one or two sentence description of the fix. When you are done, please add > the release:note label to this PR. Prior to Clang 20, the CSKY target used an incorrect ABI with `char` being signed instead of unsigned. Code that relies on the incorrect definition can restore the old behavior by passing `-fsigned-char`. https://github.com/llvm/llvm-project/pull/126436
[llvm-branch-commits] [libcxx] af9d7dd - [libc++] Fix stray usage of _LIBCPP_HAS_NO_WIDE_CHARACTERS on Windows
Author: Louis Dionne Date: 2025-02-21T14:08:41-08:00 New Revision: af9d7dda2125c2ab10758ce6b5a968fd56af5048 URL: https://github.com/llvm/llvm-project/commit/af9d7dda2125c2ab10758ce6b5a968fd56af5048 DIFF: https://github.com/llvm/llvm-project/commit/af9d7dda2125c2ab10758ce6b5a968fd56af5048.diff LOG: [libc++] Fix stray usage of _LIBCPP_HAS_NO_WIDE_CHARACTERS on Windows (cherry picked from commit bcfd9f81e1bc9954d616ffbb8625099916bebd5b) Added: Modified: libcxx/include/__locale_dir/support/windows.h Removed: diff --git a/libcxx/include/__locale_dir/support/windows.h b/libcxx/include/__locale_dir/support/windows.h index ff89d3e87eb44..f0f76c527264a 100644 --- a/libcxx/include/__locale_dir/support/windows.h +++ b/libcxx/include/__locale_dir/support/windows.h @@ -215,7 +215,7 @@ inline _LIBCPP_HIDE_FROM_ABI size_t __strxfrm(char* __dest, const char* __src, s return ::_strxfrm_l(__dest, __src, __n, __loc); } -#ifndef _LIBCPP_HAS_NO_WIDE_CHARACTERS +#if _LIBCPP_HAS_WIDE_CHARACTERS inline _LIBCPP_HIDE_FROM_ABI int __iswctype(wint_t __c, wctype_t __type, __locale_t __loc) { return ::_iswctype_l(__c, __type, __loc); } @@ -240,7 +240,7 @@ inline _LIBCPP_HIDE_FROM_ABI int __wcscoll(const wchar_t* __ws1, const wchar_t* inline _LIBCPP_HIDE_FROM_ABI size_t __wcsxfrm(wchar_t* __dest, const wchar_t* __src, size_t __n, __locale_t __loc) { return ::_wcsxfrm_l(__dest, __src, __n, __loc); } -#endif // !_LIBCPP_HAS_NO_WIDE_CHARACTERS +#endif // _LIBCPP_HAS_WIDE_CHARACTERS #if defined(__MINGW32__) && __MSVCRT_VERSION__ < 0x0800 _LIBCPP_EXPORTED_FROM_ABI size_t __strftime(char*, size_t, const char*, const struct tm*, __locale_t); ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [libcxx] release/20.x: [libc++] Reduce the dependency of the locale base API on the base system from the headers (#117764) (PR #128009)
https://github.com/llvmbot updated https://github.com/llvm/llvm-project/pull/128009 >From af9d7dda2125c2ab10758ce6b5a968fd56af5048 Mon Sep 17 00:00:00 2001 From: Louis Dionne Date: Wed, 5 Feb 2025 08:33:14 -0500 Subject: [PATCH 1/2] [libc++] Fix stray usage of _LIBCPP_HAS_NO_WIDE_CHARACTERS on Windows (cherry picked from commit bcfd9f81e1bc9954d616ffbb8625099916bebd5b) --- libcxx/include/__locale_dir/support/windows.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/libcxx/include/__locale_dir/support/windows.h b/libcxx/include/__locale_dir/support/windows.h index ff89d3e87eb44..f0f76c527264a 100644 --- a/libcxx/include/__locale_dir/support/windows.h +++ b/libcxx/include/__locale_dir/support/windows.h @@ -215,7 +215,7 @@ inline _LIBCPP_HIDE_FROM_ABI size_t __strxfrm(char* __dest, const char* __src, s return ::_strxfrm_l(__dest, __src, __n, __loc); } -#ifndef _LIBCPP_HAS_NO_WIDE_CHARACTERS +#if _LIBCPP_HAS_WIDE_CHARACTERS inline _LIBCPP_HIDE_FROM_ABI int __iswctype(wint_t __c, wctype_t __type, __locale_t __loc) { return ::_iswctype_l(__c, __type, __loc); } @@ -240,7 +240,7 @@ inline _LIBCPP_HIDE_FROM_ABI int __wcscoll(const wchar_t* __ws1, const wchar_t* inline _LIBCPP_HIDE_FROM_ABI size_t __wcsxfrm(wchar_t* __dest, const wchar_t* __src, size_t __n, __locale_t __loc) { return ::_wcsxfrm_l(__dest, __src, __n, __loc); } -#endif // !_LIBCPP_HAS_NO_WIDE_CHARACTERS +#endif // _LIBCPP_HAS_WIDE_CHARACTERS #if defined(__MINGW32__) && __MSVCRT_VERSION__ < 0x0800 _LIBCPP_EXPORTED_FROM_ABI size_t __strftime(char*, size_t, const char*, const struct tm*, __locale_t); >From 43a04b1db60414089bc7f864feb7cd8be7600498 Mon Sep 17 00:00:00 2001 From: Louis Dionne Date: Thu, 20 Feb 2025 08:38:42 -0500 Subject: [PATCH 2/2] [libc++] Reduce the dependency of the locale base API on the base system from the headers (#117764) Many parts of the locale base API are only required when building the shared/static library, but not from the headers. 
Document those functions and carve out a few of those that don't work when _XOPEN_SOURCE is defined to something old. Fixes #117630 (cherry picked from commit f00b32e2d0ee666d32f1ddd0c687e269fab95b44) --- libcxx/include/__locale_dir/locale_base_api.h | 56 --- .../include/__locale_dir/support/bsd_like.h | 22 +--- libcxx/include/__locale_dir/support/fuchsia.h | 9 ++- .../support/no_locale/characters.h| 8 ++- libcxx/include/__locale_dir/support/windows.h | 18 -- libcxx/test/libcxx/xopen_source.gen.py| 53 ++ 6 files changed, 128 insertions(+), 38 deletions(-) create mode 100644 libcxx/test/libcxx/xopen_source.gen.py diff --git a/libcxx/include/__locale_dir/locale_base_api.h b/libcxx/include/__locale_dir/locale_base_api.h index bbee9f49867fd..c1e73caeecced 100644 --- a/libcxx/include/__locale_dir/locale_base_api.h +++ b/libcxx/include/__locale_dir/locale_base_api.h @@ -23,12 +23,16 @@ // Variadic functions may be implemented as templates with a parameter pack instead // of C-style variadic functions. // +// Most of these functions are only required when building the library. Functions that are also +// required when merely using the headers are marked as such below. +// // TODO: __localeconv shouldn't take a reference, but the Windows implementation doesn't allow copying __locale_t +// TODO: Eliminate the need for any of these functions from the headers. 
// // Locale management // - // namespace __locale { -// using __locale_t = implementation-defined; +// using __locale_t = implementation-defined; // required by the headers // using __lconv_t = implementation-defined; // __locale_t __newlocale(int, const char*, __locale_t); // void__freelocale(__locale_t); @@ -36,6 +40,7 @@ // __lconv_t* __localeconv(__locale_t&); // } // +// // required by the headers // #define _LIBCPP_COLLATE_MASK /* implementation-defined */ // #define _LIBCPP_CTYPE_MASK /* implementation-defined */ // #define _LIBCPP_MONETARY_MASK /* implementation-defined */ @@ -48,6 +53,7 @@ // Strtonum functions // -- // namespace __locale { +// // required by the headers // float __strtof(const char*, char**, __locale_t); // double __strtod(const char*, char**, __locale_t); // long double __strtold(const char*, char**, __locale_t); @@ -60,8 +66,8 @@ // namespace __locale { // int __islower(int, __locale_t); // int __isupper(int, __locale_t); -// int __isdigit(int, __locale_t); -// int __isxdigit(int, __locale_t); +// int __isdigit(int, __locale_t); // required by the headers +// int __isxdigit(int, __locale_t); // required by the headers // int __toupper(int, __locale_t); // int __tolower(int, __locale_t); // int __strcoll(const char*, const char*, __locale_t); @@ -99,9 +105,10 @@ // int __mbtowc(wchar_t*, const char*, size_t, __l
[llvm-branch-commits] [libcxx] 43a04b1 - [libc++] Reduce the dependency of the locale base API on the base system from the headers (#117764)
Author: Louis Dionne Date: 2025-02-21T14:08:41-08:00 New Revision: 43a04b1db60414089bc7f864feb7cd8be7600498 URL: https://github.com/llvm/llvm-project/commit/43a04b1db60414089bc7f864feb7cd8be7600498 DIFF: https://github.com/llvm/llvm-project/commit/43a04b1db60414089bc7f864feb7cd8be7600498.diff LOG: [libc++] Reduce the dependency of the locale base API on the base system from the headers (#117764) Many parts of the locale base API are only required when building the shared/static library, but not from the headers. Document those functions and carve out a few of those that don't work when _XOPEN_SOURCE is defined to something old. Fixes #117630 (cherry picked from commit f00b32e2d0ee666d32f1ddd0c687e269fab95b44) Added: libcxx/test/libcxx/xopen_source.gen.py Modified: libcxx/include/__locale_dir/locale_base_api.h libcxx/include/__locale_dir/support/bsd_like.h libcxx/include/__locale_dir/support/fuchsia.h libcxx/include/__locale_dir/support/no_locale/characters.h libcxx/include/__locale_dir/support/windows.h Removed: diff --git a/libcxx/include/__locale_dir/locale_base_api.h b/libcxx/include/__locale_dir/locale_base_api.h index bbee9f49867fd..c1e73caeecced 100644 --- a/libcxx/include/__locale_dir/locale_base_api.h +++ b/libcxx/include/__locale_dir/locale_base_api.h @@ -23,12 +23,16 @@ // Variadic functions may be implemented as templates with a parameter pack instead // of C-style variadic functions. // +// Most of these functions are only required when building the library. Functions that are also +// required when merely using the headers are marked as such below. +// // TODO: __localeconv shouldn't take a reference, but the Windows implementation doesn't allow copying __locale_t +// TODO: Eliminate the need for any of these functions from the headers. 
// // Locale management // - // namespace __locale { -// using __locale_t = implementation-defined; +// using __locale_t = implementation-defined; // required by the headers // using __lconv_t = implementation-defined; // __locale_t __newlocale(int, const char*, __locale_t); // void__freelocale(__locale_t); @@ -36,6 +40,7 @@ // __lconv_t* __localeconv(__locale_t&); // } // +// // required by the headers // #define _LIBCPP_COLLATE_MASK /* implementation-defined */ // #define _LIBCPP_CTYPE_MASK /* implementation-defined */ // #define _LIBCPP_MONETARY_MASK /* implementation-defined */ @@ -48,6 +53,7 @@ // Strtonum functions // -- // namespace __locale { +// // required by the headers // float __strtof(const char*, char**, __locale_t); // double __strtod(const char*, char**, __locale_t); // long double __strtold(const char*, char**, __locale_t); @@ -60,8 +66,8 @@ // namespace __locale { // int __islower(int, __locale_t); // int __isupper(int, __locale_t); -// int __isdigit(int, __locale_t); -// int __isxdigit(int, __locale_t); +// int __isdigit(int, __locale_t); // required by the headers +// int __isxdigit(int, __locale_t); // required by the headers // int __toupper(int, __locale_t); // int __tolower(int, __locale_t); // int __strcoll(const char*, const char*, __locale_t); @@ -99,9 +105,10 @@ // int __mbtowc(wchar_t*, const char*, size_t, __locale_t); // size_t __mbrlen(const char*, size_t, mbstate_t*, __locale_t); // size_t __mbsrtowcs(wchar_t*, const char**, size_t, mbstate_t*, __locale_t); -// int __snprintf(char*, size_t, __locale_t, const char*, ...); -// int __asprintf(char**, __locale_t, const char*, ...); -// int __sscanf(const char*, __locale_t, const char*, ...); +// +// int __snprintf(char*, size_t, __locale_t, const char*, ...); // required by the headers +// int __asprintf(char**, __locale_t, const char*, ...);// required by the headers +// int __sscanf(const char*, __locale_t, const char*, ...); // required by the headers // } #if defined(__APPLE__) @@ 
-143,8 +150,19 @@ namespace __locale { // // Locale management // +# define _LIBCPP_COLLATE_MASK LC_COLLATE_MASK +# define _LIBCPP_CTYPE_MASK LC_CTYPE_MASK +# define _LIBCPP_MONETARY_MASK LC_MONETARY_MASK +# define _LIBCPP_NUMERIC_MASK LC_NUMERIC_MASK +# define _LIBCPP_TIME_MASK LC_TIME_MASK +# define _LIBCPP_MESSAGES_MASK LC_MESSAGES_MASK +# define _LIBCPP_ALL_MASK LC_ALL_MASK +# define _LIBCPP_LC_ALL LC_ALL + using __locale_t _LIBCPP_NODEBUG = locale_t; -using __lconv_t _LIBCPP_NODEBUG = lconv; + +# if defined(_LIBCPP_BUILDING_LIBRARY) +using __lconv_t _LIBCPP_NODEBUG = lconv; inline _LIBCPP_HIDE_FROM_ABI __locale_t __newlocale(int __category_mask, const char* __name, __locale_t __loc) { return newlocale(__category_mask, __name, __loc); @@ -157,15 +175,7 @@ inline _LIBCPP_HIDE_FROM_ABI char* __setlocale(int __category,
[llvm-branch-commits] [libcxx] release/20.x: [libc++] Reduce the dependency of the locale base API on the base system from the headers (#117764) (PR #128009)
https://github.com/tstellar closed https://github.com/llvm/llvm-project/pull/128009
[llvm-branch-commits] [clang] [libc] release/20.x: [Clang] Fix cross-lane scan when given divergent lanes (#127703) (PR #128085)
github-actions[bot] wrote: @jhuber6 (or anyone else). If you would like to add a note about this fix in the release notes (completely optional). Please reply to this comment with a one or two sentence description of the fix. When you are done, please add the release:note label to this PR. https://github.com/llvm/llvm-project/pull/128085
[llvm-branch-commits] [clang] [libc] release/20.x: [Clang] Fix cross-lane scan when given divergent lanes (#127703) (PR #128085)
https://github.com/tstellar closed https://github.com/llvm/llvm-project/pull/128085
[llvm-branch-commits] [llvm] release/20.x: Add Wasm, RISC-V, BPF, and NVPTX targets back to Windows release packaging (#127794) (PR #127982)
https://github.com/tstellar closed https://github.com/llvm/llvm-project/pull/127982
[llvm-branch-commits] [clang] release/20.x: [CUDA] Add support for sm101 and sm120 target architectures (#127187) (PR #127918)
https://github.com/llvmbot updated https://github.com/llvm/llvm-project/pull/127918 >From b84ffb9f3b349dd4548a9d3c0ead91021b7905a3 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Sebastian=20Jod=C5=82owski?= Date: Wed, 19 Feb 2025 14:41:07 -0800 Subject: [PATCH] [CUDA] Add support for sm101 and sm120 target architectures (#127187) Add support for sm101 and sm120 target architectures. It requires CUDA 12.8. - Co-authored-by: Sebastian Jodlowski (cherry picked from commit 0127f169dc8e0b5b6c2a24f74cd42d9d277916f6) --- clang/include/clang/Basic/BuiltinsNVPTX.td| 8 --- clang/include/clang/Basic/Cuda.h | 4 clang/lib/Basic/Cuda.cpp | 8 +++ clang/lib/Basic/Targets/NVPTX.cpp | 23 +++ clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp | 4 .../test/Misc/target-invalid-cpu-note/nvptx.c | 4 6 files changed, 43 insertions(+), 8 deletions(-) diff --git a/clang/include/clang/Basic/BuiltinsNVPTX.td b/clang/include/clang/Basic/BuiltinsNVPTX.td index 9d24a992563a4..b550fff8567df 100644 --- a/clang/include/clang/Basic/BuiltinsNVPTX.td +++ b/clang/include/clang/Basic/BuiltinsNVPTX.td @@ -21,12 +21,14 @@ class SM newer_list> : SMFeatures { !strconcat(f, "|", newer.Features)); } +let Features = "sm_120a" in def SM_120a : SMFeatures; +let Features = "sm_101a" in def SM_101a : SMFeatures; let Features = "sm_100a" in def SM_100a : SMFeatures; - -def SM_100 : SM<"100", [SM_100a]>; - let Features = "sm_90a" in def SM_90a : SMFeatures; +def SM_120 : SM<"120", [SM_120a]>; +def SM_101 : SM<"101", [SM_101a, SM_120]>; +def SM_100 : SM<"100", [SM_100a, SM_101]>; def SM_90 : SM<"90", [SM_90a, SM_100]>; def SM_89 : SM<"89", [SM_90]>; def SM_87 : SM<"87", [SM_89]>; diff --git a/clang/include/clang/Basic/Cuda.h b/clang/include/clang/Basic/Cuda.h index f33ba46233a7a..5c909a8e9ca11 100644 --- a/clang/include/clang/Basic/Cuda.h +++ b/clang/include/clang/Basic/Cuda.h @@ -82,6 +82,10 @@ enum class OffloadArch { SM_90a, SM_100, SM_100a, + SM_101, + SM_101a, + SM_120, + SM_120a, GFX600, GFX601, GFX602, diff --git 
a/clang/lib/Basic/Cuda.cpp b/clang/lib/Basic/Cuda.cpp index 1bfec0b37c5ee..79cac0ec119dd 100644 --- a/clang/lib/Basic/Cuda.cpp +++ b/clang/lib/Basic/Cuda.cpp @@ -100,6 +100,10 @@ static const OffloadArchToStringMap arch_names[] = { SM(90a), // Hopper SM(100), // Blackwell SM(100a),// Blackwell +SM(101), // Blackwell +SM(101a),// Blackwell +SM(120), // Blackwell +SM(120a),// Blackwell GFX(600), // gfx600 GFX(601), // gfx601 GFX(602), // gfx602 @@ -230,6 +234,10 @@ CudaVersion MinVersionForOffloadArch(OffloadArch A) { return CudaVersion::CUDA_120; case OffloadArch::SM_100: case OffloadArch::SM_100a: + case OffloadArch::SM_101: + case OffloadArch::SM_101a: + case OffloadArch::SM_120: + case OffloadArch::SM_120a: return CudaVersion::CUDA_128; default: llvm_unreachable("invalid enum"); diff --git a/clang/lib/Basic/Targets/NVPTX.cpp b/clang/lib/Basic/Targets/NVPTX.cpp index a03f4983b9d03..9be12cbe7ac19 100644 --- a/clang/lib/Basic/Targets/NVPTX.cpp +++ b/clang/lib/Basic/Targets/NVPTX.cpp @@ -176,7 +176,7 @@ void NVPTXTargetInfo::getTargetDefines(const LangOptions &Opts, if (Opts.CUDAIsDevice || Opts.OpenMPIsTargetDevice || !HostTarget) { // Set __CUDA_ARCH__ for the GPU specified. 
-std::string CUDAArchCode = [this] { +llvm::StringRef CUDAArchCode = [this] { switch (GPU) { case OffloadArch::GFX600: case OffloadArch::GFX601: @@ -283,14 +283,27 @@ void NVPTXTargetInfo::getTargetDefines(const LangOptions &Opts, case OffloadArch::SM_100: case OffloadArch::SM_100a: return "1000"; + case OffloadArch::SM_101: + case OffloadArch::SM_101a: + return "1010"; + case OffloadArch::SM_120: + case OffloadArch::SM_120a: + return "1200"; } llvm_unreachable("unhandled OffloadArch"); }(); Builder.defineMacro("__CUDA_ARCH__", CUDAArchCode); -if (GPU == OffloadArch::SM_90a) - Builder.defineMacro("__CUDA_ARCH_FEAT_SM90_ALL", "1"); -if (GPU == OffloadArch::SM_100a) - Builder.defineMacro("__CUDA_ARCH_FEAT_SM100_ALL", "1"); +switch(GPU) { + case OffloadArch::SM_90a: + case OffloadArch::SM_100a: + case OffloadArch::SM_101a: + case OffloadArch::SM_120a: +Builder.defineMacro("__CUDA_ARCH_FEAT_SM" + CUDAArchCode.drop_back() + "_ALL", "1"); +break; + default: +// Do nothing if this is not an enhanced architecture. +break; +} } } diff --git a/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp b/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp index c13928f61a748..dc417880a50e9 10064
[llvm-branch-commits] [clang] release/20.x: [CUDA] Add support for sm101 and sm120 target architectures (#127187) (PR #127918)
https://github.com/tstellar closed https://github.com/llvm/llvm-project/pull/127918
[llvm-branch-commits] [clang] b84ffb9 - [CUDA] Add support for sm101 and sm120 target architectures (#127187)
Author: Sebastian Jodłowski Date: 2025-02-21T14:06:54-08:00 New Revision: b84ffb9f3b349dd4548a9d3c0ead91021b7905a3 URL: https://github.com/llvm/llvm-project/commit/b84ffb9f3b349dd4548a9d3c0ead91021b7905a3 DIFF: https://github.com/llvm/llvm-project/commit/b84ffb9f3b349dd4548a9d3c0ead91021b7905a3.diff LOG: [CUDA] Add support for sm101 and sm120 target architectures (#127187) Add support for sm101 and sm120 target architectures. It requires CUDA 12.8. - Co-authored-by: Sebastian Jodlowski (cherry picked from commit 0127f169dc8e0b5b6c2a24f74cd42d9d277916f6) Added: Modified: clang/include/clang/Basic/BuiltinsNVPTX.td clang/include/clang/Basic/Cuda.h clang/lib/Basic/Cuda.cpp clang/lib/Basic/Targets/NVPTX.cpp clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp clang/test/Misc/target-invalid-cpu-note/nvptx.c Removed: diff --git a/clang/include/clang/Basic/BuiltinsNVPTX.td b/clang/include/clang/Basic/BuiltinsNVPTX.td index 9d24a992563a4..b550fff8567df 100644 --- a/clang/include/clang/Basic/BuiltinsNVPTX.td +++ b/clang/include/clang/Basic/BuiltinsNVPTX.td @@ -21,12 +21,14 @@ class SM newer_list> : SMFeatures { !strconcat(f, "|", newer.Features)); } +let Features = "sm_120a" in def SM_120a : SMFeatures; +let Features = "sm_101a" in def SM_101a : SMFeatures; let Features = "sm_100a" in def SM_100a : SMFeatures; - -def SM_100 : SM<"100", [SM_100a]>; - let Features = "sm_90a" in def SM_90a : SMFeatures; +def SM_120 : SM<"120", [SM_120a]>; +def SM_101 : SM<"101", [SM_101a, SM_120]>; +def SM_100 : SM<"100", [SM_100a, SM_101]>; def SM_90 : SM<"90", [SM_90a, SM_100]>; def SM_89 : SM<"89", [SM_90]>; def SM_87 : SM<"87", [SM_89]>; diff --git a/clang/include/clang/Basic/Cuda.h b/clang/include/clang/Basic/Cuda.h index f33ba46233a7a..5c909a8e9ca11 100644 --- a/clang/include/clang/Basic/Cuda.h +++ b/clang/include/clang/Basic/Cuda.h @@ -82,6 +82,10 @@ enum class OffloadArch { SM_90a, SM_100, SM_100a, + SM_101, + SM_101a, + SM_120, + SM_120a, GFX600, GFX601, GFX602, diff --git 
a/clang/lib/Basic/Cuda.cpp b/clang/lib/Basic/Cuda.cpp index 1bfec0b37c5ee..79cac0ec119dd 100644 --- a/clang/lib/Basic/Cuda.cpp +++ b/clang/lib/Basic/Cuda.cpp @@ -100,6 +100,10 @@ static const OffloadArchToStringMap arch_names[] = { SM(90a), // Hopper SM(100), // Blackwell SM(100a),// Blackwell +SM(101), // Blackwell +SM(101a),// Blackwell +SM(120), // Blackwell +SM(120a),// Blackwell GFX(600), // gfx600 GFX(601), // gfx601 GFX(602), // gfx602 @@ -230,6 +234,10 @@ CudaVersion MinVersionForOffloadArch(OffloadArch A) { return CudaVersion::CUDA_120; case OffloadArch::SM_100: case OffloadArch::SM_100a: + case OffloadArch::SM_101: + case OffloadArch::SM_101a: + case OffloadArch::SM_120: + case OffloadArch::SM_120a: return CudaVersion::CUDA_128; default: llvm_unreachable("invalid enum"); diff --git a/clang/lib/Basic/Targets/NVPTX.cpp b/clang/lib/Basic/Targets/NVPTX.cpp index a03f4983b9d03..9be12cbe7ac19 100644 --- a/clang/lib/Basic/Targets/NVPTX.cpp +++ b/clang/lib/Basic/Targets/NVPTX.cpp @@ -176,7 +176,7 @@ void NVPTXTargetInfo::getTargetDefines(const LangOptions &Opts, if (Opts.CUDAIsDevice || Opts.OpenMPIsTargetDevice || !HostTarget) { // Set __CUDA_ARCH__ for the GPU specified. 
-std::string CUDAArchCode = [this] { +llvm::StringRef CUDAArchCode = [this] { switch (GPU) { case OffloadArch::GFX600: case OffloadArch::GFX601: @@ -283,14 +283,27 @@ void NVPTXTargetInfo::getTargetDefines(const LangOptions &Opts, case OffloadArch::SM_100: case OffloadArch::SM_100a: return "1000"; + case OffloadArch::SM_101: + case OffloadArch::SM_101a: + return "1010"; + case OffloadArch::SM_120: + case OffloadArch::SM_120a: + return "1200"; } llvm_unreachable("unhandled OffloadArch"); }(); Builder.defineMacro("__CUDA_ARCH__", CUDAArchCode); -if (GPU == OffloadArch::SM_90a) - Builder.defineMacro("__CUDA_ARCH_FEAT_SM90_ALL", "1"); -if (GPU == OffloadArch::SM_100a) - Builder.defineMacro("__CUDA_ARCH_FEAT_SM100_ALL", "1"); +switch(GPU) { + case OffloadArch::SM_90a: + case OffloadArch::SM_100a: + case OffloadArch::SM_101a: + case OffloadArch::SM_120a: +Builder.defineMacro("__CUDA_ARCH_FEAT_SM" + CUDAArchCode.drop_back() + "_ALL", "1"); +break; + default: +// Do nothing if this is not an enhanced architecture. +break; +} } } diff --git a/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp b/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp index c13928f61a748..dc417
[llvm-branch-commits] [clang] release/20.x: [CUDA] Add support for sm101 and sm120 target architectures (#127187) (PR #127918)
github-actions[bot] wrote: @Artem-B (or anyone else). If you would like to add a note about this fix in the release notes (completely optional). Please reply to this comment with a one or two sentence description of the fix. When you are done, please add the release:note label to this PR. https://github.com/llvm/llvm-project/pull/127918 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [clang] [libc] release/20.x: [Clang] Fix cross-lane scan when given divergent lanes (#127703) (PR #128085)
https://github.com/llvmbot updated https://github.com/llvm/llvm-project/pull/128085 >From e0c4a3397fd2f80740d776de85360dc12cd0bcc7 Mon Sep 17 00:00:00 2001 From: Joseph Huber Date: Wed, 19 Feb 2025 16:46:59 -0600 Subject: [PATCH] [Clang] Fix cross-lane scan when given divergent lanes (#127703) Summary: The scan operation implemented here only works if there are contiguous ones in the executation mask that can be used to propagate the result. There are two solutions to this, one is to enter 'whole-wave-mode' and forcibly turn them back on, or to do this serially. This implementation does the latter because it's more portable, but checks to see if the parallel fast-path is applicable. Needs to be backported for correct behavior and because it fixes a failing libc test. (cherry picked from commit 6cc7ca084a5bbb7ccf606cab12065604453dde59) --- clang/lib/Headers/gpuintrin.h | 74 --- clang/lib/Headers/nvptxintrin.h | 5 +- .../src/__support/GPU/scan_reduce.cpp | 49 3 files changed, 102 insertions(+), 26 deletions(-) diff --git a/clang/lib/Headers/gpuintrin.h b/clang/lib/Headers/gpuintrin.h index 11c87e85cd497..efdc3d94ac0b3 100644 --- a/clang/lib/Headers/gpuintrin.h +++ b/clang/lib/Headers/gpuintrin.h @@ -150,35 +150,33 @@ __gpu_shuffle_idx_f64(uint64_t __lane_mask, uint32_t __idx, double __x, __builtin_bit_cast(uint64_t, __x), __width)); } -// Gets the sum of all lanes inside the warp or wavefront. 
-#define __DO_LANE_SUM(__type, __suffix) \ - _DEFAULT_FN_ATTRS static __inline__ __type __gpu_lane_sum_##__suffix( \ - uint64_t __lane_mask, __type __x) { \ -for (uint32_t __step = __gpu_num_lanes() / 2; __step > 0; __step /= 2) { \ - uint32_t __index = __step + __gpu_lane_id(); \ - __x += __gpu_shuffle_idx_##__suffix(__lane_mask, __index, __x, \ - __gpu_num_lanes()); \ -} \ -return __gpu_read_first_lane_##__suffix(__lane_mask, __x); \ - } -__DO_LANE_SUM(uint32_t, u32); // uint32_t __gpu_lane_sum_u32(m, x) -__DO_LANE_SUM(uint64_t, u64); // uint64_t __gpu_lane_sum_u64(m, x) -__DO_LANE_SUM(float, f32);// float __gpu_lane_sum_f32(m, x) -__DO_LANE_SUM(double, f64); // double __gpu_lane_sum_f64(m, x) -#undef __DO_LANE_SUM - // Gets the accumulator scan of the threads in the warp or wavefront. #define __DO_LANE_SCAN(__type, __bitmask_type, __suffix) \ _DEFAULT_FN_ATTRS static __inline__ uint32_t __gpu_lane_scan_##__suffix( \ uint64_t __lane_mask, uint32_t __x) { \ -for (uint32_t __step = 1; __step < __gpu_num_lanes(); __step *= 2) { \ - uint32_t __index = __gpu_lane_id() - __step; \ - __bitmask_type bitmask = __gpu_lane_id() >= __step; \ - __x += __builtin_bit_cast( \ - __type, -bitmask & __builtin_bit_cast(__bitmask_type, \ -__gpu_shuffle_idx_##__suffix( \ -__lane_mask, __index, __x, \ -__gpu_num_lanes(; \ +uint64_t __first = __lane_mask >> __builtin_ctzll(__lane_mask); \ +bool __divergent = __gpu_read_first_lane_##__suffix( \ +__lane_mask, __first & (__first + 1)); \ +if (__divergent) { \ + __type __accum = 0; \ + for (uint64_t __mask = __lane_mask; __mask; __mask &= __mask - 1) { \ +__type __index = __builtin_ctzll(__mask); \ +__type __tmp = __gpu_shuffle_idx_##__suffix(__lane_mask, __index, __x, \ +__gpu_num_lanes()); \ +__x = __gpu_lane_id() == __index ? 
__accum + __tmp : __x; \ +__accum += __tmp; \ + } \ +} else { \ + for (uint32_t __step = 1; __step < __gpu_num_lanes(); __step *= 2) { \ +uint32_t __index = __gpu_lane_id() - __step; \ +__bitmask_type bitmask = __gpu_lane_id() >= __step; \ +__x += __builtin_bit_cast( \ +__type,
[llvm-branch-commits] [libc] e0c4a33 - [Clang] Fix cross-lane scan when given divergent lanes (#127703)
Author: Joseph Huber Date: 2025-02-21T14:10:17-08:00 New Revision: e0c4a3397fd2f80740d776de85360dc12cd0bcc7 URL: https://github.com/llvm/llvm-project/commit/e0c4a3397fd2f80740d776de85360dc12cd0bcc7 DIFF: https://github.com/llvm/llvm-project/commit/e0c4a3397fd2f80740d776de85360dc12cd0bcc7.diff LOG: [Clang] Fix cross-lane scan when given divergent lanes (#127703) Summary: The scan operation implemented here only works if there are contiguous ones in the executation mask that can be used to propagate the result. There are two solutions to this, one is to enter 'whole-wave-mode' and forcibly turn them back on, or to do this serially. This implementation does the latter because it's more portable, but checks to see if the parallel fast-path is applicable. Needs to be backported for correct behavior and because it fixes a failing libc test. (cherry picked from commit 6cc7ca084a5bbb7ccf606cab12065604453dde59) Added: Modified: clang/lib/Headers/gpuintrin.h clang/lib/Headers/nvptxintrin.h libc/test/integration/src/__support/GPU/scan_reduce.cpp Removed: diff --git a/clang/lib/Headers/gpuintrin.h b/clang/lib/Headers/gpuintrin.h index 11c87e85cd497..efdc3d94ac0b3 100644 --- a/clang/lib/Headers/gpuintrin.h +++ b/clang/lib/Headers/gpuintrin.h @@ -150,35 +150,33 @@ __gpu_shuffle_idx_f64(uint64_t __lane_mask, uint32_t __idx, double __x, __builtin_bit_cast(uint64_t, __x), __width)); } -// Gets the sum of all lanes inside the warp or wavefront. 
-#define __DO_LANE_SUM(__type, __suffix) \ - _DEFAULT_FN_ATTRS static __inline__ __type __gpu_lane_sum_##__suffix( \ - uint64_t __lane_mask, __type __x) { \ -for (uint32_t __step = __gpu_num_lanes() / 2; __step > 0; __step /= 2) { \ - uint32_t __index = __step + __gpu_lane_id(); \ - __x += __gpu_shuffle_idx_##__suffix(__lane_mask, __index, __x, \ - __gpu_num_lanes()); \ -} \ -return __gpu_read_first_lane_##__suffix(__lane_mask, __x); \ - } -__DO_LANE_SUM(uint32_t, u32); // uint32_t __gpu_lane_sum_u32(m, x) -__DO_LANE_SUM(uint64_t, u64); // uint64_t __gpu_lane_sum_u64(m, x) -__DO_LANE_SUM(float, f32);// float __gpu_lane_sum_f32(m, x) -__DO_LANE_SUM(double, f64); // double __gpu_lane_sum_f64(m, x) -#undef __DO_LANE_SUM - // Gets the accumulator scan of the threads in the warp or wavefront. #define __DO_LANE_SCAN(__type, __bitmask_type, __suffix) \ _DEFAULT_FN_ATTRS static __inline__ uint32_t __gpu_lane_scan_##__suffix( \ uint64_t __lane_mask, uint32_t __x) { \ -for (uint32_t __step = 1; __step < __gpu_num_lanes(); __step *= 2) { \ - uint32_t __index = __gpu_lane_id() - __step; \ - __bitmask_type bitmask = __gpu_lane_id() >= __step; \ - __x += __builtin_bit_cast( \ - __type, -bitmask & __builtin_bit_cast(__bitmask_type, \ -__gpu_shuffle_idx_##__suffix( \ -__lane_mask, __index, __x, \ -__gpu_num_lanes(; \ +uint64_t __first = __lane_mask >> __builtin_ctzll(__lane_mask); \ +bool __divergent = __gpu_read_first_lane_##__suffix( \ +__lane_mask, __first & (__first + 1)); \ +if (__divergent) { \ + __type __accum = 0; \ + for (uint64_t __mask = __lane_mask; __mask; __mask &= __mask - 1) { \ +__type __index = __builtin_ctzll(__mask); \ +__type __tmp = __gpu_shuffle_idx_##__suffix(__lane_mask, __index, __x, \ +__gpu_num_lanes()); \ +__x = __gpu_lane_id() == __index ? 
__accum + __tmp : __x; \ +__accum += __tmp; \ + } \ +} else { \ + for (uint32_t __step = 1; __step < __gpu_num_lanes(); __step *= 2) { \ +uint32_t __index = __gpu_lane_id() - __step; \ +__bitmask_type bitmask = __gpu_lane_id() >= __step; \ +__x += __builtin_bit_cast(
[llvm-branch-commits] [libcxx] release/20.x: [libc++] Reduce the dependency of the locale base API on the base system from the headers (#117764) (PR #128009)
github-actions[bot] wrote: @ldionne (or anyone else). If you would like to add a note about this fix in the release notes (completely optional). Please reply to this comment with a one or two sentence description of the fix. When you are done, please add the release:note label to this PR. https://github.com/llvm/llvm-project/pull/128009

[llvm-branch-commits] [llvm] b727a13 - Add Wasm, RISC-V, BPF, and NVPTX targets back to Windows release packaging (#127794)
Author: Hans Wennborg Date: 2025-02-21T14:03:27-08:00 New Revision: b727a13fecc4e29b6f8499afd95626795c9f6a8e URL: https://github.com/llvm/llvm-project/commit/b727a13fecc4e29b6f8499afd95626795c9f6a8e DIFF: https://github.com/llvm/llvm-project/commit/b727a13fecc4e29b6f8499afd95626795c9f6a8e.diff LOG: Add Wasm, RISC-V, BPF, and NVPTX targets back to Windows release packaging (#127794) In #106059 we reduced the targets to those supported by Windows (X86 and ARM) to avoid running into size limitations of the NSIS compiler. Since then, people complained about the lack of Wasm [1], RISC-V [2], BPF [3], and NVPTX [4]. These do seem to fit in the installer (at least for 20.1.0-rc2), so let's add them back. [1] https://discourse.llvm.org/t/llvm-19-x-release-third-party-binaries/80374/26 [2] https://discourse.llvm.org/t/llvm-19-x-release-third-party-binaries/80374/53 [3] https://github.com/llvm/llvm-project/issues/127120 [4] https://github.com/llvm/llvm-project/pull/127794#issuecomment-2668677203 (cherry picked from commit 6e047a5ab42698165a4746ef681396fab1698327) Added: Modified: llvm/utils/release/build_llvm_release.bat Removed: diff --git a/llvm/utils/release/build_llvm_release.bat b/llvm/utils/release/build_llvm_release.bat index dd041d7d384ec..1c30673cf88bd 100755 --- a/llvm/utils/release/build_llvm_release.bat +++ b/llvm/utils/release/build_llvm_release.bat @@ -150,7 +150,7 @@ set common_cmake_flags=^ -DCMAKE_BUILD_TYPE=Release ^ -DLLVM_ENABLE_ASSERTIONS=OFF ^ -DLLVM_INSTALL_TOOLCHAIN_ONLY=ON ^ - -DLLVM_TARGETS_TO_BUILD="AArch64;ARM;X86" ^ + -DLLVM_TARGETS_TO_BUILD="AArch64;ARM;X86;BPF;WebAssembly;RISCV;NVPTX" ^ -DLLVM_BUILD_LLVM_C_DYLIB=ON ^ -DCMAKE_INSTALL_UCRT_LIBRARIES=ON ^ -DPython3_FIND_REGISTRY=NEVER ^ ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] release/20.x: AMDGPU: Widen f16 minimum/maximum to v2f16 on gfx950 (#128121) (PR #128132)
https://github.com/llvmbot updated https://github.com/llvm/llvm-project/pull/128132 >From e6d4fd035fdf90348fbeba6e73f90feb6e66b30b Mon Sep 17 00:00:00 2001 From: Matt Arsenault Date: Fri, 21 Feb 2025 12:08:49 +0700 Subject: [PATCH] AMDGPU: Widen f16 minimum/maximum to v2f16 on gfx950 (#128121) Unfortunately we only have the vector versions of v2f16 minimum3 and maximum. Widen to v2f16 so we can lower as minimum333(x, y, y). (cherry picked from commit e729dc759d052de122c8a918fe51b05ac796bb50) --- llvm/lib/Target/AMDGPU/SIISelLowering.cpp| 40 +- llvm/lib/Target/AMDGPU/SIISelLowering.h | 1 + llvm/test/CodeGen/AMDGPU/fmaximum3.ll| 689 --- llvm/test/CodeGen/AMDGPU/fminimum3.ll| 689 --- llvm/test/CodeGen/AMDGPU/llvm.maximum.f16.ll | 66 +- llvm/test/CodeGen/AMDGPU/llvm.minimum.f16.ll | 66 +- 6 files changed, 966 insertions(+), 585 deletions(-) diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp index e09df53995d61..d45ae7398e25d 100644 --- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp +++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp @@ -869,8 +869,13 @@ SITargetLowering::SITargetLowering(const TargetMachine &TM, if (Subtarget->hasMinimum3Maximum3F32()) setOperationAction({ISD::FMAXIMUM, ISD::FMINIMUM}, MVT::f32, Legal); -if (Subtarget->hasMinimum3Maximum3PKF16()) +if (Subtarget->hasMinimum3Maximum3PKF16()) { setOperationAction({ISD::FMAXIMUM, ISD::FMINIMUM}, MVT::v2f16, Legal); + + // If only the vector form is available, we need to widen to a vector. 
+ if (!Subtarget->hasMinimum3Maximum3F16()) +setOperationAction({ISD::FMAXIMUM, ISD::FMINIMUM}, MVT::f16, Custom); +} } setOperationAction(ISD::INTRINSIC_WO_CHAIN, @@ -5963,6 +5968,9 @@ SDValue SITargetLowering::LowerOperation(SDValue Op, SelectionDAG &DAG) const { case ISD::FMINNUM: case ISD::FMAXNUM: return lowerFMINNUM_FMAXNUM(Op, DAG); + case ISD::FMINIMUM: + case ISD::FMAXIMUM: +return lowerFMINIMUM_FMAXIMUM(Op, DAG); case ISD::FLDEXP: case ISD::STRICT_FLDEXP: return lowerFLDEXP(Op, DAG); @@ -5984,8 +5992,6 @@ SDValue SITargetLowering::LowerOperation(SDValue Op, SelectionDAG &DAG) const { case ISD::FMUL: case ISD::FMINNUM_IEEE: case ISD::FMAXNUM_IEEE: - case ISD::FMINIMUM: - case ISD::FMAXIMUM: case ISD::FMINIMUMNUM: case ISD::FMAXIMUMNUM: case ISD::UADDSAT: @@ -6840,6 +6846,34 @@ SDValue SITargetLowering::lowerFMINNUM_FMAXNUM(SDValue Op, return Op; } +SDValue SITargetLowering::lowerFMINIMUM_FMAXIMUM(SDValue Op, + SelectionDAG &DAG) const { + EVT VT = Op.getValueType(); + if (VT.isVector()) +return splitBinaryVectorOp(Op, DAG); + + assert(!Subtarget->hasIEEEMinMax() && !Subtarget->hasMinimum3Maximum3F16() && + Subtarget->hasMinimum3Maximum3PKF16() && VT == MVT::f16 && + "should not need to widen f16 minimum/maximum to v2f16"); + + // Widen f16 operation to v2f16 + + // fminimum f16:x, f16:y -> + // extract_vector_elt (fminimum (v2f16 (scalar_to_vector x)) + //(v2f16 (scalar_to_vector y))), 0 + SDLoc SL(Op); + SDValue WideSrc0 = + DAG.getNode(ISD::SCALAR_TO_VECTOR, SL, MVT::v2f16, Op.getOperand(0)); + SDValue WideSrc1 = + DAG.getNode(ISD::SCALAR_TO_VECTOR, SL, MVT::v2f16, Op.getOperand(1)); + + SDValue Widened = + DAG.getNode(Op.getOpcode(), SL, MVT::v2f16, WideSrc0, WideSrc1); + + return DAG.getNode(ISD::EXTRACT_VECTOR_ELT, SL, MVT::f16, Widened, + DAG.getConstant(0, SL, MVT::i32)); +} + SDValue SITargetLowering::lowerFLDEXP(SDValue Op, SelectionDAG &DAG) const { bool IsStrict = Op.getOpcode() == ISD::STRICT_FLDEXP; EVT VT = Op.getValueType(); diff --git 
a/llvm/lib/Target/AMDGPU/SIISelLowering.h b/llvm/lib/Target/AMDGPU/SIISelLowering.h index 1cd7f1b29e077..9b2c14862407a 100644 --- a/llvm/lib/Target/AMDGPU/SIISelLowering.h +++ b/llvm/lib/Target/AMDGPU/SIISelLowering.h @@ -146,6 +146,7 @@ class SITargetLowering final : public AMDGPUTargetLowering { /// Custom lowering for ISD::FP_ROUND for MVT::f16. SDValue lowerFP_ROUND(SDValue Op, SelectionDAG &DAG) const; SDValue lowerFMINNUM_FMAXNUM(SDValue Op, SelectionDAG &DAG) const; + SDValue lowerFMINIMUM_FMAXIMUM(SDValue Op, SelectionDAG &DAG) const; SDValue lowerFLDEXP(SDValue Op, SelectionDAG &DAG) const; SDValue promoteUniformOpToI32(SDValue Op, DAGCombinerInfo &DCI) const; SDValue lowerMUL(SDValue Op, SelectionDAG &DAG) const; diff --git a/llvm/test/CodeGen/AMDGPU/fmaximum3.ll b/llvm/test/CodeGen/AMDGPU/fmaximum3.ll index f0fa621e3b4bc..6724c37605eb4 100644 --- a/llvm/test/CodeGen/AMDGPU/fmaximum3.ll +++ b/llvm/test/CodeGen/AMDGPU/fmaximum3.ll @@ -1251,19 +1251,27 @@ define half @v_fmaximum3_f16(half %a, half %b, half %c) { ; GFX12-NEXT:v_maximum3_f16 v0, v0, v1, v2 ; GFX12-NEXT:s_setpc_b64 s[30:3
[llvm-branch-commits] [llvm] e6d4fd0 - AMDGPU: Widen f16 minimum/maximum to v2f16 on gfx950 (#128121)
Author: Matt Arsenault Date: 2025-02-21T14:14:20-08:00 New Revision: e6d4fd035fdf90348fbeba6e73f90feb6e66b30b URL: https://github.com/llvm/llvm-project/commit/e6d4fd035fdf90348fbeba6e73f90feb6e66b30b DIFF: https://github.com/llvm/llvm-project/commit/e6d4fd035fdf90348fbeba6e73f90feb6e66b30b.diff LOG: AMDGPU: Widen f16 minimum/maximum to v2f16 on gfx950 (#128121) Unfortunately we only have the vector versions of v2f16 minimum3 and maximum. Widen to v2f16 so we can lower as minimum333(x, y, y). (cherry picked from commit e729dc759d052de122c8a918fe51b05ac796bb50) Added: Modified: llvm/lib/Target/AMDGPU/SIISelLowering.cpp llvm/lib/Target/AMDGPU/SIISelLowering.h llvm/test/CodeGen/AMDGPU/fmaximum3.ll llvm/test/CodeGen/AMDGPU/fminimum3.ll llvm/test/CodeGen/AMDGPU/llvm.maximum.f16.ll llvm/test/CodeGen/AMDGPU/llvm.minimum.f16.ll Removed: diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp index e09df53995d61..d45ae7398e25d 100644 --- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp +++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp @@ -869,8 +869,13 @@ SITargetLowering::SITargetLowering(const TargetMachine &TM, if (Subtarget->hasMinimum3Maximum3F32()) setOperationAction({ISD::FMAXIMUM, ISD::FMINIMUM}, MVT::f32, Legal); -if (Subtarget->hasMinimum3Maximum3PKF16()) +if (Subtarget->hasMinimum3Maximum3PKF16()) { setOperationAction({ISD::FMAXIMUM, ISD::FMINIMUM}, MVT::v2f16, Legal); + + // If only the vector form is available, we need to widen to a vector. 
+ if (!Subtarget->hasMinimum3Maximum3F16()) +setOperationAction({ISD::FMAXIMUM, ISD::FMINIMUM}, MVT::f16, Custom); +} } setOperationAction(ISD::INTRINSIC_WO_CHAIN, @@ -5963,6 +5968,9 @@ SDValue SITargetLowering::LowerOperation(SDValue Op, SelectionDAG &DAG) const { case ISD::FMINNUM: case ISD::FMAXNUM: return lowerFMINNUM_FMAXNUM(Op, DAG); + case ISD::FMINIMUM: + case ISD::FMAXIMUM: +return lowerFMINIMUM_FMAXIMUM(Op, DAG); case ISD::FLDEXP: case ISD::STRICT_FLDEXP: return lowerFLDEXP(Op, DAG); @@ -5984,8 +5992,6 @@ SDValue SITargetLowering::LowerOperation(SDValue Op, SelectionDAG &DAG) const { case ISD::FMUL: case ISD::FMINNUM_IEEE: case ISD::FMAXNUM_IEEE: - case ISD::FMINIMUM: - case ISD::FMAXIMUM: case ISD::FMINIMUMNUM: case ISD::FMAXIMUMNUM: case ISD::UADDSAT: @@ -6840,6 +6846,34 @@ SDValue SITargetLowering::lowerFMINNUM_FMAXNUM(SDValue Op, return Op; } +SDValue SITargetLowering::lowerFMINIMUM_FMAXIMUM(SDValue Op, + SelectionDAG &DAG) const { + EVT VT = Op.getValueType(); + if (VT.isVector()) +return splitBinaryVectorOp(Op, DAG); + + assert(!Subtarget->hasIEEEMinMax() && !Subtarget->hasMinimum3Maximum3F16() && + Subtarget->hasMinimum3Maximum3PKF16() && VT == MVT::f16 && + "should not need to widen f16 minimum/maximum to v2f16"); + + // Widen f16 operation to v2f16 + + // fminimum f16:x, f16:y -> + // extract_vector_elt (fminimum (v2f16 (scalar_to_vector x)) + //(v2f16 (scalar_to_vector y))), 0 + SDLoc SL(Op); + SDValue WideSrc0 = + DAG.getNode(ISD::SCALAR_TO_VECTOR, SL, MVT::v2f16, Op.getOperand(0)); + SDValue WideSrc1 = + DAG.getNode(ISD::SCALAR_TO_VECTOR, SL, MVT::v2f16, Op.getOperand(1)); + + SDValue Widened = + DAG.getNode(Op.getOpcode(), SL, MVT::v2f16, WideSrc0, WideSrc1); + + return DAG.getNode(ISD::EXTRACT_VECTOR_ELT, SL, MVT::f16, Widened, + DAG.getConstant(0, SL, MVT::i32)); +} + SDValue SITargetLowering::lowerFLDEXP(SDValue Op, SelectionDAG &DAG) const { bool IsStrict = Op.getOpcode() == ISD::STRICT_FLDEXP; EVT VT = Op.getValueType(); diff --git 
a/llvm/lib/Target/AMDGPU/SIISelLowering.h b/llvm/lib/Target/AMDGPU/SIISelLowering.h index 1cd7f1b29e077..9b2c14862407a 100644 --- a/llvm/lib/Target/AMDGPU/SIISelLowering.h +++ b/llvm/lib/Target/AMDGPU/SIISelLowering.h @@ -146,6 +146,7 @@ class SITargetLowering final : public AMDGPUTargetLowering { /// Custom lowering for ISD::FP_ROUND for MVT::f16. SDValue lowerFP_ROUND(SDValue Op, SelectionDAG &DAG) const; SDValue lowerFMINNUM_FMAXNUM(SDValue Op, SelectionDAG &DAG) const; + SDValue lowerFMINIMUM_FMAXIMUM(SDValue Op, SelectionDAG &DAG) const; SDValue lowerFLDEXP(SDValue Op, SelectionDAG &DAG) const; SDValue promoteUniformOpToI32(SDValue Op, DAGCombinerInfo &DCI) const; SDValue lowerMUL(SDValue Op, SelectionDAG &DAG) const; diff --git a/llvm/test/CodeGen/AMDGPU/fmaximum3.ll b/llvm/test/CodeGen/AMDGPU/fmaximum3.ll index f0fa621e3b4bc..6724c37605eb4 100644 --- a/llvm/test/CodeGen/AMDGPU/fmaximum3.ll +++ b/llvm/test/CodeGen/AMDGPU/fmaximum3.ll @@ -1251,19 +1251,27 @@ define half @v_fmaximum3_f16(half %a, half %b, half %c) { ; GFX12-NEXT:v_maximum3_f16 v0
[llvm-branch-commits] [llvm] release/20.x: AMDGPU: Widen f16 minimum/maximum to v2f16 on gfx950 (#128121) (PR #128132)
github-actions[bot] wrote: @arsenm (or anyone else). If you would like to add a note about this fix in the release notes (completely optional). Please reply to this comment with a one or two sentence description of the fix. When you are done, please add the release:note label to this PR. https://github.com/llvm/llvm-project/pull/128132
[llvm-branch-commits] [clang] release/20.x: [CUDA] Add support for sm101 and sm120 target architectures (#127187) (PR #127918)
Artem-B wrote: ``` # CUDA - Clang now supports CUDA compilation with CUDA SDK up to v12.8 - Clang can now target sm_100, sm_101, and sm_120 GPUs (Blackwell) ``` https://github.com/llvm/llvm-project/pull/127918
[llvm-branch-commits] [llvm] release/20.x: AMDGPU: Widen f16 minimum/maximum to v2f16 on gfx950 (#128121) (PR #128132)
https://github.com/tstellar closed https://github.com/llvm/llvm-project/pull/128132
[llvm-branch-commits] [llvm] release/20.x: [RISCV] [MachineOutliner] Analyze all candidates (#127659) (PR #128146)
https://github.com/llvmbot created https://github.com/llvm/llvm-project/pull/128146 Backport 6757cf4 Requested by: @svs-quic >From 8f71b2383f1da600e396fbc912795362659adba1 Mon Sep 17 00:00:00 2001 From: Sudharsan Veeravalli Date: Fri, 21 Feb 2025 12:53:13 +0530 Subject: [PATCH] [RISCV] [MachineOutliner] Analyze all candidates (#127659) #117700 made a change from analyzing all the candidates to analyzing just the first candidate before deciding to either delete or keep all of them. Even though the candidates all have the same instructions, the basic blocks in which they are present are different and we will need to check each of them before deciding whether to keep or erase them. Particularly, `isAvailableAcrossAndOutOfSeq` checks to see if the register (x5 in this case) is available from the end of the MBB to the beginning of the candidate and not checking this for each candidate led to incorrect candidates being outlined resulting in correctness issues in a few downstream benchmarks. Similarly, deleting all the candidates if the first one is not viable will result in missed outlining opportunities. 
(cherry picked from commit 6757cf4e6f1c7767d605e579930a24758c0778dc) --- llvm/lib/Target/RISCV/RISCVInstrInfo.cpp | 52 +++ .../machine-outliner-call-x5-liveout.mir | 136 ++ 2 files changed, 158 insertions(+), 30 deletions(-) create mode 100644 llvm/test/CodeGen/RISCV/machine-outliner-call-x5-liveout.mir diff --git a/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp b/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp index 12a7af0750813..87f1f35835cbe 100644 --- a/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp +++ b/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp @@ -3010,30 +3010,25 @@ static bool cannotInsertTailCall(const MachineBasicBlock &MBB) { return false; } -static std::optional -analyzeCandidate(outliner::Candidate &C) { +static bool analyzeCandidate(outliner::Candidate &C) { // If last instruction is return then we can rely on // the verification already performed in the getOutliningTypeImpl. if (C.back().isReturn()) { assert(!cannotInsertTailCall(*C.getMBB()) && "The candidate who uses return instruction must be outlined " "using tail call"); -return MachineOutlinerTailCall; +return false; } - auto CandidateUsesX5 = [](outliner::Candidate &C) { -const TargetRegisterInfo *TRI = C.getMF()->getSubtarget().getRegisterInfo(); -if (std::any_of(C.begin(), C.end(), [TRI](const MachineInstr &MI) { - return isMIModifiesReg(MI, TRI, RISCV::X5); -})) - return true; -return !C.isAvailableAcrossAndOutOfSeq(RISCV::X5, *TRI); - }; - - if (!CandidateUsesX5(C)) -return MachineOutlinerDefault; + // Filter out candidates where the X5 register (t0) can't be used to setup + // the function call. 
+ const TargetRegisterInfo *TRI = C.getMF()->getSubtarget().getRegisterInfo(); + if (std::any_of(C.begin(), C.end(), [TRI](const MachineInstr &MI) { +return isMIModifiesReg(MI, TRI, RISCV::X5); + })) +return true; - return std::nullopt; + return !C.isAvailableAcrossAndOutOfSeq(RISCV::X5, *TRI); } std::optional> @@ -3042,35 +3037,32 @@ RISCVInstrInfo::getOutliningCandidateInfo( std::vector &RepeatedSequenceLocs, unsigned MinRepeats) const { - // Each RepeatedSequenceLoc is identical. - outliner::Candidate &Candidate = RepeatedSequenceLocs[0]; - auto CandidateInfo = analyzeCandidate(Candidate); - if (!CandidateInfo) -RepeatedSequenceLocs.clear(); + // Analyze each candidate and erase the ones that are not viable. + llvm::erase_if(RepeatedSequenceLocs, analyzeCandidate); // If the sequence doesn't have enough candidates left, then we're done. if (RepeatedSequenceLocs.size() < MinRepeats) return std::nullopt; + // Each RepeatedSequenceLoc is identical. + outliner::Candidate &Candidate = RepeatedSequenceLocs[0]; unsigned InstrSizeCExt = Candidate.getMF()->getSubtarget().hasStdExtCOrZca() ? 2 : 4; unsigned CallOverhead = 0, FrameOverhead = 0; - MachineOutlinerConstructionID MOCI = CandidateInfo.value(); - switch (MOCI) { - case MachineOutlinerDefault: -// call t0, function = 8 bytes. -CallOverhead = 8; -// jr t0 = 4 bytes, 2 bytes if compressed instructions are enabled. -FrameOverhead = InstrSizeCExt; -break; - case MachineOutlinerTailCall: + MachineOutlinerConstructionID MOCI = MachineOutlinerDefault; + if (Candidate.back().isReturn()) { +MOCI = MachineOutlinerTailCall; // tail call = auipc + jalr in the worst case without linker relaxation. CallOverhead = 4 + InstrSizeCExt; // Using tail call we move ret instruction from caller to callee. FrameOverhead = 0; -break; + } else { +// call t0, function = 8 bytes. +CallOverhead = 8; +// jr t0 = 4 bytes, 2 bytes if compressed instructions are enabled. +
[llvm-branch-commits] [llvm] release/20.x: [RISCV] [MachineOutliner] Analyze all candidates (#127659) (PR #128146)
https://github.com/llvmbot milestoned https://github.com/llvm/llvm-project/pull/128146
[llvm-branch-commits] [llvm] release/20.x: [RISCV] [MachineOutliner] Analyze all candidates (#127659) (PR #128146)
llvmbot wrote: @wangpc-pp What do you think about merging this PR to the release branch? https://github.com/llvm/llvm-project/pull/128146
[llvm-branch-commits] [llvm] release/20.x: [RISCV] [MachineOutliner] Analyze all candidates (#127659) (PR #128146)
llvmbot wrote: @llvm/pr-subscribers-backend-risc-v Author: None (llvmbot) Changes Backport 6757cf4 Requested by: @svs-quic --- Full diff: https://github.com/llvm/llvm-project/pull/128146.diff 2 Files Affected: - (modified) llvm/lib/Target/RISCV/RISCVInstrInfo.cpp (+22-30) - (added) llvm/test/CodeGen/RISCV/machine-outliner-call-x5-liveout.mir (+136) ``diff diff --git a/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp b/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp index 12a7af0750813..87f1f35835cbe 100644 --- a/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp +++ b/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp @@ -3010,30 +3010,25 @@ static bool cannotInsertTailCall(const MachineBasicBlock &MBB) { return false; } -static std::optional -analyzeCandidate(outliner::Candidate &C) { +static bool analyzeCandidate(outliner::Candidate &C) { // If last instruction is return then we can rely on // the verification already performed in the getOutliningTypeImpl. if (C.back().isReturn()) { assert(!cannotInsertTailCall(*C.getMBB()) && "The candidate who uses return instruction must be outlined " "using tail call"); -return MachineOutlinerTailCall; +return false; } - auto CandidateUsesX5 = [](outliner::Candidate &C) { -const TargetRegisterInfo *TRI = C.getMF()->getSubtarget().getRegisterInfo(); -if (std::any_of(C.begin(), C.end(), [TRI](const MachineInstr &MI) { - return isMIModifiesReg(MI, TRI, RISCV::X5); -})) - return true; -return !C.isAvailableAcrossAndOutOfSeq(RISCV::X5, *TRI); - }; - - if (!CandidateUsesX5(C)) -return MachineOutlinerDefault; + // Filter out candidates where the X5 register (t0) can't be used to setup + // the function call. 
+ const TargetRegisterInfo *TRI = C.getMF()->getSubtarget().getRegisterInfo(); + if (std::any_of(C.begin(), C.end(), [TRI](const MachineInstr &MI) { +return isMIModifiesReg(MI, TRI, RISCV::X5); + })) +return true; - return std::nullopt; + return !C.isAvailableAcrossAndOutOfSeq(RISCV::X5, *TRI); } std::optional> @@ -3042,35 +3037,32 @@ RISCVInstrInfo::getOutliningCandidateInfo( std::vector &RepeatedSequenceLocs, unsigned MinRepeats) const { - // Each RepeatedSequenceLoc is identical. - outliner::Candidate &Candidate = RepeatedSequenceLocs[0]; - auto CandidateInfo = analyzeCandidate(Candidate); - if (!CandidateInfo) -RepeatedSequenceLocs.clear(); + // Analyze each candidate and erase the ones that are not viable. + llvm::erase_if(RepeatedSequenceLocs, analyzeCandidate); // If the sequence doesn't have enough candidates left, then we're done. if (RepeatedSequenceLocs.size() < MinRepeats) return std::nullopt; + // Each RepeatedSequenceLoc is identical. + outliner::Candidate &Candidate = RepeatedSequenceLocs[0]; unsigned InstrSizeCExt = Candidate.getMF()->getSubtarget().hasStdExtCOrZca() ? 2 : 4; unsigned CallOverhead = 0, FrameOverhead = 0; - MachineOutlinerConstructionID MOCI = CandidateInfo.value(); - switch (MOCI) { - case MachineOutlinerDefault: -// call t0, function = 8 bytes. -CallOverhead = 8; -// jr t0 = 4 bytes, 2 bytes if compressed instructions are enabled. -FrameOverhead = InstrSizeCExt; -break; - case MachineOutlinerTailCall: + MachineOutlinerConstructionID MOCI = MachineOutlinerDefault; + if (Candidate.back().isReturn()) { +MOCI = MachineOutlinerTailCall; // tail call = auipc + jalr in the worst case without linker relaxation. CallOverhead = 4 + InstrSizeCExt; // Using tail call we move ret instruction from caller to callee. FrameOverhead = 0; -break; + } else { +// call t0, function = 8 bytes. +CallOverhead = 8; +// jr t0 = 4 bytes, 2 bytes if compressed instructions are enabled. 
+FrameOverhead = InstrSizeCExt; } for (auto &C : RepeatedSequenceLocs) diff --git a/llvm/test/CodeGen/RISCV/machine-outliner-call-x5-liveout.mir b/llvm/test/CodeGen/RISCV/machine-outliner-call-x5-liveout.mir new file mode 100644 index 0..f7bea33e52885 --- /dev/null +++ b/llvm/test/CodeGen/RISCV/machine-outliner-call-x5-liveout.mir @@ -0,0 +1,136 @@ +# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 5 +# RUN: llc -mtriple=riscv32 -x mir -run-pass=machine-outliner -simplify-mir -verify-machineinstrs < %s \ +# RUN: | FileCheck -check-prefixes=RV32I-MO %s +# RUN: llc -mtriple=riscv64 -x mir -run-pass=machine-outliner -simplify-mir -verify-machineinstrs < %s \ +# RUN: | FileCheck -check-prefixes=RV64I-MO %s + +# MIR has been edited by hand to have x5 as live out in @dont_outline + +--- + +name:outline_0 +tracksRegLiveness: true +isOutlined: false +body: | + bb.0: +liveins: $x10, $x11 + +; RV32I-MO-LABEL: name: outline_0 +; RV32I-MO: liveins: $x10, $x11 +; RV32I-M
[llvm-branch-commits] [flang] [flang][OpenMP] Extend `do concurrent` mapping to multi-range loops (PR #127634)
@@ -102,6 +105,47 @@ mlir::Operation *findLoopIterationVarMemDecl(fir::DoLoopOp doLoop) { return result.getDefiningOp(); } +/// Collects the op(s) responsible for updating a loop's iteration variable with +/// the current iteration number. For example, for the input IR: +/// ``` +/// %i = fir.alloca i32 {bindc_name = "i"} +/// %i_decl:2 = hlfir.declare %i ... +/// ... +/// fir.do_loop %i_iv = %lb to %ub step %step unordered { +/// %1 = fir.convert %i_iv : (index) -> i32 +/// fir.store %1 to %i_decl#1 : !fir.ref +/// ... +/// } +/// ``` +/// this function would return the first 2 ops in the `fir.do_loop`'s region. +llvm::SetVector +extractIndVarUpdateOps(fir::DoLoopOp doLoop) { + mlir::Value indVar = doLoop.getInductionVar(); + llvm::SetVector indVarUpdateOps; + + llvm::SmallVector toProcess; + toProcess.push_back(indVar); + + llvm::DenseSet done; + + while (!toProcess.empty()) { +mlir::Value val = toProcess.back(); +toProcess.pop_back(); + +if (!done.insert(val).second) + continue; + +for (mlir::Operation *user : val.getUsers()) { + indVarUpdateOps.insert(user); + + for (mlir::Value result : user->getResults()) +toProcess.push_back(result); +} + } + + return std::move(indVarUpdateOps); skatrak wrote: Returning containers goes a bit against general recommendations, but if you prefer to keep this approach rather than populating an output `SmallVectorImpl &` argument with help of `llvm::is_contained()` (which is what `SetVector` does for small vectors), I'd suggest considering the following: ```suggestion return std::move(indVarUpdateOps.takeVector()); ``` That way, at least we don't leak implementation details of the structure we used to avoid duplicates. https://github.com/llvm/llvm-project/pull/127634 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [flang] [flang][OpenMP] Extend `do concurrent` mapping to multi-range loops (PR #127634)
https://github.com/skatrak edited https://github.com/llvm/llvm-project/pull/127634
[llvm-branch-commits] [flang] [flang][OpenMP] Extend `do concurrent` mapping to multi-range loops (PR #127634)
@@ -102,6 +105,47 @@ mlir::Operation *findLoopIterationVarMemDecl(fir::DoLoopOp doLoop) { return result.getDefiningOp(); } +/// Collects the op(s) responsible for updating a loop's iteration variable with +/// the current iteration number. For example, for the input IR: skatrak wrote: This function seems to do something more generic than that: it collects all of the ops that either take the loop's induction variable as argument or take a value as argument that has been calculated based on the result of another operation that directly or indirectly took the loop's induction variable as argument. I guess that, similarly to another comment I left at a previous PR in the stack https://github.com/llvm/llvm-project/pull/127633#discussion_r1963571510, it's doing something more general than it states. If, like the other case, the idea is to just store the associated `fir.convert` and `fir.store` operations, perhaps it makes more sense to match that pattern specifically. https://github.com/llvm/llvm-project/pull/127634 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [OpenMPIRBuilder] Split calculation of canonical loop trip count, NFC (PR #127820)
https://github.com/Meinersbur approved this pull request. LGTM https://github.com/llvm/llvm-project/pull/127820
[llvm-branch-commits] [flang] [Flang][OpenMP] Allow host evaluation of loop bounds for distribute (PR #127822)
https://github.com/Meinersbur approved this pull request. LGTM https://github.com/llvm/llvm-project/pull/127822
[llvm-branch-commits] [llvm] [OpenMPIRBuilder] Add support for distribute-parallel-for/do constructs (PR #127818)
https://github.com/Meinersbur approved this pull request. LGTM https://github.com/llvm/llvm-project/pull/127818
[llvm-branch-commits] [clang] release/20.x: [CSKY] Default to unsigned char (PR #126436)
https://github.com/AaronBallman approved this pull request. LGTM though this definitely needs a release note so that anyone relying on the old behavior has some amount of notice (and a suggestion as to how to get the old behavior back). https://github.com/llvm/llvm-project/pull/126436
[llvm-branch-commits] [flang] [flang][OpenMP] Extend `do concurrent` mapping to multi-range loops (PR #127634)
@@ -102,6 +105,47 @@ mlir::Operation *findLoopIterationVarMemDecl(fir::DoLoopOp doLoop) { return result.getDefiningOp(); } +/// Collects the op(s) responsible for updating a loop's iteration variable with +/// the current iteration number. For example, for the input IR: +/// ``` +/// %i = fir.alloca i32 {bindc_name = "i"} +/// %i_decl:2 = hlfir.declare %i ... +/// ... +/// fir.do_loop %i_iv = %lb to %ub step %step unordered { +/// %1 = fir.convert %i_iv : (index) -> i32 +/// fir.store %1 to %i_decl#1 : !fir.ref +/// ... +/// } +/// ``` +/// this function would return the first 2 ops in the `fir.do_loop`'s region. +llvm::SetVector +extractIndVarUpdateOps(fir::DoLoopOp doLoop) { + mlir::Value indVar = doLoop.getInductionVar(); + llvm::SetVector indVarUpdateOps; + + llvm::SmallVector toProcess; + toProcess.push_back(indVar); + + llvm::DenseSet done; + + while (!toProcess.empty()) { +mlir::Value val = toProcess.back(); +toProcess.pop_back(); + +if (!done.insert(val).second) + continue; skatrak wrote: If I understand it correctly, this set could potentially be avoided if we checked below whether `indVarUpdateOps.insert(user)` actually inserted something before adding its results to the processing list. https://github.com/llvm/llvm-project/pull/127634 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [flang] [flang][OpenMP] Extend `do concurrent` mapping to multi-range loops (PR #127634)
@@ -30,6 +30,9 @@ namespace looputils { struct InductionVariableInfo { /// the operation allocating memory for iteration variable, mlir::Operation *iterVarMemDef; + /// the operation(s) updating the iteration variable with the current + /// iteration number. + llvm::SetVector indVarUpdateOps; skatrak wrote: Is there any reason why this has to be a set? It seems like an implementation detail that leaked out of `extractIndVarUpdateOps`. https://github.com/llvm/llvm-project/pull/127634
[llvm-branch-commits] [flang] [flang][OpenMP] Extend `do concurrent` mapping to multi-range loops (PR #127634)
https://github.com/skatrak commented: Thank you Kareem, some small comments from me. https://github.com/llvm/llvm-project/pull/127634
[llvm-branch-commits] [flang] [flang][OpenMP] Map simple `do concurrent` loops to OpenMP host constructs (PR #127633)
@@ -152,26 +231,136 @@ class DoConcurrentConversion : public mlir::OpConversionPattern { public: using mlir::OpConversionPattern::OpConversionPattern; - DoConcurrentConversion(mlir::MLIRContext *context, bool mapToDevice) - : OpConversionPattern(context), mapToDevice(mapToDevice) {} + DoConcurrentConversion(mlir::MLIRContext *context, bool mapToDevice, + llvm::DenseSet &concurrentLoopsToSkip) + : OpConversionPattern(context), mapToDevice(mapToDevice), +concurrentLoopsToSkip(concurrentLoopsToSkip) {} mlir::LogicalResult matchAndRewrite(fir::DoLoopOp doLoop, OpAdaptor adaptor, mlir::ConversionPatternRewriter &rewriter) const override { -looputils::LoopNest loopNest; +looputils::LoopNestToIndVarMap loopNest; bool hasRemainingNestedLoops = failed(looputils::collectLoopNest(doLoop, loopNest)); if (hasRemainingNestedLoops) mlir::emitWarning(doLoop.getLoc(), "Some `do concurent` loops are not perfectly-nested. " "These will be serialized."); -// TODO This will be filled in with the next PRs that upstreams the rest of -// the ROCm implementaion. +mlir::IRMapping mapper; +genParallelOp(doLoop.getLoc(), rewriter, loopNest, mapper); +mlir::omp::LoopNestOperands loopNestClauseOps; +genLoopNestClauseOps(doLoop.getLoc(), rewriter, loopNest, mapper, + loopNestClauseOps); + +mlir::omp::LoopNestOp ompLoopNest = +genWsLoopOp(rewriter, loopNest.back().first, mapper, loopNestClauseOps, +/*isComposite=*/mapToDevice); skatrak wrote: This will at the moment cause invalid MLIR to be produced (composite `omp.wsloop` with no other loop wrappers). We should probably emit a not yet implemented error if `mapToDevice=true` at the beginning of the function instead, unless you intend to merge host and target support at the same time. https://github.com/llvm/llvm-project/pull/127633 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: Temporal divergence lowering (non i1) (PR #124298)
https://github.com/petar-avramovic updated https://github.com/llvm/llvm-project/pull/124298 >From 3f039f909b91cc5ad1f92208944e0b66447346df Mon Sep 17 00:00:00 2001 From: Petar Avramovic Date: Fri, 21 Feb 2025 14:33:44 +0100 Subject: [PATCH] AMDGPU/GlobalISel: Temporal divergence lowering (non i1) Record all uses outside cycle with divergent exit during propagateTemporalDivergence in Uniformity analysis. With this list of candidates for temporal divergence lowering, excluding known lane masks from control flow intrinsics, find sources from inside the cycle that are not i1 and uniform. Temporal divergence lowering (non i1): create copy(v_mov) to vgpr, with implicit exec (to stop other passes from moving this copy outside of the cycle) and use this vgpr outside of the cycle instead of original uniform source. --- llvm/include/llvm/ADT/GenericUniformityImpl.h | 46 ++- llvm/include/llvm/ADT/GenericUniformityInfo.h | 5 ++ llvm/lib/Analysis/UniformityAnalysis.cpp | 3 +- .../lib/CodeGen/MachineUniformityAnalysis.cpp | 6 +-- .../AMDGPUGlobalISelDivergenceLowering.cpp| 44 +- .../lib/Target/AMDGPU/AMDGPURegBankSelect.cpp | 25 -- llvm/lib/Target/AMDGPU/SILowerI1Copies.h | 6 +++ ...divergent-i1-phis-no-lane-mask-merging.mir | 7 +-- ...ergence-divergent-i1-used-outside-loop.mir | 19 .../divergence-temporal-divergent-reg.ll | 18 .../divergence-temporal-divergent-reg.mir | 3 +- .../AMDGPU/GlobalISel/regbankselect-mui.ll| 17 +++ 12 files changed, 157 insertions(+), 42 deletions(-) diff --git a/llvm/include/llvm/ADT/GenericUniformityImpl.h b/llvm/include/llvm/ADT/GenericUniformityImpl.h index bd09f4fe43e08..6411fc9b4b974 100644 --- a/llvm/include/llvm/ADT/GenericUniformityImpl.h +++ b/llvm/include/llvm/ADT/GenericUniformityImpl.h @@ -51,7 +51,10 @@ #include "llvm/ADT/SmallPtrSet.h" #include "llvm/ADT/SparseBitVector.h" #include "llvm/ADT/StringExtras.h" +#include "llvm/CodeGen/MachineInstr.h" +#include "llvm/Support/Debug.h" #include "llvm/Support/raw_ostream.h" +#include #define 
DEBUG_TYPE "uniformity" @@ -342,6 +345,9 @@ template class GenericUniformityAnalysisImpl { typename SyncDependenceAnalysisT::DivergenceDescriptor; using BlockLabelMapT = typename SyncDependenceAnalysisT::BlockLabelMap; + using TemporalDivergenceTuple = + std::tuple; + GenericUniformityAnalysisImpl(const DominatorTreeT &DT, const CycleInfoT &CI, const TargetTransformInfo *TTI) : Context(CI.getSSAContext()), F(*Context.getFunction()), CI(CI), @@ -396,6 +402,11 @@ template class GenericUniformityAnalysisImpl { void print(raw_ostream &out) const; + SmallVector TemporalDivergenceList; + + void recordTemporalDivergence(ConstValueRefT, const InstructionT *, +const CycleT *); + protected: /// \brief Value/block pair representing a single phi input. struct PhiInput { @@ -1129,6 +1140,13 @@ void GenericUniformityAnalysisImpl::compute() { } } +template +void GenericUniformityAnalysisImpl::recordTemporalDivergence( +ConstValueRefT Val, const InstructionT *User, const CycleT *Cycle) { + TemporalDivergenceList.emplace_back(Val, const_cast(User), + Cycle); +} + template bool GenericUniformityAnalysisImpl::isAlwaysUniform( const InstructionT &Instr) const { @@ -1146,6 +1164,12 @@ template void GenericUniformityAnalysisImpl::print(raw_ostream &OS) const { bool haveDivergentArgs = false; + // When we print Value, LLVM IR instruction, we want to print extra new line. + // In LLVM IR print function for Value does not print new line at the end. + // In MIR print for MachineInstr prints new line at the end. + constexpr bool IsMIR = std::is_same::value; + std::string NewLine = IsMIR ? "" : "\n"; + // Control flow instructions may be divergent even if their inputs are // uniform. Thus, although exceedingly rare, it is possible to have a program // with no divergent values but with divergent control structures. 
@@ -1180,6 +1204,16 @@ void GenericUniformityAnalysisImpl::print(raw_ostream &OS) const { } } + if (!TemporalDivergenceList.empty()) { +OS << "\nTEMPORAL DIVERGENCE LIST:\n"; + +for (auto [Val, UseInst, Cycle] : TemporalDivergenceList) { + OS << "Value :" << Context.print(Val) << NewLine + << "Used by :" << Context.print(UseInst) << NewLine + << "Outside cycle :" << Cycle->print(Context) << "\n\n"; +} + } + for (auto &block : F) { OS << "\nBLOCK " << Context.print(&block) << '\n'; @@ -1191,7 +1225,7 @@ void GenericUniformityAnalysisImpl::print(raw_ostream &OS) const { OS << " DIVERGENT: "; else OS << " "; - OS << Context.print(value) << '\n'; + OS << Context.print(value) << NewLine; } OS << "TERMINATORS\n"; @@ -1203,13 +1237,21
[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: Temporal divergence lowering (non i1) (PR #124298)
@@ -188,6 +190,35 @@ void DivergenceLoweringHelper::constrainAsLaneMask(Incoming &In) { In.Reg = Copy.getReg(0); } +void replaceUsesOfRegInInstWith(Register Reg, MachineInstr *Inst, +Register NewReg) { + for (MachineOperand &Op : Inst->operands()) { +if (Op.isReg() && Op.getReg() == Reg) + Op.setReg(NewReg); + } +} + +bool DivergenceLoweringHelper::lowerTemporalDivergence() { + AMDGPU::IntrinsicLaneMaskAnalyzer ILMA(*MF); + + for (auto [Inst, UseInst, _] : MUI->getTemporalDivergenceList()) { petar-avramovic wrote: Updated types for recording TemporalDivergence and prints, improved new line prints. https://github.com/llvm/llvm-project/pull/124298
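The patch's compile-time selection of the line separator (LLVM-IR value printers emit no trailing newline, while MIR instruction printers do) can be sketched in isolation. `IRValue` and `MachineInstr` below are empty hypothetical stand-ins, not the real LLVM classes:

```cpp
#include <string>
#include <type_traits>

struct IRValue {};      // stand-in: its printer emits no trailing newline
struct MachineInstr {}; // stand-in: its printer already appends a newline

// Pick the extra separator at compile time so one templated print routine
// renders both kinds of instruction streams with uniform line breaks.
template <typename InstructionT>
std::string lineSeparator() {
  constexpr bool IsMIR = std::is_same_v<InstructionT, MachineInstr>;
  return IsMIR ? "" : "\n";
}
```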
[llvm-branch-commits] [flang] [flang][OpenMP] Handle "loop-local values" in `do concurrent` nests (PR #127635)
@@ -202,6 +202,57 @@ variables: `i` and `j`. These are locally allocated inside the parallel/target OpenMP region similar to what the single-range example in previous section shows. +### Data environment + +By default, variables that are used inside a `do concurrent` loop nest are +either treated as `shared` in case of mapping to `host`, or mapped into the +`target` region using a `map` clause in case of mapping to `device`. The only +exceptions to this are: + 1. the loop's iteration variable(s) (IV) of **perfect** loop nests. In that + case, for each IV, we allocate a local copy as shown by the mapping + examples above. + 1. any values that are from allocations outside the loop nest and used + exclusively inside of it. In such cases, a local privatized + copy is created in the OpenMP region to prevent multiple teams of threads + from accessing and destroying the same memory block, which causes runtime + issues. For an example of such cases, see + `flang/test/Transforms/DoConcurrent/locally_destroyed_temp.f90`. + +Implicit mapping detection (for mapping to the target device) is still quite +limited and work to make it smarter is underway for both OpenMP in general +and `do concurrent` mapping. skatrak wrote: Nit: There's no mapping support at this stage, so maybe state that to avoid misleading anyone reading it. https://github.com/llvm/llvm-project/pull/127635 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [flang] [flang][OpenMP] Handle "loop-local values" in `do concurrent` nests (PR #127635)
https://github.com/skatrak approved this pull request. I have a couple of nits, but LGTM otherwise. Thank you! https://github.com/llvm/llvm-project/pull/127635
[llvm-branch-commits] [flang] [flang][OpenMP] Handle "loop-local values" in `do concurrent` nests (PR #127635)
@@ -361,6 +361,64 @@ void sinkLoopIVArgs(mlir::ConversionPatternRewriter &rewriter, ++idx; } } + +/// Collects values that are local to a loop: "loop-local values". A loop-local +/// value is one that is used exclusively inside the loop but allocated outside +/// of it. This usually corresponds to temporary values that are used inside the +/// loop body for initialzing other variables for example. skatrak wrote: ```suggestion /// loop body for initializing other variables for example. ``` https://github.com/llvm/llvm-project/pull/127635 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [flang] [flang][OpenMP] Handle "loop-local values" in `do concurrent` nests (PR #127635)
https://github.com/skatrak edited https://github.com/llvm/llvm-project/pull/127635
[llvm-branch-commits] [flang] [flang][OpenMP] Handle "loop-local values" in `do concurrent` nests (PR #127635)
@@ -361,6 +361,64 @@ void sinkLoopIVArgs(mlir::ConversionPatternRewriter &rewriter, ++idx; } } + +/// Collects values that are local to a loop: "loop-local values". A loop-local +/// value is one that is used exclusively inside the loop but allocated outside +/// of it. This usually corresponds to temporary values that are used inside the +/// loop body for initialzing other variables for example. +/// +/// See `flang/test/Transforms/DoConcurrent/locally_destroyed_temp.f90` for an +/// example of why we need this. +/// +/// \param [in] doLoop - the loop within which the function searches for values +/// used exclusively inside. +/// +/// \param [out] locals - the list of loop-local values detected for \p doLoop. +void collectLoopLocalValues(fir::DoLoopOp doLoop, +llvm::SetVector &locals) { + doLoop.walk([&](mlir::Operation *op) { +for (mlir::Value operand : op->getOperands()) { + if (locals.contains(operand)) +continue; + + bool isLocal = true; + + if (!mlir::isa_and_present(operand.getDefiningOp())) +continue; + + // Values defined inside the loop are not interesting since they do not + // need to be localized. + if (doLoop->isAncestor(operand.getDefiningOp())) +continue; + + for (auto *user : operand.getUsers()) { +if (!doLoop->isAncestor(user)) { + isLocal = false; + break; +} + } + + if (isLocal) +locals.insert(operand); skatrak wrote: Nit: I think something like this might be a bit more concise, but feel free to disagree. In that case, the `isLocal` declaration might be good to move it closer to the loop. ```suggestion auto users = operand.getUsers(); if (llvm::find_if(users, [&](mlir::Operation *user) { return !doLoop->isAncestor(user); }) == users.end()) locals.insert(operand); ``` https://github.com/llvm/llvm-project/pull/127635 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
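The reviewer's condition boils down to: a value is loop-local iff every one of its users is nested inside the loop. That reads naturally as a single standard-algorithm call; `Loop` below is a hypothetical stand-in for the MLIR op and its `isAncestor` check:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Stand-in for the loop op: `isAncestor` mimics doLoop->isAncestor(user).
struct Loop {
  std::vector<int> containedOps; // ids of ops nested inside the loop body
  bool isAncestor(int op) const {
    return std::find(containedOps.begin(), containedOps.end(), op) !=
           containedOps.end();
  }
};

// A value is "loop-local" when all of its users live inside the loop.
bool isLoopLocal(const Loop &loop, const std::vector<int> &users) {
  return std::all_of(users.begin(), users.end(),
                     [&](int user) { return loop.isAncestor(user); });
}
```

Using `std::all_of` (or `llvm::all_of` in-tree) avoids the mutable `isLocal` flag and the manual early-`break` of the original loop.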
[llvm-branch-commits] [flang] [flang][OpenMP] Handle "loop-local values" in `do concurrent` nests (PR #127635)
@@ -202,6 +202,57 @@ variables: `i` and `j`. These are locally allocated inside the parallel/target OpenMP region similar to what the single-range example in previous section shows. +### Data environment + +By default, variables that are used inside a `do concurrent` loop nest are +either treated as `shared` in case of mapping to `host`, or mapped into the +`target` region using a `map` clause in case of mapping to `device`. The only +exceptions to this are: + 1. the loop's iteration variable(s) (IV) of **perfect** loop nests. In that + case, for each IV, we allocate a local copy as shown by the mapping + examples above. + 1. any values that are from allocations outside the loop nest and used + exclusively inside of it. In such cases, a local privatized + copy is created in the OpenMP region to prevent multiple teams of threads skatrak wrote: Nit: In the OpenMP parallel region? https://github.com/llvm/llvm-project/pull/127635 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [flang] [flang][OpenMP] Handle "loop-local values" in `do concurrent` nests (PR #127635)
@@ -0,0 +1,62 @@ +! Tests that "loop-local values" are properly handled by localizing them to the +! body of the loop nest. See `collectLoopLocalValues` and `localizeLoopLocalValue` +! for a definition of "loop-local values" and how they are handled. skatrak wrote: Nit: Maybe it's better to just generally point at the `DoConcurrentConversion` pass for more information, since this comment will easily get out of sync of the actual implementation otherwise. https://github.com/llvm/llvm-project/pull/127635
[llvm-branch-commits] [flang] [flang][OpenMP] Extend `do concurrent` mapping to multi-range loops (PR #127634)
@@ -102,6 +105,47 @@ mlir::Operation *findLoopIterationVarMemDecl(fir::DoLoopOp doLoop) { return result.getDefiningOp(); } +/// Collects the op(s) responsible for updating a loop's iteration variable with +/// the current iteration number. For example, for the input IR: +/// ``` +/// %i = fir.alloca i32 {bindc_name = "i"} +/// %i_decl:2 = hlfir.declare %i ... +/// ... +/// fir.do_loop %i_iv = %lb to %ub step %step unordered { +/// %1 = fir.convert %i_iv : (index) -> i32 +/// fir.store %1 to %i_decl#1 : !fir.ref +/// ... +/// } +/// ``` +/// this function would return the first 2 ops in the `fir.do_loop`'s region. +llvm::SetVector +extractIndVarUpdateOps(fir::DoLoopOp doLoop) { + mlir::Value indVar = doLoop.getInductionVar(); + llvm::SetVector indVarUpdateOps; + + llvm::SmallVector toProcess; + toProcess.push_back(indVar); + + llvm::DenseSet done; + + while (!toProcess.empty()) { +mlir::Value val = toProcess.back(); +toProcess.pop_back(); skatrak wrote: ```suggestion mlir::Value val = toProcess.pop_back_val(); ``` https://github.com/llvm/llvm-project/pull/127634 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [mlir] [MLIR][OpenMP] Host lowering of distribute-parallel-do/for (PR #127819)
https://github.com/skatrak updated https://github.com/llvm/llvm-project/pull/127819 >From 33d5af4e9d8aaf9464aa74f5031d60001d77c610 Mon Sep 17 00:00:00 2001 From: Sergio Afonso Date: Tue, 18 Feb 2025 13:07:51 + Subject: [PATCH] [MLIR][OpenMP] Host lowering of distribute-parallel-do/for This patch adds support for translating composite `omp.parallel` + `omp.distribute` + `omp.wsloop` loops to LLVM IR on the host. This is done by passing an updated `WorksharingLoopType` to the call to `applyWorkshareLoop` associated to the lowering of the `omp.wsloop` operation, so that `__kmpc_dist_for_static_init` is called at runtime in place of `__kmpc_for_static_init`. Existing translation rules take care of creating a parallel region to hold the workshared and workdistributed loop. --- .../OpenMP/OpenMPToLLVMIRTranslation.cpp | 21 -- mlir/test/Target/LLVMIR/openmp-llvm.mlir | 65 +++ mlir/test/Target/LLVMIR/openmp-todo.mlir | 19 -- 3 files changed, 81 insertions(+), 24 deletions(-) diff --git a/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp b/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp index 987f18fc7bc47..fbea278b2511f 100644 --- a/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp +++ b/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp @@ -257,10 +257,6 @@ static LogicalResult checkImplementationStatus(Operation &op) { LogicalResult result = success(); llvm::TypeSwitch(op) .Case([&](omp::DistributeOp op) { -if (op.isComposite() && -isa_and_present(op.getNestedWrapper())) - result = op.emitError() << "not yet implemented: " - "composite omp.distribute + omp.wsloop"; checkAllocate(op, result); checkDistSchedule(op, result); checkOrder(op, result); @@ -1990,6 +1986,14 @@ convertOmpWsloop(Operation &opInst, llvm::IRBuilderBase &builder, bool isSimd = wsloopOp.getScheduleSimd(); bool loopNeedsBarrier = !wsloopOp.getNowait(); + // The only legal way for the direct parent to be omp.distribute is that this 
+ // represents 'distribute parallel do'. Otherwise, this is a regular + // worksharing loop. + llvm::omp::WorksharingLoopType workshareLoopType = + llvm::isa_and_present(opInst.getParentOp()) + ? llvm::omp::WorksharingLoopType::DistributeForStaticLoop + : llvm::omp::WorksharingLoopType::ForStaticLoop; + llvm::OpenMPIRBuilder::LocationDescription ompLoc(builder); llvm::Expected regionBlock = convertOmpOpRegions( wsloopOp.getRegion(), "omp.wsloop.region", builder, moduleTranslation); @@ -2005,7 +2009,8 @@ convertOmpWsloop(Operation &opInst, llvm::IRBuilderBase &builder, ompLoc.DL, loopInfo, allocaIP, loopNeedsBarrier, convertToScheduleKind(schedule), chunk, isSimd, scheduleMod == omp::ScheduleModifier::monotonic, - scheduleMod == omp::ScheduleModifier::nonmonotonic, isOrdered); + scheduleMod == omp::ScheduleModifier::nonmonotonic, isOrdered, + workshareLoopType); if (failed(handleError(wsloopIP, opInst))) return failure(); @@ -3791,6 +3796,12 @@ convertOmpDistribute(Operation &opInst, llvm::IRBuilderBase &builder, return regionBlock.takeError(); builder.SetInsertPoint(*regionBlock, (*regionBlock)->begin()); +// Skip applying a workshare loop below when translating 'distribute +// parallel do' (it's been already handled by this point while translating +// the nested omp.wsloop). +if (isa_and_present(distributeOp.getNestedWrapper())) + return llvm::Error::success(); + // TODO: Add support for clauses which are valid for DISTRIBUTE constructs. // Static schedule is the default. 
auto schedule = omp::ClauseScheduleKind::Static; diff --git a/mlir/test/Target/LLVMIR/openmp-llvm.mlir b/mlir/test/Target/LLVMIR/openmp-llvm.mlir index a5a490e527d79..d85b149c66811 100644 --- a/mlir/test/Target/LLVMIR/openmp-llvm.mlir +++ b/mlir/test/Target/LLVMIR/openmp-llvm.mlir @@ -3307,3 +3307,68 @@ llvm.func @distribute() { // CHECK: store i64 1, ptr %[[STRIDE]] // CHECK: %[[TID:.*]] = call i32 @__kmpc_global_thread_num({{.*}}) // CHECK: call void @__kmpc_for_static_init_{{.*}}(ptr @{{.*}}, i32 %[[TID]], i32 92, ptr %[[LASTITER]], ptr %[[LB]], ptr %[[UB]], ptr %[[STRIDE]], i64 1, i64 0) + +// - + +llvm.func @distribute_wsloop(%lb : i32, %ub : i32, %step : i32) { + omp.parallel { +omp.distribute { + omp.wsloop { +omp.loop_nest (%iv) : i32 = (%lb) to (%ub) step (%step) { + omp.yield +} + } {omp.composite} +} {omp.composite} +omp.terminator + } {omp.composite} + llvm.return +} + +// CHECK-LABEL: define void @distribute_wsloop +// CHECK: call void{{.*}}@__kmpc_fork_call({{.*}}, ptr @[[OUTLINED_PARALLEL:.*]], + +// CHECK: define internal void @[[OUTLINE
[llvm-branch-commits] [llvm] [OpenMPIRBuilder] Split calculation of canonical loop trip count, NFC (PR #127820)
https://github.com/skatrak updated https://github.com/llvm/llvm-project/pull/127820 >From 082d8e12a622e2315dd4503ce460f9a0e6f29007 Mon Sep 17 00:00:00 2001 From: Sergio Afonso Date: Tue, 18 Feb 2025 14:19:30 + Subject: [PATCH] [OpenMPIRBuilder] Split calculation of canonical loop trip count, NFC This patch splits off the calculation of canonical loop trip counts from the creation of canonical loops. This makes it possible to reuse this logic to, for instance, populate the `__tgt_target_kernel` runtime call for SPMD kernels. This feature is used to simplify one of the existing OpenMPIRBuilder tests. --- .../llvm/Frontend/OpenMP/OMPIRBuilder.h | 38 +++ llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp | 27 - .../Frontend/OpenMPIRBuilderTest.cpp | 16 ++-- 3 files changed, 52 insertions(+), 29 deletions(-) diff --git a/llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h b/llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h index 9ad85413acd34..207ca7fb05f62 100644 --- a/llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h +++ b/llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h @@ -728,13 +728,12 @@ class OpenMPIRBuilder { LoopBodyGenCallbackTy BodyGenCB, Value *TripCount, const Twine &Name = "loop"); - /// Generator for the control flow structure of an OpenMP canonical loop. + /// Calculate the trip count of a canonical loop. /// - /// Instead of a logical iteration space, this allows specifying user-defined - /// loop counter values using increment, upper- and lower bounds. To - /// disambiguate the terminology when counting downwards, instead of lower - /// bounds we use \p Start for the loop counter value in the first body - /// iteration. + /// This allows specifying user-defined loop counter values using increment, + /// upper- and lower bounds. To disambiguate the terminology when counting + /// downwards, instead of lower bounds we use \p Start for the loop counter + /// value in the first body iteration. 
/// /// Consider the following limitations: /// @@ -758,7 +757,32 @@ class OpenMPIRBuilder { /// /// for (int i = 0; i < 42; i -= 1u) /// - // + /// \param Loc The insert and source location description. + /// \param Start Value of the loop counter for the first iterations. + /// \param Stop Loop counter values past this will stop the loop. + /// \param Step Loop counter increment after each iteration; negative + /// means counting down. + /// \param IsSigned Whether Start, Stop and Step are signed integers. + /// \param InclusiveStop Whether \p Stop itself is a valid value for the loop + /// counter. + /// \param Name Base name used to derive instruction names. + /// + /// \returns The value holding the calculated trip count. + Value *calculateCanonicalLoopTripCount(const LocationDescription &Loc, + Value *Start, Value *Stop, Value *Step, + bool IsSigned, bool InclusiveStop, + const Twine &Name = "loop"); + + /// Generator for the control flow structure of an OpenMP canonical loop. + /// + /// Instead of a logical iteration space, this allows specifying user-defined + /// loop counter values using increment, upper- and lower bounds. To + /// disambiguate the terminology when counting downwards, instead of lower + /// bounds we use \p Start for the loop counter value in the first body + /// + /// It calls \see calculateCanonicalLoopTripCount for trip count calculations, + /// so limitations of that method apply here as well. + /// /// \param Loc The insert and source location description. /// \param BodyGenCB Callback that will generate the loop body code. /// \param Start Value of the loop counter for the first iterations. 
diff --git a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp index 7788897fc0795..eee6e3e54d615 100644 --- a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp +++ b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp @@ -4059,10 +4059,9 @@ OpenMPIRBuilder::createCanonicalLoop(const LocationDescription &Loc, return CL; } -Expected OpenMPIRBuilder::createCanonicalLoop( -const LocationDescription &Loc, LoopBodyGenCallbackTy BodyGenCB, -Value *Start, Value *Stop, Value *Step, bool IsSigned, bool InclusiveStop, -InsertPointTy ComputeIP, const Twine &Name) { +Value *OpenMPIRBuilder::calculateCanonicalLoopTripCount( +const LocationDescription &Loc, Value *Start, Value *Stop, Value *Step, +bool IsSigned, bool InclusiveStop, const Twine &Name) { // Consider the following difficulties (assuming 8-bit signed integers): // * Adding \p Step to the loop counter which passes \p Stop may overflow: @@ -4075,9 +4074,7 @@ Expected OpenMPIRBuilder::createCanonicalLoop( assert(IndVarTy ==
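The arithmetic being factored out into `calculateCanonicalLoopTripCount` can be illustrated for the unsigned, counting-up case. This is a simplifying assumption: the real helper also handles signed counters and negative (downward) steps, and `step > 0` is assumed here:

```cpp
#include <cassert>
#include <cstdint>

// Trip count of `for (i = start; i < stop (or <= stop); i += step)`,
// computed without overflowing: the naive ceil-div form
// (stop - start + step - 1) / step can wrap, so subtract 1 from the
// span instead of adding step - 1 to it.
uint64_t canonicalTripCount(uint64_t start, uint64_t stop, uint64_t step,
                            bool inclusiveStop) {
  // Empty loop: the first iteration's condition already fails.
  if (start > stop || (start == stop && !inclusiveStop))
    return 0;
  uint64_t span = stop - start; // cannot underflow after the guard
  if (inclusiveStop)
    return span / step + 1;     // i <= stop
  return (span - 1) / step + 1; // i < stop, overflow-safe ceiling division
}
```

Splitting this computation out of loop construction is what lets a caller (e.g. the SPMD `__tgt_target_kernel` setup mentioned in the commit message) reuse the trip count without also materializing the loop.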
[llvm-branch-commits] [mlir] [MLIR][OpenMP] Host lowering of standalone distribute (PR #127817)
https://github.com/skatrak updated https://github.com/llvm/llvm-project/pull/127817 >From 55089ba79ac352b05553d3d930ffca3f94562dc1 Mon Sep 17 00:00:00 2001 From: Sergio Afonso Date: Tue, 18 Feb 2025 11:22:43 + Subject: [PATCH] [MLIR][OpenMP] Host lowering of standalone distribute This patch adds MLIR to LLVM IR translation support for standalone `omp.distribute` operations, as well as `distribute simd` through ignoring SIMD information (similarly to `do/for simd`). Co-authored-by: Dominik Adamski --- .../OpenMP/OpenMPToLLVMIRTranslation.cpp | 83 +++ mlir/test/Target/LLVMIR/openmp-llvm.mlir | 37 + mlir/test/Target/LLVMIR/openmp-todo.mlir | 66 ++- 3 files changed, 183 insertions(+), 3 deletions(-) diff --git a/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp b/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp index 1344f992c116e..87b690912620b 100644 --- a/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp +++ b/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp @@ -161,6 +161,10 @@ static LogicalResult checkImplementationStatus(Operation &op) { if (op.getDevice()) result = todo("device"); }; + auto checkDistSchedule = [&todo](auto op, LogicalResult &result) { +if (op.getDistScheduleChunkSize()) + result = todo("dist_schedule with chunk_size"); + }; auto checkHasDeviceAddr = [&todo](auto op, LogicalResult &result) { if (!op.getHasDeviceAddrVars().empty()) result = todo("has_device_addr"); @@ -252,6 +256,16 @@ static LogicalResult checkImplementationStatus(Operation &op) { LogicalResult result = success(); llvm::TypeSwitch(op) + .Case([&](omp::DistributeOp op) { +if (op.isComposite() && +isa_and_present(op.getNestedWrapper())) + result = op.emitError() << "not yet implemented: " + "composite omp.distribute + omp.wsloop"; +checkAllocate(op, result); +checkDistSchedule(op, result); +checkOrder(op, result); +checkPrivate(op, result); + }) .Case([&](omp::OrderedRegionOp op) { checkParLevelSimd(op, 
result); }) .Case([&](omp::SectionsOp op) { checkAllocate(op, result); @@ -3754,6 +3768,72 @@ convertOmpTargetData(Operation *op, llvm::IRBuilderBase &builder, return success(); } +static LogicalResult +convertOmpDistribute(Operation &opInst, llvm::IRBuilderBase &builder, + LLVM::ModuleTranslation &moduleTranslation) { + llvm::OpenMPIRBuilder *ompBuilder = moduleTranslation.getOpenMPBuilder(); + auto distributeOp = cast(opInst); + if (failed(checkImplementationStatus(opInst))) +return failure(); + + using InsertPointTy = llvm::OpenMPIRBuilder::InsertPointTy; + auto bodyGenCB = [&](InsertPointTy allocaIP, + InsertPointTy codeGenIP) -> llvm::Error { +// Save the alloca insertion point on ModuleTranslation stack for use in +// nested regions. +LLVM::ModuleTranslation::SaveStack frame( +moduleTranslation, allocaIP); + +// DistributeOp has only one region associated with it. +builder.restoreIP(codeGenIP); + +llvm::OpenMPIRBuilder *ompBuilder = moduleTranslation.getOpenMPBuilder(); +llvm::OpenMPIRBuilder::LocationDescription ompLoc(builder); +llvm::Expected regionBlock = +convertOmpOpRegions(distributeOp.getRegion(), "omp.distribute.region", +builder, moduleTranslation); +if (!regionBlock) + return regionBlock.takeError(); +builder.SetInsertPoint(*regionBlock, (*regionBlock)->begin()); + +// TODO: Add support for clauses which are valid for DISTRIBUTE constructs. +// Static schedule is the default. 
+auto schedule = omp::ClauseScheduleKind::Static; +bool isOrdered = false; +std::optional scheduleMod; +bool isSimd = false; +llvm::omp::WorksharingLoopType workshareLoopType = +llvm::omp::WorksharingLoopType::DistributeStaticLoop; +bool loopNeedsBarrier = false; +llvm::Value *chunk = nullptr; + +llvm::CanonicalLoopInfo *loopInfo = findCurrentLoopInfo(moduleTranslation); +llvm::OpenMPIRBuilder::InsertPointOrErrorTy wsloopIP = +ompBuilder->applyWorkshareLoop( +ompLoc.DL, loopInfo, allocaIP, loopNeedsBarrier, +convertToScheduleKind(schedule), chunk, isSimd, +scheduleMod == omp::ScheduleModifier::monotonic, +scheduleMod == omp::ScheduleModifier::nonmonotonic, isOrdered, +workshareLoopType); + +if (!wsloopIP) + return wsloopIP.takeError(); +return llvm::Error::success(); + }; + + llvm::OpenMPIRBuilder::InsertPointTy allocaIP = + findAllocaInsertPoint(builder, moduleTranslation); + llvm::OpenMPIRBuilder::LocationDescription ompLoc(builder); + llvm::OpenMPIRBuilder::InsertPointOrErrorTy afterIP = + ompBuilder->createDistribute(ompLoc,
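For readers unfamiliar with the construct being lowered above, here is a minimal C++ sketch of a standalone `distribute` loop nested in `teams`, the kind of code that produces an `omp.distribute` operation. This is an invented illustration, not taken from the patch; when compiled without OpenMP support the pragmas are ignored and the loop simply runs serially with the same result.

```cpp
#include <cstddef>
#include <vector>

// Doubles every element using a standalone `distribute` construct nested
// inside `teams`: loop iterations are divided among the teams of the league.
// Each iteration writes a distinct element, so there are no races, and a
// serial fallback (pragmas ignored) computes the identical result.
std::vector<int> distribute_double(std::vector<int> v) {
  #pragma omp teams
  #pragma omp distribute
  for (std::size_t i = 0; i < v.size(); ++i)
    v[i] *= 2;
  return v;
}
```

A `distribute simd` variant would add `simd` after `distribute`; as the commit message notes, the translation currently ignores the SIMD information in that case.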
[llvm-branch-commits] [llvm] [OpenMPIRBuilder] Add support for distribute-parallel-for/do constructs (PR #127818)
https://github.com/skatrak updated https://github.com/llvm/llvm-project/pull/127818 >From ba9ea8c2cbe7848ca36c92e4c3ee464bcf0e6c39 Mon Sep 17 00:00:00 2001 From: Sergio Afonso Date: Tue, 18 Feb 2025 12:04:53 + Subject: [PATCH] [OpenMPIRBuilder] Add support for distribute-parallel-for/do constructs This patch adds codegen for `kmpc_dist_for_static_init` runtime calls, used to support worksharing a single loop across teams and threads. This can be used to implement `distribute parallel for/do` support. --- llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp | 34 --- 1 file changed, 30 insertions(+), 4 deletions(-) diff --git a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp index 9e380bf2d3dbe..7788897fc0795 100644 --- a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp +++ b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp @@ -4130,6 +4130,23 @@ Expected OpenMPIRBuilder::createCanonicalLoop( return createCanonicalLoop(LoopLoc, BodyGen, TripCount, Name); } +// Returns an LLVM function to call for initializing loop bounds using OpenMP +// static scheduling for composite `distribute parallel for` depending on +// `type`. Only i32 and i64 are supported by the runtime. Always interpret +// integers as unsigned similarly to CanonicalLoopInfo. +static FunctionCallee +getKmpcDistForStaticInitForType(Type *Ty, Module &M, +OpenMPIRBuilder &OMPBuilder) { + unsigned Bitwidth = Ty->getIntegerBitWidth(); + if (Bitwidth == 32) +return OMPBuilder.getOrCreateRuntimeFunction( +M, omp::RuntimeFunction::OMPRTL___kmpc_dist_for_static_init_4u); + if (Bitwidth == 64) +return OMPBuilder.getOrCreateRuntimeFunction( +M, omp::RuntimeFunction::OMPRTL___kmpc_dist_for_static_init_8u); + llvm_unreachable("unknown OpenMP loop iterator bitwidth"); +} + // Returns an LLVM function to call for initializing loop bounds using OpenMP // static scheduling depending on `type`. Only i32 and i64 are supported by the // runtime. 
Always interpret integers as unsigned similarly to @@ -4164,7 +4181,10 @@ OpenMPIRBuilder::InsertPointOrErrorTy OpenMPIRBuilder::applyStaticWorkshareLoop( // Declare useful OpenMP runtime functions. Value *IV = CLI->getIndVar(); Type *IVTy = IV->getType(); - FunctionCallee StaticInit = getKmpcForStaticInitForType(IVTy, M, *this); + FunctionCallee StaticInit = + LoopType == WorksharingLoopType::DistributeForStaticLoop + ? getKmpcDistForStaticInitForType(IVTy, M, *this) + : getKmpcForStaticInitForType(IVTy, M, *this); FunctionCallee StaticFini = getOrCreateRuntimeFunction(M, omp::OMPRTL___kmpc_for_static_fini); @@ -4200,9 +4220,15 @@ OpenMPIRBuilder::InsertPointOrErrorTy OpenMPIRBuilder::applyStaticWorkshareLoop( // Call the "init" function and update the trip count of the loop with the // value it produced. - Builder.CreateCall(StaticInit, - {SrcLoc, ThreadNum, SchedulingType, PLastIter, PLowerBound, - PUpperBound, PStride, One, Zero}); + SmallVector Args( + {SrcLoc, ThreadNum, SchedulingType, PLastIter, PLowerBound, PUpperBound}); + if (LoopType == WorksharingLoopType::DistributeForStaticLoop) { +Value *PDistUpperBound = +Builder.CreateAlloca(IVTy, nullptr, "p.distupperbound"); +Args.push_back(PDistUpperBound); + } + Args.append({PStride, One, Zero}); + Builder.CreateCall(StaticInit, Args); Value *LowerBound = Builder.CreateLoad(IVTy, PLowerBound); Value *InclusiveUpperBound = Builder.CreateLoad(IVTy, PUpperBound); Value *TripCountMinusOne = Builder.CreateSub(InclusiveUpperBound, LowerBound); ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
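To sketch what the new `__kmpc_dist_for_static_init_{4u,8u}` entry points enable at the source level, the following hypothetical C++ loop uses the composite `distribute parallel for` construct: iterations are worksharing-divided first across teams and then across the threads of each team. The example is illustrative only and is serially equivalent when the pragmas are ignored.

```cpp
#include <cstddef>
#include <vector>

// AXPY-style update workshared across both teams and threads via the
// composite `distribute parallel for` construct. Every iteration touches a
// distinct element of `y`, so the parallel and serial executions agree.
void dist_par_for_axpy(std::vector<float> &y, const std::vector<float> &x,
                       float a) {
  #pragma omp teams distribute parallel for
  for (std::size_t i = 0; i < y.size(); ++i)
    y[i] += a * x[i];
}
```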
[llvm-branch-commits] [llvm] [OpenMPIRBuilder] Split calculation of canonical loop trip count, NFC (PR #127820)
https://github.com/skatrak updated https://github.com/llvm/llvm-project/pull/127820 >From 033091e14c76c3e9c7adb0deae2451a298a7fe9e Mon Sep 17 00:00:00 2001 From: Sergio Afonso Date: Tue, 18 Feb 2025 14:19:30 + Subject: [PATCH] [OpenMPIRBuilder] Split calculation of canonical loop trip count, NFC This patch splits off the calculation of canonical loop trip counts from the creation of canonical loops. This makes it possible to reuse this logic to, for instance, populate the `__tgt_target_kernel` runtime call for SPMD kernels. This feature is used to simplify one of the existing OpenMPIRBuilder tests. --- .../llvm/Frontend/OpenMP/OMPIRBuilder.h | 38 +++ llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp | 27 - .../Frontend/OpenMPIRBuilderTest.cpp | 16 ++-- 3 files changed, 52 insertions(+), 29 deletions(-) diff --git a/llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h b/llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h index 9ad85413acd34..207ca7fb05f62 100644 --- a/llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h +++ b/llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h @@ -728,13 +728,12 @@ class OpenMPIRBuilder { LoopBodyGenCallbackTy BodyGenCB, Value *TripCount, const Twine &Name = "loop"); - /// Generator for the control flow structure of an OpenMP canonical loop. + /// Calculate the trip count of a canonical loop. /// - /// Instead of a logical iteration space, this allows specifying user-defined - /// loop counter values using increment, upper- and lower bounds. To - /// disambiguate the terminology when counting downwards, instead of lower - /// bounds we use \p Start for the loop counter value in the first body - /// iteration. + /// This allows specifying user-defined loop counter values using increment, + /// upper- and lower bounds. To disambiguate the terminology when counting + /// downwards, instead of lower bounds we use \p Start for the loop counter + /// value in the first body iteration. 
/// /// Consider the following limitations: /// @@ -758,7 +757,32 @@ class OpenMPIRBuilder { /// /// for (int i = 0; i < 42; i -= 1u) /// - // + /// \param Loc The insert and source location description. + /// \param Start Value of the loop counter for the first iterations. + /// \param Stop Loop counter values past this will stop the loop. + /// \param Step Loop counter increment after each iteration; negative + /// means counting down. + /// \param IsSigned Whether Start, Stop and Step are signed integers. + /// \param InclusiveStop Whether \p Stop itself is a valid value for the loop + /// counter. + /// \param Name Base name used to derive instruction names. + /// + /// \returns The value holding the calculated trip count. + Value *calculateCanonicalLoopTripCount(const LocationDescription &Loc, + Value *Start, Value *Stop, Value *Step, + bool IsSigned, bool InclusiveStop, + const Twine &Name = "loop"); + + /// Generator for the control flow structure of an OpenMP canonical loop. + /// + /// Instead of a logical iteration space, this allows specifying user-defined + /// loop counter values using increment, upper- and lower bounds. To + /// disambiguate the terminology when counting downwards, instead of lower + /// bounds we use \p Start for the loop counter value in the first body + /// + /// It calls \see calculateCanonicalLoopTripCount for trip count calculations, + /// so limitations of that method apply here as well. + /// /// \param Loc The insert and source location description. /// \param BodyGenCB Callback that will generate the loop body code. /// \param Start Value of the loop counter for the first iterations. 
diff --git a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp index 7788897fc0795..eee6e3e54d615 100644 --- a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp +++ b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp @@ -4059,10 +4059,9 @@ OpenMPIRBuilder::createCanonicalLoop(const LocationDescription &Loc, return CL; } -Expected OpenMPIRBuilder::createCanonicalLoop( -const LocationDescription &Loc, LoopBodyGenCallbackTy BodyGenCB, -Value *Start, Value *Stop, Value *Step, bool IsSigned, bool InclusiveStop, -InsertPointTy ComputeIP, const Twine &Name) { +Value *OpenMPIRBuilder::calculateCanonicalLoopTripCount( +const LocationDescription &Loc, Value *Start, Value *Stop, Value *Step, +bool IsSigned, bool InclusiveStop, const Twine &Name) { // Consider the following difficulties (assuming 8-bit signed integers): // * Adding \p Step to the loop counter which passes \p Stop may overflow: @@ -4075,9 +4074,7 @@ Expected OpenMPIRBuilder::createCanonicalLoop( assert(IndVarTy ==
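The arithmetic being factored out into `calculateCanonicalLoopTripCount` can be illustrated with a small standalone helper. This is a simplified model covering only the unsigned, counting-up case; the real builder method also handles signed and downward-counting loops and emits IR instructions rather than computing a value directly.

```cpp
#include <cstdint>

// Number of iterations of the canonical loop
//   for (uint64_t i = start; i <  stop; i += step)   (exclusive stop)
//   for (uint64_t i = start; i <= stop; i += step)   (inclusive stop)
// Integers are interpreted as unsigned, mirroring CanonicalLoopInfo.
std::uint64_t tripCount(std::uint64_t start, std::uint64_t stop,
                        std::uint64_t step, bool inclusiveStop) {
  if (stop < start || (!inclusiveStop && stop == start))
    return 0;                  // empty iteration space
  std::uint64_t span = stop - start;
  if (!inclusiveStop)
    span -= 1;                 // last reachable counter value is stop - 1
  return span / step + 1;
}
```

Splitting this computation out lets the same logic populate, for instance, the trip-count argument of a `__tgt_target_kernel` call for SPMD kernels without materializing the loop itself.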
[llvm-branch-commits] [mlir] [MLIR][OpenMP] Support target SPMD (PR #127821)
https://github.com/skatrak updated https://github.com/llvm/llvm-project/pull/127821 >From 32e696f446082a50b60032f1f5b656e494db5570 Mon Sep 17 00:00:00 2001 From: Sergio Afonso Date: Wed, 19 Feb 2025 14:41:12 + Subject: [PATCH 1/2] [MLIR][OpenMP] Support target SPMD This patch implements MLIR to LLVM IR translation of host-evaluated loop bounds, completing initial support for `target teams distribute parallel do [simd]` and `target teams distribute [simd]`. --- .../OpenMP/OpenMPToLLVMIRTranslation.cpp | 83 .../Target/LLVMIR/openmp-target-spmd.mlir | 96 +++ mlir/test/Target/LLVMIR/openmp-todo.mlir | 24 - 3 files changed, 159 insertions(+), 44 deletions(-) create mode 100644 mlir/test/Target/LLVMIR/openmp-target-spmd.mlir diff --git a/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp b/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp index adac89988a2da..a7d2a00a1bd90 100644 --- a/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp +++ b/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp @@ -173,15 +173,6 @@ static LogicalResult checkImplementationStatus(Operation &op) { if (op.getHint()) op.emitWarning("hint clause discarded"); }; - auto checkHostEval = [](auto op, LogicalResult &result) { -// Host evaluated clauses are supported, except for loop bounds. 
-for (BlockArgument arg : - cast(*op).getHostEvalBlockArgs()) - for (Operation *user : arg.getUsers()) -if (isa(user)) - result = op.emitError("not yet implemented: host evaluation of loop " -"bounds in omp.target operation"); - }; auto checkInReduction = [&todo](auto op, LogicalResult &result) { if (!op.getInReductionVars().empty() || op.getInReductionByref() || op.getInReductionSyms()) @@ -318,7 +309,6 @@ static LogicalResult checkImplementationStatus(Operation &op) { checkBare(op, result); checkDevice(op, result); checkHasDeviceAddr(op, result); -checkHostEval(op, result); checkInReduction(op, result); checkIsDevicePtr(op, result); checkPrivate(op, result); @@ -4058,9 +4048,13 @@ createDeviceArgumentAccessor(MapInfoData &mapData, llvm::Argument &arg, /// /// Loop bounds and steps are only optionally populated, if output vectors are /// provided. -static void extractHostEvalClauses(omp::TargetOp targetOp, Value &numThreads, - Value &numTeamsLower, Value &numTeamsUpper, - Value &threadLimit) { +static void +extractHostEvalClauses(omp::TargetOp targetOp, Value &numThreads, + Value &numTeamsLower, Value &numTeamsUpper, + Value &threadLimit, + llvm::SmallVectorImpl *lowerBounds = nullptr, + llvm::SmallVectorImpl *upperBounds = nullptr, + llvm::SmallVectorImpl *steps = nullptr) { auto blockArgIface = llvm::cast(*targetOp); for (auto item : llvm::zip_equal(targetOp.getHostEvalVars(), blockArgIface.getHostEvalBlockArgs())) { @@ -4085,11 +4079,26 @@ static void extractHostEvalClauses(omp::TargetOp targetOp, Value &numThreads, llvm_unreachable("unsupported host_eval use"); }) .Case([&](omp::LoopNestOp loopOp) { -// TODO: Extract bounds and step values. Currently, this cannot be -// reached because translation would have been stopped earlier as a -// result of `checkImplementationStatus` detecting and reporting -// this situation. 
-llvm_unreachable("unsupported host_eval use"); +auto processBounds = +[&](OperandRange opBounds, +llvm::SmallVectorImpl *outBounds) -> bool { + bool found = false; + for (auto [i, lb] : llvm::enumerate(opBounds)) { +if (lb == blockArg) { + found = true; + if (outBounds) +(*outBounds)[i] = hostEvalVar; +} + } + return found; +}; +bool found = +processBounds(loopOp.getLoopLowerBounds(), lowerBounds); +found = processBounds(loopOp.getLoopUpperBounds(), upperBounds) || +found; +found = processBounds(loopOp.getLoopSteps(), steps) || found; +if (!found) + llvm_unreachable("unsupported host_eval use"); }) .Default([](Operation *) { llvm_unreachable("unsupported host_eval use"); @@ -4226,6 +4235,7 @@ initTargetDefaultAttrs(omp::TargetOp targetOp, combinedMaxThreadsVal = maxThreadsVal; // Update kernel bounds structure for the `OpenMPIRBuilder` to use. + attrs.ExecFlags = targetOp.getKernelExecFla
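The host-evaluated loop bounds handled above matter because, for a target SPMD kernel, the host must be able to compute the loop trip count before launching the kernel. A hypothetical C++ source construct of this shape is sketched below (names are invented; without OpenMP the pragma is ignored and the loop runs serially with the same result).

```cpp
#include <vector>

// Target SPMD kernel whose loop bounds (0, n, step 1) are visible on the
// host: an OpenMP compiler can evaluate them before launch to derive the
// kernel trip count. Each iteration writes a distinct element.
void target_spmd_init(std::vector<int> &v, int n) {
  int *data = v.data();
  #pragma omp target teams distribute parallel for map(tofrom : data[0 : n])
  for (int i = 0; i < n; ++i)
    data[i] = i * i;
}
```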
[llvm-branch-commits] [llvm] [MachineBasicBlock][NFC] Decouple SplitCriticalEdges from pass manager (PR #128151)
https://github.com/cdevadas approved this pull request. https://github.com/llvm/llvm-project/pull/128151
[llvm-branch-commits] [llvm] [CodeGen][NewPM] Port MachineSink to NPM (PR #115434)
@@ -189,30 +198,19 @@ class MachineSinking : public MachineFunctionPass { bool EnableSinkAndFold; public: - static char ID; // Pass identification - - MachineSinking() : MachineFunctionPass(ID) { -initializeMachineSinkingPass(*PassRegistry::getPassRegistry()); - } - - bool runOnMachineFunction(MachineFunction &MF) override; - - void getAnalysisUsage(AnalysisUsage &AU) const override { -MachineFunctionPass::getAnalysisUsage(AU); -AU.addRequired(); -AU.addRequired(); -AU.addRequired(); -AU.addRequired(); -AU.addRequired(); -AU.addPreserved(); -AU.addPreserved(); -AU.addRequired(); -if (UseBlockFreqInfo) - AU.addRequired(); -AU.addRequired(); - } - - void releaseMemory() override { + MachineSinking(bool EnableSinkAndFold, MachineDominatorTree *DT, + MachinePostDominatorTree *PDT, LiveVariables *LV, + MachineLoopInfo *MLI, SlotIndexes *SI, LiveIntervals *LIS, + MachineCycleInfo *CI, ProfileSummaryInfo *PSI, + MachineBlockFrequencyInfo *MBFI, + const MachineBranchProbabilityInfo *MBPI, AliasAnalysis *AA) + : DT(DT), PDT(PDT), CI(CI), PSI(PSI), MBFI(MBFI), MBPI(MBPI), AA(AA), cdevadas wrote: Should have used `RequiredAnalyses` instead of this long list of arguments. https://github.com/llvm/llvm-project/pull/115434 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
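The refactor the reviewer suggests, bundling the many analysis pointers into a single aggregate instead of a long constructor parameter list, can be sketched in plain C++. The types below are stand-ins, not the actual LLVM `RequiredAnalyses` definition.

```cpp
// Group the analysis pointers one aggregate; adding a new analysis then
// means one new field rather than editing every constructor call site.
struct RequiredAnalyses {
  int *domTree = nullptr;      // stand-ins for MachineDominatorTree, etc.
  int *postDomTree = nullptr;
  int *loopInfo = nullptr;
};

class MachineSinkingModel {
  bool enableSinkAndFold;
  RequiredAnalyses analyses;

public:
  MachineSinkingModel(bool enableSinkAndFold, const RequiredAnalyses &a)
      : enableSinkAndFold(enableSinkAndFold), analyses(a) {}

  bool hasDomTree() const { return analyses.domTree != nullptr; }
  bool sinkAndFoldEnabled() const { return enableSinkAndFold; }
};
```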
[llvm-branch-commits] [llvm] [NFC][MachineBasicBlock] Decouple SplitCriticalEdges from pass manager (PR #128151)
https://github.com/optimisan edited https://github.com/llvm/llvm-project/pull/128151
[llvm-branch-commits] [flang] [Flang][OpenMP] Allow host evaluation of loop bounds for distribute (PR #127822)
https://github.com/skatrak updated https://github.com/llvm/llvm-project/pull/127822 >From c3db0a39f6515911deece48d61e7ee5bfb7219b1 Mon Sep 17 00:00:00 2001 From: Sergio Afonso Date: Wed, 19 Feb 2025 15:15:01 + Subject: [PATCH] [Flang][OpenMP] Allow host evaluation of loop bounds for distribute This patch adds `target teams distribute [simd]` and equivalent construct nests to the list of cases where loop bounds can be evaluated in the host, as they represent Generic-SPMD kernels for which the trip count must also be evaluated in advance to the kernel call. --- flang/lib/Lower/OpenMP/OpenMP.cpp | 12 +-- flang/test/Lower/OpenMP/host-eval.f90 | 103 ++ 2 files changed, 110 insertions(+), 5 deletions(-) diff --git a/flang/lib/Lower/OpenMP/OpenMP.cpp b/flang/lib/Lower/OpenMP/OpenMP.cpp index bd794033cdf11..8c80453610473 100644 --- a/flang/lib/Lower/OpenMP/OpenMP.cpp +++ b/flang/lib/Lower/OpenMP/OpenMP.cpp @@ -562,8 +562,11 @@ static void processHostEvalClauses(lower::AbstractConverter &converter, [[fallthrough]]; case OMPD_distribute_parallel_do: case OMPD_distribute_parallel_do_simd: - cp.processCollapse(loc, eval, hostInfo.ops, hostInfo.iv); cp.processNumThreads(stmtCtx, hostInfo.ops); + [[fallthrough]]; +case OMPD_distribute: +case OMPD_distribute_simd: + cp.processCollapse(loc, eval, hostInfo.ops, hostInfo.iv); break; // Cases where 'teams' clauses might be present, and target SPMD is @@ -573,10 +576,8 @@ static void processHostEvalClauses(lower::AbstractConverter &converter, [[fallthrough]]; case OMPD_target_teams: cp.processNumTeams(stmtCtx, hostInfo.ops); - processSingleNestedIf([](Directive nestedDir) { -return nestedDir == OMPD_distribute_parallel_do || - nestedDir == OMPD_distribute_parallel_do_simd; - }); + processSingleNestedIf( + [](Directive nestedDir) { return topDistributeSet.test(nestedDir); }); break; // Cases where only 'teams' host-evaluated clauses might be present. 
@@ -586,6 +587,7 @@ static void processHostEvalClauses(lower::AbstractConverter &converter, [[fallthrough]]; case OMPD_target_teams_distribute: case OMPD_target_teams_distribute_simd: + cp.processCollapse(loc, eval, hostInfo.ops, hostInfo.iv); cp.processNumTeams(stmtCtx, hostInfo.ops); break; diff --git a/flang/test/Lower/OpenMP/host-eval.f90 b/flang/test/Lower/OpenMP/host-eval.f90 index 32c52462b86a7..65258c91e5daf 100644 --- a/flang/test/Lower/OpenMP/host-eval.f90 +++ b/flang/test/Lower/OpenMP/host-eval.f90 @@ -155,3 +155,106 @@ subroutine distribute_parallel_do_simd() !$omp end distribute parallel do simd !$omp end teams end subroutine distribute_parallel_do_simd + +! BOTH-LABEL: func.func @_QPdistribute +subroutine distribute() + ! BOTH: omp.target + + ! HOST-SAME: host_eval(%{{.*}} -> %[[LB:.*]], %{{.*}} -> %[[UB:.*]], %{{.*}} -> %[[STEP:.*]] : i32, i32, i32) + + ! DEVICE-NOT: host_eval({{.*}}) + ! DEVICE-SAME: { + + ! BOTH: omp.teams + !$omp target teams + + ! BOTH: omp.distribute + ! BOTH-NEXT: omp.loop_nest + + ! HOST-SAME: (%{{.*}}) : i32 = (%[[LB]]) to (%[[UB]]) inclusive step (%[[STEP]]) + !$omp distribute + do i=1,10 +call foo() + end do + !$omp end distribute + !$omp end target teams + + ! BOTH: omp.target + ! BOTH-NOT: host_eval({{.*}}) + ! BOTH-SAME: { + ! BOTH: omp.teams + !$omp target teams + call foo() !< Prevents this from being Generic-SPMD. + + ! BOTH: omp.distribute + !$omp distribute + do i=1,10 +call foo() + end do + !$omp end distribute + !$omp end target teams + + ! BOTH: omp.teams + !$omp teams + + ! BOTH: omp.distribute + !$omp distribute + do i=1,10 +call foo() + end do + !$omp end distribute + !$omp end teams +end subroutine distribute + +! BOTH-LABEL: func.func @_QPdistribute_simd +subroutine distribute_simd() + ! BOTH: omp.target + + ! HOST-SAME: host_eval(%{{.*}} -> %[[LB:.*]], %{{.*}} -> %[[UB:.*]], %{{.*}} -> %[[STEP:.*]] : i32, i32, i32) + + ! DEVICE-NOT: host_eval({{.*}}) + ! DEVICE-SAME: { + + ! 
BOTH: omp.teams + !$omp target teams + + ! BOTH: omp.distribute + ! BOTH-NEXT: omp.simd + ! BOTH-NEXT: omp.loop_nest + + ! HOST-SAME: (%{{.*}}) : i32 = (%[[LB]]) to (%[[UB]]) inclusive step (%[[STEP]]) + !$omp distribute simd + do i=1,10 +call foo() + end do + !$omp end distribute simd + !$omp end target teams + + ! BOTH: omp.target + ! BOTH-NOT: host_eval({{.*}}) + ! BOTH-SAME: { + ! BOTH: omp.teams + !$omp target teams + call foo() !< Prevents this from being Generic-SPMD. + + ! BOTH: omp.distribute + ! BOTH-NEXT: omp.simd + !$omp distribute simd + do i=1,10 +call foo() + end do + !$omp end distribute simd + !$omp end target teams + + ! BOTH: omp.teams + !$omp teams + + ! BOTH: omp.distribute + ! BOTH-NEXT: omp.simd + !$omp distribute simd + do i=1,10 +call foo() + en
[llvm-branch-commits] [mlir] [MLIR][OpenMP] Host lowering of standalone distribute (PR #127817)
https://github.com/skatrak updated https://github.com/llvm/llvm-project/pull/127817 >From 128819a704e4c3497c55fe21d0b588f24240af34 Mon Sep 17 00:00:00 2001 From: Sergio Afonso Date: Tue, 18 Feb 2025 11:22:43 + Subject: [PATCH] [MLIR][OpenMP] Host lowering of standalone distribute This patch adds MLIR to LLVM IR translation support for standalone `omp.distribute` operations, as well as `distribute simd` through ignoring SIMD information (similarly to `do/for simd`). Co-authored-by: Dominik Adamski --- .../OpenMP/OpenMPToLLVMIRTranslation.cpp | 78 +++ mlir/test/Target/LLVMIR/openmp-llvm.mlir | 37 + mlir/test/Target/LLVMIR/openmp-todo.mlir | 66 +++- 3 files changed, 178 insertions(+), 3 deletions(-) diff --git a/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp b/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp index 1344f992c116e..987f18fc7bc47 100644 --- a/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp +++ b/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp @@ -161,6 +161,10 @@ static LogicalResult checkImplementationStatus(Operation &op) { if (op.getDevice()) result = todo("device"); }; + auto checkDistSchedule = [&todo](auto op, LogicalResult &result) { +if (op.getDistScheduleChunkSize()) + result = todo("dist_schedule with chunk_size"); + }; auto checkHasDeviceAddr = [&todo](auto op, LogicalResult &result) { if (!op.getHasDeviceAddrVars().empty()) result = todo("has_device_addr"); @@ -252,6 +256,16 @@ static LogicalResult checkImplementationStatus(Operation &op) { LogicalResult result = success(); llvm::TypeSwitch(op) + .Case([&](omp::DistributeOp op) { +if (op.isComposite() && +isa_and_present(op.getNestedWrapper())) + result = op.emitError() << "not yet implemented: " + "composite omp.distribute + omp.wsloop"; +checkAllocate(op, result); +checkDistSchedule(op, result); +checkOrder(op, result); +checkPrivate(op, result); + }) .Case([&](omp::OrderedRegionOp op) { checkParLevelSimd(op, 
result); }) .Case([&](omp::SectionsOp op) { checkAllocate(op, result); @@ -3754,6 +3768,67 @@ convertOmpTargetData(Operation *op, llvm::IRBuilderBase &builder, return success(); } +static LogicalResult +convertOmpDistribute(Operation &opInst, llvm::IRBuilderBase &builder, + LLVM::ModuleTranslation &moduleTranslation) { + llvm::OpenMPIRBuilder *ompBuilder = moduleTranslation.getOpenMPBuilder(); + auto distributeOp = cast(opInst); + if (failed(checkImplementationStatus(opInst))) +return failure(); + + using InsertPointTy = llvm::OpenMPIRBuilder::InsertPointTy; + auto bodyGenCB = [&](InsertPointTy allocaIP, + InsertPointTy codeGenIP) -> llvm::Error { +// DistributeOp has only one region associated with it. +builder.restoreIP(codeGenIP); + +llvm::OpenMPIRBuilder *ompBuilder = moduleTranslation.getOpenMPBuilder(); +llvm::OpenMPIRBuilder::LocationDescription ompLoc(builder); +llvm::Expected regionBlock = +convertOmpOpRegions(distributeOp.getRegion(), "omp.distribute.region", +builder, moduleTranslation); +if (!regionBlock) + return regionBlock.takeError(); +builder.SetInsertPoint(*regionBlock, (*regionBlock)->begin()); + +// TODO: Add support for clauses which are valid for DISTRIBUTE constructs. +// Static schedule is the default. 
+auto schedule = omp::ClauseScheduleKind::Static; +bool isOrdered = false; +std::optional scheduleMod; +bool isSimd = false; +llvm::omp::WorksharingLoopType workshareLoopType = +llvm::omp::WorksharingLoopType::DistributeStaticLoop; +bool loopNeedsBarrier = false; +llvm::Value *chunk = nullptr; + +llvm::CanonicalLoopInfo *loopInfo = *findCurrentLoopInfo(moduleTranslation); +llvm::OpenMPIRBuilder::InsertPointOrErrorTy wsloopIP = +ompBuilder->applyWorkshareLoop( +ompLoc.DL, loopInfo, allocaIP, loopNeedsBarrier, +convertToScheduleKind(schedule), chunk, isSimd, +scheduleMod == omp::ScheduleModifier::monotonic, +scheduleMod == omp::ScheduleModifier::nonmonotonic, isOrdered, +workshareLoopType); + +if (!wsloopIP) + return wsloopIP.takeError(); +return llvm::Error::success(); + }; + + llvm::OpenMPIRBuilder::InsertPointTy allocaIP = + findAllocaInsertPoint(builder, moduleTranslation); + llvm::OpenMPIRBuilder::LocationDescription ompLoc(builder); + llvm::OpenMPIRBuilder::InsertPointOrErrorTy afterIP = + ompBuilder->createDistribute(ompLoc, allocaIP, bodyGenCB); + + if (failed(handleError(afterIP, opInst))) +return failure(); + + builder.restoreIP(*afterIP); + return success(); +} + /// Lowers the FlagsAttr which is
[llvm-branch-commits] [llvm] [OpenMPIRBuilder] Add support for distribute-parallel-for/do constructs (PR #127818)
https://github.com/skatrak updated https://github.com/llvm/llvm-project/pull/127818 >From dbe0d70c0d1c83027ffc9b6eda637257a362adc5 Mon Sep 17 00:00:00 2001 From: Sergio Afonso Date: Tue, 18 Feb 2025 12:04:53 + Subject: [PATCH] [OpenMPIRBuilder] Add support for distribute-parallel-for/do constructs This patch adds codegen for `kmpc_dist_for_static_init` runtime calls, used to support worksharing a single loop across teams and threads. This can be used to implement `distribute parallel for/do` support. --- llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp | 34 --- 1 file changed, 30 insertions(+), 4 deletions(-) diff --git a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp index 9e380bf2d3dbe..7788897fc0795 100644 --- a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp +++ b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp @@ -4130,6 +4130,23 @@ Expected OpenMPIRBuilder::createCanonicalLoop( return createCanonicalLoop(LoopLoc, BodyGen, TripCount, Name); } +// Returns an LLVM function to call for initializing loop bounds using OpenMP +// static scheduling for composite `distribute parallel for` depending on +// `type`. Only i32 and i64 are supported by the runtime. Always interpret +// integers as unsigned similarly to CanonicalLoopInfo. +static FunctionCallee +getKmpcDistForStaticInitForType(Type *Ty, Module &M, +OpenMPIRBuilder &OMPBuilder) { + unsigned Bitwidth = Ty->getIntegerBitWidth(); + if (Bitwidth == 32) +return OMPBuilder.getOrCreateRuntimeFunction( +M, omp::RuntimeFunction::OMPRTL___kmpc_dist_for_static_init_4u); + if (Bitwidth == 64) +return OMPBuilder.getOrCreateRuntimeFunction( +M, omp::RuntimeFunction::OMPRTL___kmpc_dist_for_static_init_8u); + llvm_unreachable("unknown OpenMP loop iterator bitwidth"); +} + // Returns an LLVM function to call for initializing loop bounds using OpenMP // static scheduling depending on `type`. Only i32 and i64 are supported by the // runtime. 
Always interpret integers as unsigned similarly to @@ -4164,7 +4181,10 @@ OpenMPIRBuilder::InsertPointOrErrorTy OpenMPIRBuilder::applyStaticWorkshareLoop( // Declare useful OpenMP runtime functions. Value *IV = CLI->getIndVar(); Type *IVTy = IV->getType(); - FunctionCallee StaticInit = getKmpcForStaticInitForType(IVTy, M, *this); + FunctionCallee StaticInit = + LoopType == WorksharingLoopType::DistributeForStaticLoop + ? getKmpcDistForStaticInitForType(IVTy, M, *this) + : getKmpcForStaticInitForType(IVTy, M, *this); FunctionCallee StaticFini = getOrCreateRuntimeFunction(M, omp::OMPRTL___kmpc_for_static_fini); @@ -4200,9 +4220,15 @@ OpenMPIRBuilder::InsertPointOrErrorTy OpenMPIRBuilder::applyStaticWorkshareLoop( // Call the "init" function and update the trip count of the loop with the // value it produced. - Builder.CreateCall(StaticInit, - {SrcLoc, ThreadNum, SchedulingType, PLastIter, PLowerBound, - PUpperBound, PStride, One, Zero}); + SmallVector Args( + {SrcLoc, ThreadNum, SchedulingType, PLastIter, PLowerBound, PUpperBound}); + if (LoopType == WorksharingLoopType::DistributeForStaticLoop) { +Value *PDistUpperBound = +Builder.CreateAlloca(IVTy, nullptr, "p.distupperbound"); +Args.push_back(PDistUpperBound); + } + Args.append({PStride, One, Zero}); + Builder.CreateCall(StaticInit, Args); Value *LowerBound = Builder.CreateLoad(IVTy, PLowerBound); Value *InclusiveUpperBound = Builder.CreateLoad(IVTy, PUpperBound); Value *TripCountMinusOne = Builder.CreateSub(InclusiveUpperBound, LowerBound); ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [CodeGen][NewPM] Port MachineSink to NPM (PR #115434)
https://github.com/optimisan updated https://github.com/llvm/llvm-project/pull/115434 >From 17ae43cbf8e8aad79f3cba192079c3841e1425f5 Mon Sep 17 00:00:00 2001 From: Akshat Oke Date: Wed, 30 Oct 2024 04:56:54 + Subject: [PATCH 1/4] [CodeGen][NewPM] Port MachineSink to NPM Targets can set the EnableSinkAndFold option in CGPassBuilderOptions for the NPM pipeline in buildCodeGenPipeline(... &Opts, ...) --- llvm/include/llvm/CodeGen/MachineSink.h | 29 llvm/include/llvm/CodeGen/Passes.h| 2 +- llvm/include/llvm/InitializePasses.h | 2 +- llvm/include/llvm/Passes/CodeGenPassBuilder.h | 3 +- .../llvm/Passes/MachinePassRegistry.def | 9 +- .../include/llvm/Target/CGPassBuilderOption.h | 1 + llvm/lib/CodeGen/CodeGen.cpp | 2 +- llvm/lib/CodeGen/MachineSink.cpp | 136 -- llvm/lib/CodeGen/TargetPassConfig.cpp | 4 +- llvm/lib/Passes/PassBuilder.cpp | 6 + llvm/lib/Target/NVPTX/NVPTXTargetMachine.cpp | 2 +- llvm/test/CodeGen/AArch64/loop-sink.mir | 1 + .../sink-and-fold-preserve-debugloc.mir | 2 + ...e-sink-temporal-divergence-swdev407790.mir | 2 + .../CodeGen/ARM/machine-sink-multidef.mir | 2 + .../Hexagon/machine-sink-float-usr.mir| 2 + .../PowerPC/sink-down-more-instructions-1.mir | 2 + .../CodeGen/RISCV/MachineSink-implicit-x0.mir | 1 + .../CodeGen/SystemZ/machinesink-dead-cc.mir | 3 + .../CodeGen/X86/machinesink-debug-inv-0.mir | 3 + .../DebugInfo/MIR/X86/sink-leaves-undef.mir | 1 + 21 files changed, 166 insertions(+), 49 deletions(-) create mode 100644 llvm/include/llvm/CodeGen/MachineSink.h diff --git a/llvm/include/llvm/CodeGen/MachineSink.h b/llvm/include/llvm/CodeGen/MachineSink.h new file mode 100644 index 0..1eee9d7f7e2a4 --- /dev/null +++ b/llvm/include/llvm/CodeGen/MachineSink.h @@ -0,0 +1,29 @@ +//===- MachineSink.h *- C++ -*-===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. 
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===--===// + +#ifndef LLVM_CODEGEN_MACHINESINK_H +#define LLVM_CODEGEN_MACHINESINK_H + +#include "llvm/CodeGen/MachinePassManager.h" + +namespace llvm { + +class MachineSinkingPass : public PassInfoMixin { + bool EnableSinkAndFold; + +public: + MachineSinkingPass(bool EnableSinkAndFold = false) + : EnableSinkAndFold(EnableSinkAndFold) {} + + PreservedAnalyses run(MachineFunction &MF, MachineFunctionAnalysisManager &); + + void printPipeline(raw_ostream &OS, function_ref MapClassName2PassName); +}; + +} // namespace llvm +#endif // LLVM_CODEGEN_MACHINESINK_H diff --git a/llvm/include/llvm/CodeGen/Passes.h b/llvm/include/llvm/CodeGen/Passes.h index b5d2a7e6bf035..581c38e5c1a52 100644 --- a/llvm/include/llvm/CodeGen/Passes.h +++ b/llvm/include/llvm/CodeGen/Passes.h @@ -353,7 +353,7 @@ namespace llvm { extern char &EarlyMachineLICMID; /// MachineSinking - This pass performs sinking on machine instructions. - extern char &MachineSinkingID; + extern char &MachineSinkingLegacyID; /// MachineCopyPropagation - This pass performs copy propagation on /// machine instructions. 
diff --git a/llvm/include/llvm/InitializePasses.h b/llvm/include/llvm/InitializePasses.h index 30c7402bd6606..5c45a405663b1 100644 --- a/llvm/include/llvm/InitializePasses.h +++ b/llvm/include/llvm/InitializePasses.h @@ -208,7 +208,7 @@ void initializeMachinePostDominatorTreeWrapperPassPass(PassRegistry &); void initializeMachineRegionInfoPassPass(PassRegistry &); void initializeMachineSanitizerBinaryMetadataPass(PassRegistry &); void initializeMachineSchedulerLegacyPass(PassRegistry &); -void initializeMachineSinkingPass(PassRegistry &); +void initializeMachineSinkingLegacyPass(PassRegistry &); void initializeMachineTraceMetricsWrapperPassPass(PassRegistry &); void initializeMachineUniformityInfoPrinterPassPass(PassRegistry &); void initializeMachineUniformityAnalysisPassPass(PassRegistry &); diff --git a/llvm/include/llvm/Passes/CodeGenPassBuilder.h b/llvm/include/llvm/Passes/CodeGenPassBuilder.h index 12781e2b84623..1967a323129c1 100644 --- a/llvm/include/llvm/Passes/CodeGenPassBuilder.h +++ b/llvm/include/llvm/Passes/CodeGenPassBuilder.h @@ -51,6 +51,7 @@ #include "llvm/CodeGen/MachineModuleInfo.h" #include "llvm/CodeGen/MachinePassManager.h" #include "llvm/CodeGen/MachineScheduler.h" +#include "llvm/CodeGen/MachineSink.h" #include "llvm/CodeGen/MachineVerifier.h" #include "llvm/CodeGen/OptimizePHIs.h" #include "llvm/CodeGen/PHIElimination.h" @@ -1042,7 +1043,7 @@ void CodeGenPassBuilder::addMachineSSAOptimization( addPass(EarlyMachineLICMPass()); addPass(MachineCSEPass()); - addPass(MachineSinkingPass()); + addPass(MachineSin
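The port above follows the standard new-pass-manager shape: a `PassInfoMixin`-derived class whose options are constructor parameters rather than globals, plus a `printPipeline` override so the option round-trips through the textual pipeline. A toy Python analogue of that shape is sketched below; the textual option spelling is an assumption for illustration, not taken from the patch.

```python
class MachineSinkingPassModel:
    """Toy analogue of the NPM pass in the patch: the EnableSinkAndFold
    option becomes constructor state, and print_pipeline renders it back
    into a textual pipeline element."""
    def __init__(self, enable_sink_and_fold=False):
        self.enable_sink_and_fold = enable_sink_and_fold

    def print_pipeline(self):
        # Hypothetical spelling: options appear inside angle brackets when set.
        if self.enable_sink_and_fold:
            return "machine-sink<enable-sink-fold>"
        return "machine-sink"
```

The same pattern is what lets `buildCodeGenPipeline` thread `CGPassBuilderOptions::EnableSinkAndFold` through to the pass without a `cl::opt` global.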
[llvm-branch-commits] [flang] [Flang][OpenMP] Allow host evaluation of loop bounds for distribute (PR #127822)
https://github.com/skatrak updated https://github.com/llvm/llvm-project/pull/127822 >From 25e308a580946e40e4d74aae7f04d570723bb267 Mon Sep 17 00:00:00 2001 From: Sergio Afonso Date: Wed, 19 Feb 2025 15:15:01 + Subject: [PATCH] [Flang][OpenMP] Allow host evaluation of loop bounds for distribute This patch adds `target teams distribute [simd]` and equivalent construct nests to the list of cases where loop bounds can be evaluated in the host, as they represent Generic-SPMD kernels for which the trip count must also be evaluated in advance to the kernel call. --- flang/lib/Lower/OpenMP/OpenMP.cpp | 12 +-- flang/test/Lower/OpenMP/host-eval.f90 | 103 ++ 2 files changed, 110 insertions(+), 5 deletions(-) diff --git a/flang/lib/Lower/OpenMP/OpenMP.cpp b/flang/lib/Lower/OpenMP/OpenMP.cpp index bd794033cdf11..8c80453610473 100644 --- a/flang/lib/Lower/OpenMP/OpenMP.cpp +++ b/flang/lib/Lower/OpenMP/OpenMP.cpp @@ -562,8 +562,11 @@ static void processHostEvalClauses(lower::AbstractConverter &converter, [[fallthrough]]; case OMPD_distribute_parallel_do: case OMPD_distribute_parallel_do_simd: - cp.processCollapse(loc, eval, hostInfo.ops, hostInfo.iv); cp.processNumThreads(stmtCtx, hostInfo.ops); + [[fallthrough]]; +case OMPD_distribute: +case OMPD_distribute_simd: + cp.processCollapse(loc, eval, hostInfo.ops, hostInfo.iv); break; // Cases where 'teams' clauses might be present, and target SPMD is @@ -573,10 +576,8 @@ static void processHostEvalClauses(lower::AbstractConverter &converter, [[fallthrough]]; case OMPD_target_teams: cp.processNumTeams(stmtCtx, hostInfo.ops); - processSingleNestedIf([](Directive nestedDir) { -return nestedDir == OMPD_distribute_parallel_do || - nestedDir == OMPD_distribute_parallel_do_simd; - }); + processSingleNestedIf( + [](Directive nestedDir) { return topDistributeSet.test(nestedDir); }); break; // Cases where only 'teams' host-evaluated clauses might be present. 
@@ -586,6 +587,7 @@ static void processHostEvalClauses(lower::AbstractConverter &converter, [[fallthrough]]; case OMPD_target_teams_distribute: case OMPD_target_teams_distribute_simd: + cp.processCollapse(loc, eval, hostInfo.ops, hostInfo.iv); cp.processNumTeams(stmtCtx, hostInfo.ops); break; diff --git a/flang/test/Lower/OpenMP/host-eval.f90 b/flang/test/Lower/OpenMP/host-eval.f90 index 32c52462b86a7..65258c91e5daf 100644 --- a/flang/test/Lower/OpenMP/host-eval.f90 +++ b/flang/test/Lower/OpenMP/host-eval.f90 @@ -155,3 +155,106 @@ subroutine distribute_parallel_do_simd() !$omp end distribute parallel do simd !$omp end teams end subroutine distribute_parallel_do_simd + +! BOTH-LABEL: func.func @_QPdistribute +subroutine distribute() + ! BOTH: omp.target + + ! HOST-SAME: host_eval(%{{.*}} -> %[[LB:.*]], %{{.*}} -> %[[UB:.*]], %{{.*}} -> %[[STEP:.*]] : i32, i32, i32) + + ! DEVICE-NOT: host_eval({{.*}}) + ! DEVICE-SAME: { + + ! BOTH: omp.teams + !$omp target teams + + ! BOTH: omp.distribute + ! BOTH-NEXT: omp.loop_nest + + ! HOST-SAME: (%{{.*}}) : i32 = (%[[LB]]) to (%[[UB]]) inclusive step (%[[STEP]]) + !$omp distribute + do i=1,10 +call foo() + end do + !$omp end distribute + !$omp end target teams + + ! BOTH: omp.target + ! BOTH-NOT: host_eval({{.*}}) + ! BOTH-SAME: { + ! BOTH: omp.teams + !$omp target teams + call foo() !< Prevents this from being Generic-SPMD. + + ! BOTH: omp.distribute + !$omp distribute + do i=1,10 +call foo() + end do + !$omp end distribute + !$omp end target teams + + ! BOTH: omp.teams + !$omp teams + + ! BOTH: omp.distribute + !$omp distribute + do i=1,10 +call foo() + end do + !$omp end distribute + !$omp end teams +end subroutine distribute + +! BOTH-LABEL: func.func @_QPdistribute_simd +subroutine distribute_simd() + ! BOTH: omp.target + + ! HOST-SAME: host_eval(%{{.*}} -> %[[LB:.*]], %{{.*}} -> %[[UB:.*]], %{{.*}} -> %[[STEP:.*]] : i32, i32, i32) + + ! DEVICE-NOT: host_eval({{.*}}) + ! DEVICE-SAME: { + + ! 
BOTH: omp.teams + !$omp target teams + + ! BOTH: omp.distribute + ! BOTH-NEXT: omp.simd + ! BOTH-NEXT: omp.loop_nest + + ! HOST-SAME: (%{{.*}}) : i32 = (%[[LB]]) to (%[[UB]]) inclusive step (%[[STEP]]) + !$omp distribute simd + do i=1,10 +call foo() + end do + !$omp end distribute simd + !$omp end target teams + + ! BOTH: omp.target + ! BOTH-NOT: host_eval({{.*}}) + ! BOTH-SAME: { + ! BOTH: omp.teams + !$omp target teams + call foo() !< Prevents this from being Generic-SPMD. + + ! BOTH: omp.distribute + ! BOTH-NEXT: omp.simd + !$omp distribute simd + do i=1,10 +call foo() + end do + !$omp end distribute simd + !$omp end target teams + + ! BOTH: omp.teams + !$omp teams + + ! BOTH: omp.distribute + ! BOTH-NEXT: omp.simd + !$omp distribute simd + do i=1,10 +call foo() + en
[llvm-branch-commits] [llvm] [OpenMPIRBuilder] Add support for distribute constructs (PR #127816)
https://github.com/skatrak updated https://github.com/llvm/llvm-project/pull/127816 >From 40d140e6bc0be9556bc09524b38e642cb9885a9d Mon Sep 17 00:00:00 2001 From: Dominik Adamski Date: Mon, 17 Feb 2025 14:25:40 + Subject: [PATCH] [OpenMPIRBuilder] Add support for distribute constructs This patch adds the `OpenMPIRBuilder::createDistribute()` function and updates `OpenMPIRBuilder::applyStaticWorkshareLoop()` in preparation for adding `distribute` support to flang. Co-authored-by: Sergio Afonso --- .../llvm/Frontend/OpenMP/OMPIRBuilder.h | 17 -- llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp | 59 --- 2 files changed, 64 insertions(+), 12 deletions(-) diff --git a/llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h b/llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h index d25077cae63e4..9ad85413acd34 100644 --- a/llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h +++ b/llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h @@ -1004,12 +1004,12 @@ class OpenMPIRBuilder { /// preheader of the loop. /// \param NeedsBarrier Indicates whether a barrier must be inserted after /// the loop. + /// \param LoopType Type of workshare loop. /// /// \returns Point where to insert code after the workshare construct. - InsertPointOrErrorTy applyStaticWorkshareLoop(DebugLoc DL, -CanonicalLoopInfo *CLI, -InsertPointTy AllocaIP, -bool NeedsBarrier); + InsertPointOrErrorTy applyStaticWorkshareLoop( + DebugLoc DL, CanonicalLoopInfo *CLI, InsertPointTy AllocaIP, + omp::WorksharingLoopType LoopType, bool NeedsBarrier); /// Modifies the canonical loop a statically-scheduled workshare loop with a /// user-specified chunk size. @@ -2660,6 +2660,15 @@ class OpenMPIRBuilder { Value *NumTeamsLower = nullptr, Value *NumTeamsUpper = nullptr, Value *ThreadLimit = nullptr, Value *IfExpr = nullptr); + /// Generator for `#omp distribute` + /// + /// \param Loc The location where the distribute construct was encountered. + /// \param AllocaIP The insertion points to be used for alloca instructions. 
+ /// \param BodyGenCB Callback that will generate the region code. + InsertPointOrErrorTy createDistribute(const LocationDescription &Loc, +InsertPointTy AllocaIP, +BodyGenCallbackTy BodyGenCB); + /// Generate conditional branch and relevant BasicBlocks through which private /// threads copy the 'copyin' variables from Master copy to threadprivate /// copies. diff --git a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp index 04acab1e5765e..9e380bf2d3dbe 100644 --- a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp +++ b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp @@ -2295,7 +2295,8 @@ OpenMPIRBuilder::InsertPointOrErrorTy OpenMPIRBuilder::createSections( return LoopInfo.takeError(); InsertPointOrErrorTy WsloopIP = - applyStaticWorkshareLoop(Loc.DL, *LoopInfo, AllocaIP, !IsNowait); + applyStaticWorkshareLoop(Loc.DL, *LoopInfo, AllocaIP, + WorksharingLoopType::ForStaticLoop, !IsNowait); if (!WsloopIP) return WsloopIP.takeError(); InsertPointTy AfterIP = *WsloopIP; @@ -4145,10 +4146,9 @@ static FunctionCallee getKmpcForStaticInitForType(Type *Ty, Module &M, llvm_unreachable("unknown OpenMP loop iterator bitwidth"); } -OpenMPIRBuilder::InsertPointOrErrorTy -OpenMPIRBuilder::applyStaticWorkshareLoop(DebugLoc DL, CanonicalLoopInfo *CLI, - InsertPointTy AllocaIP, - bool NeedsBarrier) { +OpenMPIRBuilder::InsertPointOrErrorTy OpenMPIRBuilder::applyStaticWorkshareLoop( +DebugLoc DL, CanonicalLoopInfo *CLI, InsertPointTy AllocaIP, +WorksharingLoopType LoopType, bool NeedsBarrier) { assert(CLI->isValid() && "Requires a valid canonical loop"); assert(!isConflictIP(AllocaIP, CLI->getPreheaderIP()) && "Require dedicated allocate IP"); @@ -4191,8 +4191,12 @@ OpenMPIRBuilder::applyStaticWorkshareLoop(DebugLoc DL, CanonicalLoopInfo *CLI, Value *ThreadNum = getOrCreateThreadID(SrcLoc); - Constant *SchedulingType = ConstantInt::get( - I32Type, static_cast(OMPScheduleType::UnorderedStatic)); + OMPScheduleType SchedType = + (LoopType == 
WorksharingLoopType::DistributeStaticLoop) + ? OMPScheduleType::OrderedDistribute + : OMPScheduleType::UnorderedStatic; + Constant *SchedulingType = + ConstantInt::get(I32Type, static_cast(SchedType)); // Call the "init" function and update the trip count of the loop with the // value it produced. @@ -4452,6 +4456,7 @@ static void createTargetLoopWorkshareCall( RealArgs.push_back(TripCount); if (LoopType == Worksh
[llvm-branch-commits] [llvm] [OpenMPIRBuilder] Split calculation of canonical loop trip count, NFC (PR #127820)
https://github.com/skatrak updated https://github.com/llvm/llvm-project/pull/127820 >From f14d964b8b744ebbf2f981ff07e0051c338db335 Mon Sep 17 00:00:00 2001 From: Sergio Afonso Date: Tue, 18 Feb 2025 14:19:30 + Subject: [PATCH] [OpenMPIRBuilder] Split calculation of canonical loop trip count, NFC This patch splits off the calculation of canonical loop trip counts from the creation of canonical loops. This makes it possible to reuse this logic to, for instance, populate the `__tgt_target_kernel` runtime call for SPMD kernels. This feature is used to simplify one of the existing OpenMPIRBuilder tests. --- .../llvm/Frontend/OpenMP/OMPIRBuilder.h | 38 +++ llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp | 27 - .../Frontend/OpenMPIRBuilderTest.cpp | 16 ++-- 3 files changed, 52 insertions(+), 29 deletions(-) diff --git a/llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h b/llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h index 9ad85413acd34..207ca7fb05f62 100644 --- a/llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h +++ b/llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h @@ -728,13 +728,12 @@ class OpenMPIRBuilder { LoopBodyGenCallbackTy BodyGenCB, Value *TripCount, const Twine &Name = "loop"); - /// Generator for the control flow structure of an OpenMP canonical loop. + /// Calculate the trip count of a canonical loop. /// - /// Instead of a logical iteration space, this allows specifying user-defined - /// loop counter values using increment, upper- and lower bounds. To - /// disambiguate the terminology when counting downwards, instead of lower - /// bounds we use \p Start for the loop counter value in the first body - /// iteration. + /// This allows specifying user-defined loop counter values using increment, + /// upper- and lower bounds. To disambiguate the terminology when counting + /// downwards, instead of lower bounds we use \p Start for the loop counter + /// value in the first body iteration. 
/// /// Consider the following limitations: /// @@ -758,7 +757,32 @@ class OpenMPIRBuilder { /// /// for (int i = 0; i < 42; i -= 1u) /// - // + /// \param Loc The insert and source location description. + /// \param Start Value of the loop counter for the first iterations. + /// \param Stop Loop counter values past this will stop the loop. + /// \param Step Loop counter increment after each iteration; negative + /// means counting down. + /// \param IsSigned Whether Start, Stop and Step are signed integers. + /// \param InclusiveStop Whether \p Stop itself is a valid value for the loop + /// counter. + /// \param Name Base name used to derive instruction names. + /// + /// \returns The value holding the calculated trip count. + Value *calculateCanonicalLoopTripCount(const LocationDescription &Loc, + Value *Start, Value *Stop, Value *Step, + bool IsSigned, bool InclusiveStop, + const Twine &Name = "loop"); + + /// Generator for the control flow structure of an OpenMP canonical loop. + /// + /// Instead of a logical iteration space, this allows specifying user-defined + /// loop counter values using increment, upper- and lower bounds. To + /// disambiguate the terminology when counting downwards, instead of lower + /// bounds we use \p Start for the loop counter value in the first body + /// + /// It calls \see calculateCanonicalLoopTripCount for trip count calculations, + /// so limitations of that method apply here as well. + /// /// \param Loc The insert and source location description. /// \param BodyGenCB Callback that will generate the loop body code. /// \param Start Value of the loop counter for the first iterations. 
diff --git a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp index 7788897fc0795..eee6e3e54d615 100644 --- a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp +++ b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp @@ -4059,10 +4059,9 @@ OpenMPIRBuilder::createCanonicalLoop(const LocationDescription &Loc, return CL; } -Expected OpenMPIRBuilder::createCanonicalLoop( -const LocationDescription &Loc, LoopBodyGenCallbackTy BodyGenCB, -Value *Start, Value *Stop, Value *Step, bool IsSigned, bool InclusiveStop, -InsertPointTy ComputeIP, const Twine &Name) { +Value *OpenMPIRBuilder::calculateCanonicalLoopTripCount( +const LocationDescription &Loc, Value *Start, Value *Stop, Value *Step, +bool IsSigned, bool InclusiveStop, const Twine &Name) { // Consider the following difficulties (assuming 8-bit signed integers): // * Adding \p Step to the loop counter which passes \p Stop may overflow: @@ -4075,9 +4074,7 @@ Expected OpenMPIRBuilder::createCanonicalLoop( assert(IndVarTy ==
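The factored-out trip-count logic can be modelled independently of LLVM: for a canonical loop running from `Start` toward `Stop` by `Step` (negative meaning counting down), the count is the distance divided by |Step|, rounded up, plus one extra iteration when `Stop` itself is a valid counter value. A Python sketch follows; Python integers do not overflow, so the fixed-bit-width subtleties the patch's comment worries about (e.g. `Start + Step` passing `Stop` and wrapping) are deliberately elided here.

```python
def trip_count(start, stop, step, inclusive_stop=False):
    """Count iterations of a canonical loop from start toward stop by step.
    A conceptual model of the trip-count computation split out in the patch,
    ignoring fixed-width overflow concerns."""
    if step == 0:
        raise ValueError("step must be nonzero")
    # Distance to cover, oriented so positive means "loop runs".
    span = stop - start if step > 0 else start - stop
    if inclusive_stop:
        span += 1
    if span <= 0:
        return 0
    mag = abs(step)
    return (span + mag - 1) // mag  # ceiling division
```

Splitting this out is what lets the trip count be computed on the host ahead of a `__tgt_target_kernel` call, without materializing the loop itself.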
[llvm-branch-commits] [mlir] [MLIR][OpenMP] Support target SPMD (PR #127821)
https://github.com/skatrak updated https://github.com/llvm/llvm-project/pull/127821 >From 27139f8f6260de93a0e6d6163b9562c7daa451b8 Mon Sep 17 00:00:00 2001 From: Sergio Afonso Date: Wed, 19 Feb 2025 14:41:12 + Subject: [PATCH 1/2] [MLIR][OpenMP] Support target SPMD This patch implements MLIR to LLVM IR translation of host-evaluated loop bounds, completing initial support for `target teams distribute parallel do [simd]` and `target teams distribute [simd]`. --- .../OpenMP/OpenMPToLLVMIRTranslation.cpp | 83 .../Target/LLVMIR/openmp-target-spmd.mlir | 96 +++ mlir/test/Target/LLVMIR/openmp-todo.mlir | 24 - 3 files changed, 159 insertions(+), 44 deletions(-) create mode 100644 mlir/test/Target/LLVMIR/openmp-target-spmd.mlir diff --git a/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp b/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp index fbea278b2511f..9d07bf7b5d224 100644 --- a/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp +++ b/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp @@ -173,15 +173,6 @@ static LogicalResult checkImplementationStatus(Operation &op) { if (op.getHint()) op.emitWarning("hint clause discarded"); }; - auto checkHostEval = [](auto op, LogicalResult &result) { -// Host evaluated clauses are supported, except for loop bounds. 
-for (BlockArgument arg : - cast(*op).getHostEvalBlockArgs()) - for (Operation *user : arg.getUsers()) -if (isa(user)) - result = op.emitError("not yet implemented: host evaluation of loop " -"bounds in omp.target operation"); - }; auto checkInReduction = [&todo](auto op, LogicalResult &result) { if (!op.getInReductionVars().empty() || op.getInReductionByref() || op.getInReductionSyms()) @@ -318,7 +309,6 @@ static LogicalResult checkImplementationStatus(Operation &op) { checkBare(op, result); checkDevice(op, result); checkHasDeviceAddr(op, result); -checkHostEval(op, result); checkInReduction(op, result); checkIsDevicePtr(op, result); checkPrivate(op, result); @@ -4053,9 +4043,13 @@ createDeviceArgumentAccessor(MapInfoData &mapData, llvm::Argument &arg, /// /// Loop bounds and steps are only optionally populated, if output vectors are /// provided. -static void extractHostEvalClauses(omp::TargetOp targetOp, Value &numThreads, - Value &numTeamsLower, Value &numTeamsUpper, - Value &threadLimit) { +static void +extractHostEvalClauses(omp::TargetOp targetOp, Value &numThreads, + Value &numTeamsLower, Value &numTeamsUpper, + Value &threadLimit, + llvm::SmallVectorImpl *lowerBounds = nullptr, + llvm::SmallVectorImpl *upperBounds = nullptr, + llvm::SmallVectorImpl *steps = nullptr) { auto blockArgIface = llvm::cast(*targetOp); for (auto item : llvm::zip_equal(targetOp.getHostEvalVars(), blockArgIface.getHostEvalBlockArgs())) { @@ -4080,11 +4074,26 @@ static void extractHostEvalClauses(omp::TargetOp targetOp, Value &numThreads, llvm_unreachable("unsupported host_eval use"); }) .Case([&](omp::LoopNestOp loopOp) { -// TODO: Extract bounds and step values. Currently, this cannot be -// reached because translation would have been stopped earlier as a -// result of `checkImplementationStatus` detecting and reporting -// this situation. 
-llvm_unreachable("unsupported host_eval use"); +auto processBounds = +[&](OperandRange opBounds, +llvm::SmallVectorImpl *outBounds) -> bool { + bool found = false; + for (auto [i, lb] : llvm::enumerate(opBounds)) { +if (lb == blockArg) { + found = true; + if (outBounds) +(*outBounds)[i] = hostEvalVar; +} + } + return found; +}; +bool found = +processBounds(loopOp.getLoopLowerBounds(), lowerBounds); +found = processBounds(loopOp.getLoopUpperBounds(), upperBounds) || +found; +found = processBounds(loopOp.getLoopSteps(), steps) || found; +if (!found) + llvm_unreachable("unsupported host_eval use"); }) .Default([](Operation *) { llvm_unreachable("unsupported host_eval use"); @@ -4221,6 +4230,7 @@ initTargetDefaultAttrs(omp::TargetOp targetOp, combinedMaxThreadsVal = maxThreadsVal; // Update kernel bounds structure for the `OpenMPIRBuilder` to use. + attrs.ExecFlags = targetOp.getKernelExecFla
[llvm-branch-commits] [mlir] [MLIR][OpenMP] Host lowering of distribute-parallel-do/for (PR #127819)
https://github.com/skatrak updated https://github.com/llvm/llvm-project/pull/127819 >From aad04faf1796c328ac2a4280939a7fb9d7503ab1 Mon Sep 17 00:00:00 2001 From: Sergio Afonso Date: Tue, 18 Feb 2025 13:07:51 + Subject: [PATCH 1/2] [MLIR][OpenMP] Host lowering of distribute-parallel-do/for This patch adds support for translating composite `omp.parallel` + `omp.distribute` + `omp.wsloop` loops to LLVM IR on the host. This is done by passing an updated `WorksharingLoopType` to the call to `applyWorkshareLoop` associated to the lowering of the `omp.wsloop` operation, so that `__kmpc_dist_for_static_init` is called at runtime in place of `__kmpc_for_static_init`. Existing translation rules take care of creating a parallel region to hold the workshared and workdistributed loop. --- .../OpenMP/OpenMPToLLVMIRTranslation.cpp | 21 -- mlir/test/Target/LLVMIR/openmp-llvm.mlir | 65 +++ mlir/test/Target/LLVMIR/openmp-todo.mlir | 19 -- 3 files changed, 81 insertions(+), 24 deletions(-) diff --git a/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp b/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp index 87b690912620b..adac89988a2da 100644 --- a/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp +++ b/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp @@ -257,10 +257,6 @@ static LogicalResult checkImplementationStatus(Operation &op) { LogicalResult result = success(); llvm::TypeSwitch(op) .Case([&](omp::DistributeOp op) { -if (op.isComposite() && -isa_and_present(op.getNestedWrapper())) - result = op.emitError() << "not yet implemented: " - "composite omp.distribute + omp.wsloop"; checkAllocate(op, result); checkDistSchedule(op, result); checkOrder(op, result); @@ -1990,6 +1986,14 @@ convertOmpWsloop(Operation &opInst, llvm::IRBuilderBase &builder, bool isSimd = wsloopOp.getScheduleSimd(); bool loopNeedsBarrier = !wsloopOp.getNowait(); + // The only legal way for the direct parent to be omp.distribute is that 
this + // represents 'distribute parallel do'. Otherwise, this is a regular + // worksharing loop. + llvm::omp::WorksharingLoopType workshareLoopType = + llvm::isa_and_present(opInst.getParentOp()) + ? llvm::omp::WorksharingLoopType::DistributeForStaticLoop + : llvm::omp::WorksharingLoopType::ForStaticLoop; + llvm::OpenMPIRBuilder::LocationDescription ompLoc(builder); llvm::Expected regionBlock = convertOmpOpRegions( wsloopOp.getRegion(), "omp.wsloop.region", builder, moduleTranslation); @@ -2005,7 +2009,8 @@ convertOmpWsloop(Operation &opInst, llvm::IRBuilderBase &builder, ompLoc.DL, loopInfo, allocaIP, loopNeedsBarrier, convertToScheduleKind(schedule), chunk, isSimd, scheduleMod == omp::ScheduleModifier::monotonic, - scheduleMod == omp::ScheduleModifier::nonmonotonic, isOrdered); + scheduleMod == omp::ScheduleModifier::nonmonotonic, isOrdered, + workshareLoopType); if (failed(handleError(wsloopIP, opInst))) return failure(); @@ -3796,6 +3801,12 @@ convertOmpDistribute(Operation &opInst, llvm::IRBuilderBase &builder, return regionBlock.takeError(); builder.SetInsertPoint(*regionBlock, (*regionBlock)->begin()); +// Skip applying a workshare loop below when translating 'distribute +// parallel do' (it's been already handled by this point while translating +// the nested omp.wsloop). +if (isa_and_present(distributeOp.getNestedWrapper())) + return llvm::Error::success(); + // TODO: Add support for clauses which are valid for DISTRIBUTE constructs. // Static schedule is the default. 
auto schedule = omp::ClauseScheduleKind::Static; diff --git a/mlir/test/Target/LLVMIR/openmp-llvm.mlir b/mlir/test/Target/LLVMIR/openmp-llvm.mlir index a5a490e527d79..d85b149c66811 100644 --- a/mlir/test/Target/LLVMIR/openmp-llvm.mlir +++ b/mlir/test/Target/LLVMIR/openmp-llvm.mlir @@ -3307,3 +3307,68 @@ llvm.func @distribute() { // CHECK: store i64 1, ptr %[[STRIDE]] // CHECK: %[[TID:.*]] = call i32 @__kmpc_global_thread_num({{.*}}) // CHECK: call void @__kmpc_for_static_init_{{.*}}(ptr @{{.*}}, i32 %[[TID]], i32 92, ptr %[[LASTITER]], ptr %[[LB]], ptr %[[UB]], ptr %[[STRIDE]], i64 1, i64 0) + +// - + +llvm.func @distribute_wsloop(%lb : i32, %ub : i32, %step : i32) { + omp.parallel { +omp.distribute { + omp.wsloop { +omp.loop_nest (%iv) : i32 = (%lb) to (%ub) step (%step) { + omp.yield +} + } {omp.composite} +} {omp.composite} +omp.terminator + } {omp.composite} + llvm.return +} + +// CHECK-LABEL: define void @distribute_wsloop +// CHECK: call void{{.*}}@__kmpc_fork_call({{.*}}, ptr @[[OUTLINED_PARALLEL:.*]], + +// CHECK: define internal void @[[OU
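The parent check driving the new lowering is compact enough to restate on its own: per the comment in the patch, the only legal way for `omp.wsloop`'s direct parent to be `omp.distribute` is the composite `distribute parallel do`, which selects the dist variant of the worksharing-loop type so `__kmpc_dist_for_static_init` is emitted instead of `__kmpc_for_static_init`. A one-function sketch:

```python
def workshare_loop_type(parent_op_name):
    """Mirror of the parent check in convertOmpWsloop: a wsloop directly
    wrapped by omp.distribute lowers as 'distribute parallel do' and must
    use the dist_for_static runtime entry point."""
    if parent_op_name == "omp.distribute":
        return "DistributeForStaticLoop"
    return "ForStaticLoop"
```

The second half of the patch is the dual of this check: `convertOmpDistribute` skips applying its own workshare loop when its nested wrapper is an `omp.wsloop`, since the wsloop translation already handled it.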
[llvm-branch-commits] [mlir] [MLIR][OpenMP] Support target SPMD (PR #127821)
https://github.com/skatrak updated https://github.com/llvm/llvm-project/pull/127821 >From e965e0e637551c9b5b5f7fb526a809d1186ef261 Mon Sep 17 00:00:00 2001 From: Sergio Afonso Date: Wed, 19 Feb 2025 14:41:12 + Subject: [PATCH 1/2] [MLIR][OpenMP] Support target SPMD This patch implements MLIR to LLVM IR translation of host-evaluated loop bounds, completing initial support for `target teams distribute parallel do [simd]` and `target teams distribute [simd]`. --- .../OpenMP/OpenMPToLLVMIRTranslation.cpp | 83 .../Target/LLVMIR/openmp-target-spmd.mlir | 96 +++ mlir/test/Target/LLVMIR/openmp-todo.mlir | 24 - 3 files changed, 159 insertions(+), 44 deletions(-) create mode 100644 mlir/test/Target/LLVMIR/openmp-target-spmd.mlir diff --git a/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp b/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp index adac89988a2da..a7d2a00a1bd90 100644 --- a/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp +++ b/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp @@ -173,15 +173,6 @@ static LogicalResult checkImplementationStatus(Operation &op) { if (op.getHint()) op.emitWarning("hint clause discarded"); }; - auto checkHostEval = [](auto op, LogicalResult &result) { -// Host evaluated clauses are supported, except for loop bounds. 
-for (BlockArgument arg : - cast(*op).getHostEvalBlockArgs()) - for (Operation *user : arg.getUsers()) -if (isa(user)) - result = op.emitError("not yet implemented: host evaluation of loop " -"bounds in omp.target operation"); - }; auto checkInReduction = [&todo](auto op, LogicalResult &result) { if (!op.getInReductionVars().empty() || op.getInReductionByref() || op.getInReductionSyms()) @@ -318,7 +309,6 @@ static LogicalResult checkImplementationStatus(Operation &op) { checkBare(op, result); checkDevice(op, result); checkHasDeviceAddr(op, result); -checkHostEval(op, result); checkInReduction(op, result); checkIsDevicePtr(op, result); checkPrivate(op, result); @@ -4058,9 +4048,13 @@ createDeviceArgumentAccessor(MapInfoData &mapData, llvm::Argument &arg, /// /// Loop bounds and steps are only optionally populated, if output vectors are /// provided. -static void extractHostEvalClauses(omp::TargetOp targetOp, Value &numThreads, - Value &numTeamsLower, Value &numTeamsUpper, - Value &threadLimit) { +static void +extractHostEvalClauses(omp::TargetOp targetOp, Value &numThreads, + Value &numTeamsLower, Value &numTeamsUpper, + Value &threadLimit, + llvm::SmallVectorImpl *lowerBounds = nullptr, + llvm::SmallVectorImpl *upperBounds = nullptr, + llvm::SmallVectorImpl *steps = nullptr) { auto blockArgIface = llvm::cast(*targetOp); for (auto item : llvm::zip_equal(targetOp.getHostEvalVars(), blockArgIface.getHostEvalBlockArgs())) { @@ -4085,11 +4079,26 @@ static void extractHostEvalClauses(omp::TargetOp targetOp, Value &numThreads, llvm_unreachable("unsupported host_eval use"); }) .Case([&](omp::LoopNestOp loopOp) { -// TODO: Extract bounds and step values. Currently, this cannot be -// reached because translation would have been stopped earlier as a -// result of `checkImplementationStatus` detecting and reporting -// this situation. 
-llvm_unreachable("unsupported host_eval use"); +auto processBounds = +[&](OperandRange opBounds, +llvm::SmallVectorImpl *outBounds) -> bool { + bool found = false; + for (auto [i, lb] : llvm::enumerate(opBounds)) { +if (lb == blockArg) { + found = true; + if (outBounds) +(*outBounds)[i] = hostEvalVar; +} + } + return found; +}; +bool found = +processBounds(loopOp.getLoopLowerBounds(), lowerBounds); +found = processBounds(loopOp.getLoopUpperBounds(), upperBounds) || +found; +found = processBounds(loopOp.getLoopSteps(), steps) || found; +if (!found) + llvm_unreachable("unsupported host_eval use"); }) .Default([](Operation *) { llvm_unreachable("unsupported host_eval use"); @@ -4226,6 +4235,7 @@ initTargetDefaultAttrs(omp::TargetOp targetOp, combinedMaxThreadsVal = maxThreadsVal; // Update kernel bounds structure for the `OpenMPIRBuilder` to use. + attrs.ExecFlags = targetOp.getKernelExecFla
[llvm-branch-commits] [flang] [Flang][OpenMP] Allow host evaluation of loop bounds for distribute (PR #127822)
https://github.com/skatrak updated https://github.com/llvm/llvm-project/pull/127822 >From de75db239e6725be6509c06057a338842339bc0a Mon Sep 17 00:00:00 2001 From: Sergio Afonso Date: Wed, 19 Feb 2025 15:15:01 + Subject: [PATCH] [Flang][OpenMP] Allow host evaluation of loop bounds for distribute This patch adds `target teams distribute [simd]` and equivalent construct nests to the list of cases where loop bounds can be evaluated in the host, as they represent Generic-SPMD kernels for which the trip count must also be evaluated in advance to the kernel call. --- flang/lib/Lower/OpenMP/OpenMP.cpp | 12 +-- flang/test/Lower/OpenMP/host-eval.f90 | 103 ++ 2 files changed, 110 insertions(+), 5 deletions(-) diff --git a/flang/lib/Lower/OpenMP/OpenMP.cpp b/flang/lib/Lower/OpenMP/OpenMP.cpp index bd794033cdf11..8c80453610473 100644 --- a/flang/lib/Lower/OpenMP/OpenMP.cpp +++ b/flang/lib/Lower/OpenMP/OpenMP.cpp @@ -562,8 +562,11 @@ static void processHostEvalClauses(lower::AbstractConverter &converter, [[fallthrough]]; case OMPD_distribute_parallel_do: case OMPD_distribute_parallel_do_simd: - cp.processCollapse(loc, eval, hostInfo.ops, hostInfo.iv); cp.processNumThreads(stmtCtx, hostInfo.ops); + [[fallthrough]]; +case OMPD_distribute: +case OMPD_distribute_simd: + cp.processCollapse(loc, eval, hostInfo.ops, hostInfo.iv); break; // Cases where 'teams' clauses might be present, and target SPMD is @@ -573,10 +576,8 @@ static void processHostEvalClauses(lower::AbstractConverter &converter, [[fallthrough]]; case OMPD_target_teams: cp.processNumTeams(stmtCtx, hostInfo.ops); - processSingleNestedIf([](Directive nestedDir) { -return nestedDir == OMPD_distribute_parallel_do || - nestedDir == OMPD_distribute_parallel_do_simd; - }); + processSingleNestedIf( + [](Directive nestedDir) { return topDistributeSet.test(nestedDir); }); break; // Cases where only 'teams' host-evaluated clauses might be present. 
@@ -586,6 +587,7 @@ static void processHostEvalClauses(lower::AbstractConverter &converter, [[fallthrough]]; case OMPD_target_teams_distribute: case OMPD_target_teams_distribute_simd: + cp.processCollapse(loc, eval, hostInfo.ops, hostInfo.iv); cp.processNumTeams(stmtCtx, hostInfo.ops); break; diff --git a/flang/test/Lower/OpenMP/host-eval.f90 b/flang/test/Lower/OpenMP/host-eval.f90 index 32c52462b86a7..65258c91e5daf 100644 --- a/flang/test/Lower/OpenMP/host-eval.f90 +++ b/flang/test/Lower/OpenMP/host-eval.f90 @@ -155,3 +155,106 @@ subroutine distribute_parallel_do_simd() !$omp end distribute parallel do simd !$omp end teams end subroutine distribute_parallel_do_simd + +! BOTH-LABEL: func.func @_QPdistribute +subroutine distribute() + ! BOTH: omp.target + + ! HOST-SAME: host_eval(%{{.*}} -> %[[LB:.*]], %{{.*}} -> %[[UB:.*]], %{{.*}} -> %[[STEP:.*]] : i32, i32, i32) + + ! DEVICE-NOT: host_eval({{.*}}) + ! DEVICE-SAME: { + + ! BOTH: omp.teams + !$omp target teams + + ! BOTH: omp.distribute + ! BOTH-NEXT: omp.loop_nest + + ! HOST-SAME: (%{{.*}}) : i32 = (%[[LB]]) to (%[[UB]]) inclusive step (%[[STEP]]) + !$omp distribute + do i=1,10 +call foo() + end do + !$omp end distribute + !$omp end target teams + + ! BOTH: omp.target + ! BOTH-NOT: host_eval({{.*}}) + ! BOTH-SAME: { + ! BOTH: omp.teams + !$omp target teams + call foo() !< Prevents this from being Generic-SPMD. + + ! BOTH: omp.distribute + !$omp distribute + do i=1,10 +call foo() + end do + !$omp end distribute + !$omp end target teams + + ! BOTH: omp.teams + !$omp teams + + ! BOTH: omp.distribute + !$omp distribute + do i=1,10 +call foo() + end do + !$omp end distribute + !$omp end teams +end subroutine distribute + +! BOTH-LABEL: func.func @_QPdistribute_simd +subroutine distribute_simd() + ! BOTH: omp.target + + ! HOST-SAME: host_eval(%{{.*}} -> %[[LB:.*]], %{{.*}} -> %[[UB:.*]], %{{.*}} -> %[[STEP:.*]] : i32, i32, i32) + + ! DEVICE-NOT: host_eval({{.*}}) + ! DEVICE-SAME: { + + ! 
BOTH: omp.teams + !$omp target teams + + ! BOTH: omp.distribute + ! BOTH-NEXT: omp.simd + ! BOTH-NEXT: omp.loop_nest + + ! HOST-SAME: (%{{.*}}) : i32 = (%[[LB]]) to (%[[UB]]) inclusive step (%[[STEP]]) + !$omp distribute simd + do i=1,10 +call foo() + end do + !$omp end distribute simd + !$omp end target teams + + ! BOTH: omp.target + ! BOTH-NOT: host_eval({{.*}}) + ! BOTH-SAME: { + ! BOTH: omp.teams + !$omp target teams + call foo() !< Prevents this from being Generic-SPMD. + + ! BOTH: omp.distribute + ! BOTH-NEXT: omp.simd + !$omp distribute simd + do i=1,10 +call foo() + end do + !$omp end distribute simd + !$omp end target teams + + ! BOTH: omp.teams + !$omp teams + + ! BOTH: omp.distribute + ! BOTH-NEXT: omp.simd + !$omp distribute simd + do i=1,10 +call foo() + en
[llvm-branch-commits] [mlir] [MLIR][OpenMP] Host lowering of distribute-parallel-do/for (PR #127819)
https://github.com/skatrak updated https://github.com/llvm/llvm-project/pull/127819 >From aad04faf1796c328ac2a4280939a7fb9d7503ab1 Mon Sep 17 00:00:00 2001 From: Sergio Afonso Date: Tue, 18 Feb 2025 13:07:51 + Subject: [PATCH] [MLIR][OpenMP] Host lowering of distribute-parallel-do/for This patch adds support for translating composite `omp.parallel` + `omp.distribute` + `omp.wsloop` loops to LLVM IR on the host. This is done by passing an updated `WorksharingLoopType` to the call to `applyWorkshareLoop` associated to the lowering of the `omp.wsloop` operation, so that `__kmpc_dist_for_static_init` is called at runtime in place of `__kmpc_for_static_init`. Existing translation rules take care of creating a parallel region to hold the workshared and workdistributed loop. --- .../OpenMP/OpenMPToLLVMIRTranslation.cpp | 21 -- mlir/test/Target/LLVMIR/openmp-llvm.mlir | 65 +++ mlir/test/Target/LLVMIR/openmp-todo.mlir | 19 -- 3 files changed, 81 insertions(+), 24 deletions(-) diff --git a/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp b/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp index 87b690912620b..adac89988a2da 100644 --- a/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp +++ b/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp @@ -257,10 +257,6 @@ static LogicalResult checkImplementationStatus(Operation &op) { LogicalResult result = success(); llvm::TypeSwitch(op) .Case([&](omp::DistributeOp op) { -if (op.isComposite() && -isa_and_present(op.getNestedWrapper())) - result = op.emitError() << "not yet implemented: " - "composite omp.distribute + omp.wsloop"; checkAllocate(op, result); checkDistSchedule(op, result); checkOrder(op, result); @@ -1990,6 +1986,14 @@ convertOmpWsloop(Operation &opInst, llvm::IRBuilderBase &builder, bool isSimd = wsloopOp.getScheduleSimd(); bool loopNeedsBarrier = !wsloopOp.getNowait(); + // The only legal way for the direct parent to be omp.distribute is that this 
+ // represents 'distribute parallel do'. Otherwise, this is a regular + // worksharing loop. + llvm::omp::WorksharingLoopType workshareLoopType = + llvm::isa_and_present(opInst.getParentOp()) + ? llvm::omp::WorksharingLoopType::DistributeForStaticLoop + : llvm::omp::WorksharingLoopType::ForStaticLoop; + llvm::OpenMPIRBuilder::LocationDescription ompLoc(builder); llvm::Expected regionBlock = convertOmpOpRegions( wsloopOp.getRegion(), "omp.wsloop.region", builder, moduleTranslation); @@ -2005,7 +2009,8 @@ convertOmpWsloop(Operation &opInst, llvm::IRBuilderBase &builder, ompLoc.DL, loopInfo, allocaIP, loopNeedsBarrier, convertToScheduleKind(schedule), chunk, isSimd, scheduleMod == omp::ScheduleModifier::monotonic, - scheduleMod == omp::ScheduleModifier::nonmonotonic, isOrdered); + scheduleMod == omp::ScheduleModifier::nonmonotonic, isOrdered, + workshareLoopType); if (failed(handleError(wsloopIP, opInst))) return failure(); @@ -3796,6 +3801,12 @@ convertOmpDistribute(Operation &opInst, llvm::IRBuilderBase &builder, return regionBlock.takeError(); builder.SetInsertPoint(*regionBlock, (*regionBlock)->begin()); +// Skip applying a workshare loop below when translating 'distribute +// parallel do' (it's been already handled by this point while translating +// the nested omp.wsloop). +if (isa_and_present(distributeOp.getNestedWrapper())) + return llvm::Error::success(); + // TODO: Add support for clauses which are valid for DISTRIBUTE constructs. // Static schedule is the default. 
auto schedule = omp::ClauseScheduleKind::Static; diff --git a/mlir/test/Target/LLVMIR/openmp-llvm.mlir b/mlir/test/Target/LLVMIR/openmp-llvm.mlir index a5a490e527d79..d85b149c66811 100644 --- a/mlir/test/Target/LLVMIR/openmp-llvm.mlir +++ b/mlir/test/Target/LLVMIR/openmp-llvm.mlir @@ -3307,3 +3307,68 @@ llvm.func @distribute() { // CHECK: store i64 1, ptr %[[STRIDE]] // CHECK: %[[TID:.*]] = call i32 @__kmpc_global_thread_num({{.*}}) // CHECK: call void @__kmpc_for_static_init_{{.*}}(ptr @{{.*}}, i32 %[[TID]], i32 92, ptr %[[LASTITER]], ptr %[[LB]], ptr %[[UB]], ptr %[[STRIDE]], i64 1, i64 0) + +// - + +llvm.func @distribute_wsloop(%lb : i32, %ub : i32, %step : i32) { + omp.parallel { +omp.distribute { + omp.wsloop { +omp.loop_nest (%iv) : i32 = (%lb) to (%ub) step (%step) { + omp.yield +} + } {omp.composite} +} {omp.composite} +omp.terminator + } {omp.composite} + llvm.return +} + +// CHECK-LABEL: define void @distribute_wsloop +// CHECK: call void{{.*}}@__kmpc_fork_call({{.*}}, ptr @[[OUTLINED_PARALLEL:.*]], + +// CHECK: define internal void @[[OUTLINE
[llvm-branch-commits] [llvm] [MachineBasicBlock][NFC] Decouple SplitCriticalEdges from pass manager (PR #128151)
https://github.com/optimisan edited https://github.com/llvm/llvm-project/pull/128151