[llvm-branch-commits] [SHT_LLVM_FUNC_MAP][CodeGen]Introduce function address map section and emit dynamic instruction count(CodeGen part) (PR #124334)
https://github.com/wlei-llvm created https://github.com/llvm/llvm-project/pull/124334

Test Plan: llvm/test/CodeGen/X86/function-address-map-function-sections.ll

___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [flang] [flang][OpenMP] Handle directive arguments in OmpDirectiveSpecifier (PR #124278)
kparzysz wrote:

Summary of code changes:

- Remove the indirection from `std::optional>>` inside OmpDirectiveSpecifier. The type with indirection was not convertible to `std::optional`, which is a relatively useful abstraction. The consequence of that was that the definition of `OmpDirectiveSpecifier` had to be moved past the definitions of all clauses, and any occurrence of OmpDirectiveSpecifier in clauses now had to be wrapped in indirection.
- New classes for arguments (and their parsers) were created: OmpLocator, OmpReductionSpecifier, and a union-like struct OmpArgument. OmpDeclareMapperSpecifier was renamed to OmpMapperSpecifier. All of them were moved to before clause definitions. The intent here was to create argument classes for directives, while keeping in mind support for clause arguments as a long(er)-term goal.
- Some other cleanups were made in parse-tree.h as well: the MODIFIER_BOILERPLATE macro was moved closer to the modifier definitions, and the OmpObject definition was moved to the top of the OpenMP definitions.
- Extend symbol resolution to properly resolve symbols in OmpMapperSpecifier and OmpReductionSpecifier when embedded in WHEN/OTHERWISE clauses (i.e. inside of a METADIRECTIVE specification).

https://github.com/llvm/llvm-project/pull/124278
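The "union-like struct" mentioned in the summary can be pictured with a small sketch. This is not the actual parse-tree.h code: only the class names OmpLocator, OmpMapperSpecifier, OmpReductionSpecifier and OmpArgument come from the summary, while every member, the `DirectiveSpecifierSketch` type and the `describeArgument` helper are hypothetical placeholders. It assumes the argument holds exactly one alternative in a `std::variant`, and that a directive specifier can carry such an argument by value inside a `std::optional`.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <variant>

// Hypothetical, heavily simplified stand-ins; the members are invented for
// illustration and do not mirror flang's real parse-tree classes.
struct OmpLocator { std::string object; };
struct OmpMapperSpecifier { std::string mapperName; std::string typeName; };
struct OmpReductionSpecifier { std::string reductionId; };

// Union-like wrapper: holds exactly one of the argument kinds.
struct OmpArgument {
  std::variant<OmpLocator, OmpMapperSpecifier, OmpReductionSpecifier> u;
};

// Hypothetical directive specifier with an optional argument held by value.
struct DirectiveSpecifierSketch {
  std::string directiveName;
  std::optional<OmpArgument> argument;
};

// Returns a short label naming the kind of argument, or "none".
inline std::string describeArgument(const DirectiveSpecifierSketch &spec) {
  assert(!spec.directiveName.empty() && "a specifier must name a directive");
  if (!spec.argument)
    return "none";
  struct Visitor {
    std::string operator()(const OmpLocator &) const { return "locator"; }
    std::string operator()(const OmpMapperSpecifier &) const { return "mapper"; }
    std::string operator()(const OmpReductionSpecifier &) const {
      return "reduction";
    }
  };
  return std::visit(Visitor{}, spec.argument->u);
}
```

Dispatching with `std::visit` keeps the set of argument kinds closed: adding a new alternative to the variant forces every visitor to handle it.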
[llvm-branch-commits] [SHT_LLVM_FUNC_MAP][llvm-readobj]Introduce function address map section and emit dynamic instruction count(readobj part) (PR #124333)
llvmbot wrote:

@llvm/pr-subscribers-llvm-binary-utilities

Author: Lei Wang (wlei-llvm)

Changes

Test Plan: llvm/test/tools/llvm-readobj/ELF/func-map.test

---
Full diff: https://github.com/llvm/llvm-project/pull/124333.diff

7 Files Affected:

- (modified) llvm/include/llvm/Object/ELF.h (+7)
- (modified) llvm/lib/Object/ELF.cpp (+98)
- (added) llvm/test/tools/llvm-readobj/ELF/func-map.test (+96)
- (modified) llvm/tools/llvm-readobj/ELFDumper.cpp (+60)
- (modified) llvm/tools/llvm-readobj/ObjDumper.h (+1)
- (modified) llvm/tools/llvm-readobj/Opts.td (+1)
- (modified) llvm/tools/llvm-readobj/llvm-readobj.cpp (+4)

```diff
diff --git a/llvm/include/llvm/Object/ELF.h b/llvm/include/llvm/Object/ELF.h
index 3aa1d7864fcb70..a688672a3e5190 100644
--- a/llvm/include/llvm/Object/ELF.h
+++ b/llvm/include/llvm/Object/ELF.h
@@ -513,6 +513,13 @@ class ELFFile {
   decodeBBAddrMap(const Elf_Shdr &Sec, const Elf_Shdr *RelaSec = nullptr,
                   std::vector *PGOAnalyses = nullptr) const;
 
+  /// Returns a vector of FuncMap structs corresponding to each function
+  /// within the text section that the SHT_LLVM_FUNC_MAP section \p Sec
+  /// is associated with. If the current ELFFile is relocatable, a corresponding
+  /// \p RelaSec must be passed in as an argument.
+  Expected>
+  decodeFuncMap(const Elf_Shdr &Sec, const Elf_Shdr *RelaSec = nullptr) const;
+
   /// Returns a map from every section matching \p IsMatch to its relocation
   /// section, or \p nullptr if it has no relocation section. This function
   /// returns an error if any of the \p IsMatch calls fail or if it fails to
diff --git a/llvm/lib/Object/ELF.cpp b/llvm/lib/Object/ELF.cpp
index 41c3fb4cc5e406..87a9e5469f46d2 100644
--- a/llvm/lib/Object/ELF.cpp
+++ b/llvm/lib/Object/ELF.cpp
@@ -940,6 +940,104 @@ ELFFile::decodeBBAddrMap(const Elf_Shdr &Sec, const Elf_Shdr *RelaSec,
   return std::move(AddrMapsOrErr);
 }
 
+template
+Expected>
+ELFFile::decodeFuncMap(const Elf_Shdr &Sec,
+                       const Elf_Shdr *RelaSec) const {
+  bool IsRelocatable = this->getHeader().e_type == ELF::ET_REL;
+
+  // This DenseMap maps the offset of each function (the location of the
+  // reference to the function in the SHT_LLVM_FUNC_ADDR_MAP section) to the
+  // addend (the location of the function in the text section).
+  llvm::DenseMap FunctionOffsetTranslations;
+  if (IsRelocatable && RelaSec) {
+    assert(RelaSec &&
+           "Can't read a SHT_LLVM_FUNC_ADDR_MAP section in a relocatable "
+           "object file without providing a relocation section.");
+    Expected::Elf_Rela_Range> Relas =
+        this->relas(*RelaSec);
+    if (!Relas)
+      return createError("unable to read relocations for section " +
+                         describe(*this, Sec) + ": " +
+                         toString(Relas.takeError()));
+    for (typename ELFFile::Elf_Rela Rela : *Relas)
+      FunctionOffsetTranslations[Rela.r_offset] = Rela.r_addend;
+  }
+  auto GetAddressForRelocation =
+      [&](unsigned RelocationOffsetInSection) -> Expected {
+    auto FOTIterator =
+        FunctionOffsetTranslations.find(RelocationOffsetInSection);
+    if (FOTIterator == FunctionOffsetTranslations.end()) {
+      return createError("failed to get relocation data for offset: " +
+                         Twine::utohexstr(RelocationOffsetInSection) +
+                         " in section " + describe(*this, Sec));
+    }
+    return FOTIterator->second;
+  };
+  Expected> ContentsOrErr = this->getSectionContents(Sec);
+  if (!ContentsOrErr)
+    return ContentsOrErr.takeError();
+  ArrayRef Content = *ContentsOrErr;
+  DataExtractor Data(Content, this->isLE(), ELFT::Is64Bits ? 8 : 4);
+  std::vector FunctionEntries;
+
+  DataExtractor::Cursor Cur(0);
+  Error ULEBSizeErr = Error::success();
+
+  // Helper lambda to extract the (possibly relocatable) address stored at Cur.
+  auto ExtractAddress = [&]() -> Expected::uintX_t> {
+    uint64_t RelocationOffsetInSection = Cur.tell();
+    auto Address =
+        static_cast::uintX_t>(Data.getAddress(Cur));
+    if (!Cur)
+      return Cur.takeError();
+    if (!IsRelocatable)
+      return Address;
+    assert(Address == 0);
+    Expected AddressOrErr =
+        GetAddressForRelocation(RelocationOffsetInSection);
+    if (!AddressOrErr)
+      return AddressOrErr.takeError();
+    return *AddressOrErr;
+  };
+
+  uint8_t Version = 0;
+  uint8_t Feature = 0;
+  FuncMap::Features FeatEnable{};
+  while (!ULEBSizeErr && Cur && Cur.tell() < Content.size()) {
+    if (Sec.sh_type == ELF::SHT_LLVM_FUNC_MAP) {
+      Version = Data.getU8(Cur);
+      if (!Cur)
+        break;
+      if (Version > 1)
+        return createError("unsupported SHT_LLVM_FUNC_MAP version: " +
+                           Twine(static_cast(Version)));
+      Feature = Data.getU8(Cur); // Feature byte
+      if (!Cur)
+        break;
+      auto FeatEnableOrErr = FuncMap::Features::decode(Feature);
+      if (!FeatEn
```
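The relocation handling in `decodeFuncMap` boils down to a lookup table keyed by the offset of each address field. The following is a minimal standalone sketch of that idea, not the LLVM implementation: `RelaSketch`, `OffsetTranslations`, `buildTranslations` and `resolveAddress` are hypothetical names, `std::unordered_map` and `std::optional` stand in for `llvm::DenseMap` and `llvm::Expected`, and error reporting is reduced to an empty optional.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for an ELF RELA entry (only the fields used here).
struct RelaSketch {
  std::uint64_t r_offset; // where in the func-map section the reference lives
  std::int64_t r_addend;  // location of the function in the text section
};

using OffsetTranslations = std::unordered_map<std::uint64_t, std::uint64_t>;

// Builds the offset -> addend table from the relocation entries.
inline OffsetTranslations
buildTranslations(const std::vector<RelaSketch> &relas) {
  OffsetTranslations table;
  for (const RelaSketch &rela : relas)
    table[rela.r_offset] = static_cast<std::uint64_t>(rela.r_addend);
  return table;
}

// Resolves an address field that was read at offsetInSection. In a
// non-relocatable file the raw value is already the address; in a relocatable
// one the field is expected to hold zero and the table supplies the value.
inline std::optional<std::uint64_t>
resolveAddress(bool isRelocatable, std::uint64_t rawValue,
               std::uint64_t offsetInSection, const OffsetTranslations &table) {
  if (!isRelocatable)
    return rawValue;
  assert(rawValue == 0 && "relocatable address fields are expected to be zero");
  auto it = table.find(offsetInSection);
  if (it == table.end())
    return std::nullopt; // no relocation recorded for this offset
  return it->second;
}
```

Keying the table by section offset means the decoder only needs to remember where the cursor stood before reading the address, which is what `RelocationOffsetInSection = Cur.tell()` does in the patch.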
[llvm-branch-commits] [SHT_LLVM_FUNC_MAP][llvm-readobj]Introduce function address map section and emit dynamic instruction count(readobj part) (PR #124333)
https://github.com/wlei-llvm created https://github.com/llvm/llvm-project/pull/124333

Test Plan: llvm/test/tools/llvm-readobj/ELF/func-map.test
[llvm-branch-commits] [SHT_LLVM_FUNC_MAP][CodeGen]Introduce function address map section and emit dynamic instruction count(CodeGen part) (PR #124334)
llvmbot wrote:

@llvm/pr-subscribers-mc

Author: Lei Wang (wlei-llvm)

Changes

Test Plan: llvm/test/CodeGen/X86/function-address-map-function-sections.ll

---
Full diff: https://github.com/llvm/llvm-project/pull/124334.diff

11 Files Affected:

- (modified) llvm/docs/Extensions.rst (+24-1)
- (modified) llvm/include/llvm/CodeGen/AsmPrinter.h (+2)
- (modified) llvm/include/llvm/MC/MCContext.h (+5)
- (modified) llvm/include/llvm/MC/MCObjectFileInfo.h (+2)
- (modified) llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp (+56-1)
- (modified) llvm/lib/MC/MCObjectFileInfo.cpp (+17)
- (modified) llvm/lib/MC/MCParser/ELFAsmParser.cpp (+2)
- (modified) llvm/lib/MC/MCSectionELF.cpp (+2)
- (added) llvm/test/CodeGen/X86/function-address-map-dyn-inst-count.ll (+110)
- (added) llvm/test/CodeGen/X86/function-address-map-function-sections.ll (+41)
- (modified) llvm/test/MC/AsmParser/llvm_section_types.s (+4)

```diff
diff --git a/llvm/docs/Extensions.rst b/llvm/docs/Extensions.rst
index ea267842cdc353..d94e35eeefa6ad 100644
--- a/llvm/docs/Extensions.rst
+++ b/llvm/docs/Extensions.rst
@@ -535,6 +535,30 @@ Example of BBAddrMap with PGO data:
    .uleb128  1000   # BB_3 basic block frequency (only when enabled)
    .uleb128  0      # BB_3 successors count (only enabled with branch probabilities)
 
+``SHT_LLVM_FUNC_MAP`` Section (function address map)
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+This section stores the mapping from the binary address of function to its
+related metadata features. It is used to emit function-level analysis data and
+can be enabled through ``--func-map=`` option.
+
+Three fields are stored at the beginning: a version number byte for backward
+compatibility, a feature byte where each bit represents a specific feature, and
+the function's entry address. The encodings for each enabled feature come after
+these fields. The currently supported feature is:
+
+#. Dynamic Instruction Count - Total PGO counts for all instructions within the function.
+
+Example:
+
+.. code-block:: gas
+
+  .section  ".llvm_func_map","",@llvm_func_map
+  .byte     1                # version number
+  .byte     1                # feature
+  .quad     .Lfunc_begin1    # function address
+  .uleb128  333              # dynamic instruction count
+
+
 ``SHT_LLVM_OFFLOADING`` Section (offloading data)
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 This section stores the binary data used to perform offloading device linking
@@ -725,4 +749,3 @@ follows:
   add     x16, x16, :lo12:__chkstk
   blr     x16
   sub     sp, sp, x15, lsl #4
-
diff --git a/llvm/include/llvm/CodeGen/AsmPrinter.h b/llvm/include/llvm/CodeGen/AsmPrinter.h
index 5291369b3b9f1d..5fe35c283cceda 100644
--- a/llvm/include/llvm/CodeGen/AsmPrinter.h
+++ b/llvm/include/llvm/CodeGen/AsmPrinter.h
@@ -414,6 +414,8 @@ class AsmPrinter : public MachineFunctionPass {
 
   void emitBBAddrMapSection(const MachineFunction &MF);
 
+  void emitFuncMapSection(const MachineFunction &MF);
+
   void emitKCFITrapEntry(const MachineFunction &MF, const MCSymbol *Symbol);
   virtual void emitKCFITypeId(const MachineFunction &MF);
 
diff --git a/llvm/include/llvm/MC/MCContext.h b/llvm/include/llvm/MC/MCContext.h
index 57ba40f7ac26fc..6fc9eaafeb09e3 100644
--- a/llvm/include/llvm/MC/MCContext.h
+++ b/llvm/include/llvm/MC/MCContext.h
@@ -177,6 +177,9 @@ class MCContext {
   /// LLVM_BB_ADDR_MAP version to emit.
   uint8_t BBAddrMapVersion = 2;
 
+  /// LLVM_FUNC_MAP version to emit.
+  uint8_t FuncMapVersion = 1;
+
   /// The file name of the log file from the environment variable
   /// AS_SECURE_LOG_FILE. Which must be set before the .secure_log_unique
   /// directive is used or it is an error.
@@ -656,6 +659,8 @@ class MCContext {
 
   uint8_t getBBAddrMapVersion() const { return BBAddrMapVersion; }
 
+  uint8_t getFuncMapVersion() const { return FuncMapVersion; }
+
   /// @}
 
   /// \name Dwarf Management
diff --git a/llvm/include/llvm/MC/MCObjectFileInfo.h b/llvm/include/llvm/MC/MCObjectFileInfo.h
index fb575fe721015c..e344d4772e3fec 100644
--- a/llvm/include/llvm/MC/MCObjectFileInfo.h
+++ b/llvm/include/llvm/MC/MCObjectFileInfo.h
@@ -364,6 +364,8 @@ class MCObjectFileInfo {
 
   MCSection *getBBAddrMapSection(const MCSection &TextSec) const;
 
+  MCSection *getFuncMapSection(const MCSection &TextSec) const;
+
   MCSection *getKCFITrapSection(const MCSection &TextSec) const;
 
   MCSection *getPseudoProbeSection(const MCSection &TextSec) const;
diff --git a/llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp b/llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp
index b2a4721f37b268..a00db04ef654c2 100644
--- a/llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp
+++ b/llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp
@@ -147,6 +147,11 @@ enum class PGOMapFeaturesEnum {
   BrProb,
   All,
 };
+
+enum class FuncMapFeaturesEnum {
+  DynamicInstCount,
+};
+
 static cl::bits PgoA
```
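The entry layout described in the Extensions.rst hunk (version byte, feature byte, function address, then per-feature data) can be exercised with a small round-trip sketch. This is an illustration under stated assumptions, not the AsmPrinter or llvm-readobj code: it assumes a 64-bit little-endian address (matching the `.quad` in the example), assumes bit 0 of the feature byte selects the dynamic instruction count (consistent with `.byte 1 # feature` in the example, but not confirmed by the truncated diff), and all names below are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical in-memory form of one function entry.
struct FuncMapEntrySketch {
  std::uint8_t version = 1;
  std::uint8_t feature = 0;
  std::uint64_t address = 0;
  std::uint64_t dynamicInstCount = 0; // only encoded if the feature bit is set
};

// Assumption: bit 0 of the feature byte enables the dynamic instruction count.
constexpr std::uint8_t kDynamicInstCountBit = 0x1;

// Appends value in ULEB128 form: 7 payload bits per byte, high bit = "more".
inline void appendULEB128(std::vector<std::uint8_t> &out, std::uint64_t value) {
  do {
    std::uint8_t byte = static_cast<std::uint8_t>(value & 0x7f);
    value >>= 7;
    if (value != 0)
      byte |= 0x80;
    out.push_back(byte);
  } while (value != 0);
}

inline std::vector<std::uint8_t> encodeEntry(const FuncMapEntrySketch &e) {
  assert(e.version <= 1 && "this sketch only models versions up to 1");
  std::vector<std::uint8_t> out{e.version, e.feature};
  for (int i = 0; i < 8; ++i) // little-endian 64-bit address
    out.push_back(static_cast<std::uint8_t>(e.address >> (8 * i)));
  if (e.feature & kDynamicInstCountBit)
    appendULEB128(out, e.dynamicInstCount);
  return out;
}

// Decodes a single entry; returns nullopt on truncation or unknown version.
inline std::optional<FuncMapEntrySketch>
decodeEntry(const std::vector<std::uint8_t> &in) {
  if (in.size() < 10)
    return std::nullopt;
  FuncMapEntrySketch e;
  e.version = in[0];
  if (e.version > 1)
    return std::nullopt; // mirrors the "unsupported version" check
  e.feature = in[1];
  for (int i = 0; i < 8; ++i)
    e.address |= static_cast<std::uint64_t>(in[2 + i]) << (8 * i);
  std::size_t pos = 10;
  if (e.feature & kDynamicInstCountBit) {
    unsigned shift = 0;
    while (true) {
      if (pos >= in.size() || shift >= 64)
        return std::nullopt;
      std::uint8_t byte = in[pos++];
      e.dynamicInstCount |= static_cast<std::uint64_t>(byte & 0x7f) << shift;
      if (!(byte & 0x80))
        break;
      shift += 7;
    }
  }
  return e;
}
```

With the values from the documentation example (version 1, feature 1, count 333), the count is encoded as the two bytes `0xCD 0x02`, so the whole entry occupies 12 bytes.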
[llvm-branch-commits] [flang] [Flang] Remove FLANG_INCLUDE_RUNTIME (PR #124126)
https://github.com/Meinersbur updated https://github.com/llvm/llvm-project/pull/124126

>From c515d13f0ad684763e6d76a87a610801482c15f4 Mon Sep 17 00:00:00 2001
From: Michael Kruse
Date: Fri, 24 Jan 2025 16:52:46 +0100
Subject: [PATCH] [Flang] Remove FLANG_INCLUDE_RUNTIME

---
 flang/CMakeLists.txt                          |  25 +-
 .../modules/AddFlangOffloadRuntime.cmake      | 146
 flang/runtime/CMakeLists.txt                  | 350 --
 flang/runtime/CUDA/CMakeLists.txt             |  41 --
 flang/runtime/Float128Math/CMakeLists.txt     | 133 ---
 flang/test/CMakeLists.txt                     |  10 -
 flang/test/lit.cfg.py                         |   3 -
 flang/test/lit.site.cfg.py.in                 |   1 -
 flang/tools/f18/CMakeLists.txt                |  17 +-
 flang/unittests/CMakeLists.txt                |  43 +--
 flang/unittests/Evaluate/CMakeLists.txt       |  16 -
 11 files changed, 5 insertions(+), 780 deletions(-)
 delete mode 100644 flang/cmake/modules/AddFlangOffloadRuntime.cmake
 delete mode 100644 flang/runtime/CMakeLists.txt
 delete mode 100644 flang/runtime/CUDA/CMakeLists.txt
 delete mode 100644 flang/runtime/Float128Math/CMakeLists.txt

diff --git a/flang/CMakeLists.txt b/flang/CMakeLists.txt
index 38004c149b7835..aceb2d09c54388 100644
--- a/flang/CMakeLists.txt
+++ b/flang/CMakeLists.txt
@@ -23,7 +23,6 @@ if (LLVM_ENABLE_EH)
 endif()
 
 set(FLANG_SOURCE_DIR ${CMAKE_CURRENT_SOURCE_DIR})
-set(FLANG_RT_SOURCE_DIR "${CMAKE_CURRENT_SOURCE_DIR}/../flang-rt")
 
 if (CMAKE_SOURCE_DIR STREQUAL CMAKE_BINARY_DIR AND NOT MSVC_IDE)
   message(FATAL_ERROR "In-source builds are not allowed. \
@@ -237,24 +236,8 @@ else()
   include_directories(SYSTEM ${MLIR_TABLEGEN_OUTPUT_DIR})
 endif()
 
-set(FLANG_INCLUDE_RUNTIME_default ON)
-if ("flang-rt" IN_LIST LLVM_ENABLE_RUNTIMES)
-  set(FLANG_INCLUDE_RUNTIME_default OFF)
-endif ()
-option(FLANG_INCLUDE_RUNTIME "Build the runtime in-tree (deprecated; to be replaced with LLVM_ENABLE_RUNTIMES=flang-rt)" FLANG_INCLUDE_RUNTIME_default)
-if (FLANG_INCLUDE_RUNTIME)
-  if ("flang-rt" IN_LIST LLVM_ENABLE_RUNTIMES)
-    message(WARNING "Building Flang-RT using LLVM_ENABLE_RUNTIMES. FLANG_INCLUDE_RUNTIME=${FLANG_INCLUDE_RUNTIME} ignored.")
-    set(FLANG_INCLUDE_RUNTIME OFF)
-  else ()
-    message(STATUS "Building flang_rt in-tree")
-  endif ()
-else ()
-  if ("flang-rt" IN_LIST LLVM_ENABLE_RUNTIMES)
-    message(STATUS "Building Flang-RT using LLVM_ENABLE_RUNTIMES.")
-  else ()
-    message(STATUS "Not building Flang-RT. For a usable Fortran toolchain, compile a standalone Flang-RT")
-  endif ()
+if (NOT "flang-rt" IN_LIST LLVM_ENABLE_RUNTIMES)
+  message(STATUS "Not building Flang-RT. For a usable Fortran toolchain, compile a standalone Flang-RT")
 endif ()
 
 set(FLANG_TOOLS_INSTALL_DIR "${CMAKE_INSTALL_BINDIR}" CACHE PATH
@@ -484,10 +467,6 @@ if (FLANG_CUF_RUNTIME)
   find_package(CUDAToolkit REQUIRED)
 endif()
 
-if (FLANG_INCLUDE_RUNTIME)
-  add_subdirectory(runtime)
-endif ()
-
 if (LLVM_INCLUDE_EXAMPLES)
   add_subdirectory(examples)
 endif()
diff --git a/flang/cmake/modules/AddFlangOffloadRuntime.cmake b/flang/cmake/modules/AddFlangOffloadRuntime.cmake
deleted file mode 100644
index 8e4f47d18535dc..00
--- a/flang/cmake/modules/AddFlangOffloadRuntime.cmake
+++ /dev/null
@@ -1,146 +0,0 @@
-option(FLANG_EXPERIMENTAL_CUDA_RUNTIME
-  "Compile Fortran runtime as CUDA sources (experimental)" OFF
-  )
-
-option(FLANG_CUDA_RUNTIME_PTX_WITHOUT_GLOBAL_VARS
-  "Do not compile global variables' definitions when producing PTX library" OFF
-  )
-
-set(FLANG_LIBCUDACXX_PATH "" CACHE PATH "Path to libcu++ package installation")
-
-set(FLANG_EXPERIMENTAL_OMP_OFFLOAD_BUILD "off" CACHE STRING
-  "Compile Fortran runtime as OpenMP target offload sources (experimental). Valid options are 'off', 'host_device', 'nohost'")
-
-set(FLANG_OMP_DEVICE_ARCHITECTURES "all" CACHE STRING
-  "List of OpenMP device architectures to be used to compile the Fortran runtime (e.g. 'gfx1103;sm_90')")
-
-macro(enable_cuda_compilation name files)
-  if (FLANG_EXPERIMENTAL_CUDA_RUNTIME)
-    if (BUILD_SHARED_LIBS)
-      message(FATAL_ERROR
-        "BUILD_SHARED_LIBS is not supported for CUDA build of Fortran runtime"
-        )
-    endif()
-
-    enable_language(CUDA)
-
-    # TODO: figure out how to make target property CUDA_SEPARABLE_COMPILATION
-    # work, and avoid setting CMAKE_CUDA_SEPARABLE_COMPILATION.
-    set(CMAKE_CUDA_SEPARABLE_COMPILATION ON)
-
-    # Treat all supported sources as CUDA files.
-    set_source_files_properties(${files} PROPERTIES LANGUAGE CUDA)
-    set(CUDA_COMPILE_OPTIONS)
-    if ("${CMAKE_CUDA_COMPILER_ID}" MATCHES "Clang")
-      # Allow varargs.
-      set(CUDA_COMPILE_OPTIONS
-        -Xclang -fcuda-allow-variadic-functions
-        )
-    endif()
-    if ("${CMAKE_CUDA_COMPILER_ID}" MATCHES "NVIDIA")
-      set(CUDA_COMPILE_OPTIONS
-        --expt-relaxed-constexpr
-        # Disable these warnings:
-        #
[llvm-branch-commits] [clang] [flang] [lld] [Flang] Rename libFortranRuntime.a to libflang_rt.a (PR #122341)
llvmbot wrote:

@llvm/pr-subscribers-clang-driver

Author: Michael Kruse (Meinersbur)

Changes

The future name of Flang's runtime component is `flang_rt`, as already used in PR #110217 (Flang-RT). Since the flang driver has to select the runtime to link, both build instructions must agree on the name.

Extracted out of #110217

---

Patch is 23.32 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/122341.diff

26 Files Affected:

- (modified) clang/lib/Driver/ToolChains/CommonArgs.cpp (+2-2)
- (modified) clang/lib/Driver/ToolChains/Flang.cpp (+4-4)
- (modified) flang/CMakeLists.txt (+1-1)
- (modified) flang/docs/FlangDriver.md (+3-3)
- (modified) flang/docs/GettingStarted.md (+3-3)
- (modified) flang/docs/OpenACC-descriptor-management.md (+1-1)
- (modified) flang/docs/ReleaseNotes.md (+2)
- (modified) flang/examples/ExternalHelloWorld/CMakeLists.txt (+1-1)
- (modified) flang/lib/Optimizer/Builder/IntrinsicCall.cpp (+1-1)
- (modified) flang/runtime/CMakeLists.txt (+23-17)
- (modified) flang/runtime/CUDA/CMakeLists.txt (+1-1)
- (modified) flang/runtime/Float128Math/CMakeLists.txt (+1-1)
- (modified) flang/runtime/time-intrinsic.cpp (+1-1)
- (modified) flang/test/CMakeLists.txt (+7-1)
- (modified) flang/test/Driver/gcc-toolchain-install-dir.f90 (+1-1)
- (modified) flang/test/Driver/linker-flags.f90 (+4-4)
- (modified) flang/test/Driver/msvc-dependent-lib-flags.f90 (+4-4)
- (modified) flang/test/Driver/nostdlib.f90 (+1-1)
- (modified) flang/test/Runtime/no-cpp-dep.c (+1-1)
- (modified) flang/test/lit.cfg.py (+1-1)
- (modified) flang/tools/f18/CMakeLists.txt (+4-4)
- (modified) flang/unittests/CMakeLists.txt (+1-1)
- (modified) flang/unittests/Evaluate/CMakeLists.txt (+2-2)
- (modified) flang/unittests/Runtime/CMakeLists.txt (+1-1)
- (modified) flang/unittests/Runtime/CUDA/CMakeLists.txt (+1-1)
- (modified) lld/COFF/MinGW.cpp (+1-1)

```diff
diff --git a/clang/lib/Driver/ToolChains/CommonArgs.cpp b/clang/lib/Driver/ToolChains/CommonArgs.cpp
index b5273dd8cf1e3a..c7b0a660ee021f 100644
--- a/clang/lib/Driver/ToolChains/CommonArgs.cpp
+++ b/clang/lib/Driver/ToolChains/CommonArgs.cpp
@@ -1321,7 +1321,7 @@ void tools::addOpenMPHostOffloadingArgs(const Compilation &C,
 /// Add Fortran runtime libs
 void tools::addFortranRuntimeLibs(const ToolChain &TC, const ArgList &Args,
                                   llvm::opt::ArgStringList &CmdArgs) {
-  // Link FortranRuntime
+  // Link flang_rt
   // These are handled earlier on Windows by telling the frontend driver to
   // add the correct libraries to link against as dependents in the object
   // file.
@@ -1337,7 +1337,7 @@ void tools::addFortranRuntimeLibs(const ToolChain &TC, const ArgList &Args,
       if (AsNeeded)
         addAsNeededOption(TC, Args, CmdArgs, /*as_needed=*/false);
     }
-    CmdArgs.push_back("-lFortranRuntime");
+    CmdArgs.push_back("-lflang_rt");
     addArchSpecificRPath(TC, Args, CmdArgs);
   }
 
diff --git a/clang/lib/Driver/ToolChains/Flang.cpp b/clang/lib/Driver/ToolChains/Flang.cpp
index f1bf32b3238270..68a17edf8ca341 100644
--- a/clang/lib/Driver/ToolChains/Flang.cpp
+++ b/clang/lib/Driver/ToolChains/Flang.cpp
@@ -360,26 +360,26 @@ static void processVSRuntimeLibrary(const ToolChain &TC, const ArgList &Args,
   case options::OPT__SLASH_MT:
     CmdArgs.push_back("-D_MT");
     CmdArgs.push_back("--dependent-lib=libcmt");
-    CmdArgs.push_back("--dependent-lib=FortranRuntime.static.lib");
+    CmdArgs.push_back("--dependent-lib=flang_rt.static.lib");
     break;
   case options::OPT__SLASH_MTd:
     CmdArgs.push_back("-D_MT");
     CmdArgs.push_back("-D_DEBUG");
     CmdArgs.push_back("--dependent-lib=libcmtd");
-    CmdArgs.push_back("--dependent-lib=FortranRuntime.static_dbg.lib");
+    CmdArgs.push_back("--dependent-lib=flang_rt.static_dbg.lib");
     break;
   case options::OPT__SLASH_MD:
     CmdArgs.push_back("-D_MT");
     CmdArgs.push_back("-D_DLL");
     CmdArgs.push_back("--dependent-lib=msvcrt");
-    CmdArgs.push_back("--dependent-lib=FortranRuntime.dynamic.lib");
+    CmdArgs.push_back("--dependent-lib=flang_rt.dynamic.lib");
     break;
   case options::OPT__SLASH_MDd:
     CmdArgs.push_back("-D_MT");
     CmdArgs.push_back("-D_DEBUG");
     CmdArgs.push_back("-D_DLL");
     CmdArgs.push_back("--dependent-lib=msvcrtd");
-    CmdArgs.push_back("--dependent-lib=FortranRuntime.dynamic_dbg.lib");
+    CmdArgs.push_back("--dependent-lib=flang_rt.dynamic_dbg.lib");
     break;
   }
 }
diff --git a/flang/CMakeLists.txt b/flang/CMakeLists.txt
index 7d6dcb5c184a52..8a8b8bfa73b007 100644
--- a/flang/CMakeLists.txt
+++ b/flang/CMakeLists.txt
@@ -301,7 +301,7 @@ set(FLANG_DEFAULT_LINKER "" CACHE STRING
   "Default linker to use (linker name or absolute path, empty for platform default)")
 
 set(FLANG_DEFAULT_RTLIB "" CACHE STRING
-  "Default Fortran runtime library to use (\"libFortranRuntime\"), leave empty for platform default.")
+  "Defaul
```
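The effect of the rename on the MSVC path can be summarized as a table from runtime flavour to emitted arguments. The sketch below restates the four `case` branches of the Flang.cpp hunk as a standalone function; the enum `MSVCRuntimeSketch` and the function `dependentLibArgs` are hypothetical names for illustration, and only the argument strings are taken from the diff.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical enum standing in for the /MT, /MTd, /MD and /MDd options.
enum class MSVCRuntimeSketch { MT, MTd, MD, MDd };

// Returns the arguments the driver pushes for each runtime flavour after the
// rename from FortranRuntime.*.lib to flang_rt.*.lib.
inline std::vector<std::string> dependentLibArgs(MSVCRuntimeSketch rt) {
  switch (rt) {
  case MSVCRuntimeSketch::MT:
    return {"-D_MT", "--dependent-lib=libcmt",
            "--dependent-lib=flang_rt.static.lib"};
  case MSVCRuntimeSketch::MTd:
    return {"-D_MT", "-D_DEBUG", "--dependent-lib=libcmtd",
            "--dependent-lib=flang_rt.static_dbg.lib"};
  case MSVCRuntimeSketch::MD:
    return {"-D_MT", "-D_DLL", "--dependent-lib=msvcrt",
            "--dependent-lib=flang_rt.dynamic.lib"};
  case MSVCRuntimeSketch::MDd:
    return {"-D_MT", "-D_DEBUG", "-D_DLL", "--dependent-lib=msvcrtd",
            "--dependent-lib=flang_rt.dynamic_dbg.lib"};
  }
  assert(false && "unhandled runtime flavour");
  return {};
}
```

Because the library name is baked into the object file as a dependent library, the build system producing `flang_rt.*.lib` and the driver emitting these arguments have to change in the same patch, which is the point made in the PR description.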
[llvm-branch-commits] [clang] [flang] [llvm] [Flang] LLVM_ENABLE_RUNTIMES=flang-rt (PR #110217)
llvmbot wrote:

@llvm/pr-subscribers-clang

Author: Michael Kruse (Meinersbur)

Changes

Extract Flang's runtime library to use the LLVM_ENABLE_RUNTIMES mechanism.

Motivation:

* Consistency with LLVM's other runtime libraries (compiler-rt, libc, libcxx, openmp offload, ...)
* Allows compiling the runtime for multiple targets at once using the LLVM_RUNTIME_TARGETS configuration options
* Installs the runtime into the compiler's per-target resource directory so it can be automatically found even when cross-compiling

Potential future directions:

* Uses CMake's support for compiling Fortran files, including dependency resolution of Fortran modules
* Improve robustness of compiling `libomp.mod` when openmp is available
* Remove Flang's dependency from flang-rt's RTNAME function declarations (tblgen?)
* Reduce Flang's build-time dependency from flang-rt's `REAL(16)` support

See RFC discussion at https://discourse.llvm.org/t/rfc-use-llvm-enable-runtimes-for-flangs-runtime/80826

Patch series:

* #110244
* #112188
* #121997
* #122069
* #122334
* #122336
* #122341
* #110298
* #110217 (this PR)
* #121782
* #124126

Patch for lab.llvm.org buildbots:

* https://github.com/llvm/llvm-zorg/pull/333

---

Patch is 108.72 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/110217.diff

41 Files Affected:

- (modified) clang/lib/Driver/ToolChains/Flang.cpp (+9-5)
- (added) flang-rt/.clang-tidy (+2)
- (added) flang-rt/CMakeLists.txt (+248)
- (added) flang-rt/CODE_OWNERS.TXT (+14)
- (added) flang-rt/LICENSE.TXT (+234)
- (added) flang-rt/README.md (+188)
- (added) flang-rt/cmake/modules/AddFlangRT.cmake (+186)
- (added) flang-rt/cmake/modules/AddFlangRTOffload.cmake (+101)
- (added) flang-rt/cmake/modules/GetToolchainDirs.cmake (+125)
- (added) flang-rt/lib/CMakeLists.txt (+18)
- (added) flang-rt/lib/FortranFloat128Math/CMakeLists.txt (+136)
- (added) flang-rt/lib/Testing/CMakeLists.txt (+20)
- (added) flang-rt/lib/flang_rt/CMakeLists.txt (+213)
- (added) flang-rt/lib/flang_rt/CUDA/CMakeLists.txt (+33)
- (modified) flang-rt/lib/flang_rt/io-api-minimal.cpp (+1-1)
- (added) flang-rt/test/CMakeLists.txt (+59)
- (modified) flang-rt/test/Driver/ctofortran.f90 (+5-24)
- (modified) flang-rt/test/Driver/exec.f90 (+4-4)
- (added) flang-rt/test/NonGtestUnit/lit.cfg.py (+22)
- (added) flang-rt/test/NonGtestUnit/lit.site.cfg.py.in (+14)
- (modified) flang-rt/test/Runtime/no-cpp-dep.c (+3-2)
- (added) flang-rt/test/Unit/lit.cfg.py (+21)
- (added) flang-rt/test/Unit/lit.site.cfg.py.in (+15)
- (added) flang-rt/test/lit.cfg.py (+102)
- (added) flang-rt/test/lit.site.cfg.py.in (+19)
- (added) flang-rt/unittests/CMakeLists.txt (+111)
- (added) flang-rt/unittests/Evaluate/CMakeLists.txt (+21)
- (added) flang-rt/unittests/Runtime/CMakeLists.txt (+48)
- (added) flang-rt/unittests/Runtime/CUDA/CMakeLists.txt (+18)
- (modified) flang/CMakeLists.txt (+26-27)
- (added) flang/cmake/modules/FlangCommon.cmake (+43)
- (modified) flang/docs/GettingStarted.md (+58-50)
- (modified) flang/docs/ReleaseNotes.md (+7-1)
- (modified) flang/module/iso_fortran_env_impl.f90 (+1-1)
- (modified) flang/test/lit.cfg.py (-20)
- (modified) flang/test/lit.site.cfg.py.in (-3)
- (modified) llvm/CMakeLists.txt (+7-1)
- (modified) llvm/cmake/modules/LLVMExternalProjectUtils.cmake (+15-1)
- (modified) llvm/projects/CMakeLists.txt (+3-1)
- (modified) llvm/runtimes/CMakeLists.txt (+18-7)
- (modified) runtimes/CMakeLists.txt (+1-1)

```diff
diff --git a/clang/lib/Driver/ToolChains/Flang.cpp b/clang/lib/Driver/ToolChains/Flang.cpp
index 68a17edf8ca341..17a8a4dd8d0a87 100644
--- a/clang/lib/Driver/ToolChains/Flang.cpp
+++ b/clang/lib/Driver/ToolChains/Flang.cpp
@@ -342,11 +342,15 @@ static void processVSRuntimeLibrary(const ToolChain &TC, const ArgList &Args,
                                     ArgStringList &CmdArgs) {
   assert(TC.getTriple().isKnownWindowsMSVCEnvironment() &&
          "can only add VS runtime library on Windows!");
-  // if -fno-fortran-main has been passed, skip linking Fortran_main.a
-  if (TC.getTriple().isKnownWindowsMSVCEnvironment()) {
-    CmdArgs.push_back(Args.MakeArgString(
-        "--dependent-lib=" + TC.getCompilerRTBasename(Args, "builtins")));
-  }
+
+  // Flang/Clang (including clang-cl) -compiled programs targeting the MSVC ABI
+  // should only depend on msv(u)crt. LLVM still emits libgcc/compiler-rt
+  // functions in some cases like 128-bit integer math (__udivti3, __modti3,
+  // __fixsfti, __floattidf, ...) that msvc does not support. We are injecting a
+  // dependency to Compiler-RT's builtin library where these are implemented.
+  CmdArgs.push_back(Args.MakeArgString(
+      "--dependent-lib=" + TC.getCompilerRTBasename(Args, "builtins")));
+
   unsigned RTOptionID = options::OPT__SLASH_MT;
   if (auto *rtl = Args.getLastArg(options::OPT_fms_runtime_lib_EQ)) {
     RTOptionID = llvm::StringSwitch(rtl->getValue()) d
```
[llvm-branch-commits] [flang] [Flang] Promote FortranEvaluateTesting library (PR #122334)
https://github.com/Meinersbur edited https://github.com/llvm/llvm-project/pull/122334
[llvm-branch-commits] [llvm] [Flang-RT] Build libflang_rt.so (PR #121782)
https://github.com/Meinersbur updated https://github.com/llvm/llvm-project/pull/121782

>From b05c9a033158aea459d51ff34b8ec47e72f85740 Mon Sep 17 00:00:00 2001
From: Michael Kruse
Date: Fri, 24 Jan 2025 16:51:27 +0100
Subject: [PATCH] [Flang-RT] Build libflang_rt.so

---
 flang-rt/CMakeLists.txt | 30 ++
 flang-rt/cmake/modules/AddFlangRT.cmake | 324 --
 .../cmake/modules/AddFlangRTOffload.cmake | 18 +-
 flang-rt/lib/flang_rt/CMakeLists.txt | 9 +-
 flang-rt/lib/flang_rt/CUDA/CMakeLists.txt | 26 +-
 flang-rt/test/CMakeLists.txt | 2 +-
 flang-rt/test/lit.cfg.py | 2 +-
 7 files changed, 283 insertions(+), 128 deletions(-)

diff --git a/flang-rt/CMakeLists.txt b/flang-rt/CMakeLists.txt
index 655d0a55b40044..0b91b6ae7eea78 100644
--- a/flang-rt/CMakeLists.txt
+++ b/flang-rt/CMakeLists.txt
@@ -115,6 +115,15 @@ endif ()
 extend_path(FLANG_RT_INSTALL_RESOURCE_LIB_PATH "${FLANG_RT_INSTALL_RESOURCE_PATH}" "${toolchain_lib_subdir}")
 cmake_path(NORMAL_PATH FLANG_RT_OUTPUT_RESOURCE_DIR)
 cmake_path(NORMAL_PATH FLANG_RT_INSTALL_RESOURCE_PATH)
+# FIXME: For the libflang_rt.so, the toolchain resource lib dir is not a good
+#        destination because it is not a ld.so default search path.
+#        The machine where the executable is eventually executed may not be the
+#        machine where the Flang compiler and its resource dir is installed, so
+#        setting RPath by the driver is not an solution. It should belong into
+#        /usr/lib//libflang_rt.so, like e.g. libgcc_s.so.
+#        But the linker as invoked by the Flang driver also requires
+#        libflang_rt.so to be found when linking and the resource lib dir is
+#        the only reliable location.
 cmake_path(NORMAL_PATH FLANG_RT_OUTPUT_RESOURCE_LIB_DIR)
 cmake_path(NORMAL_PATH FLANG_RT_INSTALL_RESOURCE_LIB_PATH)

@@ -129,6 +138,27 @@ cmake_path(NORMAL_PATH FLANG_RT_INSTALL_RESOURCE_LIB_PATH)
 option(FLANG_RT_INCLUDE_TESTS "Generate build targets for the flang-rt unit and regression-tests." "${LLVM_INCLUDE_TESTS}")

+option(FLANG_RT_ENABLE_STATIC "Build Flang-RT as a static library." ON)
+if (WIN32)
+  # Windows DLL currently not implemented.
+  set(FLANG_RT_ENABLE_SHARED OFF)
+else ()
+  # TODO: Enable by default to increase test coverage, and which version of the
+  #       library should be the user's choice anyway.
+  #       Currently, the Flang driver adds `-L"libdir" -lflang_rt` as linker
+  #       argument, which leaves the choice which library to use to the linker.
+  #       Since most linkers prefer the shared library, this would constitute a
+  #       breaking change unless the driver is changed.
+  option(FLANG_RT_ENABLE_SHARED "Build Flang-RT as a shared library." OFF)
+endif ()
+if (NOT FLANG_RT_ENABLE_STATIC AND NOT FLANG_RT_ENABLE_SHARED)
+  message(FATAL_ERROR "
+      Must build at least one type of library
+      (FLANG_RT_ENABLE_STATIC=ON, FLANG_RT_ENABLE_SHARED=ON, or both)
+    ")
+endif ()
+
+
 set(FLANG_RT_EXPERIMENTAL_OFFLOAD_SUPPORT "" CACHE STRING "Compile Flang-RT with GPU support (CUDA or OpenMP)")
 set_property(CACHE FLANG_RT_EXPERIMENTAL_OFFLOAD_SUPPORT PROPERTY STRINGS
     ""
diff --git a/flang-rt/cmake/modules/AddFlangRT.cmake b/flang-rt/cmake/modules/AddFlangRT.cmake
index aa8adedf61752a..87ec58b2e854eb 100644
--- a/flang-rt/cmake/modules/AddFlangRT.cmake
+++ b/flang-rt/cmake/modules/AddFlangRT.cmake
@@ -16,7 +16,8 @@
 #   STATIC
 #     Build a static (.a/.lib) library
 #   OBJECT
-#     Create only object files without static/dynamic library
+#     Always create an object library.
+#     Without SHARED/STATIC, build only the object library.
 #   INSTALL_WITH_TOOLCHAIN
 #     Install library into Clang's resource directory so it can be found by the
 #     Flang driver during compilation, including tests
@@ -48,17 +49,73 @@ function (add_flangrt_library name)
 ")
   endif ()

-  # Forward libtype to add_library
-  set(extra_args "")
-  if (ARG_SHARED)
-    list(APPEND extra_args SHARED)
+  # Internal names of libraries. If called with just single type option, use
+  # the default name for it. Name of targets must only depend on function
+  # arguments to be predictable for callers.
+  set(name_static "${name}.static")
+  set(name_shared "${name}.shared")
+  set(name_object "obj.${name}")
+  if (ARG_STATIC AND NOT ARG_SHARED)
+    set(name_static "${name}")
+  elseif (NOT ARG_STATIC AND ARG_SHARED)
+    set(name_shared "${name}")
+  elseif (NOT ARG_STATIC AND NOT ARG_SHARED AND ARG_OBJECT)
+    set(name_object "${name}")
+  elseif (NOT ARG_STATIC AND NOT ARG_SHARED AND NOT ARG_OBJECT)
+    # Only one of them will actually be built.
+    set(name_static "${name}")
+    set(name_shared "${name}")
   endif ()
-  if (ARG_STATIC)
-    list(APPEND extra_args STATIC)
+
+  # Determine what to build. If not explicitly specified, honor
+  # BUILD_SHARED_LIBS (e.g. for unittest libraries). If can build s
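
Editorial sketch, not part of the patch: the target-name selection in the `add_flangrt_library` hunk above can be modelled in a few lines of Python, which may make the four branches easier to check. The function name `flangrt_target_names` and its keyword arguments are hypothetical; only the naming rules themselves are taken from the quoted CMake code.

```python
def flangrt_target_names(name, static=False, shared=False, object_=False):
    """Model of the name selection in the quoted AddFlangRT.cmake hunk.

    `static`, `shared` and `object_` stand in for ARG_STATIC, ARG_SHARED
    and ARG_OBJECT. This helper is hypothetical and for illustration only.
    """
    # Defaults: every flavour gets a decorated internal name.
    names = {
        "static": f"{name}.static",
        "shared": f"{name}.shared",
        "object": f"obj.{name}",
    }
    if static and not shared:
        # Only a static library was requested: it gets the plain name.
        names["static"] = name
    elif shared and not static:
        # Only a shared library was requested: it gets the plain name.
        names["shared"] = name
    elif not static and not shared and object_:
        # Object library only.
        names["object"] = name
    elif not static and not shared and not object_:
        # Nothing requested explicitly; per the patch comment, only one
        # of the two will actually be built, so both may use the plain name.
        names["static"] = name
        names["shared"] = name
    return names


if __name__ == "__main__":
    print(flangrt_target_names("flang_rt", static=True, shared=True))
```

When both `static` and `shared` are requested, no branch fires and all three decorated names are kept, which matches the patch comment that target names depend only on the function arguments.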
[llvm-branch-commits] [Flang] Introduce FortranSupport (PR #122069)
https://github.com/Meinersbur closed https://github.com/llvm/llvm-project/pull/122069
[llvm-branch-commits] [clang] [flang] [lld] [Flang] Rename libFortranRuntime.a to libflang_rt.a (PR #122341)
llvmbot wrote:

@llvm/pr-subscribers-lld

@llvm/pr-subscribers-flang-driver

Author: Michael Kruse (Meinersbur)

Changes

The future name of Flang's runtime component is `flang_rt`, as already used in PR #110217 (Flang-RT). Since the flang driver has to select the runtime to link, both build instructions must agree on the name.

Extracted out of #110217

---

Patch is 23.32 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/122341.diff

26 Files Affected:

- (modified) clang/lib/Driver/ToolChains/CommonArgs.cpp (+2-2)
- (modified) clang/lib/Driver/ToolChains/Flang.cpp (+4-4)
- (modified) flang/CMakeLists.txt (+1-1)
- (modified) flang/docs/FlangDriver.md (+3-3)
- (modified) flang/docs/GettingStarted.md (+3-3)
- (modified) flang/docs/OpenACC-descriptor-management.md (+1-1)
- (modified) flang/docs/ReleaseNotes.md (+2)
- (modified) flang/examples/ExternalHelloWorld/CMakeLists.txt (+1-1)
- (modified) flang/lib/Optimizer/Builder/IntrinsicCall.cpp (+1-1)
- (modified) flang/runtime/CMakeLists.txt (+23-17)
- (modified) flang/runtime/CUDA/CMakeLists.txt (+1-1)
- (modified) flang/runtime/Float128Math/CMakeLists.txt (+1-1)
- (modified) flang/runtime/time-intrinsic.cpp (+1-1)
- (modified) flang/test/CMakeLists.txt (+7-1)
- (modified) flang/test/Driver/gcc-toolchain-install-dir.f90 (+1-1)
- (modified) flang/test/Driver/linker-flags.f90 (+4-4)
- (modified) flang/test/Driver/msvc-dependent-lib-flags.f90 (+4-4)
- (modified) flang/test/Driver/nostdlib.f90 (+1-1)
- (modified) flang/test/Runtime/no-cpp-dep.c (+1-1)
- (modified) flang/test/lit.cfg.py (+1-1)
- (modified) flang/tools/f18/CMakeLists.txt (+4-4)
- (modified) flang/unittests/CMakeLists.txt (+1-1)
- (modified) flang/unittests/Evaluate/CMakeLists.txt (+2-2)
- (modified) flang/unittests/Runtime/CMakeLists.txt (+1-1)
- (modified) flang/unittests/Runtime/CUDA/CMakeLists.txt (+1-1)
- (modified) lld/COFF/MinGW.cpp (+1-1)

```diff
diff --git a/clang/lib/Driver/ToolChains/CommonArgs.cpp b/clang/lib/Driver/ToolChains/CommonArgs.cpp
index b5273dd8cf1e3a..c7b0a660ee021f 100644
--- a/clang/lib/Driver/ToolChains/CommonArgs.cpp
+++ b/clang/lib/Driver/ToolChains/CommonArgs.cpp
@@ -1321,7 +1321,7 @@ void tools::addOpenMPHostOffloadingArgs(const Compilation &C,
 /// Add Fortran runtime libs
 void tools::addFortranRuntimeLibs(const ToolChain &TC, const ArgList &Args,
                                   llvm::opt::ArgStringList &CmdArgs) {
-  // Link FortranRuntime
+  // Link flang_rt
   // These are handled earlier on Windows by telling the frontend driver to
   // add the correct libraries to link against as dependents in the object
   // file.
@@ -1337,7 +1337,7 @@ void tools::addFortranRuntimeLibs(const ToolChain &TC, const ArgList &Args,
       if (AsNeeded)
         addAsNeededOption(TC, Args, CmdArgs, /*as_needed=*/false);
     }
-    CmdArgs.push_back("-lFortranRuntime");
+    CmdArgs.push_back("-lflang_rt");
     addArchSpecificRPath(TC, Args, CmdArgs);
   }

diff --git a/clang/lib/Driver/ToolChains/Flang.cpp b/clang/lib/Driver/ToolChains/Flang.cpp
index f1bf32b3238270..68a17edf8ca341 100644
--- a/clang/lib/Driver/ToolChains/Flang.cpp
+++ b/clang/lib/Driver/ToolChains/Flang.cpp
@@ -360,26 +360,26 @@ static void processVSRuntimeLibrary(const ToolChain &TC, const ArgList &Args,
   case options::OPT__SLASH_MT:
     CmdArgs.push_back("-D_MT");
     CmdArgs.push_back("--dependent-lib=libcmt");
-    CmdArgs.push_back("--dependent-lib=FortranRuntime.static.lib");
+    CmdArgs.push_back("--dependent-lib=flang_rt.static.lib");
     break;
   case options::OPT__SLASH_MTd:
     CmdArgs.push_back("-D_MT");
     CmdArgs.push_back("-D_DEBUG");
     CmdArgs.push_back("--dependent-lib=libcmtd");
-    CmdArgs.push_back("--dependent-lib=FortranRuntime.static_dbg.lib");
+    CmdArgs.push_back("--dependent-lib=flang_rt.static_dbg.lib");
     break;
   case options::OPT__SLASH_MD:
     CmdArgs.push_back("-D_MT");
     CmdArgs.push_back("-D_DLL");
     CmdArgs.push_back("--dependent-lib=msvcrt");
-    CmdArgs.push_back("--dependent-lib=FortranRuntime.dynamic.lib");
+    CmdArgs.push_back("--dependent-lib=flang_rt.dynamic.lib");
     break;
   case options::OPT__SLASH_MDd:
     CmdArgs.push_back("-D_MT");
     CmdArgs.push_back("-D_DEBUG");
     CmdArgs.push_back("-D_DLL");
     CmdArgs.push_back("--dependent-lib=msvcrtd");
-    CmdArgs.push_back("--dependent-lib=FortranRuntime.dynamic_dbg.lib");
+    CmdArgs.push_back("--dependent-lib=flang_rt.dynamic_dbg.lib");
     break;
   }
 }
diff --git a/flang/CMakeLists.txt b/flang/CMakeLists.txt
index 7d6dcb5c184a52..8a8b8bfa73b007 100644
--- a/flang/CMakeLists.txt
+++ b/flang/CMakeLists.txt
@@ -301,7 +301,7 @@ set(FLANG_DEFAULT_LINKER "" CACHE STRING
   "Default linker to use (linker name or absolute path, empty for platform default)")

 set(FLANG_DEFAULT_RTLIB "" CACHE STRING
-   "Default Fortran runtime library to use (\"libFortranRuntime\"), leave empty for platfo
```
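
Editorial sketch, not part of the patch: after the rename, the four `case` arms of `processVSRuntimeLibrary` shown in the diff amount to a lookup table from the MSVC runtime option to the arguments pushed onto `CmdArgs`. The Python below restates that table. The dictionary keys (`"/MT"`, `"/MTd"`, `"/MD"`, `"/MDd"`) are shorthand chosen here for the `OPT__SLASH_*` cases, and `msvc_runtime_args` is a hypothetical helper; the flag strings are copied from the `+` lines of the diff.

```python
# Arguments per MSVC runtime flavour, as listed in the renamed cases of the
# quoted diff. The keys are illustrative shorthand for options::OPT__SLASH_*.
MSVC_RUNTIME_ARGS = {
    "/MT": [
        "-D_MT",
        "--dependent-lib=libcmt",
        "--dependent-lib=flang_rt.static.lib",
    ],
    "/MTd": [
        "-D_MT",
        "-D_DEBUG",
        "--dependent-lib=libcmtd",
        "--dependent-lib=flang_rt.static_dbg.lib",
    ],
    "/MD": [
        "-D_MT",
        "-D_DLL",
        "--dependent-lib=msvcrt",
        "--dependent-lib=flang_rt.dynamic.lib",
    ],
    "/MDd": [
        "-D_MT",
        "-D_DEBUG",
        "-D_DLL",
        "--dependent-lib=msvcrtd",
        "--dependent-lib=flang_rt.dynamic_dbg.lib",
    ],
}


def msvc_runtime_args(option):
    """Return a copy of the argument list for one runtime flavour.

    Hypothetical helper; raises KeyError for an unknown option.
    """
    return list(MSVC_RUNTIME_ARGS[option])


if __name__ == "__main__":
    for opt in MSVC_RUNTIME_ARGS:
        print(opt, msvc_runtime_args(opt)[-1])
```

Laid out this way, it is easy to see that each flavour pairs one C runtime (`libcmt`, `libcmtd`, `msvcrt`, `msvcrtd`) with the matching `flang_rt.*.lib` variant, which is why the driver and the runtime build must agree on the new name.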
[llvm-branch-commits] [flang] [Flang] Optionally do not compile the runtime in-tree (PR #122336)
https://github.com/Meinersbur updated https://github.com/llvm/llvm-project/pull/122336

>From 4c676f468ba344ac0c388583a4ed28035d05ae89 Mon Sep 17 00:00:00 2001
From: Michael Kruse
Date: Fri, 24 Jan 2025 15:00:16 +0100
Subject: [PATCH] users/meinersbur/flang_runtime_FLANG_INCLUDE_RUNTIME

---
 flang/CMakeLists.txt| 6 +-
 flang/test/CMakeLists.txt | 6 +-
 flang/test/Driver/ctofortran.f90| 1 +
 flang/test/Driver/exec.f90 | 1 +
 flang/test/Runtime/no-cpp-dep.c | 2 +-
 flang/test/lit.cfg.py | 5 -
 flang/test/lit.site.cfg.py.in | 2 ++
 flang/tools/f18/CMakeLists.txt | 2 +-
 flang/unittests/CMakeLists.txt | 11 +-
 flang/unittests/Evaluate/CMakeLists.txt | 27 +
 10 files changed, 44 insertions(+), 19 deletions(-)

diff --git a/flang/CMakeLists.txt b/flang/CMakeLists.txt
index b619553ef83021..7d6dcb5c184a52 100644
--- a/flang/CMakeLists.txt
+++ b/flang/CMakeLists.txt
@@ -247,6 +247,8 @@ else()
   include_directories(SYSTEM ${MLIR_TABLEGEN_OUTPUT_DIR})
 endif()

+option(FLANG_INCLUDE_RUNTIME "Build the runtime in-tree (deprecated; to be replaced with LLVM_ENABLE_RUNTIMES=flang-rt)" ON)
+
 set(FLANG_TOOLS_INSTALL_DIR "${CMAKE_INSTALL_BINDIR}" CACHE PATH
     "Path for binary subdirectory (defaults to '${CMAKE_INSTALL_BINDIR}')")
 mark_as_advanced(FLANG_TOOLS_INSTALL_DIR)
@@ -487,7 +489,9 @@ if (FLANG_CUF_RUNTIME)
   find_package(CUDAToolkit REQUIRED)
 endif()

-add_subdirectory(runtime)
+if (FLANG_INCLUDE_RUNTIME)
+  add_subdirectory(runtime)
+endif ()

 if (LLVM_INCLUDE_EXAMPLES)
   add_subdirectory(examples)
diff --git a/flang/test/CMakeLists.txt b/flang/test/CMakeLists.txt
index cab214c2ef4c8c..e398e0786147aa 100644
--- a/flang/test/CMakeLists.txt
+++ b/flang/test/CMakeLists.txt
@@ -71,9 +71,13 @@ set(FLANG_TEST_DEPENDS
   llvm-objdump
   llvm-readobj
   split-file
-  FortranRuntime
   FortranDecimal
 )
+
+if (FLANG_INCLUDE_RUNTIME)
+  list(APPEND FLANG_TEST_DEPENDS FortranRuntime)
+endif ()
+
 if (LLVM_ENABLE_PLUGINS AND NOT WIN32)
   list(APPEND FLANG_TEST_DEPENDS Bye)
 endif()
diff --git a/flang/test/Driver/ctofortran.f90 b/flang/test/Driver/ctofortran.f90
index 78eac32133b18e..10c7adaccc9588 100644
--- a/flang/test/Driver/ctofortran.f90
+++ b/flang/test/Driver/ctofortran.f90
@@ -1,4 +1,5 @@
 ! UNSUPPORTED: system-windows
+! REQUIRES: flang-rt
 ! RUN: split-file %s %t
 ! RUN: chmod +x %t/runtest.sh
 ! RUN: %t/runtest.sh %t %t/ffile.f90 %t/cfile.c %flang | FileCheck %s
diff --git a/flang/test/Driver/exec.f90 b/flang/test/Driver/exec.f90
index fd174005ddf62a..9ca91ee24011c9 100644
--- a/flang/test/Driver/exec.f90
+++ b/flang/test/Driver/exec.f90
@@ -1,4 +1,5 @@
 ! UNSUPPORTED: system-windows
+! REQUIRES: flang-rt
 ! Verify that flang can correctly build executables.

 ! RUN: %flang %s -o %t
diff --git a/flang/test/Runtime/no-cpp-dep.c b/flang/test/Runtime/no-cpp-dep.c
index b1a5fa004014cc..7303ce63fdec41 100644
--- a/flang/test/Runtime/no-cpp-dep.c
+++ b/flang/test/Runtime/no-cpp-dep.c
@@ -3,7 +3,7 @@ This test makes sure that flang's runtime does not depend on the C++ runtime
 library. It tries to link this simple file against libFortranRuntime.a with
 a C compiler.

-REQUIRES: c-compiler
+REQUIRES: c-compiler, flang-rt

 RUN: %if system-aix %{ export OBJECT_MODE=64 %}
 RUN: %cc -std=c99 %s -I%include %libruntime -lm \
diff --git a/flang/test/lit.cfg.py b/flang/test/lit.cfg.py
index c452b6d231c89f..f4580afc8c47b1 100644
--- a/flang/test/lit.cfg.py
+++ b/flang/test/lit.cfg.py
@@ -163,10 +163,13 @@
     ToolSubst("%not_todo_abort_cmd", command=FindTool("not"), unresolved="fatal")
 )

+if config.flang_include_runtime:
+    config.available_features.add("flang-rt")
+
 # Define some variables to help us test that the flang runtime doesn't depend on
 # the C++ runtime libraries. For this we need a C compiler. If for some reason
 # we don't have one, we can just disable the test.
-if config.cc:
+if config.flang_include_runtime and config.cc:
     libruntime = os.path.join(config.flang_lib_dir, "libFortranRuntime.a")
     include = os.path.join(config.flang_src_dir, "include")

diff --git a/flang/test/lit.site.cfg.py.in b/flang/test/lit.site.cfg.py.in
index d1a0ac763cf8a0..697ba3fa797633 100644
--- a/flang/test/lit.site.cfg.py.in
+++ b/flang/test/lit.site.cfg.py.in
@@ -1,6 +1,7 @@
 @LIT_SITE_CFG_IN_HEADER@

 import sys
+import lit.util

 config.llvm_tools_dir = lit_config.substitute("@LLVM_TOOLS_DIR@")
 config.llvm_shlib_dir = lit_config.substitute(path(r"@SHLIBDIR@"))
@@ -32,6 +33,7 @@ else:
     config.openmp_module_dir = None
 config.flang_runtime_f128_math_lib = "@FLANG_RUNTIME_F128_MATH_LIB@"
 config.have_ldbl_mant_dig_113 = "@HAVE_LDBL_MANT_DIG_113@"
+config.flang_include_runtime = lit.util.pythonize_bool("@FLANG_INCLUDE_RUNTIME@")
 import lit.llvm
 lit.llvm.initialize(lit_config, config)

diff --git a/flang/tools/f18/CMakeLists
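
Editorial sketch, not part of the patch: the test gating above works in two steps. The CMake value of `FLANG_INCLUDE_RUNTIME` is turned into a Python boolean in `lit.site.cfg.py.in`, and `lit.cfg.py` then adds the `flang-rt` feature that the `REQUIRES:` lines refer to. The self-contained model below imitates that flow without importing lit. `to_bool`, `features_for` and `is_selected` are hypothetical stand-ins (they are not lit APIs), and `REQUIRES` is simplified here to a plain list of feature names that must all be present.

```python
def to_bool(value):
    """Hypothetical stand-in for the boolean conversion done in
    lit.site.cfg.py.in; accepts common CMake-style spellings."""
    text = str(value).strip().lower()
    if text in ("1", "true", "on", "yes"):
        return True
    if text in ("", "0", "false", "off", "no"):
        return False
    raise ValueError(f"not a boolean-like value: {value!r}")


def features_for(flang_include_runtime, base_features=()):
    """Add "flang-rt" to the feature set when the in-tree runtime is built,
    mirroring the lines added to lit.cfg.py in the quoted patch."""
    features = set(base_features)
    if to_bool(flang_include_runtime):
        features.add("flang-rt")
    return features


def is_selected(requires, features):
    """Simplified REQUIRES check: every named feature must be available."""
    return all(name in features for name in requires)


if __name__ == "__main__":
    for setting in ("ON", "OFF"):
        feats = features_for(setting, base_features={"c-compiler"})
        print(setting, is_selected(["c-compiler", "flang-rt"], feats))
```

With the runtime disabled, a test such as `no-cpp-dep.c` that requires both `c-compiler` and `flang-rt` is skipped rather than failing at link time, which appears to be the intent of the added `REQUIRES` lines.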
[llvm-branch-commits] [flang] [Flang] Optionally do not compile the runtime in-tree (PR #122336)
https://github.com/Meinersbur edited https://github.com/llvm/llvm-project/pull/122336
[llvm-branch-commits] [flang] [Flang] Promote FortranEvaluateTesting library (PR #122334)
https://github.com/Meinersbur closed https://github.com/llvm/llvm-project/pull/122334
[llvm-branch-commits] [clang] [flang] [lld] [Flang] Rename libFortranRuntime.a to libflang_rt.a (PR #122341)
https://github.com/Meinersbur updated https://github.com/llvm/llvm-project/pull/122341

>From 875607fdecfada90a80ec732637ea9595fe72ba3 Mon Sep 17 00:00:00 2001
From: Michael Kruse
Date: Fri, 24 Jan 2025 16:42:24 +0100
Subject: [PATCH] [Flang] Rename libFortranRuntime.a to libflang_rt.a

---
 clang/lib/Driver/ToolChains/CommonArgs.cpp| 4 +-
 clang/lib/Driver/ToolChains/Flang.cpp | 8 ++--
 flang/CMakeLists.txt | 2 +-
 flang/docs/FlangDriver.md | 6 +--
 flang/docs/GettingStarted.md | 6 +--
 flang/docs/OpenACC-descriptor-management.md | 2 +-
 flang/docs/ReleaseNotes.md| 2 +
 .../ExternalHelloWorld/CMakeLists.txt | 2 +-
 flang/lib/Optimizer/Builder/IntrinsicCall.cpp | 2 +-
 flang/runtime/CMakeLists.txt | 40 +++
 flang/runtime/CUDA/CMakeLists.txt | 2 +-
 flang/runtime/Float128Math/CMakeLists.txt | 2 +-
 flang/runtime/time-intrinsic.cpp | 2 +-
 flang/test/CMakeLists.txt | 8 +++-
 .../test/Driver/gcc-toolchain-install-dir.f90 | 2 +-
 flang/test/Driver/linker-flags.f90| 8 ++--
 .../test/Driver/msvc-dependent-lib-flags.f90 | 8 ++--
 flang/test/Driver/nostdlib.f90| 2 +-
 flang/test/Runtime/no-cpp-dep.c | 2 +-
 flang/test/lit.cfg.py | 2 +-
 flang/tools/f18/CMakeLists.txt| 8 ++--
 flang/unittests/CMakeLists.txt| 2 +-
 flang/unittests/Evaluate/CMakeLists.txt | 4 +-
 flang/unittests/Runtime/CMakeLists.txt| 2 +-
 flang/unittests/Runtime/CUDA/CMakeLists.txt | 2 +-
 lld/COFF/MinGW.cpp| 2 +-
 26 files changed, 73 insertions(+), 59 deletions(-)

diff --git a/clang/lib/Driver/ToolChains/CommonArgs.cpp b/clang/lib/Driver/ToolChains/CommonArgs.cpp
index b5273dd8cf1e3a..c7b0a660ee021f 100644
--- a/clang/lib/Driver/ToolChains/CommonArgs.cpp
+++ b/clang/lib/Driver/ToolChains/CommonArgs.cpp
@@ -1321,7 +1321,7 @@ void tools::addOpenMPHostOffloadingArgs(const Compilation &C,
 /// Add Fortran runtime libs
 void tools::addFortranRuntimeLibs(const ToolChain &TC, const ArgList &Args,
                                   llvm::opt::ArgStringList &CmdArgs) {
-  // Link FortranRuntime
+  // Link flang_rt
   // These are handled earlier on Windows by telling the frontend driver to
   // add the correct libraries to link against as dependents in the object
   // file.
@@ -1337,7 +1337,7 @@ void tools::addFortranRuntimeLibs(const ToolChain &TC, const ArgList &Args,
       if (AsNeeded)
         addAsNeededOption(TC, Args, CmdArgs, /*as_needed=*/false);
     }
-    CmdArgs.push_back("-lFortranRuntime");
+    CmdArgs.push_back("-lflang_rt");
     addArchSpecificRPath(TC, Args, CmdArgs);
   }

diff --git a/clang/lib/Driver/ToolChains/Flang.cpp b/clang/lib/Driver/ToolChains/Flang.cpp
index f1bf32b3238270..68a17edf8ca341 100644
--- a/clang/lib/Driver/ToolChains/Flang.cpp
+++ b/clang/lib/Driver/ToolChains/Flang.cpp
@@ -360,26 +360,26 @@ static void processVSRuntimeLibrary(const ToolChain &TC, const ArgList &Args,
   case options::OPT__SLASH_MT:
     CmdArgs.push_back("-D_MT");
     CmdArgs.push_back("--dependent-lib=libcmt");
-    CmdArgs.push_back("--dependent-lib=FortranRuntime.static.lib");
+    CmdArgs.push_back("--dependent-lib=flang_rt.static.lib");
     break;
   case options::OPT__SLASH_MTd:
     CmdArgs.push_back("-D_MT");
     CmdArgs.push_back("-D_DEBUG");
     CmdArgs.push_back("--dependent-lib=libcmtd");
-    CmdArgs.push_back("--dependent-lib=FortranRuntime.static_dbg.lib");
+    CmdArgs.push_back("--dependent-lib=flang_rt.static_dbg.lib");
     break;
   case options::OPT__SLASH_MD:
     CmdArgs.push_back("-D_MT");
     CmdArgs.push_back("-D_DLL");
     CmdArgs.push_back("--dependent-lib=msvcrt");
-    CmdArgs.push_back("--dependent-lib=FortranRuntime.dynamic.lib");
+    CmdArgs.push_back("--dependent-lib=flang_rt.dynamic.lib");
     break;
   case options::OPT__SLASH_MDd:
     CmdArgs.push_back("-D_MT");
     CmdArgs.push_back("-D_DEBUG");
     CmdArgs.push_back("-D_DLL");
     CmdArgs.push_back("--dependent-lib=msvcrtd");
-    CmdArgs.push_back("--dependent-lib=FortranRuntime.dynamic_dbg.lib");
+    CmdArgs.push_back("--dependent-lib=flang_rt.dynamic_dbg.lib");
     break;
   }
 }
diff --git a/flang/CMakeLists.txt b/flang/CMakeLists.txt
index 7d6dcb5c184a52..8a8b8bfa73b007 100644
--- a/flang/CMakeLists.txt
+++ b/flang/CMakeLists.txt
@@ -301,7 +301,7 @@ set(FLANG_DEFAULT_LINKER "" CACHE STRING
   "Default linker to use (linker name or absolute path, empty for platform default)")

 set(FLANG_DEFAULT_RTLIB "" CACHE STRING
-   "Default Fortran runtime library to use (\"libFortranRuntime\"), leave empty for platform default.")
+   "Default Fortran runtime library to use (\"libflang_rt\"), leave empty for platform default.")

 if (NOT(FLANG_DEFAULT_RTLIB STREQUAL ""))
   message(W
[llvm-branch-commits] [flang] [Flang] Promote FortranEvaluateTesting library (PR #122334)
https://github.com/Meinersbur updated https://github.com/llvm/llvm-project/pull/122334

>From 71015c8f9ab17431d052472aec99dc67929a166e Mon Sep 17 00:00:00 2001
From: Michael Kruse
Date: Fri, 24 Jan 2025 16:30:47 +0100
Subject: [PATCH] [Flang] Promote FortranEvaluateTesting library

---
 .../flang/Testing}/fp-testing.h | 14 ++--
 .../flang/Testing}/testing.h | 14 ++--
 flang/lib/CMakeLists.txt | 4 +++
 flang/lib/Testing/CMakeLists.txt | 20 +++
 .../Evaluate => lib/Testing}/fp-testing.cpp | 10 +-
 .../Evaluate => lib/Testing}/testing.cpp | 10 +-
 flang/unittests/Evaluate/CMakeLists.txt | 35 ++-
 .../Evaluate/ISO-Fortran-binding.cpp | 2 +-
 .../Evaluate/bit-population-count.cpp | 2 +-
 flang/unittests/Evaluate/expression.cpp | 2 +-
 flang/unittests/Evaluate/folding.cpp | 2 +-
 flang/unittests/Evaluate/integer.cpp | 2 +-
 flang/unittests/Evaluate/intrinsics.cpp | 2 +-
 .../Evaluate/leading-zero-bit-count.cpp | 2 +-
 flang/unittests/Evaluate/logical.cpp | 2 +-
 flang/unittests/Evaluate/real.cpp | 4 +--
 flang/unittests/Evaluate/reshape.cpp | 2 +-
 flang/unittests/Evaluate/uint128.cpp | 2 +-
 18 files changed, 87 insertions(+), 44 deletions(-)
 rename flang/{unittests/Evaluate => include/flang/Testing}/fp-testing.h (54%)
 rename flang/{unittests/Evaluate => include/flang/Testing}/testing.h (74%)
 create mode 100644 flang/lib/Testing/CMakeLists.txt
 rename flang/{unittests/Evaluate => lib/Testing}/fp-testing.cpp (87%)
 rename flang/{unittests/Evaluate => lib/Testing}/testing.cpp (88%)

diff --git a/flang/unittests/Evaluate/fp-testing.h b/flang/include/flang/Testing/fp-testing.h
similarity index 54%
rename from flang/unittests/Evaluate/fp-testing.h
rename to flang/include/flang/Testing/fp-testing.h
index 9091963a99b32d..e223d2ef7d1b8b 100644
--- a/flang/unittests/Evaluate/fp-testing.h
+++ b/flang/include/flang/Testing/fp-testing.h
@@ -1,5 +1,13 @@
-#ifndef FORTRAN_TEST_EVALUATE_FP_TESTING_H_
-#define FORTRAN_TEST_EVALUATE_FP_TESTING_H_
+//===-- include/flang/Testing/fp-testing.h --*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+
+#ifndef FORTRAN_TESTING_FP_TESTING_H_
+#define FORTRAN_TESTING_FP_TESTING_H_

 #include "flang/Common/target-rounding.h"
 #include
@@ -24,4 +32,4 @@ class ScopedHostFloatingPointEnvironment {
 #endif
 };

-#endif // FORTRAN_TEST_EVALUATE_FP_TESTING_H_
+#endif /* FORTRAN_TESTING_FP_TESTING_H_ */
diff --git a/flang/unittests/Evaluate/testing.h b/flang/include/flang/Testing/testing.h
similarity index 74%
rename from flang/unittests/Evaluate/testing.h
rename to flang/include/flang/Testing/testing.h
index 422e2853c05bc6..404650c9a89f2c 100644
--- a/flang/unittests/Evaluate/testing.h
+++ b/flang/include/flang/Testing/testing.h
@@ -1,5 +1,13 @@
-#ifndef FORTRAN_EVALUATE_TESTING_H_
-#define FORTRAN_EVALUATE_TESTING_H_
+//===-- include/flang/Testing/testing.h -*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+
+#ifndef FORTRAN_TESTING_TESTING_H_
+#define FORTRAN_TESTING_TESTING_H_

 #include
 #include
@@ -33,4 +41,4 @@ FailureDetailPrinter Match(const char *file, int line, const std::string &want,
 FailureDetailPrinter Compare(const char *file, int line, const char *xs,
     const char *rel, const char *ys, std::uint64_t x, std::uint64_t y);
 } // namespace testing
-#endif // FORTRAN_EVALUATE_TESTING_H_
+#endif /* FORTRAN_TESTING_TESTING_H_ */
diff --git a/flang/lib/CMakeLists.txt b/flang/lib/CMakeLists.txt
index 05c3535b09b3d3..8b201d9a758a80 100644
--- a/flang/lib/CMakeLists.txt
+++ b/flang/lib/CMakeLists.txt
@@ -8,3 +8,7 @@ add_subdirectory(Frontend)
 add_subdirectory(FrontendTool)

 add_subdirectory(Optimizer)
+
+if (FLANG_INCLUDE_TESTS)
+  add_subdirectory(Testing)
+endif ()
diff --git a/flang/lib/Testing/CMakeLists.txt b/flang/lib/Testing/CMakeLists.txt
new file mode 100644
index 00..8051bc09736d16
--- /dev/null
+++ b/flang/lib/Testing/CMakeLists.txt
@@ -0,0 +1,20 @@
+#===-- lib/Testing/CMakeLists.txt --===#
+#
+# Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+# See https://llvm.org/LICENSE.txt for license information.
+# SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+#
+#===
[llvm-branch-commits] [flang] [Flang] Promote FortranEvaluateTesting library (PR #122334)
Meinersbur wrote:

GitHub interpreted pushing to the target branch (NOT main) of the patch series as "merging". There seems to be no way to re-open this PR, so I will create a new one.

https://github.com/llvm/llvm-project/pull/122334
[llvm-branch-commits] [Flang] Introduce FortranSupport (PR #122069)
Meinersbur wrote:

GitHub interpreted pushing to the target branch (NOT main) of the patch series as "merging". There seems to be no way to re-open this PR, so I will create a new one.

https://github.com/llvm/llvm-project/pull/122069
[llvm-branch-commits] [flang] [Flang] Introduce FortranSupport (PR #122069)
Meinersbur wrote: Moving this PR out of the chain causes merge conflicts further down, making it even more difficult to keep the series consistent. At least the change from `std::optional` to `optional.h` is needed, or I cannot compile it with nvcc (I am not sure how you do). I understand that by submitting patches I accept some responsibility to maintain them. I don't want to do so if the code is not in a maintainable state. I will put this PR first in the chain, so at least it can be applied independently. > breaking it up based on today's runtime usage of it is a bit artificial (some > features from it that are not used today in the runtime may very well be > already usable and may be used tomorrow). The current FortranCommon is already artificial, presumably because it has grown organically: it holds everything that does not belong to some other library. Some of it (e.g. `genEntryBlock` from `OpenMP-utils.h`, `getFlangRepositoryString()` from `Version.h`) should conceptually never be used by the runtime. Splitting it up by usage actually makes it less artificial. Any code that is currently not used in the runtime should be assumed not to work with the runtime, e.g. because it causes an additional link dependency on `FortranCommon.a/so` or libc++, cannot be compiled with nvcc, or requires annotation for offloading. Similarly, code that is not being tested can be assumed not to work, either now or after future changes. That is, additional work to make it usable within the runtime is required anyway. Any other code, including code in other libraries, might also eventually be useful for the runtime (e.g. if we are to include a JIT). That is not a reason to preemptively make all of it a dependency of the runtime. Partial use of libraries is called "erroneous configuration" in LLVM, see https://github.com/llvm/llvm-project/commit/ebc3302725350c44aaf5f97ce7ba484e30b3efa8 Sorry, I cannot follow the argument. 
https://github.com/llvm/llvm-project/pull/122069 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [flang] [mlir] [mlir][OpenMP][flang] make private variable allocation implicit in omp.private (PR #124019)
https://github.com/ergawy edited https://github.com/llvm/llvm-project/pull/124019 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [flang] [mlir] [mlir][OpenMP][flang] make private variable allocation implicit in omp.private (PR #124019)
@@ -34,52 +34,48 @@ def PrivateClauseOp : OpenMP_Op<"private", [IsolatedFromAbove, RecipeInterface]> let description = [{ This operation provides a declaration of how to implement the [first]privatization of a variable. The dialect users should provide -information about how to create an instance of the type in the alloc region, -how to initialize the copy from the original item in the copy region, and if -needed, how to deallocate allocated memory in the dealloc region. +which type should be allocated for this variable. The allocated (usually by +alloca) variable is passed to the initialization region which does everything +else (e.g. initialization of Fortran runtime descriptors). Information about +how to initialize the copy from the original item should be given in the +copy region, and if needed, how to deallocate memory (allocated by the +initialization region) in the dealloc region. ergawy wrote: Ah, thanks for the clarification. I see you expanded the docs below. https://github.com/llvm/llvm-project/pull/124019 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [Analysis] Add DebugInfoCache analysis (PR #118629)
https://github.com/artempyanykh updated https://github.com/llvm/llvm-project/pull/118629 >From 54bc13d26e0c0c3cd9b2205ca3453c58a815be4e Mon Sep 17 00:00:00 2001 From: Artem Pianykh Date: Sun, 15 Sep 2024 10:51:38 -0700 Subject: [PATCH] [Analysis] Add DebugInfoCache analysis Summary: The analysis simply primes and caches DebugInfoFinders for each DICompileUnit in a module. This allows (future) callers like CoroSplitPass to compute global debug info metadata (required for coroutine function cloning) much faster. Specifically, pay the price of DICompileUnit processing only once per compile unit, rather than once per coroutine. Test Plan: Added a smoke test for the new analysis ninja check-llvm-unit check-llvm stack-info: PR: https://github.com/llvm/llvm-project/pull/118629, branch: users/artempyanykh/fast-coro-upstream/10 --- llvm/include/llvm/Analysis/DebugInfoCache.h | 50 + llvm/include/llvm/IR/DebugInfo.h | 4 +- llvm/lib/Analysis/CMakeLists.txt | 1 + llvm/lib/Analysis/DebugInfoCache.cpp | 47 llvm/lib/Passes/PassBuilder.cpp | 1 + llvm/lib/Passes/PassRegistry.def | 1 + llvm/unittests/Analysis/CMakeLists.txt| 1 + .../unittests/Analysis/DebugInfoCacheTest.cpp | 211 ++ 8 files changed, 315 insertions(+), 1 deletion(-) create mode 100644 llvm/include/llvm/Analysis/DebugInfoCache.h create mode 100644 llvm/lib/Analysis/DebugInfoCache.cpp create mode 100644 llvm/unittests/Analysis/DebugInfoCacheTest.cpp diff --git a/llvm/include/llvm/Analysis/DebugInfoCache.h b/llvm/include/llvm/Analysis/DebugInfoCache.h new file mode 100644 index 00..dbd6802c99ea01 --- /dev/null +++ b/llvm/include/llvm/Analysis/DebugInfoCache.h @@ -0,0 +1,50 @@ +//===- llvm/Analysis/DebugInfoCache.h - debug info cache *- C++ -*-===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. 
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===--===// +// +// This file contains an analysis that builds a cache of debug info for each +// DICompileUnit in a module. +// +//===--===// + +#ifndef LLVM_ANALYSIS_DEBUGINFOCACHE_H +#define LLVM_ANALYSIS_DEBUGINFOCACHE_H + +#include "llvm/IR/DebugInfo.h" +#include "llvm/IR/PassManager.h" + +namespace llvm { + +/// Processes and caches debug info for each DICompileUnit in a module. +/// +/// The result of the analysis is a set of DebugInfoFinders primed on their +/// respective DICompileUnit. Such DebugInfoFinders can be used to speed up +/// function cloning which otherwise requires an expensive traversal of +/// DICompileUnit-level debug info. See an example usage in CoroSplit. +class DebugInfoCache { +public: + using DIFinderCache = SmallDenseMap; + DIFinderCache Result; + + DebugInfoCache(const Module &M); + + bool invalidate(Module &, const PreservedAnalyses &, + ModuleAnalysisManager::Invalidator &); +}; + +class DebugInfoCacheAnalysis +: public AnalysisInfoMixin { + friend AnalysisInfoMixin; + static AnalysisKey Key; + +public: + using Result = DebugInfoCache; + Result run(Module &M, ModuleAnalysisManager &); +}; +} // namespace llvm + +#endif diff --git a/llvm/include/llvm/IR/DebugInfo.h b/llvm/include/llvm/IR/DebugInfo.h index 73f45c3769be44..11907fbb7f20b3 100644 --- a/llvm/include/llvm/IR/DebugInfo.h +++ b/llvm/include/llvm/IR/DebugInfo.h @@ -120,11 +120,13 @@ class DebugInfoFinder { /// Process subprogram. void processSubprogram(DISubprogram *SP); + /// Process a compile unit. + void processCompileUnit(DICompileUnit *CU); + /// Clear all lists. 
void reset(); private: - void processCompileUnit(DICompileUnit *CU); void processScope(DIScope *Scope); void processType(DIType *DT); bool addCompileUnit(DICompileUnit *CU); diff --git a/llvm/lib/Analysis/CMakeLists.txt b/llvm/lib/Analysis/CMakeLists.txt index 0db5b80f336cb5..db9a569e301563 100644 --- a/llvm/lib/Analysis/CMakeLists.txt +++ b/llvm/lib/Analysis/CMakeLists.txt @@ -52,6 +52,7 @@ add_llvm_component_library(LLVMAnalysis DDGPrinter.cpp ConstraintSystem.cpp Delinearization.cpp + DebugInfoCache.cpp DemandedBits.cpp DependenceAnalysis.cpp DependenceGraphBuilder.cpp diff --git a/llvm/lib/Analysis/DebugInfoCache.cpp b/llvm/lib/Analysis/DebugInfoCache.cpp new file mode 100644 index 00..c1a3e89f0a6ccf --- /dev/null +++ b/llvm/lib/Analysis/DebugInfoCache.cpp @@ -0,0 +1,47 @@ +//===- llvm/Analysis/DebugInfoCache.cpp - debug info cache ===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===-
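The idea stated in the DebugInfoCache summary (pay for the per-compile-unit traversal once, then reuse the result for every coroutine) can be sketched without LLVM. What follows is a toy model, not the patch's code: `ToyCompileUnit`, `ToyFinder`, and `ToyDebugInfoCache` are hypothetical stand-ins for `DICompileUnit`, `DebugInfoFinder`, and `DebugInfoCache`, and the "traversal" is just a vector copy.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for a DICompileUnit: a name plus its debug-info entries.
struct ToyCompileUnit {
  std::string Name;
  std::vector<std::string> Types;
};

// Hypothetical stand-in for a DebugInfoFinder primed on one compile unit.
struct ToyFinder {
  std::vector<std::string> CollectedTypes;
};

// Primes one finder per compile unit up front; later lookups do no traversal.
// The cache stores pointers into `Units`, so `Units` must outlive the cache
// and must not be resized while the cache is in use.
class ToyDebugInfoCache {
public:
  explicit ToyDebugInfoCache(const std::vector<ToyCompileUnit> &Units) {
    for (const ToyCompileUnit &CU : Units) {
      ToyFinder Finder;
      Finder.CollectedTypes = CU.Types; // stands in for the expensive traversal
      ++TraversalCount;
      bool Inserted = Cache.emplace(&CU, std::move(Finder)).second;
      assert(Inserted && "each compile unit is primed exactly once");
      (void)Inserted;
    }
  }

  // Returns the primed finder, or nullptr if the unit was never cached.
  const ToyFinder *lookup(const ToyCompileUnit &CU) const {
    auto It = Cache.find(&CU);
    return It == Cache.end() ? nullptr : &It->second;
  }

  std::size_t traversals() const { return TraversalCount; }

private:
  std::unordered_map<const ToyCompileUnit *, ToyFinder> Cache;
  std::size_t TraversalCount = 0;
};
```

In this sketch a caller that clones many functions from the same unit calls `lookup` each time, while `traversals()` stays equal to the number of compile units. Returning `nullptr` for an unknown unit leaves the fallback (an uncached traversal) to the caller; that choice is an assumption of the sketch.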
[llvm-branch-commits] [llvm] [Coro] Use DebugInfoCache to speed up cloning in CoroSplitPass (PR #118630)
https://github.com/artempyanykh updated https://github.com/llvm/llvm-project/pull/118630 >From fc245ef152cfe134e8f9d6a39a7a38043163b7ce Mon Sep 17 00:00:00 2001 From: Artem Pianykh Date: Sun, 15 Sep 2024 11:00:00 -0700 Subject: [PATCH] [Coro] Use DebugInfoCache to speed up cloning in CoroSplitPass Summary: We can use a DebugInfoFinder from DebugInfoCache which is already primed on a compile unit to speed up collection of module-level debug info. The pass could likely be another 2x+ faster if we avoid rebuilding the set of global debug info. This needs further massaging of CloneFunction and ValueMapper, though, and can be done incrementally on top of this. Comparing performance of CoroSplitPass at various points in this stack, this is anecdata from a sample cpp file compiled with full debug info:

| | Baseline | IdentityMD set | Prebuilt CommonDI | Cached CU DIFinder (cur.) |
|---|---|---|---|---|
| CoroSplitPass | 306ms | 221ms | 68ms | 17ms |
| CoroCloner | 101ms | 72ms | 0.5ms | 0.5ms |
| CollectGlobalDI | - | - | 63ms | 13ms |
| Speed up | 1x | 1.4x | 4.5x | 18x |

Test Plan: ninja check-llvm-unit ninja check-llvm Compiled a sample cpp file with time trace to get the avg. duration of the pass and inner scopes. 
stack-info: PR: https://github.com/llvm/llvm-project/pull/118630, branch: users/artempyanykh/fast-coro-upstream/11 --- llvm/include/llvm/Transforms/Coroutines/ABI.h | 13 +++-- llvm/lib/Analysis/CGSCCPassManager.cpp| 7 +++ llvm/lib/Transforms/Coroutines/CoroSplit.cpp | 55 +++ llvm/test/Other/new-pass-manager.ll | 1 + llvm/test/Other/new-pm-defaults.ll| 1 + llvm/test/Other/new-pm-lto-defaults.ll| 1 + llvm/test/Other/new-pm-pgo-preinline.ll | 1 + .../Other/new-pm-thinlto-postlink-defaults.ll | 1 + .../new-pm-thinlto-postlink-pgo-defaults.ll | 1 + ...-pm-thinlto-postlink-samplepgo-defaults.ll | 1 + .../Other/new-pm-thinlto-prelink-defaults.ll | 1 + .../new-pm-thinlto-prelink-pgo-defaults.ll| 1 + ...w-pm-thinlto-prelink-samplepgo-defaults.ll | 1 + .../Analysis/CGSCCPassManagerTest.cpp | 4 +- 14 files changed, 72 insertions(+), 17 deletions(-) diff --git a/llvm/include/llvm/Transforms/Coroutines/ABI.h b/llvm/include/llvm/Transforms/Coroutines/ABI.h index 0b2d405f3caec4..2cf614b6bb1e2a 100644 --- a/llvm/include/llvm/Transforms/Coroutines/ABI.h +++ b/llvm/include/llvm/Transforms/Coroutines/ABI.h @@ -15,6 +15,7 @@ #ifndef LLVM_TRANSFORMS_COROUTINES_ABI_H #define LLVM_TRANSFORMS_COROUTINES_ABI_H +#include "llvm/Analysis/DebugInfoCache.h" #include "llvm/Analysis/TargetTransformInfo.h" #include "llvm/Transforms/Coroutines/CoroShape.h" #include "llvm/Transforms/Coroutines/MaterializationUtils.h" @@ -53,7 +54,8 @@ class BaseABI { // Perform the function splitting according to the ABI. 
virtual void splitCoroutine(Function &F, coro::Shape &Shape, SmallVectorImpl &Clones, - TargetTransformInfo &TTI) = 0; + TargetTransformInfo &TTI, + const DebugInfoCache *DICache) = 0; Function &F; coro::Shape &Shape; @@ -73,7 +75,8 @@ class SwitchABI : public BaseABI { void splitCoroutine(Function &F, coro::Shape &Shape, SmallVectorImpl &Clones, - TargetTransformInfo &TTI) override; + TargetTransformInfo &TTI, + const DebugInfoCache *DICache) override; }; class AsyncABI : public BaseABI { @@ -86,7 +89,8 @@ class AsyncABI : public BaseABI { void splitCoroutine(Function &F, coro::Shape &Shape, SmallVectorImpl &Clones, - TargetTransformInfo &TTI) override; + TargetTransformInfo &TTI, + const DebugInfoCache *DICache) override; }; class AnyRetconABI : public BaseABI { @@ -99,7 +103,8 @@ class AnyRetconABI : public BaseABI { void splitCoroutine(Function &F, coro::Shape &Shape, SmallVectorImpl &Clones, - TargetTransformInfo &TTI) override; + TargetTransformInfo &TTI, + const DebugInfoCache *DICache) override; }; } // end namespace coro diff --git a/llvm/lib/Analysis/CGSCCPassManager.cpp b/llvm/lib/Analysis/CGSCCPassManager.cpp index 948bc2435ab275..3ba085cdb0be8b 100644 --- a/llvm/lib/Analysis/CGSCCPassManager.cpp +++ b/llvm/lib/Analysis/CGSCCPassManager.cpp @@ -14,6 +14,7 @@ #include "llvm/ADT/SmallPtrSet.h" #include "llvm/ADT/SmallVector.h" #include "llvm/ADT/iterator_range.h" +#include "llvm/Analy
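The "Speed up" figures quoted in the commit message for PR #118630 appear to be the baseline CoroSplitPass time (306ms) divided by the time at each point in the stack. A small sketch of that arithmetic, assuming this reading of the table; `speedupTenths` is a hypothetical helper and not part of the patch:

```cpp
#include <cassert>
#include <cmath>

// Speed-up relative to the baseline, rounded to one decimal place and
// returned in tenths (for example, 45 means 4.5x). Returning an integer
// avoids comparing floating-point values for equality.
long speedupTenths(double BaselineMs, double CurrentMs) {
  assert(CurrentMs > 0.0 && "duration must be positive");
  return std::lround(10.0 * BaselineMs / CurrentMs);
}
```

With the CoroSplitPass timings from the commit message, 306/221, 306/68, and 306/17 round to 1.4x, 4.5x, and 18x, which matches the quoted row.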
[llvm-branch-commits] [flang] [mlir] [mlir][OpenMP][flang] make private variable allocation implicit in omp.private (PR #124019)
@@ -55,15 +55,19 @@ class MapsForPrivatizedSymbolsPass std::underlying_type_t>( llvm::omp::OpenMPOffloadMappingFlags::OMP_MAP_TO); Operation *definingOp = var.getDefiningOp(); -auto declOp = llvm::dyn_cast_or_null(definingOp); -assert(declOp && - "Expected defining Op of privatized var to be hlfir.declare"); +assert(definingOp && + "Privatizing a block argument without any hlfir.declare"); ergawy wrote: > MLIR values can come from two places: That's exactly my point. What prevents us from working with block args here? Why do we need to assume it is defined by an op? I am not against that; we can keep the assertion. But beyond the fact that we need to call `getBase` below, we only care about `var` as a `Value` and not about its defining op. https://github.com/llvm/llvm-project/pull/124019 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [flang] [mlir] [mlir][OpenMP][flang] make private variable allocation implicit in omp.private (PR #124019)
https://github.com/ergawy approved this pull request. LGTM! Thanks Tom! However, I have to admit, the `fir` dialect type system is still a "semi-"blackbox to me, so someone more familiar with it needs to carefully review changes in `PrivateReductionUtils.cpp`. https://github.com/llvm/llvm-project/pull/124019 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: Temporal divergence lowering (non i1) (PR #124298)
https://github.com/petar-avramovic created https://github.com/llvm/llvm-project/pull/124298 Record all uses outside cycle with divergent exit during propagateTemporalDivergence in Uniformity analysis. With this list of candidates for temporal divergence lowering, excluding known lane masks from control flow intrinsics, find sources from inside the cycle that are not i1 and uniform. Temporal divergence lowering (non i1): create copy(v_mov) to vgpr, with implicit exec (to stop other passes from moving this copy outside of the cycle) and use this vgpr outside of the cycle instead of original uniform source. >From 3e04401258c91639105b1f2f17a84badbdf928ae Mon Sep 17 00:00:00 2001 From: Petar Avramovic Date: Fri, 24 Jan 2025 16:56:30 +0100 Subject: [PATCH] AMDGPU/GlobalISel: Temporal divergence lowering (non i1) Record all uses outside cycle with divergent exit during propagateTemporalDivergence in Uniformity analysis. With this list of candidates for temporal divergence lowering, excluding known lane masks from control flow intrinsics, find sources from inside the cycle that are not i1 and uniform. Temporal divergence lowering (non i1): create copy(v_mov) to vgpr, with implicit exec (to stop other passes from moving this copy outside of the cycle) and use this vgpr outside of the cycle instead of original uniform source. 
--- llvm/include/llvm/ADT/GenericUniformityImpl.h | 37 +++ llvm/include/llvm/ADT/GenericUniformityInfo.h | 6 +++ llvm/lib/Analysis/UniformityAnalysis.cpp | 3 +- .../lib/CodeGen/MachineUniformityAnalysis.cpp | 8 ++-- .../AMDGPUGlobalISelDivergenceLowering.cpp| 47 ++- .../lib/Target/AMDGPU/AMDGPURegBankSelect.cpp | 24 -- llvm/lib/Target/AMDGPU/SILowerI1Copies.h | 6 +++ ...divergent-i1-phis-no-lane-mask-merging.mir | 7 +-- ...ergence-divergent-i1-used-outside-loop.mir | 19 .../divergence-temporal-divergent-reg.ll | 18 +++ .../divergence-temporal-divergent-reg.mir | 3 +- .../AMDGPU/GlobalISel/regbankselect-mui.ll| 17 +++ 12 files changed, 153 insertions(+), 42 deletions(-) diff --git a/llvm/include/llvm/ADT/GenericUniformityImpl.h b/llvm/include/llvm/ADT/GenericUniformityImpl.h index bd09f4fe43e087..91ee0e41332199 100644 --- a/llvm/include/llvm/ADT/GenericUniformityImpl.h +++ b/llvm/include/llvm/ADT/GenericUniformityImpl.h @@ -342,6 +342,10 @@ template class GenericUniformityAnalysisImpl { typename SyncDependenceAnalysisT::DivergenceDescriptor; using BlockLabelMapT = typename SyncDependenceAnalysisT::BlockLabelMap; + // Use outside cycle with divergent exit + using UOCWDE = + std::tuple; + GenericUniformityAnalysisImpl(const DominatorTreeT &DT, const CycleInfoT &CI, const TargetTransformInfo *TTI) : Context(CI.getSSAContext()), F(*Context.getFunction()), CI(CI), @@ -395,6 +399,14 @@ template class GenericUniformityAnalysisImpl { } void print(raw_ostream &out) const; + SmallVector UsesOutsideCycleWithDivergentExit; + void recordUseOutsideCycleWithDivergentExit(const InstructionT *, + const InstructionT *, + const CycleT *); + inline iterator_range getUsesOutsideCycleWithDivergentExit() const { +return make_range(UsesOutsideCycleWithDivergentExit.begin(), + UsesOutsideCycleWithDivergentExit.end()); + } protected: /// \brief Value/block pair representing a single phi input. 
@@ -1129,6 +1141,14 @@ void GenericUniformityAnalysisImpl::compute() { } } +template +void GenericUniformityAnalysisImpl< +ContextT>::recordUseOutsideCycleWithDivergentExit(const InstructionT *Inst, + const InstructionT *User, + const CycleT *Cycle) { + UsesOutsideCycleWithDivergentExit.emplace_back(Inst, User, Cycle); +} + template bool GenericUniformityAnalysisImpl::isAlwaysUniform( const InstructionT &Instr) const { @@ -1180,6 +1200,16 @@ void GenericUniformityAnalysisImpl::print(raw_ostream &OS) const { } } + if (!UsesOutsideCycleWithDivergentExit.empty()) { +OS << "\nUSES OUTSIDE CYCLES WITH DIVERGENT EXIT:\n"; + +for (auto [Inst, UseInst, Cycle] : UsesOutsideCycleWithDivergentExit) { + OS << "Inst:" << Context.print(Inst) + << "Used by :" << Context.print(UseInst) + << "Outside cycle :" << Cycle->print(Context) << "\n\n"; +} + } + for (auto &block : F) { OS << "\nBLOCK " << Context.print(&block) << '\n'; @@ -1210,6 +1240,13 @@ void GenericUniformityAnalysisImpl::print(raw_ostream &OS) const { } } +template +iterator_range::UOCWDE *> +GenericUniformityInfo::getUsesOutsideCycleWithDivergentExit() const { + return make_range(DA->UsesOutsideCycleWithDivergentExit.begin(), +DA->UsesOutsideCycleWithDivergentExit.end());
[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: Temporal divergence lowering i1 (PR #124299)
petar-avramovic wrote: > [!WARNING] > This pull request is not mergeable via GitHub because a downstack PR is > open. Once all requirements are satisfied, merge this PR as a stack on Graphite. > Learn more: https://graphite.dev/docs/merge-pull-requests * **#124299** (View in Graphite: https://app.graphite.dev/github/pr/llvm/llvm-project/124299) * **#124298** * **#124297** * `main` This stack of pull requests is managed by Graphite (https://graphite.dev). Learn more about stacking: https://stacking.dev/ https://github.com/llvm/llvm-project/pull/124299 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: Temporal divergence lowering (non i1) (PR #124298)
petar-avramovic wrote: > [!WARNING] > This pull request is not mergeable via GitHub because a downstack PR is > open. Once all requirements are satisfied, merge this PR as a stack on Graphite. > Learn more: https://graphite.dev/docs/merge-pull-requests * **#124299** * **#124298** (View in Graphite: https://app.graphite.dev/github/pr/llvm/llvm-project/124298) * **#124297** * `main` This stack of pull requests is managed by Graphite (https://graphite.dev). Learn more about stacking: https://stacking.dev/ https://github.com/llvm/llvm-project/pull/124298 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: Temporal divergence lowering (non i1) (PR #124298)
https://github.com/petar-avramovic ready_for_review https://github.com/llvm/llvm-project/pull/124298 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: Temporal divergence lowering i1 (PR #124299)
llvmbot wrote: @llvm/pr-subscribers-llvm-globalisel @llvm/pr-subscribers-backend-amdgpu Author: Petar Avramovic (petar-avramovic) Changes A use of an i1 outside of the cycle, whether uniform or divergent, is a lane mask (in an sgpr) that holds, for each lane, the i1 value from the iteration in which that lane exited the cycle. Create a phi that merges the lane mask across all iterations. --- Patch is 124.89 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/124299.diff 9 Files Affected: - (modified) llvm/lib/Target/AMDGPU/AMDGPUGlobalISelDivergenceLowering.cpp (+55) - (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/divergence-divergent-i1-phis-no-lane-mask-merging.ll (+20-10) - (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/divergence-divergent-i1-phis-no-lane-mask-merging.mir (+33-19) - (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/divergence-divergent-i1-used-outside-loop.ll (+87-69) - (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/divergence-divergent-i1-used-outside-loop.mir (+160-127) - (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/divergence-structurizer.ll (+64-59) - (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/divergence-structurizer.mir (+104-88) - (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/divergence-temporal-divergent-i1.ll (+36-23) - (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/divergence-temporal-divergent-i1.mir (+55-34) ``diff diff --git a/llvm/lib/Target/AMDGPU/AMDGPUGlobalISelDivergenceLowering.cpp b/llvm/lib/Target/AMDGPU/AMDGPUGlobalISelDivergenceLowering.cpp index d8cd1e7379c93f..7e8b9d5524be32 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPUGlobalISelDivergenceLowering.cpp +++ b/llvm/lib/Target/AMDGPU/AMDGPUGlobalISelDivergenceLowering.cpp @@ -80,6 +80,7 @@ class DivergenceLoweringHelper : public PhiLoweringHelper { void constrainAsLaneMask(Incoming &In) override; bool lowerTempDivergence(); + bool lowerTempDivergenceI1(); }; DivergenceLoweringHelper::DivergenceLoweringHelper( @@ -221,6 +222,54 @@ bool DivergenceLoweringHelper::lowerTempDivergence() { return 
false; } +bool DivergenceLoweringHelper::lowerTempDivergenceI1() { + MachineRegisterInfo::VRegAttrs BoolS1 = {ST->getBoolRC(), LLT::scalar(1)}; + initializeLaneMaskRegisterAttributes(BoolS1); + + for (auto [Inst, UseInst, Cycle] : MUI->get_TDCs()) { +Register Reg = Inst->getOperand(0).getReg(); +if (MRI->getType(Reg) != LLT::scalar(1)) + continue; + +Register MergedMask = MRI->createVirtualRegister(BoolS1); +Register PrevIterMask = MRI->createVirtualRegister(BoolS1); + +MachineBasicBlock *CycleHeaderMBB = Cycle->getHeader(); +SmallVector ExitingBlocks; +Cycle->getExitingBlocks(ExitingBlocks); +assert(ExitingBlocks.size() == 1); +MachineBasicBlock *CycleExitingMBB = ExitingBlocks[0]; + +B.setInsertPt(*CycleHeaderMBB, CycleHeaderMBB->begin()); +auto CrossIterPHI = B.buildInstr(AMDGPU::PHI).addDef(PrevIterMask); + +// We only care about cycle iterration path - merge Reg with previous +// iteration. For other incomings use implicit def. +// Predecessors should be CyclePredecessor and CycleExitingMBB. +// In older versions of irreducible control flow lowering there could be +// cases with more predecessors. To keep this lowering as generic as +// possible also handle those cases. +for (auto MBB : CycleHeaderMBB->predecessors()) { + if (MBB == CycleExitingMBB) { +CrossIterPHI.addReg(MergedMask); + } else { +B.setInsertPt(*MBB, MBB->getFirstTerminator()); +auto ImplDef = B.buildInstr(AMDGPU::IMPLICIT_DEF, {BoolS1}, {}); +CrossIterPHI.addReg(ImplDef.getReg(0)); + } + CrossIterPHI.addMBB(MBB); +} + +buildMergeLaneMasks(*CycleExitingMBB, CycleExitingMBB->getFirstTerminator(), +{}, MergedMask, PrevIterMask, Reg); + +replaceUsesOfRegInInstWith(Reg, const_cast(UseInst), + MergedMask); + } + + return false; +} + } // End anonymous namespace. INITIALIZE_PASS_BEGIN(AMDGPUGlobalISelDivergenceLowering, DEBUG_TYPE, @@ -260,6 +309,12 @@ bool AMDGPUGlobalISelDivergenceLowering::runOnMachineFunction( // Non-i1 temporal divergence lowering. 
Changed |= Helper.lowerTempDivergence(); + // This covers both uniform and divergent i1s. Lane masks are in sgpr and need + // to be updated in each iteration. + Changed |= Helper.lowerTempDivergenceI1(); + // Temporal divergence lowering of divergent i1 phi used outside of the cycle + // could also be handled by lowerPhis but we do it in lowerTempDivergenceI1 + // since in some case lowerPhis does unnecessary lane mask merging. Changed |= Helper.lowerPhis(); return Changed; } diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/divergence-divergent-i1-phis-no-lane-mask-merging.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/divergence-divergent-i1-phis-no-lane-mask-merging.ll index 65c96a3db5bbfa..11acd451d98d7d 100644 --- a/llvm/test/CodeGen/AMD
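The per-iteration lane mask merge described for PR #124299 can be modeled on plain integers. The sketch below is an illustration under assumptions, not the patch's implementation: the merge formula `(Prev & ~Exec) | (Cur & Exec)` is an assumed model of what a lane mask merge computes, and `mergeLaneMasks`, `simulateLoop`, and the choice of i1 value (true on odd iterations) are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Assumed merge rule: lanes still active (set in Exec) take the current
// value, lanes that already left the cycle keep their previous value.
inline uint32_t mergeLaneMasks(uint32_t Prev, uint32_t Cur, uint32_t Exec) {
  return (Prev & ~Exec) | (Cur & Exec);
}

// LastIter[L] is the last iteration in which lane L is active.
// The i1 computed inside the toy loop is "true on odd iterations".
// Returns the merged mask that would be visible after the cycle.
uint32_t simulateLoop(const std::vector<unsigned> &LastIter) {
  assert(LastIter.size() <= 32 && "one bit per lane in a 32-bit mask");
  unsigned MaxIter = 0;
  for (unsigned I : LastIter)
    MaxIter = std::max(MaxIter, I);

  uint32_t Merged = 0; // models the undefined incoming value on loop entry
  for (unsigned Iter = 0; Iter <= MaxIter; ++Iter) {
    uint32_t Exec = 0;
    for (unsigned Lane = 0; Lane < LastIter.size(); ++Lane)
      if (Iter <= LastIter[Lane])
        Exec |= uint32_t(1) << Lane;
    // The value is computed for every lane, but only active lanes count.
    uint32_t Cur = (Iter % 2 == 1) ? ~uint32_t(0) : uint32_t(0);
    Merged = mergeLaneMasks(Merged, Cur, Exec);
  }
  return Merged;
}
```

For four lanes that exit after iterations 0, 1, 2, and 3, the merged mask is `0b1010`: each lane keeps the value from its own last iteration. Reading the value from the final iteration alone would give all ones for every lane, which is the temporal divergence problem the lowering addresses.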
[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: Temporal divergence lowering (non i1) (PR #124298)
llvmbot wrote: @llvm/pr-subscribers-llvm-globalisel Author: Petar Avramovic (petar-avramovic) Changes Record all uses outside cycle with divergent exit during propagateTemporalDivergence in Uniformity analysis. With this list of candidates for temporal divergence lowering, excluding known lane masks from control flow intrinsics, find sources from inside the cycle that are not i1 and uniform. Temporal divergence lowering (non i1): create copy(v_mov) to vgpr, with implicit exec (to stop other passes from moving this copy outside of the cycle) and use this vgpr outside of the cycle instead of original uniform source. --- Patch is 23.56 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/124298.diff 12 Files Affected: - (modified) llvm/include/llvm/ADT/GenericUniformityImpl.h (+37) - (modified) llvm/include/llvm/ADT/GenericUniformityInfo.h (+6) - (modified) llvm/lib/Analysis/UniformityAnalysis.cpp (+1-2) - (modified) llvm/lib/CodeGen/MachineUniformityAnalysis.cpp (+4-4) - (modified) llvm/lib/Target/AMDGPU/AMDGPUGlobalISelDivergenceLowering.cpp (+45-2) - (modified) llvm/lib/Target/AMDGPU/AMDGPURegBankSelect.cpp (+20-4) - (modified) llvm/lib/Target/AMDGPU/SILowerI1Copies.h (+6) - (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/divergence-divergent-i1-phis-no-lane-mask-merging.mir (+4-3) - (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/divergence-divergent-i1-used-outside-loop.mir (+10-9) - (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/divergence-temporal-divergent-reg.ll (+9-9) - (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/divergence-temporal-divergent-reg.mir (+2-1) - (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-mui.ll (+9-8) ``diff diff --git a/llvm/include/llvm/ADT/GenericUniformityImpl.h b/llvm/include/llvm/ADT/GenericUniformityImpl.h index bd09f4fe43e087..91ee0e41332199 100644 --- a/llvm/include/llvm/ADT/GenericUniformityImpl.h +++ b/llvm/include/llvm/ADT/GenericUniformityImpl.h @@ -342,6 +342,10 @@ template 
class GenericUniformityAnalysisImpl { typename SyncDependenceAnalysisT::DivergenceDescriptor; using BlockLabelMapT = typename SyncDependenceAnalysisT::BlockLabelMap; + // Use outside cycle with divergent exit + using UOCWDE = + std::tuple; + GenericUniformityAnalysisImpl(const DominatorTreeT &DT, const CycleInfoT &CI, const TargetTransformInfo *TTI) : Context(CI.getSSAContext()), F(*Context.getFunction()), CI(CI), @@ -395,6 +399,14 @@ template class GenericUniformityAnalysisImpl { } void print(raw_ostream &out) const; + SmallVector UsesOutsideCycleWithDivergentExit; + void recordUseOutsideCycleWithDivergentExit(const InstructionT *, + const InstructionT *, + const CycleT *); + inline iterator_range getUsesOutsideCycleWithDivergentExit() const { +return make_range(UsesOutsideCycleWithDivergentExit.begin(), + UsesOutsideCycleWithDivergentExit.end()); + } protected: /// \brief Value/block pair representing a single phi input. @@ -1129,6 +1141,14 @@ void GenericUniformityAnalysisImpl::compute() { } } +template +void GenericUniformityAnalysisImpl< +ContextT>::recordUseOutsideCycleWithDivergentExit(const InstructionT *Inst, + const InstructionT *User, + const CycleT *Cycle) { + UsesOutsideCycleWithDivergentExit.emplace_back(Inst, User, Cycle); +} + template bool GenericUniformityAnalysisImpl::isAlwaysUniform( const InstructionT &Instr) const { @@ -1180,6 +1200,16 @@ void GenericUniformityAnalysisImpl::print(raw_ostream &OS) const { } } + if (!UsesOutsideCycleWithDivergentExit.empty()) { +OS << "\nUSES OUTSIDE CYCLES WITH DIVERGENT EXIT:\n"; + +for (auto [Inst, UseInst, Cycle] : UsesOutsideCycleWithDivergentExit) { + OS << "Inst:" << Context.print(Inst) + << "Used by :" << Context.print(UseInst) + << "Outside cycle :" << Cycle->print(Context) << "\n\n"; +} + } + for (auto &block : F) { OS << "\nBLOCK " << Context.print(&block) << '\n'; @@ -1210,6 +1240,13 @@ void GenericUniformityAnalysisImpl::print(raw_ostream &OS) const { } } +template +iterator_range::UOCWDE *> 
+GenericUniformityInfo::getUsesOutsideCycleWithDivergentExit() const { + return make_range(DA->UsesOutsideCycleWithDivergentExit.begin(), +DA->UsesOutsideCycleWithDivergentExit.end()); +} + template bool GenericUniformityInfo::hasDivergence() const { return DA->hasDivergence(); diff --git a/llvm/include/llvm/ADT/GenericUniformityInfo.h b/llvm/include/llvm/ADT/GenericUniformityInfo.h index e53afccc020b46..660fd6d46114d7 100644 --- a/llvm/include/llvm/ADT/GenericUniformityInfo.h +++ b/llvm/include/llvm/ADT/GenericUniformityInfo.h @@ -40
[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: Temporal divergence lowering i1 (PR #124299)
https://github.com/petar-avramovic ready_for_review https://github.com/llvm/llvm-project/pull/124299
[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: Temporal divergence lowering (non i1) (PR #124298)
github-actions[bot] wrote:

:warning: C/C++ code formatter, clang-format found issues in your code. :warning:

You can test this locally with the following command:

```bash
git-clang-format --diff 1728ab49b46a31b63d8ecdc81fe87851aa40a725 3e04401258c91639105b1f2f17a84badbdf928ae --extensions cpp,h -- llvm/include/llvm/ADT/GenericUniformityImpl.h llvm/include/llvm/ADT/GenericUniformityInfo.h llvm/lib/Analysis/UniformityAnalysis.cpp llvm/lib/CodeGen/MachineUniformityAnalysis.cpp llvm/lib/Target/AMDGPU/AMDGPUGlobalISelDivergenceLowering.cpp llvm/lib/Target/AMDGPU/AMDGPURegBankSelect.cpp llvm/lib/Target/AMDGPU/SILowerI1Copies.h
```

View the diff from clang-format here.

```diff
diff --git a/llvm/lib/Target/AMDGPU/AMDGPURegBankSelect.cpp b/llvm/lib/Target/AMDGPU/AMDGPURegBankSelect.cpp
index 452d754985..8a0c9faa34 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPURegBankSelect.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPURegBankSelect.cpp
@@ -225,7 +225,8 @@ bool AMDGPURegBankSelect::runOnMachineFunction(MachineFunction &MF) {
       getAnalysis().getUniformityInfo();
   MachineRegisterInfo &MRI = *B.getMRI();
   const GCNSubtarget &ST = MF.getSubtarget();
-  RegBankSelectHelper RBSHelper(B, ILMA, MUI, *ST.getRegisterInfo(), *ST.getRegBankInfo());
+  RegBankSelectHelper RBSHelper(B, ILMA, MUI, *ST.getRegisterInfo(),
+                                *ST.getRegBankInfo());
   // Virtual registers at this point don't have register banks.
   // Virtual registers in def and use operands of already inst-selected
   // instruction have register class.
```

https://github.com/llvm/llvm-project/pull/124298
[llvm-branch-commits] [llvm] [PassBuilder][CodeGen] Add callback style pass buider (PR #116913)
optimisan wrote:
> I created https://github.com/llvm/llvm-project/pull/76714, but disabling arbitrary passes is not what we expect. Maybe we could add an allowlist as a compromise...

Okay, I see. I will look into whether other solutions are possible as well.
https://github.com/llvm/llvm-project/pull/116913
[llvm-branch-commits] [flang] [llvm] [mlir] [MLIR][OpenMP] Introduce overlapped record type map support (PR #119588)
https://github.com/skatrak commented: Thank you Andrew, I have some minor comments but this generally looks fine to me. I'm not that familiar with mapping, so it's likely I would miss nontrivial problems if there were any, though. https://github.com/llvm/llvm-project/pull/119588
[llvm-branch-commits] [flang] [llvm] [mlir] [MLIR][OpenMP] Introduce overlapped record type map support (PR #119588)
@@ -3197,37 +3303,77 @@ static llvm::omp::OpenMPOffloadMappingFlags mapParentWithMembers( // what we support as expected. llvm::omp::OpenMPOffloadMappingFlags mapFlag = mapData.Types[mapDataIndex]; ompBuilder.setCorrectMemberOfFlag(mapFlag, memberOfFlag); -combinedInfo.Types.emplace_back(mapFlag); -combinedInfo.DevicePointers.emplace_back( -llvm::OpenMPIRBuilder::DeviceInfoTy::None); -combinedInfo.Names.emplace_back(LLVM::createMappingInformation( -mapData.MapClause[mapDataIndex]->getLoc(), ompBuilder)); -combinedInfo.BasePointers.emplace_back(mapData.BasePointers[mapDataIndex]); -combinedInfo.Pointers.emplace_back(mapData.Pointers[mapDataIndex]); -combinedInfo.Sizes.emplace_back(mapData.Sizes[mapDataIndex]); - } - return memberOfFlag; -} - -// The intent is to verify if the mapped data being passed is a -// pointer -> pointee that requires special handling in certain cases, -// e.g. applying the OMP_MAP_PTR_AND_OBJ map type. -// -// There may be a better way to verify this, but unfortunately with -// opaque pointers we lose the ability to easily check if something is -// a pointer whilst maintaining access to the underlying type. -static bool checkIfPointerMap(omp::MapInfoOp mapOp) { - // If we have a varPtrPtr field assigned then the underlying type is a pointer - if (mapOp.getVarPtrPtr()) -return true; - // If the map data is declare target with a link clause, then it's represented - // as a pointer when we lower it to LLVM-IR even if at the MLIR level it has - // no relation to pointers. 
- if (isDeclareTargetLink(mapOp.getVarPtr())) -return true; +if (targetDirective == TargetDirective::TargetUpdate) { + combinedInfo.Types.emplace_back(mapFlag); + combinedInfo.DevicePointers.emplace_back( + mapData.DevicePointers[mapDataIndex]); + combinedInfo.Names.emplace_back(LLVM::createMappingInformation( + mapData.MapClause[mapDataIndex]->getLoc(), ompBuilder)); + combinedInfo.BasePointers.emplace_back( + mapData.BasePointers[mapDataIndex]); + combinedInfo.Pointers.emplace_back(mapData.Pointers[mapDataIndex]); + combinedInfo.Sizes.emplace_back(mapData.Sizes[mapDataIndex]); +} else { + llvm::SmallVector overlapIdxs; + // Find all of the members that "overlap", i.e. occlude other members that + // were mapped alongside the parent, e.g. member [0], occludes + getOverlappedMembers(overlapIdxs, mapData, parentClause); + // We need to make sure the overlapped members are sorted in order of + // lowest address to highest address skatrak wrote: ```suggestion // lowest address to highest address. ``` https://github.com/llvm/llvm-project/pull/119588 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [flang] [llvm] [mlir] [MLIR][OpenMP] Introduce overlapped record type map support (PR #119588)
@@ -3110,6 +3132,91 @@ calculateBoundsOffset(LLVM::ModuleTranslation &moduleTranslation, return idx; } +// Gathers members that are overlapping in the parent, excluding members that +// themselves overlap, keeping the top-most (closest to parents level) map. +static void getOverlappedMembers(llvm::SmallVector &overlapMapDataIdxs, + MapInfoData &mapData, + omp::MapInfoOp parentOp) { + // No members mapped, no overlaps. + if (parentOp.getMembers().empty()) +return; + + // Single member, we can insert and return early. + if (parentOp.getMembers().size() == 1) { +overlapMapDataIdxs.push_back(0); +return; + } + + // 1) collect list of top-level overlapping members from MemberOp + llvm::SmallVector> memberByIndex; + mlir::ArrayAttr indexAttr = parentOp.getMembersIndexAttr(); + for (auto [memIndex, indicesAttr] : llvm::enumerate(indexAttr)) +memberByIndex.push_back( +std::make_pair(memIndex, mlir::cast(indicesAttr))); + + // Sort the smallest first (higher up the parent -> member chain), so that + // when we remove members, we remove as much as we can in the initial + // iterations, shortening the number of passes required. + llvm::sort(memberByIndex.begin(), memberByIndex.end(), + [&](auto a, auto b) { return a.second.size() < b.second.size(); }); + + auto getAsIntegers = [](mlir::ArrayAttr values) { +llvm::SmallVector ints; +ints.reserve(values.size()); +llvm::transform(values, std::back_inserter(ints), +[](mlir::Attribute value) { + return mlir::cast(value).getInt(); +}); +return ints; + }; + + // Remove elements from the vector if there is a parent element that + // supersedes it. i.e. if member [0] is mapped, we can remove members [0,1], + // [0,2].. etc. + for (auto v : make_early_inc_range(memberByIndex)) { +auto vArr = getAsIntegers(v.second); +memberByIndex.erase( skatrak wrote: Do we know for sure this always works? 
Reading the documentation for `make_early_inc_range`, my understanding is that we're allowed to mutate the underlying range as long as we don't invalidate the next iterator. But, if we try to delete elements which could be anywhere in the range, it seems possible that we would end up doing just that. Maybe it would be safer to just create an integer set of to-be-skipped elements and only add to `overlapMapDataIdxs` elements in `memberByIndex` which are not part of that set. https://github.com/llvm/llvm-project/pull/119588
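The skip-set idea in the comment above can be sketched roughly as follows. This is a standalone illustration using only the standard library, not the code from the PR: `Member`, `isStrictPrefix`, and `collectTopLevelMembers` are hypothetical names, and the assumption that a member is covered exactly when another member's index path is a strict prefix of its own is a guess at the intended rule.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <set>
#include <utility>
#include <vector>

// Hypothetical stand-in for one entry of `memberByIndex`: the member's
// position in the map clause plus its index path (e.g. {0, 1}).
using IndexPath = std::vector<std::int64_t>;
using Member = std::pair<std::size_t, IndexPath>;

// True when `parent` is a strict prefix of `child`, i.e. mapping `parent`
// already covers `child` ([0] covers [0,1], [0,2], ...). Assumed rule.
static bool isStrictPrefix(const IndexPath &parent, const IndexPath &child) {
  return parent.size() < child.size() &&
         std::equal(parent.begin(), parent.end(), child.begin());
}

// Collects the positions of members that are not covered by another member.
// Nothing is erased from `members` while it is being iterated; covered
// entries are only recorded in a set and filtered out afterwards.
static std::vector<std::size_t>
collectTopLevelMembers(const std::vector<Member> &members) {
  std::set<std::size_t> skipped;
  for (const Member &outer : members)
    for (const Member &inner : members)
      if (isStrictPrefix(outer.second, inner.second))
        skipped.insert(inner.first);

  std::vector<std::size_t> result;
  for (const Member &m : members)
    if (skipped.count(m.first) == 0)
      result.push_back(m.first);
  return result;
}
```

Because the container is never mutated during iteration, the question of iterator invalidation does not arise; the cost is one extra pass and a small set.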
[llvm-branch-commits] [flang] [llvm] [mlir] [MLIR][OpenMP] Introduce overlapped record type map support (PR #119588)
@@ -3197,37 +3303,77 @@ static llvm::omp::OpenMPOffloadMappingFlags mapParentWithMembers( // what we support as expected. llvm::omp::OpenMPOffloadMappingFlags mapFlag = mapData.Types[mapDataIndex]; ompBuilder.setCorrectMemberOfFlag(mapFlag, memberOfFlag); -combinedInfo.Types.emplace_back(mapFlag); -combinedInfo.DevicePointers.emplace_back( -llvm::OpenMPIRBuilder::DeviceInfoTy::None); -combinedInfo.Names.emplace_back(LLVM::createMappingInformation( -mapData.MapClause[mapDataIndex]->getLoc(), ompBuilder)); -combinedInfo.BasePointers.emplace_back(mapData.BasePointers[mapDataIndex]); -combinedInfo.Pointers.emplace_back(mapData.Pointers[mapDataIndex]); -combinedInfo.Sizes.emplace_back(mapData.Sizes[mapDataIndex]); - } - return memberOfFlag; -} - -// The intent is to verify if the mapped data being passed is a -// pointer -> pointee that requires special handling in certain cases, -// e.g. applying the OMP_MAP_PTR_AND_OBJ map type. -// -// There may be a better way to verify this, but unfortunately with -// opaque pointers we lose the ability to easily check if something is -// a pointer whilst maintaining access to the underlying type. -static bool checkIfPointerMap(omp::MapInfoOp mapOp) { - // If we have a varPtrPtr field assigned then the underlying type is a pointer - if (mapOp.getVarPtrPtr()) -return true; - // If the map data is declare target with a link clause, then it's represented - // as a pointer when we lower it to LLVM-IR even if at the MLIR level it has - // no relation to pointers. 
- if (isDeclareTargetLink(mapOp.getVarPtr())) -return true; +if (targetDirective == TargetDirective::TargetUpdate) { + combinedInfo.Types.emplace_back(mapFlag); + combinedInfo.DevicePointers.emplace_back( + mapData.DevicePointers[mapDataIndex]); + combinedInfo.Names.emplace_back(LLVM::createMappingInformation( + mapData.MapClause[mapDataIndex]->getLoc(), ompBuilder)); + combinedInfo.BasePointers.emplace_back( + mapData.BasePointers[mapDataIndex]); + combinedInfo.Pointers.emplace_back(mapData.Pointers[mapDataIndex]); + combinedInfo.Sizes.emplace_back(mapData.Sizes[mapDataIndex]); +} else { + llvm::SmallVector overlapIdxs; + // Find all of the members that "overlap", i.e. occlude other members that + // were mapped alongside the parent, e.g. member [0], occludes + getOverlappedMembers(overlapIdxs, mapData, parentClause); + // We need to make sure the overlapped members are sorted in order of + // lowest address to highest address + sortMapIndices(overlapIdxs, parentClause); + + lowAddr = builder.CreatePointerCast(mapData.Pointers[mapDataIndex], + builder.getPtrTy()); + highAddr = builder.CreatePointerCast( + builder.CreateConstGEP1_32(mapData.BaseType[mapDataIndex], + mapData.Pointers[mapDataIndex], 1), + builder.getPtrTy()); + + // TODO: We may want to skip arrays/array sections in this as Clang does + // so it appears to be an optimisation rather than a neccessity though, + // but this requires further investigation. However, we would have to make + // sure to not exclude maps with bounds that ARE pointers, as these are + // processed as seperate components, i.e. pointer + data. skatrak wrote: ```suggestion // processed as separate components, i.e. pointer + data. ``` https://github.com/llvm/llvm-project/pull/119588 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [flang] [llvm] [mlir] [MLIR][OpenMP] Introduce overlapped record type map support (PR #119588)
@@ -3110,6 +3132,91 @@ calculateBoundsOffset(LLVM::ModuleTranslation &moduleTranslation, return idx; } +// Gathers members that are overlapping in the parent, excluding members that +// themselves overlap, keeping the top-most (closest to parents level) map. +static void getOverlappedMembers(llvm::SmallVector &overlapMapDataIdxs, skatrak wrote: ```suggestion static void getOverlappedMembers(llvm::SmallVectorImpl &overlapMapDataIdxs, ``` https://github.com/llvm/llvm-project/pull/119588 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [flang] [llvm] [mlir] [MLIR][OpenMP] Introduce overlapped record type map support (PR #119588)
@@ -2979,39 +2979,61 @@ static int getMapDataMemberIdx(MapInfoData &mapData, omp::MapInfoOp memberOp) { return std::distance(mapData.MapClause.begin(), res); } -static omp::MapInfoOp getFirstOrLastMappedMemberPtr(omp::MapInfoOp mapInfo, -bool first) { - ArrayAttr indexAttr = mapInfo.getMembersIndexAttr(); - // Only 1 member has been mapped, we can return it. - if (indexAttr.size() == 1) -return cast(mapInfo.getMembers()[0].getDefiningOp()); +static void sortMapIndices(llvm::SmallVector &indices, skatrak wrote: General nit for changes in this file: There's a `using namespace mlir`, so we can remove `mlir::`. Same for `llvm::` cast-style functions, which are present in the `mlir` namespace as well. https://github.com/llvm/llvm-project/pull/119588 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [flang] [llvm] [mlir] [MLIR][OpenMP] Introduce overlapped record type map support (PR #119588)
@@ -2979,39 +2979,61 @@ static int getMapDataMemberIdx(MapInfoData &mapData, omp::MapInfoOp memberOp) { return std::distance(mapData.MapClause.begin(), res); } -static omp::MapInfoOp getFirstOrLastMappedMemberPtr(omp::MapInfoOp mapInfo, -bool first) { - ArrayAttr indexAttr = mapInfo.getMembersIndexAttr(); - // Only 1 member has been mapped, we can return it. - if (indexAttr.size() == 1) -return cast(mapInfo.getMembers()[0].getDefiningOp()); +static void sortMapIndices(llvm::SmallVector &indices, + mlir::omp::MapInfoOp mapInfo, + bool ascending = true) { + mlir::ArrayAttr indexAttr = mapInfo.getMembersIndexAttr(); + if (indexAttr.empty() || indexAttr.size() == 1 || indices.empty() || + indices.size() == 1) +return; - llvm::SmallVector indices(indexAttr.size()); - std::iota(indices.begin(), indices.end(), 0); + llvm::sort( + indices.begin(), indices.end(), [&](const size_t a, const size_t b) { +auto memberIndicesA = mlir::cast(indexAttr[a]); +auto memberIndicesB = mlir::cast(indexAttr[b]); + +size_t smallestMember = memberIndicesA.size() < memberIndicesB.size() +? memberIndicesA.size() +: memberIndicesB.size(); - llvm::sort(indices.begin(), indices.end(), - [&](const size_t a, const size_t b) { - auto memberIndicesA = cast(indexAttr[a]); - auto memberIndicesB = cast(indexAttr[b]); - for (const auto it : llvm::zip(memberIndicesA, memberIndicesB)) { - int64_t aIndex = cast(std::get<0>(it)).getInt(); - int64_t bIndex = cast(std::get<1>(it)).getInt(); +for (size_t i = 0; i < smallestMember; ++i) { skatrak wrote: Nit: `llvm::zip` already iterates as long as both ranges have elements, so it stops at the shortest. I think it's better to use it in this case. https://github.com/llvm/llvm-project/pull/119588 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
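To illustrate the "stops at the shortest range" behaviour the comment relies on, here is a minimal standalone sketch. It does not use `llvm::zip` (so that it compiles without LLVM headers); the four-iterator overload of `std::mismatch` is swapped in because it likewise stops at the end of the shorter range. `IndexPath` and `pathLess` are hypothetical names, and the ordering shown is only the ascending case of the comparator in the diff.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for one member's index path, e.g. {0, 1}.
using IndexPath = std::vector<std::int64_t>;

// Orders two index paths: the first differing position decides; if one path
// is a prefix of the other, the shorter ("parent") path is ordered first.
static bool pathLess(const IndexPath &a, const IndexPath &b) {
  // The four-iterator overload stops at the end of the shorter range, so no
  // explicit minimum-length computation is needed.
  auto [itA, itB] = std::mismatch(a.begin(), a.end(), b.begin(), b.end());
  if (itA != a.end() && itB != b.end())
    return *itA < *itB;
  // Equal up to the end of the shorter path: the shorter one wins.
  return a.size() < b.size();
}
```

With this shape, the explicit `smallestMember` computation and the indexed loop in the diff would not be needed.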
[llvm-branch-commits] [flang] [llvm] [mlir] [MLIR][OpenMP] Introduce overlapped record type map support (PR #119588)
@@ -3197,37 +3303,77 @@ static llvm::omp::OpenMPOffloadMappingFlags mapParentWithMembers( // what we support as expected. llvm::omp::OpenMPOffloadMappingFlags mapFlag = mapData.Types[mapDataIndex]; ompBuilder.setCorrectMemberOfFlag(mapFlag, memberOfFlag); -combinedInfo.Types.emplace_back(mapFlag); -combinedInfo.DevicePointers.emplace_back( -llvm::OpenMPIRBuilder::DeviceInfoTy::None); -combinedInfo.Names.emplace_back(LLVM::createMappingInformation( -mapData.MapClause[mapDataIndex]->getLoc(), ompBuilder)); -combinedInfo.BasePointers.emplace_back(mapData.BasePointers[mapDataIndex]); -combinedInfo.Pointers.emplace_back(mapData.Pointers[mapDataIndex]); -combinedInfo.Sizes.emplace_back(mapData.Sizes[mapDataIndex]); - } - return memberOfFlag; -} - -// The intent is to verify if the mapped data being passed is a -// pointer -> pointee that requires special handling in certain cases, -// e.g. applying the OMP_MAP_PTR_AND_OBJ map type. -// -// There may be a better way to verify this, but unfortunately with -// opaque pointers we lose the ability to easily check if something is -// a pointer whilst maintaining access to the underlying type. -static bool checkIfPointerMap(omp::MapInfoOp mapOp) { - // If we have a varPtrPtr field assigned then the underlying type is a pointer - if (mapOp.getVarPtrPtr()) -return true; - // If the map data is declare target with a link clause, then it's represented - // as a pointer when we lower it to LLVM-IR even if at the MLIR level it has - // no relation to pointers. 
- if (isDeclareTargetLink(mapOp.getVarPtr())) -return true; +if (targetDirective == TargetDirective::TargetUpdate) { + combinedInfo.Types.emplace_back(mapFlag); + combinedInfo.DevicePointers.emplace_back( + mapData.DevicePointers[mapDataIndex]); + combinedInfo.Names.emplace_back(LLVM::createMappingInformation( + mapData.MapClause[mapDataIndex]->getLoc(), ompBuilder)); + combinedInfo.BasePointers.emplace_back( + mapData.BasePointers[mapDataIndex]); + combinedInfo.Pointers.emplace_back(mapData.Pointers[mapDataIndex]); + combinedInfo.Sizes.emplace_back(mapData.Sizes[mapDataIndex]); +} else { + llvm::SmallVector overlapIdxs; + // Find all of the members that "overlap", i.e. occlude other members that + // were mapped alongside the parent, e.g. member [0], occludes skatrak wrote: Nit: This comment seems to be incomplete. https://github.com/llvm/llvm-project/pull/119588 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [flang] [llvm] [mlir] [MLIR][OpenMP] Introduce overlapped record type map support (PR #119588)
@@ -2979,39 +2979,61 @@ static int getMapDataMemberIdx(MapInfoData &mapData, omp::MapInfoOp memberOp) { return std::distance(mapData.MapClause.begin(), res); } -static omp::MapInfoOp getFirstOrLastMappedMemberPtr(omp::MapInfoOp mapInfo, -bool first) { - ArrayAttr indexAttr = mapInfo.getMembersIndexAttr(); - // Only 1 member has been mapped, we can return it. - if (indexAttr.size() == 1) -return cast(mapInfo.getMembers()[0].getDefiningOp()); +static void sortMapIndices(llvm::SmallVector &indices, + mlir::omp::MapInfoOp mapInfo, + bool ascending = true) { + mlir::ArrayAttr indexAttr = mapInfo.getMembersIndexAttr(); + if (indexAttr.empty() || indexAttr.size() == 1 || indices.empty() || + indices.size() == 1) +return; - llvm::SmallVector indices(indexAttr.size()); - std::iota(indices.begin(), indices.end(), 0); + llvm::sort( + indices.begin(), indices.end(), [&](const size_t a, const size_t b) { +auto memberIndicesA = mlir::cast(indexAttr[a]); +auto memberIndicesB = mlir::cast(indexAttr[b]); + +size_t smallestMember = memberIndicesA.size() < memberIndicesB.size() +? 
memberIndicesA.size() +: memberIndicesB.size(); - llvm::sort(indices.begin(), indices.end(), - [&](const size_t a, const size_t b) { - auto memberIndicesA = cast(indexAttr[a]); - auto memberIndicesB = cast(indexAttr[b]); - for (const auto it : llvm::zip(memberIndicesA, memberIndicesB)) { - int64_t aIndex = cast(std::get<0>(it)).getInt(); - int64_t bIndex = cast(std::get<1>(it)).getInt(); +for (size_t i = 0; i < smallestMember; ++i) { + int64_t aIndex = + mlir::cast(memberIndicesA.getValue()[i]) + .getInt(); + int64_t bIndex = + mlir::cast(memberIndicesB.getValue()[i]) + .getInt(); - if (aIndex == bIndex) - continue; + if (aIndex == bIndex) +continue; - if (aIndex < bIndex) - return first; + if (aIndex < bIndex) +return ascending; - if (aIndex > bIndex) - return !first; - } + if (aIndex > bIndex) +return !ascending; +} - // Iterated the up until the end of the smallest member and - // they were found to be equal up to that point, so select - // the member with the lowest index count, so the "parent" - return memberIndicesA.size() < memberIndicesB.size(); - }); +// Iterated up until the end of the smallest member and +// they were found to be equal up to that point, so select +// the member with the lowest index count, so the "parent" +return memberIndicesA.size() < memberIndicesB.size(); + }); +} + +static mlir::omp::MapInfoOp +getFirstOrLastMappedMemberPtr(mlir::omp::MapInfoOp mapInfo, bool first) { + mlir::ArrayAttr indexAttr = mapInfo.getMembersIndexAttr(); + // Only 1 member has been mapped, we can return it. + if (indexAttr.size() == 1) +if (auto mapOp = +dyn_cast(mapInfo.getMembers()[0].getDefiningOp())) skatrak wrote: Let me know if I understood this wrong, but it seems like there is nothing preventing the `llvm::cast` call at the end of this function from triggering an assert if there was a single member mapped that wasn't defined by an `omp.map.info`.
I don't know whether this function can be expected to return `null`, in which case we could replace the `cast` below with a `dyn_cast`, or if this check here should be replaced with `return cast(...)`. https://github.com/llvm/llvm-project/pull/119588
[llvm-branch-commits] [flang] [llvm] [mlir] [MLIR][OpenMP] Introduce overlapped record type map support (PR #119588)
@@ -2979,39 +2979,61 @@ static int getMapDataMemberIdx(MapInfoData &mapData, omp::MapInfoOp memberOp) { return std::distance(mapData.MapClause.begin(), res); } -static omp::MapInfoOp getFirstOrLastMappedMemberPtr(omp::MapInfoOp mapInfo, -bool first) { - ArrayAttr indexAttr = mapInfo.getMembersIndexAttr(); - // Only 1 member has been mapped, we can return it. - if (indexAttr.size() == 1) -return cast(mapInfo.getMembers()[0].getDefiningOp()); +static void sortMapIndices(llvm::SmallVector &indices, skatrak wrote: ```suggestion static void sortMapIndices(llvm::SmallVectorImpl &indices, ``` https://github.com/llvm/llvm-project/pull/119588 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [flang] [llvm] [mlir] [MLIR][OpenMP] Introduce overlapped record type map support (PR #119588)
https://github.com/skatrak edited https://github.com/llvm/llvm-project/pull/119588
[llvm-branch-commits] [flang] [llvm] [mlir] [MLIR][OpenMP] Introduce overlapped record type map support (PR #119588)
@@ -2979,39 +2979,61 @@ static int getMapDataMemberIdx(MapInfoData &mapData, omp::MapInfoOp memberOp) { return std::distance(mapData.MapClause.begin(), res); } -static omp::MapInfoOp getFirstOrLastMappedMemberPtr(omp::MapInfoOp mapInfo, -bool first) { - ArrayAttr indexAttr = mapInfo.getMembersIndexAttr(); - // Only 1 member has been mapped, we can return it. - if (indexAttr.size() == 1) -return cast(mapInfo.getMembers()[0].getDefiningOp()); +static void sortMapIndices(llvm::SmallVector &indices, + mlir::omp::MapInfoOp mapInfo, + bool ascending = true) { skatrak wrote: It seems a bit overkill to introduce this argument and allow sorting the list in reverse order just so that we can get the first or the last element in `getFirstOrLastMappedMemberPtr`. Wouldn't it be simpler to just update the `mapInfo.getMembers()[indices.front()].getDefiningOp());` expression to take `indices.front()` or `indices.back()` based on the `first` argument? https://github.com/llvm/llvm-project/pull/119588 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
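The alternative suggested above (sort once, then pick the front or the back) might look roughly like the sketch below. It is a standalone approximation rather than the PR's code: `IndexPath` and `firstOrLastMember` are hypothetical names, and the built-in lexicographic `operator<` of `std::vector` is used as a stand-in for the ascending comparator.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical stand-in for one member's index path, e.g. {0, 1}.
using IndexPath = std::vector<std::int64_t>;

// Returns the position of the first or last member in ascending order.
// Assumes `paths` is non-empty.
static std::size_t firstOrLastMember(const std::vector<IndexPath> &paths,
                                     bool first) {
  std::vector<std::size_t> indices(paths.size());
  std::iota(indices.begin(), indices.end(), 0);
  // std::vector's operator< compares lexicographically and orders a prefix
  // before any longer path that starts with it.
  std::sort(indices.begin(), indices.end(),
            [&](std::size_t a, std::size_t b) { return paths[a] < paths[b]; });
  // One ascending sort serves both cases; no `ascending` flag is required.
  return first ? indices.front() : indices.back();
}
```

One thing that may be worth checking before adopting this: the comparator in the diff always prefers the shorter path on a tie, regardless of `ascending`. For members `[1]` and `[1,0]`, a descending sort with that comparator would put `[1]` at the front, whereas the back of an ascending sort is `[1,0]`, so the two approaches could select different "last" members when one path is a prefix of another.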
[llvm-branch-commits] [flang] [llvm] [mlir] [MLIR][OpenMP] Introduce overlapped record type map support (PR #119588)
@@ -2979,39 +2979,61 @@ static int getMapDataMemberIdx(MapInfoData &mapData, omp::MapInfoOp memberOp) { return std::distance(mapData.MapClause.begin(), res); } -static omp::MapInfoOp getFirstOrLastMappedMemberPtr(omp::MapInfoOp mapInfo, -bool first) { - ArrayAttr indexAttr = mapInfo.getMembersIndexAttr(); - // Only 1 member has been mapped, we can return it. - if (indexAttr.size() == 1) -return cast(mapInfo.getMembers()[0].getDefiningOp()); +static void sortMapIndices(llvm::SmallVector &indices, + mlir::omp::MapInfoOp mapInfo, + bool ascending = true) { + mlir::ArrayAttr indexAttr = mapInfo.getMembersIndexAttr(); + if (indexAttr.empty() || indexAttr.size() == 1 || indices.empty() || + indices.size() == 1) +return; - llvm::SmallVector indices(indexAttr.size()); - std::iota(indices.begin(), indices.end(), 0); + llvm::sort( + indices.begin(), indices.end(), [&](const size_t a, const size_t b) { +auto memberIndicesA = mlir::cast(indexAttr[a]); +auto memberIndicesB = mlir::cast(indexAttr[b]); + +size_t smallestMember = memberIndicesA.size() < memberIndicesB.size() +? 
memberIndicesA.size() +: memberIndicesB.size(); - llvm::sort(indices.begin(), indices.end(), - [&](const size_t a, const size_t b) { - auto memberIndicesA = cast(indexAttr[a]); - auto memberIndicesB = cast(indexAttr[b]); - for (const auto it : llvm::zip(memberIndicesA, memberIndicesB)) { - int64_t aIndex = cast(std::get<0>(it)).getInt(); - int64_t bIndex = cast(std::get<1>(it)).getInt(); +for (size_t i = 0; i < smallestMember; ++i) { + int64_t aIndex = + mlir::cast(memberIndicesA.getValue()[i]) + .getInt(); + int64_t bIndex = + mlir::cast(memberIndicesB.getValue()[i]) + .getInt(); - if (aIndex == bIndex) - continue; + if (aIndex == bIndex) +continue; - if (aIndex < bIndex) - return first; + if (aIndex < bIndex) +return ascending; - if (aIndex > bIndex) - return !first; - } + if (aIndex > bIndex) +return !ascending; +} - // Iterated the up until the end of the smallest member and - // they were found to be equal up to that point, so select - // the member with the lowest index count, so the "parent" - return memberIndicesA.size() < memberIndicesB.size(); - }); +// Iterated up until the end of the smallest member and +// they were found to be equal up to that point, so select +// the member with the lowest index count, so the "parent" +return memberIndicesA.size() < memberIndicesB.size(); + }); +} + +static mlir::omp::MapInfoOp +getFirstOrLastMappedMemberPtr(mlir::omp::MapInfoOp mapInfo, bool first) { + mlir::ArrayAttr indexAttr = mapInfo.getMembersIndexAttr(); + // Only 1 member has been mapped, we can return it. + if (indexAttr.size() == 1) +if (auto mapOp = +dyn_cast(mapInfo.getMembers()[0].getDefiningOp())) + return mapOp; + + llvm::SmallVector indices; + indices.resize(indexAttr.size()); skatrak wrote: ```suggestion llvm::SmallVector indices(indexAttr.size()); ``` https://github.com/llvm/llvm-project/pull/119588 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [flang] [llvm] [mlir] [MLIR][OpenMP] Introduce overlapped record type map support (PR #119588)
@@ -2979,39 +2979,61 @@ static int getMapDataMemberIdx(MapInfoData &mapData, omp::MapInfoOp memberOp) {
   return std::distance(mapData.MapClause.begin(), res);
 }

-static omp::MapInfoOp getFirstOrLastMappedMemberPtr(omp::MapInfoOp mapInfo,
-                                                    bool first) {
-  ArrayAttr indexAttr = mapInfo.getMembersIndexAttr();
-  // Only 1 member has been mapped, we can return it.
-  if (indexAttr.size() == 1)
-    return cast(mapInfo.getMembers()[0].getDefiningOp());
+static void sortMapIndices(llvm::SmallVector &indices,
+                           mlir::omp::MapInfoOp mapInfo,
+                           bool ascending = true) {
+  mlir::ArrayAttr indexAttr = mapInfo.getMembersIndexAttr();
+  if (indexAttr.empty() || indexAttr.size() == 1 || indices.empty() ||
+      indices.size() == 1)
+    return;

skatrak wrote:

Nit: I think this isn't necessary. `std::sort`, on which `llvm::sort` seems to be based, already returns early in these cases.

https://github.com/llvm/llvm-project/pull/119588
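The nit can be checked with the standard library alone. This is a small sketch, not code from the PR: `sortCountingComparisons` is a hypothetical helper that sorts a vector and reports how often the comparator ran. For an empty range there is nothing to compare, and a one-element range comes back unchanged, which is why an explicit size guard in front of the sort adds no behaviour.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Sorts `values` in place and returns how many times the comparator was
// invoked. Hypothetical helper, used only to observe std::sort on tiny inputs.
std::size_t sortCountingComparisons(std::vector<int> &values) {
  std::size_t calls = 0;
  std::sort(values.begin(), values.end(), [&](int a, int b) {
    ++calls;
    return a < b;
  });
  return calls;
}
```

Calling it on an empty vector performs no comparisons, and calling it on a single-element vector leaves that element in place; common standard library implementations do not invoke the comparator in the single-element case either, although the standard only bounds the comparison count.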
[llvm-branch-commits] [flang] [llvm] [mlir] [MLIR][OpenMP] Introduce overlapped record type map support (PR #119588)
@@ -3197,37 +3303,77 @@ static llvm::omp::OpenMPOffloadMappingFlags mapParentWithMembers(
     // what we support as expected.
     llvm::omp::OpenMPOffloadMappingFlags mapFlag = mapData.Types[mapDataIndex];
     ompBuilder.setCorrectMemberOfFlag(mapFlag, memberOfFlag);
-    combinedInfo.Types.emplace_back(mapFlag);
-    combinedInfo.DevicePointers.emplace_back(
-        llvm::OpenMPIRBuilder::DeviceInfoTy::None);
-    combinedInfo.Names.emplace_back(LLVM::createMappingInformation(
-        mapData.MapClause[mapDataIndex]->getLoc(), ompBuilder));
-    combinedInfo.BasePointers.emplace_back(mapData.BasePointers[mapDataIndex]);
-    combinedInfo.Pointers.emplace_back(mapData.Pointers[mapDataIndex]);
-    combinedInfo.Sizes.emplace_back(mapData.Sizes[mapDataIndex]);
-  }
-  return memberOfFlag;
-}
-
-// The intent is to verify if the mapped data being passed is a
-// pointer -> pointee that requires special handling in certain cases,
-// e.g. applying the OMP_MAP_PTR_AND_OBJ map type.
-//
-// There may be a better way to verify this, but unfortunately with
-// opaque pointers we lose the ability to easily check if something is
-// a pointer whilst maintaining access to the underlying type.
-static bool checkIfPointerMap(omp::MapInfoOp mapOp) {
-  // If we have a varPtrPtr field assigned then the underlying type is a pointer
-  if (mapOp.getVarPtrPtr())
-    return true;

-  // If the map data is declare target with a link clause, then it's represented
-  // as a pointer when we lower it to LLVM-IR even if at the MLIR level it has
-  // no relation to pointers.
-  if (isDeclareTargetLink(mapOp.getVarPtr()))
-    return true;
+    if (targetDirective == TargetDirective::TargetUpdate) {
+      combinedInfo.Types.emplace_back(mapFlag);
+      combinedInfo.DevicePointers.emplace_back(
+          mapData.DevicePointers[mapDataIndex]);
+      combinedInfo.Names.emplace_back(LLVM::createMappingInformation(
+          mapData.MapClause[mapDataIndex]->getLoc(), ompBuilder));
+      combinedInfo.BasePointers.emplace_back(
+          mapData.BasePointers[mapDataIndex]);
+      combinedInfo.Pointers.emplace_back(mapData.Pointers[mapDataIndex]);
+      combinedInfo.Sizes.emplace_back(mapData.Sizes[mapDataIndex]);
+    } else {
+      llvm::SmallVector overlapIdxs;
+      // Find all of the members that "overlap", i.e. occlude other members that
+      // were mapped alongside the parent, e.g. member [0], occludes
+      getOverlappedMembers(overlapIdxs, mapData, parentClause);
+      // We need to make sure the overlapped members are sorted in order of
+      // lowest address to highest address
+      sortMapIndices(overlapIdxs, parentClause);
+
+      lowAddr = builder.CreatePointerCast(mapData.Pointers[mapDataIndex],
+                                          builder.getPtrTy());
+      highAddr = builder.CreatePointerCast(
+          builder.CreateConstGEP1_32(mapData.BaseType[mapDataIndex],
+                                     mapData.Pointers[mapDataIndex], 1),
+          builder.getPtrTy());
+
+      // TODO: We may want to skip arrays/array sections in this as Clang does
+      // so it appears to be an optimisation rather than a neccessity though,

skatrak wrote:

```suggestion
      // so. It appears to be an optimisation rather than a necessity though,
```

https://github.com/llvm/llvm-project/pull/119588
[llvm-branch-commits] [flang] [flang][OpenMP] Handle directive arguments in OmpDirectiveSpecifier (PR #124278)
https://github.com/kparzysz created https://github.com/llvm/llvm-project/pull/124278 Implement parsing and symbol resolution for directives that take arguments. There are a few, and most of them take objects. Special handling is needed for two that take more specialized arguments: DECLARE MAPPER and DECLARE REDUCTION. This only affects directives in METADIRECTIVE's WHEN and OTHERWISE clauses. Parsing and semantic checks of other cases is unaffected. >From e230e8ad3bcd09fc28b18f64a84fcd20d6e9bc65 Mon Sep 17 00:00:00 2001 From: Krzysztof Parzyszek Date: Wed, 22 Jan 2025 09:47:44 -0600 Subject: [PATCH] [flang][OpenMP] Handle directive arguments in OmpDirectiveSpecifier Implement parsing and symbol resolution for directives that take arguments. There are a few, and most of them take objects. Special handling is needed for two that take more specialized arguments: DECLARE MAPPER and DECLARE REDUCTION. This only affects directives in METADIRECTIVE's WHEN and OTHERWISE clauses. Parsing and semantic checks of other cases is unaffected. 
--- flang/examples/FeatureList/FeatureList.cpp| 1 - flang/include/flang/Parser/dump-parse-tree.h | 9 +- flang/include/flang/Parser/parse-tree.h | 142 ++ flang/lib/Parser/openmp-parsers.cpp | 68 +++-- flang/lib/Parser/unparse.cpp | 40 +-- flang/lib/Semantics/check-omp-structure.cpp | 2 +- flang/lib/Semantics/resolve-names.cpp | 130 +++--- .../Parser/OpenMP/declare-mapper-unparse.f90 | 4 +- .../Parser/OpenMP/metadirective-dirspec.f90 | 242 ++ 9 files changed, 517 insertions(+), 121 deletions(-) create mode 100644 flang/test/Parser/OpenMP/metadirective-dirspec.f90 diff --git a/flang/examples/FeatureList/FeatureList.cpp b/flang/examples/FeatureList/FeatureList.cpp index 3a689c335c81c0..e35f120d8661ea 100644 --- a/flang/examples/FeatureList/FeatureList.cpp +++ b/flang/examples/FeatureList/FeatureList.cpp @@ -514,7 +514,6 @@ struct NodeVisitor { READ_FEATURE(OmpReductionClause) READ_FEATURE(OmpInReductionClause) READ_FEATURE(OmpReductionCombiner) - READ_FEATURE(OmpReductionCombiner::FunctionCombiner) READ_FEATURE(OmpReductionInitializerClause) READ_FEATURE(OmpReductionIdentifier) READ_FEATURE(OmpAllocateClause) diff --git a/flang/include/flang/Parser/dump-parse-tree.h b/flang/include/flang/Parser/dump-parse-tree.h index 1323fd695d4439..ce518c7c3edea0 100644 --- a/flang/include/flang/Parser/dump-parse-tree.h +++ b/flang/include/flang/Parser/dump-parse-tree.h @@ -476,6 +476,12 @@ class ParseTreeDumper { NODE(parser, NullInit) NODE(parser, ObjectDecl) NODE(parser, OldParameterStmt) + NODE(parser, OmpTypeSpecifier) + NODE(parser, OmpTypeNameList) + NODE(parser, OmpLocator) + NODE(parser, OmpLocatorList) + NODE(parser, OmpReductionSpecifier) + NODE(parser, OmpArgument) NODE(parser, OmpMetadirectiveDirective) NODE(parser, OmpMatchClause) NODE(parser, OmpOtherwiseClause) @@ -541,7 +547,7 @@ class ParseTreeDumper { NODE(parser, OmpDeclareTargetSpecifier) NODE(parser, OmpDeclareTargetWithClause) NODE(parser, OmpDeclareTargetWithList) - NODE(parser, OmpDeclareMapperSpecifier) + 
NODE(parser, OmpMapperSpecifier) NODE(parser, OmpDefaultClause) NODE_ENUM(OmpDefaultClause, DataSharingAttribute) NODE(parser, OmpVariableCategory) @@ -624,7 +630,6 @@ class ParseTreeDumper { NODE(parser, OmpReductionCombiner) NODE(parser, OmpTaskReductionClause) NODE(OmpTaskReductionClause, Modifier) - NODE(OmpReductionCombiner, FunctionCombiner) NODE(parser, OmpReductionInitializerClause) NODE(parser, OmpReductionIdentifier) NODE(parser, OmpAllocateClause) diff --git a/flang/include/flang/Parser/parse-tree.h b/flang/include/flang/Parser/parse-tree.h index 2e27b6ea7eafa1..993c1338f7235b 100644 --- a/flang/include/flang/Parser/parse-tree.h +++ b/flang/include/flang/Parser/parse-tree.h @@ -3454,15 +3454,7 @@ WRAPPER_CLASS(PauseStmt, std::optional); // --- Common definitions struct OmpClause; -struct OmpClauseList; - -struct OmpDirectiveSpecification { - TUPLE_CLASS_BOILERPLATE(OmpDirectiveSpecification); - std::tuple>> - t; - CharBlock source; -}; +struct OmpDirectiveSpecification; // 2.1 Directives or clauses may accept a list or extended-list. // A list item is a variable, array section or common block name (enclosed @@ -3475,15 +3467,76 @@ struct OmpObject { WRAPPER_CLASS(OmpObjectList, std::list); -#define MODIFIER_BOILERPLATE(...) \ - struct Modifier { \ -using Variant = std::variant<__VA_ARGS__>; \ -UNION_CLASS_BOILERPLATE(Modifier); \ -CharBlock source; \ -Variant u; \ - } +// Ref: [4.5:201-207], [5.0:293-299], [5.1:325-331], [5.2:124] +// +// reduction-identifier -> +//base-language-identifier |// since 4.5 +//- | // since 4.5, until 5.2 +//+ | * | .AND. | .OR. | .EQV. | .NEQV. | // since 4.5 +//MIN
[llvm-branch-commits] [flang] [flang][OpenMP] Handle directive arguments in OmpDirectiveSpecifier (PR #124278)
llvmbot wrote: @llvm/pr-subscribers-flang-openmp Author: Krzysztof Parzyszek (kparzysz) Changes Implement parsing and symbol resolution for directives that take arguments. There are a few, and most of them take objects. Special handling is needed for two that take more specialized arguments: DECLARE MAPPER and DECLARE REDUCTION. This only affects directives in METADIRECTIVE's WHEN and OTHERWISE clauses. Parsing and semantic checks of other cases is unaffected. --- Patch is 37.25 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/124278.diff 9 Files Affected: - (modified) flang/examples/FeatureList/FeatureList.cpp (-1) - (modified) flang/include/flang/Parser/dump-parse-tree.h (+7-2) - (modified) flang/include/flang/Parser/parse-tree.h (+97-45) - (modified) flang/lib/Parser/openmp-parsers.cpp (+45-23) - (modified) flang/lib/Parser/unparse.cpp (+25-15) - (modified) flang/lib/Semantics/check-omp-structure.cpp (+1-1) - (modified) flang/lib/Semantics/resolve-names.cpp (+98-32) - (modified) flang/test/Parser/OpenMP/declare-mapper-unparse.f90 (+2-2) - (added) flang/test/Parser/OpenMP/metadirective-dirspec.f90 (+242) ``diff diff --git a/flang/examples/FeatureList/FeatureList.cpp b/flang/examples/FeatureList/FeatureList.cpp index 3a689c335c81c0..e35f120d8661ea 100644 --- a/flang/examples/FeatureList/FeatureList.cpp +++ b/flang/examples/FeatureList/FeatureList.cpp @@ -514,7 +514,6 @@ struct NodeVisitor { READ_FEATURE(OmpReductionClause) READ_FEATURE(OmpInReductionClause) READ_FEATURE(OmpReductionCombiner) - READ_FEATURE(OmpReductionCombiner::FunctionCombiner) READ_FEATURE(OmpReductionInitializerClause) READ_FEATURE(OmpReductionIdentifier) READ_FEATURE(OmpAllocateClause) diff --git a/flang/include/flang/Parser/dump-parse-tree.h b/flang/include/flang/Parser/dump-parse-tree.h index 1323fd695d4439..ce518c7c3edea0 100644 --- a/flang/include/flang/Parser/dump-parse-tree.h +++ b/flang/include/flang/Parser/dump-parse-tree.h @@ -476,6 +476,12 @@ 
class ParseTreeDumper { NODE(parser, NullInit) NODE(parser, ObjectDecl) NODE(parser, OldParameterStmt) + NODE(parser, OmpTypeSpecifier) + NODE(parser, OmpTypeNameList) + NODE(parser, OmpLocator) + NODE(parser, OmpLocatorList) + NODE(parser, OmpReductionSpecifier) + NODE(parser, OmpArgument) NODE(parser, OmpMetadirectiveDirective) NODE(parser, OmpMatchClause) NODE(parser, OmpOtherwiseClause) @@ -541,7 +547,7 @@ class ParseTreeDumper { NODE(parser, OmpDeclareTargetSpecifier) NODE(parser, OmpDeclareTargetWithClause) NODE(parser, OmpDeclareTargetWithList) - NODE(parser, OmpDeclareMapperSpecifier) + NODE(parser, OmpMapperSpecifier) NODE(parser, OmpDefaultClause) NODE_ENUM(OmpDefaultClause, DataSharingAttribute) NODE(parser, OmpVariableCategory) @@ -624,7 +630,6 @@ class ParseTreeDumper { NODE(parser, OmpReductionCombiner) NODE(parser, OmpTaskReductionClause) NODE(OmpTaskReductionClause, Modifier) - NODE(OmpReductionCombiner, FunctionCombiner) NODE(parser, OmpReductionInitializerClause) NODE(parser, OmpReductionIdentifier) NODE(parser, OmpAllocateClause) diff --git a/flang/include/flang/Parser/parse-tree.h b/flang/include/flang/Parser/parse-tree.h index 2e27b6ea7eafa1..993c1338f7235b 100644 --- a/flang/include/flang/Parser/parse-tree.h +++ b/flang/include/flang/Parser/parse-tree.h @@ -3454,15 +3454,7 @@ WRAPPER_CLASS(PauseStmt, std::optional); // --- Common definitions struct OmpClause; -struct OmpClauseList; - -struct OmpDirectiveSpecification { - TUPLE_CLASS_BOILERPLATE(OmpDirectiveSpecification); - std::tuple>> - t; - CharBlock source; -}; +struct OmpDirectiveSpecification; // 2.1 Directives or clauses may accept a list or extended-list. // A list item is a variable, array section or common block name (enclosed @@ -3475,15 +3467,76 @@ struct OmpObject { WRAPPER_CLASS(OmpObjectList, std::list); -#define MODIFIER_BOILERPLATE(...) 
\ - struct Modifier { \ -using Variant = std::variant<__VA_ARGS__>; \ -UNION_CLASS_BOILERPLATE(Modifier); \ -CharBlock source; \ -Variant u; \ - } +// Ref: [4.5:201-207], [5.0:293-299], [5.1:325-331], [5.2:124] +// +// reduction-identifier -> +//base-language-identifier |// since 4.5 +//- | // since 4.5, until 5.2 +//+ | * | .AND. | .OR. | .EQV. | .NEQV. | // since 4.5 +//MIN | MAX | IAND | IOR | IEOR // since 4.5 +struct OmpReductionIdentifier { + UNION_CLASS_BOILERPLATE(OmpReductionIdentifier); + std::variant u; +}; -#define MODIFIERS() std::optional> +// Ref: [4.5:222:6], [5.0:305:27], [5.1:337:19], [5.2:126:3-4], [6.0:240:27-28] +// +// combiner-expression -> // since 4.5 +//assignment-statement | +//function-reference +struct OmpReductionCombiner { + UNION_CLASS_BOILERPLATE(OmpReductionCombiner); + std::variant u; +}; +
[llvm-branch-commits] [llvm] MachineUniformityAnalysis: Improve isConstantOrUndefValuePhi (PR #112866)
https://github.com/petar-avramovic updated https://github.com/llvm/llvm-project/pull/112866 >From 73554e86fc276e15db22462749aa71324d1e1f41 Mon Sep 17 00:00:00 2001 From: Petar Avramovic Date: Thu, 31 Oct 2024 14:10:57 +0100 Subject: [PATCH] MachineUniformityAnalysis: Improve isConstantOrUndefValuePhi Change existing code for G_PHI to match what LLVM-IR version is doing via PHINode::hasConstantOrUndefValue. This is not safe for regular PHI since it may appear with an undef operand and getVRegDef can fail. Most notably this improves number of values that can be allocated to sgpr register bank in AMDGPURegBankSelect. Common case here are phis that appear in structurize-cfg lowering for cycles with multiple exits: Undef incoming value is coming from block that reached cycle exit condition, if other incoming is uniform keep the phi uniform despite the fact it is joining values from pair of blocks that are entered via divergent condition branch. --- llvm/lib/CodeGen/MachineSSAContext.cpp| 27 +- .../AMDGPU/MIR/hidden-diverge-gmir.mir| 28 +++ .../AMDGPU/MIR/hidden-loop-diverge.mir| 4 +- .../AMDGPU/MIR/uses-value-from-cycle.mir | 8 +- .../GlobalISel/divergence-structurizer.mir| 80 -- .../regbankselect-mui-regbanklegalize.mir | 69 --- .../regbankselect-mui-regbankselect.mir | 18 ++-- .../AMDGPU/GlobalISel/regbankselect-mui.ll| 84 ++- .../AMDGPU/GlobalISel/regbankselect-mui.mir | 51 ++- 9 files changed, 191 insertions(+), 178 deletions(-) diff --git a/llvm/lib/CodeGen/MachineSSAContext.cpp b/llvm/lib/CodeGen/MachineSSAContext.cpp index e384187b6e8593..8e13c0916dd9e1 100644 --- a/llvm/lib/CodeGen/MachineSSAContext.cpp +++ b/llvm/lib/CodeGen/MachineSSAContext.cpp @@ -54,9 +54,34 @@ const MachineBasicBlock *MachineSSAContext::getDefBlock(Register value) const { return F->getRegInfo().getVRegDef(value)->getParent(); } +static bool isUndef(const MachineInstr &MI) { + return MI.getOpcode() == TargetOpcode::G_IMPLICIT_DEF || + MI.getOpcode() == TargetOpcode::IMPLICIT_DEF; +} + +/// 
MachineInstr equivalent of PHINode::hasConstantOrUndefValue() for G_PHI. template <> bool MachineSSAContext::isConstantOrUndefValuePhi(const MachineInstr &Phi) { - return Phi.isConstantValuePHI(); + if (!Phi.isPHI()) +return false; + + // In later passes PHI may appear with an undef operand, getVRegDef can fail. + if (Phi.getOpcode() == TargetOpcode::PHI) +return Phi.isConstantValuePHI(); + + // For G_PHI we do equivalent of PHINode::hasConstantOrUndefValue(). + const MachineRegisterInfo &MRI = Phi.getMF()->getRegInfo(); + Register This = Phi.getOperand(0).getReg(); + Register ConstantValue; + for (unsigned i = 1, e = Phi.getNumOperands(); i < e; i += 2) { +Register Incoming = Phi.getOperand(i).getReg(); +if (Incoming != This && !isUndef(*MRI.getVRegDef(Incoming))) { + if (ConstantValue && ConstantValue != Incoming) +return false; + ConstantValue = Incoming; +} + } + return true; } template <> diff --git a/llvm/test/Analysis/UniformityAnalysis/AMDGPU/MIR/hidden-diverge-gmir.mir b/llvm/test/Analysis/UniformityAnalysis/AMDGPU/MIR/hidden-diverge-gmir.mir index ce00edf3363f77..9694a340b5e906 100644 --- a/llvm/test/Analysis/UniformityAnalysis/AMDGPU/MIR/hidden-diverge-gmir.mir +++ b/llvm/test/Analysis/UniformityAnalysis/AMDGPU/MIR/hidden-diverge-gmir.mir @@ -1,24 +1,24 @@ # RUN: llc -mtriple=amdgcn-- -run-pass=print-machine-uniformity -o - %s 2>&1 | FileCheck %s # CHECK-LABEL: MachineUniformityInfo for function: hidden_diverge # CHECK-LABEL: BLOCK bb.0 -# CHECK: DIVERGENT: %{{[0-9]*}}: %{{[0-9]*}}:_(s32) = G_INTRINSIC intrinsic(@llvm.amdgcn.workitem.id.x) -# CHECK: DIVERGENT: %{{[0-9]*}}: %{{[0-9]*}}:_(s1) = G_ICMP intpred(slt) -# CHECK: DIVERGENT: %{{[0-9]*}}: %{{[0-9]*}}:_(s1) = G_XOR %{{[0-9]*}}:_, %{{[0-9]*}}:_ -# CHECK: DIVERGENT: %{{[0-9]*}}: %{{[0-9]*}}:_(s1), %{{[0-9]*}}:_(s64) = G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.if) -# CHECK: DIVERGENT: %{{[0-9]*}}: %{{[0-9]*}}:_(s1), %{{[0-9]*}}:_(s64) = G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.if) -# 
CHECK: DIVERGENT: G_BRCOND %{{[0-9]*}}:_(s1), %bb.1 -# CHECK: DIVERGENT: G_BR %bb.2 +# CHECK: DIVERGENT: %{{[0-9]*}}: %{{[0-9]*}}:_(s32) = G_INTRINSIC intrinsic(@llvm.amdgcn.workitem.id.x) +# CHECK: DIVERGENT: %{{[0-9]*}}: %{{[0-9]*}}:_(s1) = G_ICMP intpred(slt) +# CHECK: DIVERGENT: %{{[0-9]*}}: %{{[0-9]*}}:_(s1) = G_XOR %{{[0-9]*}}:_, %{{[0-9]*}}:_ +# CHECK: DIVERGENT: %{{[0-9]*}}: %{{[0-9]*}}:_(s1), %{{[0-9]*}}:_(s64) = G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.if) +# CHECK: DIVERGENT: %{{[0-9]*}}: %{{[0-9]*}}:_(s1), %{{[0-9]*}}:_(s64) = G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.if) +# CHECK: DIVERGENT: G_BRCOND %{{[0-9]*}}:_(s1), %bb.1 +# CHECK: DIVERGENT: G_BR %bb.2 # CHECK-LABEL: BLOCK bb.1 # CHECK-LABEL: BLOCK bb.2 -# CHECK: D
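The G_PHI branch of `isConstantOrUndefValuePhi` in the patch above can be illustrated with a simplified model. This is a sketch under stated assumptions, not LLVM code: `PhiModel` and `Incoming` are hypothetical types, and the `isUndef` flag stands in for "the incoming register is defined by `G_IMPLICIT_DEF` or `IMPLICIT_DEF`". The function ignores self-references and undef inputs, then requires every remaining incoming value to be the same register.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical, simplified model of one incoming value of a G_PHI.
struct Incoming {
  unsigned reg;  // incoming virtual register
  bool isUndef;  // true if the register is defined by an implicit-def
};

// Hypothetical model of a G_PHI: the register it defines plus its inputs.
struct PhiModel {
  unsigned def;
  std::vector<Incoming> incoming;
};

// True if, after skipping self-references and undef inputs, all remaining
// incoming values are one and the same register (or there are none left).
bool hasConstantOrUndefValue(const PhiModel &phi) {
  std::optional<unsigned> constant;
  for (const Incoming &in : phi.incoming) {
    if (in.reg == phi.def || in.isUndef)
      continue;
    if (constant && *constant != in.reg)
      return false; // two different real inputs
    constant = in.reg;
  }
  return true;
}
```

A phi joining one uniform value with an undef from a cycle-exit block is accepted by this check, which matches the structurize-cfg case described in the commit message; a phi joining two distinct defined values is rejected.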
[llvm-branch-commits] [flang] [mlir] [mlir][OpenMP][flang] make private variable allocation implicit in omp.private (PR #124019)
@@ -55,15 +55,19 @@ class MapsForPrivatizedSymbolsPass
         std::underlying_type_t>(
             llvm::omp::OpenMPOffloadMappingFlags::OMP_MAP_TO);
     Operation *definingOp = var.getDefiningOp();
-    auto declOp = llvm::dyn_cast_or_null(definingOp);
-    assert(declOp &&
-           "Expected defining Op of privatized var to be hlfir.declare");
+    assert(definingOp &&
+           "Privatizing a block argument without any hlfir.declare");

tblah wrote:

I was nervous to make any functional change to the target stuff because I don't know how to test it. The previous implementation also wouldn't have worked for block arguments. I can fix this if somebody at AMD is willing to test `omp target` for me?

https://github.com/llvm/llvm-project/pull/124019
[llvm-branch-commits] [llvm] a41ded8 - Revert "[GlobalMerge][NFC] Skip sorting by profitability when it is not neede…"
Author: Michael Maitland
Date: 2025-01-24T23:42:18-05:00
New Revision: a41ded832d91141939c1b4aa2e955471a1047755

URL: https://github.com/llvm/llvm-project/commit/a41ded832d91141939c1b4aa2e955471a1047755
DIFF: https://github.com/llvm/llvm-project/commit/a41ded832d91141939c1b4aa2e955471a1047755.diff

LOG: Revert "[GlobalMerge][NFC] Skip sorting by profitability when it is not neede…"

This reverts commit e5e55c04d6af4ae32c99d574f59e632595abf607.

Added:

Modified:
    llvm/lib/CodeGen/GlobalMerge.cpp

Removed:

diff --git a/llvm/lib/CodeGen/GlobalMerge.cpp b/llvm/lib/CodeGen/GlobalMerge.cpp
index 41e01a1d3ccd52..7b76155b175d1d 100644
--- a/llvm/lib/CodeGen/GlobalMerge.cpp
+++ b/llvm/lib/CodeGen/GlobalMerge.cpp
@@ -423,12 +423,24 @@ bool GlobalMergeImpl::doMerge(SmallVectorImpl &Globals,
     }
   }

+  // Now we found a bunch of sets of globals used together. We accumulated
+  // the number of times we encountered the sets (i.e., the number of functions
+  // that use that exact set of globals).
+  //
+  // Multiply that by the size of the set to give us a crude profitability
+  // metric.
+  llvm::stable_sort(UsedGlobalSets,
+                    [](const UsedGlobalSet &UGS1, const UsedGlobalSet &UGS2) {
+                      return UGS1.Globals.count() * UGS1.UsageCount <
+                             UGS2.Globals.count() * UGS2.UsageCount;
+                    });
+
   // We can choose to merge all globals together, but ignore globals never used
   // with another global. This catches the obviously non-profitable cases of
   // having a single global, but is aggressive enough for any other case.
   if (GlobalMergeIgnoreSingleUse) {
     BitVector AllGlobals(Globals.size());
-    for (const UsedGlobalSet &UGS : UsedGlobalSets) {
+    for (const UsedGlobalSet &UGS : llvm::reverse(UsedGlobalSets)) {
       if (UGS.UsageCount == 0)
         continue;
       if (UGS.Globals.count() > 1)
@@ -437,16 +449,6 @@ bool GlobalMergeImpl::doMerge(SmallVectorImpl &Globals,
     return doMerge(Globals, AllGlobals, M, isConst, AddrSpace);
   }

-  // Now we found a bunch of sets of globals used together. We accumulated
-  // the number of times we encountered the sets (i.e., the number of functions
-  // that use that exact set of globals). Multiply that by the size of the set
-  // to give us a crude profitability metric.
-  llvm::stable_sort(UsedGlobalSets,
-                    [](const UsedGlobalSet &UGS1, const UsedGlobalSet &UGS2) {
-                      return UGS1.Globals.count() * UGS1.UsageCount >=
-                             UGS2.Globals.count() * UGS2.UsageCount;
-                    });
-
   // Starting from the sets with the best (=biggest) profitability, find a
   // good combination.
   // The ideal (and expensive) solution can only be found by trying all
@@ -456,7 +458,7 @@ bool GlobalMergeImpl::doMerge(SmallVectorImpl &Globals,
   BitVector PickedGlobals(Globals.size());
   bool Changed = false;

-  for (const UsedGlobalSet &UGS : UsedGlobalSets) {
+  for (const UsedGlobalSet &UGS : llvm::reverse(UsedGlobalSets)) {
     if (UGS.UsageCount == 0)
       continue;
     if (PickedGlobals.anyCommon(UGS.Globals))
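The restored pattern (stable sort ascending by a crude profit metric, then walk the result in reverse) can be sketched with the standard library. This is an illustration only: `UsedSet` and `visitOrder` are hypothetical stand-ins for `UsedGlobalSet` and the loops in `doMerge`, and the commit message itself gives no reason for the revert. One general observation is that a comparator written with `<` is a strict weak ordering, as `std::stable_sort` requires, whereas one written with `>=` returns true for two equal elements and therefore is not.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for UsedGlobalSet: an id for the test, the number of
// globals in the set, and how many functions use exactly that set.
struct UsedSet {
  int id;
  std::size_t globalCount;
  std::size_t usageCount;
};

// Sorts ascending by globalCount * usageCount, then visits the sets from the
// most to the least profitable, skipping sets that are never used.
std::vector<int> visitOrder(std::vector<UsedSet> sets) {
  std::stable_sort(sets.begin(), sets.end(),
                   [](const UsedSet &a, const UsedSet &b) {
                     // Strict '<' keeps this a valid strict weak ordering.
                     return a.globalCount * a.usageCount <
                            b.globalCount * b.usageCount;
                   });
  std::vector<int> order;
  for (auto it = sets.rbegin(); it != sets.rend(); ++it) {
    if (it->usageCount == 0)
      continue;
    order.push_back(it->id);
  }
  return order;
}
```

Note that reverse iteration also reverses the relative order of equally profitable sets: among ties, the set that appeared later in the input is visited first.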
[llvm-branch-commits] [clang] [clang] fix template argument conversion (PR #124386)
https://github.com/zyn0217 edited https://github.com/llvm/llvm-project/pull/124386
[llvm-branch-commits] [clang] [clang] fix template argument conversion (PR #124386)
https://github.com/zyn0217 approved this pull request.

Generally looks good, but please give others a chance to take a look.

https://github.com/llvm/llvm-project/pull/124386
[llvm-branch-commits] [clang] [clang] fix template argument conversion (PR #124386)
@@ -5252,63 +5253,89 @@ bool Sema::CheckTemplateArgument(
       return true;
     }

-    switch (Arg.getArgument().getKind()) {
-    case TemplateArgument::Null:
-      llvm_unreachable("Should never see a NULL template argument here");
-
-    case TemplateArgument::Expression: {
-      Expr *E = Arg.getArgument().getAsExpr();
+    auto checkExpr = [&](Expr *E) -> Expr * {
       TemplateArgument SugaredResult, CanonicalResult;
       unsigned CurSFINAEErrors = NumSFINAEErrors;
       ExprResult Res =
           CheckTemplateArgument(NTTP, NTTPType, E, SugaredResult,
                                 CanonicalResult, PartialOrderingTTP, CTAK);
-      if (Res.isInvalid())
-        return true;

       // If the current template argument causes an error, give up now.
-      if (CurSFINAEErrors < NumSFINAEErrors)
-        return true;
+      if (Res.isInvalid() || CurSFINAEErrors < NumSFINAEErrors)
+        return nullptr;
+      SugaredConverted.push_back(SugaredResult);
+      CanonicalConverted.push_back(CanonicalResult);
+      return Res.get();
+    };
+
+    switch (Arg.getKind()) {
+    case TemplateArgument::Null:
+      llvm_unreachable("Should never see a NULL template argument here");

+    case TemplateArgument::Expression: {
+      Expr *E = Arg.getAsExpr();
+      Expr *R = checkExpr(E);
+      if (!R)
+        return true;
       // If the resulting expression is new, then use it in place of the
       // old expression in the template argument.
-      if (Res.get() != E) {
-        TemplateArgument TA(Res.get());
-        Arg = TemplateArgumentLoc(TA, Res.get());
+      if (R != E) {
+        TemplateArgument TA(R);
+        ArgLoc = TemplateArgumentLoc(TA, R);
       }
-
-      SugaredConverted.push_back(SugaredResult);
-      CanonicalConverted.push_back(CanonicalResult);
       break;
     }

-    case TemplateArgument::Declaration:
-    case TemplateArgument::Integral:
+    // As for the converted NTTP kinds, they still might need another
+    // conversion, as the new corresponding parameter might be different.
+    // Ideally, we would always perform substitution starting with sugared types
+    // and never need these, as we would still have expressions. Since these are
+    // needed so rarely, it's probably a better tradeoff to just convert them
+    // back to expressions.
+    case TemplateArgument::Integral: {
+      IntegerLiteral ILE(Context, Arg.getAsIntegral(), Arg.getIntegralType(),
+                         SourceLocation());
+      if (!checkExpr(&ILE))

zyn0217 wrote:

So this makes `CheckTemplateArgument` take an Expr pointer to a temporary rather than anything persisted by ASTContext. Shall we document this behavior somewhere to avoid accidentally storing it longer in `CheckTemplateArgument`?

https://github.com/llvm/llvm-project/pull/124386
[llvm-branch-commits] [llvm] MachineUniformityAnalysis: Improve isConstantOrUndefValuePhi (PR #112866)
https://github.com/petar-avramovic updated https://github.com/llvm/llvm-project/pull/112866 >From c336fe428d4d1824a4a437c99655cb909bf328c6 Mon Sep 17 00:00:00 2001 From: Petar Avramovic Date: Thu, 31 Oct 2024 14:10:57 +0100 Subject: [PATCH] MachineUniformityAnalysis: Improve isConstantOrUndefValuePhi Change existing code for G_PHI to match what LLVM-IR version is doing via PHINode::hasConstantOrUndefValue. This is not safe for regular PHI since it may appear with an undef operand and getVRegDef can fail. Most notably this improves number of values that can be allocated to sgpr register bank in AMDGPURegBankSelect. Common case here are phis that appear in structurize-cfg lowering for cycles with multiple exits: Undef incoming value is coming from block that reached cycle exit condition, if other incoming is uniform keep the phi uniform despite the fact it is joining values from pair of blocks that are entered via divergent condition branch. --- llvm/lib/CodeGen/MachineSSAContext.cpp| 27 +- .../AMDGPU/MIR/hidden-diverge-gmir.mir| 28 +++ .../AMDGPU/MIR/hidden-loop-diverge.mir| 4 +- .../AMDGPU/MIR/uses-value-from-cycle.mir | 8 +- .../GlobalISel/divergence-structurizer.mir| 80 -- .../regbankselect-mui-regbanklegalize.mir | 69 --- .../regbankselect-mui-regbankselect.mir | 18 ++-- .../AMDGPU/GlobalISel/regbankselect-mui.ll| 84 ++- .../AMDGPU/GlobalISel/regbankselect-mui.mir | 51 ++- 9 files changed, 191 insertions(+), 178 deletions(-) diff --git a/llvm/lib/CodeGen/MachineSSAContext.cpp b/llvm/lib/CodeGen/MachineSSAContext.cpp index e384187b6e8593..8e13c0916dd9e1 100644 --- a/llvm/lib/CodeGen/MachineSSAContext.cpp +++ b/llvm/lib/CodeGen/MachineSSAContext.cpp @@ -54,9 +54,34 @@ const MachineBasicBlock *MachineSSAContext::getDefBlock(Register value) const { return F->getRegInfo().getVRegDef(value)->getParent(); } +static bool isUndef(const MachineInstr &MI) { + return MI.getOpcode() == TargetOpcode::G_IMPLICIT_DEF || + MI.getOpcode() == TargetOpcode::IMPLICIT_DEF; +} + +/// 
MachineInstr equivalent of PHINode::hasConstantOrUndefValue() for G_PHI. template <> bool MachineSSAContext::isConstantOrUndefValuePhi(const MachineInstr &Phi) { - return Phi.isConstantValuePHI(); + if (!Phi.isPHI()) +return false; + + // In later passes PHI may appear with an undef operand, getVRegDef can fail. + if (Phi.getOpcode() == TargetOpcode::PHI) +return Phi.isConstantValuePHI(); + + // For G_PHI we do equivalent of PHINode::hasConstantOrUndefValue(). + const MachineRegisterInfo &MRI = Phi.getMF()->getRegInfo(); + Register This = Phi.getOperand(0).getReg(); + Register ConstantValue; + for (unsigned i = 1, e = Phi.getNumOperands(); i < e; i += 2) { +Register Incoming = Phi.getOperand(i).getReg(); +if (Incoming != This && !isUndef(*MRI.getVRegDef(Incoming))) { + if (ConstantValue && ConstantValue != Incoming) +return false; + ConstantValue = Incoming; +} + } + return true; } template <> diff --git a/llvm/test/Analysis/UniformityAnalysis/AMDGPU/MIR/hidden-diverge-gmir.mir b/llvm/test/Analysis/UniformityAnalysis/AMDGPU/MIR/hidden-diverge-gmir.mir index ce00edf3363f77..9694a340b5e906 100644 --- a/llvm/test/Analysis/UniformityAnalysis/AMDGPU/MIR/hidden-diverge-gmir.mir +++ b/llvm/test/Analysis/UniformityAnalysis/AMDGPU/MIR/hidden-diverge-gmir.mir @@ -1,24 +1,24 @@ # RUN: llc -mtriple=amdgcn-- -run-pass=print-machine-uniformity -o - %s 2>&1 | FileCheck %s # CHECK-LABEL: MachineUniformityInfo for function: hidden_diverge # CHECK-LABEL: BLOCK bb.0 -# CHECK: DIVERGENT: %{{[0-9]*}}: %{{[0-9]*}}:_(s32) = G_INTRINSIC intrinsic(@llvm.amdgcn.workitem.id.x) -# CHECK: DIVERGENT: %{{[0-9]*}}: %{{[0-9]*}}:_(s1) = G_ICMP intpred(slt) -# CHECK: DIVERGENT: %{{[0-9]*}}: %{{[0-9]*}}:_(s1) = G_XOR %{{[0-9]*}}:_, %{{[0-9]*}}:_ -# CHECK: DIVERGENT: %{{[0-9]*}}: %{{[0-9]*}}:_(s1), %{{[0-9]*}}:_(s64) = G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.if) -# CHECK: DIVERGENT: %{{[0-9]*}}: %{{[0-9]*}}:_(s1), %{{[0-9]*}}:_(s64) = G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.if) -# 
CHECK: DIVERGENT: G_BRCOND %{{[0-9]*}}:_(s1), %bb.1 -# CHECK: DIVERGENT: G_BR %bb.2 +# CHECK: DIVERGENT: %{{[0-9]*}}: %{{[0-9]*}}:_(s32) = G_INTRINSIC intrinsic(@llvm.amdgcn.workitem.id.x) +# CHECK: DIVERGENT: %{{[0-9]*}}: %{{[0-9]*}}:_(s1) = G_ICMP intpred(slt) +# CHECK: DIVERGENT: %{{[0-9]*}}: %{{[0-9]*}}:_(s1) = G_XOR %{{[0-9]*}}:_, %{{[0-9]*}}:_ +# CHECK: DIVERGENT: %{{[0-9]*}}: %{{[0-9]*}}:_(s1), %{{[0-9]*}}:_(s64) = G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.if) +# CHECK: DIVERGENT: %{{[0-9]*}}: %{{[0-9]*}}:_(s1), %{{[0-9]*}}:_(s64) = G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.if) +# CHECK: DIVERGENT: G_BRCOND %{{[0-9]*}}:_(s1), %bb.1 +# CHECK: DIVERGENT: G_BR %bb.2 # CHECK-LABEL: BLOCK bb.1 # CHECK-LABEL: BLOCK bb.2 -# CHECK: D
[llvm-branch-commits] [llvm] MachineUniformityAnalysis: Improve isConstantOrUndefValuePhi (PR #112866)
https://github.com/petar-avramovic updated https://github.com/llvm/llvm-project/pull/112866 >From 87c8fc15b5b8ccb0b7d48065caa82cdbeddfeac5 Mon Sep 17 00:00:00 2001 From: Petar Avramovic Date: Thu, 31 Oct 2024 14:10:57 +0100 Subject: [PATCH] MachineUniformityAnalysis: Improve isConstantOrUndefValuePhi Change existing code for G_PHI to match what LLVM-IR version is doing via PHINode::hasConstantOrUndefValue. This is not safe for regular PHI since it may appear with an undef operand and getVRegDef can fail. Most notably this improves number of values that can be allocated to sgpr register bank in AMDGPURegBankSelect. Common case here are phis that appear in structurize-cfg lowering for cycles with multiple exits: Undef incoming value is coming from block that reached cycle exit condition, if other incoming is uniform keep the phi uniform despite the fact it is joining values from pair of blocks that are entered via divergent condition branch. --- llvm/lib/CodeGen/MachineSSAContext.cpp| 27 +- .../AMDGPU/MIR/hidden-diverge-gmir.mir| 28 +++ .../AMDGPU/MIR/hidden-loop-diverge.mir| 4 +- .../AMDGPU/MIR/uses-value-from-cycle.mir | 8 +- .../GlobalISel/divergence-structurizer.mir| 80 -- .../regbankselect-mui-regbanklegalize.mir | 69 --- .../regbankselect-mui-regbankselect.mir | 18 ++-- .../AMDGPU/GlobalISel/regbankselect-mui.ll| 84 ++- .../AMDGPU/GlobalISel/regbankselect-mui.mir | 51 ++- 9 files changed, 191 insertions(+), 178 deletions(-) diff --git a/llvm/lib/CodeGen/MachineSSAContext.cpp b/llvm/lib/CodeGen/MachineSSAContext.cpp index e384187b6e8593..8e13c0916dd9e1 100644 --- a/llvm/lib/CodeGen/MachineSSAContext.cpp +++ b/llvm/lib/CodeGen/MachineSSAContext.cpp @@ -54,9 +54,34 @@ const MachineBasicBlock *MachineSSAContext::getDefBlock(Register value) const { return F->getRegInfo().getVRegDef(value)->getParent(); } +static bool isUndef(const MachineInstr &MI) { + return MI.getOpcode() == TargetOpcode::G_IMPLICIT_DEF || + MI.getOpcode() == TargetOpcode::IMPLICIT_DEF; +} + +/// 
MachineInstr equivalent of PHINode::hasConstantOrUndefValue() for G_PHI. template <> bool MachineSSAContext::isConstantOrUndefValuePhi(const MachineInstr &Phi) { - return Phi.isConstantValuePHI(); + if (!Phi.isPHI()) +return false; + + // In later passes PHI may appear with an undef operand, getVRegDef can fail. + if (Phi.getOpcode() == TargetOpcode::PHI) +return Phi.isConstantValuePHI(); + + // For G_PHI we do equivalent of PHINode::hasConstantOrUndefValue(). + const MachineRegisterInfo &MRI = Phi.getMF()->getRegInfo(); + Register This = Phi.getOperand(0).getReg(); + Register ConstantValue; + for (unsigned i = 1, e = Phi.getNumOperands(); i < e; i += 2) { +Register Incoming = Phi.getOperand(i).getReg(); +if (Incoming != This && !isUndef(*MRI.getVRegDef(Incoming))) { + if (ConstantValue && ConstantValue != Incoming) +return false; + ConstantValue = Incoming; +} + } + return true; } template <> diff --git a/llvm/test/Analysis/UniformityAnalysis/AMDGPU/MIR/hidden-diverge-gmir.mir b/llvm/test/Analysis/UniformityAnalysis/AMDGPU/MIR/hidden-diverge-gmir.mir index ce00edf3363f77..9694a340b5e906 100644 --- a/llvm/test/Analysis/UniformityAnalysis/AMDGPU/MIR/hidden-diverge-gmir.mir +++ b/llvm/test/Analysis/UniformityAnalysis/AMDGPU/MIR/hidden-diverge-gmir.mir @@ -1,24 +1,24 @@ # RUN: llc -mtriple=amdgcn-- -run-pass=print-machine-uniformity -o - %s 2>&1 | FileCheck %s # CHECK-LABEL: MachineUniformityInfo for function: hidden_diverge # CHECK-LABEL: BLOCK bb.0 -# CHECK: DIVERGENT: %{{[0-9]*}}: %{{[0-9]*}}:_(s32) = G_INTRINSIC intrinsic(@llvm.amdgcn.workitem.id.x) -# CHECK: DIVERGENT: %{{[0-9]*}}: %{{[0-9]*}}:_(s1) = G_ICMP intpred(slt) -# CHECK: DIVERGENT: %{{[0-9]*}}: %{{[0-9]*}}:_(s1) = G_XOR %{{[0-9]*}}:_, %{{[0-9]*}}:_ -# CHECK: DIVERGENT: %{{[0-9]*}}: %{{[0-9]*}}:_(s1), %{{[0-9]*}}:_(s64) = G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.if) -# CHECK: DIVERGENT: %{{[0-9]*}}: %{{[0-9]*}}:_(s1), %{{[0-9]*}}:_(s64) = G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.if) -# 
CHECK: DIVERGENT: G_BRCOND %{{[0-9]*}}:_(s1), %bb.1 -# CHECK: DIVERGENT: G_BR %bb.2 +# CHECK: DIVERGENT: %{{[0-9]*}}: %{{[0-9]*}}:_(s32) = G_INTRINSIC intrinsic(@llvm.amdgcn.workitem.id.x) +# CHECK: DIVERGENT: %{{[0-9]*}}: %{{[0-9]*}}:_(s1) = G_ICMP intpred(slt) +# CHECK: DIVERGENT: %{{[0-9]*}}: %{{[0-9]*}}:_(s1) = G_XOR %{{[0-9]*}}:_, %{{[0-9]*}}:_ +# CHECK: DIVERGENT: %{{[0-9]*}}: %{{[0-9]*}}:_(s1), %{{[0-9]*}}:_(s64) = G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.if) +# CHECK: DIVERGENT: %{{[0-9]*}}: %{{[0-9]*}}:_(s1), %{{[0-9]*}}:_(s64) = G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.if) +# CHECK: DIVERGENT: G_BRCOND %{{[0-9]*}}:_(s1), %bb.1 +# CHECK: DIVERGENT: G_BR %bb.2 # CHECK-LABEL: BLOCK bb.1 # CHECK-LABEL: BLOCK bb.2 -# CHECK: D
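The G_PHI rule in the patch above can be illustrated outside of LLVM. The following is a minimal sketch, not the LLVM implementation: `Incoming` and `PhiModel` are hypothetical stand-ins for `Register`/`MachineInstr`, and the `isUndef` flag stands in for the patch's check that the defining instruction is `G_IMPLICIT_DEF` or `IMPLICIT_DEF`.

```cpp
#include <optional>
#include <vector>

// Hypothetical stand-ins for Register / MachineInstr; these are not LLVM types.
struct Incoming {
  int reg;       // virtual register number of the incoming value
  bool isUndef;  // true if the register is defined by an implicit-def
};

struct PhiModel {
  int result;                      // register defined by the phi itself
  std::vector<Incoming> incoming;  // one entry per predecessor block
};

// True when every incoming value is the phi itself, undef, or one common
// register. Mirrors the loop in the patch, minus the MachineRegisterInfo lookup.
bool isConstantOrUndefValuePhi(const PhiModel &phi) {
  std::optional<int> constant;
  for (const Incoming &in : phi.incoming) {
    if (in.reg == phi.result || in.isUndef)
      continue;  // self-references and undef inputs are ignored
    if (constant && *constant != in.reg)
      return false;  // two distinct defined values meet at this phi
    constant = in.reg;
  }
  return true;
}
```

Under this model, a phi joining one defined register with an undef input counts as constant-or-undef, which is the structurize-cfg case the commit message describes.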
[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: RegBankLegalize rules for load (PR #112882)
https://github.com/petar-avramovic updated https://github.com/llvm/llvm-project/pull/112882 >From 8cb73f44dd58c897ad3acde5e29014a21ea38ea4 Mon Sep 17 00:00:00 2001 From: Petar Avramovic Date: Thu, 23 Jan 2025 13:35:07 +0100 Subject: [PATCH] AMDGPU/GlobalISel: RegBankLegalize rules for load Add IDs for bit width that cover multiple LLTs: B32 B64 etc. "Predicate" wrapper class for bool predicate functions used to write pretty rules. Predicates can be combined using &&, || and !. Lowering for splitting and widening loads. Write rules for loads to not change existing mir tests from old regbankselect. --- .../AMDGPU/AMDGPURegBankLegalizeHelper.cpp| 288 +++- .../AMDGPU/AMDGPURegBankLegalizeHelper.h | 5 + .../AMDGPU/AMDGPURegBankLegalizeRules.cpp | 278 ++- .../AMDGPU/AMDGPURegBankLegalizeRules.h | 65 +++- .../AMDGPU/GlobalISel/regbankselect-load.mir | 320 +++--- .../GlobalISel/regbankselect-zextload.mir | 9 +- 6 files changed, 900 insertions(+), 65 deletions(-) diff --git a/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.cpp b/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.cpp index d27fa1f62538b6..3c007987b84947 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.cpp +++ b/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.cpp @@ -50,6 +50,83 @@ void RegBankLegalizeHelper::findRuleAndApplyMapping(MachineInstr &MI) { lower(MI, Mapping, WaterfallSgprs); } +void RegBankLegalizeHelper::splitLoad(MachineInstr &MI, + ArrayRef LLTBreakdown, LLT MergeTy) { + MachineFunction &MF = B.getMF(); + assert(MI.getNumMemOperands() == 1); + MachineMemOperand &BaseMMO = **MI.memoperands_begin(); + Register Dst = MI.getOperand(0).getReg(); + const RegisterBank *DstRB = MRI.getRegBankOrNull(Dst); + Register Base = MI.getOperand(1).getReg(); + LLT PtrTy = MRI.getType(Base); + const RegisterBank *PtrRB = MRI.getRegBankOrNull(Base); + LLT OffsetTy = LLT::scalar(PtrTy.getSizeInBits()); + SmallVector LoadPartRegs; + + unsigned ByteOffset = 0; + for (LLT PartTy : LLTBreakdown) { 
+Register BasePlusOffset; +if (ByteOffset == 0) { + BasePlusOffset = Base; +} else { + auto Offset = B.buildConstant({PtrRB, OffsetTy}, ByteOffset); + BasePlusOffset = B.buildPtrAdd({PtrRB, PtrTy}, Base, Offset).getReg(0); +} +auto *OffsetMMO = MF.getMachineMemOperand(&BaseMMO, ByteOffset, PartTy); +auto LoadPart = B.buildLoad({DstRB, PartTy}, BasePlusOffset, *OffsetMMO); +LoadPartRegs.push_back(LoadPart.getReg(0)); +ByteOffset += PartTy.getSizeInBytes(); + } + + if (!MergeTy.isValid()) { +// Loads are of same size, concat or merge them together. +B.buildMergeLikeInstr(Dst, LoadPartRegs); + } else { +// Loads are not all of same size, need to unmerge them to smaller pieces +// of MergeTy type, then merge pieces to Dst. +SmallVector MergeTyParts; +for (Register Reg : LoadPartRegs) { + if (MRI.getType(Reg) == MergeTy) { +MergeTyParts.push_back(Reg); + } else { +auto Unmerge = B.buildUnmerge({DstRB, MergeTy}, Reg); +for (unsigned i = 0; i < Unmerge->getNumOperands() - 1; ++i) + MergeTyParts.push_back(Unmerge.getReg(i)); + } +} +B.buildMergeLikeInstr(Dst, MergeTyParts); + } + MI.eraseFromParent(); +} + +void RegBankLegalizeHelper::widenLoad(MachineInstr &MI, LLT WideTy, + LLT MergeTy) { + MachineFunction &MF = B.getMF(); + assert(MI.getNumMemOperands() == 1); + MachineMemOperand &BaseMMO = **MI.memoperands_begin(); + Register Dst = MI.getOperand(0).getReg(); + const RegisterBank *DstRB = MRI.getRegBankOrNull(Dst); + Register Base = MI.getOperand(1).getReg(); + + MachineMemOperand *WideMMO = MF.getMachineMemOperand(&BaseMMO, 0, WideTy); + auto WideLoad = B.buildLoad({DstRB, WideTy}, Base, *WideMMO); + + if (WideTy.isScalar()) { +B.buildTrunc(Dst, WideLoad); + } else { +SmallVector MergeTyParts; +auto Unmerge = B.buildUnmerge({DstRB, MergeTy}, WideLoad); + +LLT DstTy = MRI.getType(Dst); +unsigned NumElts = DstTy.getSizeInBits() / MergeTy.getSizeInBits(); +for (unsigned i = 0; i < NumElts; ++i) { + MergeTyParts.push_back(Unmerge.getReg(i)); +} +B.buildMergeLikeInstr(Dst, 
MergeTyParts); + } + MI.eraseFromParent(); +} + void RegBankLegalizeHelper::lower(MachineInstr &MI, const RegBankLLTMapping &Mapping, SmallSet &WaterfallSgprs) { @@ -128,6 +205,54 @@ void RegBankLegalizeHelper::lower(MachineInstr &MI, MI.eraseFromParent(); break; } + case SplitLoad: { +LLT DstTy = MRI.getType(MI.getOperand(0).getReg()); +unsigned Size = DstTy.getSizeInBits(); +// Even split to 128-bit loads +if (Size > 128) { + LLT B128; + if (DstTy.isVector()) { +LLT EltTy = DstTy.getElementType(); +B128 = LLT::f
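The offset bookkeeping in `splitLoad` above can be sketched without any LLVM types. This is an illustration under stated assumptions: `PartLoad` and `planSplitLoad` are hypothetical names, part sizes are assumed to be whole bytes, and only the running byte offset from the patch's loop is modelled (no pointer arithmetic, memory operands, or merging).

```cpp
#include <vector>

// Hypothetical description of one partial load; not an LLVM type.
struct PartLoad {
  unsigned byteOffset;  // offset added to the base pointer for this part
  unsigned sizeInBits;  // width of this partial load
};

// Walks the breakdown in order and assigns each part the running byte offset,
// the same way the patch advances ByteOffset by PartTy.getSizeInBytes().
std::vector<PartLoad> planSplitLoad(const std::vector<unsigned> &partSizesInBits) {
  std::vector<PartLoad> plan;
  unsigned byteOffset = 0;
  for (unsigned bits : partSizesInBits) {
    plan.push_back({byteOffset, bits});
    byteOffset += bits / 8;  // assumes each part is a whole number of bytes
  }
  return plan;
}
```

For a hypothetical breakdown of {128, 128, 32} bits, the first part reuses the base pointer unchanged (offset 0), matching the `ByteOffset == 0` special case in the patch, and the later parts start at the accumulated offsets.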
[llvm-branch-commits] [llvm] [Coro] Use DebugInfoCache to speed up cloning in CoroSplitPass (PR #118630)
https://github.com/artempyanykh edited https://github.com/llvm/llvm-project/pull/118630
[llvm-branch-commits] [llvm] [Analysis] Add DebugInfoCache analysis (PR #118629)
https://github.com/artempyanykh edited https://github.com/llvm/llvm-project/pull/118629
[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: RegBankLegalize rules for load (PR #112882)
https://github.com/petar-avramovic updated https://github.com/llvm/llvm-project/pull/112882 >From 0030251f71c08000c1b4ff123cc401b70c72014f Mon Sep 17 00:00:00 2001 From: Petar Avramovic Date: Thu, 23 Jan 2025 13:35:07 +0100 Subject: [PATCH] AMDGPU/GlobalISel: RegBankLegalize rules for load Add IDs for bit width that cover multiple LLTs: B32 B64 etc. "Predicate" wrapper class for bool predicate functions used to write pretty rules. Predicates can be combined using &&, || and !. Lowering for splitting and widening loads. Write rules for loads to not change existing mir tests from old regbankselect. --- .../AMDGPU/AMDGPURegBankLegalizeHelper.cpp| 288 +++- .../AMDGPU/AMDGPURegBankLegalizeHelper.h | 5 + .../AMDGPU/AMDGPURegBankLegalizeRules.cpp | 278 ++- .../AMDGPU/AMDGPURegBankLegalizeRules.h | 65 +++- .../AMDGPU/GlobalISel/regbankselect-load.mir | 320 +++--- .../GlobalISel/regbankselect-zextload.mir | 9 +- 6 files changed, 900 insertions(+), 65 deletions(-) diff --git a/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.cpp b/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.cpp index d27fa1f62538b6..3c007987b84947 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.cpp +++ b/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.cpp @@ -50,6 +50,83 @@ void RegBankLegalizeHelper::findRuleAndApplyMapping(MachineInstr &MI) { lower(MI, Mapping, WaterfallSgprs); } +void RegBankLegalizeHelper::splitLoad(MachineInstr &MI, + ArrayRef LLTBreakdown, LLT MergeTy) { + MachineFunction &MF = B.getMF(); + assert(MI.getNumMemOperands() == 1); + MachineMemOperand &BaseMMO = **MI.memoperands_begin(); + Register Dst = MI.getOperand(0).getReg(); + const RegisterBank *DstRB = MRI.getRegBankOrNull(Dst); + Register Base = MI.getOperand(1).getReg(); + LLT PtrTy = MRI.getType(Base); + const RegisterBank *PtrRB = MRI.getRegBankOrNull(Base); + LLT OffsetTy = LLT::scalar(PtrTy.getSizeInBits()); + SmallVector LoadPartRegs; + + unsigned ByteOffset = 0; + for (LLT PartTy : LLTBreakdown) { 
+Register BasePlusOffset; +if (ByteOffset == 0) { + BasePlusOffset = Base; +} else { + auto Offset = B.buildConstant({PtrRB, OffsetTy}, ByteOffset); + BasePlusOffset = B.buildPtrAdd({PtrRB, PtrTy}, Base, Offset).getReg(0); +} +auto *OffsetMMO = MF.getMachineMemOperand(&BaseMMO, ByteOffset, PartTy); +auto LoadPart = B.buildLoad({DstRB, PartTy}, BasePlusOffset, *OffsetMMO); +LoadPartRegs.push_back(LoadPart.getReg(0)); +ByteOffset += PartTy.getSizeInBytes(); + } + + if (!MergeTy.isValid()) { +// Loads are of same size, concat or merge them together. +B.buildMergeLikeInstr(Dst, LoadPartRegs); + } else { +// Loads are not all of same size, need to unmerge them to smaller pieces +// of MergeTy type, then merge pieces to Dst. +SmallVector MergeTyParts; +for (Register Reg : LoadPartRegs) { + if (MRI.getType(Reg) == MergeTy) { +MergeTyParts.push_back(Reg); + } else { +auto Unmerge = B.buildUnmerge({DstRB, MergeTy}, Reg); +for (unsigned i = 0; i < Unmerge->getNumOperands() - 1; ++i) + MergeTyParts.push_back(Unmerge.getReg(i)); + } +} +B.buildMergeLikeInstr(Dst, MergeTyParts); + } + MI.eraseFromParent(); +} + +void RegBankLegalizeHelper::widenLoad(MachineInstr &MI, LLT WideTy, + LLT MergeTy) { + MachineFunction &MF = B.getMF(); + assert(MI.getNumMemOperands() == 1); + MachineMemOperand &BaseMMO = **MI.memoperands_begin(); + Register Dst = MI.getOperand(0).getReg(); + const RegisterBank *DstRB = MRI.getRegBankOrNull(Dst); + Register Base = MI.getOperand(1).getReg(); + + MachineMemOperand *WideMMO = MF.getMachineMemOperand(&BaseMMO, 0, WideTy); + auto WideLoad = B.buildLoad({DstRB, WideTy}, Base, *WideMMO); + + if (WideTy.isScalar()) { +B.buildTrunc(Dst, WideLoad); + } else { +SmallVector MergeTyParts; +auto Unmerge = B.buildUnmerge({DstRB, MergeTy}, WideLoad); + +LLT DstTy = MRI.getType(Dst); +unsigned NumElts = DstTy.getSizeInBits() / MergeTy.getSizeInBits(); +for (unsigned i = 0; i < NumElts; ++i) { + MergeTyParts.push_back(Unmerge.getReg(i)); +} +B.buildMergeLikeInstr(Dst, 
MergeTyParts); + } + MI.eraseFromParent(); +} + void RegBankLegalizeHelper::lower(MachineInstr &MI, const RegBankLLTMapping &Mapping, SmallSet &WaterfallSgprs) { @@ -128,6 +205,54 @@ void RegBankLegalizeHelper::lower(MachineInstr &MI, MI.eraseFromParent(); break; } + case SplitLoad: { +LLT DstTy = MRI.getType(MI.getOperand(0).getReg()); +unsigned Size = DstTy.getSizeInBits(); +// Even split to 128-bit loads +if (Size > 128) { + LLT B128; + if (DstTy.isVector()) { +LLT EltTy = DstTy.getElementType(); +B128 = LLT::f
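The commit message mentions a "Predicate" wrapper whose instances combine with `&&`, `||` and `!`. The visible part of the diff does not show that class, so the following is only a sketch of how such a wrapper could look; the class layout and the `MemAccess` struct are assumptions made for illustration, not the code in AMDGPURegBankLegalizeRules.h.

```cpp
#include <functional>
#include <utility>

// Hypothetical wrapper around a bool-returning function of T.
template <typename T>
class Predicate {
  std::function<bool(const T &)> fn_;

public:
  explicit Predicate(std::function<bool(const T &)> fn) : fn_(std::move(fn)) {}

  bool operator()(const T &v) const { return fn_(v); }

  // Each combinator returns a new predicate; evaluation inside the lambda
  // still short-circuits like the built-in operators.
  Predicate operator&&(const Predicate &rhs) const {
    auto a = fn_, b = rhs.fn_;
    return Predicate([a, b](const T &v) { return a(v) && b(v); });
  }
  Predicate operator||(const Predicate &rhs) const {
    auto a = fn_, b = rhs.fn_;
    return Predicate([a, b](const T &v) { return a(v) || b(v); });
  }
  Predicate operator!() const {
    auto a = fn_;
    return Predicate([a](const T &v) { return !a(v); });
  }
};

// Hypothetical operand description used only to exercise the wrapper.
struct MemAccess {
  bool uniform;
  unsigned sizeInBits;
};
```

With this shape, a rule condition can be written as an expression such as `isUniform && !isWide`, which is the kind of readable rule the commit message refers to.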
[llvm-branch-commits] [clang] [Clang][CWG2369] Implement GCC's heuristic for DR 2369 (PR #124231)
https://github.com/zyn0217 updated https://github.com/llvm/llvm-project/pull/124231 >From f766c8c099cf8f1bc076c0308afc3a2832a5b495 Mon Sep 17 00:00:00 2001 From: Younan Zhang Date: Fri, 24 Jan 2025 13:52:37 +0800 Subject: [PATCH] Implement GCC's CWG 2369 heuristic --- clang/include/clang/Sema/Sema.h | 9 +- clang/lib/Sema/SemaOverload.cpp | 62 ++- clang/lib/Sema/SemaTemplateDeduction.cpp | 14 +- .../SemaTemplate/concepts-recursive-inst.cpp | 169 ++ 4 files changed, 241 insertions(+), 13 deletions(-) diff --git a/clang/include/clang/Sema/Sema.h b/clang/include/clang/Sema/Sema.h index 87d9a335763e31..99ca65159106b5 100644 --- a/clang/include/clang/Sema/Sema.h +++ b/clang/include/clang/Sema/Sema.h @@ -10236,7 +10236,8 @@ class Sema final : public SemaBase { FunctionTemplateDecl *FunctionTemplate, ArrayRef ParamTypes, ArrayRef Args, OverloadCandidateSet &CandidateSet, ConversionSequenceList &Conversions, bool SuppressUserConversions, - CXXRecordDecl *ActingContext = nullptr, QualType ObjectType = QualType(), + bool NonInstOnly, CXXRecordDecl *ActingContext = nullptr, + QualType ObjectType = QualType(), Expr::Classification ObjectClassification = {}, OverloadCandidateParamOrder PO = {}); @@ -12272,7 +12273,9 @@ class Sema final : public SemaBase { sema::TemplateDeductionInfo &Info, SmallVectorImpl const *OriginalCallArgs = nullptr, bool PartialOverloading = false, - llvm::function_ref CheckNonDependent = [] { return false; }); + llvm::function_ref CheckNonDependent = [](bool) { +return false; + }); /// Perform template argument deduction from a function call /// (C++ [temp.deduct.call]). 
@@ -12306,7 +12309,7 @@ class Sema final : public SemaBase { FunctionDecl *&Specialization, sema::TemplateDeductionInfo &Info, bool PartialOverloading, bool AggregateDeductionCandidate, QualType ObjectType, Expr::Classification ObjectClassification, - llvm::function_ref)> CheckNonDependent); + llvm::function_ref, bool)> CheckNonDependent); /// Deduce template arguments when taking the address of a function /// template (C++ [temp.deduct.funcaddr]) or matching a specialization to diff --git a/clang/lib/Sema/SemaOverload.cpp b/clang/lib/Sema/SemaOverload.cpp index 3be9ade80f1d94..c2baa75c09bce9 100644 --- a/clang/lib/Sema/SemaOverload.cpp +++ b/clang/lib/Sema/SemaOverload.cpp @@ -7733,10 +7733,10 @@ void Sema::AddMethodTemplateCandidate( MethodTmpl, ExplicitTemplateArgs, Args, Specialization, Info, PartialOverloading, /*AggregateDeductionCandidate=*/false, ObjectType, ObjectClassification, - [&](ArrayRef ParamTypes) { + [&](ArrayRef ParamTypes, bool NonInstOnly) { return CheckNonDependentConversions( MethodTmpl, ParamTypes, Args, CandidateSet, Conversions, -SuppressUserConversions, ActingContext, ObjectType, +SuppressUserConversions, NonInstOnly, ActingContext, ObjectType, ObjectClassification, PO); }); Result != TemplateDeductionResult::Success) { @@ -7818,10 +7818,11 @@ void Sema::AddTemplateOverloadCandidate( PartialOverloading, AggregateCandidateDeduction, /*ObjectType=*/QualType(), /*ObjectClassification=*/Expr::Classification(), - [&](ArrayRef ParamTypes) { + [&](ArrayRef ParamTypes, bool NonInstOnly) { return CheckNonDependentConversions( FunctionTemplate, ParamTypes, Args, CandidateSet, Conversions, -SuppressUserConversions, nullptr, QualType(), {}, PO); +SuppressUserConversions, NonInstOnly, nullptr, QualType(), {}, +PO); }); Result != TemplateDeductionResult::Success) { OverloadCandidate &Candidate = @@ -7863,7 +7864,7 @@ bool Sema::CheckNonDependentConversions( FunctionTemplateDecl *FunctionTemplate, ArrayRef ParamTypes, ArrayRef Args, OverloadCandidateSet 
&CandidateSet, ConversionSequenceList &Conversions, bool SuppressUserConversions, -CXXRecordDecl *ActingContext, QualType ObjectType, +bool NonInstOnly, CXXRecordDecl *ActingContext, QualType ObjectType, Expr::Classification ObjectClassification, OverloadCandidateParamOrder PO) { // FIXME: The cases in which we allow explicit conversions for constructor // arguments never consider calling a constructor template. It's not clear @@ -7900,6 +7901,54 @@ bool Sema::CheckNonDependentConversions( } } + // A heuristic & speculative workaround for bug + // https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99599 that manifests after + // CWG2369. + auto ConversionMightInduceInstantiation = [&](QualType ParmType, +QualType ArgType) { +ParmType = ParmType.getNonReferenceType(); +
[llvm-branch-commits] [llvm] [Coro] Use DebugInfoCache to speed up cloning in CoroSplitPass (PR #118630)
https://github.com/artempyanykh updated https://github.com/llvm/llvm-project/pull/118630

>From 27e99070e3694c4bdb4b71fcdfa5c6153b8b6d1e Mon Sep 17 00:00:00 2001
From: Artem Pianykh
Date: Sun, 15 Sep 2024 11:00:00 -0700
Subject: [PATCH] [Coro] Use DebugInfoCache to speed up cloning in CoroSplitPass

Summary:
We can use a DebugInfoFinder from DebugInfoCache which is already primed on a compile unit to speed up collection of module-level debug info.

The pass could likely be another 2x+ faster if we avoid rebuilding the set of common debug info. This needs further massaging of CloneFunction and ValueMapper, though, and can be done incrementally on top of this.

Comparing performance of CoroSplitPass at various points in this stack, this is anecdata from a sample cpp file compiled with full debug info:

|                 | Baseline | IdentityMD set | Prebuilt CommonDI | Cached CU DIFinder (cur.) |
|-----------------|----------|----------------|-------------------|---------------------------|
| CoroSplitPass   | 306ms    | 221ms          | 68ms              | 17ms                      |
| CoroCloner      | 101ms    | 72ms           | 0.5ms             | 0.5ms                     |
| CollectCommonDI | -        | -              | 63ms              | 13ms                      |
| Speed up        | 1x       | 1.4x           | 4.5x              | 18x                       |

Test Plan:
ninja check-llvm-unit
ninja check-llvm
Compiled a sample cpp file with time trace to get the avg. duration of the pass and inner scopes.
stack-info: PR: https://github.com/llvm/llvm-project/pull/118630, branch: users/artempyanykh/fast-coro-upstream/11 --- llvm/include/llvm/Transforms/Coroutines/ABI.h | 13 +++-- llvm/lib/Analysis/CGSCCPassManager.cpp| 7 +++ llvm/lib/Transforms/Coroutines/CoroSplit.cpp | 55 +++ llvm/test/Other/new-pass-manager.ll | 1 + llvm/test/Other/new-pm-defaults.ll| 1 + llvm/test/Other/new-pm-lto-defaults.ll| 1 + llvm/test/Other/new-pm-pgo-preinline.ll | 1 + .../Other/new-pm-thinlto-postlink-defaults.ll | 1 + .../new-pm-thinlto-postlink-pgo-defaults.ll | 1 + ...-pm-thinlto-postlink-samplepgo-defaults.ll | 1 + .../Other/new-pm-thinlto-prelink-defaults.ll | 1 + .../new-pm-thinlto-prelink-pgo-defaults.ll| 1 + ...w-pm-thinlto-prelink-samplepgo-defaults.ll | 1 + .../Analysis/CGSCCPassManagerTest.cpp | 4 +- 14 files changed, 72 insertions(+), 17 deletions(-) diff --git a/llvm/include/llvm/Transforms/Coroutines/ABI.h b/llvm/include/llvm/Transforms/Coroutines/ABI.h index 0b2d405f3caec4..2cf614b6bb1e2a 100644 --- a/llvm/include/llvm/Transforms/Coroutines/ABI.h +++ b/llvm/include/llvm/Transforms/Coroutines/ABI.h @@ -15,6 +15,7 @@ #ifndef LLVM_TRANSFORMS_COROUTINES_ABI_H #define LLVM_TRANSFORMS_COROUTINES_ABI_H +#include "llvm/Analysis/DebugInfoCache.h" #include "llvm/Analysis/TargetTransformInfo.h" #include "llvm/Transforms/Coroutines/CoroShape.h" #include "llvm/Transforms/Coroutines/MaterializationUtils.h" @@ -53,7 +54,8 @@ class BaseABI { // Perform the function splitting according to the ABI. 
virtual void splitCoroutine(Function &F, coro::Shape &Shape, SmallVectorImpl &Clones, - TargetTransformInfo &TTI) = 0; + TargetTransformInfo &TTI, + const DebugInfoCache *DICache) = 0; Function &F; coro::Shape &Shape; @@ -73,7 +75,8 @@ class SwitchABI : public BaseABI { void splitCoroutine(Function &F, coro::Shape &Shape, SmallVectorImpl &Clones, - TargetTransformInfo &TTI) override; + TargetTransformInfo &TTI, + const DebugInfoCache *DICache) override; }; class AsyncABI : public BaseABI { @@ -86,7 +89,8 @@ class AsyncABI : public BaseABI { void splitCoroutine(Function &F, coro::Shape &Shape, SmallVectorImpl &Clones, - TargetTransformInfo &TTI) override; + TargetTransformInfo &TTI, + const DebugInfoCache *DICache) override; }; class AnyRetconABI : public BaseABI { @@ -99,7 +103,8 @@ class AnyRetconABI : public BaseABI { void splitCoroutine(Function &F, coro::Shape &Shape, SmallVectorImpl &Clones, - TargetTransformInfo &TTI) override; + TargetTransformInfo &TTI, + const DebugInfoCache *DICache) override; }; } // end namespace coro diff --git a/llvm/lib/Analysis/CGSCCPassManager.cpp b/llvm/lib/Analysis/CGSCCPassManager.cpp index 948bc2435ab275..3ba085cdb0be8b 100644 --- a/llvm/lib/Analysis/CGSCCPassManager.cpp +++ b/llvm/lib/Analysis/CGSCCPassManager.cpp @@ -14,6 +14,7 @@ #include "llvm/ADT/SmallPtrSet.h" #include "llvm/ADT/SmallVector.h" #include "llvm/ADT/iterator_range.h" +#include "llvm/Analy
[llvm-branch-commits] [clang] [clang] fix template argument conversion (PR #124386)
https://github.com/mizvekov created https://github.com/llvm/llvm-project/pull/124386 Converted template arguments need to be converted again, if the corresponding template parameter changed, as different conversions might apply in that case. >From 9b174f4505eaf19e0ccfb1ec905c8206bb575d4b Mon Sep 17 00:00:00 2001 From: Matheus Izvekov Date: Fri, 24 Jan 2025 19:25:38 -0300 Subject: [PATCH] [clang] fix template argument conversion Converted template arguments need to be converted again, if the corresponding template parameter changed, as different conversions might apply in that case. --- clang/docs/ReleaseNotes.rst | 3 + clang/lib/Sema/SemaTemplate.cpp | 120 +--- clang/lib/Sema/TreeTransform.h | 6 +- clang/test/SemaTemplate/cwg2398.cpp | 8 -- 4 files changed, 80 insertions(+), 57 deletions(-) diff --git a/clang/docs/ReleaseNotes.rst b/clang/docs/ReleaseNotes.rst index b89d055304f4a6..27574924a14a92 100644 --- a/clang/docs/ReleaseNotes.rst +++ b/clang/docs/ReleaseNotes.rst @@ -993,6 +993,9 @@ Bug Fixes to C++ Support - Fix immediate escalation not propagating through inherited constructors. (#GH112677) - Fixed assertions or false compiler diagnostics in the case of C++ modules for lambda functions or inline friend functions defined inside templates (#GH122493). +- Fix template argument checking so that converted template arguments are + converted again. This fixes some issues with partial ordering involving + template template parameters with non-type template parameters. 
Bug Fixes to AST Handling ^ diff --git a/clang/lib/Sema/SemaTemplate.cpp b/clang/lib/Sema/SemaTemplate.cpp index 210df2836eeb07..62c45f15dec54e 100644 --- a/clang/lib/Sema/SemaTemplate.cpp +++ b/clang/lib/Sema/SemaTemplate.cpp @@ -5199,7 +5199,7 @@ convertTypeTemplateArgumentToTemplate(ASTContext &Context, TypeLoc TLoc) { } bool Sema::CheckTemplateArgument( -NamedDecl *Param, TemplateArgumentLoc &Arg, NamedDecl *Template, +NamedDecl *Param, TemplateArgumentLoc &ArgLoc, NamedDecl *Template, SourceLocation TemplateLoc, SourceLocation RAngleLoc, unsigned ArgumentPackIndex, SmallVectorImpl &SugaredConverted, @@ -5208,9 +5208,10 @@ bool Sema::CheckTemplateArgument( bool PartialOrderingTTP, bool *MatchedPackOnParmToNonPackOnArg) { // Check template type parameters. if (TemplateTypeParmDecl *TTP = dyn_cast(Param)) -return CheckTemplateTypeArgument(TTP, Arg, SugaredConverted, +return CheckTemplateTypeArgument(TTP, ArgLoc, SugaredConverted, CanonicalConverted); + const TemplateArgument &Arg = ArgLoc.getArgument(); // Check non-type template parameters. if (NonTypeTemplateParmDecl *NTTP =dyn_cast(Param)) { // Do substitution on the type of the non-type template parameter @@ -5252,63 +5253,89 @@ bool Sema::CheckTemplateArgument( return true; } -switch (Arg.getArgument().getKind()) { -case TemplateArgument::Null: - llvm_unreachable("Should never see a NULL template argument here"); - -case TemplateArgument::Expression: { - Expr *E = Arg.getArgument().getAsExpr(); +auto checkExpr = [&](Expr *E) -> Expr * { TemplateArgument SugaredResult, CanonicalResult; unsigned CurSFINAEErrors = NumSFINAEErrors; ExprResult Res = CheckTemplateArgument(NTTP, NTTPType, E, SugaredResult, CanonicalResult, PartialOrderingTTP, CTAK); - if (Res.isInvalid()) -return true; // If the current template argument causes an error, give up now. 
- if (CurSFINAEErrors < NumSFINAEErrors) -return true; + if (Res.isInvalid() || CurSFINAEErrors < NumSFINAEErrors) +return nullptr; + SugaredConverted.push_back(SugaredResult); + CanonicalConverted.push_back(CanonicalResult); + return Res.get(); +}; + +switch (Arg.getKind()) { +case TemplateArgument::Null: + llvm_unreachable("Should never see a NULL template argument here"); +case TemplateArgument::Expression: { + Expr *E = Arg.getAsExpr(); + Expr *R = checkExpr(E); + if (!R) +return true; // If the resulting expression is new, then use it in place of the // old expression in the template argument. - if (Res.get() != E) { -TemplateArgument TA(Res.get()); -Arg = TemplateArgumentLoc(TA, Res.get()); + if (R != E) { +TemplateArgument TA(R); +ArgLoc = TemplateArgumentLoc(TA, R); } - - SugaredConverted.push_back(SugaredResult); - CanonicalConverted.push_back(CanonicalResult); break; } -case TemplateArgument::Declaration: -case TemplateArgument::Integral: +// As for the converted NTTP kinds, they still might need another +// conversion, as the new corresponding parameter might be different. +// Ideally, we would always perform substitution sta
[llvm-branch-commits] [clang] [clang] fix template argument conversion (PR #124386)
llvmbot wrote: @llvm/pr-subscribers-clang Author: Matheus Izvekov (mizvekov) Changes Converted template arguments need to be converted again, if the corresponding template parameter changed, as different conversions might apply in that case. --- Full diff: https://github.com/llvm/llvm-project/pull/124386.diff 4 Files Affected: - (modified) clang/docs/ReleaseNotes.rst (+3) - (modified) clang/lib/Sema/SemaTemplate.cpp (+73-47) - (modified) clang/lib/Sema/TreeTransform.h (+4-2) - (modified) clang/test/SemaTemplate/cwg2398.cpp (-8) ``diff diff --git a/clang/docs/ReleaseNotes.rst b/clang/docs/ReleaseNotes.rst index b89d055304f4a6..27574924a14a92 100644 --- a/clang/docs/ReleaseNotes.rst +++ b/clang/docs/ReleaseNotes.rst @@ -993,6 +993,9 @@ Bug Fixes to C++ Support - Fix immediate escalation not propagating through inherited constructors. (#GH112677) - Fixed assertions or false compiler diagnostics in the case of C++ modules for lambda functions or inline friend functions defined inside templates (#GH122493). +- Fix template argument checking so that converted template arguments are + converted again. This fixes some issues with partial ordering involving + template template parameters with non-type template parameters. 

 Bug Fixes to AST Handling
 ^^^^^^^^^^^^^^^^^^^^^^^^^
diff --git a/clang/lib/Sema/SemaTemplate.cpp b/clang/lib/Sema/SemaTemplate.cpp
index 210df2836eeb07..62c45f15dec54e 100644
--- a/clang/lib/Sema/SemaTemplate.cpp
+++ b/clang/lib/Sema/SemaTemplate.cpp
@@ -5199,7 +5199,7 @@ convertTypeTemplateArgumentToTemplate(ASTContext &Context, TypeLoc TLoc) {
 }
 
 bool Sema::CheckTemplateArgument(
-    NamedDecl *Param, TemplateArgumentLoc &Arg, NamedDecl *Template,
+    NamedDecl *Param, TemplateArgumentLoc &ArgLoc, NamedDecl *Template,
     SourceLocation TemplateLoc, SourceLocation RAngleLoc,
     unsigned ArgumentPackIndex,
     SmallVectorImpl<TemplateArgument> &SugaredConverted,
@@ -5208,9 +5208,10 @@ bool Sema::CheckTemplateArgument(
     bool PartialOrderingTTP, bool *MatchedPackOnParmToNonPackOnArg) {
   // Check template type parameters.
   if (TemplateTypeParmDecl *TTP = dyn_cast<TemplateTypeParmDecl>(Param))
-    return CheckTemplateTypeArgument(TTP, Arg, SugaredConverted,
+    return CheckTemplateTypeArgument(TTP, ArgLoc, SugaredConverted,
                                      CanonicalConverted);
 
+  const TemplateArgument &Arg = ArgLoc.getArgument();
   // Check non-type template parameters.
   if (NonTypeTemplateParmDecl *NTTP =dyn_cast<NonTypeTemplateParmDecl>(Param)) {
     // Do substitution on the type of the non-type template parameter
@@ -5252,63 +5253,89 @@ bool Sema::CheckTemplateArgument(
         return true;
     }
 
-    switch (Arg.getArgument().getKind()) {
-    case TemplateArgument::Null:
-      llvm_unreachable("Should never see a NULL template argument here");
-
-    case TemplateArgument::Expression: {
-      Expr *E = Arg.getArgument().getAsExpr();
+    auto checkExpr = [&](Expr *E) -> Expr * {
       TemplateArgument SugaredResult, CanonicalResult;
       unsigned CurSFINAEErrors = NumSFINAEErrors;
       ExprResult Res =
           CheckTemplateArgument(NTTP, NTTPType, E, SugaredResult,
                                 CanonicalResult, PartialOrderingTTP, CTAK);
-      if (Res.isInvalid())
-        return true;
       // If the current template argument causes an error, give up now.
-      if (CurSFINAEErrors < NumSFINAEErrors)
-        return true;
+      if (Res.isInvalid() || CurSFINAEErrors < NumSFINAEErrors)
+        return nullptr;
+      SugaredConverted.push_back(SugaredResult);
+      CanonicalConverted.push_back(CanonicalResult);
+      return Res.get();
+    };
+
+    switch (Arg.getKind()) {
+    case TemplateArgument::Null:
+      llvm_unreachable("Should never see a NULL template argument here");
+    case TemplateArgument::Expression: {
+      Expr *E = Arg.getAsExpr();
+      Expr *R = checkExpr(E);
+      if (!R)
+        return true;
       // If the resulting expression is new, then use it in place of the
       // old expression in the template argument.
-      if (Res.get() != E) {
-        TemplateArgument TA(Res.get());
-        Arg = TemplateArgumentLoc(TA, Res.get());
+      if (R != E) {
+        TemplateArgument TA(R);
+        ArgLoc = TemplateArgumentLoc(TA, R);
       }
-
-      SugaredConverted.push_back(SugaredResult);
-      CanonicalConverted.push_back(CanonicalResult);
       break;
     }
-    case TemplateArgument::Declaration:
-    case TemplateArgument::Integral:
+    // As for the converted NTTP kinds, they still might need another
+    // conversion, as the new corresponding parameter might be different.
+    // Ideally, we would always perform substitution starting with sugared types
+    // and never need these, as we would still have expressions. Since these are
+    // needed so rarely, it's probably a better tradeoff to just convert them
+    // back to expressions.
+    case TemplateArgument::Integral: {
+      IntegerLiteral ILE(Context, A