[llvm-branch-commits] [SHT_LLVM_FUNC_MAP][CodeGen]Introduce function address map section and emit dynamic instruction count(CodeGen part) (PR #124334)

2025-01-24 Thread Lei Wang via llvm-branch-commits

https://github.com/wlei-llvm created 
https://github.com/llvm/llvm-project/pull/124334



Test Plan: llvm/test/CodeGen/X86/function-address-map-function-sections.ll



___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [flang] [flang][OpenMP] Handle directive arguments in OmpDirectiveSpecifier (PR #124278)

2025-01-24 Thread Krzysztof Parzyszek via llvm-branch-commits

kparzysz wrote:

Summary of code changes:
- Remove the indirection from 
`std::optional>>` inside 
OmpDirectiveSpecifier.  The type with indirection was not convertible to 
`std::optional`, which is a relatively useful abstraction. The 
consequence of that was that the definition of `OmpDirectiveSpecifier` had to 
be moved past the definitions of all clauses, and any occurrence of 
OmpDirectiveSpecifier in clauses now had to be wrapped in indirection.
- New classes for arguments (and their parsers) were created: OmpLocator, 
OmpReductionSpecifier, and a union-like struct OmpArgument.  
OmpDeclareMapperSpecifier was renamed to OmpMapperSpecifier.  All of them were 
moved to before clause definitions.  The intent here was to create argument 
classes for directives, while keeping in mind support for clause arguments as a 
long(er)-term goal.
- Some other cleanups were made in parse-tree.h as well: the 
MODIFIER_BOILERPLATE macro was moved closer to the modifier definitions, 
OmpObject definition was moved to the top of OpenMP definitions.
- Extend symbol resolution to properly resolve symbols in OmpMapperSpecifier 
and OmpReductionSpecifier when embedded in WHEN/OTHERWISE clauses (i.e. inside 
of a METADIRECTIVE specification).

https://github.com/llvm/llvm-project/pull/124278


[llvm-branch-commits] [SHT_LLVM_FUNC_MAP][llvm-readobj]Introduce function address map section and emit dynamic instruction count(readobj part) (PR #124333)

2025-01-24 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-llvm-binary-utilities

Author: Lei Wang (wlei-llvm)


Changes



Test Plan: llvm/test/tools/llvm-readobj/ELF/func-map.test


---
Full diff: https://github.com/llvm/llvm-project/pull/124333.diff


7 Files Affected:

- (modified) llvm/include/llvm/Object/ELF.h (+7) 
- (modified) llvm/lib/Object/ELF.cpp (+98) 
- (added) llvm/test/tools/llvm-readobj/ELF/func-map.test (+96) 
- (modified) llvm/tools/llvm-readobj/ELFDumper.cpp (+60) 
- (modified) llvm/tools/llvm-readobj/ObjDumper.h (+1) 
- (modified) llvm/tools/llvm-readobj/Opts.td (+1) 
- (modified) llvm/tools/llvm-readobj/llvm-readobj.cpp (+4) 


``diff
diff --git a/llvm/include/llvm/Object/ELF.h b/llvm/include/llvm/Object/ELF.h
index 3aa1d7864fcb70..a688672a3e5190 100644
--- a/llvm/include/llvm/Object/ELF.h
+++ b/llvm/include/llvm/Object/ELF.h
@@ -513,6 +513,13 @@ class ELFFile {
   decodeBBAddrMap(const Elf_Shdr &Sec, const Elf_Shdr *RelaSec = nullptr,
   std::vector *PGOAnalyses = nullptr) const;
 
+  /// Returns a vector of FuncMap structs corresponding to each function
+  /// within the text section that the SHT_LLVM_FUNC_MAP section \p Sec
+  /// is associated with. If the current ELFFile is relocatable, a corresponding
+  /// \p RelaSec must be passed in as an argument.
+  Expected<std::vector<FuncMap>>
+  decodeFuncMap(const Elf_Shdr &Sec, const Elf_Shdr *RelaSec = nullptr) const;
+
   /// Returns a map from every section matching \p IsMatch to its relocation
   /// section, or \p nullptr if it has no relocation section. This function
   /// returns an error if any of the \p IsMatch calls fail or if it fails to
diff --git a/llvm/lib/Object/ELF.cpp b/llvm/lib/Object/ELF.cpp
index 41c3fb4cc5e406..87a9e5469f46d2 100644
--- a/llvm/lib/Object/ELF.cpp
+++ b/llvm/lib/Object/ELF.cpp
@@ -940,6 +940,104 @@ ELFFile::decodeBBAddrMap(const Elf_Shdr &Sec, const Elf_Shdr *RelaSec,
   return std::move(AddrMapsOrErr);
 }
 
+template <class ELFT>
+Expected<std::vector<FuncMap>>
+ELFFile<ELFT>::decodeFuncMap(const Elf_Shdr &Sec,
+                             const Elf_Shdr *RelaSec) const {
+  bool IsRelocatable = this->getHeader().e_type == ELF::ET_REL;
+
+  // This DenseMap maps the offset of each function (the location of the
+  // reference to the function in the SHT_LLVM_FUNC_ADDR_MAP section) to the
+  // addend (the location of the function in the text section).
+  llvm::DenseMap FunctionOffsetTranslations;
+  if (IsRelocatable && RelaSec) {
+assert(RelaSec &&
+   "Can't read a SHT_LLVM_FUNC_ADDR_MAP section in a relocatable "
+   "object file without providing a relocation section.");
+Expected::Elf_Rela_Range> Relas =
+this->relas(*RelaSec);
+if (!Relas)
+  return createError("unable to read relocations for section " +
+ describe(*this, Sec) + ": " +
+ toString(Relas.takeError()));
+for (typename ELFFile::Elf_Rela Rela : *Relas)
+  FunctionOffsetTranslations[Rela.r_offset] = Rela.r_addend;
+  }
+  auto GetAddressForRelocation =
+  [&](unsigned RelocationOffsetInSection) -> Expected {
+auto FOTIterator =
+FunctionOffsetTranslations.find(RelocationOffsetInSection);
+if (FOTIterator == FunctionOffsetTranslations.end()) {
+  return createError("failed to get relocation data for offset: " +
+ Twine::utohexstr(RelocationOffsetInSection) +
+ " in section " + describe(*this, Sec));
+}
+return FOTIterator->second;
+  };
+  Expected> ContentsOrErr = this->getSectionContents(Sec);
+  if (!ContentsOrErr)
+return ContentsOrErr.takeError();
+  ArrayRef Content = *ContentsOrErr;
+  DataExtractor Data(Content, this->isLE(), ELFT::Is64Bits ? 8 : 4);
+  std::vector FunctionEntries;
+
+  DataExtractor::Cursor Cur(0);
+  Error ULEBSizeErr = Error::success();
+
+  // Helper lambda to extract the (possibly relocatable) address stored at Cur.
+  auto ExtractAddress = [&]() -> Expected::uintX_t> {
+uint64_t RelocationOffsetInSection = Cur.tell();
+auto Address =
+static_cast::uintX_t>(Data.getAddress(Cur));
+if (!Cur)
+  return Cur.takeError();
+if (!IsRelocatable)
+  return Address;
+assert(Address == 0);
+Expected AddressOrErr =
+GetAddressForRelocation(RelocationOffsetInSection);
+if (!AddressOrErr)
+  return AddressOrErr.takeError();
+return *AddressOrErr;
+  };
+
+  uint8_t Version = 0;
+  uint8_t Feature = 0;
+  FuncMap::Features FeatEnable{};
+  while (!ULEBSizeErr && Cur && Cur.tell() < Content.size()) {
+if (Sec.sh_type == ELF::SHT_LLVM_FUNC_MAP) {
+  Version = Data.getU8(Cur);
+  if (!Cur)
+break;
+  if (Version > 1)
+return createError("unsupported SHT_LLVM_FUNC_MAP version: " +
+   Twine(static_cast(Version)));
+  Feature = Data.getU8(Cur); // Feature byte
+  if (!Cur)
+break;
+  auto FeatEnableOrErr = FuncMap::Features::decode(Feature);
+  if (!FeatEn
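
The relocation handling in `decodeFuncMap` above can be sketched outside of LLVM roughly as follows. This is only an illustration under stated assumptions, not the patch's code: `Rela`, `buildOffsetTranslations`, and `resolveAddress` are hypothetical names, and `std::unordered_map` and `std::optional` stand in for `llvm::DenseMap` and `Expected`.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for an ELF RELA entry; only the two fields the
// decoder reads are modelled here.
struct Rela {
  uint64_t r_offset; // where in the func-map section the address field lives
  uint64_t r_addend; // location of the function in the text section
};

using OffsetTranslations = std::unordered_map<uint64_t, uint64_t>;

// Build the offset -> addend table from the relocation entries.
OffsetTranslations buildOffsetTranslations(const std::vector<Rela> &Relas) {
  OffsetTranslations Table;
  for (const Rela &R : Relas)
    Table[R.r_offset] = R.r_addend;
  return Table;
}

// Resolve an address field that was read at OffsetInSection.
// In a non-relocatable file the raw value is already the address. In a
// relocatable file the raw value is expected to be zero and the address
// comes from the relocation table; std::nullopt models the
// "failed to get relocation data" error.
std::optional<uint64_t> resolveAddress(const OffsetTranslations &Table,
                                       bool IsRelocatable,
                                       uint64_t OffsetInSection,
                                       uint64_t RawAddress) {
  if (!IsRelocatable)
    return RawAddress;
  assert(RawAddress == 0 && "relocatable address field should be zero");
  auto It = Table.find(OffsetInSection);
  if (It == Table.end())
    return std::nullopt;
  return It->second;
}
```

With a table built from relocations at offsets 2 and 13, `resolveAddress(Table, true, 2, 0)` yields the first addend, while an offset with no relocation yields `std::nullopt`.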

[llvm-branch-commits] [SHT_LLVM_FUNC_MAP][llvm-readobj]Introduce function address map section and emit dynamic instruction count(readobj part) (PR #124333)

2025-01-24 Thread Lei Wang via llvm-branch-commits

https://github.com/wlei-llvm created 
https://github.com/llvm/llvm-project/pull/124333



Test Plan: llvm/test/tools/llvm-readobj/ELF/func-map.test





[llvm-branch-commits] [SHT_LLVM_FUNC_MAP][CodeGen]Introduce function address map section and emit dynamic instruction count(CodeGen part) (PR #124334)

2025-01-24 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-mc

Author: Lei Wang (wlei-llvm)


Changes



Test Plan: llvm/test/CodeGen/X86/function-address-map-function-sections.ll


---
Full diff: https://github.com/llvm/llvm-project/pull/124334.diff


11 Files Affected:

- (modified) llvm/docs/Extensions.rst (+24-1) 
- (modified) llvm/include/llvm/CodeGen/AsmPrinter.h (+2) 
- (modified) llvm/include/llvm/MC/MCContext.h (+5) 
- (modified) llvm/include/llvm/MC/MCObjectFileInfo.h (+2) 
- (modified) llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp (+56-1) 
- (modified) llvm/lib/MC/MCObjectFileInfo.cpp (+17) 
- (modified) llvm/lib/MC/MCParser/ELFAsmParser.cpp (+2) 
- (modified) llvm/lib/MC/MCSectionELF.cpp (+2) 
- (added) llvm/test/CodeGen/X86/function-address-map-dyn-inst-count.ll (+110) 
- (added) llvm/test/CodeGen/X86/function-address-map-function-sections.ll (+41) 
- (modified) llvm/test/MC/AsmParser/llvm_section_types.s (+4) 


``diff
diff --git a/llvm/docs/Extensions.rst b/llvm/docs/Extensions.rst
index ea267842cdc353..d94e35eeefa6ad 100644
--- a/llvm/docs/Extensions.rst
+++ b/llvm/docs/Extensions.rst
@@ -535,6 +535,30 @@ Example of BBAddrMap with PGO data:
.uleb128  1000 # BB_3 basic block frequency (only when enabled)
.uleb128  0 # BB_3 successors count (only enabled with branch probabilities)
 
+``SHT_LLVM_FUNC_MAP`` Section (function address map)
+^^
+This section stores the mapping from the binary address of function to its
+related metadata features. It is used to emit function-level analysis data and
+can be enabled through ``--func-map=`` option.
+
+Three fields are stored at the beginning: a version number byte for backward
+compatibility, a feature byte where each bit represents a specific feature, and
+the function's entry address. The encodings for each enabled feature come after
+these fields. The currently supported feature is:
+
+#. Dynamic Instruction Count - Total PGO counts for all instructions within the function.
+
+Example:
+
+.. code-block:: gas
+
+  .section  ".llvm_func_map","",@llvm_func_map
+  .byte 1 # version number
+  .byte 1 # feature
+  .quad .Lfunc_begin1 # function address
+  .uleb128  333   # dynamic instruction count
+
+
 ``SHT_LLVM_OFFLOADING`` Section (offloading data)
 ^^
 This section stores the binary data used to perform offloading device linking
@@ -725,4 +749,3 @@ follows:
   add x16, x16, :lo12:__chkstk
   blr x16
   sub sp, sp, x15, lsl #4
-
diff --git a/llvm/include/llvm/CodeGen/AsmPrinter.h b/llvm/include/llvm/CodeGen/AsmPrinter.h
index 5291369b3b9f1d..5fe35c283cceda 100644
--- a/llvm/include/llvm/CodeGen/AsmPrinter.h
+++ b/llvm/include/llvm/CodeGen/AsmPrinter.h
@@ -414,6 +414,8 @@ class AsmPrinter : public MachineFunctionPass {
 
   void emitBBAddrMapSection(const MachineFunction &MF);
 
+  void emitFuncMapSection(const MachineFunction &MF);
+
   void emitKCFITrapEntry(const MachineFunction &MF, const MCSymbol *Symbol);
   virtual void emitKCFITypeId(const MachineFunction &MF);
 
diff --git a/llvm/include/llvm/MC/MCContext.h b/llvm/include/llvm/MC/MCContext.h
index 57ba40f7ac26fc..6fc9eaafeb09e3 100644
--- a/llvm/include/llvm/MC/MCContext.h
+++ b/llvm/include/llvm/MC/MCContext.h
@@ -177,6 +177,9 @@ class MCContext {
   /// LLVM_BB_ADDR_MAP version to emit.
   uint8_t BBAddrMapVersion = 2;
 
+  /// LLVM_FUNC_MAP version to emit.
+  uint8_t FuncMapVersion = 1;
+
   /// The file name of the log file from the environment variable
   /// AS_SECURE_LOG_FILE.  Which must be set before the .secure_log_unique
   /// directive is used or it is an error.
@@ -656,6 +659,8 @@ class MCContext {
 
   uint8_t getBBAddrMapVersion() const { return BBAddrMapVersion; }
 
+  uint8_t getFuncMapVersion() const { return FuncMapVersion; }
+
   /// @}
 
   /// \name Dwarf Management
diff --git a/llvm/include/llvm/MC/MCObjectFileInfo.h b/llvm/include/llvm/MC/MCObjectFileInfo.h
index fb575fe721015c..e344d4772e3fec 100644
--- a/llvm/include/llvm/MC/MCObjectFileInfo.h
+++ b/llvm/include/llvm/MC/MCObjectFileInfo.h
@@ -364,6 +364,8 @@ class MCObjectFileInfo {
 
   MCSection *getBBAddrMapSection(const MCSection &TextSec) const;
 
+  MCSection *getFuncMapSection(const MCSection &TextSec) const;
+
   MCSection *getKCFITrapSection(const MCSection &TextSec) const;
 
   MCSection *getPseudoProbeSection(const MCSection &TextSec) const;
diff --git a/llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp b/llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp
index b2a4721f37b268..a00db04ef654c2 100644
--- a/llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp
+++ b/llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp
@@ -147,6 +147,11 @@ enum class PGOMapFeaturesEnum {
   BrProb,
   All,
 };
+
+enum class FuncMapFeaturesEnum {
+  DynamicInstCount,
+};
+
 static cl::bits PgoA
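
The entry layout described in the `Extensions.rst` hunk above (version byte, feature byte, function address, then per-feature data) can be illustrated with a small stand-alone decoder. This is a sketch under assumptions, not LLVM's implementation: `FuncMapEntry`, `readULEB128`, and `decodeFuncMapEntry` are hypothetical names, the address is assumed to be 8 bytes little-endian (as `.quad` would produce on x86-64), and bit 0 of the feature byte is assumed to mean "dynamic instruction count", which matches the documented example but is not spelled out in the text.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical in-memory form of one function map entry.
struct FuncMapEntry {
  uint8_t Version;
  uint8_t Feature;
  uint64_t Address;
  std::optional<uint64_t> DynamicInstCount; // set only if feature bit 0 is on
};

// Decode an unsigned LEB128 value at Pos and advance Pos past it.
// Returns std::nullopt if the data ends before the value terminates.
std::optional<uint64_t> readULEB128(const std::vector<uint8_t> &Bytes,
                                    size_t &Pos) {
  uint64_t Value = 0;
  unsigned Shift = 0;
  while (Pos < Bytes.size() && Shift < 64) {
    uint8_t Byte = Bytes[Pos++];
    Value |= static_cast<uint64_t>(Byte & 0x7f) << Shift;
    if ((Byte & 0x80) == 0)
      return Value;
    Shift += 7;
  }
  return std::nullopt;
}

// Decode one entry starting at Pos. Versions above 1 are rejected, mirroring
// the "unsupported SHT_LLVM_FUNC_MAP version" check in the readobj patch.
std::optional<FuncMapEntry> decodeFuncMapEntry(const std::vector<uint8_t> &Bytes,
                                               size_t &Pos) {
  assert(Pos <= Bytes.size() && "cursor is past the end of the section");
  if (Bytes.size() - Pos < 10) // version + feature + 8-byte address
    return std::nullopt;
  FuncMapEntry E{};
  E.Version = Bytes[Pos++];
  if (E.Version > 1)
    return std::nullopt;
  E.Feature = Bytes[Pos++];
  for (int I = 0; I < 8; ++I) // little-endian 64-bit address (assumption)
    E.Address |= static_cast<uint64_t>(Bytes[Pos++]) << (8 * I);
  if (E.Feature & 0x1) { // assumed bit for the dynamic instruction count
    E.DynamicInstCount = readULEB128(Bytes, Pos);
    if (!E.DynamicInstCount)
      return std::nullopt;
  }
  return E;
}
```

For the documented example, the count 333 is encoded as the two ULEB128 bytes `0xCD 0x02`, so a fully linked entry would occupy 12 bytes in this sketch.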

[llvm-branch-commits] [flang] [Flang] Remove FLANG_INCLUDE_RUNTIME (PR #124126)

2025-01-24 Thread Michael Kruse via llvm-branch-commits

https://github.com/Meinersbur updated 
https://github.com/llvm/llvm-project/pull/124126

>From c515d13f0ad684763e6d76a87a610801482c15f4 Mon Sep 17 00:00:00 2001
From: Michael Kruse 
Date: Fri, 24 Jan 2025 16:52:46 +0100
Subject: [PATCH] [Flang] Remove FLANG_INCLUDE_RUNTIME

---
 flang/CMakeLists.txt  |  25 +-
 .../modules/AddFlangOffloadRuntime.cmake  | 146 
 flang/runtime/CMakeLists.txt  | 350 --
 flang/runtime/CUDA/CMakeLists.txt |  41 --
 flang/runtime/Float128Math/CMakeLists.txt | 133 ---
 flang/test/CMakeLists.txt |  10 -
 flang/test/lit.cfg.py |   3 -
 flang/test/lit.site.cfg.py.in |   1 -
 flang/tools/f18/CMakeLists.txt|  17 +-
 flang/unittests/CMakeLists.txt|  43 +--
 flang/unittests/Evaluate/CMakeLists.txt   |  16 -
 11 files changed, 5 insertions(+), 780 deletions(-)
 delete mode 100644 flang/cmake/modules/AddFlangOffloadRuntime.cmake
 delete mode 100644 flang/runtime/CMakeLists.txt
 delete mode 100644 flang/runtime/CUDA/CMakeLists.txt
 delete mode 100644 flang/runtime/Float128Math/CMakeLists.txt

diff --git a/flang/CMakeLists.txt b/flang/CMakeLists.txt
index 38004c149b7835..aceb2d09c54388 100644
--- a/flang/CMakeLists.txt
+++ b/flang/CMakeLists.txt
@@ -23,7 +23,6 @@ if (LLVM_ENABLE_EH)
 endif()
 
 set(FLANG_SOURCE_DIR ${CMAKE_CURRENT_SOURCE_DIR})
-set(FLANG_RT_SOURCE_DIR "${CMAKE_CURRENT_SOURCE_DIR}/../flang-rt")
 
 if (CMAKE_SOURCE_DIR STREQUAL CMAKE_BINARY_DIR AND NOT MSVC_IDE)
   message(FATAL_ERROR "In-source builds are not allowed. \
@@ -237,24 +236,8 @@ else()
   include_directories(SYSTEM ${MLIR_TABLEGEN_OUTPUT_DIR})
 endif()
 
-set(FLANG_INCLUDE_RUNTIME_default ON)
-if ("flang-rt" IN_LIST LLVM_ENABLE_RUNTIMES)
-  set(FLANG_INCLUDE_RUNTIME_default OFF)
-endif ()
-option(FLANG_INCLUDE_RUNTIME "Build the runtime in-tree (deprecated; to be replaced with LLVM_ENABLE_RUNTIMES=flang-rt)" FLANG_INCLUDE_RUNTIME_default)
-if (FLANG_INCLUDE_RUNTIME)
-  if ("flang-rt" IN_LIST LLVM_ENABLE_RUNTIMES)
-message(WARNING "Building Flang-RT using LLVM_ENABLE_RUNTIMES. FLANG_INCLUDE_RUNTIME=${FLANG_INCLUDE_RUNTIME} ignored.")
-set(FLANG_INCLUDE_RUNTIME OFF)
-  else ()
- message(STATUS "Building flang_rt in-tree")
-  endif ()
-else ()
-  if ("flang-rt" IN_LIST LLVM_ENABLE_RUNTIMES)
-message(STATUS "Building Flang-RT using LLVM_ENABLE_RUNTIMES.")
-  else ()
-message(STATUS "Not building Flang-RT. For a usable Fortran toolchain, compile a standalone Flang-RT")
-  endif ()
+if (NOT "flang-rt" IN_LIST LLVM_ENABLE_RUNTIMES)
+  message(STATUS "Not building Flang-RT. For a usable Fortran toolchain, compile a standalone Flang-RT")
 endif ()
 
 set(FLANG_TOOLS_INSTALL_DIR "${CMAKE_INSTALL_BINDIR}" CACHE PATH
@@ -484,10 +467,6 @@ if (FLANG_CUF_RUNTIME)
   find_package(CUDAToolkit REQUIRED)
 endif()
 
-if (FLANG_INCLUDE_RUNTIME)
-  add_subdirectory(runtime)
-endif ()
-
 if (LLVM_INCLUDE_EXAMPLES)
   add_subdirectory(examples)
 endif()
diff --git a/flang/cmake/modules/AddFlangOffloadRuntime.cmake b/flang/cmake/modules/AddFlangOffloadRuntime.cmake
deleted file mode 100644
index 8e4f47d18535dc..00
--- a/flang/cmake/modules/AddFlangOffloadRuntime.cmake
+++ /dev/null
@@ -1,146 +0,0 @@
-option(FLANG_EXPERIMENTAL_CUDA_RUNTIME
-  "Compile Fortran runtime as CUDA sources (experimental)" OFF
-  )
-
-option(FLANG_CUDA_RUNTIME_PTX_WITHOUT_GLOBAL_VARS
-  "Do not compile global variables' definitions when producing PTX library" OFF
-  )
-
-set(FLANG_LIBCUDACXX_PATH "" CACHE PATH "Path to libcu++ package installation")
-
-set(FLANG_EXPERIMENTAL_OMP_OFFLOAD_BUILD "off" CACHE STRING
-  "Compile Fortran runtime as OpenMP target offload sources (experimental). Valid options are 'off', 'host_device', 'nohost'")
-
-set(FLANG_OMP_DEVICE_ARCHITECTURES "all" CACHE STRING
-  "List of OpenMP device architectures to be used to compile the Fortran runtime (e.g. 'gfx1103;sm_90')")
-
-macro(enable_cuda_compilation name files)
-  if (FLANG_EXPERIMENTAL_CUDA_RUNTIME)
-if (BUILD_SHARED_LIBS)
-  message(FATAL_ERROR
-"BUILD_SHARED_LIBS is not supported for CUDA build of Fortran runtime"
-)
-endif()
-
-enable_language(CUDA)
-
-# TODO: figure out how to make target property CUDA_SEPARABLE_COMPILATION
-# work, and avoid setting CMAKE_CUDA_SEPARABLE_COMPILATION.
-set(CMAKE_CUDA_SEPARABLE_COMPILATION ON)
-
-# Treat all supported sources as CUDA files.
-set_source_files_properties(${files} PROPERTIES LANGUAGE CUDA)
-set(CUDA_COMPILE_OPTIONS)
-if ("${CMAKE_CUDA_COMPILER_ID}" MATCHES "Clang")
-  # Allow varargs.
-  set(CUDA_COMPILE_OPTIONS
--Xclang -fcuda-allow-variadic-functions
-)
-endif()
-if ("${CMAKE_CUDA_COMPILER_ID}" MATCHES "NVIDIA")
-  set(CUDA_COMPILE_OPTIONS
---expt-relaxed-constexpr
-# Disable these warnings:
-# 

[llvm-branch-commits] [clang] [flang] [lld] [Flang] Rename libFortranRuntime.a to libflang_rt.a (PR #122341)

2025-01-24 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-clang-driver

Author: Michael Kruse (Meinersbur)


Changes

The future name of Flang's runtime component is `flang_rt`, as already used in 
PR #110217 (Flang-RT). Since the flang driver has to select the runtime 
to link, both build instructions must agree on the name.

Extracted out of #110217

---

Patch is 23.32 KiB, truncated to 20.00 KiB below, full version: 
https://github.com/llvm/llvm-project/pull/122341.diff


26 Files Affected:

- (modified) clang/lib/Driver/ToolChains/CommonArgs.cpp (+2-2) 
- (modified) clang/lib/Driver/ToolChains/Flang.cpp (+4-4) 
- (modified) flang/CMakeLists.txt (+1-1) 
- (modified) flang/docs/FlangDriver.md (+3-3) 
- (modified) flang/docs/GettingStarted.md (+3-3) 
- (modified) flang/docs/OpenACC-descriptor-management.md (+1-1) 
- (modified) flang/docs/ReleaseNotes.md (+2) 
- (modified) flang/examples/ExternalHelloWorld/CMakeLists.txt (+1-1) 
- (modified) flang/lib/Optimizer/Builder/IntrinsicCall.cpp (+1-1) 
- (modified) flang/runtime/CMakeLists.txt (+23-17) 
- (modified) flang/runtime/CUDA/CMakeLists.txt (+1-1) 
- (modified) flang/runtime/Float128Math/CMakeLists.txt (+1-1) 
- (modified) flang/runtime/time-intrinsic.cpp (+1-1) 
- (modified) flang/test/CMakeLists.txt (+7-1) 
- (modified) flang/test/Driver/gcc-toolchain-install-dir.f90 (+1-1) 
- (modified) flang/test/Driver/linker-flags.f90 (+4-4) 
- (modified) flang/test/Driver/msvc-dependent-lib-flags.f90 (+4-4) 
- (modified) flang/test/Driver/nostdlib.f90 (+1-1) 
- (modified) flang/test/Runtime/no-cpp-dep.c (+1-1) 
- (modified) flang/test/lit.cfg.py (+1-1) 
- (modified) flang/tools/f18/CMakeLists.txt (+4-4) 
- (modified) flang/unittests/CMakeLists.txt (+1-1) 
- (modified) flang/unittests/Evaluate/CMakeLists.txt (+2-2) 
- (modified) flang/unittests/Runtime/CMakeLists.txt (+1-1) 
- (modified) flang/unittests/Runtime/CUDA/CMakeLists.txt (+1-1) 
- (modified) lld/COFF/MinGW.cpp (+1-1) 


``diff
diff --git a/clang/lib/Driver/ToolChains/CommonArgs.cpp b/clang/lib/Driver/ToolChains/CommonArgs.cpp
index b5273dd8cf1e3a..c7b0a660ee021f 100644
--- a/clang/lib/Driver/ToolChains/CommonArgs.cpp
+++ b/clang/lib/Driver/ToolChains/CommonArgs.cpp
@@ -1321,7 +1321,7 @@ void tools::addOpenMPHostOffloadingArgs(const Compilation &C,
 /// Add Fortran runtime libs
 void tools::addFortranRuntimeLibs(const ToolChain &TC, const ArgList &Args,
   llvm::opt::ArgStringList &CmdArgs) {
-  // Link FortranRuntime
+  // Link flang_rt
   // These are handled earlier on Windows by telling the frontend driver to
   // add the correct libraries to link against as dependents in the object
   // file.
@@ -1337,7 +1337,7 @@ void tools::addFortranRuntimeLibs(const ToolChain &TC, const ArgList &Args,
   if (AsNeeded)
 addAsNeededOption(TC, Args, CmdArgs, /*as_needed=*/false);
 }
-CmdArgs.push_back("-lFortranRuntime");
+CmdArgs.push_back("-lflang_rt");
 addArchSpecificRPath(TC, Args, CmdArgs);
   }
 
diff --git a/clang/lib/Driver/ToolChains/Flang.cpp b/clang/lib/Driver/ToolChains/Flang.cpp
index f1bf32b3238270..68a17edf8ca341 100644
--- a/clang/lib/Driver/ToolChains/Flang.cpp
+++ b/clang/lib/Driver/ToolChains/Flang.cpp
@@ -360,26 +360,26 @@ static void processVSRuntimeLibrary(const ToolChain &TC, const ArgList &Args,
   case options::OPT__SLASH_MT:
 CmdArgs.push_back("-D_MT");
 CmdArgs.push_back("--dependent-lib=libcmt");
-CmdArgs.push_back("--dependent-lib=FortranRuntime.static.lib");
+CmdArgs.push_back("--dependent-lib=flang_rt.static.lib");
 break;
   case options::OPT__SLASH_MTd:
 CmdArgs.push_back("-D_MT");
 CmdArgs.push_back("-D_DEBUG");
 CmdArgs.push_back("--dependent-lib=libcmtd");
-CmdArgs.push_back("--dependent-lib=FortranRuntime.static_dbg.lib");
+CmdArgs.push_back("--dependent-lib=flang_rt.static_dbg.lib");
 break;
   case options::OPT__SLASH_MD:
 CmdArgs.push_back("-D_MT");
 CmdArgs.push_back("-D_DLL");
 CmdArgs.push_back("--dependent-lib=msvcrt");
-CmdArgs.push_back("--dependent-lib=FortranRuntime.dynamic.lib");
+CmdArgs.push_back("--dependent-lib=flang_rt.dynamic.lib");
 break;
   case options::OPT__SLASH_MDd:
 CmdArgs.push_back("-D_MT");
 CmdArgs.push_back("-D_DEBUG");
 CmdArgs.push_back("-D_DLL");
 CmdArgs.push_back("--dependent-lib=msvcrtd");
-CmdArgs.push_back("--dependent-lib=FortranRuntime.dynamic_dbg.lib");
+CmdArgs.push_back("--dependent-lib=flang_rt.dynamic_dbg.lib");
 break;
   }
 }
diff --git a/flang/CMakeLists.txt b/flang/CMakeLists.txt
index 7d6dcb5c184a52..8a8b8bfa73b007 100644
--- a/flang/CMakeLists.txt
+++ b/flang/CMakeLists.txt
@@ -301,7 +301,7 @@ set(FLANG_DEFAULT_LINKER "" CACHE STRING
   "Default linker to use (linker name or absolute path, empty for platform default)")
 
 set(FLANG_DEFAULT_RTLIB "" CACHE STRING
-   "Default Fortran runtime library to use (\"libFortranRuntime\"), leave empty for platform default.")
+   "Defaul
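
The effect of the rename on the MSVC runtime selection in `processVSRuntimeLibrary` can be summarised with a small stand-alone sketch. It is not the driver's code: `MSVCRuntime` and `dependentLibArgs` are hypothetical names standing in for the `options::OPT__SLASH_*` cases, and the `-D` macro definitions from the diff are left out; only the `--dependent-lib=` strings are taken from the hunk above.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical enum standing in for the /MT, /MTd, /MD and /MDd options.
enum class MSVCRuntime { MT, MTd, MD, MDd };

// Return the --dependent-lib arguments for a given runtime flavour, using
// the post-rename flang_rt.* library names from the diff.
std::vector<std::string> dependentLibArgs(MSVCRuntime RT) {
  switch (RT) {
  case MSVCRuntime::MT:
    return {"--dependent-lib=libcmt", "--dependent-lib=flang_rt.static.lib"};
  case MSVCRuntime::MTd:
    return {"--dependent-lib=libcmtd",
            "--dependent-lib=flang_rt.static_dbg.lib"};
  case MSVCRuntime::MD:
    return {"--dependent-lib=msvcrt", "--dependent-lib=flang_rt.dynamic.lib"};
  case MSVCRuntime::MDd:
    return {"--dependent-lib=msvcrtd",
            "--dependent-lib=flang_rt.dynamic_dbg.lib"};
  }
  assert(false && "unhandled runtime kind");
  return {};
}
```

On non-Windows targets the diff shows the corresponding change as a single linker flag, `-lflang_rt` in place of `-lFortranRuntime`.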

[llvm-branch-commits] [clang] [flang] [llvm] [Flang] LLVM_ENABLE_RUNTIMES=flang-rt (PR #110217)

2025-01-24 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-clang

Author: Michael Kruse (Meinersbur)


Changes

Extract Flang's runtime library to use the LLVM_ENABLE_RUNTIMES mechanism.

Motivation:
 * Consistency with LLVM's other runtime libraries (compiler-rt, libc, libcxx, 
openmp offload, ...)
 * Allows compiling the runtime for multiple targets at once using the 
LLVM_RUNTIME_TARGETS configuration option
 * Installs the runtime into the compiler's per-target resource directory so it 
can be automatically found even when cross-compiling

Potential future directions: 
 * Use CMake's support for compiling Fortran files, including dependency 
resolution of Fortran modules
 * Improve robustness of compiling `libomp.mod` when openmp is available
 * Remove Flang's dependency on flang-rt's RTNAME function declarations 
(tblgen?)
 * Reduce Flang's build-time dependency on flang-rt's `REAL(16)` support

See RFC discussion at 
https://discourse.llvm.org/t/rfc-use-llvm-enable-runtimes-for-flangs-runtime/80826

Patch series:
 * #110244
 * #112188
 * #121997
 * #122069
 * #122334
 * #122336
 * #122341
 * #110298
 * #110217 (this PR)
 * #121782
 * #124126

Patch for lab.llvm.org buildbots:
 * https://github.com/llvm/llvm-zorg/pull/333


---

Patch is 108.72 KiB, truncated to 20.00 KiB below, full version: 
https://github.com/llvm/llvm-project/pull/110217.diff


41 Files Affected:

- (modified) clang/lib/Driver/ToolChains/Flang.cpp (+9-5) 
- (added) flang-rt/.clang-tidy (+2) 
- (added) flang-rt/CMakeLists.txt (+248) 
- (added) flang-rt/CODE_OWNERS.TXT (+14) 
- (added) flang-rt/LICENSE.TXT (+234) 
- (added) flang-rt/README.md (+188) 
- (added) flang-rt/cmake/modules/AddFlangRT.cmake (+186) 
- (added) flang-rt/cmake/modules/AddFlangRTOffload.cmake (+101) 
- (added) flang-rt/cmake/modules/GetToolchainDirs.cmake (+125) 
- (added) flang-rt/lib/CMakeLists.txt (+18) 
- (added) flang-rt/lib/FortranFloat128Math/CMakeLists.txt (+136) 
- (added) flang-rt/lib/Testing/CMakeLists.txt (+20) 
- (added) flang-rt/lib/flang_rt/CMakeLists.txt (+213) 
- (added) flang-rt/lib/flang_rt/CUDA/CMakeLists.txt (+33) 
- (modified) flang-rt/lib/flang_rt/io-api-minimal.cpp (+1-1) 
- (added) flang-rt/test/CMakeLists.txt (+59) 
- (modified) flang-rt/test/Driver/ctofortran.f90 (+5-24) 
- (modified) flang-rt/test/Driver/exec.f90 (+4-4) 
- (added) flang-rt/test/NonGtestUnit/lit.cfg.py (+22) 
- (added) flang-rt/test/NonGtestUnit/lit.site.cfg.py.in (+14) 
- (modified) flang-rt/test/Runtime/no-cpp-dep.c (+3-2) 
- (added) flang-rt/test/Unit/lit.cfg.py (+21) 
- (added) flang-rt/test/Unit/lit.site.cfg.py.in (+15) 
- (added) flang-rt/test/lit.cfg.py (+102) 
- (added) flang-rt/test/lit.site.cfg.py.in (+19) 
- (added) flang-rt/unittests/CMakeLists.txt (+111) 
- (added) flang-rt/unittests/Evaluate/CMakeLists.txt (+21) 
- (added) flang-rt/unittests/Runtime/CMakeLists.txt (+48) 
- (added) flang-rt/unittests/Runtime/CUDA/CMakeLists.txt (+18) 
- (modified) flang/CMakeLists.txt (+26-27) 
- (added) flang/cmake/modules/FlangCommon.cmake (+43) 
- (modified) flang/docs/GettingStarted.md (+58-50) 
- (modified) flang/docs/ReleaseNotes.md (+7-1) 
- (modified) flang/module/iso_fortran_env_impl.f90 (+1-1) 
- (modified) flang/test/lit.cfg.py (-20) 
- (modified) flang/test/lit.site.cfg.py.in (-3) 
- (modified) llvm/CMakeLists.txt (+7-1) 
- (modified) llvm/cmake/modules/LLVMExternalProjectUtils.cmake (+15-1) 
- (modified) llvm/projects/CMakeLists.txt (+3-1) 
- (modified) llvm/runtimes/CMakeLists.txt (+18-7) 
- (modified) runtimes/CMakeLists.txt (+1-1) 


``diff
diff --git a/clang/lib/Driver/ToolChains/Flang.cpp 
b/clang/lib/Driver/ToolChains/Flang.cpp
index 68a17edf8ca341..17a8a4dd8d0a87 100644
--- a/clang/lib/Driver/ToolChains/Flang.cpp
+++ b/clang/lib/Driver/ToolChains/Flang.cpp
@@ -342,11 +342,15 @@ static void processVSRuntimeLibrary(const ToolChain &TC, 
const ArgList &Args,
 ArgStringList &CmdArgs) {
   assert(TC.getTriple().isKnownWindowsMSVCEnvironment() &&
  "can only add VS runtime library on Windows!");
-  // if -fno-fortran-main has been passed, skip linking Fortran_main.a
-  if (TC.getTriple().isKnownWindowsMSVCEnvironment()) {
-CmdArgs.push_back(Args.MakeArgString(
-"--dependent-lib=" + TC.getCompilerRTBasename(Args, "builtins")));
-  }
+
+  // Flang/Clang (including clang-cl) -compiled programs targeting the MSVC ABI
+  // should only depend on msv(u)crt. LLVM still emits libgcc/compiler-rt
+  // functions in some cases like 128-bit integer math (__udivti3, __modti3,
+  // __fixsfti, __floattidf, ...) that msvc does not support. We are injecting 
a
+  // dependency to Compiler-RT's builtin library where these are implemented.
+  CmdArgs.push_back(Args.MakeArgString(
+  "--dependent-lib=" + TC.getCompilerRTBasename(Args, "builtins")));
+
   unsigned RTOptionID = options::OPT__SLASH_MT;
   if (auto *rtl = Args.getLastArg(options::OPT_fms_runtime_lib_EQ)) {
 RTOptionID = llvm::StringSwitch(rtl->getValue())
d

[llvm-branch-commits] [flang] [Flang] Promote FortranEvaluateTesting library (PR #122334)

2025-01-24 Thread Michael Kruse via llvm-branch-commits

https://github.com/Meinersbur edited 
https://github.com/llvm/llvm-project/pull/122334


[llvm-branch-commits] [llvm] [Flang-RT] Build libflang_rt.so (PR #121782)

2025-01-24 Thread Michael Kruse via llvm-branch-commits

https://github.com/Meinersbur updated 
https://github.com/llvm/llvm-project/pull/121782

>From b05c9a033158aea459d51ff34b8ec47e72f85740 Mon Sep 17 00:00:00 2001
From: Michael Kruse 
Date: Fri, 24 Jan 2025 16:51:27 +0100
Subject: [PATCH] [Flang-RT] Build libflang_rt.so

---
 flang-rt/CMakeLists.txt   |  30 ++
 flang-rt/cmake/modules/AddFlangRT.cmake   | 324 --
 .../cmake/modules/AddFlangRTOffload.cmake |  18 +-
 flang-rt/lib/flang_rt/CMakeLists.txt  |   9 +-
 flang-rt/lib/flang_rt/CUDA/CMakeLists.txt |  26 +-
 flang-rt/test/CMakeLists.txt  |   2 +-
 flang-rt/test/lit.cfg.py  |   2 +-
 7 files changed, 283 insertions(+), 128 deletions(-)

diff --git a/flang-rt/CMakeLists.txt b/flang-rt/CMakeLists.txt
index 655d0a55b40044..0b91b6ae7eea78 100644
--- a/flang-rt/CMakeLists.txt
+++ b/flang-rt/CMakeLists.txt
@@ -115,6 +115,15 @@ endif ()
 extend_path(FLANG_RT_INSTALL_RESOURCE_LIB_PATH 
"${FLANG_RT_INSTALL_RESOURCE_PATH}" "${toolchain_lib_subdir}")
 cmake_path(NORMAL_PATH FLANG_RT_OUTPUT_RESOURCE_DIR)
 cmake_path(NORMAL_PATH FLANG_RT_INSTALL_RESOURCE_PATH)
+# FIXME: For the libflang_rt.so, the toolchain resource lib dir is not a good
+#destination because it is not a ld.so default search path.
+#The machine where the executable is eventually executed may not be the
+#machine where the Flang compiler and its resource dir is installed, so
+#setting RPath by the driver is not an solution. It should belong into
+#/usr/lib//libflang_rt.so, like e.g. libgcc_s.so.
+#But the linker as invoked by the Flang driver also requires
+#libflang_rt.so to be found when linking and the resource lib dir is
+#the only reliable location.
 cmake_path(NORMAL_PATH FLANG_RT_OUTPUT_RESOURCE_LIB_DIR)
 cmake_path(NORMAL_PATH FLANG_RT_INSTALL_RESOURCE_LIB_PATH)
 
@@ -129,6 +138,27 @@ cmake_path(NORMAL_PATH FLANG_RT_INSTALL_RESOURCE_LIB_PATH)
 option(FLANG_RT_INCLUDE_TESTS "Generate build targets for the flang-rt unit 
and regression-tests." "${LLVM_INCLUDE_TESTS}")
 
 
+option(FLANG_RT_ENABLE_STATIC "Build Flang-RT as a static library." ON)
+if (WIN32)
+  # Windows DLL currently not implemented.
+  set(FLANG_RT_ENABLE_SHARED OFF)
+else ()
+  # TODO: Enable by default to increase test coverage, and which version of the
+  #   library should be the user's choice anyway.
+  #   Currently, the Flang driver adds `-L"libdir" -lflang_rt` as linker
+  #   argument, which leaves the choice which library to use to the linker.
+  #   Since most linkers prefer the shared library, this would constitute a
+  #   breaking change unless the driver is changed.
+  option(FLANG_RT_ENABLE_SHARED "Build Flang-RT as a shared library." OFF)
+endif ()
+if (NOT FLANG_RT_ENABLE_STATIC AND NOT FLANG_RT_ENABLE_SHARED)
+  message(FATAL_ERROR "
+  Must build at least one type of library
+  (FLANG_RT_ENABLE_STATIC=ON, FLANG_RT_ENABLE_SHARED=ON, or both)
+")
+endif ()
+
+
 set(FLANG_RT_EXPERIMENTAL_OFFLOAD_SUPPORT "" CACHE STRING "Compile Flang-RT 
with GPU support (CUDA or OpenMP)")
 set_property(CACHE FLANG_RT_EXPERIMENTAL_OFFLOAD_SUPPORT PROPERTY STRINGS
 ""
diff --git a/flang-rt/cmake/modules/AddFlangRT.cmake 
b/flang-rt/cmake/modules/AddFlangRT.cmake
index aa8adedf61752a..87ec58b2e854eb 100644
--- a/flang-rt/cmake/modules/AddFlangRT.cmake
+++ b/flang-rt/cmake/modules/AddFlangRT.cmake
@@ -16,7 +16,8 @@
 #   STATIC
 # Build a static (.a/.lib) library
 #   OBJECT
-# Create only object files without static/dynamic library
+# Always create an object library.
+# Without SHARED/STATIC, build only the object library.
 #   INSTALL_WITH_TOOLCHAIN
 # Install library into Clang's resource directory so it can be found by the
 # Flang driver during compilation, including tests
@@ -48,17 +49,73 @@ function (add_flangrt_library name)
   ")
   endif ()
 
-  # Forward libtype to add_library
-  set(extra_args "")
-  if (ARG_SHARED)
-list(APPEND extra_args SHARED)
+  # Internal names of libraries. If called with just single type option, use
+  # the default name for it. Name of targets must only depend on function
+  # arguments to be predictable for callers.
+  set(name_static "${name}.static")
+  set(name_shared "${name}.shared")
+  set(name_object "obj.${name}")
+  if (ARG_STATIC AND NOT ARG_SHARED)
+set(name_static "${name}")
+  elseif (NOT ARG_STATIC AND ARG_SHARED)
+set(name_shared "${name}")
+  elseif (NOT ARG_STATIC AND NOT ARG_SHARED AND ARG_OBJECT)
+set(name_object "${name}")
+  elseif (NOT ARG_STATIC AND NOT ARG_SHARED AND NOT ARG_OBJECT)
+# Only one of them will actually be built.
+set(name_static "${name}")
+set(name_shared "${name}")
   endif ()
-  if (ARG_STATIC)
-list(APPEND extra_args STATIC)
+
+  # Determine what to build. If not explicitly specified, honor
+  # BUILD_SHARED_LIBS (e.g. for unittest libraries). If can build s
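
The target-naming rules in the AddFlangRT.cmake hunk above can be modeled outside CMake. The following is only a sketch in Python, assuming the `if`/`elseif` chain shown in the hunk is the complete rule; the function `flangrt_target_names` and its keyword arguments are hypothetical stand-ins for `add_flangrt_library` and its `STATIC`/`SHARED`/`OBJECT` options.

```python
def flangrt_target_names(name, static=False, shared=False, obj=False):
    """Hypothetical model of the internal target names chosen in the hunk."""
    # Defaults used when more than one library type is requested.
    names = {
        "static": f"{name}.static",
        "shared": f"{name}.shared",
        "object": f"obj.{name}",
    }
    if static and not shared:
        names["static"] = name
    elif shared and not static:
        names["shared"] = name
    elif not static and not shared and obj:
        names["object"] = name
    elif not static and not shared and not obj:
        # Only one of the two will actually be built.
        names["static"] = name
        names["shared"] = name
    return names


print(flangrt_target_names("flang_rt", static=True, shared=True))
```

With both `static` and `shared` requested, none of the branches fire and the suffixed default names are kept, so callers can predict target names from the arguments alone.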

[llvm-branch-commits] [Flang] Introduce FortranSupport (PR #122069)

2025-01-24 Thread Michael Kruse via llvm-branch-commits

https://github.com/Meinersbur closed 
https://github.com/llvm/llvm-project/pull/122069


[llvm-branch-commits] [clang] [flang] [lld] [Flang] Rename libFortranRuntime.a to libflang_rt.a (PR #122341)

2025-01-24 Thread via llvm-branch-commits

llvmbot wrote:



@llvm/pr-subscribers-lld

@llvm/pr-subscribers-flang-driver

Author: Michael Kruse (Meinersbur)


Changes

The future name of Flang's runtime component is `flang_rt`, as already used in 
PR #110217 (Flang-RT). Since the flang driver has to select the runtime 
to link, both build instructions must agree on the name.

Extracted out of #110217
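
As a rough illustration of why the names must agree, the MSVC branch of the driver change in this patch maps each runtime-library mode to a fixed list of flags that embed the library file name. The sketch below models that mapping in Python. The function `msvc_runtime_flags` and the string keys are hypothetical stand-ins for the driver's `options::OPT__SLASH_*` cases and are not part of Clang.

```python
# Hypothetical model of the switch in processVSRuntimeLibrary (Flang.cpp).
# The keys stand in for the OPT__SLASH_MT/MTd/MD/MDd cases of the driver.
_MSVC_RUNTIME_FLAGS = {
    "MT": ["-D_MT", "--dependent-lib=libcmt",
           "--dependent-lib=flang_rt.static.lib"],
    "MTd": ["-D_MT", "-D_DEBUG", "--dependent-lib=libcmtd",
            "--dependent-lib=flang_rt.static_dbg.lib"],
    "MD": ["-D_MT", "-D_DLL", "--dependent-lib=msvcrt",
           "--dependent-lib=flang_rt.dynamic.lib"],
    "MDd": ["-D_MT", "-D_DEBUG", "-D_DLL", "--dependent-lib=msvcrtd",
            "--dependent-lib=flang_rt.dynamic_dbg.lib"],
}


def msvc_runtime_flags(mode="MT"):
    """Return a copy of the flags for the given mode (an unknown mode raises KeyError)."""
    return list(_MSVC_RUNTIME_FLAGS[mode])


print(msvc_runtime_flags("MDd"))
```

If the build system produced a library under a different name than the one hard-coded here, the `--dependent-lib=` entry would point at a file that does not exist.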

---

Patch is 23.32 KiB, truncated to 20.00 KiB below, full version: 
https://github.com/llvm/llvm-project/pull/122341.diff


26 Files Affected:

- (modified) clang/lib/Driver/ToolChains/CommonArgs.cpp (+2-2) 
- (modified) clang/lib/Driver/ToolChains/Flang.cpp (+4-4) 
- (modified) flang/CMakeLists.txt (+1-1) 
- (modified) flang/docs/FlangDriver.md (+3-3) 
- (modified) flang/docs/GettingStarted.md (+3-3) 
- (modified) flang/docs/OpenACC-descriptor-management.md (+1-1) 
- (modified) flang/docs/ReleaseNotes.md (+2) 
- (modified) flang/examples/ExternalHelloWorld/CMakeLists.txt (+1-1) 
- (modified) flang/lib/Optimizer/Builder/IntrinsicCall.cpp (+1-1) 
- (modified) flang/runtime/CMakeLists.txt (+23-17) 
- (modified) flang/runtime/CUDA/CMakeLists.txt (+1-1) 
- (modified) flang/runtime/Float128Math/CMakeLists.txt (+1-1) 
- (modified) flang/runtime/time-intrinsic.cpp (+1-1) 
- (modified) flang/test/CMakeLists.txt (+7-1) 
- (modified) flang/test/Driver/gcc-toolchain-install-dir.f90 (+1-1) 
- (modified) flang/test/Driver/linker-flags.f90 (+4-4) 
- (modified) flang/test/Driver/msvc-dependent-lib-flags.f90 (+4-4) 
- (modified) flang/test/Driver/nostdlib.f90 (+1-1) 
- (modified) flang/test/Runtime/no-cpp-dep.c (+1-1) 
- (modified) flang/test/lit.cfg.py (+1-1) 
- (modified) flang/tools/f18/CMakeLists.txt (+4-4) 
- (modified) flang/unittests/CMakeLists.txt (+1-1) 
- (modified) flang/unittests/Evaluate/CMakeLists.txt (+2-2) 
- (modified) flang/unittests/Runtime/CMakeLists.txt (+1-1) 
- (modified) flang/unittests/Runtime/CUDA/CMakeLists.txt (+1-1) 
- (modified) lld/COFF/MinGW.cpp (+1-1) 


``diff
diff --git a/clang/lib/Driver/ToolChains/CommonArgs.cpp 
b/clang/lib/Driver/ToolChains/CommonArgs.cpp
index b5273dd8cf1e3a..c7b0a660ee021f 100644
--- a/clang/lib/Driver/ToolChains/CommonArgs.cpp
+++ b/clang/lib/Driver/ToolChains/CommonArgs.cpp
@@ -1321,7 +1321,7 @@ void tools::addOpenMPHostOffloadingArgs(const Compilation 
&C,
 /// Add Fortran runtime libs
 void tools::addFortranRuntimeLibs(const ToolChain &TC, const ArgList &Args,
   llvm::opt::ArgStringList &CmdArgs) {
-  // Link FortranRuntime
+  // Link flang_rt
   // These are handled earlier on Windows by telling the frontend driver to
   // add the correct libraries to link against as dependents in the object
   // file.
@@ -1337,7 +1337,7 @@ void tools::addFortranRuntimeLibs(const ToolChain &TC, 
const ArgList &Args,
   if (AsNeeded)
 addAsNeededOption(TC, Args, CmdArgs, /*as_needed=*/false);
 }
-CmdArgs.push_back("-lFortranRuntime");
+CmdArgs.push_back("-lflang_rt");
 addArchSpecificRPath(TC, Args, CmdArgs);
   }
 
diff --git a/clang/lib/Driver/ToolChains/Flang.cpp 
b/clang/lib/Driver/ToolChains/Flang.cpp
index f1bf32b3238270..68a17edf8ca341 100644
--- a/clang/lib/Driver/ToolChains/Flang.cpp
+++ b/clang/lib/Driver/ToolChains/Flang.cpp
@@ -360,26 +360,26 @@ static void processVSRuntimeLibrary(const ToolChain &TC, 
const ArgList &Args,
   case options::OPT__SLASH_MT:
 CmdArgs.push_back("-D_MT");
 CmdArgs.push_back("--dependent-lib=libcmt");
-CmdArgs.push_back("--dependent-lib=FortranRuntime.static.lib");
+CmdArgs.push_back("--dependent-lib=flang_rt.static.lib");
 break;
   case options::OPT__SLASH_MTd:
 CmdArgs.push_back("-D_MT");
 CmdArgs.push_back("-D_DEBUG");
 CmdArgs.push_back("--dependent-lib=libcmtd");
-CmdArgs.push_back("--dependent-lib=FortranRuntime.static_dbg.lib");
+CmdArgs.push_back("--dependent-lib=flang_rt.static_dbg.lib");
 break;
   case options::OPT__SLASH_MD:
 CmdArgs.push_back("-D_MT");
 CmdArgs.push_back("-D_DLL");
 CmdArgs.push_back("--dependent-lib=msvcrt");
-CmdArgs.push_back("--dependent-lib=FortranRuntime.dynamic.lib");
+CmdArgs.push_back("--dependent-lib=flang_rt.dynamic.lib");
 break;
   case options::OPT__SLASH_MDd:
 CmdArgs.push_back("-D_MT");
 CmdArgs.push_back("-D_DEBUG");
 CmdArgs.push_back("-D_DLL");
 CmdArgs.push_back("--dependent-lib=msvcrtd");
-CmdArgs.push_back("--dependent-lib=FortranRuntime.dynamic_dbg.lib");
+CmdArgs.push_back("--dependent-lib=flang_rt.dynamic_dbg.lib");
 break;
   }
 }
diff --git a/flang/CMakeLists.txt b/flang/CMakeLists.txt
index 7d6dcb5c184a52..8a8b8bfa73b007 100644
--- a/flang/CMakeLists.txt
+++ b/flang/CMakeLists.txt
@@ -301,7 +301,7 @@ set(FLANG_DEFAULT_LINKER "" CACHE STRING
   "Default linker to use (linker name or absolute path, empty for platform 
default)")
 
 set(FLANG_DEFAULT_RTLIB "" CACHE STRING
-   "Default Fortran runtime library to use (\"libFortranRuntime\"), leave 
empty for platfo

[llvm-branch-commits] [flang] [Flang] Optionally do not compile the runtime in-tree (PR #122336)

2025-01-24 Thread Michael Kruse via llvm-branch-commits

https://github.com/Meinersbur updated 
https://github.com/llvm/llvm-project/pull/122336

>From 4c676f468ba344ac0c388583a4ed28035d05ae89 Mon Sep 17 00:00:00 2001
From: Michael Kruse 
Date: Fri, 24 Jan 2025 15:00:16 +0100
Subject: [PATCH] users/meinersbur/flang_runtime_FLANG_INCLUDE_RUNTIME

---
 flang/CMakeLists.txt|  6 +-
 flang/test/CMakeLists.txt   |  6 +-
 flang/test/Driver/ctofortran.f90|  1 +
 flang/test/Driver/exec.f90  |  1 +
 flang/test/Runtime/no-cpp-dep.c |  2 +-
 flang/test/lit.cfg.py   |  5 -
 flang/test/lit.site.cfg.py.in   |  2 ++
 flang/tools/f18/CMakeLists.txt  |  2 +-
 flang/unittests/CMakeLists.txt  | 11 +-
 flang/unittests/Evaluate/CMakeLists.txt | 27 +
 10 files changed, 44 insertions(+), 19 deletions(-)

diff --git a/flang/CMakeLists.txt b/flang/CMakeLists.txt
index b619553ef83021..7d6dcb5c184a52 100644
--- a/flang/CMakeLists.txt
+++ b/flang/CMakeLists.txt
@@ -247,6 +247,8 @@ else()
   include_directories(SYSTEM ${MLIR_TABLEGEN_OUTPUT_DIR})
 endif()
 
+option(FLANG_INCLUDE_RUNTIME "Build the runtime in-tree (deprecated; to be 
replaced with LLVM_ENABLE_RUNTIMES=flang-rt)" ON)
+
 set(FLANG_TOOLS_INSTALL_DIR "${CMAKE_INSTALL_BINDIR}" CACHE PATH
 "Path for binary subdirectory (defaults to '${CMAKE_INSTALL_BINDIR}')")
 mark_as_advanced(FLANG_TOOLS_INSTALL_DIR)
@@ -487,7 +489,9 @@ if (FLANG_CUF_RUNTIME)
   find_package(CUDAToolkit REQUIRED)
 endif()
 
-add_subdirectory(runtime)
+if (FLANG_INCLUDE_RUNTIME)
+  add_subdirectory(runtime)
+endif ()
 
 if (LLVM_INCLUDE_EXAMPLES)
   add_subdirectory(examples)
diff --git a/flang/test/CMakeLists.txt b/flang/test/CMakeLists.txt
index cab214c2ef4c8c..e398e0786147aa 100644
--- a/flang/test/CMakeLists.txt
+++ b/flang/test/CMakeLists.txt
@@ -71,9 +71,13 @@ set(FLANG_TEST_DEPENDS
   llvm-objdump
   llvm-readobj
   split-file
-  FortranRuntime
   FortranDecimal
 )
+
+if (FLANG_INCLUDE_RUNTIME)
+  list(APPEND FLANG_TEST_DEPENDS FortranRuntime)
+endif ()
+
 if (LLVM_ENABLE_PLUGINS AND NOT WIN32)
   list(APPEND FLANG_TEST_DEPENDS Bye)
 endif()
diff --git a/flang/test/Driver/ctofortran.f90 b/flang/test/Driver/ctofortran.f90
index 78eac32133b18e..10c7adaccc9588 100644
--- a/flang/test/Driver/ctofortran.f90
+++ b/flang/test/Driver/ctofortran.f90
@@ -1,4 +1,5 @@
 ! UNSUPPORTED: system-windows
+! REQUIRES: flang-rt
 ! RUN: split-file %s %t
 ! RUN: chmod +x %t/runtest.sh
 ! RUN: %t/runtest.sh %t %t/ffile.f90 %t/cfile.c %flang | FileCheck %s
diff --git a/flang/test/Driver/exec.f90 b/flang/test/Driver/exec.f90
index fd174005ddf62a..9ca91ee24011c9 100644
--- a/flang/test/Driver/exec.f90
+++ b/flang/test/Driver/exec.f90
@@ -1,4 +1,5 @@
 ! UNSUPPORTED: system-windows
+! REQUIRES: flang-rt
 ! Verify that flang can correctly build executables.
 
 ! RUN: %flang %s -o %t
diff --git a/flang/test/Runtime/no-cpp-dep.c b/flang/test/Runtime/no-cpp-dep.c
index b1a5fa004014cc..7303ce63fdec41 100644
--- a/flang/test/Runtime/no-cpp-dep.c
+++ b/flang/test/Runtime/no-cpp-dep.c
@@ -3,7 +3,7 @@ This test makes sure that flang's runtime does not depend on 
the C++ runtime
 library. It tries to link this simple file against libFortranRuntime.a with
 a C compiler.
 
-REQUIRES: c-compiler
+REQUIRES: c-compiler, flang-rt
 
 RUN: %if system-aix %{ export OBJECT_MODE=64 %}
 RUN: %cc -std=c99 %s -I%include %libruntime -lm  \
diff --git a/flang/test/lit.cfg.py b/flang/test/lit.cfg.py
index c452b6d231c89f..f4580afc8c47b1 100644
--- a/flang/test/lit.cfg.py
+++ b/flang/test/lit.cfg.py
@@ -163,10 +163,13 @@
 ToolSubst("%not_todo_abort_cmd", command=FindTool("not"), 
unresolved="fatal")
 )
 
+if config.flang_include_runtime:
+config.available_features.add("flang-rt")
+
 # Define some variables to help us test that the flang runtime doesn't depend 
on
 # the C++ runtime libraries. For this we need a C compiler. If for some reason
 # we don't have one, we can just disable the test.
-if config.cc:
+if config.flang_include_runtime and config.cc:
 libruntime = os.path.join(config.flang_lib_dir, "libFortranRuntime.a")
 include = os.path.join(config.flang_src_dir, "include")
 
diff --git a/flang/test/lit.site.cfg.py.in b/flang/test/lit.site.cfg.py.in
index d1a0ac763cf8a0..697ba3fa797633 100644
--- a/flang/test/lit.site.cfg.py.in
+++ b/flang/test/lit.site.cfg.py.in
@@ -1,6 +1,7 @@
 @LIT_SITE_CFG_IN_HEADER@
 
 import sys
+import lit.util
 
 config.llvm_tools_dir = lit_config.substitute("@LLVM_TOOLS_DIR@")
 config.llvm_shlib_dir = lit_config.substitute(path(r"@SHLIBDIR@"))
@@ -32,6 +33,7 @@ else:
 config.openmp_module_dir = None
 config.flang_runtime_f128_math_lib = "@FLANG_RUNTIME_F128_MATH_LIB@"
 config.have_ldbl_mant_dig_113 = "@HAVE_LDBL_MANT_DIG_113@"
+config.flang_include_runtime = 
lit.util.pythonize_bool("@FLANG_INCLUDE_RUNTIME@")
 
 import lit.llvm
 lit.llvm.initialize(lit_config, config)
diff --git a/flang/tools/f18/CMakeLists
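
The lit changes in the patch above gate tests on a `flang-rt` feature derived from `FLANG_INCLUDE_RUNTIME`. The following Python sketch illustrates the idea without depending on lit itself. `to_bool` is a simplified, hypothetical stand-in for `lit.util.pythonize_bool`, and `available_features` and `can_run` are illustrative helpers that do not exist in lit.

```python
def to_bool(value):
    """Simplified, hypothetical stand-in for lit.util.pythonize_bool."""
    text = str(value).strip().lower()
    if text in ("1", "true", "on", "yes"):
        return True
    if text in ("", "0", "false", "off", "no"):
        return False
    raise ValueError(f"not a boolean: {value!r}")


def available_features(flang_include_runtime):
    """Model of the lit.cfg.py hunk: add 'flang-rt' only if the runtime is built."""
    features = set()
    if to_bool(flang_include_runtime):
        features.add("flang-rt")
    return features


def can_run(requires, features):
    """A test runs only if every feature it REQUIRES is available."""
    return set(requires) <= features


print(can_run(["flang-rt"], available_features("ON")))
print(can_run(["flang-rt"], available_features("OFF")))
```

Under this model, a test marked `REQUIRES: flang-rt` is skipped rather than failed when the runtime is not built in-tree.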

[llvm-branch-commits] [flang] [Flang] Optionally do not compile the runtime in-tree (PR #122336)

2025-01-24 Thread Michael Kruse via llvm-branch-commits

https://github.com/Meinersbur edited 
https://github.com/llvm/llvm-project/pull/122336


[llvm-branch-commits] [flang] [Flang] Promote FortranEvaluateTesting library (PR #122334)

2025-01-24 Thread Michael Kruse via llvm-branch-commits

https://github.com/Meinersbur closed 
https://github.com/llvm/llvm-project/pull/122334


[llvm-branch-commits] [clang] [flang] [lld] [Flang] Rename libFortranRuntime.a to libflang_rt.a (PR #122341)

2025-01-24 Thread Michael Kruse via llvm-branch-commits

https://github.com/Meinersbur updated 
https://github.com/llvm/llvm-project/pull/122341

>From 875607fdecfada90a80ec732637ea9595fe72ba3 Mon Sep 17 00:00:00 2001
From: Michael Kruse 
Date: Fri, 24 Jan 2025 16:42:24 +0100
Subject: [PATCH] [Flang] Rename libFortranRuntime.a to libflang_rt.a

---
 clang/lib/Driver/ToolChains/CommonArgs.cpp|  4 +-
 clang/lib/Driver/ToolChains/Flang.cpp |  8 ++--
 flang/CMakeLists.txt  |  2 +-
 flang/docs/FlangDriver.md |  6 +--
 flang/docs/GettingStarted.md  |  6 +--
 flang/docs/OpenACC-descriptor-management.md   |  2 +-
 flang/docs/ReleaseNotes.md|  2 +
 .../ExternalHelloWorld/CMakeLists.txt |  2 +-
 flang/lib/Optimizer/Builder/IntrinsicCall.cpp |  2 +-
 flang/runtime/CMakeLists.txt  | 40 +++
 flang/runtime/CUDA/CMakeLists.txt |  2 +-
 flang/runtime/Float128Math/CMakeLists.txt |  2 +-
 flang/runtime/time-intrinsic.cpp  |  2 +-
 flang/test/CMakeLists.txt |  8 +++-
 .../test/Driver/gcc-toolchain-install-dir.f90 |  2 +-
 flang/test/Driver/linker-flags.f90|  8 ++--
 .../test/Driver/msvc-dependent-lib-flags.f90  |  8 ++--
 flang/test/Driver/nostdlib.f90|  2 +-
 flang/test/Runtime/no-cpp-dep.c   |  2 +-
 flang/test/lit.cfg.py |  2 +-
 flang/tools/f18/CMakeLists.txt|  8 ++--
 flang/unittests/CMakeLists.txt|  2 +-
 flang/unittests/Evaluate/CMakeLists.txt   |  4 +-
 flang/unittests/Runtime/CMakeLists.txt|  2 +-
 flang/unittests/Runtime/CUDA/CMakeLists.txt   |  2 +-
 lld/COFF/MinGW.cpp|  2 +-
 26 files changed, 73 insertions(+), 59 deletions(-)

diff --git a/clang/lib/Driver/ToolChains/CommonArgs.cpp 
b/clang/lib/Driver/ToolChains/CommonArgs.cpp
index b5273dd8cf1e3a..c7b0a660ee021f 100644
--- a/clang/lib/Driver/ToolChains/CommonArgs.cpp
+++ b/clang/lib/Driver/ToolChains/CommonArgs.cpp
@@ -1321,7 +1321,7 @@ void tools::addOpenMPHostOffloadingArgs(const Compilation 
&C,
 /// Add Fortran runtime libs
 void tools::addFortranRuntimeLibs(const ToolChain &TC, const ArgList &Args,
   llvm::opt::ArgStringList &CmdArgs) {
-  // Link FortranRuntime
+  // Link flang_rt
   // These are handled earlier on Windows by telling the frontend driver to
   // add the correct libraries to link against as dependents in the object
   // file.
@@ -1337,7 +1337,7 @@ void tools::addFortranRuntimeLibs(const ToolChain &TC, 
const ArgList &Args,
   if (AsNeeded)
 addAsNeededOption(TC, Args, CmdArgs, /*as_needed=*/false);
 }
-CmdArgs.push_back("-lFortranRuntime");
+CmdArgs.push_back("-lflang_rt");
 addArchSpecificRPath(TC, Args, CmdArgs);
   }
 
diff --git a/clang/lib/Driver/ToolChains/Flang.cpp 
b/clang/lib/Driver/ToolChains/Flang.cpp
index f1bf32b3238270..68a17edf8ca341 100644
--- a/clang/lib/Driver/ToolChains/Flang.cpp
+++ b/clang/lib/Driver/ToolChains/Flang.cpp
@@ -360,26 +360,26 @@ static void processVSRuntimeLibrary(const ToolChain &TC, 
const ArgList &Args,
   case options::OPT__SLASH_MT:
 CmdArgs.push_back("-D_MT");
 CmdArgs.push_back("--dependent-lib=libcmt");
-CmdArgs.push_back("--dependent-lib=FortranRuntime.static.lib");
+CmdArgs.push_back("--dependent-lib=flang_rt.static.lib");
 break;
   case options::OPT__SLASH_MTd:
 CmdArgs.push_back("-D_MT");
 CmdArgs.push_back("-D_DEBUG");
 CmdArgs.push_back("--dependent-lib=libcmtd");
-CmdArgs.push_back("--dependent-lib=FortranRuntime.static_dbg.lib");
+CmdArgs.push_back("--dependent-lib=flang_rt.static_dbg.lib");
 break;
   case options::OPT__SLASH_MD:
 CmdArgs.push_back("-D_MT");
 CmdArgs.push_back("-D_DLL");
 CmdArgs.push_back("--dependent-lib=msvcrt");
-CmdArgs.push_back("--dependent-lib=FortranRuntime.dynamic.lib");
+CmdArgs.push_back("--dependent-lib=flang_rt.dynamic.lib");
 break;
   case options::OPT__SLASH_MDd:
 CmdArgs.push_back("-D_MT");
 CmdArgs.push_back("-D_DEBUG");
 CmdArgs.push_back("-D_DLL");
 CmdArgs.push_back("--dependent-lib=msvcrtd");
-CmdArgs.push_back("--dependent-lib=FortranRuntime.dynamic_dbg.lib");
+CmdArgs.push_back("--dependent-lib=flang_rt.dynamic_dbg.lib");
 break;
   }
 }
diff --git a/flang/CMakeLists.txt b/flang/CMakeLists.txt
index 7d6dcb5c184a52..8a8b8bfa73b007 100644
--- a/flang/CMakeLists.txt
+++ b/flang/CMakeLists.txt
@@ -301,7 +301,7 @@ set(FLANG_DEFAULT_LINKER "" CACHE STRING
   "Default linker to use (linker name or absolute path, empty for platform 
default)")
 
 set(FLANG_DEFAULT_RTLIB "" CACHE STRING
-   "Default Fortran runtime library to use (\"libFortranRuntime\"), leave 
empty for platform default.")
+   "Default Fortran runtime library to use (\"libflang_rt\"), leave empty for 
platform default.")
 
 if (NOT(FLANG_DEFAULT_RTLIB STREQUAL ""))
   message(W

[llvm-branch-commits] [flang] [Flang] Promote FortranEvaluateTesting library (PR #122334)

2025-01-24 Thread Michael Kruse via llvm-branch-commits

https://github.com/Meinersbur updated 
https://github.com/llvm/llvm-project/pull/122334

>From 71015c8f9ab17431d052472aec99dc67929a166e Mon Sep 17 00:00:00 2001
From: Michael Kruse 
Date: Fri, 24 Jan 2025 16:30:47 +0100
Subject: [PATCH] [Flang] Promote FortranEvaluateTesting library

---
 .../flang/Testing}/fp-testing.h   | 14 ++--
 .../flang/Testing}/testing.h  | 14 ++--
 flang/lib/CMakeLists.txt  |  4 +++
 flang/lib/Testing/CMakeLists.txt  | 20 +++
 .../Evaluate => lib/Testing}/fp-testing.cpp   | 10 +-
 .../Evaluate => lib/Testing}/testing.cpp  | 10 +-
 flang/unittests/Evaluate/CMakeLists.txt   | 35 ++-
 .../Evaluate/ISO-Fortran-binding.cpp  |  2 +-
 .../Evaluate/bit-population-count.cpp |  2 +-
 flang/unittests/Evaluate/expression.cpp   |  2 +-
 flang/unittests/Evaluate/folding.cpp  |  2 +-
 flang/unittests/Evaluate/integer.cpp  |  2 +-
 flang/unittests/Evaluate/intrinsics.cpp   |  2 +-
 .../Evaluate/leading-zero-bit-count.cpp   |  2 +-
 flang/unittests/Evaluate/logical.cpp  |  2 +-
 flang/unittests/Evaluate/real.cpp |  4 +--
 flang/unittests/Evaluate/reshape.cpp  |  2 +-
 flang/unittests/Evaluate/uint128.cpp  |  2 +-
 18 files changed, 87 insertions(+), 44 deletions(-)
 rename flang/{unittests/Evaluate => include/flang/Testing}/fp-testing.h (54%)
 rename flang/{unittests/Evaluate => include/flang/Testing}/testing.h (74%)
 create mode 100644 flang/lib/Testing/CMakeLists.txt
 rename flang/{unittests/Evaluate => lib/Testing}/fp-testing.cpp (87%)
 rename flang/{unittests/Evaluate => lib/Testing}/testing.cpp (88%)

diff --git a/flang/unittests/Evaluate/fp-testing.h 
b/flang/include/flang/Testing/fp-testing.h
similarity index 54%
rename from flang/unittests/Evaluate/fp-testing.h
rename to flang/include/flang/Testing/fp-testing.h
index 9091963a99b32d..e223d2ef7d1b8b 100644
--- a/flang/unittests/Evaluate/fp-testing.h
+++ b/flang/include/flang/Testing/fp-testing.h
@@ -1,5 +1,13 @@
-#ifndef FORTRAN_TEST_EVALUATE_FP_TESTING_H_
-#define FORTRAN_TEST_EVALUATE_FP_TESTING_H_
+//===-- include/flang/Testing/fp-testing.h --*- C++ 
-*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM 
Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+
+#ifndef FORTRAN_TESTING_FP_TESTING_H_
+#define FORTRAN_TESTING_FP_TESTING_H_
 
 #include "flang/Common/target-rounding.h"
 #include 
@@ -24,4 +32,4 @@ class ScopedHostFloatingPointEnvironment {
 #endif
 };
 
-#endif // FORTRAN_TEST_EVALUATE_FP_TESTING_H_
+#endif /* FORTRAN_TESTING_FP_TESTING_H_ */
diff --git a/flang/unittests/Evaluate/testing.h 
b/flang/include/flang/Testing/testing.h
similarity index 74%
rename from flang/unittests/Evaluate/testing.h
rename to flang/include/flang/Testing/testing.h
index 422e2853c05bc6..404650c9a89f2c 100644
--- a/flang/unittests/Evaluate/testing.h
+++ b/flang/include/flang/Testing/testing.h
@@ -1,5 +1,13 @@
-#ifndef FORTRAN_EVALUATE_TESTING_H_
-#define FORTRAN_EVALUATE_TESTING_H_
+//===-- include/flang/Testing/testing.h -*- C++ 
-*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM 
Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+
+#ifndef FORTRAN_TESTING_TESTING_H_
+#define FORTRAN_TESTING_TESTING_H_
 
 #include 
 #include 
@@ -33,4 +41,4 @@ FailureDetailPrinter Match(const char *file, int line, const 
std::string &want,
 FailureDetailPrinter Compare(const char *file, int line, const char *xs,
 const char *rel, const char *ys, std::uint64_t x, std::uint64_t y);
 } // namespace testing
-#endif // FORTRAN_EVALUATE_TESTING_H_
+#endif /* FORTRAN_TESTING_TESTING_H_ */
diff --git a/flang/lib/CMakeLists.txt b/flang/lib/CMakeLists.txt
index 05c3535b09b3d3..8b201d9a758a80 100644
--- a/flang/lib/CMakeLists.txt
+++ b/flang/lib/CMakeLists.txt
@@ -8,3 +8,7 @@ add_subdirectory(Frontend)
 add_subdirectory(FrontendTool)
 
 add_subdirectory(Optimizer)
+
+if (FLANG_INCLUDE_TESTS)
+  add_subdirectory(Testing)
+endif ()
diff --git a/flang/lib/Testing/CMakeLists.txt b/flang/lib/Testing/CMakeLists.txt
new file mode 100644
index 00..8051bc09736d16
--- /dev/null
+++ b/flang/lib/Testing/CMakeLists.txt
@@ -0,0 +1,20 @@
+#===-- lib/Testing/CMakeLists.txt 
--===#
+#
+# Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+# See https://llvm.org/LICENSE.txt for license information.
+# SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+#
+#===

[llvm-branch-commits] [flang] [Flang] Promote FortranEvaluateTesting library (PR #122334)

2025-01-24 Thread Michael Kruse via llvm-branch-commits

Meinersbur wrote:

GitHub interpreted pushing to the target branch (NOT main) of the patch series 
as "merging". There seems to be no way to re-open this PR, so I will create a 
new one.

https://github.com/llvm/llvm-project/pull/122334


[llvm-branch-commits] [Flang] Introduce FortranSupport (PR #122069)

2025-01-24 Thread Michael Kruse via llvm-branch-commits

Meinersbur wrote:

GitHub interpreted pushing to the target branch (NOT main) of the patch series 
as "merging". There seems to be no way to re-open this PR, so I will create a 
new one.

https://github.com/llvm/llvm-project/pull/122069


[llvm-branch-commits] [clang] [flang] [llvm] [Flang] LLVM_ENABLE_RUNTIMES=flang-rt (PR #110217)

2025-01-24 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-clang-driver

Author: Michael Kruse (Meinersbur)


Changes

Extract Flang's runtime library to use the LLVM_ENABLE_RUNTIMES mechanism.

Motivation:
 * Consistency with LLVM's other runtime libraries (compiler-rt, libc, libcxx, 
openmp offload, ...)
 * Allows compiling the runtime for multiple targets at once using the 
LLVM_RUNTIME_TARGETS configuration options
 * Installs the runtime into the compiler's per-target resource directory so it 
can be automatically found even when cross-compiling

Potential future directions: 
 * Use CMake's support for compiling Fortran files, including dependency 
resolution of Fortran modules
 * Improve robustness of compiling `libomp.mod` when openmp is available
 * Remove Flang's dependency on flang-rt's RTNAME function declarations 
(tblgen?)
 * Reduce Flang's build-time dependency on flang-rt's `REAL(16)` support

See RFC discussion at 
https://discourse.llvm.org/t/rfc-use-llvm-enable-runtimes-for-flangs-runtime/80826

Patch series:
 * #110244
 * #112188
 * #121997
 * #122069
 * #122334
 * #122336
 * #122341
 * #110298
 * #110217 (this PR)
 * #121782
 * #124126

Patch for lab.llvm.org buildbots:
 * https://github.com/llvm/llvm-zorg/pull/333


---

Patch is 108.72 KiB, truncated to 20.00 KiB below, full version: 
https://github.com/llvm/llvm-project/pull/110217.diff


41 Files Affected:

- (modified) clang/lib/Driver/ToolChains/Flang.cpp (+9-5) 
- (added) flang-rt/.clang-tidy (+2) 
- (added) flang-rt/CMakeLists.txt (+248) 
- (added) flang-rt/CODE_OWNERS.TXT (+14) 
- (added) flang-rt/LICENSE.TXT (+234) 
- (added) flang-rt/README.md (+188) 
- (added) flang-rt/cmake/modules/AddFlangRT.cmake (+186) 
- (added) flang-rt/cmake/modules/AddFlangRTOffload.cmake (+101) 
- (added) flang-rt/cmake/modules/GetToolchainDirs.cmake (+125) 
- (added) flang-rt/lib/CMakeLists.txt (+18) 
- (added) flang-rt/lib/FortranFloat128Math/CMakeLists.txt (+136) 
- (added) flang-rt/lib/Testing/CMakeLists.txt (+20) 
- (added) flang-rt/lib/flang_rt/CMakeLists.txt (+213) 
- (added) flang-rt/lib/flang_rt/CUDA/CMakeLists.txt (+33) 
- (modified) flang-rt/lib/flang_rt/io-api-minimal.cpp (+1-1) 
- (added) flang-rt/test/CMakeLists.txt (+59) 
- (modified) flang-rt/test/Driver/ctofortran.f90 (+5-24) 
- (modified) flang-rt/test/Driver/exec.f90 (+4-4) 
- (added) flang-rt/test/NonGtestUnit/lit.cfg.py (+22) 
- (added) flang-rt/test/NonGtestUnit/lit.site.cfg.py.in (+14) 
- (modified) flang-rt/test/Runtime/no-cpp-dep.c (+3-2) 
- (added) flang-rt/test/Unit/lit.cfg.py (+21) 
- (added) flang-rt/test/Unit/lit.site.cfg.py.in (+15) 
- (added) flang-rt/test/lit.cfg.py (+102) 
- (added) flang-rt/test/lit.site.cfg.py.in (+19) 
- (added) flang-rt/unittests/CMakeLists.txt (+111) 
- (added) flang-rt/unittests/Evaluate/CMakeLists.txt (+21) 
- (added) flang-rt/unittests/Runtime/CMakeLists.txt (+48) 
- (added) flang-rt/unittests/Runtime/CUDA/CMakeLists.txt (+18) 
- (modified) flang/CMakeLists.txt (+26-27) 
- (added) flang/cmake/modules/FlangCommon.cmake (+43) 
- (modified) flang/docs/GettingStarted.md (+58-50) 
- (modified) flang/docs/ReleaseNotes.md (+7-1) 
- (modified) flang/module/iso_fortran_env_impl.f90 (+1-1) 
- (modified) flang/test/lit.cfg.py (-20) 
- (modified) flang/test/lit.site.cfg.py.in (-3) 
- (modified) llvm/CMakeLists.txt (+7-1) 
- (modified) llvm/cmake/modules/LLVMExternalProjectUtils.cmake (+15-1) 
- (modified) llvm/projects/CMakeLists.txt (+3-1) 
- (modified) llvm/runtimes/CMakeLists.txt (+18-7) 
- (modified) runtimes/CMakeLists.txt (+1-1) 


``diff
diff --git a/clang/lib/Driver/ToolChains/Flang.cpp 
b/clang/lib/Driver/ToolChains/Flang.cpp
index 68a17edf8ca341..17a8a4dd8d0a87 100644
--- a/clang/lib/Driver/ToolChains/Flang.cpp
+++ b/clang/lib/Driver/ToolChains/Flang.cpp
@@ -342,11 +342,15 @@ static void processVSRuntimeLibrary(const ToolChain &TC, 
const ArgList &Args,
 ArgStringList &CmdArgs) {
   assert(TC.getTriple().isKnownWindowsMSVCEnvironment() &&
  "can only add VS runtime library on Windows!");
-  // if -fno-fortran-main has been passed, skip linking Fortran_main.a
-  if (TC.getTriple().isKnownWindowsMSVCEnvironment()) {
-CmdArgs.push_back(Args.MakeArgString(
-"--dependent-lib=" + TC.getCompilerRTBasename(Args, "builtins")));
-  }
+
+  // Flang/Clang (including clang-cl) -compiled programs targeting the MSVC ABI
+  // should only depend on msv(u)crt. LLVM still emits libgcc/compiler-rt
+  // functions in some cases like 128-bit integer math (__udivti3, __modti3,
+  // __fixsfti, __floattidf, ...) that msvc does not support. We are injecting 
a
+  // dependency to Compiler-RT's builtin library where these are implemented.
+  CmdArgs.push_back(Args.MakeArgString(
+  "--dependent-lib=" + TC.getCompilerRTBasename(Args, "builtins")));
+
   unsigned RTOptionID = options::OPT__SLASH_MT;
   if (auto *rtl = Args.getLastArg(options::OPT_fms_runtime_lib_EQ)) {
 RTOptionID = llvm::StringSwitch(rtl->getVal

[llvm-branch-commits] [flang] [Flang] Introduce FortranSupport (PR #122069)

2025-01-24 Thread Michael Kruse via llvm-branch-commits

Meinersbur wrote:

Moving this PR out of the chain causes merge conflicts further down, making it 
even more difficult to keep the series consistent. At least the change from 
`std::optional` to `optional.h` is needed, or I cannot compile it with nvcc (I 
am not sure how you do). I understand that by submitting patches I accept 
some responsibility for maintaining the code. I don't want to do so if it is not 
in a maintainable state.

I will put this PR as first into the chain, so at least it can be applied 
independently.

> breaking it up based on today's runtime usage of it is a bit artificial (some 
> features from it that are not used today in the runtime may very well be 
> already usable and may be used tomorrow).

The current FortranCommon is already artificial, presumably because it has grown 
organically: it holds everything that does not belong to some other library. 
Some of it (e.g. `genEntryBlock` from `OpenMP-utils.h`, 
`getFlangRepositoryString()` from `Version.h`) should conceptually never be used 
by the runtime. Splitting it up by usage actually makes it less artificial. 

Any code that is currently not used in the runtime should be assumed not to 
work with the runtime, e.g. because it causes an additional link dependency on 
`FortranCommon.a/so` or libc++, cannot be compiled with nvcc, or requires 
annotation for offloading. Similarly, code that is not being tested can be 
assumed not to work, either now or after future changes. That is, additional 
work to make it usable within the runtime is required anyway. 

Any other code, including code in other libraries, might also eventually 
be useful for the runtime (e.g. if we are to include a JIT). That is not a 
reason to preemptively make all of it a dependency of the runtime.

Partial use of libraries is called "erroneous configuration" in LLVM, see 
https://github.com/llvm/llvm-project/commit/ebc3302725350c44aaf5f97ce7ba484e30b3efa8
 

Sorry, I cannot follow the argument.

https://github.com/llvm/llvm-project/pull/122069


[llvm-branch-commits] [flang] [mlir] [mlir][OpenMP][flang] make private variable allocation implicit in omp.private (PR #124019)

2025-01-24 Thread Kareem Ergawy via llvm-branch-commits

https://github.com/ergawy edited 
https://github.com/llvm/llvm-project/pull/124019


[llvm-branch-commits] [flang] [mlir] [mlir][OpenMP][flang] make private variable allocation implicit in omp.private (PR #124019)

2025-01-24 Thread Kareem Ergawy via llvm-branch-commits


@@ -34,52 +34,48 @@ def PrivateClauseOp : OpenMP_Op<"private", 
[IsolatedFromAbove, RecipeInterface]>
   let description = [{
 This operation provides a declaration of how to implement the
 [first]privatization of a variable. The dialect users should provide
-information about how to create an instance of the type in the alloc 
region,
-how to initialize the copy from the original item in the copy region, and 
if
-needed, how to deallocate allocated memory in the dealloc region.
+which type should be allocated for this variable. The allocated (usually by
+alloca) variable is passed to the initialization region which does 
everything
+else (e.g. initialization of Fortran runtime descriptors). Information 
about
+how to initialize the copy from the original item should be given in the
+copy region, and if needed, how to deallocate memory (allocated by the
+initialization region) in the dealloc region.

ergawy wrote:

Ah, thanks for the clarification. I see you expanded the docs below.

https://github.com/llvm/llvm-project/pull/124019


[llvm-branch-commits] [llvm] [Analysis] Add DebugInfoCache analysis (PR #118629)

2025-01-24 Thread Artem Pianykh via llvm-branch-commits

https://github.com/artempyanykh updated 
https://github.com/llvm/llvm-project/pull/118629

>From 54bc13d26e0c0c3cd9b2205ca3453c58a815be4e Mon Sep 17 00:00:00 2001
From: Artem Pianykh 
Date: Sun, 15 Sep 2024 10:51:38 -0700
Subject: [PATCH] [Analysis] Add DebugInfoCache analysis

Summary:
The analysis simply primes and caches DebugInfoFinders for each DICompileUnit 
in a module. This
allows (future) callers like CoroSplitPass to compute global debug info 
metadata (required for
coroutine function cloning) much faster. Specifically, pay the price of 
DICompileUnit processing
only once per compile unit, rather than once per coroutine.
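
The caching idea described in the summary can be sketched in isolation. The
following is a minimal standalone model, not the code from this patch:
`CompileUnit`, `Finder`, `FinderCache`, `primeFinder`, `cloneCoroutines` and
`TraversalCount` are hypothetical stand-ins invented for illustration, and the
"expensive traversal" is reduced to copying a list. It only shows that the
traversal runs once per unit, however many coroutines are later cloned from it.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a compile unit; NOT llvm::DICompileUnit.
struct CompileUnit {
  std::string Name;
  std::vector<std::string> Types; // debug-info entries reachable from the unit
};

// Hypothetical stand-in for a primed finder; NOT llvm::DebugInfoFinder.
struct Finder {
  std::vector<std::string> Found;
};

// Counts how often the (modelled) expensive traversal runs.
static int TraversalCount = 0;

// Models the expensive per-unit traversal.
Finder primeFinder(const CompileUnit &CU) {
  ++TraversalCount;
  Finder F;
  F.Found = CU.Types;
  return F;
}

// Primes one finder per unit up front and hands out read-only results.
class FinderCache {
public:
  explicit FinderCache(const std::vector<CompileUnit> &Units) {
    for (const CompileUnit &CU : Units)
      Cache.emplace(&CU, primeFinder(CU));
  }

  // Returns the primed finder, or nullptr if the unit was never seen.
  const Finder *lookup(const CompileUnit *CU) const {
    auto It = Cache.find(CU);
    return It == Cache.end() ? nullptr : &It->second;
  }

private:
  std::map<const CompileUnit *, Finder> Cache;
};

// Models cloning N coroutines that all live in the same unit: each clone
// consults the cache instead of re-traversing the unit.
std::size_t cloneCoroutines(const FinderCache &C, const CompileUnit &CU,
                            int N) {
  std::size_t Seen = 0;
  for (int I = 0; I < N; ++I) {
    const Finder *F = C.lookup(&CU);
    assert(F && "unit was not primed");
    Seen += F->Found.size();
  }
  return Seen;
}
```

In this model the cache is keyed by the address of each unit, so the vector of
units must outlive the cache and must not be resized while the cache is in use.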

Test Plan:
Added a smoke test for the new analysis
ninja check-llvm-unit check-llvm

stack-info: PR: https://github.com/llvm/llvm-project/pull/118629, branch: 
users/artempyanykh/fast-coro-upstream/10
---
 llvm/include/llvm/Analysis/DebugInfoCache.h   |  50 +
 llvm/include/llvm/IR/DebugInfo.h  |   4 +-
 llvm/lib/Analysis/CMakeLists.txt  |   1 +
 llvm/lib/Analysis/DebugInfoCache.cpp  |  47 
 llvm/lib/Passes/PassBuilder.cpp   |   1 +
 llvm/lib/Passes/PassRegistry.def  |   1 +
 llvm/unittests/Analysis/CMakeLists.txt|   1 +
 .../unittests/Analysis/DebugInfoCacheTest.cpp | 211 ++
 8 files changed, 315 insertions(+), 1 deletion(-)
 create mode 100644 llvm/include/llvm/Analysis/DebugInfoCache.h
 create mode 100644 llvm/lib/Analysis/DebugInfoCache.cpp
 create mode 100644 llvm/unittests/Analysis/DebugInfoCacheTest.cpp

diff --git a/llvm/include/llvm/Analysis/DebugInfoCache.h 
b/llvm/include/llvm/Analysis/DebugInfoCache.h
new file mode 100644
index 00..dbd6802c99ea01
--- /dev/null
+++ b/llvm/include/llvm/Analysis/DebugInfoCache.h
@@ -0,0 +1,50 @@
+//===- llvm/Analysis/DebugInfoCache.h - debug info cache *- C++ 
-*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM 
Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+//
+// This file contains an analysis that builds a cache of debug info for each
+// DICompileUnit in a module.
+//
+//===--===//
+
+#ifndef LLVM_ANALYSIS_DEBUGINFOCACHE_H
+#define LLVM_ANALYSIS_DEBUGINFOCACHE_H
+
+#include "llvm/IR/DebugInfo.h"
+#include "llvm/IR/PassManager.h"
+
+namespace llvm {
+
+/// Processes and caches debug info for each DICompileUnit in a module.
+///
+/// The result of the analysis is a set of DebugInfoFinders primed on their
+/// respective DICompileUnit. Such DebugInfoFinders can be used to speed up
+/// function cloning which otherwise requires an expensive traversal of
+/// DICompileUnit-level debug info. See an example usage in CoroSplit.
+class DebugInfoCache {
+public:
+  using DIFinderCache = SmallDenseMap;
+  DIFinderCache Result;
+
+  DebugInfoCache(const Module &M);
+
+  bool invalidate(Module &, const PreservedAnalyses &,
+  ModuleAnalysisManager::Invalidator &);
+};
+
+class DebugInfoCacheAnalysis
+: public AnalysisInfoMixin {
+  friend AnalysisInfoMixin;
+  static AnalysisKey Key;
+
+public:
+  using Result = DebugInfoCache;
+  Result run(Module &M, ModuleAnalysisManager &);
+};
+} // namespace llvm
+
+#endif
diff --git a/llvm/include/llvm/IR/DebugInfo.h b/llvm/include/llvm/IR/DebugInfo.h
index 73f45c3769be44..11907fbb7f20b3 100644
--- a/llvm/include/llvm/IR/DebugInfo.h
+++ b/llvm/include/llvm/IR/DebugInfo.h
@@ -120,11 +120,13 @@ class DebugInfoFinder {
   /// Process subprogram.
   void processSubprogram(DISubprogram *SP);
 
+  /// Process a compile unit.
+  void processCompileUnit(DICompileUnit *CU);
+
   /// Clear all lists.
   void reset();
 
 private:
-  void processCompileUnit(DICompileUnit *CU);
   void processScope(DIScope *Scope);
   void processType(DIType *DT);
   bool addCompileUnit(DICompileUnit *CU);
diff --git a/llvm/lib/Analysis/CMakeLists.txt b/llvm/lib/Analysis/CMakeLists.txt
index 0db5b80f336cb5..db9a569e301563 100644
--- a/llvm/lib/Analysis/CMakeLists.txt
+++ b/llvm/lib/Analysis/CMakeLists.txt
@@ -52,6 +52,7 @@ add_llvm_component_library(LLVMAnalysis
   DDGPrinter.cpp
   ConstraintSystem.cpp
   Delinearization.cpp
+  DebugInfoCache.cpp
   DemandedBits.cpp
   DependenceAnalysis.cpp
   DependenceGraphBuilder.cpp
diff --git a/llvm/lib/Analysis/DebugInfoCache.cpp 
b/llvm/lib/Analysis/DebugInfoCache.cpp
new file mode 100644
index 00..c1a3e89f0a6ccf
--- /dev/null
+++ b/llvm/lib/Analysis/DebugInfoCache.cpp
@@ -0,0 +1,47 @@
+//===- llvm/Analysis/DebugInfoCache.cpp - debug info cache 
===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM 
Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===-

[llvm-branch-commits] [llvm] [Coro] Use DebugInfoCache to speed up cloning in CoroSplitPass (PR #118630)

2025-01-24 Thread Artem Pianykh via llvm-branch-commits

https://github.com/artempyanykh updated 
https://github.com/llvm/llvm-project/pull/118630

>From fc245ef152cfe134e8f9d6a39a7a38043163b7ce Mon Sep 17 00:00:00 2001
From: Artem Pianykh 
Date: Sun, 15 Sep 2024 11:00:00 -0700
Subject: [PATCH] [Coro] Use DebugInfoCache to speed up cloning in
 CoroSplitPass

Summary:
We can use a DebugInfoFinder from DebugInfoCache which is already primed on a 
compile unit to speed
up collection of module-level debug info.

The pass could likely be another 2x+ faster if we avoid rebuilding the set of 
global debug
info. This needs further massaging of CloneFunction and ValueMapper, though, 
and can be done
incrementally on top of this.

Comparing performance of CoroSplitPass at various points in this stack, this is 
anecdata from a sample
cpp file compiled with full debug info:
|                 | Baseline | IdentityMD set | Prebuilt CommonDI | Cached CU DIFinder (cur.) |
|-----------------|----------|----------------|-------------------|---------------------------|
| CoroSplitPass   | 306ms    | 221ms          | 68ms              | 17ms                      |
| CoroCloner      | 101ms    | 72ms           | 0.5ms             | 0.5ms                     |
| CollectGlobalDI | -        | -              | 63ms              | 13ms                      |
| Speed up        | 1x       | 1.4x           | 4.5x              | 18x                       |

Test Plan:
ninja check-llvm-unit
ninja check-llvm

Compiled a sample cpp file with time trace to get the avg. duration of the pass 
and inner scopes.

stack-info: PR: https://github.com/llvm/llvm-project/pull/118630, branch: 
users/artempyanykh/fast-coro-upstream/11
---
 llvm/include/llvm/Transforms/Coroutines/ABI.h | 13 +++--
 llvm/lib/Analysis/CGSCCPassManager.cpp|  7 +++
 llvm/lib/Transforms/Coroutines/CoroSplit.cpp  | 55 +++
 llvm/test/Other/new-pass-manager.ll   |  1 +
 llvm/test/Other/new-pm-defaults.ll|  1 +
 llvm/test/Other/new-pm-lto-defaults.ll|  1 +
 llvm/test/Other/new-pm-pgo-preinline.ll   |  1 +
 .../Other/new-pm-thinlto-postlink-defaults.ll |  1 +
 .../new-pm-thinlto-postlink-pgo-defaults.ll   |  1 +
 ...-pm-thinlto-postlink-samplepgo-defaults.ll |  1 +
 .../Other/new-pm-thinlto-prelink-defaults.ll  |  1 +
 .../new-pm-thinlto-prelink-pgo-defaults.ll|  1 +
 ...w-pm-thinlto-prelink-samplepgo-defaults.ll |  1 +
 .../Analysis/CGSCCPassManagerTest.cpp |  4 +-
 14 files changed, 72 insertions(+), 17 deletions(-)

diff --git a/llvm/include/llvm/Transforms/Coroutines/ABI.h 
b/llvm/include/llvm/Transforms/Coroutines/ABI.h
index 0b2d405f3caec4..2cf614b6bb1e2a 100644
--- a/llvm/include/llvm/Transforms/Coroutines/ABI.h
+++ b/llvm/include/llvm/Transforms/Coroutines/ABI.h
@@ -15,6 +15,7 @@
 #ifndef LLVM_TRANSFORMS_COROUTINES_ABI_H
 #define LLVM_TRANSFORMS_COROUTINES_ABI_H
 
+#include "llvm/Analysis/DebugInfoCache.h"
 #include "llvm/Analysis/TargetTransformInfo.h"
 #include "llvm/Transforms/Coroutines/CoroShape.h"
 #include "llvm/Transforms/Coroutines/MaterializationUtils.h"
@@ -53,7 +54,8 @@ class BaseABI {
   // Perform the function splitting according to the ABI.
   virtual void splitCoroutine(Function &F, coro::Shape &Shape,
   SmallVectorImpl &Clones,
-  TargetTransformInfo &TTI) = 0;
+  TargetTransformInfo &TTI,
+  const DebugInfoCache *DICache) = 0;
 
   Function &F;
   coro::Shape &Shape;
@@ -73,7 +75,8 @@ class SwitchABI : public BaseABI {
 
   void splitCoroutine(Function &F, coro::Shape &Shape,
   SmallVectorImpl &Clones,
-  TargetTransformInfo &TTI) override;
+  TargetTransformInfo &TTI,
+  const DebugInfoCache *DICache) override;
 };
 
 class AsyncABI : public BaseABI {
@@ -86,7 +89,8 @@ class AsyncABI : public BaseABI {
 
   void splitCoroutine(Function &F, coro::Shape &Shape,
   SmallVectorImpl &Clones,
-  TargetTransformInfo &TTI) override;
+  TargetTransformInfo &TTI,
+  const DebugInfoCache *DICache) override;
 };
 
 class AnyRetconABI : public BaseABI {
@@ -99,7 +103,8 @@ class AnyRetconABI : public BaseABI {
 
   void splitCoroutine(Function &F, coro::Shape &Shape,
   SmallVectorImpl &Clones,
-  TargetTransformInfo &TTI) override;
+  TargetTransformInfo &TTI,
+  const DebugInfoCache *DICache) override;
 };
 
 } // end namespace coro
diff --git a/llvm/lib/Analysis/CGSCCPassManager.cpp 
b/llvm/lib/Analysis/CGSCCPassManager.cpp
index 948bc2435ab275..3ba085cdb0be8b 100644
--- a/llvm/lib/Analysis/CGSCCPassManager.cpp
+++ b/llvm/lib/Analysis/CGSCCPassManager.cpp
@@ -14,6 +14,7 @@
 #include "llvm/ADT/SmallPtrSet.h"
 #include "llvm/ADT/SmallVector.h"
 #include "llvm/ADT/iterator_range.h"
+#include "llvm/Analy

[llvm-branch-commits] [flang] [mlir] [mlir][OpenMP][flang] make private variable allocation implicit in omp.private (PR #124019)

2025-01-24 Thread Kareem Ergawy via llvm-branch-commits


@@ -55,15 +55,19 @@ class MapsForPrivatizedSymbolsPass
 std::underlying_type_t>(
 llvm::omp::OpenMPOffloadMappingFlags::OMP_MAP_TO);
 Operation *definingOp = var.getDefiningOp();
-auto declOp = llvm::dyn_cast_or_null(definingOp);
-assert(declOp &&
-   "Expected defining Op of privatized var to be hlfir.declare");
+assert(definingOp &&
+   "Privatizing a block argument without any hlfir.declare");

ergawy wrote:

> MLIR values can come from two places:

That's exactly my point. What prevents us from working with block args here? 
Why do we need to assume it is defined by an op?

I am not against that, we can keep the assertion. But beyond the fact that we 
need to call `getBase` below, we only care about `var` as a `Value` and not 
about its defining op.
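
To make the distinction concrete, here is a small standalone model of the two
origins of a value. It is only a sketch: `Operation`, `Value`, `describeType`
and `definingOpName` are hypothetical stand-ins written for this note, not the
MLIR classes. The one MLIR fact it relies on is that `getDefiningOp()` yields
null for a block argument.

```cpp
#include <string>

// Hypothetical stand-in for an operation; NOT mlir::Operation.
struct Operation {
  std::string Name;
};

// Hypothetical stand-in for an SSA value; NOT mlir::Value.
// A null DefiningOp models a block argument.
struct Value {
  Operation *DefiningOp = nullptr;
  std::string Type;

  Operation *getDefiningOp() const { return DefiningOp; }
  bool isBlockArgument() const { return DefiningOp == nullptr; }
};

// Uses only properties of the value itself, so it works for op results and
// block arguments alike.
std::string describeType(const Value &V) { return V.Type; }

// Needs the defining op, so the block-argument case must be handled
// explicitly (here with a placeholder string instead of an assertion).
std::string definingOpName(const Value &V) {
  if (Operation *Op = V.getDefiningOp())
    return Op->Name;
  return "<block argument>";
}
```

Code in the style of `describeType` needs no assumption about where the value
comes from; only code in the style of `definingOpName` has to decide what to
do when there is no defining op.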

https://github.com/llvm/llvm-project/pull/124019


[llvm-branch-commits] [flang] [mlir] [mlir][OpenMP][flang] make private variable allocation implicit in omp.private (PR #124019)

2025-01-24 Thread Kareem Ergawy via llvm-branch-commits

https://github.com/ergawy approved this pull request.

LGTM! Thanks Tom! However, I have to admit, the `fir` dialect type system is 
still a "semi-"blackbox to me, so someone more familiar with it needs to 
carefully review changes in `PrivateReductionUtils.cpp`.

https://github.com/llvm/llvm-project/pull/124019


[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: Temporal divergence lowering (non i1) (PR #124298)

2025-01-24 Thread Petar Avramovic via llvm-branch-commits

https://github.com/petar-avramovic created 
https://github.com/llvm/llvm-project/pull/124298

Record all uses outside cycle with divergent exit during
propagateTemporalDivergence in Uniformity analysis.
With this list of candidates for temporal divergence lowering,
excluding known lane masks from control flow intrinsics,
find sources from inside the cycle that are not i1 and uniform.
Temporal divergence lowering (non i1):
create copy(v_mov) to vgpr, with implicit exec (to stop other
passes from moving this copy outside of the cycle) and use this
vgpr outside of the cycle instead of original uniform source.
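
Why a value that is uniform inside the cycle needs a per-lane copy can be seen
in a small scalar simulation. This is a sketch under simplifying assumptions,
not the lowering from this patch: `NumLanes`, `LoopResult` and `runLoop` are
names made up here, the loop counter stands in for the uniform source, and
every lane is assumed to run at least one iteration.

```cpp
#include <array>

constexpr int NumLanes = 4;

// Result of the simulated loop.
struct LoopResult {
  int Scalar;                        // the single shared (uniform) register
  std::array<int, NumLanes> PerLane; // the per-lane copy made inside the loop
};

// Simulates a loop in which lane L exits after TripCount[L] iterations.
// Counter is uniform inside the loop: all lanes that are still active see the
// same value. The copy into PerLane runs only for active lanes, which models
// a copy that depends on the current execution mask.
LoopResult runLoop(const std::array<int, NumLanes> &TripCount) {
  std::array<bool, NumLanes> Active;
  Active.fill(true);

  LoopResult R{0, {}};
  bool AnyActive = true;
  while (AnyActive) {
    ++R.Scalar;
    AnyActive = false;
    for (int L = 0; L < NumLanes; ++L) {
      if (!Active[L])
        continue;
      R.PerLane[L] = R.Scalar; // per-lane copy, made while the lane is active
      if (R.Scalar >= TripCount[L])
        Active[L] = false;     // this lane leaves the loop
      else
        AnyActive = true;
    }
  }
  return R;
}
```

With trip counts `{1, 3, 2, 4}` the shared register ends at 4, the value of
the last iteration, while the per-lane copy holds `{1, 3, 2, 4}`, the value
each lane saw when it left. A use outside the loop that reads the shared
register would get the wrong value for three of the four lanes.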

>From 3e04401258c91639105b1f2f17a84badbdf928ae Mon Sep 17 00:00:00 2001
From: Petar Avramovic 
Date: Fri, 24 Jan 2025 16:56:30 +0100
Subject: [PATCH] AMDGPU/GlobalISel: Temporal divergence lowering (non i1)

Record all uses outside cycle with divergent exit during
propagateTemporalDivergence in Uniformity analysis.
With this list of candidates for temporal divergence lowering,
excluding known lane masks from control flow intrinsics,
find sources from inside the cycle that are not i1 and uniform.
Temporal divergence lowering (non i1):
create copy(v_mov) to vgpr, with implicit exec (to stop other
passes from moving this copy outside of the cycle) and use this
vgpr outside of the cycle instead of original uniform source.
---
 llvm/include/llvm/ADT/GenericUniformityImpl.h | 37 +++
 llvm/include/llvm/ADT/GenericUniformityInfo.h |  6 +++
 llvm/lib/Analysis/UniformityAnalysis.cpp  |  3 +-
 .../lib/CodeGen/MachineUniformityAnalysis.cpp |  8 ++--
 .../AMDGPUGlobalISelDivergenceLowering.cpp| 47 ++-
 .../lib/Target/AMDGPU/AMDGPURegBankSelect.cpp | 24 --
 llvm/lib/Target/AMDGPU/SILowerI1Copies.h  |  6 +++
 ...divergent-i1-phis-no-lane-mask-merging.mir |  7 +--
 ...ergence-divergent-i1-used-outside-loop.mir | 19 
 .../divergence-temporal-divergent-reg.ll  | 18 +++
 .../divergence-temporal-divergent-reg.mir |  3 +-
 .../AMDGPU/GlobalISel/regbankselect-mui.ll| 17 +++
 12 files changed, 153 insertions(+), 42 deletions(-)

diff --git a/llvm/include/llvm/ADT/GenericUniformityImpl.h 
b/llvm/include/llvm/ADT/GenericUniformityImpl.h
index bd09f4fe43e087..91ee0e41332199 100644
--- a/llvm/include/llvm/ADT/GenericUniformityImpl.h
+++ b/llvm/include/llvm/ADT/GenericUniformityImpl.h
@@ -342,6 +342,10 @@ template  class 
GenericUniformityAnalysisImpl {
   typename SyncDependenceAnalysisT::DivergenceDescriptor;
   using BlockLabelMapT = typename SyncDependenceAnalysisT::BlockLabelMap;
 
+  // Use outside cycle with divergent exit
+  using UOCWDE =
+  std::tuple;
+
   GenericUniformityAnalysisImpl(const DominatorTreeT &DT, const CycleInfoT &CI,
 const TargetTransformInfo *TTI)
   : Context(CI.getSSAContext()), F(*Context.getFunction()), CI(CI),
@@ -395,6 +399,14 @@ template  class 
GenericUniformityAnalysisImpl {
   }
 
   void print(raw_ostream &out) const;
+  SmallVector UsesOutsideCycleWithDivergentExit;
+  void recordUseOutsideCycleWithDivergentExit(const InstructionT *,
+  const InstructionT *,
+  const CycleT *);
+  inline iterator_range getUsesOutsideCycleWithDivergentExit() const 
{
+return make_range(UsesOutsideCycleWithDivergentExit.begin(),
+  UsesOutsideCycleWithDivergentExit.end());
+  }
 
 protected:
   /// \brief Value/block pair representing a single phi input.
@@ -1129,6 +1141,14 @@ void GenericUniformityAnalysisImpl::compute() {
   }
 }
 
+template 
+void GenericUniformityAnalysisImpl<
+ContextT>::recordUseOutsideCycleWithDivergentExit(const InstructionT *Inst,
+  const InstructionT *User,
+  const CycleT *Cycle) {
+  UsesOutsideCycleWithDivergentExit.emplace_back(Inst, User, Cycle);
+}
+
 template 
 bool GenericUniformityAnalysisImpl::isAlwaysUniform(
 const InstructionT &Instr) const {
@@ -1180,6 +1200,16 @@ void 
GenericUniformityAnalysisImpl::print(raw_ostream &OS) const {
 }
   }
 
+  if (!UsesOutsideCycleWithDivergentExit.empty()) {
+OS << "\nUSES OUTSIDE CYCLES WITH DIVERGENT EXIT:\n";
+
+for (auto [Inst, UseInst, Cycle] : UsesOutsideCycleWithDivergentExit) {
+  OS << "Inst:" << Context.print(Inst)
+ << "Used by :" << Context.print(UseInst)
+ << "Outside cycle :" << Cycle->print(Context) << "\n\n";
+}
+  }
+
   for (auto &block : F) {
 OS << "\nBLOCK " << Context.print(&block) << '\n';
 
@@ -1210,6 +1240,13 @@ void 
GenericUniformityAnalysisImpl::print(raw_ostream &OS) const {
   }
 }
 
+template 
+iterator_range::UOCWDE *>
+GenericUniformityInfo::getUsesOutsideCycleWithDivergentExit() const {
+  return make_range(DA->UsesOutsideCycleWithDivergentExit.begin(),
+DA->UsesOutsideCycleWithDivergentExit.end());

[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: Temporal divergence lowering i1 (PR #124299)

2025-01-24 Thread Petar Avramovic via llvm-branch-commits

petar-avramovic wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is 
> open. Once all requirements are satisfied, merge this PR as a stack on Graphite 
> (https://app.graphite.dev/github/pr/llvm/llvm-project/124299?utm_source=stack-comment-downstack-mergeability-warning).
> Learn more: https://graphite.dev/docs/merge-pull-requests

* **#124299** 👈 (View in Graphite: https://app.graphite.dev/github/pr/llvm/llvm-project/124299?utm_source=stack-comment-view-in-graphite)
* **#124298**
* **#124297**
* `main`

This stack of pull requests is managed by Graphite 
(https://graphite.dev?utm-source=stack-comment). Learn more about stacking: 
https://stacking.dev/?utm_source=stack-comment


https://github.com/llvm/llvm-project/pull/124299


[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: Temporal divergence lowering (non i1) (PR #124298)

2025-01-24 Thread Petar Avramovic via llvm-branch-commits

petar-avramovic wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is 
> open. Once all requirements are satisfied, merge this PR as a stack on Graphite 
> (https://app.graphite.dev/github/pr/llvm/llvm-project/124298?utm_source=stack-comment-downstack-mergeability-warning).
> Learn more: https://graphite.dev/docs/merge-pull-requests

* **#124299**
* **#124298** 👈 (View in Graphite: https://app.graphite.dev/github/pr/llvm/llvm-project/124298?utm_source=stack-comment-view-in-graphite)
* **#124297**
* `main`

This stack of pull requests is managed by Graphite 
(https://graphite.dev?utm-source=stack-comment). Learn more about stacking: 
https://stacking.dev/?utm_source=stack-comment


https://github.com/llvm/llvm-project/pull/124298
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: Temporal divergence lowering (non i1) (PR #124298)

2025-01-24 Thread Petar Avramovic via llvm-branch-commits

https://github.com/petar-avramovic ready_for_review 
https://github.com/llvm/llvm-project/pull/124298
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: Temporal divergence lowering i1 (PR #124299)

2025-01-24 Thread via llvm-branch-commits

llvmbot wrote:



@llvm/pr-subscribers-llvm-globalisel

@llvm/pr-subscribers-backend-amdgpu

Author: Petar Avramovic (petar-avramovic)


Changes

Use of i1 outside of the cycle, both uniform and divergent,
is lane mask(in sgpr) that contains i1 at iteration that lane
exited the cycle.
Create phi that merges lane mask across all iterations.
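
The merge across iterations can be illustrated with plain bit masks. The
sketch below assumes the usual lane-mask merge, `(prev & ~exec) | (cur & exec)`,
and a 32-lane wave; `mergeLaneMask`, `Iteration` and `mergeAcrossIterations`
are hypothetical names for this note, not functions from the patch. The
initial mask is set to 0 only to keep the example deterministic, whereas the
patch uses an implicit def on the non-back-edge incoming paths.

```cpp
#include <cstdint>
#include <vector>

// Lanes that are active this iteration take the current bit; lanes that have
// already left the loop keep the bit they had when they exited.
uint32_t mergeLaneMask(uint32_t Prev, uint32_t Cur, uint32_t Exec) {
  return (Prev & ~Exec) | (Cur & Exec);
}

// One loop iteration: which lanes are active and what the i1 is per lane.
struct Iteration {
  uint32_t Exec; // bit L set: lane L is active in this iteration
  uint32_t Cond; // bit L set: the i1 value is true for lane L
};

// Models the cross-iteration phi: the merged mask of the previous iteration
// feeds into the merge of the next one.
uint32_t mergeAcrossIterations(const std::vector<Iteration> &Iters) {
  uint32_t Merged = 0; // assumption: a defined start value for the sketch
  for (const Iteration &It : Iters)
    Merged = mergeLaneMask(Merged, It.Cond, It.Exec);
  return Merged;
}
```

For example, with four lanes, a first iteration `{Exec = 0xF, Cond = 0x5}` and
a second iteration `{Exec = 0x6, Cond = 0x2}` (lanes 0 and 3 have exited), the
merged mask is `0x3`: lanes 0 and 3 keep their bits from the first iteration,
and lanes 1 and 2 take their bits from the second.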

---

Patch is 124.89 KiB, truncated to 20.00 KiB below, full version: 
https://github.com/llvm/llvm-project/pull/124299.diff


9 Files Affected:

- (modified) llvm/lib/Target/AMDGPU/AMDGPUGlobalISelDivergenceLowering.cpp 
(+55) 
- (modified) 
llvm/test/CodeGen/AMDGPU/GlobalISel/divergence-divergent-i1-phis-no-lane-mask-merging.ll
 (+20-10) 
- (modified) 
llvm/test/CodeGen/AMDGPU/GlobalISel/divergence-divergent-i1-phis-no-lane-mask-merging.mir
 (+33-19) 
- (modified) 
llvm/test/CodeGen/AMDGPU/GlobalISel/divergence-divergent-i1-used-outside-loop.ll
 (+87-69) 
- (modified) 
llvm/test/CodeGen/AMDGPU/GlobalISel/divergence-divergent-i1-used-outside-loop.mir
 (+160-127) 
- (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/divergence-structurizer.ll 
(+64-59) 
- (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/divergence-structurizer.mir 
(+104-88) 
- (modified) 
llvm/test/CodeGen/AMDGPU/GlobalISel/divergence-temporal-divergent-i1.ll 
(+36-23) 
- (modified) 
llvm/test/CodeGen/AMDGPU/GlobalISel/divergence-temporal-divergent-i1.mir 
(+55-34) 


``diff
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUGlobalISelDivergenceLowering.cpp 
b/llvm/lib/Target/AMDGPU/AMDGPUGlobalISelDivergenceLowering.cpp
index d8cd1e7379c93f..7e8b9d5524be32 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUGlobalISelDivergenceLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUGlobalISelDivergenceLowering.cpp
@@ -80,6 +80,7 @@ class DivergenceLoweringHelper : public PhiLoweringHelper {
   void constrainAsLaneMask(Incoming &In) override;
 
   bool lowerTempDivergence();
+  bool lowerTempDivergenceI1();
 };
 
 DivergenceLoweringHelper::DivergenceLoweringHelper(
@@ -221,6 +222,54 @@ bool DivergenceLoweringHelper::lowerTempDivergence() {
   return false;
 }
 
+bool DivergenceLoweringHelper::lowerTempDivergenceI1() {
+  MachineRegisterInfo::VRegAttrs BoolS1 = {ST->getBoolRC(), LLT::scalar(1)};
+  initializeLaneMaskRegisterAttributes(BoolS1);
+
+  for (auto [Inst, UseInst, Cycle] : MUI->get_TDCs()) {
+Register Reg = Inst->getOperand(0).getReg();
+if (MRI->getType(Reg) != LLT::scalar(1))
+  continue;
+
+Register MergedMask = MRI->createVirtualRegister(BoolS1);
+Register PrevIterMask = MRI->createVirtualRegister(BoolS1);
+
+MachineBasicBlock *CycleHeaderMBB = Cycle->getHeader();
+SmallVector ExitingBlocks;
+Cycle->getExitingBlocks(ExitingBlocks);
+assert(ExitingBlocks.size() == 1);
+MachineBasicBlock *CycleExitingMBB = ExitingBlocks[0];
+
+B.setInsertPt(*CycleHeaderMBB, CycleHeaderMBB->begin());
+auto CrossIterPHI = B.buildInstr(AMDGPU::PHI).addDef(PrevIterMask);
+
+// We only care about cycle iterration path - merge Reg with previous
+// iteration. For other incomings use implicit def.
+// Predecessors should be CyclePredecessor and CycleExitingMBB.
+// In older versions of irreducible control flow lowering there could be
+// cases with more predecessors. To keep this lowering as generic as
+// possible also handle those cases.
+for (auto MBB : CycleHeaderMBB->predecessors()) {
+  if (MBB == CycleExitingMBB) {
+CrossIterPHI.addReg(MergedMask);
+  } else {
+B.setInsertPt(*MBB, MBB->getFirstTerminator());
+auto ImplDef = B.buildInstr(AMDGPU::IMPLICIT_DEF, {BoolS1}, {});
+CrossIterPHI.addReg(ImplDef.getReg(0));
+  }
+  CrossIterPHI.addMBB(MBB);
+}
+
+buildMergeLaneMasks(*CycleExitingMBB, CycleExitingMBB->getFirstTerminator(),
+{}, MergedMask, PrevIterMask, Reg);
+
+replaceUsesOfRegInInstWith(Reg, const_cast<MachineInstr *>(UseInst),
+   MergedMask);
+  }
+
+  return false;
+}
+
 } // End anonymous namespace.
 
 INITIALIZE_PASS_BEGIN(AMDGPUGlobalISelDivergenceLowering, DEBUG_TYPE,
@@ -260,6 +309,12 @@ bool AMDGPUGlobalISelDivergenceLowering::runOnMachineFunction(
 
   // Non-i1 temporal divergence lowering.
   Changed |= Helper.lowerTempDivergence();
+  // This covers both uniform and divergent i1s. Lane masks are in sgpr and need
+  // to be updated in each iteration.
+  Changed |= Helper.lowerTempDivergenceI1();
+  // Temporal divergence lowering of divergent i1 phi used outside of the cycle
+  // could also be handled by lowerPhis but we do it in lowerTempDivergenceI1
+  // since in some case lowerPhis does unnecessary lane mask merging.
   Changed |= Helper.lowerPhis();
   return Changed;
 }
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/divergence-divergent-i1-phis-no-lane-mask-merging.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/divergence-divergent-i1-phis-no-lane-mask-merging.ll
index 65c96a3db5bbfa..11acd451d98d7d 100644
--- 
a/llvm/test/CodeGen/AMD

[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: Temporal divergence lowering (non i1) (PR #124298)

2025-01-24 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-llvm-globalisel

Author: Petar Avramovic (petar-avramovic)


Changes

Record all uses outside a cycle with a divergent exit during
propagateTemporalDivergence in uniformity analysis.
From this list of candidates for temporal divergence lowering,
excluding known lane masks from control flow intrinsics,
find sources inside the cycle that are uniform and not i1.
Temporal divergence lowering (non-i1):
create a copy (v_mov) to a vgpr, with implicit exec (to stop other
passes from moving this copy outside of the cycle), and use this
vgpr outside of the cycle instead of the original uniform source.
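
To illustrate why the copy is needed, here is a minimal standalone sketch of temporal divergence. It is not code from the patch: the 4-lane wave, `runLoop`, `LoopResult`, and the field names are all hypothetical, and trip counts are assumed to be at least 1. A counter that is uniform inside the loop is modeled as one shared scalar, while the per-lane copy, written only by lanes that are still active, preserves the value each lane saw when it left the loop.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical 4-lane "wave"; real AMDGPU waves are wider.
constexpr std::size_t NumLanes = 4;
using LaneValues = std::array<int, NumLanes>;

struct LoopResult {
  LaneValues SharedScalar; // what each lane reads from the shared scalar
  LaneValues PerLaneCopy;  // what each lane reads from its own copy
};

// Lane L stays in the loop for TripCounts[L] iterations (assumed >= 1).
LoopResult runLoop(const LaneValues &TripCounts) {
  std::array<bool, NumLanes> Exec; // models the exec mask
  Exec.fill(true);
  LaneValues Copy{}; // models the per-lane (vgpr-like) copy
  int Counter = 0;   // models the uniform (sgpr-like) value

  bool AnyActive = true;
  while (AnyActive) {
    ++Counter; // uniform: every active lane sees the same value
    AnyActive = false;
    for (std::size_t Lane = 0; Lane < NumLanes; ++Lane) {
      if (!Exec[Lane])
        continue; // lanes that already left do not update their copy
      Copy[Lane] = Counter; // copy inside the loop, under the exec mask
      if (Counter >= TripCounts[Lane])
        Exec[Lane] = false; // divergent exit: this lane leaves now
      else
        AnyActive = true;
    }
  }

  LoopResult Result;
  Result.SharedScalar.fill(Counter); // keeps running for the slowest lane
  Result.PerLaneCopy = Copy;         // value at each lane's own exit
  return Result;
}
```

With trip counts `{1, 3, 2, 4}`, the shared scalar ends at 4 for every lane, while the per-lane copy holds `{1, 3, 2, 4}`, which is the value a use outside the loop should observe.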

---

Patch is 23.56 KiB, truncated to 20.00 KiB below, full version: 
https://github.com/llvm/llvm-project/pull/124298.diff


12 Files Affected:

- (modified) llvm/include/llvm/ADT/GenericUniformityImpl.h (+37)
- (modified) llvm/include/llvm/ADT/GenericUniformityInfo.h (+6)
- (modified) llvm/lib/Analysis/UniformityAnalysis.cpp (+1-2)
- (modified) llvm/lib/CodeGen/MachineUniformityAnalysis.cpp (+4-4)
- (modified) llvm/lib/Target/AMDGPU/AMDGPUGlobalISelDivergenceLowering.cpp (+45-2)
- (modified) llvm/lib/Target/AMDGPU/AMDGPURegBankSelect.cpp (+20-4)
- (modified) llvm/lib/Target/AMDGPU/SILowerI1Copies.h (+6)
- (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/divergence-divergent-i1-phis-no-lane-mask-merging.mir (+4-3)
- (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/divergence-divergent-i1-used-outside-loop.mir (+10-9)
- (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/divergence-temporal-divergent-reg.ll (+9-9)
- (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/divergence-temporal-divergent-reg.mir (+2-1)
- (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-mui.ll (+9-8)


``diff
diff --git a/llvm/include/llvm/ADT/GenericUniformityImpl.h b/llvm/include/llvm/ADT/GenericUniformityImpl.h
index bd09f4fe43e087..91ee0e41332199 100644
--- a/llvm/include/llvm/ADT/GenericUniformityImpl.h
+++ b/llvm/include/llvm/ADT/GenericUniformityImpl.h
@@ -342,6 +342,10 @@ template <typename ContextT> class GenericUniformityAnalysisImpl {
   typename SyncDependenceAnalysisT::DivergenceDescriptor;
   using BlockLabelMapT = typename SyncDependenceAnalysisT::BlockLabelMap;
 
+  // Use outside cycle with divergent exit
+  using UOCWDE =
+  std::tuple<const InstructionT *, const InstructionT *, const CycleT *>;
+
   GenericUniformityAnalysisImpl(const DominatorTreeT &DT, const CycleInfoT &CI,
 const TargetTransformInfo *TTI)
   : Context(CI.getSSAContext()), F(*Context.getFunction()), CI(CI),
@@ -395,6 +399,14 @@ template <typename ContextT> class GenericUniformityAnalysisImpl {
   }
 
   void print(raw_ostream &out) const;
+  SmallVector UsesOutsideCycleWithDivergentExit;
+  void recordUseOutsideCycleWithDivergentExit(const InstructionT *,
+  const InstructionT *,
+  const CycleT *);
+  inline iterator_range getUsesOutsideCycleWithDivergentExit() const {
+return make_range(UsesOutsideCycleWithDivergentExit.begin(),
+  UsesOutsideCycleWithDivergentExit.end());
+  }
 
 protected:
   /// \brief Value/block pair representing a single phi input.
@@ -1129,6 +1141,14 @@ void GenericUniformityAnalysisImpl<ContextT>::compute() {
   }
 }
 
+template <typename ContextT>
+void GenericUniformityAnalysisImpl<
+ContextT>::recordUseOutsideCycleWithDivergentExit(const InstructionT *Inst,
+  const InstructionT *User,
+  const CycleT *Cycle) {
+  UsesOutsideCycleWithDivergentExit.emplace_back(Inst, User, Cycle);
+}
+
 template <typename ContextT>
 bool GenericUniformityAnalysisImpl<ContextT>::isAlwaysUniform(
 const InstructionT &Instr) const {
@@ -1180,6 +1200,16 @@ void GenericUniformityAnalysisImpl<ContextT>::print(raw_ostream &OS) const {
 }
   }
 
+  if (!UsesOutsideCycleWithDivergentExit.empty()) {
+OS << "\nUSES OUTSIDE CYCLES WITH DIVERGENT EXIT:\n";
+
+for (auto [Inst, UseInst, Cycle] : UsesOutsideCycleWithDivergentExit) {
+  OS << "Inst:" << Context.print(Inst)
+ << "Used by :" << Context.print(UseInst)
+ << "Outside cycle :" << Cycle->print(Context) << "\n\n";
+}
+  }
+
   for (auto &block : F) {
 OS << "\nBLOCK " << Context.print(&block) << '\n';
 
@@ -1210,6 +1240,13 @@ void GenericUniformityAnalysisImpl<ContextT>::print(raw_ostream &OS) const {
   }
 }
 
+template 
+iterator_range::UOCWDE *>
+GenericUniformityInfo::getUsesOutsideCycleWithDivergentExit() const {
+  return make_range(DA->UsesOutsideCycleWithDivergentExit.begin(),
+DA->UsesOutsideCycleWithDivergentExit.end());
+}
+
 template <typename ContextT>
 bool GenericUniformityInfo<ContextT>::hasDivergence() const {
   return DA->hasDivergence();
diff --git a/llvm/include/llvm/ADT/GenericUniformityInfo.h b/llvm/include/llvm/ADT/GenericUniformityInfo.h
index e53afccc020b46..660fd6d46114d7 100644
--- a/llvm/include/llvm/ADT/GenericUniformityInfo.h
+++ b/llvm/include/llvm/ADT/GenericUniformityInfo.h
@@ -40

[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: Temporal divergence lowering i1 (PR #124299)

2025-01-24 Thread Petar Avramovic via llvm-branch-commits

https://github.com/petar-avramovic ready_for_review 
https://github.com/llvm/llvm-project/pull/124299


[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: Temporal divergence lowering (non i1) (PR #124298)

2025-01-24 Thread via llvm-branch-commits

github-actions[bot] wrote:




:warning: C/C++ code formatter, clang-format found issues in your code. :warning:



You can test this locally with the following command:


```bash
git-clang-format --diff 1728ab49b46a31b63d8ecdc81fe87851aa40a725 \
  3e04401258c91639105b1f2f17a84badbdf928ae --extensions cpp,h -- \
  llvm/include/llvm/ADT/GenericUniformityImpl.h \
  llvm/include/llvm/ADT/GenericUniformityInfo.h \
  llvm/lib/Analysis/UniformityAnalysis.cpp \
  llvm/lib/CodeGen/MachineUniformityAnalysis.cpp \
  llvm/lib/Target/AMDGPU/AMDGPUGlobalISelDivergenceLowering.cpp \
  llvm/lib/Target/AMDGPU/AMDGPURegBankSelect.cpp \
  llvm/lib/Target/AMDGPU/SILowerI1Copies.h
```





View the diff from clang-format here.


```diff
diff --git a/llvm/lib/Target/AMDGPU/AMDGPURegBankSelect.cpp b/llvm/lib/Target/AMDGPU/AMDGPURegBankSelect.cpp
index 452d754985..8a0c9faa34 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPURegBankSelect.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPURegBankSelect.cpp
@@ -225,7 +225,8 @@ bool AMDGPURegBankSelect::runOnMachineFunction(MachineFunction &MF) {
       getAnalysis<MachineUniformityAnalysisPass>().getUniformityInfo();
   MachineRegisterInfo &MRI = *B.getMRI();
   const GCNSubtarget &ST = MF.getSubtarget<GCNSubtarget>();
-  RegBankSelectHelper RBSHelper(B, ILMA, MUI, *ST.getRegisterInfo(), *ST.getRegBankInfo());
+  RegBankSelectHelper RBSHelper(B, ILMA, MUI, *ST.getRegisterInfo(),
+                                *ST.getRegBankInfo());
   // Virtual registers at this point don't have register banks.
   // Virtual registers in def and use operands of already inst-selected
   // instruction have register class.
```




https://github.com/llvm/llvm-project/pull/124298


[llvm-branch-commits] [llvm] [PassBuilder][CodeGen] Add callback style pass buider (PR #116913)

2025-01-24 Thread Akshat Oke via llvm-branch-commits

optimisan wrote:

> I created https://github.com/llvm/llvm-project/pull/76714, but disabling 
> arbitrary passes is not what we expect. Maybe we could add an allowlist as a 
> compromise...

Okay, I see. Will look if other solutions are possible as well.

https://github.com/llvm/llvm-project/pull/116913


[llvm-branch-commits] [flang] [llvm] [mlir] [MLIR][OpenMP] Introduce overlapped record type map support (PR #119588)

2025-01-24 Thread Sergio Afonso via llvm-branch-commits

https://github.com/skatrak commented:

Thank you Andrew, I have some minor comments but this generally looks fine to 
me. I'm not that familiar with mapping, so it's likely I would miss nontrivial 
problems if there were any, though 😅.

https://github.com/llvm/llvm-project/pull/119588


[llvm-branch-commits] [flang] [llvm] [mlir] [MLIR][OpenMP] Introduce overlapped record type map support (PR #119588)

2025-01-24 Thread Sergio Afonso via llvm-branch-commits


@@ -3197,37 +3303,77 @@ static llvm::omp::OpenMPOffloadMappingFlags mapParentWithMembers(
 // what we support as expected.
 llvm::omp::OpenMPOffloadMappingFlags mapFlag = mapData.Types[mapDataIndex];
 ompBuilder.setCorrectMemberOfFlag(mapFlag, memberOfFlag);
-combinedInfo.Types.emplace_back(mapFlag);
-combinedInfo.DevicePointers.emplace_back(
-llvm::OpenMPIRBuilder::DeviceInfoTy::None);
-combinedInfo.Names.emplace_back(LLVM::createMappingInformation(
-mapData.MapClause[mapDataIndex]->getLoc(), ompBuilder));
-combinedInfo.BasePointers.emplace_back(mapData.BasePointers[mapDataIndex]);
-combinedInfo.Pointers.emplace_back(mapData.Pointers[mapDataIndex]);
-combinedInfo.Sizes.emplace_back(mapData.Sizes[mapDataIndex]);
-  }
-  return memberOfFlag;
-}
-
-// The intent is to verify if the mapped data being passed is a
-// pointer -> pointee that requires special handling in certain cases,
-// e.g. applying the OMP_MAP_PTR_AND_OBJ map type.
-//
-// There may be a better way to verify this, but unfortunately with
-// opaque pointers we lose the ability to easily check if something is
-// a pointer whilst maintaining access to the underlying type.
-static bool checkIfPointerMap(omp::MapInfoOp mapOp) {
-  // If we have a varPtrPtr field assigned then the underlying type is a pointer
-  if (mapOp.getVarPtrPtr())
-return true;
 
-  // If the map data is declare target with a link clause, then it's represented
-  // as a pointer when we lower it to LLVM-IR even if at the MLIR level it has
-  // no relation to pointers.
-  if (isDeclareTargetLink(mapOp.getVarPtr()))
-return true;
+if (targetDirective == TargetDirective::TargetUpdate) {
+  combinedInfo.Types.emplace_back(mapFlag);
+  combinedInfo.DevicePointers.emplace_back(
+  mapData.DevicePointers[mapDataIndex]);
+  combinedInfo.Names.emplace_back(LLVM::createMappingInformation(
+  mapData.MapClause[mapDataIndex]->getLoc(), ompBuilder));
+  combinedInfo.BasePointers.emplace_back(
+  mapData.BasePointers[mapDataIndex]);
+  combinedInfo.Pointers.emplace_back(mapData.Pointers[mapDataIndex]);
+  combinedInfo.Sizes.emplace_back(mapData.Sizes[mapDataIndex]);
+} else {
+  llvm::SmallVector overlapIdxs;
+  // Find all of the members that "overlap", i.e. occlude other members that
+  // were mapped alongside the parent, e.g. member [0], occludes
+  getOverlappedMembers(overlapIdxs, mapData, parentClause);
+  // We need to make sure the overlapped members are sorted in order of
+  // lowest address to highest address

skatrak wrote:

```suggestion
  // lowest address to highest address.
```

https://github.com/llvm/llvm-project/pull/119588


[llvm-branch-commits] [flang] [llvm] [mlir] [MLIR][OpenMP] Introduce overlapped record type map support (PR #119588)

2025-01-24 Thread Sergio Afonso via llvm-branch-commits


@@ -3110,6 +3132,91 @@ calculateBoundsOffset(LLVM::ModuleTranslation &moduleTranslation,
   return idx;
 }
 
+// Gathers members that are overlapping in the parent, excluding members that
+// themselves overlap, keeping the top-most (closest to parents level) map.
+static void getOverlappedMembers(llvm::SmallVector &overlapMapDataIdxs,
+ MapInfoData &mapData,
+ omp::MapInfoOp parentOp) {
+  // No members mapped, no overlaps.
+  if (parentOp.getMembers().empty())
+return;
+
+  // Single member, we can insert and return early.
+  if (parentOp.getMembers().size() == 1) {
+overlapMapDataIdxs.push_back(0);
+return;
+  }
+
+  // 1) collect list of top-level overlapping members from MemberOp
+  llvm::SmallVector> memberByIndex;
+  mlir::ArrayAttr indexAttr = parentOp.getMembersIndexAttr();
+  for (auto [memIndex, indicesAttr] : llvm::enumerate(indexAttr))
+memberByIndex.push_back(
+std::make_pair(memIndex, mlir::cast(indicesAttr)));
+
+  // Sort the smallest first (higher up the parent -> member chain), so that
+  // when we remove members, we remove as much as we can in the initial
+  // iterations, shortening the number of passes required.
+  llvm::sort(memberByIndex.begin(), memberByIndex.end(),
+ [&](auto a, auto b) { return a.second.size() < b.second.size(); });
+
+  auto getAsIntegers = [](mlir::ArrayAttr values) {
+llvm::SmallVector ints;
+ints.reserve(values.size());
+llvm::transform(values, std::back_inserter(ints),
+[](mlir::Attribute value) {
+  return mlir::cast(value).getInt();
+});
+return ints;
+  };
+
+  // Remove elements from the vector if there is a parent element that
+  // supersedes it. i.e. if member [0] is mapped, we can remove members [0,1],
+  // [0,2].. etc.
+  for (auto v : make_early_inc_range(memberByIndex)) {
+auto vArr = getAsIntegers(v.second);
+memberByIndex.erase(

skatrak wrote:

Do we know for sure this always works? Reading the documentation for 
`make_early_inc_range`, my understanding is that we're allowed to mutate the 
underlying range as long as we don't invalidate the next iterator. But, if we 
try to delete elements which could be anywhere in the range, it seems possible 
that we would end up doing just that.

Maybe it would be safer to just create an integer set of to-be-skipped elements 
and only add to `overlapMapDataIdxs` elements in `memberByIndex` which are not 
part of that set.
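
A minimal sketch of that skip-set idea, written against plain standard-library containers rather than the MLIR attribute types used in the patch. `IndexPath`, `isStrictPrefix`, and `topLevelMembers` are hypothetical names chosen for illustration, and treating "supersedes" as "is a strict prefix of" is an assumption based on the `[0]` versus `[0,1]` example in the quoted comment.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical stand-in for one member's index path, e.g. [0, 1].
using IndexPath = std::vector<long>;

// True if Prefix is a strict prefix of Path, e.g. [0] versus [0, 1].
bool isStrictPrefix(const IndexPath &Prefix, const IndexPath &Path) {
  if (Prefix.size() >= Path.size())
    return false;
  for (std::size_t I = 0; I < Prefix.size(); ++I)
    if (Prefix[I] != Path[I])
      return false;
  return true;
}

// Returns the positions of members that no other member supersedes.
// Nothing is erased while iterating, so no iterator can be invalidated.
std::vector<std::size_t>
topLevelMembers(const std::vector<IndexPath> &Members) {
  std::set<std::size_t> Skipped;
  for (std::size_t I = 0; I < Members.size(); ++I)
    for (std::size_t J = 0; J < Members.size(); ++J)
      if (I != J && isStrictPrefix(Members[I], Members[J]))
        Skipped.insert(J); // Members[J] is covered by Members[I]

  std::vector<std::size_t> Kept;
  for (std::size_t I = 0; I < Members.size(); ++I)
    if (Skipped.count(I) == 0)
      Kept.push_back(I);
  return Kept;
}
```

The quadratic scan trades some speed for simplicity; since the input is never mutated, the question of which iterators stay valid does not arise.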

https://github.com/llvm/llvm-project/pull/119588


[llvm-branch-commits] [flang] [llvm] [mlir] [MLIR][OpenMP] Introduce overlapped record type map support (PR #119588)

2025-01-24 Thread Sergio Afonso via llvm-branch-commits


@@ -3197,37 +3303,77 @@ static llvm::omp::OpenMPOffloadMappingFlags mapParentWithMembers(
 // what we support as expected.
 llvm::omp::OpenMPOffloadMappingFlags mapFlag = mapData.Types[mapDataIndex];
 ompBuilder.setCorrectMemberOfFlag(mapFlag, memberOfFlag);
-combinedInfo.Types.emplace_back(mapFlag);
-combinedInfo.DevicePointers.emplace_back(
-llvm::OpenMPIRBuilder::DeviceInfoTy::None);
-combinedInfo.Names.emplace_back(LLVM::createMappingInformation(
-mapData.MapClause[mapDataIndex]->getLoc(), ompBuilder));
-combinedInfo.BasePointers.emplace_back(mapData.BasePointers[mapDataIndex]);
-combinedInfo.Pointers.emplace_back(mapData.Pointers[mapDataIndex]);
-combinedInfo.Sizes.emplace_back(mapData.Sizes[mapDataIndex]);
-  }
-  return memberOfFlag;
-}
-
-// The intent is to verify if the mapped data being passed is a
-// pointer -> pointee that requires special handling in certain cases,
-// e.g. applying the OMP_MAP_PTR_AND_OBJ map type.
-//
-// There may be a better way to verify this, but unfortunately with
-// opaque pointers we lose the ability to easily check if something is
-// a pointer whilst maintaining access to the underlying type.
-static bool checkIfPointerMap(omp::MapInfoOp mapOp) {
-  // If we have a varPtrPtr field assigned then the underlying type is a pointer
-  if (mapOp.getVarPtrPtr())
-return true;
 
-  // If the map data is declare target with a link clause, then it's represented
-  // as a pointer when we lower it to LLVM-IR even if at the MLIR level it has
-  // no relation to pointers.
-  if (isDeclareTargetLink(mapOp.getVarPtr()))
-return true;
+if (targetDirective == TargetDirective::TargetUpdate) {
+  combinedInfo.Types.emplace_back(mapFlag);
+  combinedInfo.DevicePointers.emplace_back(
+  mapData.DevicePointers[mapDataIndex]);
+  combinedInfo.Names.emplace_back(LLVM::createMappingInformation(
+  mapData.MapClause[mapDataIndex]->getLoc(), ompBuilder));
+  combinedInfo.BasePointers.emplace_back(
+  mapData.BasePointers[mapDataIndex]);
+  combinedInfo.Pointers.emplace_back(mapData.Pointers[mapDataIndex]);
+  combinedInfo.Sizes.emplace_back(mapData.Sizes[mapDataIndex]);
+} else {
+  llvm::SmallVector overlapIdxs;
+  // Find all of the members that "overlap", i.e. occlude other members that
+  // were mapped alongside the parent, e.g. member [0], occludes
+  getOverlappedMembers(overlapIdxs, mapData, parentClause);
+  // We need to make sure the overlapped members are sorted in order of
+  // lowest address to highest address
+  sortMapIndices(overlapIdxs, parentClause);
+
+  lowAddr = builder.CreatePointerCast(mapData.Pointers[mapDataIndex],
+  builder.getPtrTy());
+  highAddr = builder.CreatePointerCast(
+  builder.CreateConstGEP1_32(mapData.BaseType[mapDataIndex],
+ mapData.Pointers[mapDataIndex], 1),
+  builder.getPtrTy());
+
+  // TODO: We may want to skip arrays/array sections in this as Clang does
+  // so it appears to be an optimisation rather than a neccessity though,
+  // but this requires further investigation. However, we would have to make
+  // sure to not exclude maps with bounds that ARE pointers, as these are
+  // processed as seperate components, i.e. pointer + data.

skatrak wrote:

```suggestion
  // processed as separate components, i.e. pointer + data.
```

https://github.com/llvm/llvm-project/pull/119588


[llvm-branch-commits] [flang] [llvm] [mlir] [MLIR][OpenMP] Introduce overlapped record type map support (PR #119588)

2025-01-24 Thread Sergio Afonso via llvm-branch-commits


@@ -3110,6 +3132,91 @@ calculateBoundsOffset(LLVM::ModuleTranslation &moduleTranslation,
   return idx;
 }
 
+// Gathers members that are overlapping in the parent, excluding members that
+// themselves overlap, keeping the top-most (closest to parents level) map.
+static void getOverlappedMembers(llvm::SmallVector &overlapMapDataIdxs,

skatrak wrote:

```suggestion
static void getOverlappedMembers(llvm::SmallVectorImpl<size_t> &overlapMapDataIdxs,
```

https://github.com/llvm/llvm-project/pull/119588


[llvm-branch-commits] [flang] [llvm] [mlir] [MLIR][OpenMP] Introduce overlapped record type map support (PR #119588)

2025-01-24 Thread Sergio Afonso via llvm-branch-commits


@@ -2979,39 +2979,61 @@ static int getMapDataMemberIdx(MapInfoData &mapData, omp::MapInfoOp memberOp) {
   return std::distance(mapData.MapClause.begin(), res);
 }
 
-static omp::MapInfoOp getFirstOrLastMappedMemberPtr(omp::MapInfoOp mapInfo,
-bool first) {
-  ArrayAttr indexAttr = mapInfo.getMembersIndexAttr();
-  // Only 1 member has been mapped, we can return it.
-  if (indexAttr.size() == 1)
-return cast(mapInfo.getMembers()[0].getDefiningOp());
+static void sortMapIndices(llvm::SmallVector &indices,

skatrak wrote:

General nit for changes in this file: There's a `using namespace mlir`, so we 
can remove `mlir::`. Same for `llvm::` cast-style functions, which are present 
in the `mlir` namespace as well.

https://github.com/llvm/llvm-project/pull/119588


[llvm-branch-commits] [flang] [llvm] [mlir] [MLIR][OpenMP] Introduce overlapped record type map support (PR #119588)

2025-01-24 Thread Sergio Afonso via llvm-branch-commits


@@ -2979,39 +2979,61 @@ static int getMapDataMemberIdx(MapInfoData &mapData, omp::MapInfoOp memberOp) {
   return std::distance(mapData.MapClause.begin(), res);
 }
 
-static omp::MapInfoOp getFirstOrLastMappedMemberPtr(omp::MapInfoOp mapInfo,
-bool first) {
-  ArrayAttr indexAttr = mapInfo.getMembersIndexAttr();
-  // Only 1 member has been mapped, we can return it.
-  if (indexAttr.size() == 1)
-return cast(mapInfo.getMembers()[0].getDefiningOp());
+static void sortMapIndices(llvm::SmallVector &indices,
+   mlir::omp::MapInfoOp mapInfo,
+   bool ascending = true) {
+  mlir::ArrayAttr indexAttr = mapInfo.getMembersIndexAttr();
+  if (indexAttr.empty() || indexAttr.size() == 1 || indices.empty() ||
+  indices.size() == 1)
+return;
 
-  llvm::SmallVector indices(indexAttr.size());
-  std::iota(indices.begin(), indices.end(), 0);
+  llvm::sort(
+  indices.begin(), indices.end(), [&](const size_t a, const size_t b) {
+auto memberIndicesA = mlir::cast(indexAttr[a]);
+auto memberIndicesB = mlir::cast(indexAttr[b]);
+
+size_t smallestMember = memberIndicesA.size() < memberIndicesB.size()
+? memberIndicesA.size()
+: memberIndicesB.size();
 
-  llvm::sort(indices.begin(), indices.end(),
- [&](const size_t a, const size_t b) {
-   auto memberIndicesA = cast(indexAttr[a]);
-   auto memberIndicesB = cast(indexAttr[b]);
-   for (const auto it : llvm::zip(memberIndicesA, memberIndicesB)) {
- int64_t aIndex = cast(std::get<0>(it)).getInt();
- int64_t bIndex = cast(std::get<1>(it)).getInt();
+for (size_t i = 0; i < smallestMember; ++i) {

skatrak wrote:

Nit: `llvm::zip` already iterates as long as both ranges have elements, so it 
stops at the shortest. I think it's better to use it in this case.
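
As a rough illustration of the comparator without the explicit `smallestMember` bound, the sketch below uses `std::mismatch` from the standard library as a stand-in for `llvm::zip`; its four-iterator overload also stops at the end of the shorter range. `IndexPath` and `pathLess` are hypothetical names, and the vectors of integers stand in for the `ArrayAttr` index lists in the patch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a member's index list.
using IndexPath = std::vector<std::int64_t>;

// Orders two index paths element by element. If one path is a prefix of the
// other, the shorter one (the "parent") sorts first, as in the quoted patch.
bool pathLess(const IndexPath &A, const IndexPath &B, bool Ascending = true) {
  // Walks both ranges in lockstep and stops at the first difference or at
  // the end of the shorter range, so no explicit length bound is needed.
  auto Mis = std::mismatch(A.begin(), A.end(), B.begin(), B.end());
  if (Mis.first != A.end() && Mis.second != B.end())
    return (*Mis.first < *Mis.second) == Ascending;

  // Equal up to the end of the shorter path: pick the one with fewer indices.
  return A.size() < B.size();
}
```

A caller could pass it to a sort as `[](const IndexPath &A, const IndexPath &B) { return pathLess(A, B); }`.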

https://github.com/llvm/llvm-project/pull/119588


[llvm-branch-commits] [flang] [llvm] [mlir] [MLIR][OpenMP] Introduce overlapped record type map support (PR #119588)

2025-01-24 Thread Sergio Afonso via llvm-branch-commits


@@ -3197,37 +3303,77 @@ static llvm::omp::OpenMPOffloadMappingFlags mapParentWithMembers(
 // what we support as expected.
 llvm::omp::OpenMPOffloadMappingFlags mapFlag = mapData.Types[mapDataIndex];
 ompBuilder.setCorrectMemberOfFlag(mapFlag, memberOfFlag);
-combinedInfo.Types.emplace_back(mapFlag);
-combinedInfo.DevicePointers.emplace_back(
-llvm::OpenMPIRBuilder::DeviceInfoTy::None);
-combinedInfo.Names.emplace_back(LLVM::createMappingInformation(
-mapData.MapClause[mapDataIndex]->getLoc(), ompBuilder));
-combinedInfo.BasePointers.emplace_back(mapData.BasePointers[mapDataIndex]);
-combinedInfo.Pointers.emplace_back(mapData.Pointers[mapDataIndex]);
-combinedInfo.Sizes.emplace_back(mapData.Sizes[mapDataIndex]);
-  }
-  return memberOfFlag;
-}
-
-// The intent is to verify if the mapped data being passed is a
-// pointer -> pointee that requires special handling in certain cases,
-// e.g. applying the OMP_MAP_PTR_AND_OBJ map type.
-//
-// There may be a better way to verify this, but unfortunately with
-// opaque pointers we lose the ability to easily check if something is
-// a pointer whilst maintaining access to the underlying type.
-static bool checkIfPointerMap(omp::MapInfoOp mapOp) {
-  // If we have a varPtrPtr field assigned then the underlying type is a pointer
-  if (mapOp.getVarPtrPtr())
-return true;
 
-  // If the map data is declare target with a link clause, then it's represented
-  // as a pointer when we lower it to LLVM-IR even if at the MLIR level it has
-  // no relation to pointers.
-  if (isDeclareTargetLink(mapOp.getVarPtr()))
-return true;
+if (targetDirective == TargetDirective::TargetUpdate) {
+  combinedInfo.Types.emplace_back(mapFlag);
+  combinedInfo.DevicePointers.emplace_back(
+  mapData.DevicePointers[mapDataIndex]);
+  combinedInfo.Names.emplace_back(LLVM::createMappingInformation(
+  mapData.MapClause[mapDataIndex]->getLoc(), ompBuilder));
+  combinedInfo.BasePointers.emplace_back(
+  mapData.BasePointers[mapDataIndex]);
+  combinedInfo.Pointers.emplace_back(mapData.Pointers[mapDataIndex]);
+  combinedInfo.Sizes.emplace_back(mapData.Sizes[mapDataIndex]);
+} else {
+  llvm::SmallVector overlapIdxs;
+  // Find all of the members that "overlap", i.e. occlude other members that
+  // were mapped alongside the parent, e.g. member [0], occludes

skatrak wrote:

Nit: This comment seems to be incomplete.

https://github.com/llvm/llvm-project/pull/119588


[llvm-branch-commits] [flang] [llvm] [mlir] [MLIR][OpenMP] Introduce overlapped record type map support (PR #119588)

2025-01-24 Thread Sergio Afonso via llvm-branch-commits


@@ -2979,39 +2979,61 @@ static int getMapDataMemberIdx(MapInfoData &mapData, omp::MapInfoOp memberOp) {
   return std::distance(mapData.MapClause.begin(), res);
 }
 
-static omp::MapInfoOp getFirstOrLastMappedMemberPtr(omp::MapInfoOp mapInfo,
-bool first) {
-  ArrayAttr indexAttr = mapInfo.getMembersIndexAttr();
-  // Only 1 member has been mapped, we can return it.
-  if (indexAttr.size() == 1)
-return cast(mapInfo.getMembers()[0].getDefiningOp());
+static void sortMapIndices(llvm::SmallVector &indices,
+   mlir::omp::MapInfoOp mapInfo,
+   bool ascending = true) {
+  mlir::ArrayAttr indexAttr = mapInfo.getMembersIndexAttr();
+  if (indexAttr.empty() || indexAttr.size() == 1 || indices.empty() ||
+  indices.size() == 1)
+return;
 
-  llvm::SmallVector indices(indexAttr.size());
-  std::iota(indices.begin(), indices.end(), 0);
+  llvm::sort(
+  indices.begin(), indices.end(), [&](const size_t a, const size_t b) {
+auto memberIndicesA = mlir::cast(indexAttr[a]);
+auto memberIndicesB = mlir::cast(indexAttr[b]);
+
+size_t smallestMember = memberIndicesA.size() < memberIndicesB.size()
+? memberIndicesA.size()
+: memberIndicesB.size();
 
-  llvm::sort(indices.begin(), indices.end(),
- [&](const size_t a, const size_t b) {
-   auto memberIndicesA = cast(indexAttr[a]);
-   auto memberIndicesB = cast(indexAttr[b]);
-   for (const auto it : llvm::zip(memberIndicesA, memberIndicesB)) {
- int64_t aIndex = cast(std::get<0>(it)).getInt();
- int64_t bIndex = cast(std::get<1>(it)).getInt();
+for (size_t i = 0; i < smallestMember; ++i) {
+  int64_t aIndex =
+  mlir::cast(memberIndicesA.getValue()[i])
+  .getInt();
+  int64_t bIndex =
+  mlir::cast(memberIndicesB.getValue()[i])
+  .getInt();
 
- if (aIndex == bIndex)
-   continue;
+  if (aIndex == bIndex)
+continue;
 
- if (aIndex < bIndex)
-   return first;
+  if (aIndex < bIndex)
+return ascending;
 
- if (aIndex > bIndex)
-   return !first;
-   }
+  if (aIndex > bIndex)
+return !ascending;
+}
 
-   // Iterated the up until the end of the smallest member and
-   // they were found to be equal up to that point, so select
-   // the member with the lowest index count, so the "parent"
-   return memberIndicesA.size() < memberIndicesB.size();
- });
+// Iterated up until the end of the smallest member and
+// they were found to be equal up to that point, so select
+// the member with the lowest index count, so the "parent"
+return memberIndicesA.size() < memberIndicesB.size();
+  });
+}
+
+static mlir::omp::MapInfoOp
+getFirstOrLastMappedMemberPtr(mlir::omp::MapInfoOp mapInfo, bool first) {
+  mlir::ArrayAttr indexAttr = mapInfo.getMembersIndexAttr();
+  // Only 1 member has been mapped, we can return it.
+  if (indexAttr.size() == 1)
+if (auto mapOp =
+dyn_cast(mapInfo.getMembers()[0].getDefiningOp()))

skatrak wrote:

Let me know if I understood this wrong, but it seems like there is nothing 
preventing the `llvm::cast` call at the end of this function from triggering an 
assert if there was a single member mapped that wasn't defined by an 
`omp.map.info`.

I don't know whether this function can be expected to return `null`, in which 
case we could replace the `cast` below with a `dyn_cast`, or if this check here 
should be replaced with `return cast(...)`.
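
To make the two options concrete, here is a minimal LLVM-free sketch. The `Op`, `MapInfoOp` and `OtherOp` types and both helper names are hypothetical stand-ins, with `dynamic_cast` playing the role of `dyn_cast` and an asserted cast playing the role of `cast`; it is not the code from the patch.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for MLIR operations; not the real classes.
struct Op {
  virtual ~Op() = default;
};
struct MapInfoOp : Op {};
struct OtherOp : Op {};

// Option A: the function is allowed to return null, so the checked
// conversion is used everywhere and callers must test the result.
MapInfoOp *firstMemberOrNull(const std::vector<Op *> &members) {
  if (members.empty())
    return nullptr;
  return dynamic_cast<MapInfoOp *>(members.front());
}

// Option B: a member that is not a MapInfoOp is treated as a bug, so the
// conversion is asserted up front instead of silently falling through.
MapInfoOp *firstMemberChecked(const std::vector<Op *> &members) {
  assert(!members.empty() && "expected at least one mapped member");
  auto *mapOp = dynamic_cast<MapInfoOp *>(members.front());
  assert(mapOp && "member is not defined by a map-info op");
  return mapOp;
}
```

Mixing the two, a checked conversion in the early return and an asserting one at the end, is what leaves the single-member case ambiguous.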

https://github.com/llvm/llvm-project/pull/119588


[llvm-branch-commits] [flang] [llvm] [mlir] [MLIR][OpenMP] Introduce overlapped record type map support (PR #119588)

2025-01-24 Thread Sergio Afonso via llvm-branch-commits


@@ -2979,39 +2979,61 @@ static int getMapDataMemberIdx(MapInfoData &mapData, omp::MapInfoOp memberOp) {
   return std::distance(mapData.MapClause.begin(), res);
 }
 
-static omp::MapInfoOp getFirstOrLastMappedMemberPtr(omp::MapInfoOp mapInfo,
-bool first) {
-  ArrayAttr indexAttr = mapInfo.getMembersIndexAttr();
-  // Only 1 member has been mapped, we can return it.
-  if (indexAttr.size() == 1)
-return cast(mapInfo.getMembers()[0].getDefiningOp());
+static void sortMapIndices(llvm::SmallVector &indices,

skatrak wrote:

```suggestion
static void sortMapIndices(llvm::SmallVectorImpl &indices,
```

https://github.com/llvm/llvm-project/pull/119588


[llvm-branch-commits] [flang] [llvm] [mlir] [MLIR][OpenMP] Introduce overlapped record type map support (PR #119588)

2025-01-24 Thread Sergio Afonso via llvm-branch-commits

https://github.com/skatrak edited 
https://github.com/llvm/llvm-project/pull/119588


[llvm-branch-commits] [flang] [llvm] [mlir] [MLIR][OpenMP] Introduce overlapped record type map support (PR #119588)

2025-01-24 Thread Sergio Afonso via llvm-branch-commits


@@ -2979,39 +2979,61 @@ static int getMapDataMemberIdx(MapInfoData &mapData, omp::MapInfoOp memberOp) {
   return std::distance(mapData.MapClause.begin(), res);
 }
 
-static omp::MapInfoOp getFirstOrLastMappedMemberPtr(omp::MapInfoOp mapInfo,
-bool first) {
-  ArrayAttr indexAttr = mapInfo.getMembersIndexAttr();
-  // Only 1 member has been mapped, we can return it.
-  if (indexAttr.size() == 1)
-return cast(mapInfo.getMembers()[0].getDefiningOp());
+static void sortMapIndices(llvm::SmallVector &indices,
+   mlir::omp::MapInfoOp mapInfo,
+   bool ascending = true) {

skatrak wrote:

It seems a bit overkill to introduce this argument and allow sorting the list 
in reverse order just so that we can get the first or the last element in 
`getFirstOrLastMappedMemberPtr`. Wouldn't it be simpler to just update the 
`mapInfo.getMembers()[indices.front()].getDefiningOp());` expression to take 
`indices.front()` or `indices.back()` based on the `first` argument?
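
A rough sketch of that alternative, using plain standard-library containers rather than MLIR attributes. `IndexPath`, `firstOrLastMember` and the simplified `sortMapIndices` signature are assumptions made for illustration only:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical stand-in for one entry of the members-index attribute,
// e.g. {1, 0} means "member 0 of member 1".
using IndexPath = std::vector<int64_t>;

// Sorts positions into `paths` in ascending order only. Paths are compared
// element by element; if one is a prefix of the other, the shorter path
// (the "parent") sorts first.
void sortMapIndices(std::vector<std::size_t> &indices,
                    const std::vector<IndexPath> &paths) {
  std::sort(indices.begin(), indices.end(),
            [&](std::size_t a, std::size_t b) {
              return std::lexicographical_compare(
                  paths[a].begin(), paths[a].end(),
                  paths[b].begin(), paths[b].end());
            });
}

// Picks the first or last member from the single ascending ordering,
// instead of re-sorting in reverse.
std::size_t firstOrLastMember(const std::vector<IndexPath> &paths,
                              bool first) {
  assert(!paths.empty() && "expected at least one mapped member");
  std::vector<std::size_t> indices(paths.size());
  std::iota(indices.begin(), indices.end(), 0);
  sortMapIndices(indices, paths);
  return first ? indices.front() : indices.back();
}
```

One caveat worth checking against the patch: with `ascending = false` the comparator in the diff still places a parent before its children on a prefix tie, so the front of the reversed sort and the back of the ascending sort may not always name the same member.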

https://github.com/llvm/llvm-project/pull/119588


[llvm-branch-commits] [flang] [llvm] [mlir] [MLIR][OpenMP] Introduce overlapped record type map support (PR #119588)

2025-01-24 Thread Sergio Afonso via llvm-branch-commits


@@ -2979,39 +2979,61 @@ static int getMapDataMemberIdx(MapInfoData &mapData, omp::MapInfoOp memberOp) {
   return std::distance(mapData.MapClause.begin(), res);
 }
 
-static omp::MapInfoOp getFirstOrLastMappedMemberPtr(omp::MapInfoOp mapInfo,
-bool first) {
-  ArrayAttr indexAttr = mapInfo.getMembersIndexAttr();
-  // Only 1 member has been mapped, we can return it.
-  if (indexAttr.size() == 1)
-return cast(mapInfo.getMembers()[0].getDefiningOp());
+static void sortMapIndices(llvm::SmallVector &indices,
+   mlir::omp::MapInfoOp mapInfo,
+   bool ascending = true) {
+  mlir::ArrayAttr indexAttr = mapInfo.getMembersIndexAttr();
+  if (indexAttr.empty() || indexAttr.size() == 1 || indices.empty() ||
+  indices.size() == 1)
+return;
 
-  llvm::SmallVector indices(indexAttr.size());
-  std::iota(indices.begin(), indices.end(), 0);
+  llvm::sort(
+  indices.begin(), indices.end(), [&](const size_t a, const size_t b) {
+auto memberIndicesA = mlir::cast(indexAttr[a]);
+auto memberIndicesB = mlir::cast(indexAttr[b]);
+
+size_t smallestMember = memberIndicesA.size() < memberIndicesB.size()
+? memberIndicesA.size()
+: memberIndicesB.size();
 
-  llvm::sort(indices.begin(), indices.end(),
- [&](const size_t a, const size_t b) {
-   auto memberIndicesA = cast(indexAttr[a]);
-   auto memberIndicesB = cast(indexAttr[b]);
-   for (const auto it : llvm::zip(memberIndicesA, memberIndicesB)) {
- int64_t aIndex = cast(std::get<0>(it)).getInt();
- int64_t bIndex = cast(std::get<1>(it)).getInt();
+for (size_t i = 0; i < smallestMember; ++i) {
+  int64_t aIndex =
+  mlir::cast(memberIndicesA.getValue()[i])
+  .getInt();
+  int64_t bIndex =
+  mlir::cast(memberIndicesB.getValue()[i])
+  .getInt();
 
- if (aIndex == bIndex)
-   continue;
+  if (aIndex == bIndex)
+continue;
 
- if (aIndex < bIndex)
-   return first;
+  if (aIndex < bIndex)
+return ascending;
 
- if (aIndex > bIndex)
-   return !first;
-   }
+  if (aIndex > bIndex)
+return !ascending;
+}
 
-   // Iterated the up until the end of the smallest member and
-   // they were found to be equal up to that point, so select
-   // the member with the lowest index count, so the "parent"
-   return memberIndicesA.size() < memberIndicesB.size();
- });
+// Iterated up until the end of the smallest member and
+// they were found to be equal up to that point, so select
+// the member with the lowest index count, so the "parent"
+return memberIndicesA.size() < memberIndicesB.size();
+  });
+}
+
+static mlir::omp::MapInfoOp
+getFirstOrLastMappedMemberPtr(mlir::omp::MapInfoOp mapInfo, bool first) {
+  mlir::ArrayAttr indexAttr = mapInfo.getMembersIndexAttr();
+  // Only 1 member has been mapped, we can return it.
+  if (indexAttr.size() == 1)
+if (auto mapOp =
+dyn_cast(mapInfo.getMembers()[0].getDefiningOp()))
+  return mapOp;
+
+  llvm::SmallVector indices;
+  indices.resize(indexAttr.size());

skatrak wrote:

```suggestion
  llvm::SmallVector indices(indexAttr.size());
```

https://github.com/llvm/llvm-project/pull/119588


[llvm-branch-commits] [flang] [llvm] [mlir] [MLIR][OpenMP] Introduce overlapped record type map support (PR #119588)

2025-01-24 Thread Sergio Afonso via llvm-branch-commits


@@ -2979,39 +2979,61 @@ static int getMapDataMemberIdx(MapInfoData &mapData, omp::MapInfoOp memberOp) {
   return std::distance(mapData.MapClause.begin(), res);
 }
 
-static omp::MapInfoOp getFirstOrLastMappedMemberPtr(omp::MapInfoOp mapInfo,
-bool first) {
-  ArrayAttr indexAttr = mapInfo.getMembersIndexAttr();
-  // Only 1 member has been mapped, we can return it.
-  if (indexAttr.size() == 1)
-return cast(mapInfo.getMembers()[0].getDefiningOp());
+static void sortMapIndices(llvm::SmallVector &indices,
+   mlir::omp::MapInfoOp mapInfo,
+   bool ascending = true) {
+  mlir::ArrayAttr indexAttr = mapInfo.getMembersIndexAttr();
+  if (indexAttr.empty() || indexAttr.size() == 1 || indices.empty() ||
+  indices.size() == 1)
+return;

skatrak wrote:

Nit: I think this isn't necessary. `std::sort`, on which `llvm::sort` seems to 
be based, already returns early in these cases.
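
As a small illustration using only the standard library (the helper name is made up, and this says nothing about `llvm::sort` internals): calling `std::sort` on an empty or one-element range is well defined and leaves the range unchanged, so a size guard in front of it is redundant for correctness.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Sorts `v` in place with no size guard and returns how many times the
// comparator was invoked.
std::size_t sortCountingCompares(std::vector<int> &v) {
  std::size_t calls = 0;
  std::sort(v.begin(), v.end(), [&](int a, int b) {
    ++calls;
    return a < b;
  });
  return calls;
}
```

For an empty range there is nothing to compare, so the count is zero; whether a one-element range triggers any comparator call is an implementation detail, but the contents are unaffected either way.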

https://github.com/llvm/llvm-project/pull/119588


[llvm-branch-commits] [flang] [llvm] [mlir] [MLIR][OpenMP] Introduce overlapped record type map support (PR #119588)

2025-01-24 Thread Sergio Afonso via llvm-branch-commits


@@ -3197,37 +3303,77 @@ static llvm::omp::OpenMPOffloadMappingFlags mapParentWithMembers(
 // what we support as expected.
 llvm::omp::OpenMPOffloadMappingFlags mapFlag = mapData.Types[mapDataIndex];
 ompBuilder.setCorrectMemberOfFlag(mapFlag, memberOfFlag);
-combinedInfo.Types.emplace_back(mapFlag);
-combinedInfo.DevicePointers.emplace_back(
-llvm::OpenMPIRBuilder::DeviceInfoTy::None);
-combinedInfo.Names.emplace_back(LLVM::createMappingInformation(
-mapData.MapClause[mapDataIndex]->getLoc(), ompBuilder));
-combinedInfo.BasePointers.emplace_back(mapData.BasePointers[mapDataIndex]);
-combinedInfo.Pointers.emplace_back(mapData.Pointers[mapDataIndex]);
-combinedInfo.Sizes.emplace_back(mapData.Sizes[mapDataIndex]);
-  }
-  return memberOfFlag;
-}
-
-// The intent is to verify if the mapped data being passed is a
-// pointer -> pointee that requires special handling in certain cases,
-// e.g. applying the OMP_MAP_PTR_AND_OBJ map type.
-//
-// There may be a better way to verify this, but unfortunately with
-// opaque pointers we lose the ability to easily check if something is
-// a pointer whilst maintaining access to the underlying type.
-static bool checkIfPointerMap(omp::MapInfoOp mapOp) {
-  // If we have a varPtrPtr field assigned then the underlying type is a pointer
-  if (mapOp.getVarPtrPtr())
-return true;
 
-  // If the map data is declare target with a link clause, then it's represented
-  // as a pointer when we lower it to LLVM-IR even if at the MLIR level it has
-  // no relation to pointers.
-  if (isDeclareTargetLink(mapOp.getVarPtr()))
-return true;
+if (targetDirective == TargetDirective::TargetUpdate) {
+  combinedInfo.Types.emplace_back(mapFlag);
+  combinedInfo.DevicePointers.emplace_back(
+  mapData.DevicePointers[mapDataIndex]);
+  combinedInfo.Names.emplace_back(LLVM::createMappingInformation(
+  mapData.MapClause[mapDataIndex]->getLoc(), ompBuilder));
+  combinedInfo.BasePointers.emplace_back(
+  mapData.BasePointers[mapDataIndex]);
+  combinedInfo.Pointers.emplace_back(mapData.Pointers[mapDataIndex]);
+  combinedInfo.Sizes.emplace_back(mapData.Sizes[mapDataIndex]);
+} else {
+  llvm::SmallVector overlapIdxs;
+  // Find all of the members that "overlap", i.e. occlude other members that
+  // were mapped alongside the parent, e.g. member [0], occludes
+  getOverlappedMembers(overlapIdxs, mapData, parentClause);
+  // We need to make sure the overlapped members are sorted in order of
+  // lowest address to highest address
+  sortMapIndices(overlapIdxs, parentClause);
+
+  lowAddr = builder.CreatePointerCast(mapData.Pointers[mapDataIndex],
+  builder.getPtrTy());
+  highAddr = builder.CreatePointerCast(
+  builder.CreateConstGEP1_32(mapData.BaseType[mapDataIndex],
+ mapData.Pointers[mapDataIndex], 1),
+  builder.getPtrTy());
+
+  // TODO: We may want to skip arrays/array sections in this as Clang does
+  // so it appears to be an optimisation rather than a neccessity though,

skatrak wrote:

```suggestion
  // so. It appears to be an optimisation rather than a necessity though,
```

https://github.com/llvm/llvm-project/pull/119588


[llvm-branch-commits] [flang] [flang][OpenMP] Handle directive arguments in OmpDirectiveSpecifier (PR #124278)

2025-01-24 Thread Krzysztof Parzyszek via llvm-branch-commits

https://github.com/kparzysz created 
https://github.com/llvm/llvm-project/pull/124278

Implement parsing and symbol resolution for directives that take arguments. 
There are a few, and most of them take objects. Special handling is needed for 
two that take more specialized arguments: DECLARE MAPPER and DECLARE REDUCTION.

This only affects directives in METADIRECTIVE's WHEN and OTHERWISE clauses. 
Parsing and semantic checks of other cases are unaffected.

>From e230e8ad3bcd09fc28b18f64a84fcd20d6e9bc65 Mon Sep 17 00:00:00 2001
From: Krzysztof Parzyszek 
Date: Wed, 22 Jan 2025 09:47:44 -0600
Subject: [PATCH] [flang][OpenMP] Handle directive arguments in
 OmpDirectiveSpecifier

Implement parsing and symbol resolution for directives that take
arguments. There are a few, and most of them take objects. Special
handling is needed for two that take more specialized arguments:
DECLARE MAPPER and DECLARE REDUCTION.

This only affects directives in METADIRECTIVE's WHEN and OTHERWISE
clauses. Parsing and semantic checks of other cases is unaffected.
---
 flang/examples/FeatureList/FeatureList.cpp|   1 -
 flang/include/flang/Parser/dump-parse-tree.h  |   9 +-
 flang/include/flang/Parser/parse-tree.h   | 142 ++
 flang/lib/Parser/openmp-parsers.cpp   |  68 +++--
 flang/lib/Parser/unparse.cpp  |  40 +--
 flang/lib/Semantics/check-omp-structure.cpp   |   2 +-
 flang/lib/Semantics/resolve-names.cpp | 130 +++---
 .../Parser/OpenMP/declare-mapper-unparse.f90  |   4 +-
 .../Parser/OpenMP/metadirective-dirspec.f90   | 242 ++
 9 files changed, 517 insertions(+), 121 deletions(-)
 create mode 100644 flang/test/Parser/OpenMP/metadirective-dirspec.f90

diff --git a/flang/examples/FeatureList/FeatureList.cpp b/flang/examples/FeatureList/FeatureList.cpp
index 3a689c335c81c0..e35f120d8661ea 100644
--- a/flang/examples/FeatureList/FeatureList.cpp
+++ b/flang/examples/FeatureList/FeatureList.cpp
@@ -514,7 +514,6 @@ struct NodeVisitor {
   READ_FEATURE(OmpReductionClause)
   READ_FEATURE(OmpInReductionClause)
   READ_FEATURE(OmpReductionCombiner)
-  READ_FEATURE(OmpReductionCombiner::FunctionCombiner)
   READ_FEATURE(OmpReductionInitializerClause)
   READ_FEATURE(OmpReductionIdentifier)
   READ_FEATURE(OmpAllocateClause)
diff --git a/flang/include/flang/Parser/dump-parse-tree.h b/flang/include/flang/Parser/dump-parse-tree.h
index 1323fd695d4439..ce518c7c3edea0 100644
--- a/flang/include/flang/Parser/dump-parse-tree.h
+++ b/flang/include/flang/Parser/dump-parse-tree.h
@@ -476,6 +476,12 @@ class ParseTreeDumper {
   NODE(parser, NullInit)
   NODE(parser, ObjectDecl)
   NODE(parser, OldParameterStmt)
+  NODE(parser, OmpTypeSpecifier)
+  NODE(parser, OmpTypeNameList)
+  NODE(parser, OmpLocator)
+  NODE(parser, OmpLocatorList)
+  NODE(parser, OmpReductionSpecifier)
+  NODE(parser, OmpArgument)
   NODE(parser, OmpMetadirectiveDirective)
   NODE(parser, OmpMatchClause)
   NODE(parser, OmpOtherwiseClause)
@@ -541,7 +547,7 @@ class ParseTreeDumper {
   NODE(parser, OmpDeclareTargetSpecifier)
   NODE(parser, OmpDeclareTargetWithClause)
   NODE(parser, OmpDeclareTargetWithList)
-  NODE(parser, OmpDeclareMapperSpecifier)
+  NODE(parser, OmpMapperSpecifier)
   NODE(parser, OmpDefaultClause)
   NODE_ENUM(OmpDefaultClause, DataSharingAttribute)
   NODE(parser, OmpVariableCategory)
@@ -624,7 +630,6 @@ class ParseTreeDumper {
   NODE(parser, OmpReductionCombiner)
   NODE(parser, OmpTaskReductionClause)
   NODE(OmpTaskReductionClause, Modifier)
-  NODE(OmpReductionCombiner, FunctionCombiner)
   NODE(parser, OmpReductionInitializerClause)
   NODE(parser, OmpReductionIdentifier)
   NODE(parser, OmpAllocateClause)
diff --git a/flang/include/flang/Parser/parse-tree.h b/flang/include/flang/Parser/parse-tree.h
index 2e27b6ea7eafa1..993c1338f7235b 100644
--- a/flang/include/flang/Parser/parse-tree.h
+++ b/flang/include/flang/Parser/parse-tree.h
@@ -3454,15 +3454,7 @@ WRAPPER_CLASS(PauseStmt, std::optional);
 // --- Common definitions
 
 struct OmpClause;
-struct OmpClauseList;
-
-struct OmpDirectiveSpecification {
-  TUPLE_CLASS_BOILERPLATE(OmpDirectiveSpecification);
-  std::tuple>>
-  t;
-  CharBlock source;
-};
+struct OmpDirectiveSpecification;
 
 // 2.1 Directives or clauses may accept a list or extended-list.
 // A list item is a variable, array section or common block name (enclosed
@@ -3475,15 +3467,76 @@ struct OmpObject {
 
 WRAPPER_CLASS(OmpObjectList, std::list);
 
-#define MODIFIER_BOILERPLATE(...) \
-  struct Modifier { \
-using Variant = std::variant<__VA_ARGS__>; \
-UNION_CLASS_BOILERPLATE(Modifier); \
-CharBlock source; \
-Variant u; \
-  }
+// Ref: [4.5:201-207], [5.0:293-299], [5.1:325-331], [5.2:124]
+//
+// reduction-identifier ->
+//base-language-identifier |// since 4.5
+//- |   // since 4.5, until 5.2
+//+ | * | .AND. | .OR. | .EQV. | .NEQV. |   // since 4.5
+//MIN

[llvm-branch-commits] [flang] [flang][OpenMP] Handle directive arguments in OmpDirectiveSpecifier (PR #124278)

2025-01-24 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-flang-openmp

Author: Krzysztof Parzyszek (kparzysz)


Changes

Implement parsing and symbol resolution for directives that take arguments. 
There are a few, and most of them take objects. Special handling is needed for 
two that take more specialized arguments: DECLARE MAPPER and DECLARE REDUCTION.

This only affects directives in METADIRECTIVE's WHEN and OTHERWISE clauses. 
Parsing and semantic checks of other cases are unaffected.

---

Patch is 37.25 KiB, truncated to 20.00 KiB below, full version: 
https://github.com/llvm/llvm-project/pull/124278.diff


9 Files Affected:

- (modified) flang/examples/FeatureList/FeatureList.cpp (-1) 
- (modified) flang/include/flang/Parser/dump-parse-tree.h (+7-2) 
- (modified) flang/include/flang/Parser/parse-tree.h (+97-45) 
- (modified) flang/lib/Parser/openmp-parsers.cpp (+45-23) 
- (modified) flang/lib/Parser/unparse.cpp (+25-15) 
- (modified) flang/lib/Semantics/check-omp-structure.cpp (+1-1) 
- (modified) flang/lib/Semantics/resolve-names.cpp (+98-32) 
- (modified) flang/test/Parser/OpenMP/declare-mapper-unparse.f90 (+2-2) 
- (added) flang/test/Parser/OpenMP/metadirective-dirspec.f90 (+242) 


``diff
diff --git a/flang/examples/FeatureList/FeatureList.cpp b/flang/examples/FeatureList/FeatureList.cpp
index 3a689c335c81c0..e35f120d8661ea 100644
--- a/flang/examples/FeatureList/FeatureList.cpp
+++ b/flang/examples/FeatureList/FeatureList.cpp
@@ -514,7 +514,6 @@ struct NodeVisitor {
   READ_FEATURE(OmpReductionClause)
   READ_FEATURE(OmpInReductionClause)
   READ_FEATURE(OmpReductionCombiner)
-  READ_FEATURE(OmpReductionCombiner::FunctionCombiner)
   READ_FEATURE(OmpReductionInitializerClause)
   READ_FEATURE(OmpReductionIdentifier)
   READ_FEATURE(OmpAllocateClause)
diff --git a/flang/include/flang/Parser/dump-parse-tree.h b/flang/include/flang/Parser/dump-parse-tree.h
index 1323fd695d4439..ce518c7c3edea0 100644
--- a/flang/include/flang/Parser/dump-parse-tree.h
+++ b/flang/include/flang/Parser/dump-parse-tree.h
@@ -476,6 +476,12 @@ class ParseTreeDumper {
   NODE(parser, NullInit)
   NODE(parser, ObjectDecl)
   NODE(parser, OldParameterStmt)
+  NODE(parser, OmpTypeSpecifier)
+  NODE(parser, OmpTypeNameList)
+  NODE(parser, OmpLocator)
+  NODE(parser, OmpLocatorList)
+  NODE(parser, OmpReductionSpecifier)
+  NODE(parser, OmpArgument)
   NODE(parser, OmpMetadirectiveDirective)
   NODE(parser, OmpMatchClause)
   NODE(parser, OmpOtherwiseClause)
@@ -541,7 +547,7 @@ class ParseTreeDumper {
   NODE(parser, OmpDeclareTargetSpecifier)
   NODE(parser, OmpDeclareTargetWithClause)
   NODE(parser, OmpDeclareTargetWithList)
-  NODE(parser, OmpDeclareMapperSpecifier)
+  NODE(parser, OmpMapperSpecifier)
   NODE(parser, OmpDefaultClause)
   NODE_ENUM(OmpDefaultClause, DataSharingAttribute)
   NODE(parser, OmpVariableCategory)
@@ -624,7 +630,6 @@ class ParseTreeDumper {
   NODE(parser, OmpReductionCombiner)
   NODE(parser, OmpTaskReductionClause)
   NODE(OmpTaskReductionClause, Modifier)
-  NODE(OmpReductionCombiner, FunctionCombiner)
   NODE(parser, OmpReductionInitializerClause)
   NODE(parser, OmpReductionIdentifier)
   NODE(parser, OmpAllocateClause)
diff --git a/flang/include/flang/Parser/parse-tree.h b/flang/include/flang/Parser/parse-tree.h
index 2e27b6ea7eafa1..993c1338f7235b 100644
--- a/flang/include/flang/Parser/parse-tree.h
+++ b/flang/include/flang/Parser/parse-tree.h
@@ -3454,15 +3454,7 @@ WRAPPER_CLASS(PauseStmt, std::optional);
 // --- Common definitions
 
 struct OmpClause;
-struct OmpClauseList;
-
-struct OmpDirectiveSpecification {
-  TUPLE_CLASS_BOILERPLATE(OmpDirectiveSpecification);
-  std::tuple>>
-  t;
-  CharBlock source;
-};
+struct OmpDirectiveSpecification;
 
 // 2.1 Directives or clauses may accept a list or extended-list.
 // A list item is a variable, array section or common block name (enclosed
@@ -3475,15 +3467,76 @@ struct OmpObject {
 
 WRAPPER_CLASS(OmpObjectList, std::list);
 
-#define MODIFIER_BOILERPLATE(...) \
-  struct Modifier { \
-using Variant = std::variant<__VA_ARGS__>; \
-UNION_CLASS_BOILERPLATE(Modifier); \
-CharBlock source; \
-Variant u; \
-  }
+// Ref: [4.5:201-207], [5.0:293-299], [5.1:325-331], [5.2:124]
+//
+// reduction-identifier ->
+//base-language-identifier |// since 4.5
+//- |   // since 4.5, until 5.2
+//+ | * | .AND. | .OR. | .EQV. | .NEQV. |   // since 4.5
+//MIN | MAX | IAND | IOR | IEOR // since 4.5
+struct OmpReductionIdentifier {
+  UNION_CLASS_BOILERPLATE(OmpReductionIdentifier);
+  std::variant u;
+};
 
-#define MODIFIERS() std::optional>
+// Ref: [4.5:222:6], [5.0:305:27], [5.1:337:19], [5.2:126:3-4], [6.0:240:27-28]
+//
+// combiner-expression ->   // since 4.5
+//assignment-statement |
+//function-reference
+struct OmpReductionCombiner {
+  UNION_CLASS_BOILERPLATE(OmpReductionCombiner);
+  std::variant u;
+};
+

[llvm-branch-commits] [llvm] MachineUniformityAnalysis: Improve isConstantOrUndefValuePhi (PR #112866)

2025-01-24 Thread Petar Avramovic via llvm-branch-commits

https://github.com/petar-avramovic updated 
https://github.com/llvm/llvm-project/pull/112866

>From 73554e86fc276e15db22462749aa71324d1e1f41 Mon Sep 17 00:00:00 2001
From: Petar Avramovic 
Date: Thu, 31 Oct 2024 14:10:57 +0100
Subject: [PATCH] MachineUniformityAnalysis: Improve isConstantOrUndefValuePhi

Change existing code for G_PHI to match what LLVM-IR version is doing
via PHINode::hasConstantOrUndefValue. This is not safe for regular PHI
since it may appear with an undef operand and getVRegDef can fail.
Most notably this improves number of values that can be allocated
to sgpr register bank in AMDGPURegBankSelect.
Common case here are phis that appear in structurize-cfg lowering
for cycles with multiple exits:
Undef incoming value is coming from block that reached cycle exit
condition, if other incoming is uniform keep the phi uniform despite
the fact it is joining values from pair of blocks that are entered
via divergent condition branch.
---
 llvm/lib/CodeGen/MachineSSAContext.cpp| 27 +-
 .../AMDGPU/MIR/hidden-diverge-gmir.mir| 28 +++
 .../AMDGPU/MIR/hidden-loop-diverge.mir|  4 +-
 .../AMDGPU/MIR/uses-value-from-cycle.mir  |  8 +-
 .../GlobalISel/divergence-structurizer.mir| 80 --
 .../regbankselect-mui-regbanklegalize.mir | 69 ---
 .../regbankselect-mui-regbankselect.mir   | 18 ++--
 .../AMDGPU/GlobalISel/regbankselect-mui.ll| 84 ++-
 .../AMDGPU/GlobalISel/regbankselect-mui.mir   | 51 ++-
 9 files changed, 191 insertions(+), 178 deletions(-)

diff --git a/llvm/lib/CodeGen/MachineSSAContext.cpp b/llvm/lib/CodeGen/MachineSSAContext.cpp
index e384187b6e8593..8e13c0916dd9e1 100644
--- a/llvm/lib/CodeGen/MachineSSAContext.cpp
+++ b/llvm/lib/CodeGen/MachineSSAContext.cpp
@@ -54,9 +54,34 @@ const MachineBasicBlock *MachineSSAContext::getDefBlock(Register value) const {
   return F->getRegInfo().getVRegDef(value)->getParent();
 }
 
+static bool isUndef(const MachineInstr &MI) {
+  return MI.getOpcode() == TargetOpcode::G_IMPLICIT_DEF ||
+ MI.getOpcode() == TargetOpcode::IMPLICIT_DEF;
+}
+
+/// MachineInstr equivalent of PHINode::hasConstantOrUndefValue() for G_PHI.
 template <>
 bool MachineSSAContext::isConstantOrUndefValuePhi(const MachineInstr &Phi) {
-  return Phi.isConstantValuePHI();
+  if (!Phi.isPHI())
+return false;
+
+  // In later passes PHI may appear with an undef operand, getVRegDef can fail.
+  if (Phi.getOpcode() == TargetOpcode::PHI)
+return Phi.isConstantValuePHI();
+
+  // For G_PHI we do equivalent of PHINode::hasConstantOrUndefValue().
+  const MachineRegisterInfo &MRI = Phi.getMF()->getRegInfo();
+  Register This = Phi.getOperand(0).getReg();
+  Register ConstantValue;
+  for (unsigned i = 1, e = Phi.getNumOperands(); i < e; i += 2) {
+Register Incoming = Phi.getOperand(i).getReg();
+if (Incoming != This && !isUndef(*MRI.getVRegDef(Incoming))) {
+  if (ConstantValue && ConstantValue != Incoming)
+return false;
+  ConstantValue = Incoming;
+}
+  }
+  return true;
 }
 
 template <>
diff --git a/llvm/test/Analysis/UniformityAnalysis/AMDGPU/MIR/hidden-diverge-gmir.mir b/llvm/test/Analysis/UniformityAnalysis/AMDGPU/MIR/hidden-diverge-gmir.mir
index ce00edf3363f77..9694a340b5e906 100644
--- a/llvm/test/Analysis/UniformityAnalysis/AMDGPU/MIR/hidden-diverge-gmir.mir
+++ b/llvm/test/Analysis/UniformityAnalysis/AMDGPU/MIR/hidden-diverge-gmir.mir
@@ -1,24 +1,24 @@
 # RUN: llc -mtriple=amdgcn-- -run-pass=print-machine-uniformity -o - %s 2>&1 | FileCheck %s
 # CHECK-LABEL: MachineUniformityInfo for function: hidden_diverge
 # CHECK-LABEL: BLOCK bb.0
-# CHECK: DIVERGENT: %{{[0-9]*}}: %{{[0-9]*}}:_(s32) = G_INTRINSIC intrinsic(@llvm.amdgcn.workitem.id.x)
-# CHECK: DIVERGENT: %{{[0-9]*}}: %{{[0-9]*}}:_(s1) = G_ICMP intpred(slt)
-# CHECK: DIVERGENT: %{{[0-9]*}}: %{{[0-9]*}}:_(s1) = G_XOR %{{[0-9]*}}:_, %{{[0-9]*}}:_
-# CHECK: DIVERGENT: %{{[0-9]*}}: %{{[0-9]*}}:_(s1), %{{[0-9]*}}:_(s64) = G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.if)
-# CHECK: DIVERGENT: %{{[0-9]*}}: %{{[0-9]*}}:_(s1), %{{[0-9]*}}:_(s64) = G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.if)
-# CHECK: DIVERGENT: G_BRCOND %{{[0-9]*}}:_(s1), %bb.1
-# CHECK: DIVERGENT: G_BR %bb.2
+# CHECK: DIVERGENT: %{{[0-9]*}}: %{{[0-9]*}}:_(s32) = G_INTRINSIC intrinsic(@llvm.amdgcn.workitem.id.x)
+# CHECK: DIVERGENT: %{{[0-9]*}}: %{{[0-9]*}}:_(s1) = G_ICMP intpred(slt)
+# CHECK: DIVERGENT: %{{[0-9]*}}: %{{[0-9]*}}:_(s1) = G_XOR %{{[0-9]*}}:_, %{{[0-9]*}}:_
+# CHECK: DIVERGENT: %{{[0-9]*}}: %{{[0-9]*}}:_(s1), %{{[0-9]*}}:_(s64) = G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.if)
+# CHECK: DIVERGENT: %{{[0-9]*}}: %{{[0-9]*}}:_(s1), %{{[0-9]*}}:_(s64) = G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.if)
+# CHECK: DIVERGENT: G_BRCOND %{{[0-9]*}}:_(s1), %bb.1
+# CHECK: DIVERGENT: G_BR %bb.2
 # CHECK-LABEL: BLOCK bb.1
 # CHECK-LABEL: BLOCK bb.2
-# CHECK: D

[llvm-branch-commits] [flang] [mlir] [mlir][OpenMP][flang] make private variable allocation implicit in omp.private (PR #124019)

2025-01-24 Thread Tom Eccles via llvm-branch-commits


@@ -55,15 +55,19 @@ class MapsForPrivatizedSymbolsPass
 std::underlying_type_t>(
 llvm::omp::OpenMPOffloadMappingFlags::OMP_MAP_TO);
 Operation *definingOp = var.getDefiningOp();
-auto declOp = llvm::dyn_cast_or_null(definingOp);
-assert(declOp &&
-   "Expected defining Op of privatized var to be hlfir.declare");
+assert(definingOp &&
+   "Privatizing a block argument without any hlfir.declare");

tblah wrote:

I was nervous to make any functional change to the target stuff because I don't 
know how to test it. The previous implementation also wouldn't have worked for 
block arguments.

I can fix this if somebody at AMD is willing to test `omp target` for me?

https://github.com/llvm/llvm-project/pull/124019


[llvm-branch-commits] [llvm] a41ded8 - Revert "[GlobalMerge][NFC] Skip sorting by profitability when it is not neede…"

2025-01-24 Thread via llvm-branch-commits

Author: Michael Maitland
Date: 2025-01-24T23:42:18-05:00
New Revision: a41ded832d91141939c1b4aa2e955471a1047755

URL: https://github.com/llvm/llvm-project/commit/a41ded832d91141939c1b4aa2e955471a1047755
DIFF: https://github.com/llvm/llvm-project/commit/a41ded832d91141939c1b4aa2e955471a1047755.diff

LOG: Revert "[GlobalMerge][NFC] Skip sorting by profitability when it is not neede…"

This reverts commit e5e55c04d6af4ae32c99d574f59e632595abf607.

Added: 


Modified: 
llvm/lib/CodeGen/GlobalMerge.cpp

Removed: 




diff  --git a/llvm/lib/CodeGen/GlobalMerge.cpp b/llvm/lib/CodeGen/GlobalMerge.cpp
index 41e01a1d3ccd52..7b76155b175d1d 100644
--- a/llvm/lib/CodeGen/GlobalMerge.cpp
+++ b/llvm/lib/CodeGen/GlobalMerge.cpp
@@ -423,12 +423,24 @@ bool GlobalMergeImpl::doMerge(SmallVectorImpl &Globals,
 }
   }
 
+  // Now we found a bunch of sets of globals used together.  We accumulated
+  // the number of times we encountered the sets (i.e., the number of functions
+  // that use that exact set of globals).
+  //
+  // Multiply that by the size of the set to give us a crude profitability
+  // metric.
+  llvm::stable_sort(UsedGlobalSets,
+[](const UsedGlobalSet &UGS1, const UsedGlobalSet &UGS2) {
+  return UGS1.Globals.count() * UGS1.UsageCount <
+ UGS2.Globals.count() * UGS2.UsageCount;
+});
+
   // We can choose to merge all globals together, but ignore globals never used
   // with another global.  This catches the obviously non-profitable cases of
   // having a single global, but is aggressive enough for any other case.
   if (GlobalMergeIgnoreSingleUse) {
 BitVector AllGlobals(Globals.size());
-for (const UsedGlobalSet &UGS : UsedGlobalSets) {
+for (const UsedGlobalSet &UGS : llvm::reverse(UsedGlobalSets)) {
   if (UGS.UsageCount == 0)
 continue;
   if (UGS.Globals.count() > 1)
@@ -437,16 +449,6 @@ bool GlobalMergeImpl::doMerge(SmallVectorImpl &Globals,
 return doMerge(Globals, AllGlobals, M, isConst, AddrSpace);
   }
 
-  // Now we found a bunch of sets of globals used together. We accumulated
-  // the number of times we encountered the sets (i.e., the number of functions
-  // that use that exact set of globals). Multiply that by the size of the set
-  // to give us a crude profitability metric.
-  llvm::stable_sort(UsedGlobalSets,
-[](const UsedGlobalSet &UGS1, const UsedGlobalSet &UGS2) {
-  return UGS1.Globals.count() * UGS1.UsageCount >=
- UGS2.Globals.count() * UGS2.UsageCount;
-});
-
   // Starting from the sets with the best (=biggest) profitability, find a
   // good combination.
   // The ideal (and expensive) solution can only be found by trying all
@@ -456,7 +458,7 @@ bool GlobalMergeImpl::doMerge(SmallVectorImpl &Globals,
   BitVector PickedGlobals(Globals.size());
   bool Changed = false;
 
-  for (const UsedGlobalSet &UGS : UsedGlobalSets) {
+  for (const UsedGlobalSet &UGS : llvm::reverse(UsedGlobalSets)) {
 if (UGS.UsageCount == 0)
   continue;
 if (PickedGlobals.anyCommon(UGS.Globals))





[llvm-branch-commits] [clang] [clang] fix template argument conversion (PR #124386)

2025-01-24 Thread Younan Zhang via llvm-branch-commits

https://github.com/zyn0217 edited 
https://github.com/llvm/llvm-project/pull/124386


[llvm-branch-commits] [clang] [clang] fix template argument conversion (PR #124386)

2025-01-24 Thread Younan Zhang via llvm-branch-commits

https://github.com/zyn0217 approved this pull request.

Generally looks good, but please give others a chance to take a look.

https://github.com/llvm/llvm-project/pull/124386


[llvm-branch-commits] [clang] [clang] fix template argument conversion (PR #124386)

2025-01-24 Thread Younan Zhang via llvm-branch-commits


@@ -5252,63 +5253,89 @@ bool Sema::CheckTemplateArgument(
 return true;
 }
 
-switch (Arg.getArgument().getKind()) {
-case TemplateArgument::Null:
-  llvm_unreachable("Should never see a NULL template argument here");
-
-case TemplateArgument::Expression: {
-  Expr *E = Arg.getArgument().getAsExpr();
+auto checkExpr = [&](Expr *E) -> Expr * {
   TemplateArgument SugaredResult, CanonicalResult;
   unsigned CurSFINAEErrors = NumSFINAEErrors;
   ExprResult Res =
   CheckTemplateArgument(NTTP, NTTPType, E, SugaredResult,
 CanonicalResult, PartialOrderingTTP, CTAK);
-  if (Res.isInvalid())
-return true;
   // If the current template argument causes an error, give up now.
-  if (CurSFINAEErrors < NumSFINAEErrors)
-return true;
+  if (Res.isInvalid() || CurSFINAEErrors < NumSFINAEErrors)
+return nullptr;
+  SugaredConverted.push_back(SugaredResult);
+  CanonicalConverted.push_back(CanonicalResult);
+  return Res.get();
+};
+
+switch (Arg.getKind()) {
+case TemplateArgument::Null:
+  llvm_unreachable("Should never see a NULL template argument here");
 
+case TemplateArgument::Expression: {
+  Expr *E = Arg.getAsExpr();
+  Expr *R = checkExpr(E);
+  if (!R)
+return true;
   // If the resulting expression is new, then use it in place of the
   // old expression in the template argument.
-  if (Res.get() != E) {
-TemplateArgument TA(Res.get());
-Arg = TemplateArgumentLoc(TA, Res.get());
+  if (R != E) {
+TemplateArgument TA(R);
+ArgLoc = TemplateArgumentLoc(TA, R);
   }
-
-  SugaredConverted.push_back(SugaredResult);
-  CanonicalConverted.push_back(CanonicalResult);
   break;
 }
 
-case TemplateArgument::Declaration:
-case TemplateArgument::Integral:
+// As for the converted NTTP kinds, they still might need another
+// conversion, as the new corresponding parameter might be different.
+// Ideally, we would always perform substitution starting with sugared types
+// and never need these, as we would still have expressions. Since these are
+// needed so rarely, it's probably a better tradeoff to just convert them
+// back to expressions.
+case TemplateArgument::Integral: {
+  IntegerLiteral ILE(Context, Arg.getAsIntegral(), Arg.getIntegralType(),
+ SourceLocation());
+  if (!checkExpr(&ILE))

zyn0217 wrote:

So this makes `CheckTemplateArgument` take an Expr pointer to a temporary 
rather than anything persisted by ASTContext. Shall we document this behavior 
somewhere to avoid accidentally storing it longer in `CheckTemplateArgument`?
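
To make the lifetime concern concrete, here is a small standalone sketch of the pattern in question: a stack-allocated expression object is handed to a checking routine by pointer, and the routine must copy out whatever it needs rather than keep the pointer. `Expr`, `Checker` and `convertIntegral` are hypothetical simplifications written for this note. They are not the Clang classes or the code in the patch.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for an AST expression node.
struct Expr {
  int Value;
};

struct Checker {
  // Results are stored by value. Storing the incoming pointer instead
  // would dangle as soon as the caller's temporary goes out of scope.
  std::vector<int> Converted;

  // Uses E only for the duration of the call. Returns the expression to
  // use on success, or nullptr on failure.
  const Expr *check(const Expr *E) {
    if (E->Value < 0)
      return nullptr;
    Converted.push_back(E->Value);
    return E;
  }
};

// Mirrors the shape of the Integral case: build a temporary on the stack,
// check it, and let only the boolean outcome escape this function.
inline bool convertIntegral(Checker &C, int V) {
  Expr Tmp{V}; // lives only until this function returns
  return C.check(&Tmp) != nullptr;
}
```

The sketch is safe only because `check` never retains `E` and `convertIntegral` discards the returned pointer, which is the invariant the comment above suggests documenting.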

https://github.com/llvm/llvm-project/pull/124386


[llvm-branch-commits] [llvm] MachineUniformityAnalysis: Improve isConstantOrUndefValuePhi (PR #112866)

2025-01-24 Thread Petar Avramovic via llvm-branch-commits

https://github.com/petar-avramovic updated 
https://github.com/llvm/llvm-project/pull/112866

>From c336fe428d4d1824a4a437c99655cb909bf328c6 Mon Sep 17 00:00:00 2001
From: Petar Avramovic 
Date: Thu, 31 Oct 2024 14:10:57 +0100
Subject: [PATCH] MachineUniformityAnalysis: Improve isConstantOrUndefValuePhi

Change the existing code for G_PHI to match what the LLVM-IR version does
via PHINode::hasConstantOrUndefValue. This is not safe for regular PHI,
since it may appear with an undef operand and getVRegDef can fail.
Most notably, this improves the number of values that can be allocated
to the sgpr register bank in AMDGPURegBankSelect.
A common case is phis that appear in structurize-cfg lowering
for cycles with multiple exits:
the undef incoming value comes from a block that reached the cycle exit
condition; if the other incoming value is uniform, keep the phi uniform even
though it joins values from a pair of blocks that are entered
via a divergent conditional branch.
---
 llvm/lib/CodeGen/MachineSSAContext.cpp| 27 +-
 .../AMDGPU/MIR/hidden-diverge-gmir.mir| 28 +++
 .../AMDGPU/MIR/hidden-loop-diverge.mir|  4 +-
 .../AMDGPU/MIR/uses-value-from-cycle.mir  |  8 +-
 .../GlobalISel/divergence-structurizer.mir| 80 --
 .../regbankselect-mui-regbanklegalize.mir | 69 ---
 .../regbankselect-mui-regbankselect.mir   | 18 ++--
 .../AMDGPU/GlobalISel/regbankselect-mui.ll| 84 ++-
 .../AMDGPU/GlobalISel/regbankselect-mui.mir   | 51 ++-
 9 files changed, 191 insertions(+), 178 deletions(-)

diff --git a/llvm/lib/CodeGen/MachineSSAContext.cpp b/llvm/lib/CodeGen/MachineSSAContext.cpp
index e384187b6e8593..8e13c0916dd9e1 100644
--- a/llvm/lib/CodeGen/MachineSSAContext.cpp
+++ b/llvm/lib/CodeGen/MachineSSAContext.cpp
@@ -54,9 +54,34 @@ const MachineBasicBlock *MachineSSAContext::getDefBlock(Register value) const {
   return F->getRegInfo().getVRegDef(value)->getParent();
 }
 
+static bool isUndef(const MachineInstr &MI) {
+  return MI.getOpcode() == TargetOpcode::G_IMPLICIT_DEF ||
+ MI.getOpcode() == TargetOpcode::IMPLICIT_DEF;
+}
+
+/// MachineInstr equivalent of PHINode::hasConstantOrUndefValue() for G_PHI.
 template <>
 bool MachineSSAContext::isConstantOrUndefValuePhi(const MachineInstr &Phi) {
-  return Phi.isConstantValuePHI();
+  if (!Phi.isPHI())
+return false;
+
+  // In later passes PHI may appear with an undef operand, getVRegDef can fail.
+  if (Phi.getOpcode() == TargetOpcode::PHI)
+return Phi.isConstantValuePHI();
+
+  // For G_PHI we do equivalent of PHINode::hasConstantOrUndefValue().
+  const MachineRegisterInfo &MRI = Phi.getMF()->getRegInfo();
+  Register This = Phi.getOperand(0).getReg();
+  Register ConstantValue;
+  for (unsigned i = 1, e = Phi.getNumOperands(); i < e; i += 2) {
+Register Incoming = Phi.getOperand(i).getReg();
+if (Incoming != This && !isUndef(*MRI.getVRegDef(Incoming))) {
+  if (ConstantValue && ConstantValue != Incoming)
+return false;
+  ConstantValue = Incoming;
+}
+  }
+  return true;
 }
 
 template <>
diff --git 
a/llvm/test/Analysis/UniformityAnalysis/AMDGPU/MIR/hidden-diverge-gmir.mir 
b/llvm/test/Analysis/UniformityAnalysis/AMDGPU/MIR/hidden-diverge-gmir.mir
index ce00edf3363f77..9694a340b5e906 100644
--- a/llvm/test/Analysis/UniformityAnalysis/AMDGPU/MIR/hidden-diverge-gmir.mir
+++ b/llvm/test/Analysis/UniformityAnalysis/AMDGPU/MIR/hidden-diverge-gmir.mir
@@ -1,24 +1,24 @@
 # RUN: llc -mtriple=amdgcn-- -run-pass=print-machine-uniformity -o - %s 2>&1 | 
FileCheck %s
 # CHECK-LABEL: MachineUniformityInfo for function: hidden_diverge
 # CHECK-LABEL: BLOCK bb.0
-# CHECK: DIVERGENT: %{{[0-9]*}}: %{{[0-9]*}}:_(s32) = G_INTRINSIC 
intrinsic(@llvm.amdgcn.workitem.id.x)
-# CHECK: DIVERGENT: %{{[0-9]*}}: %{{[0-9]*}}:_(s1) = G_ICMP intpred(slt)
-# CHECK: DIVERGENT: %{{[0-9]*}}: %{{[0-9]*}}:_(s1) = G_XOR %{{[0-9]*}}:_, 
%{{[0-9]*}}:_
-# CHECK: DIVERGENT: %{{[0-9]*}}: %{{[0-9]*}}:_(s1), %{{[0-9]*}}:_(s64) = 
G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.if)
-# CHECK: DIVERGENT: %{{[0-9]*}}: %{{[0-9]*}}:_(s1), %{{[0-9]*}}:_(s64) = 
G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.if)
-# CHECK: DIVERGENT: G_BRCOND %{{[0-9]*}}:_(s1), %bb.1
-# CHECK: DIVERGENT: G_BR %bb.2
+# CHECK: DIVERGENT: %{{[0-9]*}}: %{{[0-9]*}}:_(s32) = G_INTRINSIC 
intrinsic(@llvm.amdgcn.workitem.id.x)
+# CHECK: DIVERGENT: %{{[0-9]*}}: %{{[0-9]*}}:_(s1) = G_ICMP intpred(slt)
+# CHECK: DIVERGENT: %{{[0-9]*}}: %{{[0-9]*}}:_(s1) = G_XOR %{{[0-9]*}}:_, 
%{{[0-9]*}}:_
+# CHECK: DIVERGENT: %{{[0-9]*}}: %{{[0-9]*}}:_(s1), %{{[0-9]*}}:_(s64) = 
G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.if)
+# CHECK: DIVERGENT: %{{[0-9]*}}: %{{[0-9]*}}:_(s1), %{{[0-9]*}}:_(s64) = 
G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.if)
+# CHECK: DIVERGENT: G_BRCOND %{{[0-9]*}}:_(s1), %bb.1
+# CHECK: DIVERGENT: G_BR %bb.2
 # CHECK-LABEL: BLOCK bb.1
 # CHECK-LABEL: BLOCK bb.2
-# CHECK: D
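
The G_PHI branch of isConstantOrUndefValuePhi in the patch above can be read as: ignore incoming values that are the phi's own result or are undef, and require every remaining incoming value to be the same register. A minimal standalone model of that loop follows. `Incoming` and `isConstantOrUndefPhi` are hypothetical names for this sketch, and registers are modeled as plain unsigned numbers rather than LLVM's Register class.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical model of one incoming phi operand.
struct Incoming {
  unsigned Reg; // virtual register number of the incoming value
  bool IsUndef; // true if the value is defined by an implicit-def
};

// Returns true if, after skipping self-references and undef inputs, all
// remaining incoming values are one and the same register.
inline bool isConstantOrUndefPhi(unsigned Result,
                                 const std::vector<Incoming> &Ins) {
  std::optional<unsigned> Constant;
  for (const Incoming &In : Ins) {
    if (In.Reg == Result || In.IsUndef)
      continue; // does not constrain the phi's value
    if (Constant && *Constant != In.Reg)
      return false; // two distinct defined inputs
    Constant = In.Reg;
  }
  return true;
}
```

In this model a phi joining one defined value with an undef value counts as constant, which is the case the commit message describes for cycles with multiple exits.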

[llvm-branch-commits] [llvm] MachineUniformityAnalysis: Improve isConstantOrUndefValuePhi (PR #112866)

2025-01-24 Thread Petar Avramovic via llvm-branch-commits

https://github.com/petar-avramovic updated 
https://github.com/llvm/llvm-project/pull/112866

>From 87c8fc15b5b8ccb0b7d48065caa82cdbeddfeac5 Mon Sep 17 00:00:00 2001
From: Petar Avramovic 
Date: Thu, 31 Oct 2024 14:10:57 +0100
Subject: [PATCH] MachineUniformityAnalysis: Improve isConstantOrUndefValuePhi

Change the existing code for G_PHI to match what the LLVM-IR version does
via PHINode::hasConstantOrUndefValue. This is not safe for regular PHI,
since it may appear with an undef operand and getVRegDef can fail.
Most notably, this improves the number of values that can be allocated
to the sgpr register bank in AMDGPURegBankSelect.
A common case is phis that appear in structurize-cfg lowering
for cycles with multiple exits:
the undef incoming value comes from a block that reached the cycle exit
condition; if the other incoming value is uniform, keep the phi uniform even
though it joins values from a pair of blocks that are entered
via a divergent conditional branch.
---
 llvm/lib/CodeGen/MachineSSAContext.cpp| 27 +-
 .../AMDGPU/MIR/hidden-diverge-gmir.mir| 28 +++
 .../AMDGPU/MIR/hidden-loop-diverge.mir|  4 +-
 .../AMDGPU/MIR/uses-value-from-cycle.mir  |  8 +-
 .../GlobalISel/divergence-structurizer.mir| 80 --
 .../regbankselect-mui-regbanklegalize.mir | 69 ---
 .../regbankselect-mui-regbankselect.mir   | 18 ++--
 .../AMDGPU/GlobalISel/regbankselect-mui.ll| 84 ++-
 .../AMDGPU/GlobalISel/regbankselect-mui.mir   | 51 ++-
 9 files changed, 191 insertions(+), 178 deletions(-)

diff --git a/llvm/lib/CodeGen/MachineSSAContext.cpp b/llvm/lib/CodeGen/MachineSSAContext.cpp
index e384187b6e8593..8e13c0916dd9e1 100644
--- a/llvm/lib/CodeGen/MachineSSAContext.cpp
+++ b/llvm/lib/CodeGen/MachineSSAContext.cpp
@@ -54,9 +54,34 @@ const MachineBasicBlock *MachineSSAContext::getDefBlock(Register value) const {
   return F->getRegInfo().getVRegDef(value)->getParent();
 }
 
+static bool isUndef(const MachineInstr &MI) {
+  return MI.getOpcode() == TargetOpcode::G_IMPLICIT_DEF ||
+ MI.getOpcode() == TargetOpcode::IMPLICIT_DEF;
+}
+
+/// MachineInstr equivalent of PHINode::hasConstantOrUndefValue() for G_PHI.
 template <>
 bool MachineSSAContext::isConstantOrUndefValuePhi(const MachineInstr &Phi) {
-  return Phi.isConstantValuePHI();
+  if (!Phi.isPHI())
+return false;
+
+  // In later passes PHI may appear with an undef operand, getVRegDef can fail.
+  if (Phi.getOpcode() == TargetOpcode::PHI)
+return Phi.isConstantValuePHI();
+
+  // For G_PHI we do equivalent of PHINode::hasConstantOrUndefValue().
+  const MachineRegisterInfo &MRI = Phi.getMF()->getRegInfo();
+  Register This = Phi.getOperand(0).getReg();
+  Register ConstantValue;
+  for (unsigned i = 1, e = Phi.getNumOperands(); i < e; i += 2) {
+Register Incoming = Phi.getOperand(i).getReg();
+if (Incoming != This && !isUndef(*MRI.getVRegDef(Incoming))) {
+  if (ConstantValue && ConstantValue != Incoming)
+return false;
+  ConstantValue = Incoming;
+}
+  }
+  return true;
 }
 
 template <>
diff --git 
a/llvm/test/Analysis/UniformityAnalysis/AMDGPU/MIR/hidden-diverge-gmir.mir 
b/llvm/test/Analysis/UniformityAnalysis/AMDGPU/MIR/hidden-diverge-gmir.mir
index ce00edf3363f77..9694a340b5e906 100644
--- a/llvm/test/Analysis/UniformityAnalysis/AMDGPU/MIR/hidden-diverge-gmir.mir
+++ b/llvm/test/Analysis/UniformityAnalysis/AMDGPU/MIR/hidden-diverge-gmir.mir
@@ -1,24 +1,24 @@
 # RUN: llc -mtriple=amdgcn-- -run-pass=print-machine-uniformity -o - %s 2>&1 | 
FileCheck %s
 # CHECK-LABEL: MachineUniformityInfo for function: hidden_diverge
 # CHECK-LABEL: BLOCK bb.0
-# CHECK: DIVERGENT: %{{[0-9]*}}: %{{[0-9]*}}:_(s32) = G_INTRINSIC 
intrinsic(@llvm.amdgcn.workitem.id.x)
-# CHECK: DIVERGENT: %{{[0-9]*}}: %{{[0-9]*}}:_(s1) = G_ICMP intpred(slt)
-# CHECK: DIVERGENT: %{{[0-9]*}}: %{{[0-9]*}}:_(s1) = G_XOR %{{[0-9]*}}:_, 
%{{[0-9]*}}:_
-# CHECK: DIVERGENT: %{{[0-9]*}}: %{{[0-9]*}}:_(s1), %{{[0-9]*}}:_(s64) = 
G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.if)
-# CHECK: DIVERGENT: %{{[0-9]*}}: %{{[0-9]*}}:_(s1), %{{[0-9]*}}:_(s64) = 
G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.if)
-# CHECK: DIVERGENT: G_BRCOND %{{[0-9]*}}:_(s1), %bb.1
-# CHECK: DIVERGENT: G_BR %bb.2
+# CHECK: DIVERGENT: %{{[0-9]*}}: %{{[0-9]*}}:_(s32) = G_INTRINSIC 
intrinsic(@llvm.amdgcn.workitem.id.x)
+# CHECK: DIVERGENT: %{{[0-9]*}}: %{{[0-9]*}}:_(s1) = G_ICMP intpred(slt)
+# CHECK: DIVERGENT: %{{[0-9]*}}: %{{[0-9]*}}:_(s1) = G_XOR %{{[0-9]*}}:_, 
%{{[0-9]*}}:_
+# CHECK: DIVERGENT: %{{[0-9]*}}: %{{[0-9]*}}:_(s1), %{{[0-9]*}}:_(s64) = 
G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.if)
+# CHECK: DIVERGENT: %{{[0-9]*}}: %{{[0-9]*}}:_(s1), %{{[0-9]*}}:_(s64) = 
G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.if)
+# CHECK: DIVERGENT: G_BRCOND %{{[0-9]*}}:_(s1), %bb.1
+# CHECK: DIVERGENT: G_BR %bb.2
 # CHECK-LABEL: BLOCK bb.1
 # CHECK-LABEL: BLOCK bb.2
-# CHECK: D

[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: RegBankLegalize rules for load (PR #112882)

2025-01-24 Thread Petar Avramovic via llvm-branch-commits

https://github.com/petar-avramovic updated 
https://github.com/llvm/llvm-project/pull/112882

>From 8cb73f44dd58c897ad3acde5e29014a21ea38ea4 Mon Sep 17 00:00:00 2001
From: Petar Avramovic 
Date: Thu, 23 Jan 2025 13:35:07 +0100
Subject: [PATCH] AMDGPU/GlobalISel: RegBankLegalize rules for load

Add IDs for bit width that cover multiple LLTs: B32 B64 etc.
"Predicate" wrapper class for bool predicate functions used to
write pretty rules. Predicates can be combined using &&, || and !.
Lowering for splitting and widening loads.
Write rules for loads to not change existing mir tests from old
regbankselect.
---
 .../AMDGPU/AMDGPURegBankLegalizeHelper.cpp| 288 +++-
 .../AMDGPU/AMDGPURegBankLegalizeHelper.h  |   5 +
 .../AMDGPU/AMDGPURegBankLegalizeRules.cpp | 278 ++-
 .../AMDGPU/AMDGPURegBankLegalizeRules.h   |  65 +++-
 .../AMDGPU/GlobalISel/regbankselect-load.mir  | 320 +++---
 .../GlobalISel/regbankselect-zextload.mir |   9 +-
 6 files changed, 900 insertions(+), 65 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.cpp 
b/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.cpp
index d27fa1f62538b6..3c007987b84947 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.cpp
@@ -50,6 +50,83 @@ void 
RegBankLegalizeHelper::findRuleAndApplyMapping(MachineInstr &MI) {
   lower(MI, Mapping, WaterfallSgprs);
 }
 
+void RegBankLegalizeHelper::splitLoad(MachineInstr &MI,
+  ArrayRef LLTBreakdown, LLT MergeTy) {
+  MachineFunction &MF = B.getMF();
+  assert(MI.getNumMemOperands() == 1);
+  MachineMemOperand &BaseMMO = **MI.memoperands_begin();
+  Register Dst = MI.getOperand(0).getReg();
+  const RegisterBank *DstRB = MRI.getRegBankOrNull(Dst);
+  Register Base = MI.getOperand(1).getReg();
+  LLT PtrTy = MRI.getType(Base);
+  const RegisterBank *PtrRB = MRI.getRegBankOrNull(Base);
+  LLT OffsetTy = LLT::scalar(PtrTy.getSizeInBits());
+  SmallVector LoadPartRegs;
+
+  unsigned ByteOffset = 0;
+  for (LLT PartTy : LLTBreakdown) {
+Register BasePlusOffset;
+if (ByteOffset == 0) {
+  BasePlusOffset = Base;
+} else {
+  auto Offset = B.buildConstant({PtrRB, OffsetTy}, ByteOffset);
+  BasePlusOffset = B.buildPtrAdd({PtrRB, PtrTy}, Base, Offset).getReg(0);
+}
+auto *OffsetMMO = MF.getMachineMemOperand(&BaseMMO, ByteOffset, PartTy);
+auto LoadPart = B.buildLoad({DstRB, PartTy}, BasePlusOffset, *OffsetMMO);
+LoadPartRegs.push_back(LoadPart.getReg(0));
+ByteOffset += PartTy.getSizeInBytes();
+  }
+
+  if (!MergeTy.isValid()) {
+// Loads are of same size, concat or merge them together.
+B.buildMergeLikeInstr(Dst, LoadPartRegs);
+  } else {
+// Loads are not all of same size, need to unmerge them to smaller pieces
+// of MergeTy type, then merge pieces to Dst.
+SmallVector MergeTyParts;
+for (Register Reg : LoadPartRegs) {
+  if (MRI.getType(Reg) == MergeTy) {
+MergeTyParts.push_back(Reg);
+  } else {
+auto Unmerge = B.buildUnmerge({DstRB, MergeTy}, Reg);
+for (unsigned i = 0; i < Unmerge->getNumOperands() - 1; ++i)
+  MergeTyParts.push_back(Unmerge.getReg(i));
+  }
+}
+B.buildMergeLikeInstr(Dst, MergeTyParts);
+  }
+  MI.eraseFromParent();
+}
+
+void RegBankLegalizeHelper::widenLoad(MachineInstr &MI, LLT WideTy,
+  LLT MergeTy) {
+  MachineFunction &MF = B.getMF();
+  assert(MI.getNumMemOperands() == 1);
+  MachineMemOperand &BaseMMO = **MI.memoperands_begin();
+  Register Dst = MI.getOperand(0).getReg();
+  const RegisterBank *DstRB = MRI.getRegBankOrNull(Dst);
+  Register Base = MI.getOperand(1).getReg();
+
+  MachineMemOperand *WideMMO = MF.getMachineMemOperand(&BaseMMO, 0, WideTy);
+  auto WideLoad = B.buildLoad({DstRB, WideTy}, Base, *WideMMO);
+
+  if (WideTy.isScalar()) {
+B.buildTrunc(Dst, WideLoad);
+  } else {
+SmallVector MergeTyParts;
+auto Unmerge = B.buildUnmerge({DstRB, MergeTy}, WideLoad);
+
+LLT DstTy = MRI.getType(Dst);
+unsigned NumElts = DstTy.getSizeInBits() / MergeTy.getSizeInBits();
+for (unsigned i = 0; i < NumElts; ++i) {
+  MergeTyParts.push_back(Unmerge.getReg(i));
+}
+B.buildMergeLikeInstr(Dst, MergeTyParts);
+  }
+  MI.eraseFromParent();
+}
+
 void RegBankLegalizeHelper::lower(MachineInstr &MI,
   const RegBankLLTMapping &Mapping,
   SmallSet &WaterfallSgprs) {
@@ -128,6 +205,54 @@ void RegBankLegalizeHelper::lower(MachineInstr &MI,
 MI.eraseFromParent();
 break;
   }
+  case SplitLoad: {
+LLT DstTy = MRI.getType(MI.getOperand(0).getReg());
+unsigned Size = DstTy.getSizeInBits();
+// Even split to 128-bit loads
+if (Size > 128) {
+  LLT B128;
+  if (DstTy.isVector()) {
+LLT EltTy = DstTy.getElementType();
+B128 = LLT::f
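
As a reading aid for splitLoad above: each part of the breakdown is loaded at a running byte offset from the base pointer, and when the parts are not all the same size they are unmerged into MergeTy-sized pieces before the final merge. The arithmetic can be sketched in isolation as below. `partByteOffsets` and `mergePieceCount` are hypothetical helpers, sizes are plain bit counts instead of LLTs, and the sketch assumes every part is a whole number of bytes and a multiple of the merge size.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Byte offset at which each part would be loaded, given part sizes in bits.
inline std::vector<uint64_t>
partByteOffsets(const std::vector<unsigned> &PartBits) {
  std::vector<uint64_t> Offsets;
  uint64_t ByteOffset = 0;
  for (unsigned Bits : PartBits) {
    assert(Bits % 8 == 0 && "sketch assumes byte-sized parts");
    Offsets.push_back(ByteOffset); // first part uses the base pointer as is
    ByteOffset += Bits / 8;
  }
  return Offsets;
}

// Number of MergeBits-sized pieces that feed the final merge when each
// part is either already of that size or unmerged down to it.
inline unsigned mergePieceCount(const std::vector<unsigned> &PartBits,
                                unsigned MergeBits) {
  unsigned Pieces = 0;
  for (unsigned Bits : PartBits) {
    assert(Bits % MergeBits == 0 && "sketch assumes exact division");
    Pieces += Bits / MergeBits;
  }
  return Pieces;
}
```

For example, a 96-bit load broken into a 64-bit and a 32-bit part would, under these assumptions, load at byte offsets 0 and 8 and produce three 32-bit pieces for the merge.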

[llvm-branch-commits] [llvm] [Coro] Use DebugInfoCache to speed up cloning in CoroSplitPass (PR #118630)

2025-01-24 Thread Artem Pianykh via llvm-branch-commits

https://github.com/artempyanykh edited 
https://github.com/llvm/llvm-project/pull/118630


[llvm-branch-commits] [llvm] [Analysis] Add DebugInfoCache analysis (PR #118629)

2025-01-24 Thread Artem Pianykh via llvm-branch-commits

https://github.com/artempyanykh edited 
https://github.com/llvm/llvm-project/pull/118629


[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: RegBankLegalize rules for load (PR #112882)

2025-01-24 Thread Petar Avramovic via llvm-branch-commits

https://github.com/petar-avramovic updated 
https://github.com/llvm/llvm-project/pull/112882

>From 0030251f71c08000c1b4ff123cc401b70c72014f Mon Sep 17 00:00:00 2001
From: Petar Avramovic 
Date: Thu, 23 Jan 2025 13:35:07 +0100
Subject: [PATCH] AMDGPU/GlobalISel: RegBankLegalize rules for load

Add IDs for bit width that cover multiple LLTs: B32 B64 etc.
"Predicate" wrapper class for bool predicate functions used to
write pretty rules. Predicates can be combined using &&, || and !.
Lowering for splitting and widening loads.
Write rules for loads to not change existing mir tests from old
regbankselect.
---
 .../AMDGPU/AMDGPURegBankLegalizeHelper.cpp| 288 +++-
 .../AMDGPU/AMDGPURegBankLegalizeHelper.h  |   5 +
 .../AMDGPU/AMDGPURegBankLegalizeRules.cpp | 278 ++-
 .../AMDGPU/AMDGPURegBankLegalizeRules.h   |  65 +++-
 .../AMDGPU/GlobalISel/regbankselect-load.mir  | 320 +++---
 .../GlobalISel/regbankselect-zextload.mir |   9 +-
 6 files changed, 900 insertions(+), 65 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.cpp 
b/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.cpp
index d27fa1f62538b6..3c007987b84947 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.cpp
@@ -50,6 +50,83 @@ void 
RegBankLegalizeHelper::findRuleAndApplyMapping(MachineInstr &MI) {
   lower(MI, Mapping, WaterfallSgprs);
 }
 
+void RegBankLegalizeHelper::splitLoad(MachineInstr &MI,
+  ArrayRef LLTBreakdown, LLT MergeTy) {
+  MachineFunction &MF = B.getMF();
+  assert(MI.getNumMemOperands() == 1);
+  MachineMemOperand &BaseMMO = **MI.memoperands_begin();
+  Register Dst = MI.getOperand(0).getReg();
+  const RegisterBank *DstRB = MRI.getRegBankOrNull(Dst);
+  Register Base = MI.getOperand(1).getReg();
+  LLT PtrTy = MRI.getType(Base);
+  const RegisterBank *PtrRB = MRI.getRegBankOrNull(Base);
+  LLT OffsetTy = LLT::scalar(PtrTy.getSizeInBits());
+  SmallVector LoadPartRegs;
+
+  unsigned ByteOffset = 0;
+  for (LLT PartTy : LLTBreakdown) {
+Register BasePlusOffset;
+if (ByteOffset == 0) {
+  BasePlusOffset = Base;
+} else {
+  auto Offset = B.buildConstant({PtrRB, OffsetTy}, ByteOffset);
+  BasePlusOffset = B.buildPtrAdd({PtrRB, PtrTy}, Base, Offset).getReg(0);
+}
+auto *OffsetMMO = MF.getMachineMemOperand(&BaseMMO, ByteOffset, PartTy);
+auto LoadPart = B.buildLoad({DstRB, PartTy}, BasePlusOffset, *OffsetMMO);
+LoadPartRegs.push_back(LoadPart.getReg(0));
+ByteOffset += PartTy.getSizeInBytes();
+  }
+
+  if (!MergeTy.isValid()) {
+// Loads are of same size, concat or merge them together.
+B.buildMergeLikeInstr(Dst, LoadPartRegs);
+  } else {
+// Loads are not all of same size, need to unmerge them to smaller pieces
+// of MergeTy type, then merge pieces to Dst.
+SmallVector MergeTyParts;
+for (Register Reg : LoadPartRegs) {
+  if (MRI.getType(Reg) == MergeTy) {
+MergeTyParts.push_back(Reg);
+  } else {
+auto Unmerge = B.buildUnmerge({DstRB, MergeTy}, Reg);
+for (unsigned i = 0; i < Unmerge->getNumOperands() - 1; ++i)
+  MergeTyParts.push_back(Unmerge.getReg(i));
+  }
+}
+B.buildMergeLikeInstr(Dst, MergeTyParts);
+  }
+  MI.eraseFromParent();
+}
+
+void RegBankLegalizeHelper::widenLoad(MachineInstr &MI, LLT WideTy,
+  LLT MergeTy) {
+  MachineFunction &MF = B.getMF();
+  assert(MI.getNumMemOperands() == 1);
+  MachineMemOperand &BaseMMO = **MI.memoperands_begin();
+  Register Dst = MI.getOperand(0).getReg();
+  const RegisterBank *DstRB = MRI.getRegBankOrNull(Dst);
+  Register Base = MI.getOperand(1).getReg();
+
+  MachineMemOperand *WideMMO = MF.getMachineMemOperand(&BaseMMO, 0, WideTy);
+  auto WideLoad = B.buildLoad({DstRB, WideTy}, Base, *WideMMO);
+
+  if (WideTy.isScalar()) {
+B.buildTrunc(Dst, WideLoad);
+  } else {
+SmallVector MergeTyParts;
+auto Unmerge = B.buildUnmerge({DstRB, MergeTy}, WideLoad);
+
+LLT DstTy = MRI.getType(Dst);
+unsigned NumElts = DstTy.getSizeInBits() / MergeTy.getSizeInBits();
+for (unsigned i = 0; i < NumElts; ++i) {
+  MergeTyParts.push_back(Unmerge.getReg(i));
+}
+B.buildMergeLikeInstr(Dst, MergeTyParts);
+  }
+  MI.eraseFromParent();
+}
+
 void RegBankLegalizeHelper::lower(MachineInstr &MI,
   const RegBankLLTMapping &Mapping,
   SmallSet &WaterfallSgprs) {
@@ -128,6 +205,54 @@ void RegBankLegalizeHelper::lower(MachineInstr &MI,
 MI.eraseFromParent();
 break;
   }
+  case SplitLoad: {
+LLT DstTy = MRI.getType(MI.getOperand(0).getReg());
+unsigned Size = DstTy.getSizeInBits();
+// Even split to 128-bit loads
+if (Size > 128) {
+  LLT B128;
+  if (DstTy.isVector()) {
+LLT EltTy = DstTy.getElementType();
+B128 = LLT::f
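
The commit message mentions a "Predicate" wrapper whose instances can be combined using &&, || and !. The class itself is not visible in this excerpt, so the following is only a guess at the general shape of such a wrapper, written as a standalone template over an arbitrary argument type. Every name here is hypothetical and it should not be read as the definition in AMDGPURegBankLegalizeRules.h.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical sketch of a composable boolean predicate over values of T.
template <typename T> class Predicate {
  std::function<bool(const T &)> Fn;

public:
  Predicate(std::function<bool(const T &)> F) : Fn(std::move(F)) {}

  // Evaluate the predicate on a value.
  bool operator()(const T &V) const { return Fn(V); }

  // Each combinator builds a new predicate that evaluates its operands
  // lazily. The built-in && and || inside the lambdas still short-circuit.
  Predicate operator&&(const Predicate &RHS) const {
    Predicate LHS = *this;
    return Predicate([LHS, RHS](const T &V) { return LHS(V) && RHS(V); });
  }
  Predicate operator||(const Predicate &RHS) const {
    Predicate LHS = *this;
    return Predicate([LHS, RHS](const T &V) { return LHS(V) || RHS(V); });
  }
  Predicate operator!() const {
    Predicate Inner = *this;
    return Predicate([Inner](const T &V) { return !Inner(V); });
  }
};
```

With two predicates `IsEven` and `IsPositive` over int, an expression such as `IsEven && !IsPositive` yields a new predicate that can be stored in a rule table and evaluated later, which is presumably what makes the rules read "pretty".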

[llvm-branch-commits] [clang] [Clang][CWG2369] Implement GCC's heuristic for DR 2369 (PR #124231)

2025-01-24 Thread Younan Zhang via llvm-branch-commits

https://github.com/zyn0217 updated 
https://github.com/llvm/llvm-project/pull/124231

>From f766c8c099cf8f1bc076c0308afc3a2832a5b495 Mon Sep 17 00:00:00 2001
From: Younan Zhang 
Date: Fri, 24 Jan 2025 13:52:37 +0800
Subject: [PATCH] Implement GCC's CWG 2369 heuristic

---
 clang/include/clang/Sema/Sema.h   |   9 +-
 clang/lib/Sema/SemaOverload.cpp   |  62 ++-
 clang/lib/Sema/SemaTemplateDeduction.cpp  |  14 +-
 .../SemaTemplate/concepts-recursive-inst.cpp  | 169 ++
 4 files changed, 241 insertions(+), 13 deletions(-)

diff --git a/clang/include/clang/Sema/Sema.h b/clang/include/clang/Sema/Sema.h
index 87d9a335763e31..99ca65159106b5 100644
--- a/clang/include/clang/Sema/Sema.h
+++ b/clang/include/clang/Sema/Sema.h
@@ -10236,7 +10236,8 @@ class Sema final : public SemaBase {
   FunctionTemplateDecl *FunctionTemplate, ArrayRef ParamTypes,
   ArrayRef Args, OverloadCandidateSet &CandidateSet,
   ConversionSequenceList &Conversions, bool SuppressUserConversions,
-  CXXRecordDecl *ActingContext = nullptr, QualType ObjectType = QualType(),
+  bool NonInstOnly, CXXRecordDecl *ActingContext = nullptr,
+  QualType ObjectType = QualType(),
   Expr::Classification ObjectClassification = {},
   OverloadCandidateParamOrder PO = {});
 
@@ -12272,7 +12273,9 @@ class Sema final : public SemaBase {
   sema::TemplateDeductionInfo &Info,
   SmallVectorImpl const *OriginalCallArgs = nullptr,
   bool PartialOverloading = false,
-  llvm::function_ref CheckNonDependent = [] { return false; });
+  llvm::function_ref CheckNonDependent = [](bool) {
+return false;
+  });
 
   /// Perform template argument deduction from a function call
   /// (C++ [temp.deduct.call]).
@@ -12306,7 +12309,7 @@ class Sema final : public SemaBase {
   FunctionDecl *&Specialization, sema::TemplateDeductionInfo &Info,
   bool PartialOverloading, bool AggregateDeductionCandidate,
   QualType ObjectType, Expr::Classification ObjectClassification,
-  llvm::function_ref)> CheckNonDependent);
+  llvm::function_ref, bool)> CheckNonDependent);
 
   /// Deduce template arguments when taking the address of a function
   /// template (C++ [temp.deduct.funcaddr]) or matching a specialization to
diff --git a/clang/lib/Sema/SemaOverload.cpp b/clang/lib/Sema/SemaOverload.cpp
index 3be9ade80f1d94..c2baa75c09bce9 100644
--- a/clang/lib/Sema/SemaOverload.cpp
+++ b/clang/lib/Sema/SemaOverload.cpp
@@ -7733,10 +7733,10 @@ void Sema::AddMethodTemplateCandidate(
   MethodTmpl, ExplicitTemplateArgs, Args, Specialization, Info,
   PartialOverloading, /*AggregateDeductionCandidate=*/false, ObjectType,
   ObjectClassification,
-  [&](ArrayRef ParamTypes) {
+  [&](ArrayRef ParamTypes, bool NonInstOnly) {
 return CheckNonDependentConversions(
 MethodTmpl, ParamTypes, Args, CandidateSet, Conversions,
-SuppressUserConversions, ActingContext, ObjectType,
+SuppressUserConversions, NonInstOnly, ActingContext, 
ObjectType,
 ObjectClassification, PO);
   });
   Result != TemplateDeductionResult::Success) {
@@ -7818,10 +7818,11 @@ void Sema::AddTemplateOverloadCandidate(
   PartialOverloading, AggregateCandidateDeduction,
   /*ObjectType=*/QualType(),
   /*ObjectClassification=*/Expr::Classification(),
-  [&](ArrayRef<QualType> ParamTypes) {
+  [&](ArrayRef<QualType> ParamTypes, bool NonInstOnly) {
 return CheckNonDependentConversions(
 FunctionTemplate, ParamTypes, Args, CandidateSet, Conversions,
-SuppressUserConversions, nullptr, QualType(), {}, PO);
+SuppressUserConversions, NonInstOnly, nullptr, QualType(), {},
+PO);
   });
   Result != TemplateDeductionResult::Success) {
 OverloadCandidate &Candidate =
@@ -7863,7 +7864,7 @@ bool Sema::CheckNonDependentConversions(
 FunctionTemplateDecl *FunctionTemplate, ArrayRef<QualType> ParamTypes,
 ArrayRef<Expr *> Args, OverloadCandidateSet &CandidateSet,
 ConversionSequenceList &Conversions, bool SuppressUserConversions,
-CXXRecordDecl *ActingContext, QualType ObjectType,
+bool NonInstOnly, CXXRecordDecl *ActingContext, QualType ObjectType,
 Expr::Classification ObjectClassification, OverloadCandidateParamOrder PO) {
   // FIXME: The cases in which we allow explicit conversions for constructor
   // arguments never consider calling a constructor template. It's not clear
@@ -7900,6 +7901,54 @@ bool Sema::CheckNonDependentConversions(
 }
   }
 
+  // A heuristic & speculative workaround for bug
+  // https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99599 that manifests after
+  // CWG2369.
+  auto ConversionMightInduceInstantiation = [&](QualType ParmType,
+QualType ArgType) {
+ParmType = ParmType.getNonReferenceType();
+  

[llvm-branch-commits] [llvm] [Coro] Use DebugInfoCache to speed up cloning in CoroSplitPass (PR #118630)

2025-01-24 Thread Artem Pianykh via llvm-branch-commits

https://github.com/artempyanykh updated 
https://github.com/llvm/llvm-project/pull/118630

>From 27e99070e3694c4bdb4b71fcdfa5c6153b8b6d1e Mon Sep 17 00:00:00 2001
From: Artem Pianykh 
Date: Sun, 15 Sep 2024 11:00:00 -0700
Subject: [PATCH] [Coro] Use DebugInfoCache to speed up cloning in
 CoroSplitPass

Summary:
We can use a DebugInfoFinder from DebugInfoCache, which is already primed on a
compile unit, to speed up collection of module-level debug info.

The pass could likely be another 2x+ faster if we avoid rebuilding the set of
common debug info. This needs further massaging of CloneFunction and
ValueMapper, though, and can be done incrementally on top of this.
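
The caching idea described above can be pictured with a small stand-alone
sketch. It does not use LLVM's API: `UnitId`, `DebugInfoSet`,
`collectDebugInfo`, `walkCount`, and `FinderCache` are hypothetical names
invented for illustration, assuming only that collecting debug info for one
compile unit is expensive and that the result can be reused for every clone
made from that unit.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-ins; none of these names come from LLVM.
using UnitId = int;
using DebugInfoSet = std::vector<std::string>;

// Counts how often the expensive walk runs, so the effect of caching is
// observable.
inline std::size_t &walkCount() {
  static std::size_t Count = 0;
  return Count;
}

// Pretend this walks every debug-info node reachable from a compile unit.
inline DebugInfoSet collectDebugInfo(UnitId Unit) {
  ++walkCount();
  return {"unit" + std::to_string(Unit) + ".types",
          "unit" + std::to_string(Unit) + ".scopes"};
}

// Primed at most once per compile unit; later lookups reuse the stored result.
class FinderCache {
public:
  const DebugInfoSet &get(UnitId Unit) {
    auto It = Cache.find(Unit);
    if (It == Cache.end())
      It = Cache.emplace(Unit, collectDebugInfo(Unit)).first;
    return It->second;
  }

private:
  std::unordered_map<UnitId, DebugInfoSet> Cache;
};
```

Calling `get` repeatedly for the same unit runs the walk only once, which is
the kind of saving the summary attributes to reusing an already primed finder.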

Comparing performance of CoroSplitPass at various points in this stack, this is
anecdata from a sample cpp file compiled with full debug info:

|                 | Baseline | IdentityMD set | Prebuilt CommonDI | Cached CU DIFinder (cur.) |
|-----------------|----------|----------------|-------------------|---------------------------|
| CoroSplitPass   | 306ms    | 221ms          | 68ms              | 17ms                      |
| CoroCloner      | 101ms    | 72ms           | 0.5ms             | 0.5ms                     |
| CollectCommonDI | -        | -              | 63ms              | 13ms                      |
| Speed up        | 1x       | 1.4x           | 4.5x              | 18x                       |

Test Plan:
ninja check-llvm-unit
ninja check-llvm

Compiled a sample cpp file with time trace to get the avg. duration of the pass
and inner scopes.

stack-info: PR: https://github.com/llvm/llvm-project/pull/118630, branch: users/artempyanykh/fast-coro-upstream/11
---
 llvm/include/llvm/Transforms/Coroutines/ABI.h | 13 +++--
 llvm/lib/Analysis/CGSCCPassManager.cpp|  7 +++
 llvm/lib/Transforms/Coroutines/CoroSplit.cpp  | 55 +++
 llvm/test/Other/new-pass-manager.ll   |  1 +
 llvm/test/Other/new-pm-defaults.ll|  1 +
 llvm/test/Other/new-pm-lto-defaults.ll|  1 +
 llvm/test/Other/new-pm-pgo-preinline.ll   |  1 +
 .../Other/new-pm-thinlto-postlink-defaults.ll |  1 +
 .../new-pm-thinlto-postlink-pgo-defaults.ll   |  1 +
 ...-pm-thinlto-postlink-samplepgo-defaults.ll |  1 +
 .../Other/new-pm-thinlto-prelink-defaults.ll  |  1 +
 .../new-pm-thinlto-prelink-pgo-defaults.ll|  1 +
 ...w-pm-thinlto-prelink-samplepgo-defaults.ll |  1 +
 .../Analysis/CGSCCPassManagerTest.cpp |  4 +-
 14 files changed, 72 insertions(+), 17 deletions(-)

diff --git a/llvm/include/llvm/Transforms/Coroutines/ABI.h b/llvm/include/llvm/Transforms/Coroutines/ABI.h
index 0b2d405f3caec4..2cf614b6bb1e2a 100644
--- a/llvm/include/llvm/Transforms/Coroutines/ABI.h
+++ b/llvm/include/llvm/Transforms/Coroutines/ABI.h
@@ -15,6 +15,7 @@
 #ifndef LLVM_TRANSFORMS_COROUTINES_ABI_H
 #define LLVM_TRANSFORMS_COROUTINES_ABI_H
 
+#include "llvm/Analysis/DebugInfoCache.h"
 #include "llvm/Analysis/TargetTransformInfo.h"
 #include "llvm/Transforms/Coroutines/CoroShape.h"
 #include "llvm/Transforms/Coroutines/MaterializationUtils.h"
@@ -53,7 +54,8 @@ class BaseABI {
   // Perform the function splitting according to the ABI.
   virtual void splitCoroutine(Function &F, coro::Shape &Shape,
   SmallVectorImpl<Function *> &Clones,
-  TargetTransformInfo &TTI) = 0;
+  TargetTransformInfo &TTI,
+  const DebugInfoCache *DICache) = 0;
 
   Function &F;
   coro::Shape &Shape;
@@ -73,7 +75,8 @@ class SwitchABI : public BaseABI {
 
   void splitCoroutine(Function &F, coro::Shape &Shape,
   SmallVectorImpl<Function *> &Clones,
-  TargetTransformInfo &TTI) override;
+  TargetTransformInfo &TTI,
+  const DebugInfoCache *DICache) override;
 };
 
 class AsyncABI : public BaseABI {
@@ -86,7 +89,8 @@ class AsyncABI : public BaseABI {
 
   void splitCoroutine(Function &F, coro::Shape &Shape,
   SmallVectorImpl<Function *> &Clones,
-  TargetTransformInfo &TTI) override;
+  TargetTransformInfo &TTI,
+  const DebugInfoCache *DICache) override;
 };
 
 class AnyRetconABI : public BaseABI {
@@ -99,7 +103,8 @@ class AnyRetconABI : public BaseABI {
 
   void splitCoroutine(Function &F, coro::Shape &Shape,
   SmallVectorImpl<Function *> &Clones,
-  TargetTransformInfo &TTI) override;
+  TargetTransformInfo &TTI,
+  const DebugInfoCache *DICache) override;
 };
 
 } // end namespace coro
diff --git a/llvm/lib/Analysis/CGSCCPassManager.cpp b/llvm/lib/Analysis/CGSCCPassManager.cpp
index 948bc2435ab275..3ba085cdb0be8b 100644
--- a/llvm/lib/Analysis/CGSCCPassManager.cpp
+++ b/llvm/lib/Analysis/CGSCCPassManager.cpp
@@ -14,6 +14,7 @@
 #include "llvm/ADT/SmallPtrSet.h"
 #include "llvm/ADT/SmallVector.h"
 #include "llvm/ADT/iterator_range.h"
+#include "llvm/Analy

[llvm-branch-commits] [clang] [clang] fix template argument conversion (PR #124386)

2025-01-24 Thread Matheus Izvekov via llvm-branch-commits

https://github.com/mizvekov created 
https://github.com/llvm/llvm-project/pull/124386

Converted template arguments need to be converted again, if the corresponding 
template parameter changed, as different conversions might apply in that case.
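
As a language-level illustration of why the parameter matters (a minimal
sketch, not taken from the patch or its tests): a non-type template argument is
converted to whatever type the receiving parameter declares, so an argument
already converted for one parameter is converted afresh when it is passed on to
a different one. `TakesLong`, `TakesShort`, and `Forward` are hypothetical
names used only here.

```cpp
#include <type_traits>

// Hypothetical templates; the names are for illustration only.
template <long N> struct TakesLong {
  using value_type = long;
  static constexpr long value = N;
};

template <short N> struct TakesShort {
  using value_type = short;
  static constexpr short value = N;
  // N has already been converted to short for this parameter. Passing it on
  // to TakesLong converts it again, this time to long.
  using Forward = TakesLong<N>;
};

static_assert(std::is_same<TakesShort<7>::value_type, short>::value,
              "7 is converted to short for TakesShort");
static_assert(std::is_same<TakesShort<7>::Forward, TakesLong<7>>::value,
              "the forwarded argument names the same specialization");
static_assert(TakesShort<7>::Forward::value == 7L, "the value is preserved");
```

The sketch covers only the simple integral case; the patch itself concerns
Clang's handling of already-converted arguments during template argument
checking.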

>From 9b174f4505eaf19e0ccfb1ec905c8206bb575d4b Mon Sep 17 00:00:00 2001
From: Matheus Izvekov 
Date: Fri, 24 Jan 2025 19:25:38 -0300
Subject: [PATCH] [clang] fix template argument conversion

Converted template arguments need to be converted again, if
the corresponding template parameter changed, as different
conversions might apply in that case.
---
 clang/docs/ReleaseNotes.rst |   3 +
 clang/lib/Sema/SemaTemplate.cpp | 120 +---
 clang/lib/Sema/TreeTransform.h  |   6 +-
 clang/test/SemaTemplate/cwg2398.cpp |   8 --
 4 files changed, 80 insertions(+), 57 deletions(-)

diff --git a/clang/docs/ReleaseNotes.rst b/clang/docs/ReleaseNotes.rst
index b89d055304f4a6..27574924a14a92 100644
--- a/clang/docs/ReleaseNotes.rst
+++ b/clang/docs/ReleaseNotes.rst
@@ -993,6 +993,9 @@ Bug Fixes to C++ Support
 - Fix immediate escalation not propagating through inherited constructors.  (#GH112677)
 - Fixed assertions or false compiler diagnostics in the case of C++ modules for
   lambda functions or inline friend functions defined inside templates (#GH122493).
+- Fix template argument checking so that converted template arguments are
+  converted again. This fixes some issues with partial ordering involving
+  template template parameters with non-type template parameters.
 
 Bug Fixes to AST Handling
 ^^^^^^^^^^^^^^^^^^^^^^^^^
diff --git a/clang/lib/Sema/SemaTemplate.cpp b/clang/lib/Sema/SemaTemplate.cpp
index 210df2836eeb07..62c45f15dec54e 100644
--- a/clang/lib/Sema/SemaTemplate.cpp
+++ b/clang/lib/Sema/SemaTemplate.cpp
@@ -5199,7 +5199,7 @@ convertTypeTemplateArgumentToTemplate(ASTContext &Context, TypeLoc TLoc) {
 }
 
 bool Sema::CheckTemplateArgument(
-NamedDecl *Param, TemplateArgumentLoc &Arg, NamedDecl *Template,
+NamedDecl *Param, TemplateArgumentLoc &ArgLoc, NamedDecl *Template,
 SourceLocation TemplateLoc, SourceLocation RAngleLoc,
 unsigned ArgumentPackIndex,
 SmallVectorImpl<TemplateArgument> &SugaredConverted,
@@ -5208,9 +5208,10 @@ bool Sema::CheckTemplateArgument(
 bool PartialOrderingTTP, bool *MatchedPackOnParmToNonPackOnArg) {
   // Check template type parameters.
   if (TemplateTypeParmDecl *TTP = dyn_cast<TemplateTypeParmDecl>(Param))
-return CheckTemplateTypeArgument(TTP, Arg, SugaredConverted,
+return CheckTemplateTypeArgument(TTP, ArgLoc, SugaredConverted,
  CanonicalConverted);
 
+  const TemplateArgument &Arg = ArgLoc.getArgument();
   // Check non-type template parameters.
   if (NonTypeTemplateParmDecl *NTTP = dyn_cast<NonTypeTemplateParmDecl>(Param)) {
 // Do substitution on the type of the non-type template parameter
@@ -5252,63 +5253,89 @@ bool Sema::CheckTemplateArgument(
 return true;
 }
 
-switch (Arg.getArgument().getKind()) {
-case TemplateArgument::Null:
-  llvm_unreachable("Should never see a NULL template argument here");
-
-case TemplateArgument::Expression: {
-  Expr *E = Arg.getArgument().getAsExpr();
+auto checkExpr = [&](Expr *E) -> Expr * {
   TemplateArgument SugaredResult, CanonicalResult;
   unsigned CurSFINAEErrors = NumSFINAEErrors;
   ExprResult Res =
   CheckTemplateArgument(NTTP, NTTPType, E, SugaredResult,
 CanonicalResult, PartialOrderingTTP, CTAK);
-  if (Res.isInvalid())
-return true;
   // If the current template argument causes an error, give up now.
-  if (CurSFINAEErrors < NumSFINAEErrors)
-return true;
+  if (Res.isInvalid() || CurSFINAEErrors < NumSFINAEErrors)
+return nullptr;
+  SugaredConverted.push_back(SugaredResult);
+  CanonicalConverted.push_back(CanonicalResult);
+  return Res.get();
+};
+
+switch (Arg.getKind()) {
+case TemplateArgument::Null:
+  llvm_unreachable("Should never see a NULL template argument here");
 
+case TemplateArgument::Expression: {
+  Expr *E = Arg.getAsExpr();
+  Expr *R = checkExpr(E);
+  if (!R)
+return true;
   // If the resulting expression is new, then use it in place of the
   // old expression in the template argument.
-  if (Res.get() != E) {
-TemplateArgument TA(Res.get());
-Arg = TemplateArgumentLoc(TA, Res.get());
+  if (R != E) {
+TemplateArgument TA(R);
+ArgLoc = TemplateArgumentLoc(TA, R);
   }
-
-  SugaredConverted.push_back(SugaredResult);
-  CanonicalConverted.push_back(CanonicalResult);
   break;
 }
 
-case TemplateArgument::Declaration:
-case TemplateArgument::Integral:
+// As for the converted NTTP kinds, they still might need another
+// conversion, as the new corresponding parameter might be different.
+// Ideally, we would always perform substitution sta

[llvm-branch-commits] [clang] [clang] fix template argument conversion (PR #124386)

2025-01-24 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-clang

Author: Matheus Izvekov (mizvekov)


Changes

Converted template arguments need to be converted again, if the corresponding 
template parameter changed, as different conversions might apply in that case.

---
Full diff: https://github.com/llvm/llvm-project/pull/124386.diff


4 Files Affected:

- (modified) clang/docs/ReleaseNotes.rst (+3) 
- (modified) clang/lib/Sema/SemaTemplate.cpp (+73-47) 
- (modified) clang/lib/Sema/TreeTransform.h (+4-2) 
- (modified) clang/test/SemaTemplate/cwg2398.cpp (-8) 


``diff
diff --git a/clang/docs/ReleaseNotes.rst b/clang/docs/ReleaseNotes.rst
index b89d055304f4a6..27574924a14a92 100644
--- a/clang/docs/ReleaseNotes.rst
+++ b/clang/docs/ReleaseNotes.rst
@@ -993,6 +993,9 @@ Bug Fixes to C++ Support
 - Fix immediate escalation not propagating through inherited constructors.  (#GH112677)
 - Fixed assertions or false compiler diagnostics in the case of C++ modules for
   lambda functions or inline friend functions defined inside templates (#GH122493).
+- Fix template argument checking so that converted template arguments are
+  converted again. This fixes some issues with partial ordering involving
+  template template parameters with non-type template parameters.
 
 Bug Fixes to AST Handling
 ^^^^^^^^^^^^^^^^^^^^^^^^^
diff --git a/clang/lib/Sema/SemaTemplate.cpp b/clang/lib/Sema/SemaTemplate.cpp
index 210df2836eeb07..62c45f15dec54e 100644
--- a/clang/lib/Sema/SemaTemplate.cpp
+++ b/clang/lib/Sema/SemaTemplate.cpp
@@ -5199,7 +5199,7 @@ convertTypeTemplateArgumentToTemplate(ASTContext &Context, TypeLoc TLoc) {
 }
 
 bool Sema::CheckTemplateArgument(
-NamedDecl *Param, TemplateArgumentLoc &Arg, NamedDecl *Template,
+NamedDecl *Param, TemplateArgumentLoc &ArgLoc, NamedDecl *Template,
 SourceLocation TemplateLoc, SourceLocation RAngleLoc,
 unsigned ArgumentPackIndex,
 SmallVectorImpl<TemplateArgument> &SugaredConverted,
@@ -5208,9 +5208,10 @@ bool Sema::CheckTemplateArgument(
 bool PartialOrderingTTP, bool *MatchedPackOnParmToNonPackOnArg) {
   // Check template type parameters.
   if (TemplateTypeParmDecl *TTP = dyn_cast<TemplateTypeParmDecl>(Param))
-return CheckTemplateTypeArgument(TTP, Arg, SugaredConverted,
+return CheckTemplateTypeArgument(TTP, ArgLoc, SugaredConverted,
  CanonicalConverted);
 
+  const TemplateArgument &Arg = ArgLoc.getArgument();
   // Check non-type template parameters.
   if (NonTypeTemplateParmDecl *NTTP = dyn_cast<NonTypeTemplateParmDecl>(Param)) {
 // Do substitution on the type of the non-type template parameter
@@ -5252,63 +5253,89 @@ bool Sema::CheckTemplateArgument(
 return true;
 }
 
-switch (Arg.getArgument().getKind()) {
-case TemplateArgument::Null:
-  llvm_unreachable("Should never see a NULL template argument here");
-
-case TemplateArgument::Expression: {
-  Expr *E = Arg.getArgument().getAsExpr();
+auto checkExpr = [&](Expr *E) -> Expr * {
   TemplateArgument SugaredResult, CanonicalResult;
   unsigned CurSFINAEErrors = NumSFINAEErrors;
   ExprResult Res =
   CheckTemplateArgument(NTTP, NTTPType, E, SugaredResult,
 CanonicalResult, PartialOrderingTTP, CTAK);
-  if (Res.isInvalid())
-return true;
   // If the current template argument causes an error, give up now.
-  if (CurSFINAEErrors < NumSFINAEErrors)
-return true;
+  if (Res.isInvalid() || CurSFINAEErrors < NumSFINAEErrors)
+return nullptr;
+  SugaredConverted.push_back(SugaredResult);
+  CanonicalConverted.push_back(CanonicalResult);
+  return Res.get();
+};
+
+switch (Arg.getKind()) {
+case TemplateArgument::Null:
+  llvm_unreachable("Should never see a NULL template argument here");
 
+case TemplateArgument::Expression: {
+  Expr *E = Arg.getAsExpr();
+  Expr *R = checkExpr(E);
+  if (!R)
+return true;
   // If the resulting expression is new, then use it in place of the
   // old expression in the template argument.
-  if (Res.get() != E) {
-TemplateArgument TA(Res.get());
-Arg = TemplateArgumentLoc(TA, Res.get());
+  if (R != E) {
+TemplateArgument TA(R);
+ArgLoc = TemplateArgumentLoc(TA, R);
   }
-
-  SugaredConverted.push_back(SugaredResult);
-  CanonicalConverted.push_back(CanonicalResult);
   break;
 }
 
-case TemplateArgument::Declaration:
-case TemplateArgument::Integral:
+// As for the converted NTTP kinds, they still might need another
+// conversion, as the new corresponding parameter might be different.
+// Ideally, we would always perform substitution starting with sugared types
+// and never need these, as we would still have expressions. Since these are
+// needed so rarely, it's probably a better tradeoff to just convert them
+// back to expressions.
+case TemplateArgument::Integral: {
+  IntegerLiteral ILE(Context, A