[llvm-branch-commits] [clang] [Serialization] Code cleanups and polish 83233 (PR #83237)

2024-08-28 Thread Chuanqi Xu via llvm-branch-commits

https://github.com/ChuanqiXu9 updated 
https://github.com/llvm/llvm-project/pull/83237

>From f2e53e44eebab4720a1dbade24fcb14d698fb03f Mon Sep 17 00:00:00 2001
From: Chuanqi Xu 
Date: Wed, 28 Feb 2024 11:41:53 +0800
Subject: [PATCH 1/6] [Serialization] Code cleanups and polish 83233

---
 clang/include/clang/AST/DeclTemplate.h|  39 +-
 clang/include/clang/AST/ExternalASTSource.h   |   8 +-
 .../clang/Sema/MultiplexExternalSemaSource.h  |   4 +-
 .../include/clang/Serialization/ASTBitCodes.h |   2 +-
 clang/include/clang/Serialization/ASTReader.h |   4 +-
 clang/lib/AST/DeclTemplate.cpp|  85 ++--
 clang/lib/AST/ExternalASTSource.cpp   |  10 +-
 clang/lib/AST/ODRHash.cpp |  10 -
 .../lib/Sema/MultiplexExternalSemaSource.cpp  |  13 +-
 clang/lib/Serialization/ASTCommon.h   |   1 -
 clang/lib/Serialization/ASTReader.cpp |  42 +-
 clang/lib/Serialization/ASTReaderDecl.cpp |  76 +---
 clang/lib/Serialization/ASTReaderInternals.h  |   1 -
 clang/lib/Serialization/ASTWriter.cpp |  27 +-
 clang/lib/Serialization/ASTWriterDecl.cpp |  52 +--
 clang/lib/Serialization/CMakeLists.txt|   1 +
 .../Serialization/TemplateArgumentHasher.cpp  | 423 ++
 .../Serialization/TemplateArgumentHasher.h|  34 ++
 clang/test/Modules/cxx-templates.cpp  |   8 +-
 .../Modules/recursive-instantiations.cppm |  40 ++
 .../test/OpenMP/target_parallel_ast_print.cpp |   4 -
 clang/test/OpenMP/target_teams_ast_print.cpp  |   4 -
 clang/test/OpenMP/task_ast_print.cpp  |   4 -
 clang/test/OpenMP/teams_ast_print.cpp |   4 -
 24 files changed, 610 insertions(+), 286 deletions(-)
 create mode 100644 clang/lib/Serialization/TemplateArgumentHasher.cpp
 create mode 100644 clang/lib/Serialization/TemplateArgumentHasher.h
 create mode 100644 clang/test/Modules/recursive-instantiations.cppm

diff --git a/clang/include/clang/AST/DeclTemplate.h 
b/clang/include/clang/AST/DeclTemplate.h
index 44f840d297465d..7406252363d223 100644
--- a/clang/include/clang/AST/DeclTemplate.h
+++ b/clang/include/clang/AST/DeclTemplate.h
@@ -256,9 +256,6 @@ class TemplateArgumentList final
   TemplateArgumentList(const TemplateArgumentList &) = delete;
   TemplateArgumentList &operator=(const TemplateArgumentList &) = delete;
 
-  /// Create hash for the given arguments.
-  static unsigned ComputeODRHash(ArrayRef Args);
-
   /// Create a new template argument list that copies the given set of
   /// template arguments.
   static TemplateArgumentList *CreateCopy(ASTContext &Context,
@@ -732,25 +729,6 @@ class RedeclarableTemplateDecl : public TemplateDecl,
   }
 
   void anchor() override;
-  struct LazySpecializationInfo {
-GlobalDeclID DeclID = GlobalDeclID();
-unsigned ODRHash = ~0U;
-bool IsPartial = false;
-LazySpecializationInfo(GlobalDeclID ID, unsigned Hash = ~0U,
-   bool Partial = false)
-: DeclID(ID), ODRHash(Hash), IsPartial(Partial) {}
-LazySpecializationInfo() {}
-bool operator<(const LazySpecializationInfo &Other) const {
-  return DeclID < Other.DeclID;
-}
-bool operator==(const LazySpecializationInfo &Other) const {
-  assert((DeclID != Other.DeclID || ODRHash == Other.ODRHash) &&
- "Hashes differ!");
-  assert((DeclID != Other.DeclID || IsPartial == Other.IsPartial) &&
- "Both must be the same kinds!");
-  return DeclID == Other.DeclID;
-}
-  };
 
 protected:
   template  struct SpecEntryTraits {
@@ -794,16 +772,20 @@ class RedeclarableTemplateDecl : public TemplateDecl,
 
   void loadLazySpecializationsImpl(bool OnlyPartial = false) const;
 
-  void loadLazySpecializationsImpl(llvm::ArrayRef Args,
+  bool loadLazySpecializationsImpl(llvm::ArrayRef Args,
TemplateParameterList *TPL = nullptr) const;
 
-  Decl *loadLazySpecializationImpl(LazySpecializationInfo &LazySpecInfo) const;
-
   template 
   typename SpecEntryTraits::DeclType*
   findSpecializationImpl(llvm::FoldingSetVector &Specs,
  void *&InsertPos, ProfileArguments &&...ProfileArgs);
 
+  template 
+  typename SpecEntryTraits::DeclType *
+  findSpecializationLocally(llvm::FoldingSetVector &Specs,
+void *&InsertPos,
+ProfileArguments &&...ProfileArgs);
+
   template 
   void addSpecializationImpl(llvm::FoldingSetVector &Specs,
  EntryType *Entry, void *InsertPos);
@@ -819,13 +801,6 @@ class RedeclarableTemplateDecl : public TemplateDecl,
 llvm::PointerIntPair
   InstantiatedFromMember;
 
-/// If non-null, points to an array of specializations (including
-/// partial specializations) known only by their external declaration IDs.
-///
-/// The first value in the array is the number of specializations/partial
-/// specializations that follow.
-LazySpecializationInfo *LazySpecializations = n

[llvm-branch-commits] [clang] [Serialization] Code cleanups and polish 83233 (PR #83237)

2024-08-28 Thread Chuanqi Xu via llvm-branch-commits

ChuanqiXu9 wrote:

I think I now understand the problem. The root cause is in 
https://github.com/llvm/llvm-project/blob/175aa864f33786f3a6a4ee7381cbcafd0758501a/clang/lib/Serialization/MultiOnDiskHashTable.h#L329

The descriptions in parentheses are optional. You can skip them if you're not
interested, or on a first read.

What the code does is: when we write an on-disk hash table, it tries to write
the imported merged hash table in the same pass so that we don't need to read
these tables again. However, at line 329 the function omits data from the
imported table whose key has already been emitted by the current module file.
This is the root cause of the problem.

(The written merged hash tables are called overridden files, and they will be 
removed in 
https://github.com/llvm/llvm-project/blob/175aa864f33786f3a6a4ee7381cbcafd0758501a/clang/lib/Serialization/MultiOnDiskHashTable.h#L133-L137)

(When will the tables be merged? When the number of on-disk hash tables for
the same item is larger than some threshold (4 by default), we merge them
into an in-memory table to try to speed up querying. So this is mainly an
optimization.)

It is bad to skip data with the same key, since it violates the big assumption
that we discussed for a long time:
- It is bad to have different key values for logically identical
specializations.
- But it is actually fine to have the same key value for different
specializations. The code should work well even if we compute the hash value
of all template arguments as 0x12345678.

And the implicit optimization of skipping data with the same key violates the
second assumption above. So this is the root cause of the problem.
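
A rough sketch of the difference, under the assumption that a table can be modelled as a map from key to a list of declaration IDs (the types and the names `mergeSkippingSeenKeys` and `mergeKeepingAll` are hypothetical, not taken from MultiOnDiskHashTable.h):

```cpp
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical model: a key maps to the IDs of all specializations whose
// template arguments hash to that key.
using Key = std::uint32_t;
using DeclID = std::uint32_t;
using Table = std::map<Key, std::vector<DeclID>>;

// Drops imported data for any key the current module already emitted.
// If two different specializations share a key, the imported one is lost.
Table mergeSkippingSeenKeys(const Table &Current, const Table &Imported) {
  Table Out = Current;
  for (const auto &Entry : Imported)
    if (Out.count(Entry.first) == 0)
      Out[Entry.first] = Entry.second;
  return Out;
}

// Keeps every entry, so an equal key only means "same bucket".
Table mergeKeepingAll(const Table &Current, const Table &Imported) {
  Table Out = Current;
  for (const auto &Entry : Imported) {
    std::vector<DeclID> &Bucket = Out[Entry.first];
    Bucket.insert(Bucket.end(), Entry.second.begin(), Entry.second.end());
  }
  return Out;
}
```

In this model, merging two tables that each hold one specialization under the same key leaves one ID with the first function and two with the second.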

(Why did my previous attempt work? Since it removes the imported table once it
has loaded all the items from it, it happened to avoid the "optimization".)

Then it looks pretty simple to overcome the issue: just skip the optimization,
as I did in the newest commit.

@ilya-biryukov @alexfh I think we can start another round of testing. Thanks in
advance.

https://github.com/llvm/llvm-project/pull/83237
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] [Serialization] Code cleanups and polish 83233 (PR #83237)

2024-08-28 Thread Chuanqi Xu via llvm-branch-commits


@@ -1827,6 +1833,12 @@ void ASTDeclWriter::VisitVarTemplateDecl(VarTemplateDecl 
*D) {
 
 void ASTDeclWriter::VisitVarTemplateSpecializationDecl(
 VarTemplateSpecializationDecl *D) {
+  // FIXME: We need to load the "logical" first declaration before writing
+  // the Redeclarable part. But it may be too expensive to load all the
+  // specializations. Maybe we can find a way to load the "logical" first
+  // declaration only. Or we should try to solve this on the reader side.

ChuanqiXu9 wrote:

Yeah, but I tried to find the root cause : )

https://github.com/llvm/llvm-project/pull/83237


[llvm-branch-commits] [compiler-rt] release/19.x: [compiler-rt] Fix definition of `usize` on 32-bit Windows (PR #106303)

2024-08-28 Thread Martin Storsjö via llvm-branch-commits

mstorsjo wrote:

> @mstorsjo What do you think about merging this PR to the release branch?

LGTM!

https://github.com/llvm/llvm-project/pull/106303


[llvm-branch-commits] [clang-tools-extra] [clangd] Add clangd 19 release notes (PR #105975)

2024-08-28 Thread kadir çetinkaya via llvm-branch-commits

https://github.com/kadircet approved this pull request.

thanks a lot for doing this @HighCommander4!

https://github.com/llvm/llvm-project/pull/105975


[llvm-branch-commits] [llvm] [LLVM][Coroutines] Create `.noalloc` variant of switch ABI coroutine ramp functions during CoroSplit (PR #99283)

2024-08-28 Thread Chuanqi Xu via llvm-branch-commits

https://github.com/ChuanqiXu9 commented:

The patch looks good to me except for the thing I mentioned in 
https://github.com/llvm/llvm-project/pull/99282#pullrequestreview-2265588601

https://github.com/llvm/llvm-project/pull/99283


[llvm-branch-commits] [llvm] [LLVM][Coroutines] Create `.noalloc` variant of switch ABI coroutine ramp functions during CoroSplit (PR #99283)

2024-08-28 Thread Chuanqi Xu via llvm-branch-commits

https://github.com/ChuanqiXu9 edited 
https://github.com/llvm/llvm-project/pull/99283


[llvm-branch-commits] [llvm] [LLVM][Coroutines] Create `.noalloc` variant of switch ABI coroutine ramp functions during CoroSplit (PR #99283)

2024-08-28 Thread Chuanqi Xu via llvm-branch-commits


@@ -1455,6 +1462,74 @@ struct SwitchCoroutineSplitter {
 setCoroInfo(F, Shape, Clones);
   }
 
+  // Create a variant of ramp function that does not perform heap allocation
+  // for a switch ABI coroutine.
+  //
+  // The newly split `.noalloc` ramp function has the following differences:
+  //  - Has one additional frame pointer parameter in lieu of dynamic
+  //  allocation.
+  //  - Suppressed allocations by replacing coro.alloc and coro.free.
+  static Function *createNoAllocVariant(Function &F, coro::Shape &Shape,
+SmallVectorImpl &Clones) {
+auto *OrigFnTy = F.getFunctionType();

ChuanqiXu9 wrote:

nit: I'd feel better with an assertion here that the ABI is the switch ABI.
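
A minimal sketch of such a guard, using stand-in types rather than the real `coro::Shape` (`ABIKind`, `ShapeSketch`, and `noAllocVariantName` are hypothetical names for illustration only):

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for the coroutine lowering kind and shape.
enum class ABIKind { Switch, Retcon, RetconOnce, Async };
struct ShapeSketch {
  ABIKind ABI;
};

// Computes the name of the `.noalloc` variant, refusing non-switch shapes.
std::string noAllocVariantName(const ShapeSketch &Shape,
                               const std::string &Name) {
  assert(Shape.ABI == ABIKind::Switch &&
         "the .noalloc variant is only created for switch ABI coroutines");
  return Name + ".noalloc";
}
```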

https://github.com/llvm/llvm-project/pull/99283


[llvm-branch-commits] [llvm] [LLVM][Coroutines] Create `.noalloc` variant of switch ABI coroutine ramp functions during CoroSplit (PR #99283)

2024-08-28 Thread Chuanqi Xu via llvm-branch-commits


@@ -1455,6 +1462,74 @@ struct SwitchCoroutineSplitter {
 setCoroInfo(F, Shape, Clones);
   }
 
+  // Create a variant of ramp function that does not perform heap allocation
+  // for a switch ABI coroutine.
+  //
+  // The newly split `.noalloc` ramp function has the following differences:
+  //  - Has one additional frame pointer parameter in lieu of dynamic
+  //  allocation.
+  //  - Suppressed allocations by replacing coro.alloc and coro.free.
+  static Function *createNoAllocVariant(Function &F, coro::Shape &Shape,
+SmallVectorImpl &Clones) {
+auto *OrigFnTy = F.getFunctionType();
+auto OldParams = OrigFnTy->params();
+
+SmallVector NewParams;
+NewParams.reserve(OldParams.size() + 1);
+NewParams.append(OldParams.begin(), OldParams.end());
+NewParams.push_back(PointerType::getUnqual(Shape.FrameTy));
+
+auto *NewFnTy = FunctionType::get(OrigFnTy->getReturnType(), NewParams,
+  OrigFnTy->isVarArg());
+Function *NoAllocF =
+Function::Create(NewFnTy, F.getLinkage(), F.getName() + ".noalloc");
+
+ValueToValueMapTy VMap;
+unsigned int Idx = 0;
+for (const auto &I : F.args()) {
+  VMap[&I] = NoAllocF->getArg(Idx++);
+}
+SmallVector Returns;
+CloneFunctionInto(NoAllocF, &F, VMap,
+  CloneFunctionChangeType::LocalChangesOnly, Returns);
+
+if (Shape.CoroBegin) {
+  auto *NewCoroBegin =
+  cast_if_present(VMap[Shape.CoroBegin]);
+  auto *NewCoroId = cast(NewCoroBegin->getId());
+  coro::replaceCoroFree(NewCoroId, /*Elide=*/true);
+  coro::suppressCoroAllocs(NewCoroId);
+  NewCoroBegin->replaceAllUsesWith(NoAllocF->getArg(Idx));

ChuanqiXu9 wrote:

nit: it looks better to use `FrameIdx` below instead of using the induction 
variable across code sections.
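
Roughly what is meant, as a sketch with parameters modelled as strings (the helper `makeNoAllocParams` is hypothetical): the index of the appended frame parameter is recorded where the parameter is added, so later code does not depend on where a loop counter happened to stop.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Appends the frame parameter and reports its index through FrameIdx.
std::vector<std::string>
makeNoAllocParams(const std::vector<std::string> &OldParams,
                  std::size_t &FrameIdx) {
  std::vector<std::string> NewParams(OldParams);
  FrameIdx = NewParams.size(); // index of the parameter pushed next
  NewParams.push_back("frame");
  return NewParams;
}
```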

https://github.com/llvm/llvm-project/pull/99283


[llvm-branch-commits] [llvm] [LLVM][Coroutines] Create `.noalloc` variant of switch ABI coroutine ramp functions during CoroSplit (PR #99283)

2024-08-28 Thread Chuanqi Xu via llvm-branch-commits


@@ -26,6 +26,10 @@ bool declaresIntrinsics(const Module &M,
 const std::initializer_list);
 void replaceCoroFree(CoroIdInst *CoroId, bool Elide);
 
+void suppressCoroAllocs(CoroIdInst *CoroId);

ChuanqiXu9 wrote:

Let's add some comments for this since I can't guess its job by its name.

https://github.com/llvm/llvm-project/pull/99283


[llvm-branch-commits] [llvm] [LLVM][Coroutines] Create `.noalloc` variant of switch ABI coroutine ramp functions during CoroSplit (PR #99283)

2024-08-28 Thread Chuanqi Xu via llvm-branch-commits


@@ -2049,6 +2055,21 @@ the coroutine must reach the final suspend point when it 
get destroyed.
 
 This attribute only works for switched-resume coroutines now.
 
+coro_elide_safe
+---
+
+When a Call or Invoke instruction is marked with `coro_elide_safe`,
+CoroAnnotationElidePass performs heap elision when possible. Note that for
+recursive or mutually recursive functions this elision is usually not possible.
+
+coro_gen_noalloc_ramp
+-
+
+This attribute hints CoroSplitPass to generate a `f.noalloc` ramp function for

ChuanqiXu9 wrote:

It would be better to describe the `f.noalloc` ramp function in this document,
and to include some example code that compares it with the normal ramp
function.

https://github.com/llvm/llvm-project/pull/99283


[llvm-branch-commits] [llvm] [LLVM][Coroutines] Transform "coro_elide_safe" calls to switch ABI coroutines to the `noalloc` variant (PR #99285)

2024-08-28 Thread Chuanqi Xu via llvm-branch-commits


@@ -0,0 +1,147 @@
+//===- CoroAnnotationElide.cpp - Elide attributed safe coroutine calls 
===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM 
Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+//
+// \file
+// This pass transforms all Call or Invoke instructions that are annotated
+// "coro_elide_safe" to call the `.noalloc` variant of coroutine instead.
+// The frame of the callee coroutine is allocated inside the caller. A pointer
+// to the allocated frame will be passed into the `.noalloc` ramp function.
+//
+//===--===//
+
+#include "llvm/Transforms/Coroutines/CoroAnnotationElide.h"
+
+#include "llvm/Analysis/LazyCallGraph.h"
+#include "llvm/Analysis/OptimizationRemarkEmitter.h"
+#include "llvm/IR/Analysis.h"
+#include "llvm/IR/IRBuilder.h"
+#include "llvm/IR/InstIterator.h"
+#include "llvm/IR/Instruction.h"
+#include "llvm/IR/Module.h"
+#include "llvm/IR/PassManager.h"
+#include "llvm/Transforms/Utils/CallGraphUpdater.h"
+
+#include 
+
+using namespace llvm;
+
+#define DEBUG_TYPE "coro-annotation-elide"
+
+static Instruction *getFirstNonAllocaInTheEntryBlock(Function *F) {
+  for (Instruction &I : F->getEntryBlock())
+if (!isa<AllocaInst>(&I))
+  return &I;
+  llvm_unreachable("no terminator in the entry block");
+}
+
+// Create an alloca in the caller, using FrameSize and FrameAlign as the callee
+// coroutine's activation frame.
+static Value *allocateFrameInCaller(Function *Caller, uint64_t FrameSize,
+Align FrameAlign) {
+  LLVMContext &C = Caller->getContext();
+  BasicBlock::iterator InsertPt =
+  getFirstNonAllocaInTheEntryBlock(Caller)->getIterator();
+  const DataLayout &DL = Caller->getDataLayout();
+  auto FrameTy = ArrayType::get(Type::getInt8Ty(C), FrameSize);
+  auto *Frame = new AllocaInst(FrameTy, DL.getAllocaAddrSpace(), "", InsertPt);
+  Frame->setAlignment(FrameAlign);
+  return new BitCastInst(Frame, PointerType::getUnqual(C), "vFrame", InsertPt);

ChuanqiXu9 wrote:

Why do we need a bitcast here? As I remember, we're in the era of opaque
pointers. Am I misunderstanding anything?

https://github.com/llvm/llvm-project/pull/99285


[llvm-branch-commits] [llvm] [LLVM][Coroutines] Transform "coro_elide_safe" calls to switch ABI coroutines to the `noalloc` variant (PR #99285)

2024-08-28 Thread Chuanqi Xu via llvm-branch-commits


@@ -0,0 +1,147 @@
+//===- CoroAnnotationElide.cpp - Elide attributed safe coroutine calls 
===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM 
Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+//
+// \file
+// This pass transforms all Call or Invoke instructions that are annotated
+// "coro_elide_safe" to call the `.noalloc` variant of coroutine instead.
+// The frame of the callee coroutine is allocated inside the caller. A pointer
+// to the allocated frame will be passed into the `.noalloc` ramp function.
+//
+//===--===//
+
+#include "llvm/Transforms/Coroutines/CoroAnnotationElide.h"
+
+#include "llvm/Analysis/LazyCallGraph.h"
+#include "llvm/Analysis/OptimizationRemarkEmitter.h"
+#include "llvm/IR/Analysis.h"
+#include "llvm/IR/IRBuilder.h"
+#include "llvm/IR/InstIterator.h"
+#include "llvm/IR/Instruction.h"
+#include "llvm/IR/Module.h"
+#include "llvm/IR/PassManager.h"
+#include "llvm/Transforms/Utils/CallGraphUpdater.h"
+
+#include 
+
+using namespace llvm;
+
+#define DEBUG_TYPE "coro-annotation-elide"
+
+static Instruction *getFirstNonAllocaInTheEntryBlock(Function *F) {
+  for (Instruction &I : F->getEntryBlock())
+if (!isa<AllocaInst>(&I))
+  return &I;
+  llvm_unreachable("no terminator in the entry block");
+}
+
+// Create an alloca in the caller, using FrameSize and FrameAlign as the callee
+// coroutine's activation frame.
+static Value *allocateFrameInCaller(Function *Caller, uint64_t FrameSize,
+Align FrameAlign) {
+  LLVMContext &C = Caller->getContext();
+  BasicBlock::iterator InsertPt =
+  getFirstNonAllocaInTheEntryBlock(Caller)->getIterator();
+  const DataLayout &DL = Caller->getDataLayout();
+  auto FrameTy = ArrayType::get(Type::getInt8Ty(C), FrameSize);
+  auto *Frame = new AllocaInst(FrameTy, DL.getAllocaAddrSpace(), "", InsertPt);
+  Frame->setAlignment(FrameAlign);
+  return new BitCastInst(Frame, PointerType::getUnqual(C), "vFrame", InsertPt);
+}
+
+// Given a call or invoke instruction to the elide safe coroutine, this 
function
+// does the following:
+//  - Allocate a frame for the callee coroutine in the caller using alloca.
+//  - Replace the old CB with a new Call or Invoke to `NewCallee`, with the
+//pointer to the frame as an additional argument to NewCallee.
+static void processCall(CallBase *CB, Function *Caller, Function *NewCallee,
+uint64_t FrameSize, Align FrameAlign) {
+  auto *FramePtr = allocateFrameInCaller(Caller, FrameSize, FrameAlign);
+  auto NewCBInsertPt = CB->getIterator();
+  llvm::CallBase *NewCB = nullptr;
+  SmallVector NewArgs;
+  NewArgs.append(CB->arg_begin(), CB->arg_end());
+  NewArgs.push_back(FramePtr);
+
+  if (auto *CI = dyn_cast<CallInst>(CB)) {
+auto *NewCI = CallInst::Create(NewCallee->getFunctionType(), NewCallee,
+   NewArgs, "", NewCBInsertPt);
+NewCI->setTailCallKind(CI->getTailCallKind());
+NewCB = NewCI;

ChuanqiXu9 wrote:

Out of curiosity, why do we use a new variable here but not in the following
branch?

https://github.com/llvm/llvm-project/pull/99285


[llvm-branch-commits] [llvm] [LLVM][Coroutines] Transform "coro_elide_safe" calls to switch ABI coroutines to the `noalloc` variant (PR #99285)

2024-08-28 Thread Chuanqi Xu via llvm-branch-commits


@@ -0,0 +1,147 @@
+//===- CoroAnnotationElide.cpp - Elide attributed safe coroutine calls 
===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM 
Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+//
+// \file
+// This pass transforms all Call or Invoke instructions that are annotated
+// "coro_elide_safe" to call the `.noalloc` variant of coroutine instead.
+// The frame of the callee coroutine is allocated inside the caller. A pointer
+// to the allocated frame will be passed into the `.noalloc` ramp function.
+//
+//===--===//
+
+#include "llvm/Transforms/Coroutines/CoroAnnotationElide.h"
+
+#include "llvm/Analysis/LazyCallGraph.h"
+#include "llvm/Analysis/OptimizationRemarkEmitter.h"
+#include "llvm/IR/Analysis.h"
+#include "llvm/IR/IRBuilder.h"
+#include "llvm/IR/InstIterator.h"
+#include "llvm/IR/Instruction.h"
+#include "llvm/IR/Module.h"
+#include "llvm/IR/PassManager.h"
+#include "llvm/Transforms/Utils/CallGraphUpdater.h"
+
+#include 
+
+using namespace llvm;
+
+#define DEBUG_TYPE "coro-annotation-elide"
+
+static Instruction *getFirstNonAllocaInTheEntryBlock(Function *F) {
+  for (Instruction &I : F->getEntryBlock())
+if (!isa<AllocaInst>(&I))
+  return &I;
+  llvm_unreachable("no terminator in the entry block");
+}
+
+// Create an alloca in the caller, using FrameSize and FrameAlign as the callee
+// coroutine's activation frame.
+static Value *allocateFrameInCaller(Function *Caller, uint64_t FrameSize,
+Align FrameAlign) {
+  LLVMContext &C = Caller->getContext();
+  BasicBlock::iterator InsertPt =
+  getFirstNonAllocaInTheEntryBlock(Caller)->getIterator();
+  const DataLayout &DL = Caller->getDataLayout();
+  auto FrameTy = ArrayType::get(Type::getInt8Ty(C), FrameSize);
+  auto *Frame = new AllocaInst(FrameTy, DL.getAllocaAddrSpace(), "", InsertPt);
+  Frame->setAlignment(FrameAlign);
+  return new BitCastInst(Frame, PointerType::getUnqual(C), "vFrame", InsertPt);
+}
+
+// Given a call or invoke instruction to the elide safe coroutine, this 
function
+// does the following:
+//  - Allocate a frame for the callee coroutine in the caller using alloca.
+//  - Replace the old CB with a new Call or Invoke to `NewCallee`, with the
+//pointer to the frame as an additional argument to NewCallee.
+static void processCall(CallBase *CB, Function *Caller, Function *NewCallee,
+uint64_t FrameSize, Align FrameAlign) {
+  auto *FramePtr = allocateFrameInCaller(Caller, FrameSize, FrameAlign);

ChuanqiXu9 wrote:

It would be better for performance to generate lifetime intrinsics for the
new frame.

https://github.com/llvm/llvm-project/pull/99285


[llvm-branch-commits] [llvm] [LoongArch] Optimize for immediate value materialization using BSTRINS_D instruction (PR #106332)

2024-08-28 Thread via llvm-branch-commits


@@ -41,11 +43,82 @@ LoongArchMatInt::InstSeq 
LoongArchMatInt::generateInstSeq(int64_t Val) {
   Insts.push_back(Inst(LoongArch::ORI, Lo12));
   }
 
+  // hi32
+  // Higher20
   if (SignExtend32<1>(Hi20 >> 19) != SignExtend32<20>(Higher20))
 Insts.push_back(Inst(LoongArch::LU32I_D, SignExtend64<20>(Higher20)));
 
+  // Highest12
   if (SignExtend32<1>(Higher20 >> 19) != SignExtend32<12>(Highest12))
 Insts.push_back(Inst(LoongArch::LU52I_D, SignExtend64<12>(Highest12)));
 
+  size_t N = Insts.size();
+  if (N < 3)
+return Insts;
+
+  // When the number of instruction sequences is greater than 2, we have the
+  // opportunity to optimize using the BSTRINS_D instruction. The scenario is 
as
+  // follows:
+  //
+  // N of Insts = 3
+  // 1. ORI + LU32I_D + LU52I_D => ORI + BSTRINS_D, TmpVal = ORI
+  // 2. ADDI_W + LU32I_D + LU32I_D  =>  ADDI_W + BSTRINS_D, TmpVal = ADDI_W
+  // 3. LU12I_W + ORI + LU32I_D => ORI + BSTRINS_D, TmpVal = ORI
+  // 4. LU12I_W + LU32I_D + LU52I_D => LU12I_W + BSTRINS_D, TmpVal = LU12I_W
+  //
+  // N of Insts = 4
+  // 5. LU12I_W + ORI + LU32I_D + LU52I_D => LU12I_W + ORI + BSTRINS_D
+  //  => ORI + LU52I_D + BSTRINS_D
+  //TmpVal = (LU12I_W | ORI) or (ORI | LU52I_D)
+  // The BSTRINS_D instruction will use the `TmpVal` to construct the `Val`.
+  uint64_t TmpVal1 = 0;
+  uint64_t TmpVal2 = 0;
+  switch (Insts[0].Opc) {
+  default:
+llvm_unreachable("unexpected opcode");
+break;
+  case LoongArch::LU12I_W:
+if (Insts[1].Opc == LoongArch::ORI) {
+  TmpVal1 = Insts[1].Imm;
+  if (N == 3)
+break;
+  TmpVal2 = Insts[3].Imm << 52 | TmpVal1;
+}
+TmpVal1 |= Insts[0].Imm << 12;
+break;
+  case LoongArch::ORI:
+  case LoongArch::ADDI_W:
+TmpVal1 = Insts[0].Imm;
+break;
+  }
+
+  for (uint64_t Msb = 32; Msb < 64; ++Msb) {
+uint64_t HighMask = ~((1ULL << (Msb + 1)) - 1);
+for (uint64_t Lsb = Msb; Lsb > 0; --Lsb) {

heiher wrote:

It appears the maximum number of iterations may be up to `∑_{i=32}^{63}`. Could 
we reduce the complexity?
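
For reference, the worst case for the quoted loop bounds can be counted directly. This is a sketch that assumes the inner loop never exits early (`countMaxIterations` is a hypothetical helper): each `Msb` contributes `Msb` inner iterations, so the total is the sum of the integers from 32 to 63, which is 1520.

```cpp
#include <cstdint>

// Counts inner-loop iterations for the bounds quoted above, assuming the
// inner loop always runs to completion.
std::uint64_t countMaxIterations() {
  std::uint64_t Count = 0;
  for (std::uint64_t Msb = 32; Msb < 64; ++Msb)
    for (std::uint64_t Lsb = Msb; Lsb > 0; --Lsb)
      ++Count;
  return Count;
}
```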

https://github.com/llvm/llvm-project/pull/106332


[llvm-branch-commits] [llvm] [LoongArch] Optimize for immediate value materialization using BSTRINS_D instruction (PR #106332)

2024-08-28 Thread via llvm-branch-commits


@@ -41,11 +43,82 @@ LoongArchMatInt::InstSeq 
LoongArchMatInt::generateInstSeq(int64_t Val) {
   Insts.push_back(Inst(LoongArch::ORI, Lo12));
   }
 
+  // hi32
+  // Higher20
   if (SignExtend32<1>(Hi20 >> 19) != SignExtend32<20>(Higher20))
 Insts.push_back(Inst(LoongArch::LU32I_D, SignExtend64<20>(Higher20)));
 
+  // Highest12
   if (SignExtend32<1>(Higher20 >> 19) != SignExtend32<12>(Highest12))
 Insts.push_back(Inst(LoongArch::LU52I_D, SignExtend64<12>(Highest12)));
 
+  size_t N = Insts.size();
+  if (N < 3)
+return Insts;
+
+  // When the number of instruction sequences is greater than 2, we have the
+  // opportunity to optimize using the BSTRINS_D instruction. The scenario is 
as
+  // follows:
+  //
+  // N of Insts = 3
+  // 1. ORI + LU32I_D + LU52I_D => ORI + BSTRINS_D, TmpVal = ORI
+  // 2. ADDI_W + LU32I_D + LU32I_D  =>  ADDI_W + BSTRINS_D, TmpVal = ADDI_W

heiher wrote:

ADDI_W + LU32I_D + LU{52}I_D
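
As background for these cases, here is a small emulation of what a single `bstrins.d rd, rj, msb, lsb` does: bits [msb:lsb] of `rd` are replaced by bits [msb-lsb:0] of `rj`. It is only a sketch, and `bstrinsD` is a hypothetical helper that is not part of the patch. The values below are the ones in the `sextw-removal.ll` update for this PR, where `lu12i.w` + `ori` produce 0x55555555 and `bstrins.d $fp, $fp, 62, 32` copies that pattern into the upper word.

```cpp
#include <cstdint>

// Emulates LoongArch `bstrins.d`: replaces bits [Msb:Lsb] of Rd with
// bits [Msb-Lsb:0] of Rj and leaves the other bits of Rd unchanged.
// Requires Lsb <= Msb <= 63.
std::uint64_t bstrinsD(std::uint64_t Rd, std::uint64_t Rj, unsigned Msb,
                       unsigned Lsb) {
  unsigned Width = Msb - Lsb + 1;
  std::uint64_t FieldMask = Width == 64 ? ~0ULL : ((1ULL << Width) - 1);
  return (Rd & ~(FieldMask << Lsb)) | ((Rj & FieldMask) << Lsb);
}
```

Here `bstrinsD(0x55555555, 0x55555555, 62, 32)` gives 0x5555555555555555, the constant that previously needed `lu32i.d` + `lu52i.d`.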

https://github.com/llvm/llvm-project/pull/106332


[llvm-branch-commits] [llvm] [LLVM][Coroutines] Transform "coro_elide_safe" calls to switch ABI coroutines to the `noalloc` variant (PR #99285)

2024-08-28 Thread Chuanqi Xu via llvm-branch-commits


@@ -0,0 +1,147 @@
+//===- CoroAnnotationElide.cpp - Elide attributed safe coroutine calls 
===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM 
Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+//
+// \file
+// This pass transforms all Call or Invoke instructions that are annotated
+// "coro_elide_safe" to call the `.noalloc` variant of coroutine instead.
+// The frame of the callee coroutine is allocated inside the caller. A pointer
+// to the allocated frame will be passed into the `.noalloc` ramp function.
+//
+//===--===//
+
+#include "llvm/Transforms/Coroutines/CoroAnnotationElide.h"
+
+#include "llvm/Analysis/LazyCallGraph.h"
+#include "llvm/Analysis/OptimizationRemarkEmitter.h"
+#include "llvm/IR/Analysis.h"
+#include "llvm/IR/IRBuilder.h"
+#include "llvm/IR/InstIterator.h"
+#include "llvm/IR/Instruction.h"
+#include "llvm/IR/Module.h"
+#include "llvm/IR/PassManager.h"
+#include "llvm/Transforms/Utils/CallGraphUpdater.h"
+
+#include 
+
+using namespace llvm;
+
+#define DEBUG_TYPE "coro-annotation-elide"
+
+static Instruction *getFirstNonAllocaInTheEntryBlock(Function *F) {
+  for (Instruction &I : F->getEntryBlock())
+if (!isa<AllocaInst>(&I))
+  return &I;
+  llvm_unreachable("no terminator in the entry block");
+}
+
+// Create an alloca in the caller, using FrameSize and FrameAlign as the callee
+// coroutine's activation frame.
+static Value *allocateFrameInCaller(Function *Caller, uint64_t FrameSize,
+Align FrameAlign) {
+  LLVMContext &C = Caller->getContext();
+  BasicBlock::iterator InsertPt =
+  getFirstNonAllocaInTheEntryBlock(Caller)->getIterator();
+  const DataLayout &DL = Caller->getDataLayout();
+  auto FrameTy = ArrayType::get(Type::getInt8Ty(C), FrameSize);
+  auto *Frame = new AllocaInst(FrameTy, DL.getAllocaAddrSpace(), "", InsertPt);
+  Frame->setAlignment(FrameAlign);
+  return new BitCastInst(Frame, PointerType::getUnqual(C), "vFrame", InsertPt);
+}
+
+// Given a call or invoke instruction to the elide safe coroutine, this 
function
+// does the following:
+//  - Allocate a frame for the callee coroutine in the caller using alloca.
+//  - Replace the old CB with a new Call or Invoke to `NewCallee`, with the
+//pointer to the frame as an additional argument to NewCallee.
+static void processCall(CallBase *CB, Function *Caller, Function *NewCallee,
+uint64_t FrameSize, Align FrameAlign) {
+  auto *FramePtr = allocateFrameInCaller(Caller, FrameSize, FrameAlign);

ChuanqiXu9 wrote:

This can be done as a new optimization in another patch, but let's leave a
TODO here.

I think we can do this by introducing two pseudo lifetime intrinsics in the
frontend around the `co_await` expression and converting the pseudo lifetime
intrinsics to real lifetime intrinsics here.

https://github.com/llvm/llvm-project/pull/99285


[llvm-branch-commits] [llvm] [LLVM][Coroutines] Transform "coro_elide_safe" calls to switch ABI coroutines to the `noalloc` variant (PR #99285)

2024-08-28 Thread Chuanqi Xu via llvm-branch-commits

https://github.com/ChuanqiXu9 edited 
https://github.com/llvm/llvm-project/pull/99285


[llvm-branch-commits] [llvm] DAG: Check if is_fpclass is custom, instead of isLegalOrCustom (PR #105577)

2024-08-28 Thread Serge Pavlov via llvm-branch-commits

https://github.com/spavloff approved this pull request.

LGTM.


https://github.com/llvm/llvm-project/pull/105577


[llvm-branch-commits] [llvm] [LoongArch] Optimize for immediate value materialization using BSTRINS_D instruction (PR #106332)

2024-08-28 Thread via llvm-branch-commits

https://github.com/wangleiat updated 
https://github.com/llvm/llvm-project/pull/106332

>From b2e3659d23ff3a576e2967576d501b24d6466e87 Mon Sep 17 00:00:00 2001
From: wanglei 
Date: Wed, 28 Aug 2024 12:16:47 +0800
Subject: [PATCH] update test sextw-removal.ll

Created using spr 1.3.5-bogner
---
 llvm/test/CodeGen/LoongArch/sextw-removal.ll | 40 
 1 file changed, 16 insertions(+), 24 deletions(-)

diff --git a/llvm/test/CodeGen/LoongArch/sextw-removal.ll b/llvm/test/CodeGen/LoongArch/sextw-removal.ll
index 2bb39395c1d1b6..7500b5ae09359a 100644
--- a/llvm/test/CodeGen/LoongArch/sextw-removal.ll
+++ b/llvm/test/CodeGen/LoongArch/sextw-removal.ll
@@ -323,21 +323,17 @@ define void @test7(i32 signext %arg, i32 signext %arg1) nounwind {
 ; CHECK-NEXT:st.d $s2, $sp, 8 # 8-byte Folded Spill
 ; CHECK-NEXT:sra.w $a0, $a0, $a1
 ; CHECK-NEXT:lu12i.w $a1, 349525
-; CHECK-NEXT:ori $a1, $a1, 1365
-; CHECK-NEXT:lu32i.d $a1, 349525
-; CHECK-NEXT:lu52i.d $fp, $a1, 1365
+; CHECK-NEXT:ori $fp, $a1, 1365
+; CHECK-NEXT:bstrins.d $fp, $fp, 62, 32
 ; CHECK-NEXT:lu12i.w $a1, 209715
-; CHECK-NEXT:ori $a1, $a1, 819
-; CHECK-NEXT:lu32i.d $a1, 209715
-; CHECK-NEXT:lu52i.d $s0, $a1, 819
+; CHECK-NEXT:ori $s0, $a1, 819
+; CHECK-NEXT:bstrins.d $s0, $s0, 61, 32
 ; CHECK-NEXT:lu12i.w $a1, 61680
-; CHECK-NEXT:ori $a1, $a1, 3855
-; CHECK-NEXT:lu32i.d $a1, -61681
-; CHECK-NEXT:lu52i.d $s1, $a1, 240
+; CHECK-NEXT:ori $s1, $a1, 3855
+; CHECK-NEXT:bstrins.d $s1, $s1, 59, 32
 ; CHECK-NEXT:lu12i.w $a1, 4112
-; CHECK-NEXT:ori $a1, $a1, 257
-; CHECK-NEXT:lu32i.d $a1, 65793
-; CHECK-NEXT:lu52i.d $s2, $a1, 16
+; CHECK-NEXT:ori $s2, $a1, 257
+; CHECK-NEXT:bstrins.d $s2, $s2, 56, 32
 ; CHECK-NEXT:.p2align 4, , 16
 ; CHECK-NEXT:  .LBB6_1: # %bb2
 ; CHECK-NEXT:# =>This Inner Loop Header: Depth=1
@@ -374,21 +370,17 @@ define void @test7(i32 signext %arg, i32 signext %arg1) nounwind {
 ; NORMV-NEXT:st.d $s2, $sp, 8 # 8-byte Folded Spill
 ; NORMV-NEXT:sra.w $a0, $a0, $a1
 ; NORMV-NEXT:lu12i.w $a1, 349525
-; NORMV-NEXT:ori $a1, $a1, 1365
-; NORMV-NEXT:lu32i.d $a1, 349525
-; NORMV-NEXT:lu52i.d $fp, $a1, 1365
+; NORMV-NEXT:ori $fp, $a1, 1365
+; NORMV-NEXT:bstrins.d $fp, $fp, 62, 32
 ; NORMV-NEXT:lu12i.w $a1, 209715
-; NORMV-NEXT:ori $a1, $a1, 819
-; NORMV-NEXT:lu32i.d $a1, 209715
-; NORMV-NEXT:lu52i.d $s0, $a1, 819
+; NORMV-NEXT:ori $s0, $a1, 819
+; NORMV-NEXT:bstrins.d $s0, $s0, 61, 32
 ; NORMV-NEXT:lu12i.w $a1, 61680
-; NORMV-NEXT:ori $a1, $a1, 3855
-; NORMV-NEXT:lu32i.d $a1, -61681
-; NORMV-NEXT:lu52i.d $s1, $a1, 240
+; NORMV-NEXT:ori $s1, $a1, 3855
+; NORMV-NEXT:bstrins.d $s1, $s1, 59, 32
 ; NORMV-NEXT:lu12i.w $a1, 4112
-; NORMV-NEXT:ori $a1, $a1, 257
-; NORMV-NEXT:lu32i.d $a1, 65793
-; NORMV-NEXT:lu52i.d $s2, $a1, 16
+; NORMV-NEXT:ori $s2, $a1, 257
+; NORMV-NEXT:bstrins.d $s2, $s2, 56, 32
 ; NORMV-NEXT:.p2align 4, , 16
 ; NORMV-NEXT:  .LBB6_1: # %bb2
 ; NORMV-NEXT:# =>This Inner Loop Header: Depth=1



[llvm-branch-commits] [llvm] [LoongArch] Optimize for immediate value materialization using BSTRINS_D instruction (PR #106332)

2024-08-28 Thread via llvm-branch-commits


@@ -41,11 +43,82 @@ LoongArchMatInt::InstSeq LoongArchMatInt::generateInstSeq(int64_t Val) {
   Insts.push_back(Inst(LoongArch::ORI, Lo12));
   }
 
+  // hi32
+  // Higher20
   if (SignExtend32<1>(Hi20 >> 19) != SignExtend32<20>(Higher20))
 Insts.push_back(Inst(LoongArch::LU32I_D, SignExtend64<20>(Higher20)));
 
+  // Highest12
   if (SignExtend32<1>(Higher20 >> 19) != SignExtend32<12>(Highest12))
 Insts.push_back(Inst(LoongArch::LU52I_D, SignExtend64<12>(Highest12)));
 
+  size_t N = Insts.size();
+  if (N < 3)
+return Insts;
+
+  // When the number of instruction sequences is greater than 2, we have the
+  // opportunity to optimize using the BSTRINS_D instruction. The scenario is as
+  // follows:
+  //
+  // N of Insts = 3
+  // 1. ORI + LU32I_D + LU52I_D => ORI + BSTRINS_D, TmpVal = ORI
+  // 2. ADDI_W + LU32I_D + LU32I_D  =>  ADDI_W + BSTRINS_D, TmpVal = ADDI_W
+  // 3. LU12I_W + ORI + LU32I_D => ORI + BSTRINS_D, TmpVal = ORI
+  // 4. LU12I_W + LU32I_D + LU52I_D => LU12I_W + BSTRINS_D, TmpVal = LU12I_W
+  //
+  // N of Insts = 4
+  // 5. LU12I_W + ORI + LU32I_D + LU52I_D => LU12I_W + ORI + BSTRINS_D
+  //  => ORI + LU52I_D + BSTRINS_D
+  //TmpVal = (LU12I_W | ORI) or (ORI | LU52I_D)
+  // The BSTRINS_D instruction will use the `TmpVal` to construct the `Val`.
+  uint64_t TmpVal1 = 0;
+  uint64_t TmpVal2 = 0;
+  switch (Insts[0].Opc) {
+  default:
+llvm_unreachable("unexpected opcode");
+break;
+  case LoongArch::LU12I_W:
+if (Insts[1].Opc == LoongArch::ORI) {
+  TmpVal1 = Insts[1].Imm;
+  if (N == 3)
+break;
+  TmpVal2 = Insts[3].Imm << 52 | TmpVal1;
+}
+TmpVal1 |= Insts[0].Imm << 12;
+break;
+  case LoongArch::ORI:
+  case LoongArch::ADDI_W:
+TmpVal1 = Insts[0].Imm;
+break;
+  }
+
+  for (uint64_t Msb = 32; Msb < 64; ++Msb) {
+uint64_t HighMask = ~((1ULL << (Msb + 1)) - 1);
+for (uint64_t Lsb = Msb; Lsb > 0; --Lsb) {

wangleiat wrote:

I don't currently have a good way to reduce the number of loop iterations, 
except for some obvious cases such as `hi32 == lo32`.
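
For readers following the thread, here is a minimal sketch of what the search is looking for. It assumes the usual `bstrins.d rd, rj, msb, lsb` semantics (bits `[msb:lsb]` of `rd` are replaced by the low `msb - lsb + 1` bits of `rj`). The names `bstrinsD` and `findSelfInsertRange` are hypothetical and are not taken from the patch, and the brute-force loop order differs from the one in the diff; it is only meant to illustrate the idea, not to reproduce the implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <utility>

// Emulates `bstrins.d rd, rj, msb, lsb` under the assumption stated above:
// bits [Msb:Lsb] of Rd are replaced by bits [Msb-Lsb:0] of Rj, and every
// other bit of Rd is kept.
inline uint64_t bstrinsD(uint64_t Rd, uint64_t Rj, unsigned Msb, unsigned Lsb) {
  assert(Lsb <= Msb && Msb < 64 && "invalid bit range");
  unsigned Width = Msb - Lsb + 1;
  uint64_t FieldMask = Width == 64 ? ~0ULL : ((1ULL << Width) - 1);
  return (Rd & ~(FieldMask << Lsb)) | ((Rj & FieldMask) << Lsb);
}

// Hypothetical helper: exhaustively look for a bit range such that inserting
// TmpVal into itself reproduces Val. Returns {Msb, Lsb} for the first match.
inline std::optional<std::pair<unsigned, unsigned>>
findSelfInsertRange(uint64_t TmpVal, uint64_t Val) {
  for (unsigned Msb = 32; Msb < 64; ++Msb)
    for (unsigned Lsb = 1; Lsb <= Msb; ++Lsb)
      if (bstrinsD(TmpVal, TmpVal, Msb, Lsb) == Val)
        return std::make_pair(Msb, Lsb);
  return std::nullopt;
}
```

With `TmpVal = 0x55555555` (the `lu12i.w` + `ori` pair in the updated `sextw-removal.ll` checks), the search lands on the range `62, 32`, which matches the `bstrins.d $fp, $fp, 62, 32` line in that test.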

https://github.com/llvm/llvm-project/pull/106332


[llvm-branch-commits] [BOLT][NFC] Rename profile-use-pseudo-probes (PR #106364)

2024-08-28 Thread Amir Ayupov via llvm-branch-commits

https://github.com/aaupov created 
https://github.com/llvm/llvm-project/pull/106364

None




[llvm-branch-commits] [BOLT][NFC] Rename profile-use-pseudo-probes (PR #106364)

2024-08-28 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-bolt

Author: Amir Ayupov (aaupov)


Changes



---
Full diff: https://github.com/llvm/llvm-project/pull/106364.diff


5 Files Affected:

- (modified) bolt/lib/Profile/DataAggregator.cpp (+2-2) 
- (modified) bolt/lib/Profile/YAMLProfileReader.cpp (-5) 
- (modified) bolt/lib/Profile/YAMLProfileWriter.cpp (+8-3) 
- (modified) bolt/lib/Rewrite/PseudoProbeRewriter.cpp (+3-3) 
- (modified) bolt/test/X86/pseudoprobe-decoding-inline.test (+3-3) 


``diff
diff --git a/bolt/lib/Profile/DataAggregator.cpp b/bolt/lib/Profile/DataAggregator.cpp
index 813d825f8b570c..10d745cc69824b 100644
--- a/bolt/lib/Profile/DataAggregator.cpp
+++ b/bolt/lib/Profile/DataAggregator.cpp
@@ -88,7 +88,7 @@ MaxSamples("max-samples",
   cl::cat(AggregatorCategory));
 
 extern cl::opt ProfileFormat;
-extern cl::opt ProfileUsePseudoProbes;
+extern cl::opt ProfileWritePseudoProbes;
 extern cl::opt SaveProfile;
 
 cl::opt ReadPreAggregated(
@@ -2300,7 +2300,7 @@ std::error_code DataAggregator::writeBATYAML(BinaryContext &BC,
   yaml::bolt::BinaryProfile BP;
 
   const MCPseudoProbeDecoder *PseudoProbeDecoder =
-  opts::ProfileUsePseudoProbes ? BC.getPseudoProbeDecoder() : nullptr;
+  opts::ProfileWritePseudoProbes ? BC.getPseudoProbeDecoder() : nullptr;
 
   // Fill out the header info.
   BP.Header.Version = 1;
diff --git a/bolt/lib/Profile/YAMLProfileReader.cpp b/bolt/lib/Profile/YAMLProfileReader.cpp
index 3eca5e972fa5ba..604a9fb4813be4 100644
--- a/bolt/lib/Profile/YAMLProfileReader.cpp
+++ b/bolt/lib/Profile/YAMLProfileReader.cpp
@@ -49,11 +49,6 @@ llvm::cl::opt
 llvm::cl::opt ProfileUseDFS("profile-use-dfs",
   cl::desc("use DFS order for YAML profile"),
   cl::Hidden, cl::cat(BoltOptCategory));
-
-llvm::cl::opt ProfileUsePseudoProbes(
-"profile-use-pseudo-probes",
-cl::desc("Use pseudo probes for profile generation and matching"),
-cl::Hidden, cl::cat(BoltOptCategory));
 } // namespace opts
 
 namespace llvm {
diff --git a/bolt/lib/Profile/YAMLProfileWriter.cpp b/bolt/lib/Profile/YAMLProfileWriter.cpp
index f74cf60e076d0a..ffbf2388e912fb 100644
--- a/bolt/lib/Profile/YAMLProfileWriter.cpp
+++ b/bolt/lib/Profile/YAMLProfileWriter.cpp
@@ -13,6 +13,7 @@
 #include "bolt/Profile/DataAggregator.h"
 #include "bolt/Profile/ProfileReaderBase.h"
 #include "bolt/Rewrite/RewriteInstance.h"
+#include "bolt/Utils/CommandLineOpts.h"
 #include "llvm/Support/CommandLine.h"
 #include "llvm/Support/FileSystem.h"
 #include "llvm/Support/raw_ostream.h"
@@ -21,8 +22,12 @@
 #define DEBUG_TYPE "bolt-prof"
 
 namespace opts {
-extern llvm::cl::opt ProfileUseDFS;
-extern llvm::cl::opt ProfileUsePseudoProbes;
+using namespace llvm;
+extern cl::opt ProfileUseDFS;
+cl::opt ProfileWritePseudoProbes(
+"profile-write-pseudo-probes",
+cl::desc("Use pseudo probes in profile generation"), cl::Hidden,
+cl::cat(BoltOptCategory));
 } // namespace opts
 
 namespace llvm {
@@ -59,7 +64,7 @@ YAMLProfileWriter::convert(const BinaryFunction &BF, bool UseDFS,
   yaml::bolt::BinaryFunctionProfile YamlBF;
   const BinaryContext &BC = BF.getBinaryContext();
   const MCPseudoProbeDecoder *PseudoProbeDecoder =
-  opts::ProfileUsePseudoProbes ? BC.getPseudoProbeDecoder() : nullptr;
+  opts::ProfileWritePseudoProbes ? BC.getPseudoProbeDecoder() : nullptr;
 
   const uint16_t LBRProfile = BF.getProfileFlags() & BinaryFunction::PF_LBR;
 
diff --git a/bolt/lib/Rewrite/PseudoProbeRewriter.cpp b/bolt/lib/Rewrite/PseudoProbeRewriter.cpp
index 6e80d9b0014b7b..228913e6ea1f39 100644
--- a/bolt/lib/Rewrite/PseudoProbeRewriter.cpp
+++ b/bolt/lib/Rewrite/PseudoProbeRewriter.cpp
@@ -50,7 +50,7 @@ static cl::opt PrintPseudoProbes(
clEnumValN(PPP_All, "all", "enable all debugging printout")),
 cl::Hidden, cl::cat(BoltCategory));
 
-extern cl::opt ProfileUsePseudoProbes;
+extern cl::opt ProfileWritePseudoProbes;
 } // namespace opts
 
 namespace {
@@ -91,14 +91,14 @@ class PseudoProbeRewriter final : public MetadataRewriter {
 };
 
 Error PseudoProbeRewriter::preCFGInitializer() {
-  if (opts::ProfileUsePseudoProbes)
+  if (opts::ProfileWritePseudoProbes)
 parsePseudoProbe();
 
   return Error::success();
 }
 
 Error PseudoProbeRewriter::postEmitFinalizer() {
-  if (!opts::ProfileUsePseudoProbes)
+  if (!opts::ProfileWritePseudoProbes)
 parsePseudoProbe();
   updatePseudoProbes();
 
diff --git a/bolt/test/X86/pseudoprobe-decoding-inline.test b/bolt/test/X86/pseudoprobe-decoding-inline.test
index b361551e5711ea..1fdd00c7ef6c4b 100644
--- a/bolt/test/X86/pseudoprobe-decoding-inline.test
+++ b/bolt/test/X86/pseudoprobe-decoding-inline.test
@@ -6,11 +6,11 @@
 # PREAGG: B X:0 #main# 1 0
 ## Check pseudo-probes in regular YAML profile (non-BOLTed binary)
 # RUN: link_fdata %s %S/../../../llvm/test/tools/llvm-profgen/Inputs/inline-cs-pseudoprobe.perfbin %t.preagg PREAGG
-# RUN: perf2bolt 
%S/../../../llvm/test/tools/llvm-

[llvm-branch-commits] [BOLT][NFC] Rename profile-use-pseudo-probes (PR #106364)

2024-08-28 Thread Amir Ayupov via llvm-branch-commits

https://github.com/aaupov edited 
https://github.com/llvm/llvm-project/pull/106364


[llvm-branch-commits] [BOLT] Only parse probes for profiled functions in profile-write-pseudo-probes mode (PR #106365)

2024-08-28 Thread Amir Ayupov via llvm-branch-commits

https://github.com/aaupov created 
https://github.com/llvm/llvm-project/pull/106365

None




[llvm-branch-commits] [BOLT] Only parse probes for profiled functions in profile-write-pseudo-probes mode (PR #106365)

2024-08-28 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-bolt

Author: Amir Ayupov (aaupov)


Changes



---
Full diff: https://github.com/llvm/llvm-project/pull/106365.diff


1 Files Affected:

- (modified) bolt/lib/Rewrite/PseudoProbeRewriter.cpp (+7-3) 


``diff
diff --git a/bolt/lib/Rewrite/PseudoProbeRewriter.cpp b/bolt/lib/Rewrite/PseudoProbeRewriter.cpp
index 228913e6ea1f39..89a7fddbb5d2af 100644
--- a/bolt/lib/Rewrite/PseudoProbeRewriter.cpp
+++ b/bolt/lib/Rewrite/PseudoProbeRewriter.cpp
@@ -72,7 +72,8 @@ class PseudoProbeRewriter final : public MetadataRewriter {
 
   /// Parse .pseudo_probe_desc section and .pseudo_probe section
   /// Setup Pseudo probe decoder
-  void parsePseudoProbe();
+  /// If \p ProfiledOnly is set, only parse records for functions with profile.
+  void parsePseudoProbe(bool ProfiledOnly = false);
 
   /// PseudoProbe decoder
   std::shared_ptr ProbeDecoderPtr;
@@ -92,7 +93,7 @@ class PseudoProbeRewriter final : public MetadataRewriter {
 
 Error PseudoProbeRewriter::preCFGInitializer() {
   if (opts::ProfileWritePseudoProbes)
-parsePseudoProbe();
+parsePseudoProbe(true);
 
   return Error::success();
 }
@@ -105,7 +106,7 @@ Error PseudoProbeRewriter::postEmitFinalizer() {
   return Error::success();
 }
 
-void PseudoProbeRewriter::parsePseudoProbe() {
+void PseudoProbeRewriter::parsePseudoProbe(bool ProfiledOnly) {
   MCPseudoProbeDecoder &ProbeDecoder(*ProbeDecoderPtr);
   PseudoProbeDescSection = BC.getUniqueSectionByName(".pseudo_probe_desc");
   PseudoProbeSection = BC.getUniqueSectionByName(".pseudo_probe");
@@ -136,6 +137,7 @@ void PseudoProbeRewriter::parsePseudoProbe() {
   MCPseudoProbeDecoder::Uint64Map FuncStartAddrs;
   SmallVector Suffixes({".llvm.", ".destroy", ".resume"});
   for (const BinaryFunction *F : BC.getAllBinaryFunctions()) {
+bool HasProfile = F->hasProfileAvailable();
 for (const MCSymbol *Sym : F->getSymbols()) {
   StringRef SymName = NameResolver::restore(Sym->getName());
   if (std::optional CommonName =
@@ -144,6 +146,8 @@ void PseudoProbeRewriter::parsePseudoProbe() {
   }
   uint64_t GUID = Function::getGUID(SymName);
   FuncStartAddrs[GUID] = F->getAddress();
+  if (ProfiledOnly && HasProfile)
+GuidFilter.insert(GUID);
 }
   }
   Contents = PseudoProbeSection->getContents();

``
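
The filtering step in the diff above can be sketched in isolation as follows. This is only an illustration under stated assumptions: `FunctionInfo`, `ProbeIndex`, `guidOf`, and `buildProbeIndex` are hypothetical stand-ins (the real code works on `BinaryFunction` and `Function::getGUID`), and `std::hash` is used purely to keep the example self-contained.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for a BOLT BinaryFunction; only the fields this
// sketch needs are modeled.
struct FunctionInfo {
  std::vector<std::string> Symbols;
  uint64_t Address;
  bool HasProfile;
};

struct ProbeIndex {
  std::unordered_map<uint64_t, uint64_t> FuncStartAddrs; // GUID -> address
  std::unordered_set<uint64_t> GuidFilter; // GUIDs selected for parsing
};

// Stand-in for Function::getGUID; std::hash is an assumption made only so
// the sketch compiles on its own.
inline uint64_t guidOf(const std::string &Name) {
  return std::hash<std::string>{}(Name);
}

inline ProbeIndex buildProbeIndex(const std::vector<FunctionInfo> &Funcs,
                                  bool ProfiledOnly) {
  ProbeIndex Index;
  for (const FunctionInfo &F : Funcs) {
    for (const std::string &Sym : F.Symbols) {
      uint64_t GUID = guidOf(Sym);
      // Start addresses are recorded for every function.
      Index.FuncStartAddrs[GUID] = F.Address;
      // Only profiled functions enter the filter, and only when requested.
      if (ProfiledOnly && F.HasProfile)
        Index.GuidFilter.insert(GUID);
    }
  }
  return Index;
}
```

The point of the shape is that `ProfiledOnly == false` leaves the filter empty, while `ProfiledOnly == true` populates it with the GUIDs of functions that have a profile.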




https://github.com/llvm/llvm-project/pull/106365


[llvm-branch-commits] [BOLT] Only parse probes for profiled functions in profile-write-pseudo-probes mode (PR #106365)

2024-08-28 Thread Amir Ayupov via llvm-branch-commits

https://github.com/aaupov edited 
https://github.com/llvm/llvm-project/pull/106365


[llvm-branch-commits] [llvm] DAG: Handle lowering unordered compare with inf (PR #100378)

2024-08-28 Thread Serge Pavlov via llvm-branch-commits


@@ -219,9 +219,13 @@ findSplitPointForStackProtector(MachineBasicBlock *BB,
 /// (i.e. fewer instructions should be required to lower it).  An example is the
 /// test "inf|normal|subnormal|zero", which is an inversion of "nan".
 /// \param Test The test as specified in 'is_fpclass' intrinsic invocation.
+///
+/// \param UseFP The intention is to perform the comparison using floating-point
+/// compare instructions which check for nan.
+///

spavloff wrote:

In the example in 
https://llvm.org/docs/CodingStandards.html#doxygen-use-in-documentation-comments, 
parameter lines are not separated by blank lines.

It is not a big deal, but having the params separated from each other and NOT 
separated from the description doesn't look good.
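
To make the suggested layout concrete, here is a small sketch: the description is followed by one blank `///` line, and the `\param` lines then sit next to each other. The function `invertTestSketch` and its ten-bit mask are hypothetical and exist only to give the comment something to document; it is not the function under review.

```cpp
#include <cassert>
#include <cstdint>

/// Returns the complement of \p Test within the low ten bits.
///
/// \param Test The bitmask to invert.
/// \param UseFP Unused in this sketch; kept to show two adjacent \param lines.
inline uint32_t invertTestSketch(uint32_t Test, bool UseFP) {
  (void)UseFP; // intentionally unused
  return ~Test & 0x3FFu;
}
```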

https://github.com/llvm/llvm-project/pull/100378


[llvm-branch-commits] [llvm] [LLVM][Coroutines] Transform "coro_elide_safe" calls to switch ABI coroutines to the `noalloc` variant (PR #99285)

2024-08-28 Thread Yuxuan Chen via llvm-branch-commits


@@ -0,0 +1,147 @@
+//===- CoroAnnotationElide.cpp - Elide attributed safe coroutine calls ===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+//
+// \file
+// This pass transforms all Call or Invoke instructions that are annotated
+// "coro_elide_safe" to call the `.noalloc` variant of coroutine instead.
+// The frame of the callee coroutine is allocated inside the caller. A pointer
+// to the allocated frame will be passed into the `.noalloc` ramp function.
+//
+//===--===//
+
+#include "llvm/Transforms/Coroutines/CoroAnnotationElide.h"
+
+#include "llvm/Analysis/LazyCallGraph.h"
+#include "llvm/Analysis/OptimizationRemarkEmitter.h"
+#include "llvm/IR/Analysis.h"
+#include "llvm/IR/IRBuilder.h"
+#include "llvm/IR/InstIterator.h"
+#include "llvm/IR/Instruction.h"
+#include "llvm/IR/Module.h"
+#include "llvm/IR/PassManager.h"
+#include "llvm/Transforms/Utils/CallGraphUpdater.h"
+
+#include 
+
+using namespace llvm;
+
+#define DEBUG_TYPE "coro-annotation-elide"
+
+static Instruction *getFirstNonAllocaInTheEntryBlock(Function *F) {
+  for (Instruction &I : F->getEntryBlock())
+if (!isa<AllocaInst>(&I))
+  return &I;
+  llvm_unreachable("no terminator in the entry block");
+}
+
+// Create an alloca in the caller, using FrameSize and FrameAlign as the callee
+// coroutine's activation frame.
+static Value *allocateFrameInCaller(Function *Caller, uint64_t FrameSize,
+Align FrameAlign) {
+  LLVMContext &C = Caller->getContext();
+  BasicBlock::iterator InsertPt =
+  getFirstNonAllocaInTheEntryBlock(Caller)->getIterator();
+  const DataLayout &DL = Caller->getDataLayout();
+  auto FrameTy = ArrayType::get(Type::getInt8Ty(C), FrameSize);
+  auto *Frame = new AllocaInst(FrameTy, DL.getAllocaAddrSpace(), "", InsertPt);
+  Frame->setAlignment(FrameAlign);
+  return new BitCastInst(Frame, PointerType::getUnqual(C), "vFrame", InsertPt);
+}
+
+// Given a call or invoke instruction to the elide safe coroutine, this function
+// does the following:
+//  - Allocate a frame for the callee coroutine in the caller using alloca.
+//  - Replace the old CB with a new Call or Invoke to `NewCallee`, with the
+//pointer to the frame as an additional argument to NewCallee.
+static void processCall(CallBase *CB, Function *Caller, Function *NewCallee,
+uint64_t FrameSize, Align FrameAlign) {
+  auto *FramePtr = allocateFrameInCaller(Caller, FrameSize, FrameAlign);

yuxuanchen1997 wrote:

The old CoroElide didn't have it, and off the top of my head I don't see a 
clear path for allowing this in the LLVM coroutine semantics. 

In C++ semantics this is doable (the lifetime of the coroutine ends at the full 
expression after `co_await`). Maybe introduce this from the frontend? 

But sure, let's leave a TODO here for another day. 

https://github.com/llvm/llvm-project/pull/99285


[llvm-branch-commits] [llvm] [LLVM][Coroutines] Transform "coro_elide_safe" calls to switch ABI coroutines to the `noalloc` variant (PR #99285)

2024-08-28 Thread Yuxuan Chen via llvm-branch-commits


@@ -0,0 +1,147 @@
+//===- CoroAnnotationElide.cpp - Elide attributed safe coroutine calls ===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+//
+// \file
+// This pass transforms all Call or Invoke instructions that are annotated
+// "coro_elide_safe" to call the `.noalloc` variant of coroutine instead.
+// The frame of the callee coroutine is allocated inside the caller. A pointer
+// to the allocated frame will be passed into the `.noalloc` ramp function.
+//
+//===--===//
+
+#include "llvm/Transforms/Coroutines/CoroAnnotationElide.h"
+
+#include "llvm/Analysis/LazyCallGraph.h"
+#include "llvm/Analysis/OptimizationRemarkEmitter.h"
+#include "llvm/IR/Analysis.h"
+#include "llvm/IR/IRBuilder.h"
+#include "llvm/IR/InstIterator.h"
+#include "llvm/IR/Instruction.h"
+#include "llvm/IR/Module.h"
+#include "llvm/IR/PassManager.h"
+#include "llvm/Transforms/Utils/CallGraphUpdater.h"
+
+#include 
+
+using namespace llvm;
+
+#define DEBUG_TYPE "coro-annotation-elide"
+
+static Instruction *getFirstNonAllocaInTheEntryBlock(Function *F) {
+  for (Instruction &I : F->getEntryBlock())
+if (!isa<AllocaInst>(&I))
+  return &I;
+  llvm_unreachable("no terminator in the entry block");
+}
+
+// Create an alloca in the caller, using FrameSize and FrameAlign as the callee
+// coroutine's activation frame.
+static Value *allocateFrameInCaller(Function *Caller, uint64_t FrameSize,
+Align FrameAlign) {
+  LLVMContext &C = Caller->getContext();
+  BasicBlock::iterator InsertPt =
+  getFirstNonAllocaInTheEntryBlock(Caller)->getIterator();
+  const DataLayout &DL = Caller->getDataLayout();
+  auto FrameTy = ArrayType::get(Type::getInt8Ty(C), FrameSize);
+  auto *Frame = new AllocaInst(FrameTy, DL.getAllocaAddrSpace(), "", InsertPt);
+  Frame->setAlignment(FrameAlign);
+  return new BitCastInst(Frame, PointerType::getUnqual(C), "vFrame", InsertPt);

yuxuanchen1997 wrote:

This is the same procedure as in `CoroElide`. Let's remove the bitcast I guess?

https://github.com/llvm/llvm-project/pull/99285


[llvm-branch-commits] [llvm] [LLVM][Coroutines] Transform "coro_elide_safe" calls to switch ABI coroutines to the `noalloc` variant (PR #99285)

2024-08-28 Thread Yuxuan Chen via llvm-branch-commits


@@ -0,0 +1,147 @@
+//===- CoroAnnotationElide.cpp - Elide attributed safe coroutine calls ===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+//
+// \file
+// This pass transforms all Call or Invoke instructions that are annotated
+// "coro_elide_safe" to call the `.noalloc` variant of coroutine instead.
+// The frame of the callee coroutine is allocated inside the caller. A pointer
+// to the allocated frame will be passed into the `.noalloc` ramp function.
+//
+//===--===//
+
+#include "llvm/Transforms/Coroutines/CoroAnnotationElide.h"
+
+#include "llvm/Analysis/LazyCallGraph.h"
+#include "llvm/Analysis/OptimizationRemarkEmitter.h"
+#include "llvm/IR/Analysis.h"
+#include "llvm/IR/IRBuilder.h"
+#include "llvm/IR/InstIterator.h"
+#include "llvm/IR/Instruction.h"
+#include "llvm/IR/Module.h"
+#include "llvm/IR/PassManager.h"
+#include "llvm/Transforms/Utils/CallGraphUpdater.h"
+
+#include 
+
+using namespace llvm;
+
+#define DEBUG_TYPE "coro-annotation-elide"
+
+static Instruction *getFirstNonAllocaInTheEntryBlock(Function *F) {
+  for (Instruction &I : F->getEntryBlock())
+if (!isa<AllocaInst>(&I))
+  return &I;
+  llvm_unreachable("no terminator in the entry block");
+}
+
+// Create an alloca in the caller, using FrameSize and FrameAlign as the callee
+// coroutine's activation frame.
+static Value *allocateFrameInCaller(Function *Caller, uint64_t FrameSize,
+Align FrameAlign) {
+  LLVMContext &C = Caller->getContext();
+  BasicBlock::iterator InsertPt =
+  getFirstNonAllocaInTheEntryBlock(Caller)->getIterator();
+  const DataLayout &DL = Caller->getDataLayout();
+  auto FrameTy = ArrayType::get(Type::getInt8Ty(C), FrameSize);
+  auto *Frame = new AllocaInst(FrameTy, DL.getAllocaAddrSpace(), "", InsertPt);
+  Frame->setAlignment(FrameAlign);
+  return new BitCastInst(Frame, PointerType::getUnqual(C), "vFrame", InsertPt);
+}
+
+// Given a call or invoke instruction to the elide safe coroutine, this function
+// does the following:
+//  - Allocate a frame for the callee coroutine in the caller using alloca.
+//  - Replace the old CB with a new Call or Invoke to `NewCallee`, with the
+//pointer to the frame as an additional argument to NewCallee.
+static void processCall(CallBase *CB, Function *Caller, Function *NewCallee,
+uint64_t FrameSize, Align FrameAlign) {
+  auto *FramePtr = allocateFrameInCaller(Caller, FrameSize, FrameAlign);
+  auto NewCBInsertPt = CB->getIterator();
+  llvm::CallBase *NewCB = nullptr;
+  SmallVector NewArgs;
+  NewArgs.append(CB->arg_begin(), CB->arg_end());
+  NewArgs.push_back(FramePtr);
+
+  if (auto *CI = dyn_cast<CallInst>(CB)) {
+auto *NewCI = CallInst::Create(NewCallee->getFunctionType(), NewCallee,
+   NewArgs, "", NewCBInsertPt);
+NewCI->setTailCallKind(CI->getTailCallKind());
+NewCB = NewCI;

yuxuanchen1997 wrote:

`setTailCallKind` is on `CallInst` not `CallBase`. This `NewCB = NewCI` is 
upcasting the pointer. 
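
As a minimal sketch of the point being made, the pattern looks like this in plain C++. The types `CallBaseSketch` and `CallInstSketch` and the helper `replaceCall` are hypothetical miniatures, not LLVM classes; they only mirror the relationship that `setTailCallKind` lives on the derived type, so it has to be called before the pointer is stored as the base type.

```cpp
#include <cassert>

// Hypothetical miniature of the CallBase/CallInst relationship.
enum class TailCallKind { None, Tail, MustTail };

struct CallBaseSketch {
  virtual ~CallBaseSketch() = default;
};

struct CallInstSketch : CallBaseSketch {
  TailCallKind Kind = TailCallKind::None;
  void setTailCallKind(TailCallKind K) { Kind = K; }
  TailCallKind getTailCallKind() const { return Kind; }
};

// Copies the tail-call kind through the derived type, then upcasts.
inline CallBaseSketch *replaceCall(const CallInstSketch &Old,
                                   CallInstSketch &New) {
  New.setTailCallKind(Old.getTailCallKind()); // only on the derived type
  CallBaseSketch *NewCB = &New; // implicit derived-to-base conversion
  return NewCB;
}
```

After the assignment to the base pointer, the setter is no longer reachable without a cast, which is why the call is made on the derived pointer first.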

https://github.com/llvm/llvm-project/pull/99285


[llvm-branch-commits] [llvm] AArch64: Use consistent atomicrmw expansion for FP operations (PR #103702)

2024-08-28 Thread Matt Arsenault via llvm-branch-commits


@@ -27056,21 +27056,35 @@ AArch64TargetLowering::shouldExpandAtomicLoadInIR(LoadInst *LI) const {
  : AtomicExpansionKind::LLSC;
 }
 
+// Return true if the atomic operation expansion will lower to use a library
+// call, and is thus ineligible to use an LLSC expansion.
+static bool rmwOpMayLowerToLibcall(const AtomicRMWInst *RMW) {
+  if (!RMW->isFloatingPointOperation())
+return false;
+  switch (RMW->getType()->getScalarType()->getTypeID()) {
+  case Type::FloatTyID:
+  case Type::DoubleTyID:
+  case Type::HalfTyID:
+  case Type::BFloatTyID:
+return false;

arsenm wrote:

That is in the test (in the parent #103701)

https://github.com/llvm/llvm-project/pull/103702


[llvm-branch-commits] [llvm] AArch64: Use consistent atomicrmw expansion for FP operations (PR #103702)

2024-08-28 Thread Eli Friedman via llvm-branch-commits

https://github.com/efriedma-quic approved this pull request.

LGTM

https://github.com/llvm/llvm-project/pull/103702


[llvm-branch-commits] [clang-tools-extra] [clangd] Add clangd 19 release notes (PR #105975)

2024-08-28 Thread Nathan Ridge via llvm-branch-commits

https://github.com/HighCommander4 milestoned 
https://github.com/llvm/llvm-project/pull/105975


[llvm-branch-commits] [clang-tools-extra] [clangd] Add clangd 19 release notes (PR #105975)

2024-08-28 Thread Nathan Ridge via llvm-branch-commits

HighCommander4 wrote:

Thanks for the review.

@tstellar could you merge these release notes for us please?

https://github.com/llvm/llvm-project/pull/105975


[llvm-branch-commits] [llvm] 5a00383 - Revert "Revert "[MemProf] Reduce cloning overhead by sharing nodes when possi…"

2024-08-28 Thread via llvm-branch-commits

Author: Teresa Johnson
Date: 2024-08-28T11:44:54-07:00
New Revision: 5a00383d7f192a2951e3add4d8ab1f918e7d58f8

URL: https://github.com/llvm/llvm-project/commit/5a00383d7f192a2951e3add4d8ab1f918e7d58f8
DIFF: https://github.com/llvm/llvm-project/commit/5a00383d7f192a2951e3add4d8ab1f918e7d58f8.diff

LOG: Revert "Revert "[MemProf] Reduce cloning overhead by sharing nodes when possi…"

This reverts commit 11aa31f595325d6b2dede3364e4b86d78fffe635.

Added: 


Modified: 
llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp

Removed: 




diff  --git a/llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp b/llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp
index 66b68d5cd457fb..c9de9c964bba0a 100644
--- a/llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp
+++ b/llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp
@@ -242,9 +242,16 @@ class CallsiteContextGraph {
 // recursion.
 bool Recursive = false;
 
-// The corresponding allocation or interior call.
+// The corresponding allocation or interior call. This is the primary call
+// for which we have created this node.
 CallInfo Call;
 
+// List of other calls that can be treated the same as the primary call
+// through cloning. I.e. located in the same function and have the same
+// (possibly pruned) stack ids. They will be updated the same way as the
+// primary call when assigning to function clones.
+std::vector MatchingCalls;
+
 // For alloc nodes this is a unique id assigned when constructed, and for
 // callsite stack nodes it is the original stack id when the node is
 // constructed from the memprof MIB metadata on the alloc nodes. Note that
@@ -457,6 +464,9 @@ class CallsiteContextGraph {
   /// iteration.
   MapVector> FuncToCallsWithMetadata;
 
+  /// Records the function each call is located in.
+  DenseMap CallToFunc;
+
   /// Map from callsite node to the enclosing caller function.
   std::map NodeToCallingFunc;
 
@@ -474,7 +484,8 @@ class CallsiteContextGraph {
   /// StackIdToMatchingCalls map.
   void assignStackNodesPostOrder(
   ContextNode *Node, DenseSet &Visited,
-  DenseMap> &StackIdToMatchingCalls);
+  DenseMap> &StackIdToMatchingCalls,
+  DenseMap &CallToMatchingCall);
 
   /// Duplicates the given set of context ids, updating the provided
   /// map from each original id with the newly generated context ids,
@@ -1230,10 +1241,11 @@ static void checkNode(const ContextNode *Node,
 
 template 
 void CallsiteContextGraph::
-assignStackNodesPostOrder(ContextNode *Node,
-  DenseSet &Visited,
-  DenseMap>
-  &StackIdToMatchingCalls) {
+assignStackNodesPostOrder(
+ContextNode *Node, DenseSet &Visited,
+DenseMap>
+&StackIdToMatchingCalls,
+DenseMap &CallToMatchingCall) {
   auto Inserted = Visited.insert(Node);
   if (!Inserted.second)
 return;
@@ -1246,7 +1258,8 @@ void CallsiteContextGraph::
 // Skip any that have been removed during the recursion.
 if (!Edge)
   continue;
-assignStackNodesPostOrder(Edge->Caller, Visited, StackIdToMatchingCalls);
+assignStackNodesPostOrder(Edge->Caller, Visited, StackIdToMatchingCalls,
+  CallToMatchingCall);
   }
 
   // If this node's stack id is in the map, update the graph to contain new
@@ -1289,8 +1302,19 @@ void CallsiteContextGraph::
 auto &[Call, Ids, Func, SavedContextIds] = Calls[I];
 // Skip any for which we didn't assign any ids, these don't get a node in
 // the graph.
-if (SavedContextIds.empty())
+if (SavedContextIds.empty()) {
+  // If this call has a matching call (located in the same function and
+  // having the same stack ids), simply add it to the context node created
+  // for its matching call earlier. These can be treated the same through
+  // cloning and get updated at the same time.
+  if (!CallToMatchingCall.contains(Call))
+continue;
+  auto MatchingCall = CallToMatchingCall[Call];
+  assert(NonAllocationCallToContextNodeMap.contains(MatchingCall));
+  NonAllocationCallToContextNodeMap[MatchingCall]->MatchingCalls.push_back(
+  Call);
   continue;
+}
 
 assert(LastId == Ids.back());
 
@@ -1422,6 +1446,10 @@ void CallsiteContextGraph::updateStackNodes() {
   // there is more than one call with the same stack ids. Their (possibly newly
   // duplicated) context ids are saved in the StackIdToMatchingCalls map.
   DenseMap> OldToNewContextIds;
+  // Save a map from each call to any that are found to match it. I.e. located
+  // in the same function and have the same (possibly pruned) stack ids. We use
+  // this to avoid creating extra graph nodes as they can be treated the same.
+  DenseMap CallToMatchingCall;
   for (auto &It : StackIdTo
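
Editor's sketch (not part of the commit): the comments in the hunks above describe a node's primary call plus a list of "matching calls" that sit in the same function and carry the same (possibly pruned) stack ids, so they can share one graph node. The snippet below illustrates only that grouping idea. `ToyCall`, `ToyNode` and `groupMatchingCalls` are hypothetical names made up for this illustration; the real pass works on `CallInfo` and `ContextNode` and has additional conditions that are not modelled here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a call site; this is NOT LLVM's CallInfo.
struct ToyCall {
  int Id;                         // identifies the call instruction
  std::string Func;               // enclosing function
  std::vector<uint64_t> StackIds; // (possibly pruned) stack ids
};

// Hypothetical stand-in for a graph node: one primary call plus the calls
// that can be updated the same way as the primary call.
struct ToyNode {
  int PrimaryCall;
  std::vector<int> MatchingCalls;
};

// The first call seen for a (function, stack ids) key becomes the primary
// call of a new node; later calls with the same key are appended to that
// node's MatchingCalls instead of getting a node of their own.
std::vector<ToyNode> groupMatchingCalls(const std::vector<ToyCall> &Calls) {
  std::map<std::pair<std::string, std::vector<uint64_t>>, std::size_t>
      KeyToNode;
  std::vector<ToyNode> Nodes;
  for (const ToyCall &C : Calls) {
    auto Key = std::make_pair(C.Func, C.StackIds);
    auto It = KeyToNode.find(Key);
    if (It == KeyToNode.end()) {
      KeyToNode.emplace(Key, Nodes.size());
      Nodes.push_back({C.Id, {}});
    } else {
      Nodes[It->second].MatchingCalls.push_back(C.Id);
    }
  }
  return Nodes;
}
```

The point of the design, as the commit comments put it, is to avoid creating extra graph nodes for calls that would be cloned and updated identically.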

[llvm-branch-commits] [clang] Revert "[LinkerWrapper] Extend with usual pass options (#96704)" (#102226) (PR #106439)

2024-08-28 Thread Artem Dergachev via llvm-branch-commits

https://github.com/haoNoQ milestoned 
https://github.com/llvm/llvm-project/pull/106439


[llvm-branch-commits] [clang] Revert "[LinkerWrapper] Extend with usual pass options (#96704)" (#102226) (PR #106439)

2024-08-28 Thread Artem Dergachev via llvm-branch-commits

https://github.com/haoNoQ created 
https://github.com/llvm/llvm-project/pull/106439

This reverts commit 90ccf2187332ff900d46a58a27cb0353577d37cb.

Cherry picked from commit 030ee841a9c9fbbd6e7c001e751737381da01f7b.

Conflicts:
clang/test/Driver/linker-wrapper-passes.c

>From 5e343fa7c1bef713f367afafbfe25e114c8f86d5 Mon Sep 17 00:00:00 2001
From: Joseph Huber 
Date: Tue, 6 Aug 2024 21:33:25 -0500
Subject: [PATCH] Revert "[LinkerWrapper] Extend with usual pass options
 (#96704)" (#102226)

This reverts commit 90ccf2187332ff900d46a58a27cb0353577d37cb.

Fixes: https://github.com/llvm/llvm-project/issues/100212
(cherry picked from commit 030ee841a9c9fbbd6e7c001e751737381da01f7b)

Conflicts:
clang/test/Driver/linker-wrapper-passes.c
---
 clang/test/Driver/linker-wrapper-passes.c | 71 ---
 clang/test/lit.cfg.py | 12 
 clang/test/lit.site.cfg.py.in |  4 --
 3 files changed, 87 deletions(-)
 delete mode 100644 clang/test/Driver/linker-wrapper-passes.c

diff --git a/clang/test/Driver/linker-wrapper-passes.c b/clang/test/Driver/linker-wrapper-passes.c
deleted file mode 100644
index b257c942afa075..00
--- a/clang/test/Driver/linker-wrapper-passes.c
+++ /dev/null
@@ -1,71 +0,0 @@
-// Check various clang-linker-wrapper pass options after -offload-opt.
-
-// REQUIRES: llvm-plugins, llvm-examples
-// REQUIRES: x86-registered-target
-// REQUIRES: amdgpu-registered-target
-// Setup.
-// RUN: mkdir -p %t
-// RUN: %clang -cc1 -emit-llvm-bc -o %t/host-x86_64-unknown-linux-gnu.bc \
-// RUN: -disable-O0-optnone -triple=x86_64-unknown-linux-gnu %s
-// RUN: %clang -cc1 -emit-llvm-bc -o %t/openmp-amdgcn-amd-amdhsa.bc \
-// RUN: -disable-O0-optnone -triple=amdgcn-amd-amdhsa %s
-// RUN: opt %t/openmp-amdgcn-amd-amdhsa.bc -o %t/openmp-amdgcn-amd-amdhsa.bc \
-// RUN: -passes=forceattrs -force-remove-attribute=f:noinline
-// RUN: clang-offload-packager -o %t/openmp-x86_64-unknown-linux-gnu.out \
-// RUN: --image=file=%t/openmp-amdgcn-amd-amdhsa.bc,triple=amdgcn-amd-amdhsa
-// RUN: %clang -cc1 -S -o %t/host-x86_64-unknown-linux-gnu.s \
-// RUN: -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa \
-// RUN: -fembed-offload-object=%t/openmp-x86_64-unknown-linux-gnu.out \
-// RUN: %t/host-x86_64-unknown-linux-gnu.bc
-// RUN: %clang -cc1as -o %t/host-x86_64-unknown-linux-gnu.o \
-// RUN: -triple x86_64-unknown-linux-gnu -filetype obj -target-cpu x86-64 \
-// RUN: %t/host-x86_64-unknown-linux-gnu.s
-
-// Check plugin, -passes, and no remarks.
-// RUN: clang-linker-wrapper -o a.out --embed-bitcode \
-// RUN: --linker-path=/usr/bin/true %t/host-x86_64-unknown-linux-gnu.o \
-// RUN: %offload-opt-loadbye --offload-opt=-wave-goodbye \
-// RUN: --offload-opt=-passes="function(goodbye),module(inline)" 2>&1 | \
-// RUN:   FileCheck -match-full-lines -check-prefixes=OUT %s
-
-// Check plugin, -p, and remarks.
-// RUN: clang-linker-wrapper -o a.out --embed-bitcode \
-// RUN: --linker-path=/usr/bin/true %t/host-x86_64-unknown-linux-gnu.o \
-// RUN: %offload-opt-loadbye --offload-opt=-wave-goodbye \
-// RUN: --offload-opt=-p="function(goodbye),module(inline)" \
-// RUN: --offload-opt=-pass-remarks=inline \
-// RUN: --offload-opt=-pass-remarks-output=%t/remarks.yml \
-// RUN: --offload-opt=-pass-remarks-filter=inline \
-// RUN: --offload-opt=-pass-remarks-format=yaml 2>&1 | \
-// RUN:   FileCheck -match-full-lines -check-prefixes=OUT,REM %s
-// RUN: FileCheck -input-file=%t/remarks.yml -match-full-lines \
-// RUN: -check-prefixes=YML %s
-
-// Check handling of bad plugin.
-// RUN: not clang-linker-wrapper \
-// RUN: --offload-opt=-load-pass-plugin=%t/nonexistent.so 2>&1 | \
-// RUN:   FileCheck -match-full-lines -check-prefixes=BAD-PLUGIN %s
-
-//  OUT-NOT: {{.}}
-//  OUT: Bye: f
-// OUT-NEXT: Bye: test
-// REM-NEXT: remark: {{.*}} 'f' inlined into 'test' {{.*}}
-//  OUT-NOT: {{.}}
-
-//  YML-NOT: {{.}}
-//  YML: --- !Passed
-// YML-NEXT: Pass: inline
-// YML-NEXT: Name: Inlined
-// YML-NEXT: Function: test
-// YML-NEXT: Args:
-//  YML:  - Callee: f
-//  YML:  - Caller: test
-//  YML: ...
-//  YML-NOT: {{.}}
-
-// BAD-PLUGIN-NOT: {{.}}
-// BAD-PLUGIN: {{.*}}Could not load library {{.*}}nonexistent.so{{.*}}
-// BAD-PLUGIN-NOT: {{.}}
-
-void f() {}
-void test() { f(); }
diff --git a/clang/test/lit.cfg.py b/clang/test/lit.cfg.py
index 2bd7501136a10e..92a3361ce672e2 100644
--- a/clang/test/lit.cfg.py
+++ b/clang/test/lit.cfg.py
@@ -110,15 +110,6 @@
 if config.clang_examples:
 config.available_features.add("examples")
 
-if config.llvm_examples:
-config.available_features.add("llvm-examples")
-
-if config.llvm_linked_bye_extension:
-config.substitutions.append(("%offload-opt-loadbye", ""))
-else:
-loadbye = f"-load-pass-plugin={config.llvm_shlib_dir}/Bye{config.llvm_shlib_ext}"
-config.substitutions.append(("%offload-opt-loadbye", f"--offload-opt={loadby

[llvm-branch-commits] [clang] Revert "[LinkerWrapper] Extend with usual pass options (#96704)" (#102226) (PR #106439)

2024-08-28 Thread Artem Dergachev via llvm-branch-commits

https://github.com/haoNoQ edited 
https://github.com/llvm/llvm-project/pull/106439


[llvm-branch-commits] [clang] Revert "[LinkerWrapper] Extend with usual pass options (#96704)" (#102226) (PR #106439)

2024-08-28 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-clang-driver

Author: Artem Dergachev (haoNoQ)


Changes

This reverts commit 90ccf2187332ff900d46a58a27cb0353577d37cb.

Cherry picked from commit 030ee841a9c9fbbd6e7c001e751737381da01f7b.

Conflicts:
clang/test/Driver/linker-wrapper-passes.c

---
Full diff: https://github.com/llvm/llvm-project/pull/106439.diff


3 Files Affected:

- (removed) clang/test/Driver/linker-wrapper-passes.c (-71) 
- (modified) clang/test/lit.cfg.py (-12) 
- (modified) clang/test/lit.site.cfg.py.in (-4) 


``diff
diff --git a/clang/test/Driver/linker-wrapper-passes.c b/clang/test/Driver/linker-wrapper-passes.c
deleted file mode 100644
index b257c942afa075..00
--- a/clang/test/Driver/linker-wrapper-passes.c
+++ /dev/null
@@ -1,71 +0,0 @@
-// Check various clang-linker-wrapper pass options after -offload-opt.
-
-// REQUIRES: llvm-plugins, llvm-examples
-// REQUIRES: x86-registered-target
-// REQUIRES: amdgpu-registered-target
-// Setup.
-// RUN: mkdir -p %t
-// RUN: %clang -cc1 -emit-llvm-bc -o %t/host-x86_64-unknown-linux-gnu.bc \
-// RUN: -disable-O0-optnone -triple=x86_64-unknown-linux-gnu %s
-// RUN: %clang -cc1 -emit-llvm-bc -o %t/openmp-amdgcn-amd-amdhsa.bc \
-// RUN: -disable-O0-optnone -triple=amdgcn-amd-amdhsa %s
-// RUN: opt %t/openmp-amdgcn-amd-amdhsa.bc -o %t/openmp-amdgcn-amd-amdhsa.bc \
-// RUN: -passes=forceattrs -force-remove-attribute=f:noinline
-// RUN: clang-offload-packager -o %t/openmp-x86_64-unknown-linux-gnu.out \
-// RUN: --image=file=%t/openmp-amdgcn-amd-amdhsa.bc,triple=amdgcn-amd-amdhsa
-// RUN: %clang -cc1 -S -o %t/host-x86_64-unknown-linux-gnu.s \
-// RUN: -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa \
-// RUN: -fembed-offload-object=%t/openmp-x86_64-unknown-linux-gnu.out \
-// RUN: %t/host-x86_64-unknown-linux-gnu.bc
-// RUN: %clang -cc1as -o %t/host-x86_64-unknown-linux-gnu.o \
-// RUN: -triple x86_64-unknown-linux-gnu -filetype obj -target-cpu x86-64 \
-// RUN: %t/host-x86_64-unknown-linux-gnu.s
-
-// Check plugin, -passes, and no remarks.
-// RUN: clang-linker-wrapper -o a.out --embed-bitcode \
-// RUN: --linker-path=/usr/bin/true %t/host-x86_64-unknown-linux-gnu.o \
-// RUN: %offload-opt-loadbye --offload-opt=-wave-goodbye \
-// RUN: --offload-opt=-passes="function(goodbye),module(inline)" 2>&1 | \
-// RUN:   FileCheck -match-full-lines -check-prefixes=OUT %s
-
-// Check plugin, -p, and remarks.
-// RUN: clang-linker-wrapper -o a.out --embed-bitcode \
-// RUN: --linker-path=/usr/bin/true %t/host-x86_64-unknown-linux-gnu.o \
-// RUN: %offload-opt-loadbye --offload-opt=-wave-goodbye \
-// RUN: --offload-opt=-p="function(goodbye),module(inline)" \
-// RUN: --offload-opt=-pass-remarks=inline \
-// RUN: --offload-opt=-pass-remarks-output=%t/remarks.yml \
-// RUN: --offload-opt=-pass-remarks-filter=inline \
-// RUN: --offload-opt=-pass-remarks-format=yaml 2>&1 | \
-// RUN:   FileCheck -match-full-lines -check-prefixes=OUT,REM %s
-// RUN: FileCheck -input-file=%t/remarks.yml -match-full-lines \
-// RUN: -check-prefixes=YML %s
-
-// Check handling of bad plugin.
-// RUN: not clang-linker-wrapper \
-// RUN: --offload-opt=-load-pass-plugin=%t/nonexistent.so 2>&1 | \
-// RUN:   FileCheck -match-full-lines -check-prefixes=BAD-PLUGIN %s
-
-//  OUT-NOT: {{.}}
-//  OUT: Bye: f
-// OUT-NEXT: Bye: test
-// REM-NEXT: remark: {{.*}} 'f' inlined into 'test' {{.*}}
-//  OUT-NOT: {{.}}
-
-//  YML-NOT: {{.}}
-//  YML: --- !Passed
-// YML-NEXT: Pass: inline
-// YML-NEXT: Name: Inlined
-// YML-NEXT: Function: test
-// YML-NEXT: Args:
-//  YML:  - Callee: f
-//  YML:  - Caller: test
-//  YML: ...
-//  YML-NOT: {{.}}
-
-// BAD-PLUGIN-NOT: {{.}}
-// BAD-PLUGIN: {{.*}}Could not load library {{.*}}nonexistent.so{{.*}}
-// BAD-PLUGIN-NOT: {{.}}
-
-void f() {}
-void test() { f(); }
diff --git a/clang/test/lit.cfg.py b/clang/test/lit.cfg.py
index 2bd7501136a10e..92a3361ce672e2 100644
--- a/clang/test/lit.cfg.py
+++ b/clang/test/lit.cfg.py
@@ -110,15 +110,6 @@
 if config.clang_examples:
 config.available_features.add("examples")
 
-if config.llvm_examples:
-config.available_features.add("llvm-examples")
-
-if config.llvm_linked_bye_extension:
-config.substitutions.append(("%offload-opt-loadbye", ""))
-else:
-loadbye = f"-load-pass-plugin={config.llvm_shlib_dir}/Bye{config.llvm_shlib_ext}"
-config.substitutions.append(("%offload-opt-loadbye", f"--offload-opt={loadbye}"))
-
 
 def have_host_jit_feature_support(feature_name):
 clang_repl_exe = lit.util.which("clang-repl", config.clang_tools_dir)
@@ -223,9 +214,6 @@ def have_host_clang_repl_cuda():
 if config.has_plugins and config.llvm_plugin_ext:
 config.available_features.add("plugins")
 
-if config.llvm_has_plugins and config.llvm_plugin_ext:
-config.available_features.add("llvm-plugins")
-
 if config.clang_default_pie_on_linux:
 config.available_features.add("default-p

[llvm-branch-commits] [llvm] [LLVM][Coroutines] Create `.noalloc` variant of switch ABI coroutine ramp functions during CoroSplit (PR #99283)

2024-08-28 Thread Yuxuan Chen via llvm-branch-commits


@@ -2049,6 +2055,21 @@ the coroutine must reach the final suspend point when it get destroyed.
 
 This attribute only works for switched-resume coroutines now.
 
+coro_elide_safe
+---
+
+When a Call or Invoke instruction is marked with `coro_elide_safe`,
+CoroAnnotationElidePass performs heap elision when possible. Note that for
+recursive or mutually recursive functions this elision is usually not possible.
+
+coro_gen_noalloc_ramp
+-
+
+This attribute hints CoroSplitPass to generate a `f.noalloc` ramp function for

yuxuanchen1997 wrote:

This attribute was deleted while addressing your feedback in
https://github.com/llvm/llvm-project/pull/99282#pullrequestreview-2265588601

I can add a clarification in the documentation for coro_elide_safe.

https://github.com/llvm/llvm-project/pull/99283
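
Editor's sketch (not part of the review thread): the documentation hunk quoted above talks about heap elision for calls marked `coro_elide_safe`, where the callee's frame is provided by the caller and handed to a `.noalloc` ramp through an extra pointer argument. The toy below mimics only that calling shape in ordinary C++. Every name in it (`ToyFrame`, `toyRamp`, `toyRampNoAlloc`, and so on) is hypothetical; nothing here reflects how CoroSplit actually lays out frames or emits ramp functions.

```cpp
#include <cassert>

// Hypothetical coroutine frame; real frames are laid out by the compiler.
struct ToyFrame {
  int Arg;
  bool HeapAllocated;
};

// Counts heap allocations so the effect of elision is observable.
static int HeapAllocations = 0;

// Ordinary ramp: allocates its own frame on the heap.
ToyFrame *toyRamp(int Arg) {
  ++HeapAllocations;
  return new ToyFrame{Arg, true};
}

// ".noalloc"-style ramp: takes one extra argument, a pointer to storage
// that the caller has already set aside for the frame.
ToyFrame *toyRampNoAlloc(int Arg, ToyFrame *Storage) {
  Storage->Arg = Arg;
  Storage->HeapAllocated = false;
  return Storage;
}

// Frees the frame only if the ramp allocated it.
void toyDestroy(ToyFrame *F) {
  if (F->HeapAllocated)
    delete F;
}

// Caller without elision: the frame lives on the heap.
int callerWithHeapFrame(int Arg) {
  ToyFrame *F = toyRamp(Arg);
  int Result = F->Arg * 2;
  toyDestroy(F);
  return Result;
}

// Caller with elision: the frame lives in the caller's own storage.
int callerWithElision(int Arg) {
  ToyFrame Storage{};
  ToyFrame *F = toyRampNoAlloc(Arg, &Storage);
  int Result = F->Arg * 2;
  toyDestroy(F);
  return Result;
}
```

In this toy the frame in `callerWithElision` must not outlive the caller, which is one way to picture why the annotation has to assert that elision is safe.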


[llvm-branch-commits] [llvm] 1740035 - Revert "[CodeGen] Use MachineInstr::{all_uses, all_defs} (NFC) (#106404)"

2024-08-28 Thread via llvm-branch-commits

Author: Vitaly Buka
Date: 2024-08-28T13:35:28-07:00
New Revision: 1740035264c3326d7dabee0682dd3802bc4384d7

URL: 
https://github.com/llvm/llvm-project/commit/1740035264c3326d7dabee0682dd3802bc4384d7
DIFF: 
https://github.com/llvm/llvm-project/commit/1740035264c3326d7dabee0682dd3802bc4384d7.diff

LOG: Revert "[CodeGen] Use MachineInstr::{all_uses,all_defs} (NFC) (#106404)"

This reverts commit a4989cd603b8e8185e35e3c2b7b48b422d4898be.

Added: 


Modified: 
llvm/lib/CodeGen/MachineConvergenceVerifier.cpp
llvm/lib/CodeGen/MachineInstr.cpp
llvm/lib/CodeGen/RegAllocFast.cpp

Removed: 




diff  --git a/llvm/lib/CodeGen/MachineConvergenceVerifier.cpp b/llvm/lib/CodeGen/MachineConvergenceVerifier.cpp
index ac6b04a202c533..3d3c55faa82465 100644
--- a/llvm/lib/CodeGen/MachineConvergenceVerifier.cpp
+++ b/llvm/lib/CodeGen/MachineConvergenceVerifier.cpp
@@ -51,7 +51,9 @@ GenericConvergenceVerifier::findAndCheckConvergenceTokenUsed(
   const MachineRegisterInfo &MRI = Context.getFunction()->getRegInfo();
   const MachineInstr *TokenDef = nullptr;
 
-  for (const MachineOperand &MO : MI.all_uses()) {
+  for (const MachineOperand &MO : MI.operands()) {
+if (!MO.isReg() || !MO.isUse())
+  continue;
 Register OpReg = MO.getReg();
 if (!OpReg.isVirtual())
   continue;

diff  --git a/llvm/lib/CodeGen/MachineInstr.cpp b/llvm/lib/CodeGen/MachineInstr.cpp
index 7f81aeb545d328..f21910ee3a444a 100644
--- a/llvm/lib/CodeGen/MachineInstr.cpp
+++ b/llvm/lib/CodeGen/MachineInstr.cpp
@@ -1041,9 +1041,10 @@ unsigned MachineInstr::getBundleSize() const {
 /// Returns true if the MachineInstr has an implicit-use operand of exactly
 /// the given register (not considering sub/super-registers).
 bool MachineInstr::hasRegisterImplicitUseOperand(Register Reg) const {
-  for (const MachineOperand &MO : all_uses())
-if (MO.isImplicit() && MO.getReg() == Reg)
+  for (const MachineOperand &MO : operands()) {
+if (MO.isReg() && MO.isUse() && MO.isImplicit() && MO.getReg() == Reg)
   return true;
+  }
   return false;
 }
 
@@ -1263,8 +1264,10 @@ unsigned MachineInstr::findTiedOperandIdx(unsigned OpIdx) const {
 /// clearKillInfo - Clears kill flags on all operands.
 ///
 void MachineInstr::clearKillInfo() {
-  for (MachineOperand &MO : all_uses())
-MO.setIsKill(false);
+  for (MachineOperand &MO : operands()) {
+if (MO.isReg() && MO.isUse())
+  MO.setIsKill(false);
+  }
 }
 
 void MachineInstr::substituteRegister(Register FromReg, Register ToReg,
@@ -1546,9 +1549,12 @@ bool MachineInstr::isLoadFoldBarrier() const {
 /// allDefsAreDead - Return true if all the defs of this instruction are dead.
 ///
 bool MachineInstr::allDefsAreDead() const {
-  for (const MachineOperand &MO : all_defs())
+  for (const MachineOperand &MO : operands()) {
+if (!MO.isReg() || MO.isUse())
+  continue;
 if (!MO.isDead())
   return false;
+  }
   return true;
 }
 
@@ -2057,8 +2063,8 @@ void MachineInstr::clearRegisterKills(Register Reg,
   const TargetRegisterInfo *RegInfo) {
   if (!Reg.isPhysical())
 RegInfo = nullptr;
-  for (MachineOperand &MO : all_uses()) {
-if (!MO.isKill())
+  for (MachineOperand &MO : operands()) {
+if (!MO.isReg() || !MO.isUse() || !MO.isKill())
   continue;
 Register OpReg = MO.getReg();
 if ((RegInfo && RegInfo->regsOverlap(Reg, OpReg)) || Reg == OpReg)

diff  --git a/llvm/lib/CodeGen/RegAllocFast.cpp b/llvm/lib/CodeGen/RegAllocFast.cpp
index a0a8a8897af7f2..6babd5a3f1f96f 100644
--- a/llvm/lib/CodeGen/RegAllocFast.cpp
+++ b/llvm/lib/CodeGen/RegAllocFast.cpp
@@ -1563,7 +1563,9 @@ void RegAllocFastImpl::allocateInstruction(MachineInstr &MI) {
   bool ReArrangedImplicitMOs = true;
   while (ReArrangedImplicitMOs) {
 ReArrangedImplicitMOs = false;
-for (MachineOperand &MO : MI.all_uses()) {
+for (MachineOperand &MO : MI.operands()) {
+  if (!MO.isReg() || !MO.isUse())
+continue;
   Register Reg = MO.getReg();
   if (!Reg.isVirtual() || !shouldAllocateRegister(Reg))
 continue;
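
Editor's sketch (not part of the commit): each hunk above swaps a loop over a pre-filtered operand range (`all_uses()` or `all_defs()`) back to a loop over `operands()` with an explicit `isReg()` / `isUse()` check. The toy below shows the two loop shapes side by side on a made-up operand type. `ToyOperand`, `allUses` and both `clearKills...` functions are hypothetical stand-ins written for this illustration; they are not `llvm::MachineOperand` or any LLVM API, and the sketch says nothing about why the original change was reverted.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical miniature of an operand; this is NOT llvm::MachineOperand.
struct ToyOperand {
  bool IsReg;  // register operand, as opposed to e.g. an immediate
  bool IsDef;  // meaningful only for register operands
  bool IsKill; // flag that the loops below clear on register uses
  bool isReg() const { return IsReg; }
  bool isUse() const { return !IsDef; }
};

// Loop shape restored by the revert: visit every operand, filter by hand.
void clearKillsManualFilter(std::vector<ToyOperand> &Ops) {
  for (ToyOperand &MO : Ops) {
    if (MO.isReg() && MO.isUse())
      MO.IsKill = false;
  }
}

// Hypothetical helper standing in for a pre-filtered "all uses" range.
std::vector<ToyOperand *> allUses(std::vector<ToyOperand> &Ops) {
  std::vector<ToyOperand *> Uses;
  for (ToyOperand &MO : Ops)
    if (MO.isReg() && MO.isUse())
      Uses.push_back(&MO);
  return Uses;
}

// Loop shape of the reverted change: iterate the pre-filtered range.
void clearKillsFilteredView(std::vector<ToyOperand> &Ops) {
  for (ToyOperand *MO : allUses(Ops))
    MO->IsKill = false;
}
```

On this toy type the two functions leave the operand list in the same state; only register uses lose their kill flag.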





[llvm-branch-commits] [mlir] 77e8b2f - Revert "[mlir][spirv] Add an argmax integration test with `mlir-vulkan-runner…"

2024-08-28 Thread via llvm-branch-commits

Author: Jakub Kuderski
Date: 2024-08-28T17:25:55-04:00
New Revision: 77e8b2fe44d540e23f395789644ccc2d597a956a

URL: 
https://github.com/llvm/llvm-project/commit/77e8b2fe44d540e23f395789644ccc2d597a956a
DIFF: 
https://github.com/llvm/llvm-project/commit/77e8b2fe44d540e23f395789644ccc2d597a956a.diff

LOG: Revert "[mlir][spirv] Add an argmax integration test with `mlir-vulkan-runner…"

This reverts commit 17b7a9da46cef85b1a00b574c18c5f8cd5a761e1.

Added: 


Modified: 
mlir/tools/mlir-vulkan-runner/CMakeLists.txt
mlir/tools/mlir-vulkan-runner/mlir-vulkan-runner.cpp
utils/bazel/llvm-project-overlay/mlir/BUILD.bazel

Removed: 
mlir/test/mlir-vulkan-runner/argmax.mlir



diff  --git a/mlir/test/mlir-vulkan-runner/argmax.mlir b/mlir/test/mlir-vulkan-runner/argmax.mlir
deleted file mode 100644
index d30c1cb5b58bdc..00
--- a/mlir/test/mlir-vulkan-runner/argmax.mlir
+++ /dev/null
@@ -1,109 +0,0 @@
-// RUN: mlir-vulkan-runner %s \
-// RUN:  --shared-libs=%vulkan-runtime-wrappers,%mlir_runner_utils \
-// RUN:  --entry-point-result=void | FileCheck %s
-
-// This kernel computes the argmax (index of the maximum element) from an array
-// of integers. Each thread computes a lane maximum using a single `scf.for`.
-// Then `gpu.subgroup_reduce` is used to find the maximum across the entire
-// subgroup, which is then used by SPIR-V subgroup ops to compute the argmax
-// of the entire input array. Note that this kernel only works if we have a
-// single workgroup.
-
-// CHECK: [15]
-module attributes {
-  gpu.container_module,
-  spirv.target_env = #spirv.target_env<
-#spirv.vce, 
#spirv.resource_limits<>>
-} {
-  gpu.module @kernels {
-gpu.func @kernel_argmax(%input : memref<128xi32>, %output : memref<1xi32>, 
%total_count_buf : memref<1xi32>) kernel
-  attributes {spirv.entry_point_abi = 
#spirv.entry_point_abi} {
-  %idx0 = arith.constant 0 : index
-  %idx1 = arith.constant 1 : index
-
-  %total_count = memref.load %total_count_buf[%idx0] : memref<1xi32>
-  %lane_count_idx = gpu.subgroup_size : index
-  %lane_count_i32 = index.castu %lane_count_idx : index to i32
-  %lane_id_idx = gpu.thread_id x
-  %lane_id_i32 = index.castu %lane_id_idx : index to i32
-  %lane_res_init = arith.constant 0 : i32
-  %lane_max_init = memref.load %input[%lane_id_idx] : memref<128xi32>
-  %num_batches_i32 = arith.divui %total_count, %lane_count_i32 : i32
-  %num_batches_idx = index.castu %num_batches_i32 : i32 to index
-
-  %lane_res, %lane_max = scf.for %iter = %idx1 to %num_batches_idx step %idx1
-  iter_args(%lane_res_iter = %lane_res_init, %lane_max_iter = %lane_max_init) -> (i32, i32) {
-%iter_i32 = index.castu %iter : index to i32
-%mul = arith.muli %lane_count_i32, %iter_i32 : i32
-%idx_i32 = arith.addi %mul, %lane_id_i32 : i32
-%idx = index.castu %idx_i32 : i32 to index
-%elem = memref.load %input[%idx] : memref<128xi32>
-%gt = arith.cmpi sgt, %elem, %lane_max_iter : i32
-%lane_res_next = arith.select %gt, %idx_i32, %lane_res_iter : i32
-%lane_max_next = arith.select %gt, %elem, %lane_max_iter : i32
-scf.yield %lane_res_next, %lane_max_next : i32, i32
-  }
-
-  %subgroup_max = gpu.subgroup_reduce maxsi %lane_max : (i32) -> (i32)
-  %eq = arith.cmpi eq, %lane_max, %subgroup_max : i32
-  %ballot = spirv.GroupNonUniformBallot  %eq : vector<4xi32>
-  %lsb = spirv.GroupNonUniformBallotFindLSB  %ballot : vector<4xi32>, i32
-  %cond = arith.cmpi eq, %lsb, %lane_id_i32 : i32
-
-  scf.if %cond {
-memref.store %lane_res, %output[%idx0] : memref<1xi32>
-  }
-
-  gpu.return
-}
-  }
-
-  func.func @main() {
-// Allocate 3 buffers.
-%in_buf = memref.alloc() : memref<128xi32>
-%out_buf = memref.alloc() : memref<1xi32>
-%total_count_buf = memref.alloc() : memref<1xi32>
-
-// Constants.
-%cst0 = arith.constant 0 : i32
-%idx0 = arith.constant 0 : index
-%idx1 = arith.constant 1 : index
-%idx16 = arith.constant 16 : index
-%idx32 = arith.constant 32 : index
-%idx48 = arith.constant 48 : index
-%idx64 = arith.constant 64 : index
-%idx80 = arith.constant 80 : index
-%idx96 = arith.constant 96 : index
-%idx112 = arith.constant 112 : index
-
-// Initialize input buffer.
-%in_vec = arith.constant dense<[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]> : vector<16xi32>
-vector.store %in_vec, %in_buf[%idx0] : memref<128xi32>, vector<16xi32>
-vector.store %in_vec, %in_buf[%idx16] : memref<128xi32>, vector<16xi32>
-vector.store %in_vec, %in_buf[%idx32] : memref<128xi32>, vector<16xi32>
-vector.store %in_vec, %in_buf[%idx48] : memref<128xi32>, vector<16xi32>
-vector.store %in_vec, %in_buf[%idx64] : memref<128xi32>, vector<16xi32>
-vector.store %in_vec, %in_buf[%idx80] :

[llvm-branch-commits] [llvm] [LLVM][Coroutines] Transform "coro_elide_safe" calls to switch ABI coroutines to the `noalloc` variant (PR #99285)

2024-08-28 Thread Yuxuan Chen via llvm-branch-commits

https://github.com/yuxuanchen1997 updated 
https://github.com/llvm/llvm-project/pull/99285

>From d6f2e78230c0907db95568e5b920d574ce6b4758 Mon Sep 17 00:00:00 2001
From: Yuxuan Chen 
Date: Mon, 15 Jul 2024 15:01:39 -0700
Subject: [PATCH] [LLVM][Coroutines] Transform "coro_elide_safe" calls to
 switch ABI coroutines to the `noalloc` variant

---
 .../Coroutines/CoroAnnotationElide.h  |  36 +
 llvm/lib/Passes/PassBuilder.cpp   |   1 +
 llvm/lib/Passes/PassBuilderPipelines.cpp  |  10 +-
 llvm/lib/Passes/PassRegistry.def  |   1 +
 llvm/lib/Transforms/Coroutines/CMakeLists.txt |   1 +
 .../Coroutines/CoroAnnotationElide.cpp| 152 ++
 llvm/test/Other/new-pm-defaults.ll|   1 +
 .../Other/new-pm-thinlto-postlink-defaults.ll |   1 +
 .../new-pm-thinlto-postlink-pgo-defaults.ll   |   1 +
 ...-pm-thinlto-postlink-samplepgo-defaults.ll |   1 +
 .../Coroutines/coro-transform-must-elide.ll   |  76 +
 11 files changed, 279 insertions(+), 2 deletions(-)
 create mode 100644 llvm/include/llvm/Transforms/Coroutines/CoroAnnotationElide.h
 create mode 100644 llvm/lib/Transforms/Coroutines/CoroAnnotationElide.cpp
 create mode 100644 llvm/test/Transforms/Coroutines/coro-transform-must-elide.ll

diff --git a/llvm/include/llvm/Transforms/Coroutines/CoroAnnotationElide.h b/llvm/include/llvm/Transforms/Coroutines/CoroAnnotationElide.h
new file mode 100644
index 00..352c9e14526697
--- /dev/null
+++ b/llvm/include/llvm/Transforms/Coroutines/CoroAnnotationElide.h
@@ -0,0 +1,36 @@
+//===- CoroAnnotationElide.h - Elide attributed safe coroutine calls --===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+//
+// \file
+// This pass transforms all Call or Invoke instructions that are annotated
+// "coro_elide_safe" to call the `.noalloc` variant of coroutine instead.
+// The frame of the callee coroutine is allocated inside the caller. A pointer
+// to the allocated frame will be passed into the `.noalloc` ramp function.
+//
+//===--===//
+
+#ifndef LLVM_TRANSFORMS_COROUTINES_COROANNOTATIONELIDE_H
+#define LLVM_TRANSFORMS_COROUTINES_COROANNOTATIONELIDE_H
+
+#include "llvm/Analysis/CGSCCPassManager.h"
+#include "llvm/Analysis/LazyCallGraph.h"
+#include "llvm/IR/PassManager.h"
+
+namespace llvm {
+
+struct CoroAnnotationElidePass : PassInfoMixin {
+  CoroAnnotationElidePass() {}
+
+  PreservedAnalyses run(LazyCallGraph::SCC &C, CGSCCAnalysisManager &AM,
+LazyCallGraph &CG, CGSCCUpdateResult &UR);
+
+  static bool isRequired() { return false; }
+};
+} // end namespace llvm
+
+#endif // LLVM_TRANSFORMS_COROUTINES_COROANNOTATIONELIDE_H
diff --git a/llvm/lib/Passes/PassBuilder.cpp b/llvm/lib/Passes/PassBuilder.cpp
index 17eed97fd950c9..c2b99a0d1f8cea 100644
--- a/llvm/lib/Passes/PassBuilder.cpp
+++ b/llvm/lib/Passes/PassBuilder.cpp
@@ -138,6 +138,7 @@
 #include "llvm/Target/TargetMachine.h"
 #include "llvm/Transforms/AggressiveInstCombine/AggressiveInstCombine.h"
 #include "llvm/Transforms/CFGuard.h"
+#include "llvm/Transforms/Coroutines/CoroAnnotationElide.h"
 #include "llvm/Transforms/Coroutines/CoroCleanup.h"
 #include "llvm/Transforms/Coroutines/CoroConditionalWrapper.h"
 #include "llvm/Transforms/Coroutines/CoroEarly.h"
diff --git a/llvm/lib/Passes/PassBuilderPipelines.cpp b/llvm/lib/Passes/PassBuilderPipelines.cpp
index 1184123c7710f0..992b4fca8a6919 100644
--- a/llvm/lib/Passes/PassBuilderPipelines.cpp
+++ b/llvm/lib/Passes/PassBuilderPipelines.cpp
@@ -33,6 +33,7 @@
 #include "llvm/Support/VirtualFileSystem.h"
 #include "llvm/Target/TargetMachine.h"
 #include "llvm/Transforms/AggressiveInstCombine/AggressiveInstCombine.h"
+#include "llvm/Transforms/Coroutines/CoroAnnotationElide.h"
 #include "llvm/Transforms/Coroutines/CoroCleanup.h"
 #include "llvm/Transforms/Coroutines/CoroConditionalWrapper.h"
 #include "llvm/Transforms/Coroutines/CoroEarly.h"
@@ -984,8 +985,10 @@ PassBuilder::buildInlinerPipeline(OptimizationLevel Level,
   MainCGPipeline.addPass(createCGSCCToFunctionPassAdaptor(
   RequireAnalysisPass()));
 
-  if (Phase != ThinOrFullLTOPhase::ThinLTOPreLink)
+  if (Phase != ThinOrFullLTOPhase::ThinLTOPreLink) {
 MainCGPipeline.addPass(CoroSplitPass(Level != OptimizationLevel::O0));
+MainCGPipeline.addPass(CoroAnnotationElidePass());
+  }
 
   // Make sure we don't affect potential future NoRerun CGSCC adaptors.
   MIWP.addLateModulePass(createModuleToFunctionPassAdaptor(
@@ -1027,9 +1030,12 @@ PassBuilder::buildModuleInlinerPipeline(OptimizationLevel Level,
   buildFunctionSimplificationPipeline(Level, Phase),
   PTO.EagerlyInvalidateAnalyses));
 
-  if (Phase != ThinOrFullLT

[llvm-branch-commits] [llvm] [LLVM][Coroutines] Create `.noalloc` variant of switch ABI coroutine ramp functions during CoroSplit (PR #99283)

2024-08-28 Thread Yuxuan Chen via llvm-branch-commits

https://github.com/yuxuanchen1997 updated 
https://github.com/llvm/llvm-project/pull/99283

>From e2a6027dd2af62f4fbfa92795873f0489fd35cfd Mon Sep 17 00:00:00 2001
From: Yuxuan Chen 
Date: Tue, 4 Jun 2024 23:22:00 -0700
Subject: [PATCH] [LLVM][Coroutines] Create `.noalloc` variant of switch ABI
 coroutine ramp functions during CoroSplit

---
 llvm/docs/Coroutines.rst  |  18 +++
 llvm/lib/Transforms/Coroutines/CoroInternal.h |   7 +
 llvm/lib/Transforms/Coroutines/CoroSplit.cpp  | 150 +++---
 llvm/lib/Transforms/Coroutines/Coroutines.cpp |  27 
 .../Transforms/Coroutines/coro-split-00.ll|  15 ++
 5 files changed, 191 insertions(+), 26 deletions(-)

diff --git a/llvm/docs/Coroutines.rst b/llvm/docs/Coroutines.rst
index 36092325e536fb..5679aefcb421d8 100644
--- a/llvm/docs/Coroutines.rst
+++ b/llvm/docs/Coroutines.rst
@@ -2022,6 +2022,12 @@ The pass CoroSplit builds coroutine frame and outlines resume and destroy parts
 into separate functions. This pass also lowers `coro.await.suspend.void`_,
 `coro.await.suspend.bool`_ and `coro.await.suspend.handle`_ intrinsics.
 
+CoroAnnotationElide
+---
+This pass finds all usages of coroutines that are "must elide" and replaces
+`coro.begin` intrinsic with an address of a coroutine frame placed on its caller
+and replaces `coro.alloc` and `coro.free` intrinsics with `false` and `null`
+respectively to remove the deallocation code.
 
 CoroElide
 -
@@ -2049,6 +2055,18 @@ the coroutine must reach the final suspend point when it get destroyed.
 
 This attribute only works for switched-resume coroutines now.
 
+coro_elide_safe
+---
+
+When a Call or Invoke instruction to switch ABI coroutine `f` is marked with
+`coro_elide_safe`, CoroSplitPass generates a `f.noalloc` ramp function.
+`f.noalloc` has one more argument than its original ramp function `f`, which is
+the pointer to the allocated frame. `f.noalloc` also suppressed any allocations
+or deallocations that may be guarded by `@llvm.coro.alloc` and `@llvm.coro.free`.
+
+CoroAnnotationElidePass performs the heap elision when possible. Note that for
+recursive or mutually recursive functions this elision is usually not possible.
+
 Metadata
 
 
diff --git a/llvm/lib/Transforms/Coroutines/CoroInternal.h b/llvm/lib/Transforms/Coroutines/CoroInternal.h
index d535ad7f85d74a..be86f96525b677 100644
--- a/llvm/lib/Transforms/Coroutines/CoroInternal.h
+++ b/llvm/lib/Transforms/Coroutines/CoroInternal.h
@@ -26,6 +26,13 @@ bool declaresIntrinsics(const Module &M,
 const std::initializer_list);
 void replaceCoroFree(CoroIdInst *CoroId, bool Elide);
 
+/// Replaces all @llvm.coro.alloc intrinsics calls associated with a given
+/// call @llvm.coro.id instruction with boolean value false.
+void suppressCoroAllocs(CoroIdInst *CoroId);
+/// Replaces CoroAllocs with boolean value false.
+void suppressCoroAllocs(LLVMContext &Context,
+ArrayRef<CoroAllocInst *> CoroAllocs);
+
 /// Attempts to rewrite the location operand of debug intrinsics in terms of
 /// the coroutine frame pointer, folding pointer offsets into the DIExpression
 /// of the intrinsic.
diff --git a/llvm/lib/Transforms/Coroutines/CoroSplit.cpp 
b/llvm/lib/Transforms/Coroutines/CoroSplit.cpp
index 6bf3c75b95113e..494c4d632de95f 100644
--- a/llvm/lib/Transforms/Coroutines/CoroSplit.cpp
+++ b/llvm/lib/Transforms/Coroutines/CoroSplit.cpp
@@ -25,6 +25,7 @@
 #include "llvm/ADT/PriorityWorklist.h"
 #include "llvm/ADT/SmallPtrSet.h"
 #include "llvm/ADT/SmallVector.h"
+#include "llvm/ADT/StringExtras.h"
 #include "llvm/ADT/StringRef.h"
 #include "llvm/ADT/Twine.h"
 #include "llvm/Analysis/CFG.h"
@@ -1177,6 +1178,14 @@ static void 
updateAsyncFuncPointerContextSize(coro::Shape &Shape) {
   Shape.AsyncLowering.AsyncFuncPointer->setInitializer(NewFuncPtrStruct);
 }
 
+static TypeSize getFrameSizeForShape(coro::Shape &Shape) {
+  // In the same function all coro.sizes should have the same result type.
+  auto *SizeIntrin = Shape.CoroSizes.back();
+  Module *M = SizeIntrin->getModule();
+  const DataLayout &DL = M->getDataLayout();
+  return DL.getTypeAllocSize(Shape.FrameTy);
+}
+
 static void replaceFrameSizeAndAlignment(coro::Shape &Shape) {
   if (Shape.ABI == coro::ABI::Async)
 updateAsyncFuncPointerContextSize(Shape);
@@ -1192,10 +1201,8 @@ static void replaceFrameSizeAndAlignment(coro::Shape 
&Shape) {
 
   // In the same function all coro.sizes should have the same result type.
   auto *SizeIntrin = Shape.CoroSizes.back();
-  Module *M = SizeIntrin->getModule();
-  const DataLayout &DL = M->getDataLayout();
-  auto Size = DL.getTypeAllocSize(Shape.FrameTy);
-  auto *SizeConstant = ConstantInt::get(SizeIntrin->getType(), Size);
+  auto *SizeConstant =
+  ConstantInt::get(SizeIntrin->getType(), getFrameSizeForShape(Shape));
 
   for (CoroSizeInst *CS : Shape.CoroSizes) {
 CS->replaceAllUsesWith(SizeConstant);
@@ -1452,6 +1459,75 @@ struct SwitchCorou

[llvm-branch-commits] [llvm] [LLVM][Coroutines] Create `.noalloc` variant of switch ABI coroutine ramp functions during CoroSplit (PR #99283)

2024-08-28 Thread Yuxuan Chen via llvm-branch-commits

https://github.com/yuxuanchen1997 updated 
https://github.com/llvm/llvm-project/pull/99283

>From e2a6027dd2af62f4fbfa92795873f0489fd35cfd Mon Sep 17 00:00:00 2001
From: Yuxuan Chen 
Date: Tue, 4 Jun 2024 23:22:00 -0700
Subject: [PATCH] [LLVM][Coroutines] Create `.noalloc` variant of switch ABI
 coroutine ramp functions during CoroSplit

---
 llvm/docs/Coroutines.rst  |  18 +++
 llvm/lib/Transforms/Coroutines/CoroInternal.h |   7 +
 llvm/lib/Transforms/Coroutines/CoroSplit.cpp  | 150 +++---
 llvm/lib/Transforms/Coroutines/Coroutines.cpp |  27 
 .../Transforms/Coroutines/coro-split-00.ll|  15 ++
 5 files changed, 191 insertions(+), 26 deletions(-)

diff --git a/llvm/docs/Coroutines.rst b/llvm/docs/Coroutines.rst
index 36092325e536fb..5679aefcb421d8 100644
--- a/llvm/docs/Coroutines.rst
+++ b/llvm/docs/Coroutines.rst
@@ -2022,6 +2022,12 @@ The pass CoroSplit builds coroutine frame and outlines 
resume and destroy parts
 into separate functions. This pass also lowers `coro.await.suspend.void`_,
 `coro.await.suspend.bool`_ and `coro.await.suspend.handle`_ intrinsics.
 
+CoroAnnotationElide
+-------------------
+This pass finds all usages of coroutines that are "must elide" and replaces
+`coro.begin` intrinsic with an address of a coroutine frame placed on its caller
+and replaces `coro.alloc` and `coro.free` intrinsics with `false` and `null`
+respectively to remove the deallocation code.
 
 CoroElide
 -
@@ -2049,6 +2055,18 @@ the coroutine must reach the final suspend point when it 
get destroyed.
 
 This attribute only works for switched-resume coroutines now.
 
+coro_elide_safe
+---------------
+
+When a Call or Invoke instruction to switch ABI coroutine `f` is marked with
+`coro_elide_safe`, CoroSplitPass generates a `f.noalloc` ramp function.
+`f.noalloc` has one more argument than its original ramp function `f`, which is
+the pointer to the allocated frame. `f.noalloc` also suppresses any allocations
+or deallocations that may be guarded by `@llvm.coro.alloc` and `@llvm.coro.free`.
+
+CoroAnnotationElidePass performs the heap elision when possible. Note that for
+recursive or mutually recursive functions this elision is usually not possible.
+
 Metadata
 
 
diff --git a/llvm/lib/Transforms/Coroutines/CoroInternal.h 
b/llvm/lib/Transforms/Coroutines/CoroInternal.h
index d535ad7f85d74a..be86f96525b677 100644
--- a/llvm/lib/Transforms/Coroutines/CoroInternal.h
+++ b/llvm/lib/Transforms/Coroutines/CoroInternal.h
@@ -26,6 +26,13 @@ bool declaresIntrinsics(const Module &M,
 const std::initializer_list);
 void replaceCoroFree(CoroIdInst *CoroId, bool Elide);
 
+/// Replaces all @llvm.coro.alloc intrinsics calls associated with a given
+/// call @llvm.coro.id instruction with boolean value false.
+void suppressCoroAllocs(CoroIdInst *CoroId);
+/// Replaces CoroAllocs with boolean value false.
+void suppressCoroAllocs(LLVMContext &Context,
+ArrayRef<CoroAllocInst *> CoroAllocs);
+
 /// Attempts to rewrite the location operand of debug intrinsics in terms of
 /// the coroutine frame pointer, folding pointer offsets into the DIExpression
 /// of the intrinsic.
diff --git a/llvm/lib/Transforms/Coroutines/CoroSplit.cpp 
b/llvm/lib/Transforms/Coroutines/CoroSplit.cpp
index 6bf3c75b95113e..494c4d632de95f 100644
--- a/llvm/lib/Transforms/Coroutines/CoroSplit.cpp
+++ b/llvm/lib/Transforms/Coroutines/CoroSplit.cpp
@@ -25,6 +25,7 @@
 #include "llvm/ADT/PriorityWorklist.h"
 #include "llvm/ADT/SmallPtrSet.h"
 #include "llvm/ADT/SmallVector.h"
+#include "llvm/ADT/StringExtras.h"
 #include "llvm/ADT/StringRef.h"
 #include "llvm/ADT/Twine.h"
 #include "llvm/Analysis/CFG.h"
@@ -1177,6 +1178,14 @@ static void 
updateAsyncFuncPointerContextSize(coro::Shape &Shape) {
   Shape.AsyncLowering.AsyncFuncPointer->setInitializer(NewFuncPtrStruct);
 }
 
+static TypeSize getFrameSizeForShape(coro::Shape &Shape) {
+  // In the same function all coro.sizes should have the same result type.
+  auto *SizeIntrin = Shape.CoroSizes.back();
+  Module *M = SizeIntrin->getModule();
+  const DataLayout &DL = M->getDataLayout();
+  return DL.getTypeAllocSize(Shape.FrameTy);
+}
+
 static void replaceFrameSizeAndAlignment(coro::Shape &Shape) {
   if (Shape.ABI == coro::ABI::Async)
 updateAsyncFuncPointerContextSize(Shape);
@@ -1192,10 +1201,8 @@ static void replaceFrameSizeAndAlignment(coro::Shape 
&Shape) {
 
   // In the same function all coro.sizes should have the same result type.
   auto *SizeIntrin = Shape.CoroSizes.back();
-  Module *M = SizeIntrin->getModule();
-  const DataLayout &DL = M->getDataLayout();
-  auto Size = DL.getTypeAllocSize(Shape.FrameTy);
-  auto *SizeConstant = ConstantInt::get(SizeIntrin->getType(), Size);
+  auto *SizeConstant =
+  ConstantInt::get(SizeIntrin->getType(), getFrameSizeForShape(Shape));
 
   for (CoroSizeInst *CS : Shape.CoroSizes) {
 CS->replaceAllUsesWith(SizeConstant);
@@ -1452,6 +1459,75 @@ struct SwitchCorou
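
The `coro_elide_safe` documentation added in this patch describes a ramp function `f` and a `f.noalloc` variant that takes one extra argument, a pointer to an already allocated frame. As a rough illustration only, the plain C++ sketch below models that contract outside of LLVM. It is not the code CoroSplit generates: `Frame`, `ramp`, `rampNoAlloc`, `HeapAllocs` and both caller functions are hypothetical names invented for this example.

```cpp
#include <cassert>
#include <cstdlib>
#include <new>

// Hypothetical stand-in for a coroutine frame. Real frames are laid out by
// CoroSplit; this struct only exists to make the example concrete.
struct Frame {
  int ResumeIndex;
  int Value;
};

// Counts heap allocations so the effect of elision can be observed.
int HeapAllocs = 0;

// Models the ordinary ramp function `f`: it allocates its own frame.
Frame *ramp(int X) {
  ++HeapAllocs;
  void *Mem = std::malloc(sizeof(Frame));
  if (!Mem)
    std::abort();
  return new (Mem) Frame{0, X};
}

// Models `f.noalloc`: same parameters plus one extra argument, the
// caller-provided frame storage. No allocation happens here.
Frame *rampNoAlloc(int X, void *FrameStorage) {
  return new (FrameStorage) Frame{0, X};
}

// Models a caller whose call site was elided: the callee frame lives in
// storage owned by the caller, so there is nothing to free.
int callerWithElision(int X) {
  alignas(Frame) unsigned char Storage[sizeof(Frame)];
  Frame *F = rampNoAlloc(X, Storage);
  return F->Value;
}

// Models the unelided caller: heap frame and an explicit deallocation.
int callerWithoutElision(int X) {
  Frame *F = ramp(X);
  int V = F->Value;
  std::free(F);
  return V;
}
```

In this model, `callerWithoutElision` increments `HeapAllocs` while `callerWithElision` leaves it unchanged. It also hints at why the documentation says elision is usually not possible for recursive coroutines: under this scheme the caller's storage would have to contain a frame of its own type.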

[llvm-branch-commits] [llvm] [LLVM][Coroutines] Transform "coro_elide_safe" calls to switch ABI coroutines to the `noalloc` variant (PR #99285)

2024-08-28 Thread Yuxuan Chen via llvm-branch-commits

https://github.com/yuxuanchen1997 updated 
https://github.com/llvm/llvm-project/pull/99285

>From d6f2e78230c0907db95568e5b920d574ce6b4758 Mon Sep 17 00:00:00 2001
From: Yuxuan Chen 
Date: Mon, 15 Jul 2024 15:01:39 -0700
Subject: [PATCH] [LLVM][Coroutines] Transform "coro_elide_safe" calls to
 switch ABI coroutines to the `noalloc` variant

---
 .../Coroutines/CoroAnnotationElide.h  |  36 +
 llvm/lib/Passes/PassBuilder.cpp   |   1 +
 llvm/lib/Passes/PassBuilderPipelines.cpp  |  10 +-
 llvm/lib/Passes/PassRegistry.def  |   1 +
 llvm/lib/Transforms/Coroutines/CMakeLists.txt |   1 +
 .../Coroutines/CoroAnnotationElide.cpp| 152 ++
 llvm/test/Other/new-pm-defaults.ll|   1 +
 .../Other/new-pm-thinlto-postlink-defaults.ll |   1 +
 .../new-pm-thinlto-postlink-pgo-defaults.ll   |   1 +
 ...-pm-thinlto-postlink-samplepgo-defaults.ll |   1 +
 .../Coroutines/coro-transform-must-elide.ll   |  76 +
 11 files changed, 279 insertions(+), 2 deletions(-)
 create mode 100644 
llvm/include/llvm/Transforms/Coroutines/CoroAnnotationElide.h
 create mode 100644 llvm/lib/Transforms/Coroutines/CoroAnnotationElide.cpp
 create mode 100644 llvm/test/Transforms/Coroutines/coro-transform-must-elide.ll

diff --git a/llvm/include/llvm/Transforms/Coroutines/CoroAnnotationElide.h 
b/llvm/include/llvm/Transforms/Coroutines/CoroAnnotationElide.h
new file mode 100644
index 00..352c9e14526697
--- /dev/null
+++ b/llvm/include/llvm/Transforms/Coroutines/CoroAnnotationElide.h
@@ -0,0 +1,36 @@
+//===- CoroAnnotationElide.h - Elide attributed safe coroutine calls 
--===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM 
Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+//
+// \file
+// This pass transforms all Call or Invoke instructions that are annotated
+// "coro_elide_safe" to call the `.noalloc` variant of coroutine instead.
+// The frame of the callee coroutine is allocated inside the caller. A pointer
+// to the allocated frame will be passed into the `.noalloc` ramp function.
+//
+//===--===//
+
+#ifndef LLVM_TRANSFORMS_COROUTINES_COROANNOTATIONELIDE_H
+#define LLVM_TRANSFORMS_COROUTINES_COROANNOTATIONELIDE_H
+
+#include "llvm/Analysis/CGSCCPassManager.h"
+#include "llvm/Analysis/LazyCallGraph.h"
+#include "llvm/IR/PassManager.h"
+
+namespace llvm {
+
+struct CoroAnnotationElidePass : PassInfoMixin<CoroAnnotationElidePass> {
+  CoroAnnotationElidePass() {}
+
+  PreservedAnalyses run(LazyCallGraph::SCC &C, CGSCCAnalysisManager &AM,
+LazyCallGraph &CG, CGSCCUpdateResult &UR);
+
+  static bool isRequired() { return false; }
+};
+} // end namespace llvm
+
+#endif // LLVM_TRANSFORMS_COROUTINES_COROANNOTATIONELIDE_H
diff --git a/llvm/lib/Passes/PassBuilder.cpp b/llvm/lib/Passes/PassBuilder.cpp
index 17eed97fd950c9..c2b99a0d1f8cea 100644
--- a/llvm/lib/Passes/PassBuilder.cpp
+++ b/llvm/lib/Passes/PassBuilder.cpp
@@ -138,6 +138,7 @@
 #include "llvm/Target/TargetMachine.h"
 #include "llvm/Transforms/AggressiveInstCombine/AggressiveInstCombine.h"
 #include "llvm/Transforms/CFGuard.h"
+#include "llvm/Transforms/Coroutines/CoroAnnotationElide.h"
 #include "llvm/Transforms/Coroutines/CoroCleanup.h"
 #include "llvm/Transforms/Coroutines/CoroConditionalWrapper.h"
 #include "llvm/Transforms/Coroutines/CoroEarly.h"
diff --git a/llvm/lib/Passes/PassBuilderPipelines.cpp 
b/llvm/lib/Passes/PassBuilderPipelines.cpp
index 1184123c7710f0..992b4fca8a6919 100644
--- a/llvm/lib/Passes/PassBuilderPipelines.cpp
+++ b/llvm/lib/Passes/PassBuilderPipelines.cpp
@@ -33,6 +33,7 @@
 #include "llvm/Support/VirtualFileSystem.h"
 #include "llvm/Target/TargetMachine.h"
 #include "llvm/Transforms/AggressiveInstCombine/AggressiveInstCombine.h"
+#include "llvm/Transforms/Coroutines/CoroAnnotationElide.h"
 #include "llvm/Transforms/Coroutines/CoroCleanup.h"
 #include "llvm/Transforms/Coroutines/CoroConditionalWrapper.h"
 #include "llvm/Transforms/Coroutines/CoroEarly.h"
@@ -984,8 +985,10 @@ PassBuilder::buildInlinerPipeline(OptimizationLevel Level,
   MainCGPipeline.addPass(createCGSCCToFunctionPassAdaptor(
   RequireAnalysisPass()));
 
-  if (Phase != ThinOrFullLTOPhase::ThinLTOPreLink)
+  if (Phase != ThinOrFullLTOPhase::ThinLTOPreLink) {
 MainCGPipeline.addPass(CoroSplitPass(Level != OptimizationLevel::O0));
+MainCGPipeline.addPass(CoroAnnotationElidePass());
+  }
 
   // Make sure we don't affect potential future NoRerun CGSCC adaptors.
   MIWP.addLateModulePass(createModuleToFunctionPassAdaptor(
@@ -1027,9 +1030,12 @@ 
PassBuilder::buildModuleInlinerPipeline(OptimizationLevel Level,
   buildFunctionSimplificationPipeline(Level, Phase),
   PTO.EagerlyInvalidateAnalyses));
 
-  if (Phase != ThinOrFullLT

[llvm-branch-commits] [llvm] [LLVM][Coroutines] Transform "coro_elide_safe" calls to switch ABI coroutines to the `noalloc` variant (PR #99285)

2024-08-28 Thread via llvm-branch-commits

github-actions[bot] wrote:




:warning: C/C++ code formatter, clang-format found issues in your code. :warning:



You can test this locally with the following command:


```bash
git-clang-format --diff e2a6027dd2af62f4fbfa92795873f0489fd35cfd \
  d6f2e78230c0907db95568e5b920d574ce6b4758 --extensions cpp,h -- \
  llvm/include/llvm/Transforms/Coroutines/CoroAnnotationElide.h \
  llvm/lib/Transforms/Coroutines/CoroAnnotationElide.cpp \
  llvm/lib/Passes/PassBuilder.cpp llvm/lib/Passes/PassBuilderPipelines.cpp
```





View the diff from clang-format here.


```diff
diff --git a/llvm/lib/Transforms/Coroutines/CoroAnnotationElide.cpp b/llvm/lib/Transforms/Coroutines/CoroAnnotationElide.cpp
index e7c7e01f9c..28953f2137 100644
--- a/llvm/lib/Transforms/Coroutines/CoroAnnotationElide.cpp
+++ b/llvm/lib/Transforms/Coroutines/CoroAnnotationElide.cpp
@@ -143,9 +143,9 @@ PreservedAnalyses CoroAnnotationElidePass::run(LazyCallGraph::SCC &C,
  << "' elided in '" << ore::NV("caller", Caller->getName());
 });
 Changed = true;
-updateCGAndAnalysisManagerForCGSCCPass(CG, *CallerC, *CallerN, AM, UR, FAM);
+updateCGAndAnalysisManagerForCGSCCPass(CG, *CallerC, *CallerN, AM, UR,
+   FAM);
   }
-
 }
   }
   return Changed ? PreservedAnalyses::none() : PreservedAnalyses::all();

```




https://github.com/llvm/llvm-project/pull/99285
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [LLVM][Coroutines] Transform "coro_elide_safe" calls to switch ABI coroutines to the `noalloc` variant (PR #99285)

2024-08-28 Thread via llvm-branch-commits

github-actions[bot] wrote:




:warning: C/C++ code formatter, clang-format found issues in your code. :warning:



You can test this locally with the following command:


```bash
git-clang-format --diff e2a6027dd2af62f4fbfa92795873f0489fd35cfd \
  d6f2e78230c0907db95568e5b920d574ce6b4758 --extensions h,cpp -- \
  llvm/include/llvm/Transforms/Coroutines/CoroAnnotationElide.h \
  llvm/lib/Transforms/Coroutines/CoroAnnotationElide.cpp \
  llvm/lib/Passes/PassBuilder.cpp llvm/lib/Passes/PassBuilderPipelines.cpp
```





View the diff from clang-format here.


```diff
diff --git a/llvm/lib/Transforms/Coroutines/CoroAnnotationElide.cpp b/llvm/lib/Transforms/Coroutines/CoroAnnotationElide.cpp
index e7c7e01f9c..28953f2137 100644
--- a/llvm/lib/Transforms/Coroutines/CoroAnnotationElide.cpp
+++ b/llvm/lib/Transforms/Coroutines/CoroAnnotationElide.cpp
@@ -143,9 +143,9 @@ PreservedAnalyses CoroAnnotationElidePass::run(LazyCallGraph::SCC &C,
  << "' elided in '" << ore::NV("caller", Caller->getName());
 });
 Changed = true;
-updateCGAndAnalysisManagerForCGSCCPass(CG, *CallerC, *CallerN, AM, UR, FAM);
+updateCGAndAnalysisManagerForCGSCCPass(CG, *CallerC, *CallerN, AM, UR,
+   FAM);
   }
-
 }
   }
   return Changed ? PreservedAnalyses::none() : PreservedAnalyses::all();

```




https://github.com/llvm/llvm-project/pull/99285


[llvm-branch-commits] [llvm] [LLVM][Coroutines] Transform "coro_elide_safe" calls to switch ABI coroutines to the `noalloc` variant (PR #99285)

2024-08-28 Thread Yuxuan Chen via llvm-branch-commits

https://github.com/yuxuanchen1997 updated 
https://github.com/llvm/llvm-project/pull/99285

>From 68a410d159fdb96e7580a7f3fe035df00b893f3c Mon Sep 17 00:00:00 2001
From: Yuxuan Chen 
Date: Mon, 15 Jul 2024 15:01:39 -0700
Subject: [PATCH] [LLVM][Coroutines] Transform "coro_elide_safe" calls to
 switch ABI coroutines to the `noalloc` variant

---
 .../Coroutines/CoroAnnotationElide.h  |  36 +
 llvm/lib/Passes/PassBuilder.cpp   |   1 +
 llvm/lib/Passes/PassBuilderPipelines.cpp  |  10 +-
 llvm/lib/Passes/PassRegistry.def  |   1 +
 llvm/lib/Transforms/Coroutines/CMakeLists.txt |   1 +
 .../Coroutines/CoroAnnotationElide.cpp| 152 ++
 llvm/test/Other/new-pm-defaults.ll|   1 +
 .../Other/new-pm-thinlto-postlink-defaults.ll |   1 +
 .../new-pm-thinlto-postlink-pgo-defaults.ll   |   1 +
 ...-pm-thinlto-postlink-samplepgo-defaults.ll |   1 +
 .../Coroutines/coro-transform-must-elide.ll   |  76 +
 11 files changed, 279 insertions(+), 2 deletions(-)
 create mode 100644 
llvm/include/llvm/Transforms/Coroutines/CoroAnnotationElide.h
 create mode 100644 llvm/lib/Transforms/Coroutines/CoroAnnotationElide.cpp
 create mode 100644 llvm/test/Transforms/Coroutines/coro-transform-must-elide.ll

diff --git a/llvm/include/llvm/Transforms/Coroutines/CoroAnnotationElide.h 
b/llvm/include/llvm/Transforms/Coroutines/CoroAnnotationElide.h
new file mode 100644
index 00..352c9e14526697
--- /dev/null
+++ b/llvm/include/llvm/Transforms/Coroutines/CoroAnnotationElide.h
@@ -0,0 +1,36 @@
+//===- CoroAnnotationElide.h - Elide attributed safe coroutine calls 
--===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM 
Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+//
+// \file
+// This pass transforms all Call or Invoke instructions that are annotated
+// "coro_elide_safe" to call the `.noalloc` variant of coroutine instead.
+// The frame of the callee coroutine is allocated inside the caller. A pointer
+// to the allocated frame will be passed into the `.noalloc` ramp function.
+//
+//===--===//
+
+#ifndef LLVM_TRANSFORMS_COROUTINES_COROANNOTATIONELIDE_H
+#define LLVM_TRANSFORMS_COROUTINES_COROANNOTATIONELIDE_H
+
+#include "llvm/Analysis/CGSCCPassManager.h"
+#include "llvm/Analysis/LazyCallGraph.h"
+#include "llvm/IR/PassManager.h"
+
+namespace llvm {
+
+struct CoroAnnotationElidePass : PassInfoMixin<CoroAnnotationElidePass> {
+  CoroAnnotationElidePass() {}
+
+  PreservedAnalyses run(LazyCallGraph::SCC &C, CGSCCAnalysisManager &AM,
+LazyCallGraph &CG, CGSCCUpdateResult &UR);
+
+  static bool isRequired() { return false; }
+};
+} // end namespace llvm
+
+#endif // LLVM_TRANSFORMS_COROUTINES_COROANNOTATIONELIDE_H
diff --git a/llvm/lib/Passes/PassBuilder.cpp b/llvm/lib/Passes/PassBuilder.cpp
index 17eed97fd950c9..c2b99a0d1f8cea 100644
--- a/llvm/lib/Passes/PassBuilder.cpp
+++ b/llvm/lib/Passes/PassBuilder.cpp
@@ -138,6 +138,7 @@
 #include "llvm/Target/TargetMachine.h"
 #include "llvm/Transforms/AggressiveInstCombine/AggressiveInstCombine.h"
 #include "llvm/Transforms/CFGuard.h"
+#include "llvm/Transforms/Coroutines/CoroAnnotationElide.h"
 #include "llvm/Transforms/Coroutines/CoroCleanup.h"
 #include "llvm/Transforms/Coroutines/CoroConditionalWrapper.h"
 #include "llvm/Transforms/Coroutines/CoroEarly.h"
diff --git a/llvm/lib/Passes/PassBuilderPipelines.cpp 
b/llvm/lib/Passes/PassBuilderPipelines.cpp
index 1184123c7710f0..992b4fca8a6919 100644
--- a/llvm/lib/Passes/PassBuilderPipelines.cpp
+++ b/llvm/lib/Passes/PassBuilderPipelines.cpp
@@ -33,6 +33,7 @@
 #include "llvm/Support/VirtualFileSystem.h"
 #include "llvm/Target/TargetMachine.h"
 #include "llvm/Transforms/AggressiveInstCombine/AggressiveInstCombine.h"
+#include "llvm/Transforms/Coroutines/CoroAnnotationElide.h"
 #include "llvm/Transforms/Coroutines/CoroCleanup.h"
 #include "llvm/Transforms/Coroutines/CoroConditionalWrapper.h"
 #include "llvm/Transforms/Coroutines/CoroEarly.h"
@@ -984,8 +985,10 @@ PassBuilder::buildInlinerPipeline(OptimizationLevel Level,
   MainCGPipeline.addPass(createCGSCCToFunctionPassAdaptor(
   RequireAnalysisPass()));
 
-  if (Phase != ThinOrFullLTOPhase::ThinLTOPreLink)
+  if (Phase != ThinOrFullLTOPhase::ThinLTOPreLink) {
 MainCGPipeline.addPass(CoroSplitPass(Level != OptimizationLevel::O0));
+MainCGPipeline.addPass(CoroAnnotationElidePass());
+  }
 
   // Make sure we don't affect potential future NoRerun CGSCC adaptors.
   MIWP.addLateModulePass(createModuleToFunctionPassAdaptor(
@@ -1027,9 +1030,12 @@ 
PassBuilder::buildModuleInlinerPipeline(OptimizationLevel Level,
   buildFunctionSimplificationPipeline(Level, Phase),
   PTO.EagerlyInvalidateAnalyses));
 
-  if (Phase != ThinOrFullLT

[llvm-branch-commits] [llvm] [LLVM][Coroutines] Transform "coro_elide_safe" calls to switch ABI coroutines to the `noalloc` variant (PR #99285)

2024-08-28 Thread Yuxuan Chen via llvm-branch-commits

https://github.com/yuxuanchen1997 updated 
https://github.com/llvm/llvm-project/pull/99285

>From 5b18641d2b59adf11810f71fe5ab3204a94a7a56 Mon Sep 17 00:00:00 2001
From: Yuxuan Chen 
Date: Mon, 15 Jul 2024 15:01:39 -0700
Subject: [PATCH] [LLVM][Coroutines] Transform "coro_elide_safe" calls to
 switch ABI coroutines to the `noalloc` variant

---
 .../Coroutines/CoroAnnotationElide.h  |  36 
 llvm/lib/Passes/PassBuilder.cpp   |   1 +
 llvm/lib/Passes/PassBuilderPipelines.cpp  |  10 +-
 llvm/lib/Passes/PassRegistry.def  |   1 +
 llvm/lib/Transforms/Coroutines/CMakeLists.txt |   1 +
 .../Coroutines/CoroAnnotationElide.cpp| 155 ++
 llvm/test/Other/new-pm-defaults.ll|   1 +
 .../Other/new-pm-thinlto-postlink-defaults.ll |   1 +
 .../new-pm-thinlto-postlink-pgo-defaults.ll   |   1 +
 ...-pm-thinlto-postlink-samplepgo-defaults.ll |   1 +
 .../Coroutines/coro-transform-must-elide.ll   |  75 +
 11 files changed, 281 insertions(+), 2 deletions(-)
 create mode 100644 
llvm/include/llvm/Transforms/Coroutines/CoroAnnotationElide.h
 create mode 100644 llvm/lib/Transforms/Coroutines/CoroAnnotationElide.cpp
 create mode 100644 llvm/test/Transforms/Coroutines/coro-transform-must-elide.ll

diff --git a/llvm/include/llvm/Transforms/Coroutines/CoroAnnotationElide.h 
b/llvm/include/llvm/Transforms/Coroutines/CoroAnnotationElide.h
new file mode 100644
index 00..352c9e14526697
--- /dev/null
+++ b/llvm/include/llvm/Transforms/Coroutines/CoroAnnotationElide.h
@@ -0,0 +1,36 @@
+//===- CoroAnnotationElide.h - Elide attributed safe coroutine calls 
--===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM 
Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+//
+// \file
+// This pass transforms all Call or Invoke instructions that are annotated
+// "coro_elide_safe" to call the `.noalloc` variant of coroutine instead.
+// The frame of the callee coroutine is allocated inside the caller. A pointer
+// to the allocated frame will be passed into the `.noalloc` ramp function.
+//
+//===--===//
+
+#ifndef LLVM_TRANSFORMS_COROUTINES_COROANNOTATIONELIDE_H
+#define LLVM_TRANSFORMS_COROUTINES_COROANNOTATIONELIDE_H
+
+#include "llvm/Analysis/CGSCCPassManager.h"
+#include "llvm/Analysis/LazyCallGraph.h"
+#include "llvm/IR/PassManager.h"
+
+namespace llvm {
+
+struct CoroAnnotationElidePass : PassInfoMixin<CoroAnnotationElidePass> {
+  CoroAnnotationElidePass() {}
+
+  PreservedAnalyses run(LazyCallGraph::SCC &C, CGSCCAnalysisManager &AM,
+LazyCallGraph &CG, CGSCCUpdateResult &UR);
+
+  static bool isRequired() { return false; }
+};
+} // end namespace llvm
+
+#endif // LLVM_TRANSFORMS_COROUTINES_COROANNOTATIONELIDE_H
diff --git a/llvm/lib/Passes/PassBuilder.cpp b/llvm/lib/Passes/PassBuilder.cpp
index 17eed97fd950c9..c2b99a0d1f8cea 100644
--- a/llvm/lib/Passes/PassBuilder.cpp
+++ b/llvm/lib/Passes/PassBuilder.cpp
@@ -138,6 +138,7 @@
 #include "llvm/Target/TargetMachine.h"
 #include "llvm/Transforms/AggressiveInstCombine/AggressiveInstCombine.h"
 #include "llvm/Transforms/CFGuard.h"
+#include "llvm/Transforms/Coroutines/CoroAnnotationElide.h"
 #include "llvm/Transforms/Coroutines/CoroCleanup.h"
 #include "llvm/Transforms/Coroutines/CoroConditionalWrapper.h"
 #include "llvm/Transforms/Coroutines/CoroEarly.h"
diff --git a/llvm/lib/Passes/PassBuilderPipelines.cpp 
b/llvm/lib/Passes/PassBuilderPipelines.cpp
index 1184123c7710f0..992b4fca8a6919 100644
--- a/llvm/lib/Passes/PassBuilderPipelines.cpp
+++ b/llvm/lib/Passes/PassBuilderPipelines.cpp
@@ -33,6 +33,7 @@
 #include "llvm/Support/VirtualFileSystem.h"
 #include "llvm/Target/TargetMachine.h"
 #include "llvm/Transforms/AggressiveInstCombine/AggressiveInstCombine.h"
+#include "llvm/Transforms/Coroutines/CoroAnnotationElide.h"
 #include "llvm/Transforms/Coroutines/CoroCleanup.h"
 #include "llvm/Transforms/Coroutines/CoroConditionalWrapper.h"
 #include "llvm/Transforms/Coroutines/CoroEarly.h"
@@ -984,8 +985,10 @@ PassBuilder::buildInlinerPipeline(OptimizationLevel Level,
   MainCGPipeline.addPass(createCGSCCToFunctionPassAdaptor(
   RequireAnalysisPass()));
 
-  if (Phase != ThinOrFullLTOPhase::ThinLTOPreLink)
+  if (Phase != ThinOrFullLTOPhase::ThinLTOPreLink) {
 MainCGPipeline.addPass(CoroSplitPass(Level != OptimizationLevel::O0));
+MainCGPipeline.addPass(CoroAnnotationElidePass());
+  }
 
   // Make sure we don't affect potential future NoRerun CGSCC adaptors.
   MIWP.addLateModulePass(createModuleToFunctionPassAdaptor(
@@ -1027,9 +1030,12 @@ 
PassBuilder::buildModuleInlinerPipeline(OptimizationLevel Level,
   buildFunctionSimplificationPipeline(Level, Phase),
   PTO.EagerlyInvalidateAnalyses));
 
-  if (Phase != ThinOrFullLTO

[llvm-branch-commits] [clang] Revert "[LinkerWrapper] Extend with usual pass options (#96704)" (#102226) (PR #106439)

2024-08-28 Thread Artem Dergachev via llvm-branch-commits

haoNoQ wrote:

(According to the discussion in 102226, this patch was never supposed to be in 
the release branch.)

https://github.com/llvm/llvm-project/pull/106439


[llvm-branch-commits] [clang] Revert "[LinkerWrapper] Extend with usual pass options (#96704)" (#102226) (PR #106439)

2024-08-28 Thread Joseph Huber via llvm-branch-commits

https://github.com/jhuber6 approved this pull request.

Thanks

https://github.com/llvm/llvm-project/pull/106439


[llvm-branch-commits] [clang] [llvm] release/19.x: workflows/release-binaries: Enable flang builds on Windows (#101344) (PR #106480)

2024-08-28 Thread via llvm-branch-commits

https://github.com/llvmbot created 
https://github.com/llvm/llvm-project/pull/106480

Backport 8927576b8f6442bb6129bda597efee46176f8aec

Requested by: @tstellar

>From b3eb0c3dfe85b18ed4ef8e3f804970680c0e94ca Mon Sep 17 00:00:00 2001
From: Tom Stellard 
Date: Wed, 28 Aug 2024 18:22:57 -0700
Subject: [PATCH] workflows/release-binaries: Enable flang builds on Windows
 (#101344)

Flang for Windows depends on compiler-rt, so we need to enable it for
the stage1 builds. This also fixes failures building the flang tests on
macOS.

Fixes #100202.

(cherry picked from commit 8927576b8f6442bb6129bda597efee46176f8aec)
---
 .github/workflows/release-binaries.yml | 8 
 clang/cmake/caches/Release.cmake   | 7 +--
 2 files changed, 5 insertions(+), 10 deletions(-)

diff --git a/.github/workflows/release-binaries.yml b/.github/workflows/release-binaries.yml
index 509016e5b89c45..672dd7517d23ce 100644
--- a/.github/workflows/release-binaries.yml
+++ b/.github/workflows/release-binaries.yml
@@ -135,16 +135,8 @@ jobs:
   target_cmake_flags="$target_cmake_flags -DBOOTSTRAP_DARWIN_osx_ARCHS=$arches -DBOOTSTRAP_DARWIN_osx_BUILTIN_ARCHS=$arches"
 fi
 
-# x86 macOS and x86 Windows have trouble building flang, so disable it.
-# Windows: https://github.com/llvm/llvm-project/issues/100202
-# macOS: 'rebase opcodes terminated early at offset 1 of 80016' when building __fortran_builtins.mod
 build_flang="true"
 
-if [ "$target" = "Windows-X64" ]; then
-  target_cmake_flags="$target_cmake_flags -DLLVM_RELEASE_ENABLE_PROJECTS=\"clang;lld;lldb;clang-tools-extra;bolt;polly;mlir\""
-  build_flang="false"
-fi
-
 if [ "${{ runner.os }}" = "Windows" ]; then
   # The build times out on Windows, so we need to disable LTO.
   target_cmake_flags="$target_cmake_flags -DLLVM_RELEASE_ENABLE_LTO=OFF"
diff --git a/clang/cmake/caches/Release.cmake b/clang/cmake/caches/Release.cmake
index e5161dd9a27b96..6d5f75ca0074ee 100644
--- a/clang/cmake/caches/Release.cmake
+++ b/clang/cmake/caches/Release.cmake
@@ -47,11 +47,14 @@ set(LLVM_TARGETS_TO_BUILD Native CACHE STRING "")
 set(CLANG_ENABLE_BOOTSTRAP ON CACHE BOOL "")
 
 set(STAGE1_PROJECTS "clang")
-set(STAGE1_RUNTIMES "")
+
+# Building Flang on Windows requires compiler-rt, so we need to build it in
+# stage1.  compiler-rt is also required for building the Flang tests on
+# macOS.
+set(STAGE1_RUNTIMES "compiler-rt")
 
 if (LLVM_RELEASE_ENABLE_PGO)
   list(APPEND STAGE1_PROJECTS "lld")
-  list(APPEND STAGE1_RUNTIMES "compiler-rt")
   set(CLANG_BOOTSTRAP_TARGETS
 generate-profdata
 stage2-package



[llvm-branch-commits] [clang] [llvm] release/19.x: workflows/release-binaries: Enable flang builds on Windows (#101344) (PR #106480)

2024-08-28 Thread via llvm-branch-commits

https://github.com/llvmbot milestoned 
https://github.com/llvm/llvm-project/pull/106480


[llvm-branch-commits] [clang] [llvm] release/19.x: workflows/release-binaries: Enable flang builds on Windows (#101344) (PR #106480)

2024-08-28 Thread via llvm-branch-commits

llvmbot wrote:

@tstellar What do you think about merging this PR to the release branch?

https://github.com/llvm/llvm-project/pull/106480


[llvm-branch-commits] [clang] [llvm] release/19.x: workflows/release-binaries: Enable flang builds on Windows (#101344) (PR #106480)

2024-08-28 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-clang

Author: None (llvmbot)


Changes

Backport 8927576b8f6442bb6129bda597efee46176f8aec

Requested by: @tstellar

---
Full diff: https://github.com/llvm/llvm-project/pull/106480.diff


2 Files Affected:

- (modified) .github/workflows/release-binaries.yml (-8) 
- (modified) clang/cmake/caches/Release.cmake (+5-2) 


```diff
diff --git a/.github/workflows/release-binaries.yml b/.github/workflows/release-binaries.yml
index 509016e5b89c45..672dd7517d23ce 100644
--- a/.github/workflows/release-binaries.yml
+++ b/.github/workflows/release-binaries.yml
@@ -135,16 +135,8 @@ jobs:
   target_cmake_flags="$target_cmake_flags -DBOOTSTRAP_DARWIN_osx_ARCHS=$arches -DBOOTSTRAP_DARWIN_osx_BUILTIN_ARCHS=$arches"
 fi
 
-# x86 macOS and x86 Windows have trouble building flang, so disable it.
-# Windows: https://github.com/llvm/llvm-project/issues/100202
-# macOS: 'rebase opcodes terminated early at offset 1 of 80016' when building __fortran_builtins.mod
 build_flang="true"
 
-if [ "$target" = "Windows-X64" ]; then
-  target_cmake_flags="$target_cmake_flags -DLLVM_RELEASE_ENABLE_PROJECTS=\"clang;lld;lldb;clang-tools-extra;bolt;polly;mlir\""
-  build_flang="false"
-fi
-
 if [ "${{ runner.os }}" = "Windows" ]; then
   # The build times out on Windows, so we need to disable LTO.
   target_cmake_flags="$target_cmake_flags -DLLVM_RELEASE_ENABLE_LTO=OFF"
diff --git a/clang/cmake/caches/Release.cmake b/clang/cmake/caches/Release.cmake
index e5161dd9a27b96..6d5f75ca0074ee 100644
--- a/clang/cmake/caches/Release.cmake
+++ b/clang/cmake/caches/Release.cmake
@@ -47,11 +47,14 @@ set(LLVM_TARGETS_TO_BUILD Native CACHE STRING "")
 set(CLANG_ENABLE_BOOTSTRAP ON CACHE BOOL "")
 
 set(STAGE1_PROJECTS "clang")
-set(STAGE1_RUNTIMES "")
+
+# Building Flang on Windows requires compiler-rt, so we need to build it in
+# stage1.  compiler-rt is also required for building the Flang tests on
+# macOS.
+set(STAGE1_RUNTIMES "compiler-rt")
 
 if (LLVM_RELEASE_ENABLE_PGO)
   list(APPEND STAGE1_PROJECTS "lld")
-  list(APPEND STAGE1_RUNTIMES "compiler-rt")
   set(CLANG_BOOTSTRAP_TARGETS
 generate-profdata
 stage2-package

```




https://github.com/llvm/llvm-project/pull/106480
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] 92885bb - Revert "[llvm-profdata] Enabled functionality to write split-layout profile (…"

2024-08-28 Thread via llvm-branch-commits

Author: William Junda Huang
Date: 2024-08-28T21:33:24-04:00
New Revision: 92885bbeab632875929827a09841237cd59405fb

URL: 
https://github.com/llvm/llvm-project/commit/92885bbeab632875929827a09841237cd59405fb
DIFF: 
https://github.com/llvm/llvm-project/commit/92885bbeab632875929827a09841237cd59405fb.diff

LOG: Revert "[llvm-profdata] Enabled functionality to write split-layout 
profile (…"

This reverts commit 75e9d191f52b047ea839f75ab2a7a7d9f8c6becd.

Added: 


Modified: 
llvm/docs/CommandGuide/llvm-profdata.rst
llvm/include/llvm/ProfileData/SampleProfReader.h
llvm/include/llvm/ProfileData/SampleProfWriter.h
llvm/lib/ProfileData/SampleProfReader.cpp
llvm/lib/ProfileData/SampleProfWriter.cpp
llvm/tools/llvm-profdata/llvm-profdata.cpp

Removed: 
llvm/test/tools/llvm-profdata/Inputs/split-layout.profdata
llvm/test/tools/llvm-profdata/sample-split-layout.test



diff  --git a/llvm/docs/CommandGuide/llvm-profdata.rst 
b/llvm/docs/CommandGuide/llvm-profdata.rst
index af840f3994b3d6..acf016a6dbcd70 100644
--- a/llvm/docs/CommandGuide/llvm-profdata.rst
+++ b/llvm/docs/CommandGuide/llvm-profdata.rst
@@ -162,12 +162,6 @@ OPTIONS
  coverage for the optimized target. This option can only be used with
  sample-based profile in extbinary format.
 
-.. option:: --split-layout=[true|false]
-
- Split the profile data section to two with one containing sample profiles with
- inlined functions and the other not. This option can only be used with
- sample-based profile in extbinary format.
-
 .. option:: --convert-sample-profile-layout=[nest|flat]
 
  Convert the merged profile into a profile with a new layout. Supported

diff  --git a/llvm/include/llvm/ProfileData/SampleProfReader.h 
b/llvm/include/llvm/ProfileData/SampleProfReader.h
index 0fd86600de21f0..f053946a5db0a9 100644
--- a/llvm/include/llvm/ProfileData/SampleProfReader.h
+++ b/llvm/include/llvm/ProfileData/SampleProfReader.h
@@ -495,9 +495,9 @@ class SampleProfileReader {
   /// are present.
   virtual void setProfileUseMD5() { ProfileIsMD5 = true; }
 
-  /// Don't read profile without context if the flag is set.
-  void setSkipFlatProf(bool Skip) { SkipFlatProf = Skip; }
-
+  /// Don't read profile without context if the flag is set. This is only 
meaningful
+  /// for ExtBinary format.
+  virtual void setSkipFlatProf(bool Skip) {}
   /// Return whether any name in the profile contains ".__uniq." suffix.
   virtual bool hasUniqSuffix() { return false; }
 
@@ -581,10 +581,6 @@ class SampleProfileReader {
   /// Whether the profile uses MD5 for Sample Contexts and function names. This
   /// can be one-way overriden by the user to force use MD5.
   bool ProfileIsMD5 = false;
-
-  /// If SkipFlatProf is true, skip functions marked with !Flat in text mode or
-  /// sections with SecFlagFlat flag in ExtBinary mode.
-  bool SkipFlatProf = false;
 };
 
 class SampleProfileReaderText : public SampleProfileReader {
@@ -793,6 +789,10 @@ class SampleProfileReaderExtBinaryBase : public 
SampleProfileReaderBinary {
   /// The set containing the functions to use when compiling a module.
   DenseSet FuncsToUse;
 
+  /// If SkipFlatProf is true, skip the sections with
+  /// SecFlagFlat flag.
+  bool SkipFlatProf = false;
+
 public:
   SampleProfileReaderExtBinaryBase(std::unique_ptr B,
LLVMContext &C, SampleProfileFormat Format)
@@ -815,6 +815,8 @@ class SampleProfileReaderExtBinaryBase : public 
SampleProfileReaderBinary {
 return std::move(ProfSymList);
   };
 
+  void setSkipFlatProf(bool Skip) override { SkipFlatProf = Skip; }
+
 private:
   /// Read the profiles on-demand for the given functions. This is used after
   /// stale call graph matching finds new functions whose profiles aren't 
loaded

diff  --git a/llvm/include/llvm/ProfileData/SampleProfWriter.h 
b/llvm/include/llvm/ProfileData/SampleProfWriter.h
index 4b659eaf950b3e..5398a44f13ba36 100644
--- a/llvm/include/llvm/ProfileData/SampleProfWriter.h
+++ b/llvm/include/llvm/ProfileData/SampleProfWriter.h
@@ -28,9 +28,9 @@ namespace sampleprof {
 
 enum SectionLayout {
   DefaultLayout,
-  // The layout splits profile with inlined functions from profile without
-  // inlined functions. When Thinlto is enabled, ThinLTO postlink phase only
-  // has to load profile with inlined functions and can skip the other part.
+  // The layout splits profile with context information from profile without
+  // context information. When Thinlto is enabled, ThinLTO postlink phase only
+  // has to load profile with context information and can skip the other part.
   CtxSplitLayout,
   NumOfLayout,
 };
@@ -128,7 +128,7 @@ class SampleProfileWriter {
   virtual void setToCompressAllSections() {}
   virtual void setUseMD5() {}
   virtual void setPartialProfile() {}
-  virtual void setUseCtxSplitLayout() {}
+  virtual void resetSecLayout(SectionLayout SL) {}
 
 protected:
   SamplePro

[llvm-branch-commits] [clang] release/19.x: [clang-format] Fix misalignments of pointers in angle brackets (#106013) (PR #106326)

2024-08-28 Thread Owen Pan via llvm-branch-commits

https://github.com/owenca approved this pull request.


https://github.com/llvm/llvm-project/pull/106326


[llvm-branch-commits] [clang] release/19.x: [clang-format] js handle anonymous classes (#106242) (PR #106390)

2024-08-28 Thread Owen Pan via llvm-branch-commits

https://github.com/owenca approved this pull request.


https://github.com/llvm/llvm-project/pull/106390


[llvm-branch-commits] [clang] [libcxx] [clang] Finish implementation of P0522 (PR #96023)

2024-08-28 Thread Matheus Izvekov via llvm-branch-commits

https://github.com/mizvekov updated 
https://github.com/llvm/llvm-project/pull/96023

>From 84f988ee7c2d8fc5f777bc98850f6ab126fb3b71 Mon Sep 17 00:00:00 2001
From: Matheus Izvekov 
Date: Mon, 17 Jun 2024 21:39:08 -0300
Subject: [PATCH] [clang] Finish implementation of P0522

This finishes the clang implementation of P0522, getting rid
of the fallback to the old, pre-P0522 rules.

Before this patch, when partial ordering template template parameters,
we would perform, in order:
* If the old rules would match, we would accept it. Otherwise, don't
  generate diagnostics yet.
* If the new rules would match, just accept it. Otherwise, don't
  generate any diagnostics yet again.
* Apply the old rules again, this time with diagnostics.

This situation was far from ideal, as we would sometimes:
* Accept some things we shouldn't.
* Reject some things we shouldn't.
* Only diagnose rejection in terms of the old rules.

With this patch, we apply the P0522 rules throughout.

This required extending template argument deduction in order
to accept the historical rule that lets a TTP pack parameter match
non-pack arguments.
This change also makes us accept some combinations of historical and P0522
allowances we wouldn't before.
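
As a rough illustration (not taken from the patch or its tests), the two
kinds of matching mentioned above can be sketched as follows. The names
`TakesPack`, `TakesOne`, `Single` and `WithDefault` are hypothetical. The
second case is guarded by the standard `__cpp_template_template_args`
feature-test macro, so it is only compiled where the relaxed rules are
enabled.

```cpp
#include <type_traits>

// Historical rule: a template template parameter declared with a pack
// can be bound to a template that takes a fixed number of parameters.
template <template <class...> class TT> struct TakesPack {
  using type = TT<int>;
};

template <class T> struct Single {};

static_assert(
    std::is_same<TakesPack<Single>::type, Single<int>>::value,
    "pack parameter accepts a non-pack argument");

#if defined(__cpp_template_template_args) && \
    __cpp_template_template_args >= 201611L
// P0522 rule: the argument may declare extra parameters as long as they
// have defaults, because the parameter is then at least as specialized
// as the argument.
template <template <class> class TT> struct TakesOne {
  using type = TT<int>;
};

template <class T, class U = void> struct WithDefault {};

static_assert(
    std::is_same<TakesOne<WithDefault>::type, WithDefault<int, void>>::value,
    "defaulted extra parameter is accepted under P0522");
#endif
```

Both checks happen at compile time, so a translation unit containing this
sketch either compiles cleanly or reports which rule was rejected.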

It also fixes a number of bugs that were documented in the test suite;
I am not sure whether issues have already been filed for them.

This causes a lot of changes to the way these failures are diagnosed,
with related test suite churn.

The problem here is that the old rules were very simple and
non-recursive, making it easy to provide customized diagnostics,
and to keep them consistent with each other.

The new rules are a lot more complex and rely on template argument
deduction, substitutions, and they are recursive.

The approach taken here is to mostly rely on existing diagnostics,
and create a new instantiation context that keeps track of this context.

So for example when a substitution failure occurs, we use the error
produced there unmodified, and just attach notes to it explaining
that it occurred in the context of partial ordering this template
argument against that template parameter.

This diverges from the old diagnostics, which would lead with an
error pointing to the template argument, explain the problem
in subsequent notes, and produce a final note pointing to the parameter.
---
 clang/docs/ReleaseNotes.rst   |  10 +
 .../clang/Basic/DiagnosticSemaKinds.td|   7 +
 clang/include/clang/Sema/Sema.h   |  14 +-
 clang/lib/Frontend/FrontendActions.cpp|   2 +
 clang/lib/Sema/SemaTemplate.cpp   |  94 ++---
 clang/lib/Sema/SemaTemplateDeduction.cpp  | 353 +-
 clang/lib/Sema/SemaTemplateInstantiate.cpp|  15 +
 .../temp/temp.arg/temp.arg.template/p3-0x.cpp |  31 +-
 clang/test/CXX/temp/temp.param/p12.cpp|  21 +-
 clang/test/Modules/cxx-templates.cpp  |  15 +-
 clang/test/SemaCXX/make_integer_seq.cpp   |   5 +-
 clang/test/SemaTemplate/cwg2398.cpp   | 138 ++-
 clang/test/SemaTemplate/temp_arg_nontype.cpp  |  46 ++-
 clang/test/SemaTemplate/temp_arg_template.cpp |  38 +-
 .../SemaTemplate/temp_arg_template_p0522.cpp  |  82 ++--
 .../Templight/templight-empty-entries-fix.cpp |  12 +
 .../templight-prior-template-arg.cpp  |  33 +-
 .../type_traits/is_specialization.verify.cpp  |   2 +-
 18 files changed, 641 insertions(+), 277 deletions(-)

diff --git a/clang/docs/ReleaseNotes.rst b/clang/docs/ReleaseNotes.rst
index 2639fe3270200d..3826a19e28a666 100644
--- a/clang/docs/ReleaseNotes.rst
+++ b/clang/docs/ReleaseNotes.rst
@@ -129,6 +129,10 @@ C++23 Feature Support
 C++20 Feature Support
 ^
 
+C++17 Feature Support
+^
+- The implementation of the relaxed template template argument matching rules 
is
+  more complete and reliable, and should provide more accurate diagnostics.
 
 Resolutions to C++ Defect Reports
 ^
@@ -255,6 +259,10 @@ Improvements to Clang's diagnostics
 
 - Clang now diagnoses when the result of a [[nodiscard]] function is discarded 
after being cast in C. Fixes #GH104391.
 
+- Clang now properly explains the reason a template template argument failed to
+  match a template template parameter, in terms of the C++17 relaxed matching 
rules
+  instead of the old ones.
+
 - Don't emit duplicated dangling diagnostics. (#GH93386).
 
 - Improved diagnostic when trying to befriend a concept. (#GH45182).
@@ -322,6 +330,8 @@ Bug Fixes to C++ Support
 - Correctly check constraints of explicit instantiations of member functions. 
(#GH46029)
 - When performing partial ordering of function templates, clang now checks that
   the deduction was consistent. Fixes (#GH18291).
+- Fixes to several issues in partial ordering of template template parameters, 
which
+  were documented in the test suite.
 - Fixed an assertion failure about a constraint of a friend function template 
references to a value with greater
   

[llvm-branch-commits] [llvm] [LLVM][Coroutines] Transform "coro_elide_safe" calls to switch ABI coroutines to the `noalloc` variant (PR #99285)

2024-08-28 Thread Chuanqi Xu via llvm-branch-commits


@@ -0,0 +1,147 @@
+//===- CoroAnnotationElide.cpp - Elide attributed safe coroutine calls 
===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM 
Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+//
+// \file
+// This pass transforms all Call or Invoke instructions that are annotated
+// "coro_elide_safe" to call the `.noalloc` variant of coroutine instead.
+// The frame of the callee coroutine is allocated inside the caller. A pointer
+// to the allocated frame will be passed into the `.noalloc` ramp function.
+//
+//===--===//
+
+#include "llvm/Transforms/Coroutines/CoroAnnotationElide.h"
+
+#include "llvm/Analysis/LazyCallGraph.h"
+#include "llvm/Analysis/OptimizationRemarkEmitter.h"
+#include "llvm/IR/Analysis.h"
+#include "llvm/IR/IRBuilder.h"
+#include "llvm/IR/InstIterator.h"
+#include "llvm/IR/Instruction.h"
+#include "llvm/IR/Module.h"
+#include "llvm/IR/PassManager.h"
+#include "llvm/Transforms/Utils/CallGraphUpdater.h"
+
+#include 
+
+using namespace llvm;
+
+#define DEBUG_TYPE "coro-annotation-elide"
+
+static Instruction *getFirstNonAllocaInTheEntryBlock(Function *F) {
+  for (Instruction &I : F->getEntryBlock())
+    if (!isa<AllocaInst>(&I))
+      return &I;
+  llvm_unreachable("no terminator in the entry block");
+}
+
+// Create an alloca in the caller, using FrameSize and FrameAlign as the callee
+// coroutine's activation frame.
+static Value *allocateFrameInCaller(Function *Caller, uint64_t FrameSize,
+Align FrameAlign) {
+  LLVMContext &C = Caller->getContext();
+  BasicBlock::iterator InsertPt =
+  getFirstNonAllocaInTheEntryBlock(Caller)->getIterator();
+  const DataLayout &DL = Caller->getDataLayout();
+  auto FrameTy = ArrayType::get(Type::getInt8Ty(C), FrameSize);
+  auto *Frame = new AllocaInst(FrameTy, DL.getAllocaAddrSpace(), "", InsertPt);
+  Frame->setAlignment(FrameAlign);
+  return new BitCastInst(Frame, PointerType::getUnqual(C), "vFrame", InsertPt);
+}
+
+// Given a call or invoke instruction to the elide safe coroutine, this 
function
+// does the following:
+//  - Allocate a frame for the callee coroutine in the caller using alloca.
+//  - Replace the old CB with a new Call or Invoke to `NewCallee`, with the
+//pointer to the frame as an additional argument to NewCallee.
+static void processCall(CallBase *CB, Function *Caller, Function *NewCallee,
+uint64_t FrameSize, Align FrameAlign) {
+  auto *FramePtr = allocateFrameInCaller(Caller, FrameSize, FrameAlign);

ChuanqiXu9 wrote:

Yeah, we need to do this in the frontend.

https://github.com/llvm/llvm-project/pull/99285


[llvm-branch-commits] [llvm] [LLVM][Coroutines] Create `.noalloc` variant of switch ABI coroutine ramp functions during CoroSplit (PR #99283)

2024-08-28 Thread Chuanqi Xu via llvm-branch-commits

https://github.com/ChuanqiXu9 approved this pull request.

LGTM

https://github.com/llvm/llvm-project/pull/99283


[llvm-branch-commits] [llvm] [LLVM][Coroutines] Transform "coro_elide_safe" calls to switch ABI coroutines to the `noalloc` variant (PR #99285)

2024-08-28 Thread Chuanqi Xu via llvm-branch-commits

https://github.com/ChuanqiXu9 approved this pull request.

LGTM

https://github.com/llvm/llvm-project/pull/99285


[llvm-branch-commits] [clang] [clang-format] Revert "[clang-format][NFC] Delete TT_LambdaArrow (#70… (PR #106482)

2024-08-28 Thread Owen Pan via llvm-branch-commits

https://github.com/owenca created 
https://github.com/llvm/llvm-project/pull/106482

…… (#105923)

…519)"

This reverts commit e00d32afb9d33a1eca48e2b041c9688436706c5b and adds a test 
for lambda arrow SplitPenalty.

Fixes #105480.
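
For readers unfamiliar with the two token types touched by this revert,
the sketch below shows the two kinds of `->` in C++ source. It is only an
illustration: the function names are hypothetical and nothing here comes
from the patch's tests.

```cpp
#include <cassert>

// `->` introducing a trailing return type on a function declaration.
// clang-format annotates this arrow as TT_TrailingReturnArrow.
auto twice(int x) -> int { return 2 * x; }

// `->` introducing the return type of a lambda. With this revert the
// arrow is annotated as TT_LambdaArrow again, so it can be treated
// differently from the function case above.
inline int twiceViaLambda(int x) {
  auto f = [](int v) -> int { return 2 * v; };
  return f(x);
}

// Both spellings compute the same value; only the token kind differs.
inline void checkArrows() {
  assert(twice(3) == 6);
  assert(twiceViaLambda(3) == 6);
}
```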

>From 386f54403a6b38fd14d8e3126fcc46b7e579f575 Mon Sep 17 00:00:00 2001
From: Owen Pan 
Date: Wed, 28 Aug 2024 18:23:54 -0700
Subject: [PATCH] [clang-format] Revert "[clang-format][NFC] Delete TT_LambdaArrow (#70… (#105923)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

…519)"

This reverts commit e00d32afb9d33a1eca48e2b041c9688436706c5b and adds a
test for lambda arrow SplitPenalty.

Fixes #105480.
---
 clang/lib/Format/ContinuationIndenter.cpp | 10 +++---
 clang/lib/Format/FormatToken.h|  3 +-
 clang/lib/Format/TokenAnnotator.cpp   | 33 ++
 clang/lib/Format/UnwrappedLineParser.cpp  |  2 +-
 clang/unittests/Format/TokenAnnotatorTest.cpp | 34 ++-
 5 files changed, 50 insertions(+), 32 deletions(-)

diff --git a/clang/lib/Format/ContinuationIndenter.cpp 
b/clang/lib/Format/ContinuationIndenter.cpp
index b07360425ca6e1..7d89f0e63dd225 100644
--- a/clang/lib/Format/ContinuationIndenter.cpp
+++ b/clang/lib/Format/ContinuationIndenter.cpp
@@ -842,10 +842,8 @@ void ContinuationIndenter::addTokenOnCurrentLine(LineState 
&State, bool DryRun,
 CurrentState.ContainsUnwrappedBuilder = true;
   }
 
-  if (Current.is(TT_TrailingReturnArrow) &&
-  Style.Language == FormatStyle::LK_Java) {
+  if (Current.is(TT_LambdaArrow) && Style.Language == FormatStyle::LK_Java)
 CurrentState.NoLineBreak = true;
-  }
   if (Current.isMemberAccess() && Previous.is(tok::r_paren) &&
   (Previous.MatchingParen &&
(Previous.TotalLength - Previous.MatchingParen->TotalLength > 10))) {
@@ -1000,7 +998,7 @@ unsigned ContinuationIndenter::addTokenOnNewLine(LineState 
&State,
   //
   // is common and should be formatted like a free-standing function. The same
   // goes for wrapping before the lambda return type arrow.
-  if (Current.isNot(TT_TrailingReturnArrow) &&
+  if (Current.isNot(TT_LambdaArrow) &&
   (!Style.isJavaScript() || Current.NestingLevel != 0 ||
!PreviousNonComment || PreviousNonComment->isNot(tok::equal) ||
!Current.isOneOf(Keywords.kw_async, Keywords.kw_function))) {
@@ -1257,7 +1255,7 @@ unsigned ContinuationIndenter::getNewLineColumn(const 
LineState &State) {
 }
 return CurrentState.Indent;
   }
-  if (Current.is(TT_TrailingReturnArrow) &&
+  if (Current.is(TT_LambdaArrow) &&
   Previous.isOneOf(tok::kw_noexcept, tok::kw_mutable, tok::kw_constexpr,
tok::kw_consteval, tok::kw_static, TT_AttributeSquare)) 
{
 return ContinuationIndent;
@@ -1590,7 +1588,7 @@ unsigned 
ContinuationIndenter::moveStateToNextToken(LineState &State,
   }
   if (Current.isOneOf(TT_BinaryOperator, TT_ConditionalExpr) && Newline)
 CurrentState.NestedBlockIndent = State.Column + Current.ColumnWidth + 1;
-  if (Current.isOneOf(TT_LambdaLSquare, TT_TrailingReturnArrow))
+  if (Current.isOneOf(TT_LambdaLSquare, TT_LambdaArrow))
 CurrentState.LastSpace = State.Column;
   if (Current.is(TT_RequiresExpression) &&
   Style.RequiresExpressionIndentation == FormatStyle::REI_Keyword) {
diff --git a/clang/lib/Format/FormatToken.h b/clang/lib/Format/FormatToken.h
index cc45d5a8c5c1ec..9bfeb2052164ee 100644
--- a/clang/lib/Format/FormatToken.h
+++ b/clang/lib/Format/FormatToken.h
@@ -102,6 +102,7 @@ namespace format {
   TYPE(JsTypeColon)
\
   TYPE(JsTypeOperator) 
\
   TYPE(JsTypeOptionalQuestion) 
\
+  TYPE(LambdaArrow)
\
   TYPE(LambdaLBrace)   
\
   TYPE(LambdaLSquare)  
\
   TYPE(LeadingJavaAnnotation)  
\
@@ -725,7 +726,7 @@ struct FormatToken {
   bool isMemberAccess() const {
 return isOneOf(tok::arrow, tok::period, tok::arrowstar) &&
!isOneOf(TT_DesignatedInitializerPeriod, TT_TrailingReturnArrow,
-TT_LeadingJavaAnnotation);
+TT_LambdaArrow, TT_LeadingJavaAnnotation);
   }
 
   bool isPointerOrReference() const {
diff --git a/clang/lib/Format/TokenAnnotator.cpp 
b/clang/lib/Format/TokenAnnotator.cpp
index 851f79895ac5ac..07b42e79ba9a61 100644
--- a/clang/lib/Format/TokenAnnotator.cpp
+++ b/clang/lib/Format/TokenAnnotator.cpp
@@ -831,7 +831,7 @@ class AnnotatingParser {
 }
 // An arrow after an ObjC method expression is not a lambda arrow.
 if (CurrentToken->is(TT_ObjCMethodExpr) && CurrentToken->Next &&
-Current

[llvm-branch-commits] [clang] [clang-format] Revert "[clang-format][NFC] Delete TT_LambdaArrow (#70… (PR #106482)

2024-08-28 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-clang-format

Author: Owen Pan (owenca)


Changes

…… (#105923)

…519)"

This reverts commit e00d32afb9d33a1eca48e2b041c9688436706c5b and adds a test 
for lambda arrow SplitPenalty.

Fixes #105480.

---
Full diff: https://github.com/llvm/llvm-project/pull/106482.diff


5 Files Affected:

- (modified) clang/lib/Format/ContinuationIndenter.cpp (+4-6) 
- (modified) clang/lib/Format/FormatToken.h (+2-1) 
- (modified) clang/lib/Format/TokenAnnotator.cpp (+18-15) 
- (modified) clang/lib/Format/UnwrappedLineParser.cpp (+1-1) 
- (modified) clang/unittests/Format/TokenAnnotatorTest.cpp (+25-9) 


``diff
diff --git a/clang/lib/Format/ContinuationIndenter.cpp 
b/clang/lib/Format/ContinuationIndenter.cpp
index b07360425ca6e1..7d89f0e63dd225 100644
--- a/clang/lib/Format/ContinuationIndenter.cpp
+++ b/clang/lib/Format/ContinuationIndenter.cpp
@@ -842,10 +842,8 @@ void ContinuationIndenter::addTokenOnCurrentLine(LineState 
&State, bool DryRun,
 CurrentState.ContainsUnwrappedBuilder = true;
   }
 
-  if (Current.is(TT_TrailingReturnArrow) &&
-  Style.Language == FormatStyle::LK_Java) {
+  if (Current.is(TT_LambdaArrow) && Style.Language == FormatStyle::LK_Java)
 CurrentState.NoLineBreak = true;
-  }
   if (Current.isMemberAccess() && Previous.is(tok::r_paren) &&
   (Previous.MatchingParen &&
(Previous.TotalLength - Previous.MatchingParen->TotalLength > 10))) {
@@ -1000,7 +998,7 @@ unsigned ContinuationIndenter::addTokenOnNewLine(LineState 
&State,
   //
   // is common and should be formatted like a free-standing function. The same
   // goes for wrapping before the lambda return type arrow.
-  if (Current.isNot(TT_TrailingReturnArrow) &&
+  if (Current.isNot(TT_LambdaArrow) &&
   (!Style.isJavaScript() || Current.NestingLevel != 0 ||
!PreviousNonComment || PreviousNonComment->isNot(tok::equal) ||
!Current.isOneOf(Keywords.kw_async, Keywords.kw_function))) {
@@ -1257,7 +1255,7 @@ unsigned ContinuationIndenter::getNewLineColumn(const 
LineState &State) {
 }
 return CurrentState.Indent;
   }
-  if (Current.is(TT_TrailingReturnArrow) &&
+  if (Current.is(TT_LambdaArrow) &&
   Previous.isOneOf(tok::kw_noexcept, tok::kw_mutable, tok::kw_constexpr,
tok::kw_consteval, tok::kw_static, TT_AttributeSquare)) 
{
 return ContinuationIndent;
@@ -1590,7 +1588,7 @@ unsigned 
ContinuationIndenter::moveStateToNextToken(LineState &State,
   }
   if (Current.isOneOf(TT_BinaryOperator, TT_ConditionalExpr) && Newline)
 CurrentState.NestedBlockIndent = State.Column + Current.ColumnWidth + 1;
-  if (Current.isOneOf(TT_LambdaLSquare, TT_TrailingReturnArrow))
+  if (Current.isOneOf(TT_LambdaLSquare, TT_LambdaArrow))
 CurrentState.LastSpace = State.Column;
   if (Current.is(TT_RequiresExpression) &&
   Style.RequiresExpressionIndentation == FormatStyle::REI_Keyword) {
diff --git a/clang/lib/Format/FormatToken.h b/clang/lib/Format/FormatToken.h
index cc45d5a8c5c1ec..9bfeb2052164ee 100644
--- a/clang/lib/Format/FormatToken.h
+++ b/clang/lib/Format/FormatToken.h
@@ -102,6 +102,7 @@ namespace format {
   TYPE(JsTypeColon)
\
   TYPE(JsTypeOperator) 
\
   TYPE(JsTypeOptionalQuestion) 
\
+  TYPE(LambdaArrow)
\
   TYPE(LambdaLBrace)   
\
   TYPE(LambdaLSquare)  
\
   TYPE(LeadingJavaAnnotation)  
\
@@ -725,7 +726,7 @@ struct FormatToken {
   bool isMemberAccess() const {
 return isOneOf(tok::arrow, tok::period, tok::arrowstar) &&
!isOneOf(TT_DesignatedInitializerPeriod, TT_TrailingReturnArrow,
-TT_LeadingJavaAnnotation);
+TT_LambdaArrow, TT_LeadingJavaAnnotation);
   }
 
   bool isPointerOrReference() const {
diff --git a/clang/lib/Format/TokenAnnotator.cpp 
b/clang/lib/Format/TokenAnnotator.cpp
index 851f79895ac5ac..07b42e79ba9a61 100644
--- a/clang/lib/Format/TokenAnnotator.cpp
+++ b/clang/lib/Format/TokenAnnotator.cpp
@@ -831,7 +831,7 @@ class AnnotatingParser {
 }
 // An arrow after an ObjC method expression is not a lambda arrow.
 if (CurrentToken->is(TT_ObjCMethodExpr) && CurrentToken->Next &&
-CurrentToken->Next->is(TT_TrailingReturnArrow)) {
+CurrentToken->Next->is(TT_LambdaArrow)) {
   CurrentToken->Next->overwriteFixedType(TT_Unknown);
 }
 Left->MatchingParen = CurrentToken;
@@ -1769,8 +1769,10 @@ class AnnotatingParser {
   }
   break;
 case tok::arrow:
-  if (Tok->Previous && Tok->Previous->is(tok::kw_noexcept))
+  if (Tok->isNot(TT_LambdaArrow) && Tok->Previous &&
+  Tok

[llvm-branch-commits] [clang] [clang-format] Revert "[clang-format][NFC] Delete TT_LambdaArrow (#70… (PR #106482)

2024-08-28 Thread Owen Pan via llvm-branch-commits

https://github.com/owenca milestoned 
https://github.com/llvm/llvm-project/pull/106482


[llvm-branch-commits] [llvm] [LoongArch] Optimize for immediate value materialization using BSTRINS_D instruction (PR #106332)

2024-08-28 Thread via llvm-branch-commits

https://github.com/wangleiat updated 
https://github.com/llvm/llvm-project/pull/106332

>From b2e3659d23ff3a576e2967576d501b24d6466e87 Mon Sep 17 00:00:00 2001
From: wanglei 
Date: Wed, 28 Aug 2024 12:16:47 +0800
Subject: [PATCH] update test sextw-removal.ll

Created using spr 1.3.5-bogner
---
 llvm/test/CodeGen/LoongArch/sextw-removal.ll | 40 
 1 file changed, 16 insertions(+), 24 deletions(-)

diff --git a/llvm/test/CodeGen/LoongArch/sextw-removal.ll 
b/llvm/test/CodeGen/LoongArch/sextw-removal.ll
index 2bb39395c1d1b6..7500b5ae09359a 100644
--- a/llvm/test/CodeGen/LoongArch/sextw-removal.ll
+++ b/llvm/test/CodeGen/LoongArch/sextw-removal.ll
@@ -323,21 +323,17 @@ define void @test7(i32 signext %arg, i32 signext %arg1) 
nounwind {
 ; CHECK-NEXT:st.d $s2, $sp, 8 # 8-byte Folded Spill
 ; CHECK-NEXT:sra.w $a0, $a0, $a1
 ; CHECK-NEXT:lu12i.w $a1, 349525
-; CHECK-NEXT:ori $a1, $a1, 1365
-; CHECK-NEXT:lu32i.d $a1, 349525
-; CHECK-NEXT:lu52i.d $fp, $a1, 1365
+; CHECK-NEXT:ori $fp, $a1, 1365
+; CHECK-NEXT:bstrins.d $fp, $fp, 62, 32
 ; CHECK-NEXT:lu12i.w $a1, 209715
-; CHECK-NEXT:ori $a1, $a1, 819
-; CHECK-NEXT:lu32i.d $a1, 209715
-; CHECK-NEXT:lu52i.d $s0, $a1, 819
+; CHECK-NEXT:ori $s0, $a1, 819
+; CHECK-NEXT:bstrins.d $s0, $s0, 61, 32
 ; CHECK-NEXT:lu12i.w $a1, 61680
-; CHECK-NEXT:ori $a1, $a1, 3855
-; CHECK-NEXT:lu32i.d $a1, -61681
-; CHECK-NEXT:lu52i.d $s1, $a1, 240
+; CHECK-NEXT:ori $s1, $a1, 3855
+; CHECK-NEXT:bstrins.d $s1, $s1, 59, 32
 ; CHECK-NEXT:lu12i.w $a1, 4112
-; CHECK-NEXT:ori $a1, $a1, 257
-; CHECK-NEXT:lu32i.d $a1, 65793
-; CHECK-NEXT:lu52i.d $s2, $a1, 16
+; CHECK-NEXT:ori $s2, $a1, 257
+; CHECK-NEXT:bstrins.d $s2, $s2, 56, 32
 ; CHECK-NEXT:.p2align 4, , 16
 ; CHECK-NEXT:  .LBB6_1: # %bb2
 ; CHECK-NEXT:# =>This Inner Loop Header: Depth=1
@@ -374,21 +370,17 @@ define void @test7(i32 signext %arg, i32 signext %arg1) 
nounwind {
 ; NORMV-NEXT:st.d $s2, $sp, 8 # 8-byte Folded Spill
 ; NORMV-NEXT:sra.w $a0, $a0, $a1
 ; NORMV-NEXT:lu12i.w $a1, 349525
-; NORMV-NEXT:ori $a1, $a1, 1365
-; NORMV-NEXT:lu32i.d $a1, 349525
-; NORMV-NEXT:lu52i.d $fp, $a1, 1365
+; NORMV-NEXT:ori $fp, $a1, 1365
+; NORMV-NEXT:bstrins.d $fp, $fp, 62, 32
 ; NORMV-NEXT:lu12i.w $a1, 209715
-; NORMV-NEXT:ori $a1, $a1, 819
-; NORMV-NEXT:lu32i.d $a1, 209715
-; NORMV-NEXT:lu52i.d $s0, $a1, 819
+; NORMV-NEXT:ori $s0, $a1, 819
+; NORMV-NEXT:bstrins.d $s0, $s0, 61, 32
 ; NORMV-NEXT:lu12i.w $a1, 61680
-; NORMV-NEXT:ori $a1, $a1, 3855
-; NORMV-NEXT:lu32i.d $a1, -61681
-; NORMV-NEXT:lu52i.d $s1, $a1, 240
+; NORMV-NEXT:ori $s1, $a1, 3855
+; NORMV-NEXT:bstrins.d $s1, $s1, 59, 32
 ; NORMV-NEXT:lu12i.w $a1, 4112
-; NORMV-NEXT:ori $a1, $a1, 257
-; NORMV-NEXT:lu32i.d $a1, 65793
-; NORMV-NEXT:lu52i.d $s2, $a1, 16
+; NORMV-NEXT:ori $s2, $a1, 257
+; NORMV-NEXT:bstrins.d $s2, $s2, 56, 32
 ; NORMV-NEXT:.p2align 4, , 16
 ; NORMV-NEXT:  .LBB6_1: # %bb2
 ; NORMV-NEXT:# =>This Inner Loop Header: Depth=1
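
To see why the shorter sequences in the updated CHECK lines still produce
the same constants, here is a small C++ model of the three instructions
involved. It is a sketch based on my reading of the instruction
semantics, not code from the patch: the helper names `lu12i_w`, `ori`,
`bstrins_d` and `materialize` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// lu12i.w: put a 20-bit immediate in bits [31:12], clear bits [11:0],
// then sign-extend the 32-bit result to 64 bits.
inline uint64_t lu12i_w(int32_t imm20) {
  uint32_t lo = static_cast<uint32_t>(imm20) << 12;
  return static_cast<uint64_t>(static_cast<int64_t>(static_cast<int32_t>(lo)));
}

// ori: bitwise OR with a zero-extended 12-bit immediate.
inline uint64_t ori(uint64_t rj, uint32_t imm12) {
  return rj | (imm12 & 0xFFFu);
}

// bstrins.d: replace bits [msb:lsb] of rd with bits [msb-lsb:0] of rj.
inline uint64_t bstrins_d(uint64_t rd, uint64_t rj, unsigned msb,
                          unsigned lsb) {
  unsigned width = msb - lsb + 1;
  uint64_t mask = (width == 64) ? ~0ull : ((1ull << width) - 1);
  return (rd & ~(mask << lsb)) | ((rj & mask) << lsb);
}

// The three-instruction pattern from the test: build the low 32 bits,
// then copy them into the upper half of the same register.
inline uint64_t materialize(int32_t hi20, uint32_t lo12, unsigned msb) {
  uint64_t r = ori(lu12i_w(hi20), lo12);
  return bstrins_d(r, r, msb, 32);
}
```

With the operands from the test, `materialize(349525, 1365, 62)` models
the `$fp` sequence and `materialize(4112, 257, 56)` models the `$s2`
sequence. Each replaces a four-instruction `lu12i.w`/`ori`/`lu32i.d`/
`lu52i.d` chain with three instructions.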



[llvm-branch-commits] [flang] [flang] Introduce custom loop nest generation for loops in workshare construct (PR #101445)

2024-08-28 Thread Ivan R. Ivanov via llvm-branch-commits

ivanradanov wrote:


> ... However, they would work if they ran after the pass lowering 
> `omp.workshare` to a set of `omp.single` for the code in between 
> `omp.wsloop`s. That way we would not have to introduce a new loop wrapper and 
> also we could create passes assuming the parent of region of an `omp.wsloop` 
> is executed by all threads in the team. I don't think that should be an 
> issue, since in principle it makes sense to me that the `omp.workshare` 
> transformation would run immediately after PFT to MLIR lowering. What do you 
> think about that alternative?

Ideally, the `omp.workshare` lowering will run after the HLFIR to FIR lowering, 
because missing the high-level optimizations that HLFIR provides can result in 
very bad performance (unneeded temporary arrays, unnecessary copies, non-fused 
array computation, etc.). The workshare lowering transforms the 
`omp.workshare.loop_wrapper`s into `omp.wsloop`s, so they are gone after that.

Another factor is that there may not be PFT->loop lowerings for many constructs 
that need to be divided into units of work. so we may need to first generate 
HLFIR and alter the lowerings from HLFIR to FIR to get the `omp.wsloop` (or 
`omp.workshare.loop_wrapper`), which means that there will be portions of the 
pipeline (from PFT->HLFIR until HLFIR->FIR) where a `omp.wsloop` nested in an 
`omp.workshare` will be the wrong representation.

Are there any concerns with adding `omp.workshare.loop_wrapper`? I do not see 
that big of an overhead (maintenance or compile time) resulting from its 
addition, while it makes things clearer and more robust in my opinion.

https://github.com/llvm/llvm-project/pull/101445


[llvm-branch-commits] [llvm] DAG: Check if is_fpclass is custom, instead of isLegalOrCustom (PR #105577)

2024-08-28 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm updated 
https://github.com/llvm/llvm-project/pull/105577

>From 9e23baea4d3444e7e0bccdf39b738f404abfe265 Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Wed, 21 Aug 2024 20:15:55 +0400
Subject: [PATCH] DAG: Check if is_fpclass is custom, instead of
 isLegalOrCustom

For some reason, isOperationLegalOrCustom is not the same as
isOperationLegal || isOperationCustom. Unfortunately, it checks
if the type is legal, which makes it useless for custom lowering
on non-legal types (which is always ppcf128).

Really, the DAG builder shouldn't be expanding this itself; it
makes it difficult to work with. It's only here to work
around the DAG requiring legal integer types the same size as
the FP type after type legalization.
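
The mismatch described above can be sketched with a toy C++ model. `ToyLowering`, its string-keyed tables, and `makePPCLikeToy` are hypothetical names that only mimic the shape of the TargetLowering queries; the one behavior taken from this message is that the combined query also requires the type to be legal.

```cpp
#include <map>
#include <set>
#include <string>
#include <utility>

// Toy model only: strings stand in for opcodes and value types, and none of
// these declarations are the real LLVM ones.
enum class Action { Legal, Custom, Expand };

struct ToyLowering {
  std::set<std::string> LegalTypes;
  std::map<std::pair<std::string, std::string>, Action> Actions;

  bool isTypeLegal(const std::string &VT) const {
    return LegalTypes.count(VT) != 0;
  }

  // Assumption of this toy: anything not configured is treated as Legal.
  Action getOperationAction(const std::string &Op,
                            const std::string &VT) const {
    auto It = Actions.find(std::make_pair(Op, VT));
    return It == Actions.end() ? Action::Legal : It->second;
  }

  bool isOperationLegal(const std::string &Op, const std::string &VT) const {
    return isTypeLegal(VT) && getOperationAction(Op, VT) == Action::Legal;
  }

  // Looks only at the action table, not at type legality.
  bool isOperationCustom(const std::string &Op, const std::string &VT) const {
    return getOperationAction(Op, VT) == Action::Custom;
  }

  // Also requires the type to be legal, so it can be false even when the
  // action is Custom.
  bool isOperationLegalOrCustom(const std::string &Op,
                                const std::string &VT) const {
    const Action A = getOperationAction(Op, VT);
    return isTypeLegal(VT) && (A == Action::Legal || A == Action::Custom);
  }
};

// Hypothetical PowerPC-like setup: ppcf128 is not a legal type, but
// IS_FPCLASS on it is marked Custom.
ToyLowering makePPCLikeToy() {
  ToyLowering TLI;
  TLI.LegalTypes = {"f32", "f64"};
  TLI.Actions[std::make_pair(std::string("IS_FPCLASS"),
                             std::string("ppcf128"))] = Action::Custom;
  return TLI;
}
```

In this toy, the combined query rejects `ppcf128` while `isOperationCustom` accepts it, which is the situation the two separate checks in the builder are meant to handle.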
---
 .../SelectionDAG/SelectionDAGBuilder.cpp  |   3 +-
 llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp |  17 +-
 llvm/test/CodeGen/AMDGPU/fract-match.ll   |  10 +-
 .../CodeGen/AMDGPU/llvm.is.fpclass.f16.ll | 205 +++---
 llvm/test/CodeGen/PowerPC/is_fpclass.ll   |  37 ++--
 5 files changed, 160 insertions(+), 112 deletions(-)

diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
index d103308cce566a..ad24704d940a36 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
@@ -7031,7 +7031,8 @@ void SelectionDAGBuilder::visitIntrinsicCall(const CallInst &I,
 // If ISD::IS_FPCLASS should be expanded, do it right now, because the
 // expansion can use illegal types. Making expansion early allows
 // legalizing these types prior to selection.
-if (!TLI.isOperationLegalOrCustom(ISD::IS_FPCLASS, ArgVT)) {
+if (!TLI.isOperationLegal(ISD::IS_FPCLASS, ArgVT) &&
+!TLI.isOperationCustom(ISD::IS_FPCLASS, ArgVT)) {
   SDValue Result = TLI.expandIS_FPCLASS(DestVT, Op, Test, Flags, sdl, DAG);
   setValue(&I, Result);
   return;
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp b/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
index 96143d688801aa..d24836b7eeb095 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
@@ -426,12 +426,17 @@ AMDGPUTargetLowering::AMDGPUTargetLowering(const TargetMachine &TM,
   // FIXME: These IS_FPCLASS vector fp types are marked custom so it reaches
   // scalarization code. Can be removed when IS_FPCLASS expand isn't called by
   // default unless marked custom/legal.
-  setOperationAction(
-  ISD::IS_FPCLASS,
-  {MVT::v2f16, MVT::v3f16, MVT::v4f16, MVT::v16f16, MVT::v2f32, MVT::v3f32,
-   MVT::v4f32, MVT::v5f32, MVT::v6f32, MVT::v7f32, MVT::v8f32, MVT::v16f32,
-   MVT::v2f64, MVT::v3f64, MVT::v4f64, MVT::v8f64, MVT::v16f64},
-  Custom);
+  setOperationAction(ISD::IS_FPCLASS,
+ {MVT::v2f32, MVT::v3f32, MVT::v4f32, MVT::v5f32,
+  MVT::v6f32, MVT::v7f32, MVT::v8f32, MVT::v16f32,
+  MVT::v2f64, MVT::v3f64, MVT::v4f64, MVT::v8f64,
+  MVT::v16f64},
+ Custom);
+
+  if (isTypeLegal(MVT::f16))
+setOperationAction(ISD::IS_FPCLASS,
+   {MVT::v2f16, MVT::v3f16, MVT::v4f16, MVT::v16f16},
+   Custom);
 
   // Expand to fneg + fadd.
   setOperationAction(ISD::FSUB, MVT::f64, Expand);
diff --git a/llvm/test/CodeGen/AMDGPU/fract-match.ll b/llvm/test/CodeGen/AMDGPU/fract-match.ll
index 1b28ddb2c58620..b212b9caf8400e 100644
--- a/llvm/test/CodeGen/AMDGPU/fract-match.ll
+++ b/llvm/test/CodeGen/AMDGPU/fract-match.ll
@@ -2135,16 +2135,16 @@ define <2 x half> @safe_math_fract_v2f16(<2 x half> %x, ptr addrspace(1) nocaptu
 ; GFX8-LABEL: safe_math_fract_v2f16:
 ; GFX8:   ; %bb.0: ; %entry
 ; GFX8-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX8-NEXT:v_mov_b32_e32 v6, 0x204
+; GFX8-NEXT:s_movk_i32 s6, 0x204
 ; GFX8-NEXT:v_floor_f16_sdwa v3, v0 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_1
 ; GFX8-NEXT:v_floor_f16_e32 v4, v0
-; GFX8-NEXT:v_cmp_class_f16_sdwa s[4:5], v0, v6 src0_sel:WORD_1 src1_sel:DWORD
+; GFX8-NEXT:v_fract_f16_sdwa v5, v0 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_1
+; GFX8-NEXT:v_cmp_class_f16_sdwa s[4:5], v0, s6 src0_sel:WORD_1 src1_sel:DWORD
 ; GFX8-NEXT:v_pack_b32_f16 v3, v4, v3
 ; GFX8-NEXT:v_fract_f16_e32 v4, v0
-; GFX8-NEXT:v_fract_f16_sdwa v5, v0 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_1
-; GFX8-NEXT:v_cmp_class_f16_e32 vcc, v0, v6
 ; GFX8-NEXT:v_cndmask_b32_e64 v5, v5, 0, s[4:5]
-; GFX8-NEXT:v_cndmask_b32_e64 v0, v4, 0, vcc
+; GFX8-NEXT:v_cmp_class_f16_e64 s[4:5], v0, s6
+; GFX8-NEXT:v_cndmask_b32_e64 v0, v4, 0, s[4:5]
 ; GFX8-NEXT:v_pack_b32_f16 v0, v0, v5
 ; GFX8-NEXT:global_store_dword v[1:2], v3, off
 ; GFX8-NEXT:s_waitcnt vmcnt(0)
diff --git a/llvm/test/CodeGen/AMDGPU/llvm.is.fpclass.f16.l

[llvm-branch-commits] [llvm] DAG: Handle lowering unordered compare with inf (PR #100378)

2024-08-28 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm updated 
https://github.com/llvm/llvm-project/pull/100378

>From 4edffb2750e8320c39109cd7c9c086c2ee86e9d4 Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Tue, 7 Feb 2023 12:22:05 -0400
Subject: [PATCH 1/3] DAG: Handle lowering unordered compare with inf

Try to take advantage of the nan check behavior of fcmp.
x86_64 looks better, x86_32 looks worse.
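
As a rough illustration of the idea (not the code in this patch), the sketch below shows a "finite" class test being inverted to "inf or nan", which a single unordered compare against infinity can answer. The bit values are assumed to follow llvm.is.fpclass, `invertIfSimpler` is a hypothetical three-case stand-in for invertFPClassTestIfSimpler, and the helper names are made up.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

// Class bits, assumed here to follow llvm.is.fpclass (0x3 = nan,
// 0x4 = -inf, 0x200 = +inf, 0x3ff = all classes).
constexpr uint32_t fcNone = 0;
constexpr uint32_t fcNan = 0x3;
constexpr uint32_t fcNegInf = 0x4;
constexpr uint32_t fcPosInf = 0x200;
constexpr uint32_t fcInf = fcNegInf | fcPosInf;
constexpr uint32_t fcAllFlags = 0x3ff;

// Simplified stand-in for invertFPClassTestIfSimpler: only three cases.
// Returns the inverted mask if testing it is simpler, otherwise fcNone.
uint32_t invertIfSimpler(uint32_t Test, bool UseFCmp) {
  const uint32_t Inverted = ~Test & fcAllFlags;
  switch (Inverted) {
  case fcNan:
  case fcInf:
    return Inverted;
  case fcInf | fcNan:
    // Only worthwhile when a floating-point compare does the nan check.
    return UseFCmp ? Inverted : fcNone;
  default:
    return fcNone;
  }
}

// "x is inf or nan" as one unordered compare: fabs(x) u== +inf.
bool isInfOrNanViaFCmp(float X) {
  const float Inf = std::numeric_limits<float>::infinity();
  const float A = std::fabs(X);
  return std::isunordered(A, Inf) || A == Inf;
}

// The original "finite" test is the negation of the inverted test.
bool isFiniteViaFCmp(float X) { return !isInfOrNanViaFCmp(X); }
```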
---
 llvm/include/llvm/CodeGen/CodeGenCommonISel.h |  7 +-
 llvm/lib/CodeGen/CodeGenCommonISel.cpp|  8 +-
 .../CodeGen/SelectionDAG/TargetLowering.cpp   | 53 +++--
 llvm/test/CodeGen/X86/is_fpclass.ll   | 78 +--
 4 files changed, 83 insertions(+), 63 deletions(-)

diff --git a/llvm/include/llvm/CodeGen/CodeGenCommonISel.h b/llvm/include/llvm/CodeGen/CodeGenCommonISel.h
index 90ef890f22d1b1..e4b2e20babc07a 100644
--- a/llvm/include/llvm/CodeGen/CodeGenCommonISel.h
+++ b/llvm/include/llvm/CodeGen/CodeGenCommonISel.h
@@ -218,10 +218,15 @@ findSplitPointForStackProtector(MachineBasicBlock *BB,
 /// Evaluates if the specified FP class test is better performed as the inverse
 /// (i.e. fewer instructions should be required to lower it).  An example is the
 /// test "inf|normal|subnormal|zero", which is an inversion of "nan".
+///
 /// \param Test The test as specified in 'is_fpclass' intrinsic invocation.
+///
+/// \param UseFCmp The intention is to perform the comparison using
+/// floating-point compare instructions which check for nan.
+///
 /// \returns The inverted test, or fcNone, if inversion does not produce a
 /// simpler test.
-FPClassTest invertFPClassTestIfSimpler(FPClassTest Test);
+FPClassTest invertFPClassTestIfSimpler(FPClassTest Test, bool UseFCmp);
 
 /// Assuming the instruction \p MI is going to be deleted, attempt to salvage
 /// debug users of \p MI by writing the effect of \p MI in a DIExpression.
diff --git a/llvm/lib/CodeGen/CodeGenCommonISel.cpp b/llvm/lib/CodeGen/CodeGenCommonISel.cpp
index fe144d3c182039..d985751e2be0be 100644
--- a/llvm/lib/CodeGen/CodeGenCommonISel.cpp
+++ b/llvm/lib/CodeGen/CodeGenCommonISel.cpp
@@ -173,8 +173,9 @@ llvm::findSplitPointForStackProtector(MachineBasicBlock *BB,
   return SplitPoint;
 }
 
-FPClassTest llvm::invertFPClassTestIfSimpler(FPClassTest Test) {
+FPClassTest llvm::invertFPClassTestIfSimpler(FPClassTest Test, bool UseFCmp) {
   FPClassTest InvertedTest = ~Test;
+
   // Pick the direction with fewer tests
   // TODO: Handle more combinations of cases that can be handled together
   switch (static_cast<unsigned>(InvertedTest)) {
@@ -200,6 +201,11 @@ FPClassTest llvm::invertFPClassTestIfSimpler(FPClassTest Test) {
   case fcSubnormal | fcZero:
   case fcSubnormal | fcZero | fcNan:
 return InvertedTest;
+  case fcInf | fcNan:
+// If we're trying to use fcmp, we can take advantage of the nan check
+// behavior of the compare (but this is more instructions in the integer
+// expansion).
+return UseFCmp ? InvertedTest : fcNone;
   default:
 return fcNone;
   }
diff --git a/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp b/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
index 4e796289cff0a1..1e3a0da0f3be5b 100644
--- a/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
@@ -8672,7 +8672,7 @@ SDValue TargetLowering::expandIS_FPCLASS(EVT ResultVT, SDValue Op,
   // Degenerated cases.
   if (Test == fcNone)
 return DAG.getBoolConstant(false, DL, ResultVT, OperandVT);
-  if ((Test & fcAllFlags) == fcAllFlags)
+  if (Test == fcAllFlags)
 return DAG.getBoolConstant(true, DL, ResultVT, OperandVT);
 
   // PPC double double is a pair of doubles, of which the higher part determines
@@ -8683,14 +8683,6 @@ SDValue TargetLowering::expandIS_FPCLASS(EVT ResultVT, SDValue Op,
 OperandVT = MVT::f64;
   }
 
-  // Some checks may be represented as inversion of simpler check, for example
-  // "inf|normal|subnormal|zero" => !"nan".
-  bool IsInverted = false;
-  if (FPClassTest InvertedCheck = invertFPClassTestIfSimpler(Test)) {
-IsInverted = true;
-Test = InvertedCheck;
-  }
-
   // Floating-point type properties.
   EVT ScalarFloatVT = OperandVT.getScalarType();
   const Type *FloatTy = ScalarFloatVT.getTypeForEVT(*DAG.getContext());
@@ -8702,9 +8694,16 @@ SDValue TargetLowering::expandIS_FPCLASS(EVT ResultVT, SDValue Op,
   if (Flags.hasNoFPExcept() &&
   isOperationLegalOrCustom(ISD::SETCC, OperandVT.getScalarType())) {
 FPClassTest FPTestMask = Test;
+bool IsInvertedFP = false;
+
+if (FPClassTest InvertedFPCheck =
+invertFPClassTestIfSimpler(FPTestMask, true)) {
+  FPTestMask = InvertedFPCheck;
+  IsInvertedFP = true;
+}
 
-ISD::CondCode OrderedCmpOpcode = IsInverted ? ISD::SETUNE : ISD::SETOEQ;
-ISD::CondCode UnorderedCmpOpcode = IsInverted ? ISD::SETONE : ISD::SETUEQ;
+ISD::CondCode OrderedCmpOpcode = IsInvertedFP ? ISD::SETUNE : ISD::SETOEQ;
+ISD::CondCode UnorderedCmpOpcode = IsInvertedFP ? ISD::

[llvm-branch-commits] [llvm] DAG: Lower single infinity is.fpclass tests to fcmp (PR #100380)

2024-08-28 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm updated 
https://github.com/llvm/llvm-project/pull/100380

>From 7d48a3885d59edef708def4fada703032318a63e Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Wed, 1 Feb 2023 09:52:34 -0400
Subject: [PATCH] DAG: Lower single infinity is.fpclass tests to fcmp

InstCombine also should have taken care of this, but this
should be helpful when the fcmp based lowering strategy tries
to combine multiple tests.
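
A small C++ sketch of the four rewrites this patch targets, assuming IEEE-754 floats and default floating-point compiler settings. The helper names are hypothetical; `orderedEq` and `unorderedEq` stand in for the ordered and unordered equality compares.

```cpp
#include <cmath>
#include <limits>

// Ordered equality: false whenever either operand is NaN (plain == on floats).
inline bool orderedEq(float A, float B) { return A == B; }

// Unordered equality: true if the operands are equal or either one is NaN.
inline bool unorderedEq(float A, float B) {
  return std::isunordered(A, B) || A == B;
}

constexpr float kInf = std::numeric_limits<float>::infinity();

// isposinf(x)        --> x == inf
bool isPosInf(float X) { return orderedEq(X, kInf); }
// isneginf(x)        --> x == -inf
bool isNegInf(float X) { return orderedEq(X, -kInf); }
// isposinf(x) || nan --> x u== inf
bool isPosInfOrNan(float X) { return unorderedEq(X, kInf); }
// isneginf(x) || nan --> x u== -inf
bool isNegInfOrNan(float X) { return unorderedEq(X, -kInf); }
```

Each class test becomes one compare against a constant, and the only difference between the ordered and unordered forms is whether NaN inputs count as a match.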
---
 llvm/lib/CodeGen/CodeGenCommonISel.cpp|  2 +
 .../CodeGen/SelectionDAG/TargetLowering.cpp   | 16 
 llvm/test/CodeGen/X86/is_fpclass.ll   | 92 ---
 3 files changed, 54 insertions(+), 56 deletions(-)

diff --git a/llvm/lib/CodeGen/CodeGenCommonISel.cpp b/llvm/lib/CodeGen/CodeGenCommonISel.cpp
index d985751e2be0be..4cd2f6ae2fdb11 100644
--- a/llvm/lib/CodeGen/CodeGenCommonISel.cpp
+++ b/llvm/lib/CodeGen/CodeGenCommonISel.cpp
@@ -202,6 +202,8 @@ FPClassTest llvm::invertFPClassTestIfSimpler(FPClassTest Test, bool UseFCmp) {
   case fcSubnormal | fcZero | fcNan:
 return InvertedTest;
   case fcInf | fcNan:
+  case fcPosInf | fcNan:
+  case fcNegInf | fcNan:
 // If we're trying to use fcmp, we can take advantage of the nan check
 // behavior of the compare (but this is more instructions in the integer
 // expansion).
diff --git a/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp b/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
index aa022480947a7d..e3fdea34f895ba 100644
--- a/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
@@ -8751,6 +8751,22 @@ SDValue TargetLowering::expandIS_FPCLASS(EVT ResultVT, SDValue Op,
   IsOrderedInf ? OrderedCmpOpcode : UnorderedCmpOpcode);
 }
 
+if ((OrderedFPTestMask == fcPosInf || OrderedFPTestMask == fcNegInf) &&
+isCondCodeLegalOrCustom(IsOrdered ? OrderedCmpOpcode
+  : UnorderedCmpOpcode,
+OperandVT.getSimpleVT())) {
+  // isposinf(x) --> x == inf
+  // isneginf(x) --> x == -inf
+  // isposinf(x) || nan --> x u== inf
+  // isneginf(x) || nan --> x u== -inf
+
+  SDValue Inf = DAG.getConstantFP(
+  APFloat::getInf(Semantics, OrderedFPTestMask == fcNegInf), DL,
+  OperandVT);
+  return DAG.getSetCC(DL, ResultVT, Op, Inf,
+  IsOrdered ? OrderedCmpOpcode : UnorderedCmpOpcode);
+}
+
 if (OrderedFPTestMask == (fcSubnormal | fcZero) && !IsOrdered) {
   // TODO: Could handle ordered case, but it produces worse code for
   // x86. Maybe handle ordered if fabs is free?
diff --git a/llvm/test/CodeGen/X86/is_fpclass.ll b/llvm/test/CodeGen/X86/is_fpclass.ll
index cc4d4c4543a515..97136dafa6c2c0 100644
--- a/llvm/test/CodeGen/X86/is_fpclass.ll
+++ b/llvm/test/CodeGen/X86/is_fpclass.ll
@@ -2116,24 +2116,19 @@ entry:
 define i1 @is_plus_inf_or_nan_f(float %x) {
 ; X86-LABEL: is_plus_inf_or_nan_f:
 ; X86:   # %bb.0:
-; X86-NEXT:movl {{[0-9]+}}(%esp), %eax
-; X86-NEXT:cmpl $2139095040, %eax # imm = 0x7F800000
-; X86-NEXT:sete %cl
-; X86-NEXT:andl $2147483647, %eax # imm = 0x7FFFFFFF
-; X86-NEXT:cmpl $2139095041, %eax # imm = 0x7F800001
-; X86-NEXT:setge %al
-; X86-NEXT:orb %cl, %al
+; X86-NEXT:flds {{[0-9]+}}(%esp)
+; X86-NEXT:flds {{\.?LCPI[0-9]+_[0-9]+}}
+; X86-NEXT:fucompp
+; X86-NEXT:fnstsw %ax
+; X86-NEXT:# kill: def $ah killed $ah killed $ax
+; X86-NEXT:sahf
+; X86-NEXT:sete %al
 ; X86-NEXT:retl
 ;
 ; X64-LABEL: is_plus_inf_or_nan_f:
 ; X64:   # %bb.0:
-; X64-NEXT:movd %xmm0, %eax
-; X64-NEXT:cmpl $2139095040, %eax # imm = 0x7F800000
-; X64-NEXT:sete %cl
-; X64-NEXT:andl $2147483647, %eax # imm = 0x7FFFFFFF
-; X64-NEXT:cmpl $2139095041, %eax # imm = 0x7F800001
-; X64-NEXT:setge %al
-; X64-NEXT:orb %cl, %al
+; X64-NEXT:ucomiss {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
+; X64-NEXT:sete %al
 ; X64-NEXT:retq
   %class = tail call i1 @llvm.is.fpclass.f32(float %x, i32 515)  ; 0x200|0x3 = "+inf|nan"
   ret i1 %class
@@ -2142,24 +2137,19 @@ define i1 @is_plus_inf_or_nan_f(float %x) {
 define i1 @is_minus_inf_or_nan_f(float %x) {
 ; X86-LABEL: is_minus_inf_or_nan_f:
 ; X86:   # %bb.0:
-; X86-NEXT:movl {{[0-9]+}}(%esp), %eax
-; X86-NEXT:cmpl $-8388608, %eax # imm = 0xFF800000
-; X86-NEXT:sete %cl
-; X86-NEXT:andl $2147483647, %eax # imm = 0x7FFFFFFF
-; X86-NEXT:cmpl $2139095041, %eax # imm = 0x7F800001
-; X86-NEXT:setge %al
-; X86-NEXT:orb %cl, %al
+; X86-NEXT:flds {{[0-9]+}}(%esp)
+; X86-NEXT:flds {{\.?LCPI[0-9]+_[0-9]+}}
+; X86-NEXT:fucompp
+; X86-NEXT:fnstsw %ax
+; X86-NEXT:# kill: def $ah killed $ah killed $ax
+; X86-NEXT:sahf
+; X86-NEXT:sete %al
 ; X86-NEXT:retl
 ;
 ; X64-LABEL: is_minus_inf_or_nan_f:
 ; X64:   # %bb.0:
-; X64-NEXT:movd %xmm0, %eax
-; X64-NEXT:cmpl $-8388608, %eax # imm 

[llvm-branch-commits] [llvm] DAG: Lower fcNormal is.fpclass to compare with inf (PR #100389)

2024-08-28 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm updated 
https://github.com/llvm/llvm-project/pull/100389

>From f5da09293f633b8c4eb23de1a5c912a2546d1b9a Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Wed, 1 Feb 2023 09:06:59 -0400
Subject: [PATCH] DAG: Lower fcNormal is.fpclass to compare with inf

Looks worse for x86 without the fabs check. Not sure if
this is useful for any targets.
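
The rewrite in this patch, `isnormal(x) --> fabs(x) < infinity && !(fabs(x) < smallest_normal)`, can be checked against `std::isnormal` with a short C++ sketch. It assumes IEEE-754 floats with subnormals enabled and default floating-point compiler settings; `isNormalViaCompares` is a hypothetical name.

```cpp
#include <cmath>
#include <limits>

// isnormal(x) --> fabs(x) < infinity && !(fabs(x) < smallest_normal)
bool isNormalViaCompares(float X) {
  const float Inf = std::numeric_limits<float>::infinity();
  // numeric_limits<float>::min() is the smallest positive normal value.
  const float SmallestNormal = std::numeric_limits<float>::min();
  const float A = std::fabs(X);
  // Ordered less-than: false for NaN and for infinity.
  const bool IsFinite = A < Inf;
  // Negated less-than: rejects zero and subnormals. It would accept NaN on
  // its own, but IsFinite has already ruled NaN out.
  const bool NotBelowNormal = !(A < SmallestNormal);
  return IsFinite && NotBelowNormal;
}
```

The second compare is written as a negated less-than rather than `>=` to mirror the unordered greater-or-equal condition code used in the non-inverted case.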
---
 .../CodeGen/SelectionDAG/TargetLowering.cpp   | 25 +++
 1 file changed, 25 insertions(+)

diff --git a/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp b/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
index e3fdea34f895ba..ff3aab645f24b4 100644
--- a/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
@@ -8787,6 +8787,31 @@ SDValue TargetLowering::expandIS_FPCLASS(EVT ResultVT, SDValue Op,
 IsOrdered ? OrderedOp : UnorderedOp);
   }
 }
+
+if (FPTestMask == fcNormal) {
+  // TODO: Handle unordered
+  ISD::CondCode IsFiniteOp = IsInvertedFP ? ISD::SETUGE : ISD::SETOLT;
+  ISD::CondCode IsNormalOp = IsInvertedFP ? ISD::SETOLT : ISD::SETUGE;
+
+  if (isCondCodeLegalOrCustom(IsFiniteOp,
+  OperandVT.getScalarType().getSimpleVT()) &&
+  isCondCodeLegalOrCustom(IsNormalOp,
+  OperandVT.getScalarType().getSimpleVT()) &&
+  isFAbsFree(OperandVT)) {
+// isnormal(x) --> fabs(x) < infinity && !(fabs(x) < smallest_normal)
+SDValue Inf =
+DAG.getConstantFP(APFloat::getInf(Semantics), DL, OperandVT);
+SDValue SmallestNormal = DAG.getConstantFP(
+APFloat::getSmallestNormalized(Semantics), DL, OperandVT);
+
+SDValue Abs = DAG.getNode(ISD::FABS, DL, OperandVT, Op);
+SDValue IsFinite = DAG.getSetCC(DL, ResultVT, Abs, Inf, IsFiniteOp);
+SDValue IsNormal =
+DAG.getSetCC(DL, ResultVT, Abs, SmallestNormal, IsNormalOp);
+unsigned LogicOp = IsInvertedFP ? ISD::OR : ISD::AND;
+return DAG.getNode(LogicOp, DL, ResultVT, IsFinite, IsNormal);
+  }
+}
   }
 
   // Some checks may be represented as inversion of simpler check, for example



[llvm-branch-commits] [llvm] DAG: Lower single infinity is.fpclass tests to fcmp (PR #100380)

2024-08-28 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm ready_for_review 
https://github.com/llvm/llvm-project/pull/100380


[llvm-branch-commits] [llvm] release/19.x: workflows/release-tasks: Pass required secrets to all called workflows (#106286) (PR #106491)

2024-08-28 Thread via llvm-branch-commits

https://github.com/llvmbot created 
https://github.com/llvm/llvm-project/pull/106491

Backport 9d81e7e36e33aecdee05fef551c0652abafaa052

Requested by: @tstellar

>From c3beefa91b9e50c97a4ab7c32b40771d9fd0f97e Mon Sep 17 00:00:00 2001
From: Tom Stellard 
Date: Wed, 28 Aug 2024 22:18:08 -0700
Subject: [PATCH] workflows/release-tasks: Pass required secrets to all called
 workflows (#106286)

Called workflows don't have access to secrets by default, so we need to
explicitly pass secrets that we use.

(cherry picked from commit 9d81e7e36e33aecdee05fef551c0652abafaa052)
---
 .github/workflows/release-doxygen.yml |  7 ++-
 .github/workflows/release-lit.yml |  7 ++-
 .github/workflows/release-sources.yml |  4 
 .github/workflows/release-tasks.yml   | 12 
 4 files changed, 28 insertions(+), 2 deletions(-)

diff --git a/.github/workflows/release-doxygen.yml b/.github/workflows/release-doxygen.yml
index ef00a438ce7ac4..ea95e5bb12b2b8 100644
--- a/.github/workflows/release-doxygen.yml
+++ b/.github/workflows/release-doxygen.yml
@@ -25,6 +25,10 @@ on:
 description: 'Upload documentation'
 required: false
 type: boolean
+secrets:
+  RELEASE_TASKS_USER_TOKEN:
+description: "Secret used to check user permissions."
+required: false
 
 jobs:
   release-doxygen:
@@ -63,5 +67,6 @@ jobs:
 if: env.upload
 env:
   GITHUB_TOKEN: ${{ github.token }}
+  USER_TOKEN: ${{ secrets.RELEASE_TASKS_USER_TOKEN }}
 run: |
-  ./llvm/utils/release/github-upload-release.py --token "$GITHUB_TOKEN" --release "${{ inputs.release-version }}" --user "${{ github.actor }}" upload --files ./*doxygen*.tar.xz
+  ./llvm/utils/release/github-upload-release.py --token "$GITHUB_TOKEN" --release "${{ inputs.release-version }}" --user "${{ github.actor }}" --user-token "$USER_TOKEN" upload --files ./*doxygen*.tar.xz
diff --git a/.github/workflows/release-lit.yml b/.github/workflows/release-lit.yml
index 0316ba406041d6..9d6f3140e68830 100644
--- a/.github/workflows/release-lit.yml
+++ b/.github/workflows/release-lit.yml
@@ -17,6 +17,10 @@ on:
 description: 'Release Version'
 required: true
 type: string
+secrets:
+  RELEASE_TASKS_USER_TOKEN:
+description: "Secret used to check user permissions."
+required: false
 
 jobs:
   release-lit:
@@ -36,8 +40,9 @@ jobs:
   - name: Check Permissions
 env:
   GITHUB_TOKEN: ${{ github.token }}
+  USER_TOKEN: ${{ secrets.RELEASE_TASKS_USER_TOKEN }}
 run: |
-  ./llvm/utils/release/./github-upload-release.py --token "$GITHUB_TOKEN" --user ${{ github.actor }} check-permissions
+  ./llvm/utils/release/./github-upload-release.py --token "$GITHUB_TOKEN" --user ${{ github.actor }} --user-token "$USER_TOKEN" check-permissions
 
   - name: Setup Cpp
 uses: aminya/setup-cpp@v1
diff --git a/.github/workflows/release-sources.yml b/.github/workflows/release-sources.yml
index 9c5b1a9f017092..edb0449ef7e2c2 100644
--- a/.github/workflows/release-sources.yml
+++ b/.github/workflows/release-sources.yml
@@ -16,6 +16,10 @@ on:
 description: Release Version
 required: true
 type: string
+secrets:
+  RELEASE_TASKS_USER_TOKEN:
+description: "Secret used to check user permissions."
+required: false
   # Run on pull_requests for testing purposes.
   pull_request:
 paths:
diff --git a/.github/workflows/release-tasks.yml b/.github/workflows/release-tasks.yml
index cf42730aaf8170..780dd0ff6325c9 100644
--- a/.github/workflows/release-tasks.yml
+++ b/.github/workflows/release-tasks.yml
@@ -66,6 +66,9 @@ jobs:
 with:
   release-version: ${{ needs.validate-tag.outputs.release-version }}
   upload: true
+# Called workflows don't have access to secrets by default, so we need to explicitly pass secrets that we use.
+secrets:
+  RELEASE_TASKS_USER_TOKEN: ${{ secrets.RELEASE_TASKS_USER_TOKEN }}
 
   release-lit:
 name: Release Lit
@@ -73,6 +76,9 @@ jobs:
 uses: ./.github/workflows/release-lit.yml
 with:
   release-version: ${{ needs.validate-tag.outputs.release-version }}
+# Called workflows don't have access to secrets by default, so we need to explicitly pass secrets that we use.
+secrets:
+  RELEASE_TASKS_USER_TOKEN: ${{ secrets.RELEASE_TASKS_USER_TOKEN }}
 
   release-binaries:
 name: Build Release Binaries
@@ -97,6 +103,9 @@ jobs:
   release-version: ${{ needs.validate-tag.outputs.release-version }}
   upload: true
   runs-on: ${{ matrix.runs-on }}
+# Called workflows don't have access to secrets by default, so we need to explicitly pass secrets that we use.
+secrets:
+  RELEASE_TASKS_USER_TOKEN: ${{ secrets.RELEASE_TASKS_USER_TOKEN }}
 
   release-sources:
 name: Package Release Sources
@@ -109,3 +118,6 @@ jobs:
 uses: ./.github/workflows/release-sources.yml
   

[llvm-branch-commits] [llvm] release/19.x: workflows/release-tasks: Pass required secrets to all called workflows (#106286) (PR #106491)

2024-08-28 Thread via llvm-branch-commits

llvmbot wrote:

@tru What do you think about merging this PR to the release branch?

https://github.com/llvm/llvm-project/pull/106491


[llvm-branch-commits] [llvm] release/19.x: workflows/release-tasks: Pass required secrets to all called workflows (#106286) (PR #106491)

2024-08-28 Thread via llvm-branch-commits

https://github.com/llvmbot milestoned 
https://github.com/llvm/llvm-project/pull/106491


[llvm-branch-commits] [llvm] release/19.x: workflows/release-tasks: Pass required secrets to all called workflows (#106286) (PR #106491)

2024-08-28 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-github-workflow

Author: None (llvmbot)


Changes

Backport 9d81e7e36e33aecdee05fef551c0652abafaa052

Requested by: @tstellar

---
Full diff: https://github.com/llvm/llvm-project/pull/106491.diff


4 Files Affected:

- (modified) .github/workflows/release-doxygen.yml (+6-1) 
- (modified) .github/workflows/release-lit.yml (+6-1) 
- (modified) .github/workflows/release-sources.yml (+4) 
- (modified) .github/workflows/release-tasks.yml (+12) 


``diff
diff --git a/.github/workflows/release-doxygen.yml b/.github/workflows/release-doxygen.yml
index ef00a438ce7ac4..ea95e5bb12b2b8 100644
--- a/.github/workflows/release-doxygen.yml
+++ b/.github/workflows/release-doxygen.yml
@@ -25,6 +25,10 @@ on:
 description: 'Upload documentation'
 required: false
 type: boolean
+secrets:
+  RELEASE_TASKS_USER_TOKEN:
+description: "Secret used to check user permissions."
+required: false
 
 jobs:
   release-doxygen:
@@ -63,5 +67,6 @@ jobs:
 if: env.upload
 env:
   GITHUB_TOKEN: ${{ github.token }}
+  USER_TOKEN: ${{ secrets.RELEASE_TASKS_USER_TOKEN }}
 run: |
-  ./llvm/utils/release/github-upload-release.py --token "$GITHUB_TOKEN" --release "${{ inputs.release-version }}" --user "${{ github.actor }}" upload --files ./*doxygen*.tar.xz
+  ./llvm/utils/release/github-upload-release.py --token "$GITHUB_TOKEN" --release "${{ inputs.release-version }}" --user "${{ github.actor }}" --user-token "$USER_TOKEN" upload --files ./*doxygen*.tar.xz
diff --git a/.github/workflows/release-lit.yml b/.github/workflows/release-lit.yml
index 0316ba406041d6..9d6f3140e68830 100644
--- a/.github/workflows/release-lit.yml
+++ b/.github/workflows/release-lit.yml
@@ -17,6 +17,10 @@ on:
 description: 'Release Version'
 required: true
 type: string
+secrets:
+  RELEASE_TASKS_USER_TOKEN:
+description: "Secret used to check user permissions."
+required: false
 
 jobs:
   release-lit:
@@ -36,8 +40,9 @@ jobs:
   - name: Check Permissions
 env:
   GITHUB_TOKEN: ${{ github.token }}
+  USER_TOKEN: ${{ secrets.RELEASE_TASKS_USER_TOKEN }}
 run: |
-  ./llvm/utils/release/./github-upload-release.py --token "$GITHUB_TOKEN" --user ${{ github.actor }} check-permissions
+  ./llvm/utils/release/./github-upload-release.py --token "$GITHUB_TOKEN" --user ${{ github.actor }} --user-token "$USER_TOKEN" check-permissions
 
   - name: Setup Cpp
 uses: aminya/setup-cpp@v1
diff --git a/.github/workflows/release-sources.yml b/.github/workflows/release-sources.yml
index 9c5b1a9f017092..edb0449ef7e2c2 100644
--- a/.github/workflows/release-sources.yml
+++ b/.github/workflows/release-sources.yml
@@ -16,6 +16,10 @@ on:
 description: Release Version
 required: true
 type: string
+secrets:
+  RELEASE_TASKS_USER_TOKEN:
+description: "Secret used to check user permissions."
+required: false
   # Run on pull_requests for testing purposes.
   pull_request:
 paths:
diff --git a/.github/workflows/release-tasks.yml b/.github/workflows/release-tasks.yml
index cf42730aaf8170..780dd0ff6325c9 100644
--- a/.github/workflows/release-tasks.yml
+++ b/.github/workflows/release-tasks.yml
@@ -66,6 +66,9 @@ jobs:
 with:
   release-version: ${{ needs.validate-tag.outputs.release-version }}
   upload: true
+# Called workflows don't have access to secrets by default, so we need to explicitly pass secrets that we use.
+secrets:
+  RELEASE_TASKS_USER_TOKEN: ${{ secrets.RELEASE_TASKS_USER_TOKEN }}
 
   release-lit:
 name: Release Lit
@@ -73,6 +76,9 @@ jobs:
 uses: ./.github/workflows/release-lit.yml
 with:
   release-version: ${{ needs.validate-tag.outputs.release-version }}
+# Called workflows don't have access to secrets by default, so we need to explicitly pass secrets that we use.
+secrets:
+  RELEASE_TASKS_USER_TOKEN: ${{ secrets.RELEASE_TASKS_USER_TOKEN }}
 
   release-binaries:
 name: Build Release Binaries
@@ -97,6 +103,9 @@ jobs:
   release-version: ${{ needs.validate-tag.outputs.release-version }}
   upload: true
   runs-on: ${{ matrix.runs-on }}
+# Called workflows don't have access to secrets by default, so we need to explicitly pass secrets that we use.
+secrets:
+  RELEASE_TASKS_USER_TOKEN: ${{ secrets.RELEASE_TASKS_USER_TOKEN }}
 
   release-sources:
 name: Package Release Sources
@@ -109,3 +118,6 @@ jobs:
 uses: ./.github/workflows/release-sources.yml
 with:
   release-version: ${{ needs.validate-tag.outputs.release-version }}
+# Called workflows don't have access to secrets by default, so we need to explicitly pass secrets that we use.
+secrets:
+  RELEASE_TASKS_USER_TOKEN: ${{ secrets.RELEASE_TASKS_USER_TOKEN }}

``




https://github.com/llvm/llvm-proje

[llvm-branch-commits] [llvm] release/19.x: workflows/release-tasks: Pass required secrets to all called workflows (#106286) (PR #106491)

2024-08-28 Thread Tobias Hieta via llvm-branch-commits

https://github.com/tru approved this pull request.


https://github.com/llvm/llvm-project/pull/106491


[llvm-branch-commits] [BOLT] Only parse probes for profiled functions in profile-write-pseudo-probes mode (PR #106365)

2024-08-28 Thread Lei Wang via llvm-branch-commits

https://github.com/wlei-llvm approved this pull request.

LGTM, thanks.

https://github.com/llvm/llvm-project/pull/106365